SOTL-4.3. Conduct Stylometric Analysis

> [!info]
> Input: [[Social Media Content|user-generated text content (posts)]]
> Output: [[Fingerprint|fingerprint of text]]
>
> Types: [[Behavioural Weakness|behavioural]]
> Weakness: [[SOWEL-4. Communicating]]
> Functionality: [[SOFL-1. Communications]]

### Explanation

Authors leave a measurable fingerprint in their text: preferred sentence length, function-word frequencies, punctuation habits, characteristic typos, vocabulary richness, and idiosyncratic phrasings. Even when someone hides behind a new pseudonym, their writing usually matches their previous work closely enough to be statistically identified.

Collect a corpus of confirmed posts from the suspected author (a few hundred words is the bare minimum; several thousand makes the result far stronger) and a comparable corpus from the anonymous account. Run both through a stylometry tool that computes feature vectors and a similarity score: JGAAP, the `stylometry` Python library, or Voyant Tools for an interactive walkthrough.

The result is a probability, not proof. Stylometry has misattributed authors in published academic cases, so treat the score as one signal among several. Combine it with timing analysis (`SOTL-5.4. Study Time of Posts and Actions`) and content-tag analysis (`SOTL-3.10. Analyze Content Tags`) for higher confidence.

### Examples

{{some links to articles, videos, etc}}

### Tools

- https://github.com/evllabs/JGAAP
- https://voyant-tools.org
- https://github.com/jpotts18/stylometry

### See also

- [[SOTL-4.2. Check Emoji]]
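
### Illustrative sketch

The comparison described in the Explanation (turn each corpus into a feature vector, then compute a similarity score) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the method used by JGAAP, Voyant Tools, or the `stylometry` library: the function-word list, the punctuation marks, and the names `features` and `cosine_similarity` are all hypothetical choices made for this example. Real tools use far larger feature sets and calibrated statistics.

```python
import math
import re
from collections import Counter

# Assumption: a tiny illustrative function-word list; real tools use hundreds.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is",
                  "it", "for", "with", "as", "but", "on", "not"]
# Assumption: a handful of punctuation marks as a proxy for punctuation habits.
PUNCTUATION = [",", ";", "!", "?"]


def features(text: str) -> list[float]:
    """Return per-word rates of each function word and punctuation mark."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        raise ValueError("text contains no words")
    counts = Counter(words)
    total = len(words)
    vec = [counts[w] / total for w in FUNCTION_WORDS]
    vec += [text.count(p) / total for p in PUNCTUATION]
    return vec


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Placeholder texts; in practice each corpus should be far longer.
    known = "It is not the size of the post that matters, but the way it is written."
    anonymous = "The way it is written matters; it is not the size of the post."
    score = cosine_similarity(features(known), features(anonymous))
    print(f"similarity: {score:.3f}")
```

A score close to 1 means the two texts use these features at similar rates. On corpora this short the number is not meaningful, so treat it only as a demonstration of the mechanics.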