> [!info]
> Input: [[Social Media Content|user-generated text content (posts)]]
> Output: [[Fingerprint|fingerprint of text]]
>
> Types: [[Behavioural Weakness|behavioural]]
> Weakness: [[SOWEL-4. Communicating]]
> Functionality: [[SOFL-1. Communications]]
### Explanation
Authors leave a measurable fingerprint in their text: preferred sentence length, function-word frequencies, punctuation habits, characteristic typos, vocabulary richness, and idiosyncratic phrasings. Even when someone hides behind a new pseudonym, their writing often matches their earlier work closely enough to be linked to it statistically.
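As a rough illustration of what such a fingerprint can look like in practice, the sketch below turns a text into a handful of numeric features. The function name `style_features`, the short `FUNCTION_WORDS` list, and the choice of features are assumptions made for this example; dedicated tools use far larger feature sets.

```python
import re
from collections import Counter

# Hypothetical, deliberately short list of English function words.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "is", "it", "for")


def style_features(text):
    """Return a small dictionary of stylometric features for one text."""
    # Naive sentence split on terminal punctuation; good enough for a sketch.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    if not words or not sentences:
        raise ValueError("text contains no words")

    counts = Counter(words)
    features = {
        "avg_sentence_len": len(words) / len(sentences),  # words per sentence
        "type_token_ratio": len(counts) / len(words),     # vocabulary richness
        "commas_per_word": text.count(",") / len(words),  # one punctuation habit
    }
    # Relative frequency of each function word.
    for word in FUNCTION_WORDS:
        features["fw_" + word] = counts[word] / len(words)
    return features


if __name__ == "__main__":
    sample = "The cat sat. The dog ran, and the bird flew!"
    for name, value in style_features(sample).items():
        print(f"{name}: {value:.3f}")
```

Features like these are only stable on reasonably long texts, so very short samples will give noisy values.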
Collect a corpus of confirmed posts from the suspected author (a few hundred words is the bare minimum; several thousand makes the result far stronger) and a comparable corpus from the anonymous account. Run both through a stylometry tool that computes feature vectors and a similarity score: JGAAP, the `stylometry` Python library, or Voyant Tools for an interactive walkthrough.
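The comparison step can be sketched in plain Python. This is not how JGAAP or the `stylometry` library work internally; it is a minimal stand-in that builds function-word frequency vectors and ranks candidate authors by cosine similarity. The names `profile`, `cosine_similarity`, `rank_candidates`, and the short `FUNCTION_WORDS` list are assumptions made for this example.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately short list of English function words.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "is", "it", "for")


def profile(text):
    """Relative frequency of each function word, as a fixed-order vector."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        raise ValueError("text contains no words")
    counts = Counter(words)
    return [counts[w] / len(words) for w in FUNCTION_WORDS]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(anonymous_text, candidates):
    """Sort candidate authors by similarity to the anonymous text, best first."""
    target = profile(anonymous_text)
    scores = {
        name: cosine_similarity(target, profile(text))
        for name, text in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    anonymous = "the cat and the dog and the bird"
    known = {
        "author_a": "the fox and the hen and the owl",
        "author_b": "it is in a box for it is to be",
    }
    for name, score in rank_candidates(anonymous, known):
        print(f"{name}: {score:.3f}")
```

Function-word vectors from real English text tend to look alike, so a raw score means little on its own. Including several unrelated authors as a baseline shows whether the suspected author stands out from the rest.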
The result is a probabilistic indication, not proof. Stylometry has misattributed authors in published academic cases, so treat the score as one signal among several. Combine it with timing analysis ([[SOTL-5.4. Study Time of Posts and Actions]]) and content-tag analysis ([[SOTL-3.10. Analyze Content Tags]]) for higher confidence.
### Examples
{{some links to articles, videos, etc}}
### Tools
- https://github.com/evllabs/JGAAP
- https://voyant-tools.org
- https://github.com/jpotts18/stylometry
### See also
- [[SOTL-4.2. Check Emoji]]