Releases: ContextLab/llm-stylometry
v1.0 - Public Release
LLM Stylometry v1.0 - Public Release
Paper: A Stylometric Application of Large Language Models (Stropkay et al., 2025)
This release accompanies the arXiv preprint and makes all code, data, models, and analyses publicly available.
Key Features
📊 Reproducible Analysis
- 320 trained models (8 authors × 10 seeds × 4 conditions)
- Pre-computed results included for all figures
- One-line figure generation from pre-computed data
- Complete analysis pipeline from raw data to publication figures
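The 320-model count follows directly from the 8 × 10 × 4 design. A minimal sketch of enumerating that grid is shown here; the author names come from the model list in these notes, while the seed numbering (0 to 9) and the three non-baseline condition labels are placeholders assumed for illustration, not the repository's actual identifiers.

```python
from itertools import product

# Author names as listed in the release notes.
AUTHORS = [
    "Jane Austen", "L. Frank Baum", "Charles Dickens", "F. Scott Fitzgerald",
    "Herman Melville", "Ruth Plumly Thompson", "Mark Twain", "H.G. Wells",
]
# Assumption: one baseline plus three linguistic variants; labels are placeholders.
CONDITIONS = ["baseline", "variant_1", "variant_2", "variant_3"]
# Assumption: seeds are numbered 0..9.
SEEDS = range(10)

# One (author, condition, seed) triple per trained model.
grid = list(product(AUTHORS, CONDITIONS, SEEDS))

print(len(grid))  # 320
```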
🤖 HuggingFace Models (NEW!)
All 8 author-specific GPT-2 models publicly available:
- Jane Austen
- L. Frank Baum
- Charles Dickens
- F. Scott Fitzgerald
- Herman Melville
- Ruth Plumly Thompson
- Mark Twain
- H.G. Wells
Each model was trained for 50,000 epochs (final loss ~1.2-1.5).
📚 HuggingFace Datasets (NEW!)
All 8 author text corpora with verified book titles:
- 84 books total from Project Gutenberg
- Cleaned and preprocessed for stylometry
- Professionally documented dataset cards
- Browse at: https://huggingface.co/contextlab
📦 Pre-trained Model Weights
- Dropbox distribution for all 320 paper models
- Automatic download script with checksum verification
- ~26GB compressed archives
- See models/README.md for download instructions
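For readers who want to check a downloaded archive by hand, checksum verification can be sketched as follows. This is a generic illustration, not the repository's download script: the choice of SHA-256 and the function names `sha256_of` and `verify_archive` are assumptions, and the expected digest would have to come from whatever the project publishes.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected_hex):
    """Compare a file's digest with an expected hex string (case-insensitive)."""
    return sha256_of(Path(path)) == expected_hex.strip().lower()
```

Reading in fixed-size chunks matters here because the archives are large (about 26GB compressed), so loading a whole file into memory is not practical.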
🎨 Visualization & Analysis
- 7 main figures (baseline condition)
- 32 supplemental figures (3 linguistic variants)
- Statistical analyses (t-tests, cross-variant comparisons)
- Text classification experiments
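As a rough illustration of the kind of t-test listed above, the sketch below computes a Welch two-sample t statistic using only the standard library. The function name `welch_t` and the sample values are made up for illustration; they are not results from the paper, and the repository's own analysis scripts may use a different test or library.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples with unequal variances."""
    # Standard error combines each sample's variance scaled by its size.
    se = sqrt(variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / se


# Toy numbers only, chosen to make the arithmetic easy to follow.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 3.0, 4.0, 5.0, 6.0]
t_stat = welch_t(group_a, group_b)
```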
Quick Start
# Clone repository
git clone https://github.com/ContextLab/llm-stylometry.git
cd llm-stylometry
# Generate all figures (from pre-computed results)
./run_llm_stylometry.sh
# Or generate a specific figure
./run_llm_stylometry.sh -f 1a
What's Included
- ✅ Complete training pipeline for GPT-2 models
- ✅ Visualization tools for all paper figures
- ✅ Statistical analysis scripts
- ✅ Text classification experiments
- ✅ Comprehensive documentation
- ✅ Full test suite (pytest)
- ✅ CI/CD integration (GitHub Actions)
Installation
One-line setup:
./run_llm_stylometry.sh
This automatically creates a conda environment, installs dependencies, and generates all figures.
Citation
@article{StroEtal25,
title={A Stylometric Application of Large Language Models},
author={Stropkay, Harrison F. and Chen, Jiayi and Jabelli, Mohammad J. L. and Rockmore, Daniel N. and Manning, Jeremy R.},
journal={arXiv preprint arXiv:2510.21958},
year={2025}
}
Contact
- Paper: https://arxiv.org/abs/2510.21958
- Code: https://github.com/ContextLab/llm-stylometry
- Issues: https://github.com/ContextLab/llm-stylometry/issues
- ContextLab: https://www.context-lab.com/
Major contributors: Harrison Stropkay, Jiayi Chen, Mohammad Jabelli, Daniel Rockmore, Jeremy Manning
License: MIT