LLM Stylometry v1.0 - Public Release

Paper: A Stylometric Application of Large Language Models (Stropkay et al., 2025)

This release accompanies the arXiv preprint and makes all code, data, models, and analyses publicly available.

Key Features

📊 Reproducible Analysis

320 trained models (8 authors × 10 seeds × 4 conditions)
Pre-computed results included for all figures
One-line figure generation from pre-computed data
Complete analysis pipeline from raw data to publication figures
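
The model count above follows from a full factorial design. A minimal sketch of that arithmetic is below. All labels are hypothetical placeholders: this release states only the counts (8, 10, 4), and the split into one baseline plus three variants is an assumption inferred from the figure list in this release, not something confirmed by the repository.

```python
from itertools import product

# Hypothetical labels: the release only states the counts (8, 10, 4).
authors = [f"author_{i}" for i in range(8)]
seeds = list(range(10))
# Assumed split: one baseline condition plus three linguistic variants.
conditions = ["baseline", "variant_1", "variant_2", "variant_3"]

# One model per (author, seed, condition) combination.
model_grid = list(product(authors, seeds, conditions))
n_models = len(model_grid)
print(n_models)
```

Enumerating the grid this way also gives a convenient checklist for confirming that no author, seed, or condition is missing from a set of downloaded results.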

🤖 HuggingFace Models (NEW!)

All 8 author-specific GPT-2 models are publicly available.

Each model was trained for 50,000 epochs (final loss ~1.2-1.5).

📚 HuggingFace Datasets (NEW!)

All 8 author text corpora with verified book titles:

84 books total from Project Gutenberg
Cleaned and preprocessed for stylometry
Professionally documented dataset cards
Browse at: https://huggingface.co/contextlab

📦 Pre-trained Model Weights

Dropbox distribution for all 320 paper models
Automatic download script with checksum verification
~26 GB of compressed archives
See models/README.md for download instructions
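
Checksum verification of a downloaded archive can be sketched as follows. This is an illustration only: the hash algorithm (SHA-256 here), the function names, and the temporary file standing in for an archive are all assumptions, not the behavior of the repository's download script. See models/README.md for the actual procedure.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so multi-GB archives are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, expected_hex):
    """Return True when the file's SHA-256 matches the expected hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Demo on a small temporary file standing in for a downloaded archive.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example archive contents")
    demo_path = tmp.name

expected = hashlib.sha256(b"example archive contents").hexdigest()
ok = verify_checksum(demo_path, expected)      # matching digest
bad = verify_checksum(demo_path, "0" * 64)     # deliberately wrong digest
os.remove(demo_path)
print(ok, bad)
```

Reading in fixed-size chunks keeps memory use flat regardless of archive size, which matters for downloads in the tens of gigabytes.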

🎨 Visualization & Analysis

7 main figures (baseline condition)
32 supplemental figures (3 linguistic variants)
Statistical analyses (t-tests, cross-variant comparisons)
Text classification experiments

Quick Start

# Clone repository
git clone https://github.com/ContextLab/llm-stylometry.git
cd llm-stylometry

# Generate all figures (from pre-computed results)
./run_llm_stylometry.sh

# Or generate a specific figure
./run_llm_stylometry.sh -f 1a

What's Included

✅ Complete training pipeline for GPT-2 models
✅ Visualization tools for all paper figures
✅ Statistical analysis scripts
✅ Text classification experiments
✅ Comprehensive documentation
✅ Full test suite (pytest)
✅ CI/CD integration (GitHub Actions)

Installation

One-line setup:

./run_llm_stylometry.sh

This automatically creates a conda environment, installs dependencies, and generates all figures.

Citation

@article{StroEtal25,
  title={A Stylometric Application of Large Language Models},
  author={Stropkay, Harrison F. and Chen, Jiayi and Jabelli, Mohammad J. L. and Rockmore, Daniel N. and Manning, Jeremy R.},
  journal={arXiv preprint arXiv:2510.21958},
  year={2025}
}

Contact

Paper: https://arxiv.org/abs/2510.21958
Code: https://github.com/ContextLab/llm-stylometry
Issues: https://github.com/ContextLab/llm-stylometry/issues
ContextLab: https://www.context-lab.com/

Major contributors: Harrison Stropkay, Jiayi Chen, Mohammad Jabelli, Daniel Rockmore, Jeremy Manning

License: MIT
