
Releases: ContextLab/llm-stylometry

v1.0 - Public Release

28 Oct 04:21
b22c613


LLM Stylometry v1.0 - Public Release

Paper: A Stylometric Application of Large Language Models (Stropkay et al., 2025)

This release accompanies the arXiv preprint and makes all code, data, models, and analyses publicly available.

Key Features

📊 Reproducible Analysis

  • 320 trained models (8 authors × 10 seeds × 4 conditions)
  • Pre-computed results included for all figures
  • One-line figure generation from pre-computed data
  • Complete analysis pipeline from raw data to publication figures

🤖 HuggingFace Models (NEW!)

All 8 author-specific GPT-2 models are publicly available on HuggingFace.

Each model was trained for 50,000 epochs (final loss ~1.2-1.5).
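As a sketch of how one of the released models might be loaded with the `transformers` library: the repo-id pattern below is a hypothetical placeholder, not the actual naming scheme; check the release's HuggingFace links for the real model ids.

```python
def author_repo(author: str) -> str:
    """Build a hypothetical HuggingFace repo id for an author's model.

    The naming pattern here is an assumption for illustration only.
    """
    return f"ContextLab/llm-stylometry-{author.lower()}"


def load_author_model(author: str):
    """Download and return (tokenizer, model) for one author's GPT-2."""
    # Imported lazily so the sketch can be read and tested without
    # transformers installed.
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    repo = author_repo(author)
    tokenizer = GPT2TokenizerFast.from_pretrained(repo)
    model = GPT2LMHeadModel.from_pretrained(repo)
    return tokenizer, model
```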

📚 HuggingFace Datasets (NEW!)

All 8 author text corpora, with verified book titles, are publicly available on HuggingFace.
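A corpus could be pulled down with the `datasets` library along these lines; the dataset-id pattern is again an assumption, so see the release's HuggingFace links for the actual names.

```python
def corpus_repo(author: str) -> str:
    """Build a hypothetical HuggingFace dataset id for an author's corpus.

    The naming pattern here is an assumption for illustration only.
    """
    return f"ContextLab/{author.lower()}-corpus"


def load_author_corpus(author: str):
    """Return the training split of one author's text corpus."""
    # Lazy import: `datasets` is only needed when actually downloading.
    from datasets import load_dataset

    return load_dataset(corpus_repo(author), split="train")
```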

📦 Pre-trained Model Weights

  • Dropbox distribution for all 320 paper models
  • Automatic download script with checksum verification
  • ~26GB compressed archives
  • See models/README.md for download instructions
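The download script's checksum verification can be sketched with nothing but the standard library; the actual script and expected digests live in the repo (see models/README.md), so the function names here are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest.

    Reading in chunks keeps memory flat even for multi-GB archives.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """True if the file's SHA-256 digest matches the expected hex string."""
    return sha256_of(path) == expected_hex.lower()
```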

🎨 Visualization & Analysis

  • 7 main figures (baseline condition)
  • 32 supplemental figures (3 linguistic variants)
  • Statistical analyses (t-tests, cross-variant comparisons)
  • Text classification experiments
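For intuition on the t-tests mentioned above, here is a minimal stdlib sketch of Welch's two-sample t statistic (unequal variances); the paper's actual statistical procedure is implemented in the repo's analysis scripts, so treat this as an illustration only.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic for samples a and b.

    Uses sample variances and does not assume equal variance or
    equal sample sizes.
    """
    n_a, n_b = len(a), len(b)
    se = (variance(a) / n_a + variance(b) / n_b) ** 0.5
    return (mean(a) - mean(b)) / se
```

For example, comparing matched-author losses against mismatched-author losses would yield a large-magnitude t when the two groups are well separated.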

Quick Start

# Clone repository
git clone https://github.com/ContextLab/llm-stylometry.git
cd llm-stylometry

# Generate all figures (from pre-computed results)
./run_llm_stylometry.sh

# Or generate specific figure
./run_llm_stylometry.sh -f 1a

What's Included

  • ✅ Complete training pipeline for GPT-2 models
  • ✅ Visualization tools for all paper figures
  • ✅ Statistical analysis scripts
  • ✅ Text classification experiments
  • ✅ Comprehensive documentation
  • ✅ Full test suite (pytest)
  • ✅ CI/CD integration (GitHub Actions)

Installation

One-line setup:

./run_llm_stylometry.sh

This automatically creates a conda environment, installs dependencies, and generates all figures.

Citation

@article{StroEtal25,
  title={A Stylometric Application of Large Language Models},
  author={Stropkay, Harrison F. and Chen, Jiayi and Jabelli, Mohammad J. L. and Rockmore, Daniel N. and Manning, Jeremy R.},
  journal={arXiv preprint arXiv:2510.21958},
  year={2025}
}

Contact


Major contributors: Harrison Stropkay, Jiayi Chen, Mohammad Jabelli, Daniel Rockmore, Jeremy Manning

License: MIT