attribute

Reproduction of Anthropic's Attribution Graphs.

Setup

# install uv if you don't have it
# curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv --seed
uv sync

Cache activations

This step is not strictly necessary. The weights of a transcoder alone do not tell us which inputs its latents fire on, so to select maximum-activating examples we first compute activations on a large dataset.

# scripts/cache_activations
MODEL=HuggingFaceTB/SmolLM2-135M
TRANSCODER="nev/SmolLM2-CLT-135M-73k-k32"
DATASET="--dataset_repo EleutherAI/fineweb-edu-dedup-10b --dataset_split train --n_tokens 10_000_000"
NAME="smollm-v1"

uv run python cache.py $MODEL $TRANSCODER --num_gpus 1 $DATASET \
  --hookpoints \
    layers.0.mlp layers.1.mlp layers.2.mlp layers.3.mlp layers.4.mlp layers.5.mlp \
    layers.6.mlp layers.7.mlp layers.8.mlp layers.9.mlp layers.10.mlp layers.11.mlp \
    layers.12.mlp layers.13.mlp layers.14.mlp layers.15.mlp layers.16.mlp layers.17.mlp \
    layers.18.mlp layers.19.mlp layers.20.mlp layers.21.mlp layers.22.mlp layers.23.mlp \
    layers.24.mlp layers.25.mlp layers.26.mlp layers.27.mlp layers.28.mlp layers.29.mlp \
  --name $NAME --batch_size 16
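
The hookpoint list follows a regular pattern (`layers.N.mlp` for N from 0 to 29), so it can be generated instead of typed out. The snippet below is a minimal sketch of that idea; `N_LAYERS` and `HOOKPOINTS` are names introduced here for illustration and are not part of the repository's scripts.

```shell
# Build "layers.0.mlp layers.1.mlp ... layers.29.mlp" in a loop.
# N_LAYERS=30 matches the 30 hookpoints listed in the command above.
N_LAYERS=30
HOOKPOINTS=""
i=0
while [ "$i" -lt "$N_LAYERS" ]; do
  HOOKPOINTS="$HOOKPOINTS layers.$i.mlp"
  i=$((i + 1))
done
HOOKPOINTS="${HOOKPOINTS# }"  # drop the leading space
echo "$HOOKPOINTS"

# Pass it unquoted so sh/bash splits it into separate arguments:
# uv run python cache.py $MODEL $TRANSCODER --num_gpus 1 $DATASET \
#   --hookpoints $HOOKPOINTS --name $NAME --batch_size 16
```

Note that zsh does not word-split unquoted variables by default, so this form assumes sh or bash.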

Alternatively, you can download a precomputed cache from Hugging Face:

cd results && git clone https://huggingface.co/nev/SmolLM2-CLT-135M-73k-k32-cache --depth=1

Web UI

We developed a Gradio visualization for running attribution:

uv run gradio serve.py

Colab (WIP)

The web UI can be run from a Colab notebook.

Run in CLI

SCAN="smollm-v1"
PROMPT_NAME="smollm-v1"
PROMPT_TEXT="Michael Jordan plays the sport of"
uv run python -m attribute \
--cache_path="results/$SCAN/latents" \
--name "$PROMPT_NAME" \
--scan "$SCAN" \
"$PROMPT_TEXT" \
--transcoder_path="nev/SmolLM2-CLT-135M-73k-k32" --model_name="HuggingFaceTB/SmolLM2-135M"

uv run python -m attribute
cd attribution-graphs-frontend
uv run python serve.py 9999
ngrok http 9999
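
When running attribution on more than one prompt, the invocation above can be wrapped in a small shell function. This is only a sketch: `run_attribute` and `DRY_RUN` are hypothetical names introduced here, and every flag passed to `attribute` is copied from the command above. With `DRY_RUN=1` the function prints the command instead of executing it.

```shell
SCAN="smollm-v1"
MODEL="HuggingFaceTB/SmolLM2-135M"
TRANSCODER="nev/SmolLM2-CLT-135M-73k-k32"
DRY_RUN=1  # hypothetical switch: set to an empty string to actually run

# run_attribute NAME PROMPT (hypothetical helper, not part of the repository)
run_attribute() {
  name=$1
  prompt=$2
  # Rebuild the positional parameters as the full command line.
  set -- uv run python -m attribute \
    --cache_path="results/$SCAN/latents" \
    --name "$name" \
    --scan "$SCAN" \
    "$prompt" \
    --transcoder_path="$TRANSCODER" --model_name="$MODEL"
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"  # print only; quoting is not preserved in the output
  else
    "$@"
  fi
}

run_attribute "smollm-v1" "Michael Jordan plays the sport of"
```

Further prompts can be added as extra `run_attribute` calls, each with its own name.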

TODOs

Cross-verify against Anthropic's implementation
