Validate shard filenames in checkpoint index to prevent path traversal (silent weight injection)#46913

Open
Snakinya wants to merge 2 commits into huggingface:main from Snakinya:security/shard-index-path-traversal

@Snakinya

Summary

get_checkpoint_shard_files in src/transformers/utils/hub.py reads the weight_map of model.safetensors.index.json / pytorch_model.bin.index.json and feeds weight_map.values() straight into os.path.join(repo, subfolder, f) with no validation. When loading an untrusted Hub repo, those values are attacker-controlled, and an absolute path or a ..-bearing value resolves the shard outside the checkpoint directory. The escaped path is then opened by safe_open (.safetensors) or torch.load (.bin) and loaded into the model as weights.
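The path-escape behaviour described above follows from how path joining works. A minimal sketch, using `posixpath` so the result is the same on any host; the cache directory and filenames below are hypothetical and only illustrate the join semantics, not the actual code in `hub.py`:

```python
import posixpath

# Hypothetical checkpoint directory and subfolder, for illustration only.
checkpoint_dir = "/cache/models--org--repo/snapshots/abc"
subfolder = ""

# A normal shard name stays inside the checkpoint directory.
benign = posixpath.join(checkpoint_dir, subfolder, "model-00001-of-00002.safetensors")

# An absolute component discards everything joined before it.
absolute = posixpath.join(checkpoint_dir, subfolder, "/tmp/attacker.safetensors")

# A ".."-bearing component climbs out once the path is resolved.
dotdot = posixpath.normpath(
    posixpath.join(checkpoint_dir, subfolder, "../../other/x.safetensors")
)

print(benign)    # /cache/models--org--repo/snapshots/abc/model-00001-of-00002.safetensors
print(absolute)  # /tmp/attacker.safetensors
print(dotdot)    # /cache/models--org--repo/other/x.safetensors
```

Neither escaping case raises an error at join time, which is why the value has to be validated before it is joined.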

This patch rejects any shard filename that is not a single relative basename (no path separators, no absolute path, no ..), and adds regression tests.

Why this matters / impact

Defaults don't help. The bypass is reachable under the full safe-loading default stack:

  • use_safetensors=True (default) — the shard the attacker redirects to is itself a legal .safetensors file, so the safetensors-first cascade is happy.
  • weights_only=True (torch 2.6+ default) — safe_open doesn't go through pickle at all, so this gate never sees the attacker file.
  • trust_remote_code=False (default) — the bug is on the weight-loading path, not the remote-code path; no opt-in is required.

End-to-end PoC, run against the latest main at the time of writing (9b6af5d, 17 minutes old):

# attacker publishes a Hub repo with:
#   config.json = {"model_type":"bert","architectures":["BertModel"]}
#   model.safetensors.index.json = {"metadata":{},
#       "weight_map":{"bert.embeddings.word_embeddings.weight":
#                     "<absolute path to attacker-staged .safetensors>"}}
# attacker-staged safetensors contains a tensor of shape (30522, 768) filled with 0.31415
from transformers import AutoModel
m = AutoModel.from_pretrained("<attacker repo>")   # all defaults
sd = m.state_dict()
k = next(k for k in sd if "word_embeddings" in k)  # bind the key; a generator variable is not visible outside it
print(k, f"{float(sd[k].mean()):.6f}")
# >>> embeddings.word_embeddings.weight 0.314150

The attacker-controlled value loads into the model's word_embeddings cleanly — no error, no MISSING entry in _missing_keys, no code execution. The victim ends up running a model whose behavior is silently controlled by an attacker-staged weights file (silent model backdoor / weight injection).

Two practical attack shapes:

  1. Single-repo, same-host: attacker's index points the shard at another *.safetensors file path the attacker can stage on the victim host (e.g. another HF-cache file from a separately-staged attacker repo the victim was nudged to download earlier). The victim's from_pretrained(<malicious repo>) loads attacker weights without any code execution.
  2. Cross-repo weight confusion on shared cache (multi-tenant training hosts): attacker repo A is downloaded and cached; attacker repo B (completely separate) ships an index.json whose weight_map points at A's cached shard. A user who loads the seemingly harmless repo B silently ends up running the weights from repo A.

Severity rationale: bypass of the three default safe-loading gates, silent (no error), and persistent (model behavior tampering) — without any code execution primitive.

What this PR changes

src/transformers/utils/hub.py — after collecting shard_filenames, reject any filename that is "", ".", "..", an absolute path, or is not equal to its own os.path.basename. The error message names the offending value and the index file. The check runs before either the local-folder branch (os.path.join(pretrained_model_name_or_path, subfolder, f)) or the Hub branch (cached_files(...) — which sends the raw filenames to hf_hub_download / snapshot_download) ever sees the value.
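The rejection rule described above can be sketched roughly as follows. This is an illustration of the stated conditions, not the diff itself: the function name `validate_shard_filenames` and the error wording are assumptions made for the example.

```python
import os


def validate_shard_filenames(shard_filenames, index_filename):
    """Hypothetical helper: accept only single relative basenames.

    Mirrors the conditions listed in the PR description; the name and
    the error message are illustrative, not taken from the patch.
    """
    for name in shard_filenames:
        if (
            name in ("", ".", "..")              # empty or pure directory references
            or os.path.isabs(name)               # absolute paths
            or name != os.path.basename(name)    # anything containing a path separator
        ):
            raise ValueError(
                f"unsafe shard filename {name!r} in {index_filename}"
            )
    return list(shard_filenames)


# Usage: a normal shard name passes, an escaping one is rejected.
print(validate_shard_filenames(["model-00001-of-00002.safetensors"], "index.json"))
try:
    validate_shard_filenames(["../other/x.safetensors"], "index.json")
except ValueError as err:
    print("rejected:", err)
```

Note that `os.path.basename` only splits on the separators of the host OS, so on POSIX a value containing a backslash would pass this check; whether the actual patch also handles that case is not stated in the description.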

tests/utils/test_hub_utils.py — new GetCheckpointShardFilesSecurityTests class:

  • test_rejects_absolute_path_in_weight_map
  • test_rejects_parent_traversal_in_weight_map
  • test_accepts_benign_relative_basename (sanity check: model-00001-of-00001.safetensors still loads)
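For readers who want a feel for the shape of such regression tests, here is a self-contained sketch. It does not call `get_checkpoint_shard_files`; instead it uses a hypothetical stand-in, `shard_names_from_index`, that parses an index file and applies the same rule, so the class and helper names here are assumptions rather than the tests in the diff.

```python
import json
import os
import tempfile
import unittest


def shard_names_from_index(index_path):
    """Hypothetical stand-in for the patched step: parse the index, then validate."""
    with open(index_path, encoding="utf-8") as f:
        weight_map = json.load(f)["weight_map"]
    names = sorted(set(weight_map.values()))
    for name in names:
        if name in ("", ".", "..") or os.path.isabs(name) or name != os.path.basename(name):
            raise ValueError(f"unsafe shard filename {name!r} in {index_path}")
    return names


class ShardIndexValidationSketch(unittest.TestCase):
    def _write_index(self, tmpdir, shard_value):
        # Write a minimal index file whose single weight points at shard_value.
        path = os.path.join(tmpdir, "model.safetensors.index.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump({"metadata": {}, "weight_map": {"w": shard_value}}, f)
        return path

    def test_rejects_absolute_path(self):
        with tempfile.TemporaryDirectory() as d:
            outside = os.path.abspath(os.path.join(d, "attacker.safetensors"))
            with self.assertRaises(ValueError):
                shard_names_from_index(self._write_index(d, outside))

    def test_rejects_parent_traversal(self):
        with tempfile.TemporaryDirectory() as d:
            with self.assertRaises(ValueError):
                shard_names_from_index(self._write_index(d, "../attacker.safetensors"))

    def test_accepts_benign_basename(self):
        with tempfile.TemporaryDirectory() as d:
            index = self._write_index(d, "model-00001-of-00001.safetensors")
            self.assertEqual(
                shard_names_from_index(index), ["model-00001-of-00001.safetensors"]
            )
```

Saved as a module, this can be run with `python -m unittest <module name>`.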

Test commands run

PYTHONPATH=src TRANSFORMERS_OFFLINE=1 python -m unittest \
  tests.utils.test_hub_utils.GetCheckpointShardFilesSecurityTests -v

Result:

test_accepts_benign_relative_basename ... ok
test_rejects_absolute_path_in_weight_map ... ok
test_rejects_parent_traversal_in_weight_map ... ok

----------------------------------------------------------------------
Ran 3 tests in 0.002s

OK

PoC verification before/after the patch:

  • Before: get_checkpoint_shard_files(...) returns ['/tmp/attacker_dropper.safetensors'], safe_open on the returned path yields a tensor with mean = 0.314150.
  • After: get_checkpoint_shard_files(...) raises ValueError: Invalid shard filename in checkpoint index ....

Coordination / duplicate-work check

  • gh pr list --repo huggingface/transformers --state open --search "weight_map path traversal" → no open PRs.
  • gh pr list --repo huggingface/transformers --state open --search "checkpoint shard basename" → no open PRs.
  • No tracking issue found for this exact code path.

No public issue or published security advisory was found for this vulnerability in the repo. This PR is opened publicly at the contributor's explicit choice; happy to also file a private GitHub Security Advisory and squash this into a follow-up if maintainers prefer to handle disclosure that way.

AI-assistance disclosure

Per the repository's AGENTS.md agentic contribution policy:

  • This patch, the regression tests, and the PoC verification were produced with the help of an AI coding agent.
  • The human submitter (Snakinya) reviewed every changed line, ran the included regression tests locally, and ran the PoC end-to-end against the latest main to confirm both the vulnerability and the fix.
  • The change is small (18 lines in hub.py, 51 in tests), fully visible in the diff, and contains no model code, no copy/sync machinery, and no doc changes that would trip the modular/copy lints.

Backwards compatibility

The only behavioral change is that ValueError is raised when an index.json ships a weight_map value that is not a relative basename. Legitimate sharded checkpoints produced by save_pretrained always use names like model-00001-of-00002.safetensors (verified against save_pretrained's shard-naming code), so no normal usage is affected. The added test test_accepts_benign_relative_basename locks this in.

归青 and others added 2 commits June 25, 2026 16:34
The `weight_map` values parsed from `model.safetensors.index.json` /
`pytorch_model.bin.index.json` are attacker-controlled when loading an
untrusted Hub repo. Previously these values were passed verbatim to
`os.path.join(repo, subfolder, f)`, so a crafted index pointing an entry
at an absolute path or a `..`-bearing value would resolve the shard
OUTSIDE the checkpoint directory. The escaped path is then opened by
safe_open (.safetensors) or torch.load (.bin) and loaded into the
model as weights.

This bypasses the loader's safe-loading contract: a malicious repo can
silently substitute its declared weights with the contents of any
arbitrary local .safetensors/.bin file (including one already cached
on disk from a different, separately-staged attacker repo), with no
trust_remote_code=True, no weights_only=False, and no opt-out of the
safetensors-first cascade. The substituted weights load cleanly with no
error and no missing-key report, so a victim running
from_pretrained(<malicious repo>) ends up with attacker-controlled
model behavior (silent model backdoor / weight injection).

Reject any shard filename that is not a single relative basename: no
path separators, no absolute path, no '..'. Add regression tests that
exercise the absolute-path and '..' rejection plus a benign baseline.

AI-assisted patch: the patch, tests, and PoC verification were produced
with the help of an AI coding agent; the human submitter reviewed every
changed line and ran the included regression tests locally.

Co-authored-by: Snakinya <Snakinya@users.noreply.github.com>
@github-actions

CI Dashboard: View test results in Grafana
