Validate shard filenames in checkpoint index to prevent path traversal (silent weight injection)#46913
Open
Snakinya wants to merge 2 commits into
Open
Validate shard filenames in checkpoint index to prevent path traversal (silent weight injection)#46913Snakinya wants to merge 2 commits into
Snakinya wants to merge 2 commits into
Conversation
The `weight_map` values parsed from `model.safetensors.index.json` / `pytorch_model.bin.index.json` are attacker-controlled when loading an untrusted Hub repo. Previously these values were passed verbatim to `os.path.join(repo, subfolder, f)`, so a crafted index pointing an entry at an absolute path or a `..`-bearing value would resolve the shard OUTSIDE the checkpoint directory. The escaped path is then opened by safe_open (.safetensors) or torch.load (.bin) and loaded into the model as weights. This bypasses the loader's safe-loading contract: a malicious repo can silently substitute its declared weights with the contents of any arbitrary local .safetensors/.bin file (including one already cached on disk from a different, separately-staged attacker repo), with no trust_remote_code=True, no weights_only=False, and no opt-out of the safetensors-first cascade. The substituted weights load cleanly with no error and no missing-key report, so a victim running from_pretrained(<malicious repo>) ends up with attacker-controlled model behavior (silent model backdoor / weight injection). Reject any shard filename that is not a single relative basename: no path separators, no absolute path, no '..'. Add regression tests that exercise the absolute-path and '..' rejection plus a benign baseline. AI-assisted patch: the patch, tests, and PoC verification were produced with the help of an AI coding agent; the human submitter reviewed every changed line and ran the included regression tests locally. Co-authored-by: Snakinya <Snakinya@users.noreply.github.com>
Contributor
|
CI Dashboard: View test results in Grafana |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
get_checkpoint_shard_filesinsrc/transformers/utils/hub.pyreads theweight_mapofmodel.safetensors.index.json/pytorch_model.bin.index.jsonand feedsweight_map.values()straight intoos.path.join(repo, subfolder, f)with no validation. When loading an untrusted Hub repo, those values are attacker-controlled, and an absolute path or a..-bearing value resolves the shard outside the checkpoint directory. The escaped path is then opened bysafe_open(.safetensors) ortorch.load(.bin) and loaded into the model as weights.This patch rejects any shard filename that is not a single relative basename (no path separators, no absolute path, no
..), and adds regression tests.Why this matters / impact
Defaults don't help. The bypass is reachable under the full safe-loading default stack:
use_safetensors=True(default) — the shard the attacker redirects to is itself a legal.safetensorsfile, so the safetensors-first cascade is happy.weights_only=True(torch 2.6+ default) —safe_opendoesn't go through pickle at all, so this gate never sees the attacker file.trust_remote_code=False(default) — the bug is on the weight-loading path, not the remote-code path; no opt-in is required.End-to-end PoC, run against the latest
mainat the time of writing (9b6af5d, 17 minutes old):The attacker-controlled value loads into the model's
word_embeddingscleanly — no error, noMISSINGentry in_missing_keys, no code execution. The victim ends up running a model whose behavior is silently controlled by an attacker-staged weights file (silent model backdoor / weight injection).Two practical attack shapes:
*.safetensorsfile path the attacker can stage on the victim host (e.g. another HF-cache file from a separately-staged attacker repo the victim was nudged to download earlier). The victim'sfrom_pretrained(<malicious repo>)loads attacker weights without any code execution.index.jsonwhoseweight_mappoints at A's cached shard. User B loading "harmless" repo B silently ends up running attacker A's weights.Severity rationale: bypass of the three default safe-loading gates, silent (no error), and persistent (model behavior tampering) — without any code execution primitive.
What this PR changes
src/transformers/utils/hub.py— after collectingshard_filenames, reject any filename that is"",".","..", an absolute path, or is not equal to its ownos.path.basename. The error message names the offending value and the index file. The check runs before either the local-folder branch (os.path.join(pretrained_model_name_or_path, subfolder, f)) or the Hub branch (cached_files(...)— which sends the raw filenames tohf_hub_download/snapshot_download) ever sees the value.tests/utils/test_hub_utils.py— newGetCheckpointShardFilesSecurityTestsclass:test_rejects_absolute_path_in_weight_maptest_rejects_parent_traversal_in_weight_maptest_accepts_benign_relative_basename(sanity check:model-00001-of-00001.safetensorsstill loads)Test commands run
Result:
PoC verification before/after the patch:
get_checkpoint_shard_files(...)returns['/tmp/attacker_dropper.safetensors'],safe_openon the returned path yields a tensor withmean = 0.314150.get_checkpoint_shard_files(...)raisesValueError: Invalid shard filename in checkpoint index ....Coordination / duplicate-work check
gh pr list --repo huggingface/transformers --state open --search "weight_map path traversal"→ no open PRs.gh pr list --repo huggingface/transformers --state open --search "checkpoint shard basename"→ no open PRs.No public issue exists yet for this vulnerability (no security advisory was found in the repo's published advisories either). Opening this public PR per the contributor's explicit choice; happy to also file a private GitHub Security Advisory if maintainers prefer to handle disclosure that way and squash this into a follow-up.
AI-assistance disclosure
Per the repository's
AGENTS.mdagentic contribution policy:mainto confirm both the vulnerability and the fix.hub.py, 51 in tests), fully visible in the diff, and contains no model code, no copy/sync machinery, and no doc changes that would trip the modular/copy lints.Backwards compatibility
The only behavioral change is that
ValueErroris raised when anindex.jsonships aweight_mapvalue that is not a relative basename. Legitimate sharded checkpoints produced bysave_pretrainedalways use names likemodel-00001-of-00002.safetensors(verified againstsave_pretrained's shard-naming code), so no normal usage is affected. The added testtest_accepts_benign_relative_basenamelocks this in.