convert-sha256: one-off SHA1 → SHA256 repo conversion #66
Open
nodo wants to merge 6 commits into
Entire-Checkpoint: 198bc99d4b0b
Entire-Checkpoint: dc4592acaadf
Entire-Checkpoint: fb6dc8f73ee0
Entire-Checkpoint: 80a4babf00ee
This allows "signing" the conversion. Entire-Checkpoint: 1857d92ce8c7
golangci-lint v2.11.4 was unhappy across errcheck, exhaustive, gocritic, inamedparam, ireturn, maintidx, noctx, perfsprint, revive, and wrapcheck. Mostly mechanical:
- Test helpers (initSHA1, initSHA256, mustTranslator) collapse the '_ :=' patterns into t.Fatalf-on-error wrappers.
- exec.Command -> exec.CommandContext(ctx, ...); ctx threaded into runChecks and signBranchTips.
- refs.ForEach return value checked instead of discarded.
- if/else chain in runChecks rewritten as switch.
- ireturn for openSource/normalizeAuth annotated, since both return shared transport interfaces by design.
- maintidx on Run annotated; the function is a phase orchestrator, splitting it would obscure the pipeline.
- exhaustive switches on plumbing.ObjectType annotated; the unhandled cases (OFSDelta/REFDelta/AnyObject/InvalidObject) can't reach a resolved storer.
- Errors from io.ReadAll/MemoryObject.Reader/bufio.Flush/fmt.Fprintln/auth.Method.Authorizer wrapped with fmt.Errorf.
- Constant 'max' renamed to package-level 'previewMax' to stop shadowing the builtin in two places.
- Named parameter added to interface methods that lint flagged.
- Two fmt.Sprintf calls replaced with string concatenation.
Entire-Checkpoint: 43bf90985cc5
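As an illustration of the exec.CommandContext and error-wrapping items above, here is a minimal sketch of how a context-aware `git tag -s` call could be built. It is not the code from this PR: signTagArgs and signBranchTip are hypothetical names, the tag message is invented, and passing the key via `-c user.signingkey=` is an assumption about how a key override might be wired.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
)

// signTagArgs builds the argument list for a signed annotated tag named
// converted/<branch> that points at tip. Hypothetical helper: the tag
// message and the -c override are assumptions, not taken from the PR.
func signTagArgs(repoDir, branch, tip, signKey string) []string {
	args := []string{"-C", repoDir}
	if signKey != "" {
		// Override the configured signing key for this invocation only.
		args = append(args, "-c", "user.signingkey="+signKey)
	}
	return append(args, "tag", "-s", "-m", "SHA256 conversion of "+branch, "converted/"+branch, tip)
}

// signBranchTip runs git under ctx so a cancelled context stops the
// subprocess, and wraps any failure with the captured output.
func signBranchTip(ctx context.Context, repoDir, branch, tip, signKey string) error {
	cmd := exec.CommandContext(ctx, "git", signTagArgs(repoDir, branch, tip, signKey)...)
	if out, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("git tag -s converted/%s: %w: %s", branch, err, out)
	}
	return nil
}

func main() {
	// Print the arguments only; nothing is executed here.
	fmt.Println(signTagArgs("/path/to/target.git", "main", "<tip>", ""))
}
```

Keeping argument construction separate from execution lets the argument list be unit-tested without a git binary or a signing key.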
bugbot run
✅ Bugbot reviewed your changes and found no new issues!
Reviewed by Cursor Bugbot for commit 2b36d81.
Summary
New git-sync convert-sha256 subcommand. One-off conversion: fetches a pack over smart HTTP from a SHA1 source, walks every reachable object, and writes a fresh SHA256 bare repository where every tree/commit/tag reference is re-encoded under SHA256. All branches and tags are always converted (partial scope risks stranding cross-branch references in messages). Behavior, sharp edges, and operational characteristics are documented in docs/convert-sha256.md.
Design: two-pass topological DFS
The translation needs an authoritative "what's in scope" view before encoding starts, so that abbreviated SHA1 prefixes in commit messages get the same uniqueness verdict regardless of how far translation has progressed, and so that cross-branch message references (cherry-picks, reverts, etc.) can be added as dependency edges in the DFS.
The discovery pass walks tree/parent links and tag targets. It produces a reachable map[SHA1]ObjectType covering the full in-scope set. Submodule gitlinks pointing outside the repo raise an error here, before the target bare repo is materialized; failing fast keeps half-converted state off disk.
Cycles in the combined dependency graph are cryptographically infeasible (forming a cycle via message references would require knowing each hash before computing it), so the topological order is always valid. A defensive inProgress guard turns unexpected graph shapes into a clear error rather than a stack overflow.
Loose objects are written by hand: go-git@v6.0.0-alpha.3's objfile.Writer hardcodes SHA1 in its hasher, so the standard SetEncodedObject path would place every translated object at a SHA1-derived filename even on a SHA256 store. A unit test recomputes sha256 of every loose object's decompressed content and compares it against the filename to prevent regression.
Signing (--sign)
After conversion, --sign shells out to git tag -s converted/<branch> <tip> for every converted branch. Each resulting signed annotated tag transitively attests the entire reachable history of that branch via the parent chain: tampering with any historical commit changes its hash and the hash of every descendant, which changes the tip's hash, which changes the tag's object line, which invalidates the signature.
The signature is by the converter, not by the original authors; the originals' signatures are necessarily lost (they signed over pre-rewrite bytes). For internal mirrors and single-identity repos that's a strict improvement over unsigned-everywhere; for broad public repos it's weaker than the pre-conversion chain. SSH signing (gpg.format = ssh) and OpenPGP both work: we delegate to git, so whatever the user's git config picks is honored. --sign-key <id> overrides the default key.
Example output
--sign --check against entireio/cli (283 refs, 6,950 commits, 55,289 objects):
Inspecting one of the signed tags:
git verify-tag refs/tags/converted/main (against the user's allowed-signers file) then verifies the entire history.
Other flags
- --all-refs / --exclude-ref-prefix: extend scope to notes/pulls/etc., or subtract
- --write-mapping <path>: emit SHA1 → SHA256 TSV for rewriting external systems (PR descriptions, deploy manifests, etc.)
- --no-rewrite-messages: skip inline rewriting of hash references in commit/tag messages
- --no-origin-notes: skip refs/notes/sha1-origin (per-commit reverse lookup of original SHA1)
- --check: post-conversion sanity steps (config, HEAD resolves, refs resolve, git fsck --full)
- --progress: live per-phase object counts (TTY only)
- --sign-key <id>: override user.signingkey for --sign
Full reference in docs/convert-sha256.md.
Test plan
- go test ./... passes
- go test -race ./cmd/git-sync/internal/sha256convert/ clean (atomic counters used so the --progress ticker doesn't race against translation)
- Unit tests: sha256(content) invariant for every loose object, message rewriting (ancestor + cross-branch + ambiguous + skip), origin notes, mapping file, submodule fail-fast in discovery
- Integration test against git-http-backend (gated on GITSYNC_E2E_SHA256_HTTP_BACKEND=1): exercises pack fetch + discovery + translation + rewriting + notes + mapping + --check end-to-end, asserts git fsck clean
- Signing test: uses ssh-keygen, points GIT_CONFIG_GLOBAL at a generated gitconfig with SSH signing, runs convert-sha256 --sign, asserts the resulting refs/tags/converted/main is a signed annotated tag targeting the converted branch tip
- Runs against spf13/cobra (37 refs, ~4,500 objects) and entireio/cli (283 refs, ~55,000 objects): git fsck --full clean; git verify-tag refs/tags/converted/main passes against an SSH allowed-signers file
🤖 Generated with Claude Code
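The sha256(content) invariant from the test plan can be sketched with the standard library alone. This is an illustrative sketch, not the PR's test: encodeLooseObject and verifyLooseObject are hypothetical names, and the only format assumed is git's loose-object layout, where the zlib stream decompresses to `<type> <size>\0<content>` and the object ID is the hash of those bytes.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
)

// encodeLooseObject returns the SHA256 object ID and the zlib-compressed
// loose-object bytes for the given type and content. Hypothetical helper.
func encodeLooseObject(objType string, content []byte) (string, []byte, error) {
	raw := append([]byte(fmt.Sprintf("%s %d\x00", objType, len(content))), content...)
	sum := sha256.Sum256(raw)
	var buf bytes.Buffer
	zw := zlib.NewWriter(&buf)
	if _, err := zw.Write(raw); err != nil {
		return "", nil, fmt.Errorf("compress object: %w", err)
	}
	if err := zw.Close(); err != nil {
		return "", nil, fmt.Errorf("flush object: %w", err)
	}
	return hex.EncodeToString(sum[:]), buf.Bytes(), nil
}

// verifyLooseObject decompresses a loose object and checks that the SHA256
// of the decompressed bytes equals the ID implied by its filename.
func verifyLooseObject(wantID string, compressed []byte) error {
	zr, err := zlib.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return fmt.Errorf("open zlib stream: %w", err)
	}
	defer zr.Close()
	raw, err := io.ReadAll(zr)
	if err != nil {
		return fmt.Errorf("decompress object: %w", err)
	}
	sum := sha256.Sum256(raw)
	if got := hex.EncodeToString(sum[:]); got != wantID {
		return fmt.Errorf("object ID mismatch: filename %s, content hashes to %s", wantID, got)
	}
	return nil
}

func main() {
	id, data, err := encodeLooseObject("blob", []byte("hello\n"))
	if err != nil {
		panic(err)
	}
	// On disk the object would live at objects/<id[:2]>/<id[2:]>.
	fmt.Println(len(id), verifyLooseObject(id, data) == nil)
}
```

A check of this shape would flag an object stored under a SHA1-derived filename, because the recomputed SHA256 would not match the name.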
Note
Medium Risk
Large new conversion pipeline touches object integrity, ref rewriting, and optional signing; mistakes could produce unusable repos, though scope is isolated to the new command and is heavily tested.
Overview
Adds git-sync convert-sha256, a one-off migration path that pulls a SHA1 repo over smart HTTP and rewrites every reachable object and ref into a new SHA256 bare repo on disk (all branches/tags by default; optional --all-refs / excludes).
The new sha256convert package runs discovery then memoized translation: fails fast on external submodule gitlinks before writing the target; strips invalid GPG/mergetag data; rewrites in-scope commit/tag message hashes (including cross-branch refs via DFS); writes SHA256 loose objects manually (workaround for go-git's SHA1 object writer). Optional outputs: refs/notes/sha1-origin, --write-mapping TSV, --sign branch attestation tags, --check (config, HEAD, refs, git fsck).
Wires the command into the CLI root, documents behavior in docs/convert-sha256.md, updates README, ignores redactors/local/, and adds broad unit tests plus gated git-http-backend integration tests.
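The memoized, dependency-ordered translation with an in-progress guard can be sketched as follows. This is a simplified illustration under stated assumptions, not the sha256convert implementation: the translator type, its fields, and the string IDs are hypothetical, and the "new-" prefix stands in for the real re-encoding step.

```go
package main

import (
	"errors"
	"fmt"
)

// errCycle reports a dependency cycle, which real hash-linked objects
// should never produce. Hypothetical sentinel.
var errCycle = errors.New("dependency cycle")

// translator is a hypothetical stand-in for the converter's state.
type translator struct {
	deps       map[string][]string // old ID -> IDs it references (parents, trees, message refs)
	done       map[string]string   // memo: old ID -> new ID
	inProgress map[string]bool     // IDs currently on the DFS stack
	order      []string            // order in which objects were translated
}

func newTranslator(deps map[string][]string) *translator {
	return &translator{deps: deps, done: map[string]string{}, inProgress: map[string]bool{}}
}

// translate converts id after all of its dependencies, so every reference
// it embeds already has a new ID. Recursion is used for brevity.
func (t *translator) translate(id string) (string, error) {
	if newID, ok := t.done[id]; ok {
		return newID, nil // already translated; reuse the memoized result
	}
	if t.inProgress[id] {
		return "", fmt.Errorf("%w at %s", errCycle, id)
	}
	t.inProgress[id] = true
	defer delete(t.inProgress, id)
	for _, dep := range t.deps[id] {
		if _, err := t.translate(dep); err != nil {
			return "", err
		}
	}
	newID := "new-" + id // placeholder for re-encoding under SHA256
	t.done[id] = newID
	t.order = append(t.order, id)
	return newID, nil
}

func main() {
	tr := newTranslator(map[string][]string{"c2": {"c1", "t2"}, "c1": {"t1"}})
	if _, err := tr.translate("c2"); err != nil {
		panic(err)
	}
	fmt.Println(tr.order)
}
```

The guard turns an impossible-in-practice cycle into an ordinary error value, while the memo ensures each object is encoded once even when several branches or message references reach it.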