convert-sha256: one-off SHA1 → SHA256 repo conversion #66

Open
nodo wants to merge 6 commits into main from nodo/convert-sha256

nodo commented May 25, 2026

Summary

New git-sync convert-sha256 subcommand. One-off conversion: fetches a pack over smart HTTP from a SHA1 source, walks every reachable object, and writes a fresh SHA256 bare repository where every tree/commit/tag reference is re-encoded under SHA256. All branches and tags are always converted (partial scope risks stranding cross-branch references in messages). Behavior, sharp edges, and operational characteristics documented in docs/convert-sha256.md.

Design: two-pass topological DFS

The translation needs an authoritative "what's in scope" view before encoding starts, so abbreviated SHA1 prefixes in commit messages get the same uniqueness verdict regardless of how far translation has progressed, and so cross-branch message references (cherry-picks, reverts, etc.) can be added as dependency edges in the DFS:

  1. Discovery: a non-encoding DFS from each desired ref tip walks every reachable object via tree entries, commit tree/parent links, and tag targets. Produces a reachable map[SHA1]ObjectType covering the full in-scope set. Submodule gitlinks pointing outside the repo error here, before the target bare repo is materialized — failing fast keeps half-converted state off disk.
  2. Translation: memoized recursive DFS in topological order. Tree, parent, tag-target, and message-reference edges are all part of the DFS, so the SHA1 → SHA256 mapping contains every dependency by the time any object's bytes are encoded. Single forward sweep — no cascade re-encoding, no per-commit dirty bookkeeping.
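
The discovery pass can be sketched on a toy in-memory object store. This is a minimal illustration, not the code in the sha256convert package: `Object`, `discover`, and `resolvePrefix` are hypothetical names, IDs are shortened strings, and a missing edge target stands in for a submodule gitlink that points outside the repo. It shows why the full reachable set is built first: prefix uniqueness is decided against a fixed set, so the verdict does not depend on translation progress.

```go
package main

import (
	"fmt"
	"strings"
)

// ObjectType is a simplified stand-in for git's object kinds.
type ObjectType int

const (
	Blob ObjectType = iota
	Tree
	Commit
	Tag
)

// Object is a hypothetical in-memory object: its type plus outgoing edges
// (tree entries, commit tree/parent links, or a tag target).
type Object struct {
	Type  ObjectType
	Edges []string
}

// discover walks every object reachable from tips without encoding anything.
// An edge whose target is absent from the store is an error, which is how an
// out-of-repo gitlink would surface in this toy model.
func discover(store map[string]Object, tips []string) (map[string]ObjectType, error) {
	reachable := make(map[string]ObjectType)
	stack := append([]string(nil), tips...)
	for len(stack) > 0 {
		id := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if _, seen := reachable[id]; seen {
			continue
		}
		obj, ok := store[id]
		if !ok {
			return nil, fmt.Errorf("object %s is not in the source repository", id)
		}
		reachable[id] = obj.Type
		stack = append(stack, obj.Edges...)
	}
	return reachable, nil
}

// resolvePrefix returns the single in-scope ID that starts with prefix.
// It reports false when the prefix matches no object or more than one.
func resolvePrefix(reachable map[string]ObjectType, prefix string) (string, bool) {
	match := ""
	for id := range reachable {
		if strings.HasPrefix(id, prefix) {
			if match != "" {
				return "", false // ambiguous prefix: leave the reference alone
			}
			match = id
		}
	}
	return match, match != ""
}

func main() {
	store := map[string]Object{
		"c2aa": {Commit, []string{"t1bb", "c1cc"}},
		"c1cc": {Commit, []string{"t1bb"}},
		"t1bb": {Tree, []string{"b1dd"}},
		"b1dd": {Blob, nil},
	}
	reachable, err := discover(store, []string{"c2aa"})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(reachable))
	fmt.Println(resolvePrefix(reachable, "c1"))
	fmt.Println(resolvePrefix(reachable, "c"))
}
```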

Cycles in the combined dependency graph are cryptographically infeasible (forming a cycle via message references would require knowing each hash before computing it), so the topological order is always valid. A defensive inProgress guard turns unexpected graph shapes into a clear error rather than a stack overflow.
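
A minimal sketch of the memoized translation with an in-progress guard, assuming a toy graph where each node has a payload and a list of dependency IDs. The `translator` type and its fields are illustrative names rather than the PR's actual types, and the new ID here is simply a SHA-256 over the payload and the translated dependency IDs, not a real git object encoding.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// node is a hypothetical object: a payload plus the IDs it depends on
// (tree, parents, tag target, message references).
type node struct {
	payload string
	deps    []string
}

type translator struct {
	src        map[string]node
	mapping    map[string]string // old ID -> new ID, doubles as the memo table
	inProgress map[string]bool   // IDs currently on the DFS stack
}

func newTranslator(src map[string]node) *translator {
	return &translator{
		src:        src,
		mapping:    make(map[string]string),
		inProgress: make(map[string]bool),
	}
}

// translate resolves every dependency before encoding id, so each object is
// encoded exactly once and only after its dependencies have new IDs.
func (t *translator) translate(id string) (string, error) {
	if newID, done := t.mapping[id]; done {
		return newID, nil
	}
	if t.inProgress[id] {
		// Should be unreachable for real hashes; fail clearly instead of
		// recursing until the stack overflows.
		return "", fmt.Errorf("dependency cycle detected at %s", id)
	}
	n, ok := t.src[id]
	if !ok {
		return "", fmt.Errorf("unknown object %s", id)
	}
	t.inProgress[id] = true
	defer delete(t.inProgress, id)

	parts := []string{n.payload}
	for _, dep := range n.deps {
		newDep, err := t.translate(dep)
		if err != nil {
			return "", err
		}
		parts = append(parts, newDep)
	}
	sum := sha256.Sum256([]byte(strings.Join(parts, "\n")))
	newID := hex.EncodeToString(sum[:])
	t.mapping[id] = newID
	return newID, nil
}

func main() {
	tr := newTranslator(map[string]node{
		"commit2": {"second", []string{"tree1", "commit1"}},
		"commit1": {"first", []string{"tree1"}},
		"tree1":   {"files", nil},
	})
	newID, err := tr.translate("commit2")
	fmt.Println(newID, err, len(tr.mapping))
}
```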

Loose objects are written by hand — go-git@v6.0.0-alpha.3's objfile.Writer hardcodes SHA1 in its hasher, so the standard SetEncodedObject path would place every translated object at a SHA1-derived filename even on a SHA256 store. A unit test recomputes sha256 of every loose object's decompressed content and compares against the filename to prevent regression.
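
For illustration, a hand-written loose object under SHA256 could look roughly like the sketch below. It uses only the standard library and follows git's loose-object layout (zlib-compressed `<type> <size>\0<content>`, stored under the first two hex digits of the hash). `writeLooseSHA256` and `verifyLoose` are hypothetical names, and the sketch skips the temp-file-and-rename step a production writer would want. `verifyLoose` mirrors the invariant the unit test checks.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// writeLooseSHA256 stores content as a loose object named by the SHA-256 of
// "<type> <size>\x00<content>" and returns the hex object ID.
func writeLooseSHA256(objectsDir, objType string, content []byte) (string, error) {
	header := fmt.Sprintf("%s %d\x00", objType, len(content))
	h := sha256.New()
	h.Write([]byte(header))
	h.Write(content)
	id := hex.EncodeToString(h.Sum(nil))

	path := filepath.Join(objectsDir, id[:2], id[2:])
	if _, err := os.Stat(path); err == nil {
		return id, nil // already present; objects are content-addressed
	}

	var buf bytes.Buffer
	zw := zlib.NewWriter(&buf)
	if _, err := zw.Write(append([]byte(header), content...)); err != nil {
		return "", fmt.Errorf("compress object: %w", err)
	}
	if err := zw.Close(); err != nil {
		return "", fmt.Errorf("finish compression: %w", err)
	}
	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
		return "", fmt.Errorf("create fan-out dir: %w", err)
	}
	if err := os.WriteFile(path, buf.Bytes(), 0o444); err != nil {
		return "", fmt.Errorf("write object: %w", err)
	}
	return id, nil
}

// verifyLoose recomputes sha256 over the decompressed bytes and compares the
// result with the ID implied by the file's location.
func verifyLoose(objectsDir, id string) (bool, error) {
	f, err := os.Open(filepath.Join(objectsDir, id[:2], id[2:]))
	if err != nil {
		return false, err
	}
	defer f.Close()
	zr, err := zlib.NewReader(f)
	if err != nil {
		return false, err
	}
	defer zr.Close()
	h := sha256.New()
	if _, err := io.Copy(h, zr); err != nil {
		return false, err
	}
	return hex.EncodeToString(h.Sum(nil)) == id, nil
}

func main() {
	dir, err := os.MkdirTemp("", "objects")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	id, err := writeLooseSHA256(dir, "blob", []byte("hello\n"))
	if err != nil {
		panic(err)
	}
	ok, err := verifyLoose(dir, id)
	fmt.Println(id, ok, err)
}
```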

Signing (--sign)

After conversion, --sign shells out to git tag -s converted/<branch> <tip> for every converted branch. Each resulting signed annotated tag transitively attests the entire reachable history of that branch via the parent chain: tampering with any historical commit changes the hash of every descendant, which changes the tip's hash, which changes the tag's object line, which invalidates the signature.

The signature is by the converter, not by the original authors — the originals' signatures are necessarily lost (they signed over pre-rewrite bytes). For internal mirrors and single-identity repos that's a strict improvement over unsigned-everywhere; for broad public repos it's weaker than the pre-conversion chain. SSH signing (gpg.format = ssh) and OpenPGP both work — we delegate to git, so whatever the user's git config picks is honored. --sign-key <id> overrides the default key.
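
A rough sketch of the shell-out, assuming the key override is passed through git's `-u <key-id>` option. The PR does not say how --sign-key is wired, so that mapping is an assumption, as are the helper names `signTagArgs` and `signBranchTip`. The tag message follows the example output in this description.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
)

// signTagArgs builds the argument list for `git tag`. With an empty signKey
// it uses -s, so git picks the key and format from the user's configuration;
// otherwise it uses -u <key-id> (assumed mapping for --sign-key).
func signTagArgs(repoDir, branch, tip, signKey string) []string {
	args := []string{"-C", repoDir, "tag"}
	if signKey != "" {
		args = append(args, "-u", signKey)
	} else {
		args = append(args, "-s")
	}
	msg := "SHA1 → SHA256 conversion attestation for refs/heads/" + branch + "."
	return append(args, "-m", msg, "converted/"+branch, tip)
}

// signBranchTip runs git and wraps any failure with git's combined output.
func signBranchTip(ctx context.Context, repoDir, branch, tip, signKey string) error {
	cmd := exec.CommandContext(ctx, "git", signTagArgs(repoDir, branch, tip, signKey)...)
	if out, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("git tag for %s: %w: %s", branch, err, out)
	}
	return nil
}

func main() {
	// Print the command line only; running it needs a repo and a signing key.
	fmt.Println(signTagArgs("/path/to/out.git", "main", "<tip>", ""))
}
```

Delegating to git keeps the converter out of key handling entirely: SSH and OpenPGP signing both go through the same call.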

Example output

--sign --check against entireio/cli (283 refs, 6,950 commits, 55,289 objects):

$ OUT=$(mktemp -d) && ./git-sync convert-sha256 --sign --check https://github.com/entireio/cli.git "$OUT"
fetching 283 ref(s) from https://github.com/entireio/cli.git ...
discovering reachable objects ...
translating objects to sha256 ...
signing refs/tags/converted/1072-intermittent-stop-hook-timeout-... ...
signing refs/tags/converted/20260116-git-hook-absolute-path ...
signing refs/tags/converted/20260119-do-not-pin-go ...
... (one line per branch tip)
verifying output ...
  ✓ config: extensions.objectformat = sha256
  ✓ HEAD: 11a78c5d7ddc121d551d9bbf5e8fe7de2b329da77334cde2f2131ec44b1ef83d
  ✓ refs: 283 / 283 resolve to objects
  ✓ git fsck --full: clean
sha256 bare repo: /var/folders/.../tmp.S5PP3dNay8
source: https://github.com/entireio/cli.git (v2)
converted: 22989 blobs, 25350 trees, 6950 commits, 0 tags
refs written: 283
warning: stripped 4042 GPG signature(s); they no longer match the rewritten object content
rewrote 48 SHA1 hash reference(s) in commit/tag messages
origin notes ref: refs/notes/sha1-origin (use `git notes --ref=sha1-origin show <sha256>` to recover old SHA1)
signed 213 branch attestation tag(s): refs/tags/converted/1072-..., refs/tags/converted/20260116-..., refs/tags/converted/20260119-..., refs/tags/converted/20260128-..., refs/tags/converted/20260206-..., ... (208 more; full list in --json)

Inspecting one of the signed tags:

$ git -C "$OUT" cat-file -p refs/tags/converted/main
object 11a78c5d7ddc121d551d9bbf5e8fe7de2b329da77334cde2f2131ec44b1ef83d
type commit
tag converted/main
tagger Andrea Nodari <and.nodari@gmail.com> 1779698132 +0200

SHA1 → SHA256 conversion attestation for refs/heads/main.

Source: https://github.com/entireio/cli.git
Produced by git-sync convert-sha256.
-----BEGIN SSH SIGNATURE-----
U1NIU0lHAAAAAQAAADMAAAALc3NoLWVkMjU1MTkAAAAg7xes3P5ZXspRJNdboZZaUN+GZK
PowQqx6UX8bp0evB4AAAADZ2l0AAAAAAAAAAZzaGE1MTIAAABTAAAAC3NzaC1lZDI1NTE5
AAAAQLn2hdEdYiTF5MFug8L+mVD6fU/ZDsZF9nm6a6MHqIrGchMHxnkeN6kLPG4UdAKyxD
uk/lW8VIW/iErg9K46vA8=
-----END SSH SIGNATURE-----

git verify-tag refs/tags/converted/main (against the user's allowed-signers file) then checks the tag's signature, which transitively covers the entire history reachable from main.

Other flags

  • --all-refs / --exclude-ref-prefix — extend scope to notes/pulls/etc., or subtract
  • --write-mapping <path> — emit SHA1 → SHA256 TSV for rewriting external systems (PR descriptions, deploy manifests, etc.)
  • --no-rewrite-messages — skip inline rewriting of hash references in commit/tag messages
  • --no-origin-notes — skip refs/notes/sha1-origin (per-commit reverse lookup of original SHA1)
  • --check — post-conversion sanity steps (config, HEAD resolves, refs resolve, git fsck --full)
  • --progress — live per-phase object counts (TTY only)
  • --sign-key <id> — override user.signingkey for --sign

Full reference in docs/convert-sha256.md.
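
As an illustration of consuming the --write-mapping output in an external system, the sketch below rewrites full SHA1 references in arbitrary text. The column layout is an assumption: one `<sha1>\t<sha256>` pair per line with no header. Check docs/convert-sha256.md for the real format. `parseMapping` and `rewriteText` are hypothetical helpers, and only full 40-character hashes are rewritten.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Demo IDs; real values would come from the mapping file.
var (
	demoSHA1   = strings.Repeat("a", 40)
	demoSHA256 = strings.Repeat("b", 64)
)

// parseMapping reads "<sha1>\t<sha256>" lines (assumed layout) into a map.
func parseMapping(data []byte) (map[string]string, error) {
	mapping := make(map[string]string)
	for _, line := range strings.Split(string(data), "\n") {
		line = strings.TrimRight(line, "\r")
		if line == "" {
			continue
		}
		oldID, newID, ok := strings.Cut(line, "\t")
		if !ok || len(oldID) != 40 || len(newID) != 64 {
			return nil, fmt.Errorf("malformed mapping line %q", line)
		}
		mapping[oldID] = newID
	}
	return mapping, nil
}

// fullSHA1 matches a standalone 40-character lowercase hex string.
var fullSHA1 = regexp.MustCompile(`\b[0-9a-f]{40}\b`)

// rewriteText swaps every known SHA1 for its SHA256 and leaves unknown
// hashes untouched.
func rewriteText(text string, mapping map[string]string) string {
	return fullSHA1.ReplaceAllStringFunc(text, func(hash string) string {
		if newID, ok := mapping[hash]; ok {
			return newID
		}
		return hash
	})
}

func main() {
	mapping, err := parseMapping([]byte(demoSHA1 + "\t" + demoSHA256 + "\n"))
	if err != nil {
		panic(err)
	}
	fmt.Println(rewriteText("deployed "+demoSHA1+" to prod", mapping))
}
```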

Test plan

  • go test ./... passes
  • go test -race ./cmd/git-sync/internal/sha256convert/ clean (atomic counters used so the --progress ticker doesn't race against translation)
  • Unit tests cover the translator end-to-end: blobs/trees/commits/tags including signed commit + signed annotated tag, idempotency, on-disk sha256(content) invariant for every loose object, message rewriting (ancestor + cross-branch + ambiguous + skip), origin notes, mapping file, submodule fail-fast in discovery
  • Integration test against local git-http-backend (gated on GITSYNC_E2E_SHA256_HTTP_BACKEND=1): exercises pack fetch + discovery + translation + rewriting + notes + mapping + --check end-to-end, asserts git fsck clean
  • Signing integration test: spins up an ephemeral ed25519 SSH key via ssh-keygen, points GIT_CONFIG_GLOBAL at a generated gitconfig with SSH signing, runs convert-sha256 --sign, asserts the resulting refs/tags/converted/main is a signed annotated tag targeting the converted branch tip
  • Smoke-tested manually against spf13/cobra (37 refs, ~4,500 objects) and entireio/cli (283 refs, ~55,000 objects); git fsck --full clean; git verify-tag refs/tags/converted/main passes against an SSH allowed-signers file
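
The race-free counter pattern mentioned in the test plan can be sketched as follows, assuming a single worker loop and a reporting goroutine driven by a ticker. `progress` and `runWithProgress` are illustrative names, not the package's API. The point is that the ticker only ever reads through `atomic.Int64.Load`, so `go test -race` has nothing to flag.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// progress holds counters shared between the worker and the ticker goroutine.
type progress struct {
	translated atomic.Int64
}

// runWithProgress simulates translating n objects while a ticker goroutine
// periodically reports the running count. It returns the final count.
func runWithProgress(n int, interval time.Duration, report func(translated int64)) int64 {
	var p progress
	done := make(chan struct{})
	var wg sync.WaitGroup

	wg.Add(1)
	go func() {
		defer wg.Done()
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				report(p.translated.Load()) // atomic read, no lock needed
			case <-done:
				return
			}
		}
	}()

	for i := 0; i < n; i++ {
		p.translated.Add(1) // stands in for translating one object
	}
	close(done)
	wg.Wait()
	return p.translated.Load()
}

func main() {
	total := runWithProgress(100000, time.Millisecond, func(translated int64) {
		fmt.Printf("translated %d objects\n", translated)
	})
	fmt.Println("done:", total)
}
```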

🤖 Generated with Claude Code


Note

Medium Risk
Large new conversion pipeline touches object integrity, ref rewriting, and optional signing; mistakes could produce unusable repos, though scope is isolated to the new command and is heavily tested.

Overview
Adds git-sync convert-sha256, a one-off migration path that pulls a SHA1 repo over smart HTTP, rewrites every reachable object and ref into a new SHA256 bare repo on disk (all branches/tags by default; optional --all-refs / excludes).

The new sha256convert package runs discovery then memoized translation: fail-fast on external submodule gitlinks before writing the target; strips invalid GPG/mergetag data; rewrites in-scope commit/tag message hashes (including cross-branch refs via DFS); writes SHA256 loose objects manually (workaround for go-git’s SHA1 object writer). Optional outputs: refs/notes/sha1-origin, --write-mapping TSV, --sign branch attestation tags, --check (config, HEAD, refs, git fsck).

Wires the command into the CLI root, documents behavior in docs/convert-sha256.md, updates README, ignores redactors/local/, and adds broad unit tests plus gated git-http-backend integration tests.

Reviewed by Cursor Bugbot for commit 2b36d81.

nodo added 6 commits May 23, 2026 01:40
Entire-Checkpoint: 198bc99d4b0b
Entire-Checkpoint: dc4592acaadf
Entire-Checkpoint: fb6dc8f73ee0
Entire-Checkpoint: 80a4babf00ee
This makes it possible to "sign" the conversion.

Entire-Checkpoint: 1857d92ce8c7
golangci-lint v2.11.4 was unhappy across errcheck, exhaustive,
gocritic, inamedparam, ireturn, maintidx, noctx, perfsprint, revive,
and wrapcheck. Mostly mechanical:

- Test helpers (initSHA1, initSHA256, mustTranslator) collapse the
  '_ :=' patterns into t.Fatalf-on-error wrappers.
- exec.Command -> exec.CommandContext(ctx, ...); ctx threaded into
  runChecks and signBranchTips.
- refs.ForEach return value checked instead of discarded.
- if/else chain in runChecks rewritten as switch.
- ireturn for openSource/normalizeAuth annotated, since both return
  shared transport interfaces by design.
- maintidx on Run annotated; the function is a phase orchestrator,
  splitting it would obscure the pipeline.
- exhaustive switches on plumbing.ObjectType annotated; the
  unhandled cases (OFSDelta/REFDelta/AnyObject/InvalidObject) can't
  reach a resolved storer.
- Errors from io.ReadAll/MemoryObject.Reader/bufio.Flush/fmt.Fprintln/
  auth.Method.Authorizer wrapped with fmt.Errorf.
- Constant 'max' renamed to package-level 'previewMax' to stop
  shadowing the builtin in two places.
- Named parameter added to interface methods that lint flagged.
- Two fmt.Sprintf calls replaced with string concatenation.

Entire-Checkpoint: 43bf90985cc5

nodo commented May 25, 2026

bugbot run

@cursor cursor Bot left a comment

✅ Bugbot reviewed your changes and found no new issues!
