[experiment] Port reftable from Git C implementation and integrate as backend #2452

Draft
Byron wants to merge 1 commit into main from codex/reftable-port-sequence

Byron (Member) commented Mar 3, 2026

Summary

This draft is now intentionally scoped to the standalone gix-reftable crate only.

It ports Git's in-tree reftable implementation into the dedicated crate and includes crate-local parity/unit/integration tests, but does not include wiring into gix-ref or gix in this PR.

Included commit range now ends at:

  • 33a91c9690 (gix-reftable/tests: add selected t0610/t0613/t0614 behavior parity integration tests)

Removed from this PR (deferred):

  • backend-agnostic gix-ref store changes
  • reftable adapter in gix-ref
  • gix backend detection/routing changes
  • cross-backend gix test updates
  • related docs/status integration notes

Original 15-step plan (for traceability)

Commit-By-Commit Execution Plan: Reftable Port + Integration

Summary

Implement the full reftable port in gix-reftable, integrate it as a real backend in gix-ref/gix, and land parity tests in small, reviewable commits.
Each commit is intentionally chained: it stabilizes one layer, then unlocks the next.

Commit Sequence

  1. workspace: add gix-reftable crate skeleton and wire it into Cargo workspace
    Motivation: create the isolated crate boundary first so all subsequent work lands incrementally.
    Relates to previous: baseline/no-op starting point.
    Future relevance: all reftable code/tests depend on this crate existing.

  2. gix-reftable: port basics/constants/error/varint primitives from git/reftable
    Motivation: establish byte-order, varint, hash-id, and error semantics shared by all modules.
    Relates to previous: fills in core primitives in the new crate.
    Future relevance: record/block/table/writer code will reuse these primitives directly.

  3. gix-reftable: implement record model and encode/decode parity (ref/log/obj/index)
    Motivation: record correctness is the format contract; everything else composes it.
    Relates to previous: consumes primitives and defines concrete wire payload behavior.
    Future relevance: block IO and iterators can now operate on typed records.

  4. gix-reftable: implement block + blocksource + table reader
    Motivation: make reftable files readable end-to-end (header/sections/restarts/seek).
    Relates to previous: uses record codec to decode table contents.
    Future relevance: merged tables and stack logic need a working single-table reader.

  5. gix-reftable: implement merged table iterators, pq, and tree helpers
    Motivation: parity for cross-table iteration and seek behavior.
    Relates to previous: builds on table reader to support multi-table views.
    Future relevance: stack and backend integration depend on merged iteration semantics.

  6. gix-reftable: implement writer with limits/index emission/write options
    Motivation: enable producing valid tables and exercising write-path parity tests.
    Relates to previous: complements reader path using the same record/block contracts.
    Future relevance: stack transactions and compaction need writer callbacks.

  7. gix-reftable: implement stack transactions, auto-compaction, reload, and fsck
    Motivation: complete operational backend behavior (tables.list, addition/commit, verify).
    Relates to previous: stack orchestrates reader/writer modules already landed.
    Future relevance: this is the direct foundation for gix-ref backend adapter.

  8. gix-reftable/tests: port upstream u-reftable-* unit suites with 1:1 case mapping
    Motivation: lock behavioral parity at the library level before integration churn.
    Relates to previous: validates all crate modules in isolation.
    Future relevance: reduces regression risk when wiring into gix-ref and gix.

  9. gix-reftable/tests: add selected t0610/t0613/t0614 behavior parity integration tests
    Motivation: cover high-value shell behavior in Rust tests (transactions/options/fsck/worktree).
    Relates to previous: adds scenario-level confidence on top of unit parity.
    Future relevance: these tests protect future backend integration refactors.

  10. gix-ref: activate backend-agnostic store abstraction (files + reftable state)
    Motivation: remove hard coupling to file-store without changing behavior yet.
    Relates to previous: prepares host crate interface for plugging in reftable.
    Future relevance: next commit injects real reftable-backed implementation.

  11. gix-ref: add reftable-backed store adapter and route find/iter/transaction operations
    Motivation: make gix-ref actually operate on reftable repositories.
    Relates to previous: fills the new abstraction with a concrete second backend.
    Future relevance: gix can now switch backend based on repository configuration.

  12. gix: switch RefStore to backend-capable store and detect extensions.refStorage=reftable
    Motivation: enable end-to-end opening and reading of reftable repos in top-level API.
    Relates to previous: consumes backend-capable gix-ref APIs.
    Future relevance: unlocks fixing existing tests that currently assert reftable unsupported.

  13. gix: make reference iteration/peeling/fetch update paths backend-agnostic
    Motivation: remove residual file-only assumptions in critical flows.
    Relates to previous: completes runtime behavior for common operations.
    Future relevance: ensures future features (e.g., optimizations) won’t regress reftable path.

  14. tests: update reftable open/head expectations and add cross-backend regression coverage
    Motivation: reflect new supported behavior and guard interoperability paths.
    Relates to previous: validates functional integration in gix public workflows.
    Future relevance: serves as long-term guardrail for both backends.

  15. docs/status: document reftable support, sha256 boundary, and update crate-status
    Motivation: finalize user/developer-facing contract and current limitations.
    Relates to previous: documents the now-landed behavior.
    Future relevance: provides clear baseline for follow-up work (end-to-end SHA-256 in gix).
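
As an illustration of the varint primitive named in step 2, here is a minimal sketch of the encoding the reftable format uses, which the format documentation describes as identical to the ofs-delta offset encoding in pack files. The function names `put_varint` and `get_varint` are hypothetical and are not taken from the gix-reftable crate; the logic follows the shape of Git's C helpers as far as I understand them.

```rust
/// Encode `value` most-significant group first. Every byte except the last
/// carries the 0x80 continuation bit, and each continuation group stores
/// "value minus one" so that no value has two encodings.
/// NOTE: hypothetical helper for illustration, not the crate's API.
pub fn put_varint(mut value: u64) -> Vec<u8> {
    let mut buf = [0u8; 10]; // 10 bytes are enough for any u64
    let mut pos = buf.len() - 1;
    buf[pos] = (value & 0x7f) as u8;
    loop {
        value >>= 7;
        if value == 0 {
            break;
        }
        value -= 1;
        pos -= 1;
        buf[pos] = 0x80 | (value & 0x7f) as u8;
    }
    buf[pos..].to_vec()
}

/// Decode a varint from the start of `input`.
/// Returns the value and the number of bytes consumed, or `None` if the
/// input is empty, truncated, or would overflow a `u64`.
pub fn get_varint(input: &[u8]) -> Option<(u64, usize)> {
    let mut c = *input.first()?;
    let mut consumed = 1;
    let mut val = u64::from(c & 0x7f);
    while c & 0x80 != 0 {
        // Undo the "minus one" applied by the encoder.
        val = val.checked_add(1)?;
        // The next shift by 7 must not push set bits out of the u64.
        if val & (u64::MAX << 57) != 0 {
            return None;
        }
        c = *input.get(consumed)?;
        consumed += 1;
        val = (val << 7) + u64::from(c & 0x7f);
    }
    Some((val, consumed))
}
```

With this scheme 127 still fits in one byte, while 128 becomes the two bytes `0x80 0x00`. A round trip through `put_varint` and `get_varint` returns the original value together with the encoded length.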

Per-Commit Validation Rule

For each commit, run the smallest relevant test slice before committing, then run a broader slice periodically:

  • crate-local unit tests for touched modules,
  • gix-reftable parity suites,
  • gix-ref targeted tests,
  • gix targeted repository/reference tests.

Commit Message Format Rule

Every commit body will include:

  • Why now (motivation),
  • What changed (scope),
  • Why this order (relation to previous commit),
  • What it unlocks next (future relevance).

Assumptions

  • Source parity target is Git’s in-tree reftable C implementation and tests.
  • gix-reftable supports SHA-1 and SHA-256; gix integration remains SHA-1-only in this batch.
  • No squashing: one commit per step as listed above.
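
To make the SHA-1/SHA-256 assumption concrete: a version 2 reftable header carries a 4-byte hash identifier, `sha1` or `s256`. The following is a minimal sketch of how a hash-kind type could map between that identifier and the object ID length. The type and method names are hypothetical and may differ from what gix-reftable actually exposes.

```rust
/// Hash function used for object IDs in a table.
/// NOTE: hypothetical type for illustration, not the crate's API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HashKind {
    Sha1,
    Sha256,
}

impl HashKind {
    /// Parse the 4-byte identifier stored in a version 2 header.
    pub fn from_format_id(id: [u8; 4]) -> Option<Self> {
        match &id {
            b"sha1" => Some(HashKind::Sha1),
            b"s256" => Some(HashKind::Sha256),
            _ => None,
        }
    }

    /// The 4-byte identifier to write into a version 2 header.
    pub fn format_id(self) -> [u8; 4] {
        match self {
            HashKind::Sha1 => *b"sha1",
            HashKind::Sha256 => *b"s256",
        }
    }

    /// Size of a raw object ID in bytes.
    pub fn len_in_bytes(self) -> usize {
        match self {
            HashKind::Sha1 => 20,
            HashKind::Sha256 => 32,
        }
    }
}
```

Read as big-endian integers, the two identifiers are `0x73686131` and `0x73323536`, the same values the C implementation uses for its format ID constants.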

Byron changed the title from "Port reftable from Git C implementation and integrate as backend" to "Port reftable from Git C implementation and integrate as backend (experiment)" on Mar 3, 2026
Byron changed the title from "Port reftable from Git C implementation and integrate as backend (experiment)" to "Port reftable into standalone gix-reftable crate (no gix-ref/gix integration)" on Mar 3, 2026
Byron changed the title from "Port reftable into standalone gix-reftable crate (no gix-ref/gix integration)" to "[experiment] Port reftable from Git C implementation and integrate as backend" on Mar 3, 2026
Byron force-pushed the codex/reftable-port-sequence branch 5 times, most recently from 8f1b751 to 38bb0ad on March 3, 2026 03:57

Why now
The goal is to land the reftable port as a standalone crate with strong parity coverage before any backend integration churn.

What changed
This squashed commit contains all standalone `gix-reftable` work that was previously split across 9 commits:
- workspace wiring for a dedicated `gix-reftable` crate
- low-level primitives (constants, varint, hash-kind, errors)
- record model and encode/decode for ref/log/obj/index records
- block source and single-table reader
- merged-table iterators with pq/tree helpers
- table writer with limits/index emission and options
- stack transactions, reload, auto-compaction, and fsck
- upstream-inspired `u-reftable-*` parity unit tests
- selected `t0610`/`t0613`/`t0614` scenario parity tests
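
For readers unfamiliar with the merged-table layer listed above, here is a simplified sketch of the idea: a priority queue merges several name-sorted tables, the newest table wins when a name occurs more than once, and deletion tombstones are suppressed. It is not code from this commit. The `Record` alias and `merge_tables` function are hypothetical, and real records carry more than a name and a target.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Simplified record: ref name plus target; `None` models a deletion tombstone.
/// NOTE: hypothetical shape for illustration only.
pub type Record = (String, Option<String>);

/// Merge tables ordered oldest to newest. Each table is assumed to be sorted
/// by ref name and to contain each name at most once.
pub fn merge_tables(tables: &[Vec<Record>]) -> Vec<(String, String)> {
    // `BinaryHeap` is a max-heap, so `Reverse(name)` pops the smallest name
    // first, and for equal names the largest (newest) table index first.
    let mut heap = BinaryHeap::new();
    for (table_idx, table) in tables.iter().enumerate() {
        if let Some((name, _)) = table.first() {
            heap.push((Reverse(name.as_str()), table_idx, 0usize));
        }
    }

    let mut out = Vec::new();
    let mut last_name: Option<&str> = None;
    while let Some((Reverse(name), table_idx, pos)) = heap.pop() {
        // Queue the next record of the table we just consumed from.
        if let Some((next, _)) = tables[table_idx].get(pos + 1) {
            heap.push((Reverse(next.as_str()), table_idx, pos + 1));
        }
        // A newer table already produced this name, so this record is shadowed.
        if last_name == Some(name) {
            continue;
        }
        last_name = Some(name);
        // Tombstones hide the ref from the merged view.
        if let Some(target) = &tables[table_idx][pos].1 {
            out.push((name.to_owned(), target.clone()));
        }
    }
    out
}
```

Keying the heap on (name, table index) lets shadowing fall out of the pop order, so the loop only has to remember the last name it emitted or suppressed.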

Why this order
This commit is a squash of the previously reviewed sequence where each layer built on the previous one (primitives -> records -> io -> merged iteration -> writer -> stack -> tests).

What it unlocks next
A clean standalone reftable library baseline that can be integrated later into `gix-ref`/`gix` in follow-up work.

Prompt (verbatim)
Look at the reftable implementation at /Users/byron/dev/github.com/git/git and port it over to Rust in its own `gix-reftable` crate. Be sure to capture specific tests that exist.

Follow through with the entire plan. Do not stop until it's all done. After each step, make a commit with a meaningful message and motivation. Show how the commit relates to the previous commit, and at least hint at how it's going to be relevant in future commits.

PLEASE IMPLEMENT THIS PLAN:
# Commit-By-Commit Execution Plan: Reftable Port + Integration

## Summary
Implement the full reftable port in `gix-reftable`, integrate it as a real backend in `gix-ref`/`gix`, and land parity tests in small, reviewable commits.
Each commit is intentionally chained: it stabilizes one layer, then unlocks the next.

## Commit Sequence

1. **`workspace: add gix-reftable crate skeleton and wire it into Cargo workspace`**
Motivation: create the isolated crate boundary first so all subsequent work lands incrementally.
Relates to previous: baseline/no-op starting point.
Future relevance: all reftable code/tests depend on this crate existing.

2. **`gix-reftable: port basics/constants/error/varint primitives from git/reftable`**
Motivation: establish byte-order, varint, hash-id, and error semantics shared by all modules.
Relates to previous: fills in core primitives in the new crate.
Future relevance: record/block/table/writer code will reuse these primitives directly.

3. **`gix-reftable: implement record model and encode/decode parity (ref/log/obj/index)`**
Motivation: record correctness is the format contract; everything else composes it.
Relates to previous: consumes primitives and defines concrete wire payload behavior.
Future relevance: block IO and iterators can now operate on typed records.

4. **`gix-reftable: implement block + blocksource + table reader`**
Motivation: make reftable files readable end-to-end (header/sections/restarts/seek).
Relates to previous: uses record codec to decode table contents.
Future relevance: merged tables and stack logic need a working single-table reader.

5. **`gix-reftable: implement merged table iterators, pq, and tree helpers`**
Motivation: parity for cross-table iteration and seek behavior.
Relates to previous: builds on table reader to support multi-table views.
Future relevance: stack and backend integration depend on merged iteration semantics.

6. **`gix-reftable: implement writer with limits/index emission/write options`**
Motivation: enable producing valid tables and exercising write-path parity tests.
Relates to previous: complements reader path using the same record/block contracts.
Future relevance: stack transactions and compaction need writer callbacks.

7. **`gix-reftable: implement stack transactions, auto-compaction, reload, and fsck`**
Motivation: complete operational backend behavior (`tables.list`, addition/commit, verify).
Relates to previous: stack orchestrates reader/writer modules already landed.
Future relevance: this is the direct foundation for `gix-ref` backend adapter.

8. **`gix-reftable/tests: port upstream u-reftable-* unit suites with 1:1 case mapping`**
Motivation: lock behavioral parity at the library level before integration churn.
Relates to previous: validates all crate modules in isolation.
Future relevance: reduces regression risk when wiring into `gix-ref` and `gix`.

9. **`gix-reftable/tests: add selected t0610/t0613/t0614 behavior parity integration tests`**
Motivation: cover high-value shell behavior in Rust tests (transactions/options/fsck/worktree).
Relates to previous: adds scenario-level confidence on top of unit parity.
Future relevance: these tests protect future backend integration refactors.

10. **`gix-ref: activate backend-agnostic store abstraction (files + reftable state)`**
Motivation: remove hard coupling to file-store without changing behavior yet.
Relates to previous: prepares host crate interface for plugging in reftable.
Future relevance: next commit injects real reftable-backed implementation.

11. **`gix-ref: add reftable-backed store adapter and route find/iter/transaction operations`**
Motivation: make `gix-ref` actually operate on reftable repositories.
Relates to previous: fills the new abstraction with a concrete second backend.
Future relevance: `gix` can now switch backend based on repository configuration.

12. **`gix: switch RefStore to backend-capable store and detect extensions.refStorage=reftable`**
Motivation: enable end-to-end opening and reading of reftable repos in top-level API.
Relates to previous: consumes backend-capable `gix-ref` APIs.
Future relevance: unlocks fixing existing tests that currently assert reftable unsupported.

13. **`gix: make reference iteration/peeling/fetch update paths backend-agnostic`**
Motivation: remove residual file-only assumptions in critical flows.
Relates to previous: completes runtime behavior for common operations.
Future relevance: ensures future features (e.g., optimizations) won’t regress reftable path.

14. **`tests: update reftable open/head expectations and add cross-backend regression coverage`**
Motivation: reflect new supported behavior and guard interoperability paths.
Relates to previous: validates functional integration in `gix` public workflows.
Future relevance: serves as long-term guardrail for both backends.

15. **`docs/status: document reftable support, sha256 boundary, and update crate-status`**
Motivation: finalize user/developer-facing contract and current limitations.
Relates to previous: documents the now-landed behavior.
Future relevance: provides clear baseline for follow-up work (end-to-end SHA-256 in `gix`).

## Per-Commit Validation Rule
For each commit, run the smallest relevant test slice before committing, then run a broader slice periodically:
- crate-local unit tests for touched modules,
- `gix-reftable` parity suites,
- `gix-ref` targeted tests,
- `gix` targeted repository/reference tests.

## Commit Message Format Rule
Every commit body will include:
- **Why now** (motivation),
- **What changed** (scope),
- **Why this order** (relation to previous commit),
- **What it unlocks next** (future relevance).

## Assumptions
- Source parity target is Git’s in-tree reftable C implementation and tests.
- `gix-reftable` supports SHA-1 and SHA-256; `gix` integration remains SHA-1-only in this batch.
- No squashing: one commit per step as listed above.

Byron force-pushed the codex/reftable-port-sequence branch from 38bb0ad to 94793bb on March 3, 2026 04:05