gix-blame: Incremental #2457

Open
vmg wants to merge 6 commits into GitoxideLabs:main from withgraphite:vmg/blame-incremental

vmg commented Mar 6, 2026

Hiiiiii @Byron! Thanks for all your work on the library!

I've been playing around with the new blame APIs that @cruessler developed. The existing gix_blame::file did not fit the use case we needed at Cursor, so I took a stab at implementing an equivalent to git blame --incremental. The changes were quite minimal because I just left gix_blame::file as a thin wrapper over gix_blame::incremental.

I then tried benchmarking the incremental API against Git itself and the numbers were not good at all. After some review, I noticed that the gix-commitgraph crate just didn't support Git's changed-paths bloom filter cache, so I took a stab at implementing that too.

The results are very good: about a 51x speedup with the commit graph (14.9 s down to 290 ms). These are for tools/clang/spanify/Spanifier.cpp in the Chromium repository, which is a very very hairy file:

gix-blame::incremental/without-commit-graph
                        time:   [14.852 s 14.895 s 14.944 s]
                        change: [+0.2968% +0.7623% +1.2529%] (p = 0.00 < 0.05)
                        Change within noise threshold.
gix-blame::incremental/with-commit-graph
                        time:   [287.55 ms 290.30 ms 292.85 ms]
                        change: [−3.1181% −1.6720% −0.4502%] (p = 0.11 > 0.05)
                        No change in performance detected.

Since all these changes are closely related, I'm putting them up here in a single PR. Every commit is self-contained and explains its changes in the commit message, so if you'd like me to split this into smaller PRs just let me know.

Thanks!

vmg force-pushed the vmg/blame-incremental branch from 21e316d to b65f15b on March 6, 2026 17:14
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 21e316d86c

Comment on lines +49 to +50
h0: murmur3_v2(SEED0, path),
h1: murmur3_v2(SEED1, path),

P1: Respect Bloom hash_version when hashing query paths

This always hashes paths with the v2 routine, but the BDAT header’s hash_version is parsed and can be 1 in real commit-graphs (including current Git defaults for commit-graph write --changed-paths). For v1 filters, these hashes can differ (e.g. paths with non-ASCII bytes), which turns real path changes into Some(false) misses; callers then treat that as definitive and skip expensive checks, so blame/path-log results can become incorrect rather than just slower.
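The concern above can be illustrated with a minimal sketch of a 32-bit murmur3 that is parameterised by hash version. It assumes the only difference between the two versions is how each input byte is widened (signed for v1, unsigned for v2), which is why results only diverge for bytes at or above 0x80. The `HashVersion` enum and the `murmur3` signature are hypothetical and are not taken from the PR; the seed is passed in rather than hard-coded, so the actual seed constants are not reproduced here.

```rust
/// Which flavour of the hash a filter was written with (hypothetical enum).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum HashVersion {
    V1,
    V2,
}

/// 32-bit murmur3 over `data`.
/// Assumption: V1 widens each byte as a signed value, V2 as an unsigned one,
/// so both agree whenever every byte is below 0x80.
pub fn murmur3(version: HashVersion, seed: u32, data: &[u8]) -> u32 {
    const C1: u32 = 0xcc9e_2d51;
    const C2: u32 = 0x1b87_3593;

    let widen = |b: u8| -> u32 {
        match version {
            HashVersion::V1 => b as i8 as u32, // sign-extends high bytes
            HashVersion::V2 => b as u32,       // zero-extends
        }
    };
    let mix = |k: u32| k.wrapping_mul(C1).rotate_left(15).wrapping_mul(C2);

    let mut h = seed;
    let mut blocks = data.chunks_exact(4);
    for b in &mut blocks {
        // Assemble one little-endian 32-bit block from four widened bytes.
        let k = widen(b[0]) | (widen(b[1]) << 8) | (widen(b[2]) << 16) | (widen(b[3]) << 24);
        h = (h ^ mix(k))
            .rotate_left(13)
            .wrapping_mul(5)
            .wrapping_add(0xe654_6b64);
    }

    // Fold in the 1 to 3 trailing bytes, if any.
    let tail = blocks.remainder();
    if !tail.is_empty() {
        let mut k = 0u32;
        for (i, &byte) in tail.iter().enumerate() {
            k ^= widen(byte) << (8 * i);
        }
        h ^= mix(k);
    }

    // Standard murmur3 finalizer.
    h ^= data.len() as u32;
    h ^= h >> 16;
    h = h.wrapping_mul(0x85eb_ca6b);
    h ^= h >> 13;
    h = h.wrapping_mul(0xc2b2_ae35);
    h ^= h >> 16;
    h
}
```

Under this sketch, a reader would pick the `HashVersion` from the parsed header before hashing a query path, or decline to use the filter at all when the version is one it does not implement.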

vmg force-pushed the vmg/blame-incremental branch 3 times, most recently from 72434d7 to 52dd978, on March 6, 2026 17:39
vmg added 6 commits March 6, 2026 18:58
The `git commit-graph write` command also supports writing a separate
section on the cache file that contains information about the paths
changed between a commit and its first parent.

This information can be used to significantly speed up the performance
of some traversal operations, such as `git log -- <PATH>` and `git
blame`.

This commit teaches the gix-commitgraph crate in gitoxide how to parse
and access this information. We've only implemented support for reading
v2 of this cache, because v1 is deprecated in Git as it can return bad
results in some corner cases.

The implementation is 100% compatible with Git itself; it uses the exact
same version of murmur3 that Git is using, including the seed hashes.
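As a rough illustration of how such a changed-path filter can be queried, here is a minimal sketch of a double-hashing Bloom filter lookup. It assumes the k-th probe is `h0 + k * h1`, reduced modulo the filter's bit length, and that an empty filter carries no information. The function names, the `Option<bool>` return shape, and the `num_hashes` parameter are hypothetical, and `set_bits` exists only so the lookup can be exercised. None of this is copied from the PR.

```rust
/// Returns `Some(false)` if the path is definitely absent, `Some(true)` if it
/// may be present, and `None` if the filter cannot answer (assumed here to be
/// the case for an empty filter).
pub fn maybe_contains(filter: &[u8], h0: u32, h1: u32, num_hashes: u32) -> Option<bool> {
    if filter.is_empty() {
        return None;
    }
    let bits = filter.len() as u64 * 8;
    for i in 0..num_hashes {
        // Double hashing: derive the i-th probe from the two base hashes.
        let hash = h0.wrapping_add(i.wrapping_mul(h1));
        let pos = (hash as u64 % bits) as usize;
        if (filter[pos / 8] & (1u8 << (pos % 8))) == 0 {
            return Some(false); // one unset bit proves absence
        }
    }
    Some(true)
}

/// Sets the same probe bits; only needed to build a filter for demonstration.
pub fn set_bits(filter: &mut [u8], h0: u32, h1: u32, num_hashes: u32) {
    if filter.is_empty() {
        return;
    }
    let bits = filter.len() as u64 * 8;
    for i in 0..num_hashes {
        let hash = h0.wrapping_add(i.wrapping_mul(h1));
        let pos = (hash as u64 % bits) as usize;
        filter[pos / 8] |= 1u8 << (pos % 8);
    }
}
```

A `Some(true)` answer is only a hint, since Bloom filters can report false positives; only `Some(false)` is definitive.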
Implement a gix_blame::incremental API that yields the blame entries as
they're discovered, similarly to Git's `git blame --incremental`.

The implementation simply takes the original gix_blame::file and
replaces the Vec of blame entries with a generic BlameSink trait.

The original gix_blame::file is now implemented as a wrapper for
gix_blame::incremental, by implementing the BlameSink trait on
Vec<BlameEntry> and sorting + coalescing the entries before returning.
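The sink-based design described above could look roughly like the following sketch. It is not the PR's code: `Entry`, `Sink`, `push_entry`, and the way entries are fed in through an iterator are simplified, hypothetical stand-ins for BlameEntry, BlameSink, and the real traversal.

```rust
/// Simplified stand-in for a blame entry: a line range attributed to a commit.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Entry {
    pub start: u32,
    pub len: u32,
    pub commit: u32,
}

/// Receives entries as soon as they are discovered (hypothetical trait).
pub trait Sink {
    fn push_entry(&mut self, entry: Entry);
}

/// Collecting into a Vec is just one possible sink.
impl Sink for Vec<Entry> {
    fn push_entry(&mut self, entry: Entry) {
        self.push(entry);
    }
}

/// Streams entries to the sink in discovery order. The iterator stands in for
/// the real history traversal.
pub fn incremental<S: Sink>(discovered: impl IntoIterator<Item = Entry>, sink: &mut S) {
    for entry in discovered {
        sink.push_entry(entry);
    }
}

/// Batch variant built on top of `incremental`: collect, sort by line, then
/// coalesce adjacent ranges that belong to the same commit.
pub fn file(discovered: impl IntoIterator<Item = Entry>) -> Vec<Entry> {
    let mut collected: Vec<Entry> = Vec::new();
    incremental(discovered, &mut collected);
    collected.sort_by_key(|e| e.start);

    let mut merged: Vec<Entry> = Vec::with_capacity(collected.len());
    for e in collected {
        if let Some(prev) = merged.last_mut() {
            if prev.commit == e.commit && prev.start + prev.len == e.start {
                prev.len += e.len;
                continue;
            }
        }
        merged.push(e);
    }
    merged
}
```

Keeping the sorting and coalescing in the wrapper means a streaming consumer pays nothing for them, while callers of the batch function still get ordered, merged output.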
Use the new changed-path bloom filters from the commit graph to greatly
speed up our blame implementation. Whenever we find a rejection on the
bloom filter for the current path, we skip it altogether and pass the
blame without diffing the trees.
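The skip decision described here can be sketched as a small function over the filter's answer. This assumes the lookup yields an `Option<bool>` in which only `Some(false)` is definitive, as the review comment in this thread suggests. The `Step` enum and `next_step` function are hypothetical names used for illustration.

```rust
/// What the blame loop does for one commit (hypothetical enum).
#[derive(Debug, PartialEq, Eq)]
pub enum Step {
    /// The filter rejected the path: hand the blame to the parent unchanged.
    PassToParent,
    /// The path may have changed, or no filter is available: diff the trees.
    DiffTrees,
}

/// Maps the Bloom filter's answer to the next step.
pub fn next_step(path_maybe_changed: Option<bool>) -> Step {
    match path_maybe_changed {
        // Bloom filters have no false negatives, so a rejection is safe to trust.
        Some(false) => Step::PassToParent,
        // A positive may be a false positive, and `None` means "unknown".
        Some(true) | None => Step::DiffTrees,
    }
}
```

Because the filter describes changes relative to a commit's first parent only, a real implementation would presumably restrict this shortcut to that parent.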
Implement the log_file method in gitoxide-core, which allows performing
path-limited log commands. With the new changed-path bloom filters, it
is now possible to perform this operation very efficiently.
Change `process_changes` to take `&[Change]` instead of `Vec<Change>`,
eliminating the `changes.clone()` heap allocation at every call site.

Replace the O(H×C) restart-from-beginning approach with a cursor that
advances through the changes list across hunks. Non-suspect hunks are
now skipped immediately. When the rare case of overlapping suspect
ranges is detected (from merge blame convergence), the cursor safely
resets to maintain correctness.
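The cursor idea can be sketched in isolation as follows. This is not the PR's `process_changes`; it is a simplified model in which hunks and changes are plain line ranges, changes are assumed to be sorted, non-empty, and non-overlapping, and the function name and return shape are hypothetical. For each hunk it reports the indices of the changes that overlap it.

```rust
use std::ops::Range;

/// For every hunk, lists the indices of overlapping changes.
/// Assumptions: `changes` is sorted by start, non-empty ranges, no overlaps.
pub fn overlapping_changes(hunks: &[Range<u32>], changes: &[Range<u32>]) -> Vec<Vec<usize>> {
    let mut cursor = 0usize; // first change that can still matter
    let mut prev_start = 0u32;
    let mut out = Vec::with_capacity(hunks.len());

    for hunk in hunks {
        // A hunk that starts before the previous one breaks the ordering the
        // cursor relies on, so fall back to scanning from the beginning.
        if hunk.start < prev_start {
            cursor = 0;
        }
        prev_start = hunk.start;

        // Changes ending at or before this hunk cannot affect later hunks.
        while cursor < changes.len() && changes[cursor].end <= hunk.start {
            cursor += 1;
        }

        // Collect overlaps without moving the cursor past them, because the
        // next hunk may overlap the same change.
        let mut hits = Vec::new();
        let mut i = cursor;
        while i < changes.len() && changes[i].start < hunk.end {
            hits.push(i);
            i += 1;
        }
        out.push(hits);
    }
    out
}
```

With hunks in ascending order the cursor only moves forward, so each change is skipped at most once. The reset keeps the result correct for out-of-order hunks at the cost of one rescan.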
Compare the performance of the implementation with and without the
commit graph cache.

gix-blame::incremental/without-commit-graph
                        time:   [14.852 s 14.895 s 14.944 s]
                        change: [+0.2968% +0.7623% +1.2529%] (p = 0.00 < 0.05)
                        Change within noise threshold.
gix-blame::incremental/with-commit-graph
                        time:   [287.55 ms 290.30 ms 292.85 ms]
                        change: [−3.1181% −1.6720% −0.4502%] (p = 0.11 > 0.05)
                        No change in performance detected.

Signed-off-by: Vicent Marti <vmg@strn.cat>
vmg force-pushed the vmg/blame-incremental branch from 52dd978 to e6b8998 on March 6, 2026 17:58