21e316d to b65f15b
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 21e316d86c
    h0: murmur3_v2(SEED0, path),
    h1: murmur3_v2(SEED1, path),
Respect Bloom hash_version when hashing query paths
This always hashes paths with the v2 routine, but the BDAT header’s hash_version is parsed and can be 1 in real commit-graphs (including current Git defaults for commit-graph write --changed-paths). For v1 filters, these hashes can differ (e.g. paths with non-ASCII bytes), which turns real path changes into Some(false) misses; callers then treat that as definitive and skip expensive checks, so blame/path-log results can become incorrect rather than just slower.
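One way to address this, sketched under assumptions: gate the lookup on the parsed `hash_version` and answer "unknown" for anything other than v2, so callers fall back to a real diff instead of trusting a miss. The names `HashVersion`, `maybe_changed`, and `lookup_v2` are hypothetical and not part of `gix-commitgraph`; the header field is assumed to be a 32-bit integer.

```rust
/// Hypothetical model of the `hash_version` value parsed from the BDAT header.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum HashVersion {
    V1,
    V2,
}

impl HashVersion {
    /// Map the raw header value to a known version, if any.
    fn from_header(raw: u32) -> Option<Self> {
        match raw {
            1 => Some(Self::V1),
            2 => Some(Self::V2),
            _ => None,
        }
    }
}

/// Answer "may this commit have changed `path`?" when only v2 hashing is implemented.
/// `lookup_v2` is a stand-in for the real filter probe (an assumption of this sketch).
/// `None` means "the filter cannot be trusted here, diff the trees instead".
fn maybe_changed(
    raw_hash_version: u32,
    path: &[u8],
    lookup_v2: impl Fn(&[u8]) -> bool,
) -> Option<bool> {
    match HashVersion::from_header(raw_hash_version) {
        Some(HashVersion::V2) => Some(lookup_v2(path)),
        // A v1 filter was built with a different hash routine, so probing it
        // with v2 hashes could produce a false `Some(false)`.
        Some(HashVersion::V1) | None => None,
    }
}
```

With this shape, a v1 commit-graph makes blame and path-limited log slower but never incorrect, because a definitive miss is only reported for filters the reader can actually interpret.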
72434d7 to 52dd978
The `git commit-graph write` command also supports writing a separate section on the cache file that contains information about the paths changed between a commit and its first parent. This information can be used to significantly speed up the performance of some traversal operations, such as `git log -- <PATH>` and `git blame`. This commit teaches the git-commitgraph crate in gitoxide how to parse and access this information. We've only implemented support for reading v2 of this cache, because v1 is deprecated in Git as it can return bad results in some corner cases. The implementation is 100% compatible with Git itself; it uses the exact same version of murmur3 that Git is using, including the seed hashes.
Implement a `gix_blame::incremental` API that yields the blame entries as they're discovered, similarly to Git's `git blame --incremental`. The implementation simply takes the original `gix_blame::file` and replaces the Vec of blame entries with a generic `BlameSink` trait. The original `gix_blame::file` is now implemented as a wrapper for `gix_blame::incremental`, by implementing the `BlameSink` trait on `Vec<BlameEntry>` and sorting + coalescing the entries before returning.
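The sink-based design described above can be sketched roughly as follows. This is a simplified model and not the code from this commit: the fields of `BlameEntry`, the `push` method name, and the function signatures are assumptions made for illustration.

```rust
/// Simplified stand-in for a blame entry (the real type carries more data).
#[derive(Debug, Clone, PartialEq, Eq)]
struct BlameEntry {
    start_line: u32,
    len: u32,
    commit: u32, // hypothetical: a small integer instead of an object id
}

/// Receives entries as soon as they are discovered.
trait BlameSink {
    fn push(&mut self, entry: BlameEntry);
}

/// Collecting into a `Vec` is just one possible sink.
impl BlameSink for Vec<BlameEntry> {
    fn push(&mut self, entry: BlameEntry) {
        Vec::push(self, entry);
    }
}

/// Stand-in for the incremental driver: emits entries in discovery order.
fn incremental(discovered: &[BlameEntry], sink: &mut impl BlameSink) {
    for entry in discovered {
        sink.push(entry.clone());
    }
}

/// Batch wrapper: collect everything, then sort and coalesce adjacent
/// entries that belong to the same commit.
fn file(discovered: &[BlameEntry]) -> Vec<BlameEntry> {
    let mut entries: Vec<BlameEntry> = Vec::new();
    incremental(discovered, &mut entries);
    entries.sort_by_key(|e| e.start_line);

    let mut out: Vec<BlameEntry> = Vec::with_capacity(entries.len());
    for entry in entries {
        if let Some(prev) = out.last_mut() {
            if prev.commit == entry.commit && prev.start_line + prev.len == entry.start_line {
                prev.len += entry.len;
                continue;
            }
        }
        out.push(entry);
    }
    out
}
```

A streaming consumer would implement `BlameSink` on its own type (for example, one that writes each entry to a channel or a socket) and skip the sort and coalesce step entirely.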
Use the new changed-path bloom filters from the commit graph to greatly speed up our blame implementation. Whenever we find a rejection on the bloom filter for the current path, we skip it altogether and pass the blame without diffing the trees.
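As a rough illustration of the fast path, the sketch below probes a byte-array Bloom filter with double hashing (`h0 + i * h1`) and only skips the tree diff on a definite rejection. The bit layout, the treatment of an empty filter as "no answer", and all names (`Key`, `maybe_contains`, `next_step`) are assumptions of this sketch and have not been checked against Git's on-disk format or the code in this PR.

```rust
/// The two 32-bit hashes of a path.
#[derive(Clone, Copy)]
struct Key {
    h0: u32,
    h1: u32,
}

/// Bit positions probed for `key`, using double hashing.
fn probe_positions(key: Key, num_hashes: u32, num_bits: u32) -> impl Iterator<Item = u32> {
    (0..num_hashes).map(move |i| key.h0.wrapping_add(i.wrapping_mul(key.h1)) % num_bits)
}

/// Set all bits for `key` (only needed here to build a filter for the demo).
fn insert(filter: &mut [u8], key: Key, num_hashes: u32) {
    if filter.is_empty() {
        return;
    }
    let num_bits = filter.len() as u32 * 8;
    for pos in probe_positions(key, num_hashes, num_bits) {
        filter[(pos / 8) as usize] |= 1u8 << (pos % 8);
    }
}

/// `Some(false)`: definitely not changed. `Some(true)`: possibly changed.
/// `None`: the filter carries no information (assumed for an empty filter).
fn maybe_contains(filter: &[u8], key: Key, num_hashes: u32) -> Option<bool> {
    if filter.is_empty() {
        return None;
    }
    let num_bits = filter.len() as u32 * 8;
    Some(
        probe_positions(key, num_hashes, num_bits)
            .all(|pos| (filter[(pos / 8) as usize] & (1u8 << (pos % 8))) != 0),
    )
}

#[derive(Debug, PartialEq, Eq)]
enum Step {
    /// The path was not touched: hand the blame to the parent unchanged.
    PassToParent,
    /// The path may have been touched: fall back to diffing the trees.
    DiffTrees,
}

fn next_step(bloom_answer: Option<bool>) -> Step {
    match bloom_answer {
        Some(false) => Step::PassToParent,
        Some(true) | None => Step::DiffTrees,
    }
}
```

Only `Some(false)` is treated as definitive. A Bloom filter can report false positives but never false negatives, so a positive or missing answer still requires the tree diff.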
Implement the `log_file` method in gitoxide-core, which allows performing path-delimited log commands. With the new changed-paths bloom filter, it is now possible to perform this operation very efficiently.
Change `process_changes` to take `&[Change]` instead of `Vec<Change>`, eliminating the `changes.clone()` heap allocation at every call site. Replace the O(H×C) restart-from-beginning approach with a cursor that advances through the changes list across hunks. Non-suspect hunks are now skipped immediately. When the rare case of overlapping suspect ranges is detected (from merge blame convergence), the cursor safely resets to maintain correctness.
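The cursor idea can be illustrated with a small two-pointer walk. This is a simplified model rather than the actual `process_changes`: hunks and changes are reduced to line ranges, changes are assumed to be sorted and non-overlapping, and the function name `overlapping_changes` is hypothetical.

```rust
use std::ops::Range;

/// For each hunk, return the indices of the changes that overlap it.
/// The cursor only moves forward while hunks arrive in ascending order,
/// and resets when a hunk starts before the previous one.
fn overlapping_changes(hunks: &[Range<u32>], changes: &[Range<u32>]) -> Vec<Vec<usize>> {
    let mut cursor = 0;
    let mut prev_start = 0;
    let mut out = Vec::with_capacity(hunks.len());

    for hunk in hunks {
        // Rare case: an out-of-order or overlapping hunk. Start over so that
        // no earlier change is missed.
        if hunk.start < prev_start {
            cursor = 0;
        }
        prev_start = hunk.start;

        // Changes that end before this hunk starts cannot affect it, nor any
        // later hunk in ascending order, so they are skipped for good.
        while cursor < changes.len() && changes[cursor].end <= hunk.start {
            cursor += 1;
        }

        // Collect overlapping changes without moving the cursor past them,
        // since a change may also overlap the next hunk.
        let mut hits = Vec::new();
        let mut i = cursor;
        while i < changes.len() && changes[i].start < hunk.end {
            hits.push(i);
            i += 1;
        }
        out.push(hits);
    }
    out
}
```

Taking both inputs as slices mirrors the `&[Change]` signature change: nothing is cloned per call, and for ordered hunks each change is skipped at most once instead of being rescanned from the beginning for every hunk.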
Compare the performance of the implementation with and without the
commit graph cache.
gix-blame::incremental/without-commit-graph
time: [14.852 s 14.895 s 14.944 s]
change: [+0.2968% +0.7623% +1.2529%] (p = 0.00 < 0.05)
Change within noise threshold.
gix-blame::incremental/with-commit-graph
time: [287.55 ms 290.30 ms 292.85 ms]
change: [−3.1181% −1.6720% −0.4502%] (p = 0.11 > 0.05)
No change in performance detected.
Signed-off-by: Vicent Marti <vmg@strn.cat>
52dd978 to e6b8998
Hiiiiii @Byron! Thanks for all your work on the library!
I've been playing around with the new blame APIs that @cruessler developed. The existing `gix_blame::file` was not fitting the use case we needed at Cursor, so I took a stab at implementing an equivalent to `git blame --incremental`. The changes were quite minimal because I just left `gix_blame::file` as a thin wrapper over `gix_blame::incremental`.
I then tried benchmarking the `incremental` API against Git itself and the numbers were not good at all. After some review, I noticed that the `gix-commitgraph` crate just didn't support the changed-paths bloom filter cache from Git, so I took a stab at implementing those too.
The results are very good. These are for `tools/clang/spanify/Spanifier.cpp` in the Chromium repository, which is a very very hairy file:
Since all these changes are quite related, I'm putting them up here in a single PR. Every commit is self-contained and explains its changes in the commit message, so if you'd like me to split this into smaller PRs just let me know.
Thanks!