-
-
Notifications
You must be signed in to change notification settings - Fork 420
feat: add sha256 feature to gix-commitgraph
#2377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: add sha256 feature to gix-commitgraph
#2377
Conversation
|
Thanks a lot, I am looking forward to sorting this out! There is also #2378, hopefully done in the next couple of days (it might conflict). |
|
Thanks a lot for starting this! This crate is special as it's the first one that parses a file-format that changes depending on the hash kind. It seems like one commit currently in the |
|
Assuming you’re referring to the one adding Apart from that, my plan is to go through the tests and, for every tests that currently uses |
Yes, that's the one, thanks 🙏.
Sorry for the confusion, that's the way. The only thing tests have to do is to actively set all hashes using feature toggles, and they should just set sha1/sha256 while they are at it. Once a crate has been converted, the tests should automatically test both hashes, and make sure they are compiled in. |
bde9c33 to
7bc1409
Compare
|
I added fixtures for SHA256 hashes to |
Co-authored-by: Eliah Kagan <[email protected]>
|
I think I’ve now gotten to a point where I need some feedback in order to be able to continue. (CI is still running, so I also might need to address any issue that comes up there.) Specifically, I’ve got the following questions:
|
Byron
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is great!
The PR runs tests for both hashes through cargo nextest run -p gix-commitgraph --features sha256 --no-fail-fast. Is that correct?
I made a note in Cargo.toml which should lead to the extra run in the justfile to be removed. Please take a look to see if it makes sense.
Last time CI ran it did not complain about any of the generated archives in gix-commitgraph, but when I delete both generated-archives as well as generated-do-not-edit in order to regenerate the archives, git status shows that most files have changed their size by 512 bytes and that one file has been both deleted and modified. (I’m running cargo test --features sha256 on Ubuntu 25.10.) What’s the best way to debug this?
generated-do-no-edit would be the folder to delete for it to restore from archive or regenerate the local cache in generated-do-not-edit. That's all there is to it. With the sha256 versions showing up, I'd expect it to want to create the sha256 versions which seems to have happened as well. It seems to work from my PoV, but I will take a closer look when reviewing properly.
PS: removing generated-archives will regenerate them, and they always appear changed. That's normal, and one would avoid deleting the tracked files.
Is there anything else I would need to change or include before I can mark this PR as ready?
It looks like it's ready for a non-armchair and final review after this one 🎉.
| if hash_kind != gix_hash::Kind::Sha1 { | ||
| path = path.join(hash_kind.to_string()); | ||
| } | ||
| path = path.join(format!("{}-{}", script_identity, family_name())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While it would be more cluttered, I seem to prefer the idea of not nesting hash_kind, but add it to the format string here.
In future, there will be even more of these as another parameter, the one for the refs backend, is added.
Alternatively… what do you think about looking at both parameters and using them for nesting?
path.join(hash_kind).join(refs_kind)
The refs_kind is nothing that exists now, but it would be added at some point.
Yes, that seems like the way to go.
So let's make a mild breaking change and always add the hash_kind to the path via join.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you see this comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad, sorry! I had seen the comment, but wasn’t 100 % sure what you had in mind at first, and then I forgot to address it. 😄 I think making the hash kind a part of the path is a good idea, though.
tests/tools/src/lib.rs
Outdated
| .env_remove("GIT_WORK_TREE") | ||
| .env_remove("GIT_COMMON_DIR") | ||
| .env_remove("GIT_ASKPASS") | ||
| .env_remove("GIT_DEFAULT_HASH") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the remove here isn't needed if we are always overriding. If there is something special, then let's keep this and document it though.
gix-commitgraph/tests/access/mod.rs
Outdated
| "overflow happened, can't represent huge dates" | ||
| ); | ||
| assert_eq!( | ||
| for hash_kind in gix_hash::Kind::all() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could all() rather return an iterator that hands out owned versions of hash_kind? I think that should work with impl Iterator<Item = Kind>.
gix-commitgraph/Cargo.toml
Outdated
|
|
||
| [features] | ||
| ## Support for SHA256 hashes and digests. | ||
| sha256 = ["gix-hash/sha256"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'd want to use dev-dependencies to always add sha256 to the features when testing. That way, there is only one way to run tests: all of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That means removing the feature here, is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, let's start with that.
This means we say that plumbing will never set hash features, and it expects to be able to see the hash to be used at runtime via gix-hash::Kind. Most implementations do that already, and I think this should work for most cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, removed.
|
As I am looking at Git for how they tackle this I wonder this: Should we, rather than changing every test and looping over it, prefer to build with all hashes, but run tests for each hash selectively, maybe by setting an environment variable that controls which hash is under test? Performance-wise it should be very similar, even though development is a bit less convenient as one will have to run twice with different flags or environment variables set. There, having the tests run everything automatically, always, and what we have now, seems better. Also, how do we handle expectations on hashes, particularly those with insta? It basically requires to chose one hash and find/replace the actual output to match that. I.e. So I think it's fine to keep this PR as is as it's happy with this simple and convenient solution, and we can probably layer moer on top of that when a need arises. |
|
I’m glad you mention this other option because I’m still very hesitant to force every test to be rewritten to make it loop over hash kinds as well. What I could try instead is adding an environment variable, e. g. |
|
I’ve added a |
That's fair, and probably that's best whenever there is no actual assertions against a hash. The manual looping is always something one can implement if that's a good choice in specific cases. With such a system, we'd need just one more job to run the SHA256 version of the test suite, which just sets an environment variable.
Oh, right, the If In any case, it seems we'd have to find a way to get CI green. Something that I'd not want to lose is to see what the best possible size is, i.e. if just one hash is supported and that is the smallest hash. But then one would have to test the individual crate just with one hash kind which may also be difficult in future. So… maybe just change the size expectations to the new value, while also using |
|
Oh, and what happens if the crate is now tested individually? |
|
We could add the following line to This would duplicate the dependency, though. |
Yes, such a line would be needed via Lastly, it looks like there will never be a need to add
|
Co-authored-by: Eliah Kagan <[email protected]>
`gix-index-tests` is not ready yet as it has hard-coded expectations that expect `gix_hash::Kind::Sha1` in certain places.
|
CI is now green! Quick summary of this PR’s changes:
|
Byron
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exciting!
It adds GIX_TEST_HASH which affects the default hash used in fixture scripts via GIT_DEFAULT_HASH. (We might want to consider renaming to GIX_TEST_FIXTURE_HASH or something that makes it clearer this is related to fixtures.)
Yes, I'd also like that.
I left a comment, but ideally it's applied to all the numbers so we keep track of the SHA1 versions.
gix-commitgraph/Cargo.toml
Outdated
|
|
||
| [features] | ||
| ## Support for SHA256 hashes and digests. | ||
| sha256 = ["gix-hash/sha256"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, let's start with that.
This means we say that plumbing will never set hash features, and it expects to be able to see the hash to be used at runtime via gix-hash::Kind. Most implementations do that already, and I think this should work for most cases.
gix-diff/src/tree/visit.rs
Outdated
| let ceiling = 72; | ||
| assert!( | ||
| actual <= 48, | ||
| "{actual} <= 48: this type shouldn't grow without us knowing" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we record the old number? Maybe by doing:
let sha1 = 48;
let sha256_extra = 24
actual <= sha1 + sha256_extraThis will just help document how much memory can be saved by not compiling for SHA256.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea and have changed all occurrences to explicitly contain that piece of information.
| authors = ["Sebastian Thiel <[email protected]>"] | ||
| license = "MIT OR Apache-2.0" | ||
| description = "Please use `gix-<thiscrate>` instead ('git' -> 'gix')" | ||
| description = "Tests for the gix-pack crate" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch, thanks!
| if hash_kind != gix_hash::Kind::Sha1 { | ||
| path = path.join(hash_kind.to_string()); | ||
| } | ||
| path = path.join(format!("{}-{}", script_identity, family_name())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you see this comment?
justfile
Outdated
| cargo check -p gix-features --features zlib | ||
| cargo check -p gix-features --features cache-efficiency-debug | ||
| cargo check -p gix-commitgraph --all-features | ||
| cargo check -p gix-commitgraph --features sha256 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never mind, I think this is related to the comment at the top. What follows is what I wrote before.
Now I wonder: does gix-commitgraph still need its own feature? Or could it be told by gix_hash::Kind what it should expect?
My hope is that this can all be done with runtime values, and not forwarding sha1/sha256 features should make that clearer as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What surprises me is that there is no way to check what hash format the graph is using. Could we have an implementation of object_hash (which exists on File) for the Graph, and assert on the value.
But then we'd need a way to get the currently set hash, which had to come from gix-testtools.
I think this would be very useful to have and some assertion should definitely be made.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you thinking about something like the following:
/// The kind of hash used in this `Graph`.
///
/// Note that it is always conforming to the hash used in the owning repository.
///
/// # Panics
///
/// If the graph does not contain any `File`.
pub fn object_hash(&self) -> gix_hash::Kind {
self.files
.first()
.map(super::File::object_hash)
.expect("graph to have at least one file")
}|
@Byron I’ve merged |
This PR adds the
sha256feature togix-commitgraph. I opened it as a draft, so we have a space where we can discuss how to best approach testing this new feature.This PR was triggered by a comment in #2359.