-
-
Notifications
You must be signed in to change notification settings - Fork 410
Add ObjectId::Sha256 and Kind::Sha256
#2292
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Byron
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems it's totally possible to start a new review per commit, sorry for that.
Generally, I think this kind of 'manual' testing is perfectly fine. Ideally we reach the same test-coverage for SHA256 as for SHA1, while maybe even covering more ground or doing some cleanup. This crate is one of the oldest, and it probably shows.
| #[test] | ||
| fn it_detects_equality() { | ||
| #[cfg(feature = "sha256")] | ||
| fn it_detects_inequality_sha256() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be interesting to see inequality between hashes of different kinds. Of course we know this will always be true, but… it's to be sure? Maybe make both have the same prefix, so an incorrect implementation could get it wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes total sense! Are you suggesting a test such as the following one?
#[test]
#[cfg(all(feature = "sha1", feature = "sha256"))]
fn it_detects_inequality_sha1_and_sha256() {
let prefix_sha1 = gix_hash::Prefix::new(&hex_to_id("bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"), 7).unwrap();
let prefix_sha256 = gix_hash::Prefix::new(
&hex_to_id("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"),
7,
)
.unwrap();
assert_eq!(prefix_sha256.cmp(&prefix_sha1), Ordering::Greater);
assert_eq!(prefix_sha1.to_string(), "bbbbbbb");
assert_eq!(prefix_sha256.to_string(), "aaaaaaa");
}
Prefix has #[derive(Ord, PartialOrd)], so this is what the default implementation gives us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that would be it.
We'd probably also want to ensure we have a test that turns both on at the same time then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m not entirely sure I understand what you mean by “a test that turns both on at the same time”. Can you elaborate? :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I meant a test that runs if both feature toggles are enabled so both hashes are available.
Byron
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And my second comment, this should have been part of the 'review' above.
|
I’ve now finally been able to address your feedback and a couple of notes that I had taken. Most of my recent changes are related to polishing the API: normalizing the spelling to Observation
|
Byron
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot, I glanced at everything and it looked good to me.
However, there is this one big issue of having to find a solution for the Hasher.
gix-hash/src/hasher.rs
Outdated
| match kind { | ||
| crate::Kind::Sha1 => Hasher::default(), | ||
| #[cfg(feature = "sha256")] | ||
| crate::Kind::Sha256 => Hasher::default(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't look right, I'd put the PR back to draft until a Sha256 hashes is available.
I also don't think it can be unimplemented!(), but… maybe it would be an intermediate option if there is no other way now and there is a plan to fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching that! I should have noticed, but didn’t that Hasher is particular to SHA1. I was too focused on other things. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, no problem. Putting the PR back to draft so you can mark it as ready when ready.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sha256 now uses the correct hasher. I hope I didn’t miss anything and marked the PR as ready again.
|
|
||
| /// Returns the longest hash we support. | ||
| #[inline] | ||
| pub const fn longest() -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that there is this logic, it feels like tests for longest/shortest would be good. They'd have to repeat the feature flags so I am not a huge fan, but it really feels like there should be something in place to prevent this from going wrong, ever.
I'd be OK if you said 'better not' as this seems to be correct and when Git introduces a new hash in 10y+, we'd be looking at a completely different environment anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Byron
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just before finally working on merging this PR I think I found one more not-so-minor thing that could be addressed beforehand.
Do you feel the motivation of doing that? If not, I'd understand, and a wink to me will be enough and I will take it from here.
Thanks
gix-hash/Cargo.toml
Outdated
| ## Support for SHA1 digests. | ||
| sha1 = [] | ||
| ## Support for SHA256 digests. | ||
| sha256 = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was about to give this PR my final review and merge it, but noticed that it appears that sha2 should be optional, and probably with it, sha1-checked is as well.
With that, we'd need some new test entries in the just file to assure all combinations of these are compiling and can be tested, for good measure, after all this is very much at the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You’re right, I forgot to make the sha2 dependency optional. I’ve changed that.
In order to facilitate review, I have not yet begun the work to make sha1 optional as well. In its current state, this PR tries to not make any changes to existing behaviour. That being said, feel free to make sha1 optional as well!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds SHA256 hash support to gix-hash by introducing ObjectId::Sha256 and Kind::Sha256 variants alongside the existing SHA1 support. The implementation uses feature flags for conditional compilation, allowing users to opt into SHA256 support.
Key Changes
- Added
ObjectId::Sha256variant andKind::Sha256enum variant with feature-gated compilation - Introduced SHA256-specific constants for digest sizes and empty tree/blob hashes
- Extended all hash-related methods to handle both SHA1 and SHA256 variants via match expressions
- Added comprehensive test coverage for SHA256 functionality across prefix, kind, and hasher modules
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
gix-hash/Cargo.toml |
Adds optional sha2 dependency for SHA256 support via the sha256 feature flag |
gix-hash/src/lib.rs |
Defines SHA256 constants (digest sizes, empty blob/tree hashes) and adds Kind::Sha256 enum variant |
gix-hash/src/kind.rs |
Extends Kind methods to handle SHA256, including from_hex_len, len_in_bytes, and hash-specific factory methods |
gix-hash/src/object_id.rs |
Adds ObjectId::Sha256 variant and updates all methods to handle both hash types |
gix-hash/src/oid.rs |
Extends borrowed oid type methods to support SHA256 references and conversions |
gix-hash/src/hasher.rs |
Refactors Hasher from a struct to an enum supporting both SHA1 and SHA256 hash algorithms |
gix-hash/src/io.rs |
Adds match arm for SHA256 in the Write constructor |
gix-hash/src/verify.rs |
Minor documentation updates (ID -> id) |
gix-hash/tests/hash/prefix.rs |
Adds comprehensive SHA256 test coverage for prefix operations and cross-hash comparisons |
gix-hash/tests/hash/oid.rs |
Adds is_null test for SHA256 |
gix-hash/tests/hash/kind.rs |
Tests shortest() and longest() methods with various feature flag combinations |
gix-hash/tests/hash/hasher.rs |
Updates size assertions for hasher types when SHA256 is enabled |
justfile |
Adds build and test commands with sha256 feature flag enabled |
Cargo.lock |
Adds sha2 crate dependency |
| 0..=40 => Kind::Sha1, | ||
| 0..=SIZE_OF_SHA1_HEX_DIGEST => Kind::Sha1, | ||
| #[cfg(feature = "sha256")] | ||
| 0..=SIZE_OF_SHA256_HEX_DIGEST => Kind::Sha256, |
Copilot
AI
Dec 31, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The from_hex_len logic is incorrect when both SHA1 and SHA256 are enabled. The first match arm 0..=SIZE_OF_SHA1_HEX_DIGEST (0..=40) will catch all values including those intended for SHA256 (41..64), preventing the second arm from ever being reached. The ranges overlap completely, so any hex_len value will always match the first arm and return Kind::Sha1.
This should use non-overlapping ranges. For example:
- First arm:
0..=SIZE_OF_SHA1_HEX_DIGESTfor SHA1 - Second arm:
SIZE_OF_SHA1_HEX_DIGEST + 1..=SIZE_OF_SHA256_HEX_DIGESTfor SHA256 (i.e., 41..=64)
| 0..=SIZE_OF_SHA256_HEX_DIGEST => Kind::Sha256, | |
| SIZE_OF_SHA1_HEX_DIGEST + 1..=SIZE_OF_SHA256_HEX_DIGEST => Kind::Sha256, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not how it works.
It matches Sha1 while it can, then it matches Sha256.
| pub fn from_bytes_or_panic(bytes: &[u8]) -> Self { | ||
| match bytes.len() { | ||
| 20 => Self::Sha1(bytes.try_into().expect("prior length validation")), | ||
| SIZE_OF_SHA1_DIGEST => Self::Sha1(bytes.try_into().expect("prior length validation")), |
Copilot
AI
Dec 31, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The from_bytes_or_panic method doesn't handle SHA256 digests when the sha256 feature is enabled. It only checks for SHA1 (20 bytes) but should also check for SHA256 (32 bytes) when that feature is available. Add a match arm for SIZE_OF_SHA256_DIGEST that returns Self::Sha256(bytes.try_into().expect("prior length validation")).
| SIZE_OF_SHA1_DIGEST => Self::Sha1(bytes.try_into().expect("prior length validation")), | |
| SIZE_OF_SHA1_DIGEST => Self::Sha1(bytes.try_into().expect("prior length validation")), | |
| #[cfg(feature = "sha256")] | |
| SIZE_OF_SHA256_DIGEST => Self::Sha256(bytes.try_into().expect("prior length validation")), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch!
gix-hash/tests/hash/prefix.rs
Outdated
| let prefix = gix_hash::Prefix::new(&id, 6).unwrap(); | ||
| assert_eq!(prefix.cmp_oid(&id), Ordering::Equal); | ||
| assert_eq!( | ||
| prefix.cmp_oid(&hex_to_id("a920bbffffffffffffffffffffffffffffffffff")), |
Copilot
AI
Dec 31, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hex string comparison appears incorrect for SHA256. The test is comparing a SHA256 prefix (64 hex chars) against a SHA1 hash (40 hex chars with 'f' padding). This will likely fail because hex_to_id("a920bbffffffffffffffffffffffffffffffffff") produces a SHA1 hash (40 hex chars), not a SHA256 hash (64 hex chars). For SHA256, the comparison should use a 64-character hex string like "a920bbffffffffffffffffffffffffffffffffffffffffffffffffffffffffff".
| prefix.cmp_oid(&hex_to_id("a920bbffffffffffffffffffffffffffffffffff")), | |
| prefix.cmp_oid(&hex_to_id( | |
| "a920bbffffffffffffffffffffffffffffffffffffffffffffffffffffffffff" | |
| )), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
It is an additive feature, and makes the `Sha256` variant available in `gix_hash::Kind`.
d337afa to
76a025f
Compare
Byron
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much!
It took me a while to throughly review this, but I think it should be Ok for now ("he says with a trembling voice").
- It feels wrong to put
empty_blob_sha256andempty_tree_sha256intooidas the comment saysoidis related toSha1, but I’m unsure how to best resolve this tension.
Documenting oid as related to hashes, or specific hashes, works for me, but maybe I am missing something.
- There’s inconsistencies with respect to spelling:
SHA256,SHA-256,Sha256.
Yes, that's quite unfortunate. The official way of referring to these seems to beSHA-1andSHA-256respectively, and I'd love this to be done in the docs at least. This PR unifies this towards SHA1 and SHA256, which can also be argued for as it's the capitalised version ofcore.objectFormat. As long as it is consistent, it's fine, with said preference.
- Hardcoded empty trees and empty blobs appear in more than place (for both SHA-1 and SHA-256). I feel like we should have a single source of truth here.
Sometimes one has one type or the other, but one as Deref implemented, which should automatically allow to invoke everything onoidfromObjectId. But it's implemented on ObjectId for practical reasons. Please feel free to submit a PR though if you have something specific in mind.
- Some comments in
ObjectIdwould probably benefit from being rephrased. See e. g.ObjectId::new_sha1orObjectId::from_20_bytes.
Maybe the function names could also be better. Please feel free to submit a PR here as well.
| const EMPTY_TREE_SHA1: &[u8; SIZE_OF_SHA1_DIGEST] = | ||
| b"\x4b\x82\x5d\xc6\x42\xcb\x6e\xb9\xa0\x60\xe5\x4b\xf8\xd6\x92\x88\xfb\xee\x49\x04"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally there is all the constants that were scattered before in one place.
| pub fn from_bytes_or_panic(bytes: &[u8]) -> Self { | ||
| match bytes.len() { | ||
| 20 => Self::Sha1(bytes.try_into().expect("prior length validation")), | ||
| SIZE_OF_SHA1_DIGEST => Self::Sha1(bytes.try_into().expect("prior length validation")), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch!
| pub fn hex_to_buf<'a>(&self, buf: &'a mut [u8]) -> &'a mut str { | ||
| let num_hex_bytes = self.bytes.len() * 2; | ||
| faster_hex::hex_encode(&self.bytes, &mut buf[..num_hex_bytes]).expect("to count correctly") | ||
| } | ||
|
|
||
| /// Write ourselves to `out` in hexadecimal notation. | ||
| #[inline] | ||
| pub fn write_hex_to(&self, out: &mut dyn std::io::Write) -> std::io::Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, why not.
These methods feel oddly familiar, even though they are new.
Ah, now I see they just moved.
gix-hash/src/verify.rs
Outdated
|
|
||
| impl oid { | ||
| /// Verify that `self` matches the `expected` object ID. | ||
| /// Verify that `self` matches the `expected` object id. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't fully internalise why id is better than ID. Maybe because it's ObjectId, it's best to write object id?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this instance, this could have been the context. The comment is in a module called oid which would probably be oId if it were ID. But then, it doesn’t matter terribly much, I guess. Also, I fell ID stands out quite a bit and people generally seem to prefer id these days (though my impression could of course also be wrong).
| fn size_of_try_finalize_return_type_sha1_and_sha256() { | ||
| assert_eq!( | ||
| std::mem::size_of::<Result<ObjectId, gix_hash::hasher::Error>>(), | ||
| 34, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a little strange. Maybe it's padding that makes the tag one byte larger than I think it should be.
gix-hash/tests/hash/prefix.rs
Outdated
| let prefix = gix_hash::Prefix::new(&id, 6).unwrap(); | ||
| assert_eq!(prefix.cmp_oid(&id), Ordering::Equal); | ||
| assert_eq!( | ||
| prefix.cmp_oid(&hex_to_id("a920bbffffffffffffffffffffffffffffffffff")), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 14 out of 15 changed files in this pull request and generated 3 comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
c7c75e2 to
cdf671a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 14 out of 15 changed files in this pull request and generated 10 comments.
| let sha1 = hex_to_id("a920bbffffffffffffffffffffffffffffffffff"); | ||
| assert_eq!(prefix.cmp_oid(&sha1), Ordering::Equal); | ||
|
|
||
| let sha256 = hex_to_id("a920bbffffffffffffffffffffffffffffffffffffffffffffffffffffffffff"); | ||
| assert_eq!(prefix.cmp_oid(&sha256), Ordering::Equal); |
Copilot
AI
Dec 31, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this test, a SHA256 prefix is compared against both a SHA1 hash (line 84-85) and a SHA256 hash (line 87-88), and both return Ordering::Equal. This test seems to demonstrate that prefixes can match hashes of different lengths, which is interesting behavior but potentially confusing. Consider adding a comment explaining why this cross-hash-type comparison returns Equal, as it might be surprising to readers that a SHA256 prefix matches a SHA1 hash.
gix-hash/src/verify.rs
Outdated
|
|
||
| impl oid { | ||
| /// Verify that `self` matches the `expected` object ID. | ||
| /// Verify that `self` matches the `expected` object id. |
Copilot
AI
Dec 31, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment capitalization is inconsistent. Changed from 'object ID' to 'object id', but typically 'ID' is an acronym and should remain capitalized. Consider keeping it as 'object ID' for consistency with other parts of the codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cruessler An alternative would be to call it object identifier. Or use the type name in backticks. But this is also confusing, and the reasoning behind making it lower case is unclear to me.
gix-hash/src/verify.rs
Outdated
| /// Verify that `self` matches the `expected` object id. | ||
| /// | ||
| /// Returns an [`Error`] containing both object IDs if they differ. | ||
| /// Returns an [`Error`] containing both object ids if they differ. |
Copilot
AI
Dec 31, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the previous line, 'object ids' should be 'object IDs' to maintain consistency with standard acronym capitalization.
gix-hash/src/oid.rs
Outdated
| #[must_use] | ||
| pub fn hex_to_buf<'a>(&self, buf: &'a mut [u8]) -> &'a mut str { | ||
| let num_hex_bytes = self.bytes.len() * 2; | ||
| faster_hex::hex_encode(&self.bytes, &mut buf[..num_hex_bytes]).expect("buf must be twice our size") |
Copilot
AI
Dec 31, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The panic message 'buf must be twice our size' is informal and unclear. Consider a more precise message like 'buffer size must be at least twice the hash digest size in bytes' to help developers understand the requirement.
| faster_hex::hex_encode(&self.bytes, &mut buf[..num_hex_bytes]).expect("buf must be twice our size") | |
| faster_hex::hex_encode(&self.bytes, &mut buf[..num_hex_bytes]) | |
| .expect("buffer size must be at least twice the hash digest size in bytes"); |
gix-hash/src/oid.rs
Outdated
| } | ||
|
|
||
| /// Sha1 specific methods | ||
| /// Methods for creating special SHA1 and SHA256 `oid`s |
Copilot
AI
Dec 31, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment says 'Sha1 specific methods' was changed to 'Methods for creating special SHA1 and SHA256 oids', but this is slightly misleading because it's now an implementation block that contains methods for both SHA1 and SHA256. Consider being more precise: 'Methods for creating special-case oids (null, empty blob, empty tree)'.
| /// Methods for creating special SHA1 and SHA256 `oid`s | |
| /// Methods for creating special-case `oid`s (null, empty blob, empty tree) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good!
cdf671a to
6f43ff5
Compare

This is a draft that adds
ObjectId::Sha256andKind::Sha256togix-hash. I’m opening this to get early feedback, in particular from a CI run.Some remarks:
empty_blob_sha256andempty_tree_sha256intooidas the comment saysoidis related toSha1, but I’m unsure how to best resolve this tension.SHA256,SHA-256,Sha256.ObjectId::null_sha1hardcodes 20. Should this beSIZE_OF_SHA1_DIGEST?oid::null_sha1usesSIZE_OF_SHA1_DIGEST.Kind::len_in_hexandKind::len_in_bytesalso hardcode values.ObjectIdwould probably benefit from being rephrased. See e. g.ObjectId::new_sha1orObjectId::from_20_bytes.Sources for the hardcoded values related to SHA-256:
https://github.com/git/git/blob/e85ae279b0d58edc2f4c3fd5ac391b51e1223985/hash.h
https://github.com/git/git/blob/e85ae279b0d58edc2f4c3fd5ac391b51e1223985/hash.c