Support getting CityHash64 values for raw byte inputs #69
Comments
[deleted previous paragraph because I think it was incorrect] It's been a while since I worked on this but
Yeah there may be some translation here that is preventing the above results from being different. I will look into this a bit later. CityHash64 and FarmHash64 provide two types of functions:
If you are looking for a hash that's stable across implementations, you probably want to use a fingerprint (in FarmHash, the function is literally called Fingerprint, while in CityHash, it is the function that does not take a seed). So you should check whether the Rust implementation is a fingerprint or not and use the seedless function when trying to match it. That said, the fingerprint function ( Regarding matching the official spec, rather than studying
Ahh - thanks for pointing out that the bytes and Unicode objects have the same hash. Using the technique at How to memory dump an object?
I'm interested in both the fingerprints and the seed functions. I had also tried the fingerprints and they didn't match for me either.
I'm getting different hash value outputs from python-cityhash than from other implementations. E.g.
fasthash::city::Hash64 - Rust
vs
I assume that is because the Python implementation hashes the entire String data structure, not just the 5 bytes of "hello".
That of course makes sense for the goal of a fast hash algorithm for arbitrary data structures.
But it would be nice to also have a way to hash arbitrary byte arrays. Perhaps some sort of argument "raw=True" could be added to the functions?
As an aside, it is frustratingly hard to find any official examples of hash values or test vectors from CityHash official sources! They are absent from the presentation at CityHash: Fast Hash Functions for Strings. The city-test.cc code is inscrutable. That Rust example is the best I've found. A sad state of affairs.
So I'll add one more, and at the same time more clearly demonstrate the collision issue that was revealed by djb, Jean-Philippe Aumasson, and Martin Boßlet at 29C3: Hash-flooding DoS reloaded: attacks and defenses. I modified their poc citycollisions-20120730.tar.gz code (gone from their original web site, but preserved by the amazing archive.org!) to be clearer about how to call the code to reproduce their collisions.
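To make the raw-bytes point concrete, here is a minimal sketch of what "hash just the 5 bytes of hello" means on the Python side. It is only an illustration under stated assumptions: `fnv1a_64` and `hash_text` are hypothetical helper names, and FNV-1a is a pure-Python stand-in hash, not CityHash64 and not python-cityhash's API. The point is that once a string is explicitly encoded to bytes, the input to the hash is well defined, and a different encoding gives different bytes and therefore a different hash.

```python
# Stand-in 64-bit hash (FNV-1a), used here only because it fits in a few
# lines of pure Python. It is NOT CityHash64 and will not match its output.
FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """Hash exactly the bytes given, nothing else."""
    h = FNV64_OFFSET
    for b in data:
        h ^= b
        h = (h * FNV64_PRIME) & MASK64  # keep the state at 64 bits
    return h


def hash_text(text: str, encoding: str = "utf-8") -> int:
    """Hypothetical helper: encode explicitly, then hash the raw bytes."""
    return fnv1a_64(text.encode(encoding))


raw = b"hello"
print(len(raw), hex(fnv1a_64(raw)))

# UTF-8 encoding of "hello" is the same 5 bytes, so the hashes agree.
print(hash_text("hello") == fnv1a_64(raw))

# UTF-16 encoding is 10 different bytes, so the hash differs.
utf16 = "hello".encode("utf-16-le")
print(len(utf16), hex(fnv1a_64(utf16)))
```

When comparing against another implementation such as the Rust one, the same idea applies: first confirm both sides are fed identical byte sequences, and only then compare seedless against seedless or seeded against seeded.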