-
-
Notifications
You must be signed in to change notification settings - Fork 32.3k
gh-136599: Improve long_hash #136600
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
gh-136599: Improve long_hash #136600
Conversation
Lib/test/test_long.py
Outdated
assert hash(10) == 10 | ||
assert hash(-1) == -2 | ||
assert hash(-2**61) != -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert hash(10) == 10 | |
assert hash(-1) == -2 | |
assert hash(-2**61) != -1 | |
assertEqual(hash(10), 10) | |
assertEqual(hash(-1), -2) | |
assertNotEqual(hash(-2**61), -1) |
Such methods are generally preferred (e.g. the above test) as they provide better errors.
--i; | ||
x = ((x << PyLong_SHIFT)); | ||
x += v->long_value.ob_digit[i]; | ||
assert(x < _PyHASH_MODULUS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert(x < _PyHASH_MODULUS); | |
if (x >= _PyHASH_MODULUS) | |
x -= _PyHASH_MODULUS; |
This assert will trigger for values like hash(2**31)
for 32bit targets which use the default #define PYLONG_BITS_IN_DIGIT 30
Lines 132 to 138 in 0d4fd10
/* PYLONG_BITS_IN_DIGIT describes the number of bits per "digit" (limb) in the | |
* PyLongObject implementation (longintrepr.h). It's currently either 30 or 15, | |
* defaulting to 30. The 15-bit digit option may be removed in the future. | |
*/ | |
#ifndef PYLONG_BITS_IN_DIGIT | |
#define PYLONG_BITS_IN_DIGIT 30 | |
#endif |
because
Line 170 in 0d4fd10
typedef Py_ssize_t Py_hash_t; |
and thus
PyHASH_BITS
is 31cpython/Include/cpython/pyhash.h
Lines 12 to 18 in 0d4fd10
#if SIZEOF_VOID_P >= 8 | |
# define PyHASH_BITS 61 | |
#else | |
# define PyHASH_BITS 31 | |
#endif | |
#define PyHASH_MODULUS (((size_t)1 << _PyHASH_BITS) - 1) |
I suggest to add more tests to test_long_hash
around these corner cases for such build configurations:
>>> hash(-2**31 - 2)
-3
>>> hash(-2**31 - 1)
-2
>>> hash(-2**31)
-2
>>> hash(2**31)
1
>>> hash(2**31 + 1)
2
>>> hash(2**31 + 2)
3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a check that hash(-2**31)
is not -1
. For the others I has a bit hesitant as the values are implementation details (document as implementation details here: https://docs.python.org/3/library/stdtypes.html#hashing-of-numeric-types). To add tests I would have to get the (private) value of PyHASH_MODULUS
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, your link also mentions that PyHASH_MODULUS
is available via https://docs.python.org/3/library/sys.html#sys.hash_info.modulus, which is e.g. used in
cpython/Lib/test/test_numeric_tower.py
Line 14 in 283b050
_PyHASH_MODULUS = sys.hash_info.modulus |
and some other tests. But yeah, seems to be an implementation detail. Don't know for sure whether it's ok to use it in tests - but I think so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PyHASH_MODULUS is not private anymore. And it was documented in the algorithm description before.
Lib/test/test_long.py
Outdated
@@ -1693,5 +1693,11 @@ class MyInt(int): | |||
# GH-117195 -- This shouldn't crash | |||
object.__sizeof__(1) | |||
|
|||
def test_long_hash(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on introducing hash tests in test_long.py
even though there are
cpython/Lib/test/test_builtin.py
Line 1165 in 3dbe02c
def test_hash(self): |
and
Lib/test/test_hash.py
There are special tests with respect to hashing in many other test modules, e.g.
cpython/Lib/test/test_float.py
Line 635 in 9e5cebd
def test_hash(self): |
so IMHO this is a good fit 👍
long_hash
: the final checklong_hash
by unrolling the first two digitsFor reviewers: the first commit adds a test, the second and third commit unroll the hash calculation for the digits. The performance gain is not a lot, so I would be fine with reverting the last two commits. A test scripts:
On my system I get a fairly consistent performance increase of 10% for the
set(ints)
test. For the other benchmarks my system (thermal throttled laptop) is not stable enough (the benchmarks are also a bit dominated by the creation of new integers).