gh-136599: Improve long_hash #136600

eendebakpt · 2025-07-12T21:36:40Z

There is a corner case in the C code of long_hash that is not tested: the final check

    if (x == (Py_uhash_t)-1)
        x = (Py_uhash_t)-2;
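To illustrate what that final check does from the Python side, here is a minimal sketch of the integer hash as described in the "Hashing of numeric types" documentation. The helper name `int_hash_model` is made up for this example, and the asserts assume CPython's behaviour; this is a model, not the C implementation.

```python
import sys

def int_hash_model(n: int) -> int:
    """Pure-Python model of CPython's int hash (hypothetical helper)."""
    modulus = sys.hash_info.modulus  # 2**61 - 1 on 64-bit builds, 2**31 - 1 on 32-bit
    h = abs(n) % modulus
    if n < 0:
        h = -h
    # -1 is reserved as the error return value in the C API, so it is remapped.
    if h == -1:
        h = -2
    return h

m = sys.hash_info.modulus
# Every value here reduces to -1 (or is already -2), so each must hash to -2.
for n in (-1, -2, -(m + 1), -(2 * m + 1)):
    assert hash(n) == int_hash_model(n) == -2
```

The inputs `-(m + 1)` and `-(2 * m + 1)` are the multi-digit cases that exercise the remapping in long_hash itself rather than any small-int shortcut.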

We can also improve the performance of long_hash by unrolling the hash calculation for the first two digits.
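A rough Python model of the idea, for readers who do not want to dig through the C code. The function names (`digits_of`, `step`, `hash_loop`, `hash_unrolled`) are invented for this sketch, and the guard `2 * SHIFT < BITS` is my assumption about when the first two digits can be combined without a modular reduction; the actual C code in the PR may be structured differently.

```python
import sys

SHIFT = sys.int_info.bits_per_digit   # 30 (default) or 15
MODULUS = sys.hash_info.modulus       # 2**61 - 1 or 2**31 - 1
BITS = MODULUS.bit_length()           # 61 or 31

def digits_of(n):
    """Base-2**SHIFT digits of abs(n), least significant first."""
    n = abs(n)
    mask = (1 << SHIFT) - 1
    out = []
    while n:
        out.append(n & mask)
        n >>= SHIFT
    return out

def step(x, digit):
    """One loop iteration: rotate x left by SHIFT bits (mod 2**BITS - 1), add digit, reduce."""
    x = ((x << SHIFT) & MODULUS) | (x >> (BITS - SHIFT))
    x += digit
    if x >= MODULUS:
        x -= MODULUS
    return x

def hash_loop(n):
    """Generic digit loop, most significant digit first."""
    x = 0
    for d in reversed(digits_of(n)):
        x = step(x, d)
    return x

def hash_unrolled(n):
    """Same result, but the two leading digits are combined without reduction."""
    ds = digits_of(n)
    if len(ds) < 2 or 2 * SHIFT >= BITS:
        return hash_loop(n)          # unrolling is not safe or not useful here
    i = len(ds) - 1
    x = ds[i]                        # first digit: always below MODULUS
    i -= 1
    x = (x << SHIFT) + ds[i]         # second digit: below 2**(2*SHIFT) < MODULUS
    while i > 0:
        i -= 1
        x = step(x, ds[i])
    return x

for n in (0, 1, 2**30, 2**31, 2**60 - 1, 2**61, 2**120 + 12345):
    assert hash_loop(n) == hash_unrolled(n) == hash(n)
```

On a 64-bit build with 30-bit digits the guard holds (60 < 61), so the unrolled path is taken for every value with at least two digits.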

For reviewers: the first commit adds a test; the second and third commits unroll the hash calculation for the first two digits. The performance gain is small, so I would be fine with reverting the last two commits. A test script:

import pyperf

setup = """

z = 2 << 30
two_digit_ints = list(range(z, z + 2000))

def long_hash_small_int():
    for _ in range(4):
        for ii in range(0,250):
            hash(ii)

def long_hash_one_digit():
    z = 1000
    for ii in range(z, z + 1000):
        hash(ii)

def long_hash_two_digit():
    z = 2 << 30
    for ii in range(z, z + 1000):
        hash(ii)

def long_hash_multi_digit():
    z = 1 << 30 << 30 << 30 << 30
    for ii in range(z, z + 1000):
        hash(ii)
"""

runner = pyperf.Runner()
runner.timeit(name="long_hash_small_int", stmt="long_hash_small_int()", setup=setup)
runner.timeit(name="long_hash_one_digit", stmt="long_hash_one_digit()", setup=setup)
runner.timeit(name="long_hash_two_digit", stmt="long_hash_two_digit()", setup=setup)
runner.timeit(name="long_hash_multi_digit", stmt="long_hash_multi_digit()", setup=setup)
runner.timeit(name="set(ints)", stmt="set(two_digit_ints)", setup=setup)

On my system I get a fairly consistent performance increase of 10% for the set(ints) test. For the other benchmarks my system (a thermally throttled laptop) is not stable enough to draw conclusions, and those benchmarks are also partly dominated by the creation of new integers.
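One way to reduce the integer-creation overhead mentioned here is to build the integers once and time only the hash calls. A minimal sketch using the stdlib `timeit` module instead of pyperf; the function name `hash_all` and the repeat counts are arbitrary choices for illustration, not part of the script above.

```python
import timeit

z = 2 << 30
two_digit_ints = list(range(z, z + 2000))  # created once, outside the timed code

def hash_all(values):
    # Only hash() calls and loop overhead are timed; no new ints are created.
    for v in values:
        hash(v)

# Best of 5 runs, 50 passes over the list per run.
elapsed = min(timeit.repeat(lambda: hash_all(two_digit_ints), number=50, repeat=5))
print(f"best of 5: {elapsed:.6f}s for 50 passes over {len(two_digit_ints)} ints")
```

Loop overhead still contributes, so absolute numbers from this are only useful for comparing two builds on the same machine.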

Issue: Improve long_hash #136599

StanFromIreland · 2025-07-13T05:32:29Z

Lib/test/test_long.py

+        assert hash(10) == 10
+        assert hash(-1) == -2
+        assert hash(-2**61) != -1


Suggested change

-        assert hash(10) == 10
-        assert hash(-1) == -2
-        assert hash(-2**61) != -1
+        self.assertEqual(hash(10), 10)
+        self.assertEqual(hash(-1), -2)
+        self.assertNotEqual(hash(-2**61), -1)

The unittest assertion methods are generally preferred (e.g. in the test above), as they provide better error messages.
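For reference, a self-contained sketch of the same checks written with the unittest methods. The class name `LongHashSketch` is invented for this example; the real test lives in Lib/test/test_long.py and may differ.

```python
import unittest

class LongHashSketch(unittest.TestCase):
    def test_long_hash(self):
        # On failure these report both operands, unlike a bare assert,
        # and they are not stripped when Python runs with -O.
        self.assertEqual(hash(10), 10)
        self.assertEqual(hash(-1), -2)
        self.assertNotEqual(hash(-2**61), -1)

# Run the case programmatically so the snippet also works outside a test runner.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(LongHashSketch)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```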

chris-eibl · 2025-07-13T08:57:32Z

Objects/longobject.c

+    --i;
+    x = ((x << PyLong_SHIFT));
+    x += v->long_value.ob_digit[i];
+    assert(x < _PyHASH_MODULUS);


Suggested change

-    assert(x < _PyHASH_MODULUS);
+    if (x >= _PyHASH_MODULUS)
+        x -= _PyHASH_MODULUS;

This assert will trigger for values like hash(2**31) on 32-bit targets that use the default #define PYLONG_BITS_IN_DIGIT 30

cpython/Include/pyport.h

Lines 132 to 138 in 0d4fd10

    /* PYLONG_BITS_IN_DIGIT describes the number of bits per "digit" (limb) in the
     * PyLongObject implementation (longintrepr.h). It's currently either 30 or 15,
     * defaulting to 30. The 15-bit digit option may be removed in the future.
     */
    #ifndef PYLONG_BITS_IN_DIGIT
    #define PYLONG_BITS_IN_DIGIT 30
    #endif

because

cpython/Include/pyport.h

Line 170 in 0d4fd10

    typedef Py_ssize_t Py_hash_t;

and thus PyHASH_BITS is 31

cpython/Include/cpython/pyhash.h

Lines 12 to 18 in 0d4fd10

    #if SIZEOF_VOID_P >= 8
    # define PyHASH_BITS 61
    #else
    # define PyHASH_BITS 31
    #endif

    #define PyHASH_MODULUS (((size_t)1 << _PyHASH_BITS) - 1)
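The arithmetic behind this can be replayed in Python with the 32-bit parameters hard-coded. This is only a sketch: it uses Python's unbounded integers, so it does not model overflow of a 32-bit Py_uhash_t for larger leading digits, and the variable names are mine.

```python
# Parameters of a 32-bit build with the default 30-bit digits (hard-coded assumptions).
SHIFT = 30
HASH_BITS = 31
MODULUS = (1 << HASH_BITS) - 1

n = 2**31
mask = (1 << SHIFT) - 1
low, high = n & mask, n >> SHIFT   # two digits: low = 0, high = 2

# Unrolled first and second digit, as in the diff under review.
x = high
x = (x << SHIFT) + low
assert x == 2**31
assert x >= MODULUS                # so assert(x < _PyHASH_MODULUS) would fail in C

# With the suggested conditional subtraction the value is reduced correctly.
if x >= MODULUS:
    x -= MODULUS
assert x == 1
```

The final value 1 is what 2**31 reduces to modulo 2**31 - 1.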

I suggest adding more tests to test_long_hash around these corner cases for such build configurations:

    >>> hash(-2**31 - 2)
    -3
    >>> hash(-2**31 - 1)
    -2
    >>> hash(-2**31)
    -2
    >>> hash(2**31)
    1
    >>> hash(2**31 + 1)
    2
    >>> hash(2**31 + 2)
    3
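These values can be cross-checked on any machine with a small model that takes the modulus as a parameter. A sketch, assuming the documented reduction rule for integers; `model_hash` is a hypothetical helper, not a CPython function.

```python
def model_hash(n: int, modulus: int) -> int:
    """Model of the int hash for a given modulus (hypothetical helper)."""
    h = abs(n) % modulus
    if n < 0:
        h = -h
    return -2 if h == -1 else h   # -1 is reserved, so it is remapped to -2

M32 = 2**31 - 1   # modulus of a 32-bit build

expected = {
    -2**31 - 2: -3,
    -2**31 - 1: -2,
    -2**31: -2,     # reduces to -1, which is remapped
    2**31: 1,
    2**31 + 1: 2,
    2**31 + 2: 3,
}
for n, want in expected.items():
    assert model_hash(n, M32) == want
```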

eendebakpt · 2025-07-13

I added a check that hash(-2**31) is not -1. For the others I was a bit hesitant, as the values are implementation details (documented as such here: https://docs.python.org/3/library/stdtypes.html#hashing-of-numeric-types). To add tests I would have to get the (private) value of PyHASH_MODULUS.

chris-eibl · 2025-07-13

Interesting, your link also mentions that PyHASH_MODULUS is available via https://docs.python.org/3/library/sys.html#sys.hash_info.modulus, which is e.g. used in

cpython/Lib/test/test_numeric_tower.py

Line 14 in 283b050

_PyHASH_MODULUS = sys.hash_info.modulus

and some other tests. But yeah, seems to be an implementation detail. Don't know for sure whether it's ok to use it in tests - but I think so?

skirpichev · 2025-07-14

PyHASH_MODULUS is not private anymore. And it was documented in the algorithm description before.
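Building on that, corner-case tests can be written relative to `sys.hash_info.modulus` so they hold on both 32-bit and 64-bit builds. A sketch only; the class and method names are invented here and are not taken from the PR.

```python
import sys
import unittest

class LongHashModulusSketch(unittest.TestCase):
    def test_around_modulus(self):
        m = sys.hash_info.modulus
        # Non-negative values reduce modulo m.
        self.assertEqual(hash(m - 1), m - 1)
        self.assertEqual(hash(m), 0)
        self.assertEqual(hash(m + 1), 1)
        # Negative values mirror that, except that -1 is remapped to -2.
        self.assertEqual(hash(-m), 0)
        self.assertEqual(hash(-(m + 1)), -2)
        self.assertEqual(hash(-(m + 2)), -2)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(LongHashModulusSketch)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because every expected value is derived from `m`, no build-specific constants need to be hard-coded in the test.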

chris-eibl · 2025-07-13T09:18:40Z

Lib/test/test_long.py

@@ -1693,5 +1693,11 @@ class MyInt(int):
        # GH-117195 -- This shouldn't crash
        object.__sizeof__(1)

+    def test_long_hash(self):


+1 on introducing hash tests in test_long.py even though there are

cpython/Lib/test/test_builtin.py

Line 1165 in 3dbe02c

def test_hash(self):

and Lib/test/test_hash.py

There are special tests with respect to hashing in many other test modules, e.g.

cpython/Lib/test/test_float.py

Line 635 in 9e5cebd

def test_hash(self):

so IMHO this is a good fit 👍

eendebakpt added 3 commits July 12, 2025 22:21

Add test for long_hash

146f5aa

Unroll first digit calculation in long_hash

a162da2

Unroll second digit calculation in long_hash

4f9fc76

bedevere-app bot added the awaiting review label Jul 12, 2025

bedevere-app bot mentioned this pull request Jul 12, 2025

Improve long_hash #136599

Open

eendebakpt added 3 commits July 12, 2025 23:37

whitespace

07bce4b

add gh number

32341de

Merge branch 'long_hash' of github.com:eendebakpt/cpython into long_hash

194fb7a

StanFromIreland reviewed Jul 13, 2025


chris-eibl reviewed Jul 13, 2025


eendebakpt added 3 commits July 13, 2025 20:58

review comments

a48860f

fix test on wasi

6d3754b

add news entry

fec9fbe

rhettinger requested a review from tim-one July 14, 2025 03:14



