Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add hash function for AffineScalarFunc class #189

Closed
wants to merge 20 commits into from
Closed
Show file tree
Hide file tree
Changes from 15 commits
Commits
Show all changes
20 commits
Select commit Hold shift + click to select a range
d90c541
Update core.py
MichaelTiemannOSC Jul 5, 2023
5d429fe
Implement hash invariant
MichaelTiemannOSC Jul 7, 2023
40154ce
Fix pickling (broken by last commit)
MichaelTiemannOSC Jul 8, 2023
634db47
Improve efficiency of AffineScalarFunc hash
MichaelTiemannOSC Jul 8, 2023
7cca18c
Update appveyor.yml
MichaelTiemannOSC Jul 11, 2023
af3447c
Revert "Update appveyor.yml"
MichaelTiemannOSC Jul 11, 2023
f3cb615
Replace nose with pytest
MichaelTiemannOSC Jul 11, 2023
5e40c49
Revert "Replace nose with pytest"
MichaelTiemannOSC Jul 11, 2023
cd3b7e0
Update appveyor.yml
MichaelTiemannOSC Jul 11, 2023
a2d4bb1
Update appveyor.yml
MichaelTiemannOSC Jul 11, 2023
d74a9d1
feat: Added hash function for AffineScalarFunc class
NelDav Jan 22, 2024
9a9d6d5
Merge remote-tracking branch 'hash/hash_for_pandas' into implement-ha…
NelDav Jan 25, 2024
3e0b064
fix: Merged hash method from "MichaelTiemannOSC". And created a corre…
NelDav Jan 25, 2024
1e57a2f
refactor: Nose does not like return types. Removed them.
NelDav Jan 25, 2024
e4ef9e1
Merge branch 'master' into implement-hash-for-AffineScalarFunc
NelDav Apr 2, 2024
4d8d268
fix: hash calculation works now. However, hash equality between Varia…
NelDav Apr 4, 2024
605b5cd
fix: Variabel hash is equal for objects which __eq__ returns True.
NelDav Apr 4, 2024
82285df
fix: Pickle is able to serialize/deserialize Variables again
NelDav Apr 5, 2024
3c64ae3
fix: pickling works now
NelDav Apr 5, 2024
185aa0d
fix: derivatives with value 0 are filtered out for hash calculation.
NelDav Apr 6, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
36 changes: 29 additions & 7 deletions uncertainties/core.py
Original file line number Diff line number Diff line change
Expand Up @@ -1756,6 +1756,19 @@ def __bool__(self):

########################################

def __hash__(self):
"""
Calculates the hash for any AffineScalarFunc object.
The hash is calculated from the nominal_value, and the derivatives.

Returns:
int: The hash of this object
"""

ids = [id(d) for d in self.derivatives.keys()]
values_hash = tuple(self.derivatives.values())
return hash((self._nominal_value, tuple(ids), values_hash))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add a case to test_hash like

z_xy = x + y
z_yx = y + x
assert z_xy == z_yx
assert hash(z_xy) == hash(z_yx)

That case fails for me with this implementation but the implementation is dependent on the iteration order of derivatives which depends on in the insertion order, which can vary depending on the order in which operations are performed. Changing the tuples to frozenset or sorting the values before casting to tuple could remove the order dependence.

Also, we could define __hash__ on LinearCombination and use that here instead of constructing it from derivatives directly. Perhaps there is no use for that though.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth trying implementations that cover obvious use cases (such as z_xy == z_yx) and using assertions to raise if we've gone past the limit of our assumptions.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm the problem of a forzenset might be, that the elements of a set must be unique right?
The ids are unique, but the values might not.
I will try to create a test for that...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the example:

a = ufloat(1.23, 2.34)
b = ufloat(1.23, 2.34)
c = ufloat(1.23, 2.34)

x = AffineScalarFunc(1, LinearCombination({a:1, b:2, c:1}))
y = AffineScalarFunc(1, LinearCombination({a:1, b:2, c:2}))
assert x != y                   # Valid
assert hash(x) != hash(y) # Both hashes are identical :(

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am using sorted tuples now, to avoid this problem.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm the problem of a forzenset might be, that the elements of a set must be unique right?

This is not a blocking problem for the implementation of the __hash__. It is okay if unequal objects hash to the same value. Using frozenset would make it a little more likely to get hash collisions, but only in the case of multiple error terms with exactly the same value. Still, sorting should work just as well for the purpose of producing a deterministic hash. In the case of giant calculations with many error terms maybe sorting is slower than keeping a set, but in practice usually only a handful of variables are involved and sorting should be about as quick as creating a set.


# Uncertainties handling:

def error_components(self):
Expand Down Expand Up @@ -2766,6 +2779,22 @@ def std_dev(self, std_dev):

self._std_dev = float(std_dev)

def __hash__(self):
"""
Calculates the hash for any Variable object.
This simply calls the __hash__ method of `AffineScalarFunc` in most cases.
During initialization, the `_linear_part` and `_nominal_value` attributes does not exist.
Because of that, the id returned in case this method is called during initialization.

Returns:
int: The hash of this object
"""
NelDav marked this conversation as resolved.
Show resolved Hide resolved

if hasattr(self, '_linear_part') and hasattr(self, '_nominal_value'):
return super().__hash__()
else:
return id(self)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test failure is coming from this part. __hash__ has to give the same value every time. To avoid the error, one option is to change __hash__ here to:

return hash((self._nominal_value, (id(self),), (1.,)))

and update __init__ to set self._nominal_value before the {self: 1.} dict is created. __setstate__ also needs to be updated to make sure _nominal_value is set before it creates the self-referential dict (but see my other comment about wanting to avoid tuples).

It is tempting to try to remove the self-reference but I have seen another case in a symbolic algebra library where a set up similar to this one was the best option.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are totally right, the hash must be the same every time. Not sure, how I could miss that.


# The following method is overridden so that we can represent the tag:
def __repr__(self):

Expand All @@ -2776,13 +2805,6 @@ def __repr__(self):
else:
return "< %s = %s >" % (self.tag, num_repr)

def __hash__(self):
# All Variable objects are by definition independent
# variables, so they never compare equal; therefore, their
# id() are allowed to differ
# (http://docs.python.org/reference/datamodel.html#object.__hash__):
return id(self)

def __copy__(self):
"""
Hook for the standard copy module.
Expand Down
18 changes: 18 additions & 0 deletions uncertainties/test_uncertainties.py
Original file line number Diff line number Diff line change
Expand Up @@ -2338,3 +2338,21 @@ def test_correlated_values_correlation_mat():
assert arrays_close(
numpy.array(cov_mat),
numpy.array(uncert_core.covariance_matrix([x2, y2, z2])))

def test_hash():
'''
Tests the invariance that if x==y, then hash(x)==hash(y)
'''

x = ufloat(1.23, 2.34)
y = ufloat(1.23, 2.34)
# nominal values and std_dev terms are equal, but...
assert x.n==y.n and x.s==y.s
# ...x and y are independent variables, therefore not equal as uncertain numbers
assert x != y
assert hash(x) != hash(y)

# the equation (2x+x)/3 is equal to the variable x, so...
assert ((2*x+x)/3)==x
# ...hash of the equation and the variable should be equal
assert hash((2*x+x)/3)==hash(x)
Loading