Type-related issues in pset hashes #22

Open
elcorto opened this issue Mar 7, 2024 · 0 comments
elcorto commented Mar 7, 2024

joblib.hash may be too specific for our purposes in some cases, since it is type-sensitive:

# Python int
>>> ps.pset_hash(dict(a=1))
'64846e128be5c974d6194f77557d0511542835a8'
>>> ps.pset_hash(dict(a=int(1)))
'64846e128be5c974d6194f77557d0511542835a8'

# np.int64
>>> ps.pset_hash(dict(a=np.int64(1)))
'4bbb1de2b27b9cfd2f81aa37df3bb3926b2d584d'
>>> ps.pset_hash(dict(a=np.array([1])[0]))
'4bbb1de2b27b9cfd2f81aa37df3bb3926b2d584d'

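In the context of a pset, the exact type should not matter as long as the value is some kind of int. The type sensitivity can cause problems, though, when we read params back from a database, e.g. when repeating workloads for failed psets: a value that comes back as np.int64 instead of a Python int gives a different hash for what is logically the same pset.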
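One possible way around the type sensitivity would be to normalize scalar types before hashing. The following is only a minimal sketch under some assumptions, not what psweep does: `normalize_value` and `normalized_pset_hash` are hypothetical names, the pset is assumed to be flat and JSON-serializable, and hashlib over a JSON dump stands in for joblib.hash. It relies on numpy integer and float scalars being registered with the `numbers` ABCs.

```python
import hashlib
import json
import numbers


def normalize_value(value):
    """Map int-like / float-like scalars to plain Python int / float."""
    # bool is an Integral subclass; return it unchanged so True and 1
    # stay distinct in the hash.
    if isinstance(value, bool):
        return value
    if isinstance(value, numbers.Integral):
        return int(value)
    if isinstance(value, numbers.Real):
        return float(value)
    # Anything else (str, None, ...) is passed through untouched.
    return value


def normalized_pset_hash(pset):
    """Hypothetical helper: hash a flat pset after normalizing scalar types."""
    normalized = {key: normalize_value(val) for key, val in pset.items()}
    # sort_keys makes the hash independent of dict insertion order.
    payload = json.dumps(normalized, sort_keys=True)
    return hashlib.sha1(payload.encode()).hexdigest()


if __name__ == "__main__":
    print(normalized_pset_hash(dict(a=1)))
    print(normalized_pset_hash(dict(a=int(1))))
```

Note that ints and floats are still treated as different here (1 vs. 1.0), which seems like the right default for a pset. The same normalization step could presumably also be applied to a pset before handing it to joblib.hash, which would keep support for non-JSON values.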

If we pass in ints as in

>>> params = ps.plist("a", [1,2,3,4])

pandas will cast them such that in a DataFrame, df.a.values will be a numpy array

>>> df.a.values
array([1, 2, 3, 4])
>>> df.a.values.dtype
dtype('int64')

with each entry being int64, but to_dict() in

>>> strip_pset = lambda pset: {k: v for k,v in pset.items() if not k.startswith("_")}
>>> params_from_df = [strip_pset(row.to_dict()) for _, row in df.iterrows()]
>>> type(params_from_df[0]["a"])
int

will cast back to Python ints.
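To check which access path hands back which type, something like the following could be used. This is a small sketch that assumes a plain DataFrame with a single int column as a stand-in for the psweep database; the variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for the database: one int column, which pandas stores as a
# numpy integer dtype.
df = pd.DataFrame({"a": [1, 2, 3]})

# Element taken directly from the underlying numpy array.
from_values = df.a.values[0]

# Element obtained via the iterrows() + to_dict() path used above.
first_row = next(df.iterrows())[1]
from_to_dict = first_row.to_dict()["a"]

print(type(from_values).__name__, isinstance(from_values, np.integer))
print(type(from_to_dict).__name__, isinstance(from_to_dict, np.integer))
```

Running this on the pandas version in use shows whether a read-back value would hash like the original Python int or like a numpy scalar.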

@elcorto elcorto self-assigned this Mar 7, 2024