Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

manually casting objects to np arrays when reading checkpoint #701

Merged
merged 1 commit into from
Jun 1, 2023

Conversation

ijpulidos
Copy link
Contributor

@ijpulidos ijpulidos commented May 31, 2023

Description

Numba 0.57 explicitly raises an exception when trying to use MaskedArrays in @jit decorated functions. When we are reading MultiStateSampler attributes from serialized objects (storage) these attributes become MaskedArray. We are casting them to arrays here to make sure we can still use numba when mixing replicas after resuming the simulation.

Solves #700

Todos

  • Implement feature / fix bug
  • Add tests
  • Update documentation as needed
  • Update changelog to summarize changes in behavior, enhancements, and bugfixes implemented in this PR

Status

  • Ready to go

Changelog message

Support for latest numba 0.57 by deserializing attributes as numpy arrays instead of MaskedArrays.

@ijpulidos ijpulidos requested a review from mikemhenry May 31, 2023 19:56
@ijpulidos ijpulidos added this to the 0.22.2 milestone May 31, 2023
@mikemhenry
Copy link
Contributor

Any ideas on the consequence of this?

@mikemhenry
Copy link
Contributor

We might need to check the mask to make sure this is safe
https://numpy.org/doc/stable/reference/maskedarray.generic.html#what-is-a-masked-array

@ijpulidos
Copy link
Contributor Author

Any ideas on the consequence of this?

As far as I can tell they are supposed to be ndarray instead of masked_array, but for some reason when we resume the simulations they get into the mix_replicas method as masked_arrays instead of ndarray. Only when resuming, as far as I can see.

I wonder if this was also involved in the cycling issues we were seeing when resuming simulations in choderalab/perses#613 (it's back baby!! haha)

@codecov
Copy link

codecov bot commented May 31, 2023

Codecov Report

Merging #701 (f06876e) into main (abd4011) will not change coverage.
The diff coverage is 100.00%.

@mikemhenry
Copy link
Contributor

Ah so we are not using the actual masking functionality here? I'm just worried that if there is a mask on these arrays, that means we don't want to include some of that data in the analysis.

Copy link
Contributor

@mikemhenry mikemhenry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just want to make sure that we are not using the mask

@ijpulidos
Copy link
Contributor Author

ijpulidos commented Jun 1, 2023

Just want to make sure that we are not using the mask

I think we don't use it or need it, since they weren't masked array from the start and resuming simulations seems to be working, when we explicitly use arrays instead of masked arrays. That said, I don't know a way to be completely sure.

I don't know of a systematic way to be sure we are not using the mask, it just seems that every time we read from a .nc file we will always get a MaskedArray , as in Unidata/netcdf4-python#787

@ijpulidos
Copy link
Contributor Author

We could probably come up with a less hacky or a more general solution that uses set_auto_mask(False) whenever we read from the .nc files

@mikemhenry
Copy link
Contributor

If we don't want the masking behavior or use it, then when we save the nc file we should use set_auto_mask(False) but it sounds like this is okay to merge then

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants