Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

read_as_masked_array does not return a mask for datasets with nan nodata values #270

Open
Alex-Lewandowski opened this issue Feb 27, 2025 · 3 comments · May be fixed by #273
Open

read_as_masked_array does not return a mask for datasets with nan nodata values #270

Alex-Lewandowski opened this issue Feb 27, 2025 · 3 comments · May be fixed by #273
Labels
bug Something isn't working

Comments

@Alex-Lewandowski
Copy link

Describe the bug
read_as_masked_array contains the lines:

 if nodata is not None:
     return np.ma.masked_values(data, nodata)

np.nan is not None evaluates to True and enters the branch. However, np.nan == np.nan evaluates to False, so none of the nans are recognized as nodata values. Apparently, if there is nothing to mask, numpy sets the mask to False (of type np.bool_).

In this case, data is already the correct masked array, the branch can be bypassed, and data can be returned as-is.

To Reproduce
Steps to reproduce the behavior:

  1. array = read_as_masked_array(path_to_OPERA_RTC)
  2. print(array.mask is np.ma.nomask)
  3. print(type(array.mask))

Expected behavior
If the nodata value is nan, the line data = np.ma.masked_invalid(raster_band.ReadAsArray()) should have already produced the desired masked array, which can be returned.

Additional context
I hit this issue working to get OPERA data running on the HydroSAR HYDRO30 make_water_map function: https://github.com/HydroSAR/HydroSAR/blob/develop/src/hydrosar/water_map.py#L185

@Alex-Lewandowski Alex-Lewandowski added the bug Something isn't working label Feb 27, 2025
@jhkennedy
Copy link
Contributor

As an aside, 🥳 for OPERA RTCs!

But, yes, that's not great -- it's currently set up to handle having nodata and invalid data in the scene (e.g., the scene padding being nodata and invalid data being np.nan). With OPERA bursts, there's no need for the two cases. I think.

I think just changing the if statement here:
https://github.com/ASFHyP3/asf-tools/blob/develop/src/asf_tools/raster.py#L64

to account for nodata possibly being np.nan would be sufficient

@Alex-Lewandowski
Copy link
Author

I agree. I added a temporary workaround to make_water_mask but a small update to the if statement here makes more sense. If there aren't any concerns with doing that, I'll submit a PR.

@jhkennedy
Copy link
Contributor

Yeah! PRs welcome.

Looking at it again, would masking invalid after the if be better instead? We don't have any tests for this function, so we should invest in that at some point as well (doesn't have to be now).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants