Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the issue of input data nodata changing by the odc loading #163

Merged
merged 4 commits into from
Oct 28, 2024

Conversation

emmaai
Copy link
Contributor

@emmaai emmaai commented Oct 25, 2024

As the title, ref opendatacube/datacube-core#1646

  • carry the nodata following the same convention in datacube
  • minor refactor of code always reading nodata from dataset attributes

Note that it's not a fundamental solution, as the issue raised in datacube, ideally the loading process should inform any change, as None can be understood and dealt with so many ways. Currently datacube rely on the assumption that people agree with one, which might not always be true.

Copy link

codecov bot commented Oct 25, 2024

Codecov Report

Attention: Patch coverage is 71.42857% with 2 lines in your changes missing coverage. Please review.

Project coverage is 80.02%. Comparing base (980ded2) to head (39c9445).

Files with missing lines Patch % Lines
odc/stats/plugins/_base.py 75.00% 1 Missing ⚠️
odc/stats/plugins/lc_level3.py 66.66% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #163      +/-   ##
===========================================
- Coverage    80.02%   80.02%   -0.01%     
===========================================
  Files           49       49              
  Lines         4356     4361       +5     
===========================================
+ Hits          3486     3490       +4     
- Misses         870      871       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@SpacemanPaul
Copy link
Contributor

It looks like it should work, but I'm confused why we have to handle so many weird corner cases here (float, nodata=NaN; float, nodata=None; int, nodata=255; int, nodata=255 OR MORE, etc). Surely we have more control over the dataflow here?

@emmaai
Copy link
Contributor Author

emmaai commented Oct 25, 2024

It looks like it should work, but I'm confused why we have to handle so many weird corner cases here (float, nodata=NaN; float, nodata=None; int, nodata=255; int, nodata=255 OR MORE, etc). Surely we have more control over the dataflow here?

The problem is with product definition. If the Dataset used in loading is created with odc database query, it has the correct dtype and nodata, that'd be uint8 and 255. If it is created by stac, then the product definition is missing and hence no dtype or nodata, in this case, odc falls to float and None. In the loading, None will be converted to nan, and odc loading will substitute the nodata in geotiff metadata with nan.

Currently there is no ideal way to gain the information of product definition for datasets not indexed, dataset metadata and product definition seems totally irrelevant unless that we are always assumed to know where to find that piece of information.

All the corner cases are caused by how odc deals with missing information by default, as stated in the issue raised on datacube. Hence the dataflow actually loses the control over the data loading, not knowing either product definition or the convention of datacube .

The essence of a solution would be linking product definition to dataset metadata such that the dataflow can know for sure what to be loaded without relying on datacube default. Though it requires some upgrades on dataset metadata, and further discussion on how to implement it.

Copy link
Contributor

@tebadi tebadi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fixing nodata at level3 for the cultivated band which is a specific known case. Level4 loads other bands and If nan is present in those bands, this won't fix that. Example are woody_cover, pv_pc_50 and water_frequency.

@emmaai
Copy link
Contributor Author

emmaai commented Oct 25, 2024 via email

@emmaai emmaai merged commit d799a99 into develop Oct 28, 2024
5 checks passed
@emmaai emmaai deleted the fix_l3 branch October 28, 2024 04:15
@emmaai emmaai restored the fix_l3 branch November 5, 2024 02:15
@emmaai emmaai deleted the fix_l3 branch November 5, 2024 02:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants