Fix the issue of input data nodata changing by the odc loading #163

emmaai · 2024-10-25T00:35:17Z

As per the title; ref opendatacube/datacube-core#1646

Carry the nodata value through, following the same convention as datacube.
Minor refactor so the code always reads nodata from the dataset attributes.

Note that this is not a fundamental solution. As described in the datacube issue, the loading process should ideally report any change it makes, because None can be interpreted and handled in many different ways. Currently datacube relies on the assumption that everyone agrees on one interpretation, which might not always be true.

codecov · 2024-10-25T00:36:05Z

Codecov Report

Attention: Patch coverage is 71.42857% with 2 lines in your changes missing coverage. Please review.

Project coverage is 80.02%. Comparing base (980ded2) to head (39c9445).

Files with missing lines	Patch %	Lines
odc/stats/plugins/_base.py	75.00%	1 Missing ⚠️
odc/stats/plugins/lc_level3.py	66.66%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop     #163      +/-   ##
===========================================
- Coverage    80.02%   80.02%   -0.01%     
===========================================
  Files           49       49              
  Lines         4356     4361       +5     
===========================================
+ Hits          3486     3490       +4     
- Misses         870      871       +1


SpacemanPaul · 2024-10-25T01:03:09Z

It looks like it should work, but I'm confused why we have to handle so many weird corner cases here (float, nodata=NaN; float, nodata=None; int, nodata=255; int, nodata=255 OR MORE, etc.). Surely we have more control over the dataflow here?

emmaai · 2024-10-25T01:16:36Z


The problem is with the product definition. If the Dataset used for loading is created from an odc database query, it has the correct dtype and nodata, i.e. uint8 and 255. If it is created from STAC, the product definition is missing and hence there is no dtype or nodata; in this case odc falls back to float and None. During loading, None is converted to NaN, and the odc loader substitutes NaN for the nodata value in the GeoTIFF metadata.
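The two loading paths can be mimicked with plain NumPy. This is a toy model rather than odc's loader: `simulate_load` and `restore_nodata` are hypothetical helpers, and float32 stands in for the unspecified float fallback dtype. It shows why the original nodata (255) is lost in the STAC case and how it can be recovered once the real nodata value is known.

```python
import numpy as np


def simulate_load(raw, file_nodata, dtype, nodata):
    """Toy load step: cast ``raw`` to ``dtype`` and rewrite its nodata pixels.

    A requested nodata of None becomes NaN for float dtypes, so pixels equal
    to the file's own nodata value (e.g. 255) end up as NaN.
    """
    dtype = np.dtype(dtype)
    out = raw.astype(dtype)
    if nodata is None:
        if not np.issubdtype(dtype, np.floating):
            return out  # no nodata requested, nothing to substitute
        nodata = np.nan
    out[raw == file_nodata] = nodata
    return out


def restore_nodata(loaded, nodata, dtype):
    """Hypothetical fix-up: map NaN back to the band's real nodata and dtype."""
    return np.where(np.isnan(loaded), nodata, loaded).astype(dtype)


raw = np.array([0, 1, 255], dtype=np.uint8)  # 255 is the GeoTIFF nodata

# Indexed dataset: dtype and nodata come from the product definition.
print(simulate_load(raw, 255, "uint8", 255).tolist())  # → [0, 1, 255]

# STAC dataset without a product definition: float and None are assumed.
loaded = simulate_load(raw, 255, "float32", None)
print(loaded.tolist())  # → [0.0, 1.0, nan]
print(restore_nodata(loaded, 255, "uint8").tolist())  # → [0, 1, 255]
```

`restore_nodata` is only safe when the source data had no genuine NaN values, which holds for integer bands such as uint8.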

Currently there is no good way to obtain the product definition for datasets that are not indexed. Dataset metadata and the product definition appear completely unrelated, unless we are always assumed to know where to find that information.

All these corner cases come from how odc handles missing information by default, as described in the issue raised on datacube. As a result, the dataflow actually loses control over data loading, since it knows neither the product definition nor the datacube convention.

The essence of a real solution would be linking the product definition to the dataset metadata, so that the dataflow knows for certain what is being loaded without relying on datacube defaults. That requires some upgrades to the dataset metadata and further discussion on how to implement it.
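One way to read this proposal, as a minimal sketch under stated assumptions: dataset metadata names its product, product definitions are reachable through some registry, and when no definition is found the loader warns instead of silently applying the default (the "inform any change" behaviour mentioned in the PR description). `PRODUCTS`, `FALLBACK`, `resolve_measurement`, the product and band names, and the metadata layout are all hypothetical and are not part of datacube or odc-stats.

```python
import warnings

# Hypothetical registry of product definitions, keyed by product name.
PRODUCTS = {
    "example_product": {"measurements": {"band_a": {"dtype": "uint8", "nodata": 255}}},
}

# Assumed default used when nothing is known about the band.
FALLBACK = ("float32", None)


def resolve_measurement(dataset_metadata, band, products=PRODUCTS):
    """Look up (dtype, nodata) for ``band`` via the product named in the metadata.

    Falls back to FALLBACK, but warns so the caller learns that the loaded
    values may not match the source file's own encoding.
    """
    product = products.get(dataset_metadata.get("product"))
    if product is not None:
        spec = product["measurements"].get(band)
        if spec is not None:
            return spec["dtype"], spec["nodata"]
    warnings.warn(
        f"no product definition for band {band!r}; "
        f"falling back to dtype={FALLBACK[0]}, nodata={FALLBACK[1]}"
    )
    return FALLBACK


print(resolve_measurement({"product": "example_product"}, "band_a"))  # → ('uint8', 255)
```

Warning rather than raising keeps existing STAC workflows running while still making the fallback visible.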

tebadi (Toktam Ebadi) · 2024-10-25

This fixes nodata at level 3 for the cultivated band, which is a specific known case. Level 4 loads other bands, and if NaN is present in those bands, this won't fix it. Examples are woody_cover, pv_pc_50 and water_frequency.

emmaai · 2024-10-25T03:38:00Z

The change is not meant to fix any particular band or dataset. It only reports the real nodata value of the data. Each plugin should handle that information accordingly; the level 3 plugin only serves as an example of how.
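As a sketch of how an individual plugin might use the reported nodata, the code below combines several input bands, each with its own nodata value, into one validity mask and writes the plugin's output nodata wherever any input is missing. It also reflects the review point above: every band that feeds the output needs its own check. The function names, band names, and values are hypothetical; this is not the level 3 plugin's code.

```python
import numpy as np


def band_valid(data, nodata):
    """True where a single band holds a real value (NaN-aware)."""
    data = np.asarray(data)
    valid = np.ones(data.shape, dtype=bool)
    if np.issubdtype(data.dtype, np.floating):
        valid &= ~np.isnan(data)  # NaN is never a real measurement
    # `data != nan` is always True, so a NaN nodata is handled by the line above.
    is_nan_nodata = isinstance(nodata, (float, np.floating)) and np.isnan(nodata)
    if nodata is not None and not is_nan_nodata:
        valid &= data != nodata
    return valid


def combine_with_nodata(result, bands, output_nodata):
    """Replace ``result`` with ``output_nodata`` wherever any input band is missing.

    ``bands`` maps a band name to (array, nodata) as reported by the loader.
    """
    valid = np.ones(np.shape(result), dtype=bool)
    for data, nodata in bands.values():
        valid &= band_valid(data, nodata)
    return np.where(valid, result, output_nodata)


bands = {
    "band_a": (np.array([10, 255, 30], dtype=np.uint8), 255),  # integer band
    "band_b": (np.array([0.5, 0.2, np.nan], dtype=np.float32), None),  # float band
}
result = np.array([1, 2, 3], dtype=np.uint8)
print(combine_with_nodata(result, bands, 255).tolist())  # → [1, 255, 255]
```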


Emma Ai added 4 commits October 24, 2024 08:07

fix the nodata issue when default to nan with float dtype

4d0732b

flexible to the case that numpy change convention

3131bbc

very explicit with mangroves dtype

119f4a3

simplify l3 cultivated mask

39c9445

emmaai requested review from SpacemanPaul, Ariana-B, JM-GA and tebadi October 25, 2024 00:35

JM-GA approved these changes Oct 25, 2024


tebadi reviewed Oct 25, 2024


tebadi approved these changes Oct 25, 2024


emmaai merged commit d799a99 into develop Oct 28, 2024
5 checks passed

emmaai deleted the fix_l3 branch October 28, 2024 04:15

emmaai restored the fix_l3 branch November 5, 2024 02:15

emmaai deleted the fix_l3 branch November 5, 2024 02:19
