Incorrect logic for sample datasets used for fixed_metadata #433

pindge · 2022-08-31T00:44:42Z

current sampling logic:
https://github.com/opendatacube/datacube-explorer/blob/develop/cubedash/summary/_stores.py#L651-L654

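Running SQL based on the above logic returns an inconsistent sample size: on a table of 7 rows, sampling at 100% and at 50% both return all 7 rows, while sampling at 40% returns none.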

opendatacube_test=# select count(*) from agdc.dataset tablesample system (100);
 count 
-------
     7
(1 row)

opendatacube_test=# select count(*) from agdc.dataset tablesample system (50);
 count 
-------
     7
(1 row)

opendatacube_test=# select count(*) from agdc.dataset tablesample system (40);
 count 
-------
     0
(1 row)
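
The all-or-nothing counts above are consistent with how `TABLESAMPLE SYSTEM` works: it samples whole storage pages rather than individual rows, so a table small enough to fit in one page returns either every row or none. The sketch below is a simplified model of that behaviour, not PostgreSQL's actual implementation; the function names and the one-page, 7-row table are assumptions made for illustration, and `bernoulli_sample` models row-level sampling for comparison.

```python
import random


def system_sample(pages, percentage, rng):
    """Block-level sampling: each page is kept or dropped as a whole."""
    rows = []
    for page in pages:
        if rng.random() * 100 < percentage:
            rows.extend(page)
    return rows


def bernoulli_sample(pages, percentage, rng):
    """Row-level sampling: each row is kept or dropped independently."""
    return [
        row
        for page in pages
        for row in page
        if rng.random() * 100 < percentage
    ]


# Hypothetical tiny table: 7 rows that all live in a single page.
one_page_table = [list(range(7))]

rng = random.Random(0)

# Distinct sample sizes seen over many 50% draws.
system_counts = {
    len(system_sample(one_page_table, 50, rng)) for _ in range(1000)
}
bernoulli_counts = {
    len(bernoulli_sample(one_page_table, 50, rng)) for _ in range(1000)
}

# Block-level sampling can only ever yield 0 or 7 rows here,
# while row-level sampling yields sizes anywhere from 0 to 7.
print(sorted(system_counts))
print(sorted(bernoulli_counts))
```

Under this model, any percentage below 100 leaves a real chance of an empty sample on a small test database, which matches the zero count seen at 40%.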

This is further confirmed by the CI test run at https://github.com/opendatacube/datacube-explorer/runs/8104066958?check_suite_focus=true
where a sample percentage of 50 returned zero datasets:

def test_product_fixed_metadata_by_sample_percentage(summary_store: SummaryStore):
        # There are 4 interim and 16 final maturity level datasets
        # at 100% (all 20 datasets), the same dictionary will be returned
        # 100% of the time
        fixed_fields = summary_store._find_product_fixed_metadata(
            summary_store.index.products.get_by_name("ga_ls8c_ard_3"),
            sample_percentage=100,
        )
    
        assert fixed_fields == {
            "platform": "landsat-8",
            "instrument": "OLI_TIRS",
            "product_family": "ard",
            "format": "GeoTIFF",
            "eo_gsd": 15.0
        }
    
        # There are 4 interim and 16 final maturity level datasets
        # at 50% (10 datasets), there is a fair chance, that maturity level
        # will be in the dictionary
        fixed_fields = summary_store._find_product_fixed_metadata(
            summary_store.index.products.get_by_name("ga_ls8c_ard_3"),
            sample_percentage=50,
        )
    
>       assert len(fixed_fields) >= 5
E       assert 0 >= 5
E        +  where 0 = len({})
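
To illustrate why an empty sample produces `len({}) == 0`, here is a minimal sketch of deriving "fixed" fields from a sample, assuming fixed metadata means the fields whose value is identical across every sampled dataset. `find_fixed_fields` and the `dataset_maturity` key are hypothetical names used for illustration; this is not the code of `_find_product_fixed_metadata`.

```python
from typing import Any, Dict, Iterable


def find_fixed_fields(sampled: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the fields whose value is identical in every sampled dataset."""
    fixed = None
    for metadata in sampled:
        if fixed is None:
            # The first dataset seeds the candidate set of fixed fields.
            fixed = dict(metadata)
            continue
        # Keep only candidates that this dataset agrees with.
        fixed = {
            key: value
            for key, value in fixed.items()
            if key in metadata and metadata[key] == value
        }
    # No sampled datasets means there is nothing to report as fixed.
    return fixed or {}


# Hypothetical metadata for two datasets that differ only in maturity level.
samples = [
    {"platform": "landsat-8", "dataset_maturity": "final"},
    {"platform": "landsat-8", "dataset_maturity": "interim"},
]

fixed_from_samples = find_fixed_fields(samples)
fixed_from_empty = find_fixed_fields([])
```

With this approach, a sample of zero rows can only ever yield an empty dictionary, so the `assert len(fixed_fields) >= 5` check fails whenever the sampling query returns nothing.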


pindge · 2022-08-31T00:45:04Z

related to #431

pindge self-assigned this Aug 31, 2022

pindge changed the title ~~Incorrect logic for sampling datasets~~ Incorrect logic for sample datasets used for fixed_metadata Aug 31, 2022

pindge mentioned this issue Aug 31, 2022

setup test case for dataset common fields #432

Merged

pindge closed this as completed in #432 Sep 19, 2022
