Incorrect logic for sample datasets used for fixed_metadata #433

Closed
pindge opened this issue Aug 31, 2022 · 1 comment · Fixed by #432

pindge (Collaborator) commented Aug 31, 2022

Current sampling logic:
https://github.com/opendatacube/datacube-explorer/blob/develop/cubedash/summary/_stores.py#L651-L654

Running the equivalent SQL (TABLESAMPLE SYSTEM) returns inconsistent sample sizes:

opendatacube_test=# select count(*) from agdc.dataset tablesample system (100);
 count 
-------
     7
(1 row)

opendatacube_test=# select count(*) from agdc.dataset tablesample system (50);
 count 
-------
     7
(1 row)

opendatacube_test=# select count(*) from agdc.dataset tablesample system (40);
 count 
-------
     0
(1 row)
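The 7 / 7 / 0 counts are consistent with how PostgreSQL documents `TABLESAMPLE SYSTEM`: it does block-level sampling, keeping each table page with the given probability and returning every row on a kept page, whereas `TABLESAMPLE BERNOULLI` keeps each row independently. Below is a minimal Python simulation of the two methods, assuming (not verified here) that all 7 rows of `agdc.dataset` sit on a single page. The helper names are illustrative only and do not come from cubedash.

```python
import random
from typing import List


def system_sample(pages: List[List[int]], percent: float, rng: random.Random) -> List[int]:
    """Page-level sampling, modelled on TABLESAMPLE SYSTEM: each page is kept
    with probability percent/100 and every row on a kept page is returned."""
    kept: List[int] = []
    for page in pages:
        if rng.random() < percent / 100:
            kept.extend(page)
    return kept


def bernoulli_sample(pages: List[List[int]], percent: float, rng: random.Random) -> List[int]:
    """Row-level sampling, modelled on TABLESAMPLE BERNOULLI: each row is kept
    independently with probability percent/100."""
    return [row for page in pages for row in page if rng.random() < percent / 100]


# Assumption: all 7 dataset rows fit on one page (block) of the table.
pages = [list(range(7))]
rng = random.Random(0)

# Repeat the 40% query many times and record which result sizes occur.
system_counts = {len(system_sample(pages, 40, rng)) for _ in range(1000)}
bernoulli_counts = {len(bernoulli_sample(pages, 40, rng)) for _ in range(1000)}

print(sorted(system_counts))     # [0, 7]: all rows or none
print(sorted(bernoulli_counts))  # sizes between 0 and 7 also appear
```

On a table spanning many pages, the SYSTEM result size tracks the requested percentage more closely. With only a handful of rows it degenerates to all-or-nothing, which matches the counts above.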

This is further confirmed by the CI test run https://github.com/opendatacube/datacube-explorer/runs/8104066958?check_suite_focus=true, where a sample percentage of 50 returned zero datasets:

def test_product_fixed_metadata_by_sample_percentage(summary_store: SummaryStore):
    # There are 4 interim and 16 final maturity level datasets.
    # At 100% (all 20 datasets), the same dictionary will be returned
    # 100% of the time.
    fixed_fields = summary_store._find_product_fixed_metadata(
        summary_store.index.products.get_by_name("ga_ls8c_ard_3"),
        sample_percentage=100,
    )

    assert fixed_fields == {
        "platform": "landsat-8",
        "instrument": "OLI_TIRS",
        "product_family": "ard",
        "format": "GeoTIFF",
        "eo_gsd": 15.0,
    }

    # There are 4 interim and 16 final maturity level datasets.
    # At 50% (10 datasets), there is a fair chance that the maturity level
    # will be in the dictionary.
    fixed_fields = summary_store._find_product_fixed_metadata(
        summary_store.index.products.get_by_name("ga_ls8c_ard_3"),
        sample_percentage=50,
    )

>   assert len(fixed_fields) >= 5
E   assert 0 >= 5
E    +  where 0 = len({})
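The test's expectations follow from how fixed metadata is derived: a field counts as fixed only if every sampled dataset has the same value for it. Below is a minimal sketch using a hypothetical `fixed_fields_from_sample` helper (not cubedash's `_find_product_fixed_metadata`) and made-up metadata dicts that mirror the 4 interim / 16 final datasets. The `dataset_maturity` key name is an assumption. The sketch shows why an empty sample, like the zero-row result at 50%, yields `{}` and fails the `>= 5` check.

```python
from typing import Any, Dict, List


def fixed_fields_from_sample(samples: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: keep only the fields whose value is identical
    in every sampled dataset's metadata. With no samples there is nothing
    to compare, so the result is an empty dict."""
    if not samples:
        return {}
    first, *rest = samples
    return {
        key: value
        for key, value in first.items()
        if all(key in doc and doc[key] == value for doc in rest)
    }


# Made-up metadata mirroring the test product: 4 interim + 16 final datasets.
# "dataset_maturity" is an assumed key name for the maturity level.
common = {
    "platform": "landsat-8",
    "instrument": "OLI_TIRS",
    "product_family": "ard",
    "format": "GeoTIFF",
    "eo_gsd": 15.0,
}
interim = [{**common, "dataset_maturity": "interim"} for _ in range(4)]
final = [{**common, "dataset_maturity": "final"} for _ in range(16)]
all_docs = interim + final

print(len(fixed_fields_from_sample(all_docs)))    # 5: maturity varies, so it is dropped
print(len(fixed_fields_from_sample(final[:10])))  # 6: maturity is constant in this sample
print(fixed_fields_from_sample([]))               # {}: an empty sample, as in the failing run
```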
pindge self-assigned this Aug 31, 2022
pindge (Collaborator, Author) commented Aug 31, 2022

Related to #431.

pindge changed the title from "Incorrect logic for sampling datasets" to "Incorrect logic for sample datasets used for fixed_metadata" Aug 31, 2022