
Manage downstream datasets with locationless source dataset #178

Open

wants to merge 51 commits into base: develop

Conversation

@santoshamohan (Contributor) commented Oct 22, 2018

Reason for this pull request

Datasets within the production database had multiple derived and duplicate datasets whose source dataset's local_uri was set to None/Not Available. This was causing ingestion failures during the orchestration process. We need a new tool to manage this scenario and take appropriate action.

Background
The above-mentioned scenario can arise if we delete a file from disk and then run the sync tool with the --update-location option: the sync tool sets the dataset's location in the database to None/Not Available. During the subsequent ingest process, a query from the datacube results in an index out of range error, and storage file creation fails.
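The failure mode can be sketched in a few lines of Python. This is illustrative only; `first_uri` and `first_uri_safe` are hypothetical helpers, not the actual datacube code:

```python
# Illustrative sketch of the failure mode: after the sync tool clears a
# deleted file's location, the dataset's uri list is empty, so a naive
# "take the first location" lookup during ingest raises IndexError
# ("index out of range") and storage file creation fails.

def first_uri(uris):
    """Naive lookup: assumes every dataset has at least one location."""
    return uris[0]  # IndexError when all locations have been cleared

def first_uri_safe(uris):
    """Defensive variant: surface the missing location instead of crashing."""
    return uris[0] if uris else None

cleared = []  # location set to None/Not Available by the sync tool
try:
    first_uri(cleared)
except IndexError:
    print("ingest fails: source dataset has no location")

print(first_uri_safe(cleared))  # → None
```

The tools added in this PR aim to detect such datasets up front rather than letting the ingest crash on them.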

Proposed solutions

The following changes are implemented as part of this PR:

  1. Fixed dea-duplicates
    • Required changing the setup in collections.py for unique fields
    • Updated the tests, since the CSV output depends on the unique fields
  2. Added NBART scenes to collections.py
  3. Fixed the output directory for the dea-clean archived tool
  4. Added a new feature to list all downstream datasets whose source dataset has no location
  5. Extended dea-coherence to check downstream datasets
  6. Logged erroneous datasets to a CSV file
  • Tests added
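As a rough sketch of change 4 (listing downstream datasets whose source has no location), consider a minimal in-memory view of the index. The dataset names and dictionary shapes below are hypothetical, not the real datacube schema:

```python
# Hypothetical sketch: `locations` maps dataset id -> local_uri, and
# `sources` maps each derived (downstream) dataset to its source datasets.
# We want every downstream dataset whose source has no location.

locations = {
    "ls8_nbar_scene_1": "file:///g/data/ls8_nbar_1.nc",
    "ls8_nbar_scene_2": None,  # cleared by the sync tool
}
sources = {
    "ls8_nbar_albers_1": ["ls8_nbar_scene_1"],
    "ls8_nbar_albers_2": ["ls8_nbar_scene_2"],
    "ls8_nbar_albers_2_dup": ["ls8_nbar_scene_2"],  # duplicate derived dataset
}

def downstream_of_locationless(sources, locations):
    """Return derived dataset ids whose source dataset has no location."""
    return sorted(
        derived
        for derived, srcs in sources.items()
        if any(locations.get(s) is None for s in srcs)
    )

print(downstream_of_locationless(sources, locations))
# → ['ls8_nbar_albers_2', 'ls8_nbar_albers_2_dup']
```

In the PR itself this check is hooked into dea-coherence, with offending datasets written to a CSV file for follow-up.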

@codecov codecov bot commented Oct 22, 2018

Codecov Report

Merging #178 into develop will increase coverage by 1.61%.
The diff coverage is 98.78%.


@@             Coverage Diff             @@
##           develop     #178      +/-   ##
===========================================
+ Coverage     66.8%   68.42%   +1.61%     
===========================================
  Files           42       42              
  Lines         3190     3230      +40     
===========================================
+ Hits          2131     2210      +79     
+ Misses        1059     1020      -39
Impacted Files Coverage Δ
digitalearthau/collections.py 99.18% <100%> (+0.13%) ⬆️
digitalearthau/cleanup.py 95.55% <100%> (ø) ⬆️
digitalearthau/sync/validate.py 48.93% <100%> (ø) ⬆️
digitalearthau/coherence.py 97.64% <98.38%> (+65.38%) ⬆️
digitalearthau/uiutil.py 83.33% <0%> (-16.67%) ⬇️
digitalearthau/serialise.py 89.13% <0%> (+1.08%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 74e59f0...652f76e.

@santoshamohan requested a review from omad, October 22, 2018 22:58
@omad (Contributor) left a comment:

Hey Santosh, thanks for getting this working.

There are a few things here that it'd be good to clean up before merging. Some are really easy fixes, but some of the logic is also getting hard to reason about; we should sit down and work through it together to see if we can simplify it.

digitalearthau/coherence.py: 8 review comments (outdated, resolved)
@omad (Contributor) previously requested changes, Nov 23, 2018:

Hi Santosh, I think this is looking much better, but there are still a few issues with the changes to the tests.

Also, please try to list all the changes in a PR, for example for this PR:

  • Fixed dea-duplicates
    • Required changing the setup in collections.py for unique fields
    • Updated the tests since the CSV output depends on the unique fields
  • Added NBART scenes to collections.py
  • Fixed the output directory for the dea-clean archived tool
  • Extend dea-coherence to check downstream datasets

integration_tests/conftest.py: 3 review comments (resolved, 2 outdated)
integration_tests/test_coherence.py: 2 review comments (outdated, resolved)
integration_tests/test_full_sync.py: 1 review comment (outdated, resolved)
@santoshamohan dismissed omad's stale review January 16, 2019 23:37:

Incorporated all the requested changes
