batch_job.py write_metadata: avoid ad-hoc file selection for upload #940
Comments
As an illustration that this does not scale: [...] doesn't even work as [...]
In export_workspace, the list of files that exist locally and on S3 is determined by the list of STAC metadata files.
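For context, a minimal sketch of what deriving the file set from STAC metadata could look like. This is not the actual export_workspace code; the function name and the `*_item.json` filename pattern are made-up assumptions:

```python
# Hedged illustration only, not the real export_workspace logic: derive the
# set of files to consider from STAC item metadata instead of from a
# directory listing. The "*_item.json" naming pattern is an assumption.
import json
from pathlib import Path

def files_from_stac_items(job_dir: Path) -> set[Path]:
    files: set[Path] = set()
    for item_path in job_dir.glob("*_item.json"):
        item = json.loads(item_path.read_text())
        # Each STAC item names its assets explicitly via href.
        for asset in item.get("assets", {}).values():
            files.add(job_dir / asset["href"])
    return files
```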
This issue is a direct result of changes introduced in #877.
Logged which files do get uploaded. All logs on CDSE dev were one of the following two: [...]
Tested on CDSE dev without fusemount. Running the test suite seems to be slightly faster now (40 min -> 30 min, but only checked on a few CI runs).
openeo-geopyspark-driver/openeogeotrellis/deploy/batch_job.py
Lines 511 to 519 in b63280c
Here we're building an ugly ad-hoc deny-list for "files" that should not be uploaded to S3.
As mentioned in the TODO, we should use an explicit asset list to upload, instead of blindly assuming that everything from the job dir should be uploaded (minus some hand-picked exceptions).
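As a rough sketch of that direction, under stated assumptions: the names here (`collect_upload_paths`, the `asset_hrefs` parameter) are hypothetical and not taken from batch_job.py; this only illustrates how an explicit allow-list could replace the deny-list:

```python
# Hedged sketch, not the actual batch_job.py code: upload exactly the files
# named in the job's asset metadata, rather than uploading everything in the
# job dir minus a hand-maintained list of exceptions.
from pathlib import Path
from typing import Iterable

def collect_upload_paths(job_dir: Path, asset_hrefs: Iterable[str]) -> list[Path]:
    job_root = job_dir.resolve()
    paths: list[Path] = []
    for href in asset_hrefs:
        path = (job_root / href).resolve()
        # Skip anything that escapes the job dir or doesn't exist on disk.
        if job_root in path.parents and path.is_file():
            paths.append(path)
    return paths
```

The appeal of the allow-list shape is failure mode: a new file type landing in the job dir is simply not uploaded until it is registered as an asset, instead of silently leaking to S3 until someone remembers to extend the deny-list.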