Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move file to results #335

Merged
merged 6 commits into from
Oct 17, 2024
Merged

Move file to results #335

merged 6 commits into from
Oct 17, 2024

Conversation

EmileSonneveld
Copy link
Contributor

There is already a version that only tries the move twice on staging. This will make 5 tries and is a bit more thread-safe

@EmileSonneveld EmileSonneveld requested a review from bossie October 16, 2024 12:32
@EmileSonneveld EmileSonneveld linked an issue Oct 17, 2024 that may be closed by this pull request
@EmileSonneveld
Copy link
Contributor Author

This should also fix the error in integration tests.
openeo.rest.OpenEoApiError: [500] Internal: Server error: PermissionError(13, 'Permission denied') (ref: r-241016413b7f4eae8ed54d00388a426f)

Files.createTempFile makes an empty file with by default very tight permissions. After a move the permissions stay the same. Now, no default empty file is created, and GDAL writes a file with more sensible permissions

@EmileSonneveld
Copy link
Contributor Author

@bossie had a nice idea to avoid the retry logic:
Let executors write to unique paths, and send it to the driver. The driver chooses what files get moved to the final location, and then removes files from failed executors.

@bossie
Copy link
Collaborator

bossie commented Oct 17, 2024

It's just that doing it twice apparently does not suffice, and why would doing it 5 times fix it?

If the problem is that executors are getting in each other's way (be it because of speculation or otherwise) maybe we should just avoid that.

I thought maybe this could work:

  • driver passes the save directory to its executors;
  • executors each write to their file(s) in said directory but append some unique suffix to the filename so you get e.g. openEO_20241017Z_abc123.tif and return the file names to the driver;
  • the driver only gets one value back (in this case: per date), regardless of how many executors attempted to write it;
  • the driver strips off the winning suffix, moves the file and deletes the losing files.

@EmileSonneveld
Copy link
Contributor Author

The unique filenames solution is probably more robust, but a file move on S3 is more heavy. It will need to download and upload the file again, giving some delays and maybe will take a lot of memory again. (Not more than during job execution I guess, but stil)

@jdries
Copy link
Contributor

jdries commented Oct 17, 2024

  • the driver strips off the winning suffix, moves the file and deletes the losing files.

This also sounds feasible. Only have to be careful with the fact that users want to provide arbitrary asset names. Of course, if the suffix uses a fixed number of characters, then I guess it's fool proof?

@EmileSonneveld
Copy link
Contributor Author

I'll start implementing that one.
If re-enabling the fusemount is urgent, we can maybe get this one trough already. Or at least the permissions fix

@jdries
Copy link
Contributor

jdries commented Oct 17, 2024

@EmileSonneveld yes perhaps merge this one already. I'd like to avoid too much delay.

@EmileSonneveld EmileSonneveld merged commit 244ba1e into develop Oct 17, 2024
EmileSonneveld added a commit that referenced this pull request Oct 17, 2024
@EmileSonneveld EmileSonneveld deleted the move_file_to_results branch October 17, 2024 15:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

separate_asset_per_band gives some empty tiffs
3 participants