Globus integration: improve handling of downloads #11057

Open
landreev opened this issue Nov 27, 2024 · 1 comment
Labels: Feature: Globus, Size: 80 (A percentage of a sprint. 56 hours.)

landreev (Contributor) commented Nov 27, 2024

Now that IQSS has published another collection of NESE-stored, Globus-accessible datasets, combining a large volume of data, large numbers of files per dataset AND apparent popularity with actual users, some limitations of the Globus download framework have become apparent.

(Below is a running list of issues that need to be addressed; work in progress.)

  1. The same problem that was addressed for uploads in Improved handling of Globus uploads (experimental async framework) #10781 (merged in September) clearly affects downloads as well. Even when everything is working as it should, the current implementation's reliance on continuous looping for the duration of the remote transfer is bound to cause problems. The same async framework, in which the state of an ongoing transfer is saved in the database, therefore needs to be implemented for downloads.

  2. When problems arise (such as network problems, or issues with the Globus service on the data storage end, as may have been the case last week), Dataverse handles them poorly in several ways. The simplest example: when the Globus service returns "status": "ACTIVE", … "nice_status": "CONNECT_FAILED" while Dataverse is checking on an ongoing download task, Dataverse assumes the task has failed beyond recovery and removes the permission. All this actually means is that the connection to the remote Globus service failed; "ACTIVE" means just that: the remote client will keep trying to reconnect. With the permission removed, those attempts are bound to keep failing, even once the service becomes available again.
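
A minimal sketch of the two items above, written in Python for brevity (Dataverse itself is Java). Everything here is hypothetical: `PendingDownload` stands in for a database row holding the state of an ongoing download, and the injected `fetch_task` / `delete_rule` callables stand in for the actual Globus Transfer API calls. Instead of looping for the whole transfer, a periodic timer runs one pass over the saved tasks, and only a terminal `status` (`SUCCEEDED` or `FAILED`) triggers cleanup, so `ACTIVE` with `nice_status` `CONNECT_FAILED` leaves the permission in place. Treating every non-terminal status as "check again later" is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Globus task "status" values after which the task will not change again.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED"}


@dataclass
class PendingDownload:
    """Hypothetical database row describing one in-progress Globus download."""
    task_id: str
    rule_id: str      # access rule created for this download
    dataset_id: int
    user_id: str


def is_finished(task_doc: dict) -> bool:
    """Decide from "status" alone; "nice_status" is only a detail.

    {"status": "ACTIVE", "nice_status": "CONNECT_FAILED"} means Globus is
    still retrying, so the access rule must stay in place.
    """
    return task_doc.get("status") in TERMINAL_STATUSES


def check_pending_downloads(
    pending: Dict[str, PendingDownload],
    fetch_task: Callable[[str], dict],
    delete_rule: Callable[[str], None],
) -> List[str]:
    """Make one pass over the saved tasks; intended to be run by a periodic timer.

    Nothing loops for the duration of a transfer, so a server restart only
    delays the next pass: the state lives in `pending` (the database table).
    Returns the ids of the tasks finalized during this pass.
    """
    finalized = []
    for task_id, record in list(pending.items()):
        if not is_finished(fetch_task(task_id)):
            continue  # not terminal (e.g. ACTIVE): keep the rule, check next pass
        delete_rule(record.rule_id)  # remove the permission only when truly done
        del pending[task_id]
        finalized.append(task_id)
    return finalized


# Usage: one task is stuck reconnecting, the other has completed.
statuses = {
    "t1": {"status": "ACTIVE", "nice_status": "CONNECT_FAILED"},
    "t2": {"status": "SUCCEEDED"},
}
pending = {
    "t1": PendingDownload("t1", "r1", 42, "@alice"),
    "t2": PendingDownload("t2", "r2", 42, "@bob"),
}
deleted_rules = []
done = check_pending_downloads(pending, statuses.__getitem__, deleted_rules.append)
print(done, deleted_rules, sorted(pending))  # ['t2'] ['r2'] ['t1']
```

Because the loop body only ever reads and deletes saved records, the same pass can be rerun safely after a restart; the stuck `t1` task keeps its rule until Globus itself reports a terminal status.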

landreev moved this to SPRINT- NEEDS SIZING in IQSS Dataverse Project Nov 27, 2024
landreev changed the title from "Globus integration: improve handling of uploads" to "Globus integration: improve handling of downloads" Nov 27, 2024
landreev self-assigned this Dec 11, 2024
landreev added the Size: 80 label (A percentage of a sprint. 56 hours.) Dec 11, 2024
landreev moved this from SPRINT- NEEDS SIZING to In Progress 💻 in IQSS Dataverse Project Dec 11, 2024

landreev (Contributor, Author) commented Dec 11, 2024

Adding a few more things to add or improve:

  3. Dataverse apparently fails to create an access rule for a user if a rule for the target folder already exists (i.e., if Dataverse failed to delete it after the last download attempt). This is, of course, exacerbated by the first issue on the list above: Dataverse will never delete the rule if it gets restarted before the download is completed. It should handle this more gracefully: if a rule is already there, just use it.

  4. As currently implemented, it is also exceptionally difficult for an admin to help specific users. For example, with a stale rule on the endpoint (issue 3), it is not at all easy to map the "principals" shown in the output of the /access_list call to specific users, especially without access to the Globus web console. (It can only be done by sifting through the server logs, and only with FINE logging enabled.) This can be made easier by saving some extra information with the rest of the ongoing task state in the database (as part of item 1 on the initial list).
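
A minimal sketch of the two items in this comment (reusing a stale access rule, and mapping principals back to users), again in Python and with hypothetical names throughout. Rules are plain dicts with `id`, `principal` and `path` keys, modeled loosely on the entries returned by the /access_list call; `create_rule` stands in for whatever call actually creates a rule; and `AccessRuleRecord` is a hypothetical row saved with the rest of the ongoing task state. `ensure_rule` reuses a leftover rule for the same principal and folder instead of failing, and `explain_rules` joins the access-list output with the saved records so an admin can see which Dataverse user each principal belongs to. Matching on principal and path alone (ignoring permissions) is a simplification.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class AccessRuleRecord:
    """Hypothetical row saved with the rest of the ongoing download state."""
    rule_id: str
    principal: str       # Globus identity the rule was created for
    path: str            # endpoint folder the rule covers
    dataverse_user: str  # who requested the download
    dataset_pid: str     # which dataset was requested


def find_rule(rules: List[dict], principal: str, path: str) -> Optional[dict]:
    """Return an existing rule for this principal and folder, if any."""
    for rule in rules:
        if rule.get("principal") == principal and rule.get("path") == path:
            return rule
    return None


def ensure_rule(
    rules: List[dict],
    principal: str,
    path: str,
    create_rule: Callable[[str, str], str],
) -> str:
    """Reuse a leftover rule instead of failing on a duplicate; else create one."""
    existing = find_rule(rules, principal, path)
    if existing is not None:
        return existing["id"]
    return create_rule(principal, path)


def explain_rules(rules: List[dict], records: Dict[str, AccessRuleRecord]) -> List[str]:
    """Annotate access-list entries with the Dataverse user and dataset, for admins."""
    lines = []
    for rule in rules:
        record = records.get(rule["id"])
        owner = f"{record.dataverse_user} {record.dataset_pid}" if record else "unknown (stale?)"
        lines.append(f"{rule['id']} {rule['principal']} {rule['path']} -> {owner}")
    return lines


# Usage: a rule left over from an interrupted download is found and reused.
access_list = [{"id": "rule-1", "principal": "identity-abc", "path": "/dataset-42/"}]
created = []


def create_rule(principal: str, path: str) -> str:
    created.append((principal, path))
    return "rule-new"


rule_id = ensure_rule(access_list, "identity-abc", "/dataset-42/", create_rule)
records = {
    "rule-1": AccessRuleRecord(
        "rule-1", "identity-abc", "/dataset-42/", "@alice", "doi:10.5072/FK2/EXAMPLE"
    )
}
print(rule_id, created)  # rule-1 []
print(explain_rules(access_list, records))
```

Rules with no saved record are reported as possibly stale, which gives an admin a starting point without needing the Globus web console or FINE-level server logs.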
