Now that another collection of NESE-stored, Globus-accessible datasets has been published at IQSS, combining a large volume of data, large numbers of files per dataset AND apparent popularity with actual users, some limitations of the Globus download framework have become apparent.
(Starting a list of issues that need to be addressed below; work in progress.)
1. The same thing that was addressed in Improved handling of Globus uploads (experimental async framework) #10781 (merged in Sept.) is clearly a problem with downloads as well. Even when everything is working as it should, the current implementation relies on continuous looping for the duration of the remote transfer, and that is bound to cause problems. So the same async framework, where the state of an ongoing upload is saved in the database, needs to be implemented for downloads as well.
2. When problems arise (such as network problems, or issues with the Globus service on the data storage end, as may have been the case last week), Dataverse handles them poorly in assorted ways. The simplest example: when Globus Service gets back "status": "ACTIVE", … "nice_status": "CONNECT_FAILED" while checking on an ongoing download task, Dataverse assumes that the task has failed beyond recovery and proceeds to remove the permission. All the response means is that there was a failure to connect to the remote Globus service. "ACTIVE" means just that: the remote client will keep trying to reconnect. With the permission permanently removed, those retries are bound to keep failing even once the service becomes available again.
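To illustrate the second item, here is a minimal Python sketch (not Dataverse code) of a status check that only treats a task as finished when the top-level "status" says so, regardless of "nice_status". The function names are hypothetical, and treating every status other than "SUCCEEDED" and "FAILED" as still in progress is an assumption made for the sketch.

```python
def classify_task(task: dict) -> str:
    """Return 'succeeded', 'failed', or 'in_progress' for a task document."""
    status = task.get("status")
    if status == "SUCCEEDED":
        return "succeeded"
    if status == "FAILED":
        return "failed"
    # Assumption: anything else, including "ACTIVE" with a worrying
    # "nice_status" such as "CONNECT_FAILED", may still recover.
    return "in_progress"


def may_remove_permission(task: dict) -> bool:
    """Only drop the access rule once the task can no longer make progress."""
    return classify_task(task) != "in_progress"
```

With the response quoted above, `may_remove_permission({"status": "ACTIVE", "nice_status": "CONNECT_FAILED"})` returns `False`, so the permission stays in place while the remote client keeps retrying.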
Dataverse apparently fails to create an access rule for a user if a rule for the target folder already exists (i.e., if Dataverse failed to delete it after the last download attempt). This is of course exacerbated by the first issue on the list above: Dataverse will never delete the rule if it gets restarted before the download is completed. It should be smarter about this; if a rule is already there, just use it.
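The "just use it" behavior could look roughly like the Python sketch below. It is an illustration only: `find_rule`, `ensure_rule` and the `create_rule` callback are hypothetical names, the rules are plain dictionaries with "id", "principal" and "path" keys as a stand-in for whatever the endpoint's access list returns, and matching on principal and path alone (without comparing permissions) is a simplifying assumption.

```python
from typing import Callable, Iterable, Optional


def find_rule(rules: Iterable[dict], principal: str, path: str) -> Optional[dict]:
    """Return the first existing rule for this principal and folder, if any."""
    for rule in rules:
        if rule.get("principal") == principal and rule.get("path") == path:
            return rule
    return None


def ensure_rule(
    rules: Iterable[dict],
    principal: str,
    path: str,
    create_rule: Callable[[str, str], str],
) -> str:
    """Reuse a stale rule when one is present; otherwise create a new one."""
    existing = find_rule(rules, principal, path)
    if existing is not None:
        return existing["id"]
    # No matching rule on the endpoint: fall back to the normal creation path.
    return create_rule(principal, path)
```

The caller passes in the rules it has already fetched from the endpoint, so a leftover rule from an interrupted download leads to reuse instead of a failed creation attempt.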
As currently implemented, it is also exceptionally difficult for an admin to help specific users. For example, if the problem is a stale rule on the endpoint (issue 3.), it's not at all easy to map the "principals" shown in the output of the /access_list call to specific users, especially without access to the Globus web console. (It can only be done by sifting through the server logs, and only with FINE logging enabled.) This can be made easier by saving some extra information with the rest of the ongoing task state in the database (as part of item 1. on the initial list).
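As a rough illustration of what saving that extra information could enable, the Python sketch below keeps one row per ongoing download task and lets an admin look up users by principal. Everything here is hypothetical: the table name, the columns, and the use of SQLite are assumptions made so the example is self-contained, not a description of the Dataverse schema.

```python
import sqlite3

# Hypothetical table: one row per ongoing Globus download task.
SCHEMA = """
CREATE TABLE IF NOT EXISTS globus_download_task (
    task_id        TEXT PRIMARY KEY,
    rule_id        TEXT,
    principal      TEXT NOT NULL,
    dataverse_user TEXT NOT NULL,
    dataset_id     INTEGER NOT NULL,
    status         TEXT NOT NULL
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_task(conn, task_id, rule_id, principal, dataverse_user, dataset_id):
    """Save the task state when the download starts, so it survives a restart."""
    conn.execute(
        "INSERT INTO globus_download_task VALUES (?, ?, ?, ?, ?, 'ACTIVE')",
        (task_id, rule_id, principal, dataverse_user, dataset_id),
    )
    conn.commit()


def users_for_principal(conn, principal):
    """Map a principal seen in an access list back to Dataverse users."""
    rows = conn.execute(
        "SELECT DISTINCT dataverse_user FROM globus_download_task "
        "WHERE principal = ? ORDER BY dataverse_user",
        (principal,),
    ).fetchall()
    return [row[0] for row in rows]


def unfinished_tasks(conn):
    """Tasks a monitoring job should pick up again after a restart."""
    rows = conn.execute(
        "SELECT task_id FROM globus_download_task "
        "WHERE status = 'ACTIVE' ORDER BY task_id"
    ).fetchall()
    return [row[0] for row in rows]
```

Because the rule id and the principal are stored next to the user, an admin could answer "whose rule is this?" with a query instead of FINE-level logs, and the same rows give a restarted server the list of tasks it still has to monitor and clean up after.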