Right now UriSource implicitly supports downloading all files directly under the input URL (by appending a wildcard character * to the URL), but downloads are strictly sequential and go through the coordinator. We should support parallel downloads at least in the REST API (if not MyriaL) by adding a new endpoint, parallelIngestDatasets. It would take either a URL wildcard expression (evaluated by org.apache.hadoop.fs.FileSystem.globStatus(), as in UriSource) or a list of URLs (possibly in a separate endpoint), and distribute the downloads over all available workers, using the file sizes reported by org.apache.hadoop.fs.FileSystem.getFileStatus().getLen() and some greedy bin packing heuristic. We could then replace the parallel ingest API in myria-python with a call to this REST API. Eventually we could consider supporting parallel downloads directly in MyriaL.
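To make the "greedy bin packing heuristic" concrete, here is a minimal sketch of one possible choice: largest-file-first assignment to the currently least-loaded worker. It is only an illustration, not a settled design. The function name assign_files_to_workers and the example URLs are hypothetical, and the sizes are assumed to have already been collected (e.g. from getLen()) into a plain dictionary.

```python
import heapq
from typing import Dict, List, Sequence


def assign_files_to_workers(
    file_sizes: Dict[str, int], worker_ids: Sequence[int]
) -> Dict[int, List[str]]:
    """Hypothetical helper: greedily balance total bytes across workers.

    file_sizes maps each URL to its size in bytes (assumed to be known
    up front). Files are placed largest first, each on the worker with
    the fewest bytes assigned so far.
    """
    if not worker_ids:
        raise ValueError("at least one worker is required")

    assignment: Dict[int, List[str]] = {w: [] for w in worker_ids}

    # Min-heap of (bytes assigned so far, worker id).
    heap = [(0, w) for w in worker_ids]
    heapq.heapify(heap)

    # Largest files first; ties broken by URL so the result is deterministic.
    for url, size in sorted(file_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, worker = heapq.heappop(heap)
        assignment[worker].append(url)
        heapq.heappush(heap, (load + size, worker))

    return assignment


# Example with made-up URLs and sizes, split over two workers.
example_sizes = {
    "hdfs://example/a": 10,
    "hdfs://example/b": 7,
    "hdfs://example/c": 5,
    "hdfs://example/d": 4,
    "hdfs://example/e": 2,
}
example_plan = assign_files_to_workers(example_sizes, [1, 2])
```

In this example both workers end up with 14 bytes each. A heuristic like this balances bytes only; it ignores per-file overhead and differences in worker bandwidth, which the real endpoint might need to account for.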