Retrievals in combination with xarray depend on the used cluster #19

observingClouds · 2022-12-08T10:04:48Z

xarray seems to request a different number of files concurrently depending on the cluster configuration:

levante interactive

import xarray as xr
ds=xr.open_mfdataset("slk:///arch/mh0010/m300408/showcase/dataset.zarr", engine="zarr")
ds.air.mean().compute()
/scratch/m/m300408/arch/mh0010/m300408/showcase/dataset.zarr/air/0.0.0
/scratch/m/m300408/arch/mh0010/m300408/showcase/dataset.zarr/air/0.0.1
slk search '{"$and":[{"path":{"$gte":"/arch/mh0010/m300408/showcase/dataset.zarr/air","$max_depth":1}},{"resources.name":{"$regex":"0.0.0|0.0.1"}}]}'
/scratch/m/m300408/arch/mh0010/m300408/showcase/dataset.zarr/air/0.1.0
/scratch/m/m300408/arch/mh0010/m300408/showcase/dataset.zarr/air/0.1.1
slk search '{"$and":[{"path":{"$gte":"/arch/mh0010/m300408/showcase/dataset.zarr/air","$max_depth":1}},{"resources.name":{"$regex":"0.1.0|0.1.1"}}]}'
/scratch/m/m300408/arch/mh0010/m300408/showcase/dataset.zarr/air/1.0.0
/scratch/m/m300408/arch/mh0010/m300408/showcase/dataset.zarr/air/1.0.1
slk search '{"$and":[{"path":{"$gte":"/arch/mh0010/m300408/showcase/dataset.zarr/air","$max_depth":1}},{"resources.name":{"$regex":"1.0.0|1.0.1"}}]}'
/scratch/m/m300408/arch/mh0010/m300408/showcase/dataset.zarr/air/1.1.0
/scratch/m/m300408/arch/mh0010/m300408/showcase/dataset.zarr/air/1.1.1
slk search '{"$and":[{"path":{"$gte":"/arch/mh0010/m300408/showcase/dataset.zarr/air","$max_depth":1}},{"resources.name":{"$regex":"1.1.0|1.1.1"}}]}'

levante compute

import xarray as xr
ds=xr.open_mfdataset("slk:///arch/mh0010/m300408/showcase/dataset.zarr", engine="zarr")
ds.air.mean().compute()
/scratch/m/m300408/arch/mh0010/m300408/showcase/dataset.zarr/air/0.0.0
/scratch/m/m300408/arch/mh0010/m300408/showcase/dataset.zarr/air/0.0.1
/scratch/m/m300408/arch/mh0010/m300408/showcase/dataset.zarr/air/0.1.0
/scratch/m/m300408/arch/mh0010/m300408/showcase/dataset.zarr/air/0.1.1
/scratch/m/m300408/arch/mh0010/m300408/showcase/dataset.zarr/air/1.0.0
/scratch/m/m300408/arch/mh0010/m300408/showcase/dataset.zarr/air/1.0.1
/scratch/m/m300408/arch/mh0010/m300408/showcase/dataset.zarr/air/1.1.0
/scratch/m/m300408/arch/mh0010/m300408/showcase/dataset.zarr/air/1.1.1
slk search '{"$and":[{"path":{"$gte":"/arch/mh0010/m300408/showcase/dataset.zarr/air","$max_depth":1}},{"resources.name":{"$regex":"0.0.0|0.0.1|0.1.0|0.1.1|1.0.0|1.0.1|1.1.0|1.1.1"}}]}'
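The search queries in the logs above all share one shape, which a short sketch can reproduce. The helper name `build_slk_query` is hypothetical and is not taken from slkspec; it only mirrors the JSON printed in the logs, including the unescaped dots in the regex.

```python
import json


def build_slk_query(directory, chunk_names):
    """Build an `slk search` JSON query shaped like the ones in the logs.

    Hypothetical helper for illustration only, not the slkspec implementation.
    """
    query = {
        "$and": [
            {"path": {"$gte": directory, "$max_depth": 1}},
            # Chunk names are joined with "|" and left unescaped, as in the logs.
            {"resources.name": {"$regex": "|".join(chunk_names)}},
        ]
    }
    # Compact separators reproduce the whitespace-free form seen in the logs.
    return json.dumps(query, separators=(",", ":"))


air = "/arch/mh0010/m300408/showcase/dataset.zarr/air"
print(build_slk_query(air, ["0.0.0", "0.0.1"]))
```

Passing all eight chunk names at once yields the single combined query seen on the compute node.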

antarcticrainforest · 2023-01-12T15:54:29Z

Again, this is related to my comment in #10. I think

observingClouds · 2023-01-14T19:16:56Z

A few more insights. The difference is likely caused by dask and by what its task graph looks like. The task graph is created differently depending on the available resources: if more resources are available, more data is requested at once.
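As a rough illustration of this effect, the sketch below groups the eight chunk requests from the logs in the first comment by how many are in flight at once. The function `batch_requests` and the concurrency values 2 and 8 are assumptions chosen to match those logs; real dask scheduling is considerably more involved.

```python
def batch_requests(chunk_names, concurrent_tasks):
    """Group chunk requests as if `concurrent_tasks` of them arrive together.

    Toy model only: it slices the list in order and ignores real scheduling.
    """
    return [
        chunk_names[i:i + concurrent_tasks]
        for i in range(0, len(chunk_names), concurrent_tasks)
    ]


chunks = ["0.0.0", "0.0.1", "0.1.0", "0.1.1", "1.0.0", "1.0.1", "1.1.0", "1.1.1"]

# Two requests at a time: four separate searches, as on the interactive node.
print(batch_requests(chunks, 2))
# Eight requests at a time: one combined search, as on the compute node.
print(batch_requests(chunks, 8))
```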

#21 scans the task graph and gathers all open-dataset requests up front, so the retrieval is independent of the available resources.
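The idea of scanning a task graph can be sketched on a toy graph in dask's classic dict-of-tuples form, where each task is `(callable, *args)`. This is not the code from #21: `open_chunk` and `collect_open_requests` are hypothetical names, and real xarray/dask graphs are layered and need more careful traversal.

```python
def open_chunk(path):
    """Hypothetical stand-in for the function that opens one archived file."""
    return path


def collect_open_requests(graph, opener):
    """Return the arguments of every task in `graph` whose callable is `opener`."""
    paths = []
    for task in graph.values():
        # A classic dask task is a tuple whose first element is a callable.
        if isinstance(task, tuple) and task and task[0] is opener:
            paths.extend(task[1:])
    return paths


base = "/arch/mh0010/m300408/showcase/dataset.zarr/air/"
graph = {
    "chunk-0": (open_chunk, base + "0.0.0"),
    "chunk-1": (open_chunk, base + "0.0.1"),
    # A downstream task that is not an open request and is therefore skipped.
    "total": (sum, ["chunk-0", "chunk-1"]),
}

# All files are known before anything is computed, regardless of worker count.
print(collect_open_requests(graph, open_chunk))
```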

observingClouds · 2023-02-06T00:27:31Z

With #21 merged, the recommended way to retrieve files is the ds.slk.stage() command, which operates independently of the available resources.

observingClouds · 2023-02-22T21:42:24Z

Unfortunately, this issue seems to remain. Dask still schedules the retrievals depending on the resources available to the cluster.

observingClouds mentioned this issue Dec 8, 2022

Combination of retrievals #12

Open

5 tasks

observingClouds closed this as completed Feb 6, 2023

observingClouds reopened this Feb 22, 2023
