Issue with Zip format files #117

Open
budprat opened this issue May 10, 2023 · 5 comments

Comments

@budprat

budprat commented May 10, 2023

I tried using odc-stac to load files stored in zip archives through a STAC catalogue, but it appears to be unable to open zip files. The Python code used was od = stac_load(items, bands=('CLM_R1', 'CLM_R2'), crs='32632', resolution=10), and the error output is below.

Aborting load due to failure while reading: https://download.geoservice.dlr.de/S2_L2A_MAJA/files/32/U/PF/2020/03/SENTINEL2B_20200307-102543-579_L2A_T32UPF_C_V1-2.zip#SENTINEL2B_20200307-102543-579_L2A_T32UPF_C/MASKS/SENTINEL2B_20200307-102543-579_L2A_T32UPF_C_CLM_R1.tif:1

---------------------------------------------------------------------------
CPLE_OpenFailedError                      Traceback (most recent call last)
rasterio/_base.pyx in rasterio._base.DatasetBase.__init__()

rasterio/_base.pyx in rasterio._base.open_dataset()

rasterio/_err.pyx in rasterio._err.exc_wrap_pointer()

CPLE_OpenFailedError: '/vsicurl/https://download.geoservice.dlr.de/S2_L2A_MAJA/files/32/U/PF/2020/03/SENTINEL2B_20200307-102543-579_L2A_T32UPF_C_V1-2.zip' not recognized as a supported file format.

During handling of the above exception, another exception occurred:

RasterioIOError                           Traceback (most recent call last)
<ipython-input-25-e57571da046b> in <module>
----> 1 od = stac_load(items, bands=('CLM_R1', 'CLM_R2'), crs='32632', resolution=10)

~/.local/lib/python3.8/site-packages/odc/stac/_load.py in load(items, bands, groupby, resampling, dtype, chunks, pool, crs, resolution, anchor, geobox, bbox, lon, lat, x, y, like, geopolygon, progress, fail_on_error, stac_cfg, patch_url, preserve_original_order, **kw)
    608         _work = progress(SizedIterable(_work, total_tasks))
    609 
--> 610     for _ in _work:
    611         pass
    612 

~/.local/lib/python3.8/site-packages/odc/stac/_utils.py in pmap(func, inputs, pool)
     36     """
     37     if pool is None:
---> 38         yield from map(func, inputs)
     39         return
     40 

~/.local/lib/python3.8/site-packages/odc/stac/_load.py in _do_one(task)
    599         ]
    600         with rio_env(**_rio_env):
--> 601             _ = _fill_2d_slice(srcs, task.dst_gbox, task.cfg, dst_slice)
    602         t, y, x = task.idx_tyx
    603         return (task.band, t, y, x)

~/.local/lib/python3.8/site-packages/odc/stac/_load.py in _fill_2d_slice(srcs, dst_gbox, cfg, dst)
    696 
    697     src, *rest = srcs
--> 698     _roi, pix = rio_read(src, cfg, dst_gbox, dst=dst)
    699 
    700     for src in rest:

~/.local/lib/python3.8/site-packages/odc/stac/_reader.py in rio_read(src, cfg, dst_geobox, dst)
    192                 src.band,
    193             )
--> 194             raise e
    195 
    196     # Failed to read, but asked to continue

~/.local/lib/python3.8/site-packages/odc/stac/_reader.py in rio_read(src, cfg, dst_geobox, dst)
    184 
    185     try:
--> 186         return _rio_read(src, cfg, dst_geobox, dst)
    187     except rasterio.errors.RasterioIOError as e:
    188         if cfg.fail_on_error:

~/.local/lib/python3.8/site-packages/odc/stac/_reader.py in _rio_read(src, cfg, dst_geobox, dst)
    217     ttol = 0.9 if cfg.nearest else 0.05
    218 
--> 219     with rasterio.open(src.uri, "r", sharing=False) as rdr:
    220         assert isinstance(rdr, rasterio.DatasetReader)
    221         ovr_idx: Optional[int] = None

~/.local/lib/python3.8/site-packages/rasterio/env.py in wrapper(*args, **kwds)
    449 
    450         with env_ctor(session=session):
--> 451             return f(*args, **kwds)
    452 
    453     return wrapper

~/.local/lib/python3.8/site-packages/rasterio/__init__.py in open(fp, mode, driver, width, height, count, crs, transform, dtype, nodata, sharing, **kwargs)
    302 
    303         if mode == "r":
--> 304             dataset = DatasetReader(path, driver=driver, sharing=sharing, **kwargs)
    305         elif mode == "r+":
    306             dataset = get_writer_for_path(path, driver=driver)(

rasterio/_base.pyx in rasterio._base.DatasetBase.__init__()

RasterioIOError: '/vsicurl/https://download.geoservice.dlr.de/S2_L2A_MAJA/files/32/U/PF/2020/03/SENTINEL2B_20200307-102543-579_L2A_T32UPF_C_V1-2.zip' not recognized as a supported file format.
@Kirill888
Member

@budprat please see this comment on a similar issue:

#114 (comment)

  1. Not sure if GDAL even supports reading JP2 from inside a zip file on a remote server
  2. If it does support that, and is not excruciatingly slow at it, one would need to construct an appropriate vsi url for it, which odc-stac currently does not.

Can you link to, or provide a copy of, the STAC Item document you are using?

Given the rate at which the linked zip file is downloading for me right now, I strongly doubt that loading that data directly from the remote server would be a pleasant experience. odc-stac does not support a persistent local cache, only whatever GDAL caches per process.

@budprat
Author

budprat commented May 11, 2023

@Kirill888 Thanks for your reply. I was not able to download datacubes using stackstac either, but I think GDAL can read data from zip files. The STAC document I used is here: https://geoservice.dlr.de/eoc/ogc/stac/v1/collections/S2_L2A_MAJA/items?f=application%2Fgeo%2Bjson

@Kirill888
Member

@budprat ok, so data urls look like this:

https://download.geoservice.dlr.de/S2_L2A_MAJA/files/33/U/UA/2023/05/SENTINEL2A_20230509-103538-197_L2A_T33UUA_C_V1-3.zip#SENTINEL2A_20230509-103538-197_L2A_T33UUA_C/MASKS/SENTINEL2A_20230509-103538-197_L2A_T33UUA_C_CLM_R1.tif

odc.stac.load accepts a patch_url function that maps a source URL to a destination URL. You can try using that mechanism to translate these URLs into a VSI path format that GDAL understands. I suggest using rio info <url> when testing things. Once you have figured out a URL format that rio/GDAL understands, you can codify that transformation in a function and pass it to odc.stac.load(.., patch_url=your_function).
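
A minimal sketch of such a translation, assuming the data URLs always follow the archive.zip#path/inside/archive.tif pattern shown above. It relies on GDAL's curly-brace chaining syntax, /vsizip/{/vsicurl/<archive url>}/<path inside archive>. The function name zip_url_to_vsi is hypothetical, and the result has not been tested against this server, so check a translated path with rio info before relying on it.

```python
from urllib.parse import urldefrag


def zip_url_to_vsi(url: str) -> str:
    """Translate 'https://host/archive.zip#inner/path.tif' into a GDAL
    chained path '/vsizip/{/vsicurl/https://host/archive.zip}/inner/path.tif'.

    URLs that do not point inside a .zip archive are returned unchanged.
    (Hypothetical helper, not part of odc-stac.)
    """
    # Split the URL into the archive location and the '#fragment' part.
    archive, inner = urldefrag(url)
    if not inner or not archive.lower().endswith(".zip"):
        return url
    # Curly braces delimit the archive path so GDAL can chain /vsizip/ over /vsicurl/.
    return "/vsizip/{/vsicurl/" + archive + "}/" + inner.lstrip("/")


if __name__ == "__main__":
    print(zip_url_to_vsi("https://example.com/data/a.zip#dir/b.tif"))
```

If rio info accepts the translated path, the function could then be passed as odc.stac.load(..., patch_url=zip_url_to_vsi). Whether GDAL reads a remote zip at a usable speed is a separate question, as noted earlier in the thread.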

@budprat
Author

budprat commented May 28, 2023

Thanks for the workaround. I found the correct rasterio/GDAL URL, but when I pass it as a str to patch_url it is not accepted; the error says 'str' object is not callable. Can you provide an example of using patch_url?

@Kirill888
Member

@budprat you don't pass the URL you computed for that exact input; you pass the name of a Python function, written by you, that computes a new URL from an existing URL. For example, the code below replaces all https: URLs with their http: equivalents. In your case the URL transformation logic will be more involved.

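def https_to_http(url):
    # Rewrite only https: URLs; return everything else unchanged.
    if url.startswith("https:"):
        return url.replace("https:", "http:")
    return url

# Pass the function itself, not a URL string and not the result of calling it.
odc.stac.load(..., patch_url=https_to_http)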
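
To illustrate why a string is rejected, here is a small self-contained sketch. resolve_urls is a hypothetical stand-in for a loader that calls patch_url once per source URL; it is not odc-stac's actual code, only an assumption about the general shape of the mechanism.

```python
from typing import Callable, Iterable, List, Optional


def resolve_urls(
    urls: Iterable[str],
    patch_url: Optional[Callable[[str], str]] = None,
) -> List[str]:
    """Hypothetical stand-in for a loader: applies patch_url to every URL."""
    if patch_url is None:
        return list(urls)
    return [patch_url(u) for u in urls]


def https_to_http(url: str) -> str:
    # Swap only the leading scheme; leave other URLs untouched.
    if url.startswith("https:"):
        return "http:" + url[len("https:"):]
    return url


urls = ["https://example.com/a.tif", "s3://bucket/b.tif"]

# A function works: it is called once for each URL.
print(resolve_urls(urls, patch_url=https_to_http))

# A string fails as soon as the loader tries to call it.
try:
    resolve_urls(urls, patch_url="http://example.com/a.tif")
except TypeError as e:
    print("TypeError:", e)
```

The second call fails with the same kind of error reported above, because a str cannot be called like a function.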
