Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merging AHN tiles takes forever #429

Open
martinvonk opened this issue Mar 17, 2025 · 5 comments · May be fixed by #431
Open

Merging AHN tiles takes forever #429

martinvonk opened this issue Mar 17, 2025 · 5 comments · May be fixed by #431
Assignees
Labels
bug Something isn't working

Comments

@martinvonk
Copy link
Collaborator

martinvonk commented Mar 17, 2025

When downloading the AHN5 (and 4) for a relatively large extent and at 05m scale.

extent = [101_000, 120_000, 412_000, 432_000]
ahn = nlmod.read.ahn.get_ahn5(extent=extent, identifier="AHN5_05M_M")

results in

Downloading tiles of AHN5_05M_M: 100%|██████████| 20/20 [00:03<00:00,  6.66it/s]
INFO:rasterio.merge.merge:Skipping source: src=<rioxarray.merge.RasterioDatasetDuck object at 0x7f05ad0dc2d0>, bounds=(105000.0, 425000.0, 110000.0, 431250.0)
INFO:rasterio.merge.merge:Skipping source: src=<rioxarray.merge.RasterioDatasetDuck object at 0x7f05ad0dd4d0>, bounds=(115000.0, 425000.0, 120000.0, 431250.0)
INFO:rasterio.merge.merge:Skipping source: src=<rioxarray.merge.RasterioDatasetDuck object at 0x7f05ad07c150>, bounds=(110000.0, 431250.0, 115000.0, 437500.0)
INFO:rasterio.merge.merge:Skipping source: src=<rioxarray.merge.RasterioDatasetDuck object at 0x7f05ad07fdd0>, bounds=(115000.0, 431250.0, 120000.0, 437500.0)
INFO:rasterio.merge.merge:Skipping source: src=<rioxarray.merge.RasterioDatasetDuck object at 0x7f05ad07fa90>, bounds=(100000.0, 418750.0, 105000.0, 425000.0)
INFO:rasterio.merge.merge:Skipping source: src=<rioxarray.merge.RasterioDatasetDuck object at 0x7f05ad0dc610>, bounds=(115000.0, 418750.0, 120000.0, 425000.0)
INFO:rasterio.merge.merge:Skipping source: src=<rioxarray.merge.RasterioDatasetDuck object at 0x7f05ad0e2190>, bounds=(110000.0, 425000.0, 115000.0, 431250.0)
INFO:rasterio.merge.merge:Skipping source: src=<rioxarray.merge.RasterioDatasetDuck object at 0x7f05ad0a5750>, bounds=(110000.0, 406250.0, 115000.0, 412500.0)
INFO:rasterio.merge.merge:Skipping source: src=<rioxarray.merge.RasterioDatasetDuck object at 0x7f05ad232350>, bounds=(110000.0, 418750.0, 115000.0, 425000.0)
INFO:rasterio.merge.merge:Skipping source: src=<rioxarray.merge.RasterioDatasetDuck object at 0x7f05ad231ad0>, bounds=(105000.0, 418750.0, 110000.0, 425000.0)
INFO:rasterio.merge.merge:Skipping source: src=<rioxarray.merge.RasterioDatasetDuck object at 0x7f05ad2320d0>, bounds=(105000.0, 406250.0, 110000.0, 412500.0)
INFO:rasterio.merge.merge:Skipping source: src=<rioxarray.merge.RasterioDatasetDuck object at 0x7f05ad2307d0>, bounds=(100000.0, 406250.0, 105000.0, 412500.0)
INFO:rasterio.merge.merge:Skipping source: src=<rioxarray.merge.RasterioDatasetDuck object at 0x7f05ad232a50>, bounds=(100000.0, 412500.0, 105000.0, 418750.0)
INFO:rasterio.merge.merge:Skipping source: src=<rioxarray.merge.RasterioDatasetDuck object at 0x7f05ad2305d0>, bounds=(110000.0, 412500.0, 115000.0, 418750.0)
INFO:rasterio.merge.merge:Skipping source: src=<rioxarray.merge.RasterioDatasetDuck object at 0x7f05ad2311d0>, bounds=(105000.0, 431250.0, 110000.0, 437500.0)
INFO:rasterio.merge.merge:Skipping source: src=<rioxarray.merge.RasterioDatasetDuck object at 0x7f05ad231c90>, bounds=(115000.0, 406250.0, 120000.0, 412500.0)
INFO:rasterio.merge.merge:Skipping source: src=<rioxarray.merge.RasterioDatasetDuck object at 0x7f05ad232590>, bounds=(105000.0, 412500.0, 110000.0, 418750.0)
INFO:rasterio.merge.merge:Skipping source: src=<rioxarray.merge.RasterioDatasetDuck object at 0x7f05ad233e50>, bounds=(115000.0, 412500.0, 120000.0, 418750.0)

which does not stop running.

@martinvonk martinvonk added the bug Something isn't working label Mar 17, 2025
@github-project-automation github-project-automation bot moved this to Todo in NHFLO Mar 17, 2025
@martinvonk
Copy link
Collaborator Author

martinvonk commented Mar 17, 2025

@rubencalje and I tried changing lines 620:624 of ahn.py to:

        # da = da.sel(x=slice(extent[0], extent[1]), y=slice(extent[3], extent[2]))
        das.append(da)
    if len(das) == 0:
        raise (ValueError("No data found within extent"))
    da = merge_arrays(das, bounds=(extent[0], extent[2], extent[1], extent[3]))

But that did not help..

@martinvonk
Copy link
Collaborator Author

I tried:

        # da = da.sel(x=slice(extent[0], extent[1]), y=slice(extent[3], extent[2]))
        das.append(da)
    if len(das) == 0:
        raise (ValueError("No data found within extent"))
    da = merge_arrays(das)
return da.sel(x=slice(extent[0], extent[1]), y=slice(extent[3], extent[2]))

But that did not work either.

@martinvonk
Copy link
Collaborator Author

Setting the mem_limit of _merge_rio to 8*64 (default 64MB) does not work either.

@rubencalje
Copy link
Collaborator

rubencalje commented Mar 18, 2025

So rioxarray.merge.merge_arrays hangs for some reason. One solution would be to first download the ahn-tiles, and then merge afterwards using rasterio (I tried rioxarray.merge.merge_arrays on the downloaded files, this still fails):

# %%
import os
import numpy as np
import matplotlib
import rasterio
import rioxarray
import nlmod

# %%
nlmod.util.get_color_logger()
pathname = "issue_429"
extent = [101_000, 120_000, 412_000, 432_000]
bounds = (extent[0], extent[2], extent[1], extent[3])

# %% download extent
tiles = nlmod.read.ahn.get_ahn5(
    extent=extent, identifier="AHN5_05M_M", return_tiles=True
)

# %% save tiles
if not os.path.isdir(pathname):
    os.makedirs(pathname)
for i in range(len(tiles)):
    fname = os.path.join(pathname, f"ahn_{i}.tif")
    tiles[i].rio.to_raster(fname)

# %% read tiles again using rasterio, merge and save
images = []
for file in os.listdir(pathname):
    fname = os.path.join(pathname, file)
    images.append(rasterio.open(fname))
dest, out_transform = rasterio.merge.merge(images, bounds=bounds, nodata=np.nan)

profile = {
    "driver": "GTiff",
    "height": dest.shape[1],
    "width": dest.shape[2],
    "count": 1,
    "dtype": "float32",
    "transform": out_transform,
}
with rasterio.open("ahn_merged.tif", "w", crs="EPSG:28992", **profile) as dst:
    dst.write(dest)

# %% read again using rioxarray
ahn_merged = rioxarray.open_rasterio("ahn_merged.tif")

This code needs the extra argument return_tiles in get_ahn5, which are added by PR #431.

@martinvonk
Copy link
Collaborator Author

Thanks for checking :)

I'll try the code in the PR. But if I understand you correctly. We could also just change rasterio.merge.merge_arrays to rasterio.merge.merge? I think that works because merge_arrays transforms the xarray.DataArray to a rioxarray.merge.RasterioDatasetDuck?

@martinvonk martinvonk linked a pull request Mar 19, 2025 that will close this issue
@martinvonk martinvonk linked a pull request Mar 28, 2025 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants