Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data gap at adjacent tile boundary when loading multiple Items #150

Open
pjhartzell opened this issue Apr 9, 2024 · 4 comments
Open

Data gap at adjacent tile boundary when loading multiple Items #150

pjhartzell opened this issue Apr 9, 2024 · 4 comments

Comments

@pjhartzell
Copy link

Summary:

When loading and reprojecting adjacent MODIS sinusoidal tiles to UTM with odc.stac.load, I'm ending up with a small sliver of missing data at the sinusoidal tile boundary. If you load each Item separately and plot them on the same graph, there is no gap:

image

But when loading Items simultaneously as an ItemCollection or just a list of Items (which is what I'd like to do), I end up with a small gap at the tile boundary:

image

My project requires odc-stac version 0.3.5 at the moment. I upgraded to version 0.3.9 and that seemed to fix the problem, but closer investigation revealed that my anchor=AnchorEnum.CENTER argument is being ignored in 0.3.9 and replaced with anchor=AnchorEnum.EDGE. Using EDGE does get rid of the gap, but I need to stick with CENTER. I also do not know why EDGE fixes the problem or if it will hold up in areas where the MODIS Sinusoidal grid is more heavily distorted.

I put together some minimal STAC Items and Jupyter notebook for each of the above issues at https://github.com/pjhartzell/data-gap.

Potential solution to the observed anchor problem:

If this is actually a bug and not a feature that I'm not understanding, in looking through the output_geobox function, it looks like the incoming anchor argument is never used. Is it possible that L968 should be changed from:

anchor = _anchor

to something like this?

anchor = anchor if anchor else _anchor

Versions, hardware

Currently pinned to:
odc-stac==0.3.5
odc-geo==0.3.3
odc-algo==0.2.3

Tested with Python 3.9 and 3.11.

Apple M1 (arm64)

@Kirill888
Copy link
Member

@pjhartzell thanks for reporting this, anchor config does look broken, do you mind creating a separate issue for that?

Issues with projection change on tiled products is tricky and sensitive to tiny shifts, like half-pixel shift you are observing due to anchor bug. My understanding is that modis tiling regime uses proper tiles in the mathematical sense of the word: no gaps and no overlaps between tiles. With data like that there is a difference between mosaic([reproject(tile) for tile in tiles]) and reproject(mosaic(tiles)), with the former leading to artifacts you are observing, especially when reproject generates smaller pixels (upsampling data). One has to first put tiles together into one single pixel plane, and only then sample from that plane to change projection/resolution/alignment.

@Kirill888
Copy link
Member

I have verified this being an issue in the latest code too. Issues goes away when you either

  1. Use resampling other than nearest
  2. Perform projection change AFTER loading in the native projection of the data (i.e. do not set crs=, acnhor=, followed by .odc.reproject(..))

Proper solution to this will be complicated. Currently fusing of pixel data from different Items happens after reprojection to a common pixel grid. That's the easiest to implement and allows for greatest concurrency. For a fully proper solution one would have to

  1. Group all items by native pixel grid (CRS, resolution and pixel alignment have to be the same)
  2. Load each group in the native projection fusing rasters with common grids (mosaic)
  3. Reproject each group into output pixel grid
  4. Fuse those groups again (if more than one)

This is much more involved than a current setup:

  1. Load each item data into the requested output grid
  2. Fuse those rasters into one image

This is one of the reason why Landsat and Sentinel tiles overlap quite a bit even when deep inside UTM zone.

Work-around

Requires latest odc-geo>=0.4.3, for anchor= in .odc.reproject(..).

Code below based on linked example above:

%matplotlib inline
import odc.stac
import pystac

items = [
    pystac.read_file(path)
    for path in [
        "data/MCD43A4.A2023205.h08v04.061.2023216154635/MCD43A4.A2023205.h08v04.061.2023216154635.json",
        "data/MCD43A4.A2023205.h08v05.061.2023216154702/MCD43A4.A2023205.h08v05.061.2023216154702.json",
    ]
]

ds = odc.stac.load(items).odc.reproject(32610, anchor="center")
ds.Nadir_Reflectance_Band1.sel(
    x=slice(450000, 550000), y=slice(4480000, 4380000)
).plot.imshow(robust=True, size=8, col="time")
display(ds.odc.geobox)

image

@pjhartzell
Copy link
Author

do you mind creating a separate issue for that?

Certainly. See #151.

@pjhartzell
Copy link
Author

@Kirill888 Thanks for the clear explanation.

ds = odc.stac.load(items).odc.reproject(32610, anchor="center") eliminates the data gap. The anchor problem still manifests in the results, but should go away once #151 is fixed up (I applied a temporary patch and the output looks good).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants