Expose transform and CRS in /metadata #163
Could you please expand a bit on your use case? Why do you need pixel indices from lat / lon? I am trying to determine whether this is a useful feature for a majority of users that warrants its own API endpoint.
In my typical use cases I have multiple variables [a, b, c, ...] defined on the same coordinate grid. I then calculate some derived variable that indicates feasibility and plot it. When I click on the map to select a location, I would then like to get all of the variables [a, b, c, ...] that belong to that point. To do this, I need the raster index.

A concrete example could be assessment of new sites for wind turbines. For each grid point I have many meteorological parameters (wind speed, air density, pressure, temperature, ...). To illustrate feasibility, I would plot e.g. the mean wind speed. When the user clicks the map to select a location for further analysis, I need to get the actual values of all of the variables; hence one needs the index.

From a user perspective it is very annoying if the index mapping is not perfect. Imagine that you need to locate the pixel with the highest wind speed within your region of interest. You locate the pixel on the map and click on it. Now, if the index mapping is not perfect, the resulting selection will show a lower value. To get the "correct" site, i.e. the one with the highest wind speed, you will need to play around clicking on the neighbouring pixels until you get it right. That is a poor user experience :/

Using Terracotta for data visualization, I would say that this is a killer feature. It is really common that you make a selection on a map and need the underlying data for extended analysis. However, for visualization of "pure" images, I agree that the feature is not that useful. Still, I think that the usefulness for the data visualization case (which is pretty broad) is enough to warrant its own API endpoint.
**Pixel drilling**

We have for quite a while been planning to build a data extraction ("pixel drilling") tool for data used with Terracotta. Very similar to your use case:
We considered different options:
Earlier, we considered this to be out of scope (#79). But we can open this discussion again.

**Accuracy**

Regarding your concern with pixel accuracy: you have to be careful here, because Terracotta reprojects your data on the fly. This could be from a different coordinate reference system to Web Mercator, or to match the exact XYZ tile bounds. So the pixels you see on your map are almost never the ones you have stored in your raster.

So if you want to accurately compute stuff and stay true to the original raster data, you will need an API where you can post your point (or feature) coordinates and that will accurately translate them into the nearest pixel(s) by using the original image transform and not the XYZ tiles.

We are currently testing that concept. The performance is really nice with async retrieval of pixels from many rasters with Lambdas etc. But we do not yet have enough experience to conclude on how to implement this best - and connect it to Terracotta.

You see there are a lot of considerations to make. It would be great if you designed and prototyped something and presented your concept and considerations.
@j08lue That sounds great! My use case fits perfectly with your concept of "pixel drilling". I didn't know that the feature had been discussed before. Regarding the different options,
or alternatively with the arguments (keys, lat, lon) as query parameters.
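Spelled out, the query-parameter alternative would look something like this (endpoint name and parameter names are hypothetical, just to illustrate the shape of the request):

```
GET /pixel/{keys}?lat=55.8716&lon=12.4937
```

with the response containing the raster index (and possibly the pixel value) of the nearest pixel.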
About accuracy, the lines of code that you quote were in fact the reason that I came up with the idea to do a projection of my coordinate grid to EPSG:3857 prior to searching for the coordinate. However, as noted in the initial post, while this approach did indeed increase accuracy, it's still not perfect. In my previous look at the code, I kind of got stuck in the _get_raster_tile method, trying to figure out the index mapping. I will give it another try. I guess that as soon as the correct transformation (lat, lon) -> pixel coordinate is determined, the implementation should be doable.
Well, lat lon to pixel is simple:

```python
import rasterio
import rasterio.warp
import rasterio.windows

lat = 55.871610
lon = 12.493728

with rasterio.open('/my/target/raster.tif') as ds:
    # transform to right CRS
    x, y = rasterio.warp.transform({'init': 'epsg:4326'}, ds.crs, [lon], [lat])
    # inverse-transform from projection coordinates to indexes
    ii, jj = ~ds.transform * (x, y)
    # round indexes to integers and turn them into a window
    i = int(ii[0] + 0.5)
    j = int(jj[0] + 0.5)
    window = rasterio.windows.Window.from_slices((j, j + 1), (i, i + 1))
    # extract the data
    pixel_value = ds.read(window=window)
```

Or use
I imagine that it's simple if one knows how to do it ;). In your code, could you share the imports? And is the "~transform" operation an actual function or just pseudo code? I was thinking that there would be some rounding of indices when the pixels are drawn on the tiles, in addition to the coordinate transformation, so that depending on the zoom level and the resulting tiles, the same (lat, lon) might map to different pixels. However, I would be very happy if this is not the case :)
No, a lat lon position should map to the same source image pixel always. When you render map tiles of any zoom level, though, pixels will change size and position and their values will be interpolated accordingly. So the user will not see the exact same values on the map as in the point-extracted data. But the extracted data will be the "true" pixel data from the source image.

I added the imports to the above code. Have not tested it but it should work exactly like this.
Ah, I see, thanks. With slight modifications (it seems that the transform does not work on lists of coordinates), the code runs. Comparing the result, i.e. the resulting indices, to my previous approach, your suggested approach seems to be similar in accuracy, i.e. +/- 1 pixel of what I see on the screen.

I guess this is the very "source of the problem". Intuitively, a user would expect to get exactly the value that they clicked on. I understand why this cannot be achieved when the zoom is low, as not all pixels are rendered. But shouldn't it be possible if the zoom level is large enough that all pixels are rendered?
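For concreteness, the modification amounts to applying the inverse transform to the scalar elements rather than to the lists. A minimal sketch (the grid parameters are made up, and the affine inversion is written out by hand so the snippet runs without rasterio):

```python
# Hypothetical north-up raster transform: x = c + a * col, y = f + e * row
a, c = 60.0, 440720.0     # pixel width and x-origin (assumed values)
e, f = -60.0, 3751320.0   # pixel height (negative) and y-origin (assumed values)

# rasterio.warp.transform returns lists, even for a single point:
x, y = [441050.0], [3750000.0]

# Equivalent of `~ds.transform * (x[0], y[0])` -- note the scalar indexing
ii = (x[0] - c) / a  # fractional column index
jj = (y[0] - f) / e  # fractional row index

i = int(ii + 0.5)  # -> 6
j = int(jj + 0.5)  # -> 22
```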
(the close action was unintentional)
It's not necessarily the same pixels. The client (Leaflet) defines which image transform the XYZ tiles have, so pixels might be displaced compared to the original image. Maybe web-optimization can help here? #151

It could also be a round-off issue. How many decimals do your lat/lon coordinates have, and how large are your pixels?
May I suggest something else to help you work around the issue now. During ingestion, I suggest you add some additional metadata to your rasters:

```python
with rasterio.open(myraster) as src:
    crs = src.crs
    transform = src.transform

metadata = driver.compute_metadata(
    myraster,
    extra_metadata=dict(crs=crs, transform=transform)
)
driver.insert(..., metadata=metadata)
```

Then, you get the original CRS and transform of the raster when doing
(this is what @j08lue suggested, but in the frontend) Projecting both to another CRS is prone to off-by-one errors and is not unique, especially at high latitudes. I suggest you stay in the native projection of the raster.
You may have to get the transform and crs ready for serialization.
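As a sketch of what that serialization could look like (my assumption: the GDAL six-tuple representation for the transform and an EPSG string for the CRS, as produced by rasterio's `transform.to_gdal()` and `str(crs)`):

```python
import json

# Assumed example values, of the kind src.transform.to_gdal() and
# str(src.crs) would produce
transform_gdal = (440720.0, 60.0, 0.0, 3751320.0, 0.0, -60.0)
crs_string = "EPSG:32611"

# Both survive a JSON round trip unchanged, so they can go into the metadata
payload = json.dumps({"transform": transform_gdal, "crs": crs_string})

restored = json.loads(payload)
transform = tuple(restored["transform"])
crs = restored["crs"]
```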
Thank you for some great inputs! I am now realizing that what I was initially asking might not be possible - at least not when my data grid is in a different projection than the visualization (which is 'EPSG:3857' according to the Leaflet documentation). To overcome this issue, I am now trying out a new workflow,
My initial tests seem to indicate that this approach does yield exact pixel mapping (!), but I will need to do some more testing to be sure.
**Index lookup**

I think what you are trying to do should definitely be possible, despite the numerous reprojection steps. Writing a lookup raster is not necessary. I'm not sure whether it should be

If you consistently experience off-by-one errors with both rounding schemes, I would be grateful for a reproducible test case. It could be that Terracotta is actually off by a fraction of a pixel, so that would be a bug that needs to be fixed. Again, the recipe should be:

You can do this perfectly fine in the frontend, so I would be strongly opposed to an API endpoint that does this. Such a function belongs in a potential Terracotta client (frontend) library.

**Pixel drilling**

The pixel drilling discussion is more general: there we don't just want indices, but the actual pixel values and time series. Doing this efficiently is not possible with our current data model. Solving this problem involves a different approach to storing the data; vanilla COG won't do it. That is because COG can only be read one (internal) tile at a time. With an internal tile size of

But if you are motivated you are still welcome to try. Maybe the performance is still usable for simple applications.
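For what it's worth, whether truncation or round-half-up is the right rounding scheme depends on whether the transform maps coordinates to pixel corners or pixel centers. A small sketch of where the two schemes disagree (made-up fractional indices):

```python
def truncate(ii):
    # corner convention: pixel j covers [j, j + 1)
    return int(ii)

def round_half_up(ii):
    # center convention: pixel j covers [j - 0.5, j + 0.5)
    return int(ii + 0.5)

# The schemes agree for fractional parts below 0.5 and disagree otherwise:
pairs = [(truncate(ii), round_half_up(ii)) for ii in (22.2, 22.5, 22.8)]
# pairs == [(22, 22), (22, 23), (22, 23)]
```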
**Index lookup**

In my proposed workflow above I am not using a lookup raster. Rather, I am interpolating my data onto a uniform grid in 'EPSG:3857' prior to creating the raster (this step is not shown; for brevity I just created some dummy data), so that my data grid and my view (pixel grid) are in the same coordinate system. Does that make sense? With i = int(ii), the pixel mapping seems to be working, i.e. so far I have no reason to suspect that Terracotta is doing anything wrong.

As I understand it, to do the transformation, the raster CRS must be known. Hence the client would need access to this information somehow, but I guess that could be within the scope of a (potential) Terracotta client.

**Pixel drilling**

I see that with the current data structure, the pixel drilling cannot be perfectly efficient. However, I think that a more interesting question is whether it is efficient enough. To quantify the efficiency, I did a quick benchmark of the read code above, and on my PC the execution time is 1-2 milliseconds. For single-point requests (as in my case), I would say that this is good enough.

When you talk about scaling, do you mean non-point requests, e.g. points within a polygon?
Yes, that was the idea behind storing that information (CRS and transform) in the metadata, which the client can retrieve (
No, the problem is actually bigger with point requests than with polygons. The thing is that, even when you request a single point, GDAL/rasterio has to read the full containing block of data from the tiled GeoTIFF, and then discard all but one pixel. Hence you pay for 60,000 pixels in time and data access, but only get one. Requesting a million single pixels like that, you will effectively read 240 GB, when the data you actually need is only 4 MB.

On the other hand, if you only want to "drill" a tiny fraction of your rasters occasionally, the overhead might be worth the convenience that lies in deploying the data only once. I agree with @dionhaefner that, considering the inefficiency, this is not a feature we should include with Terracotta. But a separate API that builds on top of Terracotta's database might be a good solution for occasional usage.

An alternative is to look into implementing an alternative raster driver for a format that supports more efficient multi-dimensional data extraction. I do not know of a format that is fast enough, though.
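The arithmetic behind those figures, assuming 4-byte (float32) pixels and a block of roughly 60,000 pixels (about one 256 x 256 internal tile):

```python
tile_pixels = 60_000      # pixels read per single-point request (one GeoTIFF block)
bytes_per_pixel = 4       # float32
n_points = 1_000_000      # single-pixel requests

bytes_read = n_points * tile_pixels * bytes_per_pixel    # 240 GB actually read
bytes_needed = n_points * bytes_per_pixel                # 4 MB actually needed
overhead = bytes_read // bytes_needed                    # 60,000x
```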
Just an FYI because it wasn't mentioned, but rasterio already ships with a point sampling method (`sample`); see e.g. this snippet from cogeo-tiler: https://github.com/developmentseed/cogeo-tiler/blob/master/cogeo_tiler/handler.py#L389-L402

```python
with rasterio.open(src_path) as src_dst:
    lon_srs, lat_srs = warp.transform("epsg:4326", src_dst.crs, [lon], [lat])
    if not (
        (src_dst.bounds[0] < lon_srs[0] < src_dst.bounds[2])
        and (src_dst.bounds[1] < lat_srs[0] < src_dst.bounds[3])
    ):
        raise Exception("Point is outside the raster bounds")
    values = list(src_dst.sample([(lon_srs[0], lat_srs[0])], indexes=indexes))[
        0
    ].tolist()
```
Nice, thanks for the pointer, @vincentsarago! That nicely simplifies the code, but does not solve the problem of read overhead. Have you encountered efficient solutions for that - also with a completely different tech stack?
@j08lue I think you are mentioning @dionhaefner's comment:
Sadly I don't think we can do inner-tile value retrieval without fetching the full tile, and I'm not sure if it's possible, because we can easily retrieve a tile based on the
For anyone else with this issue, I got it working using a small extension that fetches point data from the geotiff. You can find it here: https://pypi.org/project/terracotta-toolbelt/

Be aware that it's also necessary to,

I have created a small example that demonstrates how to use Terracotta for data visualization including pixel drilling: https://github.com/thedirtyfew/terracotta-dash-example

When I get the time, I plan to add a hosted version of the example on Heroku :)
👋 @emilhe, thanks for sharing! I think it's great to have a full example deployment like yours, and your map looks gorgeous. I'm not sure whether this is how I would solve the pixel drilling, but whatever gets the job done :)

FWIW, I would actually recommend to side-step the issue and do the value lookup in the frontend. You can retrieve the RGB value from the tile images in Leaflet and map them back to the original pixel values via

Since this just operates on the tile images, you can guarantee WYSIWYG compliance and don't need any additional API calls or special preprocessing. You just need to make sure to enable CORS for tiles so you can access the canvas.
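A sketch of the lookup half of that idea, in Python for illustration (in practice this would run in the frontend, and the colormap stops, which are made up here, would come from the server):

```python
# Hypothetical colormap: (pixel value, (r, g, b)) stops
colormap = [
    (0.0, (0, 0, 128)),
    (5.0, (0, 192, 64)),
    (10.0, (255, 255, 0)),
    (15.0, (255, 0, 0)),
]

def rgb_to_value(rgb):
    """Map an on-screen RGB triple back to the closest colormap value."""
    def sq_dist(entry):
        _, (r, g, b) = entry
        return (r - rgb[0]) ** 2 + (g - rgb[1]) ** 2 + (b - rgb[2]) ** 2
    return min(colormap, key=sq_dist)[0]

value = rgb_to_value((250, 250, 10))  # -> 10.0
```

Since interpolated tile pixels can land between stops, the recovered value is only as precise as the colormap resolution.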
Thanks for the example, @dionhaefner! It looks really cool. In terms of raw performance, it's definitely better than my approach, and it's also great to see an approach that's independent of data preprocessing. However, I also see a few drawbacks,

While (1) is only a problem if high accuracy is needed (or special colorscales are used), (2) is a deal breaker for the use cases I am typically facing. As an example, consider an application that displays meteorological data. The user will be able to select a single parameter, say wind speed, for visualization on the map. When a point is clicked, information on other parameters (e.g. air density, pressure, ...) should appear. Do you see any way your approach could be adapted to this kind of use case?
Another drawback is that, due to resampling of overviews and interpolation, you can get values that are not present or not possible in the original raster, since it is truly WYSIWYG. I'm not saying this is the right approach in every case, just putting it out there as the first thing I would try in this situation :)
I think that "the best approach" depends on the use case at hand :). In general, if you only need access to on-screen data, the client side color-to-value approach seems to be the most elegant (and performant). On the other hand, the server side approach seems more suitable when off-screen data are needed.
Regarding performance: Did you do any benchmarks of how long it takes for your custom endpoint to return a value? (just curious)
Side note on this already pretty long thread - there is talk going on about data tiles where you pull the binary data into the client... Just FYI.
Yes, I did a simple benchmark requesting a few thousand random (to avoid caching) points. With my current implementation, the mean request time is around 2 ms on my workstation, but the speed will of course vary depending on HW and connection speeds.

@j08lue - That looks really cool! But won't it be a problem for large data sets (I assume that the data must be held in the browser)? The reason I started using Terracotta was that my data files could be in the ~GB range (covering the globe), which was not handled well by the browser.
I am visualizing raster data in a Leaflet-based map component. Prior to using Terracotta, I was using a kind of ImageLayer, which emitted click events specifying the raster index of the click. I need this index to look up some other data.

To accommodate larger rasters, I have switched to serving the data via Terracotta, i.e. I add a TileLayer to the Leaflet map with the appropriate Terracotta URL. Now, the TileLayer can obviously not tell me the raster index of a click, as it knows nothing about the underlying data. Hence I am left with the "general" map click event, which contains just the (lat, lon) of the click.

I can calculate the index of the nearest point in the raster from the (lat, lon) of the click, but this approach yields an accurate result only within +/- 2 pixels in each direction. I found that transforming the click and raster coordinates to EPSG:3857 prior to searching brings down the uncertainty to +/- 1. However, I would really like to get to zero, i.e. an exact match.

Since it is Terracotta which is actually rendering which pixels go where, I guess that Terracotta should be able to determine the raster index of a (lat, lon) pair? If this could be done, I think that the functionality should be exposed via a new API endpoint.