-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't use pixetl for geojson creation in resample script #610
base: develop
Are you sure you want to change the base?
Conversation
…w Data API implementation
…a-api into dont_use_pixetl_for_geojsons
Codecov ReportAll modified and coverable lines are covered by tests ✅
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## develop #610 +/- ##
========================================
Coverage 81.07% 81.07%
========================================
Files 130 130
Lines 5876 5876
========================================
Hits 4764 4764
Misses 1112 1112
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks great. Just want to make sure that it is OK that you are not putting as many properties for each feature in the tiles.geojson as in the pixetl implementation.
batch/python/apply_colormap.py
Outdated
tiles_fc, extent_fc = generate_geojsons(tile_paths, min(16, NUM_PROCESSES)) | ||
logger.log(logging.INFO, "Finished generating geojsons") | ||
|
||
tiles_txt = json.dumps(tiles_fc, indent=2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that in existing tiles.geojson, the json is just written out with only spaces and no newlines (i.e. not prettifying with indents, etc.). Do we also want to do that (not indent), or do you think it is worth it for debugging to leave it in pretty format?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a good point. I like being able to look at it for debugging, but I'll remove the indent for now in the interests of changing as little as possible.
|
||
crs: CRS = CRS.from_string(gdalinfo_json["coordinateSystem"]["wkt"]) | ||
metadata = { | ||
# NOTE: pixetl seems to always write features in tiles.geojson in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically, pixetl is using the metadata as computed by rasterio.open(). So, are you saying that gdalinfo is returning cornerCoordinates in the original projection, but rasterio.open() is giving the extent always in WGS 84. If so, maybe mention that you are emulating the behavior from rasterio.open() here. I do see that the tiles.geojson at s3://gfw-data-lake/umd_tree_cover_loss/v1.11/raster/epsg-3857/zoom_12/default/geotiff/tiles.geojson
is showing projection WGS 84 as a property for each of the tiles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll re-write the comment to clarify that it's the units, not the actual projection.
"""Run gdalinfo and extract metadata for a single file.""" | ||
print(f"Running gdalinfo on {file_path}") | ||
try: | ||
stdout,stderr = run_gdal_subcommand( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason you are running 'gdalinfo` here rather than doing the same thing that pixetl does, which is getting the info from a rasterio.open() of the file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My mind went immediately to gdalinfo because (though it may do it differently in other places) pixetl gets metadata with gdalinfo here: https://github.com/wri/gfw_pixetl/blob/master/gfw_pixetl/utils/gdal.py#L170
I can verify that the results are the same with a few tiles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, thanks for that pointer to the pixetl code that is getting the meta-data. I don't see any reprojection/resizing (to EPSG:4326 or meters) in that code you are pointing to, so does that mean you don't really need the to_4326 stuff? See https://github.com/wri/gfw_pixetl/blob/cfb7c2fa1c0b6f366982e38839baa1cabc425938/gfw_pixetl/utils/gdal.py#L192 (But maybe I am missing a conversion somewhere else.)
*to_4326(crs, *corner_coordinates["lowerLeft"]), | ||
*to_4326(crs, *corner_coordinates["upperRight"]), | ||
], | ||
"name": gdalinfo_json["description"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the tiles.geojson that we create with pixetl have a whole bunch more properties in them (looks like most that are printed by gdalinfo), but I'm assuming you've verified that we only need the 'extent' and 'name' properties - since those are the only ones you are generating beside the Polygon coordinates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh. I actually had those other attributes in there, but took them out because (I swear) I looked at a tiles.geojson that didn't have them and I was trying to reproduce it. I must have been mistaken, I guess? Maybe I was accidentally looking at a custom geojson I had lying around? I don't know, but you're right, I see all that stuff in recent tiles.geojsons. I'll put it back.
Pull request checklist
Please check if your PR fulfills the following requirements:
Pull request type
Please check the type of change your PR introduces:
What is the current behavior?
The resample script hangs trying to generate tiles.geojson and extent.geojson for JRC, a global 10m dataset. The script uses code from pixetl which happens to be in the container image to generate these files, and it fails (hangs) for unknown reasons in this situation.
Issue Number: N/A
What is the new behavior?
Debugging has been difficult, and I never liked importing the code from pixetl this way with all its complication and machinery rumbling to life purely to generate these geojsons, so I replaced the calls to pixetl with a small amount of code. I may port this simplified code over to pixetl in the future.
Does this introduce a breaking change?
Other information