Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't use pixetl for geojson creation in resample script #610

Open
wants to merge 21 commits into
base: develop
Choose a base branch
from

Conversation

dmannarino
Copy link
Member

@dmannarino dmannarino commented Dec 1, 2024

Pull request checklist

Please check if your PR fulfills the following requirements:

  • Make sure you are requesting to pull a topic/feature/bugfix branch (right side). Don't request your master!
  • Make sure you are making a pull request against the develop branch (left side). Also you should start your branch off our develop.
  • Check the commit's or even all commits' message styles matches our requested structure.
  • Check your code additions will fail neither code linting checks nor unit test.

Pull request type

Please check the type of change your PR introduces:

  • Bugfix
  • Feature
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • Documentation content changes
  • Other (please describe):

What is the current behavior?

The resample script hangs trying to generate tiles.geojson and extent.geojson for JRC, a global 10m dataset. The script uses code from pixetl which happens to be in the container image to generate these files, and it fails (hangs) for unknown reasons in this situation.

Issue Number: N/A

What is the new behavior?

Debugging has been difficult, and I never liked importing the code from pixetl this way with all its complication and machinery rumbling to life purely to generate these geojsons, so I replaced the calls to pixetl with a small amount of code. I may port this simplified code over to pixetl in the future.

Does this introduce a breaking change?

  • Yes
  • No

Other information

@codecov-commenter
Copy link

codecov-commenter commented Dec 2, 2024

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.07%. Comparing base (7be61b9) to head (b70d59d).

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop     #610   +/-   ##
========================================
  Coverage    81.07%   81.07%           
========================================
  Files          130      130           
  Lines         5876     5876           
========================================
  Hits          4764     4764           
  Misses        1112     1112           
Flag Coverage Δ
unittests 81.07% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dmannarino dmannarino changed the title NEEDS POLISH: Don't use pixetl for geojson creation Don't use pixetl for geojson creation Dec 10, 2024
@dmannarino dmannarino marked this pull request as ready for review December 10, 2024 16:05
@dmannarino dmannarino changed the title Don't use pixetl for geojson creation Don't use pixetl for geojson creation in resample script Dec 10, 2024
Copy link
Collaborator

@danscales danscales left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks great. Just want to make sure that it is OK that you are not putting as many properties for each feature in the tiles.geojson as in the pixetl implementation.

tiles_fc, extent_fc = generate_geojsons(tile_paths, min(16, NUM_PROCESSES))
logger.log(logging.INFO, "Finished generating geojsons")

tiles_txt = json.dumps(tiles_fc, indent=2)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that in existing tiles.geojson, the json is just written out with only spaces and no newlines (i.e. not prettifying with indents, etc.). Do we also want to do that (not indent), or do you think it is worth it for debugging to leave it in pretty format?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good point. I like being able to look at it for debugging, but I'll remove the indent for now in the interests of changing as little as possible.


crs: CRS = CRS.from_string(gdalinfo_json["coordinateSystem"]["wkt"])
metadata = {
# NOTE: pixetl seems to always write features in tiles.geojson in
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically, pixetl is using the metadata as computed by rasterio.open(). So, are you saying that gdalinfo is returning cornerCoordinates in the original projection, but rasterio.open() is giving the extent always in WGS 84. If so, maybe mention that you are emulating the behavior from rasterio.open() here. I do see that the tiles.geojson at s3://gfw-data-lake/umd_tree_cover_loss/v1.11/raster/epsg-3857/zoom_12/default/geotiff/tiles.geojson is showing projection WGS 84 as a property for each of the tiles.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screenshot 2024-12-11 at 1 43 40 PM Screenshot 2024-12-11 at 1 43 10 PM I admit I'm a little bit fuzzy on the terminology, but I think technically EPSG:3857 and EPSG:4326 are both WGS 84? In any case, what I really mean is that the coordinates in the feature are always in degrees, even for WM tiles. I *think* they should be in meters for WM tiles. So that's what I'm trying to duplicate (even though it feels wrong).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll re-write the comment to clarify that it's the units, not the actual projection.

"""Run gdalinfo and extract metadata for a single file."""
print(f"Running gdalinfo on {file_path}")
try:
stdout,stderr = run_gdal_subcommand(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason you are running 'gdalinfo` here rather than doing the same thing that pixetl does, which is getting the info from a rasterio.open() of the file?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mind went immediately to gdalinfo because (though it may do it differently in other places) pixetl gets metadata with gdalinfo here: https://github.com/wri/gfw_pixetl/blob/master/gfw_pixetl/utils/gdal.py#L170
I can verify that the results are the same with a few tiles.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thanks for that pointer to the pixetl code that is getting the meta-data. I don't see any reprojection/resizing (to EPSG:4326 or meters) in that code you are pointing to, so does that mean you don't really need the to_4326 stuff? See https://github.com/wri/gfw_pixetl/blob/cfb7c2fa1c0b6f366982e38839baa1cabc425938/gfw_pixetl/utils/gdal.py#L192 (But maybe I am missing a conversion somewhere else.)

*to_4326(crs, *corner_coordinates["lowerLeft"]),
*to_4326(crs, *corner_coordinates["upperRight"]),
],
"name": gdalinfo_json["description"],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the tiles.geojson that we create with pixetl have a whole bunch more properties in them (looks like most that are printed by gdalinfo), but I'm assuming you've verified that we only need the 'extent' and 'name' properties - since those are the only ones you are generating beside the Polygon coordinates?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh. I actually had those other attributes in there, but took them out because (I swear) I looked at a tiles.geojson that didn't have them and I was trying to reproduce it. I must have been mistaken, I guess? Maybe I was accidentally looking at a custom geojson I had lying around? I don't know, but you're right, I see all that stuff in recent tiles.geojsons. I'll put it back.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants