fix methods to infer geom and bbox #3
base: main
Conversation
Thanks for the PR!
That sounds reasonable. We'll just want to make sure CI passes; I'm not immediately sure why it failed. Probably unrelated to your change. Maybe pandas used to cast these to float and now it doesn't?
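(For context, a general pandas behavior that could explain the float column, not something confirmed in this thread: a NumPy-backed integer column that contains a missing value is promoted to `float64`, since `int64` has no NaN representation. A minimal sketch:)

```python
import numpy as np
import pandas as pd

# An integer column with a missing value is promoted to float64,
# because NumPy int64 cannot represent NaN.
s = pd.Series([1, 2, np.nan])
print(s.dtype)  # float64

# Without missing values the column stays int64.
print(pd.Series([1, 2, 3]).dtype)  # int64
```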
Hi @TomAugspurger, many thanks for the swift response! I made the following changes to the PR:

**Datatypes**

I've changed the expected dtypes in the tests to `double` for `"pop_est"` and `int64` for `"gdp_md_est"`. Although this matches what you find in pandas, it's a bit odd, because I would expect population to be an integer and GDP a float. These are the dtypes in the test data:

```python
print(type(ds))  # <class 'pyarrow.parquet.core._ParquetDatasetV2'>
fragment = ds.fragments[0]

print(ds.schema)
# pop_est: double
# continent: string
# name: string
# iso_a3: string
# gdp_md_est: int64
# geometry: binary
# -- schema metadata --
# pandas: '{"index_columns": [{"kind": "range", "name": null, "start": 0, "' + 945
# geo: '{"primary_column": "geometry", "columns": {"geometry": {"encoding":' + 1355

print(fragment.metadata.schema)
# <pyarrow._parquet.ParquetSchema object at 0x16cefd140>
# required group field_id=-1 schema {
#   optional double field_id=-1 pop_est;
#   optional binary field_id=-1 continent (String);
#   optional binary field_id=-1 name (String);
#   optional binary field_id=-1 iso_a3 (String);
#   optional int64 field_id=-1 gdp_md_est;
#   optional binary field_id=-1 geometry;
#   optional int64 field_id=-1 __null_dask_index__;
# }

# Same as gpd.read_file(gpd.datasets.get_path("naturalearth_lowres")).dtypes:
print(gpd.read_parquet("path/to/written/data.parquet").dtypes)
# pop_est       float64
# continent      object
# name           object
# iso_a3         object
# gdp_md_est      int64
# geometry     geometry
# dtype: object
```

**Spatial partitions**

I've added a condition that checks whether `spatial_partitions` is `None`:
```python
if data.spatial_partitions is None:
    data.calculate_spatial_partitions()
```

**Projection EPSG test**

One of the tests was checking the `proj:epsg` property; it now asserts that the value is a valid EPSG code:

```python
def is_valid_epsg(epsg_code):
    try:
        pyproj.CRS.from_user_input(epsg_code)
        return True
    except pyproj.exceptions.CRSError:
        return False

assert is_valid_epsg(result.properties["proj:epsg"])
```
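(One caveat with this helper, my own observation rather than something from the thread: `pyproj.CRS.from_user_input` accepts far more than bare EPSG codes, e.g. authority strings and WKT, so the assertion only checks that the value is some parseable CRS. A quick self-contained sketch of what passes:)

```python
import pyproj

def is_parseable_crs(value):
    # Same pattern as is_valid_epsg above, renamed to reflect what it checks.
    try:
        pyproj.CRS.from_user_input(value)
        return True
    except pyproj.exceptions.CRSError:
        return False

print(is_parseable_crs(4326))         # True: bare EPSG code
print(is_parseable_crs("EPSG:3857"))  # True: authority string
print(is_parseable_crs("not a crs")) # False
```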
Hi @TomAugspurger, I recently used stac-table to create a STAC collection for a parquet dataset. Along the way I made some minor changes to the package; please have a look and see whether you'd like to keep them.

In the source code I saw some TODO comments about 'maybe converting the geometries to EPSG:4326'. Since my data was in EPSG:3857, I added the reprojection for the geometries that fall directly under the `pystac.Item` properties. So the geometry/bbox are now in 4326, whereas the ones under the projection prefix are in the CRS of the source data.

For some conditions, the data was being loaded using `dask_geopandas.read_parquet()`; but at least for my dataset the spatial partitions were not available without computing them first. What do you think about adding a call to `calculate_spatial_partitions()`?
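(The reprojection idea above can be sketched with pyproj alone; the bounds here are made up for illustration, not taken from the PR: the bbox is transformed from EPSG:3857 to EPSG:4326 for the top-level Item geometry/bbox, while the `proj:*` properties would keep the source CRS.)

```python
import pyproj

# Hypothetical bounds in EPSG:3857 (Web Mercator).
minx, miny, maxx, maxy = 0.0, 0.0, 1_000_000.0, 1_000_000.0

# always_xy=True keeps (x, y) axis order regardless of CRS axis conventions.
transformer = pyproj.Transformer.from_crs("EPSG:3857", "EPSG:4326", always_xy=True)
lon_min, lat_min = transformer.transform(minx, miny)
lon_max, lat_max = transformer.transform(maxx, maxy)

bbox_4326 = [lon_min, lat_min, lon_max, lat_max]
print(bbox_4326)  # roughly [0.0, 0.0, 8.98, 8.93]
```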