Add GeoDataFrame support to Pipeline #173

Open. jf-geo wants to merge 3 commits into base: main

jf-geo commented Jun 27, 2024

Adds GeoDataFrame support when GeoPandas is available.

The GeoPandas import mirrors the existing Pandas import.
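
For readers unfamiliar with the pattern, an optional-dependency import guard of this kind might look roughly like the sketch below. This is an illustration only, not the code from the diff: the helper name `require_geopandas` and the error message are assumptions.

```python
# Sketch of an optional import guard (hypothetical names, not taken from the PR).
try:
    import geopandas as gpd
except ImportError:
    # GeoPandas is optional; remember that it is unavailable.
    gpd = None


def require_geopandas():
    """Return the geopandas module, or raise a clear error if it is missing."""
    if gpd is None:
        raise RuntimeError("GeoDataFrame support requires the 'geopandas' package")
    return gpd
```

With a guard like this, the rest of the module can be imported without GeoPandas, and only the GeoDataFrame-specific code paths fail when it is absent.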

Adds a Pipeline.get_geodataframe method, which returns an array from an executed pipeline as a GeoDataFrame instead of a Pandas DataFrame. Optional arguments select XY or XYZ point geometries and pass CRS information to the GeoDataFrame constructor.
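
To make the conversion concrete, here is a minimal standalone sketch of turning a structured array with `X`, `Y` and `Z` fields into a GeoDataFrame. It is not the method from this PR: the function name `structured_array_to_geodataframe` and its `xyz` and `crs` parameters are assumptions chosen for illustration, and the sample array stands in for the output of an executed pipeline.

```python
import geopandas as gpd
import numpy as np
import pandas as pd


def structured_array_to_geodataframe(arr, xyz=False, crs=None):
    """Build a GeoDataFrame of points from a structured array (hypothetical helper)."""
    # Use Z only when 3D point geometries are requested.
    z = arr["Z"] if xyz else None
    points = gpd.points_from_xy(arr["X"], arr["Y"], z)
    # Each named field of the structured array becomes a regular column.
    return gpd.GeoDataFrame(pd.DataFrame(arr), geometry=points, crs=crs)


# Stand-in for an array read from an executed pipeline.
arr = np.array(
    [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)],
    dtype=[("X", "f8"), ("Y", "f8"), ("Z", "f8")],
)
gdf = structured_array_to_geodataframe(arr, xyz=True, crs="EPSG:4326")
```

The dimension columns are kept alongside the geometry column here; whether the actual method does the same is not stated in this thread.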

Pipeline(dataframes) now drops any column named "geometry" before converting DataFrames to structured arrays.
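
The idea behind dropping the column can be sketched as follows. This is an assumed approach, not the code from the diff: the helper name `dataframe_to_structured_array` is hypothetical, and a plain Pandas DataFrame with a string "geometry" column is used so the example runs without GeoPandas.

```python
import pandas as pd


def dataframe_to_structured_array(df):
    """Drop a 'geometry' column if present, then convert to a NumPy record array."""
    # errors="ignore" leaves DataFrames without a geometry column untouched.
    df = df.drop(columns="geometry", errors="ignore")
    # index=False keeps the DataFrame index out of the resulting fields.
    return df.to_records(index=False)


df = pd.DataFrame(
    {
        "X": [1.0, 2.0],
        "Y": [3.0, 4.0],
        "geometry": ["POINT (1 3)", "POINT (2 4)"],
    }
)
records = dataframe_to_structured_array(df)
```

Geometry objects have no fixed-width NumPy dtype, so removing the column first keeps the resulting array limited to numeric dimensions.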

Added basic GeoPandas GeoDataFrame support. If GeoPandas is installed, users can read an array from an executed pipeline as a GeoDataFrame, with optional arguments for XY vs XYZ points and CRS. DataFrames passed to the Pipeline constructor will have their "geometry" column dropped if present.
Added GeoDataFrame support to pipeline.py
hobu (Member) commented Jun 28, 2024

Can you please add a test for this?

I'm not quite familiar with the performance consequences of GeoDataFrame relative to our StructuredArray. Does this PR work as expected for you?

Added test for GeoDataFrames
jf-geo (Author) commented Jun 28, 2024

> Can you please add a test for this?
>
> I'm not quite familiar with the performance consequences of GeoDataFrame relative to our StructuredArray. Does this PR work as expected for you?

Test added.

The PR works as expected.

Using GeoDataFrames as an input for pipelines performs the same as using Pandas DataFrames.

Getting a GeoDataFrame from an executed pipeline is slower than getting a StructuredArray or DataFrame, because a GeoSeries has to be created from the XY[Z] dimensions. Currently the fastest way of doing this is geopandas.points_from_xy, whose performance depends on the user's version of shapely and on whether pygeos is installed.
