-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use arrow
instead of arrow2
in the example
#518
Conversation
r? @kylebarron |
arrow
instead of arrow2
in the example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
I'll make a PR to connect to this from geoarrow-rs too, and then I can have an example of reading a full layer into a GeoArrow table.
examples/read_ogr_arrow.rs
Outdated
//! implementation of Arrow. | ||
//! As of this writing (Jan 2024), there are two competing low-level Arrow libraries in Rust. | ||
//! [`arrow`](https://github.com/apache/arrow-rs) is the "official" one, while | ||
//! [`arrow2`](https://github.com/jorgecarleitao/arrow2) is a less active alternative. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest adding a link here
//! [`arrow2`](https://github.com/jorgecarleitao/arrow2) is a less active alternative. | |
//! [`arrow2`](https://github.com/jorgecarleitao/arrow2) is a [less active](https://github.com/jorgecarleitao/arrow2/issues/1429) alternative. |
|
||
#[cfg(any(major_ge_4, all(major_is_3, minor_ge_6)))] | ||
fn run() -> gdal::errors::Result<()> { | ||
use arrow2::array::{BinaryArray, StructArray}; | ||
use arrow2::datatypes::DataType; | ||
use arrow::array::{Array as _, BinaryArray}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, what's the benefit of as _
? Is that to signify that it isn't used as an object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it lets you call the trait methods, but doesn't import the Array
name. Makes it a little clearer that it's a trait IMHO.
let geom_column_index = arrow_stream_reader | ||
.schema() | ||
.column_with_name("wkb_geometry") | ||
.unwrap() | ||
.0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what the most reliable way to access the geometry column is. Another option is to look for the ogc.wkb
extension name on the geometry field.
As of GDAL 3.8 the user can also pass in GEOMETRY_METADATA_ENCODING=GEOARROW
(see doc). That assigns geoarrow.wkb
to the extension name instead of ogc.wkb
, but also includes the CRS as PROJJSON onto the Arrow extension metadata, so the user doesn't have to separately fetch the layer's CRS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also curious: can GDAL ever read two geometry columns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also curious: can GDAL ever read two geometry columns?
Normally yes, not sure about Arrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... but in general with GeoArrow we need to have a manner outside the GDAL-specific API to know the geometry column.
Normally yes, not sure about Arrow.
Interesting. I should test GDAL on a file with multiple geometries and see what happens
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... but in general with GeoArrow we need to have a manner outside the GDAL-specific API to know the geometry column.
Well, I don't know much about Arrow, but that sounds like something that GeoArrow should clarify, not us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In GeoArrow the preferred way to do this is via the extension metadata, not a column's name. I just didn't recall exactly how GDAL implements this, and whether simply checking for the wkb_geometry
name is sufficient
// Downcast it to a `BinaryArray` | ||
let binary_array = geom_column.as_any().downcast_ref::<BinaryArray>().unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know the exact circumstances but I thought that sometimes GDAL would create an Arrow array with i64 offsets instead of i32 offsets. Maybe this is only in the rare case where the array is very large? I thought I hit a bug related to this, but it was probably in my own code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gah, why is that even a thing 😕. I guess the offset size is part of the data type?
Anyway, we're keeping the old behavior, so I'm not sure how to improve this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think as-is is fine to merge.
With Arrow, a list array has a values buffer plus a buffer of offsets. If the values buffer has more than 2^31 items (signed ints to support Java) then you can't represent those values. Usually the preferred way to fix this is to use batching, so that you never have 2^31 values in a single contiguous array, but Arrow also defines "large" nested array types, which use int64 offsets instead of int32.
I don't recall if GDAL will ever create an array with int64 offsets
Thanks! |
CHANGES.md
if knowledge of this change could be valuable to users.Closes #516