- 
                Notifications
    You must be signed in to change notification settings 
- Fork 29
feat(c/sedona-geos): Implement ST_SimplifyPreserveTopology using geos #243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(c/sedona-geos): Implement ST_SimplifyPreserveTopology using geos #243
Conversation
| # POLYGON with hole - topology preserved, both rings simplified | ||
| ( | ||
| "POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0),(5 5, 5 6, 6 6, 8 5, 5 5))", | ||
| 20, | ||
| "POLYGON((0 0,10 0,10 10,0 10,0 0),(5 5,5 6,8 5,5 5))", | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: The comment sounds misleading. The inner ring is simplified (removing the 6 6 coordinate), but the outer ring is left the same.
It would also be helpful if we kept the spacing consistent between the input and the expected string (this applies to all of these test cases). It visually makes it easier for readers to compare the geometries and see if there's a missing coordinate or not, since our eyes can just look up-down. I found myself "restarting" my comparison one or twice after losing track of which coordinate I was on, since my eyes had to track diagonally between the top and bottom.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion, the consistent spacing indeed made the test case much more easier to follow and understand. Also fixed that comment that was wrongly stating that both outer and inner rings would be simplified.
| def test_st_simplifypreservetopology(eng, geom, tolerance, expected): | ||
| eng = eng.create_or_skip() | ||
| eng.assert_query_result( | ||
| f"SELECT ST_AsText(ST_SimplifyPreserveTopology({geom_or_null(geom)}, {val_or_null(tolerance)}))", | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
optional: I was curious why ST_AsText() was there, so I tried removing it myself. We can actually get this to pass without it. After removing the ST_AsText(), the cases fail because your spacing is a bit off. It should pass if you add a space after each comma (,) and after the polygon name. For example, LINESTRING(0 0,0 10) would become LINESTRING (0 0, 0 10). Another benefit is readability from having consistent spacing, as I mentioned above.
The one caveat is POINT EMPTY. In that case, we can set expected to POINT (nan nan) because that's how geoarrow-c renders it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noticed this inconsistency and thought that ST_AsText seems to normalize the test output. But I agree, it's better to keep all test cases aligned with similar approach.
| ("st_crosses", st_crosses_impl()), | ||
| ("st_overlaps", st_overlaps_impl()), | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add these function registrations back. We shouldn't be removing any existing functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noticed that these were actual duplicated.
| ("st_isvalid", st_is_valid_impl()), | ||
| ("st_isvalidreason", st_is_valid_reason_impl()), | ||
| ("st_length", st_length_impl()), | ||
| ("st_overlaps", st_overlaps_impl()), | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already defined here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
Achieving full PostGIS compatibility (including all the params) would probably require custom Rust implementation.
Good call...our GEOS list is supposed to be functions that are easy to implement and we've tried hard so far to ensure that when we do add a function we do it all the way.
I wonder if there is a Rust geo implementation of Simplify that checks these boxes already...in any case I'll open up a dedicated issue for this since it seems like it will be more involved. Thank you for identifying!
| let tolerance: Option<f64>; | ||
| let arg1 = args[1].cast_to(&DataType::Float64, None)?; | ||
| if let ColumnarValue::Scalar(scalar_arg) = &arg1 { | ||
| if scalar_arg.is_null() { | ||
| tolerance = None; | ||
| } else { | ||
| tolerance = Some(f64::try_from(scalar_arg.clone())?); | ||
| } | ||
| } else { | ||
| return Err(DataFusionError::Execution(format!( | ||
| "Invalid tolerance: {:?}", | ||
| args[1] | ||
| ))); | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While you're here, I think it would be good to support this as an array input as well (it's roughly the same amount of code). The example I wrote up for ST_Buffer() was:
let distance_value = args[1].cast_to(&DataType::Float64)?.to_array(executor.num_iterations());
let distance_array = if let ColumnarValue::Array(array_ref) = distance_value {
  as_float64_array(array_ref)?
} else {
  return sedona_internal_err!("Unexpected scalar value");
};
let mut distance_iter = distance_array.iter();
// ...
executor.execute_wkb_void(|wkb| {
   match (wkb, distance_iter.next().unwrap() {
       (Some(wkb_item), Some(distance_item)) => { invoke_scalar(...) }
       _ => { /* append null */ }
   }
})There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out, nice that the array input was relatively easy to handle with ColumnarValue. Implementation is now updated with this change.
Btw. Does this introduce some room for performance/memory improvement? With this approach and using scalar value, we will copy it for all geometries, potentially quite many times.
Thinking if these tracks should be separated, or might be just micro optimization...
let executor = GeosExecutor::new(arg_types, args);
let tolerance_arg = args[1].cast_to(&DataType::Float64, None)?;
let mut builder = BinaryBuilder::with_capacity(
    executor.num_iterations(),
    WKB_MIN_PROBABLE_BYTES * executor.num_iterations(),
);
match tolerance_arg {
    ColumnarValue::Scalar(scalar_val) => {
        // SCALAR PATH: Extract scalar once, reuse for all geometries
       ...
    }
    ColumnarValue::Array(_) => {
        // ARRAY PATH: Use iterator for per-row tolerances
       ...
    }
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
Implements
ST_SimplifyPreserveTopology(geometry, tolerance)function using geos.Notice about ST_Simplify
Actually started the implementation of
ST_Simplifybut noticed that the PostGIS has different behavior for collapsed geometries and collection types in comparison to GEOS implementation.Achieving full PostGIS compatibility (including all the params) would probably require custom Rust implementation.
This might be actually nice function to be ported into Rust.