Skip to content

Conversation

@joonaspessi
Copy link
Contributor

Implements ST_SimplifyPreserveTopology(geometry, tolerance) function using geos.

Notice about ST_Simplify

Actually started the implementation of ST_Simplify but noticed that the PostGIS has different behavior for collapsed geometries and collection types in comparison to GEOS implementation.

Achieving full PostGIS compatibility (including all the params) would probably require custom Rust implementation.

This might be actually nice function to be ported into Rust.

Comment on lines 1428 to 1432
# POLYGON with hole - topology preserved, both rings simplified
(
"POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0),(5 5, 5 6, 6 6, 8 5, 5 5))",
20,
"POLYGON((0 0,10 0,10 10,0 10,0 0),(5 5,5 6,8 5,5 5))",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: The comment sounds misleading. The inner ring is simplified (removing the 6 6 coordinate), but the outer ring is left the same.

It would also be helpful if we kept the spacing consistent between the input and the expected string (this applies to all of these test cases). It visually makes it easier for readers to compare the geometries and see if there's a missing coordinate or not, since our eyes can just look up-down. I found myself "restarting" my comparison one or twice after losing track of which coordinate I was on, since my eyes had to track diagonally between the top and bottom.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion, the consistent spacing indeed made the test case much more easier to follow and understand. Also fixed that comment that was wrongly stating that both outer and inner rings would be simplified.

def test_st_simplifypreservetopology(eng, geom, tolerance, expected):
eng = eng.create_or_skip()
eng.assert_query_result(
f"SELECT ST_AsText(ST_SimplifyPreserveTopology({geom_or_null(geom)}, {val_or_null(tolerance)}))",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional: I was curious why ST_AsText() was there, so I tried removing it myself. We can actually get this to pass without it. After removing the ST_AsText(), the cases fail because your spacing is a bit off. It should pass if you add a space after each comma (,) and after the polygon name. For example, LINESTRING(0 0,0 10) would become LINESTRING (0 0, 0 10). Another benefit is readability from having consistent spacing, as I mentioned above.

The one caveat is POINT EMPTY. In that case, we can set expected to POINT (nan nan) because that's how geoarrow-c renders it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noticed this inconsistency and thought that ST_AsText seems to normalize the test output. But I agree, it's better to keep all test cases aligned with similar approach.

Comment on lines -65 to -66
("st_crosses", st_crosses_impl()),
("st_overlaps", st_overlaps_impl()),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add these function registrations back. We shouldn't be removing any existing functionality.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noticed that these were actual duplicated.

("st_isvalid", st_is_valid_impl()),
("st_isvalidreason", st_is_valid_reason_impl()),
("st_length", st_length_impl()),
("st_overlaps", st_overlaps_impl()),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already defined here

Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

Achieving full PostGIS compatibility (including all the params) would probably require custom Rust implementation.

Good call...our GEOS list is supposed to be functions that are easy to implement and we've tried hard so far to ensure that when we do add a function we do it all the way.

I wonder if there is a Rust geo implementation of Simplify that checks these boxes already...in any case I'll open up a dedicated issue for this since it seems like it will be more involved. Thank you for identifying!

Comment on lines 58 to 71
let tolerance: Option<f64>;
let arg1 = args[1].cast_to(&DataType::Float64, None)?;
if let ColumnarValue::Scalar(scalar_arg) = &arg1 {
if scalar_arg.is_null() {
tolerance = None;
} else {
tolerance = Some(f64::try_from(scalar_arg.clone())?);
}
} else {
return Err(DataFusionError::Execution(format!(
"Invalid tolerance: {:?}",
args[1]
)));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you're here, I think it would be good to support this as an array input as well (it's roughly the same amount of code). The example I wrote up for ST_Buffer() was:

let distance_value = args[1].cast_to(&DataType::Float64)?.to_array(executor.num_iterations());
let distance_array = if let ColumnarValue::Array(array_ref) = distance_value {
  as_float64_array(array_ref)?
} else {
  return sedona_internal_err!("Unexpected scalar value");
};

let mut distance_iter = distance_array.iter();

// ...

executor.execute_wkb_void(|wkb| {
   match (wkb, distance_iter.next().unwrap() {
       (Some(wkb_item), Some(distance_item)) => { invoke_scalar(...) }
       _ => { /* append null */ }
   }
})

Copy link
Contributor Author

@joonaspessi joonaspessi Oct 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out, nice that the array input was relatively easy to handle with ColumnarValue. Implementation is now updated with this change.

Btw. Does this introduce some room for performance/memory improvement? With this approach and using scalar value, we will copy it for all geometries, potentially quite many times.

Thinking if these tracks should be separated, or might be just micro optimization...

let executor = GeosExecutor::new(arg_types, args);
let tolerance_arg = args[1].cast_to(&DataType::Float64, None)?;

let mut builder = BinaryBuilder::with_capacity(
    executor.num_iterations(),
    WKB_MIN_PROBABLE_BYTES * executor.num_iterations(),
);

match tolerance_arg {
    ColumnarValue::Scalar(scalar_val) => {
        // SCALAR PATH: Extract scalar once, reuse for all geometries
       ...
    }
    ColumnarValue::Array(_) => {
        // ARRAY PATH: Use iterator for per-row tolerances
       ...
    }
}

Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@paleolimbot paleolimbot merged commit 8e6234c into apache:main Oct 27, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants