Skip to content

Conversation

Kontinuation
Copy link
Member

As mentioned in #24 (comment), the geometry pairs for benchmarking spatial predicates are identical geometries. This may not reflect the typical real-world workload where predicates are usually applied to a pair of different objects and have high selectivity.

This patch generates benchmark data containing pair of non-identical geometries. The rate of intersected pairs is near 2%, which mimics the workload of high-selectivity spatial filters.

@Kontinuation Kontinuation force-pushed the better-bench-geom-pairs branch from ef3dbab to f9302ff Compare September 12, 2025 09:37
@Kontinuation Kontinuation force-pushed the better-bench-geom-pairs branch from f9302ff to 954a634 Compare September 12, 2025 09:38
@Kontinuation Kontinuation marked this pull request as ready for review September 12, 2025 10:06
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR improves the random benchmark data generator to create non-identical pairs of geometries for spatial predicate testing, addressing concerns that identical geometry pairs don't reflect typical real-world workloads with high selectivity.

  • Modified generate_circular_vertices function to accept an RNG parameter for randomized starting angles
  • Updated benchmark data generation to create two separate geometry sets with overlapping spatial distributions
  • Changed test seeds to ensure different geometry pairs are generated

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
rust/sedona-testing/src/datagen.rs Added RNG parameter to generate_circular_vertices for randomized geometry generation
benchmarks/test_bench_base.py Refactored to generate two distinct geometry sets with ~2% intersection rate
python/sedonadb/tests/test_sjoin.py Updated test seeds to generate different geometry pairs
python/sedonadb/tests/functions/test_distance.py Added numeric epsilon for distance test precision

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

Randomizing the angle is a good idea, although it makes your sjoin tests not be perfect diamonds anymore. As long as that's OK with you!

@jiayuasu jiayuasu merged commit 2be2f35 into apache:main Sep 12, 2025
12 checks passed
Kontinuation added a commit that referenced this pull request Sep 12, 2025
…sing `with` statement (#71)

This patch depends on #70.

The pytest may hang when some of the test fails. This is a workflow run that exhibited this problem: https://github.com/apache/sedona-db/actions/runs/17668086933/job/50213528921

Pytest does not free up resources held by local variables by calling `__del__` in time when a test fails. When pytest captured the AssertionError, it effectively adds a reference to the local c object, preventing it from being __del__-ed when the function ends. This leaves the transaction started by the failed test active when the next test starts running.

We generate test data by dropping a temporary table, inserting data into the temporary table, and finally renaming the temporary table. Dropping the temporary table will be blocked by a lock held by the still-active transaction started by the previously failed test:

```
postgres=# select pid, query, state, wait_event, wait_event_type, backend_type from pg_stat_activity;
  pid  |                                           query                                            |        state        |     wait_event      | wait_event_type |         backend_type         
-------+--------------------------------------------------------------------------------------------+---------------------+---------------------+-----------------+------------------------------
 99895 |                                                                                            |                     | AutoVacuumMain      | Activity        | autovacuum launcher
 99896 |                                                                                            |                     | LogicalLauncherMain | Activity        | logical replication launcher
 83813 | select pid, query, state, wait_event, wait_event_type, backend_type from pg_stat_activity; | active              |                     |                 | client backend
 47073 | DROP TABLE IF EXISTS "public" . "sjoin_polygon_xtemp"                                      | idle in transaction | ClientRead          | Client          | client backend
 51565 | DROP TABLE IF EXISTS "public" . "sjoin_point_xtemp"                                        | active              | relation            | Lock            | client backend
 66330 | DROP TABLE IF EXISTS "public" . "sjoin_point_xtemp"                                        | active              | relation            | Lock            | client backend
 71494 | DROP TABLE IF EXISTS "public" . "sjoin_point_xtemp"                                        | active              | relation            | Lock            | client backend
 99892 |                                                                                            |                     | BgWriterHibernate   | Activity        | background writer
 99891 |                                                                                            |                     | CheckpointerMain    | Activity        | checkpointer
 99894 |                                                                                            |                     | WalWriterMain       | Activity        | walwriter

```

This patch fixes this problem by managing connections using `with` statement. The transaction will be committed or rolled back when the connection is closed. This prevents us from leaking transactions and blocking ourselves on table locks.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants