Improve the random benchmark data generator to generate non-identical pairs of geometries #70

Kontinuation · 2025-09-12T07:44:37Z

As mentioned in #24 (comment), the geometry pairs for benchmarking spatial predicates are identical geometries. This may not reflect the typical real-world workload where predicates are usually applied to a pair of different objects and have high selectivity.

This patch generates benchmark data containing pair of non-identical geometries. The rate of intersected pairs is near 2%, which mimics the workload of high-selectivity spatial filters.

Copilot

Pull Request Overview

This PR improves the random benchmark data generator to create non-identical pairs of geometries for spatial predicate testing, addressing concerns that identical geometry pairs don't reflect typical real-world workloads with high selectivity.

Modified generate_circular_vertices function to accept an RNG parameter for randomized starting angles
Updated benchmark data generation to create two separate geometry sets with overlapping spatial distributions
Changed test seeds to ensure different geometry pairs are generated

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File	Description
rust/sedona-testing/src/datagen.rs	Added RNG parameter to `generate_circular_vertices` for randomized geometry generation
benchmarks/test_bench_base.py	Refactored to generate two distinct geometry sets with ~2% intersection rate
python/sedonadb/tests/test_sjoin.py	Updated test seeds to generate different geometry pairs
python/sedonadb/tests/functions/test_distance.py	Added numeric epsilon for distance test precision

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

rust/sedona-testing/src/datagen.rs

benchmarks/test_bench_base.py

paleolimbot

Thank you!

Randomizing the angle is a good idea, although it makes your sjoin tests not be perfect diamonds anymore. As long as that's OK with you!

…sing `with` statement (#71) This patch depends on #70. The pytest may hang when some of the test fails. This is a workflow run that exhibited this problem: https://github.com/apache/sedona-db/actions/runs/17668086933/job/50213528921 Pytest does not free up resources held by local variables by calling `__del__` in time when a test fails. When pytest captured the AssertionError, it effectively adds a reference to the local c object, preventing it from being __del__-ed when the function ends. This leaves the transaction started by the failed test active when the next test starts running. We generate test data by dropping a temporary table, inserting data into the temporary table, and finally renaming the temporary table. Dropping the temporary table will be blocked by a lock held by the still-active transaction started by the previously failed test: ``` postgres=# select pid, query, state, wait_event, wait_event_type, backend_type from pg_stat_activity; pid | query | state | wait_event | wait_event_type | backend_type -------+--------------------------------------------------------------------------------------------+---------------------+---------------------+-----------------+------------------------------ 99895 | | | AutoVacuumMain | Activity | autovacuum launcher 99896 | | | LogicalLauncherMain | Activity | logical replication launcher 83813 | select pid, query, state, wait_event, wait_event_type, backend_type from pg_stat_activity; | active | | | client backend 47073 | DROP TABLE IF EXISTS "public" . "sjoin_polygon_xtemp" | idle in transaction | ClientRead | Client | client backend 51565 | DROP TABLE IF EXISTS "public" . "sjoin_point_xtemp" | active | relation | Lock | client backend 66330 | DROP TABLE IF EXISTS "public" . "sjoin_point_xtemp" | active | relation | Lock | client backend 71494 | DROP TABLE IF EXISTS "public" . "sjoin_point_xtemp" | active | relation | Lock | client backend 99892 | | | BgWriterHibernate | Activity | background writer 99891 | | | CheckpointerMain | Activity | checkpointer 99894 | | | WalWriterMain | Activity | walwriter ``` This patch fixes this problem by managing connections using `with` statement. The transaction will be committed or rolled back when the connection is closed. This prevents us from leaking transactions and blocking ourselves on table locks.

Kontinuation added 2 commits September 12, 2025 17:33

Generate better geometry pairs for benchmarking binary ST functions

6cbc2f9

Improve the benchmark data generator

818b657

Kontinuation force-pushed the better-bench-geom-pairs branch from ef3dbab to f9302ff Compare September 12, 2025 09:37

Fix geography distance join test

954a634

Kontinuation force-pushed the better-bench-geom-pairs branch from f9302ff to 954a634 Compare September 12, 2025 09:38

Kontinuation marked this pull request as ready for review September 12, 2025 10:06

Kontinuation requested a review from Copilot September 12, 2025 10:06

Copilot AI reviewed Sep 12, 2025

View reviewed changes

rust/sedona-testing/src/datagen.rs Show resolved Hide resolved

benchmarks/test_bench_base.py Show resolved Hide resolved

Kontinuation mentioned this pull request Sep 12, 2025

Fix pytest hanging when test fails by closing PostgreSQL connection using with statement #71

Merged

paleolimbot approved these changes Sep 12, 2025

View reviewed changes

jiayuasu merged commit 2be2f35 into apache:main Sep 12, 2025
12 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Improve the random benchmark data generator to generate non-identical pairs of geometries #70

Improve the random benchmark data generator to generate non-identical pairs of geometries #70

Uh oh!

Kontinuation commented Sep 12, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

paleolimbot left a comment

Uh oh!

Uh oh!

Uh oh!

Improve the random benchmark data generator to generate non-identical pairs of geometries #70

Improve the random benchmark data generator to generate non-identical pairs of geometries #70

Uh oh!

Conversation

Kontinuation commented Sep 12, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

Uh oh!

paleolimbot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!