Run queries in python benchmarks using only one thread #24
base: main
Conversation
I haven't run into DuckDB running things on a single thread before, although I have dealt with wildly different timings between the CLI and Python. It's worth checking a function other than ST_Contains, too, and also ensuring we're benchmarking against forthcoming DuckDB. The benchmark we probably want to focus on is the "user-facing default", since that is how people will perceive the speed of our engine. We might also want to tune the PostGIS instance to be smarter, since the defaults are very bad for geometry (the last time I did this was https://dewey.dunnington.ca/post/2024/wrangling-and-joining-130m-points-with-duckdb--the-open-source-spatial-stack/#postgis ). Some other issues with the current benchmarks:
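For reference, a hedged sketch of the kind of `postgresql.conf` tuning the linked post is talking about; the settings are real PostgreSQL parameters, but the values here are illustrative placeholders, not recommendations:

```
# Illustrative postgresql.conf fragment -- values are placeholders, not tuned advice
shared_buffers = 4GB                  # the 128MB default is far too small for large geometry tables
work_mem = 256MB                      # per-operation sort/hash memory
maintenance_work_mem = 1GB            # speeds up CREATE INDEX on geometry columns
max_parallel_workers_per_gather = 4   # allow parallel sequential scans
```

Any serious comparison against PostGIS should probably document whichever values it actually uses.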
I was using my local build of duckdb-spatial from the latest trunk; the duckdb Python package is also my locally built 1.4.1-dev version. I have tried re-installing duckdb using
I also found this problem. It will drastically affect the benchmarking results for predicates such as ST_Intersects, since some implementations may return early in such cases. I'll fix the benchmark data by generating uniformly distributed pairs.
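As a sketch of what "uniformly distributed pairs" could look like; the bounding box, WKT formatting, and function name are my assumptions, not the benchmark's actual generator:

```python
import random

def uniform_point_pairs(n, bounds=(-180.0, -90.0, 180.0, 90.0), seed=42):
    """Generate n pairs of uniformly distributed points as WKT strings.

    Both points in a pair are drawn independently over the whole bounding
    box, so most pairs will NOT intersect -- avoiding the early-return bias
    described above.
    """
    xmin, ymin, xmax, ymax = bounds
    rng = random.Random(seed)  # fixed seed keeps benchmark runs reproducible

    def point():
        return f"POINT ({rng.uniform(xmin, xmax)} {rng.uniform(ymin, ymax)})"

    return [(point(), point()) for _ in range(n)]

pairs = uniform_point_pairs(10)
```

The fixed seed matters here: without it, every benchmark run would exercise a different dataset and the numbers would not be comparable across runs.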
When I run

```python
import duckdb

duckdb.load_extension("spatial")
duckdb.sql(
    "SELECT count(*) FROM 'submodules/geoarrow-data/microsoft-buildings/files/microsoft-buildings_point_geo.parquet' WHERE ST_Intersects(geometry, geometry)"
).to_arrow_table()
```

...I see Activity Monitor reporting 900% CPU usage (the query runs in 30 s). I'll check the benchmark too in a bit.
Yeah, I knowingly made
I knowingly did this too, actually. When I started, this was just supposed to be a scuffed development tool rather than something we'd use for presenting results publicly. I created these benchmarks with the intention of comparing purely the function implementations (ignoring Parquet reading optimizations), since that is what I've been focused on. I would still argue that using "table" is probably more useful as a developer when it comes to optimizing functions, since it gives a raw 1-to-1 comparison. We could make it configurable (later, potentially). For now, it's totally fine if we want to change it to using Parquet scans. Sorry for all of these issues 😬
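Making the scan source configurable could be as small as parameterizing the FROM clause; the helper and argument names below are hypothetical, not part of the existing harness:

```python
def benchmark_query(select_sql, source="table", table_name="random",
                    parquet_path="data/random.parquet"):
    """Build benchmark SQL against either a native/in-memory table
    (isolates function implementation cost) or a Parquet scan
    (includes the I/O path users actually hit).
    """
    if source == "table":
        from_clause = table_name
    elif source == "parquet":
        from_clause = f"'{parquet_path}'"
    else:
        raise ValueError(f"unknown source: {source}")
    return f"SELECT {select_sql} FROM {from_clause}"
```

Running both variants side by side would also quantify exactly how much of the gap is attributable to storage format rather than to the function implementations.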
I don't see any issues running these without the benchmark suite:

```python
from sedonadb.testing import SedonaDB, DuckDB

sedonadb = SedonaDB()
duckdb = DuckDB()
random = sedonadb.con.sql("""
    SELECT geometry FROM sd_random_geometry('{
        "geom_type": "Point",
        "target_rows": 10000000,
        "vertices_per_linestring_range": [2, 2]
    }')""")
duckdb.create_table_arrow("random", random)
sedonadb.create_table_arrow("random", random)

%time duckdb.execute_and_collect("SELECT ST_Buffer(geometry, 0.1) FROM random")
#> CPU times: user 1min 20s, sys: 2.04 s, total: 1min 22s
#> Wall time: 7.53 s

%time sedonadb.execute_and_collect("SELECT ST_Buffer(geometry, 0.1) FROM random")
#> CPU times: user 1min 31s, sys: 3.05 s, total: 1min 35s
#> Wall time: 9.75 s
```

Maybe
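The CPU-to-wall ratio in the timings above gives a rough estimate of achieved parallelism; this is my back-of-envelope calculation, not something the harness reports:

```python
def achieved_parallelism(cpu_seconds, wall_seconds):
    """Rough effective thread count: total CPU time divided by wall time."""
    return cpu_seconds / wall_seconds

# Plugging in the totals from the %time output above
# (1min 22s = 82 s, 1min 35s = 95 s):
duckdb_par = achieved_parallelism(82.0, 7.53)    # ~10.9 effective threads
sedonadb_par = achieved_parallelism(95.0, 9.75)  # ~9.7 effective threads
```

Both engines are clearly multi-threaded on this workload, which is consistent with the claim that the single-thread behavior only shows up on smaller inputs.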
+1 to adding a Parquet scan. We can have both (DuckDB Parquet and DuckDB internal) and default to comparing against the Parquet scan. Internal storage formats always have a huge impact on performance, and we are targeting the lakehouse architecture, which uses open formats like Parquet for storage.
I did some experiments and found that it is not related to whether we are using a test/benchmark framework or not; it is related to the size of the dataset. I tried this workload with various
Generally, we observe less parallelism when the base table contains less data. This is probably related to how DuckDB estimates the thread count. This behavior is also documented here.
Interesting. I wonder whether this also happens if we run a spatial join against DuckDB; this estimation gives us non-deterministic performance. It looks like there is a way to force DuckDB to use a fixed number of threads as well?
I can see how there are two types of benchmarks that are equally valuable:
I have investigated the performance characteristics using a profiler; both engines use all available processors when running the spatial join benchmarks. This could still be a problem if we run the benchmarks in a different environment or with a different dataset, though. We should always understand the reasons behind the numbers the benchmarks produce.
How does multithreading amortize per-batch and per-item overhead? Could you explain that a bit more?
It might not! I am mostly just skeptical of creating too artificial a situation for the sake of fairness (although we can certainly do it for ourselves if it's giving us useful information or finding performance issues in specific benchmarks). Parts of Sedona and SedonaDB also adapt various configuration based on statistics...it's all part of the magic of the engine(s) that we want to measure.
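One way to read the amortization question above is as a toy cost model; all constants, and the assumption that batches split evenly across threads, are invented for illustration:

```python
import math

def batch_runtime(n_items, n_threads, per_item=1e-6, per_batch=1e-3, batch_size=2048):
    """Toy cost model: each batch pays a fixed overhead plus per-item work,
    and batches are divided evenly across threads (ignoring skew, contention,
    and any non-parallelizable setup cost).
    """
    n_batches = math.ceil(n_items / batch_size)
    total = n_batches * per_batch + n_items * per_item
    return total / n_threads
```

Under this model, adding threads shrinks the wall-clock share of per-batch overhead along with everything else, but only if the batches actually parallelize; any serial setup cost would be unaffected, which is the sense in which multithreading "might not" amortize the overhead.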
SedonaDB may run queries using multiple threads, so the results are not directly comparable with other engines; see comment #10 (comment) for details. This patch configures SedonaDB and DuckDB to run queries using a single thread.