Skip to content

Conversation

@yutannihilation
Copy link
Contributor

This pull request is my attempt to implement ST_Azimuth().

> sd_sql("select ST_Azimuth( ST_Point(25, 45),  ST_Point(75, 100))")
┌──────────────────────────────────────────────────────────────────────────┐
│ st_azimuth(st_point(Int64(25),Int64(45)),st_point(Int64(75),Int64(100))) │
│                                  float64                                 │
╞══════════════════════════════════════════════════════════════════════════╡
│                                                       0.7378150601204649 │
└──────────────────────────────────────────────────────────────────────────┘
Preview of up to 6 row(s)

This pull request is mostly for figuring out how to implement a function so that I can contribute more. So, please feel free to close if there's another plan to implement this.

I picked ST_Azimuth() because it looks relatively simple. I follow the implementation of Sedona, although I might not understand the code correctly as I'm poor at Java...

https://github.com/apache/sedona/blob/f9069e7dc6682d53335f0e0c6fb4bd444024d3b5/common/src/main/java/org/apache/sedona/common/Functions.java#L202-L209

I think the implementation is straightforward. One thing to note is that, if I understand the Sedona's implementation correctly, it returns 0.0 when the two points are the same. However, PostGIS returns NULL. The document says:

NULL if the two points are coincident

Comment on lines 25 to 26
pub fn st_azimuth_udf() -> SedonaScalarUDF {
SedonaScalarUDF::new_stub(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since ST_Azimuth is simple enough to be implemented manually, we should move the entire implementation that you wrote inside of sedona-geo to this file in sedona-functions. No need for this stub function. sedona-geo is really for functions where we need to use the geo_generic_alg package to call more complicated algorithms. You can see an example implementation in st_isempty.rs. It should be as simple as copy-pasting over the implementation and modifying this function a bit to use SedonaScalarUDF::new instead of SedonaScalarUDF::new_stub.

Comment on lines 30 to 31
ArgMatcher::is_geometry_or_geography(),
ArgMatcher::is_geometry_or_geography(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ArgMatcher::is_geometry_or_geography(),
ArgMatcher::is_geometry_or_geography(),
ArgMatcher::is_geometry(),
ArgMatcher::is_geometry(),

Then, we can reduce this to geometry only for now, since we don't want to call the geometry implementation on geography objects.

}

// Note: When the two points are completely coincident, PostGIS's ST_Azimuth()
// returns NULL. However, this returns 0.0.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's definitely add a test for this case.

But first, we should confirm what the desired behavior is. I suspect this is a just something that was missed in the original Sedona, and maybe we should follow PostGIS behavior here instead. There doesn't seem to be any discussion about this in the original Sedona PR. @jiayuasu was this difference intentional or maybe just something that was missed? Shall we fix it in Sedona as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To date, SedonaDB tests against PostGIS for feature parity and we file bugs with Sedona when we notice something is inconsistent.

@petern48
Copy link
Collaborator

petern48 commented Oct 5, 2025

Also, add a benchmark here in native-functions.rs (they're in alphabetical order).

There's an example of a bench for a function with two geometries as inputs here

@yutannihilation
Copy link
Contributor Author

Thanks for reviewing!

@yutannihilation
Copy link
Contributor Author

I think I addressed all the comments. Thanks for the detailed explanation, it really helps!

I switch the implementation to return NULL for the coincident points case, but I'll wait for confirmation.

@petern48
Copy link
Collaborator

petern48 commented Oct 6, 2025

Looks great. I just remembered, we should also add a new test_st_azimuth test in test_functions.py. We use this file for comparing the outputs directly with PostGIS results. There's also already a numeric_epsilon parameter that you can use to relax the precision (see test_st_buffer for an example). We often have a more comprehensive set of tests there since adding test cases is much more concise in Python.

You can follow these directions in the contributors-guide.md for testing Python. It will also require a running instance of PostGIS, which you can spin up using Docker by following these instructions I'm adding.

@yutannihilation
Copy link
Contributor Author

Thanks, I'll try it.

I also wonder if I should update this page, but probably there's no tool for this yet, guessing from #180?

https://github.com/apache/sedona-db/blob/main/docs/reference/sql.md

Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! (And thanks Peter for the review!)

Looks great. I just remembered, we should also add a new test_st_azimuth test

These will be similar to the ones for st_distance():

@pytest.mark.parametrize("eng", [SedonaDB, PostGIS])
@pytest.mark.parametrize(
("geom1", "geom2", "expected"),
[
(None, None, None),
("POINT (0 0)", None, None),
(None, "POINT (0 0)", None),
("POINT (0 0)", "POINT (0 0)", 0),
(
"POINT(-72.1235 42.3521)",
"LINESTRING(-72.1260 42.45, -72.123 42.1546)",
0.0015056772638228177,
),
(
"POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))",
"POLYGON ((5 5, 6 5, 6 6, 5 6, 5 5))",
5.656854249492381,
),
],
)
def test_st_distance(eng, geom1, geom2, expected):
eng = eng.create_or_skip()
eng.assert_query_result(
f"SELECT ST_Distance({geom_or_null(geom1)}, {geom_or_null(geom2)})",
expected,
numeric_epsilon=1e-8,
)

...and could go here:

I also wonder if I should update this page

You're correct...don't worry about this, as long as the rust function documentation is there it will get updated.

(We'll hopefully do a better job documenting the process of adding a function in the future 😬 )

Comment on lines 104 to 105
// If either of the points is empty, the result is NULL
_ => Ok(None),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checking: does PostGIS allow a MULITPOINT with a single child here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems PostGIS also rejects MULTIPOINT.

postgres=# SELECT ST_Azimuth(ST_Point(0, 0), ST_GeomFromText('MULTIPOINT (1 1)'));
ERROR:  Argument must be POINT geometries

}

// Note: When the two points are completely coincident, PostGIS's ST_Azimuth()
// returns NULL. However, this returns 0.0.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To date, SedonaDB tests against PostGIS for feature parity and we file bugs with Sedona when we notice something is inconsistent.

@yutannihilation
Copy link
Contributor Author

During adding these tests, I found PostGIS raise errors for these two cases where my implementation returns NULL.

The first one is two NULLs. I think test should not fail with this error because NULL::geometry works. However, casting to geometry doesn't work on SedonaDB yet.

postgres=# SELECT ST_Azimuth(NULL, NULL);
ERROR:  function st_azimuth(unknown, unknown) is not unique
LINE 1: SELECT ST_Azimuth(NULL, NULL);
               ^
HINT:  Could not choose a best candidate function. You might need to add explicit type casts.

The second one is empty point. I thought this returns NULL without errors, but it seems PostGIS doesn't allow empty point. I think we can follow this behavior.

postgres=# SELECT ST_Azimuth(ST_Point(0, 0), ST_GeomFromText('POINT EMPTY'));
NOTICE:  lwgeom_api.c [351] called with n=0 and npoints=0
ERROR:  Error extracting point

@yutannihilation
Copy link
Contributor Author

Thanks for your help. I think I addressed your comments.

You're correct...don't worry about this, as long as the rust function documentation is there it will get updated.

Good to know!

@jiayuasu
Copy link
Member

jiayuasu commented Oct 6, 2025

@yutannihilation does it make sense to show off your benchmark result (compared to DuckDB and PostGIS)? 😁

Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

Comment on lines +126 to +128
# TODO: PostGIS fails without explicit ::GEOMETRY type cast, but casting
# doesn't work on SedonaDB yet.
# (None, None, None),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this! We have a few cases where we're not sure exactly what this kind of thing should do (luckily people typing SQL NULLs for this kind of thing is pretty rare 🙂 )

@paleolimbot paleolimbot merged commit 21b8227 into apache:main Oct 6, 2025
12 checks passed
@yutannihilation yutannihilation deleted the feat/st-azimuth branch October 6, 2025 23:35
@yutannihilation
Copy link
Contributor Author

Thanks for reviewing!

@jiayuasu
I'm not familiar with benchmarking. Are there any guidance to do this? Do you mean I should have added a case here?

https://github.com/apache/sedona-db/blob/main/benchmarks/test_functions.py

@jiayuasu
Copy link
Member

jiayuasu commented Oct 6, 2025

@yutannihilation we have a tool in SedonaDB to produce benchmark results like what's shown in this PR: #171

@petern48 @paleolimbot can you help @yutannihilation figure out how to run it?

@yutannihilation
Copy link
Contributor Author

yutannihilation commented Oct 6, 2025

Thanks, I guess I can generate the benchmark result if I add a case to benchmarks/ like that PR, and I can follow this README.

https://github.com/apache/sedona-db/blob/main/benchmarks/README.md

@jiayuasu
Copy link
Member

jiayuasu commented Oct 6, 2025

I think you are right. Looking forward to it!

@yutannihilation
Copy link
Contributor Author

Sure, I'll try it!

@petern48
Copy link
Collaborator

petern48 commented Oct 7, 2025

Yep, that README.md is exactly it. I was reluctant to suggest adding a benchmark because there are still some issues to iron out about its purpose and validity (discussion in this PR.

Regardless, it surely doesn't hurt to add it. Especially since this is a native (manual) implementation, I'd expect to see some appealing numbers compared to DuckDB.

@yutannihilation
Copy link
Contributor Author

Hmm, before adding the case, it seems all benchmark tests are skipped on my local.

❯ pytest test_functions.py::TestBenchFunctions
======================================= test session starts =======================================
platform darwin -- Python 3.12.9, pytest-8.4.2, pluggy-1.6.0
benchmark: 5.1.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /Users/yutani/repo/sedona-db/benchmarks
plugins: benchmark-5.1.0
collected 57 items

test_functions.py sssssssssssssssssssssssssssssssssssssssssssssssssssssssss                 [100%]

======================================= 57 skipped in 1.13s =======================================

Here's what I did. Am I missing some steps? Since I'm not very good at Python, I might make some silly mistake...

python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install "python/sedonadb[test]" pytest pytest-benchmark

cd benchmarks
pytest test_functions.py::TestBenchFunctions

I'd expect to see some appealing numbers compared to DuckDB.

Since DuckDB's ST_Azimuth() is also implemented by me, you can blame me if DuckDB is not slow enough :) Anyway, looking forward to seeing the result!

@paleolimbot
Copy link
Member

I've also never successfully run the Python benchmarks, so no worries 🙂

(We're stoked you're here at all...feel free to spend your time doing whatever brings you the most joy!)

@petern48
Copy link
Collaborator

petern48 commented Oct 7, 2025

I'm not sure how you got actual s's (skips), but I suspect it's because you tried running the entire file, which would take forever. I submitted a PR to do this myself, since it requires adding a new points-only table. It wouldn't have been fun for you to figure out and debug since our current setup just hangs for a while (minutes) and eventually outputs a sort of useful error message. Also updated the docs to mention running one bench at a time.

Here's the benchmark screenshot copied over.

image

Since DuckDB's ST_Azimuth() is also implemented by me, you can blame me if DuckDB is not slow enough :) Anyway, looking forward to seeing the result!

I guess I'll instead thank you that DuckDB is slow enough 😉

@yutannihilation
Copy link
Contributor Author

I suspect it's because you tried running the entire file

Sorry, it was turned out that I simply forgot to launch PostGIS docker image this time...

Anyway, thanks for adding benchmark! Probably I can do it next time.

I guess I'll instead thank you that DuckDB is slow enough 😉

😉

@petern48
Copy link
Collaborator

petern48 commented Oct 7, 2025

Got it, thanks for catching that. It helps smooth the process for everyone when we know to document these small details. It's not ideal that the whole thing is skipped when the container isn't running, but I added a note about it to that same PR for now.

@jiayuasu jiayuasu linked an issue Oct 7, 2025 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants