-
Notifications
You must be signed in to change notification settings - Fork 744
[GH-2402] Add Sedona Flink SQL module #2452
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
e7af745 to
94b14e3
Compare
|
I have tested the module manually and it loads in Flink but the functions are not usable. I can call
I get
When i explicitly cast the number to a double or an integer the same issue appears. EDIT: I fixed the issue, it was linked to incorrect casing and I made everything lowercase. |
dd77708 to
7c8bee4
Compare
|
When I call Other than that, I'll work on the types tomorrow. |
7c8bee4 to
a77230c
Compare
|
I have been looking through the code and it seems to me that either there was a mistake or I'm missing something because all of the functions in For example: The documentation of this Making the functions use custom serializers requires passing a What do you think about this @Imbruced? |
|
@radekaadek, thanks for the research 🙇 Are you able to verify if it applies correctly with your changes? |
|
@radekaadek I think the only reason why I did that is because otherwise no way to register those functions. If there is a clear path now, please feel free to use the customized serializer. Please use Sedona's own geom serializer if this is the case: https://github.com/apache/sedona/tree/master/common/src/main/java/org/apache/sedona/common/geometrySerde Our serializer has way better performance than the default Java / Kyro serializer |
|
Well, looks like the current code already uses our serializer |
|
I guess the reason that it works now is because Flink automatically infers that a serializer that was passed in when that type was created. Can you tell me how you were able to verify that the code actually uses custom serializers @jiayuasu ? I don't know if I'll be able to serialize the index as I don't see it called in a function anywhere but I'll try to instantiate it's type and serializer when the module is loaded. The types in the module resolve to this RAW type where a custom serializer can be specified and the types themselves should already be handled correctly by Flink. I'll write some tests and documentation for the module once I'll have some time on my hands. |
|
I have added some tests for the module and made the functions use the custom serializer. I also noticed that all tests are initialized with the It occurred to me that the module could reuse most, if not all, of the test cases if this method were modified to something like this: This would require some refactoring, but let me know what you think. |
|
@radekaadek regarding logs, did you try to use log4j in Flink to print logs? |
|
@jiayuasu I solved the issue by making the serializer make some files on my machine, and it did, but I don't know how you would like see it actually being tested in unit tests. I have not noticed an error appeared in the CI that looks like this: and I don't know if where it's coming from. After implementing the The rest of the tests that are currently failing all seem to be connected to the The |
ef92065 to
2ebe793
Compare
|
sorry I am traveling this week so my response might be slow |
|
@radekaadek I've fixed the failed test cases. It turns out that you just need to add back the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds a Sedona Flink SQL module to enable users to load Sedona's geospatial functions using Flink's module system, particularly beneficial for those using the Flink SQL Gateway. The key changes include:
- Implementation of a Flink module factory and module for Sedona
- Addition of explicit geometry type serializers to enable proper serialization in module-loaded functions
- Comprehensive test coverage for the new module functionality
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| flink/src/main/java/org/apache/sedona/flink/SedonaModuleFactory.java | Factory implementation for creating Sedona modules following Flink's factory pattern |
| flink/src/main/java/org/apache/sedona/flink/SedonaModule.java | Core module implementation that registers Sedona functions and predicates |
| flink/src/main/java/org/apache/sedona/flink/GeometryTypeSerializer.java | Custom type serializer for Geometry objects to enable proper serialization in Flink |
| flink/src/main/java/org/apache/sedona/flink/GeometryArrayTypeSerializer.java | Custom type serializer for Geometry arrays to support array-based operations |
| flink/src/main/resources/META-INF/services/org.apache.flink.table.factories.Factory | Service provider configuration file to register the Sedona module factory |
| flink/src/main/java/org/apache/sedona/flink/expressions/Predicates.java | Updated all predicate functions to include explicit rawSerializer in DataTypeHint annotations |
| flink/src/main/java/org/apache/sedona/flink/expressions/FunctionsGeoTools.java | Updated ST_Transform function to include explicit rawSerializer in DataTypeHint annotations |
| flink/src/main/java/org/apache/sedona/flink/expressions/Constructors.java | Updated all constructor functions to include explicit rawSerializer in DataTypeHint annotations |
| flink/src/main/java/org/apache/sedona/flink/expressions/Aggregators.java | Updated all aggregator functions to include explicit rawSerializer in DataTypeHint annotations |
| flink/src/main/java/org/apache/sedona/flink/expressions/Accumulators.java | Updated accumulator class to include explicit rawSerializer in DataTypeHint annotation |
| flink/src/test/java/org/apache/sedona/flink/ModuleTest.java | Comprehensive test suite verifying module loading, function registration, and basic functionality |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Did you read the Contributor Guide?
Is this PR related to a ticket?
[GH-2402] my subject. Closes Registering Sedona types and SQL functions on a standalone Flink instance #2402What changes were proposed in this PR?
Add a Sedona Flink module for people who use the Flink SQL Gateway.
How was this patch tested?
Did this PR include necessary documentation updates?