Goal
Bring Geography type support to SedonaFlink so that the Flink SQL/Table API reaches parity with the Spark module. Geography (geodesic, S2-backed) is already implemented in the engine-agnostic common module and fully exposed in Spark, but the Flink module exposes none of it — Flink users currently have no GEOGRAPHY type and no ST_Geog* functions.
Background
The building blocks already exist and are engine-agnostic:
- common module: Geography type and S2 backing
  - common/src/main/java/org/apache/sedona/common/S2Geography/: type hierarchy (Geography, PointGeography, PolylineGeography, PolygonGeography, MultiPolygonGeography, GeographyCollection, …), GeographySerializer, WKT/WKB readers & writers, Accessors, Predicates, Distance, Projection.
  - common/src/main/java/org/apache/sedona/common/geography/Constructors.java: geogFromWKT, geogFromEWKT, geogCollFromText, geogFromWKB, geogFromGeoHash, geogToGeometry, geomToGeography.
  - common/src/main/java/org/apache/sedona/common/geography/Functions.java: getEnvelope, asEWKT.
- Spark module: how it is exposed (the parity target)
  - spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/UDT/GeographyUDT.scala: the GeographyUDT, wrapping GeographySerializer.
  - spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/geography/Constructors.scala: 9 expressions: ST_GeogFromWKT, ST_GeogFromEWKT, ST_GeogFromText, ST_GeogCollFromText, ST_GeogFromWKB, ST_GeogFromEWKB, ST_GeogFromGeoHash, ST_GeogToGeometry, ST_GeomToGeography.
  - Registered in spark/common/src/main/scala/org/apache/sedona/sql/UDF/Catalog.scala.
- Flink module: what geography work will parallel (geometry-only today)
  - flink/src/main/java/org/apache/sedona/flink/GeometryTypeSerializer.java, GeometryArrayTypeSerializer.java: the serializer pattern to mirror.
  - flink/src/main/java/org/apache/sedona/flink/expressions/Constructors.java, Functions.java, Predicates.java: where ScalarFunction wrappers live.
  - flink/src/main/java/org/apache/sedona/flink/Catalog.java: function registration. No geography functions registered today.
Approach
Mirror the Spark integration in Flink. Each Flink ScalarFunction wraps the existing common-module geography functions, using a @DataTypeHint(value = "RAW", rawSerializer = GeographyTypeSerializer.class, bridgedTo = Geography.class) annotation — the same pattern the geometry constructors already use.
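The delegation shape described here can be sketched without any Flink or Sedona dependency. Everything below is a hypothetical stand-in written for illustration: GeographyStandIn, CommonGeographyConstructors, StGeogFromWktSketch, the default SRID of 4326, and the null-in/null-out behaviour are all assumptions, not the module's actual code. In the Flink module the wrapper class would extend ScalarFunction and its eval method would carry the @DataTypeHint annotation quoted above.

```java
// Hypothetical stand-in for org.apache.sedona.common.S2Geography.Geography,
// reduced to the two fields this sketch needs.
final class GeographyStandIn {
    final String wkt;
    final int srid;

    GeographyStandIn(String wkt, int srid) {
        this.wkt = wkt;
        this.srid = srid;
    }
}

// Hypothetical stand-in for the common-module constructor class.
// It only trims the text and records the SRID; it does not parse WKT.
final class CommonGeographyConstructors {
    static GeographyStandIn geogFromWKT(String wkt, int srid) {
        return new GeographyStandIn(wkt.trim(), srid);
    }
}

// Shape of a Flink wrapper: a thin class whose eval method adds no
// geodesic logic of its own and delegates to the common module.
final class StGeogFromWktSketch {
    static final int DEFAULT_SRID = 4326; // assumption for this sketch

    GeographyStandIn eval(String wkt) {
        // Assumption: a SQL NULL input yields a NULL result.
        if (wkt == null) {
            return null;
        }
        return CommonGeographyConstructors.geogFromWKT(wkt, DEFAULT_SRID);
    }
}
```

The point of the shape is that each wrapper stays a few lines long, so parity with Spark is mostly a matter of enumerating the functions and registering them.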
Scope
Parity with the Spark geography surface on master. Spark exposes the following, all backed by the engine-agnostic common/geography/:
- Type + 9 constructors/conversions: GeographyUDT, ST_GeogFromWKT, ST_GeogFromText, ST_GeogFromWKB, ST_GeogFromEWKB, ST_GeogFromEWKT, ST_GeogFromGeoHash, ST_GeogCollFromText, ST_GeogToGeometry, ST_GeomToGeography.
- 11 measurement/output functions that accept Geography: ST_Area, ST_Length, ST_Distance, ST_Buffer, ST_Centroid, ST_Envelope, ST_NPoints, ST_NumGeometries, ST_GeometryType, ST_AsText, ST_AsEWKT.
- 5 predicates that accept Geography: ST_Contains, ST_Intersects, ST_Within, ST_Equals, ST_DWithin.
In Spark these are Geography overloads on the existing geometry expressions (inferrableFunction accepting Geometry or Geography). Flink must expose the same set so geography columns are usable end-to-end, not just constructible.
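One way to picture a function that accepts either type is a single class with two eval overloads, which is how a Flink ScalarFunction can offer several signatures under one name. The sketch below is standalone Java with hypothetical stand-in types (PlanarPoint, GeoPoint, StDistanceSketch). The geography branch uses the haversine formula on a sphere with an assumed mean Earth radius purely for illustration; it is a swapped-in technique, not Sedona's S2-backed Distance implementation.

```java
// Hypothetical stand-in for a planar geometry point (arbitrary units).
final class PlanarPoint {
    final double x;
    final double y;

    PlanarPoint(double x, double y) {
        this.x = x;
        this.y = y;
    }
}

// Hypothetical stand-in for a geography point (longitude/latitude in degrees).
final class GeoPoint {
    final double lonDeg;
    final double latDeg;

    GeoPoint(double lonDeg, double latDeg) {
        this.lonDeg = lonDeg;
        this.latDeg = latDeg;
    }
}

final class StDistanceSketch {
    // Mean Earth radius in metres; an assumption of this sketch.
    static final double EARTH_RADIUS_M = 6371008.8;

    // Geometry overload: Euclidean distance in the units of the coordinates.
    double eval(PlanarPoint a, PlanarPoint b) {
        return Math.hypot(a.x - b.x, a.y - b.y);
    }

    // Geography overload: great-circle distance in metres via haversine.
    double eval(GeoPoint a, GeoPoint b) {
        double lat1 = Math.toRadians(a.latDeg);
        double lat2 = Math.toRadians(b.latDeg);
        double sinHalfDLat = Math.sin((lat2 - lat1) / 2);
        double sinHalfDLon = Math.sin(Math.toRadians(b.lonDeg - a.lonDeg) / 2);
        double h = sinHalfDLat * sinHalfDLat
                + Math.cos(lat1) * Math.cos(lat2) * sinHalfDLon * sinHalfDLon;
        // Clamp to 1.0 to guard asin against rounding just above its domain.
        return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(Math.min(1.0, h)));
    }
}
```

Java picks the overload from the static argument types, so the same function name yields coordinate-unit distances for planar inputs and metres for geographic inputs.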
Sub-tasks
All sub-tasks complete. SedonaFlink now has full geography parity with Spark: the Geography type serializer, 9 constructors/conversions, 11 measurement/output functions, 5 predicates, and complete API docs.
Notes
- All common-module geography infrastructure is already implemented and engine-agnostic; this epic is purely about wiring it into Flink, and no new geodesic/S2 logic is required.
- No GeographyArrayTypeSerializer for now, since no geography function returns an array yet.
- As Spark adds geography predicates/measurements/accessors, follow-up issues should track Flink parity for those.
Sub-task issues (all complete):
- GeographyTypeSerializer wrapping org.apache.sedona.common.S2Geography.GeographyWKBSerializer, mirroring GeometryTypeSerializer. Issue: "SedonaFlink: add GeographyTypeSerializer for the Geography type" #3058 ✅
- ScalarFunction wrappers in expressions/geography/GeographyConstructors.java, registered in Catalog.java, with round-trip tests. Issue: "SedonaFlink: add geography constructor functions (ST_Geog*)" #3061 ✅
- … common.geography.Functions, with tests. Issue: "SedonaFlink: add geography support to measurement and output functions" #3073 ✅