Add Geography type support to SedonaFlink (parity with Spark) #3054

Description

@jiayuasu

Goal

Bring Geography type support to SedonaFlink so that the Flink SQL/Table API reaches parity with the Spark module. Geography (geodesic, S2-backed) is already implemented in the engine-agnostic common module and fully exposed in Spark, but the Flink module exposes none of it — Flink users currently have no GEOGRAPHY type and no ST_Geog* functions.

Background

The building blocks already exist and are engine-agnostic:

  • common module — Geography type and S2 backing:

    • common/src/main/java/org/apache/sedona/common/S2Geography/ — type hierarchy (Geography, PointGeography, PolylineGeography, PolygonGeography, MultiPolygonGeography, GeographyCollection, …), GeographySerializer, WKT/WKB readers & writers, Accessors, Predicates, Distance, Projection.
    • common/src/main/java/org/apache/sedona/common/geography/Constructors.java: geogFromWKT, geogFromEWKT, geogCollFromText, geogFromWKB, geogFromGeoHash, geogToGeometry, geomToGeography.
    • common/src/main/java/org/apache/sedona/common/geography/Functions.java: getEnvelope, asEWKT.
  • Spark module — how it is exposed (the parity target):

    • spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/UDT/GeographyUDT.scala — the GeographyUDT, wrapping GeographySerializer.
    • spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/geography/Constructors.scala — 9 expressions: ST_GeogFromWKT, ST_GeogFromEWKT, ST_GeogFromText, ST_GeogCollFromText, ST_GeogFromWKB, ST_GeogFromEWKB, ST_GeogFromGeoHash, ST_GeogToGeometry, ST_GeomToGeography.
    • Registered in spark/common/src/main/scala/org/apache/sedona/sql/UDF/Catalog.scala.
  • Flink module, geometry-only today, whose structure the geography work will parallel:

    • flink/src/main/java/org/apache/sedona/flink/GeometryTypeSerializer.java, GeometryArrayTypeSerializer.java — the serializer pattern to mirror.
    • flink/src/main/java/org/apache/sedona/flink/expressions/Constructors.java, Functions.java, Predicates.java — where ScalarFunction wrappers live.
    • flink/src/main/java/org/apache/sedona/flink/Catalog.java — function registration. No geography functions registered today.

Approach

Mirror the Spark integration in Flink. Each Flink ScalarFunction wraps the existing common-module geography functions, using a @DataTypeHint(value = "RAW", rawSerializer = GeographyTypeSerializer.class, bridgedTo = Geography.class) annotation — the same pattern the geometry constructors already use.

Scope

Parity with the Spark geography surface on master. Spark exposes the following, all backed by the engine-agnostic common/geography/ code:

  • Type + 9 constructors/conversions: GeographyUDT, ST_GeogFromWKT, ST_GeogFromText, ST_GeogFromWKB, ST_GeogFromEWKB, ST_GeogFromEWKT, ST_GeogFromGeoHash, ST_GeogCollFromText, ST_GeogToGeometry, ST_GeomToGeography.
  • 11 measurement/output functions that accept Geography: ST_Area, ST_Length, ST_Distance, ST_Buffer, ST_Centroid, ST_Envelope, ST_NPoints, ST_NumGeometries, ST_GeometryType, ST_AsText, ST_AsEWKT.
  • 5 predicates that accept Geography: ST_Contains, ST_Intersects, ST_Within, ST_Equals, ST_DWithin.

In Spark these are Geography overloads on the existing geometry expressions (inferrableFunction accepting Geometry or Geography). Flink must expose the same set so geography columns are usable end-to-end, not just constructible.
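
One way to keep track of the scope above is a small name-level parity checklist. The sketch below is illustrative only: the class GeographyParityCheck and its methods are hypothetical helpers, not part of Sedona, and the function names are copied from the list in this issue. It uses only the Java standard library and does not touch Flink or Sedona. Because the measurement/output functions and predicates are Geography overloads on existing geometry functions, a name match alone cannot prove that the Geography overload exists; the set passed in is assumed to contain only names whose Geography support has been verified separately (for example by a test).

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: tracks which Spark geography functions still lack a Flink counterpart.
final class GeographyParityCheck {

    // Names copied from the Scope list in this issue (GeographyUDT is a type, so it is not listed).
    static final List<String> CONSTRUCTORS = List.of(
        "ST_GeogFromWKT", "ST_GeogFromText", "ST_GeogFromWKB", "ST_GeogFromEWKB",
        "ST_GeogFromEWKT", "ST_GeogFromGeoHash", "ST_GeogCollFromText",
        "ST_GeogToGeometry", "ST_GeomToGeography");

    static final List<String> MEASUREMENT_OUTPUT = List.of(
        "ST_Area", "ST_Length", "ST_Distance", "ST_Buffer", "ST_Centroid", "ST_Envelope",
        "ST_NPoints", "ST_NumGeometries", "ST_GeometryType", "ST_AsText", "ST_AsEWKT");

    static final List<String> PREDICATES = List.of(
        "ST_Contains", "ST_Intersects", "ST_Within", "ST_Equals", "ST_DWithin");

    // The full Spark geography function surface, in the order listed in the issue.
    static Set<String> sparkSurface() {
        Set<String> all = new LinkedHashSet<>();
        all.addAll(CONSTRUCTORS);
        all.addAll(MEASUREMENT_OUTPUT);
        all.addAll(PREDICATES);
        return all;
    }

    // Spark geography functions that are absent from the given Flink set, sorted by name.
    static Set<String> missingInFlink(Set<String> flinkGeographyFunctions) {
        Set<String> missing = new TreeSet<>(sparkSurface());
        missing.removeAll(flinkGeographyFunctions);
        return missing;
    }

    public static void main(String[] args) {
        // Assumed snapshot of a partially wired Flink module, for illustration only.
        Set<String> flink = Set.of("ST_GeogFromWKT", "ST_GeogFromWKB", "ST_Area");
        System.out.println("Spark geography functions: " + sparkSurface().size());
        System.out.println("Still missing in Flink: " + missingInFlink(flink));
    }
}
```

Passing the complete Spark set to missingInFlink returns an empty set, which is the condition this epic aims for.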

Sub-tasks

All sub-tasks complete. SedonaFlink now has full geography parity with Spark: the Geography type serializer, 9 constructors/conversions, 11 measurement/output functions, 5 predicates, and complete API docs.

Notes

  • All common-module geography infrastructure is already implemented and engine-agnostic; this epic is purely about wiring it into Flink — no new geodesic/S2 logic is required.
  • No GeographyArrayTypeSerializer for now — no geography function returns an array yet.
  • As Spark adds geography predicates/measurements/accessors, follow-up issues should track Flink parity for those.
