Sedona reporting bounds inconsistent with PostGIS and GeoPandas #1874

@paleolimbot

To be fair, this is the wild west and nobody seems to agree on what to do for the two cases of:

  • xmin/xmax/ymin/ymax for EMPTY
  • Geometry that contains one or more nan values

Sedona seems to return a nearly infinite value for the lower bound of an EMPTY (or maybe this is just how Spark prints infinity?) and seems to propagate a NaN encountered in coordinates, like Math.min/max():

from sedona.spark import *

config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)

sedona.sql("SELECT ST_XMin(ST_GeomFromText('POINT EMPTY'))").show()
#> +---------------------------------------+
#> |st_xmin(st_geomfromwkt(POINT EMPTY, 0))|
#> +---------------------------------------+
#> |                   1.797693134862315...|
#> +---------------------------------------+
sedona.sql("SELECT ST_XMin(ST_GeomFromText('LINESTRING (0 1, nan nan, 2 3)'))").show()
#> +-----------------------------------------------------------+
#> |st_xmin(st_geomfromtext(LINESTRING (0 1, nan nan, 2 3), 0))|
#> +-----------------------------------------------------------+
#> |                                                        NaN|
#> +-----------------------------------------------------------+
sedona.sql("SELECT ST_XMin(ST_GeomFromText('LINESTRING EMPTY'))").show()
#> +---------------------------------------------+
#> |st_xmin(st_geomfromtext(LINESTRING EMPTY, 0))|
#> +---------------------------------------------+
#> |                         1.797693134862315...|
#> +---------------------------------------------+
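For illustration, here is a minimal sketch of a reduction that would produce both outputs above. It assumes the truncated value printed for EMPTY is the largest finite double and that the fold starts from that sentinel; both are guesses from the output, not something taken from Sedona's source. The names `java_like_min` and `xmin_propagating` are hypothetical.

```python
import math
import sys


def java_like_min(a, b):
    """Mimic Java's Math.min(): if either operand is NaN, the result is NaN."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return a if a < b else b


def xmin_propagating(xs, start=sys.float_info.max):
    """Fold over x values from a sentinel start (an assumption, not Sedona's code)."""
    result = start
    for x in xs:
        result = java_like_min(result, x)
    return result


# EMPTY: no coordinates, so the sentinel comes back unchanged
print(xmin_propagating([]))
# One NaN anywhere in the sequence poisons the result
print(xmin_propagating([0.0, math.nan, 2.0]))
```

With this model an EMPTY is indistinguishable from a geometry whose minimum really is the sentinel, which is part of why the result is awkward to consume downstream.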

GeoPandas seems to give NaN for empty bounds and ignores NaN if encountered in a coordinate sequence (like std::min/max()):

import geopandas

geopandas.GeoSeries.from_wkt(["POINT EMPTY", "POINT (0 1)", "LINESTRING (0 1, nan nan, 2 3)"]).total_bounds
#> array([0., 1., 2., 3.])

geopandas.GeoSeries.from_wkt(["POINT EMPTY", "POINT (0 1)", "LINESTRING (0 1, nan nan, 2 3)"]).bounds
#>    minx  miny  maxx  maxy
#> 0   NaN   NaN   NaN   NaN
#> 1   0.0   1.0   0.0   1.0
#> 2   0.0   1.0   2.0   3.0

geopandas.GeoSeries.from_wkt(["POINT EMPTY"]).total_bounds
#> array([nan, nan, nan, nan])
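The behaviour above is consistent with a reduction that drops NaN values first and only reports NaN when nothing is left. A rough pure-Python model follows; `xmin_skipping` is a hypothetical helper and this is not how GeoPandas or Shapely implement it.

```python
import math


def xmin_skipping(xs):
    """Minimum of the non-NaN values; NaN if there are none (EMPTY or all NaN)."""
    finite = [x for x in xs if not math.isnan(x)]
    if not finite:
        return math.nan
    return min(finite)


# x values of LINESTRING (0 1, nan nan, 2 3): the NaN is skipped
print(xmin_skipping([0.0, math.nan, 2.0]))
# POINT EMPTY has no coordinates, so the bound is NaN
print(xmin_skipping([]))
```

Under this model an EMPTY and an all-NaN geometry report the same bounds.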

PostGIS gives NULL for the bounds of an EMPTY:

# docker run --rm -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=password -p "5432:5432" postgis/postgis
# psql -h 127.0.0.1 --user postgres
postgres=# SELECT ST_XMin(ST_GeomFromText('POINT EMPTY')) IS NULL;
 ?column? 
----------
 t
(1 row)

...and when coordinates contain nan, it appears that the bounds are reset when the first nan occurs:

postgres=# SELECT ST_XMin(ST_GeomFromText('LINESTRING (1 2, nan nan, 3 4)'));
 st_xmin 
---------
       3
(1 row)
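One way to get this "reset" effect is a minimum built from a bare `<` comparison, since every comparison against NaN is false. The sketch below is only a model that reproduces the two PostGIS outputs above; it is not taken from the PostGIS source, and `xmin_comparison` is a hypothetical name.

```python
import math


def xmin_comparison(xs):
    """Running minimum using only `<`; returns None (standing in for NULL) for EMPTY."""
    if not xs:
        return None
    result = xs[0]
    for x in xs[1:]:
        # Any comparison with NaN is False, so a NaN coordinate replaces the
        # running minimum, and the next real coordinate then replaces the NaN.
        result = result if result < x else x
    return result


# x values of LINESTRING (1 2, nan nan, 3 4): the 1 is lost at the nan
print(xmin_comparison([1.0, math.nan, 3.0]))
# POINT EMPTY
print(xmin_comparison([]))
```

The operand order matters here: writing the update as `x if x < result else result` would keep the 1 and effectively ignore the nan instead.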
