Fix performance regression in quadtree_point_in_polygon by 5x #1446

mroeschke · 2024-08-27T22:50:04Z

Description

Improves on a performance regression identified by @trxcllnt when using quadtree_point_in_polygon

Seems like the core slowdown is that the passed points is a GeoSeries, and the x/y points are generated by interleaving them in GeoColumnAccessor.xy and then slicing x/y back out.

Since these points appear to be held in a cudf.ListDtype (i.e. [x, y]), I tried avoiding interleaving and slicing the the x and y points individually in the lists using the cudf.Series.list.get API but this was failing for x/y points for .polygons.x/.polygons.y since some cuspatial APIs expect these points to be float (not sure if it's a cudf bug in list.get or how points as structured for polygons)

Instead, I was able to get a 5x speedup by

Caching the interleaving step to avoid doing this multiple times
Slicing a cupy array of points instead of a cudf.Series (which uselessly slices an .index which is not used)

Benchmarking code

# 1.3G    points.arrow
# 722K    polys.arrow

import cudf
import cupy
import cuspatial
import pyarrow as pa
from time import perf_counter_ns

polys = pa.ipc.open_stream("polys.arrow").read_all()
polys = cudf.DataFrame.from_arrow(polys).rename(columns={"tract": "poly"})
point = polys["poly"].list.leaves.struct.explode()
polys = cuspatial.GeoSeries.from_polygons_xy(
  point.interleave_columns(),
  polys["poly"]._column.elements.offsets,
  polys["poly"]._column.offsets,
  cupy.arange(0, len(polys) + 1)
)

point = pa.ipc.open_stream("points.arrow").read_all()
point = cudf.DataFrame.from_arrow(point)
min_x = point["x"].min()
max_x = point["x"].max()
min_y = point["y"].min()
max_y = point["y"].max()

max_size = 0
max_depth = 16
threshold = 10
while max_size <= threshold:
  max_depth -= 1
  max_size = len(point) / pow(4, max_depth) / 4

scale = max(max_x - min_x, max_y - min_y) / (1 << max_depth)

point_xy = cuspatial.GeoSeries.from_points_xy(point.interleave_columns())
point_map, quadtree = (
  cuspatial.quadtree_on_points(point_xy,
                               min_x,
                               max_x,
                               min_y,
                               max_y,
                               scale,
                               max_depth,
                               max_size))

t0 = perf_counter_ns()

cuspatial.quadtree_point_in_polygon(
  cuspatial.join_quadtree_and_bounding_boxes(
      quadtree,
      cuspatial.polygon_bounding_boxes(polys[0:1]),
      min_x,
      max_x,
      min_y,
      max_y,
      scale,
      max_depth
  ),
  quadtree,
  point_map,
  point_xy,
  polys[0:1]
)

t1 = perf_counter_ns()

print(f"{(t1 - t0) / (10 ** 6)}ms")

t0 = perf_counter_ns()
poly_offsets = cudf.Series(polys[0:1].polygons.part_offset)._column
ring_offsets = cudf.Series(polys[0:1].polygons.ring_offset)._column
poly_points_x = cudf.Series(polys[0:1].polygons.x)._column
poly_points_y = cudf.Series(polys[0:1].polygons.y)._column

from cuspatial._lib import spatial_join

cudf.DataFrame._from_data(
  *spatial_join.quadtree_point_in_polygon(
      cuspatial.join_quadtree_and_bounding_boxes(
          quadtree,
          cuspatial.polygon_bounding_boxes(polys[0:1]),
          min_x,
          max_x,
          min_y,
          max_y,
          scale,
          max_depth
      ),
      quadtree,
      point_map._column,
      point["x"]._column,
      point["y"]._column,
      poly_offsets,
      ring_offsets,
      poly_points_x,
      poly_points_y,
  )
)

t1 = perf_counter_ns()

print(f"{(t1 - t0) / (10 ** 6)}ms")

# 127.406344ms  # this PR
# 644.963021ms  # branch 24.10

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

harrism · 2024-08-27T23:41:56Z

I think the title should be "Fix performance regression in ...". This is a bugfix, not an improvement, right? The current title sounds like a major new optimization.

Also, in your benchmark, please make it clear what data size you are seeing a 5x improvement for.

harrism · 2024-08-27T23:43:53Z

Probably should have been a bug issue filed for this.

trxcllnt · 2024-08-28T06:57:40Z

python/cuspatial/cuspatial/core/spatial/join.py

+    points_data = points.points
+    points_x = as_column(points_data.x)
+    points_y = as_column(points_data.y)


I think we should allow users to avoid packing points into a GeoSeries if desired.

Since GeoSeries is a Union<List<List<List<Float32>>>, each 8-byte point pair requires an additional 13 bytes (1 + 4 * 3 ) of indices, which is prohibitively expensive when the dataset is large.

I suggest we either change the API to accept separate x/y columns again (while still accepting and unpacking the points from a GeoSeries), or provide some sort of wrapper for a pair of x/y columns that can be accessed the same way as the GeoSeries.

At the moment, here's my workaround:

class Points: def __init__(self, x, y): self.x = x self.y = y self.points = self self._column = self self._meta = self self.input_types = Series([Feature_Enum.POINT.value]) def __len__(self) -> int: return len(self.x) x = Series([ ... ]) y = Series([ ... ]) # Wrap x and y cols in a Points instance which duck-types # GeoSeries' accessors ala `p.points.x` and `p.points.y` quadtree_on_points(Points(x, y), ...)

Historically it looks like the API changed in #934 where quadtree_point_in_polygon was updated to accept a GeoSeries points and polygon inputs. I suppose @isVoid is best to make the call on whether we should change the API back to accept x/y points again.

trxcllnt

127.406344ms is still way too long, the query in the benchmark code should take 5-10ms.

mroeschke · 2024-08-28T22:56:36Z

I think the title should be "Fix performance regression in ...". This is a bugfix, not an improvement, right? The current title sounds like a major new optimization.

Also, in your benchmark, please make it clear what data size you are seeing a 5x improvement for.

Sorry yeah I updated the PR title and added the sizes of points.arrow and polys.arrow in the benchmark example

harrism · 2024-08-28T23:15:20Z

127.406344ms is still way too long, the query in the benchmark code should take 5-10ms.

@trxcllnt do you have a repro of that high performance? Are you saying we still have a 12-25x performance regression?

trxcllnt · 2024-08-29T18:06:38Z

@harrism yes, applying the workaround in my comment brings the run time down to ~25ms (see below).

I believe this is still longer than it should take, since it takes 5-10ms in node-rapids (which is still on cuspatial branch-22.12). Since node-rapids is basically straight calls into libcudf and libcuspatial, it doesn't include most of the additional overhead in Python (like DataFrame indices), so there are likely additional operations in Python contributing to the 25ms.

There is an alternative to the workaround I've described here -- we could add a C++ API that accepts the packed X/Y format in GeoSeries, then adapt it into two strided x and y iterators for the quadtree spatial join.

However, this still requires the user to construct a GeoSeries from the points (which takes 21 bytes/row instead of 8), so this would still be sub-optimal for memory-sensitive users.

$ python cuspatial-quadtree-regression.py 
390.267027ms
25.687572ms

# cuspatial-quadtree-regression.py
# 1.3G    points.arrow https://node-rapids-data.s3.us-west-2.amazonaws.com/spatial/168898952_points.arrow.gz
# 722K    polys.arrow https://node-rapids-data.s3.us-west-2.amazonaws.com/spatial/263_tracts.arrow.gz

import cudf
import cupy
import cuspatial
import pyarrow as pa
from time import perf_counter_ns

polys = pa.ipc.open_stream("polys.arrow").read_all()
polys = cudf.DataFrame.from_arrow(polys).rename(columns={"tract": "poly"})
point = polys["poly"].list.leaves.struct.explode()
polys = cuspatial.GeoSeries.from_polygons_xy(
  point.interleave_columns(),
  polys["poly"]._column.elements.offsets,
  polys["poly"]._column.offsets,
  cupy.arange(0, len(polys) + 1)
)

point = pa.ipc.open_stream("points.arrow").read_all()
point = cudf.DataFrame.from_arrow(point)
min_x = point["x"].min()
max_x = point["x"].max()
min_y = point["y"].min()
max_y = point["y"].max()

max_size = 0
max_depth = 16
threshold = 10
while max_size <= threshold:
  max_depth -= 1
  max_size = len(point) / pow(4, max_depth) / 4

scale = max(max_x - min_x, max_y - min_y) / (1 << max_depth)

point_xy = cuspatial.GeoSeries.from_points_xy(point.interleave_columns())
point_map, quadtree = (
  cuspatial.quadtree_on_points(point_xy,
                               min_x,
                               max_x,
                               min_y,
                               max_y,
                               scale,
                               max_depth,
                               max_size))

t0 = perf_counter_ns()

cuspatial.quadtree_point_in_polygon(
  cuspatial.join_quadtree_and_bounding_boxes(
      quadtree,
      cuspatial.polygon_bounding_boxes(polys[0:1]),
      min_x,
      max_x,
      min_y,
      max_y,
      scale,
      max_depth
  ),
  quadtree,
  point_map,
  point_xy,
  polys[0:1]
)

t1 = perf_counter_ns()

print(f"{(t1 - t0) / (10 ** 6)}ms")

from cuspatial.core._column.geometa import Feature_Enum

class Points:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.points = self
        self._column = self
        self._meta = self
        self.input_types = cudf.Series([Feature_Enum.POINT.value])

    def __len__(self) -> int:
        return len(self.x)

t0 = perf_counter_ns()

cuspatial.quadtree_point_in_polygon(
  cuspatial.join_quadtree_and_bounding_boxes(
      quadtree,
      cuspatial.polygon_bounding_boxes(polys[0:1]),
      min_x,
      max_x,
      min_y,
      max_y,
      scale,
      max_depth
  ),
  quadtree,
  point_map,
  Points(point["x"], point["y"]),
  polys[0:1]
)

t1 = perf_counter_ns()

print(f"{(t1 - t0) / (10 ** 6)}ms")

mroeschke added 2 commits August 27, 2024 14:12

Improve quadtree_point_in_polygon performance by 5x

46a9149

Fix failing tests

8d7714a

mroeschke added Python Related to Python code improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Aug 27, 2024

mroeschke requested a review from a team as a code owner August 27, 2024 22:50

mroeschke requested review from trxcllnt and bdice August 27, 2024 22:50

trxcllnt reviewed Aug 28, 2024

View reviewed changes

trxcllnt requested changes Aug 28, 2024

View reviewed changes

mroeschke changed the title ~~Improve quadtree_point_in_polygon performance by 5x~~ Fix performance regression in quadtree_point_in_polygon by 5x Aug 28, 2024

mroeschke changed the base branch from branch-24.10 to branch-24.12 September 25, 2024 18:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix performance regression in quadtree_point_in_polygon by 5x #1446

Fix performance regression in quadtree_point_in_polygon by 5x #1446

mroeschke commented Aug 27, 2024 •

edited

Loading

harrism commented Aug 27, 2024 •

edited

Loading

harrism commented Aug 27, 2024

trxcllnt Aug 28, 2024

mroeschke Aug 28, 2024

trxcllnt left a comment •

edited

Loading

mroeschke commented Aug 28, 2024

harrism commented Aug 28, 2024

trxcllnt commented Aug 29, 2024

Fix performance regression in quadtree_point_in_polygon by 5x #1446

Are you sure you want to change the base?

Fix performance regression in quadtree_point_in_polygon by 5x #1446

Conversation

mroeschke commented Aug 27, 2024 • edited Loading

Description

Checklist

harrism commented Aug 27, 2024 • edited Loading

harrism commented Aug 27, 2024

trxcllnt Aug 28, 2024

Choose a reason for hiding this comment

mroeschke Aug 28, 2024

Choose a reason for hiding this comment

trxcllnt left a comment • edited Loading

Choose a reason for hiding this comment

mroeschke commented Aug 28, 2024

harrism commented Aug 28, 2024

trxcllnt commented Aug 29, 2024

mroeschke commented Aug 27, 2024 •

edited

Loading

harrism commented Aug 27, 2024 •

edited

Loading

trxcllnt left a comment •

edited

Loading