
inconsistent result with cuspatial.point_in_polygon #492

Open
hakantekbas opened this issue Feb 23, 2022 · 31 comments
Labels
bug Something isn't working

Comments

@hakantekbas

Hi;

I encountered a strange result when I ran the notebook below.

https://github.com/rapidsai/cuspatial/blob/branch-22.04/notebooks/nyc_taxi_years_correlation.ipynb

To show the inconsistency, I did the same calculation in two different ways: the first is pickups, the second is pickups2. The last column of pickups is recomputed as the first column of pickups2. The strange part is that the two calculations give different results: the count of True values in the last column of pickups should equal the count in the first column of pickups2. In each calculation, the count of True values in the last column is nearly the number of rows, so something is clearly wrong here. Can you help me figure it out?

start = 0
end = 4
pickups = cuspatial.point_in_polygon(
    taxi2016['pickup_longitude'], taxi2016['pickup_latitude'],
    taxi_zones[0][start:end], taxi_zones[1],
    taxi_zones[2]['x'], taxi_zones[2]['y'])
pickups2 = cuspatial.point_in_polygon(
    taxi2016['pickup_longitude'], taxi2016['pickup_latitude'],
    taxi_zones[0][3:6], taxi_zones[1],
    taxi_zones[2]['x'], taxi_zones[2]['y'])

pickups

            0      1      2     3
0         False  False  False  True
1         False  False  False  True
2         False  False  False  True
3         False  False  False  True
4         False  False  False  True
...         ...    ...    ...   ...
10906853  False  False  False  True
10906854  False  False  False  True
10906855  False  False  False  True
10906856  False  False  False  True
10906857  False  False  False  True

count of true values in pickups

0         605
1           1
2          32
3    10727122
dtype: uint64

pickups2

              3      4     5
0         False  False  True
1         False  False  True
2         False  False  True
3         False  False  True
4         False  False  True
...         ...    ...   ...
10906853  False  False  True
10906854  False  False  True
10906855  False  False  True
10906856  False  False  True
10906857  False  False  True

count of true values in pickups2

3       34894
4           3
5    10692225
dtype: uint64
hakantekbas added the Needs Triage and bug labels Feb 23, 2022
@hakantekbas
Author

hakantekbas commented Feb 23, 2022

I want to give one more detail. This method only accepts 31 polygons at a time. If I create a separate shapefile for every 31 polygons, it runs 10 times faster and the results are correct.

for i in range(len(pip_iterations)-1):
    timeStart = time.time()
    start = pip_iterations[i]
    end = pip_iterations[i+1]

    tzones.iloc[start:end].to_file('/data_ssd/hakan/mobilData/spatialAnalysis/data/cu_taxi_zones.shp')
    taxi_zones = cuspatial.read_polygon_shapefile('/data_ssd/hakan/mobilData/spatialAnalysis/data/cu_taxi_zones.shp')
    pickups = cuspatial.point_in_polygon(taxi2016['pickup_longitude'] , taxi2016['pickup_latitude'], \
        taxi_zones[0], taxi_zones[1], taxi_zones[2]['x'], taxi_zones[2]['y'])
    for j in pickups.columns:
         taxi2016['atama'].loc[pickups[j]] = j + start

@zhangjianting
Contributor

zhangjianting commented Feb 23, 2022

There is an issue in your first program. According to https://docs.rapids.ai/api/cuspatial/stable/api.html, in the API
cuspatial.point_in_polygon(test_points_x, test_points_y, poly_offsets, poly_ring_offsets, poly_points_x, poly_points_y)
poly_offsets and poly_ring_offsets were designed to begin with 0; there could be side effects if they do not.
Your 2nd program happens to satisfy the requirement when each subset of polygons is read from its own shapefile.
I am investigating what might happen when taxi_zones[0][start:end] is used instead of taxi_zones[0] for poly_offsets.
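For illustration (not from the thread), here is a minimal sketch of what matched offset arrays look like for two single-ring polygons, assuming the legacy layout in which poly_offsets indexes into poly_ring_offsets and poly_ring_offsets indexes into the point arrays:

```python
# Hypothetical toy data: two triangles, offsets begin at 0 (int32 in the real API).
poly_offsets      = [0, 1]   # polygon 0 starts at ring 0, polygon 1 starts at ring 1
poly_ring_offsets = [0, 3]   # ring 0 starts at point 0, ring 1 starts at point 3
poly_points_x     = [0.0, 1.0, 0.0,   2.0, 3.0, 2.0]
poly_points_y     = [0.0, 0.0, 1.0,   2.0, 2.0, 3.0]

# Slicing poly_offsets alone (e.g. poly_offsets[0:1]) leaves poly_ring_offsets
# still describing both polygons, which is the mismatch discussed in this issue.
```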

@hakantekbas
Author

Thanks. When you say there is an issue in my code, are you pointing to pickups2? I will ask another question about cuspatial.quadtree_point_in_polygon. Is this method faster than point_in_polygon, or is quadtree_point_in_polygon already incorporated in point_in_polygon?

@zhangjianting
Contributor

@hakantekbas yes. The poly_offsets and poly_ring_offsets arrays were designed to begin with 0.
I am trying to reproduce your results. Unfortunately I am not a good Python programmer. What are the command(s) you used to generate the count lists?
quadtree_point_in_polygon indexes the points first and then performs a point-in-polygon test based spatial join. There is no limit on the number of polygons in quadtree_point_in_polygon, and the results are (point_offset, polygon_offset) pairs, which is different from the result format of point_in_polygon.

@hakantekbas
Author

hakantekbas commented Feb 24, 2022

@zhangjianting;
You can try the following command. But if you look at the pickups dataframe, you should see many True values in the last column.

pickups.sum()

I will also ask how I can apply quadtree_point_in_polygon to this case, and I also wonder whether this method is faster than cuspatial.point_in_polygon. The results obtained from the following code do not look correct.

min_x = taxi2016.pickup_longitude.min()
max_x = taxi2016.pickup_longitude.max()
min_y = taxi2016.pickup_latitude.min()
max_y = taxi2016.pickup_latitude.max()
min_size = 50
max_depth = 4
scale = max(max_x - min_x, max_y - min_y) // (1 << max_depth)

pip_iterations = list(np.arange(0, 263, 30))
pip_iterations.append(263)
print(pip_iterations)
finalList = []
taxi2016["atama"]=264

for i in range(len(pip_iterations)-1):
    start = pip_iterations[i]
    end = pip_iterations[i+1]
    zoneShape = tzones.iloc[start:end]
    zoneShape.to_file('/data_ssd/hakan/mobilData/spatialAnalysis/data/cu_taxi_zones.shp')
    taxi_zones = cuspatial.read_polygon_shapefile('/data_ssd/hakan/mobilData/spatialAnalysis/data/cu_taxi_zones.shp')
    print("start:%s  , end:%s" % (start,end))

    key_to_point, quadtree = cuspatial.quadtree_on_points(
            taxi2016["pickup_longitude"],
            taxi2016["pickup_latitude"],
            min_x,
            max_x,
            min_y,
            max_y,
            scale, max_depth, min_size
        )

    polyBoundingBox = cuspatial.polygon_bounding_boxes(taxi_zones[0],taxi_zones[1],taxi_zones[2]["x"],taxi_zones[2]["y"])
    polyQuadOffset = cuspatial.join_quadtree_and_bounding_boxes(quadtree, polyBoundingBox, min_x, max_x, min_y, max_y, scale, max_depth)

    startTime = time.time()
    test = cuspatial.quadtree_point_in_polygon(polyQuadOffset, quadtree, key_to_point, taxi2016['pickup_longitude'], taxi2016['pickup_latitude'], 
                taxi_zones[0], taxi_zones[1], taxi_zones[2]['x'], taxi_zones[2]['y'])
    taxi2016.iloc[test.point_index,taxi2016.columns.tolist().index("atama")] = test.to_pandas()["polygon_index"] + zoneShape.index.min()
    print(time.time()-startTime)
    print("aaa")

@zhangjianting
Contributor

zhangjianting commented Feb 24, 2022

In my previous answer, "poly_offsets and poly_ring_offsets were designed to begin with 0" was not accurate. The API expects the values of poly_offsets to match the values of poly_ring_offsets for all the polygons being tested. This is naturally satisfied when a shapefile is read and used to populate the two arrays.
However, using a subset of poly_offsets together with the whole poly_ring_offsets array breaks the matching requirement. It turns out that, when a subset of polygons is used, the result for the last polygon in the subset will be incorrect, but the results for all the other polygons should be correct.
Splitting the polygon shapefile is a good workaround for now. I am discussing with the cuSpatial team to find a good fix. Thanks for the bug report.
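As a rough illustration of keeping the arrays matched when subsetting (a sketch, not an existing cuSpatial helper; it assumes the layout where poly_offsets indexes into poly_ring_offsets, poly_ring_offsets indexes into the point arrays, there is no trailing sentinel, and everything has been copied to host NumPy arrays):

```python
import numpy as np

def subset_polygons(poly_offsets, ring_offsets, xs, ys, start, end):
    # Hypothetical helper: keep polygons [start, end) and rebase all offsets to 0.
    poly_offsets = np.asarray(poly_offsets)
    ring_offsets = np.asarray(ring_offsets)
    ring_lo = poly_offsets[start]
    ring_hi = poly_offsets[end] if end < len(poly_offsets) else len(ring_offsets)
    pt_lo = ring_offsets[ring_lo]
    pt_hi = ring_offsets[ring_hi] if ring_hi < len(ring_offsets) else len(xs)
    return (poly_offsets[start:end] - ring_lo,
            ring_offsets[ring_lo:ring_hi] - pt_lo,
            xs[pt_lo:pt_hi],
            ys[pt_lo:pt_hi])
```

Passing such a rebased subset (instead of taxi_zones[0][start:end] with the full ring offsets) keeps the arrays consistent without a shapefile round trip, but splitting the shapefile as described above remains the safer workaround until the API guards against the mismatch.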

@hakantekbas
Author

Thanks @zhangjianting for the fast response. I also want to hear your thoughts about cuspatial.quadtree_point_in_polygon. Can you give more details about how to use this method and why my calculation does not work in my case? Maybe I am applying it correctly but the calculation is wrong because this bug also affects it; even so, I would expect to get the same results, but the results are very different.

@zhangjianting
Contributor

@hakantekbas (min_x, min_y, max_x, max_y) =
(-121.93428802490236, 0.0, 0.0, 60.908756256103516)
This is not a good space over which to apply quadtree indexing with min_size = 50 and max_depth = 4.
I would suggest filtering out records that are not in the NYC area (with or without the neighboring NJ/CT areas) and trying min_size = 512 and max_depth = 15.
For the example at https://github.com/zhangjianting/cuspatial_benchmark_nyctaxi/blob/main/python/spatial_join.py, the EPSG 2263 projection was used for both point and polygon data. It should also work for lon/lat (EPSG 4326).
For discussions on min_size and max_depth, see
http://www.adms-conf.org/2019-camera-ready/zhang_adms19.pdf
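A sketch of that setup (the NYC bounding box values below are illustrative, not from this thread; min_size, max_depth, and the scale formula follow the suggestion above and the pattern in the code earlier in the thread):

```python
# Keep only pickups inside an approximate NYC bounding box (illustrative bounds).
min_x, max_x = -74.3, -73.6
min_y, max_y = 40.5, 41.0
mask = ((taxi2016.pickup_longitude >= min_x) & (taxi2016.pickup_longitude <= max_x) &
        (taxi2016.pickup_latitude >= min_y) & (taxi2016.pickup_latitude <= max_y))
nyc = taxi2016[mask].reset_index(drop=True)

max_depth = 15
min_size = 512
scale = max(max_x - min_x, max_y - min_y) / (1 << max_depth)

key_to_point, quadtree = cuspatial.quadtree_on_points(
    nyc["pickup_longitude"], nyc["pickup_latitude"],
    min_x, max_x, min_y, max_y, scale, max_depth, min_size)
```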

@zhangjianting
Contributor

@hakantekbas point_in_polygon and quadtree_point_in_polygon are completely independent.

@hakantekbas
Author

hakantekbas commented Feb 24, 2022

@zhangjianting thanks for sharing the Python file on GitHub, and for the explanation.

@zhangjianting
Contributor

"If I create seperate shape file for every 31 polygons, it is running 10 times faster and results are correct."

My result shows that the majority of the time was spent converting the integer/bitmap vector from the Cython/C++ results to a DataFrame. We might be able to accelerate it at the native CUDA level.

Page 3 of https://github.com/zhangjianting/cuspatial/blob/fea-spatial-join/docs/basic_spatial_trajectory_operators.pdf reported that, for a test of 1.3 million points in 27 (simple) polygons, the GPU runtime is 0.92 milliseconds.

For the performance you got, my guess is that it took less time to create 10 DataFrames with 1 column each than to create 1 DataFrame with 10 columns, although I do not have proof yet.

cuspatial.point_in_polygon was designed for small problems (e.g., a few million points and <31 polygons). The DataFrame return type is for convenience, not for performance. If performance is important, you can skip the DataFrame wrapping and use the lower-level API directly (see the example below):

result = cuspatial.core.gis.cpp_point_in_polygon(
    taxi2016['pickup_longitude']._column,
    taxi2016['pickup_latitude']._column,
    as_column(taxi_zones[0], dtype="int32"),
    as_column(taxi_zones[1], dtype="int32"),
    taxi_zones[2]['x']._column,
    taxi_zones[2]['y']._column)
result = result.data_array_view.copy_to_host()
Here each bit of result[i] indicates whether points[i] is in the corresponding polygon (the first polygon at the highest bit, if I remember correctly).

If the number of points is in the order of tens of millions, quadtree indexing would be beneficial. When you have a chance, please let me know the performance you get.

@hakantekbas
Author

Hi @zhangjianting, I will try the last one at the weekend. I will keep you informed.

If I create a polygon shapefile for each group of 31 polygons, it is 10 times faster and the results seem correct. In the base case I am using the code below, taken from the notebook on GitHub. I believe the performance degradation is caused by the bug in the point_in_polygon function.

pickups = cuspatial.point_in_polygon(taxi2016['pickup_longitude'] , taxi2016['pickup_latitude'], taxi_zones[0][start:end], taxi_zones[1], taxi_zones[2]['x'], taxi_zones[2]['y'])

Thanks.

@hakantekbas
Author

Hi @zhangjianting

I received the following error. It seems this method also expects 31 polygons at most.

RuntimeError: cuSpatial failure at: /workspace/.conda-bld/work/cpp/src/spatial/point_in_polygon.cu:202: Number of polygons cannot exceed 31

If I put a limit on the number of polygons, it works, but the returned values do not look correct. Maybe I misinterpreted them. I copied the unique values of result below.

array([        0,         1,         2,         4,         8,        16,
              32,        64,       128,       256,       512,      1024,
            2048,      4096,      8192,     16384,     32768,     65536,
          131072,    262144,    524288,   1048576,   2097152,   4194304,
         8388608,  16777216,  33554432,  67108864, 134217728, 268435456,
       536870912], dtype=int32)

@zhangjianting
Contributor

@hakantekbas Can you post the code segment you used? It looks like the non-indexed version of the point-in-polygon code was used. What is the array that you displayed?

@hakantekbas
Author

hakantekbas commented Mar 1, 2022

Hi @zhangjianting

I displayed np.unique(result). All the data files were downloaded from the links shared in the notebook.

Thanks.

import cuspatial
from cuspatial.core.gis import as_column
import pandas as pd
import geopandas as gpd
import cudf
import numpy as np
import time

taxi2016 = cudf.read_csv("/data_ssd/hakan/mobilData/spatialAnalysis/data/taxi2016.csv")

tzones = gpd.GeoDataFrame.from_file('/data_ssd/hakan/mobilData/spatialAnalysis/data/tzones_lonlat.json')
tzones.iloc[0:30].to_file('/data_ssd/hakan/mobilData/spatialAnalysis/data/cu_taxi_zones.shp')

taxi_zones = cuspatial.read_polygon_shapefile('/data_ssd/hakan/mobilData/spatialAnalysis/data/cu_taxi_zones.shp')
shapeFileSehir = gpd.read_file("/data_ssd/hakan/mobilData/spatialAnalysis/data/cu_taxi_zones.shp")

taxi_zones = cuspatial.read_polygon_shapefile('/data_ssd/hakan/mobilData/spatialAnalysis/data/cu_taxi_zones.shp')
    
s = cuspatial.core.gis.cpp_point_in_polygon(taxi2016['pickup_longitude']._column,
                                            taxi2016['pickup_latitude']._column,
                                            as_column(taxi_zones[0], dtype="int32"),
                                            as_column(taxi_zones[1], dtype="int32"),
                                            taxi_zones[2]['x']._column,
                                            taxi_zones[2]['y']._column)
s=s.data_array_view.copy_to_host()

@zhangjianting
Contributor

@hakantekbas I see. You called the Cython API directly for efficiency. This is still the regular point-in-polygon test API and it has the 31-polygon limit.

As indicated in my previous comment, for your output column vector s (which is "result" in my code), each bit of result[i] indicates whether points[i] is in the corresponding polygon. You can use bin(s[i]).count("1") to count the number of polygons that points[i] falls within (typically just one).

The unique values in your output are actually 1<<(31-j) where j varies from 0 to 31. Since the number of unique values is 31, it may indicate that no point falls within more than one polygon, which is expected.
The counts of the unique values should be the numbers of points that fall within the 31 polygons. Counting them could be a more efficient way to implement dataframe.sum() if the sum is all you need.

Please see result = gis_utils.pip_bitmap_column_to_binary_array(... in the cuSpatial source for how the column vector is converted to a DataFrame. The conversion overhead is justifiable when users want the convenience of a DataFrame (e.g., sum()).
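A sketch (not from the thread) of tallying points per bit position directly from the host copy of the bitmap column; the value/count tables in the next comment suggest that bit j lines up with column j of the Python API's DataFrame, but the bit-to-polygon mapping is worth verifying against your own data:

```python
import numpy as np

values, counts = np.unique(s, return_counts=True)   # s: host copy of the bitmap column
per_bit = np.zeros(31, dtype=np.int64)
for v, c in zip(values, counts):
    for j in range(31):
        if v & (1 << j):        # the c points with value v are inside the polygon at bit j
            per_bit[j] += c
# per_bit[j] now holds the number of points whose j-th bit is set, i.e. the
# per-polygon counts once the bit ordering has been confirmed.
```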

@zhangjianting
Contributor

@hakantekbas I repeated your experiment and got the following results. The two results agree with each other.

tzones = gpd.GeoDataFrame.from_file('cu_taxi_zones.shp')
tzones.iloc[0:30].to_file('temp_taxi_zones.shp')
taxi_zones = cuspatial.read_polygon_shapefile('temp_taxi_zones.shp')
s = cuspatial.core.gis.cpp_point_in_polygon(
    taxi2016['pickup_longitude']._column,
    taxi2016['pickup_latitude']._column,
    as_column(taxi_zones[0], dtype="int32"),
    as_column(taxi_zones[1], dtype="int32"),
    taxi_zones[2]['x']._column,
    taxi_zones[2]['y']._column)
s = s.data_array_view.copy_to_host()
t = pd.DataFrame(s).value_counts().sort_index()
pickups = cuspatial.point_in_polygon(
    taxi2016['pickup_longitude'], taxi2016['pickup_latitude'],
    taxi_zones[0], taxi_zones[1], taxi_zones[2]['x'], taxi_zones[2]['y'])

The first rows of t (Cython API, result in bitmap format):
1 605
2 1
4 32
8 34894
16 3
32 36
64 22536
128 139
256 41
512 31502
1024 1725

The first rows of pickups (Python API, result in DataFrame format):
0 605
1 1
2 32
3 34894
4 3
5 36
6 22536
7 139
8 41
9 31502
10 1725

@zhangjianting
Contributor

@hakantekbas I have a new explanation for the performance difference you saw before ("If I create seperate shape file for every 31 polygons, it is running 10 times faster and results are correct").
Using separate shapefiles will only perform point-in-polygon tests for the polygons in each shapefile. However, using taxi_zones[0][start:end] in the cuSpatial API will perform the tests for ALL the polygons represented by taxi_zones, regardless of "start" and "end". Not only will the results for the last polygon ("end-1") be incorrect, the runtime will also be much larger when "end"-"start" is small compared to the total number of polygons.
Given the complexity of handling the bitmap result from the Cython API cuspatial.core.gis.cpp_point_in_polygon, it would be preferable to use the Python API cuspatial.point_in_polygon. The overhead may not be significant.
Sorry for the confusion.
In summary, using something like taxi_zones[0][start:end] will cause an unexpected semantic error (although it is syntactically correct). Using taxi_zones[0] as a whole set (no sub-setting) is required. The issue should be fixed in the documentation and/or API design ASAP.

@hakantekbas
Author

Is there any timeline to fix this issue? It seems this method is one of the most important methods in the API.

@harrism
Member

harrism commented Mar 14, 2022

Can one of you (@zhangjianting?) briefly summarize the specific issue? I am having trouble following the long discussion. Is there a reason you can't use cuspatial.quadtree_point_in_polygon?

I feel like we should investigate deprecating the old 31-polygon-limited PiP API if the above is a better replacement.

@zhangjianting
Contributor

@harrism
The extensive discussions were mostly on the "bug" of cuspatial.point_in_polygon in the Python API. It turns out that the notebook example at https://github.com/rapidsai/cuspatial/blob/branch-22.04/notebooks/nyc_taxi_years_correlation.ipynb used a subset of the poly_offsets array but the complete poly_ring_offsets array, which made the result incorrect for the last polygon in the subset. poly_offsets must match poly_ring_offsets, which is automatically satisfied when the two arrays are read from a shapefile directly.

Neither the Python API nor the underlying C++ API has a logic bug, which is why I used "bug" in quotation marks. It is the incorrect use of the APIs that caused the incorrect result. That being said, we need a way to enforce the matching between the poly_offsets and poly_ring_offsets arrays to give a layer of protection.

A simple solution would be to just pass the number of polygons and the number of rings as two additional scalar parameters, instead of inferring them from the lengths of the poly_offsets and poly_ring_offsets arrays (as cuDF columns).

The 2nd simple solution is to use the complete poly_offsets and poly_ring_offsets arrays but allow passing a subset of polygon IDs.

The 3rd solution is to use cuDF nested columns so that users will only see one column parameter that includes all information about poly_offsets/poly_ring_offsets/poly_x/poly_y. But this seems to be against our discussions on making cuSpatial independent of cudf.

Yet another simpler solution is just to add documentation and make the requirement on matching between poly_offsets and poly_ring_offsets clear. No change to our code is needed. Users just need to add a single line calling the GeoPandas API to write a subset of polygons as a shapefile and then read it back using the cuSpatial shapefile reader.

There could be additional alternatives. Which one would you prefer?

To answer your question on why not always use cuspatial.quadtree_point_in_polygon: the spatial-join based APIs need to perform quadtree indexing first, which may bring unnecessary overhead and hurt performance when the number of points is not that large. More importantly, quadtree indexing needs at least two parameters to be set (max_level and min_size), which is a little tricky and data dependent, and requires users to understand the underlying algorithm to a certain extent. The old (direct) PIP API does not need any such parameters, which is preferable when the number of polygons is no more than 31.

Actually, if we removed the bitvector representation of the direct/old PIP API result and returned (point, polygon) pairs as cuspatial.quadtree_point_in_polygon does, it could allow an arbitrary number of polygons. The cost roughly doubles in this case, because a phase is needed to count the number of pairs before allocating memory and writing the outputs in a 2nd phase.

Suggestion?

@harrism
Member

harrism commented Mar 15, 2022

Let me think about it a bit. ;)

@harrism harrism self-assigned this Mar 15, 2022
@harrism
Member

harrism commented Mar 16, 2022

Yet another simpler solution is just to add documentation and make the requirement on matching between poly_offsets and poly_ring_offsets clear. No change to our code is needed. Users just need to add a single line calling the GeoPandas API to write a subset of polygons as a shapefile and then read it back using the cuSpatial shapefile reader.

I think this is the first step, regardless of what we do to the API.

@thomcom
Contributor

thomcom commented Mar 30, 2022

I think we can round-trip the above just by putting the selected polygons into a cuGeoDataframe. I'll take a stab at it tomorrow.

@zhangjianting
Contributor

@thomcom @harrism
It would be great if we could have a Python API like subset_polygon_geodataframe(geo_df, subset_ids or subset_range) that returns the same four arrays as read_polygon_shapefile(shp_file). This would leave the C++ API untouched and could be more efficient than saving a subset of polygons to a shapefile and reading it back (a round trip to disk), especially for large polygon datasets.
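A hypothetical sketch of such a helper (the name and signature come from the suggestion above, it is not an existing cuSpatial API, and details such as whether rings keep their closing vertex would need to match read_polygon_shapefile's output):

```python
import cudf

def subset_polygon_geodataframe(geo_df, subset_range):
    """Build (poly_offsets, ring_offsets, xs, ys) for a subset of a GeoDataFrame."""
    poly_offsets, ring_offsets, xs, ys = [], [], [], []
    for geom in geo_df.geometry.iloc[subset_range]:
        parts = geom.geoms if geom.geom_type == "MultiPolygon" else [geom]
        poly_offsets.append(len(ring_offsets))    # first ring of this feature
        for part in parts:
            for ring in [part.exterior, *part.interiors]:
                ring_offsets.append(len(xs))      # first vertex of this ring
                x, y = ring.coords.xy
                xs.extend(x)
                ys.extend(y)
    return (cudf.Series(poly_offsets, dtype="int32"),
            cudf.Series(ring_offsets, dtype="int32"),
            cudf.Series(xs, dtype="float64"),
            cudf.Series(ys, dtype="float64"))
```

Usage would look like taxi_zones = subset_polygon_geodataframe(tzones, slice(start, end)), mirroring the tuple returned by the shapefile reader (with x and y as separate series rather than one points DataFrame).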

@harrism
Member

harrism commented Mar 31, 2022

Another suggestion, from @isVoid:

#497 (comment)

would it be helpful to add an example of input to help demonstrate the use of the API?

rapids-bot bot pushed a commit that referenced this issue Mar 31, 2022

Partially addresses #492.

Adds notes to the Python and C++ documentation explaining that behavior is undefined if there are rings in `poly_ring_offsets` that are not referenced by `poly_offsets`

Authors:
  - Mark Harris (https://github.com/harrism)

Approvers:
  - Christopher Harris (https://github.com/cwharris)
  - H. Thomson Comer (https://github.com/thomcom)

URL: #497
@harrism
Member

harrism commented Mar 31, 2022

@hakantekbas in #497 I improved the documentation of this function in the C++ and Python APIs. We have not yet corrected the notebook or taken further steps to make the API less error-prone, so I'm keeping this issue open.

@harrism harrism moved this to In Progress in cuSpatial Apr 4, 2022
@github-actions

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

rapids-bot bot pushed a commit that referenced this issue Jul 27, 2022
## Overview

This PR closes #569. This PR rewrites the core pip test with Eric Haines' "crossings-multiply" algorithm: http://erich.realtimerendering.com/ptinpoly/

And this gives 30%~70% kernel time reduction (please ignore `Status` column):

```
['before_optim.json', 'optim_2.json']
# point_in_polygon_benchmark

## [0] Tesla V100-SXM2-32GB

|  CoordsType  |  NumTestPoints  |  NumSidesPerRing  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |          Diff |   %Diff |  Status  |
|--------------|-----------------|-------------------|------------|-------------|------------|-------------|---------------|---------|----------|
|     F32      |      1000       |         4         |  63.706 us |       6.09% |  36.483 us |       6.33% |    -27.223 us | -42.73% |   FAIL   |
|     F32      |     100000      |         4         | 113.862 us |       0.90% |  55.468 us |       1.75% |    -58.394 us | -51.29% |   FAIL   |
|     F32      |    10000000     |         4         |   9.167 ms |       0.02% |   3.785 ms |       0.14% |  -5382.554 us | -58.71% |   FAIL   |
|     F32      |      1000       |        10         | 109.911 us |       1.04% |  55.120 us |       1.82% |    -54.791 us | -49.85% |   FAIL   |
|     F32      |     100000      |        10         | 190.151 us |       0.57% |  81.772 us |       1.14% |   -108.380 us | -57.00% |   FAIL   |
|     F32      |    10000000     |        10         |  16.934 ms |       0.17% |   5.987 ms |       0.06% | -10947.199 us | -64.65% |   FAIL   |
|     F32      |      1000       |        100        | 813.854 us |       0.14% | 314.469 us |       0.30% |   -499.384 us | -61.36% |   FAIL   |
|     F32      |     100000      |        100        |   1.292 ms |       0.18% | 343.747 us |       0.37% |   -948.724 us | -73.40% |   FAIL   |
|     F32      |    10000000     |        100        | 124.471 ms |       0.09% |  29.318 ms |       0.06% | -95152.593 us | -76.45% |   FAIL   |
|     F64      |      1000       |         4         |  73.915 us |       1.40% |  46.365 us |       2.23% |    -27.550 us | -37.27% |   FAIL   |
|     F64      |     100000      |         4         | 149.814 us |       2.77% |  74.001 us |       1.42% |    -75.813 us | -50.60% |   FAIL   |
|     F64      |    10000000     |         4         |  10.541 ms |       0.08% |   5.186 ms |       0.05% |  -5354.802 us | -50.80% |   FAIL   |
|     F64      |      1000       |        10         | 133.100 us |       0.65% |  76.647 us |       1.25% |    -56.453 us | -42.41% |   FAIL   |
|     F64      |     100000      |        10         | 277.337 us |       3.51% | 117.104 us |       0.89% |   -160.233 us | -57.78% |   FAIL   |
|     F64      |    10000000     |        10         |  19.648 ms |       0.06% |   9.214 ms |       0.05% | -10434.298 us | -53.11% |   FAIL   |
|     F64      |      1000       |        100        |   1.017 ms |       0.19% | 485.950 us |       0.37% |   -531.440 us | -52.24% |   FAIL   |
|     F64      |     100000      |        100        |   1.992 ms |       0.63% | 517.869 us |       0.25% |  -1474.597 us | -74.01% |   FAIL   |
|     F64      |    10000000     |        100        | 137.110 ms |       0.08% |  40.617 ms |       0.07% | -96493.030 us | -70.38% |   FAIL   |

# Summary

- Total Matches: 18
  - Pass    (diff <= min_noise): 0
  - Unknown (infinite noise):    0
  - Failure (diff > min_noise):  18
```
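For reference, a minimal Python sketch of the crossing-number test with the per-edge division rearranged into a multiplication, in the spirit of the crossings-multiply variant (an illustration only, not the PR's CUDA kernel):

```python
def point_in_polygon(tx, ty, xs, ys):
    """Return True if (tx, ty) is inside the ring given by xs/ys (sketch)."""
    inside = False
    n = len(xs)
    for i in range(n):
        x0, y0 = xs[i], ys[i]
        x1, y1 = xs[(i + 1) % n], ys[(i + 1) % n]
        if (y0 > ty) != (y1 > ty):                 # edge straddles the horizontal ray
            # "tx < x-intersection" with the division by (y1 - y0) rearranged away
            lhs = (ty - y0) * (x1 - x0)
            rhs = (tx - x0) * (y1 - y0)
            if (lhs > rhs) == (y1 > y0):
                inside = not inside
    return inside
```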

## Potential impact introducing divergence and removal of division
**EDIT:** there's no divergence: #587 (comment)


@zhangjianting and @harrism mentioned on different occasions that the changes in https://github.com/rapidsai/cuspatial/blob/e3a21341054ea5bf92896267800aed31a3493ee0/cpp/include/cuspatial/experimental/detail/point_in_polygon.cuh#L85-L88 may cause unwanted overhead due to divergence. The assumption behind the change is that the cycles spent on division cost more than warp convergence does. This apparently depends on many factors, such as compiling with or without the `--fast-math` flag and how fast the hardware handles warp convergence. On a Tesla V100 the benchmark comparing the divergent branch and the division branch is as follows:

```
['optim_1.json', 'optim_2.json']
# point_in_polygon_benchmark

## [0] Tesla V100-SXM2-32GB

|  CoordsType  |  NumTestPoints  |  NumSidesPerRing  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |         Diff |   %Diff |  Status  |
|--------------|-----------------|-------------------|------------|-------------|------------|-------------|--------------|---------|----------|
|     F32      |      1000       |         4         |  37.351 us |       6.39% |  36.483 us |       6.33% |    -0.868 us |  -2.32% |   PASS   |
|     F32      |     100000      |         4         |  56.265 us |       1.63% |  55.468 us |       1.75% |    -0.797 us |  -1.42% |   PASS   |
|     F32      |    10000000     |         4         |   3.858 ms |       0.02% |   3.785 ms |       0.14% |   -72.843 us |  -1.89% |   FAIL   |
|     F32      |      1000       |        10         |  56.490 us |       2.04% |  55.120 us |       1.82% |    -1.370 us |  -2.43% |   FAIL   |
|     F32      |     100000      |        10         |  85.581 us |       1.08% |  81.772 us |       1.14% |    -3.810 us |  -4.45% |   FAIL   |
|     F32      |    10000000     |        10         |   6.327 ms |       0.11% |   5.987 ms |       0.06% |  -340.206 us |  -5.38% |   FAIL   |
|     F32      |      1000       |        100        | 314.513 us |       0.41% | 314.469 us |       0.30% |    -0.044 us |  -0.01% |   PASS   |
|     F32      |     100000      |        100        | 347.750 us |       0.49% | 343.747 us |       0.37% |    -4.003 us |  -1.15% |   FAIL   |
|     F32      |    10000000     |        100        |  29.822 ms |       0.07% |  29.318 ms |       0.06% |  -504.290 us |  -1.69% |   FAIL   |
|     F64      |      1000       |         4         |  49.142 us |       1.83% |  46.365 us |       2.23% |    -2.777 us |  -5.65% |   FAIL   |
|     F64      |     100000      |         4         |  78.167 us |       1.23% |  74.001 us |       1.42% |    -4.166 us |  -5.33% |   FAIL   |
|     F64      |    10000000     |         4         |   5.641 ms |       0.05% |   5.186 ms |       0.05% |  -454.642 us |  -8.06% |   FAIL   |
|     F64      |      1000       |        10         |  80.904 us |       1.13% |  76.647 us |       1.25% |    -4.257 us |  -5.26% |   FAIL   |
|     F64      |     100000      |        10         | 125.702 us |       0.65% | 117.104 us |       0.89% |    -8.598 us |  -6.84% |   FAIL   |
|     F64      |    10000000     |        10         |  10.074 ms |       0.03% |   9.214 ms |       0.05% |  -860.423 us |  -8.54% |   FAIL   |
|     F64      |      1000       |        100        | 487.959 us |       0.40% | 485.950 us |       0.37% |    -2.009 us |  -0.41% |   FAIL   |
|     F64      |     100000      |        100        | 528.070 us |       0.35% | 517.869 us |       0.25% |   -10.201 us |  -1.93% |   FAIL   |
|     F64      |    10000000     |        100        |  44.663 ms |       0.18% |  40.617 ms |       0.07% | -4045.961 us |  -9.06% |   FAIL   |
```
which gives visible speedups. This PR adopts the change based on this benchmark.

## Minor improvement and follow ups

This PR also introduces a benchmarking suite for the `point_in_polygon` API.

Follow ups to this PR includes:
- Introduce a set of functions to do input validation. This can help alleviate the API misuse seen in #492.
- Improve the test cases of this function. The function supports polygons with any number of sides, but the test cases contain largely only quads.

Authors:
  - Michael Wang (https://github.com/isVoid)

Approvers:
  - H. Thomson Comer (https://github.com/thomcom)
  - Mark Harris (https://github.com/harrism)

URL: #587
@github-actions

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@isVoid
Contributor

isVoid commented Jul 29, 2022

The extensive discussions were mostly on the "bug" of cuspatial.point_in_polygon in the Python API. It turns out that the notebook example at https://github.com/rapidsai/cuspatial/blob/branch-22.04/notebooks/nyc_taxi_years_correlation.ipynb used a subset of the poly_offsets array but the complete poly_ring_offsets array, which made the result incorrect for the last polygon in the subset. poly_offsets must match poly_ring_offsets, which is automatically satisfied when the two arrays are read from a shapefile directly.

If my understanding of the issue is correct, the inconsistency results from the input format being incorrect, and the incorrectness comes from the polygon array being mistakenly sliced (subset). Recent discussions spawned two separate (but converging) paths to solve this issue.

  1. While it is generally not favorable to introspect the data to determine the validity of the input, because the input format of GIS data is still unstandardized, we tend to create a small "validation" package inside cuSpatial and leave it up to the user to validate their data before calling the main algorithm.
  2. @thomcom recently refactored the GeoSeries library and implemented native slicing functionality for mixed geometry arrays (which include polygon arrays) in "Remove GeoArrow glue code replacing gpu storage with cudf.Series and host storage with pyarrow" #585. I recommend trying out the slicing method there and using it with your workflow.

@harrism harrism removed their assignment Aug 1, 2022
@github-actions

github-actions bot commented Sep 1, 2022

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

jarmak-nv removed the Needs Triage label Mar 1, 2023