inconsistent result with cuspatial.point_in_polygon #492
I want to give one more detail. This method only accepts 31 polygons at a time. If I create a separate shapefile for every 31 polygons, it runs 10 times faster and the results are correct.
There is an issue in your first program. According to https://docs.rapids.ai/api/cuspatial/stable/api.html, in the API
Thanks. When you say there is an issue in my code, are you pointing to pickups2? I will ask another question about cuspatial.quadtree_point_in_polygon. Is this method faster than point_in_polygon, or is quadtree_point_in_polygon already incorporated in point_in_polygon?
@hakantekbas yes. The poly_offsets and poly_ring_offsets arrays were designed to begin with 0.
Hi @zhangjianting,
I will also ask how I can apply quadtree_point_in_polygon for this case, and I also wonder whether this method is faster than cuspatial.point_in_polygon. The results obtained from the following code do not look correct.
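For reference, this is roughly the quadtree-based workflow I have in mind (a minimal sketch only: the function names come from the cuSpatial spatial-join API, but the bounds, scale, max_depth and min_size values below and the pickups_*/poly_* variable names are placeholders, and the exact argument order may differ between releases):

```python
import cuspatial

# Placeholder bounding box and quadtree tuning parameters (not from the notebook).
x_min, x_max, y_min, y_max = -74.3, -73.6, 40.4, 41.0
scale, max_depth, min_size = 1.0, 15, 512

# Build a quadtree over the test points.
point_indices, quadtree = cuspatial.quadtree_on_points(
    pickups_x, pickups_y, x_min, x_max, y_min, y_max, scale, max_depth, min_size
)

# Pair polygon bounding boxes with intersecting quadrants.
poly_bboxes = cuspatial.polygon_bounding_boxes(
    poly_offsets, poly_ring_offsets, poly_points_x, poly_points_y
)
poly_quad_pairs = cuspatial.join_quadtree_and_bounding_boxes(
    quadtree, poly_bboxes, x_min, x_max, y_min, y_max, scale, max_depth
)

# Quadtree-accelerated point-in-polygon: returns (polygon_index, point_index)
# pairs instead of one boolean column per polygon.
result = cuspatial.quadtree_point_in_polygon(
    poly_quad_pairs, quadtree, point_indices,
    pickups_x, pickups_y,
    poly_offsets, poly_ring_offsets, poly_points_x, poly_points_y,
)
```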
In my previous answer, "poly_offsets and poly_ring_offsets were designed to begin with 0" was not accurate. The API expects the values of poly_offsets to match the values of poly_ring_offsets for all the polygons being tested. This is naturally satisfied when a shapefile is read and used to populate the two arrays.
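To illustrate with made-up numbers (the array values below are hypothetical, not taken from the notebook):

```python
import cudf

# Hypothetical layout: 3 polygons, one ring each, 5 vertices per ring.
poly_offsets      = cudf.Series([0, 1, 2])    # polygon i starts at ring poly_offsets[i]
poly_ring_offsets = cudf.Series([0, 5, 10])   # ring j starts at vertex poly_ring_offsets[j]

# Consistent: both arrays describe the same three polygons.
# Inconsistent: slicing one array but not the other, e.g.
sliced_offsets = poly_offsets[1:3]            # only polygons 1 and 2 ...
# ... while still passing the full poly_ring_offsets (or vice versa) breaks the
# pairing the API expects, so point_in_polygon silently tests the wrong rings.
```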
Thanks @zhangjianting for the fast response. I also want to hear your thoughts about cuspatial.quadtree_point_in_polygon. Can you give more details about how to use this method and why my calculation does not work in my case? Maybe I'm applying it correctly but the calculation is wrong because this bug also affects it. Even so, I would expect to get the same results, but the results are very different.
@hakantekbas (min_x, min_y, max_x, max_y)
@hakantekbas point_in_polygon and quadtree_point_in_polygon are completely independent.
@zhangjianting thanks for sharing the Python file on GitHub, and for the explanation.
My result shows that the majority of the time was spent on converting the integer/bitmap vector from the Cython/C++ result to a DataFrame. We might be able to accelerate it at the native CUDA level. Page 3 of https://github.com/zhangjianting/cuspatial/blob/fea-spatial-join/docs/basic_spatial_trajectory_operators.pdf reported that for a test of 1.3 million points in 27 (simple) polygons, the GPU runtime is 0.92 milliseconds. For the performance you got, my guess is that it took less time to create 10 dataframes with 1 column each than to create 1 dataframe with 10 columns, although I do not have proof yet. cuspatial.point_in_polygon was designed for small problems (e.g., a few million points and <31 polygons). The DataFrame return type is for convenience, not for performance. If performance is important, you can skip the DataFrame wrapping and use the lower-level API directly (see example below): result = cuspatial.core.gis.cpp_point_in_polygon( If the number of points is on the order of tens of millions, quadtree indexing would be beneficial. When you have a chance, please let me know the performance you get.
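A rough sketch of what that lower-level call might look like (the argument order is assumed to mirror the public cuspatial.point_in_polygon signature of this era and may differ between versions; the pickup_*/poly_* variables are placeholders):

```python
import cuspatial

# Sketch only: returns one int32 bitmap per test point instead of a DataFrame,
# so the expensive DataFrame construction is skipped entirely.
result = cuspatial.core.gis.cpp_point_in_polygon(
    pickup_x, pickup_y,                  # test point coordinates
    poly_offsets, poly_ring_offsets,     # polygon/ring offset arrays
    poly_points_x, poly_points_y,        # polygon vertex coordinates
)
# Bit (31 - j) of result[i] indicates whether point i falls in polygon j.
```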
Hi @zhangjianting, I will try the last one at the weekend and keep you informed. If I create a polygon shapefile for each group of 31 polygons, it is 10 times faster and the results seem correct. In the base case I'm applying the code below, given on the GitHub page. I believe the performance degradation is caused by the bug in the point_in_polygon function.
Thanks.
I received the following error. It seems this method also expects 31 polygons maximum.
If I put a limit on the number of polygons, it works, but the returned values are not correct. Maybe I misinterpreted them. I copied the unique values of result.
@hakantekbas Can you post the code segment you used? It looks like the non-indexed version of the point-in-polygon code was used. What is the array that you displayed?
I displayed np.unique(result). All data files were downloaded from the links shared in the notebook. Thanks.
@hakantekbas I see. You called the Cython API directly for efficiency. This is still the regular point-in-polygon test API and has the 31-polygon limit. As indicated in my previous comment, for your output column vector (which is "result" in my code), "each bit of result[i] indicates whether points[i] is in the corresponding polygon". You can use bin(s[i]).count("1") to count the number of polygons that points[i] falls within (typically just one). The unique values in your output are actually 1<<(31-j) where j varies from 0 to 31. It may indicate that no point falls within more than one polygon, since the number of unique values is 31. This is expected. Please see
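For example, decoding the bitmap output on the host could look like this (a small sketch; result is assumed to be the int32 bitmap vector from the Cython call, convertible to a NumPy array):

```python
import numpy as np

bits = np.asarray(result)                      # one int32 bitmap per test point

for i in range(min(10, len(bits))):            # inspect the first few points
    s = int(bits[i]) & 0xFFFFFFFF              # view the bitmap as unsigned 32-bit
    n_hits = bin(s).count("1")                 # number of polygons containing point i
    hit_polys = [j for j in range(32) if s & (1 << (31 - j))]  # bit (31 - j) <-> polygon j
    print(i, n_hits, hit_polys)
```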
@hakantekbas I repeated your experiment and got the following results. The two results agree with each other. The first 10 rows of t (Cython API, result in bitmap format). The first 10 rows of pickups (Python API, result in DataFrame format).
@hakantekbas I have a new explanation for the performance difference you saw before: "If I create a separate shapefile for every 31 polygons, it is running 10 times faster and results are correct."
Is there any timeline to fix this issue? It seems this method is one of the most important methods in the API.
Can one of you (@zhangjianting?) briefly summarize the specific issue? I am having trouble following the long discussion. Is there a reason you can't use cuspatial.quadtree_point_in_polygon? I feel like we should investigate deprecating the old 31-polygon-limited PiP API if the above is a better replacement.
@harrism Neither the Python API nor the underlying C++ API has a logic bug, which was the reason I used "bug" with quotation marks. It is the incorrect use of the APIs that caused the incorrect result. That being said, we need a way to enforce the matching between the poly_offsets and poly_ring_offsets arrays to give a layer of protection.

- A simple solution would be to pass the number of polygons and the number of rings as two additional scalar parameters, instead of inferring them from the lengths of the poly_offsets and poly_ring_offsets arrays (as cuDF columns).
- A second simple solution is to use the complete poly_offsets and poly_ring_offsets arrays but allow passing a subset of polygon IDs.
- A third solution is to use cuDF nested columns so that users only see one column parameter that carries all information about poly_offsets/poly_ring_offsets/poly_x/poly_y. But this seems to go against our discussions on making cuSpatial independent of cuDF.
- Yet another, simpler solution is just to add documentation and make the requirement on matching poly_offsets and poly_ring_offsets clear. No change to our code is needed. Users just need to add a single line calling the GeoPandas API to write a subset of polygons as a shapefile and then read it back using the cuSpatial shapefile reader (a sketch follows below).

There could be additional alternatives. Which one would you prefer?

To answer your question on why not always use cuspatial.quadtree_point_in_polygon: the spatial-join based APIs need to perform quadtree indexing first, which may bring unnecessary overhead and hurt performance when the number of points is not that large. More importantly, quadtree indexing requires setting at least two parameters (max_level and min_size), which is a little tricky and data dependent and requires users to understand the underlying algorithm to a certain extent. The old (direct) PIP API does not need any such parameters, which is preferable when the number of polygons is no more than 32. Actually, if we removed the bitvector representation of the direct/old PIP API result and returned (point, polygon) pairs as cuspatial.quadtree_point_in_polygon does, it could allow an arbitrary number of polygons. The cost roughly doubles in this case, since we would need a first phase to count the number of pairs before allocating memory and writing the output in a second phase. Suggestions?
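For the documentation-only option, the round trip would look roughly like this (a sketch with hypothetical names: `zones` is a geopandas.GeoDataFrame of the polygons and `wanted` is the subset of row labels; the return layout of the shapefile reader may vary by release):

```python
import cuspatial

# `zones` is assumed to be a geopandas.GeoDataFrame and `wanted` a list of row
# labels; write the selected polygons out with GeoPandas ...
subset = zones.loc[wanted]
subset.to_file("zones_subset.shp")

# ... and read them back with cuSpatial so that poly_offsets, poly_ring_offsets
# and the vertex arrays are rebuilt consistently for that subset.
poly_offsets, poly_ring_offsets, poly_points = cuspatial.read_polygon_shapefile("zones_subset.shp")

result = cuspatial.point_in_polygon(
    pickup_x, pickup_y,
    poly_offsets, poly_ring_offsets,
    poly_points["x"], poly_points["y"],
)
```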
Let me think about it a bit. ;)
I think this is the first step, regardless of what we do to the API.
I think we can round-trip the above just by putting the selected polygons into a cuGeoDataframe. I'll give a stab at it tomorrow.
@thomcom @harrism |
Another suggestion, from @isVoid:
Partially addresses #492. Adds notes to the Python and C++ documentation explaining that behavior is undefined if there are rings in `poly_ring_offsets` that are not referenced by `poly_offsets`.
Authors:
- Mark Harris (https://github.com/harrism)
Approvers:
- Christopher Harris (https://github.com/cwharris)
- H. Thomson Comer (https://github.com/thomcom)
URL: #497
@hakantekbas in #497 I improved the documentation of this function in the C++ and Python APIs. We have not yet corrected the notebook or taken further steps to make the API less error-prone, so I'm keeping this issue open. |
## Overview

This PR closes #569. It rewrites the core PIP test with Eric Haines's "crossings-multiply" algorithm: http://erich.realtimerendering.com/ptinpoly/

This gives a 30%~70% kernel time reduction (please ignore the `Status` column):

```
['before_optim.json', 'optim_2.json']

# point_in_polygon_benchmark

## [0] Tesla V100-SXM2-32GB

| CoordsType | NumTestPoints | NumSidesPerRing | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
|------------|---------------|-----------------|----------|-----------|----------|-----------|------|-------|--------|
| F32 | 1000 | 4 | 63.706 us | 6.09% | 36.483 us | 6.33% | -27.223 us | -42.73% | FAIL |
| F32 | 100000 | 4 | 113.862 us | 0.90% | 55.468 us | 1.75% | -58.394 us | -51.29% | FAIL |
| F32 | 10000000 | 4 | 9.167 ms | 0.02% | 3.785 ms | 0.14% | -5382.554 us | -58.71% | FAIL |
| F32 | 1000 | 10 | 109.911 us | 1.04% | 55.120 us | 1.82% | -54.791 us | -49.85% | FAIL |
| F32 | 100000 | 10 | 190.151 us | 0.57% | 81.772 us | 1.14% | -108.380 us | -57.00% | FAIL |
| F32 | 10000000 | 10 | 16.934 ms | 0.17% | 5.987 ms | 0.06% | -10947.199 us | -64.65% | FAIL |
| F32 | 1000 | 100 | 813.854 us | 0.14% | 314.469 us | 0.30% | -499.384 us | -61.36% | FAIL |
| F32 | 100000 | 100 | 1.292 ms | 0.18% | 343.747 us | 0.37% | -948.724 us | -73.40% | FAIL |
| F32 | 10000000 | 100 | 124.471 ms | 0.09% | 29.318 ms | 0.06% | -95152.593 us | -76.45% | FAIL |
| F64 | 1000 | 4 | 73.915 us | 1.40% | 46.365 us | 2.23% | -27.550 us | -37.27% | FAIL |
| F64 | 100000 | 4 | 149.814 us | 2.77% | 74.001 us | 1.42% | -75.813 us | -50.60% | FAIL |
| F64 | 10000000 | 4 | 10.541 ms | 0.08% | 5.186 ms | 0.05% | -5354.802 us | -50.80% | FAIL |
| F64 | 1000 | 10 | 133.100 us | 0.65% | 76.647 us | 1.25% | -56.453 us | -42.41% | FAIL |
| F64 | 100000 | 10 | 277.337 us | 3.51% | 117.104 us | 0.89% | -160.233 us | -57.78% | FAIL |
| F64 | 10000000 | 10 | 19.648 ms | 0.06% | 9.214 ms | 0.05% | -10434.298 us | -53.11% | FAIL |
| F64 | 1000 | 100 | 1.017 ms | 0.19% | 485.950 us | 0.37% | -531.440 us | -52.24% | FAIL |
| F64 | 100000 | 100 | 1.992 ms | 0.63% | 517.869 us | 0.25% | -1474.597 us | -74.01% | FAIL |
| F64 | 10000000 | 100 | 137.110 ms | 0.08% | 40.617 ms | 0.07% | -96493.030 us | -70.38% | FAIL |

# Summary

- Total Matches: 18
- Pass (diff <= min_noise): 0
- Unknown (infinite noise): 0
- Failure (diff > min_noise): 18
```

## Potential impact of introducing divergence and removing the division

**EDIT:** there's no divergence: #587 (comment)

@zhangjianting and @harrism mentioned on different occasions that the changes in https://github.com/rapidsai/cuspatial/blob/e3a21341054ea5bf92896267800aed31a3493ee0/cpp/include/cuspatial/experimental/detail/point_in_polygon.cuh#L85-L88 may cause unwanted overhead due to divergence. The assumption behind the change is that the cycles cost by the division are greater than those of warp convergence. This is apparently dependent on many factors, such as compiling with or without the `--fast-math` flag or how fast the hardware supports warp convergence.
On a Tesla V100 the benchmark between the divergence branch and the division branch is as follows:

```
['optim_1.json', 'optim_2.json']

# point_in_polygon_benchmark

## [0] Tesla V100-SXM2-32GB

| CoordsType | NumTestPoints | NumSidesPerRing | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
|------------|---------------|-----------------|----------|-----------|----------|-----------|------|-------|--------|
| F32 | 1000 | 4 | 37.351 us | 6.39% | 36.483 us | 6.33% | -0.868 us | -2.32% | PASS |
| F32 | 100000 | 4 | 56.265 us | 1.63% | 55.468 us | 1.75% | -0.797 us | -1.42% | PASS |
| F32 | 10000000 | 4 | 3.858 ms | 0.02% | 3.785 ms | 0.14% | -72.843 us | -1.89% | FAIL |
| F32 | 1000 | 10 | 56.490 us | 2.04% | 55.120 us | 1.82% | -1.370 us | -2.43% | FAIL |
| F32 | 100000 | 10 | 85.581 us | 1.08% | 81.772 us | 1.14% | -3.810 us | -4.45% | FAIL |
| F32 | 10000000 | 10 | 6.327 ms | 0.11% | 5.987 ms | 0.06% | -340.206 us | -5.38% | FAIL |
| F32 | 1000 | 100 | 314.513 us | 0.41% | 314.469 us | 0.30% | -0.044 us | -0.01% | PASS |
| F32 | 100000 | 100 | 347.750 us | 0.49% | 343.747 us | 0.37% | -4.003 us | -1.15% | FAIL |
| F32 | 10000000 | 100 | 29.822 ms | 0.07% | 29.318 ms | 0.06% | -504.290 us | -1.69% | FAIL |
| F64 | 1000 | 4 | 49.142 us | 1.83% | 46.365 us | 2.23% | -2.777 us | -5.65% | FAIL |
| F64 | 100000 | 4 | 78.167 us | 1.23% | 74.001 us | 1.42% | -4.166 us | -5.33% | FAIL |
| F64 | 10000000 | 4 | 5.641 ms | 0.05% | 5.186 ms | 0.05% | -454.642 us | -8.06% | FAIL |
| F64 | 1000 | 10 | 80.904 us | 1.13% | 76.647 us | 1.25% | -4.257 us | -5.26% | FAIL |
| F64 | 100000 | 10 | 125.702 us | 0.65% | 117.104 us | 0.89% | -8.598 us | -6.84% | FAIL |
| F64 | 10000000 | 10 | 10.074 ms | 0.03% | 9.214 ms | 0.05% | -860.423 us | -8.54% | FAIL |
| F64 | 1000 | 100 | 487.959 us | 0.40% | 485.950 us | 0.37% | -2.009 us | -0.41% | FAIL |
| F64 | 100000 | 100 | 528.070 us | 0.35% | 517.869 us | 0.25% | -10.201 us | -1.93% | FAIL |
| F64 | 10000000 | 100 | 44.663 ms | 0.18% | 40.617 ms | 0.07% | -4045.961 us | -9.06% | FAIL |
```

which shows visible speedups. This PR adopts the change based on this benchmark.

## Minor improvements and follow-ups

This PR also introduces a benchmarking suite for the `point_in_polygon` API. Follow-ups to this PR include:

- Introduce a set of functions to do input validation. This can help alleviate the API misuse of #492.
- Improve the test cases of this function. The function supports polygons with any number of sides, but the test cases contain largely only quads.

Authors:
- Michael Wang (https://github.com/isVoid)

Approvers:
- H. Thomson Comer (https://github.com/thomcom)
- Mark Harris (https://github.com/harrism)

URL: #587
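For reference, here is a scalar Python sketch of the crossings-multiply test adopted by this PR (adapted from the ptinpoly write-up linked above; the real change is in a CUDA kernel, so this only illustrates how the per-edge division is replaced by a multiplication plus a sign check folded into the `== yflag1` comparison):

```python
def point_in_ring(tx, ty, xs, ys):
    """Crossings-multiply point-in-ring test: count ray crossings to the right of
    (tx, ty) without dividing by the edge's y extent."""
    inside = False
    x0, y0 = xs[-1], ys[-1]           # previous vertex (ring treated as closed)
    yflag0 = y0 >= ty
    for x1, y1 in zip(xs, ys):
        yflag1 = y1 >= ty
        if yflag0 != yflag1:          # edge straddles the horizontal ray
            # Compare the crossing's x coordinate with tx via a cross-product style
            # test; the '== yflag1' absorbs the sign of (y0 - y1), which is exactly
            # what the division-based form would have divided by.
            if ((y1 - ty) * (x0 - x1) >= (x1 - tx) * (y0 - y1)) == yflag1:
                inside = not inside
        x0, y0, yflag0 = x1, y1, yflag1
    return inside

# Tiny check against a unit square.
print(point_in_ring(0.5, 0.5, [0, 1, 1, 0], [0, 0, 1, 1]))  # True
print(point_in_ring(2.0, 0.5, [0, 1, 1, 0], [0, 0, 1, 1]))  # False
```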
If my understanding of the issue is correct, the inconsistency results from the input format being incorrect, and the incorrectness results from the polygon array being mistakenly sliced (subset). Recent discussions have spawned two separate (but converging) paths to solve this issue.
Hi,
I encountered a strange result when I ran the notebook below.
https://github.com/rapidsai/cuspatial/blob/branch-22.04/notebooks/nyc_taxi_years_correlation.ipynb
To show the inconsistency, I did the same calculation in 2 different ways. The first one is pickups, the second one is pickups2. The last column in pickups is calculated again in the first column of pickups2. The strange part is that I get different results in the two calculations. The sum of true values in the last column of pickups should be equal to the first column of pickups2. In each calculation, the sum of true values in the last column is nearly the number of rows. It is obvious that something is wrong here. Can you help me figure it out?
pickups
count of true values in pickups
pickups2
count of true values in pickups2