Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] quadtree_on_points not constant behavior #326

Open
msalazarcgeo opened this issue Dec 3, 2020 · 8 comments
Open

[BUG] quadtree_on_points not constant behavior #326

msalazarcgeo opened this issue Dec 3, 2020 · 8 comments
Labels
bug Something isn't working Needs Triage Need team to review and classify

Comments

@msalazarcgeo
Copy link

To create fast rasterization I wanted to use the quadtree_on_points, the data has 480631 points and a max_depht of 8, when I ran the function the resulting quadtree has 7720 points on the leaves, to calculate the number of points in all the leaves I use quad_tree[quad_tree['is_quad']== False]['length'].sum() and the number of nodes in the tree is 611. This is wrong, if the quadtree is recalculated then the result is right, meaning that the number of points in the leaves is 480631 and the size of the tree is has changed to what seems to be the proper tree.

This is not an issue to be critical or something like that, I can solve my problem by simply recalculating the quadtree. I just want to inform you about the behavior and thank you for the amazing tool that this library is.

@harrism
Copy link
Member

harrism commented Dec 4, 2020

Hi @msalazarcgeo , thanks for the issue. Any chance you can provide code to reproduce?

@harrism harrism added Needs Triage Need team to review and classify bug Something isn't working labels Dec 4, 2020
@harrism harrism changed the title quadtree_on_points not constant behaivor [BUG] quadtree_on_points not constant behaivor Dec 4, 2020
@msalazarcgeo
Copy link
Author

I have made a jupyter-lab notebook reproducing the behaviour, I don't know if this is is good for you or you want me to create a pure python script reproducing.

@msalazarcgeo
Copy link
Author

msalazarcgeo commented Dec 30, 2020

Here is a code that shows the main problem (the quadtree is not constant on the same set of points )


import numpy as np
import cudf 
import cuspatial
import shapely
import cuml
import geopandas
import pandas as pd

### Read a poligon 
poligon= pd.read_csv('./poligon_cdmx.csv') 
poligon.drop(columns= 'Unnamed: 0', inplace = True)

#The bounding box of the poligon previusly calculated
bb_coords = (-99.36492420394829, 19.048236663823484, -98.94030281205599, 19.59275727996437)
### Sent the poligon to the GPU 
poly_tuple= (cudf.Series(data = np.array([0]), dtype = np.int32),
             cudf.Series(data = np.array([0]), dtype = np.int32),
             cudf.DataFrame(poligon)
            )


#### Random points 
rand_x = np.random.random(600000) * (bb_coords[0]- bb_coords[2]) + bb_coords[2] 
rand_y = np.random.random(600000) * (bb_coords[1]- bb_coords[3]) + bb_coords[3] 

#### Dataframe with the points on the GPU
df_cu2 = cudf.DataFrame({'X': rand_x, 'Y': rand_y})

#### Get only the point in the polygon 
cu_res2 = cuspatial.point_in_polygon(
                    df_cu2['X'],
                    df_cu2['Y'],
                    poly_tuple[0],
                    poly_tuple[1],
                    poly_tuple[2]['x'],
                    poly_tuple[2]['y']
                    
)
### Only the point in the poligon 
df_cu3 = df_cu2[cu_res2[0]]
#### 
print("Number of points to do the quadtree: ", df_cu3.shape[0])
### to get the quantree
x_size = bb_coords[2] - bb_coords[0]
y_size = bb_coords[3]- bb_coords[1]
max_depth = 8
scale_p = max(x_size, y_size) / (1 << max_depth )
min_size = 10 ### to determin if the quea is a leaf
print("Generate first quadtree")
key_1 , quad_1 = cuspatial.quadtree_on_points(
                        df_cu3['X'],
                        df_cu3['Y'],
                        bb_coords[0],
                        bb_coords[2],
                        bb_coords[1],
                        bb_coords[3],
                        scale = scale_p,
                        max_depth = max_depth,
                        min_size = min_size
)
print("The number of point in the leaves of the first quadtree: ", quad_1[quad_1['is_quad']== False]['length'].sum()) 

print("Redo the quadtree on the same set of points")
key_2 , quad_2 = cuspatial.quadtree_on_points(
                        df_cu3['X'],
                        df_cu3['Y'],
                        bb_coords[0],
                        bb_coords[2],
                        bb_coords[1],
                        bb_coords[3],
                        scale = scale_p,
                        max_depth = max_depth,
                        min_size = min_size
)
print("The number of point in the leaves of the second quadtree: ",quad_2[quad_2['is_quad']== False]['length'].sum())

#### 
print("Redo the quadtree on the same set of points using the same variables ")
key_2 , quad_2 = cuspatial.quadtree_on_points(
                        df_cu3['X'],
                        df_cu3['Y'],
                        bb_coords[0],
                        bb_coords[2],
                        bb_coords[1],
                        bb_coords[3],
                        scale = scale_p,
                        max_depth = max_depth,
                        min_size = min_size
)
print("The number of point in the leaves of the third quadtree: ",quad_2[quad_2['is_quad']== False]['length'].sum())

Due to the use of the random function the output will be different (I didn't use the "seed" function ) but it show the problem meaning that the first and second quad_2[quad_2['is_quad']== False]['length'].sum() are different.

My believe is that for some reason the quadtree is not pointing correctly to the memory address of the inner leaves.

@zhangjianting
Copy link
Contributor

@msalazarcgeo Thanks for the bug report. Can you confirm which version you were using? we fixed a related bug for 0.17, which should be used for testing.

@msalazarcgeo
Copy link
Author

@msalazarcgeo Thanks for the bug report. Can you confirm which version you were using? we fixed a related bug for 0.17, which should be used for testing.

@zhangjianting I will check on the 0.17 version to confirm the bug.

@msalazarcgeo
Copy link
Author

@zhangjianting I have tested using a new environment with the 0.17 version. The result was better, meaning that the difference between the number of points in the leaves is closer between the quadtrees. But not the issue is not properly resolved, the number of points in the quadtrees should be the same for each.

@harrism harrism changed the title [BUG] quadtree_on_points not constant behaivor [BUG] quadtree_on_points not constant behavior Jan 4, 2021
@github-actions
Copy link

This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.

@github-actions github-actions bot added the stale label Feb 16, 2021
@github-actions
Copy link

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Needs Triage Need team to review and classify
Projects
Status: Todo
Development

No branches or pull requests

4 participants