Description
Is your feature request related to a problem? Please describe.
Farthest point sampling (FPS) is an important kernel for point cloud processing, I found that the current implementation of FPS needs to process all points to generate one sampling point, which often becomes a bottleneck for point cloud-based applications.
Context
I have implemented a more effective version of farthest point sampling named bucket-based farthest point sampling (BFPS) in both CUDA and CPP.
In short, It applies the KD-tree to split the point cloud into multiple buckets and uses some prune mechanisms to reduce the number of processed points during each iteration.
The test results show that the CPP version achieves a 10-15x speedup against the current implementation without any accuracy loss. It will help the farthest point sampling kernel applied in large-scale point clouds.
Expected behavior
Update the implementation of the farthest point sampling class and implementation.
Current Behavior
I would like to implement the BFPS in PCL, but I am not familiar with the PCL framework. Maybe some volunteers could finish it.
Describe alternatives you've considered
It is worth noting that the BFPS relies on the KD-tree construction stage, which will make the BFPS slower than the current implementation when the point cloud scale is smaller than 4000.
Additional context
some references