Skip to content

More effective implementation of farthest point sampling #5797

Open
@hanm2019

Description

@hanm2019

Is your feature request related to a problem? Please describe.

Farthest point sampling (FPS) is an important kernel for point cloud processing, I found that the current implementation of FPS needs to process all points to generate one sampling point, which often becomes a bottleneck for point cloud-based applications.

Context

I have implemented a more effective version of farthest point sampling named bucket-based farthest point sampling (BFPS) in both CUDA and CPP.

In short, It applies the KD-tree to split the point cloud into multiple buckets and uses some prune mechanisms to reduce the number of processed points during each iteration.

The test results show that the CPP version achieves a 10-15x speedup against the current implementation without any accuracy loss. It will help the farthest point sampling kernel applied in large-scale point clouds.

Expected behavior

Update the implementation of the farthest point sampling class and implementation.

Current Behavior

I would like to implement the BFPS in PCL, but I am not familiar with the PCL framework. Maybe some volunteers could finish it.

Describe alternatives you've considered

It is worth noting that the BFPS relies on the KD-tree construction stage, which will make the BFPS slower than the current implementation when the point cloud scale is smaller than 4000.

Additional context

some references

  1. the CPU implementation of FPS
  2. the GPU implementation of FPS
  3. the reference paper: QuickFPS: Architecture and Algorithm Co-Design for Farthest Point Sampling in Large-Scale Point Clouds

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions