Mapper

The Mapper algorithm

The Mapper algorithm is a method for topological data analysis invented by Gurjeet Singh, Facundo Mémoli and Gunnar Carlsson; see reference [R1] for the publication. Mapper alone is not a complete data analysis tool. It is the key part of a processing chain that consists, at minimum, of filter functions, the Mapper algorithm itself, and visualization of the results.

Informally, the Mapper algorithm works by performing a local clustering guided by a projection function. The steps are as follows:

Project a dataset.
Cover this projection with overlapping intervals/hypercubes.
Cluster the points inside each interval.
Graphization. The clusters become nodes in a graph. Because the intervals overlap, a single point can appear in multiple nodes. When two nodes share a member, draw an edge between them.
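
The four steps above can be illustrated with a minimal single-machine sketch in Python. It is not the SparkTDA code: the function names (`make_cover`, `single_linkage`, `mapper`), the one-dimensional filter, and the use of single-linkage clustering with a distance threshold `eps` are all assumptions chosen to keep the example short. Any clustering method could take the place of `single_linkage`.

```python
import math
from itertools import combinations


def make_cover(lo, hi, n_intervals, overlap):
    """Cover [lo, hi] with n_intervals equal-length intervals.

    Consecutive intervals overlap by the fraction `overlap` of their length.
    """
    # The interval length L solves L + (n - 1) * L * (1 - overlap) = hi - lo.
    length = (hi - lo) / (1 + (n_intervals - 1) * (1 - overlap))
    step = length * (1 - overlap)
    cover = [(lo + i * step, lo + i * step + length) for i in range(n_intervals)]
    # Pin the last endpoint to hi so rounding cannot exclude the maximum.
    cover[-1] = (cover[-1][0], hi)
    return cover


def single_linkage(points, members, eps):
    """Split `members` (indices into `points`) into groups in which every
    point is reachable from every other through hops of length <= eps."""
    unvisited = set(members)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        component, frontier = {seed}, [seed]
        while frontier:
            p = frontier.pop()
            near = {q for q in unvisited
                    if math.dist(points[p], points[q]) <= eps}
            unvisited -= near
            component |= near
            frontier.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper(points, filter_fn, n_intervals=4, overlap=0.5, eps=1.0):
    """Return (nodes, edges): nodes are sets of point indices, edges are
    pairs of node positions whose member sets intersect."""
    # Step 1: project the dataset with the filter function.
    values = [filter_fn(p) for p in points]
    # Step 2: cover the projection with overlapping intervals.
    cover = make_cover(min(values), max(values), n_intervals, overlap)
    # Step 3: cluster the points that fall inside each interval.
    nodes = []
    for a, b in cover:
        members = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(single_linkage(points, members, eps))
    # Step 4: connect clusters that share at least one point.
    edges = {(i, j) for i, j in combinations(range(len(nodes)), 2)
             if nodes[i] & nodes[j]}
    return nodes, edges
```

For example, calling `mapper(points, lambda p: p[0], n_intervals=3)` on points given as coordinate tuples uses the first coordinate as the filter. Because neighbouring intervals overlap, points in the overlap land in two clusters, and those shared members are what produce the edges.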

Implementation details of SparkTDA Mapper

In SparkTDA, we explore a tree-based approach to the Mapper algorithm that requires only O(N log N) computation and O(N) memory. Our approach computes a sparse approximation of the similarities between the input objects using vantage-point trees [R2]. The steps are as follows:

Project a dataset.
Cover this projection with overlapping intervals/hypercubes.
Generate an index to repartition the data points for querying the kNNs of each data point [R3].
Repartition the data and query the kNNs of each data point [R3].
Create a kNN graph for each interval; graphs from different intervals are not connected to each other.
Apply SNN-DBSCAN to create clusters.
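
The last three steps can be sketched for a single interval as follows. This is an illustration under stated assumptions, not the SparkTDA implementation: the kNN lists are computed by brute force in place of the vantage-point tree queries, everything runs on one machine with no repartitioning, and the SNN-DBSCAN variant shown is one common formulation (two points are linked when each is in the other's kNN list and they share at least `eps` neighbours; a point is a core point when it has at least `min_pts` such links). The function names and parameters are hypothetical.

```python
import math
from collections import deque


def knn_lists(points, members, k):
    """k nearest neighbours of each member, searched only among `members`.

    Brute force, O(n^2 log n); a stand-in for vantage-point tree queries.
    """
    knn = {}
    for i in members:
        others = sorted((j for j in members if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        knn[i] = set(others[:k])
    return knn


def snn_dbscan(knn, eps, min_pts):
    """Cluster the kNN graph with shared-nearest-neighbour DBSCAN.

    Returns {point index: cluster id}, with -1 marking noise.
    """
    def linked(i, j):
        # Mutual kNN membership plus at least `eps` shared neighbours.
        return i in knn[j] and j in knn[i] and len(knn[i] & knn[j]) >= eps

    strong = {i: {j for j in knn[i] if linked(i, j)} for i in knn}
    core = {i for i in knn if len(strong[i]) >= min_pts}

    labels = {}
    cluster_id = 0
    for seed in sorted(core):
        if seed in labels:
            continue
        # Grow a cluster from this core point, expanding only through cores.
        labels[seed] = cluster_id
        queue = deque([seed])
        while queue:
            p = queue.popleft()
            for q in strong[p]:
                if q not in labels:
                    labels[q] = cluster_id
                    if q in core:
                        queue.append(q)
        cluster_id += 1
    return {i: labels.get(i, -1) for i in knn}


def cluster_interval(points, members, k, eps, min_pts):
    """Build the kNN graph of one interval and cluster it."""
    return snn_dbscan(knn_lists(points, members, k), eps, min_pts)
```

Calling `cluster_interval` once per interval, each time with only that interval's member indices, keeps the kNN graphs of different intervals disconnected, as described in the steps above.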

How to use SparkTDA Mapper
