Skip to content
ognis1205 edited this page May 8, 2018 · 4 revisions

The Mapper algorithm

The Mapper algorithm is a method for topological data analysis invented by Gurjeet Singh, Facundo Memoli and Gunnar Carlsson. See the Reference [R1] for the publication. While the Mapper algorithm alone does not constitute a complete data analysis tool itself, it is the key part of a processing chain with (minimally) filter functions, the Mapper algorithm itself and visualization of the results.

Informally, the Mapper algorithm works by performing a local clustering guided by a projection function. The steps are as follows:

  1. Project a dataset.
  2. Cover this projection with overlapping intervals/hypercubes.

  1. Cluster the points inside an interval
  2. Graphization. The clusters become nodes in a graph. Due to the overlap, a single point can appear in multiple nodes. When there is such a member intersection, draw an edge between these nodes.

Implementation details of SparkTDA Mapper

In SparkTDA, we explore a tree-based approach for the Mapper algorithm that require only O(N log N) computation and O(N) memory. Our approaches compute a sparse approximation of the similarities between the input objects using vantage-point trees [R2]. The steps are as follows:

  1. Project a dataset.
  2. Cover this projection with overlapping intervals/hypercubes.

  1. Generate index to repartition data points for querying kNNs of each data point [R3].
  2. Repartition & query kNNs for each data points [R3].
  3. Create kNN graph for each intervals, i.e., two graphs in different intervals are not connected.
  4. Apply SNN-DBSCAN to create clusters.

How to use SparkTDA Mapper

About SNN-DBSCAN

References

  1. G. Singh, F. Memoli, G. Carlsson (2007). Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition, Point Based Graphics 2007, Prague, September 2007.
  2. P. N. Yianilos (1993). Data structures and algorithms for nearest neighbor search in general metric spaces, Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms. Society for Industrial and Applied Mathematics Philadelphia, PA, USA. pp 311-321 1993.
  3. L. Ting, C. Rosenberg, H. Rowley (2007). Clustering billions of images with large scale nearest neighbor search. Applications of Computer Vision, 2007. WACV'07. IEEE Workshop on. IEEE, 2007.
Clone this wiki locally