-
Notifications
You must be signed in to change notification settings - Fork 5
Mapper
The Mapper algorithm is a method for topological data analysis invented by Gurjeet Singh, Facundo Memoli and Gunnar Carlsson. See the Reference [R1] for the publication. While the Mapper algorithm alone does not constitute a complete data analysis tool itself, it is the key part of a processing chain with (minimally) filter functions, the Mapper algorithm itself and visualization of the results.
Informally, the Mapper algorithm works by performing a local clustering guided by a projection function. The steps are as follows:
- Project a dataset.
- Cover this projection with overlapping intervals/hypercubes.
- Cluster the points inside an interval
- Graphization. The clusters become nodes in a graph. Due to the overlap, a single point can appear in multiple nodes. When there is such a member intersection, draw an edge between these nodes.
In SparkTDA, we explore a tree-based approach for the Mapper algorithm that require only O(N log N) computation and O(N) memory. Our approaches compute a sparse approximation of the similarities between the input objects using vantage-point trees [R2]. The steps are as follows:
- Project a dataset.
- Cover this projection with overlapping intervals/hypercubes.
- Generate index to repartition data points for querying kNNs of each data point [R3].
- Repartition & query kNNs for each data points [R3].
- Create kNN graph for each intervals, i.e., two graphs in different intervals are not connected.
- Apply SNN-DBSCAN to create clusters.
- G. Singh, F. Memoli, G. Carlsson (2007). Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition, Point Based Graphics 2007, Prague, September 2007.
- P. N. Yianilos (1993). Data structures and algorithms for nearest neighbor search in general metric spaces, Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms. Society for Industrial and Applied Mathematics Philadelphia, PA, USA. pp 311-321 1993.
- L. Ting, C. Rosenberg, H. Rowley (2007). Clustering billions of images with large scale nearest neighbor search. Applications of Computer Vision, 2007. WACV'07. IEEE Workshop on. IEEE, 2007.