2024-10-20
This is an implementation of the mapper algorithm by Singh, Mémoli, and Carlsson (2007).
To install and use the most recent CRAN upload of this package, run the following:
install.packages("mappeR")
library(mappeR)
To install the latest development version of this package from Github, run the following commands:
install.packages("devtools")
library(devtools)
devtools::install_github("https://github.com/Uiowa-Applied-Topology/mappeR/tree/dev", upgrade=FALSE)
library(mappeR)
If you’re installing from Github, you might need to do some more stuff:
- Windows: install Rtools (https://cran.r-project.org/bin/windows/Rtools/)
- OS X: install Xcode (from the Mac App Store)
- Linux: run
apt-get install r-base-dev
(or similar).
Mapper is a way to view a point cloud
Consider a function
Cover
based on this cover to reflect the original data, where
and
This is the basic idea of the mapper algorithm, with the addition that
each level set is first refined into clusters based on the intrinsic
pairwise distances of the data according to some clustering algorithm.
That is, we partition each level set
and build a new graph
and
So in general, the ingredients to construct a mapper graph are
- A data set, along with their pairwise distances
- A lens function with the data as its domain (above this was real-valued, but it does not have to be)
- A cover of the codomain of the lens function
- A clustering algorithm
num_points = 5000
P.data = data.frame(
x = sapply(1:num_points, function(x)
sin(x) * 10) + rnorm(num_points, 0, 0.1),
y = sapply(1:num_points, function(x)
cos(x) ^ 2 * sin(x) * 10) + rnorm(num_points, 0, 0.1),
z = sapply(1:num_points, function(x)
10 * sin(x) ^ 2 * cos(x)) + rnorm(num_points, 0, 0.1)
)
P.dist = dist(P.data)
Here is a point cloud
This seems to form a kind of figure-8 curve just based on this projection. But as we can see from the 2D projections, the “shape” of the data set we’re seeing really does depend on how we’re looking at it:
We will build graphs using the outline of the mapper algorithm described, with real-valued lens functions.
Parameters:
- Data: figure-8
- Lens: Projection to each factor, or eccentricity (a measure of centrality per data point)
- Cover: A cover of
$\mathbb{R}$ (just up to the extremes of the function values) using 10 equally spaced intervals with 25% overlap between each consecutive interval - Clustering method: Single-linkage hierarchical clustering
# lens functions
projx = P.data$x
projy = P.data$y
projz = P.data$z
# eccentricity = apply(as.matrix(P.dist), 1, sum) / num_points
eccentricity = eccentricity_filter(P.dist)
# cover parameters to generate a width-balanced cover
num_bins = 10
percent_overlap = 25
# generate the cover
xcover = create_width_balanced_cover(min(projx), max(projx), num_bins, percent_overlap)
ycover = create_width_balanced_cover(min(projy), max(projy), num_bins, percent_overlap)
zcover = create_width_balanced_cover(min(projz), max(projz), num_bins, percent_overlap)
eccentriccover = create_width_balanced_cover(min(eccentricity),
max(eccentricity),
num_bins,
percent_overlap)
# bin tester machine machine
check_in_interval <- function(endpoints) {
return(function(x) (endpoints[1] - x <= 0) & (endpoints[2] - x >= 0))
}
# each of the "cover" elements will really be a function that checks if a data point lives in it
xcovercheck = apply(xcover, 1, check_in_interval)
# build the mapper objects
xmapper = create_mapper_object(
data = P.data,
dists = P.dist,
filtered_data = projx,
cover_element_tests = xcovercheck,
method = "single"
)
# mappeR also has a built-in 1D mapper function that will do the above for you
ymapper = create_1D_mapper_object(P.data, P.dist, projy, ycover, "single")
zmapper = create_1D_mapper_object(P.data, P.dist, projz, zcover, "single")
eccentricmapper = create_1D_mapper_object(P.data, P.dist, eccentricity, eccentriccover, "single")
# mappeR also has functions which will convert the mapper outputs into igraph format
ixmapper = mapper_object_to_igraph(xmapper)
iymapper = mapper_object_to_igraph(ymapper)
izmapper = mapper_object_to_igraph(zmapper)
ieccentricmapper = mapper_object_to_igraph(eccentricmapper)
The vertices in each output graph below are colored according to the level set the cluster belongs to, and scaled by (the square root of) the number of data points in the cluster.
By toying with the general mapper parameters, we can obtain different
flavors of the algorithm. In the ball mapper flavor, we simply use the
inclusion into the ambient space of the data as our lens function, and
let the cover do the work. Specifically, we cover the ambient space with
Parameters:
- Data: figure-8
- Cover: set of
$\varepsilon$ -balls in$\mathbb{R^3}$ - Lens function: inclusion from
$P\hookrightarrow\mathbb{R}^3$ - Clustering method: none (or, “any data set is one big cluster”-type clustering)
There’s a secret parameter here, which is
# creates a cover using a greedy algorithm
balls1 = create_balls(data = P.data, dists = P.dist, eps = .25)
# ball tester machine machine
is_in_ball <- function(ball) {
return(function(x) x %in% ball)
}
# filtering is just giving back the data (row names because my balls are lists of data point names, so the filter should match)
ballmapper1 = create_mapper_object(P.data, P.dist, rownames(P.data), lapply(balls1, is_in_ball))
# mappeR has a built-in ball mapper function to do this for you
ballmapper2 = create_ball_mapper_object(P.data, P.dist, .5)
ballmapper3 = create_ball_mapper_object(P.data, P.dist, 1)
ballmapper4 = create_ball_mapper_object(P.data, P.dist, 2)
mappeR
has built-in methods for:
1D mapper
create_1D_mapper_object(data, dists, filtered_data, cover, clustering_method)
- Lens:
$P \to \mathbb{R}$ - Cover: intervals
- Clustering: yes
Ball mapper
create_ball_mapper_object(data, dists, eps)
- Lens:
$P \to P$ by identity - Cover:
$\varepsilon$ -balls in ambient$P$ -space - Clustering: no
Clusterball mapper
create_clusterball_mapper_object(data, ball_dists, clustering_dists, eps, clustering_method)
- Lens:
$P \to P$ by identity - Cover:
$\varepsilon$ -balls in ambient$P$ -space - Clustering: yes
mappeR
supports:
Hierarchical Clustering
"single"
: single linkage"complete"
: complete linkage"average"
: average linkage"mcquitty"
: McQuitty linkage"centroid"
: centroid linkage"median"
: median linkage"ward.D"
: Ward linkage (assumes squared distances)"ward.D2"
: Ward linkage (assumes nonsquared distances)