forked from elbamos/largeVis
-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
69 lines (54 loc) · 3.15 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
---
title: "largeVis"
output: github_document
bibliography: vignettes/largevisvignettes.bib
---
[![Travis-CI Build Status](https://travis-ci.org/elbamos/largeVis.svg?branch=master)](https://travis-ci.org/elbamos/largeVis)
[![Coverage Status](https://img.shields.io/codecov/c/github/elbamos/largeVis/master.svg)](https://codecov.io/gh/elbamos/largeVis/branch/master) [![https://gitter.im/elbamos/largeVis](https://badges.gitter.im/elbamos/largeVis.svg)](https://gitter.im/elbamos/largeVis?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
[![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/elbamos/largeVis?branch=master&svg=true)](https://ci.appveyor.com/project/elbamos/largeVis?branch=master)
This is an implementation of the `largeVis` algorithm described in (https://arxiv.org/abs/1602.00370). It also incorporates:
* A very fast algorithm for estimating k-nearest neighbors, implemented in C++ with `Rcpp` and `OpenMP`. See the [Benchmarks](./benchmarks.md) file for performance details.
* Efficient implementations of the clustering algorithms:
+ `HDBSCAN`
+ `OPTICS`
+ `DBSCAN`
* Functions for visualizing manifolds like [this](http://cs.stanford.edu/people/karpathy/cnnembed/).
### News Highlights
* Version 0.1.10 re-adds clustering, and also adds momentum training to largeVis, as well as a host of other features and improvements.
* Version 0.1.9.1 has been accepted by CRAN. Much grattitude to Uwe Ligges and Kurt Hornik for their assistance, advice, and patience.
### Some Examples
![MNIST](./README_files/figure-markdown_github/drawmnist-1.png)
![Wiki Words](./README_files/figure-markdown_github/drawwikiwords-1.png)
### Clustering With HDBSCAN
```{r clustering,echo=F,warning=F}
library(clusteringdatasets)
library(largeVis, quietly = TRUE)
library(ggplot2)
data(Aggregation)
vis <- largeVis(t(as.matrix(Aggregation[, 1:2])), sgd_batches = 1)
clusters <- hdbscan(vis, K = 2)
theme_set(
theme_bw() %+replace%
theme(
legend.title = element_text(size = rel(0.8),
face = "bold"),
legend.margin = unit(0, "cm"),
legend.position = "right",
legend.key.size = unit(2, "lines"),
legend.text = element_text(size = unit(8, "points")),
axis.title.y = element_text(angle = 90),
axis.text = element_text(size = rel(0.7)),
plot.margin = unit(c(0, 0.5, 1, 0), "lines"),
axis.title = element_text(size = rel(0.8),
face = "bold"),
title = element_text(size = rel(0.9))
)
)
show(gplot(clusters, Aggregation[, 1:2]) +
scale_x_continuous("", labels = NULL) +
scale_y_continuous("", labels = NULL))
```
### Visualize Embeddings
![Visualize Embeddings](./README_files/figure-markdown_github/faceImages-1.png)
#### Building Notes
* **Note on R 3.4**: Before R 3.4, the CRAN binaries were likely to have been compiled without OpenMP, and getting OpenMP to work on Mac OS X was somewhat tricky. This should all have changed (for the better) with R 3.4, which natively using `clang 4.0` by default. Since R 3.4 is new, I'm not able to provide advice, but am interested in hearing of any issues and any workarounds to issues that you may discover.