---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# clevr: Clustering and Link Prediction Evaluation in R
<!-- badges: start -->
<!-- badges: end -->
clevr implements functions for evaluating link prediction and clustering
algorithms in R. It includes efficient implementations of common performance
measures, such as:
* pairwise precision, recall, F-measure;
* homogeneity, completeness and V-measure;
* (adjusted) Rand index;
* variation of information; and
* mutual information.
While the current focus is on supervised (a.k.a. external) performance
measures, unsupervised (internal) measures are also in scope for future
releases.
## Installation
You can install the latest release from [CRAN](https://CRAN.R-project.org)
by entering:
``` r
install.packages("clevr")
```
The development version can be installed from GitHub using `devtools`:
``` r
# install.packages("devtools")
devtools::install_github("cleanzr/clevr")
```
## Example
clevr includes several functions that convert between different
representations of a clustering.
```{r example}
library(clevr)
# A clustering of four records represented as a membership vector
pred_membership <- c("Record1" = 1, "Record2" = 1, "Record3" = 1, "Record4" = 2)
# Represent as a set of record pairs that appear in the same cluster
pred_pairs <- membership_to_pairs(pred_membership)
print(pred_pairs)
# Represent as a list of record clusters
pred_clusters <- membership_to_clusters(pred_membership)
print(pred_clusters)
```
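To make the three representations concrete, here is a minimal base-R sketch that derives clusters and pairs from the same membership vector by hand. It is an illustration only, not clevr's implementation: the names `clusters_manual` and `pairs_manual` are hypothetical, and the ordering and format of the output may differ from what the package functions return.

```r
# Same toy membership vector as in the example above
pred_membership <- c("Record1" = 1, "Record2" = 1, "Record3" = 1, "Record4" = 2)

# Clusters: group the record names by their cluster label
clusters_manual <- split(names(pred_membership), pred_membership)

# Pairs: every unordered pair of records that share a cluster.
# Singleton clusters contribute no pairs, so drop them first.
multi <- Filter(function(ids) length(ids) >= 2, clusters_manual)
pairs_manual <- do.call(
  rbind,
  lapply(unname(multi), function(ids) t(combn(ids, 2)))
)

print(clusters_manual)
print(pairs_manual)
```

A cluster of size k contributes k(k-1)/2 pairs, so the three-record cluster here yields three pairs and the singleton yields none.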
Performance measures are available for evaluating linked pairs:
```{r pair-measures}
true_pairs <- rbind(c("Record1", "Record2"), c("Record3", "Record4"))
pr <- precision_pairs(true_pairs, pred_pairs)
print(pr)
re <- recall_pairs(true_pairs, pred_pairs)
print(re)
```
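For intuition about what pairwise precision and recall measure, the following base-R sketch computes them by hand for the same toy data. It assumes pairs are unordered and that the predicted pairs are those implied by `pred_membership`; the helper `pair_keys` and the `*_manual` names are hypothetical and not part of clevr.

```r
# Build an order-independent key for each pair (one pair per matrix row)
pair_keys <- function(pairs) {
  apply(pairs, 1, function(p) paste(sort(p), collapse = "|"))
}

true_pairs <- rbind(c("Record1", "Record2"), c("Record3", "Record4"))

# Pairs implied by the predicted clustering {Record1, Record2, Record3}, {Record4}
pred_pairs_manual <- rbind(
  c("Record1", "Record2"),
  c("Record1", "Record3"),
  c("Record2", "Record3")
)

true_keys <- unique(pair_keys(true_pairs))
pred_keys <- unique(pair_keys(pred_pairs_manual))

# Number of predicted pairs that are also true pairs
n_correct <- length(intersect(true_keys, pred_keys))

precision_manual <- n_correct / length(pred_keys)  # share of predicted pairs that are correct
recall_manual <- n_correct / length(true_keys)     # share of true pairs that were found

print(precision_manual)
print(recall_manual)
```

Only the pair (Record1, Record2) appears in both sets, so this sketch gives a precision of 1/3 and a recall of 1/2.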
Performance measures are also available for evaluating clusterings:
```{r clust-measures}
true_membership <- c("Record1" = 1, "Record2" = 1, "Record3" = 2, "Record4" = 2)
ari <- adj_rand_index(true_membership, pred_membership)
print(ari)
vi <- variation_info(true_membership, pred_membership)
print(vi)
```
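As a rough check on what the Rand index and its adjusted variant capture, the sketch below computes both from a contingency table using the standard textbook formulas. It is not clevr's implementation; it assumes both membership vectors list the records in the same order, and the `*_manual` names are hypothetical.

```r
true_membership <- c("Record1" = 1, "Record2" = 1, "Record3" = 2, "Record4" = 2)
pred_membership <- c("Record1" = 1, "Record2" = 1, "Record3" = 1, "Record4" = 2)

# Contingency table: rows are true clusters, columns are predicted clusters
ct <- table(true_membership, pred_membership)

# Pair counts: pairs together in both clusterings, in the true one, in the predicted one
sum_cells <- sum(choose(as.vector(ct), 2))
sum_rows <- sum(choose(rowSums(ct), 2))
sum_cols <- sum(choose(colSums(ct), 2))
n_pairs <- choose(length(true_membership), 2)

# Rand index: fraction of record pairs on which the two clusterings agree
rand_manual <- (n_pairs + 2 * sum_cells - sum_rows - sum_cols) / n_pairs

# Adjusted Rand index: corrects the agreement count for chance
expected <- sum_rows * sum_cols / n_pairs
ari_manual <- (sum_cells - expected) / ((sum_rows + sum_cols) / 2 - expected)

print(rand_manual)
print(ari_manual)
```

For this toy data the two clusterings agree on three of the six record pairs, so the unadjusted Rand index is 0.5, while the adjusted index is 0 because that level of agreement matches what chance alone would produce.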