Make alluvial plots with node order and colors optimized to minimize edge crossings with wompwomp!
wompwomp solves the Weighted (permutation) Optimization of Multiple Partitions-Weighted (label) Optimization of Multiple Partitions (WPOMP--WLOMP) problem.
R - Requires system R to be installed
Bioconductor (not yet released on Bioconductor - please install from GitHub)
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("wompwomp")
wompwomp::setup_python_env()
GitHub
if (!require("remotes", quietly = TRUE))
    install.packages("remotes")
remotes::install_github("pachterlab/wompwomp")
wompwomp::setup_python_env()
Command line
git clone https://github.com/pachterlab/wompwomp
cd wompwomp
conda env create -f environment.yml  # or, to avoid conda: Rscript inst/install.R
conda activate wompwomp_env  # skip if you used install.R above
Rscript -e 'remotes::install_local(".")'  # or use the --dev flag in commands
The first time any command is run on the command line, a prompt will appear asking to install any missing R dependencies.
While Python is not strictly required to use the package, it is required for some options, including default package options: the NeighborNet algorithm (sorting_algorithm == "neighbornet" or column_sorting_algorithm == "neighbornet"), Leiden clustering (coloring_algorithm == "advanced"), and Fenwick tree optimization for objective calculation.
The I/O for each of wompwomp's functions is as follows:
- plot_alluvial: dataframe, csv, or tibble (grouped or ungrouped) --> plot
- data_preprocess: dataframe, csv, or tibble (grouped or ungrouped) --> dataframe (grouped)
- data_sort: dataframe, csv, or tibble (grouped or ungrouped) --> dataframe (grouped)
- plot_alluvial_internal: dataframe, csv, or tibble (grouped) --> plot
- determine_crossing_edges: dataframe, csv, or tibble (grouped or ungrouped) --> list
- determine_weighted_layer_free_objective: dataframe, csv, or tibble (grouped or ungrouped) --> integer
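This README does not spell out wompwomp's exact objective. As a rough illustration of what a weighted crossing objective between two adjacent columns can look like, the sketch below uses the standard two-layer weighted crossing count: two edges cross when their endpoints appear in opposite orders in the two columns, and each crossing contributes the product of the two edge weights. The helpers `weighted_crossings()`, `all_perms()`, and `best_order2()` are hypothetical and not part of wompwomp. Edges that share an endpoint are treated as non-crossing. The brute-force search over node orders only scales to a handful of nodes and is a stand-in for, not a description of, wompwomp's sorting algorithms.

```r
# Hypothetical helpers for illustration only; these are NOT wompwomp functions.

# Weighted crossings between two adjacent columns of a grouped table.
# order1 / order2 list each column's nodes from top to bottom.
weighted_crossings <- function(df, col1, col2, weight_col, order1, order2) {
  p1 <- match(df[[col1]], order1)  # position of each edge's left endpoint
  p2 <- match(df[[col2]], order2)  # position of each edge's right endpoint
  w <- df[[weight_col]]
  n <- nrow(df)
  total <- 0
  if (n < 2) return(total)
  for (i in seq_len(n - 1)) {
    for (j in seq(i + 1, n)) {
      # Edges cross when their endpoints are in opposite orders in the two
      # columns; edges sharing an endpoint (product == 0) are not counted.
      if ((p1[i] - p1[j]) * (p2[i] - p2[j]) < 0) {
        total <- total + w[i] * w[j]
      }
    }
  }
  total
}

# All permutations of a vector (recursive; fine for a handful of nodes).
all_perms <- function(x) {
  if (length(x) <= 1) return(list(x))
  out <- list()
  for (i in seq_along(x)) {
    for (rest in all_perms(x[-i])) {
      out[[length(out) + 1]] <- c(x[i], rest)
    }
  }
  out
}

# Brute-force search: keep column 1 fixed and try every order of column 2.
best_order2 <- function(df, col1, col2, weight_col, order1) {
  candidates <- all_perms(sort(unique(df[[col2]])))
  scores <- vapply(candidates, function(o) {
    weighted_crossings(df, col1, col2, weight_col, order1, o)
  }, numeric(1))
  list(order2 = candidates[[which.min(scores)]], crossings = min(scores))
}

toy <- data.frame(
  method1 = c("A", "A", "B", "C"),
  method2 = c("z", "y", "x", "x"),
  weight = c(5, 2, 4, 1),
  stringsAsFactors = FALSE
)
weighted_crossings(toy, "method1", "method2", "weight",
                   c("A", "B", "C"), c("x", "y", "z"))  # 35
best_order2(toy, "method1", "method2", "weight", c("A", "B", "C"))  # $crossings is 0
```

Reordering the second column removes all crossings in this toy case. Real alluvial layouts also order edges within each node and span more than two columns, so this count is only a simplified proxy.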
The input table can have one of two formats:
- Ungrouped: columns specified by column1 and column2, where each row corresponds to a separate entity
- Grouped: columns specified by column1, column2, and column_weights, where each row corresponds to a combination of column1 and column2, and column_weights specifies the number of items in this combination
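The two formats carry the same information. As a minimal base-R sketch, an ungrouped table can be collapsed into a grouped one with `aggregate()`, and a grouped table can be expanded back by repeating each row `weight` times. The column names `method1`, `method2`, and `weight` are example names that mirror the grouped example.

```r
# Ungrouped: one row per entity (random toy data)
set.seed(1)
ungrouped <- data.frame(
  method1 = sample(1:3, 20, TRUE),
  method2 = sample(1:3, 20, TRUE)
)

# Ungrouped -> grouped: one row per observed (method1, method2) combination,
# with a "weight" column holding the number of entities in that combination
grouped <- aggregate(
  list(weight = rep(1L, nrow(ungrouped))),
  by = ungrouped[c("method1", "method2")],
  FUN = sum
)

# Grouped -> ungrouped: repeat each combination `weight` times
expanded <- grouped[rep(seq_len(nrow(grouped)), grouped$weight),
                    c("method1", "method2")]
rownames(expanded) <- NULL
```

The grouped table is the shape passed to plot_alluvial(df, column_weights = "weight") in the grouped example.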
Ungrouped input
library("wompwomp")
df <- data.frame(method1 = sample(1:3, 100, TRUE), method2 = sample(1:3, 100, TRUE))
head(df)
#>   method1 method2
#> 1       1       1
#> 2       1       3
#> 3       1       2
#> 4       1       1
#> 5       2       1
#> 6       2       2
p <- plot_alluvial(df)
p
Grouped input
set.seed(42)
raw_df <- data.frame(
  method1 = sample(1:3, 100, TRUE),
  method2 = sample(1:3, 100, TRUE)
)
# Aggregate by combination
df <- as.data.frame(dplyr::count(raw_df, method1, method2, name = "weight"))
head(df)
#>   method1 method2 weight
#> 1       1       1     13
#> 2       1       2     15
#> 3       1       3     12
#> 4       2       1     12
#> 5       2       2     17
#> 6       2       3     10
p <- plot_alluvial(df, column_weights = "weight")
p
./exec/wompwomp plot_alluvial --df mydata.csv --graphing_columns column1 column2
For help on any command, run ./exec/wompwomp COMMAND --help
Notes about command line usage:
- all parameter values should be space-separated, e.g., ./exec/wompwomp plot_alluvial --df data.csv, NOT --df=data.csv
- all parameters that take a single argument have identical names in R and on the command line, with the value immediately following the argument, e.g., plot_alluvial(df = "data.csv") and ./exec/wompwomp plot_alluvial --df data.csv
- all parameters that take a vector/list of arguments have identical names in R and on the command line, with the values immediately following the argument and separated by spaces, e.g., plot_alluvial(graphing_columns = c("tissue", "cluster")) and ./exec/wompwomp plot_alluvial --graphing_columns tissue cluster
- all boolean parameters are passed as a bare flag with no following argument; boolean parameters that default to FALSE have identical names in R and on the command line, while boolean parameters that default to TRUE have "disable_" prepended to their names on the command line, e.g. (include_group_sizes defaults to FALSE and include_axis_titles defaults to TRUE): plot_alluvial(include_group_sizes = TRUE, include_axis_titles = FALSE) and ./exec/wompwomp plot_alluvial --include_group_sizes --disable_include_axis_titles
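To make these mapping rules concrete, here is a minimal sketch of a hypothetical helper, `to_cli_args()`, that turns a named list of R arguments into the equivalent command-line tokens. It is not part of wompwomp. Because the command-line spelling of a boolean depends on its default, the helper takes boolean defaults explicitly and assumes FALSE for any boolean it is not told about.

```r
# Hypothetical helper, NOT part of wompwomp: converts a named list of R
# arguments into command-line tokens following the rules listed above.
# bool_defaults gives each boolean parameter's default (FALSE if omitted).
to_cli_args <- function(params, bool_defaults = list()) {
  out <- character(0)
  for (name in names(params)) {
    value <- params[[name]]
    if (is.logical(value) && length(value) == 1) {
      default <- bool_defaults[[name]]
      if (is.null(default)) default <- FALSE
      if (isTRUE(value) && !default) {
        out <- c(out, paste0("--", name))          # non-default TRUE: bare flag
      } else if (!isTRUE(value) && default) {
        out <- c(out, paste0("--disable_", name))  # non-default FALSE: disable_ flag
      }                                            # value equals default: omitted
    } else {
      # Single values and vectors: flag followed by space-separated values
      out <- c(out, paste0("--", name), as.character(value))
    }
  }
  out
}

cli_args <- to_cli_args(
  list(df = "data.csv",
       graphing_columns = c("tissue", "cluster"),
       include_group_sizes = TRUE,
       include_axis_titles = FALSE),
  bool_defaults = list(include_group_sizes = FALSE, include_axis_titles = TRUE)
)
cmd <- paste(c("./exec/wompwomp", "plot_alluvial", cli_args), collapse = " ")
cat(cmd, "\n", sep = "")
# ./exec/wompwomp plot_alluvial --df data.csv --graphing_columns tissue cluster --include_group_sizes --disable_include_axis_titles
```

From the repository root, system2("./exec/wompwomp", c("plot_alluvial", cli_args)) would run the same command from within R. The only boolean defaults used here are the two stated above. Check the defaults of any other parameter with ./exec/wompwomp plot_alluvial --help.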
See the full tutorial in our introductory vignette, wompwomp-intro.Rmd.
Read our preprint on arXiv here.