MICA is a clustering tool for single-cell RNA-seq data. MICA takes a preprocessed gene expression matrix as input and efficiently cluster the cells. MICA consists of the following main components:
- Mutual information estimation for cell-cell distance quantification
- Dimension reduction on the non-linear mutual information-based distance space
- Consensus clustering on dimension-reduced spaces
- Clustering visualization and cell type annotation
MICA workflow:
- python>=3.7.6, <=3.9.2 (developed and tested on python 3.7.6, tested on python3.9.2)
- See requirements.txt file for other dependencies
For MacOS users with arm-based Mchip, please visit MICA_MacOS_Mchip branch https://github.com/jyyulab/MICA/tree/MICA_MacOS_Mchip
The recommended method of setting up the required Python environment and dependencies is to use the conda dependency manager:
conda create -n mica100 python=3.9.2 # Create a python virtual environment
source activate mica100 # Activate the virtual environment
pip install MICA # Install MICA and its dependencies
conda create -n mica100 python=3.9.2 # Create a python virtual environment
source activate mica100 # Activate the virtual environment
git clone https://github.com/jyyulab/MICA # Clone the repo
cd MICA # Switch to the MICA root directory
pip install . # Install MICA from source
mica -h # Check if mica works correctly
MICA workflow has two built-in dimension reduction methods. The auto mode (mica or mica auto)
selects a dimension reduction method automatically based on the cell count of the preprocessed matrix.
Users can select graph embedding method (mica ge) or MDS (mica mds) or Louvain (mica louvain) method manually using the subcommand
ge or mds or louvain respectively.
$ mica -h
usage: mica [-h] {auto,ge,mds} ...
MICA - Mutual Information-based Clustering Analysis tool.
optional arguments:
-h, --help show this help message and exit
subcommands:
{auto,ge,mds} versions
auto automatic version
ge graph embedding version
mds MDS version
louvain simple louvain version
Use mica ge -h, mica mds -h, and mica louvain -h to check helps with subcommands.
Recently Changed* The main input for MICA is tab-separated genes/proteins by cells/samples (rows are genes/proteins) expression matrix or an anndata file after preprocessing.
After the completion of the pipeline, mica will generate the following outputs:
- Clustering results plot with clustering label mapped to each cluster
- Clustering results txt file with visualization coordinates and clustering label
MICA auto mode reduces the dimensionality using either the multidimensional scaling method (<= 5,000 cells) or the graph embedding method (> 5,000 cells), where the number of cells cutoff was chosen based on performance evaluation of datasets of various sizes.
mica auto -i ./test_data/inputs/10x/PBMC/3k/pre-processed/pbmc3k_preprocessed.h5ad -o ./test_data/outputs -pn pbmc3k -nc 10
MICA GE mode reduces the dimensionality using the graph embedding method. It sweeps a range of resolutions
of the Louvain clustering algorithm. -ar parameter sets the upper bound of the range.
mica ge -i ./test_data/inputs/10x/PBMC/3k/pre-processed/pbmc3k_preprocessed.h5ad -o ./test_data/outputs -maxr 4.0 -ss 1
The default setting is to build the MI distance-based graph with the K-nearest-neighbors algorithm, and the number of the neighbors can be set with -nnm. Another way to build the graph is to run approximate-nearest-neighbors(ann) based on the Hierarchical Navigable Small World(HNSW) algorithm. Set -nnt(knn or ann) to enable nn type selection.
Here are 2 main hyperparameters in ann, ef(-annef) and m(-annm). Suggested setting:
- default
- Really fast m = 4, ef = 200
- Accurate m = 16, ef = 800
Optimize these 2 parameters to make them work on your case, to make the ge mode both fast and robust. Please increase ef when -nnm is increased.
mica ge -i ./test_data/inputs/10x/PBMC/3k/pre-processed/pbmc3k_preprocessed.h5ad -o ./test_data/outputs -nnt ann -annm 8 -annef 400 -maxr 4.0 -ss 1
To set the number of neighbors in the graph for Louvain clustering, please set -nne
MICA MDS mode reduces the dimensionality using the multidimensional scaling method. It includes both Kmeans clustering and louvain clustering.
To run KMeans mode please set -nck, to run louvain graph clustering, please set -nn or as default.
-pn parameter sets the
project name; -nck specifies the numbers of clusters (k in k-mean clustering algorithm); -dd is the
number of dimensions used in performing k-mean clusterings in the dimension reduced matrix.
mica mds -i ./test_data/inputs/10x/PBMC/3k/pre-processed/pbmc3k_preprocessed.h5ad -o ./test_data/outputs -pn PBMC3k -nck 8
MICA Louvain mode reduces the dimension without MI distance estimate via PCA or MDS directly, and then the Louvain clustering will be executed.
To set the dimension-reduction method, please set -dm (PCA or MDS)
'mica louvain -i ./test_data/inputs/10x/PBMC/3k/pre-processed/pbmc3k_preprocessed.h5ad -o ./test_data/outputs -dm PCA'
-dd: Number of dimensions to reduce to
-cldis: distance in Louvain clustering, euclidean/cosine
-bpr: set the power index of the bin size for MI, 3 -> bins=(features)**(1/3)
-bsz: set the bin size for MI
-sil: run silhouette analysis for louvain clustering
-nw: num of workers
- Try to install MICA 1.0.0 but when checking, it’s still old version:
Solve: make sure to create a new folder and change direction into there before you start Step 1, if you’ve have a like ‘MICA’ at the current direction.
mkdir MICA100
cd MICA100
#start installation- Illegal instruction(core dumped) or other problem when importing mihnsw
Often because of the conda version or gcc version
try:
module load conda3/202402
module load gcc/10.2.0
#others may also works like gcc 9.5.0https://jyyulab.github.io/scMINER/site/
hnswlib: https://github.com/nmslib/hnswlib. The author of MICA adds a 'mutual-info-distance' to the space of hnswlib.
scMINER paper: https://www.nature.com/articles/s41467-025-59620-6
