This is the repo for our benchmarking and analysis project. The methods covered here were collected up to March 1, 2024.
To install our benchmarking environment based on scGPT, please use conda to create an environment from this yml file on your own machine:
conda env create -n scgpt --file scgpt_bench.yml
If you run into version conflicts, you can comment out the problematic packages in the yml file and then run:
conda activate scgpt
conda env update --file scgpt_bench.yml
For the other methods we used, please refer to their original project websites for instructions. We recommend creating a separate environment for each method. Because installing different scFMs can be difficult, we provide the yml files we used to install these models in the folder installation_baselines.
These methods include the following single-cell FMs (in addition to scGPT):
tGPT, Geneformer, scBERT, CellLM, SCimilarity, scFoundation, CellPLM, UCE, GeneCompass.
They also include the following task-specific models:
TOSICA, scJoint, GLUE, ResPAN, Harmony, scDesign3, Splatter, scVI, Tangram, GEARS.
We need scIB for evaluation. Please use pip to install it:
pip install scib
We also provide a version of scib with our new function in this repo. Please make sure you have scib >= 1.0.4 so that kBET runs correctly.
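As a quick sanity check for the version requirement above, the following is a minimal sketch that reads the installed scib version from package metadata and compares it against 1.0.4. The helper names (`version_tuple`, `meets_minimum`, `scib_is_recent_enough`) are hypothetical and not part of scib or scEval, and the version parsing is deliberately simple: it only looks at the leading numeric components, so a pre-release such as `1.0.4rc1` is conservatively treated as too old.

```python
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Turn '1.0.4' into (1, 0, 4), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "1.0.4") -> bool:
    """Return True if `installed` is at least `minimum` (numeric parts only)."""
    return version_tuple(installed) >= version_tuple(minimum)


def scib_is_recent_enough(minimum: str = "1.0.4") -> bool:
    """Check the scib installed in the current environment, if any."""
    try:
        installed = metadata.version("scib")
    except metadata.PackageNotFoundError:
        # scib is not installed in this environment.
        return False
    return meets_minimum(installed, minimum)


print("scib >= 1.0.4:", scib_is_recent_enough())
```

If this prints `False`, upgrading scib with pip inside the active environment should resolve it.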
We will release a version of scEval with more functions in the future!
Most of our experiments used the weights under scGPT_bc. scGPT_full from scGPT v2 was also used in the batch effect correction evaluation. Pre-trained weights of scBERT can be found in scBERT. Pre-trained weights of CellLM can be found in cellLM. Pre-trained weights of Geneformer can be found in Geneformer. Pre-trained weights of SCimilarity can be found in SCimilarity. Pre-trained weights of UCE can be found in UCE. Pre-trained weights of tGPT can be found in tGPT. Pre-trained weights of CellPLM can be found in CellPLM.
scFoundation relies on its APIs or a local server for access; please refer to scFoundation for details. Details of GeneCompass can be found in GeneCompass.
Please refer to the different folders for the scEval code and the metrics we used to evaluate single-cell LLMs under different tasks. In general, we list the tasks and corresponding metrics here:
Tasks | Metrics |
---|---|
Batch Effect Correction, Multi-omics Data Integration and Simulation | scIB |
Cell-type Annotation and Gene Function Prediction | Accuracy, Precision, Recall and F1 score |
Imputation | scIB, Correlation |
Perturbation Prediction | Correlation |
Gene Network Analysis | Jaccard similarity |
The file 'sceval_lib.py' includes all of the metrics we used in this project.
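To make two of the metrics in the table concrete, here is a small, dependency-free sketch of Jaccard similarity between two gene networks (treated as sets of undirected edges) and of a correlation score between predicted and observed values. This is not the implementation in sceval_lib.py: the function names are hypothetical, and the choice of Pearson correlation and of undirected edges are assumptions made for illustration only.

```python
import math


def jaccard_similarity(edges_a, edges_b) -> float:
    """Jaccard similarity |A & B| / |A | B| over undirected gene-gene edges."""
    # frozenset makes ("A", "B") and ("B", "A") the same edge.
    a = {frozenset(edge) for edge in edges_a}
    b = {frozenset(edge) for edge in edges_b}
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty networks are identical
    return len(a & b) / len(union)


def pearson_correlation(x, y) -> float:
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy example with made-up gene names and values.
network_a = [("G1", "G2"), ("G2", "G3"), ("G3", "G4")]
network_b = [("G2", "G1"), ("G3", "G4"), ("G4", "G5")]
print(jaccard_similarity(network_a, network_b))
print(pearson_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

In practice, library implementations (for example from SciPy or scikit-learn) are usually preferable; the pure-Python version is only meant to show what each score measures.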
To run the code for a given task, use the corresponding script (batch effect correction with scGPT is shown here as an example):
python sceval_batcheffect.py
We recommend evaluating the methods directly from their outputs (saved as .h5ad files), which can be done easily with the code in sceval_method.py.
We offer demo datasets for batch effect correction and cell type annotation. Such datasets can be found here.
To avoid using wandb, please set:
os.environ["WANDB_MODE"] = "offline"
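For completeness, a minimal sketch of where this setting would typically go: the variable needs to be set in the process environment before wandb starts a run, so placing it at the very top of the script (before importing wandb or any module that calls it) is the safest option. In offline mode wandb still writes run logs to local disk but does not upload them.

```python
import os

# Set the mode before wandb (or a library that uses it) initializes a run.
os.environ["WANDB_MODE"] = "offline"

# Any code that starts a wandb run after this point logs locally only.
print("WANDB_MODE =", os.environ["WANDB_MODE"])
```

The same effect can be achieved without editing code by exporting `WANDB_MODE=offline` in the shell before launching the script.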
We will upload our code for benchmarking different foundation models soon.
We recommend using a server to run the benchmarked methods and the scEval platform. To run single-cell foundation models, a GPU (A100 or newer) and 40+ GB of memory are required. To run scEval alone (evaluation only), 40+ GB of memory is recommended.
We have an official website that summarizes our work. Please use this link for access.
Please contact [email protected] if you have any questions about this project.
@article{liu2023evaluating,
title={Evaluating the Utilities of Foundation Models in Single-cell Data Analysis},
author={Liu, Tianyu and Li, Kexing and Wang, Yuge and Li, Hongyu and Zhao, Hongyu},
journal={bioRxiv},
pages={2023--09},
year={2023},
publisher={Cold Spring Harbor Laboratory}
}