scNOVA : Single-Cell Nucleosome Occupancy and genetic Variation Analysis - summarized in a single Snakemake pipeline.
PART0. Prerequisite step - preparation of single-cell genetic information:
Mosaicatcher: https://github.com/friendsofstrandseq/mosaicatcher-pipeline
PART1. Read preprocessing
PART2. Read counting to generate single-cell genebody nucleosome occupancy (NO)
PART3. Infer expressed genes of subclones
PART4. Differential expression (DE) analysis of subclones
PART5. (Optional) Infer single-cell TF motif accessibility using chromVAR (20210108 updated). By default, Roadmap Epigenomics DHS (enhancers) are used to define CREs.
PART6. (Optional) Infer haplotype-resolved genebody NO (20210108 updated)
Main output (GM20509 example data)
- Single-cell level NO table : result/GM20509_sort_geneid.txt (20210108 updated)
- Inferred expression probability for each clone : result_CNN/DNN_train80_output_ypred_clone1_annot.txt, result_CNN/DNN_train80_output_ypred_clone2_annot.txt
- Inferred differential expression table : result/Result_scNOVA_infer_expression_table.txt (20210108 updated)
- Heatmap and UMAP visualization of significant hits : result_plots/Result_scNOVA_plots_GM20509.pdf
- (Optional) Single-cell level TF motif deviation z-score : result/motif_dev_zscore_chromVAR_DHS_2kb_Enh_GM20509.txt (20210108 updated)
- (Optional) Haplotype-resolved NO of genebody and CREs : result_haplo/Deeptool_Genebody_H1H2_sort.txt, result_haplo/Deeptool_DHS_2kb_H1H2_sort.txt (20210108 updated)
This workflow is meant to be run on a Unix-based operating system.
- Unix-based tools : SAMtools/1.3.1-foss-2016b, biobambam2/2.0.76-foss-2016b, deeptools/2.5.1-foss-2016b-Python-2.7.12
- Python packages and GPU libraries : cuDNN, CUDA, TensorFlow, scikit-learn, matplotlib
- R packages : DESeq2, matrixStats, pheatmap, gplots, umap, Rtsne, factoextra, pracma, chromVAR, nabor, motifmatchr
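Before running the workflow, it can help to confirm that the command-line tools are on your PATH. The snippet below is a minimal sketch, not part of scNOVA: `check_tools` is a hypothetical helper, and the executable names in the usage comment are assumptions, since this README lists packages rather than the exact executables the pipeline calls.

```shell
# Sketch only: check_tools is a hypothetical helper, not shipped with scNOVA.
# It prints one "MISSING: <name>" line for every executable that is not on
# PATH and returns non-zero if at least one is missing.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "MISSING: $tool"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Example call (executable names are assumptions; adjust to your installation):
#   check_tools samtools Rscript python
```

This only tests that an executable can be found; it does not check versions or whether the Python and R packages are installed.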
Download this pipeline
- git lfs install
- git clone https://github.com/jeongdo801/scNOVA.git
Preparation of input files
- Add your single-cell bam and index files (input_bam/*.bam)
- Add key result files from the mosaicatcher output to the input_user folder:
  - input_user/simpleCalls_llr4_poppriorsTRUE_haplotagsFALSE_gtcutoff0.05_regfactor6_filterTRUE.txt
  - input_user/strandphaser_output.txt
- Add the subclonality information (input_user/input_subclonality.txt)
- (Optional) Add the genes within copy-number-changed regions so that they are masked in the inferred differential expression result (input_user/input_SV_affected_genes.txt). If this file is not provided, no genes are masked.
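The input layout described above can be checked before launching the pipeline. The following is a minimal sketch, not part of scNOVA: `check_inputs` is a hypothetical helper, it assumes the paths are relative to the cloned repository root, and it treats the mosaicatcher and subclonality files as required and the SV-affected gene list as optional. It does not check the BAM index files, because their naming is not specified here.

```shell
# Sketch only: check_inputs is a hypothetical helper, not shipped with scNOVA.
# Argument: path to the pipeline directory (defaults to the current directory).
check_inputs() {
    local root="${1:-.}"
    local missing=0
    local f
    # Files treated as required in this sketch (names taken from the list above).
    for f in \
        "input_user/simpleCalls_llr4_poppriorsTRUE_haplotagsFALSE_gtcutoff0.05_regfactor6_filterTRUE.txt" \
        "input_user/strandphaser_output.txt" \
        "input_user/input_subclonality.txt"; do
        if [ ! -f "$root/$f" ]; then
            echo "MISSING: $f"
            missing=$((missing + 1))
        fi
    done
    # At least one single-cell BAM file must be present.
    if ! ls "$root"/input_bam/*.bam >/dev/null 2>&1; then
        echo "MISSING: input_bam/*.bam"
        missing=$((missing + 1))
    fi
    # Optional file: without it, no genes are masked.
    if [ ! -f "$root/input_user/input_SV_affected_genes.txt" ]; then
        echo "NOTE: input_user/input_SV_affected_genes.txt not found; genes will not be masked"
    fi
    [ "$missing" -eq 0 ]
}

# Example call from inside the cloned repository:
#   check_inputs . && echo "inputs look complete"
```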
Change the project name in the Snakefile
Launch the run_pipeline.sh script
For information on scTRIP see
Sanders et al., 2019 (doi: https://doi.org/10.1038/s41587-019-0366-x)