This Snakemake workflow is designed to split one or more scATAC-seq BAM files into pseudobulk replicates, each containing n cells.
- For each BAM file, count the number of unique occurrences of each cell barcode
- Assign each cell barcode a label corresponding to the pseudobulk replicate it will belong to (see the sketch after this list)
- Split each BAM file into pseudobulk BAM files using Sinto to separate reads by cell barcode
- Generate indexes for the pseudobulk BAM files
- Generate bigWigs for the pseudobulks
- Call peaks for the pseudobulks
- Create a metadata file summarising the number of pseudobulks created from each input BAM file
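
The counting and labelling steps could look roughly like the Python sketch below. This is illustrative only: the CB barcode tag, the file names, the min_reads cutoff and the group size of 500 are assumptions, not taken from the workflow's own scripts.

    # A minimal sketch, not the workflow's own code. Cell barcodes are
    # assumed to be stored in the CB tag; paths, the min_reads cutoff and
    # the group size are hypothetical placeholders.
    from collections import Counter

    import pysam

    def count_barcodes(bam_path):
        """Count the number of reads carrying each cell barcode."""
        counts = Counter()
        with pysam.AlignmentFile(bam_path, "rb") as bam:
            for read in bam:
                if read.has_tag("CB"):
                    counts[read.get_tag("CB")] += 1
        return counts

    def assign_pseudobulks(counts, n, min_reads=100):
        """Label barcodes so that each pseudobulk replicate holds n cells.

        Returns (barcode, label) pairs that could be written to a
        tab-separated file and passed to Sinto to split the BAM file.
        Any remainder of fewer than n cells ends up in a final, smaller
        group, which the real workflow may handle differently.
        """
        barcodes = [bc for bc, count in counts.items() if count >= min_reads]
        return [(bc, f"pseudobulk_{i // n + 1}") for i, bc in enumerate(barcodes)]

    if __name__ == "__main__":
        barcode_counts = count_barcodes("sample.bam")
        with open("sample_barcodes.tsv", "w") as out:
            for barcode, label in assign_pseudobulks(barcode_counts, n=500):
                out.write(f"{barcode}\t{label}\n")
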
Set up the pseudobulk conda environment using workflow/envs/pseudobulk_env.yaml
conda env create --name pseudobulk --file=workflow/envs/pseudobulk_env.yaml
Edit config/config.yaml to set the input data and parameters
The pipeline can then either be run manually from the command line or scheduled to run on a cluster via Slurm.
Open a terminal and navigate to the folder containing the workflow, e.g.
cd user/path_to_projects/scATAC_Pseudobulk_Pipeline
Activate the conda environment containing the dependencies:
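conda activate pseudobulk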
Run the workflow via the command below, setting n to the number of cores to use:
snakemake --cores n
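For example, to run with 8 cores:
snakemake --cores 8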
The pipeline can be scheduled to run on a cluster using the file submit.sh
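For example, assuming a standard Slurm setup, the job can be submitted with:
sbatch submit.sh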