scEpiGenie is a comprehensive suite of tools and workflows for analyzing single-cell ATAC-seq (scATAC-Seq) data generated with 10X Genomics sequencing technology, supporting human and mouse genome cohorts. scEpiGenie is an initiative of the Bioinformatics Core at the Department of Developmental Neurobiology at St. Jude Children's Research Hospital.
- Getting Started
- Installation
- Tutorial and Documentation
- How to Use the Repository
- Requesting Resources from the HPCF Cluster
To begin using the scEpiGenie workflow, follow the instructions below to set up the environment and run the code. A pre-built Docker image is available for easy setup, containing all the necessary tools, packages, and dependencies to seamlessly run the code and analysis modules.
Currently under construction. Stay tuned! 🚧🚧🚧
The pipeline requires a TSV file containing essential metadata for cohort analysis. The file must be named project_metadata.tsv. It can include one or more samples, as long as it contains at least the following columns in this exact order: ID, SAMPLE, and FASTQ. The ID column must contain unique values. The SAMPLE column must contain the seq_submission_code along with the ID, e.g., seq_submission_code1_sample1, or the corresponding library name. The FASTQ column must contain the file path to the FASTQ files. For samples with top-ups or multiple technical replicates, list all associated library names and FASTQ file paths in the same row, using commas to separate the entries. Additional metadata columns are optional and can be added and arranged as needed.
The file can be stored anywhere, but its file path must be specified in the project_parameters.Config.yaml file.
For user convenience, an example project_metadata.tsv file is provided.
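The column rules above can be checked before launching the pipeline. The following is a minimal sketch, not part of the scEpiGenie codebase: the function name validate_metadata, the CONDITION column, and every value in the example rows are hypothetical placeholders. It only checks the order of the first three columns and the uniqueness of the ID column.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write a small example file; all values below are hypothetical placeholders.
tmpdir=$(mktemp -d)
metadata="$tmpdir/project_metadata.tsv"
printf 'ID\tSAMPLE\tFASTQ\tCONDITION\n' > "$metadata"
printf 'sample1\tseq_submission_code1_sample1\t/path/to/fastq/run1\tcontrol\n' >> "$metadata"
# A sample with a top-up: library names and FASTQ paths are comma-separated in one row.
printf 'sample2\tseq_submission_code1_sample2,seq_submission_code2_sample2\t/path/to/fastq/run1,/path/to/fastq/run2\ttreated\n' >> "$metadata"

# validate_metadata FILE
# Returns 0 if the first three columns are ID, SAMPLE, FASTQ (in that order)
# and every value in the ID column is unique; returns 1 otherwise.
validate_metadata() {
  awk -F'\t' '
    NR == 1 {
      if ($1 != "ID" || $2 != "SAMPLE" || $3 != "FASTQ") {
        print "ERROR: first three columns must be ID, SAMPLE, FASTQ"
        bad = 1
        exit
      }
      next
    }
    seen[$1]++ { print "ERROR: duplicate ID: " $1; bad = 1 }
    END { exit bad + 0 }
  ' "$1"
}

if validate_metadata "$metadata"; then
  result="valid"
else
  result="invalid"
fi
echo "$metadata: $result"
```

The check deliberately ignores any extra columns after FASTQ, since the text above allows users to add and arrange them freely.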
We recommend that users fork the sc-epigenie repository and then clone their forked repository to their local machine. Team members should use the stjude-dnb-binfcore account, while others can use their preferred GitHub account. We welcome collaborations, so please feel free to reach out if you're interested in being added to the stjude-dnb-binfcore account.
- Fork the repository
Navigate to the main page of the stjude-dnb-binfcore/sc-epigenie repository and click the "Fork" button.

- Create Your Fork
You can optionally rename the forked repository; a distinct name is only needed if you plan to create forks for multiple projects. Click "Create fork" to proceed.

- Enjoy your new project repo!

- Clone Your Fork
Once you have created the fork, clone it to your local machine:
git clone https://github.com/<FORK_NAME>.git
- Configure Your Parameters
Update the project_parameters.Config.yaml file with your own file paths and parameters.
- Navigate to an Analysis Module
Change to the relevant directory and run the desired shell script:
cd ./sc-epigenie/analyses/<module_of_interest>
- Sync Your Fork
Make sure the main branch of your forked repository stays up to date with stjude-dnb-binfcore/sc-epigenie:main.
If your fork is behind the main repository (stjude-dnb-binfcore/sc-epigenie:main), sync it to get the latest updates. Syncing updates the main branch of your project repo with the new code and modules (if any). It only adds code and does not break any analyses already run in your project repo.
When syncing your forked repository with the main repository, please be cautious of any changes made to the following files, as they are typically modified and specified for project data analysis:
project_parameters.Config.yaml
Before pulling the latest changes, stash any modifications you have made to these files. This ensures that you won't accidentally overwrite your changes when syncing with the main repository.
Some useful git commands:
git branch
git checkout main
git config pull.rebase false
git status
git add project_parameters.Config.yaml
git commit -m "Update yaml"
Finally, run git pull to get the latest changes and code in your project repo. Be mindful of any local changes you have made to files in your project repo, e.g., project_parameters.Config.yaml. You will need to commit, stash, or restore the changes to the YAML file before completing the pull.
git pull
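The stash-then-pull sequence described above can be tried safely on throwaway repositories before running it on a real project repo. The sketch below is an illustration under stated assumptions: it builds a local stand-in for the main repository in a temporary directory instead of contacting GitHub, and the YAML key metadata_dir, the file NEW_MODULE.md, and the demo identity are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

work=$(mktemp -d)
cd "$work"

# Local stand-in for stjude-dnb-binfcore/sc-epigenie (hypothetical content).
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/main
cd upstream
git config user.name "Demo User"
git config user.email "demo@example.com"
printf 'metadata_dir: "./data"\n' > project_parameters.Config.yaml
git add project_parameters.Config.yaml
git commit -q -m "Initial commit"
cd ..

# Stand-in for your cloned fork.
git clone -q upstream fork
cd fork
git config user.name "Demo User"
git config user.email "demo@example.com"
git config pull.rebase false

# Project-specific edit to the YAML that should survive the sync.
printf 'metadata_dir: "/my/project/data"\n' > project_parameters.Config.yaml

# Meanwhile, the main repository gains new code.
cd ../upstream
echo "new module" > NEW_MODULE.md
git add NEW_MODULE.md
git commit -q -m "Add new module"
cd ../fork

# Stash local edits, pull the latest changes, then restore the edits.
git stash -q
git pull -q origin main
git stash pop -q

git status --short
```

In a real project repo only the last three git commands are needed, run from the main branch after syncing the fork. If the same lines of the YAML were also changed in the main repository, git stash pop reports a conflict that has to be resolved by hand.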
While we provide estimates for the computational resources required (based on 4 samples with approximately 80,000 cells), users may need to adjust memory settings based on cohort size and analysis requirements.
Important Considerations:
- Adjust memory requests according to the size of your cohort and specific analysis needs.
- For St. Jude users:
- Refer to the Introduction to the HPCF cluster for detailed guidance.
- If you require more than 1 TB of memory, use the large_mem queue to ensure proper resource allocation.
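The queue rule above can be written as a small helper for job submission scripts. This is a sketch only: the function name choose_queue and the default queue name standard are hypothetical, 1 TB is taken to mean 1024 GB, and the scheduler command and its flags are intentionally left out because they depend on the cluster setup.

```shell
#!/usr/bin/env bash
set -euo pipefail

# choose_queue MEM_GB
# Print the queue name for a memory request given in whole gigabytes.
# Assumption: 1 TB = 1024 GB; "standard" is a hypothetical default queue name.
choose_queue() {
  local mem_gb=$1
  if [ "$mem_gb" -gt 1024 ]; then
    echo "large_mem"
  else
    echo "standard"
  fi
}

# Example requests: a modest job and one above the 1 TB threshold.
for mem in 256 2048; do
  echo "${mem} GB -> $(choose_queue "$mem")"
done
```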
├── analyses
│   ├── cellranger-analysis
│   ├── cluster-cell-calling
│   ├── fastqc-analysis
│   ├── integrative-analysis
│   ├── project-updates
│   ├── README.md
│   └── upstream-analysis
├── data
├── figures
├── LICENSE
├── project_parameters.Config.yaml
├── README.md
├── run-container
├── run-rstudio.sh
├── run-terminal.sh
└── SECURITY.md
Contributions, issues, and feature requests are welcome! Feel free to check the issues page.
These tools and pipelines have been developed by the Bioinformatics Core team at St. Jude Children's Research Hospital. These are open-access materials distributed under the terms of the BSD 2-Clause License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.