scEpiGenie: A Workflow for Single-cell ATAC-seq (scATAC-seq)

scEpiGenie is a comprehensive suite of tools and workflows for analyzing single-cell ATAC-seq (scATAC-seq) data generated with 10x Genomics technology, with support for human and mouse cohorts. scEpiGenie is an initiative of the Bioinformatics Core at the Department of Developmental Neurobiology at St. Jude Children's Research Hospital.

Getting Started

Installation

To begin using the scEpiGenie workflow, follow the instructions below to set up the environment and run the code. A pre-built Docker image is available for easy setup, containing all the necessary tools, packages, and dependencies to seamlessly run the code and analysis modules.

Tutorial and Documentation

Currently under construction. Stay tuned! 🚧🚧🚧

Preparing project metadata

The pipeline requires a TSV file named project_metadata.tsv that contains the essential metadata for cohort analysis. The file can describe one or more samples and must contain at least the following columns, in this exact order:

- ID: must contain unique values.
- SAMPLE: must contain the seq_submission_code together with the ID (e.g., seq_submission_code1_sample1) or the corresponding library name.
- FASTQ: must contain the file path to the FASTQ files.

For samples with top-ups or multiple technical replicates, list all associated library names and FASTQ file paths in the same row, separated by commas. Additional metadata columns are optional and can be added and arranged as needed.

The file can be stored anywhere, but its filepath must be specified in the project_parameters.Config.yaml file.

For user convenience, an example project_metadata.tsv file is provided.
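
As a quick sanity check before running the pipeline, the column rules above can be verified with a small shell function. This is a minimal sketch and not part of sc-epigenie: the function name validate_metadata and the demo values are hypothetical, and it assumes that ID, SAMPLE, and FASTQ are the first three columns of the header. It checks only the header order, ID uniqueness, and that no FASTQ field is empty.

```shell
# Hypothetical helper (not part of sc-epigenie): checks that a metadata TSV
# starts with the ID, SAMPLE, FASTQ columns, that ID values are unique,
# and that every row has a non-empty FASTQ field.
validate_metadata() {
  awk -F'\t' '
    NR == 1 {
      if ($1 != "ID" || $2 != "SAMPLE" || $3 != "FASTQ") {
        print "ERROR: first three columns must be ID, SAMPLE, FASTQ"
        bad = 1
      }
      next
    }
    NF == 0    { next }                                   # skip blank lines
    seen[$1]++ { print "ERROR: duplicate ID: " $1; bad = 1 }
    $3 == ""   { print "ERROR: empty FASTQ for ID: " $1; bad = 1 }
    END        { exit (bad ? 1 : 0) }
  ' "$1"
}

# Demo on a throwaway file with made-up values (all paths are placeholders).
# The second row shows a sample with a top-up: two library names and two
# FASTQ paths in the same row, separated by commas.
demo="$(mktemp)"
printf 'ID\tSAMPLE\tFASTQ\n' > "$demo"
printf 'sample1\tseq_submission_code1_sample1\t/path/to/fastqs/run1\n' >> "$demo"
printf 'sample2\tseq_submission_code1_sample2,seq_submission_code2_sample2\t/path/to/fastqs/run1,/path/to/fastqs/run2\n' >> "$demo"
validate_metadata "$demo" && echo "metadata OK"
rm -f "$demo"
```

To check a real file, call validate_metadata with the path to your own project_metadata.tsv. The function prints one line per problem found and returns a non-zero exit status if any check fails.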

How to Use the Repository

Accessing the Code

We recommend that users fork the sc-epigenie repository and then clone their forked repository to their local machine. Team members should use the stjude-dnb-binfcore account, while others can use their preferred GitHub account. We welcome collaborations, so please feel free to reach out if you're interested in being added to the stjude-dnb-binfcore account.

Fork the repository

Navigate to the main page of the stjude-dnb-binfcore/sc-epigenie repository and click the "Fork" button.

Create Your Fork

Optionally, change the name of the forked repository; a distinct name is needed if you plan to use the repository for multiple projects. Click "Create fork" to proceed.

Enjoy your new project repo!

Clone Your Fork

Once you have created the fork, clone it to your local machine:

git clone https://github.com/<FORK_NAME>.git

Running the Code

Configure Your Parameters

Edit the project_parameters.Config.yaml file so that it contains your own file paths and parameters.

Navigate to an Analysis Module

Change to the relevant directory and run the desired shell script:

cd ./sc-epigenie/analyses/<module_of_interest>

Sync Your Fork

Keep the main branch of your forked repository up to date with stjude-dnb-binfcore/sc-epigenie:main.

If your fork is behind the main repository (stjude-dnb-binfcore/sc-epigenie:main), sync it to get the latest updates. Syncing updates the main branch of your project repo with any new code and modules. It adds code and does not break analyses already run in your project repo.

When syncing your forked repository with the main repository, be cautious of changes to the following files, as they are typically customized for each project's data analysis:

project_parameters.Config.yaml

Before pulling the latest changes, stash any modifications you have made to these files. This ensures that you won't accidentally overwrite your changes when syncing with the main repository.

Some useful git commands:

git branch
git checkout main
git config pull.rebase false

git status
git add project_parameters.Config.yaml
git commit -m "Update yaml"

Finally, run git pull to get the latest changes and code in your project repo. Be mindful of any local changes you have made to files in your project repo, e.g., project_parameters.Config.yaml. You will need to commit, stash, or restore the changes to the YAML file before completing the pull.

git pull
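
The stash option mentioned above can be sketched as a small shell function. This is an illustration under stated assumptions, not part of sc-epigenie: the function name pull_keeping_config is hypothetical, it assumes you run it from the root of your project repo, and it assumes a Git version that supports git stash push with a pathspec (2.13 or later).

```shell
# Hypothetical helper (not part of sc-epigenie): pull the latest changes
# while preserving uncommitted edits to the project config file.
pull_keeping_config() {
  local cfg="project_parameters.Config.yaml"
  if git diff --quiet HEAD -- "$cfg"; then
    # No uncommitted edits to the config, so a plain pull is enough.
    git pull
  else
    # Set the local edits aside, pull, then re-apply them on top.
    git stash push -m "local project config" -- "$cfg" &&
      git pull &&
      git stash pop
  fi
}
```

If the incoming changes touch the same lines of the YAML file as your local edits, git stash pop may report a conflict that has to be resolved by hand. In that case the stash entry is kept, so your edits are not lost.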

Requesting CPU and Memory Resources

While we provide estimates for the computational resources required (based on 4 samples with approximately 80,000 cells), users may need to adjust memory settings based on cohort size and analysis requirements.

Important Considerations:

Adjust memory requests according to the size of your cohort and specific analysis needs.
For St. Jude users:
- Refer to the Introduction to the HPCF cluster for detailed guidance.
- If you require more than 1 TB of memory, use the large_mem queue to ensure proper resource allocation.

Below is the main directory structure listing the analyses and data files used in this repository

├── analyses
│   ├── cellranger-analysis
│   ├── cluster-cell-calling
│   ├── fastqc-analysis
│   ├── integrative-analysis
│   ├── project-updates
│   ├── README.md
│   └── upstream-analysis
├── data
├── figures
├── LICENSE
├── project_parameters.Config.yaml
├── README.md
├── run-container
├── run-rstudio.sh
├── run-terminal.sh
└── SECURITY.md

Contact

Contributions, issues, and feature requests are welcome! Please feel free to check issues.

These tools and pipelines have been developed by the Bioinformatics Core team at St. Jude Children's Research Hospital. They are open-access materials distributed under the terms of the BSD 2-Clause License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
