Tutorial for Stanford Sherlock cluster

This tutorial shows how to run pipelines on Sherlock.

All test samples and genome data are already shared on the Stanford Sherlock cluster, which uses SLURM as its job scheduler. You do not need to download any data to test the pipeline there.

SSH to Sherlock's login node.
```
$ ssh login.sherlock.stanford.edu
```

Download Cromwell into your $HOME directory.

```
$ cd
$ wget https://github.com/broadinstitute/cromwell/releases/download/34/cromwell-34.jar
$ chmod +rx cromwell-34.jar
```

Git clone this pipeline and move into its directory.

```
$ cd
$ git clone https://github.com/ENCODE-DCC/chip-seq-pipeline2
$ cd chip-seq-pipeline2
```

Our pipeline supports both Conda and Singularity.

For Conda users

Install Conda. Skip this step if you already have an equivalent Conda distribution (for example, Anaconda Python). Download and run the installer, and type yes to agree to the license terms. The installer then asks for an installation location. On Stanford clusters (Sherlock and SCG4), we recommend installing it outside of your $HOME directory, because the $HOME filesystem is slow and has very limited space. At the end of the installation, choose yes to add Miniconda's binaries to $PATH in your Bash startup script.
```
$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ bash Miniconda3-latest-Linux-x86_64.sh
```
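
After the installer finishes and you open a new shell, you may want to confirm which conda your $PATH resolves to and whether it ended up under $HOME. The following is a minimal sketch, assuming Bash (`type -P` is a Bash builtin option). The function name `check_conda_location` is illustrative and not part of the pipeline.

```shell
# Hypothetical helper: show which conda executable PATH resolves to and
# warn if it lives under $HOME.
check_conda_location() {
    local conda_bin
    # type -P searches PATH only, ignoring shell functions and aliases.
    if ! conda_bin=$(type -P conda); then
        echo "conda not found on PATH"
        return 1
    fi
    case "$conda_bin" in
        "$HOME"/*) echo "warning: conda is under \$HOME: $conda_bin" ;;
        *)         echo "ok: $conda_bin" ;;
    esac
}
```

Calling `check_conda_location` with no arguments prints one line describing what it found.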

Install Conda dependencies.

```
$ bash conda/uninstall_dependencies.sh  # to remove any existing pipeline env
$ bash conda/install_dependencies.sh
```

Run a pipeline for the test sample.

```
$ sbatch --partition normal examples/sherlock/ENCSR936XTK_subsampled_chr19_only_sherlock_conda.sh
```

For Singularity users

Run a pipeline for the test sample.

```
$ sbatch --partition normal examples/sherlock/ENCSR936XTK_subsampled_chr19_only_sherlock_singularity.sh
```

For all users

The test run takes about an hour. All outputs are written to cromwell-executions/chip/[RANDOM_HASH_STRING]/. See the output directory structure documentation for details. You can monitor your jobs with the following command:
```
$ squeue -u $USER
```
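
If you would rather wait until your jobs have left the queue than re-run squeue by hand, a polling loop is one option. The following is a minimal sketch and not part of the pipeline: the function name `wait_for_jobs` and the 60-second default interval are illustrative choices, and it assumes `squeue` is on your $PATH. It watches all of your jobs, not only the ones submitted by the pipeline.

```shell
# Hypothetical helper: poll squeue until the current user has no jobs left.
# Usage: wait_for_jobs [INTERVAL_SECONDS]
wait_for_jobs() {
    local interval="${1:-60}"
    local user="${USER:-$(id -un)}"
    local remaining
    while true; do
        # -h drops the header line, so each output line is one job.
        remaining=$(( $(squeue -h -u "$user" | wc -l) ))
        if [ "$remaining" -eq 0 ]; then
            echo "no jobs left in the queue"
            return 0
        fi
        echo "$remaining job(s) still queued or running"
        sleep "$interval"
    done
}
```

Run it in a terminal multiplexer or a separate login session, since it blocks until the queue is empty.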
See the full specification for the input JSON file.
You can resume a failed pipeline from where it left off by using the PIPELINE_METADATA (metadata.json) file, which is created for each pipeline run. See the resumer documentation for details. Once the resumer gives you a new input JSON file, edit your shell script (examples/sherlock/ENCSR936XTK_subsampled_chr19_only_sherlock_*.sh) so that it uses INPUT=resume.[FAILED_WORKFLOW_ID].json instead of INPUT=examples/....
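
The edit described above can also be scripted. This is a sketch under one assumption taken from the text: the example script assigns its input file on a line that begins with `INPUT=`. The function name `use_resume_input` is hypothetical. It writes a modified copy and leaves the original script untouched.

```shell
# Hypothetical helper: copy a submission script, pointing INPUT= at a new JSON.
# Usage: use_resume_input ORIGINAL_SCRIPT RESUME_JSON NEW_SCRIPT
use_resume_input() {
    local script="$1" resume_json="$2" out="$3"
    # Fail early if the assumed INPUT= line is not there.
    if ! grep -q '^INPUT=' "$script"; then
        echo "no line starting with INPUT= in $script" >&2
        return 1
    fi
    # Replace the whole INPUT= line. '|' is the sed delimiter because the
    # JSON path may contain '/'; paths containing '|' or '&' are not handled.
    sed "s|^INPUT=.*|INPUT=${resume_json}|" "$script" > "$out"
}
```

Pass the original example script, the JSON file produced by the resumer, and a name for the new script, then submit the new script with sbatch as before.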

For Singularity users

IF YOU WANT TO RUN PIPELINES WITH YOUR OWN INPUT DATA OR GENOME DATABASE, PLEASE ADD THEIR DIRECTORIES TO workflow_opts/sherlock.json. For example, if your input FASTQs are in /your/input/fastqs/ and your genome database is installed in /your/genome/database/, add /your/ to singularity_bindpath. You can list multiple directories there, separated by commas.
```
{
    "default_runtime_attributes" : {
        "singularity_container" : "~/.singularity/chip-seq-pipeline-v1.1.7.simg",
        "singularity_bindpath" : "/scratch,/lscratch,/oak/stanford,/home/groups/cherry/encode,/your/,YOUR_OWN_DATA_DIR1,YOUR_OWN_DATA_DIR2,..."
    }
}
```
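
Before submitting, it can help to confirm that every data directory you use falls under one of the entries in singularity_bindpath. The following sketch does that with plain string matching on the comma-separated value. `is_under_bindpath` is a hypothetical helper, not something shipped with the pipeline. It assumes each entry is a plain directory path, and it does not resolve symlinks or check that the paths exist.

```shell
# Hypothetical helper: succeed if PATH_TO_CHECK lies under any bindpath entry.
# Usage: is_under_bindpath PATH_TO_CHECK "dir1,dir2,..."
is_under_bindpath() {
    local target="${1%/}" bindpath="$2"
    local entry entries
    # Split the comma-separated list into an array.
    IFS=',' read -r -a entries <<< "$bindpath"
    for entry in "${entries[@]}"; do
        entry="${entry%/}"            # ignore a trailing slash, e.g. /your/
        [ -z "$entry" ] && continue   # skip empty entries
        # Match the entry itself or anything below it, but not /yourdata
        # when the entry is /your.
        case "$target" in
            "$entry"|"$entry"/*) return 0 ;;
        esac
    done
    return 1
}
```

For instance, `is_under_bindpath /your/input/fastqs/ "/scratch,/lscratch,/your/"` succeeds, while a path such as /yourdata/x does not match the /your/ entry.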
