All test samples and genome data are shared on the Stanford SCG cluster (SLURM-based). You don't have to download any data to test our pipeline there.
- SSH to SCG's login node.

```bash
$ ssh login.scg.stanford.edu
```
- Git clone this pipeline and move into it.

```bash
$ git clone https://github.com/ENCODE-DCC/atac-seq-pipeline
$ cd atac-seq-pipeline
```
- Download Cromwell.

```bash
$ wget https://github.com/broadinstitute/cromwell/releases/download/34/cromwell-34.jar
$ chmod +rx cromwell-34.jar
```
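Cromwell is a Java application and needs a Java 8 (or newer) runtime. A quick check, assuming `java` is (or can be made) available on the node; the module name below is a guess at SCG's module system, so adjust it to whatever `module avail java` reports:

```bash
# Check that a Java runtime is on the PATH
$ java -version
# If it is not, load one through the cluster's module system (module name is an assumption)
$ module avail java
$ module load java
```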
- Set your account in `workflow_opts/scg.json`. Ignore the other runtime attributes for Singularity.

```json
{
    "default_runtime_attributes" : {
        "slurm_account" : "YOUR_SLURM_ACCOUNT"
    }
}
```
Our pipeline supports both Conda and Singularity. Follow either the Conda steps or the Singularity steps below.

For Conda users:

- Install Conda dependencies.

```bash
$ bash conda/uninstall_dependencies.sh  # to remove any existing pipeline env
$ bash conda/install_dependencies.sh
```
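To confirm that the environment was created, you can list your Conda environments; the environment name matches the activation command used in the next step.

```bash
# The installer should have created this environment
$ conda env list | grep encode-atac-seq-pipeline
```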
- Run a pipeline for a SUBSAMPLED (1/400) paired-end sample of ENCSR356KRQ.

```bash
$ source activate encode-atac-seq-pipeline  # IMPORTANT!
$ INPUT=examples/scg/ENCSR356KRQ_subsampled_scg.json
$ java -jar -Dconfig.file=backends/backend.conf -Dbackend.default=slurm cromwell-34.jar run atac.wdl -i ${INPUT} -o workflow_opts/scg.json
```
- It will take about an hour. You will be able to find all outputs in `cromwell-executions/atac/[RANDOM_HASH_STRING]/`. See the output directory structure documentation for details.

- See the full specification for the input JSON file.
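The subsampled example used above also works as a template for your own samples; inspecting it shows which keys the pipeline expects (see the full specification for their exact meanings).

```bash
# Look at the example input JSON shipped with the pipeline
$ cat examples/scg/ENCSR356KRQ_subsampled_scg.json
```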
For Singularity users:

- Pull a Singularity container for the pipeline. This will pull the pipeline's Docker container first and build a Singularity image in `~/.singularity`.

```bash
$ SINGULARITY_PULLFOLDER=~/.singularity singularity pull docker://quay.io/encode-dcc/atac-seq-pipeline:v1.1
```
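The pull can take a while; once it finishes, confirm that the image file was built under the pull folder (the exact `.simg` filename comes from Singularity's default naming for this tag).

```bash
# The built image should appear here
$ ls -lh ~/.singularity/
```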
- Run a pipeline for a SUBSAMPLED (1/400) paired-end sample of ENCSR356KRQ.

```bash
$ INPUT=examples/scg/ENCSR356KRQ_subsampled_scg.json
$ java -jar -Dconfig.file=backends/backend.conf -Dbackend.default=slurm_singularity cromwell-34.jar run atac.wdl -i ${INPUT} -o workflow_opts/scg.json
```
- It will take about an hour. You will be able to find all outputs in `cromwell-executions/atac/[RANDOM_HASH_STRING]/`. See the output directory structure documentation for details.

- See the full specification for the input JSON file.
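After a run completes, a quick way to locate the aggregated QC report under the execution directory (this pipeline typically writes it as `qc.html`; adjust the pattern if your version names it differently):

```bash
# Find the final HTML QC report for all finished runs
$ find cromwell-executions/atac/ -name "qc.html"
```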
- IF YOU WANT TO RUN PIPELINES WITH YOUR OWN INPUT DATA/GENOME DATABASE, PLEASE ADD THEIR DIRECTORIES TO `workflow_opts/scg.json`. For example, if you have input FASTQs in `/your/input/fastqs/` and a genome database installed in `/your/genome/database/`, then add `/your/` to `--bind` in `singularity_command_options`. You can also define multiple directories there, separated by commas.

```json
{
    "default_runtime_attributes" : {
        "singularity_container" : "~/.singularity/atac-seq-pipeline-v1.1.simg",
        "singularity_command_options" : "--bind /ifs/scratch,/srv/gsfs0,/your/,YOUR_OWN_DATA_DIR1,YOUR_OWN_DATA_DIR2,..."
    }
}
```
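If you are unsure whether a data directory will be visible inside the container, a quick sanity check is to run `ls` through the image with the same bind path (the paths below are the placeholder examples from above):

```bash
# Confirm that the bound directory is readable from inside the container
$ singularity exec --bind /your/ ~/.singularity/atac-seq-pipeline-v1.1.simg ls /your/input/fastqs/
```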
- If you want to run multiple (>10) pipelines, then run a Cromwell server on an interactive node. We recommend using `screen` or `tmux` to keep your session alive (a short `screen` sketch follows the server commands below), and note that all running pipelines will be killed when the job's walltime expires. Run a Cromwell server with the following commands.

```bash
$ srun -n 2 --mem 5G -t 3-0 --qos normal --account [YOUR_SCG_ACCOUNT] --pty /bin/bash -i -l  # 2 CPUs, 5 GB RAM and 3-day walltime
$ hostname -f  # to get [CROMWELL_SVR_IP]
```
For Conda users:

```bash
$ source activate encode-atac-seq-pipeline
$ _JAVA_OPTIONS="-Xmx5G" java -jar -Dconfig.file=backends/backend.conf -Dbackend.default=slurm cromwell-34.jar server
```
For Singularity users:

```bash
$ _JAVA_OPTIONS="-Xmx5G" java -jar -Dconfig.file=backends/backend.conf -Dbackend.default=slurm_singularity cromwell-34.jar server
```
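As a sketch of the `screen` workflow mentioned above (any session name works; `cromwell` is just an example):

```bash
# Start a named screen session on the login node, then run the srun and server commands inside it
$ screen -S cromwell
# ... srun to an interactive node and start the Cromwell server as shown above ...
# Detach with Ctrl-a d; re-attach later with:
$ screen -r cromwell
```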
- You can modify `backend.providers.slurm.concurrent-job-limit` or `backend.providers.slurm_singularity.concurrent-job-limit` in `backends/backend.conf` to increase the maximum number of concurrent jobs. This limit is NOT PER SAMPLE; it applies to all sub-tasks of all submitted samples.
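To find the setting quickly before editing it with your editor of choice:

```bash
# Locate the concurrent-job-limit entries for both SLURM backends
$ grep -n "concurrent-job-limit" backends/backend.conf
```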
- On a login node, submit jobs to the Cromwell server. You will get a `[WORKFLOW_ID]` as a return value. Keep these workflow IDs for monitoring pipelines and finding outputs for a specific sample later.

```bash
$ INPUT=YOUR_INPUT.json
$ curl -X POST --header "Accept: application/json" -v "[CROMWELL_SVR_IP]:8000/api/workflows/v1" \
    -F [email protected] \
    -F workflowInputs=@${INPUT} \
    -F workflowOptions=@workflow_opts/scg.json
```
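The server replies with a small JSON object like `{"id": "...", "status": "Submitted"}`. If `jq` is available on the login node (an assumption), you can capture the ID directly:

```bash
# Submit and store the returned workflow ID in a shell variable
$ WORKFLOW_ID=$(curl -s -X POST --header "Accept: application/json" "[CROMWELL_SVR_IP]:8000/api/workflows/v1" \
    -F [email protected] \
    -F workflowInputs=@${INPUT} \
    -F workflowOptions=@workflow_opts/scg.json | jq -r '.id')
$ echo ${WORKFLOW_ID}
```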
- To monitor pipelines, see the Cromwell server REST API description for more details. `squeue` will not give you enough information for monitoring jobs per sample.

```bash
$ curl -X GET --header "Accept: application/json" -v "[CROMWELL_SVR_IP]:8000/api/workflows/v1/[WORKFLOW_ID]/status"
```
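Cromwell's REST API exposes per-workflow outputs through a similarly shaped endpoint, which is handy for locating result files once a sample finishes:

```bash
# List the output files produced by a finished workflow
$ curl -X GET --header "Accept: application/json" -v "[CROMWELL_SVR_IP]:8000/api/workflows/v1/[WORKFLOW_ID]/outputs"
```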