add workflow: fetch_sra_bams_for_genbank_accession #475

Status: Draft
Wants to merge 30 commits into base: master

Commits (30):
d6eeae2
WIP commit of fetch_sra_runs_for_genbank_accession task and associate…
tomkinsc May 19, 2023
720a47e
pluralize output of fetch_sra_bams_for_genbank_accession workflow
tomkinsc Jun 1, 2023
74d0b79
update ncbi-tools docker image 2.10.8 -> 2.11
tomkinsc Jul 6, 2023
076fbc1
pin ncbi-tools docker image to 2.11.0
tomkinsc Jul 14, 2023
42d4355
update retrieval of SRA accessions given a GenBank accession
tomkinsc Jul 17, 2023
f1cb9dc
bump ncbi-tools 2.11.0 -> 2.11.1
tomkinsc Jul 18, 2023
2e4099c
add workflow to fetch sequences from GenBank: fetch_fasta_for_genbank…
tomkinsc Jul 18, 2023
b1bf8ef
update NCBI esearch parameter name ("-q" -> "-query") to reflect API …
tomkinsc Jul 18, 2023
c26c920
pin broadinstitute/qiime2=latest
tomkinsc Jul 18, 2023
b9d683f
pin docker image to latest in other tasks of tasts_16S_amplicon.wdl
tomkinsc Jul 18, 2023
cb69a63
bump womtool and cromwell 61->85
tomkinsc Jul 18, 2023
cc9524b
do not quit commands in tasks_16S_amplicon on pipefail
tomkinsc Jul 18, 2023
d277bdd
condense GitHub actions script flags
tomkinsc Jul 18, 2023
6015636
bump setup-buildx-action GitHub action version v1->v2
tomkinsc Jul 18, 2023
50e402c
pin quay.io/broadinstitute/qiime2 to specific build hash
tomkinsc Jul 18, 2023
3f42c91
chmod 644 two qiime-related WDLs
tomkinsc Jul 18, 2023
4c8bd10
increase default mem for nextstrain_build_subsample task 50->96GB
tomkinsc Jul 18, 2023
4164239
WIP config to reduce verbosity of cromwell logging
tomkinsc Jul 18, 2023
e274a4f
list on dockstore: fetch_sra_bams_for_genbank_accession, fetch_fasta_…
tomkinsc Jul 19, 2023
64a7d73
cleanup cromwell test dir on failure or exit (unless KEEP_OUTPUT=true)
tomkinsc Jul 19, 2023
a986189
bugfix
tomkinsc Jul 19, 2023
ea4f193
only disable miniwdl post-execution output chown on macOS
tomkinsc Jul 19, 2023
f3b9eb2
add docker action for tests-cromwell; cruft removal;
tomkinsc Jul 19, 2023
8eed3b5
delay cd to test_dir until after jar copy
tomkinsc Jul 19, 2023
1bb6e03
roll back version of ncbi-tools pin
tomkinsc Jul 19, 2023
a624343
Merge branch 'master' into ct-fetch-sra-bams-for-genbank-accession
tomkinsc Jul 19, 2023
9a05bff
(debugging) roll back cromwell test config
tomkinsc Jul 19, 2023
8e1d45e
roll back cromwell 85 -> 61
tomkinsc Jul 19, 2023
e42bf47
DRY: use Fetch_SRA_to_BAM to fetch (and reheader) SRA bams
tomkinsc Jul 21, 2023
8ba6bbc
Merge branch 'master' into ct-fetch-sra-bams-for-genbank-accession
tomkinsc Jul 26, 2023
10 changes: 10 additions & 0 deletions .dockstore.yml
@@ -154,6 +154,16 @@ workflows:
primaryDescriptorPath: /pipes/WDL/workflows/fetch_sra_to_bam.wdl
testParameterFiles:
- empty.json
- name: fetch_sra_bams_for_genbank_accession
subclass: WDL
primaryDescriptorPath: /pipes/WDL/workflows/fetch_sra_bams_for_genbank_accession.wdl
testParameterFiles:
- empty.json
- name: fetch_fasta_for_genbank_accessions
subclass: WDL
primaryDescriptorPath: /pipes/WDL/workflows/fetch_fasta_for_genbank_accessions.wdl
testParameterFiles:
- empty.json
- name: filter_classified_bam_to_taxa
subclass: WDL
primaryDescriptorPath: /pipes/WDL/workflows/filter_classified_bam_to_taxa.wdl
8 changes: 5 additions & 3 deletions .github/workflows/build.yml
@@ -189,7 +189,7 @@ jobs:
run: git fetch --prune --unshallow --tags
- name: Programmatic environment setup
run: |
set -e -x
set -ex
# $GITHUB_ENV is available for subsequent steps
GITHUB_ACTIONS_TAG=$(git describe --tags --exact-match && sed 's/^v//g' || echo '')
echo "GITHUB_ACTIONS_TAG=$GITHUB_ACTIONS_TAG" >> $GITHUB_ENV
@@ -218,6 +218,8 @@ jobs:
shell: bash
run: |
github_actions_ci/install-wdl.sh
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
- name: test with cromwell
shell: bash
run: |
@@ -237,7 +239,7 @@
run: git fetch --prune --unshallow --tags
- name: Programmatic environment setup
run: |
set -e -x
set -ex
# $GITHUB_ENV is available for subsequent steps
GITHUB_ACTIONS_TAG=$(git describe --tags --exact-match && sed 's/^v//g' || echo '')
echo "GITHUB_ACTIONS_TAG=$GITHUB_ACTIONS_TAG" >> $GITHUB_ENV
@@ -269,7 +271,7 @@ jobs:
run: |
pip3 install miniwdl docker[tls] six
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
uses: docker/setup-buildx-action@v2
- name: test with miniwdl
shell: bash
run: |
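The setup steps above hand values to later steps by appending `KEY=value` lines to the file named by `$GITHUB_ENV`; a minimal local sketch (a temp file stands in for the runner-provided path, and the pipe into `sed` is an assumed reading of the step above):

```shell
#!/bin/bash
set -e
# Simulate the GITHUB_ENV handoff: one step appends KEY=value lines,
# and the Actions runner exports them into later steps' environments.
GITHUB_ENV="$(mktemp)"

# "step 1": derive a version tag; empty when HEAD is not exactly on a tag
GITHUB_ACTIONS_TAG=$(git describe --tags --exact-match 2>/dev/null | sed 's/^v//g' || echo '')
echo "GITHUB_ACTIONS_TAG=$GITHUB_ACTIONS_TAG" >> "$GITHUB_ENV"

# "step 2": read back what step 1 exported
grep '^GITHUB_ACTIONS_TAG=' "$GITHUB_ENV"
```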
7 changes: 6 additions & 1 deletion github_actions_ci/check-wdl-runtimes.sh
@@ -4,13 +4,18 @@

echo "Checking wdl container versions against ${MODULE_VERSIONS}"


# this is the newer script that simply validates existing version strings
should_error=false
for task_file in $(ls -1 pipes/WDL/tasks/*.wdl); do
echo "Checking ${task_file}"
while IFS='=' read module version; do
OLD_TAG=$module
NEW_TAG="$module:$version"
if ! grep -q "sha256" <<< "$version"; then
NEW_TAG="$module:$version"
else
NEW_TAG="$module@$version"
fi

offending_lines="$(grep -nE "^[^#]*$OLD_TAG" "${task_file}" | grep -v $NEW_TAG)"

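The tag-versus-digest branch added above can be exercised on its own; a small sketch (module names and the digest are illustrative):

```shell
#!/bin/bash
# Build a docker image reference from a module=version pair, as the
# loop above does: plain versions become tags (image:version), while
# sha256 digests become pinned references (image@sha256:...).
image_ref() {
  local module="$1" version="$2"
  if ! grep -q "sha256" <<< "$version"; then
    echo "$module:$version"
  else
    echo "$module@$version"
  fi
}

image_ref "quay.io/broadinstitute/viral-core" "2.1.33"
image_ref "quay.io/broadinstitute/qiime2" "sha256:b1b8824516dc8b2d"
```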
47 changes: 41 additions & 6 deletions github_actions_ci/tests-cromwell.sh
@@ -1,10 +1,43 @@
#!/bin/bash
set -e # intentionally allow for pipe failures below

mkdir -p workflows
cp *.jar pipes/WDL/workflows/*.wdl pipes/WDL/tasks/*.wdl workflows
cp -r test workflows/
cd workflows
# increase docker timeouts to allow for staging of larger images (seconds)
export DOCKER_CLIENT_TIMEOUT=240
export COMPOSE_HTTP_TIMEOUT=240

starting_dir="$(pwd)"
test_dir="cromwell_testing"

function cleanup(){
echo "Cleaning up from cromwell run; exit code: $?"
cd "$starting_dir"
if [ -d "$test_dir" ] && [[ $KEEP_OUTPUT != "true" ]]; then
rm -r "$test_dir"
fi
}
trap cleanup EXIT SIGINT SIGQUIT SIGTERM

mkdir -p ${test_dir}
cp pipes/WDL/workflows/*.wdl pipes/WDL/tasks/*.wdl $test_dir
sed -i -- 's|import \"../tasks/|import \"|g' ${test_dir}/*.wdl
cp -r test ${test_dir}/

CROMWELL_LOG_LEVEL="${CROMWELL_LOG_LEVEL:=WARN}"

# if "cromwell" exists on the PATH (no .jar file extension suffix)
# it means it was installed from bioconda
if hash cromwell &>/dev/null; then
echo "conda cromwell present";
# this is the bioconda java-launching script
JAVA_ENTRYPOINT="cromwell"
else
# otherwise if cromwell is not installed via conda, call java
JAVA_ENTRYPOINT="java"
cp *.jar ${test_dir}
CROMWELL_JAR_ARG="-jar cromwell.jar"
fi

cd ${test_dir}

for workflow in ../pipes/WDL/workflows/*.wdl; do
workflow_name=$(basename $workflow .wdl)
@@ -13,8 +46,10 @@ for workflow in ../pipes/WDL/workflows/*.wdl; do
date
echo "Executing $workflow_name using Cromwell on local instance"
# the "cat" is to allow a pipe failure (otherwise it halts because of set -e)
java -Dconfig.file=../pipes/cromwell/cromwell.local-github_actions.conf \
-jar cromwell.jar run \
${JAVA_ENTRYPOINT} -Dconfig.file=../pipes/cromwell/cromwell.local-github_actions.conf \
-DLOG_MODE=pretty \
-DLOG_LEVEL=${CROMWELL_LOG_LEVEL} \
${CROMWELL_JAR_ARG} run \
$workflow_name.wdl \
-i $input_json | tee cromwell.out
if [ ${PIPESTATUS[0]} -gt 0 ]; then
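The `tee`/`PIPESTATUS` check in the loop above relies on a bash-specific behavior worth spelling out; a minimal sketch:

```shell
#!/bin/bash
set -e   # intentionally no -o pipefail, matching the script above

# Under plain `set -e` a pipeline's status is that of its LAST command,
# so `workflow_cmd | tee log` never trips -e even when the workflow
# fails. bash's PIPESTATUS array still records each element's code.
sh -c 'exit 3' | tee /dev/null
first=${PIPESTATUS[0]}
echo "workflow exit code: $first"   # prints: workflow exit code: 3
```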
13 changes: 12 additions & 1 deletion github_actions_ci/tests-miniwdl.sh
@@ -1,13 +1,17 @@
#!/bin/bash
set -ex -o pipefail

# increase docker timeouts to allow for staging of larger images (seconds)
export DOCKER_CLIENT_TIMEOUT=240
export COMPOSE_HTTP_TIMEOUT=240

starting_dir="$(pwd)"
test_dir="miniwdl_testing"

function cleanup(){
echo "Cleaning up from miniwdl run; exit code: $?"
cd "$starting_dir"
if [ -d "$test_dir" ]; then
if [ -d "$test_dir" ] && [[ $KEEP_OUTPUT != "true" ]]; then
rm -r "$test_dir"
fi
}
@@ -19,6 +23,13 @@ cd $test_dir

docker --version

if [ "$(uname)" == "Darwin" ]; then
# miniwdl tries to chown output files to the UID
# of the user executing miniwdl, but this can cause problems
# when docker is itself running in a virtualized environment (macOS)
export MINIWDL__FILE_IO__CHOWN=false
fi

# make sure our system has everything it needs to perform "miniwdl run" (e.g. docker swarm works)
miniwdl run_self_test

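Both test scripts share this trap-based cleanup pattern; a self-contained sketch (the `demo_testing` dir name is illustrative):

```shell
#!/bin/bash
# Scratch output is removed on any exit unless KEEP_OUTPUT=true,
# mirroring the cleanup() trap in tests-miniwdl.sh and tests-cromwell.sh.
run_in_scratch() (
  starting_dir="$(pwd)"
  test_dir="demo_testing"

  cleanup() {
    cd "$starting_dir"
    if [ -d "$test_dir" ] && [[ "${KEEP_OUTPUT:-}" != "true" ]]; then
      rm -r "$test_dir"
    fi
  }
  trap cleanup EXIT SIGINT SIGQUIT SIGTERM

  mkdir -p "$test_dir"
  cd "$test_dir"
  echo scratch > scratch.txt   # stand-in for real test work
)

run_in_scratch   # the subshell's EXIT trap fires here
[ -d demo_testing ] && echo "kept" || echo "cleaned"
```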
11 changes: 9 additions & 2 deletions github_actions_ci/version-wdl-runtimes.sh
@@ -3,11 +3,18 @@
# use sed to replace version strings of docker images based on versions defined in txt file

# requires $MODULE_VERSIONS to be set to point to a text file with equal-sign-separated values
# export MODULE_VERSIONS="./requirements-modules.txt" && ./github_actions_ci/check-wdl-runtimes.sh
# export MODULE_VERSIONS="./requirements-modules.txt" && ./github_actions_ci/version-wdl-runtimes.sh

while IFS='=' read module version; do
OLD_TAG=$module
NEW_TAG="$module:$version"
if ! grep -q "sha256" <<< "$version"; then
echo "$module is specified using image tag"
NEW_TAG="$module:$version"
else
echo "$module is specified using image build hash"
NEW_TAG="$module@$version"
fi
echo Replacing $OLD_TAG with $NEW_TAG in all task WDL files
sed -i '' "s|$OLD_TAG[^\"\']*|$NEW_TAG|g" pipes/WDL/tasks/*.wdl

done < $MODULE_VERSIONS
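The rewrite above can be tried on a scratch file. Note `sed -i ''` is the BSD/macOS form of in-place editing; the sketch below uses the GNU form without the empty suffix, with an illustrative digest:

```shell
#!/bin/bash
set -e
# Rewrite an image reference in a WDL-like scratch file, mirroring the
# loop above: match the bare image name and replace the whole
# name[:tag|@digest] token with the pinned reference.
echo 'String docker = "quay.io/broadinstitute/qiime2:old"' > scratch.wdl

OLD_TAG="quay.io/broadinstitute/qiime2"
NEW_TAG="quay.io/broadinstitute/qiime2@sha256:deadbeef"   # illustrative digest
sed -i "s|$OLD_TAG[^\"\']*|$NEW_TAG|g" scratch.wdl

cat scratch.wdl
```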
24 changes: 12 additions & 12 deletions pipes/WDL/tasks/tasks_16S_amplicon.wdl
100755 → 100644
@@ -9,7 +9,7 @@ task qiime_import_from_bam {
Int memory_mb = 7000
Int cpu = 5
Int disk_size_gb = ceil(2*20) + 5
String docker = "quay.io/broadinstitute/qiime2"
String docker = "quay.io/broadinstitute/qiime2@sha256:b1b8824516dc8b2d829cf562d4525d87f0ba5aec0a08a4c63d640eff5f91978b"
}
parameter_meta {
reads_bam: {
@@ -27,7 +27,7 @@ task qiime_import_from_bam {
}

command <<<
set -ex -o pipefail
set -ex

#Part 1A | BAM -> FASTQ [Simple samtools command]
manifest_TSV=manifest.tsv
@@ -86,7 +86,7 @@ task trim_reads {
Int memory_mb = 2000
Int cpu = 4
Int disk_size_gb = ceil(2*size(reads_qza, "GiB")) + 5
String docker = "quay.io/broadinstitute/qiime2"
String docker = "quay.io/broadinstitute/qiime2@sha256:b1b8824516dc8b2d829cf562d4525d87f0ba5aec0a08a4c63d640eff5f91978b"
}
parameter_meta {
reads_qza: {
@@ -119,7 +119,7 @@ task trim_reads {
}
}
command <<<
set -ex -o pipefail
set -ex
qiime cutadapt trim-paired \
--i-demultiplexed-sequences "~{reads_qza}" \
--p-front-f "~{forward_adapter}" \
@@ -160,7 +160,7 @@ task join_paired_ends {
Int memory_mb = 2000
Int cpu = 1
Int disk_size_gb = ceil(2*size(trimmed_reads_qza, "GiB")) + 50
String docker = "quay.io/broadinstitute/qiime2"
String docker = "quay.io/broadinstitute/qiime2@sha256:b1b8824516dc8b2d829cf562d4525d87f0ba5aec0a08a4c63d640eff5f91978b"
}
parameter_meta{
trimmed_reads_qza: {
@@ -177,7 +177,7 @@ task join_paired_ends {
}
}
command <<<
set -ex -o pipefail
set -ex
qiime vsearch join-pairs \
--i-demultiplexed-seqs ~{trimmed_reads_qza} \
--o-joined-sequences "joined.qza"
@@ -210,7 +210,7 @@ task deblur {
Int memory_mb = 2000
Int cpu = 1
Int disk_size_gb = ceil(2*size(joined_end_reads_qza, "GiB")) + 5
String docker = "quay.io/broadinstitute/qiime2"
String docker = "quay.io/broadinstitute/qiime2@sha256:b1b8824516dc8b2d829cf562d4525d87f0ba5aec0a08a4c63d640eff5f91978b"
}
parameter_meta {
joined_end_reads_qza: {
@@ -239,7 +239,7 @@ task deblur {
}
}
command <<<
set -ex -o pipefail
set -ex

qiime deblur denoise-16S \
--i-demultiplexed-seqs ~{joined_end_reads_qza}\
@@ -288,7 +288,7 @@ task train_classifier {
Int memory_mb = 2000
Int cpu = 1
Int disk_size_gb = ceil(2*size(otu_ref, "GiB")) + 5
String docker = "quay.io/broadinstitute/qiime2"
String docker = "quay.io/broadinstitute/qiime2@sha256:b1b8824516dc8b2d829cf562d4525d87f0ba5aec0a08a4c63d640eff5f91978b"
}
parameter_meta{
otu_ref: {
@@ -322,7 +322,7 @@ task train_classifier {
}

command <<<
set -ex -o pipefail
set -ex
CONDA_ENV_NAME=$(conda info --envs -q | awk -F" " '/qiime.*/{ print $1 }')
conda activate ${CONDA_ENV_NAME}

@@ -372,7 +372,7 @@ task tax_analysis {
Int memory_mb = 5
Int cpu = 1
Int disk_size_gb = 375
String docker = "quay.io/broadinstitute/qiime2"
String docker = "quay.io/broadinstitute/qiime2@sha256:b1b8824516dc8b2d829cf562d4525d87f0ba5aec0a08a4c63d640eff5f91978b"
}
parameter_meta{
trained_classifier: {
@@ -397,7 +397,7 @@ task tax_analysis {
}
}
command <<<
set -ex -o pipefail
set -ex
qiime feature-classifier classify-sklearn \
--i-classifier ~{trained_classifier} \
--i-reads ~{representative_seqs_qza} \
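The `train_classifier` task above locates its conda environment by parsing `conda info --envs`; the awk selection can be sanity-checked against a canned listing (the sample environments are illustrative, and the real script pipes `conda info --envs -q` instead):

```shell
#!/bin/bash
# Pick the environment whose name starts with "qiime" out of a
# `conda info --envs`-style listing (canned here for illustration).
envs_listing=$'base                  /opt/conda\nqiime2-2023.2         /opt/conda/envs/qiime2-2023.2'
CONDA_ENV_NAME=$(awk -F" " '/qiime.*/{ print $1 }' <<< "$envs_listing")
echo "$CONDA_ENV_NAME"   # prints: qiime2-2023.2
```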