Chipseq updates #111
base: main
Changes from all commits
d3a1280
a7f0d6b
cbb83af
d98a1a9
da5b3c1
a51216f
40ef841
(Diff for the chipseq_standard workflow)
@@ -36,14 +36,15 @@ import "../../tools/picard.wdl"
 import "../../tools/samtools.wdl"
 import "../../tools/util.wdl"
 import "../general/bam-to-fastqs.wdl" as b2fq
-import "https://raw.githubusercontent.com/stjude/seaseq/2.3/workflows/workflows/mapping.wdl" as seaseq_map
-import "https://raw.githubusercontent.com/stjude/seaseq/3.0/workflows/tasks/samtools.wdl" as seaseq_samtools
-import "https://raw.githubusercontent.com/stjude/seaseq/3.0/workflows/tasks/seaseq_util.wdl" as seaseq_util
+import "https://raw.githubusercontent.com/stjude/seaseq/3.1/workflows/workflows/mapping.wdl" as seaseq_map
+import "https://raw.githubusercontent.com/stjude/seaseq/3.1/workflows/tasks/samtools.wdl" as seaseq_samtools
+import "https://raw.githubusercontent.com/stjude/seaseq/3.1/workflows/tasks/seaseq_util.wdl" as seaseq_util
 
 workflow chipseq_standard {
     parameter_meta {
         bam: "Input BAM format file to realign with bowtie"
         bowtie_indexes: "Database of v1 reference files for the bowtie aligner. Can be generated with https://github.com/stjude/seaseq/blob/master/workflows/tasks/bowtie.wdl. [*.ebwt]"
+        paired_end: "Is the data paired-end (true) or single-end (false)?"
         excludelist: "Optional list of regions that will be excluded after reference alignment"
         prefix: "Prefix for output files"
         validate_input: "Run Picard ValidateSamFile on the input BAM"
@@ -55,6 +56,7 @@ workflow chipseq_standard {
     input {
         File bam
         Array[File] bowtie_indexes
+        Boolean paired_end = false
         File? excludelist
         String prefix = basename(bam, ".bam")
         Boolean validate_input = true
@@ -89,7 +91,7 @@ workflow chipseq_standard {
 
     call b2fq.bam_to_fastqs { input:
         bam=selected_bam,
-        paired_end=false,
+        paired_end=paired_end,
         use_all_cores=use_all_cores,
         max_retries=max_retries
     }
@@ -105,29 +107,31 @@ workflow chipseq_standard {
         max_retries=max_retries
     }
 
-    scatter (pair in zip(bam_to_fastqs.read1s, read_groups)){
+    scatter (tuple in zip(zip(bam_to_fastqs.read1s, bam_to_fastqs.read2s), read_groups)){
Review thread:
- Are you positive this works for all combinations of inputs? I thought WDL enforced "every array in a zip must be of equal length". I think this line would break under certain circumstances. Or maybe my understanding of […]
- I need to run some additional test data through, but for both single-end and paired-end data so far, that argument works.
- Can you point me to an input JSON you're using for both PE and SE? I want to walk myself through some concrete examples, because I'm confused how this […]
- Sure. I just moved these from my home directory. I did a […]
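To make the discussion concrete, here is a minimal, hypothetical WDL sketch (workflow and variable names are invented, not part of this PR) of the nested zip pattern. It assumes, as the bam_to_fastqs changes further down suggest, that read2s comes back as an Array[File?] of the same length as read1s, with undefined entries for single-end data, which is what satisfies zip's equal-length requirement.

```wdl
version 1.1

# A minimal, hypothetical sketch (names invented) of the nested zip being
# discussed. zip() does require equal-length arrays; the pattern holds because
# read2s is assumed to be an Array[File?] of the same length as read1s, with
# undefined entries for single-end data.
workflow nested_zip_sketch {
    input {
        Array[File] read1s
        Array[File?] read2s       # same length as read1s; entries may be None
        Array[String] read_groups # same length as read1s
    }

    # zip(read1s, read2s)      -> Array[Pair[File, File?]]
    # zip(<that>, read_groups) -> Array[Pair[Pair[File, File?], String]]
    scatter (tuple in zip(zip(read1s, read2s), read_groups)) {
        File r1 = tuple.left.left     # R1 = left of the inner pair
        File? r2 = tuple.left.right   # R2 = right of the inner pair (may be None)
        String rg = tuple.right       # read group paired with this FASTQ
    }

    output {
        Array[File] r1s = r1
        Array[File?] r2s = r2
        Array[String] rgs = rg
    }
}
```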
         call seaseq_util.basicfastqstats as basic_stats { input:
-            fastqfile=pair.left
+            fastqfile=tuple.left.left
         }
-        call seaseq_map.mapping as bowtie_single_end_mapping { input:
-            fastqfile=pair.left,
+        call seaseq_map.mapping as bowtie_mapping { input:
+            fastqfile=tuple.left.left, # the FASTQ pair is the left of the first pair, then it is R1 = left, R2 = right in the nested pair
+            fastqfile_R2=tuple.left.right,
             index_files=bowtie_indexes,
             metricsfile=basic_stats.metrics_out,
-            blacklist=excludelist
+            blacklist=excludelist,
+            paired_end=paired_end
         }
         File chosen_bam = select_first(
             [
-                bowtie_single_end_mapping.bklist_bam,
-                bowtie_single_end_mapping.mkdup_bam,
-                bowtie_single_end_mapping.sorted_bam
+                bowtie_mapping.bklist_bam,
+                bowtie_mapping.mkdup_bam,
+                bowtie_mapping.sorted_bam
             ]
         )
         call util.add_to_bam_header { input:
             bam=chosen_bam,
-            additional_header=pair.right,
+            additional_header=tuple.right,
             max_retries=max_retries
         }
-        String rg_id_field = sub(sub(pair.right, ".*ID:", "ID:"), "\t.*", "")
+        String rg_id_field = sub(sub(tuple.right, ".*ID:", "ID:"), "\t.*", "")
         String rg_id = sub(rg_id_field, "ID:", "")
         call samtools.addreplacerg as single_end { input:
Review comment: Rename […]
             bam=add_to_bam_header.reheadered_bam,
@@ -159,7 +163,16 @@ workflow chipseq_standard {
         use_all_cores=use_all_cores,
         max_retries=max_retries
     }
-    call picard.validate_bam { input: bam=markdup.mkdupbam, max_retries=max_retries }
+    call picard.validate_bam { input:
+        bam=markdup.mkdupbam,
+        ignore_list=["MISSING_PLATFORM_VALUE",
+            "INVALID_PLATFORM_VALUE",
+            "INVALID_MAPPING_QUALITY",
+            "MATES_ARE_SAME_END",
+            "MISMATCH_FLAG_MATE_NEG_STRAND",
+            "MISMATCH_MATE_ALIGNMENT_START"],
Comment on lines +170 to +173. Review thread:
- These highlighted 4 feel like things that shouldn't be ignored. Are BAMs really coming out of SEAseq with all these problems? Should that be addressed instead of ignored here?
- It's an issue with […]
- Interesting. Is this a known and logged bug in […]? I'd much rather fix these errors over simply ignoring them. If I'm understanding correctly, this update (and possibly the current version) is outputting BAMs which don't follow the SPEC. I'm not a fan of passing the buck off to […]. My two cents: we either need an update/rollback on […]
- I believe the solution is to use […]
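For context on what the ignore_list does, here is a hypothetical sketch (not the repository's picard.wdl; the task name is invented) of how such a list could be forwarded to Picard ValidateSamFile: each entry becomes an IGNORE= argument, so the listed validation error types are suppressed rather than failing the run. It assumes picard is available on the PATH.

```wdl
version 1.1

# Hypothetical sketch of forwarding an ignore_list to Picard ValidateSamFile.
# Each list entry is turned into an IGNORE= argument.
task validate_bam_sketch {
    input {
        File bam
        Array[String] ignore_list = []
    }

    command <<<
        picard ValidateSamFile \
            I=~{bam} \
            MODE=SUMMARY \
            ~{sep=" " prefix("IGNORE=", ignore_list)}
    >>>

    output {
        File report = stdout()
    }
}
```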
+        max_retries=max_retries
+    }
 
     call md5sum.compute_checksum { input:
         file=markdup.mkdupbam,
(Diff for the bam_to_fastqs workflow)
@@ -54,18 +54,39 @@ workflow bam_to_fastqs {
 
     call samtools.quickcheck { input: bam=bam, max_retries=max_retries }
     call samtools.split { input: bam=bam, use_all_cores=use_all_cores, max_retries=max_retries }
-    scatter (split_bam in split.split_bams) {
-        call samtools.collate_to_fastq as bam_to_fastq { input:
-            bam=split_bam,
-            paired_end=paired_end,
-            interleaved=false, # matches default but prevents user from overriding
-            use_all_cores=use_all_cores,
-            max_retries=max_retries
-        }
-    }
+    if (paired_end){
+        scatter (split_bam in split.split_bams) {
+            call samtools.collate_to_fastq as bam_to_fastq { input:
+                bam=split_bam,
+                paired_end=paired_end,
+                interleaved=false, # matches default but prevents user from overriding
+                use_all_cores=use_all_cores,
+                max_retries=max_retries
+            }
+        }
+    }
 
-    scatter (reads in
-        zip(bam_to_fastq.read_one_fastq_gz, bam_to_fastq.read_two_fastq_gz)
+    if (!paired_end){
+        scatter (split_bam in split.split_bams) {
+            call samtools.bam_to_fastq as bam_to_fastq_se { input:
+                bam=split_bam,
+                paired_end=paired_end,
+                interleaved=false, # matches default but prevents user from overriding
+                output_singletons=true,
+                use_all_cores=use_all_cores,
+                max_retries=max_retries
+            }
+        }
+    }
+
+    Array[File?] r1 = select_first([bam_to_fastq.read_one_fastq_gz, bam_to_fastq_se.interleaved_reads_fastq_gz])
+    Array[File] read1s_ = select_all(r1)
+
+    Array[File?] read2s_ = select_first([bam_to_fastq.read_two_fastq_gz, bam_to_fastq_se.read_two_fastq_gz])
+
+    scatter (reads in
+        zip(read1s_, read2s_)
Review thread:
- Same question about […]. Seems to me this would break in SE mode.
- It was already using a […]
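For anyone walking through the SE case, here is a minimal, hypothetical sketch (workflow and variable names invented) of the branching pattern in this hunk: values produced inside an `if` block are optional outside it, select_first() resolves to whichever branch actually ran, and read 2 remains an Array[File?] of the same length as read 1, which is why the zip() above still lines up for single-end data.

```wdl
version 1.1

# Hypothetical sketch (names invented) of combining conditional scatter outputs.
workflow conditional_fastq_sketch {
    input {
        Array[File] split_bams
        Boolean paired_end = false
    }

    if (paired_end) {
        scatter (b in split_bams) {
            File? pe_r1 = b       # stand-in for the paired-end FASTQ conversion output
            File? pe_r2 = b
        }
    }

    if (!paired_end) {
        scatter (b in split_bams) {
            File? se_r1 = b       # stand-in for the single-end FASTQ conversion output
            File? se_r2 = None    # no mate file for single-end data
        }
    }

    # Exactly one branch is defined; select_first() resolves to it, and
    # select_all() then drops the optional element type for read 1.
    Array[File?] r1 = select_first([pe_r1, se_r1])
    Array[File] read1s = select_all(r1)

    # Read 2 stays Array[File?]: populated for paired-end data, None for
    # single-end data, but always the same length as read1s, which is what
    # keeps a later zip(read1s, read2s) legal even in SE mode.
    Array[File?] read2s = select_first([pe_r2, se_r2])

    output {
        Array[File] read1s_out = read1s
        Array[File?] read2s_out = read2s
    }
}
```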
     ) {
         call fq.fqlint { input:
             read_one_fastq=select_first([reads.left, "undefined"]),
@@ -75,7 +96,7 @@ workflow bam_to_fastqs {
     }
 
     output {
-        Array[File] read1s = select_all(bam_to_fastq.read_one_fastq_gz)
-        Array[File?] read2s = bam_to_fastq.read_two_fastq_gz
+        Array[File] read1s = read1s_
+        Array[File?] read2s = read2s_
     }
 }
Review thread:
- Just curious, did you reorder these to be alphabetical? The style-guide leaves it up to the author how to order parameters of the same type. My process for ordering things is typically: […] paired_end should be first in this block. I'm fine with this being alphabetical, but I don't think any other inputs are. This will likely be the odd one out if left alphabetical, but we intentionally punted the question of "order once type order is satisfied", so I can't complain about anything you do here.
- I did choose alphabetical. That said, in this case, I think it makes sense, as paired_end is an input for the task, where the other arguments more closely correspond with inputs to the underlying tool.
- Ok. The style-guide leaves it up to individual authors, so if you want to write tasks using alphabetic sort at this level, that's fine with me. But I'm going to continue using my (highly subjective) method of sorting inputs, so there will be some degree of inconsistency. I think that's fine. The fine-grain sorting here is fairly inconsequential.