SRJointCallGVCFsWithGenomicsDB.wdl now prefers dbsnp_vcf to `known_sites_vcf` for genotyping (#480)

- Added ResolveMapKeysInPriorityOrder to resolve keys in a WDL Map object in priority order. This enables fallbacks / preferences for different entries in a map (such as the refmap).
- For genotyping with the gnarly genotyper, the dbsnp resource now prefers dbsnp_vcf from the RefMap. If this does not exist, it will fall back to known_sites_vcf. This allows different resources to be used for BQSR (which prefers known_sites_vcf in SRFlowcell) and the gnarly genotyper, which now prefers dbsnp_vcf.
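
The fallback behavior described above can be sketched outside of WDL as a small shell function. This is only an illustration under stated assumptions: `resolve_first_key` is a hypothetical helper name, the file paths are made up, and the map file is assumed to be the two-column, tab-separated layout that `write_map()` produces.

```shell
#!/usr/bin/env bash
# Sketch: print the first key from a priority list that is present in a
# two-column, tab-separated map file.
# resolve_first_key is a hypothetical helper, not part of the commit.
resolve_first_key() {
    local map_file="$1"; shift
    local key
    for key in "$@"; do
        # Compare the first column against the key as an exact string.
        if awk -F'\t' -v k="$key" '$1 == k { found = 1 } END { exit !found }' "$map_file"; then
            printf '%s\n' "$key"
            return 0
        fi
    done
    # No key matched: print an empty line, mirroring the task's empty-string result.
    printf '\n'
}

# Illustrative reference map in which only known_sites_vcf is present.
map_file="$(mktemp)"
printf 'fasta\t/refs/ref.fa\nknown_sites_vcf\t/refs/known.vcf.gz\n' > "$map_file"

# dbsnp_vcf is preferred but absent, so this prints "known_sites_vcf".
resolve_first_key "$map_file" dbsnp_vcf known_sites_vcf
```

Because the no-match case yields an empty string, a caller that indexes the map with the result would presumably fail at lookup time if none of the candidate keys exist.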
jonn-smith authored Dec 12, 2024
1 parent 182ede2 commit 5c20523
Showing 2 changed files with 57 additions and 2 deletions.
@@ -115,6 +115,14 @@ workflow SRJointCallGVCFsWithGenomicsDB {

Map[String, String] ref_map = read_map(ref_map_file)

# Resolve the dbSNP VCF, preferring the dbsnp_vcf entry of the ref map and falling back to known_sites_vcf:
call UTILS.ResolveMapKeysInPriorityOrder as ResolveMapKeysInPriorityOrder {
input:
map = ref_map,
keys = ["test_bad_key_should_not_be_found", "dbsnp_vcf", "known_sites_vcf"]
}
File db_snp_vcf = ref_map[ResolveMapKeysInPriorityOrder.key]

# Create sample-name map:
call SRJOINT.CreateSampleNameMap as CreateSampleNameMap {
input:
@@ -178,7 +186,7 @@ workflow SRJointCallGVCFsWithGenomicsDB {
ref_fasta = ref_map['fasta'],
ref_fasta_fai = ref_map['fai'],
ref_dict = ref_map['dict'],
- dbsnp_vcf = ref_map["known_sites_vcf"],
+ dbsnp_vcf = db_snp_vcf,
prefix = prefix + "." + interval_name + ".gnarly_genotyper.raw",
heterozygosity = heterozygosity,
heterozygosity_stdev = heterozygosity_stdev,
@@ -194,7 +202,7 @@ workflow SRJointCallGVCFsWithGenomicsDB {
ref_fasta = ref_map['fasta'],
ref_fasta_fai = ref_map['fai'],
ref_dict = ref_map['dict'],
- dbsnp_vcf = ref_map["known_sites_vcf"],
+ dbsnp_vcf = db_snp_vcf,
prefix = prefix + "." + interval_name + ".genotype_gvcfs.raw",
heterozygosity = heterozygosity,
heterozygosity_stdev = heterozygosity_stdev,
47 changes: 47 additions & 0 deletions wdl/tasks/Utility/Utils.wdl
@@ -2013,3 +2013,50 @@ task SplitContigToIntervals {
docker: select_first([runtime_attr.docker, default_attr.docker])
}
}
task ResolveMapKeysInPriorityOrder {
meta {
description: "Returns the first key in `keys` (checked in priority order) that exists in the map. If none of the keys exist, returns an empty string."
}

parameter_meta {
map: "Map[String, String] to resolve."
keys: "Array[String] of keys to check in order of priority"
}

input {
Map[String, String] map
Array[String] keys
}

String out_file = "key.txt"

command <<<
touch ~{out_file}

f=~{write_map(map)}
awk '{print $1}' < ${f} > keys_in_map.txt
while read key ; do
grep -q "^${key}$" keys_in_map.txt
if [[ $? -eq 0 ]] ; then
echo "${key}" > ~{out_file}
exit 0
fi
done < ~{write_lines(keys)}
>>>

output {
String key = read_string(out_file)
}

###################
runtime {
cpu: 1
memory: "512 MiB"
disks: "local-disk 50 HDD"
bootDiskSizeGb: 25
preemptible: 3
maxRetries: 2
docker:"gcr.io/cloud-marketplace/google/ubuntu2004:latest"
}
}
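
One detail of the command block worth noting: `grep -q "^${key}$"` treats the key as a regular expression, so a key containing regex metacharacters could match more than its literal text. The sketch below contrasts that with fixed-string, whole-line matching (`grep -Fx`). The helper names and the probe key are hypothetical and do not appear in the commit; for the plain keys used in this workflow both forms behave the same.

```shell
#!/usr/bin/env bash
# Sketch: regex matching versus fixed-string, whole-line matching when
# testing whether a key is listed in a keys file.
keys_file="$(mktemp)"
printf 'fasta\ndbsnp_vcf\n' > "$keys_file"

# key_matches_regex mirrors the task's grep call; key_matches_exact is a
# hypothetical alternative. Neither helper name appears in the commit.
key_matches_regex() { grep -q "^${1}$" "$keys_file"; }
key_matches_exact() { grep -Fxq -- "$1" "$keys_file"; }

probe='dbsnp.vcf'   # "." is a regex wildcard, so it can stand in for "_"

if key_matches_regex "$probe"; then echo "regex: matched"; else echo "regex: no match"; fi
if key_matches_exact "$probe"; then echo "exact: matched"; else echo "exact: no match"; fi
```

With the probe above, the regex form reports a match against `dbsnp_vcf` while the fixed-string form does not.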
