Commit 90f2219

Make reblocking default in single-sample workflows (#141)
1 parent 4028664 commit 90f2219

59 files changed: 299 additions and 140 deletions

pipelines/broad/dna_seq/germline/joint_genotyping/JointGenotyping.changelog.md

Lines changed: 5 additions & 0 deletions
@@ -1,3 +1,8 @@
+# 1.5.2
+2021-11-01
+
+Updated GenotypeGVCFs to support reblocked GVCFs as inputs
+
 # 1.5.1
 2020-12-16
 

pipelines/broad/dna_seq/germline/joint_genotyping/JointGenotyping.wdl

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ import "../../../../../tasks/broad/JointGenotypingTasks.wdl" as Tasks
 # Joint Genotyping for hg38 Whole Genomes and Exomes (has not been tested on hg19)
 workflow JointGenotyping {
 
-  String pipeline_version = "1.5.1"
+  String pipeline_version = "1.5.2"
 
   input {
     File unpadded_intervals_file
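
The changelog heading and the `pipeline_version` string are bumped together in this commit. The following is a minimal sketch of a consistency check between the two, assuming the newest changelog entry is the first `# x.y.z` heading and the WDL declares `String pipeline_version = "x.y.z"`. The function names are hypothetical and are not part of the repository.

```python
import re


def latest_changelog_version(changelog_text):
    """Return the version from the first '# x.y.z' heading, or None if absent."""
    match = re.search(r"^#\s*(\d+\.\d+\.\d+)\s*$", changelog_text, re.MULTILINE)
    return match.group(1) if match else None


def wdl_pipeline_version(wdl_text):
    """Return the value assigned to pipeline_version in a WDL file, or None."""
    match = re.search(r'String\s+pipeline_version\s*=\s*"([^"]+)"', wdl_text)
    return match.group(1) if match else None


def versions_match(changelog_text, wdl_text):
    """True only when both versions are present and identical."""
    changelog_version = latest_changelog_version(changelog_text)
    return changelog_version is not None and changelog_version == wdl_pipeline_version(wdl_text)


if __name__ == "__main__":
    changelog = "# 1.5.2\n2021-11-01\n\nUpdated GenotypeGVCFs\n\n# 1.5.1\n2020-12-16\n"
    wdl = 'workflow JointGenotyping {\n\n  String pipeline_version = "1.5.2"\n'
    print(versions_match(changelog, wdl))
```

A check like this could run before merging so that a version bump in one file without the other is caught early.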
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+It is expected that some variants will have annotation changes:
+- New version has higher ANs in some cases -- low quality genotypes retain more data
+- When ANs vary, DP can vary as well
+- ACs may not agree for * alleles because of corrections in reblocking
+- InbreedingCoeff (and AS_InbreedingCoeff) can vary since it's likelihood-based and not count-based like ExcessHet; this is especially noticeable at low GQs because likelihood is split more evenly across genotypes
+- RankSums can vary because of a histogram/median bug fix in PR #7131 implemented in versions 4.2.1.0 and subsequent; comparison to results of ldg_revGGVCFs with updated GATK makes this easier
+- High QD values can vary because of the "correction" of remapping values > 35 back to a Gaussian centered at 30 (stdev 3)
+- DP may vary due to changes in reference depth associated with reference block merging
+- Strand bias annotations (mostly AS_SOR, but also SOR and occasionally FS) change when annotations are dropped for homozygous reference genotypes (annotations agree with updated strand bias counts)
+
+Reblocked callset will have a few more AC=2 hom-vars at low coverage sites because QUAL increases as hom-refs go to GQ0 and don't provide much evidence for reference
+
+Spanning deletion alleles may be missing in new output due to corrections in reblocking (--allow-missing-stars)
+In a few rare cases variants appear to be "dropped" in the reblocked output (this is the --allow-extra-alleles argument below)
+QUAL scores are highly sensitive to a variety of factors including CPU platform, but here they may vary more significantly because GQ0 genotypes with depth data are called as hom-refs
+
+The GATK branch ldg_VCFcomparator (4f0292abaeb7cabb527ca830048520e8aecdbde4) can take into account a lot of these
+expected differences with a command like:
+java -jar gatk.compare.jar VCFComparator -V:expected $truth -V:actual $test --ignore-quals -R $hg38 --warn-on-errors \
+--allow-extra-alleles --allow-missing-stars --ignore-filters --ignore-attribute DP
+
+Expect many warnings about AN or InbreedingCoeff mismatches; only exceptions that output VCs are of concern
+
+Exome tests needed VQSR indel Gaussians reduced to 3 (only 50 exomes -- not a lot of indels; --ignore-filters --ignore-attribute VQSLOD --ignore-attribute culprit)
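
The note on high QD values can be illustrated with a short sketch. It assumes only what the notes state: values above 35 are redrawn from a Gaussian centered at 30 with a standard deviation of 3. Redrawing until the value falls at or below the cap is an assumption made here for illustration, and the names below are hypothetical; the actual GATK logic may differ.

```python
import random

MAX_QD = 35.0          # cap stated in the notes
IDEAL_HIGH_QD = 30.0   # Gaussian center stated in the notes
JITTER_SIGMA = 3.0     # Gaussian standard deviation stated in the notes


def remap_high_qd(qd, rng):
    """Leave QD unchanged at or below the cap; otherwise redraw it from the Gaussian.

    Redrawing until the result is at or below the cap is an assumption of this sketch.
    """
    if qd <= MAX_QD:
        return qd
    remapped = rng.gauss(IDEAL_HIGH_QD, JITTER_SIGMA)
    while remapped > MAX_QD:
        remapped = rng.gauss(IDEAL_HIGH_QD, JITTER_SIGMA)
    return remapped


if __name__ == "__main__":
    # The same over-cap input yields different outputs under different random states.
    for seed in (1, 2):
        print(seed, remap_high_qd(41.7, random.Random(seed)))
```

Because the remapped value depends on a random draw, two runs can legitimately report different QD values at the same site, which is why exact QD comparisons are unreliable for values near the top of the range.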

pipelines/broad/dna_seq/germline/joint_genotyping/by_chromosome/JointGenotypingByChromosomePartOne.changelog.md

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,8 @@
1+
# 1.4.2
2+
2021-11-01
3+
4+
Task wdls used by JointGenotypingByChromosomePartOne were updated with changes that don't affect JointGenotypingByChromosomePartOne wdl
5+
16
# 1.4.1
27
2020-12-16
38

pipelines/broad/dna_seq/germline/joint_genotyping/by_chromosome/JointGenotypingByChromosomePartOne.wdl

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ import "../../../../../../tasks/broad/JointGenotypingTasks.wdl" as Tasks
 # Joint Genotyping for hg38 Exomes and Whole Genomes (has not been tested on hg19)
 workflow JointGenotypingByChromosomePartOne {
 
-  String pipeline_version = "1.4.1"
+  String pipeline_version = "1.4.2"
 
   input {
     File unpadded_intervals_file

pipelines/broad/dna_seq/germline/joint_genotyping/by_chromosome/JointGenotypingByChromosomePartTwo.changelog.md

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,8 @@
1+
# 1.4.2
2+
2021-11-01
3+
4+
Task wdls used by JointGenotypingByChromosomePartTwo were updated with changes that don't affect JointGenotypingByChromosomePartTwo wdl
5+
16
# 1.4.1
27
2020-12-16
38

pipelines/broad/dna_seq/germline/joint_genotyping/by_chromosome/JointGenotypingByChromosomePartTwo.wdl

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ import "../../../../../../tasks/broad/JointGenotypingTasks.wdl" as Tasks
 # Joint Genotyping for hg38 Exomes and Whole Genomes (has not been tested on hg19)
 workflow JointGenotypingByChromosomePartTwo {
 
-  String pipeline_version = "1.4.1"
+  String pipeline_version = "1.4.2"
 
   input {
     String callset_name

pipelines/broad/dna_seq/germline/joint_genotyping/exome/JointGenotyping.inputs.json

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@
   "JointGenotyping.haplotype_database": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.haplotype_database.txt",
 
   "JointGenotyping.callset_name": "exomes_data_validation",
-  "JointGenotyping.sample_name_map" : "gs://broad-gotc-test-storage/exome/joint_genotyping/gvcfs/round_three_sample_map",
+  "JointGenotyping.sample_name_map" : "gs://broad-gotc-test-storage/exome/joint_genotyping/gvcfs/reblocked_sample_map",
 
   "JointGenotyping.small_disk" : 100,
   "JointGenotyping.medium_disk" : 200,

pipelines/broad/dna_seq/germline/joint_genotyping/exome/test_inputs/Plumbing/gather_vcfs_high_memory.json

Lines changed: 4 additions & 2 deletions
@@ -28,13 +28,15 @@
   "JointGenotyping.haplotype_database": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.haplotype_database.txt",
 
   "JointGenotyping.callset_name": "small_callset_low_threshold",
-  "JointGenotyping.sample_name_map" : "gs://broad-gotc-test-storage/joint_genotyping/exome/plumbing/callset/plumbing_sample_map",
+  "JointGenotyping.sample_name_map" : "gs://broad-gotc-test-storage/joint_genotyping/exome/plumbing/callset/reblocked_plumbing_sample_map",
 
   "JointGenotyping.small_disk" : 100,
   "JointGenotyping.medium_disk" : 200,
   "JointGenotyping.large_disk" : 500,
   "JointGenotyping.huge_disk" : 2000,
 
   "JointGenotyping.gather_vcfs": true,
-  "JointGenotyping.snps_variant_recalibration_threshold": 50
+  "JointGenotyping.snps_variant_recalibration_threshold": 50,
+
+  "JointGenotyping.IndelsVariantRecalibrator.max_gaussians": 3
 }

pipelines/broad/dna_seq/germline/joint_genotyping/exome/test_inputs/Plumbing/gather_vcfs_low_memory.json

Lines changed: 4 additions & 2 deletions
@@ -28,13 +28,15 @@
   "JointGenotyping.haplotype_database": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.haplotype_database.txt",
 
   "JointGenotyping.callset_name": "small_callset_high_threshold",
-  "JointGenotyping.sample_name_map" : "gs://broad-gotc-test-storage/joint_genotyping/exome/plumbing/callset/plumbing_sample_map",
+  "JointGenotyping.sample_name_map" : "gs://broad-gotc-test-storage/joint_genotyping/exome/plumbing/callset/reblocked_plumbing_sample_map",
 
   "JointGenotyping.small_disk" : 100,
   "JointGenotyping.medium_disk" : 200,
   "JointGenotyping.large_disk" : 500,
   "JointGenotyping.huge_disk" : 2000,
 
   "JointGenotyping.gather_vcfs": true,
-  "JointGenotyping.snps_variant_recalibration_threshold": 500000
+  "JointGenotyping.snps_variant_recalibration_threshold": 500000,
+
+  "JointGenotyping.IndelsVariantRecalibrator.max_gaussians": 3
 }
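
The two plumbing input files receive the same pair of changes: the sample map points at reblocked GVCFs and the indel recalibrator is limited to 3 Gaussians. A minimal sketch of applying both overrides to an inputs dictionary follows. The key names and the sample map path are taken from the diff above; the helper name `with_reblocked_overrides` is hypothetical and is not part of the repository.

```python
import json

SAMPLE_MAP_KEY = "JointGenotyping.sample_name_map"
INDEL_MAX_GAUSSIANS_KEY = "JointGenotyping.IndelsVariantRecalibrator.max_gaussians"


def with_reblocked_overrides(inputs, reblocked_sample_map, indel_max_gaussians=3):
    """Return a copy of the inputs with the reblocked sample map and indel Gaussian limit set."""
    updated = dict(inputs)  # shallow copy so the caller's dictionary is left untouched
    updated[SAMPLE_MAP_KEY] = reblocked_sample_map
    updated[INDEL_MAX_GAUSSIANS_KEY] = indel_max_gaussians
    return updated


original = json.loads(
    '{"JointGenotyping.gather_vcfs": true,'
    ' "JointGenotyping.snps_variant_recalibration_threshold": 50}'
)
patched = with_reblocked_overrides(
    original,
    "gs://broad-gotc-test-storage/joint_genotyping/exome/plumbing/callset/reblocked_plumbing_sample_map",
)

if __name__ == "__main__":
    print(json.dumps(patched, indent=2))
```

Returning a copy rather than mutating in place keeps the original inputs available for comparison runs against the non-reblocked sample map.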
