hydra-genetics v0.7.0
Release notes
For more details on features and bug fixes see further down.
Features
CNV Filtering
Frequency-based filtering against the sample database removed true whole-chromosome deletions. Frequency filtering is therefore now applied only to segments smaller than 10 Mb.
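The size gate described above can be sketched in Python as follows. This is only an illustration: the function name, its arguments, and the `max_frequency` default are hypothetical and not taken from the pipeline. Only the 10 Mb limit comes from the text.

```python
# Hypothetical sketch; names and the frequency cutoff are assumptions,
# only the 10 Mb size limit is taken from the release notes.
SIZE_LIMIT_BP = 10_000_000


def passes_frequency_filter(start, end, db_frequency, max_frequency=0.15):
    """Return True if a CNV segment should be kept."""
    segment_length = end - start
    # Large segments (e.g. whole-chromosome deletions) bypass the frequency filter.
    if segment_length >= SIZE_LIMIT_BP:
        return True
    # Smaller segments are dropped when they are too common in the sample database.
    return db_frequency <= max_frequency
```

With this sketch, a 50 Mb segment is kept regardless of its database frequency, while a 1 Mb segment is kept only if its frequency is at or below the assumed cutoff.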
CNV Reporting
The CNV html report has been moved out of the pipeline and into a new reporting module in Hydra-genetics. The report itself is unchanged except for minor improvements.
CNV PureCN
PureCN is now producing reliable results. However, it does not handle samples with low TC or samples with few copy number aberrations. Both will in general get an underestimated TC in the range of 15-32%. Therefore, a new combined CNV html and tsv report is now generated that uses the PureCN TC if it is above 35% and the pathology-estimated TC otherwise. These files have the tag pathology_purecn in the results. The pathology-only and PureCN-only results are placed in additional files under results.
PureCN uses a new filter (snv_hard_filter_purecn) to get its vcf input file.
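The TC selection rule for the combined report can be sketched as below. This is a minimal illustration, not the pipeline's own code: the function and argument names are hypothetical, TC values are assumed to be fractions between 0 and 1, and treating a missing PureCN estimate as `None` is an assumption. Only the 35% cutoff comes from the text.

```python
# Hypothetical sketch; function and argument names are assumptions.
PURECN_CUTOFF = 0.35  # PureCN TC must be above 35% to be used


def choose_tumor_content(purecn_tc, pathology_tc):
    """Pick the TC used for the combined pathology_purecn report."""
    # A missing PureCN estimate (assumed here to be None) falls back to pathology.
    if purecn_tc is not None and purecn_tc > PURECN_CUTOFF:
        return purecn_tc
    return pathology_tc
```

For example, a PureCN estimate of 0.5 would be used as is, while an estimate of 0.2 would be replaced by the pathology estimate.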
RNA MultiQC
The bam-files produced by the STAR aligner are now duplicate marked by picard. This is only used for QC, and the duplication rate is reported in the RNA MultiQC report.
Changes in config.yaml
Changed:
- output: "config/output_list.json" => output: "config/output_files.yaml" #New output file format
- cnv_html_report:
    show_table: true
    template_dir: config/cnv_report_template
- design_intervals_rna: "/projects/wp1/nobackup/ngs/utveckling/Twist_RNA_DATA/bed/Twist_RNA_Design5.annotated.20230630.interval_list" #New file for duplication QC
- report_fusions: #Corrected spelling of fusioncatcher, only changes shown
    fusioncatcher_flag_low_support: 15
    fusioncatcher_low_support: 3
    fusioncatcher_low_support_fp_genes: 20
    fusioncatcher_low_support_inframe: 6
Added:
- merge_cnv_json:
    annotations:
      - /references/cnv_amp_genes.bed
      - /references/cnv_loh_genes.bed
    filtered_cnv_vcfs:
      - cnv_sv/svdb_query/{sample}_{type}.{tc_method}.svdb_query.annotate_cnv.cnv_amp_genes.filter.cnv_hard_filter_amp.vcf.gz
      - cnv_sv/svdb_query/{sample}_{type}.{tc_method}.svdb_query.annotate_cnv.cnv_loh_genes_all.filter.cnv_hard_filter_loh.vcf.gz
    unfiltered_cnv_vcfs:
      - cnv_sv/svdb_query/{sample}_{type}.{tc_method}.svdb_query.annotate_cnv.cnv_amp_genes.vcf.gz
      - cnv_sv/svdb_query/{sample}_{type}.{tc_method}.svdb_query.annotate_cnv.cnv_loh_genes_all.vcf.gz
    germline_vcf: snv_indels/bcbio_variation_recall_ensemble/{sample}_{type}.ensembled.vep_annotated.filter.germline.exclude.blacklist.vcf.gz
- filter_vcf:
    snv_hard_filter_purecn: "config/config_hard_filter_purecn.yaml" #New filter for purecn
- svdb_merge:
    tc_method:
      - name: pathology_purecn #new combined tag
        cnv_caller:
          - cnvkit
          - gatk
Hydra modules with releases
- prealignment: v1.0.0 (No change)
- alignment: v0.3.1 (No change)
- snv_indels: v0.3.0 (No change)
- annotation: v0.3.0 (No change)
- filtering: v0.1.0 (No change)
- qc: v0.3.0 (No change)
- biomarker: v0.3.1 (No change)
- cnv_sv: v0.3.1 (No change)
- reports: v0.1.0 (New module, CNV html report moved here)
Features
- add result file for combined purecn and pathology (fb17cc2)
- added duplication % to multiQC (2923ebf)
- added picard mark duplicates of bam-files for QC (0f33656)
- added read group function for STAR (c224c17)
- added RG to STAR and changed bam file for QC (1f68585)
- added rule for modifying MBQ in vcf (72dc309)
- added rule for modifying MBQ in vcf (6c26626)
- added rule for modifying MBQ in vcf (188f418)
- change pureCN cutoff to 0.35 (4e06643)
- choose purecn if tc > 30% and pathology otherwise (20292ab)
- harder filtering (7cae319)
- make two tsv reports using different gene lists (0737c60)
- test_input_all.tsv for v0.7.0 (27df350)
- test_input_VAL2022.tsv for v0.7.0 (26f5157)
- use filtered vcf with both germline and somatic variants (f6c5cc3)
- use gatk2 for purecn (ee2e2bc)
- use germline vcf for purecn (eb0eaf5)
- use purity file directly from purecn to also get ploidity (b397a3c)
- use vaf and snv filtered vcf with both germline and somatic variants (48f3563)
Bug Fixes
- add germline flag to vcf (1e8de1b)
- add missing filter tag (7c01e2d)
- annotate using missing sites instead (de172b0)
- bug fixes (224a6a4)
- change checkpoint to rule (5e04c9a)
- change path to new normals (7a3b420)
- correct header in cnv report file (c947152)
- correct output name for purecn reference (92af141)
- correct rule import from wrong module (275a60d)
- delegate schema validation to reports module (b5dada0)
- do not filter large cnvs based on frequency in database (4b86637)
- get correct tc to html report (c394cba)
- handle empty purecn file (df89fe5)
- import spelling mistake (f3e7248)
- moved result file to additional files (c6c7574)
- properly overrule the get_tc function (701ceb8)
- purecn_modify_vcf bugfix (e910e18)
- redefine rule to use new params in config (5ef5dac)
- return correct tc (45485c7)
- solve different wildcards in rule error (62082a1)
- spelling error of Exception (8d6fbf9)
- tabix of annotation database (1f70635)
- use correct genome (66e3c8b)
- use correct get_tc (ba5272e)
- use correct interval file (92c0e41)
- use Illumina for platform (130f3a2)