-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Error: bcftools consensus fail with overlapping variants #132
Comments
This is caused by a quirk of applying variants iteratively to the reference, I'll have to think about how to handle this case properly, medaka introducing nonsense frameshifts like this is a little concerning though... There's definitely some indications that using medaka for r10.4.1 data might actually be counterproductive for hac/sup basecalling https://rrwick.github.io/2023/05/05/ont-only-accuracy-with-r10.4.1.html. Any thoughts on the medaka side @cjw85 @SamStudio8? |
Medaka (like many variant callers) contains no explicit knowledge of biology so the attribution of a variant it proposes leading to a frameshift is not unsurprising. My recollection is that fieldbioinformatics explicitely excludes indels that aren't a multiple of 3 bases long?
Medaka is fundamentally a consensus caller, not a variant caller in the modern sense that people typically understand. To produce a VCF file, medaka:
(There's some subtleties in 4. to do with specifics of the VCF spec, and ensuring left alignment and variant normalization). It should be the case that |
That's a good point re filtering out of frame indels, the variants in |
@BioWilko @cjw85 Here is the only one of those three mutations from the pass.vcf file
|
@BioWilko Ah, I don't think we were aware of this. Thanks for linking this paper. |
When running
artic minion --medaka
, we are occasionally running into an issue where there are overlapping pass variants from the different read groups that is causing bcftools consensus to fail and an empty consensus file is generated. I did see similar issues 1, 2, but those seem to be caused by an overlap of a pass and fail variant.We have had this issue on two medaka models, r1041_e82_400bps_hac_v4.2.0 and r941_min_hac_g507, but did not encounter this issue with the model r941_min_high_g360. The models have been updated alongside our basecaller and have been confirmed with
medaka tools resolve_model --auto_model
.Log:
Running:
bcftools consensus -f sample.preconsensus.fasta sample.pass.vcf.gz -m sample.coverage_mask.txt -o sample.consensus.fasta
The fasta sequence does not match the REF allele at MN908947.3:669:
REF .vcf: [GT]
ALT .vcf: [G]
REF .fa : [GN]TAC.........
Command failed:
bcftools consensus -f sample.preconsensus.fasta sample.pass.vcf.gz -m sample.coverage_mask.txt -o sample.fasta
merged.vcf:
sample.1.vcf:
sample.2.vcf:
sample.primertrimmed.rg.sorted.bam:
The text was updated successfully, but these errors were encountered: