core dump error on import #29

Open
peterdfields opened this issue May 20, 2019 · 20 comments

Comments

@peterdfields

Hi,

I'm trying to import a BCF file that I generated by converting a GATK VCF to BCF with bcftools. I'm getting the following error:

Program:   tomahawk-beta-0.7.1 (Tools for computing, querying and storing LD data)
Libraries: tomahawk-0.7.0; ZSTD-1.4.0; htslib 1.9
Contact: Marcus D. R. Klarqvist <[email protected]>
Documentation: https://github.com/mklarqvist/tomahawk
License: MIT
----------
[2019-05-20 16:53:15,426][LOG] Calling import...
[2019-05-20 16:53:15,426][LOG][READER] Opening snp.bcf...
[2019-05-20 16:53:15,433][LOG][VCF] Constructing lookup table for 608 contigs...
[2019-05-20 16:53:15,434][LOG][VCF] Samples: 56...
[2019-05-20 16:53:15,434][LOG][WRITER] Opening snp.twk...
00000000
00001010
tomahawk: lib/core.cpp:117: void tomahawk::twk1_t::calculateHardyWeinberg(): Assertion `ref == 0 || ref == 1 || ref == 4 || ref == 5' failed.
Aborted (core dumped)

As far as I can tell, the SNPs meet the program's input requirements, so I'm not sure what's going wrong here. Please let me know if additional information would be useful.

@mklarqvist
Owner

Thanks for reporting this @peterdfields. The problem appears to be an assertion I've placed in the Hardy-Weinberg equilibrium computation. For some reason the offending line has a genotype that is not biallelic (biallelic genotypes are encoded as 0, 1, 4, or 5 in my internal format). This should not happen, as non-biallelic and non-diploid sites should be filtered out. It must be some edge case I have not covered.
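The four accepted values are consistent with a packing in which each of the two alleles occupies two bits. The sketch below ASSUMES that layout (first allele in bits 2-3, second allele in bits 0-1, 0 = REF, 1 = ALT); this is an inference from the assertion, not tomahawk's documented format, and `decode`/`describe` are hypothetical helpers. Under that assumption, 0, 1, 4 and 5 are exactly the four biallelic diploid genotypes, while the value printed just before the crash (00001010, decimal 10) decodes to a 2 in both allele slots. If 2 happened to be the missing-allele code, that would match the './.' reports later in this thread.

```python
# Hypothetical decoding of an 8-bit genotype value, ASSUMING two 2-bit allele
# fields packed as (allele_a << 2) | allele_b, with 0 = REF and 1 = ALT.
# This layout is inferred from the assertion, not taken from tomahawk's source.

BIALLELIC_CODES = {0, 1, 4, 5}  # the values accepted by the assertion


def decode(code: int) -> tuple[int, int]:
    """Split an 8-bit genotype code into its two assumed 2-bit allele fields."""
    return (code >> 2) & 0b11, code & 0b11


def describe(code: int) -> str:
    """Render the code in binary, its decoded alleles, and whether it passes."""
    a, b = decode(code)
    status = "biallelic" if code in BIALLELIC_CODES else "NOT biallelic"
    return f"{code:08b} -> alleles ({a}, {b}): {status}"


if __name__ == "__main__":
    # The four accepted codes plus the value printed right before the abort.
    for code in (0, 1, 4, 5, 0b00001010):
        print(describe(code))
```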

Could you find the offending line and send it to me (by email if the data are not public)?

@peterdfields
Author

Hi @mklarqvist. Given that a non-biallelic allele remains, I have to assume it somehow made it past GATK SelectVariants and vcftools filtering for biallelic SNPs. Is there a way to force tomahawk to output the line that triggers the error? I tried a version of the program built with make DEBUG=true, but that doesn't change the import output.

@mklarqvist
Owner

@peterdfields I'll update the error message to reflect the offending variant line number and offending allele encoding. This is something I should've done in the first place.

@peterdfields
Author

@mklarqvist is there an alternative way to locate the problematic line?

@mklarqvist
Owner

@peterdfields A crude way would be a manual binary search:

  1. Import the first half of the file and check whether it fails (the easiest way is to pipe the data in: bcftools | head -n | tomahawk import)
  2. Import the second half and check
  3. Keep splitting the half that fails. You should be able to deduce pretty quickly which line is the offending one.
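The manual search above can be automated as a bisection over prefixes, in the spirit of the head -n suggestion: if importing the first k records fails, the offending record is among them. This minimal sketch ASSUMES failure is monotone in k (any prefix containing the bad record fails) and takes the check as a caller-supplied function; `first_failing_record` and `fails_on_prefix` are hypothetical names. Wiring the predicate to a real bcftools/tomahawk pipeline is left out because the exact import invocation for piped input isn't given in this thread; note that such a predicate must always prepend the VCF header, since the sketch counts data records only.

```python
from typing import Callable, Optional


def first_failing_record(n_records: int,
                         fails_on_prefix: Callable[[int], bool]) -> Optional[int]:
    """Return the 1-based index of the first data record whose inclusion makes
    the import fail, or None if the full input imports cleanly.

    ASSUMPTION: fails_on_prefix(k) imports the header plus the first k data
    records and returns True if the import aborts; once a prefix fails, every
    longer prefix also fails.
    """
    if n_records <= 0 or not fails_on_prefix(n_records):
        return None
    lo, hi = 1, n_records  # invariant: the answer lies in [lo, hi] and prefix hi fails
    while lo < hi:
        mid = (lo + hi) // 2
        if fails_on_prefix(mid):
            hi = mid        # the bad record is within the first mid records
        else:
            lo = mid + 1    # the first mid records are clean
    return lo


if __name__ == "__main__":
    # Toy predicate standing in for a real import run: record 37 of 100 is bad.
    print(first_failing_record(100, lambda k: k >= 37))  # prints 37
```

Each step halves the search range, so locating the record takes roughly log2(n) import runs instead of n.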

I'm digging through the code to find the problem.

@peterdfields
Author

@mklarqvist Okay, I followed your advice and did the manual binary search. The line from the VCF that causes the error is:

000011F|quiver 1151 . A T 1216.54 . . GT:AD:DP:GQ:PL 1/1:0,1:1:3:40,3,0 1/1:0,3:3:9:109,9,0 1/1:0,3:3:9:118,9,0 1/1:0,5:5:15:155,15,0 1/1:0,2:2:6:68,6,0 1/1:0,2:2:6:69,6,0 1/1:0,4:4:12:158,12,0 1/1:0,2:2:6:86,6,0 1/1:0,3:3:9:124,9,0 1/1:0,3:3:9:116,9,0 1/1:0,1:1:3:43,3,0 0/0:4,0:4:9:0,9,135 ./.:0,0:0:.:0,0,0 1/1:0,3:3:9:125,9,0 1/1:0,5:5:15:202,15,0
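A quick way to see what is unusual about this record is to pull out each sample's GT subfield and compare it against the called diploid biallelic genotypes. This minimal sketch ASSUMES a whitespace-separated VCF data line whose first nine columns are the fixed fields (CHROM through FORMAT) and whose FORMAT column contains GT; `non_biallelic_genotypes` is a hypothetical helper. Applied to the record above, it flags exactly one sample, the 13th (index 12), whose GT is ./.

```python
# Called diploid biallelic genotypes, unphased and phased.
BIALLELIC_GT = {"0/0", "0/1", "1/0", "1/1", "0|0", "0|1", "1|0", "1|1"}


def non_biallelic_genotypes(vcf_line: str) -> list[tuple[int, str]]:
    """Return (sample_index, GT) pairs whose GT is not a called diploid
    biallelic genotype.

    ASSUMES whitespace-separated columns, the 9 fixed VCF columns first
    (CHROM ... FORMAT), and GT present in FORMAT (else ValueError).
    """
    fields = vcf_line.split()
    gt_pos = fields[8].split(":").index("GT")
    flagged = []
    for i, sample in enumerate(fields[9:]):
        parts = sample.split(":")
        # A truncated sample column is treated as missing.
        gt = parts[gt_pos] if gt_pos < len(parts) else "."
        if gt not in BIALLELIC_GT:
            flagged.append((i, gt))
    return flagged
```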

@mklarqvist
Owner

@peterdfields Thanks for helping me get to the bottom of this. Very helpful! I am investigating this.

@peterdfields
Author

@mklarqvist no worries! I'm looking forward to exploring tomahawk.

@peterdfields
Author

Hi @mklarqvist. Any news about this issue? Thank you again for your help.

@mklarqvist
Owner

Hello @peterdfields. Sorry for the delay in resolving this; I returned today from a trip abroad. I will pick up where I left off. Thanks for your patience!

@peterdfields
Author

Hi @mklarqvist. Okay, great. Thank you again for your assistance!

@peterdfields
Author

Hi @mklarqvist. Any luck on tracking down this issue?

@ckastall

ckastall commented Aug 1, 2019

Hey @mklarqvist,

I have the same issue. I think the problem is related to missing data, or at least to './.' in the GT field. Replacing missing genotypes with random genotypes, or removing loci with any missing data, solves the import problem in my case.
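The "remove loci with any missing data" workaround can be sketched as a streaming filter over an uncompressed, tab-delimited VCF: header lines pass through unchanged, and a data line is kept only if no sample's GT contains a '.' allele (./., .|., a partially missing ./1, or a haploid .). This is a minimal illustration ASSUMING GT is present in every record's FORMAT column; `has_missing_gt` and `drop_missing_sites` are hypothetical names. It discards whole sites, so with sparse missingness spread across many sites it can remove a large share of the data.

```python
import sys
from typing import Iterable, Iterator


def has_missing_gt(vcf_line: str) -> bool:
    """True if any sample's GT has a missing allele ('.').
    ASSUMES a tab-delimited data line with GT in the FORMAT column."""
    fields = vcf_line.rstrip("\n").split("\t")
    gt_pos = fields[8].split(":").index("GT")
    for sample in fields[9:]:
        parts = sample.split(":")
        gt = parts[gt_pos] if gt_pos < len(parts) else "."
        # Normalise phased separators, then inspect each allele.
        if "." in gt.replace("|", "/").split("/"):
            return True
    return False


def drop_missing_sites(lines: Iterable[str]) -> Iterator[str]:
    """Yield header and blank lines unchanged, plus records with no missing GT."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            yield line
        elif not has_missing_gt(line):
            yield line


if __name__ == "__main__":
    # Usage: python drop_missing.py < in.vcf > out.vcf   (uncompressed VCF)
    sys.stdout.writelines(drop_missing_sites(sys.stdin))
```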

@TinaH10

TinaH10 commented Aug 1, 2019

I have the same problem. Yes, removing sites with ANY missing data resolves it, but this is not a practical approach for my dataset.

Thanks

@tshalev

tshalev commented Sep 12, 2019

Same problem here... Is there a different way we can encode missing data so that it can be captured?

@yifangt

yifangt commented Oct 30, 2019

Same problem here ... with command line:

tomahawk import -i allsamples_All.bcf -o snp m 0.2 h 0.01

Program:   tomahawk-beta-0.7.1 (Tools for computing, querying and storing LD data)
Libraries: tomahawk-0.7.0; ZSTD-1.4.4; htslib 1.9
Contact: Marcus D. R. Klarqvist <[email protected]>
Documentation: https://github.com/mklarqvist/tomahawk
License: MIT
----------
[2019-10-30 10:07:36,137][LOG] Calling import...
[2019-10-30 10:07:36,138][LOG][READER] Opening allsamples_All.bcf...
[2019-10-30 10:07:36,139][LOG][VCF] Constructing lookup table for 43 contigs...
[2019-10-30 10:07:36,139][LOG][VCF] Samples: 573...
[2019-10-30 10:07:36,139][LOG][WRITER] Opening snp.twk...
00000000
00000000
00000000
00000000
00001010
tomahawk: lib/core.cpp:117: void tomahawk::twk1_t::calculateHardyWeinberg(): Assertion `ref == 0 || ref == 1 || ref == 4 || ref == 5' failed.
Aborted (core dumped)

And, besides that,

tomahawk import -i allsamples_All.bcf -o snp -m 0.2 -h 0.01
tomahawk: invalid option -- 'm'
[2019-10-30 10:07:30,955][ERROR] Unrecognized option: ?

And the examples I found all use the dash, as in -m 0.xx -h 0.001.
Isn't the dash needed for the options?

@mbrieuc

mbrieuc commented Dec 15, 2020

Hi. Is there an update on this problem? I have the same problem. Thanks

@wbsimey

wbsimey commented Aug 26, 2021

I am getting the same -m error on Ubuntu 20.

$ tomahawk import -i snp-thin.bcf -o snp -m 0.2 -h 0.001
tomahawk: invalid option -- 'm'
[2021-08-26 13:26:31,717][ERROR] Unrecognized option: ?

@wbsimey

wbsimey commented Aug 26, 2021

I think I figured out that for 'import', -m has changed to -n and -h is now -H. But I still get the same core dump.

$ tomahawk import -i snp-thin.bcf -o snp -n 0.2 -H 0.001

Program:   tomahawk-beta-0.7.1 (Tools for computing, querying and storing LD data)
Libraries: tomahawk-0.7.0; ZSTD-1.4.4; htslib 1.9
Contact: Marcus D. R. Klarqvist <[email protected]>
Documentation: https://github.com/mklarqvist/tomahawk
License: MIT
----------
[2021-08-26 13:43:19,573][LOG] Calling import...
[2021-08-26 13:43:19,574][LOG][READER] Opening snp-thin.bcf...
[2021-08-26 13:43:19,598][LOG][VCF] Constructing lookup table for 2,100 contigs...
[2021-08-26 13:43:19,598][LOG][VCF] Samples: 96...
[2021-08-26 13:43:19,598][LOG][WRITER] Opening snp.twk...
00000000
00000000
00000000
00000000
00000000
00000001
00000000
00000001
00001010
tomahawk: lib/core.cpp:117: void tomahawk::twk1_t::calculateHardyWeinberg(): Assertion `ref == 0 || ref == 1 || ref == 4 || ref == 5' failed.
Aborted (core dumped)

@beausoleilmo

beausoleilmo commented Sep 23, 2022

I also get this error:

tomahawk: lib/core.cpp:117: void tomahawk::twk1_t::calculateHardyWeinberg(): Assertion `ref == 0 || ref == 1 || ref == 4 || ref == 5' failed.
Aborted (core dumped)

I don't know what is going wrong.


9 participants