Failure for chromosomes with "chr" prefix #4
Looking at the code, there are hard-coded mappings in
I think solution 1 is certainly too complex. C++ is not magic, but writing good, robust parsers is not that easy and will take time. My C++ is also rusty, and Sophia is implemented in a much more advanced version of the language that I haven't learned yet. Furthermore, given that we have to invest so much time to obtain the parameters for Sophia, I think it really does not make sense to provide Sophia for anything but hg37 and hg38. So my plan would be as follows:
The tables with prefixes could still be derived at runtime from static tables, e.g. by prefixing all or certain chromosomes with a user-provided value, but they could also be hard-coded as We'd also have to add parameters for selecting the mapping table by its name (hg37, hg38, chr_hg37, chr_hg38) and/or the prefixes ("chr", "hsa_", etc.). This solution would already provide some flexibility. There is also the issue that in hg38 we prefixed some, but not all, chromosomes. Therefore, I think we should go with the multiple static tables approach, but also allow initializing the ChrConverter class with any of these tables. This would open the path to a completely dynamic implementation with a mapping table file provided by the user. Basically, I propose to switch from the
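A minimal C++ sketch of the multiple-static-tables idea, assuming a simple name-to-index map. The names `ChrTable`, `unprefixedTable`, `withPrefix`, `tableByName` and `MappingChrConverter` are hypothetical and not Sophia's actual API, and the toy table (1 to 22, X, Y) is illustrative only, not the real hg37 mapping. A predicate decides which names receive the prefix, so a partially prefixed assembly can be expressed as well.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical chromosome-name -> compact index table (illustrative, not Sophia's).
using ChrTable = std::map<std::string, int>;

// Toy base table with unprefixed names 1..22, X, Y (NOT the real hg37 mapping).
ChrTable unprefixedTable() {
    ChrTable t;
    for (int i = 1; i <= 22; ++i) t[std::to_string(i)] = i - 1;
    t["X"] = 22;
    t["Y"] = 23;
    return t;
}

// Derive a prefixed table at runtime. The predicate decides which names get
// the prefix, so "some, but not all chromosomes prefixed" can be expressed.
ChrTable withPrefix(const ChrTable& base, const std::string& prefix,
                    const std::function<bool(const std::string&)>& shouldPrefix) {
    ChrTable out;
    for (const auto& entry : base) {
        const std::string& name = entry.first;
        out[shouldPrefix(name) ? prefix + name : name] = entry.second;
    }
    return out;
}

// Select a mapping table by name; the names follow those proposed in the issue.
ChrTable tableByName(const std::string& name) {
    if (name == "hg37") return unprefixedTable();
    if (name == "chr_hg37")
        return withPrefix(unprefixedTable(), "chr",
                          [](const std::string&) { return true; });
    // "hg38" / "chr_hg38" would be registered the same way.
    throw std::invalid_argument("Unknown chromosome mapping table: '" + name + "'");
}

// Hypothetical converter that can be initialized with any one of the tables.
class MappingChrConverter {
public:
    explicit MappingChrConverter(ChrTable table) : table_(std::move(table)) {}

    bool isKnown(const std::string& chr) const { return table_.count(chr) > 0; }

    // std::map::at throws std::out_of_range for unknown names.
    int indexOf(const std::string& chr) const { return table_.at(chr); }

private:
    ChrTable table_;
};
```

For example, `MappingChrConverter(tableByName("chr_hg37")).indexOf("chr1")` returns 0 in this sketch, while an unknown table name throws `std::invalid_argument` instead of silently falling back to a default. A user-provided mapping file could later populate a `ChrTable` the same way.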
The first thing to do, anyway, would be to ensure that no segfault happens if the input file contains unknown chromosomes, and that a helpful error message is shown instead :)
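One possible shape for such a checked lookup, as a sketch: `UnknownChromosomeError` and `checkedIndex` are hypothetical names, and the mapping table is assumed to be a `std::map` from chromosome name to index. The point is that a missing name is detected with `find()` and turned into an exception with a readable message, rather than being used as an out-of-range index.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical: thrown when a chromosome name is not in the active mapping table.
class UnknownChromosomeError : public std::runtime_error {
public:
    explicit UnknownChromosomeError(const std::string& chr)
        : std::runtime_error("Unknown chromosome name '" + chr +
                             "'. Check whether the input uses a 'chr' prefix "
                             "and whether the selected mapping table matches it.") {}
};

// Checked lookup: find() neither modifies the table nor reads past it, unlike
// operator[] (which silently inserts a default value of 0 for unknown keys).
int checkedIndex(const std::map<std::string, int>& table, const std::string& chr) {
    auto it = table.find(chr);
    if (it == table.end()) {
        throw UnknownChromosomeError(chr);
    }
    return it->second;
}
```

A caller near `main` could catch this exception, print `e.what()` to `std::cerr`, and exit with a non-zero status, so the user sees which chromosome name was rejected.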
Hi @vinjana, I have generated this example from the public SEQC2 data for
And here is the error message:
I assume the error is caused by different columns in the SAM files of these new hg38 alignments compared to the hg19 ones. I am trying to figure this out.
Running sophia with input that contains "chr" prefixes for chromosomes produces a segmentation fault.