add check for transcript length before querying UTR sequence #6546
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
I have been running into an issue with Funcotator where some mutations are causing Funcotator to crash because it attempts to query a segment that extends beyond the boundary of the transcript ( see #6345 )
This pull request addresses the issue by adding a check for transcript length before executing the query. I looked at the code, and Funcotator currently handles problematic sequence queries in
getFivePrimeUtrSequenceFromTranscriptFasta()
by returning an empty string. I modifiedgetFivePrimeUtrSequenceFromTranscriptFasta()
to also return an empty string when the segment it is trying to retrieve extends beyond the boundary of the transcript.I have a small VCF that can be used to reproduce the problem using the current code on
master
and the hg38 data source, and I have verified that this pull request allows Funcotator to process the problematic variant without crashing. I did not add the VCF to the tree, but can provide it if that is preferred. Is there any guidance for how to implement integration tests with funcotator? The Funcotator data source I am using is ~12gb, but I would think the problem could be reproduced with 1 transcript and 1 variant.This is my first pull request to GATK, so please let me know if there is anything you would like me to adjust, I'm happy to address any comments.