Ensure ClinVar is filtering out significance terms #1075
Comments
In cases where more than one ClinVar significance is designated on a variant, gene.iobio parses the multiple terms and applies the filter criteria to the most pathogenic term. So, in your example, the variant will pass the filter ClinVar = 'pathogenic' because that term ranks higher than 'drug response'. Another common scenario is a ClinVar variant with a 'Benign/Likely Benign' dual designation. Here, 'Likely Benign' ranks higher, so it is the term evaluated when the filter is applied. This variant will pass a custom filter ClinVar = 'Likely Benign', but it will NOT pass the filter ClinVar = 'Benign'.

It could be argued that the filter logic should evaluate each term, passing the variant if ANY term matches. For example, as @AlistairNWard points out, 'drug response' is often coupled with another term. A filter on ClinVar = 'drug response' will currently miss variants with dual designations of 'drug response' + 'pathogenic' (or 'likely pathogenic'), because only the higher-ranked term is evaluated.
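The difference between the two behaviours can be sketched as follows. This is not gene.iobio's code: the `RANK` table, the function names, and the set of separators are assumptions for illustration, and only the relative order stated in this comment is relied on.

```python
import re

# Hypothetical ranking, higher = more pathogenic. The numbers are arbitrary;
# only the order described in this thread is assumed.
RANK = {
    "pathogenic": 5,
    "likely_pathogenic": 4,
    "likely_benign": 2,
    "benign": 1,
    "drug_response": 0,
}


def normalize(term):
    """Lower-case a term and use underscores, e.g. 'Likely Benign' -> 'likely_benign'."""
    return term.strip().lower().replace(" ", "_")


def split_terms(clinsig):
    """Split a raw significance string into normalized terms.

    The separators ('/', ',' and '|') are an assumption about the input.
    """
    return [normalize(p) for p in re.split(r"[/,|]", clinsig) if p.strip()]


def passes_top_ranked(clinsig, wanted):
    """Behaviour described above: compare the filter to the highest-ranked term only."""
    terms = split_terms(clinsig)
    if not terms:
        return False
    top = max(terms, key=lambda t: RANK.get(t, -1))
    return top == normalize(wanted)


def passes_any(clinsig, wanted):
    """Proposed alternative: pass the variant if ANY of its terms matches the filter."""
    return normalize(wanted) in split_terms(clinsig)
```

With these definitions, `passes_top_ranked("Benign/Likely_benign", "Benign")` is false while `passes_any("Benign/Likely_benign", "Benign")` is true, which is the gap described above.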
We shouldn't be looking at any of the other terms (drug_response, risk allele, etc.) unless we explicitly want them in a different spot. These don't count as significance. The only terms that are a significance are: Pathogenic
I think Pathogenic/Likely_pathogenic is also an allowed term and doesn't need to be broken up.
Thank you, @AlistairNWard, for identifying the clinical significance terms that we should be filtering on. Right now, the terms we can filter on are:
Here is an example of multiple CLINSIG terms for an RAI1 variant (https://www.ncbi.nlm.nih.gov/clinvar/variation/1560497/): The VCF INFO fields look like this:
IMPORTANT! This may break our current code. Notice that CLNSIGCONF has the terms that we filter on, not CLNSIG. It looks like this new INFO field CLNSIGCONF is only used for conflicting classifications of pathogenicity. Here are the INFO fields for a ClinVar variant with a single CLINSIG designation https://www.ncbi.nlm.nih.gov/clinvar/variation/1377204/:
It looks like the CLINSIG term will be replaced with three separate classifications: https://github.com/ncbi/clinvar/blob/master/ClassificationOnClinVar.md. I don't see these separate classifications in the latest ClinVar VCF. I will create a separate issue (#1093).
@tonydisera, "conflicting data from submitters" is a term used specifically when a consortium makes a single submission to ClinVar but the consortium itself has conflicting interpretations. "Conflicting interpretations of pathogenicity" applies when different submitters submit the same variant with conflicting classifications. If one lab thinks a variant is Pathogenic and another thinks it is Likely Pathogenic, the record will appear as Pathogenic/Likely_pathogenic. But if one lab has any type of pathogenic term and another lab has benign or uncertain, the variant will be listed as conflicting interpretations of pathogenicity. This is specifically information about RCV and VCV, which are accession IDs for variants with conflicting submissions, and I don't know that this has any effect on the VCF files.
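As a rough illustration of the rule described in this comment, the sketch below combines per-submitter terms into one record-level term. It is not ClinVar's aggregation algorithm: the function name, the exact term spellings, and the inclusion of `Likely_benign` among the opposing terms are assumptions, and any combination the comment does not describe returns `None`.

```python
PATHOGENIC_FAMILY = {"Pathogenic", "Likely_pathogenic"}
# "benign or uncertain" per the comment; Likely_benign is added as an assumption.
OPPOSING = {"Benign", "Likely_benign", "Uncertain_significance"}


def aggregate(submitted):
    """Combine the terms from different submitters into one record-level term."""
    terms = set(submitted)
    if terms & PATHOGENIC_FAMILY and terms & OPPOSING:
        # Any pathogenic-type term alongside benign or uncertain is a conflict.
        return "Conflicting_interpretations_of_pathogenicity"
    if terms == PATHOGENIC_FAMILY:
        # Pathogenic from one lab, Likely pathogenic from another.
        return "Pathogenic/Likely_pathogenic"
    if len(terms) == 1:
        # All submitters agree.
        return next(iter(terms))
    return None  # combination not covered by the comment
```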
If the ClinVar significance contains a term 'Pathogenic,drug_response', does gene.iobio remove the non-significance terms ('drug_response' in this case) so that the displayed term is just 'Pathogenic'? If not, the variant will not be flagged even though it is a pathogenic variant.
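A minimal sketch of the stripping step asked about here, assuming the terms are separated by `,` (or `|`) and that the non-significance terms are the ones named in this thread. `NON_SIGNIFICANCE` and `significance_only` are hypothetical names, not part of gene.iobio.

```python
import re

# Terms named in this thread as not being a clinical significance (assumed list).
NON_SIGNIFICANCE = {"drug_response", "risk_allele"}


def significance_only(clinsig):
    """Drop non-significance terms and return the remaining terms joined by ','.

    Compound terms such as 'Pathogenic/Likely_pathogenic' are kept whole.
    """
    parts = [p.strip() for p in re.split(r"[,|]", clinsig)]
    kept = [p for p in parts if p and p.lower() not in NON_SIGNIFICANCE]
    return ",".join(kept)
```

For the example in this issue, `significance_only("Pathogenic,drug_response")` returns `"Pathogenic"`, so a filter on ClinVar = 'pathogenic' would see only the significance term.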