Differing ClinVar clinical significance annotation according to transcript/feature #1680

Open
growland2 opened this issue May 23, 2024 · 2 comments
growland2 commented May 23, 2024

Describe the issue

When running VEP locally via Docker, we see that the same variant receives different ClinVar clinical significance (ClinVar_CLNSIG) annotations depending on the transcript or feature. Two examples are shown below:

Example variant 1:

CHROM    POS    ID    REF    ALT
1	976097	666960	G	GGGGCC

When annotated locally using VEP version=107, VEP Cache=RefSeq and ClinVar GRCh37 version=20240317 (as a custom annotation source), this variant received no ClinVar_CLNSIG annotation for any of the reported transcripts (see excerpt from VEP output below):

VEP v107 and RefSeq cache output:

1	976097	666960	G	GGGGCC	49.28	.	AC=2;AF=1;AN=2;DB;DP=3;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=22.34;QD=16.43;SOR=1.179;CSQ=CGGGC|AGRN||insertion|frameshift_variant|HIGH|4/39||NM_001305275.2|NM_001305275.2:c.574_578dup|NP_001292204.1:p.Ser194GlyfsTer60|14||1|||||||||||||||||||||||||||||||||||||||||||||||||17.03,CGGGC|AGRN||insertion|frameshift_variant|HIGH|3/36||NM_001364727.2|NM_001364727.2:c.259_263dup|NP_001351656.1:p.Ser89GlyfsTer60|14||1|||||||||||||||||||||||||||||||||||||||||||||||||17.03,CGGGC|AGRN||insertion|frameshift_variant|HIGH|4/36||NM_198576.4|NM_198576.4:c.574_578dup|NP_940978.2:p.Ser194GlyfsTer60|14||1|||||||||||||||||||||||||||||||||||||||||||||||||17.03	GT:AD:DP:GQ:PL	1/1:0,3:3:9:77,9,0

Command:

docker run -v /home/dnanexus:/data -w /data 199b8c2aa90b vep -i /data/path_likely_path_final_temp.vcf -o /data/path_likely_path_final_temp_annotated.vcf.gz --dir /data --vcf --cache --refseq --exclude_predicted --symbol --hgvs --hgvsg --check_existing --variant_class --numbers --format vcf --offline --exclude_null_alleles --assembly GRCh37 --custom /data/clinvar_20240317_GRCh37.vcf.gz,ClinVar,vcf,exact,0,CLNSIG,CLNREVSTAT,CLNDN,CLNSIGCONF --custom /data/gnomad.genomes.r2.1.1.sites.all.noVEP_normalised_decomposed_PASS.dias_trimmed_v1.0.0.vcf.bgz,gnomADg,vcf,exact,0,AC,AN,AF,nhomalt,popmax,AC_popmax,AN_popmax,AF_popmax,nhomalt_popmax --custom /data/gnomad.exomes.r2.1.1.sites.noVEP_normalised_decomposed_PASS.dias_trimmed_v1.0.0.vcf.bgz,gnomADe,vcf,exact,0,AC,AN,AF,nhomalt,popmax,AC_popmax,AN_popmax,AF_popmax,nhomalt_popmax,non_cancer_AC,non_cancer_AN,non_cancer_AF,non_cancer_nhomalt,non_cancer_AC_popmax,non_cancer_AN_popmax,non_cancer_AF_popmax,non_cancer_nhomalt_popmax,non_cancer_popmax --custom /data/TWE_POPAF_N500_chr1-22_220413.vcf.gz,TWE,vcf,exact,0,AF,AC_Hom,AC_Het,AN --custom /data/HGMD_Pro_2023.4_hg19.vcf.gz,HGMD,vcf,exact,0,PHEN,RANKSCORE,CLASS --plugin SpliceAI,snv=/data/spliceai_scores.masked.snv.hg19.vcf.gz,indel=/data/spliceai_scores.masked.indel.hg19.vcf.gz --plugin REVEL,/data/revel_b37.tsv.gz --plugin CADD,/data/cadd_whole_genome_SNVs_GRCh37.tar.gz,/data/gnomad.genomes.r2.1.1.indel.tsv.gz,/data/InDels_GRCh37.tsv.gz --fields 
Allele,SYMBOL,HGNC_ID,VARIANT_CLASS,Consequence,IMPACT,EXON,INTRON,Feature,HGVSc,HGVSp,HGVS_OFFSET,Existing_variation,STRAND,ClinVar,ClinVar_CLNSIG,ClinVar_CLNSIGCONF,ClinVar_CLNDN,gnomADg_AC,gnomADg_AN,gnomADg_AF,gnomADg_nhomalt,gnomADg_popmax,gnomADg_AC_popmax,gnomADg_AN_popmax,gnomADg_AF_popmax,gnomADg_nhomalt_popmax,gnomADe_AC,gnomADe_AN,gnomADe_AF,gnomADe_nhomalt,gnomADe_popmax,gnomADe_AC_popmax,gnomADe_AN_popmax,gnomADe_AF_popmax,gnomADe_nhomalt_popmax,gnomADe_non_cancer_AC,gnomADe_non_cancer_AN,gnomADe_non_cancer_AF,gnomADe_non_cancer_nhomalt,gnomADe_non_cancer_AC_popmax,gnomADe_non_cancer_AN_popmax,gnomADe_non_cancer_AF_popmax,gnomADe_non_cancer_nhomalt_popmax,gnomADe_non_cancer_popmax,TWE_AF,TWE_AC_Hom,TWE_AC_Het,TWE_AN,HGMD,HGMD_PHEN,HGMD_CLASS,HGMD_RANKSCORE,SpliceAI_pred_DS_AG,SpliceAI_pred_DS_AL,SpliceAI_pred_DS_DG,SpliceAI_pred_DS_DL,SpliceAI_pred_DP_AG,SpliceAI_pred_DP_AL,SpliceAI_pred_DP_DG,SpliceAI_pred_DP_DL,REVEL,CADD_PHRED --buffer_size 500 --fork 16 --no_stats --compress_output bgzip --shift_3prime 1

We then re-ran locally using VEP version=v112, VEP Cache=RefSeq and ClinVar GRCh37 version=20240317, to ensure the prior output was not specific to v107. This returned the same results as above.

Next, we ran VEP with the merged cache, which contains both RefSeq and Ensembl transcripts. Here the expected ClinVar_CLNSIG annotation "Pathogenic/Likely_pathogenic" appears for only some of the Ensembl (ENS) transcripts (ENST00000469403, ENST00000477585 and ENST00000479707) and is missing for the remaining transcripts (ENST00000379370, NM_001305275.2, NM_001364727.2 and NM_198576.4) (see output from VEP below).

VEP v112 and merged cache output:

1	976097	666960	G	GGGGCC	62.74	.	AC=2;AF=1;AN=2;DB;DP=2;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=60;QD=31.37;SOR=2.303;CSQ=CGGGC|AGRN|329|insertion|frameshift_variant|HIGH|4/36||ENST00000379370|ENST00000379370.2:c.574_578dup|ENSP00000368678.2:p.Ser194GlyfsTer60|14||1|||||||||||||||||||||||||||||||||||||||||||||||||17.03,GGGCC|AGRN|329|insertion|non_coding_transcript_exon_variant|MODIFIER|2/3||ENST00000469403|ENST00000469403.1:n.521_525dup||14|rs1570190059|1|666960|Pathogenic/Likely_pathogenic||Congenital_myasthenic_syndrome_8&Congenital_myasthenic_syndrome|||||||||||||||||||||||||||||||||||||||||||||17.03,GGGCC|AGRN|329|insertion|downstream_gene_variant|MODIFIER|||ENST00000477585||||rs1570190059|1|666960|Pathogenic/Likely_pathogenic||Congenital_myasthenic_syndrome_8&Congenital_myasthenic_syndrome|||||||||||||||||||||||||||||||||||||||||||||17.03,GGGCC|AGRN|329|insertion|upstream_gene_variant|MODIFIER|||ENST00000479707||||rs1570190059|1|666960|Pathogenic/Likely_pathogenic||Congenital_myasthenic_syndrome_8&Congenital_myasthenic_syndrome|||||||||||||||||||||||||||||||||||||||||||||17.03,CGGGC|AGRN|329|insertion|frameshift_variant|HIGH|4/39||NM_001305275.2|NM_001305275.2:c.574_578dup|NP_001292204.1:p.Ser194GlyfsTer60|14||1|||||||||||||||||||||||||||||||||||||||||||||||||17.03,CGGGC|AGRN|329|insertion|frameshift_variant|HIGH|3/36||NM_001364727.2|NM_001364727.2:c.259_263dup|NP_001351656.1:p.Ser89GlyfsTer60|14||1|||||||||||||||||||||||||||||||||||||||||||||||||17.03,CGGGC|AGRN|329|insertion|frameshift_variant|HIGH|4/36||NM_198576.4|NM_198576.4:c.574_578dup|NP_940978.2:p.Ser194GlyfsTer60|14||1|||||||||||||||||||||||||||||||||||||||||||||||||17.03	GT:AD:DP:GQ:PGT:PID:PL	1/1:0,2:2:6:1|1:1200192_C_G:90,6,0
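To make the per-transcript difference easier to see, here is a minimal Python sketch that splits a CSQ string on the leading columns of the `--fields` list and groups the entries by reported Allele. The four entries are copied from the output above and cut off after ClinVar_CLNSIG; the helper names `parse_csq` and `clnsig_by_allele` are hypothetical and not part of VEP.

```python
# Leading columns of the --fields list used in the commands in this issue.
FIELDS = [
    "Allele", "SYMBOL", "HGNC_ID", "VARIANT_CLASS", "Consequence", "IMPACT",
    "EXON", "INTRON", "Feature", "HGVSc", "HGVSp", "HGVS_OFFSET",
    "Existing_variation", "STRAND", "ClinVar", "ClinVar_CLNSIG",
]

# Four CSQ entries from the merged-cache output, truncated after ClinVar_CLNSIG.
CSQ = ",".join([
    "CGGGC|AGRN|329|insertion|frameshift_variant|HIGH|4/36||ENST00000379370|ENST00000379370.2:c.574_578dup|ENSP00000368678.2:p.Ser194GlyfsTer60|14||1||",
    "GGGCC|AGRN|329|insertion|non_coding_transcript_exon_variant|MODIFIER|2/3||ENST00000469403|ENST00000469403.1:n.521_525dup||14|rs1570190059|1|666960|Pathogenic/Likely_pathogenic",
    "GGGCC|AGRN|329|insertion|downstream_gene_variant|MODIFIER|||ENST00000477585||||rs1570190059|1|666960|Pathogenic/Likely_pathogenic",
    "CGGGC|AGRN|329|insertion|frameshift_variant|HIGH|4/36||NM_198576.4|NM_198576.4:c.574_578dup|NP_940978.2:p.Ser194GlyfsTer60|14||1||",
])


def parse_csq(csq, fields=FIELDS):
    """Split a CSQ string into one dict per transcript/feature entry."""
    return [dict(zip(fields, entry.split("|"))) for entry in csq.split(",")]


def clnsig_by_allele(rows):
    """Map each reported Allele to {Feature: ClinVar_CLNSIG or None}."""
    table = {}
    for row in rows:
        table.setdefault(row["Allele"], {})[row["Feature"]] = row["ClinVar_CLNSIG"] or None
    return table


rows = parse_csq(CSQ)
for allele, features in clnsig_by_allele(rows).items():
    for feature, clnsig in features.items():
        print(allele, feature, clnsig or "(none)")
```

In these entries the transcripts without ClinVar_CLNSIG all report the Allele CGGGC, while the annotated ones report GGGCC. Whether that difference explains the missing annotation is not established here.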

Command:

docker run -v /home/dnanexus:/data -w /data 607ee83f9536 vep -i /data/vep_clnsig_temp.vcf -o /data/vep_clnsig_temp_annotated.vcf.gz --dir /data --vcf --cache --merged --exclude_predicted --symbol --hgvs --hgvsg --check_existing --variant_class --numbers --format vcf --offline --exclude_null_alleles --assembly GRCh37 --custom /data/clinvar_20240317_GRCh37.vcf.gz,ClinVar,vcf,exact,0,CLNSIG,CLNREVSTAT,CLNDN,CLNSIGCONF --custom /data/gnomad.genomes.r2.1.1.sites.all.noVEP_normalised_decomposed_PASS.dias_trimmed_v1.0.0.vcf.bgz,gnomADg,vcf,exact,0,AC,AN,AF,nhomalt,popmax,AC_popmax,AN_popmax,AF_popmax,nhomalt_popmax --custom /data/gnomad.exomes.r2.1.1.sites.noVEP_normalised_decomposed_PASS.dias_trimmed_v1.0.0.vcf.bgz,gnomADe,vcf,exact,0,AC,AN,AF,nhomalt,popmax,AC_popmax,AN_popmax,AF_popmax,nhomalt_popmax,non_cancer_AC,non_cancer_AN,non_cancer_AF,non_cancer_nhomalt,non_cancer_AC_popmax,non_cancer_AN_popmax,non_cancer_AF_popmax,non_cancer_nhomalt_popmax,non_cancer_popmax --custom /data/TWE_POPAF_N500_chr1-22_220413.vcf.gz,TWE,vcf,exact,0,AF,AC_Hom,AC_Het,AN --custom /data/HGMD_Pro_2023.4_hg19.vcf.gz,HGMD,vcf,exact,0,PHEN,RANKSCORE,CLASS --plugin SpliceAI,snv=/data/spliceai_scores.masked.snv.hg19.vcf.gz,indel=/data/spliceai_scores.masked.indel.hg19.vcf.gz --plugin REVEL,/data/revel_b37.tsv.gz --plugin CADD,/data/cadd_whole_genome_SNVs_GRCh37.tar.gz,/data/gnomad.genomes.r2.1.1.indel.tsv.gz,/data/InDels_GRCh37.tsv.gz --fields 
Allele,SYMBOL,HGNC_ID,VARIANT_CLASS,Consequence,IMPACT,EXON,INTRON,Feature,HGVSc,HGVSp,HGVS_OFFSET,Existing_variation,STRAND,ClinVar,ClinVar_CLNSIG,ClinVar_CLNSIGCONF,ClinVar_CLNDN,gnomADg_AC,gnomADg_AN,gnomADg_AF,gnomADg_nhomalt,gnomADg_popmax,gnomADg_AC_popmax,gnomADg_AN_popmax,gnomADg_AF_popmax,gnomADg_nhomalt_popmax,gnomADe_AC,gnomADe_AN,gnomADe_AF,gnomADe_nhomalt,gnomADe_popmax,gnomADe_AC_popmax,gnomADe_AN_popmax,gnomADe_AF_popmax,gnomADe_nhomalt_popmax,gnomADe_non_cancer_AC,gnomADe_non_cancer_AN,gnomADe_non_cancer_AF,gnomADe_non_cancer_nhomalt,gnomADe_non_cancer_AC_popmax,gnomADe_non_cancer_AN_popmax,gnomADe_non_cancer_AF_popmax,gnomADe_non_cancer_nhomalt_popmax,gnomADe_non_cancer_popmax,TWE_AF,TWE_AC_Hom,TWE_AC_Het,TWE_AN,HGMD,HGMD_PHEN,HGMD_CLASS,HGMD_RANKSCORE,SpliceAI_pred_DS_AG,SpliceAI_pred_DS_AL,SpliceAI_pred_DS_DG,SpliceAI_pred_DS_DL,SpliceAI_pred_DP_AG,SpliceAI_pred_DP_AL,SpliceAI_pred_DP_DG,SpliceAI_pred_DP_DL,REVEL,CADD_PHRED --buffer_size 2000 --fork 16 --no_stats --compress_output bgzip --shift_3prime 1

Example variant 2:

CHROM    POS    ID    REF    ALT
1	1371178	830327	T	TGGCGCGGAGC

As with "Example variant 1", when annotated locally using VEP version=107, VEP Cache=RefSeq and ClinVar GRCh37 version=20240317, this variant received different ClinVar_CLNSIG values depending on the transcript/feature: NM_022834.5 and NM_199121.3 received no ClinVar_CLNSIG annotation, while NR_125994.1, NR_125995.1 and NR_125996.1 were annotated with the expected "Pathogenic/Likely_pathogenic" (see excerpt from VEP output below):

VEP v107 and RefSeq cache output:

1	1371178	830327	T	TGGCGCGGAGC	62.74	.	AC=2;AF=1;AN=2;DB;DP=2;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=60;QD=31.37;SOR=0.693;CSQ=GCGCGGAGCG|VWA1||insertion|frameshift_variant&splice_region_variant|HIGH|1/3||NM_022834.5|NM_022834.5:c.62_71dup|NP_073745.2:p.Gly25AlafsTer74|21||1|||||||||||||||||||||||||||||||||||||||||||||||||23.7,GCGCGGAGCG|VWA1||insertion|frameshift_variant&splice_region_variant|HIGH|1/3||NM_199121.3|NM_199121.3:c.62_71dup|NP_954572.2:p.Gly25AlafsTer53|21||1|||||||||||||||||||||||||||||||||||||||||||||||||23.7,GGCGCGGAGC|LINC01770||insertion|upstream_gene_variant|MODIFIER|||NR_125994.1||||rs749383814|-1|830327|Pathogenic/Likely_pathogenic||VWA1-related_condition&Neuronopathy&_distal_hereditary_motor&not_provided&Neuronopathy&_distal_hereditary_motor&_autosomal_recessive_7&Neuromuscular_disease|15|28196|0.00053199|0|afr|6|8376|0.000716332|0|||||||||||||||||||0.002|0|2|1000|CI218713|"Neuromyopathy"|DM|||||||||||23.7,GGCGCGGAGC|LINC01770||insertion|upstream_gene_variant|MODIFIER|||NR_125995.1||||rs749383814|-1|830327|Pathogenic/Likely_pathogenic||VWA1-related_condition&Neuronopathy&_distal_hereditary_motor&not_provided&Neuronopathy&_distal_hereditary_motor&_autosomal_recessive_7&Neuromuscular_disease|15|28196|0.00053199|0|afr|6|8376|0.000716332|0|||||||||||||||||||0.002|0|2|1000|CI218713|"Neuromyopathy"|DM|||||||||||23.7,GGCGCGGAGC|LINC01770||insertion|upstream_gene_variant|MODIFIER|||NR_125996.1||||rs749383814|-1|830327|Pathogenic/Likely_pathogenic||VWA1-related_condition&Neuronopathy&_distal_hereditary_motor&not_provided&Neuronopathy&_distal_hereditary_motor&_autosomal_recessive_7&Neuromuscular_disease|15|28196|0.00053199|0|afr|6|8376|0.000716332|0|||||||||||||||||||0.002|0|2|1000|CI218713|"Neuromyopathy"|DM|||||||||||23.7	GT:AD:DP:GQ:PGT:PID:PL	1/1:0,2:2:6:1|1:827267_C_T:90,6,0
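One pattern in both examples is that the transcripts lacking ClinVar_CLNSIG report a different Allele from the annotated ones, and also a non-zero HGVS_OFFSET. The small Python check below tests whether the reported Allele equals the inserted bases rotated left by HGVS_OFFSET, which is what moving an insertion that many bases in the 3' direction through a repeat would produce. The `rotate_left` helper is hypothetical, the values are read from the outputs in this issue, and any link between this shift and the missing annotation is an assumption rather than something confirmed here.

```python
def rotate_left(seq, k):
    """Rotate seq left by k positions (k may exceed len(seq))."""
    k %= len(seq)
    return seq[k:] + seq[:k]


# (inserted bases = ALT without the leading REF base,
#  HGVS_OFFSET from the CSQ entry,
#  Allele reported for the transcripts WITHOUT ClinVar_CLNSIG)
EXAMPLES = [
    ("GGGCC", 14, "CGGGC"),            # Example variant 1 (AGRN)
    ("GGCGCGGAGC", 21, "GCGCGGAGCG"),  # Example variant 2 (VWA1)
]

results = []
for inserted, offset, reported in EXAMPLES:
    shifted = rotate_left(inserted, offset)
    results.append(shifted == reported)
    print(inserted, offset, shifted, "matches" if shifted == reported else "differs")
```

Both reported alleles are consistent with this rotation, so the unannotated entries appear to describe the 3'-shifted form of the same insertion, while the annotated entries keep the allele as given in the input.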

What decides which transcript receives the expected ClinVar_CLNSIG value?

Additional information

System

  • VEP version: v107 and v112
  • VEP Cache version: RefSeq and merged cache
  • Perl version: N/A - whatever is present in docker container
  • OS: Ubuntu 20.04
  • tabix installed ? - Yes
@nuno-agostinho nuno-agostinho self-assigned this May 23, 2024
nuno-agostinho (Contributor) commented

Hi @growland2,

Sorry to hear about this inconvenience.

While running VEP with ClinVar as a custom file (similar to your command), I do get the same results consistently for the expected variants, regardless of their Ensembl/RefSeq transcript.

I am going to try and see if any of the options you are using could be affecting the results.
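A rough sketch of how such a check could be organised: the Python snippet below builds one argument list per candidate option, each omitting a single option from the reporter's command, so the outputs can be compared. The reduced base command, the choice of candidate options and the output file names are assumptions made for illustration; only flags that already appear in the command above are used, and nothing is executed.

```python
import shlex

# Reduced base command (an assumption: only the ClinVar custom source is kept).
BASE = [
    "vep", "-i", "/data/vep_clnsig_temp.vcf", "--dir", "/data", "--vcf",
    "--cache", "--merged", "--offline", "--assembly", "GRCh37",
    "--custom",
    "/data/clinvar_20240317_GRCh37.vcf.gz,ClinVar,vcf,exact,0,CLNSIG,CLNREVSTAT,CLNDN,CLNSIGCONF",
]

# Options from the original command to drop one at a time.
CANDIDATES = [
    ["--shift_3prime", "1"],
    ["--exclude_null_alleles"],
    ["--exclude_predicted"],
    ["--check_existing"],
    ["--hgvs"],
]


def ablations(base, candidates):
    """Yield (dropped_option, argv) pairs, each argv omitting one candidate."""
    for i, dropped in enumerate(candidates):
        kept = [tok for j, opt in enumerate(candidates) if j != i for tok in opt]
        out = ["-o", f"/data/ablation_{i}.vcf"]  # hypothetical output path
        yield dropped, base + kept + out


runs = list(ablations(BASE, CANDIDATES))
for dropped, argv in runs:
    print("without", " ".join(dropped), "->", shlex.join(argv))
```

Comparing ClinVar_CLNSIG across the resulting outputs would show which option, if any, changes the per-transcript annotation.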

Best regards,
Nuno

growland2 (Author) commented

Thanks. FYI, running both example variants through the online VEP (GRCh37) did not annotate any transcript with a clinical significance:

VEP online results for Example variant 1 and Example variant 2.

Kind regards,

Greg
