How can I extract the coordinates of CDS/Intron/UTR/IGR regions? #1450
Hi @MelanyOuyang, Thank you for your query. Please can you share the VEP version you are using? Can you also share the GFF file you are using, together with an example of a region extracted from this file and the VEP results that show a mismatch? Thank you,
Hi @ola,
Thanks for your response.
The VEP version I am using is v103.0.
I downloaded the cache from https://ftp.ensembl.org/pub/release-107/variation/indexed_vep_cache/homo_sapiens_refseq_vep_107_GRCh38.tar.gz and used it to annotate the mutations.
I found that some mutations that VEP did not classify as intergenic (see the examples below) are not located in any gene region in the file GCF_000001405.39_GRCh38.p13_genomic.gff.gz:
CCDS3349.1 0 . GRCh38 chr4 1394971 1394972 + Frame_Shift_Ins INS
CCDS3349.1 0 . GRCh38 chr4 1394795 1394795 + Missense_Mutation SNP
compmerge.3755.Testes.chr7 0 . GRCh38 chr7 3166130 3166130 + Splice_Region SNP
compmerge.11.Liver.chr8 0 . GRCh38 chr8 9182244 9182245 + Intron INS
compmerge.176.pooled.chr11 0 . GRCh38 chr11 4164233 4164233 + Intron SNP
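In these rows the first column, which appears to be the gene name, holds CCDS or compmerge identifiers rather than RefSeq gene symbols. As a minimal sketch, assuming whitespace-separated rows with the gene name in the first column (as in the examples above), such rows could be set aside before comparing the remaining ones against the RefSeq GFF. The helper name and the prefix list are hypothetical and are taken only from these examples, so they may not cover every non-RefSeq source.

```python
# Hypothetical post-processing step: split MAF-like rows (whitespace-separated,
# first column assumed to be the gene name) into rows named with RefSeq-style
# symbols and rows whose "gene name" is a CCDS or compmerge identifier.
NON_REFSEQ_PREFIXES = ("CCDS", "compmerge.")  # assumption: taken from the examples


def split_by_gene_source(rows):
    """Return (refseq_rows, other_rows) based on the first column of each row."""
    refseq_rows, other_rows = [], []
    for row in rows:
        gene = row.split()[0]
        # str.startswith accepts a tuple and matches any of its prefixes
        if gene.startswith(NON_REFSEQ_PREFIXES):
            other_rows.append(row)
        else:
            refseq_rows.append(row)
    return refseq_rows, other_rows


rows = [
    "CCDS3349.1 0 . GRCh38 chr4 1394795 1394795 + Missense_Mutation SNP",
    "compmerge.11.Liver.chr8 0 . GRCh38 chr8 9182244 9182245 + Intron INS",
    "IGL 0 . GRCh38 chr22 22255309 22255309 + RNA DEL",
]
kept, set_aside = split_by_gene_source(rows)
print(len(kept), len(set_aside))  # 1 2
```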
I also found some mutations in the immunoglobulin/T-cell receptor loci that are classified as RNA mutations but do not seem to belong to any transcript:
IGL 0 . GRCh38 chr22 22255309 22255309 + RNA DEL
TRA 0 . GRCh38 chr14 22065623 22065623 + RNA SNP
Below are the GFF records for the gene regions they are located in:
NC_000022.11 Curated Genomic gene 22026076 22922913 . + . ID=gene-IGL;
NC_000014.9 Curated Genomic gene 21621904 22552132 . + . ID=gene-TRA;
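Comparing VEP output with this GFF requires translating RefSeq sequence accessions such as NC_000022.11 into the chr-style names used in the VEP output, and splitting the GFF on tabs (the source field "Curated Genomic" contains a space). A minimal sketch of such a containment check, assuming tab-separated GFF3 with 1-based inclusive coordinates; the accession map is a hand-written assumption covering only the two chromosomes in these examples, and a complete one would have to come from elsewhere (for example the NCBI assembly report). Note that falling inside a `gene` record does not by itself show that any transcript covers the position.

```python
# Minimal sketch: find GFF3 "gene" features that contain a variant position.
# Assumes standard tab-separated GFF3 with 1-based, inclusive coordinates.
# SEQID_TO_CHROM is an illustrative assumption, not a complete mapping.
SEQID_TO_CHROM = {"NC_000022.11": "chr22", "NC_000014.9": "chr14"}


def parse_genes(gff_lines):
    """Yield (chrom, start, end, gene_id) for each 'gene' feature."""
    for line in gff_lines:
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "gene":
            continue
        chrom = SEQID_TO_CHROM.get(fields[0], fields[0])
        # column 9 holds key=value pairs separated by ';'
        attrs = dict(kv.split("=", 1) for kv in fields[8].strip(";").split(";") if "=" in kv)
        yield chrom, int(fields[3]), int(fields[4]), attrs.get("ID", ".")


def genes_at(genes, chrom, pos):
    """Return the IDs of genes whose interval contains chrom:pos."""
    return [gid for c, s, e, gid in genes if c == chrom and s <= pos <= e]


gff = [
    "NC_000022.11\tCurated Genomic\tgene\t22026076\t22922913\t.\t+\t.\tID=gene-IGL;",
    "NC_000014.9\tCurated Genomic\tgene\t21621904\t22552132\t.\t+\t.\tID=gene-TRA;",
]
genes = list(parse_genes(gff))
print(genes_at(genes, "chr22", 22255309))  # ['gene-IGL']
print(genes_at(genes, "chr14", 22065623))  # ['gene-TRA']
```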
I would also like to know how 3'Flank and 5'Flank mutations are defined. I defined them according to the statement 'VEP will include upstream and downstream annotations for variants within 5kb of a nearby feature' on the VEP webpage, and found that many of the 3'Flank and 5'Flank mutations were IGR mutations according to the file GCF_000001405.39_GRCh38.p13_genomic.gff.gz.
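Under the 5 kb rule quoted above, an upstream or downstream variant lies outside the transcript body by definition, so it cannot overlap a gene interval in the GFF; a check based only on gene overlap would call it intergenic, which may explain part of this mismatch. The sketch below assumes that 5'Flank and 3'Flank correspond to upstream and downstream relative to the transcript strand (the usual vcf2maf-style mapping, which is an assumption here), 1-based inclusive coordinates, and a 5,000 bp window, which VEP's `--distance` option can change. Whether the boundary is inclusive is also an assumption.

```python
# Minimal sketch of strand-aware 5'Flank / 3'Flank labelling for one transcript.
# Assumptions: 1-based inclusive coordinates, 5'Flank = upstream and
# 3'Flank = downstream of the transcript, and an inclusive 5,000 bp window.


def flank_label(pos, tx_start, tx_end, strand, distance=5000):
    """Return "5'Flank", "3'Flank", or None for a position relative to a transcript."""
    if tx_start <= pos <= tx_end:
        return None  # inside the transcript body: not a flank variant
    if pos < tx_start:
        gap = tx_start - pos
        # left of the transcript is upstream on '+', downstream on '-'
        side = "5'Flank" if strand == "+" else "3'Flank"
    else:
        gap = pos - tx_end
        side = "3'Flank" if strand == "+" else "5'Flank"
    return side if gap <= distance else None


print(flank_label(995, 1000, 2000, "+"))   # 5'Flank
print(flank_label(995, 1000, 2000, "-"))   # 3'Flank
print(flank_label(9000, 1000, 2000, "+"))  # None (7,000 bp downstream)
```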
Kind regards,
Melany
Hi @MelanyOuyang, Thank you for your response. We suggest you use this Ensembl cache (https://ftp.ensembl.org/pub/release-107/variation/indexed_vep_cache/homo_sapiens_vep_107_GRCh38.tar.gz) and let us know if the issue persists. If possible, can you share an example of the VCF file? Thank you,
Hi Ola,
Is it possible to get a RefSeq cache that contains only RefSeq genes, without CCDS or compmerge genes? We prefer the gene names from RefSeq.
Alternatively, is there somewhere I can get the exact coordinates of the CDS/Intron/… regions that VEP uses?
I look forward to your reply.
Thanks,
Melany
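One way to approximate these region classes from the same RefSeq annotation is to rebuild them per transcript from its exon and CDS records. A minimal sketch, assuming the exons and the CDS span of a single transcript have already been collected (for example by grouping GFF3 exon and CDS rows on their Parent attribute) in 1-based inclusive coordinates; the function name is hypothetical, and it does not reproduce VEP's own logic such as splice-region windows or the choice of one consequence across overlapping transcripts.

```python
# Minimal sketch: derive CDS, intron and UTR intervals for one transcript from
# its exon intervals and CDS span (1-based, inclusive). The strand decides
# which untranslated end is 5' and which is 3'. Non-coding transcripts
# (no CDS) are reported as "Noncoding" exons plus introns.


def transcript_regions(exons, cds_start, cds_end, strand):
    exons = sorted(exons)
    # introns are the gaps between consecutive exons
    introns = [(a_end + 1, b_start - 1)
               for (_, a_end), (b_start, _) in zip(exons, exons[1:])
               if b_start - a_end > 1]
    if cds_start is None:  # non-coding transcript: no CDS/UTR split
        return {"Noncoding": exons, "Intron": introns}
    left_utr, right_utr, cds = [], [], []
    for s, e in exons:
        if s < cds_start:  # exonic bases left of the CDS
            left_utr.append((s, min(e, cds_start - 1)))
        if e > cds_end:  # exonic bases right of the CDS
            right_utr.append((max(s, cds_end + 1), e))
        if s <= cds_end and e >= cds_start:  # exon overlaps the CDS span
            cds.append((max(s, cds_start), min(e, cds_end)))
    five, three = (left_utr, right_utr) if strand == "+" else (right_utr, left_utr)
    return {"CDS": cds, "Intron": introns, "5'UTR": five, "3'UTR": three}


regions = transcript_regions([(100, 200), (300, 400), (500, 600)], 150, 550, "+")
print(regions["Intron"])  # [(201, 299), (401, 499)]
print(regions["5'UTR"])   # [(100, 149)]
```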
Hi @MelanyOuyang,
Dear Ensembl team,
I annotated SNPs/indels using VEP with the cache homo_sapiens_refseq/107_GRCh38. I want to calculate the mutation density in CDS, UTR, Intron, IGR and Noncoding regions separately, so I first used GenomicFeatures to extract the intervals of these regions from the Ensembl GFF. However, the Variant_Classification and Consequence values in the VEP annotation results do not match the extracted regions.
How can I get the correct coordinates of these regions?
I look forward to your reply.
Thanks in advance.
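For the density calculation described above, each class's intervals need to be merged first (so overlapping transcripts do not count the same bases twice), and IGR can be taken as the complement of the merged gene intervals. A minimal sketch in Python rather than the GenomicFeatures R package, assuming 1-based inclusive intervals on one chromosome and a known chromosome length; the function names and numbers are illustrative only.

```python
# Minimal sketch: mutation density per region class, counting each base once.
# Assumes 1-based inclusive intervals on a single chromosome.


def merge(intervals):
    """Merge overlapping or adjacent intervals so each base is counted once."""
    merged = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def complement(intervals, chrom_len):
    """Bases on [1, chrom_len] not covered by any interval (e.g. IGR from genes)."""
    gaps, prev_end = [], 0
    for s, e in merge(intervals):
        if s > prev_end + 1:
            gaps.append((prev_end + 1, s - 1))
        prev_end = max(prev_end, e)
    if prev_end < chrom_len:
        gaps.append((prev_end + 1, chrom_len))
    return gaps


def density_per_mb(intervals, positions):
    """Number of variant positions per megabase of the merged intervals."""
    merged = merge(intervals)
    total_bp = sum(e - s + 1 for s, e in merged)
    hits = sum(any(s <= p <= e for s, e in merged) for p in positions)
    return hits / total_bp * 1_000_000 if total_bp else 0.0


genes = [(100, 200), (150, 300)]
igr = complement(genes, 1000)  # [(1, 99), (301, 1000)]
print(density_per_mb(igr, [50, 250, 900]))
```

This sketch does not resolve overlaps between classes (a base can be CDS in one transcript and intronic in another), whereas Variant_Classification assigns one class per variant, so counts from the two approaches can differ even when the coordinates are correct.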