AssertionError: amino acid sequence length (80) less than mutation position 81 #27

Open
d-henness opened this issue Jan 22, 2019 · 10 comments

@d-henness

I get the following error when I try to run MuPeXI on one of my VCF files.

Reading in data
Creating proteome reference dictionary
Creating genome reference dictionary
Creating cancer genes list

VEP: Starting process for running the Ensembl Variant Effect Predictor
Detecting variant caller
MuTect2
Change VCF to the VEP compatible
Extracting allele frequencies
Running VEP
Creating mutation information dictionary

MuPeX: Starting mutant peptide extraction
Extracting all possible peptides from reference
Peptides of 9 aa are being extracted
Peptide extraction begun
Traceback (most recent call last):
  File "/home/arunimas/MuPeXI/MuPeXI.py", line 1807, in <module>
    main(sys.argv[1:])
  File "/home/arunimas/MuPeXI/MuPeXI.py", line 78, in main
    peptide_info, peptide_counters, fasta_printout, pepmatch_file_names = peptide_extraction(peptide_length, vep_info, proteome_reference, genome_reference, reference_peptides, reference_peptide_file_names, input_.fasta_file_name, paths.peptide_match, tmp_dir, input_.webserver, input_.print_mismatch, input_.keep_temp, input_.prefix, input_.outdir, input_.num_mismatches)
  File "/home/arunimas/MuPeXI/MuPeXI.py", line 730, in peptide_extraction
    peptide_sequence_info = mutation_sequence_creation(mutation_info, proteome_reference, genome_reference, p_length)
  File "/home/arunimas/MuPeXI/MuPeXI.py", line 763, in mutation_sequence_creation
    peptide_sequence_info = insertion_peptide(proteome_reference, mutation_info, peptide_length, PeptideSequenceInfo)
  File "/home/arunimas/MuPeXI/MuPeXI.py", line 789, in insertion_peptide
    asserted_proteome = reference_assertion(proteome_reference, mutation_info, reference_type = 'proteome')
  File "/home/arunimas/MuPeXI/MuPeXI.py", line 1073, in reference_assertion
    assert len(seq) >= mutation_info.prot_pos, 'amino acid sequence length ({}) less than mutation position {}'.format(len(seq), mutation_info.prot_pos)
AssertionError: amino acid sequence length (80) less than mutation position 81

I run MuPeXI with

/home/arunimas/MuPeXI/MuPeXI.py -v header.vcf -a HLA-A01:01,HLA-A32:01,HLA-B08:01,HLA-B14:01,HLA-C07:01,HLA-C08:02 -c /home/arunimas/MuPeXI/config.ini -t

I've attached a minimal VCF file that reproduces this error:
header.vcf.gz

Is there anything I can do to fix this myself?
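
For readers following the traceback, the check that fails can be illustrated in isolation. This is a minimal sketch, not MuPeXI's own code: only the assertion expression and its message are taken from the traceback above, while `MutationInfo`, `check_protein_position`, and `position_in_range` are hypothetical names introduced for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for MuPeXI's mutation record; only prot_pos is modelled.
MutationInfo = namedtuple("MutationInfo", ["prot_pos"])


def check_protein_position(seq, mutation_info):
    """Apply the same length check that is quoted in the traceback."""
    assert len(seq) >= mutation_info.prot_pos, (
        'amino acid sequence length ({}) less than mutation position {}'
        .format(len(seq), mutation_info.prot_pos)
    )


def position_in_range(seq, mutation_info):
    """Non-raising variant: True when the position lies within the protein."""
    return len(seq) >= mutation_info.prot_pos


# An 80-residue placeholder protein and a mutation reported at position 81.
protein = "M" * 80
mutation = MutationInfo(prot_pos=81)

try:
    check_protein_position(protein, mutation)
except AssertionError as err:
    print(err)  # amino acid sequence length (80) less than mutation position 81
```

The sketch only shows that the error is raised whenever the reported protein position exceeds the length of the reference protein sequence; it does not say why the two disagree for a given variant.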

@ambj
Owner

ambj commented Jan 24, 2019

The first thing I would check is that your references match the genome reference used to generate the VCF file. I have seen this error occur when there is a mismatch between the two, because the genomic positions then do not line up.
Let me know if this is not the case.
Best,
Anne-Mette
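
One way to look for such a mismatch is to compare each annotated protein position with the length of the corresponding sequence in the proteome reference FASTA. The following is a rough sketch under stated assumptions: `read_fasta` and `out_of_range` are hypothetical helpers, `ENSP_EXAMPLE` is a made-up identifier, and the `(protein_id, position)` pairs are assumed to have been extracted beforehand from the `Feature` and `Protein_position` fields of the VEP annotation. It is not part of MuPeXI.

```python
import io


def read_fasta(handle):
    """Parse FASTA text into {record_id: sequence}.

    The record id is taken as the first whitespace-delimited word of the header.
    """
    chunks = {}
    current_id = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current_id = line[1:].split()[0]
            chunks[current_id] = []
        elif current_id is not None:
            chunks[current_id].append(line)
    return {record_id: "".join(parts) for record_id, parts in chunks.items()}


def out_of_range(proteome, variants):
    """Return (protein_id, position, length) for positions past the protein end.

    Identifiers missing from the proteome are skipped rather than reported.
    """
    problems = []
    for protein_id, position in variants:
        seq = proteome.get(protein_id)
        if seq is not None and position > len(seq):
            problems.append((protein_id, position, len(seq)))
    return problems


# Made-up example data: one 80-residue protein and two annotated positions.
fasta_text = ">ENSP_EXAMPLE some description\n" + "M" * 60 + "\n" + "M" * 20 + "\n"
proteome = read_fasta(io.StringIO(fasta_text))
variants = [("ENSP_EXAMPLE", 42), ("ENSP_EXAMPLE", 81)]

for protein_id, position, length in out_of_range(proteome, variants):
    print("{}: position {} exceeds length {}".format(protein_id, position, length))
```

If many variants are flagged this way, a reference or annotation version mismatch is a plausible explanation; a single flagged variant points more toward an edge case in how that variant is handled.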

@d-henness
Author

The reference used to make the VCF file was hg38, and the references I'm giving MuPeXI are GRCh38. As I understand it, they use the same positions, so would that make a difference?

@ambj
Owner

ambj commented Jan 25, 2019

No, this should be fine. I will look into this error soon and figure out why it occurs.
Best,
Anne-Mette

@d-henness
Author

Thanks!

@ambj
Owner

ambj commented Feb 7, 2019

Dear d-henness,
I have looked at your VCF file, and it appears to be output from VEP, judging by the following information in the header.

##VEP="v92" time="2018-05-01 18:07:06" cache="/home/jordan/.vep/homo_sapiens/92_GRCh38" ensembl-variation=92.0fc6556 ensembl-io=92.39280bd ensembl-funcgen=92.cd2ca86 ensembl=92.98e8548 1000genomes="phase3" COSMIC="83" ClinVar="201802" ESP="V2-SSA137" HGMD-PUBLIC="20174" assembly="GRCh38.p12" dbSNP="150" gencode="GENC
##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|DISTANCE|STRAND|FLAGS|SYMBOL_SOURCE|HGNC_ID">

MuPeXI is designed to take the VCF file from a variant caller, preferably MuTect2. Therefore I cannot rule out that your error is caused by the processing through VEP.

Do you see the same error if you run the VCF file obtained directly from MuTect2?
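
Whether a VCF file has already been through VEP can be checked from its header, using the two kinds of lines quoted above. This is a minimal sketch: `is_vep_annotated` is a hypothetical helper, not a MuPeXI function, and it assumes the header has been read into a list of lines.

```python
def is_vep_annotated(header_lines):
    """Return True if the VCF meta-information lines show signs of VEP output.

    Looks for a '##VEP=' line or an INFO definition for the CSQ field, and
    stops at the first line that is not a '##' meta-information line.
    """
    for line in header_lines:
        if not line.startswith("##"):
            break
        if line.startswith("##VEP=") or line.startswith("##INFO=<ID=CSQ,"):
            return True
    return False


# Made-up minimal headers for illustration.
plain_header = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]
vep_header = [
    "##fileformat=VCFv4.2",
    '##VEP="v92"',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]

print(is_vep_annotated(plain_header))  # False
print(is_vep_annotated(vep_header))    # True
```

For a gzipped file such as the attached `header.vcf.gz`, the lines could be read with Python's `gzip.open(path, "rt")` before being passed to the function.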

@d-henness
Author

I do.

@ambj
Owner

ambj commented Feb 10, 2019

Can you send me a link to the original MuTect2 file, which has not been processed with VEP?

@d-henness
Author

d-henness commented Feb 12, 2019

I'll email you the file.

@eunjijunekim

I encountered the same error. Has this been resolved?

@leldershaw

I have encountered the same error with the example vcf and tsv files provided in the data/ folder on the MuPeXI GitHub repo. Has this been resolved?
