Default output type of vcf2phylip.py: too many ambiguous nucleotide sequences? #51
They are nucleotides; VCF doesn't support amino acids as far as I know. Your heterozygous genotypes are represented with ambiguity codes, see here: https://www.promega.com/resources/guides/nucleic-acid-analysis/restriction-enzyme-resource/restriction-enzyme-resource-tables/iupac-ambiguity-codes-for-nucleotide-degeneracy/ Edgardo
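For readers unfamiliar with the codes, here is a minimal sketch of how a diploid genotype can be collapsed into one character. It is not taken from vcf2phylip.py; the helper name `genotype_to_code` is hypothetical, and only the standard two-nucleotide IUPAC codes are covered.

```python
# Standard IUPAC codes for two-nucleotide ambiguities.
IUPAC_TWO_BASE = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def genotype_to_code(allele1, allele2):
    """Hypothetical helper: collapse a diploid genotype into one character."""
    a, b = allele1.upper(), allele2.upper()
    if a == b:
        # Homozygous genotype: the nucleotide itself is written out.
        return a
    # Heterozygous genotype: look up the code covering both nucleotides.
    return IUPAC_TWO_BASE[frozenset((a, b))]


print(genotype_to_code("A", "A"))  # A
print(genotype_to_code("A", "G"))  # R
print(genotype_to_code("T", "C"))  # Y
```

Letters such as R, Y, S, W, K and M in the output are therefore still nucleotide characters, even though some of them also happen to be amino acid letters.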
@edgardomortiz I see. Thanks for your reply! I then wonder whether it is normal that my translated phylip file is filled with ambiguity codes, and whether this would affect tree construction. If so, should I enable the parameter of
I don't think it is a good idea to translate SNPs; they are not contiguous in the genome. Besides that, degenerate nucleotides will create degenerate amino acids as well during translation. The option --resolve-IUPAC will choose one nucleotide at random when you have an ambiguity. You may try that, but I think it won't fix your issue of trying to translate SNPs (unless I am missing something about your specific VCF). I hope this makes sense, Edgardo
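To illustrate what picking one nucleotide at random means, here is a rough sketch. It is an assumption about the general idea, not the script's actual code; `resolve_ambiguities` is a hypothetical name, and only two-nucleotide codes are resolved while everything else (including N) is left untouched.

```python
import random

# Nucleotides covered by each two-nucleotide IUPAC code.
IUPAC_EXPANSION = {
    "R": "AG",
    "Y": "CT",
    "S": "CG",
    "W": "AT",
    "K": "GT",
    "M": "AC",
}


def resolve_ambiguities(sequence, rng=random):
    """Hypothetical helper: replace each ambiguity code with one of its
    two nucleotides chosen at random; other characters pass through."""
    return "".join(
        rng.choice(IUPAC_EXPANSION[base]) if base in IUPAC_EXPANSION else base
        for base in sequence
    )


# Made-up sequence; the result varies between runs unless rng is seeded.
print(resolve_ambiguities("ACRTYNGK", rng=random.Random(1)))
```

Because the choice is random, two runs on the same VCF can produce different alignments unless the random generator is seeded.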
Thanks for your reply. By "translating SNPs", I meant converting the VCF format to an alignment format, e.g., phylip, for tree construction. My wording may have been misleading: I did not mean translating nucleotides into amino acid sequences. Sorry about that. I agree that SNPs are discontinuous in the genome. I am just wondering why I got so many ambiguous sequences from the VCF file and whether I should add the
Thanks in advance
Ah, I see, you meant converting VCF to another format (sorry for being pedantic, but "translating" has a biological meaning and I got confused). As I said above, you have heterozygous genotypes because I assume your organism is at least diploid. For phylogenetics it is common to use a single sequence per sample, and the way to achieve this is to represent both possible nucleotides with a single ambiguity code. As for the consequences of these ambiguities for your data, I can't predict them because I am not familiar with the organisms you are analyzing, but in general the more ambiguities there are, the less resolved the tree ends up. Maybe your SNP calling settings were set up incorrectly? Maybe your reference genome is too distant? I don't know, I am just speculating here... Edgardo
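One way to get a feel for how heavily a dataset is affected is to measure the share of ambiguity codes per sample. The sketch below is only an illustration: the function name, the sample names and the sequences are all made up, and it assumes the sequences have already been read from the alignment into a dictionary.

```python
AMBIGUITY_CODES = set("RYSWKM")  # two-nucleotide IUPAC codes
MISSING = set("N?-")             # characters treated as missing data


def ambiguity_fraction(sequence):
    """Hypothetical helper: share of called sites that are ambiguity codes."""
    called = [base for base in sequence.upper() if base not in MISSING]
    if not called:
        return 0.0
    return sum(base in AMBIGUITY_CODES for base in called) / len(called)


# Made-up example data, not taken from any real VCF.
samples = {
    "sample_1": "ACGTRYAC",
    "sample_2": "ACGTNNAC",
}
for name, seq in samples.items():
    print(name, round(ambiguity_fraction(seq), 3))
```

A sample with an unusually high fraction compared with the others may point to a problem upstream, such as the SNP calling settings or the choice of reference.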
This is what the script does by default; it is the reason you have the ambiguity codes in the first place. No need to do anything additional...
Hi, thanks for developing this tool.
I ran vcf2phylip.py successfully, but the output seems to be amino acid sequences. My code and a screenshot of my output file are as follows:
I did not find any parameter for setting the output type, but I would prefer a nucleotide sequence alignment as output. How can I do this?
Any suggestions would be greatly appreciated!