Taxon identifiers for training data? #9
Comments
Unfortunately, I do not have taxonomic identifiers for them, i.e. NCBI taxonomy IDs. The genomes were collected mostly from NCBI and JGI's MycoCosm, but a few came from JGI's Genome Portal as well. I still have the sequences compiled and could send you a copy if you're interested. Please send me an email if so.
Yeah, I had a feeling they would contain a mix of NCBI and JGI. Definitely interested in playing around with the FASTA database if it's available. How big is the (compressed?) FASTA database? Thanks Patrick!
Are you still able to transfer over the FASTA files?
Sorry, this slipped my mind. The compressed archive is 1.3 GB. See if you can get it from here:
Do you happen to have taxon identifiers for the training data in this file?
https://genome.cshlp.org/content/suppl/2018/03/22/gr.228429.117.DC1/Supplemental_Table_S1.xlsx
It's currently difficult to know which sequences are associated with these.
Here's the list, but it's difficult to search the names. Did you download them from NCBI or another database?
Organisms
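In case it helps anyone with the same problem, here is a rough sketch of how the organism names could be matched to NCBI taxonomy IDs using the `names.dmp` file from NCBI's taxonomy dump. This is not how the training data was built. The helper names (`load_scientific_names`, `lookup_taxid`) are made up for illustration, the sample rows stand in for a real `names.dmp`, and the fallback to the first two words (genus and species) is an assumption that will miss renamed taxa and JGI-only entries.

```python
import io


def load_scientific_names(handle):
    """Map lower-cased scientific names to taxonomy IDs from names.dmp rows."""
    name_to_taxid = {}
    for line in handle:
        # Row layout: tax_id, name_txt, unique name, name class,
        # separated by "\t|\t" and terminated by "\t|".
        fields = line.rstrip("\n").split("\t|\t")
        if len(fields) < 4:
            continue
        tax_id, name_txt = fields[0], fields[1]
        name_class = fields[3].rstrip("\t|")
        if name_class == "scientific name":
            name_to_taxid[name_txt.lower()] = int(tax_id)
    return name_to_taxid


def lookup_taxid(organism, name_to_taxid):
    """Try the full name first, then fall back to 'Genus species' (assumption)."""
    key = " ".join(organism.lower().split())
    if key in name_to_taxid:
        return name_to_taxid[key]
    binomial = " ".join(key.split()[:2])
    return name_to_taxid.get(binomial)


# Illustrative rows only; in practice open the real names.dmp instead.
SAMPLE_ROWS = (
    "4932\t|\tSaccharomyces cerevisiae\t|\t\t|\tscientific name\t|\n"
    "4932\t|\tbaker's yeast\t|\t\t|\tgenbank common name\t|\n"
    "5141\t|\tNeurospora crassa\t|\t\t|\tscientific name\t|\n"
)

name_to_taxid = load_scientific_names(io.StringIO(SAMPLE_ROWS))
for organism in ["Saccharomyces cerevisiae S288C", "Neurospora crassa", "Unknown fungus"]:
    print(organism, lookup_taxid(organism, name_to_taxid))
```

Names that come back as `None` would still need to be resolved by hand, which is likely for the genomes that only exist in JGI.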