how to set $taxaJson parameter #4

Open
kyle-lk opened this issue Oct 16, 2018 · 3 comments

Comments

@kyle-lk

kyle-lk commented Oct 16, 2018

Hi Matt,
I need some help understanding your pipeline. I don't know how to set the $taxaJson parameter. Can you give me an example?

Thanks

@torptube
Collaborator

Hi Kyle,

The --taxaJson parameter is a bit of a misnomer: the serialized data structure is no longer JSON. I think this version uses Sereal. The structure is just a hash of hashes built from the NCBI taxa dump. The top-level keys are "parents", "names", "ranks", and "children", which correspond to the same columns in the NCBI taxa dump.
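
As a rough illustration of that shape (not the pipeline's own constructor), the sketch below builds a dictionary with the four top-level keys from lines of NCBI's `nodes.dmp` and `names.dmp`. The function names `split_dmp_line` and `build_taxa` are hypothetical. It is also an assumption that keys are taxon ID strings, that "names" holds only scientific names, and that "children" maps to a list. The pipeline may expect a different inner layout, and the result would still need to be serialized with Sereal on the Perl side.

```python
from collections import defaultdict


def split_dmp_line(line):
    """Split one NCBI .dmp row; fields are separated by "\t|\t" and rows end in "\t|"."""
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]
    return line.split("\t|\t")


def build_taxa(nodes_lines, names_lines):
    """Build {"parents", "names", "ranks", "children"} from taxdump rows (hypothetical helper)."""
    taxa = {"parents": {}, "names": {}, "ranks": {}, "children": defaultdict(list)}

    # nodes.dmp: tax_id, parent tax_id, rank, ...
    for line in nodes_lines:
        fields = split_dmp_line(line)
        tax_id, parent_id, rank = fields[0], fields[1], fields[2]
        taxa["parents"][tax_id] = parent_id
        taxa["ranks"][tax_id] = rank
        if tax_id != parent_id:  # the root node lists itself as its parent
            taxa["children"][parent_id].append(tax_id)

    # names.dmp: tax_id, name, unique name, name class
    for line in names_lines:
        tax_id, name, _unique, name_class = split_dmp_line(line)[:4]
        if name_class == "scientific name":  # assumption: keep one name per taxon
            taxa["names"][tax_id] = name

    taxa["children"] = dict(taxa["children"])
    return taxa
```

Both arguments can be open file handles, for example `build_taxa(open("nodes.dmp"), open("names.dmp"))`, since the function only iterates over lines.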

Hopefully this helps. I don't have anywhere to upload the version I am using internally, since it's a rather large file, and I am not done writing the database construction instructions.

Cheers,
Matt

@kyle-lk
Author

kyle-lk commented Oct 22, 2018

Thanks, Matt,

So, do I need to write a script to convert NCBI's taxonomy dump files? Or can I just add "parents", "names", "ranks", and "children" directly to the first line of the gi_taxid_prot.dmp file? In fact, I have only just started with bioinformatics, and this is the second pipeline I have studied. There is still a lot I don't understand.

@torptube
Collaborator

torptube commented Jul 17, 2020

A database constructor has been written and posted.
