Construct databases for all profilers from the same input refseq data #33

Open
2 of 7 tasks
LilyAnderssonLee opened this issue Aug 14, 2023 · 1 comment

LilyAnderssonLee commented Aug 14, 2023

Construct databases using the same refseq data as the kraken2 database, which includes:

  • complete genomes, chromosomes, scaffolds and contigs
  • Archaea, Bacteria, viral, protozoa, fungi, plasmid, human (GRCh38 & T2T), UniVec_Core

The kraken2 database can be downloaded from https://benlangmead.github.io/aws-indexes/k2

The list of refseq sequences is stored in the tsv file: https://genome-idx.s3.amazonaws.com/kraken/pluspf_20231009/library_report.tsv
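
To reuse the same inputs for the other profilers, the report could be reduced to a per-library list of source URLs. The snippet below is only a rough sketch: it assumes the file is tab-separated with a header row and three columns (library, sequence name, URL), which should be checked against the actual file. The sample rows, the `example.org` URLs, and the function name `urls_by_library` are hypothetical and only there so the snippet runs on its own.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows standing in for library_report.tsv.
# The three-column layout is an assumption, not confirmed by this issue.
SAMPLE = (
    "#Library\tSequence Name\tURL\n"
    "archaea\t>NC_000001.1 Example archaeon chromosome\t"
    "ftp://example.org/GCF_000000001.1_genomic.fna.gz\n"
    "bacteria\t>NC_000002.1 Example bacterium chromosome\t"
    "ftp://example.org/GCF_000000002.1_genomic.fna.gz\n"
    "bacteria\t>NC_000003.1 Example bacterium plasmid\t"
    "ftp://example.org/GCF_000000002.1_genomic.fna.gz\n"
)


def urls_by_library(handle):
    """Group the unique source URLs by library name."""
    # QUOTE_NONE keeps quote characters in sequence names as literal text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader)  # skip the header row
    grouped = defaultdict(set)
    for row in reader:
        if len(row) < 3:
            continue  # ignore malformed or empty lines
        library, url = row[0], row[2]
        grouped[library].add(url)
    return {library: sorted(urls) for library, urls in grouped.items()}


result = urls_by_library(io.StringIO(SAMPLE))
for library, urls in result.items():
    print(library, len(urls))
```

For the real file, pass `open("library_report.tsv", newline="")` instead of the in-memory sample. Several sequences can share one URL (for example a chromosome and its plasmid), which is why the URLs are collected into a set.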

Update databases whenever the kraken2 database is updated.

This issue can be closed after construction of these databases:

  • diamond
  • kaiju
  • centrifuge
  • krakenuniq
  • kmcp
  • ganon
  • sourmash
@sofstam sofstam changed the title Which databases are used in each Profiler? Which databases are used in each classifier/profiler? Sep 25, 2023
@LilyAnderssonLee LilyAnderssonLee changed the title Which databases are used in each classifier/profiler? Construct databases for all profilers from the same input refseq data Dec 18, 2023

LilyAnderssonLee commented Dec 19, 2023

kaiju: The database construction failed on hasta due to the memory limit (450 GB). This database was built on UPPMAX Bianca with 512 GB of memory.
