DIAMOND Reference Database Generator

A small repository for generating and storing DIAMOND-formatted reference databases (.dmnd files) for use with diamond blastx, with customizable specificity.

Creating the database

First create and activate the environment:

micromamba create -f environment.yaml
micromamba activate diamond-db-creator

Then create a config.yaml file (like the example) with the list of all INSDC references you would like to include in your database. For example:

references:
  seg4-H1: U08903.1
  seg4-H2: CY005413.1
  seg3-H5N1: NC_007359.1
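A typo in a key or accession only shows up later as an unexpected dataset name in the results, so it can help to sanity-check the references mapping before building. The following is a minimal sketch, not part of this repo: `check_references`, `KEY_RE` and `ACCESSION_RE` are hypothetical names, the key pattern assumes the `{segment}-{reference}` convention with exactly one hyphen, and the accession pattern is a loose heuristic rather than an official INSDC grammar. The last key in the example deliberately contains a doubled hyphen to show what gets flagged.

```python
import re

# Loose heuristic for accession.version strings such as "U08903.1" or
# "NC_007359.1". This is an assumption for illustration only.
ACCESSION_RE = re.compile(r"^[A-Z]{1,6}_?\d+\.\d+$")

# Assumed key shape: "{segment}-{reference}" with exactly one hyphen.
KEY_RE = re.compile(r"^[A-Za-z0-9]+-[A-Za-z0-9]+$")


def check_references(references):
    """Return a list of human-readable problems found in the mapping."""
    problems = []
    for key, accession in references.items():
        if not KEY_RE.match(key):
            problems.append(f"key {key!r} is not of the form segment-reference")
        if not ACCESSION_RE.match(str(accession)):
            problems.append(
                f"value {accession!r} for {key!r} does not look like an accession"
            )
    return problems


# The mapping as it would look after loading the "references" section
# of config.yaml (hard-coded here to keep the sketch dependency-free).
references = {
    "seg4-H1": "U08903.1",
    "seg4-H2": "CY005413.1",
    "seg3--H5N1": "NC_007359.1",  # deliberately malformed key
}

for problem in check_references(references):
    print(problem)
```

If your reference names legitimately contain hyphens or other characters, relax `KEY_RE` accordingly.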

Note that the key (e.g. seg4-H1) becomes the prefix of the dataset name returned in the diamond blastx results.tsv. Because DIAMOND matches proteins, each CDS in a sequence receives its own identifier; in Loculus, all hits of the form id|CDS{i} are mapped back to the sequence id. Each key should be identical to the sequenceName it is assigned to, e.g. {segment}-{reference}. Alternatively, you can add lists of accepted matches to the config.accepted_dataset_matches field.

seqName	dataset	pident	...
MW874350.1	seg3-H5N1|CDS1	0.4829120176662018	...
MW874350.1	seg3-H1N1|CDS2	0.34937439846005774	...
MW874350.1	seg3-H3N2|CDS1	0.22552301255230126	...
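To illustrate how the `|CDS{i}` suffix relates to the config keys, here is a minimal sketch that strips the suffix from each dataset name and keeps one hit per sequence. It is not the Loculus implementation: the function name `best_dataset_per_sequence` is hypothetical, and choosing the hit with the highest pident is an assumption made for this example.

```python
import csv
import io


def best_dataset_per_sequence(tsv_text):
    """Return {seqName: (dataset_prefix, pident)} for the highest-pident hit."""
    best = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # Drop the per-CDS suffix: "seg3-H5N1|CDS1" -> "seg3-H5N1"
        prefix = row["dataset"].split("|", 1)[0]
        pident = float(row["pident"])
        current = best.get(row["seqName"])
        # Assumption: the hit with the highest pident wins.
        if current is None or pident > current[1]:
            best[row["seqName"]] = (prefix, pident)
    return best


# The three example rows from above, reduced to the columns shown.
example = (
    "seqName\tdataset\tpident\n"
    "MW874350.1\tseg3-H5N1|CDS1\t0.4829120176662018\n"
    "MW874350.1\tseg3-H1N1|CDS2\t0.34937439846005774\n"
    "MW874350.1\tseg3-H3N2|CDS1\t0.22552301255230126\n"
)

for seq_name, (dataset, pident) in best_dataset_per_sequence(example).items():
    print(seq_name, dataset, pident)
```

The recovered prefix (here seg3-H5N1) is what would be compared against the keys in config.yaml.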

Once the config file is ready, build the database by running:

snakemake --config config_file=<path-to-config-file>
