DIAMOND Reference Database Generator

A small repository for generating and storing DIAMOND-formatted reference databases (.dmnd file extension) for use with diamond blastx, with customizable specificity.

Creating the database

First create and activate the environment:

micromamba create -f environment.yaml
micromamba activate diamond-db-creator

Then create a config.yaml file (like the example) with the list of all INSDC references you would like to include in your database. For example:

references:
  seg4-H1: U08903.1
  seg4-H2: CY005413.1
  seg3-H5N1: NC_007359.1
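
Because the keys later serve as dataset name prefixes, a typo such as a doubled hyphen is worth catching before the database is built. The following is a minimal sketch of such a check, not part of this repository. The function name `malformed_keys` is hypothetical, and the check assumes that neither the segment nor the reference name contains a hyphen itself.

```python
def malformed_keys(references):
    """Return the keys that do not look like '{segment}-{reference}'.

    Assumption: segment and reference names contain no hyphens, so a
    well-formed key splits into exactly two non-empty parts.
    """
    bad = []
    for key in references:
        parts = key.split("-")
        if len(parts) != 2 or not all(parts):
            bad.append(key)
    return bad


# A dict mirroring the 'references' mapping of a config.yaml.
references = {
    "seg4-H1": "U08903.1",
    "seg4-H2": "CY005413.1",
    "seg3--H5N1": "NC_007359.1",  # deliberately mistyped: doubled hyphen
}

print(malformed_keys(references))  # prints ['seg3--H5N1']
```

If your segment or reference names legitimately contain hyphens, this check is too strict and would need to be adapted.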

Note that the key (e.g. seg4-H1) will be the prefix of the dataset name returned in the diamond blastx results.tsv. Because DIAMOND matches proteins, each CDS in a sequence receives its own identifier of the form id|CDS{i}; in Loculus, all sequences that match the protein id|CDS{i} are mapped to the sequence id. The keys should be identical to the sequenceName they are assigned to, e.g. {segment}-{reference}. Alternatively, you can add lists of accepted matches to the config.accepted_dataset_matches field.

For example, results.tsv may contain:

seqName	dataset	pident	...
MW874350.1	seg3-H5N1|CDS1	0.4829120176662018	...
MW874350.1	seg3-H1N1|CDS2	0.34937439846005774	...
MW874350.1	seg3-H3N2|CDS1	0.22552301255230126	...
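
The mapping from a per-CDS dataset name back to its config key can be sketched as follows. This is an illustration only, not the Loculus implementation. The function name `best_match_per_sequence` is hypothetical, only the three columns shown above are read, and choosing the hit with the highest pident is an assumption about how a single match might be selected.

```python
import csv
import io


def best_match_per_sequence(tsv_text):
    """Map each seqName to (config key, pident) of its highest-pident hit.

    The config key is the part of the dataset name before '|CDS{i}'.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    best = {}
    for row in reader:
        # 'seg3-H5N1|CDS1' -> 'seg3-H5N1'
        key = row["dataset"].split("|", 1)[0]
        pident = float(row["pident"])
        current = best.get(row["seqName"])
        if current is None or pident > current[1]:
            best[row["seqName"]] = (key, pident)
    return best


# The example rows from above, reduced to the three columns shown.
example = (
    "seqName\tdataset\tpident\n"
    "MW874350.1\tseg3-H5N1|CDS1\t0.4829120176662018\n"
    "MW874350.1\tseg3-H1N1|CDS2\t0.34937439846005774\n"
    "MW874350.1\tseg3-H3N2|CDS1\t0.22552301255230126\n"
)

for seq_name, (key, pident) in best_match_per_sequence(example).items():
    print(seq_name, key, pident)
```

For the example rows, MW874350.1 is assigned to seg3-H5N1, the key of its highest-scoring CDS hit.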

To create the database, run Snakemake with the path to your config file:

snakemake --config config_file=<path-to-config-file>