create_templates

Create a 3D templates (or solution) file for RNA.

Currently relies on MMseqs2 to carry out search and alignment.

See also the MMseqs2 3D RNA Template notebook on Kaggle, which has the full workflow!

Requirements:

biopython
PDB_RNA/ directory holding:

  .cif.gz or .cif files (get IDs from Advanced Search in PDB, searching for polymer entity RNA, and then batch download),
  pdb_release_dates_NA.csv (get from Advanced Search in PDB), and
  pdb_seqres_NA.fasta (available at link).

Note that a minimal PDB_RNA folder that works for the example is provided here. A version used for the Kaggle RNA folding competition in May 2025 is available here. If that is not accessible, check out this clone.
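
Before running the script, it can help to confirm that the PDB_RNA/ directory has the layout listed under Requirements. The following is a minimal sketch, not part of this repository: the function name `check_cif_dir` is hypothetical, and it assumes the structure files sit directly inside the directory rather than in subfolders.

```python
from pathlib import Path

# File names taken from the Requirements list above.
REQUIRED_FILES = ("pdb_release_dates_NA.csv", "pdb_seqres_NA.fasta")


def check_cif_dir(cif_dir):
    """Return a list of problems found in cif_dir (empty list if none).

    Hypothetical helper, not part of create_templates_csv.py.
    """
    root = Path(cif_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]

    problems = [
        f"missing {name}"
        for name in REQUIRED_FILES
        if not (root / name).is_file()
    ]

    # Assumption: .cif / .cif.gz files are stored at the top level of cif_dir.
    has_cif = any(
        p.is_file() and p.name.endswith((".cif", ".cif.gz"))
        for p in root.iterdir()
    )
    if not has_cif:
        problems.append("no .cif or .cif.gz files found")
    return problems
```

Calling `check_cif_dir("../PDB_RNA")` from within example/ returns an empty list when all three kinds of input are present.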

Example command line

cd example/
python3 ../create_templates_csv.py \
    -s validation_sequences.csv \
    --mmseqs_results_file validation_Result.txt \
    --skip_temporal_cutoff \
    --outfile validation_templates.csv

The output should match example/example_output/validation_templates.csv.
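
One way to check the output against the reference file is a cell-by-cell CSV comparison. This is a rough sketch, not something shipped with the repository: `first_difference` is a hypothetical helper, and it compares cells as exact strings, so differences in number formatting alone would be reported as mismatches.

```python
import csv


def read_rows(path):
    """Read a CSV file into a list of rows (each row is a list of strings)."""
    with open(path, newline="") as handle:
        return list(csv.reader(handle))


def first_difference(path_a, path_b):
    """Return None if both CSVs hold identical cells.

    Otherwise return (row_number, row_a, row_b) for the first differing row,
    with row_number counted from 1 and a missing row reported as None.
    Hypothetical helper; cells are compared as exact strings.
    """
    rows_a, rows_b = read_rows(path_a), read_rows(path_b)
    for i in range(max(len(rows_a), len(rows_b))):
        row_a = rows_a[i] if i < len(rows_a) else None
        row_b = rows_b[i] if i < len(rows_b) else None
        if row_a != row_b:
            return (i + 1, row_a, row_b)
    return None
```

From within example/, `first_difference("validation_templates.csv", "example_output/validation_templates.csv")` returning `None` means the two files agree.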

All options

% python3 create_templates_csv.py -h

usage: create_templates_csv.py [-h] [-s SEQUENCES_FILE] [--mmseqs_results_file MMSEQS_RESULTS_FILE] [--outfile OUTFILE] [--dataset_name DATASET_NAME] [-o OUTDIR] [--max_templates MAX_TEMPLATES] [--cif_dir CIF_DIR]
                               [--skip_temporal_cutoff] [--start_idx START_IDX] [--end_idx END_IDX] [--id_map ID_MAP]

Prepare templates.csv file similar to labels.csv but with MMseqs2-identified templates

options:
  -h, --help            show this help message and exit
  -s, --sequences_file SEQUENCES_FILE
                        CSV file with columns including "target_id" and "sequence". Default is `test_sequences.csv`.
  --mmseqs_results_file MMSEQS_RESULTS_FILE
                        MMseqs output with query,target,evalue,qstart,qend,tstart,tend,qaln,taln.
  --outfile OUTFILE     Name of the output CSV file. Default is `templates.csv`.
  --dataset_name, --name DATASET_NAME
                        full dataset_name, tag for csvs
  -o, --outdir OUTDIR   Where to save output CSVs (Default ./)
  --max_templates MAX_TEMPLATES
                        Maximum number of templates for target. Default is 5. Use 40 to prepare solution
  --cif_dir CIF_DIR     Directory holding cif.gz files, pdb_release_dates_NA.csv, and pdb_seqres_NA.fasta
  --skip_temporal_cutoff
                        Disable tests of temporal cutoff
  --start_idx START_IDX
                        Start index (1,2,...) of test_sequences to work on, for parallelization. Default: 0 (do all sequences).
  --end_idx END_IDX     End index (1,2,...) of test_sequences to work on, for parallelization. Default: 0 (do all sequences).
  --id_map ID_MAP       CSV file with fields `orig` and `new` for mapping original target IDs to new target IDs. Default is `` (no mapping).
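
The --start_idx and --end_idx options are described as being for parallelization. A possible way to split a run into chunks is sketched below. It assumes both indices are 1-based and that --end_idx is inclusive; the help text does not state the latter, so check the script before relying on it. The helper names, the `Result.txt` default, and the per-chunk output file names are hypothetical.

```python
def chunk_ranges(n_sequences, n_chunks):
    """Split 1..n_sequences into contiguous (start, end) ranges.

    Assumption: indices are 1-based and end is inclusive.
    """
    if n_sequences < 1 or n_chunks < 1:
        raise ValueError("n_sequences and n_chunks must be positive")
    n_chunks = min(n_chunks, n_sequences)
    base, extra = divmod(n_sequences, n_chunks)
    ranges = []
    start = 1
    for k in range(n_chunks):
        # The first `extra` chunks take one additional sequence each.
        size = base + (1 if k < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def chunk_commands(n_sequences, n_chunks,
                   sequences_file="test_sequences.csv",
                   results_file="Result.txt"):
    """Build one argument list per chunk, using only flags from the help text.

    The output file naming scheme is a hypothetical choice for illustration.
    """
    commands = []
    for start, end in chunk_ranges(n_sequences, n_chunks):
        commands.append([
            "python3", "create_templates_csv.py",
            "-s", sequences_file,
            "--mmseqs_results_file", results_file,
            "--start_idx", str(start),
            "--end_idx", str(end),
            "--outfile", f"templates_{start}_{end}.csv",
        ])
    return commands
```

Each argument list can be passed to `subprocess.run` or submitted as a separate cluster job; the per-chunk CSVs would then need to be concatenated afterwards.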

How to run MMseqs2

To generate the MMseqs2 results file, install MMseqs2 and, from within example/, run commands like:

mmseqs createdb ../PDB_RNA/pdb_seqres_NA.fasta pdb_seqres_NA --dbtype 2
mmseqs easy-search validation_sequences.fasta pdb_seqres_NA validation_Result.txt tmp --search-type 3 --format-output "query,target,evalue,qstart,qend,tstart,tend,qaln,taln"
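
The easy-search command above takes a FASTA file as its query, while the script itself reads a CSV with "target_id" and "sequence" columns. If only the CSV is at hand, a FASTA file could be derived from it roughly as follows. This is a sketch under the assumption that each FASTA header should be the target_id, so that the query column of the MMseqs2 results matches the CSV; `csv_to_fasta` is a hypothetical helper, not part of this repository.

```python
import csv


def csv_to_fasta(csv_path, fasta_path):
    """Write one FASTA record per CSV row and return the number written.

    Expects "target_id" and "sequence" columns, as described for
    --sequences_file. Assumption: the header line is just the target_id.
    """
    count = 0
    with open(csv_path, newline="") as src, open(fasta_path, "w") as dst:
        for row in csv.DictReader(src):
            dst.write(f">{row['target_id']}\n{row['sequence']}\n")
            count += 1
    return count
```

For example, `csv_to_fasta("validation_sequences.csv", "validation_sequences.fasta")` run within example/ would produce the query file used in the command above.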
