(Original artwork by Althea Hansel-Harris)
Package for creating SQLite database from virtual screening results, performing filtering, and exporting results. Compatible with AutoDock-GPU and AutoDock-Vina.
Ringtail reads collections of Docking Log File (DLG) or PDBQT results from virtual screenings performed with AutoDock-GPU and AutoDock-Vina, respectively, and deposits them into a SQLite database. It then allows for the filtering of results with numerous pre-defined filtering options, generation of a simple result scatterplot, export of molecule SDFs, and export of CSVs of result data. Result file parsing is parallelized across the user's CPU.
The publication describing the design, implementation, and features of Ringtail may be found in the JCIM paper:
If using Ringtail in your work, please cite this publication.
Ringtail is developed by the Forli lab at the Center for Computational Structural Biology (CCSB) at Scripps Research.
- --mode is now --docking_mode
- --summary is now --print_summary
- --pattern is now --file_pattern
- --name is now --ligand_name
- --max_nr_atoms is now --ligand_max_atoms
- --smarts is now --ligand_substruct
- --smarts_idxyz is now --ligand_substruct_pos
- --smarts_join is now --ligand_operator
- --van_der_waals is now --vdw_interactions
- --hydrogen_bond is now --hb_interactions
- --reactive_res is now --reactive_interactions
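For scripts that still call the command line tools with the old option names, the renames above can be applied mechanically. The following is a minimal sketch; `RENAMED_OPTIONS` and `migrate_args` are hypothetical helpers written for illustration and are not part of Ringtail.

```python
# Old-to-new long option names, copied from the list above.
RENAMED_OPTIONS = {
    "--mode": "--docking_mode",
    "--summary": "--print_summary",
    "--pattern": "--file_pattern",
    "--name": "--ligand_name",
    "--max_nr_atoms": "--ligand_max_atoms",
    "--smarts": "--ligand_substruct",
    "--smarts_idxyz": "--ligand_substruct_pos",
    "--smarts_join": "--ligand_operator",
    "--van_der_waals": "--vdw_interactions",
    "--hydrogen_bond": "--hb_interactions",
    "--reactive_res": "--reactive_interactions",
}


def migrate_args(args):
    """Return a copy of args with old long option names replaced.

    Handles both "--opt value" and "--opt=value" spellings. Anything that
    is not a renamed option is passed through unchanged.
    """
    migrated = []
    for arg in args:
        name, sep, value = arg.partition("=")
        migrated.append(RENAMED_OPTIONS.get(name, name) + sep + value)
    return migrated


old_call = ["read", "--input_db", "output.db", "--van_der_waals", "A:VAL:279:", "--summary"]
new_call = migrate_args(old_call)
print(new_call)
```

Only long option names are rewritten here; short flags and option values are left as they are.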
- Fully developed API: Ringtail can now be used exclusively through Python scripting
- Can add docking results directly without using file system (for vina only as output comes as a string).
- The Ringtail log is now written to a logging file in addition to STDOUT
- Interaction tables: one new table has been added (Interactions) which references the interaction id from Interaction_indices, while the table Interaction_bitvectors has been discontinued.
- A new method to update an existing database 1.1.0 (or 1.0.0) to 2.0.0 is included. However, if the existing database was created with the duplicate handling option, there is a chance of inconsistent behavior of anything involving interactions as the Pose_ID was not used as an explicit foreign key in db v1.0.0 and v1.1.0 (see Bug fixes below).
- The option duplicate_handling could previously only be applied during database creation and produced inconsistent table behavior. The option can now be applied any time results are added to a database, and will create internally consistent tables. Please note: if you created tables in the past while invoking the keyword duplicate_handling, you may have errors in the "Interaction_bitvectors" table. These errors cannot be recovered, and we recommend you re-make the database with Ringtail 2.0.0.
- Writing SDFs from filtering bookmarks: Ringtail will check that the bookmark exists and has data before writing, and will now produce SDFs for any existing bookmark. If the bookmark results from a filtering where max_miss < 0 it will note if the non-union bookmark is used, and if the base name for such bookmarks is provided it will default to the basename_union bookmark for writing the SDFs.
- Output from filtering using max_miss and output_all_poses=False (default) now produces the expected behavior of outputting only one pose per ligand. Filtering for interactions with max_miss allows any given pose for a ligand to miss max_miss interactions and still be considered to pass the filter. Previously, in the resulting union bookmark and output_log text file, some ligands would present with more than one pose, although the option output_all_poses was False (and thus the expectation would be one pose outputted per ligand). This would give the wrong count for how many ligands passed a filter, as some were counted more than once.
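The intended max_miss behavior described in the last item can be illustrated with plain Python. This is a sketch under stated assumptions, not Ringtail's implementation: `filter_poses` is a hypothetical function, poses are represented as simple dictionaries, and choosing the best-scoring pose as the single pose kept per ligand is an assumption made for the example.

```python
def filter_poses(poses, required, max_miss=0, output_all_poses=False):
    """Keep poses that miss at most max_miss of the required interactions.

    poses: list of dicts with keys "ligand", "score", "interactions" (a set).
    With output_all_poses=False only the lowest-scoring passing pose of each
    ligand is returned, so every ligand is counted exactly once.
    """
    required = set(required)
    passing = [p for p in poses if len(required - p["interactions"]) <= max_miss]
    if output_all_poses:
        return passing
    best = {}
    for pose in passing:
        current = best.get(pose["ligand"])
        if current is None or pose["score"] < current["score"]:
            best[pose["ligand"]] = pose
    return list(best.values())


# Illustrative data only
example_poses = [
    {"ligand": "lig1", "score": -7.0, "interactions": {"A:VAL:279:", "A:LYS:162:"}},
    {"ligand": "lig1", "score": -6.5, "interactions": {"A:VAL:279:"}},
    {"ligand": "lig2", "score": -6.0, "interactions": set()},
]
required_interactions = ["A:VAL:279:", "A:LYS:162:"]

one_per_ligand = filter_poses(example_poses, required_interactions, max_miss=1)
all_passing = filter_poses(example_poses, required_interactions, max_miss=1,
                           output_all_poses=True)
print(len(one_per_ligand), len(all_passing))
```

Both poses of lig1 pass when one interaction may be missed, but only one of them is reported unless all poses are requested, which keeps the ligand count correct.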
If you have previously written a database with Ringtail < v2.0.0, it will need to be updated to be compatible with filtering with v2.0.0. We have included a new script, rt_db_to_v200.py, to perform this update. Please note that all existing bookmarks will be removed during the update. The usage is as follows:
$ rt_db_to_v200.py -d <v2.0.0 database 1 (required)> <v2.0.0 database 2+ (optional)>
Multiple databases may be specified at once. The update may take a few minutes per database.
Code base and database schema version update
- Significant filtering runtime improvements vs v1.0
- --summary option for getting quick overview of data across entire dataset
- Selection of dissimilar output ligands with Morgan fingerprint or interaction fingerprint clustering
- Select similar ligands from query ligand name in previous Morgan fingerprint or interaction fingerprint clustering groups
- Option for exporting stored receptor PDBQTs
- Filter by ligand substructure
- Filter by ligand substructure location in cartesian space
- --max_miss option now outputs union of interaction combinations by default, with --enumerate_interaction_combs option to log passing ligands/poses for individual interaction combinations
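To make the last item concrete, the sketch below enumerates the interaction combinations implied by a max_miss value and takes the union of the poses passing each one. The function names and the pose data are hypothetical and only illustrate the idea; Ringtail performs this inside the database.

```python
from itertools import combinations


def interaction_combinations(interactions, max_miss):
    """Subsets of interactions obtained by leaving out 0..max_miss of them."""
    interactions = list(interactions)
    combos = []
    for n_missing in range(max_miss + 1):
        size = len(interactions) - n_missing
        if size < 0:
            break  # cannot leave out more interactions than were requested
        combos.extend(combinations(interactions, size))
    return combos


def passing_by_combination(poses, interactions, max_miss):
    """Map each combination to the ids of poses that contain all of it."""
    return {
        combo: {pid for pid, found in poses.items() if set(combo) <= found}
        for combo in interaction_combinations(interactions, max_miss)
    }


# Illustrative data: pose id -> interactions found for that pose
INTERACTIONS = ["A:VAL:279:", "A:LYS:162:"]
POSES = {1: {"A:VAL:279:", "A:LYS:162:"}, 2: {"A:VAL:279:"}, 3: set()}

by_combo = passing_by_combination(POSES, INTERACTIONS, max_miss=1)
union = set().union(*by_combo.values())
print(sorted(union))
```

Here `by_combo` corresponds to the per-combination view and `union` to the default union output.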
If you have previously written a database with Ringtail v1.0.0, it will need to be updated to be compatible with filtering with v1.1.0. We have included a new script, rt_db_v100_to_v110.py, to perform this update. Please note that all existing bookmarks will be removed during the update. The usage is as follows:
$ rt_db_v100_to_v110.py -d <v1.0.0 database 1 (required)> <v1.0.0 database 2+ (optional)>
Multiple databases may be specified at once. The update may take a few minutes per database.
- Python (>= 3.9, tested up to 3.12)
- RDKit
- SciPy
- Matplotlib
- Pandas
- chemicalite
- Meeko (from the Forli Lab)
- Multiprocess
- Installation
- Definitions
- Getting Started Tutorial
- Scripts
- rt_process_vs.py Documentation
- rt_compare.py Documentation
- Python tutorials
Please note that Ringtail requires Python 3.9 or 3.10.
$ pip install ringtail
If using conda, pip installs the package in the active environment.
Also note that if using MacOS, you may need to install Multiprocess separately:
$ pip install multiprocess
$ conda create -n ringtail python=3.10
$ conda activate ringtail
After this, navigate to the desired directory for installing Ringtail and do the following:
$ git clone git@github.com:forlilab/Ringtail.git
$ cd Ringtail
$ pip install .
This will automatically fetch the required modules and install them into the current conda environment.
If you wish to make the code for Ringtail editable without having to re-run pip install ., instead use
$ pip install --editable .
If you would like to test your installation of Ringtail, a set of automated tests is included with the source code. To begin, you must install pytest in the Ringtail conda environment:
$ pip install -U pytest
Next, navigate to the test subdirectory within the cloned Ringtail directory and run pytest by simply calling
$ pytest
The compounds used for the testing dataset were taken from the NCI Diversity Set V. The receptor used was PDB: 4J8M.
- DLG: Docking Log File, output from AutoDock-GPU.
- PDBQT: Modified PDB format, used for receptors (input to AutoDock-GPU and Vina) and output ligand poses from AutoDock-Vina.
- Cluster: Each docking result contains a number of independent runs, usually 20-50. These independent poses are then clustered by RMSD, giving groups of similar poses called clusters.
- Pose: The predicted ligand shape and position for a single run of a single ligand in a single receptor.
- Docking score: The predicted binding energy from AutoDock-GPU or Vina.
- Bookmark: The set of ligands or ligand poses from a virtual screening passing a given set of filters. Stored within a virtual screening database as a view.
- Ringtail:
Drat, I'm not a cat! Even though this eye-catching omnivore sports a few vaguely feline characteristics such as pointy ears, a sleek body, and a fluffy tail, the ringtail is really a member of the raccoon family. https://animals.sandiegozoo.org/animals/ringtail
The Ringtail command line interface is orchestrated through the script rt_process_vs.py.
Navigate to the directory containing the data, in our case test_data:
$ cd test/test_data/
To write to the database we need to specify a few things:
- that we are using write mode
- source of docking results files. Docking results can be added either by providing one or more single files, a .txt file listing files, or a directory containing docking results files.
- optional database name: Ringtail will default to creating a database named output.db
- optional docking mode: Ringtail will default to assuming the files were produced by AutoDock-GPU; if they are from Vina, specify --docking_mode vina
Let us add all docking files within the path test_data (specified by . meaning current directory), whose folders we can traverse recursively by specifying --recursive:
$ rt_process_vs.py write --file_path . --recursive
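As a rough picture of what a recursive file search involves, the sketch below collects files by name pattern with and without descending into subdirectories. `find_docking_files` is a hypothetical helper, and the pattern "*.dlg*" is an assumption chosen for the example; neither is taken from Ringtail's code.

```python
import fnmatch
import os
import tempfile


def find_docking_files(file_path, pattern="*.dlg*", recursive=False):
    """Return sorted paths under file_path whose file name matches pattern."""
    matches = []
    for root, dirs, files in os.walk(file_path):
        matches.extend(
            os.path.join(root, name) for name in files if fnmatch.fnmatch(name, pattern)
        )
        if not recursive:
            dirs.clear()  # stop os.walk from descending into subdirectories
    return sorted(matches)


# Build a small throwaway directory tree to search
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "group1"))
    for rel in ("lig1.dlg.gz", os.path.join("group1", "lig2.dlg"), "notes.txt"):
        open(os.path.join(tmp, rel), "w").close()
    top_only = [os.path.relpath(p, tmp) for p in find_docking_files(tmp)]
    everything = [os.path.relpath(p, tmp) for p in find_docking_files(tmp, recursive=True)]

print(top_only)
print(everything)
```

Without the recursive option only files directly inside the given directory are found; with it, files in nested folders are included as well.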
We can print a summary of the contents of the database by using the optional tag -su or --print_summary and specifying the database from which to read:
$ rt_process_vs.py read --input_db output.db -su
Total Stored Poses: 645
Total Unique Interactions: 183
Energy statistics:
min_docking_score: -7.93 kcal/mol
max_docking_score: -2.03 kcal/mol
1%_docking_score: -7.43 kcal/mol
10%_docking_score: -6.46 kcal/mol
min_leff: -0.62 kcal/mol
max_leff: -0.13 kcal/mol
1%_leff: -0.58 kcal/mol
10%_leff: -0.47 kcal/mol
Let us start filtering with a basic docking score cutoff of -6 kcal/mol:
$ rt_process_vs.py read --input_db output.db --eworst -6
This produces an output log output_log.txt with the names of ligands passing the filter, as well as their binding energies. Each round of filtering is also stored in the database as a SQLite view, which we refer to as a "bookmark" (default value is passing_results).
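The idea of a bookmark as a SQLite view can be shown with Python's built-in sqlite3 module. The table and column names below (Results, LigName, docking_score) and the inserted rows are a hypothetical, simplified schema for illustration; the real Ringtail database layout is more involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, simplified results table
conn.execute(
    "CREATE TABLE Results (Pose_ID INTEGER PRIMARY KEY, LigName TEXT, docking_score REAL)"
)
conn.executemany(
    "INSERT INTO Results (LigName, docking_score) VALUES (?, ?)",
    [("lig1", -7.9), ("lig2", -6.2), ("lig3", -4.1)],
)

# A "bookmark" is a named view that stores the filter, not a copy of the data
conn.execute(
    "CREATE VIEW passing_results AS SELECT * FROM Results WHERE docking_score <= -6"
)

passing = conn.execute(
    "SELECT LigName, docking_score FROM passing_results ORDER BY docking_score"
).fetchall()
print(passing)
```

Because a view is a stored query, it takes up almost no space and can itself be queried like a table, which is what makes filtering on an existing bookmark possible.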
We can also save a round of filtering with a specific bookmark name, and perform more filtering on this bookmark.
For example, start by keeping only the compounds that are within the 5th percentile in terms of docking score and save the bookmark as ep5:
$ rt_process_vs.py read --input_db output.db --score_percentile 5 --log_file ep5_log.txt --bookmark_name ep5
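A percentile filter boils down to turning a percentage into a score cutoff. The sketch below uses the nearest-rank method, which is an assumption: the text does not state how Ringtail interpolates percentiles, so exact cutoffs may differ. The function name and the scores are illustrative only.

```python
import math


def percentile_cutoff(scores, percentile):
    """Nearest-rank percentile of scores, where lower (more negative) is better."""
    ordered = sorted(scores)  # best docking scores first
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


# Illustrative docking scores in kcal/mol
scores = [-7.93, -7.43, -6.46, -6.0, -5.5, -5.1, -4.8, -4.2, -3.0, -2.03]

cutoff = percentile_cutoff(scores, 20)
top = [s for s in scores if s <= cutoff]
print(cutoff, top)
```

With ten scores, the 20th percentile keeps the two best-scoring entries; any score at or below the cutoff passes the filter.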
Let's then further refine the set of molecules by applying an interaction filter for van der Waals interactions with V279 on the receptor:
$ rt_process_vs.py read --input_db output.db --filter_bookmark ep5 --vdw_interactions A:VAL:279: --log_file ep5_vdwV279_log.txt --bookmark_name ep5_vdwV279
The filtered molecules can then be exported as, e.g., SDF files, which can be used for visual inspection in molecular graphics programs. At the same time, if PyMOL is installed, we can kick off a PyMOL session of the ligands:
$ rt_process_vs.py read --input_db output.db --bookmark_name ep5_vdwV279 --export_sdf_path ep5_vdwV279_sdfs --pymol
$ rt_process_vs.py --help
$ rt_process_vs.py write --help
$ rt_process_vs.py read --help
The Ringtail package includes two command line oriented scripts: rt_process_vs.py and rt_compare.py. Both may be run with options specified in the command line and/or using options specified in a JSON-formatted file given with --config. Command line options override any conflicting options in the config file.
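The precedence rule (defaults, then config file, then command line) can be sketched with argparse and json. This is an illustration under stated assumptions, not Ringtail's option parser: `resolve_options` is hypothetical, only two options are modeled, the default values are illustrative, and the config is passed as a JSON string rather than a file path to keep the example self-contained.

```python
import argparse
import json


def resolve_options(argv, config_text=None):
    """Merge defaults, JSON config values, and command line values, in that order."""
    options = {"docking_mode": "dlg", "max_poses": 3}  # illustrative defaults

    parser = argparse.ArgumentParser()
    # default=None lets us tell "not given on the command line" from a real value
    parser.add_argument("--docking_mode", default=None)
    parser.add_argument("--max_poses", type=int, default=None)
    cli = vars(parser.parse_args(argv))

    if config_text is not None:
        options.update(json.loads(config_text))
    # Command line values win over anything set in the config
    options.update({key: value for key, value in cli.items() if value is not None})
    return options


config = '{"docking_mode": "vina", "max_poses": 5}'
merged = resolve_options(["--max_poses", "1"], config)
print(merged)
```

In this example the docking mode comes from the config, while the conflicting max_poses value is taken from the command line.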
rt_process_vs.py serves as the primary script for the package and is used to both write docking files to a SQLite database and to perform filtering and export tasks on the database. It is designed to handle docking output files associated with a single virtual screening in a single database.
rt_compare.py is used to combine information across multiple virtual screenings (in separate databases) to allow or exclude the selection of ligands passing filters across multiple targets/models. This can be useful for filtering out promiscuous ligands, a technique commonly used in experimental high-throughput screening. It may also be used if selection of ligands binding multiple protein structures/conformations/homologs is desired.
rt_generate_config_file.py can be run to create a config file template.
rt_db_to_v200.py is used to update older databases to the latest version.
rt_db_v100_to_v110.py is used to update db v1.0.0 to 1.1.0.
The rt_compare.py script is designed to be used with databases already made and filtered. The script is used to select ligands which are shared between the given filter bookmark(s) of some virtual screenings (wanted) or exclusive to some screenings and not others (unwanted). The script uses a subset of commands similar to rt_process_vs.py.
An example of use: select ligands found in the "filter_bookmark" bookmark of database1 but not database2 (both databases must contain a bookmark named "filter_bookmark"):
$ rt_compare.py --wanted database1.db --unwanted database2.db --bookmark_name filter_bookmark
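Conceptually, the wanted/unwanted selection is set arithmetic on ligand names. The sketch below assumes that "shared" means present in every wanted bookmark; `compare_bookmarks` and the ligand names are hypothetical and only illustrate the logic, not how rt_compare.py queries the databases.

```python
def compare_bookmarks(wanted, unwanted=()):
    """Ligands present in every wanted set and in none of the unwanted sets."""
    wanted = [set(ligands) for ligands in wanted]
    if not wanted:
        return set()
    selected = set.intersection(*wanted)
    for ligands in unwanted:
        selected -= set(ligands)
    return selected


# Illustrative bookmark contents from two screenings
database1 = {"lig1", "lig2", "lig3"}
database2 = {"lig2", "lig4"}

only_in_first = compare_bookmarks([database1], [database2])
shared = compare_bookmarks([database1, database2])
print(sorted(only_in_first), sorted(shared))
```

The first call mirrors the command above (ligands passing in database1 but not in database2), while the second shows the case where both screenings are wanted.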
For more detailed description of usage, please see the readthedocs.org site for ringtail.
Ringtail has been re-designed to allow for direct use of its API for, e.g., scripting purposes. This circumvents the use of the command line tools, and allows for more advanced usage. The available operations and keywords are the same as for the command line interface, but the methods can now be accessed at a more granular level if desired. For docking engines that provide direct string output, such as Vina, it is also possible to save the docking results directly to the database as a string, thus circumventing use of the computer file system.
A ringtail core is created by instantiating a RingtailCore object with a database. Currently, a database can only be added upon instantiation.
rtc = RingtailCore("output.db")
Default logging level is "WARNING", and a different logger level can be set at the time of object instantiation, or later by the log level change API:
rtc = RingtailCore(db_file="output.db", logging_level="DEBUG")
# or
rtc.logger.set_level("INFO")
To add results to the database, use the add_results_from_files method, which takes as input the files and directories to upload, a receptor path, database properties, options for how to handle the results (how many poses to save, how to deal with interactions if having vina results), and whether or not to print a summary after writing the results to the database.
rtc.add_results_from_files( file_path = "test_data/",
recursive = True,
save_receptor = False,
max_poses = 3)
Both files (filesources_dict) and processing options (optionsdict) can be provided as dictionaries, in addition to or instead of the individual options. Any individual options provided will overwrite the options provided through dictionaries. This use and prioritization of dictionaries and method arguments applies to most of the available API methods.
file_sources = {
"file_path": "test_data/",
"recursive": True,
}
writeoptions = {
"store_all_poses": True,
"max_proc": 4
}
rtc.add_results_from_files( filesources_dict = file_sources,
optionsdict = writeoptions,)
If at any point you wish to print a summary of the database, the method can be called directly:
rtc.produce_summary()
The default docking mode is "dlg", and can be changed to "vina" by accessing the ringtail core property docking_mode.
rtc_vina = RingtailCore("output_vina.db")
rtc_vina.docking_mode = "vina"
Since vina does not automatically write docking results to the file system, these can be added to the database by associating them with a ligand name in a dictionary and using this dictionary as the source of results when adding to the database:
vina_docking_result1 = "long string of results"
vina_docking_result2 = "different string of results"
vina_results = {
"ligand1": vina_docking_result1,
"ligand2": vina_docking_result2
}
rtc_vina.add_results_from_vina_string(results_strings = vina_results,
max_poses = 2)
To filter, simply access the API method filter and provide the desired filter values. The names of the bookmark and of the output log that will contain the filtered results can be specified in the method.
rtc.filter(eworst=-6,
bookmark_name = "e6",
log_file = "filtered_results.txt")
Just like with the command line tool, you can choose to filter over a bookmark that has already been created:
rtc.filter(vdw_interactions=[('A:VAL:279:', True), ('A:LYS:162:', True)],
bookmark_name = "e6vdw279162",
filter_bookmark = "e6",
log_file = "filtered_results_2.txt")
To export filtered molecules in a specific bookmark to SDF files use the following method, where the sdf_path directory will be created if it does not already exist:
rtc.write_molecule_sdfs(sdf_path = "sdf_files", bookmark_name = "e6vdw279162")
One or more of the filtered ligands can be visualized in PyMol:
rtc.pymol(bookmark_name = "e6vdw279162")
All of the arguments used for the command line tool also apply to the Ringtail API in some form. For example, bookmark names and filter values are provided when an API method is called, while the log level can be set at instantiation or at any time during the scripting process. Instead of differentiating between an --input_db and --output_db, only one database file is operated on in a given instantiated RingtailCore object. A subset of the command line arguments are actual API methods (such as --plot or --find_similar_ligands) that will be called directly, with optional input arguments (typically a bookmark_name or ligand_name). Each API method comes with type hints and extensive documentation. Additionally, a more extensive example of its use can be found on readthedocs.