QUBEKit is a Python 3.7+ based force field derivation toolkit for Linux operating systems. Our aim is to allow users to quickly derive molecular mechanics parameters directly from quantum mechanical calculations. QUBEKit pulls together multiple pre-existing engines, as well as bespoke methods, to produce accurate results with minimal user input. QUBEKit aims to avoid fitting to experimental data where possible while also being highly customisable.
Users who have used QUBEKit to derive force field parameters should cite the following references (also given in the CITATION.cff file):
- Exploration and validation of force field design protocols through QM-to-MM mapping
- QUBEKit: Automating the Derivation of Force Field Parameters from Quantum Mechanics
- Biomolecular Force Field Parameterization via Atoms-in-Molecule Electron Density Partitioning
- Harmonic Force Constants for Molecular Mechanics Force Fields via Hessian Matrix Projection
While QUBEKit is stable, we are constantly working to improve the code and broaden its compatibility.
We use software written by many people; if reporting a bug, please (to the best of your ability) make sure it is a bug with QUBEKit and not with a dependency. We welcome any suggestions, issues or pull requests for additions or changes.
QUBEKit is available through conda-forge; this is the recommended installation method. The GitHub repository has our latest version, which will likely have newer features but may not be stable.
# Recommended
conda install -c conda-forge qubekit
# Recommended for Developers (see below)
git clone https://github.com/qubekit/qubekit.git
cd <install location>
python setup.py develop
Download the Anaconda installer and run it with the Linux command:
./Anaconda3<version>.sh
You may need to use chmod +x Anaconda3<version>.sh to make it executable.
We recommend you add conda to your .bashrc when prompted.
Optionally, you may choose to use Gaussian. Installation of Gaussian is likely handled by your institution. Gaussian is an option for QM optimisations, Hessian calculations and density calculations. QUBEKit contains alternative approaches for these calculations.
Minimal conda packages are included in the conda-forge install with all optional engine packages left to the user to install. If there are dependency issues or version conflicts in your environment, packages can be installed individually.
conda install -c conda-forge qubekit
The following table details some optional calculation engine packages which are not included in the QUBEKit conda package and how to install them.
Package | Conda Install |
---|---|
PSI4 >=1.4.1 | conda install -c psi4 psi4 |
Chargemol | conda install -c conda-forge chargemol |
xtb-python | conda install -c conda-forge xtb-python |
torchani | conda install -c conda-forge torchani |
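To see which of these optional engines are visible to the current Python environment, a short check along the following lines may help. It is only a sketch: the import names (`psi4`, `xtb` for xtb-python, `torchani`) are assumptions about how each package exposes itself, and Chargemol is left out because it is not assumed to be importable as a Python module.

```python
from importlib.util import find_spec

# Assumed import names for the optional packages in the table above.
OPTIONAL_ENGINES = {"PSI4": "psi4", "xtb-python": "xtb", "torchani": "torchani"}


def available_engines(engines=OPTIONAL_ENGINES):
    """Return {package label: bool} saying whether each module can be found."""
    return {label: find_spec(module) is not None for label, module in engines.items()}


if __name__ == "__main__":
    for label, found in available_engines().items():
        print(f"{label}: {'found' if found else 'not found'}")
```

A package reported as not found can then be installed with the matching command from the table.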
Adding lots of packages can be a headache. If possible, install using Anaconda through the terminal. This is generally safest, as Anaconda should deal with versions and conflicts in your environment. Conda packages will usually list their conda install command on their website or GitHub page. As a last resort, either git clone them and install:
git clone https://<git_address_here>
cd <location of cloned package>
python setup.py install
or follow the described steps in the respective documentation.
If downloading QUBEKit to edit the latest version of the source code, the easiest method is to install via conda, then remove the conda version of qubekit and git clone. This is accomplished with a few simple commands:
# Install QUBEKit as normal
conda install -c conda-forge qubekit
# Remove ONLY the QUBEKit package itself, leaving all dependencies installed
# and on the correct version
conda remove --force qubekit
# Re-download the latest QUBEKit through GitHub
git clone https://github.com/qubekit/qubekit.git
# Re-install QUBEKit outside of conda
cd qubekit
python setup.py develop
Below is general help for most of the commands available in QUBEKit.
There is some short help available through the terminal (invoked with --help). --help may be called for any command you are unsure of:
qubekit --help
qubekit bulk run --help
qubekit config create --help
All necessary long-form help is within this document.
QUBEKit has many settings which are used in production, and changing them can result in very different force field parameters. The settings are controlled using JSON config files, which are easy to edit.
QUBEKit does have a full suite of defaults built in. To create a config file template with the name "example", containing default options, run the command:
qubekit config create example.json
Simply edit the sections as desired to change the method, basis set, memory, etc.
This config file can then be used to execute a QUBEKit job using the --config or -c flag in the run command.
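After hand-editing a config, it can be worth confirming that the file is still valid JSON before starting a long job. The sketch below does only that and lists the top-level sections it finds. It makes no assumption about which keys QUBEKit expects, and `config_sections` is a hypothetical helper name, not part of QUBEKit.

```python
import json
from pathlib import Path


def config_sections(path):
    """Parse a JSON config file and return its sorted top-level keys.

    Raises json.JSONDecodeError if an edit has left the file malformed.
    """
    with Path(path).open() as handle:
        config = json.load(handle)
    if not isinstance(config, dict):
        raise ValueError(f"{path} does not contain a JSON object")
    return sorted(config)
```

For example, `config_sections("example.json")` on the template created above returns the names of the sections available to edit.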
A single molecule job can be executed with the run command.
Some form of molecule input must also be supplied; either a file (designated with the --input-file or -i flag), or a smiles string (--smiles or -sm).
In the case of a smiles string, the molecule must also be named with the --name or -n flag.
qubekit run -i methane.pdb
qubekit run -i ethane.mol2
qubekit run -sm CO -n methanol
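When launching jobs from a script, the input rules above can be checked before anything is run. The following is a minimal sketch using only the short flags documented here (-i, -sm, -n, -c). `build_run_command` is a hypothetical helper, and treating a file and a smiles string as mutually exclusive is an assumption.

```python
def build_run_command(input_file=None, smiles=None, name=None, config=None):
    """Build the argument list for a single-molecule `qubekit run` call."""
    # Assumption: exactly one form of molecule input is given.
    if (input_file is None) == (smiles is None):
        raise ValueError("Supply exactly one of input_file or smiles.")
    # A molecule given as a smiles string must also be named.
    if smiles is not None and name is None:
        raise ValueError("A smiles string input also needs a name.")

    command = ["qubekit", "run"]
    if input_file is not None:
        command += ["-i", str(input_file)]
    else:
        command += ["-sm", smiles, "-n", name]
    if config is not None:
        command += ["-c", str(config)]
    return command
```

The returned list can be passed to `subprocess.run(..., check=True)` from the standard library to start the job.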
Bulk commands are for high throughput analyses; they are invoked with the bulk keyword.
A csv file must be used when running a bulk analysis.
If you would like to generate a blank csv file, simply run the command:
qubekit bulk create example.csv
where example.csv is the name of the file you want to create. This will automatically generate a csv file with the appropriate column headers. When writing to the csv file, simply append rows after the header row with the relevant information required.*
Importantly, different config files can be supplied for each molecule in each row.
name,smiles,multiplicity,config_file,restart,end
*Only the name column needs to be filled (this is done automatically in the generated csv); any empty columns will simply use the default values:
- The smiles string column only needs to be filled if a file is not supplied;
- If the multiplicity column is empty, multiplicity will be set to 1;
- If the config column is empty, the default config is used;
- Leaving the restart column empty will start the program from the beginning;
- Leaving the end column empty will end the program after a full analysis from its start point.
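The default rules listed above can be written out as a small function. This is an illustration of the documented behaviour, not QUBEKit's own parsing code; `describe_row` and the wording of its output are hypothetical.

```python
def describe_row(row):
    """Summarise how a (possibly incomplete) bulk csv row would be interpreted."""

    def value(column):
        # Treat missing and whitespace-only cells as empty.
        return (row.get(column) or "").strip()

    if not value("name"):
        raise ValueError("The name column must be filled.")

    return {
        "name": value("name"),
        "input": f"smiles {value('smiles')}" if value("smiles") else "molecule file",
        "multiplicity": int(value("multiplicity") or 1),
        "config": value("config_file") or "default config",
        "start": value("restart") or "beginning",
        "end": value("end") or "full analysis",
    }
```

Passing each row from `csv.DictReader` through this function gives a quick preview of what a bulk run would do for every molecule.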
A bulk analysis is started with the run command, followed by the name of the csv file:
qubekit bulk run example.csv
Any necessary files should all be in the same place: where you're running QUBEKit from. Upon executing this bulk command, QUBEKit will work through the rows in the csv file. Each molecule will be given its own directory (the same as single molecule analyses).
Be aware that the names of the files are used as keys to find the configs. So, each molecule file being analysed should have a corresponding row in the csv file with the correct name (if using smiles strings, the name column will just be the name given to the created file).
For example (barring the header, csv row order does not matter, and you do not need to include smiles strings when a file is provided; column order does matter):
<location>/:
benzene.pdb
ethane.mol2
bulk_example.csv
bulk_example.csv:
name,smiles,multiplicity,config_file,restart,end
methane,C,,default_config.json,,
benzene,,,default_config.json,,
ethane,,,default_config.json,,
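Because file names are matched to csv rows by name, a quick consistency check before a bulk run can catch typos. The sketch below flags rows that have neither a smiles string nor a file in the directory whose name (without extension) matches the name column. `rows_missing_input` is a hypothetical helper, and matching on the file stem is an assumption about how names map to files.

```python
import csv
from pathlib import Path


def rows_missing_input(csv_path, directory="."):
    """Return names in the bulk csv with no smiles string and no matching file."""
    # File names without their extension, e.g. "benzene" for benzene.pdb.
    stems = {path.stem for path in Path(directory).iterdir() if path.is_file()}
    missing = []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            has_smiles = bool((row.get("smiles") or "").strip())
            if not has_smiles and row["name"] not in stems:
                missing.append(row["name"])
    return missing
```

For the example directory above, the function returns an empty list: methane has a smiles string, and benzene and ethane each have a file.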
QUBEKit also has the ability to run partial analyses, or redo certain parts of an analysis.
For a single molecule analysis, this is achieved with the --end (-e) and restart commands.
The stages are:
Stage | Description |
---|---|
parametrisation | The molecule is parametrised using OpenFF, AnteChamber or an xml file. This step also loads in the molecule and extracts key information like the atoms and their coordinates. |
optimisation | There is a quick, preliminary optimisation which speeds up the following QM optimisation. |
hessian | This uses the same QM engine as the prior optimisation to calculate the Hessian matrix which is needed for calculating bonding parameters. |
charges | The electron density is calculated. This is where the solvent is applied as well (if configured). Next, the charges are partitioned and calculated. |
virtual_sites | Virtual sites are added where the error in ESP would be large without them. |
non_bonded | Lennard-Jones parameters (sigma and epsilon values) are calculated. |
bonded_parameters | Using the Hessian matrix, the bond lengths, angles and force constants are calculated with the Modified Seminario Method. |
torsion_scanner | Using the molecule's geometry, a torsion scan is performed. The molecule can then be optimised with respect to these parameters. |
torsion_optimisation | The fitting and optimisation step for the torsional analysis. |
In a normal run, all of these stages are executed sequentially, but with --end, restart and --skip-stages you are free to run from any step to any step inclusively while skipping any non-essential steps.
When using --end (or -e), specify the end-point immediately after it.
When using restart, specify the start-point immediately after it.
When using --skip-stages (or -s), specify any stage(s) which you needn't run. Be aware that some stages depend on one another and cannot necessarily be skipped.
When using custom start and/or end points with bulk commands, the stages are written to the csv file, rather than the terminal. If no start point is specified, a new working directory will be created and QUBEKit will run as normal. Otherwise, QUBEKit will find the correct directory based on the molecule name.
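The effect of a start point, an end point and skipped stages can be pictured as slicing the ordered stage list. The sketch below only illustrates that idea: it takes the stage order from the table above, is not QUBEKit's implementation, and does not check the dependencies between stages. `select_stages` is a hypothetical name.

```python
# Stage names in the order given in the stage table.
STAGES = [
    "parametrisation", "optimisation", "hessian", "charges", "virtual_sites",
    "non_bonded", "bonded_parameters", "torsion_scanner", "torsion_optimisation",
]


def select_stages(start=None, end=None, skip=()):
    """Return the stages from start to end (both inclusive), minus skipped ones."""
    # list.index raises ValueError for a name that is not a stage.
    first = STAGES.index(start) if start else 0
    last = STAGES.index(end) if end else len(STAGES) - 1
    if first > last:
        raise ValueError(f"{start!r} comes after {end!r}")
    unknown = set(skip) - set(STAGES)
    if unknown:
        raise ValueError(f"Unknown stage(s): {sorted(unknown)}")
    return [stage for stage in STAGES[first:last + 1] if stage not in skip]
```

For instance, `select_stages("hessian", "non_bonded")` gives the four stages from hessian through non_bonded, and leaving both arguments empty gives the full workflow.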
Throughout an analysis, key information will be added to the run directory.
This information can be quickly parsed by QUBEKit's progress command.
To display the progress of all analyses in your current directory and below, use the command:
qubekit progress
QUBEKit will read the workflow_result.json file, which is present in every QUBEKit run directory, and display a colour-coded table of the progress.
Indicator | Meaning |
---|---|
✓ | Completed successfully |
~ | Neither finished nor errored |
S | Skipped |
E | Started and failed for some reason |
Viewing the QUBEKit.err file will give more information as to why it failed.
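To locate run directories from a script, for example to collect error files after a bulk run, one option is to search for the two file names mentioned above. This sketch assumes QUBEKit.err sits in the same directory as workflow_result.json, and it does not read either file, since their contents are not described here. `find_runs` is a hypothetical helper.

```python
from pathlib import Path


def find_runs(root="."):
    """Map each directory holding workflow_result.json to whether QUBEKit.err is there too."""
    runs = {}
    for result in sorted(Path(root).rglob("workflow_result.json")):
        run_dir = result.parent
        # Assumption: the error file, if any, lives alongside the result file.
        runs[run_dir] = (run_dir / "QUBEKit.err").is_file()
    return runs
```

Directories mapped to `True` are the ones whose QUBEKit.err is worth opening first.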
You cannot run multiple kinds of analysis at once. For example:
qubekit bulk run example.csv run -i methane.pdb
is not a valid command. These should be performed separately:
qubekit bulk run example.csv
qubekit run -i methane.pdb
Be wary of running QUBEKit concurrently through different terminal windows. The programs QUBEKit calls often just try to use however much memory is assigned in the config files; this means that together they may try to take more than is available, leading to a hang-up or, rarely, a crash.
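The arithmetic behind this warning is simple: the memory assigned in each config adds up across concurrent jobs. A minimal sketch, with hypothetical names and memory expressed in GB as an assumption:

```python
def fits_in_memory(memory_per_job_gb, available_gb):
    """Return (fits, total) for the memory requested by concurrent jobs.

    memory_per_job_gb: one entry per concurrent job, taken from its config.
    available_gb: memory you are willing to give to all jobs together.
    """
    total = sum(memory_per_job_gb)
    return total <= available_gb, total
```

Three jobs configured with 8, 8 and 4 GB on a machine with 16 GB to spare would request 20 GB in total, so at least one of them should wait.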