This is the official implementation of the Virtual Screening Assistant Network (ViscaNet). Various modules in this repository are motivated by *Analyzing Learned Molecular Representations for Property Prediction* and *A Self-Attention Based Message Passing Neural Network for Predicting Molecular Lipophilicity and Aqueous Solubility*. The base code was obtained from chemprop.
The following setup is common to all experiments.
- Install Anaconda.
- Download the GitHub repository, including `viscanet.yml`.
- Open a terminal, navigate to the location where `viscanet.yml` was downloaded, and run `conda env create -f viscanet.yml`. For more details, check the conda documentation.
- Once the environment is created, activate it with `conda activate viscanet` (a quick import check is sketched after this list).
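As an optional check that the environment activated correctly, you can try importing the core dependencies from a Python shell. This is only a sketch: the actual package list is whatever `viscanet.yml` installs, and `torch`/`rdkit` are assumed here because the base code comes from chemprop.

```python
# Optional environment sanity check. Assumes viscanet.yml installs PyTorch and
# RDKit (as the chemprop base code does); adjust to the actual dependency list.
import torch
from rdkit import rdBase

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("rdkit", rdBase.rdkitVersion)
```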
The following steps can be used for any experiment where data is not split into multiple files.
- Add the data files to the `./data/` directory.
- Run all the cells of `preprocess_fda.ipynb`.
- Run `python train_fda.py --data_path (path to data file) --dataset_type classification --smiles_column s_sd_SMILES --target_columns r_i_docking_score --epochs 30 --num_folds 1 --features_path (path to features) --attention --separate_test_path ./data/fda.csv --separate_test_features_path ./data/fda.npy` (a quick input sanity check is sketched after this list).
- Sample data_path = `./data/fda.csv`
- Sample features_path = `./data/fda.npy`
- If you want to split one data file into train/val/test, do not use `--separate_test_path` and `--separate_test_features_path`.
- If you have a separate test or val file, use `--separate_val_path` and `--separate_val_features_path`, and `--separate_test_path` and `--separate_test_features_path`.
- If you need attention images, use `--viz_dir` and give the path to the location where you want to store those images.
- This will give the test scores and create `TruePositives.csv` and `FalsePositives.csv` in the `./inference/` directory.
- Run all the cells of `get_fdaid.ipynb`. This will add the `drugbank_ID` to the above-mentioned generated files.
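Before launching `train_fda.py`, a quick sanity check of the inputs can catch mismatches early. The sketch below is not part of the ViscaNet pipeline; it only assumes that the CSV contains the `s_sd_SMILES` and `r_i_docking_score` columns named in the command and that `fda.npy` is a plain 2-D array with one feature row per molecule.

```python
# Quick sanity check of the inputs named in the train_fda.py command above.
# Assumption: fda.npy is a plain 2-D array with one feature row per SMILES in fda.csv.
import numpy as np
import pandas as pd

data = pd.read_csv("./data/fda.csv")
feats = np.load("./data/fda.npy")

# The training command expects these columns in the CSV.
assert "s_sd_SMILES" in data.columns, "missing SMILES column"
assert "r_i_docking_score" in data.columns, "missing target column"

# Extra features must align row-for-row with the molecules.
assert feats.shape[0] == len(data), (feats.shape, len(data))
print(f"{len(data)} molecules, feature array shape {feats.shape}")
```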
The following steps can be used for any experiment where data is split into multiple files.
- Add the data files to the `./data/` directory, specifically in `./data/nsp1_supernaturaldb_sift_data/`. Also, create the directories `new_data`, `new_data_feats`, and `new_data_norm_feats` inside the `data` directory.
- Run `python preprocess_new_data.py` to preprocess the supernatural data. The processed data will be stored in the `./data/new_data/` directory.
- Use `python feature.py` to generate the feature (`.npy`) files for the train as well as the test data. These will be stored in the `./data/new_data_feats/` directory.
- Generate normalized features for both the train and test data by executing `python feature_normalize.py`. These will be stored in the `./data/new_data_norm_feats/` directory.
- Split the data and the obtained normalized features into train, val, and test in the `./data/new_data/` and `./data/new_data_norm_feats/` directories, respectively (one way to do this is sketched after this list).
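There is no dedicated split script in the steps above, so the split can be done by hand or with a small helper like the one below. This is only a sketch: the 80/10/10 ratio, the assumption of one `.csv` per data chunk with a same-named `.npy` file in the normalized-features directory, and the `train`/`val`/`test` sub-directory layout (matching the `./data/new_data/train/` directory that `os_train.py` reads) are all assumptions.

```python
# Sketch of a manual train/val/test split for the chunked data files.
import os
import random
import shutil

data_dir = "./data/new_data"
feat_dir = "./data/new_data_norm_feats"

# Assumption: one .csv per chunk in data_dir, with a matching .npy of the
# same base name in feat_dir.
files = sorted(f for f in os.listdir(data_dir) if f.endswith(".csv"))
random.seed(0)
random.shuffle(files)

n = len(files)
splits = {
    "train": files[: int(0.8 * n)],
    "val": files[int(0.8 * n): int(0.9 * n)],
    "test": files[int(0.9 * n):],
}

for split, names in splits.items():
    os.makedirs(os.path.join(data_dir, split), exist_ok=True)
    os.makedirs(os.path.join(feat_dir, split), exist_ok=True)
    for name in names:
        base = os.path.splitext(name)[0]
        shutil.move(os.path.join(data_dir, name), os.path.join(data_dir, split, name))
        shutil.move(os.path.join(feat_dir, base + ".npy"),
                    os.path.join(feat_dir, split, base + ".npy"))
```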
- Run `python os_train.py`. This will train the model on all the data files available in the `./data/new_data/train/` directory (a driver sketch for multi-epoch training follows this list).
- The model will be stored in the `./model_checkpoints/` directory, which is created automatically. Each run of `python os_train.py` trains the model for one additional epoch; for example, if you run `python os_train.py` 5 times in sequence, the model has been trained for 5 epochs.
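Because each invocation of `os_train.py` corresponds to one epoch, multi-epoch training can be scripted with a simple loop. A minimal sketch (the epoch count of 5 is just an example):

```python
# Minimal training driver: each run of os_train.py performs one epoch, so an
# N-epoch training job is just N sequential invocations.
import subprocess
import sys

EPOCHS = 5  # example value; choose however many epochs you need
for epoch in range(1, EPOCHS + 1):
    print(f"Starting epoch {epoch}/{EPOCHS}")
    subprocess.run([sys.executable, "os_train.py"], check=True)
```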
- To test the model, run `python os_test.py --target_columns r_i_docking_score --dataset_type classification --epochs 1 --num_folds 1 --no_features_scaling --data_path ./.`
- The paths are hard-coded inside `os_test.py`, so `--data_path` can be anything.
- If you need attention images, use `--viz_dir` and give the path to the location where you want to store those images (see the sketch after this list).
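The same test command can also be driven from Python, which makes it easy to add the optional `--viz_dir` flag and collect the attention images afterwards. This is only a sketch: the `./attention_viz` directory name and the `.png` extension are assumptions for illustration.

```python
# Run the documented os_test.py command and count the attention images it writes.
import glob
import subprocess
import sys

viz_dir = "./attention_viz"  # assumed output location for attention images
cmd = [
    sys.executable, "os_test.py",
    "--target_columns", "r_i_docking_score",
    "--dataset_type", "classification",
    "--epochs", "1",
    "--num_folds", "1",
    "--no_features_scaling",
    "--data_path", "./.",   # ignored: paths are hard-coded inside os_test.py
    "--viz_dir", viz_dir,   # optional: where attention images are stored
]
subprocess.run(cmd, check=True)

images = glob.glob(f"{viz_dir}/**/*.png", recursive=True)
print(f"{len(images)} attention images written to {viz_dir}")
```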
NOTE: If a command throws a path error, kindly fix the paths, or feel free to contact me or raise an issue.