This repository is the official implementation of the Vector Quantized Graph-based AutoEncoder (VQGAE).
The VQGAE demo is available on HuggingFace.
More details on the implementation can be found in the pre-print.
🚧 Warning 🚧 This repository is under active development. Models, weights, and datasets will be uploaded soon.
This tool depends on the pytorch, pytorch-geometric, and pytorch-lightning packages. If you want to use a GPU, you need to manually specify the NVIDIA GPU driver version during installation. Therefore, we provide two sets of instructions for installing the VQGAE package: with conda and manually.
Here we describe installation with conda/mamba. First, clone the repository from GitHub:
git clone https://github.com/Laboratoire-de-Chemoinformatique/VQGAE.git
cd VQGAE/
If you haven't installed the conda-lock package in your base environment, you can do so with the following command:
conda install --channel=conda-forge --name=base conda-lock
Then, create a new environment from the vqgae_gpu.yml file:
conda env create --name vqgae_env --file vqgae_gpu.yml
Then, activate the created environment and install VQGAE from the cloned repository:
conda activate vqgae_env
pip install .
If the drivers on your NVIDIA machine do not match the ones used in the environment, you can install all required packages manually. (In our setup, PyTorch built for CUDA 11.8 worked fine with drivers that were already at version 12.0.)
First, check your GPU driver version with nvcc or nvidia-smi.
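The CUDA version shown by nvidia-smi can also be read programmatically. The following is a minimal sketch, assuming the nvidia-smi banner contains a field of the form "CUDA Version: X.Y"; the helper names parse_cuda_version and detect_cuda_version are hypothetical and not part of VQGAE. Keep in mind that nvidia-smi reports the highest CUDA version the driver supports, while nvcc reports the version of the installed toolkit.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_cuda_version(smi_output: str) -> Optional[str]:
    """Extract 'X.Y' from an nvidia-smi banner, or return None if absent."""
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", smi_output)
    return match.group(1) if match else None


def detect_cuda_version() -> Optional[str]:
    """Run nvidia-smi if it is on PATH and parse its banner (hypothetical helper)."""
    if shutil.which("nvidia-smi") is None:
        return None  # NVIDIA driver tools were not found on this machine
    result = subprocess.run(
        ["nvidia-smi"], capture_output=True, text=True, check=False
    )
    return parse_cuda_version(result.stdout)
```

Calling detect_cuda_version() returns a string such as the CUDA version to use in the following steps, or None when no NVIDIA tools are available.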
If the cudatoolkit drivers are not installed and installing them requires administrator permissions that you might not have, the only way to install the GPU version of pytorch is with conda:
conda install pytorch cudatoolkit=${CUDAVERSION} -c pytorch -c conda-forge -y
where ${CUDAVERSION} is the CUDA version matching your GPU driver (the tool was tested with ${CUDAVERSION}=11.6).
If you can install cudatoolkit manually, pytorch can be installed with:
pip3 install torch --extra-index-url https://download.pytorch.org/whl/${CUDATORCH}
where ${CUDATORCH} is the CUDA version in pytorch format (cpu, cu102, cu113, cu116).
For more details, please visit the official pytorch installation documentation.
Then, proceed with the installation of pytorch-geometric:
pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-${TORCH}+${CUDAPYG}.html
where ${CUDAPYG} is the CUDA version in pytorch format (cpu, cu102, cu113, cu116) and ${TORCH} is the version of the installed pytorch (1.11.0, 1.12.0). For more details, please check the pytorch-geometric installation docs.
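The wheel index URL passed to pip with -f in the command above can be assembled from the pytorch version and the CUDA version. The sketch below is illustrative only: the helper names cuda_tag and pyg_wheel_url are hypothetical, and it assumes the tag format listed above (cpu, cu102, cu113, cu116).

```python
from typing import Optional


def cuda_tag(cuda_version: Optional[str]) -> str:
    """Map a CUDA version such as '11.6' to a wheel tag such as 'cu116'."""
    if cuda_version is None:
        return "cpu"  # CPU-only build
    return "cu" + cuda_version.replace(".", "")


def pyg_wheel_url(torch_version: str, cuda_version: Optional[str]) -> str:
    """Build the wheel index URL for the given pytorch and CUDA versions."""
    base_version = torch_version.split("+")[0]  # drop a local tag like '+cu116'
    return (
        f"https://data.pyg.org/whl/torch-{base_version}"
        f"+{cuda_tag(cuda_version)}.html"
    )
```

With pytorch already installed, the two inputs can be taken from torch.__version__ and torch.version.cuda (the latter is None for CPU-only builds).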
Finally, install pytorch-lightning and the adabelief optimizer:
pip install "pytorch-lightning>2.0" "adabelief-pytorch>=0.2.1"
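After a manual installation it can help to confirm that every dependency is importable. The following is a rough sketch, not part of VQGAE: the function name missing_modules is hypothetical, and the import names in REQUIRED_MODULES are assumed to correspond to the packages installed above.

```python
import importlib.util
from typing import Iterable, List

# Import names assumed for the packages installed in the steps above.
REQUIRED_MODULES = (
    "torch",
    "torch_geometric",
    "pytorch_lightning",
    "adabelief_pytorch",
)


def missing_modules(names: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

An empty list from missing_modules() means all four imports resolve. To confirm that pytorch sees the GPU, torch.cuda.is_available() should then return True.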
The tool works in command-line mode. To train a model, simply run:
vqgae_train fit -c configs/vqgae_training.yaml
For encoding, run the following command:
vqgae_encode -c configs/vqgae_encode.yaml
For decoding, run the following command:
vqgae_decode -c configs/vqgae_decode.yaml
To generate an example default config, simply run:
vqgae_default_config --task train
Contributions are welcome, in the form of issues or pull requests.
If you have a question or want to report a bug, please submit an issue.
To contribute code to the project, follow these steps:
- Fork this repository.
- Create a branch:
git checkout -b <branch_name>
- Make your changes and commit them:
git commit -m '<commit_message>'
- Push to the remote branch:
git push
- Create the pull request.
Please make sure to cite this work if you find it useful:
@article{akhmetshin2023construction,
title={Construction of order-independent molecular fragments space with vector quantised graph autoencoder},
author={Akhmetshin, Timur and Lin, Albert and Madzhidov, Timur and Varnek, Alexandre},
journal={ChemRxiv},
publisher={Cambridge Open Engage},
year={2023},
note={This content is a preprint and has not been peer-reviewed.},
doi={10.26434/chemrxiv-2023-5zmvw}
}