init

HICAI-ZJU · Oct 16, 2023 · 6d24452
commit 6d24452

Showing 48 changed files with 1,801 additions and 0 deletions.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,12 @@
+dump/
+data/
+ogb-data/
+__pycache__/
+script/
+drugood-data*/
+drugood-data*
+ogb-data
+data
+*.out
+*.tar.gz
+checkpoint
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2023 todoooooo
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/README.md b/README.md
@@ -0,0 +1,100 @@
+# Learning Invariant Molecular Representation in Latent Discrete Space
+This repository is the official implementation of our paper:
+
+**Learning Invariant Molecular Representation in Latent Discrete Space**
+
+_Xiang Zhuang, Qiang Zhang*, Keyan Ding, Yatao Bian, Xiao Wang, Jingsong Lv, Hongyang Chen, Huajun Chen* (* denotes correspondence)_
+
+Advances in Neural Information Processing Systems (NeurIPS) 2023
+
+<div align=center><img src="./resources/framework.png" style="zoom:50%;" />
+</div>
+
+## Environment
+To run the code successfully, the following dependencies need to be installed:
+```
+Python                     3.8      
+torch                      1.10.1
+torch_geometric            2.0.4
+torch_scatter              2.0.9
+torch_cluster              1.6.0
+torch_sparse               0.6.13
+torch_spline_conv          1.2.1
+rdkit_pypi                 2022.9.5
+vector_quantize_pytorch    1.0.7
+ogb                        1.3.6
+```
+
+This repo also depends on `GOOD` and `DrugOOD`. Please follow the installation instructions provided for each package:
+- GOOD (Version 1.1.1)
+  - Repository: https://github.com/divelab/GOOD/
+  - Installation: Please follow the instructions provided in the repository to install.
+- DrugOOD (Version 0.0.1)
+  - Repository: https://github.com/tencent-ailab/DrugOOD
+  - Installation: Please follow the instructions provided in the repository to install.
+
+## Data
+The data used in the experiments can be downloaded from the following sources:
+
+1. GOOD
+   - [GOODPCBA](https://drive.google.com/file/d/1WGieOjtgNXtGoO6o1EGhKrZj0zWU7AJl/view?usp=sharing)
+   - [GOODHIV](https://drive.google.com/file/d/1CoOqYCuLObnG5M0D8a2P2NyL61WjbCzo/view?usp=sharing)
+   - [GOODZINC](https://drive.google.com/file/d/1CHR0I1JcNoBqrqFicAZVKU3213hbsEPZ/view?usp=sharing)
+   - Extract the downloaded files and save the contents in the `data` directory.
+2. DrugOOD
+    - Download the data from this [link](https://drive.google.com/drive/folders/19EAVkhJg0AgMx7X-bXGOhD4ENLfxJMWC).
+    - Extract the downloaded file and save the contents in the `drugood-data-chembl30` directory.
+
+An example of the folder hierarchy after adding the data files:
+```
+├── data
+│   ├── GOODHIV
+│   ├── GOODPCBA
+│   ├── GOODZINC
+├── drugood-data-chembl30
+│   ├── lbap_core_ec50_assay.json
+│   └── ...
+├── models
+│   ├── model.py
+│   └── ...
+├── run.py
+└── README.md
+```
+## Running Script
+#### Training
+```
+python run.py --dataset GOODZINC --domain scaffold --shift concept --num_e 4000 --bs 256 --gamma 0.5 --inv_w 0.01 --reg_w 0.5 --gpu 0 --exp_name ZINC --exp_id scaffold-concept
+```
+Running parameters and descriptions are as follows:
+| Parameter | Description | Choices |
+| --- | --- | --- |
+| dataset | name of dataset | `GOODHIV`, `GOODZINC`, `GOODPCBA`, `ic50_assay`, `ic50_scaffold`, `ic50_size`, `ec50_assay`, `ec50_scaffold`, `ec50_size` |
+| domain | environment-splitting strategy | `scaffold`, `size`. Only needs to be specified for datasets in `GOOD`. |
+| shift | type of distribution shift | `covariate`, `concept`. Only needs to be specified for datasets in `GOOD`. |
+| num_e | codebook size | - |
+| bs | batch size | - |
+| gamma | threshold $\gamma$ | - |
+| inv_w | $\lambda_1$ | - |
+| reg_w | $\lambda_2$ | - |
+| gpu | which GPU to use | - |
+| exp_name | experiment name | - |
+| exp_id | experiment ID | - |
+
+#### Evaluation
+The training hyperparameters for each dataset are listed in the Appendix, and the corresponding checkpoints are available on the [release page](https://github.com/HICAI-ZJU/iMoLD/releases).
+```
+python eval.py --dataset GOODZINC --domain scaffold --shift concept --load_path checkpoint/GOODZINC-scaffold-concept.pkl
+```
+The `load_path` parameter specifies the path to load the checkpoint.
+
+## Citation
+If you use or extend our work, please cite the paper as follows:
+
+```bibtex
+@InProceedings{zhuang2023learning,
+  title={Learning Invariant Molecular Representation in Latent Discrete Space},
+  author={Xiang Zhuang and Qiang Zhang and Keyan Ding and Yatao Bian and Xiao Wang and Jingsong Lv and Hongyang Chen and Huajun Chen},
+  booktitle={Advances in Neural Information Processing Systems},
+  year={2023}
+}
+```
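
The folder hierarchy described in the README can be checked before a run. The following is a minimal sketch, not part of the commit: the helper name `missing_data_paths` and the constant `EXPECTED_PATHS` are hypothetical, and the paths are copied from the README's example tree.

```python
from pathlib import Path

# Paths taken from the folder hierarchy in the README.
# The constant and helper names are hypothetical, chosen for illustration.
EXPECTED_PATHS = (
    "data/GOODHIV",
    "data/GOODPCBA",
    "data/GOODZINC",
    "drugood-data-chembl30/lbap_core_ec50_assay.json",
)


def missing_data_paths(repo_root="."):
    """Return the expected data paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_data_paths()
    if missing:
        print("Missing data paths:")
        for p in missing:
            print("  " + p)
    else:
        print("All expected data paths are present.")
```

Only the datasets actually used need to be present, so a non-empty result is not necessarily an error.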
diff --git a/args_parse.py b/args_parse.py
@@ -0,0 +1,47 @@
+import argparse
+
+
+def args_parser():
+    parser = argparse.ArgumentParser()
+    # exp
+    parser.add_argument("--exp_name", default="run", type=str,
+                        help="Experiment name")
+    parser.add_argument("--dump_path", default="dump/", type=str,
+                        help="Experiment dump path")
+    parser.add_argument("--exp_id", default="", type=str,
+                        help="Experiment ID")
+    parser.add_argument("--gpu", default='0', type=str, help="Which GPU to use")
+    parser.add_argument("--random_seed", default=0, type=int)
+    parser.add_argument("--load_path", default=None, type=str)
+
+    # dataset
+    parser.add_argument("--data_root", default='data', type=str)
+    parser.add_argument("--config_path", default='configs', type=str)
+    parser.add_argument("--dataset", default='GOODHIV', type=str)
+    parser.add_argument("--domain", default='scaffold', type=str)
+    parser.add_argument("--shift", default='covariate', type=str)
+
+    # VQ
+    parser.add_argument("--num_e", default=4000, type=int, help="Codebook size")
+    parser.add_argument("--commitment_weight", default=0.1, type=float)
+
+    # Encoder
+    parser.add_argument("--emb_dim", default=128, type=int)
+    parser.add_argument("--layer", default=4, type=int)
+    parser.add_argument("--dropout", default=0.5, type=float)
+    parser.add_argument("--gnn_type", default='gin', type=str, choices=['gcn', 'gin'])
+    parser.add_argument("--pooling_type", default='mean', type=str)
+
+    # Model
+    parser.add_argument("--inv_w", default=0.01, type=float, help="Loss weight lambda_1")
+    parser.add_argument("--reg_w", default=0.5, type=float, help="Loss weight lambda_2")
+    parser.add_argument("--gamma", default=0.9, type=float, help="Threshold gamma")
+
+    # Training
+    parser.add_argument("--lr", default=0.001, type=float)
+    parser.add_argument("--bs", default=128, type=int, help="Batch size")
+    parser.add_argument("--epoch", default=200, type=int)
+
+    args = parser.parse_args()
+
+    return args
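
To see how command-line flags interact with the defaults above, here is a small sketch. It assumes a trimmed copy of the parser: `build_parser` is a hypothetical name and declares only a few of the flags, with the same defaults as `args_parser()`. Unlike `args_parser()`, which reads `sys.argv`, it parses an explicit token list taken from the README training command.

```python
import argparse


def build_parser():
    # Trimmed, illustrative copy of args_parser(): a subset of the flags
    # with the same defaults as in the diff above.
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset", default='GOODHIV', type=str)
    parser.add_argument("--domain", default='scaffold', type=str)
    parser.add_argument("--shift", default='covariate', type=str)
    parser.add_argument("--num_e", default=4000, type=int)
    parser.add_argument("--gamma", default=0.9, type=float)
    parser.add_argument("--bs", default=128, type=int)
    parser.add_argument("--lr", default=0.001, type=float)
    return parser


# Part of the training command from the README, split into tokens.
argv = "--dataset GOODZINC --domain scaffold --shift concept --bs 256 --gamma 0.5".split()
args = build_parser().parse_args(argv)

# Values given on the command line replace the defaults.
print(args.dataset, args.shift, args.bs, args.gamma)
# Flags that were not passed keep their defaults.
print(args.num_e, args.lr)
```

Note that the README command sets `--gamma 0.5` and `--bs 256`, which differ from the parser defaults of `0.9` and `128`.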
diff --git a/configs/GOODHIV/base.yaml b/configs/GOODHIV/base.yaml
@@ -0,0 +1,8 @@
+includes:
+  - ../base.yaml
+model:
+  model_layer: 3
+  global_pool: mean
+# train:
+#   num_steps: 10
+#   mile_stones: [150]
diff --git a/configs/GOODHIV/scaffold/base.yaml b/configs/GOODHIV/scaffold/base.yaml
@@ -0,0 +1,11 @@
+includes:
+  - ../base.yaml
+dataset:
+  dataset_name: GOODHIV
+  domain: scaffold
+train:
+  # max_epoch: 200
+  train_bs: 32
+  val_bs: 256
+  test_bs: 256
+  # weight_decay: 0
diff --git a/configs/GOODHIV/scaffold/concept/ERM.yaml b/configs/GOODHIV/scaffold/concept/ERM.yaml
@@ -0,0 +1,13 @@
+includes:
+  - base.yaml
+model:
+  model_name: vGIN
+ood:
+  ood_alg: ERM
+  ood_param: -1.0
+train:
+  max_epoch: 300
+  lr: 0.001
+  weight_decay: 0.0
+log_file: lb_sweeping
+num_workers: 0
diff --git a/configs/GOODHIV/scaffold/concept/base.yaml b/configs/GOODHIV/scaffold/concept/base.yaml
@@ -0,0 +1,6 @@
+includes:
+  - ../base.yaml
+dataset:
+  shift_type: concept
+model:
+  model_name: vGIN
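
The config files chain through `includes`: `concept/ERM.yaml` includes `concept/base.yaml`, which includes `scaffold/base.yaml`, which includes `GOODHIV/base.yaml`. The sketch below illustrates one plausible way such a chain resolves, assuming that more specific files override less specific ones key by key. This is an assumption about the config loader, not something this commit confirms. `deep_merge` is a hypothetical helper, the YAML contents are transcribed as plain dicts, and the top-level `configs/base.yaml` is left out.

```python
def deep_merge(base, override):
    """Return a new dict where values in override replace or extend those in base."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, and new keys: the more specific file wins.
            merged[key] = value
    return merged


# Contents of four GOODHIV files, least specific first (`includes` keys dropped).
layers = [
    # configs/GOODHIV/base.yaml
    {"model": {"model_layer": 3, "global_pool": "mean"}},
    # configs/GOODHIV/scaffold/base.yaml
    {"dataset": {"dataset_name": "GOODHIV", "domain": "scaffold"},
     "train": {"train_bs": 32, "val_bs": 256, "test_bs": 256}},
    # configs/GOODHIV/scaffold/concept/base.yaml
    {"dataset": {"shift_type": "concept"}, "model": {"model_name": "vGIN"}},
    # configs/GOODHIV/scaffold/concept/ERM.yaml
    {"model": {"model_name": "vGIN"},
     "ood": {"ood_alg": "ERM", "ood_param": -1.0},
     "train": {"max_epoch": 300, "lr": 0.001, "weight_decay": 0.0},
     "log_file": "lb_sweeping", "num_workers": 0},
]

config = {}
for layer in layers:
    config = deep_merge(config, layer)

print(config["dataset"])
print(config["train"])
```

Under this assumption, the `train` section of the resolved config combines the batch sizes from `scaffold/base.yaml` with the epoch count and learning rate from `ERM.yaml`.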
diff --git a/configs/GOODHIV/scaffold/covariate/ERM.yaml b/configs/GOODHIV/scaffold/covariate/ERM.yaml
@@ -0,0 +1,13 @@
+includes:
+  - base.yaml
+model:
+  model_name: vGIN
+ood:
+  ood_alg: ERM
+  ood_param: -1.0
+train:
+  max_epoch: 300
+  lr: 0.001
+  weight_decay: 0.0
+log_file: lb_sweeping
+num_workers: 0
diff --git a/configs/GOODHIV/scaffold/covariate/base.yaml b/configs/GOODHIV/scaffold/covariate/base.yaml
@@ -0,0 +1,6 @@
+includes:
+  - ../base.yaml
+dataset:
+  shift_type: covariate
+model:
+  model_name: vGIN
diff --git a/configs/GOODHIV/size/base.yaml b/configs/GOODHIV/size/base.yaml
@@ -0,0 +1,11 @@
+includes:
+  - ../base.yaml
+dataset:
+  dataset_name: GOODHIV
+  domain: size
+train:
+  # max_epoch: 200
+  train_bs: 32
+  val_bs: 256
+  test_bs: 256
+  # weight_decay: 0
diff --git a/configs/GOODHIV/size/concept/ERM.yaml b/configs/GOODHIV/size/concept/ERM.yaml
@@ -0,0 +1,13 @@
+includes:
+  - base.yaml
+model:
+  model_name: vGIN
+ood:
+  ood_alg: ERM
+  ood_param: -1.0
+train:
+  max_epoch: 300
+  lr: 0.001
+  weight_decay: 0.0
+log_file: lb_sweeping
+num_workers: 0
diff --git a/configs/GOODHIV/size/concept/base.yaml b/configs/GOODHIV/size/concept/base.yaml
@@ -0,0 +1,6 @@
+includes:
+  - ../base.yaml
+dataset:
+  shift_type: concept
+model:
+  model_name: vGIN
diff --git a/configs/GOODHIV/size/covariate/ERM.yaml b/configs/GOODHIV/size/covariate/ERM.yaml
@@ -0,0 +1,13 @@
+includes:
+  - base.yaml
+model:
+  model_name: vGIN
+ood:
+  ood_alg: ERM
+  ood_param: -1.0
+train:
+  max_epoch: 200
+  lr: 0.001
+  weight_decay: 0.0
+log_file: lb_sweeping
+num_workers: 0
diff --git a/configs/GOODHIV/size/covariate/base.yaml b/configs/GOODHIV/size/covariate/base.yaml
@@ -0,0 +1,6 @@
+includes:
+  - ../base.yaml
+dataset:
+  shift_type: covariate
+model:
+  model_name: vGIN
diff --git a/configs/GOODPCBA/base.yaml b/configs/GOODPCBA/base.yaml
@@ -0,0 +1,10 @@
+includes:
+  - ../base.yaml
+model:
+  model_layer: 5
+  global_pool: mean
+  model_name: vGIN
+train:
+  # num_steps: 10
+  test_bs: 128
+  # mile_stones: [150]
diff --git a/configs/GOODPCBA/scaffold/base.yaml b/configs/GOODPCBA/scaffold/base.yaml
@@ -0,0 +1,10 @@
+includes:
+  - ../base.yaml
+dataset:
+  dataset_name: GOODPCBA
+  domain: scaffold
+train:
+  # max_epoch: 200
+  train_bs: 32
+  val_bs: 128
+  # weight_decay: 0
diff --git a/configs/GOODPCBA/scaffold/concept/ERM.yaml b/configs/GOODPCBA/scaffold/concept/ERM.yaml
@@ -0,0 +1,12 @@
+includes:
+  - base.yaml
+model:
+  model_name: vGIN
+ood:
+  ood_alg: ERM
+  ood_param: -1.
+train:
+  max_epoch: 200
+  lr: 1e-3
+  mile_stones: [150]
+
diff --git a/configs/GOODPCBA/scaffold/concept/base.yaml b/configs/GOODPCBA/scaffold/concept/base.yaml
@@ -0,0 +1,4 @@
+includes:
+  - ../base.yaml
+dataset:
+  shift_type: concept
diff --git a/configs/GOODPCBA/scaffold/covariate/ERM.yaml b/configs/GOODPCBA/scaffold/covariate/ERM.yaml
@@ -0,0 +1,12 @@
+includes:
+  - base.yaml
+model:
+  model_name: vGIN
+ood:
+  ood_alg: ERM
+  ood_param: -1.
+train:
+  max_epoch: 200
+  lr: 1e-3
+  mile_stones: [150]
+
diff --git a/configs/GOODPCBA/scaffold/covariate/base.yaml b/configs/GOODPCBA/scaffold/covariate/base.yaml
@@ -0,0 +1,4 @@
+includes:
+  - ../base.yaml
+dataset:
+  shift_type: covariate
diff --git a/configs/GOODPCBA/size/base.yaml b/configs/GOODPCBA/size/base.yaml
@@ -0,0 +1,10 @@
+includes:
+  - ../base.yaml
+dataset:
+  dataset_name: GOODPCBA
+  domain: size
+train:
+  # max_epoch: 200
+  train_bs: 32
+  val_bs: 128
+  # weight_decay: 0
diff --git a/configs/GOODPCBA/size/concept/ERM.yaml b/configs/GOODPCBA/size/concept/ERM.yaml
@@ -0,0 +1,12 @@
+includes:
+  - base.yaml
+model:
+  model_name: vGIN
+ood:
+  ood_alg: ERM
+  ood_param: -1.
+train:
+  max_epoch: 200
+  lr: 1e-3
+  mile_stones: [150]
+