This repository implements a unified framework for knowledge graph completion (KGC) combining:
- GNN Distillation: Iterative message filtering to prevent over-smoothing
- Abstract Probabilistic Interaction Modeling (APIM): Structured probabilistic interaction patterns
 
The implementation builds on Are_MPNNs_helpful (the GNN backbone) and SimKGC (the KGC framework), with key architectural enhancements.
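As a rough illustration of the message-filtering idea behind GNN Distillation: at each propagation layer, only a fraction of a node's incoming messages is retained, which limits over-smoothing. The sketch below is a hypothetical simplification in plain PyTorch; `filter_messages` and its score input are assumptions, not the repository's actual implementation.

```python
import torch

def filter_messages(messages: torch.Tensor,
                    scores: torch.Tensor,
                    keep_ratio: float) -> torch.Tensor:
    """Zero out all but the top-scoring fraction of a node's incoming messages.

    messages:   (num_msgs, dim) candidate messages aggregated for one node
    scores:     (num_msgs,) relevance score per message
    keep_ratio: fraction of messages retained at the current layer

    Hypothetical sketch of iterative message filtering; the repository's
    actual mechanism may differ.
    """
    k = max(1, int(keep_ratio * messages.size(0)))
    kept = torch.topk(scores, k).indices
    mask = torch.zeros_like(scores, dtype=torch.bool)
    mask[kept] = True
    return messages * mask.unsqueeze(-1)

# Example: retain the top 74% of 10 messages at the first layer.
msgs = torch.randn(10, 200)
scores = torch.randn(10)
filtered = filter_messages(msgs, scores, keep_ratio=0.74)
```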
Requirements:
- Python 3.9.19
- PyTorch 2.3.0 (CUDA 12.6 recommended)
- DGL 2.2.1
- Transformers 4.40.2
- NVIDIA Apex (for FP16 mixed-precision training)
 
More detailed requirements can be found in requirements.txt.
```bash
conda create -n kgc python=3.9.19
conda activate kgc
pip install -r requirements.txt
```

GNN-Distill + APIM training and evaluation can be run using the run.py script.
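Before launching a run, it can help to confirm that the pinned packages resolved correctly. This is a minimal sanity check, not part of the repository:

```python
# Minimal environment sanity check (illustrative; not part of the repository).
import torch
import dgl
import transformers

print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("DGL:", dgl.__version__)
print("Transformers:", transformers.__version__)

try:
    import apex  # NVIDIA Apex is required for FP16 training
    print("Apex: OK")
except ImportError:
    print("Apex: missing (FP16 training will not work)")
```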
Example commands for the two benchmarks:

```bash
# FB15k-237
nohup python run.py -model 'compgcn' -read_setting 'no_negative_sampling' -neg_num 0 -score_func 'conve' -data 'FB15k-237' -lr 0.001 -nheads 2 -batch 512 -embed_dim 200 -gcn_dim 100 -init_dim 100 -k_w 10 -k_h 20 -l2 0. -num_workers 3 -hid_drop 0.3 -pretrain_epochs 0 -candidate_num 0 -topk 20 -gcn_layer 4 -gnn_distillation -ratio_type 'exponential' -ratio 0.74 -output_dir '***' > FB15K237_CompGCN_Layer4_ExpDecay.log 2>&1 &

# WN18RR
nohup python run.py -model 'compgcn' -read_setting 'no_negative_sampling' -neg_num 0 -score_func 'conve' -data 'WN18RR' -lr 0.001 -nheads 2 -batch 512 -embed_dim 200 -gcn_dim 100 -init_dim 100 -k_w 10 -k_h 20 -l2 0. -num_workers 3 -hid_drop 0.3 -pretrain_epochs 0 -candidate_num 0 -topk 20 -gcn_layer 4 -gnn_distillation -ratio_type 'exponential' -ratio 0.74 -output_dir '***' > WN18RR_CompGCN_Layer4_ExpDecay.log 2>&1 &
```

Arguments:
- `-model`: GNN backbone architecture (`compgcn`, `rgcn`, `kbgat`)
- `-read_setting`: negative-sampling setting (this paper uses `no_negative_sampling`)
- `-neg_num`: number of negative samples (0 under `no_negative_sampling`)
- `-score_func`: scoring function for computing edge scores (this paper uses `conve`)
- `-data`: knowledge graph dataset (`FB15k-237`, `WN18RR`)
- `-lr`: learning rate
- `-nheads`: number of attention heads
- `-batch`: batch size
- `-embed_dim`: embedding dimension
- `-gcn_dim`: GCN layer dimension
- `-init_dim`: initial feature dimension
- `-k_w`: ConvE kernel width
- `-k_h`: ConvE kernel height
- `-l2`: L2 regularization strength
- `-num_workers`: number of data-loading workers
- `-hid_drop`: dropout rate for hidden layers
- `-pretrain_epochs`: number of pretraining epochs (set to 0 for no pretraining)
- `-candidate_num`: must be 0 (the relational part of the model is not used in this paper)
- `-topk`: number of APIM candidates (this paper uses 20)
- `-gcn_layer`: number of GCN layers (this paper uses 4)
- `-gnn_distillation`: enables GNN Distillation
- `-ratio_type`: decay schedule for GNN Distillation (`linear`, `exponential`)
- `-ratio`: decay ratio for GNN Distillation (this paper uses 0.74 for exponential decay and 0.4 for linear decay; see the sketch after this list)
- `-output_dir`: output directory for model checkpoints and logs
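To picture how `-ratio_type` and `-ratio` could translate into per-layer retention, here is a hypothetical helper. The flag values come from the commands above, but the exact per-layer formula is an assumption, not the repository's code:

```python
def layer_keep_ratios(num_layers: int, ratio_type: str, ratio: float) -> list:
    """Hypothetical per-layer message-retention schedule for GNN Distillation.

    'exponential': layer l retains ratio**(l + 1) of messages (paper: ratio=0.74).
    'linear':      layer l retains max(1 - ratio * l, 0)      (paper: ratio=0.4).
    Illustrative only; the repository's actual schedule may differ.
    """
    if ratio_type == "exponential":
        return [ratio ** (l + 1) for l in range(num_layers)]
    if ratio_type == "linear":
        return [max(1.0 - ratio * l, 0.0) for l in range(num_layers)]
    raise ValueError(f"unknown ratio_type: {ratio_type}")

print(layer_keep_ratios(4, "exponential", 0.74))  # [0.74, 0.5476, 0.4052..., 0.2998...]
print(layer_keep_ratios(4, "linear", 0.4))        # [1.0, 0.6, 0.2, 0.0]
```

Either way, later layers keep fewer messages, which is consistent with the goal of damping over-smoothing as depth grows.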