Official GitHub repository for the ACL 2025 paper:
Enhancing Transformers for Generalizable First-Order Logical Entailment
We provide a sample dataset in the folder ./sample_query_data,
containing a small number of queries from each of the 55 query types.
The sampling script uses multiprocessing to accelerate dataset generation. To replicate our experiments, you can sample your own dataset with the following commands:
cd efo_code
python sample_query_multi.py --sample_formula_scope SEEN23 --mode train --output_folder \
../your_folder_name/training_queries --num_queries <your_training_data_size> \
--num_processes 80 --dataset FB15K237 <replace with your target KG>
python sample_query_multi.py --sample_formula_scope FULL55 --mode valid --output_folder \
../your_folder_name/validation_queries --num_queries <your_validation_data_size> \
--num_processes 80 --dataset FB15K237 <replace with your target KG>
python sample_query_multi.py --sample_formula_scope FULL55 --mode test --output_folder \
../your_folder_name/testing_queries --num_queries <your_testing_data_size> \
--num_processes 80 --dataset FB15K237 <replace with your target KG>
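The --num_queries / --num_processes split above follows the usual worker-pool pattern. As a rough, illustrative sketch only (the worker function and its arguments here are hypothetical, not the actual sample_query_multi.py API), the total query quota can be divided evenly across processes like this:

```python
from multiprocessing import Pool


def sample_chunk(args):
    # Hypothetical worker: sample `n` queries in one process.
    # A placeholder random draw stands in for the real KG query sampling.
    import random
    seed, n = args
    rng = random.Random(seed)
    return [rng.randrange(10**6) for _ in range(n)]


def sample_queries(num_queries, num_processes):
    # Split the total quota evenly across workers, mirroring --num_processes.
    base, rem = divmod(num_queries, num_processes)
    chunks = [(i, base + (1 if i < rem else 0)) for i in range(num_processes)]
    with Pool(num_processes) as pool:
        results = pool.map(sample_chunk, chunks)
    # Flatten the per-worker lists into one dataset.
    return [q for part in results for q in part]


if __name__ == "__main__":
    queries = sample_queries(1000, 8)
    print(len(queries))
```

Each worker receives its own seed so runs are reproducible per process; the real script additionally writes each split to --output_folder.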
All transformers are trained on 4 NVIDIA A100 GPUs for two days with a batch size of 1024.
To replicate the experimental results, you can train the models in the folder ./model
with the following commands:
cd model
python train.py \
-dn FB15k-237 <replace with your target KG> \
-m transformer \
--train_query_dir ../sample_query_data/fb237-23-train <replace with your own folder name> \
--valid_query_dir ../sample_query_data/fb237-55-valid <replace with your own folder name> \
--test_query_dir ../sample_query_data/fb237-55-test <replace with your own folder name> \
--checkpoint_path ../checkpoint/logs \
-b 1024 \
--log_steps 50000 \
-lr 0.0001
python train.py \
-dn FB15k-237 <replace with your target KG> \
-m transformer \
--train_query_dir ../sample_query_data/fb237-23-train <replace with your own folder name> \
--valid_query_dir ../sample_query_data/fb237-55-valid <replace with your own folder name> \
--test_query_dir ../sample_query_data/fb237-55-test <replace with your own folder name> \
--checkpoint_path ../checkpoint/logs \
-b 1024 \
--log_steps 50000 \
-lr 0.0001 \
--rpe
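The --rpe flag enables relative positional encodings. As a rough illustration only (not the repo's implementation), a relative-position bias adds a learned scalar to each attention score based on the clipped distance between query and key positions:

```python
def relative_position_bias(seq_len, max_distance, bias_table):
    # bias_table holds one learned scalar per clipped relative distance,
    # indexed from -max_distance to +max_distance (length 2*max_distance + 1).
    bias = [[0.0] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):          # query position
        for j in range(seq_len):      # key position
            # Clip the relative distance so far-apart tokens share a bias.
            d = max(-max_distance, min(max_distance, j - i))
            bias[i][j] = bias_table[d + max_distance]
    return bias
```

In a transformer this bias matrix would be added to the raw attention logits before the softmax, making attention depend on relative rather than absolute positions.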
python train.py \
-dn FB15k-237 <replace with your target KG> \
-m transformertega \
--train_query_dir ../sample_query_data/fb237-23-train <replace with your own folder name> \
--valid_query_dir ../sample_query_data/fb237-55-valid <replace with your own folder name> \
--test_query_dir ../sample_query_data/fb237-55-test <replace with your own folder name> \
--checkpoint_path ../checkpoint/logs \
-b 1024 \
--log_steps 50000 \
-lr 0.0001 \
--rpe \
--num_categories 6 \
--pooling sum \
--self_dist
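The --pooling sum and --num_categories 6 options suggest that token embeddings are aggregated per token category. As a minimal, assumption-laden sketch (pure Python, not the repo's actual TEGA code), category-wise sum pooling could look like:

```python
def category_sum_pool(embeddings, categories, num_categories=6):
    # embeddings: list of vectors, one per token; categories: one
    # category id per token, each in [0, num_categories).
    # Sum the embeddings of all tokens sharing a category.
    dim = len(embeddings[0])
    pooled = [[0.0] * dim for _ in range(num_categories)]
    for vec, cat in zip(embeddings, categories):
        for j, v in enumerate(vec):
            pooled[cat][j] += v
    return pooled
```

Sum pooling (as opposed to mean or max) keeps the result sensitive to how many tokens fall in each category, which can matter when category counts carry signal.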
To check model performance, the evaluation results are logged with tensorboardX
to ./checkpoint/logs/gradient_tape; you can view them
with the following command:
tensorboard --logdir ./checkpoint/logs/gradient_tape --port 6006
If you are training on a remote server, forward the TensorBoard port to your local machine:
ssh -N -f -L localhost:port_number:localhost:port_number your_server_location