(MyGO) Tokenization, Fusion, and Augmentation: Towards Fine-grained Multi-modal Entity Representation

Overview

[Figure: MyGO model architecture overview]

πŸŽ† News

  • 2024-12 🎉🎉🎉 Our paper has been accepted by AAAI 2025. The title has been changed to Tokenization, Fusion, and Augmentation: Towards Fine-grained Multi-modal Entity Representation.
  • 2024-04 Our paper and code were released on arXiv and GitHub.
  • 2024-02 We released a preprint of our survey Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey [Repo].

Dependencies

pip install -r requirement.txt

Details

  • Python==3.9
  • numpy==1.24.2
  • scikit_learn==1.2.2
  • torch==2.0.0
  • tqdm==4.64.1
  • transformers==4.28.0
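
If you prefer an isolated environment, you can create one first and then install the dependencies above. This is optional, and the environment name mygo below is only an example:

conda create -n mygo python=3.9
conda activate mygo
pip install -r requirement.txt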

Data Preparation

First, generate the textual token embeddings by running save_token_embeddings.py, which uses the transformers library (BERT, RoBERTa, LLaMA). You can try MyGO on the pre-processed datasets DB15K, MKG-W, and MKG-Y. The large token files in tokens/ must be unzipped before training. We provide VQGAN / BEiT tokens for the visual modality and BERT / RoBERTa / LLaMA tokens for the textual modality. A minimal sketch of how such token embeddings can be extracted is shown below.
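
For illustration only, here is a minimal sketch of saving the sub-word token embedding table of a pretrained BERT model with transformers. It is not the repository's save_token_embeddings.py; the model name and output file are assumptions, and the actual script may use a different format.

import torch
from transformers import AutoModel

# Illustrative only: load a pretrained encoder and save its input
# token-embedding table (vocab_size x hidden_dim) to disk.
model_name = "bert-base-uncased"  # assumption: swap in RoBERTa / LLaMA as needed
model = AutoModel.from_pretrained(model_name)

# The word-embedding table holds one vector per sub-word token.
token_embeddings = model.get_input_embeddings().weight.detach().cpu()

# Assumed output path; MyGO's script may name and format this differently.
torch.save(token_embeddings, "bert_token_embeddings.pth")
print(token_embeddings.shape)  # e.g. torch.Size([30522, 768]) for bert-base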

Train and Evaluation

You can refer to the training scripts in run.sh to reproduce our experimental results. Here is an example for the DB15K dataset.

CUDA_VISIBLE_DEVICES=0 nohup python train_mygo_fgc.py --data DB15K --num_epoch 1500 --hidden_dim 1024 --lr 1e-3 --dim 256 --max_vis_token 8 --max_txt_token 4 --num_head 2 --emb_dropout 0.6 --vis_dropout 0.3 --txt_dropout 0.1 --num_layer_dec 1 --mu 0.01 > log.txt &
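
For a quick sanity check before a full run, you can train in the foreground with fewer epochs; the command below only shortens --num_epoch from the example above, and these values are illustrative rather than the tuned settings from run.sh.

CUDA_VISIBLE_DEVICES=0 python train_mygo_fgc.py --data DB15K --num_epoch 50 --hidden_dim 1024 --lr 1e-3 --dim 256 --max_vis_token 8 --max_txt_token 4 --num_head 2 --emb_dropout 0.6 --vis_dropout 0.3 --txt_dropout 0.1 --num_layer_dec 1 --mu 0.01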

More training scripts can be found in run.sh.

🀝 Citation


@misc{zhang2024mygo,
      title={MyGO: Discrete Modality Information as Fine-Grained Tokens for Multi-modal Knowledge Graph Completion}, 
      author={Yichi Zhang and Zhuo Chen and Lingbing Guo and Yajing Xu and Binbin Hu and Ziqi Liu and Huajun Chen and Wen Zhang},
      year={2024},
      eprint={2404.09468},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}