This repository contains the code used for the masked language model and unsupervised parsing experiments in the paper *Unsupervised Dependency Graph Network*. If you use this code or our results in your research, we'd appreciate it if you cite our paper as follows:
```
@inproceedings{shen2022unsupervised,
  title={Unsupervised Dependency Graph Network},
  author={Yikang Shen and Shawn Tan and Alessandro Sordoni and Peng Li and Jie Zhou and Aaron Courville},
  booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  year={2022},
}
```
Python 3, NLTK, `ufal.chu_liu_edmonds`, and PyTorch are required for the current codebase.
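As a quick sanity check that the dependencies are importable, the sketch below also illustrates what `ufal.chu_liu_edmonds` is used for: decoding a maximum-spanning-tree dependency structure from a matrix of head scores. This is a minimal sketch assuming the package's documented interface (a square `np.double` score matrix in, a `(heads, tree_score)` pair out, with NaN marking excluded edges); the random scores are purely illustrative, and this is not code from the repository.

```python
# Sanity check: the required packages import, and ufal.chu_liu_edmonds
# decodes a maximum spanning tree from a toy score matrix.
import nltk   # Penn Treebank access
import torch  # model training and evaluation
import numpy as np
from ufal.chu_liu_edmonds import chu_liu_edmonds

# Entry [dep, head] scores attaching token `dep` to candidate head
# `head`; node 0 plays the role of the root. NaN excludes an edge
# (here: self-attachment). Conventions assumed from the package docs.
scores = np.random.rand(5, 5).astype(np.double)
np.fill_diagonal(scores, np.nan)

heads, tree_score = chu_liu_edmonds(scores)
print(heads)       # e.g. [-1, 0, 1, 1, 2]; the root's head is -1
print(tree_score)  # total score of the decoded tree
```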
- Install PyTorch and NLTK.
- Download Penn Treebank and BLLIP. Put Penn Treebank into NLTK's corpus folder (see the loading sketch after this list). Put BLLIP into the `./data` folder.
- Scripts and commands, run from `google-research/`:
  - Train Language Modeling: `python main.py --cuda --save /path/to/your/model`
  - Test Unsupervised Parsing: `python test_phrase_grammar.py --cuda --checkpoint /path/to/your/model --print`
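"NLTK's corpus folder" above refers to the directories on `nltk.data.path`; NLTK's `ptb` reader expects the (licensed) full treebank under `corpora/ptb` inside one of them. A minimal loading sketch with an illustrative file id, not code from this repository:

```python
# Sketch: read the full Penn Treebank through NLTK once it has been
# copied to <some entry of nltk.data.path>/corpora/ptb.
import nltk
from nltk.corpus import ptb

print(nltk.data.path)  # the folders NLTK searches for corpora

# File ids mirror the PTB directory layout; this one is illustrative.
tree = ptb.parsed_sents('WSJ/00/WSJ_0001.MRG')[0]
tree.pretty_print()
```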
The default setting in `main.py` achieves a perplexity of approximately 59.3 on the PTB test set, a Directed Dependency Accuracy of approximately 49.9 and an Undirected Dependency Accuracy of approximately 61.8 on the WSJ test set.
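For reference, Directed Dependency Accuracy is the fraction of tokens whose predicted head matches the gold head, while Undirected Dependency Accuracy also credits a gold edge that is recovered with its direction flipped. The sketch below implements these standard definitions; the repository's own evaluation in `test_phrase_grammar.py` may differ in details such as punctuation handling.

```python
# Sketch of the standard attachment-accuracy definitions;
# not the repository's evaluation code.
def attachment_accuracies(gold, pred):
    """gold, pred: one head index per token; -1 marks the root."""
    assert len(gold) == len(pred)
    directed = sum(g == p for g, p in zip(gold, pred))
    undirected = 0
    for dep, (g, p) in enumerate(zip(gold, pred)):
        # Exact match, or the same edge with head/dependent swapped.
        if g == p or (p >= 0 and gold[p] == dep):
            undirected += 1
    n = len(gold)
    return directed / n, undirected / n

# Toy example: one edge is predicted with flipped direction.
gold = [-1, 0, 0, 2]
pred = [-1, 0, 3, 2]
print(attachment_accuracies(gold, pred))  # (0.75, 1.0)
```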
Much of our preprocessing and evaluation code is based on the following repository: