A set of tools to support multiword expression (MWE) extraction from parallel corpora for the Moses statistical machine translation system.
- Multiword Expression XML file converter to MPFormat
- One parameter - input XML file from MWE toolkit with sentences containing MWEs
- Phrase counter from XML
- One parameter - input XML file from MWE toolkit with MWE frequencies
- Jaccard Index calculator
- Three parameters - the MWE count files for the target and source languages, and the MWE count file for source-target language pairs
- CPP source file
- Makefile
- Test data file(s)
- XmlInspector (included)
- boost/bimap
- clang++
- Converter from MP aligner format to the Moses training data format
- Three parameters - input MPFormat file and output source and target language files
- Converter from MP aligner format to the Moses translation table format
- Two parameters - input MPFormat file and output translation table file
- MWE Translation Workflow
- A complete workflow for extracting a set of parallel multiword expressions from parallel corpora
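
As a rough illustration of what the Jaccard Index calculator listed above computes, the sketch below derives the index for a single MWE pair from three counts. It assumes the standard Jaccard definition (size of the intersection divided by size of the union), with the pair count as the intersection. The function name `jaccard_index` and its parameters are hypothetical and are not taken from the tool's source; reading the three count files is left out.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical helper, not the tool's actual code.
// source_count: occurrences of the source-language MWE
// target_count: occurrences of the target-language MWE
// pair_count:   occurrences of the two MWEs together as a pair
double jaccard_index(long source_count, long target_count, long pair_count) {
    if (source_count < 0 || target_count < 0 || pair_count < 0) {
        throw std::invalid_argument("counts must be non-negative");
    }
    if (pair_count > source_count || pair_count > target_count) {
        throw std::invalid_argument("pair count cannot exceed either single count");
    }

    // |A union B| = |A| + |B| - |A intersect B|
    const long union_count = source_count + target_count - pair_count;
    assert(union_count >= pair_count);

    // Assumption: treat the index as 0 when neither MWE occurs at all.
    if (union_count == 0) {
        return 0.0;
    }
    return static_cast<double>(pair_count) / static_cast<double>(union_count);
}
```

For example, `jaccard_index(40, 50, 30)` evaluates 30 / (40 + 50 - 30) and returns 0.5.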
If you use this tool, please cite the following paper:
Matīss Rikters and Ondřej Bojar (2017). "Paying Attention to Multi-Word Expressions in Neural Machine Translation." In Proceedings of the 16th Machine Translation Summit (MT Summit 2017).
@inproceedings{Rikters-Bojar2017MTSummit,
author = {Rikters, Matīss and Bojar, Ond\v{r}ej},
booktitle={Proceedings of the 16th Machine Translation Summit (MT Summit 2017)},
title = {{Paying Attention to Multi-Word Expressions in Neural Machine Translation}},
address={Nagoya, Japan},
year = {2017}
}