| 1 | +# Information Sciences Institute Aligner |
| 2 | + |
| 3 | +This is an algorithmic aligner based on the paper [Aligning English Strings with Abstract Meaning Representation Graphs](https://www.isi.edu/natural-language/mt/amr_eng_align.pdf). |
| 4 | +The code is a Pythonized version of the ISI aligner code, which is a collection of bash scripts and a |
| 5 | +few C++ files. A copy of that project can be found [here](https://github.com/melanietosik/string-to-amr-alignment). |
| 6 | + |
| 7 | +Due to the complexity of the alignment process and the underlying mgiza aligner, the code is not |
| 8 | +set up to be used as part of the library for inference. If you are doing simple inference, it's |
| 9 | +recommended that you use the [FAA aligner](https://amrlib.readthedocs.io/en/latest/faa_aligner/). |
| 10 | +If you want to use this code, expect to dig into the scripts a bit to customize them for your use |
| 11 | +case, as this is not set up for ease of use. |
| 12 | + |
| 13 | +The ISI alignment code is included here because this is the aligner that has been commonly used |
| 14 | +with AMR and, I believe, the aligner used to create alignments for LDC2020T02. It also performs |
| 15 | +slightly better than the FAA aligner (see Performance at the bottom). |
| 16 | + |
| 17 | +To use the code you will need to install and compile [mgiza](https://github.com/moses-smt/mgiza/tree/master/mgizapp). |
| 18 | + |
| 19 | +Note that the main alignment process is a bash script so this will not run under Windows, though |
| 20 | +it could be converted if someone wanted to put in the effort. |
| 21 | + |
| 22 | + |
| 23 | +## Usage |
| 24 | +There are no library calls associated with the aligner. All of the code is in the scripts |
| 25 | +directory under the [ISI Aligner](https://github.com/bjascob/amrlib/tree/master/scripts/62_ISI_Aligner). |
| 26 | +The scripts are run one after another, in order, to carry out the alignment and scoring process. |
| 27 | +You will need a copy of LDC2014T12 to run the code. It could easily be modified to run on other |
| 28 | +versions, but scoring requires the original AMR 1.0 corpus because the gold alignments are |
| 29 | +tied to these graphs. |
| 30 | + |
| 31 | +Directories and file locations are generally set up in each script under the `__main__` statement. |
| 32 | +Note that you will need to set the location of the mgiza binaries at the top of the bash script |
| 33 | +`Run_Aligner.sh`. |
| 34 | + |
| 35 | +Unlike neural net models, the mgiza aligner doesn't natively separate training and inference into |
| 36 | +two distinct steps. Training and alignment both happen as part of the same process. While it is |
| 37 | +possible to re-use the pretrained tables to do inference, the scores generally drop a few points |
| 38 | +(possibly because it resumes training on the smaller inference dataset), and the code here is not |
| 39 | +set up to do inference. |
| 40 | + |
| 41 | +If you would like to align your own sentences / graphs, I would recommend modifying the script |
| 42 | +`Gather_LDC.py` so that it appends them to the `sents.txt` and `gstrings.txt` files |
| 43 | +it creates. The alignments can then be extracted from the end of the |
| 44 | +`amr_alignment_strings.txt` file after running all steps (scripts) of the process. |
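
The append-and-extract idea above could be sketched roughly as follows. This is only an illustration, not code from the aligner scripts: the helper names `append_pairs` and `tail_alignments` are hypothetical, and the sketch assumes that `sents.txt`, `gstrings.txt` and `amr_alignment_strings.txt` each hold one entry per line, in the same order, and that the existing files end with a newline. Check the files produced on your system before relying on it.

```python
from pathlib import Path


def append_pairs(work_dir, sents, gstrings):
    """Append extra sentence/graph pairs to sents.txt and gstrings.txt.

    Assumption (not confirmed by the aligner docs): both files hold one
    entry per line and already end with a newline.
    """
    if len(sents) != len(gstrings):
        raise ValueError('need exactly one graph string per sentence')
    for name, entries in (('sents.txt', sents), ('gstrings.txt', gstrings)):
        with open(Path(work_dir) / name, 'a', encoding='utf-8') as f:
            for entry in entries:
                # Collapse newlines and repeated whitespace so each entry stays on one line
                f.write(' '.join(entry.split()) + '\n')
    return len(sents)


def tail_alignments(work_dir, n):
    """Return the last n non-empty lines of amr_alignment_strings.txt.

    Assumption: one alignment string per input pair, in input order, so the
    appended pairs end up at the end of the file.
    """
    path = Path(work_dir) / 'amr_alignment_strings.txt'
    with open(path, encoding='utf-8') as f:
        lines = [line.rstrip('\n') for line in f if line.strip()]
    return lines[-n:] if n > 0 else []
```

The intended flow would be to call `append_pairs` right after `Gather_LDC.py` has written its files, run the remaining scripts, and then call `tail_alignments` with the number of pairs you appended.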
| 45 | + |
| 46 | + |
| 47 | +## Performance |
| 48 | +Score of the ISI_Aligner against the gold ISI hand alignments for LDC2014T12 <sup>**1</sup> |
| 49 | +``` |
| 50 | +Dev scores Precision: 93.78 Recall: 80.30 F1: 86.52 |
| 51 | +Test scores Precision: 92.05 Recall: 76.64 F1: 83.64 |
| 52 | +``` |
| 53 | + |
| 54 | +<sup>**1</sup> |
| 55 | +Note that these scores are obtained during training. When scoring with only the test/dev sets and |
| 56 | +using pre-trained parameters, the scores drop around 2-3 points. |
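
As a quick sanity check on the table above, F1 is the harmonic mean of precision and recall, so the reported F1 values can be recomputed from the other two columns. A minimal sketch (the function name `f1_score` is only for illustration and is not part of the aligner scripts):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision / recall values copied from the score table above
for name, p, r in (('Dev', 93.78, 80.30), ('Test', 92.05, 76.64)):
    print(f'{name} F1: {f1_score(p, r):.2f}')
```

Rounded to two decimals, this gives 86.52 for dev and 83.64 for test, matching the reported scores.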