Commit 5e6754d

committed
update docs for isi aligner
1 parent 7a5c841 commit 5e6754d

5 files changed

+61
-2
lines changed

README.md

Lines changed: 0 additions & 1 deletion
@@ -21,7 +21,6 @@ amrlib is a python module designed to make processing for [Abstract Meaning Repr
2121
- Smatch (multiprocessed with enhanced/detailed scores) for graph parsing
2222
- BLEU for sentence generation
2323
- Alignment scoring metrics detailing precision/recall
24-
* Sentence paraphrasing - experimental
2524

2625

2726
## AMR Models

docs/faa_aligner.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
11
# Fast_Align Algorithm Aligner
22

33
This is an algorithmic aligner based on the paper [Aligning English Strings with Abstract Meaning Representation Graphs](https://www.isi.edu/natural-language/mt/amr_eng_align.pdf).
4-
The code is based on the ISI aligner code. A copy of that that project can be found [here](https://github.com/melanietosik/string-to-amr-alignment).
4+
The code is based on the ISI aligner code. A copy of that project can be found [here](https://github.com/melanietosik/string-to-amr-alignment).
55
The project makes use of original pre/post-processing code but replaces the use of the [mgiza](https://github.com/moses-smt/mgiza/tree/master/mgizapp)
66
app with [fast_align](https://github.com/clab/fast_align). The bash scripts have been converted to python and a new
77
"inference" step allows for pre-trained parameters to be used during run-time operation.

docs/install.md

Lines changed: 3 additions & 0 deletions
@@ -12,6 +12,9 @@ for run-time use but is highly recommended for training models.
1212
This requires both the pip graphviz install and the installation of the non-python Graphviz library. The separate installs
1313
are required because graphviz is a python wrapper for Graphviz which pip can't install by itself.
1414

15+
* If you want to run the faa_aligner, you will need to install and compile [fast_align](https://github.com/clab/fast_align).
16+
Put it in your path, or set the environment variable `FABIN_DIR` to its location.
17+
1518
`pip3 install -r requirements.txt`
1619

1720
`python3 -m spacy download en_core_web_sm`
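
The `FABIN_DIR` / path note in the install instructions above can be sanity-checked with a short script. This is only a sketch: the function name `find_fast_align`, the binary name `fast_align`, and the choice to look in `FABIN_DIR` before the path are assumptions for illustration, not the lookup that amrlib itself performs.

```python
import os
import shutil
from typing import Optional


def find_fast_align(binary_name: str = "fast_align") -> Optional[str]:
    """Return a path to the fast_align binary, or None if it can't be found.

    Hypothetical helper: checks the directory named by FABIN_DIR first
    (assumed precedence), then falls back to searching the system path.
    """
    fabin_dir = os.environ.get("FABIN_DIR")
    if fabin_dir:
        candidate = os.path.join(fabin_dir, binary_name)
        # Only accept a regular file that is marked executable
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    # shutil.which returns None when the binary is not on the path
    return shutil.which(binary_name)


if __name__ == "__main__":
    location = find_fast_align()
    print(location if location else "fast_align not found; check FABIN_DIR or your path")
```

If this prints a path, the compiled binary is at least visible from python. It does not verify that the build itself works.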

docs/isi_aligner.md

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
1+
# Information Sciences Institute Aligner
2+
3+
This is an algorithmic aligner based on the paper [Aligning English Strings with Abstract Meaning Representation Graphs](https://www.isi.edu/natural-language/mt/amr_eng_align.pdf).
4+
The code is a pythonized version of the ISI aligner code, which is a bunch of bash scripts and a
5+
few c++ files. A copy of that project can be found [here](https://github.com/melanietosik/string-to-amr-alignment).
6+
7+
Due to the complexity of the alignment process and the underlying mgiza aligner, the code is not
8+
set up to be used as part of the library for inference. If you are doing simple inference, it's
9+
recommended that you use the [faa aligner](https://amrlib.readthedocs.io/en/latest/faa_aligner/).
10+
If you want to use this code, expect to need to dig into the scripts a bit to customize it for your use
11+
case, as this is not set up for ease of use.
12+
13+
The ISI alignment code is included here because this is the aligner that has been commonly used
14+
with AMR and, I believe, the aligner used to create alignments for LDC2020T02. It also performs
15+
slightly better than the FAA aligner (see performance at the bottom).
16+
17+
To use the code you will need to install and compile [mgiza](https://github.com/moses-smt/mgiza/tree/master/mgizapp).
18+
19+
Note that the main alignment process is a bash script so this will not run under Windows, though
20+
it could be converted if someone wanted to put in the effort.
21+
22+
23+
### Usage
24+
There are no library calls associated with the aligner. All of the code is in the scripts
25+
directory under the [ISI Aligner](https://github.com/bjascob/amrlib/tree/master/scripts/62_ISI_Aligner).
26+
These scripts are simply run in order to conduct the alignment and scoring process. You will
27+
need a copy of LDC2014T12 to run the code. It could easily be modified to run on other
28+
versions, but scoring requires the original AMR 1.0 corpus because the gold alignments are
29+
tied to these graphs.
30+
31+
Directories and file locations are generally set up in each script under the `__main__` statement.
32+
Note that you will need to set the location of the mgiza binaries at the top of the bash script
33+
`Run_Aligner.sh`.
34+
35+
Unlike neural net models, the mgiza aligner doesn't natively separate training and inference into
36+
two distinct steps. Training and alignment all happen as part of the same process. While it is
37+
possible to re-use the pretrained tables to do inference, the scores generally drop a few points
38+
(possibly because it resumes training on the smaller inference dataset) and the code here is not
39+
set up to do inference.
40+
41+
If you would like to align your own sentences / graphs, I would recommend modifying the script
42+
`Gather_LDC.py` and having the code append them on to the `sents.txt` and `gstrings.txt` files
43+
created by the script. The alignments can then be extracted from the end of the
44+
`amr_alignment_strings.txt` file after running all steps (scripts) of the process.
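
As a rough sketch of the append-then-extract idea described above, something like the following could work. Note the assumptions: the helper names `append_entries` and `last_entries` are hypothetical, and the code assumes `sents.txt`, `gstrings.txt` and `amr_alignment_strings.txt` each hold one entry per line in matching order. Check what `Gather_LDC.py` and the later scripts actually write before relying on this.

```python
from pathlib import Path
from typing import List


def append_entries(path: str, entries: List[str]) -> None:
    """Append one entry per line to an existing text file.

    Assumption: the file is line-oriented (one sentence or one graph
    string per line), so internal whitespace and newlines in each entry
    are collapsed to single spaces.
    """
    p = Path(path)
    existing = p.read_text(encoding="utf-8") if p.exists() else ""
    with p.open("a", encoding="utf-8") as f:
        # Make sure the new entries start on their own line
        if existing and not existing.endswith("\n"):
            f.write("\n")
        for entry in entries:
            f.write(" ".join(entry.split()) + "\n")


def last_entries(path: str, count: int) -> List[str]:
    """Return the final `count` lines of a file (the appended entries)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return lines[-count:] if count > 0 else []
```

Usage would be to call `append_entries` on `sents.txt` and `gstrings.txt` with the same number of items after `Gather_LDC.py` has run, then call `last_entries("amr_alignment_strings.txt", n)` once all the scripts have finished.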
45+
46+
47+
## Performance
48+
Score of the ISI_Aligner against the gold ISI hand alignments for LDC2014T12 <sup>**1</sup>
49+
```
50+
Dev scores Precision: 93.78 Recall: 80.30 F1: 86.52
51+
Test scores Precision: 92.05 Recall: 76.64 F1: 83.64
52+
```
53+
54+
<sup>**1</sup>
55+
Note that these scores are obtained during training. When scoring with only the test/dev sets and
56+
using pre-trained parameters, the scores drop around 2-3 points.
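
The F1 values in the table above are consistent with the usual harmonic mean of precision and recall. The snippet below is just an arithmetic check under that assumption. `f1_score` is a helper written for this example, not a function from the scoring scripts.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision / recall values copied from the Dev and Test rows
dev_f1 = f1_score(93.78, 80.30)
test_f1 = f1_score(92.05, 76.64)

print(f"Dev F1:  {dev_f1:.2f}")   # 86.52
print(f"Test F1: {test_f1:.2f}")  # 83.64
```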

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ nav:
1010
- Models: models.md
1111
- 'FAA Aligner': faa_aligner.md
1212
- 'RBW Aligner': rbw_aligner.md
13+
- 'ISI Aligner': isi_aligner.md
1314
- Evaluation: evaluation.md
1415
- Training: training.md
1516
- Paraphrasing: paraphrase.md
