GAlign – word alignment using HMMs estimated with Gibbs sampling

Description

GAlign is yet another word alignment tool (like GIZA++, MGiza, or fast_align) with several distinguishing features:

competitive alignment quality
supports incremental training and forced alignment
fast performance thanks to multi-threading in all stages, scales to many cores
easy to use
simple code base with minimal dependencies (Boost, Intel TBB)

Installation

Download and compile GAlign:

git clone https://github.com/ales-t/galign.git
cd galign
# export BOOST_PATH=/optionally/specify/different/path/to/Boost
make

Verify that you can run the program:

bin/wordalign --help

Usage

GAlign can be used in roughly three modes:

Train and align (+output model)

Basic usage:

paste corpus.src corpus.tgt | bin/wordalign -m > corpus.alignment
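
The command above relies on `paste` joining the two files line by line with a tab, so both files need the same number of lines. Below is a minimal sketch of a pre-flight check, assuming one sentence per line in each file. The `toy.*` file names and the sentences are made up for illustration and are not part of GAlign.

```shell
#!/bin/sh
# Toy parallel corpus (hypothetical data, one sentence per line).
printf 'das haus\ndas buch\n' > toy.src
printf 'the house\nthe book\n' > toy.tgt

# Both sides must have the same number of lines; otherwise paste
# pairs the surplus lines of the longer file with empty strings.
src_lines=$(wc -l < toy.src | tr -d ' ')
tgt_lines=$(wc -l < toy.tgt | tr -d ' ')
if [ "$src_lines" -ne "$tgt_lines" ]; then
    echo "line count mismatch: $src_lines vs $tgt_lines" >&2
    exit 1
fi

# paste joins line i of each file with a single tab character.
paste toy.src toy.tgt > toy.paired
```

The resulting `toy.paired` holds the same tab-separated stream that `paste corpus.src corpus.tgt` sends to `bin/wordalign` on standard input.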

For greater speed, you can pass --no-viterbi to skip the final Viterbi search for the best alignment and output the last Gibbs sample instead.

Train and store the model:

paste corpus.src corpus.tgt | bin/wordalign -m --store-model-file corpus.model > corpus.alignment

Continue training with an existing model

In this mode, the existing model will be loaded and used as a starting point for training. Note that data size matters: if the old data are very small compared to the new corpus, statistics collected from the new data will outweigh the existing model, and vice versa.

paste newcorpus.src newcorpus.tgt | bin/wordalign -m --load-model-file oldcorpus.model > newcorpus.alignment
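
Because the balance between old and new data affects the result, it may help to compare the two corpus sizes before continuing training. The sketch below uses line counts as a crude proxy for data size, which is an assumption: it says nothing about how GAlign weighs its statistics internally. The tiny files it creates are hypothetical stand-ins for real corpora.

```shell
#!/bin/sh
# Hypothetical stand-ins for the old and new source-side corpora.
printf 'a\nb\n' > oldcorpus.src
printf 'c\nd\ne\nf\ng\nh\n' > newcorpus.src

old=$(wc -l < oldcorpus.src | tr -d ' ')
new=$(wc -l < newcorpus.src | tr -d ' ')

# Integer ratio of new to old sentence counts (guard against an empty old corpus).
if [ "$old" -eq 0 ]; then
    echo "old corpus is empty" >&2
    ratio=0
else
    ratio=$(( new / old ))
fi
echo "old: $old lines, new: $new lines, new/old ratio: $ratio"
```

A large ratio suggests the new data will dominate the loaded model; a ratio near zero suggests the existing model will dominate.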

Forced alignment

This mode skips training and only searches for the best alignment given the existing model. Avoid --no-viterbi in this mode.

paste newcorpus.src newcorpus.tgt | bin/wordalign -f -m --load-model-file oldcorpus.model > newcorpus.alignment
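
Since -f and --no-viterbi should not be combined, a wrapper script could check its arguments before calling the aligner. The following is a minimal sketch; `check_flags` is a hypothetical helper and not part of GAlign, and it only inspects the two switches named in this section.

```shell
#!/bin/sh
# check_flags: hypothetical helper, not part of GAlign.
# Returns non-zero if both -f and --no-viterbi appear in the arguments.
check_flags() {
    forced=0
    no_viterbi=0
    for arg in "$@"; do
        case "$arg" in
            -f) forced=1 ;;
            --no-viterbi) no_viterbi=1 ;;
        esac
    done
    if [ "$forced" -eq 1 ] && [ "$no_viterbi" -eq 1 ]; then
        echo "warning: avoid --no-viterbi together with -f" >&2
        return 1
    fi
    return 0
}
```

A wrapper might then run `check_flags "$@" && paste newcorpus.src newcorpus.tgt | bin/wordalign "$@" > newcorpus.alignment`, so the aligner only starts when the flag combination passes the check.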

License

Author: Aleš Tamchyna <tamchyna -at- ufal.mff.cuni.cz>

Licensed under the GNU Lesser General Public License version 2.1.
