Currently implemented languages:
The complete pipeline is available for Arabic, German, English, French, Spanish and Turkish (no lemmatisation is done for Turkish). For Italian, Romanian and Dutch, the input is expected to be already lemmatised with TreeTagger.
- Download and install the BabelNet API and its dependencies
[API download](http://babelnet.org/data/4.0/BabelNet-API-4.0.1.zip)
unzip BabelNet-API-4.0.1.zip
mvn install:install-file -Dfile=lib/lcl-jlt-2.4.jar -DgroupId=it.uniroma1.lcl.jlt -DartifactId=lcl-jlt -Dversion=2.4 -Dpackaging=jar
mvn install:install-file -Dfile=lib/babelscape-data-commons-1.0.jar -DgroupId=com.babelscape -DartifactId=babelscape-data-commons -Dversion=1.0 -Dpackaging=jar
unzip -p babelnet-api-4.0.1.jar META-INF/maven/it.uniroma1.lcl.babelnet/babelnet-api/pom.xml | grep -vP '<(scope|systemPath)>' >babelnet-api-4.0.1.pom
(on macOS, consider using Homebrew's ggrep, since the stock grep does not support -P)
mvn install:install-file -Dfile=babelnet-api-4.0.1.jar -DpomFile=babelnet-api-4.0.1.pom
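To confirm that the three install:install-file calls above worked, one can check that the jars ended up in the local Maven repository. The following is only a sketch: `check_artifact` and `check_babelnet_install` are hypothetical helpers (not part of Maven or BabelNet), the repository is assumed to be in the default `~/.m2/repository` location, and the coordinates of babelnet-api are assumed from the pom path and jar name used above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if <artifactId>-<version>.jar exists in a Maven repository.
# Usage: check_artifact REPO GROUP_ID ARTIFACT_ID VERSION
check_artifact() {
    # Maven stores a groupId such as a.b.c under the directory a/b/c
    group_path=$(printf '%s' "$2" | tr '.' '/')
    jar="$1/$group_path/$3/$4/$3-$4.jar"
    if [ -f "$jar" ]; then
        echo "found: $jar"
    else
        echo "missing: $jar" >&2
        return 1
    fi
}

# Hypothetical helper: check the three artifacts installed in the steps above.
# Usage: check_babelnet_install [REPO]   (REPO defaults to ~/.m2/repository, an assumption)
check_babelnet_install() {
    repo=${1:-$HOME/.m2/repository}
    status=0
    check_artifact "$repo" it.uniroma1.lcl.jlt lcl-jlt 2.4 || status=1
    check_artifact "$repo" com.babelscape babelscape-data-commons 1.0 || status=1
    # Coordinates assumed from META-INF/maven/it.uniroma1.lcl.babelnet/babelnet-api/pom.xml
    check_artifact "$repo" it.uniroma1.lcl.babelnet babelnet-api 4.0.1 || status=1
    return $status
}

# Example call (uncomment to run against your own repository):
# check_babelnet_install
```

If any line is reported as missing, rerunning the corresponding mvn install:install-file command is the first thing to try.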
- Download the BabelNet indices and make the API aware of them
[Indices download](http://babelnet.org/login)
tar xjvf babelnet-4.0.1-index.tar.bz2
- In ./BabelNet-API-4.0.1/config/babelnet.var.properties include the path to the index:
babelnet.dir=/home/usr/BabelNet-4.0.1
- In ./BabelNet-API-4.0.1/config/jlt.var.properties include the path to WordNet:
jlt.wordnetPrefix=/usr/local/share/wordnet
- Move the ./BabelNet-API-4.0.1/config folder to your ${basedir}
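A wrong path in these two properties files is easy to miss until the API is first used, so a quick check before building may help. The sketch below assumes simple `key=value` lines; `get_property` and `check_config_dir` are hypothetical helpers, not part of the BabelNet API. Since jlt.wordnetPrefix is named a prefix, the sketch only checks that it is set and does not assume it is itself an existing directory.

```shell
#!/bin/sh
# Hypothetical helper: print the value of KEY from a Java-style .properties file.
# Handles only simple "key=value" lines (no continuation lines or escapes).
# Usage: get_property FILE KEY
get_property() {
    # Escape dots so that e.g. babelnet.dir is matched literally
    key_re=$(printf '%s' "$2" | sed 's/\./\\./g')
    sed -n "s/^[[:space:]]*${key_re}[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}

# Hypothetical helper: check the two properties edited in the steps above.
# Usage: check_config_dir CONFIG_DIR
check_config_dir() {
    index_dir=$(get_property "$1/babelnet.var.properties" babelnet.dir)
    wn_prefix=$(get_property "$1/jlt.var.properties" jlt.wordnetPrefix)
    status=0
    if [ ! -d "$index_dir" ]; then
        echo "babelnet.dir is not an existing directory: '$index_dir'" >&2
        status=1
    fi
    if [ -z "$wn_prefix" ]; then
        echo "jlt.wordnetPrefix is not set" >&2
        status=1
    fi
    return $status
}

# Example call (uncomment and adapt the path to your ${basedir}):
# check_config_dir ./config
```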
If you need to annotate Arabic corpora:
- Download and install the MADAMIRA jar
License for downloading
mvn install:install-file -Dfile={$PATH}/MADAMIRA-release-20160516-2.1/MADAMIRA-release-20160516-2.1.jar -DgroupId=edu.columbia.ccls.madamira -DartifactId=MADAMIRA-release -Dversion=20160516-2.1 -Dpackaging=jar
Finally:
- Download and install this repository
git clone https://github.com/cristinae/BabelWE.git
cd BabelWE
mvn clean dependency:copy-dependencies package
- Download the IXA pipes for tokenisation and lemmatisation of English, Spanish, French and German. They are used as external executables, so no installation is needed.
Download page
Include their path in the configuration file babelWE.ini
- Use the Moses tokeniser included in the ./scripts folder
Include its path in the configuration file babelWE.ini
### External software and models
# IXA pipe
ixaTok=/fullPath/ixa/ixa-pipe-tok-1.8.5-exec.jar
ixaLem=/fullPath/ixa/ixa-pipe-pos-1.5.1-exec.jar
posEs=/fullPath/ixa/morph-models-1.5.0/es/es-pos-perceptron-autodict01-ancora-2.0.bin
lemEs=/fullPath/ixa/morph-models-1.5.0/es/es-lemma-perceptron-ancora-2.0.bin
posEn=/fullPath/ixa/morph-models-1.5.0/en/en-pos-perceptron-autodict01-conll09.bin
lemEn=/fullPath/ixa/morph-models-1.5.0/en/en-lemma-perceptron-conll09.bin
# Moses
mosesTok=/fullPath/moses/tokenizerNO2html.perl
- For Italian, Romanian and Dutch we expect input to be already lemmatised using TreeTagger, but the lemmatisation pipeline with TreeTagger is not included yet.
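Since the babelWE.ini entries above are all absolute paths to external jars, models and scripts, a small check can catch typos before a run. This is a minimal sketch, assuming it is applied to a file (or an excerpt of babelWE.ini) that contains only path entries like the ones shown; any other kind of setting would be reported as not found. `missing_ini_paths` is a hypothetical helper and not part of BabelWE.

```shell
#!/bin/sh
# Hypothetical helper: report key=value entries whose value is not an existing path.
# Lines starting with '#' and lines without a value are skipped.
# Returns 1 if at least one path is missing, 0 otherwise.
# Usage: missing_ini_paths INI_FILE
missing_ini_paths() {
    missing=0
    # The "|| [ -n "$key" ]" part also handles a last line without a trailing newline
    while IFS='=' read -r key value || [ -n "$key" ]; do
        case $key in
            ''|'#'*) continue ;;
        esac
        [ -n "$value" ] || continue
        if [ ! -e "$value" ]; then
            echo "$key -> $value (not found)"
            missing=1
        fi
    done < "$1"
    return $missing
}

# Example call (uncomment to run):
# missing_ini_paths babelWE.ini
```

With the placeholder values shown above (/fullPath/...), every entry would be listed, which is a reminder to replace them with real locations.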