BabelWE

Collection of utilities to work with BabelNet synsets

Currently implemented languages:

A complete pipeline is available for Arabic, German, English, French, Spanish and Turkish (no lemmatisation is done for Turkish). For Italian, Romanian and Dutch, the input is expected to have been lemmatised beforehand with TreeTagger.
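The support levels above can be summarised in a small lookup, which may help when deciding how a corpus has to be prepared. This is a hypothetical helper, not part of BabelWE: the class name, the enum values and the use of ISO 639-1 codes are assumptions made for illustration. It needs Java 9 or later (`Map.of`).

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not part of BabelWE): encodes the language support
// described in this README, keyed by ISO 639-1 language code (an assumption).
public class LanguageSupport {

    public enum Support {
        FULL_PIPELINE,    // complete pipeline available
        NO_LEMMATISATION, // pipeline available, but no lemmatisation is done
        PRE_LEMMATISED,   // input must already be lemmatised with TreeTagger
        UNSUPPORTED       // not covered by this README
    }

    private static final Map<String, Support> SUPPORT = Map.of(
            "ar", Support.FULL_PIPELINE,
            "de", Support.FULL_PIPELINE,
            "en", Support.FULL_PIPELINE,
            "fr", Support.FULL_PIPELINE,
            "es", Support.FULL_PIPELINE,
            "tr", Support.NO_LEMMATISATION,
            "it", Support.PRE_LEMMATISED,
            "ro", Support.PRE_LEMMATISED,
            "nl", Support.PRE_LEMMATISED);

    // Case-insensitive lookup; unknown codes map to UNSUPPORTED.
    public static Support supportFor(String langCode) {
        return SUPPORT.getOrDefault(langCode.toLowerCase(Locale.ROOT), Support.UNSUPPORTED);
    }

    public static void main(String[] args) {
        for (String lang : args) {
            System.out.println(lang + ": " + supportFor(lang));
        }
    }
}
```

For example, `java LanguageSupport en tr nl` prints `en: FULL_PIPELINE`, `tr: NO_LEMMATISATION` and `nl: PRE_LEMMATISED`, one per line.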

Set-up and installation

Download and install the BabelNet API and its dependencies
API download: http://babelnet.org/data/4.0/BabelNet-API-4.0.1.zip
unzip BabelNet-API-4.0.1.zip
mvn install:install-file -Dfile=lib/lcl-jlt-2.4.jar -DgroupId=it.uniroma1.lcl.jlt -DartifactId=lcl-jlt -Dversion=2.4 -Dpackaging=jar
mvn install:install-file -Dfile=lib/babelscape-data-commons-1.0.jar -DgroupId=com.babelscape -DartifactId=babelscape-data-commons -Dversion=1.0 -Dpackaging=jar
unzip -p babelnet-api-4.0.1.jar META-INF/maven/it.uniroma1.lcl.babelnet/babelnet-api/pom.xml | grep -vP '<(scope|systemPath)>' >babelnet-api-4.0.1.pom
(on macOS, consider using Homebrew's ggrep, since the -P option used above is a GNU grep extension)
mvn install:install-file -Dfile=babelnet-api-4.0.1.jar -DpomFile=babelnet-api-4.0.1.pom
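If GNU grep is not available, the pom extraction step can also be done in plain Java. The sketch below is a hypothetical stand-in for the `unzip -p ... | grep -vP '<(scope|systemPath)>'` pipeline: it reads the same entry from the jar and drops every line containing an opening `<scope>` or `<systemPath>` tag, exactly as the grep pattern does. The class name and its command-line arguments are assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical portable replacement for:
//   unzip -p babelnet-api-4.0.1.jar <pom entry> | grep -vP '<(scope|systemPath)>'
public class ExtractPom {

    // Same regular expression as the grep command above.
    private static final Pattern DROP = Pattern.compile("<(scope|systemPath)>");

    private static final String POM_ENTRY =
            "META-INF/maven/it.uniroma1.lcl.babelnet/babelnet-api/pom.xml";

    // Keeps only the lines that do NOT match the pattern (grep -v semantics).
    public static List<String> filterPom(List<String> lines) {
        return lines.stream()
                .filter(line -> !DROP.matcher(line).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        String jar = args.length > 0 ? args[0] : "babelnet-api-4.0.1.jar";
        String out = args.length > 1 ? args[1] : "babelnet-api-4.0.1.pom";
        try (ZipFile zip = new ZipFile(jar)) {
            ZipEntry entry = zip.getEntry(POM_ENTRY);
            if (entry == null) {
                throw new IOException(POM_ENTRY + " not found in " + jar);
            }
            List<String> lines;
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(zip.getInputStream(entry), StandardCharsets.UTF_8))) {
                lines = reader.lines().collect(Collectors.toList());
            }
            // Write the filtered pom next to the jar, one line per entry.
            Files.write(Paths.get(out), filterPom(lines), StandardCharsets.UTF_8);
        }
    }
}
```

Run it from the directory containing the jar (`javac ExtractPom.java && java ExtractPom`), then continue with the `mvn install:install-file -Dfile=babelnet-api-4.0.1.jar -DpomFile=babelnet-api-4.0.1.pom` step.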
Download BabelNet indices and make the API aware of them
Indices download: http://babelnet.org/login
tar xjvf babelnet-4.0.1-index.tar.bz2

In ./BabelNet-API-4.0.1/config/babelnet.var.properties, set the path to the index:
babelnet.dir=/home/usr/BabelNet-4.0.1
In ./BabelNet-API-4.0.1/config/jlt.var.properties, set the path to WordNet:
jlt.wordnetPrefix=/usr/local/share/wordnet
Move the ./BabelNet-API-4.0.1/config folder to your ${basedir} (the directory containing BabelWE's pom.xml).
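Before running anything, it can be worth checking that the two configured paths actually exist. The sketch below is a hypothetical check, not part of BabelWE or the BabelNet API; it assumes it is run from ${basedir} with the copied config folder in the working directory, and that both files use standard Java properties syntax (`key=value`).

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical sanity check (not part of BabelWE or the BabelNet API).
// Assumes the config/ folder has already been moved to ${basedir}
// and that this is run from there.
public class ConfigCheck {

    // Returns one message per key that is unset or points to a missing path.
    public static List<String> problems(Properties props, String... keys) {
        List<String> problems = new ArrayList<>();
        for (String key : keys) {
            String value = props.getProperty(key, "").trim();
            if (value.isEmpty()) {
                problems.add(key + " is not set");
            } else if (!Files.exists(Paths.get(value))) {
                problems.add(key + " points to a missing path: " + value);
            }
        }
        return problems;
    }

    static Properties load(Path file) throws IOException {
        Properties props = new Properties();
        try (Reader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            props.load(reader);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        List<String> all = new ArrayList<>();
        all.addAll(problems(load(Paths.get("config", "babelnet.var.properties")), "babelnet.dir"));
        all.addAll(problems(load(Paths.get("config", "jlt.var.properties")), "jlt.wordnetPrefix"));
        all.forEach(System.err::println);
        System.exit(all.isEmpty() ? 0 : 1);
    }
}
```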

If you need to annotate Arabic corpora:

Download the MADAMIRA jar (a license is required for downloading) and install it in your local Maven repository, replacing {$PATH} with the directory where MADAMIRA was unpacked:
mvn install:install-file -Dfile={$PATH}/MADAMIRA-release-20160516-2.1/MADAMIRA-release-20160516-2.1.jar -DgroupId=edu.columbia.ccls.madamira -DartifactId=MADAMIRA-release -Dversion=20160516-2.1 -Dpackaging=jar

Finally:

Download and build this repository:
git clone https://github.com/cristinae/BabelWE.git
cd BabelWE
mvn clean dependency:copy-dependencies package

External resources

Download the IXA pipes for tokenisation and lemmatisation of English, Spanish, French and German from the IXA pipes download page. They are used as external executables, so no installation is needed.
Include their paths in the configuration file babelWE.ini.
Use the Moses tokeniser included in the ./scripts folder.
Include its path in the configuration file babelWE.ini. The corresponding entries look like this:

### External software and models
# IXA pipe
ixaTok=/fullPath/ixa/ixa-pipe-tok-1.8.5-exec.jar
ixaLem=/fullPath/ixa/ixa-pipe-pos-1.5.1-exec.jar
posEs=/fullPath/ixa/morph-models-1.5.0/es/es-pos-perceptron-autodict01-ancora-2.0.bin
lemEs=/fullPath/ixa/morph-models-1.5.0/es/es-lemma-perceptron-ancora-2.0.bin
posEn=/fullPath/ixa/morph-models-1.5.0/en/en-pos-perceptron-autodict01-conll09.bin
lemEn=/fullPath/ixa/morph-models-1.5.0/en/en-lemma-perceptron-conll09.bin

# Moses
mosesTok=/fullPath/moses/tokenizerNO2html.perl
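Since the external tools are only invoked at run time, a misspelled path in babelWE.ini is easy to miss. The following hypothetical check (not part of BabelWE) loads the file with `java.util.Properties`, which also treats `#` lines as comments, and reports which of the keys shown above are unset or do not point to an existing file. Two assumptions: the rest of babelWE.ini also follows this `key=value` syntax, and only the keys listed in the excerpt need checking (the real file may contain more).

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper (not part of BabelWE): verifies the tool and model
// paths from the babelWE.ini excerpt in this README.
public class IniCheck {

    // Keys taken from the README excerpt; the real file may define more.
    static final String[] PATH_KEYS = {
        "ixaTok", "ixaLem", "posEs", "lemEs", "posEn", "lemEn", "mosesTok"
    };

    // Returns the keys that are unset or do not point to a regular file.
    public static List<String> missingFiles(Properties ini) {
        List<String> missing = new ArrayList<>();
        for (String key : PATH_KEYS) {
            String value = ini.getProperty(key, "").trim();
            if (value.isEmpty() || !Files.isRegularFile(Paths.get(value))) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        String iniPath = args.length > 0 ? args[0] : "babelWE.ini";
        Properties ini = new Properties();
        try (Reader reader = Files.newBufferedReader(Paths.get(iniPath), StandardCharsets.UTF_8)) {
            ini.load(reader); // '#' lines are skipped as comments
        }
        for (String key : missingFiles(ini)) {
            System.err.println(iniPath + ": " + key + " is unset or not a file: "
                    + ini.getProperty(key));
        }
    }
}
```

Note that `Properties` treats backslashes as escape characters, so paths in this sketch are assumed to use forward slashes.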

For Italian, Romanian and Dutch, the input is expected to be lemmatised beforehand with TreeTagger; a TreeTagger lemmatisation pipeline is not included yet.
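TreeTagger writes one token per line as `token<TAB>tag<TAB>lemma`, printing `<unknown>` when it has no lemma. The sketch below turns that output into a line of lemmas, falling back to the surface token for unknown words. This is a hypothetical pre-processing helper: the input layout BabelWE actually expects for lemmatised Italian, Romanian or Dutch text is not described here, so the output format (lemmas separated by single spaces) is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.StringJoiner;
import java.util.stream.Collectors;

// Hypothetical helper (not part of BabelWE): converts TreeTagger output
// (token TAB tag TAB lemma) into space-separated lemmas. The output layout
// is an assumption about what BabelWE expects.
public class TreeTaggerLemmas {

    public static String lemmatise(List<String> taggerLines) {
        StringJoiner lemmas = new StringJoiner(" ");
        for (String line : taggerLines) {
            String[] cols = line.split("\t");
            if (cols.length < 3) {
                continue; // skip blank or malformed lines
            }
            String token = cols[0].trim();
            String lemma = cols[2].trim();
            // TreeTagger prints <unknown> when it has no lemma: keep the token.
            lemmas.add("<unknown>".equals(lemma) ? token : lemma);
        }
        return lemmas.toString();
    }

    public static void main(String[] args) throws IOException {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8))) {
            System.out.println(lemmatise(in.lines().collect(Collectors.toList())));
        }
    }
}
```

It reads TreeTagger output on standard input, so it can sit at the end of a pipe after the tagger.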
