
Relation Classification with natural language processing, using FCNNs, CNNs or LSTMs.


Berndinio/FormaleSemantik


FormaleSemantik

Task

Relation Classification

Structure

Every folder must contain an `__init__.py`.

Start programs from the root folder with

python -m <filename without .py>

For example, to run the `if __name__ == '__main__':` part of preprocessing.py, type:

python -m preprocessing

To run the `if __name__ == '__main__':` part of model/preprocessing.py, type:

python -m model.preprocessing

Team

Jan Sieber (3219317), Robin Ruland (3230684), Johannes Daub (3145320)

Environment

Python 3.5

Dataset

FewRel Dataset

Word2Vec

Our own trained models are large (about 6 GB), so they are not included in the repository. We can provide the pretrained models on an HDD on request, and we will try to upload them to the CL Pool account.

A pretrained Word2Vec model from Google can be downloaded here.

Additionally, we trained our own word2vec models on Wikidata with this repo, because the pretrained model above was missing many words. The repo is included in the word2vec folder of our project.

The Stanford CoreNLP server is needed to use this trainer. Start it with

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000

After the server has started, you can run the run.sh from the repo. See the repo description here. You may need to run `make` in the repo first. Finally, copy word2vec/word2vec/results into the models/ folder.
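As a rough orientation (this snippet is not part of the repository), a client can talk to the running server over its standard HTTP interface, passing the desired annotators in a JSON-encoded `properties` query parameter:

```python
import json
from urllib.parse import urlencode

# Sketch of a request to the CoreNLP server started above. The annotator
# list is an example; the trainer needs the dependency parser ("depparse").
props = {"annotators": "tokenize,ssplit,depparse", "outputFormat": "json"}
url = "http://localhost:9000/?" + urlencode({"properties": json.dumps(props)})

# Sending a sentence would then be e.g.:
#   urllib.request.urlopen(url, data="The quick brown fox.".encode()).read()
# (left commented out so the sketch runs without a live server)
```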

Recurrent models

Papers read for the RNN models:

  1. Classifying Relations via Long Short Term Memory Networks along Shortest Dependency Paths

  2. Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification (main paper)

  3. Relation Classification via Recurrent Neural Network

  4. Improved Relation Classification by Deep Recurrent Neural Networks with Data Augmentation

To train the recurrent models tested, follow these steps:

  • Download the dataset (train and validation) and save it to "data/fewrel_train.json", "data/fewrel_val.json"

  • Execute (WARNING: this takes some time and needs ~30 GB of disk space; 16 GB of RAM is recommended):

python -m preprocessing -generate 1 -amount 999999 -prefix min15 -w2v min15
python -m preprocessing -generate 1 -amount 999999 -prefix min5 -w2v min5
python -m preprocessing -generate 1 -amount 999999 -prefix min2 -w2v min2
python -m preprocessing -generate 1 -amount 7000 -prefix min5-small -w2v min5
python -m preprocessing -generate 1 -amount 7000 -prefix min2-small -w2v min2

to create the datasets. The "-small" datasets have only 10 relations.
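The preprocessing steps above read the raw FewRel JSON. As a rough orientation, the file maps each Wikidata relation id to a list of instances; the `iter_instances` helper below is our own illustrative name, not part of the repository, and the inline sample mimics the real layout so the sketch runs without the downloaded files:

```python
import json

# The FewRel file is a dict: relation id (e.g. "P931") -> list of instances.
# Each instance holds the token list plus head ("h") and tail ("t") entity
# annotations: [surface form, Wikidata id, [[token indices]]].
sample = json.loads("""
{
  "P931": [
    {
      "tokens": ["Merpati", "flight", "106", "departed", "Jakarta", "."],
      "h": ["merpati flight 106", "Q1921664", [[0, 1, 2]]],
      "t": ["jakarta", "Q3630", [[4]]]
    }
  ]
}
""")

def iter_instances(data):
    """Yield (relation_id, tokens, head_positions, tail_positions) tuples."""
    for relation, instances in data.items():
        for inst in instances:
            head_positions = inst["h"][2][0]   # token indices of the head entity
            tail_positions = inst["t"][2][0]   # token indices of the tail entity
            yield relation, inst["tokens"], head_positions, tail_positions

rows = list(iter_instances(sample))
```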

  • Execute (WARNING: this needs ~5 GB of disk space; if no -gpu argument is given, the CPU is used):
python -m Trainers.RNNTrain -prefix min15 -network NDNN -gpu <GPU Index>
python -m Trainers.RNNTrain -prefix min15 -network ND -gpu <GPU Index>
python -m Trainers.RNNTrain -prefix min15 -network NN -gpu <GPU Index>
python -m Trainers.RNNTrain -prefix min5-small -network NDNN -gpu <GPU Index>
python -m Trainers.RNNTrain -prefix min5 -network NDNN -gpu <GPU Index>
python -m Trainers.RNNTrain -prefix min2-small -network NDNN -gpu <GPU Index>
python -m Trainers.RNNTrain -prefix min2 -network NDNN -gpu <GPU Index>

to train the models with different architectures on different datasets.

  • Execute
python -m plotRNNs

to generate the classification accuracy plots.

Convolutional models

Papers read for the CNN models:

  1. Semantic Relation Classification via Convolutional Neural Networks with Simple Negative Sampling

  2. Relation Classification via Convolutional Deep Neural Network (main paper)

  3. Combining Recurrent and Convolutional Neural Networks for Relation Classification

To train the convolutional models, follow these steps:

  • Download the dataset: "python -m preprocessing -generate 1 -amount 56000 -prefix dummy" (or lower the -amount value to train on only part of the dataset)

  • Train, validate, and test the CNNs:

python CNN.py

Parameters can be changed directly in the CNN.py file. By default, this trains the model and saves loss and accuracy plots for both training and validation for every epoch.

Default:

  • CNN with input: subject and object embeddings

In this project we also trained a CNN with input: subject, object, and shortest-dependency-path embeddings.
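To illustrate what such an embedding input looks like (this is a sketch, not repo code), the tokens are mapped through the trained word2vec model into a fixed-size matrix; `w2v`, `DIM`, and `MAX_LEN` below are toy stand-ins for the real model and its parameters:

```python
# Turning tokens into a fixed-size embedding matrix for a CNN.
# Unknown words fall back to a zero vector, and sentences are
# padded/truncated to MAX_LEN so every input has the same shape.
DIM = 4        # real word2vec models use e.g. 300 dimensions
MAX_LEN = 6

w2v = {        # toy stand-in for the trained word2vec model
    "jakarta": [0.1] * DIM,
    "flight": [0.2] * DIM,
}

def embed(tokens):
    zero = [0.0] * DIM
    vectors = [w2v.get(tok.lower(), zero) for tok in tokens[:MAX_LEN]]
    vectors += [zero] * (MAX_LEN - len(vectors))   # pad to MAX_LEN
    return vectors

matrix = embed(["Flight", "to", "Jakarta"])
```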

FC models

  • Execute
python -m FC

to train the model. The default setup trains with only two relations.

The model consists of six fully connected layers followed by a softmax layer. The weights are updated with stochastic gradient descent (lr = 1) using negative log likelihood loss.

As input it uses the same features as the recurrent models.
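The loss setup described above can be sketched in plain Python (this is an illustrative toy, not the repository's code, and uses a single linear "layer" over two relations instead of the six fully connected layers):

```python
import math

def softmax(logits):
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def nll_loss(probs, target):
    # negative log likelihood of the true relation
    return -math.log(probs[target])

logits = [2.0, 0.0]           # scores for the two relations
target = 1                    # index of the true relation
probs = softmax(logits)
loss = nll_loss(probs, target)

# The gradient of NLL w.r.t. the logits is (softmax - one_hot);
# SGD with learning rate 1 subtracts it directly.
grad = [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]
logits = [x - g for x, g in zip(logits, grad)]   # one SGD step, lr = 1

new_loss = nll_loss(softmax(logits), target)     # loss decreases
```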
