In the second assignment of IN5550, we build a dependency parser within the Universal Dependencies framework; the dataset is in CoNLLU format. Using the precode converter, we convert the CoNLLU files to a JSONL format that serves as the gold standard. The assignment specifies training only on a Norwegian training set. For evaluation we use part-of-speech (POS) accuracy, unlabeled attachment score (UAS, the fraction of tokens assigned the correct head) and labeled attachment score (LAS, which additionally requires the correct dependency relation).
Created by Kjetil K. Indrehus and Caroline K. Vannebo
- Train the model with FOX (an example invocation with illustrative values is shown after the notes below):

```bash
sbatch ./slurm_scripts/ndp.slurm \
    --id <EXPERIMENT_ID> \
    --model_name <PRETRAINED_TRANSFORMER_MODEL_NAME> \
    --train_conllu <PATH_TO_TRAIN_CONLLU> \
    --dev_conllu <PATH_TO_DEV_CONLLU> \
    --seed <RANDOM_SEED> \
    --cache_dir <PATH_TO_CACHE_FOLDER> \
    --batch_size <BATCH_SIZE> \
    --epochs <EPOCHS> \
    --lr <LEARNING_RATE> \
    --patience <PATIENCE> \
    --grad_clip <GRADIENT_CLIPPING> \
    --weight <WEIGHT_DECAY> \
    --lrs <CosineAnneal|StepDecay> \
    --heads <NUMBER_OF_HEADS> \
    --mha <True|False> \
    --optimizer <Adam|AdamW|AdaGrad|SGD> \
    --step_size <STEP_SIZE_FOR_LRS>
```
Note that:
- Experiment ID is the name of the `.pt` file, and is used to keep track of which model is stored
- `model_name` is the name of the model on Hugging Face
- We use a development set for early stopping, where `patience` is the maximum number of consecutive epochs with worsening development loss
- Cache directory must be a directory where you have space to store the pretrained transformer models
- Check `train.py` for default values
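For example, a training run might be submitted as follows. The experiment ID, seed, patience, gradient-clipping value, step size, and all file paths are illustrative placeholders; the model name and remaining hyperparameters mirror the batch-32 configuration of ltg/norbert3-base from the results tables below:

```bash
# Illustrative example only: the experiment ID, paths, seed, patience,
# grad_clip and step_size are placeholders -- adjust them to your setup.
sbatch ./slurm_scripts/ndp.slurm \
    --id norbert3-base-mha \
    --model_name ltg/norbert3-base \
    --train_conllu data/no_bokmaal-ud-train.conllu \
    --dev_conllu data/no_bokmaal-ud-dev.conllu \
    --seed 42 \
    --cache_dir cache/ \
    --batch_size 32 \
    --epochs 20 \
    --lr 3e-5 \
    --patience 5 \
    --grad_clip 1.0 \
    --weight 0.01 \
    --lrs CosineAnneal \
    --heads 8 \
    --mha True \
    --optimizer AdamW \
    --step_size 5
```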
- Predict on the development set on FOX:
```bash
sbatch ./slurm_scripts/predict.slurm \
    --input_path <PATH_TO_CONLLU_FILE> \
    --output_path <OUTPUT_JSONL_PATH> \
    --model <TRAINED_MODEL_PATH>
```
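For instance, predicting on the development split with the model trained above could look like this (all paths are illustrative):

```bash
# Illustrative example only: paths are placeholders for your own files.
sbatch ./slurm_scripts/predict.slurm \
    --input_path data/no_bokmaal-ud-dev.conllu \
    --output_path predictions/norbert3-base-mha-dev.jsonl \
    --model models/norbert3-base-mha.pt
```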
- Then calculate POS accuracy, UAS and LAS with the `metric.py` script:

```bash
python metric.py --gold_path <PATH_TO_DEV_JSONL> --prediction_path <PREDICTION_PATH>
```
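Continuing the illustrative example, with the gold development set converted to JSONL by the precode converter (file names are placeholders):

```bash
# Illustrative example only: compare the gold dev JSONL against the predictions above.
python metric.py \
    --gold_path data/no_bokmaal-ud-dev.jsonl \
    --prediction_path predictions/norbert3-base-mha-dev.jsonl
```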
To measure inference time for a given model, use:

```bash
sbatch ./slurm_scripts/bench.slurm --model <MODEL_FILE>
```
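For example (the model file name is illustrative):

```bash
# Illustrative example only: benchmark the checkpoint trained above.
sbatch ./slurm_scripts/bench.slurm --model models/norbert3-base-mha.pt
```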
To download the CoNLLU files used for cross-lingual transfer evaluation, use:

```bash
python src/scripts/download_ud.py --folder <FOLDER>
```

where `<FOLDER>` is the destination folder in which the `.conllu` and `.json` files are placed.
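For example (the folder name is illustrative):

```bash
# Illustrative example only: download the UD treebanks into a local data folder.
python src/scripts/download_ud.py --folder data/ud
```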
Norwegian validation results:
Model | Batch | Lr. | Heads | Early Stop | POS Acc. | UAS | LAS |
---|---|---|---|---|---|---|---|
ltg/norbert3-xs | 32 | 3e-5 | 8 | 19 | 95.92% | 92.53% | 85.35% |
ltg/norbert3-small | 32 | 3e-5 | 8 | 19 | 97.16% | 93.84% | 88.69% |
**ltg/norbert3-base** | **32** | **3e-5** | **8** | **20** | **97.93%** | **95.60%** | **91.77%** |
ltg/norbert3-large | 32 | 3e-5 | 8 | 6 | 97.98% | 93.53% | 89.65% |
ltg/norbert3-xs | 16 | 2e-5 | 4 | 16 | 95.93% | 91.83% | 84.56% |
ltg/norbert3-small | 16 | 2e-5 | 4 | 15 | 97.13% | 85.61% | 80.89% |
ltg/norbert3-base | 16 | 2e-5 | 4 | 20 | 97.91% | 95.50% | 91.56% |
ltg/norbert3-large | 16 | 2e-5 | 4 | 9 | 98.04% | 95.30% | 91.61% |
Table: Comparison of different training configurations for the MHA model. Models were trained for a maximum of 20 epochs with early stopping based on validation loss, using the AdamW optimizer, a cosine annealing learning rate scheduler, and 0.01 weight decay. The best overall model is highlighted in bold.
Model | Batch | Lr. | Heads | Early Stop | POS Acc. | UAS | LAS |
---|---|---|---|---|---|---|---|
ltg/norbert3-large | 32 | 3e-5 | 16 | 12 | 98.22% | 95.79% | 92.38% |
ltg/norbert3-large | 16 | 2e-5 | 16 | 9 | 98.06% | 95.86% | 92.15% |