In the second assignment of IN5550, we build a dependency parser within the Universal Dependencies framework; the dataset is in CoNLLU format. Using the precode converter, we convert the CoNLLU files to a JSONL format that serves as the gold standard. The assignment specifies training only on a Norwegian training set. For evaluation we use part-of-speech (POS) accuracy, unlabeled attachment score (UAS, the fraction of tokens assigned the correct head) and labeled attachment score (LAS, which additionally requires the correct dependency relation).
Created by Kjetil K. Indrehus and Caroline K. Vannebo
- Train the model with FOX (an example invocation with illustrative values is shown after the notes below):

```bash
sbatch ./slurm_scripts/ndp.slurm \
    --id <EXPERIMENT_ID> \
    --model_name <PRETRAINED_TRANSFORMER_MODEL_NAME> \
    --train_conllu <PATH_TO_TRAIN_CONLLU> \
    --dev_conllu <PATH_TO_DEV_CONLLU> \
    --seed <RANDOM_SEED> \
    --cache_dir <PATH_TO_CACHE_FOLDER> \
    --batch_size <BATCH_SIZE> \
    --epochs <EPOCHS> \
    --lr <LEARNING_RATE> \
    --patience <PATIENCE> \
    --grad_clip <GRADIENT_CLIPPING> \
    --weight <WEIGHT_DECAY> \
    --lrs <CosineAnneal|StepDecay> \
    --heads <NUMBER_OF_HEADS> \
    --mha <True|False> \
    --optimizer <Adam|AdamW|AdaGrad|SGD> \
    --step_size <STEP_SIZE_FOR_LRS>
```
Note that:
- Experiment ID is the name of the `.pt` file, and is used to keep track of which model is stored
- `model_name` is the name of the model on Hugging Face
- We use a development set for early stopping, where `patience` is the maximum number of consecutive epochs with worsening development loss
- Cache directory must be a directory where you have space to store the pretrained transformer models
- Check `train.py` for default values
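For example, a training run might be submitted as follows. The experiment ID, seed, patience, gradient-clipping value, step size, and all file paths are illustrative placeholders; the model name and remaining hyperparameters mirror the batch-32 configuration of ltg/norbert3-base from the results tables below:

```bash
# Illustrative example only: the experiment ID, paths, seed, patience,
# grad_clip and step_size are placeholders -- adjust them to your setup.
sbatch ./slurm_scripts/ndp.slurm \
    --id norbert3-base-mha \
    --model_name ltg/norbert3-base \
    --train_conllu data/no_bokmaal-ud-train.conllu \
    --dev_conllu data/no_bokmaal-ud-dev.conllu \
    --seed 42 \
    --cache_dir cache/ \
    --batch_size 32 \
    --epochs 20 \
    --lr 3e-5 \
    --patience 5 \
    --grad_clip 1.0 \
    --weight 0.01 \
    --lrs CosineAnneal \
    --heads 8 \
    --mha True \
    --optimizer AdamW \
    --step_size 5
```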
- Predict on the development set on FOX:
```bash
sbatch ./slurm_scripts/predict.slurm \
    --input_path <PATH_TO_CONLLU_FILE> \
    --output_path <OUTPUT_JSONL_PATH> \
    --model <TRAINED_MODEL_PATH>
```
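For instance, predicting on the development split with the model trained above could look like this (all paths are illustrative):

```bash
# Illustrative example only: paths are placeholders for your own files.
sbatch ./slurm_scripts/predict.slurm \
    --input_path data/no_bokmaal-ud-dev.conllu \
    --output_path predictions/norbert3-base-mha-dev.jsonl \
    --model models/norbert3-base-mha.pt
```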
- Then calculate POS accuracy, UAS and LAS with the `metric.py` script:

```bash
python metric.py --gold_path <PATH_TO_DEV_JSONL> --prediction_path <PREDICTION_PATH>
```
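Continuing the illustrative example, with the gold development set converted to JSONL by the precode converter (file names are placeholders):

```bash
# Illustrative example only: compare the gold dev JSONL against the predictions above.
python metric.py \
    --gold_path data/no_bokmaal-ud-dev.jsonl \
    --prediction_path predictions/norbert3-base-mha-dev.jsonl
```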
To measure inference time for a given model, use:

```bash
sbatch ./slurm_scripts/bench.slurm --model <MODEL_FILE>
```
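For example (the model file name is illustrative):

```bash
# Illustrative example only: benchmark the checkpoint trained above.
sbatch ./slurm_scripts/bench.slurm --model models/norbert3-base-mha.pt
```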
To download the CoNLLU files used for cross-lingual transfer evaluation, use:

```bash
python src/scripts/download_ud.py --folder <FOLDER>
```

where `<FOLDER>` is the destination folder in which the `.conllu` and `.json` files are placed.
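For example (the folder name is illustrative):

```bash
# Illustrative example only: download the UD treebanks into a local data folder.
python src/scripts/download_ud.py --folder data/ud
```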
Norwegian validation results:
Model | Batch | Lr. | Heads | Early Stop | POS Acc. | UAS | LAS |
---|---|---|---|---|---|---|---|
ltg/norbert3-xs | 32 | 3e-5 | 8 | 19 | 95.92% | 92.53% | 85.35% |
ltg/norbert3-small | 32 | 3e-5 | 8 | 19 | 97.16% | 93.84% | 88.69% |
**ltg/norbert3-base** | **32** | **3e-5** | **8** | **20** | **97.93%** | **95.60%** | **91.77%** |
ltg/norbert3-large | 32 | 3e-5 | 8 | 6 | 97.98% | 93.53% | 89.65% |
ltg/norbert3-xs | 16 | 2e-5 | 4 | 16 | 95.93% | 91.83% | 84.56% |
ltg/norbert3-small | 16 | 2e-5 | 4 | 15 | 97.13% | 85.61% | 80.89% |
ltg/norbert3-base | 16 | 2e-5 | 4 | 20 | 97.91% | 95.50% | 91.56% |
ltg/norbert3-large | 16 | 2e-5 | 4 | 9 | 98.04% | 95.30% | 91.61% |
Table: Comparison of different training configurations for the MHA model. Models were trained for a maximum of 20 epochs with early stopping based on validation loss, using the AdamW optimizer, a cosine annealing learning rate scheduler, and 0.01 weight decay. The best overall model is highlighted in bold.
Model | Batch | Lr. | Heads | Early Stop | POS Acc. | UAS | LAS |
---|---|---|---|---|---|---|---|
ltg/norbert3-large | 32 | 3e-5 | 16 | 12 | 98.22% | 95.79% | 92.38% |
ltg/norbert3-large | 16 | 2e-5 | 16 | 9 | 98.06% | 95.86% | 92.15% |