How can I obtain the data to plot the learning curve from CLI output? #5639
-
I'm referring to this type of plot: https://scikit-learn.org/stable/modules/learning_curve.html or this: https://rstudio-conf-2020.github.io/dl-keras-tf/notebooks/learning-curve-diagnostics.nb.html. What I want to determine is where we are at with the training: whether the model could benefit from more training data, and whether it's underfitting or overfitting. Or is there another way you'd suggest to determine these things using what is available from the CLI train output?
-
A learning curve is basically a plot of the performance on a development set versus the amount of training data you used. You can simulate this by calling the training CLI several times, each time on a larger subset of your training data, and recording the dev accuracy of each run. You can notice overfitting when the training accuracy is still improving (training loss is decreasing), but at the same time the development accuracy is remaining stable or even dropping. If the development accuracy is still improving when you add all training examples, that signals that your model could benefit from even more annotated data.
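If it helps, here's a minimal sketch of that loop. It assumes spaCy v2's JSON training format (a top-level list of documents) and the v2 CLI signature `spacy train <lang> <output_dir> <train_path> <dev_path>`; the file names and the `en` language code are placeholders to adapt to your own setup:

```python
# Sketch: build the data points for a learning curve by training on
# increasing subsets of the corpus. Paths, file names, and the language
# code are assumptions -- adjust them for your own project.
import json
import random
import subprocess
from pathlib import Path

TRAIN_PATH = Path("train.json")   # full training corpus (placeholder name)
DEV_PATH = Path("dev.json")       # dev corpus (placeholder name)

docs = json.loads(TRAIN_PATH.read_text(encoding="utf8"))
random.seed(0)
random.shuffle(docs)  # shuffle once so every subset is a random sample

for fraction in (0.25, 0.5, 0.75, 1.0):
    n = int(len(docs) * fraction)
    subset_path = Path(f"train_{n}.json")
    subset_path.write_text(json.dumps(docs[:n]), encoding="utf8")
    # Each run writes its models to its own output directory; afterwards,
    # read the dev score of each run and plot it against n.
    subprocess.run(
        ["python", "-m", "spacy", "train", "en", f"model_{n}",
         str(subset_path), str(DEV_PATH)],
        check=True,
    )
```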
-
@svlandeg Thank you, I have some follow-up questions. I did the following, so I just wanted to verify I was using this correctly:
Then I plotted the results, and it looks something like this:
-
I think there are two aspects to your question. I focused on "if the model could benefit from more training data" - and for that I would advise plotting dev accuracies versus training sizes, as that helps you understand how much the model is still improving (on an independent dev set) as you add data.
In proper ML lingo, "learning curve" probably refers to the curve of training loss vs. dev accuracy over time. You want to stop learning (run no more epochs) when you start seeing overfitting.
So basically these are two different curves, determining two different hyperparameters: one for the size of the required training dataset, and one for the ideal number of epochs.
I use "dev set" throughout instead of "test set" because I believe you should have an independent dev dataset to determine these hyperparameters on. Once you've determined all these hyperparameters and trained your model with the best ones, you can then use yet another set of data - the actual test set - to measure how well your final model works / generalizes on truly unseen data. If you don't do this, you could be overfitting on the dev set.
And yes, you can use F-score instead of accuracy. Which one is most appropriate depends on the type of ML problem.
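To make the first curve concrete, here's a minimal matplotlib sketch; the numbers are made-up placeholders, just to show the shape of the plot:

```python
# Sketch: plot dev accuracy against training-set size. The values below
# are hypothetical placeholders -- substitute the sizes and dev scores
# you collected from your own training runs.
import matplotlib.pyplot as plt

train_sizes = [250, 500, 750, 1000]    # number of training examples per run
dev_scores = [0.71, 0.78, 0.81, 0.82]  # dev accuracy (or F-score) per run

plt.plot(train_sizes, dev_scores, marker="o")
plt.xlabel("Number of training examples")
plt.ylabel("Dev accuracy")
plt.title("Dev accuracy vs. training size")
plt.show()

# If the curve is still clearly rising at the right edge, more annotated
# data will likely help; if it has flattened, extra data buys little.
```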
Why don't you just run the `evaluate` CLI command on (a subset of) your training data if you really need a training F-score/accuracy? (See the sketch at the end of this reply.)
Anyway, I guess most of these topics are really general ML questions and not so much specific to spaCy. It might make more sense to post them on a different forum with a larger community. That would also help us keep this tracker focused on bug reports and specific feature requests.
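A minimal sketch of the `evaluate` idea, assuming spaCy v2; the model directory and data path are hypothetical placeholders:

```python
# Sketch: run spaCy's `evaluate` CLI on the training data to obtain a
# training F-score/accuracy that is directly comparable to the dev
# metrics. Point the paths at your own trained model and training file.
import subprocess

subprocess.run(
    ["python", "-m", "spacy", "evaluate", "model_best", "train.json"],
    check=True,
)
```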
-
@svlandeg Ok, thanks, your advice is appreciated. Sorry, I just want to clarify one last thing. You said: "You have an estimate on how well the training dataset is being fitted with the loss - you don't necessarily need the training F-score/accuracy." OK, that's true, but during training I don't see the loss for the dev set in the console output or in the JSON files.
-
True: by default the script prints the training loss and the dev F-score. So basically any "loss" metric is calculated on the training dataset, and the other metrics are calculated on the dev set. You can't directly compare the two with this script (unless you run additional evaluations like you suggested), but you may not need that. You can monitor the training loss (which needs to go down) vs. the dev F-score (which needs to go up).
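For example, here's a minimal sketch of that monitoring, assuming you've copied the per-epoch training loss and dev F-score from the console output into two lists (the values are made-up placeholders):

```python
# Sketch: monitor training loss (should go down) against dev F-score
# (should go up) across epochs. Replace the placeholder values with the
# numbers printed by the training CLI.
import matplotlib.pyplot as plt

epochs = list(range(1, 9))
train_loss = [9.1, 6.4, 4.8, 3.9, 3.2, 2.8, 2.5, 2.3]    # hypothetical
dev_f = [0.55, 0.68, 0.74, 0.78, 0.80, 0.81, 0.81, 0.80]  # hypothetical

fig, ax_loss = plt.subplots()
ax_loss.plot(epochs, train_loss, color="tab:red", marker="o")
ax_loss.set_xlabel("Epoch")
ax_loss.set_ylabel("Training loss")

ax_f = ax_loss.twinx()  # second y-axis sharing the same x-axis
ax_f.plot(epochs, dev_f, color="tab:blue", marker="s")
ax_f.set_ylabel("Dev F-score")

plt.title("Training loss vs. dev F-score per epoch")
plt.show()

# Once the dev F-score plateaus or drops while the loss keeps falling,
# you're overfitting: stop training around that epoch.
```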