Obtain the size of training data used to build a model #1066
adam-ra started this conversation in New Features & Project Ideas

It could be quite useful to be able to get the (accurate or estimated, whatever) size of the training data that has been used to train a tagging and parsing model (I guess the same holds for NER). This could for instance be available as nlp.tagger.model.examples_seen or something along those lines (perhaps a meta dict with more statistics, if available). This would be useful to guesstimate the number of examples needed to post-train a tagger (as in #1015). Making the post-training work as expected is obviously more complex than repeating the same few training examples FRACTION * ORIGINAL_CORPUS_SIZE times, but it's still better than hardcoding an out-of-the-blue absolute number.
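Purely as an illustration of what is meant, here is a minimal sketch of how such a counter could ride along with the model. examples_seen is a made-up meta key and the update loop is hand-rolled; spaCy does not expose anything like this out of the box:

```python
# Hedged sketch only: spaCy has no examples_seen attribute today.
# The idea is to keep a running counter in a hand-rolled training loop and
# stash it in nlp.meta, which is serialized into the model's meta.json.

def train_with_counter(nlp, optimizer, train_examples, n_iter=10):
    """Toy update loop that records how many examples the model was trained on.

    nlp is an already-initialized pipeline, optimizer is whatever
    nlp.initialize()/begin_training() returned, and train_examples is whatever
    this spaCy version's nlp.update() expects (Example objects in v3).
    """
    examples_seen = nlp.meta.get("examples_seen", 0)
    for _ in range(n_iter):
        losses = {}
        nlp.update(train_examples, sgd=optimizer, losses=losses)
        # Counting convention is a design choice; here every pass counts.
        examples_seen += len(train_examples)
    nlp.meta["examples_seen"] = examples_seen  # rides along in meta.json
    return nlp

# After nlp.to_disk(path) and spacy.load(path), the statistic is still there:
#     reloaded = spacy.load(path)
#     print(reloaded.meta.get("examples_seen"))
```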
Replies: 2 comments
- Hm. It's more about the relationship between the step size and the L2-norm of the weights, though. I agree that this sort of diagnostic would be easy and useful to dump from the training process into the meta. I'll keep this in mind, thanks.
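If that diagnostic ever does get dumped, it could look roughly like the sketch below. How the parameter arrays are collected from the model is left to the caller, and nlp.meta is simply used as a plain dict here:

```python
# Rough sketch of dumping step-size / weight-norm diagnostics into the meta.
# The caller is assumed to gather the model's parameter arrays itself;
# nlp.meta is just a dict that gets written out to meta.json.

def weight_l2_norm(weight_arrays):
    """L2 norm over all parameter arrays, treated as one flat vector."""
    total = sum(float((w ** 2).sum()) for w in weight_arrays)
    return total ** 0.5

def record_epoch_diagnostics(nlp, optimizer, epoch, weight_arrays):
    """Append per-epoch diagnostics to nlp.meta."""
    diagnostics = nlp.meta.setdefault("training_diagnostics", [])
    diagnostics.append({
        "epoch": epoch,
        # Step size, if the optimizer exposes one under this name.
        "learn_rate": getattr(optimizer, "learn_rate", None),
        "weight_l2_norm": weight_l2_norm(weight_arrays),
    })
```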
- This feature is available via Prodigy's train-curve recipe. For more information on how to get training estimates with spaCy, see discussion #5639.
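For context, the general idea behind a train-curve, sketched independently of Prodigy's actual recipe: train on growing fractions of the data and watch how the dev score moves. train_and_evaluate is a hypothetical callback standing in for the project's real training and evaluation code:

```python
# Train-curve sketch (not Prodigy's implementation): compare dev scores at
# increasing fractions of the training data to judge whether collecting more
# examples is likely to pay off.
import random

def train_curve(train_examples, dev_examples, train_and_evaluate,
                fractions=(0.25, 0.5, 0.75, 1.0), seed=0):
    train_examples = list(train_examples)
    random.Random(seed).shuffle(train_examples)
    results = []
    for fraction in fractions:
        subset = train_examples[: max(1, int(len(train_examples) * fraction))]
        score = train_and_evaluate(subset, dev_examples)
        results.append((fraction, len(subset), score))
        print(f"{fraction:>4.0%}  {len(subset):>6} examples  score={score:.3f}")
    return results
```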