Obtain the size of training data used to build a model #1066
adam-ra started this conversation in New Features & Project Ideas

It could be quite useful to be able to get the (accurate or estimated, whatever) size of the training data that has been used to train a tagging and parsing model (I guess the same holds for NER). This could for instance be available as nlp.tagger.model.examples_seen or something along those lines (perhaps a meta dict with more statistics, if available). This would be useful to guesstimate the number of examples needed to post-train a tagger (as in #1015). Making the post-training work as expected is obviously more complex than repeating the same few training examples FRACTION * ORIGINAL_CORPUS_SIZE times, but it's still better than hardcoding an out-of-the-blue absolute number.
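Purely as an illustration of what is meant, here is a minimal sketch of how such a counter could ride along with the model. examples_seen is a made-up meta key and the update loop is hand-rolled; spaCy does not expose anything like this out of the box:

```python
# Hedged sketch only: spaCy has no examples_seen attribute today.
# The idea is to keep a running counter in a hand-rolled training loop and
# stash it in nlp.meta, which is serialized into the model's meta.json.

def train_with_counter(nlp, optimizer, train_examples, n_iter=10):
    """Toy update loop that records how many examples the model was trained on.

    nlp is an already-initialized pipeline, optimizer is whatever
    nlp.initialize()/begin_training() returned, and train_examples is whatever
    this spaCy version's nlp.update() expects (Example objects in v3).
    """
    examples_seen = nlp.meta.get("examples_seen", 0)
    for _ in range(n_iter):
        losses = {}
        nlp.update(train_examples, sgd=optimizer, losses=losses)
        # Counting convention is a design choice; here every pass counts.
        examples_seen += len(train_examples)
    nlp.meta["examples_seen"] = examples_seen  # rides along in meta.json
    return nlp

# After nlp.to_disk(path) and spacy.load(path), the statistic is still there:
#     reloaded = spacy.load(path)
#     print(reloaded.meta.get("examples_seen"))
```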
Replies: 2 comments
- Hm. It's more about the relationship between the step size and the L2-norm of the weights, though. I agree that this sort of diagnostic would be easy and useful to dump from the training process into the meta. I'll keep this in mind, thanks.
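If that diagnostic ever does get dumped, it could look roughly like the sketch below. How the parameter arrays are collected from the model is left to the caller, and nlp.meta is simply used as a plain dict here:

```python
# Rough sketch of dumping step-size / weight-norm diagnostics into the meta.
# The caller is assumed to gather the model's parameter arrays itself;
# nlp.meta is just a dict that gets written out to meta.json.

def weight_l2_norm(weight_arrays):
    """L2 norm over all parameter arrays, treated as one flat vector."""
    total = sum(float((w ** 2).sum()) for w in weight_arrays)
    return total ** 0.5

def record_epoch_diagnostics(nlp, optimizer, epoch, weight_arrays):
    """Append per-epoch diagnostics to nlp.meta."""
    diagnostics = nlp.meta.setdefault("training_diagnostics", [])
    diagnostics.append({
        "epoch": epoch,
        # Step size, if the optimizer exposes one under this name.
        "learn_rate": getattr(optimizer, "learn_rate", None),
        "weight_l2_norm": weight_l2_norm(weight_arrays),
    })
```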
- This feature is available via Prodigy's train-curve recipe. For more information on how to get training estimates with spaCy, see discussion #5639.
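For context, the general idea behind a train-curve, sketched independently of Prodigy's actual recipe: train on growing fractions of the data and watch how the dev score moves. train_and_evaluate is a hypothetical callback standing in for the project's real training and evaluation code:

```python
# Train-curve sketch (not Prodigy's implementation): compare dev scores at
# increasing fractions of the training data to judge whether collecting more
# examples is likely to pay off.
import random

def train_curve(train_examples, dev_examples, train_and_evaluate,
                fractions=(0.25, 0.5, 0.75, 1.0), seed=0):
    train_examples = list(train_examples)
    random.Random(seed).shuffle(train_examples)
    results = []
    for fraction in fractions:
        subset = train_examples[: max(1, int(len(train_examples) * fraction))]
        score = train_and_evaluate(subset, dev_examples)
        results.append((fraction, len(subset), score))
        print(f"{fraction:>4.0%}  {len(subset):>6} examples  score={score:.3f}")
    return results
```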