Memory efficient Example objects #7806
An Example is basically just two Doc objects with various convenience methods. The output you get when printing it is the result of calling

If you're not using Transformers or other user attributes, the size of a Doc should be pretty constant with regard to its length, since most fields are pointers to the string store in Cython. So I'm not sure there's much you can do about this.

One limitation of training at the moment is that we assume you can load the whole training set in memory, though we are working on supporting streaming corpora. The PR for that (#7208) has already been merged and should be in the next release.

Also, as a note, rather than disabling everything in a pipeline you can just use
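To illustrate what "streaming" means here, below is a minimal, library-agnostic sketch of building objects lazily in batches instead of holding the whole training set in memory. It is not how spaCy's streaming corpus support from #7208 is implemented. The names `stream_records`, `batched_examples` and `make_example` are hypothetical, and the one-JSON-record-per-line input format is an assumption.

```python
import json
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def stream_records(lines: Iterable[str]) -> Iterator[dict]:
    """Lazily parse one JSON record per non-empty line (assumed format)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def batched_examples(
    records: Iterable[dict],
    make_example: Callable[[dict], T],
    batch_size: int,
) -> Iterator[List[T]]:
    """Yield lists of at most batch_size objects, built only when requested.

    make_example is a hypothetical factory. With spaCy it could be something
    like: lambda rec: Example.from_dict(nlp.make_doc(rec["text"]), rec["annotations"])
    where the "text" and "annotations" keys are assumptions about the data.
    """
    iterator = iter(records)
    while True:
        batch = [make_example(rec) for rec in islice(iterator, batch_size)]
        if not batch:
            return
        yield batch


# Toy usage with in-memory lines; a real run would pass an open file handle.
lines = ['{"text": "a"}', "", '{"text": "b"}', '{"text": "c"}']
for batch in batched_examples(stream_records(lines), lambda rec: rec["text"].upper(), 2):
    print(batch)
```

Only one batch of objects is alive at a time, so peak memory depends on `batch_size` rather than on the size of the corpus. The trade-off is that the data can only be shuffled within a bounded window, not globally.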
When I try to train my model, it uses too much memory. It seems like most of the memory is consumed by Example objects. I am not sure about their internal structure, but when I print an Example object I can see many sparse lists. I understand that those lists are filled once I enable the right pipeline component for them. However, wouldn't it be better to generate a sparse list on the fly when the corresponding pipeline component is disabled, or to store it in some more compact way (maybe that's done already)?