Zero score for Spancat - debug data looks to be fine #12412
I am having trouble training a fresh spancat model. According to debug data, my number of samples is fairly low. However, it is far larger than any I have used before (although my earlier models did not include the spancat component). I have a large dataset of .txt files that I plan to process and annotate, but before I do so, I would like to know whether my current method is flawed or whether the low number of samples is the sole reason this isn't working. To clarify: the spans under the "sentences" key are whole sentences, hence the high token count (see the debug data output below). Any help is appreciated! (Apologies for the wall of text; if there is a way to create code dropdowns or otherwise improve readability, please let me know.)
config.cfg:
Debug data:
The docbin is created like this:
EDIT: Still the same issue. I now tried adding
Replies: 1 comment 10 replies
To use the sentence suggester, you need to add a sentencizer (or other component that annotates sentences) to your pipeline and add that to [training.annotating_components]:

    [training]
    annotating_components = ["sentencizer"]

sentencizer is rule-based and the easiest to start with. If you used sentencizer when creating the .spacy files from your expanded spans, then use sentencizer here, too.
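As a rough illustration of why this matters, the sketch below runs the built-in sentencizer on a blank English pipeline and stores the resulting sentences under the "sentences" span key mentioned in the question. The sample text is made up, and this is not the original poster's DocBin code. It only shows that sentence boundaries exist once sentencizer has run, which is what a suggester that reads doc.sents would need during training.

```python
import spacy

# Blank English pipeline; "sentencizer" is spaCy's built-in rule-based
# sentence splitter.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# Hypothetical sample text, for illustration only.
doc = nlp("This is the first sentence. Here is a second one.")

# Each sentence is a Span. Without a component that sets sentence
# boundaries, iterating doc.sents would raise an error instead.
sentence_spans = list(doc.sents)

# Store the sentences under the "sentences" key, as in the question.
doc.spans["sentences"] = sentence_spans

# Token offsets of each stored span.
sentence_offsets = [(span.start, span.end) for span in doc.spans["sentences"]]
print(sentence_offsets)
```

If the spans in your .spacy files were built from the same sentencizer, the candidate spans at training time should line up with the gold spans. If they came from a different splitter, mismatched boundaries are one possible cause of a zero score.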