Incorrect detection of sentence boundaries, if last sentence missing eos symbol for trf model #13356
Answered by svlandeg
koder-ua asked this question in Help: Other Questions
How to reproduce the behaviour
Your Environment
Answered by svlandeg on Feb 27, 2024
Answer selected by
svlandeg
Hi!
In this pretrained pipeline, the sentence segmentation is actually done by the parser, and the model was mostly trained on texts with correct punctuation. So unfortunately this type of occasional error is unavoidable.
If you'd like to have more predictable behaviour, you can use the sentencizer instead, which is a simpler rule-based component that splits sentences on punctuation like ".", "!" or "?".
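For illustration, here is a minimal sketch of the rule-based approach, assuming spaCy v3 is installed. It uses a blank English pipeline rather than the trf pipeline from the question, so no model download is needed. The helper name `split_sentences` and the sample text are made up for this example.

```python
from typing import List

import spacy

# A blank pipeline contains only the tokenizer, so the sentencizer is the
# sole component that sets sentence boundaries here.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")


def split_sentences(text: str) -> List[str]:
    """Hypothetical helper: return the sentence texts found by the sentencizer."""
    doc = nlp(text)
    return [sent.text for sent in doc.sents]


if __name__ == "__main__":
    # The last sentence deliberately has no end-of-sentence punctuation.
    sample = "This is the first sentence. Is this the second? Yes! And a last one without a final mark"
    for sentence in split_sentences(sample):
        print(repr(sentence))
```

With a pretrained pipeline, one option would be to exclude the parser when loading it (for example `spacy.load(name, exclude=["parser"])`) and then add the sentencizer. Whether that is acceptable depends on whether you need the dependency parse.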