This repository was archived by the owner on Oct 4, 2022. It is now read-only.
Explanation
The current sentence and word tokenizers/parsers take HTML into account. In #406 we will build parsers for sentences and words that assume the text no longer contains HTML.
Once all of the text analysis library code relies on the tree instead of the old (flawed) parsers, we can delete the old parsers.
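To make the "no HTML" assumption concrete, here is a minimal sketch of what plain-text sentence and word splitting could look like. It is not the library's implementation or the design from #406; the function names `split_sentences` and `split_words` and the splitting rules are assumptions made for illustration only.

```python
import re
from typing import List

# Hypothetical helpers for illustration; names and rules are assumptions,
# not taken from the library or from #406.

# Split on whitespace that directly follows sentence-ending punctuation.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")

# A word is a run of letters/digits, optionally joined by ' or - inside it.
_WORD = re.compile(r"[^\W_]+(?:['-][^\W_]+)*")


def split_sentences(text: str) -> List[str]:
    """Split plain text (no HTML expected) into sentences."""
    return [s for s in _SENTENCE_BOUNDARY.split(text.strip()) if s]


def split_words(sentence: str) -> List[str]:
    """Split a plain-text sentence into words, dropping punctuation."""
    return _WORD.findall(sentence)
```

Because the input is assumed to be free of markup, neither function needs to recognize tags, attributes, or entities, which is the kind of simplification that would make the old HTML-aware parsers redundant.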
Technical decisions
The files I am talking about are:
If this has not been done yet, we should also make sure that all the tests are implemented for the new parsers. Tests with HTML shouldn't be ported. Old tests:
Feedback?