Removing url matching from tokenizer #13685
Answered by chelle-rr
chelle-rr asked this question in Help: Coding & Implementations
-
Hi! I'm trying to use spaCy to work with a set of filepaths and names, so I've needed to set some specific tokenization rules. However, I'm getting some unexpected results, and nlp.tokenizer.explain shows that the tokenizer is occasionally matching parts of a filename as a URL. Is there a way to disable this?
Output:
Desired output:
I'm very new to this, so please excuse me if I'm missing something obvious. Thank you!
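For anyone reproducing the diagnosis, a minimal sketch along these lines shows which rule produced each piece (the path is a made-up stand-in and spacy.blank is an assumption, not the original pipeline):

import spacy

nlp = spacy.blank("en")  # assumed blank pipeline; any pipeline with the default tokenizer behaves similarly

text = "backup/report.v2.io"  # hypothetical filepath; dotted names like this may be picked up by the URL pattern
for rule, substring in nlp.tokenizer.explain(text):
    print(rule, substring)  # URL_MATCH entries mark spans caught by the tokenizer's URL pattern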
Answered by chelle-rr on Nov 12, 2024
Replies: 1 comment
-
If anyone needs it later, here's the answer:
nlp.tokenizer.url_match = None
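For a slightly fuller sketch of the fix in context (again assuming a blank pipeline and a made-up path rather than the original setup):

import spacy

nlp = spacy.blank("en")  # assumed setup; a loaded model works the same way

# Drop the tokenizer's built-in URL matching so filepath-like strings
# are no longer carved up as URLs.
nlp.tokenizer.url_match = None

text = "backup/report.v2.io"  # hypothetical filepath for illustration
print([t.text for t in nlp(text)])
print(nlp.tokenizer.explain(text))  # with url_match disabled, no URL_MATCH entries appear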
Answer selected by chelle-rr