How to correctly add lemmas to spaCy's default lookup table? #9331

In the French lemmatizer, the lookup table is only used as a backoff for cases that the rules don't cover. There's no detailed documentation for the French lemmatizer, but it should be pretty easy to follow the code to figure out which tables are relevant for the cases you want to modify:

    def rule_lemmatize(self, token: Token) -> List[str]:
        cache_key = (token.orth, token.pos)
        if cache_key in self.cache:
            return self.cache[cache_key]
        string = token.text
        univ_pos = token.pos_.lower()
        if univ_pos in ("", "eol", "space"):
            return [string.lower()]
        elif "lemma_rules" not in self.lookups or uni…
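
To make the "rules first, lookup as a backoff" point concrete, here is a minimal standalone sketch. It is not spaCy's implementation: the `lemmatize` function, the toy tables, and the order of the steps are all simplified assumptions, and the real tables are keyed by part of speech. It only illustrates why adding an entry to the lookup table has no effect for a word the rules already handle.

```python
from typing import Dict, List, Set, Tuple


def lemmatize(
    word: str,
    rules: List[Tuple[str, str]],
    index: Set[str],
    exceptions: Dict[str, List[str]],
    lookup: Dict[str, str],
) -> List[str]:
    """Toy rules-first lemmatizer with a lookup backoff (not spaCy's code)."""
    word = word.lower()
    # 1. A form that is already a known lemma is returned unchanged.
    if word in index:
        return [word]
    # 2. Explicit exceptions come next.
    if word in exceptions:
        return list(exceptions[word])
    # 3. Suffix rules: keep only candidates that are known lemmas.
    forms: List[str] = []
    for old, new in rules:
        if old and word.endswith(old):
            candidate = word[: len(word) - len(old)] + new
            if candidate in index and candidate not in forms:
                forms.append(candidate)
    if forms:
        return forms
    # 4. Backoff: the lookup table is consulted only if nothing above matched.
    if word in lookup:
        return [lookup[word]]
    return [word]


# Hypothetical toy tables for a single part of speech.
rules = [("s", ""), ("aux", "al")]
index = {"chat", "cheval"}
exceptions = {"yeux": ["œil"]}
lookup = {"bijoux": "bijou"}

covered_by_rules = lemmatize("chevaux", rules, index, exceptions, lookup)
from_lookup = lemmatize("bijoux", rules, index, exceptions, lookup)

# Adding a lookup entry for a word the rules already handle changes nothing...
lookup["chevaux"] = "chevau"
still_rule_based = lemmatize("chevaux", rules, index, exceptions, lookup)

# ...whereas an entry in the exceptions table does take effect.
exceptions["chevaux"] = ["cheval-custom"]
overridden = lemmatize("chevaux", rules, index, exceptions, lookup)
```

In spaCy itself, the corresponding tables should be reachable through the lemmatizer's lookups, e.g. `nlp.get_pipe("lemmatizer").lookups.get_table("lemma_rules")`. Which other tables are present (likely `lemma_exc`, `lemma_index` and `lemma_lookup` for French) depends on the installed pipeline, so check `lookups.tables` before editing one.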

Answer selected by Pandalei97
Labels
feat / lemmatizer Feature: Rule-based and lookup lemmatization