What is the difference of norm and lower attributes in token #13283

VirArman · 2024-01-29T11:21:51Z

Hi All,
I ran the following input to compare the lower and norm attributes of each token, but both gave the same output. Is there a real difference between them? If not, why do we need two attributes with the same values?
"Albert Einstein was a German-born theoretical physicist who is widely held to be one of the greatest and most influential scientists of all time."

Answered by svlandeg

svlandeg (Maintainer) · 2024-02-12T15:36:05Z

Hi!

In many cases, token.norm and token.lower will be the same. However, some languages define tokenizer exceptions that assign the norm attribute explicitly, so it can hold more information than just the lowercased form of the token.

Example:

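    import spacy

    text = "Albert Einstein wasn't a German-born theoretical physicist."
    # A blank English pipeline contains only the tokenizer, which is enough here.
    nlp = spacy.blank("en")
    doc = nlp(text)
    for token in doc:
        # lower_ is the lowercased text; norm_ can be set by tokenizer exceptions.
        print(token.lower_, token.norm_)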

Output:

albert albert
einstein einstein
was was
n't not
a a
german german
- -
born born
theoretical theoretical
physicist physicist
. .

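Here you see that the token "n't" is normalized to "not", while its lowercase form stays "n't". Every other token in this sentence has identical lower and norm values.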
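If you want to check your own text for tokens where the two attributes diverge, a small filter along these lines may help. This is only a sketch: `norm_differences` is a hypothetical helper name (not part of spaCy), and it assumes a blank pipeline for the given language code is sufficient, as in the example above.

```python
import spacy


def norm_differences(text, lang="en"):
    """Return (text, lower_, norm_) for every token whose norm differs from its lowercase form.

    Hypothetical helper for illustration; only spacy.blank, Token.text,
    Token.lower_ and Token.norm_ are actual spaCy API here.
    """
    nlp = spacy.blank(lang)  # tokenizer only, no trained components
    doc = nlp(text)
    return [(t.text, t.lower_, t.norm_) for t in doc if t.lower_ != t.norm_]


if __name__ == "__main__":
    sentence = "Albert Einstein wasn't a German-born theoretical physicist."
    for row in norm_differences(sentence):
        print(row)
```

For the sentence above, this should list the "n't" token with the norm "not". An empty result, as with the sentence in the original question, simply means none of the tokens hit a tokenizer exception that sets a norm.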
