Incorrect abbreviation of 'millions' in locale DE #739

Open
junglebrain opened this issue Jun 27, 2024 · 1 comment

@junglebrain

Steps to reproduce

  • Send curl -XPOST http://0.0.0.0:8000/parse --data 'locale=de_DE&text=30m'
  • The result is:
    [{"body":"30m","start":0,"value":{"value":30,"type":"value","unit":"metre"},"end":3,"dim":"distance","latent":false},{"body":"30m","start":0,"value":{"value":30000000,"type":"value"},"end":3,"dim":"number","latent":false}]

Expected result:

[{"body":"30m","start":0,"value":{"value":30,"type":"value","unit":"metre"},"end":3,"dim":"distance","latent":false}]

What's wrong

The second result (dim 'number', value 30000000) is incorrect because '30m' cannot mean '30 million' under locale de_DE. In German, 'million' is abbreviated 'Mio.' (see Wikipedia). Unlike in English, it is never abbreviated 'M' (let alone 'm'), as this would be too ambiguous ('Meter' vs. 'Millionen' vs. 'Milliarden').
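
For anyone who wants to check their own server, here is a rough Python sketch of the same request plus a filter that picks out the 'number' entities. The URL and form fields are taken from the curl command above; the helper names `parse` and `number_entities` are made up for this example, and `REPORTED` is just the response pasted from this report so the check runs without a server.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl command in this report; adjust to your setup.
PARSE_URL = "http://0.0.0.0:8000/parse"


def parse(text, locale="de_DE", url=PARSE_URL):
    """POST form-encoded data to /parse and decode the JSON reply (hypothetical helper)."""
    body = urllib.parse.urlencode({"locale": locale, "text": text}).encode("utf-8")
    with urllib.request.urlopen(url, data=body) as response:
        return json.load(response)


def number_entities(entities):
    """Return only the entities whose dimension is 'number' (hypothetical helper)."""
    return [e for e in entities if e.get("dim") == "number"]


# Response copied verbatim from this report for text=30m, locale=de_DE.
REPORTED = json.loads(
    '[{"body":"30m","start":0,"value":{"value":30,"type":"value","unit":"metre"},'
    '"end":3,"dim":"distance","latent":false},'
    '{"body":"30m","start":0,"value":{"value":30000000,"type":"value"},'
    '"end":3,"dim":"number","latent":false}]'
)

# The entity this issue is about: '30m' read as thirty million.
unexpected = number_entities(REPORTED)
```

Against a live server, `number_entities(parse("30m"))` should come back empty once the behavior is fixed.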

Variation: dim=number

Let's change our request to restrict the dim to 'number':

  • Send curl -XPOST http://0.0.0.0:8000/parse --data 'locale=de_DE&dims="["number"]"&text=30m'
  • The result is [{"body":"30m","start":0,"value":{"value":30000000,"type":"value"},"end":3,"dim":"number","latent":false}]

Now, the expected behavior would be to simply ignore the 'm' and extract the number 30.

@junglebrain
Author

Conversely, if I query '30 Mio.' (the correct abbreviation for 'millions'), I get [{"body":"30","start":0,"value":{"value":30,"type":"value"},"end":2,"dim":"number","latent":false}], which is also incorrect: the 'Mio.' suffix is dropped and only the number 30 is extracted.
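
To make the expected behavior concrete, here is a minimal Python sketch of locale-dependent suffix handling. It is not how the project implements its rules; the table `MULTIPLIER_SUFFIXES` and the function `parse_number` are hypothetical and only cover the two cases from this issue ('m' in English, 'Mio.' in German).

```python
import re

# Hypothetical per-language suffix tables, for illustration only.
# Under "de" only "Mio." means a million; "m" is deliberately absent.
MULTIPLIER_SUFFIXES = {
    "en": {"m": 1_000_000},
    "de": {"mio.": 1_000_000, "mio": 1_000_000},
}

# A leading integer, optionally followed by a letter suffix with a trailing dot.
_NUMBER = re.compile(r"\s*(\d+)\s*([A-Za-z]+\.?)?")


def parse_number(text, lang):
    """Return (value, body) for a leading integer with an optional suffix.

    A suffix unknown to the language is ignored rather than interpreted,
    so only the digits are reported as the matched body.
    """
    match = _NUMBER.match(text)
    if match is None:
        return None
    digits, suffix = match.group(1), match.group(2)
    value = int(digits)
    multiplier = MULTIPLIER_SUFFIXES.get(lang, {}).get((suffix or "").lower())
    if multiplier is None:
        return value, digits
    return value * multiplier, match.group(0).strip()
```

With this table, '30m' under "de" is read as the plain number 30, while '30 Mio.' is read as thirty million, which is the behavior requested above.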
