Improve support of German compound words in ingredients processing (e.g. Himbeerstücke, Schokoladenstücke) #9115

Open
Tracked by #9949
stephanegigandet opened this issue Oct 5, 2023 · 1 comment
Labels
German https://wiki.openfoodfacts.org/Local_Communities/GermanTeam 🇩🇪 Germany https://wiki.openfoodfacts.org/Local_Communities/GermanTeam 🥗🔍 Ingredients analysis https://wiki.openfoodfacts.org/Ingredients_Extraction_and_Analysis

Comments

stephanegigandet commented Oct 5, 2023

For ingredients processing, we currently strip processing words like "stücke" (listed in the ingredients_processing.txt taxonomy) and try to match the remainder to an ingredient. This does not work very well for German compound words, because letters may be added to or removed from the end of the first word when the compound is formed.

e.g. Schokolade -> Schokoladenstücke, Himbeere -> Himbeerstücke
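
To make the failure mode concrete, here is a minimal sketch in Python (chosen for brevity; it is not the actual Product Opener code). The `KNOWN_INGREDIENTS` set, the `PROCESSING_SUFFIX` constant, and the `strip_processing_suffix` helper are all hypothetical stand-ins for the taxonomy lookups described above.

```python
from typing import Optional

# Hypothetical stand-in for ingredient names from the ingredients taxonomy.
KNOWN_INGREDIENTS = {"schokolade", "himbeere", "apfel"}

# Hypothetical stand-in for one processing word from ingredients_processing.txt.
PROCESSING_SUFFIX = "stücke"


def strip_processing_suffix(word: str) -> Optional[str]:
    """Strip the suffix and return the remainder only if it is a known ingredient."""
    lowered = word.lower()
    if not lowered.endswith(PROCESSING_SUFFIX):
        return None
    remainder = lowered[: -len(PROCESSING_SUFFIX)]
    return remainder if remainder in KNOWN_INGREDIENTS else None
```

With this naive approach, "Apfelstücke" resolves to "apfel", but "Schokoladenstücke" leaves "schokoladen" and "Himbeerstücke" leaves "himbeer", neither of which matches an ingredient name.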

See https://de.openfoodfacts.org/ingredients?filter=st%C3%BCcke for some real world examples.

We can solve this in different ways.

One easy way would be to add the forms with added or removed letters as synonyms in the ingredients.txt taxonomy:

de:Himbeere, himbeer
de:Schokolade, Schokoladen
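
A rough sketch of how the synonym approach would behave at match time, in Python for illustration only. The `SYNONYMS` dictionary and the `match_with_synonyms` helper are hypothetical; they only mimic a lookup over the two taxonomy entries above and do not reflect how the taxonomy is actually loaded.

```python
from typing import Optional

# Hypothetical in-memory view of the two entries above:
# every name or synonym maps to its canonical taxonomy id.
SYNONYMS = {
    "himbeere": "de:Himbeere",
    "himbeer": "de:Himbeere",
    "schokolade": "de:Schokolade",
    "schokoladen": "de:Schokolade",
}


def match_with_synonyms(word: str, suffix: str = "stücke") -> Optional[str]:
    """Strip the processing suffix, then look the remainder up among names and synonyms."""
    lowered = word.lower()
    if not lowered.endswith(suffix):
        return None
    return SYNONYMS.get(lowered[: -len(suffix)])
```

Here "Himbeerstücke" and "Schokoladenstücke" both resolve, but only because their compound stems were listed by hand; an ingredient without such a synonym would still fail to match.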

Another way could be to try adding or removing letters after we strip a word part like "stücke", to see whether the result matches a known ingredient. This may or may not be error-prone: it could yield false positives where we map the ingredient to a different ingredient.
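
A minimal sketch of this second idea, again in Python and under stated assumptions: `KNOWN_INGREDIENTS`, `LINKING_ELEMENTS`, `candidate_stems`, and `match_compound` are hypothetical names, and the list of linking elements is illustrative rather than exhaustive. The function returns every matching candidate so that ambiguity is visible to the caller.

```python
from typing import List

# Hypothetical stand-in for ingredient names from the ingredients taxonomy.
KNOWN_INGREDIENTS = {"schokolade", "himbeere", "apfel"}

# Illustrative (not exhaustive) German linking elements that may be
# appended to the first word of a compound.
LINKING_ELEMENTS = ("en", "n", "es", "s", "er", "e")


def candidate_stems(remainder: str) -> List[str]:
    """Return the remainder plus variants with a linking element removed or a final 'e' restored."""
    candidates = [remainder]
    for link in LINKING_ELEMENTS:
        if remainder.endswith(link) and len(remainder) > len(link):
            candidates.append(remainder[: -len(link)])
    # Some compounds drop a final "e" from the first word (Himbeere -> Himbeer).
    candidates.append(remainder + "e")
    return candidates


def match_compound(word: str, suffix: str = "stücke") -> List[str]:
    """Return all known ingredients reachable from the word after stripping the suffix."""
    lowered = word.lower()
    if not lowered.endswith(suffix):
        return []
    matches: List[str] = []
    for stem in candidate_stems(lowered[: -len(suffix)]):
        if stem in KNOWN_INGREDIENTS and stem not in matches:
            matches.append(stem)
    return matches
```

Returning a list rather than the first hit is a deliberate choice in this sketch: if more than one ingredient matches, the caller can decline to guess, which is one possible way to limit the false positives mentioned above.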

@stephanegigandet

Some examples are listed in https://aclanthology.org/W17-1722.pdf

@teolemon teolemon added German https://wiki.openfoodfacts.org/Local_Communities/GermanTeam 🥗🔍 Ingredients analysis https://wiki.openfoodfacts.org/Ingredients_Extraction_and_Analysis labels Oct 25, 2023
@CharlesNepote CharlesNepote added the 🇩🇪 Germany https://wiki.openfoodfacts.org/Local_Communities/GermanTeam label Nov 21, 2023
@teolemon teolemon moved this to To discuss and validate in 🍊 Open Food Facts Server issues Apr 23, 2024
@teolemon teolemon removed the 🐛 bug This is a bug, not a feature request. label Oct 18, 2024
Projects
Status: To do
Status: To discuss and validate