Improve support of German compound words in ingredients processing (e.g. Himbeerstücke, Schokoladenstücke) #9115
Labels
German
https://wiki.openfoodfacts.org/Local_Communities/GermanTeam
🇩🇪 Germany
https://wiki.openfoodfacts.org/Local_Communities/GermanTeam
🥗🔍 Ingredients analysis
https://wiki.openfoodfacts.org/Ingredients_Extraction_and_Analysis
For ingredients processing, we currently strip processing words like "stücke" (listed in the ingredients_processing.txt taxonomy) from the ingredient text and then try to match the remaining part to a known ingredient. This does not work well for German compound words, because letters are often added to or dropped from the end of the first word when the compound is formed.
e.g. Schokolade -> Schokoladenstücke, Himbeere -> Himbeerstücke
See https://de.openfoodfacts.org/ingredients?filter=st%C3%BCcke for some real-world examples.
We can solve this in different ways.
One easy way would be to add the word forms with added or removed letters as synonyms in the ingredients.txt taxonomy:
de:Himbeere, himbeer
de:Schokolade, Schokoladen
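To illustrate why the synonym approach works, here is a minimal Python sketch. It is not Product Opener's actual code: the `SYNONYMS` dictionary is a hypothetical in-memory stand-in for the two taxonomy entries above, and `build_index` and `match_compound` are illustrative names. Once the compound stems are listed as synonyms, the stem left over after removing "stücke" matches directly.

```python
# Hypothetical stand-in for the two ingredients.txt entries above:
# canonical name -> all accepted synonyms (lowercased).
SYNONYMS = {
    "de:himbeere": ["himbeere", "himbeer"],
    "de:schokolade": ["schokolade", "schokoladen"],
}

# Hypothetical stand-in for entries of ingredients_processing.txt.
PROCESSING_WORDS = ["stücke"]


def build_index(synonyms):
    """Map every synonym to its canonical entry for direct lookup."""
    index = {}
    for canonical, names in synonyms.items():
        for name in names:
            index[name] = canonical
    return index


def match_compound(word, index, processing_words=PROCESSING_WORDS):
    """Strip a trailing processing word and look the stem up as-is.

    Returns (canonical_ingredient, processing_word) or None.
    """
    lowered = word.lower()
    for suffix in processing_words:
        if lowered.endswith(suffix) and len(lowered) > len(suffix):
            stem = lowered[: -len(suffix)]
            canonical = index.get(stem)
            if canonical is not None:
                return canonical, suffix
    return None


INDEX = build_index(SYNONYMS)
print(match_compound("Himbeerstücke", INDEX))
print(match_compound("Schokoladenstücke", INDEX))
```

The trade-off is maintenance: every ingredient that can appear in a compound needs its stem form added by hand, but no ingredient can be silently turned into a different one.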
Another way could be to try adding or removing letters at the end of the remaining stem after we strip a word part like "stücke", and check whether any variant matches a known ingredient. This needs no taxonomy changes, but it could be error prone: a modified stem might match a different ingredient than the one intended, yielding false positives.
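The second approach could look roughly like the following Python sketch. Everything here is an assumption made for illustration: the `KNOWN_INGREDIENTS` set, the lists of linking letters to strip and endings to restore, and the function names are hypothetical and do not come from the existing code base. The function returns every matching candidate so that a caller can detect ambiguous cases instead of silently picking one.

```python
# Hypothetical set of known (lowercased) German ingredient names.
KNOWN_INGREDIENTS = {"himbeere", "schokolade", "apfel"}

# Assumed linking letters that may have been inserted into a compound
# (Schokolade + n + stücke) and endings that may have been dropped
# (Himbeere - e + stücke). These lists are illustrative, not exhaustive.
LINKING_ELEMENTS = ("en", "n", "s", "es")
DROPPED_ENDINGS = ("e",)


def candidate_stems(stem):
    """Return the stem itself plus variants with letters removed or added."""
    candidates = [stem]
    for linker in LINKING_ELEMENTS:
        if stem.endswith(linker) and len(stem) > len(linker):
            candidates.append(stem[: -len(linker)])
    for ending in DROPPED_ENDINGS:
        candidates.append(stem + ending)
    return candidates


def resolve(word, suffix="stücke", known=KNOWN_INGREDIENTS):
    """Strip the suffix and return all known ingredients among the candidates.

    An empty list means no match; more than one entry means the result
    is ambiguous and should probably not be applied automatically.
    """
    lowered = word.lower()
    if not lowered.endswith(suffix) or len(lowered) == len(suffix):
        return []
    stem = lowered[: -len(suffix)]
    matches = [c for c in candidate_stems(stem) if c in known]
    # Remove duplicates while keeping the order of discovery.
    return list(dict.fromkeys(matches))


print(resolve("Himbeerstücke"))
print(resolve("Schokoladenstücke"))
# Ambiguity example with a hypothetical vocabulary: both "reis" and
# "reise" are reachable from the stem "reis".
print(resolve("Reisstücke", known={"reis", "reise"}))
```

Returning all matches rather than the first one makes the false-positive risk visible: when two different ingredients are reachable from the same stem, the caller can fall back to leaving the ingredient unmatched.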