Create a script to fix Taxonomized values stored in the wrong language #10643

Open
teolemon opened this issue Aug 5, 2024 · 3 comments

Comments

@teolemon
Member

teolemon commented Aug 5, 2024

Problem

  • Some problematic apps have stored taxonomized values in the wrong language, like fr:Dairy drinks or en:Boissons lactées
  • https://world.openfoodfacts.org/category/en:boissons-lactees
  • Create a script to fix Taxonomized values stored in the wrong language
  • Since a loop over every language would in theory be very long (140 languages), the algorithm could be directed at a specific tag like the one above, with a target value.
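
The targeted approach in the last bullet could look roughly like the sketch below. It is only an illustration in Python, not the server's actual code: the helper name `retag`, the flat list of tag strings, and the target value `en:dairy-drinks` are all assumptions made for the example.

```python
def retag(tags, wrong_tag, target_tag):
    """Return a copy of `tags` with `wrong_tag` replaced by `target_tag`.

    Hypothetical helper: assumes tags are plain strings such as
    "en:boissons-lactees". Order is kept and duplicates are dropped,
    in case the product already carries the target tag.
    """
    fixed = []
    for tag in tags:
        new_tag = target_tag if tag == wrong_tag else tag
        if new_tag not in fixed:
            fixed.append(new_tag)
    return fixed


# Example product tags (made up for illustration).
categories_tags = ["en:beverages", "en:boissons-lactees", "en:dairy-drinks"]
fixed_tags = retag(categories_tags, "en:boissons-lactees", "en:dairy-drinks")
print(fixed_tags)
```

Working on one (wrong tag, target value) pair at a time avoids looping over every language, at the cost of needing someone to supply the target value.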

Part of

@benbenben2
Collaborator

benbenben2 commented Aug 28, 2024

I created a script for that.
#10711
Currently testing it in the net environment.

Also see:
#7838

@teolemon teolemon removed the ✨ Feature Features or enhancements to Open Food Facts server label Oct 18, 2024
@benbenben2 benbenben2 removed their assignment Dec 21, 2024
@github-throwaway
Contributor

@benbenben2 What's the status on this? :) Is there anything one can help with?

@benbenben2
Collaborator

I stopped working on it.
The main challenge is the new limit on page requests per minute.
We need to load many pages to retrieve all unknown tags, and then more pages to retrieve all products that have each tag.
Here is an old version of the script, written before this API limitation: https://github.com/openfoodfacts/openfoodfacts-server/pull/9581/files. The script is now obsolete and I made a PR to remove it from the repo.
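
To illustrate what staying under a per-minute limit involves, here is a minimal pacing sketch in Python. It is not taken from the script above: the class name `Throttle` and the `max_per_minute` value are hypothetical, and the real limit is not stated in this thread.

```python
import time


class Throttle:
    """Space out calls so that at most `max_per_minute` happen per minute.

    `max_per_minute` is a placeholder: the real limit is not stated in
    this thread. `clock` and `sleep` are injectable to ease testing.
    """

    def __init__(self, max_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / max_per_minute
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None

    def wait(self):
        """Block until the next request may be sent."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval
```

A script would call `wait()` before each page request. This keeps it under the limit but does not reduce the number of pages to load, so the total run time grows with the number of tags and products.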

There are suggestions from the community on Slack for working around the API limitation: https://openfoodfacts.slack.com/archives/C02LDQDDD/p1725200136159869.
The Python SDK is also facing this limitation: openfoodfacts/openfoodfacts-python#292
Downloading the database locally and using something like DuckDB might be an option, but we would have to scan all products and process all their tags to recreate an equivalent of the API for unknown tags, which seems to add too much complexity.
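
For a sense of what that local scan would involve, here is a rough sketch. It uses plain Python rather than DuckDB, and it assumes a dump with one JSON product per line, a `code` field, a `categories_tags` list, and a set of known taxonomy tags available locally. All of these names are assumptions made for the example.

```python
import json
from collections import defaultdict


def index_unknown_tags(lines, known_tags, field="categories_tags"):
    """Map each tag missing from `known_tags` to the product codes using it.

    Assumes `lines` yields one JSON product per line with a `code` and a
    list of tags under `field`; both names are assumptions of this sketch.
    """
    index = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        product = json.loads(line)
        for tag in product.get(field) or []:
            if tag not in known_tags:
                index[tag].append(product.get("code"))
    return dict(index)


# Tiny in-memory stand-in for a dump file (made-up data).
sample = [
    '{"code": "001", "categories_tags": ["en:beverages", "en:boissons-lactees"]}',
    '{"code": "002", "categories_tags": ["en:dairy-drinks"]}',
    '{"code": "003", "categories_tags": ["en:boissons-lactees"]}',
]
known = {"en:beverages", "en:dairy-drinks"}
unknown_index = index_unknown_tags(sample, known)
print(unknown_index)
```

The resulting index answers both questions the API was used for (which tags are unknown, and which products carry them), but it has to be rebuilt for every tag field and kept in sync with the taxonomies.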

I did not have further ideas so I put it aside for now.
