[FR] About status of text-corpora analysis. #57

Open
HarikalarKutusu opened this issue Jul 5, 2023 · 1 comment
Labels
analyzer (cv-tbox-dataset-analyzer related), compiler (cv-tbox-dataset-compiler related), help wanted (extra attention is needed)

Comments

@HarikalarKutusu
Owner

Mozilla Common Voice has started to use its database directly for new text-corpus sentences, without exporting the newly added (validated) sentences to the public. As a result, our text-corpora analysis is outdated: it has not changed since the v13.0 release (March 2023).

You can read about the issue and possible solutions on the Common Voice repo:

common-voice/common-voice#4100

It seems there is nothing we can do about this until it is fixed upstream. Any other ideas are most welcome.

@HarikalarKutusu added the help wanted, compiler, and analyzer labels on Jul 5, 2023
@HarikalarKutusu
Owner Author

With v17.0, the text corpora have been released. Although they have problems, we were able to mitigate most of them in the Dataset Compiler.
