English sentence seeing different results in all 3 versions #27

Open
thewilkybarkid opened this issue Feb 28, 2024 · 1 comment

Comments

@thewilkybarkid
Contributor

We're using the heavy version (v1.3.4), and I've just spotted that 'A population perspective on international students in Australian universities' is detected as fr rather than en:

FR 4.02%
EN 3.65%
LA 2.37%
FI 2.13%
LV 2.06%

In the Playground, the normal version would recognise it as lv:

LV 2.06%
FR 1.66%
FI 1.48%
ET 1.45%
EN 0.89%

Only the light version detects it correctly:

EN 3.33%
FR 2.29%
NL 1.72%
FI 1.45%
IT 1.45%

I don't know much about Tatoeba. When we see an incorrect detection, would it make sense to add the sentence there and hope that it triggers a tweak in this library? (A few other issues like this are open; could there be some guidance on what to do?)

@thewilkybarkid
Contributor Author

Found a couple more:

'DRAFT: Developing and implementing the semantic interoperability recommendations of the EOSC Interoperability Framework' is confidently detected as la rather than en by the heavy and normal versions; this looks to be triggered by 'EOSC'. I might be able to strip out acronyms/initialisms on our side, which makes it en in all 3 versions.

'Sardegna grassland mapping for livestock management: a practical Intra-Annual NDVI contrasts approach' is confidently detected as lt by the heavy version, fr by the normal version, and en only by the light version. Removing the initialism ('NDVI') makes it fr in both the heavy and normal versions and en in the light version.
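The acronym-stripping workaround mentioned above could look roughly like the following TypeScript sketch, applied before the text reaches the detector. It assumes that any token of two or more characters that starts with a capital letter and contains only capitals and digits is an acronym/initialism. `stripInitialisms` is a hypothetical helper, not part of the library, and the heuristic will also remove ordinary all-caps words such as 'DRAFT'.

```typescript
// Hypothetical helper, not part of any language-detection library.
// Heuristic (an assumption): a token of 2+ characters that starts with a
// capital letter and contains only capitals and digits is an
// acronym/initialism such as "EOSC" or "NDVI". Single capitals ("A") are kept.
function stripInitialisms(text: string): string {
  return text
    .replace(/\b[A-Z][A-Z0-9]+\b/g, " ") // replace each all-caps token with a space
    .replace(/\s+/g, " ") // collapse the runs of whitespace left behind
    .trim(); // drop leading/trailing whitespace
}

const title =
  "Sardegna grassland mapping for livestock management: a practical Intra-Annual NDVI contrasts approach";
console.log(stripInitialisms(title));
// prints "Sardegna grassland mapping for livestock management: a practical Intra-Annual contrasts approach"
```

The cleaned string would then be passed to whichever detector version is in use, in place of the raw title. A regex heuristic like this misses mixed-case initialisms (e.g. 'NetCDF') and does not change the underlying model scores; it only removes tokens that appear to skew them.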
