Decompiler no longer works for en-US & en-GB #10708
Hi @milekpl. When you export spelling binary dictionaries, make sure that the path contains "hunspell" or "spelling". See languagetool/languagetool-tools/src/main/java/org/languagetool/tools/DictionaryExporter.java, line 68 in 2446a07.
We use that to distinguish spelling dictionaries from tagger or synthesizer dictionaries. I know that this is confusing. If we remove it, we'll need a new input parameter to specify the kind of dictionary, and we'll also need to modify all the scripts that use this class.
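For readers following along, the path check described above can be pictured roughly as follows. This is only a sketch under stated assumptions, not the actual DictionaryExporter code: the class name, the enum, the helper `guessKind`, and the example paths are all hypothetical, and the real check may differ in detail (for example in case handling).

```java
// Hypothetical illustration of a path-based dictionary-kind heuristic.
// None of these names come from LanguageTool itself.
class DictionaryKindSketch {

  enum Kind { SPELLING, TAGGER_OR_SYNTHESIZER }

  // Treat the dictionary as a spelling dictionary when its path
  // contains "hunspell" or "spelling"; otherwise assume tagger/synthesizer.
  static Kind guessKind(String path) {
    if (path.contains("hunspell") || path.contains("spelling")) {
      return Kind.SPELLING;
    }
    return Kind.TAGGER_OR_SYNTHESIZER;
  }

  public static void main(String[] args) {
    // Example paths are made up for the demonstration.
    System.out.println(guessKind("en/hunspell/en_US.dict"));
    System.out.println(guessKind("/tmp/en_US.dict"));
  }
}
```

With a heuristic like this, the same file is classified differently depending only on the directory it sits in, which would explain why moving the dictionary under a hunspell directory changes the result.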
Hi @jaumeortola, thanks for the explanation. Indeed, it does work when the dictionary is stored under a hunspell directory. Right now I have no time to work on this, but it seems to me it would be much easier just to use the existing logic of LT and require the user to provide the language code and an explicit -spell flag. Tagging and synthesis should work the same way as before. LT is able to locate its resources, so we could simply instantiate a language and get the resource path this way, so that the user won't need to decompile a jar etc. Alternatively, provide -i with a full path and the explicit flag (-spell).
We could do that, yes, keeping the current methods for backward compatibility.
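One way the proposal could look, purely as a sketch: an explicit -spell flag takes precedence, and the existing path check remains as a fallback so that current scripts keep working. The class and method names here are hypothetical, and the argument handling is deliberately simplified; it is not how DictionaryExporter parses its options today.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the proposed option handling; not LanguageTool code.
class ExportOptionsSketch {

  // Returns true when the dictionary should be handled as a spelling dictionary.
  static boolean isSpelling(String[] args, String inputPath) {
    List<String> argList = Arrays.asList(args);
    // Proposed behaviour: the user states the dictionary kind explicitly.
    if (argList.contains("-spell")) {
      return true;
    }
    // Backward-compatible fallback: keep the current path-based check.
    return inputPath.contains("hunspell") || inputPath.contains("spelling");
  }

  public static void main(String[] args) {
    // Made-up invocations for illustration only.
    System.out.println(isSpelling(new String[] {"-i", "/tmp/en_US.dict", "-spell"}, "/tmp/en_US.dict"));
    System.out.println(isSpelling(new String[] {"-i", "/tmp/english.dict"}, "/tmp/english.dict"));
  }
}
```

Keeping the fallback means scripts that rely on the directory name would not need to change, while new callers could pass the flag and store the file anywhere.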
Ah, I needed a modern word list for English, and ours is nicely curated.
On Mon, 8 Jul 2024, 08:52, Jaume Ortolà ***@***.***> wrote:
… LT is able to locate its resources, so we could simply instantiate a
language and get the resource path this way, so that the user won't need to
decompile a jar etc.
We could do that, yes, keeping the current methods for backward
compatibility.
Anyway, what is your goal with the English dictionary? Usually, developers
decompile a binary dictionary when they want to update the dictionary and
need to see the contents of the old dict.
The documentation at https://dev.languagetool.org/hunspell-support is outdated, as it does not specify that English morfologik dictionaries are now, for some reason (which is obscure to me, given how small these files are), kept in a separate jar: english-pos-dict.jar. However, decompiling the files from the jar fails as well:

An unhandled exception occurred. Stack trace below.
java.lang.IndexOutOfBoundsException
    at java.nio.Buffer.checkBounds(Unknown Source)
    at java.nio.HeapByteBuffer.put(Unknown Source)
    at morfologik.stemming.TrimSuffixEncoder.decode(TrimSuffixEncoder.java:86)
    at morfologik.stemming.DictionaryIterator.next(DictionaryIterator.java:86)
    at morfologik.stemming.DictionaryIterator.next(DictionaryIterator.java:12)
    at morfologik.tools.DictDecompile.call(DictDecompile.java:80)
    at morfologik.tools.DictDecompile.call(DictDecompile.java:20)
    at morfologik.tools.CliTool.main(CliTool.java:133)
    at morfologik.tools.DictDecompile.main(DictDecompile.java:132)
    at org.languagetool.tools.DictionaryExporter.build(DictionaryExporter.java:82)
    at org.languagetool.tools.DictionaryExporter.main(DictionaryExporter.java:59)
Done. The dictionary export has been written to en-US.txt

I did not look into it any further, but Polish dictionaries decompile fine. Any ideas, @jaumeortola?