Example Sentences (from Tanaka Corpus) sometimes linked to incorrect sense (ex: in 掛ける entry) #79
Comments
After some research, I'm not sure, but I suspect this happened because JMdict entries have been removed and added over time, which may have caused a string of off-by-one errors that shifted these example sentence groups around.
Another thought, to your point on #37: jreibun might come out soon and be a better source of sentences than the Tanaka Corpus anyway, though I'm not sure when it will be released.
Thanks, I'm always glad to hear that people like the project.
Yes, these errors are very common. I have probably fixed a couple hundred of them over the past year.
Tatoeba has a very primitive GUI for editing the links to JMdict entries. It is technically open to the public to use, but it is extremely user-unfriendly and difficult to use correctly. Feel free to let me know when you spot these errors and I'll go fix them. A couple of other users have also been reporting these errors to me in the discussion forum.
That is indeed a common reason for the errors. Whenever entries in JMdict are edited, the editors need to remember to update the sentence links as well. We try to keep this in mind, but sometimes we forget. I recently suggested that some of this sentence information should be displayed in the JMdict database editor to make it easier to remember, but this is a volunteer project and things don't always move quickly.
It's been almost a year since the last public update from that project, so I'm not sure how soon that will be. Fingers crossed.
Hi Stephen! Amazing work, thank you for contributing to the world's knowledge!
I have noticed some issues with the Tanaka Corpus, and am not sure where to discuss this, but since I intend to use Yomitan as my popup dictionary of choice for some time, figured I would mention it here. This problem comes up in other projects that use the Tanaka Corpus of course (e.g. Shirabe Jisho for iOS).
If you look up the dictionary entry for 掛ける in Jitendex, there are many examples of sentences from Tanaka Corpus being assigned to the wrong sense.
Sense 9 means "to multiply",
but the multiplication sentence is included with sense 5.
Sense 11 means "to take a seat" and includes the correct reference,
but its example sentence is listed with sense 22, "to apply (insurance)".
Is there anything that can be done about this?
I read on the EDRDG wiki that the Tanaka Corpus is now within Tatoeba and it is its new "home". Does this mean each time we see something like this, we should correct it there?
Here's one of those sentences:
https://tatoeba.org/en/sentences/show/236991
I have an account with Tatoeba, but I'm afraid I don't know how to edit the sentences. Even if I did, would I be able to change the attribution information that links a sentence to one of the senses in JMdict?
Just bringing this to your attention in case it is not possible to change things at the source (the Tanaka Corpus itself) and we might need to make a file in Jitendex for all the manually assigned corrections or something.
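To make the corrections-file idea concrete, here is a minimal sketch of how such an override table might work. Everything in it is an assumption for illustration: the `SentenceLink` type, the `CORRECTIONS` table, and the sentence IDs are hypothetical placeholders, not Jitendex's or Tatoeba's actual data format. Only the sense numbers come from the 掛ける examples above.

```python
from typing import NamedTuple


class SentenceLink(NamedTuple):
    """Hypothetical record linking an example sentence to a JMdict sense."""
    sentence_id: int  # placeholder ID, not a real Tatoeba sentence ID
    headword: str
    sense: int        # 1-based sense number within the entry


# Hypothetical override table: (sentence_id, headword) -> corrected sense.
CORRECTIONS = {
    (1000001, "掛ける"): 9,   # multiplication sentence, currently under sense 5
    (1000002, "掛ける"): 11,  # "take a seat" sentence, currently under sense 22
}


def apply_corrections(links, corrections):
    """Return new links with the sense replaced wherever an override exists."""
    fixed = []
    for link in links:
        key = (link.sentence_id, link.headword)
        # Fall back to the upstream sense number when no override is listed.
        fixed.append(link._replace(sense=corrections.get(key, link.sense)))
    return fixed


upstream_links = [
    SentenceLink(1000001, "掛ける", 5),
    SentenceLink(1000002, "掛ける", 22),
    SentenceLink(1000003, "掛ける", 1),  # no override, left unchanged
]

corrected_links = apply_corrections(upstream_links, CORRECTIONS)
```

Keying the overrides on the sentence and headword rather than on the old sense number means an entry in the table stays valid even if the upstream sense numbering shifts again, although the corrected sense number itself would still need to be reviewed whenever the entry is edited.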