Merging phrases in speech-db #457
Comments
Example of multiple entries: https://speech-db.altlab.app/maskwacis/search/?query=namôya+ê-ayamihât
Question: What is the full set of fields that defines two phrases as "the same" here, as in "it would be OK to automatically merge them and pick any of them"? I'm wondering in particular about the fields in the database beyond transcription and translation:
I'm currently asking for all fields to be the same, but that is too detailed in some cases. I am inclined to disregard differences in
I had originally been thinking that if the transcription (in its latest state, so not necessarily the field transcription) and the translation (excluding spaces at the edges) are exactly the same, then the entries could be merged. For the other fields, if a value occurs for only one entry and not the other, or is exactly the same for both entries, then one could use that value. For fields whose values do not match, one could combine them in the merged entry. But would this result in ambiguous cases?
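The rule proposed above can be sketched roughly as follows. This is only an illustration and not the speech-db code: entries are modelled as plain dicts, the field names `transcription`, `translation`, `speaker`, and `comment` are assumptions, and `transcription` stands in for the latest state of the transcription. Conflicting values are kept side by side in a list, which is one possible reading of "combine them".

```python
from typing import Any, Optional

# Assumed field names; the real database schema may differ.
KEY_FIELDS = ("transcription", "translation")


def same_phrase(a: dict, b: dict) -> bool:
    """Transcriptions must match exactly; translations match after
    stripping spaces at the edges."""
    return (
        a.get("transcription") == b.get("transcription")
        and (a.get("translation") or "").strip()
        == (b.get("translation") or "").strip()
    )


def merge_entries(a: dict, b: dict) -> Optional[dict]:
    """Return a merged entry, or None if the two entries are not the same phrase."""
    if not same_phrase(a, b):
        return None
    merged: dict[str, Any] = {
        "transcription": a.get("transcription"),
        "translation": (a.get("translation") or "").strip(),
    }
    for field in sorted((set(a) | set(b)) - set(KEY_FIELDS)):
        left, right = a.get(field), b.get(field)
        if left in (None, ""):
            merged[field] = right          # value present on one side only
        elif right in (None, "") or left == right:
            merged[field] = left           # one-sided or identical value
        else:
            merged[field] = [left, right]  # values differ: keep both
    return merged
```

Under this sketch, a field that is empty in one entry takes the other entry's value, so the only case needing a decision is how to store two differing values.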
I don't think it results in ambiguous cases. I was designing an interface for automatically listing all possible candidates, but that will be unnecessary once all the ambiguous cases are dealt with. I would also not be surprised if it caused the interface to take too long to load and time out, so I think it's better to do the automatic merging separately. In general, automatically merging can be done with a
The code is ready. Whether and when to run the Django command on the server is to be discussed via email.
Adding the linguist-administrator role is needed for linguists to undertake merging of individual items. That would be useful for checking the behavior with individual entries before, or instead of, running the merge wholesale computationally.
@fbanados I don't think we need to delay this any further. I've been able to observe that merging individual entries works properly, so we can proceed with computationally merging all the entries whose transcriptions and translations are exactly the same.
I will make a database backup before merging.
Entries merged.
Sometimes there are multiple entries for the same phrase. There should be an interface to merge them, and a script to automatically merge entries whose transcription and translation are the same.
(Split from #444)
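As a rough illustration of the automatic part, the candidates for merging could be found by grouping entries on their transcription and translation. This is a minimal sketch under assumptions, not the actual script: entries are plain dicts, and the keys `id`, `transcription`, and `translation` are hypothetical names.

```python
from collections import defaultdict


def find_duplicate_groups(entries: list[dict]) -> list[list[int]]:
    """Group entry ids that share the same transcription and the same
    translation (ignoring spaces at the edges of the translation)."""
    groups: dict[tuple, list[int]] = defaultdict(list)
    for entry in entries:
        key = (
            entry.get("transcription"),
            (entry.get("translation") or "").strip(),
        )
        groups[key].append(entry["id"])
    # Only groups with more than one entry are merge candidates.
    return [ids for ids in groups.values() if len(ids) > 1]
```

Each returned group lists the ids of entries that could be merged into one. Running this before the merge also gives a quick count of how many entries would be affected.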