Merging phrases in speech-db #457

fbanados · 2024-09-05T17:50:32Z

Sometimes there are multiple entries for the same phrase. There should be an interface to merge them, and a script to automatically merge entries whose transcription and translation are the same.
(Split from #444)

Exact same phrases (transcription+translations) should be merged automatically
An interface for manually merging phrases would be useful.

fbanados · 2024-09-05T22:22:42Z

Example of multiple entries: https://speech-db.altlab.app/maskwacis/search/?query=namôya+ê-ayamihât

fbanados · 2024-09-05T22:25:01Z

This example raises interesting issues, as analysis and translations are different:

fbanados · 2024-09-05T23:47:21Z

Question: What is the total set of fields that define two phrases as "the same" here, as in "it would be ok to automatically merge them and pick any of them"? I'm wondering in particular about those fields in the database beyond transcription and translation:

field_transcription
analysis
comment
status
semantic class (RW)
modifier

I'm currently requiring all fields to be the same, but that is too strict in some cases. I am inclined to disregard differences in field_transcription (which arise, e.g., when a change in the transcription has made the entries identical), modifier (the person who last touched the entry), and semantic class (RW needs to be regenerated anyway). For the others I don't know; this would require a linguist's decision.
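To make the question concrete, here is a minimal sketch of how candidate duplicates could be grouped when field_transcription, modifier, and semantic class are left out of the comparison. It works on plain dictionaries rather than the real models, and the key names (`transcription`, `translation`, `analysis`, `comment`, `status`) and both function names are assumptions for illustration, not the actual speech-db schema or code.

```python
from collections import defaultdict

# Hypothetical field names; the real speech-db schema may differ.
# field_transcription, modifier and semantic class are deliberately left out.
COMPARED_FIELDS = ("transcription", "translation", "analysis", "comment", "status")


def duplicate_key(entry):
    """Build a hashable key from the compared fields, trimming edge whitespace."""
    return tuple((entry.get(name) or "").strip() for name in COMPARED_FIELDS)


def find_duplicate_groups(entries):
    """Return lists of entries whose compared fields are all identical."""
    groups = defaultdict(list)
    for entry in entries:
        groups[duplicate_key(entry)].append(entry)
    # Only groups with more than one member are merge candidates.
    return [group for group in groups.values() if len(group) > 1]
```

Shortening `COMPARED_FIELDS` to just the transcription and translation would give the looser criterion, so the linguist decision amounts to choosing which names stay in that tuple.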

aarppe · 2024-09-05T23:57:45Z

I had originally been thinking that if the transcription (in its latest state, so not necessarily the field transcription) and the translation (excluding spaces at the edges) are exactly the same, then the entries could be merged. For the other fields, if a value occurs for one entry but not the other, or is exactly the same for both entries, then one could use that common value. For other fields that do not match, one could combine the values for the merged entry.

But would this result in ambiguous cases?
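The rule proposed above can be sketched as follows, as a rough illustration only. Entries are plain dictionaries of strings, the function names and the `"; "` separator used to combine conflicting values are assumptions, and the real merge also has to handle related database records that this sketch ignores.

```python
def same_phrase(a, b):
    """Entries match when transcriptions are equal and translations are equal
    after trimming spaces at the edges."""
    return (
        a["transcription"] == b["transcription"]
        and a["translation"].strip() == b["translation"].strip()
    )


def merge_entries(a, b, separator="; "):
    """Merge two matching entries field by field (hypothetical helper)."""
    if not same_phrase(a, b):
        raise ValueError("entries differ in transcription or translation")
    merged = {
        "transcription": a["transcription"],
        "translation": a["translation"].strip(),
    }
    for name in sorted((set(a) | set(b)) - set(merged)):
        left, right = a.get(name), b.get(name)
        if not left:
            # Value present on at most one side: take the other side's value.
            merged[name] = right
        elif not right or left == right:
            # Only the left side has a value, or both sides agree.
            merged[name] = left
        else:
            # Conflicting values: keep both, combined into one string.
            merged[name] = separator.join([left, right])
    return merged
```

Because conflicting values are concatenated rather than one being chosen over the other, no information is dropped, which is one way to keep the merge from becoming ambiguous.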

fbanados · 2024-09-10T21:47:01Z

I don't think it results in ambiguous cases. I was designing an interface for automatically listing all possible candidates, but that will be unnecessary once all the ambiguous cases are dealt with. Also, I would not be surprised if listing them made the interface take too long to load and time out, so I think it's better to do the automatic merging separately. In general, automatic merging can be done with a manage.py command, and we can keep the interface just for search and merge.

fbanados · 2024-09-11T21:43:07Z

Code is ready. The decision on running the Django command on the server will be discussed via email.

aarppe · 2024-09-19T17:42:16Z

Adding the linguist-administrator role is needed for linguists to undertake merging of individual items. That would be useful for checking the behavior with individual entries, before or instead of running the merging wholesale computationally.

aarppe · 2024-11-08T07:17:25Z

@fbanados I don't think we need to delay this any further, as I've been able to observe that the merging of individual entries has worked properly - so we can proceed with computationally merging all the entries for which the transcriptions and translations are exactly the same.

fbanados · 2024-11-08T17:00:36Z

I will make a database backup before merging

fbanados · 2024-11-08T20:58:32Z

Entries merged.

fbanados added the enhancement, question, and requires-linguist-work labels Sep 5, 2024

fbanados self-assigned this Sep 5, 2024

fbanados added a commit that referenced this issue Sep 11, 2024

Interface and auto merge command for #457

b64fa06

fbanados closed this as completed Sep 11, 2024

aarppe reopened this Sep 19, 2024
