Transcription restarts corner cases difficult to handle with combined transcription+translation

A Transcription update will fail (by design) if there are multiple Transcription entities for the same language and video.
Some comments:

Most of this invalid data is from 2020, but we should address the whole dataset and then implement a constraint, together with adding a new column, e.g. "source" or "kind", to allow multiple transcripts per video.
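As a rough illustration of the cleanup step, here is a minimal Python sketch that finds the rows that would violate such a constraint. The row shape and the field names (`id`, `video_id`, `language`, `kind`) are assumptions for illustration only, not the actual schema.

```python
from collections import defaultdict


def find_duplicates(rows, key_fields=("video_id", "language")):
    """Return {key: [row ids]} for every key shared by more than one row.

    `rows` is a list of dicts; the field names are hypothetical.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        groups[key].append(row["id"])
    # Keep only the keys that a uniqueness constraint would reject.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


rows = [
    {"id": 1, "video_id": "v1", "language": "en", "kind": "auto"},
    {"id": 2, "video_id": "v1", "language": "en", "kind": "manual"},
    {"id": 3, "video_id": "v1", "language": "es", "kind": "auto"},
]

# Today's rule: one transcription per (video, language).
conflicts = find_duplicates(rows)

# With a hypothetical "kind" column in the key, the same rows are valid.
conflicts_with_kind = find_duplicates(rows, ("video_id", "language", "kind"))
```

In this sketch, rows 1 and 2 conflict under the current rule but not once "kind" is part of the key, which is the point of adding the new column before enforcing the constraint.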

The new implementation looks for the min-max across all languages, i.e. it finds the last caption for each language, determines the earliest of these, and trims the audio from there. The max time is then used to ensure captions are only added once we reach an unprocessed time for that particular language.
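The min-max logic above can be sketched roughly as follows. This is an illustrative Python sketch, not the project's code: the function names, the `(language, end_seconds)` caption shape, and the choice to treat a language with no captions as starting at 0 are all assumptions.

```python
def restart_plan(captions, languages=()):
    """Compute where to restart from existing captions.

    captions:  iterable of (language, end_seconds) pairs (hypothetical shape).
    languages: languages we expect; any with no captions yet start at 0.0.
    Returns (trim_from, last_by_language).
    """
    last_by_language = {language: 0.0 for language in languages}
    for language, end_seconds in captions:
        previous = last_by_language.get(language, 0.0)
        last_by_language[language] = max(previous, end_seconds)
    # The lagging language decides where the audio is trimmed.
    trim_from = min(last_by_language.values()) if last_by_language else 0.0
    return trim_from, last_by_language


def should_add_caption(language, begin_seconds, last_by_language):
    """Only add a caption once we are past what that language already has."""
    return begin_seconds >= last_by_language.get(language, 0.0)


captions = [("en", 10.0), ("en", 120.0), ("es", 10.0), ("es", 90.0)]
trim_from, last_by_language = restart_plan(captions)
```

Here `es` stopped at 90 s while `en` reached 120 s, so the audio is trimmed from 90 s. A new `en` caption beginning at 100 s is skipped because `en` already covers that time, while an `es` caption at 100 s is added.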

This is useful because sometimes one particular language is lagging e.g. it stopped when one translation never arrived.

However, some videos have large portions of time where there is no transcription (event=NOMATCH). We don't want to have to transcribe that audio again when we do a restart, but simply recording the lastsuccesstime is insufficient.

Also, if we add a new translation, it would start from the beginning (and use the uncorrected transcriptions).

This suggests that a future design should separate the transcription from the translation. That would save some credits, because we would not pay for NOMATCH regions twice if we have to restart the task. (It would also allow translation of artificially inserted captions, e.g. [silence].)

The worst case is an hour-long silence that fails halfway: the restart would begin again from the start. Fortunately, "ServiceTimeouts" do not seem to occur if there are no transcriptions to translate.


Transcription restarts corner cases difficult to handle with combined transcription+translation #117
