Transcription updates will fail (by design) if there are multiple Transcription entities for the same language and video.
Some comments:
Most of this invalid data is 2020 data, but we should address the whole dataset and then implement a constraint, together with adding a new column, e.g. "source" or "kind", to allow multiple transcripts per video.
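As a minimal sketch of what that could look like, assuming a SQLAlchemy-style model (the table, column, and constraint names here are illustrative, not the actual schema):

```python
from sqlalchemy import Column, Integer, String, UniqueConstraint
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Transcription(Base):
    __tablename__ = "transcription"  # hypothetical table name

    id = Column(Integer, primary_key=True)
    video_id = Column(Integer, nullable=False)
    language = Column(String, nullable=False)
    # New discriminator column so multiple transcripts per video/language
    # can coexist once the duplicates are cleaned up, e.g. "asr" vs "manual".
    kind = Column(String, nullable=False, default="asr")

    __table_args__ = (
        # Enforce uniqueness per (video, language, kind) up front,
        # rather than failing at update time.
        UniqueConstraint("video_id", "language", "kind",
                         name="uq_transcription_video_language_kind"),
    )
```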
The new implementation looks for the min and max across all languages, i.e. find the last caption for each language, determine the earliest of those, and trim the audio from there. The per-language max time is then used to ensure captions are only added once we reach an unprocessed time for that particular language.
This is useful because sometimes one particular language lags behind, e.g. processing stopped when one translation never arrived.
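A rough sketch of that min/max logic (the function name and data shapes are assumptions; the real code presumably reads caption end times from the database):

```python
from typing import Dict, Tuple

def compute_restart_points(
    last_caption_end: Dict[str, float],
) -> Tuple[float, Dict[str, float]]:
    """Given the end time (seconds) of the last stored caption per language,
    decide where to trim the audio and, per language, from when to start
    accepting new captions.

    - Trim point: the earliest last-caption time across languages, so the
      most-lagging language is not left with a gap.
    - Per-language threshold: each language only accepts captions beyond
      what it already has, avoiding duplicates for languages that were ahead.
    """
    trim_from = min(last_caption_end.values())  # earliest "last caption"
    accept_after = dict(last_caption_end)       # per-language max time
    return trim_from, accept_after

# Example: French lagged behind because one translation never arrived.
trim, thresholds = compute_restart_points({"en": 1820.5, "fr": 1714.0})
# trim == 1714.0             -> re-feed audio from 1714s
# thresholds["en"] == 1820.5 -> drop English captions before 1820.5s
```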
However, some videos have large stretches of time with no transcription (event=NOMATCH). We don't want to transcribe that audio again when we restart, but simply recording the lastsuccesstime is insufficient.
Also, if we add a new translation language, it would start from the beginning (and use the uncorrected transcriptions).
This suggests a future design should separate transcription from translation. That would save some credits, since we wouldn't pay for NOMATCH regions twice if we have to restart the task. (It would also allow translation of artificially inserted captions, e.g. [silence].)
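As a sketch of that split (every name here is hypothetical, and the ASR/translation calls are stubbed), the transcription pass would persist all segments, NOMATCH gaps included, so a restart or a newly added language only pays for the translation step:

```python
from dataclasses import dataclass
from typing import Iterable, List

@dataclass
class Segment:
    start: float
    end: float
    text: str   # "" for NOMATCH regions, or an inserted marker like "[silence]"
    event: str  # "MATCH" or "NOMATCH"

def transcribe_once(audio_path: str) -> List[Segment]:
    """Run (paid) speech recognition a single time and persist every segment,
    NOMATCH gaps included, so the audio never needs a second pass."""
    raise NotImplementedError  # hypothetical ASR call

def translate_text(text: str, target_lang: str) -> str:
    raise NotImplementedError  # hypothetical translation API call

def translate(segments: Iterable[Segment], target_lang: str) -> List[Segment]:
    """Translate stored (possibly human-corrected) transcription segments.
    Adding a new language, or restarting, re-runs only this step; artificial
    captions such as [silence] can be translated like any other text."""
    out: List[Segment] = []
    for seg in segments:
        if seg.event == "NOMATCH" and not seg.text:
            out.append(seg)  # nothing to translate, no credits spent
        else:
            out.append(Segment(seg.start, seg.end,
                               translate_text(seg.text, target_lang),
                               seg.event))
    return out
```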
The worst case is an hour-long silence that fails halfway through: the restart would begin from the beginning. Fortunately, "ServiceTimeouts" do not seem to occur if there are no transcriptions to translate.