I'm using the code provided by Hugging Face for Seamless M4T v1 to translate audio files that I extracted from MP4 video recordings with ffmpeg (command below for reference).
ffmpeg -i video_recording.mp4 -vn -acodec pcm_s16le -t 30 video_recording_0%d.wav
My understanding is that Seamless M4T v1 was trained on 16 kHz audio. I have a couple of questions.
If the audio files I provide have an original sample rate of 48 kHz and the code resamples them to 16 kHz, would that throw off the translations?
If Seamless was trained on 16 kHz audio, can I pass it 48 kHz audio directly, or would that produce suboptimal translations?
Can Seamless output the intermediate transcription of the audio before it performs the translation?
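For context on the first two questions: 48 kHz to 16 kHz is an exact decimation by 3, so as I understand it the resampling step amounts to a low-pass filter below the new 8 kHz Nyquist limit followed by keeping every third sample. Below is a minimal pure-Python sketch of that idea. The helper names (`lowpass_taps`, `decimate_by_3`, `tone`, `rms`), the 63-tap Hamming filter, and the 7 kHz cutoff are my own illustrative choices, not what the Hugging Face example or Seamless uses internally.

```python
import math

def lowpass_taps(num_taps=63, cutoff_hz=7000.0, sample_rate=48000.0):
    """Windowed-sinc (Hamming) low-pass FIR taps, normalized to unity DC gain."""
    fc = cutoff_hz / sample_rate          # cutoff as a fraction of the sample rate
    mid = (num_taps - 1) / 2              # center of the symmetric filter
    taps = []
    for n in range(num_taps):
        x = n - mid
        sinc = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]

def decimate_by_3(samples, taps):
    """Low-pass filter, then keep every third sample (48 kHz -> 16 kHz)."""
    half = len(taps) // 2
    out = []
    for center in range(0, len(samples), 3):
        acc = 0.0
        for k, tap in enumerate(taps):
            idx = center + k - half
            if 0 <= idx < len(samples):   # treat samples outside the signal as zero
                acc += tap * samples[idx]
        out.append(acc)
    return out

def tone(freq_hz, sample_rate, num_samples):
    """Unit-amplitude sine wave, used here as a stand-in for real audio."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate) for n in range(num_samples)]

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

taps = lowpass_taps()
# 0.1 s of a 1 kHz tone: well inside the band that survives at 16 kHz.
speech_band = decimate_by_3(tone(1000, 48000, 4800), taps)
# 0.1 s of a 12 kHz tone: above the new Nyquist limit, so the filter should remove it.
above_nyquist = decimate_by_3(tone(12000, 48000, 4800), taps)

print(len(speech_band))                   # 4800 input samples -> 1600 output samples
print(rms(speech_band[20:-20]), rms(above_nyquist[20:-20]))
```

The point of the sketch is only that content below roughly 7 kHz passes through the 48 kHz to 16 kHz conversion essentially unchanged, while anything above 8 kHz is discarded. It does not tell me whether the model itself is sensitive to how the resampling is done, which is what I am asking about.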