
Why is the simple_stream example so slow? Over 31 seconds to transcribe the wav, which is slower than typing #99

Open
Boscop opened this issue Oct 9, 2024 · 1 comment

Comments


Boscop commented Oct 9, 2024

Hi, thanks for making this crate 🙂

I'm trying to figure out why websocket transcription, e.g. via the simple_stream example, is so slow for me.
I already commented out these lines, hoping that would speed it up:

//        .endpointing(Endpointing::CustomDurationMs(300))
//        .interim_results(true)
//        .utterance_end_ms(1000)
//        .vad_events(true)

But it still takes over 31 seconds to transcribe this wav audio, which is MUCH slower than it would be to type what is said in that wav!

$ time cargo run --example simple_stream
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.17s
     Running `target\debug\examples\simple_stream.exe`
Deepgram Request ID: [...]
got: Ok(TranscriptResponse { type_field: "Results", start: 0.0, duration: 2.24, is_final: true, speech_final: true, from_finalize: false, channel: Channel { alternatives: [Alternatives { transcript: "", words: [], confidence: 0.0 }] }, metadata: Metadata { request_id: "e438b919-52a7-4e6e-a53c-b5e06f6a97ac", model_info: ModelInfo { name: "general", version: "2024-01-26.8851", arch: "base" }, model_uuid: "1ed36bac-f71c-4f3f-a31f-02fd6525c489" }, channel_index: [0, 1] })
got: Ok(TranscriptResponse { type_field: "Results", start: 2.24, duration: 1.53, is_final: true, speech_final: true, from_finalize: false, channel: Channel { alternatives: [Alternatives { transcript: "", words: [], confidence: 0.0 }] }, metadata: Metadata { request_id: "e438b919-52a7-4e6e-a53c-b5e06f6a97ac", model_info: ModelInfo { name: "general", version: "2024-01-26.8851", arch: "base" }, model_uuid: "1ed36bac-f71c-4f3f-a31f-02fd6525c489" }, channel_index: [0, 1] })
got: Ok(TranscriptResponse { type_field: "Results", start: 3.77, duration: 2.46, is_final: true, speech_final: true, from_finalize: false, channel: Channel { alternatives: [Alternatives { transcript: "Yep.", words: [Word { word: "yep", start: 5.6272583, end: 5.864355, confidence: 0.99365234, speaker: None, punctuated_word: Some("Yep.") }], confidence: 0.99365234 }] }, metadata: Metadata { request_id: "e438b919-52a7-4e6e-a53c-b5e06f6a97ac", model_info: ModelInfo { name: "general", version: "2024-01-26.8851", arch: "base" }, model_uuid: "1ed36bac-f71c-4f3f-a31f-02fd6525c489" }, channel_index: [0, 1] })
got: Ok(TranscriptResponse { type_field: "Results", start: 6.23, duration: 0.84000015, is_final: true, speech_final: true, from_finalize: false, channel: Channel { alternatives: [Alternatives { transcript: "", words: [], confidence: 0.0 }] }, metadata: Metadata { request_id: "e438b919-52a7-4e6e-a53c-b5e06f6a97ac", model_info: ModelInfo { name: "general", version: "2024-01-26.8851", arch: "base" }, model_uuid: "1ed36bac-f71c-4f3f-a31f-02fd6525c489" }, channel_index: [0, 1] })
got: Ok(TranscriptResponse { type_field: "Results", start: 7.07, duration: 2.0899997, is_final: true, speech_final: true, from_finalize: false, channel: Channel { alternatives: [Alternatives { transcript: "I said it before and I'll say it again.", words: [Word { word: "i", start: 7.27, end: 7.4300003, confidence: 0.9091797, speaker: None, punctuated_word: Some("I") }, Word { word: "said", start: 7.4300003, end: 7.59, confidence: 0.85791016, speaker: None, punctuated_word: Some("said") }, Word { word: "it", start: 7.59, end: 7.83, confidence: 0.9980469, speaker: None, punctuated_word: Some("it") }, Word { word: "before", start: 7.83, end: 8.07, confidence: 0.9970703, speaker: None, punctuated_word: Some("before") }, Word { word: "and", start: 8.07, end: 8.15, confidence: 0.9980469, speaker: None, punctuated_word: Some("and") }, Word { word: "i'll", start: 8.2300005, end: 8.39, confidence: 0.9897461, speaker: None, punctuated_word: Some("I'll") }, Word { word: "say", start: 8.39, end: 8.47, confidence: 0.99853516, speaker: None, punctuated_word: Some("say") }, Word { word: "it", start: 8.47, end: 8.71, confidence: 0.9980469, speaker: None, punctuated_word: Some("it") }, Word { word: "again", start: 8.71, end: 8.87, confidence: 0.9995117, speaker: None, punctuated_word: Some("again.") }], confidence: 0.9980469 }] }, metadata: Metadata { request_id: "e438b919-52a7-4e6e-a53c-b5e06f6a97ac", model_info: ModelInfo { name: "general", version: "2024-01-26.8851", arch: "base" }, model_uuid: "1ed36bac-f71c-4f3f-a31f-02fd6525c489" }, channel_index: [0, 1] })
got: Ok(TranscriptResponse { type_field: "Results", start: 9.16, duration: 0.89000034, is_final: true, speech_final: true, from_finalize: false, channel: Channel { alternatives: [Alternatives { transcript: "", words: [], confidence: 0.0 }] }, metadata: Metadata { request_id: "e438b919-52a7-4e6e-a53c-b5e06f6a97ac", model_info: ModelInfo { name: "general", version: "2024-01-26.8851", arch: "base" }, model_uuid: "1ed36bac-f71c-4f3f-a31f-02fd6525c489" }, channel_index: [0, 1] })
got: Ok(TranscriptResponse { type_field: "Results", start: 10.05, duration: 1.6599998, is_final: true, speech_final: true, from_finalize: false, channel: Channel { alternatives: [Alternatives { transcript: "Life moves pretty fast.", words: [Word { word: "life", start: 10.167857, end: 10.403572, confidence: 0.97802734, speaker: None, punctuated_word: Some("Life") }, Word { word: "moves", start: 10.403572, end: 10.717857, confidence: 0.99072266, speaker: None, punctuated_word: Some("moves") }, Word { word: "pretty", start: 10.717857, end: 11.032143, confidence: 0.99853516, speaker: None, punctuated_word: Some("pretty") }, Word { word: "fast", start: 11.032143, end: 11.425, confidence: 0.99902344, speaker: None, punctuated_word: Some("fast.") }], confidence: 0.99853516 }] }, metadata: Metadata { request_id: "e438b919-52a7-4e6e-a53c-b5e06f6a97ac", model_info: ModelInfo { name: "general", version: "2024-01-26.8851", arch: "base" }, model_uuid: "1ed36bac-f71c-4f3f-a31f-02fd6525c489" }, channel_index: [0, 1] })
got: Ok(TranscriptResponse { type_field: "Results", start: 11.71, duration: 2.87, is_final: true, speech_final: true, from_finalize: false, channel: Channel { alternatives: [Alternatives { transcript: "You don't stop and look around once in a while.", words: [Word { word: "you", start: 12.146944, end: 12.305834, confidence: 0.99902344, speaker: None, punctuated_word: Some("You") }, Word { word: "don't", start: 12.305834, end: 12.623611, confidence: 0.99658203, speaker: None, punctuated_word: Some("don't") }, Word { word: "stop", start: 12.623611, end: 12.7825, confidence: 0.99902344, speaker: None, punctuated_word: Some("stop") }, Word { word: "and", start: 12.7825, end: 12.941389, confidence: 0.97021484, speaker: None, punctuated_word: Some("and") }, Word { word: "look", start: 12.941389, end: 13.179722, confidence: 0.9941406, speaker: None, punctuated_word: Some("look") }, Word { word: "around", start: 13.179722, end: 13.418056, confidence: 0.9995117, speaker: None, punctuated_word: Some("around") }, Word { word: "once", start: 13.418056, end: 13.656389, confidence: 0.9995117, speaker: None, punctuated_word: Some("once") }, Word { word: "in", start: 13.656389, end: 13.735833, confidence: 0.97802734, speaker: None, punctuated_word: Some("in") }, Word { word: "a", start: 13.735833, end: 13.894722, confidence: 0.95654297, speaker: None, punctuated_word: Some("a") }, Word { word: "while", start: 13.894722, end: 14.053612, confidence: 0.98535156, speaker: None, punctuated_word: Some("while.") }], confidence: 0.99658203 }] }, metadata: Metadata { request_id: "e438b919-52a7-4e6e-a53c-b5e06f6a97ac", model_info: ModelInfo { name: "general", version: "2024-01-26.8851", arch: "base" }, model_uuid: "1ed36bac-f71c-4f3f-a31f-02fd6525c489" }, channel_index: [0, 1] })
got: Ok(TranscriptResponse { type_field: "Results", start: 14.58, duration: 3.0094376, is_final: true, speech_final: false, from_finalize: true, channel: Channel { alternatives: [Alternatives { transcript: "You could miss it.", words: [Word { word: "you", start: 14.777369, end: 14.935263, confidence: 0.99853516, speaker: None, punctuated_word: Some("You") }, Word { word: "could", start: 14.935263, end: 15.093158, confidence: 0.99365234, speaker: None, punctuated_word: Some("could") }, Word { word: "miss", start: 15.093158, end: 15.251053, confidence: 0.9975586, speaker: None, punctuated_word: Some("miss") }, Word { word: "it", start: 15.251053, end: 15.408947, confidence: 0.9946289, speaker: None, punctuated_word: Some("it.") }], confidence: 0.9975586 }] }, metadata: Metadata { request_id: "e438b919-52a7-4e6e-a53c-b5e06f6a97ac", model_info: ModelInfo { name: "general", version: "2024-01-26.8851", arch: "base" }, model_uuid: "1ed36bac-f71c-4f3f-a31f-02fd6525c489" }, channel_index: [0, 1] })
got: Ok(TranscriptResponse { type_field: "Results", start: 17.589437, duration: 6.2942505e-5, is_final: true, speech_final: false, from_finalize: false, channel: Channel { alternatives: [Alternatives { transcript: "", words: [], confidence: 0.0 }] }, metadata: Metadata { request_id: "e438b919-52a7-4e6e-a53c-b5e06f6a97ac", model_info: ModelInfo { name: "general", version: "2024-01-26.8851", arch: "base" }, model_uuid: "1ed36bac-f71c-4f3f-a31f-02fd6525c489" }, channel_index: [0, 1] })
got: Ok(TerminalResponse { request_id: [...], created: "2024-10-09T09:06:55.964Z", duration: 17.5895, channels: 1 })

real    0m31.475s
user    0m0.015s
sys     0m0.015s

I'm looking for transcription that's faster than typing, so I can implement push-to-talk voice typing in my app.
If it's slower than typing, there's no point in transcribing voice input.

Any idea how I can speed it up? I'd really appreciate it 🙂

DamienDeepgram (Contributor) commented Dec 2, 2024

Streaming transcribes real-time audio.

If the file you are streaming is 31 seconds long, it gets streamed at 1 second of audio per second of wall-clock time, so it takes about 31 seconds to process.
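
In other words, the wall-clock time is dominated by the pacing of the audio, not by the transcription itself. As a rough illustration (a sketch of the pacing idea only, not the SDK's actual code; send_chunk is a hypothetical stand-in for the websocket write):

use std::time::Duration;
use tokio::time::sleep;

// Sending N seconds of audio at real-time pace takes about N seconds of wall-clock time.
async fn send_realtime(chunks: Vec<Vec<u8>>, chunk_duration: Duration) {
    for chunk in chunks {
        send_chunk(&chunk).await;    // hand one chunk to the connection
        sleep(chunk_duration).await; // wait so audio flows at 1 s of audio per 1 s of wall clock
    }
}

async fn send_chunk(_chunk: &[u8]) {
    // placeholder for the actual websocket send in the example
}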

If you want to transcribe a file and not do real-time streaming, use our pre-recorded API.
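
For reference, here is a minimal sketch of that flow with this crate's pre-recorded API. Module paths and whether Deepgram::new returns a Result differ between SDK versions, and the wav path and option values below are placeholders, so check the prerecorded examples in this repo for the exact current API:

use deepgram::{
    common::{audio_source::AudioSource, options::Options},
    Deepgram,
};

#[tokio::main]
async fn main() {
    // Placeholder: read the API key from the environment.
    let api_key = std::env::var("DEEPGRAM_API_KEY").expect("DEEPGRAM_API_KEY must be set");
    // Newer SDK versions return a Result here; on older ones drop the expect.
    let dg = Deepgram::new(api_key).expect("failed to build client");

    // Send the whole file in one request instead of pacing it over a websocket.
    let audio = std::fs::read("path/to/your.wav").expect("failed to read wav");
    let source = AudioSource::from_buffer_with_mime_type(audio, "audio/wav");

    let options = Options::builder().punctuate(true).build();

    let response = dg
        .transcription()
        .prerecorded(source, &options)
        .await
        .expect("transcription request failed");

    println!("{}", response.results.channels[0].alternatives[0].transcript);
}

With this, the request returns as soon as the API has processed the file, rather than being paced out at the file's own duration.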
