I'm trying to build a client-server streaming setup in which clients would:
- pre-process audio
- extract MELs
- perform VAD
- remove silence from the data sent to the server
The server (whisper.cpp) could then take MELs as streaming input (shifting new samples into a 30-second window) and avoid spending some 10-15% of CPU time repeating the PCM -> MEL conversion each time.
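To illustrate what I mean by shifting into the window, here is a minimal sketch of the buffer I have in mind on the server side. `MelWindow` is a hypothetical helper of my own, not part of whisper.cpp. The sizes in the usage note (80 MEL bins, 3000 frames for 30 seconds at 16 kHz with a hop of 160 samples) and the bin-major layout are my assumptions about what the MEL interface expects and would need to be checked against the headers.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a whisper.cpp type): a fixed-length MEL window
// that drops the oldest frames when new ones are pushed in.
struct MelWindow {
    int n_mel;               // number of MEL bins per frame
    int n_len;               // number of frames held in the window
    std::vector<float> data; // assumed bin-major layout: data[bin * n_len + frame]

    MelWindow(int n_mel_, int n_len_, float fill = 0.0f)
        : n_mel(n_mel_), n_len(n_len_),
          data(static_cast<std::size_t>(n_mel_) * n_len_, fill) {}

    // Push n_new frames received from a client. The input is frame-major:
    // frames[f * n_mel + bin]. If more frames arrive than the window holds,
    // only the newest n_len frames are kept.
    void push(const float * frames, int n_new) {
        if (n_new <= 0) {
            return;
        }
        const int skip     = std::max(0, n_new - n_len); // oldest incoming frames to drop
        const int keep_new = n_new - skip;               // incoming frames actually stored
        const int keep_old = n_len - keep_new;           // existing frames that survive
        for (int b = 0; b < n_mel; ++b) {
            float * row = data.data() + static_cast<std::size_t>(b) * n_len;
            // Shift the surviving frames to the front of the row.
            std::copy(row + keep_new, row + n_len, row);
            // Write the new frames at the end of the row.
            for (int f = 0; f < keep_new; ++f) {
                row[keep_old + f] = frames[static_cast<std::size_t>(skip + f) * n_mel + b];
            }
        }
    }
};
```

With `MelWindow window(80, 3000);`, each incoming client packet would be passed to `window.push(...)`, and `window.data.data()` would then be the buffer I would expect to hand to the MEL interface, if that is indeed how it is meant to be used.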
I was looking at the stream example, but it seems to work on PCM samples rather than on MELs.
What's the procedure for using the MEL interface? I'm trying to understand which of these to use: whisper_pcm_to_mel, whisper_pcm_to_mel_with_state, whisper_set_mel, whisper_set_mel_with_state?
How should I initialize the context and state before I start shifting in MELs?
Also:
struct whisper_state {
    ...
    std::vector<float> energy; // PCM signal energy
};
Does the energy in whisper_state need to be extracted from PCM, or can it be derived from the MELs? Does it need any special handling?
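To make the question concrete: if the energy does have to come from PCM, the client could compute something like the following and send it alongside the MELs. This is only a sketch of one plausible definition, a moving average of absolute amplitude. `signal_energy` and its `half_window` parameter are hypothetical names of mine, and I have not confirmed that whisper.cpp defines the energy this way.

```cpp
#include <cmath>
#include <vector>

// Hypothetical client-side helper: per-sample energy as the mean absolute
// amplitude over a window of (2 * half_window + 1) samples centred on each
// sample. Samples outside the signal count as zero.
std::vector<float> signal_energy(const std::vector<float> & pcm, int half_window) {
    const int n = static_cast<int>(pcm.size());
    std::vector<float> out(n, 0.0f);
    for (int i = 0; i < n; ++i) {
        float sum = 0.0f;
        for (int j = -half_window; j <= half_window; ++j) {
            const int k = i + j;
            if (k >= 0 && k < n) {
                sum += std::fabs(pcm[k]);
            }
        }
        out[i] = sum / static_cast<float>(2 * half_window + 1);
    }
    return out;
}
```

The output has one value per PCM sample, so it would be much larger than the MEL data unless it is downsampled before sending. That is part of why I would prefer to derive it from the MELs if that is possible.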