Implement whisper.cpp #51
Comments
We added a basic version of whisper.cpp to the Container image here: |
Can this be closed? Do we have this functionality now? |
There's still more work for both |
What do you see an example command line looking like here? I've toyed a bit with whisper.cpp and find its use to be very different from launching a gradio server or a chat instance. AFAICT, you're usually calling the main executable with an argument for the model and one or more additional arguments describing the file to transcribe and options.
How would you feel about, as an alternative, dropping users into a [containerized] shell with access to input/output/model volumes and possibly some helper scripts to accomplish simple tasks? So, perhaps,
Also, should the container files be pulling the latest whisper.cpp instead of the latest known to be good for Ramalama? |
I would say just this to start:
which would just perform a fairly standard whisper.cpp command. There's no interactive support, since it's not the same as a chatbot workflow. We have renovate controlling which version of whisper.cpp we build against, and it runs everything through CI before we rebase. I'd like to keep this, so that there's at least a CI run before we change version. I'd rather not just clone main/master without CI. |
@p5 kindly set up renovate for us. |
Could you elaborate, please? What command would be performed? Would that be provided through additional command-line arguments to the command you've just given? Would you pass that as though it were a prompt to another model? The boilerplate syntax for using whisper.cpp directly is something like
Sorry to be dense, but I can't tell if you're reiterating that whisper.cpp does not provide interactive support or if you're saying that you do not like the concept of dropping the user to a shell prompt with some workspace-like features. |
I meant something like this, corrected:
|
We are always open to ideas. It's just less relevant with whisper.cpp, one doesn't speak to an interactive prompt. But if someone finds that useful for some reason, always happy to look at PRs, etc. |
This PR is a perfect example of why we don't just clone main/master: |
Yes, failing each time whisper.cpp is updated is perhaps a better solution than blindly updating along with it. Anyway, back to the meat of my question...
I don't care at all about that. I just saw the seemingly abandoned whisper.cpp stub in the container and thought to make it usable. It is not currently usable, right? Do you care to see it made usable? |
Yup we sure do, this issue is open for someone to complete it :) |
Great. I'm just having trouble understanding what "complete it" means to you. That's why I'm asking about the example command-lines you're envisioning.
|
We are open to ideas. Rome wasn't built in a day, one PR at a time.
Getting this to execute the main command you specified would be a start:
ramalama --runtime whisper.cpp run ggml-large-v3-turbo.bin jfk.wav |
Thanks. And in the case that Ramalama is running the inferencing in a container, how do we get the input media into the model? Would you infer a directory to bind based on the working directory/file given as an argument? |
Yes, the wav file would need to be volume mounted into the container with a :z option. Would it make sense to also allow stdin for a wav file?
cat jfk.wav | ramalama --runtime whisper.cpp run ggml-large-v3-turbo.bin -
I think it makes sense to allow grabbing output from stdout:
cat jfk.wav | ramalama --runtime whisper.cpp run ggml-large-v3-turbo.bin - > /tmp/output
Not sure what ramalama serve would do? |
Thank you for helping to bring me up to speed with some mock usage examples. I'm still pretty novice at containers, but could we bind the directory read-only instead of using :z? A dryrun (love that feature, btw) of something like(?):
It's more appealing in principle than binding directories, but whether or not it is practical is currently unknown to me. It has been a verrrrry long time since I've relied on a shell to create pipes for large data and I don't quite remember what the gotchas were, but I'm thinking there were at least a few. I know even less about what might happen trying to pipe potentially massive, uncompressed audio files into the container on STDIN.
Redirects aren't adequate?
If the documentation on the whisper.cpp github is accurate, it seems like the provided web service is very basic. All the examples are making requests via curl from the command-line. I totally get it and that isn't criticism, but we're not talking about a convenient UI AFAICT. I am aware that there exist some third-party front-ends, like https://github.com/litongjava/whisper-cpp-server + https://github.com/litongjava/listen-know-web, but I don't really know anything about them. For me, personally, the most likely use-case for whisper.cpp is in generating subtitles for arbitrary videos (upon audio extracted w/ ffmpeg) or for transcribing and translating arbitrary audio sequences. Command-line invocation over a bound directory seems adequate and possibly ideal. It probably doesn't require any code overlap with whisper.cpp, using Ramalama as scaffolding to bring all the pieces together and offering a layer of abstraction wrt GPU config, system libraries, etc. But I'm not deep into any of these techs or projects, so if there's a better vision / direction I'd love to hear about it. |
The stdout stuff should just work; we could even just have whisper grab /dev/stdin when it sees the "-". The volume mount can also be marked ro,z so that it is both read-only and readable under SELinux. |
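To make the two input paths discussed above concrete, here is a minimal Python sketch of how a podman command line might be assembled: a file argument becomes a read-only, SELinux-relabelled bind mount (ro,z), while "-" is mapped to /dev/stdin inside the container. The image name, the binary name and the in-container paths are placeholders invented for illustration and are not RamaLama's actual values. Whether whisper.cpp reads a WAV stream correctly from /dev/stdin is the untested assumption from this thread.

```python
import os
import shlex

# Hypothetical placeholders, not RamaLama's real image or binary names.
IMAGE = "localhost/whisper-example:latest"
WHISPER_BIN = "whisper-cli"


def build_podman_args(model_path, audio_arg):
    """Return a podman argv list that transcribes audio_arg with model_path."""
    model_path = os.path.abspath(model_path)
    # -i keeps stdin open so piped audio can reach the container.
    args = ["podman", "run", "--rm", "-i"]
    # The model is bind-mounted read-only with an SELinux relabel (ro,z).
    args += ["-v", f"{model_path}:/mnt/model.bin:ro,z"]
    if audio_arg == "-":
        # Assumption: the transcriber can read the audio from /dev/stdin.
        audio_in_container = "/dev/stdin"
    else:
        audio_path = os.path.abspath(audio_arg)
        args += ["-v", f"{audio_path}:/mnt/input.wav:ro,z"]
        audio_in_container = "/mnt/input.wav"
    # -m (model) and -f (input file) are the usual whisper.cpp options.
    args += [IMAGE, WHISPER_BIN, "-m", "/mnt/model.bin", "-f", audio_in_container]
    return args


if __name__ == "__main__":
    # Print the command instead of running it, similar in spirit to a dry run.
    print(shlex.join(build_podman_args("ggml-large-v3-turbo.bin", "jfk.wav")))
    print(shlex.join(build_podman_args("ggml-large-v3-turbo.bin", "-")))
```

Returning an argv list rather than a shell string avoids quoting problems with file names that contain spaces; GPU options such as --device would be appended in the same way.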
|
Slightly simpler. podman run --rm -i --device nvidia.com/gpu=all |
Looks like |
It does look that way, though I haven't prepared any large wav files to test with. Thank you - your way is much better, I think, than binding a directory for a process that will only require one input. Especially if it creates the possibility of chaining ffmpeg without intermediate conversion files. So, consensus seems to be that Thanks |
stdin seems fine to me. We can also autodetect when there is stdin coming in, which is better for usability since you don't need the explicit '-' then, although we can still allow explicitly selecting stdin with '-'. grep and llama-run are examples of commands that do this. Once llama-run is integrated into RamaLama, I plan on adding this to ramalama run:
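The autodetection described here could look roughly like the following Python sketch. The function name and return shape are hypothetical and are not taken from RamaLama or llama-run. It relies only on the standard isatty() check: if no file was named and stdin is not a terminal, something is probably being piped in.

```python
import sys


def resolve_audio_source(audio_arg, stdin=None):
    """Return ("stdin", None) or ("file", path) for the given CLI argument.

    audio_arg is the positional audio argument, or None if it was omitted.
    """
    stdin = sys.stdin if stdin is None else stdin
    if audio_arg == "-":
        # Explicit request to read from stdin.
        return ("stdin", None)
    if audio_arg is None:
        # No file named: use stdin only if it is a pipe or redirect.
        if not stdin.isatty():
            return ("stdin", None)
        raise ValueError("no audio file given and stdin is a terminal")
    return ("file", audio_arg)
```

With this, `cat jfk.wav | tool model.bin` and `cat jfk.wav | tool model.bin -` would both resolve to stdin, while running the tool with no input from an interactive terminal fails early instead of hanging.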
llm-gguf and ollama have this feature. |
Cool |
If there is a way to auto-detect between language model files and ASR model files, we should do that. If that's not possible, we should just use a runtime flag. Some options for the runtime flag would be:
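One possible starting point for the auto-detection idea is to look at the first four bytes of the model file. This is only a heuristic sketch: GGUF files begin with the ASCII bytes "GGUF", and the whisper.cpp ggml-*.bin models are assumed here to begin with the legacy ggml magic 0x67676d6c stored as a little-endian 32-bit integer. Older llama.cpp ggml files used the same legacy magic, so a "ggml" result does not prove the file is an ASR model. The function name and return values are hypothetical.

```python
import struct

GGUF_MAGIC = b"GGUF"     # first four bytes of a GGUF file
GGML_MAGIC = 0x67676D6C  # legacy ggml magic, read as a little-endian uint32


def guess_model_kind(path):
    """Return "gguf", "ggml" or "unknown" based on the file's first 4 bytes."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head == GGUF_MAGIC:
        return "gguf"
    if len(head) == 4 and struct.unpack("<I", head)[0] == GGML_MAGIC:
        # Assumption: whisper.cpp models use this legacy header.
        return "ggml"
    return "unknown"
```

A caller could map "gguf" to the llama.cpp runtime and "ggml" to whisper.cpp, and fall back to an explicit runtime flag when the result is "unknown" or the guess turns out to be wrong.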