Add option to set whisper recording command #91

Merged
merged 6 commits into from
Jan 20, 2024

Conversation

jayeheffernan
Contributor

Hello, and thanks for the plugin! I had a little issue getting whisper to work, so I'm submitting a fix for your consideration.

This PR adds a config option to manually specify which command (sox, ffmpeg, or arecord) is used for recording by commands like GpWhisper. For example, set whisper_rec_cmd = 'sox' in .setup().
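For illustration, a minimal sketch of that setting in a Neovim config. The `require("gp")` module path is inferred from gp/init.lua, and every other option is assumed to stay at its default:

```lua
-- Minimal sketch: force the recorder instead of relying on autodetection.
-- Assumption: the plugin is loaded as the "gp" module (inferred from gp/init.lua).
require("gp").setup({
    -- One of "sox", "ffmpeg", or "arecord", as described in this PR.
    whisper_rec_cmd = "sox",
})
```

The chosen tool still has to be installed and on the PATH; per the note further down, an invalid value only surfaces when you try to record.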

I had an issue using GpWhisper: the output was always just "you". The recordings (rec.wav) were always the correct length but contained only silence. I think the problem is that the ffmpeg options select audio input device :0, which doesn't work in my case (in the device listing below, audio index 0 is ZoomAudioDevice rather than the microphone). Modifying gp/init.lua to always choose rec_cmd = "sox" works fine for me. There is probably a way to inspect the audio input devices and improve the autodetection, but I'm not sure how to do that well, and I suspect this is not a common issue anyway.

Debugging my issue with audio devices...

Here's some info from a terminal session of me figuring out what was going on, if it helps.

Screenshot with notes (tmux)

Raw text output

/tmp/gp_whisper ❯ ffmpeg -devices -v quiet | grep -i avfoundation | wc -l
       1
/tmp/gp_whisper ❯ ffmpeg -devices -v quiet | grep -i avfoundation
 D  avfoundation    AVFoundation input device
/tmp/gp_whisper ❯ ffmpeg -devices -v quiet
Devices:
 D. = Demuxing supported
 .E = Muxing supported
 --
  E audiotoolbox    AudioToolbox output device
 D  avfoundation    AVFoundation input device
 D  lavfi           Libavfilter virtual input device
  E sdl,sdl2        SDL2 output device
 D  x11grab         X11 screen capture, using XCB
/tmp/gp_whisper ❯ ffmpeg -f avfoundation -list_devices true -i ""
ffmpeg version 6.0 Copyright (c) 2000-2023 the FFmpeg developers
  built with Apple clang version 14.0.3 (clang-1403.0.22.14.1)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/6.0_1 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libaribb24 --enable-libbluray --enable-libdav1d --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libsvtav1 --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox --enable-audiotoolbox
  libavutil      58.  2.100 / 58.  2.100
  libavcodec     60.  3.100 / 60.  3.100
  libavformat    60.  3.100 / 60.  3.100
  libavdevice    60.  1.100 / 60.  1.100
  libavfilter     9.  3.100 /  9.  3.100
  libswscale      7.  1.100 /  7.  1.100
  libswresample   4. 10.100 /  4. 10.100
  libpostproc    57.  1.100 / 57.  1.100
[AVFoundation indev @ 0x7fe4d6f04a00] AVFoundation video devices:
[AVFoundation indev @ 0x7fe4d6f04a00] [0] FaceTime HD Camera (Built-in)
[AVFoundation indev @ 0x7fe4d6f04a00] [1] LG UltraFine Display Camera
[AVFoundation indev @ 0x7fe4d6f04a00] [2] Snap Camera
[AVFoundation indev @ 0x7fe4d6f04a00] [3] Capture screen 0
[AVFoundation indev @ 0x7fe4d6f04a00] [4] Capture screen 1
[AVFoundation indev @ 0x7fe4d6f04a00] [5] Capture screen 2
[AVFoundation indev @ 0x7fe4d6f04a00] AVFoundation audio devices:
[AVFoundation indev @ 0x7fe4d6f04a00] [0] ZoomAudioDevice
[AVFoundation indev @ 0x7fe4d6f04a00] [1] MacBook Pro Microphone
[AVFoundation indev @ 0x7fe4d6f04a00] [2] LG UltraFine Display Audio

Screenshot of new error message in action

If you pick an invalid value, you'll find out when you try to record:

(screenshot: tmux)

Owner

Robitx commented Jan 20, 2024

@jayeheffernan thanks for the PR and comprehensive debug of the issue.

I've tweaked it slightly so that whisper_rec_cmd can be fully customized. If you hit cropping issues with sox (which can happen if the sound recording device has high latency), you can go back to ffmpeg with a manually chosen device.
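As a hedged sketch of what "ffmpeg with a manually chosen device" could look like: this assumes the customized whisper_rec_cmd accepts a list of command arguments (the exact form is not spelled out in this thread) and that the recording is written to rec.wav as in the description above. The device index 1 is taken from the AVFoundation audio listing earlier in this conversation, where it is the MacBook Pro Microphone; it will differ on other machines.

```lua
-- Sketch only. Assumption: whisper_rec_cmd may be a list of arguments;
-- check the plugin's config documentation for the exact accepted form.
require("gp").setup({
    whisper_rec_cmd = {
        "ffmpeg",
        "-y",                 -- overwrite an existing output file
        "-f", "avfoundation", -- macOS capture input
        "-i", ":1",           -- no video, audio device index 1 (machine-specific)
        "-t", "3600",         -- cap the recording at one hour (arbitrary choice)
        "rec.wav",
    },
})
```

Running `ffmpeg -f avfoundation -list_devices true -i ""` first, as in the debug session above, shows which audio index corresponds to the intended microphone.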

Robitx merged commit c2469c0 into Robitx:main on Jan 20, 2024
1 check passed