Add option to set whisper recording command #91

jayeheffernan · 2024-01-19T04:34:11Z

Hello and thanks for the plugin! I had a little issue getting whisper to work, so I'm submitting a fix for your consideration.

This PR adds a config option to manually specify which command (sox, ffmpeg, or arecord) is used for recording in commands like GpWhisper. For example, pass whisper_rec_cmd = 'sox' in .setup().
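A minimal shell sketch of roughly what each of the three choices corresponds to, assuming typical invocations of each tool. The helper `rec_cmd` is hypothetical and the flags are illustrative, not the exact arguments gp/init.lua passes. The ffmpeg branch defaults to audio device `:0` (the behavior this PR works around) and takes an optional index to select a different device.

```shell
# Hypothetical helper: print a recording command for the chosen recorder.
# Flags are illustrative assumptions, not the plugin's exact arguments.
# Usage: rec_cmd <sox|ffmpeg|arecord> [avfoundation_audio_index]
rec_cmd() {
  case "$1" in
    sox)
      # -d = default input device, -c 1 = mono output; trim caps the clip at 60 minutes
      echo "sox -d -c 1 rec.wav trim 0 60:00" ;;
    ffmpeg)
      # macOS AVFoundation input; ":N" means "no video, audio device N" (defaults to 0)
      echo "ffmpeg -y -f avfoundation -i :${2:-0} -t 3600 rec.wav" ;;
    arecord)
      # ALSA recorder on Linux: mono, 16-bit little-endian, 48 kHz, at most 3600 s
      echo "arecord -c 1 -f S16_LE -r 48000 -d 3600 rec.wav" ;;
    *)
      echo "unsupported recorder: $1" >&2
      return 1 ;;
  esac
}
```

For example, `rec_cmd ffmpeg 1` prints `ffmpeg -y -f avfoundation -i :1 -t 3600 rec.wav`.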

I had an issue with GpWhisper where the transcription was always just "you". The recordings (rec.wav) always had the correct length but contained only silence. I think the problem is that the ffmpeg options select audio input device :0, which doesn't work in my case. Modifying gp/init.lua to always use rec_cmd = "sox" works fine for me. There's probably a way to inspect the audio input devices and improve the autodetection, but I'm not sure how to do that well, and I suspect this isn't a common issue anyway.

Debugging my issue with audio devices...

Here's some output from the terminal session where I figured out what was going on, in case it helps.

Screenshot with notes

Raw text output

/tmp/gp_whisper ❯ ffmpeg -devices -v quiet | grep -i avfoundation | wc -l
       1
/tmp/gp_whisper ❯ ffmpeg -devices -v quiet | grep -i avfoundation
 D  avfoundation    AVFoundation input device
/tmp/gp_whisper ❯ ffmpeg -devices -v quiet
Devices:
 D. = Demuxing supported
 .E = Muxing supported
 --
  E audiotoolbox    AudioToolbox output device
 D  avfoundation    AVFoundation input device
 D  lavfi           Libavfilter virtual input device
  E sdl,sdl2        SDL2 output device
 D  x11grab         X11 screen capture, using XCB
/tmp/gp_whisper ❯ ffmpeg -f avfoundation -list_devices true -i ""
ffmpeg version 6.0 Copyright (c) 2000-2023 the FFmpeg developers
  built with Apple clang version 14.0.3 (clang-1403.0.22.14.1)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/6.0_1 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libaribb24 --enable-libbluray --enable-libdav1d --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libsvtav1 --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox --enable-audiotoolbox
  libavutil      58.  2.100 / 58.  2.100
  libavcodec     60.  3.100 / 60.  3.100
  libavformat    60.  3.100 / 60.  3.100
  libavdevice    60.  1.100 / 60.  1.100
  libavfilter     9.  3.100 /  9.  3.100
  libswscale      7.  1.100 /  7.  1.100
  libswresample   4. 10.100 /  4. 10.100
  libpostproc    57.  1.100 / 57.  1.100
[AVFoundation indev @ 0x7fe4d6f04a00] AVFoundation video devices:
[AVFoundation indev @ 0x7fe4d6f04a00] [0] FaceTime HD Camera (Built-in)
[AVFoundation indev @ 0x7fe4d6f04a00] [1] LG UltraFine Display Camera
[AVFoundation indev @ 0x7fe4d6f04a00] [2] Snap Camera
[AVFoundation indev @ 0x7fe4d6f04a00] [3] Capture screen 0
[AVFoundation indev @ 0x7fe4d6f04a00] [4] Capture screen 1
[AVFoundation indev @ 0x7fe4d6f04a00] [5] Capture screen 2
[AVFoundation indev @ 0x7fe4d6f04a00] AVFoundation audio devices:
[AVFoundation indev @ 0x7fe4d6f04a00] [0] ZoomAudioDevice
[AVFoundation indev @ 0x7fe4d6f04a00] [1] MacBook Pro Microphone
[AVFoundation indev @ 0x7fe4d6f04a00] [2] LG UltraFine Display Audio
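In this listing, audio index 0 is ZoomAudioDevice rather than the built-in microphone, which plausibly explains why recording from `:0` produced silence. As one possible direction for the improved autodetection mentioned in the PR description, here is a sketch that picks an audio device index by name from this output. The function name `audio_device_index` is hypothetical, and the parsing assumes the `[AVFoundation indev @ <address>] [N] Name` line format shown here, which may differ between ffmpeg versions.

```shell
# Hypothetical helper: read the output of
#   ffmpeg -f avfoundation -list_devices true -i ""
# on stdin and print the index of the first audio device whose name
# contains the given text (case-sensitive). Prints nothing if none matches.
audio_device_index() {
  awk -v want="$1" '
    # Track which section of the listing we are in.
    /AVFoundation video devices:/ { in_audio = 0; next }
    /AVFoundation audio devices:/ { in_audio = 1; next }
    # Device lines look like: [AVFoundation indev @ 0x...] [1] MacBook Pro Microphone
    in_audio && match($0, /] \[[0-9]+] /) {
      idx  = substr($0, RSTART + 3, RLENGTH - 5)   # digits between "] [" and "] "
      name = substr($0, RSTART + RLENGTH)          # device name after the index
      if (index(name, want)) { print idx; exit }
    }
  '
}
```

Because ffmpeg writes the device list to stderr, redirect it when piping, e.g. `ffmpeg -f avfoundation -list_devices true -i "" 2>&1 | audio_device_index "MacBook Pro Microphone"`, which for the listing above prints 1. That index could then be passed to ffmpeg as `-i ":1"`.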

Screenshot of new error message in action

If you pick an invalid value, you'll find out when you try to record:

Robitx · 2024-01-20T16:53:52Z

@jayeheffernan thanks for the PR and the comprehensive debugging of the issue.

I've tweaked it slightly so that whisper_rec_cmd can be fully customized. If you hit cropping issues with sox (which can happen if the recording device has high latency), you can switch back to ffmpeg with a manually chosen device.

jayeheffernan and others added 5 commits January 19, 2024 12:27

Add option to default config · a90516d
Override automatic detection · 74ad6a7
chore: deprecate whisper_max_time · bb5cbef
feat: fully configurable whisper_rec_cmd · 731d833
chore: formating · 14f5402
fix: typo · b6c637b

Robitx merged commit c2469c0 into Robitx:main Jan 20, 2024
1 check passed
