refactor: Enable partial transcription with a latency of 1000ms (#141)
* refactor: Enable partial transcription with a latency of 1000ms

* refactor: Update CMakePresets.json and buildspec.json

- Remove the "QT_VERSION" variable from CMakePresets.json for all platforms
- Update the "version" of "obs-studio" and "prebuilt" dependencies in buildspec.json
- Update the "version" of "qt6" dependency in buildspec.json
- Update the "version" of the project to "0.3.3" in buildspec.json
- Update the "version" of the project to "0.3.3" in CMakePresets.json
- Remove unused code in whisper-processing.cpp

* refactor: Add -Wno-error=deprecated-declarations option to compilerconfig.cmake

* refactor: Update language codes in translation module
royshil authored Jul 19, 2024
1 parent 19017ca commit b3e4bfa
Showing 16 changed files with 299 additions and 216 deletions.
1 change: 0 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -16,7 +16,6 @@
!LICENSE
!README.md
!/vendor
!patch_libobs.diff

# Exclude lock files
*.lock.json
12 changes: 4 additions & 8 deletions CMakePresets.json
@@ -26,9 +26,8 @@
"rhs": "Darwin"
},
"generator": "Xcode",
"warnings": { "dev": true, "deprecated": true },
"warnings": {"dev": true, "deprecated": true},
"cacheVariables": {
"QT_VERSION": "6",
"CMAKE_OSX_DEPLOYMENT_TARGET": "11.0",
"CODESIGN_IDENTITY": "$penv{CODESIGN_IDENT}",
"CODESIGN_TEAM": "$penv{CODESIGN_TEAM}"
@@ -57,9 +56,8 @@
},
"generator": "Visual Studio 17 2022",
"architecture": "x64",
"warnings": { "dev": true, "deprecated": true },
"warnings": {"dev": true, "deprecated": true},
"cacheVariables": {
"QT_VERSION": "6",
"CMAKE_SYSTEM_VERSION": "10.0.18363.657"
}
},
@@ -84,9 +82,8 @@
"rhs": "Linux"
},
"generator": "Ninja",
"warnings": { "dev": true, "deprecated": true },
"warnings": {"dev": true, "deprecated": true},
"cacheVariables": {
"QT_VERSION": "6",
"CMAKE_BUILD_TYPE": "RelWithDebInfo"
}
},
@@ -112,9 +109,8 @@
"rhs": "Linux"
},
"generator": "Ninja",
"warnings": { "dev": true, "deprecated": true },
"warnings": {"dev": true, "deprecated": true},
"cacheVariables": {
"QT_VERSION": "6",
"CMAKE_BUILD_TYPE": "RelWithDebInfo"
}
},
24 changes: 13 additions & 11 deletions README.md
@@ -12,12 +12,13 @@

## Introduction

LocalVocal live-streaming AI assistant plugin allows you to transcribe, locally on your machine, audio speech into text and perform various language processing functions on the text using AI / LLMs (Large Language Models). ✅ No GPU required, ✅ no cloud costs, ✅ no network and ✅ no downtime! Privacy first - all data stays on your machine.
LocalVocal lets you transcribe, locally on your machine, speech into text and simultaneously translate to any language. ✅ No GPU required, ✅ no cloud costs, ✅ no network and ✅ no downtime! Privacy first - all data stays on your machine.

If this free plugin has been valuable to you consider adding a ⭐ to this GH repo, rating it [on OBS](https://obsproject.com/forum/resources/localvocal-live-stream-ai-assistant.1769/), subscribing to [my YouTube channel](https://www.youtube.com/@royshilk) where I post updates, and supporting my work on [GitHub](https://github.com/sponsors/royshil) or [Patreon](https://www.patreon.com/RoyShilkrot) 🙏
If this free plugin has been valuable, consider adding a ⭐ to this GH repo, rating it [on OBS](https://obsproject.com/forum/resources/localvocal-live-stream-ai-assistant.1769/), subscribing to [my YouTube channel](https://www.youtube.com/@royshilk) where I post updates, and supporting my work on [GitHub](https://github.com/sponsors/royshil), [Patreon](https://www.patreon.com/RoyShilkrot) or [OpenCollective](https://opencollective.com/occ-ai) 🙏

Internally the plugin is running a neural network ([OpenAI Whisper](https://github.com/openai/whisper)) locally to predict in real time the speech and provide captions.
Internally the plugin runs [OpenAI's Whisper](https://github.com/openai/whisper) to process speech in real time and predict a transcription.
It's using the [Whisper.cpp](https://github.com/ggerganov/whisper.cpp) project from [ggerganov](https://github.com/ggerganov) to run the Whisper network efficiently on CPUs and GPUs.
Translation is done with [CTranslate2](https://github.com/OpenNMT/CTranslate2).

## Usage

@@ -45,9 +46,10 @@ Current Features:
- Sync'ed captions with OBS recording timestamps
- Send captions on a RTMP stream to e.g. YouTube, Twitch
- Bring your own Whisper model (any GGML)
- Translate captions in real time to major languages (both Whisper built-in translation as well as NMT models with [CTranslate2](https://github.com/OpenNMT/CTranslate2))
- Translate captions in real time to major languages (both Whisper built-in translation as well as NMT models)
- CUDA, OpenCL, Apple Arm64, AVX & SSE acceleration support
- Filter out or replace any part of the produced captions
- Partial transcriptions for a streaming-captions experience

Roadmap:
- More robust built-in translation options
@@ -57,22 +59,22 @@ Roadmap:
Check out our other plugins:
- [Background Removal](https://github.com/occ-ai/obs-backgroundremoval) removes background from webcam without a green screen.
- [Detect](https://github.com/occ-ai/obs-detect) will detect and track >80 types of objects in real-time inside OBS
- [CleanStream](https://github.com/occ-ai/obs-cleanstream) for real-time filler word (uh,um) and profanity removal from live audio stream
- [CleanStream](https://github.com/occ-ai/obs-cleanstream) for real-time filler word (uh,um) and profanity removal from a live audio stream
- [URL/API Source](https://github.com/occ-ai/obs-urlsource) that allows fetching live data from an API and displaying it in OBS.
- [Polyglot](https://github.com/occ-ai/obs-polyglot) translation AI plugin for real-time, local translation to hundreds of languages
- [Squawk](https://github.com/occ-ai/obs-squawk) adds lifelike local text-to-speech capabilities built into OBS

## Download
Check out the [latest releases](https://github.com/occ-ai/obs-localvocal/releases) for downloads and install instructions.

### Models
The plugin ships with the Tiny.en model, and will autonomoously download other bigger Whisper models through a dropdown.
However there's an option to select an external model file if you have it on disk.
The plugin ships with the Tiny.en model, and will autonomously download other Whisper models through a dropdown.
There's also an option to select an external GGML Whisper model file if you have it on disk.

Get more models from https://ggml.ggerganov.com/ and follow [the instructions on whisper.cpp](https://github.com/ggerganov/whisper.cpp/tree/master/models) to create your own models or download others such as distilled models.
Get more models from https://ggml.ggerganov.com/ and [HuggingFace](https://huggingface.co/ggerganov/whisper.cpp/tree/main), and follow [the instructions on whisper.cpp](https://github.com/ggerganov/whisper.cpp/tree/master/models) to create your own models or download others such as distilled models.

## Building

The plugin was built and tested on Mac OSX (Intel & Apple silicon), Windows (with and without Nvidia CUDA) and Linux.
The plugin was built and tested on Mac OSX (Intel & Apple silicon), Windows (with and without Nvidia CUDA) and Linux.

Start by cloning this repo to a directory of your choice.

@@ -172,7 +174,7 @@ The build should exist in the `./release` folder off the root. You can manually
LocalVocal will now build with CUDA support automatically through a prebuilt binary of Whisper.cpp from https://github.com/occ-ai/occ-ai-dep-whispercpp. The CMake scripts will download all necessary files.
To build with cuda add `CPU_OR_CUDA` as an environment variable (with `cpu`, `12.2.0` or `11.8.0`) and build regularly
To build with CUDA, add `CPU_OR_CUDA` as an environment variable (with `cpu`, `clblast`, `12.2.0` or `11.8.0`) and build regularly
```powershell
> $env:CPU_OR_CUDA="12.2.0"
```
22 changes: 11 additions & 11 deletions buildspec.json
@@ -1,33 +1,33 @@
{
"dependencies": {
"obs-studio": {
"version": "30.0.2",
"version": "30.1.2",
"baseUrl": "https://github.com/obsproject/obs-studio/archive/refs/tags",
"label": "OBS sources",
"hashes": {
"macos": "be12c3ad0a85713750d8325e4b1db75086223402d7080d0e3c2833d7c5e83c27",
"windows-x64": "970058c49322cfa9cd6d620abb393fed89743ba7e74bd9dbb6ebe0ea8141d9c7"
"macos": "490bae1c392b3b344b0270afd8cb887da4bc50bd92c0c426e96713c1ccb9701a",
"windows-x64": "c2dd03fa7fd01fad5beafce8f7156da11f9ed9a588373fd40b44a06f4c03b867"
}
},
"prebuilt": {
"version": "2023-11-03",
"version": "2024-03-19",
"baseUrl": "https://github.com/obsproject/obs-deps/releases/download",
"label": "Pre-Built obs-deps",
"hashes": {
"macos": "90c2fc069847ec2768dcc867c1c63b112c615ed845a907dc44acab7a97181974",
"windows-x64": "d0825a6fb65822c993a3059edfba70d72d2e632ef74893588cf12b1f0d329ce6"
"macos": "2e9bfb55a5e0e4c1086fa1fda4cf268debfead473089df2aaea80e1c7a3ca7ff",
"windows-x64": "6e86068371526a967e805f6f9903f9407adb683c21820db5f07da8f30d11e998"
}
},
"qt6": {
"version": "2023-11-03",
"version": "2024-03-19",
"baseUrl": "https://github.com/obsproject/obs-deps/releases/download",
"label": "Pre-Built Qt6",
"hashes": {
"macos": "ba4a7152848da0053f63427a2a2cb0a199af3992997c0db08564df6f48c9db98",
"windows-x64": "bc57dedf76b47119a6dce0435a2f21b35b08c8f2948b1cb34a157320f77732d1"
"macos": "694f1e639c017e3b1f456f735330dc5afae287cbea85757101af1368de3142c8",
"windows-x64": "72d1df34a0ef7413a681d5fcc88cae81da60adc03dcd23ef17862ab170bcc0dd"
},
"debugSymbols": {
"windows-x64": "fd8ecd1d8cd2ef049d9f4d7fb5c134f784836d6020758094855dfa98bd025036"
"windows-x64": "fbddd1f659c360f2291911ac5709b67b6f8182e6bca519d24712e4f6fd3cc865"
}
}
},
@@ -38,7 +38,7 @@
},
"name": "obs-localvocal",
"displayName": "OBS Localvocal",
"version": "0.3.2",
"version": "0.3.3",
"author": "Roy Shilkrot",
"website": "https://github.com/occ-ai/obs-localvocal",
"email": "[email protected]",
19 changes: 16 additions & 3 deletions cmake/BuildWhispercpp.cmake
@@ -80,7 +80,8 @@ elseif(WIN32)
FetchContent_Declare(
whispercpp_fetch
URL ${WHISPER_CPP_URL}
URL_HASH SHA256=${WHISPER_CPP_HASH})
URL_HASH SHA256=${WHISPER_CPP_HASH}
DOWNLOAD_EXTRACT_TIMESTAMP TRUE)
FetchContent_MakeAvailable(whispercpp_fetch)

add_library(Whispercpp::Whisper SHARED IMPORTED)
@@ -104,8 +105,20 @@ elseif(WIN32)

# glob all dlls in the bin directory and install them
file(GLOB WHISPER_DLLS ${whispercpp_fetch_SOURCE_DIR}/bin/*.dll)
install(FILES ${WHISPER_DLLS} DESTINATION "obs-plugins/64bit")

foreach(FILE ${WHISPER_DLLS})
file(RELATIVE_PATH REL_FILE ${whispercpp_fetch_SOURCE_DIR}/bin ${FILE})
set(DEST_DIR "${CMAKE_SOURCE_DIR}/release/${CMAKE_BUILD_TYPE}/obs-plugins/64bit")
set(DEST_FILE "${DEST_DIR}/${REL_FILE}")

if(NOT EXISTS ${DEST_DIR})
file(MAKE_DIRECTORY ${DEST_DIR})
endif()

if(NOT EXISTS ${DEST_FILE} OR ${FILE} IS_NEWER_THAN ${DEST_FILE})
message(STATUS "Copying ${FILE} to ${DEST_FILE}")
file(COPY ${FILE} DESTINATION ${DEST_DIR})
endif()
endforeach()
else()
set(Whispercpp_Build_GIT_TAG "v1.6.2")
set(WHISPER_EXTRA_CXX_FLAGS "-fPIC")
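The `foreach` loop above replaces the previous one-shot `install(FILES ...)` with a copy-only-when-newer step, so incremental builds skip DLLs that are already up to date. The same guard, sketched outside CMake with C++17 `std::filesystem` (hypothetical paths and function name; the real build relies on CMake's `IS_NEWER_THAN`):

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Copy `src` into `dest_dir` only when the destination is missing or older,
// mirroring the NOT EXISTS / IS_NEWER_THAN guard in the CMake hunk above.
bool copy_if_newer(const fs::path &src, const fs::path &dest_dir)
{
	fs::create_directories(dest_dir); // like file(MAKE_DIRECTORY ...)
	const fs::path dest = dest_dir / src.filename();
	if (fs::exists(dest) &&
	    fs::last_write_time(dest) >= fs::last_write_time(src))
		return false; // destination is current, skip the copy
	fs::copy_file(src, dest, fs::copy_options::overwrite_existing);
	return true;
}
```

The point of the guard is that repeated builds avoid touching files that have not changed, which keeps the `release` tree stable between runs.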
2 changes: 1 addition & 1 deletion cmake/macos/compilerconfig.cmake
@@ -55,4 +55,4 @@ else()
endif()

add_compile_definitions($<$<CONFIG:DEBUG>:DEBUG> $<$<CONFIG:DEBUG>:_DEBUG> SIMDE_ENABLE_OPENMP)
add_compile_options(-Wno-error=newline-eof)
add_compile_options(-Wno-error=newline-eof -Wno-error=deprecated-declarations -Wno-deprecated-declarations)
2 changes: 2 additions & 0 deletions data/locale/en-US.ini
@@ -83,3 +83,5 @@ log_group="Logging"
advanced_group="Advanced Configuration"
buffered_output_parameters="Buffered Output Configuration"
file_output_info="Note: Translation output will be saved to a file in the same directory with the target language added to the name, e.g. 'output_es.srt'."
partial_transcription="Enable Partial Transcription"
partial_transcription_info="Partial transcription will increase processing load on your machine to transcribe content in real-time, which may impact performance."
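The `partial_transcription_info` string above warns about added processing load: the new `partial_latency` default of 1000 ms bounds how often partial decodes run, so lowering it increases CPU use. A hypothetical sketch of such a time-based gate (the plugin's actual scheduling lives in the whisper processing loop, which is outside this diff):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical gate: emit a partial transcription at most once per
// `partial_latency_ms`, so lowering the latency raises processing load.
struct PartialGate {
	int partial_latency_ms = 1000; // default introduced by this commit
	int64_t last_partial_ms = 0;

	bool should_run_partial(int64_t now_ms)
	{
		if (now_ms - last_partial_ms >= partial_latency_ms) {
			last_partial_ms = now_ms;
			return true;
		}
		return false;
	}
};
```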
20 changes: 0 additions & 20 deletions patch_libobs.diff

This file was deleted.

22 changes: 17 additions & 5 deletions src/transcription-filter-callbacks.cpp
@@ -188,7 +188,8 @@ void set_text_callback(struct transcription_filter_data *gf,
const DetectionResultWithText &resultIn)
{
DetectionResultWithText result = resultIn;
if (!result.text.empty() && result.result == DETECTION_RESULT_SPEECH) {
if (!result.text.empty() && (result.result == DETECTION_RESULT_SPEECH ||
result.result == DETECTION_RESULT_PARTIAL)) {
gf->last_sub_render_time = now_ms();
gf->cleared_last_sub = false;
}
@@ -231,7 +232,10 @@ void set_text_callback(struct transcription_filter_data *gf,
str_copy = translated_sentence;
} else {
if (gf->buffered_output) {
gf->translation_monitor.addSentence(translated_sentence);
if (result.result == DETECTION_RESULT_SPEECH) {
// buffered output - add the sentence to the monitor
gf->translation_monitor.addSentence(translated_sentence);
}
} else {
// non-buffered output - send the sentence to the selected source
send_caption_to_source(gf->translation_output, translated_sentence,
@@ -241,17 +245,20 @@ void set_text_callback(struct transcription_filter_data *gf,
}

if (gf->buffered_output) {
gf->captions_monitor.addSentence(str_copy);
if (result.result == DETECTION_RESULT_SPEECH) {
gf->captions_monitor.addSentence(str_copy);
}
} else {
// non-buffered output - send the sentence to the selected source
send_caption_to_source(gf->text_source_name, str_copy, gf);
}

if (gf->caption_to_stream) {
if (gf->caption_to_stream && result.result == DETECTION_RESULT_SPEECH) {
send_caption_to_stream(result, str_copy, gf);
}

if (gf->save_to_file && gf->output_file_path != "") {
if (gf->save_to_file && gf->output_file_path != "" &&
result.result == DETECTION_RESULT_SPEECH) {
send_sentence_to_file(gf, result, str_copy, translated_sentence);
}
};
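The hunks above all gate on `DETECTION_RESULT_SPEECH`. A condensed standalone sketch of that routing (hypothetical types and names, not the plugin's actual API): partial results only refresh the live caption, while finalized speech results are also buffered, streamed, and written to file.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified model of set_text_callback()'s dispatch after this commit.
enum class Detection { Speech, Partial };

struct Sinks {
	std::vector<std::string> caption, buffered, stream, file;
};

void dispatch(Sinks &s, Detection d, const std::string &text,
	      bool buffered_output, bool caption_to_stream, bool save_to_file)
{
	if (buffered_output) {
		// buffered output: only finalized sentences enter the monitor
		if (d == Detection::Speech)
			s.buffered.push_back(text);
	} else {
		// non-buffered output: partials update the caption directly
		s.caption.push_back(text);
	}
	if (caption_to_stream && d == Detection::Speech)
		s.stream.push_back(text);
	if (save_to_file && d == Detection::Speech)
		s.file.push_back(text);
}
```

The design choice here is that transient partial text never pollutes persistent outputs (stream captions, SRT files, the buffered monitor), which would otherwise record every intermediate guess.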
@@ -291,8 +298,10 @@ void reset_caption_state(transcription_filter_data *gf_)
{
if (gf_->captions_monitor.isEnabled()) {
gf_->captions_monitor.clear();
gf_->translation_monitor.clear();
}
send_caption_to_source(gf_->text_source_name, "", gf_);
send_caption_to_source(gf_->translation_output, "", gf_);
// flush the buffer
{
std::lock_guard<std::mutex> lock(gf_->whisper_buf_mutex);
@@ -326,13 +335,15 @@ void media_started_callback(void *data_, calldata_t *cd)
gf_->active = true;
reset_caption_state(gf_);
}

void media_pause_callback(void *data_, calldata_t *cd)
{
UNUSED_PARAMETER(cd);
transcription_filter_data *gf_ = static_cast<struct transcription_filter_data *>(data_);
obs_log(gf_->log_level, "media_pause");
gf_->active = false;
}

void media_restart_callback(void *data_, calldata_t *cd)
{
UNUSED_PARAMETER(cd);
@@ -341,6 +352,7 @@ void media_restart_callback(void *data_, calldata_t *cd)
gf_->active = true;
reset_caption_state(gf_);
}

void media_stopped_callback(void *data_, calldata_t *cd)
{
UNUSED_PARAMETER(cd);
2 changes: 2 additions & 0 deletions src/transcription-filter-data.h
@@ -81,6 +81,8 @@ struct transcription_filter_data {
bool enable_audio_chunks_callback = false;
bool source_signals_set = false;
bool initial_creation = true;
bool partial_transcription = false;
int partial_latency = 1000;

// Last transcription result
std::string last_text;
22 changes: 20 additions & 2 deletions src/transcription-filter-properties.cpp
@@ -46,8 +46,9 @@ bool advanced_settings_callback(obs_properties_t *props, obs_property_t *propert
UNUSED_PARAMETER(property);
// If advanced settings is enabled, show the advanced settings group
const bool show_hide = obs_data_get_int(settings, "advanced_settings_mode") == 1;
for (const std::string &prop_name : {"whisper_params_group", "buffered_output_group",
"log_group", "advanced_group", "file_output_enable"}) {
for (const std::string &prop_name :
{"whisper_params_group", "buffered_output_group", "log_group", "advanced_group",
"file_output_enable", "partial_group"}) {
obs_property_set_visible(obs_properties_get(props, prop_name.c_str()), show_hide);
}
translation_options_callback(props, NULL, settings);
@@ -457,6 +458,22 @@ void add_general_group_properties(obs_properties_t *ppts)
}
}

void add_partial_group_properties(obs_properties_t *ppts)
{
// add a group for partial transcription
obs_properties_t *partial_group = obs_properties_create();
obs_properties_add_group(ppts, "partial_group", MT_("partial_transcription"),
OBS_GROUP_CHECKABLE, partial_group);

// add text info
obs_properties_add_text(partial_group, "partial_info", MT_("partial_transcription_info"),
OBS_TEXT_INFO);

// add slider for partial latency
obs_properties_add_int_slider(partial_group, "partial_latency", MT_("partial_latency"), 500,
3000, 50);
}

obs_properties_t *transcription_filter_properties(void *data)
{
struct transcription_filter_data *gf =
@@ -480,6 +497,7 @@ obs_properties_t *transcription_filter_properties(void *data)
add_buffered_output_group_properties(ppts);
add_advanced_group_properties(ppts, gf);
add_logging_group_properties(ppts);
add_partial_group_properties(ppts);
add_whisper_params_group_properties(ppts);

// Add a informative text about the plugin
