voice-llm in python #1

Merged
merged 19 commits into from
May 27, 2024
7 changes: 6 additions & 1 deletion README.md
@@ -1 +1,6 @@
# pico-cookbook
# Pico Cookbook

Made in Vancouver, Canada by [Picovoice](https://picovoice.ai)

[![Twitter URL](https://img.shields.io/twitter/url?label=%40AiPicovoice&style=social&url=https%3A%2F%2Ftwitter.com%2FAiPicovoice)](https://twitter.com/AiPicovoice)<!-- markdown-link-check-disable-line -->
[![YouTube Channel Views](https://img.shields.io/youtube/channel/views/UCAdi9sTCXLosG1XeqDwLx7w?label=YouTube&style=social)](https://www.youtube.com/channel/UCAdi9sTCXLosG1XeqDwLx7w)
Empty file removed recipes/.gitkeep
Empty file.
14 changes: 14 additions & 0 deletions recipes/llm-voice-assistant/README.md
@@ -0,0 +1,14 @@
# LLM-Powered Voice Assistant

A hands-free voice assistant powered by a large language model (LLM). All voice recognition, LLM inference, and speech synthesis run on-device.

## Components

- [Porcupine Wake Word](https://picovoice.ai/docs/porcupine/)
- [Cheetah Streaming Speech-to-Text](https://picovoice.ai/docs/cheetah/)
- [picoLLM Inference Engine](https://github.com/Picovoice/picollm)
- [Orca Streaming Text-to-Speech](https://picovoice.ai/docs/orca/)

## Implementations

- [Python](python)
74 changes: 74 additions & 0 deletions recipes/llm-voice-assistant/python/README.md
@@ -0,0 +1,74 @@
## Compatibility

- Python 3.8+
- Runs on Linux (x86_64), macOS (arm64, x86_64), Windows (x86_64), and Raspberry Pi (5 and 4).

## AccessKey

AccessKey is your authentication and authorization token for deploying Picovoice SDKs, including picoLLM. Anyone who
uses Picovoice needs a valid AccessKey, and you must keep yours secret. You need internet connectivity to validate
your AccessKey with Picovoice license servers, even though LLM inference runs 100% offline and is completely free for
open-weight models. Everyone who signs up for [Picovoice Console](https://console.picovoice.ai/) receives a unique
AccessKey.

## picoLLM Model

picoLLM Inference Engine supports many open-weight models. The models are available for download from
[Picovoice Console](https://console.picovoice.ai/).

## Usage

Install the required packages:

```console
pip install -r requirements.txt
```

Run the demo:

```console
python3 main.py --access_key ${ACCESS_KEY} --picollm_model_path ${PICOLLM_MODEL_PATH}
```

Replace `${ACCESS_KEY}` with your AccessKey obtained from Picovoice Console and `${PICOLLM_MODEL_PATH}` with the path
to the model downloaded from Picovoice Console.

To see all available options, type the following:

```console
python3 main.py --help
```

## Custom Wake Word

The demo's default wake phrase is `Picovoice`. You can generate your custom (branded) wake word using Picovoice Console by following the [Porcupine Wake Word documentation](https://picovoice.ai/docs/porcupine/). Once you have the model trained, simply pass it to the demo
application using the `--keyword_model_path` argument.

## Profiling

To see the runtime profiling metrics, run the demo with the `--profile` argument:

```console
python3 main.py --access_key ${ACCESS_KEY} --picollm_model_path ${PICOLLM_MODEL_PATH} --profile
```

Replace `${ACCESS_KEY}` with your AccessKey obtained from Picovoice Console and `${PICOLLM_MODEL_PATH}` with the path
to the model downloaded from Picovoice Console.

The demo profiles three metrics: Real-time Factor (RTF), Tokens per Second (TPS), and Latency.

### Real-time Factor (RTF)

RTF is a standard metric for measuring the speed of speech processing (e.g., wake word, speech-to-text, and
text-to-speech). RTF is the CPU time divided by the processed (recognized or synthesized) audio length. Hence, a lower RTF means a more efficient engine.
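
The RTF definition above can be sketched as a small calculation. This is only an illustration of the arithmetic: the function name `real_time_factor` and the sample numbers are assumptions, not code or measurements from the demo.

```python
def real_time_factor(cpu_seconds: float, audio_seconds: float) -> float:
    """Return CPU time divided by the length of the processed audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return cpu_seconds / audio_seconds


# Hypothetical numbers: 0.5 s of CPU time spent on 10 s of audio.
rtf = real_time_factor(cpu_seconds=0.5, audio_seconds=10.0)
print(f"RTF: {rtf:.3f}")
```

An RTF below 1 means the engine processes audio faster than real time.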

### Tokens per Second (TPS)

Tokens per second is the standard metric for measuring the speed of LLM inference engines. TPS is the number of
generated tokens divided by the compute time used to create them. A higher TPS is better.
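
As a rough sketch of the TPS definition above, the calculation might look like the following. The function name `tokens_per_second` and the sample numbers are hypothetical and are not taken from the demo.

```python
def tokens_per_second(num_tokens: int, compute_seconds: float) -> float:
    """Return the number of generated tokens divided by the compute time."""
    if compute_seconds <= 0:
        raise ValueError("compute_seconds must be positive")
    return num_tokens / compute_seconds


# Hypothetical numbers: 40 tokens generated in 2 s of compute time.
tps = tokens_per_second(num_tokens=40, compute_seconds=2.0)
print(f"TPS: {tps:.1f}")
```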

### Latency

We measure latency as the delay between the end of the user's utterance (i.e., when the user finishes talking) and the
moment the voice assistant generates the first chunk of the audio response (i.e., when the user starts hearing the response).
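
One way to take this measurement is to record a timestamp when the utterance ends and another when the first audio chunk is ready. The sketch below assumes this approach; the `LatencyTimer` class and its method names are hypothetical and do not come from the demo, and `time.sleep` stands in for the actual inference and synthesis work.

```python
import time
from typing import Callable, Optional


class LatencyTimer:
    """Measures the delay from the end of an utterance to the first audio chunk."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock
        self._utterance_end: Optional[float] = None

    def mark_utterance_end(self) -> None:
        # Call when the speech-to-text engine detects the end of the utterance.
        self._utterance_end = self._clock()

    def mark_first_audio(self) -> float:
        # Call when the first synthesized audio chunk is ready; returns seconds elapsed.
        if self._utterance_end is None:
            raise RuntimeError("mark_utterance_end() must be called first")
        return self._clock() - self._utterance_end


timer = LatencyTimer()
timer.mark_utterance_end()
time.sleep(0.05)  # stand-in for LLM inference and speech synthesis
latency_sec = timer.mark_first_audio()
print(f"Latency: {latency_sec:.2f} s")
```

Passing the clock in as a parameter keeps the timer easy to test with a fake clock.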
