Merge pull request #1 from Picovoice/voice-llm-python
llm-powered voice assistant in python
kenarsa authored May 27, 2024
2 parents 040177d + 385b377 commit 5cb3741
Showing 7 changed files with 494 additions and 2 deletions.
7 changes: 6 additions & 1 deletion README.md
@@ -1 +1,6 @@
# pico-cookbook
# Pico Cookbook

Made in Vancouver, Canada by [Picovoice](https://picovoice.ai)

[![Twitter URL](https://img.shields.io/twitter/url?label=%40AiPicovoice&style=social&url=https%3A%2F%2Ftwitter.com%2FAiPicovoice)](https://twitter.com/AiPicovoice)<!-- markdown-link-check-disable-line -->
[![YouTube Channel Views](https://img.shields.io/youtube/channel/views/UCAdi9sTCXLosG1XeqDwLx7w?label=YouTube&style=social)](https://www.youtube.com/channel/UCAdi9sTCXLosG1XeqDwLx7w)
Empty file removed recipes/.gitkeep
Empty file.
14 changes: 14 additions & 0 deletions recipes/llm-voice-assistant/README.md
@@ -0,0 +1,14 @@
# LLM-Powered Voice Assistant

A hands-free voice assistant powered by a large language model (LLM). Voice recognition, LLM inference, and speech synthesis all run on-device.

## Components

- [Porcupine Wake Word](https://picovoice.ai/docs/porcupine/)
- [Cheetah Streaming Speech-to-Text](https://picovoice.ai/docs/cheetah/)
- [picoLLM Inference Engine](https://github.com/Picovoice/picollm)
- [Orca Streaming Text-to-Speech](https://picovoice.ai/docs/orca/)

## Implementations

- [Python](python)
74 changes: 74 additions & 0 deletions recipes/llm-voice-assistant/python/README.md
@@ -0,0 +1,74 @@
## Compatibility

- Python 3.8+
- Runs on Linux (x86_64), macOS (arm64, x86_64), Windows (x86_64), and Raspberry Pi (5 and 4).

## AccessKey

AccessKey is your authentication and authorization token for deploying Picovoice SDKs, including picoLLM. Anyone who
uses Picovoice needs a valid AccessKey, and you must keep it secret. You need internet connectivity to validate your
AccessKey with Picovoice license servers, even though LLM inference runs 100% offline and is completely free for
open-weight models. Everyone who signs up for [Picovoice Console](https://console.picovoice.ai/) receives a unique
AccessKey.

## picoLLM Model

picoLLM Inference Engine supports many open-weight models. You can download the models from
[Picovoice Console](https://console.picovoice.ai/).

## Usage

Install the required packages:

```console
pip install -r requirements.txt
```

Run the demo:

```console
python3 main.py --access_key ${ACCESS_KEY} --picollm_model_path ${PICOLLM_MODEL_PATH}
```

Replace `${ACCESS_KEY}` with your AccessKey obtained from Picovoice Console and `${PICOLLM_MODEL_PATH}` with the path
to the model downloaded from Picovoice Console.

To see all available options, type the following:

```console
python3 main.py --help
```

## Custom Wake Word

The demo's default wake phrase is `Picovoice`. You can generate a custom (branded) wake word using Picovoice Console by
following the [Porcupine Wake Word documentation](https://picovoice.ai/docs/porcupine/). Once you have trained the
model, pass it to the demo application using the `--keyword_model_path` argument.
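
If you launch the demo from a script, a small helper can keep the AccessKey out of the script itself. The sketch below is only an illustration: `build_demo_command` is a hypothetical helper, and the environment variable names `ACCESS_KEY` and `PICOLLM_MODEL_PATH` are assumptions borrowed from the placeholders above. Only the `--access_key`, `--picollm_model_path`, and `--keyword_model_path` arguments come from the demo.

```python
import sys
from typing import List, Mapping, Optional


def build_demo_command(
        env: Mapping[str, str],
        keyword_model_path: Optional[str] = None) -> List[str]:
    """Assemble the argument list for running main.py.

    `env` is any mapping that holds ACCESS_KEY and PICOLLM_MODEL_PATH
    (assumed names), for example `os.environ`.
    """
    command = [
        sys.executable, "main.py",
        "--access_key", env["ACCESS_KEY"],
        "--picollm_model_path", env["PICOLLM_MODEL_PATH"],
    ]
    # Only pass a custom wake word model when one was provided.
    if keyword_model_path is not None:
        command += ["--keyword_model_path", keyword_model_path]
    return command
```

From the demo directory, the result could then be passed to `subprocess.run(build_demo_command(os.environ), check=True)`.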

## Profiling

To see the runtime profiling metrics, run the demo with the `--profile` argument:

```console
python3 main.py --access_key ${ACCESS_KEY} --picollm_model_path ${PICOLLM_MODEL_PATH} --profile
```

Replace `${ACCESS_KEY}` with your AccessKey obtained from Picovoice Console and `${PICOLLM_MODEL_PATH}` with the path
to the model downloaded from Picovoice Console.

The demo profiles three metrics: Real-time Factor (RTF), Tokens per Second (TPS), and Latency.

### Real-time Factor (RTF)

RTF is a standard metric for measuring the speed of speech processing (e.g., wake word, speech-to-text, and
text-to-speech). RTF is the CPU time divided by the processed (recognized or synthesized) audio length. Hence, a lower RTF means a more efficient engine.
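
The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the demo's own profiler: `real_time_factor` and `measure_rtf` are hypothetical names, and `process` stands in for whatever engine call you want to time.

```python
import time
from typing import Callable


def real_time_factor(cpu_seconds: float, audio_seconds: float) -> float:
    """RTF = CPU time / length of the processed audio (both in seconds)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return cpu_seconds / audio_seconds


def measure_rtf(process: Callable[[], None], audio_seconds: float) -> float:
    """Run `process` once and return its RTF.

    time.process_time() reports CPU time for the whole process, so other
    work running in the same process is counted as well.
    """
    start = time.process_time()
    process()
    cpu_seconds = time.process_time() - start
    return real_time_factor(cpu_seconds, audio_seconds)
```

For example, 0.5 seconds of CPU time spent on 10 seconds of audio gives an RTF of 0.05.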

### Tokens per Second (TPS)

Tokens per second is the standard metric for measuring the speed of LLM inference engines. TPS is the number of
generated tokens divided by the compute time used to create them. A higher TPS is better.
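
As a small worked example of this ratio, assuming you already have a token count and the compute time in seconds (`tokens_per_second` is a hypothetical helper, not a function from the demo):

```python
def tokens_per_second(num_tokens: int, compute_seconds: float) -> float:
    """TPS = generated tokens / compute time spent generating them."""
    if compute_seconds <= 0:
        raise ValueError("compute_seconds must be positive")
    return num_tokens / compute_seconds


# 120 tokens generated in 8 seconds of compute time.
print(tokens_per_second(120, 8.0))  # 15.0
```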

### Latency

We measure the latency as the delay between the end of the user's utterance (i.e., the time when the user finishes talking) and the
time that the voice assistant generates the first chunk of the audio response (i.e., when the user starts hearing the response).
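
One way to capture this delay is to record two timestamps: one when the end of the utterance is detected and one when the first audio chunk is ready. The sketch below assumes that approach; `LatencyTimer` and its methods are hypothetical and are not taken from the demo's source.

```python
import time
from typing import Optional


class LatencyTimer:
    """Tracks the delay between end of speech and the first audio chunk."""

    def __init__(self) -> None:
        self._utterance_end: Optional[float] = None
        self._first_audio: Optional[float] = None

    def mark_utterance_end(self, now: Optional[float] = None) -> None:
        # Call when the user stops talking; resets any earlier measurement.
        self._utterance_end = time.perf_counter() if now is None else now
        self._first_audio = None

    def mark_first_audio(self, now: Optional[float] = None) -> None:
        # Only the first chunk counts; later chunks are ignored.
        if self._first_audio is None:
            self._first_audio = time.perf_counter() if now is None else now

    def latency_seconds(self) -> Optional[float]:
        # Returns None until both timestamps have been recorded.
        if self._utterance_end is None or self._first_audio is None:
            return None
        return self._first_audio - self._utterance_end
```

The optional `now` argument exists only so the timestamps can be injected for testing; in normal use both marks would rely on `time.perf_counter()`.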
