Merge pull request #1 from Picovoice/voice-llm-python
llm-powered voice assistant in python
Showing 7 changed files with 494 additions and 2 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,6 @@ | ||
# pico-cookbook | ||
# Pico Cookbook | ||
|
||
Made in Vancouver, Canada by [Picovoice](https://picovoice.ai) | ||
|
||
[![Twitter URL](https://img.shields.io/twitter/url?label=%40AiPicovoice&style=social&url=https%3A%2F%2Ftwitter.com%2FAiPicovoice)](https://twitter.com/AiPicovoice)<!-- markdown-link-check-disable-line --> | ||
[![YouTube Channel Views](https://img.shields.io/youtube/channel/views/UCAdi9sTCXLosG1XeqDwLx7w?label=YouTube&style=social)](https://www.youtube.com/channel/UCAdi9sTCXLosG1XeqDwLx7w) |
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
# LLM-Powered Voice Assistant | ||
|
||
A hands-free voice assistant powered by a large language model (LLM). Voice recognition, LLM inference, and speech synthesis all run on-device. | ||
|
||
## Components | ||
|
||
- [Porcupine Wake Word](https://picovoice.ai/docs/porcupine/) | ||
- [Cheetah Streaming Speech-to-Text](https://picovoice.ai/docs/cheetah/) | ||
- [picoLLM Inference Engine](https://github.com/Picovoice/picollm) | ||
- [Orca Streaming Text-to-Speech](https://picovoice.ai/docs/orca/) | ||
|
||
## Implementations | ||
|
||
- [Python](python) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,74 @@ | ||
## Compatibility | ||
|
||
- Python 3.8+ | ||
- Runs on Linux (x86_64), macOS (arm64, x86_64), Windows (x86_64), and Raspberry Pi (5 and 4). | ||
|
||
## AccessKey | ||
|
||
AccessKey is your authentication and authorization token for deploying Picovoice SDKs, including picoLLM. Anyone who | ||
uses Picovoice needs a valid AccessKey, and you must keep it secret. You need internet connectivity to validate your | ||
AccessKey with the Picovoice license servers, even though LLM inference runs 100% offline and is completely free for | ||
open-weight models. Everyone who signs up for | ||
[Picovoice Console](https://console.picovoice.ai/) receives a unique AccessKey. | ||
|
||
## picoLLM Model | ||
|
||
picoLLM Inference Engine supports many open-weight models. The models are available for download on | ||
[Picovoice Console](https://console.picovoice.ai/). | ||
|
||
## Usage | ||
|
||
Install the required packages: | ||
|
||
```console | ||
pip install -r requirements.txt | ||
``` | ||
|
||
Run the demo: | ||
|
||
```console | ||
python3 main.py --access_key ${ACCESS_KEY} --picollm_model_path ${PICOLLM_MODEL_PATH} | ||
``` | ||
|
||
Replace `${ACCESS_KEY}` with your AccessKey obtained from Picovoice Console and `${PICOLLM_MODEL_PATH}` with the path to the | ||
model downloaded from Picovoice Console. | ||
|
||
To see all available options, type the following: | ||
|
||
```console | ||
python3 main.py --help | ||
``` | ||
|
||
## Custom Wake Word | ||
|
||
The demo's default wake phrase is `Picovoice`. You can generate a custom (branded) wake word using Picovoice Console by following the [Porcupine Wake Word documentation](https://picovoice.ai/docs/porcupine/). Once you have trained the model, pass it to the demo | ||
application using the `--keyword_model_path` argument. | ||
|
||
## Profiling | ||
|
||
To see the runtime profiling metrics, run the demo with the `--profile` argument: | ||
|
||
```console | ||
python3 main.py --access_key ${ACCESS_KEY} --picollm_model_path ${PICOLLM_MODEL_PATH} --profile | ||
``` | ||
|
||
Replace `${ACCESS_KEY}` with your AccessKey obtained from Picovoice Console and `${PICOLLM_MODEL_PATH}` with the path to the | ||
model downloaded from Picovoice Console. | ||
|
||
The demo profiles three metrics: Real-time Factor (RTF), Token per Second (TPS), and Latency. | ||
|
||
### Real-time Factor (RTF) | ||
|
||
RTF is a standard metric for measuring the speed of speech processing (e.g., wake word, speech-to-text, and | ||
text-to-speech). RTF is the CPU time divided by the processed (recognized or synthesized) audio length. Hence, a lower RTF means a more efficient engine. | ||
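
As a rough illustration of the definition above, the sketch below computes RTF from accumulated CPU time and audio length. The names `real_time_factor` and `measure_rtf` and the `process` callback are hypothetical and are not taken from the demo's code; `time.process_time()` is used here as one possible way to read CPU time.

```python
import time
from typing import Callable, Iterable, Sequence


def real_time_factor(cpu_seconds: float, audio_seconds: float) -> float:
    """RTF = CPU time / length of the processed audio (lower is better)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return cpu_seconds / audio_seconds


def measure_rtf(
    process: Callable[[Sequence[int]], object],  # hypothetical engine call
    frames: Iterable[Sequence[int]],
    sample_rate: int,
) -> float:
    """Feed audio frames to `process` and return the resulting RTF."""
    cpu_seconds = 0.0
    audio_seconds = 0.0
    for frame in frames:
        start = time.process_time()  # CPU time of the current process
        process(frame)
        cpu_seconds += time.process_time() - start
        audio_seconds += len(frame) / sample_rate
    return real_time_factor(cpu_seconds, audio_seconds)
```

For example, an engine that spends 0.5 seconds of CPU time on 10 seconds of audio has an RTF of 0.05.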
|
||
### Token per Second (TPS) | ||
|
||
Token per second is the standard metric for measuring the speed of LLM inference engines. TPS is the number of | ||
generated tokens divided by the compute time used to create them. A higher TPS is better. | ||
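
A minimal sketch of the TPS calculation described above, assuming the token count and the compute time are measured by the caller. The function name `tokens_per_second` is hypothetical and does not come from the demo.

```python
def tokens_per_second(num_tokens: int, compute_seconds: float) -> float:
    """TPS = generated tokens / compute time spent generating them (higher is better)."""
    if compute_seconds <= 0:
        raise ValueError("compute_seconds must be positive")
    return num_tokens / compute_seconds
```

For example, generating 120 tokens in 4 seconds of compute time corresponds to 30 TPS.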
|
||
### Latency | ||
|
||
We measure the latency as the delay between the end of the user's utterance (i.e., the time when the user finishes talking) and the | ||
time that the voice assistant generates the first chunk of the audio response (i.e., when the user starts hearing the response). | ||
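
The latency definition above can be sketched as a difference of two timestamps. This is an illustration under assumptions, not the demo's implementation: the `LatencyTimer` class and its method names are hypothetical, and `time.perf_counter()` is assumed as the clock.

```python
import time
from typing import Optional


class LatencyTimer:
    """Tracks the delay between end of utterance and first response audio."""

    def __init__(self) -> None:
        self._utterance_end: Optional[float] = None
        self._first_audio: Optional[float] = None

    def mark_utterance_end(self, now: Optional[float] = None) -> None:
        # Call when the user finishes talking.
        self._utterance_end = time.perf_counter() if now is None else now
        self._first_audio = None

    def mark_first_audio(self, now: Optional[float] = None) -> None:
        # Call when the first chunk of synthesized audio is produced;
        # later chunks of the same response are ignored.
        if self._first_audio is None:
            self._first_audio = time.perf_counter() if now is None else now

    def latency_seconds(self) -> float:
        if self._utterance_end is None or self._first_audio is None:
            raise RuntimeError("both timestamps must be recorded first")
        return self._first_audio - self._utterance_end
```

The optional `now` argument exists only so the timer can be exercised with fixed timestamps.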
|