An intelligent Live2D desktop assistant.
Petto periodically outputs quotes/contextual greetings/time greetings, supports streaming/Whisper voice recognition, TTS voice synthesis, motion triggering, expression management, etc. Since v3.0.0, it uses the new Kage mode by default; the original Live2DViewerEX ExAPI workflow is retained as "compatibility mode".
v3.0.0 is a major update featuring code refactoring, new Kage default mode, cross-platform improvements, and more.
View complete changelog: CHANGELOG_v3.0.0.md
| Mode | Status | Documentation |
|---|---|---|
| Kage | Recommended | docs/MODE_KAGE.md |
| Live2DViewerEX Compatibility | Supported | docs/MODE_LIVE2DVIEWEREX.md |
- Download the latest Release and extract
- Run
pettodirectly - If Kage is not found on first launch, it will detect and prompt for download
- Confirm WebSocket address in settings (default
ws://localhost:23333), optionally select model JSON - Click the refresh button next to action groups to auto-fetch motions
- Start chatting or enable streaming recognition
Settings → Switch to compatibility mode → Fill in ExAPI address and model number. See MODE_LIVE2DVIEWEREX.md for details.
- Contextual greetings (based on weather/time/current window screenshot)
- Voice interaction:
- Streaming MASR-compatible recognition
- Whisper recognition (OpenAI compatible)
- Background keyword wake-up
- TTS voice playback (OpenAI compatible, demo endpoint provided)
- Wake-up via hotkey
- Auto-fetch motion list in Kage mode
- Optional window screenshots
- Update check and logging toggle
You can configure Petto features in detail from the settings in the top right corner of the main interface.
- Kage (Default): Modern WebSocket interface with auto-download support, supports motion/expression/text management
- See MODE_KAGE.md for detailed configuration
- Live2DViewerEX Compatibility: Maintains support for legacy versions, requires manual configuration
- See MODE_LIVE2DVIEWEREX.md for detailed configuration
Petto supports OpenAI-compatible language model APIs.
Petto is pre-configured with a public language model: https://free-llm.cups.moe/v1, which includes some directly usable models.
Key: sk-ObgTAfL0qYK6OmoYpQx2qra3EyIxdtP2DPAzz8D5wwe3Eb9l
Note: This public service may periodically change keys or shut down. Please do not abuse.
You can customize the character's name, profile, alias, and other information.
Write character settings that match your expectations based on the desktop pet character you're using!
LLM responses will be influenced by message examples.
This setting provides reference examples to the model. If the LLM's responses don't meet your expectations, try writing example messages for the model to learn from.
If the recognition address is left blank, Whisper recognition mode will be used.
The background recognition service keeps recording continuously and sends voice content to the configured recognition address. Be aware of potential privacy and security issues.
Currently, streaming recognition must use an interface compatible with the MASR server-side recognition project.
The project is pre-configured with a public streaming recognition service: wss://api.cups.moe/api/asr/
Server performance is poor, use with care :) Heavy usage might crash my server.
It's recommended to deploy MASR service yourself following the tutorial below, or use Whisper mode.
After enabling background streaming recognition, Petto will continuously run in the background. When any speech containing the background wake-up keyword is detected, the desktop pet will send you a message and prompt you to talk to it:
User: Help me
Desktop Pet: Master, what do you need help with? Please tell me~
Then, Petto will automatically start a ten-second recording recognition session. You can interact with the desktop pet, and after the recording ends, it will respond.
Unlike streaming recognition, Whisper recognition mode must complete recording before obtaining the final text, which mainly affects background recognition speed.
Petto supports OpenAI-compatible Whisper APIs.
If there are no special circumstances, keep the default value.
Fill in the API address for requesting Hitokoto quotes.
Petto supports OpenAI-compatible TTS APIs.
Although TTS information is not filled in by default, we actually provide a ready-to-use TTS service:
TTS Address: https://api.cups.moe/api/tts/
TTS Key: ecWdn$TJ&ktP#89
This service is deployed based on openai-edge-tts
Set the hotkey to wake up Petto. Record and save to take effect.
This feature captures screenshots of the current active window as context information. Disabled by default.
If screenshot mode is enabled in settings as context:
- Petto will pass the current active window screenshot to the language model to generate more contextual greetings
- Be aware of potential privacy and security issues
Cross-platform Support:
v3.0.0 uses xcap for cross-platform screenshots, supporting Windows, macOS, and Linux.
Note: v3.0.0 has removed the legacy "window title" feature, now unified to use window screenshots.
Set the display time for desktop pet text bubbles, in milliseconds.
Default: 3000 (3 seconds)
- Applies to both Kage mode and Live2DViewerEX compatibility mode
- Controls display duration for greetings, reply messages, and other text
- (Optional) In settings, check "Hide window on startup"
- Press Win+R, type shell:startup. Create a shortcut for petto.exe and place it in this folder
The models provided below are trained only on Chinese corpus. You can manually train models that support more languages and have higher accuracy, or use pre-trained models or Whisper API for voice recognition.
- Download inference.pt for local deployment.
- Download model.pdiparams to punctuate speech recognition results.
Streaming: Real-time transmission, fast response. Whisper: Record first then recognize. Leave the streaming address blank to enable Whisper mode. Privacy: Be aware of privacy risks when using voice recognition.
- Support more languages
- Voice authentication
- UI optimization
- macOS support
- Linux support
- Kage integration
| Issue | Troubleshooting |
|---|---|
| Cannot connect to Kage | Check ws address/firewall/port occupation |
| Motion list is empty | Is model JSON correct; refresh; check logs |
| Text not displaying | Text display duration>0; is version latest |
| Streaming lag | Self-host MASR or switch to Whisper |
| No motions in compatibility mode | Is action group filled correctly |
| Kage not auto-detected | Manually select executable and restart |
Contributions and bug reports are welcome.
GPL-3.0-or-later, see LICENSE for details.
Sponsored by NETJETT: https://netjett.com/aff.php?aff=45
Need more detailed mode instructions? Read: docs/MODE_KAGE.md and docs/MODE_LIVE2DVIEWEREX.md.

