feat: add article email-phone-mispronunciation-tts #268

@@ -0,0 +1,34 @@
# Addressing Mispronunciation of Emails and Phone Numbers in Text-to-Speech

Text-to-speech (TTS) engines sometimes mispronounce structured details such as email addresses and phone numbers. Fixing these cases noticeably improves the experience of users interacting with automated voice systems.

### Common Issues

1. **Email Mispronunciation**: Some TTS engines do not read email addresses as expected. For instance, the domain suffix "au" might be spoken as the single syllable "aww" rather than as the individual letters "A U".

2. **Phone Number Numerical Interpretation**: Phone numbers may be read as one large number rather than as a sequence of digits (e.g., "1800123456" being read as "one billion eight hundred million...").

These issues were observed in requests processed through the Voice Agent API.
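To make the two issues concrete, the sketch below builds the readings a listener would typically expect, so they can be compared by ear against what the TTS engine actually says. The helper names `digits_spoken` and `email_spoken` are hypothetical, and the chosen reading style (every digit spoken separately, the final domain label spelled out) is an assumption; some deployments may prefer forms such as "one eight hundred".

```python
# Illustrative helpers only; not part of any Deepgram SDK.
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def digits_spoken(number: str) -> str:
    """Return the digit-by-digit reading of a phone number, ignoring punctuation."""
    return " ".join(DIGIT_WORDS[int(ch)] for ch in number if ch.isdigit())


def email_spoken(address: str) -> str:
    """Return a reading of an email address with the final domain label spelled out."""
    local, _, domain = address.partition("@")
    labels = domain.split(".")
    # Spell the last label (for example "au") letter by letter: "a u".
    spoken_labels = labels[:-1] + [" ".join(labels[-1])]
    return f"{local.replace('.', ' dot ')} at {' dot '.join(spoken_labels)}"


print(digits_spoken("1800123456"))
print(email_spoken("jane@example.com.au"))
```

Strings like these are useful as a checklist when listening to test output; they are not sent to the API.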

### Solution

Upgrading the voice model to a newer version, such as `aura-2`, has been shown to resolve these mispronunciations. The newer model may also improve the naturalness and accuracy of speech output for other complex or non-standard text inputs.

### Implementation

- **Upgrade Voice Model**: Update your configuration so that it selects a newer voice model, such as `aura-2`, from the models available in the API you are using.

- **Verify API Configuration**: Confirm that your Voice Agent API settings actually reference the new model, then test the output with representative email addresses and phone numbers to confirm the improvement.
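The upgrade step can be sketched as a small change to the settings payload sent to the Voice Agent API. The message shape used here (`agent.speak.provider.model`) and both model names are assumptions for illustration; check the Voice Agent documentation for the exact schema and the list of available `aura-2` voices. `with_speak_model` is a hypothetical helper, not an SDK function.

```python
import copy
import json


def with_speak_model(settings: dict, model: str) -> dict:
    """Return a copy of the settings with the TTS (speak) model replaced."""
    updated = copy.deepcopy(settings)
    # Assumed layout: settings["agent"]["speak"]["provider"]["model"].
    provider = (
        updated.setdefault("agent", {})
        .setdefault("speak", {})
        .setdefault("provider", {})
    )
    provider.setdefault("type", "deepgram")
    provider["model"] = model
    return updated


# Placeholder settings; the original model name is an assumed example.
current = {
    "type": "Settings",
    "agent": {"speak": {"provider": {"type": "deepgram", "model": "aura-asteria-en"}}},
}

upgraded = with_speak_model(current, "aura-2-thalia-en")
print(json.dumps(upgraded, indent=2))
```

Returning a copy rather than mutating the input keeps the previous configuration available, which makes it easier to compare speech output before and after the change.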

### Request IDs for Diagnosis
If you experience similar issues and need deeper diagnosis, log the request IDs of the affected requests so that support can trace them. For example:
- Email pronunciation issue: `Request ID: 11111111-2222-3333-4444-555555555555`
- Phone number misinterpretation: `Request ID: 66666666-7777-8888-9999-000000000000`
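One way to capture request IDs is to inspect each JSON text message received from the API and log the ID when one is present. This is a minimal sketch that assumes the ID arrives in a top-level `request_id` field of a JSON message; the field name, the `Welcome` message type in the usage line, and the helper `extract_request_id` are assumptions to be checked against the Voice Agent documentation.

```python
import json
import logging
from typing import Optional

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("voice_agent")


def extract_request_id(raw_message: str) -> Optional[str]:
    """Log and return the request ID from a JSON text message, if it has one."""
    try:
        message = json.loads(raw_message)
    except json.JSONDecodeError:
        return None  # Not JSON, so there is nothing to extract.
    if not isinstance(message, dict):
        return None
    request_id = message.get("request_id")  # Assumed field name.
    if isinstance(request_id, str):
        logger.info("Voice Agent request_id=%s", request_id)
        return request_id
    return None


# Usage with a hypothetical message, reusing the example ID above.
extract_request_id(
    '{"type": "Welcome", "request_id": "11111111-2222-3333-4444-555555555555"}'
)
```

Only text frames should be passed to this helper; binary audio frames carry no JSON and can be skipped.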

If the issue persists or the system behaves inconsistently, contact your Deepgram support representative or ask the community for help: [Deepgram Community on Discord](https://discord.gg/deepgram).

### Conclusion
Accurate pronunciation of email addresses and phone numbers improves the user experience. Keeping your voice model up to date and verifying your API configuration mitigates these issues.

### References
- [Deepgram Voice Agent API](https://developers.deepgram.com/docs/voice-agent)