Commit 231f3e2

committed
Readme and main.py
1 parent c2f4f93 commit 231f3e2

6 files changed

+410
-0
lines changed
.env.example

+1
@@ -0,0 +1 @@
OPENAI_API_KEY=your_api_key

CODE_OF_CONDUCT.md

+134
@@ -0,0 +1,134 @@
# Contributor Covenant Code of Conduct

## Our Pledge

We as members, contributors, and leaders pledge to make participation in our
community a harassment-free experience for everyone, regardless of age, body
size, visible or invisible disability, ethnicity, sex characteristics, gender
identity and expression, level of experience, education, socio-economic status,
nationality, personal appearance, race, caste, color, religion, or sexual
identity and orientation.

We pledge to act and interact in ways that contribute to an open, welcoming,
diverse, inclusive, and healthy community.

## Our Standards

Examples of behavior that contributes to a positive environment for our
community include:

* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes,
  and learning from the experience
* Focusing on what is best not just for us as individuals, but for the overall
  community

Examples of unacceptable behavior include:

* The use of sexualized language or imagery, and sexual attention or advances of
  any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email address,
  without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a
  professional setting

## Enforcement Responsibilities

Community leaders are responsible for clarifying and enforcing our standards of
acceptable behavior and will take appropriate and fair corrective action in
response to any behavior that they deem inappropriate, threatening, offensive,
or harmful.

Community leaders have the right and responsibility to remove, edit, or reject
comments, commits, code, wiki edits, issues, and other contributions that are
not aligned to this Code of Conduct, and will communicate reasons for moderation
decisions when appropriate.

## Scope

This Code of Conduct applies within all community spaces, and also applies when
an individual is officially representing the community in public spaces.
Examples of representing our community include using an official email address,
posting via an official social media account, or acting as an appointed
representative at an online or offline event.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement at

All complaints will be reviewed and investigated promptly and fairly.

All community leaders are obligated to respect the privacy and security of the
reporter of any incident.

## Enforcement Guidelines

Community leaders will follow these Community Impact Guidelines in determining
the consequences for any action they deem in violation of this Code of Conduct:

### 1. Correction

**Community Impact**: Use of inappropriate language or other behavior deemed
unprofessional or unwelcome in the community.

**Consequence**: A private, written warning from community leaders, providing
clarity around the nature of the violation and an explanation of why the
behavior was inappropriate. A public apology may be requested.

### 2. Warning

**Community Impact**: A violation through a single incident or series of
actions.

**Consequence**: A warning with consequences for continued behavior. No
interaction with the people involved, including unsolicited interaction with
those enforcing the Code of Conduct, for a specified period of time. This
includes avoiding interactions in community spaces as well as external channels
like social media. Violating these terms may lead to a temporary or permanent
ban.

### 3. Temporary Ban

**Community Impact**: A serious violation of community standards, including
sustained inappropriate behavior.

**Consequence**: A temporary ban from any sort of interaction or public
communication with the community for a specified period of time. No public or
private interaction with the people involved, including unsolicited interaction
with those enforcing the Code of Conduct, is allowed during this period.
Violating these terms may lead to a permanent ban.

### 4. Permanent Ban

**Community Impact**: Demonstrating a pattern of violation of community
standards, including sustained inappropriate behavior, harassment of an
individual, or aggression toward or disparagement of classes of individuals.

**Consequence**: A permanent ban from any sort of public interaction within the
community.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 2.1, available at
[https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1].

Community Impact Guidelines were inspired by
[Mozilla's code of conduct enforcement ladder][Mozilla CoC].

For answers to common questions about this code of conduct, see the FAQ at
[https://www.contributor-covenant.org/faq][FAQ]. Translations are available at
[https://www.contributor-covenant.org/translations][translations].

[homepage]: https://www.contributor-covenant.org
[v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
[Mozilla CoC]: https://github.com/mozilla/diversity
[FAQ]: https://www.contributor-covenant.org/faq
[translations]: https://www.contributor-covenant.org/translations

LICENSE

+21
@@ -0,0 +1,21 @@
MIT License Copyright (c) 2024 pkamp

Permission is hereby granted, free of
charge, to any person obtaining a copy of this software and associated
documentation files (the "Software"), to deal in the Software without
restriction, including without limitation the rights to use, copy, modify, merge,
publish, distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to the
following conditions:

The above copyright notice and this permission notice
(including the next paragraph) shall be included in all copies or substantial
portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF
ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO
EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

Readme.md

+86
@@ -0,0 +1,86 @@
# Speech Assistant with Twilio Voice and the OpenAI Realtime API (Python)

This application demonstrates how to use Python, [Twilio Voice](https://www.twilio.com/docs/voice) and [Media Streams](https://www.twilio.com/docs/voice/media-streams), and [OpenAI's Realtime API](https://platform.openai.com/docs/) to make a phone call to speak with an AI Assistant.

The application opens websockets with the OpenAI Realtime API and Twilio, and sends voice audio from one to the other to enable a two-way conversation.
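
As a rough illustration of that relay, the sketch below shows how a single message might be translated in each direction. It assumes the event shapes used in `main.py` (`media` events from Twilio, `input_audio_buffer.append` and `response.audio.delta` events on the Realtime API side). The function names `twilio_media_to_openai` and `openai_delta_to_twilio` are hypothetical and do not appear in the app.

```python
import json


def twilio_media_to_openai(message: str):
    """Turn a Twilio `media` event into an `input_audio_buffer.append` event.

    Returns None for any other Twilio event (for example `start`).
    """
    data = json.loads(message)
    if data.get("event") != "media":
        return None
    return json.dumps({
        "type": "input_audio_buffer.append",
        # The base64 audio payload is passed through unchanged.
        "audio": data["media"]["payload"],
    })


def openai_delta_to_twilio(message: str, stream_sid: str):
    """Turn a `response.audio.delta` event into a Twilio `media` event.

    Returns None for events that carry no audio.
    """
    event = json.loads(message)
    if event.get("type") != "response.audio.delta" or not event.get("delta"):
        return None
    return {
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": event["delta"]},
    }
```

Because both sides are configured for the same audio format (`g711_ulaw` in the session settings), the payload can be forwarded without transcoding in this sketch.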

See [here](https://www.twilio.com/en-us/voice-ai-assistant-openai-realtime-api-python) for a tutorial overview of the code.

This application uses the following Twilio products in conjunction with OpenAI's Realtime API:
- Voice (and TwiML, Media Streams)
- Phone Numbers

## Prerequisites

To use the app, you will need:

- **Python 3.9+.** We used `3.9.13` for development; download from [here](https://www.python.org/downloads/).
- **A Twilio account.** You can sign up for a free trial [here](https://www.twilio.com/try-twilio).
- **A Twilio number with _Voice_ capabilities.** [Here are instructions](https://help.twilio.com/articles/223135247-How-to-Search-for-and-Buy-a-Twilio-Phone-Number-from-Console) to purchase a phone number.
- **An OpenAI account and an OpenAI API Key.** You can sign up [here](https://platform.openai.com/).
- **OpenAI Realtime API access.**

## Local Setup

There are 4 required steps and 1 optional step to get the app up-and-running locally for development and testing:
1. Run ngrok or another tunneling solution to expose your local server to the internet for testing. Download ngrok [here](https://ngrok.com/).
2. (optional) Create and use a virtual environment
3. Install the packages
4. Twilio setup
5. Update the .env file

### Open an ngrok tunnel
When developing & testing locally, you'll need to open a tunnel to forward requests to your local development server. These instructions use ngrok.

Open a Terminal and run:
```
ngrok http 5050
```
Once the tunnel has been opened, copy the `Forwarding` URL. It will look something like: `https://[your-ngrok-subdomain].ngrok.app`. You will need this when configuring your Twilio number setup.

Note that the `ngrok` command above forwards to a development server running on port `5050`, which is the default port configured in this application. If you override the `PORT` defined in `main.py`, you will need to update the `ngrok` command accordingly.

Keep in mind that each time you run the `ngrok http` command, a new URL will be created, and you'll need to update it everywhere it is referenced below.

### (Optional) Create and use a virtual environment

To reduce cluttering your global Python environment on your machine, you can create a virtual environment. On your command line, enter:

```
python3 -m venv env
source env/bin/activate
```

### Install required packages

In the terminal (with the virtual environment, if you set it up) run:
```
pip install -r requirements.txt
```

### Twilio setup

#### Point a Phone Number to your ngrok URL
In the [Twilio Console](https://console.twilio.com/), go to **Phone Numbers** > **Manage** > **Active Numbers** and click on the additional phone number you purchased for this app in the **Prerequisites**.

In your Phone Number configuration settings, update the first **A call comes in** dropdown to **Webhook**, and paste your ngrok forwarding URL (referenced above), followed by `/incoming-call`. For example, `https://[your-ngrok-subdomain].ngrok.app/incoming-call`. Then, click **Save configuration**.
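
Both URLs involved in a call can be derived from the ngrok forwarding URL. The following is a minimal sketch, assuming the `/incoming-call` and `/media-stream` paths defined in `main.py`. The helper `twilio_urls` is hypothetical and only illustrates the string handling; it is not part of the app.

```python
from urllib.parse import urlsplit


def twilio_urls(forwarding_url: str) -> dict:
    """Derive the webhook and Media Stream URLs from an ngrok forwarding URL."""
    parts = urlsplit(forwarding_url.strip())
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError("Expected an https:// forwarding URL")
    host = parts.hostname
    return {
        # Paste this one into the "A call comes in" webhook field.
        "webhook": f"https://{host}/incoming-call",
        # The TwiML returned by the app points Twilio at this one.
        "media_stream": f"wss://{host}/media-stream",
    }
```

For example, `twilio_urls("https://example.ngrok.app")["webhook"]` yields `https://example.ngrok.app/incoming-call`, where `example` stands in for your own ngrok subdomain.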

### Update the .env file

Create a `.env` file, or copy the `.env.example` file to `.env`:

```
cp .env.example .env
```

In the .env file, update the `OPENAI_API_KEY` to your OpenAI API key from the **Prerequisites**.
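
`main.py` loads this file with `python-dotenv` and refuses to start when the key is missing. The sketch below mimics that check using only the standard library, so it handles just plain `KEY=value` lines. `parse_env` and `require_api_key` are hypothetical helpers, and rejecting the `your_api_key` placeholder is an extra assumption rather than something the app does.

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_api_key(values: dict) -> str:
    """Return the API key, or raise if it is missing or still the placeholder."""
    key = values.get("OPENAI_API_KEY", "")
    if not key or key == "your_api_key":
        raise ValueError("Missing the OpenAI API key. Please set it in the .env file.")
    return key
```

Running a check like this before starting the server can catch a forgotten `cp .env.example .env` edit early.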

## Run the app
Once ngrok is running, dependencies are installed, Twilio is configured properly, and the `.env` is set up, run the dev server with the following command:
```
python main.py
```
## Test the app
With the development server running, call the phone number you purchased in the **Prerequisites**. After the introduction, you should be able to talk to the AI Assistant. Have fun!
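
The introduction the caller hears comes from the TwiML returned by `/incoming-call`. `main.py` builds it with the Twilio helper library; the sketch below reproduces roughly the same document with the standard library, purely to show its shape. `build_twiml` is a hypothetical name, and the first greeting is shortened here.

```python
import xml.etree.ElementTree as ET


def build_twiml(host: str) -> str:
    """Build a TwiML document that greets the caller, then opens a Media Stream."""
    response = ET.Element("Response")
    # Shortened greeting (assumption); main.py uses a longer sentence.
    ET.SubElement(response, "Say").text = "Please wait while we connect your call."
    ET.SubElement(response, "Pause", length="1")
    ET.SubElement(response, "Say").text = "O.K. you can start talking!"
    # <Connect><Stream> tells Twilio to open a WebSocket to the app.
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", url=f"wss://{host}/media-stream")
    return ET.tostring(response, encoding="unicode")
```

Once Twilio reaches the `<Connect>` verb it opens the WebSocket, and from that point the audio flows through the `/media-stream` handler.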

main.py

+138
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,138 @@
import os
import json
import base64
import asyncio
import websockets
from fastapi import FastAPI, WebSocket, Request
from fastapi.responses import HTMLResponse
from fastapi.websockets import WebSocketDisconnect
from twilio.twiml.voice_response import VoiceResponse, Connect, Say, Stream
from dotenv import load_dotenv

load_dotenv()

# Configuration
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY') # requires OpenAI Realtime API Access
PORT = int(os.getenv('PORT', 5050))
SYSTEM_MESSAGE = (
    "You are a helpful and bubbly AI assistant who loves to chat about "
    "anything the user is interested in and is prepared to offer them facts. "
    "You have a penchant for dad jokes, owl jokes, and rickrolling – subtly. "
    "Always stay positive, but work in a joke when appropriate."
)
VOICE = 'alloy'
LOG_EVENT_TYPES = [
    'response.content.done', 'rate_limits.updated', 'response.done',
    'input_audio_buffer.committed', 'input_audio_buffer.speech_stopped',
    'input_audio_buffer.speech_started', 'session.created'
]

app = FastAPI()


if not OPENAI_API_KEY:
    raise ValueError('Missing the OpenAI API key. Please set it in the .env file.')

@app.get("/")
async def index_page():
    # FastAPI serializes a returned dict as JSON by default; HTMLResponse cannot render a dict.
    return {"message": "Twilio Media Stream Server is running!"}

@app.api_route("/incoming-call", methods=["GET", "POST"])
async def handle_incoming_call(request: Request):
    """Handle incoming call and return TwiML response to connect to Media Stream."""
    response = VoiceResponse()
    # <Say> punctuation to improve text-to-speech flow
    response.say("Please wait while we connect your call to the A. I. voice assistant, powered by Twilio and the Open-A.I. Realtime API")
    response.pause(length=1)
    response.say("O.K. you can start talking!")
    host = request.url.hostname
    connect = Connect()
    connect.stream(url=f'wss://{host}/media-stream')
    response.append(connect)
    return HTMLResponse(content=str(response), media_type="application/xml")

@app.websocket("/media-stream")
async def handle_media_stream(websocket: WebSocket):
    """Handle WebSocket connections between Twilio and OpenAI."""
    print("Client connected")
    await websocket.accept()

    async with websockets.connect(
        'wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01',
        extra_headers={
            "Authorization": f"Bearer {OPENAI_API_KEY}",
            "OpenAI-Beta": "realtime=v1"
        }
    ) as openai_ws:
        await send_session_update(openai_ws)
        stream_sid = None

        async def receive_from_twilio():
            """Receive audio data from Twilio and send it to the OpenAI Realtime API."""
            nonlocal stream_sid
            try:
                async for message in websocket.iter_text():
                    data = json.loads(message)
                    if data['event'] == 'media' and openai_ws.open:
                        audio_append = {
                            "type": "input_audio_buffer.append",
                            "audio": data['media']['payload']
                        }
                        await openai_ws.send(json.dumps(audio_append))
                    elif data['event'] == 'start':
                        stream_sid = data['start']['streamSid']
                        print(f"Incoming stream has started {stream_sid}")
            except WebSocketDisconnect:
                print("Client disconnected.")
                if openai_ws.open:
                    await openai_ws.close()

        async def send_to_twilio():
            """Receive events from the OpenAI Realtime API, send audio back to Twilio."""
            nonlocal stream_sid
            try:
                async for openai_message in openai_ws:
                    response = json.loads(openai_message)
                    if response['type'] in LOG_EVENT_TYPES:
                        print(f"Received event: {response['type']}", response)
                    if response['type'] == 'session.updated':
                        print("Session updated successfully:", response)
                    if response['type'] == 'response.audio.delta' and response.get('delta'):
                        # Audio from OpenAI
                        try:
                            audio_payload = base64.b64encode(base64.b64decode(response['delta'])).decode('utf-8')
                            audio_delta = {
                                "event": "media",
                                "streamSid": stream_sid,
                                "media": {
                                    "payload": audio_payload
                                }
                            }
                            await websocket.send_json(audio_delta)
                        except Exception as e:
                            print(f"Error processing audio data: {e}")
            except Exception as e:
                print(f"Error in send_to_twilio: {e}")

        await asyncio.gather(receive_from_twilio(), send_to_twilio())

async def send_session_update(openai_ws):
    """Send session update to OpenAI WebSocket."""
    session_update = {
        "type": "session.update",
        "session": {
            "turn_detection": {"type": "server_vad"},
            "input_audio_format": "g711_ulaw",
            "output_audio_format": "g711_ulaw",
            "voice": VOICE,
            "instructions": SYSTEM_MESSAGE,
            "modalities": ["text", "audio"],
            "temperature": 0.8,
        }
    }
    print('Sending session update:', json.dumps(session_update))
    await openai_ws.send(json.dumps(session_update))

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=PORT)
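
The core of `handle_media_stream` is two coroutines run together with `asyncio.gather`, one per direction. The sketch below models that pattern with in-memory `asyncio.Queue` objects standing in for the two WebSockets, so it runs without Twilio or OpenAI. `pump`, `drain`, and `relay_demo` are hypothetical names, and the `None` sentinel is only a stand-in for a closed connection.

```python
import asyncio


async def pump(source: asyncio.Queue, sink: asyncio.Queue, transform):
    """Forward items from source to sink until a None sentinel arrives."""
    while True:
        item = await source.get()
        if item is None:
            await sink.put(None)  # propagate the "connection closed" marker
            break
        await sink.put(transform(item))


def drain(queue: asyncio.Queue) -> list:
    """Collect everything currently waiting in a queue."""
    items = []
    while not queue.empty():
        items.append(queue.get_nowait())
    return items


async def relay_demo():
    # Queues standing in for: Twilio receive, OpenAI send, OpenAI receive, Twilio send.
    twilio_rx, openai_tx = asyncio.Queue(), asyncio.Queue()
    openai_rx, twilio_tx = asyncio.Queue(), asyncio.Queue()
    for chunk in ("caller-1", "caller-2", None):
        twilio_rx.put_nowait(chunk)
    for chunk in ("model-1", None):
        openai_rx.put_nowait(chunk)

    # Both directions run concurrently, as in handle_media_stream.
    await asyncio.gather(
        pump(twilio_rx, openai_tx,
             lambda c: {"type": "input_audio_buffer.append", "audio": c}),
        pump(openai_rx, twilio_tx,
             lambda c: {"event": "media", "media": {"payload": c}}),
    )
    return drain(openai_tx), drain(twilio_tx)
```

Calling `asyncio.run(relay_demo())` returns the messages that would have been sent to each side. Running the two directions as separate coroutines means that waiting for caller audio never blocks delivery of the assistant's audio.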
