Support Audio output from OpenAI models #396

jackmpcollins · 2025-01-06T05:40:09Z

docs: https://platform.openai.com/docs/guides/audio?audio-generation-quickstart-example=audio-out

import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Request both a text transcript and spoken audio in the response
completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[
        {
            "role": "user",
            "content": "Is a golden retriever a good family dog?"
        }
    ]
)

print(completion.choices[0])

# The audio arrives base64-encoded; decode it and save it as a WAV file
wav_bytes = base64.b64decode(completion.choices[0].message.audio.data)
with open("dog.wav", "wb") as f:
    f.write(wav_bytes)

This will require a new output type, maybe StreamedAudioResponse. When this type appears in the return type of a prompt-function, the modalities=["text", "audio"] and audio arguments would be added to the chat completions request.
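A minimal sketch of that return-type check, assuming a hypothetical StreamedAudioResponse placeholder class and a hypothetical build_request_kwargs helper (neither exists in the library today). The voice defaults to "alloy" only because the docs example uses it, since how to set it is still an open question. Because the proposed type is streamed, the sketch requests pcm16 with stream=True.

```python
import types
from typing import Any, Dict, Union, get_args, get_origin

# Covers typing.Union[...] and PEP 604 `X | Y` unions (Python 3.10+)
_UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


class StreamedAudioResponse:
    """Hypothetical placeholder for the proposed output type."""


def contains_type(return_type: Any, target: type) -> bool:
    """Return True if `target` is `return_type` or one of its Union members."""
    if return_type is target:
        return True
    if get_origin(return_type) in _UNION_ORIGINS:
        return any(contains_type(arg, target) for arg in get_args(return_type))
    return False


def build_request_kwargs(return_type: Any, voice: str = "alloy") -> Dict[str, Any]:
    """Hypothetical helper: extra chat completions kwargs implied by the return type."""
    kwargs: Dict[str, Any] = {}
    if contains_type(return_type, StreamedAudioResponse):
        kwargs["modalities"] = ["text", "audio"]
        # Streamed audio uses pcm16, per the streaming note in this issue
        kwargs["audio"] = {"voice": voice, "format": "pcm16"}
        kwargs["stream"] = True
    return kwargs
```

Return types without the audio type leave the request unchanged, so existing prompt-functions would be unaffected.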

Audio output works with stream=True using audio={"voice": "alloy", "format": "pcm16"}. The response is a mix of transcript and audio chunks, so StreamedAudioResponse could be an iterable of StreamedStr and StreamedAudio (similar to StreamedResponse). StreamedAudio would be an iterable of bytes (or a new AudioBytes that could be used for audio input, see PR #397).
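A minimal sketch of that structure, assuming stream chunks have already been reduced to hypothetical ("transcript", text) and ("audio", base64_data) tuples. How real stream chunks map onto these tuples is an assumption, and the classes below are stand-ins named after the proposal, not the library's StreamedStr or StreamedResponse. Consecutive chunks of the same kind are grouped into one StreamedStr or StreamedAudio, and audio payloads are base64-decoded into raw pcm16 bytes.

```python
import base64
from itertools import groupby
from typing import Iterable, Iterator, Tuple, Union

# Hypothetical chunk shape: (kind, payload), kind is "transcript" or "audio",
# and audio payloads are assumed base64-encoded as in the non-streamed example
Chunk = Tuple[str, str]


class StreamedStr:
    """Stand-in: a run of transcript text chunks."""

    def __init__(self, chunks: Iterable[str]) -> None:
        self._chunks = chunks

    def __iter__(self) -> Iterator[str]:
        yield from self._chunks


class StreamedAudio:
    """Stand-in: a run of decoded pcm16 audio chunks."""

    def __init__(self, chunks: Iterable[bytes]) -> None:
        self._chunks = chunks

    def __iter__(self) -> Iterator[bytes]:
        yield from self._chunks


class StreamedAudioResponse:
    """Yields a StreamedStr or StreamedAudio for each run of same-kind chunks."""

    def __init__(self, chunks: Iterable[Chunk]) -> None:
        self._chunks = chunks

    def __iter__(self) -> Iterator[Union[StreamedStr, StreamedAudio]]:
        for kind, group in groupby(self._chunks, key=lambda chunk: chunk[0]):
            payloads = (payload for _, payload in group)
            if kind == "transcript":
                yield StreamedStr(payloads)
            elif kind == "audio":
                yield StreamedAudio(base64.b64decode(p) for p in payloads)
            else:
                raise ValueError(f"unknown chunk kind: {kind!r}")
```

Because itertools.groupby shares one underlying iterator, each StreamedStr or StreamedAudio must be fully consumed before moving to the next item; a fuller implementation could buffer skipped runs to lift that restriction.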

Open questions

How should the voice param be set?
Disallow union of StreamedAudio with any other type?
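For the second open question, one possible policy is sketched below, assuming a hypothetical check_audio_return_type validator that runs when a prompt-function is defined. It rejects any Union (including Optional) that mixes StreamedAudio with another type. StreamedAudio here is a stand-in class, not an existing import.

```python
import types
from typing import Any, Union, get_args, get_origin

# Covers typing.Union[...] and PEP 604 `X | Y` unions (Python 3.10+)
_UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


class StreamedAudio:
    """Hypothetical stand-in for the proposed streamed audio type."""


def check_audio_return_type(return_type: Any) -> None:
    """Raise TypeError if StreamedAudio is unioned with any other type."""
    if get_origin(return_type) in _UNION_ORIGINS:
        members = get_args(return_type)
        # A Union always has at least two distinct members, so StreamedAudio
        # appearing in one means it is combined with something else
        if StreamedAudio in members:
            raise TypeError(
                "StreamedAudio cannot be combined with other types in a Union"
            )
```

Validating at definition time would surface the error before any request is made; whether Optional[StreamedAudio] should be allowed is a separate choice.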
