Support Audio output from OpenAI models #396

Open
jackmpcollins opened this issue Jan 6, 2025 · 0 comments
jackmpcollins (Owner) commented Jan 6, 2025

docs: https://platform.openai.com/docs/guides/audio?audio-generation-quickstart-example=audio-out

import base64
from openai import OpenAI

# Reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

# Request both a text transcript and audio in the same completion.
completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[
        {
            "role": "user",
            "content": "Is a golden retriever a good family dog?"
        }
    ]
)

print(completion.choices[0])

# The audio arrives base64-encoded; decode it and write it out as a WAV file.
wav_bytes = base64.b64decode(completion.choices[0].message.audio.data)
with open("dog.wav", "wb") as f:
    f.write(wav_bytes)

Supporting this will require a new output type, maybe StreamedAudioResponse. When this type is present in the return type of a prompt-function, the modalities=["text", "audio"] and audio arguments would be added to the completions request.
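A rough sketch of how the return type could drive the extra request arguments. Everything here is hypothetical: StreamedAudioResponse is only a placeholder class, and build_request_kwargs, its voice and audio_format parameters, and the _contains helper are names made up for illustration, not existing API.

```python
from typing import Any, Dict, Union, get_args


class StreamedAudioResponse:
    """Hypothetical marker type from this proposal; not an existing class."""


def _contains(annotation: Any, target: type) -> bool:
    """Return True if `target` appears anywhere inside a (possibly nested) annotation."""
    if annotation is target:
        return True
    return any(_contains(arg, target) for arg in get_args(annotation))


def build_request_kwargs(
    return_type: Any, voice: str = "alloy", audio_format: str = "pcm16"
) -> Dict[str, Any]:
    """Add the audio arguments only when the return type asks for audio (assumed behavior)."""
    kwargs: Dict[str, Any] = {}
    if _contains(return_type, StreamedAudioResponse):
        kwargs["modalities"] = ["text", "audio"]
        kwargs["audio"] = {"voice": voice, "format": audio_format}
    return kwargs


# A union that includes the audio type still triggers the extra arguments.
audio_kwargs = build_request_kwargs(Union[str, StreamedAudioResponse])
# A plain text return type leaves the request untouched.
text_kwargs = build_request_kwargs(str)
```

Passing voice as a plain keyword here is just one option; how it should actually be configured is still one of the open questions.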

Audio output works with stream=True using audio={"voice": "alloy", "format": "pcm16"}. The response is a mix of transcript and audio chunks, so StreamedAudioResponse could be an iterable of StreamedStr and StreamedAudio (similar to StreamedResponse). StreamedAudio would be an iterable of bytes (or a new AudioBytes that could be used for audio input, see PR #397).
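To illustrate the "mix of transcript and audio chunks" idea, here is a minimal sketch that groups consecutive chunks of the same kind, roughly what an iterable of StreamedStr and StreamedAudio would expose. It assumes each streamed delta has already been reduced to a plain dict with optional "transcript" and base64 "data" keys; that shape is an assumption, not something confirmed above. iter_audio_events and group_events are hypothetical names.

```python
import base64
from itertools import groupby
from typing import Dict, Iterable, Iterator, Tuple, Union

Event = Tuple[str, Union[str, bytes]]


def iter_audio_events(audio_deltas: Iterable[Dict[str, str]]) -> Iterator[Event]:
    """Turn each assumed delta dict into ("transcript", str) or ("audio", bytes) events."""
    for delta in audio_deltas:
        if delta.get("transcript"):
            yield ("transcript", delta["transcript"])
        if delta.get("data"):
            # Audio payloads are assumed to be base64-encoded, as in the non-streamed response.
            yield ("audio", base64.b64decode(delta["data"]))


def group_events(events: Iterable[Event]) -> Iterator[Event]:
    """Merge runs of consecutive same-kind events into one transcript string or one bytes blob."""
    for kind, run in groupby(events, key=lambda event: event[0]):
        payloads = [payload for _, payload in run]
        if kind == "transcript":
            yield (kind, "".join(payloads))
        else:
            yield (kind, b"".join(payloads))


# Stand-in deltas for illustration; a real stream would come from the API.
example_deltas = [
    {"transcript": "Hel"},
    {"transcript": "lo"},
    {"data": base64.b64encode(b"\x00\x01").decode()},
    {"data": base64.b64encode(b"\x02\x03").decode()},
]
grouped = list(group_events(iter_audio_events(example_deltas)))
```

This version collects each run eagerly for simplicity; a real StreamedAudio would presumably yield bytes lazily as chunks arrive.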

Open questions

  • How to set the voice param
  • Disallow union of StreamedAudio with any other type?