feat(llm): Add AudioParser using OpenAI Whisper #160

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Open

biplavbarua wants to merge 1 commit into pathwaycom:main from biplavbarua:feature/audio-parser

biplavbarua commented Jan 4, 2026

Adds AudioParser to parsers.py allowing direct ingestion and transcription of audio files using OpenAI Whisper API.


          feat(llm): Add AudioParser using OpenAI Whisper

7df10e2

CLAassistant commented Jan 4, 2026 •

edited

Loading

All committers have signed the CLA.

zxqfd555 requested changes

View reviewed changes

Collaborator

zxqfd555 left a comment

Hey @biplavbarua,
Thank you for your interest in Pathway!
I have left several comments, could you please address them so that we could progress with the merge.

python/pathway/xpacks/llm/parsers.py

    
                      finally:

                          if os.path.exists(tmp_path):

                              os.remove(tmp_path)

Collaborator

zxqfd555 Jan 9, 2026

Please run flake8, mypy, and black linters on the code, and fix any issues introduced by your change.
In particular, flake8 reports an extra newline at the end of the file.

python/pathway/xpacks/llm/parsers.py

    
                                  **self.kwargs

                              )

                          # The response type depends on response_format, but default is object with .text

                          text = getattr(transcript, "text", str(transcript))

Collaborator

zxqfd555 Jan 9, 2026

Would it be possible to pin the response format so that there is no need to have fallback logic in parsing?

python/pathway/xpacks/llm/parsers.py

Comment on lines +1354 to +1359

    
                      with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as tmp_file:

                          tmp_file.write(contents)

                          tmp_path = tmp_file.name

                      try:

                          with open(tmp_path, "rb") as audio_file:

Collaborator

zxqfd555 Jan 9, 2026

It would be more efficient to have only one context manager: the one that creates tempfile.NamedTemporaryFile without delete=False. Then, one could perform all the work within it, without having to manually delete the file. It would have looked like this:

    tmp_file.write(contents)  # save the contents as before
    tmp_file.flush()  # complete the write
    tmp_file.seek(0)  # start reading the file from scratch

    # work with tmp_file as if it were opened for reading

Other than simplification, it would help us to avoid reopening the file, so it's a little bit more efficient too.

python/pathway/xpacks/llm/parsers.py

    
                  async def __wrapped__(self, contents: bytes) -> list[tuple[str, dict]]:

                      # openai.audio.transcriptions.create requires a file-like object with a name

                      # or a path. We use a temporary file.

                      # We default to .mp3 extension as it's widely supported/compressed.

Collaborator

zxqfd555 Jan 9, 2026

I would propose to have a Literal for the supported audio formats and add it to the parameters of the class constructor to work with the formats other than mp3 gracefully.

As far as I can see from the docs, it would look like

audio_format: Literal["mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"] = "mp3"

in the constructor arguments.

python/pathway/xpacks/llm/parsers.py

    
                      with optional_imports("xpack-llm"):

                          import openai

                      executor = _prepare_executor(

Collaborator

zxqfd555 Jan 9, 2026

This invocation wouldn't work, since there's a required positional argument async_mode, which is currently unspecified. You can accept it in the constructor too, similarly to how other classes in this file do.

zxqfd555 self-assigned this

Author

biplavbarua commented Jan 15, 2026

Self-review: Implementation looks solid. Good handling of temporary files to avoid cross-platform locking issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet