Skip to content

Conversation

@biplavbarua
Copy link

Adds AudioParser to parsers.py allowing direct ingestion and transcription of audio files using OpenAI Whisper API.

@CLAassistant
Copy link

CLAassistant commented Jan 4, 2026

CLA assistant check
All committers have signed the CLA.

Copy link
Collaborator

@zxqfd555 zxqfd555 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @biplavbarua,
Thank you for your interest in Pathway!
I have left several comments, could you please address them so that we could progress with the merge.

finally:
if os.path.exists(tmp_path):
os.remove(tmp_path)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please run flake8, mypy, and black linters on the code, and fix any issues introduced by your change.
In particular, flake8 reports an extra newline at the end of the file.

**self.kwargs
)
# The response type depends on response_format, but default is object with .text
text = getattr(transcript, "text", str(transcript))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to pin the response format so that there is no need to have fallback logic in parsing?

Comment on lines +1354 to +1359
with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as tmp_file:
tmp_file.write(contents)
tmp_path = tmp_file.name

try:
with open(tmp_path, "rb") as audio_file:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be more efficient to have only one context manager: the one that creates tempfile.NamedTemporaryFile without delete=False. Then, one could perform all the work within it, without having to manually delete the file. It would have looked like this:

    tmp_file.write(contents)  # save the contents as before
    tmp_file.flush()  # complete the write
    tmp_file.seek(0)  # start reading the file from scratch

    # work with tmp_file as if it were opened for reading

Other than simplification, it would help us to avoid reopening the file, so it's a little bit more efficient too.

async def __wrapped__(self, contents: bytes) -> list[tuple[str, dict]]:
# openai.audio.transcriptions.create requires a file-like object with a name
# or a path. We use a temporary file.
# We default to .mp3 extension as it's widely supported/compressed.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would propose to have a Literal for the supported audio formats and add it to the parameters of the class constructor to work with the formats other than mp3 gracefully.

As far as I can see from the docs, it would look like

audio_format: Literal["mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"] = "mp3"

in the constructor arguments.

with optional_imports("xpack-llm"):
import openai

executor = _prepare_executor(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This invocation wouldn't work, since there's a required positional argument async_mode, which is currently unspecified. You can accept it in the constructor too, similarly to how other classes in this file do.

@zxqfd555 zxqfd555 self-assigned this Jan 9, 2026
@biplavbarua
Copy link
Author

Self-review: Implementation looks solid. Good handling of temporary files to avoid cross-platform locking issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants