-
Notifications
You must be signed in to change notification settings - Fork 1.6k
feat(llm): Add AudioParser using OpenAI Whisper #160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
zxqfd555
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @biplavbarua,
Thank you for your interest in Pathway!
I have left several comments, could you please address them so that we could progress with the merge.
| finally: | ||
| if os.path.exists(tmp_path): | ||
| os.remove(tmp_path) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please run flake8, mypy, and black linters on the code, and fix any issues introduced by your change.
In particular, flake8 reports an extra newline at the end of the file.
| **self.kwargs | ||
| ) | ||
| # The response type depends on response_format, but default is object with .text | ||
| text = getattr(transcript, "text", str(transcript)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to pin the response format so that there is no need to have fallback logic in parsing?
| with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as tmp_file: | ||
| tmp_file.write(contents) | ||
| tmp_path = tmp_file.name | ||
|
|
||
| try: | ||
| with open(tmp_path, "rb") as audio_file: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be more efficient to have only one context manager: the one that creates tempfile.NamedTemporaryFile without delete=False. Then, one could perform all the work within it, without having to manually delete the file. It would have looked like this:
tmp_file.write(contents) # save the contents as before
tmp_file.flush() # complete the write
tmp_file.seek(0) # start reading the file from scratch
# work with tmp_file as if it were opened for readingOther than simplification, it would help us to avoid reopening the file, so it's a little bit more efficient too.
| async def __wrapped__(self, contents: bytes) -> list[tuple[str, dict]]: | ||
| # openai.audio.transcriptions.create requires a file-like object with a name | ||
| # or a path. We use a temporary file. | ||
| # We default to .mp3 extension as it's widely supported/compressed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would propose to have a Literal for the supported audio formats and add it to the parameters of the class constructor to work with the formats other than mp3 gracefully.
As far as I can see from the docs, it would look like
audio_format: Literal["mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"] = "mp3"
in the constructor arguments.
| with optional_imports("xpack-llm"): | ||
| import openai | ||
|
|
||
| executor = _prepare_executor( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This invocation wouldn't work, since there's a required positional argument async_mode, which is currently unspecified. You can accept it in the constructor too, similarly to how other classes in this file do.
|
Self-review: Implementation looks solid. Good handling of temporary files to avoid cross-platform locking issues. |
Adds AudioParser to
parsers.pyallowing direct ingestion and transcription of audio files using OpenAI Whisper API.