Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add headers for multiple language identification #99

Open
wants to merge 3 commits into
base: develop
Choose a base branch
from

Conversation

mbatchkarov
Copy link

@mbatchkarov mbatchkarov commented Jan 30, 2024

Issue #, if available: N/A

Description of changes:
Hi, Transcribe SDE here. We recently launched a new feature called multiple language identification. We've been asked to contribute to this package to enable the feature so customers can use it.

Notes
language_code is currently a required positional parameter. When language ID is added, language code should become optional. That means we'd have to make it the third param and give it a default value, but this would break existing clients that use positional-only arguments. Therefore I'm leaving it as a positional arg and requiring that it's set to None when language ID is enabled. I'm not too happy with this option either- happy to discuss. Maybe something like this would be more ergonomic:

        if identify_language or identify_language:
            warnings.warm("Setting language_code to None because language ID is enabled")
            language_code = None

Testing
I extended and ran the integration tests locally from my own AWS account.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@mbatchkarov
Copy link
Author

Update: just saw #89 - looks like plain language ID is also not supported. Will update the PR to include that too

@praveenXira
Copy link

When will this PR get merged? I really need multiple language identification feature.

identify_language: Optional[bool] = False,
preferred_language: Optional[str] = None,
identify_multiple_languages=False,
language_options=None,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar with AWS coding standards, but is there a reason why there isn't type hiting on identify_multiple_languages and language_options ?

@GameSetAndMatch
Copy link

Definitely a feature I would use in the short term if it was reviewed and merged by AWS team, good work @mbatchkarov !

Header names are separated by hyphens, not underscores

https://docs.aws.amazon.com/transcribe/latest/dg/lang-id-stream.html

Co-authored-by: Gunwoo Kim <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants