Support for Amazon Transcribe format for multiple channels (not the same as speakers) #235

Open
gittes opened this issue May 18, 2020 · 2 comments
Labels
Enhancement a request for improvement

gittes commented May 18, 2020

Found your wonderful software, but had a minor issue when loading an Amazon Transcribe transcript that used the variant format for independent audio channels, as opposed to the typical speakers format.

Impressively, your software still loaded the rows of the transcript correctly. However, it gave every speaker label a unique number suffix, so it was impossible to relabel the speakers all at once, and tracking and correcting them by hand in a very long transcript is an almost insurmountable task.

The channel format is used when each speaker is on a dedicated channel/track in the source audio file:
https://docs.aws.amazon.com/transcribe/latest/dg/how-channel-id.html
Excerpt from the referenced AWS doc showing the JSON format:

{
  "jobName": "job id",
  "accountId": "account id",
  "results": {
    "transcripts": [
      {
        "transcript": "When you try ... It seems to ..."
      }
    ],
    "channel_labels": {
      "channels": [
        {
          "channel_label": "ch_0",
          "items": [
            {
              "start_time": "12.282",
              "end_time": "12.592",
              "alternatives": [
                {
                  "confidence": "1.0000",
                  "content": "When"
                }
              ],
              "type": "pronunciation"
            },
            {
              "start_time": "12.592",
              "end_time": "12.692",
              "alternatives": [
                {
                  "confidence": "0.8787",
                  "content": "you"
                }
              ],
              "type": "pronunciation"
            },
            {
              "start_time": "12.702",
              "end_time": "13.252",
              "alternatives": [
                {
                  "confidence": "0.8318",
                  "content": "try"
                }
              ],
              "type": "pronunciation"
            },
            Transcription abbreviated
          ]
        },
        {
          "channel_label": "ch_1",
          "items": [
            {
              "start_time": "12.379",
              "end_time": "12.589",
              "alternatives": [
                {
                  "confidence": "0.5645",
                  "content": "It"
                }
              ],
              "type": "pronunciation"
            },
            {
              "start_time": "12.599",
              "end_time": "12.659",
              "alternatives": [
                {
                  "confidence": "0.2907",
                  "content": "seems"
                }
              ],
              "type": "pronunciation"
            },
            {
              "start_time": "12.669",
              "end_time": "13.029",
              "alternatives": [
                {
                  "confidence": "0.2497",
                  "content": "to"
                }
              ],
              "type": "pronunciation"
            },
            Transcription abbreviated
          ]
        }
      ]
    }
  }
}

It has "channel_labels" (object) -> "channels" (array/list) -> one object per channel, with each channel containing its own "items" for words, as opposed to "items" being declared once in the other format. It also uses "channel_label" instead of "speaker_label" for speakers.
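To illustrate the structure described above, here is a minimal sketch (not code from the existing adapter) of how the per-channel "items" could be merged into a single word list ordered by start time. The function name `flattenChannelItems` and the output field names (`text`, `start`, `end`, `speaker`) are hypothetical. It skips any item whose "type" is not "pronunciation", on the assumption that punctuation items carry no timings.

```javascript
// Hypothetical helper: merges the "items" of every channel into one list,
// tags each word with its "channel_label", and sorts by start time.
function flattenChannelItems(awsJson) {
  const channels = awsJson.results.channel_labels.channels;
  const words = [];

  channels.forEach((channel) => {
    channel.items.forEach((item) => {
      // Assumption: only "pronunciation" items have start/end times.
      if (item.type !== 'pronunciation') return;

      words.push({
        text: item.alternatives[0].content,
        start: parseFloat(item.start_time),
        end: parseFloat(item.end_time),
        // One consistent label per channel, e.g. "ch_0" or "ch_1".
        speaker: channel.channel_label,
      });
    });
  });

  // Interleave the channels chronologically.
  return words.sort((a, b) => a.start - b.start);
}
```

Applied to the six words in the excerpt above, this would interleave the two channels by start time (When, It, you, seems, to, try), with each word labelled ch_0 or ch_1.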

Could you please accommodate the Amazon Transcribe channel format variant and at least have speaker ID labels be consistent per channel, if not matching the "channel_label"?

Just for reference, here's the doc for the speaker identification format:
https://docs.aws.amazon.com/transcribe/latest/dg/how-diarization.html

gittes added the Enhancement (a request for improvement) label May 18, 2020
Contributor

pietrop commented May 18, 2020

Hi @gittes
Thanks for flagging this!

The AWS adapter, like many of the other adapters, was made thanks to community open-source contributions. See PR #120.

Remove the incremental counter

To remove the incremental counter, this line should be changed: packages/stt-adapters/amazon-transcribe/index.js#L140

-speaker: paragraph.speaker ? `Speaker ${ paragraph.speaker }` : `TBC ${ i }`,
+speaker: paragraph.speaker ? `Speaker ${ paragraph.speaker }` : `U_UKN`,

It doesn't have to be U_UKN, but STT services that return speaker diarization info sometimes use labels like M_1 or F_2 (e.g. Speechmatics).
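As a small standalone illustration of the change above (a sketch, not the adapter's actual code), the fallback can be pulled into a helper so that every paragraph without a speaker gets the same label and can be renamed in one go. The names `DEFAULT_SPEAKER` and `speakerLabelFor` are hypothetical.

```javascript
// Hypothetical constant: one shared label for paragraphs with no speaker info.
const DEFAULT_SPEAKER = 'U_UKN';

// Mirrors the ternary from the diff above, minus the per-paragraph counter.
function speakerLabelFor(paragraph) {
  return paragraph.speaker ? `Speaker ${paragraph.speaker}` : DEFAULT_SPEAKER;
}
```

Because the fallback no longer depends on the paragraph index, two unlabelled paragraphs end up with an identical speaker label.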

AWS Adapter

For context, there's a guide on how to make one from scratch under docs/guides/adapters.md, and the code for the existing AWS one is at packages/stt-adapters/amazon-transcribe.

AWS 2-channel JSON format

Accommodating that would be a matter of modifying the AWS STT adapter so that it

  • keeps compatibility with the other AWS STT format
  • is able to distinguish between the two and use the correct one
  • uses speaker diarization info if it is available, and otherwise falls back to a default
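The points above could be sketched roughly as follows. This is only an outline under stated assumptions, not the adapter's implementation: `detectAwsFormat` is a hypothetical name, and it assumes the channel variant exposes `results.channel_labels` (as in the excerpt in this issue) while the diarization variant exposes `results.speaker_labels` (per the linked AWS speaker identification doc).

```javascript
// Hypothetical helper: decides which AWS Transcribe variant was received.
// Returns 'channels', 'speakers', or 'plain' (no speaker info at all).
function detectAwsFormat(awsJson) {
  const results = (awsJson && awsJson.results) || {};

  // Channel identification: words live under channel_labels.channels[].items.
  if (results.channel_labels) return 'channels';

  // Assumption: the diarization variant carries a speaker_labels object.
  if (results.speaker_labels) return 'speakers';

  // Neither present: the caller should fall back to a default speaker label.
  return 'plain';
}
```

An adapter could branch on the returned value, keeping the existing code path for 'speakers' and 'plain' and adding a new one for 'channels'.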

I don't want to speak for @jamesdools and @emettely, but I am guessing a PR would be welcome, if you've got the time/capacity?


As a side note, at the moment I am mostly working on this alternative version, pietrop/slate-transcript-editor. It doesn't provide any adapters as part of the core components, but I've extracted some of the adapters from this module (e.g. pietrop/aws-to-dpe and pietrop/gcp-to-dpe) for when that type of conversion is needed, e.g. when working with AWS STT or Google STT.

@emettely
Contributor

@gittes - hi! Thanks for sending us a request to improve the adapters. We would be ecstatic if you could help us add that compatibility, based on the information that @pietrop mentioned above. We would be happy to review and merge it.
