Support for Amazon Transcribe format for multiple channels (not the same as speakers) #235

gittes · 2020-05-18T21:15:04Z

I found your wonderful software, but had a minor issue when loading an Amazon Transcribe transcript that used the variant format for independent audio channels, as opposed to the typical speakers format.

Impressively, your software still loaded the rows of the transcript correctly. However, it gave every speaker label a unique number suffix, so it was impossible to relabel the speakers all at once, and tracking and correcting the labels by hand across a very long transcript was an almost insurmountable task.

This format is used when each speaker is on a dedicated channel/track in the source audio file:
https://docs.aws.amazon.com/transcribe/latest/dg/how-channel-id.html
Excerpt from the referenced AWS doc showing the JSON format (abbreviated):

```jsonc
{
  "jobName": "job id",
  "accountId": "account id",
  "results": {
    "transcripts": [
      {
        "transcript": "When you try ... It seems to ..."
      }
    ],
    "channel_labels": {
      "channels": [
        {
          "channel_label": "ch_0",
          "items": [
            {
              "start_time": "12.282",
              "end_time": "12.592",
              "alternatives": [
                {
                  "confidence": "1.0000",
                  "content": "When"
                }
              ],
              "type": "pronunciation"
            },
            {
              "start_time": "12.592",
              "end_time": "12.692",
              "alternatives": [
                {
                  "confidence": "0.8787",
                  "content": "you"
                }
              ],
              "type": "pronunciation"
            },
            {
              "start_time": "12.702",
              "end_time": "13.252",
              "alternatives": [
                {
                  "confidence": "0.8318",
                  "content": "try"
                }
              ],
              "type": "pronunciation"
            },
            // Transcription abbreviated
          ]
        },
        {
          "channel_label": "ch_1",
          "items": [
            {
              "start_time": "12.379",
              "end_time": "12.589",
              "alternatives": [
                {
                  "confidence": "0.5645",
                  "content": "It"
                }
              ],
              "type": "pronunciation"
            },
            {
              "start_time": "12.599",
              "end_time": "12.659",
              "alternatives": [
                {
                  "confidence": "0.2907",
                  "content": "seems"
                }
              ],
              "type": "pronunciation"
            },
            {
              "start_time": "12.669",
              "end_time": "13.029",
              "alternatives": [
                {
                  "confidence": "0.2497",
                  "content": "to"
                }
              ],
              "type": "pronunciation"
            },
            // Transcription abbreviated
          ]
        }
      ]
    }
  }
}
```

It has "channel_labels" (object) -> "channels" (array/list) -> channel (object), with each channel containing its own "items" array for the words, as opposed to "items" being declared once in the other format. It also uses "channel_label" instead of "speaker_label" to identify the speaker.
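To make the structural difference concrete, here is a minimal sketch, in JavaScript like the existing adapter, of how the per-channel "items" arrays could be merged into one time-ordered word list. The function name `flattenChannelItems` and the `{ start, end, text, speaker }` word shape are hypothetical, and using `channel_label` as the speaker is an assumption rather than what the adapter currently does. It also assumes, as in AWS output, that "punctuation" items carry no timings.

```javascript
// Hypothetical helper: flatten the per-channel "items" arrays of an Amazon
// Transcribe channel-identification result into one time-ordered word list.
// Assumption: each word keeps its channel's "channel_label" as its speaker.
function flattenChannelItems(awsJson) {
  const channels = awsJson.results.channel_labels.channels;
  const words = [];
  for (const channel of channels) {
    for (const item of channel.items) {
      const text = item.alternatives[0].content;
      if (item.type === 'punctuation') {
        // Punctuation items have no start/end times: attach them to the
        // previous word, but only if it came from the same channel.
        const prev = words.length > 0 ? words[words.length - 1] : null;
        if (prev && prev.speaker === channel.channel_label) prev.text += text;
        continue;
      }
      words.push({
        start: parseFloat(item.start_time),
        end: parseFloat(item.end_time),
        text,
        speaker: channel.channel_label,
      });
    }
  }
  // Interleave all channels into a single timeline by start time.
  return words.sort((a, b) => a.start - b.start);
}
```

With the excerpt above, this would interleave "When" (ch_0), "It" (ch_1), "you" (ch_0) and so on by start time.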

Could you please accommodate the Amazon Transcribe channel format variant and, at least, keep the speaker ID labels consistent per channel, if not matching the "channel_label"?
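If each word carries its channel label as its speaker, keeping labels consistent per channel mostly comes down to how words are grouped into paragraphs. A minimal sketch, where `groupBySpeaker` and the `{ start, end, text, speaker }` word shape are hypothetical, that starts a new paragraph only when the speaker changes, so every paragraph from the same channel gets the same label with no counter suffix:

```javascript
// Hypothetical helper: group a time-ordered word list into paragraphs,
// starting a new paragraph whenever the speaker changes.
// Assumption: words look like { start, end, text, speaker }.
function groupBySpeaker(words) {
  const paragraphs = [];
  for (const word of words) {
    const last = paragraphs[paragraphs.length - 1];
    if (last && last.speaker === word.speaker) {
      // Same speaker as the current paragraph: extend it.
      last.words.push(word);
      last.end = word.end;
    } else {
      // Speaker changed (or first word): open a new paragraph.
      paragraphs.push({
        speaker: word.speaker,
        start: word.start,
        end: word.end,
        words: [word],
      });
    }
  }
  return paragraphs;
}
```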

Just for reference, here's the doc for the speaker identification format:
https://docs.aws.amazon.com/transcribe/latest/dg/how-diarization.html

pietrop · 2020-05-18T21:51:34Z

Hi @gittes
Thanks for flagging this!

The AWS adapter, like many of the other adapters, was made thanks to open-source community contributions. See PR #120.

Remove the incremental counter

To remove the incremental counter, this line in packages/stt-adapters/amazon-transcribe/index.js#L140 should be changed:

```diff
-speaker: paragraph.speaker ? `Speaker ${ paragraph.speaker }` : `TBC ${ i }`,
+speaker: paragraph.speaker ? `Speaker ${ paragraph.speaker }` : `U_UKN`,
```

It doesn't have to be U_UKN, but for STT services that return speaker diarization info, the labels sometimes look something like M_1 or F_2, etc. (e.g. Speechmatics).
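The effect of that one-line change can be sketched in isolation. `speakerLabel` is a hypothetical helper, not a function in the adapter; it mirrors the changed ternary, with the constant fallback (U_UKN, as suggested here) pulled out as a parameter. Because every unlabelled paragraph now shares one label, it can be renamed in one go.

```javascript
// Hypothetical helper mirroring the changed line: paragraphs with a speaker
// keep "Speaker <id>", all others share one constant fallback label.
function speakerLabel(paragraph, fallback = 'U_UKN') {
  return paragraph.speaker ? `Speaker ${paragraph.speaker}` : fallback;
}

// Wrap in an arrow function so Array#map's index is not passed as
// `fallback`; otherwise each unlabelled row would get its own index,
// much like the old `TBC ${ i }` counter.
const labels = [{ speaker: 'ch_0' }, {}, {}].map((p) => speakerLabel(p));
// labels → ['Speaker ch_0', 'U_UKN', 'U_UKN']
```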

AWS Adapter

For context, there's a guide on how to make an adapter from scratch under docs/guides/adapters.md, and the code for the existing AWS adapter is at packages/stt-adapters/amazon-transcribe.

AWS 2-channel JSON format

Accommodating that would be a matter of modifying the AWS STT adapter so that it:

- keeps compatibility with the other AWS STT format
- is able to distinguish between the two and use the correct one
- uses speaker diarization info if available, otherwise falls back to a default
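The "distinguish between the two" step could be as simple as checking which top-level key is present under "results". A minimal sketch, where `detectAwsLayout` and its return values are hypothetical: "channel_labels" comes from the channel-identification excerpt above, and "speaker_labels" is the key used in the speaker identification (diarization) output described in the AWS doc linked earlier.

```javascript
// Hypothetical helper: decide which Amazon Transcribe layout a result uses.
//  - 'channels': channel identification, one "items" array per channel
//  - 'speakers': speaker identification (diarization), "speaker_labels"
//  - 'plain':    neither, so the adapter should use a default speaker label
function detectAwsLayout(awsJson) {
  const results = (awsJson && awsJson.results) || {};
  if (results.channel_labels && Array.isArray(results.channel_labels.channels)) {
    return 'channels';
  }
  if (results.speaker_labels) {
    return 'speakers';
  }
  return 'plain';
}
```

Checking for channels first keeps the existing speaker-label path untouched for transcripts that never had channel identification enabled.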

I don't want to speak for @jamesdools and @emettely, but I'm guessing a PR would be welcome, if you have the time/capacity?

As a side note, at the moment I am mostly working on an alternative version, pietrop/slate-transcript-editor. It doesn't provide any adapters as part of the core components, but I've extracted some of the adapters from this module, e.g. pietrop/aws-to-dpe and pietrop/gcp-to-dpe, for when that type of conversion is needed, e.g. when working with AWS STT or Google STT.

emettely · 2020-05-19T05:42:11Z

@gittes - hi! Thanks for sending us a request to improve the adapters. We would be ecstatic if you could help us add that compatibility, based on the information @pietrop mentioned above. We would be happy to review and merge it.

gittes added the "Enhancement" (a request for improvement) label on May 18, 2020.

