WhisperAPI is a fast and reliable API that transcribes video and audio files into text with support for all models and languages. It offers time-stamped results and translation to English.

DontEatOreo/WhisperAPI

WhisperAPI

WhisperAPI is a wrapper for Whisper.cpp, a C++ implementation of the original OpenAI Whisper that greatly improves performance and speed.

AppSettings

Edit the appsettings.json file so that it contains the full path to the folder where models and audio files will be stored.

{
  "WhisperSettings": {
    "Folder": "/path/to/whisper/folder"
  }
}

Set the Folder property to that full path; a relative path is not enough.
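
As a quick sanity check before starting the service, the configuration can be validated from a script. The following is a minimal sketch, not part of WhisperAPI itself: the helper name `read_whisper_folder` is hypothetical, and it only assumes the `WhisperSettings` / `Folder` layout shown above.

```python
import json
from pathlib import PurePosixPath, PureWindowsPath


def read_whisper_folder(settings_text: str) -> str:
    """Return the configured Folder value, rejecting relative paths."""
    settings = json.loads(settings_text)
    folder = settings["WhisperSettings"]["Folder"]
    # Accept a full path in either POSIX or Windows notation.
    is_full_path = (
        PurePosixPath(folder).is_absolute()
        or PureWindowsPath(folder).is_absolute()
    )
    if not is_full_path:
        raise ValueError(f"Folder must be a full path, got: {folder!r}")
    return folder


example = '{"WhisperSettings": {"Folder": "/path/to/whisper/folder"}}'
print(read_whisper_folder(example))
```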

Note

Translation increases processing time, sometimes doubling it, so avoid translation for long video or audio files.

Features

  • Transcribe video and audio files into text
  • Supports all models
  • Easy to use and integrate into your own projects
  • Fast and reliable transcription results
  • Supports every language supported by OpenAI Whisper
  • Ability to translate transcribed text to English

Notes

  • You can use any language code supported by OpenAI Whisper.
  • If you don't know ahead of time which language code you need, you can omit the Accept-Language header and the API will detect the language automatically.
  • Supported models are Tiny, Base, Medium, and Large.

Usage

Before making a request to transcribe a file, you should query the /models endpoint to get a list of all available models.

curl --location --request GET 'https://localhost:5001/models'
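
The same query can be issued from Python with only the standard library. This is a sketch under assumptions: the base URL is copied from the curl example, the function names are hypothetical, and because the shape of the /models response is not documented here, the body is returned as raw text rather than parsed.

```python
import urllib.request

BASE_URL = "https://localhost:5001"  # taken from the curl example above


def build_models_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the GET request for the /models endpoint."""
    return urllib.request.Request(f"{base_url}/models", method="GET")


def fetch_models(base_url: str = BASE_URL, timeout: float = 10.0) -> str:
    """Send the request and return the raw response body as text."""
    request = build_models_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage, with the API running locally:
#   print(fetch_models())
```

If the local server uses a self-signed development certificate, `urlopen` will reject it unless that certificate is trusted on the machine.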

To use WhisperAPI, you need to send a POST request to the /transcribe endpoint with the following form-data payload:

file: @/path/to/file/
model: String
translate: Boolean

You can also add headers to the request: Accept selects the response format and Accept-Language specifies the language of the file.

Accept: application/json
Accept-Language: en

The file should be provided as a multipart/form-data field named file.

translate is an optional property.

  • If the Accept-Language header is omitted, the API will automatically detect the language of the file.
  • If the translate property is omitted, it defaults to false.

Here is an example of a request using curl:

curl --location --request POST 'https://localhost:5001/transcribe' \
--header 'Accept: application/json' \
--header 'Accept-Language: en' \
--form 'file=@"/path/to/file/"' \
--form 'model="base"' \
--form 'translate="true"'
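
For callers who prefer Python over curl, the request above can be assembled with the standard library. This is a rough sketch, not an official client: `build_transcribe_request` is a hypothetical helper, the content type sent for the file part is guessed from the file name, and whether the server inspects that content type is an assumption.

```python
import mimetypes
import urllib.request
import uuid


def build_transcribe_request(file_name, file_bytes, model, translate=False,
                             accept="application/json", language=None,
                             base_url="https://localhost:5001"):
    """Build (but do not send) a multipart/form-data POST to /transcribe."""
    boundary = uuid.uuid4().hex
    # Guess the media type from the file name; fall back to a generic one.
    content_type = mimetypes.guess_type(file_name)[0] or "application/octet-stream"

    parts = []
    # Plain form fields: model and the optional translate flag.
    for name, value in (("model", model),
                        ("translate", "true" if translate else "false")):
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # The media file itself, sent in the field named "file".
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        f'Content-Type: {content_type}\r\n\r\n'.encode("utf-8")
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Accept": accept,
    }
    if language:  # omit the header to let the API detect the language
        headers["Accept-Language"] = language
    return urllib.request.Request(f"{base_url}/transcribe",
                                  data=b"".join(parts),
                                  headers=headers, method="POST")


# Usage, with the API running locally:
#   with open("/path/to/file", "rb") as f:
#       req = build_transcribe_request("clip.mp3", f.read(), "base", language="en")
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode("utf-8"))
```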

The response will be a JSON payload with the following format:

{
  "data": [
    {
      "start": 0,
      "end": 3,
      "text": "Hello!"
    },
    {
      "start": 3,
      "end": 6,
      "text": " World!"
    }
  ],
  "count": 2
}
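
A client might read this payload as follows. The sketch assumes only the `data`, `start`, `end`, `text`, and `count` fields shown in the example above; the function name `join_segments` is hypothetical.

```python
import json

sample_response = """
{
  "data": [
    {"start": 0, "end": 3, "text": "Hello!"},
    {"start": 3, "end": 6, "text": " World!"}
  ],
  "count": 2
}
"""


def join_segments(payload_text: str) -> str:
    """Concatenate the text of every time-stamped segment, in order."""
    payload = json.loads(payload_text)
    # Segment texts carry their own leading spaces, so join without a separator.
    return "".join(segment["text"] for segment in payload["data"])


print(join_segments(sample_response))
```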

If the Accept header is set to text/plain, the response will look like this:

Hello! World!

If the Accept header is set to application/xml, the response will look like this:

<JsonResponse>
    <Data>
        <ResponseData>
            <Start>0</Start>
            <End>3</End>
            <Text>Hello!</Text>
        </ResponseData>
        <ResponseData>
            <Start>3</Start>
            <End>6</End>
            <Text> World!</Text>
        </ResponseData>
    </Data>
    <Count>2</Count>
</JsonResponse>

If the Accept header is set to application/x-subrip, the response will look like this:

1
00:00:00,000 --> 00:00:05,000
Hello

2
00:00:05,000 --> 00:00:10,000
World
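
If you already hold a JSON response and want subtitles without a second request, the segments can be converted on the client. This is a sketch of such a client-side conversion, not the server's own implementation: it assumes `start` and `end` are expressed in seconds (the source does not state the unit), and the names `srt_timestamp` and `to_srt` are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset as HH:MM:SS,mmm (assumes the input is in seconds)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments) -> str:
    """Render a list of {start, end, text} segments as SubRip text."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(segment['start'])} --> {srt_timestamp(segment['end'])}\n"
            f"{segment['text'].strip()}\n"
        )
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


segments = [
    {"start": 0, "end": 3, "text": "Hello!"},
    {"start": 3, "end": 6, "text": " World!"},
]
print(to_srt(segments))
```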

On failure (e.g., an invalid file format), the response will be the following JSON:

{
  "error": "Error message"
}
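
Client code can check for this shape before reading the transcription. A minimal sketch, assuming a failed request returns an object with a single `error` field as shown above; `TranscriptionError` and `unwrap_response` are hypothetical names.

```python
import json


class TranscriptionError(Exception):
    """Raised when the API reports a failure in its JSON payload."""


def unwrap_response(payload_text: str) -> dict:
    """Return the parsed payload, or raise if it carries an error message."""
    payload = json.loads(payload_text)
    if isinstance(payload, dict) and "error" in payload:
        raise TranscriptionError(payload["error"])
    return payload


try:
    unwrap_response('{"error": "Error message"}')
except TranscriptionError as exc:
    print(f"Transcription failed: {exc}")
```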

Contributing

We welcome contributions to WhisperAPI! If you would like to contribute, simply fork the repository and submit a pull request with your changes.

Support

If you need help with WhisperAPI, please create an issue on GitHub and I will respond as soon as possible.
