aalramadan/TransVisio

TransVisio

💡What does it do?

TransVisio translates text between multiple languages using Large Language Models (LLMs). It accepts input in various formats (e.g., a subtitle file or a video) and extracts the text to be translated.
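
To illustrate the "extract the text" step for one input type, here is a minimal sketch of pulling the spoken lines out of SRT subtitle content. It is not taken from TransVisio's source: the helper name `extract_srt_text` and the sample cues are made up for this example, and real subtitle files (especially *.ass and *.ssa) need more careful handling.

```python
import re

# Hypothetical helper, not part of TransVisio: matches an SRT timing line
# such as "00:00:01,000 --> 00:00:03,000".
TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def extract_srt_text(srt: str) -> list[str]:
    """Return one string per subtitle cue, dropping index and timing lines."""
    cues = []
    normalized = srt.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = [ln.strip() for ln in block.split("\n") if ln.strip()]
        for i, ln in enumerate(lines):
            if TIMING.match(ln):
                # Everything after the timing line is the cue text.
                text = " ".join(lines[i + 1:])
                if text:
                    cues.append(text)
                break
    return cues


# Made-up sample content for demonstration.
SAMPLE = """1
00:00:01,000 --> 00:00:03,000
Hello there.

2
00:00:03,500 --> 00:00:06,000
How are
you today?
"""

print(extract_srt_text(SAMPLE))
```

Each returned string corresponds to one subtitle cue, which is a natural unit to hand to a translation model.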

Models Supported

  • GPT 4o
  • GPT 4 Turbo
  • GPT 3.5 Turbo
  • Gemini 1.5 Pro
  • Gemini 1.5 Flash
  • Whisper 20231117 (Online)
  • Faster-Whisper v1.0.3 (Offline)

Features

  • Inputs supported:
      • Subtitle files (*.srt *.ass *.ssa)
      • Video files (*.mp4 *.mkv *.webm *.flv *.avi *.mov *.wmv *.m4v)
      • Audio files (*.wav *.ogg *.mp3 *.aac *.flac *.m4a *.oga *.opus)
      • Excel files (*.xlsx *.csv)

  • You can save video/audio transcription to Excel.

  • You can specify the number of input sentences.

  • You can pause/resume translation at any point.

  • You can reverse the direction of the translated output.

  • You can remove and/or edit the translated output and the input. The tool will automatically align the rows.

  • You can specify a Start Time and Duration for video/audio inputs.

  • Provides a Temperature setting, which controls the randomness of the model's output. A lower value makes the output more deterministic and focused, while a higher value makes the output more diverse and creative.

  • Light and Dark themes.
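
To make the Temperature setting above more concrete, the sketch below shows the general idea behind temperature in language models: candidate scores are divided by the temperature before being turned into probabilities. This is a conceptual illustration only, not TransVisio's code or any provider's implementation; the function name and the scores are invented for the example.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by a positive temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate words.
logits = [2.0, 1.0, 0.5]

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

At a low temperature almost all of the probability goes to the top-scoring candidate, so repeated runs tend to give the same translation. At a higher temperature the probabilities even out, so the wording varies more between runs.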

Note:
Make sure that you specify the Start Time and Duration before selecting the video/audio input.
Online Whisper requires an API key and is limited to 25 MB input size.
Offline Whisper does not require an API key, but it must download a model (e.g., tiny or small) on first use.
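
The constraints in the note above can be summarized as a simple decision rule. The sketch below is an illustration, not TransVisio's actual logic: the function name is hypothetical, and it assumes "25 MB" means 25 × 1024 × 1024 bytes.

```python
# Assumption: "25 MB" is interpreted as 25 MiB.
ONLINE_LIMIT_BYTES = 25 * 1024 * 1024


def choose_whisper_backend(size_bytes: int, has_api_key: bool) -> str:
    """Return "online" or "offline" for a media file (illustrative only)."""
    # Online Whisper needs both an API key and a file within the size limit.
    if has_api_key and size_bytes <= ONLINE_LIMIT_BYTES:
        return "online"
    # Offline Whisper needs no key, but downloads a model on first use.
    return "offline"


print(choose_whisper_backend(10 * 1024 * 1024, has_api_key=True))
print(choose_whisper_backend(40 * 1024 * 1024, has_api_key=True))
print(choose_whisper_backend(10 * 1024 * 1024, has_api_key=False))
```

In practice the size would come from something like `os.path.getsize(path)`. Setting a Start Time and Duration reduces how much of the input is transcribed.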

Demo

Animation

Disclaimer

TransVisio is part of a collaborative research project funded by the Abdul Hameed Shoman Foundation (Agreement Number: 230800351).
Hosting Institution: The project is hosted by the English Language and Translation Department at the Applied Science Private University.