Video AI Processor - Transcription & Description

Process video URLs with AI-powered transcription, video description, and translation. Accepts a single video URL or an array of URLs for batch processing.

Features

Audio Transcription - Extract speech from videos using RunPod's Faster Whisper (large-v3 model)
Video Description - Generate visual descriptions using Novita.ai's Qwen3-VL vision models
Translation - Translate transcriptions using:
- Whisper's built-in translation (fast, to English only)
- LLM-based translation (flexible, any language)
- Both methods for comparison
Bulk Processing - Process single videos or arrays of video URLs
Flexible Configuration - All AI features are optional and independently configurable

Input

Required Fields

videoUrls (string | array) - Single video URL or array of URLs
- Must be publicly accessible
- Supported formats: MP4, WebM, etc.
- Example: "https://example.com/video.mp4" or ["url1", "url2"]
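
Because videoUrls accepts either a string or an array, client code that prepares input often normalizes both shapes into a list first. The following is a minimal sketch of that step, not the actor's own code; the function name `normalize_video_urls` and the http(s) check are assumptions added for illustration.

```python
from typing import List, Union


def normalize_video_urls(video_urls: Union[str, List[str]]) -> List[str]:
    """Return videoUrls as a list, whether a string or an array was given."""
    if isinstance(video_urls, str):
        candidates = [video_urls]
    else:
        candidates = list(video_urls)

    # Trim whitespace and drop empty entries.
    cleaned = [url.strip() for url in candidates if url and url.strip()]

    # Assumed sanity check: a publicly accessible URL should be http(s).
    for url in cleaned:
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"Not an http(s) URL: {url!r}")
    return cleaned
```

This only checks the URL scheme; it does not verify that the video can actually be fetched without authentication.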

Optional Features

transcribeAudio (boolean) - Enable audio transcription (default: false)
describeVideo (boolean) - Enable video description (default: false)
translateTranscription (boolean) - Enable translation (default: false)

Translation Settings

translationMethod (string) - Choose translation approach:
- "whisper" - Use Whisper's built-in translation to English (fast, free)
- "llm" - Use LLM API for translation to any language (flexible, paid)
- "both" - Run both methods for comparison
targetLanguage (string) - Target language for LLM translation (e.g., "Spanish", "French"); applies when translationMethod is "llm" or "both"
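
The relationship between translationMethod and the translation fields in the output can be sketched as a small lookup. This is an illustration only: the helper name `plan_translation` is hypothetical, and treating targetLanguage as mandatory for LLM translation is an assumption, since the actor may apply a default.

```python
from typing import Dict, Optional

VALID_METHODS = ("whisper", "llm", "both")


def plan_translation(method: str, target_language: Optional[str] = None) -> Dict[str, bool]:
    """Map a translationMethod value to the output fields it should fill."""
    if method not in VALID_METHODS:
        raise ValueError(f"translationMethod must be one of {VALID_METHODS}")

    uses_llm = method in ("llm", "both")
    # Assumption: LLM translation needs an explicit target language.
    if uses_llm and not target_language:
        raise ValueError("targetLanguage is needed for LLM translation")

    return {
        "whisper_translation": method in ("whisper", "both"),
        "llm_translation": uses_llm,
    }
```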

API Credentials

runpodApiKey (string, secret) - Your RunPod API key (required if transcribeAudio is enabled)
runpodEndpointId (string) - Your RunPod Faster Whisper endpoint ID (required if transcribeAudio is enabled)
novitaApiKey (string, secret) - Your Novita.ai API key (required if describeVideo is enabled or translationMethod is "llm" or "both")
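
A pre-flight check can catch missing credentials before a run starts. The sketch below assumes that Novita.ai is also needed for LLM translation, as the "Novita.ai (for Description & LLM Translation)" heading suggests; `missing_credentials` is a hypothetical helper, not part of the actor.

```python
from typing import Any, Dict, List


def missing_credentials(actor_input: Dict[str, Any]) -> List[str]:
    """List credential fields that the enabled features need but the input lacks."""
    missing: List[str] = []

    # Transcription runs on a RunPod Faster Whisper endpoint.
    if actor_input.get("transcribeAudio"):
        for key in ("runpodApiKey", "runpodEndpointId"):
            if not actor_input.get(key):
                missing.append(key)

    # Assumption: LLM translation goes through Novita.ai, like description.
    needs_llm = bool(actor_input.get("translateTranscription")) and (
        actor_input.get("translationMethod") in ("llm", "both")
    )
    if (actor_input.get("describeVideo") or needs_llm) and not actor_input.get("novitaApiKey"):
        missing.append("novitaApiKey")

    return missing
```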

Model Settings

qwenModel (string) - Qwen3-VL vision model (default: "qwen/qwen3-vl-8b-instruct")
- "qwen/qwen3-vl-8b-instruct" - Most affordable ($0.08/$0.50 per M tokens)
- "qwen/qwen3-vl-30b-a3b-instruct" - Balanced quality/cost
- "qwen/qwen3-vl-30b-a3b-thinking" - Shows reasoning process
- "qwen/qwen3-vl-235b-a22b-instruct" - Best quality
- "qwen/qwen3-vl-235b-a22b-thinking" - Premium reasoning
maxTokens (integer) - Max output tokens for descriptions (default: 512, range: 100-2048)
maxVideoLength (integer) - Max video duration in seconds (default: 120, range: 5-600)
- Videos longer than this will be skipped
videoDescriptionPrompt (string) - Custom prompt for video description
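
The defaults and ranges above can be checked on the client side before submitting input. The following sketch uses only the values documented in this section; the helper names `resolve_model_settings` and `should_skip` are hypothetical, and rejecting out-of-range values (rather than clamping them) is an assumption about how strict validation should be.

```python
from typing import Any, Dict

QWEN_MODELS = (
    "qwen/qwen3-vl-8b-instruct",
    "qwen/qwen3-vl-30b-a3b-instruct",
    "qwen/qwen3-vl-30b-a3b-thinking",
    "qwen/qwen3-vl-235b-a22b-instruct",
    "qwen/qwen3-vl-235b-a22b-thinking",
)


def resolve_model_settings(actor_input: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the documented defaults and reject out-of-range values."""
    model = actor_input.get("qwenModel", "qwen/qwen3-vl-8b-instruct")
    max_tokens = int(actor_input.get("maxTokens", 512))
    max_video_length = int(actor_input.get("maxVideoLength", 120))

    if model not in QWEN_MODELS:
        raise ValueError(f"Unknown qwenModel: {model!r}")
    if not 100 <= max_tokens <= 2048:
        raise ValueError("maxTokens must be between 100 and 2048")
    if not 5 <= max_video_length <= 600:
        raise ValueError("maxVideoLength must be between 5 and 600 seconds")

    return {
        "qwenModel": model,
        "maxTokens": max_tokens,
        "maxVideoLength": max_video_length,
    }


def should_skip(duration_seconds: float, max_video_length: int) -> bool:
    """Videos longer than maxVideoLength are skipped."""
    return duration_seconds > max_video_length
```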

Output

Each processed video returns:

{
  "videoUrl": "https://example.com/video.mp4",
  "status": "success",
  "transcription": "Original transcription text...",
  "description": "AI-generated video description...",
  "whisper_translation": "English translation from Whisper...",
  "llm_translation": "Translation to target language...",
  "processingTime": 45.23,
  "error": null
}

Output Fields

videoUrl (string) - The video URL that was processed
status (string) - Processing status: "success" or "failed"
transcription (string) - Original audio transcription (if enabled)
description (string) - AI-generated visual description (if enabled)
whisper_translation (string | null) - Whisper's English translation (if enabled)
llm_translation (string | null) - LLM translation to target language (if enabled)
processingTime (number) - Total processing time in seconds
error (string | null) - Error message if processing failed
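
For downstream code, the output record can be described with a typed dictionary and split by the status field. This is a sketch under the field list above; the type name `VideoResult` and the helper `split_by_status` are hypothetical, and fields for disabled features are assumed to be possibly null or absent.

```python
from typing import List, Optional, Tuple, TypedDict


class VideoResult(TypedDict, total=False):
    """One output record, using the field names documented above."""
    videoUrl: str
    status: str  # "success" or "failed"
    transcription: Optional[str]
    description: Optional[str]
    whisper_translation: Optional[str]
    llm_translation: Optional[str]
    processingTime: float
    error: Optional[str]


def split_by_status(results: List[VideoResult]) -> Tuple[List[VideoResult], List[VideoResult]]:
    """Separate successful records from everything else."""
    succeeded = [r for r in results if r.get("status") == "success"]
    failed = [r for r in results if r.get("status") != "success"]
    return succeeded, failed
```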

Usage Examples

Example 1: Transcribe Single Video

{
  "videoUrls": "https://example.com/video.mp4",
  "transcribeAudio": true,
  "runpodApiKey": "your-runpod-api-key",
  "runpodEndpointId": "abc123xyz456"
}

Example 2: Describe Video

{
  "videoUrls": "https://example.com/video.mp4",
  "describeVideo": true,
  "novitaApiKey": "your-novita-api-key",
  "qwenModel": "qwen/qwen3-vl-8b-instruct",
  "maxTokens": 512
}

Example 3: Transcribe with Translation

{
  "videoUrls": "https://example.com/video.mp4",
  "transcribeAudio": true,
  "translateTranscription": true,
  "translationMethod": "both",
  "targetLanguage": "Spanish",
  "runpodApiKey": "your-runpod-api-key",
  "runpodEndpointId": "abc123xyz456",
  "novitaApiKey": "your-novita-api-key"
}

Example 4: Batch Processing with All Features

{
  "videoUrls": [
    "https://example.com/video1.mp4",
    "https://example.com/video2.mp4",
    "https://example.com/video3.mp4"
  ],
  "transcribeAudio": true,
  "describeVideo": true,
  "translateTranscription": true,
  "translationMethod": "llm",
  "targetLanguage": "French",
  "runpodApiKey": "your-runpod-api-key",
  "runpodEndpointId": "abc123xyz456",
  "novitaApiKey": "your-novita-api-key",
  "qwenModel": "qwen/qwen3-vl-30b-a3b-instruct",
  "maxTokens": 512,
  "maxVideoLength": 120
}
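
Since the API keys are marked as secrets, it can be convenient to build the input in code and read the keys from the environment rather than pasting them into a JSON file. The sketch below reproduces the all-features batch input; the environment variable names (`RUNPOD_API_KEY`, `RUNPOD_ENDPOINT_ID`, `NOVITA_API_KEY`) and the helper `build_batch_input` are hypothetical.

```python
import os
from typing import Any, Dict, List, Mapping


def build_batch_input(video_urls: List[str], env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Assemble an all-features batch input, reading secrets from `env`."""
    return {
        "videoUrls": list(video_urls),
        "transcribeAudio": True,
        "describeVideo": True,
        "translateTranscription": True,
        "translationMethod": "llm",
        "targetLanguage": "French",
        # Hypothetical variable names; use whatever your deployment defines.
        "runpodApiKey": env["RUNPOD_API_KEY"],
        "runpodEndpointId": env["RUNPOD_ENDPOINT_ID"],
        "novitaApiKey": env["NOVITA_API_KEY"],
        "qwenModel": "qwen/qwen3-vl-30b-a3b-instruct",
        "maxTokens": 512,
        "maxVideoLength": 120,
    }
```

Serializing the returned dictionary with `json.dumps` yields input in the same shape as the examples above. A missing environment variable raises `KeyError`, which surfaces the problem before any video is processed.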

API Keys

RunPod (for Transcription)

Sign up at https://runpod.io
Go to Settings > API Keys
Create a new API key
Deploy a Faster Whisper serverless endpoint
Copy the endpoint ID from your endpoint dashboard

Novita.ai (for Description & LLM Translation)

Sign up at https://novita.ai
Go to Key Management section
Create a new API key
Copy the API key

Cost Considerations

Transcription: Costs depend on your RunPod endpoint configuration
Video Description: Costs vary by model:
- 8B Instruct: $0.08/$0.50 per M tokens (most affordable)
- 30B Instruct: $0.20/$0.70 per M tokens
- 235B Instruct: $0.30/$1.50 per M tokens
Translation:
- Whisper translation: Free (included in transcription)
- LLM translation: Additional API costs

Use maxVideoLength to control costs by skipping long videos.
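
As a rough worked example, the description cost for one video can be estimated from token counts. This sketch assumes the two prices in each pair are the input and output rates per million tokens, in that order, which the text above does not state explicitly; the token counts in the usage note are made up for illustration.

```python
from typing import Dict, Tuple

# (input price, output price) in USD per million tokens.
# Assumption: the first figure is the input rate and the second the output rate.
PRICES_PER_MILLION: Dict[str, Tuple[float, float]] = {
    "qwen/qwen3-vl-8b-instruct": (0.08, 0.50),
    "qwen/qwen3-vl-30b-a3b-instruct": (0.20, 0.70),
    "qwen/qwen3-vl-235b-a22b-instruct": (0.30, 1.50),
}


def estimate_description_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one description request."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000
```

Under these assumptions, a request to the 8B model with 1,000 input tokens and the default 512 output tokens costs about (1000 × 0.08 + 512 × 0.50) / 1,000,000 = $0.000336. The number of input tokens a video consumes depends on the model's video encoding and is not documented here.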

Error Handling

Videos that exceed maxVideoLength are skipped with the message: "[Skipped: Video too long]"
Failed API calls return error messages in the format: "[Error: error details]"
The actor continues processing remaining videos even if some fail
Check the status field to identify failed videos
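
When post-processing results, the bracketed markers described above can be detected with a small pattern match. The sketch below assumes the markers may appear in any of the text fields of a record, which this README does not specify; `find_marker` and `classify_result` are hypothetical helpers.

```python
import re
from typing import Any, Dict, Optional, Tuple

# Matches "[Skipped: ...]" and "[Error: ...]" as documented above.
_MARKER = re.compile(r"^\[(Skipped|Error): (.*)\]$", re.DOTALL)

TEXT_FIELDS = ("transcription", "description", "whisper_translation", "llm_translation", "error")


def find_marker(value: Any) -> Optional[Tuple[str, str]]:
    """Return (kind, detail) if value is a bracketed marker, else None."""
    if not isinstance(value, str):
        return None
    match = _MARKER.match(value.strip())
    if match is None:
        return None
    return match.group(1).lower(), match.group(2)


def classify_result(result: Dict[str, Any]) -> str:
    """Classify one record as "ok", "skipped", or "failed"."""
    if result.get("status") == "failed":
        return "failed"
    # Assumption: markers can show up in any text field of the record.
    for field in TEXT_FIELDS:
        marker = find_marker(result.get(field))
        if marker is not None:
            return "skipped" if marker[0] == "skipped" else "failed"
    return "ok"
```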

Performance

Processing time varies based on:
- Video duration
- Selected models
- Network speed
- API response times
Average processing time: 30-60 seconds per video (with all features enabled)
Bulk processing handles videos sequentially to avoid API rate limits
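
Because videos are processed one after another, a batch's total time grows roughly linearly with the number of videos. The following is a back-of-the-envelope sketch using the 30-60 second average quoted above; the helper name `estimate_batch_seconds` is hypothetical, and actual times depend on the factors listed.

```python
from typing import Tuple


def estimate_batch_seconds(
    video_count: int,
    per_video_seconds: Tuple[float, float] = (30.0, 60.0),
) -> Tuple[float, float]:
    """Return a (low, high) estimate for a sequential batch, in seconds."""
    if video_count < 0:
        raise ValueError("video_count must not be negative")
    low, high = per_video_seconds
    return video_count * low, video_count * high
```

For example, a batch of 20 videos with all features enabled would take roughly 10 to 20 minutes under this estimate.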

Limitations

Video URLs must be publicly accessible (no authentication required)
Maximum video length: 600 seconds (10 minutes)
Whisper translation only outputs English
LLM translation quality depends on selected model
Processing time increases with video duration and enabled features

Support

For issues or questions:

Check the Actor logs for detailed error messages
Verify API keys are correct and have sufficient credits
Ensure video URLs are publicly accessible
Review the input schema for correct parameter format

Version History

v1.0 (2025-11-08)

Initial release
Audio transcription with RunPod Faster Whisper
Video description with Novita.ai Qwen3-VL
Translation with both Whisper and LLM methods
Single and bulk video processing
