Apify actor for AI-powered video transcription, description, and translation using RunPod Whisper and Novita.ai Qwen3-VL

yfe404/video-ai-processor

Video AI Processor - Transcription & Description

Process video URLs with AI-powered transcription, video description, and translation. Accepts single or bulk video URLs for batch processing.

Features

  • Audio Transcription - Extract speech from videos using RunPod's Faster Whisper (large-v3 model)
  • Video Description - Generate visual descriptions using Novita.ai's Qwen3-VL vision models
  • Translation - Translate transcriptions using:
    • Whisper's built-in translation (fast, to English only)
    • LLM-based translation (flexible, any language)
    • Both methods for comparison
  • Bulk Processing - Process single videos or arrays of video URLs
  • Flexible Configuration - All AI features are optional and independently configurable

Input

Required Fields

  • videoUrls (string | array) - Single video URL or array of URLs
    • Must be publicly accessible
    • Supported formats: MP4, WebM, etc.
    • Example: "https://example.com/video.mp4" or ["url1", "url2"]
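Because videoUrls accepts either a single string or an array, a caller may want to normalize it into a list before submitting a run. The following is a minimal sketch under that assumption; `normalize_video_urls` is a hypothetical helper, not part of the actor, and it only checks URL syntax, not whether the video is actually reachable.

```python
from urllib.parse import urlparse


def normalize_video_urls(video_urls):
    """Return a list of URLs from a single string or an array of strings."""
    if isinstance(video_urls, str):
        candidates = [video_urls]
    elif isinstance(video_urls, (list, tuple)):
        candidates = list(video_urls)
    else:
        raise TypeError("videoUrls must be a string or an array of strings")

    urls = []
    for item in candidates:
        if not isinstance(item, str):
            raise TypeError(f"videoUrls entries must be strings, got {type(item).__name__}")
        url = item.strip()
        parsed = urlparse(url)
        # Only http(s) URLs with a host can be fetched publicly.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not a valid http(s) URL: {item!r}")
        urls.append(url)
    return urls
```

Calling `normalize_video_urls("https://example.com/video.mp4")` returns a one-element list, so downstream code can always iterate.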

Optional Features

  • transcribeAudio (boolean) - Enable audio transcription (default: false)
  • describeVideo (boolean) - Enable video description (default: false)
  • translateTranscription (boolean) - Enable translation (default: false)

Translation Settings

  • translationMethod (string) - Choose translation approach:
    • "whisper" - Use Whisper's built-in translation to English (fast, free)
    • "llm" - Use LLM API for translation to any language (flexible, paid)
    • "both" - Run both methods for comparison
  • targetLanguage (string) - Target language for LLM translation (e.g., "Spanish", "French")
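The way the three translationMethod values map onto the two translation backends can be sketched as follows. `plan_translation` is a hypothetical helper; treating a missing targetLanguage as an error for LLM translation, and rejecting an unset translationMethod, are assumptions, since the README does not document defaults for either field.

```python
def plan_translation(run_input):
    """Return (use_whisper, use_llm) for an actor input dict."""
    if not run_input.get("translateTranscription", False):
        return (False, False)

    method = run_input.get("translationMethod")
    if method not in ("whisper", "llm", "both"):
        raise ValueError(f"unknown translationMethod: {method!r}")

    use_whisper = method in ("whisper", "both")
    use_llm = method in ("llm", "both")

    # Assumption: LLM translation needs an explicit target language.
    if use_llm and not run_input.get("targetLanguage"):
        raise ValueError("targetLanguage is required for LLM translation")
    return (use_whisper, use_llm)
```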

API Credentials

  • runpodApiKey (string, secret) - Your RunPod API key (required if transcribeAudio is enabled)
  • runpodEndpointId (string) - Your RunPod Faster Whisper endpoint ID (required if transcribeAudio is enabled)
  • novitaApiKey (string, secret) - Your Novita.ai API key (required if describeVideo is enabled, or if translationMethod is "llm" or "both")
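A pre-flight check can catch missing credentials before a run is started. This is a sketch, not the actor's own validation: `missing_credentials` is a hypothetical helper, and the rule that LLM translation needs novitaApiKey is inferred from the API Keys section and Example 3 rather than stated in the field descriptions.

```python
def missing_credentials(run_input):
    """Return the names of credential fields that the enabled features need but lack."""
    missing = []

    if run_input.get("transcribeAudio"):
        for key in ("runpodApiKey", "runpodEndpointId"):
            if not run_input.get(key):
                missing.append(key)

    # Inferred: Novita.ai serves both video description and LLM translation.
    wants_llm_translation = (
        run_input.get("translateTranscription")
        and run_input.get("translationMethod") in ("llm", "both")
    )
    if (run_input.get("describeVideo") or wants_llm_translation) and not run_input.get("novitaApiKey"):
        missing.append("novitaApiKey")

    return missing
```

An empty list means every enabled feature has the keys it needs.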

Model Settings

  • qwenModel (string) - Qwen3-VL vision model (default: "qwen/qwen3-vl-8b-instruct")
    • "qwen/qwen3-vl-8b-instruct" - Most affordable ($0.08/$0.50 per M tokens)
    • "qwen/qwen3-vl-30b-a3b-instruct" - Balanced quality/cost
    • "qwen/qwen3-vl-30b-a3b-thinking" - Shows reasoning process
    • "qwen/qwen3-vl-235b-a22b-instruct" - Best quality
    • "qwen/qwen3-vl-235b-a22b-thinking" - Premium reasoning
  • maxTokens (integer) - Max output tokens for descriptions (default: 512, range: 100-2048)
  • maxVideoLength (integer) - Max video duration in seconds (default: 120, range: 5-600)
    • Videos longer than this will be skipped
  • videoDescriptionPrompt (string) - Custom prompt for video description
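The defaults and ranges listed above can be applied on the client side before a run. The sketch below is illustrative only; `resolve_model_settings` is a hypothetical helper, and whether the actor itself rejects or clamps out-of-range values is not documented here, so this version simply rejects them.

```python
QWEN_MODELS = (
    "qwen/qwen3-vl-8b-instruct",
    "qwen/qwen3-vl-30b-a3b-instruct",
    "qwen/qwen3-vl-30b-a3b-thinking",
    "qwen/qwen3-vl-235b-a22b-instruct",
    "qwen/qwen3-vl-235b-a22b-thinking",
)
DEFAULTS = {"qwenModel": "qwen/qwen3-vl-8b-instruct", "maxTokens": 512, "maxVideoLength": 120}
RANGES = {"maxTokens": (100, 2048), "maxVideoLength": (5, 600)}


def resolve_model_settings(run_input):
    """Fill in documented defaults and check documented ranges."""
    settings = {key: run_input.get(key, default) for key, default in DEFAULTS.items()}

    if settings["qwenModel"] not in QWEN_MODELS:
        raise ValueError(f"unsupported qwenModel: {settings['qwenModel']!r}")

    for key, (low, high) in RANGES.items():
        value = settings[key]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{key} must be an integer")
        if not low <= value <= high:
            raise ValueError(f"{key} must be between {low} and {high}, got {value}")
    return settings
```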

Output

Each processed video returns:

{
  "videoUrl": "https://example.com/video.mp4",
  "status": "success",
  "transcription": "Original transcription text...",
  "description": "AI-generated video description...",
  "whisper_translation": "English translation from Whisper...",
  "llm_translation": "Translation to target language...",
  "processingTime": 45.23,
  "error": null
}

Output Fields

  • videoUrl (string) - The video URL that was processed
  • status (string) - Processing status: "success" or "failed"
  • transcription (string) - Original audio transcription (if enabled)
  • description (string) - AI-generated visual description (if enabled)
  • whisper_translation (string | null) - Whisper's English translation (if enabled)
  • llm_translation (string | null) - LLM translation to target language (if enabled)
  • processingTime (number) - Total processing time in seconds
  • error (string | null) - Error message if processing failed
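For consumers that post-process the results, the record shape above can be written down as a type, with a small summary over a batch. This is a sketch: `VideoResult` and `summarize` are hypothetical names, and marking every field as optional (`total=False`) is an assumption, since fields for disabled features may be absent or null.

```python
from typing import List, Optional, TypedDict


class VideoResult(TypedDict, total=False):
    videoUrl: str
    status: str  # "success" or "failed"
    transcription: Optional[str]
    description: Optional[str]
    whisper_translation: Optional[str]
    llm_translation: Optional[str]
    processingTime: float
    error: Optional[str]


def summarize(results: List[VideoResult]) -> dict:
    """Count successes and failures and add up the processing time."""
    succeeded = sum(1 for item in results if item.get("status") == "success")
    return {
        "succeeded": succeeded,
        "failed": len(results) - succeeded,
        "totalSeconds": sum(item.get("processingTime", 0.0) for item in results),
    }
```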

Usage Examples

Example 1: Transcribe Single Video

{
  "videoUrls": "https://example.com/video.mp4",
  "transcribeAudio": true,
  "runpodApiKey": "your-runpod-api-key",
  "runpodEndpointId": "abc123xyz456"
}

Example 2: Describe Video

{
  "videoUrls": "https://example.com/video.mp4",
  "describeVideo": true,
  "novitaApiKey": "your-novita-api-key",
  "qwenModel": "qwen/qwen3-vl-8b-instruct",
  "maxTokens": 512
}

Example 3: Transcribe with Translation

{
  "videoUrls": "https://example.com/video.mp4",
  "transcribeAudio": true,
  "translateTranscription": true,
  "translationMethod": "both",
  "targetLanguage": "Spanish",
  "runpodApiKey": "your-runpod-api-key",
  "runpodEndpointId": "abc123xyz456",
  "novitaApiKey": "your-novita-api-key"
}

Example 4: Batch Processing with All Features

{
  "videoUrls": [
    "https://example.com/video1.mp4",
    "https://example.com/video2.mp4",
    "https://example.com/video3.mp4"
  ],
  "transcribeAudio": true,
  "describeVideo": true,
  "translateTranscription": true,
  "translationMethod": "llm",
  "targetLanguage": "French",
  "runpodApiKey": "your-runpod-api-key",
  "runpodEndpointId": "abc123xyz456",
  "novitaApiKey": "your-novita-api-key",
  "qwenModel": "qwen/qwen3-vl-30b-a3b-instruct",
  "maxTokens": 512,
  "maxVideoLength": 120
}

API Keys

RunPod (for Transcription)

  1. Sign up at https://runpod.io
  2. Go to Settings > API Keys
  3. Create a new API key
  4. Deploy a Faster Whisper serverless endpoint
  5. Copy the endpoint ID from your endpoint dashboard
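Once the endpoint is deployed, the API key and endpoint ID can be checked with a direct request to RunPod's synchronous serverless route. The sketch below only builds the request and does not send it. The payload field names (`audio`, `model`) are an assumption based on RunPod's public Faster Whisper worker and may differ for a custom endpoint, and this is not necessarily how the actor itself calls RunPod.

```python
import json
import urllib.request


def build_runsync_request(api_key, endpoint_id, audio_url):
    """Build (but do not send) a POST request for a RunPod serverless endpoint."""
    # Assumed input schema of the Faster Whisper worker.
    payload = {"input": {"audio": audio_url, "model": "large-v3"}}
    return urllib.request.Request(
        url=f"https://api.runpod.ai/v2/{endpoint_id}/runsync",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request; an authentication error at that point usually indicates a wrong key or endpoint ID.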

Novita.ai (for Description & LLM Translation)

  1. Sign up at https://novita.ai
  2. Go to Key Management section
  3. Create a new API key
  4. Copy the API key

Cost Considerations

  • Transcription: Costs depend on your RunPod endpoint configuration
  • Video Description: Costs vary by model:
    • 8B Instruct: $0.08/$0.50 per M tokens (most affordable)
    • 30B Instruct: $0.20/$0.70 per M tokens
    • 235B Instruct: $0.30/$1.50 per M tokens
  • Translation:
    • Whisper translation: Free (included in transcription)
    • LLM translation: Additional API costs

Use maxVideoLength to control costs by skipping long videos.
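A rough per-video estimate for description costs follows from the prices above. This sketch assumes the two numbers in each pair are the input and output prices per million tokens, which the README does not state explicitly; `estimate_description_cost` is a hypothetical helper, and real token counts depend on how the video is encoded for the model.

```python
# USD per million tokens, copied from the list above.
# Assumption: (input price, output price).
PRICES = {
    "qwen/qwen3-vl-8b-instruct": (0.08, 0.50),
    "qwen/qwen3-vl-30b-a3b-instruct": (0.20, 0.70),
    "qwen/qwen3-vl-235b-a22b-instruct": (0.30, 1.50),
}


def estimate_description_cost(model, input_tokens, output_tokens):
    """Return the estimated cost in USD for one description request."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000
```

With the default maxTokens of 512, the output side of an 8B request is bounded by 512 × $0.50 / 1,000,000, so input tokens from the video frames are likely to dominate the bill.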

Error Handling

  • Videos that exceed maxVideoLength will be skipped with message: "[Skipped: Video too long]"
  • Failed API calls return error messages in the format: "[Error: error details]"
  • The actor continues processing remaining videos even if some fail
  • Check the status field to identify failed videos
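The continue-on-failure behavior described above can be sketched as a sequential loop that records one result per video. This is an illustration of the pattern, not the actor's source: `process_all` and the `process_one` callback are hypothetical, and placing the "[Error: ...]" text in the error field is an assumption.

```python
import time


def process_all(video_urls, process_one):
    """Process URLs one by one; a failure is recorded and the loop moves on."""
    results = []
    for url in video_urls:
        started = time.monotonic()
        record = {"videoUrl": url, "status": "success", "error": None}
        try:
            # process_one returns a dict of output fields for this video.
            record.update(process_one(url))
        except Exception as exc:
            record["status"] = "failed"
            record["error"] = f"[Error: {exc}]"
        record["processingTime"] = round(time.monotonic() - started, 2)
        results.append(record)
    return results
```

Filtering the returned list on `status == "failed"` gives the videos worth retrying.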

Performance

  • Processing time varies based on:
    • Video duration
    • Selected models
    • Network speed
    • API response times
  • Average processing time: 30-60 seconds per video (with all features enabled)
  • Bulk processing handles videos sequentially to avoid API rate limits

Limitations

  • Video URLs must be publicly accessible (no authentication required)
  • Maximum video length: 600 seconds (10 minutes), the upper bound of maxVideoLength
  • Whisper translation only outputs English
  • LLM translation quality depends on selected model
  • Processing time increases with video duration and enabled features

Support

For issues or questions:

  • Check the Actor logs for detailed error messages
  • Verify API keys are correct and have sufficient credits
  • Ensure video URLs are publicly accessible
  • Review the input schema for correct parameter format

Version History

v1.0 (2025-11-08)

  • Initial release
  • Audio transcription with RunPod Faster Whisper
  • Video description with Novita.ai Qwen3-VL
  • Translation with both Whisper and LLM methods
  • Single and bulk video processing
