A FastAPI-based server that provides convenient endpoints for extracting information from YouTube videos, including video metadata, captions, timestamped transcripts, and available transcript languages.
- Get video metadata (title, author, thumbnails, etc.)
- Extract video captions/transcripts with language fallback
- Generate timestamped transcripts
- List available transcript languages for videos
- Support for multiple languages in captions
- Webshare proxy integration to avoid IP blocking
- Async processing with parallel execution
- Clean and RESTful API design
- Comprehensive error handling
- Python 3.12+
- uv (recommended) or pip
- Clone the repository:
git clone https://github.com/zaidmukaddam/youtube-api-server.git
cd youtube-api-server
- Install dependencies using uv (recommended):
uv sync
The server supports optional proxy configuration to avoid YouTube's IP blocking:
- WEBSHARE_PROXY_USERNAME: Your Webshare proxy username (optional)
- WEBSHARE_PROXY_PASSWORD: Your Webshare proxy password (optional)
- HOST: Server host (default: 0.0.0.0)
- PORT: Server port (default: 8000)
You can set these in your environment or create a .env file in the project root.
Use the included environment checker:
python load_env.py
This will verify your environment configuration and show which variables are set.
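As a rough illustration of how the variables listed above map to runtime settings, the following sketch reads them with the documented defaults. The `Settings` class and `load_settings` function are hypothetical names, not taken from the project, and the rule that the proxy needs both credentials is an assumption. The sketch reads only the process environment; it does not parse a .env file.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    host: str
    port: int
    proxy_username: Optional[str]
    proxy_password: Optional[str]

    @property
    def proxy_enabled(self) -> bool:
        # Assumption: the proxy is only usable when both credentials are set.
        return bool(self.proxy_username and self.proxy_password)


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, applying the documented defaults."""
    return Settings(
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        # Treat empty strings the same as unset variables.
        proxy_username=env.get("WEBSHARE_PROXY_USERNAME") or None,
        proxy_password=env.get("WEBSHARE_PROXY_PASSWORD") or None,
    )


if __name__ == "__main__":
    settings = load_settings()
    print(f"{settings.host}:{settings.port} proxy_enabled={settings.proxy_enabled}")
```

Passing a plain dictionary instead of `os.environ` makes the function easy to check without touching the real environment.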
Start the server using:
uv run main.py
By default, the server runs on:
- Host: 0.0.0.0
- Port: 8000
You can customize these using environment variables:
export PORT=8080
export HOST=127.0.0.1
uv run main.py
GET /health
Response: Server status, proxy configuration, and system information.
POST /video-data
Request Body:
{
"url": "https://www.youtube.com/watch?v=VIDEO_ID"
}
Response: Video metadata including title, author, thumbnails, duration, etc.
POST /video-transcript-languages
Request Body:
{
"url": "https://www.youtube.com/watch?v=VIDEO_ID"
}
Response: List of available transcript languages with details about generated vs manual transcripts.
POST /video-captions
Request Body:
{
"url": "https://www.youtube.com/watch?v=VIDEO_ID",
"languages": ["en", "es"] // Optional, supports fallback
}
Response: Complete transcript text of the video with automatic language fallback.
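The exact fallback rules are not spelled out here, so the following is only a sketch of one plausible policy: try the requested languages in order, then fall back to whatever transcript exists. `pick_language` is a hypothetical helper, not a function from this project.

```python
from typing import Optional, Sequence


def pick_language(requested: Sequence[str], available: Sequence[str]) -> Optional[str]:
    """Return the first requested code that is available, else the first available one."""
    available_set = set(available)
    for code in requested:
        if code in available_set:
            return code
    # Assumed fallback: nothing requested matched, so take whatever exists.
    return available[0] if available else None


print(pick_language(["en", "es"], ["es", "hi"]))  # es
print(pick_language(["en"], ["hi"]))              # hi
print(pick_language(["en"], []))                  # None
```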
POST /video-timestamps
Request Body:
{
"url": "https://www.youtube.com/watch?v=VIDEO_ID",
"languages": ["en"] // Optional, supports fallback
}
Response: List of timestamps with corresponding caption text and timing information.
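To show what can be done with timing information on the client side, here is a small sketch that turns start offsets in seconds into readable timestamps. The segment shape (`start` and `text` keys) is an assumption made for illustration; check the actual response of your server before relying on it.

```python
def format_timestamp(seconds: float) -> str:
    """Format an offset in seconds as M:SS, or H:MM:SS for offsets of an hour or more."""
    total = int(seconds)  # drop fractional seconds
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def render_lines(segments: list[dict]) -> list[str]:
    """Build one display line per segment (assumed keys: 'start', 'text')."""
    return [f"{format_timestamp(seg['start'])} - {seg['text']}" for seg in segments]


sample = [
    {"start": 0.0, "text": "Intro"},
    {"start": 75.4, "text": "First topic"},
    {"start": 3723.0, "text": "Wrap-up"},
]
for line in render_lines(sample):
    print(line)
# 0:00 - Intro
# 1:15 - First topic
# 1:02:03 - Wrap-up
```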
# Health check
curl -X GET "http://localhost:8000/health"
# Get video metadata
curl -X POST "http://localhost:8000/video-data" \
-H "Content-Type: application/json" \
-d '{"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}'
# List available languages
curl -X POST "http://localhost:8000/video-transcript-languages" \
-H "Content-Type: application/json" \
-d '{"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}'
# Get video captions with language fallback
curl -X POST "http://localhost:8000/video-captions" \
-H "Content-Type: application/json" \
-d '{"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ", "languages": ["en", "es"]}'
# Get timestamped transcript
curl -X POST "http://localhost:8000/video-timestamps" \
-H "Content-Type: application/json" \
-d '{"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ", "languages": ["en"]}'
import requests

base_url = "http://localhost:8000"
video_url = "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

# Get video metadata
response = requests.post(f"{base_url}/video-data", json={"url": video_url}, timeout=30)
response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
metadata = response.json()

# Get captions (the "languages" list is optional)
response = requests.post(
    f"{base_url}/video-captions",
    json={"url": video_url, "languages": ["en"]},
    timeout=30,
)
response.raise_for_status()
captions = response.json()["captions"]
The project includes a comprehensive testing script that tests all endpoints with various scenarios:
chmod +x test_endpoints.sh
./test_endpoints.sh
Or test against a custom server:
./test_endpoints.sh http://your-server:8000
The test script covers:
- All API endpoints
- Multiple video types (English, Hindi, etc.)
- Language fallback scenarios
- Error handling
- Edge cases and invalid inputs
The API includes comprehensive error handling for:
- Invalid YouTube URLs (400 Bad Request)
- Missing or unavailable captions (500 Internal Server Error)
- Network errors and proxy issues
- Invalid language codes
- Malformed requests (422 Unprocessable Entity)
- Server connectivity issues
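Since invalid URLs are rejected with a 400, a client may want to check a URL before sending it. The sketch below extracts a video ID from a few common URL shapes using only the standard library. Which URL forms the server itself accepts is not stated here, so the set of hosts and paths is an assumption, and `extract_video_id` is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumed set of hostnames; the server may accept more or fewer.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID for recognised YouTube URL shapes, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    candidate = ""
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        elif parsed.path.startswith(("/embed/", "/shorts/")):
            candidate = parsed.path.split("/")[2]
    return candidate or None


print(extract_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ"))  # dQw4w9WgXcQ
print(extract_video_id("https://youtu.be/dQw4w9WgXcQ"))                 # dQw4w9WgXcQ
print(extract_video_id("https://example.com/watch?v=dQw4w9WgXcQ"))      # None
```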
The server supports Webshare proxy integration to avoid YouTube's IP blocking:
- Sign up for a Webshare account
- Set your credentials as environment variables:
export WEBSHARE_PROXY_USERNAME="your_username"
export WEBSHARE_PROXY_PASSWORD="your_password"
- Restart the server - proxy will be automatically enabled
- Async Processing: All transcript operations run asynchronously
- Parallel Execution: Blocking operations are executed in background threads
- Language Fallback: Automatic fallback to available languages
- Proxy Rotation: Webshare proxy integration for reliable access
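The pattern of running blocking work in background threads from async code can be sketched with the standard library alone. This is a general illustration, not the project's actual code: `blocking_fetch` is a hypothetical stand-in for a blocking call such as a transcript download.

```python
import asyncio
import time


def blocking_fetch(name: str, delay: float) -> str:
    """Hypothetical stand-in for a blocking call such as a transcript download."""
    time.sleep(delay)
    return f"{name} done"


async def fetch_all() -> list[str]:
    # asyncio.to_thread runs each blocking call in a worker thread, so the
    # event loop stays free; gather awaits both and preserves argument order.
    return await asyncio.gather(
        asyncio.to_thread(blocking_fetch, "metadata", 0.2),
        asyncio.to_thread(blocking_fetch, "captions", 0.2),
    )


if __name__ == "__main__":
    started = time.perf_counter()
    results = asyncio.run(fetch_all())
    print(results, f"{time.perf_counter() - started:.2f}s")
```

Because the two calls sleep in separate threads, the total time should be close to the longer of the two delays rather than their sum.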
The project uses these main dependencies:
- fastapi: Web framework
- uvicorn: ASGI server
- youtube-transcript-api: YouTube transcript extraction
- pydantic: Data validation
- gunicorn: Production WSGI server
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Here's how you can help:
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Run the tests (./test_endpoints.sh)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
For major changes, please open an issue first to discuss what you would like to change.
- Clone and install dependencies:
git clone https://github.com/zaidmukaddam/youtube-api-server.git
cd youtube-api-server
uv sync
- Check environment setup:
python load_env.py
- Run the server:
python main.py
- Run tests:
./test_endpoints.sh
Please make sure to test your changes with the provided test script before submitting.