Lightning-fast movie data scraper for Thailand's major cinema chains
Extract real-time movie listings from Major Cineplex and SF CinemaCity with ease
- 🎯 Multi-Cinema Support: Scrapes from Major Cineplex and SF CinemaCity
- ⚡ High Performance: Built with Bun.js for blazing-fast execution
- 🤖 Smart Scraping: Uses Puppeteer with randomized user agents
- Structured Data: Outputs clean, standardized JSON format
- Real-time Updates: Gets current and upcoming movie listings
- 🐳 Docker Ready: Containerized for easy deployment
- 📤 API Integration: Built-in support for data uploading to external APIs
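
As a rough illustration of the randomized user agent feature, the sketch below picks one string at random from a small pool. The pool contents and the `pickUserAgent` helper are assumptions made for this example, not the project's actual code.

```typescript
// Hypothetical pool of user-agent strings; the project's real list is not shown here.
const USER_AGENTS: readonly string[] = [
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15",
  "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
];

// Pick one entry uniformly at random. The random source is injectable so the
// choice can be made deterministic in tests.
function pickUserAgent(
  pool: readonly string[] = USER_AGENTS,
  random: () => number = Math.random,
): string {
  if (pool.length === 0) throw new Error("user-agent pool is empty");
  // Clamp the index in case the random source ever returns exactly 1.
  const index = Math.min(pool.length - 1, Math.floor(random() * pool.length));
  return pool[index] as string;
}

// With Puppeteer, the chosen string would typically be applied per page
// before navigation:
//   await page.setUserAgent(pickUserAgent());
```

Choosing a fresh string for each page, rather than once per process, spreads requests across several browser signatures.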
| Cinema Chain | Status | Movie Count |
|---|---|---|
| 🏢 Major Cineplex | ✅ Active | ~2000+ movies |
| 🎪 SF CinemaCity | ✅ Active | ~1500+ movies |
- Bun.js runtime
- Node.js 18+ (if using npm/yarn)
# Clone the repository
git clone https://github.com/dvgamerr/cinema-scraper.git
cd cinema-scraper
# Install dependencies
bun install
# Run the scraper
bun dev
The scraper generates JSON files in the ./output directory:
output/
├── results.json         # Combined standardized data
├── major-cineplex.json  # 🏢 Raw Major Cineplex data
└── sf-cinemacity.json   # 🎪 Raw SF CinemaCity data
{
  "name": "movie-slug",
  "name_en": "Movie Title in English",
  "name_th": "ชื่อหนังภาษาไทย",
  "display": "Display Name",
  "release": "2025-06-06T17:00:00.000Z",
  "genre": "Action",
  "time": 120,
  "theater": {
    "major": {
      "cover": "https://cdn.majorcineplex.com/...",
      "url": "https://www.majorcineplex.com/..."
    }
  }
}
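
For consumers of the standardized output, the record above can be described with a TypeScript type. The sketch below is inferred from the sample only; the `Movie` and `TheaterInfo` names, the reading of `time` as a runtime in minutes, and the `mergeBySlug` helper are assumptions, not the project's actual code. It shows one plausible way to fold records from several chains into a single entry per slug.

```typescript
// Per-chain details, mirroring the "theater" entries in the sample record.
interface TheaterInfo {
  cover: string;
  url: string;
}

// Shape of one standardized record, inferred from the sample above.
interface Movie {
  name: string; // slug
  name_en: string;
  name_th: string;
  display: string;
  release: string; // ISO 8601 timestamp
  genre: string;
  time: number; // assumed to be the runtime in minutes
  theater: Record<string, TheaterInfo>; // keyed by chain, e.g. "major"
}

// Hypothetical merge step: records sharing a slug are folded into one entry
// whose "theater" map holds every chain that lists the movie.
function mergeBySlug(records: readonly Movie[]): Movie[] {
  const bySlug = new Map<string, Movie>();
  for (const record of records) {
    const existing = bySlug.get(record.name);
    if (existing) {
      existing.theater = { ...existing.theater, ...record.theater };
    } else {
      // Copy the record so the caller's input is never mutated.
      bySlug.set(record.name, { ...record, theater: { ...record.theater } });
    }
  }
  return Array.from(bySlug.values());
}
```

The first record seen for a slug supplies the shared fields such as `name_en` and `release`; later records only contribute their `theater` entries.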
# Build the image
docker build -t cinema-scraper .
# Run the container
docker run -v "$(pwd)/output:/app/output" cinema-scraper
- ⚡ Speed: Processes 3000+ movies in ~2-3 minutes
- 🧠 Memory: Optimized memory usage with chunked processing
- Reliability: Built-in error handling and retry mechanisms
- Anti-Detection: Randomized user agents and request patterns
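
The chunked processing and retry behavior listed above could look roughly like the following sketch. The helper names (`chunk`, `withRetry`, `processInChunks`), the chunk size, and the backoff settings are all illustrative assumptions; the project's real implementation may differ.

```typescript
// Split a list into consecutive chunks of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Retry an async task with exponential backoff; rethrows the last error
// once all attempts are used up.
async function withRetry<T>(
  task: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      if (attempt < attempts - 1) {
        const delayMs = baseDelayMs * 2 ** attempt;
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}

// Process one chunk at a time so that at most `size` items are in flight.
async function processInChunks<T, R>(
  items: readonly T[],
  size: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  for (const batch of chunk(items, size)) {
    const settled = await Promise.all(
      batch.map((item) => withRetry(() => worker(item))),
    );
    results.push(...settled);
  }
  return results;
}
```

Working through one chunk at a time caps how many pages or requests are open at once, which is one way to keep memory use bounded on a run of several thousand movies.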
MIT © 2018-2025 Touno™
Made with ❤️ in Thailand
If this project helps you, please consider giving it a ⭐