
🎬 Lightning-fast movie scraper for Thailand's major cinema chains (Major Cineplex & SF CinemaCity) - Extract real-time movie listings with structured JSON output



dvgamerr/etl-cinema-scraper


🎬 Cinema Scraper


🚀 Lightning-fast movie data scraper for Thailand's major cinema chains

Extract real-time movie listings from Major Cineplex and SF CinemaCity with ease


✨ Features

  • 🎯 Multi-Cinema Support: Scrapes from Major Cineplex and SF CinemaCity
  • ⚡ High Performance: Built with Bun.js for blazing-fast execution
  • 🤖 Smart Scraping: Uses Puppeteer with randomized user agents
  • 📊 Structured Data: Outputs clean, standardized JSON
  • 🔄 Real-time Updates: Gets current and upcoming movie listings
  • 🐳 Docker Ready: Containerized for easy deployment
  • 📤 API Integration: Built-in support for uploading data to external APIs
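The "Smart Scraping" bullet can be illustrated with a minimal sketch of user-agent rotation. The UA strings and the `pickUserAgent` helper below are hypothetical, not taken from this project's source; they only show the idea of drawing one browser identity at random per session.

```typescript
// Hypothetical pool of desktop user-agent strings (illustrative values only).
const USER_AGENTS: readonly string[] = [
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15",
  "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
];

// Pick one user agent uniformly at random from the pool.
// `rng` must return a number in [0, 1); it is injectable so tests can be deterministic.
function pickUserAgent(
  pool: readonly string[] = USER_AGENTS,
  rng: () => number = Math.random,
): string {
  const index = Math.floor(rng() * pool.length);
  const userAgent = pool[index];
  if (userAgent === undefined) {
    // Only reachable for an empty pool or an out-of-range rng value.
    throw new RangeError("no user agent available for the chosen index");
  }
  return userAgent;
}
```

With Puppeteer, the chosen string would typically be applied via `await page.setUserAgent(pickUserAgent())` right after `browser.newPage()`, before navigating. Rotating per page rather than per request keeps each browsing session internally consistent.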

🎬 Supported Cinemas

| Cinema Chain | Status | Movie Count |
|---|---|---|
| 🏢 Major Cineplex | ✅ Active | ~2,000+ movies |
| 🎪 SF CinemaCity | ✅ Active | ~1,500+ movies |

🚀 Quick Start

Prerequisites

  • Bun.js runtime
  • Node.js 18+ (if using npm/yarn)

Installation

# Clone the repository
git clone https://github.com/dvgamerr/cinema-scraper.git
cd cinema-scraper

# Install dependencies
bun install

# Run the scraper
bun dev

📁 Output Structure

The scraper generates JSON files in the ./output directory:

output/
├── results.json          # 📋 Combined standardized data
├── major-cineplex.json   # 🏢 Raw Major Cineplex data
└── sf-cinemacity.json    # 🎪 Raw SF CinemaCity data

📄 Sample Output Format

{
  "name": "movie-slug",
  "name_en": "Movie Title in English",
  "name_th": "ชื่อหนังภาษาไทย",
  "display": "Display Name",
  "release": "2025-06-06T17:00:00.000Z",
  "genre": "Action",
  "time": 120,
  "theater": {
    "major": {
      "cover": "https://cdn.majorcineplex.com/...",
      "url": "https://www.majorcineplex.com/..."
    }
  }
}
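For consumers of `results.json`, the sample record above could be described with a TypeScript type plus a small runtime check. Field meanings beyond the sample are assumptions: `time` is read as running time in minutes, `release` as an ISO-8601 UTC timestamp (`17:00:00.000Z` corresponds to 00:00 on 7 June in Thailand, UTC+7), and only `major` appears under `theater` in the sample, so the `sf` key is a guess. `isMovie` and `isUpcoming` are hypothetical helpers.

```typescript
// Shape inferred from the sample record; `sf` is an assumed second theater key.
interface TheaterLink {
  cover: string;
  url: string;
}

interface Movie {
  name: string;
  name_en: string;
  name_th: string;
  display: string;
  release: string; // ISO-8601 timestamp in UTC
  genre: string;
  time: number; // assumed: running time in minutes
  theater: { major?: TheaterLink; sf?: TheaterLink };
}

// Minimal runtime guard for records parsed from results.json.
function isMovie(value: unknown): value is Movie {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const stringKeys = ["name", "name_en", "name_th", "display", "release", "genre"];
  return (
    stringKeys.every((key) => typeof v[key] === "string") &&
    typeof v["time"] === "number" &&
    !Number.isNaN(Date.parse(v["release"] as string)) &&
    typeof v["theater"] === "object" &&
    v["theater"] !== null
  );
}

// A movie counts as upcoming if its release instant is still in the future.
function isUpcoming(movie: Movie, now: Date = new Date()): boolean {
  return Date.parse(movie.release) > now.getTime();
}
```

Comparing instants rather than calendar dates sidesteps the UTC/Thai-time offset: the check is correct regardless of the machine's local time zone.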

🐳 Docker Deployment

# Build the image
docker build -t cinema-scraper .

# Run the container, mounting ./output so results persist on the host
docker run -v "$(pwd)/output:/app/output" cinema-scraper

πŸ“Š Performance

  • ⚡ Speed: Processes 3000+ movies in ~2-3 minutes
  • 🧠 Memory: Optimized memory usage with chunked processing
  • 🔄 Reliability: Built-in error handling and retry mechanisms
  • 📱 Anti-Detection: Randomized user agents and request patterns
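The "chunked processing" and "retry mechanisms" bullets can be sketched as two generic helpers: split the work into fixed-size batches so only one batch is in flight at a time, and retry a failing async call with exponential backoff. The helper names, chunk size, attempt count, and delays are illustrative assumptions, not the project's tuned values.

```typescript
// Split items into consecutive batches of at most `size` elements.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry `task` up to `attempts` times, doubling the wait after each failure
// (baseDelayMs, 2x, 4x, ...). The last error is rethrown if all attempts fail.
async function withRetry<T>(
  task: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      if (attempt < attempts - 1) {
        await sleep(baseDelayMs * 2 ** attempt);
      }
    }
  }
  throw lastError;
}

// Process batches one after another; items within a batch run concurrently.
async function processInChunks<T, R>(
  items: readonly T[],
  size: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  for (const batch of chunk(items, size)) {
    const batchResults = await Promise.all(batch.map((item) => withRetry(() => worker(item))));
    results.push(...batchResults);
  }
  return results;
}
```

Working batch by batch bounds how many pages are open at once, which is one plausible way to keep memory flat. Note that `Promise.all` fails fast, so a movie that still fails after its retries rejects its whole batch; `Promise.allSettled` would be the alternative if partial results are acceptable.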

📄 License

MIT © 2018-2025 Touno™


Made with ❤️ in Thailand

If this project helps you, please consider giving it a ⭐
