Infinite Scrapper is a versatile and efficient web scraping tool designed to handle large-scale data extraction tasks with ease. Whether you're gathering data for research, analysis, or automation, Infinite Scrapper provides the tools you need to extract, process, and manage web data seamlessly.
- Scalable Scraping: Handle thousands of pages with ease.
- Customizable Parsers: Easily define how to extract data from different websites.
- Data Export: Export scraped data in various formats like CSV, JSON, or directly to databases.
- Proxy Support: Rotate proxies to avoid IP bans and enhance scraping efficiency.
- Error Handling: Robust mechanisms to handle and retry failed requests.
- Scheduling: Automate scraping tasks at specified intervals.
- Extensible Architecture: Plug and play modules to extend functionality.
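The proxy-rotation and retry features above can be sketched roughly as follows. This is an illustrative standard-library sketch, not Infinite Scrapper's actual API — `fetch_with_rotation` is a hypothetical helper name:

```python
import itertools
import urllib.request

def fetch_with_rotation(url, proxies, max_retries=3):
    """Fetch a URL, rotating through proxies and retrying failed requests."""
    rotation = itertools.cycle(proxies)
    for attempt in range(max_retries):
        proxy = next(rotation)
        # Route this attempt through the current proxy
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        )
        try:
            with opener.open(url, timeout=10) as resp:
                return resp.read()
        except OSError:
            continue  # request failed: rotate to the next proxy and retry
    raise RuntimeError(f"all {max_retries} attempts failed for {url}")
```

Cycling the proxy list spreads requests across IPs, so a ban on one proxy only costs a single retry rather than the whole run.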
*Screenshot demonstrating Infinite Scrapper in action.*
Before you begin, ensure you have met the following requirements:
- Operating System: Windows, macOS, or Linux
- Python: Version 3.7 or higher
- Git: Installed on your system
- pip: Python package installer
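You can quickly verify these prerequisites from a shell:

```shell
python3 --version        # should report 3.7 or higher
git --version
python3 -m pip --version # confirms pip is available for this interpreter
python3 -c "import sys; assert sys.version_info >= (3, 7)"
```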
1. Clone the Repository

   ```bash
   git clone https://github.com/kobasi896/Infinite-Scrapper.git
   ```

2. Navigate to the Project Directory

   ```bash
   cd Infinite-Scrapper
   ```

3. Create a Virtual Environment (Optional but Recommended)

   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

4. Install Dependencies

   ```bash
   pip install -r requirements.txt
   ```

5. Configure Environment Variables

   ```env
   # .env file example
   USER_AGENT="Your User Agent String"
   PROXY_LIST="path/to/proxy_list.txt"
   DATABASE_URL="your_database_connection_string"
   ```

6. Run Migrations (If Applicable)

   ```bash
   python manage.py migrate
   ```
To start scraping, run the main script:
```bash
python main.py --config config.yaml
```
Parameters
- `--config`: Path to the configuration file.
Infinite Scrapper uses a YAML configuration file to define scraping tasks. Below is an example `config.yaml`:

```yaml
settings:
  user_agent: "Your User Agent String"
  proxies:
    - "http://proxy1.com:port"
    - "http://proxy2.com:port"
  delay: 2  # Delay between requests in seconds

tasks:
  - name: "Example Task"
    start_url: "https://example.com"
    max_pages: 100
    selectors:
      title: "h1.title::text"
      price: "span.price::text"
      image: "img::attr(src)"
    export:
      format: "csv"
      path: "data/example.csv"
```
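A task's `export` block maps scraped rows to the chosen format. As a rough standard-library sketch of the CSV case (the `export_csv` helper is illustrative, not part of Infinite Scrapper's API):

```python
import csv
import os

def export_csv(rows, path):
    """Write scraped items (a list of dicts) to a CSV file."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", newline="", encoding="utf-8") as f:
        # Column names come from the selector keys of the task
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)

# Example: rows shaped like the selectors above (title, price, image)
export_csv(
    [{"title": "Example", "price": "9.99", "image": "/img/1.png"}],
    "data/example.csv",
)
```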
Running with Custom Configuration:

```bash
python main.py --config path/to/your_config.yaml
```
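For the scheduling feature, runs can also be automated externally with cron. For example, a crontab entry (the install path below is hypothetical) could invoke the same command nightly:

```shell
# crontab -e — run every night at 02:00
0 2 * * * cd /path/to/Infinite-Scrapper && venv/bin/python main.py --config config.yaml
```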
Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
1. Fork the Repository

   Click the "Fork" button at the top right of this page.

2. Clone Your Fork

   ```bash
   git clone https://github.com/your-username/Infinite-Scrapper.git
   ```

3. Create a Branch

   ```bash
   git checkout -b feature/YourFeature
   ```

4. Make Your Changes

   Commit your changes with clear and descriptive messages:

   ```bash
   git commit -m "Add feature X"
   ```

5. Push to Your Fork

   ```bash
   git push origin feature/YourFeature
   ```

6. Open a Pull Request

   Navigate to the original repository and click the "Compare & pull request" button.
Please read our Code of Conduct before contributing.
Distributed under the MIT License. See LICENSE for more information.