# Infinite Scrapper


Infinite Scrapper is a versatile and efficient web scraping tool designed to handle large-scale data extraction tasks with ease. Whether you're gathering data for research, analysis, or automation, Infinite Scrapper provides the tools you need to extract, process, and manage web data seamlessly.

## Table of Contents

- [Features](#features)
- [Demo](#demo)
- [Installation](#installation)
- [Usage](#usage)
- [Contributing](#contributing)
- [License](#license)

## Features

- **Scalable Scraping:** Handle thousands of pages with ease.
- **Customizable Parsers:** Easily define how to extract data from different websites.
- **Data Export:** Export scraped data in various formats like CSV, JSON, or directly to databases.
- **Proxy Support:** Rotate proxies to avoid IP bans and enhance scraping efficiency.
- **Error Handling:** Robust mechanisms to handle and retry failed requests.
- **Scheduling:** Automate scraping tasks at specified intervals.
- **Extensible Architecture:** Plug-and-play modules to extend functionality.
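As a rough illustration of the proxy-rotation idea, a round-robin rotator can be sketched in a few lines. This is only a sketch; the class and method names are invented here and are not Infinite Scrapper's actual implementation:

```python
from itertools import cycle


class ProxyRotator:
    """Round-robin proxy rotation: each request gets the next proxy in the pool."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("proxy pool must not be empty")
        self._pool = cycle(proxies)

    def next_proxy(self):
        return next(self._pool)


rotator = ProxyRotator(["http://proxy1.example:8080", "http://proxy2.example:8080"])
print(rotator.next_proxy())  # first proxy in the pool
print(rotator.next_proxy())  # second proxy
print(rotator.next_proxy())  # wraps back to the first
```

Real-world rotation usually also drops proxies that repeatedly fail, which pairs naturally with the retry mechanism mentioned above.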

## Demo

Infinite Scrapper Demo

*Screenshot demonstrating Infinite Scrapper in action.*

## Installation

### Prerequisites

Before you begin, ensure you have met the following requirements:

- **Operating System:** Windows, macOS, or Linux
- **Python:** Version 3.7 or higher
- **Git:** Installed on your system
- **pip:** Python package installer

### Setup Steps

1. **Clone the repository**

   ```bash
   git clone https://github.com/kobasi896/Infinite-Scrapper.git
   ```

2. **Navigate to the project directory**

   ```bash
   cd Infinite-Scrapper
   ```

3. **Create a virtual environment (optional but recommended)**

   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

4. **Install dependencies**

   ```bash
   pip install -r requirements.txt
   ```

5. **Configure environment variables**

   ```bash
   # .env file example
   USER_AGENT="Your User Agent String"
   PROXY_LIST="path/to/proxy_list.txt"
   DATABASE_URL="your_database_connection_string"
   ```

6. **Run migrations (if applicable)**

   ```bash
   python manage.py migrate
   ```
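Once the `.env` values are loaded into the process environment (for example with a tool like `python-dotenv`), they can be read through the standard library. A minimal sketch, assuming the variables are already exported; the fallback default is illustrative, not the project's actual behavior:

```python
import os

# Fall back to a placeholder when a variable is not set (default is illustrative).
user_agent = os.environ.get("USER_AGENT", "InfiniteScrapper/1.0")
proxy_list_path = os.environ.get("PROXY_LIST")   # None if unset
database_url = os.environ.get("DATABASE_URL")    # None if unset

print(f"User-Agent: {user_agent}")
```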

## Usage

### Basic Usage

To start scraping, run the main script:

```bash
python main.py --config config.yaml
```

### Parameters

- `--config`: Path to the configuration file.
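A `--config` flag like this is typically wired up with `argparse`. The following is a hypothetical sketch of how an entry point might parse it; the project's actual `main.py` may differ:

```python
import argparse


def parse_args(argv=None):
    """Parse command-line arguments (sketch of a possible entry point)."""
    parser = argparse.ArgumentParser(description="Infinite Scrapper (sketch)")
    parser.add_argument(
        "--config",
        required=True,
        help="Path to the YAML configuration file",
    )
    return parser.parse_args(argv)


# Simulate `python main.py --config config.yaml`
args = parse_args(["--config", "config.yaml"])
print(args.config)  # config.yaml
```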

### Advanced Configuration

Infinite Scrapper uses a YAML configuration file to define scraping tasks. Below is an example of a config.yaml:

```yaml
settings:
  user_agent: "Your User Agent String"
  proxies:
    - "http://proxy1.com:port"
    - "http://proxy2.com:port"
  delay: 2  # Delay between requests in seconds

tasks:
  - name: "Example Task"
    start_url: "https://example.com"
    max_pages: 100
    selectors:
      title: "h1.title::text"
      price: "span.price::text"
      image: "img::attr(src)"
    export:
      format: "csv"
      path: "data/example.csv"
```
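The `export` block maps naturally onto the standard library's `csv` module. Below is a rough, stdlib-only sketch of writing scraped records in CSV form; the helper name and wiring are invented for illustration and are not the project's actual export code:

```python
import csv
import io


def export_csv(records, dest):
    """Write a list of dicts as CSV, using the first record's keys as the header."""
    if not records:
        return
    writer = csv.DictWriter(dest, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)


# Demonstrate with an in-memory buffer instead of "data/example.csv".
buf = io.StringIO()
export_csv(
    [{"title": "Example", "price": "9.99", "image": "/img/1.png"}],
    buf,
)
print(buf.getvalue())
```

In practice `dest` would be a file opened with `newline=""`, and a JSON exporter would follow the same shape with `json.dump`.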

Running with a custom configuration:

```bash
python main.py --config path/to/your_config.yaml
```

## Contributing

Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

### How to Contribute

1. **Fork the repository.** Click the "Fork" button at the top right of this page.

2. **Clone your fork**

   ```bash
   git clone https://github.com/your-username/Infinite-Scrapper.git
   ```

3. **Create a branch**

   ```bash
   git checkout -b feature/YourFeature
   ```

4. **Make your changes** and commit them with clear, descriptive messages.

   ```bash
   git commit -m "Add feature X"
   ```

5. **Push to your fork**

   ```bash
   git push origin feature/YourFeature
   ```

6. **Open a pull request.** Navigate to the original repository and click the "Compare & pull request" button.

## Code of Conduct

Please read our Code of Conduct before contributing.

## License

Distributed under the MIT License. See LICENSE for more information.

MIT License

Copyright (c) 2024

Permission is hereby granted, free of charge, to any person obtaining a copy