# Infinite Scrapper


Infinite Scrapper is a versatile and efficient web scraping tool designed to handle large-scale data extraction tasks with ease. Whether you're gathering data for research, analysis, or automation, Infinite Scrapper provides the tools you need to extract, process, and manage web data seamlessly.

## Table of Contents

- [Features](#features)
- [Demo](#demo)
- [Installation](#installation)
- [Usage](#usage)
- [Contributing](#contributing)
- [License](#license)

## Features

- **Scalable Scraping:** Handle thousands of pages with ease.
- **Customizable Parsers:** Easily define how to extract data from different websites.
- **Data Export:** Export scraped data in various formats like CSV, JSON, or directly to databases.
- **Proxy Support:** Rotate proxies to avoid IP bans and enhance scraping efficiency.
- **Error Handling:** Robust mechanisms to handle and retry failed requests.
- **Scheduling:** Automate scraping tasks at specified intervals.
- **Extensible Architecture:** Plug-and-play modules to extend functionality.
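As a rough illustration of the proxy-rotation idea, a round-robin rotator can be sketched in a few lines. This is only a sketch; the class and method names are invented here and are not Infinite Scrapper's actual implementation:

```python
from itertools import cycle


class ProxyRotator:
    """Round-robin proxy rotation: each request gets the next proxy in the pool."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("proxy pool must not be empty")
        self._pool = cycle(proxies)

    def next_proxy(self):
        return next(self._pool)


rotator = ProxyRotator(["http://proxy1.example:8080", "http://proxy2.example:8080"])
print(rotator.next_proxy())  # first proxy in the pool
print(rotator.next_proxy())  # second proxy
print(rotator.next_proxy())  # wraps back to the first
```

Real-world rotation usually also drops proxies that repeatedly fail, which pairs naturally with the retry mechanism mentioned above.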

## Demo

Infinite Scrapper Demo

*Screenshot demonstrating Infinite Scrapper in action.*

## Installation

### Prerequisites

Before you begin, ensure you have met the following requirements:

- **Operating System:** Windows, macOS, or Linux
- **Python:** Version 3.7 or higher
- **Git:** Installed on your system
- **pip:** Python package installer

### Setup Steps

1. **Clone the repository**

   ```bash
   git clone https://github.com/kobasi896/Infinite-Scrapper.git
   ```

2. **Navigate to the project directory**

   ```bash
   cd Infinite-Scrapper
   ```

3. **Create a virtual environment (optional but recommended)**

   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

4. **Install dependencies**

   ```bash
   pip install -r requirements.txt
   ```

5. **Configure environment variables**

   ```bash
   # .env file example
   USER_AGENT="Your User Agent String"
   PROXY_LIST="path/to/proxy_list.txt"
   DATABASE_URL="your_database_connection_string"
   ```

6. **Run migrations (if applicable)**

   ```bash
   python manage.py migrate
   ```
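Once the `.env` values are loaded into the process environment (for example with a tool like `python-dotenv`), they can be read through the standard library. A minimal sketch, assuming the variables are already exported; the fallback default is illustrative, not the project's actual behavior:

```python
import os

# Fall back to a placeholder when a variable is not set (default is illustrative).
user_agent = os.environ.get("USER_AGENT", "InfiniteScrapper/1.0")
proxy_list_path = os.environ.get("PROXY_LIST")   # None if unset
database_url = os.environ.get("DATABASE_URL")    # None if unset

print(f"User-Agent: {user_agent}")
```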

## Usage

### Basic Usage

To start scraping, run the main script:

```bash
python main.py --config config.yaml
```

### Parameters

- `--config`: Path to the configuration file.
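A `--config` flag like this is typically wired up with `argparse`. The following is a hypothetical sketch of how an entry point might parse it; the project's actual `main.py` may differ:

```python
import argparse


def parse_args(argv=None):
    """Parse command-line arguments (sketch of a possible entry point)."""
    parser = argparse.ArgumentParser(description="Infinite Scrapper (sketch)")
    parser.add_argument(
        "--config",
        required=True,
        help="Path to the YAML configuration file",
    )
    return parser.parse_args(argv)


# Simulate `python main.py --config config.yaml`
args = parse_args(["--config", "config.yaml"])
print(args.config)  # config.yaml
```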

### Advanced Configuration

Infinite Scrapper uses a YAML configuration file to define scraping tasks. Below is an example of a config.yaml:

```yaml
settings:
  user_agent: "Your User Agent String"
  proxies:
    - "http://proxy1.com:port"
    - "http://proxy2.com:port"
  delay: 2  # Delay between requests in seconds

tasks:
  - name: "Example Task"
    start_url: "https://example.com"
    max_pages: 100
    selectors:
      title: "h1.title::text"
      price: "span.price::text"
      image: "img::attr(src)"
    export:
      format: "csv"
      path: "data/example.csv"
```
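The `export` block maps naturally onto the standard library's `csv` module. Below is a rough, stdlib-only sketch of writing scraped records in CSV form; the helper name and wiring are invented for illustration and are not the project's actual export code:

```python
import csv
import io


def export_csv(records, dest):
    """Write a list of dicts as CSV, using the first record's keys as the header."""
    if not records:
        return
    writer = csv.DictWriter(dest, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)


# Demonstrate with an in-memory buffer instead of "data/example.csv".
buf = io.StringIO()
export_csv(
    [{"title": "Example", "price": "9.99", "image": "/img/1.png"}],
    buf,
)
print(buf.getvalue())
```

In practice `dest` would be a file opened with `newline=""`, and a JSON exporter would follow the same shape with `json.dump`.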

Running with a custom configuration:

```bash
python main.py --config path/to/your_config.yaml
```

## Contributing

Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

### How to Contribute

1. **Fork the repository.** Click the "Fork" button at the top right of this page.

2. **Clone your fork**

   ```bash
   git clone https://github.com/your-username/Infinite-Scrapper.git
   ```

3. **Create a branch**

   ```bash
   git checkout -b feature/YourFeature
   ```

4. **Make your changes** and commit them with clear, descriptive messages.

   ```bash
   git commit -m "Add feature X"
   ```

5. **Push to your fork**

   ```bash
   git push origin feature/YourFeature
   ```

6. **Open a pull request.** Navigate to the original repository and click the "Compare & pull request" button.

## Code of Conduct

Please read our Code of Conduct before contributing.

## License

Distributed under the MIT License. See LICENSE for more information.

MIT License

Copyright (c) 2024

Permission is hereby granted, free of charge, to any person obtaining a copy