
PetroczyP/web-scraping-sandbox


Web Scraping Sandbox

IMPORTANT NOTE: This is a test project that I created using Copilot Workspace. I use it to test other AI-collaboration platforms, to index it for RAG and feed it to AI agents, and simply to have fun with it.

Project Description and Purpose

The Web Scraping Sandbox is a modular and extensible web scraping framework designed to simplify the process of extracting data from websites. It provides a set of tools and utilities to facilitate web scraping tasks, including making HTTP requests, parsing HTML content, and handling various web scraping scenarios.
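The basic fetch-then-parse workflow described above can be illustrated with nothing but the Python standard library. This is a minimal sketch, not the project's `Scraper` implementation: `fetch`, `parse_html`, `LinkExtractor`, and the User-Agent string are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag and the text inside <title>."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; keep non-empty hrefs only
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate it
        if self._in_title:
            self.title += data


def parse_html(html):
    """Parse an HTML string and return its title and link targets."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "links": parser.links}


def fetch(url, timeout=10.0):
    """Make an HTTP GET request and decode the body with the declared charset."""
    # Hypothetical User-Agent string; identify your scraper honestly
    request = Request(url, headers={"User-Agent": "web-scraping-sandbox-example/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

For example, `parse_html(fetch("https://example.com"))` returns a dict holding the page title and the list of link targets. Keeping the parsing step separate from the network call makes it easy to test against saved HTML.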

Setup Instructions

Prerequisites

  • Python 3.11 or higher
  • Docker (optional, for containerized setup)

Installation

  1. Clone the repository:

    git clone https://github.com/githubnext/web-scraping-sandbox.git
    cd web-scraping-sandbox
  2. Create a virtual environment and activate it:

    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
  3. Install the dependencies:

    pip install -r requirements.txt
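After step 3, you can confirm that the activated environment satisfies the Python 3.11 prerequisite. The script below is a hypothetical helper (for example saved as `check_env.py`), not a file shipped with the repository. It compares `sys.version_info` against 3.11 and detects an active virtual environment by comparing `sys.prefix` with `sys.base_prefix`.

```python
import sys

# Minimum version taken from the Prerequisites section of this README
MINIMUM_PYTHON = (3, 11)


def check_python_version(version_info=sys.version_info):
    """Return True if the given (major, minor, ...) version meets the minimum."""
    return tuple(version_info[:2]) >= MINIMUM_PYTHON


def in_virtualenv():
    """Return True inside a venv, where sys.prefix differs from sys.base_prefix."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    version = sys.version.split()[0]
    status = "OK" if check_python_version() else "too old (3.11+ required)"
    print(f"Python {version}: {status}")
    print("Virtual environment active:", in_virtualenv())
```

Run it with `python check_env.py` after activating the virtual environment.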

Docker Setup

  1. Build the Docker image:

    docker build -t web-scraping-sandbox .
  2. Run the Docker container:

    docker run -it --rm web-scraping-sandbox

Setting Up and Running the React UI

  1. Navigate to the ui directory of the repository (if you are already in the repository root from the installation steps, run `cd ui` instead):

    cd web-scraping-sandbox/ui
  2. Install the dependencies:

    npm install
  3. Start the React development server:

    npm start
  4. Open your browser and navigate to http://localhost:3000 to access the UI.

Running the Docker Container with the UI

  1. Build the Docker image:

    docker build -t web-scraping-sandbox .
  2. Run the Docker container:

    docker run -it --rm -p 3000:3000 -p 5000:5000 web-scraping-sandbox
  3. Open your browser and navigate to http://localhost:3000 to access the UI.
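Once the container is running, a TCP connect check is a quick way to confirm that the published ports accept connections. This is a hedged sketch: `port_is_open` and `check_ports` are hypothetical helpers, and this README does not say which service listens on port 5000, so the label used for it is only a placeholder.

```python
import socket

# Ports published by `docker run -p 3000:3000 -p 5000:5000` in the step above.
# 3000 serves the React UI; the role of port 5000 is not documented here.
PORTS = {"ui (3000)": 3000, "port 5000": 5000}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts
        return False


def check_ports(host="localhost", ports=PORTS):
    """Map each port label to whether it currently accepts connections."""
    return {name: port_is_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, is_open in check_ports().items():
        print(f"{name}: {'reachable' if is_open else 'not reachable'}")
```

A reachable port only shows that something is listening; opening http://localhost:3000 in the browser is still the real check for the UI.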

Running Tests

To run the unit tests using pytest, execute the following command:

    pytest
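Tests for scraping code are more reliable when they do not depend on live websites. One common pattern, sketched here under the assumption that you write your own test module (for example `tests/test_fixture_server.py`, a hypothetical path), is to serve a fixed HTML page from a local `http.server` instance and point the code under test at it. The fetch below uses `urllib` only as a stand-in; in the project's own tests you would call the code you want to exercise instead.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Fixed HTML returned by the fake site; a real test might load a saved page
FIXTURE_HTML = b"<html><head><title>Fixture page</title></head><body><p>Hello</p></body></html>"


class FixtureHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(FIXTURE_HTML)))
        self.end_headers()
        self.wfile.write(FIXTURE_HTML)

    def log_message(self, format, *args):
        pass  # silence per-request logging to keep test output clean


def start_fixture_server():
    """Start an HTTP server on a free localhost port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), FixtureHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def test_fetches_fixture_page():
    server = start_fixture_server()
    try:
        url = f"http://127.0.0.1:{server.server_port}/"
        # An empty ProxyHandler ignores proxy environment variables,
        # so the request goes straight to the local server
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url, timeout=5) as response:
            assert response.status == 200
            body = response.read().decode("utf-8")
        assert "<title>Fixture page</title>" in body
    finally:
        server.shutdown()
        server.server_close()
```

By default pytest collects files named `test_*.py` and functions prefixed with `test_`, so a module like this would run as part of the `pytest` command.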

Example Usage

Here's an example of how to use the web scraper:

  1. Create a Python script (e.g., example.py) with the following content:

    from SRC.scraper import Scraper
    
    # Page to scrape
    url = "https://example.com"
    scraper = Scraper(url)
    # Fetch and parse the page; the shape of `data` is determined by Scraper.scrape()
    data = scraper.scrape()
    print(data)
  2. Run the script:

    python example.py
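Before pointing the example at a site you do not control, you may want to check that site's robots.txt. The following is a minimal sketch using the standard library's `urllib.robotparser`; `is_allowed`, `robots_url`, and the user-agent name are hypothetical and not part of the project. Passing `robots_lines` lets you evaluate rules offline without downloading anything.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

USER_AGENT = "web-scraping-sandbox-example"  # hypothetical user-agent name


def robots_url(url):
    """Build the robots.txt URL for the site that hosts `url`."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def is_allowed(url, robots_lines=None, user_agent=USER_AGENT):
    """Return True if robots.txt permits `user_agent` to fetch `url`.

    If `robots_lines` is given, those lines are parsed instead of
    downloading robots.txt from the target site.
    """
    parser = RobotFileParser(robots_url(url))
    if robots_lines is None:
        parser.read()  # downloads and parses the site's robots.txt
    else:
        parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)
```

For example, wrapping the call as `if is_allowed(url): data = Scraper(url).scrape()` would skip pages that robots.txt disallows for that user agent.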

About

A repository for the web scraping sandbox project.
