IMPORTANT NOTE: This is a test project I created with Copilot Workspace. I use it for experimenting with other AI-collaboration platforms, for RAG/indexing it and feeding it to AI agents, and simply for having fun with it.
The Web Scraping Sandbox is a modular and extensible web scraping framework designed to simplify the process of extracting data from websites. It provides a set of tools and utilities to facilitate web scraping tasks, including making HTTP requests, parsing HTML content, and handling various web scraping scenarios.
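As a taste of the kind of task the framework handles, here is a minimal, framework-independent sketch of the HTML-parsing step using only the Python standard library. The `LinkExtractor` and `extract_links` names below are illustrative, not part of this project's API:

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag encountered in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str) -> list[str]:
    """Parse an HTML string and return all hyperlink targets."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


html = '<html><body><a href="/docs">Docs</a><a href="/about">About</a></body></html>'
print(extract_links(html))  # → ['/docs', '/about']
```

A real scraper layers HTTP fetching, error handling, and richer selectors on top of this parsing core.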
- Python 3.11 or higher
- Docker (optional, for containerized setup)
1. Clone the repository:

   ```bash
   git clone https://github.com/githubnext/web-scraping-sandbox.git
   cd web-scraping-sandbox
   ```

2. Create a virtual environment and activate it:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
   ```

3. Install the dependencies:

   ```bash
   pip install -r requirements.txt
   ```
1. Build the Docker image:

   ```bash
   docker build -t web-scraping-sandbox .
   ```

2. Run the Docker container:

   ```bash
   docker run -it --rm web-scraping-sandbox
   ```
1. Navigate to the `web-scraping-sandbox/ui` directory:

   ```bash
   cd web-scraping-sandbox/ui
   ```

2. Install the dependencies:

   ```bash
   npm install
   ```

3. Start the React development server:

   ```bash
   npm start
   ```

4. Open your browser and navigate to `http://localhost:3000` to access the UI.
1. Build the Docker image:

   ```bash
   docker build -t web-scraping-sandbox .
   ```

2. Run the Docker container, publishing both the UI and API ports:

   ```bash
   docker run -it --rm -p 3000:3000 -p 5000:5000 web-scraping-sandbox
   ```

3. Open your browser and navigate to `http://localhost:3000` to access the UI.
To run the unit tests using pytest, execute the following command:

```bash
pytest
```
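Tests follow the standard pytest convention: files named `test_*.py` containing plain `test_*` functions with bare `assert` statements. As a self-contained illustration of that style (the `normalize_title` helper below is hypothetical, standing in for whatever parsing utilities your own tests exercise):

```python
# test_parsing.py -- illustrative pytest-style tests.
# `normalize_title` is a hypothetical helper, not part of this repository.


def normalize_title(raw: str) -> str:
    """Collapse runs of whitespace in a scraped page title and strip the ends."""
    return " ".join(raw.split())


def test_normalize_title_strips_whitespace():
    assert normalize_title("  Example \n Domain  ") == "Example Domain"


def test_normalize_title_empty():
    assert normalize_title("") == ""
```

Running `pytest` from the repository root discovers and executes any such test files automatically.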
Here's an example of how to use the web scraper:

1. Create a Python script (e.g., `example.py`) with the following content:

   ```python
   from src.scraper import Scraper

   url = "https://example.com"
   scraper = Scraper(url)
   data = scraper.scrape()
   print(data)
   ```

2. Run the script:

   ```bash
   python example.py
   ```