Welcome to the Data Scraping and Storage GitHub repository! This project uses the Scrapy web scraping framework to extract data on Travel and Mystery books from https://books.toscrape.com/index.html and store it in a MongoDB Atlas database using PyMongo.
This repository contains a Scrapy project that demonstrates how to scrape book data from the site above. The scraped data is then processed and stored in a MongoDB Atlas database for further analysis and use, providing a flexible and efficient way to gather book-related information from online sources.
To run the script and replicate the project, you'll need the following:
- Python 3.x
- Scrapy
- PyMongo
- A MongoDB Atlas account
Make sure to install the necessary dependencies using pip or any other package manager.
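For example, with pip:

```bash
pip install scrapy pymongo
```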
- Clone this repository to your local machine: `git clone https://github.com/your-username/book-data-scraping.git`
- Navigate to the project directory: `cd book-data-scraping`
- Configure MongoDB Atlas:
  - Create a MongoDB Atlas account and set up a new cluster.
  - Obtain the connection string for your MongoDB Atlas cluster.
  - Update the `MONGO_URI` variable in the script with your connection string (a storage sketch follows these steps).
- Customize the scraping process:
  - Open the `book_scraper/spiders/books_spider.py` file.
  - Modify the spider code to specify the pages to scrape, the data fields to extract, and the desired scraping logic (an example spider follows these steps).
  - Feel free to add more spiders or customize existing ones based on your requirements.
- Run the spider: `scrapy crawl books`
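As a starting point, here is a minimal spider sketch targeting the Travel and Mystery categories. The category URLs and CSS selectors reflect the markup of books.toscrape.com at the time of writing, and the field names are illustrative:

```python
import scrapy


class BooksSpider(scrapy.Spider):
    """Scrape Travel and Mystery book listings from books.toscrape.com."""

    name = "books"
    start_urls = [
        "https://books.toscrape.com/catalogue/category/books/travel_2/index.html",
        "https://books.toscrape.com/catalogue/category/books/mystery_3/index.html",
    ]

    def parse(self, response):
        # Each listed book sits in an <article class="product_pod"> element.
        for book in response.css("article.product_pod"):
            yield {
                "title": book.css("h3 a::attr(title)").get(),
                "price": book.css("p.price_color::text").get(),
                "availability": "".join(
                    book.css("p.instock.availability::text").getall()
                ).strip(),
                # The category name appears in the page header of each category page.
                "category": response.css("div.page-header h1::text").get(),
            }

        # Follow the "next" link, if any, to paginate through the category.
        next_page = response.css("li.next a::attr(href)").get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)
```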
The spider will start scraping the specified pages and store the scraped book data in your MongoDB Atlas database.
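Storage is handled by a Scrapy item pipeline that writes each item to MongoDB via PyMongo. Below is a minimal sketch following the item-pipeline pattern from the Scrapy documentation; the `MONGO_DATABASE` setting and the `books` collection name are assumptions, so adjust them to your project:

```python
import pymongo
from itemadapter import ItemAdapter


class MongoPipeline:
    """Store each scraped item in a MongoDB Atlas collection."""

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        # Read the connection details from settings.py.
        return cls(
            mongo_uri=crawler.settings.get("MONGO_URI"),
            mongo_db=crawler.settings.get("MONGO_DATABASE", "books"),
        )

    def open_spider(self, spider):
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        # Insert each scraped item as a document in the "books" collection.
        self.db["books"].insert_one(ItemAdapter(item).asdict())
        return item
```

To activate the pipeline, register it in the project settings alongside your `MONGO_URI`, e.g. `ITEM_PIPELINES = {"book_scraper.pipelines.MongoPipeline": 300}` (the module path here assumes the pipeline lives in `book_scraper/pipelines.py`).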
Contributions to this project are welcome! If you have any ideas for improvements or new features, feel free to open an issue or submit a pull request. Let's make this project even better together.

This project is licensed under the MIT License. You are free to use, modify, and distribute the code under the terms and conditions of the license.

Special thanks to the developers of Scrapy and PyMongo for providing powerful tools that make web scraping and database integration seamless. Their contributions are invaluable to this project.

If you have any questions or feedback, please don't hesitate to reach out. Happy scraping and data storage!