Web scraping project for market prices

Web scraping project for collecting market price data for inflation rate analysis.

The project is built with Scrapy, a Python web scraping library, and its integration with the browser automation library Playwright. scrapy-playwright is used to execute JavaScript and retrieve dynamically loaded content.
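For reference, scrapy-playwright is usually switched on in the Scrapy settings module. The fragment below is a minimal sketch based on the scrapy-playwright documentation, not a copy of this project's settings.py; the browser type and launch options are assumptions.

```python
# settings.py (sketch; the values below are assumptions, not this project's actual settings)

# Route HTTP and HTTPS requests through the Playwright download handler.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright requires the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Assumed browser choice; "firefox" and "webkit" are also accepted.
PLAYWRIGHT_BROWSER_TYPE = "chromium"
PLAYWRIGHT_LAUNCH_OPTIONS = {"headless": True}
```

With these settings in place, only requests whose meta contains "playwright": True are rendered in a browser, for example scrapy.Request(url, meta={"playwright": True}); all other requests go through the regular Scrapy downloader.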

Spiders & Websites

This project contains a separate spider for each market website.

The spider names, with the corresponding market in parentheses, are:

sokmarket (şok market)
carrefour (carrefoursa)
mopas (mopaş)
marketpaketi (marketpaketi)
migros (migros)

The scraped data is stored in .csv files that are named after the date on which they were scraped.
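A date-based file name can be built as in the sketch below. The function name output_path, the per-spider subfolder and the ISO date format are assumptions made for illustration; the project may lay out its files differently.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def output_path(data_dir: str, spider_name: str, day: Optional[date] = None) -> Path:
    """Return <data_dir>/<spider_name>/<YYYY-MM-DD>.csv (hypothetical layout)."""
    # Fall back to today's date when no explicit day is given.
    day = day or date.today()
    return Path(data_dir) / spider_name / f"{day.isoformat()}.csv"
```

Calling output_path("market_scraper/data", "migros") would then give a path ending in the current date, so each day's crawl lands in its own file.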

Data Format

The data is stored in a uniform format so that it can be easily analyzed, accessed and scaled. Each row has the following columns:

main_category: main category of the product
sub_category: sub category of the product
lowest_category: lowest sub category of the product
name: name of the product
price: current price of the product
high_price: high price of the product if it is discounted
in_stock: stock availability of the product
product_link: URL of the product
page_link: URL of the category page that the product is on
date: date that the product was scraped
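The columns above can be written with Python's csv module as in the following sketch. The column names come from this README; the helper rows_to_csv and the sample row are hypothetical, and the placeholder values are invented purely to show the shape of a record.

```python
import csv
import io
from typing import Dict, Iterable

# Column order as documented in this README.
FIELDNAMES = [
    "main_category", "sub_category", "lowest_category", "name", "price",
    "high_price", "in_stock", "product_link", "page_link", "date",
]


def rows_to_csv(rows: Iterable[Dict[str, object]]) -> str:
    """Serialize product rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder record; every value here is invented for illustration.
SAMPLE_ROW = {
    "main_category": "example main category",
    "sub_category": "example sub category",
    "lowest_category": "example lowest category",
    "name": "example product",
    "price": 10.0,
    "high_price": "",  # assumed to be empty when the product is not discounted
    "in_stock": True,
    "product_link": "https://example.com/product",
    "page_link": "https://example.com/category",
    "date": "2024-01-15",
}
```

Keeping the column list in one constant means every spider can share it, which is what makes the per-market files straightforward to merge later.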

Setting Up The Environment

Clone the repository to your local machine

$ git clone https://github.com/erayalp808/scraping-market-data.git

Go to the project directory

$ cd scraping-market-data

Create and activate a virtual environment

A virtual environment keeps the packages this project needs from conflicting with the Python packages already installed on your system.

$ virtualenv venv
$ source venv/bin/activate

Install the needed Python packages

(venv) $ pip install -r requirements.txt

Install the browsers that Playwright requires

$ playwright install

If system dependencies are missing, install them with "install-deps"

$ playwright install-deps

Run the spiders

$ scrapy crawl <spider name>
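To run every spider in turn, the command above can be wrapped in a short loop. This is only a sketch, not the project's run_spiders.py: the spider names are taken from the list in this README, while the helpers crawl_command and run_all are hypothetical.

```python
import subprocess
from typing import List, Sequence

# Spider names as listed in this README.
SPIDER_NAMES = ["sokmarket", "carrefour", "mopas", "marketpaketi", "migros"]


def crawl_command(spider_name: str) -> List[str]:
    """Build the argument list for `scrapy crawl <spider name>`."""
    return ["scrapy", "crawl", spider_name]


def run_all(spider_names: Sequence[str] = SPIDER_NAMES) -> None:
    """Run each spider sequentially from the project directory."""
    for name in spider_names:
        # check=True raises CalledProcessError if a crawl exits with a non-zero status.
        subprocess.run(crawl_command(name), check=True)
```

Calling run_all() from the project directory, with the virtual environment active, crawls the markets one after another and stops at the first spider that fails.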

Run the spiders, merge the data and store it in one folder with a single script

This script scrapes the data into the "market_scraper/data" directory, merges it and stores the result in one file under "market_scraper/data/merged_data".

$ python run_spiders.py
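The merge step can be pictured with the sketch below, which concatenates CSV files that share the same header. It is an illustration using only the standard library and is not the project's merge_data.py; the function name merge_csv_files is hypothetical.

```python
import csv
from pathlib import Path
from typing import Iterable


def merge_csv_files(sources: Iterable[Path], target: Path) -> int:
    """Concatenate CSV files that share a header into target; return the row count."""
    # Materialize the inputs first and skip the target itself, in case it
    # already exists from an earlier run and matches the same search pattern.
    source_list = sorted(p for p in sources if p.resolve() != target.resolve())
    target.parent.mkdir(parents=True, exist_ok=True)

    rows_written = 0
    writer = None
    with target.open("w", newline="", encoding="utf-8") as out_file:
        for source in source_list:
            with source.open(newline="", encoding="utf-8") as in_file:
                reader = csv.DictReader(in_file)
                if reader.fieldnames is None:
                    continue  # skip empty files
                if writer is None:
                    # The first non-empty file defines the header of the merged file.
                    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames)
                    writer.writeheader()
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

The header is written once, taken from the first non-empty input, so the merged file keeps the column layout described under Data Format.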
