A web scraping project that collects market data for inflation rate analysis.
The project is built with the Python web scraping library Scrapy and its integration with the browser automation library Playwright. scrapy-playwright is used to load dynamically rendered content and execute JavaScript.
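As a rough illustration of how scrapy-playwright is usually wired into a Scrapy project, the settings below follow the scrapy-playwright documentation. This is a sketch, not a copy of this repository's settings file, and `PLAYWRIGHT_META` is a hypothetical helper name used only for the example.

```python
# settings.py (sketch): route HTTP and HTTPS downloads through scrapy-playwright.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright needs the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Hypothetical helper: a request opts in to browser rendering through its
# meta dict, e.g. scrapy.Request(url, meta=PLAYWRIGHT_META) inside a spider.
PLAYWRIGHT_META = {"playwright": True}
```

Requests without the `playwright` meta key are still handled by Scrapy's regular downloader, so only pages that need JavaScript have to pay the cost of a browser.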
The project contains a separate spider for each market website.
The available spider names are:
- sokmarket (şok market)
- carrefour (carrefoursa)
- mopas (mopaş)
- marketpaketi (marketpaketi)
- migros (migros)
The scraped data is stored in .csv files named after the current date.
Each record follows a fixed column layout so that the data can be easily analyzed, accessed and scaled:
main_category | sub_category | lowest_category | name | price | high_price | in_stock | product_link | page_link | date |
---|---|---|---|---|---|---|---|---|---|
main category of the product | sub category of the product | lowest-level sub category of the product | name of the product | current price of the product | higher, pre-discount price of the product if it is discounted | stock availability of the product | URL of the product | URL of the category page that the product is on | date on which the product was scraped |
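The column layout above can be consumed with the standard library alone. The following is a minimal sketch, assuming that prices are stored as plain decimal strings and that `high_price` is empty when there is no discount; both are assumptions about the files, not confirmed behavior. The function names and the sample row are made up for illustration.

```python
import csv
import io

COLUMNS = [
    "main_category", "sub_category", "lowest_category", "name", "price",
    "high_price", "in_stock", "product_link", "page_link", "date",
]


def read_products(csv_file):
    """Yield one dict per product row, keyed by column name."""
    reader = csv.DictReader(csv_file)
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    yield from reader


def discount_percent(row):
    """Return the discount in percent, or None if no usable high_price exists.

    Assumes price and high_price are plain decimal strings.
    """
    if not row["high_price"]:
        return None
    price, high = float(row["price"]), float(row["high_price"])
    if high <= 0:
        return None
    return round((high - price) / high * 100, 2)


# Made-up sample row, only to show the expected shape of a file.
sample = io.StringIO(
    ",".join(COLUMNS) + "\n"
    "food,dairy,milk,example milk 1L,30.0,40.0,True,"
    "https://example.com/p/1,https://example.com/c/milk,2024-01-01\n"
)
rows = list(read_products(sample))
print(discount_percent(rows[0]))  # 25.0
```

Checking the header up front makes a file with a different layout fail loudly instead of producing silently misaligned rows.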
$ git clone https://github.com/erayalp808/scraping-market-data.git
$ cd scraping-market-data
To keep the required Python packages from conflicting with the packages already installed on your system, create and activate a virtual environment:
$ virtualenv venv
$ source venv/bin/activate
(venv) $ pip install -r requirements.txt
$ playwright install
If system dependencies are missing, use "install-deps":
$ playwright install-deps
$ scrapy crawl <spider name>
Use this script to scrape the data into the "market_scraper/data" directory, merge the results and store them in a single file under "market_scraper/data/merged_data":
$ python run_spiders.py
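The merge step can be pictured roughly as follows. This is a standalone sketch, not the code in run_spiders.py: it assumes that the .csv files sit directly under the data directory and share the same header, and `merge_csv_files` and the output file name are hypothetical.

```python
import csv
from pathlib import Path


def merge_csv_files(data_dir, output_path):
    """Concatenate every .csv directly under data_dir into output_path.

    Returns the number of data rows written. Assumes all files share
    the header of the first non-empty file.
    """
    data_dir, output_path = Path(data_dir), Path(output_path)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    # Collect the sources first so the output file is never merged into itself.
    sources = sorted(
        p for p in data_dir.glob("*.csv")
        if p.resolve() != output_path.resolve()
    )
    writer = None
    total = 0
    with output_path.open("w", newline="", encoding="utf-8") as out:
        for source in sources:
            with source.open(newline="", encoding="utf-8") as src:
                reader = csv.DictReader(src)
                if reader.fieldnames is None:
                    continue  # skip empty files
                if writer is None:
                    # Write the header once, taken from the first file.
                    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
                    writer.writeheader()
                for row in reader:
                    writer.writerow(row)
                    total += 1
    return total
```

A call such as `merge_csv_files("market_scraper/data", "market_scraper/data/merged_data/merged.csv")` would then produce one combined file; the name `merged.csv` is only a placeholder.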