
A collection of crawlers built with Selenium, the Scrapy framework and BS4 (BeautifulSoup4)


tanjimanasreen/gsmarena-crawler


Gsmarena-crawler

This project consists of crawlers for two websites, gsmarena and its Bangladeshi variant gsmarena-bd, built with different Python web-scraping libraries (beautifulsoup4, Scrapy and Selenium). They extract phone specifications and store them in a MongoDB database (the beautifulsoup4 parser writes a JSON file instead).

Website        Crawler
gsmarena       gsmarena-selenium
gsmarena-bd    gsmarenabd-beautifulsoup4, gsmarenabd-scrapy

Prerequisites

Python and a MongoDB database

Software    Version
Python      3.6.8 (64-bit)
MongoDB     4.4.8

Download

  • Download the source code, or
  • Clone the repository:
    git clone https://github.com/tanjimanasreen/gsmarena-crawler.git
    

Gsmarenabd-crawler


Scrapy:

This crawler comes with an end-to-end pipeline that scrapes the specifications of every phone listed on gsmarena.com.bd and stores them in a MongoDB database.
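To illustrate the storage step, here is a minimal sketch of a Scrapy item pipeline that writes each scraped item to MongoDB. It is not the project's own pipeline: the class name `MongoPipeline`, the setting names `MONGO_URI` and `MONGO_DATABASE`, their defaults and the `phones` collection are hypothetical. It relies only on Scrapy's standard pipeline hooks (`from_crawler`, `open_spider`, `process_item`, `close_spider`) and pymongo's `MongoClient`, which is imported inside `open_spider` so the class loads even where pymongo is absent.

```python
class MongoPipeline:
    """Hypothetical Scrapy item pipeline that stores every item in MongoDB."""

    def __init__(self, mongo_uri, mongo_db, collection_name="phones"):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db
        self.collection_name = collection_name  # hypothetical collection name
        self.client = None
        self.db = None

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy builds the pipeline through this hook; the connection details
        # come from settings.py (setting names and defaults are assumptions).
        return cls(
            mongo_uri=crawler.settings.get("MONGO_URI", "mongodb://localhost:27017"),
            mongo_db=crawler.settings.get("MONGO_DATABASE", "gsmarena_bd"),
        )

    def open_spider(self, spider):
        # Connect once per crawl; imported here so pymongo is only needed at run time.
        import pymongo
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        # dict(item) works for plain dicts and scrapy.Item objects, and gives
        # insert_one a copy, so the "_id" it adds does not leak into the item.
        self.db[self.collection_name].insert_one(dict(item))
        return item
```

Opening the client once in `open_spider` rather than per item avoids reconnecting for every phone page.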

Set the database configuration variables in the Scrapy project's settings.py file, then open the Scrapy-project folder and run the crawler with the scrapy crawl command.
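A settings.py fragment along these lines would supply the database configuration and enable a MongoDB pipeline. The setting names, their values and the dotted pipeline path `gsmarenabd.pipelines.MongoPipeline` are hypothetical placeholders; use the names the project's own settings.py defines.

```python
# settings.py (fragment). All names and values below are hypothetical examples.

# Database configuration read by the item pipeline.
MONGO_URI = "mongodb://localhost:27017"
MONGO_DATABASE = "gsmarena_bd"

# Enable the pipeline; lower numbers run earlier, customarily within 0-1000.
ITEM_PIPELINES = {
    "gsmarenabd.pipelines.MongoPipeline": 300,
}
```

In `scrapy crawl <spider-name>`, `<spider-name>` is the value of the spider class's `name` attribute.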

Built With: Python, Scrapy, MongoDB

BeautifulSoup4:

This parser uses Python's beautifulsoup4 package to parse the specifications of every phone listed on gsmarena.com.bd and stores them in a JSON file.
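The general idea (collect label/value rows from a phone's specification table, then dump the results to JSON) can be sketched as below. This is not the notebook's code: it swaps in the standard library's `html.parser` for BeautifulSoup so it runs without extra dependencies, and the two-cell `<tr><th>label</th><td>value</td></tr>` layout is an assumption about the page markup. With beautifulsoup4 the row lookup would typically be `soup.find_all("tr")`, which also copes better with messy HTML.

```python
import json
from html.parser import HTMLParser


class SpecTableParser(HTMLParser):
    """Collects the whitespace-normalised text of every table cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently open, if any
        self._cell = None  # text fragments of the cell currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_specs(html):
    """Map each two-cell row (label, value) to a dict entry; other rows are skipped."""
    parser = SpecTableParser()
    parser.feed(html)
    parser.close()
    return {row[0]: row[1] for row in parser.rows if len(row) == 2}


def save_json(phones, path):
    """Write a list of phone dicts to a UTF-8 JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(phones, f, ensure_ascii=False, indent=2)
```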

Download the notebook and run it on your local PC with Jupyter Notebook, or open it in Google Colab.

Built With: Python, BeautifulSoup4

Gsmarena-selenium


This crawler uses the Selenium package for Python to scrape the specifications of every phone listed on gsmarena.com and stores them in a MongoDB database.
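The per-page scraping loop can be sketched as follows, assuming a Selenium WebDriver (for example `webdriver.Chrome()`) is passed in. The function name `scrape_phone` and the default CSS selectors `table tr`, `td.ttl` and `td.nfo` are assumptions about the page markup, not taken from gsmarena_parser.py; check them against the live page. The locator string `"css selector"` is the value of Selenium's `By.CSS_SELECTOR`, written out so the function can be exercised with stand-in objects.

```python
# Selenium's locator string for CSS selectors (the value of By.CSS_SELECTOR).
CSS_SELECTOR = "css selector"


def scrape_phone(driver, url, row_selector="table tr",
                 label_selector="td.ttl", value_selector="td.nfo"):
    """Load one phone page and collect its label -> value specification rows.

    The default selectors are assumptions about the page markup.
    """
    driver.get(url)
    specs = {}
    for row in driver.find_elements(CSS_SELECTOR, row_selector):
        labels = row.find_elements(CSS_SELECTOR, label_selector)
        values = row.find_elements(CSS_SELECTOR, value_selector)
        # Skip rows that lack a label cell, a value cell, or label text.
        if labels and values:
            label = labels[0].text.strip()
            if label:
                specs[label] = values[0].text.strip()
    return {"url": url, "specs": specs}
```

The returned dict is already a MongoDB-ready document, so it could be stored with pymongo's `insert_one`.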

Open the gsmarena-com-crawler folder, set the database configuration variables (the required environment variables are listed in the .env.example file), and run gsmarena_parser.py on your PC.
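One way to load those variables is sketched below, assuming they have been copied from .env.example into a `.env` file: a minimal standard-library reader for `KEY=VALUE` lines that never overrides variables already set in the environment, plus a helper that gathers the database settings. The names `MONGO_URI`, `MONGO_DB` and `MONGO_COLLECTION` and their defaults are hypothetical; the real names are the ones listed in .env.example. The python-dotenv package's `load_dotenv()` is a common, more complete alternative.

```python
import os


def load_env_file(path=".env"):
    """Read KEY=VALUE lines into os.environ; existing variables take precedence."""
    if not os.path.exists(path):
        return
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            # Ignore blank lines, comments and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def db_config():
    """Collect the database settings (variable names are hypothetical)."""
    return {
        "uri": os.environ.get("MONGO_URI", "mongodb://localhost:27017"),
        "database": os.environ.get("MONGO_DB", "gsmarena"),
        "collection": os.environ.get("MONGO_COLLECTION", "phones"),
    }
```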

Built With: Python, Selenium, MongoDB