Gsmarena-crawler

This project consists of crawlers built with different Python web scraping libraries (beautifulsoup4, scrapy and selenium) that extract data from gsmarena and its Bangladeshi variant website, gsmarena-bd, and store the data in a MongoDB database.

Website	Crawler
gsmarena	gsmarena-selenium
gsmarena-bd	gsmarenabd-beautifulsoup4, gsmarenabd-scrapy

Prerequisites

Python, MongoDB database

Software Version

Python - 3.6.8 (64 bit)
MongoDB - 4.4.8

Download

Download source code

Clone the repository

git clone https://github.com/tanjimanasreen/gsmarena-crawler.git

Gsmarenabd-crawler

Scrapy:

This crawler comes with an end-to-end pipeline that scrapes the specifications of all phones available on gsmarena.com.bd and stores them in a MongoDB database.

Set the database configuration variables in the Scrapy settings.py file, then open the Scrapy-project folder and run the spider with the scrapy crawl command.
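
To illustrate how database variables in settings.py could reach the storage step, here is a minimal sketch of a Scrapy item pipeline. The setting names MONGO_URI, MONGO_DATABASE and MONGO_COLLECTION, their default values, and the class name MongoPipeline are assumptions made for this example; the project's own settings.py and pipeline may use different names.

```python
class MongoPipeline:
    """Sketch of a Scrapy item pipeline that writes each scraped item to MongoDB."""

    def __init__(self, mongo_uri, mongo_db, collection):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db
        self.collection = collection
        self.client = None
        self.db = None

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook and exposes settings.py through crawler.settings.
        # The three setting names below are hypothetical.
        settings = crawler.settings
        return cls(
            mongo_uri=settings.get("MONGO_URI", "mongodb://localhost:27017"),
            mongo_db=settings.get("MONGO_DATABASE", "gsmarena"),
            collection=settings.get("MONGO_COLLECTION", "phones"),
        )

    def open_spider(self, spider):
        # Imported here so the module can be loaded without the driver installed.
        import pymongo

        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        # One MongoDB document per scraped phone.
        self.db[self.collection].insert_one(dict(item))
        return item
```

In a setup like this, the pipeline would be enabled through Scrapy's ITEM_PIPELINES setting, and the database values would be edited in settings.py before running scrapy crawl.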

Built With:

Scrapy Framework - 2.5.0
Pymongo - 3.12.0

BeautifulSoup4

This parser extracts the specifications of all phones available on gsmarena.com.bd using Python's beautifulsoup4 package and stores them in a JSON file.

Download the notebook and run it on your local PC with Jupyter Notebook, or on Google Colab.
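
The parse-then-store flow described above can be sketched roughly as follows. This is not the notebook's code: the function names are hypothetical, and the assumption that each specification sits in a two-cell table row is made only for illustration, since the real page markup on gsmarena.com.bd may differ.

```python
import json


def parse_spec_table(html):
    """Collect label/value pairs from two-cell table rows (assumed layout)."""
    # Imported here so the JSON helpers below work without beautifulsoup4.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    specs = {}
    for row in soup.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) == 2:
            label = cells[0].get_text(strip=True)
            value = cells[1].get_text(strip=True)
            specs[label] = value
    return specs


def specs_to_json(specs):
    """Serialize the parsed specifications as readable JSON text."""
    return json.dumps(specs, ensure_ascii=False, indent=2)


def save_specs(specs, path):
    """Write the parsed specifications to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(specs_to_json(specs))
```

A call such as save_specs(parse_spec_table(page_html), "phone.json") would then write one phone's specifications to disk, where page_html is the downloaded page source and the file name is a placeholder.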

Built With:

BeautifulSoup4 - 4.6.3

Gsmarena-selenium

This crawler uses the Selenium package for Python to scrape the specifications of all phones available on gsmarena.com and stores them in a MongoDB database.

Open the gsmarena-com-crawler folder. The required environment variables are listed in the .env.example file; set the database configuration variables, then run the gsmarena_parser.py file on your PC.
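
As a rough illustration of how environment-based database configuration can be read, the sketch below parses KEY=VALUE lines and builds a MongoDB configuration from them. The variable names MONGO_URI, MONGO_DATABASE and MONGO_COLLECTION and all function names are assumptions for this example; the .env.example file in the repository is the authority on the names the crawler actually expects.

```python
import os


def parse_env_lines(lines):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def mongo_config(env=None):
    """Build the database configuration from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    # All three variable names and defaults are hypothetical.
    return {
        "uri": env.get("MONGO_URI", "mongodb://localhost:27017"),
        "database": env.get("MONGO_DATABASE", "gsmarena"),
        "collection": env.get("MONGO_COLLECTION", "phones"),
    }


def open_collection(config):
    """Connect with pymongo and return the target collection."""
    # Imported here so the helpers above work without the driver installed.
    import pymongo

    client = pymongo.MongoClient(config["uri"])
    return client[config["database"]][config["collection"]]
```

With helpers like these, a copy of .env.example saved as .env could be read with parse_env_lines(open(".env")) and passed to mongo_config before the scraped records are inserted.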

Built With:

Selenium - 3.141.0
Pymongo - 3.12.0
