A simple Web Crawler to extract historical data of different cryptocurrencies from http://coinmarketcap.com/

Scraping Cryptocurrencies' Historical Data

This is a web crawler built with Scrapy to scrape the historical data of different cryptocurrencies available on CoinMarketCap. Scraping a website at scale is always tricky because of IP banning, and this particular website is no exception. One good way to overcome this is to configure Scrapy to rotate proxies. However, finding free, working proxies is quite time-consuming, so I decided not to over-engineer the code, since I am only interested in getting the whole historical data for just a few cryptocurrencies. Hence, the crawler is written so that only one cryptocurrency's historical data is scraped at a time, given its ticker name and a time frame (start date and end date).
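For reference, proxy rotation with the scrapy-rotating-proxies package (see References below) comes down to a few lines in settings.py. This is a minimal sketch, assuming you already have a list of working proxies; it is not part of this project's code:

```python
# settings.py -- minimal sketch for scrapy-rotating-proxies (not used in this project)

# Proxies to rotate through; the URLs below are placeholders.
ROTATING_PROXY_LIST = [
    "http://proxy1.example.com:8000",
    "http://proxy2.example.com:8031",
]

# Enable the package's middlewares at the priorities suggested in its documentation.
DOWNLOADER_MIDDLEWARES = {
    "rotating_proxies.middlewares.RotatingProxyMiddleware": 610,
    "rotating_proxies.middlewares.BanDetectionMiddleware": 620,
}
```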

Dealing with JSON responses in Scrapy

The data on this website is embedded as a JSON object, accessible through the HTML tag <script id="__NEXT_DATA__" type="application/json">. To extract the data I want from this JSON response, I use Scrapy's built-in ItemLoader together with jmespath, as proposed by Szabolcs Antal. This is a clear-cut way to parse the JSON object using the SelectJmes processor and to populate the Scrapy Item via its ItemLoader. Note that a Scrapy Item cannot be populated with a DataFrame object.
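As a rough illustration of the approach (not the exact code in this repository), a spider can pull the __NEXT_DATA__ JSON out of the page and feed dictionaries into an ItemLoader whose fields use SelectJmes as input processor. The field names and JMESPath expressions below are assumptions about the JSON layout:

```python
import json

import scrapy
from scrapy.loader import ItemLoader
# On older Scrapy versions these processors live in scrapy.loader.processors instead.
from itemloaders.processors import SelectJmes, TakeFirst


class QuoteItem(scrapy.Item):
    # Illustrative fields; the real JMESPath expressions depend on the
    # structure of the __NEXT_DATA__ JSON object.
    date = scrapy.Field(input_processor=SelectJmes("timeOpen"), output_processor=TakeFirst())
    close = scrapy.Field(input_processor=SelectJmes("quote.close"), output_processor=TakeFirst())


class CryptoSpider(scrapy.Spider):
    name = "crypto_spider"

    def parse(self, response):
        # Pull the embedded JSON out of the __NEXT_DATA__ <script> tag.
        raw = response.xpath('//script[@id="__NEXT_DATA__"]/text()').get()
        data = json.loads(raw)
        # Hypothetical path to the list of daily quotes inside the JSON.
        for quote in data.get("props", {}).get("quotes", []):
            loader = ItemLoader(item=QuoteItem())
            # add_value() runs the SelectJmes input processor on each dict.
            loader.add_value("date", quote)
            loader.add_value("close", quote)
            yield loader.load_item()
```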

How to use:

  • Three user inputs are required to run the script: the ticker of the cryptocurrency, the start date, and the end date (see the sketch after this list).
    • The user-defined arguments are passed to the crawl command using the -a option.
    • The output CSV file can be specified using the -o option.
  • Set up the Scrapy project (or clone this repository), navigate to the project directory, and run one of the following:
    • Example 1: to scrape Bitcoin's historical data: `scrapy crawl crypto_spider -a ticker=BTC -a start='20130101' -a end='20201213' -o BTC.csv`
    • Example 2: to scrape Ethereum's historical data: `scrapy crawl crypto_spider -a ticker=ETH -a start='20140101' -a end='20201213' -o ETH.csv`
  • The output CSV file (e.g. BTC.csv or ETH.csv) will be generated in the project directory.
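For orientation, the -a arguments arrive in the spider as constructor keyword arguments. The skeleton below is a hedged sketch of how such a spider might receive them; the attribute names and the request URL are assumptions, not the repository's actual code:

```python
import scrapy


class CryptoSpider(scrapy.Spider):
    # Skeleton only: shows how `-a ticker=... -a start=... -a end=...` reach the spider.
    name = "crypto_spider"

    def __init__(self, ticker=None, start=None, end=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Scrapy passes every -a key=value pair as a string keyword argument.
        self.ticker = ticker
        self.start = start
        self.end = end

    def start_requests(self):
        # Hypothetical URL pattern; the real spider derives its request from the
        # ticker and date range on coinmarketcap.com.
        url = f"https://coinmarketcap.com/currencies/{self.ticker}/historical-data/"
        yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        # Parsing of the __NEXT_DATA__ JSON would go here (see the previous section).
        ...
```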

References:

https://robustify.wordpress.com/2017/12/22/how-to-scrape-json-response-with-scrapy-using-the-selectjmes-processor/

https://github.com/TeamHG-Memex/scrapy-rotating-proxies

https://blog.scrapinghub.com/scrapy-proxy

https://www.scraperapi.com/blog/best-10-free-proxies-and-free-proxy-lists-for-web-scraping/
