Databases Advanced

This repository was made as part of an assignment for the course 'Databases Advanced'. The goal of the assignment is to get familiar with services like MongoDB, Docker, virtual machines, etc. To keep the assignment clear and easy to understand, we split it into multiple smaller tasks.

Task 1: Python Webscraper

The first task is to scrape the Blockchain.com website for all current Bitcoin (BTC) transactions worldwide. The output of this part of the assignment is the highest USD value at the moment of scraping. Running a webscraper permanently can become quite heavy for your computer, which is why I recommend running it in a cloud-based environment or virtual machine and running the script once every minute. (For me it is running on an Ubuntu virtual machine.)
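
To give an idea of what the scraping step boils down to, here is a minimal sketch (not the repository's scraper.py). It uses the public Blockchain.com JSON endpoints instead of scraping the HTML page, so the URLs and field names below are assumptions on my part.

import requests

TXS_URL = "https://blockchain.info/unconfirmed-transactions?format=json"  # assumed endpoint
TICKER_URL = "https://blockchain.info/ticker"                             # assumed endpoint

def scrape_once():
    # current BTC -> USD rate, then the list of unconfirmed transactions
    usd_rate = requests.get(TICKER_URL, timeout=10).json()["USD"]["last"]
    txs = requests.get(TXS_URL, timeout=10).json()["txs"]
    results = []
    for tx in txs:
        btc = sum(out["value"] for out in tx["out"]) / 1e8  # output values are in satoshis
        results.append({
            "Hash": tx["hash"],
            "Time": tx["time"],
            "Amount (BTC)": btc,
            "Amount (USD)": btc * usd_rate,
        })
    return results

if __name__ == "__main__":
    transactions = scrape_once()
    highest = max(transactions, key=lambda t: t["Amount (USD)"])
    print(highest)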

Usage:

Step 1: Clone my repository
git clone https://github.com/yorickcleerbout/Databases-Advanced.git

Step 2: Install required python packages
pip3 install -r requirements.txt

Step 3: Make the python script executable (Linux)
chmod +x scraper.py

Step 4: Run the Script
python3 scraper.py

Output:

At this point the highest amount in USD is printed to the terminal. I also added a feature that saves the highest amount to a results.json file, where the data is grouped per date. With this file you can look up the highest trades on a specific day if you would like.

JSON Output Format:

{
	"yyyy-mm-dd": [
		{
			"Hash" : "hash is here",
			"Time": "Time of transaction",
			"Amount (BTC)": "Amount of BTC",
			"Amount (USD)": "Amount in USD"
		}
	]
}
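
A small helper sketch (not part of the repository) showing how a file in the layout above can be queried for the highest trade on a specific day; the file name and the string-valued amounts match the example format shown here.

import json

def highest_on(day, path="results.json"):
    # day is a "yyyy-mm-dd" key as used in the JSON format above
    with open(path) as f:
        data = json.load(f)
    trades = data.get(day, [])
    if not trades:
        return None
    # cast in case the amounts are stored as strings, as in the example above
    return max(trades, key=lambda t: float(t["Amount (USD)"]))

print(highest_on("2021-01-01"))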

Task 2: MongoDB

The next objective is to save the highest BTC transaction to a MongoDB collection. To accomplish this you need to download and install MongoDB. Since we are using an Ubuntu virtual machine, this is quite easy to do from the terminal.

Installation (Follow these steps or just run setup_mongo.sh)

Step 1: Install MongoDB
wget -qO - https://www.mongodb.org/static/pgp/server-4.4.asc | sudo apt-key add -

echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/4.4 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-4.4.list

sudo apt-get update

sudo apt-get install -y mongodb-org

Step 2: Import python package
If you didn't install all the required Python packages mentioned in Task 1 (via the requirements.txt file), you need to install the Python package for MongoDB manually.
pip3 install pymongo

Step 3: Start MongoDB Service
sudo systemctl start mongod

Usage:

For Task 1 you had to run scraper.py to execute the scraper; in this part of the assignment I changed it up a little. I created a file called main.py. From now on, main.py is the only file you need to run to use this project. As mentioned before, you need to make this file executable in order to use it.

(Reminder)
Step 1: Clone my repository
git clone https://github.com/yorickcleerbout/Databases-Advanced.git

Step 2: Install required python packages
pip3 install -r requirements.txt

Step 3: Make the python script executable (Linux)
chmod +x main.py

Step 4: Run the Script
python3 main.py
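
Under the hood, saving the highest transaction to MongoDB boils down to a pymongo insert. A minimal sketch, assuming a local mongod on the default port; the database and collection names ("btc", "transactions") are placeholders, not necessarily the ones main.py uses.

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
collection = client["btc"]["transactions"]

# document in the same shape as the results.json entries
highest = {
    "Hash": "example hash",
    "Time": "2021-01-01 12:00:00",
    "Amount (BTC)": 1.5,
    "Amount (USD)": 45000.0,
}
collection.insert_one(highest)
print(collection.count_documents({}))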

Task 3: Redis

This task is all about the availability of the data during execution. Redis is a key-value database that I use to temporarily cache my scraped data. The way I implemented Redis: right after scraping, I immediately save the data in a Redis database that holds it for about one minute; while the data is in Redis, my parser.py file reads it back out, filters for the highest value and saves that to MongoDB.
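
A minimal sketch of that caching flow, assuming a local Redis on the default port; the key name and the 60-second TTL are illustrative, parser.py may use different ones.

import json
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

# scraper side: cache the scraped transactions for about one minute
transactions = [{"Hash": "example", "Amount (USD)": 45000.0}]
r.setex("transactions", 60, json.dumps(transactions))

# parser side: read the cached data back and pick the highest USD value
cached = r.get("transactions")
if cached:
    highest = max(json.loads(cached), key=lambda t: t["Amount (USD)"])
    print(highest)  # this document would then be inserted into MongoDB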

Installation (Follow these steps or just run setup_redis.sh)

Step 1: Install Redis
sudo apt install redis-server

Step 2: Import python package
If you didn't install all the required Python packages mentioned in Task 1 (via the requirements.txt file), you need to install the Python package for Redis manually.
pip3 install redis

Step 3: Start Redis Service
sudo systemctl start redis

Usage:

As always just run the file main.py to use the full project.

Task 4: Docker

For the last part of this assignment we had to turn the project into containers so that every single component runs in a Docker container. This makes it possible to run the project anywhere you want; the only thing you have to do is install Docker (Windows, Linux or macOS) and start the program.

Installation of Docker

Windows & MacOS

https://www.docker.com/products/docker-desktop

Linux

sudo apt install docker.io

Building Docker containers

You can pull my pre-built images from my Docker Hub profile, or you can build your own images using the Dockerfiles.

Pull my images

Scraper: https://hub.docker.com/repository/docker/yorickcleerbout/scraper
Parser: https://hub.docker.com/repository/docker/yorickcleerbout/parser

Create your own images

Put the following code in a file named Dockerfile (no extension), or download my Dockerfiles from this repository, then build the image with docker build -t {imageName} . from the directory containing it.

Scraper

FROM ubuntu:latest AS scraper
MAINTAINER yorickcleerbout
COPY . .
RUN apt-get update && apt-get install -y git
RUN apt-get install -y python3
RUN apt-get install -y python3-pip
RUN git clone https://github.com/yorickcleerbout/Databases-Advanced.git
RUN cd Databases-Advanced
RUN pip3 install requests
RUN pip3 install beautifulsoup4
RUN pip3 install pandas
RUN pip3 install pymongo
RUN pip3 install redis
RUN cp "Databases-Advanced/DockerVersion/scraper.py" .
CMD ["python3", "scraper.py"]

Parser

FROM ubuntu:latest AS parser
MAINTAINER yorickcleerbout
COPY . .
RUN apt-get update && apt-get install -y git
RUN apt-get install -y python3
RUN apt-get install -y python3-pip
RUN git clone https://github.com/yorickcleerbout/Databases-Advanced.git
RUN cd Databases-Advanced
RUN pip3 install requests
RUN pip3 install beautifulsoup4
RUN pip3 install pandas
RUN pip3 install pymongo
RUN pip3 install redis
RUN cp "Databases-Advanced/DockerVersion/parser.py" .
CMD ["python3", "parser.py"]

Mongo & Redis
docker pull mongo
docker pull redis

Transform these images into containers

docker run --name scraper {imageID}
docker run --name parser {imageID}
docker run -p 27017:27017 --name mongo mongo
docker run --name redis redis

Creating network & adding containers

We need to create a network to connect these containers with each other.
docker network create {networkName}

Adding containers to the network

docker network connect {networkName} {containerName}
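
Once the containers share a network, they can reach each other by container name instead of localhost. A short sketch of what that looks like from inside the scraper or parser container, assuming the container names used above ("mongo" and "redis"):

import redis
from pymongo import MongoClient

r = redis.Redis(host="redis", port=6379, db=0)    # "redis" resolves to the redis container
client = MongoClient("mongodb://mongo:27017/")    # "mongo" resolves to the mongo container
print(r.ping(), client.server_info()["version"])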

Final Conclusion

This assignment was an overall fun and educational experience. Python was of course nothing new to me, but working with MongoDB, Redis and Docker was totally new. I hope I can do more assignments like this in the future to further expand my knowledge and skills as part of a learning experience.