This README covers all the aspects required in the report for the project
- Szymon Bujowski, 148050
- Dominika Plewińska, 151929
It's a distributed system intended for use by libraries. It supports management and administration by exposing a web interface for performing actions on the books.
The system allows for:
- Listing existing books in the system (library)
- Adding new books
- Deleting existing books
- Fetching book info (ISBN, author, title, borrower, publisher, year of publication)
- Borrowing a book (by a particular borrower)
- Returning a borrowed book
Note that the following commands assume you're in the project directory
docker build -t my_flask_server:latest .
docker compose up
By default, the webpage is exposed at localhost:80
The data comes from https://www.kaggle.com/datasets/saurabhbagchi/books-dataset, preprocessed for use in the project. Namely:
- the "Image-URL-S";"Image-URL-M";"Image-URL-L" columns are dropped
- a borrower_id field is added (initially at random) to specify if a given book is borrowed (and by whom)
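The two preprocessing steps above could look roughly like the sketch below. It is not the project's actual script: the -1 value for "not borrowed", the share of borrowed books, and the range of borrower ids are all assumptions made for illustration.

```python
import csv
import io
import random

IMAGE_COLUMNS = ("Image-URL-S", "Image-URL-M", "Image-URL-L")
NOT_BORROWED = -1  # assumption: sentinel meaning "nobody has this book"


def preprocess(raw_csv, borrowed_share=0.3, max_borrower_id=1000, seed=None):
    """Drop the image columns and add a random borrower_id to every row.

    borrowed_share and max_borrower_id are hypothetical parameters.
    """
    rng = random.Random(seed)
    # The dataset uses ';' as the separator and double quotes around fields.
    reader = csv.DictReader(io.StringIO(raw_csv), delimiter=";")
    rows = []
    for row in reader:
        for column in IMAGE_COLUMNS:
            row.pop(column, None)
        if rng.random() < borrowed_share:
            row["borrower_id"] = rng.randint(1, max_borrower_id)
        else:
            row["borrower_id"] = NOT_BORROWED
        rows.append(row)
    return rows
```

Passing a seed makes the random assignment reproducible, which helps when the same dataset has to be regenerated.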
Data resides in keyspace bookkeeper, table books:
isbn : text (PRIMARY KEY)
book_title : text
book_author : text
year_of_publication : bigint
publisher : text
borrower_id : bigint
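For reference, the column list above corresponds to a CQL statement along the lines of the sketch below. The IF NOT EXISTS clause and the helper function are assumptions. Keyspace creation is left out because the replication settings are not stated here.

```python
# CQL matching the column list above; IF NOT EXISTS is an assumption.
CREATE_BOOKS_TABLE = """
CREATE TABLE IF NOT EXISTS bookkeeper.books (
    isbn text PRIMARY KEY,
    book_title text,
    book_author text,
    year_of_publication bigint,
    publisher text,
    borrower_id bigint
)
"""


def create_books_table(session):
    """Run the statement on any object exposing execute(query).

    With the Python cassandra-driver this would be a connected Session.
    """
    session.execute(CREATE_BOOKS_TABLE)
```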
Note that ISBN is of type text because of instances such as:
188164961X, Feel Great, Be Beautiful over 40: Inside Tips on How to Look Better, Be Healthier and Slow the Aging Process
where the X at the end is valid.
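The trailing X is part of the standard ISBN-10 checksum, where it stands for the value 10. A minimal sketch of that check follows. The function name is ours and the project itself may not validate ISBNs at all.

```python
def is_valid_isbn10(isbn):
    """Return True if isbn passes the ISBN-10 checksum.

    Characters are weighted 10 down to 1 and the weighted sum must be
    divisible by 11. A final 'X' counts as 10.
    """
    if len(isbn) != 10:
        return False
    total = 0
    for position, char in enumerate(isbn):
        if char in "0123456789":
            value = int(char)
        elif char in "Xx" and position == 9:
            value = 10
        else:
            return False
        total += (10 - position) * value
    return total % 11 == 0
```

For the example above, is_valid_isbn10("188164961X") returns True, while int("188164961X") raises ValueError, which is why a numeric column type would not work.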
The system is a multi-container Docker setup with five services, connected through the cassandra-net bridge network:
- 3 Cassandra database nodes - [c1, c2, c3]
  - each node has a healthcheck that tests if the describe keyspaces command from Cassandra Query Language works within a 5s timeout
    - if so, the node is considered healthy
    - if not, there's a total of 60 retries in 5s intervals
  - each subsequent node depends on the healthcheck of the previous one: c1->c2, c2->c3
    - this ensures that by the time c3 is up, all nodes are as well
- Flask server
  - 3 replicas for load balancing, exposed and mapped on one port 8089
  - on startup, the server:
    - 1 - runs init_db.py script to initialize the database (populate it with data from data/dataset.csv)
    - 2 - exposes Bookkeeper webpage
  - depends on c3 being healthy
- Nginx web server reverse proxy (middleware orchestrating client-server flow)
  - listens on port 80 (this is how user accesses localhost:80 webpage)
  - has default volume mount
  - depends on Flask server being healthy
The whole sequence of dependencies on health checks ensures proper setup.
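The retry behaviour of the health checks can be pictured with the simplified sketch below. It is a loose model, not Docker's implementation: the function name and the injectable check and sleep callables are assumptions, and the 5s timeout of each individual probe is not modelled.

```python
import time


def wait_until_healthy(check, retries=60, interval=5.0, sleep=time.sleep):
    """Call check() until it returns True or the retries are used up.

    Returns the number of the successful attempt, or None if every
    attempt failed. The defaults mirror the 60 retries in 5s intervals.
    """
    for attempt in range(1, retries + 1):
        if check():
            return attempt
        if attempt < retries:
            sleep(interval)  # wait before probing again
    return None
```

Chaining such waits in the order c1, c2, c3, then the Flask server, then Nginx gives the startup sequence described above: each service starts only after the one it depends on reports healthy.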
The /stress_tests directory contains a number of stress tests that can be run once the system is set up, using:
bash stress_tests.sh
The tests are intended to simulate high-load situations the system may encounter:
- test1_many_add.py - high load of adding new books
- test2_multiple_actions_and_clients.py - high load of various actions coming from multiple clients at the same time
- test3_reserving_books.py - high load of reserving books
- test4_borrow_and_return.py - high load of subsequent borrow and return requests
- test5_conflicting_reservation.py - high load of two clients trying to borrow the same book at the same time
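The idea behind the conflicting-reservation scenario can be sketched as below. This is not the content of the actual test script, which presumably talks to the running server. Here InMemoryLibrary is a hypothetical stand-in so the example runs offline, and race_for_book is a name chosen for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class InMemoryLibrary:
    """Hypothetical stand-in for the real server (assumption, not project code)."""

    def __init__(self, isbns):
        self._borrowers = {isbn: None for isbn in isbns}
        self._lock = threading.Lock()

    def borrow(self, isbn, borrower_id):
        # The lock makes check-and-set atomic, so only one client can win.
        with self._lock:
            if self._borrowers[isbn] is not None:
                return False
            self._borrowers[isbn] = borrower_id
            return True


def race_for_book(borrow, isbn, borrower_ids):
    """Send one borrow request per client at the same moment.

    Returns the ids of the clients whose request succeeded.
    """
    barrier = threading.Barrier(len(borrower_ids))

    def attempt(borrower_id):
        barrier.wait()  # hold every client until all are ready
        return borrower_id if borrow(isbn, borrower_id) else None

    with ThreadPoolExecutor(max_workers=len(borrower_ids)) as pool:
        results = list(pool.map(attempt, borrower_ids))
    return [r for r in results if r is not None]
```

A correct system should yield exactly one winner, for example len(race_for_book(library.borrow, "188164961X", [1, 2])) == 1. Against the real deployment, borrow would be replaced by a function that sends the request to the web interface.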
We encountered a number of problems concerning different parts of the project:
- Flask requires version 2.2.2+ of a library it depends on (werkzeug) and pulled the most recent release, which turned out to no longer be compatible with it
  - A: pin a specific version of werkzeug
- database initializing before all Cassandra nodes were fully operational
  - A: implement health checks and specify dependencies
- apostrophes (single quotes) are special characters in CQL
  - A: special handling (doubling the quote, ' -> '', when inserting data)
- various issues with ports due to improper nginx config
  - A: a bit of trial and error, resolving issues one by one
- huge RAM usage by Cassandra nodes
  - A: MAX_HEAP_SIZE and HEAP_NEWSIZE limits
- missing validation of the borrower id allowed an "SQL injection" of -1, which turned a "borrow" action into a "return" action
  - A: validation
- all attributes had to be specified for every operation
  - A: rewrite code to only require what's actually needed
- unable to use action_scripts.js or bookkeeper.css
  - A: serve them as static files through Flask templating
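Two of the fixes above, quote doubling and id validation, can be sketched as follows. Both function names are ours, and the rule that valid borrower ids are positive (with -1 reserved for "not borrowed") is an assumption rather than something confirmed by the project code.

```python
def escape_cql_text(value):
    """Double every single quote so the value fits inside a CQL '...' literal."""
    return value.replace("'", "''")


def parse_borrower_id(raw):
    """Return raw as a positive int, or raise ValueError for anything else.

    Assumption: real borrower ids are positive, so -1 (or any other
    non-positive value) must never be accepted from a client.
    """
    try:
        borrower_id = int(raw)
    except (TypeError, ValueError):
        raise ValueError(f"borrower id must be an integer, got {raw!r}") from None
    if borrower_id <= 0:
        raise ValueError(f"borrower id must be positive, got {borrower_id}")
    return borrower_id
```

For example, escape_cql_text("O'Reilly") gives "O''Reilly", and parse_borrower_id("-1") raises ValueError instead of silently turning a borrow into a return. Where the driver supports bound parameters, passing values that way avoids manual escaping altogether.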