Database for storing LTA Datamall API results and flask app for browsing them
I was bored during lockdown and wanted to start applying my RDBMS skills by building a dataset through periodically hitting a public API. I then created a rudimentary visual database browser using Flask and Vue.js to learn more about Python and the basics of web development. I eventually want to run some ML analysis on the data, but the pandemic and ensuing lockdown mean that my current data is not representative of "normal" conditions.
This project collects information on traffic speeds across Singapore's road network. The data is provided by Singapore's Land Transport Authority (LTA) through the Dynamic Datasets on their DataMall. I run this project on a Raspberry Pi: it has more than enough resources for the job, and the simplicity and lightness of the setup appeals to me. I'm not deploying to the cloud because the database can grow large. The API updates its values every 5 minutes, and since I only poll every 10 minutes I capture half of the data. Nevertheless, my database grew to 50 MB within a week.
- SQLite 3 for the database
- Python 3, with the following modules
- PANDAS
- SciPy
- Matplotlib
- SQLAlchemy
- Flask
Setup used to require a multi-step process with some manual command execution, but now all you need to do is:
python setup.py
This script creates a database file called trafficmonitor with tables "ROADS" and "TRAFFIC". "ROADS" is pre-populated from the LTA API, and "TRAFFIC" will store the time-series data that this project collects.
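A minimal sketch of the schema setup.py produces. The column names (RoadID, RoadName, Timestamp, SpeedBand) are assumptions for illustration; the actual schema is defined in setup.py.

```python
# Sketch of the two tables setup.py creates. Column names here are
# assumptions -- check setup.py for the real schema.
import sqlite3

conn = sqlite3.connect("trafficmonitor")  # database file named above
cur = conn.cursor()
cur.execute("""CREATE TABLE IF NOT EXISTS ROADS (
    RoadID   INTEGER PRIMARY KEY,  -- LTA road link ID
    RoadName TEXT                  -- human-readable road name
)""")
cur.execute("""CREATE TABLE IF NOT EXISTS TRAFFIC (
    RoadID    INTEGER REFERENCES ROADS(RoadID),
    Timestamp INTEGER,  -- Unix time of the sample
    SpeedBand INTEGER   -- LTA speed band for that road at that time
)""")
conn.commit()
```

ROADS holds the static road metadata fetched once, while TRAFFIC grows by one row per road per ingest run.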
The next task is to collect data to populate the TRAFFIC table. ingest.py does this for you, but you need to supply your own API key, which you can obtain here. Place the key LTA sends you in a file named api_key.txt in the root of the repo folder.
I run this script every 10 minutes using crontab on Linux. Run crontab -e and add this line (note the leading dot, which sources ~/.profile so the environment matches an interactive shell):
*/10 * * * * . ~/.profile; <path to your python> "<path to this repo folder>/ingest.py"
The dataset browser is a rudimentary visual browser written in Flask. To run it, do:
export FLASK_APP=trafficmon.py
flask run
in your terminal in the root of the repo. If you're running this on a remote Raspberry Pi, do
flask run --host 0.0.0.0
to make the site accessible from other machines on your network.
The dataset browser offers the following functions:
- Download the entire dataset as a .mat file. Accessible variables are:
    - A matrix of speedbands in which each row is a road (in order of increasing road ID) and each column a sample (in order of increasing timestamp).
    - An array of road names.
    - An array of road IDs index-matched to the road names array.
    - The sampling interval and the starting timestamp, so that you can reconstruct the time vector manually.
- Browse a table of roads and preview their speedband history for the last 24 hours with an option to download the speedband history for that road.
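Reconstructing the time vector from the exported file can be sketched as follows. The variable names ("speedbands", "road_ids", "start_ts", "interval_s") are placeholders I've assumed for illustration; inspect the real export (e.g. with scipy.io.whosmat) for the actual names.

```python
# Sketch of consuming the exported .mat file. Variable names are
# assumptions -- a small stand-in file is built here so the example
# is self-contained.
import numpy as np
from scipy.io import loadmat, savemat

savemat("export.mat", {
    "speedbands": np.array([[3, 4, 4], [5, 5, 6]]),  # 2 roads x 3 samples
    "road_ids": np.array([101, 205]),
    "start_ts": 1_600_000_000,  # Unix time of the first sample
    "interval_s": 600,          # sampling interval (10 minutes)
})

m = loadmat("export.mat", squeeze_me=True)
n_samples = m["speedbands"].shape[1]
# Time vector: start timestamp plus one interval per column.
t = m["start_ts"] + m["interval_s"] * np.arange(n_samples)
print(t.tolist())  # → [1600000000, 1600000600, 1600001200]
```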
- [x] Write a Python script that automates database setup and pre-population of the ROADS table.
- [ ] Rework the visual interface to dynamically generate interactive graphs with timestamp filtering (I'm new to web stuff, so this will take some time).
- [ ] Visualize the completeness of the dataset. What if the Pi goes down for some time? Is there a way to check whether my dataset is still complete? LTA's API is real-time only, and they do not publish complete historical data.
- [ ] Email LTA to ask why data for many other major artery roads is missing. Singapore has 10 major expressways and only 1 of them is reported in this dataset.
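As a starting point for the completeness check, a simple gap scan over the TRAFFIC timestamps could flag spans where the Pi was down. This is a sketch, not part of the project; it assumes samples arrive at a fixed interval.

```python
# Find spans where consecutive samples are more than one sampling
# interval apart, i.e. periods the collector likely missed.
import numpy as np

def find_gaps(timestamps, interval_s=600):
    """Return (start, end) timestamp pairs bracketing each gap."""
    ts = np.sort(np.asarray(timestamps))
    diffs = np.diff(ts)
    idx = np.nonzero(diffs > interval_s)[0]
    return [(int(ts[i]), int(ts[i + 1])) for i in idx]

# One missing 10-minute sample between t=1200 and t=2400:
print(find_gaps([0, 600, 1200, 2400, 3000]))  # → [(1200, 2400)]
```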