The Liverpool Natural History Museum (LNHM) has introduced a new botanical wing and requires a robust solution to monitor the health of the plants housed in its conservatory. This project implements a cloud-based ETL pipeline and visualisation solution to monitor plant health metrics, including soil moisture and temperature, collected from sensors connected to Raspberry Pi devices.
This solution enables:
- Real-time monitoring of plant health.
- A short-term database to store readings from the last 24 hours.
- A long-term storage solution for historical data.
- A dashboard for visualising real-time and historical trends.
The architecture comprises:
- Sensors and Raspberry Pi: Collect data and expose it via an API.
- AWS Lambda and RDS: Lambda runs the ETL pipeline that loads readings into RDS, which provides short-term storage for the last 24 hours of data.
- ECS and S3: Batch ETL processes to transfer older data to long-term storage.
- Streamlit Dashboard: Real-time and historical visualisation of plant data.
ETL Pipeline Overview:
- Data is retrieved from an API endpoint for each plant (approx. 50 plants) every minute.
- Faulty data is identified and excluded during transformation.
- Data is cleaned and verified.
- Plant-related metrics and metadata are normalised and stored in a relational database (RDS).
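The extract and transform steps above could look roughly like the following minimal sketch. It assumes a hypothetical endpoint URL pattern, sequential plant IDs, hypothetical response fields (`plant_id`, `soil_moisture`, `temperature`, `recording_taken`) and illustrative validity ranges; the real API schema and thresholds may differ. The HTTP call is passed in as a function so the logic can be exercised without network access.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Optional

# Hypothetical endpoint pattern: the real API URL is not given in this README.
API_URL = "https://example.com/plants/{plant_id}"
PLANT_IDS = range(1, 51)  # assumption: about 50 plants with sequential IDs

# Illustrative plausibility ranges; real limits should come from the botanists.
SOIL_MOISTURE_RANGE = (0.0, 100.0)  # percent
TEMPERATURE_RANGE = (-10.0, 60.0)   # degrees Celsius


def fetch_plant(plant_id: int) -> Optional[dict]:
    """GET one plant's latest reading; return None on network or JSON errors."""
    try:
        with urllib.request.urlopen(API_URL.format(plant_id=plant_id), timeout=10) as resp:
            return json.load(resp)
    except (OSError, ValueError):  # URLError/HTTPError are OSErrors; bad JSON is a ValueError
        return None


def extract(fetch: Callable[[int], Optional[dict]] = fetch_plant,
            plant_ids: Iterable[int] = PLANT_IDS) -> list[dict]:
    """Fetch all plants concurrently (order preserved), dropping failed requests."""
    with ThreadPoolExecutor(max_workers=10) as pool:
        results = list(pool.map(fetch, plant_ids))
    return [r for r in results if r is not None]


def is_valid(reading: dict) -> bool:
    """True if both metrics are present, numeric and within plausible ranges."""
    try:
        moisture = float(reading["soil_moisture"])
        temperature = float(reading["temperature"])
    except (KeyError, TypeError, ValueError):
        return False
    # NaN fails both comparisons, so it is treated as faulty too.
    return (SOIL_MOISTURE_RANGE[0] <= moisture <= SOIL_MOISTURE_RANGE[1]
            and TEMPERATURE_RANGE[0] <= temperature <= TEMPERATURE_RANGE[1])


def transform(readings: list[dict]) -> list[dict]:
    """Exclude faulty readings and normalise the rest into rows for RDS."""
    return [
        {
            "plant_id": int(r["plant_id"]),
            "soil_moisture": round(float(r["soil_moisture"]), 2),
            "temperature": round(float(r["temperature"]), 2),
            "recording_taken": r.get("recording_taken"),
        }
        for r in readings
        if "plant_id" in r and is_valid(r)
    ]
```

A thread pool suits this workload because the per-minute run is dominated by waiting on ~50 small HTTP requests rather than by computation.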
Data Storage:
- Amazon RDS: Stores only the most recent 24 hours of data.
- Amazon S3: Cost-efficient long-term storage in Parquet format.
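The 24-hour retention rule can be expressed as a cutoff timestamp: readings older than the cutoff are moved to S3 and then removed from RDS. The sketch below assumes each reading carries a `recording_taken` datetime comparable with `now` (all naive UTC or all timezone-aware) and uses a hypothetical date-partitioned key layout (`year=/month=/day=`); the project's actual bucket layout may differ. Writing the Parquet file itself (for example with pandas `DataFrame.to_parquet`) is left out to keep the example dependency-free.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta

RETENTION = timedelta(hours=24)  # RDS keeps only the most recent 24 hours


def split_by_age(readings: list[dict], now: datetime) -> tuple[list[dict], list[dict]]:
    """Split readings into (keep_in_rds, move_to_s3) around the 24-hour cutoff."""
    cutoff = now - RETENTION
    recent = [r for r in readings if r["recording_taken"] >= cutoff]
    to_archive = [r for r in readings if r["recording_taken"] < cutoff]
    return recent, to_archive


def group_by_day(readings: list[dict]) -> dict[date, list[dict]]:
    """Group archived readings by calendar day so each day becomes one Parquet file."""
    groups = defaultdict(list)
    for r in readings:
        groups[r["recording_taken"].date()].append(r)
    return dict(groups)


def archive_key(day: date, prefix: str = "plant-readings") -> str:
    """Hypothetical date-partitioned S3 key for one day of readings."""
    return f"{prefix}/year={day:%Y}/month={day:%m}/day={day:%d}/readings.parquet"
```

Once a day's file has been uploaded (for example with boto3's `put_object`), the same cutoff can be passed to a parameterised DELETE against the RDS readings table, so nothing is removed from short-term storage before it has been archived.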
Visualisation:
- Streamlit: Interactive dashboard for real-time monitoring and trend analysis of plant health metrics.
- View graphs of temperature and soil moisture readings for all plants.
- Query historical data from S3.
- Available at http://18.175.201.133:8501/
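Querying historical data from S3 for a date range amounts to working out which objects cover that range. A minimal sketch, assuming the same hypothetical one-file-per-day key layout (`year=/month=/day=`) rather than the project's confirmed layout:

```python
from datetime import date, timedelta


def keys_for_range(start: date, end: date, prefix: str = "plant-readings") -> list[str]:
    """Return the hypothetical daily Parquet keys covering start..end inclusive."""
    if end < start:
        raise ValueError("end must not be before start")
    days = (end - start).days + 1
    return [
        f"{prefix}/year={d:%Y}/month={d:%m}/day={d:%d}/readings.parquet"
        for d in (start + timedelta(days=i) for i in range(days))
    ]
```

Each key could then be read with boto3's `get_object` (or pandas `read_parquet`) and concatenated before plotting. Partitioning by date keeps historical queries cheap, since only the days the user selects are downloaded.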
The following technologies were used:
- Infrastructure: Terraform for provisioning.
- Containers: Docker for running all services.
- Cloud Hosting: AWS.
- Database: Microsoft SQL Server (RDS).
- Data Visualisation: Streamlit.
- Programming Language: Python.
- One botanist can be in charge of multiple plants, but each plant has only one botanist.
- A plant's place of origin has no direct relationship to how healthy it is.
- The long-term storage solution (S3) is designed for scalability and low cost.
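The botanist assumption is a one-to-many relationship: a botanist can appear on many plants, but each plant references a single botanist (in a relational schema this would typically be a botanist foreign key on the plant table). A minimal in-memory sketch of enforcing that rule while loading plant metadata, with hypothetical field names `plant_id` and `botanist_name`:

```python
from collections import defaultdict


def assign_botanists(rows: list[dict]) -> dict[int, str]:
    """Map each plant to exactly one botanist, rejecting conflicting assignments."""
    plant_to_botanist: dict[int, str] = {}
    for row in rows:
        plant_id, botanist = row["plant_id"], row["botanist_name"]
        existing = plant_to_botanist.setdefault(plant_id, botanist)
        if existing != botanist:
            raise ValueError(f"plant {plant_id} assigned to both {existing!r} and {botanist!r}")
    return plant_to_botanist


def plants_by_botanist(plant_to_botanist: dict[int, str]) -> dict[str, list[int]]:
    """Invert the mapping: one botanist -> many plants."""
    grouped = defaultdict(list)
    for plant_id, botanist in plant_to_botanist.items():
        grouped[botanist].append(plant_id)
    return dict(grouped)
```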
Infrastructure Setup:
- Navigate to the terraform folder and follow the Terraform README to provision resources.
ETL Pipeline:
- Set up the ETL pipeline by running the scripts located in the pipeline folder. Detailed instructions are available in the respective READMEs.
Dashboard:
- Deploy the Streamlit dashboard by following the Dashboard README.
- Botanist contact information (e.g., email, phone number) is intentionally excluded from the historical data dashboard. This ensures that if a botanist leaves the museum, their personal data does not persist on the dashboard, maintaining privacy and security.
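One way to apply this rule is to drop contact columns from every record before it is written to long-term storage or shown on the historical dashboard. A minimal sketch, assuming hypothetical column names `botanist_email` and `botanist_phone`:

```python
# Hypothetical column names; the project's real schema may differ.
CONTACT_FIELDS = frozenset({"botanist_email", "botanist_phone"})


def strip_contact_fields(record: dict) -> dict:
    """Return a copy of the record with botanist contact details removed."""
    return {key: value for key, value in record.items() if key not in CONTACT_FIELDS}


def prepare_for_archive(records: list[dict]) -> list[dict]:
    """Apply the privacy rule to every record before it leaves short-term storage."""
    return [strip_contact_fields(r) for r in records]
```

Returning copies leaves the short-term records untouched, so the live view can still use contact details while the archived data never contains them.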
Mobile App Integration:
- Create a mobile application for on-the-go monitoring of plant health metrics.
- Include push notifications for critical alerts (e.g., low soil moisture, extreme temperatures).
Expanded Visualisation Features:
- Add comparison graphs to analyse multiple plants simultaneously.
- Include dynamic filters for customisable visualisation views (e.g., by botanist, plant species, or section of the conservatory).
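A dynamic filter could be as simple as matching readings against whichever fields the user has selected. The sketch below assumes hypothetical keys `botanist`, `species` and `section` on each reading; in Streamlit the selections would typically come from widgets such as `st.multiselect`, which returns a list.

```python
from collections.abc import Collection


def matches(reading: dict, criteria: dict) -> bool:
    """True if the reading satisfies every criterion.

    A criterion may be a single value (exact match) or a collection of allowed
    values; an empty collection means 'no filter' for that field.
    """
    for key, wanted in criteria.items():
        value = reading.get(key)
        if isinstance(wanted, Collection) and not isinstance(wanted, (str, bytes)):
            if wanted and value not in wanted:
                return False
        elif value != wanted:
            return False
    return True


def filter_readings(readings: list[dict], **criteria) -> list[dict]:
    """Keep only the readings that match all supplied criteria."""
    return [r for r in readings if matches(r, criteria)]
```

For comparison graphs, the filtered readings for several plants could then be pivoted into one column per plant and passed to `st.line_chart`.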
Real-Time Notifications:
- Implement SMS and email notifications for real-time alerts on critical readings.
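Alerts for critical readings (for push, SMS or email) start with a threshold check on each cleaned reading. A minimal sketch; the thresholds and field names are illustrative assumptions, since real limits would be species-specific:

```python
from dataclasses import dataclass

# Illustrative thresholds only; real limits would be species-specific.
MIN_SOIL_MOISTURE = 20.0            # percent
TEMPERATURE_LIMITS = (5.0, 35.0)    # degrees Celsius


@dataclass(frozen=True)
class Alert:
    plant_id: int
    metric: str
    value: float
    message: str


def check_reading(reading: dict) -> list[Alert]:
    """Return an Alert for each critical metric in a single cleaned reading."""
    alerts = []
    plant_id = reading["plant_id"]
    moisture = reading["soil_moisture"]
    temperature = reading["temperature"]
    if moisture < MIN_SOIL_MOISTURE:
        alerts.append(Alert(plant_id, "soil_moisture", moisture,
                            f"Plant {plant_id}: low soil moisture ({moisture}%)"))
    low, high = TEMPERATURE_LIMITS
    if not low <= temperature <= high:
        alerts.append(Alert(plant_id, "temperature", temperature,
                            f"Plant {plant_id}: extreme temperature ({temperature} °C)"))
    return alerts
```

Delivery is kept separate from detection: each alert's message could, for example, be published to an Amazon SNS topic with boto3's `publish`, and SNS topics support both SMS and email subscriptions.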
Historical Data Enrichment:
- Include additional metadata for plants, such as origin, species-specific needs, and growth patterns.
Object-Oriented Programming (OOP) Integration:
- Future iterations could explore using OOP principles, treating entities such as plants and botanists as objects. This would provide a more modular and scalable codebase, improving maintainability and readability.
The initial implementation is procedural rather than object-oriented for the following reasons:
- Simpler Implementation Needs: For the initial phase, the project’s requirements were straightforward, and procedural programming provided a faster and more direct approach.
- Time Constraints: Implementing OOP requires additional time for designing and structuring classes, which was not feasible given the project timeline.
- Data Size and Complexity: The current scale and complexity of the data did not necessitate OOP. As the project evolves and the system handles more complex interactions, OOP could become more beneficial.
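A possible starting point for the OOP direction, using dataclasses to model plants, botanists and readings as objects. The class and attribute names are hypothetical and not part of the current codebase.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Botanist:
    name: str
    # Excluded from repr/eq to avoid recursing through Plant -> Botanist -> Plant.
    plants: list[Plant] = field(default_factory=list, repr=False, compare=False)


@dataclass
class Reading:
    taken_at: datetime
    soil_moisture: float
    temperature: float


@dataclass
class Plant:
    plant_id: int
    name: str
    botanist: Botanist
    readings: list[Reading] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Register with the botanist: one botanist, many plants; one botanist per plant.
        self.botanist.plants.append(self)

    def add_reading(self, reading: Reading) -> None:
        self.readings.append(reading)

    def latest(self) -> Optional[Reading]:
        """Most recent reading, or None if no readings have been recorded."""
        return max(self.readings, key=lambda r: r.taken_at, default=None)
```

Encoding the botanist relationship in the constructor makes the one-botanist-per-plant assumption part of the type rather than a convention the pipeline has to remember.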
Team:
Project Manager - S1mpySloth
Architect - Kurt812
Architect - ebradley12
Quality Assurance - Jakub-Poskrop