Liverpool Natural History Museum Plant Monitoring Project

Overview

The Liverpool Natural History Museum (LNHM) has opened a new botanical wing and needs a robust way to monitor the health of the plants housed in its conservatory. This project implements a cloud-based ETL pipeline and visualisation solution that tracks plant health metrics, including soil moisture and temperature, collected from sensors connected to Raspberry Pi devices.

This solution enables:

Real-time monitoring of plant health.
A short-term database to store readings from the last 24 hours.
A long-term storage solution for historical data.
A dashboard for visualising real-time and historical trends.

Project Architecture

The architecture comprises:

Sensors and Raspberry Pi: Collect data and expose it via an API.
AWS Lambda and RDS: Short-term storage for the last 24 hours of data.
ECS and S3: Batch ETL processes to transfer older data to long-term storage.
Streamlit Dashboard: Real-time and historical visualisation of plant data.

Deliverables

Extract

Data is retrieved from an API endpoint for each plant (approx. 50 plants) every minute.
Faulty data is identified and excluded during transformation.
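
As a rough illustration of the extract step, the sketch below polls one endpoint per plant and skips any plant that fails to respond. It is not the project's actual code: the base URL, the `/{plant_id}` path layout, and the names `fetch_plant` and `extract_all` are assumptions made for this example.

```python
import json
from typing import Callable, Iterable, Optional
from urllib.request import urlopen

# Hypothetical endpoint; the real API address is not given in this README.
BASE_URL = "https://example.com/plants"


def fetch_plant(plant_id: int, timeout: float = 5.0) -> Optional[dict]:
    """Fetch one plant's latest reading, returning None on any failure."""
    try:
        with urlopen(f"{BASE_URL}/{plant_id}", timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers network errors and timeouts; ValueError covers bad JSON.
        return None


def extract_all(
    plant_ids: Iterable[int],
    fetch: Callable[[int], Optional[dict]] = fetch_plant,
) -> list:
    """Collect readings for every plant, skipping plants that did not respond."""
    readings = []
    for plant_id in plant_ids:
        reading = fetch(plant_id)
        if reading is not None:
            readings.append(reading)
    return readings
```

Passing the fetch function in as a parameter keeps the loop testable without network access. A scheduled run might call `extract_all(range(1, 51))`, assuming plant IDs run from 1 to 50.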

Transform

Data is cleaned and verified.
Plant-related metrics and metadata are normalised and stored in a relational database (RDS).
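
The cleaning and verification step might look something like the sketch below. The field names (`plant_id`, `soil_moisture`, `temperature`) and the accepted value ranges are assumptions for illustration; the project's real validation rules are documented in the pipeline README, not here.

```python
from typing import Optional

# Assumed plausible ranges; the project's real thresholds may differ.
MOISTURE_RANGE = (0.0, 100.0)      # soil moisture, percent
TEMPERATURE_RANGE = (-10.0, 60.0)  # temperature, degrees Celsius


def clean_reading(raw: dict) -> Optional[dict]:
    """Return a normalised reading, or None if the raw record is faulty."""
    try:
        plant_id = int(raw["plant_id"])
        moisture = float(raw["soil_moisture"])
        temperature = float(raw["temperature"])
    except (KeyError, TypeError, ValueError):
        return None  # missing field or non-numeric value

    # Out-of-range values (and NaN, which fails every comparison) are rejected.
    if not MOISTURE_RANGE[0] <= moisture <= MOISTURE_RANGE[1]:
        return None
    if not TEMPERATURE_RANGE[0] <= temperature <= TEMPERATURE_RANGE[1]:
        return None

    return {
        "plant_id": plant_id,
        "soil_moisture": round(moisture, 2),
        "temperature": round(temperature, 2),
    }
```

Returning `None` for faulty records lets the caller filter them out in one pass before anything is written to the database.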

Load

Short-Term Storage

Amazon RDS: Stores only the most recent 24 hours of data.
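
One way to enforce the 24-hour window is to delete anything older than a cutoff on each run. The sketch below uses SQLite from the Python standard library purely as a stand-in so that it runs anywhere; the project itself uses Microsoft SQL Server on RDS. The table name `reading` and column `taken_at` are hypothetical, and timestamps are assumed to be stored as ISO-8601 UTC strings.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def prune_old_readings(conn: sqlite3.Connection, now: datetime) -> int:
    """Delete readings older than 24 hours and return how many rows were removed."""
    # ISO-8601 strings in the same timezone and precision sort chronologically,
    # so a plain string comparison is enough for this stand-in schema.
    cutoff = (now - timedelta(hours=24)).isoformat(timespec="seconds")
    cursor = conn.execute("DELETE FROM reading WHERE taken_at < ?", (cutoff,))
    conn.commit()
    return cursor.rowcount
```

In practice the rows would be archived to long-term storage before being deleted, so the prune would run only after a successful transfer. A call might look like `prune_old_readings(conn, datetime.now(timezone.utc))`.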

Long-Term Storage

Amazon S3: Cost-efficient long-term storage in Parquet format.
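
Parquet archives are often laid out under date-partitioned keys so that a query for one day only touches that day's file. The helper below sketches such a naming scheme. The `readings` prefix, the `year=/month=/day=` layout, and the function name `archive_key` are assumptions for illustration, not the bucket structure this project necessarily uses.

```python
from datetime import date


def archive_key(day: date, prefix: str = "readings") -> str:
    """Build a date-partitioned object key for one day's Parquet file."""
    return (
        f"{prefix}/year={day.year}/month={day.month:02d}/"
        f"day={day.day:02d}/readings.parquet"
    )
```

For example, `archive_key(date(2024, 3, 5))` gives `readings/year=2024/month=03/day=05/readings.parquet`.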

Visualisation

Streamlit: Interactive dashboard for real-time monitoring and trend analysis of plant health metrics.
- View graphs of temperature and soil moisture readings for all plants.
- Query historical data from S3.
- Available at http://18.175.201.133:8501/

Tech

The following technologies were used:

Infrastructure: Terraform for provisioning.
Containers: Docker for running all services.
Cloud Hosting: AWS.
Database: Microsoft SQL Server (RDS).
Data Visualisation: Streamlit.
Programming Language: Python.

Project Files

ETL Pipelines:
- Pipeline README (Short-term storage API-RDS)
- Pipeline README (Long-term storage RDS-S3)
Terraform:
- Terraform README
Streamlit Dashboard:
- Dashboard README

ERD (Entity Relationship Diagram)

Assumptions

One botanist can be in charge of multiple plants, but each plant has exactly one botanist.
The origin location of the plant doesn't have a direct relationship to how healthy it is.
The long-term storage solution (S3) is designed for scalability and low cost.

Running the Project

Infrastructure Setup:
- Navigate to the terraform folder and follow the Terraform README to provision resources.
ETL Pipeline:
- Set up the ETL pipeline by running the scripts located in the pipeline folder. Detailed instructions are available in the respective READMEs.
Dashboard:
- Deploy the Streamlit dashboard by following the Dashboard README.

Wireframe Design

Assumptions

Botanist contact information (e.g., email, phone number) is intentionally excluded from the historical data dashboard. This ensures that if a botanist leaves the institute, their personal data will not persist on the dashboard, maintaining privacy and security.

Future Improvements

Mobile App Integration
- Create a mobile application for on-the-go monitoring of plant health metrics.
- Include push notifications for critical alerts (e.g., low soil moisture, extreme temperatures).
Expanded Visualisation Features
- Add comparison graphs to analyse multiple plants simultaneously.
- Include dynamic filters for customisable visualisation views (e.g., by botanist, plant species, or section of the conservatory).
Real-Time Notifications
- Implement SMS and email notifications for real-time alerts on critical readings.
Historical Data Enrichment
- Include additional metadata for plants, such as origin, species-specific needs, and growth patterns.
Object-Oriented Programming (OOP) Integration
- Future iterations could explore using OOP principles, treating entities such as plants and botanists as objects. This would provide a more modular and scalable codebase, improving maintainability and readability.

Why OOP Was Not Used in This Attempt:

Simpler Implementation Needs: For the initial phase, the project’s requirements were straightforward, and procedural programming provided a faster and more direct approach.
Time Constraints: Implementing OOP requires additional time for designing and structuring classes, which was not feasible given the project timeline.
Data Size and Complexity: The current scale and complexity of the data did not necessitate the use of OOP. As the project evolves, and the system handles more complex interactions, OOP could become more beneficial.

Contributors

Project Manager - S1mpySloth
Architect - Kurt812
Architect - ebradley12
Quality Assurance - Jakub-Poskrop

