Maritime Emissions Analysis Platform

A full-stack data analytics platform built on AWS, featuring automated data pipelines, interactive dashboards, and API integration, all orchestrated with Docker containers.

Project Overview

This project is a comprehensive data analytics platform that:

  • Collects CO2 emissions reports from the EU MRV system via Python scripts and stores them in AWS S3
  • Processes raw data through an ETL pipeline using AWS Glue
  • Provides data access through a REST API (AWS API Gateway)
  • Visualizes data through an Apache Superset dashboard
  • Features a React-based landing page
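The collection step above could be sketched as follows. The bucket name, export URL, and S3 key layout are illustrative assumptions, not the project's actual values:

```python
from datetime import date
from urllib.request import urlopen

# Hypothetical values -- the real bucket and source URL live in the project's config.
RAW_BUCKET = "maritime-emissions-raw"
EU_MRV_EXPORT_URL = "https://mrv.emsa.europa.eu/..."  # placeholder, not the real endpoint


def raw_s3_key(report_year: int, today: date) -> str:
    """Build a date-partitioned S3 key for one raw EU MRV report."""
    return f"raw/eu-mrv/year={report_year}/ingested={today.isoformat()}/report.xlsx"


def collect_report(report_year: int) -> str:
    """Download one reporting year's emissions report and upload it to S3."""
    import boto3  # AWS SDK for Python; imported lazily so the key helper is testable offline

    with urlopen(f"{EU_MRV_EXPORT_URL}?year={report_year}", timeout=60) as resp:
        body = resp.read()
    key = raw_s3_key(report_year, date.today())
    boto3.client("s3").put_object(Bucket=RAW_BUCKET, Key=key, Body=body)
    return key
```

Date-partitioned keys make it cheap for the downstream Glue job to pick up only the latest ingestion.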

Architecture

  • Frontend: React.js landing page
  • Backend:
    • AWS API Gateway for REST endpoints
    • Python ETL scripts for data processing
    • Apache Superset for data visualization
  • Infrastructure:
    • AWS EC2 for hosting
    • Docker containers for service orchestration
    • Nginx as reverse proxy
    • AWS S3 for data storage
    • AWS Route 53 for domain management
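The Nginx reverse proxy in this layout routes the main domain to the React container and the dashboard subdomain to Superset. A minimal sketch of `nginx/conf.d/default.conf`, assuming service names `frontend` and `superset` on their default ports (the real configuration may differ):

```nginx
# Illustrative config -- container names and ports are assumptions.
server {
    listen 80;
    server_name yourdomain.com www.yourdomain.com;

    location / {
        proxy_pass http://frontend:3000;   # React container
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

server {
    listen 80;
    server_name dashboard.yourdomain.com;

    location / {
        proxy_pass http://superset:8088;   # Superset container
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```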

Technologies Used

  • Cloud & Infrastructure:
    • AWS (EC2, S3, API Gateway, Route 53)
    • Docker & Container Orchestration
    • Nginx
  • Backend & Data:
    • Python
    • Apache Superset
    • ETL Pipeline
    • API Development
  • Frontend:
    • React.js
    • HTML/CSS
    • JavaScript

Setup and Installation

Prerequisites

  • AWS Account
  • Docker and Docker Compose
  • Node.js
  • Python 3.x

Local Development Setup

  1. Clone the repository:

     ```bash
     git clone [repository-url]
     cd [repository-name]
     ```

  2. Start the frontend application:

     ```bash
     cd frontend
     npm install
     npm start
     ```

  3. Run the Docker containers:

     ```bash
     docker-compose up -d
     ```

  4. Initialize Superset (first time only):

     ```bash
     docker-compose -f superset-docker-compose.yml exec superset superset-init
     ```
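For orientation, the main composition might look roughly like the following sketch. The service names, ports, and image tags are assumptions based on the project structure, not the repository's actual file:

```yaml
# Illustrative docker-compose.yml sketch -- not the project's actual file.
services:
  frontend:
    build: ./frontend          # React app with its own Dockerfile
    expose:
      - "3000"
  nginx:
    image: nginx:stable
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/conf.d:/etc/nginx/conf.d:ro
    depends_on:
      - frontend
```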

Production Deployment

  1. Configure AWS services:

     • Set up an EC2 instance
     • Configure the S3 bucket
     • Set up API Gateway
     • Configure Route 53 for domain management

  2. Deploy the application:

     ```bash
     docker-compose -f docker-compose.prod.yml up -d
     ```

  3. Set up SSL certificates:

     ```bash
     sudo certbot --nginx -d yourdomain.com -d www.yourdomain.com -d dashboard.yourdomain.com
     ```

Project Structure

project/
├── docker-compose.yml           # Main application composition
├── superset-docker-compose.yml  # Superset setup
├── frontend/                    # React application
│   ├── public/
│   ├── src/
│   └── Dockerfile
├── backend/                     # Backend services
│   ├── src/
│   ├── app/
│   ├── Dockerfile
│   └── compose.yml
├── nginx/                       # Nginx configuration
│   └── conf.d/
└── superset/                   # Superset configuration
    └── superset_config.py

Future Development / Roadmap

Infrastructure Improvements

  • Move containers to AWS ECS
  • Automate infrastructure creation and deletion with Terraform
  • Set up monitoring and alerting with CloudWatch

Data Pipeline Enhancements

  • Add data quality checks
  • Implement error handling and retry mechanisms
  • Implement an automated testing suite with pytest
  • Create automated data backup system
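A data quality check of the kind planned above could start as simply as the function below. The field names (`imo_number`, `total_co2_emissions_t`) are illustrative assumptions about the EU MRV report layout, not the pipeline's actual schema:

```python
def validate_record(record: dict) -> list[str]:
    """Return a list of data-quality problems found in one emissions record.

    An empty list means the record passed all checks. Field names here are
    hypothetical, not the pipeline's real schema.
    """
    problems = []

    # IMO ship identifiers are 7-digit numbers.
    imo = str(record.get("imo_number", ""))
    if not (imo.isdigit() and len(imo) == 7):
        problems.append("IMO number must be a 7-digit identifier")

    # Reported CO2 totals must be present and non-negative.
    co2 = record.get("total_co2_emissions_t")
    if not isinstance(co2, (int, float)) or co2 < 0:
        problems.append("total CO2 emissions must be a non-negative number")

    return problems
```

Checks like these slot naturally into a pytest suite and can run as a Glue job step before data lands in the curated layer.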

License

This project is licensed under the terms of the MIT license.

Contact

For any inquiries, please email [email protected].
