This project aims to securely manage, streamline, and analyze semi-structured daily fuel price data.
- Data Ingestion — Ingest data daily from a list of URLs into storage.
- ETL Pipeline — Extract the data in raw format and transform it into a format suitable for further analysis.
- Data lake — A centralized repository to store the semi-structured raw data and the processed data.
- Cloud — As the data grows daily, a local computer will not be able to process it, so we use AWS for scalability.
- Infrastructure as Code - Terraform is an Infrastructure as Code tool that defines cloud resources in human-readable configuration files. All the AWS resources in this project are deployed using Terraform.
- Reporting — Build a dashboard to give business insights.
- Amazon S3: Stores the raw data obtained daily from a list of URLs, as well as the data after it has been processed.
- AWS IAM: AWS Identity and Access Management, which lets us securely manage access to AWS services and resources.
- AWS Lambda: Lambda is a computing service that allows programmers to run code without creating or managing servers.
- AWS Glue: A serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development.
- Amazon Athena: An interactive query service that lets us query data stored in S3 directly with SQL, without loading it into a database.
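
To make the ingestion step more concrete, here is a minimal sketch of how a Lambda function might fetch the daily files and write them to S3. It is an illustration, not this project's actual code: the `raw/dt=YYYY-MM-DD/` key layout, the event fields `bucket` and `urls`, and the helper names `raw_key`, `ingest`, and `fetch_http` are all assumptions made for the example.

```python
import datetime as dt
import posixpath
import urllib.request
from urllib.parse import urlparse


def raw_key(url, day, prefix="raw"):
    """Build a date-partitioned S3 key (hypothetical layout) for one source URL."""
    # Use the last path segment as the file name, falling back to "index".
    name = posixpath.basename(urlparse(url).path) or "index"
    return f"{prefix}/dt={day.isoformat()}/{name}"


def ingest(urls, day, fetch, put):
    """Fetch every URL and store its body under the raw key; return the keys written."""
    keys = []
    for url in urls:
        key = raw_key(url, day)
        put(key, fetch(url))
        keys.append(key)
    return keys


def fetch_http(url):
    """Download the raw bytes behind a URL."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def lambda_handler(event, context):
    """Lambda entry point; assumes an event like {"bucket": "...", "urls": [...]}."""
    import boto3  # imported lazily so the helpers above can be tested without AWS

    s3 = boto3.client("s3")
    bucket = event["bucket"]

    def put(key, body):
        s3.put_object(Bucket=bucket, Key=key, Body=body)

    # date.today() follows the runtime's clock, which is UTC in Lambda.
    return {"written": ingest(event["urls"], dt.date.today(), fetch_http, put)}
```

Passing `fetch` and `put` as arguments keeps the download and upload logic separate from AWS, so the key layout can be checked locally with an in-memory dictionary before anything is deployed. A date-partitioned prefix of this kind is also a common choice when the data is later cataloged by Glue and queried with Athena.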