So, let's create a data pipeline using the amazing Amazon Web Services!
All we have to do is create an AWS account and wire up a few different services!
Step 1: Write a cool function to load data into S3 (and keep the code reusable)
Step 2: Send a message to an SQS queue as soon as the data lands in S3
Step 3: Receive the message from the SQS queue, fetch the CSV & JSON data from S3, and show off your data analytics skills with Pandas (sketches of all three steps follow this list)
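Here is a minimal sketch of Steps 1 and 2, assuming placeholder names throughout (`my-dataquest-bucket`, the queue URL, `API_URL`, and the bundled `data.csv` are all illustrative, not this project's actual resources):

```python
import json

import boto3
import requests

s3 = boto3.client("s3")
sqs = boto3.client("sqs")

# Placeholder names -- substitute your own bucket, queue URL, and API endpoint.
BUCKET = "my-dataquest-bucket"
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-dataquest-queue"
API_URL = "https://example.com/api/data"


def upload_to_s3(body: bytes, key: str) -> None:
    """Reusable helper: one upload path for every data source (Step 1)."""
    s3.put_object(Bucket=BUCKET, Key=key, Body=body)
    # Step 2: tell the queue that a new object just landed in S3.
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"bucket": BUCKET, "key": key}),
    )


def lambda_handler(event, context):
    # Source 1: a CSV file bundled with the Lambda package (illustrative path).
    with open("data.csv", "rb") as f:
        upload_to_s3(f.read(), "raw/data.csv")
    # Source 2: JSON fetched from an API.
    payload = requests.get(API_URL, timeout=30).json()
    upload_to_s3(json.dumps(payload).encode("utf-8"), "raw/data.json")
    return {"statusCode": 200}
```

An alternative for Step 2 is to configure S3 Event Notifications on the bucket so S3 publishes to SQS by itself; the explicit `send_message` just keeps the sketch self-contained.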
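And a sketch of the Step 3 consumer under the same placeholder names: it long-polls the queue, fetches the referenced object from S3, and hands it to pandas (the `describe()` call stands in for the real analysis):

```python
import json

import boto3
import pandas as pd

s3 = boto3.client("s3")
sqs = boto3.client("sqs")

# Same placeholder queue URL as in the producer sketch above.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-dataquest-queue"


def read_object(bucket: str, key: str) -> pd.DataFrame:
    """Fetch an S3 object and parse it by extension (CSV or JSON)."""
    body = s3.get_object(Bucket=bucket, Key=key)["Body"]
    return pd.read_csv(body) if key.endswith(".csv") else pd.read_json(body)


def lambda_handler(event, context):
    # Long-poll the queue for the notification sent in Step 2.
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=10
    )
    for msg in resp.get("Messages", []):
        ref = json.loads(msg["Body"])
        df = read_object(ref["bucket"], ref["key"])
        print(df.describe())  # stand-in for the real analysis in the notebook
        # Delete the message so it is not processed a second time.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```

If the Lambda is wired to SQS as an event source instead, the records arrive in `event["Records"]` and the polling loop goes away.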
So, it looks something like this:
- Lambda - To run the functions (code) that perform the tasks, like reading from and writing to S3 and the data analysis
- S3 - For storing data from multiple sources (CSV & API)
- SQS - Gets populated each time data is loaded into S3
- CloudFormation - For creating and orchestrating the data pipeline
This project consists of three folders, each solving one challenge of the data quest:
- This folder contains the source code for Part 1 & Part 2 combined
- The .py file (the Lambda function) loads data from two sources (CSV and an API) into S3 buckets
- Link to data in S3 and source code:
- This folder contains the source code for Part 3
- The AnalyticsNotebook.ipynb contains the results inline
- Used a Lambda Layer to import the pandas and boto3 packages
- Create a neat little stack with a YAML template that defines the resources via CloudFormation
- This folder contains the cloudFormation.yaml file (a sketch of deploying it from Python follows)
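For illustration, here is one hedged way to launch that stack from Python with boto3; the stack name is a placeholder, and the `CAPABILITY_NAMED_IAM` flag assumes the template creates IAM roles for the Lambdas:

```python
import boto3

cf = boto3.client("cloudformation")

# Read the template from this folder; "dataquest-pipeline" is a placeholder name.
with open("cloudFormation.yaml") as f:
    template = f.read()

cf.create_stack(
    StackName="dataquest-pipeline",
    TemplateBody=template,
    # Assumption: the template creates named IAM roles for the Lambdas.
    Capabilities=["CAPABILITY_NAMED_IAM"],
)
# Block until the bucket, queue, and Lambda functions are all up.
cf.get_waiter("stack_create_complete").wait(StackName="dataquest-pipeline")
print("Stack created.")
```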