
This project uses AWS services to build a data pipeline that reads data from different sources (CSV files and an API). The data is loaded to S3, which triggers an SQS queue; the data in S3 is then transformed and analyzed. The stack is created from a template using the CloudFormation CLI.


AWSDataPipeline

So, let's create a data pipeline using the Amazing Amazon Web Services!!!
All we have to do is create an AWS account and work with a few different services!
Step 1: Write a cool function to load data to S3 (but remember to keep the code reusable)
Step 2: Trigger a message to an SQS queue as soon as the data is loaded to S3
Step 3: Receive the message from the SQS queue, fetch the CSV & JSON data from S3, and show off your data analytics skills using Pandas.
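Step 1 can be sketched as a small, reusable upload helper. This is an illustrative sketch, not code from the repo: the function names, bucket, and keys are made up, and the S3 client is passed in so the same function serves both the CSV and API sources.

```python
import json
from typing import Any


def upload_to_s3(s3_client: Any, bucket: str, key: str, body: bytes) -> str:
    """Reusable upload path: one function for CSV bytes, JSON payloads, etc."""
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return f"s3://{bucket}/{key}"


def upload_json(s3_client: Any, bucket: str, key: str, payload: dict) -> str:
    """Serialize an API response to JSON, then reuse the same upload path."""
    return upload_to_s3(s3_client, bucket, key, json.dumps(payload).encode("utf-8"))
```

Inside a real Lambda function you would pass `boto3.client("s3")` as `s3_client`; taking it as a parameter keeps the helper testable without AWS credentials.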

So, it looks something like this:

(architecture diagram of the pipeline)

Or not

Services Used (and for what):

  • Lambda - Runs the functions (code) that perform the tasks, like reading to and from S3 and doing the data analysis
  • S3 - Stores the data from multiple sources (CSV & API)
  • SQS - Receives a message each time data is loaded into S3
  • CloudFormation - Creates and orchestrates the data pipeline

Also note,

This project consists of three folders solving three challenges of a data quest

Part 1 & 2: AWS S3 & Sourcing Datasets (sources: CSV and API)

  • This folder contains the source code for Part 1 & Part 2 combined
  • The .py file (the Lambda function) loads data from two sources (CSV and an API) into S3 buckets
  • Link to data in S3 and source code:
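The sourcing side of Parts 1 & 2 might look like the sketch below. This is an assumption about the approach, not the repo's code: the API URL is hypothetical, and the stdlib (`urllib`, `csv`) stands in for whatever HTTP/CSV tooling the Lambda actually uses.

```python
import csv
import io
import json
import urllib.request


def fetch_api_records(url: str) -> list:
    """Fetch a JSON API response (expects a list of flat records)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def records_to_csv_bytes(records: list) -> bytes:
    """Flatten JSON records into CSV bytes, ready to hand to an S3 upload."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=sorted(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue().encode("utf-8")
```

Converting the API response to CSV before upload keeps both sources in one format, so the downstream analytics code has a single read path.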

Part 3: Data Analytics

  • This folder contains the source code for Part 3
  • There is an AnalyticsNotebook.ipynb which contains the results within the notebook
  • Used a Lambda Layer for importing the pandas and boto3 packages
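The Part 3 consumer can be sketched in two pieces: parsing the S3 event that arrives via SQS, and a toy Pandas analysis. Function names are illustrative; only the S3 event notification shape (`Records[].s3.bucket.name` / `object.key`) comes from AWS, not from the repo.

```python
import io
import json

import pandas as pd  # in Lambda, provided via a Lambda Layer


def keys_from_sqs_message(body: str) -> list:
    """Extract (bucket, key) pairs from an S3 event notification delivered via SQS."""
    event = json.loads(body)
    return [
        (rec["s3"]["bucket"]["name"], rec["s3"]["object"]["key"])
        for rec in event.get("Records", [])
    ]


def summarize_csv(csv_text: str) -> pd.DataFrame:
    """Stand-in analysis step: numeric summary of a CSV fetched from S3."""
    return pd.read_csv(io.StringIO(csv_text)).describe()
```

A real handler would loop over `keys_from_sqs_message`, call `s3.get_object` for each key, and feed the body to the analysis function.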

Part 4: Infrastructure as Code & Data Pipeline with AWS CloudFormation

  • Create a neat, little stack using a YAML template to define the resources with CloudFormation
  • This folder contains the cloudFormation.yaml file
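A minimal template for this pipeline could look like the fragment below. This is a sketch, not the repo's cloudFormation.yaml: the logical resource names are made up, and a real template would also define the Lambda functions and their IAM roles.

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: Minimal S3 -> SQS pipeline stack (resource names are illustrative)
Resources:
  PipelineQueue:
    Type: AWS::SQS::Queue
  QueuePolicy:
    # S3 cannot deliver notifications unless the queue policy allows it
    Type: AWS::SQS::QueuePolicy
    Properties:
      Queues:
        - !Ref PipelineQueue
      PolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service: s3.amazonaws.com
            Action: sqs:SendMessage
            Resource: !GetAtt PipelineQueue.Arn
  DataBucket:
    Type: AWS::S3::Bucket
    Properties:
      NotificationConfiguration:
        QueueConfigurations:
          - Event: s3:ObjectCreated:*
            Queue: !GetAtt PipelineQueue.Arn
```

The stack would then be created from the CLI with `aws cloudformation deploy --template-file cloudFormation.yaml --stack-name <name>`, matching the CLI-driven approach described above.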
