So, let's create a data pipeline using the amazing Amazon Web Services!
All we have to do is create an AWS account and wire up a few different services!
Step 1: Write a cool function to load data into S3 (and keep the code reusable)
Step 2: Send a message to an SQS queue as soon as the data lands in S3
Step 3: Receive the message from the SQS queue, fetch the CSV & JSON data from S3, and show off your data analytics skills with Pandas (sketches of all three steps follow this list)
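Here is a minimal sketch of Steps 1 and 2, assuming placeholder names throughout (`my-dataquest-bucket`, the queue URL, `API_URL`, and the bundled `data.csv` are all illustrative, not this project's actual resources):

```python
import json

import boto3
import requests

s3 = boto3.client("s3")
sqs = boto3.client("sqs")

# Placeholder names -- substitute your own bucket, queue URL, and API endpoint.
BUCKET = "my-dataquest-bucket"
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-dataquest-queue"
API_URL = "https://example.com/api/data"


def upload_to_s3(body: bytes, key: str) -> None:
    """Reusable helper: one upload path for every data source (Step 1)."""
    s3.put_object(Bucket=BUCKET, Key=key, Body=body)
    # Step 2: tell the queue that a new object just landed in S3.
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"bucket": BUCKET, "key": key}),
    )


def lambda_handler(event, context):
    # Source 1: a CSV file bundled with the Lambda package (illustrative path).
    with open("data.csv", "rb") as f:
        upload_to_s3(f.read(), "raw/data.csv")
    # Source 2: JSON fetched from an API.
    payload = requests.get(API_URL, timeout=30).json()
    upload_to_s3(json.dumps(payload).encode("utf-8"), "raw/data.json")
    return {"statusCode": 200}
```

An alternative for Step 2 is to configure S3 Event Notifications on the bucket so S3 publishes to SQS by itself; the explicit `send_message` just keeps the sketch self-contained.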
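And a sketch of the Step 3 consumer under the same placeholder names: it long-polls the queue, fetches the referenced object from S3, and hands it to pandas (the `describe()` call stands in for the real analysis):

```python
import json

import boto3
import pandas as pd

s3 = boto3.client("s3")
sqs = boto3.client("sqs")

# Same placeholder queue URL as in the producer sketch above.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-dataquest-queue"


def read_object(bucket: str, key: str) -> pd.DataFrame:
    """Fetch an S3 object and parse it by extension (CSV or JSON)."""
    body = s3.get_object(Bucket=bucket, Key=key)["Body"]
    return pd.read_csv(body) if key.endswith(".csv") else pd.read_json(body)


def lambda_handler(event, context):
    # Long-poll the queue for the notification sent in Step 2.
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=10
    )
    for msg in resp.get("Messages", []):
        ref = json.loads(msg["Body"])
        df = read_object(ref["bucket"], ref["key"])
        print(df.describe())  # stand-in for the real analysis in the notebook
        # Delete the message so it is not processed a second time.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```

If the Lambda is wired to SQS as an event source instead, the records arrive in `event["Records"]` and the polling loop goes away.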
So, it looks something like this:
- Lambda - To run the functions (code) that perform the tasks, like reading from and writing to S3 and the data analysis
- S3 - For storing data from multiple sources (CSV & API)
- SQS - Gets populated each time data is loaded into S3
- CloudFormation - For creating and orchestrating the data pipeline
This project consists of three folders, each solving one challenge of the data quest:
- This folder contains the source code for Part 1 & Part 2 combined
- The .py file (the Lambda function) loads data from two sources (CSV and an API) into S3 buckets
- Link to data in S3 and source code:
- This folder contains the source code for Part 3
- The AnalyticsNotebook.ipynb contains the results inline
- Used a Lambda Layer to import the pandas and boto3 packages
- Create a neat little stack with a YAML template that defines the resources via CloudFormation
- This folder contains the cloudFormation.yaml file (a sketch of deploying it from Python follows)
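For illustration, here is one hedged way to launch that stack from Python with boto3; the stack name is a placeholder, and the `CAPABILITY_NAMED_IAM` flag assumes the template creates IAM roles for the Lambdas:

```python
import boto3

cf = boto3.client("cloudformation")

# Read the template from this folder; "dataquest-pipeline" is a placeholder name.
with open("cloudFormation.yaml") as f:
    template = f.read()

cf.create_stack(
    StackName="dataquest-pipeline",
    TemplateBody=template,
    # Assumption: the template creates named IAM roles for the Lambdas.
    Capabilities=["CAPABILITY_NAMED_IAM"],
)
# Block until the bucket, queue, and Lambda functions are all up.
cf.get_waiter("stack_create_complete").wait(StackName="dataquest-pipeline")
print("Stack created.")
```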