Skip to content

This repo contains a pipeline designed on basis of DAAS pattern to store, manipulate and display raw data using AWS services via FastAPI

Notifications You must be signed in to change notification settings

adison1994/Data-as-a-Service-Pipeline-DaaS

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Data-as-a-Service-Pipeline (DaaS)

CLAAT Link - https://codelabs-preview.appspot.com/?file_id=1RU0VFAtEOq2Wmisp00X2sNH4aQsW1qim4Zh-A3m7jYc#0

Data Pipeline Design

"Data as a service (DaaS) is a data management strategy that uses the cloud to deliver data storage, integration, processing, and/or analytics services via a network connection."

DaaS is similar to software as a service, or SaaS, a cloud computing strategy that involves delivering applications to end-users over the network, rather than having them run applications locally on their devices. Just as SaaS removes the need to install and manage software locally, DaaS outsources most data storage, integration, and processing operations to the cloud.

Design

The overall architecture flow originates from the trigger event on file upload in two different S3 buckets. Both the buckets have data uploaded, which further invocates two different Lambda functions executing concurrently. The functions are mainly responsible for importing data from both the buckets into two different DynamoDB tables. Since one of the DynamoDB tables requires pre-processing for cleaning the underlying data another invocation of a different Lambda function happens to perform necessary transformations and data cleaning. This data is further stored in a new dynamodb table.

About

This repo contains a pipeline designed on basis of DAAS pattern to store, manipulate and display raw data using AWS services via FastAPI

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published