RobApril/storing-big-data-predict
Data Engineering: Storing Big Data Predict

© Explore Data Science Academy

Overview

After much hard work, dedicated studying, and exercising of your practical knowledge, you finally find yourself in your first job as a data engineer. Congratulations ⭐️!

Your boss, a senior data engineer (DE), is excited to have you join the team. However, since you're brand new, she's a little cautious about your capabilities. She decides to test your skills by giving you two tasks based on work that your team has recently performed. Eager to prove your mettle to your boss and the wider team, you readily accept the challenge 🥋!

Within the sections below, we provide links to the instructions, resources, and assessments associated with each task:

Part-1: On-premise Source Connection

Instructions 🧑‍🏫

A detailed set of instructions guiding you through several steps of your boss's first task can be found here.

Resources 📕

The resources below are provided to help you complete this task:

Code:

Learning Material:

Assessments ⏱

To complete the assessments for the first task of the predict, you need to:

  • Submit CloudFormation templates
  • Generate AWS File Gateway-based SNS alert email
  • Complete Task 1 MCQ

Part-2: Streaming Data Pipeline

Instructions 🧑‍🏫

A set of high-level instructions for your boss's second task can be found here.

Resources 📕

The resources below are provided to help you complete this task:

Code:

Assessments ⏱

To complete the assessments for the second task of the predict, you need to:

  • Stream ~5min of synthetic data through your data pipeline into your S3 bucket
  • Generate custom pipeline failure SNS alert email
  • Complete Task 2 MCQ
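The first action above, streaming roughly five minutes of synthetic data, can be sketched as a small generator loop. This is a minimal illustration, not the predict's provided code: the record fields (`timestamp`, `sensor_id`, `value`), the one-second interval, and the `send` callback are all assumptions made for the example.

```python
import json
import random
import time


def make_record(rng, now):
    """Build one synthetic record. The field names are hypothetical."""
    return {
        "timestamp": now,
        "sensor_id": rng.randint(1, 5),
        "value": round(rng.uniform(0.0, 100.0), 2),
    }


def stream(send, duration_s=300.0, interval_s=1.0, seed=0,
           clock=time.monotonic, sleep=time.sleep):
    """Call send(payload_bytes) about every interval_s seconds for duration_s seconds.

    `send` is whatever pushes bytes into your pipeline (an assumption here).
    `clock` and `sleep` are injectable so the loop can be tested without waiting.
    Returns the number of records sent.
    """
    rng = random.Random(seed)
    start = clock()
    sent = 0
    while clock() - start < duration_s:
        record = make_record(rng, time.time())
        # Newline-delimited JSON keeps records separable once they land in S3.
        send((json.dumps(record) + "\n").encode("utf-8"))
        sent += 1
        sleep(interval_s)
    return sent
```

If your pipeline's entry point happens to be a Kinesis Data Firehose delivery stream, `send` could wrap the boto3 call `firehose.put_record(DeliveryStreamName=..., Record={"Data": payload})`; the stream name and the choice of service depend on your own setup and are not specified here.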

Athena Upload

You need to upload a zip folder to Athena.
Here is what the zip folder should contain for a student named Dora Explorer:

  1. Dora-Explorer-VPC.yml
  2. Dora-Explorer-Windows-Instance.yml
  3. Dora-Explorer-Architecture-Diagram.png
  4. Dora-Explorer.csv (see below for example)
| Name | Surname | File Gateway S3 Bucket Name | Fileshare Alert SNS Topic ARN | Delivery Stream S3 Bucket Name | Streaming SNS Topic ARN |
| --- | --- | --- | --- | --- | --- |
| Dora | Explorer | dedoraexplorer-source-file-gateway | arn:aws:sns:region:123456789:DEDOREXP-Fileshare-transfer-alert | dedoraexplorer-deliverystream-s3 | arn:aws:sns:region:123456789:DEDOREXP-streaming-sns-topic |
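
The naming convention above can be checked with a short helper before you upload. This is a rough sketch under stated assumptions: the functions `expected_files` and `build_csv` are hypothetical names introduced for illustration, and the CSV is assumed to be comma-delimited with a header row in the column order shown in the example.

```python
import csv
import io

# Column order taken from the example submission table.
CSV_COLUMNS = [
    "Name",
    "Surname",
    "File Gateway S3 Bucket Name",
    "Fileshare Alert SNS Topic ARN",
    "Delivery Stream S3 Bucket Name",
    "Streaming SNS Topic ARN",
]


def expected_files(name, surname):
    """Return the four file names expected inside the zip folder."""
    prefix = f"{name}-{surname}"
    return [
        f"{prefix}-VPC.yml",
        f"{prefix}-Windows-Instance.yml",
        f"{prefix}-Architecture-Diagram.png",
        f"{prefix}.csv",
    ]


def build_csv(row):
    """Serialise one submission row (a dict keyed by CSV_COLUMNS) to CSV text.

    Assumes a comma-delimited file with a header row; raises ValueError
    if any column is missing or empty.
    """
    missing = [column for column in CSV_COLUMNS if not row.get(column)]
    if missing:
        raise ValueError(f"missing values for: {missing}")
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerow(row)
    return buffer.getvalue()
```

Calling `expected_files("Dora", "Explorer")` yields the four file names listed above, which you can compare against the contents of your zip folder.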

FAQ ❓

This section of the repo will be updated periodically to reflect common questions about its use. If you find any problems or bugs, please create an issue and we will do our best to resolve it as quickly as possible.

We wish you all the best in your learning experience 🚀!
