Week 2: Workflow Orchestration

The Python code from the videos is linked below.

If the commands in Kalise's videos are too small to read, transcripts with code are available for the second Prefect video and the fifth Prefect video.

Data Lake (GCS)

  • What is a Data Lake
  • ELT vs. ETL
  • Alternatives to components (S3/HDFS, Redshift, Snowflake etc.)
  • Video
  • Slides

1. Introduction to Workflow Orchestration

  • What is orchestration?
  • Workflow orchestrators vs. other types of orchestrators
  • Core features of a workflow orchestration tool
  • Different types of workflow orchestration tools that currently exist

🎥 Video

2. Introduction to Prefect concepts

  • What is Prefect?
  • Installing Prefect
  • Prefect flow
  • Creating an ETL
  • Prefect task
  • Blocks and collections
  • Orion UI

🎥 Video

3. ETL with GCP & Prefect

  • Flow 1: Loading data into Google Cloud Storage

🎥 Video

4. From Google Cloud Storage to BigQuery

  • Flow 2: From GCS to BigQuery

🎥 Video

5. Parametrizing Flow & Deployments

  • Parametrizing the script from your flow
  • Parameter validation with Pydantic
  • Creating a deployment locally
  • Setting up Prefect Agent
  • Running the flow
  • Notifications

🎥 Video

6. Schedules & Docker Storage with Infrastructure

  • Scheduling a deployment
  • Flow code storage
  • Running tasks in Docker

🎥 Video

7. Prefect Cloud and Additional Resources

  • Using Prefect Cloud instead of local Prefect
  • Workspaces
  • Running flows on GCP

🎥 Video

Code repository

Code from videos (with a few minor enhancements)

Homework

Homework can be found here.

Community notes

Did you take notes? You can share them here.