
dagster-io/ebook-scaling-data-teams


Scaling Data Teams

A comprehensive data engineering platform demonstrating best practices and patterns for teams of different sizes. This project shows how data engineering practices evolve as teams grow from individual contributors to large-scale engineering organizations.

🏗️ Architecture Overview

This project implements a modern data stack using:

  • Dagster - Data orchestration and asset management
  • dbt - Data transformation and modeling
  • dlt - Data loading and ingestion
  • Sling - Data replication and streaming
  • DuckDB - Analytics database

Supporting Infrastructure

  • PostgreSQL - Relational database for structured data
  • LocalStack - Local AWS S3 emulation
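
Dagster ties these tools together by treating each tool's output as an asset and running assets in dependency order. The sketch below illustrates only that ordering idea with the standard library's `graphlib`. It is not Dagster's API, and the asset names are hypothetical placeholders for a dlt load, a Sling replication, dbt models in DuckDB, and an export to LocalStack S3.

```python
from graphlib import TopologicalSorter

# Hypothetical asset graph: each key maps to the set of assets it depends on.
# Names are illustrative only and are not taken from this repository.
ASSET_DEPS: dict[str, set[str]] = {
    "raw_orders": set(),                                  # loaded by dlt
    "raw_customers": set(),                               # replicated by Sling
    "stg_orders": {"raw_orders"},                         # dbt staging model
    "customer_orders": {"stg_orders", "raw_customers"},   # dbt mart in DuckDB
    "customer_orders_export": {"customer_orders"},        # export to S3
}


def materialization_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an order in which every asset comes after all of its upstreams.

    Raises graphlib.CycleError if the dependency graph contains a cycle.
    """
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for step, asset in enumerate(materialization_order(ASSET_DEPS), start=1):
        print(step, asset)
```

Any valid order keeps each asset after its upstreams. Assets with no path between them, such as `raw_orders` and `raw_customers`, may appear in either order, and that independence is what lets an orchestrator run them in parallel.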

📊 Team Size Patterns

This project demonstrates data engineering patterns for different organizational scales:

  • 1 Person Team - Individual contributor patterns with basic dbt transformations
  • 5 Person Team - Small team collaboration with asset dependencies
  • 10 Person Team - Medium team structure with specialized roles
  • 20 Person Team - Large team patterns with advanced orchestration

Start Infrastructure Services

docker-compose up -d
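
`docker-compose up -d` returns once the containers have started, which can be before PostgreSQL and LocalStack accept connections. A minimal readiness check, assuming the services are published on their default host ports (5432 for PostgreSQL, 4566 for LocalStack; confirm against `docker-compose.yaml`), polls each port with a plain TCP connect. The helper names are hypothetical.

```python
import socket
import time

# Assumed default host ports; check docker-compose.yaml for the real mappings.
SERVICES: dict[str, int] = {"postgres": 5432, "localstack": 4566}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


def wait_for_services(
    services: dict[str, int],
    host: str = "localhost",
    deadline_s: float = 60.0,
    interval_s: float = 2.0,
) -> dict[str, bool]:
    """Poll each port until all accept connections or the deadline passes."""
    end = time.monotonic() + deadline_s
    status = {name: False for name in services}
    while True:
        for name, port in services.items():
            if not status[name]:
                status[name] = port_open(host, port)
        if all(status.values()) or time.monotonic() >= end:
            return status
        time.sleep(interval_s)


if __name__ == "__main__":
    print(wait_for_services(SERVICES))
```

A successful connect only shows that something is listening. PostgreSQL may still be initializing, so a client-level check such as `pg_isready` is stricter.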

Run Dagster Web Server

dg dev

The Dagster UI will be available at http://localhost:3000
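
`dg dev` can take a few seconds to load definitions before the UI responds. For scripts that need to wait on it, a minimal sketch, assuming the default URL above, checks whether the web server answers HTTP requests. The function name `ui_is_up` is hypothetical and not part of Dagster.

```python
import time
import urllib.error
import urllib.request

# Default URL from this README; change it if `dg dev` listens elsewhere.
DAGSTER_UI_URL = "http://localhost:3000"

# An opener with an empty ProxyHandler ignores HTTP(S)_PROXY settings,
# since the target is a local server.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def ui_is_up(url: str = DAGSTER_UI_URL, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `url` without a 5xx error."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        # A 4xx response still proves the server is running.
        return err.code < 500
    except OSError:
        # Connection refused, timeouts, and other URLError cases
        # (URLError is a subclass of OSError).
        return False


if __name__ == "__main__":
    for _ in range(30):
        if ui_is_up():
            print("Dagster UI is up at", DAGSTER_UI_URL)
            break
        time.sleep(1)
    else:
        print("Dagster UI did not respond after 30 attempts")
```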

📁 Project Structure

├── src/ebook/                    # Main Dagster Python package
│   ├── components/               # Reusable Dagster components
│   │   └── export.py             # Custom Dagster S3 export component
│   ├── defs/                     # Dagster definitions
│   │   ├── assets/               # Data assets
│   │   ├── dbt/                  # dbt integration
│   │   ├── dlt/                  # dlt integration
│   │   ├── export/               # Custom Dagster S3 component configuration
│   │   └── sling/                # Sling integration
│   └── definitions.py            # Main definitions entry point
├── dbt_project/                  # dbt transformations
├── tests/                        # Test scenarios by team size
└── docker-compose.yaml           # Infrastructure services
