Scaling Data Teams

A reference data engineering platform that demonstrates practices and patterns for teams of different sizes. It shows how data engineering practices evolve from a single contributor to a large engineering organization.

🏗️ Architecture Overview

This project implements a modern data stack using:

Dagster - Data orchestration and asset management
dbt - Data transformation and modeling
dlt - Data loading and ingestion
Sling - Data replication and streaming
DuckDB - Analytics database

Supporting Infrastructure

PostgreSQL - Relational database for structured data
LocalStack - Local AWS S3 emulation

📊 Team Size Patterns

This project demonstrates data engineering patterns for different organizational scales:

1 Person Team - Individual contributor patterns with basic dbt transformations
5 Person Team - Small team collaboration with asset dependencies
10 Person Team - Medium team structure with specialized roles
20 Person Team - Large team patterns with advanced orchestration
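The "asset dependencies" idea that appears from the 5 Person Team pattern onward can be illustrated without any orchestrator. The sketch below is not Dagster code and does not reflect this repository's actual assets: the asset names are hypothetical, and the standard-library `graphlib` module stands in for the dependency resolution that an orchestrator performs when it decides what to materialize first.

```python
from graphlib import TopologicalSorter

# Hypothetical asset names (not taken from this repository).
# Each key maps to the set of upstream assets it depends on.
ASSET_DEPS = {
    "raw_orders": set(),
    "raw_customers": set(),
    "stg_orders": {"raw_orders"},
    "stg_customers": {"raw_customers"},
    "customer_orders": {"stg_orders", "stg_customers"},
    "s3_export": {"customer_orders"},
}


def materialization_order(deps):
    """Return one valid order in which every asset follows its upstreams."""
    return list(TopologicalSorter(deps).static_order())


order = materialization_order(ASSET_DEPS)
print(order)
```

Several orders can be valid for the same graph; the only guarantee is that an asset never appears before something it depends on. A cycle in the mapping raises `graphlib.CycleError`.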

Start Infrastructure Services

docker-compose up -d

Run Dagster Web Server

dg dev

Once the server has started, the Dagster UI is available at http://localhost:3000.
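When the start-up steps run from a script, it can help to wait until the UI actually responds before continuing. The following is a minimal sketch using only the standard library; the helper name `wait_for_ui` and its timeout defaults are assumptions for illustration, and only the URL comes from this README.

```python
import time
import urllib.error
import urllib.request


def wait_for_ui(url="http://localhost:3000", timeout=30.0, interval=1.0):
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass.

    Returns True if the server responded in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Server is not accepting connections yet (or returned an error).
            pass
        time.sleep(interval)
    return False
```

For example, calling `wait_for_ui()` shortly after launching `dg dev` in another terminal returns `True` once the web server is reachable, or `False` if nothing answers within 30 seconds.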

📁 Project Structure

├── src/ebook/                    # Main Dagster Python package
│   ├── components/               # Reusable Dagster components
│   │   └── export.py             # Custom Dagster S3 export component
│   ├── defs/                     # Dagster definitions
│   │   ├── assets/               # Data assets
│   │   ├── dbt/                  # dbt integration
│   │   ├── dlt/                  # dlt integration
│   │   ├── export/               # Custom Dagster S3 component configuration
│   │   └── sling/                # Sling integration
│   └── definitions.py            # Main definitions entry point
├── dbt_project/                  # dbt transformations
├── tests/                        # Test scenarios by team size
└── docker-compose.yaml           # Infrastructure services
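As a quick sanity check after cloning, the layout above can be compared against what is on disk. This is a sketch under the assumption that it runs from the repository root; the helper `missing_paths` is hypothetical and not part of the project, and the path list is copied from the tree above.

```python
from pathlib import Path

# Paths copied from the project structure tree, relative to the repository root.
EXPECTED_PATHS = [
    "src/ebook/components/export.py",
    "src/ebook/defs/assets",
    "src/ebook/defs/dbt",
    "src/ebook/defs/dlt",
    "src/ebook/defs/export",
    "src/ebook/defs/sling",
    "src/ebook/definitions.py",
    "dbt_project",
    "tests",
    "docker-compose.yaml",
]


def missing_paths(root=".", expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    return [p for p in expected if not (root / p).exists()]
```

Calling `missing_paths()` from the repository root should return an empty list when the checkout matches the tree; any entries it returns are the paths that were not found.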

dagster-io/ebook-scaling-data-teams
