A data engineering platform that demonstrates practices and patterns for teams of different sizes. The project shows how data engineering practices evolve as a team grows from a single contributor to a 20-person engineering organization.
This project implements a modern data stack using:
- Dagster - Data orchestration and asset management
- dbt - Data transformation and modeling
- dlt - Data loading and ingestion
- Sling - Data replication and streaming
- DuckDB - Analytics database
- PostgreSQL - Relational database for structured data
- LocalStack - Local AWS S3 emulation
This project demonstrates data engineering patterns for different organizational scales:
- 1 Person Team - Individual contributor patterns with basic dbt transformations
- 5 Person Team - Small team collaboration with asset dependencies
- 10 Person Team - Medium team structure with specialized roles
- 20 Person Team - Large team patterns with advanced orchestration
docker-compose up -d
dg dev
The Dagster UI will be available at http://localhost:3000
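When scripting the startup, it can help to wait until the UI actually responds before opening it or running anything against it. The following is a minimal sketch using only the Python standard library. The helper name `wait_for_ui` and its `timeout` and `interval` parameters are hypothetical and not part of this project; only the URL comes from the text above.

```python
import http.client
import time
import urllib.error
import urllib.request

# Talk to the local server directly, ignoring any proxy settings in the environment.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_ui(url: str = "http://localhost:3000",
                timeout: float = 30.0,
                interval: float = 1.0) -> bool:
    """Poll `url` until a server answers or `timeout` seconds have passed.

    Hypothetical helper: returns True as soon as any HTTP response arrives,
    and False if nothing answered before the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with _OPENER.open(url, timeout=2.0):
                return True
        except urllib.error.HTTPError:
            # The server replied with an error status, so it is up.
            return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            # Nothing is listening yet (or the connection dropped); retry below.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_for_ui()` after the commands above returns `True` once something answers on port 3000. It only checks that an HTTP server responds, not that the Dagster definitions loaded correctly.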
├── src/ebook/ # Main Dagster Python package
│ ├── components/ # Reusable Dagster components
│ │ └── export.py # Custom Dagster S3 export component
│ ├── defs/ # Dagster definitions
│ │ ├── assets/ # Data assets
│ │ ├── dbt/ # dbt integration
│ │ ├── dlt/ # dlt integration
│ │ ├── export/ # Custom Dagster S3 component configuration
│ │ └── sling/ # Sling integration
│ └── definitions.py # Main definitions entry point
├── dbt_project/ # dbt transformations
├── tests/ # Test scenarios by team size
└── docker-compose.yaml # Infrastructure services