A collection of learning resources for curious data engineers
This repo collects resources that I have found truly inspiring, to support your growth as a data engineer.
- Fundamentals of Data Engineering
- Designing Data-Intensive Applications
- Data Mesh
- The Data Warehouse Toolkit
- Refactoring Databases
- Learning Spark
- Kafka: The Definitive Guide
- Data Pipelines with Apache Airflow
- Designing Machine Learning Systems
- Fundamentals of Software Architecture
- Understanding Distributed Systems
- Building Microservices
- Building Evolutionary Architectures
- Domain-Driven Design
- System Design Interview: Volume 1
- System Design Interview: Volume 2
- The Hadoop Distributed File System
- MapReduce: Simplified Data Processing on Large Clusters
- Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores
- Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics
- Photon: A Fast Query Engine for Lakehouse Systems
- Dynamo: Amazon’s Highly Available Key-value Store