Welcome to my Data Engineering Zoomcamp repository. This repository is a comprehensive guide to mastering data engineering concepts and practices through hands-on exercises and real-world applications. It includes a collection of learning materials, notes, homework assignments, projects, and extra exercises completed during the Data Engineering Zoomcamp.
- Description
- Technologies Used
- Resources
- Course Notes
- Final Project
- Contributing
- License
- Acknowledgments
This repository contains all the materials and code developed during the Data Engineering Zoomcamp. The course covers various aspects of data engineering, including data ingestion, data transformation, and data warehousing, using modern tools and frameworks from the course's sponsorship partners.
- Python: For scripting and data manipulation.
- Kestra: For orchestrating workflows.
- dlt: For data ingestion.
- PostgreSQL: As the relational database management system.
- BigQuery: For data warehousing, partitioning & clustering, and machine learning.
- Docker: For containerization of applications.
- dbt: For analytics engineering.
- Spark: For batch processing.
- Apache Kafka: For real-time data streaming.
- Pandas: For data analysis and manipulation.
- SQLAlchemy: For database interaction.
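As a small illustration of how pandas and SQLAlchemy from the list above work together for ingestion, here is a hedged, self-contained sketch. It uses an in-memory SQLite engine so it runs anywhere; in the course itself you would point the engine at PostgreSQL instead (for example a `postgresql://user:pass@host:5432/dbname` URL). The table and column names are illustrative only.

```python
# Minimal ingestion sketch: load a DataFrame into a database with pandas + SQLAlchemy.
# SQLite in-memory is used so the example is self-contained; swap the URL for your
# PostgreSQL instance in a real pipeline.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///:memory:")

# A tiny illustrative dataset standing in for a real extract
df = pd.DataFrame({
    "trip_id": [1, 2, 3],
    "distance_km": [2.5, 7.1, 0.9],
})

# Write the table, replacing it if it already exists
df.to_sql("trips", engine, if_exists="replace", index=False)

# Read it back to confirm the load
loaded = pd.read_sql("SELECT COUNT(*) AS n FROM trips", engine)
print(int(loaded["n"][0]))  # 3
```

The same `to_sql` / `read_sql` pattern appears throughout the course's ingestion exercises; only the connection URL changes between local PostgreSQL and other backends.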
All notes will be centralized within this directory.
The Final Project directory showcases a data engineering project designed to deliver business intelligence insights. It includes:
- ETL Pipelines: Well-structured Extract, Transform, Load (ETL) processes that integrate diverse data sources into a cohesive data warehouse.
- Data Models: Comprehensive data models optimized for analytical queries, facilitating efficient data retrieval and reporting.
- Documentation: Detailed guides on pipeline architecture, data flow, and usage instructions for stakeholders.
- Dashboards: Interactive dashboards built using BI tools, demonstrating key metrics and visualizations derived from the processed data.
- Testing Suite: Automated tests to validate data integrity and pipeline performance, ensuring reliable analytics.
This directory serves as a practical demonstration of data engineering principles applied to generate actionable business insights, aligning with the goals of a business intelligence analyst.
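To make the ETL pattern described above concrete, here is a hedged minimal sketch of an extract → transform → load pipeline. It is not the project's actual code: the function names, columns, and the `orders` table are hypothetical, and an in-memory SQLite engine stands in for the real data warehouse.

```python
# Hypothetical minimal ETL sketch: extract raw records, clean and enrich them,
# then load them into an analytical table. Names are illustrative only.
import pandas as pd
from sqlalchemy import create_engine

def extract() -> pd.DataFrame:
    # A real pipeline would pull from an API, file drop, or source database.
    return pd.DataFrame({
        "order_id": [101, 102, 103],
        "amount": ["10.50", "20.00", "5.25"],  # raw strings, as sources often deliver
    })

def transform(df: pd.DataFrame) -> pd.DataFrame:
    # Cast types and derive a flag suitable for analytical queries.
    out = df.copy()
    out["amount"] = out["amount"].astype(float)
    out["is_large_order"] = out["amount"] > 15
    return out

def load(df: pd.DataFrame, engine) -> None:
    df.to_sql("orders", engine, if_exists="replace", index=False)

engine = create_engine("sqlite:///:memory:")  # swap for your warehouse connection
load(transform(extract()), engine)
print(int(pd.read_sql(
    "SELECT COUNT(*) AS n FROM orders WHERE is_large_order", engine
)["n"][0]))  # 1
```

Splitting the pipeline into three small functions mirrors how orchestrators such as Kestra schedule each stage as a separate task, which makes failures easier to isolate and retry.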
All contributions from the community are welcome 👍. To ensure a smooth collaboration process, please follow these guidelines:
- Fork the Repository: Start by forking the repository to your own GitHub account.
- Clone Your Fork: Clone your forked repository to your local machine using:

  ```bash
  git clone https://github.com/your-username/repo-name.git
  ```

- Create a Branch: Create a new branch for your feature or bug fix:

  ```bash
  git checkout -b category/reference/description-in-kebab-case
  ```

- Make Changes: Implement your changes and ensure they are well-documented.
- Commit Your Changes: Commit your changes with a clear message:

  ```bash
  git commit -m 'category: do something; do some other things'
  ```

- Push to Your Fork: Push your changes to your forked repository:

  ```bash
  git push origin category/reference/description-in-kebab-case
  ```

- Submit a Pull Request: Navigate to the original repository and submit a pull request. Provide a detailed description of your changes and why they should be merged.
We appreciate your contributions and will review your pull request as soon as possible. Please follow the simplified naming convention for branches and commits as summarized here.
This project is licensed under the Apache 2.0 License. You are free to use, modify, and distribute these materials, provided that proper attribution is given to the original authors.
For more details, please refer to the LICENSE file in the repository.