pizofreude/de-zoomcamp

Data Engineering Zoomcamp

Overview

Welcome to my Data Engineering Zoomcamp repository. It collects the learning materials, notes, homework, projects, and extra exercises I completed during the Data Engineering Zoomcamp, and is meant to serve as a hands-on guide to data engineering concepts and practices.

Description

This repository contains all the materials and code developed during the Data Engineering Zoomcamp. The course covers data ingestion, data transformation, and data warehousing using modern tools and frameworks, with the tool selection shaped by the course's sponsorship partners.

Technologies Used

  • Python: For scripting and data manipulation.
  • Kestra: For orchestrating workflows.
  • dlt: For data ingestion.
  • PostgreSQL: As the relational database management system.
  • BigQuery: For data warehousing, partitioning & clustering, and machine learning.
  • Docker: For containerization of applications.
  • dbt: For analytics engineering.
  • Spark: For batch processing.
  • Apache Kafka: For real-time data streaming.
  • Pandas: For data analysis and manipulation.
  • SQLAlchemy: For database interaction.

Resources

Course Notes

All notes will be centralized within this directory.

Final Project

The Final Project directory showcases a data engineering project built to support business intelligence. It includes:

  • ETL Pipelines: Well-structured Extract, Transform, Load (ETL) processes that integrate diverse data sources into a cohesive data warehouse.
  • Data Models: Comprehensive data models optimized for analytical queries, facilitating efficient data retrieval and reporting.
  • Documentation: Detailed guides on pipeline architecture, data flow, and usage instructions for stakeholders.
  • Dashboards: Interactive dashboards built using BI tools, demonstrating key metrics and visualizations derived from the processed data.
  • Testing Suite: Automated tests to validate data integrity and pipeline performance, ensuring reliable analytics.

This directory serves as a practical demonstration of data engineering principles applied to generate actionable business insights, aligning with the goals of a business intelligence analyst.

Contributing

All contributions from the community are welcome 👍. To ensure a smooth collaboration process, please follow these guidelines:

  1. Fork the Repository: Start by forking the repository to your own GitHub account.
  2. Clone Your Fork: Clone your forked repository to your local machine using:
    git clone https://github.com/your-username/repo-name.git
  3. Create a Branch: Create a new branch for your feature or bug fix:
    git checkout -b category/reference/description-in-kebab-case
  4. Make Changes: Implement your changes and ensure they are well-documented.
  5. Commit Your Changes: Commit your changes with a clear message:
    git commit -m 'category: do something; do some other things'
  6. Push to Your Fork: Push your changes to your forked repository:
    git push origin category/reference/description-in-kebab-case
  7. Submit a Pull Request: Navigate to the original repository and submit a pull request. Provide a detailed description of your changes and why they should be merged.

We appreciate your contributions and will review your pull request as soon as possible. Please follow the simplified naming convention for branches and commits, as summarized here.
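
The branch and commit patterns used in the steps above (`category/reference/description-in-kebab-case` and `category: do something; do some other things`) can be sketched as a small Python helper. This is only an illustration and not part of the repository: the function names `to_kebab_case`, `branch_name`, and `commit_message` are hypothetical, and the exact rules (lowercasing, which characters are allowed) are assumptions inferred from the patterns, not the official convention.

```python
import re


def to_kebab_case(text: str) -> str:
    """Lowercase the text and join its alphanumeric runs with hyphens."""
    # Assumption: anything that is not a letter or digit acts as a separator.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words)


def branch_name(category: str, reference: str, description: str) -> str:
    """Build a name shaped like category/reference/description-in-kebab-case."""
    parts = [to_kebab_case(p) for p in (category, reference, description)]
    if not all(parts):
        raise ValueError("category, reference, and description must be non-empty")
    return "/".join(parts)


def commit_message(category: str, *changes: str) -> str:
    """Build a message shaped like 'category: do something; do some other things'."""
    if not changes:
        raise ValueError("at least one change description is required")
    return f"{category}: " + "; ".join(changes)


if __name__ == "__main__":
    # Example values are made up for illustration.
    print(branch_name("docs", "readme", "Fix broken links"))
    print(commit_message("docs", "fix broken links", "update license section"))
```

The resulting strings can be passed to `git checkout -b` and `git commit -m` as shown in steps 3 and 5.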

License

This project is licensed under the Apache License 2.0. You are free to use, modify, and distribute these materials, provided that proper attribution is given to the original authors.

For more details, please refer to the LICENSE file in the repository.

Acknowledgments

  • We would like to thank all the instructors for their hard work and dedication to the Data Engineering Zoomcamp.

  • Special thanks to our sponsorship partners Kestra and dlt for their support.