
Commit

Merge pull request #18 from nawinrajkumar/nawin
Nawin
nivu committed Aug 7, 2023
2 parents 4897442 + a685aa1 commit 6f39c90
Showing 2 changed files with 37 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -47,6 +47,7 @@ _Navaneeth Malingan_
- [PyImageSearch](https://www.pyimagesearch.com/start-here/)
- [5 Beginner Friendly Steps to Learn Machine Learning and Data Science with Python](https://www.mrdbourke.com/5-beginner-friendly-steps-to-learn-machine-learning/)


## Intro to ML

- [Luis Serrano: A Friendly Introduction to Machine Learning](https://www.youtube.com/watch?v=IpGxLWOIZy4)
36 changes: 36 additions & 0 deletions data_engineering/README.md
@@ -0,0 +1,36 @@
# Data Engineering Resources


Data engineering is a field of work that involves **designing, building, and managing the infrastructure** and systems required to **collect, store, process, and analyze data**. Data engineers play a crucial role in the data lifecycle, ensuring that data is available, accessible, and reliable for various data-driven applications and decision-making processes.

---
## Here are some key resources for Data Engineering
---
### Batch Processing


**Batch processing** is a data processing technique in which data is collected over a period of time and then processed together as a group, or batch, rather than in real time or immediately upon arrival. To understand the basics of data engineering and batch processing, see these resources:

- [Understanding Data Engineering by Datacamp](https://app.datacamp.com/learn/courses/understanding-data-engineering)
- [Introduction to Data Engineering by Datacamp](https://app.datacamp.com/learn/courses/introduction-to-data-engineering)
- [Apache Spark Tutorial (used for Large Scale Data Processing using SQL commands)](https://spark.apache.org/docs/latest/sql-getting-started.html)
- [Test your knowledge using ProjectPro](https://www.projectpro.io/article/big-data-interview-questions-/773)
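
To make the idea of processing data in predefined batches concrete, here is a minimal sketch using only the Python standard library. The helper names (`batched`, `process_batch`) and the sample data are illustrative assumptions, not taken from any of the resources above; a real batch job would typically run on a scheduler or an engine such as Spark.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched(records: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive lists of at most batch_size records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def process_batch(batch: List[int]) -> int:
    """Hypothetical batch job: sum the values collected in one batch."""
    return sum(batch)


# Records that were collected over time (made-up sample data).
collected = [3, 1, 4, 1, 5, 9, 2]

# Each group of three records is processed together, not one by one.
batch_totals = [process_batch(b) for b in batched(collected, batch_size=3)]
print(batch_totals)  # [8, 15, 2]
```

The last batch is smaller than the others because the input does not divide evenly; real batch jobs usually have to handle this case as well.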


### Stream Processing
**Stream processing** is a method of data processing that continuously processes and analyzes data in real time, as it is generated or received. It handles data in motion, allowing for immediate insights and actions based on the streaming data. Here are some resources:
- [Introduction to Apache Kafka Streams](https://kafka.apache.org/documentation/streams/)
- [Apache Flink Documentation](https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/try-flink/datastream/)
- [Stream Processing Quiz](https://chauff.github.io/documents/bdp-quiz/streaming.html)
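
As a rough illustration of processing data as it arrives, the sketch below updates a running mean after every event instead of waiting for a full batch. It uses a plain Python generator as a stand-in for a live source; `sensor_readings` and `running_mean` are hypothetical names, and this is not how Kafka Streams or Flink are programmed.

```python
from typing import Iterable, Iterator


def running_mean(events: Iterable[float]) -> Iterator[float]:
    """Emit an updated mean immediately after each incoming event."""
    count = 0
    total = 0.0
    for value in events:
        count += 1
        total += value
        yield total / count


def sensor_readings() -> Iterator[float]:
    """Hypothetical source standing in for a live, unbounded stream."""
    yield from [10.0, 20.0, 30.0]


# A result is available after every event, not only at the end.
stream_means = list(running_mean(sensor_readings()))
print(stream_means)  # [10.0, 15.0, 20.0]
```

Only a count and a running total are kept in memory, which is what lets this style of computation work on streams that never end.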


### Data Pipelines and Integration

**Data pipelines and integration** are critical components of data engineering that involve moving, transforming, and integrating data from various sources into a destination for further processing, analysis, or storage. They ensure that data flows seamlessly and reliably across different systems, enabling efficient data management and use. See these resources:
- [Building Data Engineering Pipelines in Python](https://app.datacamp.com/learn/courses/building-data-engineering-pipelines-in-python)
- ["What is Data Integration?" by Talend](https://www.talend.com/resources/what-is-data-integration/)
- [Data Cleaning Challenge: Handling missing values](https://www.kaggle.com/code/rtatman/data-cleaning-challenge-handling-missing-values/notebook)
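
The movement, transformation, and loading steps described above can be sketched as a tiny extract-transform-load pipeline. This is only an assumed, minimal example built on the standard library: the CSV contents, the `payments` table, and the choice to fill missing amounts with `0` are all made up for illustration.

```python
import csv
import io
import sqlite3
from typing import Dict, Iterable, Iterator, Tuple

# Made-up source data; note the missing amount for "bob".
RAW_CSV = """name,amount
alice,10
bob,
carol,25
"""


def extract(text: str) -> Iterator[Dict[str, str]]:
    """Read rows from the CSV source."""
    yield from csv.DictReader(io.StringIO(text))


def transform(rows: Iterable[Dict[str, str]]) -> Iterator[Tuple[str, int]]:
    """Clean each row: capitalize names, fill missing amounts with 0."""
    for row in rows:
        amount = int(row["amount"]) if row["amount"] else 0
        yield row["name"].title(), amount


def load(rows: Iterable[Tuple[str, int]], conn: sqlite3.Connection) -> None:
    """Write the cleaned rows to the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments (name TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory destination for the demo
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT name, amount FROM payments ORDER BY name"
).fetchall()
print(loaded)  # [('Alice', 10), ('Bob', 0), ('Carol', 25)]
```

Keeping extract, transform, and load as separate functions makes each stage easy to test and swap out, for example by replacing the in-memory database with a real warehouse connection.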

---

Data engineering requires knowledge of programming languages (such as Python, Java, or Scala), database systems, big data technologies, cloud platforms, data modeling, and data warehousing concepts. Data engineers also need to keep up with the evolving landscape of data technologies and best practices to ensure efficient and effective data management.
