JakhongirK/Data_Engineering_Simplified

 
 

Data Engineering Roadmap

  1. Learn SQL: aggregations with GROUP BY, joins (INNER, LEFT, FULL OUTER), window functions, common table expressions, etc.

You can learn from https://www.w3schools.com/
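As a rough illustration of the SQL topics listed above, here is a minimal sketch using Python's built-in `sqlite3` module. The `customers` and `orders` tables and their rows are made up for this example, and the window function assumes the bundled SQLite is version 3.25 or newer.

```python
import sqlite3

# In-memory database with two small, made-up tables (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cara');
    INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 30.0), (3, 2, 20.0);
""")

# CTE + LEFT JOIN + GROUP BY: total spend per customer,
# keeping customers that have no orders at all.
totals_sql = """
    WITH totals AS (
        SELECT c.name AS name, COALESCE(SUM(o.amount), 0) AS total
        FROM customers AS c
        LEFT JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.id, c.name
    )
    SELECT name, total FROM totals ORDER BY total DESC, name
"""
totals = conn.execute(totals_sql).fetchall()

# Window function: running total of order amounts per customer,
# ordered by order id (assumes SQLite >= 3.25).
running_sql = """
    SELECT customer_id, amount,
           SUM(amount) OVER (PARTITION BY customer_id ORDER BY id) AS running_total
    FROM orders
    ORDER BY customer_id, id
"""
running = conn.execute(running_sql).fetchall()

print(totals)
print(running)
```

The LEFT JOIN is what keeps the customer without orders in the result; an INNER JOIN would drop that row.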

  2. Learn Python/Scala: the basics (for/while loops, if statements), functional programming, abstract methods, and traits; then libraries like NumPy, pandas, scikit-learn, etc.

You can learn from https://lnkd.in/gSz45km5
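The language basics named above can be sketched in a few lines of Python. The class names `Transform` and `Double` are hypothetical, chosen only for this example; an abstract base class is used here as a loose Python counterpart to an abstract method contract (Scala traits have no exact equivalent in Python).

```python
from abc import ABC, abstractmethod
from functools import reduce


class Transform(ABC):
    """Abstract base class: subclasses must implement apply()."""

    @abstractmethod
    def apply(self, value: int) -> int:
        ...


class Double(Transform):
    """Concrete subclass that doubles its input."""

    def apply(self, value: int) -> int:
        return value * 2


# for loop with an if statement: keep only the even numbers from 1 to 6.
evens = []
for n in range(1, 7):
    if n % 2 == 0:
        evens.append(n)

# while loop: count how many halvings it takes for 100 to drop below 1.
halvings, x = 0, 100.0
while x >= 1:
    x /= 2
    halvings += 1

# Functional style: map a function over the list, then fold it with reduce.
doubled = list(map(Double().apply, evens))
total = reduce(lambda acc, v: acc + v, doubled, 0)

print(evens, halvings, doubled, total)
```

Trying to instantiate `Transform` directly raises a `TypeError`, which is how the abstract method is enforced.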

  3. Learn distributed computing: Hadoop versions and architecture, fault tolerance in Hadoop, MapReduce processing, the optimizations used in MapReduce, etc.

  4. Learn data ingestion tools: Sqoop, Kafka, and NiFi. Understand their functionality and how they run jobs.

  5. Learn data processing/NoSQL: Spark architecture, RDDs, DataFrames, and Datasets; lazy evaluation, DAGs, lineage graphs, and optimization techniques; YARN utilization, Spark Streaming, etc.

  6. Learn data warehousing: understand how Hive stores and processes data, the different file formats and compression techniques, partitioning and bucketing, the different UDFs available in Hive, and SCD concepts. E.g. HBase, Cassandra.

  7. Learn job orchestration: Airflow and Oozie, workflows, cron, etc.

  8. Learn cloud computing: Azure, AWS, or GCP. Understand the significance of the cloud in data engineering. Learn Azure Synapse, Redshift, and BigQuery, and ingestion/pipeline tools like ADF.

  9. Learn the basics of CI/CD and Linux commands. Read about Kubernetes and Docker and how crucial they are in data engineering.
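To make the MapReduce processing model from the distributed computing step more concrete, here is a single-process sketch of a word count in plain Python. It only imitates the map, shuffle, and reduce phases on one machine; it is not Hadoop code, and the function names and input lines are made up for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


# Illustrative input; in a real cluster each line could live on a different node.
lines = ["big data is big", "data pipelines move data"]

mapped = chain.from_iterable(map_phase(line) for line in lines)
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(key, values) for key, values in grouped.items())

print(counts)
```

In a real cluster the map and reduce calls run in parallel on different nodes and the shuffle moves data over the network, which is where most of the optimization effort tends to go.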
