Finfra/bigdataAwsPreLab

FMS BigData AWS Pre-Lab

A hands-on lab course for building a real-time big data processing system on AWS EC2

🎯 Project Overview

This is a comprehensive hands-on project that builds an FMS (Factory Management System) big data pipeline on AWS EC2 using Terraform and Ansible. It implements a complete data pipeline that collects, processes, stores, and monitors sensor data from 100 devices in real time.

Core Technology Stack

  • Data collection: Python Collector + FMS API
  • Message queue: Apache Kafka (2-node cluster, on EC2)
  • Stream processing: Apache Spark Streaming (on EC2)
  • Distributed storage: Hadoop with AWS S3 configured as the storage backend (on EC2)
  • Monitoring: Grafana + Prometheus (on EC2)
  • Infrastructure management: AWS EC2, Terraform, Ansible

Key Performance Metrics

  • Throughput: 47 msg/sec (94% of the 50 msg/sec target)
  • Latency: 25 seconds end to end (target: within 30 seconds)
  • Availability: 99.7% (target: 99% or higher)
  • Data quality: 96.2% (target: 95% or higher)
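The 94% figure for throughput is the measured value divided by the target. The sketch below reproduces that arithmetic using only the numbers listed above; the function name `attainment_pct` is illustrative and does not come from the repository.

```python
def attainment_pct(measured: float, target: float) -> float:
    """Return `measured` as a percentage of `target`, rounded to one decimal."""
    return round(measured / target * 100, 1)


# Throughput: 47 msg/sec measured against a 50 msg/sec target.
throughput_pct = attainment_pct(47, 50)

# Latency is a "lower is better" metric, so the check is a plain comparison:
# 25 seconds measured against a 30-second ceiling.
latency_within_target = 25 <= 30

print(throughput_pct, latency_within_target)
```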

📚 Course Outline

| Ch. | Folder | Title | Key content | Main deliverables |
| --- | --- | --- | --- | --- |
| 1 | 01-pre-lab-introduction/ | Pre-Lab introduction | Project goals, AWS lab environment overview, account preparation | Environment checklist |
| 2 | 02-aws-account-setup/ | AWS account and permission setup | IAM, key pair, S3 bucket, basic VPC settings | Account/permission checklist, S3 bucket |
| 3 | 03-infra-provisioning/ | Infrastructure automation (IaC) | Automatic creation of EC2, VPC, SG, S3, etc. with Terraform | Terraform code, infrastructure diagram |
| 4 | 04-ansible-automation/ | Automated service deployment | Automated Hadoop/Spark installation with Ansible | Ansible playbooks, deployment logs |
| 5 | 05-architecture-design/ | Architecture design and review | AWS-based distributed architecture design, risk analysis | Architecture diagram, risk analysis |
| 6 | 06-hadoop-spark-cluster/ | Hadoop/Spark cluster build | EC2-based distributed storage (with AWS S3 as the storage backend) and compute environment | Cluster build scripts, monitoring |
| 7 | 07-kafka-streaming/ | Kafka real-time streaming | EC2-based Kafka cluster, data collector | Kafka cluster, data collector |
| 8 | 08-data-quality-requirements/ | Data quality requirements definition | Data structure analysis, quality rules, validation logic | Quality rule definitions, validation scripts |
| 9 | 09-spark-data-transformation/ | Data transformation logic implementation | Spark SQL, DataFrame API, UDF | Data transformation module, quality validator |
| 10 | 10-integrated-pipeline/ | Integrated pipeline development | End-to-end integration, error handling | Integrated pipeline, error handler |
| 11 | 11-visualization-monitoring/ | Visualization and monitoring | Grafana, Prometheus, CloudWatch integration | Dashboards, monitoring report |
| 12 | 12-cost-optimization-security/ | Cost optimization and security | Cost monitoring, IAM policies, S3 bucket policies | Cost report, security checklist |
| 13 | 13-advanced-aws-analytics/ | Glue/Athena extension lab | Glue ETL, Athena queries, S3 data lake | ETL scripts, Athena query examples |
| 14 | 14-project-documentation/ | Project documentation | Technical documentation, operations guide, improvement plan | Architecture document, troubleshooting guide |
| 15 | 15-presentation-feedback/ | Results presentation and feedback | Project presentation, demo, evaluation | Presentation materials, demo script |

🚀 Quick Start

1. Requirements

  • AWS account with EC2 permissions
  • Console server (Oracle Linux EC2) with Terraform and Ansible installed
  • At least 8 GB RAM and 50 GB disk

2. Console Server Setup

3. Clone the Project and Provision the AWS Infrastructure

4. Automated Service Installation with Ansible

Install Hadoop and Spark automatically with Ansible:

cd ../7.HadoopEco
ansible-playbook -i hadoopInstall/df/i1/ansible-hadoop/hosts hadoopInstall/df/i1/ansible-hadoop/hadoop_install.yml -e ansible_python_interpreter=/usr/bin/python3
ansible-playbook -i hadoopInstall/df/i1/ansible-spark/hosts hadoopInstall/df/i1/ansible-spark/spark_install.yml -e ansible_python_interpreter=/usr/bin/python3

5. Verify the Services

Open the web UIs of the services deployed on EC2:

  • Hadoop NameNode: http://[s1-instance-ip]:9870
  • YARN ResourceManager: http://[s1-instance-ip]:8088
  • Spark Master: http://[s1-instance-ip]:8080
  • Prometheus: http://[s1-instance-ip]:9090
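As a convenience, the four URLs can be generated from the master node's address. This is a minimal sketch: the port numbers are the ones listed above, while `web_ui_urls` and the address `203.0.113.10` are placeholders introduced here for illustration.

```python
from typing import Dict

# Ports as listed above for the s1 (master) node.
WEB_UI_PORTS: Dict[str, int] = {
    "Hadoop NameNode": 9870,
    "YARN ResourceManager": 8088,
    "Spark Master": 8080,
    "Prometheus": 9090,
}


def web_ui_urls(master_ip: str) -> Dict[str, str]:
    """Build the web UI URL of each service from the master node's IP."""
    return {
        name: f"http://{master_ip}:{port}"
        for name, port in WEB_UI_PORTS.items()
    }


# 203.0.113.10 is a documentation-range placeholder, not a real instance IP.
for name, url in web_ui_urls("203.0.113.10").items():
    print(f"{name}: {url}")
```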

πŸ—οΈ μ•„ν‚€ν…μ²˜ κ°œμš”

μ‹œμŠ€ν…œ μ•„ν‚€ν…μ²˜

    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚ i1 : Ansible, Terraform, kafka   β”‚
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
           β”‚       β”‚       β”‚
           β–Ό       β–Ό       β–Ό
        β”Œβ”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”
        β”‚ s1 β”‚  β”‚ s2 β”‚  β”‚ s3 β”‚
        β””β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”˜

Data Flow

┌───────────────────────────────── FMS BigData Pipeline ─────────────────────────────────┐
│                                                                                        │
│     [FMS API]    →   [Collector]    →   [Kafka]   →     [Spark]     →    [HDFS/S3]     │
│         ↓                 ↓                ↓               ↓                 ↓         │
│   [Sensor data]    [Check/convert]    [Buffering]     [Real-time]     [Dist. storage]  │
│         ↓                 ↓                ↓               ↓                 ↓         │
│  [10 s interval]   [Error handling]   [2 brokers]   [Quality check]   [Partitioning]   │
│                                                                                        │
│  ┌─────── Monitoring layer ─────────────────────────────────────────────────────────┐  │
│  │      [Prometheus]     →     [Grafana]     →     [AlertManager]                   │  │
│  │            ↑                    ↑                      ↑                         │  │
│  │   [Metric collection]    [Visualization]          [Alerting]                     │  │
│  └──────────────────────────────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────────────────────────────┘

Service Roles and Main Ports by EC2 Node

| Node | Role and services | Main ports |
| --- | --- | --- |
| i1 (console server) | Terraform and Ansible management server | SSH(22) |
| s1 (master node) | HDFS NameNode, YARN ResourceManager, Spark Master, Prometheus | SSH(22), NameNode UI(9870), ResourceManager UI(8088), Spark Master UI(8080), Prometheus(9090) |
| s2, s3 (worker nodes) | HDFS DataNode, YARN NodeManager, Spark Worker, Node Exporter | SSH(22), DataNode UI(9864), NodeManager UI(8042), Spark Worker UI(8081), Node Exporter(9100) |
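One quick way to confirm that a node exposes the ports in the table is a plain TCP connection attempt. The sketch below uses only the Python standard library; `port_open` and `check_node` are illustrative helpers, not part of the repository. It assumes the machine running it is allowed through the relevant security group rules, otherwise a closed firewall looks the same as a stopped service.

```python
import socket
from typing import Dict

# Worker-node ports as listed in the table above.
WORKER_PORTS: Dict[str, int] = {
    "DataNode UI": 9864,
    "NodeManager UI": 8042,
    "Spark Worker UI": 8081,
    "Node Exporter": 9100,
}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_node(host: str, ports: Dict[str, int]) -> Dict[str, bool]:
    """Probe every named port on one host and report which ones answer."""
    return {name: port_open(host, port) for name, port in ports.items()}
```

For example, `check_node("<s2-instance-ip>", WORKER_PORTS)` would return a dictionary mapping each service name to `True` or `False`.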

📊 Data Processing

Data Source

  • Data is collected through the FMS API at 10-second intervals.
  • Source address: curl finfra.iptime.org:9872/1/ through curl finfra.iptime.org:9872/100/ (devices 1 to 100)
  • Sample collection:
$ curl finfra.iptime.org:9872/1/
{"time": "2025-07-07T08:21:29Z", "DeviceId": 1, "sensor1": 85.69, "sensor2": 85.81, "sensor3": 82.15, "motor1": 1245.16, "motor2": 874.81, "motor3": 1119.36, "isFail": false}
$ curl finfra.iptime.org:9872/100/
{"time": "2025-07-07T08:21:32Z", "DeviceId": 100, "sensor1": 175.29, "sensor2": 84.14, "sensor3": 148.35, "motor1": 1847.49, "motor2": 146.12, "motor3": 2155.11, "isFail": false}
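The sample responses above share one JSON shape, so a collector could parse each into a typed record before forwarding it. This is a minimal sketch under that assumption: the `Reading` class and `parse_reading` function are hypothetical names introduced here, and the field names are taken from the sample output.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Tuple


@dataclass(frozen=True)
class Reading:
    time: datetime
    device_id: int
    sensors: Tuple[float, float, float]
    motors: Tuple[float, float, float]
    is_fail: bool


def parse_reading(raw: str) -> Reading:
    """Parse one FMS API response body into a Reading."""
    doc = json.loads(raw)
    return Reading(
        # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
        time=datetime.fromisoformat(doc["time"].replace("Z", "+00:00")),
        device_id=int(doc["DeviceId"]),
        sensors=tuple(float(doc[f"sensor{i}"]) for i in (1, 2, 3)),
        motors=tuple(float(doc[f"motor{i}"]) for i in (1, 2, 3)),
        is_fail=bool(doc["isFail"]),
    )


# The device 1 sample shown above, copied verbatim.
SAMPLE = (
    '{"time": "2025-07-07T08:21:29Z", "DeviceId": 1, "sensor1": 85.69, '
    '"sensor2": 85.81, "sensor3": 82.15, "motor1": 1245.16, '
    '"motor2": 874.81, "motor3": 1119.36, "isFail": false}'
)

reading = parse_reading(SAMPLE)
print(reading.device_id, reading.time.isoformat(), reading.sensors)
```

A missing or misnamed field raises `KeyError`, which a caller could count as a data quality failure.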

Real-Time Processing Pipeline

  1. Data collection: the Python Collector gathers sensor data from the FMS API at 10-second intervals
  2. Message queuing: Kafka reliably buffers the collected data (2 brokers)
  3. Stream processing: Spark Streaming transforms the data and validates its quality in real time
  4. Distributed storage: data is stored in HDFS in partitions (by device and by time)
  5. Monitoring: real-time visualization and alerting through Grafana, Prometheus, and CloudWatch
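Steps 1, 2, and 4 can be sketched without any external services by injecting the network and queue operations as plain functions. Everything named below is an assumption made for illustration: `collect_once`, `device_url`, and `partition_path` are hypothetical helpers, the `http://` scheme is inferred from the `curl` examples, and the `device_id=/date=/hour=` layout is just one possible reading of "by device and by time".

```python
import json
from datetime import datetime
from typing import Callable, Iterable, List

# Host and port come from the data source section; the scheme is assumed.
BASE_URL = "http://finfra.iptime.org:9872"


def device_url(device_id: int) -> str:
    """URL of one device endpoint, e.g. .../1/ for device 1."""
    return f"{BASE_URL}/{device_id}/"


def partition_path(record: dict) -> str:
    """Hypothetical storage layout: one directory per device, date, and hour."""
    ts = datetime.fromisoformat(record["time"].replace("Z", "+00:00"))
    return f"device_id={record['DeviceId']}/date={ts:%Y-%m-%d}/hour={ts:%H}"


def collect_once(
    fetch: Callable[[str], str],
    publish: Callable[[dict], None],
    device_ids: Iterable[int],
) -> int:
    """Poll each device once and publish every record that parses.

    Failed fetches and malformed bodies are skipped. Returns the number
    of records published.
    """
    sent = 0
    for device_id in device_ids:
        try:
            record = json.loads(fetch(device_url(device_id)))
        except (OSError, ValueError):
            continue
        publish(record)
        sent += 1
    return sent


def fake_fetch(url: str) -> str:
    """Stand-in for an HTTP GET that echoes the device id from the URL."""
    device_id = int(url.rstrip("/").rsplit("/", 1)[1])
    return json.dumps(
        {"time": "2025-07-07T08:21:29Z", "DeviceId": device_id, "isFail": False}
    )


buffer: List[dict] = []
count = collect_once(fake_fetch, buffer.append, range(1, 4))
print(count, partition_path(buffer[0]))
```

In a real run, `fetch` might wrap `urllib.request.urlopen` and `publish` might wrap a Kafka producer's send call, with `collect_once` invoked every 10 seconds. Keeping both injectable lets the loop be exercised without a broker.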

🔍 Troubleshooting

Common Problems

# Check EC2 instance status
aws ec2 describe-instances --filters "Name=instance-state-name,Values=running"

# Diagnose Ansible playbook execution problems
ansible-playbook --syntax-check [playbook.yml]

# Check service logs
ssh ec2-user@[instance-ip] 'sudo journalctl -u [service-name]'

# Restart a service
ssh ec2-user@[instance-ip] 'sudo systemctl restart [service-name]'

Performance and Cost Issues

  • Out of memory: upgrade the EC2 instance type
  • Disk space: clean up old logs and data
  • Cost overruns: set up AWS cost monitoring and alerts

Optional Alternatives

  • The following AWS managed services may be used in place of the open-source technologies during the project.

| Open-source technology | AWS managed service alternative |
| --- | --- |
| Apache Spark | Amazon EMR, AWS Glue |
| Apache Kafka | Amazon Kinesis Data Streams |
| Prometheus | Amazon CloudWatch |

🚀 Future Directions

Phase 1 - Stability

  • Introduce Auto Scaling groups
  • Automate failure recovery
  • Refine IAM permissions

Phase 2 - Performance Optimization

  • Triple throughput (150 msg/sec)
  • Add a caching layer
  • Introduce GPU-accelerated computation

Phase 3 - Feature Expansion

  • Introduce AWS Glue ETL
  • Build Athena queries and a data lake
  • Strengthen CloudWatch-based monitoring and alerting

📁 Project Structure

bigdataAwsPreLab/
├── 01-pre-lab-introduction/
├── 02-aws-account-setup/
├── 03-infra-provisioning/
├── 04-ansible-automation/
├── 05-architecture-design/
├── 06-hadoop-spark-cluster/
├── 07-kafka-streaming/
├── 08-data-quality-requirements/
├── 09-spark-data-transformation/
├── 10-integrated-pipeline/
├── 11-visualization-monitoring/
├── 12-cost-optimization-security/
├── 13-advanced-aws-analytics/
├── 14-project-documentation/
├── 15-presentation-feedback/
└── README.md

Each folder consists of README.md (theory and explanation) plus src/ (lab code and scripts).

🤝 How to Contribute

  1. Fork the project
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📄 License

This project is distributed under the MIT License. See the LICENSE file for details.

📞 Contact and Support

  • Issue tracking: use GitHub Issues
  • Technical questions: see the detailed guide in each chapter's README.md
  • Urgent support: check the instructor's contact details in the lecture notes

🎯 Lab goal: through theory and hands-on practice, provide the experience of building a complete AWS-based real-time big data processing system, and develop practical skills that can be applied directly on the job.

Happy Learning! 🚀
