
Distributed Machine Learning Framework

Overview

This project implements a high-performance distributed machine learning framework that uses MPI for parallel training across multiple nodes, with real-time performance monitoring and task tracking.

Prerequisites

  • C++17 Compiler
  • MPI (OpenMPI or MPICH)
  • OpenCV
  • Eigen3
  • C++ REST SDK (cpprestsdk)
  • nlohmann/json

Build Instructions

mkdir build && cd build
cmake ..
make
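
For performance measurements across nodes, an optimized build is usually preferable. The flags below are standard CMake and make options, not project-specific settings:

# Configure an optimized build and compile in parallel
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j$(nproc)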

Running the Application

mpirun -n <num_processes> ./distributed_ml_app
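
For a multi-node run, the launch syntax depends on your MPI implementation; the hostfile name below is illustrative:

# OpenMPI: 4 processes spread across the hosts listed in ./hosts
mpirun -n 4 --hostfile hosts ./distributed_ml_app

# MPICH equivalent
mpiexec -n 4 -f hosts ./distributed_ml_app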

Dashboard

Access the dashboard at http://localhost:8080
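
To confirm the server is reachable before opening a browser, a plain HTTP request is enough; no specific endpoints are documented here, so this only checks that the port responds:

curl -s http://localhost:8080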

Features

  • Distributed Training
  • Real-time Task Monitoring
  • Performance Metrics Tracking
  • Web-based Dashboard

Architecture

  • Distributed Trainer: Manages distributed machine learning tasks
  • Task Manager: Tracks and manages individual tasks
  • Performance Tracker: Monitors and records performance metrics
  • Dashboard Server: Provides a web interface for monitoring

Kubernetes Deployment

Prerequisites

  • Kubernetes Cluster
  • kubectl
  • Helm (optional)

Docker Image Build

docker build -t distributed-ml-app:latest .
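
Before deploying to a cluster, it can be useful to smoke-test the image locally. The port mapping assumes the container serves the dashboard on 8080 as described above:

docker run --rm -p 8080:8080 distributed-ml-app:latest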

Kubernetes Deployment Options

1. Direct Kubernetes Deployment

# Apply Kubernetes manifests
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/hpa.yaml
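
To verify the rollout, the commands below assume the Deployment is named distributed-ml and carries the app=distributed-ml label (the same label used for log collection later in this README):

# Wait for the rollout to complete and list the pods
kubectl rollout status deployment/distributed-ml
kubectl get pods -l app=distributed-ml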

2. Helm Chart Deployment

# Install Helm chart
helm install distributed-ml helm/distributed-ml
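
Once installed, the release can be inspected and updated with standard Helm commands; the release name matches the one used in the install command above:

# Check release status
helm status distributed-ml

# Apply chart changes after editing helm/distributed-ml
helm upgrade distributed-ml helm/distributed-ml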

Scaling and Monitoring

  • Horizontal Pod Autoscaler configured to scale based on CPU and memory utilization
  • Automatically scales between 3 and 10 replicas
  • Monitoring available through the Kubernetes dashboard or kubectl (see the commands below)
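
The following commands show the autoscaler state and per-pod resource usage; kubectl top requires the metrics-server add-on, which the HPA also depends on:

# Watch the autoscaler's current and target utilization
kubectl get hpa -w

# Per-pod CPU and memory usage
kubectl top pods -l app=distributed-ml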

Accessing the Dashboard

kubectl port-forward service/distributed-ml-service 8080:8080

Open http://localhost:8080 in your browser

Logging and Debugging

# View logs
kubectl logs -l app=distributed-ml
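
For live debugging, the same label selector can be used to stream logs and inspect pod events:

# Stream logs from all pods in the deployment
kubectl logs -f -l app=distributed-ml

# Inspect events and container status for troubleshooting
kubectl describe pods -l app=distributed-ml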

License

MIT License
