This project implements a distributed machine learning framework with real-time performance monitoring and task tracking.
Prerequisites:
- C++17 Compiler
- MPI (OpenMPI or MPICH)
- OpenCV
- Eigen3
- C++ REST SDK (cpprestsdk)
- nlohmann/json
Build with CMake:
mkdir build && cd build
cmake ..
make
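Run with MPI, replacing <num_processes> with the number of processes to launch:
mpirun -n <num_processes> ./distributed_ml_app
Once the application is running, access the dashboard at http://localhost:8080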
Features:
- Distributed Training
- Real-time Task Monitoring
- Performance Metrics Tracking
- Web-based Dashboard
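
This README does not say how training is distributed across MPI processes. One common scheme is synchronous data-parallel training, where each process computes a gradient on its own data shard and the gradients are averaged before every weight update. The sketch below illustrates that idea in plain C++ without MPI so it can be read in isolation. The names `average_gradients` and `sgd_step` are hypothetical and not taken from this project; in an MPI program the averaging would typically be done with `MPI_Allreduce` and `MPI_SUM`, followed by a division by the number of ranks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from this project): element-wise average of the
// gradients reported by each worker. Stands in for an MPI all-reduce.
std::vector<double> average_gradients(
    const std::vector<std::vector<double>>& worker_grads) {
    assert(!worker_grads.empty());
    const std::size_t dim = worker_grads.front().size();
    std::vector<double> avg(dim, 0.0);
    for (const auto& grad : worker_grads) {
        assert(grad.size() == dim);  // every worker must report the same shape
        for (std::size_t i = 0; i < dim; ++i) {
            avg[i] += grad[i];
        }
    }
    const double n = static_cast<double>(worker_grads.size());
    for (double& v : avg) {
        v /= n;
    }
    return avg;
}

// Hypothetical helper: one plain SGD update, w <- w - lr * grad.
void sgd_step(std::vector<double>& weights,
              const std::vector<double>& grad,
              double learning_rate) {
    assert(weights.size() == grad.size());
    for (std::size_t i = 0; i < weights.size(); ++i) {
        weights[i] -= learning_rate * grad[i];
    }
}
```

Because every process applies the same averaged gradient, the model replicas stay identical after each step, which is the property that makes this scheme simple to reason about.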
Components:
- Distributed Trainer: Manages distributed machine learning tasks
- Task Manager: Tracks and manages individual tasks
- Performance Tracker: Monitors and records performance metrics
- Dashboard Server: Provides a web interface for monitoring
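
To make the roles of the Task Manager and Performance Tracker more concrete, here is a minimal sketch of what such components might look like. It is an illustration only: the class names mirror the list above, but every method, the `TaskStatus` enum, and the in-memory storage are assumptions, not the project's actual interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical task states; the real project may use different ones.
enum class TaskStatus { Pending, Running, Completed, Failed };

// Sketch of a task tracker: assigns ids and records each task's status.
class TaskManager {
public:
    // Registers a new task in the Pending state and returns its id.
    int add_task(const std::string& name) {
        const int id = next_id_++;
        tasks_.emplace(id, Task{name, TaskStatus::Pending});
        return id;
    }

    // Updates a task's status; returns false if the id is unknown.
    bool set_status(int id, TaskStatus status) {
        auto it = tasks_.find(id);
        if (it == tasks_.end()) return false;
        it->second.status = status;
        return true;
    }

    // Counts tasks currently in the given state.
    std::size_t count(TaskStatus status) const {
        std::size_t n = 0;
        for (const auto& entry : tasks_) {
            if (entry.second.status == status) ++n;
        }
        return n;
    }

private:
    struct Task {
        std::string name;
        TaskStatus status;
    };
    std::map<int, Task> tasks_;
    int next_id_ = 0;
};

// Sketch of a metrics recorder: stores samples per metric name.
class PerformanceTracker {
public:
    void record(const std::string& metric, double value) {
        samples_[metric].push_back(value);
    }

    // Mean of all samples for a metric, or 0.0 if none were recorded.
    double mean(const std::string& metric) const {
        auto it = samples_.find(metric);
        if (it == samples_.end() || it->second.empty()) return 0.0;
        double sum = 0.0;
        for (double v : it->second) sum += v;
        return sum / static_cast<double>(it->second.size());
    }

private:
    std::map<std::string, std::vector<double>> samples_;
};
```

A dashboard server could then read from both objects, for example calling `count(TaskStatus::Running)` and `mean("loss")` when serving a status page. The sketch is not thread-safe; a real implementation shared between a training loop and a web server would need synchronization.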
Kubernetes deployment prerequisites:
- Kubernetes Cluster
- kubectl
- Helm (optional)
Build the Docker image, then deploy:
docker build -t distributed-ml-app:latest .
# Apply Kubernetes manifests
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/hpa.yaml
# Install Helm chart (optional)
helm install distributed-ml helm/distributed-ml
Scaling and monitoring:
- A Horizontal Pod Autoscaler (HPA) scales the deployment based on CPU and memory utilization
- Scales automatically between 3 and 10 replicas
- Monitoring available through the Kubernetes dashboard or kubectl
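
For intuition about the autoscaling behaviour, the Kubernetes documentation gives the core HPA rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), bounded by the configured minimum and maximum. The sketch below applies that rule with the 3 and 10 replica bounds stated above. The utilization targets are passed in as parameters because this README does not state them, and the function names are hypothetical. The real controller also applies a tolerance band and stabilization windows, which this sketch omits.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Replica bounds from this README.
constexpr int kMinReplicas = 3;
constexpr int kMaxReplicas = 10;

// Core HPA rule for one metric:
//   desired = ceil(current_replicas * current_util / target_util),
// clamped to [kMinReplicas, kMaxReplicas].
// target_util is an assumption supplied by the caller.
int desired_replicas(int current_replicas, double current_util,
                     double target_util) {
    assert(current_replicas > 0 && target_util > 0.0);
    const double raw =
        std::ceil(current_replicas * current_util / target_util);
    return std::clamp(static_cast<int>(raw), kMinReplicas, kMaxReplicas);
}

// With several metrics (here CPU and memory), the HPA uses the largest
// replica count proposed by any single metric.
int desired_replicas_cpu_mem(int current_replicas,
                             double cpu_util, double cpu_target,
                             double mem_util, double mem_target) {
    return std::max(
        desired_replicas(current_replicas, cpu_util, cpu_target),
        desired_replicas(current_replicas, mem_util, mem_target));
}
```

For example, with 4 replicas at 90% CPU against an assumed 60% target, the rule asks for ceil(4 × 90 / 60) = 6 replicas. Very low load still leaves 3 replicas running, and very high load is capped at 10.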
To reach the dashboard inside the cluster, forward the service port:
kubectl port-forward service/distributed-ml-service 8080:8080
Open http://localhost:8080 in your browser
# View logs
kubectl logs -l app=distributed-ml
MIT License