Automate the management and tracking of Kubernetes workload health and cost.

ochestra-tech/aria-ai

Kubernetes Health and Cost Management Tool

Kubeopera is a cloud-native Kubernetes management tool that uses artificial intelligence to simplify and automate the management of cloud-native workloads on Kubernetes. Its main purpose is to help solve the challenges of operating cloud-native applications at scale: complexity, cost, and performance.

A comprehensive Go-based tool for monitoring Kubernetes cluster health and managing costs. This tool provides real-time health assessments, cost tracking, optimization recommendations, and detailed reporting for Kubernetes environments.

Features

🏥 Health Monitoring

  • Node Health: Monitor node status, resource pressure, and availability
  • Pod Health: Track pod states, restart counts, and crash loops
  • Control Plane: Monitor API server, etcd, scheduler, and controller manager
  • Network Health: Check CNI, DNS resolution, service endpoints, and ingress
  • Resource Usage: Track CPU, memory, and storage utilization
  • Health Scoring: Overall cluster health score (0-100)

💰 Cost Management

  • Node Costs: Calculate costs by instance type and region
  • Pod Costs: Track resource consumption and costs per workload
  • Namespace Costs: Aggregate costs by namespace
  • Cost Forecasting: Project future costs based on usage trends
  • Optimization: Identify over-provisioned resources and cost savings

📊 Reporting

  • Multiple Formats: JSON, HTML, and text output
  • Interactive Dashboards: Visual HTML reports with charts
  • Prometheus Metrics: Export metrics for monitoring systems
  • Combined Reports: Health and cost analysis in one view

🔧 Automation

  • Resource Cleanup: Automated cleanup of unused resources
  • Cost Alerts: Monitor cost changes and send notifications
  • Continuous Monitoring: Run as a service with configurable intervals
  • Optimization Recommendations: Automated suggestions for improvements

Installation

Prerequisites

  • Go 1.19 or later
  • Access to a Kubernetes cluster
  • kubectl configured with cluster access
  • (Optional) Metrics Server deployed in the cluster for detailed resource usage

Build from Source

# Clone the repository
git clone https://github.com/ochestra-tech/kubeopera-ai
cd kubeopera-ai

# Download dependencies
go mod tidy

# Build the application
go build -o kubeopera-ai ./cmd/main.go

Dependencies

The tool requires the following Go modules:

go get k8s.io/client-go@latest
go get k8s.io/api@latest
go get k8s.io/apimachinery@latest
go get k8s.io/metrics@latest
go get github.com/prometheus/client_golang@latest
go get github.com/olekukonko/tablewriter@latest

Configuration

Kubeconfig

The tool uses your existing kubeconfig file. By default, it looks for ~/.kube/config, but you can specify a different path:

./kubeopera-ai --kubeconfig /path/to/kubeconfig

Pricing Configuration

Create a pricing-config.json file to define your cloud pricing:

{
  "defaults": {
    "cpu": 0.03,
    "memory": 0.004,
    "storage": 0.00012,
    "network": 0.08,
    "gpuPricing": {
      "nvidia-tesla-v100": 1.2,
      "nvidia-tesla-k80": 0.6
    }
  },
  "instanceTypes": {
    "m5.large": {
      "cpu": 0.032,
      "memory": 0.0045,
      "storage": 0.00015,
      "network": 0.09
    },
    "c5.large": {
      "cpu": 0.035,
      "memory": 0.0035,
      "storage": 0.00018,
      "network": 0.095
    }
  },
  "regionMultipliers": {
    "us-east-1": 1.0,
    "us-west-2": 1.05,
    "eu-west-1": 1.1,
    "ap-southeast-1": 1.15
  }
}
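
As a rough illustration of how a file like this could be consumed, the sketch below parses a trimmed copy of the example and prices a node. The struct names, the `pricesFor` fallback to `defaults`, and the `hourlyCost` formula (cores and GB of memory times their unit price, scaled by the region multiplier) are assumptions made for this example; the tool's actual cost model and price units are not documented here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ResourcePrices mirrors one price block of pricing-config.json.
// Units (per core-hour, per GB-hour) are an assumption.
type ResourcePrices struct {
	CPU        float64            `json:"cpu"`
	Memory     float64            `json:"memory"`
	Storage    float64            `json:"storage"`
	Network    float64            `json:"network"`
	GPUPricing map[string]float64 `json:"gpuPricing,omitempty"`
}

// PricingConfig mirrors the three top-level keys of the example file.
type PricingConfig struct {
	Defaults          ResourcePrices            `json:"defaults"`
	InstanceTypes     map[string]ResourcePrices `json:"instanceTypes"`
	RegionMultipliers map[string]float64        `json:"regionMultipliers"`
}

// pricesFor returns instance-specific prices, falling back to the defaults.
func (c PricingConfig) pricesFor(instanceType string) ResourcePrices {
	if p, ok := c.InstanceTypes[instanceType]; ok {
		return p
	}
	return c.Defaults
}

// hourlyCost is a hypothetical formula, not the tool's confirmed model:
// (cores*cpu + memGB*memory) * region multiplier (1.0 for unknown regions).
func (c PricingConfig) hourlyCost(instanceType, region string, cores, memGB float64) float64 {
	p := c.pricesFor(instanceType)
	mult, ok := c.RegionMultipliers[region]
	if !ok {
		mult = 1.0
	}
	return (cores*p.CPU + memGB*p.Memory) * mult
}

// parsePricing decodes the JSON document into a PricingConfig.
func parsePricing(data []byte) (PricingConfig, error) {
	var cfg PricingConfig
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

// sample is a trimmed copy of the example configuration above.
const sample = `{
  "defaults": {"cpu": 0.03, "memory": 0.004, "storage": 0.00012, "network": 0.08},
  "instanceTypes": {
    "m5.large": {"cpu": 0.032, "memory": 0.0045, "storage": 0.00015, "network": 0.09}
  },
  "regionMultipliers": {"us-east-1": 1.0, "us-west-2": 1.05}
}`

func main() {
	cfg, err := parsePricing([]byte(sample))
	if err != nil {
		panic(err)
	}
	// A 2-core, 8 GB m5.large in us-west-2 under the assumed formula.
	fmt.Printf("m5.large in us-west-2: $%.4f/hour\n", cfg.hourlyCost("m5.large", "us-west-2", 2, 8))
}
```

Instance types missing from `instanceTypes` fall back to `defaults` in this sketch, which is one plausible reading of why the file carries both sections.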

Usage

Basic Commands

Health Check

# Quick health check
./kubeopera-ai --type health --format text

# Detailed health report in HTML
./kubeopera-ai --type health --format html --output health-report.html

Cost Analysis

# Cost report in JSON format
./kubeopera-ai --type cost --format json --output cost-report.json

# Monthly cost breakdown
./kubeopera-ai --type cost --format text

Combined Report

# Complete health and cost analysis
./kubeopera-ai --type combined --format html --output cluster-report.html

Continuous Monitoring

# Monitor every 5 minutes with Prometheus metrics
./kubeopera-ai --interval 5m --metrics-port 8080

# Custom configuration
./kubeopera-ai \
  --kubeconfig ~/.kube/config \
  --pricing-config ./my-pricing.json \
  --interval 10m \
  --metrics-port 9090 \
  --type combined \
  --format json \
  --output /var/log/k8s-reports/report.json

Command Line Options

Option            Description                                Default
--kubeconfig      Path to kubeconfig file                    ~/.kube/config
--pricing-config  Path to pricing configuration              pricing-config.json
--type            Report type (health, cost, combined)       combined
--format          Output format (text, json, html)           text
--output          Output file path (empty for stdout)        ""
--interval        Check interval for continuous monitoring   60s
--metrics-port    Prometheus metrics port                    8080
--one-shot        Run once and exit                          false

API and Programming Interface

Health Check API

package main

import (
    "context"
    "fmt"
    "github.com/ochestra-tech/kubeopera-ai/pkg/health"
)

func main() {
    // initKubernetesClients is a helper that is not shown in this snippet.
    clientset, metricsClient := initKubernetesClients()
    
    healthData, err := health.GetClusterHealth(
        context.Background(), 
        clientset, 
        metricsClient,
    )
    if err != nil {
        panic(err)
    }
    
    fmt.Printf("Cluster Health Score: %d/100\n", healthData.HealthScore)
}

Cost Analysis API

package main

import (
    "context"
    "fmt"

    "github.com/ochestra-tech/kubeopera-ai/pkg/cost"
)

func main() {
    // initKubernetesClients and loadPricingConfig are helpers not shown in this snippet.
    clientset, metricsClient := initKubernetesClients()
    pricing := loadPricingConfig()
    
    nodeCosts, err := cost.GetNodeCosts(
        context.Background(),
        clientset,
        metricsClient,
        pricing,
    )
    if err != nil {
        panic(err)
    }
    
    for _, node := range nodeCosts {
        fmt.Printf("Node %s: $%.2f/hour\n", node.Name, node.TotalCost)
    }
}

Report Generation API

package main

import (
    "context"
    "os"
    "github.com/ochestra-tech/kubeopera-ai/pkg/reports"
)

func main() {
    clientset, metricsClient := initKubernetesClients()
    pricing := loadPricingConfig()
    
    generator := reports.NewReportGenerator(
        clientset,
        metricsClient,
        reports.FormatHTML,
        os.Stdout,
    )
    
    err := generator.GenerateCombinedReport(context.Background(), pricing)
    if err != nil {
        panic(err)
    }
}

Prometheus Metrics

The tool exports the following Prometheus metrics:

Metric                                       Type   Description
k8s_health_manager_node_status               Gauge  Node readiness status
k8s_health_manager_pod_status                Gauge  Pod status by namespace
k8s_health_manager_namespace_resource_usage  Gauge  Resource usage by namespace
k8s_health_manager_namespace_cost            Gauge  Cost per namespace per hour
k8s_health_manager_resource_efficiency       Gauge  Resource efficiency ratio

Grafana Dashboard

You can create Grafana dashboards using these metrics:

# Cluster health score
k8s_health_manager_cluster_health_score

# Cost per namespace
k8s_health_manager_namespace_cost

# Resource efficiency
k8s_health_manager_resource_efficiency

Examples

Example Output

Health Report (Text)

=== Kubernetes Cluster Health Report ===
Generated at: 2024-01-15T10:30:00Z

Overall Health Score: 85/100

--- Node Health ---
Total Nodes:                    3
Ready Nodes:                    3
Memory Pressure Nodes:          0
Disk Pressure Nodes:            0
PID Pressure Nodes:             0
Network Unavailable Nodes:      0
Average Node Load:              45.2

--- Pod Health ---
Total Pods:                     48
Running Pods:                   45
Pending Pods:                   2
Failed Pods:                    1
Restarting Pods:                0
Crash Looping Pods:             0

--- Control Plane Status ---
API Server Healthy:             true
Controller Manager Healthy:     true
Scheduler Healthy:              true
Etcd Healthy:                   true
CoreDNS Healthy:                true
API Server Latency:             12.5 ms

--- Resource Usage ---
Cluster CPU Usage:              65.2%
Cluster Memory Usage:           72.8%
Cluster Storage Usage:          45.1%

Cost Report (Text)

=== Kubernetes Cluster Cost Report ===
Generated at: 2024-01-15T10:30:00Z

Total Hourly Cost:              $12.45
Total Monthly Cost:             $8,964.00

--- Node Cost Summary ---
┌──────────────────┬───────────────┬─────────────┬───────────┬─────────────┬─────────────┐
│ Node             │ Instance Type │ Hourly Cost │ CPU Cost  │ Memory Cost │ Utilization │
├──────────────────┼───────────────┼─────────────┼───────────┼─────────────┼─────────────┤
│ node-1           │ m5.large      │ $4.15       │ $2.88     │ $1.27       │ 68.5%       │
│ node-2           │ m5.large      │ $4.15       │ $2.88     │ $1.27       │ 71.2%       │
│ node-3           │ c5.large      │ $4.15       │ $3.15     │ $1.00       │ 59.8%       │
└──────────────────┴───────────────┴─────────────┴───────────┴─────────────┴─────────────┘

--- Namespace Cost Summary ---
┌─────────────────┬──────────────┬───────────┬─────────────┬───────────┐
│ Namespace       │ Monthly Cost │ CPU Cost  │ Memory Cost │ Pod Count │
├─────────────────┼──────────────┼───────────┼─────────────┼───────────┤
│ production      │ $4,234.80    │ $2,876.40 │ $1,358.40   │ 24        │
│ staging         │ $2,156.40    │ $1,438.20 │ $718.20     │ 12        │
│ monitoring      │ $1,892.80    │ $1,254.60 │ $638.20     │ 8         │
└─────────────────┴──────────────┴───────────┴─────────────┴───────────┘

Contributing

Development Setup

  1. Fork the repository
  2. Clone your fork: git clone https://github.com/your-username/kubeopera-ai.git
  3. Create a feature branch: git checkout -b feature/your-feature-name
  4. Make your changes
  5. Add tests for new functionality
  6. Run tests: go test ./...
  7. Create a pull request

Code Structure

.
├── cmd/
│   └── main.go                # Application entry point
├── pkg/
│   ├── health/
│   │   └── health-checker.go  # Health monitoring utilities
│   ├── cost/
│   │   └── cost-tracker.go    # Cost calculation utilities
│   └── reports/
│       └── generator.go       # Report generation
├── examples/
│   └── main.go                # Usage examples
├── configs/
│   └── pricing-config.json    # Default pricing configuration
├── deployments/
│   └── kubernetes.yaml        # Kubernetes deployment manifests
└── README.md

Testing

# Run all tests
go test ./...

# Run tests with coverage
go test -cover ./...

# Run specific package tests
go test ./pkg/health/

Troubleshooting

Common Issues

1. Permission Denied

Error: failed to list nodes: nodes is forbidden

Solution: Ensure your service account has the required RBAC permissions (see Kubernetes Deployment section).

2. Metrics Server Not Found

Error: failed to get pod metrics: the server could not find the requested resource

Solution: Install metrics-server in your cluster:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

3. Invalid Pricing Configuration

Error: failed to parse pricing config

Solution: Validate your pricing-config.json file format against the example provided.
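
One way to check a file before handing it to the tool is a strict decode that rejects misspelled keys and malformed JSON. The sketch below is an assumption-laden example: the `validatePricing` function, the struct names, and the rule that prices and multipliers must be positive are not taken from the project and only illustrate the idea.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"os"
)

// prices and pricingFile mirror the keys of the example pricing-config.json.
type prices struct {
	CPU        float64            `json:"cpu"`
	Memory     float64            `json:"memory"`
	Storage    float64            `json:"storage"`
	Network    float64            `json:"network"`
	GPUPricing map[string]float64 `json:"gpuPricing"`
}

type pricingFile struct {
	Defaults          prices             `json:"defaults"`
	InstanceTypes     map[string]prices  `json:"instanceTypes"`
	RegionMultipliers map[string]float64 `json:"regionMultipliers"`
}

// validatePricing decodes data strictly and applies a few sanity checks.
// The checks themselves are assumptions, not the tool's documented rules.
func validatePricing(data []byte) error {
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields() // catches typos such as "cpus" or "regionMultiplier"

	var cfg pricingFile
	if err := dec.Decode(&cfg); err != nil {
		return fmt.Errorf("failed to parse pricing config: %w", err)
	}
	if cfg.Defaults.CPU <= 0 || cfg.Defaults.Memory <= 0 {
		return fmt.Errorf("defaults.cpu and defaults.memory must be positive")
	}
	for region, m := range cfg.RegionMultipliers {
		if m <= 0 {
			return fmt.Errorf("region multiplier for %q must be positive", region)
		}
	}
	return nil
}

func main() {
	data, err := os.ReadFile("pricing-config.json")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if err := validatePricing(data); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("pricing-config.json looks valid")
}
```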

Debug Mode

Enable debug logging:

./kubeopera-ai --debug --type health

Log Analysis

Check application logs for detailed error information:

# For container deployment
kubectl logs -n monitoring deployment/kubeopera-ai

# For local deployment
./kubeopera-ai 2>&1 | tee app.log

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

Roadmap

  • Multi-cluster support: Monitor multiple clusters from a single instance (KubeCostGuard Project)
  • Historical data storage: Store metrics in time-series database (KubeOpera Project)
  • Advanced forecasting: ML-based cost prediction
  • Cloud provider integration: Direct billing API integration (KubeCostGuard Project)
  • Slack/Teams notifications: Real-time alerts
  • Helm chart: Easy deployment with Helm (KubeCostGuard Project)
  • Web UI: Built-in web interface for centralized multi-cluster monitoring & observability (KubeCostOpera Project)

Acknowledgments


Built with ❤️ for the Kubernetes community
