| 1 | +# Serverless Configuration and Documentation |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +This document describes how to configure serverless (scale-to-zero) workloads on Kubernetes, using Prometheus for monitoring and KEDA for autoscaling. The goal is efficient resource utilization: replicas are added as load grows and removed, down to zero, when the workload is idle. |
| 6 | + |
| 7 | +## Concepts |
| 8 | + |
| 9 | +### Prometheus Configuration |
| 10 | + |
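| 11 | +Prometheus is used for monitoring and alerting. On a ServiceMonitor, `namespaceSelector` controls which namespaces are searched for matching Services, which enables cross-namespace discovery. On the Prometheus resource, `serviceMonitorSelector` selects, by label, which ServiceMonitors that Prometheus instance picks up. |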
| 12 | + |
| 13 | +```yaml |
| 14 | +apiVersion: monitoring.coreos.com/v1 |
| 15 | +kind: ServiceMonitor |
| 16 | +metadata: |
| 17 | +  name: qwen2-0--5b-lb-monitor |
| 18 | +  namespace: llmaz-system |
| 19 | +  labels: |
| 20 | +    control-plane: controller-manager |
| 21 | +    app.kubernetes.io/name: servicemonitor |
| 22 | +spec: |
| 23 | +  namespaceSelector: |
| 24 | +    any: true |
| 25 | +  selector: |
| 26 | +    matchLabels: |
| 27 | +      llmaz.io/model-name: qwen2-0--5b |
| 28 | +  endpoints: |
| 29 | +  - port: http |
| 30 | +    path: /metrics |
| 31 | +    scheme: http |
| 32 | +``` |
| 33 | + |
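| 34 | +- `namespaceSelector.any: true` lets this ServiceMonitor, which lives in `llmaz-system`, match Services in every namespace. Use `matchNames` instead to restrict it to specific namespaces. |
| 35 | +- Label your Services so that they match the ServiceMonitor's `selector` (here, `llmaz.io/model-name: qwen2-0--5b`); otherwise Prometheus will not discover them. |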
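The Prometheus side of this pairing is not shown above, so here is a minimal sketch of it. It writes a `Prometheus` resource fragment whose `serviceMonitorSelector` matches the `app.kubernetes.io/name: servicemonitor` label from the ServiceMonitor example. The resource name `llmaz-prometheus` and the file name are assumptions for illustration, and a real deployment needs further fields (service account, storage, and so on) that are omitted here.

```bash
# Write a hypothetical Prometheus resource fragment to a file.
# Only the selector fields relevant to ServiceMonitor discovery are shown.
cat > prometheus-selector.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: llmaz-prometheus        # hypothetical name
  namespace: llmaz-system
spec:
  # Pick up ServiceMonitors that carry this label.
  serviceMonitorSelector:
    matchLabels:
      app.kubernetes.io/name: servicemonitor
  # An empty selector matches ServiceMonitors in all namespaces.
  serviceMonitorNamespaceSelector: {}
EOF

# Review the file, then apply it with:
#   kubectl apply -f prometheus-selector.yaml
```

If your Prometheus was installed by a chart, the same two fields are usually set through the chart's values rather than by editing the resource directly.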
| 36 | + |
| 37 | +### KEDA Configuration |
| 38 | + |
| 39 | +KEDA (Kubernetes Event-driven Autoscaling) scales workloads based on external events and metrics, and can scale them down to zero replicas. Its Prometheus trigger evaluates a PromQL query at a fixed interval and scales the target according to the result. |
| 40 | + |
| 41 | + |
| 42 | +```yaml |
| 43 | +apiVersion: keda.sh/v1alpha1 |
| 44 | +kind: ScaledObject |
| 45 | +metadata: |
| 46 | +  name: qwen2-0--5b-scaler |
| 47 | +  namespace: default |
| 48 | +spec: |
| 49 | +  scaleTargetRef: |
| 50 | +    apiVersion: inference.llmaz.io/v1alpha1 |
| 51 | +    kind: Playground |
| 52 | +    name: qwen2-0--5b |
| 53 | +  pollingInterval: 30 |
| 54 | +  cooldownPeriod: 50 |
| 55 | +  minReplicaCount: 0 |
| 56 | +  maxReplicaCount: 3 |
| 57 | +  triggers: |
| 58 | +  - type: prometheus |
| 59 | +    metadata: |
| 60 | +      serverAddress: http://prometheus-operated.llmaz-system.svc.cluster.local:9090 |
| 61 | +      metricName: llamacpp:requests_processing |
| 62 | +      query: sum(llamacpp:requests_processing) |
| 63 | +      threshold: "0.2" |
| 64 | +``` |
| 65 | + |
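| 66 | +- `serverAddress` must resolve to your Prometheus service; the value above assumes the `prometheus-operated` Service in the `llmaz-system` namespace. |
| 67 | +- `pollingInterval` (seconds) sets how often KEDA evaluates the trigger, and `cooldownPeriod` (seconds) sets how long KEDA waits after the last active trigger before scaling back to zero. Tune both to avoid flapping, and do not manage the same target with a separate HorizontalPodAutoscaler. |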
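To see how `threshold`, `minReplicaCount`, and `maxReplicaCount` interact, the following sketch approximates the replica count for a given query result. It assumes the trigger's default average-value semantics, where the target is roughly `ceil(query result / threshold)` clamped to the configured bounds. The function name `desired_replicas` is hypothetical, and the sketch ignores the autoscaler's tolerance and stabilization behavior.

```bash
# Approximate replica count for a query result (simplified model, see lead-in).
# Usage: desired_replicas <metric_value> <threshold> <min> <max>
desired_replicas() {
  awk -v v="$1" -v t="$2" -v lo="$3" -v hi="$4" 'BEGIN {
    r = v / t
    n = int(r); if (n < r) n++     # ceil(v / t) for non-negative values
    if (n < lo) n = lo             # clamp to minReplicaCount
    if (n > hi) n = hi             # clamp to maxReplicaCount
    print n
  }'
}

# Values from the ScaledObject above: threshold 0.2, min 0, max 3.
desired_replicas 0   0.2 0 3   # idle: prints 0
desired_replicas 0.3 0.2 0 3   # prints 2
desired_replicas 1   0.2 0 3   # one in-flight request: prints 3 (capped)
```

With a threshold of `0.2`, a single request in flight is already enough to reach `maxReplicaCount`, so a higher threshold may be appropriate if that is not the intended behavior.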
| 68 | + |
| 69 | +### Integration with Activator |
| 70 | + |
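| 71 | +When `minReplicaCount` is 0, no Pods are running to serve the first request. Consider adding an activator for this scale-from-zero case: it can hold incoming requests, trigger a scale-up, and forward the requests once a replica is ready. The activator can be implemented using the controller pattern or as a standalone goroutine. |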
| 72 | + |
| 73 | +### Controller Runtime Framework |
| 74 | + |
| 75 | +Using the Controller Runtime framework can simplify the development of Kubernetes controllers. It provides abstractions for managing resources and handling events. |
| 76 | + |
| 77 | +#### Key Components |
| 78 | + |
| 79 | +1. **Controller**: Watches resources and triggers actions that bring the actual state in line with the desired state. |
| 80 | +2. **Reconcile Function**: Core logic that moves a resource from its actual state toward its desired state. |
| 81 | +3. **Manager**: Manages the lifecycle of controllers and the resources they share. |
| 82 | +4. **Client**: Interface for reading and writing objects through the Kubernetes API. |
| 83 | +5. **Scheme**: Registry that maps Go types to Kubernetes API groups, versions, and kinds. |
| 84 | +6. **Event Source and Handler**: Define which resource events a controller watches and how those events are turned into reconcile requests. |
| 85 | + |
| 86 | + |
| 87 | +## Quick Start Guide |
| 88 | + |
| 89 | +1. Install llmaz with Helm, then install KEDA and Prometheus with the `make` targets shown below. See the official [Install Guide](https://llmaz.inftyai.com/docs/getting-started/installation/) for details. |
| 90 | + |
| 91 | +```bash |
| 92 | +helm install llmaz oci://registry-1.docker.io/inftyai/llmaz --namespace llmaz-system --create-namespace --version 0.0.10 |
| 93 | +make install-keda |
| 94 | +make install-prometheus |
| 95 | +``` |
| 96 | + |
| 97 | +2. Create a ServiceMonitor for Prometheus to discover your services. |
| 98 | +```bash |
| 99 | +kubectl apply -f service-monitor.yaml |
| 100 | +``` |
| 101 | + |
| 102 | +3. Create a ScaledObject for KEDA to manage scaling. |
| 103 | +```bash |
| 104 | +kubectl apply -f scaled-object.yaml |
| 105 | +``` |
| 106 | + |
| 107 | +4. Trigger a cold start by sending a request to the model's Service from the activator Pod. |
| 108 | +```bash |
| 109 | +kubectl exec -it -n kube-system deploy/activator -- wget -O- qwen2-0--5b-lb.default.svc:8080 |
| 110 | +``` |
| 111 | + |
| 112 | +5. Monitor metrics and scaling activity from the Prometheus and KEDA dashboards in a browser. The command below exposes the Prometheus UI on port 9090. |
| 113 | +```bash |
| 114 | +kubectl port-forward services/prometheus-operated 9090:9090 --address 0.0.0.0 -n llmaz-system |
| 115 | +``` |
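Scaling activity can also be checked from the command line. The helper below is a sketch, and its name `show_scaling_state` is hypothetical. It lists the ScaledObject and the HorizontalPodAutoscaler that KEDA manages for it, then queries the trigger metric through the Prometheus HTTP API. It assumes the port-forward from step 5 is running on `localhost:9090` and that the ScaledObject lives in the `default` namespace, as in the example above.

```bash
# Hypothetical helper: inspect the scaler, its HPA, and the trigger metric.
# Usage: show_scaling_state [namespace]
show_scaling_state() {
  ns="${1:-default}"
  # ScaledObject status as seen by KEDA.
  kubectl get scaledobject -n "$ns"
  # HPA that KEDA creates for the ScaledObject.
  kubectl get hpa -n "$ns"
  # Current value of the trigger query (requires the port-forward from step 5).
  curl -s --get 'http://localhost:9090/api/v1/query' \
    --data-urlencode 'query=sum(llamacpp:requests_processing)'
}
```

Run `show_scaling_state` before and after the cold-start request in step 4 to watch the replica count change.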
| 116 | + |
| 117 | +## Conclusion |
| 118 | + |
| 119 | +This guide covers setting up a serverless environment on Kubernetes with Prometheus for metrics and KEDA for autoscaling. A ServiceMonitor makes the model's metrics available to Prometheus, and a ScaledObject uses those metrics to scale the Playground between 0 and 3 replicas. |