Commit 66faed7

feat: add serverless usage doc for llmaz.
Signed-off-by: X1aoZEOuO <[email protected]>
1 parent b096ddf commit 66faed7

3 files changed: 129 additions and 1 deletion
Makefile

Lines changed: 9 additions & 0 deletions
@@ -302,6 +302,15 @@ install-prometheus:
 uninstall-prometheus:
 	kubectl delete -k config/prometheus

+.PHONY: install-keda
+install-keda:
+	helm repo add kedacore https://kedacore.github.io/charts
+	helm install keda kedacore/keda --namespace keda --create-namespace
+
+.PHONY: uninstall-keda
+uninstall-keda:
+	helm uninstall keda -n keda
+
 ##@Release

 .PHONY: artifacts

docs/examples/serverless/README.md

Lines changed: 119 additions & 0 deletions
@@ -0,0 +1,119 @@
# Serverless Configuration and Documentation

## Overview

This guide shows how to configure serverless serving on Kubernetes, using Prometheus for monitoring and KEDA for autoscaling. The goal is to use resources efficiently by scaling applications with load, including down to zero replicas when idle.

## Concepts

### Prometheus Configuration

Prometheus is used for monitoring and alerting. A ServiceMonitor tells Prometheus which Services to scrape: its `namespaceSelector` controls which namespaces are searched for matching Services, which is what enables cross-namespace discovery. On the Prometheus resource, `serviceMonitorSelector` determines which ServiceMonitors are picked up.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: qwen2-0--5b-lb-monitor
  namespace: llmaz-system
  labels:
    control-plane: controller-manager
    app.kubernetes.io/name: servicemonitor
spec:
  # Search every namespace for Services that match the selector below.
  namespaceSelector:
    any: true
  selector:
    matchLabels:
      llmaz.io/model-name: qwen2-0--5b
  endpoints:
    - port: http
      path: /metrics
      scheme: http
```

- Set `namespaceSelector` so that Services outside the ServiceMonitor's own namespace can be discovered (`any: true` matches all namespaces).
- Label your Services so that they match the ServiceMonitor's `selector`; otherwise Prometheus will not discover them.
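
As a rough illustration of how `selector.matchLabels` decides which Services are scraped, the sketch below reimplements equality-based label matching in plain Go. It is not the Prometheus Operator's code: the helper name `matchesLabels` and the sample label sets are made up for this example.

```go
package main

import "fmt"

// matchesLabels reports whether every key/value pair in selector is present
// in labels. This mirrors matchLabels semantics: extra labels on the object
// are ignored, and an empty selector matches everything.
// Hypothetical helper, not part of any Kubernetes library.
func matchesLabels(selector, labels map[string]string) bool {
	for key, want := range selector {
		if got, ok := labels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	selector := map[string]string{"llmaz.io/model-name": "qwen2-0--5b"}

	// Assumed label sets for two Services.
	matching := map[string]string{
		"llmaz.io/model-name": "qwen2-0--5b",
		"app":                 "example",
	}
	other := map[string]string{"llmaz.io/model-name": "some-other-model"}

	fmt.Println(matchesLabels(selector, matching)) // true
	fmt.Println(matchesLabels(selector, other))    // false
}
```

This is why the labels on the created Service matter: a Service without the `llmaz.io/model-name` label is never selected, regardless of the namespace settings.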

### KEDA Configuration

KEDA (Kubernetes Event-driven Autoscaling) scales applications based on custom metrics. It can use a Prometheus query as the trigger for scaling actions.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: qwen2-0--5b-scaler
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: inference.llmaz.io/v1alpha1
    kind: Playground
    name: qwen2-0--5b
  pollingInterval: 30  # seconds between trigger checks
  cooldownPeriod: 50   # seconds to wait before scaling back to zero
  minReplicaCount: 0   # allow scale-to-zero
  maxReplicaCount: 3
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-operated.llmaz-system.svc.cluster.local:9090
        metricName: llamacpp:requests_processing
        query: sum(llamacpp:requests_processing)
        threshold: "0.2"
```

- Ensure that `serverAddress` points to the Prometheus service in your cluster.
- Adjust `pollingInterval` (how often, in seconds, KEDA checks the trigger) and `cooldownPeriod` (how long, in seconds, KEDA waits after the last active trigger before scaling back to zero) to tune scaling behavior and avoid conflicts with other scaling mechanisms.

### Integration with Activator

For scale-from-zero scenarios, consider pairing this configuration with an activator: a component that receives requests while no replicas are running and triggers a scale-up. The activator can be implemented using a controller pattern or as a standalone goroutine.

### Controller Runtime Framework

The controller-runtime framework simplifies the development of Kubernetes controllers by providing abstractions for managing resources and handling events.

#### Key Components

1. **Controller**: Monitors resource states and triggers actions to align actual and desired states.
2. **Reconcile Function**: Core logic that moves a resource from its actual state toward its desired state.
3. **Manager**: Manages the lifecycle of controllers and shared resources.
4. **Client**: Interface for interacting with the Kubernetes API.
5. **Scheme**: Registry for resource types.
6. **Event Source and Handler**: Define where events come from and how they are handled.
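
The reconcile idea behind these components can be shown without the framework. The sketch below is a toy, standard-library-only model: `State` stands in for what a client would read from and write to the API server, and `Reconcile` is an idempotent function that moves actual state toward desired state. All names here are hypothetical. In controller-runtime itself, the equivalent method has the signature `Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error)`.

```go
package main

import "fmt"

// State is a stand-in for cluster state in this toy model.
type State struct {
	Desired map[string]int // spec: replicas wanted per workload
	Actual  map[string]int // status: replicas currently running
}

// Reconcile drives one named workload toward its desired state and reports
// whether it changed anything. Calling it again after convergence is a no-op,
// which is what makes retries and duplicate events safe.
func (s *State) Reconcile(name string) (changed bool) {
	want, exists := s.Desired[name]
	if !exists {
		// The object was deleted: clean up anything still running.
		if _, running := s.Actual[name]; running {
			delete(s.Actual, name)
			return true
		}
		return false
	}
	if got, running := s.Actual[name]; running && got == want {
		return false // already converged
	}
	s.Actual[name] = want
	return true
}

func main() {
	s := &State{
		Desired: map[string]int{"qwen2-0--5b": 2},
		Actual:  map[string]int{},
	}
	fmt.Println(s.Reconcile("qwen2-0--5b"), s.Actual) // true map[qwen2-0--5b:2]
	fmt.Println(s.Reconcile("qwen2-0--5b"))           // false

	delete(s.Desired, "qwen2-0--5b")
	fmt.Println(s.Reconcile("qwen2-0--5b"), s.Actual) // true map[]
}
```

In the real framework, the Manager wires this function to event sources, so `Reconcile` is invoked with the name of whichever object changed.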

## Quick Start Guide

1. Install llmaz with Helm, then install KEDA and Prometheus with the repository's Makefile targets. See the [Install Guide](https://llmaz.inftyai.com/docs/getting-started/installation/) for details.

```bash
helm install llmaz oci://registry-1.docker.io/inftyai/llmaz --namespace llmaz-system --create-namespace --version 0.0.10
make install-keda
make install-prometheus
```

2. Create a ServiceMonitor so that Prometheus can discover your Services (for example, the manifest above saved as `service-monitor.yaml`).
```bash
kubectl apply -f service-monitor.yaml
```

3. Create a ScaledObject so that KEDA manages scaling (for example, the manifest above saved as `scaled-object.yaml`).
```bash
kubectl apply -f scaled-object.yaml
```

4. Trigger a cold start by sending a request to the model's Service from the activator pod.
```bash
kubectl exec -it -n kube-system deploy/activator -- wget -O- qwen2-0--5b-lb.default.svc:8080
```

5. Port-forward Prometheus and open its web UI on port 9090 to watch the metrics that drive scaling.
```bash
kubectl port-forward services/prometheus-operated 9090:9090 --address 0.0.0.0 -n llmaz-system
```

## Conclusion

This guide covers setting up a serverless environment with Kubernetes, Prometheus, and KEDA. With the ServiceMonitor and ScaledObject above in place, your applications are monitored by Prometheus and scaled by KEDA according to load.

pkg/controller/inference/service_controller.go

Lines changed: 1 addition & 1 deletion
@@ -439,7 +439,7 @@ func CreateServiceIfNotExists(ctx context.Context, k8sClient client.Client, Sche
 		ObjectMeta: metav1.ObjectMeta{
 			Name: svcName,
 			Namespace: service.Namespace,
-			Labels: modelLabels(model[0]),
+			Labels: modelLabels(model[0]),
 			// For activator service, we can ignore it if serverless config is not enabled.
 			Annotations: activatorAnnotations(model[0]),
 		},
