
Commit 02e29fd

Merge pull request #189 from panchul/k8sdeployment

Adding "deploying model on k8s" chapter

2 parents 56c54fb + c1d2bae

File tree

13 files changed: +487 −52 lines changed
Lines changed: 336 additions & 0 deletions
# Deploying a model to Kubernetes

This section guides you through the steps needed to deploy a model for inferencing in a GPU-enabled Kubernetes cluster.

## Prerequisites

1. Have a valid Microsoft Azure subscription
2. Be able to provision GPU-enabled VMs
3. Have access to a VM image repository (a DockerHub account, or ACR)

Clone this repository somewhere so you can easily access the different source files:

    $ git clone https://github.com/Azure-Samples/azure-intelligent-edge-patterns.git

If you do not have a VM, you need to create one using the Azure Portal or the Azure CLI, for example as sketched below. We recommend selecting
the 'Data Science' images, because they have pre-installed drivers and GPU utilities.
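For example, provisioning a GPU-enabled Data Science VM with the Azure CLI might look like the following sketch. The resource group, VM name, image URN, and VM size below are placeholders/assumptions; check `az vm image list --publisher microsoft-dsvm --all --output table` for the current image URNs in your region:

    # A minimal sketch, assuming the Azure CLI is installed and you are logged in (az login).
    # The image URN and VM size are assumptions; verify them for your subscription/region.
    $ az group create --name my-gpu-rg --location eastus
    $ az vm create \
        --resource-group my-gpu-rg \
        --name my-gpu-vm \
        --image microsoft-dsvm:ubuntu-1804:1804:latest \
        --size Standard_NC6 \
        --admin-username azureuser \
        --generate-ssh-keys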
If you already have a VM, you need to validate that you have access to GPUs.

You can see the hardware using `lspci`:

    $ lspci
    ...
    0001:00:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
    ...

And you can use the `nvidia-smi` utility to see the GPU driver and CUDA versions:

    $ nvidia-smi
    Thu Sep 17 18:03:11 2020
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 450.36.06    Driver Version: 450.36.06    CUDA Version: 11.0     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Tesla K80           On   | 00000001:00:00.0 Off |                    0 |
    | N/A   57C    P0    59W / 149W |  10957MiB / 11441MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |    0   N/A  N/A     19918      C   ...5b8c54360196ff/bin/python    10952MiB |
    +-----------------------------------------------------------------------------+
You can also check that your GPUs are available from within containers.

Please see the [NVIDIA webpage](https://docs.nvidia.com/datacenter/kubernetes/kubernetes-upstream/index.html#kubernetes-run-a-workload) if you have any problems. You should see something like this, for example:

    $ sudo docker run --rm --runtime=nvidia nvidia/cuda nvidia-smi
    Unable to find image 'nvidia/cuda:latest' locally
    latest: Pulling from nvidia/cuda
    3ff22d22a855: Pull complete
    e7cb79d19722: Pull complete
    323d0d660b6a: Pull complete
    b7f616834fd0: Pull complete
    c2607e16e933: Pull complete
    46a16da628dc: Pull complete
    4871b8b75027: Pull complete
    e45235afa764: Pull complete
    250da266cf64: Pull complete
    78f4b6d02e6c: Pull complete
    ebf42dcedf4b: Pull complete
    Digest: sha256:0fe0406ec4e456ae682226751434bdd7e9b729a03067d795f9b34c978772b515
    Status: Downloaded newer image for nvidia/cuda:latest
    Thu Sep 17 17:06:27 2020
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 11.0     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla K80           On   | 0000DE85:00:00.0 Off |                    0 |
    | N/A   39C    P8    25W / 149W |      0MiB / 11441MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
## Creating a one-node Kubernetes cluster

To create a simple one-node Kubernetes cluster, you can use `snap` to install `microk8s`:

    $ sudo snap install microk8s --edge --classic

Add your current user to the microk8s group:

    $ sudo usermod -a -G microk8s $USER
    $ sudo chown -f -R $USER ~/.kube

You will also need to re-enter the session for the group update to take effect:

    $ su - $USER

Then start it:

    $ microk8s.start --wait-ready

You need to enable its components depending on the desired configuration; for example, dns, storage, and dashboard:

    $ microk8s.enable dns storage dashboard

The most important one for us is access to the GPU:

    $ microk8s.enable gpu

You will be able to see the nodes:

    $ microk8s.kubectl get nodes
    NAME                STATUS   ROLES    AGE   VERSION
    sandbox-dsvm-tor4   Ready    <none>   14h   v1.19.2-34+88df35f6de9eb1

And the GPU-support information in the description of the node:

    $ microk8s.kubectl describe node sandbox-dsvm-tor4
    Capacity:
      ...
      nvidia.com/gpu:  1
      ...
    Allocatable:
      ...
      nvidia.com/gpu:  1
      ...
      Namespace    Name                                  CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
      ---------    ----                                  ------------  ----------  ---------------  -------------  ---
      ...
      kube-system  nvidia-device-plugin-daemonset-hmzbl  0 (0%)        0 (0%)      0 (0%)           0 (0%)         14h
      ...
    Allocated resources:
      Resource         Requests  Limits
      --------         --------  ------
      ...
      nvidia.com/gpu   1         1
      ...

After installing Kubernetes, you should also be able to run NVIDIA's examples from
https://github.com/NVIDIA/k8s-device-plugin. Here is what the [`gpu-pod`](https://github.com/NVIDIA/k8s-device-plugin/blob/examples/workloads/pod.yml)
example looks like when run successfully:

    $ git clone -b examples https://github.com/NVIDIA/k8s-device-plugin.git
    $ cd k8s-device-plugin/workloads
    $ kubectl create -f pod.yml

    $ kubectl exec -it gpu-pod nvidia-smi
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 384.125                Driver Version: 384.125                   |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla V100-SXM2...  On   | 00000000:00:1E.0 Off |                    0 |
    | N/A   34C    P0    20W / 300W |     10MiB / 16152MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+

If it does not work, please check the instructions at NVIDIA's examples page, https://github.com/NVIDIA/k8s-device-plugin/blob/examples/workloads/pod.yml

For generality, we will be using `kubectl` instead of `microk8s.kubectl`; you are encouraged to alias it to a shortcut, as shown below.
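For example, either a snap alias or a shell alias will make plain `kubectl` work; this is a minimal sketch, pick one:

    # System-wide snap alias:
    $ sudo snap alias microk8s.kubectl kubectl

    # Or a per-user shell alias (add it to ~/.bashrc to persist):
    $ alias kubectl='microk8s.kubectl'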
## Creating a serialized model

We will be deploying a model that we created in Azure ML, using a notebook from [../../machine-learning-notebooks](../../machine-learning-notebooks).

If you want to create one yourself, update it to use your own account and run through
[machine-learning-notebooks/production-deploy-to-ase-gpu.ipynb](../../machine-learning-notebooks/production-deploy-to-ase-gpu.ipynb).

Here is how it would look in Azure ML; you will need to make sure you install `azureml-sdk`:

![JupyterLab](pics/jupiter_lab_azureml-sdk.png)
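If it is not already present in your environment, installing the SDK is a one-liner (in JupyterLab you may need to restart the kernel afterwards):

    $ pip install azureml-sdk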
If your image is not deployed on a publicly available image registry, you will need to log in with your credentials. You can
retrieve your credentials from the notebook through your workspace (for example, `ws.subscription_id`) and
`ContainerRegistryManagementClient`; see the similar cells in [machine-learning-notebooks/production-deploy-to-ase-gpu.ipynb](../../machine-learning-notebooks/production-deploy-to-ase-gpu.ipynb):

    ...
    imagename = "tfgpu"
    imagelabel = "1.0"
    package = Model.package(ws, [model], inference_config=inference_config, image_name=imagename, image_label=imagelabel)
    package.wait_for_creation(show_output=True)
    client = ContainerRegistryManagementClient(ws._auth, subscription_id)
    result = client.registries.list_credentials(ws.resource_group, reg_name, custom_headers=None, raw=False)

    print("ACR:", package.get_container_registry())
    print("Image:", package.location)
    print("using username \"" + result.username + "\"")
    print("using password \"" + result.passwords[0].value + "\"")
    ...

It will print out the values (which you could also see in the Portal, in your Azure ML workspace):

    ...
    ACR: 12345678901234567890.azurecr.io
    Image: 1234567dedede1234567ceeeee.azurecr.io/tfgpu:1.0
    using username "9876543210abcdef"
    using password "876543210987654321abcdef"
    ...
On the Kubernetes cluster where you want this image to be available, you will need to log in to your ACR:

    $ docker login 12345678901234567890.azurecr.io
    Username: c6a1e081293c442e9465100e3021da63
    Password:
    Login Succeeded
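Alternatively, if the Azure CLI is installed on that machine, you can let it fetch the token for you (the registry name here is the same placeholder as above):

    $ az acr login --name 12345678901234567890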
This will record the authentication token in your `~/.docker/config.json`, and you will then be able to
create a Kubernetes secret from it to access your private repository:

    $ kubectl create secret generic secret4acr2infer \
        --from-file=.dockerconfigjson=/home/azureuser/.docker/config.json \
        --type=kubernetes.io/dockerconfigjson

For more information, please see [Pull an Image from a Private Registry](https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/).
In the following steps we denote the image as `1234567dedede1234567ceeeee.azurecr.io/tfgpu:1.0`; you can tag your own image following whatever naming conventions you like.
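For example, retagging a locally built image and pushing it to your registry might look like this sketch (the local image name, registry, and tag are placeholders):

    # Hypothetical names; substitute your own local image, registry, and tag.
    $ docker tag tfgpu:latest 1234567dedede1234567ceeeee.azurecr.io/tfgpu:1.0
    $ docker push 1234567dedede1234567ceeeee.azurecr.io/tfgpu:1.0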
## Creating a Deployment

We provide the Deployment file, `deploy_infer.yaml`:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-infer
      labels:
        app: my-infer
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: my-infer
      template:
        metadata:
          labels:
            app: my-infer
        spec:
          containers:
          - name: my-infer
            image: 1234567dedede1234567ceeeee.azurecr.io/tfgpu:1.0
            ports:
            # we use only 5001, but the container exposes 5001, 8883, and 8888 (EXPOSE 5001 8883 8888)
            - containerPort: 5001
            - containerPort: 8883
            - containerPort: 8888
            resources:
              limits:
                memory: "128Mi"  # 128 MiB
                cpu: "200m"      # 200 millicpu (0.2, or 20% of a CPU)
                nvidia.com/gpu: 1
          imagePullSecrets:
          - name: secret4acr2infer

You will need to update the image source to point to your own DockerHub account or an ACR you have access to.
You can deploy this Deployment like so:

    $ kubectl create -f deploy_infer.yaml

And you can see it instantiated, with the pod being created, etc.:

    $ kubectl get deployment
    NAME       READY   UP-TO-DATE   AVAILABLE   AGE
    my-infer   1/1     1            1           1m
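You can also watch the pod itself come up; the pod name suffix is generated by Kubernetes, so yours will differ from this hypothetical one:

    $ kubectl get pods -l app=my-infer
    NAME                        READY   STATUS    RESTARTS   AGE
    my-infer-5c4f9d8b7b-x2k4q   1/1     Running   0          1m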
## Creating a Service

You can then expose the Deployment, to have access to it via a Service:

    $ kubectl expose deployment my-infer --type=LoadBalancer --name=my-service-infer

You should see the Service, and if everything is OK, in a few minutes you will have an external IP address. (On a one-node microk8s cluster without a load-balancer addon, the EXTERNAL-IP may stay `<pending>`; you can still reach the Service at its CLUSTER-IP, as we do below.)

    $ kubectl get service
    NAME               TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                                        AGE
    ...
    my-service-infer   LoadBalancer   10.152.183.221   <pending>     5001:30372/TCP,8883:32004/TCP,8888:31221/TCP   1m
    ...
## Running inference

The way our inference server is set up, we need to make an HTTP POST request to it on port 5001.
You are free to use whatever utility you like (curl, Postman, etc.); we provide a Python script that makes the request and
converts the returned numbers into the labels this model (ResNet50) uses.

**IMPORTANT**: In the script you need to put the address of your own server, for example, the CLUSTER-IP of the Service we created earlier:

    import requests

    # downloading labels for the imagenet dataset that the resnet model was trained on
    classes_entries = requests.get("https://raw.githubusercontent.com/Lasagne/Recipes/master/examples/resnet50/imagenet_classes.txt").text.splitlines()

    test_sample = open('snowleopardgaze.jpg', 'rb').read()
    print(f"test_sample size is {len(test_sample)}")

    try:
        # scoring_uri = 'http://<replace with your edge device ip address>:5001/score'
        scoring_uri = 'http://10.152.183.221:5001/score'

        headers = {'Content-Type': 'application/json'}
        resp = requests.post(scoring_uri, test_sample, headers=headers)

        print("Found: " + classes_entries[int(resp.text.strip("[]")) - 1])

    except KeyError as e:
        print(str(e))

Run it like so:

    $ python runtest_infer.py
    test_sample size is 62821
    Found: snow leopard, ounce, Panthera uncia

And it should identify the objects in your image.
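If you prefer curl, an equivalent request might look like the sketch below, assuming the same CLUSTER-IP and a local `snowleopardgaze.jpg`; it returns the raw class index rather than a label:

    $ curl -X POST http://10.152.183.221:5001/score \
        -H "Content-Type: application/json" \
        --data-binary @snowleopardgaze.jpg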
## Links

- https://docs.nvidia.com/datacenter/kubernetes/kubernetes-upstream/index.html#kubernetes-run-a-workload - NVIDIA webpage.
- https://github.com/NVIDIA/k8s-device-plugin/blob/examples/workloads/pod.yml - NVIDIA example repository.
- https://docs.microsoft.com/en-us/azure/container-registry/container-registry-get-started-docker-cli - ACR information.
- https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/ - working with private repositories in Kubernetes.
Lines changed: 18 additions & 0 deletions
#!/bin/bash

# You need to log in to the container registry first, to get a config.json with the authentication tokens.
#
# For example,
# $ docker login <myaccount>.azurecr.io

sudo microk8s.kubectl create secret generic secret4acr2infer \
    --from-file=.dockerconfigjson=/home/azureuser/.docker/config.json \
    --type=kubernetes.io/dockerconfigjson

#
# You can also create a secret using your SPN ID and secret (PowerShell syntax):
#
# kubectl create secret docker-registry <secret name> `
#     --docker-server=<crname, with FQDN> `
#     --docker-username=$userSPNID `
#     --docker-password=$userSPNSecret
Lines changed: 37 additions & 0 deletions
#
# You can deploy this Deployment like so:
#
# $ kubectl create -f deploy_infer.yaml
#
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-infer
  labels:
    app: my-infer
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-infer
  template:
    metadata:
      labels:
        app: my-infer
    spec:
      containers:
      - name: my-infer
        image: myregistry.azurecr.io/rollingstone/myinfer:1.0
        ports:
        # we use only 5001, but the container exposes 5001, 8883, and 8888 (EXPOSE 5001 8883 8888)
        - containerPort: 5001
        - containerPort: 8883
        - containerPort: 8888
        resources:
          limits:
            memory: "128Mi"  # 128 MiB
            cpu: "200m"      # 200 millicpu (0.2, or 20% of a CPU)
            nvidia.com/gpu: 1
      imagePullSecrets:
      - name: secret4acr2infer
Lines changed: 3 additions & 0 deletions
#!/bin/bash

kubectl expose deployment my-infer --type=LoadBalancer --name=my-service-infer