# KFServing TensorFlow models

## Building a model and running inference on it

Before we plug our models into KFServing, we can build our own model, and we will need to
serialize it. As an example, we build a model based on Keras's `MobileNet`, serialize it, and show
how to load it and create our own `.npy` input file for inferencing with the CLI.

See the [TensorFlow documentation about saving and loading models](https://www.tensorflow.org/guide/saved_model)
for more details. This is how it works (we skipped a few lines for clarity):

    $ python3 tensorflow_custom_model.py
    2020-10-07 20:32:09.913482: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
    TensorFlow version: 2.3.1
    pciBusID: 0001:00:00.0 name: Tesla K80 computeCapability: 3.7
    Downloading data from https://storage.googleapis.com/download.tensorflow.org/example_images/grace_hopper.jpg
    65536/61306 [================================] - 0s 0us/step
    Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/ImageNetLabels.txt
    16384/10484 [==============================================] - 0s 0us/step
    Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet/mobilenet_1_0_224_tf.h5
    17227776/17225924 [==============================] - 0s 0us/step
    2020-10-07 20:32:20.189468: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
    2020-10-07 20:32:27.640408: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
    Result for test image: [458 835 907 452 544]
    ['bow tie' 'suit' 'Windsor tie' 'bolo tie' 'dumbbell']
    Result before saving: [653 458 835 440 716]
    ['military uniform' 'bow tie' 'suit' 'bearskin' 'pickelhaube']
    mobilenet_save_path is build_models/mobilenet/1/
    infer. structured_outputs: {'predictions': TensorSpec(shape=(None, 1000), dtype=tf.float32, name='predictions')}
    Result after saving and loading: ['military uniform' 'bow tie' 'suit' 'bearskin' 'pickelhaube']
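
For reference, this is roughly what a script like `tensorflow_custom_model.py` does. The sketch below
is an approximation, not the script's exact code; the preprocessing details are assumptions:

    import numpy as np
    import tensorflow as tf

    print("TensorFlow version:", tf.__version__)

    # Pretrained MobileNet from Keras applications.
    model = tf.keras.applications.MobileNet()

    # Fetch a test image and preprocess it into the (1, 224, 224, 3) shape the model expects.
    img_path = tf.keras.utils.get_file(
        "grace_hopper.jpg",
        "https://storage.googleapis.com/download.tensorflow.org/example_images/grace_hopper.jpg")
    img = tf.keras.preprocessing.image.load_img(img_path, target_size=(224, 224))
    x = tf.keras.applications.mobilenet.preprocess_input(
        tf.keras.preprocessing.image.img_to_array(img)[np.newaxis, ...])

    # Top-5 indices into ImageNetLabels.txt (the file has a leading "background" entry, hence +1).
    print("Result before saving:", np.argsort(model.predict(x))[0, ::-1][:5] + 1)

    # Serialize as a SavedModel; the trailing /1/ is the version directory TF Serving expects.
    mobilenet_save_path = "build_models/mobilenet/1/"
    tf.saved_model.save(model, mobilenet_save_path)

    # Load it back and call the serving signature to confirm the round trip.
    infer = tf.saved_model.load(mobilenet_save_path).signatures["serving_default"]
    preds = infer(input_1=tf.constant(x))["predictions"].numpy()
    print("Result after saving and loading:", np.argsort(preds)[0, ::-1][:5] + 1)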

You can now see the metadata of the saved model:

    $ saved_model_cli show --dir ./build_models/mobilenet/1/ --tag_set serve
    2020-10-07 20:41:58.110146: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
    The given SavedModel MetaGraphDef contains SignatureDefs with the following keys:
    SignatureDef key: "__saved_model_init_op"
    SignatureDef key: "serving_default"
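
The same metadata can be read back from Python as well; a small sketch, assuming the model path used above:

    import tensorflow as tf

    loaded = tf.saved_model.load("./build_models/mobilenet/1/")
    print(list(loaded.signatures.keys()))    # ['serving_default']
    infer = loaded.signatures["serving_default"]
    print(infer.structured_input_signature)  # the input_1 TensorSpec
    print(infer.structured_outputs)          # {'predictions': TensorSpec(shape=(None, 1000), ...)}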

And you can see the details of the inputs and outputs:

    $ saved_model_cli show --dir ./build_models/mobilenet/1/ --tag_set 'serve' --signature_def serving_default
    2020-10-07 21:00:22.577704: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
    The given SavedModel SignatureDef contains the following input(s):
      inputs['input_1'] tensor_info:
          dtype: DT_FLOAT
          shape: (-1, 224, 224, 3)
          name: serving_default_input_1:0
    The given SavedModel SignatureDef contains the following output(s):
      outputs['predictions'] tensor_info:
          dtype: DT_FLOAT
          shape: (-1, 1000)
          name: StatefulPartitionedCall:0
    Method name is: tensorflow/serving/predict
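
The `saved_model_cli run` call below feeds the model a NumPy file that matches the `input_1` signature
above, shape `(1, 224, 224, 3)`. A minimal sketch of producing such a file (the image filename
`bowtie.jpg` is just an assumption):

    import numpy as np
    import tensorflow as tf

    # Load an image of a bow tie and resize it to the model's expected input size.
    img = tf.keras.preprocessing.image.load_img("bowtie.jpg", target_size=(224, 224))
    x = tf.keras.preprocessing.image.img_to_array(img)[np.newaxis, ...]
    x = tf.keras.applications.mobilenet.preprocess_input(x)

    # Save it in the shape and dtype the serving signature expects: (1, 224, 224, 3), float32.
    np.save("mybowtie.npy", x.astype(np.float32))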

You can run this model using the CLI like so:

    $ saved_model_cli run --dir ./build_models/mobilenet/1/ --tag_set 'serve' --signature_def serving_default --inputs "input_1=mybowtie.npy"
    2020-10-07 22:09:05.114650: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
    2020-10-07 22:09:06.590719: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
    ...
    INFO:tensorflow:Restoring parameters from ./build_models/mobilenet/1/variables/variables
    ...
    [[8.14447232e-10 1.08833642e-09 3.05714520e-09 3.76422588e-10
      3.02461900e-09 1.86771834e-10 9.58564408e-11 2.23442289e-11
      ...
      1.03344687e-10 1.75437484e-10 5.70104797e-10 2.57304542e-08
      9.80437953e-10 6.50071597e-09 7.63548336e-10 2.22535121e-07
      2.11364273e-10 1.93390726e-09 1.75153725e-09 3.15297433e-09
      3.13854276e-10 1.25729163e-10 1.90465019e-10 2.17428101e-06
      3.23613469e-09 6.73297507e-09 1.32053316e-07 7.10744175e-08
      1.44242229e-09 9.99776065e-01 1.08120508e-08 2.66501246e-07 <------------- here is the bow tie, 0.999776, index 458
      5.10951594e-11 2.09783249e-08 2.71486139e-10 4.61643097e-08
      4.05468148e-09 1.06352536e-06 1.00858000e-09 6.74229839e-11
      2.58849914e-10 2.56112132e-09 3.45258333e-09 2.42699444e-10
      6.64567623e-10 9.48480761e-09 8.73305410e-08 1.71701653e-10
      4.04795251e-12 2.47852516e-09 5.37987823e-08 1.00287258e-10
      ...
      1.32482428e-11 6.76930595e-11 7.33395428e-11 1.21903876e-10
      8.87640048e-12 1.07872808e-10 5.34377209e-10 1.29179213e-07]]
    ...

## Deploying a model

To deploy a model, you need to create an `InferenceService`:

    $ kubectl create -f tensorflow_flowers.yaml -n kfserving-test
    inferenceservice.serving.kubeflow.org/flowers-sample configured

Give it some time to create the pods. You should eventually see it in the `READY` state, with a URL:

    $ kubectl get inferenceservices -n kfserving-test
    NAME             READY   URL                                          DEFAULT TRAFFIC   CANARY TRAFFIC   AGE
    flowers-sample   True    http://flowers-sample.default.example.com    90                10               48s

Now you can identify the host and port to make requests to; this [depends on your environment](https://github.com/kubeflow/kfserving).

For stand-alone KFServing using minikube:

    $ export INGRESS_HOST=$(minikube ip)
    $ export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')

For KFServing deployment within Kubeflow:

    $ export INGRESS_HOST=$(kubectl -n istio-system get service kfserving-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
    $ export INGRESS_PORT=$(kubectl -n istio-system get service kfserving-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')

For other stand-alone KFServing deployments:

    $ export INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
    $ export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')

We also need to define the model we want to interact with (for the `curl` command we compose later):

    $ export MODEL_NAME=flowers-sample
    $ export INPUT_PATH=@./tensorflow_input.json
    $ export SERVICE_HOSTNAME=$(kubectl get inferenceservice ${MODEL_NAME} -n kfserving-test -o jsonpath='{.status.url}' | cut -d "/" -f 3)

Now do the inferencing itself:

    $ curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/$MODEL_NAME:predict -d $INPUT_PATH
    *   Trying 12.34.56.78...
    * Connected to 12.34.56.78 (12.34.56.78) port 80 (#0)
    > POST /v1/models/flowers-sample:predict HTTP/1.1
    > Host: flowers-sample.kfserving-test.example.com
    > User-Agent: curl/7.47.0
    > Accept: */*
    > Content-Length: 16201
    > Content-Type: application/x-www-form-urlencoded
    > Expect: 100-continue
    >
    < HTTP/1.1 100 Continue
    * We are completely uploaded and fine
    < HTTP/1.1 200 OK
    < content-length: 221
    < content-type: application/json
    < date: Tue, 06 Oct 2020 17:24:59 GMT
    < x-envoy-upstream-service-time: 331
    < server: istio-envoy
    <
    {
      "predictions": [
        {
          "scores": [0.999114931, 9.20987877e-05, 0.000136786475, 0.00033725836, 0.000300533167, 1.84813962e-05],
          "prediction": 0,
          "key": " 1"
        }
      ]
    * Connection #0 to host 12.34.56.78 left intact
    }
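
The same request can be issued from Python. A small sketch using the `requests` library and the
environment variables exported above (assuming `requests` is installed):

    import json
    import os

    import requests

    with open("tensorflow_input.json") as f:
        payload = json.load(f)

    url = "http://{}:{}/v1/models/{}:predict".format(
        os.environ["INGRESS_HOST"], os.environ["INGRESS_PORT"], os.environ["MODEL_NAME"])

    # The Host header routes the request to the right InferenceService behind the ingress gateway.
    resp = requests.post(url, json=payload, headers={"Host": os.environ["SERVICE_HOSTNAME"]})
    print(resp.json())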

## Deploying a custom model

The prepared sample model is stored at `gs://kfserving-samples/models/tensorflow/flowers`. A custom model you build yourself
needs to be put into a location the `InferenceService` CRD understands, which is, at the time of writing, one of
`gs://`, `s3://`, or `pvc://`.

For a detached cluster, you can create local storage using the `persistence.yaml` we provide in the `sbin` folder
and deploy it in the `kfserving-test` namespace like so:

    $ kubectl create -f persistence.yaml -n kfserving-test
    storageclass.storage.k8s.io/local-storage created
    persistentvolume/samba-share-volume created
    persistentvolumeclaim/samba-share-claim created

You should see the volume claim:

    $ kubectl get pvc -n kfserving-test
    NAME                STATUS   VOLUME               CAPACITY   ACCESS MODES   STORAGECLASS    AGE
    samba-share-claim   Bound    samba-share-volume   2Gi        RWX            local-storage   16h

And the volume itself:

    $ kubectl get pv -n kfserving-test
    NAME                 CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                              STORAGECLASS    REASON   AGE
    ...
    samba-share-volume   2Gi        RWX            Retain           Bound    kfserving-test/samba-share-claim   local-storage            16h

Then you can copy your model from `build_models` to wherever your PVC points, and reference that location in your deployment YAML, like so:

    $ cat tensorflow_custom_model.yaml
    apiVersion: "serving.kubeflow.org/v1alpha2"
    kind: "InferenceService"
    metadata:
      name: "custom-model"
    spec:
      default:
        predictor:
          tensorflow:
            #storageUri: "gs://rollingstone/mobilenet"
            storageUri: "pvc://samba-share-claim/mymodels/build_models/mobilenet"

## Inferencing using the custom model

See the [TensorFlow REST API documentation](https://www.tensorflow.org/tfx/serving/api_rest) on constructing and interpreting the JSON input/output.

For example, for the custom model we created earlier, we need to define the instances with `input_1`.
We can feed the 3-dimensional array of pixel values like so (see the script `tensorflow_web_infer.py` for implementation suggestions):

    {
      "instances":[
        {"input_1":[[
          [25, 28, 82], [29, 31, 91], [27, 28, 95], [28, 27, 96],
          ...
          [13, 12, 18]
        ]]
        }
      ]
    }

And we should get the predictions back:

    {
      "predictions": [[7.41982103e-06, 0.00287958328, 0.000219230162, 4.96962894e-05,
        ...
      ]]
    }

It is up to the user of the API to pre-process the input and to post-process the results according to the application's needs.
See `tensorflow_web_infer.py` for an example of how to pick the right index and get the label for your model.
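
For illustration, here is a rough sketch of that kind of pre- and post-processing. It is not the actual
`tensorflow_web_infer.py`; the image filename is an assumption, and the endpoint, headers, and label file
simply mirror the steps above:

    import os

    import numpy as np
    import requests
    import tensorflow as tf

    # Pre-process: load an image and turn it into the nested-list form shown above.
    img = tf.keras.preprocessing.image.load_img("bowtie.jpg", target_size=(224, 224))
    x = tf.keras.applications.mobilenet.preprocess_input(
        tf.keras.preprocessing.image.img_to_array(img))
    payload = {"instances": [{"input_1": x.tolist()}]}

    # Call the custom-model InferenceService; SERVICE_HOSTNAME is assumed to have been
    # re-exported for "custom-model" rather than "flowers-sample".
    url = "http://{}:{}/v1/models/custom-model:predict".format(
        os.environ["INGRESS_HOST"], os.environ["INGRESS_PORT"])
    resp = requests.post(url, json=payload, headers={"Host": os.environ["SERVICE_HOSTNAME"]})
    predictions = np.array(resp.json()["predictions"])

    # Post-process: pick the highest-scoring index and map it to a human-readable label.
    # ImageNetLabels.txt has a leading "background" entry, hence the +1 offset.
    labels_path = tf.keras.utils.get_file(
        "ImageNetLabels.txt",
        "https://storage.googleapis.com/download.tensorflow.org/data/ImageNetLabels.txt")
    labels = np.array(open(labels_path).read().splitlines())
    print(labels[np.argmax(predictions[0]) + 1])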

## Links

- https://www.tensorflow.org/guide/saved_model
- https://www.tensorflow.org/tfx/serving/api_rest
- https://www.tensorflow.org/tfx/tutorials/serving/rest_simple

[Back](Readme.md)