Automated kubernetes cluster for distributed facial recognition

GHRik/Distributed-k8s-face-recognition

Distributed face recognition

Using kubernetes cluster

Table of contents

  1. Quick Start
  2. Features
  3. Used technology
  4. Description
  5. Helpful Ansible tags
  6. CUDA Support
  7. Without CUDA Support
  8. Example Result
  9. Prepare your own face database
  10. Debug / Known Bugs
  11. License

Quick Start

To deploy:

git clone https://github.com/GHRik/Distributed-k8s-face-recognition.git
cd Distributed-k8s-face-recognition/ansible
ansible-playbook -i inventory.yaml main.yaml

Features

Fully automated deployment.

Used technology:

  1. dlib - library used to recognize faces
  2. CUDA - GPU acceleration
  3. Ansible - automates cluster creation
  4. Kubernetes - runs the cluster
  5. my Docker Hub repo - stores the built images
  6. kubernetes-sample-cluster - the code this project is patterned on
  7. nvidia-docker - passes my GPU through to containers
  8. Microsoft Azure cloud - for testing
  9. Calico - as the Kubernetes CNI plugin

Description

      This repo is reworked code from this repo, so if you want more information about the components or how everything works together, check this link.

If it is still unclear how it works, maybe this diagram will help you ;) Example

Where is it distributed?

      dlib has a thread pool that is used to find face dis

Helpful Ansible tags

      Depending on the state of your machines, you can deploy this code with one of the following Ansible tags:

...

nvidia-docker and the Kubernetes packages are not installed yet

ansible-playbook -i inventory.yaml main.yaml

...

You have a cluster, but face recognition from this repo is not deployed on it yet

ansible-playbook -i inventory.yaml main.yaml --tags "deploy"

...

You have a cluster with face recognition from this repo deployed, but you changed the kube files or the known/unknown people images

ansible-playbook -i inventory.yaml main.yaml --tags "redeploy"

...

You have a cluster with this face recognition deployed, but the images did not load or there is an error in the "recognize" role

ansible-playbook -i inventory.yaml main.yaml --tags "recognize"

...

You already have a cluster with face recognition deployed, but want to recreate the cluster

ansible-playbook -i inventory.yaml main.yaml --tags "destroy_cluster"
ansible-playbook -i inventory.yaml main.yaml

...

You have a deployed face recognition cluster, but want to clear it:

ansible-playbook -i inventory.yaml main.yaml --tags "destroy"
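The tag choices above can be wrapped in a small helper so that a mistyped tag is caught before Ansible runs. This is only a sketch: the function name `playbook_command` is hypothetical and not part of this repo, and it only prints the command instead of running it.

```shell
# Hypothetical helper (not part of this repo): print the ansible-playbook
# command for one of the tags described in this section.
playbook_command() {
    tag=${1:-}
    case "$tag" in
        "")
            # No tag: full install and deploy.
            echo "ansible-playbook -i inventory.yaml main.yaml"
            ;;
        deploy|redeploy|recognize|destroy_cluster|destroy)
            echo "ansible-playbook -i inventory.yaml main.yaml --tags \"$tag\""
            ;;
        *)
            echo "unknown tag: $tag" >&2
            return 1
            ;;
    esac
}
```

For example, `playbook_command redeploy` prints the redeploy command, and `eval "$(playbook_command redeploy)"` would run it from the ansible directory.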

CUDA Support

      This code supports CUDA. To deploy the cluster with CUDA support:

Check which CUDA version your GPU is using:

nvidia-smi

You will see output like this:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01    Driver Version: 465.19.01    CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA Tesla K80    Off  | 00000001:00:00.0 Off |                    0 |
| N/A   34C    P8    32W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

This cluster was tested using CUDA 11.3, but other versions can be pulled from my Docker Hub. Only one pod, face_recognition, runs with CUDA support. If you want to change the CUDA version, change this line to another version:

face_recognition.yaml

30: image: ghrik/face_recognition:cuda11.3
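As a rough sketch, the CUDA version can be read from the `nvidia-smi` header and turned into an image reference. The function name `cuda_image_tag` is hypothetical, and the `cuda<version>` tag pattern is an assumption: only `cuda11.3` is confirmed here, so check Docker Hub before using any other tag.

```shell
# Hypothetical helper (not part of this repo): read nvidia-smi output on
# stdin and print an image reference for the detected CUDA version.
# Assumption: image tags follow the pattern cuda<version>, as in cuda11.3.
cuda_image_tag() {
    # Extract X.Y from the "CUDA Version: X.Y" field of the header line.
    version=$(sed -n 's/.*CUDA Version: \([0-9][0-9.]*\).*/\1/p' | head -n 1)
    if [ -z "$version" ]; then
        echo "could not find a CUDA version in the input" >&2
        return 1
    fi
    echo "ghrik/face_recognition:cuda${version}"
}
```

Usage: `nvidia-smi | cuda_image_tag`, then put the printed value on the image line of face_recognition.yaml if that tag exists.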

This script uses nvidia-docker to set up GPU scheduling on the k8s cluster, so you should uninstall Docker first if you already have it installed.

Without CUDA Support

      You can run this cluster without CUDA.

In this case you have to change the image line in

face_recognition.yaml

30: image: ghrik/face_recognition:1.0

Example Result

      Results are in two places:

results.txt - if Ansible finishes properly, this file is filled with the measured time it took to recognize each given face

$ cat results/results.txt

Server is on: http://10.98.219.249:8081
LOGS:
Checking image: unknown_people/unknown_02.PNG
Time: 0.4799957275390625 sec.

Checking image: unknown_people/unknown_03.PNG
Time: 0.6136119365692139 sec.

Checking image: unknown_people/unknown_04.PNG
Time: 0.5596208572387695 sec.

Checking image: unknown_people/unknown_01.PNG
Time: 0.46269893646240234 sec.

The first line of results.txt is the address of the frontend site. On this site you can see which faces have been recognized. Example
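To get a quick summary of the timings, the `Time:` lines can be averaged with awk. This is a sketch that assumes the file keeps the layout shown above (`Time: <seconds> sec.`); the function name `summarize_times` is hypothetical.

```shell
# Hypothetical helper (not part of this repo): count the "Time:" lines in a
# results file and print their mean.
summarize_times() {
    # LC_ALL=C keeps the decimal point independent of the system locale.
    LC_ALL=C awk '
        /^Time:/ { n++; sum += $2 }
        END {
            if (n > 0) {
                printf "images=%d mean=%.3f sec\n", n, sum / n
            } else {
                print "no timings found"
            }
        }' "$1"
}
```

Usage: `summarize_times results/results.txt`.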

Prepare your own face database

      As you can see, this cluster only checks the faces in the unknown_people dir. To build your own face database, make a small change in

ansible/kube_files/database_setup.sql

The first step is to add the people:

insert into person (name) values('Damian');
insert into person (name) values('Barack');
insert into person (name) values('Duda');
insert into person (name) values('Lewy');

It is very simple: just add one line like these for each person.

The next step is to link each picture in known_people to a person_id:

insert into person_images (image_name, person_id) values ('damian_01.PNG', 1);
insert into person_images (image_name, person_id) values ('damian_02.PNG', 1);
insert into person_images (image_name, person_id) values ('barack_01.jpg', 2);
insert into person_images (image_name, person_id) values ('barack_02.PNG', 2);
insert into person_images (image_name, person_id) values ('duda_01.PNG', 3);
insert into person_images (image_name, person_id) values ('duda_02.PNG', 3);
insert into person_images (image_name, person_id) values ('lewy_01.PNG', 4);
insert into person_images (image_name, person_id) values ('lewy_02.PNG', 4);
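For a larger database, the person_images lines can be generated instead of typed by hand. The sketch below prints one insert per image file for a given person id. The function name `person_image_inserts` is hypothetical, and the id you pass must match the person's row; the example above suggests ids are assigned in insert order starting at 1, but that is an assumption about the schema.

```shell
# Hypothetical helper (not part of this repo): print one INSERT statement
# per image file for the given person id.
person_image_inserts() {
    person_id=$1
    shift
    for image in "$@"; do
        # Only the file name is stored, not the directory it lives in.
        name=$(basename "$image")
        printf "insert into person_images (image_name, person_id) values ('%s', %s);\n" "$name" "$person_id"
    done
}
```

Usage: `person_image_inserts 1 known_people/damian_*.PNG`, then paste the output into database_setup.sql. File names containing a single quote would need escaping first.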

Debug / Known Bugs

      In case of any error, check the image_processor pod first

kubectl logs image_processor
  • List out of range
          Probably one of the images (in unknown_people or known_people) does not contain any face to recognize. In this case image_processor cannot process the image.

  • Image_processor is not up
          Sometimes image_processor needs more time to come up. You can see this when you start a new cluster: pulling the image to the pod can take a long time.

  • No such file or directory on image_processor pod
          Sometimes face_recog_unknown_pvc gets bound to face_recog_known_pv. Rerun with the "redeploy" tag.

  • dont_delete dir in unknown_people
          Don't delete end.jpg; it is tied to showing the time for all recognized faces.

  • Sleep 60 in recognize
          Sometimes the other services need more time to come up. To deploy faster you can comment out "sleep 60" and, if the recognize step then fails, rerun with the tag "recognize".

  • Circuitbreaker is engaged
          It means you have more than 5 images in the unknown_people dir. It will probably unfreeze on its own; if not, you can add a sleep in

ansible/roles/recognize/tasks/main.yaml

40: shell: sleep 10 && curl -d '{"path":"{{ item.path }}"}' http://{{ receiver_ip.stdout }}:8000/image/post

Or add fewer face pictures ;)

  • Core dump when using the image without CUDA
          ghrik/face_recognition:1.0 was built with AVX acceleration. All of the CUDA images use SSE4 (not AVX). If you want to use dlib without AVX acceleration, check the flags in the dlib section of:
images/face_recognitionGPU/Dockerfile

and compare them with

images/face_recognition/Dockerfile
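For the "Circuitbreaker is engaged" case, the image count can be checked before running the playbook. This is a sketch: the function names are hypothetical, the limit of 5 comes from the note above, and only files directly inside the directory are counted, since it is not stated whether files in subdirectories such as dont_delete count toward the limit.

```shell
# Hypothetical helpers (not part of this repo).

# Count regular files directly inside a directory (subdirectories ignored).
count_images() {
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Warn and return non-zero when the directory holds more files than the limit.
check_image_limit() {
    dir=$1
    limit=${2:-5}
    count=$(count_images "$dir")
    if [ "$count" -gt "$limit" ]; then
        echo "warning: $count images in $dir (limit $limit)"
        return 1
    fi
    echo "ok: $count images in $dir"
}
```

Usage: `check_image_limit unknown_people`, run from the directory that contains unknown_people.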

License

      Free to use ;)