Deep computer-vision algorithms for Processing.
The idea behind this library is to provide a simple way to run machine learning algorithms (inference only) for computer vision tasks inside Processing. Portability and ease of use are the primary goals of this library. Because of that (and JavaCV), there is no GPU support at the moment.
Caution: The API is still in development and can change at any time.
Lightweight OpenPose Example
Download the latest prebuilt version from the releases section and install it into your Processing library folder.
Because the library is still under development, it is not yet published in the Processing contribution manager.
The base of the library is the `DeepVision` class. It is used to download the pre-trained models and to create new networks.
```java
import ch.bildspur.vision.*;
import ch.bildspur.vision.network.*;
import ch.bildspur.vision.result.*;

DeepVision vision = new DeepVision(this);
```
Usually it makes sense to declare the network globally for your sketch and create it in setup(). The create method downloads the pre-trained weights if they do not already exist. The network first has to be created and then set up.
```java
YOLONetwork network;

void setup() {
  // create the network & download the pre-trained models
  network = vision.createYOLOv3();

  // load the model
  network.setup();

  // set network settings (optional)
  network.setConfidenceThreshold(0.2f);
  ...
}
```
By default, the weights are stored in the library folder of Processing. If you want to download them to the sketch folder, use the following command:
```java
// download to library folder
vision.storeNetworksGlobal();

// download to sketch/networks
vision.storeNetworksInSketch();
```
Each network has a run() method, which takes an image as a parameter and returns a result. You can pass in any PImage and the library starts processing it.
```java
PImage myImg = loadImage("hello.jpg");
ResultList<ObjectDetectionResult> detections = network.run(myImg);
```
Please have a look at the specific networks for further information or at the examples.
Here is a list of the implemented networks:
- Object Detection ✨
- YOLOv3-tiny
- YOLOv3-tiny-prn
- EfficientNetB0-YOLOv3
- YOLOv3 OpenImages Dataset
- YOLOv3-spp (spatial pyramid pooling)
- YOLOv3
- YOLOv4
- YOLOv4-tiny
- SSDMobileNetV2
- Handtracking based on SSDMobileNetV2
- TextBoxes
- Ultra-Light-Fast-Generic-Face-Detector-1MB RFB (~30 FPS on CPU)
- Ultra-Light-Fast-Generic-Face-Detector-1MB Slim (~40 FPS on CPU)
- Cascade Classifier
- Object Segmentation
- Mask R-CNN
- Object Recognition 🚙
- Tesseract LSTM
- Keypoint Detection 🤾🏻♀️
- Facial Landmark Detection
- Single Human Pose Detection based on lightweight openpose
- Classification 🐈
- MNIST CNN
- FER+ Emotion
- Age Net
- Gender Net
- Depth Estimation 🕶
- MidasNet
- Image Processing
- Style Transfer
- Multiple Networks for x2 x3 x4 Superresolution
The following list shows the networks that are on the list to be implemented (⚡️ already in progress):
- YOLO 9K (not supported by OpenCV)
- Multi Human Pose Detection ⚡️ (currently struggling with the partial affinity fields 🤷🏻♂️ help?)
- TextBoxes++ ⚡️
- CRNN ⚡️
- PixelLink
Locating one or multiple predefined objects in an image is the task of the object detection networks.
YOLO Example
The result of these networks is usually a list of `ObjectDetectionResult`.
```java
ObjectDetectionNetwork net = vision.createYOLOv3();
net.setup();

// detect new objects
ResultList<ObjectDetectionResult> detections = net.run(image);

for (ObjectDetectionResult detection : detections) {
  println(detection.getClassName() + "\t[" + detection.getConfidence() + "]");
}
```
Every object detection result contains the following fields:
- `getClassId()` - id of the class the object belongs to
- `getClassName()` - name of the class the object belongs to
- `getConfidence()` - how confident the network is about this detection
- `getX()` - x position of the bounding box
- `getY()` - y position of the bounding box
- `getWidth()` - width of the bounding box
- `getHeight()` - height of the bounding box
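Taken together, these getters are all you need to draw the detections onto the canvas. A minimal sketch of a draw loop, assuming `detections` was obtained from run() as in the example above:

```java
for (ObjectDetectionResult detection : detections) {
  // draw the bounding box
  noFill();
  stroke(255, 0, 0);
  rect(detection.getX(), detection.getY(), detection.getWidth(), detection.getHeight());

  // label it with class name and confidence
  fill(255, 0, 0);
  text(detection.getClassName() + " " + nf(detection.getConfidence(), 0, 2),
       detection.getX(), detection.getY() - 5);
}
```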
YOLOv3 [Paper]
YOLOv3 is the third version of the very fast and accurate single-shot detection network. The pre-trained model is trained on the 80-class COCO dataset. There are several different weights & models available in the repository:
- YOLOv3-tiny (very fast, but trading performance for accuracy)
- YOLOv3-spp (original model using spatial pyramid pooling)
- YOLOv3 (608)
- YOLOv4 (608) (most accurate network)
- YOLOv4-tiny (416)
```java
// setup the network (choose one)
YOLONetwork net = vision.createYOLOv4();
YOLONetwork net = vision.createYOLOv4Tiny();
YOLONetwork net = vision.createYOLOv3();
YOLONetwork net = vision.createYOLOv3SPP();
YOLONetwork net = vision.createYOLOv3Tiny();

// set confidence threshold
net.setConfidenceThreshold(0.2f);
```
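The pieces above can be tied together into a live-camera sketch. A sketch under the assumption that the Processing video library is installed (any PImage source works the same way, since Capture is a PImage):

```java
import processing.video.*;
import ch.bildspur.vision.*;
import ch.bildspur.vision.result.*;

DeepVision vision = new DeepVision(this);
YOLONetwork network;
Capture cam;

void setup() {
  size(640, 480);

  // create, download and load the network
  network = vision.createYOLOv3();
  network.setup();
  network.setConfidenceThreshold(0.2f);

  // start the webcam
  cam = new Capture(this, width, height);
  cam.start();
}

void draw() {
  if (cam.available()) {
    cam.read();
  }
  image(cam, 0, 0);

  // run inference on the current frame and draw the results
  ResultList<ObjectDetectionResult> detections = network.run(cam);
  noFill();
  stroke(0, 255, 0);
  for (ObjectDetectionResult detection : detections) {
    rect(detection.getX(), detection.getY(), detection.getWidth(), detection.getHeight());
  }
}
```

Running the network every frame is CPU-intensive; for smoother sketches it can help to run inference only every few frames and reuse the last result in between.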
SSDMobileNetV2 [Paper]
This network is a single-shot detector based on the MobileNetV2 architecture. It is pre-trained on the 90-class COCO dataset and is really fast.
```java
SSDMobileNetwork net = vision.createMobileNetV2();
```
Handtracking [Project]
This is a pre-trained SSD MobilenetV2 network to detect hands.
```java
SSDMobileNetwork net = vision.createHandDetector();
```
TextBoxes [Paper]
TextBoxes is a detector for scene text in the wild, based on SSD MobileNet. It is able to detect text in a scene and return its location.
```java
TextBoxesNetwork net = vision.createTextBoxesDetector();
```
Ultra-Light-Fast-Generic-Face-Detector [Project]
The ULFG face detector is a very fast CNN-based face detector which reaches up to 40 FPS on a MacBook Pro. It comes with four different pre-trained weights:
- RFB640 & RFB320 - More accurate but slower detector
- Slim640 & Slim320 - Less accurate but faster detector
```java
// choose one of the four variants
ULFGFaceDetectionNetwork net = vision.createULFGFaceDetectorRFB640();
ULFGFaceDetectionNetwork net = vision.createULFGFaceDetectorRFB320();
ULFGFaceDetectionNetwork net = vision.createULFGFaceDetectorSlim640();
ULFGFaceDetectionNetwork net = vision.createULFGFaceDetectorSlim320();
```
Note that the detector detects only the frontal part of the face, not the complete head. Most algorithms that run on face detection results expect exactly such a rectangular detection shape.
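If a follow-up network (for example an emotion or age classifier) expects a cropped face as input, the rectangular detection can be cut out with PImage's get() method. A minimal sketch, assuming the detections were returned by run() as in the object detection example:

```java
ResultList<ObjectDetectionResult> faces = net.run(myImg);

for (ObjectDetectionResult face : faces) {
  // cut out the detected face rectangle
  PImage crop = myImg.get(face.getX(), face.getY(), face.getWidth(), face.getHeight());

  // crop can now be passed on to another network
}
```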
Cascade Classifier [Paper]
The cascade classifier detector is based on boosting and is commonly used as a pre-processor for many classifiers.
```java
CascadeClassifierNetwork net = vision.createCascadeFrontalFace();
```
tbd
tbd
tbd
Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer
MidasNet
tbd
- Install JDK 8 (because of Processing)
Download the Maven snapshots for JavaCV (this step is no longer necessary):

```bash
mvn -U compile
```
Run Gradle to build a fat jar:

```bash
# windows
gradlew.bat fatjar

# mac / unix
./gradlew fatjar
```
Create a new release:

```bash
./release.sh version
```
Why is the xy network not implemented?
Please open an issue if you have a cool network that could be implemented or just contribute a PR.
Why is it not possible to train my own network?
The idea is to give artists and makers a simple tool to run networks inside Processing. Training a network requires a lot of specific knowledge about neural networks (CNNs in particular).
Of course it is possible to train your own YOLO or SSDMobileNet network and use the weights with this library.
Is it compatible with Greg Borenstein's OpenCV for Processing?
No, OpenCV for Processing uses the direct OpenCV Java bindings instead of JavaCV. Please include only one of the two libraries, because Processing gets confused if two OpenCV packages are imported.
Maintained by cansik with the help of the following dependencies:
Stock images from the following people have been used:
- yoga.jpg by Yogendra Singh from Pexels
- office.jpg by fauxels from Pexels
- faces.png by shvetsa from Pexels
- hand.jpg by Thought Catalog on Unsplash
- sport.jpg by John Torcasio on Unsplash
- sticker.jpg by 🇨🇭 Claudio Schwarz | @purzlbaum on Unsplash
- children.jpg by Sandeep Kr Yadav