This repository implements vehicle detection and flags risky distances between vehicles, using YOLACT++ ("YOLACT++: Better Real-time Instance Segmentation") for object detection.
It is based on the repositories Social-Distance-Monitoring by Pias Paul and YOLACT by Daniel Bolya.
- CUDA 10.2 (for utilizing GPU you'll need CUDA version 10.x)
- Python 3.7 (3.6 should also work; I haven't tested 3.8)
- PyTorch 1.6.0 (with torchvision 0.7.0)
- OpenCV 4.4.0
- Flask 1.1.2
This repo is modified to be used on Windows 10 (2004).
I highly recommend using a virtual environment with conda or virtualenv.
First, you need to clone the repository and go into this directory (project root):
git clone https://github.com/ArVar/VehicleDistance.git
cd VehicleDistance
You can install all packages by running:
pip install -r requirements.txt
You can also manually install all packages listed in this file.
Feel free to experiment with different versions of the packages (as I've done). When building on a newer stack, make sure the CUDA toolkit and PyTorch versions match. To install PyTorch with pip, please follow the instructions from PyTorch.
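Before experimenting with other versions, it can help to see which PyTorch and CUDA versions your environment actually picks up. The following is a minimal sketch and not part of this repo; the helper name `describe_environment` is made up for illustration, and it simply reports `None` if PyTorch is not installed.

```python
import sys


def describe_environment():
    """Collect the Python, PyTorch and CUDA versions visible to this interpreter."""
    info = {
        "python": "%d.%d.%d" % sys.version_info[:3],
        "torch": None,
        "cuda": None,
        "gpu_available": False,
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment; report what we have.
        return info
    info["torch"] = str(torch.__version__)
    info["cuda"] = torch.version.cuda  # None for CPU-only builds of PyTorch
    info["gpu_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `gpu_available` is `False` although you have a GPU, the installed PyTorch build and the CUDA toolkit most likely do not match.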
This repo is mainly meant for watching the inference stream in your browser. Running the server.py script starts a Flask app that provides the interface.
You can also run the inference from the command line using inference.py (see the options below).
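A common way to show a video stream in a browser is an MJPEG stream, where each JPEG-encoded frame is sent as one part of a `multipart/x-mixed-replace` response. The sketch below illustrates that general technique only; whether server.py and webapp.py do it exactly this way is an assumption, and the function name `mjpeg_chunks` is made up.

```python
def mjpeg_chunks(jpeg_frames, boundary=b"frame"):
    """Wrap each JPEG-encoded frame (bytes) in one multipart/x-mixed-replace part."""
    for jpeg in jpeg_frames:
        header = b"--" + boundary + b"\r\n" + b"Content-Type: image/jpeg\r\n\r\n"
        yield header + jpeg + b"\r\n"


if __name__ == "__main__":
    # Two fake "frames" stand in for real JPEG data from the model output.
    for chunk in mjpeg_chunks([b"<jpeg 1>", b"<jpeg 2>"]):
        print(chunk)
```

In a Flask view, such a generator could be returned as `Response(mjpeg_chunks(frames), mimetype="multipart/x-mixed-replace; boundary=frame")`, with `frames` being whatever yields the annotated, JPEG-encoded frames.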
If you want to run the inference on an IP camera, use WebcamVideoStream with the following source in webapp.py:
"rtsp://assigned_name_of_the_camera:assigned_password@camera_ip/"
An example stream is available at:
"rtsp://170.93.143.139/rtplive/470011e600ef003a004ee33696235daa"
To be able to use YOLACT++, make sure you have the CUDA Toolkit installed, then build the DCNv2 extension:
cd external/DCNv2
python setup.py build develop
The official Yolact repository offers several pre-trained models:
Image Size | Model File (-m) | Config (-c) |
---|---|---|
550 | yolact_resnet50_54_800000.pth | yolact_resnet50 |
550 | yolact_darknet53_54_800000.pth | yolact_darknet53 |
550 | yolact_base_54_800000.pth | yolact_base |
700 | yolact_im700_54_800000.pth | yolact_im700 |
550 | yolact_plus_resnet50_54_800000.pth | yolact_plus_resnet50 |
550 | yolact_plus_base_54_800000.pth | yolact_plus_base |
Download the pre-trained weights and save them in the folder ./weights (relative to your project root). For instance, yolact_plus_base is hardcoded in webapp.py.
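In the table above, every config name is the weights file name without its two trailing numeric fields and the extension. A small helper can derive one from the other so that `-m` and `-c` do not drift apart. This is a sketch based only on the naming pattern in the table; `config_from_weights` is a hypothetical helper, not a function of this repo or of YOLACT.

```python
import re
from pathlib import Path

# <config>_<number>_<number>, e.g. yolact_base_54_800000
# (the two numbers are presumably epoch and iteration; that is an assumption)
_WEIGHTS_PATTERN = re.compile(r"^(?P<config>.+)_(?P<epoch>\d+)_(?P<iteration>\d+)$")


def config_from_weights(weights_path):
    """Derive the config name (-c) from a weights file name (-m)."""
    stem = Path(weights_path).stem  # drops the directory and the .pth suffix
    match = _WEIGHTS_PATTERN.match(stem)
    if match is None:
        raise ValueError(f"Unexpected weights file name: {weights_path!r}")
    return match.group("config")


if __name__ == "__main__":
    print(config_from_weights("weights/yolact_plus_base_54_800000.pth"))
```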
Now you can run the webapp via:
python server.py
This starts the webserver and the webapp. With the standard configuration the webapp is locally reachable via localhost:5000.
You can specify a path (URL) for HTTPS and RTSP streams or just a digit for one of your webcam devices.
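The distinction between a device digit and a stream URL can be sketched as follows. OpenCV's `cv2.VideoCapture` accepts either an integer device index or a string, so the input has to be converted accordingly. The helper name `parse_source` is made up for this example, and how webapp.py handles the input internally is not shown here.

```python
def parse_source(value):
    """Return an int device id for digit-only input, otherwise the URL/path string."""
    text = str(value).strip()
    if text.isdecimal():
        return int(text)  # e.g. "0" -> first webcam device
    return text  # e.g. an RTSP or HTTPS stream URL


if __name__ == "__main__":
    print(parse_source("0"))
    print(parse_source("rtsp://170.93.143.139/rtplive/470011e600ef003a004ee33696235daa"))
```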
Alternatively, you can run the inference from your terminal with the following command:
python inference.py -m=weights/yolact_base_54_800000.pth -c=yolact_base -i 0
Here -i 0 defines the device id. Use 0 if you want to run the inference on your webcam feed. If you don't pass any arguments, it will run with the default values. You can tweak the following values according to your preferences.
Input | Standard Value | Description |
---|---|---|
width, height | 1280 x 720 | Resolution of the output video. |
display_lincomb | False | Display Lincomb masks (if the config uses them). |
crop | True | For better segmentation, set this flag to True. |
score_threshold | 0.15 | The higher the value, the fewer objects are detected and the better the performance. |
top_k | 30 | Maximum number of objects the model considers in a given frame. |
display_masks | True | Draw segmentation masks. |
display_fps | True | Display FPS counter. |
display_text | True | Allow text to be displayed. |
display_bboxes | True | Display bounding boxes around detected objects. |
display_scores | True | Display classification score. |
fast_nms | True | Use fast NMS (Non-Maximum Suppression). |
cross_class_nms | True | Use Cross-Class-NMS. |
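
To illustrate how score_threshold and top_k interact, here is a minimal sketch of the filtering idea. It is an illustration only: the `(label, score)` tuple layout and the function name `filter_detections` are assumptions for this example, not the data structures used by the model code.

```python
def filter_detections(detections, score_threshold=0.15, top_k=30):
    """Keep at most top_k detections whose score reaches score_threshold.

    detections: iterable of (label, score) tuples (hypothetical layout).
    """
    # Highest scores first, so cutting at top_k keeps the best candidates.
    ranked = sorted(detections, key=lambda det: det[1], reverse=True)
    return [det for det in ranked[:top_k] if det[1] >= score_threshold]


if __name__ == "__main__":
    dets = [("car", 0.9), ("truck", 0.1), ("car", 0.5), ("car", 0.3)]
    print(filter_detections(dets))           # default threshold and top_k
    print(filter_detections(dets, top_k=2))  # keep only the two best
```

Raising score_threshold or lowering top_k leaves fewer objects to draw and compare, which is why both values affect the frame rate.
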
To measure the distance between two vehicles, the Euclidean distance is used. The Euclidean distance (or Euclidean metric) is the "ordinary" straight-line distance between two points in Euclidean space.
The Euclidean distance between two points p and q is the length of the line segment connecting them. In the Euclidean plane, if p = (p1, p2) and q = (q1, q2), then the distance is given by
$$d(p, q) = \sqrt{(q_1 - p_1)^2 + (q_2 - p_2)^2}$$
This formula is applied in the draw_distance(boxes) function, which receives from the model all bounding boxes of the vehicle classes car and truck in a given frame. Each bounding box is a regression value of the form (x, y, w, h), where x and y are the two coordinates of the vehicle, and w and h are the width and height, respectively. The distance is calculated for every combination of two boxes.
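The pairwise calculation can be sketched as follows. This is not the repo's draw_distance implementation. It assumes that (x, y) is the top-left corner of a box and that the distance is measured between box centers in pixels; both points are assumptions, and the helper names are made up for this example.

```python
from itertools import combinations
from math import hypot


def box_center(box):
    """Center of an (x, y, w, h) box, assuming (x, y) is the top-left corner."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)


def pairwise_distances(boxes):
    """Euclidean distance between the centers of every pair of boxes.

    Returns a list of ((i, j), distance) entries with i < j.
    """
    centers = [box_center(box) for box in boxes]
    result = []
    for i, j in combinations(range(len(centers)), 2):
        (p1, p2), (q1, q2) = centers[i], centers[j]
        result.append(((i, j), hypot(q1 - p1, q2 - p2)))
    return result


if __name__ == "__main__":
    boxes = [(0, 0, 2, 2), (3, 4, 2, 2)]  # centers (1, 1) and (4, 5)
    print(pairwise_distances(boxes))
```

For n boxes this yields n * (n - 1) / 2 distances, so the top_k setting also limits how many pairs have to be compared per frame.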
Thanks to Pias Paul for providing his repository on GitHub. It was a very good starting point, with only a few caveats when running on Windows. I recommend checking out his other repos as well.
Thanks to Daniel Bolya et al. for introducing a single-shot detection (SSD) implementation for segmentation in YOLACT and YOLACT++, which makes it less memory hungry.