In this article I introduce an algorithm for detecting objects in images and video streams. The following information is based on the Darknet/YOLO project; a short presentation is available on TED.
To compile the project I will use Microsoft Visual Studio 2017 and an updated version of the Darknet project. The following is needed to compile it:
You can find the updated project for Visual Studio 2017 on GitHub.
The project has the following include paths set up (Project->Properties->C/C++->General->Additional Include Directories):
Library paths for the linker (Project->Properties->Linker->General->Additional Library Directories):
Microsoft Visual Studio 2017 - project open, function main() in darknet.c
For testing and debugging the application, you can set the command arguments (Project->Properties->Debugging->Command Arguments) to "imtest ./data/eagle.jpg". This is the first test example.
After running it you will see:
The next example detects objects in a selected image.
First set Command Arguments to: "detect ./cfg/yolov3.cfg yolov3.weights ./data/dog.jpg"
This executes the command: yolo.exe detect ./cfg/yolov3.cfg yolov3.weights ./data/dog.jpg
The file with the pretrained weights, "yolov3.weights", is available here.
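The same detection command can also be launched from a script, which is handy when several images need to be processed. The following is only a minimal sketch in Python: the helper names build_detect_command and run_detect are hypothetical, and it assumes that yolo.exe, the cfg folder and yolov3.weights are all reachable from the working directory.

```python
import subprocess


def build_detect_command(image, exe="yolo.exe",
                         cfg="./cfg/yolov3.cfg", weights="yolov3.weights"):
    """Return the argument list for a single-image detection run.

    The argument order mirrors the command shown in the article:
    yolo.exe detect <cfg> <weights> <image>
    """
    return [exe, "detect", cfg, weights, str(image)]


def run_detect(image, workdir="."):
    """Run the detector and return its console output as text.

    Assumption: the executable and data files live under `workdir`.
    Raises subprocess.CalledProcessError if the program exits with an error.
    """
    result = subprocess.run(build_detect_command(image), cwd=workdir,
                            capture_output=True, text=True, check=True)
    return result.stdout


# Building the command does not start the program, so it is safe to inspect.
print(build_detect_command("./data/dog.jpg"))
```

Keeping the command construction separate from the actual launch makes it possible to check the arguments without a compiled executable at hand.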
After running you will see this console output:
layer filters size input output
0 conv 32 3 x 3 / 1 608 x 608 x 3 -> 608 x 608 x 32 0.639 BFLOPs
1 conv 64 3 x 3 / 2 608 x 608 x 32 -> 304 x 304 x 64 3.407 BFLOPs
...
105 conv 255 1 x 1 / 1 76 x 76 x 256 -> 76 x 76 x 255 0.754 BFLOPs
106 yolo
Loading weights from yolov3.weights...Done!
./data/dog.jpg: Predicted in 21.612000 seconds.
dog: 100%
bicycle: 99%
truck: 92%
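If the results are needed in another program, the console output above can be parsed. The sketch below is an illustration only: it assumes the output keeps exactly the line format shown above ("<image>: Predicted in <t> seconds." followed by "<label>: <percent>%" lines), and the function name parse_detect_output is hypothetical.

```python
import re

# Patterns derived from the console output shown in the article.
PREDICT_RE = re.compile(r"^(?P<image>.+): Predicted in (?P<seconds>[\d.]+) seconds\.$")
DETECTION_RE = re.compile(r"^(?P<label>[^:]+): (?P<percent>\d+)%$")


def parse_detect_output(text):
    """Return (prediction_time_in_seconds, [(label, percent), ...])."""
    seconds = None
    detections = []
    for line in text.splitlines():
        line = line.strip()
        match = PREDICT_RE.match(line)
        if match:
            seconds = float(match.group("seconds"))
            continue
        match = DETECTION_RE.match(line)
        if match:
            detections.append((match.group("label"), int(match.group("percent"))))
    return seconds, detections


sample = """106 yolo
Loading weights from yolov3.weights...Done!
./data/dog.jpg: Predicted in 21.612000 seconds.
dog: 100%
bicycle: 99%
truck: 92%
"""
seconds, detections = parse_detect_output(sample)
print(seconds, detections)
```

Lines that match neither pattern, such as the layer table, are simply skipped.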
The project is now compiled for the CPU. Processing one image takes roughly 20 seconds (21.6 seconds in the run above, on an Intel G4600 processor at 3.6 GHz), which is not usable for video stream detection. The next step is to compile with support for an NVIDIA graphics card and its CUDA cores.
The computation can run on an NVIDIA graphics card (in my case a GTX 1050 Ti with 768 CUDA cores). In the Visual Studio project you need to define the "GPU" macro and add the .cu CUDA files; these contain the implementations of the functions named *_gpu, which use the CUDA API.
Microsoft Visual Studio 2017 - project with CUDA API implementation
Here is the console output:
...
105 conv 255 1 x 1 / 1 76 x 76 x 256 -> 76 x 76 x 255 0.754 BFLOPs
106 yolo
Loading weights from yolov3.weights...Done!
./data/dog.jpg: Predicted in 0.382000 seconds.
dog: 100%
bicycle: 99%
truck: 92%
The time to process a single image is now around 0.38 seconds, which corresponds to roughly 2.6 fps for a video stream.
The next optimization is to define the "CUDNN" macro; with it the time drops to about 0.14 seconds, which gives about 7 fps for video.
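The frame rates quoted here follow directly from the per-image times: the frame rate is the reciprocal of the time per frame. The short Python sketch below reproduces that arithmetic with the timings from this article (21.612 s on the CPU, 0.382 s on the GPU, about 0.14 s with CUDNN). The function names are hypothetical, and the result is only an upper estimate, because it ignores video decoding and display overhead.

```python
def fps_from_latency(seconds_per_frame):
    """Convert the processing time of one frame into frames per second."""
    if seconds_per_frame <= 0:
        raise ValueError("seconds_per_frame must be positive")
    return 1.0 / seconds_per_frame


def speedup(baseline_seconds, optimized_seconds):
    """Return how many times faster the optimized run is than the baseline."""
    return baseline_seconds / optimized_seconds


# Timings taken from the console outputs and text of the article.
timings = {"CPU": 21.612, "GPU": 0.382, "GPU + CUDNN": 0.14}

for name, seconds in timings.items():
    print(f"{name}: {fps_from_latency(seconds):.2f} fps, "
          f"{speedup(timings['CPU'], seconds):.1f}x vs CPU")
```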
To test video detection, you can use a traffic recording.
yolo.exe detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights <video file>
yolo.exe detector demo ./cfg/coco.data cfg/yolov3.cfg yolov3.weights ./data/traffic1.mp4
The console window shows the frame rate. With a G4600 CPU and a GTX 1050 Ti, the frame rate is about 7 fps.
In the file cfg/yolov3.cfg you can adjust some values that affect the calculation speed:
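As one illustration, and as an assumption rather than something stated above, the network input resolution (width and height in the [net] section, 608 x 608 in the console output shown earlier) is a value that is commonly lowered to trade accuracy for speed. Darknet expects it to be a multiple of 32. The Python sketch below changes such a value in the cfg text. The helper set_net_option is hypothetical, and a line-based approach is used because the file repeats section names such as [convolutional], which standard INI parsers tend to reject.

```python
def set_net_option(cfg_text, key, value):
    """Return cfg_text with `key` in the [net] section set to `value`.

    Only the first matching key inside [net] is changed; all other
    sections and lines are passed through untouched.
    """
    out = []
    section = None
    replaced = False
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            section = stripped[1:-1]
        elif (section == "net" and not replaced
              and not stripped.startswith("#") and "=" in stripped
              and stripped.split("=", 1)[0].strip() == key):
            line = f"{key}={value}"
            replaced = True
        out.append(line)
    if not replaced:
        raise KeyError(f"{key} not found in [net] section")
    return "\n".join(out) + "\n"


# Shortened stand-in for cfg/yolov3.cfg, for illustration only.
sample_cfg = "[net]\nbatch=1\nwidth=608\nheight=608\n\n[convolutional]\nfilters=32\nsize=3\n"

smaller = set_net_option(set_net_option(sample_cfg, "width", 416), "height", 416)
print(smaller)
```

In practice the text would be read from and written back to cfg/yolov3.cfg, ideally after making a backup copy of the original file.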