Vision systems - Object detection

In this article I will introduce an algorithm for detecting objects in an image and in a video stream. The following information is based on the Darknet/YOLO project; you can find a short presentation of it on TED.

Development environment

To compile the project I will use Microsoft Visual Studio 2017 and an updated project from Darknet. The following is needed for compilation:

Visual Studio 2017 - in my case the Community edition (with C/C++ development support and Windows SDK 10.x)
OpenCV - version 3.4.3
NVIDIA CUDA Toolkit - version 10
NVIDIA cuDNN - for CUDA Toolkit 10; after downloading the package (registration required), just copy the archive files into the CUDA Toolkit directory, in my case "D:\CUDA\v10"

Project compilation for Darknet/YOLO

You can find the updated project for Visual Studio 2017 on GitHub.

The following include directories are set in the project (Project->Properties->C/C++->General->Additional Include Directories):

OpenCV - "D:\OpenCV\opencv34\build\include"
NVIDIA CUDA Toolkit v10 - "$(CUDA_PATH)\include", which in my case resolves to "D:\CUDA\v10\include"

Library paths for the linker (Project->Properties->Linker->General->Additional Library Directories):

OpenCV - "D:\OpenCV\opencv34\build\x64\vc15\lib"
NVIDIA CUDA Toolkit v10 - "$(CUDA_PATH)\lib\x64", which in my case resolves to "D:\CUDA\v10\lib\x64"

Microsoft Visual Studio 2017 - project open, function main() in darknet.c

Running the compiled application

Testing image

For testing and debugging the application, you can set the command arguments (Project->Properties->Debugging->Command Arguments) to "imtest ./data/eagle.jpg". This is the first test example.

After running it, the test image is displayed.

Object detection from image

The next example detects objects in a selected image.

First, set Command Arguments to: "detect ./cfg/yolov3.cfg yolov3.weights ./data/dog.jpg"

This runs the command: yolo.exe detect ./cfg/yolov3.cfg yolov3.weights ./data/dog.jpg

The file with pre-trained weights, "yolov3.weights", is available here.

After running it, you will see this console output:

layer     filters    size              input                output
    0 conv     32  3 x 3 / 1   608 x 608 x   3   ->   608 x 608 x  32  0.639 BFLOPs
    1 conv     64  3 x 3 / 2   608 x 608 x  32   ->   304 x 304 x  64  3.407 BFLOPs
...
  105 conv    255  1 x 1 / 1    76 x  76 x 256   ->    76 x  76 x 255  0.754 BFLOPs
  106 yolo
Loading weights from yolov3.weights...Done!
./data/dog.jpg: Predicted in 21.612000 seconds.
dog: 100%
bicycle: 99%
truck: 92%
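The BFLOPs column in the output above can be reproduced with a short calculation. The following is a minimal sketch, assuming that each multiply-add counts as two operations and that the cost of a convolutional layer is the number of output values times the kernel volume; the function name conv_bflops is hypothetical and not part of Darknet. Under that assumption the three layers shown above come out at the printed values.

```python
def conv_bflops(filters, kernel, in_channels, out_w, out_h):
    """Billions of floating-point operations for one convolutional layer.

    Assumption: every output value needs kernel * kernel * in_channels
    multiply-adds, and one multiply-add is counted as 2 operations.
    """
    ops = 2 * filters * kernel * kernel * in_channels * out_w * out_h
    return ops / 1e9


# Values copied from the console output:
# (layer index, filters, kernel size, input channels, output width, output height)
layers = [
    (0, 32, 3, 3, 608, 608),
    (1, 64, 3, 32, 304, 304),
    (105, 255, 1, 256, 76, 76),
]

for index, filters, kernel, in_c, out_w, out_h in layers:
    bflops = conv_bflops(filters, kernel, in_c, out_w, out_h)
    print(f"layer {index}: {bflops:.3f} BFLOPs")
```

Summing this value over all layers gives a rough idea of why a single 608 x 608 image is expensive to process on a CPU.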

The project is now compiled for the CPU only. Processing one image takes about 20 seconds (21.6 s in the run above, on an Intel G4600 3.6 GHz processor), which is not usable for detection in a video stream. The next step is to compile the project to use an NVIDIA graphics card and its CUDA cores.

Project compilation for GPU using CUDA cores

The computation can run on an NVIDIA graphics card (in my case a GTX 1050 Ti with 768 CUDA cores). In the Visual Studio project, you need to define the "GPU" macro and add the .cu CUDA files, which contain the implementations of the *_gpu functions that use the CUDA API.

Microsoft Visual Studio 2017 - project with CUDA API implementation

Here is console output:

...
  105 conv    255  1 x 1 / 1    76 x  76 x 256   ->    76 x  76 x 255  0.754 BFLOPs
  106 yolo
Loading weights from yolov3.weights...Done!
./data/dog.jpg: Predicted in 0.382000 seconds.
dog: 100%
bicycle: 99%
truck: 92%
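If the detections need to be passed to another program, the console output can be parsed. The following is a minimal sketch that assumes the output keeps exactly the line shapes shown above ("<image>: Predicted in <t> seconds." and "<label>: <n>%"); the function name parse_detections and the regular expressions are hypothetical helpers, not part of Darknet.

```python
import re

# Sample text copied from the console output above.
SAMPLE_OUTPUT = """\
Loading weights from yolov3.weights...Done!
./data/dog.jpg: Predicted in 0.382000 seconds.
dog: 100%
bicycle: 99%
truck: 92%
"""

# Assumed line formats, based only on the output shown in this article.
TIME_RE = re.compile(r"^(?P<path>.+): Predicted in (?P<seconds>[\d.]+) seconds\.$")
LABEL_RE = re.compile(r"^(?P<label>[^:]+): (?P<percent>\d+)%$")


def parse_detections(text):
    """Extract image path, prediction time and (label, percent) pairs."""
    result = {"image": None, "seconds": None, "detections": []}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        match = TIME_RE.match(line)
        if match:
            result["image"] = match.group("path")
            result["seconds"] = float(match.group("seconds"))
            continue
        match = LABEL_RE.match(line)
        if match:
            label = match.group("label")
            percent = int(match.group("percent"))
            result["detections"].append((label, percent))
    return result


if __name__ == "__main__":
    print(parse_detections(SAMPLE_OUTPUT))
```

Lines that match neither pattern, such as the layer table, are simply skipped.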

Processing a single image now takes around 0.38 seconds, which corresponds to about 2.6 fps for a video stream.

The next optimization is to define the "CUDNN" macro. With it, the time drops to about 0.14 seconds per image, or about 7 fps for video.
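The frame rates quoted here follow directly from the time per image: the frame rate is at most the reciprocal of the time per frame. A small sketch of that arithmetic, using the three timings reported in this article (the dictionary labels are my own naming):

```python
def frames_per_second(seconds_per_frame):
    """Upper bound on the frame rate when each frame takes the given time."""
    return 1.0 / seconds_per_frame


# Timings reported in this article, in seconds per image.
timings = {
    "CPU only": 21.612,
    "GPU (CUDA)": 0.382,
    "GPU (CUDA + cuDNN)": 0.14,
}

cpu_time = timings["CPU only"]
for name, seconds in timings.items():
    fps = frames_per_second(seconds)
    speedup = cpu_time / seconds
    print(f"{name}: {seconds:.3f} s/frame, {fps:.2f} fps, {speedup:.0f}x vs CPU")
```

This ignores the time spent decoding and displaying frames, so real video throughput can be somewhat lower.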

Object detection from video file

For video testing, you can use a traffic recording.

yolo.exe detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights <video file>

yolo.exe detector demo ./cfg/coco.data cfg/yolov3.cfg yolov3.weights ./data/traffic1.mp4
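When the detector is started from a script rather than from Visual Studio, the two argument layouts used in this article can be assembled programmatically. This is only a sketch: the helper names are hypothetical, the executable name "yolo.exe" and the default paths are the ones used above, and nothing is executed unless run_darknet is called explicitly.

```python
import subprocess


def detect_image_args(image, cfg="./cfg/yolov3.cfg", weights="yolov3.weights"):
    """Arguments for single-image detection, as used earlier in the article."""
    return ["detect", cfg, weights, image]


def detect_video_args(video, data="./cfg/coco.data",
                      cfg="cfg/yolov3.cfg", weights="yolov3.weights"):
    """Arguments for detection on a video file."""
    return ["detector", "demo", data, cfg, weights, video]


def command_line(args, exe="yolo.exe"):
    """Human-readable command line (assumes no argument contains spaces)."""
    return " ".join([exe] + list(args))


def run_darknet(args, exe="yolo.exe"):
    """Start the executable; passing a list avoids manual quoting."""
    return subprocess.run([exe] + list(args), check=True)


if __name__ == "__main__":
    print(command_line(detect_image_args("./data/dog.jpg")))
    print(command_line(detect_video_args("./data/traffic1.mp4")))
```

The printed lines are the same commands as shown in the text, which makes it easy to check the arguments before actually launching the process.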

The frame rate is shown in the console window. With a G4600 CPU and a GTX 1050 Ti, the frame rate is about 7 fps.

Optimization

In the file cfg/yolov3.cfg you can adjust some values that affect the calculation speed:

subdivisions=64, width=608, height=608 - frame rate 7 fps
subdivisions=64, width=416, height=416 - frame rate 12 fps
subdivisions=64, width=288, height=288 - frame rate 15 fps
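To see why a smaller input is faster, it helps to compare the number of input pixels, since the cost of the convolutional layers scales roughly with width x height. The sketch below rests on two assumptions that are not stated in this article: that width and height must be multiples of 32, and that YOLOv3 predicts on grids with strides 32, 16 and 8 (consistent with the 76 x 76 output of layer 105 for a 608 x 608 input). The function names are hypothetical.

```python
BASE_SIZE = 608  # input size of the first configuration
BASE_FPS = 7     # frame rate measured at that size

# (input size, measured fps) pairs taken from the list above.
measured = [(608, 7), (416, 12), (288, 15)]


def relative_cost(size, base=BASE_SIZE):
    """Convolution cost relative to the base size (scales with pixel count)."""
    return (size * size) / (base * base)


def grid_sizes(size, strides=(32, 16, 8)):
    """Detection grid side lengths; assumes the size is a multiple of 32."""
    if size % 32 != 0:
        raise ValueError(f"{size} is not a multiple of 32")
    return [size // stride for stride in strides]


for size, fps in measured:
    cost = relative_cost(size)
    ideal_fps = BASE_FPS / cost  # if speed scaled purely with pixel count
    print(f"{size}x{size}: grids {grid_sizes(size)}, "
          f"relative cost {cost:.2f}, ideal {ideal_fps:.1f} fps, measured {fps} fps")
```

The measured frame rates grow more slowly than the pixel count alone would suggest, which indicates that part of the per-frame time does not depend on the network input size. A smaller input also means coarser detection grids, so small objects may be missed.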
