Summary:
In recent years, computer vision tasks such as image classification and object detection have improved markedly in both accuracy and performance. This improvement can be attributed to two factors: the availability of large labeled image datasets such as ImageNet, and the rise of convolutional neural networks, which benefit from these datasets and achieve near-human-level performance on these tasks.
In this thesis, we implement and use VGGNet, a deep convolutional neural network that was submitted to the ImageNet Challenge 2014. More specifically, we use two of its variants, VGG16 and VGG19, to detect vehicles in CCTV footage and provide bounding boxes for them. To this end, we go through the training process of the two networks, compare two methods for supplying region proposals to them, Sliding Windows and Selective Search, and evaluate our object detection system with two metrics, accuracy and mean Average Precision.
Keywords:
VGGNet, Object Detection, Convolutional Neural Networks, Selective Search, Sliding Windows, Image Processing