Machine Learning

Object Detection

Viola-Jones (2001)

Wikipedia

Developed with the primary aim of face detection, it was the first object detection framework to provide competitive detection rates in real time.
The algorithm has three phases:

1. Representation of the image as what the authors call an integral image, which lets Haar-like features be computed very quickly.
2. Selection of relevant features using AdaBoost training.
3. Reduction of the search space by discarding sub-windows early, successively applying more complex classifiers in a cascade structure.

The third phase in particular is known as a cascade classifier. While its accuracy is not comparable with modern deep learning models, it is lightweight and fast, since it was intended for use on low-power CPUs (phones, cameras).
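
As a minimal sketch of the first phase (not from the original paper), the NumPy snippet below computes an integral image and uses it to evaluate rectangle sums in constant time, the operation Haar-like features are built from.

```python
import numpy as np

def integral_image(img):
    """Integral image: ii[y, x] = sum of img[:y+1, :x+1]."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    """Sum of pixels in an (inclusive) rectangle using four lookups."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

img = np.arange(16, dtype=np.float64).reshape(4, 4)
ii = integral_image(img)
# A two-rectangle Haar-like feature: left half minus right half of the window.
print(rect_sum(ii, 0, 0, 3, 1) - rect_sum(ii, 0, 2, 3, 3))
```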

Robust Real-time Object Detection

Implementing the Viola-Jones Face Detection Algorithm

HOG-SVM (2005)

Wikipedia

Traditional and similar to Viola-Jones. It uses Histogram of Oriented Gradients (HOG) features and a Support Vector Machine (SVM) for classification. It still requires a multi-scale sliding window, and even though it is more accurate than Viola-Jones, it is much slower.
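
A minimal sketch of the feature/classifier combination, assuming scikit-image and scikit-learn as stand-ins for the original implementation; the data here is random and only illustrates the shapes involved.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

# Toy "dataset": random 128x64 grayscale windows with alternating labels.
# In the original pedestrian detector these would be person / background crops.
rng = np.random.default_rng(0)
windows = rng.random((20, 128, 64))
labels = np.array([0, 1] * 10)

# One HOG descriptor per window (9 orientation bins, 8x8-pixel cells, 2x2-cell blocks).
features = np.array([
    hog(w, orientations=9, pixels_per_cell=(8, 8),
        cells_per_block=(2, 2), block_norm='L2-Hys')
    for w in windows
])

# A linear SVM scores each sliding-window descriptor as object vs. background.
clf = LinearSVC().fit(features, labels)
print(clf.decision_function(features[:3]))
```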

Histograms of Oriented Gradients for Human Detection

Selective Search for Object Recognition (2013)

Replaces the exhaustive search of the previous models with a selective search. This is accomplished by using segmentation to generate a limited set of locations on which bag-of-words features are computed.
Instead of searching a small number (tens) of accurate locations (usually selected with some sort of contour analysis), a large number (thousands) of approximate locations are generated at all scales. Initially the image is over-segmented and the segments are progressively grouped together, which makes it possible to account for all scales. Different grouping strategies can also be used to account for different types of features (e.g. color-based, texture-based, etc.). Finally, because it reduces the number of object locations to consider for the actual recognition step, it allows the use of a more computationally intensive classifier.
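
A quick way to try the idea is the implementation shipped with opencv-contrib-python (cv2.ximgproc); the sketch below is not the authors' original code, and 'image.jpg' is a placeholder path.

```python
import cv2

# Selective Search as shipped with opencv-contrib-python ('image.jpg' is a placeholder).
img = cv2.imread('image.jpg')

ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(img)
ss.switchToSelectiveSearchFast()  # one grouping strategy; switchToSelectiveSearchQuality() uses more

rects = ss.process()              # thousands of approximate (x, y, w, h) proposals at all scales
print(len(rects), rects[:5])
```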

Segmentation As Selective Search for Object Recognition

Selective Search for Object Recognition

OverFeat (2013)

A single ConvNet for detection, recognition and localization. It uses multi-scale sliding windows to produce a distribution over categories for each window, and in addition it predicts the position and size of the bounding box relative to the window. In contrast to Selective Search, proposals are accumulated over subsequent passes and then merged.
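
The sketch below only illustrates the multi-scale sliding-window enumeration in plain Python (the window size, stride and scales are illustrative); the ConvNet that scores each window and the merging step are omitted.

```python
import itertools

def multiscale_windows(img_w, img_h, window=231, stride=36, scales=(1.0, 1.4, 2.0)):
    """Enumerate sliding-window boxes (x, y, w, h) in original-image coordinates.

    Each window would be fed to the ConvNet, which outputs a class distribution
    plus a bounding box relative to the window; boxes from all positions and
    scales are accumulated and then merged.
    """
    for s in scales:
        w, h = int(img_w * s), int(img_h * s)  # size of the rescaled image
        for y, x in itertools.product(range(0, h - window + 1, stride),
                                      range(0, w - window + 1, stride)):
            yield (x / s, y / s, window / s, window / s)

boxes = list(multiscale_windows(640, 480))
print(len(boxes), boxes[:2])
```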

OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks

R-CNN (2014)

Object detection system based on three modules:

1. Generation of category-independent region proposals. Various methods can be used (e.g. Selective Search).
2. Feature extraction for every region using a CNN.
3. Classification with class-specific linear SVMs.

The CNN used for feature extraction is pre-trained on a large classification dataset and then fine-tuned for detection. To slim down and speed up the pipeline, a number of follow-up models were developed, such as Fast R-CNN and Faster R-CNN.
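
A rough sketch of the three modules, assuming a Selective-Search-style proposal list, a torchvision ResNet-18 as a stand-in for the paper's fine-tuned AlexNet, and scikit-learn linear SVMs; the proposals and labels here are made up.

```python
import torch
import torchvision
from sklearn.svm import LinearSVC

# 2. Feature extraction: a CNN used as a feature extractor. weights=None keeps the
#    sketch offline; the real pipeline uses an ImageNet pre-trained, detection
#    fine-tuned network (AlexNet in the paper).
backbone = torchvision.models.resnet18(weights=None)
backbone.fc = torch.nn.Identity()  # drop the classification head, keep 512-d features
backbone.eval()

def region_features(image, proposals):
    """Warp each proposed region to the CNN input size and extract a feature vector."""
    crops = []
    for (x, y, w, h) in proposals:  # 1. region proposals, e.g. from Selective Search
        crop = image[:, y:y + h, x:x + w]
        crops.append(torch.nn.functional.interpolate(
            crop[None], size=(224, 224), mode='bilinear', align_corners=False))
    with torch.no_grad():
        return backbone(torch.cat(crops)).numpy()

# Toy example: one random image, a few made-up proposals and labels.
image = torch.rand(3, 480, 640)
proposals = [(10, 10, 100, 120), (200, 50, 80, 80), (300, 300, 150, 100), (0, 0, 640, 480)]
feats = region_features(image, proposals)

# 3. Classification with linear SVMs on the extracted features (toy labels).
svm = LinearSVC().fit(feats, [1, 0, 1, 0])
print(svm.decision_function(feats))
```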

Rich feature hierarchies for accurate object detection and semantic segmentation

Fast R-CNN (2015)

The R-CNN approach quickly evolved into a purer deep learning one. Similar to R-CNN, Fast R-CNN used Selective Search to generate object proposals, but instead of extracting features from each proposal independently and using SVM classifiers, it applied the CNN to the complete image and then used Region of Interest (RoI) Pooling on the feature map, followed by a final feed-forward network for classification and box regression. Not only was this approach faster, but having the RoI Pooling layer and the fully connected layers made the model end-to-end differentiable and easier to train. The biggest downside was that the model still relied on Selective Search (or another region proposal algorithm), which became the bottleneck at inference time.
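
The key new piece is RoI Pooling; the sketch below uses torchvision.ops.roi_pool on a fake shared feature map to show how variable-sized proposals become fixed-size features for the fully connected head (the backbone and the head themselves are omitted).

```python
import torch
from torchvision.ops import roi_pool

# A shared feature map for the whole image, e.g. the output of a backbone with
# an overall stride of 16 (here for a 480x640 input).
feature_map = torch.rand(1, 256, 30, 40)

# Proposals in image coordinates: (batch_index, x1, y1, x2, y2).
rois = torch.tensor([[0.0,  10.0,  10.0, 110.0, 130.0],
                     [0.0, 200.0,  50.0, 280.0, 130.0]])

# RoI Pooling projects each proposal onto the feature map (spatial_scale = 1/16)
# and max-pools it to a fixed 7x7 grid, so one fully connected head can classify
# every proposal and regress its box.
pooled = roi_pool(feature_map, rois, output_size=(7, 7), spatial_scale=1.0 / 16)
print(pooled.shape)  # torch.Size([2, 256, 7, 7])
```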

Fast R-CNN

YOLO (2015)

Shortly after that, the You Only Look Once: Unified, Real-Time Object Detection (YOLO) paper was published by Joseph Redmon (with Girshick appearing as one of the co-authors). YOLO proposed a simple convolutional neural network approach that achieves both good results and high speed, allowing real-time object detection for the first time.
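
As a sketch of the output format (YOLOv1 with a 7x7 grid, 2 boxes per cell and 20 classes), the snippet below decodes one grid cell of a stand-in prediction tensor; the network itself and non-maximum suppression are omitted.

```python
import torch

S, B, C = 7, 2, 20                  # grid size, boxes per cell, classes (YOLOv1 on PASCAL VOC)
pred = torch.rand(S, S, B * 5 + C)  # stand-in for the network output: 7x7x30

def decode_cell(pred, row, col, img_size=448):
    """Read the most confident box of one grid cell as (x, y, w, h, class_id, score)."""
    cell = pred[row, col]
    boxes = cell[:B * 5].reshape(B, 5)       # each box: x, y, w, h, confidence
    class_probs = cell[B * 5:]               # one class distribution shared by the cell
    b = boxes[boxes[:, 4].argmax()]
    cx = (col + b[0]) / S * img_size         # x, y are offsets inside the cell
    cy = (row + b[1]) / S * img_size
    w, h = b[2] * img_size, b[3] * img_size  # w, h are relative to the whole image
    score = b[4] * class_probs.max()         # class-specific confidence
    return cx.item(), cy.item(), w.item(), h.item(), int(class_probs.argmax()), score.item()

print(decode_cell(pred, row=3, col=4))
```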

You Only Look Once: Unified, Real-Time Object Detection

Faster R-CNN (2016)

Faster R-CNN is the third iteration of the R-CNN series. It added what the authors called a Region Proposal Network (RPN) to get rid of the Selective Search algorithm and make the model completely trainable end-to-end. The RPN outputs candidate regions ranked by an “objectness” score; these proposals are then used by the RoI Pooling and fully connected layers for classification and box regression.
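
A convenient way to see the RPN + RoI-heads pipeline end to end is torchvision's implementation (recent torchvision assumed); the sketch below runs it with random weights to stay offline.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# ResNet-50 FPN backbone, an RPN that scores anchors by "objectness" and regresses
# proposals, and RoI heads that classify each proposal and refine its box.
# weights=None / weights_backbone=None keep the sketch offline; COCO pre-trained
# weights are available with weights='DEFAULT'.
model = fasterrcnn_resnet50_fpn(weights=None, weights_backbone=None)
model.eval()

images = [torch.rand(3, 480, 640)]  # list of CHW tensors with values in [0, 1]
with torch.no_grad():
    detections = model(images)

# Each result is a dict with 'boxes' (x1, y1, x2, y2), 'labels' and 'scores'.
print(detections[0]['boxes'].shape, detections[0]['scores'][:5])
```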

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

SSD (2016)

The Single Shot Detector (SSD) takes on YOLO by predicting from convolutional feature maps at multiple scales, achieving better accuracy and speed.
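
As a small worked example of the multi-scale idea, the snippet below reproduces the 8732 default boxes of SSD300 from the sizes of its six feature maps and the number of default boxes per location.

```python
# SSD300: six feature maps of decreasing resolution, each predicting boxes directly.
feature_map_sizes = [38, 19, 10, 5, 3, 1]  # spatial size of each feature map
boxes_per_location = [4, 6, 6, 6, 4, 4]    # default boxes predicted at each location

total = sum(s * s * b for s, b in zip(feature_map_sizes, boxes_per_location))
print(total)  # 8732 default boxes, each with class scores and box offsets
```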

SSD: Single Shot MultiBox Detector

R-FCN (2016)

Region-based Fully Convolutional Networks (R-FCN) take the architecture of Faster R-CNN but use only convolutional layers.

R-FCN: Object Detection via Region-based Fully Convolutional Networks

Mask R-CNN (2017)

Mask R-CNN

Recurrent Neural Networks

Generating Sequences (2014)

Generating Sequences With Recurrent Neural Networks

Reduce complexity

Dropout (2015)

Dropout: A Simple Way to Prevent Neural Networks from Overfitting

Weight-Connections (2015)

Learning both Weights and Connections for Efficient Neural Networks

Efficient Inference Engine (2016)

EIE: Efficient Inference Engine on Compressed Deep Neural Network

Deep Compression (2016)

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

Dynamic Net Surgery (2016)

Dynamic Network Surgery for Efficient DNNs