
CONVOLUTIONAL

NEURAL NETWORKS
RIDDHIMAN DASGUPTA & AYUSHI DALMIA
CSE577 TUTORIAL, IIIT HYDERABAD, MONSOON 2015

DEEP LEARNING: OVERVIEW

Feature visualization of convnet trained on ImageNet by Zeiler and Fergus, 2013.


Each layer of features is learned, and the layers form a hierarchy.

DEEP LEARNING: OVERVIEW


Each level transforms its input into a higher-level representation
Deep means more than one stage of non-linear feature transformation

High-level features are global and invariant

Low-level features are shared among categories

DEEP LEARNING: COMPARISON


Neural networks with one hidden layer are not deep
There is no feature hierarchy, just one non-linear feature transformation

SVMs are not deep
A fixed kernel followed by a linear classifier gives no feature hierarchy

Classification trees are not deep
All decisions are made in the input space; the depth of a tree does not have the same meaning

Graphical models are orthogonal to deep learning
The factors in a graphical model can come from deep networks

CONVNETS: HISTORY

Cognitron/Neocognitron
Fukushima, 1971

Hubel-Wiesel Architecture
Hubel & Wiesel, 1962
Simple cells for local features
Complex cells for pooling

Multistage Hubel-Wiesel Architecture
Yann LeCun, 1988

CONVNETS: OVERVIEW
Feed-forward pass:
Input image
Convolution (filtering, with learned kernels)
Non-linearity (activation)
Pooling (dimension reduction)
Feature maps

Back-propagation pass:
Supervised classification error
Update weights of the convolutional filters

CONVNETS: FULLY CONNECTED LAYERS

For a 200*200 image with 40,000 hidden nodes, 1.6 billion parameters are needed
Spatial correlation is locally concentrated
Fully connecting every pixel to every hidden node wastes resources

CONVNETS: LOCALLY CONNECTED LAYERS

For a 200*200 image and a filter of size 10*10, with 40,000 hidden nodes, 4 million parameters are needed (each hidden node connects to a single 10*10 patch)
Much more computationally efficient
Exploits locality and correlation in the image
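A quick sanity check on the parameter counts of the last two slides (plain Python; the layer sizes are the slides' examples, biases are ignored as on the slides):

```python
# Parameter counts for a 200x200 input image (40,000 pixels).
n_inputs = 200 * 200

# Fully connected: every pixel connects to every hidden node.
n_hidden = 40000
fully_connected = n_inputs * n_hidden        # 1,600,000,000 (~1.6 billion)

# Locally connected: each hidden node sees only a 10x10 patch,
# but every node still keeps its own private weights.
filter_size = 10 * 10
locally_connected = n_hidden * filter_size   # 4,000,000 (4 million)

# Convolutional: patches share weights, so only the filters are stored.
n_filters = 100
convolutional = n_filters * filter_size      # 10,000

print(fully_connected, locally_connected, convolutional)
```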

CONVNETS: CONVOLUTIONAL LAYERS

Share the same parameters across different locations
Similar to a sliding window
Each filter is akin to a convolutional kernel
The convolutional kernels need to be learned

CONVNETS: CONVOLUTIONAL LAYERS

Multiple filters can be learned
For a 200*200 image and a filter of size 10*10, with 100 filters, 10,000 parameters are needed
In general, for an N*N image, a K*K filter, and a stride of 1, each hidden-layer feature map will be of size (N - K + 1) * (N - K + 1); see the sketch below
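A minimal sketch of a "valid" 2D convolution illustrating the output-size formula above (NumPy; the function and variable names are mine, not from the slides):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Valid 2D convolution: slide the K x K kernel over the N x N image
    with stride 1, giving an (N - K + 1) x (N - K + 1) feature map.
    (Strictly this is cross-correlation, as in most deep learning code.)"""
    n, k = image.shape[0], kernel.shape[0]
    out = np.zeros((n - k + 1, n - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)
    return out

image = np.random.rand(200, 200)   # the slides' 200*200 example
kernel = np.random.rand(10, 10)    # one 10*10 filter
print(conv2d_valid(image, kernel).shape)   # (191, 191)
```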

CONVNETS: CONVOLUTIONAL LAYERS

CONVNETS: NON LINEARITY


Tanh
f(x) = tanh(x) = (e^x - e^-x) / (e^x + e^-x)

Sigmoid
f(x) = 1 / (1 + e^-x)

ReLU
f(x) = max(0, x)
Preferred choice
Fast to compute
Simplifies back-propagation
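The three activations written out in NumPy (a small illustrative sketch, not from the slides):

```python
import numpy as np

def tanh(x):
    return np.tanh(x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Element-wise max(0, x): cheap to compute, and its gradient is
    # simply 0 or 1, which keeps back-propagation simple.
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(tanh(x), sigmoid(x), relu(x), sep="\n")
```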

CONVNETS: POOLING LAYERS

Pooling implies aggregation
Robustness to the exact spatial location of features
Additional inter-feature competition

CONVNETS: POOLING LAYERS

Max pooling
Average pooling
L2 pooling

For p*p pooling (non-overlapping, stride p) applied to an N*N input map, i.e. the hidden-layer size, the output map will be N/p * N/p units in size; see the sketch below

Computational cost is negligible compared to convolution
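A minimal non-overlapping max-pooling sketch matching the size formula above (NumPy; names are mine):

```python
import numpy as np

def max_pool2d(feature_map, p):
    """Non-overlapping p x p max pooling with stride p:
    an N x N input map becomes an (N // p) x (N // p) output map."""
    n = feature_map.shape[0]
    out = np.zeros((n // p, n // p))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = feature_map[i * p:(i + 1) * p, j * p:(j + 1) * p].max()
    return out

fmap = np.random.rand(8, 8)
print(max_pool2d(fmap, 2).shape)   # (4, 4)
```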

CONVNETS: NORMALISATION LAYERS

Local contrast normalisation
Over a local neighbourhood, subtract the mean (zero mean) and divide by the standard deviation (unit variance)
Additional inter-feature competition
Increases sparsity & improves invariance at very low cost

(Figure: an image patch before and after local contrast normalisation.)
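A rough sketch of local contrast normalisation over a square neighbourhood, assuming a simple box window for the local statistics (SciPy's uniform_filter; the window choice and names are mine):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_contrast_normalise(image, size=9, eps=1e-8):
    """Subtract the local mean and divide by the local standard deviation,
    both computed over a size x size neighbourhood around each pixel."""
    local_mean = uniform_filter(image, size)
    centred = image - local_mean
    local_var = uniform_filter(centred ** 2, size)
    return centred / np.sqrt(local_var + eps)

image = np.random.rand(200, 200)
print(local_contrast_normalise(image).shape)   # (200, 200)
```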

CONVNETS: PUTTING LAYERS TOGETHER


(Diagram: input pixels -> convolution -> non-linearity -> normalisation -> pooling -> features)

Stack multiple stages of this architecture one after another to form a convnet
The final layers are usually fully connected layers, acting as the hidden layers of an ordinary neural network
The final layer is a classifier, usually a softmax classifier, with a loss function suited to the task
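A compact sketch of one such stack written as a plain forward pass (NumPy; all function names are mine and the tiny layer sizes are illustrative only):

```python
import numpy as np

def conv(x, k):          # valid convolution, stride 1
    n, s = x.shape[0], k.shape[0]
    return np.array([[np.sum(x[i:i + s, j:j + s] * k)
                      for j in range(n - s + 1)] for i in range(n - s + 1)])

def relu(x):
    return np.maximum(0.0, x)

def pool(x, p):          # non-overlapping max pooling
    n = x.shape[0] // p
    return np.array([[x[i * p:(i + 1) * p, j * p:(j + 1) * p].max()
                      for j in range(n)] for i in range(n)])

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# One stage: convolution -> non-linearity -> pooling, followed by a
# fully connected layer and a softmax classifier on top.
image = np.random.rand(28, 28)
kernel = np.random.rand(5, 5)
fc_weights = np.random.rand(10, 12 * 12)     # (28 - 5 + 1) / 2 = 12

features = pool(relu(conv(image, kernel)), 2)
probabilities = softmax(fc_weights @ features.ravel())
print(probabilities.shape)                   # (10,)
```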

CONVNETS: PUTTING LAYERS TOGETHER


(Diagram: input pixels -> convolution -> non-linearity -> normalisation -> pooling -> features)

A filter bank followed by a non-linearity gives a non-linear embedding in a high-dimensional space
Pooling is basically contraction and dimensionality reduction
Normalisation is simply smoothing
Equivalent to the simple-cell + complex-cell model of vision

CONVNETS: PUTTING LAYERS TOGETHER

Top: a single stage, zoomed in: convolution -> non-linearity -> normalisation -> pooling
Bottom: the usual convnet pipeline: input image -> stage 1 -> stage 2 -> stage 3 -> fully connected layers -> classifier

CONVNETS: TRICKS OF THE TRADE


Hyper-parameter selection is difficult
Grid search is highly inefficient due to the large number of hyper-parameters

Use stochastic gradient descent with mini-batches

Use momentum (see the update-rule sketch below):
v(t+1) = mu * v(t) - eta * dL/dw(t),  w(t+1) = w(t) + v(t+1)

Use weight decay:
w(t+1) = w(t) - eta * (dL/dw(t) + lambda * w(t))

Use weight sharing to reduce the number of parameters

Use unsupervised pre-training if data is limited
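A minimal sketch of one SGD step with momentum and weight decay, matching the update rules above (NumPy; the hyper-parameter values are illustrative, not from the slides):

```python
import numpy as np

def sgd_momentum_step(w, v, grad, lr=0.01, momentum=0.9, weight_decay=1e-4):
    """One SGD update with momentum and (L2) weight decay.
    grad is the gradient of the loss with respect to the weights w."""
    grad = grad + weight_decay * w          # weight decay term
    v = momentum * v - lr * grad            # velocity update
    w = w + v                               # parameter update
    return w, v

w = np.random.rand(100)        # some weights
v = np.zeros_like(w)           # initial velocity
grad = np.random.rand(100)     # a (fake) mini-batch gradient
w, v = sgd_momentum_step(w, v, grad)
```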

CONVNETS: AUGMENTATION

Crops
Corners & centres
Reflections
Horizontal & vertical
Random perturbations
Translation
Rotation
Jitter / Noise
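A small sketch of two of these augmentations, random crops and horizontal flips (NumPy; the crop size and names are mine):

```python
import numpy as np

def random_crop(image, size):
    """Take a random size x size crop from a square image."""
    n = image.shape[0]
    i, j = np.random.randint(0, n - size + 1, 2)
    return image[i:i + size, j:j + size]

def random_horizontal_flip(image, p=0.5):
    """Mirror the image left-right with probability p."""
    return image[:, ::-1] if np.random.rand() < p else image

image = np.random.rand(256, 256)
augmented = random_horizontal_flip(random_crop(image, 224))
print(augmented.shape)   # (224, 224)
```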

CONVNETS: DROPOUT

Randomly omit each hidden unit with some probability, usually 0.5

Equivalent to sampling from an ensemble of 2^n models, where n is the number of hidden units
Fast and efficient regularisation, robust to noisy inputs
At test time, all units are kept and each hidden unit's output is scaled by the keep probability, usually 0.5
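A sketch of dropout at training and test time, under the output-scaling scheme described above (NumPy; names are mine):

```python
import numpy as np

def dropout_train(activations, p_drop=0.5):
    """Training: zero out each hidden unit independently with probability p_drop."""
    mask = (np.random.rand(*activations.shape) >= p_drop)
    return activations * mask

def dropout_test(activations, p_drop=0.5):
    """Test: keep every unit but scale its output by the keep probability,
    so the expected activation matches training."""
    return activations * (1.0 - p_drop)

h = np.random.rand(8)
print(dropout_train(h))
print(dropout_test(h))
```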

CONVNETS: SLIDING WINDOW


Traditionally, applying a sliding-window detector to an entire image is very expensive
Convolutional nets can easily be replicated over large images very cheaply
Simply apply the convolutions to the entire image and spatially replicate the fully connected layers

CONVNETS: FULLY CONVOLUTIONAL

Transform fixed-input-size models into any-size models by converting all inner products (fully connected layers) into 1*1 convolutions; see the sketch below
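A sketch of the equivalence being used here: an inner product over C input channels at one spatial location is the same computation as a 1x1 convolution, so the fully connected weights can be reused as 1x1 kernels and slid over an input of any size (NumPy; the shapes and names are mine):

```python
import numpy as np

c_in, c_out = 64, 10
fc_weights = np.random.rand(c_out, c_in)   # weights of an inner-product layer

# Feature maps of arbitrary spatial size (here 13 x 13) with c_in channels.
feature_maps = np.random.rand(c_in, 13, 13)

# "1x1 convolution": apply the same c_out x c_in weight matrix at every
# spatial position; einsum does this for all positions at once.
output_maps = np.einsum('oc,chw->ohw', fc_weights, feature_maps)
print(output_maps.shape)   # (10, 13, 13) -- one score map per class

# At a single position this reduces to the ordinary inner product.
assert np.allclose(output_maps[:, 0, 0], fc_weights @ feature_maps[:, 0, 0])
```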

CONVNETS: GENERIC FEATURES

REFERENCES
Slides:
Yann LeCun, CVPR 2014 Workshop on Deep Learning in Vision
Marc Ranzato, CVPR 2014 Workshop on Deep Learning in Vision
Rob Fergus, NIPS 2013

Papers:
LeCun, Bottou, Bengio and Haffner: Gradient-Based Learning Applied to Document Recognition
Krizhevsky, Sutskever and Hinton: ImageNet Classification with Deep Convolutional Neural Networks
Girshick, Donahue, Darrell and Malik: Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
Razavian, Azizpour, Sullivan and Carlsson: CNN Features Off-the-Shelf: An Astounding Baseline for Recognition

THANK YOU
