Deep learning

Deep learning (also known as deep structured learning or hierarchical
learning) is a set of algorithms in machine learning that attempt to learn
representations at multiple levels, corresponding to different levels of
abstraction. Consider a simple case with two sets of neurons: one set
receives an input signal and the other sends an output signal. When the
input layer receives a signal, it passes a modified version of that signal
to the next layer. In a deep network, there are many layers between the
input and output, allowing the algorithm to use multiple processing stages
composed of multiple linear and non-linear transformations.
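
The layered pass described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a trained model: the layer sizes, random weights, and ReLU non-linearity are all assumptions chosen for the example.

```python
import numpy as np

# Illustrative sketch of a deep feed-forward pass: each layer applies a
# linear transformation followed by a non-linearity, passing a modified
# version of its input to the next layer. All sizes/weights are arbitrary.
rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

layer_sizes = [4, 8, 8, 2]          # input -> two hidden layers -> output
weights = [rng.standard_normal((m, n)) * 0.1
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(x):
    for W, b in zip(weights, biases):
        x = relu(x @ W + b)          # linear transform, then non-linearity
    return x

out = forward(rng.standard_normal(4))
print(out.shape)                     # (2,)
```

Each iteration of the loop is one "level" in the hierarchy; stacking more of them is what makes the network deep.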

During the past several years, the techniques developed from deep learning
research have already had an impact on a wide range of signal and
information processing work, both within traditional scopes and within new,
widened scopes that include key aspects of machine learning and artificial
intelligence.

Three important reasons for the popularity of deep learning today are:

 The drastically increased chip processing abilities (e.g., general-purpose
graphical processing units, or GPGPUs),
 The significantly lowered cost of computing hardware, and
 The recent advances in machine learning and signal/information
processing research.

These advances have enabled the deep learning methods to effectively exploit
complex, compositional nonlinear functions, to learn distributed and
hierarchical feature representations, and to make effective use of both labeled
and unlabeled data.

Deep learning typically uses artificial neural networks. The levels in these
learned statistical models correspond to distinct levels of concepts, where
higher level concepts are defined from lower level ones, and the same lower
level concepts can help to define many higher level concepts.

Deep learning has two key aspects:

1. models consisting of multiple layers or stages of nonlinear information
processing, and
2. methods for supervised or unsupervised learning of feature
representations at successively higher, more abstract layers.

Deep learning sits at the intersection of the research areas of neural
networks, artificial intelligence, graphical modeling, optimization,
pattern recognition, and signal processing.

Historically, the concept of deep learning originated from artificial neural
network research. Feed-forward neural networks, or MLPs with many hidden
layers, which are often referred to as deep neural networks (DNNs), are good
examples of models with a deep architecture. Back-propagation (BP),
popularized in the 1980s, has been a well-known algorithm for learning the
parameters of these networks. Unfortunately, back-propagation alone did not
work well in practice.

Using hidden layers with many neurons in a DNN significantly improves the
modeling power of the DNN and creates many near-optimal configurations.
Even if parameter learning is trapped in a local optimum, the resulting DNN
can still perform quite well, since the chance of having a poor local optimum
is lower than when a small number of neurons are used in the network. Using
deep and wide neural networks, however, places great demands on
computational power during the training process.

Most machine learning and signal processing techniques have exploited shallow
structured architectures. These architectures typically contain at most one or
two layers of nonlinear feature transformations. Examples of shallow
architectures are Gaussian mixture models (GMMs), support vector machines
(SVMs), logistic regression, kernel regression, and multilayer perceptrons
(MLPs) with a single hidden layer, including extreme learning machines (ELMs).
For instance, SVMs use a shallow linear pattern-separation model with one or
zero feature transformation layers. Shallow architectures have been shown to
be effective in solving many simple or well-constrained problems, but their
limited modeling and representational power causes difficulties when dealing
with more complicated real-world applications involving natural signals such
as human speech, natural sound and language, and natural images and visual
scenes.

Deep learning algorithms are contrasted with shallow learning algorithms by
the number of parameterized transformations a signal encounters as it
propagates from the input layer to the output layer. Here, a parameterized
transformation is a processing unit that has trainable parameters, such as
weights and thresholds.
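
One such parameterized transformation can be sketched as a small class whose trainable parameters are a weight matrix and a threshold (bias) vector. The class name, sizes, and tanh activation are illustrative assumptions, not a reference implementation.

```python
import numpy as np

# Hypothetical sketch of one "parameterized transformation": a processing
# unit with trainable weights and thresholds (biases). A deep network
# stacks many such units between input and output.
class DenseLayer:
    def __init__(self, n_in, n_out, rng):
        self.W = rng.standard_normal((n_in, n_out)) * 0.1  # trainable weights
        self.b = np.zeros(n_out)                           # trainable thresholds

    def __call__(self, x):
        return np.tanh(x @ self.W + self.b)                # non-linear output

rng = np.random.default_rng(1)
layer = DenseLayer(3, 5, rng)
print(layer(np.ones(3)).shape)   # (5,)
```

Counting how many such units a signal passes through on its way from input to output is exactly the depth measure the paragraph describes.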

For simplicity, we can think of DNNs as decision-making black boxes. They take
an array of numbers (which can represent pixels, audio waveforms, or words),
run a series of functions on that array, and output one or more numbers. The
outputs are usually a prediction of some property we're trying to guess from
the input, for example whether or not an image is a picture of a cat.

The functions that run inside the black box are controlled by the memory of
the neural network: arrays of numbers known as weights that define how the
inputs are combined and recombined to produce the results. Dealing with
real-world problems like cat detection requires very complex functions, which
means these arrays are very large, containing around 60 million numbers in
the case of one recent computer vision network. The biggest obstacle to using
neural networks has been figuring out how to set all these massive arrays to
values that will do a good job of transforming the input signals into output
predictions.

One of the theoretical properties of neural networks that has kept researchers
working on them is that they should be teachable. It’s pretty simple to show
on a small scale how you can supply a series of example inputs and expected
outputs, and go through a mechanical process to take the weights from initial
random values to progressively better numbers that produce more accurate
predictions.
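
That "mechanical process" can be demonstrated on a toy problem. The sketch below fits a single linear unit by gradient descent on squared error; the data, learning rate, and iteration count are illustrative assumptions, but it shows random initial weights becoming progressively better numbers.

```python
import numpy as np

# Toy illustration of teaching by example: supply inputs and expected
# outputs, then mechanically nudge the weights downhill on the error.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w                      # example inputs and expected outputs

w = rng.standard_normal(3)          # initial random values
lr = 0.1
for _ in range(200):
    err = X @ w - y                 # prediction error on the examples
    grad = X.T @ err / len(X)       # gradient of mean squared error
    w -= lr * grad                  # progressively better numbers

print(np.round(w, 3))               # converges toward [1.5, -2.0, 0.5]
```

Deep networks use the same idea, with back-propagation computing the gradient through every layer instead of just one.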

How do deep learning neural network architectures differ from
"normal" neural networks?

Deep learning neural network architectures differ from "normal" neural
networks because they have more hidden layers and they can be trained in an
unsupervised or supervised manner for both unsupervised and supervised
learning tasks. Moreover, people often talk about training a deep network in
an unsupervised manner before training the network in a supervised manner.

Why so many layers in a DNN?

Deep learning works because of the architecture of the network and the
optimization routine applied to that architecture. The network is a directed
graph, meaning that each hidden unit is connected to many other hidden units
below it. Each hidden layer going further into the network is therefore a
non-linear combination of the layers below it, because of all the combining
and recombining of the outputs from all the previous units together with
their activation functions. When the optimization routine is applied to the
network, each hidden layer then becomes an optimally weighted, non-linear
combination of the layer below it. When each successive hidden layer has
fewer units than the one below it, each hidden layer also becomes a
lower-dimensional projection of the layer below it. So the information from
the layer below is nicely summarized by a non-linear, optimally weighted,
lower-dimensional projection in each subsequent layer of the deep network.
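
The narrowing described above can be made concrete with a short sketch: each layer here has fewer units than the last, so each stage is a non-linear projection into a smaller space. The sizes and random weights are illustrative assumptions only.

```python
import numpy as np

# Sketch of successive lower-dimensional projections: every hidden layer
# is narrower than the one below it, so each stage summarizes its input
# in a smaller non-linear representation. Sizes are arbitrary.
rng = np.random.default_rng(0)
sizes = [64, 32, 16, 8]             # each layer narrower than the last
Ws = [rng.standard_normal((m, n)) / np.sqrt(m)
      for m, n in zip(sizes[:-1], sizes[1:])]

x = rng.standard_normal(64)
for W in Ws:
    x = np.tanh(x @ W)              # non-linear, lower-dimensional summary
    print(x.shape)                  # (32,), then (16,), then (8,)
```

In a trained network, the optimization routine would set these weight matrices so that each projection keeps the information useful for the task.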

Problems with deep neural networks

If DNNs are trained naively, many issues can arise. Two common issues are
overfitting and computation time.

DNNs are prone to overfitting because of the added layers of abstraction,
which allow them to model rare dependencies in the training data.
Regularization methods can be applied during training to help combat
overfitting. A more recent regularization method applied to DNNs is dropout
regularization. In dropout, some number of units are randomly omitted from
the hidden layers during training, which helps to break the rare dependencies
that can occur in the training data.
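
The dropout idea can be sketched in a few lines. This uses the common "inverted dropout" formulation (zeroed units plus rescaling of the survivors), which is one standard way to implement it; the drop probability and sizes are illustrative.

```python
import numpy as np

# Minimal dropout sketch: during training, randomly omit units from a
# hidden layer and rescale the rest so the expected activation is unchanged
# ("inverted dropout"). At test time all units are kept.
rng = np.random.default_rng(0)

def dropout(h, p_drop, training=True):
    if not training:
        return h                      # no units omitted at test time
    mask = rng.random(h.shape) >= p_drop
    return h * mask / (1.0 - p_drop)  # surviving units rescaled

h = np.ones(10)
print(dropout(h, 0.5))               # roughly half the units zeroed out
```

Because a different random subset of units is omitted on every training step, no unit can rely on a rare co-occurrence with specific other units, which is exactly how dropout breaks the rare dependencies mentioned above.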

There are many training parameters to consider with a DNN, such as the size
(number of layers and number of units per layer), the learning rate, and the
initial weights. Sweeping through the parameter space for optimal parameters
may not be feasible due to the cost in time and computational resources.
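
A quick sketch shows why such a sweep is expensive: even a coarse grid over a few of the parameters mentioned above multiplies out quickly. The specific candidate values here are hypothetical.

```python
import itertools

# Illustrative cost of a grid sweep over DNN training parameters:
# every combination below would require a full training run.
layer_counts = [2, 4, 8]                  # number of layers
units_per_layer = [128, 256, 512, 1024]   # units per layer
learning_rates = [1e-2, 1e-3, 1e-4]       # learning rate
init_scales = [0.01, 0.1]                 # initial weight scale

grid = list(itertools.product(layer_counts, units_per_layer,
                              learning_rates, init_scales))
print(len(grid))   # 72 configurations from just four coarse axes
```

Adding one more axis, or finer values per axis, multiplies the count again, which is why practitioners often fall back on coarse or randomized searches rather than exhaustive sweeps.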
