
Machine Learning

Part 2
Supervised Learning

Supervised Learning

Supervised learning is the machine
learning task of inferring a function
from supervised (labeled) training data.
(Wikipedia)
The training data consist of a set of training
examples.
In supervised learning, each example is
a pair consisting of an input object (typically a
vector) and a desired output value.
A supervised learning algorithm analyzes the
training data and produces an inferred function,
which is called a classifier (if the output is
discrete) or a regression function (if the output is
continuous).

Learning a Class from Examples

Class C of a family car

Prediction: Is car x a family car?
Knowledge extraction: What do people expect from a family car?

Output:
Positive (+) and negative (−) examples

Input representation:
x1: price, x2: engine power
Training set X
X = {x^t, r^t}, t = 1, ..., N

r^t = 1 if x^t is positive, 0 if x^t is negative

x = [x1, x2]^T

Class C:
(p1 ≤ price ≤ p2) AND (e1 ≤ engine power ≤ e2)

Hypothesis class H:
h(x) = 1 if h says x is positive, 0 if h says x is negative

Error of h on X:
E(h | X) = (1/N) Σ_{t=1}^{N} 1(h(x^t) ≠ r^t)
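The rectangle hypothesis and its empirical error can be sketched directly. This is a minimal illustration with hypothetical data and rectangle bounds, not part of the original slides:

```python
# Sketch of the rectangle hypothesis h and its empirical error E(h|X).
# Data and bounds below are hypothetical.

def h(x, p1, p2, e1, e2):
    """Return 1 if (price, engine power) falls inside the rectangle."""
    price, power = x
    return 1 if (p1 <= price <= p2 and e1 <= power <= e2) else 0

def empirical_error(X, r, p1, p2, e1, e2):
    """E(h|X) = (1/N) * sum over t of 1(h(x^t) != r^t)."""
    N = len(X)
    return sum(h(x, p1, p2, e1, e2) != rt for x, rt in zip(X, r)) / N

# Hypothetical training set: (price, engine power) pairs with labels r^t.
X = [(20, 120), (25, 150), (60, 300), (10, 60)]
r = [1, 1, 0, 0]
print(empirical_error(X, r, 15, 30, 100, 200))  # 0.0 on this toy set
```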

S, G, and the Version Space


most specific hypothesis, S
most general hypothesis, G
Any h ∈ H between S and G is consistent
with the training set, and together they
make up the version space
(Mitchell, 1997)

Margin

Choose the h with the largest margin: the distance of the boundary to the closest instances

Steps in Supervised
Learning

Determine the type of training examples. In the
case of handwriting analysis, for example, this
might be a single handwritten character, an
entire handwritten word, or an entire line of
handwriting.
Gather a training set. The training set needs to
be representative of the real-world use of the
function. Thus, a set of input objects is gathered
and corresponding outputs are also gathered,
either from human experts or from
measurements.

Steps in Supervised
Learning

Determine the input feature representation of the
learned function. The accuracy of the learned
function depends strongly on how the input
object is represented. Typically, the input object
is transformed into a feature vector, which
contains a number of features that are
descriptive of the object. The number of features
should not be too large, because of the curse of
dimensionality, but should contain enough
information to accurately predict the output.
Determine the structure of the learned function
and corresponding learning algorithm. For
example, the engineer may choose to
use support vector machines or decision trees.

Steps in Supervised
Learning

Complete the design. Run the learning algorithm
on the gathered training set. Some supervised
learning algorithms require the user to determine
certain control parameters. These parameters
may be adjusted by optimizing performance on a
subset (called a validation set) of the training set,
or via cross-validation.
Evaluate the accuracy of the learned function.
After parameter adjustment and learning, the
performance of the resulting function should be
measured on a test set that is separate from the
training set.
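The tuning-then-testing procedure above can be sketched end to end. The model here is a hypothetical one-parameter threshold classifier on 1-D data (an assumption for illustration; the slides do not specify a model):

```python
# Sketch: tune a control parameter on a validation set, then report
# accuracy on a held-out test set. Data and model are hypothetical.
import random

random.seed(0)
data = []
for _ in range(100):
    x = random.random()
    y = int(x > 0.5)              # true concept: positive when x > 0.5
    if random.random() < 0.1:     # 10% label noise
        y = 1 - y
    data.append((x, y))

train, valid, test = data[:50], data[50:75], data[75:]

def accuracy(threshold, split):
    return sum(int(x > threshold) == y for x, y in split) / len(split)

# Choose the threshold (control parameter) that does best on validation.
best = max([t / 10 for t in range(1, 10)], key=lambda t: accuracy(t, valid))
print("chosen threshold:", best)
print("test accuracy:", accuracy(best, test))
```

Note that the test set is touched only once, after the parameter has been fixed, so the reported accuracy is an unbiased estimate.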

Facts in Supervised
Learning

A wide range of supervised learning algorithms is
available, each with its strengths and
weaknesses.
There is no single learning algorithm that works
best on all supervised learning problems (No free
lunch theorem).

Noise and Model Complexity


Use the simpler one because:

Simpler to use
(lower computational complexity)

Easier to train
(lower space complexity)

Easier to explain
(more interpretable)

Generalizes better
(lower variance - Occam's razor)

Simpler explanations are more plausible, and any unnecessary complexity should be shaved off.

Multiple Classes, C_i, i = 1, ..., K

Training set:
X = {x^t, r^t}, t = 1, ..., N

r_i^t = 1 if x^t ∈ C_i, 0 if x^t ∈ C_j, j ≠ i

Train K hypotheses h_i(x), i = 1, ..., K:

h_i(x^t) = 1 if x^t ∈ C_i, 0 if x^t ∈ C_j, j ≠ i

Regression

Training set:
X = {x^t, r^t}, t = 1, ..., N, where r^t ∈ ℝ

r^t = f(x^t) + ε

Linear model: g(x) = w1 x + w0
Quadratic model: g(x) = w2 x^2 + w1 x + w0

Empirical error:
E(g | X) = (1/N) Σ_{t=1}^{N} [r^t − g(x^t)]^2

E(w1, w0 | X) = (1/N) Σ_{t=1}^{N} [r^t − (w1 x^t + w0)]^2
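The empirical error of a linear fit can be evaluated directly from its definition. A minimal sketch on hypothetical (x^t, r^t) pairs:

```python
# Sketch of E(w1, w0 | X) for a linear model g(x) = w1*x + w0.
def empirical_error(w1, w0, X, r):
    """Mean squared error of the line (w1, w0) on the sample."""
    N = len(X)
    return sum((rt - (w1 * xt + w0)) ** 2 for xt, rt in zip(X, r)) / N

X = [0.0, 1.0, 2.0, 3.0]
r = [1.0, 3.0, 5.0, 7.0]                # generated exactly by r = 2x + 1

print(empirical_error(2.0, 1.0, X, r))  # 0.0: this line fits perfectly
print(empirical_error(1.0, 0.0, X, r))  # 7.5: a worse line, larger error
```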

Model Selection & Generalization

Learning is an ill-posed problem; the data alone
are not sufficient to find a unique solution
Generalization: how well a model
performs on new data
Overfitting: H is more complex than C or f
Underfitting: H is less complex than C or f

Triple Trade-Off

There is a trade-off between three
factors (Dietterich, 2003):
1. Complexity of H, c(H)
2. Training set size, N
3. Generalization error, E, on new data

As N increases, E decreases.
As c(H) increases, E first decreases and then increases.

Cross-Validation

To estimate generalization error, we need
data unseen during training. We split the
data as:

Training set (50%)
Validation set (25%)
Test (publication) set (25%)

Resampling when there is little data
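The 50/25/25 split above can be sketched in a few lines. Shuffling first (an assumption; the slides do not say how to split) helps each subset be representative:

```python
# Sketch of the 50% / 25% / 25% train / validation / test split.
import random

data = list(range(100))            # hypothetical dataset of 100 examples
random.seed(0)
random.shuffle(data)               # shuffle so subsets are representative

n = len(data)
train = data[: n // 2]             # 50% training set
valid = data[n // 2 : 3 * n // 4]  # 25% validation set
test  = data[3 * n // 4 :]         # 25% test (publication) set
print(len(train), len(valid), len(test))  # 50 25 25
```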

Dimensions of a Supervised Learner

1. Model: g(x | θ)

2. Loss function: E(θ | X) = Σ_t L(r^t, g(x^t | θ))

3. Optimization procedure: θ* = argmin_θ E(θ | X)
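The three dimensions map directly onto three pieces of code. A minimal sketch where the model is g(x|θ) = θ·x, the loss is squared error, and the optimizer is a simple grid search (all three choices are assumptions for illustration):

```python
# Sketch of the three dimensions of a supervised learner.

def g(x, theta):                        # 1. model g(x | θ)
    return theta * x

def E(theta, X, r):                     # 2. loss E(θ | X) = Σ_t L(r^t, g(x^t | θ))
    return sum((rt - g(xt, theta)) ** 2 for xt, rt in zip(X, r))

def argmin_E(X, r, candidates):         # 3. optimization: θ* = argmin_θ E(θ | X)
    return min(candidates, key=lambda th: E(th, X, r))

X = [1.0, 2.0, 3.0]
r = [2.0, 4.0, 6.0]                     # hypothetical data generated by θ = 2
theta_star = argmin_E(X, r, [i / 10 for i in range(0, 41)])
print(theta_star)  # 2.0
```

Any learner fits this template; what varies is the model family, the loss, and how the minimization is carried out.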

Potrebbero piacerti anche