
Machine Learning

Part 2
Supervised Learning

Supervised Learning

Supervised learning is the machine
learning task of inferring a function
from supervised (labeled) training data.
(Wikipedia)
The training data consist of a set of training
examples.
In supervised learning, each example is
a pair consisting of an input object (typically a
vector) and a desired output value.
A supervised learning algorithm analyzes the
training data and produces an inferred function,
which is called a classifier (if the output is
discrete) or a regression function (if the output is
continuous).

Learning a Class from Examples

Class C of a family car

Prediction: Is car x a family car?
Knowledge extraction: What do people expect from a family car?

Output:
Positive (+) and negative (−) examples

Input representation:
x1: price, x2: engine power
Training set X
X = {x^t, r^t}, t = 1, ..., N

r^t = 1 if x^t is positive, 0 if x^t is negative

x = [x1, x2]^T

Class C:
(p1 ≤ price ≤ p2) AND (e1 ≤ engine power ≤ e2)

Hypothesis class H:
h(x) = 1 if h says x is positive, 0 if h says x is negative

Error of h on X:
E(h | X) = (1/N) Σ_{t=1}^{N} 1(h(x^t) ≠ r^t)
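The rectangle hypothesis and its empirical error can be sketched directly. This is a minimal illustration with hypothetical data and rectangle bounds, not part of the original slides:

```python
# Sketch of the rectangle hypothesis h and its empirical error E(h|X).
# Data and bounds below are hypothetical.

def h(x, p1, p2, e1, e2):
    """Return 1 if (price, engine power) falls inside the rectangle."""
    price, power = x
    return 1 if (p1 <= price <= p2 and e1 <= power <= e2) else 0

def empirical_error(X, r, p1, p2, e1, e2):
    """E(h|X) = (1/N) * sum over t of 1(h(x^t) != r^t)."""
    N = len(X)
    return sum(h(x, p1, p2, e1, e2) != rt for x, rt in zip(X, r)) / N

# Hypothetical training set: (price, engine power) pairs with labels r^t.
X = [(20, 120), (25, 150), (60, 300), (10, 60)]
r = [1, 1, 0, 0]
print(empirical_error(X, r, 15, 30, 100, 200))  # 0.0 on this toy set
```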

S, G, and the Version Space


most specific hypothesis, S
most general hypothesis, G
Any h ∈ H between S and G is consistent
with the training set, and together they
make up the version space
(Mitchell, 1997)

Margin

Choose the h with the largest margin: the distance of the boundary to the closest instances

Steps in Supervised
Learning

Determine the type of training examples. In the
case of handwriting analysis, for example, this
might be a single handwritten character, an
entire handwritten word, or an entire line of
handwriting.
Gather a training set. The training set needs to
be representative of the real-world use of the
function. Thus, a set of input objects is gathered
and corresponding outputs are also gathered,
either from human experts or from
measurements.

Steps in Supervised
Learning

Determine the input feature representation of the
learned function. The accuracy of the learned
function depends strongly on how the input
object is represented. Typically, the input object
is transformed into a feature vector, which
contains a number of features that are
descriptive of the object. The number of features
should not be too large, because of the curse of
dimensionality, but should contain enough
information to accurately predict the output.
Determine the structure of the learned function
and corresponding learning algorithm. For
example, the engineer may choose to
use support vector machines or decision trees.

Steps in Supervised
Learning

Complete the design. Run the learning algorithm
on the gathered training set. Some supervised
learning algorithms require the user to determine
certain control parameters. These parameters
may be adjusted by optimizing performance on a
subset (called a validation set) of the training set,
or via cross-validation.
Evaluate the accuracy of the learned function.
After parameter adjustment and learning, the
performance of the resulting function should be
measured on a test set that is separate from the
training set.
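The tuning-then-testing procedure above can be sketched end to end. The model here is a hypothetical one-parameter threshold classifier on 1-D data (an assumption for illustration; the slides do not specify a model):

```python
# Sketch: tune a control parameter on a validation set, then report
# accuracy on a held-out test set. Data and model are hypothetical.
import random

random.seed(0)
data = []
for _ in range(100):
    x = random.random()
    y = int(x > 0.5)              # true concept: positive when x > 0.5
    if random.random() < 0.1:     # 10% label noise
        y = 1 - y
    data.append((x, y))

train, valid, test = data[:50], data[50:75], data[75:]

def accuracy(threshold, split):
    return sum(int(x > threshold) == y for x, y in split) / len(split)

# Choose the threshold (control parameter) that does best on validation.
best = max([t / 10 for t in range(1, 10)], key=lambda t: accuracy(t, valid))
print("chosen threshold:", best)
print("test accuracy:", accuracy(best, test))
```

Note that the test set is touched only once, after the parameter has been fixed, so the reported accuracy is an unbiased estimate.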

Facts in Supervised
Learning

A wide range of supervised learning algorithms is
available, each with its strengths and
weaknesses.
There is no single learning algorithm that works
best on all supervised learning problems (No free
lunch theorem).

Noise and Model Complexity


Use the simpler one because:

Simpler to use
(lower computational complexity)

Easier to train
(lower space complexity)

Easier to explain
(more interpretable)

Generalizes better
(lower variance - Occam's razor)

Simpler explanations are more plausible, and any unnecessary complexity should be shaved off.

Multiple Classes, C_i, i = 1, ..., K

Training set:
X = {x^t, r^t}, t = 1, ..., N

r_i^t = 1 if x^t ∈ C_i, 0 if x^t ∈ C_j, j ≠ i

Train K hypotheses h_i(x), i = 1, ..., K:

h_i(x^t) = 1 if x^t ∈ C_i, 0 if x^t ∈ C_j, j ≠ i

Regression

Training set:
X = {x^t, r^t}, t = 1, ..., N, where r^t ∈ ℝ

r^t = f(x^t) + ε

Linear model: g(x) = w1 x + w0
Quadratic model: g(x) = w2 x^2 + w1 x + w0

Empirical error:
E(g | X) = (1/N) Σ_{t=1}^{N} [r^t − g(x^t)]^2

E(w1, w0 | X) = (1/N) Σ_{t=1}^{N} [r^t − (w1 x^t + w0)]^2
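The empirical error of a linear fit can be evaluated directly from its definition. A minimal sketch on hypothetical (x^t, r^t) pairs:

```python
# Sketch of E(w1, w0 | X) for a linear model g(x) = w1*x + w0.
def empirical_error(w1, w0, X, r):
    """Mean squared error of the line (w1, w0) on the sample."""
    N = len(X)
    return sum((rt - (w1 * xt + w0)) ** 2 for xt, rt in zip(X, r)) / N

X = [0.0, 1.0, 2.0, 3.0]
r = [1.0, 3.0, 5.0, 7.0]                # generated exactly by r = 2x + 1

print(empirical_error(2.0, 1.0, X, r))  # 0.0: this line fits perfectly
print(empirical_error(1.0, 0.0, X, r))  # 7.5: a worse line, larger error
```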

Model Selection & Generalization

Learning is an ill-posed problem; the data alone
are not sufficient to find a unique solution
Generalization: how well a model
performs on new data
Overfitting: H is more complex than C or f
Underfitting: H is less complex than C or f

Triple Trade-Off

There is a trade-off between three
factors (Dietterich, 2003):
1. Complexity of H, c(H)
2. Training set size, N
3. Generalization error, E, on new data

As N increases, E decreases.
As c(H) increases, E first decreases and then increases.

Cross-Validation

To estimate generalization error, we need
data unseen during training. We split the
data as:

Training set (50%)
Validation set (25%)
Test (publication) set (25%)

Resampling when there is little data
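The 50/25/25 split above can be sketched in a few lines. Shuffling first (an assumption; the slides do not say how to split) helps each subset be representative:

```python
# Sketch of the 50% / 25% / 25% train / validation / test split.
import random

data = list(range(100))            # hypothetical dataset of 100 examples
random.seed(0)
random.shuffle(data)               # shuffle so subsets are representative

n = len(data)
train = data[: n // 2]             # 50% training set
valid = data[n // 2 : 3 * n // 4]  # 25% validation set
test  = data[3 * n // 4 :]         # 25% test (publication) set
print(len(train), len(valid), len(test))  # 50 25 25
```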

Dimensions of a Supervised Learner

1. Model: g(x | θ)

2. Loss function: E(θ | X) = Σ_t L(r^t, g(x^t | θ))

3. Optimization procedure: θ* = argmin_θ E(θ | X)
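The three dimensions map directly onto three pieces of code. A minimal sketch where the model is g(x|θ) = θ·x, the loss is squared error, and the optimizer is a simple grid search (all three choices are assumptions for illustration):

```python
# Sketch of the three dimensions of a supervised learner.

def g(x, theta):                        # 1. model g(x | θ)
    return theta * x

def E(theta, X, r):                     # 2. loss E(θ | X) = Σ_t L(r^t, g(x^t | θ))
    return sum((rt - g(xt, theta)) ** 2 for xt, rt in zip(X, r))

def argmin_E(X, r, candidates):         # 3. optimization: θ* = argmin_θ E(θ | X)
    return min(candidates, key=lambda th: E(th, X, r))

X = [1.0, 2.0, 3.0]
r = [2.0, 4.0, 6.0]                     # hypothetical data generated by θ = 2
theta_star = argmin_E(X, r, [i / 10 for i in range(0, 41)])
print(theta_star)  # 2.0
```

Any learner fits this template; what varies is the model family, the loss, and how the minimization is carried out.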

Potrebbero piacerti anche