
Artificial Intelligence 6.

Machine Learning, Version Space Method

Japan Advanced Institute of Science and Technology (JAIST) Yoshimasa Tsuruoka

Schedule
Oct 26  Machine Learning, Version Space Method
Oct 28  Decision Trees
Nov 2   Supervised and Unsupervised Learning
Nov 4   Perceptron
Nov 9   Neural Network

Office hour: exam (logic)

Office hour: questions for the report

Office hour: exam

Outline
Introduction to machine learning
  What is machine learning?
  Applications of machine learning

Version space method
  Representing hypotheses, version space
  Find-S algorithm
  Candidate-Elimination algorithm

http://www.jaist.ac.jp/~tsuruoka/lectures/

What is machine learning?


What does a machine learn?

What machine learning can do:
  Classification, regression, structured prediction
  Clustering

Machine learning involves theories of optimization, probability, graphs, search, logic, etc.

Recognizing handwritten digits

Hastie, Tibshirani and Friedman (2008). The Elements of Statistical Learning (2nd edition). Springer-Verlag.

Applications of machine learning


Image/speech recognition
Part-of-speech tagging, syntactic parsing, word sense disambiguation
Detection of spam emails
Intrusion detection
Credit card fraud detection
Automatic driving
AI players in computer games
etc.

Types of machine learning


Supervised learning
  The correct output is given for each instance

Unsupervised learning
  No output is given; analyses relations between instances

Reinforcement learning
  Supervision is given via rewards

Why machine learning?


Why not write rules manually?

Example: detecting spam emails
  If the mail contains the word "Nigeria", then it is spam
  If the mail comes from IP X.X.X.X, then it is spam
  If the mail contains a large image, then it is spam

Problems with manual rules:
  Too many rules
  Hard to keep them consistent
  Each rule may not be completely correct

Version space method


Chapter 2 of Mitchell, T., Machine Learning (1997)

Concept learning
Training examples
Representing hypotheses
Find-S algorithm
Version space
Candidate-Elimination algorithm

Learning a concept with examples


Training examples:

Ex.  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1    Sunny  Warm     Normal    Strong  Warm   Same      Yes
2    Sunny  Warm     High      Strong  Warm   Same      Yes
3    Rainy  Cold     High      Strong  Warm   Change    No
4    Sunny  Warm     High      Strong  Cool   Change    Yes

The first six columns are attributes; EnjoySport is the target concept.

The concept we want to learn


Days on which my friend Aldo enjoys his favorite water sports

Hypotheses
Representing hypotheses
h1 = <Sunny, ?, ?, Strong, ?, ?>
  Sky = Sunny, Wind = Strong (the other attributes can take any value)

h2 = <Sunny, ?, ?, ?, ?, ?>
  Sky = Sunny

General and Specific


h1 is more specific than h2 (h2 is more general than h1)
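As a minimal Python sketch of how such hypotheses classify instances ("Light" below is a hypothetical Wind value, not one from the training data):

```python
# Sketch: how a hypothesis classifies an instance.
# "?" accepts any value; otherwise the instance value must match exactly.

def matches(h, x):
    """True if hypothesis h classifies instance x as positive."""
    return all(c == "?" or c == a for c, a in zip(h, x))

h1 = ("Sunny", "?", "?", "Strong", "?", "?")
h2 = ("Sunny", "?", "?", "?", "?", "?")

x1 = ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same")
x2 = ("Sunny", "Warm", "Normal", "Light", "Warm", "Same")  # hypothetical Wind value

print(matches(h1, x1), matches(h2, x1))  # → True True
print(matches(h1, x2), matches(h2, x2))  # → False True (h2 is more general)
```

Every instance accepted by h1 is also accepted by h2, but not vice versa, which is what "h2 is more general than h1" means.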

Find-S Algorithm
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x:
     For each attribute constraint ai in h:
       If the constraint ai is satisfied by x, then do nothing
       Else replace ai in h by the next more general constraint that is satisfied by x
3. Output hypothesis h

Example
h0 = <0, 0, 0, 0, 0, 0>
x1 = <Sunny, Warm, Normal, Strong, Warm, Same>, yes
h1 = <Sunny, Warm, Normal, Strong, Warm, Same>
x2 = <Sunny, Warm, High, Strong, Warm, Same>, yes
h2 = <Sunny, Warm, ?, Strong, Warm, Same>
x3 = <Rainy, Cold, High, Strong, Warm, Change>, no
h3 = <Sunny, Warm, ?, Strong, Warm, Same>
x4 = <Sunny, Warm, High, Strong, Cool, Change>, yes
h4 = <Sunny, Warm, ?, Strong, ?, ?>
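The trace above can be reproduced with a small Python sketch of Find-S, assuming tuples of attribute values for hypotheses ("0" stands for the most specific, value-rejecting constraint):

```python
# A minimal sketch of the Find-S algorithm on the EnjoySport data.
# Constraints: a concrete value, "?" (any value), or "0" (no value accepted).

def find_s(examples):
    """Return the most specific hypothesis consistent with the positive examples."""
    n = len(examples[0][0])
    h = ["0"] * n                # most specific hypothesis: rejects everything
    for x, label in examples:
        if label != "yes":       # Find-S ignores negative examples
            continue
        for i, a in enumerate(x):
            if h[i] == "0":      # first positive example: copy its values
                h[i] = a
            elif h[i] != a:      # constraint violated: generalize to "?"
                h[i] = "?"
    return tuple(h)

examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "yes"),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"), "yes"),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), "no"),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), "yes"),
]

print(find_s(examples))  # → ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```

The output matches h4 in the trace.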

Problems with the Find-S algorithm


It is not clear whether the output hypothesis is the correct hypothesis
There can be other hypotheses that are consistent with the training examples. Why prefer the most specific hypothesis?

Cannot detect when the training data is inconsistent

Version Space
Definition
Hypothesis space H, training examples D. The version space is

  VS_{H,D} = { h ∈ H | Consistent(h, D) }

i.e., the subset of hypotheses from H consistent with the training examples in D.

LIST-THEN-ELIMINATE algorithm
1. VersionSpace ← a list containing every hypothesis in H
2. For each training example <x, c(x)>:
     Remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)
3. Output the list of hypotheses in VersionSpace
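These three steps can be sketched in Python for the EnjoySport data, assuming a conjunctive hypothesis space of one concrete value or "?" per attribute (the all-"0" hypothesis is omitted for brevity; the first positive example would eliminate it anyway):

```python
from itertools import product

def matches(h, x):
    """h(x): "?" accepts any value; otherwise the values must be equal."""
    return all(c == "?" or c == a for c, a in zip(h, x))

def list_then_eliminate(domains, examples):
    # 1. VersionSpace <- every combination of a concrete value or "?"
    version_space = list(product(*[vals + ["?"] for vals in domains]))
    # 2. Remove any hypothesis h with h(x) != c(x)
    for x, label in examples:
        positive = (label == "yes")
        version_space = [h for h in version_space if matches(h, x) == positive]
    # 3. Output the surviving hypotheses
    return version_space

domains = [
    ["Sunny", "Rainy"], ["Warm", "Cold"], ["Normal", "High"],
    ["Strong"], ["Warm", "Cool"], ["Same", "Change"],
]
examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "yes"),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"), "yes"),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), "no"),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), "yes"),
]

vs = list_then_eliminate(domains, examples)
print(len(vs))  # → 6: the six hypotheses of the final version space
```

Brute-force enumeration is only feasible because this hypothesis space is tiny; the S/G boundary representation below avoids it.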

Version Space
Specific boundary and General boundary
S: { <Sunny, Warm, ?, Strong, ?, ?> }

<Sunny, ?, ?, Strong, ?, ?>

<Sunny, Warm, ?, ?, ?, ?> <?, Warm, ?, Strong, ?, ?>

G: { <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?> }

The version space can be represented with S and G; you don't have to list all the hypotheses.
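The ordering between the boundaries can be sketched with a "more general than or equal to" test, assuming tuple hypotheses as above ("?" subsumes any value; the "0" constraint is ignored here for brevity):

```python
# Sketch: the generality ordering between the S and G boundaries.

def more_general_or_equal(h1, h2):
    """True if every instance matched by h2 is also matched by h1."""
    return all(a == "?" or a == b for a, b in zip(h1, h2))

s = ("Sunny", "Warm", "?", "Strong", "?", "?")   # member of S
g = ("Sunny", "?", "?", "?", "?", "?")           # member of G
mid = ("Sunny", "?", "?", "Strong", "?", "?")    # lies between them

print(more_general_or_equal(g, mid))  # → True
print(more_general_or_equal(mid, s))  # → True
print(more_general_or_equal(s, g))    # → False
```

A hypothesis is in the version space exactly when it lies between some member of S and some member of G under this ordering.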

Candidate-Elimination algorithm
Initialization
G ← the set of maximally general hypotheses in H
S ← the set of maximally specific hypotheses in H

For each training example d, do:

  If d is a positive example:
    Remove from G any hypothesis inconsistent with d
    For each hypothesis s in S that is not consistent with d:
      Remove s from S
      Add to S all minimal generalizations h of s such that h is consistent with d,
      and some member of G is more general than h
    Remove from S any hypothesis that is more general than another hypothesis in S

  If d is a negative example:
    Remove from S any hypothesis inconsistent with d
    For each hypothesis g in G that is not consistent with d:
      Remove g from G
      Add to G all minimal specializations h of g such that h is consistent with d,
      and some member of S is more specific than h
    Remove from G any hypothesis that is more specific than another hypothesis in G
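A Python sketch of the whole algorithm for this conjunctive hypothesis space (a minimal implementation, not production code; hypotheses are tuples of a value, "?", or "0"):

```python
# Sketch of Candidate-Elimination on the EnjoySport data.

def matches(h, x):
    return all(c == "?" or c == a for c, a in zip(h, x))

def more_general(h1, h2):
    """True if h1 is more general than or equal to h2."""
    return all(a == "?" or a == b for a, b in zip(h1, h2))

def min_generalization(s, x):
    """The single minimal generalization of s that covers instance x."""
    return tuple(a if c == "0" else (c if c == a else "?")
                 for c, a in zip(s, x))

def min_specializations(g, domains, x):
    """All minimal specializations of g that exclude instance x."""
    return [g[:i] + (val,) + g[i + 1:]
            for i, c in enumerate(g) if c == "?"
            for val in domains[i] if val != x[i]]

def candidate_elimination(examples, domains):
    n = len(domains)
    S = {("0",) * n}   # maximally specific boundary
    G = {("?",) * n}   # maximally general boundary
    for x, label in examples:
        if label == "yes":                       # positive example
            G = {g for g in G if matches(g, x)}
            new_S = set()
            for s in S:
                if matches(s, x):
                    new_S.add(s)
                else:
                    h = min_generalization(s, x)
                    if any(more_general(g, h) for g in G):
                        new_S.add(h)
            S = {s for s in new_S
                 if not any(s != t and more_general(s, t) for t in new_S)}
        else:                                    # negative example
            S = {s for s in S if not matches(s, x)}
            new_G = set()
            for g in G:
                if not matches(g, x):
                    new_G.add(g)
                else:
                    for h in min_specializations(g, domains, x):
                        if any(more_general(h, s) for s in S):
                            new_G.add(h)
            G = {g for g in new_G
                 if not any(g != t and more_general(t, g) for t in new_G)}
    return S, G

domains = [
    ["Sunny", "Rainy"], ["Warm", "Cold"], ["Normal", "High"],
    ["Strong"], ["Warm", "Cool"], ["Same", "Change"],
]
examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "yes"),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"), "yes"),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), "no"),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), "yes"),
]

S, G = candidate_elimination(examples, domains)
print(sorted(S))  # → [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
print(sorted(G))
```

Running it on the four training examples reproduces the S4 and G4 boundaries derived step by step below.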

Example
1st training example

<Sunny, Warm, Normal, Strong, Warm, Same>, yes

S0: { <0, 0, 0, 0, 0, 0> }
S1: { <Sunny, Warm, Normal, Strong, Warm, Same> }

G0, G1: { <?, ?, ?, ?, ?, ?> }

Example
2nd training example

<Sunny, Warm, High, Strong, Warm, Same>, yes

S1: { <Sunny, Warm, Normal, Strong, Warm, Same> }
S2: { <Sunny, Warm, ?, Strong, Warm, Same> }

G0, G1, G2: { <?, ?, ?, ?, ?, ?> }

Example
3rd training example

<Rainy, Cold, High, Strong, Warm, Change>, no

S2, S3: { <Sunny, Warm, ?, Strong, Warm, Same> }

G2: { <?, ?, ?, ?, ?, ?> }
G3: { <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same> }

Example
4th training example

<Sunny, Warm, High, Strong, Cool, Change>, yes

S3: { <Sunny, Warm, ?, Strong, Warm, Same> }
S4: { <Sunny, Warm, ?, Strong, ?, ?> }

G3: { <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same> }
G4: { <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?> }

The final version space


S4: { <Sunny, Warm, ?, Strong, ?, ?> }

<Sunny, ?, ?, Strong, ?, ?>   <Sunny, Warm, ?, ?, ?, ?>   <?, Warm, ?, Strong, ?, ?>

G4: { <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?> }
