Ke Chen
COMP24111 Machine Learning
Background
• There are three approaches to building a classifier:
a) Model a classification rule directly
Examples: k-NN, decision trees, perceptron, SVM
b) Model the probability of class membership given the input data
Example: perceptron with the cross-entropy cost
c) Build a probabilistic model of the data within each class
Examples: naïve Bayes, model-based classifiers
• a) and b) are examples of discriminative classification
• c) is an example of generative classification
• b) and c) are both examples of probabilistic classification
Probability Basics
• Prior, conditional and joint probability for random variables
– Prior probability: P(X)
– Conditional probability: P(X1|X2), P(X2|X1)
– Joint probability: X = (X1, X2), P(X) = P(X1, X2)
– Relationship: P(X1, X2) = P(X2|X1)P(X1) = P(X1|X2)P(X2)
– Independence: P(X2|X1) = P(X2), P(X1|X2) = P(X1), P(X1, X2) = P(X1)P(X2)
• Bayesian Rule
P(C|X) = P(X|C)P(C) / P(X)
Posterior = Likelihood × Prior / Evidence
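A quick worked instance of the rule, using the dice events A ("die 1 shows 3") and C ("the two dice sum to eight") from the quiz on the next slide; P(A) = 1/6, P(C|A) = 1/6, and P(C) = 5/36 can all be checked by enumeration:

P(A|C) = P(C|A)P(A) / P(C) = (1/6 × 1/6) / (5/36) = 1/5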
Probability Basics
• Quiz: We have two six-sided dice. When they are rolled, the following events can occur: (A) die 1 lands on side "3", (B) die 2 lands on side "1", and (C) the two dice sum to eight. Answer the following questions (a brute-force check appears after the list):
1) P(A) = ?
2) P(B) = ?
3) P(C) = ?
4) P(A|B) = ?
5) P(C|A) = ?
6) P(A, B) = ?
7) P(A, C) = ?
8) Is P(A, C) equal to P(A)P(C)?
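A minimal Python enumeration over the 36 equally likely outcomes (not part of the original slides) that verifies all eight answers:

```python
from itertools import product
from fractions import Fraction

rolls = list(product(range(1, 7), repeat=2))   # all 36 equally likely (die1, die2) outcomes

def prob(event):
    """Exact probability of an event over the uniform 36-outcome sample space."""
    return Fraction(sum(1 for r in rolls if event(r)), len(rolls))

A = lambda r: r[0] == 3           # die 1 lands on "3"
B = lambda r: r[1] == 1           # die 2 lands on "1"
C = lambda r: r[0] + r[1] == 8    # the two dice sum to eight

print(prob(A), prob(B), prob(C))                  # 1/6, 1/6, 5/36
print(prob(lambda r: A(r) and B(r)))              # P(A,B) = 1/36
print(prob(lambda r: A(r) and C(r)))              # P(A,C) = 1/36
print(prob(lambda r: A(r) and B(r)) / prob(B))    # P(A|B) = 1/6
print(prob(lambda r: A(r) and C(r)) / prob(A))    # P(C|A) = 1/6
print(prob(A) * prob(C))                          # 5/216 ≠ 1/36, so A and C are dependent
```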
Probabilistic Classification
• Establishing a probabilistic model for classification
– Discriminative model: model the posterior probability P(C|X) directly, where C = c1, ..., cL and X = (X1, ..., Xn) (a toy sketch follows)
[Figure: a discriminative probabilistic classifier maps an input x = (x1, x2, ..., xn) to the L posteriors P(c1|x), ..., P(cL|x)]
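A minimal sketch of what such a classifier computes, assuming a linear model with softmax outputs (W and b stand in for learned parameters; they are illustrative, not from the slides):

```python
import numpy as np

def class_posteriors(x, W, b):
    """Map an input x straight to class posteriors P(c|x) via a softmax over linear scores."""
    scores = W @ x + b                  # one score per class
    e = np.exp(scores - scores.max())   # subtract the max for numerical stability
    return e / e.sum()                  # non-negative and sums to 1 over the classes
```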
Probabilistic Classification
• Establishing a probabilistic model for classification (cont.)
– Generative model: model the class-conditional probability P(X|C) and the prior P(C) for each class, then use them to score new inputs
[Figure: L generative probabilistic models, one per class ci, each scoring an input x = (x1, x2, ..., xn) with P(x|ci)]
Probabilistic Classification
• MAP classification rule
– MAP: Maximum A Posteriori
– Assign x to c* if the following holds (a code sketch appears below):
P(C = c*|X = x) > P(C = c|X = x) for all c ≠ c*, c = c1, ..., cL
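A minimal sketch of the MAP rule for a generative model, assuming `likelihood` and `prior` are callables supplied by whichever model is in use (hypothetical names, not from the slides):

```python
def map_classify(x, classes, likelihood, prior):
    """Assign x to the class c* maximising P(x|c)P(c), which is proportional
    to the posterior P(c|x) because the evidence P(x) is the same for all c."""
    return max(classes, key=lambda c: likelihood(x, c) * prior(c))
```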
Naïve Bayes
• Bayes classification
P(C|X) ∝ P(X|C)P(C) = P(X1, ..., Xn|C)P(C)
Difficulty: learning the joint probability P(X1, ..., Xn|C)
• Naïve Bayes classification
– Assumption: all input attributes are conditionally independent given the class!
P(X1, X2, ..., Xn|C) = P(X1|X2, ..., Xn, C)P(X2, ..., Xn|C)
= P(X1|C)P(X2, ..., Xn|C)
= P(X1|C)P(X2|C) ··· P(Xn|C)
– MAP classification rule: for x = (x1, x2, ..., xn), assign c* if
[P(x1|c*) ··· P(xn|c*)]P(c*) > [P(x1|c) ··· P(xn|c)]P(c) for all c ≠ c*, c = c1, ..., cL
Naïve Bayes
• Naïve Bayes Algorithm (for discrete input attributes)
– Learning Phase: Given a training set S (a counting sketch in Python follows the pseudocode),
For each target value ci (ci = c1, ..., cL)
P̂(C = ci) ← estimate P(C = ci) with examples in S;
For every attribute value xjk of each attribute Xj (j = 1, ..., n; k = 1, ..., Nj)
P̂(Xj = xjk|C = ci) ← estimate P(Xj = xjk|C = ci) with examples in S;
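A minimal counting sketch of the learning phase for discrete attributes, with no smoothing (function and variable names are illustrative, not from the slides):

```python
from collections import Counter, defaultdict

def train_naive_bayes(examples, labels):
    """Estimate P(C = ci) and P(Xj = xjk | C = ci) by relative-frequency counting."""
    class_count = Counter(labels)
    prior = {c: class_count[c] / len(labels) for c in class_count}
    cond = defaultdict(int)   # cond[(j, v, c)] = #examples of class c with attribute j equal to v
    for x, c in zip(examples, labels):
        for j, v in enumerate(x):
            cond[(j, v, c)] += 1
    def likelihood(j, v, c):
        return cond[(j, v, c)] / class_count[c]
    return prior, likelihood
```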
Example
• Example: Play Tennis
[Table: the PlayTennis training set of 14 examples with attributes Outlook, Temperature, Humidity, Wind and the class label Play]
Example
• Learning Phase

Outlook     Play=Yes  Play=No
Sunny         2/9       3/5
Overcast      4/9       0/5
Rain          3/9       2/5

Temperature  Play=Yes  Play=No
Hot            2/9       2/5
Mild           4/9       2/5
Cool           3/9       1/5

Humidity    Play=Yes  Play=No
High          3/9       4/5
Normal        6/9       1/5

Wind        Play=Yes  Play=No
Strong        3/9       3/5
Weak          6/9       2/5

P(Play=Yes) = 9/14   P(Play=No) = 5/14
Example
• Test Phase
– Given a new instance,
x’=(Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
– Look up tables
P(Outlook=Sunny|Play=Yes) = 2/9      P(Outlook=Sunny|Play=No) = 3/5
P(Temperature=Cool|Play=Yes) = 3/9   P(Temperature=Cool|Play=No) = 1/5
P(Humidity=High|Play=Yes) = 3/9      P(Humidity=High|Play=No) = 4/5
P(Wind=Strong|Play=Yes) = 3/9        P(Wind=Strong|Play=No) = 3/5
P(Play=Yes) = 9/14                   P(Play=No) = 5/14
– MAP rule
P(Yes|x’) ∝ [P(Sunny|Yes)P(Cool|Yes)P(High|Yes)P(Strong|Yes)]P(Play=Yes) = 0.0053
P(No|x’) ∝ [P(Sunny|No)P(Cool|No)P(High|No)P(Strong|No)]P(Play=No) = 0.0206
Since 0.0053 < 0.0206, the MAP rule labels x’ as Play=No (the sketch below reproduces these numbers).
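A short Python check of the test phase, plugging the looked-up probabilities into the naïve Bayes product (all values taken from the tables above):

```python
from math import prod  # Python 3.8+

# x' = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
score_yes = prod([2/9, 3/9, 3/9, 3/9]) * 9/14   # ≈ 0.0053
score_no  = prod([3/5, 1/5, 4/5, 3/5]) * 5/14   # ≈ 0.0206
print(f"score(Yes) = {score_yes:.4f}, score(No) = {score_no:.4f}")
print("MAP label:", "Yes" if score_yes > score_no else "No")   # -> No
```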
Relevant Issues
• Violation of the Independence Assumption
– For many real-world tasks, P(X1, ..., Xn|C) ≠ P(X1|C) ··· P(Xn|C)
– Nevertheless, naïve Bayes works surprisingly well anyway!
• Zero Conditional Probability Problem
– If no training example contains the attribute value Xj = ajk, then P̂(Xj = ajk|C = ci) = 0
– In this circumstance, the whole product P̂(x1|ci) ··· P̂(ajk|ci) ··· P̂(xn|ci) = 0 at test time
– As a remedy, conditional probabilities are estimated with the m-estimate (sketched in code after this list):
P̂(Xj = ajk|C = ci) = (nc + m·p) / (n + m)
nc: number of training examples for which Xj = ajk and C = ci
n: number of training examples for which C = ci
p: prior estimate (usually p = 1/t for t possible values of Xj)
m: weight given to the prior (number of "virtual" examples, m ≥ 1)
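A minimal sketch of the m-estimate; the example numbers come from the Play Tennis tables, where Outlook=Overcast never occurs with Play=No:

```python
def m_estimate(n_c, n, t, m=1):
    """m-estimate of P(Xj = ajk | C = ci): (n_c + m*p) / (n + m) with p = 1/t."""
    p = 1 / t                       # uniform prior over the t values of Xj
    return (n_c + m * p) / (n + m)

# Outlook=Overcast with Play=No: n_c = 0 of n = 5 examples, t = 3 Outlook values
print(m_estimate(0, 5, 3))          # 1/18 ≈ 0.056 instead of a fatal 0
```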
Relevant Issues
• Continuous-valued Input Attributes
– An attribute takes values from a continuum, so its probabilities cannot be estimated by counting frequencies
– Conditional probability is instead modeled with the normal distribution (sketched in code below):
P̂(Xj|C = ci) = (1 / (√(2π) σji)) exp(−(Xj − μji)² / (2σji²))
μji: mean (average) of the attribute values Xj of the examples for which C = ci
σji: standard deviation of the attribute values Xj of the examples for which C = ci
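A minimal sketch of the Gaussian fit, assuming a list of attribute values observed for one class (the temperature readings are made up purely for illustration):

```python
from math import exp, pi, sqrt
from statistics import mean, stdev

def gaussian_likelihood(x, class_values):
    """Estimate P(Xj = x | C = ci) with a normal distribution fitted to the
    attribute values of the training examples belonging to class ci."""
    mu, sigma = mean(class_values), stdev(class_values)
    return exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sqrt(2 * pi) * sigma)

# hypothetical temperature readings for one class, just to exercise the function
print(gaussian_likelihood(21.0, [18.5, 20.1, 22.3, 19.8, 21.4]))
```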
Conclusions
• Naïve Bayes is based on the independence assumption
– Training is very easy and fast: it only requires considering each attribute in each class separately
– Testing is straightforward: just look up tables or compute conditional probabilities with normal distributions
• A popular generative model
– Performance is competitive with most state-of-the-art classifiers, even when the independence assumption is violated
– Many successful applications, e.g., spam mail filtering
– A good candidate as a base learner in ensemble learning
– Apart from classification, naïve Bayes can do more…