BITS Pilani
Learning (for humans) is gained from past experience.
A machine can be programmed to gather experience in the form of facts, instances, rules, etc.
A machine with learning capability can make predictions about a new situation (seen or unseen) using its past experience.
Examples:
As we humans can tell a person's name on seeing him/her the second or fifth time, a machine can also do that.
As we humans can recognize a person's voice even without seeing the person's face, a machine can also be made to learn to do the same.
Machine Learning (IS ZC464)
Session 1: Introduction
[Figure: Traditional vs. Machine Learning. A traditional test program computes its output from the input data X via a fixed rule, e.g. Y = (1 - X2). A machine learning program takes an input data vector (e.g. mathematical features extracted from test images, such as DCT coefficients, pixel values, or average pixel intensity) and produces the output it has learned to predict.]
A Neuron
• A line is represented by parameters of slope and intercept.
• What if the data used to train the system changes slightly? The machine can still be made to learn. How?
[Figure: sample training data, e.g. the pairs (171, 76), (167, 72), (159, 65), plotted with several candidate straight lines; the y-axis runs from 50 to 80.]
Which line (hypothesis) fits the given data best?
Hypothesis function: y = wx, linear in one variable, written hw(x) = wx.
[Figure: the line hw(x) = wx drawn over the data points x = 1, …, 7; the squared deviations 0 + 1*1 + 1*1 + 2*2 + 1*1 + 2*2 + 1*1 sum to S = 13.]
SME = sqrt(S) / total no. of observations = √13 / 8.
The error will be different if the line's slope is different (the line passes through the origin).
At some value of w, E(w) is minimum; the w corresponding to the minimum error is chosen.
[Figure: error surface E(w) over parameters w1, w2, showing a local minimum and the global minimum.]
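To make the search for the best slope concrete, here is a minimal sketch that evaluates E(w) for the hypothesis hw(x) = wx over a grid of candidate slopes and keeps the minimizer; the data pairs and the grid of w values are illustrative assumptions, not the data from the slides.

```python
# Minimal sketch: evaluating E(w) for h_w(x) = w*x over a grid of candidate slopes.
# The data points and the grid of w values are illustrative assumptions.

def sum_squared_error(w, data):
    """E(w) = sum over all (x, y) pairs of (w*x - y)^2."""
    return sum((w * x - y) ** 2 for x, y in data)

# Illustrative training pairs (x, y)
data = [(1, 1), (2, 3), (3, 2), (4, 5), (5, 4), (6, 7), (7, 6)]

# Try a grid of slopes and keep the one with the smallest error
candidates = [w / 100 for w in range(0, 301)]            # w in [0, 3] in steps of 0.01
errors = {w: sum_squared_error(w, data) for w in candidates}
best_w = min(errors, key=errors.get)

print(f"best w = {best_w:.2f}, E(best w) = {errors[best_w]:.3f}")
```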
Generalization
Generalization in the classification problem: the system generalizes if the test feature vector can be correctly classified.
[Figure: Traditional vs. Machine Learning, repeated. A traditional test program computes Y = (1 - X2) from the input data X, while a machine learning program maps an input data vector to its output.]
Example:
If a person has pneumonia, then he/she has fever, is pale, has cough, and the white blood cell count is low.
• Certainty exists in obtaining the symptoms if the disease is confirmed (disease implies symptoms).
• For the converse it is uncertain: if a person has fever and has cough, it does not follow that he/she has pneumonia; but if all symptoms are known, then the disease can be inferred.
Fever(p) Λ Pale(p) Λ Cough(p) Λ LowWBC(p) implies Pneumonia(p)

Types of uncertainty
• Disease to symptoms: pneumonia may have other symptoms too.
• Symptoms to disease: these symptoms may be common in other diseases as well [but if all possible symptoms can be observed and are the same for all patients, then more definiteness can be introduced].
Evidences
• The probability that a patient has a cavity is 0.8; this depends on the agent's belief and not on the world.
• These beliefs depend on the percepts the agent has received so far.
• These percepts constitute the evidence on which probability assertions are based.
• As new evidences add on, the probability changes. This is known as conditional probability.

Representing uncertain knowledge using probability
• Probability theory uses a language that is more expressive than propositional logic.
• The basic element of the language is the random variable.
• A random variable represents a part of the real world whose status is initially unknown.
• A proposition asserts that a random variable has a particular value drawn from its domain.
Conditional Probability
• Represented as P(a|b)
• P(a|b) = P(a Λ b) / P(b), for P(b) > 0

Axioms of Probability (Kolmogorov's Axioms)
• For any proposition a: 0 <= P(a) <= 1
Probability expressions with more random variables
Compute P(cavity), P(cavity, toothache) and P(toothache) from the full joint distribution.
• P(¬cavity, ¬toothache, catch) = 0.22
• P(¬cavity, ¬toothache, ¬catch) = 0.28
(¬cavity row of the joint table: 0.09, 0.01, 0.22, 0.28)
Example
• 90% of students pass an examination.
• 75% of students who study hard pass the exam.
• 60% of students study hard.
• Let S: event that a student passes the exam; H: event that a student studies hard.
• P(S|H) = 0.75
• P(S) = 0.9
• P(H) = 0.6
• P(H|S) = ??
• Solution: use Bayes' Theorem.
• P(H|S) = P(S|H) P(H) / P(S) = 0.75 x 0.6 / 0.9 = 0.5

Review
Axioms of Probability
• 0 <= P(a) <= 1
• P(true) = 1, P(false) = 0
• P(a V b) = P(a) + P(b) – P(a Λ b)
Syntax of the language representing uncertainty
• P(proposition) = Σ P(ei) over all atomic events ei in which the proposition holds, where the proposition is a conjunction of literals representing random variables.
• The random variables represent the real-world parameters and capture the uncertainty.
• Example: P(cavity Λ (weather = rainy))
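A one-line check of the Bayes' theorem computation above, using the values given in the example:

```python
# Bayes' theorem: P(H|S) = P(S|H) * P(H) / P(S), with the values from the example.
p_s_given_h = 0.75   # P(S|H): students who study hard and pass
p_h = 0.6            # P(H): students who study hard
p_s = 0.9            # P(S): students who pass
p_h_given_s = p_s_given_h * p_h / p_s
print(p_h_given_s)   # 0.5
```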
Review: the full joint distribution
              toothache            ¬toothache
              catch     ¬catch     catch     ¬catch
  cavity      0.06      0.19       0.05      0.10
  ¬cavity     0.09      0.01       0.22      0.28
Product Rule
P(cavity Λ toothache) = P(cavity | toothache) P(toothache)

Marginal Probability
P(cavity) = Σ P(ei) over all atomic events containing cavity

Marginalization
P(X) = Σz P(X, z)

Conditioning
P(X) = Σz P(X|z) P(z)
Example: P(cavity) = P(cavity|toothache) P(toothache) + P(cavity|¬toothache) P(¬toothache)

Normalization Constant
• P(a|b) = P(a Λ b) / P(b) = α P(a Λ b), where α = 1/P(b) is the normalization constant.
• Example: P(cavity | toothache) = P(cavity, toothache) / P(toothache) = (0.06 + 0.19) / (0.06 + 0.19 + 0.09 + 0.01) = 0.25 / 0.35 = 0.7142. The normalization constant is 1/0.35.

Solution (Problem 2)
Use the Product Rule:
P(cavity | catch Λ toothache) = P(cavity Λ catch Λ toothache) / P(catch Λ toothache) = 0.1 / (0.1 + 0.28) = 0.1 / 0.38
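The marginalization, conditioning and normalization steps can be sketched directly from the joint table; a minimal illustration using the table values from the slides:

```python
# Full joint distribution P(cavity, toothache, catch) from the slides,
# keyed by (cavity, toothache, catch) truth values.
joint = {
    (True,  True,  True):  0.06, (True,  True,  False): 0.19,
    (True,  False, True):  0.05, (True,  False, False): 0.10,
    (False, True,  True):  0.09, (False, True,  False): 0.01,
    (False, False, True):  0.22, (False, False, False): 0.28,
}

def marginal(var_index, value):
    """P(variable = value) by summing all atomic events that agree with it."""
    return sum(p for event, p in joint.items() if event[var_index] == value)

p_toothache = marginal(1, True)                                                 # 0.35
p_cavity_and_toothache = sum(p for (c, t, _), p in joint.items() if c and t)    # 0.25
p_cavity_given_toothache = p_cavity_and_toothache / p_toothache                 # ~0.7142
normalization_constant = 1 / p_toothache                                        # 1/0.35

print(p_toothache, p_cavity_and_toothache, p_cavity_given_toothache)
```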
P(BB | R) = P(R | BB) P(BB) / [ P(R | BB) P(BB) + P(R | BR) P(BR) + P(R | RR) P(RR) ]
Example courtesy: http://www.medicine.mcgill.ca/epidemiology/joseph/courses/EPIB-607/BayesEx.pdf
Bayesian Network
• A suitable data structure implementing a mechanism to represent the dependent and independent relationships among the real-world variables.
• Captures the uncertain knowledge in a natural way (e.g. a cloud node influencing a humid node).
[Figure: network with nodes Study hard (SH), Attend lectures (A) and Healthy lifestyle (H) as parents of Good performance (GP); in the extended version, Nutritious food (N) and 8 hours sleep (SL) are parents of H.]

Prior probabilities: P(A) = 0.6, P(SH) = 0.4

CPT for Healthy lifestyle H given N, SL:
  N   SL   P(H)
  T   T    0.95
  T   F    0.78
  F   T    0.6
  F   F    0.001

CPT for Good performance GP given SH, A, H:
  SH  A   H   P(GP)
  T   T   T   0.99
  T   T   F   0.45
  T   F   T   0.60
  T   F   F   0.30
  F   T   T   0.85
  F   T   F   0.45
  F   F   T   0.05
  F   F   F   0.00001
Associated Conditional Probability Tables (CPT)
Example 2
[Figure: network with nodes weather, cavity, toothache, catch.]
Weather is an independent variable, while cavity affects both toothache and catch.

Example
• N: nutritious food
• SL: 8 hours sleep
• H: healthy lifestyle
• SH: study hard
• A: attends lectures
• P(N, SL, H, SH, A) = P(N) P(SL) P(H | N Λ SL) P(SH) P(A) = (0.3) x (0.3) x (0.6) x (0.4) x (0.4) = 0.00864
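The factored joint probability of the study-performance network can be sketched as follows; the CPT for H is taken from the slides, and P(N) = P(SL) = 0.3 follows the numbers used in the worked product above.

```python
# Joint probability in a Bayesian network factors into a product of
# each node's probability given its parents:
# P(N, SL, H, SH, A) = P(N) * P(SL) * P(H | N, SL) * P(SH) * P(A)
# Values follow the slides' worked example (P(N) = P(SL) = 0.3, P(SH) = P(A) = 0.4).

p_n  = 0.3
p_sl = 0.3
p_sh = 0.4
p_a  = 0.4

# CPT for Healthy lifestyle H given (N, SL), from the slides
p_h_given = {(True, True): 0.95, (True, False): 0.78,
             (False, True): 0.60, (False, False): 0.001}

# The worked product uses P(H | ...) = 0.6, i.e. the (N=False, SL=True) entry
joint = p_n * p_sl * p_h_given[(False, True)] * p_sh * p_a
print(joint)   # 0.00864
```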
Revisit / Additional links / All nodes are connected (links are added if needed)
[Figure: variations of the network with nodes Nutritious food (N), 8 hrs sleep (SL), Study hard (SH), Attend lectures (A), Healthy lifestyle (H) and Good performance (GP); additional links, such as N to A, are added as needed.]

P(SH) = 0.4

CPT for Attend lectures A given N:
  N   P(A)
  T   0.95
  F   0.60

CPT for Healthy lifestyle H given N, SL (as before):
  N   SL   P(H)
  T   T    0.95
  T   F    0.78
  F   T    0.6
  F   F    0.001

CPT for Good performance GP given SH, A, H: as in the earlier slide.
Associated Conditional Probability Tables (CPT)
[Figure: example Bayesian network with nodes vehicles, fuel, factories, house, deforestation, pollution and greenhouse gases.]
Review: Marginal Probability, Joint Probability, Conditional Probability

Machine Learning (IS ZC464) Session 4: Bayes' Theorem and its applications in Machine Learning, MAP hypothesis, Information Theory and its application in the Minimum Description Length (MDL) principle
Bayesian learning: Bayes theorem
• Bayes theorem provides a way to calculate the probability of a hypothesis based on its prior probability, the probability of observing various data given the hypothesis, and the observed data itself.

Example 1: observation of sounds
• Training with observed data: {d1, d2, d3} = training data (say D).
• The sounds 'ae' and 'a' are the observed targets that we know.
  d1: 'at' sounds with 'ae'
  d2: 'pot' sounds with 'a'
  d3: 'at' sounds with 'ae'
• Prior probabilities: P(sound = 'ae') = 0.5, P(sound = 'a') = 0.5
• Features such as 'a' and 'o' are obtained through preprocessing of the given words, by parsing.
• Conditional probabilities are represented as P('ae' | feature = 'a') and P('a' | feature = 'o'), or
  P('ae' | d1, d2, d3) = P('ae' | D) = 2/3 and P('a' | d1, d2, d3) = P('a' | D) = 1/3.
Bayesian Learning
• Training: through the computation of the probabilities as in the previous two slides.
• Posterior probabilities: P(ae | a) = P(a | ae) * P(ae) / P(a); choose the maximum.
• Conditional probabilities (examples): P(u | oo) = 0.1, P(u | a~) = 0.2, P(a | ae) = 0.3, P(a | aw) = 0.1, P(o | a~) = 0.1

Entropy
Entropy = - Σi pi log2 pi
Example: 1*0.4 + 3*0.25 + 2*0.3 + 3*0.05 = 1.9 bits (average code length), whereas a fixed-length encoding needs 2 bits.
In general, for four values:
Entropy = - p1 log2 p1 - p2 log2 p2 - p3 log2 p3 - p4 log2 p4
h_MAP = Argmax over h in H of P(D | h) P(h)
h_MAP = Argmax over h in H of [ log2 P(D | h) + log2 P(h) ]
These follow from maximizing P(D | h) P(h) / P(D) over h in H, i.e. from maximizing P(D | h) P(h).
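A minimal sketch of selecting the MAP hypothesis over a small hypothesis space; the priors and likelihoods below are made-up numbers for illustration, not values from the course.

```python
import math

# h_MAP = argmax over h of P(D|h) * P(h), or equivalently
#         argmax over h of log2 P(D|h) + log2 P(h).
# The priors and likelihoods below are illustrative assumptions.
hypotheses = {
    "h1": {"prior": 0.3, "likelihood": 0.05},
    "h2": {"prior": 0.1, "likelihood": 0.30},
    "h3": {"prior": 0.6, "likelihood": 0.02},
}

def posterior_score(h):
    return hypotheses[h]["likelihood"] * hypotheses[h]["prior"]

def log_score(h):
    return math.log2(hypotheses[h]["likelihood"]) + math.log2(hypotheses[h]["prior"])

h_map = max(hypotheses, key=posterior_score)
assert h_map == max(hypotheses, key=log_score)   # the log form gives the same argmax
print(h_map)   # h2 (0.1 * 0.30 = 0.03 is the largest product)
```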
Techniques for classification based on posterior probabilities
• Maximum a Posteriori (MAP)
  – Maximum likelihood
  – Only one hypothesis contributes
• Bayes Optimal Classifier
  – A weighted majority classifier
  – All hypotheses contribute
  – Costly classifier
• Gibbs Algorithm
  – Any one randomly picked hypothesis
  – Less costly
• Naïve Bayes Classifier
  – Uses the assumption that the attributes are conditionally independent
  – Example: predicting the way a person travels in a city, given its data

Bayes optimal classifier: a weighted majority classifier
• What is the most probable classification of the new instance given the training data?
  – The most probable classification of the new instance is obtained by combining the predictions of all hypotheses, weighted by their posterior probabilities.
• If the classification of a new example can take any value vj from some set V, then the probability P(vj|D) that the correct classification for the new instance is vj is computed by weighting each hypothesis's prediction by its posterior probability.
• Compute the maximum of the two probabilities P(+|D) and P(-|D), which is 0.49.

Classification based on attribute values:
v_MAP = Argmax over vj in V of P(vj | a1, a2, …, an)
      = Argmax over vj in V of P(a1, a2, …, an | vj) P(vj) / P(a1, a2, …, an)
      = Argmax over vj in V of P(a1, a2, …, an | vj) P(vj)
With the naïve assumption that the attributes are conditionally independent:
P(a1, a2, …, an | vj) = Πi P(ai | vj)
• Greatly reduces the computation cost: only count the class distribution.

Training data (outlook, temperature, humidity, windy, class), partial listing:
  sunny     mild  high    false  N
  sunny     cool  normal  false  P
  rain      mild  normal  false  P
  sunny     mild  normal  true   P
  overcast  mild  high    true   P
  overcast  hot   normal  false  P
  rain      mild  high    true   N

Class priors: P(p) = 9/14, P(n) = 5/14
temperature:  P(hot|p) = 2/9    P(hot|n) = 2/5
              P(mild|p) = 4/9   P(mild|n) = 2/5
              P(cool|p) = 3/9   P(cool|n) = 1/5
humidity:     P(high|p) = 3/9   P(high|n) = 4/5
              P(normal|p) = 6/9 P(normal|n) = 2/5
windy:        P(true|p) = 3/9   P(true|n) = 3/5
              P(false|p) = 6/9  P(false|n) = 2/5
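Using the estimated probabilities above, a naïve Bayes classification of a new day can be sketched as follows; the particular test instance is an illustrative choice.

```python
from fractions import Fraction as F

# Conditional probabilities estimated from the play-tennis data (from the slides).
p_class = {"p": F(9, 14), "n": F(5, 14)}
cond = {
    "p": {"hot": F(2, 9), "mild": F(4, 9), "cool": F(3, 9),
          "high": F(3, 9), "normal": F(6, 9),
          "windy_true": F(3, 9), "windy_false": F(6, 9)},
    "n": {"hot": F(2, 5), "mild": F(2, 5), "cool": F(1, 5),
          "high": F(4, 5), "normal": F(2, 5),
          "windy_true": F(3, 5), "windy_false": F(2, 5)},
}

# Illustrative new instance: temperature = cool, humidity = high, windy = true
instance = ["cool", "high", "windy_true"]

scores = {}
for c in p_class:
    score = p_class[c]
    for attr_value in instance:
        score *= cond[c][attr_value]   # naive assumption: attributes independent given the class
    scores[c] = score

prediction = max(scores, key=scores.get)
print(scores, "->", prediction)
```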
• P(¬cavity, ¬toothache, catch) = 0.22
• P(¬cavity, ¬toothache, ¬catch) = 0.28
(¬cavity row of the joint table: 0.09, 0.01, 0.22, 0.28)
Solution: full joint distribution
              toothache            ¬toothache
              catch     ¬catch     catch     ¬catch
  cavity      0.06      0.19       0.05      0.10
  ¬cavity     0.09      0.01       0.22      0.28
Naïve Bayes example: compute P(X|Ci) for each class
P(age<=30 | buys_computer='yes') = 2/9 = 0.222          P(buys_computer='yes') = 9/14
P(age<=30 | buys_computer='no') = 3/5 = 0.6             P(buys_computer='no') = 5/14
P(income=medium | buys_computer='yes') = 4/9 = 0.444
P(income=medium | buys_computer='no') = 2/5 = 0.4
P(student=yes | buys_computer='yes') = 6/9 = 0.667
P(student=yes | buys_computer='no') = 1/5 = 0.2
P(credit_rating=fair | buys_computer='yes') = 6/9 = 0.667
P(credit_rating=fair | buys_computer='no') = 2/5 = 0.4
• X = (age<=30, income=medium, student=yes, credit_rating=fair)
P(X|Ci):        P(X | buys_computer='yes') = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
                P(X | buys_computer='no') = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
P(X|Ci)*P(Ci):  P(X | buys_computer='yes') * P(buys_computer='yes') = 0.028
                P(X | buys_computer='no') * P(buys_computer='no') = 0.007
X belongs to class buys_computer='yes'

Problem
• A box contains 10 red and 15 blue balls. Two balls are selected at random and are discarded without their colors being seen. If a third ball is drawn randomly and observed to be red, what is the probability that both of the discarded balls were blue?
• Solution hint:
P(BB | R) = P(R | BB) P(BB) / [ P(R | BB) P(BB) + P(R | BR) P(BR) + P(R | RR) P(RR) ]
Housing Prices (Portland, OR)
[Figure: scatter plot of price (in 1000s of dollars) vs. size in feet2.]

Training set of housing prices:
  Size in feet2 (x)   Price ($) in 1000's (y)
  2104                460
  1416                232
  1534                315
  852                 178
  …                   …

Notation:
  m = number of training examples
  x's = input variable / features
  y's = output variable / target variable

Supervised Learning: given the "right answer" for each example in the data.
Regression Problem: predict a real-valued output.

Slides numbers 3-6 and 11-43 adapted from Coursera Courseware on Machine Learning course offered by Prof. Andrew Ng.
[Figure: Training set feeds a Learning Algorithm, which produces the hypothesis h; the size of a house goes into h, which outputs the estimated price.]
θ's: parameters. How to choose the θ's?
[Figure: three small plots of hθ(x) for different choices of the parameters.]
• The ith data is x(i)
• The ith target is y(i)
Hypothesis
• Equation (1): h(x(i)) = θ0 + θ1 x(i)
Note: the notations used in Bishop's book (Section 3.1) are as follows.
1. In place of the parameters θ, the book uses the notion of w (later referred to as weights).
2. In place of <x(1), x(2), x(3), …, x(m)>, the book uses the vector x.
3. In place of <y(1), y(2), y(3), …, y(m)>, the book uses the vector y.
4. In place of h(x(i)), the book uses y(x,w) given by y(x,w) = w0 + w1 x (which is equivalent to equation (1)).

Objective
• To find θ0, θ1 to minimize J(θ0, θ1)
• J(θ0, θ1) is given by the expression
  J(θ0, θ1) = (1/2m) Σ over i=1..m of ( h(x(i)) - y(i) )²
• Objective function: minimize over θ0, θ1 the sum Σ over i=1..m of ( h(x(i)) - y(i) )²
Simplified hypothesis (θ0 = 0)
Hypothesis: hθ(x) = θ1 x
Parameters: θ1
Cost Function: J(θ1) = (1/2m) Σ over i=1..m of ( hθ(x(i)) - y(i) )²
Goal: minimize J(θ1)
[Figure: left panels show hθ(x) as a function of x (for fixed θ1) over the training points (1,1), (2,2), (3,3); right panels show J(θ1) as a function of the parameter θ1.]
For θ1 = 0.5: J(θ1) = (1/(2*3)) {(0.5-1)² + (1-2)² + (1.5-3)²} = (1/6)(0.25 + 1 + 2.25) = (1/6) * 3.5 ≈ 0.58
For θ1 = 0:   J(θ1) = (1/(2*3)) {(0-1)² + (0-2)² + (0-3)²} = (1/6)(1 + 4 + 9) = (1/6) * 14 ≈ 2.3
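A quick numerical check of the two J(θ1) values computed above, assuming the three training points (1,1), (2,2), (3,3) implied by the slide's arithmetic:

```python
# Cost function J(theta1) = (1/2m) * sum_i (theta1 * x_i - y_i)^2
# for the simplified hypothesis h(x) = theta1 * x (theta0 = 0).
data = [(1, 1), (2, 2), (3, 3)]   # training points implied by the slide's computation
m = len(data)

def J(theta1):
    return sum((theta1 * x - y) ** 2 for x, y in data) / (2 * m)

print(round(J(0.5), 2))   # 0.58
print(round(J(0.0), 2))   # 2.33  (the slides round this to 2.3)
```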
With both parameters:
Hypothesis: hθ(x) = θ0 + θ1 x
Parameters: θ0, θ1
Cost Function: J(θ0, θ1)
Goal: minimize J(θ0, θ1)
[Figure: scatter plot of price ($) in 1000's vs. size in feet2, together with the surface plot of J(θ0, θ1).]
Gradient descent outline
• Want to minimize J(θ0, θ1)
• Start with some θ0, θ1
• Keep changing θ0, θ1 to reduce J(θ0, θ1) until we hopefully end up at a minimum
• Update θ0 and θ1 simultaneously
[Figures: for fixed θ0, θ1 the hypothesis is a function of x; J(θ0, θ1) is a function of the parameters; repeated surface and contour plots of J(θ0, θ1) illustrate the descent.]
Single feature:
  Size (feet2)   Price ($1000)
  2104           460
  1416           232
  1534           315
  852            178
  …              …

Multiple features:
  Size (feet2)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
  2104           5                    1                  45                    460
  1416           3                    2                  40                    232
  1534           3                    2                  30                    315
  852            2                    1                  36                    178
  …              …                    …                  …                     …

Notation:
  n = number of features
  x(i) = input (features) of the ith training example, e.g. x(1) = (2104, 5, 1, 45)
  x(i)j = value of feature j in the ith training example
Hypothesis
Previously: h(x) = θ0 + θ1 x
Now, with multiple variables or features: h(x) = θ0 + θ1 x1 + θ2 x2 + θ3 x3 + θ4 x4
(x1 = size in feet2, x2 = number of bedrooms, etc.)
Parameters: θ0, θ1, …, θn
Cost function: J(θ0, θ1, …, θn)
Gradient Descent for Linear Regression
New algorithm: Repeat { θj := θj - α (1/m) Σ over i=1..m of ( hθ(x(i)) - y(i) ) x(i)j }, simultaneously for every j = 0, …, n.
Previously (n = 1): update θ0 and θ1 with x(i)0 = 1 and x(i)1 = x(i).
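A compact sketch of the batch gradient-descent update for linear regression with one feature; the data set, learning rate and iteration count are illustrative assumptions.

```python
# Batch gradient descent for h(x) = theta0 + theta1 * x.
# Repeats the simultaneous update
#   theta_j := theta_j - alpha * (1/m) * sum_i (h(x_i) - y_i) * x_i_j
# The data set, alpha and number of iterations are illustrative choices.

data = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
m = len(data)
theta0, theta1 = 0.0, 0.0
alpha = 0.1

for _ in range(1000):
    grad0 = sum((theta0 + theta1 * x) - y for x, y in data) / m
    grad1 = sum(((theta0 + theta1 * x) - y) * x for x, y in data) / m
    # simultaneous update: compute both gradients before changing either parameter
    theta0 -= alpha * grad0
    theta1 -= alpha * grad1

print(theta0, theta1)   # approaches (0, 1) for this data
```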
[Figure: classification. A test feature vector whose class we do not know is assigned a class (e.g. '+') by testing a decision condition.]
Decision Tree
Machine Learning (IS ZC464) Session 8: Decision Trees and Review Session

A decision tree takes as input an object or situation described by a set of attributes and returns a decision. This decision is the predicted output value for the input. The input attributes can be discrete or continuous.

Classification Learning: learning a discrete-valued function is called classification learning.
Regression: learning a continuous function is called regression.

This slide is adapted from the text book and from the set of slides available at aima.eecs.berkeley.edu/slides-ppt/m18-learning.ppt
Generalize the splitting
• Let the attribute A divide the entire training set into sets E1, E2, …, Ev, where v is the total number of values A can be tested on.
• Assume that each set Ei contains pi positive examples and ni negative examples.
• Remainder(A) = Σ over i=1..v of (pi + ni)/(p + n) * I( pi/(pi + ni), ni/(pi + ni) )

Gain(A)
Gain(A) = I( p/(p + n), n/(p + n) ) – Remainder(A)
The heuristic is to choose, from the set of all attributes, the attribute A with the maximum gain (see the sketch below).
Compute: 1. Gain(Patrons)  2. Gain(Type)

This slide is adapted from the text book and from the set of slides available at aima.eecs.berkeley.edu/slides-ppt/m18-learning.ppt
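A small sketch of the entropy, remainder and gain computations defined above; the split counts follow the worked example later in this deck (p1 = 3, n1 = 1, and so on), while the attribute itself is left unnamed.

```python
import math

def I(*probs):
    """Information content I(p1, ..., pk) = -sum p_i log2 p_i (0 log 0 treated as 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def remainder(splits, p, n):
    """splits = list of (pi, ni) counts, one pair per value of attribute A."""
    return sum((pi + ni) / (p + n) * I(pi / (pi + ni), ni / (pi + ni))
               for pi, ni in splits)

def gain(splits, p, n):
    return I(p / (p + n), n / (p + n)) - remainder(splits, p, n)

# Attribute that splits 9 positive / 5 negative examples into three branches
splits = [(3, 1), (4, 2), (2, 2)]
print(gain(splits, p=9, n=5))
```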
Decision Trees
• Learning is through a series of decisions taken with respect to the attribute at the non-leaf node.
• There can be many trees possible for the given training data.
• Finding the smallest DT is an NP-complete problem.
• Greedy selection of the attribute with the largest gain to split the training data into two or more sub-classes may lead to approximately the smallest tree.

Decision Trees
• If the decisions are binary, then in the best case each decision eliminates almost half of the regions (leaves).
• If there are b regions, then the correct region can be found in log2(b) decisions in the best case.
• The height of the decision tree depends on the order of the attributes selected to split the training examples at each step.
Entropy
• It is the measure of the information content and is given by
  I = - Σ P(vi) log2 P(vi)
  where v1, v2, …, vk are the values of the attribute on which the decisions bifurcate.
• Here v1 = yes, v2 = no.
• Positive examples (YES): r3, r4, r5, r7, r9, r10, r11, r12, r13
• Negative examples (NO): r1, r2, r6, r8, r14

[Figure: the attribute splits the 14 examples into three branches.
  Low:    (YES) r5, r7, r9            (NO) r6          so p1 = 3, n1 = 1
  Medium: (YES) r4, r10, r11, r12     (NO) r8, r14     so p2 = 4, n2 = 2
  High:   (YES) r3, r13               (NO) r1, r2      so p3 = 2, n3 = 2]

Remainder = (4/14)*I(3/4, 1/4) + (6/14)*I(4/6, 2/6) + (4/14)*I(2/4, 2/4)
          = (4/14) {-(3/4) log2(3/4) – (1/4) log2(1/4)}
          + (6/14) {-(4/6) log2(4/6) – (2/6) log2(2/6)}
          + (4/14) {-(2/4) log2(2/4) – (2/4) log2(2/4)}
[Home Work: Remaining computation]
Review Session: Understanding ERROR, sample data and straight-line learning
• Y = f(x), e.g. Y = x (a straight line)
• What is its generalization ability?
[Figure: sample data points plotted together with the fitted straight line.]
Straight Line: which line (hypothesis) fits the given data best?
• A line is represented by parameters of slope and intercept.
• The machine must learn on its own which line is the best fit. How? Using the data, i.e. (x, y) pairs, known as training data.
[Figure: scatter plot of the data (y-axis 50 to 80) with candidate lines; error surface over parameters w1, w2 showing a local minimum and the global minimum; the w corresponding to the minimum error is chosen.]
Recall
• Consider a set of hypotheses H and the observed data used for training, D.
• Define h_MAP = Argmax over h in H of P(h | D)
• The maximally probable hypothesis is called a maximum a posteriori (MAP) hypothesis.

Review Session
• MAP algorithm
• Gibbs Algorithm
• Minimum Description Length Principle
• Information theory: entropy
• Bayes Optimal Classifier
• Naïve Bayes Classifier
[Figure: scatter plot of price ($) in 1000's vs. size in feet2, and the cost surface J(θ0, θ1).]
Correct: simultaneous update of the parameters. Incorrect: updating one parameter before computing the other's update.
Key Properties of Linear Regression
y(x,w) = w0 + w1 x1 + w2 x2 + w3 x3 + … + wD xD
• y is a linear function of the parameters w0, w1, w2, …, wD
• y is a linear function of the input variables (features) x0, x1, x2, …, xD

• A notion of a class of functions φi(x) is used to represent the regression function.
• y(x,w) = w0 + w1 x1 + w2 x2 + w3 x3 + … + wD xD is represented as
  y(x,w) = w0 + w1 φ1(x) + w2 φ2(x) + w3 φ3(x) + … + wD φD(x), where φi(x) = xi
• φi(x) are called basis functions, for i = 1, 2, 3, …, D
• Examples?
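A small sketch of a basis-function expansion; the polynomial basis φi(x) = x^i is offered as one illustrative answer to the "Examples?" prompt, with the identity basis φi(x) = xi from the slide being the special case of using the raw features.

```python
# y(x, w) = w0 + w1*phi1(x) + ... + wD*phiD(x)
# With phi_i(x) = x_i this is ordinary linear regression on the raw features.
# One common alternative (illustrative example) is the polynomial basis
# phi_i(x) = x**i for a single input variable x.

def predict(x, w, basis):
    """w = [w0, w1, ..., wD]; basis = list of functions [phi1, ..., phiD]."""
    return w[0] + sum(wi * phi(x) for wi, phi in zip(w[1:], basis))

# Polynomial basis of degree 3 for a scalar input
poly_basis = [lambda x, i=i: x ** i for i in (1, 2, 3)]

w = [0.5, 1.0, -0.2, 0.05]           # illustrative weights
print(predict(2.0, w, poly_basis))   # 0.5 + 1*2 - 0.2*4 + 0.05*8 = 2.1
```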
Review Session
Probability of the third ball being
Review Session
red
• Number of ways to select two balls = 25C2 = 300
• Number of ways to select two red balls = 10C2 = 45 P(R) = P(RRR)+ P(RBB)+P(RBR)
• Number of ways to select two blue balls = 15C2 = 105 = P(R|RR)*P(RR) + P(R|BB)* P(BB)+ P(R|BR)*P(BR)
• Number of ways to select one red and one blue ball =
10C * 15C = 10*15 = 150 = (1/8)*(45/300) + (1/10)*(105/300)+ (1/9)*(150/300)
1 1
• P(RR) = 45/300 = 0.125*0.15 + 0.1*0.35 + 0.11*0.5
• P(BR) = 150/300 = 0.01875 + 0.035 + 0.0556 = 0.10935
• P(BB) = 105/300
• Therefore the probability that the third ball is red P(R)
= P(RRR)+ P(RBB)+P(RBR)
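The result can be cross-checked with exact fractions; this minimal sketch also answers the original question by computing P(BB | R):

```python
from fractions import Fraction as F

# 10 red and 15 blue balls; two are discarded unseen, a third is drawn and is red.
p_rr = F(45, 300)    # both discarded red
p_br = F(150, 300)   # one red, one blue discarded
p_bb = F(105, 300)   # both discarded blue

# After discarding, 23 balls remain; count how many reds are left in each case.
p_r_given_rr = F(8, 23)
p_r_given_br = F(9, 23)
p_r_given_bb = F(10, 23)

p_r = p_r_given_rr * p_rr + p_r_given_br * p_br + p_r_given_bb * p_bb
p_bb_given_r = p_r_given_bb * p_bb / p_r      # Bayes' theorem

print(p_r)            # 2/5
print(p_bb_given_r)   # 35/92, approximately 0.3804
```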
Problem 2
• A science competition had students from three schools A, B and C. The numbers of students who participated from schools A, B and C are 50, 80 and 70 respectively. The probability of a student qualifying (Q) the competition given his/her school is P(Q|A) = 0.6, P(Q|B) = 0.25 and P(Q|C) = 0.45.
Q1. Compute the probability of qualifying, P(Q).
Q2. Compute the probability of a student belonging to school B given that he/she qualified, i.e. compute P(B|Q).

Given
• Prior probabilities: P(A) = 50/200 = 0.25, P(B) = 80/200 = 0.4, P(C) = 70/200 = 0.35
• Conditional probabilities: P(Q|A) = 0.6, P(Q|B) = 0.25, P(Q|C) = 0.45
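A quick check of Q1 and Q2 using the total probability rule and Bayes' theorem, with the values from the problem statement:

```python
# Q1: P(Q) by the total probability rule; Q2: P(B|Q) by Bayes' theorem.
priors = {"A": 0.25, "B": 0.40, "C": 0.35}
p_q_given = {"A": 0.60, "B": 0.25, "C": 0.45}

p_q = sum(p_q_given[s] * priors[s] for s in priors)   # 0.15 + 0.10 + 0.1575
p_b_given_q = p_q_given["B"] * priors["B"] / p_q      # Bayes' theorem

print(p_q)                      # approximately 0.4075
print(round(p_b_given_q, 4))    # approximately 0.2454
```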
Questions?