
Announcements

• Homework 8 due today, November 13


• ½ to 1 page description of final project due
Thursday, November 15
• Current Events
• Christian - now
• Jeff - Thursday
• Research Paper due Tuesday, November 20

CS 484 – Artificial Intelligence 1


Probabilistic Reasoning

Lecture 15
Probabilistic Reasoning
• Logic deals with certainties
• A→B
• Probabilities are expressed in a notation similar to
that of predicates in First Order Predicate
Calculus:
• P(R) = 0.7
• P(S) = 0.1
• P(¬(A Λ B) V C) = 0.2
• 1 = certain; 0 = certainly not

CS 484 – Artificial Intelligence 3


What's the probability that either A is true
or B is true?
[Venn diagram: overlapping circles A and B; the overlap is A Λ B]

P(A V B) = P(A) + P(B) − P(A Λ B)
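For example, using the values from the joint probability table a few slides on (P(A) = 0.74, P(B) = 0.2, P(A Λ B) = 0.11): P(A V B) = 0.74 + 0.2 − 0.11 = 0.83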

CS 484 – Artificial Intelligence 4


Conditional Probability
• Conditional probability refers to the
probability of one thing given that we
already know another to be true:
P(B | A) = P(B Λ A) / P(A)
• This states the probability of B, given A.

[Venn diagram: circles A and B overlapping in A Λ B]

CS 484 – Artificial Intelligence 5


Calculate
• P(R|S) given that the probability of rain is
0.7, the probability of sun is 0.1 and the
probability of rain and sun is 0.01

• P(R|S) = P(R Λ S) / P(S) = 0.01 / 0.1 = 0.1

• Note: P(A|B) ≠ P(B|A)
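• E.g. with the numbers above, P(S|R) = P(R Λ S) / P(R) = 0.01 / 0.7 ≈ 0.014, which is quite different from P(R|S) = 0.1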

CS 484 – Artificial Intelligence 6


Joint Probability Distributions
• A joint probability distribution represents the combined
probabilities of two or more variables.

        A       ¬A
B       0.11    0.09
¬B      0.63    0.17

• This table shows, for example, that


P (A Λ B) = 0.11
P (¬A Λ B) = 0.09
• Using this, we can calculate P(A):

P(A) = P(A Λ B) + P(A Λ ¬B)

= 0.11 + 0.63
= 0.74
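Here is a minimal Python sketch (not part of the original slides) that stores this joint distribution and recovers marginal and conditional probabilities from it; the helper names marginal_A and conditional_B_given_A are just illustrative:

```python
# Joint distribution from the table above, keyed by the truth values (A, B).
joint = {
    (True, True): 0.11,    # P(A Λ B)
    (False, True): 0.09,   # P(¬A Λ B)
    (True, False): 0.63,   # P(A Λ ¬B)
    (False, False): 0.17,  # P(¬A Λ ¬B)
}

def marginal_A(a):
    """P(A = a), obtained by summing the joint distribution over B."""
    return sum(p for (av, _), p in joint.items() if av == a)

def conditional_B_given_A(b, a):
    """P(B = b | A = a) = P(B = b Λ A = a) / P(A = a)."""
    return joint[(a, b)] / marginal_A(a)

print(marginal_A(True))                   # 0.11 + 0.63 = 0.74
print(conditional_B_given_A(True, True))  # 0.11 / 0.74 ≈ 0.149
```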

CS 484 – Artificial Intelligence 7


Bayes’ Theorem
• Bayes’ theorem lets us calculate a
conditional probability:
P(B | A) = P(A | B) · P(B) / P(A)

• P(B) is the prior probability of B.


• P(B | A) is the posterior probability of B.

CS 484 – Artificial Intelligence 8


Bayes' Theorem Deduction
• Recall: P(B | A) = P(B Λ A) / P(A), and similarly P(A | B) = P(A Λ B) / P(B)
• Since P(B Λ A) = P(A Λ B), multiplying out gives P(B | A) · P(A) = P(A | B) · P(B)
• Dividing both sides by P(A) yields Bayes' theorem: P(B | A) = P(A | B) · P(B) / P(A)

CS 484 – Artificial Intelligence 9


Medical Diagnosis
• Data
• 80% of the time you have a cold, you also have a high
temperature.
• At any one time, 1 in every 10,000 people has a cold
• 1 in every 1000 people has a high temperature
• Suppose you have a high temperature. What is the
likelihood that you have a cold?
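• Applying Bayes' theorem to these figures (C = cold, HT = high temperature):
  P(C | HT) = P(HT | C) · P(C) / P(HT) = (0.8 × 0.0001) / 0.001 = 0.08
  i.e. even with a high temperature, there is only an 8% chance that you have a cold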

CS 484 – Artificial Intelligence 10


Witness Reliability
• A hit-and-run incident has been reported, and an
  eyewitness has stated that she is certain the car
  was a white taxi.
• How likely is it that she is right?
• Facts:
• Yellow taxi company has 90 cars
• White taxi company has 10 cars
• Expert says that given the foggy weather, the witness
has 75% chance of correctly identifying the taxi

CS 484 – Artificial Intelligence 11


Witness Reliability – Prior Probability
• Imagine the witness is shown a sequence of 1000 cars
• Expect 900 to be yellow and 100 to be white
• Given her 75% accuracy, how many will she say are
  white and how many yellow?
  • Of the 900 yellow cars, how many does she say are yellow, and how many white?
  • Of the 100 white cars, how many does she say are white, and how many yellow?
• What is the probability that the woman says white?

• How likely is she right?
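• Working through the numbers:
  • Of the 900 yellow cars she says yellow for 675 and white for 225; of the 100 white cars she says white for 75 and yellow for 25
  • P(says white) = (225 + 75) / 1000 = 0.3
  • P(car is white | says white) = 75 / 300 = 0.25, so despite her certainty she is right only 25% of the time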

CS 484 – Artificial Intelligence 12


Comparing Conditional Probabilities
• Medical diagnosis
• Probability of cold (C) is 0.0001
• P(HT|C) = 0.8
• Probability of plague (P) is 0.000000001
• P(HT|P) = 0.99
• Relative likelihood of cold and plague
P(C | HT) = P(HT | C) · P(C) / P(HT)        P(P | HT) = P(HT | P) · P(P) / P(HT)

P(C | HT) / P(P | HT) = [P(HT | C) · P(C)] / [P(HT | P) · P(P)]
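• Plugging in the numbers: P(C | HT) / P(P | HT) = (0.8 × 0.0001) / (0.99 × 0.000000001) ≈ 80,000, so a cold is tens of thousands of times more likely than plague, even though P(HT | P) > P(HT | C)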

CS 484 – Artificial Intelligence 13


Simple Bayesian Concept Learning (1)
• P (H|E) is used to represent the probability that
some hypothesis, H, is true, given evidence E.
• Let us suppose we have a set of hypotheses H1…
Hn.
• For each Hi:   P(Hi | E) = P(E | Hi) · P(Hi) / P(E)
• Hence, given a piece of evidence, a learner can
determine which is the most likely explanation by
finding the hypothesis that has the highest posterior
probability.

CS 484 – Artificial Intelligence 14


Simple Bayesian Concept Learning (2)
• In fact, this can be simplified.
• Since P(E) is independent of Hi it will have the same
value for each hypothesis.
• Hence, it can be ignored, and we can find the
hypothesis with the highest value of:
P(E | Hi) · P(Hi)
• We can simplify this further if all the hypotheses are
equally likely, in which case we simply seek the
hypothesis with the highest value of P(E|Hi).
• This is the likelihood of E given Hi.
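A minimal Python sketch of this selection rule; the hypothesis names and numbers are invented purely for illustration:

```python
# Hypothetical priors P(Hi) and likelihoods P(E | Hi) for three hypotheses.
priors = {"H1": 0.5, "H2": 0.3, "H3": 0.2}
likelihoods = {"H1": 0.1, "H2": 0.4, "H3": 0.7}

# MAP-style choice: maximise P(E | Hi) * P(Hi); P(E) is the same for every Hi,
# so it can be ignored.
best = max(priors, key=lambda h: likelihoods[h] * priors[h])
print(best)  # H3, since 0.7 * 0.2 = 0.14 beats 0.4 * 0.3 = 0.12 and 0.1 * 0.5 = 0.05

# If all hypotheses were equally likely, we would just maximise P(E | Hi).
print(max(likelihoods, key=likelihoods.get))  # H3
```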

CS 484 – Artificial Intelligence 15


Bayesian Belief Networks (1)

• A belief network shows the dependencies


between a group of variables.
• Two variables A and B are independent if the
  likelihood that A will occur has nothing to do
  with whether B occurs.
• C and D are dependent on A; D and E are
dependent on B.
• The Bayesian belief network has
probabilities associated with each link.
E.g., P(C|A) = 0.2, P(C|¬A) = 0.4
CS 484 – Artificial Intelligence 16
Bayesian Belief Networks (2)
• A complete set of probabilities for this belief network
might be:
• P(A) = 0.1
• P(B) = 0.7
• P(C|A) = 0.2
• P(C|¬A) = 0.4
• P(D|A Λ B) = 0.5
• P(D|A Λ ¬B) = 0.4
• P(D|¬A Λ B) = 0.2
• P(D|¬A Λ ¬B) = 0.0001
• P(E|B) = 0.2
• P(E|¬B) = 0.1

CS 484 – Artificial Intelligence 17


Bayesian Belief Networks (3)

• We can now calculate joint probabilities using the chain rule:


P(A, B, C, D, E) = P(E | A, B, C, D) · P(A, B, C, D)
P(A, B, C, D, E) = P(E | A, B, C, D) · P(D | A, B, C) · P(C | A, B) · P(B | A) · P(A)

• In fact, we can simplify this, since there are


no dependencies between certain pairs of
variables – between E and A, for example.
Hence:

P(A, B, C, D, E) = P(E | B) · P(D | A, B) · P(C | A) · P(B) · P(A)
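• For example, taking every variable to be true and using the probabilities from the previous slide: P(A Λ B Λ C Λ D Λ E) = 0.2 × 0.5 × 0.2 × 0.7 × 0.1 = 0.0014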

CS 484 – Artificial Intelligence 18


College Life Example
• C = that you will go to college
• S = that you will study
• P = that you will party
• E = that you will be successful in your exams
• F = that you will have fun

[Network structure: C → S and C → P; S and P → E; P → F]
CS 484 – Artificial Intelligence 19
College Life Example

P(C) = 0.2

 C      P(S)
 true   0.8
 false  0.2

 C      P(P)
 true   0.6
 false  0.5

 S      P      P(E)
 true   true   0.6
 true   false  0.9
 false  true   0.1
 false  false  0.2

 P      P(F)
 true   0.9
 false  0.7
CS 484 – Artificial Intelligence 20
College Example
• Using the tables to solve problems such as
P(C = true, S = true, P = false, E = true, F = false) = P(C, S, ¬P, E, ¬F)
• General solution
P(x1, …, xn) = ∏ i=1..n P(xi | Parents(Xi))
P(C, S, ¬P, E, ¬F) = P(C) · P(S | C) · P(¬P | C) · P(E | S Λ ¬P) · P(¬F | ¬P)
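A small Python sketch (assuming the CPT numbers from the previous slide) that evaluates this product; it is just a direct transcription of the tables, not a general belief-network engine:

```python
# CPTs from the college-life example (previous slide).
p_C = 0.2
p_S_given_C = {True: 0.8, False: 0.2}
p_P_given_C = {True: 0.6, False: 0.5}
p_E_given_SP = {(True, True): 0.6, (True, False): 0.9,
                (False, True): 0.1, (False, False): 0.2}
p_F_given_P = {True: 0.9, False: 0.7}

def joint(c, s, p, e, f):
    """P(C=c, S=s, P=p, E=e, F=f) as a product of the CPT entries."""
    pr = p_C if c else 1 - p_C
    pr *= p_S_given_C[c] if s else 1 - p_S_given_C[c]
    pr *= p_P_given_C[c] if p else 1 - p_P_given_C[c]
    pr *= p_E_given_SP[(s, p)] if e else 1 - p_E_given_SP[(s, p)]
    pr *= p_F_given_P[p] if f else 1 - p_F_given_P[p]
    return pr

# P(C, S, ¬P, E, ¬F) = 0.2 * 0.8 * 0.4 * 0.9 * 0.3 = 0.01728
print(joint(True, True, False, True, False))
```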

CS 484 – Artificial Intelligence 21


Noisy-V Function
• We would like to assume that we know all of the possible causes of an event
• E.g. Medical Diagnosis System
• P(HT|C) = 0.8
• P(HT|P) = 0.99
• Assume P(HT|C V P) = 1 (?)
• Assumption clearly not true
• Leak node – represents all other causes
• P(HT|O) = 0.9
• Define noise parameters – conditional probabilities for
¬HT
• P(¬HT|C) = 1 – P(HT|C) = 0.2
• P(¬HT|P) = 1 – P(HT|P) = 0.01
• P(¬HT|O) = 1 – P(HT|O) = 0.1
• Further assumption – the causes of a high temperature are
independent of each other, and the noise parameters are
independent

CS 484 – Artificial Intelligence 22


Noisy V-Function
• Benefit of Noisy V-Function
• If cold, plague, and other are all false, P(¬HT) = 1
• Otherwise, P(¬HT) is equal to the product of the
noise parameters for all the variables that are
true
• E.g. if plague and other are true and cold is false,
P(HT) = 1 – (0.01 * 0.1) = 0.999
• Benefit – don’t need to store as many
values as the Bayesian belief network
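A minimal Python sketch of this noisy-OR style calculation, assuming the noise parameters above (0.2 for cold, 0.01 for plague, 0.1 for the leak/"other" node):

```python
# Noise parameters: P(¬HT | cause) when that cause acts alone.
noise = {"cold": 0.2, "plague": 0.01, "other": 0.1}

def p_high_temp(present):
    """P(HT | the given causes are true), under the noisy-OR assumptions.

    present is the set of causes that are true; causes are assumed
    independent, so P(¬HT) is the product of their noise parameters.
    """
    p_not_ht = 1.0
    for cause in present:
        p_not_ht *= noise[cause]
    return 1.0 - p_not_ht

print(p_high_temp(set()))                # 0.0 -> no cause, no high temperature
print(p_high_temp({"plague", "other"}))  # 1 - 0.01 * 0.1 = 0.999
```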

CS 484 – Artificial Intelligence 23


Bayes’ Optimal Classifier
• A system that uses Bayes’ theory to classify data.
• We have a piece of data y, and are seeking the correct hypothesis from H1 …
H5, each of which assigns a classification to y.
• The probability that y should be classified as cj is:
P(cj | x1 … xn) = Σ i=1..m P(cj | hi) · P(hi | x1 … xn)
• x1 to xn are the training data, and m is the number of hypotheses.
• This method provides the best possible classification for a piece of data.
• Example: given some data, it will classify it as true or false. Suppose:

  P(H1 | x1,…,xn) = 0.2    P(false | H1) = 0    P(true | H1) = 1
  P(H2 | x1,…,xn) = 0.3    P(false | H2) = 0    P(true | H2) = 1
  P(H3 | x1,…,xn) = 0.1    P(false | H3) = 1    P(true | H3) = 0
  P(H4 | x1,…,xn) = 0.25   P(false | H4) = 0    P(true | H4) = 1
  P(H5 | x1,…,xn) = 0.15   P(false | H5) = 1    P(true | H5) = 0

• P(true | x1,…,xn) = 0.2 + 0.3 + 0.25 = 0.75
• P(false | x1,…,xn) = 0.1 + 0.15 = 0.25, so the data is classified as true
CS 484 – Artificial Intelligence 24
The Naïve Bayes Classifier (1)
• A vector of data (d1, …, dn) is assigned a single classification: we seek
P(ci | d1, …, dn)
• The classification with the highest posterior probability is
chosen.
• The hypothesis which has the highest posterior probability
is the maximum a posteriori, or MAP hypothesis.
• In this case, we are looking for the MAP classification.
• Bayes’ theorem is used to find the posterior probability:
P(d1, …, dn | ci) · P(ci) / P(d1, …, dn)

CS 484 – Artificial Intelligence 25


The Naïve Bayes Classifier (2)
• Since P(d1, …, dn) is a constant, independent of ci, we
can eliminate it, and simply aim to find the classification ci,
for which the following is maximised:
P(d1, …, dn | ci) · P(ci)
• We now assume that all the attributes d1, …, dn are
independent
• So P(d1, …, dn|ci) can be rewritten as:
P(ci) · ∏ j=1..n P(dj | ci)

• The classification for which this is highest is chosen to


classify the data.

CS 484 – Artificial Intelligence 26


Training Data Classifier Example
  x  y  z  Classification
  2  3  2  A
  4  1  4  B
  1  3  2  A
  2  4  3  A
  4  2  4  B
  2  1  3  C
  1  2  4  A
  2  3  3  B
  2  2  4  A
  3  3  3  C
  3  2  1  A
  1  2  1  B
  2  1  4  A
  4  3  4  C
  2  2  4  A

• New piece of data to classify: (x = 2, y = 3, z = 4)
• Want P(ci | x = 2, y = 3, z = 4)
• P(A) * P(x=2|A) * P(y=3|A) * P(z=4|A) = 8/15 * 5/8 * 2/8 * 4/8 = 0.0417
• P(B) * P(x=2|B) * P(y=3|B) * P(z=4|B) is computed the same way (see the sketch below)
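A short Python sketch (not from the slides) that reproduces these naïve Bayes counts for the training data above; naive_bayes_score is an illustrative helper, not a course-provided function:

```python
# Training data from the table above: (x, y, z, classification).
data = [
    (2, 3, 2, "A"), (4, 1, 4, "B"), (1, 3, 2, "A"), (2, 4, 3, "A"),
    (4, 2, 4, "B"), (2, 1, 3, "C"), (1, 2, 4, "A"), (2, 3, 3, "B"),
    (2, 2, 4, "A"), (3, 3, 3, "C"), (3, 2, 1, "A"), (1, 2, 1, "B"),
    (2, 1, 4, "A"), (4, 3, 4, "C"), (2, 2, 4, "A"),
]

def naive_bayes_score(c, x, y, z):
    """P(c) * P(x|c) * P(y|c) * P(z|c), estimated by simple counting."""
    rows = [r for r in data if r[3] == c]
    p_c = len(rows) / len(data)
    p_x = sum(1 for r in rows if r[0] == x) / len(rows)
    p_y = sum(1 for r in rows if r[1] == y) / len(rows)
    p_z = sum(1 for r in rows if r[2] == z) / len(rows)
    return p_c * p_x * p_y * p_z

for c in ("A", "B", "C"):
    print(c, round(naive_bayes_score(c, 2, 3, 4), 4))
# A 0.0417, B 0.0083, C 0.0148 -> (x=2, y=3, z=4) is classified as A
```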
CS 484 – Artificial Intelligence 27
M-estimate
• Problem with too little training data
• (x=1, y=2, z=2)
• P(x=1 | B) = 1/4
• P(y=2 | B) = 2/4
• P(z=2 | B) = 0
• Avoid problem by using M-estimate which pads the
computation with additional samples
• Conditional probability = (a + mp) / (b + m)
• m = 5 (equivalent sample size)
• p = 1/num_values_for_category (1/4 for x)
• a = number of training examples with both the attribute value and the
classification (for x=1 and B, a = 1)
• b = number of training examples with the classification (for B, b = 4)
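• For example, the zero estimate above becomes P(z=2 | B) = (0 + 5 × ¼) / (4 + 5) = 1.25 / 9 ≈ 0.14 instead of 0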

CS 484 – Artificial Intelligence 28


Collaborative Filtering
• A method that uses Bayesian reasoning to suggest items that a person
might be interested in, based on their known interests.
• If we know that Anne and Bob both like A, B and C, and that Anne likes
D then we guess that Bob would also like D.
• P(Bob likes Z | Bob likes A, Bob likes B, …, Bob likes Y)
• Can be calculated using decision trees.

CS 484 – Artificial Intelligence 29
