
Used in Spring 2012, Spring 2013, Winter 2014 (partially)

1. Bayesian Networks
2. Conditional Independence
3. Creating Tables
4. Notations for Bayesian Networks
5. Calculating conditional probabilities from the tables
6. Calculating conditional independence
7. Markov Chain Monte Carlo
8. Markov Models
9. Markov Models and probabilistic methods in vision

Introduction to Probabilistic Robotics
1. Probabilities
2. Bayes rule
3. Bayes filters
4. Bayes networks
5. Markov Chains


Bayesian Networks and Markov Models
Bayesian networks and Markov models:
1. Applications in user modeling
2. Applications in natural language processing
3. Applications in robotic control
4. Applications in robot vision
and many, many more

Bayesian Networks (BNs)


Overview
Introduction to BNs
Nodes, structure and probabilities
Reasoning with BNs
Understanding BNs

Extensions of BNs
Decision Networks
Dynamic Bayesian Networks (DBNs)

Definition of Bayesian Networks
A data structure that represents the dependence between variables
Gives a concise specification of the joint probability distribution
A Bayesian Network is a directed acyclic graph (DAG) in which the following holds:
1. A set of random variables makes up the nodes in the network
2. A set of directed links connects pairs of nodes
3. Each node has a probability distribution that quantifies the effects of its parents
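To make the definition concrete, here is a minimal sketch (Python is used for all code sketches in these notes) of a BN stored as a plain data structure. The node names anticipate the Wet-Sprinkler-Rain example used later; all probability values are illustrative assumptions, not numbers from the slides.

```python
# A Bayesian network as a dictionary: each node stores its parents
# and a CPT mapping a tuple of parent values to P(node = True).
# All numbers below are illustrative placeholders.
network = {
    "Rain":      {"parents": [], "cpt": {(): 0.2}},
    "Sprinkler": {"parents": [], "cpt": {(): 0.3}},
    "Wet": {"parents": ["Sprinkler", "Rain"],
            "cpt": {(True, True): 0.95, (True, False): 0.90,
                    (False, True): 0.90, (False, False): 0.01}},
}

def prob(node, value, assignment):
    """P(node = value | parent values taken from assignment)."""
    info = network[node]
    key = tuple(assignment[p] for p in info["parents"])
    p_true = info["cpt"][key]
    return p_true if value else 1.0 - p_true
```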

Conditional Independence
The relationship between conditional independence and BN structure is important for understanding how BNs work.

Conditional Independence: Causal Chains
Causal chains give rise to conditional independence:

P(C | A, B) = P(C | B)    (A indep C | B)

[Figure: chain A -> B -> C]

Example: Smoking causes cancer, which causes dyspnoea.
[Figure: smoking (A) -> cancer (B) -> dyspnoea (C)]
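As a sanity check, the identity can be verified numerically. Below is a small sketch with made-up parameters for a chain A -> B -> C, confirming that P(C | A, B) equals P(C | B):

```python
# Chain A -> B -> C with illustrative (made-up) parameters.
pA = 0.3                                   # P(A = T)
pB_given = {True: 0.8, False: 0.1}         # P(B = T | A)
pC_given = {True: 0.7, False: 0.2}         # P(C = T | B)

def joint(a, b, c):
    """P(A=a, B=b, C=c) by the chain factorization."""
    p = pA if a else 1 - pA
    p *= pB_given[a] if b else 1 - pB_given[a]
    p *= pC_given[b] if c else 1 - pC_given[b]
    return p

# P(C=T | A=T, B=T)
lhs = joint(True, True, True) / (joint(True, True, True) +
                                 joint(True, True, False))
# P(C=T | B=T)
num = sum(joint(a, True, True) for a in (True, False))
den = sum(joint(a, True, c) for a in (True, False) for c in (True, False))
rhs = num / den
print(lhs, rhs)   # both print 0.7: knowing A adds nothing once B is known
```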

Conditional Independence: Common Causes
Common causes (or ancestors) also give rise to conditional independence:

P(C | A, B) = P(C | B)    (A indep C | B)

[Figure: Xray (A) <- cancer (B) -> dyspnoea (C)]

Example: Cancer is a common cause of the two symptoms: a positive Xray and dyspnoea.
If I know I have dyspnoea (C) because of cancer (B), then I do not need an Xray test: given B, the Xray result A tells me nothing more about C.

Conditional Dependence: Common Effects
Common effects (or their descendants) give rise to conditional dependence:

P(A, C) = P(A) P(C), but P(A | C, B) ≠ P(A | B)    (A and C are dependent given B)

[Figure: pollution (A) -> cancer (B) <- smoking (C)]

Example: Cancer is a common effect of pollution and smoking.
Given cancer, smoking "explains away" pollution:
if we know that you smoke and have cancer, we do not need to assume that your cancer was caused by pollution.
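"Explaining away" can also be checked numerically. This sketch uses made-up CPT values for the v-structure pollution -> cancer <- smoking and shows that learning the patient smokes lowers the probability that pollution is the culprit:

```python
# V-structure: pollution (A) -> cancer (B) <- smoking (C).
# All numbers are illustrative placeholders.
pA, pC = 0.1, 0.3                                   # P(A), P(C)
pB = {(True, True): 0.05, (True, False): 0.02,      # P(B = T | A, C)
      (False, True): 0.03, (False, False): 0.001}

def joint(a, b, c):
    p = (pA if a else 1 - pA) * (pC if c else 1 - pC)
    return p * (pB[(a, c)] if b else 1 - pB[(a, c)])

def p_a_given(b_val, c_val=None):
    """P(A = T | B = b_val [, C = c_val])."""
    cs = (True, False) if c_val is None else (c_val,)
    num = sum(joint(True, b_val, c) for c in cs)
    den = sum(joint(a, b_val, c) for a in (True, False) for c in cs)
    return num / den

print(p_a_given(True))        # P(pollution | cancer)          ~ 0.25
print(p_a_given(True, True))  # P(pollution | cancer, smoking) ~ 0.16
```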

Joint Distributions for describing uncertain worlds

Researchers have already found numerous and dramatic benefits of joint distributions for describing uncertain worlds.
Students in robotics and Artificial Intelligence have to understand the problems with using joint distributions directly.
You should discover how the Bayes Net methodology allows us to build joint distributions in manageable chunks.

Bayes Net methodology

Why do Bayesian methods matter?
1. Bayesian methods are one of the most important conceptual advances to have emerged in the Machine Learning / AI field since 1995.
2. They give a clean, clear, manageable language and methodology for expressing what the robot designer is certain and uncertain about.
3. There are already many practical applications in medicine, factories, and helpdesks, for instance:
   1. P(this problem | these symptoms)   // we will use P for probability
   2. anomalousness of this observation
   3. choosing the next diagnostic test | these observations


Problem 1: Creating the Joint Distribution Table
The Joint Distribution Table is an important concept.

Probabilistic truth table

Truth table of all combinations of Boolean variables

You can guess this table, you can take data from some statistics, or you can build this table based on some partial tables.

Idea: use decision diagrams to represent these data.

Use of independence while creating the tables

Wet-Sprinkler-Rain Example
[Figure: the three nodes W (wet), S (sprinkler), R (rain)]

Problem 1: Creating the Joint Table

Our goal is to derive this table.
Observe that if I know 7 of the 8 entries, the eighth is uniquely determined, since the entries sum to 1.
So for n = 3 Boolean variables I need to guess, calculate, or find 2^n - 1 = 7 values.

But the same data can be stored explicitly or implicitly, not necessarily in the form of a table!

What extra assumptions can help to create this table?

Wet-Sprinkler-Rain Example

P(S | R): the sprinkler was on under the condition that it rained.

Understanding of causation
You need to understand causation when you create the table.

Wet-Sprinkler-Rain Example

Independence simplifies probabilities
We use the independence of variables S and R:
P(S | R) = P(S), since S and R are independent
(P(S | R) reads: the sprinkler was on under the condition that it rained).
We can use these probabilities to create the table.

Wet-Sprinkler-Rain Example

Wet-Sprinkler-Rain Example
We create the CPTs for S ("Sprinkler was on"), R ("It rained"), and W ("Grass is wet") based on our knowledge of the problem.

[Figure: the Conditional Probability Tables (CPTs)]

What about children playing, or a dog wetting the grass? That is still possible: the table reserves the value 0.1 for it.
This first step shows the collected data.

Full joint for only S and R
The independence of S and R is used.

[Figure: table values 0.95, 0.90, 0.90, 0.01]

Use the chain rule for probabilities.

Wet-Sprinkler-Rain Example

Chain Rule for Probabilities
For random variables X1, ..., Xn:
P(X1, ..., Xn) = P(X1) P(X2 | X1) ... P(Xn | X1, ..., Xn-1)
Here: P(W, S, R) = P(W | S, R) P(S | R) P(R) = P(W | S, R) P(S) P(R), using the independence of S and R.

[Figure: the values 0.95, 0.90, 0.90, 0.01, read here as the CPT P(W | S, R)]

Full joint probability
You have a table; you want to calculate some probability, e.g. P(~W).

Wet-Sprinkler-Rain Example
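The following sketch carries out exactly this computation: it builds the eight-entry joint table for (W, S, R) with the chain rule and then marginalizes to obtain P(~W). The priors P(S) and P(R) are assumed values (the slides give them only in a figure); P(W | S, R) uses the four values read above.

```python
# Build P(W, S, R) = P(W | S, R) P(S) P(R), then marginalize.
p_S, p_R = 0.3, 0.2                       # assumed priors
p_W = {(True, True): 0.95, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.01}   # P(W = T | S, R)

joint = {}
for s in (True, False):
    for r in (True, False):
        for w in (True, False):
            pw = p_W[(s, r)] if w else 1 - p_W[(s, r)]
            joint[(w, s, r)] = (p_S if s else 1 - p_S) * \
                               (p_R if r else 1 - p_R) * pw

p_not_w = sum(p for (w, s, r), p in joint.items() if not w)
print(f"P(~W) = {p_not_w:.4f}")
```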

The independence of S and R implies calculating fewer numbers to create the complete joint table for W, S and R: six numbers (P(S), P(R), and the four entries of P(W | S, R)).
We reduced the count only from seven to six numbers.

Wet-Sprinkler-Rain Example

Explanation of Diagrammatic Notations such as Bayes Networks
You do not need to build the complete table!
You can build a graph of tables: nodes which correspond to certain types of tables.

Wet-Sprinkler-Rain Example
[Figure: the same CPTs for S ("Sprinkler was on"), R ("It rained"), and W ("Grass is wet"), and the full joint probability table derived from them]

When you have this table you can modify it, and you can also calculate probabilities such as P(~W).

Problem 2: Calculating conditional probabilities from the Joint Distribution Table

Wet-Sprinkler-Rain Example

P(W = T | S = T) = P(S = T, W = T) / P(S = T)
= the probability that the grass is wet under the assumption that the sprinkler was on.
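Reusing the joint table built in the earlier chain-rule sketch, the division above becomes a sum over matching table entries:

```python
# Conditional probability from the full joint table `joint`
# (built in the chain-rule sketch above).
def marginal(joint, w=None, s=None, r=None):
    """Sum the joint entries matching the fixed values (None = any)."""
    return sum(p for (wv, sv, rv), p in joint.items()
               if (w is None or wv == w)
               and (s is None or sv == s)
               and (r is None or rv == r))

# P(W=T | S=T) = P(W=T, S=T) / P(S=T)
p_w_given_s = marginal(joint, w=True, s=True) / marginal(joint, s=True)
print(f"P(W=T | S=T) = {p_w_given_s:.4f}")
```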

Wet-Sprinkler-Rain Example

We will use this on the next slide.
We showed examples of both causal inference and diagnostic inference.

Explaining Away the facts from the table
[Figure: the probability calculated earlier from this table is smaller than the one computed here, illustrating explaining away]

Wet-Sprinkler-Rain Example

Conclusions on this problem
1. The table can be used for explaining away.
2. The table can be used to check conditional independence.
3. The table can be used to calculate conditional probabilities.
4. The table can be used to determine causality.

Problem 3: What if S and R are dependent? Calculating conditional independence

Conditional Independence of S and R

Wet-Sprinkler-Rain Example

Diagrammatic notation for conditional independence of two variables

Wet-Sprinkler-Rain Example

Conditional Independence formalized (extended) for sets of variables:
a set S1 is conditionally independent of a set S2 given a set S3 iff P(S1 | S2, S3) = P(S1 | S3).

[Figure: the three sets of variables S1, S2, S3]

Now we will explain conditional independence on the Cloudy-Wet-Sprinkler-Rain Example.

Example: Lung Cancer Diagnosis

1. A patient has been suffering from shortness of breath (called dyspnoea) and visits the doctor, worried that he has lung cancer.
2. The doctor knows that other diseases, such as tuberculosis and bronchitis, are possible causes, as well as lung cancer.
3. She also knows that other relevant information includes whether or not the patient is a smoker (increasing the chances of cancer and bronchitis) and what sort of air pollution he has been exposed to.
4. A positive Xray would indicate either TB or lung cancer.

Nodes and Values in Bayesian Networks

Q: What are the nodes to represent, and what values can they take?
A: Nodes can be discrete or continuous.
Boolean nodes represent propositions taking binary values.
Example: a Cancer node represents the proposition "the patient has cancer".
Ordered values.
Example: a Pollution node with values low, medium, high.
Integral values.
Example: Age, with possible values 1-120.

Lung Cancer Example

Lung Cancer Example: Nodes and Values

Node name   Type      Values
Pollution   Binary    {low, high}
Smoker      Boolean   {T, F}
Cancer      Boolean   {T, F}
Dyspnoea    Boolean   {T, F}
Xray        Binary    {pos, neg}

Example of variables as nodes in a BN.

Lung Cancer Example: Bayesian Network Structure

[Figure: Pollution -> Cancer <- Smoker; Cancer -> Xray; Cancer -> Dyspnoea]

Lung Cancer Example

Conditional Probability Tables (CPTs) in Bayesian Networks
After specifying the topology, we must specify the CPT for each discrete node.
1. Each row of a CPT contains the conditional probability of each node value for one possible combination of values of its parent nodes.
2. Each row of a CPT must sum to 1.
3. A CPT for a Boolean variable with n Boolean parents contains 2^(n+1) probabilities.
   Example: the Cancer node, with the two Boolean-valued parents Pollution and Smoker, has a CPT with 2^2 = 4 rows and 2^3 = 8 probabilities.
4. A node with no parents has one row: the prior probabilities of its values.

Lung Cancer Example: example of a CPT

[Figure: the CPT giving the probability of cancer for each state of the parent variables P and S, e.g. the row for Smoker = T, Pollution = low]

C = cancer, P = pollution, S = smoking, X = Xray, D = dyspnoea

[Figure: the Bayesian network for cancer]

Lung Cancer Example

Several small CPTs are used to create larger joint distribution tables (JDTs).

The Markov Property for Bayesian Networks
Modelling with BNs requires assuming the Markov Property:
there are no direct dependencies in the system being modelled which are not already explicitly shown via arcs.

Example: smoking can influence dyspnoea only through causing cancer.
Markov's idea: all information is modelled by arcs.

Software: NETICA for Bayesian Networks and joint distributions

Reasoning with numbers using the Netica software
[Figure: the collected data entered in Netica]

Lung Cancer Example

Representing the Joint Probability Distribution: Example
We want to calculate:
Pr(P = low, S = F, C = T, X = pos, D = T)
= Pr(P = low) * Pr(S = F) * Pr(C = T | P = low, S = F) * Pr(X = pos | C = T) * Pr(D = T | C = T)

This shows how we can calculate the joint probability from the other probabilities (the CPTs) in the network.

P = pollution, S = smoking, X = Xray, D = dyspnoea
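A sketch of this factorization as code follows. The CPT values are assumptions chosen to be plausible for the example (the slides show the actual numbers only in a figure):

```python
# Pr(P=low, S=F, C=T, X=pos, D=T) as a product of network factors.
# Every number below is an assumed placeholder.
p_P_low = 0.9                        # Pr(P = low)
p_S = 0.3                            # Pr(S = T)
p_C = {("low", False): 0.001, ("low", True): 0.03,
       ("high", False): 0.02, ("high", True): 0.05}   # Pr(C=T | P, S)
p_X_pos = {True: 0.9, False: 0.2}    # Pr(X = pos | C)
p_D = {True: 0.65, False: 0.3}       # Pr(D = T | C)

joint = (p_P_low * (1 - p_S) * p_C[("low", False)]
         * p_X_pos[True] * p_D[True])
print(f"Pr(P=low, S=F, C=T, X=pos, D=T) = {joint:.6f}")
```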

Problem 4: Determining Causality and Bayes Nets: Advertisement example

Bayes nets allow one to learn about causal relationships.
One more example:
Marketing analysts want to know whether to increase, decrease, or leave unchanged the exposure of some advertisement in order to maximize profit from the sale of some product.
Advertised (A) and Buy (B) will be variables for someone having seen the advertisement or purchased the product.

Advertised-Buy Example

Causality Example

1. We want to know the probability that B = true given that we force A = true, or A = false.
2. We could do this by finding two similar populations and observing B with A = true for one and A = false for the other.
3. But it may be difficult or expensive to find such populations.

Advertised (A): has seen the advertisement. Buy (B): has purchased the product.

Advertised-Buy Example

How can causality be represented in a graph?

Markov Condition and Causal Markov Condition
But how do we learn whether or not A causes B at all?
The Markov Condition states:
Any node in a Bayes net is conditionally independent of its non-descendants given its parents.

The CAUSAL Markov Condition (CMC) states:
Any phenomenon in a causal net is independent of its non-effects given its direct causes.

Advertised (A) and Buy (B)
Advertised-Buy Example

Acyclic Causal Graph versus Bayes Net
Thus, if we have a directed acyclic causal graph C for variables in X, then, by the Causal Markov Condition, C is also a Bayes net for the joint probability distribution of X.
The reverse is not necessarily true: a network may satisfy the Markov condition without depicting causality.

Advertised-Buy Example

Causality Example: when we learn that p(b|a) and p(b|~a) are not equal
Given the Causal Markov Condition (CMC), we can infer causal relationships from conditional (in)dependence relationships learned from the data.
Suppose we learn with high Bayesian probability that p(b|a) and p(b|~a) are not equal.
Given the CMC, there are four simple causal explanations for this (and more complex ones too):

Causality Example: four causal explanations

1. A causes B: if they advertise more, you buy more.
2. B causes A: if you buy more, they have more money to advertise.

Causality Example: four causal explanations (continued)

3. A hidden common cause of A and B (e.g. income): in a rich country they advertise more and they buy more.
4. A and B are both causes for data selection (a.k.a. selection bias; perhaps the database didn't record false instances of A and B): if you increase the information about Ad in the database, then you also increase the information about Buy in the database.

Causality Example continued
But we still don't know if A causes B.
Suppose we learn about the Income (I) and geographic Location (L) of the purchaser, and we learn with high Bayesian probability the network shown in the figure.

Advertised (A = Ad) and Buy (B)
Advertised-Buy Example

Causality Example: using the CMC
Given the Causal Markov Condition (CMC), the ONLY causal explanation for the conditional (in)dependence relationships encoded in the Bayes net is that Ad is a cause for Buy.
That is, none of the other relationships or combinations thereof produce the probabilistic relationships encoded here.

Advertised (Ad) and Buy (B)
Advertised-Buy Example

Causality in Bayes Networks
Thus, Bayes Nets allow inference of causal relationships by the Causal Markov Condition (CMC).

Problem 5: Determining D-separation in Bayesian Networks

D-separation in Bayesian Networks
We will formulate a graphical criterion of conditional independence.
We can determine whether a set of nodes X is independent of another set Y, given a set of evidence nodes E, via the Markov property:
if every undirected path from a node in X to a node in Y is d-separated by E, then X and Y are conditionally independent given E.
Determining D-separation (cont.)

A set of nodes E d-separates two sets of nodes X and Y if every undirected path from a node in X to a node in Y is blocked given E.

A path is blocked given a set of nodes E if there is a node Z on the path for which one of three conditions holds:
1. Z is in E and Z has one arrow on the path leading in and one arrow out (chain)
2. Z is in E and Z has both path arrows leading out (common cause)
3. Neither Z nor any descendant of Z is in E, and both path arrows lead into Z (common effect)

[Figure: the three cases: chain, common cause, common effect]
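The three blocking conditions translate directly into code. Here is a minimal sketch that decides whether a single undirected path is blocked by an evidence set; the network arcs are hypothetical, and the sketch checks one path at a time (full d-separation would check every path between X and Y):

```python
def descendants(node, edges):
    """All nodes reachable from `node` along directed arcs."""
    found, stack = set(), [node]
    while stack:
        n = stack.pop()
        for (u, v) in edges:
            if u == n and v not in found:
                found.add(v)
                stack.append(v)
    return found

def path_blocked(path, evidence, edges):
    """path: node sequence; evidence: set of nodes; edges: (u, v) arcs."""
    for i in range(1, len(path) - 1):
        prev, z, nxt = path[i - 1], path[i], path[i + 1]
        into_z_from_prev = (prev, z) in edges
        into_z_from_next = (nxt, z) in edges
        if into_z_from_prev and into_z_from_next:
            # Common effect: blocked unless Z or a descendant is in E.
            if z not in evidence and not (descendants(z, edges) & evidence):
                return True
        elif z in evidence:
            # Chain or common cause: blocked when Z is observed.
            return True
    return False

edges = {("A", "B"), ("B", "C")}                     # chain A -> B -> C
print(path_blocked(["A", "B", "C"], {"B"}, edges))   # True: B blocks it
```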

Another Example of Bayesian Networks: Alarm

Alarm Example
Let us draw the BN from these data.
[Figure: the Bayes net corresponding to the Alarm-Burglar problem]

Compactness, Global Semantics, Local Semantics and Markov Blanket

Compactness of a Bayes Net
[Figure: Burglary -> Alarm <- Earthquake; Alarm -> JohnCalls; Alarm -> MaryCalls]

Alarm Example

Global Semantics, Local Semantics and Markov Blanket: useful concepts

Alarm Example

A node's Markov blanket consists of:
1. its parents,
2. its children,
3. its children's parents.
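Reading the blanket off the graph is mechanical, as in this sketch over a set of directed arcs (the alarm-style arcs here are the standard textbook ones, used as an assumed example):

```python
def markov_blanket(node, edges):
    """Parents, children, and children's other parents of `node`."""
    parents  = {u for (u, v) in edges if v == node}
    children = {v for (u, v) in edges if u == node}
    spouses  = {u for (u, v) in edges if v in children and u != node}
    return parents | children | spouses

edges = {("Burglary", "Alarm"), ("Earthquake", "Alarm"),
         ("Alarm", "JohnCalls"), ("Alarm", "MaryCalls")}
print(markov_blanket("Alarm", edges))
# {'Burglary', 'Earthquake', 'JohnCalls', 'MaryCalls'}
```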

Problem 6: How to systematically build a Bayes Network (Example)

Alarm Example
[A sequence of figure slides builds the Alarm network step by step: whenever a direct dependence between variables remains, we add an arrow.]

Bayes Net for a car that does not want to start

Such networks can be used for robot diagnostics, or for diagnosis of a human performed by a robot.

Inference in Bayes Nets and how to simplify it

Alarm Example

First method of simplification: Enumeration

Alarm Example
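A brute-force version of enumeration is sketched below for networks in the dictionary form introduced earlier. It simply sums the full joint over the hidden variables; the real enumeration algorithm interleaves the sums and products to avoid repeated work, but the answer is the same:

```python
def enumerate_ask(query, evidence, net, order):
    """P(query = True | evidence); `order` lists all nodes, parents first."""
    def full_joint(assign):
        p = 1.0
        for node in order:
            key = tuple(assign[par] for par in net[node]["parents"])
            pt = net[node]["cpt"][key]
            p *= pt if assign[node] else 1 - pt
        return p

    def enum(vars_left, assign):
        if not vars_left:
            return full_joint(assign)
        first, rest = vars_left[0], vars_left[1:]
        if first in assign:                      # query or evidence node
            return enum(rest, assign)
        return sum(enum(rest, {**assign, first: v}) for v in (True, False))

    dist = {qv: enum(order, {**evidence, query: qv}) for qv in (True, False)}
    return dist[True] / (dist[True] + dist[False])

# e.g., with the `network` dictionary sketched near the start:
# enumerate_ask("Rain", {"Wet": True}, network, ["Rain", "Sprinkler", "Wet"])
```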

Second method: Variable Elimination

Alarm Example
[Figure: variable A was eliminated]

3SAT Example

Polytrees are better.
IDEA: convert the DAG to polytrees.
Clustering is used to convert non-polytree BNs.

EXAMPLE: Clustering is used to convert non-polytree BNs.
[Figure: a network that is not a polytree, and its clustered version, which is a polytree]

Alarm Example

Approximate Inference
1. Direct sampling methods
2. Rejection sampling
3. Likelihood weighting
4. Markov chain Monte Carlo

1. Direct Sampling Methods

Direct Sampling
Direct sampling generates minterms with their probabilities.
We start from the top of the network.

Legend: W = wet, C = cloudy, R = rain, S = sprinkler.

[A sequence of figure slides samples each variable in topological order on the Cloudy-Wet-Sprinkler-Rain example: first Cloudy = yes, then Sprinkler = no, then Rain, then Wet.]

We generated a sample minterm over C, S, R, W.
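Direct (ancestral) sampling is a few lines of code: sample each node in topological order, conditioning on the already-sampled parents. The CPT numbers below are the classic textbook values for this network, used here as assumptions:

```python
import random

cloudy_net = {
    "Cloudy":    {"parents": [], "cpt": {(): 0.5}},
    "Sprinkler": {"parents": ["Cloudy"],
                  "cpt": {(True,): 0.1, (False,): 0.5}},
    "Rain":      {"parents": ["Cloudy"],
                  "cpt": {(True,): 0.8, (False,): 0.2}},
    "Wet":       {"parents": ["Sprinkler", "Rain"],
                  "cpt": {(True, True): 0.99, (True, False): 0.90,
                          (False, True): 0.90, (False, False): 0.0}},
}
ORDER = ["Cloudy", "Sprinkler", "Rain", "Wet"]   # topological order

def prior_sample(net, order):
    """Draw one complete sample, parents before children."""
    sample = {}
    for node in order:
        key = tuple(sample[p] for p in net[node]["parents"])
        sample[node] = random.random() < net[node]["cpt"][key]
    return sample

print(prior_sample(cloudy_net, ORDER))   # one random minterm
```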

2. Rejection Sampling Methods

Rejection Sampling
Reject samples that are inconsistent with the evidence.

Wet-Sprinkler-Rain Example
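On top of prior_sample from the previous sketch, rejection sampling is just a filter: keep only the samples that agree with the evidence, then count.

```python
def rejection_sample(net, order, query, evidence, n=100_000):
    """Estimate P(query = T | evidence) by discarding bad samples."""
    kept = hits = 0
    for _ in range(n):
        s = prior_sample(net, order)
        if all(s[var] == val for var, val in evidence.items()):
            kept += 1
            hits += s[query]        # True counts as 1
    return hits / kept if kept else float("nan")

print(rejection_sample(cloudy_net, ORDER, "Rain", {"Wet": True}))
```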

3. Likelihood Weighting Methods

Legend: W = wet, C = cloudy, R = rain, S = sprinkler.

[A sequence of figure slides walks through likelihood weighting on the Wet-Sprinkler-Rain example: evidence variables are fixed to their observed values, the remaining variables are sampled, and each sample is weighted by the likelihood of the evidence.]
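Likelihood weighting replaces the rejection step: evidence variables are clamped to their observed values, and each sample is weighted by the probability of that evidence given its sampled parents. A sketch, again reusing cloudy_net and ORDER:

```python
def weighted_sample(net, order, evidence):
    """One sample with evidence clamped; returns (sample, weight)."""
    sample, weight = dict(evidence), 1.0
    for node in order:
        key = tuple(sample[p] for p in net[node]["parents"])
        p_true = net[node]["cpt"][key]
        if node in evidence:
            weight *= p_true if evidence[node] else 1 - p_true
        else:
            sample[node] = random.random() < p_true
    return sample, weight

def likelihood_weighting(net, order, query, evidence, n=100_000):
    w_true = w_total = 0.0
    for _ in range(n):
        s, w = weighted_sample(net, order, evidence)
        w_total += w
        if s[query]:
            w_true += w
    return w_true / w_total

print(likelihood_weighting(cloudy_net, ORDER, "Rain", {"Wet": True}))
```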

Likelihood weighting vs. rejection sampling
1. Both generate consistent estimates of the joint distribution conditioned on the values of the evidence variables.
2. Likelihood weighting converges faster to the correct probabilities.
3. But even likelihood weighting degrades with many evidence variables, because a few samples will have nearly all the total weight.

Sources
Prof. David Page
Matthew G. Lee
Nuria Oliver
Barbara Rosario
Alex Pentland
Ehrlich Av
Ronald J. Williams
Andrew Moore's tutorial with the same title
Russell & Norvig's AIMA site
Alpaydin's Introduction to Machine Learning site
