
Bayesian networks

Chapter 14.1–3


Outline

♦ Syntax

♦ Semantics

♦ Parameterized distributions
Bayesian networks

A simple, graphical notation for conditional independence assertions
and hence for compact specification of full joint distributions

Syntax:
a set of nodes, one per variable
a directed, acyclic graph (link ≈ “directly influences”)
a conditional distribution for each node given its parents:
P(Xi | Parents(Xi))

In the simplest case, the conditional distribution is represented as
a conditional probability table (CPT) giving the
distribution over Xi for each combination of parent values


Example

Topology of network encodes conditional independence assertions:

[Figure: Weather stands alone; Cavity is the parent of Toothache and Catch]

Weather is independent of the other variables
Toothache and Catch are conditionally independent given Cavity
Example

I’m at work, neighbor John calls to say my alarm is ringing, but neighbor
Mary doesn’t call. Sometimes it’s set off by minor earthquakes. Is there a
burglar?

Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls

Network topology reflects “causal” knowledge:
– A burglar can set the alarm off
– An earthquake can set the alarm off
– The alarm can cause Mary to call
– The alarm can cause John to call


Example contd.

[Figure: Burglary and Earthquake are the parents of Alarm; Alarm is the parent of JohnCalls and MaryCalls]

P(B) = .001        P(E) = .002

B E | P(A|B,E)
T T | .95
T F | .94
F T | .29
F F | .001

A | P(J|A)        A | P(M|A)
T | .90           T | .70
F | .05           F | .01
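The network and its CPTs fit in plain data structures. A minimal sketch in Python (the dict layout and function name are my own, not from the chapter): each node’s CPT maps a tuple of parent values to P(node = true), and P(node = false) is just 1 − p.

    # Burglary network from "Example contd.", kept as plain Python data.
    parents = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}
    cpt = {
        "B": {(): 0.001},
        "E": {(): 0.002},
        "A": {(True, True): 0.95, (True, False): 0.94,
              (False, True): 0.29, (False, False): 0.001},
        "J": {(True,): 0.90, (False,): 0.05},
        "M": {(True,): 0.70, (False,): 0.01},
    }

    def p_node(node, value, assignment):
        # P(node = value) given the parent values read from assignment
        key = tuple(assignment[p] for p in parents[node])
        p_true = cpt[node][key]
        return p_true if value else 1.0 - p_true

    print(p_node("A", True, {"B": True, "E": False}))  # 0.94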
Compactness

A CPT for Boolean Xi with k Boolean parents has
2^k rows for the combinations of parent values

Each row requires one number p for Xi = true
(the number for Xi = false is just 1 − p)

If each variable has no more than k parents,
the complete network requires O(n · 2^k) numbers

I.e., grows linearly with n, vs. O(2^n) for the full joint distribution

For the burglary net, 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 − 1 = 31)


Global semantics

Global semantics defines the full joint distribution
as the product of the local conditional distributions:

P(x1, . . . , xn) = Πi=1..n P(xi | parents(Xi))

e.g., P(j ∧ m ∧ a ∧ ¬b ∧ ¬e)
= P(j|a) P(m|a) P(a|¬b, ¬e) P(¬b) P(¬e)
= 0.9 × 0.7 × 0.001 × 0.999 × 0.998
≈ 0.00063
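As a quick check, the five factors come straight from the CPT entries on the “Example contd.” slide:

    # P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) as the product of the five local conditionals
    p = 0.90 * 0.70 * 0.001 * 0.999 * 0.998
    print(p)  # 0.000628..., i.e. the 0.00063 on the slide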
Local semantics

Local semantics: each node is conditionally independent
of its nondescendants given its parents

[Figure: node X with parents U1, . . . , Um; X’s children Y1, . . . , Yn; and the children’s other parents Z1j, . . . , Znj]

Theorem: Local semantics ⇔ global semantics


Markov blanket

Each node is conditionally independent of all others given its
Markov blanket: parents + children + children’s parents

[Figure: the same network, with X’s Markov blanket U1 . . . Um, Y1 . . . Yn, Z1j . . . Znj shaded]
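The blanket can be read off the graph mechanically. A small Python sketch (the parents table mirrors the burglary network; the helper name is illustrative):

    # Markov blanket = parents + children + children's other parents
    parents = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}

    def markov_blanket(x):
        children = [n for n, ps in parents.items() if x in ps]
        blanket = set(parents[x]) | set(children)
        for c in children:          # add the children's other parents
            blanket |= set(parents[c])
        blanket.discard(x)
        return blanket

    print(markov_blanket("A"))  # {'B', 'E', 'J', 'M'}
    print(markov_blanket("B"))  # {'E', 'A'}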
Constructing Bayesian networks

Need a method such that a series of locally testable assertions of
conditional independence guarantees the required global semantics

1. Choose an ordering of variables X1, . . . , Xn
2. For i = 1 to n
   add Xi to the network
   select parents from X1, . . . , Xi−1 such that
   P(Xi | Parents(Xi)) = P(Xi | X1, . . . , Xi−1)

This choice of parents guarantees the global semantics:

P(X1, . . . , Xn) = Πi=1..n P(Xi | X1, . . . , Xi−1)   (chain rule)
                 = Πi=1..n P(Xi | Parents(Xi))        (by construction)
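A sketch of this loop in Python, assuming a hypothetical oracle screens_off(x, cand, preds) that answers the test P(x | cand) = P(x | preds), e.g. an expert answering the questions on the next slide; no such function exists in any library. Trying candidate sets smallest-first is just one way to prefer compact parent sets:

    from itertools import combinations

    # Sketch of the construction loop, given a hypothetical
    # conditional-independence oracle screens_off(x, cand, preds).
    def build_network(order, screens_off):
        net = {}
        for i, x in enumerate(order):
            preds = order[:i]
            net[x] = next(list(cand)                     # smallest set first
                          for r in range(len(preds) + 1)
                          for cand in combinations(preds, r)
                          if screens_off(x, cand, preds))
        return net

With the ordering B, E, A, J, M and a faithful oracle, this would recover the burglary net’s parent sets.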
Example

Suppose we choose the ordering M, J, A, B, E and add the variables one at a
time, choosing as parents the smallest set of predecessors that screens each
new variable off from the rest:

P(J | M) = P(J)? No
P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No
P(B | A, J, M) = P(B | A)? Yes
P(B | A, J, M) = P(B)? No
P(E | B, A, J, M) = P(E | A)? No
P(E | B, A, J, M) = P(E | A, B)? Yes

[Figure: the resulting network: MaryCalls → JohnCalls; MaryCalls, JohnCalls → Alarm; Alarm → Burglary; Alarm, Burglary → Earthquake]
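The Yes/No answers can be checked by brute force against the original burglary network: enumerate all 32 atomic events, score each with the product of local conditionals, and compare the two conditional probabilities. A sketch (CPTs repeated from earlier so the snippet stands alone):

    from itertools import product

    # Is P(B | A, J, M) really equal to P(B | A)?
    parents = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}
    cpt = {"B": {(): 0.001}, "E": {(): 0.002},
           "A": {(True, True): 0.95, (True, False): 0.94,
                 (False, True): 0.29, (False, False): 0.001},
           "J": {(True,): 0.90, (False,): 0.05},
           "M": {(True,): 0.70, (False,): 0.01}}
    order = ["B", "E", "A", "J", "M"]

    def joint(a):                       # product of local conditionals
        prob = 1.0
        for x in order:
            p_true = cpt[x][tuple(a[u] for u in parents[x])]
            prob *= p_true if a[x] else 1.0 - p_true
        return prob

    def conditional(query, evidence):   # P(query | evidence) by enumeration
        num = den = 0.0
        for values in product([True, False], repeat=len(order)):
            a = dict(zip(order, values))
            if all(a[v] == b for v, b in evidence.items()):
                den += joint(a)
                if all(a[v] == b for v, b in query.items()):
                    num += joint(a)
        return num / den

    print(conditional({"B": True}, {"A": True, "J": True, "M": True}))  # ~0.374
    print(conditional({"B": True}, {"A": True}))                        # ~0.374, equal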
Example contd.

Deciding conditional independence is hard in noncausal directions

(Causal models and conditional independence seem hardwired for humans!)

Assessing conditional probabilities is hard in noncausal directions

Network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed
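The two parameter counts follow from the parent sets alone, since a Boolean node with k Boolean parents needs 2^k numbers:

    # One number per CPT row: 2^k parameters per node with k parents
    def n_params(parents):
        return sum(2 ** len(ps) for ps in parents.values())

    causal = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}
    noncausal = {"M": [], "J": ["M"], "A": ["J", "M"], "B": ["A"], "E": ["A", "B"]}
    print(n_params(causal), n_params(noncausal))  # 10 13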
Example: Car diagnosis

Initial evidence: car won’t start
Testable variables (green), “broken, so fix it” variables (orange)
Hidden variables (gray) ensure sparse structure, reduce parameters

[Figure: diagnosis network over battery age, alternator broken, fanbelt broken, battery dead, no charging, battery flat, no oil, no gas, fuel line blocked, starter broken, battery meter, lights, oil light, gas gauge, dipstick, and car won’t start]


Example: Car insurance

[Figure: insurance network over Age, SocioEcon, GoodStudent, ExtraCar, Mileage, RiskAversion, VehicleYear, SeniorTrain, DrivingSkill, MakeModel, DrivingHist, Antilock, DrivQuality, Airbag, CarValue, HomeBase, AntiTheft, Ruggedness, Accident, Theft, OwnDamage, Cushioning, MedicalCost, LiabilityCost, PropertyCost, OtherCost, and OwnCost]
Compact conditional distributions

CPT grows exponentially with number of parents
CPT becomes infinite with continuous-valued parent or child

Solution: canonical distributions that are defined compactly

Deterministic nodes are the simplest case:
X = f(Parents(X)) for some function f

E.g., Boolean functions:
NorthAmerican ⇔ Canadian ∨ US ∨ Mexican

E.g., numerical relationships among continuous variables:
∂Level/∂t = inflow + precipitation − outflow − evaporation


Compact conditional distributions contd.

Noisy-OR distributions model multiple noninteracting causes:
1) Parents U1, . . . , Uk include all causes (can add leak node)
2) Independent failure probability qi for each cause alone
⇒ P(X | U1, . . . , Uj, ¬Uj+1, . . . , ¬Uk) = 1 − Πi=1..j qi

Cold  Flu  Malaria | P(Fever) | P(¬Fever)
F     F    F       | 0.0      | 1.0
F     F    T       | 0.9      | 0.1
F     T    F       | 0.8      | 0.2
F     T    T       | 0.98     | 0.02 = 0.2 × 0.1
T     F    F       | 0.4      | 0.6
T     F    T       | 0.94     | 0.06 = 0.6 × 0.1
T     T    F       | 0.88     | 0.12 = 0.6 × 0.2
T     T    T       | 0.988    | 0.012 = 0.6 × 0.2 × 0.1

Number of parameters linear in number of parents
Hybrid (discrete+continuous) networks

Discrete (Subsidy? and Buys?); continuous (Harvest and Cost)

[Figure: Subsidy? and Harvest are the parents of Cost; Cost is the parent of Buys?]

Option 1: discretization (possibly large errors, large CPTs)
Option 2: finitely parameterized canonical families
1) Continuous variable, discrete+continuous parents (e.g., Cost)
2) Discrete variable, continuous parents (e.g., Buys?)


Continuous child variables

Need one conditional density function for child variable given continuous
parents, for each possible assignment to discrete parents

Most common is the linear Gaussian model, e.g.,:

P(Cost = c | Harvest = h, Subsidy? = true)
  = N(at h + bt, σt)(c)
  = (1 / (σt √(2π))) exp(−(1/2) ((c − (at h + bt)) / σt)^2)

Mean Cost varies linearly with Harvest, variance is fixed

Linear variation is unreasonable over the full range
but works OK if the likely range of Harvest is narrow
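A direct transcription of the density, with made-up parameter values (a_t, b_t, σ_t are not given in the chapter):

    import math

    # Linear Gaussian density N(a_t*h + b_t, sigma_t)(c);
    # the parameter values below are purely illustrative.
    def p_cost(c, h, a_t=-1.0, b_t=10.0, sigma_t=1.0):
        mu = a_t * h + b_t              # mean varies linearly with Harvest
        z = (c - mu) / sigma_t
        return math.exp(-0.5 * z * z) / (sigma_t * math.sqrt(2 * math.pi))

    print(p_cost(c=5.0, h=5.0))  # density at the mean: ~0.3989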
Continuous child variables contd.

[Figure: 3-D plot of the density P(Cost | Harvest, Subsidy? = true), with Cost and Harvest each running from 0 to 10]

All-continuous network with LG distributions
⇒ full joint distribution is a multivariate Gaussian

Discrete+continuous LG network is a conditional Gaussian network, i.e., a
multivariate Gaussian over all continuous variables for each combination of
discrete variable values


Discrete variable w/ continuous parents

Probability of Buys? given Cost should be a “soft” threshold:

[Figure: plot of P(Buys? = false | Cost = c), a soft threshold in c over the range 0 to 12]

Probit distribution uses integral of Gaussian:
Φ(x) = ∫−∞..x N(0, 1)(t) dt
P(Buys? = true | Cost = c) = Φ((−c + µ)/σ)
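The probit is easy to evaluate via the standard normal CDF. A sketch, with illustrative µ and σ (the slides do not fix their values):

    import math

    # Probit model: P(Buys? = true | Cost = c) = Phi((-c + mu) / sigma)
    def phi(x):  # standard normal CDF, via the error function
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    def p_buys_probit(c, mu=6.0, sigma=1.0):
        return phi((-c + mu) / sigma)

    print(p_buys_probit(6.0))  # 0.5 exactly at the threshold mu
    print(p_buys_probit(9.0))  # ~0.00135, three sigma past it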
Why the probit?

1. It’s sort of the right shape
2. Can view as hard threshold whose location is subject to noise

[Figure: Cost combined with Gaussian Noise passes through a hard threshold to determine Buys?]


Discrete variable contd.

Sigmoid (or logit) distribution also used in neural networks:

P(Buys? = true | Cost = c) = 1 / (1 + exp(−2 (−c + µ)/σ))

Sigmoid has similar shape to probit but much longer tails:

[Figure: plot of P(Buys? = false | Cost = c) for the sigmoid, over c from 0 to 12]
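The sigmoid version, with the same illustrative µ and σ as the probit sketch above; comparing the two shows the longer tails (at c = µ + 3σ the probit gives ≈ 0.0013 while the sigmoid still gives ≈ 0.0025):

    import math

    # Sigmoid model: P(Buys? = true | Cost = c) = 1/(1 + exp(-2(-c+mu)/sigma))
    def p_buys_sigmoid(c, mu=6.0, sigma=1.0):
        return 1.0 / (1.0 + math.exp(-2.0 * (-c + mu) / sigma))

    print(p_buys_sigmoid(6.0))  # 0.5 at the threshold
    print(p_buys_sigmoid(9.0))  # ~0.0025, vs ~0.00135 for the probit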
Summary

Bayes nets provide a natural representation for (causally induced)
conditional independence

Topology + CPTs = compact representation of joint distribution

Generally easy for (non)experts to construct

Canonical distributions (e.g., noisy-OR) = compact representation of CPTs

Continuous variables ⇒ parameterized distributions (e.g., linear Gaussian)