
# Leong YK & Wong WY Introduction to Statistical Decisions 1

# Chapter 5: Statistical Decision Problems
Decision problems are called statistical when data on the state of nature are available, data that hopefully contain information which can be used to make a better decision. The availability of data generally provides some illumination, so that in selecting an action one is not completely in the dark concerning the state of nature. In practice, however, one still risks taking a bad action if the information contained in the data is not intelligently utilized.

## 5.1 Data and the State of Nature

We assume that the data contain information about the true state of nature. For the general discussion, the notation X, denoting either a single random variable or a random vector, will be employed to refer to the data. X has a (joint) density function given by

$$f(x \mid \theta), \qquad x \in S_X,\ \theta \in \Theta.$$

If $\theta$ is treated as the value of a random variable $\Theta$, then the above density function is regarded as the conditional density function of X given that the state of nature is $\Theta = \theta$.

## 5.2 Decision Functions

Let $(\Theta, A, L)$ be a decision problem and X an observable random variable. A procedure for using the data as an aid in choosing an action will involve a rule, that is, a function that assigns an action to each possible observed value of X.

Decision rule
A function
$$d : S_X \to A$$
is called a nonrandomized decision rule.

Example 5.2.1
Suppose $(\Theta, A, L)$ is a decision problem with $A = \{a_1, a_2\}$, and suppose X is an observable random variable such that $S_X = \{x_1, x_2\}$. Then there are four distinct nonrandomized decision rules, as in the following table:

|           | $d_1$ | $d_2$ | $d_3$ | $d_4$ |
|-----------|-------|-------|-------|-------|
| $x = x_1$ | $a_2$ | $a_2$ | $a_1$ | $a_1$ |
| $x = x_2$ | $a_2$ | $a_1$ | $a_2$ | $a_1$ |

Example 5.2.2
Suppose $(\Theta, A, L)$ is a decision problem with $A = \{a_1, a_2\}$, and suppose X is an observable random variable such that $S_X = \{x_1, x_2, x_3\}$. Then there are eight distinct nonrandomized decision rules, as in the following table:

|           | $d_1$ | $d_2$ | $d_3$ | $d_4$ | $d_5$ | $d_6$ | $d_7$ | $d_8$ |
|-----------|-------|-------|-------|-------|-------|-------|-------|-------|
| $x = x_1$ | $a_2$ | $a_2$ | $a_2$ | $a_1$ | $a_2$ | $a_1$ | $a_1$ | $a_1$ |
| $x = x_2$ | $a_2$ | $a_2$ | $a_1$ | $a_2$ | $a_1$ | $a_2$ | $a_1$ | $a_1$ |
| $x = x_3$ | $a_2$ | $a_1$ | $a_2$ | $a_2$ | $a_1$ | $a_1$ | $a_2$ | $a_1$ |
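Since a nonrandomized decision rule is just a function from $S_X$ to A, the rules in the two examples above can be enumerated mechanically. A minimal Python sketch (the labels `x1`, `a1`, etc. are simply the names used in the examples):

```python
from itertools import product

def all_rules(sample_space, actions):
    """Enumerate every nonrandomized decision rule d: S_X -> A,
    each represented as a dict mapping observed value -> action."""
    for choice in product(actions, repeat=len(sample_space)):
        yield dict(zip(sample_space, choice))

# Example 5.2.1: |S_X| = 2, |A| = 2  ->  2^2 = 4 rules
rules2 = list(all_rules(["x1", "x2"], ["a1", "a2"]))
# Example 5.2.2: |S_X| = 3, |A| = 2  ->  2^3 = 8 rules
rules3 = list(all_rules(["x1", "x2", "x3"], ["a1", "a2"]))
print(len(rules2), len(rules3))  # 4 8
```

In general there are $|A|^{|S_X|}$ such rules, one per assignment of an action to each observable value.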

## 5.3 Risk Functions

When a decision function $d(x)$ is used, the loss incurred depends not only on the true state of nature $\theta$ but also on the value of the observable random variable X. Since X is random, the loss incurred,
$$L(\theta, d(X)),$$
is a random variable. The expected value of this random loss is called the risk function of the decision rule d. That is,

Risk Function
The risk function of the decision rule d, when the state of nature is $\theta$, is defined to be
$$R(\theta, d) = E[L(\theta, d(X))].$$

When the observed random variable X is incorporated in making decisions, the action space is expanded to the set of (nonrandomized) decision rules, denoted by D, and the no-data decision problem $(\Theta, A, L)$ is extended to the statistical decision problem, denoted by $(\Theta, D, R)$.
Example 5.3.1
Consider the decision problem with loss table given by

|            | $a_1$ | $a_2$ |
|------------|-------|-------|
| $\theta_1$ | 0     | 4     |
| $\theta_2$ | 4     | 0     |

Suppose that a random variable X is observed, with density function given by

|         | $f(x; \theta_1)$ | $f(x; \theta_2)$ |
|---------|------------------|------------------|
| $x = 0$ | 0.6              | 0.2              |
| $x = 1$ | 0.4              | 0.8              |

The set of nonrandomized decision rules is

|         | $d_1$ | $d_2$ | $d_3$ | $d_4$ |
|---------|-------|-------|-------|-------|
| $x = 0$ | $a_2$ | $a_2$ | $a_1$ | $a_1$ |
| $x = 1$ | $a_2$ | $a_1$ | $a_2$ | $a_1$ |

The risks of these four decision rules are listed below:

|                  | $d_1$ | $d_2$ | $d_3$ | $d_4$ |
|------------------|-------|-------|-------|-------|
| $R(\theta_1, d)$ | 4     | 2.4   | 1.6   | 0     |
| $R(\theta_2, d)$ | 0     | 3.2   | 0.8   | 4     |

The decision rules can also be represented graphically by means of their risk points.
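The risk table above can be reproduced directly from the definition $R(\theta, d) = \sum_x L(\theta, d(x)) f(x \mid \theta)$. A minimal sketch, with `"t1"`/`"t2"` standing in for $\theta_1$/$\theta_2$:

```python
# Example 5.3.1: losses L(theta, a), densities f(x | theta), and the four rules.
loss = {("t1", "a1"): 0, ("t1", "a2"): 4,
        ("t2", "a1"): 4, ("t2", "a2"): 0}
density = {("t1", 0): 0.6, ("t1", 1): 0.4,
           ("t2", 0): 0.2, ("t2", 1): 0.8}
rules = {"d1": {0: "a2", 1: "a2"}, "d2": {0: "a2", 1: "a1"},
         "d3": {0: "a1", 1: "a2"}, "d4": {0: "a1", 1: "a1"}}

def risk(theta, d):
    """R(theta, d) = sum over x of L(theta, d(x)) * f(x | theta)."""
    return sum(loss[theta, d[x]] * density[theta, x] for x in d)

for name, d in rules.items():
    print(name, risk("t1", d), risk("t2", d))
# reproduces the table (up to float rounding): d2 -> 2.4, 3.2; d3 -> 1.6, 0.8
```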

Remark
- Decision rules $d_1$ and $d_4$, which ignore the data, give risk points which are exactly the same as the corresponding loss points of the no-data problem.
- The straight line joining the risk points of $d_1$ and $d_4$ consists of the loss points (as regards risks) of the randomized actions mixing $a_2$ and $a_1$.
- An intelligent use of the data, such as decision rule $d_3$, can improve the expected losses.
- A foolish use of the data, such as decision rule $d_2$, can worsen them.

How many distinct nonrandomized decision rules are there in a statistical decision problem with n available actions in which the observed random variable has m possible values? (There are $n^m$, since each of the m possible values may be assigned any one of the n actions.)
Example 5.3.2
Consider a decision problem in which the loss matrix is given by

|            | $a_1$ | $a_2$ |
|------------|-------|-------|
| $\theta_1$ | 0     | 1     |
| $\theta_2$ | 6     | 5     |

Suppose that the observed random variable X takes three possible values, with density function given in the following table:

|                  | $x_1$ | $x_2$ | $x_3$ |
|------------------|-------|-------|-------|
| $f(x; \theta_1)$ | 0.6   | 0.3   | 0.1   |
| $f(x; \theta_2)$ | 0.1   | 0.4   | 0.5   |

The nonrandomized decision rules are tabulated as follows:

|       | $d_1$ | $d_2$ | $d_3$ | $d_4$ | $d_5$ | $d_6$ | $d_7$ | $d_8$ |
|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| $x_1$ | $a_2$ | $a_2$ | $a_2$ | $a_1$ | $a_2$ | $a_1$ | $a_1$ | $a_1$ |
| $x_2$ | $a_2$ | $a_2$ | $a_1$ | $a_2$ | $a_1$ | $a_2$ | $a_1$ | $a_1$ |
| $x_3$ | $a_2$ | $a_1$ | $a_2$ | $a_2$ | $a_1$ | $a_1$ | $a_2$ | $a_1$ |

The risks of these decision rules are given in the following table:

|                  | $d_1$ | $d_2$ | $d_3$ | $d_4$ | $d_5$ | $d_6$ | $d_7$ | $d_8$ |
|------------------|-------|-------|-------|-------|-------|-------|-------|-------|
| $R(\theta_1, d)$ | 1.0   | 0.9   | 0.7   | 0.4   | 0.6   | 0.3   | 0.1   | 0.0   |
| $R(\theta_2, d)$ | 5.0   | 5.5   | 5.4   | 5.1   | 5.9   | 5.6   | 5.5   | 6.0   |

These risk points can also be presented graphically.

The phenomenon observed in the above examples is typical:

The availability of data permits a reduction of losses if the data are wisely used: the set of risk points $(R(\theta_1, d), R(\theta_2, d))$ representing the various decisions is pulled in toward the origin, where decisions would be as perfect as they could be.

But it should be emphasized that for this to happen the data must contain some information about the state of nature. For, if the distribution of X is independent of $\theta$, say
$$f(x; \theta) = g(x),$$
then the risk of a given decision rule d is
$$R(\theta_i, d) = E[L(\theta_i, d(X))] = \sum_x L(\theta_i, d(x))\, g(x).$$
If the state space contains m elements, then
$$\begin{pmatrix} R(\theta_1, d) \\ R(\theta_2, d) \\ \vdots \\ R(\theta_m, d) \end{pmatrix} = \sum_x g(x) \begin{pmatrix} L(\theta_1, d(x)) \\ L(\theta_2, d(x)) \\ \vdots \\ L(\theta_m, d(x)) \end{pmatrix},$$
which is a convex combination of the loss points.

Example 5.3.3
Consider the decision problem stated in Example 5.3.2, with loss table given by

|            | $a_1$ | $a_2$ |
|------------|-------|-------|
| $\theta_1$ | 0     | 1     |
| $\theta_2$ | 6     | 5     |

Suppose now that the density function of the observed random variable X is the same under both states:

|                  | $x_1$ | $x_2$ | $x_3$ |
|------------------|-------|-------|-------|
| $f(x; \theta_1)$ | 0.6   | 0.3   | 0.1   |
| $f(x; \theta_2)$ | 0.6   | 0.3   | 0.1   |

The risk of a decision rule, say $d_2 = \begin{pmatrix} x_1 & x_2 & x_3 \\ a_2 & a_2 & a_1 \end{pmatrix}$, would be
$$R(\theta_1, d_2) = 1 \cdot 0.6 + 1 \cdot 0.3 + 0 \cdot 0.1 = 0.9,$$
$$R(\theta_2, d_2) = 5 \cdot 0.6 + 5 \cdot 0.3 + 6 \cdot 0.1 = 5.1,$$
or, written in vector form,
$$\begin{pmatrix} R(\theta_1, d_2) \\ R(\theta_2, d_2) \end{pmatrix} = 0.9 \begin{pmatrix} 1 \\ 5 \end{pmatrix} + 0.1 \begin{pmatrix} 0 \\ 6 \end{pmatrix}.$$
This is a convex combination of the loss points of the pure actions $a_2$ and $a_1$.
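The convex-combination claim is easy to check numerically; a small sketch for the rule $d_2$ above, with `"t1"`/`"t2"` standing in for $\theta_1$/$\theta_2$:

```python
# Example 5.3.3: the density is the same under both states, so the risk
# point of any rule is a convex combination of the loss points.
loss = {("t1", "a1"): 0, ("t1", "a2"): 1,
        ("t2", "a1"): 6, ("t2", "a2"): 5}
g = {"x1": 0.6, "x2": 0.3, "x3": 0.1}        # common density g(x)
d2 = {"x1": "a2", "x2": "a2", "x3": "a1"}    # rule d2 from the text

def risk(theta, d):
    return sum(loss[theta, d[x]] * g[x] for x in d)

point = (risk("t1", d2), risk("t2", d2))
# d2 plays a2 with probability g(x1) + g(x2) = 0.9 and a1 with probability 0.1:
combo = (0.9 * loss["t1", "a2"] + 0.1 * loss["t1", "a1"],
         0.9 * loss["t2", "a2"] + 0.1 * loss["t2", "a1"])
print(point, combo)  # both approximately (0.9, 5.1)
```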

## 5.4 Optimal Decision Rules

The problem of selecting a decision rule, with the knowledge of the risk function, is exactly the same as the problem of selecting an action: the risk function plays the role that the loss function plays in a no-data decision problem. The concepts of dominance and admissibility extend in the obvious way to decision rules.

We say that decision rule d is dominated by decision rule $d_0$ if
$$R(\theta, d_0) \le R(\theta, d) \quad \text{for all } \theta \in \Theta.$$
If the above inequality is strict for some state $\theta$, then decision rule d is said to be inadmissible. A decision rule which is not inadmissible is called admissible.

Example 5.4.1
Consider the decision problem stated in Example 5.3.1. The risk points of the nonrandomized decision rules are reproduced here for easy reference:

|                  | $d_1$ | $d_2$ | $d_3$ | $d_4$ |
|------------------|-------|-------|-------|-------|
| $R(\theta_1, d)$ | 4     | 2.4   | 1.6   | 0     |
| $R(\theta_2, d)$ | 0     | 3.2   | 0.8   | 4     |

Decision rule $d_2$ is inadmissible (it is dominated by $d_3$); the other three decision rules are admissible.
Minimax Principle
As in the no-data case, it is necessary to devise a scheme of preferences so that under this ordering one can select the most desirable decision rule.

Minimax Decision Rule
Decision rule $d_0$ is said to be a minimax decision rule if
$$\max_\theta R(\theta, d_0) \le \max_\theta R(\theta, d) \quad \text{for all } d \in D.$$


Example 5.4.2
The risk points of the decision rules considered in Example 5.4.1 are reproduced as follows:

|                            | $d_1$ | $d_2$ | $d_3$ | $d_4$ |
|----------------------------|-------|-------|-------|-------|
| $R(\theta_1, d)$           | 4     | 2.4   | 1.6   | 0     |
| $R(\theta_2, d)$           | 0     | 3.2   | 0.8   | 4     |
| $\max_\theta R(\theta, d)$ | 4     | 3.2   | 1.6   | 4     |

So $d_3$ is the minimax decision rule.
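The tabular search above amounts to one line of code; a minimal sketch:

```python
# Risk points from Example 5.4.2, as (R(theta1, d), R(theta2, d)).
risks = {"d1": (4.0, 0.0), "d2": (2.4, 3.2),
         "d3": (1.6, 0.8), "d4": (0.0, 4.0)}

# Minimax: the rule whose worst-case risk is smallest.
minimax_rule = min(risks, key=lambda name: max(risks[name]))
print(minimax_rule, max(risks[minimax_rule]))  # d3 1.6
```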

The result can also be obtained graphically: the minimax decision rule is found by moving a wedge, whose vertex is on the 45° line and whose sides are parallel to the coordinate axes, up to the set of risk points. As in the no-data decision problem, if one takes a minimax approach it can make a difference whether one uses regrets or losses.

Note that there are two ways to introduce the idea of regret: either
(a) apply it to the initial loss function and then average over the data (Loss → Regret → Expected Regret), or
(b) average the loss over the data first and then regretize the resulting risk (Loss → Risk → Regretized Risk).

Recall that the regret function is defined as
$$L_r(\theta, a_i) = L(\theta, a_i) - \min_{a \in A} L(\theta, a).$$
Note that $\min_{a \in A} L(\theta, a)$ depends solely on the state of nature. So for each observed value $X = x$,
$$L_r(\theta, d(x)) = L(\theta, d(x)) - \min_{a \in A} L(\theta, a).$$
Thus the expected regret is
$$E[L_r(\theta, d(X))] = R(\theta, d) - \min_{a \in A} L(\theta, a).$$
Now, for any decision rule d,
$$\min_{a \in A} L(\theta, a) \le L(\theta, d(x)) \quad \text{for all } x.$$
Therefore,
$$\min_{a \in A} L(\theta, a) \le E[L(\theta, d(X))] = R(\theta, d) \quad \text{for all } d,$$
and hence
$$\min_{a \in A} L(\theta, a) \le \min_{d \in D} R(\theta, d).$$
On the other hand, since $A \subset D$ (each action a may be identified with the constant rule $d^*(x) \equiv a$) and $R(\theta, d^*) = L(\theta, a)$ for such a rule,
$$\min_{d \in D} R(\theta, d) \le L(\theta, a) \quad \text{for all } a \in A,$$
and therefore
$$\min_{d \in D} R(\theta, d) = \min_{a \in A} L(\theta, a).$$
Consequently,
$$R_r(\theta, d) = E[L_r(\theta, d(X))] = R(\theta, d) - \min_{d' \in D} R(\theta, d').$$
This shows that the expected regret is the same as the regretized risk: the two routes (a) and (b) agree.

Example 5.4.3
Reconsider the statistical decision problem of Example 5.3.2. The risk functions of the nonrandomized decision rules were tabulated as follows:

|                  | $d_1$ | $d_2$ | $d_3$ | $d_4$ | $d_5$ | $d_6$ | $d_7$ | $d_8$ |
|------------------|-------|-------|-------|-------|-------|-------|-------|-------|
| $R(\theta_1, d)$ | 1.0   | 0.9   | 0.7   | 0.4   | 0.6   | 0.3   | 0.1   | 0.0   |
| $R(\theta_2, d)$ | 5.0   | 5.5   | 5.4   | 5.1   | 5.9   | 5.6   | 5.5   | 6.0   |

By regretizing the above risks (subtracting $\min_d R(\theta, d)$, namely 0 for $\theta_1$ and 5 for $\theta_2$), we obtain the expected regrets:

|                          | $d_1$ | $d_2$ | $d_3$ | $d_4$ | $d_5$ | $d_6$ | $d_7$ | $d_8$ |
|--------------------------|-------|-------|-------|-------|-------|-------|-------|-------|
| $E[L_r(\theta_1, d(X))]$ | 1.0   | 0.9   | 0.7   | 0.4   | 0.6   | 0.3   | 0.1   | 0.0   |
| $E[L_r(\theta_2, d(X))]$ | 0.0   | 0.5   | 0.4   | 0.1   | 0.9   | 0.6   | 0.5   | 1.0   |

It follows that $d_4$ is the minimax nonrandomized regret decision rule. The minimax randomized regret decision rule can be obtained graphically by moving the wedge with vertex on the 45° line up to the set of expected regret points.
It is clear that the minimax randomized regret rule is a mixture of $d_4$ and $d_7$, namely
$$\tilde{p} = \begin{pmatrix} d_1 & d_2 & d_3 & d_4 & d_5 & d_6 & d_7 & d_8 \\ 0 & 0 & 0 & p & 0 & 0 & 1-p & 0 \end{pmatrix},$$
or simply denoted as
$$\tilde{p} = (0, 0, 0, p, 0, 0, 1-p, 0).$$
The value of p is obtained by equating $E[L_r(\theta_1, \tilde{p})] = E[L_r(\theta_2, \tilde{p})]$:
$$0.4p + 0.1(1 - p) = 0.1p + 0.5(1 - p)$$
$$3p + 1 = 5 - 4p$$
$$p = 4/7.$$
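The algebra for p can be checked with exact rational arithmetic; a minimal sketch using the expected-regret points of $d_4$ and $d_7$ from the table above:

```python
from fractions import Fraction as F

# Expected-regret points (theta1, theta2) of d4 and d7.
r4 = (F(4, 10), F(1, 10))
r7 = (F(1, 10), F(5, 10))

# Mix p*d4 + (1-p)*d7 and equate the two coordinates:
# p*r4[0] + (1-p)*r7[0] = p*r4[1] + (1-p)*r7[1].
p = (r7[1] - r7[0]) / ((r4[0] - r7[0]) + (r7[1] - r4[1]))
common = p * r4[0] + (1 - p) * r7[0]   # the equalized regret
print(p, common)  # 4/7 19/70
```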

Bayes Principle
Another scheme for ordering decision rules is to assign prior probabilities $\pi(\theta)$ to the various states of nature and determine the average risk over these states.

Bayes Decision Rule
Suppose that $\pi(\theta) = P(\Theta = \theta)$ is the prior probability of the state of nature $\theta$. Then the Bayes risk of the decision rule d in a statistical decision problem $(\Theta, D, R)$ is defined to be
$$R(\pi, d) = \sum_\theta R(\theta, d)\, \pi(\theta).$$
Decision rule $d_0$ is called a Bayes decision rule against $\pi$ if
$$R(\pi, d_0) \le R(\pi, d) \quad \text{for all } d \in D.$$

Posterior Distribution
The Bayes approach to selecting an optimal decision rule involves the assumption that the state of nature is random, with probability function $\pi(\theta)$. The probability function of the observed random variable X is then regarded as the conditional distribution given $\Theta = \theta$ and is written $P(X = x \mid \Theta = \theta)$, or simply $f(x \mid \theta)$. We denote the conditional distribution of $\Theta$ given $X = x$ by $\pi(\theta \mid x)$. Note that
$$\pi(\theta \mid x)\, P(X = x) = P(X = x \mid \Theta = \theta)\, \pi(\theta). \tag{*}$$
In fact, both sides of the above equation represent the joint probability of X and $\Theta$. We call the conditional distribution $\pi(\theta \mid x)$ the posterior distribution of $\Theta$ given $X = x$. Considered as a function of $\theta$ with x fixed, $\pi(\theta \mid x)$ is proportional to the product $P(X = x \mid \Theta = \theta)\, \pi(\theta)$, and we write
$$\pi(\theta \mid x) \propto P(X = x \mid \Theta = \theta)\, \pi(\theta). \tag{**}$$

Example 5.4.4
Suppose that the conditional probabilities of X given $\Theta = \theta$ are
$$P(X = x_1 \mid \Theta = \theta_1) = 1/4 = 1 - P(X = x_2 \mid \Theta = \theta_1),$$
$$P(X = x_1 \mid \Theta = \theta_2) = 2/3 = 1 - P(X = x_2 \mid \Theta = \theta_2),$$
and suppose the prior probability function of $\Theta$ is given by
$$P(\Theta = \theta_1) = \pi(\theta_1) = w, \qquad P(\Theta = \theta_2) = \pi(\theta_2) = 1 - w, \quad 0 \le w \le 1.$$
Find the posterior distribution of $\Theta$ given $X = x$.

For $X = x_1$,
$$\pi(\theta_1 \mid x_1) = \frac{3w}{8 - 5w}, \qquad \pi(\theta_2 \mid x_1) = 1 - \pi(\theta_1 \mid x_1) = \frac{8 - 8w}{8 - 5w}.$$
For $X = x_2$,
$$\pi(\theta_1 \mid x_2) = \frac{9w}{4 + 5w}, \qquad \pi(\theta_2 \mid x_2) = \frac{4 - 4w}{4 + 5w}.$$
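The posterior formulas follow from (**) and can be verified for any particular w with exact arithmetic. A minimal sketch (`"t1"`/`"t2"` stand in for $\theta_1$/$\theta_2$), here at w = 1/2:

```python
from fractions import Fraction as F

def posterior(prior, lik, x):
    """pi(theta | x) proportional to P(x | theta) * pi(theta), normalized."""
    unnorm = {t: lik[t][x] * prior[t] for t in prior}
    z = sum(unnorm.values())
    return {t: u / z for t, u in unnorm.items()}

w = F(1, 2)
prior = {"t1": w, "t2": 1 - w}
lik = {"t1": {"x1": F(1, 4), "x2": F(3, 4)},
       "t2": {"x1": F(2, 3), "x2": F(1, 3)}}

print(posterior(prior, lik, "x1"))  # pi(t1 | x1) = 3w/(8-5w) = 3/11
print(posterior(prior, lik, "x2"))  # pi(t1 | x2) = 9w/(4+5w) = 9/13
```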

Successive Observations
If an observation can alter prior odds to posterior odds, it would seem that a further observation, applied to the first posterior distribution as though it were a prior, should result in yet another posterior distribution.

It is of interest to know whether, if a posterior distribution is used as a prior distribution with new data, the resulting posterior distribution is the same as if one had waited until all the data were at hand and used them with the original prior distribution to form a final posterior distribution.

We shall give an affirmative answer by verifying the case in which $X_1$ and $X_2$ are independent observations with density functions $f_1(x_1 \mid \theta)$ and $f_2(x_2 \mid \theta)$, respectively.

Let $\pi(\theta)$ be a prior density function of $\Theta$. Then
$$\pi(\theta \mid x_1) \propto f_1(x_1 \mid \theta)\, \pi(\theta),$$
so the posterior density function of $\Theta$ given $X_1 = x_1$ is
$$\pi(\theta \mid x_1) = c_1(x_1)\, f_1(x_1 \mid \theta)\, \pi(\theta).$$
Regarding $\pi_1(\theta) = \pi(\theta \mid x_1)$ as the new prior density function of $\Theta$, the posterior density function of $\Theta$ when $X_2 = x_2$ becomes available is
$$\pi_2(\theta \mid x_2) \propto f_2(x_2 \mid \theta)\, \pi_1(\theta) \propto f_2(x_2 \mid \theta)\, f_1(x_1 \mid \theta)\, \pi(\theta).$$
Since $f_1(x_1 \mid \theta) f_2(x_2 \mid \theta)$ is the joint density function of $X_1$ and $X_2$, this shows that $\pi_2(\theta \mid x_2)$ is the posterior density of $\Theta$ given the observed vector $(x_1, x_2)$.
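The sequential-versus-batch equivalence can be checked directly with exact arithmetic; a minimal sketch reusing the likelihood values of Example 5.4.4, with an assumed prior w = 2/5:

```python
from fractions import Fraction as F

def update(prior, lik_at_x):
    """One Bayes update: pi(theta) -> pi(theta | x), given P(x | theta)."""
    unnorm = {t: lik_at_x[t] * prior[t] for t in prior}
    z = sum(unnorm.values())
    return {t: u / z for t, u in unnorm.items()}

prior = {"t1": F(2, 5), "t2": F(3, 5)}
f1 = {"t1": F(1, 4), "t2": F(2, 3)}   # P(X1 = x1 | theta)
f2 = {"t1": F(3, 4), "t2": F(1, 3)}   # P(X2 = x2 | theta)

# Sequential: update on x1, then treat the posterior as the prior for x2.
sequential = update(update(prior, f1), f2)
# Batch: one update on the joint likelihood f1 * f2 (by independence).
batch = update(prior, {t: f1[t] * f2[t] for t in prior})
print(sequential == batch, sequential["t1"])  # True 9/25
```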

Bayes Decision Rules
Recall that the Bayes risk of a decision rule d against the prior probability function $\pi$ is
$$R(\pi, d) = E_\pi[R(\Theta, d)] = \sum_\theta R(\theta, d)\, \pi(\theta).$$
Now
$$R(\pi, d) = \sum_\theta \sum_x L(\theta, d(x))\, f(x \mid \theta)\, \pi(\theta) = \sum_x \Big[ \sum_\theta L(\theta, d(x))\, \pi(\theta \mid x) \Big] f(x), \tag{*}$$
where $f(x)$ represents the marginal density function of X.

It follows from (*) that a Bayes decision rule $d_0$ can be constructed pointwise: for each observed value x, choose $d_0(x)$ such that
$$\sum_\theta L(\theta, d_0(x))\, \pi(\theta \mid x) \le \sum_\theta L(\theta, d(x))\, \pi(\theta \mid x) \quad \text{for all } d \in D.$$

Example 5.4.5
Consider a decision problem with loss matrix

|            | $a_1$ | $a_2$ |
|------------|-------|-------|
| $\theta_1$ | 0     | 8     |
| $\theta_2$ | 4     | 0     |

Suppose that the statistician can observe a random variable X with the following conditional distributions:
$$P(X = 0 \mid \Theta = \theta_1) = 3/4, \qquad P(X = 0 \mid \Theta = \theta_2) = 1/3,$$
$$P(X = 1 \mid \Theta = \theta_1) = 1/4, \qquad P(X = 1 \mid \Theta = \theta_2) = 2/3.$$
It is required to construct a Bayes decision rule against the following prior distribution of $\Theta$:
$$\pi: \quad P(\Theta = \theta_1) = w, \quad P(\Theta = \theta_2) = 1 - w, \quad 0 \le w \le 1.$$

Construction of the Posterior Distribution
For $x = 0$,
$$\pi(\theta_1 \mid 0) \propto P(X = 0 \mid \Theta = \theta_1)\, P(\Theta = \theta_1) = 3w/4,$$
$$\pi(\theta_2 \mid 0) \propto P(X = 0 \mid \Theta = \theta_2)\, P(\Theta = \theta_2) = (1 - w)/3.$$
This implies that
$$\pi(\theta_1 \mid 0) = \frac{3w/4}{3w/4 + (1 - w)/3} = \frac{9w}{9w + 4(1 - w)}, \qquad \pi(\theta_2 \mid 0) = \frac{4(1 - w)}{9w + 4(1 - w)}.$$
The posterior expected losses at $x = 0$ are
$$\sum_\theta L(\theta, a_1)\, \pi(\theta \mid 0) = 0 \cdot \pi(\theta_1 \mid 0) + 4\, \pi(\theta_2 \mid 0) = \frac{16(1 - w)}{9w + 4(1 - w)},$$
$$\sum_\theta L(\theta, a_2)\, \pi(\theta \mid 0) = 8\, \pi(\theta_1 \mid 0) + 0 \cdot \pi(\theta_2 \mid 0) = \frac{72w}{9w + 4(1 - w)}.$$

Therefore, $d(0) = a_1$ iff $16(1 - w) \le 72w$, i.e., iff $w \ge 2/11$.

Similarly, for $x = 1$,
$$\pi(\theta_1 \mid 1) \propto P(X = 1 \mid \Theta = \theta_1)\, P(\Theta = \theta_1) = w/4,$$
$$\pi(\theta_2 \mid 1) \propto P(X = 1 \mid \Theta = \theta_2)\, P(\Theta = \theta_2) = 2(1 - w)/3,$$
so that
$$\pi(\theta_1 \mid 1) = \frac{w/4}{w/4 + 2(1 - w)/3} = \frac{3w}{3w + 8(1 - w)}, \qquad \pi(\theta_2 \mid 1) = \frac{8(1 - w)}{3w + 8(1 - w)}.$$
Therefore, $d(1) = a_1$ iff $32(1 - w) \le 24w$, i.e., iff $w \ge 4/7$.

Conclusion: the Bayes decision rule is
$$d_0 = \begin{cases} d_1, & 0 \le w \le 2/11 \\ d_3, & 2/11 \le w \le 4/7 \\ d_4, & 4/7 \le w \le 1 \end{cases}$$
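The three regimes of the conclusion can be confirmed computationally; a minimal sketch (`"t1"`/`"t2"` stand in for $\theta_1$/$\theta_2$) that picks the action minimizing posterior expected loss (the normalizing constant cancels, so unnormalized posterior weights suffice):

```python
from fractions import Fraction as F

loss = {("t1", "a1"): 0, ("t1", "a2"): 8,
        ("t2", "a1"): 4, ("t2", "a2"): 0}
lik = {"t1": {0: F(3, 4), 1: F(1, 4)},
       "t2": {0: F(1, 3), 1: F(2, 3)}}

def bayes_action(x, w):
    """Minimize unnormalized posterior expected loss at X = x."""
    post = {"t1": lik["t1"][x] * w, "t2": lik["t2"][x] * (1 - w)}
    return min(("a1", "a2"),
               key=lambda a: sum(loss[t, a] * post[t] for t in post))

for w in (F(1, 10), F(1, 3), F(3, 4)):     # one w inside each regime
    print(w, {x: bayes_action(x, w) for x in (0, 1)})
# 1/10 -> d1 = (a2, a2);  1/3 -> d3 = (a1, a2);  3/4 -> d4 = (a1, a1)
```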

|         | $d_1$ | $d_2$ | $d_3$ | $d_4$ |
|---------|-------|-------|-------|-------|
| $x = 0$ | $a_2$ | $a_2$ | $a_1$ | $a_1$ |
| $x = 1$ | $a_2$ | $a_1$ | $a_2$ | $a_1$ |

Note that the risk points of these decision rules are

|                  | $d_1$ | $d_2$ | $d_3$ | $d_4$ |
|------------------|-------|-------|-------|-------|
| $R(\theta_1, d)$ | 8     | 6     | 2     | 0     |
| $R(\theta_2, d)$ | 0     | 8/3   | 4/3   | 4     |

A Bayes decision rule with constant risk is minimax.

Example 5.4.6
In the above example (Example 5.4.5), the (randomized) decision rule with constant risk is a mixture of $d_3$ and $d_4$. The slope of the line joining the risk points of $d_3$ and $d_4$ is
$$m = \frac{4 - 4/3}{0 - 2} = -\frac{4}{3}.$$
Hence the prior vector $\pi = \langle w, 1 - w \rangle$ which is perpendicular to the vector joining the risk points of $d_3$ and $d_4$ satisfies
$$\frac{1 - w}{w} = \frac{3}{4}, \quad \text{or} \quad w = \frac{4}{7}.$$
This implies that the randomized decision rule $d^*$ with constant risk is Bayes against the prior distribution
$$\pi: \quad P(\Theta = \theta_1) = 4/7, \quad P(\Theta = \theta_2) = 3/7.$$
In fact, the randomized minimax decision rule is found to be
$$d^* = (0, 0, 6/7, 1/7).$$
Notice that $d^*$ is the only admissible decision rule with constant risk (see the following figure).
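The constant-risk property of the mixture $(0, 0, 6/7, 1/7)$ can be verified directly from the risk points of $d_3$ and $d_4$; a minimal sketch:

```python
from fractions import Fraction as F

r3 = (F(2), F(4, 3))      # risk point of d3 from Example 5.4.5
r4 = (F(0), F(4))         # risk point of d4

q = F(6, 7)               # weight on d3 in the mixture (0, 0, 6/7, 1/7)
mix = tuple(q * a + (1 - q) * b for a, b in zip(r3, r4))
print(mix)  # (Fraction(12, 7), Fraction(12, 7)): constant risk 12/7
```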

Some answers as to when a Bayes decision rule is admissible are given in the following assertions:

(a) If, for a given prior distribution $\pi$, $d_0$ is the unique Bayes decision rule against $\pi$, then $d_0$ is admissible.

(b) Suppose that $\Theta = \{\theta_1, \ldots, \theta_n\}$ and $d_0$ is Bayes against a prior distribution with $P(\Theta = \theta_i) = p_i > 0$ for all $i = 1, \ldots, n$. Then $d_0$ is admissible.

## 5.5 Sufficiency

It is common practice for a statistician, when confronted with a mass of data, to compute some simple measure from the data and then base statistical procedures on that simpler quantity. Computing such simpler measures is called reducing the data; the measures themselves are called statistics.

A question arises naturally: how much reduction of the data is possible without losing information regarding the state of nature?

Sufficient Statistic
A statistic $T = t(\tilde{X})$ is said to be sufficient for a family of density functions $\{f(\cdot \mid \theta);\ \theta \in \Theta\}$ if $\pi(\theta \mid \tilde{x}_1) = \pi(\theta \mid \tilde{x}_2)$ for every prior distribution of $\Theta$ and for any two data sets $\tilde{x}_1$ and $\tilde{x}_2$ of the same size from the family with $t(\tilde{x}_1) = t(\tilde{x}_2)$.

Factorization Theorem
Suppose that $f(\tilde{x} \mid \theta)$ represents the joint density function of the observed random vector $\tilde{X}$. The statistic $T = t(\tilde{X})$ is sufficient for $\theta$ if and only if
$$f(\tilde{x} \mid \theta) = g(t(\tilde{x}); \theta)\, h(\tilde{x}),$$
where g depends on $\tilde{x}$ only through $t(\tilde{x})$ and h does not depend on $\theta$.

The original definition of sufficiency was proposed by Fisher in the early 1920s:

A statistic $T = t(\tilde{X})$ is said to be sufficient for the family of density functions $\{f(\cdot; \theta) : \theta \in \Theta\}$ if the conditional distribution of $\tilde{X}$ given $T = t$ does not depend on $\theta$.
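Fisher's definition can be checked concretely for a Bernoulli sample of size 2 with $T = X_1 + X_2$ (the setting of the example at the end of this section); a minimal sketch with exact arithmetic:

```python
from fractions import Fraction as F
from itertools import product

def joint(x, theta):
    """P(X1 = x[0], X2 = x[1] | theta) for an i.i.d. Bernoulli(theta) pair."""
    return (theta if x[0] else 1 - theta) * (theta if x[1] else 1 - theta)

def conditional_given_t(t, theta):
    """Conditional distribution of (X1, X2) given T = X1 + X2 = t."""
    support = [x for x in product((0, 1), repeat=2) if sum(x) == t]
    z = sum(joint(x, theta) for x in support)
    return {x: joint(x, theta) / z for x in support}

# The theta-dependence cancels: T = X1 + X2 is sufficient.
for theta in (F(1, 4), F(1, 2), F(9, 10)):
    print(conditional_given_t(1, theta))  # always {(0, 1): 1/2, (1, 0): 1/2}
```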

Use of a Sufficient Statistic in a Decision Problem
The reason for performing an experiment whose distribution depends on $\theta$ is to learn about $\theta$; if the distribution does not depend on $\theta$, there is no point in performing the experiment. This shows, intuitively, why nothing is lost if the data $\tilde{X}$ are reduced to a sufficient statistic T.

More precisely, given any decision rule $d(\tilde{X})$, there is a decision rule $\delta(T)$ for which $R(\theta, \delta) = R(\theta, d)$. The concept of sufficiency is useful because it allows us to focus on decision rules that are functions of sufficient statistics. We shall illustrate this fact by example (see Example 5.6.1) and state the theoretical result, without proof, below.

Given any decision rule $d(\tilde{x})$, there is a decision rule $\delta(t)$, depending only on the sufficient statistic $T = t(\tilde{X})$, such that
$$R(\theta, d) = R(\theta, \delta).$$

Let $d(\tilde{x})$ be any decision rule. Define a decision rule $\delta$ as follows: suppose $T = t$ is observed; then draw an observation $\tilde{x}^*$ from the conditional distribution of $\tilde{X}$ given $T = t$ (which, by sufficiency, does not depend on $\theta$), and take action $d(\tilde{x}^*)$.

Note that even if $d(\tilde{x})$ is a nonrandomized decision rule, $\delta(t)$ is in general a randomized decision rule, since the action taken depends on $\tilde{x}^*$, and $\tilde{x}^*$ comes about by performing the auxiliary experiment $\tilde{X} \mid T = t$. Thus, for given $\theta$ and $T = t$, the loss of $\delta$ is random, and what is relevant is the expected loss:
$$L(\theta, \delta(t)) = E[L(\theta, d(\tilde{X})) \mid T = t] = \sum_{\tilde{x}: t(\tilde{x}) = t} L(\theta, d(\tilde{x}))\, P(\tilde{X} = \tilde{x} \mid T = t).$$

The risk function using $\delta(T)$ is then
$$R(\theta, \delta) = E[L(\theta, \delta(T))] = \sum_t L(\theta, \delta(t))\, P(T = t)$$
$$= \sum_t \sum_{\tilde{x}: t(\tilde{x}) = t} L(\theta, d(\tilde{x}))\, P(\tilde{X} = \tilde{x} \mid T = t)\, P(T = t)$$
$$= \sum_{\tilde{x}} L(\theta, d(\tilde{x}))\, P(\tilde{X} = \tilde{x}) = R(\theta, d).$$


Example 5.6.1
Consider a decision problem $(\Theta, A, L)$ in which $A = \{a_1, a_2\}$. Let $X_1$ and $X_2$ be a random sample of size 2 from a Bernoulli distribution with parameter $\theta$. Then $T = X_1 + X_2$ is sufficient for $\theta$; moreover, T is a binomial random variable with parameters $(2, \theta)$. Consider the rule
$$d(\tilde{x}) = \begin{cases} a_1, & (x_1, x_2) = (0, 0) \text{ or } (0, 1) \\ a_2, & (x_1, x_2) = (1, 0) \text{ or } (1, 1) \end{cases}$$
and define
$$\delta(t) = \begin{cases} a_1, & t = 0 \\ a_2, & t = 2. \end{cases}$$
If $t = 1$, toss a fair coin once and set
$$\delta(1) = \begin{cases} a_1, & \text{if a Head occurs} \\ a_2, & \text{if a Tail occurs.} \end{cases}$$
Then the risk function of $\delta$ is
$$R(\theta, \delta) = E[L(\theta, \delta(T))]$$
$$= L(\theta, a_1)\, P(T = 0) + L(\theta, \delta(1))\, P(T = 1) + L(\theta, a_2)\, P(T = 2)$$
$$= L(\theta, a_1)(1 - \theta)^2 + \tfrac{1}{2}[L(\theta, a_1) + L(\theta, a_2)]\, 2\theta(1 - \theta) + L(\theta, a_2)\, \theta^2$$
$$= L(\theta, a_1)\{(1 - \theta)^2 + \theta(1 - \theta)\} + L(\theta, a_2)\{\theta(1 - \theta) + \theta^2\}$$
$$= L(\theta, a_1)\{P(\tilde{X} = (0, 0)) + P(\tilde{X} = (0, 1))\} + L(\theta, a_2)\{P(\tilde{X} = (1, 1)) + P(\tilde{X} = (1, 0))\}$$
$$= R(\theta, d).$$
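The identity $R(\theta, \delta) = R(\theta, d)$ can also be checked numerically for this example. A minimal sketch, with hypothetical loss values $L(\theta, a_1) = 1$ and $L(\theta, a_2) = 3$ chosen for illustration (the identity holds whatever the losses are):

```python
from itertools import product

loss = {"a1": 1.0, "a2": 3.0}   # hypothetical theta-independent losses

def prob(x, theta):
    """P(X1 = x[0], X2 = x[1] | theta) for the Bernoulli(theta) sample."""
    return (theta if x[0] else 1 - theta) * (theta if x[1] else 1 - theta)

def d(x):
    """The rule from Example 5.6.1: a1 when x1 = 0, a2 when x1 = 1."""
    return "a1" if x[0] == 0 else "a2"

def risk_d(theta):
    return sum(loss[d(x)] * prob(x, theta) for x in product((0, 1), repeat=2))

def risk_delta(theta):
    """delta: a1 if T = 0, a2 if T = 2, a fair coin toss if T = 1."""
    p0, p1, p2 = (1 - theta) ** 2, 2 * theta * (1 - theta), theta ** 2
    return (loss["a1"] * p0
            + 0.5 * (loss["a1"] + loss["a2"]) * p1
            + loss["a2"] * p2)

for theta in (0.2, 0.5, 0.9):
    print(theta, risk_d(theta), risk_delta(theta))  # the two risks agree
```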