
# Minimax Hypothesis Testing

05-February-2009

## Worcester Polytechnic Institute D. Richard Brown III 05-February-2009 1 / 21

## r(δ, π0) = π0 R0(δ) + (1 − π0)R1(δ)

[Figure: the conditional risks R0(δ) and R1(δ) and the Bayes risk r(δ, π0) plotted against the prior π0. The Bayes risk of a fixed rule δ is linear in π0, running from R1(δ) at π0 = 0 to R0(δ) at π0 = 1; for a design prior π0′, the Bayes rule δBπ0′ achieves the minimum risk r(δBπ0′, π0′).]

## Least Favorable Prior State Distribution

[Figure: the minimum Bayes risk r(δBπ0, π0) as a function of the prior π0 peaks at the least favorable prior πlf. The risk line of δBπlf is horizontal there, i.e. R0(δBπlf) = R1(δBπlf) = r(δBπlf, π0) for all π0.]

Definition:

## ρmm := arg minρ maxj Rj(ρ)

Remarks:
◮ No single decision rule minimizes the weighted-average (Bayes) risk for
every possible prior state distribution.
◮ A conservative approach is to minimize the worst case risk over all
possible prior state distributions.
◮ Intuitively, there should be a least favorable prior. Does it always
exist? Is it unique?
◮ Intuitively, the minimax decision rule should be the Bayesian decision
rule with constant Bayesian risk over the priors. Is this always true?
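The worst-case criterion in the definition can be illustrated numerically. The sketch below searches a finite set of deterministic rules; the rule names and conditional risk pairs (R0, R1) are hypothetical illustrative values, not taken from the lecture.

```python
# Minimal sketch of the minimax criterion over a finite set of
# deterministic decision rules. The (R0, R1) pairs are hypothetical.
risks = {
    "rule 1": (1.0, 0.6),
    "rule 2": (0.0, 1.0),
    "rule 3": (0.5, 0.6),
    "rule 4": (0.5, 1.0),
}

def worst_case(rule):
    # max_j Rj(rule): the quantity the minimax criterion minimizes
    return max(risks[rule])

rho_mm = min(risks, key=worst_case)
print(rho_mm, worst_case(rho_mm))  # rule 3 attains the smallest worst case
```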


## Minimum Bayesian Risk as a Function of the Prior

Let V(π) := r(δBπ, π) be the minimum Bayesian risk for the prior π.

Theorem
The minimum Bayesian risk V(π) is concave and continuous over the space of priors satisfying πj ≥ 0, j = 0, 1, . . . , N − 1, and Σj πj = 1. Hence, there exists a least favorable prior πlf := arg maxπ V(π).


## Concavity of the Minimum Bayesian Risk

A function f is concave if, for any {x, y} in the domain of f and any α ∈ [0, 1], f(αx + (1 − α)y) ≥ αf(x) + (1 − α)f(y).

Denote a pair of priors as π and π′ and a third prior π′′ = απ + (1 − α)π′. We can write

V(π′′) = π′′⊤ R(δBπ′′)
       = απ⊤ R(δBπ′′) + (1 − α)π′⊤ R(δBπ′′)
       ≥ απ⊤ R(δBπ) + (1 − α)π′⊤ R(δBπ′)
       = αV(π) + (1 − α)V(π′)

where the inequality holds because δBπ minimizes the Bayes risk for the prior π (and likewise δBπ′ for π′). Hence V(π) is concave.
[Figure: the concave curve V(π0), with V(π0′′) at the intermediate prior π0′′ = απ0 + (1 − α)π0′ lying above the chord connecting V(π0) and V(π0′).]
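The concavity argument can be checked numerically. The sketch below builds V(π0) for a hypothetical finite set of decision rules, so V is a minimum of linear functions of the prior, and verifies the midpoint inequality on a grid.

```python
import itertools

# V(pi0) = min over rules of the linear Bayes risk pi0*R0 + (1-pi0)*R1.
# The (R0, R1) pairs are hypothetical; any finite set works.
risks = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.0), (0.5, 1.0)]

def V(pi0):
    return min(pi0 * r0 + (1 - pi0) * r1 for r0, r1 in risks)

# Check f((x + y)/2) >= (f(x) + f(y))/2 over a grid of prior pairs.
grid = [i / 20 for i in range(21)]
concave = all(
    V((x + y) / 2) >= (V(x) + V(y)) / 2 - 1e-12
    for x, y in itertools.product(grid, grid)
)
print(concave)  # True
```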

## Theorem ("A First Course in Optimization Theory" by R.K. Sundaram)

Let f : D → R be a concave function. Then, if D is open, f is continuous on D. If D is not open, f is continuous on the interior of D.

## The Four Possibilities

[Figure: four panels showing V(π0) versus π0. Panel 1: a differentiable interior maximum at π0lf, where R0(δBπlf) = R1(δBπlf). Panel 2: a non-differentiable interior maximum at π0lf. Panels 3 and 4: the maximum occurs at a boundary, π0lf = 0 or π0lf = 1.]

## Case 1: Differentiable Interior Maximum Risk

Theorem
If there exists a prior π′ such that the conditional risks satisfy R0(δBπ′) = R1(δBπ′), then π′ is a least favorable prior and the minimax decision rule is ρmm = δBπ′.

Proof.
Given a π′ satisfying R0(δBπ′) = R1(δBπ′). For any δ,

max over π0 ∈ [0, 1] of [π0 R0(δ) + (1 − π0)R1(δ)]
    ≥ π′ R0(δ) + (1 − π′)R1(δ)
    ≥ π′ R0(δBπ′) + (1 − π′)R1(δBπ′)
    = R0(δBπ′) = R1(δBπ′)

where the second inequality follows from the Bayes optimality of δBπ′ for the prior π′; hence no rule has a smaller worst-case risk than δBπ′, i.e. ρmm = δBπ′. Moreover, for any π0 ∈ [0, 1],

V(π′) = π′ R0(δBπ′) + (1 − π′)R1(δBπ′) = π0 R0(δBπ′) + (1 − π0)R1(δBπ′) ≥ V(π0)

so π′ is least favorable.


1. Find a Bayesian decision rule δBπ as a function of the prior π.
2. See if Case 1 holds by solving for the least favorable prior πlf using the equalizer rule:

   R0(δBπlf) = R1(δBπlf)

3. If a solution πlf exists, then the minimax decision rule is

   ρmm = δBπlf

4. If there is no solution to the equalizer rule, then see if Case 3 or 4 holds by computing the risk at the endpoints π0lf = 0 and π0lf = 1.
5. If neither endpoint is least favorable, then we must be in Case 2. In this case we must create a randomized minimax decision rule as a convex combination of two deterministic Bayes decision rules.
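The steps above can be sketched numerically for the special case of finitely many deterministic rules with known conditional risks. `minimax_over_rules` below is a hypothetical helper, not code from the lecture: it locates the least favorable prior on a grid, then either returns a single Bayes rule (Cases 1, 3, 4) or mixes the two Bayes rules active on either side of πlf (Case 2).

```python
# Numerical sketch of steps 2-5 for a finite set of deterministic rules
# with known conditional risks. Grid-based, so the least favorable prior
# is located only approximately; the mixture weight is then exact.
def minimax_over_rules(risks, n=10001):
    def bayes_rule(pi0):
        # Bayes rule for prior pi0: minimize pi0*R0 + (1 - pi0)*R1
        return min(risks, key=lambda k: pi0 * risks[k][0] + (1 - pi0) * risks[k][1])

    def V(pi0):
        r0, r1 = risks[bayes_rule(pi0)]
        return pi0 * r0 + (1 - pi0) * r1

    grid = [i / (n - 1) for i in range(n)]
    pi_lf = max(grid, key=V)          # least favorable prior (approximate)
    eps = 2.0 / (n - 1)
    if pi_lf < eps:                   # Cases 3/4: maximum at a boundary
        return bayes_rule(eps), 1.0
    if pi_lf > 1 - eps:
        return bayes_rule(1 - eps), 1.0
    left, right = bayes_rule(pi_lf - eps), bayes_rule(pi_lf + eps)
    if left == right:                 # Case 1: a single Bayes rule equalizes
        return left, 1.0
    # Case 2: mix the two Bayes rules active on either side of pi_lf
    (a0, a1), (b0, b1) = risks[left], risks[right]
    alpha = (b1 - b0) / ((a0 - a1) - (b0 - b1))
    return (left, right), alpha
```

For the coin-flipping examples later in this lecture, this sketch reproduces the results derived analytically: rule 3 alone for the modified cost matrix, and a 2/3–1/3 mixture of rules 3 and 2 for the original one.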
## Example: Coherent Detection of BPSK

Our Bayes decision rule for coherent BPSK with prior π0, π1 = 1 − π0 is

δ(y) = 1 if y > γ;  0/1 if y = γ;  0 if y < γ

where γ := (a0 + a1)/2 + (σ²/(a1 − a0)) ln(π0/π1).

The conditional risks are

R0(δBπ) = Q((γ − a0)/σ)
R1(δBπ) = Q((a1 − γ)/σ)

where Q(x) := (1/√(2π)) ∫x^∞ e^(−t²/2) dt.

Let's try the equalizer rule. What value of γ gives us R0(δBπ) = R1(δBπ)?
## Example: Coherent Detection of BPSK

Answer: R0 = R1 when γ = (a0 + a1)/2. Hence

ρmm(y) = 1 if y > (a0 + a1)/2;  0/1 if y = (a0 + a1)/2;  0 if y < (a0 + a1)/2.

[Figure: the two conditional densities centered at a0 and a1, with the minimax threshold γ = (a0 + a1)/2 midway between them partitioning the observation space into decision regions Y0 and Y1.]

What does this imply about the least favorable prior?
Answer: π0 = π1 = 1/2.

Given a0, a1, and σ, the minimax rule allows you to guarantee a worst-case risk over all priors.
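This can be verified numerically. The sketch below uses the standard identity Q(x) = erfc(x/√2)/2; the values of a0, a1, and σ are arbitrary illustrative choices.

```python
import math

def Q(x):
    # Gaussian tail probability: Q(x) = 0.5 * erfc(x / sqrt(2))
    return 0.5 * math.erfc(x / math.sqrt(2))

# Arbitrary illustrative signal levels and noise standard deviation.
a0, a1, sigma = -1.0, 1.0, 0.7
gamma = (a0 + a1) / 2            # minimax threshold from the equalizer rule
R0 = Q((gamma - a0) / sigma)     # risk under H0: decide 1 when y > gamma
R1 = Q((a1 - gamma) / sigma)     # risk under H1: decide 0 when y < gamma
print(R0 == R1, R0)              # both equal Q((a1 - a0) / (2 * sigma))
```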
## Example: Coherent Detection of BPSK

[Figure: the minimum Bayes risk r(δBπ0, π0) versus the prior π0 for the BPSK example, peaking at the least favorable prior πlf, where R0(δBπlf) = R1(δBπlf).]

## Cases 3-4: Maximum Risk Occurs at Boundary

Let's return to our coin flipping problem from Lecture 1 (H0 ↔ x0 = HT and H1 ↔ x1 = HH) with a modified cost matrix

C = [ 0    100
      100   60 ]
[Figure: the Bayes risks of rules 1-4 versus the prior π0, with the minimum (lower envelope) highlighted; the maximum of the minimum risk occurs at the boundary π0 = 0.]


## Cases 3-4: Maximum Risk Occurs at Boundary

Remarks:
◮ Rules 2 and 3, depending on the prior, minimize the Bayes risk.

◮ Note that the equalizer rule would give no solution to this problem:

  R0(D1) = 100 and R1(D1) = 60
  R0(D2) = 0 and R1(D2) = 100
  R0(D3) = 50 and R1(D3) = 60
  R0(D4) = 50 and R1(D4) = 100

  No decision rule gives R0 = R1.

◮ In this example, the least favorable prior (maximizing the minimum
risk) is π0 = 0, or that the coin is always HH. This should make sense.
◮ The minimax decision rule is Rule 3: observe T, decide the coin is
fair; observe H, decide the coin is unfair.
◮ You can guarantee a worst-case risk of $60 by using Rule 3.
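With the conditional risks listed above, the minimax comparison is mechanical; a quick check in Python:

```python
# Worst-case risk of each deterministic rule in the modified coin problem,
# using the conditional risks (R0, R1) listed above.
risks = {1: (100, 60), 2: (0, 100), 3: (50, 60), 4: (50, 100)}
worst = {rule: max(r0, r1) for rule, (r0, r1) in risks.items()}
best = min(worst, key=worst.get)
print(worst)  # {1: 100, 2: 100, 3: 60, 4: 100}
print(best)   # rule 3 guarantees the smallest worst-case risk: 60
```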
## Case 2: Non-Differentiable Interior Maximum

Back to our original coin flipping problem with cost matrix

C = [ 0    100
      100    0 ]
[Figure: the Bayes risks of rules 1-4 versus the prior π0; the minimum (lower envelope) has a non-differentiable interior maximum.]


## Case 2: Non-Differentiable Interior Maximum

Remarks:
◮ Rules 2 and 3, depending on the prior, minimize the Bayes risk.
◮ Again, the equalizer rule gives no deterministic solution since

  R0(D1) = 100 and R1(D1) = 0
  R0(D2) = 0 and R1(D2) = 100
  R0(D3) = 50 and R1(D3) = 0
  R0(D4) = 50 and R1(D4) = 100

◮ In this example, the least favorable prior (maximizing the minimum risk) is π0 = 2/3, or that the coin is HT with probability 2/3.
◮ The minimax decision rule is neither Rule 2 nor Rule 3.
◮ You can guarantee a worst-case risk of $100/3 by using a randomized decision rule that is a combination of Rules 2 and 3.
## Case 2: Non-Differentiable Interior Maximum

Problem: Find a randomized decision rule that satisfies the equalizer rule.

[Figure: the Bayes risk curves near the non-differentiable maximum V(πlf). The two Bayes rules δBπlf− and δBπlf+ active just left and right of πlf have conditional risks (R0(δBπlf−), R1(δBπlf−)) and (R0(δBπlf+), R1(δBπlf+)); the randomized rule ρmm achieves the constant risk r(ρmm) = V(πlf).]
## Case 2: Non-Differentiable Interior Maximum

Our randomized minimax decision rule is

ρmm = α δBπlf− + (1 − α) δBπlf+

We can calculate the randomization α ∈ [0, 1] by applying the equalizer rule:

α R0(δBπlf−) + (1 − α) R0(δBπlf+) = α R1(δBπlf−) + (1 − α) R1(δBπlf+)

which gives the solution

α = [R1(δBπlf+) − R0(δBπlf+)] / ([R0(δBπlf−) − R1(δBπlf−)] − [R0(δBπlf+) − R1(δBπlf+)])
  = V′(πlf+) / (V′(πlf+) − V′(πlf−))

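For the coin example, plugging in the conditional risks of the two Bayes rules on either side of πlf (rule 3 on the left, rule 2 on the right, from the risk table above) gives the following; exact rational arithmetic via `fractions` keeps the answers as fractions.

```python
from fractions import Fraction

# delta- (rule 3) and delta+ (rule 2): conditional risks from the table above.
r0m, r1m = Fraction(50), Fraction(0)    # R0, R1 just left of pi_lf
r0p, r1p = Fraction(0), Fraction(100)   # R0, R1 just right of pi_lf
alpha = (r1p - r0p) / ((r0m - r1m) - (r0p - r1p))
# The mixture equalizes the two conditional risks:
R0 = alpha * r0m + (1 - alpha) * r0p
R1 = alpha * r1m + (1 - alpha) * r1p
print(alpha)   # 2/3
print(R0, R1)  # 100/3 100/3
```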

## Case 2: Non-Differentiable Interior Maximum

What is the minimax decision rule in our example?

V′(πlf+) = −100
V′(πlf−) = 50

hence α = 2/3. If H0 is the hypothesis that the coin is HT, H1 is the hypothesis that the coin is HH, and the observations are y0 = T, y1 = H, then our deterministic decision rules 2 and 3 can be written as matrices (rows indexed by the two decisions, columns by the two observations):

D3 = δBπlf− = [ 1 0 ; 0 1 ]    and    D2 = δBπlf+ = [ 1 1 ; 0 0 ]

The minimax decision rule is then given by

ρmm(y = T) = (2/3)[1; 0] + (1/3)[1; 0] = [1; 0]    (T → always decide HT)
ρmm(y = H) = (2/3)[0; 1] + (1/3)[1; 0] = [1/3; 2/3]    (H → randomize: Pr(decide HT) = 1/3, Pr(decide HH) = 2/3)

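The resulting randomized decision matrix can be sketched directly (columns indexed by the observations T, H; rows by the decisions HT, HH), again with exact fractions:

```python
from fractions import Fraction

alpha = Fraction(2, 3)
D3 = [[1, 0], [0, 1]]   # Bayes rule just left of pi_lf (delta-)
D2 = [[1, 1], [0, 0]]   # Bayes rule just right of pi_lf (delta+)
# rho_mm = alpha * D3 + (1 - alpha) * D2, entrywise
rho = [[alpha * x3 + (1 - alpha) * x2 for x3, x2 in zip(row3, row2)]
       for row3, row2 in zip(D3, D2)]
print(rho)  # column T: always decide HT; column H: HT w.p. 1/3, HH w.p. 2/3
```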

1. The objective of minimax hypothesis testing is to minimize your worst-case (maximum) risk over all possible prior state probabilities.
2. This is a conservative approach, but it is useful in scenarios where:
◮ the prior is unknown, and/or
◮ you need to provide a maximum risk guarantee.
3. Try the equalizer rule first!
4. Minimax risk at the endpoints only occurs in weird cases.
5. Finite observation space Y implies that the minimum Bayes risk curve
V is not going to be differentiable everywhere. Randomization is
often necessary to obtain the minimax decision rule in these cases.
6. Composite hypotheses:
◮ The equalizer rule is still valid.
◮ Checking “endpoints” is still valid (but more points to check).
◮ In the case of a non-differentiable interior maximum, finding the
randomization can be difficult.