
Lesson 35

Game Theory and Linear Programming


Math 20

December 14, 2007

Announcements

- Pset 12 due December 17 (last day of class)
- Lecture notes and K&H on website
- Next OH Monday 12 (SC 323)

Outline
Recap
Definitions
Examples
Fundamental Theorem
Games we can solve so far
GT problems as LP problems
From the continuous to the discrete
Standardization
Rock/Paper/Scissors again
The row player's LP problem

Definition
A zero-sum game is defined by a payoff matrix A, where a_ij
represents the payoff to the row player if R chooses option i and C
chooses option j.
- The row player chooses from the rows of the matrix, and the
  column player from the columns.
- The payoff could be a negative number, representing a net gain
  for the column player.

Definition
A strategy for a player consists of a probability vector representing
the portion of time each option is employed.
- We use a row vector p for the row player's strategy, and a
  column vector q for the column player's strategy.
- A pure strategy (select the same option every time) is
  represented by a standard basis vector e_j or e'_j. For instance,
  if R has three choices and C has five:

      e'_2 = [ 0  1  0 ]        e_4 = [ 0  0  0  1  0 ]'

- A non-pure strategy is called mixed.

Definition
The expected value of row and column strategies p and q is the
scalar

    E(p, q) = Σ_{i,j} p_i a_ij q_j = pAq

Probabilistically, this is the amount the row player receives (or
that the column player receives, if it's negative) when the players
employ these strategies.
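
For concreteness, here is a small NumPy check of this formula (my own
sketch, not part of the notes); the payoff matrix and the two strategies
are made up for illustration.

    # A minimal sketch (not in the original slides): computing E(p, q) = pAq.
    import numpy as np

    A = np.array([[ 2, -1,  3],
                  [ 0,  1, -2]])       # a made-up 2 x 3 payoff matrix
    p = np.array([0.6, 0.4])           # hypothetical row strategy (length m = 2)
    q = np.array([0.2, 0.3, 0.5])      # hypothetical column strategy (length n = 3)

    print(p @ A @ q)                   # expected payoff to R; negative favors C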

Rock/Paper/Scissors

Example
What is the payoff matrix for Rock/Paper/Scissors?

Solution
The payoff matrix is

    A = [  0  -1   1 ]
        [  1   0  -1 ]
        [ -1   1   0 ]

Example
Consider a new game: players R and C each choose a number 1,
2, or 3. If they choose the same thing, C pays R that amount. If
they choose differently, R pays C the amount that C has chosen.
What is the payoff matrix?

Solution

    A = [  1  -2  -3 ]
        [ -1   2  -3 ]
        [ -1  -2   3 ]
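
As a sanity check (my addition, not from the notes), the matrix can be
generated directly from the rules of the game: the diagonal entry is the
common choice i, and an off-diagonal entry is minus the amount j that C
chose.

    # A sketch (not from the slides): build the payoff matrix from the rules.
    import numpy as np

    A = np.array([[i if i == j else -j for j in (1, 2, 3)] for i in (1, 2, 3)])
    print(A)
    # [[ 1 -2 -3]
    #  [-1  2 -3]
    #  [-1 -2  3]]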

Theorem (Fundamental Theorem of Matrix Games)
There exist optimal strategies p* for R and q* for C such that for
all strategies p and q:

    E(p*, q) ≥ E(p*, q*) ≥ E(p, q*)

E(p*, q*) is called the value v of the game.

Reflect on the inequality

    E(p*, q) ≥ E(p*, q*) ≥ E(p, q*)

In other words,

- E(p*, q) ≥ E(p*, q*): R can guarantee a lower bound on his/her
  payoff
- E(p*, q*) ≥ E(p, q*): C can guarantee an upper bound on how much
  he/she loses
- This value could be negative, in which case C has the advantage

Fundamental problem of zero-sum games

- Find the p* and q*!
- Last time we did these:
  - Strictly-determined games
  - 2 x 2 non-strictly-determined games
- The general case we'll look at next.

Pure Strategies are optimal in Strictly-Determined Games

Theorem
Let A be a payoff matrix. If a_rs is a saddle point, then e'_r is an
optimal strategy for R and e_s is an optimal strategy for C. Also
v = E(e'_r, e_s) = a_rs.
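
A small sketch (my addition, not from the notes) that searches a payoff
matrix for saddle points, i.e. entries that are the minimum of their row
and the maximum of their column; the example matrix is made up.

    # A sketch (not from the slides): locate saddle points of a payoff matrix.
    import numpy as np

    def saddle_points(A):
        A = np.asarray(A)
        return [(r, s) for r in range(A.shape[0])
                       for s in range(A.shape[1])
                if A[r, s] == A[r, :].min() and A[r, s] == A[:, s].max()]

    A = np.array([[3, 1, 4],
                  [2, 0, -1]])     # made-up example; a_12 = 1 is a saddle point
    print(saddle_points(A))        # [(0, 1)] in 0-based indices, so v = 1 and the
                                   # pure strategies e'_1 for R, e_2 for C are optimal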

Optimal strategies in 2 x 2 non-Strictly-Determined Games

Let A be a 2 x 2 matrix with no saddle points. Then the optimal
strategies are

    p* = [ (a_22 - a_21)/Δ   (a_11 - a_12)/Δ ]

    q* = [ (a_22 - a_12)/Δ ]
         [ (a_11 - a_21)/Δ ]

where Δ = a_11 + a_22 - a_12 - a_21. Also

    v = |A| / Δ
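
Here is a short sketch of these formulas in NumPy (my addition, not from
the notes); it assumes the input really has no saddle point, so Δ ≠ 0 and
the resulting strategies are genuinely mixed.

    # A sketch (not from the slides) of the 2x2 formulas above.
    import numpy as np

    def solve_2x2(A):
        (a11, a12), (a21, a22) = np.asarray(A, dtype=float)
        delta = a11 + a22 - a12 - a21
        p = np.array([a22 - a21, a11 - a12]) / delta   # row player's strategy
        q = np.array([a22 - a12, a11 - a21]) / delta   # column player's strategy
        v = (a11 * a22 - a12 * a21) / delta            # value |A| / Delta
        return p, q, v

    # Matching-pennies-style example (made up): p = q = [0.5, 0.5], v = 0
    print(solve_2x2([[1, -1], [-1, 1]]))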

Outline
Recap
Definitions
Examples
Fundamental Theorem
Games we can solve so far
GT problems as LP problems
From the continuous to the discrete
Standardization
Rock/Paper/Scissors again
The row player's LP problem

This could get a little weird


This derivation is not something that needs to be memorized, but
should be understood at least once.

Objectifying the problem

Let's think about the problem from the column player's
perspective. If she chooses strategy q, and R knew it, he would
choose p to maximize the payoff pAq. Thus the column player
wants to minimize that quantity. That is, C's objective is realized
when the payoff is

    E = min_q max_p pAq.

This seems hard! Luckily, linearity saves us.

From the continuous to the discrete

Lemma
Regardless of q, we have

    max_p pAq = max_{1≤i≤m} e'_i A q

Here e'_i is the probability vector representing the pure strategy
of going only with choice i.
The idea is that a weighted average of things is no bigger than the
largest of them. (Think about grades.)
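
A quick numerical illustration (my own check, not from the notes): for a
fixed q, the payoff of any mixed p is a weighted average of the
pure-strategy payoffs e'_i A q, so it never exceeds the best of them.

    # Illustration (not from the slides) of the lemma with random mixed strategies.
    import numpy as np

    rng = np.random.default_rng(0)
    A = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])   # Rock/Paper/Scissors
    q = np.array([0.2, 0.3, 0.5])                        # a fixed (made-up) q

    pure = A @ q                       # the m numbers e'_i A q
    for _ in range(5):
        p = rng.dirichlet(np.ones(3))  # a random mixed strategy for R
        assert p @ A @ q <= pure.max() + 1e-12
    print(pure.max())                  # max over pure strategies = max over all p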

Proof of the lemma

Proof.
We must have

    max_p pAq ≥ max_{1≤i≤m} e'_i A q

(the maximum over a larger set must be at least as big). On the
other hand, let q be C's strategy. Let the quantity on the right be
maximized when i = i_0. Let p be any strategy for R. Notice that
p = Σ_i p_i e'_i. So

    E(p, q) = pAq = Σ_{i=1}^{m} p_i e'_i A q
                  ≤ Σ_{i=1}^{m} p_i e'_{i_0} A q
                  = (Σ_{i=1}^{m} p_i) e'_{i_0} A q = e'_{i_0} A q.

Thus

    max_p pAq ≤ e'_{i_0} A q.

The next step is to introduce a new variable v representing the
value of this inner maximization. Our objective is to minimize it.
Saying it's the maximum of all payoffs from pure strategies is the
same as saying

    v ≥ e'_i A q

for all i. So we finally have something that looks like an LP
problem! We want to choose q and v which minimize v subject to
the constraints

    v ≥ e'_i A q          i = 1, 2, ..., m
    q_j ≥ 0               j = 1, 2, ..., n
    Σ_{j=1}^{n} q_j = 1

Trouble with this formulation

- Simplex method with equalities?
- Not in standard form

Resolution:

- We may assume all a_ij > 0, so v > 0
- Let x_j = q_j / v

Since we know v > 0, we still have x ≥ 0. Now

    Σ_{j=1}^{n} x_j = (1/v) Σ_{j=1}^{n} q_j = 1/v.

So our problem is now to choose x ≥ 0 which maximizes Σ_j x_j.
The constraints now take the form

    v ≥ e'_i A q   ⟺   1 ≥ e'_i A x,

for all i. Another way to write this is

    Ax ≤ 1,

where 1 is the vector consisting of all ones.

Upshot

Theorem
Consider a game with payoff matrix A, where each entry of A is
positive. The column player's optimal strategy q* is

    q* = x / (x_1 + ... + x_n),

where x ≥ 0 satisfies the LP problem of maximizing x_1 + ... + x_n
subject to the constraints Ax ≤ 1.
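
In practice this LP can be handed to any solver. The sketch below (my
addition, not from the notes) uses scipy.optimize.linprog, which minimizes
by default, so we minimize -(x_1 + ... + x_n); it assumes every entry of A
is positive, as in the theorem.

    # A sketch (not from the slides): solve the column player's LP numerically.
    import numpy as np
    from scipy.optimize import linprog

    def column_strategy(A):
        A = np.asarray(A, dtype=float)
        m, n = A.shape
        res = linprog(c=-np.ones(n), A_ub=A, b_ub=np.ones(m), bounds=(0, None))
        x = res.x
        return x / x.sum(), 1.0 / x.sum()   # optimal q and the value of the game

    # e.g. column_strategy([[2, 1, 3], [3, 2, 1], [1, 3, 2]]) recovers the
    # Rock/Paper/Scissors solution worked out by hand on the next slides.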

Rock/Paper/Scissors

The payoff matrix is

    A = [  0  -1   1 ]
        [  1   0  -1 ]
        [ -1   1   0 ]

We can add 2 to everything to make

    Ã = [ 2  1  3 ]
        [ 3  2  1 ]
        [ 1  3  2 ]

Convert to LP

The problem is to maximize x_1 + x_2 + x_3 subject to the constraints

    2x_1 +  x_2 + 3x_3 ≤ 1
    3x_1 + 2x_2 +  x_3 ≤ 1
     x_1 + 3x_2 + 2x_3 ≤ 1.

We introduce slack variables y_1, y_2, and y_3, so the constraints now
become

    2x_1 +  x_2 + 3x_3 + y_1 = 1
    3x_1 + 2x_2 +  x_3 + y_2 = 1
     x_1 + 3x_2 + 2x_3 + y_3 = 1.

An easy initial basic solution is to let x = 0 and y = 1. The initial
tableau is therefore

            x_1   x_2   x_3   y_1   y_2   y_3   z   value
    y_1       2     1     3     1     0     0   0       1
    y_2       3     2     1     0     1     0   0       1
    y_3       1     3     2     0     0     1   0       1
    z        -1    -1    -1     0     0     0   1       0

Which should be the entering variable? The coefficients in the
bottom row are all the same, so let's just pick one, x_1. To find the
departing variable, we look at the ratios 1/2, 1/3, and 1/1. So y_2 is
the departing variable.
We scale row 2 by 1/3:

            x_1   x_2   x_3   y_1   y_2   y_3   z   value
    y_1       2     1     3     1     0     0   0       1
    y_2       1   2/3   1/3     0   1/3     0   0     1/3
    y_3       1     3     2     0     0     1   0       1
    z        -1    -1    -1     0     0     0   1       0

Then we use row operations to zero out the rest of column one:

            x_1    x_2    x_3   y_1    y_2   y_3   z   value
    y_1       0   -1/3    7/3     1   -2/3     0   0     1/3
    x_1       1    2/3    1/3     0    1/3     0   0     1/3
    y_3       0    7/3    5/3     0   -1/3     1   0     2/3
    z         0   -1/3   -2/3     0    1/3     0   1     1/3

We can still improve this: x_3 is the entering variable and y_1 is the
departing variable. The new tableau is

            x_1    x_2    x_3    y_1    y_2   y_3   z   value
    x_3       0   -1/7      1    3/7   -2/7     0   0     1/7
    x_1       1    5/7      0   -1/7    3/7     0   0     2/7
    y_3       0   18/7      0   -5/7    1/7     1   0     3/7
    z         0   -3/7      0    2/7    1/7     0   1     3/7

Finally, entering x_2 and departing y_3 gives

            x_1   x_2   x_3     y_1     y_2     y_3   z   value
    x_3       0     0     1    7/18   -5/18    1/18   0     1/6
    x_1       1     0     0    1/18    7/18   -5/18   0     1/6
    x_2       0     1     0   -5/18    1/18    7/18   0     1/6
    z         0     0     0    1/6     1/6     1/6    1     1/2

So the x variables have values x_1 = 1/6, x_2 = 1/6, x_3 = 1/6.
Furthermore z = x_1 + x_2 + x_3 = 1/2, so v = 1/z = 2. (This is the
value of the shifted game Ã; subtracting the 2 we added, the original
game has value 0.) This also means that q_1 = 1/3, q_2 = 1/3, and
q_3 = 1/3. So the optimal strategy is to play each option the same
fraction of the time.
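
The hand computation above can be cross-checked with a solver (my own
verification sketch, not part of the notes), applied to the shifted matrix
Ã = A + 2.

    # Cross-check of the tableau computation with scipy.optimize.linprog.
    import numpy as np
    from scipy.optimize import linprog

    A_shift = np.array([[2, 1, 3], [3, 2, 1], [1, 3, 2]], dtype=float)
    res = linprog(c=-np.ones(3), A_ub=A_shift, b_ub=np.ones(3), bounds=(0, None))
    x = res.x
    print(x)                 # approximately [1/6, 1/6, 1/6]
    print(1 / x.sum())       # value of the shifted game: 2
    print(x / x.sum())       # optimal strategy: [1/3, 1/3, 1/3]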

Outline
Recap
Definitions
Examples
Fundamental Theorem
Games we can solve so far
GT problems as LP problems
From the continuous to the discrete
Standardization
Rock/Paper/Scissors again
The row player's LP problem

Now let's think about the problem from the row player's
perspective. If he chooses strategy p, and C knew it, she would
choose q to minimize the payoff pAq. Thus the row player wants
to maximize that quantity. That is, R's objective is realized when
the payoff is

    E = max_p min_q pAq.

Lemma
Regardless of p, we have

    min_q pAq = min_{1≤j≤n} pAe_j

The next step is to introduce a new variable v representing the
value of this inner minimization. Our objective is to maximize it.
Saying it's the minimum of all payoffs from pure strategies is the
same as saying

    v ≤ pAe_j

for all j. Again, we have something that looks like an LP problem!
We want to choose p and v which maximize v subject to the
constraints

    v ≤ pAe_j             j = 1, 2, ..., n
    p_i ≥ 0               i = 1, 2, ..., m
    Σ_{i=1}^{m} p_i = 1

As before, we can standardize this by renaming

    y = (1/v) p'

(this makes y a column vector). Then

    Σ_{i=1}^{m} y_i = 1/v,

so maximizing v is the same as minimizing 1'y. Likewise, the
constraints become v ≤ (v y')Ae_j for all j, or y'A ≥ 1', or (taking
transposes) A'y ≥ 1. If all the entries of A are positive, we may
assume that v is positive, so the constraints p ≥ 0 are satisfied if
and only if y ≥ 0.

Upshot

Theorem
Consider a game with payoff matrix A, where each entry of A is
positive. The row player's optimal strategy p* is

    p* = y' / (y_1 + ... + y_m),

where y ≥ 0 satisfies the LP problem of minimizing
y_1 + ... + y_m = 1'y subject to the constraints A'y ≥ 1.
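
As with the column player, this LP is easy to hand to a solver. The sketch
below (my addition, not from the notes) rewrites A'y ≥ 1 in the ≤ form
that scipy.optimize.linprog expects.

    # A sketch (not from the slides): the row player's LP, minimize 1'y
    # subject to A'y >= 1, y >= 0, written as (-A') y <= -1.
    import numpy as np
    from scipy.optimize import linprog

    def row_strategy(A):
        A = np.asarray(A, dtype=float)
        m, n = A.shape
        res = linprog(c=np.ones(m), A_ub=-A.T, b_ub=-np.ones(n), bounds=(0, None))
        y = res.x
        return y / y.sum(), 1.0 / y.sum()   # optimal p and the value of the game

    # e.g. row_strategy on the shifted Rock/Paper/Scissors matrix gives
    # p = (1/3, 1/3, 1/3), matching the column player's strategy.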

The big idea

The big observation is this:

Theorem
The row player's LP problem is the dual of the column player's LP
problem.

The final tableau in the Rock/Paper/Scissors LP problem was this:

            x_1   x_2   x_3     y_1     y_2     y_3   z   value
    x_3       0     0     1    7/18   -5/18    1/18   0     1/6
    x_1       1     0     0    1/18    7/18   -5/18   0     1/6
    x_2       0     1     0   -5/18    1/18    7/18   0     1/6
    z         0     0     0    1/6     1/6     1/6    1     1/2

The entries in the objective row below the slack variables are the
solutions to the dual problem! In this case, we have the same
values, which means R has the same strategy as C. This reflects
the symmetry of the original game.
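
A quick numerical check of the duality statement (my own sketch, not from
the notes), using the shifted Rock/Paper/Scissors matrix: the two LPs reach
the same optimal objective value 1/v, and here the two optimal strategies
coincide, reflecting the symmetry of the game.

    # Duality check (not from the slides): solve both LPs and compare.
    import numpy as np
    from scipy.optimize import linprog

    A_shift = np.array([[2, 1, 3], [3, 2, 1], [1, 3, 2]], dtype=float)
    col = linprog(c=-np.ones(3), A_ub=A_shift,    b_ub=np.ones(3),  bounds=(0, None))
    row = linprog(c=np.ones(3),  A_ub=-A_shift.T, b_ub=-np.ones(3), bounds=(0, None))
    print(np.isclose(-col.fun, row.fun))                           # same value 1/v = 1/2
    print(np.allclose(col.x / col.x.sum(), row.x / row.x.sum()))   # same strategy here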

Example
Consider the game: players R and C each choose a number 1, 2,
or 3. If they choose the same thing, C pays R that amount. If
they choose differently, R pays C the amount that C has chosen.
What should each do?

Answer.

    Choice      R        C
       1      22.7%    54.5%
       2      36.4%    27.3%
       3      40.9%    18.2%

The expected payoff is 6/11 ≈ 0.55 to the column player (that is,
v = -6/11 to R).
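
This answer can be verified numerically (my own check, not part of the
notes): shift the payoff matrix so every entry is positive, solve both
LPs, and undo the shift on the value.

    # Verification sketch (not from the slides) for the choose-a-number game.
    import numpy as np
    from scipy.optimize import linprog

    A = np.array([[1, -2, -3], [-1, 2, -3], [-1, -2, 3]], dtype=float)
    shift = 4.0
    B = A + shift                       # every entry now positive
    col = linprog(c=-np.ones(3), A_ub=B,    b_ub=np.ones(3),  bounds=(0, None))
    row = linprog(c=np.ones(3),  A_ub=-B.T, b_ub=-np.ones(3), bounds=(0, None))
    q = col.x / col.x.sum()             # C's strategy ~ [0.545, 0.273, 0.182]
    p = row.x / row.x.sum()             # R's strategy ~ [0.227, 0.364, 0.409]
    v = 1 / col.x.sum() - shift         # value ~ -0.545 = -6/11 (C's advantage)
    print(p, q, v)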
