THEORY
SERIES ON OPTIMIZATION
Published
Vol. 2 Differential Games of Pursuit
by L A. Petrosjan
OPTIMIZATION
VOL.3
GAME
THEORY
Leon A. Petrosjan
Nikolay A. Zenkevich
Faculty of Applied Mathematics
St. Petersburg State University
RUSSIA
World Scientific
Singapore • New Jersey • London • Hong Kong
Published by
World Scientific Publishing Co. Pte. Ltd.
P O Box 128, Farrer Road, Singapore 912805
USA office: Suite IB, 1060 Main Street, River Edge, NJ 07661
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
For photocopying of material in this volume, please pay a copying fee through the Copyright
Clearance Center, Inc., 222 Rosewood Drive, Danvers, Massachusetts 01923, USA.
We begin by acknowledging our debt to our teacher Nicolay Vorobjev, who started
teaching us game theory in the former Soviet Union, at a time when this subject was
not a necessary part of the applied mathematics, economics and management science
curriculum.
We must specially mention Elena A. Semina, who wrote with us sections 1.7,
1.9, 3.7, 3.13, 4.4-4.6, 4.8, 4.9 and sections 5.2-5.6, 5.8 for the Russian version of the
book, and Jury M. Donetz, who translated the book into English.
We thank Olga Kholodkevich, Maria Kultina, Tatiana Survillo and Sergey Voz-
nyuk for their effective research assistance; and also for reading the manuscript and
suggesting ways to improve it.
Many thanks to Andrey Ovsienko and Sergey Voznyuk for preparation of the
manuscript in LaTeX.
Preface
Game theory is a branch of modern applied mathematics that aims to analyze various
problems of conflict between parties that have opposed, similar or simply different
interests. A theory of games was introduced in 1921 by Emile Borel and established
in 1928 by John von Neumann, who together with Oskar Morgenstern developed it as
a means of decision making in complicated economic systems. In their book "The Theory of
Games and Economic Behaviour", published in 1944, they asserted that the classical
mathematics developed for applications in mechanics and physics fails to describe
the real processes in economics and social life. They also observed many factors common
to actual games and economic situations, such as conflicting interests, the various
preferences of decision makers, and the dependence of each individual's outcome on
the decisions made by other individuals. Therefore, they named this new kind of
mathematics game theory.
Games are grouped into several classes according to some important features. In
our book we consider zero-sum two-person games, strategic n-person games in normal
form, cooperative games, games in extensive form with complete and incomplete
information, differential pursuit games and differential cooperative n-person games.
There is no single game theory which could address such a wide range of "games".
At the same time there are common optimality principles applicable to all classes
of games under consideration, but the methods of effective computation of solutions
are very different. It is also impossible to cover in one book all known optimality
principles and solution concepts. For instance, the set of different "refinements"
of Nash equilibria alone generates more than 15 new optimality principles. In this book we
try to explain the principles which, from our point of view, are basic in game theory,
and to bring the reader to the point of being able to solve problems in this field of mathematics. We
have included results published before in Petrosjan (1965), (1968), (1970), (1972),
(1977), (1992), (1993); Petrosjan and Zenkevich (1986); Zenkevich and Marchenko
(1987), (1990); Zenkevich and Voznyuk (1994).
Contents
1 Matrix games 1
1.1 Definition of a two-person zero-sum game in normal form 1
1.2 Maximin and minimax strategies 5
1.3 Saddle points 7
1.4 Mixed extension of a game 11
1.5 Convex sets and systems of linear inequalities 15
1.6 Existence of a solution of the matrix game in mixed strategies 18
1.7 Properties of optimal strategies and value of the game 22
1.8 Dominance of strategies 30
1.9 Completely mixed and symmetric games 35
1.10 Iterative methods of solving matrix games 40
1.11 Exercises and problems 44
5.14 Strongly time consistent optimality principles for the games with discount payoffs 336
5.15 Exercises and problems 338
Bibliography 345
Index 351
Chapter 1
Matrix games
1.1 Definition of a two-person zero-sum game in normal form

1.1.1. Definition. The system

Γ = (X, Y, K),   (1.1.1)

where X and Y are nonempty sets, and the function K : X × Y → R¹, is called a
two-person zero-sum game in normal form.

The elements x ∈ X and y ∈ Y are called the strategies of players 1 and 2,
respectively, in the game Γ; the elements of the Cartesian product X × Y (i.e. the
pairs of strategies (x, y), where x ∈ X and y ∈ Y) are called situations, and the
function K is the payoff of Player 1. Player 2's payoff in situation (x, y) is set equal
to −K(x, y); therefore the function K is also called the payoff function of the game
Γ itself, and the game Γ is called a zero-sum game. Thus, in order to specify the game
Γ, it is necessary to define the sets of strategies X, Y for players 1 and 2, and the
payoff function K given on the set of all situations X × Y.
The game Γ is interpreted as follows. Players simultaneously and independently
choose strategies x ∈ X, y ∈ Y. Thereafter Player 1 receives the payoff
K(x, y) and Player 2 receives the payoff −K(x, y).
Definition. The game Γ′ = (X′, Y′, K′) is called a subgame of the game Γ =
(X, Y, K) if X′ ⊂ X, Y′ ⊂ Y, and the function K′ : X′ × Y′ → R¹ is the restriction of
the function K to X′ × Y′.

This chapter focuses on two-person zero-sum games in which the strategy sets of
the players are finite.
1.1.2. Definition. Two-person zero-sum games in which both players have
finite sets of strategies are called matrix games.
Suppose that Player 1 in the matrix game (1.1.1) has a total of m strategies. Let us
order the strategy set X of the first player, i.e. set up a one-to-one correspondence
between the sets M = {1, 2, …, m} and X. Similarly, if Player 2 has n strategies, it
is possible to set up a one-to-one correspondence between the sets N = {1, 2, …, n}
and Y. The game Γ is then fully defined by specifying the matrix A = {a_{ij}}, where
a_{ij} = K(x_i, y_j), (i, j) ∈ M × N, (x_i, y_j) ∈ X × Y, i ∈ M, j ∈ N (whence the
name matrix game). In this case the game Γ is realized as follows.
Player 1 chooses row i ∈ M and Player 2 (simultaneously and independently of
Player 1) chooses column j ∈ N. Thereafter Player 1 receives the payoff a_{ij} and
Player 2 receives the payoff −a_{ij}. If the payoff is a negative number, then
we are dealing with an actual loss of Player 1.
Denote the game Γ with the payoff matrix A by Γ_A and call it an (m × n) game
according to the dimension of the matrix A. We shall drop the index A whenever it
is clear from the discussion which matrix is used in the game.
The game is zero-sum. We shall describe the strategies of the players. Suppose that
m > n. Player 1 has the following strategies: x_0 = (m, 0), to place all of the
regiments at the first post; x_1 = (m − 1, 1), to place m − 1 regiments at the first post
and one at the second; x_2 = (m − 2, 2), …, x_{m−1} = (1, m − 1), x_m = (0, m). The enemy
(Player 2) has the following strategies: y_0 = (n, 0), y_1 = (n − 1, 1), …, y_n = (0, n).

Suppose that Player 1 chooses strategy x_0 and Player 2 chooses strategy y_0.
Compute the payoff a_{00} of Player 1 in this situation. Since m > n, Player 1 wins
at the first post. His payoff is n + 1 (one for holding the post). At the second post
it is a draw. Therefore a_{00} = n + 1. Compute a_{01}. Since m > n − 1, at the
first post Player 1's payoff is n − 1 + 1 = n. Player 2 wins at the second post.
Therefore the loss of Player 1 at this post is one. Thus, a_{01} = n − 1. Similarly,
we obtain a_{0j} = n − j + 1 − 1 = n − j, 1 ≤ j ≤ n. Further, if m − 1 > n, then
a_{10} = n + 1 + 1 = n + 2, a_{11} = n − 1 + 1 = n, a_{1j} = n − j + 1 − 1 − 1 = n − j − 1, 2 ≤ j ≤ n.
In the general case (for any m and n) the elements a_{ij}, i = 0, …, m, j = 0, …, n, of the payoff
matrix can be computed in a similar way.
For m = 4 and n = 3 the payoff matrix is:

        y_0   y_1   y_2   y_3
x_0      4     2     1     0
x_1      1     3     0    −1
x_2     −2     2     2    −2
x_3     −1     0     3     1
x_4      0     1     2     4
          1   2   3   4
      1 | 0   1   2   3 |
A =   2 | 1   0   1   2 |
      3 | 2   1   0   1 |
      4 | 3   2   1   0 |
Example 3. (Discrete Duel-Type Game.) [Gale (1960)]. Players approach one
another by taking n steps. After each step a player may or may not fire a bullet, but
during the game he may fire only once. The probability that a player will hit his
opponent (if he shoots) at the k-th step is assumed to be k/n (k ≤ n).

A strategy for Player 1 (2) consists in the decision to shoot at the i-th
(j-th) step. Suppose that i < j, Player 1 decides to shoot at the i-th
step, and Player 2 decides to shoot at the j-th step. The payoff a_{ij} to Player
1 is then determined by
1 is then determined by
a
ij = V1 ) -= 5
n n n rr
Thus the payoff a_{ij} is the difference between the probability of hitting the opponent and
that of failing to survive. In the case i > j, Player 2 is the first to fire and a_{ij} = −a_{ji}.
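The payoff construction above can be reproduced numerically. The sketch below builds the duel matrix from the formula for i < j and the antisymmetry a_{ij} = −a_{ji} stated in the text; the zero diagonal for the case i = j (truncated in this copy) is our assumption.

```python
import numpy as np

def duel_matrix(n):
    """Payoff matrix of the discrete duel-type game (Example 3): a sketch.

    For i < j Player 1 fires first: he hits with probability i/n and, if
    he misses, is hit with probability j/n, so
        a_ij = i/n - (1 - i/n) * (j/n).
    For i > j the roles are reversed and a_ij = -a_ji.  The zero diagonal
    (both fire at the same step) is an assumption here.
    """
    A = np.zeros((n, n))
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if i < j:
                A[i - 1, j - 1] = i / n - (1 - i / n) * (j / n)
            elif i > j:
                A[i - 1, j - 1] = -(j / n - (1 - j / n) * (i / n))
    return A
```

The resulting matrix is antisymmetric, which reflects the symmetric roles of the two duelists.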
Example 5. (Discrete Search Game.) There are n cells. Player 2 conceals an
object in one of the n cells and Player 1 wishes to find it. In examining the i-th cell,
Player 1 expends an effort τ_i > 0, and the probability of finding the object in the i-th cell
(if it is concealed there) is 0 < β_i ≤ 1, i = 1, 2, …, n. If the object is found, Player
1 receives the amount α. The players' strategies are the numbers of the cells wherein
the players respectively conceal and search for the object. Player 1's payoff is equal
to the difference between the expected receipts and the effort expended in searching for the
object. Thus, the problem of concealing and searching for the object reduces to the
game with the payoff matrix
game with the payoff matrix
afa- r, -T, -T, -1-1
T2 0102 ~ Ti ~T2 -r 2
A =
a&
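The diagonal structure of this matrix is easy to generate. The sketch below uses hypothetical parameter names `alpha`, `beta`, `tau` for the quantities α, β_i, τ_i of the example.

```python
import numpy as np

def search_matrix(alpha, beta, tau):
    """Payoff matrix of the discrete search game (Example 5): a sketch.

    Row i: the cell searched by Player 1; column j: the cell where Player 2
    hides.  If i == j, Player 1 finds the object with probability beta[i]
    and receives alpha, at search cost tau[i]; otherwise he only pays the
    cost, so every off-diagonal entry of row i is -tau[i].
    """
    beta = np.asarray(beta, dtype=float)
    tau = np.asarray(tau, dtype=float)
    n = len(beta)
    A = -np.tile(tau[:, None], (1, n))       # -tau_i in every column of row i
    A[np.diag_indices(n)] += alpha * beta    # alpha*beta_i - tau_i on the diagonal
    return A
```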
Example 6. (Noisy Search.) Suppose that Player 1 is searching for a mobile object
(Player 2) for the purpose of detecting it. Player 2's objective is the opposite one (i.e.
he seeks to avoid being detected). Player 1 can move at the velocities α_1 = 1, α_2 =
2, α_3 = 3; the strategies of Player 2 are his velocities β_1, β_2, β_3. The range of the
detecting device used by Player 1, depending on the velocities of the players, is
determined by the matrix

        β_1  β_2  β_3
    α_1 | 4    5    6 |
D = α_2 | 3    4    5 |
    α_3 | 1    2    3 |
Strategies of the players are their velocities, and Player 1's payoff in the situation
(α_i, β_j) is assumed to be the search efficiency a_{ij} = α_i d_{ij}, i = 1, …, 3, j = 1, …, 3,
where d_{ij} is an element of the matrix D. Then the problem of selecting velocities in a noisy
search can be represented by the game with the matrix

        β_1  β_2  β_3
    α_1 | 4    5    6  |
A = α_2 | 6    8   10  |
    α_3 | 3    6    9  |
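The construction a_{ij} = α_i d_{ij} is just a row-wise scaling of D, which can be checked mechanically:

```python
import numpy as np

# Range matrix D from Example 6 and the velocities of Player 1.
D = np.array([[4, 5, 6],
              [3, 4, 5],
              [1, 2, 3]])
alpha = np.array([1, 2, 3])

# Search efficiency a_ij = alpha_i * d_ij: each row of D is scaled by the
# corresponding velocity of Player 1 (NumPy broadcasting does the rest).
A = alpha[:, None] * D
# A now equals the payoff matrix shown in the text.
```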
The number

v̲ = sup_{x∈X} inf_{y∈Y} K(x, y)   (1.2.1)

is called the lower value of the game Γ. The principle of constructing strategy
x based on the maximization of the minimal payoff is called the maximin principle,
and the strategy x selected by this principle is called the maximin strategy of Player
1.
Similar reasoning applies to Player 2. Suppose he chooses
strategy y. Then, at worst, he will lose max_{x} K(x, y). Therefore, the second player
can always guarantee himself a loss of at most inf_{y} sup_{x} K(x, y). The number

v̄ = inf_{y∈Y} sup_{x∈X} K(x, y)   (1.2.2)

is called the upper value of the game Γ. The principle of constructing a strategy
y based on the minimization of the maximum losses is called the minimax principle,
and the strategy y selected by this principle is called the minimax strategy of Player
2. It should be stressed that the existence of the minimax (maximin) strategy is
determined by the reachability of the extrema in (1.2.2), (1.2.1).
Consider the (m × n) matrix game Γ_A. The extrema in (1.2.1) and (1.2.2) are then
reached, and the lower and upper values of the game are, respectively, equal to

v̲ = max_{i∈M} min_{j∈N} a_{ij},   (1.2.3)

v̄ = min_{j∈N} max_{i∈M} a_{ij}.   (1.2.4)

The minimax and maximin for the game Γ_A can be found by the following scheme:
write the minimum of each row to the right of the matrix and the maximum of each
column beneath it; the maximin is then the largest of the row minima and the minimax
is the smallest of the column maxima.
Thus, in the game Γ_A with the matrix

    | 1  0  4 |
A = | 5  3  8 |
    | 6  0  1 |

the lower value (maximin) v̲ and the maximin strategy i_0 of the first player are v̲ = 3,
i_0 = 2, respectively, and the upper value (minimax) v̄ and the minimax strategy j_0
of the second player are v̄ = 3, j_0 = 2, respectively.
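The computation of (1.2.3) and (1.2.4) amounts to one row reduction and one column reduction; a sketch:

```python
import numpy as np

def lower_upper_values(A):
    """Lower and upper values of a matrix game in pure strategies,
    following (1.2.3)-(1.2.4): a sketch."""
    A = np.asarray(A)
    row_min = A.min(axis=1)                        # worst case of each row
    col_max = A.max(axis=0)                        # worst case of each column
    lower, i0 = row_min.max(), row_min.argmax()    # maximin strategy i0
    upper, j0 = col_max.min(), col_max.argmin()    # minimax strategy j0
    return lower, upper, i0, j0

lower, upper, i0, j0 = lower_upper_values([[1, 0, 4], [5, 3, 8], [6, 0, 1]])
# lower == upper == 3 and i0 == j0 == 1 (row/column 2 in the book's 1-based numbering)
```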
1.2.2. The following assertion holds for any game Γ = (X, Y, K).

Lemma. In the two-person zero-sum game Γ

v̲ ≤ v̄.   (1.2.5)

Hence we get

inf_{y∈Y} K(x, y) ≤ inf_{y∈Y} sup_{x∈X} K(x, y).

Note that the right-hand side of the latter inequality is a constant, and the
value x ∈ X has been chosen arbitrarily. Therefore, the following inequality holds:

sup_{x∈X} inf_{y∈Y} K(x, y) ≤ inf_{y∈Y} sup_{x∈X} K(x, y).
K(x, y*) ≤ K(x*, y*),   (1.3.1)

K(x*, y) ≥ K(x*, y*)   (1.3.2)

for all x ∈ X and y ∈ Y.

The set of all equilibrium points in the game Γ will be denoted by Z(Γ).
In the matrix game Γ_A the equilibrium points are the saddle points of the payoff
matrix A, i.e. the points (i*, j*) for which for all i ∈ M and j ∈ N the following
inequalities are satisfied:

a_{i j*} ≤ a_{i* j*} ≤ a_{i* j}.

The element a_{i* j*} of the matrix at the saddle point is simultaneously the minimum of
its row and the maximum of its column. For example, in the game with the matrix

| 1  0  4 |
| 5  3  8 |
| 6  0  1 |

the point (2, 2) is a saddle point (equilibrium).
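The row-minimum/column-maximum test for a saddle point can be sketched directly:

```python
import numpy as np

def saddle_points(A):
    """All pure-strategy saddle points (i, j) of a payoff matrix: a_ij must
    be the minimum of row i and the maximum of column j.  A sketch using
    0-based indices, while the book numbers strategies from 1."""
    A = np.asarray(A)
    return [(i, j)
            for i in range(A.shape[0])
            for j in range(A.shape[1])
            if A[i, j] == A[i, :].min() == A[:, j].max()]
```

For the matrix above this returns the single point (1, 1), i.e. (2, 2) in the book's numbering; for the matrix with rows (1, 0) and (0, 1) it returns no points at all.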
1.3.2. The set of saddle points in the two-person zero-sum game Γ has properties
which enable one to speak of the optimality of a saddle point and the strategies
involved.
Theorem. Let (x_1*, y_1*) and (x_2*, y_2*) be two arbitrary saddle points in the two-person
zero-sum game Γ. Then:

1. K(x_1*, y_1*) = K(x_2*, y_2*);

2. (x_1*, y_2*) ∈ Z(Γ), (x_2*, y_1*) ∈ Z(Γ).

Proof. From the definition of a saddle point, for all x ∈ X and y ∈ Y we have

K(x, y_1*) ≤ K(x_1*, y_1*) ≤ K(x_1*, y),   (1.3.3)

K(x, y_2*) ≤ K(x_2*, y_2*) ≤ K(x_2*, y).   (1.3.4)

Substituting x = x_2*, y = y_2* into (1.3.3) and x = x_1*, y = y_1* into (1.3.4), we obtain

K(x_2*, y_1*) ≤ K(x_1*, y_1*) ≤ K(x_1*, y_2*) ≤ K(x_2*, y_2*) ≤ K(x_2*, y_1*),   (1.3.5)

whence all these quantities are equal, which proves the first assertion.

Show the validity of the second assertion. Consider the point (x_1*, y_2*). From (1.3.3)-
(1.3.5), we then have

K(x, y_2*) ≤ K(x_1*, y_2*) ≤ K(x_1*, y)

for all x ∈ X, y ∈ Y. The inclusion (x_2*, y_1*) ∈ Z(Γ) can be proved in much the same
way.
From the theorem it follows that the payoff function takes the same value at all
saddle points. Therefore, it is meaningful to introduce the following definition.

Definition. Let (x*, y*) be a saddle point in the game Γ. Then the number

v = K(x*, y*)   (1.3.7)

is called the value of the game Γ.

The proof of (1.3.8) is a consequence of the second assertion of the theorem, and is
left to the reader.
Definition. The set X* (Y*) is called the set of optimal strategies of Player 1 (2)
in the game Γ, and its elements optimal strategies of Player 1 (2).

Note that from (1.3.5) it follows that any pair of optimal strategies forms a saddle
point, and the corresponding payoff is the value of the game.
1.3.3. Optimality of the players' behavior remains unaffected if the strategy sets
in the game remain the same and the payoff function is multiplied by a positive
constant, or a constant number is added to it.

Lemma (on Scale). Let Γ = (X, Y, K) and Γ′ = (X, Y, K′) be two zero-sum
games with

K′ = βK + α,  β > 0,  α = const, β = const.   (1.3.9)

Then

Z(Γ′) = Z(Γ),  v_{Γ′} = βv_Γ + α.   (1.3.10)

Proof. If (x*, y*) ∈ Z(Γ), then from (1.3.9) it follows that (x*, y*) ∈ Z(Γ′). Conversely, since

K(x, y) = (1/β)K′(x, y) − α/β,

by similar reasoning we have that every (x, y) ∈ Z(Γ′) belongs to Z(Γ). Therefore Z(Γ) = Z(Γ′)
and

v_{Γ′} = K′(x*, y*) = βK(x*, y*) + α = βv_Γ + α.

Conceptually, this lemma states the strategic equivalence of two games differing
only by the payoff origin and the scale of measurement.
1.3.4. We shall now establish a link between the principle of equilibrium and the
principles of minimax and maximin in a two-person zero-sum game.

Theorem. For a saddle point to exist in the game Γ = (X, Y, K), it
is necessary and sufficient that the quantities

min_{y∈Y} sup_{x∈X} K(x, y),  max_{x∈X} inf_{y∈Y} K(x, y)   (1.3.11)

exist and the following equality holds:

max_{x∈X} inf_{y∈Y} K(x, y) = min_{y∈Y} sup_{x∈X} K(x, y).   (1.3.12)

Proof. Necessity. Let (x*, y*) ∈ Z(Γ). Then for all x ∈ X and y ∈ Y the following
inequality holds:

K(x, y*) ≤ K(x*, y*) ≤ K(x*, y),   (1.3.13)

and hence

sup_{x∈X} K(x, y*) ≤ K(x*, y*),   (1.3.14)

i.e. the exterior extrema of the min sup and max inf are reached at the points y* and
x*, respectively.

Sufficiency. Suppose there exist the min sup and max inf in (1.3.11).
The games in which saddle points exist are called strictly determined. Therefore,
this theorem establishes a criterion for the strict determination of the game and can
be restated as follows: for the game to be strictly determined it is necessary and
sufficient that the min sup and max inf in (1.3.11) exist and the equality (1.3.12) be
satisfied.
Note that, in the game Γ_A, the extrema in (1.3.11) are always reached and the
theorem may be reformulated as follows.

Corollary 2. For the (m × n) matrix game to be strictly determined it is
necessary and sufficient that

min_{j} max_{i} a_{ij} = max_{i} min_{j} a_{ij}.

For example, in the game with the matrix

| 1   4  1 |
| 2   3  4 |
| 0  −2  7 |

the point (2, 1) is a saddle point. In this case

max_{i} min_{j} a_{ij} = min_{j} max_{i} a_{ij} = 2.
On the other hand, the game with the matrix

| 1  0 |
| 0  1 |

does not have a saddle point, since

min_{j} max_{i} a_{ij} = 1 > max_{i} min_{j} a_{ij} = 0.
Note that the games formulated in Examples 1-3 are not strictly determined,
whereas the game in Example 6 is strictly determined and its value is v = 6.
In this case the maximin and minimax strategies are not optimal. Moreover, it is not
advantageous for a player to adhere to them, since he can obtain a larger payoff by
deviating. However, information about the choice of his strategy supplied to the
opponent may cause losses greater than those under the maximin or minimax strategy.
Indeed, let the matrix A be of the form

A = | 7  3 |
    | 2  5 |

For this matrix min_j max_i a_{ij} = 5 and max_i min_j a_{ij} = 3, i.e. a saddle point does not
exist. Denote by i* the maximin strategy of Player 1 (i* = 1), and by j* the minimax
strategy of Player 2 (j* = 2). Suppose Player 2 adopts strategy j* = 2 and Player
1 chooses strategy i = 2. Then the latter receives the payoff 5, i.e. 2 units more
than the maximin. If, however, Player 2 guesses the choice of Player 1, he alters his
strategy to j = 1, and then Player 1 receives a payoff of only 2 units, i.e. 1 unit less
than in the case of the maximin. Similar reasoning applies to the second player.
How can the information about the choice of a strategy be kept secret from the
opponent? The answer is to choose the strategy with the help of some random device.
In this case the opponent cannot learn the player's particular strategy in advance,
since the player himself does not know it until the strategy is actually chosen at
random.
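The "random device" of this paragraph can be sketched with a pseudo-random sampler; the mixed strategy below is a hypothetical example, not taken from the text.

```python
import numpy as np

# A mixed strategy is a probability distribution over the pure strategies;
# drawing from it realizes the random device described above.
rng = np.random.default_rng(0)
x = [0.4, 0.6]                          # hypothetical mixed strategy of Player 1
choices = rng.choice(len(x), size=10_000, p=x)
freq = np.bincount(choices) / len(choices)
# freq is close to (0.4, 0.6), yet no single draw can be predicted in advance.
```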
1.4.2. Definition. The random variable whose values are strategies of a player
is called a mixed strategy of the player.
Thus, for the matrix game Γ_A, a mixed strategy of Player 1 is a random variable
whose values are the row numbers i ∈ M, M = {1, 2, …, m}. A similar definition
applies to Player 2's mixed strategy, whose values are the column numbers j ∈ N of
the matrix A.
Considering the above definition of mixed strategies, the former strategies will
be referred to as pure strategies. Since the random variable is characterized by its
distribution, the mixed strategy will be identified in what follows with the probability
distribution over the set of pure strategies. Thus, Player 1's mixed strategy x in the
game is the m-dimensional vector

x = (ξ_1, …, ξ_m),  Σ_{i=1}^m ξ_i = 1,  ξ_i ≥ 0,  i = 1, …, m,   (1.4.2)

and, similarly, Player 2's mixed strategy y is the n-dimensional vector

y = (η_1, …, η_n),  Σ_{j=1}^n η_j = 1,  η_j ≥ 0,  j = 1, …, n.   (1.4.3)

In this case, ξ_i ≥ 0 and η_j ≥ 0 are the probabilities of choosing the pure strategies
i ∈ M and j ∈ N, respectively, when the players use the mixed strategies x and y.
Denote by X and Y the sets of mixed strategies for the first and second players,
respectively. It can easily be seen that the set of mixed strategies of each player is a
compact set in the corresponding finite Euclidean space (closed, bounded set).
Definition. Let x = (ξ_1, …, ξ_m) ∈ X be a mixed strategy of Player 1. The set
of indices

M_x = {i | i ∈ M, ξ_i > 0},   (1.4.4)

where M = {1, 2, …, m}, is called the spectrum of the strategy x.

Similarly, for the mixed strategy y = (η_1, …, η_n) ∈ Y of Player 2 the spectrum
N_y is determined as follows:

N_y = {j | j ∈ N, η_j > 0}.   (1.4.5)
The function K(x, y) is continuous in x ∈ X and y ∈ Y. Notice that when one player
uses a pure strategy (i or j, respectively) and the other uses a mixed strategy (y or
x), the payoffs K(i, y), K(x, j) are computed by the formulas

K(i, y) = Σ_{j=1}^n a_{ij} η_j,  K(x, j) = Σ_{i=1}^m a_{ij} ξ_i.

A situation (x*, y*) is called a saddle point (equilibrium) in mixed strategies if for
all x ∈ X and y ∈ Y

K(x, y*) ≤ K(x*, y*) ≤ K(x*, y).   (1.4.7)
The strategies (x*, y*) forming the saddle point are called optimal. Moreover,
by Theorem 1.3.4, the strategies x* and y* are respectively the maximin and minimax
strategies, since the exterior extrema in (1.3.11) are attained (the function K(x, y)
is continuous on the compact sets X and Y).

Lemma 1.3.3 shows that two games differing only by the payoff reference point and
the scale of payoff measurement (Lemma on Scale) are strategically equivalent. It
turns out that if two matrix games Γ_A and Γ_{A′} are subject to this lemma, their
mixed extensions are also strategically equivalent. This fact is formally established by the
following lemma.
Lemma. Let Γ_A and Γ_{A′} be two (m × n) matrix games, where

A′ = αA + B,  α > 0,

and B is the matrix with identical elements β, i.e. b_{ij} = β for all i and j. Then
Z(Γ̄_{A′}) = Z(Γ̄_A), v_{A′} = αv_A + β, where Γ̄_{A′} and Γ̄_A are the mixed extensions of the
games Γ_{A′} and Γ_A, respectively, and v_{A′}, v_A are the values of the games Γ_{A′} and Γ_A.

Proof. Both matrices A and A′ have dimension m × n; therefore the sets of mixed
strategies in the games Γ_{A′} and Γ_A coincide. We shall show that for any situation in
mixed strategies (x, y) the following equality holds:

K′(x, y) = αK(x, y) + β,

where K′ and K are Player 1's payoffs in the games Γ_{A′} and Γ_A, respectively.
Indeed, for all x ∈ X and y ∈ Y we have

K′(x, y) = xA′y = x(αA + B)y = α(xAy) + xBy = αK(x, y) + β.

From the Scale Lemma it then follows that Z(Γ̄_{A′}) = Z(Γ̄_A), v_{A′} = αv_A + β.
Example 7. Let us verify that the strategies x* = (1/2, 1/4, 1/4), y* = (1/2, 1/4, 1/4)
are optimal and v_A = 0 is the value of the game Γ_A with the matrix

    |  1  −1  −1 |
A = | −1  −1   3 |
    | −1   3  −1 |

We shall simplify the matrix A (to obtain the maximum number of zeros). Adding a
unity to all elements of the matrix A, we get the matrix

     | 2  0  0 |
A′ = | 0  0  4 |
     | 0  4  0 |

Each element of the matrix A′ can be divided by 2. The new matrix is of the form

     | 1  0  0 |
A″ = | 0  0  2 |
     | 0  2  0 |
By the lemma we have v_{A″} = (1/2)v_{A′} = (1/2)(v_A + 1). Verify that the value of the game
Γ_{A″} is equal to 1/2. Indeed, K(x*, y*) = x*A″y* = 1/2. On the other hand, for each
strategy y ∈ Y, y = (η_1, η_2, η_3), we have K(x*, y) = (1/2)η_1 + (1/2)η_2 + (1/2)η_3 = (1/2) · 1 = 1/2, and
for all x = (ξ_1, ξ_2, ξ_3) ∈ X, K(x, y*) = (1/2)ξ_1 + (1/2)ξ_2 + (1/2)ξ_3 = 1/2. Consequently, the
above-mentioned strategies x*, y* are optimal and v_A = 0.

In what follows, whenever the matrix game Γ_A is mentioned, we shall mean its
mixed extension Γ̄_A.
1.5.1. Consider the system of linear inequalities

xA ≤ b,

or, in coordinate form,

x a_j ≤ β_j,  j ∈ N,  N = {1, …, n},   (1.5.1)

where A = [a_j, j ∈ N] is an (m × n) matrix with columns a_j, b = (β_1, …, β_n) ∈ Rⁿ,
and x ∈ Rᵐ.

Denote the set of solutions of (1.5.1) by X = {x | xA ≤ b}. From the definition it
immediately follows that X is a convex set. The set X is called a convex polyhedral
set given by the system of constraints (1.5.1).
1.5.2. A point x ∈ M, where M is a convex set, is called an extreme point if
from the condition x = λx_1 + (1 − λ)x_2, x_1 ∈ M, x_2 ∈ M, 0 < λ < 1, it follows that
x_1 = x_2 = x. Conceptually, the definition implies that x ∈ M is an extreme point if
there is no line segment with two endpoints in M for which x is an interior point.

Notice that an extreme point of a convex set is always a boundary point, but the
converse is not true.
Let X be a convex polyhedral set that is given by the system of constraints (1.5.1).
Then the following assertions are true.
Theorem. [Ashmanov (1981)]. The set X has extreme points if and only if
rank A = rank[a_j, j ∈ N] = m.

Theorem. [Ashmanov (1981)]. For a point x_0 ∈ X to be extreme, it is
necessary and sufficient that this point be a solution of the system

x a_j = β_j,  j ∈ N′,  x a_j < β_j,  j ∈ N \ N′,

where N′ ⊂ N is such that the system of columns {a_j, j ∈ N′} contains m linearly
independent vectors.
The convex hull of a finite number of points is called a convex polyhedron generated
by these points. A convex polyhedron is generated by its extreme points. Thus,
if we consider the set X of Player 1's mixed strategies in the (m × n) game, then
X = conv{u_1, …, u_m}, where u_i = (0, …, 0, 1, 0, …, 0) are the unit vectors of the space
Rᵐ of pure strategies of Player 1. The set X is a convex polyhedron of dimension
(m − 1) and is also called the (m − 1)-dimensional simplex (or the fundamental
simplex). In this case, all vectors u_i (pure strategies) are extreme points of the
polyhedron X. Similar statements apply to Player 2's set Y of mixed strategies.
A set C is called a cone if x ∈ C, λ > 0 imply λx ∈ C.
Conceptually, the cone C ⊂ Rᵐ contains, together with each
point x, the entire half-line {λx | λ > 0}.
The cone C is called a convex cone if the following condition is satisfied: x + y ∈ C
for all x, y ∈ C. In other words, the cone C is convex if it is closed with respect to
addition. An equivalent definition may also be given: the cone is called convex if
it is a convex set. The sum of convex cones C_1 + C_2 = {c | c = c_1 + c_2, c_1 ∈ C_1, c_2 ∈ C_2}
and their intersection C_1 ∩ C_2 are also convex cones.
Immediate verification of the definition shows that the set C = {x | xA ≤ 0}
of solutions of the homogeneous system of linear inequalities corresponding to (1.5.1)
is a convex cone.
Let X be a convex polyhedral set given in the equivalent form

Σ_{i=1}^m ξ_i a_i ≤ b,   (1.5.4)

where x = (ξ_1, …, ξ_m) ∈ Rᵐ and a_i is the i-th row of the matrix A, i = 1, …, m. Now
suppose that rank A = r, r < m, and the vectors a_1, …, a_r form a row basis of the
matrix A. Decompose the remaining rows with respect to this basis:

a_j = Σ_{i=1}^r λ_{ji} a_i,  j = r + 1, …, m.   (1.5.5)

Substituting (1.5.5) into (1.5.4), we obtain the following system of inequalities (equivalent
to (1.5.4)):

Σ_{i=1}^r (ξ_i + Σ_{j=r+1}^m ξ_j λ_{ji}) a_i ≤ b.   (1.5.6)

Denote by X_0 the set of vectors x = (ξ_1, …, ξ_m) ∈ Rᵐ satisfying the inequalities (1.5.6)
and the condition ξ_j = 0, j = r + 1, …, m. By the Theorem in 1.5.2, the set X_0 has extreme
points. The following theorem holds [Ashmanov (1981)].
Theorem on representation of a polyhedral set. Let X be the polyhedral
set given by the system of constraints (1.5.4). Then

X = M + C,

where M is a convex polyhedron and C = {x | xA ≤ 0} is the convex cone of solutions
of the corresponding homogeneous system.
1.5.5. In closing this section, we give one property of convex functions. First
recall that a function φ : M → R¹, where M ⊂ Rᵐ is a convex set, is convex if

φ(λx_1 + (1 − λ)x_2) ≤ λφ(x_1) + (1 − λ)φ(x_2)   (1.5.9)

for any x_1, x_2 ∈ M and λ ∈ [0, 1]. If the reverse inequality holds in (1.5.9), then the
function φ is called concave.

Let φ_i(x), i = 1, …, n, be a family of functions convex on M. Then the upper
envelope ψ(x) of this family of functions,

ψ(x) = max_{i} φ_i(x),

is convex on M.

Indeed, by the definition of a convex function, for x_1, x_2 ∈ M and α ∈ [0, 1] we
have

φ_i(αx_1 + (1 − α)x_2) ≤ αφ_i(x_1) + (1 − α)φ_i(x_2)
                       ≤ α max_i φ_i(x_1) + (1 − α) max_i φ_i(x_2).

Hence we get

ψ(αx_1 + (1 − α)x_2) = max_i φ_i(αx_1 + (1 − α)x_2) ≤ αψ(x_1) + (1 − α)ψ(x_2),

i.e. ψ is convex on M.
and from the feasibility of x̄ and ȳ for problems (1.6.1), (1.6.2), it follows that
x* = x̄/Θ ≥ 0 and y* = ȳ/Θ ≥ 0, i.e. x* and y* are mixed strategies of players 1 and
2 in the game Γ_A.

Let us compute the payoff to Player 1 at (x*, y*).

On the other hand, from the feasibility of the vectors x̄ and ȳ for problems (1.6.1), (1.6.2)
and the equality (1.6.3), we have

Let x ∈ X and y ∈ Y be arbitrary mixed strategies for players 1 and 2. The following
inequalities hold:
A = | 4  0 |
    | 2  3 |

The associated linear programming problems are of the form

min ξ_1 + ξ_2,            max η_1 + η_2,
4ξ_1 + 2ξ_2 ≥ 1,          4η_1 ≤ 1,
3ξ_2 ≥ 1,                 2η_1 + 3η_2 ≤ 1,
ξ_1 ≥ 0, ξ_2 ≥ 0,         η_1 ≥ 0, η_2 ≥ 0.

Note that these problems may be written in the equivalent form with constraints in
the form of equalities:

min ξ_1 + ξ_2,            max η_1 + η_2,
4ξ_1 + 2ξ_2 − ξ_3 = 1,    4η_1 + η_3 = 1,
3ξ_2 − ξ_4 = 1,           2η_1 + 3η_2 + η_4 = 1,
ξ_i ≥ 0, i = 1, …, 4,     η_j ≥ 0, j = 1, …, 4.
Thus, any method of solving linear programming problems can be used to solve
matrix games. The simplex method is most commonly used for such problems.
Its systematic discussion may be found in Ashmanov (1981), Gale (1960), Hu (1970).
1.6.2. In a sense, the linear programming problem is equivalent to the matrix
game Γ_A. Indeed, consider the following direct and dual problems of linear programming:

min xu,
xA ≥ w,   (1.6.9)
x ≥ 0;

max yw,
Ay ≤ u,   (1.6.10)
y ≥ 0,

where u = (1, …, 1) ∈ Rᵐ and w = (1, …, 1) ∈ Rⁿ.
Let X̄ and Ȳ be the sets of optimal solutions of the problems (1.6.9) and (1.6.10),
respectively. Denote (1/Θ)X̄ = {x/Θ | x ∈ X̄}, (1/Θ)Ȳ = {y/Θ | y ∈ Ȳ}, Θ > 0.

Theorem. Let Γ_A be the (m × n) game with a positive matrix A (all elements
are positive) and let there be given the two dual problems of linear programming (1.6.9)
and (1.6.10). Then the following assertions hold.

1. Both problems have optimal solutions, and the optimal values of their objective
functions coincide: Θ = min xu = max yw.

2. The value of the game is

v_A = 1/Θ,

and the strategies

x* = x̄/Θ,  y* = ȳ/Θ

are optimal, where x̄ ∈ X̄ is an optimal solution of the direct problem (1.6.9)
and ȳ ∈ Ȳ is an optimal solution of the dual problem (1.6.10).

3. Any optimal strategies x* ∈ X* and y* ∈ Y* of the players can be constructed
as shown above, i.e.

X* = (1/Θ)X̄,  Y* = (1/Θ)Ȳ.
x̄ a_j = Θ(x* a_j) ≥ Θ v_A = Θ(1/Θ) = 1,

in which case x̄ ≥ 0, since Θ > 0 and x* ≥ 0. Therefore x̄ is a feasible solution of
problem (1.6.9).
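The reduction of a matrix game to the dual linear programs (1.6.9)-(1.6.10) can be sketched with SciPy's `linprog`; the function name is ours, and the positivity shift relies on the Scale Lemma of 1.3.3.

```python
import numpy as np
from scipy.optimize import linprog

def solve_matrix_game(A):
    """Solve a matrix game via the dual LPs (1.6.9)-(1.6.10): a sketch.

    The matrix is first shifted to be positive (Scale Lemma), which adds
    the shift to the value but leaves the optimal strategies unchanged."""
    A = np.asarray(A, dtype=float)
    shift = max(0.0, 1.0 - A.min())          # make all elements positive
    B = A + shift
    m, n = B.shape
    # direct problem: min x.u  s.t.  xB >= w, x >= 0  (u, w vectors of ones)
    direct = linprog(np.ones(m), A_ub=-B.T, b_ub=-np.ones(n),
                     bounds=[(0, None)] * m)
    # dual problem:   max y.w  s.t.  By <= u, y >= 0
    dual = linprog(-np.ones(n), A_ub=B, b_ub=np.ones(m),
                   bounds=[(0, None)] * n)
    theta = direct.fun                       # theta = 1 / value of shifted game
    x_opt = direct.x / theta                 # x* = xbar / theta
    y_opt = dual.x / theta                   # y* = ybar / theta
    value = 1.0 / theta - shift
    return value, x_opt, y_opt

value, x_opt, y_opt = solve_matrix_game([[4, 0], [2, 3]])
# for the (2 x 2) matrix of the example above: value 12/5,
# x* = (1/5, 4/5), y* = (3/5, 2/5)
```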
K(i, y*) ≤ K(x*, y*) ≤ K(x*, j).   (1.7.1)

Proof. Necessity. Let (x*, y*) be a saddle point in the game Γ̄_A. Then

K(x, y*) ≤ K(x*, y*) ≤ K(x*, y)

for all x ∈ X, y ∈ Y. Hence, in particular, for the pure strategies u_i ∈ X and w_j ∈ Y we have

K(i, y*) ≤ K(x*, y*) ≤ K(x*, j),

and therefore

K(i, y*) ≤ K(x*, j)
for each t and j . Suppose the opposite is true, i.e. (1.7.6) is not satisfied. Then
3=1
= f:lK(i,y)<m^K(i,y).
Hence we have
for all 1 ≤ i ≤ m and 1 ≤ j ≤ n, then, by the Theorem in 1.7.1, (x, y) is a saddle
point in the game Γ̄_A.
From the proof it follows that any one of the numbers in (1.7.6) is the value of
the game.
1.7.3. Theorem. The following relation holds for the matrix game Γ_A:

max_{x∈X} min_{y∈Y} K(x, y) = v_A = min_{y∈Y} max_{x∈X} K(x, y),   (1.7.7)

in which case the extrema are achieved on the players' optimal strategies.

This theorem follows from Theorems 1.3.4 and 1.7.2, and its proof is left to the
reader.
1.7.4. Theorem. In the matrix game Γ_A the players' sets of optimal mixed
strategies X* and Y* are convex polyhedra.

Proof. By Theorem 1.7.1, the set X* is the set of all solutions of the system of
inequalities

x a_j ≥ v_A,  j ∈ N,
x u = 1,
x ≥ 0,

where u = (1, …, 1) ∈ Rᵐ and v_A is the value of the game. Thus, X* is a convex
polyhedral set (1.5.1). On the other hand, X* ⊂ X, where X is a convex polyhedron
(1.5.3). Therefore X* is bounded. Consequently, by Theorem 1.5.3, the set X* is a
convex polyhedron.

In a similar manner, it may be proved that Y* is a convex polyhedron.
1.7.5. As an application of Theorem 1.7.3, we shall provide a geometric solution
of games in which one of the players has two strategies, i.e. of (2 × n) and (m × 2) games.
This method is based on the property that the optimal strategies x* and y* deliver the
exterior extrema in the equality (1.7.7).

Example 11. ((2 × n) game.) We shall examine the game in which Player 1 has
two strategies and Player 2 has n strategies. The matrix is of the form

A = | a_11  a_12  …  a_1n |
    | a_21  a_22  …  a_2n |

Let x = (ξ, 1 − ξ) be a mixed strategy of Player 1. Then his payoff against the pure
strategy j of Player 2 is the straight line

K(x, j) = a_1j ξ + a_2j (1 − ξ).   (1.7.8)

The function

H(ξ) = min_{1≤j≤n} [a_1j ξ + a_2j (1 − ξ)]

is the lower envelope of the family of straight lines (1.7.8). This function is concave,
being the lower envelope of a family of concave (here linear) functions (see 1.5.5).
The point ξ*, at which the maximum of the function H(ξ) over ξ ∈ [0, 1] is achieved,
yields the required optimal strategy x* = (ξ*, 1 − ξ*) and the value of
the game v_A = H(ξ*).
For definiteness, we shall consider the game with the matrix

A = | 1  3  1  4 |
    | 2  1  4  0 |

Here the lines K(x, 1) = −ξ + 2 and K(x, 4) = 4ξ determine the maximum of the
lower envelope, i.e.

4ξ* = −ξ* + 2 = v_A.

Hence we get the optimal strategy x* = (2/5, 3/5) of Player 1 and the value of the
game v_A = 8/5. Player 2's optimal strategy is found by the following reasoning.
Note that in the case studied K(x*, 1) = K(x*, 4) = v_A = 8/5.
Figure 1.1
For the optimal strategy y* = (η_1*, η_2*, η_3*, η_4*) the equality K(x*, y*) = v_A = 8/5
must hold. In this case K(x*, 2) > 8/5 and K(x*, 3) > 8/5; therefore η_2* = η_3* = 0, and
η_1*, η_4* can be found from the conditions

η_1* + 4η_4* = 8/5,
2η_1* = 8/5.

Thus, η_1* = 4/5, η_4* = 1/5, and the optimal strategy of Player 2 is y* = (4/5, 0, 0, 1/5).
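The graphical method of Example 11 reduces to checking the endpoints of [0, 1] and the pairwise intersections of the lines K(x, j); a sketch:

```python
import numpy as np

def solve_2xn(A):
    """Graphical method for a (2 x n) game, as in Example 11: a sketch.

    Maximizes the lower envelope H(xi) = min_j K((xi, 1-xi), j) over
    [0, 1].  Since H is piecewise linear and concave, the maximum is
    attained at an endpoint or where two of the lines intersect, so only
    those candidate points need to be checked."""
    A = np.asarray(A, dtype=float)
    slopes, intercepts = A[0] - A[1], A[1]    # K(xi, j) = slope_j*xi + a_2j
    candidates = {0.0, 1.0}
    n = A.shape[1]
    for j in range(n):
        for k in range(j + 1, n):
            if slopes[j] != slopes[k]:        # intersection of lines j and k
                t = (intercepts[k] - intercepts[j]) / (slopes[j] - slopes[k])
                if 0.0 <= t <= 1.0:
                    candidates.add(float(t))

    def H(t):                                 # lower envelope of the lines
        return (slopes * t + intercepts).min()

    xi = max(candidates, key=H)
    return H(xi), (xi, 1.0 - xi)

value, x_opt = solve_2xn([[1, 3, 1, 4], [2, 1, 4, 0]])
# value is 8/5 and x_opt is (2/5, 3/5), matching the example
```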
Example 12. ((m × 2) game.) In this example, Player 2 has two strategies and
Player 1 has m strategies. The matrix A is of the form

A = | a_11  a_12 |
    | a_21  a_22 |
    |  …     …   |
    | a_m1  a_m2 |
This game can be analyzed in a similar manner. Indeed, let y = (η, 1 − η) be an
arbitrary mixed strategy of Player 2. Then Player 1's payoff in the situation (i, y) is

K(i, y) = a_i1 η + a_i2 (1 − η) = (a_i1 − a_i2) η + a_i2.

The graph of the function K(i, y) is a straight line. Consider the upper envelope
of these straight lines, i.e. the function

H(η) = max_{1≤i≤m} [(a_i1 − a_i2) η + a_i2].

The function H(η) is convex (as the upper envelope of a family of convex functions).
The point of minimum η* of the function H(η) yields the optimal strategy y* =
(η*, 1 − η*) and the value of the game v_A = H(η*) = min_{η∈[0,1]} H(η).
1.7.6. We shall provide a theorem that is useful in finding a solution of the game.
Theorem. Let x* = (ξ1*, ..., ξm*) and y* = (η1*, ..., ηn*) be optimal strategies in the game Γ_A and v_A be the value of the game. Then for any i for which K(i, y*) < v_A there must be ξi* = 0, and for any j such that v_A < K(x*, j) there must be ηj* = 0. Conversely, if ξi* > 0, then K(i, y*) = v_A, and if ηj* > 0, then K(x*, j) = v_A.
Proof. Suppose that for some i0 ∈ M we have K(i0, y*) < v_A and ξ*_{i0} ≠ 0. Then

K(i0, y*) ξ*_{i0} < v_A ξ*_{i0}.

Summing over all i the inequalities K(i, y*) ξi* ≤ v_A ξi*, at least one of which is strict, we get K(x*, y*) < v_A, which contradicts the fact that v_A is the value of the game. The second part of the Theorem can be proved in a similar manner.
This result is a counterpart of the complementary slackness theorem [Hu (1970)] or, as it is sometimes called, the canonical equilibrium theorem for the linear programming problem [Gale (1960)].
Definition. Player 1's (2's) pure strategy i ∈ M (j ∈ N) is called an essential or active strategy if there exists an optimal strategy x* = (ξ1*, ..., ξm*) (y* = (η1*, ..., ηn*)) of that player for which ξi* > 0 (ηj* > 0).
From the definition, and from the latter theorem, it follows that for each essential strategy i of Player 1 and any optimal strategy y* ∈ Y* of Player 2 in the game Γ_A the following equality holds:

a_i y* = K(i, y*) = v_A.

A similar equality holds for any essential strategy j ∈ N of Player 2 and any optimal strategy x* ∈ X* of Player 1:

x* a^j = K(x*, j) = v_A.

If the equality a_i y = v_A holds for the pure strategy i ∈ M and the mixed strategy y ∈ Y, then the strategy i is a best reply to the mixed strategy y in the game Γ_A.
Thus, using this terminology, the theorem can be restated as follows: if a pure strategy of a player is essential, then it is a best reply to any optimal strategy of the opponent.
A knowledge of the spectrum of an optimal strategy simplifies finding a solution of the game. Indeed, let M_{x*} be the spectrum of Player 1's optimal strategy x*. Then each optimal strategy y* = (η1*, ..., ηn*) of Player 2 and the value of the game v satisfy the system

a_i y* = v,  i ∈ M_{x*},
a_i y* ≤ v,  i ∈ M \ M_{x*}.

Thus, only essential strategies may appear in the spectrum M_{x*} of any optimal strategy x*.
1.7.7. To conclude this section, we shall provide an analytical solution of the Attack and Defence game (see Example 4, 1.1.3).
Example 13. [Sakaguchi (1973)]. Let us consider the game with the (n x n) matrix

A = ( β1 τ1   τ1     ...  τ1
      τ2     β2 τ2   ...  τ2
      ...
      τn     τn      ...  βn τn ).

Here τi > 0 is the value and 0 < βi < 1 is the probability of hitting the target C_i, i = 1, 2, ..., n, provided that it is defended.
Let τ1 ≤ τ2 ≤ ... ≤ τn. We shall define the function φ of the integers 1, 2, ..., n as follows:

φ(k) = [ Σ_{i=k}^{n} (1 - βi)^{-1} - 1 ] / [ Σ_{i=k}^{n} (τi (1 - βi))^{-1} ].   (1.7.9)

We shall establish properties of the function φ(k). Denote by R one of the signs of the order relation {>, =, <}. Then

φ(k) R φ(k + 1)   (1.7.11)

if and only if

τk R φ(k),  k = 1, 2, ..., n - 1,  τ0 = 0.   (1.7.12)
Indeed, from (1.7.9) we obtain

φ(k + 1) = [ Σ_{i=k+1}^{n} (1 - βi)^{-1} - 1 ] / [ Σ_{i=k+1}^{n} (τi (1 - βi))^{-1} ],

and a direct comparison of φ(k) with φ(k + 1) yields (1.7.11), (1.7.12). Then there exists an integer l, 1 ≤ l ≤ n, such that

τ_{l-1} < φ(l) ≤ τ_l.   (1.7.14)
We now find optimal strategies in the game Γ_A. Recall that τ1 ≤ τ2 ≤ ... ≤ τn. Then the optimal strategies x* = (ξ1*, ..., ξn*) and y* = (η1*, ..., ηn*) for players 1 and 2, respectively, are as follows:

ξi* = 0,  i = 1, ..., l - 1,
ξi* = (τi (1 - βi))^{-1} / Σ_{k=l}^{n} (τk (1 - βk))^{-1},  i = l, ..., n,   (1.7.15)

ηj* = 0,  j = 1, ..., l - 1,
ηj* = (τj - φ(l)) / (τj (1 - βj)),  j = l, ..., n,   (1.7.16)

and the value of the game is

v_A = φ(l).

We have ξi* ≥ 0, i = 1, 2, ..., n, and Σ_{i=1}^{n} ξi* = 1. From the definition of φ(l) and (1.7.14) we have ηj* ≥ 0, j = 1, 2, ..., n, and Σ_{j=1}^{n} ηj* = 1.
Let K(x*, j) be the payoff of Player 1 at (x*, j); similarly, let K(i, y*) be the payoff at (i, y*). Substituting (1.7.15), (1.7.16) into the payoff function, using the assumption that the values of the targets do not decrease, and using (1.7.14), we obtain

K(x*, j) = φ(l),  j = l, ..., n,
K(x*, j) = φ(l) + [ Σ_{i=l}^{n} (τi (1 - βi))^{-1} ]^{-1} > φ(l),  j = 1, ..., l - 1.
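The formulas (1.7.9), (1.7.14)-(1.7.16) can be exercised on illustrative data (not taken from the text): τ = (1, 2, 3), β = (1/2, 1/2, 1/2). A sketch in exact rational arithmetic:

```python
# Numerical check of the Attack and Defence solution on illustrative data
# tau = (1, 2, 3), beta = (1/2, 1/2, 1/2); exact rational arithmetic.
from fractions import Fraction as F

tau = [F(1), F(2), F(3)]
beta = [F(1, 2), F(1, 2), F(1, 2)]
n = len(tau)

def phi(k):  # formula (1.7.9), k = 1, ..., n
    s = sum(1 / (1 - b) for b in beta[k - 1:])
    t = sum(1 / (tv * (1 - b)) for tv, b in zip(tau[k - 1:], beta[k - 1:]))
    return (s - 1) / t

# the index l of (1.7.14): tau_{l-1} < phi(l) <= tau_l, with tau_0 = 0
l = next(k for k in range(1, n + 1)
         if (tau[k - 2] if k > 1 else F(0)) < phi(k) <= tau[k - 1])
v = phi(l)  # value of the game

T = sum(1 / (tv * (1 - b)) for tv, b in zip(tau[l - 1:], beta[l - 1:]))
x = [F(0)] * (l - 1) + [1 / (tv * (1 - b)) / T
                        for tv, b in zip(tau[l - 1:], beta[l - 1:])]  # (1.7.15)
y = [F(0)] * (l - 1) + [(tv - v) / (tv * (1 - b))
                        for tv, b in zip(tau[l - 1:], beta[l - 1:])]  # (1.7.16)

assert sum(x) == 1 and sum(y) == 1
# K(x*, j) = sum_i x_i tau_i - x_j tau_j (1 - beta_j); equals v for j >= l
K = [sum(xi * ti for xi, ti in zip(x, tau)) - x[j] * tau[j] * (1 - beta[j])
     for j in range(n)]
assert all(K[j] == v for j in range(l - 1, n))
```

For these data l = 2, v_A = φ(2) = 9/5, x* = (0, 3/5, 2/5) and y* = (0, 1/5, 4/5).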
Equivalence of the pairs of strategies i', i'' (i' ~ i'') and j', j'' (j' ~ j'') implies that the conditions a_{i'j'} = a_{i''j''} are satisfied.
1.8. Dominance of strategies
for all j = 1, ..., n. Hence, using the optimality of the strategy x* (see 1.7.3), we get

min_j x* a^j ≥ min_j x a^j.   (1.8.4)

Suppose now that the mth row of the matrix A is dominated by a convex linear combination of the rows of A with coefficients ξ1, ..., ξm, where ξm < 1. The components of this vector are non-negative (ξi ≥ 0, i = 1, ..., m) and Σ_{i=1}^{m} ξi = 1. On the other hand, for all j = 1, ..., n we have

a_mj ≤ Σ_{i=1}^{m} ξi a_ij = Σ_{i=1}^{m-1} ξi a_ij + ξm a_mj,

whence

(1 - ξm) a_mj ≤ Σ_{i=1}^{m-1} ξi a_ij,

or, setting ξi' = ξi / (1 - ξm),

Σ_{i=1}^{m-1} ξi' a_ij ≥ a_mj,  j = 1, ..., n,

Σ_{i=1}^{m-1} ξi' = 1,  ξi' ≥ 0,  i = 1, ..., m - 1.   (1.8.5)

Thus, from the dominance of the mth row it always follows that it does not exceed a convex linear combination of the remaining m - 1 rows [(1.8.5)].
Let (x*, y*) ∈ Z(Γ_{A'}) be a saddle point in the game Γ_{A'}, x* = (ξ1*, ..., ξ*_{m-1}), y* = (η1*, ..., ηn*). To prove assertions 1, 2, 3 of the theorem, it suffices to show that K(x*_{(m)}, y*) = v_{A'}, where x*_{(m)} = (ξ1*, ..., ξ*_{m-1}, 0), and

Σ_{j=1}^{n} a_ij ηj* ≤ v_{A'} ≤ Σ_{i=1}^{m-1} ξi* a_ij + 0 · a_mj,  i = 1, ..., m,  j = 1, ..., n.   (1.8.6)

All of the inequalities (1.8.6), except the first one for i = m, are evident from the optimality of (x*, y*) in Γ_{A'}. We shall prove the remaining inequality. To do this, it suffices to show that

Σ_{j=1}^{n} a_mj ηj* ≤ v_{A'}.

From inequalities (1.8.3), (1.8.5) we obtain

Σ_{j=1}^{n} a_mj ηj* ≤ Σ_{j=1}^{n} Σ_{i=1}^{m-1} ξi' a_ij ηj* = Σ_{i=1}^{m-1} ξi' Σ_{j=1}^{n} a_ij ηj* ≤ Σ_{i=1}^{m-1} ξi' v_{A'} = v_{A'}.

From Theorem 1.7.6 we then have that the mth component of any optimal strategy of Player 1 in the game Γ_A is zero. This completes the proof.
Let us formulate the dominance theorem for the second player without providing any proof.
Theorem. Let Γ_A be an (m x n) game. Assume that the jth column of the matrix A is dominated and Γ_{A'} is the game having the matrix A' obtained from A by deleting the jth column. Then the following assertions are true.
1. v_A = v_{A'}.
2. Any optimal strategy x* of Player 1 in the game Γ_{A'} is also optimal in the game Γ_A.
In the latter matrix no row (column) is dominated by any other row (column). At the same time, the 1st column a^1 is dominated by a convex linear combination of the columns a^2 and a^3, i.e. a^1 ≥ (1/2) a^2 + (1/2) a^3, since 3 ≥ (1/2)·1 + (1/2)·3, 1 = (1/2)·2 + (1/2)·0, 3 = (1/2)·0 + (1/2)·6. By eliminating the 1st column, we obtain

A3 = ( 1  3
       2  0
       0  6 ).

In this matrix the 1st row is equal to the convex linear combination of the second and third rows with the mixed strategy x = (0, 1/2, 1/2), since 1 = (1/2)·2 + (1/2)·0 and 3 = (1/2)·0 + (1/2)·6. Thus, by eliminating the 1st row, we obtain the matrix

A4 = ( 2  0
       0  6 ).
The players' optimal strategies x* and y* in the game with this matrix are x* = y* = (3/4, 1/4), in which case the game value is v = 3/2.
The latter matrix was obtained by deleting the first two rows and columns; hence the players' optimal strategies in the original game are obtained from these strategies by extension with zeros at the 1st and 2nd places, i.e. x* = y* = (0, 0, 3/4, 1/4).
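A small script (plain Python; the 3 x 3 matrix and the mixtures are those of the example) can re-check the dominance relations and solve the final 2 x 2 game:

```python
# Re-checking the elimination chain of the example and solving the
# resulting completely mixed 2x2 game with the standard closed form.
A2 = [[3, 1, 3],
      [1, 2, 0],
      [3, 0, 6]]

# column a^1 is dominated by 1/2 a^2 + 1/2 a^3 (larger is worse for Player 2)
assert all(row[0] >= 0.5 * row[1] + 0.5 * row[2] for row in A2)
A3 = [row[1:] for row in A2]                  # delete the 1st column

# row 1 equals the mixture (1/2, 1/2) of rows 2 and 3
assert all(A3[0][j] == 0.5 * A3[1][j] + 0.5 * A3[2][j] for j in range(2))
A4 = A3[1:]                                   # delete the 1st row: [[2, 0], [0, 6]]

# completely mixed 2x2 game: x = (p, 1 - p) equalizes the two columns
(a, b), (c, d) = A4
p = (d - c) / (a - b - c + d)
v = (a * d - b * c) / (a - b - c + d)
print(p, v)    # -> 0.75 1.5, i.e. x* = y* = (3/4, 1/4), v = 3/2
```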
x* = v_A u A^{-1},   (1.9.1)

y* = v_A A^{-1} u,   (1.9.2)

v_A = 1 / (u A^{-1} u),   (1.9.3)

where u = (1, ..., 1) is the vector of units.
Proof. Let x* = (ξ1*, ..., ξm*) ∈ X* and y* = (η1*, ..., ηm*) ∈ Y* be arbitrary optimal strategies of the players, and let v_A be the value of the game Γ_A. Since Γ_A is a completely mixed game, x* and y* are completely mixed strategies, and they (and only they) are solutions of the systems of linear equations of 1.7.6:

x a^j = v_A,  j = 1, ..., m,
a_i y = v_A,  i = 1, ..., m.

Since the matrix A is nonsingular, these systems have the unique solutions

x* = v_A u A^{-1},
y* = v_A A^{-1} u.

Multiplying the first of these equalities on the right by the vector u and noting that x* u = 1, we obtain

v_A = 1 / (u A^{-1} u).

This completes the proof of the Theorem.
The reverse is also true, although the proof will be left to the reader.
1.9. Completely mixed and symmetric games
Theorem. Suppose the matrix A is nonsingular in the (m x m) game Γ_A. If Player 2 has in Γ_A a completely mixed optimal strategy, then Player 1 has a unique optimal strategy x* (1.9.1). If Player 1 has in the game Γ_A a completely mixed optimal strategy, then Player 2 has a unique optimal strategy y* (1.9.2). The value of the game v_A is defined by (1.9.3).
Example 15. ((2 x 2) game.) Consider the (2 x 2) game with the matrix

A = ( a_11  a_12
      a_21  a_22 ).

We now assume that the game Γ_A has no saddle point in pure strategies (otherwise a solution is found from the maximin and minimax equality) and x* = (ξ*, 1 - ξ*), y* = (η*, 1 - η*) are arbitrary optimal strategies of the first and second players, respectively. In this case the saddle point (x*, y*) and the game Γ_A are completely mixed (ξ* > 0 and η* > 0). Therefore, by Theorem 1.9.1, the game has a unique pair of optimal mixed strategies, which are a solution of the system of equations

ξ* a_11 + (1 - ξ*) a_21 = v_A,   ξ* a_12 + (1 - ξ*) a_22 = v_A,
η* a_11 + (1 - η*) a_12 = v_A,   η* a_21 + (1 - η*) a_22 = v_A.
If we ensure that v_A ≠ 0 (e.g. this inequality is satisfied if all the elements of the matrix A are positive), then the solution of the game is

v_A = 1 / (u A^{-1} u),   x* = v_A u A^{-1},   y* = v_A A^{-1} u.
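A quick sketch of formulas (1.9.1)-(1.9.3) in exact arithmetic; the matrix [[2, 0], [0, 6]] of the preceding dominance example is used for illustration:

```python
# Formulas (1.9.1)-(1.9.3) for a completely mixed 2x2 game, with exact
# rational arithmetic; illustrative matrix A = [[2, 0], [0, 6]].
from fractions import Fraction as F

A = [[F(2), F(0)], [F(0), F(6)]]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
Ainv = [[ A[1][1] / det, -A[0][1] / det],
        [-A[1][0] / det,  A[0][0] / det]]

uAinv = [Ainv[0][j] + Ainv[1][j] for j in range(2)]   # u A^{-1}
Ainvu = [Ainv[i][0] + Ainv[i][1] for i in range(2)]   # A^{-1} u
v = 1 / sum(uAinv)                                    # (1.9.3)
x = [v * c for c in uAinv]                            # (1.9.1)
y = [v * r for r in Ainvu]                            # (1.9.2)
# v = 3/2 and x* = y* = (3/4, 1/4), matching the earlier example
```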
1.9.2. Let us now consider symmetric games, i.e. the games whose matrix A is skew-symmetric: A^T = -A (in particular, m = n).
Theorem. In a symmetric game Γ_A

v_A = 0   and   X* = Y*.
Proof. Let A be the game matrix and let x ∈ X be an arbitrary strategy. Then

x A x = x A^T x = -x A x,

hence x A x = 0.
Let (x*, y*) ∈ Z(A) be a saddle point, and let v_A be the value of the game. Then

0 = y* A y* ≤ v_A ≤ x* A x* = 0.

Hence we get v_A = 0.
Let the strategy x* be optimal in the game Γ_A. Then (see Theorem 1.7.1)

x* A ≥ 0.

It follows that x*(-A^T) ≥ 0, hence x* A^T ≤ 0. Thus we get

A x* ≤ 0.

By the same Theorem 1.7.1, this means that x* is an optimal strategy of Player 2. We have thus proved that X* ⊂ Y*. The inverse inclusion is proved in a similar manner.
In what follows, when dealing with a player's optimal strategy in a symmetric game, we shall not, because of the equality X* = Y*, indicate which of the players is concerned.
Example 16. Let us solve the game with the matrix

A = (  0  -1   1
       1   0  -1
      -1   1   0 ).

The game is symmetric; hence v_A = 0 and X* = Y*. Let x* = (ξ1, ξ2, ξ3) be an optimal strategy in the game Γ_A. Then the following relations are satisfied:

ξ2 - ξ3 = 0,  -ξ1 + ξ3 = 0,  ξ1 - ξ2 = 0,
ξ1 + ξ2 + ξ3 = 1,  ξi ≥ 0,  i = 1, 2, 3.

This system has a unique solution. The vector x* = (1/3, 1/3, 1/3) is an optimal strategy.
Example 17. Solve the discrete five-step duel game in which each duelist has one bullet. This game was formulated in 1.1.4 (see Example 3). The game is symmetric, and the payoff matrix A of Player 1 (which is skew-symmetric) is of the form

A = (  0   -3   -7  -11  -15
       3    0    1   -2   -5
       7   -1    0    7    5
      11    2   -7    0   15
      15    5   -5  -15    0 ).
Note that the first strategy of each player (the first row and the first column of the matrix) is strictly dominated; hence it cannot be essential and can be deleted. In the resulting truncated matrix

A' = (  0    1   -2   -5
       -1    0    7    5
        2   -7    0   15
        5   -5  -15    0 )

not all strategies are essential.
Indeed, the symmetry of the game Γ_{A'} implies that v_{A'} = 0. If all strategies were essential, the optimal strategy x* would be a solution of the system of equations

x a^j = 0,  j = 2, 3, 4, 5,   Σ_{i=2}^{5} ξi = 1.
This system, however, is inconsistent. Exhausting the different possibilities, we dwell on the essential submatrix A'' composed of the rows and columns of the matrix A that are labeled 2, 3 and 5:
A'' = (  0   1  -5
        -1   0   5
         5  -5   0 ).

The game with the matrix A'' is completely mixed and has the unique solution y = x = (5/11, 5/11, 1/11).
In the original game, we now consider the strategies x* = y* = (0, 5/11, 5/11, 0, 1/11), which are optimal.
Thus, we finally have v_A = 0, and the saddle point (x*, y*) is unique. As far as the rules of the game are concerned, a duelist should not fire at the 1st step; he must fire with equal probability after the 2nd or 3rd step, never after the 4th step, and only with small probability may he fire when the duelists are breast to breast.
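The optimality claim is easy to verify mechanically: for a symmetric game with value 0 it suffices, by Theorem 1.7.1, to check that x* A ≥ 0 componentwise. A sketch in exact arithmetic:

```python
# Verifying x* = (0, 5/11, 5/11, 0, 1/11) in the five-step duel:
# x*A >= 0 guarantees Player 1 at least 0, and by symmetry the value is 0.
from fractions import Fraction as F

A = [[  0,  -3,  -7, -11, -15],
     [  3,   0,   1,  -2,  -5],
     [  7,  -1,   0,   7,   5],
     [ 11,   2,  -7,   0,  15],
     [ 15,   5,  -5, -15,   0]]
x = [F(0), F(5, 11), F(5, 11), F(0), F(1, 11)]

payoff_vs_columns = [sum(xi * A[i][j] for i, xi in enumerate(x))
                     for j in range(5)]
# every entry is >= 0; the entries for the essential columns 2, 3, 5 are 0
assert all(p >= 0 for p in payoff_vs_columns)
```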
At each step of the iterative process one computes the quantities

v̄^k = max_i Σ_{j=1}^{n} a_ij η_j^k   and   v^k = min_j Σ_{i=1}^{m} a_ij ξ_i^k,

where ξ_i^k and η_j^k denote the numbers of times the players have chosen the row i and the column j, respectively, in the first k plays.

1.10. Iterative methods of solving matrix games

Let v be the value of the matrix game Γ_A. Consider the expressions

v̄^k / k = max_i (1/k) Σ_{j=1}^{n} a_ij η_j^k,   v^k / k = min_j (1/k) Σ_{i=1}^{m} a_ij ξ_i^k.

The vectors x^k = (ξ_1^k/k, ..., ξ_m^k/k) and y^k = (η_1^k/k, ..., η_n^k/k) are mixed strategies of the players 1 and 2, respectively; hence, by the definition of the value of the game, we have

max_k v^k / k ≤ v ≤ min_k v̄^k / k.
Example 18. Find an approximate solution to the game having the matrix

         a  b  c
     α ( 2  1  3
A =  β ( 3  0  1
     γ ( 1  2  1 ).

Denote Player 1's strategies by α, β, γ, and Player 2's strategies by a, b, c. Suppose the players first choose strategies α and a, respectively. If Player 1 chooses strategy α, then Player 1 can receive one of the payoffs (2, 1, 3); if Player 2 chooses strategy a, then Player 1 can receive one of the payoffs (2, 3, 1). In the 2nd and 3rd plays, Player 1 chooses strategy β and Player 2 chooses strategy b, since these strategies ensure the best result, etc.
Table 1.1 shows the results of the plays, the players' strategies, the accumulated payoffs, and the average payoffs.
Thus, after 12 plays we obtain the approximate solution

x^12 = (3/12, 2/12, 7/12),   y^12 = (1/12, 7/12, 4/12),

and the accuracy can be estimated by the number v̄^12/12 - v^12/12 = 21/12 - 16/12 = 5/12. The principal disadvantage of this method is its low speed of convergence, which decreases further as the matrix dimension increases. This also results from the nonmonotonicity of the sequences v^k/k and v̄^k/k.
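The Brown-Robinson procedure just described takes only a few lines of code. The sketch below replays it on the matrix of Example 18 (ties broken by lowest index, so the trajectory need not reproduce Table 1.1 exactly, but the bounds always bracket the value v = 3/2):

```python
# Fictitious play (Brown-Robinson) on the matrix of Example 18.
A = [[2, 1, 3], [3, 0, 1], [1, 2, 1]]
m, n = len(A), len(A[0])

p1_acc = [0.0] * m   # accumulated payoffs of Player 1's rows vs Player 2's past choices
p2_acc = [0.0] * n   # accumulated losses on Player 2's columns vs Player 1's past choices
i, j = 0, 0          # first play: strategies alpha and a
upper, lower = [], []
for k in range(1, 201):
    for r in range(m):
        p1_acc[r] += A[r][j]
    for c in range(n):
        p2_acc[c] += A[i][c]
    upper.append(max(p1_acc) / k)    # \bar v^k / k
    lower.append(min(p2_acc) / k)    # \underline v^k / k
    i = max(range(m), key=lambda r: p1_acc[r])   # Player 1's best reply
    j = min(range(n), key=lambda c: p2_acc[c])   # Player 2's best reply

# max(lower) <= v = 3/2 <= min(upper)
print(max(lower), min(upper))
```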
Let us consider another iteration algorithm, which is free of the above-mentioned disadvantages.
1.10.2. Monotonic iterative method of solving matrix games [Sadovsky (1978)]. We consider the mixed extension Γ_A = (X, Y, K) of the matrix game having the (m x n) matrix A.
Play   Player 1's   Player 2's   Player 1's payoff    Player 2's payoff    v̄^k/k    v^k/k
 No      choice       choice        α    β    γ          a    b    c
  1        α            a           2    3    1          2    1    3         3        1
  2        β            b           3    3    3          5    1    4        3/2      1/2
  3        β            b           4    3    5          8    1    5        5/3      1/3
  4        γ            b           5    3    7          9    3    6        7/4      3/4
  5        γ            b           6    3    9         10    5    7        9/5      5/5
  6        γ            b           7    3   11         11    7    8       11/6      7/6
  7        γ            b           8    3   13         12    9    9       13/7      9/7
  8        γ            c          11    4   14         13   12   10       14/8     10/8
  9        γ            c          14    5   15         14   12   11       15/9     11/9
 10        γ            c          17    6   16         15   14   12      17/10    12/10
 11        α            c          20    7   17         17   15   15      20/11    15/11
 12        α            b          21    7   19         19   16   18      21/12    16/12

Table 1.1
Let us denote

v^{N-1} = min_{j=1,...,n} γ_j^{N-1}   (1.10.3)

and let J^{N-1} = {j_1, ..., j_k} be the set of indices on which the minimum in (1.10.3) is achieved.
Let Γ^N ⊂ Γ_A be the subgame of the game Γ_A with the matrix A^N = {a_ij}, i = 1, ..., m, j ∈ J^{N-1}. Solve this subgame and find an optimal strategy x̃^N ∈ X for Player 1. Let x̃^N = (ξ̃_1^N, ..., ξ̃_m^N).
Compute the vector c̃^N = Σ_{i=1}^{m} ξ̃_i^N a_i. Suppose the vector c̃^N has the components c̃^N = (γ̃_1^N, ..., γ̃_n^N). Consider the (2 x n) game having the matrix

( γ_1^{N-1}  ...  γ_n^{N-1}
  γ̃_1^N      ...  γ̃_n^N ).

Find Player 1's optimal strategy (1 - α_N, α_N), 0 ≤ α_N ≤ 1, in this subgame.
Substituting the obtained values x̃^N, c̃^N, α_N into (1.10.1), (1.10.2), we find x^N and c^N. We continue the process until the equality α_N = 0 is satisfied or the required accuracy of the computations is achieved. Convergence of the algorithm is guaranteed by the following theorem [Sadovsky (1978)].
Theorem. Let {v^N}, {x^N} be the iterative sequences determined by (1.10.1)-(1.10.3). Then the following assertions are true.
1. v^N > v^{N-1}, i.e. the sequence {v^N} strictly monotonically increases.
2.

lim_{N → ∞} v^N = v.   (1.10.4)
Let us apply the monotonic method to the game with the matrix

A = ( 2  1  3
      3  0  1
      1  2  1 ).

Iteration 0. Suppose Player 1 chooses the 1st row of the matrix A, i.e. x^0 = (1, 0, 0) and c^0 = a_1 = (2, 1, 3). Compute v^0 = min_j γ_j^0 = γ_2^0 = 1, J^0 = {2}.
Iteration 1. Consider the subgame Γ^1 ⊂ Γ having the matrix (the 2nd column of A)

A^1 = ( 1
        0
        2 ).

Player 1's optimal strategy in this one-column subgame is x̃^1 = (0, 0, 1), and c̃^1 = a_3 = (1, 2, 1). In the (2 x 3) game with the rows c^0 and c̃^1, Player 1's optimal strategy is (1/2, 1/2), i.e. α_1 = 1/2; hence

x^1 = (1/2) x^0 + (1/2) x̃^1 = (1/2, 0, 1/2),
c^1 = (1/2) c^0 + (1/2) c̃^1 = (3/2, 3/2, 2),
v^1 = min_j γ_j^1 = γ_1^1 = γ_2^1 = 3/2 > v^0 = 1.
Iteration 2. Now J^1 = {1, 2}, and the subgame Γ^2 has the matrix

A^2 = ( 2  1
        3  0
        1  2 ).

The first row in this matrix is dominated; hence it suffices to examine the submatrix

( 3  0
  1  2 ).

Player 1's optimal strategy in this game is the vector (1/4, 3/4); hence x̃^2 = (0, 1/4, 3/4).
Compute c̃^2 = (1/4) a_2 + (3/4) a_3 = (3/2, 3/2, 1) and consider the (2 x 3) game with the matrix

( 3/2  3/2  1
  3/2  3/2  2 ).

The second strategy of Player 1 (the row c^1) dominates the first, and hence α_2 = 0. This completes the computations: x* = x^1 = (1/2, 0, 1/2); the value of the game is v = v^1 = 3/2, and Player 2's optimal strategy is of the form y* = (1/2, 1/2, 0) (see Example 18).
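The outcome of both iterative methods can be confirmed directly from the saddle-point inequalities; a short exact-arithmetic check:

```python
# Direct check that (x*, y*) with v = 3/2 is a saddle point:
# x* guarantees at least v on every column, y* concedes at most v on every row.
from fractions import Fraction as F

A = [[2, 1, 3], [3, 0, 1], [1, 2, 1]]
x = [F(1, 2), F(0), F(1, 2)]
y = [F(1, 2), F(1, 2), F(0)]
v = F(3, 2)

assert all(sum(x[i] * A[i][j] for i in range(3)) >= v for j in range(3))
assert all(sum(A[i][j] * y[j] for j in range(3)) <= v for i in range(3))
```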
Player 2 has to allocate m black balls among n containers, the total number of balls in the ith container being constant and equal to l_i, l_i ≥ m.
The opponent (Player 1) tries to find as many black balls as possible and has an opportunity to examine one of the containers. In examining the ith container, Player 1 chooses at random (equiprobably) r balls from the l_i balls contained in it, and his payoff is the mathematical expectation of the number of black balls in the sample of r balls.
(a) Let p_i black balls be hidden in the ith container. Compute the probability P_ij that the sample of r balls chosen from the ith container contains exactly j black balls.
(b) Construct the game matrix.
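For part (a) the sample is hypergeometric; a small sketch (standard library only, with illustrative numbers l = 10, p = 4, r = 3, which are not taken from the exercise):

```python
# Hypergeometric probabilities for exercise 3(a):
# P_j = C(p, j) C(l - p, r - j) / C(l, r).
from math import comb

def sample_pmf(l, p, r):
    # distribution of the number of black balls in an equiprobable
    # r-ball sample from a container with l balls, p of them black
    total = comb(l, r)
    return [comb(p, j) * comb(l - p, r - j) / total for j in range(r + 1)]

pmf = sample_pmf(10, 4, 3)
mean = sum(j * q for j, q in enumerate(pmf))
# the expectation (Player 1's payoff) is r * p / l = 3 * 4 / 10 = 1.2
```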
4. Air defence. An air defence system can use three types of weapons (1, 2, 3) to hit an air target; the weapons are to be allocated to two launcher units. The enemy (Player 2) has two types of aircraft (type 1 and type 2). The probabilities of hitting the planes by one defence system are summarized in the matrix

       1    2
1  ( 0.3  0.5
2  ( 0.5  0.3
3  ( 0.1  0.6 ).
The payoffs are given by

α_ij^k = 0,  i = j, i ≠ k,
α_ij^k = β_i,  i = k, i ≠ j,
α_ij^k = β_j,  j = k, i ≠ j,
α_ij^k = β_k (2 - β_k),  i = j = k.

Solve the game.
18. Solve the search game with many hidden objects (see Exercise 3).
19. Search game for several sets on a plane. A family of n fixed convex compact sets K_1, K_2, ..., K_n ⊂ R^2 and a system of m convex compact congruent sets T_1, ..., T_m ⊂ R^2 are given. The simultaneous discrete search game is as follows. Player 2 "hides" the m sets T_j (j = 1, ..., m) in the n sets K_i (i = 1, ..., n) in such a manner that each set T_j intersects one and only one set K_i. Player 2's pure strategy is of the form

a = (p_1, p_2, ..., p_n) ∈ R^n,   Σ_{i=1}^{n} p_i = m,

where p_i is the number of sets T_j hidden in the set K_i.
Player 1 can examine one of the sets K_i by selecting a point x from K_i. Player 1's payoff is the number of sets {T_j} to which the point x belongs.
Find a solution of the game.
20. Search game with two trials for the searcher. Player 2 hides an object in one of n cells, and Player 1 (the searcher) looks for it in one of these cells. Player 1 has an opportunity to examine two cells (repeated examination of a cell is not allowed). Player 1's set of pure strategies consists of the pairs (i, j), i = 1, ..., n, j = 1, ..., n, i ≠ j, and contains C_n^2 elements. Player 2's set of pure strategies contains n elements k = 1, ..., n. The payoff matrix is of the form, where
22. In the evasion-type game (see 1.7.1), show that Player 1 always has a unique optimal strategy.
Chapter 2
Infinite zero-sum two-person
games
2.1.1. We shall consider zero-sum two-person games Γ = (X, Y, H), where X and Y are arbitrary infinite sets whose elements are the strategies of the players 1 and 2, respectively, and H : X x Y → R^1 is the payoff function of Player 1. Recall that the rules of zero-sum two-person games are described in 1.1.1. Player 2's payoff in the situation (x, y) is [-H(x, y)], x ∈ X, y ∈ Y (the game being zero-sum). In this chapter, games with a bounded payoff function H are considered.
2.1.2. Example 1. (Simultaneous planar pursuit-evasion game.) Let S_1 and S_2 be sets on a plane. Player 1 chooses a point x ∈ S_1 and Player 2 chooses a point y ∈ S_2. In making his choice, neither player has information on the opponent's action, and hence such choices can be conveniently interpreted as simultaneous. In this case, the points x ∈ S_1, y ∈ S_2 are strategies of the players 1 and 2, respectively. Thus the players' sets of strategies coincide with the sets S_1 and S_2 on the plane.
Player 2's objective is to minimize the distance between himself and Player 1 (Player 1 pursues the opposite objective). Therefore, by Player 1's payoff H(x, y) in this game is meant the Euclidean distance ρ(x, y) between the points x ∈ S_1 and y ∈ S_2, i.e. H(x, y) = ρ(x, y), x ∈ S_1, y ∈ S_2. Player 2's payoff is taken to be equal to [-ρ(x, y)] (the game being zero-sum).
Example 2. (Search on a closed interval.) [Diubin and Suzdal (1981)]. The simplest search game with an infinite number of strategies is the following.
Player 2 (the Hider) chooses a point y ∈ [0, 1], and Player 1 (the Searcher) chooses, simultaneously and independently, a point x ∈ [0, 1]. The point y is considered to be "detected" if |x - y| ≤ l, where 0 < l < 1. In this case Player 1 wins an amount +1; otherwise his payoff is 0. The game is zero-sum.
Thus the payoff function is

H(x, y) = { 1, if |x - y| ≤ l,
          { 0, otherwise.

The payoff to Player 2 is [-H(x, y)].
Example 3. (Search on a sphere.) Let a sphere C of radius R be given in R^3. Player 1 (the Searcher) chooses a system of points x_1, x_2, ..., x_s ∈ C, and Player 2 chooses one point y ∈ C. The players make their choices simultaneously and independently of one another. Player 2 is said to be detected if the point y ∈ C is found in the r-neighborhood of one of the points x_j, j = 1, ..., s. Here by the r-neighborhood of the point x_j is meant the segment of the sphere (a cap) having its apex at the point x_j and r as the base radius (Fig. 2.1). In what follows, the r-neighborhood of the point x_j is denoted by S(x_j, r).

Figure 2.1

The objective of Player 1 is to find Player 2, whereas Player 2 pursues the opposite objective. Accordingly, the payoff to Player 1 is

H(x, y) = { 1, if y ∈ M_x,
          { 0, otherwise,

where x = (x_1, ..., x_s) and M_x = ∪_{j=1}^{s} S(x_j, r). The payoff to Player 2 is [-H(x, y)].
Example 4. (Noisy duel.) [Karlin (1959)]. Each duelist has only one bullet to fire. The duel is assumed to be a noisy one, because each duelist is informed of his opponent's action, the firing of his bullet, as soon as it takes place. Further, it is assumed that the accuracy function p_1(x) (the probability of hitting the opponent at the instant x) for Player 1 is defined on [0, 1], is continuous and increases monotonically in x, and p_1(0) = 0, p_1(1) = 1. Similarly, the accuracy of Player 2 is described by a function p_2(y) on [0, 1], where p_2(0) = 0, p_2(1) = 1. If Player 1 hits Player 2, his payoff is +1. If Player 2 hits Player 1, the payoff to Player 1 is -1. If, however, both players fire simultaneously and achieve the same result (positive or negative), the payoff to Player 1 is 0.
The information structure of this game (the fact that the weapons are noisy) is taken into account in constructing the payoff function H(x, y). If x < y, the probability of Player 1 hitting the opponent is p_1(x), and the probability of his missing is 1 - p_1(x). If Player 2 has not yet fired and knows that Player 1 cannot fire any more, Player 2 can obtain a sure hit by waiting until y is equal to 1. Thus, if Player 1 misses at the instant x, he is sure to be hit by Player 2 provided x < y; hence

H(x, y) = p_1(x) + (-1)[1 - p_1(x)] = 2 p_1(x) - 1,  x < y.

Similarly, we have

H(x, y) = (-1) p_2(y) + [1 - p_2(y)] = 1 - 2 p_2(y),  x > y,

and

H(x, y) = p_1(x)[1 - p_2(y)] + p_2(y)[1 - p_1(x)](-1) = p_1(x) - p_2(y),  x = y.

Thus, the payoff function H(x, y) in the game is

H(x, y) = { 2 p_1(x) - 1,     x < y,
          { p_1(x) - p_2(y),  x = y,
          { 1 - 2 p_2(y),     x > y,

where x ∈ [0, 1], y ∈ [0, 1].
Example 5. (Silent duel.) [Karlin (1959)]. In a silent duel each duelist has one bullet, but neither duelist knows whether his opponent has fired. For simplicity, let the accuracy functions be given by p_1(x) = p_2(x) = x. Then the payoff function describing the game is

H(x, y) = { x - (1 - x) y,   x < y,
          { 0,               x = y,
          { -y + (1 - y) x,  x > y,

where x ∈ [0, 1], y ∈ [0, 1]. In this game the payoff function H(x, y) is constructed in the same manner as in Example 4, except that neither duelist can determine the time of his opponent's firing if the opponent misses.
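With equal accuracy functions the silent duel is symmetric: H(x, y) = -H(y, x). A tiny sketch checks this on a grid (the skew-symmetry suggests, as for symmetric matrix games in 1.9, a game value of zero):

```python
# Payoff of the silent duel with p1(x) = p2(x) = x, and a grid check
# of the skew-symmetry H(x, y) = -H(y, x).
def H(x, y):
    if x < y:
        return x - (1 - x) * y      # Player 1 fires first
    if x > y:
        return -y + (1 - y) * x     # Player 2 fires first
    return 0.0                      # simultaneous shots

grid = [k / 20 for k in range(21)]
assert all(abs(H(x, y) + H(y, x)) < 1e-12 for x in grid for y in grid)
```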
Example 6. ("Noisy" target search.) Consider the search problem for a "noisy" target. In this problem the "noisy" target (Player 2) is to be detected by a mobile facility (Player 1). The detection range l(x, y), depending on the velocities x ∈ [x_0, x_1] and y ∈ [y_0, y_1] of the players 1 and 2, respectively, is of the form

l(x, y) = γ(y) (x_1 - x) / (x_1 - x_0),

where

γ(y) = l_0 + (l_1 - l_0)(y - y_0) / (y_1 - y_0),

so that l_1 = γ(y_1), l_0 = γ(y_0). The positive numbers l_1, l_0 are assumed to be given.
The situation (x*, y*) in the game Γ = (X, Y, H) for which the inequality

H(x, y*) ≤ H(x*, y*) ≤ H(x*, y)   (2.2.1)

holds for all x ∈ X, y ∈ Y is called a saddle point. This optimality principle may be realized in the game Γ if and only if

v̲ = v̄ = v = H(x*, y*),

where

v̲ = max_{x∈X} inf_{y∈Y} H(x, y),   v̄ = min_{y∈Y} sup_{x∈X} H(x, y),   (2.2.2)

i.e. the exterior extrema of maximin and minimax are achieved and the lower value of the game v̲ is equal to the upper value of the game v̄. The game Γ for which (2.2.2) holds is called strictly determined, and the number v is the value of the game (see 1.3.4).
For matrix games, the existence of the saddle point and the equality of maximin to minimax were proved in the class of mixed strategies (see Sec. 1.6); hence a solution consists in finding their common value v and those strategies x*, y* at which the exterior extrema in (2.2.2) are achieved.
2.2. ε-saddle points, ε-optimal strategies
Here the situation (1, 0) would be an equilibrium if 1 and 0 were among the players' strategies, with the game value v being v = 1. Actually, the exterior extrema in (2.2.2) are not achieved, but at the same time the upper value of the game is equal to the lower value. Therefore v = 1, and Player 1 can always receive a payoff sufficiently close to the game value by choosing a number 1 - ε, ε > 0, sufficiently close to 1. On the other hand, by choosing a number ε > 0 sufficiently close to 0, Player 2 can guarantee that his loss will be arbitrarily close to the value of the game.
2.2.3. Definition. The point (x_ε, y_ε) in the zero-sum two-person game Γ = (X, Y, H) is called an ε-equilibrium point if the following inequality holds for any strategies x ∈ X and y ∈ Y of the players 1 and 2, respectively:

H(x, y_ε) - ε ≤ H(x_ε, y_ε) ≤ H(x_ε, y) + ε.   (2.2.4)

The point (x_ε, y_ε) for which (2.2.4) holds is also called an ε-saddle point, and the strategies x_ε and y_ε are called ε-optimal strategies of the players 1 and 2, respectively.
Compare the definitions of the saddle point (2.2.1) and the ε-saddle point (2.2.4). A deviation from an optimal strategy cannot increase the deviating player's payoff, whereas a deviation from an ε-optimal strategy may increase the payoff, but by no more than ε.
Thus, the point (1 - ε, ε), 0 < ε < 1, is an ε-equilibrium in Example 7, and the strategies x_ε = 1 - ε, y_ε = ε are ε-optimal strategies of the players 1 and 2, respectively.
2.2.4. Note that the following result holds for two strategically equivalent games Γ = (X, Y, H) and Γ' = (X, Y, H'), where H' = βH + α, β > 0: if (x_ε, y_ε) is an ε-equilibrium point in the game Γ, then it is a (βε)-equilibrium point in the game Γ' (compare this with the Scale Lemma in Sec. 1.3).
2.2.5. The following theorem yields the main property of ε-optimal strategies.
Theorem. For the finite value v of the zero-sum two-person game Γ = (X, Y, H) to exist, it is necessary and sufficient that, for any ε > 0, there be ε-optimal strategies x_ε, y_ε of the players 1 and 2, respectively, in which case

lim_{ε→0} H(x_ε, y_ε) = v.   (2.2.5)

Proof. Let the value v exist. Choose the strategy y_ε from the condition

sup_x H(x, y_ε) - ε/2 ≤ v   (2.2.6)

and the strategy x_ε from the condition

inf_y H(x_ε, y) + ε/2 ≥ v.   (2.2.7)

Then the inequalities (2.2.4) hold, and

|H(x_ε, y_ε) - v| ≤ ε/2.   (2.2.9)
Let x_0 ∈ S_1. Then min_{y∈S_2} ρ(x_0, y) is achieved at the intersection point y_0 of the straight line, passing through the center O_1 of the circle S_2 and the point x_0, with the boundary of the circle S_2. Evidently, the quantity min_{y∈S_2} ρ(x_0, y) is maximal at the point M ∈ S_1 at which the line of centers OO_1 (Fig. 2.2) intersects the boundary of the circle S_1, farthest from the point O_1.
Thus, v̲ = |O_1 M| - R_2.

Figure 2.2
Moreover,

ρ(x_0, y_0) = max_{x'_0 ∈ S_1} ρ(x'_0, y_0),

hence

min_{y∈S_2} max_{x∈S_1} ρ(x, y) = v̄ = R_1.

Figure 2.3
to choose the point M lying at the intersection of the line of centers OO_1 with the boundary of the set S_1, farthest from the point O_1. An optimal strategy for Player 2 is to choose the point y* coinciding with the center O of the circle S_1. In this case the value of the game is v = v̲ = v̄ = R_1 + R_2 - R_2 = R_1.
Case 2. The center O of the circle S_1 belongs to S_2. This case is considered in the same way as Case 1, in which the center of the circle S_1 belongs to the boundary of the set S_2. Compute the quantity v̄ (Fig. 2.4). Let y_0 ∈ S_2. Then the point x_0 providing max_{x∈S_1} ρ(x, y_0) coincides with the intersection point of the straight line, passing through y_0 and the center O of the circle S_1, with the boundary of the circle S_1 that is farthest from the point y_0. Indeed, the circle of radius |x_0 y_0| with its center at the point y_0 contains S_1, and its boundary is tangent to the boundary of the circle S_1 at the unique point x_0. Evidently, the quantity max_{x∈S_1} ρ(x, y) = ρ(x_0, y) takes its minimum value at the intersection point M_1 of the line segment OO_1 with the boundary of the circle S_2. Thus, in the case under study, optimal strategies for the players 1 and 2 are to choose the points M ∈ S_1 and M_1 ∈ S_2, respectively.
If the open circles S_1 and S_2 are considered to be the strategy sets in Example 1 (see 2.1.2), then in Case 2 the value of the game exists and is equal to the same number as in the closed case. Optimal strategies, however, do not exist, since M ∉ S_1, M_1 ∉ S_2. Nevertheless, for any ε > 0 there are ε-optimal strategies, namely points from the ε-neighborhoods of the points M and M_1 belonging, respectively, to the sets S_1 and S_2.
2.2.7. In conclusion, it should be noted that the game in Example 6 has an equilibrium point in pure strategies (see Exercise 7), while the games in Examples 1-5, generally, do not have an equilibrium point and a game value. Thus, in Example 2, Player 1 has an optimal strategy x* = 1/2 when l ≥ 1/2, and the game value is 1 (any strategy of Player 2 being optimal).

Figure 2.4

2.3. Mixed strategies
representing the mathematical expectation of the payoff H(x, y) with respect to the measures μ, ν [Prokhorov and Riazanov (1967)].
Definition. A mixed extension of the game Γ = (X, Y, H) is the zero-sum two-person game Γ̄ = (X̄, Ȳ, K) in normal form with the strategy sets X̄, Ȳ and the payoff function K(μ, ν),
where the integrals in (2.3.1), (2.3.2), (2.3.3) are taken in the sense of Lebesgue-Stieltjes. If, however, the distributions μ(x), ν(y) have densities f(x) and g(y), i.e. dμ(x) = f(x) dx and dν(y) = g(y) dy, then the integrals in (2.3.1), (2.3.2), (2.3.3) are taken in the sense of Riemann-Stieltjes. The game Γ ⊂ Γ̄ is a subgame of its mixed extension Γ̄. Whatever the probability measures μ and ν, all integrals in (2.3.1), (2.3.2), (2.3.3) are supposed to exist.
Definition. Let Γ = (X, Y, H) be a zero-sum two-person game, and let Γ̄ = (X̄, Ȳ, K) be its mixed extension. Then the point (μ*, ν*) ∈ X̄ x Ȳ is called an equilibrium point of the game Γ in mixed strategies if for all μ ∈ X̄ and ν ∈ Ȳ the following inequality holds:

K(μ, ν*) ≤ K(μ*, ν*) ≤ K(μ*, ν),

i.e. (μ*, ν*) is an equilibrium point in the mixed extension of the game Γ, and μ* (ν*) is Player 1's (2's) optimal strategy in Γ.
Similarly, the point (μ_ε, ν_ε) ∈ X̄ x Ȳ is called an ε-equilibrium point in the mixed extension of the game Γ if for all μ ∈ X̄ and ν ∈ Ȳ the following inequalities hold:

K(μ, ν_ε) - ε ≤ K(μ_ε, ν_ε) ≤ K(μ_ε, ν) + ε.
Proof. The necessity is obvious, since the pure strategies are a special case of the mixed ones. Let us prove the sufficiency. We first prove (2.3.6) ((2.3.7) can be proved in the same way). Let μ and ν be arbitrary mixed strategies of the players 1 and 2, respectively. From (2.3.1), (2.3.2) and (2.3.6) we then get
If the players have optimal strategies, then the exterior extrema in (2.3.8) are achieved, and the equalities

inf_y K(μ*, y) = v,   (2.3.9)

sup_x K(x, ν*) = v   (2.3.10)

are the necessary and sufficient optimality conditions for the mixed strategies μ* ∈ X̄ and ν* ∈ Ȳ.
Proof. Let v be the value of the game. Then, by definition,
For a fixed strategy (i, the set {K(fi,i/)\ v 6 } is a convex hull of numbers
^(/*>y)> V Y- Since the exact lower bound of any set of real numbers coincides
with that of a convex hull of these numbers, then
MK(w)=MK(it,y). (2.3.12)
Equality (2.3.12) can also be obtained from the following reasoning. Since Y ⊂ Ȳ, we have

inf_{ν∈Ȳ} K(μ, ν) ≤ inf_{y∈Y} K(μ, y).

Suppose the inequality is strict. This means that for a sufficiently small ε > 0 the following inequality holds:

inf_{y∈Y} K(μ, y) ≥ inf_{ν∈Ȳ} K(μ, ν) + ε.

On the other hand, every K(μ, ν) is a mixture of the numbers K(μ, y), y ∈ Y, and hence K(μ, ν) ≥ inf_{y∈Y} K(μ, y) for all ν ∈ Ȳ, so that inf_{ν∈Ȳ} K(μ, ν) ≥ inf_{y∈Y} K(μ, y). The obtained contradiction proves (2.3.12). Let us take the supremum over μ in (2.3.12).
Then
v = sup_μ inf_{y∈Y} K(μ, y).
The second equality in (2.3.8) can be proved in the same way. Conversely, if (2.3.8) is satisfied, it follows from (2.3.12) that v is the value of the game.
Now let μ*, ν* be optimal strategies for Players 1 and 2, respectively. By the Theorem given in 1.3.4, the exterior extrema in (2.3.8) are achieved, and (2.3.9), (2.3.10) are the necessary and sufficient optimality conditions for the mixed strategies μ* and ν*.
As noted in 2.3.2, the introduction of mixed strategies in a zero-sum infinite game depends on the way of randomizing the set of pure strategies. From (2.3.8), however, it follows that the game value v is independent of the randomization method. Thus, to prove its existence, it suffices to find at least one mixed extension of the game for which (2.3.8) holds.
Corollary. For any zero-sum two-person game Γ = (X, Y, H) having the value v in mixed strategies, the following inequality holds:

sup_{x∈X} inf_{y∈Y} H(x, y) ≤ sup_μ inf_y K(μ, y) = v = inf_ν sup_x K(x, ν) ≤ inf_{y∈Y} sup_{x∈X} H(x, y). (2.3.14)
2.3.6. From (2.3.14) follows one of the methods for an approximate solution of the zero-sum two-person game. Indeed, suppose the exterior extrema in (2.3.14) are achieved, i.e.

v⁻ = max_x inf_y H(x, y) = inf_y H(x̄, y), (2.3.15)

v⁺ = min_y sup_x H(x, y) = sup_x H(x, ȳ), (2.3.16)

and let α = v⁺ − v⁻. Then Player 1's maximin strategy x̄ and Player 2's minimax strategy ȳ describe the players' optimal behavior with accuracy α and can be taken as an approximate solution to the game Γ. The problem thus reduces to finding the maximin and minimax strategies for Players 1 and 2, respectively, with the accuracy of the approximate solution determined by α = v⁺ − v⁻. Here, by (2.3.14), the game value v lies in the interval v ∈ [v⁻, v⁺]. Minimax theory [31,30] is devoted to the methods of finding solutions to problems (2.3.15), (2.3.16).
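The bracket v ∈ [v⁻, v⁺] from (2.3.15), (2.3.16) is easy to exercise numerically. The sketch below is our own illustration (not from the book): we take the hypothetical payoff H(x, y) = (x − y)² on a finite grid of the unit square and compute the pure maximin v⁻, the pure minimax v⁺, and the accuracy α = v⁺ − v⁻ of the resulting approximate solution.

```python
# Illustrative sketch (ours): pure maximin/minimax bounds on the game value.
# The payoff H(x, y) = (x - y)^2 is an assumption chosen for illustration.

def H(x, y):
    return (x - y) ** 2

grid = [i / 10 for i in range(11)]  # pure strategies 0.0, 0.1, ..., 1.0

# v- = max_x min_y H(x, y): what Player 1 can guarantee in pure strategies
v_minus = max(min(H(x, y) for y in grid) for x in grid)

# v+ = min_y max_x H(x, y): the loss ceiling Player 2 can guarantee
v_plus = min(max(H(x, y) for x in grid) for y in grid)

alpha = v_plus - v_minus  # accuracy of the pure-strategy approximate solution
print(v_minus, v_plus, alpha)
```

For this payoff the gap α = 1/4 is not zero: the value lies in [0, 1/4] and, as Example 12 below shows, mixed strategies are needed to attain it.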
2.3.7. As in the case of matrix games, the notion of a mixed strategy spectrum
is important for infinite games.
62 Chapter 2. Infinite zero-sum two-person games
The least closed set whose μ-measure (ν-measure) is equal to 1 will be called the spectrum of the mixed strategy μ (ν).
The concentration points of a mixed strategy are spectrum points, but the opposite is not true. Thus, the pure strategies on which the density of a mixed strategy is positive are spectrum points, but not concentration points.
The spectrum of the mixed strategy μ (ν, respectively) will be denoted by X_μ (Y_ν). We shall prove an analog of the complementary slackness theorem 1.7.6 for infinite games.
Theorem. Suppose Γ = (X, Y, H) is a zero-sum two-person game having the value v. If x₀ ∈ X, ν* is an optimal mixed strategy of Player 2 and

K(x₀, ν*) < v, (2.3.17)

then x₀ cannot be a concentration point of an optimal mixed strategy of Player 1.
Proof. Suppose the opposite: let μ*(x₀) > 0, i.e. x₀ is a concentration point of Player 1's optimal mixed strategy μ*. Since K(x, ν*) ≤ v for all x ∈ X, it then follows from (2.3.17) that

v = ∫_X K(x, ν*) dμ*(x) = K(μ*, ν*) < v.

The obtained contradiction proves the theorem.
Consider now the game in which each of Players 1 and 2 chooses a natural number (x and y, respectively), and the payoff to Player 1 is

H(x, y) = 1, if x > y; 0, if x = y; −1, if x < y.
This game has no value in pure strategies. We show that it has no value in mixed strategies as well.
Let μ be an arbitrary mixed strategy of Player 1 with dμ(x) = δₓ, where δₓ ≥ 0 and Σₓ δₓ = 1. Take ε > 0 and find y_ε such that

Σ_{x<y_ε} δₓ > 1 − ε.

Then

K(μ, y_ε) = Σₓ δₓ H(x, y_ε) ≤ −(1 − ε) + ε = −1 + 2ε.

Because of the arbitrariness of ε > 0, and since H(x, y) does not take values less than −1, we have

inf_y K(μ, y) = −1,

whence

v̲ = sup_μ inf_y K(μ, y) = −1.
By the symmetry of the game, the upper value is v̄ = inf_ν sup_x K(x, ν) = 1. Since v̄ > v̲, the game Γ has no value in mixed strategies. As is shown in the next section, the continuity of the payoff function and the compactness of the strategy spaces are sufficient for the existence of a solution (value and optimal strategies) in the mixed extension.
K(μ*, y) ≥ v (2.4.3)

holds for all points y ∈ Y. If (2.4.1) does not hold, then there exists a point y₀ ∈ Y_{ν*} such that K(μ*, y₀) > v. By the continuity of the function K(μ*, y), the inequality (2.4.3) is strict in a neighborhood ω of the point y₀.

2.4. Games with continuous payoff functions 65

From the fact that y₀ ∈ Y_{ν*} is a point of the spectrum of the mixed strategy ν*, it follows that ν*(ω) > 0. From this and from inequality (2.4.3) we get

v = K(μ*, ν*) = ∫_Y K(μ*, y) dν*(y) > v.
The contradiction proves the validity of (2.4.1). Equality (2.4.2) can be proved in a
similar way.
This result is an analog of the complementary slackness theorem 1.7.6. Recall that a pure strategy x appearing in the optimal strategy spectrum is called essential. Thus, the theorem states that (2.4.1) or (2.4.2) must hold for essential strategies.
Theorem 2.4.2 holds for any continuous game, since the following assertion is true.
2.4.3. Lemma. If the function H : X × Y → R¹ is continuous on X × Y, then the integrals K(μ, y) and K(x, ν) are respectively continuous functions of y and x for any fixed mixed strategies μ ∈ X̄ and ν ∈ Ȳ.
Proof. The function H(x, y) is continuous on the compact set X × Y, and hence is uniformly continuous.
Let us take an arbitrary ε > 0 and find δ > 0 such that as soon as ρ₂(y₁, y₂) < δ, then for any x the following inequality holds:

|H(x, y₁) − H(x, y₂)| < ε. (2.4.4)

Then also |K(μ, y₁) − K(μ, y₂)| ≤ ∫_X |H(x, y₁) − H(x, y₂)| dμ(x) < ε, i.e. the function K(μ, y) is continuous in y; the continuity of K(x, ν) in x is proved in the same way.
Let us outline the proof for the set of mixed strategies X̄ (similar arguments apply to Ȳ).
The space X̄ of Borel measures given on the Borel σ-algebra of the compact metric space X becomes a metric space if we introduce the metric

ρ(μ̃, μ) = max(ρ′, ρ″),

where ρ′ and ρ″ are respectively the lower bounds of the numbers r′ and r″ such that for any closed set F ⊂ X

μ(F) ≤ μ̃(V_{r′}(F)), μ̃(F) ≤ μ(V_{r″}(F)),

where V_r(F) = {x ∈ X : min_{z∈F} ρ₁(x, z) ≤ r}, r > 0, and ρ₁(·) is the metric in the space X.
It is known [Prokhorov and Riazanov (1967)] that convergence in this metric is equivalent to weak convergence, and a set of measures μ defined on the Borel σ-algebra of the subsets of the space X is weakly compact (i.e. compact in terms of the above defined metric space of all Borel measures) if and only if this set is uniformly bounded,

μ(X) ≤ c, (2.4.7)

and uniformly dense, i.e. for any ε > 0 there is a compact set A ⊂ X such that

μ(X \ A) < ε. (2.4.8)

Condition (2.4.8) follows from the compactness of X, and (2.4.7) follows from the fact that the measures μ ∈ X̄ are normed (μ(X) = 1).
2.4.6. Note that under the conditions of Theorem 2.4.4 the set of mixed strategies X̄ (Ȳ) of Player 1 (2) is also a compact set in the ordinary sense, since in this case the weak convergence of the measure sequence {μₙ}, n = 1, 2, ..., is equivalent to convergence in the ordinary sense:

lim_{n→∞} μₙ(A) = μ(A)

for any Borel set A ⊂ X whose boundary A′ has zero measure, μ(A′) = 0. The proof of this result involves certain complexities and can be found in Prokhorov and Riazanov (1967).
2.4.7. Denote by v̲ and v̄ respectively the lower and upper values of the game Γ̄ = (X̄, Ȳ, K):

v̲ = sup_μ inf_y K(μ, y), v̄ = inf_ν sup_x K(x, ν). (2.4.9)

Lemma. If the conditions of Theorem 2.4.4 are satisfied, the extrema in (2.4.9) are achieved, and hence

v̲ = max_μ min_y K(μ, y), v̄ = min_ν max_x K(x, ν).
Proof. Since H(x, y) is continuous, then, by Lemma 2.4.3, for any measure μ ∈ X̄ the function

K(μ, y) = ∫_X H(x, y) dμ(x)

is continuous in y. Since Y is a compact set, K(μ, y) achieves a minimum at a particular point of this set.
By the definition of v̲, for any n there exists a measure μₙ ∈ X̄ such that

inf_y K(μₙ, y) ≥ v̲ − 1/n.

Since X̄ is a compact set in the topology of weak convergence (Lemma 2.4.5), a weakly convergent subsequence can be chosen from the sequence {μₙ}, μₙ ∈ X̄. Suppose the sequence {μₙ} weakly converges to a certain measure μ₀ ∈ X̄. Then

lim_{n→∞} K(μₙ, y) = lim_{n→∞} ∫_X H(x, y) dμₙ(x) = ∫_X H(x, y) dμ₀(x) = K(μ₀, y), y ∈ Y.

But then K(μ₀, y) is not less than v̲ for every y ∈ Y. Hence min_y K(μ₀, y) ≥ v̲, and the required maximum is achieved on μ₀ ∈ X̄.
Similarly, the inf sup in (2.4.9) can be shown to be replaceable by min max.
2.4.8. We now turn to the proof of Theorem 2.4.4.
Proof. Since X and Y are metric compact sets, for any integer n there exist finite (1/n)-networks

Xₙ = {x₁ⁿ, ..., x_{kₙ}ⁿ}, Yₙ = {y₁ⁿ, ..., y_{lₙ}ⁿ}

of the sets X and Y, respectively. This means that for any points x ∈ X and y ∈ Y there are points x′ ∈ Xₙ and y′ ∈ Yₙ such that ρ₁(x, x′) < 1/n, ρ₂(y, y′) < 1/n.
Take an arbitrary ε > 0. By the uniform continuity of H there is δ > 0 such that if ρ₁(x, x′) < δ and ρ₂(y, y′) < δ, then

|H(x, y) − H(x′, y′)| < ε. (2.4.13)

We choose n such that 1/n < δ and consider the matrix game with the matrix

Aₙ = {H(xᵢⁿ, yⱼⁿ)}, i = 1, ..., kₙ, j = 1, ..., lₙ, (2.4.12)

whose value is θₙ. Let pⁿ = (p₁ⁿ, ..., p_{kₙ}ⁿ) be an optimal mixed strategy of Player 1 in this game. We then determine the strategy μₙ ∈ X̄ by the following rule:

μₙ(xᵢⁿ) = pᵢⁿ, i = 1, ..., kₙ. (2.4.14)
For any y ∈ Y, choosing the nearest point yⱼⁿ ∈ Yₙ and using (2.4.13), we get

K(μₙ, y) ≥ θₙ − ε. (2.4.16)

Hence

v̲ ≥ θₙ − ε. (2.4.17)

Similarly, for the strategy of Player 2 built from his optimal strategy in the matrix game Aₙ we obtain

v̄ ≤ θₙ + ε. (2.4.18)

But, by Lemma 1.2.2, the inequality v̲ ≤ v̄ always holds. Because ε > 0 was arbitrary, we obtain

v̲ = v̄; (2.4.19)

from Lemma 2.4.7 and (2.4.19) the assertion of the theorem follows (see 2.2.1).
2.4.9. Corollary. The following relation holds:

v = lim_{n→∞} θₙ, (2.4.20)

where θₙ = v(Aₙ) is the value of the matrix game with the matrix Aₙ (2.4.12).
2.4.10. It follows from the proof of Theorem 2.4.4 that a continuous game can be approximated by finite games to any degree of accuracy. Moreover, the following result holds true.
Theorem. An infinite two-person zero-sum game Γ = (X, Y, H), where X, Y are metric compact sets and H is a continuous function on their product, has ε-optimal mixed strategies with a finite spectrum for any ε > 0.
The proof of this theorem follows from the proof (2.4.8) of Theorem 2.4.4. Indeed, from the game Γ we may construct matrix games with matrices Aₙ and mixed strategies μₙ ∈ X̄ that are determined by (2.4.12), (2.4.14) for an arbitrary integer n. By analogy, Player 2's strategies νₙ ∈ Ȳ are determined as follows:

νₙ(yⱼⁿ) = rⱼⁿ, j = 1, ..., lₙ, (2.4.21)

where rⁿ = (r₁ⁿ, ..., r_{lₙ}ⁿ) is an optimal mixed strategy for Player 2 in the game with the matrix Aₙ and value θₙ.
By construction, we have

K(μₙ, y) ≥ θₙ − ε, K(x, νₙ) ≤ θₙ + ε

for all x ∈ X and y ∈ Y. Considering that the strategies μₙ and νₙ have respective finite spectra Xₙ and Yₙ, and Xₙ, Yₙ are finite (1/n)-networks of the sets X and Y, respectively, we obtain the assertion of the theorem (see 2.3.4).
2.4.11. By combining the results of Theorems 2.4.4 and 2.4.10, we may conclude that an infinite two-person zero-sum game with a continuous payoff function and compact strategy sets has, for any ε > 0, ε-optimal strategies of the players that are mixtures of a finite number of pure strategies, as well as optimal mixed strategies in the class of Borel probability measures. Specifically, these results hold for games on the unit square (see 2.1.3) with a continuous payoff function.
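The approximation scheme of 2.4.8–2.4.10 can be imitated numerically. The sketch below is our own construction, with the illustrative payoff H(x, y) = (x − y)²: we build the matrix game Aₙ on a 1/n-grid (cf. (2.4.12)) and estimate its value θₙ by fictitious play, a standard iterative method we choose here for simplicity. For any mixed strategies p, q of the finite game, min_j (pᵀA)_j ≤ θₙ ≤ max_i (Aq)_i, so the two best-response values bracket θₙ.

```python
# Sketch (ours): approximate a continuous game by a finite matrix game on a grid
# and bracket its value theta_n via fictitious play. H(x, y) = (x - y)^2 is an
# illustrative assumption; for it theta_n = 1/4 = v (cf. Example 12).

n = 10
X = [i / n for i in range(n + 1)]
Y = [j / n for j in range(n + 1)]
A = [[(x - y) ** 2 for y in Y] for x in X]   # the matrix A_n = {H(x_i, y_j)}
rows, cols = len(X), len(Y)

row_counts = [0] * rows   # how often Player 1 has played each grid point
col_counts = [0] * cols   # how often Player 2 has played each grid point
i = j = 0
for _ in range(3000):
    row_counts[i] += 1
    col_counts[j] += 1
    # each player best-responds to the opponent's empirical mixture so far
    j = min(range(cols), key=lambda c: sum(row_counts[r] * A[r][c] for r in range(rows)))
    i = max(range(rows), key=lambda r: sum(col_counts[c] * A[r][c] for c in range(cols)))

T = float(sum(row_counts))
p = [c / T for c in row_counts]
q = [c / T for c in col_counts]
lower = min(sum(p[r] * A[r][c] for r in range(rows)) for c in range(cols))
upper = max(sum(q[c] * A[r][c] for c in range(cols)) for r in range(rows))
print(lower, upper)   # a bracket around theta_n
```

Refining the grid (larger n) and iterating longer tightens the bracket toward the game value, exactly as Corollary 2.4.9 promises.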
2.4.12. There are many papers proving the existence of the value of infinite two-person zero-sum games. The most general result in this line is attributed to Sion (1958). The results are well known for games with compact strategy spaces and semicontinuous payoff functions [Peck and Dulmage (1957), Yanovskaya (1973)]. We shall show that in some respects they do not lend themselves to generalization.
Example 10. (A square game with no value in mixed strategies.) [Sion and Wolfe (1957)]. Consider a two-person zero-sum game Γ = (X, Y, H), where X = Y = [0, 1] and the payoff function H is of the form

H(x, y) = −1, if x < y < x + 1/2; 0, if y = x or y = x + 1/2; 1, otherwise.

This function has points of discontinuity on the straight lines y = x and y = x + 1/2. It can be shown that

sup_μ inf_ν K(μ, ν) = 1/3 < 3/7 = inf_ν sup_μ K(μ, ν),

so this game has no value in mixed strategies.
y ∈ Y, then the game Γ = (X, Y, H) is called a game with a concave payoff function (a concave game).
If, however, X ⊂ Rᵐ, Y ⊂ Rⁿ are compact sets and the payoff function H(x, y), which is continuous in all its arguments, is concave with respect to x for any fixed y and convex with respect to y for each x, then the game Γ = (X, Y, H) is called a game with a concave-convex payoff function (a concave-convex game).
We will now consider convex games. Note that similar results are also true for concave games.
Theorem. Suppose Γ = (X, Y, H) is a convex game. Then Player 2 has an optimal pure strategy, with the game value being equal to

v = min_{y∈Y} max_{x∈X} H(x, y). (2.5.1)
Proof. Since X and Y are metric compact sets (in the metric of the Euclidean spaces Rᵐ and Rⁿ) and the function H is continuous on the product X × Y, then, by Theorem 2.4.4, in the game Γ there exist the value v and optimal mixed strategies μ*, ν*. It is well known that the set of all probability measures with a finite support is everywhere dense in the set of all probability measures on Y [Prokhorov and Riazanov (1967)]. Therefore, there exists a sequence of mixed strategies νₙ with a finite spectrum that is weakly convergent to ν*. Suppose the spectrum of the strategy νₙ is composed of the points y₁ⁿ, ..., y_{kₙ}ⁿ that are chosen with probabilities q₁ⁿ, ..., q_{kₙ}ⁿ. By the convexity of the function H in y, we then have

K(x, νₙ) = Σⱼ qⱼⁿ H(x, yⱼⁿ) ≥ H(x, ȳⁿ), (2.5.2)

where ȳⁿ = Σⱼ qⱼⁿ yⱼⁿ. Passing to the limit as n → ∞ in (2.5.2) (replacing {ȳⁿ} by a convergent subsequence as required), we obtain

K(x, ν*) ≥ H(x, ȳ), (2.5.3)

where ȳ is a limit point of the sequence {ȳⁿ}. Suppose that max_x H(x, ȳ) < v. From (2.5.3) and Lemma 2.4.3 we would then have

v = max_x K(x, ν*) > max_x H(x, ȳ) ≥ min_ν max_x K(x, ν) = v,

which is impossible. Thus, max_x H(x, ȳ) = max_x K(x, ν*) = v, and it follows from Theorem 2.3.5 that ȳ is an optimal strategy for Player 2.
We will now demonstrate the validity of (2.5.1). Since ȳ ∈ Y is an optimal strategy of Player 2, then

v = max_x H(x, ȳ) ≥ min_y max_x H(x, y).

On the other hand, the following inequality holds:

v = min_ν max_x K(x, ν) ≤ min_y max_x H(x, y).

Comparing the latter inequalities we obtain (2.5.1).
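Formula (2.5.1) is easy to check numerically for a specific convex game. The sketch below is ours, with the sample payoff H(x, y) = |x − y|, which is convex in y for every fixed x; grid search recovers Player 2's pure optimal strategy y* = 1/2 and the value v = 1/2.

```python
# Our numerical illustration of v = min_y max_x H(x, y) for a convex game.
# The payoff H(x, y) = |x - y| is an illustrative assumption.

def H(x, y):
    return abs(x - y)

xs = [i / 100 for i in range(101)]
ys = [j / 100 for j in range(101)]

ceiling = {y: max(H(x, y) for x in xs) for y in ys}   # max_x H(x, y) per pure y
y_star = min(ceiling, key=ceiling.get)                # arg min_y max_x H(x, y)
v = ceiling[y_star]
print(y_star, v)   # -> 0.5 0.5
```

Player 2 needs no mixing here, while Player 1's optimal strategy is mixed (a mixture of the endpoints), in line with 2.5.4 below.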
2.5.2. Recall that a function φ : Y → R¹, Y ⊂ Rⁿ, Y being a convex set, is strictly convex if the following strict inequality holds for all λ ∈ (0, 1):

φ(λy₁ + (1 − λ)y₂) < λφ(y₁) + (1 − λ)φ(y₂), y₁, y₂ ∈ Y, y₁ ≠ y₂.
Theorem. Let Γ = (X, Y, H) be a convex game with a strictly convex payoff function. Then Player 2 has a unique optimal strategy, which is pure.
Proof. Let μ* be an optimal strategy for Player 1, φ(y) = K(μ*, y), and let v be the value of the game. If y is a point of Player 2's optimal strategy spectrum, then the following relation holds (2.4.2):

K(μ*, y) = v.

For all y ∈ Y, however, there is the inequality K(μ*, y) ≥ v, and hence every spectrum point of an optimal strategy of Player 2 is a minimum point of the function φ.
The function φ(y) is strictly convex, since for λ ∈ (0, 1) the following inequality holds:

φ(λy₁ + (1 − λ)y₂) = ∫_X H(x, λy₁ + (1 − λ)y₂) dμ*(x) < λφ(y₁) + (1 − λ)φ(y₂), y₁ ≠ y₂.

A strictly convex function achieves its minimum at a unique point, whence the optimal strategy of Player 2 is unique and pure.
In the game Γ there always exists a pure strategy saddle point (x*, y*), where x* ∈ X, y* ∈ Y are pure strategies for Players 1 and 2, on which the exterior extrema in (2.5.7) are achieved.

2.5. Games with a convex payoff function 73

In this case, if the function H(x, y) is strictly concave (convex) with respect to the variable x (y) for any fixed y ∈ Y (x ∈ X), then Player 1 (2) has a unique optimal strategy, which is pure.
2.5.4. We will now clarify the structure of Player 1's optimal strategy in the convex game Γ = (X, Y, H).
Theorem. In the convex game Γ = (X, Y, H), Y ⊂ Rⁿ, Player 1 has an optimal mixed strategy μ* with a finite spectrum composed of no more than (n + 1) points of the set X.
The proof of this result is based on the well-known Helly theorem on convex sets, which is presented below without proof [Rockafellar (1970), Davidov (1978)].
Theorem (Helly Theorem). Let K be a family composed of at least n + 1 convex sets in Rⁿ, each set from K being compact. Then, if each n + 1 of the sets of the family K have a common point, there exists a point common to all the sets of the family K.
Before proving the theorem, we shall prove some auxiliary statements.
Suppose the function H(x, y) is continuous on the product X × Y of the compact sets X ⊂ Rᵐ, Y ⊂ Rⁿ. Denote by Xʳ = X × ... × X the Cartesian product of r copies of X.
Consider the function φ : Xʳ × Y → R¹:

φ(x₁, ..., x_r, y) = max_{1≤i≤r} H(xᵢ, y).

Let i₁ and i₂ be indices at which the maxima φ(x₁, ..., x_r, y₁) and φ(x₁′, ..., x_r′, y₂) are achieved. If ρ₁(xᵢ, xᵢ′) < δ for i = 1, ..., r, ρ₂(y₁, y₂) < δ and H(xᵢ₁, y₁) ≥ H(xᵢ₂′, y₂), then

0 ≤ φ(x₁, ..., x_r, y₁) − φ(x₁′, ..., x_r′, y₂) ≤ H(xᵢ₁, y₁) − H(xᵢ₁′, y₂) < ε.

A similar inequality also holds in the event that H(xᵢ₁, y₁) < H(xᵢ₂′, y₂). Hence the function φ is continuous on Xʳ × Y.
Lemma. In the convex game Γ = (X, Y, H), Y ⊂ Rⁿ, the game value v is

v = min_y max_x H(x, y) = max_{x₁,...,x_{n+1}} min_y max_{1≤i≤n+1} H(xᵢ, y), (2.5.8)

where y ∈ Y, xᵢ ∈ X, i = 1, ..., n + 1.
Proof. Denote

θ = max_{x₁,...,x_{n+1}} min_y max_{1≤i≤n+1} H(xᵢ, y).

Since min_y max_{1≤i≤n+1} H(xᵢ, y) ≤ min_y max_x H(x, y) = v for each system of points (x₁, ..., x_{n+1}) ∈ Xⁿ⁺¹, then

θ ≤ v. (2.5.9)
For an arbitrary fixed set of strategies xᵢ ∈ X, i = 1, ..., n + 1, we shall consider a system of inequalities with respect to y:

H(xᵢ, y) ≤ θ, i = 1, ..., n + 1, (2.5.10)

and for each x ∈ X introduce the set

D_x = {y | H(x, y) ≤ θ}.

The function H(x, y) is convex and continuous in y, and hence the set D_x is closed and convex for each x. The sets {D_x} form a system of convex compact sets in Rⁿ. And since the inequalities (2.5.10) always have a solution, any collection of (n + 1) sets of the system {D_x} has a nonempty intersection. Therefore, by the Helly theorem, there exists a point y₀ ∈ Y common to all the sets D_x, that is, a point such that

H(x, y₀) ≤ θ (2.5.11)

for any x ∈ X. Suppose that θ ≠ v. From (2.5.9) and (2.5.11) we then have

v = min_y max_x H(x, y) ≤ max_x H(x, y₀) ≤ θ < v,

which is impossible. Hence θ = v.
where P is composed of the vectors satisfying (2.5.13). The function K(p, y) is continuous in p and y, convex in y and concave in p, with the sets Y ⊂ Rⁿ, P ⊂ Rⁿ⁺¹ compact in the corresponding Euclidean spaces. Therefore, by Theorem 2.5.3 and from (2.5.12) we have

v = min_y max_{p∈P} Σ_{i=1}^{n+1} H(xᵢ, y) pᵢ = max_{p∈P} min_y Σ_{i=1}^{n+1} H(xᵢ, y) pᵢ. (2.5.14)

From (2.5.8) and (2.5.14) follows the existence of p* ∈ P and y′ ∈ Y such that for all x ∈ X and y ∈ Y the following inequality holds:

H(x, y′) ≤ v ≤ Σ_{i=1}^{n+1} pᵢ* H(xᵢ, y).
Player 1 thus has an optimal mixed strategy μ₀ with a finite spectrum composed of no more than (n + 1) points of the set X. Moreover, all pure strategies y₀ on which min_y max_x H(x, y) is achieved are optimal for Player 2. Furthermore, if the function H(x, y) is strictly convex in y for every fixed x ∈ X, then Player 2's optimal strategy is unique.
We shall illustrate these results by referring to the example given below.
Example 11. Let us consider a special case of Example 1 (see 2.1.2). Let S₁ = S₂ = S, and let the set S be a closed circle on the plane of radius R centered at the point O.
The payoff function H(x, y) = ρ(x, y), x ∈ S, y ∈ S, with ρ(·) as the distance function in R², is strictly convex in y, and S is a convex set. Hence, by Theorem 2.5.5, the game value v is

v = min_{y∈S} max_{x∈S} ρ(x, y). (2.5.15)
H′_y(x′, y₀) ≤ 0. (2.5.18)

If, however, y₀ < 1, then for Player 1 there is a balancing strategy x″ such that

H′_y(x″, y₀) ≥ 0. (2.5.19)
Proof. Let us prove (2.5.18) (the second part of the lemma can be proved in a similar way). Suppose the opposite is true, viz. the inequality H′_y(x, y₀) > 0 holds for every balancing strategy x of Player 1, i.e. the function H(x, ·) is strictly increasing at the point y₀. This means that there are ε(x) > 0 and δ(x) > 0 such that for y ∈ [0, 1) satisfying the inequality δ(x) > y₀ − y > 0, the following inequality holds:

H(x, y) < H(x, y₀) − ε(x).
By the continuity of the function H, for every balancing strategy x and for ε(x)/2 there is δ̃(x) > 0 such that for δ(x) > y₀ − y > 0

H(x̃, y) < H(x̃, y₀) − ε(x)/2

for all balancing strategies x̃ for which |x̃ − x| < δ̃(x). The set of balancing strategies is compact, and hence it can be covered by a finite number of such δ̃(x)-neighborhoods. Let ε be the smallest of all the corresponding numbers ε(x). Then we have the inequality

H(x, y) < H(x, y₀) − ε/2,

holding for all balancing strategies x (and for all essential strategies), where

y₀ − min_x δ(x) < y < y₀.
Let μ₀ be the optimal mixed strategy of Player 1. The last inequality is valid for all spectrum points of μ₀; thus, by integrating, we get

K(μ₀, y) ≤ K(μ₀, y₀) − ε/2 = v − ε/2,

which contradicts the optimality of the strategy μ₀.
Theorem. Suppose that Γ is a convex game on the unit square with the payoff function H differentiable with respect to y for any x, y₀ is an optimal pure strategy of Player 2, and v is the value of the game. Then:
1. if y₀ = 1, then among the optimal strategies of Player 1 there is a pure strategy x′ for which (2.5.18) holds;
2. if y₀ = 0, then among the optimal strategies of Player 1 there is a pure strategy x″ for which (2.5.19) holds;
3. if 0 < y₀ < 1, then among the optimal strategies of Player 1 there is a strategy that is a mixture of two essential strategies x′ and x″ satisfying (2.5.18), (2.5.19) with probabilities α and 1 − α, α ∈ [0, 1]. Here α is a solution of the equation

αH′_y(x′, y₀) + (1 − α)H′_y(x″, y₀) = 0. (2.5.20)
Proof. Let y₀ = 1. Then, for Player 1, there is an equilibrium strategy x′ for which (2.5.18) holds. Hence it follows from the convexity of the function H(x′, y) that it does not increase in y over the entire interval [0, 1], achieving its minimum at y = 1. This means that

H(x′, y₀) ≤ H(x′, y) (2.5.21)

for all y ∈ [0, 1]. On the other hand, it follows from (2.5.17) that

H(x, y₀) ≤ v = H(x′, y₀) (2.5.22)

for all x ∈ [0, 1]. The inequalities (2.5.21), (2.5.22) show that (x′, y₀) is an equilibrium point.
The case y₀ = 0 can be examined in a similar way. We shall now discuss case 3. If 0 < y₀ < 1, then there are two equilibrium strategies x′ and x″ satisfying (2.5.18), (2.5.19), respectively.
Consider the function

K(μ₀, y) = αH(x′, y) + (1 − α)H(x″, y).

It is convex in y, and by (2.5.20) its derivative vanishes at the point y₀. Consequently, the function K(μ₀, y) achieves a minimum at the point y₀. Hence, considering (2.5.17), we have

H(x, y₀) ≤ K(μ₀, y₀) ≤ K(μ₀, y)

for all x ∈ [0, 1] and y ∈ [0, 1], which proves the optimality of the strategies μ₀ and y₀.
2.5.7. Theorem 2.5.6 provides a way of finding optimal strategies, which can be illustrated by the following example.
Example 12. Consider a game on the unit square with the payoff function H(x, y) = (x − y)². This is a one-dimensional analog of Example 11, except that the payoff function is taken to be the square of the distance. Therefore, it would appear natural that the game value v would be v = 1/4 and that Player 1's optimal strategy would be to choose the extreme points 0 and 1 of the interval [0, 1] with probability 1/2 each. We shall show this by employing Theorem 2.5.6.
Note that ∂²H(x, y)/∂y² = 2 > 0, so the game Γ is strictly convex, and hence Player 2 has a unique optimal strategy, which is pure (Theorem 2.5.5). Let y be a fixed strategy for Player 2. Then

max_x (x − y)² = (1 − y)², if y ≤ 1/2; y², if y > 1/2.

The minima of both branches are achieved at y₀ = 1/2 and equal 1/4. Therefore, v = 1/4, and y₀ = 1/2 is the unique optimal strategy of Player 2.
We shall now find an optimal strategy for Player 1. To be noted here is that 0 < y₀ < 1 (y₀ = 1/2). Equation (2.5.17) now becomes (x − 1/2)² = 1/4. Hence x₁ = 0 and x₂ = 1, i.e. the extreme points of the interval [0, 1] are essential for Player 1.
Let us compute the derivatives: H′_y(x, y) = 2(y − x), so that H′_y(1, 1/2) = −1 ≤ 0 and H′_y(0, 1/2) = 1 ≥ 0, i.e. x′ = 1 satisfies (2.5.18) and x″ = 0 satisfies (2.5.19). Equation (2.5.20) becomes −α + (1 − α) = 0, whence α = 1/2, and the strategy choosing the points 1 and 0 with probabilities 1/2 is optimal for Player 1.
H′_x(x₀, y′) ≥ 0; (2.5.23)

3. if 0 < x₀ < 1, then among the optimal strategies of Player 2 there is a strategy that is a mixture of two essential strategies y′ and y″ satisfying (2.5.23), (2.5.24) with probabilities β and 1 − β. Here the number β ∈ [0, 1] is a solution of the equation

βH′_x(x₀, y′) + (1 − β)H′_x(x₀, y″) = 0.
Let us fix R and consider the function Φ(ρ, R) on the interval 0 ≤ ρ ≤ R. Differentiation with respect to ρ shows that

∂Φ(ρ, R)/∂ρ ≥ 0, 0 ≤ ρ ≤ R.

Therefore, the function Φ(ρ, R) is monotonically increasing in ρ, and hence Φ(r, R) ≤ Φ(ρ, R).
K(x, ν*) ≤ K(μ*, ν*) ≤ K(μ*, y)

for all x, y ∈ S. We have thus proved the optimality of the strategies μ* and ν*. Here the game value v is v = K(μ*, ν*), where K(μ*, ν*) is determined by (2.6.1). Specifically, if S is a circle of radius R (the case r = R), then the value of the game is 4R/π.
2.6.2. Example 14. Consider a simultaneous game in which Player 2 chooses a pair of points y = {y₁, y₂}, where y₁ ∈ S, y₂ ∈ S, and Player 1, having no information about Player 2's choice, chooses a point x ∈ S. The payoff to Player 1 is taken to be min_{i=1,2} ρ²(x, yᵢ). We shall provide a solution for the case where the set S is a circle of radius R centered at the origin of coordinates (the point O): S = S(O, R).
Consider the function Φ(r, ρ) = r² + ρ² − 4rρ/π, where r and ρ take values from the interval [0, R]. We shall establish the properties of the function Φ(r, ρ).
Lemma 1. The function Φ(r, R) (as a function of the variable r) is strictly convex and achieves its absolute minimum at the unique point r₀ = 2R/π.
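Lemma 1 is elementary to confirm numerically; the check below is ours. Indeed, ∂²Φ(r, R)/∂r² = 2 > 0, and the minimizer of r ↦ Φ(r, R) on [0, R] is r₀ = 2R/π.

```python
# Our quick check of Lemma 1: Phi(r, R) = r^2 + R^2 - 4 r R / pi is strictly
# convex in r and minimized at r0 = 2 R / pi.

import math

def Phi(r, R):
    return r * r + R * R - 4.0 * r * R / math.pi

R = 1.0
grid = [i * R / 100000 for i in range(100001)]   # r in [0, R]
r_best = min(grid, key=lambda r: Phi(r, R))
r0 = 2.0 * R / math.pi                           # analytic minimizer, about 0.6366
print(r_best, r0)
```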
2.6. Simultaneous games of pursuit 81
K(μ*, y) = (1/2π) ∫_{−β}^{β} [R² + r₂² − 2Rr₂ cos(ψ − φ₂)] dψ + (1/2π) ∫_{β}^{2π−β} [R² + r₁² − 2Rr₁ cos(ψ − φ₁)] dψ.
Figure 2.5
Let

F₁(φ) = [(R² + r₂²)β − 2Rr₂ sin β cos φ]/π, −β ≤ φ ≤ β,

F₂(φ) = [(R² + r₁²)(π − β) + 2Rr₁ sin β cos φ]/π, β ≤ φ ≤ 2π − β.

The stationary points of the functions F₁ and F₂ are respectively 0 and π, since 0 ≤ β ≤ π/2 and F₁′(φ) = (2Rr₂/π) sin β sin φ, F₂′(φ) = −(2Rr₁/π) sin β sin φ, with 0 and π as the points of the absolute minimum of the functions F₁ and F₂ (F₁′(φ) < 0 for φ ∈ (−β, 0), F₁′(φ) > 0 for φ ∈ (0, β); F₂′(φ) < 0 for φ ∈ (β, π), F₂′(φ) > 0 for φ ∈ (π, 2π − β)). Consequently,
i.e. the amount of the payoff to Player 1, with Player 2 using a strategy y₁ = {r₁, 0}, y₂ = {r₂, 0}, is less than that to be obtained by using a strategy
Suppose the points y₁ and y₂ are lying on a diameter of the circle S(O, R) and the distance between them is 2r. Denote by 2α the central angle resting on the arc spanned by the chord AB (Fig. 2.6). Suppose that y₁ = {R cos α − r, 0}, y₂ = {R cos α + r, 0}. Then the payoff to Player 1 is

ψ(α, r) = (1/2π) ∫_{−α}^{α} [(R cos φ − (R cos α + r))² + R² sin² φ] dφ + (1/2π) ∫_{α}^{2π−α} [(R cos φ − (R cos α − r))² + R² sin² φ] dφ

= (1/2π) ∫_{−α}^{α} [R² − 2R cos φ (R cos α + r) + (R cos α + r)²] dφ + (1/2π) ∫_{α}^{2π−α} [R² − 2R cos φ (R cos α − r) + (R cos α − r)²] dφ.
Figure 2.6
We shall show that, for a fixed r, the function ψ(α, r) achieves a minimum in α when α = π/2. Elementary computations show that ∂ψ/∂α = {2R sin α [(π − 2α)r − πR cos α]}/π, and hence for sufficiently small values of α we have ∂ψ(α, r)/∂α < 0, since sin α > 0 and (π − 2α)r − πR cos α < 0 (in the limiting case, πr − πR < 0). At the same time, ∂ψ(π/2, r)/∂α = 0.
For every fixed r the function ∂ψ(α, r)/∂α has no zeros in α except α = π/2. Suppose the opposite is true. Let α₁ be a zero of this function in the interval (0, π/2). Then the function G(α) = (π − 2α)r − πR cos α vanishes at α = α₁. Thus,

G(α₁) = G(π/2) = 0.

But G is convex (G″(α) = πR cos α > 0) with G(0) < 0 and G(α₁) = 0, whence G(α) > 0 for all α ∈ (α₁, π/2], contradicting G(π/2) = 0. Thus, ∂ψ(α, r)/∂α < 0 for α ∈ (0, π/2) and ∂ψ(π/2, r)/∂α = 0. Consequently, the function ψ(α, r) achieves an absolute minimum in α when α = π/2: ψ(α, r) ≥ ψ(π/2, r). The implication here is that

K(μ*, y) = ψ(α, r) ≥ ψ(π/2, r) = Φ(r, R) ≥ Φ(r₀, R). (2.6.5)
From (2.6.3)-(2.6.5) it follows that for any pure strategy y = {y₁, y₂} the following inequality holds:

K(μ*, y) ≥ Φ(r₀, R). (2.6.6)
Suppose Player 2 uses the strategy ν* and Player 1 uses an arbitrary pure strategy x = {ρ cos ψ, ρ sin ψ}. Then Player 1 receives the payoff

K(x, ν*) = (1/2π) ∫₀^{2π} min[ρ² + r₀² − 2ρr₀ cos(ψ − φ), ρ² + r₀² + 2ρr₀ cos(ψ − φ)] dφ

= (1/2π) ∫₀^{2π} min(ρ² + r₀² − 2ρr₀ cos φ, ρ² + r₀² + 2ρr₀ cos φ) dφ = Φ(r₀, ρ),
and by Lemma 2 we have
K(x, ν*) = (1/2) min_t |x − (2m − 4t − 1)/(2m − 1)| + (1/2) min_t |x − (−2m + 4t + 1)/(2m − 1)| ≤ 1/(2m − 1). (2.6.8)

Now suppose Player 1 chooses the mixed strategy μ* and Player 2 chooses an arbitrary pure strategy y = {y₁, ..., y_m}. Denote

xⱼ = (2m − 2j − 1)/(2m − 1), j = 0, 1, ..., 2m − 1.

Then a direct computation yields

K(μ*, y) ≥ 1/(2m − 1). (2.6.9)

The statement of the theorem follows from inequalities (2.6.8), (2.6.9).
2.7. One class of games with a discontinuous payoff function 85
H(x, y) = ψ(x, y), if x < y; φ(x), if x = y; θ(x, y), if x > y, (2.7.1)

where ψ(x, y) is defined and continuous on the set 0 ≤ x ≤ y ≤ 1, the function φ is continuous on [0, 1], and θ(x, y) is defined and continuous on the set 0 ≤ y ≤ x ≤ 1.
Suppose the game Γ = (X, Y, H), where X = Y = [0, 1], with H given by (2.7.1), has optimal mixed strategies μ* and ν* for Players 1 and 2, respectively. Moreover, the optimal mixed strategies μ*, ν* are assumed to be probability distributions which have continuous densities f*(x) and g*(y), respectively.
Let us denote the required strategy by f (or g, respectively), taken as a distribution density. We shall clarify the properties of optimal strategies.
Let f be a strategy for Player 1. For y ∈ [0, 1] we have

K(f, y) = ∫₀^y ψ(x, y) f(x) dx + ∫_y^1 θ(x, y) f(x) dx. (2.7.2)

Suppose that f and g are optimal strategies for Players 1 and 2, respectively. Then for any point y₀ at which

g(y₀) > 0 (2.7.3)

(that is, at any point of the spectrum of the strategy g), the following equation holds:

K(f, y₀) = v, (2.7.4)

where v is the value of the game. But inequality (2.7.3) is strict, and hence, by the continuity of g, there is δ > 0 such that (2.7.3) holds for all y with |y − y₀| < δ. Thus, equality (2.7.4) also holds for these y, i.e. K(f, y) = v is satisfied. This means that

∂K(f, y)/∂y = 0. (2.7.5)

Equation (2.7.5) can be rewritten as
∫_a^1 (x − β + βx) f(x) dx = 0. (2.7.15)
But in the case β < 1 it follows from (2.7.15) that

K(f, 1) = ∫_a^1 (x − 1 + x) f(x) dx < 0.

For a density of the form f(x) = γ/x³ on [a, 1], the condition takes the form

γ ∫_a^1 (2x − 1)/x³ dx = 0, γ ≠ 0. (2.7.16)
Solving equation (2.7.16) we find two roots, a = 1 and a = 1/3, the first root being extraneous. Consequently, a = 1/3. The coefficient γ is found from the normalization condition for f:

∫_{1/3}^1 f(y) dy = γ ∫_{1/3}^1 dy/y³ = 4γ = 1,

whence follows γ = 1/4.
We have thus obtained the solution of the game given in Example 5, 2.1.2: the value of the game is v = 0, and the players' optimal strategies f and g (as distribution densities) are equal to one another and are of the form

f(x) = 0, if x < 1/3; 1/(4x³), if x ≥ 1/3.
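The two computations above, the normalization ∫ f = 1 and the root condition (2.7.16) at a = 1/3, γ = 1/4, can be confirmed by numerical integration; this check is ours.

```python
# Our numerical confirmation: the density f(x) = 1/(4 x^3) on [1/3, 1]
# integrates to 1, and the integral of (2x - 1) f(x) over [1/3, 1] vanishes,
# i.e. a = 1/3 solves (2.7.16).

def f(x):
    return 0.0 if x < 1.0 / 3.0 else 1.0 / (4.0 * x ** 3)

def integrate(g, a, b, steps=100000):
    # simple trapezoidal rule
    h = (b - a) / steps
    total = 0.5 * (g(a) + g(b))
    for k in range(1, steps):
        total += g(a + k * h)
    return total * h

mass = integrate(f, 1.0 / 3.0, 1.0)
moment = integrate(lambda x: (2.0 * x - 1.0) * f(x), 1.0 / 3.0, 1.0)
print(mass, moment)   # close to 1 and 0, respectively
```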
2.7.3. Example 17. Find a solution to a "noisy duel" game (see Example 4, 2.1.2) with the accuracy functions p₁(x) = x and p₂(y) = y. The payoff function H(x, y) in the game is of the form (2.7.1), where

ψ(x, y) = 2x − 1, (2.7.17)

θ(x, y) = 1 − 2y, (2.7.18)

φ(x) = 0. (2.7.19)

The game is symmetric, hence v = 0, and the players' optimal strategies coincide. Here both players have an optimal pure strategy x* = y* = 1/2. In fact, H(1/2, y) = θ(1/2, y) = 1 − 2y > 0 if y < 1/2; H(1/2, y) = φ(1/2) = 0 if y = 1/2; H(1/2, y) = ψ(1/2, y) = 0 if y > 1/2.
From the game interpretation standpoint, the solution is for the duelists to fire their bullets simultaneously after having advanced half the distance to the barrier.
In conclusion it may be said that the class of games of timing has been much studied (see Davidov (1978), Karlin (1959), Vorobjev (1984)).
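The saddle-point claim of Example 17 can be checked on a grid; the code below is our check.

```python
# Our grid check that x* = y* = 1/2 is a saddle point of the noisy-duel payoff
# of Example 17: H(x, y) = 2x - 1 for x < y, 0 for x = y, 1 - 2y for x > y.

def H(x, y):
    if x < y:
        return 2.0 * x - 1.0
    if x > y:
        return 1.0 - 2.0 * y
    return 0.0

pts = [k / 200 for k in range(201)]
assert all(H(0.5, y) >= 0.0 for y in pts)  # Player 1 guarantees at least v = 0
assert all(H(x, 0.5) <= 0.0 for x in pts)  # Player 2 concedes at most v = 0
print("saddle point at (1/2, 1/2), value 0")
```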
2.8 Solution of simultaneous infinite games of
search
This section provides a solution of the games of search with an infinite number of strategies formulated in 2.1.2. It is of interest that, in the first of the games considered, both players have optimal mixed strategies with a finite spectrum.
2.8.1. Example 18. (Search on a closed interval.) [Diubin and Suzdal (1981)]. Consider the game of search on a closed interval (see Example 2 in 2.1.1), which is modelled by the game on the unit square with the payoff function H(x, y) of the form

H(x, y) = 1, if |x − y| ≤ l; 0, otherwise. (2.8.1)
Note that for l ≥ 1/2 Player 1 has an optimal pure strategy x* = 1/2 and the value of the game is 1; in this case H(x*, y) = H(1/2, y) = 1, since |y − 1/2| ≤ 1/2 ≤ l for all y ∈ [0, 1]. Let l < 1/2. Note that the strategy x = l dominates all pure strategies x < l, and the strategy x = 1 − l dominates all strategies x > 1 − l. Indeed,

H(l, y) = 1, if y ∈ [0, 2l]; 0, otherwise,

while for x < l we have H(x, y) = 1 only if y ∈ [0, x + l] ⊂ [0, 2l]; and if x ∈ [1 − l, 1], then

H(1 − l, y) = 1, if y ∈ [1 − 2l, 1]; 0, otherwise.

Thus, with x ∈ [1 − l, 1], H(x, y) ≤ H(1 − l, y) for all y ∈ [0, 1].
Consider the following mixed strategy μ* of Player 1. Let l = x₁ < x₂ < ... < x_m = 1 − l be points for which the distance between any pair of adjacent points does not exceed 2l. The strategy μ* selects each of these points with equal probability 1/m. Evidently, in this case any point y ∈ [0, 1] falls within the l-neighborhood of at least one point xᵢ. Hence

K(μ*, y) ≥ 1/m. (2.8.2)

Now let ν* be a strategy of Player 2 that is the equiprobable choice of points 0 = y₁ < y₂ < ... < y_n = 1, the distance between any pair of adjacent points exceeding 2l. Then there apparently exists at most one point yⱼ whose l-neighborhood contains the point x. Consequently,

K(x, ν*) ≤ 1/n. (2.8.3)
yᵢ = (i − 1)/(n − 1), i = 1, 2, ..., n, (2.8.6)

where the distance 1/(n − 1) between adjacent points strictly exceeds 2l. Thus, 1/n is the value of the game, and the optimal strategies μ*, ν* are the equiprobable mixtures of the pure strategies determined by (2.8.5), (2.8.6).
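A concrete instance of Example 18 can be checked mechanically; the instance below (l = 0.1, our choice) is illustrative. The searcher's points l, 3l, ..., 1 − l are 2l apart, so every y ∈ [0, 1] lies within l of one of them; the hider's points 0, 1/4, 1/2, 3/4, 1 are more than 2l apart, so any pure x lies within l of at most one of them; hence the value is 1/n = 1/5.

```python
# Our check of Example 18 for the illustrative parameter l = 0.1 < 1/2.

l = 0.1
xs = [0.1, 0.3, 0.5, 0.7, 0.9]     # searcher's spectrum: adjacent spacing 2l
ys = [0.0, 0.25, 0.5, 0.75, 1.0]   # hider's spectrum: spacing 0.25 > 2l

probe = [k / 1000 for k in range(1001)]
# every point of [0, 1] is found by at least one searcher point ...
covered = all(any(abs(y - x) <= l + 1e-12 for x in xs) for y in probe)
# ... while any pure x covers at most one hider point
at_most_one = all(sum(abs(x - y) <= l for y in ys) <= 1 for x in probe)
print(covered, at_most_one)   # -> True True, so K(mu*, y) >= 1/5 >= K(x, nu*)
```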
2.8.2. Example 19. Consider an extension of the preceding problem to the case where Player 1 (Searcher) chooses a system of s points x₁, ..., x_s, xᵢ ∈ [0, 1], i = 1, ..., s, and Player 2 (Hider) chooses, independently and simultaneously with Player 1, a point y ∈ [0, 1]. Player 2 is considered to be discovered if there is j ∈ {1, ..., s} such that |y − xⱼ| ≤ l, l > 0. Accordingly, the payoff function (the payoff to Player 1) is defined as follows:

H(x₁, ..., x_s; y) = 1, if min_j |y − xⱼ| ≤ l; 0, otherwise.

Thus, the value of the game is s/n, and μ*, ν* are the players' optimal strategies. The value of the game depends linearly on the number of points chosen by the Searcher.
2.8.3. Example 20. (Search on a sphere.) Consider the game of search on a
sphere (see Example 3 in 2.1.2). The payoff function H(x, y) is

H(x, y) = { 1, y ∈ Mx,
            0, y ∉ Mx,

where Mx is the union of the spherical segments S(xj, r), j = 1, …, s, and L(A)
denotes the Lebesgue measure (area) of the set A.
The parameters of the game, s, r and R, are taken so as to permit selection of a
system of points x = (x1, x2, …, xs) satisfying the condition

L(Mx) = Σ_{i=1}^{s} L(S(xi, r)) (2.8.10)

(the spherical segments S(xi, r) do not intersect).
Let us fix the figure Mx on some auxiliary sphere C′. The mixed strategy μ* is then generated
by throwing this figure Mx at random onto the sphere C. To do this, we fix in the figure
Mx an interior point z to which two noncollinear vectors a, b (with an angle φ > 0
between them), lying in a tangent plane to Mx at the point z, are rigidly connected.
The point z is "thrown" onto the sphere C in accordance with the uniform distribution (i.e.
with density 1/(4πR²)). Suppose this results in the realization of a point z′ ∈ C. The figure Mx,
with the vectors set thereon, is transferred to the sphere C in a parallel way so that the
points z and z′ coincide. Thus, the vectors a, b lie in a tangent plane to the sphere C
at the point z′.
An angle φ′ is then chosen in accordance with the uniform distribution on the interval
[0, 2π], and the vector b lying in the tangent plane is turned clockwise, together with its
associated figure Mx, through the angle φ′. This results in the transition of the figure Mx
and the vector b to a new position on the sphere C. Random positioning of the set Mx on
the sphere in accordance with the two-stage procedure described generates a random
choice of the points x′1, x′2, …, x′s ∈ C at which the centers x1, …, xs of the spherical
neighborhoods S(xj, r) making up the set Mx are located.
The measure μ* is so constructed that it is invariant, i.e. the probability that
the set Mx covers any given point y ∈ C is independent of y. Indeed, let us find the probability of
this event. Let Ω = {ω} be the space of all possible positions of Mx on the sphere C.
Then the average area covered on the sphere C by throwing the set Mx onto it (the area
expectation) is equal to L(Mx). At the same time

K(μ*, y) = L(Mx)/(4πR²) = (s/2)(1 − √(1 − r²/R²)),

since L(S(x, r)) = 2πR(R − √(R² − r²)).
From the definition of a saddle point and the resulting inequality K(μ*, y) ≥
K(x, ν*) it follows that the mixed strategies μ* and ν* are optimal, and the value of the game is

v = (s/2)(1 − √(1 − r²/R²)).
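As a sanity check, the covering probability can be computed directly from the segment-area formula (a small sketch; the function names are ours):

```python
import math

def cap_area(R, r):
    """Area of the spherical segment S(x, r) on a sphere of radius R:
    L(S) = 2*pi*R*(R - sqrt(R^2 - r^2))."""
    return 2.0 * math.pi * R * (R - math.sqrt(R * R - r * r))

def search_value(s, R, r):
    """Covering probability L(Mx)/(4*pi*R^2) for s non-overlapping segments,
    i.e. the value of the search game on the sphere."""
    return s * cap_area(R, r) / (4.0 * math.pi * R * R)
```

For r = R a single segment is a hemisphere and the value is 1/2; in general search_value(s, R, r) agrees with (s/2)(1 − √(1 − r²/R²)).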
K(μ*, y) = K(μ*, ν*) = K(x, ν*) = L(Mx)/(4πR²).

If Y is the r-neighborhood of the point y, then the value of the game is s·L(Y)/(4πR²).
To describe the strategy set E of Player 2, let us introduce on the plane the polar coordinate system with its pole at
the point O, and with x0 as the polar axis.
Definition. By a strategy yv ∈ E for Player 2 is meant a pair yv = (φ, v), where
φ is a random variable uniformly distributed over the interval [0, 2π], and v ∈ [0, β].
We assume that the strategy yv ∈ E may be realized as follows. Having selected
a velocity v ∈ [0, β] under the strategy yv, from the time t = 0 on, Player 2 moves from
the point O with the constant velocity v along the ray φ = φ0, where φ0 is the realized
value of the random variable φ. Then the motion of Player 2 in the polar coordinate
system corresponding to the strategy yv is given by
Figure 2.1
It is clear that the strategy class P is defined correctly, since every strategy x ∈ P
uniquely defines Player 1's trajectory. Indeed, to the strategy xu = (u, a(·)) corresponds
the following trajectory of Player 1's motion in the polar coordinate system:

Figure 2.8

Suppose the Searcher chooses a strategy xu ∈ P. Fig. 2.8 shows the area swept by
the circle of detection. The shaded area is called the detection area of Player 1
using the strategy xu ∈ P and is denoted by Ωu.
Suppose the Hider chooses the strategy yv ∈ E and Player 1 chooses the strategy
xu ∈ P. Compute K(xu, yv). If l/t0 > |u − v|, then the area Ωu covers the circle of
radius v as long as l/t > |u − v|, i.e. until the time necessary for the first player to
turn around the point O (u is fixed).
Let DA be the arclength of the circle of radius u travelled by Player 1 from the
time t0 to the time tu. Denote by DA′ the part of the radius-v arc covered by the area Ωu. Then (see Fig. 2.8)
DA′ = DA·(v/u). Recall that the quantity φ is uniformly distributed over [0, 2π],
and hence

H(xu, yv) = DA′/(2πv). (2.9.8)
Carrying the computation through all cases yields a payoff of the piecewise form

H(xu, yv) = H(u, v): it equals 1 for |u − v| ≤ ε(u), is given by a logarithmic expression in u and v on an intermediate range of |u − v|, and equals 0 for larger |u − v|. (2.9.13)

Note that the resulting payoff function H(u, v) depends only on (u, v) ∈ [0, β] × [0, β],
i.e. we have thus obtained a game on a square, with the payoff function H(·) being
continuous in its arguments; hence in our game there exists an equilibrium point in
mixed strategies. A closely related, but differently stated, problem is solved in Danskin
(1968).
2.9.2. Discrete secondary search. Consider the following game-theoretic problem.
The Searcher learns at the time t = 0 that the Hider is at the point y(0) = O and
can move with a velocity which does not exceed the magnitude β. At this time the
Searcher calls a team of pursuers S = {S1, …, Sn} acting as one player (Player 1)
to conduct a secondary search for Player 2. It is assumed that each of the pursuers
Si, i = 1, …, n, can conduct a discrete search at the fixed instants of time t1, t2, …, tN
by choosing the points xi(tj) ∈ C(O, βtj), where C(O, βt) is the circle of radius βt
with its center at the point O. In this case Player 2 is considered to be detected if there
exist i and j such that

‖y(tj) − xi(tj)‖ ≤ l, (2.9.14)

where l is the given detection radius (or capture radius), and ‖·‖ is a norm in R².
Player 2 (the Hider) knows that, starting from the time t = 0, he will be searched
for by the team of pursuers S, and he has no other information about the opponent.
Based on the available information, it is assumed that Player 2 confines himself
to a linear motion along a ray proceeding from the point O with a constant velocity
v ∈ [0, β], choosing the direction of motion at random according to the uniform distribution.
This situation is visualized in Fig. 2.9.
Figure 2.9
The region Ω will be called the detection region of Player 2 provided Player 1 uses
the strategy x. The probability of detecting Player 2 is taken to be the payoff function
H(x, y) of Player 1, i.e.

H(x, y) = Pr(y ∈ Ω).

Denote by M = πβ² the area of the velocity circle C(O, β), and denote the area of the
detection region also by Ω. We assume that Ω < M.
It is clear that, in this game, there is no optimal pure strategy for the Hider. Hence
we introduce mixed strategies for Player 2.
Figure 2.10
Definition. By a mixed strategy ν(v) for Player 2 is meant any distribution
density of the random variable v ∈ [0, β], i.e. a function ν(v) ≥ 0 with ∫0^β ν(v) dv = 1.
Let x* be a strategy of Player 1 whose detection region is a central sector Q of the
velocity circle of area Ω, and let

ν*(v) = 2πv/M. (2.9.17)

We shall show that the inequalities

H(x*, yv) ≥ Ω/M, (2.9.18)

K(x, ν*) ≤ Ω/M (2.9.19)

hold for all strategies x and yv.
Suppose Player 1 chooses the strategy x*. Then for every strategy yv of Player 2 we
have

H(x*, yv) = θ/(2π) = Ω/M, (2.9.20)

where θ = 2Ω/β² is the central angle of the sector Q (Fig. 2.11).
We shall now show the reverse inequality (2.9.19). To this end, we assume that
Player 1 chooses an arbitrary pure strategy x and Player 2 chooses the strategy ν*.
Examine now dΩ, the part of the region Ω bounded by v, v + dv and θ, θ + δθ,
as shown in Fig. 2.12.
Since ν*(v) = 2πv/M, the probability of finding Player 2 in the region dΩ is
Figure 2.11
Figure 2.12
But the quantity δθ[(v + dv)² − v²]/2 is the measure μ(dΩ) of dΩ. Therefore

K(x, ν*) = (1/M) ∫ μ(dΩ) = Ω/M (2.9.22)

for every pure strategy x of Player 1 (the Searcher). This completes the proof of the
theorem.
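The invariance argument can be illustrated by a quick Monte Carlo experiment (our sketch; β, θ and the sample size are arbitrary test choices). Sampling the radius with density ν*(v) = 2πv/M and the angle uniformly is the same as sampling a point uniformly from the velocity circle, so any central sector is hit with probability equal to its area fraction Ω/M:

```python
import math, random

random.seed(0)
beta = 2.0                          # velocity bound (test value)
M = math.pi * beta ** 2             # area of the velocity circle C(O, beta)

def sample_hider():
    # radius density nu*(v) = 2*pi*v/M  <=>  v = beta*sqrt(U); angle uniform
    v = beta * math.sqrt(random.random())
    phi = random.uniform(0.0, 2.0 * math.pi)
    return v, phi

theta = 0.5                          # central sector of angle theta ...
Omega = theta * beta ** 2 / 2.0      # ... has area Omega = theta*beta^2/2
trials = 200_000
hits = sum(1 for _ in range(trials) if sample_hider()[1] < theta)
estimate = hits / trials             # should be close to Omega/M = theta/(2*pi)
```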
Note that the above-stated optimal strategy of search x* is not unique. In fact,
if the central sector Q is "cut" by radial rays into several central sectors Ωi, i =
1, …, m, then the resulting new strategy is also optimal. Similarly, "cutting"
the sector by a circle arc leaves its optimality unchanged. In particular, the following
"good" strategy of search can be proposed to the team of pursuers S = {S1, …, Sn}.
The central sector is cut by radial rays into n sectors of area Ω/n each, and
each of these sectors is cut by circle arcs into N segments with the respective
areas π(l/t1)², π(l/t2)², …, π(l/tN)². The segments are then approximated by the
respective equal circles (Figs. 2.13, 2.14).
2.9.3. Reestablishing contact with an evading submarine. [Diubin and Suzdal (1981)]. Suppose a floating submarine has been detected by an aircraft radar.
Knowing that it has been detected, the submarine makes a crash dive and breaks
into evasion in a submerged condition. In order to reestablish contact, the aircraft
is expected to use at time t a radio-sonic buoy whose detection range is taken to be l. When
Figure 2.13
Figure 2.14
evading, the submarine is expected to appear at time t at one of the points of the
circle of unit radius. It is evident that if l ≥ 1, then, after setting up the buoy at the
center of this circle, the aircraft reestablishes contact; otherwise reestablishment
of contact depends on the distance between the buoy and the submarine.
Therefore, with l < 1, the aircraft chooses the point for setting up the buoy, and the
submarine chooses the point for its position at time t.
We assume without loss of generality that at the initial time the submarine is at
the point (0, 0) and at time t appears at the point y = (y1, y2), where y1² + y2² ≤ 1.
The submarine is taken to be Player 2, whose pure strategy is the choice of a point
y ∈ Y = {(y1, y2) | y1² + y2² ≤ 1}. Accordingly, the aircraft is taken to be Player 1,
whose pure strategy is the choice of a point x ∈ X = {(x1, x2) | x1² + x2² ≤ 1}.
Then the model for this secondary search is the two-person zero-sum infinite game
Γ = ⟨X, Y, H⟩, where

H(x, y) = { 1, ‖x − y‖ ≤ l,
            0, otherwise.

Let μ and ν be rotation-invariant probability measures defined on the subsets A ⊂ X and B ⊂ Y, respectively.
Show that the measures μ and ν are optimal strategies for the players in the game Γ.
In fact, by (2.9.26),

K(x, ν) ≤ K(μ, ν), x ∈ X.

The inequality

K(μ, y) ≥ K(μ, ν), y ∈ Y

can be proved in much the same way.
Thus, the strategies μ and ν are optimal strategies for the players.
The measures μ and ν are, by their definition, invariant under rotations through
any angles α and β. Consider now the auxiliary game Γ′ in which each player chooses
only the radius of the circle on which his point lies; its payoff function K is defined below.
Figure 2.15
The strategy sets of the auxiliary game Γ′ (in which each player chooses only the radius of his circle) can be identified with the points of the interval [0, 1]. Then the
choice of pure strategies x ∈ [0, 1], y ∈ [0, 1] in the game Γ′ will mean the choice of
mixed strategies in the game Γ which correspond to the uniform distributions over
the circles of radii x and y. In this case the function K is defined by

K(x, y) = { (1/π) arccos((x² + y² − l²)/(2xy)) for (x + y) > l, |x − y| < l,
            1 for x + y ≤ l,
            0 otherwise. (2.9.29)
Indeed (see Fig. 2.15), for any y ∈ Y with √(y1² + y2²) = y, the expectation of the payoff
for the first player K(x, y) is equal to the fraction of the circle of radius x which
is inside the circle of radius l with its center at the point y. This fraction is equal to
2α/(2π), where the angle α is determined from the equation x² + y² − 2xy cos α = l².
Solving the last equation for α with x + y > l, |x − y| < l, y = (y1, y2), √(y1² + y2²) = y,
yields

α(x, y) = arccos((x² + y² − l²)/(2xy)).
Figure 2.16
Since the last expression holds for all pure strategies y lying on the circle of radius
y, integrating with respect to a probability measure uniformly distributed over this
circle, with the indicated values x and y, yields (2.9.29). If, however, (x + y) ≤ l,
then (see Fig. 2.16) it is evident that the circle of radius l circumscribed about any
point y lying on the circle of radius y completely covers the circle of radius x. Hence
K(x, y) = 1. Finally, if |x − y| ≥ l, then (see Fig. 2.17) these circles do not intersect,
and hence K(x, y) = 0.
We shall prove that any pair of optimal strategies μ* and ν* in the auxiliary game Γ′ defines
a solution of the game Γ. Let μ′ be the measure on X defined by the strategy μ*, and
ν′ the measure on Y defined by the strategy ν*. By the definition of the function K,
K(μ*, ν*) = K(μ′, ν′), K(x, ν*) = K(x, ν′), K(μ*, y) = K(μ′, y). The arguments
x and y of the function K should be interpreted as the mixed strategies selecting
Figure 2.17
uniformly the points of the circles of radii x and y. From the above equalities and the
optimality of the strategies μ* and ν*, the inequalities

K(x, ν′) ≤ K(μ′, ν′) ≤ K(μ′, y) (2.9.30)

follow. Suppose, on the contrary, that for some x ∈ X

K(x, ν′) > K(μ′, ν′). (2.9.31)

Then, by the invariance of the strategy ν′, for any rotation Tα through an angle α, K(Tα(x), ν′) = K(x, ν′). Consequently
the equality K(x̃, ν′) = K(x, ν′) holds for all x̃ ∈ X located at the same distance
from the center of the circle as x. In view of the last equality and inequality (2.9.31)
we obtain K(x̃, ν′) > K(μ′, ν′) for all such x̃ ∈ X. By integrating this inequality with respect
to the measure uniformly distributed over the circle containing the point x, we arrive
at the same inequality for the corresponding radius regarded as a pure strategy of the game Γ′. This inequality contradicts the first of the
inequalities (2.9.30). Hence the inequality

K(x, ν′) ≤ K(μ′, ν′)

holds for all x. The following inequality may be proved in much the same way:

K(μ′, y) ≥ K(μ′, ν′), y ∈ Y.

Thus, to obtain optimal strategies for the players in the game Γ, it is sufficient to
solve the auxiliary game Γ′.
Note that, for 1/√2 ≤ l < 1, the game Γ′ has a solution in pure strategies x* =
√(1 − l²), y* = 1. Indeed, for 1/√2 ≤ l < 1 we have |√(1 − l²) − y*| < l. Therefore

min_y K(√(1 − l²), y) = min{ 1, min_{y > l − √(1 − l²)} (1/π) arccos((1 − 2l² + y²)/(2√(1 − l²)·y)) }

= min_{y > l − √(1 − l²)} (1/π) arccos((1 − 2l² + y²)/(2√(1 − l²)·y)).
The function placed under the arccos sign is monotonically increasing in y (here 1 − 2l² ≤ 0). Since the
function arccos is monotonically decreasing, the minimum of K(√(1 − l²), y)
is achieved at the point y = 1. Hence the value of the game is (1/π) arccos √(1 − l²).
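A numerical check of this pure-strategy solution (a sketch; the grid size and the test value l = 0.8 are our choices):

```python
import math

def K(x, y, l):
    """Payoff (2.9.29) of the auxiliary game on the unit square."""
    if x + y <= l:
        return 1.0
    if abs(x - y) >= l:
        return 0.0
    return math.acos((x * x + y * y - l * l) / (2.0 * x * y)) / math.pi

l = 0.8
x_star = math.sqrt(1.0 - l * l)          # optimal Searcher radius
value = K(x_star, 1.0, l)                # should equal acos(sqrt(1 - l^2))/pi
grid = [i / 200 for i in range(201)]
is_saddle = (all(K(x_star, y, l) >= value - 1e-9 for y in grid) and
             all(K(x, 1.0, l) <= value + 1e-9 for x in grid))
```

On the grid, (x*, y*) = (√(1 − l²), 1) behaves as a saddle point: the Searcher cannot do worse than the value against y* and cannot do better against any y.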
The model. Two players, A and B, ante one unit each at the beginning of the
game. After each draws a card, A acts first: he may either bet a additional units or fold
and forfeit his initial bet. If A bets, B has two choices: he may either fold (losing
his initial bet) or bet a units and "see" A's hand. If B sees, the two players compare
hands, and the one with the better hand wins the pot.
We shall denote A's hand by ξ, whose distribution is assumed to be the uniform
distribution on the unit interval, and B's hand by η, also distributed uniformly on
the unit interval. We shall write L(ξ, η) = sign(ξ − η) as before.
Strategies and payoff. The strategies are composed as follows. Let

φ(ξ) = probability that if A draws ξ he will bet a,
1 − φ(ξ) = probability that if A draws ξ he will fold,
ψ(η) = probability that if B draws η he will see,
1 − ψ(η) = probability that if B draws η he will fold.

If the two players follow these strategies, the expected net return K(φ, ψ) is the
sum of the returns corresponding to three mutually exclusive possibilities: A folds; A
bets a units and B sees; A bets and B folds. Thus

K(φ, ψ) = −∫0^1 [1 − φ(ξ)] dξ + (a + 1) ∫0^1 ∫0^1 φ(ξ)ψ(η)L(ξ, η) dξ dη + ∫0^1 ∫0^1 φ(ξ)[1 − ψ(η)] dξ dη.
The yield to A may be written more transparently by collecting the coefficients of φ(ξ) and ψ(η). A pair (φ*, ψ*) is optimal if

K(φ, ψ*) ≤ K(φ*, ψ*) ≤ K(φ*, ψ) (2.10.3)

for all φ and ψ, i.e. if φ* attains

max_φ ∫0^1 φ(ξ) [2 + a ∫0^ξ ψ*(η) dη − (a + 2) ∫ξ^1 ψ*(η) dη] dξ (2.10.4)

and ψ* attains

min_ψ ∫0^1 ψ(η) [−(a + 2) ∫0^η φ*(ξ) dξ + a ∫η^1 φ*(ξ) dξ] dη. (2.10.5)
The crux of the argument is to verify that our results are consistent, i.e., that the
function φ* that maximizes (2.10.4) is the same function φ* that appears in (2.10.5),
and similarly for ψ*; if these assertions are valid, then (2.10.3) is satisfied and we
have found a solution.
At this point intuitive considerations suggest what type of solution we should search for.
Since B has no chance to bluff, ψ*(η) = 1 for η greater than some critical number
c, and ψ*(η) = 0 otherwise; also, since B is minimizing, ψ*(η) should be equal to 1
when the coefficient of ψ(η) in (2.10.5) is negative. But this coefficient expresses a
decreasing function of η, and thus c is the value at which it first becomes zero. Hence

ψ*(η) = { 0, 0 ≤ η < c,
          1, c ≤ η ≤ 1,

where c = a/(a + 2), and

φ*(ξ) = 1, a/(a + 2) < ξ ≤ 1.
then φ* maximizes (2.10.4) for ψ*, and ψ* minimizes (2.10.5) for φ*.
The interpretation of the solution is of some interest:
(1) Both players bet or see on high hands. What is of special significance is that
both players use the identical critical number a/(a + 2) to distinguish high and low
hands.
(2) The element of bluffing shows up for Player 1 only to the extent that the
proportion of hands on which he should bluff is determined; he may choose the actual
hands in an arbitrary manner from [0, a/(a + 2)], subject only to the restriction

∫0^{a/(a+2)} φ*(ξ) dξ = 2a/(a + 2)².
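The consistency claim can be verified numerically. In the sketch below (our construction: the bluffing mass 2a/(a + 2)² is spread uniformly over [0, c], giving density 2/(a + 2)), the coefficient of ψ(η) in (2.10.5) is positive below c = a/(a + 2), zero at c, and negative above it:

```python
def psi_coefficient(eta, a):
    """Coefficient of psi(eta) in (2.10.5) against phi*: bet on (c, 1],
    bluff uniformly with density 2/(a+2) on [0, c], c = a/(a+2)."""
    c = a / (a + 2)
    rho = 2 / (a + 2)                        # bluffing density on [0, c]
    if eta <= c:
        above = rho * (c - eta) + (1 - c)    # integral of phi* over (eta, 1]
        below = rho * eta                    # integral of phi* over [0, eta)
    else:
        above = 1 - eta
        below = rho * c + (eta - c)
    return a * above - (a + 2) * below
```

The total bluffing mass is ρ·c = 2a/(a + 2)².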
2.10.2. A poker model with several sizes of bet. [Karlin and Restrepo (1957)].
The model examined here is an extension of the one just analyzed.
As before, the unit interval is taken in the representation of all possible hands that
can be dealt to a player. Each hand is considered equally likely, and therefore the
operation of dealing a hand to a player may be considered as equivalent to selecting
a random number from the unit interval according to the uniform distribution. Of
course, a hand ξ1 is inferior to a hand ξ2 if and only if ξ1 < ξ2. The game proceeds as
follows. Two players A and B select points ξ and η, respectively, from the unit interval
according to the uniform distribution. Both players ante one unit. A, knowing his
value ξ, acts first and has the option of either folding immediately, thus forfeiting his
ante to B, or betting any one of the amounts a1, a2, …, an, where 1 < a1 < a2 <
… < an. B must then respond by either passing immediately or seeing. In the first
circumstance A wins B's ante. If B sees, the hands ξ and η are compared and the
player with the better hand wins the pot. If ξ = η, no payment is made.
A strategy for A can be described as an n-tuple of functions

Φ(ξ) = (φ1(ξ), φ2(ξ), …, φn(ξ)),

where φi(ξ) expresses the probability that A will bet the amount ai when his hand is
ξ. The φi(ξ) must satisfy φi(ξ) ≥ 0 and

Σ_{i=1}^{n} φi(ξ) ≤ 1.

The probability that A will fold immediately is

1 − Σ_{i=1}^{n} φi(ξ).

A strategy for B can be represented by the n-tuple

Ψ(η) = (ψ1(η), ψ2(η), …, ψn(η)),

where ψi(η) expresses the probability that B will see a bet of ai units when he holds
the value η. The probability that B will pass after A has bet ai is 1 − ψi(η). Each
ψi(η) is subject only to the condition that 0 ≤ ψi(η) ≤ 1. The expected payoff to A is then

K(Φ, Ψ) = −∫0^1 [1 − Σi φi(ξ)] dξ + Σi ∫0^1 ∫0^1 φi(ξ)[1 − ψi(η)] dξ dη
+ Σi (ai + 1) ∫0^1 ∫0^1 φi(ξ)ψi(η)L(ξ, η) dξ dη,

where L(ξ, η) = sign(ξ − η).
Any pair of optimal strategies Φ* and Ψ* satisfies the inequalities

K(Φ, Ψ*) ≤ K(Φ*, Ψ*) (2.10.10)

and

K(Φ*, Ψ*) ≤ K(Φ*, Ψ). (2.10.11)

Thus (2.10.10) and (2.10.11) have the form

K(Φ, Ψ*) = C1 + Σi ∫0^1 φi(ξ)Li(ξ) dξ, (2.10.12)

K(Φ*, Ψ) = C2 + Σi ∫0^1 ψi(η)Ki(η) dη, (2.10.13)

where C1 and C2 are independent of Φ and Ψ, respectively, and Li and Ki stand
for the bracketed expressions in (2.10.10) and (2.10.11), respectively. In view of the
constraints on φ1, …, φn, it is clear that in order to maximize (2.10.10) or (2.10.12),
A must choose φi(ξ) = 1 wherever Li(ξ) is positive and greater than all Lj(ξ), j ≠ i;
and finally, if Li(ξ) = 0 and if the remaining coefficients Lj(ξ) (j ≠ i) are nonpositive,
he can maximize K(Φ, Ψ*) by choosing φi(ξ) arbitrarily, consistent with 0 ≤ φi(ξ) ≤ 1
(or, if more than one Lj(ξ) is zero, Σ φi(ξ) ≤ 1, where the sum is extended over
those indices corresponding to Li(ξ) = 0). Similarly, in order to minimize (2.10.11)
or (2.10.13), B must choose ψi(η) = 1 wherever Ki(η) < 0, and ψi(η) = 0 wherever
Ki(η) > 0. Where Ki(η) = 0, the values of ψi(η) will not affect the payoff.
We may expect the optimal ψi* to have the form

ψi*(η) = { 0, η < bi,
           1, η ≥ bi, (2.10.14)

for some bi. This is in fact the case, since each Ki(η) is nonincreasing.
On the other hand, we may expect that A will sometimes bluff when his hand
is low. In order to allow for this possibility, we determine the critical numbers bi
which define ψi*(η) so that the coefficient Li(ξ) of φi is zero for ξ < bi. This can be
accomplished, since ψi*(η) is constant on this interval. Hence we choose

bi = ai/(2 + ai), (2.10.15)

and thus b1 < b2 < … < bn < 1. The coefficient Li(ξ) of φi is zero in the interval (0, bi),
and thereafter it is a linear function of ξ; two coefficients Li and Lj cross at the point

cij = 1 − 2/((2 + ai)(2 + aj)). (2.10.16)
Clearly, cij is a strictly increasing function of i and j. Define c1 = b1 and ci =
c_{i−1,i} for i = 2, …, n, and c_{n+1} = 1. For ξ in the interval (ci, ci+1), it is clear that
Li(ξ) > Lj(ξ) ≥ 0 for j ≠ i. Consequently, according to our previous discussion, if
Φ* maximizes K(Φ, Ψ*), then φi*(ξ) = 1 for ci < ξ < ci+1. For definiteness we also
set φi*(ci) = 1; this is of no consequence, since if a strategy is altered only at a finite
number of points (or on a set of Lebesgue measure zero), the yield K(Φ, Ψ) remains
unchanged.
Summarizing, we have shown that if Ψ* is defined as in (2.10.14), with

bi = ai/(2 + ai),

then K(Φ, Ψ*) is maximized by any strategy Φ* of the form

φi*(ξ) = 1, ci ≤ ξ < ci+1, i = 1, …, n, (2.10.17)

where on [0, c1) the φi* are arbitrary, subject to

Σi φi*(ξ) ≤ 1, φi*(ξ) ≥ 0.

The values of φi*(ξ) in the interval 0 ≤ ξ < c1 are still undetermined because of the
relations Li(ξ) = 0, which are valid on the same interval.
It remains to show that Ψ* as constructed actually minimizes K(Φ*, ψ). In order
to guarantee this property for Ψ*, it might be necessary to impose some further
conditions on the Φ*; for this purpose, we shall utilize the flexibility present in the
definition of Φ* as ξ ranges over the interval (0, c1). In order to show that Ψ* minimizes
K(Φ*, ψ) we must show that the coefficient Ki(η) of ψi(η) is non-negative for η < bi
and nonpositive for η > bi. Since Ki(η) is a continuous monotone-decreasing function,
the last condition is equivalent to the relation

Ki(bi) = 0. (2.10.18)

Inserting the special form (2.10.17) of Φ* into (2.10.18) leads to equations (2.10.19)–(2.10.21)
for the bluffing masses ∫0^{c1} φi*(ξ) dξ; these equations can be satisfied if and only if
the required masses sum to at most 1. But the sum of the right-hand sides of (2.10.19) and (2.10.21) is at most (2 + bn − b1)/4,
since always bi(1 − bi) ≤ 1/4. As b1 ≥ 1/3, we get
the required inequality. The optimal strategies may thus be summarized as

φi*(ξ) = 1, ci ≤ ξ < ci+1,

where

bi = ai/(2 + ai), c1 = b1,

ci = 1 − 2/((2 + ai)(2 + a_{i−1})), i = 2, …, n, c_{n+1} = 1,

and on [0, c1)

Σi φi*(ξ) ≤ 1.
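A small sketch computing these critical numbers for a concrete menu of bets (our function; the bet sizes are arbitrary test values) and checking that the thresholds are ordered as claimed:

```python
def betting_thresholds(bets):
    """b_i = a_i/(2 + a_i) and interval endpoints c_1 = b_1,
    c_i = 1 - 2/((2 + a_i)(2 + a_{i-1})) for i >= 2, c_{n+1} = 1."""
    b = [a / (2 + a) for a in bets]
    c = [b[0]] + [1 - 2 / ((2 + bets[i]) * (2 + bets[i - 1]))
                  for i in range(1, len(bets))] + [1.0]
    return b, c
```

For bets (2, 4, 8) this gives b = (1/2, 2/3, 4/5) and c = (1/2, 11/12, 29/30, 1), both increasing.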
2.10.3. Poker model with two rounds of betting. [Bellman (1952), Karlin and Restrepo (1957)]. In this section we generalize the poker model of the preceding section
to include two rounds of betting, but at the same time we restrict it by permitting
only one size of bet. We assume again, for convenience, that hands are dealt at
random to each player from the unit interval according to the uniform distribution.
Payoff and strategies. After making the initial bet of one unit, A acts first and
has two choices: he may fold or bet a units. B acts next and has three choices: he
may fold, he may see, or he may raise by betting a + b units. If B has raised, A must
either fold or see.
If A and B draw cards ξ and η, respectively, their strategies may be described as
follows:

φ1(ξ) = probability that A bets a and folds later if B raises,
φ2(ξ) = probability that A bets a and sees if B raises,
1 − φ1(ξ) − φ2(ξ) = probability that A folds initially,
ψ1(η) = probability that B sees the initial bet,
ψ2(η) = probability that B raises,
1 − ψ1(η) − ψ2(η) = probability that B folds.
The expected return is

K(Φ, Ψ) = −∫0^1 [1 − φ1(ξ) − φ2(ξ)] dξ + ∫0^1 ∫0^1 [φ1(ξ) + φ2(ξ)][1 − ψ1(η) − ψ2(η)] dξ dη
+ (a + 1) ∫0^1 ∫0^1 [φ1(ξ) + φ2(ξ)]ψ1(η)L(ξ, η) dξ dη − (a + 1) ∫0^1 ∫0^1 φ1(ξ)ψ2(η) dξ dη
+ (1 + a + b) ∫0^1 ∫0^1 φ2(ξ)ψ2(η)L(ξ, η) dξ dη,

where L(ξ, η) = sign(ξ − η). (This expected yield is derived by considering mutually
exclusive possibilities: A folds; A bets and B folds; A acts according to φ1 and B sees
or raises; A acts according to φ2 and B sees or raises.)
If we denote the optimal strategies by Φ* = (φ1*, φ2*) and Ψ* = (ψ1*, ψ2*), the same
coefficient analysis as before leads to equations (2.10.26)–(2.10.29) for the critical numbers c, e, d.
(In writing (2.10.29) we postulate that φ2(ξ) = 0 for 0 ≤ ξ < c; this is intuitively
clear.) At this point we introduce the notation

m1 = ∫0^c φ1*(ξ) dξ, m2 = ∫0^c φ2*(ξ) dξ.

Recalling the assumption made on the form of the solution and assuming that c <
e < d, equations (2.10.26)–(2.10.29) may be written as follows:
2 = (a + 2)(m2 + 1 − c), (2.10.30)
1 − d = d − e, or 2(1 − d) = 1 − e, (2.10.31)
(2a + b + 2)m2 = b(1 − d), (2.10.32)
(a + 2)m1 = a(1 − c), (2.10.33)
(a + 2)(m1 + e − c) = (a + b)(1 − e). (2.10.34)
We have obtained a system of five equations in the five unknowns m1, m2, c, d, e;
we now prove that this system of equations has a solution which is consistent with
the assumptions made previously, namely that 0 < c < e < d < 1, 0 < m1 < c,
0 < m2 < c.
Solution of equations (2.10.30)–(2.10.34). The system of equations may be solved
explicitly as follows. Using (2.10.33), we write the last equation as

2(a + 1)(1 − c) = (2a + b + 2)(1 − e). (2.10.35)

Eliminating e by means of (2.10.31) gives

(a + 1)(1 − c) = (2a + b + 2)(1 − d). (2.10.36)

Combining (2.10.36) with (2.10.30) and (2.10.32) then yields an equation for 1 − d alone:

(1 − d)(2a + b + 2)[1 + b(a + 1)/(2a + b + 2)²] = 2(a + 1)/(a + 2). (2.10.37)

Having obtained 1 − d, we can solve (2.10.36) for 1 − c, and the remaining unknowns
are then calculated from the original equations.
In order to show that the solution is consistent, we first note that 1 − d > 0.
Equation (2.10.36) shows that 1 − c > 1 − d, and therefore c < d. Also, from
(2.10.30) and (2.10.32),

c = (b/(2a + b + 2))(1 − d) + a/(a + 2). (2.10.38)

Since 2(a + 1)(1 − c) = (2a + b + 2)(1 − e), we infer that 1 − e < 1 − c, or c < e; and since
2d = 1 + e, we must have e < d. Summing up, we have shown that 0 < c < e < d < 1.
For the two remaining conditions we note that (2.10.30) implies that

m2 = c − a/(a + 2),

so that m2 < c, and by (2.10.38), m2 > 0. Finally, using (2.10.33) and (2.10.30), we
conclude that

m1 = (a/(a + 2))(1 − c) = (1 − c) − (2/(a + 2))(1 − c)
= (1 − c)[1 − (m2 + 1 − c)] = (1 − c)(c − m2),

so that 0 < m1 < c.
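The closed-form solution can be checked in exact arithmetic (a sketch based on our reading of the garbled system; for a = b = 2 it gives c = 19/35, e = 23/35, d = 29/35):

```python
from fractions import Fraction as F

def solve_two_round(a, b):
    """Exact solution of the system (2.10.30)-(2.10.34)."""
    a, b = F(a), F(b)
    q = 2 * a + b + 2
    one_minus_d = 2 * (a + 1) / ((a + 2) * q * (1 + b * (a + 1) / q ** 2))  # (2.10.37)
    d = 1 - one_minus_d
    c = 1 - q * one_minus_d / (a + 1)      # from (2.10.36)
    e = 2 * d - 1                          # from (2.10.31)
    m2 = b * one_minus_d / q               # from (2.10.32)
    m1 = a * (1 - c) / (a + 2)             # from (2.10.33)
    return c, e, d, m1, m2
```

For a = b = 2 one also finds m1 = 8/35, m2 = 3/70, and the original equations and the ordering 0 < c < e < d < 1 can be verified directly.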
Optimality of the strategies Φ* and Ψ*. We summarize the representation of Φ*
and Ψ* in terms of the values of c, e, d, m1, and m2 as computed above. For hands above c,

φ1*(ξ) = { 1, c ≤ ξ < e,
           0, e ≤ ξ ≤ 1,

φ2*(ξ) = { 0, c ≤ ξ < e,
           1, e ≤ ξ ≤ 1,

(2.10.39)

ψ1*(η) = { 0, 0 ≤ η < c,
           1, c ≤ η < d,
           0, d ≤ η ≤ 1,

ψ2*(η) = { 0, 0 ≤ η < d,
           1, d ≤ η ≤ 1.

In the remaining interval 0 ≤ ξ < c the functions φ1*(ξ), φ2*(ξ) are chosen arbitrarily
but bounded between 0 and 1, satisfying

∫0^c φ1*(ξ) dξ = m1, ∫0^c φ2*(ξ) dξ = m2.
Figure 2.18
Figure 2.19
In the particular case a = b = 2 (for which c = 19/35, e = 23/35, d = 29/35) one obtains, for instance,

φ2*(ξ) = { 0, 0 ≤ ξ < 23/35,
           1, 23/35 ≤ ξ ≤ 1,

ψ1*(η) = { 0, 0 ≤ η < 19/35,
           1, 19/35 ≤ η < 29/35,
           0, 29/35 ≤ η ≤ 1.
The bluffing part must satisfy ∫0^c φ2*(ξ) dξ = m2
(calculated above) and φ2*(ξ) = 1 on [d, 1]. For ξ in [c, d] we require only that φ1* + φ2* =
1. Writing out the conditions under which K(Φ, ψ) is minimized by Ψ*, we obtain

1 − d = ∫c^d φ2*(ξ) dξ and ∫η^d φ2*(ξ) dξ ≤ b(d − η)/(a + b + 1) for η ∈ [c, d], (2.10.40)

where c and d are the same as before. We obtain these constraints by equating the
coefficients of ψ1(η) and ψ2(η) in (2.10.25) at η = d and by requiring that N1(η) ≤
N2(η) for η in [c, d].
This relation is easily seen to be necessary and sufficient for Φ* to be optimal.
2.10.4. Poker model with k raises. [Karlin and Restrepo (1957)]. In this section
we indicate the form of the optimal strategies of a poker model with several rounds of
betting. The methods of analysis are in principle extensions of those employed in the
preceding section, but far more complicated in detail. We omit the proofs, referring
the reader to the references.
Rules, strategies, and payoff. The two players ante one unit each and receive
independently hands ξ and η (which are identified with points of the unit interval)
according to the uniform distribution. There are k + 1 rounds of betting ("round"
in this section means one action by one player). In the first round A may either fold
(and lose his ante) or bet a units. A and B act alternately. In each subsequent round
a player may either fold, or see (whereupon the game ends), or raise the bet by a
units. In the last round the player can only fold or see. If k is even, the last possible
round ends with A; if k is odd, the last possible round ends with B.
A strategy for A can be described by a k-tuple of functions Φ = (φ1(ξ), φ2(ξ), …,
φk(ξ)). These functions indicate A's course of action when he receives the hand ξ.
Explicitly,

1 − Σ_{i=1}^{k} φi(ξ)

is the probability that A will fold immediately, and

Σ_{i=1}^{k} φi(ξ)

is the probability that A will bet at his first opportunity. Further,

φ1(ξ) = probability that A will fold in his second round,
φ2(ξ) = probability that A will see in his second round,
Σ_{i=3}^{k} φi(ξ) = probability that A will raise in his second round,

if the occasion arises, i.e., if B has raised in his first round and kept the game going.
Similarly, if the game continues until A's rth round, the components of Φ describe his
action there. A strategy for B is likewise a k-tuple Ψ(η) = (ψ1(η), ψ2(η), …, ψk(η)),
which indicates B's course of action when he receives the hand η. The probability
that B will fold at his first opportunity is

1 − Σ_{j=1}^{k} ψj(η).

The expected payoff has the form

K(Φ, Ψ) = ∫0^1 ∫0^1 P(ξ, η; Φ(ξ), Ψ(η)) dξ dη, (2.10.41)

where P denotes the conditional payoff to A for fixed hands and strategies.
Description of the optimal strategies. There exist optimal strategies Φ* and Ψ*
characterized by 2k + 1 numbers b, c1, …, ck, d1, …, dk. When a player gets a hand
ξ in (0, b) he will bluff part of the time and fold part of the time. We shall write

mi = ∫0^b φi*(ξ) dξ, i = 1, 3, 5, …, (2.10.42)

and

nj = ∫0^b ψj*(η) dη, j = 2, 4, 6, …. (2.10.43)

These bluffing masses play an important role in what follows.
The constants ci, di, mi, and nj are determined by solving an elaborate system of
equations analogous to (2.10.30)–(2.10.34). Explicitly, if k is even, b, ci, dj, mi, and nj
are evaluated as solutions of the following equations:

[(4r − 1)a + 2] Σ_{i=2r}^{k} ni = a(1 − d_{2r−1}), 2r = 2, 4, …, k,

a(c_{2r−2} − d_{2r−2}) = a(1 − c_{2r−2}) + [(4r − 3)a + 2] Σ_{i=2r}^{k} ni, 2r = 4, 6, …, k,

[(4r − 3)a + 2] Σ_{i=2r−1}^{k} mi = a(1 − c_{2r−2}), 2r = 2, 4, …, k,

a(d_{2r−1} − c_{2r−1}) = a(1 − d_{2r−1}) + [(4r − 1)a + 2] Σ_{i=2r+1}^{k} mi, 2r = 2, 4, …, k,

together with the normalization equation

2 = (a + 2) Σ_{j=1}^{k} ∫0^b ψj*(η) dη.
An analogous system applies for k odd. The solutions obtained are consistent with
the requirements of Fig. 2.20.
2.10.5. Poker with simultaneous moves. [von Neumann and Morgenstern (1944),
Karlin (1959)]. Two players, A and B, make simultaneous bets after drawing hands
according to the uniform distribution. The initial bet can be either b (the low bet) or
a (the high bet). If both bets are equal, the player with the higher hand wins. If one
player bets high and the other bets low, the low bettor has a choice: he may either
Figure 2.20
fold (losing the low bet) or see by making an additional bet of a − b. If the low
bettor sees, the player with the higher hand wins the pot.
Since the game is symmetric, we need only describe the strategies of one player.
If A draws ξ, we shall write

φ1(ξ) = probability that A will bet low and fold if B bets high,
φ2(ξ) = probability that A will bet low and subsequently see,
φ3(ξ) = probability that A will bet high.
These functions are of course subject to the constraints

φi(ξ) ≥ 0, φ1(ξ) + φ2(ξ) + φ3(ξ) = 1.

The expected yield to A if he uses strategy Φ while B employs strategy Ψ reduces to

K(Φ, Ψ) = b ∫0^1 ∫0^1 [φ1(ξ) + φ2(ξ)][ψ1(η) + ψ2(η)]L(ξ, η) dξ dη
− b ∫0^1 ∫0^1 φ1(ξ)ψ3(η) dξ dη + b ∫0^1 ∫0^1 φ3(ξ)ψ1(η) dξ dη
+ a ∫0^1 ∫0^1 φ2(ξ)ψ3(η)L(ξ, η) dξ dη + a ∫0^1 ∫0^1 φ3(ξ)ψ2(η)L(ξ, η) dξ dη
+ a ∫0^1 ∫0^1 φ3(ξ)ψ3(η)L(ξ, η) dξ dη.
Because of the symmetry of the game, we may replace B's strategy Ψ(η) in this
expression by an optimal strategy Φ*(η). We make the plausible assumption that
in this strategy φ2*(η) = 0, since there would appear to be no clear justification for
making a low bet initially and then seeing. The consistency of this assumption will
be established later. With φ2*(η) = 0 we may write K(Φ, Φ*) as

K(Φ, Φ*) = b ∫0^1 ∫0^1 [φ1(ξ) + φ2(ξ)]φ1*(η)L(ξ, η) dξ dη − b ∫0^1 ∫0^1 φ1(ξ)φ3*(η) dξ dη
+ b ∫0^1 ∫0^1 φ3(ξ)φ1*(η) dξ dη + a ∫0^1 ∫0^1 φ2(ξ)φ3*(η)L(ξ, η) dξ dη
+ a ∫0^1 ∫0^1 φ3(ξ)φ3*(η)L(ξ, η) dξ dη

= ∫0^1 φ1(ξ) [b ∫0^1 φ1*(η)L(ξ, η) dη − b ∫0^1 φ3*(η) dη] dξ
+ ∫0^1 φ2(ξ) [b ∫0^1 φ1*(η)L(ξ, η) dη + a ∫0^1 φ3*(η)L(ξ, η) dη] dξ
+ ∫0^1 φ3(ξ) [b ∫0^1 φ1*(η) dη + a ∫0^1 φ3*(η)L(ξ, η) dη] dξ, (2.10.44)

or

K(Φ, Φ*) = Σ_{i=1}^{3} ∫0^1 φi(ξ)Ti(ξ) dξ + Z,
where Z is a term independent of Φ. The Φ maximizing K(Φ, Φ*) is evaluated by
choosing the component φi as large as possible whenever Ti(ξ) = maxj Tj(ξ). If the
maximum of Ti(ξ) is attained simultaneously by two of the Ti, then the corresponding
φi may share any positive values provided their sum is 1.
Some bluffing is anticipated on low hands. This suggests that for ξ < ξ0 we
should have T1(ξ) = T3(ξ) ≥ T2(ξ). Differentiating the identity T1(ξ) = T3(ξ), and
remembering that φ1*(ξ) + φ3*(ξ) = 1 on the interval [0, ξ0], we deduce that

φ1*(ξ) = a/(a + b), φ3*(ξ) = b/(a + b) for 0 ≤ ξ < ξ0;
φ3*(ξ) = 1 for ξ0 ≤ ξ ≤ 1. (2.10.45)
Next one verifies that φ1 + φ3 = 1, which is certainly satisfied for the Φ* of (2.10.45). Moreover, we have
seen that T1(ξ) = T3(ξ) uniquely determines Φ* to be as in (2.10.45) for ξ < ξ0.
For ξ > ξ0, by examining (2.10.44) we find that T2(ξ) = T3(ξ) > T1(ξ). Hence the
maximization of K(Φ, Φ*) requires Φ to be such that φ2 + φ3 = 1. But, if φ2 > 0 in this
interval, a simple calculation shows that T2(ξ) < T3(ξ) for ξ > ξ1, where ξ1 < ξ0. All
these inferences in conjunction prove that the Φ* of (2.10.45) is the unique optimal
strategy of the game.
3. Show that the game on a unit square with the payoff function H(x, y) = sign(x − y)
has the saddle point (1, 1).
4. Show that the game on a unit square with the payoff function

H(x, y) = −1/x² for x > y,  0 for x = y,  1/y² for x < y

has the saddle point (0, 0).
5. Show that the game on a unit square with the payoff function H(x, y) = (x − y)²
does not have a saddle point in pure strategies.
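A quick grid computation (a sketch only; the grid step is arbitrary) shows why: the lower value of H(x, y) = (x − y)² is 0 while the upper value is 1/4, so the two cannot meet in pure strategies.

```python
# Lower and upper values of H(x, y) = (x - y)^2 on a grid over the unit square.
H = lambda x, y: (x - y) ** 2
grid = [k / 100 for k in range(101)]

lower = max(min(H(x, y) for y in grid) for x in grid)  # max_x min_y H
upper = min(max(H(x, y) for x in grid) for y in grid)  # min_y max_x H

print(lower, upper)  # 0.0 0.25 — lower < upper, hence no pure saddle point
```

The minimizer can always match y = x (giving 0), while against any fixed y the maximizer reaches max(y², (1 − y)²) ≥ 1/4.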
6. Show that in the game on a unit square with the payoff function

H(x, y) = { x + y,     x ≠ 1, y ≠ 0,
            1/2 + y,   x = 1, y ≠ 0,
            1/2 + x,   x ≠ 1, y = 0,
            2,         x = 1, y = 0
the pair (x_ε, y_ε), where x_ε = 1 − ε, y_ε = ε, is an ε-saddle point. Does the game have a
value?
7. Solve the game of "search for a noisy object" formulated in Example 6, 2.1.2.
8. Compute the payoff to Player 1 in the game on a unit square with the payoff
function H(x, y) in the situation (F(x), G(y)) (F and G are distribution functions),
if
(a) H(x, y) = (x + y)/(4xy), F(x) = x², G(y) = y²;
(b) H(x, y) = |x − y|(1 − |x − y|), F(x) = x, G(y) = y;
(c) H(x, y) = (x − y)², F(x) = 1/2 I₀(x) + 1/2 I₁(x), G(y) = I_{1/2}(y), where I_k(x)
is a step function (with unit jump at x = k).
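For part (b) the answer can be checked numerically; the expected payoff is the double integral of H against the uniform distributions, whose exact value is 1/6. A midpoint-rule sketch (grid size chosen arbitrarily):

```python
# Midpoint-rule approximation of E H(x, y) = ∫∫ |x-y|(1-|x-y|) dF(x) dG(y)
# for F, G uniform on [0, 1] (exercise 8(b)).
n = 400
pts = [(k + 0.5) / n for k in range(n)]
H = lambda x, y: abs(x - y) * (1 - abs(x - y))

val = sum(H(x, y) for x in pts for y in pts) / n ** 2
print(round(val, 4))  # ≈ 0.1667, i.e. 1/6
```

Substituting t = |x − y|, whose density under independent uniforms is 2(1 − t), gives ∫₀¹ t(1 − t)·2(1 − t) dt = 1/6, in agreement with the numerics.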
9. Game of discrete search. Consider the following infinite game. A strategy for
Player 2 is the choice of a point uniformly distributed over the circle of radius y,
where y can take values from the interval [0, 1]. Player 1 may survey in the unit circle
a simply connected region Q whose area a(Q) = a = const, where a < A, A = π is
the area of the unit circle. His strategy x is the choice of the shape of the region Q
which has the area a and lies entirely within the unit circle. The payoff H(x, y) to
Player 1 is the probability of discovery, i.e. H(x, y) = Pr(y ∈ Q). A mixed
strategy g(y) for Player 2 is the density of the distribution of the random
variable y ∈ [0, 1]. Find a solution to the game.
10. Prove the Helly theorem, 2.5.4.
11. Consider a continuous analog of the "town defense" game (see 1.1.3). Player
1 has x units to attack the first post and 1 − x to attack the second post, x ∈ [0, 1].
Player 2 has y units of defense, where y ∈ [0, 1], to allocate to the first post and 1 − y
units of defense to the second post, at which the permanent forces of defense, 1/2, are
located. A player pays 1 to the other for every post at which he has fewer units than
his opponent, and pays nothing if the players' forces are equal in number.
Construct the payoff function H(x, y) for the game on a unit square. Show that
this game has no solution in mixed strategies.
Hint. Make use of the result of Example 10, 2.4.12.
12. Show that in the continuous game with the payoff function

H(x, y) = [1 + (x − y)²]⁻¹

the strategies F*(x) = I_{1/2}(x), G*(y) = 1/2 I₀(y) + 1/2 I₁(y) are optimal for players
1 and 2, respectively.
13. Prove that the value of a continuous symmetric game on a unit square
is zero, and that the players' optimal mixed strategies coincide (the game is symmetric if the payoff
function is skew-symmetric, i.e. H(x, y) = −H(y, x)).
14. Define the optimal strategies and the value of the game on a unit square with
the payoff function H(x, y) = y³ − 3xy + x³.
15. Show that in the game with the payoff function
16. Verify that the payoff function from Example 11, 2.5.5, H(x, y) = ρ(x, y),
x ∈ S(0, l), y ∈ S(0, l), where S(0, l) is the circle with its center at 0 and radius l,
and ρ(·,·) is a distance in R², is strictly convex in y for any fixed x.
17. Show that the sum of two convex functions is convex.
18. Prove that, when bounded, a convex function φ : [α, β] → R¹ is continuous
at any point x ∈ (α, β). At the ends α and β of the closed interval [α, β], however,
the convex function φ is upper semicontinuous, i.e.
where S(x_j, r) is a spherical segment with its apex at the point x_j and with r as a base
radius; |{y_i}| means the number of points of the set {y_i}. The point y_i is considered
to be discovered if y_i ∈ S(x_j, r) for at least one x_j. Thus the payoff function is the
number of the points discovered in the situation (x, y).
Find a solution to the game.
Chapter 3
Nonzero-sum games
The system

Γ = (N, {X_i}_{i∈N}, {H_i}_{i∈N}),

where N = {1, 2, ..., n} is the set of players, X_i is the strategy set of player i, and H_i is
the payoff function of player i defined on the Cartesian product of the players' strategy
sets X = Π_{i=1}^{n} X_i (the set of situations in the game), is called a noncooperative game.
A noncooperative n-person game is played as follows. The players choose their strategies x_i from the strategy sets X_i, i = 1, 2, ..., n, simultaneously and independently,
thereby generating a situation x = (x₁, ..., x_n), x_i ∈ X_i. Each player i then receives the
amount H_i(x), whereupon the game ends.
If the players' pure strategy sets X_i are finite, the game is called a finite noncooperative n-person game.
It is assumed that each player prefers to make a stop in order to avoid an accident,
or to continue on his way if the other player has made a stop. This conflict can be
formalized by the bimatrix game with the matrix
              β₁           β₂
(A, B) = α₁  (1, 1)      (1 − ε, 2)
         α₂  (2, 1 − ε)  (0, 0)
Figure 3.1. The curves a(t) and b(t) on [0, 1], with the threshold points t₀ and t₁ marked.
Let a and b be of the form shown in Fig. 3.1. From the form of the functions a(t)
and b(t) it follows that if the number of players choosing 1 is greater than t₁, then the
street traffic is light enough to make the driver of a private vehicle more comfortable
than the passenger in a public vehicle. However, if the number of motorists is greater
than 1 − t₀, then the traffic becomes so heavy (with the natural priority for public
vehicles) that the passenger in a public vehicle compares favourably with the driver
of a private vehicle.
Example 4. (Allocation of a limited resource taking into account the users' interests.) Suppose n users have a good chance of using (accumulating) some resource
whose volume is bounded by A > 0. Denote by x_i the volume of the resource to
be used (accumulated) by the ith user. The users receive payoffs depending on the
values of the vector x = (x₁, x₂, ..., x_n). The payoff to the ith user is evaluated by
the function h_i(x₁, x₂, ..., x_n) if the total volume of the used (accumulated) resource
does not exceed a given positive value θ ≤ A, i.e.

Σ_{i=1}^{n} x_i ≤ θ,  x_i ≥ 0.

If the inverse inequality is satisfied, the payoff to the ith user is calculated by the function g_i(x₁, x₂, ..., x_n). Here the resource utility shows a sharp decrease if Σ_{i=1}^{n} x_i > θ,
i.e.

g_i(x₁, x₂, ..., x_n) < h_i(x₁, x₂, ..., x_n).
Consider a nonzero-sum game in normal form

Γ = (N, {X_i}_{i∈N}, {H_i}_{i∈N}),

H_i(x₁, ..., x_n) = { h_i(x₁, ..., x_n), if Σ_{j=1}^{n} x_j ≤ θ;  g_i(x₁, ..., x_n), if Σ_{j=1}^{n} x_j > θ },

where the h_i(x₁, ..., x_n) are functions that are continuous and increasing in the
variables x_i.
3.2. Optimality principles in noncooperative games 129
H₁(p₁, p₂) = (p̄ − p₁)q₁ if p₁ ≤ p₂, and H₁(p₁, p₂) = (p̄ − p₁)(q − q₂) if p₁ > p₂,
and its various extensions and refinements. When the game Γ is zero-sum, the Nash
equilibrium coincides with the notion of optimality (the saddle point, or equilibrium) that
is the basic principle of optimality in a zero-sum game.
Suppose x = (x₁, ..., x_n) is an arbitrary situation in the game
Γ and x_i is a strategy of player i. We construct a situation that is different from x
only in that the strategy x_i of player i has been replaced by a strategy x_i'. As a result
we have a situation (x₁, ..., x_{i−1}, x_i', x_{i+1}, ..., x_n), denoted by (x‖x_i'). Evidently, if x_i
and x_i' coincide, then (x‖x_i') = x.
Definition. The situation x* = (x₁*, ..., x_i*, ..., x_n*) is called a Nash equilibrium
if for all x_i ∈ X_i and i = 1, ..., n

H_i(x*‖x_i) ≤ H_i(x*).   (3.2.1)
Example 7. Consider the game from Example 3, 3.1.4. Here a Nash
equilibrium is any situation for which the condition

t₀ ≤ t* − 1/n,  t* + 1/n ≤ t₁   (3.2.2)

holds, where t* = Σ_{j=1}^{n} x_j*. It follows from (3.2.2) that a payoff to a player remains
unaffected when he shifts from one pure strategy to another provided the other players
do not change their strategies.
Suppose a play of the game realizes the situation x corresponding to t = Σ_{j=1}^{n} x_j,
t ∈ [t₀, t₁], and the quantity δ is the share of the players who wish to shift from strategy
0 to strategy 1. Note that if δ is such that b(t) = a(t) < a(t + δ), then the payoffs
to these players tend to increase (with such a strategy shift) provided the strategies
of the other players remain unchanged. However, if this shift is actually effected,
then the same players may wish to shift from strategy 1 back to strategy 0, because the
condition a(t + δ) < b(t + δ) is satisfied. If this wish is realized, then the share of
players, Σ_{j=1}^{n} x_j, decreases and again falls in the interval [t₀, t₁].
Similarly, let δ be the share of players who decided, for some reason (e.g. because
of random errors), to shift from strategy 1 to strategy 0, when t − δ < t₀. Then, by
the condition b(t − δ) < a(t − δ), the players may wish to shift back to strategy 1.
When this wish is realized, the share of the players, Σ_{j=1}^{n} x_j, increases and again
comes back to the interval [t₀, t₁].
3.2.2. It follows from the definition of the Nash equilibrium that none of
the players i is interested in deviating from the strategy x_i* appearing in this situation
(by (3.2.1), when such a player uses strategy x_i instead of x_i*, his payoff may only decrease
provided the other players follow the strategies generating the equilibrium x*). Thus, if
the players agree on the strategies appearing in the equilibrium x*, then any individual
non-observance of this agreement is disadvantageous to the deviating player.
Definition. The strategy x_i* ∈ X_i is called an equilibrium strategy if it appears in at least
one Nash equilibrium.
For the noncooperative two-person game Γ = (X₁, X₂, H₁, H₂) the situation (x*, y*)
is an equilibrium if the inequalities

H₁(x, y*) ≤ H₁(x*, y*),  H₂(x*, y) ≤ H₂(x*, y*)   (3.2.3)

hold for all x ∈ X₁ and y ∈ X₂ (in the bimatrix case, for all rows i ∈ M and columns j ∈ N). Thus, Example 1 has two equilibria,
at (α₁, β₁) and (α₂, β₂), whereas Example 2 has equilibria at (α₁, β₂) and (α₂, β₁).
Recall that for the zero-sum game Γ = (X₁, X₂, H) the pair (x*, y*) ∈ X₁ × X₂ is
an equilibrium if

H(x, y*) ≤ H(x*, y*) ≤ H(x*, y)

for all x ∈ X₁ and y ∈ X₂.
that both players become losers (the payoff vector (0, 0)). Then it may be wise of
Player 1 to choose strategy α₂, since in the situation (α₂, β₂) he would receive a payoff
1. Player 2, however, may follow a similar line of reasoning and choose β₁; then, in
the situation (α₂, β₁), both players again become losers.
Thus, this is the case where the situation is advantageous (but at the same time
unstable) to Player 1. Similarly, we may examine the situation (α₂, β₂) (from Player
2's point of view). For this reason, it may be wise of the players to make contact in advance
of the play and agree on a joint course of action, which contradicts property
3. Note that some difficulties may arise when the pairs of maximin strategies do not
form an equilibrium.
Thus we have an illustrative example where none of the properties 1-4 of a
zero-sum game is satisfied.
Payoffs to players may vary with Nash equilibria. Furthermore, unlike the equilibrium set in a zero-sum game, the Nash equilibrium set is not rectangular. If
x = (x₁, ..., x_i, ..., x_n) and x' = (x₁', ..., x_i', ..., x_n') are two different equilibria, then
a situation x'' composed of the strategies which form the situations x and x', and
coinciding with neither of them, may not be an equilibrium. The Nash equilibrium is a multiple optimality principle in that various equilibria may be preferable to
different players to a variable extent. It now remains for us to answer the question:
which of the equilibria can be taken as an optimality principle convenient to all players? In what follows it will be shown that the multiplicity of the optimality principle
is a characteristic and essential feature of optimal behavior in controlled
conflict processes with many participants.
Note that, unlike the zero-sum case, the equilibrium strategy x_i* of the ith player
may not always ensure at least the payoff H_i(x*) in the Nash equilibrium, since this
essentially depends on whether the other players choose the strategies appearing in
the given Nash equilibrium. For this reason, the equilibrium strategy should not be
interpreted as an optimal strategy for the ith player. This interpretation makes sense
only for the n-tuples of players' strategies, i.e. for situations.
3.2.3. An important feature of the Nash equilibrium is that a deviation from
it made by two or more players simultaneously may increase the payoff of one of the deviating players.
Let S ⊂ N be a subset of the set of players (a coalition) and let x = (x₁, ..., x_n) be a
situation in the game Γ. Denote by (x‖x_S) the situation obtained from the
situation x by replacing therein the strategies x_i, i ∈ S, with the strategies x_i' ∈ X_i,
i ∈ S. In other words, the players appearing in the coalition S replace their strategies
x_i by the strategies x_i'. If x* is a Nash equilibrium, then (3.2.1) does not necessarily
imply

H_i(x*) ≥ H_i(x*‖x_S) for all i ∈ S.   (3.2.7)

In what follows this will be established by some simple examples.
But we may strengthen the notion of a Nash equilibrium by requiring condition
(3.2.7), or the relaxed version of (3.2.7), to hold for at least one of the players i₀ ∈ S.
Then we arrive at the following definition.
Definition. The situation x* is called a strong equilibrium if for any coalition
S ⊂ N and any x_S ∈ Π_{i∈S} X_i there is a player i₀ ∈ S such that the following inequality is
satisfied:

H_{i₀}(x*) > H_{i₀}(x*‖x_S).   (3.2.8)
              β₁        β₂
(A, B) = α₁  (5, 5)   (0, 10)
         α₂  (10, 0)  (1, 1)
Here we have one equilibrium situation, (α₂, β₂) (though not a strong equilibrium),
which yields the payoff vector (1, 1). However, if both players play (α₁, β₁), they
obtain the payoff vector (5, 5), which is better for both of them. Zero-sum games have
no such paradoxes. In this particular case, the result is due to the fact that a
simultaneous deviation from the equilibrium strategies may further increase the payoff
of each player.
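The equilibrium and dominance claims above are mechanical to verify. The sketch below uses only the 2×2 matrices just displayed: it enumerates all pure-strategy situations and tests that no player gains by a unilateral deviation, i.e. inequality (3.2.1).

```python
# Pure-strategy Nash equilibria of the bimatrix game above, by exhaustive check.
A = [[5, 0], [10, 1]]    # payoffs to Player 1 (rows alpha_1, alpha_2)
B = [[5, 10], [0, 1]]    # payoffs to Player 2 (columns beta_1, beta_2)

def pure_nash(A, B):
    m, n = len(A), len(A[0])
    eq = []
    for i in range(m):
        for j in range(n):
            # (i, j) is an equilibrium if neither player has a profitable deviation
            if all(A[k][j] <= A[i][j] for k in range(m)) and \
               all(B[i][l] <= B[i][j] for l in range(n)):
                eq.append((i, j))
    return eq

print(pure_nash(A, B))   # [(1, 1)]: only (alpha_2, beta_2) is an equilibrium
```

The unique equilibrium yields (1, 1), although (α₁, β₁) would give both players 5 — exactly the paradox discussed above.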
3.2.4. Example 9 suggests the possibility of applying other optimality principles
to a noncooperative game, which may bring about situations that are more advantageous to both players than equilibrium situations. One such optimality
principle is Pareto optimality.
Consider the set of vectors {H(x)} = {H₁(x), ..., H_n(x)}, x ∈ X, X = Π_{i=1}^{n} X_i, i.e.
the set of possible values of the vector payoffs over all possible situations x ∈ X.
Definition. The situation x̄ in the noncooperative game Γ is called Pareto
optimal if there is no situation x ∈ X for which the inequalities

H_i(x) ≥ H_i(x̄), i ∈ N,

hold with at least one of them strict.
Z¹ = {(x₁, x₂) | H₁(x₁, x₂) = sup_{y₁∈X₁} H₁(y₁, x₂)},   (3.2.9)

Z² = {(x₁, x₂) | H₂(x₁, x₂) = sup_{y₂∈X₂} H₂(x₁, y₂)}.   (3.2.10)
The notion of i-equilibrium may be interpreted as follows. Player 1 (the Leader) knows
the payoff functions of both players, H₁ and H₂, and hence he knows Player 2's (the Follower's)
set of best responses Z² to any strategy x₁ of Player 1. Having this information, he
then maximizes his payoff by selecting the strategy x₁ from condition (3.2.11). Thus, v̄_i
is the payoff to the ith player acting as "leader" in the game Γ.
Lemma. Let Z(Γ) be the set of Nash equilibria in the two-person game Γ. Then

Z(Γ) = Z¹ ∩ Z²,

where Z¹, Z² are the sets of the best responses (3.2.9), (3.2.10) of players
1 and 2 in the game Γ.
Proof. Let (x₁, x₂) ∈ Z(Γ) be a Nash equilibrium. Then the payoff vectors satisfy

(H₁(x₁, x₂), H₂(x₁, x₂)) ≠ (H₁(y₁, y₂), H₂(y₁, y₂)).   (3.2.16)
Suppose the opposite is true, i.e. the game Γ does not involve competition for leadership. Then there is a situation (z₁, z₂) ∈ X₁ × X₂ for which

H_i(x₁, x₂) ≤ H_i(z₁, z₂),   (3.2.17)
H_i(y₁, y₂) ≤ H_i(z₁, z₂),   (3.2.18)

i = 1, 2. But (x₁, x₂), (y₁, y₂) are Pareto optimal situations, and hence the inequalities
(3.2.17), (3.2.18) are satisfied as equalities, which contradicts (3.2.16). This completes
the proof of the theorem.
In conclusion we may say that the games "battle of the sexes" and "crossroads"
(as in 3.1.4) satisfy the condition of the theorem (as in 3.2.5) and hence involve
competition for leadership.
K_i(μ₁, ..., μ_n) = Σ_{x₁∈X₁} ··· Σ_{x_n∈X_n} H_i(x₁, ..., x_n) μ₁(x₁) × ... × μ_n(x_n),

i ∈ N, x = (x₁, ..., x_n) ∈ X.   (3.3.3)
We introduce the notation
X̄₁ = {x | xu = 1, x ≥ 0, x ∈ Rᵐ},
3.3. Mixed extension of noncooperative game 137
X̄₂ = {y | yw = 1, y ≥ 0, y ∈ Rⁿ},

where u = (1, ..., 1) ∈ Rᵐ, w = (1, ..., 1) ∈ Rⁿ. We also define the players' payoffs
K₁ and K₂ at (x, y) in mixed strategies to be the payoff expectations

K₁(x, y) = xAy,  K₂(x, y) = xBy.
Thus, we have formally constructed the mixed extension Γ̄(A, B) of the game Γ(A, B),
i.e. the noncooperative two-person game Γ̄(A, B) = (X̄₁, X̄₂, K₁, K₂).
For the bimatrix game (just as for the matrix game) the set M_x = {i | ξ_i > 0} will be
called Player 1's spectrum of the mixed strategy x = (ξ₁, ..., ξ_m), while a strategy x for
which M_x = M, M = {1, 2, ..., m}, will be referred to as completely mixed. Similarly,
N_y = {j | η_j > 0} will be Player 2's spectrum of the mixed strategy y = (η₁, ..., η_n) in
the bimatrix (m × n) game Γ(A, B). The situation (x, y), in which both strategies x
and y are completely mixed, will be referred to as completely mixed.
We shall now use the "battle of the sexes" game to demonstrate that the difficulties
encountered in examination of a noncooperative game (Example 8, 3.2.2) are not
resolved through introduction of mixed strategies.
Example 11. Suppose Player 1 in the "battle of the sexes" game wishes to maximize his guaranteed payoff. This means that he is going to choose a mixed strategy
x = (ξ, 1 − ξ), 0 ≤ ξ ≤ 1, so as to maximize the least of the two quantities K₁(x, β₁)
and K₁(x, β₂), i.e. to achieve

max_x min{K₁(x, β₁), K₁(x, β₂)} = min{K₁(x̄, β₁), K₁(x̄, β₂)}.
are (1/5, 4/5). Therefore, it is advantageous for him to use his strategy α₁ against
the maximin strategy ȳ.
If both players follow this line of reasoning, they will arrive at the situation (α₁, β₂),
in which the payoff vector is (0, 0). Hence the situation (x̄, ȳ) in maximin mixed
strategies is not a Nash equilibrium.
3.3.3. Definition. The situation μ* is called a Nash equilibrium in mixed
strategies in the game Γ if for any player i and any mixed strategy μ_i the
following inequality holds:

K_i(μ*‖μ_i) ≤ K_i(μ*).
K₁(α₁, y*) = K₁(α₂, y*) = 1 − ε/(2 − ε).

Furthermore, since for any pair of mixed strategies x = (ξ, 1 − ξ) and y = (η, 1 − η)
we have

K₁(x, y*) = ξK₁(α₁, y*) + (1 − ξ)K₁(α₂, y*) = 1 − ε/(2 − ε),

we get

K₁(x, y*) = K₁(x*, y*),  K₂(x*, y) = K₂(x*, y*)

for all mixed strategies x ∈ X̄₁ and y ∈ X̄₂. Therefore, (x*, y*) is a Nash equilibrium.
Furthermore, it is a completely mixed equilibrium. But the situation (x*, y*) is not
Pareto optimal, since the vector K(x*, y*) = (1 − ε/(2 − ε), 1 − ε/(2 − ε)) is strictly
(component-wise) smaller than the payoff vector (1, 1) in the situation (α₁, β₁).
Let K(μ*) = {K_i(μ*)} be a payoff vector in some Nash equilibrium. Denote
v_i = K_i(μ*) and v = {v_i}. While zero-sum games have the same value v of the
payoff function at all equilibrium points, so that this value is uniquely defined
for each zero-sum game possessing such an equilibrium, in nonzero-sum games
3.4. Existence of Nash equilibrium 139
there is a whole set of vectors v. Thus every vector v is connected with a particular
equilibrium point μ*: v_i = K_i(μ*), μ* ∈ X̄, X̄ = Π_{i=1}^{n} X̄_i.
In the game of "crossroads", the equilibrium payoff vector (v₁, v₂) at the equilibrium point (α₁, β₂) is of the form (1 − ε, 2), whereas at (x*, y*) it equals
(1 − ε/(2 − ε), 1 − ε/(2 − ε)) (see Example 12).
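The completely mixed payoff vector can be recovered numerically from the indifference conditions. In the sketch below the "crossroads" matrices A = [[1, 1 − ε], [2, 0]], B = [[1, 2], [1 − ε, 0]] are assumed as reconstructed above, and ε = 0.2 is chosen purely for illustration.

```python
# Indifference computation for the "crossroads" game, with assumed matrices
# A = [[1, 1-e], [2, 0]] (Player 1) and B = [[1, 2], [1-e, 0]] (Player 2);
# e = 0.2 is an illustrative choice.
e = 0.2

# y* = (eta, 1 - eta) must equalize Player 1's payoffs to alpha_1 and alpha_2:
#   eta + (1 - e)(1 - eta) = 2 * eta   =>   eta = (1 - e) / (2 - e)
eta = (1 - e) / (2 - e)

K1_a1 = eta * 1 + (1 - eta) * (1 - e)   # K_1(alpha_1, y*)
K1_a2 = eta * 2                         # K_1(alpha_2, y*)
print(K1_a1, K1_a2)   # both ≈ 0.8889, i.e. 1 - e/(2 - e)
```

By symmetry the same computation gives x*, and the common payoff 1 − ε/(2 − ε) is component-wise below the vector (1, 1) of the situation (α₁, β₁), consistent with the non-Pareto-optimality of (x*, y*).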
3.3.4. If the strategy spaces in the noncooperative game Γ = (X₁, X₂, H₁, H₂)
are infinite, e.g. X₁ ⊂ Rᵐ, X₂ ⊂ Rⁿ, then, as in the case of zero-sum infinite games,
the mixed strategies of the players are identified with probability measures given
on Borel σ-algebras of the sets X₁ and X₂. If μ and ν are respectively the mixed
strategies of Players 1 and 2, then the payoff to player i in this situation, K_i(μ, ν), is the
mathematical expectation of the payoff, i.e.

K_i(μ, ν) = ∫_{X₁} ∫_{X₂} H_i(x, y) dμ(x) dν(y),

where the integrals are taken to be Lebesgue-Stieltjes integrals. Note that the payoffs
to the players at (x, ν) and (μ, y) are

K_i(x, ν) = ∫_{X₂} H_i(x, y) dν(y),  K_i(μ, y) = ∫_{X₁} H_i(x, y) dμ(x).
ψ : X̄₁ × X̄₂ → X̄₁ × X̄₂,

i.e. the image of the map ψ consists of the pairs of the players' best responses to the
strategies y₀ and x₀, respectively.
The functions K₁ and K₂, being the mathematical expectations of the payoffs in
the situation (x, y), are bilinear in x and y, and hence the image ψ(x₀, y₀) of the
situation (x₀, y₀) under the map ψ is a convex compact subset of X̄₁ × X̄₂.
Furthermore, if the sequences of pairs {(x₀ⁿ, y₀ⁿ)}, (x₀ⁿ, y₀ⁿ) ∈ X̄₁ × X̄₂, and {(xₙ', yₙ')},
(xₙ', yₙ') ∈ ψ(x₀ⁿ, y₀ⁿ), have limit points (x₀, y₀) and (x', y'),
then by the bilinearity of the functions K₁ and K₂, and because of the compactness
of the sets X̄₁ and X̄₂, we have (x', y') ∈ ψ(x₀, y₀). Then, by the Kakutani theorem,
there exists a situation (x*, y*) ∈ X̄₁ × X̄₂ for which (x*, y*) ∈ ψ(x*, y*), i.e.

K₁(x, y*) ≤ K₁(x*, y*),  K₂(x*, y) ≤ K₂(x*, y*)

for all x ∈ X̄₁ and y ∈ X̄₂. This completes the proof of the theorem.
3.4.2. The preceding theorem can be extended to the case of continuous payoff
functions H₁ and H₂. To prove this result, we shall use the well-known Brouwer
fixed point theorem [Parthasarathy and Raghavan (1971)].
Theorem. Let S be a convex compact set in Rⁿ which has an interior point. If ψ is a
continuous self-map of S, then there exists a fixed point x* of the map ψ, i.e. x* ∈ S
and x* = ψ(x*).
Theorem. Let Γ = (X₁, X₂, H₁, H₂) be a noncooperative two-person game,
where the strategy spaces X₁ ⊂ Rᵐ, X₂ ⊂ Rⁿ are convex compact subsets and the
set X₁ × X₂ has an interior. Also, let the payoff functions H₁(x, y) and H₂(x, y) be
continuous on X₁ × X₂, with H₁(x, y) concave in x for every fixed y and H₂(x, y)
concave in y for every fixed x.
Then the game Γ has a Nash equilibrium (x*, y*).
Proof. Let p = (x', y') ∈ X₁ × X₂ and q = (x, y) ∈ X₁ × X₂ be two situations in
the game Γ. Consider the function

θ(p, q) = H₁(x', y) + H₂(x, y').

First we show that there exists a situation q* = (x*, y*) for which

θ(p, q*) ≤ θ(q*, q*) for all p ∈ X₁ × X₂.

Suppose this is not the case. Then for each q ∈ X₁ × X₂ there is a p ∈ X₁ × X₂,
p ≠ q, such that θ(p, q) > θ(q, q). Introduce the sets

G_p = {q | θ(p, q) > θ(q, q)}.
Since the function θ is continuous (H₁ and H₂ are continuous in all their variables)
and X₁ × X₂ is a convex compact set, the sets G_p are open. Furthermore, by
the assumption, X₁ × X₂ is covered by the sets of the family {G_p}.
It follows from the compactness of X₁ × X₂ that there is a finite collection of these
sets which covers X₁ × X₂. Suppose these are the sets G_{p₁}, ..., G_{p_k}. Denote

φ_j(q) = max{0, θ(p_j, q) − θ(q, q)},  ψ(q) = (1/φ(q)) Σ_j φ_j(q) p_j,

where φ(q) = Σ_j φ_j(q). The functions φ_j are continuous and hence ψ is a continuous
self-map of X₁ × X₂. By the Brouwer fixed point theorem, there is a point q̄ ∈ X₁ × X₂
such that ψ(q̄) = q̄, i.e.

q̄ = (1/φ(q̄)) Σ_j φ_j(q̄) p_j.

Consequently,

θ(q̄, q̄) = θ((1/φ(q̄)) Σ_j φ_j(q̄) p_j, q̄).

But the function θ(p, q) is concave in p, with q fixed, and hence

θ(q̄, q̄) ≥ (1/φ(q̄)) Σ_j φ_j(q̄) θ(p_j, q̄).   (3.4.1)

On the other hand, if φ_j(q̄) > 0, then θ(q̄, q̄) < θ(p_j, q̄), and if φ_j(q̄) = 0, then
φ_j(q̄)θ(p_j, q̄) = φ_j(q̄)θ(q̄, q̄). Since φ_j(q̄) > 0 for some j, we get the inequality

θ(q̄, q̄) < (1/φ(q̄)) Σ_j φ_j(q̄) θ(p_j, q̄),

which contradicts (3.4.1). Hence there exists a situation q* = (x*, y*) with

max_{p ∈ X₁×X₂} θ(p, q*) = θ(q*, q*)
142 Chapter 3. Nonzero-sum games
for all x ∈ X₁ and y ∈ X₂. Setting successively y = y* and x = x* in the last
inequality, we obtain the inequalities

H₁(x, y*) ≤ H₁(x*, y*),  H₂(x*, y) ≤ H₂(x*, y*),

which hold for all x ∈ X₁ and y ∈ X₂. This completes the proof of the theorem.
The result given below holds for noncooperative two-person games played on
compact sets (specifically, on a unit square) with continuous payoff functions.
Theorem. Let Γ = (X₁, X₂, H₁, H₂) be a noncooperative two-person game,
where H₁ and H₂ are continuous functions on X₁ × X₂, and X₁, X₂ are compact subsets
of finite-dimensional Euclidean spaces. Then the game Γ has an equilibrium (μ, ν) in
mixed strategies.
This theorem is given without proof, since it is based on the continuity and bilinearity of the functions

K_i(μ, ν) = ∫_{X₁} ∫_{X₂} H_i(x, y) dμ(x) dν(y), i = 1, 2,

over the set X̄₁ × X̄₂, and the proof almost exactly repeats that of the preceding theorem.
We shall discuss in more detail the construction of mixed strategies in noncooperative n-person games with an infinite number of strategies. Note that if the players'
payoff functions H_i(x) are continuous on the Cartesian product X = Π_{i=1}^{n} X_i of the
compact sets of pure strategies, then in such a noncooperative game there always
exists a Nash equilibrium in mixed strategies. As for the existence of Pareto optimal
situations, it suffices to ensure the compactness of the set {H(x)}, x ∈ X, which in
turn can be ensured by the compactness, in some topology, of the set of all situations
X and the continuity in this topology of all the payoff functions H_i, i = 1, 2, ..., n.
It is evident that this is always true for finite noncooperative games.
a non-negative real number μ_i(x_i), representing the probability that player i would
choose x_i, such that

Σ_{x_i ∈ X_i} μ_i(x_i) = 1, for all i ∈ N.
If the players choose their pure strategies independently, according to the mixed
strategy profile μ, then the probability that they will choose the pure strategy profile
x = (x₁, ..., x_i, ..., x_n) is Π_{i=1}^{n} μ_i(x_i), the multiplicative product of the individual
strategy probabilities.
For any mixed strategy profile μ, let K_i(μ) denote the mathematical expectation
of the payoff that player i would get when the players independently choose their pure
strategies according to μ. Denote X = Π_{i=1}^{n} X_i (X is the set of all possible situations
in pure strategies); then

K_i(μ) = Σ_{x ∈ X} (Π_{j ∈ N} μ_j(x_j)) H_i(x), for all i ∈ N.
For any τ_i ∈ X̄_i we denote by (μ‖τ_i) the mixed strategy profile in which the ith
component is τ_i and all other components are as in μ. Thus

K_i(μ‖τ_i) = Σ_{x ∈ X} (Π_{j ≠ i} μ_j(x_j)) τ_i(x_i) H_i(x).
We shall not use any special notation for the mixed strategy μ_i that puts
probability 1 on the pure strategy x_i, denoting this mixed strategy by x_i (in the same
manner as the corresponding pure strategy).
If player i used the pure strategy x_i while all other players behaved independently according to the mixed strategy profile μ, then player i's mathematical
expectation of payoff would be

K_i(μ‖x_i) = Σ_{x ∈ X} (Π_{j ≠ i} μ_j(x_j)) H_i(x‖x_i).

Hence

K_i(μ‖τ_i) = Σ_{x_i ∈ X_i} τ_i(x_i) K_i(μ‖x_i).
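These expectations are straightforward to compute by enumeration. The sketch below uses a toy two-player, two-strategy game (the payoff table is illustrative only) and checks the identity K_i(μ‖τ_i) = Σ_{x_i} τ_i(x_i) K_i(μ‖x_i) numerically.

```python
from itertools import product

# Toy two-person game: H[i][(x1, x2)] is player i's payoff in the pure situation (x1, x2).
# The numbers are illustrative only.
H = {0: {(0, 0): 3, (0, 1): 0, (1, 0): 1, (1, 1): 2},
     1: {(0, 0): 2, (0, 1): 1, (1, 0): 0, (1, 1): 3}}
mu = [(0.6, 0.4), (0.3, 0.7)]   # mixed strategy profile (mu_1, mu_2)

def K(i, profile):
    # K_i(mu): expectation of H_i under independent randomization, as in (3.3.3)
    return sum(profile[0][x1] * profile[1][x2] * H[i][(x1, x2)]
               for x1, x2 in product(range(2), repeat=2))

def K_pure(i, profile, j, xj):
    # K_i(mu || x_j): player j is forced to the pure strategy x_j
    forced = list(profile)
    forced[j] = tuple(1.0 if k == xj else 0.0 for k in range(2))
    return K(i, forced)

tau = (0.5, 0.5)
lhs = K(0, [tau, mu[1]])                                    # K_1(mu || tau_1)
rhs = sum(tau[x] * K_pure(0, mu, 0, x) for x in range(2))   # sum_x tau_1(x) K_1(mu || x)
assert abs(lhs - rhs) < 1e-12
```

The assertion verifies the linearity of the expectation in each player's own mixed strategy, which is exactly what the displayed identity expresses.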
Here R_j(μ) is the set of all probability distributions ρ_j over X_j that are best replies
to μ, i.e. that maximize K_j(μ‖ρ_j). That is, τ ∈ R(μ) if and only if τ_j ∈ R_j(μ) for every j ∈ N. For each μ, R(μ) is
nonempty and convex, because it is the Cartesian product of nonempty convex sets.
To show that R is upper semicontinuous, suppose that {μᵏ} and {τᵏ}, k = 1, 2, ...,
are convergent sequences, μᵏ ∈ Π_{i∈N} X̄_i, τᵏ ∈ R(μᵏ), k = 1, 2, ..., with
μ̄ = lim_{k→∞} μᵏ, τ̄ = lim_{k→∞} τᵏ.
We have to show that τ̄ ∈ R(μ̄). For every player j ∈ N and every ρ_j ∈ X̄_j,

K_j(μᵏ‖τ_jᵏ) ≥ K_j(μᵏ‖ρ_j), k = 1, 2, ....

By the continuity of the mathematical expectation K_j(μ) on Π_{i=1}^{n} X̄_i, this in turn implies
that, for every j ∈ N and ρ_j ∈ X̄_j,

K_j(μ̄‖τ̄_j) ≥ K_j(μ̄‖ρ_j).
X̄_i(η_i) = {μ_i ∈ X̄_i : μ_i(x_i) ≥ η_i(x_i) for all x_i ∈ X_i}, where η_i(x_i) > 0, Σ_{x_i} η_i(x_i) < 1.
Let η(x) = (η₁(x₁), ..., η_n(x_n)), x_i ∈ X_i, i = 1, ..., n, and X̄[η(x)] = Π_{i=1}^{n} X̄_i(η_i(x_i)).
The perturbed game (Γ, η) is the infinite game in normal form
defined over the strategy sets X̄_i(η_i(x_i)) with the payoffs K_i(μ₁, ..., μ_n), μ_i ∈ X̄_i(η_i(x_i)),
i = 1, ..., n.
3.6.3. It is easily seen that a perturbed game (Γ, η) satisfies the conditions under
which the Kakutani fixed point theorem can be used, and so such a game possesses at
least one equilibrium. It is clear that in such an equilibrium a pure strategy which
is not a best reply has to be chosen with the minimum probability. Thus we have the
following lemma.
Lemma. A strategy profile μ ∈ X̄(η) is an equilibrium of (Γ, η) if and only if
the following condition is satisfied: if K_i(μ‖x_i) < K_i(μ‖y_i) for pure strategies
x_i, y_i ∈ X_i, then μ_i(x_i) = η_i(x_i).
         L₂       R₂
L₁    (1, 1)   (0, 0)
R₁    (0, 0)   (0, 0)
This game has two equilibria, (L₁, L₂) and (R₁, R₂). Consider a perturbed game
(Γ, η). In the situation (R₁, R₂) in the perturbed game the strategies R₁ and R₂ will
be chosen with probabilities 1 − η₁(L₁) and 1 − η₂(L₂) respectively, and the strategies
3.6. Refinements of Nash equilibria. 147
L₁ and L₂ will be chosen with probabilities η₁(L₁) and η₂(L₂). Thus the payoff
K₁^η(R₁, R₂) in (Γ, η) will equal

K₁^η(R₁, R₂) = η₁(L₁) η₂(L₂).

In the situation (L₁, R₂) the strategies L₁ and R₂ will be chosen with probabilities
(1 − η₁(R₁)) and (1 − η₂(L₂)), and

K₁^η(L₁, R₂) = (1 − η₁(R₁)) η₂(L₂).
         L₂        R₂
L₁    (1, 1)    (10, 0)
R₁    (0, 10)   (10, 10)
In this game we shall see that a perfect equilibrium (L₁, L₂) is payoff dominated by
a non-perfect one. The game has two different equilibria, (L₁, L₂) and (R₁, R₂).
Consider the perturbed game (Γ, η). To show that (L₁, L₂) is a perfect equilibrium in
(Γ, η), note that

K₁^η(L₁, L₂) > K₁^η(R₁, L₂),  K₂^η(L₁, L₂) > K₂^η(L₁, R₂).

Consider now (R₁, R₂) in (Γ, η):

K₁^η(R₁, R₂) = 10(1 − η₂(L₂)) + η₁(L₁) η₂(L₂),

K₁^η(L₁, R₂) = 10(1 − η₁(R₁))(1 − η₂(L₂)) + 10 η₁(R₁)(1 − η₂(L₂)) + (1 − η₁(R₁)) η₂(L₂)
            = 10(1 − η₂(L₂)) + (1 − η₁(R₁)) η₂(L₂).

For small η, K₁^η(L₁, R₂) > K₁^η(R₁, R₂). Thus (R₁, R₂) is not an equilibrium in
(Γ, η) and it cannot be a perfect equilibrium in Γ.
It can be seen that (L₁, L₂) is an equilibrium in (Γ, η), and the only perfect equilibrium
in Γ, but this equilibrium is payoff dominated by (R₁, R₂). We see that the perfectness refinement eliminates equilibria with attractive payoffs. At the same time the
perfectness concept does not eliminate all intuitively unreasonable equilibria, as is
seen from the following example of Myerson (1978):
         L₂         R₂        A₂
L₁    (1, 1)     (0, 0)    (−1, −2)
R₁    (0, 0)     (0, 0)    (0, −2)
A₁    (−2, −1)   (−2, 0)   (−2, −2)
It can be seen that the equilibrium (R₁, R₂) in this game is also perfect. Namely, if
the players have agreed to play (R₁, R₂), and if each player expects that the mistake
A will occur with a larger probability than the mistake L, then it is optimal for each
player to play R. Hence adding strictly dominated strategies may change the set of
perfect equilibria.
3.6.6. There is another refinement of the equilibrium concept, introduced by Myerson
(1978), which excludes some "unreasonable" perfect equilibria like (R₁, R₂) in the last
example.
This is the so-called proper equilibrium. The basic idea underlying the properness
concept is that a player, when making mistakes, will try much harder to prevent the more
costly mistakes than the less costly ones, i.e. that there is some
rationality in the mechanism of making mistakes. As a result, a more costly mistake
will occur with a probability which is of smaller order than the probability of a less
costly one.
3.6.7. Definition. Let (N, X₁, ..., X_n, K₁, ..., K_n) be an n-person normal
form game in mixed strategies. Let ε > 0, and μ^ε ∈ Π_{i=1}^{n} X̄_i. We say that the strategy
profile μ^ε is an ε-proper equilibrium of Γ if μ^ε is completely mixed and satisfies

μ_i^ε(x_i) ≤ ε μ_i^ε(y_i) whenever K_i(μ^ε‖x_i) < K_i(μ^ε‖y_i), x_i, y_i ∈ X_i.
K₁(x, ν*) ≤ K₁(μ*, ν*),   (3.7.1)
K₂(μ*, y) ≤ K₂(μ*, ν*).   (3.7.2)
Proof. The necessity is evident, since every pure strategy is a special case of a
mixed strategy, and hence inequalities (3.7.1), (3.7.2) must be satisfied. To prove the
sufficiency, we pass to the mixed strategies of Players 1 and 2, respectively,
in inequalities (3.7.1), (3.7.2).
This theorem (as in the case of zero-sum games) shows that, to prove that a
situation forms an equilibrium in mixed strategies, it suffices to verify inequalities
(3.7.1), (3.7.2) for the opponent's pure strategies only. For the bimatrix (m × n) game Γ(A, B)
these inequalities become

a_i y ≤ x A y, i = 1, ..., m;  x b_j ≤ x B y, j = 1, ..., n,

where a_i is the ith row of A and b_j is the jth column of B; moreover, the equalities

a_i y = x A y,  x b_j = x B y   (3.7.5)-(3.7.6)

hold for all i ∈ M_x and j ∈ N_y, where M_x (N_y) is the spectrum of the mixed strategy x (y).
The contradiction proves the validity of (3.7.5). Equalities (3.7.6) can be proved in
the same way.
This theorem provides a means of finding equilibrium strategies of the players in the
game Γ(A, B). Indeed, suppose we are looking for an equilibrium (x, y) with the
strategy spectra M_x, N_y given. The optimal strategies must then satisfy the
system of linear equations

a_i y = v₁,  x b_j = v₂,   (3.7.9)

where i ∈ M_x, j ∈ N_y, and v₁, v₂ are some numbers. If, however, the equilibrium (x, y) is
completely mixed, then the system (3.7.9) becomes

Ay = v₁ u,  xB = v₂ u,   (3.7.10)

whence

x = v₂ u B⁻¹,   (3.7.11)

y = v₁ A⁻¹ u,   (3.7.12)

where

v₁ = 1/(u A⁻¹ u),  v₂ = 1/(u B⁻¹ u).   (3.7.13)

Conversely, if x > 0, y > 0 hold for the vectors x, y defined by (3.7.11)-(3.7.13),
then the pair (x, y) forms an equilibrium in mixed strategies in the game
Γ(A, B) with the equilibrium payoff vector (v₁, v₂).
Proof. If (x, y) is a completely mixed equilibrium, then x and y necessarily satisfy
system (3.7.10). Multiplying the first of the equations (3.7.10) by A⁻¹, and the second
by B⁻¹, we obtain (3.7.11), (3.7.12). On the other hand, since xu = 1 and yu = 1,
we find the values of v₁ and v₂. The uniqueness of the completely mixed situation (x, y)
follows from the uniqueness of the solution of system (3.7.10) under the conditions of the theorem.
We shall now show that the converse is also true. By the construction of the vectors
x, y in terms of (3.7.11)–(3.7.13), we have xu = yu = 1. From this, and from the
conditions x > 0, y > 0, it follows that (x, y) is a situation in mixed strategies in the
game Γ.
By Theorem 3.7.1, for the situation (x, y) to be an equilibrium in mixed strategies
in the game Γ(A, B), it suffices to satisfy the conditions Ay = (xAy)u and xB = (xBy)u. Indeed,

Ay = (A A⁻¹ u)/(u A⁻¹ u) = u/(u A⁻¹ u) = (xAy) u,

xB = (u B⁻¹ B)/(u B⁻¹ u) = u/(u B⁻¹ u) = (xBy) u,

which proves the statement.
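The computation (3.7.11)–(3.7.13) can be sketched numerically. A common normalisation of the "battle of the sexes" payoffs is assumed below (it is not spelled out at this point in the text); exact rational arithmetic keeps the check honest.

```python
from fractions import Fraction as F

def inv2(M):
    # inverse of a nonsingular 2x2 matrix
    (a, b), (c, d) = M
    det = a * d - b * c
    assert det != 0
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_vec(M, v):   # M v
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

def vec_mat(v, M):   # v M
    return [v[0] * M[0][0] + v[1] * M[1][0], v[0] * M[0][1] + v[1] * M[1][1]]

# assumed "battle of the sexes" payoffs
A = [[F(4), F(0)], [F(0), F(1)]]
B = [[F(1), F(0)], [F(0), F(4)]]
u = [F(1), F(1)]

Ainv, Binv = inv2(A), inv2(B)
v1 = 1 / sum(mat_vec(Ainv, u))          # v1 = 1/(u A^-1 u)
v2 = 1 / sum(vec_mat(u, Binv))          # v2 = 1/(u B^-1 u)
y = [v1 * t for t in mat_vec(Ainv, u)]  # y = v1 A^-1 u
x = [v2 * t for t in vec_mat(u, Binv)]  # x = v2 u B^-1

# completely mixed equilibrium: Ay and xB are constant vectors
assert mat_vec(A, y) == [v1, v1] and vec_mat(x, B) == [v2, v2]
print(x, y)   # x = (4/5, 1/5), y = (1/5, 4/5)
```

Both x > 0 and y > 0, so by the theorem the pair (x, y) is the completely mixed equilibrium, with payoffs v₁ = v₂ = 4/5.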
We shall now demonstrate an application of the theorem with the example of a
"battle of the sexes" game as in 3.1.4. Consider a mixed extension of the game. The
set of points representing the payoff vectors in mixed strategies can be represented
graphically (Fig. 3.2, Exercise 6). It can be easily seen that the game satisfies the
Figure 3.2
true for the mixed extension Γ̄. For this reason, the theorem of competition for
leadership (see 3.2.2) holds for the two-person game:

Z(Γ̄) = Z̄₁ ∪ Z̄₂,

where Z(Γ̄) is the set of Nash equilibria, and Z̄₁ and Z̄₂ are the sets of best responses
of Players 1 and 2, respectively, in the game Γ̄.
Things become more complicated where Nash equilibria and Pareto optimal
situations are concerned. The examples given in Sec. 3.2 show that a situation may
be a Nash equilibrium but not Pareto optimal, and vice versa. However, the same
situation can also be optimal in both senses (see 3.2.4).
Example 12 in 3.3.3 shows that an additional equilibrium arising in the mixed
extension of the game Γ is not Pareto optimal in the mixed extension of Γ. This
appears to be a fairly common property of bimatrix games.
Theorem. Let Γ(A, B) be a bimatrix (m × n) game. Then the following assertion
is true for almost all (m × n) games (except for no more than a countable set of games).
Nash equilibrium situations in mixed strategies, which are not equilibrium in the
original game, are not Pareto optimal in the mixed extension.
For the proof of this theorem, see Moulin (1981).
3.7.5. To conclude this section, we examine an example of the solution of
bimatrix games with a small number of strategies, which is instructive in
many respects.
Example 13. (Bimatrix (2 × 2) games.) [Moulin (1981)]. Consider the game
Γ(A, B), in which each player has two pure strategies. Let

               τ₁            τ₂
(A, B) =  δ₁  (α₁₁, β₁₁)  (α₁₂, β₁₂)
          δ₂  (α₂₁, β₂₁)  (α₂₂, β₂₂)

Here δ₁, δ₂ and τ₁, τ₂ denote the pure strategies of Players 1 and 2, respectively.
For simplicity, assume that the numbers α₁₁, α₁₂, α₂₁, α₂₂, β₁₁, β₁₂, β₂₁, β₂₂ are all
different.
Case 1. In the original game Γ, at least one player, say Player 1, has a strictly
dominant strategy, say δ₁ (see Sec. 1.8). Then the game Γ and its mixed extension Γ̄
have a unique Nash equilibrium. In fact, the inequalities α₁₁ > α₂₁, α₁₂ > α₂₂ cause the
pure strategy δ₁ in the game Γ̄ to dominate strictly all the other mixed strategies of
Player 1. Therefore, the equilibrium is the pair (δ₁, τ₁) if β₁₁ > β₁₂, or
the pair (δ₁, τ₂) if β₁₁ < β₁₂.
Case 2. The game Γ does not have a Nash equilibrium in pure strategies. Here
two mutually exclusive cases a) and b) are possible:
where det A ≠ 0, det B ≠ 0, and hence the conditions of Theorem 3.7.3 are satisfied.
The game, therefore, has the equilibrium (x*, y*), where

v₁ = (α₁₁α₂₂ − α₁₂α₂₁)/(α₁₁ + α₂₂ − α₁₂ − α₂₁),  v₂ = (β₁₁β₂₂ − β₁₂β₂₁)/(β₁₁ + β₂₂ − β₁₂ − β₂₁).
Case 3. The game Γ has two Nash equilibria in pure strategies. This occurs when one of the following
conditions is satisfied:
b) α₁₁ < α₂₁, α₂₂ < α₁₂, β₁₁ < β₁₂, β₂₂ < β₂₁.
In case a), the situations (δ₁, τ₁), (δ₂, τ₂) are equilibria, whereas in case
b), the situations (δ₁, τ₂), (δ₂, τ₁) form equilibria. The mixed extension, however,
has one more completely mixed equilibrium (x*, y*) determined by (3.7.14), (3.7.15).
The above cases provide an exhaustive examination of the (2 × 2) game with
matrices having pairwise different elements.
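Case 2 can be worked through numerically. The payoff entries below are an illustrative choice (not from the book); x* is computed from Player 2's indifference conditions (hence from B), y* from Player 1's (hence from A), consistent with the expressions for v₁ and v₂ above.

```python
from fractions import Fraction as F

a = [[F(2), F(0)], [F(0), F(1)]]   # alpha_ij, Player 1 (illustrative values)
b = [[F(1), F(2)], [F(3), F(0)]]   # beta_ij,  Player 2

da = a[0][0] + a[1][1] - a[0][1] - a[1][0]
db = b[0][0] + b[1][1] - b[0][1] - b[1][0]

# completely mixed equilibrium strategies
x = [(b[1][1] - b[1][0]) / db, (b[0][0] - b[0][1]) / db]
y = [(a[1][1] - a[0][1]) / da, (a[0][0] - a[1][0]) / da]

# equilibrium payoffs v1 = det A / da, v2 = det B / db
v1 = (a[0][0] * a[1][1] - a[0][1] * a[1][0]) / da
v2 = (b[0][0] * b[1][1] - b[0][1] * b[1][0]) / db

# each player is indifferent between his two pure strategies
assert [sum(a[i][j] * y[j] for j in range(2)) for i in range(2)] == [v1, v1]
assert [sum(x[i] * b[i][j] for i in range(2)) for j in range(2)] == [v2, v2]
```

For these payoffs one gets x* = (3/4, 1/4), y* = (1/3, 2/3) with v₁ = 2/3, v₂ = 3/2.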
For any mixed strategy p, define C(p) to be the carrier (spectrum) of p, and B(p) the set of pure
best replies against p in the game with matrix A.
3.8.2. Consider now the example of the Hawk–Dove game of Maynard Smith and Price
(1973), which leads us to the notion of an evolutionary stable strategy (ESS).
Example 14. The Hawk–Dove game is a 2 × 2 symmetric bimatrix game with the
following matrices:

         H            D                          H            D
A =  H  (V − C)/2     V         B = Aᵀ =  H  (V − C)/2       0        (3.8.1)
     D      0        V/2                  D      V          V/2

x = (ξ, 1 − ξ),  y = (ξ, 1 − ξ),

where ξ = V/C.
There are also two asymmetric equilibria, (H, D) and (D, H).
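The mixed strategy x = (V/C, 1 − V/C) can be checked directly against both the equilibrium and the stability conditions; V = 2, C = 4 is an arbitrary choice with V < C.

```python
from fractions import Fraction as F

V, C = F(2), F(4)                       # assumed values with V < C
A = [[(V - C) / 2, V], [F(0), V / 2]]   # fitness matrix (3.8.1), order (H, D)
p = [V / C, 1 - V / C]                  # candidate ESS

def payoff(q, r):   # bilinear form q A r
    return sum(q[i] * A[i][j] * r[j] for i in range(2) for j in range(2))

for k in range(11):
    q = [F(k, 10), 1 - F(k, 10)]
    # (p, p) is an equilibrium: every reply earns the same against p
    assert payoff(q, p) == payoff(p, p)
    # stability: any mutant q does worse against itself than p does against q
    if q != p:
        assert payoff(q, q) < payoff(p, q)
```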
3.8.3. Assume now that a monomorphic population is playing the mixed strategy
p in the game with fitness matrix A, and suppose that a mutant playing q arises.
Then we may suppose that the population will be in a perturbed state in which a
small fraction ε of the individuals is playing q. The population will return to its
original state if the mutant is selected against, i.e. if the fitness of a q-individual is
smaller than that of an individual playing p. Suppose that (p, p) is a symmetric Nash
equilibrium in a symmetric bimatrix game, and suppose that the second player in the
game, instead of playing the strategy p, decides to play a mixture of the two mixed
strategies p and q with probabilities 1 − ε and ε, where ε is small enough. Then in
general, for the new mixed strategy y = (1 − ε)p + εq, the set of Player 1's best replies
3.8. Symmetric bimatrix games and evolutionary stable strategies
against y = (1 − ε)p + εq will not necessarily contain the original strategy p. It may
also happen that q is a better reply against y than p. But if for any q there exists an
ε̄ > 0 such that p is a better reply against y = (1 − ε)p + εq than q for all 0 < ε < ε̄,
then the use of p by Player 1 is in some sense stable against small perturbations of
the opponent's strategy, i.e. for any q there exists ε̄ > 0 such that

pA((1 − ε)p + εq) > qA((1 − ε)p + εq),  0 < ε < ε̄.   (3.8.2)
If (p, p) is a strict equilibrium (pAp > qAp for all q ≠ p), then (3.8.2) always holds.
There is also an evolutionary interpretation of (3.8.2), based on the example of the
Hawk–Dove game.
If (3.8.2) holds, we have

(1 − ε)(pAp − qAp) + ε(pAq − qAq) > 0.   (3.8.4)

From (3.8.4) we see that qAp > pAp is impossible, because in this case (3.8.2) would not
hold for small ε > 0. Then from (3.8.4) it follows that

qAp ≤ pAp,   (3.8.5)

or

if qAp = pAp, then qAq < pAq.   (3.8.6)

From (3.8.5), (3.8.6) the condition (3.8.4) trivially follows for sufficiently small ε > 0
(this ε depends upon q).
3.8.4. Definition. A mixed strategy p is an ESS if (p, p) is a Nash equilibrium,
and the following stability condition is satisfied:

if qAp = pAp and q ≠ p, then qAq < pAq.   (3.8.3)
p → {q ∈ Y | C(q) ⊂ B(p)}.

This correspondence satisfies the conditions of the Kakutani fixed point theorem, and
hence there exists a point p* for which

p* ∈ {q ∈ Y | C(q) ⊂ B(p*)},

and thus

C(p*) ⊂ B(p*).   (3.8.7)

From (3.8.7) it follows that

p*Ap* ≥ qAp*

for all q ∈ Y, and (p*, p*) is a symmetric Nash equilibrium. We have thus proved the following theorem.
Theorem. [Nash (1951)]. Every symmetric bimatrix game has a symmetric
Nash equilibrium.
156 Chapter 3. Nonzero-sum games
3.8.6. We have already seen that if (p, p) is a strict Nash equilibrium, then p is
an ESS (this also follows directly from the definition of the ESS, since in this case
there is no q ∈ Y, q ≠ p, with qAp = pAp).
Not every bimatrix game possesses an ESS. For example, if all entries of A are
equal, aᵢⱼ = a for all i, j, then it is impossible to satisfy (3.8.3).
3.8.7. Theorem. If A is a 2 × 2 matrix with α₁₁ ≠ α₂₁ and α₁₂ ≠ α₂₂, then A
has an ESS. If α₁₁ > α₂₁ and α₂₂ > α₁₂, then A has two strict equilibria (1,1), (2,2),
and they are ESS. If α₁₁ < α₂₁ and α₂₂ < α₁₂, then A has a unique symmetric equilibrium
(p, p), which is completely mixed (C(p) = B(p) = {1, 2}).
Proof. For q ≠ p, we have qAq − pAq = (q − p)A(q − p). If q = (η₁, η₂), p = (ξ₁, ξ₂),
then

(q − p)A(q − p) = (η₁ − ξ₁)²(α₁₁ − α₂₁ + α₂₂ − α₁₂) < 0.

Hence (3.8.3) is satisfied, and p is an ESS.
3.8.8. Consider the game where the matrix A has the form

    b a a a a
    a b a a a
A = a a b a a    (3.8.8)
    a a a b a
    a a a a b

If 0 < b < a, this game does not have an ESS. The game has a unique symmetric
equilibrium p = (1/5, 1/5, 1/5, 1/5, 1/5), pAp = b/5, and every strategy is a best
reply against p; for any i, eᵢAeᵢ = aᵢᵢ = b > b/5 = pAp = pAeᵢ (where eᵢ =
(0, …, 0, 1, 0, …, 0)), so that the condition (3.8.3) for an ESS is violated.
Thus for games with more than two pure strategies the theorem does not hold.
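The failure for more than two pure strategies can also be seen on a standard 3-strategy example (Rock–Paper–Scissors, not the matrix (3.8.8) itself): its unique symmetric equilibrium is not an ESS.

```python
from fractions import Fraction as F

A = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]   # Rock-Paper-Scissors payoffs
p = [F(1, 3)] * 3                          # unique symmetric equilibrium

def payoff(q, r):
    return sum(q[i] * A[i][j] * r[j] for i in range(3) for j in range(3))

e1 = [F(1), F(0), F(0)]
assert payoff(e1, p) == payoff(p, p)        # e1 is a best reply to p ...
assert not payoff(e1, e1) < payoff(p, e1)   # ... but the stability condition fails
```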
3.8.9. It is interesting that the number of ESS in a game is always finite
(although it may be equal to zero).
If (p, p) and (q, q) are Nash equilibria of A with q ≠ p and C(q) ⊂ B(p), then p
cannot be an ESS, since q is a best reply against both p and q.
Theorem. If p is an ESS of A and (q, q) is a symmetric Nash equilibrium of A
with C(q) ⊂ B(p), then p = q.
3.8.10. Let (pⁿ, pⁿ) be a sequence of symmetric Nash equilibria of A such
that lim_{n→∞} pⁿ = p. Then from the definition of the limit we get that there exists
an N such that for all n > N
From the previous theorem we have that pⁿ = p for n > N. It follows that
every ESS is isolated within the set of symmetric equilibrium strategies. From the
compactness of the set of situations in mixed strategies we conclude that if there were
infinitely many ESS, there would be a cluster point; but the previous discussion
shows that this is impossible. Thus the following theorem holds:
Theorem. [Haigh (1975)]. The number of ESS is finite (but possibly zero).
3.9. Equilibrium in joint mixed strategies
Figure 3.3
If the game is repeated, then it may be wise for the players to make their choice
jointly, i.e. to choose with probability 1/2 the situation (α₁, β₁) or (α₂, β₂). Then
the expected payoff to the players is, on the average, (5/2, 5/2). This point, however,
does not lie in the set of payoff vectors corresponding to possible situations in the
noncooperative game (Fig. 3.2), i.e. it cannot be realized if the players choose mixed
strategies independently.
A joint mixed strategy of the players is a probability distribution over the set of all
possible pairs (i, j) (situations in pure strategies), which is not necessarily generated
by the players' independent random choices of pure strategies. Such strategies can
be realized by a mediator before the game starts.
Denote by M = {μᵢⱼ} a joint mixed strategy in the game Γ(A, B). If this strategy is played
by Players 1 and 2, their expected payoffs K₁(M), K₂(M) respectively are

K₁(M) = Σᵢ Σⱼ αᵢⱼ μᵢⱼ,  K₂(M) = Σᵢ Σⱼ βᵢⱼ μᵢⱼ.

The conditional distributions generated by M are

μᵢ(j) = μᵢⱼ / Σⱼ μᵢⱼ if Σⱼ μᵢⱼ ≠ 0, and μᵢ(j) = 0 if Σⱼ μᵢⱼ = 0, j = 1, …, n;
νⱼ(i) = μᵢⱼ / Σᵢ μᵢⱼ if Σᵢ μᵢⱼ ≠ 0, and νⱼ(i) = 0 if Σᵢ μᵢⱼ = 0, i = 1, …, m.
(3.9.1), where the left-hand sides of the inequalities coincide with the expected payoff
to Player 1 (2) provided he agrees on the realization i (j).
Suppose the strategy i of Player 1 is such that μᵢⱼ = 0 for all j = 1, 2, …, n.
Then the first of the inequalities (3.9.1) is satisfied. Similarly, if μᵢⱼ = 0 for
all i = 1, …, m, then the second inequality in (3.9.1) is satisfied. We substitute the
expressions for μᵢ(j) and νⱼ(i) in terms of μᵢⱼ into (3.9.1). Then it follows that the
necessary and sufficient condition for the situation M* = {μ*ᵢⱼ} to be an equilibrium is
that the inequalities

Σⱼ₌₁ⁿ αᵢⱼ μᵢⱼ ≥ Σⱼ₌₁ⁿ αₖⱼ μᵢⱼ,  i, k = 1, …, m,
Σᵢ₌₁ᵐ βᵢⱼ μᵢⱼ ≥ Σᵢ₌₁ᵐ βᵢₗ μᵢⱼ,  j, l = 1, …, n,   (3.9.2)
Σᵢ₌₁ᵐ Σⱼ₌₁ⁿ μᵢⱼ = 1,  μᵢⱼ ≥ 0
2. If (x, y) is a situation in mixed strategies in the game Γ(A, B), then the joint
mixed strategy situation M = {μᵢⱼ} generated by the situation (x, y) is an equilibrium
if and only if (x, y) is a Nash equilibrium in mixed strategies in the game
Γ(A, B).
for all i and j from the spectra of the optimal strategies. Therefore, inequalities (3.9.4)
are satisfied and M ∈ ZC(Γ).
Conversely, if (3.9.3) is satisfied, then, summing the inequalities (3.9.3) over i and
j, respectively, and applying Theorem 3.5.1, we obtain that the situation (x, y) is a
Nash equilibrium.
The convexity and compactness of the set ZC(T) follow from the fact that ZC(T)
is the set of solutions to the system of linear inequalities (3.9.2) which is bounded,
whereas its nonemptiness follows from the existence of the Nash equilibrium in mixed
strategies (see 3.4.1). This completes the proof of the theorem.
"1/2 0
Note that the joint mixed strategy M* is equilibrium in the
0 1/2
game "battle of the sexes" (see Example 1, 3.1.4), which may be established by mere
verification of inequalities (3.9.2).
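A sketch of that verification. The inequalities checked below say that a player who is recommended realization i (j) cannot gain by switching to another row (column); the "battle of the sexes" payoffs are assumed as before.

```python
from fractions import Fraction as F

A = [[F(4), F(0)], [F(0), F(1)]]   # assumed payoffs
B = [[F(1), F(0)], [F(0), F(4)]]

def is_equilibrium(mu):
    m, n = len(A), len(A[0])
    for i in range(m):              # Player 1: recommended row i, deviation k
        for k in range(m):
            if (sum(mu[i][j] * A[i][j] for j in range(n))
                    < sum(mu[i][j] * A[k][j] for j in range(n))):
                return False
    for j in range(n):              # Player 2: recommended column j, deviation l
        for l in range(n):
            if (sum(mu[i][j] * B[i][j] for i in range(m))
                    < sum(mu[i][j] * B[i][l] for i in range(m))):
                return False
    return True

M_star = [[F(1, 2), F(0)], [F(0), F(1, 2)]]
print(is_equilibrium(M_star))                              # True
print(is_equilibrium([[F(0), F(1, 2)], [F(1, 2), F(0)]]))  # False
```

The second call shows that the "mismatched" joint strategy, which puts mass on the off-diagonal situations, is not an equilibrium.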
Figure 3.4
R. However, this does not mean that they can agree on any outcome of the game.
Thus, the point (4, 1) is preferable to Player 1, whereas the point (1, 4) is preferable to
Player 2. Neither of the two players will agree to a result of the negotiations in which his
payoff is less than his maximin value, since he can receive this payoff independently
of his partner. The maximin mixed strategies of the players in this game are respectively
x = (1/5, 4/5) and y = (4/5, 1/5), while the payoff vector in maximin strategies
3.10. The bargaining problem
1. (v̄₁, v̄₂) ≥ (v₁⁰, v₂⁰).
2. (v̄₁, v̄₂) ∈ S.
φ(S, v₁⁰, v₂⁰) = (v̄₁, v̄₂).
The function φ, which maps the bargaining game (S, v₁⁰, v₂⁰) into the payoff vector
(v̄₁, v̄₂) and satisfies conditions 1–6, is called a Nash bargaining scheme
[Owen (1968)], conditions 1–6 are called the Nash axioms, and the vector (v̄₁, v̄₂) is called
a bargaining solution vector. Thus, the bargaining scheme is a realizable optimality
principle in the bargaining game.
Before proving the theorem, we discuss its conditions using the game
"battle of the sexes" as an example (see Fig. 3.4). Axioms 1 and 2 imply that the
payoff vector (v̄₁, v̄₂) is contained in the set bounded by the points a, b, c, d, e.
Axiom 3 implies that (v̄₁, v̄₂) is Pareto optimal. Axiom 4 shows that the function φ
If v′₁ < v″₁, then v′₂ > v″₂. Since the set S₁ is convex, (v̄₁, v̄₂) ∈ S₁, where
v̄₁ = (v′₁ + v″₁)/2, v̄₂ = (v′₂ + v″₂)/2. We have

θ(v̄₁, v̄₂) > θ(v′₁, v′₂).
Proof. Suppose there exists a point (v₁, v₂) ∈ S such that θ(v₁, v₂) > θ(v̄₁, v̄₂).
From the convexity of S we have (v′₁, v′₂) ∈ S, where v′₁ = v̄₁ + t(v₁ − v̄₁) and
v′₂ = v̄₂ + t(v₂ − v̄₂), 0 < t < 1. By linearity, δ(v₁ − v̄₁, v₂ − v̄₂) > 0. We have
For a sufficiently small t > 0 we obtain the inequality θ(v′₁, v′₂) > θ(v̄₁, v̄₂), but this
contradicts the maximality of θ(v̄₁, v̄₂).
3.10.5. We shall now prove Theorem 3.10.2. To do this, we shall show that the point
(v̄₁, v̄₂) which maximizes θ(v₁, v₂) is a solution of the bargaining problem.
Proof. Suppose the conditions of Lemma 3.10.3 are satisfied. Then the point
(v̄₁, v̄₂) maximizing θ(v₁, v₂) is defined. It is easy to verify that (v̄₁, v̄₂) satisfies
conditions 1–4 of Theorem 3.10.2. This point also satisfies condition 5 of this theorem,
because if v′₁ = a₁v₁ + b₁ and v′₂ = a₂v₂ + b₂, then
and if (v̄₁, v̄₂) maximizes θ(v₁, v₂), then (v̄′₁, v̄′₂) maximizes θ′(v′₁, v′₂). Now suppose the
set S is symmetric in the sense of condition 6 and v₁⁰ = v₂⁰. Then (v₂, v₁) ∈ S and
θ(v₁, v₂) = θ(v₂, v₁). Since (v̄₁, v̄₂) is the unique point which maximizes θ(v₁, v₂) over
S₁, we get (v̄₁, v̄₂) = (v̄₂, v̄₁), i.e. v̄₁ = v̄₂.
Thus, the point (v̄₁, v̄₂) satisfies conditions 1–6. We now show that it is the unique solution
to the bargaining problem. Consider the set

T = {(v′₁, v′₂) | v′₁ + v′₂ ≤ 2}

and v₁⁰ = v₂⁰ = 0. Since T is symmetric, it follows from property 6 that a solution
(if any) must lie on the straight line v′₁ = v′₂, and, by condition 3, it must coincide with
the point (1, 1), i.e. (1, 1) = φ(T, 0, 0). Reversing the transform (3.10.3) and using
property 5, we obtain (v̄₁, v̄₂) = φ(R, v₁⁰, v₂⁰). Since (v̄₁, v̄₂) ∈ S and S ⊂ R, then by
property 4, the pair (v̄₁, v̄₂) is a solution of (S, v₁⁰, v₂⁰).
Now suppose that the conditions of Lemma 3.10.3 are not satisfied, i.e. there are
no points (v₁, v₂) ∈ S for which v₁ > v₁⁰ and v₂ > v₂⁰. Then the following cases are
possible:
a) There are points at which v₁ > v₁⁰ and v₂ = v₂⁰. Then (v̄₁, v̄₂) is taken to be
the point in S which maximizes v₁ under the constraint v₂ = v₂⁰.
b) There are points at which v₁ = v₁⁰ and v₂ > v₂⁰. In this case, (v̄₁, v̄₂) is taken
to be the point in S which maximizes v₂ under the constraint v₁ = v₁⁰.
c) The bargaining set S degenerates into the point (v₁⁰, v₂⁰) of maximin payoffs
(e.g. the case of matrix games). Set v̄₁ = v₁⁰, v̄₂ = v₂⁰.
It can be immediately verified that these solutions satisfy properties 1–6, and
properties 1–3 imply uniqueness. This completes the proof of the theorem.
In the game "battle of the sexes", the Nash scheme yields the bargaining payoff
(v̄₁, v̄₂) = (5/2, 5/2) (see Fig. 3.4).
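The solution point can be recovered by maximizing the product θ(v₁, v₂) = (v₁ − v₁⁰)(v₂ − v₂⁰) over a grid on the Pareto frontier; the frontier segment v₂ = 5 − v₁, v₁ ∈ [1, 4], and the maximin values v₁⁰ = v₂⁰ = 4/5 are assumed from the example above.

```python
v0 = 4 / 5   # assumed maximin payoff of each player
# grid on v1 over [1, 4]; pick the v1 maximizing the Nash product
best = max((k / 1000 for k in range(1000, 4001)),
           key=lambda v1: (v1 - v0) * ((5 - v1) - v0))
print(best)   # 2.5, i.e. (v1, v2) = (5/2, 5/2)
```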
3.10.6. In this section we survey the axiomatic theory of bargaining for n players.
Although alternatives to the Nash solution were proposed soon after the publication of
Nash's paper, it is fair to say that until the mid-1970s the Nash solution was often seen by
economists and game theorists as the main, if not the only, solution to the bargaining
problem. Since all existing solutions are invariant under translations of the
origin, and since our own formulation will also assume this invariance, it is convenient
to take as admissible only problems that have already been subjected to a translation
bringing their disagreement point to the origin. Consequently, v⁰ = (v₁⁰, …, vₙ⁰) =
(0, …, 0) ∈ Rⁿ always, and a typical problem is simply denoted S instead of (S, 0).
Finally, all problems are taken to be subsets of Rⁿ₊ (instead of Rⁿ). This means that
all alternatives that would give any player less than what he gets at the disagreement
point v⁰ = 0 are disregarded.
Definition. The Nash solution N is defined by setting, for all convex, compact,
comprehensive subsets S ⊂ Rⁿ₊ containing at least one vector with all positive coordinates
(denote S ∈ Σⁿ), N(S) equal to the maximizer over v ∈ S of the "Nash product"
∏ᵢ₌₁ⁿ vᵢ.
Nash's theorem is based on the following axioms:
1. Pareto optimality. For all S ∈ Σⁿ, for all v ∈ Rⁿ, if v ≥ φ(S) and v ≠ φ(S),
then v ∉ S [denote φ(S) ∈ PO(S)].
A slightly weaker condition is:
2. Weak Pareto optimality. For all S ∈ Σⁿ, for all v ∈ Rⁿ, if v > φ(S), then
v ∉ S.
Let Πⁿ : {1, …, n} → {1, …, n} be the class of permutations of order n. Given
π ∈ Πⁿ and v ∈ Rⁿ, let π(v) = (v_{π(1)}, …, v_{π(n)}). Also, given S ⊂ Rⁿ, let π(S) =
{v′ ∈ Rⁿ | ∃v ∈ S with v′ = π(v)}.
3. Symmetry. For all S ∈ Σⁿ, if π(S) = S for all π ∈ Πⁿ, then φᵢ(S) = φⱼ(S)
for all i, j (note that π(S) ∈ Σⁿ).
Let Lⁿ : Rⁿ → Rⁿ be the class of positive, independent person-by-person,
linear transformations of order n. Each l ∈ Lⁿ is characterized by n positive numbers
aᵢ such that, given v ∈ Rⁿ, l(v) = (a₁v₁, …, aₙvₙ). Now, given S ⊂ Rⁿ, let l(S) =
{v′ ∈ Rⁿ | ∃v ∈ S with v′ = l(v)}.
4. Scale invariance. For all S ∈ Σⁿ, for all l ∈ Lⁿ, φ(l(S)) = l(φ(S)) [note that
l(S) ∈ Σⁿ].
5. Independence of irrelevant alternatives. For all S, S′ ∈ Σⁿ, if S′ ⊂ S and
φ(S) ∈ S′, then φ(S′) = φ(S).
In the previous section we proved the Nash theorem for n = 2, i.e. that only one solution
satisfies these axioms. This result extends directly to arbitrary n.
Theorem. A solution φ(S), S ∈ Σⁿ, satisfies 1, 3, 4, 5 if and only if it is
the Nash solution.
This theorem constitutes the foundation of the axiomatic theory of bargaining.
It shows that a unique point can be identified for each problem, representing an
equitable compromise.
In the mid-1970s, Nash's result became the object of a considerable amount of
renewed attention, and the role played by each axiom in the characterization was
scrutinized by several authors.
6. Strong individual rationality. For all S ∈ Σⁿ, φ(S) > 0.
3.10. The bargaining problem 165
Theorem [Roth (1977)]. A solution φ(S), S ∈ Σⁿ, satisfies 3, 4, 5, 6 if and
only if it is the Nash solution.
If 3 is dropped from the list of axioms in Theorem 3.10.6, a somewhat wider but
still small family of additional solutions becomes admissible.
3.10.7. Definition. Given α = (α₁, …, αₙ), αᵢ > 0, i = 1, …, n, Σᵢ₌₁ⁿ αᵢ = 1,
the asymmetric Nash solution with weights α, N^α, is defined by setting, for all S ∈ Σⁿ,

N^α(S) = arg max_{v∈S} ∏ᵢ₌₁ⁿ vᵢ^{αᵢ}.
These solutions were introduced by Harsanyi and Selten (1972).
Theorem. A solution φ(S), S ∈ Σⁿ, satisfies 4, 5, 6 if and only if it is an
asymmetric Nash solution.
If 6 is not used, a few other solutions become available.
3.10.8. Definition. Given i ∈ {1, …, n}, the i-th Dictatorial solution Dⁱ
is defined by setting, for all S ∈ Σⁿ, Dⁱ(S) equal to the maximal point of S in the
direction of the i-th unit vector.
Note that all Dⁱ satisfy 4, 5, and 2 (but not 1). To recover full optimality,
one may proceed as follows. First, select an ordering π of the n players. Then, given
S ∈ Σⁿ, pick D^{π(1)}(S) if this point belongs to the Pareto optimal subset of S; otherwise,
among the points whose π(1)-th coordinate is equal to the π(1)-th coordinate of D^{π(1)}(S), find the maximal
point in the direction of the unit vector pertaining to player π(2). Pick this point if it
belongs to the Pareto optimal subset of S; otherwise, repeat the operation with π(3), …
This algorithm is summarized in the following definition.
3.10.9. Definition. Given an ordering π of {1, …, n}, the lexicographic
Dictatorial solution relative to π, D^π, is defined by setting, for all S ∈ Σⁿ, D^π(S) to
be the lexicographic maximizer over v ∈ S of v_{π(1)}, v_{π(2)}, …, v_{π(n)}.
All of these solutions satisfy 1, 4, 5, and there are no others if n = 2.
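A sketch of D^π for a finite set of alternatives (the text's S is a convex compact body; a finite set stands in here as an assumption). The permutation π is given as a tuple of coordinate indices.

```python
def lex_dictatorial(S, pi):
    # maximize v_pi(1); break ties by v_pi(2), then v_pi(3), ...
    return max(S, key=lambda v: tuple(v[i] for i in pi))

S = [(3, 1, 0), (3, 2, 0), (2, 5, 5), (3, 2, 4)]
print(lex_dictatorial(S, (0, 1, 2)))   # (3, 2, 4)
print(lex_dictatorial(S, (1, 0, 2)))   # (2, 5, 5)
```

Tuple comparison in Python is itself lexicographic, which is exactly the order the definition requires.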
3.10.10. The Kalai–Smorodinsky solution. A new impetus was given to the
axiomatic theory of bargaining when Kalai and Smorodinsky (1975) provided a
characterization of the following solution (see Fig. 3.5).
Figure 3.5
Figure 3.6
The most striking feature of this solution is that it satisfies the following
monotonicity condition, which is very strong, since no restrictions are imposed in its
hypotheses on the sort of expansions that take S into S′. In fact, this axiom can serve
to provide an easy characterization of the solution.
8. Strong monotonicity. For all S, S′ ∈ Σⁿ, if S ⊂ S′, then φ(S) ≤ φ(S′).
The following characterization result is a variant of a theorem due to Kalai (1977).
Theorem. A solution φ(S), S ∈ Σⁿ, satisfies 2, 3, 8 if and only if it is the
Egalitarian solution.
3.10.12. The Utilitarian solution. We close this review with a short discussion
of the Utilitarian solution.
Definition. A Utilitarian solution U is defined by choosing, for each S ∈ Σⁿ,
U(S) among the maximizers of Σᵢ₌₁ⁿ vᵢ over v ∈ S (see Fig. 3.7).
Figure 3.7
3.11. Games in characteristic function form
or act individually.
3.11.1. Let N = {1, …, n} be the set of all players. Any nonempty subset S ⊂ N
is called a coalition.
Definition. A real-valued function v defined on the coalitions S ⊂ N is called a
characteristic function of the n-person game. Here the inequality

v(S ∪ T) ≥ v(S) + v(T),  S, T ⊂ N,  S ∩ T = ∅,   (3.11.1)

is assumed to hold. This, in particular, implies that there is no decomposition of the set N into coalitions
such that the guaranteed total payoff to these coalitions exceeds the maximum payoff
v(N) to all players.
3.11.2. We shall now consider a noncooperative game Γ = (N, {Xᵢ}ᵢ∈N, {Hᵢ}ᵢ∈N).
Suppose the players appearing in a coalition S ⊂ N unite their efforts for the
purpose of increasing their total payoff. Let us find the largest payoff they can
guarantee themselves. The joint actions of the players from the coalition S mean
that this coalition S, acting for all its members as one player (call him Player 1), takes
as its set of pure strategies the set of all possible combinations of strategies of
its constituent players, i.e. the elements of the Cartesian product

X_S = ∏ᵢ∈S Xᵢ.

The community of interests of the players from S means that the payoff to the coalition
S (Player 1) is the sum of the payoffs to the players from S, i.e.

H_S(x) = Σᵢ∈S Hᵢ(x),
of the game Γ_S, the guaranteed payoff v(S) to Player 1 can only be increased
in comparison with that in the game Γ_S. For this reason, the following discussion
concentrates on the mixed extension of Γ_S. In particular, it should be noted that,
according to this interpretation, v(S) coincides with the value of the game Γ_S (if any),
while v(N) is the maximum total payoff to the players. Evidently, v(S) depends only
on the coalition S (and on the original noncooperative game itself, which remains
unaffected in our reasoning) and is a function of S. We shall verify that this function
is a characteristic function of the noncooperative game. To do this, it suffices to show
that condition (3.11.1) is satisfied.
Note that v(∅) = 0 for every noncooperative game constructed above.
Lemma (on superadditivity). For the noncooperative game
Γ = (N, {Xᵢ}ᵢ∈N, {Hᵢ}ᵢ∈N), the function v(S) constructed above is superadditive.
Proof. We have

v(S ∪ T) = sup_{μ_{S∪T}} inf_{ν_{N∖(S∪T)}} Σᵢ∈S∪T Kᵢ(μ_{S∪T}, ν_{N∖(S∪T)}),

where μ_{S∪T} is a mixed strategy of the coalition S ∪ T, i.e. an arbitrary probability
measure on X_{S∪T}, ν_{N∖(S∪T)} is a probability measure on X_{N∖(S∪T)}, and Kᵢ is the payoff to
player i in mixed strategies. If we restrict ourselves to those probability measures on
X_{S∪T} which are the products of independent distributions μ_S and μ_T over the
Cartesian product X_S × X_T, then the range of the variable over which the maximization
is taken shrinks, and the supremum can only decrease. Thus we have
Hence

v(S ∪ T) ≥ inf_{ν_{N∖(S∪T)}} Σᵢ∈S∪T Kᵢ(μ_S × μ_T, ν_{N∖(S∪T)}).

Since the sum of infima does not exceed the infimum of the sum, we have

v(S ∪ T) ≥ inf_{ν_{N∖(S∪T)}} Σᵢ∈S Kᵢ(μ_S × μ_T, ν_{N∖(S∪T)}) + inf_{ν_{N∖(S∪T)}} Σᵢ∈T Kᵢ(μ_S × μ_T, ν_{N∖(S∪T)}).
Minimizing the first addend on the right-hand side of the inequality over μ_T, and
the second addend over μ_S (for uniformity, these will be renamed ν_T and ν_S), we
obtain

v(S ∪ T) ≥ inf_{ν_T} inf_{ν_{N∖(S∪T)}} Σᵢ∈S Kᵢ(μ_S × ν_T, ν_{N∖(S∪T)}) + inf_{ν_S} inf_{ν_{N∖(S∪T)}} Σᵢ∈T Kᵢ(ν_S × μ_T, ν_{N∖(S∪T)}).

The last inequality holds for any values of the measures μ_S and μ_T. Consequently,
we may pass to the suprema:

v(S ∪ T) ≥ sup_{μ_S} inf_{ν_{N∖S}} Σᵢ∈S Kᵢ(μ_S, ν_{N∖S}) + sup_{μ_T} inf_{ν_{N∖T}} Σᵢ∈T Kᵢ(μ_T, ν_{N∖T}),

i.e.

v(S ∪ T) ≥ v(S) + v(T).   (3.11.3)

The superadditivity is proved.
Note that inequality (3.11.3) also holds if the function v(S) is constructed by the
rule

v(S) = sup_{x_S} inf_{x_{N∖S}} H_S(x_S, x_{N∖S}),  S ⊂ N,

where x_S ∈ X_S, x_{N∖S} ∈ X_{N∖S}, Γ_S = (X_S, X_{N∖S}, H_S). In this case, the proof literally
repeats the one given above.
3.11.3. Definition. The noncooperative game Γ = (N, {Xᵢ}ᵢ∈N, {Hᵢ}ᵢ∈N) is
called a constant sum game if

Σᵢ∈N Hᵢ(x) = c = const

for all situations x in pure strategies and all situations μ in mixed strategies.
On the other hand,

αᵢ = v({i}) + γᵢ,  i ∈ N,

where

γᵢ ≥ 0,  i ∈ N,  Σᵢ∈N γᵢ = v(N) − Σᵢ∈N v({i}).
The equivalence of the game (N, v) to (N, v′) will be denoted by (N, v) ~ (N, v′)
or v ~ v′.
It is obvious that v ~ v. This can be verified by setting cᵢ = 0, k = 1, v′ = v in
(3.11.8). This property is called reflexivity.
We shall prove the symmetry of the relation, i.e. that the condition v ~ v′ implies
v′ ~ v. In fact, setting k′ = 1/k, c′ᵢ = −cᵢ/k, we obtain

v(S) = k′v′(S) + Σᵢ∈S c′ᵢ,

i.e. v′ ~ v.
Finally, if v ~ v′ and v′ ~ v″, then v ~ v″. This property is called transitivity.
It can be verified by successively applying (3.11.8).
α′ᵢ = kαᵢ + cᵢ,  i ∈ N,

establishes a one-to-one mapping of the set of all imputations in the game v onto
the imputation set in the game v′, so that α ≻_S β implies α′ ≻_S β′.
Proof. Let us verify that α′ is an imputation in the game (N, v′). Indeed,
It follows that conditions (3.11.4), (3.11.5) hold for α′. Furthermore, if α ≻_S β, then

k = 1/(v(N) − Σᵢ∈N v({i})),  cᵢ = −v({i})/(v(N) − Σᵢ∈N v({i})).

Then v′({i}) = 0, v′(N) = 1. This completes the proof of the theorem.
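A sketch of the (0–1)-normalization, under the assumption (consistent with the proof above) that k = 1/(v(N) − Σᵢ v({i})) and cᵢ = −k·v({i}) for an essential game:

```python
from fractions import Fraction as F

def reduce_01(v, N):
    # v'(S) = (v(S) - sum_{i in S} v({i})) / (v(N) - sum_{i in N} v({i}))
    D = v[frozenset(N)] - sum(v[frozenset({i})] for i in N)
    return {S: F(v[S] - sum(v[frozenset({i})] for i in S), D) for S in v}

N = {1, 2, 3}
v = {frozenset(): 0, frozenset({1}): 1, frozenset({2}): 2, frozenset({3}): 0,
     frozenset({1, 2}): 4, frozenset({1, 3}): 3, frozenset({2, 3}): 2,
     frozenset({1, 2, 3}): 7}
w = reduce_01(v, N)
assert all(w[frozenset({i})] == 0 for i in N)   # v'({i}) = 0
assert w[frozenset(N)] == 1                     # v'(N) = 1
```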
This theorem implies that game-theoretic properties involving the notion of
dominance can be examined on games in (0–1)-reduced form. If v is the
characteristic function of an arbitrary essential game (N, v), then

βᵢ = αᵢ + (v(S) − α(S))/|S|,  i ∈ S,
3.12. The core and NM-solution
where |S| is the number of elements of the set S. It can be easily seen that β(N) = 1,
βᵢ ≥ 0 and β ≻_S α. It follows that α does not belong to the core.
Theorem 3.9.1 implies that the core is a closed convex subset of the set of all
imputations (the core may also be empty).
3.12.2. Suppose the players are negotiating the choice of a cooperative agreement.
It follows from the superadditivity of v that such an agreement brings about the
formation of the coalition N of all players. The question then arises as to the way of
distributing the total payoff v(N), i.e. the way of choosing a vector α ∈ Rⁿ for which
Σᵢ∈N αᵢ = v(N).
The minimum requirement for obtaining the players' consent to choose a
vector α is the individual rationality of this vector, i.e. the condition αᵢ ≥ v({i}),
i ∈ N. Suppose the players are negotiating the choice of a particular imputation α.
Some coalition S demanding a more advantageous imputation may raise an objection
against the choice of this imputation. The coalition S lays down this demand,
threatening to break up general cooperation (this threat is quite real, since the payoff v(N)
can only be ensured by unanimous consent on the part of all players). Suppose the
other players N∖S respond to this threat by uniting their efforts against the coalition
S. The maximum guaranteed payoff to the coalition S is evaluated by the number
v(S). Condition (3.12.1) implies that there exists a stabilizing threat to the coalition
S from the coalition N∖S. Thus, the core of the game (N, v) is the set of distributions of
the maximum total payoff v(N) which is immune to such cooperative threats.
We shall now give one more criterion for judging whether an imputation belongs
to the core.
Lemma. Let α be an imputation in the game (N, v). Then α belongs to the core
if and only if the inequality

Σᵢ∈S αᵢ ≤ v(N) − v(N∖S)   (3.12.2)

holds for all S ⊂ N.

v(N∖S) ≤ Σᵢ∈N∖S αᵢ.
The core is a subset of the imputation set defined by the linear inequalities (3.12.1),
i.e. a convex polyhedron. By the symmetry of v(S), the core is also symmetric,
i.e. invariant under any permutation of the components α₁, …, αₙ. Furthermore, by the
convexity of the core, it can be shown that the core is nonempty if and only if it
contains the center α* of the set of all imputations (α*ᵢ = f(n)/n, i = 1, …, n).
Returning to system (3.12.1), we see that the core is nonempty if and only if the
inequality (1/|S|)f(|S|) ≤ (1/n)f(n) holds for all |S| = 1, …, n. Thus, the core is
Figure 3.8
Figure 3.9
nonempty if and only if there is no intermediate coalition S in which the average share
per player exceeds the corresponding average in the coalition N. Fig. 3.8 (3.9)
corresponds to the case where the core is nonempty (empty).
3.12.4. Example 20. [Vorobjev (1977)]. Consider a general three-person game in
(0–1)-reduced form. For its characteristic function we have v(∅) = v({1}) = v({2}) =
v({3}) = 0, v({1,2,3}) = 1, v({1,2}) = c₃, v({1,3}) = c₂, v({2,3}) = c₁, where 0 ≤ cᵢ ≤ 1,
i = 1, 2, 3. By Theorem 3.9.1, for the imputation α to belong to the core, it is
necessary and sufficient that

α₃ ≤ 1 − c₃,  α₂ ≤ 1 − c₂,  α₁ ≤ 1 − c₁.   (3.12.3)

Summing inequalities (3.12.3), we obtain

α₁ + α₂ + α₃ ≤ 3 − (c₁ + c₂ + c₃),
or, since the sum of all αᵢ, i = 1, 2, 3, is identically equal to 1,

c₁ + c₂ + c₃ ≤ 2.   (3.12.4)
The last inequality is thus a necessary condition for the core of this game to be
nonempty. On the other hand, if (3.12.4) is satisfied, then there are
non-negative ε₁, ε₂, ε₃ such that
Figure 3.10
Let βᵢ = 1 − cᵢ − εᵢ, i = 1, 2, 3. The numbers βᵢ satisfy inequalities (3.12.3), so that
the imputation β = (β₁, β₂, β₃) belongs to the core of the game; hence
relation (3.12.4) is also sufficient for a nonempty core to exist.
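A sketch of this construction. Spreading the slack 2 − (c₁ + c₂ + c₃) equally over the three players is one admissible choice of the εᵢ whenever it leaves every βᵢ non-negative (it does for the values used below):

```python
from fractions import Fraction as F

def core_imputation(c1, c2, c3):
    # Example 20: the core is nonempty iff c1 + c2 + c3 <= 2, inequality (3.12.4)
    if c1 + c2 + c3 > 2:
        return None
    eps = F(2 - (c1 + c2 + c3), 3)   # equal spread of the slack
    return [1 - c1 - eps, 1 - c2 - eps, 1 - c3 - eps]

beta = core_imputation(F(1, 2), F(1, 2), F(1, 2))
assert sum(beta) == 1 and all(t >= 0 for t in beta)   # beta is an imputation
assert beta[0] + beta[1] >= F(1, 2)                   # coalition {1,2} covered: >= c3
```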
Geometrically, the imputation set in the game involved is the simplex α_1 + α_2 + α_3 = 1, α_i ≥ 0, i = 1, 2, 3 (triangle ABC shown in Fig. 3.10). The nonempty core is an intersection of the imputation set (△ABC) and a convex polyhedron (parallelepiped) 0 ≤ α_i ≤ 1 − c_i, i = 1, 2, 3. It is the part of triangle ABC cut out by the lines of intersection of the planes

α_i = 1 − c_i,  i = 1, 2, 3,  (3.12.5)

with the plane of △ABC. Referring to Fig. 3.10, we have a_i, i = 1, 2, 3, standing for the line formed by intersection of the planes α_i = 1 − c_i and α_1 + α_2 + α_3 = 1. The intersection point of two lines a_i and a_j belongs to triangle ABC if the kth coordinate of this point (k ≠ i, k ≠ j) is non-negative; otherwise it is outside △ABC (Figs. 3.11a, 3.11b). Thus, the core has the form of a triangle if a joint solution to
any pair of equations (3.12.5) and the equation α_1 + α_2 + α_3 = 1 is non-negative. This requirement holds for

c_i + c_j ≥ 1,  i ≠ j,  i, j = 1, 2, 3.  (3.12.6)

The core can take one form or another, as the case requires (a total of eight cases is possible here). For example, if none of the three inequalities (3.12.6) is satisfied, then the core appears to be a hexagon (Fig. 3.11b).
Figure 3.11
3.12.5. Theorem. If the inequalities

v(S) ≤ 1/(n − |S| + 1),

where |S| is the number of players in coalition S, hold for the characteristic function of the game (N, v) in (0-1)-reduced form (|N| = n), then the core of this game is nonempty, and it is its NM-solution.
Proof. Take an arbitrary imputation α which is exterior to the core. Then there exists a nonempty set of those coalitions S in which it is possible to dominate α, i.e. these are the coalitions for which α(S) < v(S). The set {S} is partially ordered by inclusion, i.e. S_1 ≥ S_2 if S_2 ⊆ S_1. Take in it a minimal element S_0, which apparently exists. Let k be the number of players in the coalition S_0. Evidently, 2 ≤ k ≤ n − 1. Let us construct the distribution β as follows:

β_i = α_i + (v(S_0) − α(S_0))/k for i ∈ S_0,  β_i = (1 − v(S_0))/(n − k) for i ∉ S_0.

Since β(S_0) = v(S_0) and β_i > α_i, i ∈ S_0, β dominates α in the coalition S_0. Show that β is contained in the core. To do this, it suffices to show that β(S) ≥ v(S) for an arbitrary S. At first, let |S| < k. Note that β is not dominated for any coalition S ⊆ S_0, since β_i > α_i (i ∈ S_0), while S_0 is a minimal coalition for which it is possible to dominate α. If, however, at least one player from S is not contained in S_0, then

β(S) ≥ (1 − v(S_0))/(n − k) ≥ 1/(n − k + 1) > 1/(n − |S| + 1) ≥ v(S).

Now let |S| ≥ k. If S ⊇ S_0, then

β(S) ≥ (|S| − k)(1 − v(S_0))/(n − k) + v(S_0) ≥ (|S| − k)/(n − k) ≥ 1/(n − |S| + 1) ≥ v(S).

However, if S does not contain S_0, then the number of players of the set S not contained in S_0 is at least |S| − k + 1; hence

β(S) ≥ (|S| − k + 1)(1 − v(S_0))/(n − k) ≥ (|S| − k + 1)/(n − k + 1) ≥ 1/(n − |S| + 1) ≥ v(S).

Thus β belongs to the core, and every imputation exterior to the core is dominated by an imputation from the core. Since the imputations of the core are not dominated at all, the core is both internally and externally stable, i.e. it is an NM-solution of the game.
3.12.6. Definition. The game (N, v) in (0-1)-reduced form is called simple if for any S ⊆ N, v(S) takes only one of the two values, 0 or 1. A cooperative game is called simple if its (0-1)-reduced form is simple.
Example 21. [Vorobjev (1977)]. Consider a three-person simple game in (0-1)-reduced form, in which a coalition composed of two or three players wins (v(S) = 1), while a one-player coalition loses (v({i}) = 0). For this game, we consider three imputations

α^1 = (1/2, 1/2, 0),  α^2 = (1/2, 0, 1/2),  α^3 = (0, 1/2, 1/2).  (3.12.7)

None of the three imputations dominates another. The imputation set (3.12.7) also has the following property: any imputation (except for the three imputations α^k) is dominated by one of the imputations α^k. This can be verified by examining some imputation α = (α_1, α_2, α_3). Since we are examining the game in (0-1)-reduced form, α_i ≥ 0 and α_1 + α_2 + α_3 = 1. Therefore, no more than two components of the vector α can be at least 1/2. If there are actually two such components, then each of them is 1/2, whereas the third component is 0. But this means that α coincides with one of the α^k. However, if α is some other imputation, then it has no more than one component which is at least 1/2. We thus have at least two components, say α_i and α_j (i < j), which are less than 1/2. But then the imputation α^k giving 1/2 to players i and j dominates α. Now the three imputations (3.12.7) form an NM-solution. But this is not the only NM-solution.
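The domination checks of Example 21 are easy to automate. The sketch below uses the definition α ≻ β via S (α_i > β_i for all i ∈ S and α(S) ≤ v(S)), with v(S) = 1 for |S| ≥ 2 and 0 otherwise:

```python
from itertools import combinations

def v(S):                      # simple three-person game in (0-1)-reduced form
    return 1 if len(S) >= 2 else 0

def dominates(a, b):
    """a dominates b if for some coalition S: a_i > b_i on S and a(S) <= v(S)."""
    players = range(3)
    for size in (1, 2, 3):
        for S in combinations(players, size):
            if all(a[i] > b[i] for i in S) and sum(a[i] for i in S) <= v(S):
                return True
    return False

sym = [(0.5, 0.5, 0.0), (0.5, 0.0, 0.5), (0.0, 0.5, 0.5)]
# internal stability: none of the three imputations dominates another
assert not any(dominates(a, b) for a in sym for b in sym if a != b)
# an imputation with two components below 1/2 is dominated by one of them
assert any(dominates(a, (0.4, 0.35, 0.25)) for a in sym)
```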
Let c be any number from the interval [0, 1/2). It can be easily verified that the set

L_{3,c} = {(a, 1 − c − a, c) | 0 ≤ a ≤ 1 − c}

is also an NM-solution. Indeed, this set contains the imputations on which Player 3 receives a constant c, while Players 1 and 2 divide the remaining part in all possible proportions. Internal stability follows from the fact that for any two imputations α and β from this set we have: if α_1 > β_1, then α_2 < β_2. But dominance via a single-player coalition is not possible. To prove the external stability of L_{3,c}, we may take any imputation β ∉ L_{3,c}. This means that either β_3 > c or β_3 < c. Let β_3 > c, e.g., β_3 = c + ε. Define the imputation α as follows:

α = (β_1 + ε/2, β_2 + ε/2, c).

Then α ∈ L_{3,c} and α ≻ β for the coalition {1,2}. Now, let β_3 < c. It is clear that either β_1 < 1/2 or β_2 < 1/2 (otherwise their sum would be greater than 1). Let β_1 < 1/2. Set α = (1 − c, 0, c). Since 1 − c > 1/2 > β_1, then α ≻ β for the coalition {1,3}. Evidently, α ∈ L_{3,c}. However, if β_2 < 1/2, then we may show in a similar manner that γ ≻ β, where γ = (0, 1 − c, c). Now, aside from the symmetric NM-solution, the game involved has a whole family of solutions which allow Player 3 to obtain a fixed amount c from the interval 0 ≤ c < 1/2. These NM-solutions are called discriminatory. In the case of the set L_{3,0}, Player 3 is said to be completely discriminated, or excluded. From symmetry it follows that there are also two families of NM-solutions, L_{1,c} and L_{2,c}, which discriminate Players 1 and 2, respectively.
The preceding example shows that the game may have many NM-solutions. It is not clear which of them is to be chosen. If, however, one NM-solution has been chosen, it remains unclear which of the imputations is to be chosen from this particular solution.

Although the existence of NM-solutions in the general case has not been proved, some special results have been obtained. Some of them are concerned with the existence of NM-solutions, while the others are related to the existence of NM-solutions of a particular type [Diubin and Suzdal (1981)].
3.13. Shapley value

3.13.1. Let π be a permutation of the player set N, and for the game (N, v) define the game (N, πv) by

(πv)({π(i_1), ..., π(i_s)}) = v(S),  S = {i_1, ..., i_s}.

The game (N, πv) and the game (N, v) differ only in that in the latter the players exchange their roles in accordance with the permutation π.

This definition permits the presentation of Shapley's axiomatics. First, note that since cooperative n-person games are essentially identified with real-valued (characteristic) functions, we may deal with the sum of two or more games and the product of a game by a number.
3.13.2. We shall set up a correspondence between every cooperative game (N, v) and a vector φ[v] = (φ_1[v], ..., φ_n[v]) whose components are interpreted to mean the payoffs received by the players under an agreement or an arbitration award. Here, this correspondence is taken to satisfy the following axioms.

Shapley axioms.

1. If S is any carrier of the game (N, v), then

Σ_{i∈S} φ_i[v] = v(S).

2. For any permutation π and any i ∈ N,

φ_{π(i)}[πv] = φ_i[v].

3. For any two games (N, u) and (N, v),

φ_i[u + v] = φ_i[u] + φ_i[v],  i ∈ N.
Definition. Suppose φ is the function which, by axioms 1-3, sets up a correspondence between every game (N, v) and the vector φ[v]. Then φ[v] is called the vector of values, or the Shapley value, of the game (N, v).

It turns out that these axioms suffice to define uniquely the values for all n-person games.

Theorem. There exists a unique function φ which is defined for all games (N, v) and satisfies axioms 1-3.
3.13.3. The proof of the theorem is based on the following results.
Lemma. For any coalition S ⊆ N, let the game (N, w_S) be defined as follows:

w_S(T) = 1 if T ⊇ S, and w_S(T) = 0 otherwise.  (3.13.1)

Then for the game (N, w_S) the vector φ[w_S] is uniquely defined by axioms 1, 2:

φ_i[w_S] = 1/s if i ∈ S, and φ_i[w_S] = 0 if i ∉ S,  (3.13.2)

where s = |S| is the number of players in S.
Proof. It is obvious that S is a carrier of w_S, as is any set T containing the set S. Now, by axiom 1, if S ⊆ T, then

Σ_{i∈T} φ_i[w_S] = w_S(T) = 1.

But this means that φ_i[w_S] = 0 for i ∉ S. Further, if π is any permutation which maps S onto itself, then πw_S = w_S. Therefore, by axiom 2, for any i, j ∈ S there is the equality φ_i[w_S] = φ_j[w_S]. Since there is a total of s = |S| such components and their sum is 1, we have φ_i[w_S] = 1/s if i ∈ S.
The game with the characteristic function w_S defined by (3.13.1) is called the simple n-person game. Now the lemma states that for the simple game (N, w_S) the Shapley value is determined in a unique manner.
Corollary. If c ≥ 0, then φ[c w_S] = c φ[w_S]; hence, by axiom 3,

φ[v] = φ[ Σ_{S : c_S>0} c_S w_S ] − φ[ Σ_{S : c_S<0} (−c_S) w_S ].
3.13.4. Lemma. Let (N, v) be any game. Then there are 2^n − 1 real numbers c_S such that

v = Σ_{S⊆N} c_S w_S,  (3.13.4)

where the w_S are defined by (3.13.1) and summation is made over all subsets S of the set N, exclusive of the empty set. Here, representation (3.13.4) is unique.
Proof. Set

c_S = Σ_{T⊆S} (−1)^{s−t} v(T)  (3.13.5)

(here t is the number of elements in T). Show that these numbers c_S satisfy the conditions of the lemma. Indeed, if U is an arbitrary coalition, then

Σ_{S⊆N} c_S w_S(U) = Σ_{S⊆U} c_S = Σ_{S⊆U} Σ_{T⊆S} (−1)^{s−t} v(T) = Σ_{T⊆U} [ Σ_{S : T⊆S⊆U} (−1)^{s−t} ] v(T).

We shall now consider the quantity which is bracketed in the last expression. For every value s between t and u there are C_{u−t}^{s−t} sets S with s elements such that T ⊆ S ⊆ U. Therefore the bracketed expression can be replaced by the following:

Σ_{s=t}^{u} C_{u−t}^{s−t} (−1)^{s−t},

but this is the binomial expansion of (1 − 1)^{u−t}; hence it is 0 for all t < u, and 1 for t = u. Therefore for all U ⊆ N

Σ_{S⊆N} c_S w_S(U) = v(U).
To prove uniqueness, suppose that Σ_{S⊆N} λ_S w_S = 0. Then for all T = {i} we have w_S({i}) = 0 if S ≠ {i}, and w_S({i}) = 1 if S = {i}. Hence λ_{{i}} = 0 for all i ∈ N. Continue the proof by using the induction method. Let λ_S = 0 for all S ⊂ T, S ≠ T. Show that λ_T = 0. Indeed,

0 = Σ_{S⊆N} λ_S w_S(T) = Σ_{S⊆T} λ_S w_S(T) = λ_T.

Now, we have 2^n − 1 linearly independent vectors in R^{2^n−1}; therefore every vector, i.e. every game v, is uniquely representable in the form (3.13.4).
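Both parts of the lemma — the formula (3.13.5) for c_S and the representation (3.13.4) — are easy to verify numerically for a small game (the characteristic function below is an illustrative choice):

```python
from itertools import combinations

N = frozenset({1, 2, 3})
def v(S):                                  # illustrative symmetric game, v depends on |S|
    return {0: 0, 1: 0, 2: 60, 3: 100}[len(S)]

def subsets(T):
    """All nonempty subsets of T as frozensets."""
    items = sorted(T)
    for k in range(1, len(items) + 1):
        for c in combinations(items, k):
            yield frozenset(c)

# c_S = sum over T <= S of (-1)^(|S|-|T|) v(T)                 -- (3.13.5)
coeff = {S: sum((-1) ** (len(S) - len(T)) * v(T) for T in subsets(S))
         for S in subsets(N)}

# check v(U) = sum over S <= U of c_S, i.e. v = sum c_S w_S    -- (3.13.4)
for U in subsets(N):
    assert v(U) == sum(coeff[S] for S in subsets(U))
```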
By the corollary and the lemma, the Shapley value of any game (N, v) must be of the form

φ_i[v] = Σ_{S : i∈S⊆N} c_S / s.

Set

γ_i(T) = Σ_{S : T∪{i}⊆S⊆N} (−1)^{s−t} / s.  (3.13.6)

If i ∉ T′ and T = T′ ∪ {i}, then γ_i(T′) = −γ_i(T). In fact, all terms on the right-hand side of (3.13.6) in both cases are the same and only t = t′ + 1; hence they differ only in sign. Thus, substituting (3.13.5), we have

φ_i[v] = Σ_{T : i∈T⊆N} γ_i(T) [v(T) − v(T \ {i})].

Further, if i ∈ T, then there are exactly C_{n−t}^{s−t} coalitions S with s elements such that T ⊆ S ⊆ N. This brings us to the well-known definite integral

γ_i(T) = Σ_{s=t}^{n} C_{n−t}^{s−t} (−1)^{s−t} / s = ∫_0^1 x^{t−1} (1 − x)^{n−t} dx.

Thus we have

γ_i(T) = (t − 1)!(n − t)!/n!

and hence

φ_i[v] = Σ_{T : i∈T⊆N} [(t − 1)!(n − t)!/n!] (v(T) − v(T \ {i})).  (3.13.7)
Equation (3.13.7) determines explicitly the components of the Shapley value. This expression satisfies axioms 1-3 of 3.13.2.

Note that the vector φ[v] is an imputation. Indeed, by the superadditivity of the function v,

φ_i[v] ≥ v({i}) Σ_{T : i∈T⊆N} (t − 1)!(n − t)!/n! = v({i}).
3.13.6. Axiomatic definition apart, the Shapley value expressed by (3.13.7) can be interpreted conceptually as follows. Suppose the players (elements of the set N) have decided to meet in a specified place at a specified time. It would appear natural that, because of random deviations, they would arrive at various instants of time. However, it is assumed that all the players' arrival orders (i.e. their permutations) have the same probability, namely 1/n!. Suppose that if, on arrival, player i finds in place (only) the members of coalition T \ {i}, then he receives a payoff v(T) − v(T \ {i}), that is, the marginal amount he contributes to that coalition. Then the component φ_i[v] of the Shapley value represents the mathematical expectation of player i's payoff under this randomization.
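Formula (3.13.7) and the random-arrival interpretation of 3.13.6 can be checked against each other; the sketch below computes both for an illustrative three-person game:

```python
from itertools import combinations, permutations
from math import factorial

N = (1, 2, 3)
def v(S):                       # illustrative game; v of the empty set is 0
    S = frozenset(S)
    return {0: 0, 1: 0, 2: 60, 3: 100}[len(S)] if len(S) != 2 or 1 in S else 40

def shapley_formula(i):
    """phi_i[v] = sum over T containing i of (t-1)!(n-t)!/n! (v(T) - v(T\\{i}))."""
    n, total = len(N), 0.0
    others = [j for j in N if j != i]
    for k in range(len(others) + 1):
        for rest in combinations(others, k):
            t = k + 1
            total += factorial(t - 1) * factorial(n - t) / factorial(n) \
                     * (v(rest + (i,)) - v(rest))
    return total

def shapley_arrivals(i):
    """Average marginal contribution of i over all n! arrival orders."""
    total = 0.0
    for order in permutations(N):
        pos = order.index(i)
        total += v(order[:pos + 1]) - v(order[:pos])
    return total / factorial(len(N))

for i in N:
    assert abs(shapley_formula(i) - shapley_arrivals(i)) < 1e-9
assert abs(sum(shapley_formula(i) for i in N) - v(N)) < 1e-9   # axiom 1 (efficiency)
```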
3.13.7. For a simple game (as in 3.9.6), the formula for the Shapley value is particularly descriptive. Indeed, v(T) − v(T \ {i}) is always either 0 or 1, and this expression equals 1 if the coalition T wins while the coalition T \ {i} fails to win. Hence we have

φ_i[v] = Σ_T (t − 1)!(n − t)!/n!,

where summation is extended over all those winning coalitions T ∋ i for which the coalition T \ {i} is not a winning one.
Example 22. (Game with major player.) [Vorobjev (1977)]. The game is played by n players. One of the players is called "major". A coalition S wins 1 if it contains either the major player and at least one more player, or all the n − 1 "ordinary" players. If n is the major player, then the characteristic function of this game can be written as

v(S) = 1, if S ⊇ {i, n}, i ≠ n,
v(S) = 1, if S ⊇ {1, ..., n − 1},
v(S) = 0, otherwise.
It is obvious that the conditions v(T) = 1 and v(T \ {n}) = 0 hold for a coalition T ∋ n if and only if 2 ≤ |T| ≤ n − 1. Hence

φ_n[v] = Σ_{t=2}^{n−1} C_{n−1}^{t−1} (t − 1)!(n − t)!/n! = Σ_{t=2}^{n−1} 1/n = (n − 2)/n.

All ordinary players possess equal rights; hence, by symmetry,

φ_i[v] = (1/(n − 1)) (1 − (n − 2)/n) = 2/(n(n − 1)),  i ≠ n.

Now, the "monopolistic" position of the major player ensures him a payoff (n − 1)(n − 2)/2 times that of an ordinary player.
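The payoff ratio (n − 1)(n − 2)/2 of Example 22 can be confirmed directly from the random-arrival interpretation (a sketch; players 1, ..., n−1 are ordinary, player n is major):

```python
from itertools import permutations
from math import factorial

def major_game(n):
    def v(S):
        S = set(S)
        if (n in S and len(S) >= 2) or set(range(1, n)) <= S:
            return 1
        return 0
    return v

n = 6
v = major_game(n)
phi = {}
for i in range(1, n + 1):
    total = 0
    for order in permutations(range(1, n + 1)):
        pos = order.index(i)
        total += v(order[:pos + 1]) - v(order[:pos])
    phi[i] = total / factorial(n)

assert abs(phi[n] - (n - 2) / n) < 1e-9                     # major player: (n-2)/n
assert abs(phi[1] - 2 / (n * (n - 1))) < 1e-9               # ordinary player: 2/(n(n-1))
assert abs(phi[n] / phi[1] - (n - 1) * (n - 2) / 2) < 1e-9  # ratio (n-1)(n-2)/2
```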
3.13.8. Example 23. ("Land-lord and farm labourers.") [Vorobjev (1977)]. Suppose there are n − 1 farm labourers (players i = 1, ..., n − 1) and a land-lord (player n). The land-lord engaging k labourers derives from the harvest a profit f(k) (f(k) increases monotonically). The farm labourers cannot derive a profit for themselves. This is described by the characteristic function

v(S) = f(|S| − 1), if n ∈ S,
v(S) = 0, otherwise.

Here, for all T ∋ n, |T| > 1, we have v(T) − v(T \ {n}) = f(t − 1), where t = |T|, and from (3.13.7) follows

φ_n[v] = Σ_{t=2}^{n} C_{n−1}^{t−1} [(t − 1)!(n − t)!/n!] f(t − 1) = (1/n) Σ_{k=1}^{n−1} f(k).
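In this game the land-lord's marginal contribution to a coalition of size t is f(t − 1), so his Shapley value is the simple average of the profits f(k). A sketch with an illustrative profit function (the linear f below is an assumption for the example):

```python
from itertools import permutations
from math import factorial

n = 5                                   # players 1..n-1 are labourers, player n is the land-lord
f = lambda k: 10 * k                    # illustrative monotone profit, f(0) = 0

def v(S):
    S = set(S)
    return f(len(S) - 1) if n in S else 0

total = 0
for order in permutations(range(1, n + 1)):
    pos = order.index(n)
    total += v(order[:pos + 1]) - v(order[:pos])
phi_landlord = total / factorial(n)

# The land-lord's Shapley value equals (1/n) * (f(1) + ... + f(n-1)).
assert abs(phi_landlord - sum(f(k) for k in range(1, n)) / n) < 1e-9
```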
3.14. The potential of the Shapley value

3.14.1. The Shapley value may be viewed as a single-point solution concept, consisting of the unique payoff vector (unique imputation). In this section we follow Hart and Mas-Colell (1988) and introduce one number which specifies the cooperative game. By using the "marginal contribution" principle we assign to each player his marginal contribution according to the numbers defined for the game. It happens that the only requirement, that the resulting payoff vector be "efficient" (i.e. that the payoffs add up to the worth of the grand coalition), determines this process uniquely.
3.14.2. A cooperative game with transferable payoffs is a pair (N, v), where N is a finite set of players and v : 2^N → R is the characteristic function, satisfying v(∅) = 0. A subset S ⊆ N is called a coalition, and v(S) is the worth of the coalition S. Given a game (N, v) and a coalition S ⊆ N, we write (S, v) for the subgame obtained by restricting v to (the subsets of) S; that is, the domain of the function v is restricted to 2^S.
3.14.3. Let Γ denote the set of all games. Given a function P : Γ → R that associates a real number P(N, v) with every game (N, v), the marginal contribution of player i in the game (N, v) is defined as

D^i P(N, v) = P(N, v) − P(N \ {i}, v).

A function P with P(∅, v) = 0 is called a potential function if it satisfies

Σ_{i∈N} D^i P(N, v) = v(N)  (3.14.1)

for all games (N, v). Thus, a potential function is such that its marginals are always efficient; that is, they add up to the worth of the grand coalition.
3.14.4. Theorem. There exists a unique potential function P. For every game (N, v) the resulting payoff vector (D^i P(N, v))_{i∈N} of marginal contributions coincides with the Shapley value of the game. Moreover, the potential of a game (N, v) is uniquely determined by (3.14.1) applied only to the game and its subgames (i.e., to (S, v) for all S ⊆ N).
Proof. Rewrite (3.14.1) as

P(N, v) = (1/|N|) [ v(N) + Σ_{i∈N} P(N \ {i}, v) ].  (3.14.2)

Starting with P(∅, v) = 0, equation (3.14.2) determines P(N, v) recursively. This proves the existence of a potential function and shows that the potential is uniquely determined by (3.14.1) applied to the game and its subgames, since for every subgame (N_0, v)

v(N_0) = Σ_{i∈N_0} [P(N_0, v) − P(N_0 \ {i}, v)].  (3.14.4)

Suppose now that player i is a dummy in (N, v) (i.e., v(S) = v(S \ {i}) for all S). We claim that this implies P(N, v) = P(N \ {i}, v); hence D^i P(N, v) = 0. Assume the assertion holds for all games with less than |N| players; in particular, P(N \ {j}, v) = P(N \ {j, i}, v) for all j ≠ i. Now subtract (3.14.2) for N \ {i} from (3.14.2) for N; this reduces to P(N, v) = P(N \ {i}, v).
To identify the marginals of P with the Shapley value, introduce for every coalition T the numbers

a_T = a_T(N, v) = Σ_{S⊆T} (−1)^{|T|−|S|} v(S).  (3.14.5)

Then the potential can be written as

P(N, v) = Σ_{T⊆N} a_T / |T|.
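The recursion P(N, v) = (1/|N|)[v(N) + Σ_{i∈N} P(N \ {i}, v)] obtained from (3.14.2) gives a direct way to compute the potential and to check that its marginals reproduce the Shapley value; the game below is an illustrative choice:

```python
from functools import lru_cache
from itertools import permutations
from math import factorial

N = frozenset({1, 2, 3})
def v(S):
    return {0: 0, 1: 0, 2: 60, 3: 100}[len(frozenset(S))]

@lru_cache(maxsize=None)
def P(S):
    """Potential: P(S) = (1/|S|) * (v(S) + sum over i in S of P(S - {i}))."""
    if not S:
        return 0.0
    return (v(S) + sum(P(S - {i}) for i in S)) / len(S)

def shapley(i):
    total = 0.0
    for order in permutations(sorted(N)):
        pos = order.index(i)
        total += v(order[:pos + 1]) - v(order[:pos])
    return total / factorial(len(N))

for i in N:
    marginal = P(N) - P(N - {i})       # D^i P(N, v)
    assert abs(marginal - shapley(i)) < 1e-9
```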
The restriction of the weight function w to the edge set E(K_S) is also denoted by w.

3.15.4. Definition. The minimum cost spanning tree (MCST) game corresponding to the network (K_{M∪N}, w) is the cooperative cost game in characteristic function form (N, c), where the characteristic function c : 2^N → R is given by

c(∅) = 0,

and for all S = M ∪ K, K ⊆ N,

c(S) = total weight of a MCST (S, E(Γ_S)) in the subnetwork (K_S, w), i.e.

c(S) = Σ_{l∈E(Γ_S)} w(l).
Suppose the network is such that with every x ∈ N one can associate a number a(x), determined by the weights w(x, z), z ∈ M ∪ N, for which, for all S = M ∪ K, K ⊆ N, |K| = k ≥ 1,

c(S) = Σ_{x∈S∩N} a(x).
For the cooperative MCST game with the characteristic function defined above we have

P(M∪N, c) = Σ_{S⊆N∪M} [(s − 1)!(n + m − s)!/(n + m)!] c(S)
= Σ_{k=1}^{n} [(m + k − 1)!(n − k)!/(n + m)!] Σ_{S=M∪K, |K|=k} c(S)
= Σ_{k=1}^{n} [(m + k − 1)!(n − k)!/(n + m)!] Σ_{S=M∪K, |K|=k} Σ_{x∈K} a(x)
= [ Σ_{k=1}^{n} [(m + k − 1)!(n − k)!/(n + m)!] C_{n−1}^{k−1} ] Σ_{x∈N} a(x),  (3.15.2)

since for each x ∈ N there are exactly C_{n−1}^{k−1} coalitions K ∋ x with |K| = k. Denote the coefficient

A(n, m) = Σ_{k=1}^{n} [(m + k − 1)!(n − k)!/(n + m)!] C_{n−1}^{k−1}.
For m = 1, A(n, 1) does not depend upon n:

A(n, 1) = (1/(n(n + 1))) Σ_{k=1}^{n} k = 1/2;

for m = 2, A(n, 2) does not depend upon n:

A(n, 2) = Σ_{k=1}^{n} k(k + 1) / ((n + 2)(n + 1)n) = 1/3.
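The values A(n, 1) = 1/2 and A(n, 2) = 1/3 suggest checking A(n, m) numerically: in exact rational arithmetic it equals 1/(m + 1) for every n in the tested range.

```python
from fractions import Fraction
from math import comb, factorial

def A(n, m):
    """A(n, m) = sum_{k=1..n} (m+k-1)!(n-k)!/(n+m)! * C(n-1, k-1)."""
    return sum(Fraction(factorial(m + k - 1) * factorial(n - k), factorial(n + m))
               * comb(n - 1, k - 1) for k in range(1, n + 1))

for n in range(1, 8):
    for m in range(1, 6):
        assert A(n, m) == Fraction(1, m + 1)
```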
It may be conjectured that in the general case

A(n, m) = 1/(m + 1).

Indeed, A(1, m) = m!/(m + 1)! = 1/(m + 1), and by induction on n we have

A(n + 1, m) = (1/(n + m + 1)) (n A(n, m) + 1) = (1/(n + m + 1)) (n/(m + 1) + 1) = 1/(m + 1).

Using the expression (3.15.2), we obtain

P(M∪N, c) = (1/(m + 1)) Σ_{x∈N} a(x).

3.16. Exercises and problems
7. Find a completely mixed Nash equilibrium in the bimatrix game with the matrices

A = | 6 0 2 |   B = | 6 0 7 |
    | 0 4 3 |       | 0 4 0 |
    | 7 0 0 |       | 2 3 0 |

Does this game also have other equilibria in mixed strategies?

Hint. First find a completely mixed equilibrium (x, y), x = (ξ_1, ξ_2, ξ_3), y = (η_1, η_2, η_3), then an equilibrium for which ξ_1 = 0, etc.
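Following the hint, a completely mixed equilibrium is found from the indifference conditions: y makes all rows of Ay equal, x makes all columns of xB equal. A numerical sketch of this first step (not a full solution of the exercise):

```python
from fractions import Fraction as F

A = [[6, 0, 2], [0, 4, 3], [7, 0, 0]]
B = [[6, 0, 7], [0, 4, 0], [2, 3, 0]]

def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def indifferent_mix(M):
    """Opponent mix z with all rows of M z equal and sum z = 1 (Cramer's rule)."""
    rows = [[M[0][j] - M[1][j] for j in range(3)],
            [M[0][j] - M[2][j] for j in range(3)],
            [1, 1, 1]]
    rhs = [0, 0, 1]
    d = det3(rows)
    z = []
    for col in range(3):
        Mc = [[rhs[i] if j == col else rows[i][j] for j in range(3)] for i in range(3)]
        z.append(F(det3(Mc), d))
    return z

y = indifferent_mix(A)                            # Player 2's equilibrium mix
x = indifferent_mix([list(r) for r in zip(*B)])   # transpose of B for Player 1
assert x == y == [F(8, 23), F(11, 23), F(4, 23)]
assert all(zi > 0 for zi in x)                    # completely mixed
```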
8. "Originality game". [Vorobjev (1984)]. Consider a noncooperative n-person game Γ = (N, {X_i}_{i∈N}, {H_i}_{i∈N}), where X_i = {0, 1}, H_i(0, ..., 0 ‖_i 1) = g_i > 0, H_i(1, ..., 1 ‖_i 0) = h_i > 0, and H_i(x) = 0 in the remaining cases, where ‖_i means that a replacement is made in the ith position.

(a) Interpret the game in terms of advertising.

(b) Find a completely mixed equilibrium.
9. As is shown in 1.10.1, zero-sum two-person games can be solved by the "fictitious play" method. Examine the bimatrix game with the matrices

(a) Show that the unique completely mixed Nash equilibrium is an equiprobable choice of chairs to be made by each player.

(b) Show that an equilibrium in joint mixed strategies is of the form

μ(i, j) = 1/6, if i ≠ j,  and μ(i, j) = 0, if i = j.

(c) Show that the payoffs in the Nash equilibrium are not Pareto optimal, while a joint mixed strategy equilibrium may result in the Pareto optimal payoffs (3/2, 3/2).
11. The equilibrium in joint mixed strategies does not imply that the players must necessarily follow the pure strategies resulting from the adopted joint mixed strategy (see the definition in 3.6.1). However, if we must adhere to the results of a particular realization of the joint mixed strategy, then it is possible to extend the concept of an "equilibrium in joint mixed strategies". For all i ∈ N, denote by μ_{N\{i}} the restriction of the distribution μ to the set X_{N\{i}} = Π_{j∈N\{i}} X_j, namely

μ_{N\{i}}(x_{N\{i}}) = Σ_{x_i∈X_i} μ(x_i, x_{N\{i}})

for all x ∈ Π_{i∈N} X_i. We say that μ is a weak equilibrium in joint mixed strategies if the following inequalities hold for all i ∈ N and y_i ∈ X_i:

Σ_{x∈X} H_i(x) μ(x) ≥ Σ_{x_{N\{i}}} H_i(y_i, x_{N\{i}}) μ_{N\{i}}(x_{N\{i}}).

(a) Prove that any equilibrium in joint mixed strategies is a weak equilibrium in joint mixed strategies.

(b) Let μ̄ = (μ_1, ..., μ_n) be a mixed strategy situation in the game Γ. Show that the probability measure μ = Π_{i∈N} μ_i on the set X = Π_{i∈N} X_i is a weak equilibrium in joint mixed strategies and an equilibrium in joint mixed strategies if and only if the situation μ̄ = (μ_1, ..., μ_n) is a Nash equilibrium.
12. (a) Prove that in the game formulated in Ex. 10 the set of Nash equilibria, the
set of joint strategy equilibria and the set of weak equilibria in joint mixed strategies
do not coincide.
(b) Show that the interval [(5/3,4/3), (4/3,5/3)] is covered by the set of vector
payoffs that are Pareto optimal among the payoffs in joint mixed strategy equilibria,
while the interval [(2,1),(1,2)] is covered by the payoffs that are Pareto optimal
among the weak equilibrium payoffs in joint mixed strategies.
13. Find an arbitration solution of the bimatrix game with the matrices

A = |  2 −1 |   B = |  1 −1 |
    | −1  1 |       | −1  2 |

by employing the Nash bargaining procedure.
14. Consider the bimatrix (2 × 2) game with the matrix

              β_1     β_2
(A, B) = α_1  (1,1)   (1,2)
         α_2  (2,1)   (−5,0)

This is a modification of the "crossroads" game (see Example 2 in 3.1.4) with the following distinction: a car driver (Player 1) and a truck driver (Player 2) make different assessments of an accident (situation (α_2, β_2)). Show that an analysis of the game in threat strategies prescribes the situation (α_1, β_2), i.e. the car must "make a stop" and the truck must "go".
15. Suppose the kernel has a nonempty intersection with each of the bounds α_i = v({i}) of the imputation set. Show that in this case it is a unique NM-solution.

16. For the cooperative game (N, v) we define a semi-imputation to be a vector α = (α_1, ..., α_n) for which α_i ≥ v({i}) and Σ_{i=1}^{n} α_i ≤ v(N). Show that if L is an NM-solution of the game (N, v) and α is a semi-imputation which does not belong to L, then there exists an imputation β ∈ L such that β > α.
17. For the game (N, v) and each i ∈ N, set

β_i = max_{S⊆N\{i}} [v(S ∪ {i}) − v(S)].

Show that if there is an i for which α_i > β_i, then the imputation α can belong neither to the core nor to any NM-solution.
18. Let (N, v) be a simple game in (0-1)-reduced form (see 3.10.6). Player i is called a "veto" player if v(N \ {i}) = 0.

(a) Prove that in order for the core to be nonempty in a simple game, it is necessary and sufficient that there be at least one "veto" player in the game.

(b) Let S be the set of all "veto" players. Show that the imputation α = (α_1, ..., α_n) belongs to the core if Σ_{i∈S} α_i = 1, α_i ≥ 0 for i ∈ S, and α_i = 0 for i ∉ S.
19. In the game (N, v), we interpret a quasi-imputation to mean a vector α = (α_1, ..., α_n) such that Σ_{i∈N} α_i = v(N). For every ε > 0 we define a strict ε-core C_ε(v) to be the set of quasi-imputations α such that for every coalition S

Σ_{i∈S} α_i ≥ v(S) − ε.

ε_0(v) = Σ_{S⊆N} [(n − s)!(s − 1)!/n!] v(S) − v(N).

(a) Prove that a convex game has a nonempty core and that the Shapley value belongs to the core.

(b) Show that (N, v) is a convex game if
Chapter 4. Positional games
set

F(A) = ∪_{x∈A} F_x.

By definition, let F(∅) = ∅. It can be seen that if A_i ⊆ X, i = 1, ..., n, then F(∪_{i=1}^{n} A_i) = ∪_{i=1}^{n} F(A_i).

The map F̂ of the set X into X is called the transitive closure of the map F if

F̂x = {x} ∪ Fx ∪ F(Fx) ∪ F(F(Fx)) ∪ . . .
The graph (X, F) is denoted by G. In what follows, the elements of the set X are represented by points on a plane, and the pairs of points x and y, for which y ∈ Fx, are connected by a solid line with the arrow pointing from x to y. Every element of the set X is called a vertex or a node of the graph, and a pair of elements (x, y), where y ∈ Fx, is called an arc of the graph. For the arc p = (x, y) the nodes x and y are called the boundary nodes of the arc, with x as the origin and y as the end point of the arc. Two arcs p and q are called contingent if they are distinct and have a boundary point in common.

The set of arcs in the graph is denoted by P. The set of arcs in the graph G = (X, F) determines the map F and, vice versa, the map F determines the set P. Therefore, the graph G can be represented as G = (X, F) or G = (X, P).
A path in the graph G = (X, F) is a sequence of arcs p = (p_1, p_2, ..., p_k, ...) such that the end of each preceding arc coincides with the origin of the next one. The length of the path p = (p_1, ..., p_k) is the number l(p) = k of arcs in the sequence; in the case of an endless path p we set l(p) = ∞.

An edge of the graph G = (X, P) is a set made up of two elements x, y ∈ X, for which either (x, y) ∈ P or (y, x) ∈ P. The orientation is of no importance in the edge, as opposed to the arc. The edges are denoted by p̄, q̄, and the set of edges by P̄. By a chain is meant a sequence of edges (p̄_1, p̄_2, ...), where one of the boundary nodes of each edge p̄_k is also boundary for p̄_{k−1}, while the other is boundary for p̄_{k+1}.
A cycle is a finite chain starting in some node and terminating in the same node. The graph is called connected if any two of its nodes can be connected by a chain. By definition, a tree, or a graph tree, is a finite connected graph without cycles which has at least two nodes. Any graph tree has a unique node x_0 such that F̂x_0 = X. The node x_0 is called the initial node of the graph G.
Example 2. Fig. 4.1 shows a graph tree with its origin at x_0. The nodes x ∈ X, or the vertices of the graph, are marked by dots. The arcs are depicted as arrowed segments emphasizing the origin and the end point of the arc.

Example 3. Generally speaking, draughts or chess cannot be represented by a graph tree if by a node of the graph is meant an arrangement of draughtsmen or chess pieces on the board at a given time and an indication of a move, since the same arrangement of pieces can be obtained in a variety of ways. However, if the node of the graph representing a structure of draughtsmen or chess pieces at a given time is taken to mean an arrangement of pieces on the board at a given time, an indication of a move and the past course of the game (all successive positions of pieces on the earlier moves), then each node is reached from the original one in a unique way (i.e. there exists only one chain passing from the original node to any given node); hence the corresponding graph of the game contains no cycles and is a tree.
4.1.3. Let z ∈ X. The subgraph G_z of the tree graph G = (X, F) is a graph of the form (X_z, F_z), where X_z = F̂z and F_z x = Fx ∩ X_z. In Fig. 4.1 the dashed line encircles the subgraph starting in the node z. On the tree graph, for all x ∈ X_z the sets Fx and F_z x coincide, i.e. the map F_z is the restriction of the map F to the set X_z. Therefore, for the subgraphs of the tree graph we use the notation G_z = (X_z, F).
Figure 4.1
4.1.4. We shall now define the multistage game with perfect information on a finite tree graph.

Let G = (X, F) be a tree graph. Consider the partition of the node set X into n + 1 sets X_1, ..., X_n, X_{n+1}, ∪_{i=1}^{n+1} X_i = X, X_k ∩ X_l = ∅, k ≠ l, where F_x = ∅ for x ∈ X_{n+1}. The set X_i, i = 1, ..., n, is called the priority set of the i-th player, while the set X_{n+1} is called the set of final positions. The real-valued functions H_1(x), ..., H_n(x), x ∈ X_{n+1}, are defined on the set of final positions X_{n+1}. The function H_i(x), i = 1, ..., n, is called the payoff to the i-th player.

The game proceeds as follows. Let there be given the set N of players designated by natural numbers 1, ..., i, ..., n (hereafter denoted as N = {1, 2, ..., n}). Let x_0 ∈ X_{i_1}; then in the node (position) x_0 player i_1 "makes a move" and chooses the next node (position) x_1 ∈ F_{x_0}. If x_1 ∈ X_{i_2}, then in the node x_1 player i_2 "makes a move" and chooses the next node (position) x_2 ∈ F_{x_1}, and so on. Thus, if the node (position) x_{k−1} ∈ X_{i_k} is realized at the k-th step, then in this node player i_k "makes a move" and selects the next node (position) from the set F_{x_{k−1}}. The game terminates as soon as a terminal node (position) x_l ∈ X_{n+1} (i.e. a node for which F_{x_l} = ∅) is reached.
Such a step-by-step selection implies a unique realization of some sequence x_0, x_1, ..., x_l determining the path in the tree graph G which emanates from the initial position and reaches one of the final positions of the game. In what follows, such a
path is called a play of the game. Because of the tree-like structure of the graph G, each play uniquely determines the final position x_l to be reached and, conversely, the final position x_l uniquely determines the play. In the position x_l each of the players i, i = 1, ..., n, receives a payoff H_i(x_l).

We assume that player i, making his choice in position x ∈ X_i, knows this position and hence, because of the tree-like structure of the graph G, can restore all previous positions. In this case, the players are said to have perfect information. Chess and draughts provide a good example of games with perfect information, because players can put down their moves, and hence they are said to know the past course of the game when making each move in turn.
Definition. The single-valued map u_i, which sets up a correspondence between each node (position) x ∈ X_i and some node (position) y ∈ F_x, is called a strategy of player i.

The set of all possible strategies of player i is denoted by U_i. Thus the strategy of the i-th player prescribes to him, in any position x from his priority set X_i, a unique choice of the next position.

The ordered set u = (u_1, ..., u_i, ..., u_n), where u_i ∈ U_i, is called a situation in the game, while the Cartesian product U = Π_{i=1}^{n} U_i is called the set of situations. Each situation u = (u_1, ..., u_i, ..., u_n) uniquely determines a play in the game, and hence the payoffs to the players. Indeed, let x_0 ∈ X_{i_1}. In the situation u = (u_1, ..., u_i, ..., u_n) the next position x_1 is then uniquely determined by the rule u_{i_1}(x_0) = x_1. Now let x_1 ∈ X_{i_2}. Then x_2 is uniquely determined by the rule u_{i_2}(x_1) = x_2. If the position x_{k−1} ∈ X_{i_k} is realized at the k-th step, then x_k is uniquely determined by the rule x_k = u_{i_k}(x_{k−1}), and so on.

Suppose that to the situation u = (u_1, ..., u_i, ..., u_n) in the above sense corresponds a play x_0, x_1, ..., x_l. Then we may introduce the notion of the payoff function K_i of player i by equating its value in each situation to the value of the payoff H_i in the final position of the play x_0, ..., x_l corresponding to the situation u = (u_1, ..., u_n), that is

K_i(u_1, ..., u_i, ..., u_n) = H_i(x_l),  i = 1, ..., n.

The functions K_i, i = 1, ..., n, are defined on the set of situations U = Π_{i=1}^{n} U_i. Thus, constructing the players' strategy sets U_i and defining the payoff functions K_i, i = 1, ..., n, on the Cartesian product of the strategy sets of the players, we obtain a game in normal form

Γ = (N, {U_i}_{i∈N}, {K_i}_{i∈N}),

where N = {1, ..., i, ..., n} is the set of players, U_i is the strategy set of player i, and K_i is the payoff function of player i, i = 1, ..., n.
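The constructions above (tree, priority sets, strategies as single-valued maps, payoff functions K_i) fit in a few lines of code; the two-player tree below is an illustrative example, not one from the text:

```python
# Tree given by the map F (children of each node); leaves are final positions.
F = {"x0": ["a", "b"], "a": ["l1", "l2"], "b": ["l3", "l4"]}
priority = {"x0": 1, "a": 2, "b": 2}             # which player moves in each node
H = {"l1": (3, 1), "l2": (0, 0), "l3": (2, 2), "l4": (1, 3)}  # payoffs at leaves

def play(u, x="x0"):
    """A situation u = (u1, u2, ...) determines a unique play x0, x1, ..., xl."""
    path = [x]
    while x in F:
        x = u[priority[x]][x]                    # the mover's strategy picks the next node
        path.append(x)
    return path

def K(u):
    """Payoff functions K_i(u) = H_i(xl) at the final position of the play."""
    return H[play(u)[-1]]

u = {1: {"x0": "a"}, 2: {"a": "l1", "b": "l3"}}  # one strategy per player
assert play(u) == ["x0", "a", "l1"]
assert K(u) == (3, 1)
```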
4.1.5. For the purposes of further examination of the game Γ we need to introduce the notion of a subgame, i.e. the game on a subgraph of the graph G in the main game (see 1.1.1).

Let z ∈ X. Consider a subgraph G_z = (X_z, F) which is associated with the subgame Γ_z as follows. The players' priority sets in the subgame Γ_z are determined by the rule Y_i^z = X_i ∩ X_z, i = 1, ..., n; the set of final positions is Y_{n+1}^z = X_{n+1} ∩ X_z, and the payoffs in the subgame are

H_i^z(x) = H_i(x),  x ∈ Y_{n+1}^z,  i = 1, ..., n.

The set of all strategies of player i in the subgame is denoted by U_i^z. Then each subgraph G_z is associated with the subgame in normal form

Γ_z = (N, {U_i^z}, {K_i^z}),

where the payoff functions K_i^z, i = 1, ..., n, are defined on the Cartesian product U^z = Π_{i=1}^{n} U_i^z.
The function u_i* is defined on player i's priority set X_i, i = 1, ..., n, and for every fixed x ∈ X_i the value u_i*(x) ∈ F_x. Thus u_i*, i = 1, ..., n, is a strategy of player i in the game Γ, i.e. u_i* ∈ U_i. By construction, the truncation (u_i*)_z of the strategy u_i* to the set X_i ∩ X_z is the strategy appearing in the absolute Nash equilibrium of the subgame Γ_z, z ∈ F_{x_0}. Therefore, to complete the proof of the theorem, it suffices to show that the strategies u_i*, i = 1, ..., n, constructed by formulas (4.2.2) constitute a Nash equilibrium in the game Γ. Let i_1 ∈ N. By the construction of the strategy u_{i_1}*, after a position z* has been chosen by player i_1 at the first step, the game Γ becomes the subgame Γ_{z*}. Therefore,
Figure 4.2
Payoffs to both players in situation (4.2.6) are less than those in situation (4.2.5). Just as situation (4.2.5), situation (4.2.6) is an absolute equilibrium.

4.2.3. It is apparent that in parallel with "favorable" and "unfavorable" absolute Nash equilibria there exists a whole family of intermediate absolute equilibria. Of interest is the question concerning the absence of two distinct absolute equilibria differing in payoffs to the players.
Theorem. [Rochet (1980)]. Let the players' payoffs H_i(x), i = 1, ..., n, in the game Γ be such that if there exist an i_0 and x, y such that H_{i_0}(x) = H_{i_0}(y), then H_i(x) = H_i(y) for all i ∈ N. Then in the game Γ the players' payoffs coincide in all absolute equilibria.

Proof. Consider the family of subgames Γ_x of the game Γ and prove the theorem by induction over their length l(x). Let l(x) = 1 and suppose player i_1 makes a move in the unique nonterminal position x. Then in the equilibrium he makes his choice x̄ from the condition

H_{i_1}(x̄) = max_{x′∈F_x} H_{i_1}(x′).

If the point x̄ is unique, then so is the payoff vector in the equilibrium, which here equals H(x̄) = (H_1(x̄), ..., H_n(x̄)). If there exists a point x̃ ≠ x̄ such
that #;,(f) = Hi^x), then there is one more equilibrium with payoffs H(W) =
{Hi(W),,.. , # , , ( ! ) , . . . , H(W)}. From the condition of the theorem, however, it fol
lows that if Hh(W) = Hh(w), then H{(W) = #;(x) for all N.
Let $v(x) = \{v_i(x)\}$ be the payoff vector in the equilibrium in a single-stage subgame $\Gamma_x$ which, as is shown above, is determined in a unique way. We show that if the equality $v_{i_0}(x') = v_{i_0}(x'')$ holds for some $i_0$ ($x', x''$ are such that the lengths of the subgames $\Gamma_{x'}, \Gamma_{x''}$ are 1), then $v_i(x') = v_i(x'')$ for all $i \in N$. Indeed, let $x' \in X_{i_1}$, $x'' \in X_{i_2}$; then
$$v_{i_1}(x') = H_{i_1}(\bar x') = \max_{y \in F_{x'}} H_{i_1}(y), \qquad v_{i_2}(x'') = H_{i_2}(\bar x'') = \max_{y \in F_{x''}} H_{i_2}(y),$$
and $v_i(x') = H_i(\bar x')$, $v_i(x'') = H_i(\bar x'')$ for all $i \in N$. From the equality $v_{i_0}(x') = v_{i_0}(x'')$ it follows that $H_{i_0}(\bar x') = H_{i_0}(\bar x'')$. But, under the condition of the theorem, $H_i(\bar x') = H_i(\bar x'')$ for all $i \in N$. Hence $v_i(x') = v_i(x'')$ for all $i \in N$.
We now assume that in all subgames $\Gamma_x$ of length $l(x) \le k - 1$ the payoff vector in equilibria is determined uniquely, and that if for two subgames $\Gamma_{x'}, \Gamma_{x''}$ whose lengths do not exceed $k - 1$ we have $v_{i_0}(x') = v_{i_0}(x'')$ for some $i_0$, then $v_i(x') = v_i(x'')$ for all $i \in N$.
Suppose the game $\Gamma_{x_0}$ is of length $k$ and player $i_1$ makes his move in the initial position $x_0$. By the induction hypothesis, for all $z \in F_{x_0}$ the payoffs in Nash equilibria in the subgames $\Gamma_z$ are determined uniquely. Let the payoff vector in Nash equilibria in the game $\Gamma_z$ be $\{v_i(z)\}$. Then, as follows from (4.2.2), in the node $x_0$ player $i_1$ chooses the next node $\bar z \in F_{x_0}$ from the condition
$$v_{i_1}(\bar z) = \max_{z \in F_{x_0}} v_{i_1}(z). \qquad (4.2.8)$$
If the point $\bar z$ determined by (4.2.8) is unique, then the vector with components $v_i(x_0) = v_i(\bar z)$, $i = 1,\dots,n$, is the unique payoff vector in Nash equilibria in the game $\Gamma_{x_0}$. If, however, there exist two nodes $\bar z, \bar{\bar z}$ for which $v_{i_1}(\bar z) = v_{i_1}(\bar{\bar z})$, then, by the induction hypothesis, since the lengths of the subgames $\Gamma_{\bar z}$ and $\Gamma_{\bar{\bar z}}$ do not exceed $k - 1$, the equality $v_{i_1}(\bar z) = v_{i_1}(\bar{\bar z})$ implies the equality $v_i(\bar z) = v_i(\bar{\bar z})$ for all $i \in N$. Thus, in this case the payoffs in equilibria $v_i(x_0)$, $i \in N$, are also determined uniquely.
4.2.4. Example 5. We have seen in the previous example that "favorableness" of the players gives them higher payoffs in the corresponding Nash equilibria than "unfavorable" behavior. But this is not always the case. Sometimes the "unfavorable" Nash equilibrium gives higher payoffs to all the players than the "favorable" one. We shall illustrate this rather nontrivial fact by an example. Consider the two-person game in Fig. 4.3. The nodes from the priority set $X_1$ are represented by circles and those from $X_2$ by blocks, with the players' payoffs written in the final positions. In the figure, positions from the sets $X_i$ ($i = 1, 2$) are numbered by double indices $(i, j)$, where $i$ is the index of the player and $j$ the index of the node $x$ in the set $X_i$. One can easily see that the "favorable" equilibrium has the form ((2,2,1,1,1), (2,1)) with payoffs (2,1). The "unfavorable" equilibrium has the form ((1,1,2,1,1), (1,1)) with payoffs (5,3).
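The role of the tie-breaking rule can be made concrete. The sketch below runs backward induction on a small hypothetical two-player tree (not the tree of Fig. 4.3, which is only available as a figure), breaking payoff ties either in the other player's favor ("favorable") or against him ("unfavorable"):

```python
# Backward induction with a tie-breaking rule: "favorable" breaks payoff
# ties in favor of the other player, "unfavorable" against him.  The tree
# below is a made-up two-player example, not the game of Fig. 4.3.

def solve(node, favorable):
    """Return the equilibrium payoff vector of the subgame rooted at node.

    A terminal position is a payoff tuple (p0, p1); an internal node is a
    list [player, child, child, ...].
    """
    if isinstance(node, tuple):          # terminal position: payoffs
        return node
    player, *children = node
    other = 1 - player
    values = [solve(c, favorable) for c in children]
    best = max(v[player] for v in values)
    ties = [v for v in values if v[player] == best]
    # among own-payoff maximizers, break ties on the opponent's payoff
    key = max if favorable else min
    return key(ties, key=lambda v: v[other])

# player 0 moves at the root; player 1 is indifferent in the right subtree
tree = [0, (2, 1), [1, (2, 3), (0, 3)]]

print(solve(tree, favorable=True))       # (2, 3)
print(solve(tree, favorable=False))      # (2, 1)
```

In the "unfavorable" variant player 1's tie is resolved to (0, 3), which makes the continuation unattractive for player 0, so the equilibrium path changes.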
4.2.5. [Fudenberg and Tirole (1992)]. Consider the n-person game with complete information, where each player $i \le n$ can either end the game by playing D or play A and give the move to player $i + 1$ (see Fig. 4.4).
4.2. Absolute equilibrium (subgame-perfect) 209
Figure 4.4
If player $i$ selects D, each player gets $1/i$; if all players select A, each gets 2. The backward induction algorithm for computing the subgame-perfect (absolute) equilibria predicts that all players should play A. Thus the situation (A, A, ..., A) is a subgame-perfect Nash equilibrium. (Note that in the game under consideration each player moves only once and has two alternatives, which are also his strategies.) But there are also other equilibria. One class of Nash equilibria has the form (D, A, A, D, ...), where the first player selects D and at least one of the others selects D. The payoffs in the first case are (2, 2, ..., 2) and in the second (1, 1, ..., 1). On the basis of a robustness argument it seems that the equilibrium (A, A, ..., A) is inefficient if n is very large. The equilibrium (D, A, A, D, ...) is such because player 4 uses the punishment strategy to force player 1 to play D. This equilibrium is not subgame perfect, because it is not an equilibrium in the subgames starting from positions 2, 3.
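The backward induction computation for this game is a short recursion; the sketch below reproduces the argument for an assumed n = 10 (any n works the same way):

```python
# Backward induction in the n-player stop/continue game of 4.2.5:
# player i either plays D (everyone then gets 1/i) or plays A, passing the
# move to player i+1; if all n players choose A, everyone gets 2.

from fractions import Fraction

def backward_induction(n):
    """Return the equilibrium choices of players 1..n and the payoff."""
    choices = [None] * (n + 1)            # 1-indexed for readability
    # value = payoff to every player if play continues past player i
    value = Fraction(2)                   # reached when player n plays A
    for i in range(n, 0, -1):
        stop = Fraction(1, i)             # payoff if player i plays D
        if value > stop:
            choices[i] = 'A'
        else:
            choices[i] = 'D'
            value = stop                  # play now ends at player i
    return choices[1:], value

choices, payoff = backward_induction(10)
print(choices, payoff)   # all 'A', payoff 2
```

Since $2 > 1/i$ for every $i \ge 1$, the recursion never switches to D, which is exactly the subgame-perfect prediction (A, A, ..., A) in the text.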
Equations (4.3.8), (4.3.9) are equivalent and must be considered with the initial condition $v(x)\big|_{x \in X_3} = H_1(x)$.
4.3.4. The Theorem in 4.2.1, considered for multistage zero-sum alternating games, shows the existence of an equilibrium in the games of chess and draughts in the class of pure strategies, while equations (4.3.8), (4.3.9) show a way of finding the value of the game. At the same time, it is apparent that for the foreseeable future no computer implementation will be possible for solving these functional equations in order to find the value of the game and optimal strategies. It is highly improbable that we will ever know whether a player, "black" or "white", can guarantee a win in every play, or whether there can always be a draw. In chess and draughts, however, successful attempts are made to construct approximately optimal solutions by creating programs capable of foreseeing several steps ahead. Use is also made of various (empirically obtained) estimates of current positions. Such an approach is possible in the investigation of general multistage zero-sum games with perfect information. Successive iteration of estimations (for several steps ahead) may lead to the desired results.
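The approach just described, searching several moves ahead and applying an empirical estimate of the position at the horizon, is depth-limited minimax. A minimal sketch, with a made-up toy game standing in for chess (the `moves` and `evaluate` functions are illustrative assumptions):

```python
# Depth-limited minimax: look several moves ahead and apply an empirical
# evaluation of the position at the search horizon.

def minimax(position, depth, maximizing, moves, evaluate):
    """Value of `position` searching `depth` plies with heuristic cutoff."""
    succ = moves(position)
    if depth == 0 or not succ:           # horizon or terminal position
        return evaluate(position)
    if maximizing:
        return max(minimax(p, depth - 1, False, moves, evaluate) for p in succ)
    return min(minimax(p, depth - 1, True, moves, evaluate) for p in succ)

# toy game: a position is an integer, a move adds 1 or 2, the game stops
# at 6 or beyond; the "empirical estimate" is the position value itself
moves = lambda p: [p + 1, p + 2] if p < 6 else []
evaluate = lambda p: p

print(minimax(0, 4, True, moves, evaluate))   # 6
```

Replacing `evaluate` with a hand-tuned position estimate and raising `depth` is exactly the "several steps ahead plus empirical estimation" scheme mentioned in the text.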
Figure 4.5
The game $\Gamma$ will be associated with two zero-sum games $\Gamma_1$ and $\Gamma_2$ as follows. The game $\Gamma_1$ is a zero-sum game constructed in terms of the game $\Gamma$, where Player 2 plays against Player 1, i.e. $K_2 = -K_1$. The game $\Gamma_2$ is a zero-sum game constructed in terms of the game $\Gamma$, where Player 1 plays against Player 2, i.e. $K_1 = -K_2$.
The graphs of the games $\Gamma_1$, $\Gamma_2$, $\Gamma$ and the sets therein coincide. Denote by $(u_{11}^*, u_{21}^*)$ and $(u_{12}^*, u_{22}^*)$ absolute equilibria in the games $\Gamma_1$, $\Gamma_2$, respectively. Let $\Gamma_{1x}, \Gamma_{2x}$ be the subgames of the games $\Gamma_1, \Gamma_2$, and let $v_1(x), v_2(x)$ be the values of these subgames. Then the situations $\{(u_{11}^*)^x, (u_{21}^*)^x\}$ and $\{(u_{12}^*)^x, (u_{22}^*)^x\}$ are equilibria in the games $\Gamma_{1x}, \Gamma_{2x}$, respectively, and $v_1(x) = K_1^x((u_{11}^*)^x, (u_{21}^*)^x)$, $v_2(x) = K_2^x((u_{12}^*)^x, (u_{22}^*)^x)$.
Consider an arbitrary pair $(u_1, u_2)$ of strategies in the game $\Gamma$. Of course, this pair is the same in the games $\Gamma_1$, $\Gamma_2$. Let $Z = (x_0 = z_0, z_1, \dots, z_l)$ be the path realized in the situation $(u_1, u_2)$.
Definition. The strategy $\bar u_1(\cdot)$ is called a penalty strategy of Player 1 if
4.4.3. From the definition of penalty strategies we immediately obtain the following properties:
1. $K_1(\bar u_1(\cdot), \bar u_2(\cdot)) = H_1(z_l)$, $K_2(\bar u_1(\cdot), \bar u_2(\cdot)) = H_2(z_l)$.
2. Suppose one of the players, say Player 1, uses a strategy $u_1(\cdot)$ for which the position $z_k \in Z \cap X_1$ is the first in the path $Z$ where $u_1(\cdot)$ dictates the choice of a next position $z'_{k+1}$ different from the choice dictated by the strategy $\bar u_1(\cdot)$, i.e. $z'_{k+1} \ne z_{k+1}$. Then from the definition of the penalty strategy $\bar u_2(\cdot)$ it follows that
Figure 4.6
is the set of controls for division $B_i$ predetermined by the control $u$ of center $A_0$. Now, the control center has the priority right to make the first move and may restrict the possibilities of its subordinate divisions by channeling their actions as desired. The aim of center $A_0$ is to maximize the functional $K_0(u, v_1, \dots, v_n)$ over $u$, whereas the divisions $B_i$, $i = 1, \dots, n$, which have their own goals to pursue, seek to maximize the functionals $H_i(u_i, v_i)$ over $v_i$.
4.5.2. We shall formalize this problem as a noncooperative $(n + 1)$-person game $\Gamma$ (an administrative center $A_0$ and production divisions $B_1, \dots, B_n$) in normal form. Suppose Player $A_0$ selects a vector $u \in U$, where
$$U = \{u = (u_1, \dots, u_n) : u_i \ge 0,\ u_i \in R^l,\ i = 1, \dots, n,\ \sum_{i=1}^n u_i \le b\}, \quad b > 0,$$
is the set of strategies for Player $A_0$ in the game $\Gamma$. The vector $u_i$ is interpreted as the vector of resources of $l$ items allocated by center $A_0$ to the $i$-th production division.
Suppose each of the players $B_i$ in the original problem (see 4.5.1) knows the choice made by $A_0$ and selects a vector $v_i \in V_i(u_i)$, where
$$V_i(u_i) = \{v_i : A_i v_i \le a_i + u_i,\ v_i \ge 0\}.$$
The vector $v_i$ is interpreted as a production program of the $i$-th division for various products; $A_i$ is the production (technological) matrix of the $i$-th production division ($A_i \ge 0$); $a_i$ is the vector of available resources of the $i$-th production division ($a_i \ge 0$).
By definition, the strategies of Player $B_i$ in the game $\Gamma$ are the functions $v_i(\cdot)$ that set up a correspondence between the elements $u = (u_1, \dots, u_i, \dots, u_n) \in U$ and the vectors $v_i(u_i) \in V_i(u_i)$. The set of such functions is denoted by $\bar V_i$, $i = 1, \dots, n$.
Let us define the players' payoff functions in the game $\Gamma$. The payoff function for Player $A_0$ is the functional $K_0(u, v_1(\cdot), \dots, v_n(\cdot))$ given by (4.5.2).
For simplicity assume that the maxima in (4.5.2) and (4.5.3) are achieved. Note that (4.5.3) is a nonlinear programming problem with an essentially discontinuous objective function (maximization is taken over $u$, and the $v_i^*(\cdot)$ are generally discontinuous functions of the parameter $u_i$). We show that the point $(u^*, v_1^*(\cdot), \dots, v_n^*(\cdot))$ is an equilibrium in the game $\Gamma$. Indeed,
$$K_i(u^*, v_1^*(\cdot), \dots, v_i^*(\cdot), \dots, v_n^*(\cdot)) \ge K_i(u^*, v_1^*(\cdot), \dots, v_i(\cdot), \dots, v_n^*(\cdot))$$
holds for any $v_i(\cdot) \in \bar V_i$. Thus it is not advantageous for any of the players $A_0, B_1, \dots, B_n$ to depart individually from the situation $(u^*, v_1^*(\cdot), \dots, v_n^*(\cdot))$, i.e. it is an equilibrium.
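The equilibrium construction can be illustrated numerically. The sketch below uses a deliberately simplified setting (one scalar resource, one product per division, so each division's best response is simply to produce at full capacity); all numbers are illustrative assumptions, not data from the text:

```python
# Toy version of the hierarchical equilibrium of 4.5.2: the center chooses
# an allocation (u1, u2) with u1 + u2 <= b; division i best-responds by
# producing v_i = a_i + u_i (its payoff is increasing in output); the
# center's payoff is K0 = d1*v1 + d2*v2.  All parameters are made up.

def center_optimum(b, a, d):
    """Grid search over integer allocations for n = 2 divisions."""
    best = None
    for u1 in range(b + 1):
        for u2 in range(b - u1 + 1):
            v = (a[0] + u1, a[1] + u2)   # divisions' best responses
            k0 = d[0] * v[0] + d[1] * v[1]
            if best is None or k0 > best[0]:
                best = (k0, (u1, u2), v)
    return best

k0, u_star, v_star = center_optimum(b=5, a=(1, 2), d=(3, 1))
print(k0, u_star, v_star)   # 20 (5, 0) (6, 2)
```

Because division 1's output is worth more to the center, the whole resource goes to division 1; the resulting point (allocation plus best responses) is the analogue of $(u^*, v_1^*(\cdot), \dots, v_n^*(\cdot))$ above.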
Note that this situation is also stable against departures from it by any coalition $S \subset \{B_1, \dots, B_n\}$, since the payoff $K_i$ to the $i$-th player does not depend on the strategies
Equality (4.6.1) holds, since the coalition $\{B_1, \dots, B_n\}$ can ensure a zero payoff to Player $A_0$ by selecting all $v_i = 0$, $i = 1, \dots, n$; equality (4.6.2) holds, since Player $A_0$ can always guarantee for $S$ at most the payoff (4.6.2) by allocating to every $B_i \in S$ a zero resource; equality (4.6.3) holds, since a coalition $S$ incorporating $A_0$ can always ensure distribution of the whole resource only among its members.
Let $S$ be an arbitrary coalition containing $A_0$. Denote by $u^S = (u_1^S, \dots, u_n^S)$ the maximizing vector in the nonlinear programming problem (4.6.3) (the condition $u_i^S = 0$ holds for $i : B_i \notin S$). The following expression holds for any coalition $\bar S \subset S$, $\bar S \ne \{A_0\}$, $A_0 \in \bar S$:
$$\sum_{i : B_i \in S} (a_i + c_i) h_i(u_i^S) \ge \sum_{i : B_i \in S} (a_i + c_i) h_i(u_i^{\bar S}) = \sum_{i : B_i \in \bar S} (a_i + c_i) h_i(u_i^{\bar S}) + \sum_{i : B_i \in S \setminus \bar S} (a_i + c_i) h_i(u_i^{\bar S}).$$
Let $S, R \subset N$, $S \cap R = \emptyset$, $A_0 \in S$ and $S \ne \{A_0\}$. Then $A_0 \notin R$. In view of the conditions $a_i \ge 0$, $c_i \ge 0$, $h_i \ge 0$, $i = 1, \dots, n$, we have
where $u^* = u^N$. The vector is an imputation, since the following relationships are satisfied:
1) $\sum_{i=0}^n \alpha_i = \sum_{i=1}^n (a_i + c_i) h_i(u_i^*) = v(N)$;
By Theorem 3.10.1, the necessary and sufficient condition for the imputation $(\alpha_0, \alpha_1, \dots, \alpha_n)$ to belong to the core is that the inequality
$$\sum_{i \in S} \alpha_i \ge v(S) \qquad (4.6.5)$$
holds for every coalition $S$, i.e.
$$\alpha_0 + \sum_{i : B_i \in S} \alpha_i \ge \sum_{i : B_i \in S} (a_i + c_i) h_i(u_i^S).$$
Therefore, the imputation (4.6.4) belongs to the core if the inequality
$$\alpha_0 \ge \sum_{i : B_i \in S} [(a_i + c_i) h_i(u_i^S) - \alpha_i]$$
holds for every coalition $S$ containing $A_0$, i.e. if
$$\alpha_0 \ge \max_{S : A_0 \in S} \sum_{i : B_i \in S} [(a_i + c_i) h_i(u_i^S) - \alpha_i]$$
(different from the definition of a characteristic function adopted in Ch. 3).
4.6.3. The characteristic function of the game can also be constructed in the ordinary way, that is, it can be defined for every coalition $S$ as the value of a zero-sum game between this coalition and the coalition $N \setminus S$ of the other players. We shall now construct the characteristic function exactly in this way. In doing so we slightly generalize the preceding problem by introducing arbitrary payoff functions for the players.
As in the previous case, we assume that center $A_0$ distributes resources among divisions $B_1, \dots, B_n$, which use these resources to manufacture products. The payoffs to the control center $A_0$ and the "production" divisions $B_1, \dots, B_n$ depend on the output of products by $B_1, \dots, B_n$. The vector of resources available to center $A_0$ is denoted by $b$. Center (Player) $A_0$ selects a system of $n$ vectors $u = (u_1, \dots, u_n)$ from the set
$$U = \{u = (u_1, \dots, u_n) : u_k \ge 0,\ u_k \in R^l,\ \sum_{k=1}^n u_k \le b,\ k = 1, \dots, n\}.$$
$$K_0 = \sum_{k=1}^n l_k(x_k),$$
where the term $l_k(x_k)$ is interpreted to mean the payoff to Player $A_0$ due from Player $B_k$. We also assume that $l_k(x_k) \ge 0$ for all $x_k \in B_k(u_k)$ and $l_k(0) = 0$, $h_k(0) = 0$, $k = 1, \dots, n$.
Just as in Sec. 4.5, the hierarchical game of 4.6.3 can be represented as a noncooperative $(n + 1)$-person game in normal form, where the strategies for Player $A_0$ are the vectors $u \in U$, while the strategies for the players $B_k$ are functions from the corresponding sets. Let us construct the characteristic function $v(\cdot)$ for this game following 3.9.2. For each subset $S$ of players, we take $v(S)$ to be the value (if it exists under the conditions of the subsection) of a zero-sum game between coalitions $S$ and $N \setminus S$, in which the payoff to coalition $S$ is determined as the sum of the payoffs to the players in $S$.
Let $N = \{A_0, B_1, \dots, B_n\}$. Then
Note that for all $S \subset \{B_1, \dots, B_n\}$ we have $v(S) = 0$, since Player $A_0$ can always distribute the whole resource $b$ among the members of coalition $N \setminus S$, to which he also belongs, thereby depriving coalition $S$ of resources (i.e. $A_0$ can always set $u_k = 0$ for $k : B_k \in S$, which results in $B_k(0) = \{0\}$ for all $B_k \in S$). Using this line of reasoning we get $v(\{A_0\}) = 0$, since the players $B_1, \dots, B_n$ can always nullify the payoff to center $A_0$ by setting $x_k = 0$ for $k = 1, \dots, n$ (without turning out products). It is apparent that $A_0$ will distribute the whole resource among the members of the coalition when the coalition $S$ contains center $A_0$. This reasoning leads to the following formula for $S : A_0 \in S$.
It can be shown that, under this definition of the characteristic function, the core of the imputation set
$$\{\alpha = (\alpha_0, \alpha_1, \dots, \alpha_n) : \alpha_i \ge 0,\ i = 0, 1, \dots, n,\ \sum_{i=0}^n \alpha_i = v(N)\}$$
is always nonempty.
4.6.4. Hierarchical systems with double subordination are called diamond-shaped (Fig. 4.7). Control of a double-subordination division $C$ depends on the controls of $B_1$ and $B_2$.
Figure 4.7
$$\Gamma = (U, B_1, B_2, C, K_1, K_2, K_3, K_4).$$
4.6.5. We shall now seek a Nash equilibrium in the game $\Gamma$. To this end, we perform additional constructions.
For every fixed pair $(w_1, w_2) \in B_1(u_1) \times B_2(u_2)$, $u \in U$, we denote by $v^*(w_1, w_2)$ a solution to the parametric extremal problem
$$K_1(u^*, w_1^*(\cdot), w_2^*(\cdot), v^*(\cdot)) = \max_{u \in U} h_1(v^*(w_1^*(u_1), w_2^*(u_2))) \ge h_1(v^*(w_1^*(u_1), w_2^*(u_2))) = K_1(u, w_1^*(\cdot), w_2^*(\cdot), v^*(\cdot))$$
for all $u \in U$. Since $w_1^*(u_1), w_2^*(u_2)$ form a Nash equilibrium in the auxiliary game $\Gamma(u_1, u_2)$, the relationships
d) $S = \{B_2, C\}$:
$$v'(S) = \min_{u \in U} \max_{w_2 \in B_2(u_2)} \min_{w_1 \in B_1(u_1)} \max_{v \in C(w_1, w_2)} (h_3(v) + h_4(v));$$
e) $S = \{B_1, B_2, C\}$:
$$v'(S) = \min_{u \in U} \max_{w_1 \in B_1(u_1)} \max_{w_2 \in B_2(u_2)} \max_{v \in C(w_1, w_2)} \sum_{j=2}^{4} h_j(v);$$
f) $S = \{A_0, B_1, C\}$:
$$v'(S) = \max_{u \in U} \max_{w_1 \in B_1(u_1)} \min_{w_2 \in B_2(u_2)} \max_{v \in C(w_1, w_2)} \sum_{j \ne 3} h_j(v);$$
g) $S = \{A_0, B_2, C\}$:
$$v'(S) = \max_{u \in U} \max_{w_2 \in B_2(u_2)} \min_{w_1 \in B_1(u_1)} \max_{v \in C(w_1, w_2)} \sum_{j \ne 2} h_j(v);$$
h) $S = \{A_0, B_1, B_2, C\}$:
$$v'(S) = \max_{u \in U} \max_{w_1 \in B_1(u_1)} \max_{w_2 \in B_2(u_2)} \max_{v \in C(w_1, w_2)} \sum_{j=1}^{4} h_j(v).$$
$$H(1,1,2) = -2, \quad H(2,1,2) = 1,$$
$$H(1,2,1) = 2, \quad H(2,2,1) = 1,$$
$$H(1,2,2) = -5, \quad H(2,2,2) = 5. \qquad (4.7.1)$$
The graph $G = (X, F)$ of the game is depicted in Fig. 4.8. The circles in the graph represent positions in which Player 1 makes a move, whereas the blocks represent positions in which Player 2 makes a move.
If the set $X_1$ is denoted by $X$, the set $X_2$ by $Y$, and the elements of these sets by $x \in X$, $y \in Y$, respectively, then Player 1's strategy $u_1(\cdot)$ is given by the five-dimensional vector $u_1(\cdot) = \{u_1(x_1), u_1(x_2), u_1(x_3), u_1(x_4), u_1(x_5)\}$ prescribing the choice of one of the two numbers $\{1, 2\}$ in each position of the set $X$. Similarly, Player 2's strategy $u_2(\cdot)$ is a two-dimensional vector $u_2(\cdot) = \{u_2(y_1), u_2(y_2)\}$ prescribing the choice of one of the two numbers $\{1, 2\}$ in each of the positions of the set $Y$. Now, in this game Player 1 has 32 strategies and Player 2 has 4 strategies. The corresponding normal form of the game has a $32 \times 4$ matrix which (as follows from the Theorem in 4.2.1) has an equilibrium in pure strategies. It can be seen that the value of this game is 4. Player 1 has four optimal pure strategies: (2,1,1,1,2), (2,1,2,1,2), (2,2,1,1,2), (2,2,2,1,2). Player 2 has two optimal strategies: (1,1), (2,1).
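The stated value 4 can be checked directly: with perfect information the value is the iterated extremum $\max_x \min_y \max_z H(x, y, z)$ over the three moves. The sketch below uses the payoffs (4.7.1) together with $H(1,1,1) = -3$ and $H(2,1,1) = 4$, read off the terminal payoffs of Fig. 4.8:

```python
# Value of the perfect-information game of Example 7 by backward induction:
# Player 1 picks x, Player 2 (knowing x) picks y, Player 1 (knowing x, y)
# picks z; the value is max_x min_y max_z H(x, y, z).

H = {(1, 1, 1): -3, (1, 1, 2): -2, (1, 2, 1): 2, (1, 2, 2): -5,
     (2, 1, 1): 4, (2, 1, 2): 1, (2, 2, 1): 1, (2, 2, 2): 5}

value = max(min(max(H[x, y, z] for z in (1, 2)) for y in (1, 2))
            for x in (1, 2))
print(value)   # 4, as stated in the text
```

The maximizing first move is $x = 2$, consistent with all four optimal strategies of Player 1 beginning with the choice 2.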
Figure 4.8 (terminal payoffs: -3, -2, 2, -5, 4, 1, 1, 5)
Figure 4.9
4.7. Multistage games with incomplete information 225
This game has no equilibrium in pure strategies. The value of the game is 19/7, an optimal mixed strategy for Player 1 is the vector (0, 0, 4/7, 3/7), and an optimal mixed strategy for Player 2 is (4/7, 3/7, 0, 0). The guaranteed payoff to Player 1 is reduced as compared with the one in Example 7. This is due to the degradation of his information state.
It is interesting to note that the game in Example 8 has a $4 \times 4$ matrix, whereas the game in Example 7 has a $32 \times 4$ matrix. The deterioration of the available information thus reduces the size of the payoff matrix and hence facilitates the solution of the game itself. But this contradicts the widespread belief that the deterioration of information complicates decision-making.
By modifying the information conditions we may obtain other variants of the game described in Example 7.
Example 9. Player 1 chooses at the first move a number from the set {1,2}. The second move is made by Player 2 who, without knowing Player 1's choice, chooses a number from the set {1,2}. The third move is then made by Player 1: informed about Player 2's choice and remembering his own choice at the first step, he chooses a number from the set {1,2}. The payoff is determined in the same way as in Example 7 (Fig. 4.10).
Since on the third move the player knows the position he is in, the positions of the third level are enclosed in circles, and the two nodes in which Player 2 makes his move are traced by the dashed line and are included in one information set.
Figure 4.10
Example 10. Player 1 chooses a number from the set {1,2} on the first move. The second move is made by Player 2 without being informed about Player 1's choice. On the third move Player 1 chooses a number from the set {1,2} without knowing Player 2's choice and with no memory of his own choice at the first step. The payoff is determined in the same way as in Example 7 (Fig. 4.11).
Figure 4.11
Here a strategy of Player 1 consists of a pair of numbers $(i, j)$: the $i$-th choice is made at the first step and the $j$-th choice at the third step; a strategy of Player 2 is the choice of a number $j$ at the second step of the game. Now, Player 1 has four strategies
and Player 2 has two strategies. The game in normal form has a $4 \times 2$ matrix:

              1     2
    (1,1)  [ -3     2 ]
    (1,2)  [ -2    -5 ]
    (2,1)  [  4     1 ]
    (2,2)  [  1     5 ]
The value of the game is 19/7; an optimal mixed strategy for Player 1 is (0, 0, 4/7, 3/7), whereas an optimal strategy for Player 2 is (4/7, 3/7).
In this game the value is found to be the same as in Example 8, i.e. it turns out that the deterioration of the information conditions for Player 2 did not improve the position of Player 1. This coincidence is accidental in nature and is attributable to special features of the payoff function.
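The stated solution of Example 10 can be verified by checking the guarantee of each mixed strategy against every pure strategy of the opponent:

```python
# Check of Example 10's solution: the mixes p = (0, 0, 4/7, 3/7) for
# Player 1 and q = (4/7, 3/7) for Player 2 both guarantee the value 19/7.

from fractions import Fraction as F

A = [[-3, 2], [-2, -5], [4, 1], [1, 5]]   # rows: (1,1), (1,2), (2,1), (2,2)
p = [F(0), F(0), F(4, 7), F(3, 7)]
q = [F(4, 7), F(3, 7)]

# Player 1's mix against each pure column: the minimum is his guarantee
col_payoffs = [sum(p[i] * A[i][j] for i in range(4)) for j in range(2)]
# Player 2's mix against each pure row: the maximum is what he concedes
row_payoffs = [sum(q[j] * A[i][j] for j in range(2)) for i in range(4)]

print(min(col_payoffs), max(row_payoffs))   # both equal 19/7
```

Since the guarantee of Player 1's mix equals what Player 2's mix concedes, the pair is a saddle point in mixed strategies and 19/7 is indeed the value.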
Example 11. In the previous examples the players fail to distinguish among positions placed at the same level of the game tree, but they do know the move to be made. It is possible to construct games in which the players' ignorance extends even further.
Let us consider a zero-sum two-person game in which Player 1 is one person, whereas Player 2 is a team of two persons, A and B. All three persons are placed in different rooms and cannot communicate with each other. At the start of the game a mediator comes to Player 1 and suggests that he choose a number from the set {1,2}. If Player 1 chooses 1, the mediator asks A to be the first to make his choice. If, however, Player 1 chooses 2, the mediator asks B to be the first to make his choice. Once these three numbers have been chosen, Player 1 wins an amount $K(x, y, z)$, where $x, y, z$ are the choices made by Player 1 and by the members of Team 2, A and B, respectively. The payoff function $K(x, y, z)$ is defined as follows:
$$K(1,1,1) = 1, \quad K(1,1,2) = 3,$$
$$K(1,2,1) = 7, \quad K(1,2,2) = 9,$$
$$K(2,1,1) = 5, \quad K(2,1,2) = 1,$$
$$K(2,2,1) = 6, \quad K(2,2,2) = 7.$$
From the rules of the game it follows that when a member of the team, A or B, is asked to make his choice, he does not know whether he is choosing at the second or at the third step of the game. The structure of the game is shown in Fig. 4.12.
Now the information sets of Player 2 contain nodes belonging to different levels, which corresponds to ignorance of the number of the move in the game. Here Player 1 has two strategies, whereas Player 2 has four strategies composed of all possible choices by the members of the team, A and B, i.e. his strategies are the pairs (1,1), (1,2), (2,1), (2,2).
In order to understand how the elements of the payoff matrix are determined, consider the situation (2,(2,1)). Since Player 1 has chosen 2, the mediator goes to B,
Figure 4.12
who, in accordance with strategy (2,1), chooses 1. Then the mediator goes to A, who chooses 2. Thus the payoff in situation (2,(2,1)) is $K(2,1,2) = 1$. The payoff matrix for the game in normal form is

           (1,1)  (1,2)  (2,1)  (2,2)
      1  [   1      3      7      9  ]
      2  [   5      6      1      7  ]

The value of the game is 17/5, and optimal mixed strategies for Players 1 and 2, respectively, are (2/5, 3/5) and (3/5, 0, 2/5, 0).
Note that in multistage games with perfect information (see the Theorem in 4.2.1) there exists a Nash equilibrium in the class of pure strategies, and in multistage zero-sum games with perfect information there exists an equilibrium in pure strategies. Yet none of the games with incomplete information discussed in Examples 8-11 has an equilibrium in pure strategies.
4.7.2. We shall now give a formal definition of a multistage game in extensive form.
Definition. [Kuhn (1953)]. An n-person game in extensive form is defined by:
1) Specifying a tree graph $G = (X, F)$ with the initial vertex $x_0$, referred to as the initial position of the game.
2) A partition of the set of all vertices $X$ into $n + 1$ sets $X_1, X_2, \dots, X_n, X_{n+1}$, where the set $X_i$ is called the priority set of the $i$-th player, $i = 1, \dots, n$, and the set $X_{n+1} = \{x : F_x = \emptyset\}$ is called the set of final positions.
3) Specifying a vector function $K(x) = (K_1(x), \dots, K_n(x))$ on the set of final positions $x \in X_{n+1}$; the function $K_i(x)$ is called the payoff to the $i$-th player.
4) A subpartition of each set $X_i$, $i = 1, \dots, n$, into nonoverlapping subsets $X_i^j$, referred to as the information sets of the $i$-th player. Here, for any position of one and the same information set, the set of its subsequent vertices must contain one and the same number of vertices, i.e. for any $x, y \in X_i^j$, $|F_x| = |F_y|$ ($|F_x|$ is the number of elements of the set $F_x$), and no vertex of an information set may follow another vertex of this set, i.e. if $x \in X_i^j$, then there is no other vertex $y \in X_i^j$ such that $y \in F_x$ (see 4.1.2).
4.8. Behavior strategy 229
The definition of a multistage game with perfect information (see 4.1.4) is distinguished from the one given here only by condition 4, where additional partitions of the players' priority sets $X_i$ into information sets are introduced. As may be seen from the above examples, the conceptual meaning of such a partition is that when player $i$ makes his move in a position $x \in X_i$ under incomplete information, he does not know the position $x$ itself, but knows only that this position lies in a certain set $X_i^j \subset X_i$ ($x \in X_i^j$). Condition 4 imposes some restrictions on the players' information sets. The requirement $|F_x| = |F_y|$ for any two vertices of the same information set is introduced to make the vertices $x, y \in X_i^j$ indistinguishable. In fact, with $|F_x| \ne |F_y|$, player $i$ could distinguish between the vertices $x, y \in X_i^j$ by the number of arcs emanating from them. If one information set could contain two vertices $x, y$ such that $y \in F_x$, this would mean that a play of the game could intersect an information set twice; but this in turn is equivalent to the fact that player $i$ has no memory of the number of his move in this play, which can hardly be conceived in an actual play of the game.
Figure 4.13
Suppose that in the game $\Gamma$ all alternatives are enumerated as above. Let $A_k$ be the set of all vertices $x \in X$ having exactly $k$ alternatives, i.e. $A_k = \{x : |F_x| = k\}$. Let $J_i = \{X_i^j : X_i^j \subset X_i\}$ be the set of all information sets of player $i$. By definition, a pure strategy of player $i$ is a function $u_i$ mapping $J_i$ into the set of positive integers so that $u_i(X_i^j) \le k$ if $X_i^j \subset A_k$. We say that the strategy $u_i$ chooses alternative $l$ in position $x \in X_i^j$ if $u_i(X_i^j) = l$, where $l$ is the number of the alternative.
As in 4.1.4, we may show that to each situation $u(\cdot) = (u_1(\cdot), \dots, u_n(\cdot))$ there uniquely corresponds a play $\omega$, and hence the payoff in the final position of this play.
Let $x \in X_{n+1}$ be a final position and let $\omega_x$ be the only path (the graph is a tree) leading from $x_0$ to $x$. The condition that a position $y$ belongs to the path $\omega_x$ will be written as $y \in \omega_x$ or $y \le x$.
Definition. A position $x \in X$ is called possible for $u_i(\cdot)$ if there exists a situation $u(\cdot)$ containing $u_i(\cdot)$ such that the path $\omega$ containing the position $x$ is realized in the situation $u(\cdot)$, i.e. $x \in \omega$. The information set $X_i^j$ is called relevant for $u_i(\cdot)$ if some position $x \in X_i^j$ is possible for $u_i(\cdot)$.
The set of positions possible for $u_i(\cdot)$ is denoted by $\mathrm{Poss}\,u_i(\cdot)$, while the collection of information sets relevant for $u_i(\cdot)$ is denoted by $\mathrm{Rel}\,u_i(\cdot)$.
Lemma. A position $x \in X$ is possible for $u_i(\cdot)$ if and only if $u_i(\cdot)$ chooses alternatives lying on the segment of the path $\omega_x$ from $x_0$ to $x$ in all its information sets intersecting $\omega_x$.
Proof. Let $x \in \mathrm{Poss}\,u_i(\cdot)$. Then there exists a situation $u(\cdot)$ containing $u_i(\cdot)$ such that the path $\omega$ realized in this situation passes through $x$, which exactly means that in all its information sets intersecting the segment of the path $\omega_x$ the strategy $u_i(\cdot)$ chooses alternatives (arcs) belonging to $\omega_x$.
Now let $u_i(\cdot)$ choose all the alternatives for player $i$ in $\omega_x$. In order to prove that $x$ is possible for $u_i(\cdot)$ we need to construct a situation $u(\cdot)$ containing $u_i(\cdot)$ in which the path would pass through $x$. For each player $k \ne i$ we construct a strategy $u_k(\cdot)$ which, in the information sets $X_k^j$ intersecting the segment of the path $\omega_x$, chooses alternatives (arcs) lying on this path and is arbitrary otherwise. Since each information set intersects the path $\omega_x$ only once, this can always be done. In the resulting situation $u(\cdot)$ the path $\omega$ necessarily passes through $x$; hence we have shown that $x \in \mathrm{Poss}\,u_i(\cdot)$.
4.8.2. Mixed strategies in games in extensive form $\Gamma$ are defined in the same way as in 1.4.2 for finite games.
Definition. A probability distribution over the set of pure strategies of player $i$, which places every pure strategy $u_i(\cdot)$ in correspondence with a probability $q_{u_i(\cdot)}$ (for simplicity we write $q_{u_i}$), is called a mixed strategy $\mu_i$ for player $i$.
The situation $\mu = (\mu_1, \dots, \mu_n)$ in mixed strategies determines the probability distribution over all plays (paths) $\omega$ (hence over the final positions $X_{n+1}$ as well) by the formula
$$P_\mu(\omega) = \sum_{u(\cdot)\,:\,\omega \text{ is realized in } u(\cdot)} q_{u_1} \cdots q_{u_n}.$$
The proof of this statement immediately follows from the Lemma in 4.8.1. The mathematical expectation of the payoff $E_i(\mu)$ for player $i$ in the situation $\mu$ is
$$E_i(\mu) = \sum_{\omega} P_\mu(\omega) K_i(\omega).$$
X>(*/,") = i,
where Ak = {x : \FX\ = k).
The numbers $b_i(X_i^j, v)$ can be interpreted as the probabilities of choosing alternative $v$ in the information set $X_i^j \subset A_k$, each position of which contains exactly $k$ alternatives.
Any behavior strategy situation $\beta = (\beta_1, \dots, \beta_n)$ for the $n$ players determines the probability distribution over the plays of the game, and hence over the final positions, as follows:
$$P_\beta(\omega) = \prod_{(X_i^j,\, v)} b_i(X_i^j, v). \qquad (4.8.3)$$
Here the product is taken over all $X_i^j, v$ such that $X_i^j \cap \omega \ne \emptyset$ and the choice at the point $X_i^j \cap \omega$ of the alternative numbered $v$ leads to a position belonging to the path $\omega$.
In what follows it is convenient to interpret the notion of a "path" not only as the set of its component positions, but also as the set of the corresponding alternatives (arcs).
The expected payoff $E_i(\beta)$ in the behavior strategy situation $\beta = (\beta_1, \dots, \beta_n)$ is defined to be the expectation
If $X_i^j \notin \mathrm{Rel}\,\mu_i$, then on the set $X_i^j$ the strategy $\beta_i$ can be defined in an arbitrary way, distinct from (4.8.4). (In the case $X_i^j \notin \mathrm{Rel}\,\mu_i$ the denominator in (4.8.4) goes to zero.) For definiteness, let
$$b_i(X_i^j, v) = \sum_{u_i : u_i(X_i^j) = v} q_{u_i}. \qquad (4.8.5)$$
4.8.6. Lemma. Let $\Gamma$ be a game with perfect recall for all players and let $\omega$ be a play in $\Gamma$. Suppose $x \in X_i^j$ is the last position of the path $\omega$ in which player $i$ makes his move, and suppose he chooses at $x$ the arc $v$. Let
$$T_i(\omega) = \{u_i : X_i^j \in \mathrm{Rel}\,u_i,\ u_i(X_i^j) = v\}.$$
If $\omega$ has no positions from $X_i$, then we denote by $T_i(\omega)$ the set of all pure strategies of player $i$. Then the play $\omega$ is realized only in those situations $u(\cdot) = (u_1(\cdot), \dots, u_n(\cdot))$ for which $u_i \in T_i(\omega)$.
Proof. Sufficiency. It suffices to show that if $u_i \in T_i(\omega)$, then the strategy $u_i$ chooses all the arcs (alternatives) for player $i$ appearing in the play $\omega$ (if player $i$ has a move in $\omega$). However, if $u_i \in T_i(\omega)$ then $X_i^j \in \mathrm{Rel}\,u_i$, and since the game $\Gamma$ has perfect recall, $x \in \mathrm{Poss}\,u_i$ ($x \in \omega$). Thus, by the Lemma in 4.8.1, the strategy $u_i$ chooses all the alternatives for player $i$ appearing in the play $\omega$.
Necessity. Suppose the play $\omega$ is realized in a situation $u(\cdot)$ where $u_i \notin T_i(\omega)$ for some $i$. Since $X_i^j \in \mathrm{Rel}\,u_i$, this means that $u_i(X_i^j) \ne v$. But then the path $\omega$ is not realized. This contradiction completes the proof of the lemma.
4.8.7. Lemma. Let $\Gamma$ be a game with perfect recall for all players. Suppose $v$ is an alternative (arc) in a play $\omega$ incident to $x \in X_i^j$, where $x \in \omega$, and the next position of player $i$ (if any) on the path $\omega$ is $y \in X_i^k$. Consider the sets $S$ and $T$, where
$$S = \{u_i : X_i^j \in \mathrm{Rel}\,u_i,\ u_i(X_i^j) = v\}, \qquad T = \{u_i : X_i^k \in \mathrm{Rel}\,u_i\}.$$
Then $S = T$.
Proof. Let $u_i \in S$. Then $X_i^j \in \mathrm{Rel}\,u_i$, and since $\Gamma$ has perfect recall, $x \in \mathrm{Poss}\,u_i$. By the Lemma in 4.8.1, it follows that the strategy $u_i$ chooses all the arcs incident to player $i$'s positions on the path from $x_0$ to $x$, and moreover $u_i(X_i^j) = v$. Thus $u_i$ chooses all the arcs incident to player $i$'s positions on the path from $x_0$ to $y$, i.e. $y \in \mathrm{Poss}\,u_i$, $X_i^k \in \mathrm{Rel}\,u_i$ and $u_i \in T$.
Let $u_i \in T$. Then $X_i^k \in \mathrm{Rel}\,u_i$, and since $\Gamma$ has perfect recall, $y \in \mathrm{Poss}\,u_i$. But this means that $x \in \mathrm{Poss}\,u_i$ and $u_i(X_i^j) = v$, i.e. $u_i \in S$. This completes the proof of the lemma.
4.8.8. Theorem. Let $\beta$ be a situation in behavior strategies corresponding to a situation in mixed strategies $\mu$ in the game $\Gamma$ (in which all positions have at least two alternatives). Then for
$$E_i(\beta) = E_i(\mu), \quad i = 1, \dots, n,$$
it is necessary and sufficient that $\Gamma$ be a game with perfect recall for all players.
Proof. Sufficiency. Let $\Gamma$ be a game with perfect recall for all players. Fix an arbitrary $\mu$. It suffices to show that $P_\beta(\omega) = P_\mu(\omega)$ for all plays $\omega$. If in $\omega$ there exists a position of player $i$ belonging to an information set that is irrelevant for $\mu_i$, then there is $X_i^j \notin \mathrm{Rel}\,\mu_i$, $X_i^j \cap \omega \ne \emptyset$, such that the equality $b_i(X_i^j, v) = 0$, where $v \in \omega$, holds for the behavior strategy $\beta_i$ corresponding to $\mu_i$. Hence we have $P_\beta(\omega) = 0$. The validity of the relationship $P_\mu(\omega) = 0$ in this case is obvious.
We now assume that all the information sets of the $i$-th player through which the play $\omega$ passes are relevant for $\mu_i$, $i = 1, 2, \dots, n$. Suppose player $i$ in the play $\omega$ makes his successive moves in the positions belonging to the sets $X_i^1, \dots, X_i^s$ and chooses in the set $X_i^j$ the alternative $v_j$, $j = 1, \dots, s$. Then, by formula (4.8.4) and Lemma 4.8.7, we have
$$\prod_{j=1}^{s} b_i(X_i^j, v_j) = \sum_{u_i \in T_i(\omega)} q_{u_i}.$$
Indeed, since in the play $\omega$ player $i$ makes his first move from the set $X_i^1$, this set is relevant for all $u_i(\cdot)$; therefore the denominator in formula (4.8.4) for $b_i(X_i^1, v_1)$ is equal to 1. Further, by Lemma 4.8.7, the numerator of $b_i(X_i^j, v_j)$ in formula (4.8.4) is equal to the denominator of $b_i(X_i^{j+1}, v_{j+1})$, $j = 1, \dots, s - 1$. By formula (4.8.3), we finally get
$$P_\beta(\omega) = \prod_{i=1}^{n} \sum_{u_i \in T_i(\omega)} q_{u_i},$$
where $T_i(\omega)$ is determined in Lemma 4.8.6.
By Lemma 4.8.6,
$$P_\mu(\omega) = \sum_{u(\cdot)\,:\,u_i \in T_i(\omega),\ i=1,\dots,n} q_{u_1} \cdots q_{u_n} = \prod_{i=1}^{n} \sum_{u_i \in T_i(\omega)} q_{u_i},$$
i.e. $P_\mu(\omega) = P_\beta(\omega)$. This proves the sufficiency part of the theorem.
Necessity. Suppose $\Gamma$ is not a game with perfect recall for all players. Then there exist a player $i$, a strategy $u_i$, an information set $X_i^j \in \mathrm{Rel}\,u_i$ and two positions $x, y \in X_i^j$ such that $x \in \mathrm{Poss}\,u_i$, $y \notin \mathrm{Poss}\,u_i$. Let $u_i'$ be a strategy of player $i$ for which $y \in \mathrm{Poss}\,u_i'$, and let $\omega$ be the corresponding play passing through $y$ in the situation $u'$. Denote by $\mu_i$ a mixed strategy of player $i$ which prescribes with probability 1/2 the choice of strategy $u_i$ or $u_i'$. Then $P_{u'\|\mu_i}(\omega) = 1/2$ (here $u'\|\mu_i$ is the situation in which the pure strategy $u_i'$ is replaced by the mixed strategy $\mu_i$). From the condition $y \notin \mathrm{Poss}\,u_i$ it follows that the path $\bar\omega$ realized in the situation $u'\|u_i$ does not pass through $y$. This means that there exists $X_i^k$ such that $X_i^k \cap \omega = X_i^k \cap \bar\omega \ne \emptyset$ and $u_i(X_i^k) \ne u_i'(X_i^k)$. Hence, in particular, it follows that $X_i^k \in \mathrm{Rel}\,u_i$, $X_i^k \in \mathrm{Rel}\,u_i'$. Let $\beta_i$ be the behavior strategy corresponding to $\mu_i$. Then $b_i(X_i^k, u_i'(X_i^k)) = 1/2$. We may assume without loss of generality that $u_i(X_i^j) \ne u_i'(X_i^j)$. Then $b_i(X_i^j, u_i'(X_i^j)) = 1/2$. Denote by $\beta$ the situation in behavior strategies corresponding to the mixed strategy situation $u'\|\mu_i$. Then $P_\beta(\omega) \le 1/4$, whereas $P_{u'\|\mu_i}(\omega) = 1/2$. This completes the proof of the theorem.
From Theorem 4.8.8 it follows, in particular, that in order to find an equilibrium in games with perfect recall it is sufficient to restrict ourselves to the class of behavior strategies.
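The correspondence between mixed and behavior strategies can be made concrete with a small numerical sketch (the one-player game, the strategy names and the probabilities below are invented for the illustration and are not from the book): the behavior probability of a choice at an information set is the $\mu$-mass of the pure strategies that reach the set and make that choice, divided by the mass of the strategies that reach the set.

```python
from fractions import Fraction

# Pure strategies of one player: a choice at X1 ('a' or 'b') and at X2 ('c' or 'd').
# X2 is reached only after 'a' at X1, so the player has perfect recall.
mu = {('a', 'c'): Fraction(1, 5),   # a mixed strategy mu over the pure strategies
      ('a', 'd'): Fraction(3, 10),
      ('b', 'c'): Fraction(1, 2)}

def behavior(mu, info_set, choice):
    """b(X, v): mass of strategies reaching X and choosing v there,
    divided by the mass of strategies reaching X (formula (4.8.4)-style)."""
    if info_set == 'X1':                      # X1 is reached by every strategy
        reach = [u for u in mu]
    else:                                     # X2 is reached iff 'a' was chosen at X1
        reach = [u for u in mu if u[0] == 'a']
    num = sum(mu[u] for u in reach if (u[0] if info_set == 'X1' else u[1]) == choice)
    den = sum(mu[u] for u in reach)
    return num / den

# Realization probability of the play "a at X1, then c at X2":
p_mixed = mu[('a', 'c')]
p_behav = behavior(mu, 'X1', 'a') * behavior(mu, 'X2', 'c')
print(p_mixed, p_behav)   # 1/5 1/5
```

Under perfect recall the product of behavior probabilities along a play reproduces the mixed-strategy realization probability, which is exactly the content of the sufficiency part of Theorem 4.8.8.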
4.9. Functional equations for simultaneous multistage games

When information sets have a simple structure, this theorem provides a basis for the derivation of functional equations for the value of the game and of methods for finding optimal strategies based on these equations. The simplest games with perfect recall, apart from games with perfect information, are the so-called repeated zero-sum games. We shall derive a functional equation for the value of such games and consider some popular examples [Diubin and Suzdal (1981), Owen (1968)] in which these equations can be solved.
4.9.1. Conceptually, a repeated game is a multistage zero-sum game where at each step Players 1 and 2 choose their actions simultaneously, i.e. without being informed about the opponent's choice at that moment. After the choices have been made they become known to both players, the players again make their choices simultaneously, and so on.
Such a game can be represented with the help of a graph which may have one of
the two representations a) or b) in Fig. 4.14.
Figure 4.14
The graph represents an alternating game with an even number of moves, where the information sets of the player who makes the first move are single-element, while the information sets of the other player are two-element. In such a game $\Gamma$ both players have perfect recall. Therefore, in this game, by Theorem 4.8.8, the search for an equilibrium may be restricted to the class of behavior strategies.
For definiteness, we assume that the first move in $\Gamma$ is made by Player 1 and that for every $x \in X_1$ there is a subgame $\Gamma_x$ which has the same structure as the game $\Gamma$. The normal form of any finite-stage zero-sum game with incomplete information is a matrix game, i.e. a zero-sum game with a finite number of strategies; therefore in all subgames $\Gamma_x$, $x \in X_1$ (including the game $\Gamma = \Gamma_{x_0}$), there exists an equilibrium in the class of mixed strategies. By Theorem 4.8.8, such an equilibrium also exists in the class of behavior strategies, and the values of the game (i.e. the values of the payoff function in a mixed strategy equilibrium or in a behavior strategy equilibrium) are equal.
Denote the value of the game $\Gamma_x$ by $v(x)$, $x \in X_1$, and set up functional equations for $v(x)$.
For each $x \in X_1$ the next position $x'$ (if any) in which Player 1 makes his move belongs to the set $F_x^2$. Position $x'$ is realized as a result of two consecutive choices: first, by Player 1, of an arc incident to the vertex $x$, and then, by Player 2, of an arc in the positions $y \in F_x$ forming an information set of Player 2. Hence we may say that the position $x'$ results from the mapping $T_x$ depending on the choices $\alpha$, $\beta$ of Players 1 and 2, i.e.

$$x' = T_x(\alpha, \beta).$$
Since the number of different alternatives $\alpha$ and $\beta$ is finite, for every $x \in X_1$ we may consider the matrix game with the payoff matrix $A_x = \{v(T_x(\alpha, \beta))\}$. Let $b_1^*(x) = \{b_1^*(x, \alpha)\}$ and $b_2^*(x) = \{b_2^*(x, \beta)\}$ be optimal mixed strategies in the game with the matrix $A_x$. Then we have the following theorem on the structure of optimal strategies in the game $\Gamma_x$.
Theorem. In the game $\Gamma$ an optimal behavior strategy for Player 1 at the point $x$ (each information set of Player 1 in the game $\Gamma$ consists of one position $x \in X_1$) assigns probability to each alternative $\alpha$ in accordance with an optimal mixed strategy of Player 1 in the matrix game $A_x = \{v(T_x(\alpha, \beta))\}$, that is,

$$b_1(x, \alpha) = b_1^*(x, \alpha).$$

An optimal behavior strategy $\{b_2(X_2, \beta)\}$ of Player 2 in the game $\Gamma$ assigns probability to each alternative $\beta$ in accordance with an optimal mixed strategy of Player 2 in the game with the matrix $A_x$, i.e.

$$b_2(X_2, \beta) = b_2^*(x, \beta),$$

where $x = F_y^{-1}$ if $y \in X_2$.
The value of the game satisfies the following functional equation:

$$v(x) = \operatorname{Val}\{v(T_x(\alpha, \beta))\}, \quad x \in X_1, \qquad (4.9.1)$$

with the initial condition

$$v(x)\big|_{x \in X_3} = H(x). \qquad (4.9.2)$$

(Here $\operatorname{Val} A$ is the value of the game with matrix $A$.)

The proof is carried out by induction and is completely analogous to the proof of Theorem 4.2.1.
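Solving the functional equation (4.9.1) requires computing $\operatorname{Val} A_x$ at each step. For the $2 \times 2$ matrices arising in the examples below, the value can be obtained in closed form; the following helper is a sketch (not from the book): first look for a saddle point in pure strategies, otherwise use the completely mixed formula from 1.9.1.

```python
from fractions import Fraction

def val2x2(A):
    """Value of the zero-sum game with 2x2 matrix A = [[a, b], [c, d]].
    If maximin == minimax there is a saddle point in pure strategies;
    otherwise the game is completely mixed and
    Val A = (a*d - b*c) / (a + d - b - c)."""
    (a, b), (c, d) = A
    maximin = max(min(a, b), min(c, d))       # best guaranteed row payoff
    minimax = min(max(a, c), max(b, d))       # best guaranteed column payoff
    if maximin == minimax:                    # saddle point in pure strategies
        return maximin
    return Fraction(a * d - b * c, a + d - b - c)

print(val2x2([[-1, 1], [1, 0]]))   # matrix (4.9.4) with v_1 = 0: prints 1/3
```

This is exactly the computation carried out repeatedly when iterating (4.9.1) backwards from the terminal positions.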
4.9.2. Example 12. (Game of inspection.) [Diubin and Suzdal (1981)]. Player E (Violator) wishes to take a wrongful action. There are $N$ periods of time during which this action can be performed. Player P (Inspector) wishes to prevent this action, but can perform only one inspection during any one of these periods. The payoff to Player E is 1 if the wrongful action remains undetected after it has been performed, and is $-1$ if the violator has been detained (this is possible when he chooses for his action the same period of time as the inspector chooses for his inspection); the payoff is zero if the violator takes no action. Denote this $N$-step game by $\Gamma_N$.
Each player has two alternatives during the first period (at the 1st step): Player E may or may not take the action; Player P may or may not perform the inspection. If Player E acts and Player P inspects, then the game terminates and the payoff is $-1$.
4.9. Functional equations for simultaneous multistage games 237
If Player E acts while Player P fails to inspect, the game terminates and the payoff is 1. If Player E does not act while Player P inspects, then Player E may take the action during the next period of time (assuming that $N > 1$) and the payoff will also be 1. If Player E does not act and Player P does not inspect, they pass to the next step, which differs from the previous one only in that fewer periods are left before the end of the game, i.e. they pass to the subgame $\Gamma_{N-1}$. Therefore, the game matrix for the 1st step is as follows:

$$\begin{pmatrix} -1 & 1 \\ 1 & v_{N-1} \end{pmatrix}. \qquad (4.9.3)$$
Equation (4.9.1) then becomes

$$v_N = \operatorname{Val}\begin{pmatrix} -1 & 1 \\ 1 & v_{N-1} \end{pmatrix}. \qquad (4.9.4)$$
Here $v(x)$ is the same for all game positions at the same level and hence depends only on the number of periods until the end of the game. For this reason we write $v_N$ in place of $v(x)$. In what follows it will be shown that $v_{N-1} < 1$; hence the matrix in (4.9.4) does not have a saddle point, i.e. the game with matrix (4.9.4) is completely mixed. From this (see 1.9.1) we obtain the recursive equation

$$v_N = \frac{v_{N-1} + 1}{-v_{N-1} + 3}, \qquad (4.9.5)$$
which together with the initial condition

$$v_1 = 0 \qquad (4.9.6)$$

yields

$$v_N = \operatorname{Val}\begin{pmatrix} -1 & 1 \\ 1 & (N-2)/N \end{pmatrix} = \frac{N-1}{N+1},$$

and the optimal behavior strategies are

$$b_1 = \left(\frac{1}{N+1}, \frac{N}{N+1}\right), \quad b_2 = \left(\frac{1}{N+1}, \frac{N}{N+1}\right).$$
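The recursion (4.9.5) with the initial condition (4.9.6) is easy to iterate numerically; the following sketch (not from the book) confirms the closed form $v_N = (N-1)/(N+1)$ for small $N$:

```python
from fractions import Fraction

def inspection_value(N):
    """Iterate the recursion (4.9.5): v_N = (v_{N-1} + 1) / (-v_{N-1} + 3),
    starting from the initial condition v_1 = 0 (4.9.6)."""
    v = Fraction(0)
    for _ in range(N - 1):
        v = (v + 1) / (-v + 3)
    return v

for N in range(1, 8):
    # compare with the closed-form solution (N - 1) / (N + 1)
    assert inspection_value(N) == Fraction(N - 1, N + 1)
print(inspection_value(7))   # prints 3/4
```

Exact rational arithmetic makes the comparison with the closed form trustworthy, with no floating-point tolerance involved.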
Example 13. (Game-theoretic features of the optimal use of a resource.) Suppose that initially Players 1 and 2 have respectively $r$ and $R - r$ units of some resource and two pure strategies each. We also assume that if the players choose the same pure strategies, then Player 2's resource is reduced by a unit; if, however, the players choose different pure strategies, then Player 1's resource is reduced by a unit. The game terminates after the resource of one of the players has become zero. The payoff to Player 1 is 1 if the resource of Player 2 is zero, and $-1$ if his own resource is zero.

Denote by $\Gamma_{k,l}$ the multistage game in which Player 1 has $k$ ($k = 1, 2, \ldots, r$) units and Player 2 has $l$ ($l = 1, \ldots, R - r$) units of resource. Then
$$\operatorname{Val}\Gamma_{k,l} = \operatorname{Val}\begin{pmatrix} \operatorname{Val}\Gamma_{k,l-1} & \operatorname{Val}\Gamma_{k-1,l} \\ \operatorname{Val}\Gamma_{k-1,l} & \operatorname{Val}\Gamma_{k,l-1} \end{pmatrix}, \quad \operatorname{Val}\Gamma_{k,0} = 1, \quad \operatorname{Val}\Gamma_{0,l} = -1,$$

so that

$$\Gamma_{1,1} = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}.$$
The game $\Gamma_{1,1}$ is symmetric, its value $v_{1,1}$ is zero, and the optimal strategies of the players coincide and are equal to $(1/2, 1/2)$.
At the 2nd step from the end, i.e. when the players are left with 3 units of resource in total, one of the two matrix games is played: $\Gamma_{1,2}$ or $\Gamma_{2,1}$. In this case

$$v_{1,2} = \operatorname{Val}\Gamma_{1,2} = \operatorname{Val}\begin{pmatrix} v_{1,1} & -1 \\ -1 & v_{1,1} \end{pmatrix} = -\frac{1}{2},$$

$$v_{2,1} = \operatorname{Val}\Gamma_{2,1} = \operatorname{Val}\begin{pmatrix} 1 & v_{1,1} \\ v_{1,1} & 1 \end{pmatrix} = \frac{1}{2}.$$
At the 3rd step from the end (i.e. when the players have a total of 4 units of resource) one of the following three games is played: $\Gamma_{1,3}$, $\Gamma_{2,2}$, $\Gamma_{3,1}$. Continuing in this way, we obtain in the general case

$$v_{r,R-r} = \operatorname{Val}\Gamma_{r,R-r} = \operatorname{Val}\begin{pmatrix} v_{r,R-r-1} & v_{r-1,R-r} \\ v_{r-1,R-r} & v_{r,R-r-1} \end{pmatrix} = \frac{v_{r,R-r-1} + v_{r-1,R-r}}{2};$$

the optimal behavior strategies of the players at each step coincide and are equal to $(1/2, 1/2)$.
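Reading the recursion for $\Gamma_{k,l}$ as stated, with diagonal entries $v_{k,l-1}$ and off-diagonal entries $v_{k-1,l}$, the stage matrix is symmetric, so its value is the average of the two entries and $(1/2, 1/2)$ is optimal for both players. A short memoized sketch (not from the book) reproduces the values computed above:

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def v(k, l):
    """Value of Gamma_{k,l}.  Boundary: Player 2 ruined -> +1, Player 1 ruined -> -1.
    The stage matrix [[v(k, l-1), v(k-1, l)], [v(k-1, l), v(k, l-1)]] is symmetric,
    so its value is the average of the two distinct entries."""
    if l == 0:
        return Fraction(1)
    if k == 0:
        return Fraction(-1)
    return Fraction(v(k, l - 1) + v(k - 1, l), 2)

print(v(1, 1), v(1, 2), v(2, 1))   # prints 0 -1/2 1/2, as computed in the text
```

Memoization makes the double recursion linear in $k \cdot l$, so values for large initial stocks are cheap to tabulate.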
Example 14. This jocular game is played by two teams: Player 1 ($m_1$ women and $m_2$ cats) and Player 2 ($n_1$ mice and $n_2$ men). At each step each team chooses its representative. One of the two chosen representatives is "removed" by the following rule: woman "removes" man; man "removes" cat; mouse "removes" woman; cat "removes" mouse. The game is continued until only players of one type remain in one of the groups. When a group has nothing to choose, the other group evidently wins.

Denote the value of the original game by $v(m_1, m_2, n_1, n_2)$. Then

$$v(m_1, m_2, n_1, n_2) = \operatorname{Val}\begin{pmatrix} v(m_1 - 1, m_2, n_1, n_2) & v(m_1, m_2, n_1, n_2 - 1) \\ v(m_1, m_2, n_1 - 1, n_2) & v(m_1, m_2 - 1, n_1, n_2) \end{pmatrix}.$$

It can be shown that this game is completely mixed. By Theorem 4.9.1, we have
$(m^T)^T = m$ for all $m$.
For every information set $u$ there exists an information set $u^T$ such that every alternative at $u$ is mapped onto a choice at $u^T$; for every endpoint $x \in X_{n+1}$ there exists an endpoint $x^T \in X_{n+1}$ such that if $x$ is reached by the sequence of moves $m_1, m_2, \ldots, m_n$, then $x^T$ is reached by (a permutation of) $m_1^T, m_2^T, \ldots, m_n^T$, and the payoffs satisfy $H_1(x) = H_2(x^T)$ for every pair of endpoints $x \in X_{n+1}$, $x^T \in X_{n+1}$.
A symmetric game in extensive form is a pair $(\Gamma, T)$, where $\Gamma$ is a game in extensive form and $T$ is a symmetry of $\Gamma$. If $b$ is a behavior strategy of Player 1 in $(\Gamma, T)$, then the symmetric image of $b$ is the behavior strategy $b^T$ of Player 2 defined by

$$b^T(m) = b(m^T) \quad (u \in U_1, \ m \in M_u).$$

If $b_1, b_2$ are behavior strategies of Player 1, then the probability that the endpoint $x$ is reached when $(b_1, b_2^T)$ is played is equal to the probability that $x^T$ is reached when $(b_2, b_1^T)$ is played. Therefore the expected payoff to Player 1 when $(b_1, b_2^T)$ is played is equal to Player 2's expected payoff when $(b_2, b_1^T)$ is played:

$$E_1(b_1, b_2^T) = E_2(b_2, b_1^T).$$
This equation defines the symmetric normal form of $(\Gamma, T)$ if restricted to the pure strategies. Following van Damme (1991), define a direct ESS of $(\Gamma, T)$ as a behavior strategy $\bar{b}$ of Player 1 that satisfies

$$E_1(\bar{b}, \bar{b}^T) = \max_b E_1(b, \bar{b}^T).$$
Figure 4.15

Consider the situations $u(\cdot) = (H, H, D, H, D)$ and $v(\cdot) = (H, H, D, D, D)$,
for which the payoff (in pure strategies the expected payoff $E$ coincides with the payoff function $K$) in the situation $(v(\cdot), u(\cdot))$, $K(v(\cdot), u(\cdot))$, is equal to the payoff in the situation $(u(\cdot), u(\cdot))$, $K(u(\cdot), u(\cdot))$. If

$$K(v(\cdot), u(\cdot)) = K(u(\cdot), u(\cdot)),$$

then

$$K(v(\cdot), v(\cdot)) < K(u(\cdot), v(\cdot)).$$
4.10. Cooperative multistage games with complete information

If $h_i(x) = 0$, $x \in X_i$, $i = 1, \ldots, n$, we have exactly the game defined in form (4.1.1). As was done in classical cooperative game theory (see Chap. 3), we suppose that before starting the game the players agree to choose the n-tuple of strategies

$$\bar{u}(\cdot) = (\bar{u}_1(\cdot), \ldots, \bar{u}_i(\cdot), \ldots, \bar{u}_n(\cdot))$$

which maximizes the sum of the payoffs of the players. If $\bar{z} = (\bar{z}_0, \ldots, \bar{z}_k, \ldots, \bar{z}_l)$, $\bar{z}_l \in X_{n+1}$, is the path (trajectory) realized in the situation $\bar{u}(\cdot) = (\bar{u}_1, \ldots, \bar{u}_i, \ldots, \bar{u}_n)$, then by the definition of $\bar{u}(\cdot)$ we have

$$V(N) = \sum_{i=1}^{n} \sum_{k=0}^{l} h_i(\bar{z}_k), \quad N = \{1, \ldots, n\},$$

and for $S_1 \subset N$, $S_2 \subset N$, $S_1 \cap S_2 = \emptyset$,

$$V(S_1 \cup S_2) \ge V(S_1) + V(S_2),$$
4.10. Cooperative multistage games with complete information 243
$$V(\emptyset) = 0.$$
If the characteristic function is defined, then we can define the set of imputations $C$, the core

$$M = \Big\{\xi = (\xi_1, \ldots, \xi_n) : \sum_{i \in S} \xi_i \ge V(S), \ S \subset N\Big\} \subset C,$$

the NM-solution, the Shapley value, and other optimality principles of classical game theory. In what follows we shall denote by $M \subset C$ any one of these optimality principles.
Suppose that at the beginning of the game the players agree to use the optimality principle $M \subset C$ as the basis for the selection of the "optimal" imputation $\xi \in M$. This means that, playing cooperatively by choosing the strategies maximizing the common payoff, each one of them expects to get the payoff $\xi_i$ from the optimal imputation $\xi \in M$ after the end of the game (after the maximal common payoff $V(N)$ is actually earned by the players).
But when the game $\Gamma$ actually develops along the "optimal" trajectory $\bar{z} = (\bar{z}_0, \ldots, \bar{z}_k, \ldots, \bar{z}_l)$, at each vertex $\bar{z}_k$ the players find themselves in a new multistage game with complete information $\Gamma_{\bar{z}_k}$, $k = 0, \ldots, l$, which is the subgame of the original game $\Gamma$ starting from $\bar{z}_k$ with the corresponding payoffs. It is important to mention that for the problem (4.10.1) the Bellman optimality principle holds, and the part $\bar{z}^k = (\bar{z}_k, \ldots, \bar{z}_j, \ldots, \bar{z}_l)$ of the trajectory $\bar{z}$ starting from $\bar{z}_k$ maximizes the sum of the payoffs in the subgame $\Gamma_{\bar{z}_k}$.
At the same time, at the beginning of the game $\Gamma = \Gamma(x_0) = \Gamma(\bar{z}_0)$ player $i$ was oriented to get the payoff $\xi_i$, the $i$-th component of the "optimal" imputation $\xi \in M \subset C$. From this it follows that in the subgame $\Gamma_{\bar{z}_k}$ he expects to get the payoff equal to

$$\xi_i^k = \xi_i - \sum_{j=0}^{k-1} h_i(\bar{z}_j), \quad i = 1, \ldots, n, \qquad (4.10.4)$$

and then the question arises whether the new vector $\xi^k = (\xi_1^k, \ldots, \xi_i^k, \ldots, \xi_n^k)$ remains optimal in the same sense in the subgame $\Gamma_{\bar{z}_k}$ as the vector $\xi$ was in the game $\Gamma(\bar{z}_0)$. If this is not the case, it means that the players in the subgame $\Gamma_{\bar{z}_k}$ will not orient themselves on the same optimality principle as in the game $\Gamma(\bar{z}_0)$, which may force them to abandon cooperation by changing the chosen cooperative strategies $\bar{u}_i(\cdot)$, $i = 1, \ldots, n$, and thus changing the optimal trajectory $\bar{z}^k$ in the subgame $\Gamma_{\bar{z}_k}$. We now try to formalize this reasoning.
Introduce in the subgame $\Gamma_{\bar{z}_k}$, $k = 1, \ldots, l$, the characteristic function $V(S; \bar{z}_k)$, $S \subset N$, in the same manner as was done in the game $\Gamma = \Gamma(\bar{z}_0)$. Based on the characteristic function $V(S; \bar{z}_k)$ we can introduce the set of imputations $C(\bar{z}_k)$, the NM-solution, the Shapley value and other optimality principles of classical game theory. Denote by $M(\bar{z}_k) \subset C(\bar{z}_k)$ the optimality principle $M \subset C$ (which was selected by the players in the game $\Gamma(\bar{z}_0)$) considered in the subgame $\Gamma_{\bar{z}_k}$.
If we suppose that the players in the game $\Gamma(\bar{z}_0)$, when moving along the optimal trajectory $(\bar{z}_0, \ldots, \bar{z}_k, \ldots, \bar{z}_l)$, follow the same ideology of optimal behavior, then the vector $\xi^k$ must belong to the set $M(\bar{z}_k)$, the corresponding optimality principle in the cooperative subgame $\Gamma_{\bar{z}_k}$, $k = 0, \ldots, l$.
It is clearly seen that it is very difficult to find games and corresponding optimality principles for which this condition is satisfied. We illustrate this with the following example.
Suppose that in the game $\Gamma$ we have $h_i(z) \neq 0$ only for $z \in X_{n+1}$ (i.e. $\Gamma$ is a game with terminal payoffs, as in Sec. 4.1). Then the last condition would mean that

$$\xi = \xi^k \in M(\bar{z}_k), \quad k = 0, \ldots, l,$$

which gives us

$$\xi \in \bigcap_{k=0}^{l} M(\bar{z}_k). \qquad (4.10.5)$$
For $k = l$ we shall have

$$\xi \in M(\bar{z}_l).$$

But $M(\bar{z}_l) = C(\bar{z}_l) = \{h(\bar{z}_l)\}$. And this condition has to be valid for all imputations of the set $M(\bar{z}_0)$ and for all optimality principles $M(\bar{z}_0) \subset C(\bar{z}_0)$, which means that in the cooperative game with terminal payoffs the only reasonable optimality principle is the payoff vector obtained at the end point of the trajectory in the game $\Gamma(\bar{z}_0)$. At the same time, the simplest examples show that the intersection (4.10.5), except in "dummy" cases, is void for games with terminal payoffs.
How can this difficulty be overcome? A plausible way out is to introduce a special rule of payments (a stage salary) at each stage of the game, in such a way that the payments at each stage do not exceed the common amount earned by the players at this stage, and the payments received by the players starting from the stage $k$ (in the subgame $\Gamma_{\bar{z}_k}$) belong to the same optimality principle as the imputation on which the players agreed in the game $\Gamma_{\bar{z}_0}$ at the beginning of the game. Whether this is possible or not we shall now consider.
Introduce the notion of the imputation distribution procedure (IDP).

Definition. Suppose that $\xi = (\xi_1, \ldots, \xi_i, \ldots, \xi_n) \in M(\bar{z}_0)$. Any matrix $\beta = \{\beta_i^k\}$, $i = 1, \ldots, n$, $k = 0, \ldots, l$, such that

$$\xi_i = \sum_{k=0}^{l} \beta_i^k, \qquad \beta(k) = \xi - \sum_{j=0}^{k-1} \beta^j \in M(\bar{z}_k) \subset M(\bar{z}_0), \quad k = 0, \ldots, l,$$

is called an imputation distribution procedure.
From the definition it follows that

$$\beta_i^0 = \xi_i \, \frac{\sum_{j=1}^{n} h_j(\bar{z}_0)}{V(N; \bar{z}_0)}, \quad \xi \in C(\bar{z}_0),$$

$$\beta_i^1 = \xi_i^1 \, \frac{\sum_{j=1}^{n} h_j(\bar{z}_1)}{V(N; \bar{z}_1)}, \quad \xi^1 \in C(\bar{z}_1),$$

and, in general,

$$\beta_i^k = \xi_i^k \, \frac{\sum_{j=1}^{n} h_j(\bar{z}_k)}{V(N; \bar{z}_k)}, \quad \xi^k \in C(\bar{z}_k). \qquad (4.10.10)$$
4.11. Voting the directorial council 247
$$\beta(k) \in C(\bar{z}_k) \subset C(\bar{z}_0),$$

$$\sum_{i=1}^{n} \beta_i^k = \sum_{i=1}^{n} h_i(\bar{z}_k), \quad k = 0, \ldots, l,$$

and thus

$$\sum_{i=1}^{n} \sum_{j=0}^{k} \beta_i^j = \sum_{i=1}^{n} \sum_{j=0}^{k} h_i(\bar{z}_j), \qquad (4.10.11)$$

which is the actual amount to be divided between the players on the first $k + 1$ stages and which, as is seen from formula (4.10.11), is exactly equal to the amount earned by them on these stages.
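The bookkeeping in (4.10.11) can be illustrated with the simplest proportional choice of an IDP (a sketch: the stage payoffs and the imputation below are invented for the illustration, and normalizing by the total $V(N)$ is a simplification of the subgame-wise normalization in (4.10.10)). Distributing each stage's earnings in the proportions of the imputation makes the row sums equal to $\xi_i$ and the column sums equal to the stage earnings.

```python
from fractions import Fraction

# Stage payoffs h_i(z_k) along a cooperative trajectory (rows: players, cols: stages).
h = [[4, 0, 2],      # player 1
     [0, 2, 4]]      # player 2
V_N = sum(map(sum, h))                       # V(N) = total cooperative payoff = 12
xi = [Fraction(7), Fraction(5)]              # an agreed imputation, xi_1 + xi_2 = V(N)

stage_total = [sum(row[k] for row in h) for k in range(3)]
beta = [[x * t / V_N for t in stage_total] for x in xi]   # proportional IDP

# Row sums reproduce the imputation; column sums equal the stage earnings (4.10.11).
assert [sum(row) for row in beta] == xi
assert [sum(col) for col in zip(*beta)] == stage_total
print([str(b) for b in beta[0]])    # prints ['7/3', '7/6', '7/2']
```

With this choice nothing is paid out that has not yet been earned, which is exactly the constraint motivating the IDP construction.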
$$\xi_i = \frac{K}{a(S)}, \quad i \in S; \qquad (4.11.2)$$

for $i \notin S$, $\xi_i = 0$.
The problem now is how the voters should vote, what is the optimal size of the DC, and what is its optimal membership.

To solve the problem we shall construct a game-theoretic model and propose two different approaches, both of which lead to a Nash equilibrium in a specially constructed multistage game with complete information.
4.11.1. Simultaneous n-person voting game.

Consider the sets $S$ (coalitions), $S \subset N$, for which the following condition is satisfied:

$$a(S) > \frac{a(N)}{2}. \qquad (4.11.3)$$

How should the members of $S$ behave in order to guarantee the forming of the DC from the candidates of $S$?
Suppose that every company $A_i$ decides how each of its members (voters) votes (the members of $A_i$ can always form a voting coalition). In this case, if they say "yes" to the candidates from $A_i$, $i \in S$, and "no" to the other candidates, then the DC from $S$ will indeed be elected.
Consider a simultaneous n-person voting game $\Gamma$. The number of players in $\Gamma$ is equal to $n = a(N)$ (each voter is considered as a player). Every player $l$ has $2^n$ strategies; the strategy set of the voter $l$ consists of all possible vectors of the form $a^l = \{a_1^l, \ldots, a_i^l, \ldots, a_n^l\}$, where $a_i^l$ can take one of the values "yes" or "no". In the situation $a = (a^1, \ldots, a^l, \ldots, a^n)$ the result of the voting is defined in the following way: if the number of "yes" votes in the players' strategies in the $i$-th place is more than $\frac{a(N)}{2}$, then the candidate from the company $A_i$ is elected to the directorial council; in the opposite case this candidate loses the vote. Suppose that in the situation $a$ the council $B = \{b_i, i \in S\}$, where $S$ is an admissible coalition, is elected. Then each voter $l$ from the company $A_i$, $i \in S$, wins the amount $K/a(S)$; the other voters' payoffs are equal to zero. If $S$ is not an admissible coalition, then we
suppose that $k_l(a) = 0$ for all $l$. We now construct a Nash equilibrium in $\Gamma$. Suppose that the set $\bar{S}$ is defined by (4.11.3) and (4.11.4). If $l \in A_i$, $i \in \bar{S}$, then in the strategy $\bar{a}^l$ of the player (voter) $l$

$$\bar{a}_i^l = \text{"yes"}, \quad \text{if } i \in \bar{S}.$$

Then for all $l$

$$k_l(\bar{a}) = \frac{K}{a(\bar{S})}, \quad l \in A_i, \ i \in \bar{S}, \qquad (4.11.6)$$

$$k_l(\bar{a}) = 0, \quad l \in A_i, \ i \notin \bar{S}. \qquad (4.11.7)$$
We shall show that

$$k_l(\bar{a} \| a^l) \le k_l(\bar{a}), \quad l = 1, \ldots, n = a(N). \qquad (4.11.8)$$
Suppose $l \in A_k$, $k \in \bar{S}$. If the change of the strategy $\bar{a}^l$ to $a^l$ changes the result of the vote, then two possibilities have to be considered:

a) The candidate $b_k \in A_k$ wins the vote, i.e. $b_k \in B$, where $B$ is the new DC elected in the situation $(\bar{a} \| a^l)$. If $a(S) > \frac{a(N)}{2}$ (where $S = \{i : b_i \in B\}$), then

$$k_l(\bar{a} \| a^l) = \frac{K}{a(S)}.$$

Since $S$ is not necessarily a minimal admissible set, $a(S) \ge a(\bar{S})$, and (4.11.8) follows from (4.11.6).

b) $b_k \in A_k$ does not win the vote. Then $b_k \notin B$, where $B$ is the new DC elected in the situation $(\bar{a} \| a^l)$, and from (4.11.7) we have $k_l(\bar{a} \| a^l) = 0$, so the inequality (4.11.8) also holds in this case.
Suppose now $l \in A_k$, $k \notin \bar{S}$. In this case the change of the strategy by the player will not change the minimal admissible coalition and the DC. Thus in the situation $(\bar{a} \| a^l)$ the company $A_k$ will not be represented in the DC, and we shall again have $k_l(\bar{a} \| a^l) = 0$.

The theorem is proved.
From the theorem it follows that for different minimal admissible coalitions $\bar{S}$ we get different Nash equilibria in $\Gamma$.
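The equilibrium of 4.11.1 can be checked by brute force on a tiny instance (a sketch: the company sizes and the prize $K$ below are invented, and all voters, including those outside $\bar{S}$, are assumed to vote "yes" exactly on the candidates of $\bar{S}$). No unilateral deviation of a single voter improves his payoff, which is inequality (4.11.8).

```python
from itertools import product

sizes = [2, 2, 1]          # a(A_i): hypothetical company sizes, a(N) = 5
K = 60
voters = [i for i, s in enumerate(sizes) for _ in range(s)]  # company of each voter
half = sum(sizes) / 2

def payoffs(profile):
    """profile[l] is voter l's yes/no vector over candidates (True = 'yes')."""
    elected = [i for i in range(len(sizes))
               if sum(v[i] for v in profile) > half]          # candidate i elected
    aS = sum(sizes[i] for i in elected)
    if aS <= half:                                            # not admissible: all get 0
        return [0] * len(voters)
    return [K / aS if voters[l] in elected else 0 for l in range(len(voters))]

S_bar = {0, 1}                                                # minimal admissible coalition
eq = tuple(tuple(i in S_bar for i in range(len(sizes))) for _ in voters)
base = payoffs(eq)

# No voter can gain by a unilateral deviation: inequality (4.11.8).
for l in range(len(voters)):
    for dev in product([False, True], repeat=len(sizes)):
        prof = eq[:l] + (dev,) + eq[l + 1:]
        assert payoffs(prof)[l] <= base[l]
print(base)   # voters of companies in S_bar get K / a(S_bar) = 15.0 each
```

The exhaustive deviation loop is feasible only for toy sizes, but it makes the two cases a) and b) of the proof above directly observable.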
4.11.2. The multistage game generating the minimal admissible coalition.

Consider the n-person multistage game $G$ with complete information with the companies $A_1, \ldots, A_i, \ldots, A_n$ as players. Let $N = \{A_1, \ldots, A_i, \ldots, A_n\}$ and let $S \subset N$ be any coalition in $G$. In what follows we shall sometimes write $i$ instead of $A_i$. In this section the model for the formation of the minimal admissible coalition proposed in 4.11.1 will be described with the help of the flow chart of Fig. 4.16. A dynamic notation is used. Statements of the form $\alpha \to \beta$ indicate that $\beta$ assumes the value $\alpha$. Arrows at connecting lines show the directions of flow. Rectangles permit only one continuation, whereas rhomboids contain questions whose answers "yes" or "no" determine the branch along which the game is continued.
For every voting game $G$ the model generates a finite extensive game $G(\Gamma)$ with complete information. (The structure of this game is described by the flow chart.) The idea of using a flow chart for the representation of a multistage game with some periodic properties is due to Selten (1991).
The process of forming the minimal admissible coalition proceeds as a succession of stages. In the following, the rectangles and rhomboids of the flow chart are explained in detail (see Fig. 4.16).
Rectangle 2. $M$ is the set of active players who have not yet made any decision. At the beginning, $M$ is the player set $N = \{A_1, \ldots, A_i, \ldots, A_n\}$. $S$ is the set of players who have agreed to form one coalition; at the beginning $S$ is empty. The symbol $r$, the stage number, indicates the number of the current stage. The game begins with the first stage.
Rectangle 3. The term "random draw" means that every player $A_i \in M$ has the same probability of being chosen as the next decision maker.

Rectangle 4. The decision maker $A_i$ is excluded from the set of active players.

Rhomboid 5. It is checked whether the decision maker joins the coalition $S$ or not.

Rectangle 6. The player $A_i$ joins the coalition $S$.

Rhomboid 7. The decision maker may or may not be the last active player.

Rhomboid 8. The player $A_i$ is the last active player. It is checked whether $S$ is an admissible coalition.
Figure 4.16 (flow chart of the coalition-formation game: $N \to M$, $\emptyset \to S$, $1 \to r$; random draw of $i \in M$; $M \setminus i \to M$; $r + 1 \to r$)
If it is, the game ends with the payoffs

$$k_i = \frac{K}{a(S)} \ \text{for} \ A_i \in S; \qquad k_j = 0 \ \text{for} \ A_j \notin S.$$
Rhomboid 10. $A_i$ is not the last active player. Then he either selects the next decision maker (invites a new member into the coalition $S$) or refuses to select the next decision maker.

Rectangle 11. $A_i$ selects the next decision maker $A_j \in M$. $A_j$ now acts as $A_i$ did. A new stage begins.

Rhomboid 12. $A_i$ refuses to select the next decision maker. The coalition $S$ is formed. It is checked whether $S$ is an admissible coalition, i.e. whether $a(S) > \frac{a(N)}{2}$.
Then the player $A_i$ has two alternatives: to propose to form a new coalition including himself and some of the remaining active players, or to go out of the game.

Rectangle 15. The player $A_i$ decides to form a new coalition including himself and some of the remaining active players, and selects the next decision maker $A_j \in M$.

Rectangle 16. The player $A_i$ goes out of the game. The members of the coalition $S$ are wiped out ($S = \emptyset$).
Rhomboid 17. If $A_i$ was the last active player, the game ends. If not, the game continues with the current set of active players $M$.

Rectangle 18. The next stage begins.
The flow chart of Fig. 4.16 contains all that is necessary for the construction of the multistage game $G(\Gamma)$ with complete information generating the minimal admissible coalition. Precise mathematical statements require a more formal description of $G(\Gamma)$. For this reason we introduce the notions of positions, strategies, choice sets, histories and payoffs.
Positions. The first position $u_0 = N$ consists of the set of players in the game $G(\Gamma)$.

If at the preceding stage the player $i$ was selected by chance or was invited by the preceding decision maker into the coalition $S$, then the position is a triple

$$u = (M, S, i),$$
where $M$ is the set of active players, $S$ the forming coalition, and $i$ the decision maker.

If in the preceding position the decision maker went out of the game or refused to continue the formation of the coalition, then the position $u$ is a triple

$$u = (M, S, R_i),$$

where $M$ is the set of active players, $S$ the formed coalition, and $R_i$ the negative decision of the decision maker $i$ in the preceding position.
Choice sets $A(u)$. In the position $u_0$ the set $A(u_0)$ is equal to the set of players $N$ in the game $G(\Gamma)$, and the choice is made with equal probabilities $\frac{1}{n}$. In the position $u = (M, S, i)$ the choice set consists of the following alternatives:

a) $\{R\}$: go out of the game;

b) $\{RY_k, \ k \in M\}$: refuse to enter the coalition $S$, decide to form a new coalition including himself, and propose to the player $k \in M$ to enter this coalition, suggesting him as the next decision maker;

c) $\{YR\}$: agree with the proposal to enter the coalition $S$ and refuse to invite into the coalition anyone from the set of active players $M$ (as the next decision maker);

d) $\{YY_k, \ k \in M\}$: agree to enter the coalition $S$ and invite into $S$ the next player $k \in M$, suggesting him as the next decision maker.

In the position $u = (M, S, R_i)$ we have $A(u) = M$, and the next decision maker is chosen with equal probabilities (with the probability $\frac{1}{|M|}$) if $S = \emptyset$ or $a(S) \le \frac{a(N)}{2}$. If $a(S) > \frac{a(N)}{2}$, the position $u = (M, S, R_i)$ is a terminal position ($A(u) = \emptyset$).
Thus

$$A(u) = \begin{cases} \{R\} \cup \{RY_k, k \in M\} \cup \{YR\} \cup \{YY_k, k \in M\}, & \text{if } u = (M, S, i), \\ M, & \text{if } u = (M, S, R_i), \\ N, & \text{if } u = u_0. \end{cases}$$
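The choice sets $A(u)$ can be encoded directly from the case formula above (a sketch, not from the book; the position encodings are invented, and the admissibility threshold is modeled with unit company weights, i.e. $a(S) = |S|$):

```python
def choice_set(u, N):
    """A(u) per the case formula above.  Positions are encoded as:
    'start', (M, S, i) for a decision maker i, or (M, S, 'R', i) after a
    refusal; M and S are frozensets of players.  The admissibility test
    a(S) > a(N)/2 is modeled with unit weights, so a(S) = |S|."""
    if u == 'start':
        return set(N)                        # chance picks the first decision maker
    M, S = u[0], u[1]
    if u[2] == 'R':                          # position (M, S, R_i)
        if len(S) > len(N) / 2:              # S already admissible: terminal
            return set()
        return set(M)                        # chance draws the next decision maker
    # position (M, S, i): alternatives R, RY_k, YR, YY_k
    return ({'R', 'YR'}
            | {('RY', k) for k in M}
            | {('YY', k) for k in M})

N = {1, 2, 3, 4, 5}
u = (frozenset({4, 5}), frozenset({1, 2, 3}), 'R', 3)
print(choice_set(u, N))   # prints set(): |S| = 3 > 5/2, terminal position
```

Encoding the three cases separately keeps the chance moves (positions of the forms $u_0$ and $(M, S, R_i)$) distinct from the personal moves of a decision maker.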
Explanation of Table 4.1. Under the heading "Next positions" the table shows which positions can follow a position $u$ by a choice $a \in A(u)$, according to the conditions on $u$ and $a \in A(u)$ shown in the other columns. The set of all positions which can follow $u$ by $a \in A(u)$ is denoted by $D(u, a)$; "end" indicates that $D(u, a)$ is empty. $D(u)$ stands for the union of all $D(u, a)$ with $a \in A(u)$.
Histories. A history $q$ of $G(\Gamma)$ is a sequence of positions

$$q = (u_0, \ldots, u_T),$$

and a play $z$ is a history together with the final choice:

$$z = (u_0, \ldots, u_T; a_T).$$
Table 4.1. Next positions: the choice $RY_k$, $k \in M$, leads to the position $(M, \emptyset, k)$ with $k \in M$; the choice $YY_k$, $k \in M$, leads to $(M, S, k)$ with $k \in M$; if the preceding decision maker went out of the game, a random draw from $M \neq \emptyset$ leads to $(M, \emptyset, i)$ with $i \in M$, while $M = \emptyset$ means the end of the game; if the preceding decision maker refused to continue and the formed coalition is admissible, the game also ends.
$$k_i(\bar{a}) = 0, \quad \text{if } i \notin \bar{S}.$$

Consider the situation $(\bar{a} \| a^i)$. If $i \notin \bar{S}$, the player $i$ cannot prevent the forming of the coalition $\bar{S}$ in the situation $(\bar{a} \| a^i)$. Thus

$$k_i(\bar{a}) = k_i(\bar{a} \| a^i) = 0.$$

Suppose now that in the situation $(\bar{a} \| a^i)$ we have $i \in \bar{S}$. Then with positive probability the formation of a coalition different from $\bar{S}$, which assigns to the player $i$ a lower payoff than he gets in the coalition $\bar{S}$, is possible (in $\bar{S}$ the payoff of the player $i \in \bar{S}$ is maximal because of the structure of the coalition $\bar{S}$); thus

$$k_i(\bar{a} \| a^i) \le k_i(\bar{a}).$$

The subgame perfectness of the situation $\bar{a}$ can be proved in a similar way in any subgame of the game $\Gamma$.

The theorem is proved.
4.11.3. The multistage game modeling the vote of the directorial council.

In this section we consider another approach to the problem of voting the directorial council. This approach is based upon the construction of a corresponding multistage game $G$ with complete information, where the players have the possibility not only to form coalitions but also to make commitments for the payoffs they want to get if the coalition is formed. The approach is very close to that of Selten (1991), but the commitment intervals are different, and each stage has only one round.

We shall find the (subgame perfect) Nash equilibrium in $G$.

We remain within the formalization of 4.11.1.
Denote by $\bar{S}$ any coalition $S \subset N$ such that $a(S) > \frac{a(N)}{2}$ and such that there exists $i_0 \in S$ with $a(S \setminus \{i_0\}) \le \frac{a(N)}{2}$.
Define now the upper bounds for the commitment possibilities of the players. Consider the following linear programming problem:

$$\max \sum_{i \in N \setminus S} \xi_i$$

$$\xi_i \ge 0, \quad i \in N \setminus S,$$

where $S$ is any fixed minimal admissible coalition. Its solution defines the bounds $\bar{\xi}_i$:

$$0 \le \xi_i \le \bar{\xi}_i, \quad i \in N \setminus \bar{S},$$

$$0 \le \xi_i \le \bar{\xi}_i, \quad i \in \bar{S}.$$

Let $\bar{\xi}_i = \frac{K}{a(\bar{S})}$ for $i \in \bar{S}$.
The structure of the game $G$ is easier to illustrate by the flow chart of Fig. 4.17, which is a simplified version of a flow chart proposed by Selten (1991).
Figure 4.17 (flow chart of the game $G$: at each stage a decision maker $i$ is drawn at random from $M \setminus S$; he either names a permissible payoff demand $\xi_i$ or completes a coalition; when a coalition $C$ forms, each $j \in C \setminus i$ receives $\xi_j$ and $i$ receives $v(C) - \sum_{j \in C \setminus i} \xi_j$)
$$v(C) = \begin{cases} K, & \text{if } C \text{ is an admissible coalition } \big(a(C) > \frac{a(N)}{2}\big), \\ 0, & \text{for all other } C \subset N. \end{cases}$$
Explanation of Table 4.2. Under the heading "Next positions" the table shows which positions can follow a position $u$ by a choice $a \in A(u)$, according to the conditions on $u$ and $a \in A(u)$ shown in the other columns. The set of all positions which can follow $u$ by $a \in A(u)$ is denoted by $D(u, a)$. The entry "end" indicates that $D(u, a)$ is empty. $D(u)$ stands for the union of all $D(u, a)$ with $a \in A(u)$. The symbol $\Xi_s \cup \xi_i$ indicates the demand system obtained if the demand system $s$ is complemented by the player $i$'s demand $\xi_i$.
Histories. A history $q$ of $G(\Gamma)$ is a sequence of positions

$$q = (u_0, \ldots, u_T),$$

and a play is

$$z = (u_0, \ldots, u_T; a_T).$$
Define the strategies $a^*(u)$, $i = 1, \ldots, n$:

$$a^*(u) = \begin{cases} \bar{\xi}_i, & \text{if in the position } u = (M, s, i) \text{ the coalition } S \cup i \text{ is not admissible,} \\ & \text{i.e. } a(S \cup i) \le \frac{a(N)}{2}, \text{ and } u \text{ is not a terminal position}, \\ S \cup i, & \text{if in the position } u = (M, s, i) \text{ the coalition } S \cup i \text{ is admissible,} \\ & a(S \cup i) > \frac{a(N)}{2}, \text{ and } u \text{ is not a terminal position}, \\ R, & \text{if } u = (M, s, i) \text{ is a terminal position and } S \cup i \text{ is not an admissible coalition}, \\ M, & \text{if } u = (M, s, i) \text{ is a terminal position and } S \cup i \text{ is an admissible coalition}. \end{cases}$$
Theorem. The n-tuple of strategies $a^*(u)$ forms a subgame perfect Nash equilibrium in $G$.

Proof. In the situation $\bar{a}$ arising when the n-tuple $a^*$ is used, a coalition of the type $\bar{S}$ is necessarily formed as the result of the game, and the payoffs of the players are equal to

$$k_l(\bar{a}) = \frac{K}{a(\bar{S})}, \quad \text{if } l \in \bar{S} \text{ and } l \text{ is not the player which forms the coalition } \bar{S};$$

$$k_l(\bar{a}) = v(\bar{S}) - \sum_{j \in \bar{S} \setminus l} \bar{\xi}_j, \quad \text{if } l \in \bar{S} \text{ and } l \text{ is the player which forms the coalition } \bar{S};$$

$$k_l(\bar{a}) = 0 \quad \text{for } l \notin \bar{S}.$$
We must show that

$$k_l(\bar{a} \| a^l) \le k_l(\bar{a})$$

for all strategies $a^l$ of the player $l$. If the strategy $a^l(u)$ is different from $\bar{a}^l(u)$, then it prescribes different choices in some positions of the game.
Consider first the positions $u_I = (M, s, i)$ where $S \cup i$ is not an admissible coalition, and suppose $a^l(u_I) \neq \bar{a}^l(u_I)$. This means that $a^l(u_I)$ prescribes the player $l$ either to form a coalition or to name a payoff demand

$$\xi_l < \bar{\xi}_l = \bar{a}^l(u_I).$$

In the first case the formed coalition cannot be admissible, and thus the payoff of player $l$ satisfies $k_l(\bar{a} \| a^l) \le 0$, since the value of the characteristic function for a nonadmissible coalition is equal to zero. In the second case the player $l$ will be invited into some coalition $S$ and will be paid $\xi_l < \bar{\xi}_l$, which is less than in the situation $\bar{a}$. Thus we have proved the inequality

$$k_l(\bar{a} \| a^l) \le k_l(\bar{a})$$
for the positions of the first type. Consider now the positions $u = (M, s, i)$ in which the coalition $S \cup i$ is admissible,

$$a(S \cup i) > \frac{a(N)}{2},$$

and $u$ is not a terminal position. Denote the positions of this type by $u_{II}$.
Suppose $a^l(u_{II}) \neq \bar{a}^l(u_{II})$. This means that the player $l$, instead of forming the coalition $S \cup i$, named a payoff demand $\xi_l \in (0, \bar{\xi}_l)$. From the conditions (4.11.14) we have that

$$\xi_l \le v(S \cup i) - \sum_{j \in S} \bar{\xi}_j.$$

Thus by naming a payoff demand $\xi_l$ the player $l$ gets no more than he gets by forming the coalition in the position $u = (M, s, i)$. Thus we have proved

$$k_l(\bar{a} \| a^l) \le k_l(\bar{a})$$

for the strategies $a^l$ differing from $\bar{a}^l$ in the positions of the second type.
The proof for the strategies $a^l$ differing from $a^*$ in the terminal positions is similar.

The theorem is proved.

One can verify that the payoffs in the considered Nash equilibrium $\bar{a}$ are also Pareto optimal.
4.12. Exercises and problems 261
(b) Give an example of the game in which the payoffs to players in a penalty
strategy equilibrium do not satisfy the system of functional equations (4.12.1) with
the boundary condition (4.12.2).
4. Construct an example of a multistage two-person nonzero-sum game where,
in a penalty strategy equilibrium, the penalizing player penalizes his opponent for
deviation from the chosen path and thus penalizes himself to a greater extent.
5. Construct Pareto-optimal sets in the game from Example 4, 4.2.2.
6. Construct an example of a multistage nonzero-sum game where none of the Nash
equilibria leads to a Pareto-optimal solution.
7. Construct the map $T$ which sets up a correspondence between each subgame
$\Gamma_z$ of the game $\Gamma$ and some subset of situations $U_z$ in this subgame. Let $T(\Gamma) = U_{x_0}$.
We say that the map $T$ is dynamically stable (time-consistent) if from $u(\cdot) \in U_{x_0}$
it follows that $u^{z_k}(\cdot) \in U_{z_k}$, where $u^{z_k}(\cdot) = (u_1^{z_k}(\cdot), \ldots, u_n^{z_k}(\cdot))$ is the truncation of the
situation $u(\cdot)$ to the subgame $\Gamma_{z_k}$, and $\omega = \{x_0, z_1, \ldots, z_k, \ldots\}$ is the play realized in the
situation $u(\cdot) \in U_{x_0}$.
Show that if the map $T$ places each subgame $\Gamma_{z_k}$ in correspondence with the set
of Pareto-optimal situations $U_{z_k}^P$, then it is dynamically stable (time-consistent).
8. The map $T$ defined in Exercise 7 is called strongly dynamically stable (strongly time-consistent) if for any situation $u(\cdot) \in U_{x_0}$, any $z_k \in \{z_t\} = \omega$, where $\{z_t\} = \omega$ is the
play in the situation $u(\cdot)$, and any situation $u^{z_k}(\cdot) \in U_{z_k}$, there exists a situation $w(\cdot) \in U_{x_0}$ for
which the situation $u^{z_k}(\cdot)$ is its truncation on the positions of the subgame $\Gamma_{z_k}$.
Show that if the map $T$ places each subgame $\Gamma_{z_k}$ in correspondence with the set
of Nash equilibria, then it is strongly dynamically stable.
9. Construct an example where the map $T$ placing each subgame $\Gamma_z$ in correspondence with the set of Pareto-optimal equilibria is not strongly dynamically stable.
10. For each subgame $\Gamma_z$ we introduce the quantities $v(\{i\}, z)$, $i = 1, \ldots, n$, representing a guaranteed payoff to the $i$-th player in the subgame $\Gamma_z$, i.e. $v(\{i\}, z)$ is
the value of the game constructed in terms of the graph of the subgame $\Gamma_z$ between
player $i$ and the players $N \setminus i$ acting as one player. In this case, a strategy set for the
coalition of players $N \setminus i$ is the Cartesian product of the strategy sets of the
players $k \in N \setminus i$, $u_{N \setminus i} \in \prod_{k \in N \setminus i} U_k$; the payoff function for player $i$ in the situation
$(u_i, u_{N \setminus i})$ is defined to be $H_i(u_i, u_{N \setminus i})$, and the payoff function for the coalition $N \setminus i$ is
taken to be $-H_i(u_i, u_{N \setminus i})$.
Construct the functions $v(\{i\}, z)$ for all subgames $\Gamma_z$ of the game from Example
4, 4.2.2.
11. Show that if in a multistage nonzero-sum game $\Gamma$ with non-negative payoffs
($H_i \ge 0$, $i = 1, \ldots, n$) we have $v(\{i\}, z) = 0$ for all $i = 1, \ldots, n$ and $z \in \bigcup_{i=1}^n X_i$, then any
play can be realized in some penalty strategy equilibrium.
12. Formalize the $k$-level tree-like control system as a hierarchical game in which a
control center at the $i$-th level ($i = 1, \ldots, k-1$) allocates resources among the subordinate
control centers at the next level when $i < k - 1$ and among its subordinate production
divisions when $i = k - 1$. The payoff to each production division depends only
on its output, while the payoff to a control center depends on the output of its subordinate
production divisions.
13. Find a Nash equilibrium in the tree-like hierarchical $k$-level game constructed
in Exercise 12.
14. Show that the payoff vector $\alpha = \{v(N), 0, \ldots, 0\}$ belongs to the core of a
tree-like hierarchical two-level game with the characteristic function $v(S)$. Show that
the equilibrium constructed in the tree-like hierarchical two-level game is also a strong
equilibrium.
15. In a diamond-shaped hierarchical game construct a characteristic function by
using a Nash equilibrium.
16. Describe the set of all Nash equilibria in a tree-like hierarchical two-level
game. Take into account the possibility that the players $B_1, \ldots, B_n$ can "penalize"
the center $A_0$ (e.g., by stopping production when the allocation of resources runs counter
to the interests of player $B_i$).
17. Construct the payoff matrix for players in the game of Example 7, 4.7.1. Find
optimal pure strategies and the value of the matrix game obtained.
18. Convert the game from Example 9, 4.7.1, to the matrix form and solve it.
19. Consider the following multistage zero-sum game with delayed information
about the positions of one of the players. The game is played by two players: target $E$
and shooter $P$. The target can move only over the points of the $Ox$ axis with coordinates
$0, 1, 2, \ldots$. In this case, if player $E$ is at the point $i$, then at the next moment he can
move only to the points $i + 1$, $i - 1$, or stay where he is. Shooter $P$ has $j$ bullets
($j = 0, 1, \ldots$) and can fire no more than one bullet at each time instant. It is assumed
that the shooter hits the point at which he is aiming.
At each time instant player $P$ knows exactly the position of player $E$ at the
previous step, i.e. if player $E$ was at the point $i$ at the previous step, then player
$P$ has to aim at one of the points $i + 1$, $i$, $i - 1$. Player $E$ is informed of the number
of bullets that player $P$ has at each time instant, but he does not know where player
$P$ is aiming. The payoff to shooter $P$ is determined by his accurate hits, and so the
objective of shooter $P$ is to maximize the number of his accurate hits before target $E$
can reach a "bunker". The objective of the target is the opposite one. Here "bunker"
means the point 0 where the target is inaccessible for player P.
Denote this game by $\Gamma(i,j)$ with the proviso that at the initial time instant target $E$
is at the point with coordinate $i$, while shooter $P$ has $j$ bullets. Denote by $v(i,j)$
the value of the game (if it exists). It can readily be seen that $v(i,0) = 0$, $i = 1, 2, \ldots$,
and $v(1,j) = 0$, $j = 1, 2, \ldots$. At each step of the game $\Gamma(i,j)$, $i = 2, 3, \ldots$, $j = 1, 2, \ldots$, the
shooter has four strategies (actually he has more strategies, but they are not rational),
whereas player $E$ has three strategies. The strategies for shooter $P$ are: shooting at
the point $i - 1$, shooting at the point $i$, shooting at the point $i + 1$, no shooting at
this step. The strategies for the target are: move to the point $i - 1$, stay at the point
$i$, move to the point $i + 1$. Thus at each step of the game we have the matrix game
with the payoff matrix
$$\begin{pmatrix} 1 + v(i-1,j-1) & v(i,j-1) & v(i+1,j-1) \\ v(i-1,j-1) & 1 + v(i,j-1) & v(i+1,j-1) \\ v(i-1,j-1) & v(i,j-1) & 1 + v(i+1,j-1) \\ v(i-1,j) & v(i,j) & v(i+1,j) \end{pmatrix}.$$
In particular, the shooter's optimal mixed strategy $x = (x_1, x_2, x_3, x_4)$ must satisfy
$$(1 + v(i-1,j-1))x_1 + v(i-1,j-1)x_2 + v(i-1,j-1)x_3 + v(i-1,j)x_4 \ge v(i,j),$$
and one of the candidate values of this one-step game is $(1 + v(i-1,j-1) + v(i,j-1))/2$.
1) Prove that $v(i,j) = \varphi(i,j)$, and if $v(i,j) = (1 + v(i-1,j-1) + v(i,j-1) + v(i+1,j-1))/3$, then
$$x_1(i,j) = v(i,j) - v(i-1,j-1),$$
$$x_2(i,j) = v(i,j) - v(i,j-1),$$
$$x_3(i,j) = v(i,j) - v(i+1,j-1),$$
$$x_4(i,j) = 0,$$
$$y_1(i,j) = y_2(i,j) = y_3(i,j) = 1/3;$$
2) Prove that $v(i,j) = \varphi(i,j)$, and if $v(i,j) = (1 + v(i-1,j-1) + v(i,j-1))/2$, then
$$x_1(i,j) = v(i,j) - v(i-1,j-1),$$
$$x_2(i,j) = v(i,j) - v(i,j-1),$$
$$x_3(i,j) = x_4(i,j) = 0,$$
$$y_1(i,j) = y_2(i,j) = 1/2,$$
$$y_3(i,j) = 0;$$
(c) Prove that the following relationships hold for any j = 0 , 1 , 2 , . . . :
1) $v(i,j) = j/3$, $i = j+1, j+2, \ldots$;
2) $v(i,j) \le v(i+1,j)$, $i = 1, 2, \ldots$;
3) $v(i,j) \le v(i,j+1)$, $i = 2, 3, \ldots$;
4) $v(i,j) + v(i+2,j) \le 2v(i+1,j)$, $i = 1, 2, \ldots$
(d) Prove that:
1) $\lim_{i \to +\infty} v(i,j) = j/3$ for any fixed $j = 0, 1, 2, \ldots$;
2) $\lim_{j \to \infty} v(i,j) = i - 1$ for any fixed $i = 1, 2, \ldots$.
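A quick way to see relationship 1) of part (c): when the target is far from the bunker ($i \ge j+1$), part 1) gives the uniform $1/3$ mixtures for both players, under which every fired bullet hits with probability exactly $1/3$; with $j$ bullets the expected number of hits, and hence the value, is $j/3$. A minimal sketch (the function names are ours, not from the text):

```python
from fractions import Fraction

# Sketch: far from the bunker both players equalize by mixing uniformly
# over the three points {i-1, i, i+1}, so each fired bullet hits with
# probability 1/3 and the value is v(i, j) = j/3 for i >= j + 1.

def hit_probability_uniform():
    """Exact hit probability of one bullet: shooter aims uniformly at
    {i-1, i, i+1}, target moves uniformly to {i-1, i, i+1}."""
    p = Fraction(0)
    for aim in (-1, 0, 1):
        for move in (-1, 0, 1):
            if aim == move:
                p += Fraction(1, 9)  # each pair of choices has probability 1/9
    return p

def value_far_from_bunker(j):
    """Expected number of hits with j bullets, i.e. v(i, j) for i >= j+1."""
    return j * hit_probability_uniform()

print(hit_probability_uniform())   # 1/3
print(value_far_from_bunker(4))    # 4/3
```

This only verifies the claimed value under the uniform strategies; proving optimality is the content of parts 1) and 2).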
20. Consider an extension of the game of shooter and target, where target $E$ in
position $i$ can move at most $k$ units to the right or to the left, i.e. it
can move to each of the points $i-k, i-k+1, \ldots, i, \ldots, i+k$. The other objectives
and possibilities of shooter $P$ and target $E$ remain unaffected in terms of the new
definition of a strategy for player $E$.
Denote by $G(i,j)$ the game with the proviso that at the initial time instant the
target is at the $i$-th point and the shooter has $j$ bullets. Further, denote by $v(i,j)$
the value of the game $G(i,j)$. From the definition of $G(i,j)$ we have
$$v(i,0) = 0, \quad i = 1, 2, \ldots,$$
$$v(i,j) = 0, \quad i = 1, 2, \ldots, k, \quad j = 1, 2, \ldots.$$
The elements of the payoff matrix of the one-step game are
$$a_{mn}(i,j) = \begin{cases} 1 + v(i+n-k-1,\, j-1), & \text{if } m = n,\ m, n = 1, \ldots, 2k+1, \\ v(i+n-k-1,\, j-1), & \text{if } m \ne n,\ m, n = 1, \ldots, 2k+1, \\ v(i+n-k-1,\, j), & \text{if } m = 2k+2,\ n = 1, \ldots, 2k+1. \end{cases}$$
(a) Show that the game $G(i,j)$ has the value $v(i,j)$ if and only if there
exist $(x_1, x_2, \ldots, x_{2k+2})$, $(y_1, y_2, \ldots, y_{2k+1})$ such that
$$\sum_{m=1}^{2k+2} a_{mn}(i,j)\, x_m \ge v(i,j), \quad n = 1, \ldots, 2k+1,$$
$$\sum_{m=1}^{2k+2} x_m = 1, \quad x_m \ge 0, \quad m = 1, \ldots, 2k+2,$$
$$\sum_{n=1}^{2k+1} a_{mn}(i,j)\, y_n \le v(i,j), \quad m = 1, \ldots, 2k+2,$$
$$\sum_{n=1}^{2k+1} y_n = 1, \quad y_n \ge 0, \quad n = 1, \ldots, 2k+1.$$
$$\varphi(i,0) = 0, \quad i = 1, 2, \ldots;$$
$$\varphi(i,j) = 0, \quad i = 1, 2, \ldots, k, \quad j = 1, 2, \ldots;$$
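One reading of the one-step payoff matrix of $G(i,j)$ is: rows $m = 1, \ldots, 2k+1$ are shots at the points $i-k, \ldots, i+k$, row $2k+2$ is "no shot", and columns $n = 1, \ldots, 2k+1$ are the target's moves to $i+n-k-1$. Under that assumption the $(2k+2) \times (2k+1)$ matrix can be built mechanically; a hedged sketch (the helper name and the value callback are ours):

```python
# Illustrative sketch of the one-step payoff matrix a_mn(i, j) for the
# extended shooter-target game G(i, j): a hit adds 1 and spends a bullet;
# a miss spends a bullet; "no shot" (last row) keeps all j bullets.

def step_matrix(i, j, k, v):
    """v(i, j) must return the game value; builds a (2k+2) x (2k+1) matrix."""
    rows = []
    for m in range(1, 2 * k + 2):               # shooting rows m = 1..2k+1
        rows.append([(1 if m == n else 0) + v(i + n - k - 1, j - 1)
                     for n in range(1, 2 * k + 2)])
    rows.append([v(i + n - k - 1, j)            # row 2k+2: no shot
                 for n in range(1, 2 * k + 2)])
    return rows

# With no bullets the value is 0 everywhere, so with one bullet the
# one-step matrix has a 1 on the diagonal of the shooting rows.
A = step_matrix(i=5, j=1, k=1, v=lambda i, j: 0)
print(A)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]
```

Feeding this matrix into the linear-program characterization of part (a) gives the recursion for $v(i,j)$.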
$$\dot x = f(x, u), \qquad (5.1.1)$$
$$\dot y = g(y, v) \qquad (5.1.2)$$
with initial conditions $x_0$, $y_0$. Player $P$ ($E$) starts his motion from the phase state $x_0$
($y_0$) and moves in the phase space $R^n$ in accordance with (5.1.1) ((5.1.2)), choosing
at each instant of time the value of the parameter $u \in U$ ($v \in V$) to suit his objectives
and in terms of the information available in each current state.
The simplest to describe is the case of perfect information. In the differential
game this means that at each time instant t the players choosing parameters u G U,
v G V know the time t and their own and the opponent's phase states. Sometimes
one of the players, say, Player P , is required to know at each current instant t the
value of the parameter v G V chosen by Player E at the same instant of time. In
this case Player E is said to be discriminated and the game is called the game with
discrimination against Player $E$.
The parameters $u \in U$, $v \in V$ are called controls of the players $P$ and $E$,
respectively. The functions $x(t)$, $y(t)$ which satisfy equations (5.1.1), (5.1.2) and the initial
conditions are called trajectories of the players $P$, $E$.
268 Chapter 5. Differential games
5.1.2. Objectives in the differential game are determined by the payoff, which
may be defined by the realized trajectories $x(t)$, $y(t)$ in a variety of ways. For example,
suppose the game is played during a preassigned time $T$. Let
$x(T)$, $y(T)$ be the phase states of the players $P$ and $E$ at the time instant $T$ when the game
terminates. Then the payoff to Player $E$ is taken to be $H(x(T), y(T))$, where $H(x,y)$
is some function given on $R^n \times R^n$. In the specific case, when
$$H(x(T), y(T)) = \rho(x(T), y(T)), \qquad (5.1.3)$$
where $\rho(x(T), y(T)) = \sqrt{\sum_{i=1}^n (x_i(T) - y_i(T))^2}$ is the Euclidean distance between the
points $x(T)$, $y(T)$, the game describes the process of pursuit, during which the objective of Player $E$ is to avoid Player $P$ by moving a maximum distance from him by the
time the game ends. In all cases the game is assumed to be a differential zero-sum
game. Under condition (5.1.3), this means that the objective of Player P is to come
within the shortest distance of Player E by the time T the game ends.
With such a definition, the payoff depends only on final states and the results
obtained by each player during the game until the time T are not scored. It is of
interest to state the problem in which the payoff to Player E is defined as a minimum
distance between the players during the game:
$$\min_{0 \le t \le T} \rho(x(t), y(t)).$$
There exist games in which the constraint on the game duration is not essential and
the game continues until the players obtain a particular result. Let an m-dimensional
surface F be given in R2n. This surface will be called terminal. Let
$$t_n = \min\{t : (x(t), y(t)) \in F\}, \qquad (5.1.4)$$
i.e. $t_n$ is the first time when the point $(x(t), y(t))$ falls on $F$. If for all $t \ge 0$ the point
$(x(t), y(t)) \notin F$, then $t_n = +\infty$. For the realized paths $x(t)$, $y(t)$ the payoff to Player
$E$ is $t_n$ (the payoff to Player $P$ is $-t_n$). In particular, if $F$ is a sphere of radius $l \ge 0$
given by the equation
$$\rho(x, y) = l,$$
then we have the game of pursuit in which Player $P$ seeks to come within a distance
$l \ge 0$ of Player $E$ as soon as possible. If $l = 0$, then the capture is taken to mean
the coincidence of phase coordinates for the players P and E, in which case Player E
seeks to postpone the capture time. Such games of pursuit are called the time-optimal
games of pursuit.
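Formula (5.1.4) is straightforward to evaluate on a sampled trajectory. The sketch below is our own illustration (names are ours), with the terminal set $F$ taken to be the capture set $\rho(x, y) \le l$: it returns the first sample time at which the pair enters $F$, or $+\infty$ if it never does.

```python
import math

# Sketch of (5.1.4) on a sampled trajectory: t_n is the first sample
# time at which (x(t), y(t)) enters the terminal set F, here the
# capture set rho(x, y) <= l; math.inf if the set is never reached.

def first_hitting_time(times, xs, ys, l):
    for t, x, y in zip(times, xs, ys):
        if math.dist(x, y) <= l:   # Euclidean distance rho(x, y)
            return t
    return math.inf

times = [0.0, 1.0, 2.0, 3.0]
xs = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]   # pursuer P
ys = [(4.0, 0.0), (4.0, 0.0), (2.5, 0.0), (1.0, 0.0)]   # evader E
print(first_hitting_time(times, xs, ys, l=1.0))  # 2.0
```

With the time-optimal payoff, this number is exactly what Player $E$ tries to maximize and Player $P$ to minimize.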
The theory of differential games also deals with the problem of determining the
set of initial states of the players from which Player $P$ can ensure the capture of
Player $E$ within a distance $l$. A definition is also provided for the set of initial states
of the players from which Player $E$ can avoid in finite time the encounter with
Player $P$ within a distance $l$. One set is called a capture zone ($CZ$) and the other an
escape zone ($EZ$). It is apparent that these zones do not meet. However, a critical
question arises of whether the closure of the union of the capture and escape zones
spans the entire phase space. Although the answer to this question is provided below, we
now note that, in order to adequately describe this process, it suffices to define the
payoff as follows. If there exists $t_n < \infty$ (see (5.1.4)), then the payoff to Player $E$ is
$-1$. If, however, $t_n = \infty$, then the payoff is $+1$ (the payoff to Player $P$ is equal to
the payoff to Player $E$ but opposite in sign, since the game is zero-sum). The games
of pursuit with such a payoff are called pursuit games of kind.
5.1.3. Phase constraints. If we further require that the phase point (x, y) would
not leave some set $F \subset R^{2n}$ during the game, then we obtain a differential game
with phase constraints. A special case of such a game is the "Life-line" game. The
"Life-line" game is a zero-sum game of kind in which the payoff to Player $E$ is $+1$ if he
reaches the boundary of the set F ("Life-line") before Player P captures him. Thus,
the objective of Player $E$ is to reach the boundary of the set $F$ before being captured
by Player $P$ (before coming within a distance $l$, $l \ge 0$, of Player $P$). The objective of
Player $P$, however, is to come within a distance $l$ of Player $E$ while the latter is
still in the set F. It is assumed that Player P cannot abandon the set F.
5.1.4. Example 1. (Simple motion.) The game is played on a plane. Motions of
the players P and E are described by the system of differential equations
$$\dot x_1 = x_3, \quad \dot x_2 = x_4, \quad \dot x_3 = u_1 - k_P x_3, \qquad (5.1.5)$$
$$\dot x_4 = u_2 - k_P x_4, \quad u_1^2 + u_2^2 \le \alpha^2,$$
$$\dot y_1 = y_3, \quad \dot y_2 = y_4, \quad \dot y_3 = v_1 - k_E y_3, \qquad (5.1.6)$$
$$\dot y_4 = v_2 - k_E y_4, \quad v_1^2 + v_2^2 \le \beta^2,$$
where $(x_1, x_2)$, $(y_1, y_2)$ are the geometric coordinates and $(x_3, x_4)$, $(y_3, y_4)$ are respectively the momenta of the points $P$ and $E$; $k_P$ and $k_E$ are friction coefficients, and $\alpha$ and $\beta$ are the maximum
forces which can be applied to the material points $P$ and $E$. The motion starts from
the states $x_i(0) = x_i^0$, $y_i(0) = y_i^0$, $i = 1, 2, 3, 4$. Here, by the state is meant not
the locus of the players $P$ and $E$, but their phase state in the space of coordinates
and momenta. The sets $U$, $V$ are the circles $U = \{u = (u_1, u_2) : u_1^2 + u_2^2 \le \alpha^2\}$,
$V = \{v = (v_1, v_2) : v_1^2 + v_2^2 \le \beta^2\}$. This means that at each instant the players $P$ and
$E$ may choose the direction of the applied forces. However, the maximum values of these
forces are restricted by the constants $\alpha$ and $\beta$. In this formulation, as shown below,
the condition $\alpha > \beta$ (power superiority) is not adequate for Player $P$ to accomplish
pursuit from any initial state.
5.1.5. The ways of selecting controls $u \in U$, $v \in V$ by the players $P$ and $E$ in
terms of the incoming information are as yet unknown. In other words, the notion of
a strategy in the differential game remains to be defined.
Although there exist several approaches to this notion, we shall focus on those intuitively obvious game-theoretic properties which the notion must possess. As noted
in Ch. 4, the strategy must describe the behavior of a player in all information states
in which he may find himself during the game. In what follows the information state
of each player will be determined by the phase vectors x(t),y(t) at the current time
instant t. Then it would be natural to regard the strategy for Player P (E) as a
function u(x,y,t) (v(x,y,t)) with values in the set of controls U (V). That is how
the strategy is defined in Isaacs (1965). Strategies of this type are called synthesizing.
However, this method of defining a strategy suffers from some grave disadvantages.
Indeed, suppose the players $P$ and $E$ have chosen strategies $u(x, y, t)$, $v(x, y, t)$, respectively. Then, to determine the paths of the players, and hence the payoff (which
depends on the paths), we substitute the functions $u(x,y,t)$, $v(x,y,t)$ into equations (5.1.1), (5.1.2) in place of the control parameters $u$, $v$ and integrate them with
the initial conditions $x_0$, $y_0$ on the time interval $[0, T]$. We obtain the following system of
ordinary differential equations:
x = f(x,u(x,y,t)), y = g(y,v(x,y,t)). (5.1.7)
For the existence and uniqueness of the solution to system (5.1.7) it is essential
that some conditions be imposed on the functions f(x,u),g(y,v) and the strategies
$u(x, y, t)$, $v(x, y, t)$. The first group of conditions places no limitations on the players'
capabilities; it refers to the statement of the problem and is justified by the physical
nature of the process involved. The case is different with the constraints on the
class of functions (strategies) $u(x, y, t)$, $v(x, y, t)$. Such constraints on the players'
capabilities contradict the notion adopted in game theory that the players are at
liberty to choose any behavior. In some cases this leads to substantial impoverishment
of the sets of strategies. For example, if we restrict ourselves to continuous functions
$u(x, y, t)$, $v(x, y, t)$, problems arise in which there are no solutions in the class of
continuous functions. The assumption of a more general class of strategies, however, may make
the unique solution of system (5.1.7) on the interval $[0, T]$ impossible. At times, to
overcome this difficulty, one considers the sets of strategies $u(x, y, t)$, $v(x, y, t)$ under which the system (5.1.7) has a unique solution extendable to the interval $[0, T]$.
However, such an approach (aside from the nonconstructive definition of
the strategy sets) is not adequately justified, since the set of all pairs of strategies
$u(x, y, t)$, $v(x, y, t)$ under which the system (5.1.7) has a unique solution is found to
be nonrectangular.
5.1.6. We shall consider the strategies in the differential game to be piecewise
open-loop strategies.
The piecewise open-loop strategy $u(\cdot)$ for Player $P$ consists of a pair $\{\sigma, a\}$, where
$\sigma$ is some partition $0 = t_0' < t_1' < \ldots < t_k' < \ldots$ of the time interval $[0, \infty)$ by the
points $t_k'$ which have no finite accumulation points, and $a$ is the map which places each
point $t_k'$ and phase coordinates $x(t_k')$, $y(t_k')$ in correspondence with some measurable
open-loop control $u(t) \in U$ for $t \in [t_k', t_{k+1}')$ (a measurable function $u(t)$ taking
values from the set $U$). Similarly, the piecewise open-loop strategy $v(\cdot)$ for Player
$E$ consists of a pair $\{\tau, b\}$, where $\tau$ is some partition $0 = t_0'' < t_1'' < \ldots < t_k'' < \ldots$
of the time interval $[0, \infty)$ by the points $t_k''$ which have no finite accumulation points, and $b$ is
the map which places each point $t_k''$ and positions $x(t_k'')$, $y(t_k'')$ in correspondence with
some measurable open-loop control $v(t) \in V$ on the interval $[t_k'', t_{k+1}'')$ (a measurable
function $v(t)$ taking values from the set $V$). Using a piecewise open-loop strategy, the
player responds to changes in information not continuously in time, but at the time
instants of his partition, which are determined by the player himself.
Denote the set of all piecewise open-loop strategies for Player P by P, and the
set of all possible piecewise open-loop strategies for Player E by E.
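The pair structure of a piecewise open-loop strategy can be made concrete in a short sketch (all names below are illustrative, not from the book): a partition of $[0, \infty)$ together with a map that, at each partition point, turns the phase state observed there into an open-loop control used until the next revision point.

```python
import math
from dataclasses import dataclass
from typing import Callable, List

# Sketch of the pair {sigma, a}: a partition of [0, inf) with no finite
# accumulation point, and a map sending each partition point t_k and the
# phase state observed at t_k to an open-loop control on [t_k, t_{k+1}).

@dataclass
class PiecewiseOpenLoopStrategy:
    partition: List[float]                      # 0 = t_0 < t_1 < ...
    revise: Callable[[float, float, float], Callable[[float], float]]

    def control(self, t: float, x_tk: float, y_tk: float) -> float:
        # The control in force at time t was fixed at the last partition
        # point t_k <= t from the state (x_tk, y_tk) observed there.
        t_k = max(p for p in self.partition if p <= t)
        open_loop = self.revise(t_k, x_tk, y_tk)
        return open_loop(t)

# Example: revise every 0.5 time units; the (scalar) control points at
# the displacement y - x and is held constant until the next revision.
strategy = PiecewiseOpenLoopStrategy(
    partition=[0.5 * k for k in range(100)],
    revise=lambda t_k, x, y: (lambda t: math.copysign(1.0, y - x)),
)
print(strategy.control(0.7, 0.0, 3.0))  # control fixed at t_1 = 0.5
```

The point of the definition is visible in the sketch: between revision points the player is committed to an open-loop control, so existence and uniqueness of the resulting trajectories reduce to integrating (5.1.1), (5.1.2) with measurable open-loop controls.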
Let u(t),v(t) be a pair of measurable open-loop controls for the players P and
E (measurable functions with values in the control sets U, V). Consider a system of
ordinary differential equations
$$\dot x = f(x, u(t)), \quad \dot y = g(y, v(t)). \qquad (5.1.8)$$
Impose the following constraints on the right-hand sides of system (5.1.8). The vector
functions $f(x,u)$, $g(y,v)$ are continuous in all their independent variables and are
uniformly bounded, i.e. $f(x,u)$ is continuous on the set $R^n \times U$, while $g(y,v)$ is
continuous on the set $R^n \times V$, and $\|f(x,u)\| \le \alpha$, $\|g(y,v)\| \le \beta$ (here $\|z\|$ is the
vector norm in $R^n$).
vector norm in Rn). Furthermore, the vector functions f(x, u) and g(y, v) satisfy the
$$K(x_0, y_0; u(\cdot), v(\cdot)) = H(x(T), y(T)),$$
where $x(T) = x(t)|_{t=T}$, $y(T) = y(t)|_{t=T}$ (here $x(t)$, $y(t)$ are the paths of the players $P$
and $E$ in the situation $S$). We have the game of pursuit when the function $H(x, y)$ is
the Euclidean distance between the points $x$ and $y$.
$$K(x_0, y_0; u(\cdot), v(\cdot)) = \int_0^{t_n} H(x(t), y(t))\, dt$$
(if $t_n = \infty$, then $K = \infty$), where $x(t)$, $y(t)$ are the paths of the players $P$ and $E$ corresponding to the situation $S$. In the case $H \equiv 1$, $K = t_n$, we have the time-optimal
game of pursuit.
Qualitative payoff. The payoff function $K$ can take only one of the three values
$+1, 0, -1$, depending on the position of $(x(t), y(t))$ in $R^n \times R^n$. Two manifolds $F$ and
$L$ of dimensions $m_1$ and $m_2$, respectively, are given in $R^n \times R^n$. Suppose that in the
situation $S = \{x_0, y_0; u(\cdot), v(\cdot)\}$, $t_n$ is the first instant at which the path $(x(t), y(t))$
falls on $F \cup L$. Then
$$K(x_0, y_0; u(\cdot), v(\cdot)) = \begin{cases} -1, & \text{if } (x(t_n), y(t_n)) \in L, \\ 0, & \text{if } t_n = \infty, \\ +1, & \text{if } (x(t_n), y(t_n)) \in F. \end{cases}$$
5.1.9. Having defined the strategy sets for the players P and E and the payoff
function, we may define the differential game as the game in normal form. In 1.1.1,
we interpreted the normal form T as the triple V = < X, Y, K >, where X X Y is the
space of pairs of all possible strategies in the game T, and K is the payoff function
defined on X x Y. In the case involved, the payoff function is defined not only on
the set of pairs of all possible strategies in the game, but also on the set of all pairs
of initial positions x 0 ,y 0 . Therefore, for each pair (x0,jfo) 6 i f x i f there is the
corresponding game in normal form, i.e. in fact some family of games in normal form
that are dependent on parameters (xo,yo) i f x i f are defined.
Definition. The normal form of the differential game $\Gamma(x_0, y_0)$ given on the
space of strategy pairs $P \times E$ means the system
$$\Gamma(x_0, y_0) = \langle P, E, K(x_0, y_0; u(\cdot), v(\cdot)) \rangle,$$
where $K(x_0, y_0; u(\cdot), v(\cdot))$ is the payoff function defined by any one of the above methods.
If the payoff function $K$ in the game $\Gamma$ is terminal, then the corresponding game
$\Gamma$ is called the game with terminal payoff. If the function $K$ is defined by the second
method, then we have the game for achievement of a minimum result. If the function
$K$ in the game $\Gamma$ is integral, then the corresponding game $\Gamma$ is called the game with
integral payoff. When the payoff function in the game $\Gamma$ is qualitative, the
corresponding game $\Gamma$ is called the game of kind.
5.1.10. It appears natural that optimal strategies cannot exist in the class of
piecewise open-loop strategies (in view of the open structure of the class). However,
we can show that in a sufficiently large number of cases, for any $\varepsilon > 0$, there is an
$\varepsilon$-equilibrium point.
Recall the definition of the $\varepsilon$-equilibrium point (see 2.2.3).
Definition. Let $\varepsilon > 0$ be given. The situation $s_\varepsilon = \{x_0, y_0; u_\varepsilon(\cdot), v_\varepsilon(\cdot)\}$ is called
an $\varepsilon$-equilibrium in the game $\Gamma(x_0, y_0)$ if for all $u(\cdot) \in P$ and $v(\cdot) \in E$
$$K(x_0, y_0; u(\cdot), v_\varepsilon(\cdot)) \ge K(x_0, y_0; u_\varepsilon(\cdot), v_\varepsilon(\cdot)) - \varepsilon,$$
$$K(x_0, y_0; u_\varepsilon(\cdot), v(\cdot)) \le K(x_0, y_0; u_\varepsilon(\cdot), v_\varepsilon(\cdot)) + \varepsilon. \qquad (5.1.10)$$
The strategies $u_\varepsilon(\cdot)$, $v_\varepsilon(\cdot)$ determined in (5.1.10) are called $\varepsilon$-optimal strategies of the
players $P$ and $E$, respectively.
The following Lemma is a rephrasing of Theorem 2.2.5 for differential games.
Lemma. Suppose that in the game $\Gamma(x_0, y_0)$ for every $\varepsilon > 0$ there is an $\varepsilon$-equilibrium. Then there exists the limit
$$\lim_{\varepsilon \to 0} K(x_0, y_0; u_\varepsilon(\cdot), v_\varepsilon(\cdot)).$$
Definition. The function $V(x, y)$ defined at each point $(x, y)$ of some set $D \subset R^n \times R^n$ by the rule
$$\lim_{\varepsilon \to 0} K(x, y; u_\varepsilon(\cdot), v_\varepsilon(\cdot)) = V(x, y) \qquad (5.1.11)$$
is called the value of the game $\Gamma(x, y)$ on the set of initial conditions $(x, y) \in D$.
The existence of an $\varepsilon$-equilibrium in the game $\Gamma(x_0, y_0)$ for any $\varepsilon > 0$ is equivalent
to the fulfilment of the equality
$$\sup_{v(\cdot) \in E} \inf_{u(\cdot) \in P} K(x_0, y_0; u(\cdot), v(\cdot)) = \inf_{u(\cdot) \in P} \sup_{v(\cdot) \in E} K(x_0, y_0; u(\cdot), v(\cdot)).$$
If in the game $\Gamma(x_0, y_0)$ for any $\varepsilon > 0$ there are $\varepsilon$-optimal strategies of the players $P$
and $E$, then the game $\Gamma(x_0, y_0)$ is said to have a solution.
Definition. Let $u^*(\cdot)$, $v^*(\cdot)$ be a pair of strategies such that
$$K(x_0, y_0; u^*(\cdot), v(\cdot)) \le K(x_0, y_0; u^*(\cdot), v^*(\cdot)) \le K(x_0, y_0; u(\cdot), v^*(\cdot)) \qquad (5.1.12)$$
for all $u(\cdot) \in P$ and $v(\cdot) \in E$. The situation $s^* = (x_0, y_0; u^*(\cdot), v^*(\cdot))$ is then called
an equilibrium in the game $\Gamma(x_0, y_0)$. The strategies $u^*(\cdot) \in P$ and $v^*(\cdot) \in E$ from
(5.1.12) are called optimal strategies of the players $P$ and $E$, respectively.
The existence of an equilibrium in the game $\Gamma(x_0, y_0)$ is equivalent (see 1.3.4) to
the fulfilment of the equality
$$\max_{v(\cdot) \in E} \min_{u(\cdot) \in P} K(x_0, y_0; u(\cdot), v(\cdot)) = \min_{u(\cdot) \in P} \max_{v(\cdot) \in E} K(x_0, y_0; u(\cdot), v(\cdot)).$$
5.2. Multistage perfect-information games with an infinite number of... 275
Clearly, if there exists an equilibrium, then for any $\varepsilon > 0$ it is also an $\varepsilon$-equilibrium,
i.e. here the function $V(x, y)$ merely coincides with $K(x, y; u^*(\cdot), v^*(\cdot))$ (see 2.2.3).
5.1.11. We shall now consider synthesizing strategies.
Definition. The pair $(u^*(x, y, t), v^*(x, y, t))$ is called a synthesizing strategy
equilibrium in the differential game, if the inequality
discussion it is convenient to assign each game $\Gamma(x_0, y_0, N)$ to the family of games
$\Gamma(x, y, T)$ depending on the parameters $x, y, T$.
5.2.2. The following result is a generalization of Theorem 4.2.1.
Theorem. The game $\Gamma(x_0, y_0, N)$ has an equilibrium in pure strategies, and the
value of the game $V(x_0, y_0, N)$ satisfies the relationship
Proof is carried out by induction on the number of steps. Let $N = 1$. Define the
strategies $u^*(\cdot)$, $v^*(\cdot)$ for the players in the game $\Gamma(x_0, y_0, 1)$ in the following way:
and for any strategies $u(\cdot)$, $v(\cdot)$ of the players in the game $\Gamma(x_0, y_0, 1)$
If $\max_{y \in V_{y_0}} \min_{x \in U_{x_0}} V(x, y, n) = V(u^{n+1}(x_0, \bar y, 1), \bar y, n)$, then $v^{n+1}(x_0, y_0, 1) = \bar y$ (for
$x \ne x_0$, $y \ne y_0$ the functions $u^{n+1}(x, y, 1)$ and $v^{n+1}(x, y, 1)$ can be defined in an
arbitrary way);
$$\bar u^{n+1}(\cdot, k) = u^n_{x_1}(\cdot, k-1), \quad k = 2, \ldots, n+1,$$
$$\bar v^{n+1}(\cdot, k) = v^n_{y_1}(\cdot, k-1), \quad k = 2, \ldots, n+1.$$
Here $x_1 \in U_{x_0}$, $y_1 \in V_{y_0}$ are the positions realized after the 1st step in the game
$\Gamma(x_0, y_0, n+1)$. By construction,
and the second move is made by Player 1. A game of this type is called a discrete
game of "simple pursuit" with discrimination against the evader. The duration of the
game is $N$ steps, and the payoff to Player 2 is equal to the distance between the players
at the final step.
We shall find the value of the game and optimal strategies for the players by using
the functional equation (5.2.7).
We have
$$V(x, y, 1) = \max_{y' \in V_y} \min_{x' \in U_x} \rho(x', y'). \qquad (5.2.9)$$
Since $U_x$ and $V_y$ are the circles of radii $\alpha$ and $\beta$ with centers at $x$ and $y$, we have that if
$U_x \supset V_y$, then $V(x, y, 1) = 0$; if, however, $U_x \not\supset V_y$, then $V(x, y, 1) = \rho(x, y) + \beta - \alpha = \rho(x, y) - (\alpha - \beta)$ (see Example 8 in 2.2.6). Thus,
$$V(x, y, m) = \max_{y' \in V_y} \min_{x' \in U_x} \{\max[0,\, \rho(x', y') - (m-1)(\alpha - \beta)]\}$$
$$= \max[0,\, \max_{y' \in V_y} \min_{x' \in U_x} \{\rho(x', y')\} - (m-1)(\alpha - \beta)]$$
$$= \max[0,\, \max\{0,\, \rho(x, y) - (\alpha - \beta)\} - (m-1)(\alpha - \beta)] = \max[0,\, \rho(x, y) - m(\alpha - \beta)],$$
which is what we set out to prove.
If $V(x_0, y_0, m) = \rho(x_0, y_0) - m(\alpha - \beta)$, i.e. $\rho(x_0, y_0) - m(\alpha - \beta) \ge 0$, then the
optimal strategy dictates Player 2 to choose at the $k$-th step of the game the point
$y_k$ of intersection of the line of centers $x_{k-1}, y_{k-1}$ with the boundary of $V_{y_{k-1}}$ that is the
farthest from $x_{k-1}$. Here $x_{k-1}, y_{k-1}$ are the players' positions after the $(k-1)$-th
step, $k = 1, \ldots, N$. The optimal strategy for Player 1 dictates him to choose at
the $k$-th step of the game the point from the set $U_{x_{k-1}}$ that is the nearest to the
point $y_k$. If both players act optimally, then the sequence of the chosen points
$x_0, x_1, \ldots, x_N$, $y_0, y_1, \ldots, y_N$ lies along the straight line passing through $x_0, y_0$. If
$V(x_0, y_0, m) = 0$, then an optimal strategy for Player 2 is arbitrary, while the optimal
strategy for Player 1 remains unaffected. In this case, after some step $k$ the equality
$\max_{y \in V_{y_k}} \min_{x \in U_{x_k}} \rho(x, y) = 0$ is satisfied; therefore, starting with the $(k+1)$-th step, the
choices of Player 1 will repeat the choices of Player 2.
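The optimal behavior just described can be simulated on the plane. The sketch below is our own illustration (not the book's construction): the evader steps $\beta$ away along the line of centers, the pursuer steps $\alpha$ toward the evader, and the resulting final distance is compared with the value formula $V(x_0, y_0, N) = \max[0, \rho(x_0, y_0) - N(\alpha - \beta)]$.

```python
import math

# Optimal play in the discrete "simple pursuit" game with discrimination
# against the evader (alpha > beta): each step the distance along the
# line of centers shrinks by alpha - beta until capture.

def play_simple_pursuit(x, y, alpha, beta, n_steps):
    for _ in range(n_steps):
        d = math.dist(x, y)
        if d == 0:
            continue                            # captured: P mirrors E
        ux, uy = (y[0] - x[0]) / d, (y[1] - x[1]) / d   # unit vector x -> y
        y = (y[0] + beta * ux, y[1] + beta * uy)        # evader moves away
        d = math.dist(x, y)
        step = min(alpha, d)                            # do not overshoot
        x = (x[0] + step * (y[0] - x[0]) / d,
             x[1] + step * (y[1] - x[1]) / d)           # pursuer closes in
    return math.dist(x, y)

x0, y0, alpha, beta, N = (0.0, 0.0), (10.0, 0.0), 2.0, 1.0, 4
value = max(0.0, math.dist(x0, y0) - N * (alpha - beta))
print(play_simple_pursuit(x0, y0, alpha, beta, N), value)  # 6.0 6.0
```

As the proof predicts, all realized points lie on the straight line through $x_0, y_0$, and the final distance matches the value formula.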
$$\dot y = g(y, v). \qquad (5.3.2)$$
Here $x, y \in R^n$, $u \in U$, $v \in V$, where $U$, $V$ are compact sets in the Euclidean
spaces $R^k$ and $R^l$, respectively, and $t \in [0, \infty)$. Suppose the requirements in 5.1.6 are all
satisfied.
Definition. Denote by $C_P^t(x_0)$ the set of points $x \in R^n$ for which there is
a measurable open-loop control $u(t) \in U$ sending the point $x_0$ to $x$ in time $t$, i.e.
$x(t_0) = x_0$, $x(t_0 + t) = x$. The set $C_P^t(x_0)$ is called the reachability set of Player $P$
from the initial state $x_0$ in time $t$.
In this manner we may also define the reachability set $C_E^t(y_0)$ of Player $E$ from
the initial state $y_0$ in time $t$.
We assume that the functions $f, g$ are such that the reachability sets $C_P^t(x_0)$,
$C_E^t(y_0)$ of the players $P$ and $E$, respectively, satisfy the following conditions:
1. $C_P^t(x_0)$, $C_E^t(y_0)$ are defined for any $x_0, y_0 \in R^n$, $t_0, t \in [0, \infty)$ ($t_0 \le t$) and are
compact sets of the space $R^n$;
2. the point-to-set map $C_P^t(x_0)$ is continuous in all its variables in the Hausdorff metric,
i.e. for every $\varepsilon > 0$, $x_0' \in R^n$, $t \in [0, \infty)$ there is $\delta > 0$ such that if $|t - t'| < \delta$ and
$\rho(x_0, x_0') < \delta$, then $\rho''(C_P^t(x_0), C_P^{t'}(x_0')) < \varepsilon$. This also applies to $C_E^t(y_0)$.
Recall that the Hausdorff metric $\rho''$ in the space of compact subsets of $R^n$ is given as
follows:
$$\rho''(A, B) = \max\{\max_{a \in A} \min_{b \in B} \rho(a, b),\ \max_{b \in B} \min_{a \in A} \rho(a, b)\}.$$
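For finite point sets the Hausdorff metric can be computed directly from this formula; a minimal sketch (the function name is ours):

```python
import math

# Hausdorff metric between two compact sets, computed here for finite
# point sets in the plane: the larger of the two one-sided deviations.

def hausdorff(A, B):
    d_ab = max(min(math.dist(a, b) for b in B) for a in A)  # A toward B
    d_ba = max(min(math.dist(a, b) for a in A) for b in B)  # B toward A
    return max(d_ab, d_ba)

A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 0.0), (4.0, 0.0)]
print(hausdorff(A, B))  # 3.0: the point (4, 0) is 3 away from A
```

Continuity of the reachability sets in this metric is exactly what makes the discrete approximations of the next subsection converge.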
Proof. The games $\Gamma_i^\delta(x_0, y_0, T)$, $i = 1, 2, 3$, belong to the class of multistage games
defined in Sec. 5.2. The existence of an equilibrium in the games $\Gamma_i^\delta(x_0, y_0, T)$ and the
continuity of the functions $\mathrm{Val}\,\Gamma_i^\delta(x_0, y_0, T)$ in $x_0, y_0$ immediately follow from the Theorem
in 5.2.2 and its corollary. The following recursion equations hold for the values of the
games $\Gamma_i^\delta(x_0, y_0, T)$, $i = 1, 2$:
$$\mathrm{Val}\,\Gamma_i^\delta(x_0, y_0, T) = \max_{y_1 \in C_E^\delta(y_0)} \min_{x_1 \in C_P^\delta(x_0)} \mathrm{Val}\,\Gamma_i^\delta(x_1, y_1, T - \delta).$$
Continuation of this process yields
5.3.5. Theorem. For all $x_0, y_0 \in R^n$, $T < \infty$ there is the limit equality
$$\lim_{n \to \infty} \mathrm{Val}\,\Gamma_2^{\delta_n}(x_0, y_0, T) = \lim_{n \to \infty} \mathrm{Val}\,\Gamma_3^{\delta_n}(x_0, y_0, T),$$
where $\delta_n = T/2^n$.
Proof. Let us fix some $n > 0$. Let $u(\cdot)$, $v(\cdot)$ be a pair of strategies in the game
$\Gamma_2^{\delta_n}(x_0, y_0, T)$. This pair remains the same in the game $\Gamma_3^{\delta_n}(x_0, y_0, T)$. Suppose that
the sequence $x_0, x_1, \ldots, x_{2^n}$, $y_0, y_1, \ldots, y_{2^n}$ is realized in the situation $u(\cdot)$, $v(\cdot)$. Denote
the payoff functions in the games $\Gamma_2^{\delta_n}(x_0, y_0, T)$, $\Gamma_3^{\delta_n}(x_0, y_0, T)$ by $K_2(u(\cdot), v(\cdot)) = \rho(x_{2^n}, y_{2^n})$ and $K_3(u(\cdot), v(\cdot)) = \rho(x_{2^n}, y_{2^n - 1})$, respectively. Then
Since the function $C_E^t(y)$ is continuous in $t$ and the condition $C_E^0(y) = y$ is satisfied,
the second term in (5.3.5) tends to zero as $n \to \infty$. Denote it by $\varepsilon_1(n)$. From (5.3.5),
(5.3.6) we obtain
From Lemma 5.3.3 the inverse inequality follows. Hence both limits in (5.3.9) coincide.
5.3.6. The statement of Theorem 5.3.5 is proved on the assumption that the
partition sequence of the interval [0, T]
We shall now consider arbitrary such partition sequences $\{\sigma_n\}$ of the interval $[0, T]$ and
Hence we may find natural numbers $m_1, n_1$ such that the following inequality is satisfied:
$$\mathrm{Val}\,\Gamma^{\sigma_{m_1}}(x_0, y_0, T) \ge \mathrm{Val}\,\Gamma^{\sigma'_{n_1}}(x_0, y_0, T).$$
Denote by $\bar\sigma$ the partition of the interval $[0, T]$ by the points belonging to both the
partitions $\sigma_{m_1}$ and $\sigma'_{n_1}$. For this partition
The game depends on the initial conditions $x_0, y_0$; therefore it is denoted by $\Gamma(x_0, y_0)$.
From the definition of the payoff function (5.4.2) it follows that the objective of
Player $E$ in the game $\Gamma(x_0, y_0)$ is to maximize the time of approaching Player $P$
within a given distance $l > 0$. Conversely, Player $P$ wishes to minimize this time.
5.4.2. There is a close relation between the time-optimal game of pursuit
$\Gamma(x_0, y_0)$ and the minimum-result game of pursuit with prescribed duration. Let
$\Gamma(x_0, y_0, T)$ be the game of pursuit with prescribed duration $T$ for achievement of a
minimum result (the payoff to Player $E$ is $\min_{0 \le t \le T} \rho(x(t), y(t))$). It was shown that
for the games of this type there is an $\varepsilon$-equilibrium in the class of piecewise open-loop strategies for any $\varepsilon > 0$ (see 5.3.8). Let $V(x_0, y_0, T)$ be the value of the game
$\Gamma(x_0, y_0, T)$ and $V(x_0, y_0)$ be the value of the game $\Gamma(x_0, y_0)$, if it exists.
Lemma. With $x_0, y_0$ fixed, the function $V(x_0, y_0, T)$ is continuous and does not
increase in $T$ on the interval $[0, \infty)$.
Proof. Let $T_1 \ge T_2 \ge 0$. Denote by $v_\varepsilon^{T_1}$ a strategy of Player $E$ in the game
$\Gamma(x_0, y_0, T_1)$ which guarantees that the distance between Player $E$ and Player $P$ on the
interval $[0, T_1]$ will be at least $\max[0, V(x_0, y_0, T_1) - \varepsilon]$. Hence it also ensures the distance
$\max[0, V(x_0, y_0, T_1) - \varepsilon]$ between the players on the interval $[0, T_2]$, where $T_2 \le T_1$.
Therefore
$$V(x_0, y_0, T_2) \ge \max[0, V(x_0, y_0, T_1) - \varepsilon] \qquad (5.4.3)$$
(the strategy $\varepsilon$-optimal in the game $\Gamma(x_0, y_0, T_1)$ is not necessarily $\varepsilon$-optimal in the
game $\Gamma(x_0, y_0, T_2)$). Since $\varepsilon$ can be chosen arbitrarily, the statement of the
Lemma follows from (5.4.3). The continuity of $V(x_0, y_0, T)$ in $T$ will be left without
proof. To be noted only is that this property can be obtained by using the continuity
of $V(x_0, y_0, T)$ in $x_0, y_0$.
5.4.3. Let us consider the equation
$$V(x_0, y_0, T) = l. \qquad (5.4.4)$$
Hence for any $T > 0$ (arbitrarily large) Player $E$ has a suitable strategy $v_T(\cdot) \in E$ which
guarantees him $l$-capture avoidance on the interval $[0, T]$. But then Player $P$ has no
strategy which could guarantee him $l$-capture of Player $E$ in finite time. However, we
cannot claim that Player $E$ has a strategy which ensures $l$-capture avoidance in finite
time. The problem of finding the initial states in which such a strategy exists reduces to
solving the game of kind for Player $E$. Thus, for $l < \lim_{T \to \infty} V(x_0, y_0, T)$ it can be
merely stated that the value of the game $\Gamma(x_0, y_0)$, if any, is larger than any previously
given $T$, i.e. it is $+\infty$.
The case c) is considered together with Case 3.
Case 2. Let T0 be the single root of equation (5.4.4). Then it follows from the monotonicity and the continuity of the function V(x0, y0, T) in T that
V(x0, y0, T) < V(x0, y0, T0) for all T > T0,  (5.4.5)
lim_{T→T0} V(x0, y0, T) = V(x0, y0, T0).  (5.4.6)
Let us fix an arbitrary T > T0 and consider the game of pursuit Γ(x0, y0, T). The game has an ε-equilibrium in the class of piecewise open-loop strategies for any ε > 0. This, in particular, means that for any ε > 0 there is a strategy u_ε(·) ∈ P of Player P which ensures the capture of Player E within a distance V(x0, y0, T) + ε, i.e.
K(u_ε(·), v(·)) ≤ V(x0, y0, T) + ε, v(·) ∈ E,  (5.4.7)
where K(·, ·) is the payoff function in the game Γ(x0, y0, T). Then (5.4.5), (5.4.6) imply the existence of ε̄ > 0 such that for any ε < ε̄ there is a number T̄(ε), T0 < T̄(ε) < T, for which
i.e. the strategy u_ε(·) ensures l-capture in time T. Hence, by the arbitrariness of T > T0, it follows that for any T > T0 there is a corresponding strategy u_T(·) ∈ P which ensures l-capture in time T. In other words, for every δ > 0 there is u_δ(·) ∈ P such that
t_n^l(x0, y0; u_δ(·), v(·)) ≤ T0 + δ for all v(·) ∈ E.  (5.4.9)
In a similar manner we may prove the existence of v_δ(·) ∈ E such that
t_n^l(x0, y0; u(·), v_δ(·)) ≥ T0 − δ for all u(·) ∈ P.  (5.4.10)
It follows from (5.4.9), (5.4.10) that in the time-optimal game of pursuit Γ(x0, y0) for any ε > 0 there is an ε-equilibrium in piecewise open-loop strategies, and the value of the game is equal to T0, with T0 the single root of equation (5.4.4).
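The computation in Case 2 can be sketched numerically: the value of the time-optimal game is the root T0 of V(x0, y0, T) = l, and by the Lemma in 5.4.2 the function V is continuous and nonincreasing in T, so bisection applies. The sketch below uses the simple-motion value V = max(0, d0 − T(α − β)) from Sec. 5.8 as a computable stand-in; the concrete numbers alpha, beta, d0, l are illustrative assumptions.

```python
# Sketch: in Case 2 the value of the time-optimal game is the single root T0
# of V(x0, y0, T) = l.  Bisection needs only the continuity and monotonicity
# of V in T (Lemma in 5.4.2).  Stand-in value function: simple motion,
# V = max(0, d0 - T*(alpha - beta)); all numbers are illustrative.

def bisect_capture_time(V, l, T_hi, tol=1e-10):
    """Root T0 of V(T) = l for a continuous nonincreasing function V."""
    lo, hi = 0.0, T_hi
    assert V(lo) >= l >= V(hi), "the root must be bracketed"
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if V(mid) > l:          # still farther than l: root lies to the right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

alpha, beta, d0, l = 2.0, 1.0, 10.0, 1.0       # pursuer faster: alpha > beta
V = lambda T: max(0.0, d0 - T * (alpha - beta))
T0 = bisect_capture_time(V, l, T_hi=20.0)      # analytic root: (d0 - l)/(alpha - beta)
```

For this stand-in the root has the closed form (d0 − l)/(α − β), which the bisection reproduces.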
Case 3. Denote by T0 the minimal root of equation (5.4.4). Generally speaking, we can no longer state that the value of the game Val Γ(x0, y0) = T0. Indeed, V(x0, y0, T0) = l merely implies that in the game Γ(x0, y0, T0) for any ε > 0 Player P has a strategy u_ε(·) which ensures for him, in time T0, the capture of Player E within a distance of at most l + ε. From the existence of more than one root of equation (5.4.4), and from the monotonicity of V(x0, y0, T) in T, we obtain the existence of an interval [T0, T1] on which the function V(x0, y0, T) is constant in T. Therefore, an increase in the duration of the game Γ(x0, y0, T0) by δ, where δ ≤ T1 − T0, does not involve a decrease in the guaranteed approach to Player E, i.e. for all T ∈ [T0, T1] Player P can merely ensure approaching Player E within a distance l + ε (for any ε > 0), and it is beyond reason to hope that ε can be made zero for some T ∈ [T0, T1]. If the game Γ(x0, y0, T0) had an equilibrium (and not merely an ε-equilibrium), then the value of the game Γ(x0, y0) would also be equal to T0 in Case 3.
5.4.4. Let us modify the notion of an equilibrium in the game Γ(x0, y0). In what follows it is convenient to use the notation Γ(x0, y0, l) instead of Γ(x0, y0), emphasizing the fact that the game Γ(x0, y0, l) terminates when the players come within the distance l of each other.
Let t_n^l(x0, y0; u(·), v(·)) be the time until the players come within the distance l in the situation (u(·), v(·)), and let ε > 0, δ > 0.
Definition. We say that the pair of strategies u_ε^δ(·), v_ε^δ(·) constitutes an ε,δ-equilibrium in the game Γ(x0, y0, l) if
From the definition of the value and the solution of the game Γ(x0, y0, l) (in the generalized sense) it follows that if in the game Γ(x0, y0, l) for every ε > 0 there is an ε-equilibrium in the ordinary sense (i.e. a solution in the ordinary sense), then V(x0, y0, l) = V'(x0, y0, l) (it suffices to take the sequence δ_k = 0 for all k).
Theorem. Let equation (5.4.4) have more than one root and let T0 be the least root, T0 < ∞. Then there exists the value V'(x0, y0, l) (in the generalized sense) of the time-optimal game of pursuit Γ(x0, y0, l), and V'(x0, y0, l) = T0.
Proof. The monotonicity and continuity of the function V(x0, y0, T) in T imply the existence of a sequence T_k → T0 from the left such that V(x0, y0, T_k) → V(x0, y0, T0) = l and the function V(x0, y0, T) is strictly monotone at the points T_k. Let
δ_k = V(x0, y0, T_k) − l > 0.
The strict monotonicity of the function V(x0, y0, T) at the points T_k implies that the equation V(x0, y0, T) = l + δ_k has a single root T_k. This means that for every δ_k ∈ {δ_k} the game Γ(x0, y0, l + δ_k) has an ε-equilibrium for every ε > 0 (see Case 2 in 5.4.3). The game Γ(x0, y0, l) then has a solution in the generalized sense:
lim_{k→∞} V(x0, y0, l + δ_k) = lim_{k→∞} T_k = T0 = V'(x0, y0, l).
5.4.5. Let V(x, y, T) be the value of the game with prescribed duration T from initial states x, y ∈ R^n with the payoff min_{0≤t≤T} ρ(x(t), y(t)). Then the following alternatives are possible: 1) V(x, y, T) > l; 2) V(x, y, T) ≤ l.
Case 1. From the definition of the function V(x, y, T) it follows that for every ε > 0 there is a strategy v_ε*(·) for Player E such that for all strategies u(·)
K(x, y; u(·), v_ε*(·)) ≥ V(x, y, T) − ε.
Choosing ε so small that V(x, y, T) − ε > l, we obtain that
K(x, y; u(·), v_ε*(·)) ≥ V(x, y, T) − ε > l
holds for all strategies u(·) ∈ P of Player P. From the form of the payoff function K it follows that, by employing the strategy v_ε*(·), Player E can ensure that the inequality min_{0≤t≤T} ρ(x(t), y(t)) > l is satisfied no matter what Player P does. That is, in this case Player E ensures l-capture avoidance on the interval [0, T] no matter what Player P does.
Case 2. Let T0 be the minimal root of the equation V(x, y, T) = l with x, y fixed (if ρ(x, y) ≤ l, then T0 is taken to be 0). From the definition of V(x, y, T0) it then follows that in the game Γ(x, y, T0) for every ε > 0 Player P has a strategy u_ε* which ensures that
K(x, y; u_ε*(·), v(·)) ≤ V(x, y, T0) + ε = l + ε
for all strategies v(·) ∈ E of Player E. From the form of the payoff function K it follows that, by employing the strategy u_ε*(·), Player P can ensure that the inequality min_{0≤t≤T} ρ(x(t), y(t)) ≤ l + ε is satisfied no matter what Player E does. Extending the strategy u_ε*(·) arbitrarily to the interval [T0, T], we have that, in Case 2, for every ε > 0 Player P can ensure (l + ε)-capture of Player E in time T no matter what the latter does.
This in fact proves the following theorem (of alternative).
Theorem. For every x, y ∈ R^n, T > 0 one of the following assertions holds:
1. from initial states x, y Player E can ensure l-capture avoidance during the time T no matter what Player P does;
2. for any ε > 0 Player P can ensure (l + ε)-capture of Player E from initial states x, y during the time T no matter what Player E does.
5.4.6. For each fixed T > 0 the entire space R^n × R^n is divided into three nonoverlapping regions: the region A = {x, y : V(x, y, T) < l}, called the capture zone; the region B = {x, y : V(x, y, T) > l}, naturally called the escape zone; and the region C = {x, y : V(x, y, T) = l}, called the indifference zone.
Let x, y ∈ A. By the definition of A, for any ε > 0 Player P has a strategy u_ε*(·) such that
K(x, y; u_ε*(·), v(·)) ≤ V(x, y, T) + ε
Chapter 5. Differential games
for all strategies v(·) of Player E. By a proper choice of ε > 0 it is possible to ensure that the following inequality is satisfied:
K(x, y; u_ε*(·), v(·)) ≤ V(x, y, T) + ε < l.
This means that the strategy u_ε* of Player P guarantees him l-capture of Player E from the initial states during the time T. We thus obtain the following refinement of the Theorem in 5.4.5.
Theorem. For every fixed T > 0 the entire space is divided into three nonoverlapping regions A, B, C possessing the following properties:
1. for any x, y ∈ A Player P has a strategy u_ε*(·) which ensures l-capture of Player E on the interval [0, T] no matter what the latter does;
2. for x, y ∈ B Player E has a strategy v_ε*(·) which ensures l-capture avoidance of Player P on the interval [0, T] no matter what the latter does;
3. if x, y ∈ C and ε > 0, then Player P has a strategy u_ε*(·) which ensures (l + ε)-capture of Player E during the time T no matter what the latter does.
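The zone decomposition above can be sketched in code. The classification only needs a computable value function; the sketch below uses the simple-motion value V(x, y, T) = max(0, ||x − y|| − T(α − β)) from Sec. 5.8 as a stand-in, and the numbers are illustrative.

```python
# Sketch of the zone decomposition of 5.4.6 (capture zone A, escape zone B,
# indifference zone C), using the simple-motion value function from Sec. 5.8
# as a computable stand-in.  All parameters below are illustrative.

import math

def zone(x, y, T, l, alpha, beta):
    """Classify initial states into regions A, B, C of 5.4.6."""
    V = max(0.0, math.dist(x, y) - T * (alpha - beta))
    if V < l:
        return "A"    # P can force l-capture by time T
    if V > l:
        return "B"    # E avoids l-capture on [0, T]
    return "C"        # only (l + eps)-capture can be guaranteed

alpha, beta, l, T = 2.0, 1.0, 1.0, 5.0
zA = zone((0.0, 0.0), (3.0, 0.0), T, l, alpha, beta)    # V = 0 < 1  -> "A"
zB = zone((0.0, 0.0), (10.0, 0.0), T, l, alpha, beta)   # V = 5 > 1  -> "B"
zC = zone((0.0, 0.0), (6.0, 0.0), T, l, alpha, beta)    # V = 1      -> "C"
```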
which may at times also be called (see Krasovskii (1985), Krasovskii and Subbotin (1974)) a hypothetical mismatch of the sets C_E^T(y0) and C_P^T(x0) (see Example 6 in 2.2.6).
The function ρ_T(x0, y0) has the following properties:
1. ρ_T(x0, y0) ≥ 0, ρ_T(x0, y0)|_{T=0} = ρ(x0, y0);
2. ρ_T(x0, y0) = 0 if C_P^T(x0) ⊃ C_E^T(y0);
5.5. Necessary and sufficient condition for existence of optimal open-loop strategies
3. V(x0, y0, T) ≥ ρ_T(x0, y0).
Indeed, property 1 follows from the non-negativity of the function ρ(x, y). Let C_P^T(x0) ⊃ C_E^T(y0). Then for every y' ∈ C_E^T(y0) there is x' ∈ C_P^T(x0) such that ρ(x', y') = 0 (x' = y'), whence follows property 2. Property 3 follows from the fact that Player E can always guarantee himself the amount ρ_T(x0, y0) by choosing the motion directed towards the point M ∈ C_E^T(y0) for which
ρ_T(x0, y0) = min_{x∈C_P^T(x0)} ρ(x, M).
For every x ∈ C_P^δ(x0) we have the inclusion C_P^{T−δ}(x) ⊂ C_P^T(x0). Hence for any x ∈ C_P^δ(x0), y ∈ C_E^{T−δ}(y0),
min_{x̄∈C_P^{T−δ}(x)} ρ(x̄, y) ≥ min_{x̄∈C_P^T(x0)} ρ(x̄, y)
and
min_{x∈C_P^δ(x0)} max_{y∈C_E^{T−δ}(y0)} min_{x̄∈C_P^{T−δ}(x)} ρ(x̄, y) ≥ max_{y∈C_E^{T−δ}(y0)} min_{x̄∈C_P^T(x0)} ρ(x̄, y).
Thus
max_{y∈C_E^δ(y0)} min_{x∈C_P^δ(x0)} ρ_{T−δ}(x, y) ≥ max_{y∈C_E^δ(y0)} max_{ȳ∈C_E^{T−δ}(y)} min_{x̄∈C_P^T(x0)} ρ(x̄, ȳ) = max_{ȳ∈C_E^T(y0)} min_{x̄∈C_P^T(x0)} ρ(x̄, ȳ) = ρ_T(x0, y0).
This completes the proof of the Lemma.
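The hypothetical mismatch ρ_T can be checked numerically for simple motion, where the reachability sets are discs S(x0, αT) and S(y0, βT) and the closed form is max(0, ||x0 − y0|| + βT − αT) (cf. Sec. 5.8). The boundary sampling below is an illustrative discretization, not part of the original argument:

```python
# Numerical sanity check (a sketch) of
#   rho_T(x0, y0) = max_{eta in C_E^T(y0)} min_{xi in C_P^T(x0)} rho(xi, eta)
# for simple motion: C_P^T(x0) = S(x0, alpha*T), C_E^T(y0) = S(y0, beta*T).
# For discs the exact value is max(0, ||x0 - y0|| + beta*T - alpha*T).

import math

def rho_T_sampled(x0, y0, alpha, beta, T, n=720):
    best = 0.0
    for i in range(n):                 # the outer max is attained on E's boundary
        a = 2.0 * math.pi * i / n
        eta = (y0[0] + beta * T * math.cos(a), y0[1] + beta * T * math.sin(a))
        # the inner min over P's disc is max(0, ||x0 - eta|| - alpha*T)
        best = max(best, max(0.0, math.dist(x0, eta) - alpha * T))
    return best

x0, y0, alpha, beta, T = (0.0, 0.0), (5.0, 0.0), 2.0, 1.0, 1.0
approx = rho_T_sampled(x0, y0, alpha, beta, T)
exact = max(0.0, math.dist(x0, y0) + beta * T - alpha * T)    # = 4.0
```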
We shall now prove the Theorem.
Necessity. Suppose that condition (5.5.2) is satisfied and condition (5.5.3) is not. Then, by the Lemma, there exist δ > 0, x0, y0 ∈ R^n, T0 = δk0, k0 ≥ 1, such that
Let u(·) be an optimal strategy for Player P in the game Γ_δ(x0, y0, T0) and suppose that at the first step Player E chooses the point y* ∈ C_E^δ(y0) for which
Let x0(δ) be the state to which Player P passes at the first step when he uses the strategy u(·), and let v̄(·) be an optimal strategy for Player E in the game Γ_δ(x0(δ), y*, T0 − δ). Let us consider the following strategy v(·) for Player E in the game Γ_δ(x0, y0, T0): at the time instant t = 0 he chooses the point y*, and from the instant t = δ he uses the strategy v̄(·).
Denote by ū(·) the truncation of the strategy u(·) to the interval [δ, T0]. From (5.5.2), (5.5.4), (5.5.5) (by (5.5.2), ρ_T(x0, y0) is the value of the game Γ_δ(x0, y0, T)) we find
Denote v*(·) = {σ, v*(t)}, where the partition σ of the interval [0, T] consists of the two points t0 = 0, t1 = T. Evidently, v*(·) ∈ E. By the Theorem in 1.3.4, v*(·) is an optimal strategy for Player E in the game Γ(x0, y0, T) if
inf_{u(·)} K(u(·), v*(·); x0, y0, T) = ρ_T(x0, y0).
Necessity. Suppose that in the game Γ(x0, y0, T) there exists an optimal open-loop strategy for Player E. Then
By the Lemma in 5.5.3, this implies the existence of an optimal open-loop strategy for Player E.
The necessity of condition (5.5.7) follows from the Theorem in 5.5.2, since the existence of an optimal open-loop strategy for Player E in the game Γ(x0, y0, T) involves the existence of such a strategy in all games Γ_δ(x0, y0, T), T = δk, k ≥ 1, and the validity of relationship (5.5.3).
5.6 Fundamental equation
In this section we will show that, under certain conditions, the value function of a differential game satisfies a partial differential equation called the fundamental equation. Although in the monographic literature R. Isaacs (1965) was the first to consider this equation, it is often referred to as the Isaacs-Bellman equation.
5.6.1. By employing the Theorem in 5.5.3, we shall derive a partial differential equation for the value function of the differential game. We assume that the conditions of the Theorem in 5.5.3 hold for the game Γ(x, y, T). Then the function ρ_T(x, y) is the value of the game Γ(x, y, T) of duration T from initial states x, y.
Suppose that in some domain Ω of the space R^n × R^n × [0, ∞) the function ρ_T(x, y) has continuous partial derivatives in all its variables. We shall show that in this case the function ρ_T(x, y) in the domain Ω satisfies the extremal differential equation
∂ρ_T/∂T − min_{u∈U} Σ_{i=1}^n (∂ρ_T/∂x_i) f_i(x, u) − max_{v∈V} Σ_{i=1}^n (∂ρ_T/∂y_i) g_i(y, v) = 0,  (5.6.1)
where the functions f_i(x, u), g_i(y, v), i = 1, ..., n, determine the behavior of the players in the game Γ (see (5.3.1), (5.3.2)).
Suppose that (5.6.1) fails to hold at some point (x, y, T) ∈ Ω. For definiteness, let
Σ_{i=1}^n (∂ρ_T/∂y_i) g_i(y, v*) = max_{v∈V} Σ_{i=1}^n (∂ρ_T/∂y_i) g_i(y, v).
Then the following inequality holds for any u ∈ U at the point (x, y, T) ∈ Ω:
∂ρ_T/∂T − Σ_{i=1}^n (∂ρ_T/∂x_i) f_i(x, u) − Σ_{i=1}^n (∂ρ_T/∂y_i) g_i(y, v*) > 0.  (5.6.2)
From the continuous differentiability of the function ρ in all its variables it follows that inequality (5.6.2) also holds in some neighbourhood S of the point (x, y, T). Let us choose a number δ > 0 so small that the point (x(τ), y(τ), T − τ) ∈ S for all τ ∈ [0, δ]. Here
x(τ) = x + ∫_0^τ f(x(t), u(t)) dt,
y(τ) = y + ∫_0^τ g(y(t), v*(t)) dt.
Consider the function
G(τ) = ρ_{T−τ}(x(τ), y(τ)).
u = u(x, ∂V/∂x), v = v(y, ∂V/∂y).  (5.6.5)
Substituting expressions (5.6.5) into (5.6.4) we obtain
this theorem holds for any continuous terminal payoff H(x(T), y(T)). In this case, however, instead of the quantity ρ_T(x, y) we have to consider the quantity
H_T(x, y) = max_{y'∈C_E^T(y)} min_{x'∈C_P^T(x)} H(x', y').
Equation (5.6.4) also holds for the value of the differential game with prescribed duration and any terminal payoff, i.e. if in the differential game Γ(x, y, T) with prescribed duration and terminal payoff H(x(T), y(T)) there is an optimal open-loop strategy for Player E, then the value of the game V(x, y, T), in the domain of the space R^n × R^n × [0, ∞) where there exist continuous partial derivatives, satisfies equation (5.6.4) with the initial condition V(x, y, T)|_{T=0} = H(x, y), or equation (5.6.6) with the same initial condition.
5.6.2. We shall now consider games of pursuit in which the payoff is equal to the time-to-capture. For definiteness, we assume that the terminal manifold F is the sphere ρ(x, y) = l, l > 0. We also assume that the sets C_P^t(x) and C_E^t(y) are continuous in t at zero, uniformly with respect to x and y.
Suppose that the following quantity makes sense:
Θ(x, y, l) = max_{v(t)} min_{u(t)} t_n^l(x, y; u(t), v(t)),
where t_n^l(x, y; u(t), v(t)) is the time of approach within the distance l for the players P and E moving from the initial states x, y and using measurable open-loop controls u(t) and v(t), respectively. Also, suppose the function Θ(x, y, l) is continuous in all its independent variables.
Let us denote the time-optimal game by Γ(x0, y0). As in Secs. 5.4, 5.5, we may derive necessary and sufficient conditions for the existence of an optimal open-loop strategy for Player E in the time-optimal game. The following theorem holds.
Theorem. In order for Player E to have an optimal open-loop strategy in the game Γ(x0, y0) for any x0, y0 ∈ R^n, it is necessary and sufficient that for any δ > 0 and any x0, y0 ∈ R^n
Θ(x0, y0, l) = δ + max_{y'∈C_E^δ(y0)} min_{x'∈C_P^δ(x0)} Θ(x', y', l).
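The recursion in the theorem can be checked numerically for simple motion in the plane, where the optimal capture time is Θ = max(0, (||x − y|| − l)/(α − β)) with α > β and the δ-reachability sets are discs. The boundary sampling below and all numbers are illustrative assumptions:

```python
# An illustrative check of Theta(x0, y0, l) = delta + max min Theta for
# simple motion: Theta = max(0, (||x - y|| - l)/(alpha - beta)), alpha > beta,
# with reachability discs S(x0, alpha*delta) and S(y0, beta*delta).

import math

def theta(d, l, alpha, beta):
    return max(0.0, (d - l) / (alpha - beta))

def recursion_rhs(x0, y0, l, alpha, beta, delta, n=720):
    best = -math.inf
    for i in range(n):                       # y' on the boundary of E's disc
        a = 2.0 * math.pi * i / n
        yp = (y0[0] + beta * delta * math.cos(a),
              y0[1] + beta * delta * math.sin(a))
        # min over x' in S(x0, alpha*delta): the closest x' to y' sits at
        # distance max(0, ||x0 - y'|| - alpha*delta)
        d_min = max(0.0, math.dist(x0, yp) - alpha * delta)
        best = max(best, theta(d_min, l, alpha, beta))
    return delta + best

x0, y0, alpha, beta, l, delta = (0.0, 0.0), (8.0, 0.0), 3.0, 1.0, 2.0, 0.5
lhs = theta(math.dist(x0, y0), l, alpha, beta)      # (8 - 2)/(3 - 1) = 3
rhs = recursion_rhs(x0, y0, l, alpha, beta, delta)  # delta + 2.5 = 3
```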
For the time-optimal game of pursuit equation (5.6.4) becomes
min_{u∈U} Σ_{i=1}^n (∂Θ/∂x_i) f_i(x, u) + max_{v∈V} Σ_{i=1}^n (∂Θ/∂y_i) g_i(y, v) + 1 = 0,  (5.6.10)
subject to
Θ(x, y, l)|_{ρ(x,y)=l} = 0.  (5.6.11)
The derivation of equation (5.6.8) is analogous to the derivation of equation (5.6.4) for the game of pursuit with prescribed duration.
Both initial value problems (5.6.4), (5.6.7) and (5.6.8), (5.6.9) are nonlinear in the partial derivatives, and therefore their solution presents serious difficulties.
5.6.3. We shall now derive the equations of characteristics for (5.6.4). We assume that the function V(x, y, T) has continuous mixed second derivatives over the entire space, that the functions g_i(y, v), f_i(x, u) and the functions u = u(x, ∂V/∂x), v = v(y, ∂V/∂y) have continuous first derivatives with respect to all their variables, and that the sets U, V are parallelepipeds a_m ≤ u_m ≤ b_m, m = 1, ..., k, and c_q ≤ v_q ≤ d_q, q = 1, ..., t, where u = (u_1, ..., u_k) ∈ U, v = (v_1, ..., v_t) ∈ V. Denote
B(x, y, T) = ∂V/∂T − Σ_{i=1}^n (∂V/∂x_i) f_i(x, u) − Σ_{i=1}^n (∂V/∂y_i) g_i(y, v).
For every fixed point (x, y, T) ∈ R^n × R^n × [0, ∞) the maximizing value v and the minimizing value u in (5.6.4) lie either inside or on the boundary of the interval of constraints. If this is an interior point, then
Σ_{i=1}^n (∂V/∂x_i) ∂f_i(x, u)/∂u_m = 0, m = 1, ..., k.
If, however, u (or v) is at the boundary, then two cases are possible. Let us discuss these cases for one of the components u_m(x, ∂V/∂x) of the vector u. The other components of the vector u and the vector v can be investigated in a similar manner. For simplicity assume that at some point (x', y', T')
Case 1. In the space R^n there exists a ball with its center at the point x' such that the following equality holds for all points x of the ball:
u_m = u_m(x, ∂V(x, y', T')/∂x) = a_m.
The function u_m assumes a constant value on this ball; therefore at the point x' we have
∂u_m/∂x_i = 0, i = 1, ..., n.
Case 2. Such a ball does not exist. Then there is a sequence of interior points x_r, lim_{r→∞} x_r = x', such that
u_m(x_r, ∂V(x_r, y', T')/∂x) ≠ a_m.
Hence at the points x_r
Σ_{i=1}^n (∂V/∂x_i) ∂f_i/∂u_m = 0.
From the continuity of the derivatives ∂V/∂x_i, ∂f_i/∂u_m and of the function u_m = u_m(x, ∂V(x, y, T)/∂x) it follows that the preceding equality also holds at the point (x', y', T').
Thus, the last two terms in (5.6.12) are zero, and the following equality holds for all (x, y, T) ∈ R^n × R^n × [0, ∞):
∂²V/(∂T∂x_k) − Σ_{i=1}^n (∂²V/(∂x_i∂x_k)) f_i(x, u) − Σ_{i=1}^n (∂V/∂x_i) ∂f_i(x, u)/∂x_k − Σ_{i=1}^n (∂²V/(∂y_i∂x_k)) g_i(y, v) = 0, k = 1, 2, ..., n.  (5.6.13)
Let x(t), y(t), t ∈ [0, T], be a solution of the system
dx/dt = f(x, u(x, ∂V(x, y, T − t)/∂x)),
dy/dt = g(y, v(y, ∂V(x, y, T − t)/∂y))
with the initial conditions x(0) = x0, y(0) = y0. Along the solution x(t), y(t) we have
d/dt [∂V(x(t), y(t), T − t)/∂x_k] = −∂²V/(∂T∂x_k) + Σ_{i=1}^n (∂²V/(∂x_i∂x_k)) f_i(x(t), u(t)) + Σ_{i=1}^n (∂²V/(∂y_i∂x_k)) g_i(y(t), v(t)).  (5.6.14)
Note that for a twice continuously differentiable function we may reverse the order of differentiation. Now (5.6.13) can be rewritten in terms of (5.6.14) as
d/dt [∂V(x(t), y(t), T − t)/∂x_k] = −Σ_{i=1}^n (∂V(x(t), y(t), T − t)/∂x_i) ∂f_i(x(t), u(t))/∂x_k, k = 1, ..., n.
In a similar manner we obtain the equations
d/dt [∂V(x(t), y(t), T − t)/∂y_i] = −Σ_{j=1}^n (∂V(x(t), y(t), T − t)/∂y_j) ∂g_j(y(t), v(t))/∂y_i, i = 1, ..., n.
Since for t ∈ [0, T]
V(x(t), y(t), T − t) = H(x(T), y(T)),
we have
d/dt [∂V(x(t), y(t), T − t)/∂T] = 0.
Let us introduce the following notation:
V_{x_i}(t) = ∂V(x(t), y(t), T − t)/∂x_i,
V_{y_i}(t) = ∂V(x(t), y(t), T − t)/∂y_i, i = 1, ..., n.
As a result we obtain the following system of ordinary differential equations for the functions x(t), y(t), V_x(t), V_y(t):
dx_i/dt = f_i(x, u(x, V_x)),
dy_i/dt = g_i(y, v(y, V_y)),
dV_{x_k}/dt = −Σ_{i=1}^n V_{x_i} ∂f_i(x, u(x, V_x))/∂x_k,
dV_{y_k}/dt = −Σ_{i=1}^n V_{y_i} ∂g_i(y, v(y, V_y))/∂y_k,  (5.6.15)
dV_T/dt = 0, i, k = 1, ..., n,
and, by (5.6.6), we have
V_T = Σ_{i=1}^n V_{y_i} g_i(y, v(y, V_y)) + Σ_{i=1}^n V_{x_i} f_i(x, u(x, V_x)).
In order to solve the system of nonlinear equations (5.6.15) with respect to the functions x(t), y(t), V_{x_k}(t), V_{y_k}(t), V_T(t), we need to define initial conditions. For the function V(x(t), y(t), T − t) such conditions are given at the time instant t = T; therefore we introduce the variable τ = T − t and write the equations of characteristics in reverse time. Let us introduce the notation x̄ = x, ȳ = y. The equations of characteristics become
dx̄_i/dτ = −f_i(x̄, u),
dȳ_i/dτ = −g_i(ȳ, v),
dV_{x̄_k}/dτ = Σ_{i=1}^n V_{x̄_i} ∂f_i(x̄, u)/∂x̄_k,
dV_{ȳ_k}/dτ = Σ_{i=1}^n V_{ȳ_i} ∂g_i(ȳ, v)/∂ȳ_k,  (5.6.16)
dV_T/dτ = 0.
In the specification of initial conditions for system (5.6.16), use is made of the relationship V(x, y, T)|_{T=0} = H(x, y). Let x̄|_{τ=0} = s, ȳ|_{τ=0} = s'. Then
V_{x̄_i}|_{τ=0} = ∂H/∂x_i |_{x=s, y=s'},
V_{ȳ_i}|_{τ=0} = ∂H/∂y_i |_{x=s, y=s'}.  (5.6.17)
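A minimal sketch of integrating the reverse-time characteristics (5.6.16): for the simple-motion game dx/dt = αu, dy/dt = βv with terminal payoff H(x, y) = ||x − y|| (cf. Sec. 5.8), the right-hand sides do not depend on the state, so V_x, V_y are constant along a characteristic, the controls are frozen, and Euler integration is exact. The value carried along the characteristic must equal the terminal value H(s, s'). The setup and numbers are illustrative assumptions:

```python
# Sketch: reverse-time characteristics (5.6.16) for simple motion,
# xdot = alpha*u, ydot = beta*v, terminal payoff H(x, y) = ||x - y||.
# Here V_x, V_y stay constant, u = -V_x/||V_x||, v = V_y/||V_y|| are frozen,
# and the simple-motion value V = ||x - y|| - tau*(alpha - beta) (Sec. 5.8)
# evaluated along the characteristic must equal H(s, s').

import math

def integrate_characteristic(s, sp, alpha, beta, tau_end, steps=1000):
    x, y = list(s), list(sp)
    d = math.dist(s, sp)
    Vx = [(s[i] - sp[i]) / d for i in range(2)]   # dH/dx at (s, s'); Vy = -Vx
    u = [-c for c in Vx]                          # minimizes (V_x, alpha*u)
    v = [-c for c in Vx]                          # maximizes (V_y, beta*v)
    h = tau_end / steps
    for _ in range(steps):                        # reverse time: dx = -f, dy = -g
        for i in range(2):
            x[i] -= h * alpha * u[i]
            y[i] -= h * beta * v[i]
    return x, y

s, sp, alpha, beta, tau = (0.0, 0.0), (4.0, 0.0), 2.0, 1.0, 3.0
x, y = integrate_characteristic(s, sp, alpha, beta, tau)
value_along = math.dist(x, y) - tau * (alpha - beta)   # = ||s - s'|| = 4
```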
lim_{δ→0} V_δ(x, y, T) = V(x, y, T),
and optimal strategies in the game Γ_δ(x, y, T) for sufficiently small δ can be efficiently used to construct ε-equilibria in the game Γ(x, y, T).
5.7.2. The essence of the numerical method is to construct an algorithm for finding a solution of the game Γ_δ(x, y, T). We shall now expound this method.
Zero-order approximation. The zero-order approximation to the value function V_δ(x, y, T) is taken to be the function
V̄(x, y, T) = max_{η∈C_E^T(y)} min_{ξ∈C_P^T(x)} ρ(ξ, η),  (5.7.1)
where C_P^T(x), C_E^T(y) are the reachability sets of the players P and E from the initial states x, y ∈ R^n by the time T.
The choice of the function V̄(x, y, T) as the initial approximation is justified by the fact that in a sufficiently large class of games (in what is called the regular case) it turns out to be the value of the game Γ(x, y, T). The subsequent approximations are constructed by the rule
V_δ^{k+1}(x, y, T) = max_{η∈C_E^δ(y)} min_{ξ∈C_P^δ(x)} V_δ^k(ξ, η, T − δ), V_δ^0 = V̄.  (5.7.2)
We have
V_δ^1(x, y, T) ≥ V_δ^0(x, y, T).
Indeed,
V_δ^1(x, y, T) = max_{η∈C_E^δ(y)} min_{ξ∈C_P^δ(x)} max_{η'∈C_E^{T−δ}(η)} min_{ξ'∈C_P^{T−δ}(ξ)} ρ(ξ', η') ≥ max_{η∈C_E^T(y)} min_{ξ∈C_P^T(x)} ρ(ξ, η) = V_δ^0(x, y, T).
Suppose that
V_δ^l(x, y, T) ≥ V_δ^{l−1}(x, y, T)  (5.7.3)
holds for all l ≤ k. We prove this inequality for l = k + 1. From relationships (5.7.2) and (5.7.3) it follows that
V_δ^{k+1}(x, y, T) = max_{η∈C_E^δ(y)} min_{ξ∈C_P^δ(x)} V_δ^k(ξ, η, T − δ) ≥ max_{η∈C_E^δ(y)} min_{ξ∈C_P^δ(x)} V_δ^{k−1}(ξ, η, T − δ) = V_δ^k(x, y, T).
Similarly we get
V_δ^{N+1}(x, y, T) = max_{η∈C_E^δ(y)} min_{ξ∈C_P^δ(x)} ... max_{η^{N−1}∈C_E^δ(η^{N−2})} min_{ξ^{N−1}∈C_P^δ(ξ^{N−2})} V_δ^1(ξ^{N−1}, η^{N−1}, T − (N − 1)δ).
But T − (N − 1)δ = α ≤ δ, therefore
The coincidence of the members of the sequence V_δ^k for k ≥ N is derived from (5.7.4) by induction. This completes the proof of the theorem.
5.7.5. Theorem. The limit of the sequence {V_δ^k(x, y, T)} coincides with the value of the game Γ_δ(x, y, T).
Proof. This theorem is essentially a corollary of the Theorem in 5.7.4. Indeed, let
V_δ(x, y, T) = lim_{k→∞} V_δ^k(x, y, T),
which is a sufficient condition for the function V_δ(x, y, T) to be the value of the game Γ_δ(x, y, T) (this is also a "regularity" criterion).
5.7.6. We shall now provide a modification of the method of successive approximations discussed above.
The initial approximation is taken to be the function Ṽ_δ^0(x, y, T) = V̄(x, y, T), where V̄(x, y, T) is defined by (5.7.1). The subsequent approximations are constructed by the rule
Ṽ_δ^{k+1}(x, y, T) = max_{i∈[1:N]} max_{η∈C_E^{iδ}(y)} min_{ξ∈C_P^{iδ}(x)} Ṽ_δ^k(ξ, η, T − iδ)
for T > δ, where N = [T/δ], and Ṽ_δ^{k+1}(x, y, T) = V̄(x, y, T) for T ≤ δ.
The statements of the theorems in 5.7.3–5.7.5 hold both for the sequence of functions {V_δ^k(x, y, T)} and for the sequence of functions {Ṽ_δ^k(x, y, T)}.
The proof of these statements for the sequence {Ṽ_δ^k(x, y, T)} is almost an exact replica of the similar argument for the sequence {V_δ^k(x, y, T)}. In the region {(x, y, T) : T > δ} the functional equation for the value function of the game Γ_δ(x, y, T) becomes
V_δ(x, y, T) = max_{i∈[1:N]} max_{η∈C_E^{iδ}(y)} min_{ξ∈C_P^{iδ}(x)} V_δ(ξ, η, T − iδ),  (5.7.7)
where N = [T/δ], while the initial condition remains unaffected, i.e. it is of the form (5.7.6).
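One step of the successive-approximation rule (5.7.2) can be sketched for simple motion in the plane, where the zero-order approximation V̄ is already the value (the regular case), so a single iteration must reproduce it. The boundary sampling and the numbers are illustrative assumptions:

```python
# Sketch of one step of the scheme of 5.7.2 for simple motion: the zero-order
# approximation V0(x, y, T) = max(0, ||x - y|| + beta*T - alpha*T) is already
# the value (regular case), so one application of rule (5.7.2) reproduces it.

import math

def V0(x, y, T, alpha, beta):
    return max(0.0, math.dist(x, y) + beta * T - alpha * T)

def one_step(x, y, T, alpha, beta, delta, n=360):
    best = -math.inf
    for i in range(n):                  # eta on the boundary of E's delta-disc
        a = 2.0 * math.pi * i / n
        eta = (y[0] + beta * delta * math.cos(a),
               y[1] + beta * delta * math.sin(a))
        # V0(xi, eta, T - delta) depends only on ||xi - eta||, so the inner
        # min over P's delta-disc is attained at distance
        # max(0, ||x - eta|| - alpha*delta)
        d_min = max(0.0, math.dist(x, eta) - alpha * delta)
        best = max(best, max(0.0, d_min + (beta - alpha) * (T - delta)))
    return best

x, y, alpha, beta, T, delta = (0.0, 0.0), (6.0, 0.0), 2.0, 1.0, 2.0, 0.25
v0 = V0(x, y, T, alpha, beta)              # 6 + 2 - 4 = 4
v1 = one_step(x, y, T, alpha, beta, delta)
```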
5.7.7. We shall now prove the equivalence of equations (5.7.5) and (5.7.7).
Theorem. Equations (5.7.5) and (5.7.7) with the initial condition (5.7.6) are equivalent.
Proof. Suppose the function V_δ(x, y, T) satisfies equation (5.7.5) and the initial condition (5.7.6). We show that this function satisfies equation (5.7.7) in the region {(x, y, T) : T > δ}. Iterating (5.7.5), we obtain
V_δ(x, y, T) = max_{η∈C_E^{2δ}(y)} min_{ξ∈C_P^{2δ}(x)} V_δ(ξ, η, T − 2δ) ≥ ... ≥ max_{η∈C_E^{iδ}(y)} min_{ξ∈C_P^{iδ}(x)} V_δ(ξ, η, T − iδ) ≥ ....
When i = 1 we have equality; hence
V_δ(x, y, T) = max_{i∈[1:N]} max_{η∈C_E^{iδ}(y)} min_{ξ∈C_P^{iδ}(x)} V_δ(ξ, η, T − iδ).
However,
Since for i = 1 the strict inequality holds, this contradiction proves the theorem.
H(x(T), y(T)) = ||x(T) − y(T)||.
Let Γ_δ(x, y, T) be a discrete form of the differential game Γ(x, y, T) with the partition step δ > 0 and discrimination against Player E. The game Γ_δ(x, y, T) has N steps, where N = T/δ. By Sec. 5.2 (see the Example in 5.2.3) the game Γ_δ(x, y, T) has the value
V_δ(x, y, T) = max{0, ||x − y|| − Nδ(α − β)} = max{0, ||x − y|| − T(α − β)},
and the optimal motion of the players is along the straight line connecting the initial states x, y.
By the results of 5.3, the value of the original differential game is
V(x, y, T) = max_{η∈C_E^T(y)} min_{ξ∈C_P^T(x)} ||ξ − η|| = max{0, ||x − y|| − T(α − β)},
where C_E^T(y) = S(y, βT) is the ball in R^n of radius βT with its center at the point y, and similarly C_P^T(x) = S(x, αT). Thus, by the Lemma in 5.5.3, Player E in the game Γ(x0, y0, T) has the optimal open-loop strategy v*(t), t ∈ (0, T], which leads Player E's trajectory to the point y* ∈ C_E^T(y0) for which
Evidently,
v*(t) = v* = (y0 − x0)/||y0 − x0|| if y0 ≠ x0, and v*(t) = v if y0 = x0,
where v ∈ R^n is an arbitrary vector such that ||v|| = 1. From the results of 5.6 it follows that in the region A,
follows that in the region A
A={(x,y,T):||*-y||-r(a-/*)>0},
where there exist continuous partial derivatives
dV . a. OV dV x-y
v
dT "" dx dy ||x-y|r
dV
<dV \ a (dV ^ n /* a
- - a min(-^ ,u) pmaxf-r ,v) = 0. (5.8.3)
In equation (5.8.3), the minimum and the maximum are achieved by the controls
u = −(∂V/∂x)/||∂V/∂x|| = (y − x)/||x − y||,  (5.8.4)
v = (∂V/∂y)/||∂V/∂y|| = (y − x)/||x − y||.  (5.8.5)
Strategies (5.8.4), (5.8.5) are optimal in the differential game (5.8.1). The strategy u(x, y) determined by relationship (5.8.4) is called a "pursuit strategy", since at each instant of time the velocity vector of Player P using this strategy points towards Evader E.
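The optimal behavior in (5.8.4), (5.8.5) is easy to simulate: both players move along the line joining them, and the final distance agrees with the value V(x, y, T) = max(0, ||x − y|| − T(α − β)). A sketch with illustrative numbers:

```python
# Simulation sketch of the optimal strategies (5.8.4), (5.8.5): Player P runs
# straight at Player E, E flees along the same line; the final distance must
# agree with V(x, y, T) = max(0, ||x - y|| - T*(alpha - beta)).

import math

def simulate(x, y, alpha, beta, T, steps=10000):
    x, y = list(x), list(y)
    h = T / steps
    for _ in range(steps):
        d = math.dist(x, y)
        if d == 0.0:
            break
        e = [(y[i] - x[i]) / d for i in range(2)]  # unit vector from P to E
        for i in range(2):
            x[i] += h * alpha * e[i]               # pursuit strategy (5.8.4)
            y[i] += h * beta * e[i]                # evasion strategy (5.8.5)
    return math.dist(x, y)

alpha, beta, T = 2.0, 1.0, 3.0
d_final = simulate((0.0, 0.0), (7.0, 0.0), alpha, beta, T)
# predicted value: max(0, 7 - 3*(2 - 1)) = 4
```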
5.8.2. Example 5. (Game of pursuit with frictional forces.) The pursuit takes place in the plane. The equations of motion have the form
for P: dq_i/dt = p_i, dp_i/dt = αu_i − k_P p_i, ||u|| ≤ 1;
for E: dr_i/dt = s_i, ds_i/dt = βv_i − k_E s_i, ||v|| ≤ 1, i = 1, 2.
In the plane q = (q_1, q_2), the reachability set C_P^T(q, p) for Player P from the initial states q(0) = q, p(0) = p in the time T is the circle (Exercise 18) of radius
R_P(T) = (α/k_P²)(e^{−k_P T} + k_P T − 1)
with its center at the point
a(q, p, T) = q + p(1 − e^{−k_P T})/k_P.
Similarly, the set C_E^T(r, s) is the circle of radius
R_E(T) = (β/k_E²)(e^{−k_E T} + k_E T − 1)
5.8. Examples of solutions to differential games of pursuit
with its center at the point
b(r, s, T) = r + s(1 − e^{−k_E T})/k_E.
Hence
ρ_T(q, p, r, s) = max{0, ||r + s(1 − e^{−k_E T})/k_E − q − p(1 − e^{−k_P T})/k_P|| + (β/k_E²)(e^{−k_E T} + k_E T − 1) − (α/k_P²)(e^{−k_P T} + k_P T − 1)}.  (5.8.9)
In particular, the conditions α > β, α/k_P > β/k_E suffice to ensure that for any initial states q, p, r, s there is a suitable T for which ρ_T(q, p, r, s) = 0.
The function ρ_T(q, p, r, s) satisfies the extremal differential equation (5.6.1) in the domain Ω = {(q, p, r, s, T) : ρ_T(q, p, r, s) > 0}. In fact, in the domain Ω there exist the continuous partial derivatives
∂ρ/∂T, ∂ρ/∂q_i, ∂ρ/∂p_i, ∂ρ/∂r_i, ∂ρ/∂s_i, i = 1, 2,  (5.8.10)
and equation (5.6.1) takes the form
∂ρ/∂T − min_{||u||≤1} Σ_{i=1}^2 [(∂ρ/∂q_i) p_i + (∂ρ/∂p_i)(αu_i − k_P p_i)] − max_{||v||≤1} Σ_{i=1}^2 [(∂ρ/∂r_i) s_i + (∂ρ/∂s_i)(βv_i − k_E s_i)] = 0.  (5.8.11)
Here the extrema are achieved on the controls ū, v̄ determined by the following formulas:
ū_i = −(∂ρ/∂p_i)/√((∂ρ/∂p_1)² + (∂ρ/∂p_2)²),  (5.8.12)
v̄_i = (∂ρ/∂s_i)/√((∂ρ/∂s_1)² + (∂ρ/∂s_2)²), i = 1, 2,  (5.8.13)
so that (5.8.11) becomes
∂ρ/∂T − Σ_{i=1}^2 (∂ρ/∂q_i) p_i + k_P Σ_{i=1}^2 (∂ρ/∂p_i) p_i − Σ_{i=1}^2 (∂ρ/∂r_i) s_i + k_E Σ_{i=1}^2 (∂ρ/∂s_i) s_i + α√((∂ρ/∂p_1)² + (∂ρ/∂p_2)²) − β√((∂ρ/∂s_1)² + (∂ρ/∂s_2)²) = 0.  (5.8.14)
Computing the partial derivatives (5.8.10) we see that the function ρ_T(q, p, r, s) in the domain Ω satisfies equation (5.8.14). Note that the quantity ρ_T(q, p, r, s) is the value of the differential game (5.8.6)–(5.8.8) and that the controls determined by relationships (5.8.12), (5.8.13) are optimal in the domain Ω.
From formulas (5.8.12), (5.8.13), (5.8.9) we find
ū_i = v̄_i = (r_i + s_i(1 − e^{−k_E T})/k_E − q_i − p_i(1 − e^{−k_P T})/k_P)/||b(r, s, T) − a(q, p, T)||, i = 1, 2.  (5.8.15)
In the situation ū, v̄ the force direction for each of the players is parallel to the line connecting the centers of the reachability circles (as follows from formula (5.8.15)) and remains unchanged, since in this situation the centers of the reachability circles move along the straight line.
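The quantities of Example 5 lend themselves to direct computation: the sketch below evaluates the reachability radii, the circle centers, and the mismatch (5.8.9), and locates by bisection the first T at which ρ_T vanishes, which exists under α > β, α/k_P > β/k_E. All initial data are illustrative assumptions:

```python
# Sketch for Example 5: reachability circles with friction and the mismatch
# (5.8.9).  Under alpha > beta and alpha/k_P > beta/k_E the mismatch rho_T
# vanishes for some finite T, located here by bisection on the sign change.
# The initial states and parameters are illustrative assumptions.

import math

def radius(force, k, T):
    """Radius of the reachability circle: (force/k^2)(e^{-kT} + kT - 1)."""
    return force / k**2 * (math.exp(-k * T) + k * T - 1.0)

def center(pos, vel, k, T):
    """Center of the reachability circle: pos + vel*(1 - e^{-kT})/k."""
    return [pos[i] + vel[i] * (1.0 - math.exp(-k * T)) / k for i in range(2)]

def rho_T(q, p, r, s, alpha, beta, kP, kE, T):
    a, b = center(q, p, kP, T), center(r, s, kE, T)
    return max(0.0, math.dist(a, b) + radius(beta, kE, T) - radius(alpha, kP, T))

q, p, r, s = (0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (1.0, 0.0)
alpha, beta, kP, kE = 2.0, 1.0, 0.5, 1.0     # alpha > beta, alpha/kP > beta/kE
lo, hi = 0.0, 100.0
while hi - lo > 1e-9:
    mid = 0.5 * (lo + hi)
    if rho_T(q, p, r, s, alpha, beta, kP, kE, mid) > 0.0:
        lo = mid
    else:
        hi = mid
capture_time = 0.5 * (lo + hi)               # first T with rho_T = 0
```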
opponent's state x(t) and the time t. His payoff is equal to the distance between the players at the time instant T; the payoff to Player P is equal to the payoff to Player E but opposite in sign (the game is zero-sum). Denote this game by Γ(x0, y0, T).
Definition. A pure piecewise open-loop strategy v(·) for Player E means the pair {τ, b}, where τ is a partition of the time interval [0, T] by a finite number of points 0 = t_1 < ... < t_k = T, and b is the map which places each state x(t_i), y(t_i), t_i in correspondence with the measurable open-loop control v(t) of Player E for t ∈ [t_i, t_{i+1}).
Definition. A pure piecewise open-loop strategy u(·) for Player P means the pair {σ, a}, where σ is an arbitrary partition of the time interval [0, T] by a finite number of points 0 = t'_1 < t'_2 < ... < t'_s = T, and a is the map which, for l ≤ t'_i, places each state x(t'_i), y(t'_i − l), t'_i in correspondence with the segment of Player P's measurable open-loop control u(t) for t ∈ [t'_i, t'_{i+1}). For t'_i < l, the map a places each state x(t'_i), y0, t'_i in correspondence with the segment of Player P's measurable control u(t) for t ∈ [t'_i, t'_{i+1}).
The sets of all pure piecewise open-loop strategies for the players P and E are denoted by P and E, respectively.
The equations of motion have the form
dx/dt = f(x, u), u ∈ U ⊂ R^k, x ∈ R^n,
where the sup inf is taken over the players' strategy sets in the game with incomplete information.
For any strategy u(x, t) of Player P, however, we may construct a strategy v(x, y, t) for Player E such that in the situation (u(x, t), v(x, y, t)) the payoff ρ to Player E will exceed βT. Indeed, let u(x, t) be a strategy for Player P. Since his motion is independent of y(t), the path of Player P can be obtained by integrating the system
The motion of Player E is oriented along the straight line [x(T), y0] away from the point x(T). His speed is taken to be maximal. Evidently, this motion of Player E ensures a distance between him and the point x(T) which is greater than or equal to βT. Denote the strategy for Player E thus constructed by v(t). In the situation (u(x, t), v(t)) the payoff to Player E is then greater than or equal to βT. From this it follows that
where the inf sup is taken over the players' strategy sets in the game with incomplete information.
It follows from (5.9.5) and (5.9.7) that in the game under study the value does not exist in the class of pure strategies.
5.9.4. Definition. A mixed piecewise open-loop behavior strategy (MPOLBS) for Player P means the pair μ(·) = {τ, d}, where τ is an arbitrary partition of the time interval [0, T] by a finite number of points 0 = t_1 < t_2 < ... < t_k = T, and d is the map which places each state x(t_i), y(t_i − l), t_i for t_i ≥ l, and each state x(t_i), y0, t_i for t_i < l, in correspondence with a probability distribution μ_i(·) concentrated on a finite number of measurable open-loop controls u(t) for t ∈ [t_i, t_{i+1}).
Similarly, an MPOLBS for Player E means the pair ν(·) = {σ, c}, where σ is an arbitrary partition of the time interval [0, T] by a finite number of points 0 = t'_1 < t'_2 < ... < t'_s = T, and c is the map which places the state x(t'_i), y(t'_i), t'_i in correspondence with a probability distribution ν_i(·) concentrated on a finite number of measurable open-loop controls v(t) for t ∈ [t'_i, t'_{i+1}).
The sets of MPOLBS for the players P and E are denoted respectively by P̄ and Ē (compare these strategies with the "behavior strategies" in 4.8.3).
Each pair of MPOLBS μ(·), ν(·) induces a probability distribution over the space of trajectories x(t), x(0) = x0; y(t), y(0) = y0. For this reason, the payoff K(x0, y0; μ(·), ν(·)) in MPOLBS is interpreted as the mathematical expectation of the payoff averaged over the distributions on the trajectory spaces induced by the MPOLBS μ(·), ν(·). Having determined the strategy spaces P̄, Ē and the payoff K, we have determined the mixed extension Γ̄(x0, y0, T) of the game Γ(x0, y0, T).
5.9.5. Denote by C_P^T(x) and C_E^T(y) the respective reachability sets of the players P and E from the initial states x and y at the instant of time T, and by C̄_E^T(y) the convex hull of the set C_E^T(y). We assume that the reachability sets are compact, and introduce the quantity
γ(y, T) = min_{ξ∈C̄_E^T(y)} max_{η∈C_E^T(y)} ρ(ξ, η).
Let γ(y, T) = ρ(ŷ, ỹ), where ŷ ∈ C̄_E^T(y), ỹ ∈ C_E^T(y). From the definition of the point ŷ it follows that it is the center of the minimal sphere containing the set C_E^T(y). Hence it follows that this point is unique. At the same time, there exist at least two points of tangency of the set C_E^T(y) with the minimal sphere containing it, and these points coincide with the points ỹ.
Let y(t), y(0) = y0, be a trajectory of Player E for 0 ≤ t ≤ T. When Player E moves along this trajectory, the value of the quantity γ(y(t), T − t) changes, and the point ŷ also changes. Let ŷ(t) be the trajectory of the point ŷ corresponding to the trajectory y(t). The point M ∈ C_E^{T−l}(y0) will be referred to as the center of pursuit if
γ(M, l) = max_{y'∈C_E^{T−l}(y0)} γ(y', l).
5.9.6. We shall now consider an auxiliary simultaneous zero-sum game of pursuit over the convex hull of the set C_E^T(y). The Pursuer chooses a point ξ ∈ C̄_E^T(y) and the Evader chooses a point η ∈ C_E^T(y). The choices are made simultaneously. When choosing the point ξ, Player P has no information about the choice of η by Player E, and conversely. Player E receives the payoff ρ(ξ, η). We denote the value of this game by V(y, T) in order to emphasize the dependence of the game value on the parameters y and T, which determine the strategy sets C̄_E^T(y) and C_E^T(y) for the players P and E, respectively. The game in normal form can be written as follows:
Γ(y, T) = (C̄_E^T(y), C_E^T(y), ρ(ξ, η)).
The strategy set of the minimizing Player P is convex, and the function ρ(ξ, η) is convex in its independent variables and continuous. The Theorem in 2.5.5 can be applied to such games. Therefore the game Γ(y, T) has an equilibrium in mixed strategies. An optimal strategy for Player P is pure, and an optimal strategy for Player E assigns positive probability to at most (n + 1) points from the set C_E^T(y), with V(y, T) = γ(y, T). An optimal strategy for Player P in the game Γ(y, T) is the choice of the center ŷ of the minimal sphere containing the set C_E^T(y). An optimal strategy for Player E assigns positive probabilities to at most (n + 1) points among the points of tangency of this sphere with the set C_E^T(y) (here n is the dimension of the space of y). The value of the game is equal to the radius of this sphere (see Example 11 in 2.5.5).
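The solution of the auxiliary game Γ(y, T) thus reduces to the minimal sphere containing C_E^T(y). For a finite point approximation of C_E^T(y) in the plane, the center and radius (P's optimal pure choice and the value of the game) can be computed by brute force over pairs and triples of points; this is an illustrative sketch, not an efficient algorithm:

```python
# Sketch of the auxiliary game of 5.9.6 for a finite point approximation of
# C_E^T(y) in the plane: P's optimal pure choice is the center of the minimal
# enclosing circle, and the value of the game is its radius.  Brute force
# over pairs and triples (exact for finite point sets, O(n^4)).

import itertools, math

def circle_from(pts):
    """Circle through 2 points (as diameter) or 3 points (circumcircle)."""
    if len(pts) == 2:
        (x1, y1), (x2, y2) = pts
        c = ((x1 + x2) / 2, (y1 + y2) / 2)
    else:
        (ax, ay), (bx, by), (cx, cy) = pts
        d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
        if abs(d) < 1e-12:                       # collinear: no circumcircle
            return None
        ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay)
              + (cx**2 + cy**2) * (ay - by)) / d
        uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx)
              + (cx**2 + cy**2) * (bx - ax)) / d
        c = (ux, uy)
    return c, max(math.dist(c, p) for p in pts)

def minimal_enclosing_circle(points):
    best = None
    for k in (2, 3):
        for combo in itertools.combinations(points, k):
            cr = circle_from(list(combo))
            if cr is None:
                continue
            c, r = cr
            if all(math.dist(c, p) <= r + 1e-9 for p in points):
                if best is None or r < best[1]:
                    best = (c, r)
    return best

pts = [(0.0, 0.0), (4.0, 0.0), (2.0, 1.0), (1.0, 0.5)]
center, value = minimal_enclosing_circle(pts)    # center (2, 0), radius 2
```

For large point sets, Welzl's randomized algorithm computes the same circle in expected linear time.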
5.9.7. We shall now consider the simultaneous game Γ(M, l), where M is the center of pursuit. Denote by y_1(M), …, y_{n+1}(M) the points from the set C_E^l(M) appearing in the spectrum of an optimal mixed strategy for Player E in the game Γ(M, l), and by ŷ(M) an optimal strategy for Player P in this game.

Definition. The trajectory y*(t) is called conditionally optimal if y*(0) = y_0, y*(T - l) = M, y*(T) = y_i(M) for some i from the numbers 1, …, n + 1.

For each i there can be several conditionally optimal trajectories of Player E.
Theorem. Let T > l and suppose that for any number ε > 0 Player P can ensure by the time T the ε-capture of the center ŷ(T) of the minimal sphere containing the set C_E^l(y(T - l)). Then the game Γ(x_0, y_0, T) has the value f(M, l), and the ε-optimal strategy of Player P is pure and coincides with any one of his strategies which ensures the ε/2-capture of the point ŷ(T). An optimal strategy for Player E is mixed: during the time 0 ≤ t ≤ T - l he must move to the point M along any conditionally optimal trajectory y*(t) and then, with probabilities p_1, …, p_{n+1} (the optimal mixed strategy for Player E in the game Γ(M, l)), he must choose one of the conditionally optimal trajectories sending the point y*(T - l) = M to the points y_i(M), i = 1, …, n + 1, which appear in the spectrum of an optimal mixed strategy for Player E in the game Γ(M, l).
5.9. Games of pursuit with delayed information for Pursuer 313
Proof. Denote by u*(·), v*(·) the strategies mentioned in the Theorem whose optimality is to be proved. In order to prove the Theorem, it suffices to verify the validity of relationships (5.9.8)-(5.9.10).
Let R be the radius of the minimal sphere containing the set C_E^l(M), i.e. R = f(M, l). Then R - ε/2 ≤ ρ(x*(T), y_i(M)) ≤ R + ε/2 for all i = 1, …, n + 1, since the point x*(T) belongs to the ε/2-neighborhood of the point ŷ(M). Since Σ_{i=1}^{n+1} p_i = 1, p_i ≥ 0, from (5.9.10) we get the corresponding estimate, in which ŷ[y(T - l)] is the center of the minimal sphere containing the set C_E^l(y(T - l)). However, ρ(x(T), ŷ[y(T - l)]) ≤ ε/2; therefore for y ∈ C_E^l(y(T - l)) we obtain the estimate (5.9.14), but

∫ ρ(x(T), y) dμ = K(x_0, y_0; u*(·), v*(·)).  (5.9.15)

From formulas (5.9.14) and (5.9.15) we obtain the right-hand side of inequality (5.9.8). This completes the proof of the theorem.
For T ≤ l the solution of the game does not differ essentially from the case T > l, and the Theorem holds if we consider C_E^T(y_0), S_E^T(y_0), f(M, T), y_0 instead of C_E^l(y_0), S_E^l(y_0), f(M, l), y(T - l), respectively.
The diameter of the set C_E^l(M) tends to zero as l → 0, which is why the value of the auxiliary game Γ(M, l) also tends to zero. But the value of this auxiliary game is equal to the value V_l(x_0, y_0, T) of the game of pursuit with delayed information Γ(x_0, y_0, T) (here the index l indicates the information delay). The optimal mixed strategy for Player E in Γ(M, l), concentrating its mass on at most n + 1 points from C_E^l(M), concentrates in the limit its entire mass at the single point M, i.e. it becomes a pure strategy. This agrees with the fact that the game Γ(x_0, y_0, T) becomes a game with perfect information as l → +0.
Example 7. The equations of motion have the form

ẋ = u, |u| ≤ α;  ẏ = v, |v| ≤ β,  α > β.
Suppose the time T satisfies the condition T > ρ(x_0, y_0)/(α - β) + l. The reachability set C_E^l(y_0) coincides with the circle of radius βl with its center at y_0. The value of the game Γ(y, l) is equal to the radius of the circle C_E^l(y), i.e. V(y, l) = βl.
Since V(y, l) is now independent of y, any point of the set C_E^{T-l}(y_0) can be the center of pursuit M. An optimal strategy for Player P in the game Γ(y, l) is the choice of the point y, and an optimal strategy for Player E is mixed: the choice of any two diametrically opposite points of the circle C_E^l(y) with probabilities (1/2, 1/2).
Accordingly, an optimal strategy for Pursuer in the game Γ(x_0, y_0, T) is the linear pursuit of the point y(t - l) for l ≤ t ≤ T (of the point y_0 for 0 ≤ t ≤ l) until the capture of this point; moreover, he must then remain in the ε/2-neighborhood of this point. An optimal strategy for Player E (the mixed piecewise open-loop behavior strategy) is the transition from the point y_0 to an arbitrary point M ∈ C_E^{T-l}(y_0) during the time T - l, and then the equiprobable choice of a direction towards one of the two diametrically opposite points of the circle C_E^l(M). In this case Val Γ(x_0, y_0, T) = βl.
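The conclusion of Example 7 can be checked by a small simulation, sketched below under the simple-motion dynamics assumed above and with hypothetical numeric data: Pursuer chases the point y(t - l) that he actually observes, and after capturing it he is still up to βl away from Evader's true position.

```python
import math

alpha, beta, l, dt = 2.0, 1.0, 0.5, 1e-3   # hypothetical speeds and delay
x = [0.0, 0.0]                              # Pursuer starts at x0
y0 = (5.0, 0.0)                             # Evader starts at y0

def y_at(t):
    """Evader runs at full speed beta in a fixed direction (+x)."""
    return (y0[0] + beta * t, y0[1])

t = 0.0
while True:
    target = y_at(max(0.0, t - l))          # P only sees the state l seconds old
    dx, dy = target[0] - x[0], target[1] - x[1]
    dist = math.hypot(dx, dy)
    if dist < 1e-3:                         # the delayed point is captured
        break
    x[0] += alpha * dx / dist * dt          # linear pursuit of y(t - l)
    x[1] += alpha * dy / dist * dt
    t += dt

gap = math.hypot(y_at(t)[0] - x[0], y_at(t)[1] - x[1])
print(round(gap, 2))                        # remaining miss is ~ beta * l = 0.5
```

The residual distance βl is exactly the value of the game computed in the example; shrinking the delay l shrinks the guaranteed miss accordingly.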
Σ_{i∈N} ∫_{t_0}^T h_i(x*(t)) dt = v(N; x_0, T - t_0),

where N is the set of all players in Γ(x_0, T - t_0). The trajectory x*(t) is called conditionally optimal. Let S ⊂ N, and let v(S; x_0, T - t_0) be a characteristic function. It follows from the superadditivity condition that it is advantageous for the players to form the maximal coalition N and obtain the maximal total payoff v(N; x_0, T - t_0) that is possible in the game. By construction, the quantity v(S; x_0, T - t_0) (S ≠ N) is equal to a guaranteed payoff of the coalition S obtained irrespective of the behavior of the other players, even though the latter may form the coalition N \ S against S.
Note that the positiveness of the payoff functions J_i, i = 1, …, n, implies that of the characteristic function. From the superadditivity of v it follows that v(S'; x_0, T - t_0) ≥ v(S; x_0, T - t_0) for any S, S' ⊂ N such that S ⊂ S', i.e. the superadditivity of the function v in S implies that this function is monotone in S.
Since the essence of a cooperative game is the possibility of forming coalitions, and the main problem therein is the distribution of the total payoff between the players, the subject of cooperative theory is the characteristic function rather than the strategy. In fact, the characteristic function displays the possibilities of coalitions in the best way and can form the basis for an equitable distribution of the total payoff between the players.
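For readers who wish to experiment, here is a minimal sketch of checking superadditivity and the monotonicity it implies. The three-player characteristic function is hand-chosen illustration data; in the differential game these values would come from the auxiliary games of the coalitions.

```python
from itertools import chain, combinations

N = frozenset({1, 2, 3})
# Hypothetical characteristic function v(S), chosen for illustration only.
v = {frozenset(): 0, frozenset({1}): 1, frozenset({2}): 1,
     frozenset({3}): 2, frozenset({1, 2}): 3, frozenset({1, 3}): 4,
     frozenset({2, 3}): 4, N: 7}

def subsets(s):
    return map(frozenset, chain.from_iterable(
        combinations(sorted(s), k) for k in range(len(s) + 1)))

def superadditive(v):
    """v(S ∪ R) >= v(S) + v(R) for all disjoint coalitions S, R."""
    return all(v[S | R] >= v[S] + v[R]
               for S in subsets(N) for R in subsets(N - S))

def monotone(v):
    """S ⊂ S' implies v(S) <= v(S'); for nonnegative v this follows
    from superadditivity, as noted in the text."""
    return all(v[S] <= v[Sp] for Sp in subsets(N) for S in subsets(Sp))

print(superadditive(v), monotone(v))   # True True
```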
The pair (N, v(S; x_0, T - t_0)), where N is the set of players and v is the characteristic function defined by (8.1.2), is called the cooperative differential game in the form of characteristic function v. For short, it will be denoted by Γ_v(x_0, T - t_0).
5.10.2. Various methods for "equitable" distribution of the total profit between the players are treated as optimality principles in cooperative games. The set of such distributions satisfying an optimality principle is called a solution to the cooperative game (in the sense of this optimality principle). We will now define solutions of the game Γ_v(x_0, T - t_0).
Denote by ξ_i a share of the player i ∈ N in the total gain v(N; x_0, T - t_0).

Definition. The vector ξ = (ξ_1, …, ξ_n), whose components satisfy the conditions:

1. ξ_i ≥ v({i}; x_0, T - t_0), i ∈ N;

2. Σ_{i∈N} ξ_i = v(N; x_0, T - t_0),

is called an imputation in the game Γ_v(x_0, T - t_0).
The equity of the distribution ξ = (ξ_1, …, ξ_n) representing an imputation is that each player receives at least his guaranteed payoff and the entire maximal payoff is divided completely, without a remainder.
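The two conditions of the definition can be checked directly; the sketch below uses hypothetical singleton values and total v(N) for illustration.

```python
N = [1, 2, 3]
# Hypothetical data: singleton values v({i}) and the total gain v(N).
v_single = {1: 1.0, 2: 1.0, 3: 2.0}
v_N = 7.0

def is_imputation(xi):
    """Conditions 1 and 2 of the definition: individual rationality
    and full distribution of v(N)."""
    individually_rational = all(xi[i] >= v_single[i + 1] for i in range(3))
    efficient = abs(sum(xi) - v_N) < 1e-9
    return individually_rational and efficient

print(is_imputation([2.0, 2.0, 3.0]))   # True
print(is_imputation([0.5, 3.0, 3.5]))   # False: player 1 gets less than v({1})
print(is_imputation([2.0, 2.0, 2.0]))   # False: the sum is not v(N)
```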
5.10.3. Theorem. Suppose the function w : 2^N × R^n × R^1 → R^1 is additive in S ∈ 2^N, i.e. for any S, R ∈ 2^N, S ∩ R = ∅, we have w(S ∪ R; x_0, T - t_0) = w(S; x_0, T - t_0) + w(R; x_0, T - t_0). Then in the game Γ_w(x_0, T - t_0) there is a unique imputation ξ_i = w({i}; x_0, T - t_0), i = 1, …, n.

Proof. From the additivity of w we immediately obtain w(N; x_0, T - t_0) = w({1}; x_0, T - t_0) + … + w({n}; x_0, T - t_0), whence follows the statement of the theorem.
The game with an additive characteristic function is called inessential. In the essential game Γ_v(x_0, T - t_0) there is an infinite set of imputations. Indeed, any vector of the form ξ = (v({1}; x_0, T - t_0) + α_1, …, v({n}; x_0, T - t_0) + α_n), where α_i ≥ 0, i ∈ N, and Σ_{i∈N} α_i = v(N; x_0, T - t_0) - Σ_{i∈N} v({i}; x_0, T - t_0), is an imputation.

5.10.4. Definition. The imputation ξ is said to dominate the imputation η over the coalition S (ξ ≻_S η) if:

1. ξ_i > η_i, i ∈ S;

2. Σ_{i∈S} ξ_i ≤ v(S; x_0, T - t_0).

The imputation ξ is said to dominate the imputation η (ξ ≻ η) if there is a coalition S ⊂ N such that ξ ≻_S η.
It follows from the definition of the imputation that domination over a single-element coalition and over the coalition N is not possible.
5.10.5. Definition. The set of nondominated imputations is called the core of the game Γ_v(x_0, T - t_0) and is denoted by C_v(x_0, T - t_0).

The equity of an imputation belonging to the core is that no coalition can offer a reasonable alternative against this imputation.
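For a finite characteristic function, core membership can be tested by the standard system of coalition inequalities: an efficient imputation is nondominated iff every coalition S receives at least v(S). A minimal sketch with the same illustrative (hypothetical) data as above:

```python
from itertools import combinations

players = (1, 2, 3)
# Hypothetical characteristic function, keyed by sorted coalition tuples.
v = {(): 0, (1,): 1, (2,): 1, (3,): 2, (1, 2): 3, (1, 3): 4,
     (2, 3): 4, (1, 2, 3): 7}

def in_core(xi):
    """xi belongs to the core iff it distributes v(N) and no coalition S
    can improve on it: sum_{i in S} xi_i >= v(S) for every S."""
    if abs(sum(xi.values()) - v[players]) > 1e-9:
        return False
    for k in range(1, len(players)):
        for S in combinations(players, k):
            if sum(xi[i] for i in S) < v[S] - 1e-9:
                return False
    return True

print(in_core({1: 2.0, 2: 2.0, 3: 3.0}))   # True
print(in_core({1: 4.0, 2: 1.0, 3: 2.0}))   # False: {2,3} gets 3 < v({2,3}) = 4
```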
5.10.6. Definition. The set L_v(x_0, T - t_0) ⊂ E_v(x_0, T - t_0) is called the Neumann-Morgenstern solution (the NM-solution) of the game Γ_v(x_0, T - t_0) if:

1. no imputation from L_v(x_0, T - t_0) dominates another imputation from this set (internal stability);

2. any imputation from E_v(x_0, T - t_0) \ L_v(x_0, T - t_0) is dominated by some imputation from L_v(x_0, T - t_0) (external stability).
5.1 J. Principle of dynamic stability (time-consistency) 317
As is seen from the Definition, the conditions placed on the imputations from the NM-solution are weaker than those on the imputations from the core and, as a result, the NM-solution always contains the core. Unlike the core and the NM-solution, the Shapley value, representing an optimal distribution principle for the total gain v(N; x_0, T - t_0), is defined without using the concept of domination.
5.10.7. Definition. The vector Φ^v(x_0, T - t_0) = {Φ_i^v(x_0, T - t_0), i = 1, …, n} is called the Shapley value if it satisfies the following conditions:

1. if v, w are two characteristic functions, then Φ^v(x_0, T - t_0) + Φ^w(x_0, T - t_0) = Φ^{v+w}(x_0, T - t_0);

2. πΦ^v(x_0, T - t_0) = Φ^{πv}(x_0, T - t_0), where π is any permutation of the players, πΦ^v(x_0, T - t_0) = {Φ^v_{π(i)}(x_0, T - t_0), i = 1, …, n}, and πv is the characteristic function such that for any coalition S = {i_1, …, i_s}, πv({π(i_1), …, π(i_s)}; x_0, T - t_0) = v(S; x_0, T - t_0);

3. Σ_{i∈N} Φ_i^v(x_0, T - t_0) = v(N; x_0, T - t_0);

4. if v(S ∪ {i}; x_0, T - t_0) = v(S; x_0, T - t_0) + v({i}; x_0, T - t_0) for every S ⊂ N \ {i}, then Φ_i^v(x_0, T - t_0) = v({i}; x_0, T - t_0).
As we have seen in Chap. 3, there exists a unique vector Φ^v(x_0, T - t_0) satisfying these four conditions, and its components are computed by the formulas

Φ_i^v(x_0, T - t_0) = Σ_{S⊂N, S∋i} [(s - 1)!(n - s)!/n!] [v(S; x_0, T - t_0) - v(S \ {i}; x_0, T - t_0)],  (5.10.4)

i = 1, …, n, where s is the number of players in the coalition S.
The components of the Shapley value have the meaning of the players' expected shares in the total gain. It may also be shown that the Shapley value is an imputation.
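Formula (5.10.4) is easy to evaluate directly for a small game. The sketch below uses the same illustrative three-player characteristic function as above (hypothetical data, not from the book); note that the components sum to v(N), in agreement with the efficiency condition.

```python
from itertools import combinations
from math import factorial

players = (1, 2, 3)
# Hypothetical characteristic function, keyed by sorted coalition tuples.
v = {(): 0, (1,): 1, (2,): 1, (3,): 2, (1, 2): 3, (1, 3): 4,
     (2, 3): 4, (1, 2, 3): 7}

def shapley(v, players):
    """Direct evaluation of formula (5.10.4):
    Phi_i = sum over coalitions S containing i of
            (s-1)! (n-s)! / n! * [v(S) - v(S without i)]."""
    n = len(players)
    phi = {}
    for i in players:
        total = 0.0
        for s in range(1, n + 1):
            for S in combinations(players, s):
                if i in S:
                    S_without_i = tuple(j for j in S if j != i)
                    weight = factorial(s - 1) * factorial(n - s) / factorial(n)
                    total += weight * (v[S] - v[S_without_i])
        phi[i] = total
    return phi

phi = shapley(v, players)
print({i: round(p, 3) for i, p in phi.items()})   # {1: 2.0, 2: 2.0, 3: 3.0}
print(round(sum(phi.values()), 3))                # 7.0 = v(N): an imputation
```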
introduce the notion of an "ε-conditionally optimal trajectory" and carry out the necessary constructions with an accuracy of ε.
5.11.3. We will now consider the behavior of the set W_v(x_0, T - t_0) along the conditionally optimal trajectory x̄(t). Towards this end, in each current state x̄(t) the current subgame Γ_v(x̄(t), T - t) is defined as follows. In the state x̄(t), we define the characteristic function

v(S; x̄(t), T - t) =
  0, if S = ∅;
  Val Γ_S(x̄(t), T - t), if S ⊂ N (∅ ≠ S ≠ N);
  max_{u_N(·)[t,T]} K_N(x̄(t), u_N(·)[t, T]), if S = N.
Here K_N(x̄(t), u_N(·)[t, T]) is the remaining total payoff of the players from the current state x̄(t) on the conditionally optimal trajectory, i.e.

K_N(x̄(t), u_N*(·)[t, T]) = v(N; x̄(t), T - t),

where

v(N; x̄(t), T - t) = v(N; x_0, T - t_0) - ∫_{t_0}^t Σ_{i∈N} h_i(x̄(τ)) dτ.
The quantity

∫_{t_0}^t Σ_{i∈N} h_i(x̄(τ)) dτ

is interpreted as the total gain of the players on the time interval [t_0, t] when the motion is carried out along the trajectory x̄(·).
5.11.4. Consider the family of current games

{Γ_v(x̄(t), T - t) = (N, v(S; x̄(t), T - t)), t_0 ≤ t ≤ T},

determined along the conditionally optimal trajectory x̄(·), and their solutions W_v(x̄(t), T - t) ⊂ E_v(x̄(t), T - t) generated by the same optimality principle as the initial solution W_v(x_0, T - t_0).
Lemma. The set W_v(x̄(T), 0) is a solution of the current game Γ_v(x̄(T), 0) and is composed of the single imputation H(x̄(T)) = {H_i(x̄(T)), i = 1, …, n}, where H_i(x̄(T)) is the terminal part of player i's payoff along the trajectory x̄(·).

Proof. Since the game Γ_v(x̄(T), 0) is of zero duration, for all i ∈ N we have v({i}; x̄(T), 0) = H_i(x̄(T)). Hence

Σ_{i∈N} v({i}; x̄(T), 0) = Σ_{i∈N} H_i(x̄(T)) = v(N; x̄(T), 0),

i.e. the characteristic function of the game Γ_v(x̄(T), 0) is additive in S and, by the Theorem in 5.10.3,

E_v(x̄(T), 0) = H(x̄(T)) = W_v(x̄(T), 0).

This completes the proof of the lemma.
5.11.5. Dynamic stability of the solution. Let the conditionally optimal trajectory x̄(·) be such that W_v(x̄(t), T - t) ≠ ∅, t_0 ≤ t ≤ T. If this condition is not satisfied, it is impossible for the players to adhere to the chosen optimality principle, since at the very first instant t when W_v(x̄(t), T - t) = ∅ the players have no possibility to follow this principle. Assume that in the initial state x_0 the players agree upon the imputation ξ ∈ W_v(x_0, T - t_0). This means that in the state x_0 the players agree upon such a distribution of the gain that (when the game terminates at the instant T) the share of the ith player is equal to ξ_i, i.e. to the ith component of the imputation ξ. Suppose the player i's payoff (his share) on the time interval [t_0, t] is γ_i(x̄(t)). Then, on the remaining time interval [t, T], according to the imputation ξ, he is to receive the gain η_i^t = ξ_i - γ_i(x̄(t)). For the original agreement (the imputation ξ) to remain in force at the instant t, it is essential that the vector η^t = (η_1^t, …, η_n^t) belong to the set W_v(x̄(t), T - t), i.e. to a solution of the current game Γ_v(x̄(t), T - t). If such a condition is satisfied at each instant of time t ∈ [t_0, T] along the trajectory x̄(·), then the imputation ξ is realized. Such is the conceptual meaning of the dynamic stability of the sharing.
Along the trajectory x̄(·) on the time interval [t, T], t_0 ≤ t ≤ T, the coalition N obtains the payoff v(N; x̄(t), T - t), while the difference v(N; x_0, T - t_0) - v(N; x̄(t), T - t) is equal to the payoff the coalition N obtains on the time interval [t_0, t). The share of the ith player in this payoff, considering the transferability of payoffs, may be represented as

γ_i(x̄(t), β) = ∫_{t_0}^t β_i(τ) Σ_{k∈N} h_k(x̄(τ)) dτ,  (5.11.1)

so that

dγ_i(x̄(t), β)/dt = β_i(t) Σ_{k∈N} h_k(x̄(t)).

Definition. The imputation ξ ∈ W_v(x_0, T - t_0) is called dynamic stable in the game Γ_v(x_0, T - t_0) if:

1. W_v(x̄(t), T - t) ≠ ∅, t_0 ≤ t ≤ T;

2. there exists a [t_0, T]-integrable function β(t) = (β_1(t), …, β_n(t)) such that for each t_0 ≤ t ≤ T, β_i(t) ≥ 0, Σ_{i=1}^n β_i(t) = 1, and

ξ ∈ ∩_{t_0 ≤ t ≤ T} [γ(x̄(t), β) ⊕ W_v(x̄(t), T - t)].  (5.11.2)
In particular, at t = T this gives

ξ = γ(x̄(T), β) + H(x̄(T)) = ∫_{t_0}^T β(τ) Σ_{i∈N} h_i(x̄(τ)) dτ + H(x̄(T)).
The dynamic stable imputation ξ ∈ W_v(x_0, T - t_0) may be realized as follows. From (5.11.2), at any instant t_0 ≤ t ≤ T we have

ξ ∈ γ(x̄(t), β) ⊕ W_v(x̄(t), T - t),  (5.11.3)

where

γ(x̄(t), β) = ∫_{t_0}^t β(τ) Σ_{i∈N} h_i(x̄(τ)) dτ

is the payoff vector on the time interval [t_0, t], the player i's share in the gain on the same interval being

γ_i(x̄(t), β) = ∫_{t_0}^t β_i(τ) Σ_{k∈N} h_k(x̄(τ)) dτ.
When the game proceeds along the optimal trajectory, the players on each time interval [t_0, t] share among themselves the total gain

∫_{t_0}^t Σ_{i∈N} h_i(x̄(τ)) dτ

in such a way that the inclusion

ξ - γ(x̄(t), β) ∈ W_v(x̄(t), T - t)  (5.11.4)

is satisfied. Furthermore, (5.11.4) implies the existence of a vector ξ^t ∈ W_v(x̄(t), T - t) such that ξ = γ(x̄(t), β) + ξ^t. That is, with the above method of choosing β(τ), the vector of the gains to be obtained by the players at the remaining stage of the game,

ξ^t = ξ - γ(x̄(t), β) = ∫_t^T β(τ) Σ_{i∈N} h_i(x̄(τ)) dτ + H(x̄(T)),

belongs to the set W_v(x̄(t), T - t). Geometrically, this means that by varying the vector γ(x̄(t), β) = (γ_1(x̄(t), β), …, γ_n(x̄(t), β)), restricted by the only condition

Σ_{i∈N} γ_i(x̄(t), β) = ∫_{t_0}^t Σ_{i∈N} h_i(x̄(τ)) dτ,

the players ensure displacement of the set γ(x̄(t), β) ⊕ W_v(x̄(t), T - t) in such a way that the inclusion (5.11.3) is satisfied.
In general, it is fairly easy to see that there may exist an infinite number of vectors β(τ) satisfying conditions (5.11.3), (5.11.4). Therefore the sharing method proposed here seems to lack true uniqueness. However, for any vector β(τ) satisfying conditions (5.11.3), (5.11.4), at each time instant t_0 ≤ t ≤ T the players are guided by an imputation ξ^t ∈ W_v(x̄(t), T - t) and by the same optimality principle throughout the game, and hence have no reason to violate the previously concluded agreement.
Let us make the additional assumptions a), b).

Show that by properly choosing β(t) we may always ensure dynamic stability of the sharing ξ ∈ W_v(x_0, T - t_0) under assumptions a), b) and the first condition of the Definition (i.e. along the conditionally optimal trajectory at each time instant t_0 ≤ t ≤ T, W_v(x̄(t), T - t) ≠ ∅).
We choose ξ^t ∈ W_v(x̄(t), T - t) to be a continuously differentiable nonincreasing function of t, t_0 ≤ t ≤ T, and form the difference ξ - ξ^t = α(t), so that ξ^t + α(t) = ξ ∈ W_v(x_0, T - t_0). Let β(t) = (β_1(t), …, β_n(t)) be a [t_0, T]-integrable vector function satisfying condition (5.11.4). Solving the equation (with respect to β(t))

∫_{t_0}^t β_i(τ) Σ_{k∈N} h_k(x̄(τ)) dτ = α_i(t),

we get

β_i(t) = (dα_i(t)/dt) / Σ_{k∈N} h_k(x̄(t)) = -(dξ_i^t/dt) / Σ_{k∈N} h_k(x̄(t)).  (5.11.5)
Make sure that for such β(t) the condition Σ_{i=1}^n β_i(t) = 1 is satisfied. Indeed,

Σ_{i∈N} β_i(t) = -(d/dt) Σ_{i∈N} ξ_i^t / Σ_{k∈N} h_k(x̄(t))
= -(d/dt) [∫_t^T Σ_{i∈N} h_i(x̄(τ)) dτ + Σ_{i∈N} H_i(x̄(T))] / Σ_{k∈N} h_k(x̄(t))
= Σ_{i∈N} h_i(x̄(t)) / Σ_{k∈N} h_k(x̄(t)) = 1

(since Σ_{i∈N} ξ_i^t = v(N; x̄(t), T - t)).
From condition (5.10.3) we have h_i, H_i ≥ 0, i ∈ N, and since dξ_i^t/dt ≤ 0, we get β_i(t) ≥ 0.
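The construction (5.11.5) can be illustrated numerically. In the sketch below the data are hypothetical (two players, constant instantaneous payoffs, no terminal payoffs, and a hand-chosen nonincreasing selector ξ^t); the computed β(t) is a genuine distribution at every t, and paying player i at the rate β_i(t) Σ_k h_k recovers exactly his agreed share ξ_i.

```python
# Hypothetical data: t in [0, 1], constant payoffs h = (1, 2), so that
# v(N; x(t), T - t) = 3 (1 - t); the selector below satisfies
# xi_1^t + xi_2^t = 3 (1 - t) and is nonincreasing in t.
h = (1.0, 2.0)
h_total = sum(h)

def xi(t):
    """The chosen selector xi^t from the current solutions."""
    return ((1 - t) * (1 + t), (1 - t) * (2 - t))

def beta(t, eps=1e-6):
    """beta_i(t) = -(d xi_i^t / dt) / sum_k h_k, formula (5.11.5),
    with the derivative taken by a central finite difference."""
    d = [(a - b) / (2 * eps) for a, b in zip(xi(t + eps), xi(t - eps))]
    return [-di / h_total for di in d]

# beta is a genuine distribution at every t ...
for t in (0.1, 0.5, 0.9):
    b = beta(t)
    assert all(bi >= 0 for bi in b) and abs(sum(b) - 1) < 1e-6

# ... and integrating the payment rates recovers the shares xi^0.
n_steps = 10000
dt = 1.0 / n_steps
gamma = [0.0, 0.0]
for k in range(n_steps):
    t = (k + 0.5) * dt
    b = beta(t)
    for i in range(2):
        gamma[i] += b[i] * h_total * dt
print([round(g, 3) for g in gamma])   # ~ [1.0, 2.0] = xi^0
```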
Thus, if along the conditionally optimal trajectory all current games have nonempty solutions possessing properties a), b), then the original game Γ_v(x_0, T - t_0) has a dynamic stable solution. Theoretically, the main problem is to study the conditions imposed on the vector function β(t) in order to ensure dynamic stability of specific forms of solutions W_v(x_0, T - t_0) in various classes of games. In what follows we shall try to make a classification of dynamic stable solutions.

We now consider the new concept of strong dynamic stability and define dynamic stable solutions for cooperative games with terminal payoffs.
5.11.6. Strongly dynamic stable solutions. For the dynamic stable imputation ξ ∈ W_v(x_0, T - t_0), as follows from the Definition, for t_0 ≤ t ≤ T there exist a [t_0, T]-integrable vector function β(t) and an imputation ξ^t (generally nonunique) from the solution W_v(x̄(t), T - t) of the current game Γ_v(x̄(t), T - t) such that ξ = γ(x̄(t), β) + ξ^t. The conditions of dynamic stability do not affect the imputations from the set W_v(x̄(t), T - t) which fail to satisfy this equation. Furthermore, of interest is the case where any imputation from the current solution W_v(x̄(t), T - t) may provide a "good" continuation of the original agreement, i.e. where for a dynamic stable imputation ξ ∈ W_v(x_0, T - t_0), at any instant t_0 ≤ t ≤ T and for every ξ^t ∈ W_v(x̄(t), T - t), the condition γ(x̄(t), β) + ξ^t ∈ W_v(x_0, T - t_0), where γ(x̄(T), β) + H(x̄(T)) = ξ, is satisfied. By slightly strengthening this requirement, we obtain a qualitatively new dynamic stability concept for the solution W_v(x_0, T - t_0) of the game Γ_v(x_0, T - t_0) and call it strong dynamic stability.
Definition. The imputation ξ ∈ W_v(x_0, T - t_0) is called strongly dynamic stable in the game Γ_v(x_0, T - t_0) if the following conditions are satisfied:

1. the imputation ξ is dynamic stable;

2. for any t_0 ≤ t_1 ≤ t_2 ≤ T and β(t) corresponding to the imputation ξ according to (5.11.2),

γ(x̄(t_2), β) ⊕ W_v(x̄(t_2), T - t_2) ⊂ γ(x̄(t_1), β) ⊕ W_v(x̄(t_1), T - t_1).
The cooperative differential game Γ_v(x_0, T - t_0) with side payments has a strongly dynamic stable solution W_v(x_0, T - t_0) if all the imputations from W_v(x_0, T - t_0) are strongly dynamic stable.

The conditionally optimal trajectory along which there exists a strongly dynamic stable solution of the game Γ_v(x_0, T - t_0) is called a strongly optimal trajectory.

If there exists at least one strongly dynamic stable imputation ξ ∈ W_v(x_0, T - t_0), but not all of the imputations from the set W_v(x_0, T - t_0) have this property, then we are dealing with partial strong dynamic stability of the solution W_v(x_0, T - t_0) of the game Γ_v(x_0, T - t_0).

The dynamic instability of the solution of the cooperative differential game leads to abandonment of the optimality principle generating this solution, since none of the imputations from the set W_v(x_0, T - t_0) remains optimal until the game terminates. Therefore, the set W_v(x_0, T - t_0) may be called a solution to the game Γ_v(x_0, T - t_0) only if it is dynamic stable. Otherwise the game Γ_v(x_0, T - t_0) is assumed to have no solution.
5.11.7. Terminal payoffs. In (5.10.3) let h_i ≡ 0, i = 1, …, n. The cooperative differential game with terminal payoffs is denoted by the same symbol Γ_v(x_0, T - t_0). In such games the payoffs are made when the game terminates.

Denote by C^{T-t_0}(x_0) the set of points y ∈ R^n for which there exists an open-loop control u(t) = (u_1(t), …, u_n(t)), u_i(t) ∈ U_i, i = 1, …, n, transferring (by virtue of system (5.10.2)) the phase point from the initial state x_0 to the point y in time T - t_0. The set C^{T-t_0}(x_0) is called the reachability set in the game Γ_v(x_0, T - t_0).
It is naturally assumed that in the game with terminal payoffs

v(N; x_0, T - t_0) = max_{x ∈ C^{T-t_0}(x_0)} H_N(x) = H_N(x*),  (5.11.7)

where

H_N(x) = Σ_{i∈N} H_i(x)

(for simplicity, we assume that the maximum in (5.11.7) is achieved; otherwise the constructions become somewhat more complicated and we have to deal with ε-dynamic stability).
Definition. Any trajectory x̄(·) of system (5.10.1)-(5.10.2) such that x̄(T) = x* is called a conditionally optimal trajectory in the cooperative differential game Γ_v(x_0, T - t_0) with terminal payoffs.

The definition of a dynamic stable imputation from the solution W_v(x_0, T - t_0) is obtained as a special case of the definition from 5.11.5. Since the games with terminal payoffs are frequently encountered in this book, this definition is provided separately.

Consider the current games Γ_v(x̄(t), T - t), t_0 ≤ t ≤ T, along a conditionally optimal trajectory x̄(·). As before, their solutions are denoted by W_v(x̄(t), T - t) ⊂ E_v(x̄(t), T - t), t_0 ≤ t ≤ T. The game Γ_v(x̄(t), T - t) is of duration T - t, has the initial state x̄(t), and the payoff functions therein are defined just as in the game Γ_v(x_0, T - t_0). Note that, with the motion along the conditionally optimal trajectory, at each time instant t_0 ≤ t ≤ T the point x* remains in the reachability region C^{T-t}(x̄(t)).

The conditionally optimal trajectory along which there exists a dynamic stable imputation ξ ∈ W_v(x_0, T - t_0) is called an optimal trajectory.
Theorem. In the cooperative differential game Γ_v(x_0, T - t_0) with terminal payoffs H_i(x(T)), i = 1, …, n, only the vector H(x*) = {H_i(x*), i = 1, …, n}, whose components are equal to the players' payoffs at the end point of the conditionally optimal trajectory, may be dynamic stable.

Proof. It follows from the dynamic stability of the imputation ξ ∈ W_v(x_0, T - t_0) that

ξ ∈ ∩_{t_0 ≤ t ≤ T} W_v(x̄(t), T - t).

But since the current game Γ_v(x̄(T), 0) is of zero duration, therein E_v(x̄(T), 0) = W_v(x̄(T), 0) = H(x̄(T)) = H(x*). Hence

∩_{t_0 ≤ t ≤ T} W_v(x̄(t), T - t) = H(x*),
where N is the set of all players in Γ(x_0, T - t_0). The trajectory x*(t) is called optimal. Let v(S; x_0, T - t_0) be the characteristic function (S ⊂ N) and C(x_0, T - t_0) the core. Consider the family of subgames Γ(x*(t), T - t) along x*(t), t ∈ [t_0, T], the corresponding cores C(x*(t), T - t) (which are supposed to be nonvoid) and the c.f. v(S; x*(t), T - t). The core C(x_0, T - t_0) is strongly time inconsistent and, moreover, in all nontrivial cases even time inconsistent. But using the c.f. v(S; x*(t), T - t) and the cores C(x*(t), T - t), t ∈ [t_0, T], we shall construct a new c.f. and, based on it, a new strongly dynamic stable (strongly time consistent, STC) optimality principle (OP).
Let us introduce the function v̄(S; x_0, T - t_0). For S = N,

v̄(N; x_0, T - t_0) = (1/(T - t_0)) ∫_{t_0}^T v(N; x_0, T - t_0) dt = v(N; x_0, T - t_0),

because along the optimal trajectory x*(τ), τ ∈ [t_0, T], Bellman's optimality principle holds for the function v(N; x*(t), T - t), i.e.

Σ_{i∈N} ∫_{t_0}^t h_i(x*(τ)) dτ + v(N; x*(t), T - t) = v(N; x_0, T - t_0).

We see that v̄ is not a c.f. in the common sense in the subgame Γ(x*(Θ), T - Θ), t_0 ≤ Θ ≤ T, because v̄(N; x*(Θ), T - Θ) is not equal to the maximal sum of the payoffs of all the players in this subgame.

In the same way as we have done it for v̄(S; x_0, T - t_0), one can show that v̄(S; x*(Θ), T - Θ), Θ ∈ [t_0, T], S ⊂ N, is a superadditive function of S.
Let C(x_0, T - t_0) and C(x*(t), T - t) be the nonvoid cores in the games Γ(x_0, T - t_0) and Γ(x*(t), T - t), t ∈ [t_0, T], respectively. Let

ξ(t) = {ξ_1(t), …, ξ_i(t), …, ξ_n(t)} ∈ C(x*(t), T - t),  t ∈ [t_0, T],

be an integrable selector, i.e. a function which is an imputation from the core of the subgame Γ(x*(t), T - t) at each instant t. Considering the quantities ξ(t) ∈ C(x*(t), T - t), t ∈ [t_0, T], from the cores of the subgames Γ(x*(t), T - t) and using set integration, we may write every vector of C̄(x*(Θ), T - Θ) in the form

(1/(T - Θ)) ∫_Θ^T [∫_Θ^t h(x*(τ)) dτ + ξ(t)] dt,  ξ(t) ∈ C(x*(t), T - t),

i.e. any vector ξ̄ ∈ C̄(x*(Θ), T - Θ), Θ ∈ [t_0, T], belongs also to the core of the subgame Γ(x*(Θ), T - Θ) defined by the c.f. v̄(S; x*(Θ), T - Θ).

Theorem. The set C̄(x*(Θ), T - Θ) belongs to the core of the subgame Γ(x*(Θ), T - Θ) with the c.f. v̄(S; x*(Θ), T - Θ), S ⊂ N.
5.12.2. Now we have the intuitive background to introduce C̄(x*(Θ), T - Θ) as an OP in the subgame Γ(x*(Θ), T - Θ), Θ ∈ [t_0, T] (in the case Θ = t_0 we have an OP for the original game Γ(x_0, T - t_0)). Define now a natural procedure of distribution of the imputation on the time interval [t_0, T] which leads to the STCOP.

Let ξ ∈ C̄(x_0, T - t_0) and let a function β_i(t), i = 1, …, n, t ∈ [t_0, T], satisfy the condition

∫_{t_0}^T β_i(t) dt = ξ_i,  β_i(t) ≥ 0.

The function β(t) = {β_i(t)} shall be called the imputation distribution procedure (IDP). Define

∫_{t_0}^Θ β_i(t) dt = ξ_i(Θ),  i = 1, …, n.  (5.12.13)

Definition. The OP C̄(x_0, T - t_0) is called STC if there exists an IDP β(t) = {β_i(t)} such that

ξ - ξ(Θ) ∈ C̄(x*(Θ), T - Θ),

which means that the remaining part of the previously agreed "optimal" imputation belongs to the OP in the corresponding current subgame Γ(x*(Θ), T - Θ).
5.12.3. Theorem. The OP C̄(x_0, T - t_0) is STC in Γ(x_0, T - t_0).
Proof. Define the IDP β(t) = {β_i(t)} by formula (5.12.15), choosing an integrable selector ξ(t) ∈ C(x*(t), T - t), t ∈ [t_0, T], and consider the set ξ(Θ) ⊕ C̄(x*(Θ), T - Θ), where

ξ(Θ) = ∫_{t_0}^Θ β(t) dt.  (5.12.16)

Substituting β(t) into (5.12.16) and changing the order of integration, one obtains for every element of the set ξ(Θ) ⊕ C̄(x*(Θ), T - Θ) the representation

ξ̃ = (1/(T - t_0)) ∫_{t_0}^T [∫_{t_0}^t h(x*(τ)) dτ + ξ''(t)] dt,

where

ξ''(t) = ξ(t), t ∈ [t_0, Θ);  ξ''(t) = ξ'(t), t ∈ [Θ, T],

and ξ'(t) ∈ C(x*(t), T - t) is the selector generating the chosen element of C̄(x*(Θ), T - Θ). The function ξ''(t) is again a selector from C(x*(t), T - t), t ∈ [t_0, T]; hence ξ̃ ∈ C̄(x_0, T - t_0), and we have

ξ(Θ) ⊕ C̄(x*(Θ), T - Θ) ⊂ C̄(x_0, T - t_0)

for all Θ ∈ [t_0, T]. The theorem is proved.
It may be easily seen that in the case of terminal payoffs, when

K_i(x_0, T - t_0; u_1, …, u_n) = H_i(x(T)),  i = 1, …, n,

the corresponding construction gives

ξ̄_i = (1/(T - t_0)) ∫_{t_0}^T ξ_i(t) dt,

where ξ(t) ∈ C(x*(t), T - t). In the same way a strongly time consistent Shapley value may be introduced: here Sh(x*(t), T - t) denotes the Shapley value of the subgame Γ(x*(t), T - t) with c.f. v(S; x*(t), T - t), S ⊂ N, and the regularized value is obtained by averaging it along the optimal trajectory. The construction of the STC NM-solution proceeds in much the same way.
5.13. Differential strongly time consistent optimality principles

5.13.1. Let the game Γ(x_0, T - t_0) with payoffs

K_i(x_0, T - t_0; u_1, …, u_n) = ∫_{t_0}^T h_i(x(t)) dt,  h_i ≥ 0, i = 1, …, n,

be given. Denote by E(x_0, T - t_0) the set of all imputations in Γ(x_0, T - t_0), i.e.

E(x_0, T - t_0) = {ξ = (ξ_1, …, ξ_n) : Σ_{i∈N} ξ_i = v(N; x_0, T - t_0), ξ_i ≥ v({i}; x_0, T - t_0), i = 1, …, n}.
Let C^{t-t_0}(x_0), t ∈ (t_0, T], be the reachability set of the system, i.e. the set of all points in R^n which can be reached at the instant t ∈ [t_0, T] from the initial position x_0 = x(t_0) according to (5.13.1) with the help of some admissible open-loop control u(τ), τ ∈ [t_0, t]. For each y ∈ C^{t-t_0}(x_0) consider a subgame Γ(y, T - t) of the game Γ(x_0, T - t_0) with the corresponding characteristic function v(S; y, T - t) and the set of imputations E(y, T - t).
Definition. A point-to-set mapping

C(y, T - t) ⊂ E(y, T - t),

defined for all y ∈ C^{t-t_0}(x_0), t ∈ [t_0, T], is called an optimality principle (OP) in the family of subgames Γ(y, T - t).

In special cases C(y, T - t) may be the core, an NM-solution, the Shapley value, etc.

Consider the family of subgames Γ(x*(t), T - t) along the optimal trajectory x*(t), t ∈ [t_0, T], with the corresponding characteristic functions v(S; x*(t), T - t) and sets of imputations E(x*(t), T - t).
5.13.2. Define now a natural procedure of distribution of the imputation on the time interval [t_0, T] which leads to the differential STCOP.

Let ξ ∈ C̄(x_0, T - t_0) and let a function β_i(t), i = 1, …, n, t ∈ [t_0, T], satisfy the condition

∫_{t_0}^T β_i(t) dt = ξ_i,  β_i(t) ≥ 0.

The function β(t) = {β_i(t)} shall be called the imputation distribution procedure (IDP). Define

∫_{t_0}^Θ β_i(t) dt = ξ_i(Θ),  i = 1, …, n.

The OP C̄(x_0, T - t_0) is called time consistent if there exists an IDP β(t) such that

ξ - ξ(Θ) ∈ C̄(x*(Θ), T - Θ)  (5.13.2)

for all Θ ∈ [t_0, T].
5.13.3. Definition. The OP C̄(x*(t), T - t), t ∈ [t_0, T], is called strongly dynamic stable, or strongly time consistent (STC), if there exists an IDP β(t) = {β_i(t)} such that

ξ(Θ) ⊕ C̄(x*(Θ), T - Θ) ⊂ C̄(x_0, T - t_0)  (5.13.3)

for all Θ ∈ [t_0, T]. Here ξ(Θ) ⊕ C̄(x*(Θ), T - Θ) means the set of all possible vectors ξ(Θ) + η, for all η ∈ C̄(x*(Θ), T - Θ).
The STC of the OP means that if an imputation ξ ∈ C̄(x_0, T - t_0) and an IDP β(t) = {β_i(t)} of ξ are selected, then after the players receive on the time interval [t_0, Θ] the amount

ξ_i(Θ) = ∫_{t_0}^Θ β_i(t) dt,  i = 1, …, n,

any optimal income (in the sense of the OP C̄(x*(Θ), T - Θ)) on the time interval [Θ, T] in the subgame Γ(x*(Θ), T - Θ) together with ξ(Θ) constitutes an imputation belonging to the OP in the original game Γ(x_0, T - t_0).
Suppose t_0 = Θ_0 < Θ_1 < … < Θ_k < Θ_{k+1} < … < Θ_m = T is a partition of the time interval [t_0, T] such that Θ_{k+1} - Θ_k = δ, k = 0, 1, …, m - 1.

If (5.13.2) holds only at the points Θ_k, k = 0, 1, …, m - 1, i.e.

ξ - ξ(Θ_k) ∈ C̄(x*(Θ_k), T - Θ_k),  (5.13.4)

we call the OP C̄(x*(t), T - t) δTC (δ time consistent). If (5.13.3) holds only at the points Θ_k, k = 0, 1, …, m - 1, i.e.

ξ(Θ_k) ⊕ C̄(x*(Θ_k), T - Θ_k) ⊂ C̄(x_0, T - t_0),  (5.13.5)

we call the OP C̄(x*(t), T - t) δSTC (δ strongly time consistent).
Now we have everything necessary to construct the differential STCOP. Introduce the following functions:

β_i^0(τ) = ξ_i^0 Σ_{k=1}^n h_k(x*(τ)) / v(N; x*(Θ_0), T - Θ_0),  τ ∈ [Θ_0, Θ_1),

where ξ^0 ∈ C(x*(t_0), T - t_0), x_0 = x*(Θ_0) = x*(t_0);

…

β_i^{m-1}(τ) = ξ_i^{m-1} Σ_{k=1}^n h_k(x*(τ)) / v(N; x*(Θ_{m-1}), T - Θ_{m-1}),  τ ∈ [Θ_{m-1}, Θ_m = T],

where ξ^{m-1} ∈ C(x*(Θ_{m-1}), T - Θ_{m-1}).

Define the IDP β(τ) by the formula

β(τ) = β^k(τ),  τ ∈ [Θ_k, Θ_{k+1}),  k = 0, 1, …, m - 1.  (5.13.7)

The set C̄(x_0, T - t_0) is called the regularized OP C(x_0, T - t_0), and correspondingly C̄(x*(Θ_k), T - Θ_k) is the regularized OP C(x*(Θ_k), T - Θ_k). We consider C̄(x_0, T - t_0) as a new optimality principle in the game Γ(x_0, T - t_0).
Theorem. If the IDP β(τ), τ ∈ [t_0, T], is defined by (5.13.7), then always

ξ(Θ_k) ⊕ C̄(x*(Θ_k), T - Θ_k) ⊂ C̄(x_0, T - t_0),

i.e. the OP C̄(x*(τ), T - τ) is a δSTCOP.

Proof. Suppose ξ̄ = ξ̄(Θ_k) ∈ C̄(x*(Θ_k), T - Θ_k). Then ξ̃ = ξ(Θ_k) + ∫_{Θ_k}^T β'(τ) dτ for some β'(τ) ∈ B, the class of IDPs of the form (5.13.7). But ξ(Θ_k) = ∫_{t_0}^{Θ_k} β''(τ) dτ for some β''(τ) ∈ B. Consider

β*(τ) = β''(τ), τ ∈ [t_0, Θ_k);  β*(τ) = β'(τ), τ ∈ [Θ_k, T];

then β*(τ) ∈ B, and

ξ̃ = ∫_{t_0}^T β*(τ) dτ,

and thus ξ̃ ∈ C̄(x_0, T - t_0). The theorem is proved.
5.13.4. The IDP so defined has the advantage (compared with the integral one defined earlier in this chapter) that

Σ_{i=1}^n β_i(τ) = Σ_{i∈N} h_i(x*(τ)),  τ ∈ [t_0, T],

and thus

Σ_i ξ_i(Θ) = Σ_i ∫_{t_0}^Θ h_i(x*(τ)) dτ,  (5.13.8)

which is the actual amount to be divided between the players on the time interval [t_0, Θ] and which, as is seen from formula (5.13.8), is exactly equal to the amount earned by them on this time interval. Thus for the realization of the proposed IDP no additional investments are needed ((5.13.8) may not hold for integral OPs).
If δ tends to zero, we may get an STCOP by introducing the IDP β(τ), τ ∈ [t_0, T], by the formula

β_i(τ) = ξ_i(τ) Σ_{k∈N} h_k(x*(τ)) / v(N; x*(τ), T - τ),

where ξ(τ) ∈ C(x*(τ), T - τ) is an integrable selector.
5.14 Strongly time consistent optimality principles for the games with discount payoffs
The problem of dynamic stability (time consistency) for n-person differential games with discount payoffs was first mentioned in Strotz (1955), where it was proved that even Pareto optimal solutions may be time inconsistent in this case. The reason is that in the discount payoff case the payoffs of the players in the subgames occurring along an optimal path essentially change their structure, implying the time inconsistency of the chosen optimality principle (OP). Until recently no attempts have been made to regularize the OPs in the discount payoff case. We refer to Kaitala and Pohjola (1992), where this question was once more stated. Here we try to use the approach from Sec. 5.13 to construct a family of strongly dynamic stable optimality principles in the case under discussion. Here we shall consider the core as the OP in the game, but all the results remain valid for any other subset of imputations considered as an optimality principle.
5.14.1. Consider the n-person differential game Γ(x_0) with payoffs

K_i(x_0; u_1, …, u_n) = ∫_{t_0}^∞ e^{-λ_i(t-t_0)} h_i(x(t)) dt,  λ_i > 0, i = 1, …, n,

where N is the set of all players in Γ(x_0). The trajectory x^{1*}(t) is called conditionally optimal. Let V^1(S; x_0) be the characteristic function (S ⊂ N) and C^1(x_0) be the core. Consider the family of subgames Γ(x^{1*}(t)) along x^{1*}(t), t ∈ (t_0, ∞), the corresponding cores and c.f. V^1(S; x^{1*}(t)). The payoff functions in the subgames have the form

K_i(x^{1*}(Θ); u_1, …, u_n) = ∫_Θ^∞ e^{-λ_i(t-Θ)} h_i(x(t)) dt

and differ by the multiplier e^{λ_i(Θ-t_0)} from the payoffs on [Θ, ∞) evaluated with the discount factor of the original game. This essentially changes the relative weights of the payoff functions of different players as the game develops, and thus the whole game itself.
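The effect described here can be made concrete with a small computation (hypothetical data: constant instantaneous payoffs and two different discount rates). Measured with the original discount factor, the ratio of the players' remaining payoffs drifts as Θ grows, while each subgame, restarting its own discount, sees a constant ratio; this mismatch is the source of time inconsistency.

```python
import math

# Hypothetical two-player data: constant payoffs h_i, discount rates lam_i.
h = (1.0, 1.0)
lam = (0.1, 0.3)

def tail_from_t0(i, theta):
    """Player i's payoff on [Theta, inf) with the ORIGINAL discount
    e^{-lam_i (t - t0)}, t0 = 0:  integral = (h_i / lam_i) e^{-lam_i Theta}."""
    return h[i] / lam[i] * math.exp(-lam[i] * theta)

def tail_in_subgame(i, theta):
    """The same payoff as the subgame starting at Theta evaluates it,
    with the RESTARTED discount e^{-lam_i (t - Theta)}: h_i / lam_i,
    larger by the multiplier e^{lam_i Theta}."""
    return h[i] / lam[i]

for theta in (0.0, 5.0, 10.0):
    r0 = tail_from_t0(0, theta) / tail_from_t0(1, theta)
    r1 = tail_in_subgame(0, theta) / tail_in_subgame(1, theta)
    # r0 grows like e^{(lam_2 - lam_1) Theta}; r1 stays constant.
    print(theta, round(r0, 2), round(r1, 2))
```

The more patient player's relative weight grows without bound from the standpoint of the initial instant, so a division of the total gain agreed at t_0 need not remain optimal in later subgames.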
5.14.2. Consider the partition of the time interval $[t_0, \infty)$ by the points $\theta_k$: $t_0 = \theta_0 < \theta_1 < \ldots < \theta_k < \theta_{k+1} < \ldots$, where $\theta_{k+1} - \theta_k = \delta > 0$ does not depend upon $k$; the subgame $\Gamma(x^{1*}(\theta_1))$, c.f. $V^2(S; x^{1*}(\theta_1))$ and core $C^2(x^{1*}(\theta_1))$. Let $x^{2*}(t)$, $t \geq \theta_1$, be the conditionally optimal trajectory in the subgame $\Gamma(x^{1*}(\theta_1))$, i.e. such that
$$\max_{u_1, \ldots, u_n} \sum_{i=1}^{n} K_i(x^{1*}(\theta_1); u_1, \ldots, u_n) = \sum_{i=1}^{n} K_i(x^{1*}(\theta_1); u_1^*, \ldots, u_n^*) = \sum_{i=1}^{n} \int_{\theta_1}^{\infty} e^{-\lambda_i(\tau - \theta_1)} h_i(x^{2*}(\tau))\, d\tau.$$
Then consider the subgame $\Gamma(x^{2*}(\theta_2))$, c.f. $V^3(S; x^{2*}(\theta_2))$, core $C^3(x^{2*}(\theta_2))$. Continuing in the same manner we get the sequence of subgames $\Gamma(x^{k*}(\theta_k))$, c.f. $V^{k+1}(S; x^{k*}(\theta_k))$, cores $C^{k+1}(x^{k*}(\theta_k))$ and conditionally optimal trajectories $x^{(k+1)*}(t)$, $t \geq \theta_k$. Define
$$\beta_i(t) = \frac{\xi_i^{k+1}\left[V^{k+1}(N; x^{k*}(\theta_k)) - V^{k+1}(N; x^{k*}(\theta_{k+1}))\right]}{V^{k+1}(N; x^{k*}(\theta_k))\,\delta}, \qquad t \in [\theta_k, \theta_{k+1}),$$
$i = 1, \ldots, n$, $k = 0, 1, \ldots$. The functions $\{\beta_i(t)\}$, $t \geq t_0$, constitute for each $\xi^{k+1} \in C^{k+1}(x^{k*}(\theta_k))$, $k = 0, 1, \ldots$, an UDP in $\Gamma_\delta(x_0)$. Let $\bar{\xi}$ be the infinite sequence $\bar{\xi} = \{\int_{\theta_k}^{\theta_{k+1}} e^{-\lambda_i t} \beta_i(t)\, dt\}$. Denote by $\bar{C}(x_0)$ the set of all such sequences $\bar{\xi}$, for all possible UDP's $\beta(t)$, for different $\xi^{k+1} \in C^{k+1}(x^{k*}(\theta_k))$, $k = 0, 1, \ldots$. Consider $\bar{C}(x_0)$ as the optimality principle (OP) in $\Gamma_\delta(x_0)$, and call it the regularized core (RC). Define $\bar{C}(x^*(\theta_k))$ for the subgame $\Gamma_\delta(x^*(\theta_k))$.
Denote by $\bar{\xi}(\theta)$ a finite sequence
6. Suppose that Player E moves from the point $y_0$ along some smooth curve $y(t)$ with maximum velocity $\beta$. Player P moves with maximum velocity $\alpha$; at each instant of time $\tau$ he knows Player E's position $y(\tau)$ and the direction of the velocity vector $v(\tau) = \{v_1(\tau), v_2(\tau)\}$, ($v_1^2(\tau) + v_2^2(\tau) = \beta^2$). Construct the $\Pi$-strategy for Player P. In accordance with this strategy he chooses the direction of the velocity vector towards the capture point $M$, assuming that on the time interval $[\tau, \infty)$ Player E follows a constant direction $\{v_1(\tau), v_2(\tau)\}$ (he moves along the ray with constant velocity $\beta$). Show that if Player P uses the $\Pi$-strategy, then the line segment $[x(\tau), y(\tau)]$ connecting the current positions of the players is kept parallel to the segment $[x_0, y_0]$ until the time of capture.
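The parallelism property can be checked numerically. The sketch below is illustrative only (the evader's curve, the speeds and the step size are arbitrary choices); it uses one common "parallel pursuit" form of the $\Pi$-strategy, in which P's velocity is $v_E + \gamma u$, where $u$ is the unit vector from P towards E and $\gamma > 0$ is chosen so that $|v_P| = \alpha$.

```python
import math

alpha, beta = 2.0, 1.0            # speeds of P and E (alpha > beta); arbitrary choices
dt, steps = 1e-3, 8000

x = [0.0, 0.0]                    # pursuer P
y = [4.0, 3.0]                    # evader E
d0 = math.hypot(y[0] - x[0], y[1] - x[1])
u0 = ((y[0] - x[0]) / d0, (y[1] - x[1]) / d0)   # initial direction from P to E
max_dev = 0.0                     # max |sin(angle)| between [x, y] and [x0, y0]

for k in range(steps):
    dx, dy = y[0] - x[0], y[1] - x[1]
    d = math.hypot(dx, dy)
    if d < (alpha + beta) * dt:   # capture, up to one step of overshoot
        break
    max_dev = max(max_dev, abs(dx * u0[1] - dy * u0[0]) / d)
    # E follows an arbitrary smooth curve with speed beta
    phi = 0.7 * math.sin(0.5 * k * dt)
    vE = (beta * math.cos(phi), beta * math.sin(phi))
    # Pi-strategy in parallel-pursuit form: vP = vE + gamma*u with |vP| = alpha
    u = (dx / d, dy / d)
    s = vE[0] * u[0] + vE[1] * u[1]
    gamma = -s + math.sqrt(alpha ** 2 - beta ** 2 + s * s)
    vP = (vE[0] + gamma * u[0], vE[1] + gamma * u[1])
    x = [x[0] + vP[0] * dt, x[1] + vP[1] * dt]
    y = [y[0] + vE[0] * dt, y[1] + vE[1] * dt]

dist = math.hypot(y[0] - x[0], y[1] - x[1])
print(max_dev, dist)              # the segment stays parallel and capture occurs
```

With this choice $\dot{y} - \dot{x} = -\gamma u$ is always collinear with $y - x$, so the direction of the connecting segment never changes, which is the statement of the exercise.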
7. Suppose that Player E moves from $y_0$ along some smooth curve $y(\tau)$ with maximum velocity $\beta$. Write an analytical expression for the $\Pi$-strategy of Player P.
8. Show that when Player P uses the $\Pi$-strategy, the capture point is always contained in the set $A(x_0, y_0)$ bounded by the Apollonius circle $A(x_0, y_0)$.
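The Apollonius circle here is the locus of points that both players can reach in equal time, i.e. points $M$ with $|M - x_0|/\alpha = |M - y_0|/\beta$. A small sketch (illustrative only; the positions and speeds are arbitrary choices) computes its center and radius from the standard formula and verifies the equal-time property on sampled points.

```python
import math

alpha, beta = 2.0, 1.0                   # speeds of P and E (alpha > beta); arbitrary
x0, y0 = (0.0, 0.0), (3.0, 0.0)          # initial positions (arbitrary)
k = beta / alpha                          # speed ratio, k < 1

# Apollonius circle of points M with |M - y0| = k * |M - x0|
cx = (y0[0] - k * k * x0[0]) / (1 - k * k)
cy = (y0[1] - k * k * x0[1]) / (1 - k * k)
r = k * math.dist(x0, y0) / (1 - k * k)

# sample points on the circle and check the equal-time property
for ang in (0.0, 1.0, 2.5):
    M = (cx + r * math.cos(ang), cy + r * math.sin(ang))
    tP = math.dist(M, x0) / alpha         # time for P to reach M
    tE = math.dist(M, y0) / beta          # time for E to reach M
    assert abs(tP - tE) < 1e-9
print(cx, cy, r)
```

For these data the circle has center $(4, 0)$ and radius $2$; every point of it is reached by P and E simultaneously, which is why the capture point under the $\Pi$-strategy cannot leave the set it bounds.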
Hint. The proof is carried out for the motions of Player E along k-vertex broken lines in terms of the statement of Exercise 5; then a passage to the limit is made.
9. ("Driver the Killer" game.) In order to write the equations of motion for the players in this game, it suffices to specify five phase coordinates: two coordinates to identify the position of Player P (a motor vehicle), two coordinates to identify the position of Player E (a pedestrian), and one coordinate to indicate the direction of pursuit. Denote these coordinates by $x_1, x_2, y_1, y_2, \theta$ (Fig. 5.1). The state of the game
Figure 5.1
$$\dot{x}_1 = w_1 \sin\theta, \qquad \dot{x}_2 = w_1 \cos\theta,$$
Figure 5.2
around $C$ in the opposite sense, but with the same angular velocity. Thus the point $E$ moves with velocity that is equal to $w_1(d/R)$ in absolute value and perpendicular to $CE$. The components of this velocity are obtained by multiplying the modulus by $x_2/d$ and $(x_1 - R)/d$, respectively.
Show that the equations of motion are:
$$\dot{x}_1 = a v + w \sin u, \qquad \dot{x}_2 = 1 + w \cos u,$$
$$0 \leq u < 2\pi, \qquad -1 \leq v \leq 1,$$
where $a$ and $w$ are positive smooth functions of $x_1$ and $x_2$.
Write the equation for the value $V$ of the game in forms (5.5.64) and (5.5.66) and show that the equation in form (5.5.69) is
where
$$\rho = \sqrt{V_{x_1}^2 + V_{x_2}^2}, \qquad v = \operatorname{sgn} V_{x_1}, \qquad \sin u = -V_{x_1}/\rho, \qquad \cos u = -V_{x_2}/\rho.$$
Hint. Make use of Exercise 11.
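The form of the extremal control $u$ can be checked directly: over all $u$, the expression $V_{x_1}\sin u + V_{x_2}\cos u$ attains its minimum $-\rho$ exactly at $\sin u = -V_{x_1}/\rho$, $\cos u = -V_{x_2}/\rho$. A small numerical sketch (illustrative; the gradient values are arbitrary choices) compares the closed-form minimizer with a brute-force grid search.

```python
import math

V1, V2 = 0.6, -0.8                        # sample values of V_{x1}, V_{x2} (arbitrary)
rho = math.hypot(V1, V2)

def f(u):
    return V1 * math.sin(u) + V2 * math.cos(u)

# closed-form minimizer: sin u* = -V1/rho, cos u* = -V2/rho
u_star = math.atan2(-V1 / rho, -V2 / rho)
# brute-force check over a fine grid of directions
u_grid = min(f(2 * math.pi * j / 100000) for j in range(100000))
print(f(u_star), -rho, u_grid)            # all three agree
```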
13. ("Driver the Killer" game.) Write the main equation in forms (5.6.8) and (5.6.10) for the equations of motion in the natural space (Exercise 9) and in the reduced space (Exercise 10). In the first case, for the partial derivatives $V_{x_1}, V_{x_2}, V_{y_1}, V_{y_2}, V_\theta$ we introduce the notation $V_1, V_2, V_3, V_4, V_5$, where the indices refer to the relevant phase coordinates following the order in which they appear in the equations of motion.
14. Find the equation of characteristics in regressive form in the natural space for the "Driver the Killer" game. Here the main equation (5.6.10) becomes
with the terminal payoff $\rho(x(T), A)$, where $A$ is some point, $A \in R^2$, lying outside the reachability set of the system by the time-instant $T$ from the initial state $x_0$.
17. Write explicit expressions for the optimal strategies in the game as in Exercise 16 and for its modification, where the duration of the game is not prefixed and the payoff to Player E is taken to be equal to the time of arrival at the origin of coordinates.
18. Prove that the reachability set of the controlled system
$$\dot{q}_i = p_i, \qquad \dot{p}_i = a u_i - k p_i,$$
Here $q$, $y$ are the positions of the players P and E respectively, and $p$ is the momentum of Player P. In this case Player E moves in accordance with a "simple motion", while Player P, represented by a material unit-mass point, moves under the frictional force $-k p$.
The payoff to a player is defined to be the distance between the geometric positions of the players at the time $T$ when the game ends:
$$\rho(q(T), y(T)).$$
Construct the reachability sets of the players and determine geometrically the maximin distance $\rho_T(q_0, p_0, y_0)$ between these sets.
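With a constant admissible control $u$, the momentum equation $\dot{p} = a u - k p$ of Exercise 18 integrates in closed form: $p(t) = p_0 e^{-kt} + (a u / k)(1 - e^{-kt})$, so $|p|$ is ultimately bounded by $a/k$; this is what keeps P's reachability set bounded in the momentum coordinate. The sketch below (illustrative only; the constants are arbitrary choices) checks the closed form against Euler integration.

```python
import math

a, k = 2.0, 0.5        # control gain and friction coefficient (arbitrary choices)
u = 1.0                # constant admissible control
p0, dt, steps = 0.3, 1e-4, 20000

# Euler integration of p' = a*u - k*p
p = p0
for _ in range(steps):
    p += (a * u - k * p) * dt

t = steps * dt
p_exact = p0 * math.exp(-k * t) + (a * u / k) * (1 - math.exp(-k * t))
print(p, p_exact)      # the two agree up to the integration error
```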
25. Extend the theorem in 5.9.7 to the case where the participants are several pursuers $P_1, \ldots, P_n$ acting as one player, and one evading player E.
Bibliography
Bellman, R. Rendiconti del Circolo Matematico di Palermo, ser. 2.1, N2, 1952.
Feller, W. Introduction to Probability Theory and its Applications, p. 1230. John Wiley & Sons Inc., N.Y.-London-Sydney-Toronto, 1971.
Friedman, A. Differential Games, p. 350. Wiley, N.Y., 1971.
Friedman, J. W. Game Theory with applications to economics, p. 361. Oxford Univ.
Press, N.Y., Oxford, 1986.
Fudenberg, D. and J. Tirole. Game theory, p. 580. The MIT Press, Cambridge,
1992.
Gale, D. The Theory of Linear Economic Models, p. 330. McGraw-Hill Book comp.,
inc., N.Y., London, 1960.
Grigorenko, N. L. Differential Games of Pursuit by Several Units, p. 217. Moscow State Univ. Publ., Moscow, 1983.
Haigh, J. Adv. Applied Prob., 7, 1975.
Harsanyi, J. C. International Economic Review, 4, 1963.
Harsanyi, J. C. Papers in Game Theory, p. 367. Reidel, Dordrecht, 1982.
Harsanyi, J. C. and R. Selten. Management Science, 18, 1972.
Hart, S. and A. Mas-Colell. In A. E. Roth, editor, The Shapley Value. Cambridge
Univ. Press, Cambridge, 1988.
Hu, T. Integer Programming and Network Flows, p. 411. Addison-Wesley Publ. Comp., Menlo Park, Calif.-London-Don Mills, 1970.
Isaacs, R. Differential Games, p. 384. Wiley, N.Y., 1965.
Karlin, S. Reduction of certain classes of games to integral equations. In H. Kuhn
and A. Tucker, editors, Contributions to the Theory of Games, II. Princeton Univ.
Press, Princeton (N.Y.), 1953.
Karlin, S. Mathematical Methods and Theory in Games, Programming and Economics, p. 840. Pergamon Press, London, 1959.
Kolmogorov, A. N. and S. V. Fomin. Elements of the Theory of Functions and Functional Analysis, p. 389. Nauka, Moscow, 1981.
Kazakova-Frehse, N. In M. Breton and G. Zaccour, editors, 6th International Symposium on Dynamic Games and Applications, Preprint Volume, Montreal, Canada, 1994. Ecole des Hautes Etudes Commerciales.
Kononenko, A. F. Soviet Math. Reports, 231(2), 1976.
Kovalenko, A. A. Set of Problems for Theory of Games. Visha Sch., Lvov, 1974.
Bibliography 347
Luce, R. D. and H. Raiffa. Games and decisions. Introduction and critical survey,
p. 509. Wiley, N.Y., 1957.
Moulin, H. Théorie des jeux pour l'économie et la politique, p. 200. Hermann, Paris,
1981.
Moulin, H. Game Theory for the Social Sciences, p. 465. N.Y. Univ. Press, N.Y.,
2nd edition, 1986.
Maynard Smith, J. and G. R. Price. The logic of animal conflict. Nature, London, 1973.
Owen, G. Game Theory, p. 230. Acad. Press, N.Y., 2nd edition, 1982.
Petrosjan, L. A., A. Azamov and H. Satimov. Controlled Systems, 13, 1974.
Peck, J. E. L. and A. L. Dulmage. Canad. J. Math., 9(3), 1957.
Petrosjan, L. A. Soviet Math. Reports, 161(1), 1965.
Petrosjan, L. A. Wissenschaftliche Zeitschrift der TU Dresden, 4, 1968.
Petrosjan, L. A. Soviet Math. Reports, 195(3), 1970.
Petrosjan, L. A. Vestnik of the Leningrad State University, 19, 1972.
Petrosjan, L. A. Vestnik of the Leningrad State University, 13, 1977.
Petrosjan, L. A. Vestnik of the Leningrad State University, 2, 1992.
Petrosjan, L. A. Differential games of pursuit, p. 325. World Scientific, Singapore,
1993.
Petrosjan, L. A. and Yu. Garnaev. Search Games, p. 217. Len. State Univ. Publ.,
Leningrad, 1992.
Perles, M. A. and M. Maschler. International Journal of Game Theory, 10, 1981.
Pontryagin, L. S. Advances in Math. Sci., 21(4), 1966.
Prokhorov, Y. V. and Y. A. Rozanov. Probability Theory. Basic Notions. Limit Theorems. Random Processes, p. 358. Nauka, Moscow, 1967.
Parthasarathy, T. and T.E.S. Raghavan. Some topics in two-person games, p. 259.
Amer. Elsevier, N.Y., 1971.
Petrosjan, L. A. and V. V. Zakharov. Introduction to mathematical ecology, p. 295.
Len. State Univ. Publ., Leningrad, 1986.
Petrosjan, L. A. and N. A. Zenkevich. Optimal search in conflict conditions, p. 96.
Len. State Univ. Publ., Leningrad, 1986.
Robinson, J. An iterative method of solving a game, volume P-154, p. 9. RAND Corp., 1950.
Rockafellar, R. T. Convex analysis, p. 470. Princeton Univ. Press, Princeton, 1970.
Rochet, J. C. Selection of unique equilibrium payoff for extension games with perfect
information. Mimeo, Universite de Paris, ix, 1980.
Roth, A. E. Mathematics of operations research, 2, 1977.
Rozenmuller, J. Cooperative games and markets, p. 115. Springer-Verlag, Berlin,
1971.
Sion, M. and Ph. Wolfe. In M. Dresher, A. Tucker, and Ph. Wolfe, editors, Contributions to the Theory of Games, III. Princeton Univ. Press, Princeton, 1957.
Vorobjev, N. N. Game theory lectures for economists and system scientists, p. 178.
Springer, N.Y., 1977.