
GAME

THEORY
SERIES ON OPTIMIZATION

Published
Vol. 2 Differential Games of Pursuit
by L. A. Petrosjan

Vol. 3 Game Theory


by L. A. Petrosjan and N. A. Zenkevich
SERIES ON

OPTIMIZATION

VOL.3

GAME
THEORY

Leon A. Petrosjan
Nikolay A. Zenkevich
Faculty of Applied Mathematics
St. Petersburg State University
RUSSIA

World Scientific
Singapore • New Jersey • London • Hong Kong
Published by
World Scientific Publishing Co. Pte. Ltd.
P O Box 128, Farrer Road, Singapore 912805
USA office: Suite 1B, 1060 Main Street, River Edge, NJ 07661
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE

Library of Congress Cataloging-in-Publication Data


Petrosjan, L. A. (Leon Aganesovich)
Game theory / Leon A. Petrosjan, Nikolay A. Zenkevich.
xi, 352 p.; 22.5 cm -- (Series on optimization ; vol. 3)
ISBN 981-02-2396-X
1. Game theory. I. Zenkevich, N. A. (Nikolai Anatol'evich)
II. Title. III. Series.
QA269.P47 1996
519.3--dc20 95-33547
English translation by J. M. Donetz.

British Library Cataloguing-in-Publication Data


A catalogue record for this book is available from the British Library.

Copyright © 1996 by World Scientific Publishing Co. Pte. Ltd.


All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means,
electronic or mechanical, including photocopying, recording or any information storage and retrieval
system now known or to be invented, without written permission from the Publisher.

For photocopying of material in this volume, please pay a copying fee through the Copyright
Clearance Center, Inc., 222 Rosewood Drive, Danvers, Massachusetts 01923, USA.

This book is printed on acid-free paper.

Printed in Singapore by Uto-Print


Acknowledgments

We begin by acknowledging our debt to our teacher Nicolay Vorobjev, who began teaching us game theory in the former Soviet Union at a time when this subject was not yet a standard part of the applied mathematics, economics and management science curriculum.

We must specially mention Elena A. Semina, who wrote with us sections 1.7, 1.9, 3.7, 3.13, 4.4-4.6, 4.8, 4.9 and sections 5.2-5.6, 5.8 for the Russian version of the book, and Jury M. Donetz, who translated the book into English.

We thank Olga Kholodkevich, Maria Kultina, Tatiana Survillo and Sergey Voznyuk for their effective research assistance, and also for reading the manuscript and suggesting ways to improve it.

Many thanks to Andrey Ovsienko and Sergey Voznyuk for the preparation of the manuscript in LaTeX.

Preface

Game theory is a branch of modern applied mathematics that aims to analyze various problems of conflict between parties that have opposed, similar or simply different interests. A theory of games was introduced in 1921 by Emile Borel and established in 1928 by John von Neumann; John von Neumann and Oskar Morgenstern then developed it as a means of decision making in complicated economic systems. In their book "The Theory of Games and Economic Behaviour", published in 1944, they asserted that the classical mathematics developed for applications in mechanics and physics fails to describe the real processes in economics and social life. They also saw many factors common to actual games and economic situations, such as conflicting interests, the various preferences of decision makers, and the dependence of the outcome for each individual on the decisions made by the other individuals. Therefore, they named this new kind of mathematics game theory.
Games are grouped into several classes according to some important features. In
our book we consider zero-sum two-person games, strategic n-person games in normal
form, cooperative games, games in extensive form with complete and incomplete
information, differential pursuit games and differential cooperative n-person games.
There is no single game theory which could address such a wide range of "games". At the same time there are common optimality principles applicable to all the classes of games under consideration, but the methods for the effective computation of solutions are very different. It is also impossible to cover in one book all known optimality principles and solution concepts. For instance, the set of different "refinements" of Nash equilibria alone generates more than 15 new optimality principles. In this book we try to explain the principles which, from our point of view, are basic in game theory, and to bring the reader to the point of being able to solve problems in this field of mathematics. We have included results published earlier in Petrosjan (1965), (1968), (1970), (1972), (1977), (1992), (1993); Petrosjan and Zenkevich (1986); Zenkevich and Marchenko (1987), (1990); Zenkevich and Voznyuk (1994).

Contents

1 Matrix games
1.1 Definition of a two-person zero-sum game in normal form
1.2 Maximin and minimax strategies
1.3 Saddle points
1.4 Mixed extension of a game
1.5 Convex sets and systems of linear inequalities
1.6 Existence of a solution of the matrix game in mixed strategies
1.7 Properties of optimal strategies and value of the game
1.8 Dominance of strategies
1.9 Completely mixed and symmetric games
1.10 Iterative methods of solving matrix games
1.11 Exercises and problems

2 Infinite zero-sum two-person games
2.1 Infinite games
2.2 ε-saddle points, ε-optimal strategies
2.3 Mixed strategies
2.4 Games with continuous payoff functions
2.5 Games with a convex payoff function
2.6 Simultaneous games of pursuit
2.7 One class of games with a discontinuous payoff function
2.8 Solution of simultaneous infinite games of search
2.9 Games of secondary search
2.10 A poker model
2.11 Exercises and problems

3 Nonzero-sum games
3.1 Definition of noncooperative game in normal form
3.2 Optimality principles in noncooperative games
3.3 Mixed extension of noncooperative game
3.4 Existence of Nash equilibrium
3.5 Kakutani fixed-point theorem and proof of existence of an equilibrium in n-person games
3.6 Refinements of Nash equilibria
3.7 Properties of optimal solutions
3.8 Symmetric bimatrix games and evolutionary stable strategies
3.9 Equilibrium in joint mixed strategies
3.10 The bargaining problem
3.11 Games in characteristic function form
3.12 The core and NM-solution
3.13 Shapley value
3.14 The potential of the Shapley value
3.15 The Shapley value for a minimum cost spanning tree game
3.16 Exercises and problems

4 Positional games
4.1 Multistage games with perfect information
4.2 Absolute equilibrium (subgame-perfect)
4.3 Fundamental functional equations
4.4 Penalty strategies
4.5 Hierarchical games
4.6 Hierarchical games (cooperative version)
4.7 Multistage games with incomplete information
4.8 Behavior strategy
4.9 Functional equations for simultaneous multistage games
4.10 Cooperative multistage games with complete information
4.11 Voting the directorial council
4.12 Exercises and problems

5 Differential games
5.1 Differential zero-sum games with prescribed duration
5.2 Multistage perfect-information games with an infinite number of alternatives
5.3 Existence of ε-equilibria in differential games with prescribed duration
5.4 Differential time-optimal games of pursuit
5.5 Necessary and sufficient condition for existence of optimal open-loop strategy for Evader
5.6 Fundamental equation
5.7 Methods of successive approximations for solving differential games of pursuit
5.8 Examples of solutions to differential games of pursuit
5.9 Games of pursuit with delayed information for Pursuer
5.10 Definition of cooperative differential game in the characteristic function form
5.11 Principle of dynamic stability (time-consistency)
5.12 Integral optimality principles
5.13 Differential strongly time consistent optimality principles
5.14 Strongly time consistent optimality principles for the games with discount payoffs
5.15 Exercises and problems

Bibliography

Index
Chapter 1
Matrix games

1.1 Definition of a two-person zero-sum game in normal form
1.1.1. Definition. The system

Γ = (X, Y, K),   (1.1.1)

where X and Y are nonempty sets, and the function K : X × Y → R¹, is called a two-person zero-sum game in normal form.

The elements x ∈ X and y ∈ Y are called the strategies of players 1 and 2, respectively, in the game Γ; the elements of the Cartesian product X × Y (i.e. the pairs of strategies (x, y), where x ∈ X and y ∈ Y) are called situations, and the function K is the payoff of Player 1. Player 2's payoff in situation (x, y) is set equal to −K(x, y); therefore the function K is also called the payoff function of the game Γ itself, and the game Γ is called a zero-sum game. Thus, in order to specify the game Γ, it is necessary to define the sets of strategies X, Y for players 1 and 2, and the payoff function K given on the set of all situations X × Y.

The game Γ is interpreted as follows. The players simultaneously and independently choose strategies x ∈ X, y ∈ Y. Thereafter Player 1 receives the payoff equal to K(x, y) and Player 2 receives the payoff equal to −K(x, y).
Definition. The game Γ′ = (X′, Y′, K′) is called a subgame of the game Γ = (X, Y, K) if X′ ⊂ X, Y′ ⊂ Y, and the function K′ : X′ × Y′ → R¹ is the restriction of the function K to X′ × Y′.

This chapter focuses on two-person zero-sum games in which the strategy sets of the players are finite.
1.1.2. Definition. Two-person zero-sum games in which both players have finite sets of strategies are called matrix games.

Suppose that Player 1 in the matrix game (1.1.1) has a total of m strategies. Let us order the strategy set X of the first player, i.e. set up a one-to-one correspondence between the sets M = {1, 2, ..., m} and X. Similarly, if Player 2 has n strategies, it is possible to set up a one-to-one correspondence between the sets N = {1, 2, ..., n} and Y. The game Γ is then fully defined by specifying the matrix A = {aᵢⱼ}, where aᵢⱼ = K(xᵢ, yⱼ), (i, j) ∈ M × N, (xᵢ, yⱼ) ∈ X × Y, i ∈ M, j ∈ N (whence comes the name of the game, the matrix game). In this case the game Γ is realized as follows. Player 1 chooses row i ∈ M and Player 2 (simultaneously and independently of Player 1) chooses column j ∈ N. Thereafter Player 1 receives the payoff aᵢⱼ and Player 2 receives the payoff (−aᵢⱼ). If the payoff is a negative number, then we are dealing with an actual loss of Player 1.

Denote the game Γ with the payoff matrix A by Γ_A and call it the (m × n) game, according to the dimension of the matrix A. We shall drop the index A if the discussion makes it clear what matrix is used in the game.

Strategies in a matrix game can be enumerated in different ways; strictly speaking, to each order relation there corresponds its own matrix. Accordingly, a finite two-person zero-sum game can be described by distinct matrices differing from one another only in the order of rows and columns.

1.1.3. Example 1. [Dresher (1961)]. This example is known in the literature as the Colonel Blotto game. Colonel Blotto has m regiments and his enemy has n regiments. The enemy is defending two posts. A post will be taken by Colonel Blotto if, in attacking it, he is stronger in force at this post than the defender. The opposing parties have to distribute their regiments between the two posts.

Define the payoff to Colonel Blotto (Player 1) at each post. If Blotto has more regiments at the post than the enemy (Player 2), then his payoff at this post is equal to the number of the enemy's regiments plus one (the occupation of the post is equivalent to capturing one regiment). If Player 2 has more regiments than Player 1 at the post, then Player 1 loses his regiments at the post plus one (for the loss of the post). If each side has the same number of regiments at the post, it is a draw and each side gets zero. The total payoff to Player 1 is the sum of the payoffs at the two posts.

The game is zero-sum. We shall describe the strategies of the players. Suppose that m > n. Player 1 has the following strategies: x₀ = (m, 0), to place all of the regiments at the first post; x₁ = (m − 1, 1), to place m − 1 regiments at the first post and one at the second; x₂ = (m − 2, 2); ...; xₘ₋₁ = (1, m − 1); xₘ = (0, m). The enemy (Player 2) has the following strategies: y₀ = (n, 0), y₁ = (n − 1, 1), ..., yₙ = (0, n).

Suppose that Player 1 chooses strategy x₀ and Player 2 chooses strategy y₀. Compute the payoff a₀₀ of Player 1 in this situation. Since m > n, Player 1 wins at the first post. His payoff there is n + 1 (one for holding the post). At the second post it is a draw. Therefore a₀₀ = n + 1. Compute a₀₁. Since m > n − 1, at the first post Player 1's payoff is n − 1 + 1 = n. Player 2 wins at the second post, so the loss of Player 1 at this post is one. Thus, a₀₁ = n − 1. Similarly, we obtain a₀ⱼ = n − j + 1 − 1 = n − j, 1 ≤ j ≤ n. Further, if m − 1 > n, then a₁₀ = n + 1 + 1 = n + 2, a₁₁ = n − 1 + 1 = n, and a₁ⱼ = n − j + 1 − 1 − 1 = n − j − 1, 2 ≤ j ≤ n. In the general case (for any m and n) the elements aᵢⱼ, i = 0, ..., m, j = 0, ..., n, of the payoff matrix are computed as follows:

    a_{ij} = K(x_i, y_j) =
    \begin{cases}
    n + 2,      & m - i > n - j,\ i > j, \\
    n - j + 1,  & m - i > n - j,\ i = j, \\
    n - j - i,  & m - i > n - j,\ i < j, \\
    -m + i + j, & m - i < n - j,\ i > j, \\
    j + 1,      & m - i = n - j,\ i > j, \\
    -m - 2,     & m - i < n - j,\ i < j, \\
    -i - 1,     & m - i = n - j,\ i < j, \\
    -m + i - 1, & m - i < n - j,\ i = j, \\
    0,          & m - i = n - j,\ i = j.
    \end{cases}
Thus, with m = 4, n = 3, considering all possible situations, we obtain the payoff matrix A of this game:

         y₀   y₁   y₂   y₃
    x₀    4    2    1    0
    x₁    1    3    0   −1
    x₂   −2    2    2   −2
    x₃   −1    0    3    1
    x₄    0    1    2    4
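The computation of this matrix is easy to mechanize. The following sketch (our illustration, not part of the original text; NumPy assumed) builds the Blotto payoff matrix for arbitrary m and n by summing the per-post payoffs described above, and reproduces the matrix just obtained:

    import numpy as np

    def post_payoff(b, e):
        # Payoff to Blotto at one post where he placed b regiments and the enemy e.
        if b > e:
            return e + 1       # captures e regiments plus one for the post
        if b < e:
            return -(b + 1)    # loses his b regiments plus one for the post
        return 0               # equal forces: a draw

    def blotto_matrix(m, n):
        # Row i is Blotto's strategy (m - i, i); column j is the enemy's (n - j, j).
        return np.array([[post_payoff(m - i, n - j) + post_payoff(i, j)
                          for j in range(n + 1)] for i in range(m + 1)])

    print(blotto_matrix(4, 3))
    # [[ 4  2  1  0]
    #  [ 1  3  0 -1]
    #  [-2  2  2 -2]
    #  [-1  0  3  1]
    #  [ 0  1  2  4]]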

Example 2. (Game of Evasion.) [Gale (1960)]. Players 1 and 2 choose integers i and j from the set {1, ..., n}. Player 1 wins the amount |i − j|. The game is zero-sum. The payoff matrix is a square (n × n) matrix with aᵢⱼ = |i − j|. For n = 4, the payoff matrix A has the form

         1   2   3   4
    1    0   1   2   3
    2    1   0   1   2
    3    2   1   0   1
    4    3   2   1   0

Example 3. (Discrete Duel-Type Game.) [Gale (1960)]. Players approach one another by taking n steps. After each step a player may or may not fire a bullet, but during the game he may fire only once. The probability that a player will hit his opponent (if he shoots) at the k-th step is assumed to be k/n (k ≤ n).

A strategy for Player 1 (2) consists in taking a decision on shooting at the i-th (j-th) step. Suppose that i < j, so Player 1 is the first to fire: with probability i/n he hits Player 2, and with probability 1 − i/n he misses, in which case Player 2 hits him at the j-th step with probability j/n. The payoff aᵢⱼ to Player 1 is then determined by

aᵢⱼ = i/n − (1 − i/n)(j/n),   i < j.

Thus the payoff aᵢⱼ is the difference between the probability of hitting the opponent and the probability of failing to survive. In the case i > j, Player 2 is the first to fire and aᵢⱼ = −aⱼᵢ. If, however, i = j, then we set aᵢⱼ = 0. Accordingly, if we set n = 5, the game matrix multiplied by 25 has the form

      0   −3   −7  −11  −15
      3    0    1   −2   −5
      7   −1    0    7    5
     11    2   −7    0   15
     15    5   −5  −15    0
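This matrix, too, can be generated directly from the formula for aᵢⱼ; a short sketch (our illustration, using exact rational arithmetic from the standard fractions module):

    from fractions import Fraction

    def duel_matrix(n):
        # a_ij = i/n - (1 - i/n) * j/n for i < j; the matrix is antisymmetric.
        a = [[Fraction(0)] * n for _ in range(n)]
        for i in range(1, n + 1):
            for j in range(i + 1, n + 1):
                a[i - 1][j - 1] = Fraction(i, n) - (1 - Fraction(i, n)) * Fraction(j, n)
                a[j - 1][i - 1] = -a[i - 1][j - 1]
        return a

    print([[int(25 * x) for x in row] for row in duel_matrix(5)])
    # [[0, -3, -7, -11, -15], [3, 0, 1, -2, -5], [7, -1, 0, 7, 5],
    #  [11, 2, -7, 0, 15], [15, 5, -5, -15, 0]]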
Example 4. (Attack-Defense Game.) Suppose that Player 1 wants to attack one of the targets c₁, ..., cₙ having positive values τ₁ > 0, ..., τₙ > 0. Player 2 defends one of these targets. We assume that if the undefended target cᵢ is attacked, it is necessarily destroyed (Player 1 wins τᵢ), while the defended target is hit with probability 1 > βᵢ > 0 (the target cᵢ withstands the attack with probability 1 − βᵢ > 0), i.e. Player 1 wins (on the average) βᵢτᵢ, i = 1, 2, ..., n.

The problem of choosing the target for attack (for Player 1) and the target for defense (for Player 2) reduces to the game with the payoff matrix

         β₁τ₁   τ₁    ...   τ₁
    A =   τ₂   β₂τ₂   ...   τ₂
          ....................
          τₙ    τₙ    ...  βₙτₙ
Example 5. (Discrete Search Game.) There are n cells. Player 2 conceals an object in one of the n cells and Player 1 wishes to find it. In examining the i-th cell, Player 1 exerts τᵢ > 0 efforts, and the probability of finding the object in the i-th cell (if it is concealed there) is 0 < βᵢ < 1, i = 1, 2, ..., n. If the object is found, Player 1 receives the amount α. The players' strategies are the numbers of the cells wherein the players respectively conceal and search for the object. Player 1's payoff is equal to the difference between the expected receipts and the efforts made in searching for the object. Thus, the problem of concealing and searching for the object reduces to the game with the payoff matrix

         αβ₁ − τ₁     −τ₁      ...     −τ₁
    A =    −τ₂     αβ₂ − τ₂    ...     −τ₂
           ..................................
           −τₙ        −τₙ      ...   αβₙ − τₙ
Example 6. (Noisy Search.) Suppose that Player 1 is searching for a mobile object (Player 2) for the purpose of detecting it. Player 2's objective is the opposite one (i.e. he seeks to avoid being detected). Player 1 can move at one of the velocities α₁ = 1, α₂ = 2, α₃ = 3. The range of the detecting device used by Player 1, depending on the velocities of the players, is determined by the matrix

          β₁  β₂  β₃
     α₁    4   5   6
D =  α₂    3   4   5
     α₃    1   2   3

Strategies of the players are the velocities, and Player 1's payoff in the situation (αᵢ, βⱼ) is assumed to be the search efficiency aᵢⱼ = αᵢδᵢⱼ, i = 1, 2, 3, j = 1, 2, 3, where δᵢⱼ is the corresponding element of the matrix D. Then the problem of selecting velocities in a noisy search can be represented by the game with the matrix

          β₁  β₂  β₃
     α₁    4   5   6
A =  α₂    6   8  10
     α₃    3   6   9

1.2 Maximin and minimax strategies


1.2.1. Consider a two-person zero-sum game Γ = (X, Y, K). In this game each of the players seeks to maximize his payoff by choosing a proper strategy. But for Player 1 the payoff is determined by the function K(x, y), while for Player 2 it is determined by −K(x, y), i.e. the players' objectives are directly opposite. Note that the payoff of Player 1 (2) (the payoff function) is defined on the set of situations (x, y) ∈ X × Y. Each situation, and hence each player's payoff, depends not only on his own choice but also on the strategy chosen by his opponent, whose objective is directly opposite. Therefore, seeking to obtain the maximum possible payoff, each player must take the opponent's behavior into account.

The Colonel Blotto game provides a good example of the foregoing. If Player 1 wants to obtain the maximum payoff 4, he must adopt the strategy x₀ (or x₄). In this case, however, if Player 2 uses the strategy y₃ (respectively y₀), then the first player receives the payoff 0, i.e. he loses 4 units. Similar reasonings are applicable to Player 2.

In the theory of games it is supposed that the behavior of both players is rational, i.e. they wish to obtain the maximum payoff, assuming that the opponent is acting in the best (for himself) possible way. What maximal payoff can Player 1 guarantee himself? Suppose Player 1 chooses strategy x. Then, in the worst case, he will win min_y K(x, y). Therefore, Player 1 can always guarantee himself the payoff max_x min_y K(x, y). If the max and min are not attained, Player 1 can guarantee himself a payoff arbitrarily close to the quantity

v̲ = sup_{x∈X} inf_{y∈Y} K(x, y),   (1.2.1)

which is called the lower value of the game. The principle of constructing the strategy x based on the maximization of the minimal payoff is called the maximin principle, and the strategy x selected by this principle is called a maximin strategy of Player 1.

Similar reasonings apply to Player 2. Suppose he chooses strategy y. Then, at worst, he will lose max_x K(x, y). Therefore, the second player can always guarantee himself the payoff min_y max_x K(x, y). The number

v̄ = inf_{y∈Y} sup_{x∈X} K(x, y)   (1.2.2)


is called the upper value of the game Γ. The principle of constructing a strategy y based on the minimization of maximum losses is called the minimax principle, and the strategy y selected by this principle is called a minimax strategy of Player 2. It should be stressed that the existence of a minimax (maximin) strategy is determined by the attainability of the extremum in (1.2.2) ((1.2.1)).

Consider the (m × n) matrix game Γ_A. Here the extrema in (1.2.1) and (1.2.2) are attained, and the lower and upper values of the game are, respectively, equal to

v̲ = max_{1≤i≤m} min_{1≤j≤n} aᵢⱼ,   (1.2.3)

v̄ = min_{1≤j≤n} max_{1≤i≤m} aᵢⱼ.   (1.2.4)

The maximin and minimax for the game Γ_A can be found by the following scheme: border the matrix with the column of row minima min_j a₁ⱼ, ..., min_j aₘⱼ and the row of column maxima max_i aᵢ₁, ..., max_i aᵢₙ; then v̲ = max_i min_j aᵢⱼ is found as the largest of the row minima, and v̄ = min_j max_i aᵢⱼ as the smallest of the column maxima.

Thus, in the game Γ_A with the matrix

        1  0  4
    A = 5  3  8
        6  0  1

the lower value (maximin) v̲ and the maximin strategy i₀ of the first player are v̲ = 3, i₀ = 2, respectively, and the upper value (minimax) v̄ and the minimax strategy j₀ of the second player are v̄ = 3, j₀ = 2, respectively.
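For a matrix game this scheme amounts to a few array operations; a minimal sketch (our illustration, NumPy assumed) for the matrix above:

    import numpy as np

    A = np.array([[1, 0, 4],
                  [5, 3, 8],
                  [6, 0, 1]])

    lower = A.min(axis=1).max()            # lower value: max_i min_j a_ij
    i0 = int(A.min(axis=1).argmax()) + 1   # a maximin row (1-based)
    upper = A.max(axis=0).min()            # upper value: min_j max_i a_ij
    j0 = int(A.max(axis=0).argmin()) + 1   # a minimax column (1-based)
    print(lower, i0, upper, j0)            # 3 2 3 2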
1.2.2. The following assertion holds for any game Γ = (X, Y, K).

Lemma. In the two-person zero-sum game Γ,

v̲ ≤ v̄,   (1.2.5)

i.e.

sup_{x∈X} inf_{y∈Y} K(x, y) ≤ inf_{y∈Y} sup_{x∈X} K(x, y).   (1.2.6)

Proof. Let x ∈ X be an arbitrary strategy of Player 1. Then, for every y ∈ Y, we have

K(x, y) ≤ sup_{x∈X} K(x, y).

Hence we get

inf_{y∈Y} K(x, y) ≤ inf_{y∈Y} sup_{x∈X} K(x, y).

Note that the right-hand side of the latter inequality is a constant, while the strategy x ∈ X has been chosen arbitrarily. Therefore, the following inequality holds:

sup_{x∈X} inf_{y∈Y} K(x, y) ≤ inf_{y∈Y} sup_{x∈X} K(x, y).

1.3 Saddle points


1.3.1. Consider the optimal behavior of players in a two-person zero-sum game. In the game Γ = (X, Y, K) it is natural to consider as optimal a situation (x*, y*) ∈ X × Y from which neither player has an incentive to deviate. Such a point (x*, y*) is called an equilibrium point, and the optimality principle based on constructing an equilibrium point is called the equilibrium principle. For two-person zero-sum games, as will be shown later, the equilibrium principle is equivalent to the principles of minimax and maximin. This, of course, requires the existence of an equilibrium (i.e. that the optimality principle be applicable).

Definition. In the two-person zero-sum game Γ = (X, Y, K) the point (x*, y*) is called an equilibrium point, or a saddle point, if

K(x, y*) ≤ K(x*, y*),   (1.3.1)

K(x*, y) ≥ K(x*, y*)   (1.3.2)

for all x ∈ X and y ∈ Y.

The set of all equilibrium points in the game Γ will be denoted by Z(Γ), Z(Γ) ⊂ X × Y.

In the matrix game Γ_A the equilibrium points are the saddle points of the payoff matrix A, i.e. the points (i*, j*) for which, for all i ∈ M and j ∈ N, the following inequalities are satisfied:

a_{ij*} ≤ a_{i*j*} ≤ a_{i*j}.

The element a_{i*j*} of the matrix at the saddle point is simultaneously the minimum of its row and the maximum of its column. For example, in the game with the matrix

    1  0  4
    5  3  8
    6  0  1

the point (2, 2) is a saddle point (equilibrium).
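A saddle point of a payoff matrix can therefore be found by a direct scan of rows and columns; a small sketch (our illustration, NumPy assumed) returning all pure-strategy saddle points:

    import numpy as np

    def pure_saddle_points(A):
        # (i, j) is a saddle point iff a_ij is minimal in its row and maximal in its column.
        A = np.asarray(A)
        row_min = A.min(axis=1, keepdims=True)   # column vector of row minima
        col_max = A.max(axis=0, keepdims=True)   # row vector of column maxima
        mask = (A == row_min) & (A == col_max)
        return [(i + 1, j + 1) for i, j in zip(*np.where(mask))]

    print(pure_saddle_points([[1, 0, 4], [5, 3, 8], [6, 0, 1]]))   # [(2, 2)]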
1.3.2. The set of saddle points in the two-person zero-sum game Γ has properties which enable one to speak of the optimality of a saddle point and the strategies involved.

Theorem. Let (x₁*, y₁*), (x₂*, y₂*) be two arbitrary saddle points in the two-person zero-sum game Γ. Then:

1. K(x₁*, y₁*) = K(x₂*, y₂*);

2. (x₁*, y₂*) ∈ Z(Γ), (x₂*, y₁*) ∈ Z(Γ).

Proof. From the definition of a saddle point, for all x ∈ X and y ∈ Y we have

K(x, y₁*) ≤ K(x₁*, y₁*) ≤ K(x₁*, y);   (1.3.3)

K(x, y₂*) ≤ K(x₂*, y₂*) ≤ K(x₂*, y).   (1.3.4)

We substitute x₂* into the left-hand side of the inequality (1.3.3) and y₂* into the right-hand side, and x₁* into the left-hand side of the inequality (1.3.4) and y₁* into the right-hand side. Then we get

K(x₂*, y₁*) ≤ K(x₁*, y₁*) ≤ K(x₁*, y₂*) ≤ K(x₂*, y₂*) ≤ K(x₂*, y₁*).

From this it follows that

K(x₁*, y₁*) = K(x₁*, y₂*) = K(x₂*, y₁*) = K(x₂*, y₂*).   (1.3.5)

We now show the validity of the second assertion. Consider the point (x₂*, y₁*). From (1.3.3)-(1.3.5) we then have

K(x, y₁*) ≤ K(x₁*, y₁*) = K(x₂*, y₁*) = K(x₂*, y₂*) ≤ K(x₂*, y)   (1.3.6)

for all x ∈ X, y ∈ Y, i.e. (x₂*, y₁*) ∈ Z(Γ). The inclusion (x₁*, y₂*) ∈ Z(Γ) can be proved in much the same way.

From the theorem it follows that the payoff function takes the same value at all saddle points. Therefore, it is meaningful to introduce the following definition.
Definition. Let (x*, y*) be a saddle point in the game Γ. Then the number

v = K(x*, y*)   (1.3.7)

is called the value of the game Γ.

The second assertion of the theorem implies, in particular, the following fact. Denote by X* and Y* the projections of the set Z(Γ) onto X and Y, respectively, i.e.

X* = {x* | x* ∈ X, ∃y* ∈ Y, (x*, y*) ∈ Z(Γ)},

Y* = {y* | y* ∈ Y, ∃x* ∈ X, (x*, y*) ∈ Z(Γ)}.

The set Z(Γ) may then be represented as the Cartesian product

Z(Γ) = X* × Y*.   (1.3.8)

The proof of (1.3.8) is a consequence of the second assertion of the theorem and is left to the reader.

Definition. The set X* (Y*) is called the set of optimal strategies of Player 1 (2) in the game Γ, and its elements are called optimal strategies of Player 1 (2).

Note that from (1.3.5) it follows that any pair of optimal strategies forms a saddle point, and the corresponding payoff is the value of the game.
1.3.3. Optimality of the players' behavior remains unaffected if the strategy sets in the game remain the same and the payoff function is multiplied by a positive constant, or a constant number is added to it.

Lemma (on Scale). Let Γ = (X, Y, K) and Γ′ = (X, Y, K′) be two zero-sum games with

K′ = βK + α,  β > 0,  α = const,  β = const.   (1.3.9)

Then

Z(Γ′) = Z(Γ),  v_{Γ′} = βv_Γ + α.   (1.3.10)

Proof. Let (x*, y*) be a saddle point in the game Γ. Then we have

K′(x*, y*) = βK(x*, y*) + α ≤ βK(x*, y) + α = K′(x*, y),

K′(x, y*) = βK(x, y*) + α ≤ βK(x*, y*) + α = K′(x*, y*)

for all x ∈ X and y ∈ Y. Therefore (x*, y*) ∈ Z(Γ′), i.e. Z(Γ) ⊂ Z(Γ′).

Conversely, let (x̄, ȳ) ∈ Z(Γ′). Then

K(x̄, ȳ) = (1/β)K′(x̄, ȳ) − α/β,

and, by similar reasoning, we have that (x̄, ȳ) ∈ Z(Γ). Therefore Z(Γ) = Z(Γ′), and we have

v_{Γ′} = K′(x*, y*) = βK(x*, y*) + α = βv_Γ + α.

Conceptually, this lemma states the strategic equivalence of two games differing only in the origin and scale of payoff measurement.
1.3.4. We shall now establish a link between the principle of equilibrium and the principles of minimax and maximin in a two-person zero-sum game.

Theorem. For a saddle point to exist in the game Γ = (X, Y, K), it is necessary and sufficient that the quantities

min_y sup_x K(x, y),   max_x inf_y K(x, y)   (1.3.11)

exist and the following equality holds:

v̲ = max_x inf_y K(x, y) = min_y sup_x K(x, y) = v̄.   (1.3.12)

Proof. Necessity. Let (x*, y*) ∈ Z(Γ). Then for all x ∈ X and y ∈ Y the following inequality holds:

K(x, y*) ≤ K(x*, y*) ≤ K(x*, y),   (1.3.13)

and hence

sup_x K(x, y*) ≤ K(x*, y*).   (1.3.14)

At the same time, we have

inf_y sup_x K(x, y) ≤ sup_x K(x, y*).   (1.3.15)

Comparing (1.3.14) and (1.3.15), we get

inf_y sup_x K(x, y) ≤ sup_x K(x, y*) ≤ K(x*, y*).   (1.3.16)

In a similar way we get the inequality

K(x*, y*) ≤ inf_y K(x*, y) ≤ sup_x inf_y K(x, y).   (1.3.17)

On the other hand, the inverse inequality (1.2.6) always holds. Thus, we get

sup_x inf_y K(x, y) = inf_y sup_x K(x, y),   (1.3.18)

and finally

min_y sup_x K(x, y) = sup_x K(x, y*) = K(x*, y*),

max_x inf_y K(x, y) = inf_y K(x*, y) = K(x*, y*),

i.e. the exterior extrema in the min sup and max inf are attained at the points y* and x*, respectively.

Sufficiency. Suppose the min sup and max inf exist, i.e.

max_x inf_y K(x, y) = inf_y K(x*, y),   (1.3.19)

min_y sup_x K(x, y) = sup_x K(x, y*),   (1.3.20)

and the equality (1.3.12) holds. We shall show that (x*, y*) is a saddle point. Indeed,

K(x*, y*) ≥ inf_y K(x*, y) = max_x inf_y K(x, y);   (1.3.21)

K(x*, y*) ≤ sup_x K(x, y*) = min_y sup_x K(x, y).   (1.3.22)

By (1.3.12) the min sup is equal to the max inf, and from (1.3.21), (1.3.22) it follows that each of them is also equal to K(x*, y*), i.e. the inequalities in (1.3.21), (1.3.22) are satisfied as equalities. Now we have

K(x*, y*) = inf_y K(x*, y) ≤ K(x*, y),

K(x*, y*) = sup_x K(x, y*) ≥ K(x, y*)

for all x ∈ X and y ∈ Y, i.e. (x*, y*) ∈ Z(Γ).

The proof shows that the common value of the min sup and max inf is equal to K(x*, y*) = v, the value of the game, and any min sup (max inf) strategy y* (x*) is optimal in the sense of the theorem, i.e. the point (x*, y*) is a saddle point.

The proof of the theorem yields the following assertion.

Corollary 1. If the min sup and max inf in (1.3.11) exist, are attained at ȳ and x̄, respectively, and the equality (1.3.12) holds, then (x̄, ȳ) is a saddle point and

max_x inf_y K(x, y) = K(x̄, ȳ) = min_y sup_x K(x, y).   (1.3.23)


The games in which saddle points exist are called strictly determined. Therefore, this theorem establishes a criterion for the strict determination of a game and can be restated as follows: for the game to be strictly determined, it is necessary and sufficient that the min sup and max inf in (1.3.11) exist and the equality (1.3.12) be satisfied.

Note that in the game Γ_A the extrema in (1.3.11) are always attained, and the theorem may be reformulated in the following form.

Corollary 2. For the (m × n) matrix game to be strictly determined, it is necessary and sufficient that

min_{j=1,...,n} max_{i=1,...,m} aᵢⱼ = max_{i=1,...,m} min_{j=1,...,n} aᵢⱼ.   (1.3.24)

For example, in the game with the matrix

    1   4   1
    2   3   4
    0  −2   7

the point (2, 1) is a saddle point. In this case

max_i min_j aᵢⱼ = min_j max_i aᵢⱼ = 2.

On the other hand, the game with the matrix

    1  0
    0  1

does not have a saddle point, since

min_j max_i aᵢⱼ = 1 > max_i min_j aᵢⱼ = 0.

Note that the games formulated in Examples 1-3 are not strictly determined, whereas the game in Example 6 is strictly determined and its value is v = 6.

1.4 Mixed extension of a game


1.4.1. Consider the matrix game Γ_A. If the game has a saddle point, then the minimax is equal to the maximin, and each of the players can, by the definition of the saddle point, inform the opponent of his optimal (maximin, minimax) strategy without either player being able to obtain extra benefit from this. Now assume that the game Γ_A has no saddle point. Then, by Theorem 1.3.4 and Lemma 1.2.2, we have

min_j max_i aᵢⱼ − max_i min_j aᵢⱼ > 0.   (1.4.1)

In this case the maximin and minimax strategies are not optimal. Moreover, it is not advantageous for the players to hold to such strategies, since a player can obtain a larger payoff. Information about the chosen strategy supplied to the opponent, however, may cause greater losses than in the case of the maximin or minimax strategy.

Indeed, let the matrix A be of the form

    A = 7  3
        2  5
For this matrix min_j max_i aᵢⱼ = 5, max_i min_j aᵢⱼ = 3, i.e. a saddle point does not exist. Denote by i* the maximin strategy of Player 1 (i* = 1) and by j* the minimax strategy of Player 2 (j* = 2). Suppose Player 2 adopts strategy j* = 2 and Player 1 chooses strategy i = 2. Then the latter receives the payoff 5, i.e. 2 units more than the maximin. If, however, Player 2 guesses the choice of Player 1, he alters his strategy to j = 1, and then Player 1 receives a payoff of 2 units only, i.e. 1 unit less than in the case of the maximin. Similar reasonings apply to the second player.

How can the information about the chosen strategy be kept secret from the opponent? To answer this question, it may be wise to choose the strategy with the help of some random device. In this case the opponent cannot learn the player's particular strategy in advance, since the player himself does not know it until the strategy is actually chosen at random.
1.4.2. Definition. A random variable whose values are strategies of a player is called a mixed strategy of the player.

Thus, for the matrix game Γ_A, a mixed strategy of Player 1 is a random variable whose values are the row numbers i ∈ M, M = {1, 2, ..., m}. A similar definition applies to Player 2's mixed strategy, whose values are the column numbers j ∈ N of the matrix A.

Considering the above definition of mixed strategies, the former strategies will be referred to as pure strategies. Since a random variable is characterized by its distribution, the mixed strategy will be identified in what follows with the probability distribution over the set of pure strategies. Thus, Player 1's mixed strategy x in the game is the m-dimensional vector

x = (ξ₁, ..., ξₘ),  Σᵢ₌₁ᵐ ξᵢ = 1,  ξᵢ ≥ 0,  i = 1, ..., m.   (1.4.2)

Similarly, Player 2's mixed strategy y is the n-dimensional vector

y = (η₁, ..., ηₙ),  Σⱼ₌₁ⁿ ηⱼ = 1,  ηⱼ ≥ 0,  j = 1, ..., n.   (1.4.3)

In this case, ξᵢ ≥ 0 and ηⱼ ≥ 0 are the probabilities of choosing the pure strategies i ∈ M and j ∈ N, respectively, when the players use the mixed strategies x and y.

Denote by X and Y the sets of mixed strategies of the first and second players, respectively. It can easily be seen that the set of mixed strategies of each player is a compact set in the corresponding finite-dimensional Euclidean space (a closed, bounded set).

Definition. Let x = (ξ₁, ..., ξₘ) ∈ X be a mixed strategy of Player 1. The set of indices

Mₓ = {i | i ∈ M, ξᵢ > 0},   (1.4.4)

where M = {1, 2, ..., m}, is called the spectrum of the strategy x.

Similarly, for the mixed strategy y = (η₁, ..., ηₙ) ∈ Y of Player 2 the spectrum N_y is determined as follows:

N_y = {j | j ∈ N, ηⱼ > 0},   (1.4.5)
where N = {1, 2, ..., n}. Thus, the spectrum of a mixed strategy is composed of those pure strategies that are chosen with positive probabilities.

For any mixed strategy x the spectrum Mₓ ≠ ∅, since the vector x has nonnegative components whose sum is equal to 1 (see (1.4.2)).

Consider a mixed strategy uᵢ = (ξ₁, ..., ξₘ) ∈ X, where ξᵢ = 1, ξⱼ = 0, j ≠ i, j = 1, 2, ..., m. Such a strategy prescribes the selection of the i-th row of the matrix A with probability 1. It would appear natural to identify the mixed strategy uᵢ ∈ X with the choice of the i-th row, i.e. with the pure strategy i ∈ M of Player 1. In a similar manner, we shall identify the mixed strategy wⱼ = (η₁, ..., ηⱼ, ..., ηₙ) ∈ Y, where ηⱼ = 1, ηᵢ = 0, i ≠ j, i = 1, ..., n, with the pure strategy j ∈ N of Player 2. Thus we have that a player's set of mixed strategies is an extension of his set of pure strategies.

Definition. The pair (x, y) of mixed strategies in the matrix game Γ_A is called a situation in mixed strategies.

We shall define the payoff of Player 1 in the situation (x, y) in mixed strategies for the (m × n) matrix game Γ_A as the mathematical expectation of his payoff provided that the players use the mixed strategies x and y, respectively. The players choose their strategies independently; therefore the mathematical expectation of the payoff K(x, y) in the mixed strategies x = (ξ₁, ..., ξₘ), y = (η₁, ..., ηₙ) is equal to

K(x, y) = Σᵢ₌₁ᵐ Σⱼ₌₁ⁿ aᵢⱼ ξᵢ ηⱼ = (xA)y = x(Ay).   (1.4.6)

The function K(x, y) is continuous in x ∈ X and y ∈ Y. Notice that when one player uses a pure strategy (i or j, respectively) and the other uses a mixed strategy (y or x), the payoffs K(i, y), K(x, j) are computed by the formulas

K(i, y) = K(uᵢ, y) = Σⱼ₌₁ⁿ aᵢⱼ ηⱼ = aᵢ y,  i = 1, ..., m,

K(x, j) = K(x, wⱼ) = Σᵢ₌₁ᵐ aᵢⱼ ξᵢ = x aʲ,  j = 1, ..., n,

where aᵢ and aʲ are, respectively, the i-th row and the j-th column of the (m × n) matrix A.
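In matrix form, (1.4.6) says that the expected payoff is the bilinear form xAy, while the vectors Ay and xA collect the payoffs K(i, y) and K(x, j) against pure strategies. A quick numerical illustration (ours, with arbitrarily chosen mixed strategies and the 2 × 2 matrix from 1.4.1):

    import numpy as np

    A = np.array([[7, 3],
                  [2, 5]])

    x = np.array([0.5, 0.5])     # a mixed strategy of Player 1
    y = np.array([0.25, 0.75])   # a mixed strategy of Player 2

    print(x @ A @ y)   # K(x, y) = 4.125
    print(A @ y)       # [4.   4.25] = (K(1, y), K(2, y))
    print(x @ A)       # [4.5  4.  ] = (K(x, 1), K(x, 2))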
Thus, from the matrix game Γ_A = (M, N, A) we have arrived at a new game Γ̄_A = (X, Y, K), where X and Y are the sets of mixed strategies in the game Γ_A and K is the payoff function in mixed strategies (the mathematical expectation of the payoff). The game Γ̄_A will be called the mixed extension of the game Γ_A. The game Γ_A is a subgame of Γ̄_A, i.e. Γ_A ⊂ Γ̄_A.

1.4.3. Definition. The point (x*, y*) in the game Γ̄_A forms a saddle point, and the number v = K(x*, y*) is the value of the game Γ̄_A, if for all x ∈ X and y ∈ Y

K(x, y*) ≤ K(x*, y*) ≤ K(x*, y).   (1.4.7)

The strategies (x*, y*) forming the saddle point are called optimal. Moreover, by Theorem 1.3.4, the strategies x* and y* are respectively the maximin and minimax strategies, since the exterior extrema in (1.3.11) are attained (the function K(x, y) is continuous on the compact sets X and Y).

Lemma 1.3.3 shows that two games differing in the payoff reference point and the scale of payoff measurement (Lemma on Scale) are strategically equivalent. It turns out that if two matrix games Γ_A and Γ_{A′} are subject to this lemma, their mixed extensions are strategically equivalent as well. This fact is formally established by the following lemma.

Lemma. Let Γ_A and Γ_{A′} be two (m × n) matrix games, where

A′ = αA + B,  α > 0,  α = const,

and B is a matrix with identical elements β, i.e. βᵢⱼ = β for all i and j. Then Z(Γ̄_{A′}) = Z(Γ̄_A), v_{A′} = αv_A + β, where Γ̄_{A′} and Γ̄_A are the mixed extensions of the games Γ_{A′} and Γ_A, respectively, and v_{A′}, v_A are the values of the games Γ̄_{A′} and Γ̄_A.

Proof. Both matrices A and A′ are of dimension m × n; therefore the sets of mixed strategies in the games Γ̄_{A′} and Γ̄_A coincide. We shall show that for any situation in mixed strategies (x, y) the following equality holds:

K′(x, y) = αK(x, y) + β,   (1.4.8)

where K′ and K are Player 1's payoffs in the games Γ̄_{A′} and Γ̄_A, respectively. Indeed, for all x ∈ X and y ∈ Y we have

K′(x, y) = xA′y = α(xAy) + xBy = αK(x, y) + β.

From the Scale Lemma it then follows that Z(Γ̄_{A′}) = Z(Γ̄_A), v_{A′} = αv_A + β.

Example 7. Let us verify that the strategies x* = (1/2, 1/4, 1/4), y* = (1/2, 1/4, 1/4) are optimal and v_A = 0 is the value of the game Γ_A with the matrix

         1  −1  −1
    A = −1  −1   3
        −1   3  −1

We shall simplify the matrix A (to obtain the maximum number of zeros). Adding unity to all elements of the matrix A, we get the matrix

         2  0  0
    A′ = 0  0  4
         0  4  0

Dividing each element of the matrix A′ by 2, we obtain the matrix

         1  0  0
    A″ = 0  0  2
         0  2  0

By the lemma we have t u " = | x ' = %{VA + 1)- Verify that the value of the game
Tx is equal to 1/2. Indeed, K(xm,y*) = Ay' = 1/2. On the other hand, for each
strategy y Y,y = (r}un2,i)3) we have K{x',y) = |IJ, + \n2 + |J? 3 = \ 1 = , and
for all x = ( 6 , 6 , 6 ) , i X , ff(!,) = | 6 + *& + & = \. Consequently, the
above-mentioned strategies x',y' are optimal and VA ~ 0.
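These optimality conditions are easy to confirm numerically; a sketch (our illustration, NumPy assumed) checking x*A and Ay* for the original matrix A:

    import numpy as np

    A = np.array([[ 1, -1, -1],
                  [-1, -1,  3],
                  [-1,  3, -1]])
    x_star = np.array([0.5, 0.25, 0.25])
    y_star = np.array([0.5, 0.25, 0.25])

    print(x_star @ A)            # [0. 0. 0.]: K(x*, j) >= v_A = 0 for every column j
    print(A @ y_star)            # [0. 0. 0.]: K(i, y*) <= v_A = 0 for every row i
    print(x_star @ A @ y_star)   # 0.0, the value of the game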
In what follows, whenever the matrix game Γ_A is mentioned, we shall mean its mixed extension Γ̄_A.

1.5 Convex sets and systems of linear inequalities

This section is auxiliary in nature and can be omitted by the reader with no loss of continuity. To understand the proofs of the subsequent assertions, however, it may be useful to recall certain widely accepted definitions and theorems. Most of the theorems are given without proofs, and special references are provided where needed.
1.5.1. The set M ⊂ Rᵐ is called convex if, for any two points x₁, x₂ ∈ M, all the points of the interval λx₁ + (1 − λ)x₂, 0 ≤ λ ≤ 1, are contained in M. The notion of a convex set can be formulated in a more general, but equivalent, form. The set M ⊂ Rᵐ is convex if, together with points x₁, ..., x_k from M, it also contains all points of the form

x = Σᵢ₌₁ᵏ λᵢ xᵢ,  λᵢ ≥ 0,  Σᵢ₌₁ᵏ λᵢ = 1,

referred to as convex linear combinations of the points x₁, ..., x_k.

An intersection of convex sets is a convex set.

Consider a system of linear inequalities

xA ≤ b,

or

x aʲ ≤ βⱼ,  j ∈ N,  N = {1, ..., n},   (1.5.1)

where A = [aʲ, j ∈ N] is an (m × n) matrix and b = (β₁, ..., βₙ) ∈ Rⁿ.

Denote the set of solutions of (1.5.1) by X = {x | xA ≤ b}. From the definition it immediately follows that X is a convex set. The set X is called a convex polyhedral set given by the system of constraints (1.5.1).

1.5.2. A point x ∈ M, where M is a convex set, is called an extreme point if from the condition x = λx₁ + (1 − λ)x₂, x₁ ∈ M, x₂ ∈ M and 0 < λ < 1 it follows that x₁ = x₂ = x. Conceptually, the definition implies that x ∈ M is an extreme point if there is no line segment with two end points in M for which x is an interior point.

Notice that an extreme point of a convex set is always a boundary point, while the converse is not true.
Let X be a convex polyhedral set given by the system of constraints (1.5.1). Then the following assertions are true.

Theorem. [Ashmanov (1981)]. The set X has extreme points if and only if rank A = rank[aʲ, j ∈ N] = m.

Theorem. [Ashmanov (1981)]. For the point x₀ ∈ X to be extreme, it is necessary and sufficient that this point be a solution of the system

x₀ aʲ = βⱼ,  j ∈ N₁,   (1.5.2)

x₀ aʲ ≤ βⱼ,  j ∈ N \ N₁,   (1.5.3)

where N₁ ⊂ N, rank[aʲ, j ∈ N₁] = m.

The latter theorem yields an algorithm for finding the extreme points of the set X. To this end, we need to consider column bases of the matrix A, solve the system of linear equations (1.5.2), and verify that the inequalities (1.5.3) hold. However, this method of searching for the extreme points of a polyhedral set is of little practical significance, since its application involves a complete enumeration of all possible column bases of the matrix A.
1.5.3. The intersection of all convex sets containing P is called the convex hull of the set P and is denoted conv(P). This definition is equivalent to the following statement: the convex hull of the set P is composed of all convex linear combinations of all possible finite systems of points from P, i.e.

conv(P) = {x | x = Σᵢ₌₁ᵏ λᵢ xᵢ, Σᵢ₌₁ᵏ λᵢ = 1, λᵢ ≥ 0, xᵢ ∈ P}.

The convex hull of a finite number of points is called a convex polyhedron generated by these points. A convex polyhedron is generated by its extreme points. Thus, if we consider the set X of Player 1's mixed strategies in the (m × n) game, then X = conv{u₁, ..., uₘ}, where uᵢ = (0, ..., 0, 1, 0, ..., 0) are the unit vectors of the space Rᵐ of pure strategies of Player 1. The set X is a convex polyhedron of dimension (m − 1) and is also called the (m − 1)-dimensional simplex (or the fundamental simplex). In this case, all vectors uᵢ (pure strategies) are extreme points of the polyhedron X. Similar statements apply to Player 2's set Y of mixed strategies.

The set C is called a cone if x ∈ C and λ > 0 imply λx ∈ C. Conceptually, the cone C ⊂ Rᵐ contains, together with the point x, the entire half-line

(x) = {y | y = λx, λ > 0}.

The cone C is called convex if the following condition is satisfied: x + y ∈ C for all x, y ∈ C. In other words, the cone C is convex if it is closed with respect to addition. Another equivalent definition may also be given: a cone is called convex if it is a convex set. The sum of convex cones C₁ + C₂ = {c | c = c₁ + c₂, c₁ ∈ C₁, c₂ ∈ C₂} and their intersection C₁ ∩ C₂ are also convex cones.
Immediate verification of the definition shows that the set C = {x | xA ≤ 0} of solutions to the homogeneous system of linear inequalities corresponding to (1.5.1) is a convex cone.

Let X be a convex polyhedral set given in the equivalent form

Σᵢ₌₁ᵐ ξᵢ aᵢ ≤ b,   (1.5.4)

where x = (ξ₁, ..., ξₘ) ∈ Rᵐ and aᵢ is the i-th row of the matrix A, i = 1, ..., m. Now suppose that rank A = r, r ≤ m, and the vectors a₁, ..., a_r form a row basis of the matrix A. Decompose the remaining rows with respect to the basis:

aⱼ = Σᵢ₌₁ʳ δⱼᵢ aᵢ,  j = r + 1, ..., m.   (1.5.5)

Substituting (1.5.5) into (1.5.4), we obtain the following system of inequalities (equivalent to (1.5.4)):

Σᵢ₌₁ʳ (ξᵢ + Σⱼ₌ᵣ₊₁ᵐ ξⱼ δⱼᵢ) aᵢ ≤ b.   (1.5.6)

Denote by X₀ the set of vectors x = (ξ₁, ..., ξₘ) ∈ Rᵐ satisfying the inequalities (1.5.6) and the condition ξⱼ = 0, j = r + 1, ..., m. By the Theorem in 1.5.2, the set X₀ has extreme points. The following theorem holds [Ashmanov (1981)].

Theorem on representation of a polyhedral set. Let X be the polyhedral set given by the system of constraints (1.5.4). Then

X = M + C,

where M + C = {x | x = y + z, y ∈ M, z ∈ C}, M is the convex polyhedron generated by the extreme points of the polyhedral set X₀ given by (1.5.6), and C = {x | xA ≤ 0} is a convex cone.

This theorem implies, in particular, that if the solution set X of (1.5.4) is bounded, then X is a convex polyhedron.
1.5.4. Recall that the problem of finding min_x cx under the constraints

xA ≥ b,  x ≥ 0,   (1.5.7)

where A is an (m × n) matrix, c ∈ Rᵐ, x ∈ Rᵐ, b ∈ Rⁿ, is called a direct problem of linear programming in standard form, and the problem of determining max_y by under the constraints

Ay ≤ c,  y ≥ 0,   (1.5.8)

where y ∈ Rⁿ, is called the dual linear programming problem for (1.5.7).

A vector x ∈ Rᵐ satisfying the system (1.5.7) is a feasible solution of problem (1.5.7). The notion of a feasible solution y ∈ Rⁿ of problem (1.5.8) is introduced in a similar manner. A feasible solution x̄ (ȳ) is an optimal solution of problem (1.5.7) [(1.5.8)] if cx̄ = min_x cx (bȳ = max_y by), the minimum (maximum) of the function cx (by) being taken over the set of all feasible solutions.

The following assertion is true [Ashmanov (1981)].

Duality Theorem. If both problems (1.5.7), (1.5.8) have feasible solutions, then both of them have optimal solutions x̄, ȳ, respectively, and

cx̄ = bȳ.

1.5.5. In closing this section, we give one property of convex functions. First recall that a function φ : M → R¹, where M ⊂ Rᵐ is a convex set, is convex if

φ(λx₁ + (1 − λ)x₂) ≤ λφ(x₁) + (1 − λ)φ(x₂)   (1.5.9)

for any x₁, x₂ ∈ M and λ ∈ [0, 1]. If the reverse inequality holds in (1.5.9), then the function φ is called concave.

Let φᵢ(x), i = 1, ..., n, be a family of functions convex on M. Then the upper envelope ψ(x) of this family of functions,

ψ(x) = max_{i=1,...,n} φᵢ(x),   (1.5.10)

is convex on M.

Indeed, by the definition of a convex function, for x₁, x₂ ∈ M and α ∈ [0, 1] we have

φᵢ(αx₁ + (1 − α)x₂) ≤ αφᵢ(x₁) + (1 − α)φᵢ(x₂) ≤ α maxᵢ φᵢ(x₁) + (1 − α) maxᵢ φᵢ(x₂).

Hence we get

ψ(αx₁ + (1 − α)x₂) = maxᵢ φᵢ(αx₁ + (1 − α)x₂) ≤ αψ(x₁) + (1 − α)ψ(x₂),

which is what we set out to prove.

In a similar manner, we may show that the lower envelope (in (1.5.10) the minimum is taken over i) of a family of concave functions is concave.

1.6 Existence of a solution of the matrix game in mixed strategies

We shall prove that an arbitrary matrix game is strictly determined in mixed strategies.
1.6.1. Theorem. [von Neumann (1928)]. Any matrix game has a saddle point in mixed strategies.

Proof. Let Γ_A be an arbitrary (m × n) game with a strictly positive matrix A = {aᵢⱼ}, i.e. aᵢⱼ > 0 for all i = 1, ..., m and j = 1, ..., n. We first show that in this case the theorem is true. To do this, consider the auxiliary linear programming problem

min xu,  xA ≥ w,  x ≥ 0,   (1.6.1)

and its dual problem

max yw,  Ay ≤ u,  y ≥ 0,   (1.6.2)

where u = (1, ..., 1) ∈ Rᵐ, w = (1, ..., 1) ∈ Rⁿ. From the strict positivity of the matrix A it follows that there exists a vector x > 0 for which xA > w, i.e. problem (1.6.1) has a feasible solution. On the other hand, the vector y = 0 is a feasible solution of problem (1.6.2). Therefore, by the duality theorem of linear programming (see 1.5.4), both problems (1.6.1) and (1.6.2) have optimal solutions x̄, ȳ, respectively, and

x̄u = ȳw = Θ > 0.   (1.6.3)

Consider the vectors x* = x̄/Θ and y* = ȳ/Θ and show that they are optimal strategies of players 1 and 2 in the game Γ̄_A, respectively, and that the value of the game is equal to 1/Θ.

Indeed, from (1.6.3) we have

x*u = (x̄u)/Θ = (ȳw)/Θ = y*w = 1,

and from the feasibility of x̄ and ȳ for problems (1.6.1), (1.6.2) it follows that x* = x̄/Θ ≥ 0 and y* = ȳ/Θ ≥ 0, i.e. x* and y* are mixed strategies of players 1 and 2 in the game Γ̄_A.

Let us compute the payoff to Player 1 at (x*, y*):

K(x*, y*) = x*Ay* = (x̄Aȳ)/Θ².   (1.6.4)

On the other hand, from the feasibility of the vectors x̄ and ȳ for problems (1.6.1), (1.6.2) and the equality (1.6.3), we have

Θ = wȳ ≤ (x̄A)ȳ = x̄(Aȳ) ≤ x̄u = Θ.   (1.6.5)

Thus x̄Aȳ = Θ, and (1.6.4) implies that

K(x*, y*) = 1/Θ.   (1.6.6)

Let x ∈ X and y ∈ Y be arbitrary mixed strategies of players 1 and 2. The following inequalities hold:

K(x*, y) = (x*A)y = (x̄A)y/Θ ≥ (wy)/Θ = 1/Θ,   (1.6.7)

K(x, y*) = x(Ay*) = x(Aȳ)/Θ ≤ (xu)/Θ = 1/Θ.   (1.6.8)

Comparing (1.6.6)-(1.6.8), we see that (x*, y*) is a saddle point and 1/Θ is the value of the game Γ_A with a strictly positive matrix A.

Now consider the (m × n) game Γ_{A′} with an arbitrary matrix A′ = {a′ᵢⱼ}. Then there exists a constant β > 0 such that the matrix A = A′ + B is strictly positive, where B = {βᵢⱼ} is the (m × n) matrix with βᵢⱼ = β, i = 1, ..., m, j = 1, ..., n. In the game Γ_A there exists a saddle point (x*, y*) in mixed strategies, and the value of the game equals v_A = 1/Θ, where Θ is determined as in (1.6.3).

From Lemma 1.4.3 it follows that (x*, y*) ∈ Z(Γ̄_{A′}) is a saddle point of the game Γ_{A′} in mixed strategies, and the value of the game is equal to v_{A′} = v_A − β = 1/Θ − β. This completes the proof of the theorem.
Informally, the existence of a solution in the class of mixed strategies implies that, by randomizing over the set of pure strategies, the players can always eliminate the uncertainty in the choice of strategy that they faced before the start of the game. Note that a solution in mixed strategies does not necessarily exist in every zero-sum game. Examples of such games with an infinite number of strategies are given in Secs. 2.3, 2.4.

Notice that the proof of the theorem is constructive: the solution of the matrix game is reduced to a linear programming problem, and the solution algorithm for the game Γ_{A′} is as follows.

1. By employing the matrix A′, construct a strictly positive matrix A = A′ + B, where B = {βᵢⱼ}, βᵢⱼ = β > 0.

2. Solve the linear programming problems (1.6.1), (1.6.2). Find the vectors x̄, ȳ and the number Θ [see (1.6.3)].

3. Construct the optimal strategies of players 1 and 2, respectively: x* = x̄/Θ, y* = ȳ/Θ.

4. Compute the value of the game Γ_{A′}: v_{A′} = 1/Θ − β.
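A minimal sketch of this algorithm (our illustration; SciPy assumed — scipy.optimize.linprog minimizes c·x subject to A_ub x ≤ b_ub, with x ≥ 0 by default):

    import numpy as np
    from scipy.optimize import linprog

    def solve_matrix_game(A):
        A = np.asarray(A, dtype=float)
        beta = max(0.0, -A.min()) + 1.0        # step 1: A' = A + beta is strictly positive
        Ap = A + beta
        m, n = Ap.shape
        # Step 2, direct problem (1.6.1): min x.u  s.t.  x Ap >= w, x >= 0.
        res_x = linprog(c=np.ones(m), A_ub=-Ap.T, b_ub=-np.ones(n))
        # Dual problem (1.6.2): max y.w  s.t.  Ap y <= u, y >= 0.
        res_y = linprog(c=-np.ones(n), A_ub=Ap, b_ub=np.ones(m))
        theta = res_x.fun                      # theta = x.u = y.w, as in (1.6.3)
        x_star = res_x.x / theta               # step 3
        y_star = res_y.x / theta
        value = 1.0 / theta - beta             # step 4
        return x_star, y_star, value

    x, y, v = solve_matrix_game([[1, -1, -1], [-1, -1, 3], [-1, 3, -1]])
    print(np.round(x, 3), np.round(y, 3), round(v, 6))
    # approximately [0.5 0.25 0.25] [0.5 0.25 0.25] 0.0 (cf. Example 7)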

Example 8. Consider the matrix game Γ_A determined by the matrix

    A = 4  0
        2  3

The associated linear programming problems are of the form

    min ξ₁ + ξ₂,          max η₁ + η₂,
    4ξ₁ + 2ξ₂ ≥ 1,        4η₁ ≤ 1,
    3ξ₂ ≥ 1,              2η₁ + 3η₂ ≤ 1,
    ξ₁ ≥ 0, ξ₂ ≥ 0,       η₁ ≥ 0, η₂ ≥ 0.

Note that these problems may be written in the equivalent form with constraints in the form of equalities:

    min ξ₁ + ξ₂,            max η₁ + η₂,
    4ξ₁ + 2ξ₂ − ξ₃ = 1,     4η₁ + η₃ = 1,
    3ξ₂ − ξ₄ = 1,           2η₁ + 3η₂ + η₄ = 1,
    ξᵢ ≥ 0, i = 1, ..., 4,  ηⱼ ≥ 0, j = 1, ..., 4.
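Under the same assumptions, the hypothetical solve_matrix_game helper sketched above handles this example directly; the answer can be verified against the equalizing equations x*A = (v_A, v_A) and Ay* = (v_A, v_A):

    x, y, v = solve_matrix_game([[4, 0], [2, 3]])
    print(x, y, v)   # approximately [0.2 0.8], [0.6 0.4] and v_A = 2.4 = 12/5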

Thus, any method of solving linear programming problems can be used to solve matrix games. The simplex method is most commonly used to solve such problems. Its systematic discussion may be found in Ashmanov (1981), Gale (1960), Hu (1970).
1.6.2. In a sense, the linear programming problem is equivalent to the matrix game Γ_A. Indeed, consider the following direct and dual problems of linear programming:

min xu,  xA ≥ w,  x ≥ 0,   (1.6.9)

max yw,  Ay ≤ u,  y ≥ 0.   (1.6.10)

Let X̄ and Ȳ be the sets of optimal solutions of the problems (1.6.9) and (1.6.10), respectively. Denote (1/Θ)X̄ = {x̄/Θ | x̄ ∈ X̄}, (1/Θ)Ȳ = {ȳ/Θ | ȳ ∈ Ȳ}, Θ > 0.

Theorem. Let Γ_A be the (m × n) game with a positive matrix A (all elements are positive), and let the two dual linear programming problems (1.6.9) and (1.6.10) be given. Then the following assertions hold.

1. Both linear programming problems have solutions (X̄ ≠ ∅ and Ȳ ≠ ∅), in which case

Θ = min xu = max yw.

2. The value v_A of the game Γ_A is

v_A = 1/Θ,

and the strategies

x* = x̄/Θ,  y* = ȳ/Θ

are optimal, where x̄ ∈ X̄ is an optimal solution of the direct problem (1.6.9) and ȳ ∈ Ȳ is an optimal solution of the dual problem (1.6.10).

3. All optimal strategies x* ∈ X* and y* ∈ Y* of the players can be constructed as shown above, i.e.

X* = (1/Θ)X̄,  Y* = (1/Θ)Ȳ.

Proof. Assertions 1, 2 and the inclusions (1/Θ)X̄ ⊂ X*, (1/Θ)Ȳ ⊂ Y* follow immediately from the proof of Theorem 1.6.1.

We show the inverse inclusion. To do this, consider the vectors x* = (ξ₁*, ..., ξₘ*) ∈ X* and x̄ = (ξ̄₁, ..., ξ̄ₘ), where x̄ = Θx*. Then for all j ∈ N we have

x̄aʲ = Θ x*aʲ ≥ Θ(1/Θ) = 1,

in which case x̄ ≥ 0, since Θ > 0 and x* ≥ 0. Therefore x̄ is a feasible solution of problem (1.6.9).
Let us compute the value of the objective function:

x̄u = Θ x*u = Θ = min xu,

i.e. x̄ ∈ X̄ is an optimal solution of problem (1.6.9), and hence X* ⊂ (1/Θ)X̄.

The inclusion Y* ⊂ (1/Θ)Ȳ can be proved in a similar manner. This completes the proof of the theorem.

1.7 Properties of optimal strategies and value of the game
Consider some properties of optimal strategies which, in a number of cases, assist in finding the value of the game and a saddle point.

1.7.1. Let (x*, y*) ∈ X × Y be a saddle point in mixed strategies for the game Γ_A. It turns out that, to test the point (x*, y*) for equilibrium, it suffices to test the inequalities (1.4.7) only for i ∈ M and j ∈ N, and not for all x ∈ X and y ∈ Y, since the following assertion is true.

Theorem. For the situation (x*, y*) to be an equilibrium (saddle point) in the game Γ_A, and for the number v = K(x*, y*) to be the value of the game, it is necessary and sufficient that the following inequalities hold for all i ∈ M and j ∈ N:

K(i, y*) ≤ K(x*, y*) ≤ K(x*, j).   (1.7.1)

Proof. Necessity. Let (x*, y*) be a saddle point in the game Γ_A. Then

K(x, y*) ≤ K(x*, y*) ≤ K(x*, y)

for all x ∈ X, y ∈ Y. Hence, in particular, for uᵢ ∈ X and wⱼ ∈ Y we have

K(i, y*) = K(uᵢ, y*) ≤ K(x*, y*) ≤ K(x*, wⱼ) = K(x*, j)

for all i ∈ M and j ∈ N.

Sufficiency. Let (x*, y*) be a pair of mixed strategies for which the inequalities (1.7.1) hold. Also, let x = (ξ₁, ..., ξₘ) ∈ X and y = (η₁, ..., ηₙ) ∈ Y be arbitrary mixed strategies of players 1 and 2, respectively. Multiplying the first and second inequalities in (1.7.1) by ξᵢ and ηⱼ, respectively, and summing, we get

Σᵢ₌₁ᵐ ξᵢ K(i, y*) ≤ K(x*, y*) Σᵢ₌₁ᵐ ξᵢ = K(x*, y*),   (1.7.2)

Σⱼ₌₁ⁿ ηⱼ K(x*, j) ≥ K(x*, y*) Σⱼ₌₁ⁿ ηⱼ = K(x*, y*).   (1.7.3)

At the same time,

Σᵢ₌₁ᵐ ξᵢ K(i, y*) = K(x, y*),   (1.7.4)

Σⱼ₌₁ⁿ ηⱼ K(x*, j) = K(x*, y).   (1.7.5)

Substituting (1.7.4) and (1.7.5) into (1.7.2) and (1.7.3), respectively, and taking into account the arbitrariness of the strategies x ∈ X and y ∈ Y, we obtain the saddle point conditions for the pair of mixed strategies (x*, y*).
Corollary. Let (i*, j*) be a saddle point in the game Γ_A. Then the situation (i*, j*) is also a saddle point in the game Γ̄_A.

Example 10. (Solution of the Evasion-type Game.) Suppose the players select integers i and j between 1 and n, and Player 1 wins the amount aᵢⱼ = |i − j|, i.e. the distance between the numbers i and j.

Suppose the first player uses the strategy x* = (1/2, 0, ..., 0, 1/2). Then

K(x*, j) = (1/2)|1 − j| + (1/2)|n − j| = (1/2)(j − 1) + (1/2)(n − j) = (n − 1)/2

for all 1 ≤ j ≤ n.

a) Let n = 2k + 1 be odd. Then Player 2 has a pure strategy j* = (n + 1)/2 = k + 1 such that

aᵢⱼ* = |i − (n + 1)/2| = |i − k − 1| ≤ k = (n − 1)/2

for all i = 1, 2, ..., n.

b) Let n = 2k be even. Then Player 2 has a strategy y* = (0, ..., 0, 1/2, 1/2, 0, ..., 0), where ηₖ* = 1/2, ηₖ₊₁* = 1/2, ηⱼ* = 0 for j ≠ k, j ≠ k + 1, and

K(i, y*) = (1/2)|i − k| + (1/2)|i − k − 1| ≤ (1/2)k + (1/2)(k − 1) = (n − 1)/2

for all 1 ≤ i ≤ n.

Now, using the Theorem, it is easily seen that the value of the game is v = (n − 1)/2, that x* is an optimal strategy of Player 1, and that Player 2's optimal strategy is j* if n = 2k + 1 and y* if n = 2k.
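The inequalities of the Theorem are easily checked numerically for case b); a sketch (our illustration, NumPy assumed) with n = 2k = 6:

    import numpy as np

    n = 6   # even case, k = 3
    A = np.abs(np.subtract.outer(np.arange(1, n + 1), np.arange(1, n + 1)))

    x_star = np.zeros(n); x_star[[0, n - 1]] = 0.5        # (1/2, 0, ..., 0, 1/2)
    y_star = np.zeros(n); y_star[[n//2 - 1, n//2]] = 0.5  # mass 1/2 on k and k + 1

    print(x_star @ A)   # every entry equals (n - 1)/2 = 2.5
    print(A @ y_star)   # [2.5 1.5 0.5 0.5 1.5 2.5]: every entry is <= 2.5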
1.7.2. We now give some results that follow immediately from the Theorem in 1.7.1.

Theorem. Let Γ_A be an (m × n) game. For the situation in mixed strategies (x*, y*) to be an equilibrium (saddle point) in the game Γ_A, it is necessary and sufficient that the following equality holds:

max_{1≤i≤m} K(i, y*) = min_{1≤j≤n} K(x*, j).   (1.7.6)

Proof. Necessity. If (x*, y*) is a saddle point, then, by the Theorem in 1.7.1, we have

K(i, y*) ≤ K(x*, y*) ≤ K(x*, j)

for all i ∈ {1, ..., m}, j ∈ {1, ..., n}. Therefore

K(i, y*) ≤ K(x*, j)

for each i and j. Suppose now that (1.7.6) is not satisfied. Then

max_{1≤i≤m} K(i, y*) < min_{1≤j≤n} K(x*, j).
Consequently, the following chain of inequalities would hold:

K(x*, y*) = Σᵢ₌₁ᵐ ξᵢ* K(i, y*) ≤ max_{1≤i≤m} K(i, y*) < min_{1≤j≤n} K(x*, j) ≤ Σⱼ₌₁ⁿ ηⱼ* K(x*, j) = K(x*, y*).

The obtained contradiction proves the necessity part of the theorem.


Sufficiency. Let a pair of mixed strategies (x̄, ȳ) be such that max_i K(i, ȳ) = min_j K(x̄, j). We show that in this case (x̄, ȳ) is a saddle point in the game Γ_A.

The following relations hold:

min_{1≤j≤n} K(x̄, j) ≤ Σⱼ₌₁ⁿ η̄ⱼ K(x̄, j) = K(x̄, ȳ) = Σᵢ₌₁ᵐ ξ̄ᵢ K(i, ȳ) ≤ max_{1≤i≤m} K(i, ȳ).

Hence we have

K(i, ȳ) ≤ max_{1≤i≤m} K(i, ȳ) = K(x̄, ȳ) = min_{1≤j≤n} K(x̄, j) ≤ K(x̄, j)

for all 1 ≤ i ≤ m and 1 ≤ j ≤ n; then, by the Theorem in 1.7.1, (x̄, ȳ) is a saddle point in the game Γ_A.

From the proof it follows that each of the numbers in (1.7.6) equals the value of the game.
1.7.3. Theorem. The following relation holds for the matrix game Γ_A:

max_{x∈X} min_{j∈N} K(x, j) = v_A = min_{y∈Y} max_{i∈M} K(i, y),   (1.7.7)

in which case the extrema are achieved on the players' optimal strategies.

This theorem follows from Theorems 1.3.4 and 1.7.2, and its proof is left to the reader.
1.7.4. Theorem. In the matrix game Γ_A the players' sets of optimal mixed strategies X* and Y* are convex polyhedra.

Proof. By Theorem 1.7.1, the set X* is the set of all solutions of the system of inequalities

x aʲ ≥ v_A,  j ∈ N,
x u = 1,
x ≥ 0,

where u = (1, ..., 1) ∈ Rᵐ and v_A is the value of the game. Thus, X* is a convex polyhedral set (see (1.5.1)). On the other hand, X* ⊂ X, where X is a convex polyhedron (see 1.5.3); therefore X* is bounded. Consequently, by the Theorem in 1.5.3, the set X* is a convex polyhedron.

In a similar manner it may be proved that Y* is a convex polyhedron.
1.7.5. As an application of Theorem 1.7.3, we shall provide a geometric solution
to games in which one of the players has two strategies, i.e. (2 × n) and (m × 2) games.
This method is based on the property that the optimal strategies x* and y* deliver
the exterior extrema in the equality

v_A = max_x min_j K(x, j) = min_y max_i K(i, y).
Example 11. ((2 × n) game.) We shall examine the game in which Player 1 has
two strategies and Player 2 has n strategies. The matrix is of the form

A = [ a_11  a_12  ...  a_1n ]
    [ a_21  a_22  ...  a_2n ].

Suppose Player 1 chooses the mixed strategy x = (ξ, 1 - ξ) and Player 2 chooses the pure
strategy j ∈ N. Then the payoff to Player 1 in the situation (x, j) is

K(x, j) = ξ a_1j + (1 - ξ) a_2j.    (1.7.8)
Geometrically, to each pure strategy j there corresponds the straight line
K = ξ a_1j + (1 - ξ) a_2j in the coordinates (ξ, K). The graph of the
function

H(ξ) = min_j K(x, j)

is the lower envelope of the family of straight lines (1.7.8). This function is concave
as the lower envelope of a family of concave (here linear) functions (1.5.5).
The point ξ*, at which the maximum of the function H(ξ) is achieved with respect
to ξ ∈ [0, 1], yields the required optimal strategy x* = (ξ*, 1 - ξ*) and the value of
the game v_A = H(ξ*).
For definiteness, we shall consider the game with the matrix

A = [ 1  3  1  4 ]
    [ 2  1  4  0 ].

For each j = 1, 2, 3, 4 we have: K(x, 1) = -ξ + 2, K(x, 2) = 2ξ + 1, K(x, 3) =
-3ξ + 4, K(x, 4) = 4ξ. The lower envelope H(ξ) of the family of straight lines
{K(x, j)}, and the lines themselves, K(x, j), j = 1, 2, 3, 4, are shown in Fig. 1.1. The
maximum H(ξ*) of the function H(ξ) is found at the intersection of the first and the
fourth lines. Thus, ξ* is a solution of the equation

4ξ = -ξ + 2 = v_A.

Hence we get the optimal strategy x* = (2/5, 3/5) of Player 1 and the value of the
game v_A = 8/5. Player 2's optimal strategy is found from the following reasoning.
Note that in the case studied K(x*, 1) = K(x*, 4) = v_A = 8/5.

Figure 1.1

For the optimal strategy y* = (η_1*, η_2*, η_3*, η_4*) the following equality must hold:

v_A = K(x*, y*) = η_1* K(x*, 1) + η_2* K(x*, 2) + η_3* K(x*, 3) + η_4* K(x*, 4).

In this case K(x*, 2) > 8/5 and K(x*, 3) > 8/5; therefore η_2* = η_3* = 0, and η_1*, η_4* can be
found from the conditions

η_1* + 4 η_4* = 8/5,
2 η_1* = 8/5.

Thus, η_1* = 4/5, η_4* = 1/5, and the optimal strategy of Player 2 is y* = (4/5, 0, 0, 1/5).
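The graphical method lends itself to a direct computation: since the lower envelope H(ξ) is concave and piecewise linear, its maximum is attained either at ξ = 0, ξ = 1, or at an intersection of two of the lines (1.7.8). The following sketch (our illustration in Python; the function name solve_2xn is ours) enumerates these candidate points for the example above.

```python
import numpy as np
from itertools import combinations

def solve_2xn(A):
    a1, a2 = np.asarray(A, dtype=float)                  # the two rows of A
    lower = lambda xi: np.min(xi * a1 + (1 - xi) * a2)   # lower envelope H(xi)
    candidates = [0.0, 1.0]
    for j, k in combinations(range(len(a1)), 2):
        denom = (a1[j] - a2[j]) - (a1[k] - a2[k])
        if abs(denom) > 1e-12:                           # lines j and k intersect
            xi = (a2[k] - a2[j]) / denom
            if 0.0 <= xi <= 1.0:
                candidates.append(xi)
    xi_star = max(candidates, key=lower)
    return (xi_star, 1 - xi_star), lower(xi_star)

x_star, v = solve_2xn([[1, 3, 1, 4], [2, 1, 4, 0]])
print(x_star, v)    # (0.4, 0.6) and 1.6, i.e. x* = (2/5, 3/5), v_A = 8/5
```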
Example 12. ((m × 2) game.) In this example, Player 2 has two strategies and
Player 1 has m strategies. The matrix A is of the form

A = [ a_11  a_12 ]
    [ a_21  a_22 ]
    [ ...        ]
    [ a_m1  a_m2 ].

This game can be analyzed in a similar manner. Indeed, let y = (η, 1 - η) be an
arbitrary mixed strategy of Player 2. Then Player 1's payoff in the situation (i, y) is

K(i, y) = a_i1 η + a_i2 (1 - η) = (a_i1 - a_i2) η + a_i2.

The graph of the function K(i, y) is a straight line. Consider the upper envelope
of these straight lines, i.e. the function

H(η) = max_{1≤i≤m} [(a_i1 - a_i2) η + a_i2].

The function H(η) is convex (as the upper envelope of a family of convex functions).
The point of minimum η* of the function H(η) yields the optimal strategy y* =
(η*, 1 - η*) and the value of the game v_A = H(η*) = min_{η∈[0,1]} H(η).
1.7.6. We shall provide a theorem that is useful in finding a solution of the game.
Theorem. Let x* = (ξ_1*, ..., ξ_m*) and y* = (η_1*, ..., η_n*) be optimal strategies in
the game Γ_A and v_A be the value of the game. Then for any i, for which K(i, y*) < v_A,
there must be ξ_i* = 0, and for any j such that v_A < K(x*, j) there must be η_j* = 0.
Conversely, if ξ_i* > 0, then K(i, y*) = v_A, and if η_j* > 0, then K(x*, j) = v_A.
Proof. Suppose that for some i_0 ∈ M we have K(i_0, y*) < v_A and ξ_{i_0}* ≠ 0. Then

K(i_0, y*) ξ_{i_0}* < v_A ξ_{i_0}*.

For all i ∈ M, K(i, y*) ≤ v_A, therefore

K(i, y*) ξ_i* ≤ v_A ξ_i*.

Summing these inequalities over i, we get K(x*, y*) < v_A, which contradicts the fact that v_A
is the value of the game. The second part of the Theorem can be proved in a similar manner.
This result is a counterpart of the complementary slackness theorem [Hu (1970)]
or, as it is sometimes called, the canonical equilibrium theorem for the linear program-
ming problem [Gale (1960)].
Definition. Player 1's (2's) pure strategy i ∈ M (j ∈ N) is called an essential
or active strategy if there exists an optimal strategy x* = (ξ_1*, ..., ξ_m*) (y* =
(η_1*, ..., η_n*)) of this player for which ξ_i* > 0 (η_j* > 0).
From the definition, and from the latter theorem, it follows that for each essential
strategy i of Player 1 and any optimal strategy y* ∈ Y* of Player 2 in the game Γ_A
the following equality holds:

K(i, y*) = a_i y* = v_A.

A similar equality holds for any essential strategy j ∈ N of Player 2 and any
optimal strategy x* ∈ X* of Player 1:

K(x*, j) = x* a^j = v_A.

If the equality a_i y = v_A holds for the pure strategy i ∈ M and the mixed strategy
y ∈ Y, then the strategy i is the best reply to the mixed strategy y in the game Γ_A.

Thus, using this terminology, the theorem can be restated as follows: if a pure
strategy of a player is essential, then it is the best reply to any optimal strategy of
the opponent.
A knowledge of the optimal strategy spectrum simplifies finding a solution of
the game. Indeed, let M_{x*} be the spectrum of Player 1's optimal strategy x*. Then
each optimal strategy y* = (η_1*, ..., η_n*) of Player 2 and the value of the game v satisfy
the system of inequalities

a_i y* = v,  i ∈ M_{x*},
a_i y* ≤ v,  i ∈ M \ M_{x*},

Σ_{j=1}^n η_j* = 1,  η_j* ≥ 0,  j ∈ N.

Thus, only essential strategies may appear in the spectrum M_{x*} of any optimal strat-
egy x*.
1.7.7. To conclude this section, we shall provide the analytical solution of the Attack
and Defence game (see Example 4, 1.1.3).
Example 13. [Sakaguchi (1973)]. Let us consider the game with the (n × n) matrix
A:

A = [ β_1 τ_1   τ_1     ...   τ_1     ]
    [ τ_2      β_2 τ_2  ...   τ_2     ]
    [ ...                             ]
    [ τ_n      τ_n      ...   β_n τ_n ].

Here τ_i > 0 is the value and 0 < β_i < 1 is the probability of hitting the target
C_i, i = 1, 2, ..., n, provided that it is defended.
Let τ_1 ≤ τ_2 ≤ ... ≤ τ_n. We shall define the function φ of the integers 1, 2, ..., n as
follows:

φ(k) = [ Σ_{i=k}^n (1 - β_i)^{-1} - 1 ] / Σ_{i=k}^n (τ_i (1 - β_i))^{-1}    (1.7.9)

and let l ∈ {1, 2, ..., n} be an integer which maximizes the function φ(k), i.e.

φ(l) = max_{1≤k≤n} φ(k).    (1.7.10)

We shall establish properties of the function φ(k). Denote by R one of the signs of
the order relation {>, =, <}. In this case

φ(k) R φ(k + 1)    (1.7.11)

if and only if

τ_k R φ(k),  k = 1, 2, ..., n - 1,  τ_0 = 0.    (1.7.12)
Indeed, from (1.7.9) we obtain

φ(k + 1) = [ Σ_{i=k+1}^n (1 - β_i)^{-1} - 1 ] / Σ_{i=k+1}^n (τ_i (1 - β_i))^{-1}.

Then we have

( φ(k)/τ_k - 1 ) · (1 - β_k)^{-1} / Σ_{i=k+1}^n (τ_i (1 - β_i))^{-1} + φ(k) = φ(k + 1).    (1.7.13)

Note that the coefficient in (1.7.13) placed after the brackets is positive. Therefore, from
(1.7.13) we obtain the equivalence of the relations (1.7.11) and (1.7.12).
Now, since φ(l) ≥ φ(l - 1) and φ(l) ≥ φ(l + 1) (in this case τ_{l-1} ≤ φ(l - 1) and
τ_l ≥ φ(l)), from the relations (1.7.10), (1.7.11) we have

τ_{l-1} ≤ φ(l) ≤ τ_l.    (1.7.14)

Find the optimal strategies in the game Γ_A. Recall that we have the inequalities τ_1 ≤ τ_2 ≤
... ≤ τ_n. Then the optimal strategies x* = (ξ_1*, ..., ξ_n*) and y* = (η_1*, ..., η_n*) for players
1 and 2, respectively, are as follows:

ξ_i* = 0,  i = 1, ..., l - 1,
ξ_i* = (τ_i (1 - β_i))^{-1} / Σ_{j=l}^n (τ_j (1 - β_j))^{-1},  i = l, ..., n,    (1.7.15)

η_j* = 0,  j = 1, ..., l - 1,
η_j* = (τ_j - φ(l)) / (τ_j (1 - β_j)),  j = l, ..., n,    (1.7.16)

and the value of the game is

v_A = φ(l).

We have ξ_i* ≥ 0, i = 1, 2, ..., n, and Σ_{i=1}^n ξ_i* = 1. From the definition of φ(l)
and (1.7.14) we have η_j* ≥ 0, j = 1, 2, ..., n, and Σ_{j=1}^n η_j* = 1.
Let K(x*, j) be the payoff of Player 1 in the situation (x*, j). Similarly, let K(i, y*) be the
payoff in the situation (i, y*). Substituting (1.7.15), (1.7.16) into the payoff function and using the
assumption that the values of the targets do not decrease, together with (1.7.14), we obtain

K(x*, j) = φ(l) + [ Σ_{i=l}^n (τ_i (1 - β_i))^{-1} ]^{-1} > φ(l),  j = 1, ..., l - 1,
K(x*, j) = φ(l),  j = l, ..., n,

K(i, y*) = τ_i ≤ φ(l),  i = 1, ..., l - 1,
K(i, y*) = τ_i - τ_i (1 - β_i) η_i* = φ(l),  i = l, ..., n.

Thus, for all i, j = 1, ..., n the following inequalities hold:

K(i, y*) ≤ φ(l) ≤ K(x*, j).

Then, by the Theorem in 1.7.1, x* and y* are optimal and v_A = φ(l). This completes the
solution of the game.
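The formulas (1.7.9)-(1.7.16) translate directly into a short computation. Below is a sketch (ours; the data τ, β are sample values, not from the book, and the function name is ours) that computes φ, the index l, the optimal strategies and the value, and then checks the saddle-point inequalities against the matrix A.

```python
import numpy as np

def attack_defence(tau, beta):
    tau, beta = np.asarray(tau, float), np.asarray(beta, float)
    n = len(tau)
    # phi(k) = [sum_{i>=k} (1-beta_i)^{-1} - 1] / sum_{i>=k} (tau_i(1-beta_i))^{-1}
    def phi(k):                                    # k = 1, ..., n
        s = slice(k - 1, n)
        return (np.sum(1/(1 - beta[s])) - 1) / np.sum(1/(tau[s]*(1 - beta[s])))
    l = max(range(1, n + 1), key=phi)              # phi(l) = max_k phi(k)
    v = phi(l)
    x = np.zeros(n); y = np.zeros(n)
    s = slice(l - 1, n)
    x[s] = (1/(tau[s]*(1 - beta[s]))) / np.sum(1/(tau[s]*(1 - beta[s])))  # (1.7.15)
    y[s] = (tau[s] - v) / (tau[s]*(1 - beta[s]))                          # (1.7.16)
    return x, y, v

x, y, v = attack_defence(tau=[1.0, 2.0, 4.0], beta=[0.5, 0.5, 0.5])
A = np.array([[0.5, 1, 1], [2, 1, 2], [4, 4, 2]])   # a_ii = beta_i*tau_i, a_ij = tau_i
print(np.min(x @ A) >= v - 1e-12, np.max(A @ y) <= v + 1e-12, v)   # True True 2.0
```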

1.8 Dominance of strategies


The complexity of solving a matrix game increases as the dimensions of the matrix A
increase. In some cases, however, analysis of the payoff matrix permits the conclusion
that some pure strategies do not appear in the spectrum of an optimal strategy. This
can result in replacement of the original matrix by a payoff matrix of smaller
dimension.
1.8.1. Definition. Strategy x' of Player 1 is said to dominate strategy x'' in the
(m × n) game Γ_A if the following inequalities hold for all pure strategies j ∈ {1, ..., n}
of Player 2:

x' a^j ≥ x'' a^j.    (1.8.1)

Similarly, strategy y' of Player 2 dominates his strategy y'' if for all pure strategies
i ∈ {1, ..., m} of Player 1

a_i y' ≤ a_i y''.    (1.8.2)

If inequalities (1.8.1), (1.8.2) are satisfied as strict inequalities, then we are deal-
ing with strict dominance. A special case of the dominance of strategies is their
equivalence.
Definition. Strategies x' and x'' of Player 1 are equivalent in the game Γ_A if
for all j ∈ {1, ..., n}

x' a^j = x'' a^j.
We shall denote this fact by x' ~ x''.
For two equivalent strategies x' and x'' the following equality holds (for every
y ∈ Y):

K(x', y) = K(x'', y).

Similarly, strategies y' and y'' of Player 2 are equivalent (y' ~ y'') in the game Γ_A if
for all i ∈ {1, ..., m}

a_i y' = a_i y''.

Hence, for any mixed strategy x ∈ X of Player 1 the following equality
holds:

K(x, y') = K(x, y'').
For pure strategies the above definitions take the following form. If Player
1's pure strategy i' dominates his strategy i'', and Player 2's pure strategy j' dominates
his strategy j'', then for all i = 1, ..., m, j = 1, ..., n the following
inequalities hold:

a_{i'j} ≥ a_{i''j},  a_{ij'} ≤ a_{ij''}.

This can be written in vector form as follows:

a_{i'} ≥ a_{i''},  a^{j'} ≤ a^{j''}.

Equivalence of the pairs of strategies i', i'' (i' ~ i'') and j', j'' (j' ~ j'') implies that
the conditions a_{i'} = a_{i''} (a^{j'} = a^{j''}) are satisfied.

Definition. The strategy x'' (y'') of Player 1 (2) is dominated if there exists a
strategy x' ≠ x'' (y' ≠ y'') of this player which dominates x'' (y''); otherwise strategy
x'' (y'') is an undominated strategy.
Similarly, strategy x'' (y'') of Player 1 (2) is strictly dominated if there exists a
strategy x' (y') of this player which strictly dominates x'' (y''), i.e. for all j = 1, ..., n
(i = 1, ..., m) the following inequalities hold:

x' a^j > x'' a^j,  a_i y' < a_i y'';

otherwise strategy x'' (y'') of Player 1 (2) is not strictly dominated.


1.8.2. We now show that players playing optimally do not use dominated strategies. This
is established by the following assertion.
Theorem. If, in the game Γ_A, strategy x' of one of the players dominates an
optimal strategy x*, then strategy x' is also optimal.
Proof. For definiteness, let x' and x* be strategies of Player 1. Then, by domi-
nance,

x' a^j ≥ x* a^j

for all j = 1, ..., n. Hence, using the optimality of strategy x* (see 1.7.3), we get

v_A ≥ min_j x' a^j ≥ min_j x* a^j = v_A.

Therefore, by Theorem 1.7.3, strategy x' is also optimal.


Thus, an optimal strategy can be dominated only by another optimal strategy.
On the other hand, no optimal strategy is strictly dominated; hence the players, when
playing optimally, must not use strictly dominated strategies.
Theorem. If, in the game Γ_A, strategy x* of one of the players is optimal, then
strategy x* is not strictly dominated.
Proof. For definiteness, let x* be an optimal strategy of Player 1. Assume that x*
is strictly dominated, i.e. there exists a strategy x' ∈ X such that

x' a^j > x* a^j,  j = 1, 2, ..., n.

Hence

min_j x' a^j > min_j x* a^j.

However, by the optimality of x* ∈ X, the equality min_j x* a^j = v_A is satisfied. There-
fore, the strict inequality

max_x min_j x a^j > v_A

holds, and this contradicts the fact that v_A is the value of the game (1.7.3). The
contradiction proves the theorem.
It is clear that the reverse assertion is generally not true: a strategy that is not
strictly dominated need not be optimal,
On the other hand, it is intuitively clear that if the ith row (the jth column) of the matrix A
is dominated, then there is no need to assign positive probability to it.
Thus, in order to find optimal strategies, instead of the game Γ_A it suffices to solve
a subgame Γ_{A'}, where A' is the matrix obtained from the matrix A by deleting the
dominated rows and columns.
Before proceeding to a precise formulation and proof of this result, we will in-
troduce the notion of an extension of a mixed strategy x at the ith place. If x =
(ξ_1, ..., ξ_m) ∈ X and 1 ≤ i ≤ m + 1, then the extension of strategy x at the ith place
is the vector x_i = (ξ_1, ..., ξ_{i-1}, 0, ξ_i, ..., ξ_m) ∈ R^{m+1}. Thus the extension of the
vector (1/3, 2/3, 1/3) at the 2nd place is the vector (1/3, 0, 2/3, 1/3); the extension
at the 4th place is the vector (1/3, 2/3, 1/3, 0); the extension at the 1st place is the
vector (0, 1/3, 2/3, 1/3).
1.8.3. Theorem. Let Γ_A be an (m × n) game. We assume that the ith row
of the matrix A is dominated (i.e. Player 1's pure strategy i is dominated) and let Γ_{A'}
be the game with the matrix A' obtained from A by deleting the ith row. Then the
following assertions hold.
1. v_A = v_{A'}.
2. Any optimal strategy y* of Player 2 in the game Γ_{A'} is also optimal in the game
Γ_A.
3. If x* is an arbitrary optimal strategy of Player 1 in the game Γ_{A'} and x_i* is the
extension of strategy x* at the ith place, then x_i* is an optimal strategy of that
player in the game Γ_A.
4. If the ith row of the matrix A is strictly dominated, then an arbitrary optimal
strategy x* of Player 1 in the game Γ_A can be obtained from an optimal strategy
x' in the game Γ_{A'} by the extension at the ith place.
Proof. We may assume, without loss of generality, that the last, mth, row is dom-
inated. Let x = (ξ_1, ..., ξ_m) be a mixed strategy which dominates the row m. If
ξ_m = 0, then from the dominance condition for all j = 1, 2, ..., n we get

Σ_{i=1}^m ξ_i a_{ij} = Σ_{i=1}^{m-1} ξ_i a_{ij} ≥ a_{mj},

Σ_{i=1}^{m-1} ξ_i = 1,  ξ_i ≥ 0,  i = 1, ..., m - 1.    (1.8.3)

Otherwise (ξ_m > 0), consider the vector x' = (ξ_1', ..., ξ_m'), where

ξ_i' = ξ_i / (1 - ξ_m),  i ≠ m,
ξ_i' = 0,  i = m.    (1.8.4)

The components of the vector x' are non-negative (ξ_i' ≥ 0, i = 1, ..., m) and Σ_{i=1}^{m-1} ξ_i' = 1.
On the other hand, for all j = 1, ..., n the dominance condition Σ_{i=1}^m ξ_i a_{ij} ≥ a_{mj} gives

Σ_{i=1}^{m-1} ξ_i a_{ij} ≥ (1 - ξ_m) a_{mj},

or

(1/(1 - ξ_m)) Σ_{i=1}^{m-1} ξ_i a_{ij} ≥ a_{mj}.

Considering (1.8.4), we get

Σ_{i=1}^{m-1} ξ_i' a_{ij} ≥ a_{mj},  j = 1, ..., n,

Σ_{i=1}^{m-1} ξ_i' = 1,  ξ_i' ≥ 0,  i = 1, ..., m - 1.    (1.8.5)

Thus, from the dominance of the mth row it always follows that it does not exceed a
convex linear combination of the remaining m - 1 rows [(1.8.5)].
Let (x*, y*) ∈ Z(Γ_{A'}) be a saddle point in the game Γ_{A'}, x* = (ξ_1*, ..., ξ_{m-1}*),
y* = (η_1*, ..., η_n*). To prove assertions 1, 2, 3 of the theorem, it suffices to show that
K(x_m*, y*) = v_{A'}, and

Σ_{j=1}^n a_{ij} η_j* ≤ v_{A'} ≤ Σ_{i=1}^{m-1} ξ_i* a_{ij} + 0 · a_{mj}    (1.8.6)

for all i = 1, ..., m, j = 1, ..., n.
The first equality is straightforward, and the optimality of the strategies (x*, y*) in
the game Γ_{A'} implies that the following inequalities are satisfied:

Σ_{j=1}^n a_{ij} η_j* ≤ v_{A'} ≤ Σ_{i=1}^{m-1} a_{ij} ξ_i*,  i = 1, ..., m - 1,  j = 1, ..., n.    (1.8.7)

The second of the inequalities (1.8.6) is evident from (1.8.7). We shall prove the first
inequality. To do this, it suffices to show that

Σ_{j=1}^n a_{mj} η_j* ≤ v_{A'}.

From inequalities (1.8.3), (1.8.5) we obtain

Σ_{j=1}^n a_{mj} η_j* ≤ Σ_{j=1}^n Σ_{i=1}^{m-1} ξ_i' a_{ij} η_j* ≤ Σ_{i=1}^{m-1} ξ_i' v_{A'} = v_{A'},

which proves the first part of the theorem.
To prove the second part of the theorem (assertion 4), it suffices to note that in the
case of strict dominance of the mth row the inequalities (1.8.3), (1.8.5) are satisfied
as strict inequalities for all j = 1, ..., n; hence

Σ_{j=1}^n a_{mj} η_j* < Σ_{j=1}^n Σ_{i=1}^{m-1} ξ_i' a_{ij} η_j* ≤ v_{A'}.

From Theorem 1.7.6, we then have that the mth component of any optimal strategy
of Player 1 in the game Γ_A is zero. This completes the proof.
Let us formulate the dominance theorem for the second player without providing
a proof.
Theorem. Let Γ_A be an (m × n) game. Assume that the jth column of the
matrix A is dominated and Γ_{A'} is the game having the matrix A' obtained from A by
deleting the jth column. Then the following assertions are true.
1. v_A = v_{A'}.
2. Any optimal strategy x* of Player 1 in the game Γ_{A'} is also optimal in the game
Γ_A.
3. If y* is an arbitrary optimal strategy of Player 2 in the game Γ_{A'} and y_j* is the
extension of strategy y* at the jth place, then y_j* is an optimal strategy of Player
2 in the game Γ_A.
4. Further, if the jth column of the matrix A is strictly dominated, then an arbi-
trary optimal strategy y* of Player 2 in the game Γ_A can be obtained from an
optimal strategy y' in the game Γ_{A'} by the extension at the jth place.
1.8.4. To summarize: the theorems in 1.8.3 yield an algorithm for reducing the
dimension of a matrix game. If a row (column) of the matrix is not greater (not
smaller) than some convex linear combination of the remaining rows (columns) of the
matrix, then, to find a solution of the game, this row (column) can be deleted. In this
case, an extension of an optimal strategy of the truncated matrix game yields an optimal
solution of the original game. If the inequalities are satisfied as strict inequalities, the
entire set of optimal strategies in the original game can be obtained by extending the set of
optimal strategies in the truncated game; otherwise this procedure may cause a loss
of some optimal strategies. An application of these theorems is illustrated by the following
example.
Example 14. Let us consider the game with the matrix

A = [ 2  1  1  0 ]
    [ 2  3  1  3 ]
    [ 3  1  2  0 ]
    [ 0  3  0  6 ].

Since the 3rd row a_3 dominates the 1st row (a_3 ≥ a_1), then, by deleting the 1st row,
we obtain

A_1 = [ 2  3  1  3 ]
      [ 3  1  2  0 ]
      [ 0  3  0  6 ].

In this matrix the 3rd column a^3 dominates the 1st column a^1. Hence, deleting the 1st column, we get

A_2 = [ 3  1  3 ]
      [ 1  2  0 ]
      [ 3  0  6 ].

In the latter matrix no row (column) is dominated by another row (column). At
the same time, the 1st column a^1 is dominated by a convex linear combination of the
columns a^2 and a^3, i.e. a^1 ≥ 1/2 a^2 + 1/2 a^3, since 3 > 1/2 · 1 + 1/2 · 3, 1 = 1/2 · 2 + 1/2 · 0,
3 = 1/2 · 0 + 1/2 · 6. By eliminating the 1st column, we obtain

A_3 = [ 1  3 ]
      [ 2  0 ]
      [ 0  6 ].

In this matrix the 1st row is equal to the convex linear combination of the second
and third rows with the mixed strategy x = (0, 1/2, 1/2), since 1 = 1/2 · 2 + 1/2 · 0,
3 = 1/2 · 0 + 1/2 · 6. Thus, by eliminating the 1st row, we obtain the matrix

A_4 = [ 2  0 ]
      [ 0  6 ].

The players' optimal strategies x* and y* in the game with this matrix are x* = y* =
(3/4, 1/4), in which case the game value v is 3/2.
The latter matrix was obtained by deleting the first two rows and columns; hence
the players' optimal strategies in the original game are the extensions of these strategies
at the 1st and 2nd places, i.e. x*_{1,2} = y*_{1,2} = (0, 0, 3/4, 1/4).
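The reduction used in Example 14 can be automated. Deciding whether a row is dominated by a convex linear combination of the remaining rows is a linear feasibility problem; a sketch (ours, assuming the SciPy library; the function name is ours) is given below. Columns are treated symmetrically, with the inequality reversed.

```python
import numpy as np
from scipy.optimize import linprog

def row_dominated(A, r):
    """Is row r of A majorized by a convex combination lam of the other rows?"""
    A = np.asarray(A, float)
    rest = np.delete(A, r, axis=0)
    m = rest.shape[0]
    res = linprog(c=np.zeros(m),                 # pure feasibility problem
                  A_ub=-rest.T, b_ub=-A[r],      # rest.T @ lam >= A[r] componentwise
                  A_eq=np.ones((1, m)), b_eq=[1.0],
                  bounds=[(0, None)] * m)
    return res.status == 0                       # feasible => row r is dominated

A = [[2, 1, 1, 0], [2, 3, 1, 3], [3, 1, 2, 0], [0, 3, 0, 6]]
print(row_dominated(A, 0))   # True: the 1st row is dominated (here by the 3rd row alone)
```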

1.9 Completely mixed and symmetric games


A knowledge of the optimal strategy spectrum simplifies solving games. The optimal
strategy spectrum includes only essential pure strategies of a player, and no essential
strategy is strictly dominated, which follows immediately from the theorems in Sec. 1.8.
1.9.1. We consider the class of games in which a knowledge of the spectrum
suffices to find a solution of the game.
Definition. Strategy x (y) of Player 1 (2) is completely mixed if its spectrum
consists of the set of all strategies of the player, i.e. M_x = M (N_y = N).
A saddle point (x*, y*) is completely mixed if the strategies x* and y* are com-
pletely mixed. The game Γ_A is completely mixed if each saddle point therein is
completely mixed.
The following theorem states that a completely mixed game has a unique solution.
Theorem. A completely mixed (m × n) game Γ_A has a unique solution (x*, y*)
and a square matrix (m = n). If v_A ≠ 0, then the matrix A is nonsingular and

x* = v_A u A^{-1},    (1.9.1)

y* = v_A A^{-1} u,    (1.9.2)

v_A = 1 / (u A^{-1} u),    (1.9.3)

where u = (1, ..., 1).

Proof. Let x* = (ξ_1*, ..., ξ_m*) ∈ X* and y* = (η_1*, ..., η_n*) ∈ Y* be arbitrary
optimal strategies of the players and let v_A be the value of the game Γ_A. Since Γ_A is a
completely mixed game, x* and y* are completely mixed strategies, and they (and only
they) are solutions of the systems of linear equations in 1.7.6:

x a^j = v_A,  j = 1, ..., n,  x u = 1,  x ≥ 0,    (1.9.4)

a_i y = v_A,  i = 1, ..., m,  y w = 1,  y ≥ 0,    (1.9.5)

where u = (1, ..., 1) ∈ R^m, w = (1, ..., 1) ∈ R^n.
We shall show that the solution (x*, y*) of the completely mixed game is unique.
The sets X*, Y* given by (1.9.4) and (1.9.5) are nonempty convex polyhedra and,
consequently, have extreme points. By the second of the theorems in 1.5.2, we have

m ≤ rank[a^1, ..., a^n, u] = rank[A, u] ≤ m,    (1.9.6)

n ≤ rank[a_1, ..., a_m, w] = rank[A, w] ≤ n.    (1.9.7)

This theorem now implies that the sets X*, Y* have one extreme point each and
hence consist only of these points (as convex polyhedra containing a unique extreme
point). This completes the proof of the uniqueness of the solution (x*, y*).
Let v_A = 0. Then the homogeneous system

x a^j = v_A = 0,  j = 1, ..., n

has a nonzero solution; hence rank(A) < m. Since rank[A, u] = m, we have
rank(A) = m - 1. Similarly, from (1.9.5) and (1.9.7) it follows that rank(A) = n - 1.
Hence n = m.
Let v_A ≠ 0. Then

rank(A) = rank[A, v_A u] = rank[A, u] = m,

rank(A) = rank[A, v_A w] = rank[A, w] = n.

Hence we have n = m = rank(A), i.e. A is a nonsingular matrix. The system of
equations x* A = v_A u has the solution

x* = v_A u A^{-1}.

Similarly, the solution of the system A y* = v_A u is

y* = v_A A^{-1} u;

then, from the condition x* u = 1,

v_A = 1 / (u A^{-1} u).

This completes the proof of the Theorem.
The converse is also true, although the proof will be left to the reader.
Theorem. Suppose the matrix A is nonsingular in the (m × m) game Γ_A. If
Player 2 has in Γ_A a completely mixed optimal strategy, then Player 1 has a unique
optimal strategy x* (1.9.1). If Player 1 has in the game Γ_A a completely mixed optimal
strategy, then Player 2 has a unique optimal strategy y* (1.9.2). The value of the game
v_A is defined by (1.9.3).
Example 15. ((2 × 2) game.) Consider the (2 × 2) game with the matrix

A = [ a_11  a_12 ]
    [ a_21  a_22 ].

An arbitrary mixed strategy x of Player 1 can be written as x = (ξ, 1 - ξ), where
0 ≤ ξ ≤ 1. Similarly, a mixed strategy of Player 2 is of the form y = (η, 1 - η), where
0 ≤ η ≤ 1. The payoff in the situation (x, y) is

K(x, y) = ξ [a_11 η + a_12 (1 - η)] + (1 - ξ) [a_21 η + a_22 (1 - η)].

We now assume that the game Γ_A has no saddle point in pure strategies (a solution
is then found from the maximin and minimax equality in mixed strategies) and that x* = (ξ*, 1 - ξ*), y* =
(η*, 1 - η*) are arbitrary optimal strategies of the first and second players, respectively.
In this case, the saddle point (x*, y*) and the game Γ_A are completely mixed (0 < ξ* < 1 and
0 < η* < 1). Therefore, by Theorem 1.9.1, the game has a unique pair of optimal mixed
strategies, which is the solution of the system of equations

a_11 η* + a_12 (1 - η*) = v_A,
a_21 η* + a_22 (1 - η*) = v_A,
a_11 ξ* + a_21 (1 - ξ*) = v_A,
a_12 ξ* + a_22 (1 - ξ*) = v_A.

If we ensure that v_A ≠ 0 (e.g. if all the elements of the matrix A are positive, this
inequality is satisfied), then the solution of the game is

v_A = 1 / (u A^{-1} u),  x* = v_A u A^{-1},  y* = v_A A^{-1} u,
where u = (1, 1). Thus, it can be readily verified that the matrix

A = [ 1  0 ]
    [ -1 1 ]

has no saddle point. The inverse matrix A^{-1} is

A^{-1} = [ 1  0 ]
         [ 1  1 ].

Then v_A = 1/3, x* = (2/3, 1/3), y* = (1/3, 2/3).
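Formulas (1.9.1)-(1.9.3) are immediate to evaluate numerically. The following sketch (ours, assuming the NumPy library; the helper name is ours) reproduces the answer for the matrix just considered.

```python
import numpy as np

def completely_mixed_solution(A):
    """Solve a nonsingular game with v_A != 0 by (1.9.1)-(1.9.3)."""
    A = np.asarray(A, float)
    u = np.ones(len(A))
    Ainv = np.linalg.inv(A)
    v = 1.0 / (u @ Ainv @ u)       # v_A = 1 / (u A^{-1} u)
    x = v * (u @ Ainv)             # x* = v_A u A^{-1}
    y = v * (Ainv @ u)             # y* = v_A A^{-1} u
    return x, y, v

x, y, v = completely_mixed_solution([[1, 0], [-1, 1]])
print(x, y, v)                     # [2/3, 1/3], [1/3, 2/3], 1/3
```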


1.9.2. We now examine a special class of games having matrices of a special
form.
Definition. The game Γ_A with the square matrix A is symmetric if the matrix
A is skew-symmetric, i.e. if a_ij = -a_ji for all i and j.
In this case, all diagonal elements of the matrix A are zero, i.e. a_ii = 0 for all i.
For the skew-symmetric matrix A we have A^T = -A. Since the matrix A is square,
the players' sets of mixed strategies coincide, i.e. X = Y.
We shall prove a theorem on the properties of a solution to the symmetric
game Γ_A which may be useful in finding a saddle point.
Theorem. Let Γ_A be a symmetric game. Then

v_A = 0

and the players' sets of optimal strategies coincide, i.e.

X* = Y*.

Proof. Let A be the game matrix and let x ∈ X be an arbitrary strategy. Then
xAx = x A^T x = -xAx. Hence xAx = 0.
Let (x*, y*) ∈ Z(A) be a saddle point, and let v_A be the value of the game. Then

v_A = x* A y* ≤ x* A y,

v_A = x* A y* ≥ x A y*

for all x ∈ X, y ∈ Y. Consequently,

v_A ≤ x* A x* = 0,  v_A ≥ y* A y* = 0.

Hence we get v_A = 0.
Let the strategy x* be optimal in the game Γ_A. Then (see Theorem 1.7.1)

x* A ≥ 0.

It follows, however, that x* A^T = -x* A ≤ 0, and hence

A x* ≤ 0.

By the same Theorem 1.7.1, this means that x* is an optimal strategy of Player 2.
We have thus proved that X* ⊂ Y*. The inverse inclusion is proved in a similar
manner.
In what follows, when dealing with a player's optimal strategy in the symmetric game,
because of the equality X* = Y* we shall not indicate which of the players is con-
cerned.
Example 16. Let us solve the game with the matrix

A = [  0  1 -1 ]
    [ -1  0  1 ]
    [  1 -1  0 ].

Let x* = (ξ_1, ξ_2, ξ_3) be an optimal strategy in the game Γ_A. Then the following
inequalities are satisfied:

-ξ_2 + ξ_3 ≥ 0,  ξ_1 - ξ_3 ≥ 0,  -ξ_1 + ξ_2 ≥ 0,    (1.9.8)

ξ_1 + ξ_2 + ξ_3 = 1,  ξ_1 ≥ 0,  ξ_2 ≥ 0,  ξ_3 ≥ 0.

We shall show that this game is completely mixed. Indeed, let ξ_1 = 0. From the
system of inequalities (1.9.8) we then obtain the system

-ξ_2 + ξ_3 ≥ 0,  -ξ_3 ≥ 0,  ξ_2 ≥ 0,

ξ_2 + ξ_3 = 1,

which has no non-negative solution. Similar reasoning shows that the cases ξ_2 = 0 and
ξ_3 = 0 are impossible. Therefore the game Γ_A is completely mixed. Consequently,
the components ξ_1, ξ_2, ξ_3 are solutions of the system

-ξ_2 + ξ_3 = 0,  ξ_1 - ξ_3 = 0,  -ξ_1 + ξ_2 = 0,

ξ_1 + ξ_2 + ξ_3 = 1,  ξ_i ≥ 0,  i = 1, 2, 3.

This system has a unique solution. The vector x* = (1/3, 1/3, 1/3) is an optimal
strategy.
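For comparison, the same answer can be obtained by solving the linear system directly. The following sketch (ours, assuming NumPy) solves x A = 0 together with the normalization condition.

```python
import numpy as np

# Solving the system of Example 16 directly: x A = 0 together with sum(xi) = 1.
A = np.array([[0, 1, -1], [-1, 0, 1], [1, -1, 0]], float)
M = np.vstack([A.T, np.ones(3)])          # rows: A^T x = (x A)^T = 0, plus sum = 1
b = np.array([0, 0, 0, 1.0])
x, *_ = np.linalg.lstsq(M, b, rcond=None) # consistent overdetermined system
print(x)                                  # [1/3 1/3 1/3]
```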
Example 17. Solve the discrete five-step duel game in which each duelist has one
bullet. This game was formulated in 1.1.4 (see Example 3). The payoff matrix A of
Player 1 is skew-symmetric and is of the form

A = [  0  -3  -7 -11 -15 ]
    [  3   0   1  -2  -5 ]
    [  7  -1   0   7   5 ]
    [ 11   2  -7   0  15 ]
    [ 15   5  -5 -15   0 ].
Note that the first strategy of each player (the first row and first column of the matrix)
is strictly dominated; hence it cannot be essential and can be deleted. In the resulting
truncated matrix

A' = [  0  1  -2 -5 ]
     [ -1  0   7  5 ]
     [  2 -7   0 15 ]
     [  5 -5 -15  0 ]

not all strategies are essential.
Indeed, the symmetry of the game Γ_{A'} implies that v_{A'} = 0. If all strategies were
essential, the optimal strategy x* would be a solution of the system of equations

x* a^j = 0,  j = 2, 3, 4, 5,

Σ_{i=2}^5 ξ_i = 1,
and this system is inconsistent. Exhausting the different possibilities, we arrive at the essential
submatrix A'' composed of the rows and columns of the matrix A that are labeled
2, 3 and 5:

A'' = [  0  1 -5 ]
      [ -1  0  5 ]
      [  5 -5  0 ].

The game with the matrix A'' is completely mixed and has a unique solution y = x =
(5/11, 5/11, 1/11).
In the original game, we now consider the strategies x* = y* =
(0, 5/11, 5/11, 0, 1/11), which are optimal.
Thus, we finally have v_A = 0, and the saddle point (x*, y*) is unique. In terms
of the rules of the game, the duelist should not fire at the 1st step; he
must fire with equal probability after the 2nd or the 3rd step, never after the 4th step,
and only with small probability may he fire when the duelists are breast to breast.
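Since v_A = 0 in a symmetric game, optimality of x* reduces to the componentwise inequalities x* A ≥ 0 and A x* ≤ 0 (Theorem 1.7.1), which is easy to verify numerically; below is a sketch (ours, assuming NumPy).

```python
import numpy as np

# Verifying the solution of Example 17.
A = np.array([[  0, -3, -7, -11, -15],
              [  3,  0,  1,  -2,  -5],
              [  7, -1,  0,   7,   5],
              [ 11,  2, -7,   0,  15],
              [ 15,  5, -5, -15,   0]])
x = np.array([0, 5, 5, 0, 1]) / 11
print(np.all(x @ A >= 0), np.all(A @ x <= 0))   # True True
```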

1.10 Iterative methods of solving matrix games


The popular method of solving a matrix game by reducing it to a linear program-
ming problem has the disadvantage that the process of solving the linear programming
problem becomes complicated for matrices of large dimensions.
In such cases it is standard practice to employ decomposition methods for linear
programming problems when, instead of solving the problem with the original matrix,
we construct the corresponding problem with a matrix which has few rows but many
columns. A set of auxiliary linear programming problems with matrices of smaller
dimensions is solved at each iteration of the corresponding problem.
Unfortunately, decomposition methods are only effective for matrices of a special
type (e.g., block-diagonal matrices).
1.10.1. [Robinson (1950)]. Brown-Robinson iterative method (fictitious play
method). This method employs a repeated fictitious play of the game having a given
payoff matrix. One repetition of the game is called a play. Suppose the game is
played with an (m × n) matrix A = {a_ij}. In the 1st play both players choose ar-
bitrary pure strategies. In the kth play each player chooses the pure strategy which
maximizes his expected payoff against the observed empirical probability distribution
of the opponent's pure strategies in the preceding (k - 1) plays.
Thus, we assume that in the first k plays Player 1 uses the ith strategy ξ_i^k times
(i = 1, ..., m) and Player 2 uses the jth strategy η_j^k times (j = 1, ..., n). In the (k + 1)th
play, Player 1 will then use the strategy i_{k+1} and Player 2 will use his strategy j_{k+1}, where

v̄^k = max_i Σ_{j=1}^n a_ij η_j^k = Σ_{j=1}^n a_{i_{k+1} j} η_j^k

and

v_k = min_j Σ_{i=1}^m a_ij ξ_i^k = Σ_{i=1}^m a_{i j_{k+1}} ξ_i^k.

Let v be the value of the matrix game Γ_A. Consider the expressions

v̄^k / k = max_i Σ_{j=1}^n a_ij η_j^k / k,

v_k / k = min_j Σ_{i=1}^m a_ij ξ_i^k / k.

The vectors x^k = (ξ_1^k/k, ..., ξ_m^k/k) and y^k = (η_1^k/k, ..., η_n^k/k) are mixed strategies for
players 1 and 2, respectively; hence, by the definition of the value of the game, we
have

max_k v_k / k ≤ v ≤ min_k v̄^k / k.

We have thus obtained an iterative process which enables us to find an approxi-
mate solution of the matrix game, the degree of approximation to the true value of
the game being determined by the length of the interval [max_k v_k/k, min_k v̄^k/k]. Con-
vergence of the algorithm is guaranteed by the following theorem [Robinson (1950)].
Theorem.

lim_{k→∞} (min_k v̄^k / k) = lim_{k→∞} (max_k v_k / k) = v.

Example 18. Find an approximate solution to the game having the matrix

         a  b  c
    α [  2  1  3 ]
A =  β [  3  0  1 ]
    γ [  1  2  1 ].

Denote Player 1's strategies by α, β, γ, and Player 2's strategies by a, b, c. Suppose
the players first choose the strategies α and a, respectively. If Player 1 chooses strategy
α, then Player 1 can receive one of the payoffs (2, 1, 3). If Player 2 chooses strategy a,
then Player 1 can receive one of the payoffs (2, 3, 1). In the 2nd and 3rd plays, Player
1 chooses strategy β and Player 2 chooses strategy b, since these strategies ensure the
best results, etc.
Table 1.1 shows the results of the plays: the players' strategies, the accumulated payoffs,
and the average payoffs.
Thus, after 12 plays, we obtain the approximate solution

x^12 = (1/4, 1/6, 7/12),  y^12 = (1/12, 7/12, 1/3),

and the accuracy can be estimated by the number 5/12. The principal disadvantage
of this method is its low speed of convergence, which decreases further as the matrix dimension
increases. This also results from the nonmonotonicity of the sequences v̄^k/k and v_k/k.
We now consider another iteration algorithm which is free of the above-mentioned disad-
vantages.
Play  Player 1's  Player 2's  Player 1's payoff    Player 2's payoff    v̄^k/k    v_k/k
 k    choice      choice       α    β    γ           a    b    c
 1    α           a            2    3    1           2    1    3        3        1
 2    β           b            3    3    3           5    1    4        3/2      1/2
 3    β           b            4    3    5           8    1    5        5/3      1/3
 4    γ           b            5    3    7           9    3    6        7/4      3/4
 5    γ           b            6    3    9          10    5    7        9/5      5/5
 6    γ           b            7    3   11          11    7    8       11/6      7/6
 7    γ           b            8    3   13          12    9    9       13/7      9/7
 8    γ           c           11    4   14          13   12   10       14/8     10/8
 9    γ           c           14    5   15          14   12   11       15/9     11/9
10    γ           c           17    6   16          15   14   12       17/10    12/10
11    α           c           20    7   17          17   15   15       20/11    15/11
12    α           b           21    7   19          19   16   18       21/12    16/12

Table 1.1

1.10.2. [Sadovsky (1978)]. Monotonic iterative method of solving matrix games.
We consider a mixed extension Γ_A = (X, Y, K) of the matrix game having the (m × n)
matrix A.
Denote by x^N = (ξ_1^N, ..., ξ_m^N) ∈ X the approximation of Player 1's optimal strat-
egy at the Nth iteration, and by c^N = (γ_1^N, ..., γ_n^N) ∈ R^n an auxiliary vector.
The algorithm makes it possible to find (exactly or approximately) an optimal strategy
for Player 1 and the value of the game v.
At the start of the process, Player 1 chooses an arbitrary vector of the form
c^0 = a_{i_0}, where a_{i_0} is the row of the matrix A having the number i_0.
The iterative process is constructed as follows. Suppose the (N - 1)th iteration has been per-
formed and the vectors x^{N-1}, c^{N-1} have been obtained. Then x^N and c^N are computed from the
following iterative formulas:

x^N = (1 - α_N) x^{N-1} + α_N x̄^N,    (1.10.1)

c^N = (1 - α_N) c^{N-1} + α_N c̄^N,    (1.10.2)

where 0 ≤ α_N ≤ 1. The vectors x̄^N and c̄^N are obtained as follows.
We consider the vector c^{N-1} = (γ_1^{N-1}, ..., γ_n^{N-1}) and select the indices on which
the minimum is achieved. Let us denote

v_{N-1} = min_{j=1,...,n} γ_j^{N-1}    (1.10.3)

and let J^{N-1} = {j_1, ..., j_k} be the set of indices on which (1.10.3) is achieved.
Let Γ^N ⊂ Γ_A be the subgame of the game Γ_A with the matrix A^N = {a_{i j_{N-1}}},
i = 1, ..., m, j_{N-1} ∈ J^{N-1}. Solve this subgame and find an optimal
strategy x̄^N ∈ X of Player 1 in it. Let x̄^N = (ξ̄_1^N, ..., ξ̄_m^N).

Compute the vector c̄^N = Σ_{i=1}^m ξ̄_i^N a_i. Suppose the vector c̄^N has the components
c̄^N = (γ̄_1^N, ..., γ̄_n^N). Consider the (2 × n) game having the matrix

[ γ_1^{N-1}  ...  γ_n^{N-1} ]
[ γ̄_1^N     ...  γ̄_n^N    ].

Find Player 1's optimal strategy (1 - α_N, α_N), 0 ≤ α_N ≤ 1, in this subgame.
Substituting the obtained values x̄^N, c̄^N, α_N into (1.10.1), (1.10.2), we find x^N and
c^N. We continue the process until the equality α_N = 0 is satisfied or the required
accuracy of computations is achieved. Convergence of the algorithm is guaranteed by
the following theorem [Sadovsky (1978)].
Theorem. Let {v_N}, {x^N} be the iterative sequences determined by (1.10.1)-
(1.10.3). Then the following assertions are true.

1. v_N > v_{N-1}, i.e. the sequence {v_N} strictly and monotonically increases;

2.
lim_{N→∞} v_N = v_A = v;    (1.10.4)

3. lim_{N→∞} x^N = x*, where x* ∈ X* is an optimal strategy of Player 1.
Example 19. By employing the monotonic algorithm, solve the game with the matrix

A = [ 2  1  3 ]
    [ 3  0  1 ]
    [ 1  2  1 ].

Iteration 0. Suppose Player 1 chooses the 1st row of the matrix A, i.e. x^0 = (1, 0, 0)
and c^0 = a_1 = (2, 1, 3). Compute v_0 = min_j γ_j^0 = γ_2^0 = 1, J^0 = {2}.
Iteration 1. Consider the subgame Γ^1 ⊂ Γ having the matrix

A^1 = [ 1 ]
      [ 0 ]
      [ 2 ].

An optimal strategy x̄^1 of Player 1 is the vector x̄^1 = (0, 0, 1). Then c̄^1 = a_3 = (1, 2, 1).
Solve the (2 × 3) game with the matrix

[ 2  1  3 ]
[ 1  2  1 ].

Note that the 3rd column of the matrix is dominated, and so we consider the matrix

[ 2  1 ]
[ 1  2 ].

Because of the symmetry, Player 1's optimal strategy in this game is the vector (1 - α_1, α_1) =
(1/2, 1/2).
We compute x^1 and c^1 by the formulas (1.10.1), (1.10.2). We have

x^1 = 1/2 x^0 + 1/2 x̄^1 = (1/2, 0, 1/2),

c^1 = 1/2 c^0 + 1/2 c̄^1 = (3/2, 3/2, 2),

v_1 = min_j γ_j^1 = γ_1^1 = γ_2^1 = 3/2 > v_0 = 1.

The set of indices is of the form J^1 = {1, 2}.
Iteration 2. Consider the subgame Γ^2 ⊂ Γ having the matrix

A^2 = [ 2  1 ]
      [ 3  0 ]
      [ 1  2 ].

The first row in this matrix is dominated; hence it suffices to examine the subma-
trix

[ 3  0 ]
[ 1  2 ].

Player 1's optimal strategy in this game is the vector (1/4, 3/4); hence x̄^2 =
(0, 1/4, 3/4).
Compute c̄^2 = 1/4 a_2 + 3/4 a_3 = (3/2, 3/2, 1) and consider the (2 × 3) game with
the matrix

[ 3/2  3/2  2 ]
[ 3/2  3/2  1 ].

The first strategy of Player 1 dominates the second one, and hence α_2 = 0.
This completes the computations: x* = x^1 = (1/2, 0, 1/2); the value of the game is
v = v_1 = 3/2, and Player 2's optimal strategy is of the form y* = (1/2, 1/2, 0) (see
Example 18).

1.11 Exercises and problems


1. Each of the two players shows m fingers (1 ≤ m ≤ n, n ≤ 5) and simulta-
neously calls his guess of the number of fingers his opponent will show. If just one
player guesses correctly (the opponent's guess being incorrect), he wins an amount
equal to the sum of the fingers shown by both players. In all other cases the players'
payoffs are zero.
(a) How many strategies will each player have for n = 3?
(b) Construct the game matrix for n = 2.
2. Allocation of search efforts. Player 2 hides in one of n cells. Player 1 has
a team of r searchers to be allocated to the cells for the purpose of finding Player 2.
For example, (r - 1) searchers can be allocated to the first cell, one searcher to the
second cell, and no searchers to the remaining cells, etc.
In a search conducted by one searcher, the probability of finding Player 2
in the ith cell (if he is there) is assumed to be known. The findings of Player 2 by each of
the searchers are independent events.
Player 1's payoff is the probability of finding Player 2 under the given allocation of
searchers.
(a) Compute the number m of pure strategies of Player 1.
(b) Construct the game matrix.
3. Searching for many objects. Player 2 hides m black balls in n containers. The
total number of balls (black and white) in the jth container is l_j, j = 1, ..., n. Player
2 has to allocate m black balls to the n containers, with the total number of balls in each
container being fixed and equal to l_j, l_j ≥ m.
The opponent (Player 1) tries to find as many black balls as possible and has an
opportunity to examine one of the containers. In examining the ith container,
Player 1 chooses at random (equiprobably) m balls from the l_i balls, and his payoff is the
mathematical expectation of the number of black balls in the sample of m balls.
(a) Let p_i black balls be hidden in the ith container. Compute the probability
p_ij that the sample of r balls chosen from the ith container contains exactly j black
balls.
(b) Construct the game matrix.
4. Air defense. An air defense system can use three types of weapons (1, 2, 3) to hit an air
target; the weapons are to be allocated to two launcher units. The enemy (Player 2)
has two types of aircraft (type 1 and type 2). The probabilities of hitting the planes
by one defense system are summarized in the matrix

       1     2
1 [  0.3   0.5 ]
2 [  0.5   0.3 ]
3 [  0.1   0.6 ].

It is assumed that only one plane is attacking.
Player 1's payoff is the probability of hitting the plane by the air defense system.
(a) Construct the game matrix.
(b) Find out whether there exists a solution in pure strategies.
5. Find a saddle point and the value of the following games:

(a) [ 3  5 ]      (b) [ 1/2   0   1/2 ]
    [ 3  2 ]          [ 1    3/2   2  ]
                      [ 0   -1    7/4 ].

6. Verify that v = 2 and the pair (x*, y*), where x* = (0, 0, 1), y* = (2/5, 3/5, 0),
are respectively the value and the saddle point in the game with the matrix

[  3 -2  4 ]
[ -1  4  2 ]
[  2  2  6 ].
7. Let A'{A") be the submatrix of the matrix A obtained by deleting a series of
rows (columns) of A. Show that the inequalities vA> < vA < vA, where VA>,VA" are
the values of the games I V , r^, respectively, are satisfied.
" 1 3 - 3
8. Consider the game r A -, with the matrix 2 0 3 . The value of the
2 1 0
game is vA = 1 and Player l's optimal strategy is x" = (1/3,2/3,0). Find an optimal
strategy y* for Player 2.
9. Solve the matrix game with the matrix

[ -4  0 ]
[  3 -2 ]
[  5 -3 ]
[ -1 -1 ]

using the graphical method.
10. Show that a strictly dominated strategy cannot be essential.
11. Show that the 3rd row of the matrix A is dominated, if

A = [ 20  0 ]
    [  0  8 ]
    [  4  5 ].
12. Show that the choice of the 1st column is equivalent to the mixed strategy
y = (0, 1/3, 2/3) if the game matrix is of the form .
13. Employing the notion of dominance, find a solution of the game with the
matrix

[ 1  7  2 ]
[ 6  2  7 ]
[ 5  1  6 ].
14. Prove Theorem 1.7.3.
15. Solve the one-trial search game. Player 2 conceals an object in one of n
cells. Player 1 searches for it in one of these cells, the probability of finding the
object in the ith cell being equal to β_i > 0, i = 1, ..., n (if the object is hidden in the cell
i). Show that the game under consideration is completely mixed. Find a solution of
the game.
16. Solve the discrete search game (Example 5, 1.1.3) under the assumption that
β_i τ_i ≠ 0, i = 1, ..., n.
Hint. Make use of the result from 1.7.7.
17. Two-object search game. Player 2 conceals two objects in n cells (both objects
can be placed in the same cell). Player 1's objective is to find at least one object;
he has an opportunity to examine one cell (β_k > 0 is the probability of finding one
object in the kth cell if the object is hidden in this cell). If the kth cell contains
two objects simultaneously, the probability of finding them both is β_k². Thus the
matrix A = {α_kα}, α = (i, j), i, j = 1, ..., n, is of the form

α_kα = 0,  k ≠ i, k ≠ j,
α_kα = β_i,  i = k, i ≠ j,
α_kα = β_j,  j = k, i ≠ j,
α_kα = β_k (2 - β_k),  i = j = k.

Solve the game.
18. Solve the search game with many hidden objects (see Exercise 3).
19. Search game for several sets on a plane. A family of n fixed convex com-
pact sets K_1, K_2, ..., K_n ⊂ R² and a system of m convex compact congruent sets
T_1, ..., T_m ⊂ R² are given. The simultaneous discrete search game is as follows.
Player 2 "hides" the m sets T_j (j = 1, ..., m) in the n sets K_i (i = 1, ..., n) in such a manner
that each set T_j intersects one and only one set K_i. Player 2's pure strategy is of the
form

α = (p_1, p_2, ..., p_n) ∈ R^n,  Σ_{i=1}^n p_i = m,

where p_i is the number of sets T_j hidden in the set K_i.
Player 1 can examine one of the sets K_i by selecting a point x from K_i. Player
1's payoff is the number of sets {T_j} to which x belongs.
Find a solution of the game.

20. Search game with two trials for the searcher. Player 2 hides an object in one of
the n cells and Player 1 (the searcher) searches for it in one of these cells. Player 1 has
an opportunity to examine two cells (repeated examination of a cell is not allowed).
Player 1's set of pure strategies consists of the pairs (i, j), i = 1, ..., n, j = 1, ..., n,
i ≠ j, and contains C_n² elements. Player 2's set of pure strategies contains n elements
k = 1, ..., n. The payoff matrix is of the form A = {α_(ij)k}, where

α_(ij)k = β_k,  if i = k or j = k  (β_k > 0),
α_(ij)k = 0,  otherwise.

Find a solution of the game.


21. In the search game with two trials for the searcher, consider the case where
Player 1's set of pure strategies consists of all possible pairs (i, j) and contains n²
elements. Solve the game under the assumption that repeated examination of the same
cell is allowed.
22. In the evasion-type game (see 1.7.1), show that Player 1 always has a unique
optimal strategy.
Chapter 2
Infinite zero-sum two-person games

2.1 Infinite games


2.1.1. This chapter deals with zero-sum two-person games differing from matrix
games in that one or both players in such games have infinite (countable or con-
tinuum) sets of strategies. From the game-theoretic point of view this difference is
unimportant, since the game continues to be a zero-sum two-person game and the
problem is only one of employing more sophisticated analytical techniques of research.
Thus we shall examine general zero-sum two-person games, i.e. systems of the
form

Γ = (X, Y, H),    (2.1.1)

where X and Y are arbitrary infinite sets whose elements are the strategies of
players 1 and 2, respectively, and H : X × Y → R¹ is the payoff function of Player 1.
Recall that the rules for zero-sum two-person games are described in 1.1.1. Player 2's
payoff in the situation (x, y) is [-H(x, y)], x ∈ X, y ∈ Y (the game being zero-sum).
In this chapter games with a bounded payoff function H are considered.
2.1.2. Example 1. (Simultaneous planar pursuit-evasion game.) Let S_1 and S_2
be sets on a plane. Player 1 chooses a point x ∈ S_1 and Player 2 chooses a point
y ∈ S_2. In making his choice, neither player has information on the opponent's actions,
and hence such a choice can be conveniently interpreted as simultaneous. In this case,
the points x ∈ S_1, y ∈ S_2 are strategies of players 1 and 2, respectively. Thus
the players' sets of strategies coincide with the sets S_1 and S_2 on the plane.
Player 2's objective is to minimize the distance between himself and Player 1
(Player 1 pursues the opposite objective). Therefore, by Player 1's payoff H(x, y)
in this game is meant the Euclidean distance ρ(x, y) between the points x ∈ S_1 and
y ∈ S_2, i.e. H(x, y) = ρ(x, y), x ∈ S_1, y ∈ S_2. Player 2's payoff is taken to be equal to
[-ρ(x, y)] (the game being zero-sum).
Example 2. (Search on a closed interval.) [Diubin and Suzdal (1981)]. The
simplest search game with an infinite number of strategies is the following.
Player 2 (the Hider) chooses a point y ∈ [0, 1] and Player 1 (the Searcher) chooses, si-
multaneously and independently, a point x ∈ [0, 1]. The point y is considered to be
"detected" if |x - y| ≤ l, where 0 < l < 1. In this case, Player 1 wins an amount +1;
otherwise his payoff is 0. The game is zero-sum.
Thus the payoff function is

H(x, y) = 1,  if |x - y| ≤ l,
H(x, y) = 0,  otherwise.

The payoff to Player 2 is [-H(x, y)].
Example 3. (Search on a sphere.) Let a sphere C of radius R be given in R³. Player
1 (the Searcher) chooses a system of points x_1, x_2, ..., x_s ∈ C and Player 2 chooses
one point y ∈ C. The players make their choices simultaneously and independently
of one another. Player 2 is said to be detected if the point y ∈ C is found in the
r-neighborhood of one of the points x_j, j = 1, ..., s. Here, by the r-neighborhood of
the point x_j is meant the segment of the sphere (a spherical cap) having its apex at the point x_j
with r as the base radius (Fig. 2.1). In what follows, the r-neighborhood of the point
x_j is denoted by S(x_j, r).

Figure 2.1

The objective of Player 1 is to find Player 2, whereas Player 2 pursues the opposite
objective. Accordingly, the payoff to Player 1 is

H(x, y) = 1,  if y ∈ M_x,
H(x, y) = 0,  otherwise,

where x = (x_1, ..., x_s) and M_x = ∪_{j=1}^s S(x_j, r). The payoff to Player 2 is [-H(x, y)].
Example 4. (Noisy duel.) [Karlin (1959)]. Each duelist has only one bullet to
fire. The duel is assumed to be a noisy duel because each duelist is informed of his
opponent's action, the firing of his bullet, as soon as it takes place. Further, it is assumed
that the accuracy function p_1(x) (the probability of hitting the opponent at the instant
x) for Player 1 is defined on [0, 1], is continuous and increases monotonically in x, and
p_1(0) = 0, p_1(1) = 1. Similarly, the accuracy of Player 2 is described by the function
p_2(y) on [0, 1], where p_2(0) = 0, p_2(1) = 1. If Player 1 hits Player 2, his payoff is +1.
If Player 2 hits Player 1, his payoff is -1. If, however, both players fire simultaneously
and achieve the same result (positive or negative), the payoff to Player 1 is 0.
The information structure in this game (the fact that the weapons are noisy)
is taken into account in constructing the payoff function H(x, y). If x < y, the
probability of Player 1's hitting the opponent is p_1(x) and the probability of Player
1's missing is 1 - p_1(x). If Player 2 has not yet fired and knows that Player 1 cannot
fire any more, Player 2 can obtain a sure hit by waiting until y is equal to 1. Thus,
if Player 1 misses at the instant x, he is sure to be hit by Player 2 provided x < y;
hence

H(x, y) = p_1(x) + (-1)[1 - p_1(x)],  x < y.

Similarly, we have

H(x, y) = p_2(y)(-1) + [1 - p_2(y)] · 1,  x > y,

and

H(x, y) = p_1(x)[1 - p_2(y)] + p_2(y)[1 - p_1(x)](-1),  x = y.

Thus, the payoff function H(x, y) in the game is

H(x, y) = 2 p_1(x) - 1,  x < y,
H(x, y) = p_1(x) - p_2(y),  x = y,
H(x, y) = 1 - 2 p_2(y),  x > y,

where x ∈ [0, 1], y ∈ [0, 1].
Example 5. (Silent duel.) [Karlin (1959)]. In a silent duel each duelist has one
bullet, but neither duelist knows whether his opponent has fired. For simplicity, let
the accuracy functions be given by p_1(x) = p_2(x) = x. Then the payoff function
describing the game is

H(x, y) = x - (1 - x) y,  x < y,
H(x, y) = 0,  x = y,
H(x, y) = -y + (1 - y) x,  x > y,

where x ∈ [0, 1], y ∈ [0, 1]. In this game the payoff function H(x, y) is constructed
in the same manner as in Example 4, except that neither duelist can determine the
time of his opponent's firing if the opponent misses.
Example 6. ("Noisy" target search.) Consider the search problem for a "noisy"
target. In this problem the "noisy" target (Player 2) is to be detected by a mobile
facility (Player 1). The detector range /(x,y), depending on the velocities x G [x 0 ,x 1 ]
and y 6 [yo,yi] of the players 1 and 2, respectively, is of the form

l{x,y)=1{y)kLzEL
(Xi - X 0 )
52 Chapter 2. Infinite zero-sum two-person games

where l(y) - l0 + 0(y - y0),

(yi-yo)'
/j = 7(yi), /0 = f(yo)- Positive numbers /i, /0 are said to be given. Thus

io(yi - y) + h(y - yo) (xx-x)


x,y) =
(y\ - yo) (*i - x0)
By Player l's payoff function H(x,y) is meant the search capacity, i.e. the searched
area per unit time H{x,y) = 2x l(x,y). The payoff to Player 2 is [-H(x,y)]. We
obtain the game with the payoff function

Htx y) = 2 i / ^ 1 ~y) + 'i(y- yo) (** ~ * )


(Vi - yo) (xi - x0)
where ar [i 0 ,xi],y [y 0 ,yi]-
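As an illustration (ours; the velocity bounds and detection ranges below are assumed sample values, with l_0 > l_1 so that a faster target is harder to detect), one can tabulate H on a grid and observe that the maximin and minimax coincide, i.e. the game has a pure-strategy saddle point (compare Exercise 7 of this chapter).

```python
import numpy as np

x0, x1, y0, y1, l0, l1 = 1.0, 10.0, 1.0, 5.0, 2.0, 1.0   # assumed sample data
X = np.linspace(x0, x1, 201)
Y = np.linspace(y0, y1, 201)
H = 2 * X[:, None] * (l0*(y1 - Y) + l1*(Y - y0)) / (y1 - y0) \
    * (x1 - X[:, None]) / (x1 - x0)
print(H.min(axis=1).max(), H.max(axis=0).min())   # equal: maximin == minimax
```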
2.1.3. In conclusion, we point out a special class of zero-sum two-person games
in which X = Y = [0, 1]. In these games, situations are the pairs of numbers (x, y),
where x, y ∈ [0, 1]. Such games are called games on the unit square. The class of
games on the unit square is basic in the examination of infinite games. In particular,
Examples 2, 4, 5 are illustrative of unit square games. Also, Example 6 becomes a
unit square game if we set x_0 = y_0 = 0, x_1 = y_1 = 1.

2.2 ε-saddle points, ε-optimal strategies


2.2.1. In the infinite game, as in any zero-sum two-person game Γ = (X, Y, H), the
principle of the players' optimal behavior is the saddle point (equilibrium) principle. The
point (x*, y*) for which the inequality

H(x, y*) ≤ H(x*, y*) ≤ H(x*, y)    (2.2.1)

holds for all x ∈ X, y ∈ Y is called a saddle point. This principle may be realized in
the game Γ if and only if

v = v̲ = v̄ = H(x*, y*),

where

v̲ = max_x inf_y H(x, y),  v̄ = min_y sup_x H(x, y),    (2.2.2)

i.e. the external extrema of the maximin and the minimax are achieved and the lower value
of the game v̲ is equal to the upper value of the game v̄. The game Γ for which
(2.2.2) holds is called strictly determined, and the number v is the value of the game
(see 1.3.4).
For matrix games, the existence of a saddle point and the equality of the maximin
to the minimax were proved in the class of mixed strategies (see Sec. 1.6); hence a
solution consists in finding their common value v and those strategies x*, y* on
which the external extrema in (2.2.2) are achieved.

In infinite games, the existence of the external extrema in (2.2.2) cannot be proved in
the general case.
2.2.2. Example 7. Suppose each of the players 1 and 2 chooses a number from the
open interval (0, 1). Then Player 1 receives a payoff equal to the sum of the chosen
numbers. In this manner we obtain the game on the open unit square with the payoff
function H(x, y) for Player 1:

H(x, y) = x + y,  x ∈ (0, 1),  y ∈ (0, 1).    (2.2.3)

Here the situation (1, 0) would be an equilibrium if 1 and 0 were among the players'
strategies, with the game value v = 1. Actually, the external extrema in
(2.2.2) are not achieved, but at the same time the upper value is equal to the lower
value of the game. Therefore v = 1, and Player 1 can always receive a payoff
sufficiently close to the game value by choosing a number 1 - ε, ε > 0, sufficiently
close to 1. On the other hand, by choosing ε > 0 sufficiently small (close
to 0), Player 2 can guarantee that his loss will be arbitrarily close to the value of the
game.
2.2.3. Definition. The point (x_ε, y_ε) in the zero-sum two-person game Γ =
(X, Y, H) is called an ε-equilibrium point if the following inequality holds for any
strategies x ∈ X and y ∈ Y of players 1 and 2, respectively:

H(x, y_ε) - ε ≤ H(x_ε, y_ε) ≤ H(x_ε, y) + ε.    (2.2.4)

The point (x_ε, y_ε), for which (2.2.4) holds, is called an ε-saddle point, and the strategies
x_ε and y_ε are called ε-optimal strategies of players 1 and 2, respectively.
Compare the definitions of the saddle point (2.2.1) and the ε-saddle point (2.2.4).
A deviation from an optimal strategy cannot increase the player's payoff, whereas a deviation
from an ε-optimal strategy may increase the payoff by no more than ε.
Thus, the point (1 - ε, ε), 0 < ε < 1, is an ε-equilibrium in Example 7, and the
strategies x_ε = 1 - ε, y_ε = ε are ε-optimal strategies of players 1 and 2, respectively.
2.2.4. Note that the following result holds for two strategically equivalent games
Γ = (X, Y, H) and Γ' = (X, Y, H'), where H' = βH + α, β > 0: if (x_ε, y_ε) is an ε-
equilibrium point in the game Γ, then it is a (βε)-equilibrium point in the game Γ'
(compare this with the Scale Lemma in Sec. 1.3).
2.2.5. The following theorem yields the main property of ε-optimal strategies.
Theorem. For the finite value v of the zero-sum two-person game Γ = (X, Y, H)
to exist, it is necessary and sufficient that, for any ε > 0, there be ε-optimal strategies
x_ε, y_ε of players 1 and 2, respectively, in which case

lim_{ε→0} H(x_ε, y_ε) = v.    (2.2.5)

Proof. Necessity. Suppose the game Γ has the finite value v. For any ε > 0 we
choose a strategy y_ε from the condition

sup_x H(x, y_ε) - ε/2 ≤ v    (2.2.6)
and a strategy x_ε from the condition

inf_y H(x_ε, y) + ε/2 ≥ v.    (2.2.7)

From (2.2.2), (2.2.6), (2.2.7) we obtain the inequality

H(x, y_ε) - ε/2 ≤ v ≤ H(x_ε, y) + ε/2    (2.2.8)

for all strategies x, y. Consequently,

|H(x_ε, y_ε) - v| ≤ ε/2.    (2.2.9)

The relations (2.2.4), (2.2.5) follow from (2.2.8), (2.2.9).


Sufficiency. If the inequalities (2.2.4) hold for any number ε > 0, then

v̄ = inf_y sup_x H(x, y) ≤ sup_x H(x, y_ε) ≤ H(x_ε, y_ε) + ε

≤ inf_y H(x_ε, y) + 2ε ≤ sup_x inf_y H(x, y) + 2ε = v̲ + 2ε.    (2.2.10)

Hence it follows that v̄ ≤ v̲; but, according to the Lemma given in 1.2.2, the inverse
inequality also holds. Thus, it remains to prove that the value of the game Γ is finite.
Let us take a sequence {ε_n} such that lim_{n→∞} ε_n = 0. Let ε_k ∈ {ε_n}, ε_{k+m} ∈ {ε_n},
where m is any fixed natural number. We have

H(x_{ε_{k+m}}, y_{ε_k}) + ε_{k+m} ≥ H(x_{ε_{k+m}}, y_{ε_{k+m}}) ≥ H(x_{ε_k}, y_{ε_{k+m}}) - ε_{k+m},

H(x_{ε_k}, y_{ε_{k+m}}) + ε_k ≥ H(x_{ε_k}, y_{ε_k}) ≥ H(x_{ε_{k+m}}, y_{ε_k}) - ε_k.

Thus,

|H(x_{ε_k}, y_{ε_k}) - H(x_{ε_{k+m}}, y_{ε_{k+m}})| ≤ ε_k + ε_{k+m} = δ_{km}.

Since lim_{k→∞} δ_{km} = 0 for any fixed value of m, there exists a finite limit
lim_{k→∞} H(x_{ε_k}, y_{ε_k}). From the relation (2.2.10) we obtain the inequality |H(x_ε, y_ε) -
v| ≤ ε; hence v = lim_{ε→0} H(x_ε, y_ε). This completes the proof of the theorem.
2.2.6. To illustrate the definitions given in this section, we shall consider in
greater detail Example 1 of 2.1.2.
Example 8. Suppose the sets S_1 and S_2 are the closed circles of radii R_1 and
R_2 (R_1 < R_2), with centers O and O_1, respectively. Find the lower value of the game:

v̲ = max_{x∈S_1} min_{y∈S_2} ρ(x, y).

Let x_0 ∈ S_1. Then min_{y∈S_2} ρ(x_0, y) is achieved at the intersection point y_0 of the
straight line, passing through the center O_1 of the circle S_2 and the point x_0, and the
boundary of the circle S_2. Evidently, the quantity min_{y∈S_2} ρ(x_0, y) is maximal at
the point M ∈ S_1 where the line of centers OO_1 (Fig. 2.2) intersects the boundary of
the circle S_1, at the point farthest from O_1.
Thus, v̲ = |O_1M| - R_2.

Figure 2.2

In order to compute the upper value of the game

v̄ = min_{y∈S_2} max_{x∈S_1} ρ(x, y)

we shall consider two cases.
Case 1. The center O of the circle S_1 belongs to the set S_2 (Fig. 2.3). For each
y_0 ∈ S_2 the point x_0 providing max_{x∈S_1} ρ(x, y_0) is constructed as follows.
Let x_0' and x_0'' be the intersection points of the line Oy_0 and the boundary of the
circle S_1, and let x_0 be the one farthest from the point y_0. Then x_0 is determined from the condition

ρ(x_0, y_0) = max_{x∈S_1} ρ(x, y_0).

By construction, for all y_0 ∈ S_2

max_{x∈S_1} ρ(x, y_0) = ρ(x_0, y_0) ≥ R_1.

With y_0 = O, however, we get

max_{x∈S_1} ρ(x, O) = R_1;

hence

min_{y∈S_2} max_{x∈S_1} ρ(x, y) = v̄ = R_1.

Figure 2.3

It can be readily seen that, since O ∈ S_2, in Case 1 v̄ = R_1 ≥ |O_1M| - R_2 = v̲.
Furthermore, we get equality only if O belongs to the boundary of the set S_2.
Thus, if in Case 1 the point O does not belong to the boundary of the set S_2, then
the game has no saddle point. If, however, the point O belongs to the boundary of
the set S_2, then there exists a saddle point, and an optimal strategy for Player 1 is

to choose the point M lying at the intersection of the line of centers OO_1 with the
boundary of the set S_1 that is farthest from the point O_1. An optimal strategy for
Player 2 is to choose the point y = O coinciding with the center O of the circle S_1.
In this case the value of the game is v = v̲ = v̄ = R_1 + R_2 - R_2 = R_1.
Case 2. The center O of the circle S₁ does not belong to S₂. This case is considered in the same
way as Case 1 with the center of the circle S₁ belonging to the boundary of the set
S₂. Compute the quantity v̄ (Fig. 2.4). Let y₀ ∈ S₂. Then the point x₀ providing
max_{x∈S₁} ρ(x, y₀) coincides with the intersection point of the straight line, passing
through y₀ and the center O of the circle S₁, and the boundary of the circle S₁ that
is farthest from the point y₀. Indeed, the circle of radius |x₀y₀| with its center at
the point y₀ contains S₁, and its boundary is tangent to the boundary of the circle
S₁ at the unique point x₀. Evidently, the quantity max_{x∈S₁} ρ(x, y) = ρ(x₀, y) takes
its minimum value at the intersection point M₁ of the line segment OO₁ and the
boundary of the circle S₂. Thus, in the case under study

v̄ = min_{y∈S₂} max_{x∈S₁} ρ(x, y) = |O₁M| − R₂.

Optimal strategies for players 1 and 2 are to choose the points M ∈ S₁ and M₁ ∈ S₂,
respectively.
If the open circles S₁ and S₂ are considered to be the strategy sets in Example 1,
ref. 1.1.2, then in Case 2 the value of the game exists and is equal to

v = sup_{x∈S₁} inf_{y∈S₂} ρ(x, y) = inf_{y∈S₂} sup_{x∈S₁} ρ(x, y) = |O₁M| − R₂.

Optimal strategies, however, do not exist, since M ∉ S₁ and M₁ ∉ S₂. Nevertheless, for
any ε > 0 there are ε-optimal strategies, namely the points from the ε-neighborhoods
of the points M and M₁ belonging respectively to the sets S₁ and S₂.
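The geometric analysis above can be checked numerically. The following Python sketch (an illustration added here, not part of the original argument; the centers and radii are arbitrary choices) estimates the lower and upper values of Example 8 on finite grids of the two closed circles; in the configuration chosen (Case 2) both estimates approach |O₁M| − R₂ = 3.

import numpy as np

R1, R2 = 1.0, 2.0
O, O1 = np.array([4.0, 0.0]), np.array([0.0, 0.0])   # O outside S2: Case 2

def disk(center, radius, n=60):
    # a crude grid of points of a closed disk
    t = np.linspace(0, 2*np.pi, n, endpoint=False)
    r = np.linspace(0, radius, n // 3)
    return np.array([center + s*np.array([np.cos(a), np.sin(a)])
                     for s in r for a in t])

S1, S2 = disk(O, R1), disk(O1, R2)
D = np.linalg.norm(S1[:, None, :] - S2[None, :, :], axis=2)
lower = D.min(axis=1).max()     # max_x min_y rho(x, y)
upper = D.max(axis=0).min()     # min_y max_x rho(x, y)
print(lower, upper)             # both close to |OO1| + R1 - R2 = 3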
Figure 2.4

2.2.7. In conclusion, it should be noted that the game in Example 6 has an
equilibrium point in pure strategies (see Exercise 7), whereas the games in Examples
1–5, generally, have neither an equilibrium point nor a game value. Thus, in Example
2 Player 1 has an optimal strategy x* = 1/2 when l > 1/2, and the game value is 1
(any strategy of Player 2 being optimal).

2.3 Mixed strategies


2.3.1. Consider the zero-sum two-person game Γ = (X, Y, H). If it has no value, then
v̄ ≠ v̲. As noted in Sec. 1.4, in this case it is important for a player to know the
opponent's intention in order to increase his payoff. Although the rules of the game
do not provide such a possibility, after playing the game with the same opponent
sufficiently many times one can statistically estimate the chances of the opponent
choosing one or another strategy, and take a proper course of action. But what course
of action should the player take if he wishes to keep his intentions secret? The only
rational course of action here is to choose a strategy by using some chance device,
i.e. it is necessary to use mixed strategies.
We shall give a formal definition of the mixed strategy for the infinite game.
2.3.2. Let 𝒳 be some σ-algebra of subsets of the set X (including the singletons
x ∈ X) and let 𝒴 be a σ-algebra of subsets of Y ({y} ∈ 𝒴 if y ∈ Y). Denote by X̄ and
Ȳ the sets of all probability measures on the σ-algebras 𝒳 and 𝒴, respectively, and
let the function H be measurable with respect to the σ-algebra 𝒳 × 𝒴. Consider the
integral

K(μ, ν) ≐ ∫_X ∫_Y H(x, y) dν(y) dμ(x),  μ ∈ X̄, ν ∈ Ȳ,  (2.3.1)

representing the mathematical expectation of the payoff H(x, y) for the measures μ, ν
[Prokhorov and Riazanov (1967)].
Definition. A mixed extension of the game Γ = (X, Y, H) is the zero-sum two-
person game in normal form with the strategy sets X̄, Ȳ and the payoff function
K(μ, ν), i.e. the game Γ̄ = (X̄, Ȳ, K).

The players' behavior in the mixed extension of the game Γ can be interpreted as
follows. The players choose, independently of each other, the measures μ ∈ X̄ and
ν ∈ Ȳ. In accordance with these measures they implement (e.g. by a table of
random numbers) a random choice of the strategies x ∈ X and y ∈ Y. Thereafter
Player 1 receives the payoff H(x, y). The strategies μ ∈ X̄, ν ∈ Ȳ are called mixed,
and x ∈ X, y ∈ Y are called pure strategies in the game Γ̄.
Introduction of a mixed extension of the infinite game deserves further comment.
The sets X̄ and Ȳ depend on which σ-algebras 𝒳 and 𝒴 the probability measures are
considered on. In the case of matrix games (the sets X and Y are finite), in a mixed
extension the players choose their strategies in accordance with probability distributions
over the sets X and Y. If X is an infinite set and we act just as in the
finite case, then we need to consider measures for which all the subsets of the infinite
set X are measurable. Such measures, however, are very special, being concentrated
on sets that are at most countable. Where only such measures are used, the
players impoverish their possibilities and cannot always guarantee the existence of an
equilibrium point in mixed strategies. Therefore less extensive σ-algebras,
on which the probability measures are defined, are used. This substantially increases the
range of probability measures (and, as a rule, guarantees the existence of an equilibrium
point in mixed strategies). In this case, however, not every function H on
X × Y proves to be measurable, and hence there is no way to define the mathematical
expectation of the payoff, and thereby the concepts of an equilibrium, the
value of the game and optimal strategies. Thus, a trade-off is required here. From the
point of view of finding a solution it is desirable that the mixed strategies be of the
simplest form and that at least the value of the game exist in this extension.
Strictly speaking, the integral in (2.3.1) should be taken over the measure μ × ν on
the Cartesian product X × Y. By the rules of the zero-sum two-person game, however,
the mixed strategies (measures) μ and ν are selected by the players simultaneously and
independently, i.e. the probability measures μ and ν are stochastically independent.
Definition. A pair of probability measures μ ∈ X̄, ν ∈ Ȳ that are stochastically
independent is called the situation (μ, ν) in mixed strategies.
Thus, in the situation (μ, ν) in mixed strategies the payoff K(μ, ν) is the iterated
integral (2.3.1). Singletons belong to the σ-algebra of subsets of the strategy set
on which the probability measures are defined; hence every pure strategy x (y)
can be placed in correspondence with the probability measure μ_x ∈ X̄ (ν_y ∈ Ȳ)
concentrated at the point x ∈ X (y ∈ Y). Identifying the strategies x and μ_x, y and
ν_y, we see that pure strategies are a special case of mixed strategies, i.e. the inclusions
X ⊂ X̄, Y ⊂ Ȳ hold true. Then the payoffs of Player 1 at the points (x, ν) and (μ, y)
are respectively the mathematical expectations

K(x, ν) ≐ K(μ_x, ν) = ∫_Y H(x, y) dν(y),  (2.3.2)

K(μ, y) ≐ K(μ, ν_y) = ∫_X H(x, y) dμ(x),  (2.3.3)

where the integrals in (2.3.1), (2.3.2), (2.3.3) are taken in the sense of Lebesgue–
Stieltjes. If, however, the distributions μ(x), ν(y) have the densities f(x) and g(y),
i.e. dμ(x) = f(x)dx and dν(y) = g(y)dy, then the integrals in (2.3.1), (2.3.2), (2.3.3)
are taken in the sense of Riemann–Stieltjes. The game Γ ⊂ Γ̄ is a subgame of its
mixed extension Γ̄. Whatever the probability measures μ and ν, all the integrals in
(2.3.1), (2.3.2), (2.3.3) are supposed to exist.
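For the case of densities, the integrals (2.3.1)–(2.3.3) reduce to ordinary numerical quadrature. The following Python sketch (an added illustration; the kernel H(x, y) = |x − y| and the densities f ≡ 1, g(y) = 2y are arbitrary choices of mine) computes all three payoffs by the midpoint rule; the results can be checked against the exact integrals.

import numpy as np

H = lambda x, y: np.abs(x - y)
f = lambda x: np.ones_like(x)          # density of mu: uniform on [0, 1]
g = lambda y: 2.0 * y                  # density of nu

n = 2000
t = (np.arange(n) + 0.5) / n           # midpoint quadrature nodes on [0, 1]
w = 1.0 / n                            # quadrature weight

K_mu_nu = (H(t[:, None], t[None, :]) * f(t)[:, None] * g(t)[None, :]).sum() * w * w  # (2.3.1)
K_x_nu = (H(0.3, t) * g(t)).sum() * w                                                # (2.3.2) at x = 0.3
K_mu_y = (H(t, 0.3) * f(t)).sum() * w                                                # (2.3.3) at y = 0.3
print(K_mu_nu, K_x_nu, K_mu_y)         # ~ 1/3, ~ 0.3847, ~ 0.29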
Definition. Let Γ = (X, Y, H) be a zero-sum two-person game, and let Γ̄ =
(X̄, Ȳ, K) be its mixed extension. Then the point (μ*, ν*) ∈ X̄ × Ȳ is called an
equilibrium point in the game Γ in mixed strategies if for all μ ∈ X̄ and ν ∈ Ȳ the
following inequalities hold:

K(μ, ν*) ≤ K(μ*, ν*) ≤ K(μ*, ν),  (2.3.4)

i.e. (μ*, ν*) is an equilibrium point in the mixed extension of the game Γ, and μ* (ν*)
is Player 1's (2's) optimal strategy in Γ.
Similarly, the point (μ*, ν*) ∈ X̄ × Ȳ is called an ε-equilibrium point in the mixed
extension of the game Γ if for all μ ∈ X̄ and ν ∈ Ȳ the following inequalities hold:

K(μ, ν*) − ε ≤ K(μ*, ν*) ≤ K(μ*, ν) + ε,  (2.3.5)

i.e. μ* (ν*) is an ε-optimal strategy of Player 1 (2) in Γ.


2.3.3. As in the case of matrix games, we may show that if the payoff functions
of the games Γ = (X, Y, H) and Γ′ = (X, Y, H′) are related by the equality H′(x, y) =
αH(x, y) + β, α > 0, then the sets of equilibrium points in mixed strategies in
the games Γ and Γ′ coincide, and for the game values we have v(Γ′) = αv(Γ) + β (see
Sec. 1.4).
2.3.4. The equilibrium points in mixed strategies have the same properties as in
the case of matrix games, which follows from the theorems given below.
Theorem. For the pair (μ*, ν*), μ* ∈ X̄, ν* ∈ Ȳ, to be an (ε-) equilibrium
point in mixed strategies in the game Γ, it is necessary and sufficient that for all
x ∈ X, y ∈ Y the following inequalities hold:

K(x, ν*) ≤ K(μ*, ν*) ≤ K(μ*, y),  (2.3.6)

(K(x, ν*) − ε ≤ K(μ*, ν*) ≤ K(μ*, y) + ε).  (2.3.7)

Proof. Necessity is obvious, since the pure strategies are a special
case of mixed strategies. Let us prove the sufficiency; we first prove (2.3.6) ((2.3.7) can be
proved in the same way). Let μ and ν be arbitrary mixed strategies of players 1
and 2, respectively. From (2.3.1), (2.3.2) and (2.3.6) we then get

K(μ, ν*) = ∫_X K(x, ν*) dμ(x) ≤ K(μ*, ν*),

K(μ*, ν) = ∫_Y K(μ*, y) dν(y) ≥ K(μ*, ν*).

This implies the inequalities (2.3.4), which is the required result.

From the theorem, in particular, it follows that if (x*, y*) is a pure strategy (ε-)
equilibrium point in the game Γ, then it is also an (ε-) equilibrium point in the mixed
extension Γ̄, in which case the game value is preserved.
Note that the mixed extension Γ̄ is a zero-sum two-person game, and hence the
notion of a strictly determined game (in 2.2.1) holds true for Γ̄. The same applies
to the theorem in 2.2.5, except that we are dealing here with the equilibrium point and the value
of the game in mixed strategies.
2.3.5. Theorem. For the game Γ = (X, Y, H) to have the value v in mixed
strategies, it is necessary and sufficient that the following equality holds:

sup_μ inf_y K(μ, y) = inf_ν sup_x K(x, ν) = v.  (2.3.8)

If the players have optimal strategies, then the exterior extrema in (2.3.8) are achieved,
and the equations

inf_y K(μ*, y) = v,  (2.3.9)

sup_x K(x, ν*) = v  (2.3.10)

are the necessary and sufficient optimality conditions for the mixed strategies μ* ∈ X̄
and ν* ∈ Ȳ.
Proof. Let v be the value of the game. Then, by definition,

v = sup_μ inf_ν K(μ, ν).  (2.3.11)

For a fixed strategy μ, the set {K(μ, ν) | ν ∈ Ȳ} is the convex hull of the numbers
K(μ, y), y ∈ Y. Since the exact lower bound of any set of real numbers coincides
with that of the convex hull of these numbers,

inf_ν K(μ, ν) = inf_y K(μ, y).  (2.3.12)

Equality (2.3.12) can also be obtained from the following reasoning. Since Y ⊂ Ȳ,
we have

inf_ν K(μ, ν) ≤ inf_y K(μ, y).

Suppose the inequality is strict, i.e.

inf_ν K(μ, ν) < inf_y K(μ, y).

This means that for a sufficiently small ε > 0 the following inequality holds:

inf_ν K(μ, ν) + ε < inf_y K(μ, y).

Thus, for all y ∈ Y,

K(μ, y) ≥ inf_ν K(μ, ν) + ε.  (2.3.13)

Passing to mixed strategies in (2.3.13) we now get

inf_ν K(μ, ν) ≥ inf_ν K(μ, ν) + ε.

The obtained contradiction proves (2.3.12). Let us take the supremum over μ in (2.3.12).
Then

v = sup_μ inf_y K(μ, y).

The second equality in (2.3.8) can be proved in the same way. Conversely, if (2.3.8)
is satisfied, then it follows from (2.3.12) that v is the value of the game.
Now let μ*, ν* be optimal strategies of players 1 and 2, respectively. By
the Theorem given in 1.3.4, the exterior extrema in (2.3.8) are achieved, and (2.3.9),
(2.3.10) are the necessary and sufficient optimality conditions for the mixed strategies
μ* and ν*.
As noted in 2.3.2, the introduction of mixed strategies in a zero-sum infinite game
depends on the way of randomizing the set of pure strategies. From (2.3.8), however,
it follows that the game value v is independent of the randomization method. Thus,
to prove its existence, it suffices to find at least one mixed extension of the game for
which (2.3.8) holds.
Corollary. For any zero-sum two-person game Γ = (X, Y, H) having the value
v in mixed strategies, the following inequalities hold:

sup_x inf_y H(x, y) ≤ v ≤ inf_y sup_x H(x, y).  (2.3.14)

Proof. Theorem 2.3.5 implies

sup_x inf_y H(x, y) ≤ sup_μ inf_y K(μ, y) = v = inf_ν sup_x K(x, ν) ≤ inf_y sup_x H(x, y).

2.3.6. From (2.3.14) follows one of the methods for an approximate solution of the zero-
sum two-person game. Indeed, suppose the exterior extrema in (2.3.14) are
achieved, i.e.

v⁻ = max_x inf_y H(x, y) = inf_y H(x̄, y),  (2.3.15)

v⁺ = min_y sup_x H(x, y) = sup_x H(x, ȳ),  (2.3.16)

and let α = v⁺ − v⁻. Then Player 1's maximin strategy x̄ and Player 2's minimax
strategy ȳ describe the players' optimal behavior with accuracy α, and can be taken
as an approximate solution of the game Γ. The problem thus reduces to
finding the maximin and minimax strategies of players 1 and 2, respectively,
with the accuracy of the approximate solution determined by α = v⁺ − v⁻.
Here, by (2.3.14), the game value v lies in the interval v ∈ [v⁻, v⁺]. Minimax theory
[31,30] is devoted to methods of finding solutions to problems (2.3.15), (2.3.16).
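The following Python sketch (an added illustration with an arbitrarily chosen kernel, not an example from the text) computes grid analogues of (2.3.15), (2.3.16); by (2.3.14) the game value is then bracketed by the two printed numbers, with accuracy α = v⁺ − v⁻.

import numpy as np

H = lambda x, y: np.sin(np.pi * (x + y)) * (x - y)   # an arbitrary kernel
t = np.linspace(0.0, 1.0, 401)
A = H(t[:, None], t[None, :])

v_minus = A.min(axis=1).max()        # maximin over the grid, cf. (2.3.15)
v_plus = A.max(axis=0).min()         # minimax over the grid, cf. (2.3.16)
print(v_minus, v_plus)               # the value v lies in [v-, v+]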
2.3.7. As in the case of matrix games, the notion of the spectrum of a mixed strategy
is important for infinite games.

Definition. Let Γ = (X, Y, H) be a zero-sum two-person game. The pure
strategy x₀ ∈ X (y₀ ∈ Y) of Player 1 (2) is called a concentration point of his
mixed strategy μ (ν) if μ({x₀}) > 0 (ν({y₀}) > 0).
Definition. The pure strategy x₀ ∈ X (y₀ ∈ Y), where X (Y, respectively) is
a topological space, is called a point of the spectrum of the mixed strategy μ (ν) given on
the Borel σ-algebra of subsets of the set X (Y) if, for any measurable neighborhood ω of
the point x₀ (y₀), the following inequality holds:

μ(ω) = ∫_ω dμ(x) > 0  (ν(ω) = ∫_ω dν(y) > 0).

The least closed set whose μ-measure (ν-measure) is equal to 1 will be called the
spectrum of the mixed strategy μ (ν).
The concentration points of a mixed strategy are spectrum points. The opposite
is not true: thus, the pure strategies on which the density of a mixed strategy is
positive are spectrum points, but not concentration points.
The spectrum of the mixed strategy μ (ν, respectively) will be denoted by X_μ (Y_ν). We
shall prove the analog of the complementary slackness theorem 1.7.6 for infinite games.
Theorem. Suppose Γ = (X, Y, H) is a zero-sum two-person game having the
value v. If x₀ ∈ X, ν* is an optimal mixed strategy of Player 2, and

K(x₀, ν*) < v,  (2.3.17)

then x₀ cannot be a concentration point of an optimal strategy of Player 1.

A similar result is true for the concentration points of Player 2's optimal strategies.
Proof. It follows from the optimality of the mixed strategy ν* ∈ Ȳ that for all x ∈ X
the following inequality holds:

K(x, ν*) ≤ v.

Integrating it with respect to Player 1's optimal mixed strategy (measure) μ* over
the set X \ {x₀}, we get

∫_{X\{x₀}} K(x, ν*) dμ*(x) ≤ v μ*(X \ {x₀}) = v (1 − μ*({x₀})).

Let μ*({x₀}) > 0, i.e. x₀ is a concentration point of Player 1's optimal mixed strategy
μ*. Then it follows from (2.3.17) that

K(x₀, ν*) μ*({x₀}) < v μ*({x₀}).

Combining the last two inequalities yields a contradiction:

v = ∫_X K(x, ν*) dμ*(x) = K(μ*, ν*) < v.

Hence μ*({x₀}) = 0 for all optimal strategies μ* ∈ X̄.


2.3.8. For zero-sum infinite games, the notion of strategy dominance can be
introduced in the same way as in Sec. 1.8.
Definition. Player 1's strategy μ₁ ∈ X̄ strictly dominates the strategy μ₂ ∈ X̄ (μ₁ ≻ μ₂) if

K(μ₁, y) > K(μ₂, y)

for all y ∈ Y. Similarly, Player 2's strategy ν₁ ∈ Ȳ strictly dominates the strategy
ν₂ ∈ Ȳ (ν₁ ≻ ν₂) if

K(x, ν₁) < K(x, ν₂)

for all x ∈ X. The strategies μ₂ and ν₂ are called strictly dominated if there exist μ₁ ≻ μ₂
and ν₁ ≻ ν₂.
If the last inequalities are satisfied as nonstrict, then we say that μ₁ dominates
μ₂ (μ₁ ≽ μ₂) and ν₁ dominates ν₂ (ν₁ ≽ ν₂).
Dominance theorems that are similar to those in Sec. 1.8 will be given without proof.
Theorem. For a zero-sum infinite game having a solution, none of a player's
strictly dominated pure strategies is contained in the spectrum of his optimal mixed
strategies.
Theorem. Let Γ = (X, Y, H) be a zero-sum infinite game having a solution (X
and Y are topological spaces), and let each element of the open set X̃ ⊂ X be dominated
by a strategy μ whose spectrum does not intersect X̃. Then any solution of the game
Γ′ = (X \ X̃, Y, H) is a solution of the game Γ.
A similar theorem holds for the strategies of Player 2.
2.3.9. This section has dealt with the properties of optimal (ε-optimal) mixed strategies
under the assumption that a solution of the game exists. The matrix game
is strictly determined in mixed strategies, i.e. there always exist a value and an
equilibrium point, which follows from Theorem 1.6.1.
A saddle point in zero-sum two-person infinite games, however, does not always exist,
as is seen from the next example.
Example 9. (A game without a value in mixed strategies.) [Vorobjev (1984)].
Consider the game Γ = (X, Y, H), where X = Y = {1, 2, ...} is the set of natural
numbers, and the payoff function is

H(x, y) =  1, if x > y,
           0, if x = y,
          −1, if x < y.

This game has no value in pure strategies. We shall show that it has no value in mixed
strategies either.
Let μ be an arbitrary mixed strategy of Player 1, and dμ(x) = δ_x, where δ_x ≥ 0
and Σ_{x=1}^∞ δ_x = 1. Take ε > 0 and find y_ε such that

Σ_{x<y_ε} δ_x > 1 − ε.

Then

K(μ, y_ε) = Σ_{x=1}^∞ δ_x H(x, y_ε) = Σ_{x<y_ε} δ_x H(x, y_ε) + Σ_{x≥y_ε} δ_x H(x, y_ε) ≤ −(1 − ε) + ε = −1 + 2ε.

Because of the arbitrariness of ε > 0 and since H(x, y) does not take values less than
−1, we have

inf_y K(μ, y) = −1.

Consequently, since the strategy μ is arbitrary,

v̲ = sup_μ inf_y K(μ, y) = −1.

By a similar reasoning we may get

v̄ = inf_ν sup_x K(x, ν) = 1.

Since v̄ > v̲, the game Γ has no value in mixed strategies. As is shown in the
next section, the continuity of the payoff function and the compactness of the strategy
spaces are sufficient for the existence of a solution (a value and optimal strategies) in the
mixed extension.
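The mechanism of Example 9 is easy to observe numerically: against any strategy of Player 1 with a finite spectrum, Player 2 simply plays one step beyond that spectrum. A tiny Python illustration (added here; the particular support and probabilities are random):

import numpy as np

def H(x, y):
    return np.sign(x - y)            # 1 if x > y, 0 if x = y, -1 if x < y

rng = np.random.default_rng(0)
support = rng.choice(np.arange(1, 1000), size=20, replace=False)
probs = rng.dirichlet(np.ones(20))   # an arbitrary finitely supported mu
y = support.max() + 1                # Player 2 outbids the whole spectrum
print(np.sum(probs * H(support, y))) # exactly -1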

2.4 Games with continuous payoff functions


2.4.1. In this section the zero-sum two-person games Γ = (X, Y, H) are considered
under the assumption that the strategy spaces X and Y are metric compact sets (more
often they will be subsets of Euclidean spaces), and the function H is continuous
in both variables. The sets X̄, Ȳ of mixed strategies of players 1 and 2 are the
sets of probability measures given on the σ-algebras 𝒳 and 𝒴 of the Borel subsets
of the sets X and Y, respectively. Then Player 1's payoff K(μ, ν) in the situation
in mixed strategies (μ, ν) ∈ X̄ × Ȳ is determined by the integral (2.3.1) and is the
mathematical expectation of the payoff over the probability measure μ × ν.
The game Γ = (X, Y, H) thus defined will be called a continuous game.
2.4.2. Theorem. If Γ = (X, Y, H) is a zero-sum infinite game having the
value v and the equilibrium point (μ*, ν*), and the functions K(μ*, y), K(x, ν*) are
continuous respectively in y and x, then the following equalities hold:

K(μ*, y) = v,  y ∈ Y_{ν*},  (2.4.1)

K(x, ν*) = v,  x ∈ X_{μ*},  (2.4.2)

where Y_{ν*}, X_{μ*} are the spectra of the mixed strategies ν* and μ*, respectively.
Proof. From Theorem 2.3.4 it follows that the inequality

K(μ*, y) ≥ v  (2.4.3)

holds for all points y ∈ Y. If (2.4.1) does not hold, then there exists a point y₀ ∈ Y_{ν*}
such that K(μ*, y₀) > v. By the continuity of the function K(μ*, y) there exists a
neighborhood ω of the point y₀ in which the inequality (2.4.3) is strict.
From the fact that y₀ ∈ Y_{ν*} is a point of the spectrum of the mixed strategy ν*, it
follows that ν*(ω) > 0. From this, and from inequality (2.4.3), we get

v = K(μ*, ν*) = ∫_Y K(μ*, y) dν*(y) > v.

The contradiction proves the validity of (2.4.1). Equality (2.4.2) can be proved in a
similar way.
This result is an analog of the complementary slackness theorem 1.7.6. Recall that
a pure strategy x appearing in the spectrum of an optimal strategy is called essential.
Thus, the theorem states that (2.4.1) or (2.4.2) must hold for essential strategies.
Theorem 2.4.2 applies to any continuous game, since the following assertion is true.
2.4.3. Lemma. If the function H : X × Y → R¹ is continuous on X × Y, then
the integrals K(μ, y) and K(x, ν) are continuous functions of y and x, respectively,
for any fixed mixed strategies μ ∈ X̄ and ν ∈ Ȳ.
Proof. The function H(x, y) is continuous on the compact set X × Y, and hence
is uniformly continuous.
Let us take an arbitrary ε > 0 and find δ > 0 such that as soon as ρ(y₁, y₂) < δ,
then for any x the following inequality holds:

|H(x, y₁) − H(x, y₂)| < ε,  (2.4.4)

where ρ(·) is a metric in the space Y.

Then

|K(μ, y₁) − K(μ, y₂)| = |∫_X H(x, y₁) dμ(x) − ∫_X H(x, y₂) dμ(x)|

= |∫_X [H(x, y₁) − H(x, y₂)] dμ(x)| ≤ ∫_X |H(x, y₁) − H(x, y₂)| dμ(x) < ε ∫_X dμ(x) = ε.
(2.4.5)
Hence the function K(μ, y) is continuous in y.
The continuity of the function K(x, ν) in x can be proved in a similar way.
2.4.4. Let us formulate the fundamental theorem of this section.
Theorem. The zero-sum infinite game Γ = (X, Y, H), where X, Y are metric
compact sets and H is a continuous function, has a solution in mixed strategies (a
value and optimal strategies).
This theorem will be proved in terms of analytical properties of the mixed extension
of the game Γ̄ = (X̄, Ȳ, K) and some auxiliary results.
2.4.5. Recall that a sequence of Borel measures μ_n, n = 1, 2, ..., given on the
Borel σ-algebra 𝒳 of the compact metric space X is called weakly convergent to a
measure μ if

lim_{n→∞} ∫_X φ(x) dμ_n(x) = ∫_X φ(x) dμ(x)  (2.4.6)

for any continuous function φ(x), x ∈ X.
Lemma. Under the conditions of Theorem 2.4.4, the mixed strategy sets X̄ and Ȳ (the
sets of Borel probability measures) are compact metric sets in the topology of weak
convergence.
Let us outline the proof for the set of mixed strategies X̄ (similar arguments apply
to Ȳ).
The space of Borel measures given on the Borel σ-algebra 𝒳 of the compact
metric space X becomes a metric space if we introduce the metric

ρ(μ′, μ″) = max(ρ′, ρ″),

where ρ′ and ρ″ are respectively the lower bounds of the numbers r′ and r″ such that for
any closed set F ⊂ X

μ′(F) ≤ μ″(V_{r′}(F)) + r′,  μ″(F) ≤ μ′(V_{r″}(F)) + r″,

where V_r(F) = {x ∈ X : min_{z∈F} ρ₁(x, z) ≤ r}, r > 0, and ρ₁(·) is the metric in the
space X.
It is known [Prokhorov and Riazanov (1967)] that convergence in this metric space
is equivalent to weak convergence, and that a set of measures μ defined on the Borel σ-
algebra of the subsets of the space X is weakly compact (i.e. compact in terms of the
above-defined metric space of all Borel measures) if and only if this set is uniformly
bounded,

μ(X) ≤ c,  (2.4.7)

and uniformly dense, i.e. for any ε > 0 there is a compact set A ⊂ X such that

μ(X \ A) < ε.  (2.4.8)

Condition (2.4.8) follows from the compactness of X, and (2.4.7) follows from the fact
that the measures μ ∈ X̄ are normed (μ(X) = 1).
2.4.6. Note that under the conditions of Theorem 2.4.4 the set of mixed strategies
X̄ (Ȳ) of Player 1 (2) is also a compact set in the ordinary sense, since in this case
the weak convergence of the measure sequence {μ_n}, n = 1, 2, ..., is equivalent to
convergence in the ordinary sense:

lim_{n→∞} μ_n(A) = μ(A)

for any Borel set A ⊂ X whose boundary A′ has zero measure, μ(A′) = 0.
The proof of this result involves certain complexities and can be found in Prokhorov
and Riazanov (1967).
2.4.7. Denote by v̲ and v̄ respectively the lower and upper values of the game
Γ̄ = (X̄, Ȳ, K):

v̲ = sup_μ inf_y K(μ, y),  v̄ = inf_ν sup_x K(x, ν).  (2.4.9)

Lemma. If the conditions of Theorem 2.4.4 are satisfied, the extrema in (2.4.9) are
achieved, and hence

v̲ = max_μ min_y K(μ, y),  v̄ = min_ν max_x K(x, ν).  (2.4.10)
Proof. Since H(x, y) is continuous, then, by Lemma 2.4.3, for any measure
μ ∈ X̄ the function

K(μ, y) = ∫_X H(x, y) dμ(x)

is continuous in y. Since Y is a compact set, K(μ, y) achieves a minimum at a
particular point of this set.
By the definition of v̲, for any n there exists a measure μ_n ∈ X̄ such that

min_y K(μ_n, y) > v̲ − 1/n.

Since X̄ is a compact set in the topology of weak convergence (Lemma 2.4.5),
a weakly convergent subsequence can be chosen from the sequence {μ_n}, μ_n ∈ X̄.
Suppose the sequence {μ_n} weakly converges to a certain measure μ₀ ∈ X̄. Then

lim_{n→∞} K(μ_n, y) = lim_{n→∞} ∫_X H(x, y) dμ_n(x) = ∫_X H(x, y) dμ₀(x) = K(μ₀, y),  y ∈ Y.

But K(μ₀, y) is not less than v̲ for every y ∈ Y. Hence min_y K(μ₀, y) ≥ v̲, and the
required maximum is achieved on μ₀ ∈ X̄.
Similarly, the inf sup in (2.4.9) can be shown to be replaceable by min max.
2.4.8. We now turn to the proof of Theorem 2.4.4.
Proof. Since X and Y are metric compact sets, for any integer n there exist
finite (1/n)-networks

X_n = {x_1^n, ..., x_{k_n}^n} ⊂ X,  Y_n = {y_1^n, ..., y_{l_n}^n} ⊂ Y

of the sets X and Y, respectively. This means that for any points x ∈ X and y ∈ Y
there are points x_i^n ∈ X_n and y_j^n ∈ Y_n such that

ρ₁(x, x_i^n) ≤ 1/n,  ρ₂(y, y_j^n) ≤ 1/n,  (2.4.11)

where ρ₁(·), ρ₂(·) are the metrics of the spaces X and Y, respectively.


For an arbitrary integer n we construct a matrix game with the matrix A_n = {α_{ij}^n},
where

α_{ij}^n = H(x_i^n, y_j^n),  x_i^n ∈ X_n, y_j^n ∈ Y_n.  (2.4.12)

The game with the matrix A_n has a value θ_n and optimal mixed strategies p_n =
(π_1^n, ..., π_{k_n}^n), t_n = (τ_1^n, ..., τ_{l_n}^n) for players 1 and 2, respectively (see the Theorem in
1.6.1).
The function H(x, y) is continuous on the Cartesian product X × Y of metric
compact sets, and hence it is uniformly continuous, i.e. for a given ε > 0 we may find
δ > 0 such that as soon as

ρ₁(x, x′) < δ,  ρ₂(y, y′) < δ,

then

|H(x, y) − H(x′, y′)| < ε.  (2.4.13)

We choose n such that 1/n < δ, and then determine the strategy μ_n ∈ X̄ by the
following rule:

μ_n(F) = Σ_{i: x_i^n ∈ F} π_i^n  (2.4.14)

for each Borel set F of the space X. Thus we have

K(μ_n, y_j^n) = Σ_i α_{ij}^n π_i^n ≥ θ_n.  (2.4.15)

If ρ₂(y, y_j^n) < δ, then by (2.4.4), (2.4.5) and (2.4.13) we get

|H(x, y) − H(x, y_j^n)| < ε,  and hence |K(μ_n, y) − K(μ_n, y_j^n)| < ε.

Consequently, for any y ∈ Y (Y_n is a (1/n)-network of the set Y)

K(μ_n, y) > θ_n − ε.  (2.4.16)

Since min_y K(μ, y) is achieved (Lemma 2.4.7), then

v̲ ≥ θ_n − ε.  (2.4.17)

Similarly, we may show that

v̄ ≤ θ_n + ε.  (2.4.18)

From (2.4.17) and (2.4.18) we obtain

v̄ − v̲ ≤ 2ε.

But, by Lemma 1.2.2, the inequality v̲ ≤ v̄ always holds. Because ε > 0 was arbitrary,
we obtain

v̲ = v̄,  (2.4.19)

and then from Lemma 2.4.7 and (2.4.19) the assertion of the theorem follows (see 2.2.1).
2.4.9. Corollary. The following relation holds:

v = lim_{n→∞} θ_n,  (2.4.20)

where θ_n = v(A_n) is the value of the matrix game with the matrix A_n defined by (2.4.12).
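The corollary suggests a practical computation. The following Python sketch (an added illustration, not the authors' procedure) builds the matrix A_n on a finite network of the unit square and solves the matrix game by linear programming with scipy; the kernel H(x, y) = (x − y)² is the convex game of Example 12 below, whose value is 1/4, so θ_n should print close to 0.25.

import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    """Value of the matrix game A: max v s.t. p'A >= v, sum(p) = 1, p >= 0."""
    m, n = A.shape
    c = np.concatenate([np.zeros(m), [-1.0]])        # variables (p, v), minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])        # v - sum_i p_i a_ij <= 0, each column j
    A_eq = np.concatenate([np.ones(m), [0.0]])[None, :]
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, 1)] * m + [(None, None)])
    return res.x[-1]

H = lambda x, y: (x - y) ** 2
for k in (5, 11, 21, 41):
    grid = np.linspace(0.0, 1.0, k)                  # a finite network of [0, 1]
    theta = matrix_game_value(H(grid[:, None], grid[None, :]))
    print(k, round(theta, 4))                        # theta_n -> v = 0.25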
2.4.10. It follows from the proof of Theorem 2.4.4 that a continuous game can
be approximated by finite games to any degree of accuracy. Moreover, the following
result holds true.
Theorem. An infinite two-person zero-sum game Γ = (X, Y, H), where X, Y are
metric compact sets and H is a continuous function on their product, has ε-optimal
mixed strategies with a finite spectrum for any ε > 0.
The proof of this theorem follows from the proof (2.4.8) of Theorem 2.4.4. Indeed,
given the game Γ, we may construct matrix games with matrices A_n and mixed strategies
μ_n ∈ X̄ that are determined by (2.4.12), (2.4.14) for an arbitrary integer
n. By analogy, Player 2's strategies ν_n ∈ Ȳ are determined as follows:

ν_n(G) = Σ_{j: y_j^n ∈ G} τ_j^n,  (2.4.21)

where t_n = (τ_1^n, ..., τ_{l_n}^n) is an optimal mixed strategy for Player 2 in the game with
the matrix A_n and the value θ_n.
By construction, we have

θ_n = Σ_i Σ_j α_{ij}^n π_i^n τ_j^n = K(μ_n, ν_n),  (2.4.22)

where K(μ, ν) is the payoff in mixed strategies (μ, ν) in the game Γ̄. From (2.4.16)
and a similar inequality for the strategy ν_n we have that for an arbitrary ε > 0 there
exists an index n such that

K(x, ν_n) − ε ≤ θ_n ≤ K(μ_n, y) + ε  (2.4.23)

for all x ∈ X and y ∈ Y. Considering that the strategies μ_n and ν_n have the respective
finite spectra X_n and Y_n, and X_n, Y_n are finite (1/n)-networks of the sets X and Y,
respectively, we obtain the assertion of the theorem (see 2.3.4).
2.4.11. By combining the results of Theorems 2.4.4 and 2.4.10, we may conclude
that an infinite two-person zero-sum game with a continuous payoff function and
compact strategy sets has, for any ε > 0, ε-optimal strategies of the players that are
mixtures of a finite number of pure strategies, as well as optimal mixed strategies in
the class of Borel probability measures. In particular, these results hold for games
on the unit square (see 2.1.3) with a continuous payoff function.
2.4.12. There are many papers proving the existence of the value of infinite two-
person zero-sum games. The most general result in this line is attributed to Sion
(1958). The results are well known for games with compact strategy spaces and
semicontinuous payoff functions [Peck and Dulmage (1957), Yanovskaya (1973)]. We
shall show that in some respects they do not lend themselves to generalization.
Example 10. (A square game with no value in mixed strategies.) [Sion and Wolfe
(1957)]. Consider a two-person zero-sum game Γ = (X, Y, H), where X = Y = [0, 1]
and the payoff function H is of the form

H(x, y) = −1, if x < y < x + 1/2,
           0, if y = x or y = x + 1/2,
           1, if y < x or y > x + 1/2.

This function has points of discontinuity on the straight lines y = x and y =
x + 1/2. We shall show that

sup_μ inf_ν K(μ, ν) = 1/3,  inf_ν sup_μ K(μ, ν) = 3/7.  (2.4.24)


Let μ be a probability measure on [0, 1]. If μ([0, 1/2)) ≤ 1/3, then we set y_μ = 1. If,
however, μ([0, 1/2)) > 1/3, then we choose δ > 0 so that μ([0, 1/2 − δ]) > 1/3, and
set y_μ = 1/2 − δ. In each of these cases we have the inequalities

inf_ν K(μ, ν) ≤ K(μ, y_μ) ≤ 1/3,

which can be proved directly.

On the other hand, if μ is chosen such that μ({0}) = μ({1/2}) = μ({1}) = 1/3,
then for all y ∈ [0, 1] we have

∫₀¹ H(x, y) dμ(x) = (1/3)[H(0, y) + H(1/2, y) + H(1, y)] ≥ 1/3.

We have thus proved the first equality of (2.4.24).


Now let ν be a probability measure on [0, 1]. If ν([0, 1)) ≥ 3/7, then we set
x_ν = 1. If, however, ν([0, 1)) < 3/7, then ν({1}) > 4/7, in which case we set x_ν = 0
provided ν([0, 1/2)) ≤ 1/7; if ν([0, 1/2)) > 1/7, then we choose δ > 0 such that
ν([0, 1/2 − δ]) > 1/7, and set x_ν = 1/2 − δ. In each of these cases we see that

sup_μ K(μ, ν) ≥ K(x_ν, ν) ≥ 3/7.

On the other hand, if ν is chosen such that

ν({1/4}) = 1/7,  ν({1/2}) = 2/7,  ν({1}) = 4/7,

then for any x ∈ [0, 1] we have

∫₀¹ H(x, y) dν(y) = (1/7)[H(x, 1/4) + 2H(x, 1/2) + 4H(x, 1)] ≤ 3/7.

We have thus proved the second equality of (2.4.24).
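Both estimates are easy to confirm numerically. The following Python sketch (an added check) evaluates the guarantees of the two discrete measures used above against a fine grid of pure replies; it prints approximately 1/3 and 3/7.

import numpy as np

def H(x, y):
    if x < y < x + 0.5:
        return -1.0
    if y == x or y == x + 0.5:
        return 0.0
    return 1.0

grid = np.linspace(0.0, 1.0, 4001)
mu = [(0.0, 1/3), (0.5, 1/3), (1.0, 1/3)]       # masses of mu at {0, 1/2, 1}
nu = [(0.25, 1/7), (0.5, 2/7), (1.0, 4/7)]      # masses of nu at {1/4, 1/2, 1}

worst_for_mu = min(sum(p * H(x, y) for x, p in mu) for y in grid)
best_against_nu = max(sum(q * H(x, y) for y, q in nu) for x in grid)
print(worst_for_mu, best_against_nu)            # about 1/3 and 3/7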

2.5 Games with a convex payoff function


In Sec. 2.4, the existence of a solution to the infinite two-person zero-sum games with
a continuous payoff function and compact strategy sets was proved under sufficiently
general assumptions. At the same time, it may be interesting, both theoretically
and practically, to distinguish the classes of games, where one or both players have
optimal pure strategies. This section deals with such games.
2.5.1. Definition. Suppose X ⊂ Rᵐ, Y ⊂ Rⁿ are compact sets, the set Y is
convex, and the payoff function H : X × Y → R¹ is continuous in all its arguments and
convex with respect to y ∈ Y for any fixed value of x ∈ X. Then the game Γ = (X, Y, H)
A symmetric definition for Player 1 is given as follows:
is called a game with a convex payoff function (a convex game).
A symmetric definition for Player 1 is given as follows:
Definition. If X ⊂ Rᵐ, Y ⊂ Rⁿ are compact sets, X is convex, and the function H
is continuous in all its arguments and concave with respect to x ∈ X for any fixed
y ∈ Y, then the game Γ = (X, Y, H) is called a game with a concave payoff function
(a concave game).
If, however, X ⊂ Rᵐ, Y ⊂ Rⁿ are compact sets and the payoff function H(x, y),
which is continuous in all its arguments, is concave with respect to x for any fixed
y and convex with respect to y for each x, then the game Γ = (X, Y, H) is called the
game with a concave-convex payoff function (a concave-convex game).
We will now consider convex games. Note that similar results are also true for
concave games.
Theorem. Suppose Γ = (X, Y, H) is a convex game. Then Player 2 has an
optimal pure strategy, with the game value being equal to

v = min_y max_x H(x, y).  (2.5.1)
Proof. Since X and Y are metric compact sets (in the metric of the Euclidean spaces
Rᵐ and Rⁿ) and the function H is continuous on the product X × Y, then, by
Theorem 2.4.4, in the game Γ there exist the value v and optimal mixed strategies
μ*, ν*. It is well known that the set of all probability measures with a finite support
is everywhere dense in the set of all probability measures on Y [Prokhorov and
Riazanov (1967)]. Therefore, there exists a sequence of mixed strategies ν_n with
finite spectra that converges weakly to ν*. Suppose the spectrum of the strategy ν_n
is composed of the points y_1^n, ..., y_{k_n}^n, chosen with probabilities η_1^n, ..., η_{k_n}^n.
By the convexity of the function H, we then have

K(x, ν_n) = Σ_i η_i^n H(x, y_i^n) ≥ H(x, ȳ^n),  (2.5.2)

where ȳ^n = Σ_i η_i^n y_i^n. Passing to the limit as n → ∞ in (2.5.2) (replacing the sequence
{ȳ^n} by a convergent subsequence if required) we obtain

K(x, ν*) ≥ H(x, ȳ),  x ∈ X,  (2.5.3)

where ȳ is a limit point of the sequence {ȳ^n}. From (2.5.3) and Lemma 2.4.3 we have

max_x K(x, ν*) ≥ max_x H(x, ȳ).  (2.5.4)

Suppose inequality (2.5.4) is strict. Then

v = max_x K(x, ν*) > max_x H(x, ȳ) ≥ min_ν max_x K(x, ν) = v,

which is impossible. Thus, max_x H(x, ȳ) = max_x K(x, ν*) = v, and it follows from
Theorem 2.3.5 that ȳ is an optimal strategy for Player 2.
We will now demonstrate the validity of (2.5.1). Since ȳ ∈ Y is an optimal strategy
of Player 2, then

v = max_x H(x, ȳ) ≥ min_y max_x H(x, y).
On the other hand, the following inequality holds:

v = min_ν max_x K(x, ν) ≤ min_y max_x H(x, y).

Comparing the latter inequalities we obtain (2.5.1).
2.5.2. Recall that the function φ : Y → R¹, Y ⊂ Rⁿ, Y being a convex set, is
strictly convex if the following strict inequality holds for all λ ∈ (0, 1):

φ(λy₁ + (1 − λ)y₂) < λφ(y₁) + (1 − λ)φ(y₂),  y₁, y₂ ∈ Y, y₁ ≠ y₂.
Theorem. Let Γ = (X, Y, H) be a convex game with a strictly convex payoff
function. Player 2 then has a unique optimal strategy, which is pure.
Proof. Let μ* be an optimal strategy for Player 1, φ(y) = K(μ*, y), and let v be the
value of the game. If ỹ is a point of the spectrum of Player 2's optimal strategy, then the
following relation holds (2.4.2):

K(μ*, ỹ) = v.

For all y ∈ Y, however, there is the inequality K(μ*, y) ≥ v, and hence

φ(ỹ) = min_y φ(y) = v.

The function φ(y) is strictly convex, since for λ ∈ (0, 1) the following inequality holds:

φ(λy₁ + (1 − λ)y₂) = ∫_X H(x, λy₁ + (1 − λ)y₂) dμ*(x)

< λ ∫_X H(x, y₁) dμ*(x) + (1 − λ) ∫_X H(x, y₂) dμ*(x) = λφ(y₁) + (1 − λ)φ(y₂).  (2.5.5)

From (2.5.5) it follows that the function φ(y) cannot achieve its minimum at two
different points. On the other hand, the existence of a minimum point ỹ of the
function φ(y) is guaranteed by Theorem 2.5.1, which completes the proof.
2.5.3. The results that are counterparts of the theorems in 2.5.1 and 2.5.2 for
concave and concave-convex games will be presented without proof.
Theorem. Let Γ = (X, Y, H), X ⊂ Rᵐ, Y ⊂ Rⁿ, be a concave game. The game
value v equals

v = max_x min_y H(x, y),  (2.5.6)

and every pure strategy x* on which the max min in (2.5.6) is achieved is optimal for Player
1. Moreover, if the function H(x, y) is strictly concave with respect to x for every
fixed y ∈ Y, Player 1's optimal strategy is unique.
Theorem. Let Γ = (X, Y, H), X ⊂ Rᵐ, Y ⊂ Rⁿ, be a concave-convex game.
Then the game value v is equal to

v = min_y max_x H(x, y) = max_x min_y H(x, y).  (2.5.7)

In the game Γ there always exists a pure strategy saddle point (x*, y*), where x* ∈ X,
y* ∈ Y are pure strategies of players 1 and 2 on which the exterior extrema in (2.5.7)
are achieved. In this case, if the function H(x, y) is strictly concave (convex) with
respect to the variable x (y) for any fixed y ∈ Y (x ∈ X), then Player 1 (2) has a
unique optimal strategy, which is pure.
2.5.4. We will now clarify the structure of Player 1's optimal strategy in the
convex game Γ = (X, Y, H).
Theorem. In the convex game Γ = (X, Y, H), Y ⊂ Rⁿ, Player 1 has an optimal
mixed strategy μ* with a finite spectrum composed of no more than (n + 1) points
of the set X.
The proof of this result is based on the well-known Helly theorem on convex sets,
which is presented below without proof [Rockafellar (1970), Davidov (1978)].
Theorem (Helly's Theorem). Let K be a family composed of at least n + 1
convex sets in Rⁿ, each set from K being compact. Then, if every n + 1 of the sets of
the family K have a common point, there exists a point common to all the sets of the
family K.
Before proving the theorem, we shall prove some auxiliary statements.
Suppose the function H(x, y) is continuous on the product X × Y of the compact
sets X ⊂ Rᵐ, Y ⊂ Rⁿ. Denote by Xʳ = X × ... × X the Cartesian product of r copies
of X.
Consider the function φ : Xʳ × Y → R¹,

φ(x₁, ..., x_r, y) = max_{1≤i≤r} H(x_i, y).

Lemma. The function φ(x₁, ..., x_r, y) is continuous on Xʳ × Y.

Proof. The function H(x, y) is continuous on the compact set X × Y and hence
is uniformly continuous on it. Then for any ε > 0 there is δ > 0 such that from the
inequalities ρ₁(x, x′) < δ, ρ₂(y₁, y₂) < δ follows the inequality |H(x, y₁) − H(x′, y₂)| < ε,
where ρ₁(·), ρ₂(·) are the distances in Rᵐ and Rⁿ, respectively. We have

|φ(x₁, ..., x_r, y₁) − φ(x′₁, ..., x′_r, y₂)|

= |max_{1≤i≤r} H(x_i, y₁) − max_{1≤i≤r} H(x′_i, y₂)| = |H(x_{i₁}, y₁) − H(x′_{i₂}, y₂)|,

where

H(x_{i₁}, y₁) = max_{1≤i≤r} H(x_i, y₁),  H(x′_{i₂}, y₂) = max_{1≤i≤r} H(x′_i, y₂).

If ρ₁(x_i, x′_i) < δ for i = 1, ..., r, ρ₂(y₁, y₂) < δ, and if H(x_{i₁}, y₁) ≥ H(x′_{i₂}, y₂), then

0 ≤ H(x_{i₁}, y₁) − H(x′_{i₂}, y₂) ≤ H(x_{i₁}, y₁) − H(x′_{i₁}, y₂) < ε.

Similar inequalities also hold in the event that H(x_{i₁}, y₁) < H(x′_{i₂}, y₂).
Lemma. In the convex game Γ = (X, Y, H), Y ⊂ Rⁿ, the game value v is

v = min_y max_x H(x, y) = max_{x₁,...,x_{n+1}} min_y max_{1≤i≤n+1} H(x_i, y),  (2.5.8)

where y ∈ Y, x_i ∈ X, i = 1, ..., n + 1.
Proof. Denote

θ = max_{x₁,...,x_{n+1}} min_y max_{1≤i≤n+1} H(x_i, y).

Since min_y max_{1≤i≤n+1} H(x_i, y) ≤ min_y max_x H(x, y) = v for each system of points
(x₁, ..., x_{n+1}) ∈ X^{n+1}, then

θ ≤ v.  (2.5.9)

For an arbitrary fixed set of strategies x_i ∈ X, i = 1, ..., n + 1, we shall consider a
system of inequalities with respect to y:

H(x_i, y) ≤ θ,  y ∈ Y, i = 1, ..., n + 1.  (2.5.10)

We shall show that system (2.5.10) has a solution.

Indeed, if ȳ yields min_y max_{1≤i≤n+1} H(x_i, y), then

θ ≥ min_y max_{1≤i≤n+1} H(x_i, y) = max_{1≤i≤n+1} H(x_i, ȳ) ≥ H(x_i, ȳ),  i = 1, ..., n + 1.

Thus, ȳ satisfies system (2.5.10).

Consequently, system (2.5.10) has a solution for any x_i ∈ X, i = 1, 2, ..., n + 1.
Let us fix x and consider the set

D_x = {y | H(x, y) ≤ θ}.

The function H(x, y) is convex and continuous in y, and hence the set D_x is closed
and convex for each x. The sets {D_x} form a system of convex compact sets in Rⁿ.
And since the inequalities (2.5.10) always have a solution, any collection of (n + 1)
sets of the system {D_x} has a nonempty intersection. Therefore, by the Helly theorem,
there exists a point y₀ ∈ Y common to all the sets D_x; that is, there exists a point y₀ such
that

H(x, y₀) ≤ θ  (2.5.11)

for any x ∈ X. Suppose that θ ≠ v. From (2.5.9) and (2.5.11) we then have

θ < v = min_y max_x H(x, y) ≤ max_x H(x, y₀) ≤ θ,

i.e. θ < θ. It is this contradiction that establishes (2.5.8).


We shall now prove the theorem.
Proof. From the preceding lemma we have

v = max_{x₁,...,x_{n+1}} min_y max_{1≤i≤n+1} H(x_i, y) = min_y max_{1≤i≤n+1} H(x̄_i, y) = min_y max_p Σ_{i=1}^{n+1} H(x̄_i, y) π_i,
(2.5.12)
where x̄₁, ..., x̄_{n+1} are the vectors on which the exterior maximum in (2.5.8) is achieved,
and the interior maximum is over the vectors

p = (π₁, ..., π_{n+1}) ∈ R^{n+1},  π_i ≥ 0,  Σ_{i=1}^{n+1} π_i = 1.  (2.5.13)

Consider the function

K(p, y) = Σ_{i=1}^{n+1} H(x̄_i, y) π_i,  y ∈ Y, p ∈ P,

where P is composed of the vectors satisfying (2.5.13). The function K(p, y) is continuous
in p and y, convex in y and concave in p, with the sets Y ⊂ Rⁿ, P ⊂ R^{n+1}
compact in the corresponding Euclidean spaces. Therefore, by Theorem 2.5.3 and
from (2.5.12) we have

v = min_y max_p Σ_{i=1}^{n+1} H(x̄_i, y) π_i = max_p min_y Σ_{i=1}^{n+1} H(x̄_i, y) π_i.  (2.5.14)

From (2.5.8) and (2.5.14) follows the existence of p* ∈ P and y* ∈ Y such that for all
x ∈ X and y ∈ Y the following inequalities hold:

H(x, y*) ≤ v ≤ Σ_{i=1}^{n+1} H(x̄_i, y) π_i*,

i.e. the mixed strategy prescribing the choice of the point x̄_i with probability π_i*,
i = 1, ..., n + 1, is optimal for Player 1. This completes the proof of the theorem.


Let us state the structure theorem for the optimal strategy used by Player 2 in
the concave game Γ = (X, Y, H).
Theorem. In the concave game Γ = (X, Y, H), X ⊂ Rᵐ, Player 2 has an
optimal mixed strategy ν* with a finite spectrum composed of no more than (m + 1)
points of the set Y.
The proof of this theorem is similar to that of the preceding theorem.
2.5.5. We shall now summarize the results established in this section.
Theorem. Let Γ = (X, Y, H), X ⊂ Rᵐ, Y ⊂ Rⁿ, be a convex game. Then the
value v of the game Γ is determined from

v = min_y max_x H(x, y).

Player 1 has an optimal mixed strategy μ₀ with a finite spectrum composed of no
more than (n + 1) points of the set X, whereas all pure strategies y₀ on which
min_y max_x H(x, y) is achieved are optimal for Player 2. Furthermore, if the function
H(x, y) for every fixed x ∈ X is strictly convex in y, then Player 2's optimal strategy
is unique.
We shall illustrate these results by referring to the example given below.
Example 11. Let us consider a special case of Example 1 (see 2.1.2). Let S₁ =
S₂ = S and let the set S be a closed circle on the plane, of radius R and centered at the
point O.
The payoff function H(x, y) = ρ(x, y), x ∈ S, y ∈ S, with ρ(·) the distance
function in R², is strictly convex in y, and S is a convex set. Hence, by Theorem 2.5.5,
the game value v is

v = min_{y∈S} max_{x∈S} ρ(x, y).  (2.5.15)

Computing the min max in (2.5.15) we find that v = R (see Example 8 in 2.2.6). In
this case, the point y₀ ∈ S on which the minimum of the expression max_{x∈S} ρ(x, y)
is achieved is unique and coincides with the center of the circle S (i.e. the point
O). Also, this point is an optimal strategy for Player 2 (the minimizer). The theorem
states that Player 1 (the maximizer) has an optimal mixed strategy prescribing a positive
probability to no more than three points of the set S. Because of the symmetry of
the set S, however, Player 1's optimal mixed strategy μ₀ actually prescribes, with
probability 1/2 each, the choice of two diametrically opposite points on the boundary
of the set S. To prove the optimality of the strategies μ₀, y₀, it is sufficient to establish
that K(x, y₀) ≤ K(μ₀, y₀) ≤ K(μ₀, y) for all x, y ∈ S, where K is the expectation
of the payoff. We have K(μ₀, y₀) = R/2 + R/2 = R. Indeed, K(x, y₀) = ρ(O, x) ≤ R and
K(μ₀, y) = ρ(x₁, y)/2 + ρ(x₂, y)/2 ≥ R, where x₁ and x₂ are arbitrary diametrically
opposite points on the boundary of the circle S. We have thus proved the optimality
of the strategies μ₀ and y₀.
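The optimality inequalities of Example 11 can also be verified by direct computation. A short Python check (added here; R = 1 and the sampling scheme are my own choices):

import numpy as np

R = 1.0
x1, x2 = np.array([R, 0.0]), np.array([-R, 0.0])   # two diametrically opposite boundary points
rng = np.random.default_rng(1)
d = rng.normal(size=(10000, 2))
u = rng.uniform(size=(10000, 1))
pts = R * np.sqrt(u) * d / np.linalg.norm(d, axis=1, keepdims=True)   # random points of the disk S

K_x_y0 = np.linalg.norm(pts, axis=1)                        # K(x, y0) = rho(O, x)
K_mu0_y = 0.5 * (np.linalg.norm(pts - x1, axis=1)
                 + np.linalg.norm(pts - x2, axis=1))        # K(mu0, y)
print(K_x_y0.max(), K_mu0_y.min())                          # <= R and >= R, respectively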
2.5.6. Let us consider a special case of the convex game Γ = (X, Y, H) with
X = Y = [0, 1], i.e. a convex game on the unit square. From Theorem 2.5.5 it
follows that Player 2 always has an optimal pure strategy y₀ ∈ [0, 1], and Player 1
has an optimal mixed strategy concentrated at no more than two points. In this case the value
of the game is

v = min_{y∈[0,1]} max_{x∈[0,1]} H(x, y).  (2.5.16)

The set of all essential strategies {x} ⊂ [0, 1] of Player 1 is a subset of the set of solutions
of the equation (see 2.4.2)

H(x, y₀) = v,  x ∈ [0, 1],  (2.5.17)

where y₀ is an optimal strategy of Player 2. Player 1's pure strategies satisfying
(2.5.17) are sometimes called balancing strategies. The set of all balancing strategies
of Player 1 is closed and bounded, i.e. it is compact. An optimal pure strategy for
Player 2 is any point y₀ ∈ [0, 1] on which (2.5.16) is achieved.
Denote by H′_y(x, y) the partial derivative of the function H with respect to y (at
y = 0 and y = 1 this means the right-hand and left-hand derivative, respectively).
Lemma. If y₀ is Player 2's optimal strategy in a convex game on the unit square
with the payoff function H differentiable with respect to y, and y₀ > 0, then for Player
1 there is a balancing strategy x′ for which

H′_y(x′, y₀) ≤ 0.  (2.5.18)

If, however, y₀ < 1, then for Player 1 there is a balancing strategy x″ such that

H′_y(x″, y₀) ≥ 0.  (2.5.19)

Proof. Let us prove (2.5.18). (The second part of the lemma can be proved in a similar
way.) Suppose the opposite is true, viz. the inequality H′_y(x, y₀) > 0 holds for every
balancing strategy x of Player 1, i.e. the function H(x, ·) is strictly increasing at the
point y₀. This means that there are ε(x) > 0 and δ(x) > 0 such that for y ∈ [0, 1]
satisfying the inequality δ(x) > y₀ − y > 0, the following inequality holds:

H(x, y) < H(x, y₀) − ε(x).

By the continuity of the function H, for every balancing strategy x̄ and ε(x̄)/2 there
is δ̄(x̄) > 0 such that for δ(x̄) > y₀ − y > 0

H(x, y) ≤ H(x̄, y) + ε(x̄)/4 < H(x̄, y₀) − 3ε(x̄)/4 ≤ H(x, y₀) − ε(x̄)/2

for all balancing strategies x for which |x − x̄| < δ̄(x̄). The set of balancing strategies is
compact, and hence it can be covered by a finite number of such δ̄(x̄)-neighborhoods.
Let ε be the smallest of all the corresponding numbers ε(x̄). Then we have an inequality
holding for all balancing strategies x (and hence for all essential strategies)

H(x, y) < H(x, y₀) − ε/2,

where
y₀ − min_{x̄} δ(x̄) < y < y₀,
the minimum being taken over the finite covering set.
Let μ₀ be the optimal mixed strategy of Player 1. The last inequality is valid at
all spectrum points of μ₀; thus, by integrating it, we get

K(μ₀, y) < K(μ₀, y₀) − ε/2 = v − ε/2,

which contradicts the optimality of the strategy μ₀.
Theorem. Suppose that Γ is a convex game on the unit square with the payoff
function H differentiable with respect to y for any x, y₀ is an optimal pure strategy of
Player 2, and v is the value of the game. Then:
1. if y₀ = 1, then among the optimal strategies of Player 1 there is a pure strategy x′ for
which (2.5.18) holds;

2. if y₀ = 0, then among the optimal strategies of Player 1 there is a pure strategy x″ for

which (2.5.19) holds;

3. if 0 < y₀ < 1, then among the optimal strategies of Player 1 there is a strategy that
is a mixture of two essential strategies x′ and x″ satisfying (2.5.18), (2.5.19),
taken with probabilities α and 1 − α, α ∈ [0, 1]. Here α is a solution of the equation

α H′_y(x′, y₀) + (1 − α) H′_y(x″, y₀) = 0.  (2.5.20)

Proof. Let y₀ = 1. Then, for Player 1, there is a balancing strategy x′ for which
(2.5.18) holds. It then follows from the convexity of the function H(x′, y) that it
does not increase in y over the entire interval [0, 1], achieving its minimum at y = 1.
This means that
H(x′, y₀) ≤ H(x′, y)  (2.5.21)
for all y ∈ [0, 1]. On the other hand, it follows from (2.5.17) that

H(x, y₀) ≤ H(x′, y₀)  (2.5.22)

for all x ∈ [0, 1]. The inequalities (2.5.21), (2.5.22) show that (x′, y₀) is an equilibrium
point.
The case y₀ = 0 can be examined in a similar way. We shall now discuss case 3.
If 0 < y₀ < 1, then there are two balancing strategies x′ and x″ satisfying (2.5.18),
(2.5.19), respectively.
Consider the function

φ(ξ) = ξ H′_y(x′, y₀) + (1 − ξ) H′_y(x″, y₀).

From (2.5.18), (2.5.19) it follows that φ(0) ≥ 0, φ(1) ≤ 0. The function φ(ξ) is
continuous, and hence there is α ∈ [0, 1] for which φ(α) = 0.
Consider the mixed strategy μ₀ of Player 1 that chooses the strategy x′ with probability
α and the strategy x″ with probability 1 − α. The function

K(μ₀, y) = α H(x′, y) + (1 − α) H(x″, y)

is convex in y. Its derivative with respect to y at the point y = y₀ is

K′_y(μ₀, y₀) = α H′_y(x′, y₀) + (1 − α) H′_y(x″, y₀) = 0.

Consequently, the function K(μ₀, y) achieves its minimum at the point y₀. Hence,
considering (2.5.17), we have

K(μ₀, y) ≥ K(μ₀, y₀) = v = max_x H(x, y₀) ≥ H(x, y₀)

for all x ∈ [0, 1] and y ∈ [0, 1], which proves the optimality of the strategies μ₀ and y₀.
2.5.7. Theorem 2.5.6 provides a way of finding optimal strategies, which can be
illustrated by referring to the following example.
Example 12. Consider a game on the unit square with the payoff function
H(x, y) = (x − y)². This is a one-dimensional analog of Example 11, except that
the payoff function is taken to be the square of the distance. Therefore, it would appear
natural that the game value should be v = 1/4, and Player 1's optimal strategy should be to
choose with probability 1/2 each the extreme points 0 and 1 of the interval [0, 1]. We shall
show this by employing Theorem 2.5.6.
Note that ∂²H(x, y)/∂y² = 2 > 0, so the game Γ is strictly convex, and hence
Player 2 has a unique optimal strategy, which is pure (Theorem 2.5.5). Let y be a fixed
strategy of Player 2. Then

max_{0≤x≤1} (x − y)² = (1 − y)², if y ≤ 1/2,
                       y²,       if y > 1/2.

Thus, from (2.5.16) it follows that

v = min{ min_{y≤1/2} (1 − y)², min_{y>1/2} y² }.

Both interior minima are achieved at y₀ = 1/2 and equal 1/4. Therefore,
v = 1/4, and y₀ = 1/2 is the unique optimal strategy of Player 2.
We shall now find an optimal strategy for Player 1. To be noted here is that
0 < y₀ < 1 (y₀ = 1/2). Equation (2.5.17) now becomes (x − 1/2)² = 1/4. Hence
x₁ = 0 and x₂ = 1, i.e. the extreme points of the interval [0, 1] are essential for Player
1.
Let us compute the derivatives:

H′_y(x₁, y₀) = 1 > 0,  H′_y(x₂, y₀) = −1 < 0.

We now set up equation (2.5.20) for α. We have 2α − 1 = 0, and hence α = 1/2.

Thus, the optimal strategy of Player 1 is to choose the pure strategies 0 and 1 with
probability 1/2 each.
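The solution of Example 12 is confirmed by a direct check of the saddle inequalities K(x, y₀) ≤ 1/4 ≤ K(μ₀, y). A short Python illustration (added here):

import numpy as np

t = np.linspace(0.0, 1.0, 1001)
K_x_y0 = (t - 0.5) ** 2                                 # payoff of pure x against y0 = 1/2
K_mu0_y = 0.5 * (0.0 - t) ** 2 + 0.5 * (1.0 - t) ** 2   # payoff of mu0 against pure y
print(K_x_y0.max(), K_mu0_y.min())                      # both equal 1/4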
2.5.8. To conclude this section, we shall present a result that is similar to 2.5.6
for a concave game.
Theorem. Suppose that Γ is a concave game on the unit square with the payoff
function H differentiable with respect to x for any fixed y, x₀ is an optimal pure
strategy of Player 1, and v is the value of the game. Then:

1. if x₀ = 1, then among the optimal strategies of Player 2 there is a pure strategy y′

for which the following inequality holds:

H′_x(x₀, y′) ≥ 0;  (2.5.23)

2. if x₀ = 0, then among the optimal strategies of Player 2 there is a pure strategy y″,

for which

H′_x(x₀, y″) ≤ 0;  (2.5.24)

3. if 0 < x₀ < 1, then among the optimal strategies of Player 2 there is a strategy that
is a mixture of two essential strategies y′ and y″ satisfying (2.5.23), (2.5.24),
taken with probabilities β and 1 − β. Here the number β ∈ [0, 1] is a solution of the
equation

β H′_x(x₀, y′) + (1 − β) H′_x(x₀, y″) = 0.

2.6 Simultaneous games of pursuit


This section provides a solution to certain of the games of pursuit whose payoff
function or the players' strategy sets are nonconvex. The results obtained in Sec. 2.5
are not applicable to such games, and hence a solution for both players is in the class
of mixed strategies. The existence of solution is guaranteed by the Theorem 2.4.4.
2.6.1. Example IS. (Simultaneous game of pursuit inside the ring.) This game
is a special case of Example 1 in 2.1.2 where the sets Si = Sj = S and S is a ring.
Radii of external and internal circles of the ring S are respectively denoted by R and
r,R>r.
We shall show that the optimal strategies of players 1 and 2 are to choose the points
uniformly distributed over the internal (for Player 2) and external (for Player 1) circles
of the ring S. Denote these strategies by μ* (for Player 1) and ν* (for Player 2). When
these strategies are used, the mean payoff (distance) is equal to

K(μ*, ν*) = (1/4π²) ∫₀^{2π} ∫₀^{2π} √(R² + r² − 2Rr cos(φ − ψ)) dφ dψ

= (1/2π) ∫₀^{2π} √(R² + r² − 2Rr cos t) dt = Φ(r, R),  (2.6.1)

where ψ and φ are the polar angles of the pure strategies of players 1 and 2, respectively. If
Player 1 chooses the point x with polar coordinates (ρ, ψ), then the expected distance
(Player 2 uses the strategy ν*) is

K(x, ν*) = Φ(r, ρ) = (1/2π) ∫₀^{2π} √(r² + ρ² − 2rρ cos t) dt.

For r ≤ ρ ≤ R and each fixed t the expression ρ² + r² − 2ρr cos t is monotonically increasing in ρ.

In particular, it does not exceed its value at ρ = R. Hence we have Φ(r, ρ) ≤ Φ(r, R).
Therefore, for any strategy of Player 1 the expected distance is at most Φ(r, R).
We shall now consider the situation (μ*, y), where y ∈ S, and ρ and φ are the polar
coordinates of the point y. We have

K(μ*, y) = Φ(ρ, R) = (1/2π) ∫₀^{2π} √(R² + ρ² − 2Rρ cos(t − φ)) dt,  r ≤ ρ ≤ R.

Let us fix R and consider the function Φ(ρ, R) on the interval 0 ≤ ρ ≤ R. Differentiation
with respect to ρ shows that

∂Φ(ρ, R)/∂ρ ≥ 0,  0 ≤ ρ ≤ R.

Therefore, the function Φ(ρ, R) is monotonically increasing in ρ, and hence Φ(r, R) ≤
Φ(ρ, R). Thus
K(x, ν*) ≤ K(μ*, ν*) ≤ K(μ*, y)
for all x, y ∈ S. We have thus proved the optimality of the strategies μ* and ν*. Here the
game value is v = K(μ*, ν*), where K(μ*, ν*) is determined by (2.6.1). In particular,
if S is a circle of radius R (the case r = R), then the value of the game is 4R/π.
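The value (2.6.1) is easily evaluated by quadrature. The following Python sketch (an added illustration) computes Φ(r, R) by the midpoint rule and confirms the special case r = R, where the value must equal 4R/π:

import numpy as np

def Phi(r, R, n=200000):
    t = (np.arange(n) + 0.5) * 2 * np.pi / n          # midpoint nodes on [0, 2*pi]
    return np.mean(np.sqrt(R**2 + r**2 - 2*R*r*np.cos(t)))

print(Phi(1.0, 2.0))                  # game value for the ring with r = 1, R = 2
print(Phi(2.0, 2.0), 8.0 / np.pi)     # r = R: both numbers approach 4R/pi = 8/pi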
2.6.2. Example 14. Consider a simultaneous game in which Player 2 chooses a
pair of points y = {y₁, y₂}, where y₁ ∈ S, y₂ ∈ S, and Player 1, having no information
about Player 2's choice, chooses a point x ∈ S. The payoff to Player 1 is assumed
to be min_{i=1,2} ρ²(x, y_i). We shall provide a solution for the case where the set S is a
circle of radius R centered at the origin of coordinates (the point O): S = S(O, R).
Consider the function Φ(r, ρ) = r² + ρ² − 4rρ/π, where r and ρ take values from
the interval [0, R]. We shall establish some properties of the function Φ(r, ρ).
Lemma 1. The function Φ(r, R) (as a function of the variable r) is strictly
convex and achieves its absolute minimum at the unique point r₀ = 2R/π.
Proof. We have ∂²Φ/∂r² = 2 > 0. Hence the function Φ(r, R), r ∈ [0, R], is strictly
convex, and the derivative

∂Φ(r, R)/∂r = 2r − 4R/π  (2.6.2)

is strictly monotone. It is evident that the function (2.6.2) is equal to zero at the
unique point r₀ = 2R/π. By the strict convexity of Φ(r, R), the point r₀ is the unique
point of the absolute minimum. This completes the proof of the lemma.
Lemma 2. The function Φ(r₀, ρ) is strictly convex in ρ and achieves its absolute
maximum over [0, R] at the point ρ₀ = R.
Proof. By symmetry, the function Φ(r, ρ) is strictly convex in ρ. Therefore,
the maximum of this function is achieved at one of the points 0 or R. We have

Φ(r₀, R) − Φ(r₀, 0) = r₀² + R² − 4r₀R/π − r₀² = R² − (4/π)(2R/π)R = R²(π² − 8)/π² > 0.

This completes the proof of the lemma.

From Lemmas 1 and 2 it follows that the pair (r₀, R) is a saddle point of the
function Φ:
Φ(r₀, ρ) ≤ Φ(r₀, R) ≤ Φ(r, R).
Theorem. An optimal mixed strategy for Player 2 is to choose the point y₁
uniformly distributed over the circle S(O, r₀) with center at the point O and radius
r₀, and to set y₂ = −y₁. An optimal mixed strategy for Player 1 is to choose the point x uniformly
distributed over the circle S(O, R). The value of the game is Φ(r₀, R).
Proof. The strategies specified by the theorem are denoted by μ* and ν* for
players 1 and 2, respectively. Suppose Player 1 uses the strategy μ*, and Player 2 uses
an arbitrary pure strategy y = {y₁, y₂}, y_i = (r_i cos φ_i, r_i sin φ_i), i = 1, 2. First we
consider the case where y₁ = y₂.
Denote by r the number r₁ = r₂, and by φ the angle φ₁ = φ₂. The payoff to Player
1 is

K(μ*, y) = (1/2π) ∫₀^{2π} [R² + r² − 2Rr cos(ψ − φ)] dψ = R² + r² ≥ R² + r² − (4/π)Rr = Φ(r, R).
(2.6.3)
Then, by Lemma 1, we have K(μ*, y) ≥ Φ(r₀, R).
In what follows we assume y₁ ≠ y₂. A polar coordinate system is introduced
on the plane as follows. We take the origin of coordinates to be the point O, and the
polar axis to be the ray emanating from the point O perpendicular to the chord AB
(AB being the set of the points of the circle S(O, R) that are equidistant from y₁ and y₂). For
simplicity assume that, in the new coordinate system, the points y_i have the
coordinates (r_i cos φ_i, r_i sin φ_i). Then (Fig. 2.5) the payoff to Player 1 is

K(μ*, y) = (1/2π) ∫₀^{2π} min_{i=1,2} [R² + r_i² − 2Rr_i cos(ψ − φ_i)] dψ

= (1/2π) ∫_{−β}^{β} [R² + r₂² − 2Rr₂ cos(ψ − φ₂)] dψ + (1/2π) ∫_{β}^{2π−β} [R² + r₁² − 2Rr₁ cos(ψ − φ₁)] dψ,

where ±β are the polar angles of the points A and B (Fig. 2.5), so that for ψ ∈ (−β, β) the
nearer of the two points is y₂.

Figure 2.5

Let

F₁(φ) = [(R² + r₂²)β − 2Rr₂ sin β cos φ]/π,  −β ≤ φ ≤ β,
F₂(φ) = [(R² + r₁²)(π − β) + 2Rr₁ sin β cos φ]/π,  β ≤ φ ≤ 2π − β,

so that K(μ*, y) = F₁(φ₂) + F₂(φ₁). The stationary points of the functions F₁ and F₂ are
respectively 0 and π, since 0 < β ≤ π/2 and F₁′(φ) = (2Rr₂/π) sin β sin φ, F₂′(φ) =
−(2Rr₁/π) sin β sin φ, with 0 and π being the points of the absolute minimum of the
functions F₁ and F₂ (F₁′(φ) < 0 for φ ∈ (−β, 0), F₁′(φ) > 0 for φ ∈ (0, β); F₂′(φ) < 0 for
φ ∈ (β, π), F₂′(φ) > 0 for φ ∈ (π, 2π − β)). Consequently,

K(μ*, y) = F₁(φ₂) + F₂(φ₁) ≥ F₁(0) + F₂(π)

= (1/2π) ∫_{−β}^{β} [R² + r₂² − 2Rr₂ cos ψ] dψ + (1/2π) ∫_{β}^{2π−β} [R² + r₁² − 2Rr₁ cos(ψ − π)] dψ,  (2.6.4)

i.e. the payoff to Player 1 when Player 2 uses the strategy ỹ₂ = (r₂, 0), ỹ₁ = (r₁, π)
(in polar coordinates) does not exceed that obtained with the strategy

y_i = (r_i cos φ_i, r_i sin φ_i),  i = 1, 2.

Suppose now that the points y₁ and y₂ lie on a diameter of the circle S(O, R)
and the distance between them is 2r̄. Denote by 2α the central angle subtending
the arc spanned by the chord AB (Fig. 2.6). Suppose that y₁ = (R cos α − r̄, 0),
y₂ = (R cos α + r̄, 0). Then the payoff to Player 1 is

ψ(α, r̄) = (1/2π) ∫_{−α}^{α} [(R cos t − R cos α − r̄)² + R² sin² t] dt

+ (1/2π) ∫_{α}^{2π−α} [(R cos t − R cos α + r̄)² + R² sin² t] dt

= (1/2π) ∫_{−α}^{α} [R² − 2R cos t (R cos α + r̄) + (R cos α + r̄)²] dt

+ (1/2π) ∫_{α}^{2π−α} [R² − 2R cos t (R cos α − r̄) + (R cos α − r̄)²] dt

= (1/π){[R² + (R cos α + r̄)²] α − 2R sin α (R cos α + r̄)

+ [R² + (R cos α − r̄)²](π − α) + 2R sin α (R cos α − r̄)}.

Figure 2.6
We shall show that, for a fixed r̄, the function ψ(α, r̄) achieves its minimum in α
when α = π/2. An elementary computation shows that ∂ψ/∂α = {2R sin α [(π − 2α) r̄ −
πR cos α]}/π, and hence for sufficiently small values of α we have ∂ψ(α, r̄)/∂α < 0,
since sin α > 0 and r̄(π − 2α) − πR cos α < 0 (in the limiting case, π r̄ − πR < 0). At the
same time, ∂ψ(π/2, r̄)/∂α = 0.
For every fixed r̄ the function ∂ψ(α, r̄)/∂α has no zeros in α other than α =
π/2. Suppose the opposite is true, and let α₁ be a zero of this function in the interval
(0, π/2). Then the function G(α) = (π − 2α) r̄ − πR cos α vanishes at α = α₁. Thus,
G(α₁) = G(π/2) = 0.
It is evident that then G(α) > 0 for all α ∈ (α₁, π/2). This is a contradiction to the
convexity of the function G(α) (G″(α) = πR cos α > 0). Thus ∂ψ(α, r̄)/∂α < 0 for
α ∈ (0, π/2) and ∂ψ(π/2, r̄)/∂α = 0. Consequently, the function ψ(α, r̄) achieves an
absolute minimum in α when α = π/2: ψ(α, r̄) ≥ ψ(π/2, r̄). The implication here is
that
K(μ*, y) = ψ(α, r̄) ≥ ψ(π/2, r̄) = Φ(r̄, R) ≥ Φ(r₀, R).  (2.6.5)
From (2.6.3)–(2.6.5) it follows that for any pure strategy y = {y₁, y₂} the following
inequality holds:
K(μ*, y) ≥ Φ(r₀, R).  (2.6.6)
Suppose Player 2 uses strategy v" and Player 1 uses an arbitrary pure strategy x =
{pcosip,ps'mxi>}. Then Player 1 receives the payoff

1 / 2lr
K{x, vm) = / min[p2 + r2 - 2pr0cos(t/> -tp), p2 + r2 + 2pr0 cos(^ - tp)]Ap
lit Jo

1 r2*
= / min(/>2 + r 2 - 2/w 0 cos^, p2 + r2 + 2pr0cos()dt = $(r,p)
ZTt Jo
and by Lemma 2 we have

K(Xlv*) = *(r0,p)<*(r0,R). (2.6.7)


84 Chapter 2. Infinite zero-sum two-person games
From (2.6.6) and (2.6.7) we have that p.* and u" are optimal strategies for the players,
and $(r 0 , R) is the value of the game. This completes the proof of the theorem.
2.6.3. Example 15. Suppose Player 2 chooses a set of m points y = {j/i,... ,ym},
where y, S, t = 1,..., m, and Player 1 simultaneously chooses a point x S. The
payoff to Player 1 is assumed to be mini=i,.,.>m p(x,y;). We shall solve the game for
the case where the set S coincides with the interval [1,1].
Theorem. Player S's optimal mixed strategy v* is the equiprobable choice of two
sets of m points
{-1 + 2^=T' i = 0 ' 1 ' - ' m - 1 > '
{1 = 0 1
-2^T'' ' '-'m-1}-
The optimal strategy p." for Player 1 is to choose the points
r 2m-2t-l . , ,,
{ 2m_l , t = 0,l,...,2m-l}
with probabilities l/(2m). The value of the game is l/(2m 1).
Proof. Suppose p.* and v" are the respective mixed strategies for the players 1 and
2 whose optimality is to be proved. Introduce the following notation:

''-[ 2m-l ' 2m-1 I' = M , - . ! ^ - 1 .


First show that K(x,v") < p^rjt * r all * G [1, !] In fact, for all x 6 i, we have

2m - 4t - 1
K(x, v") = - min
2m-1 x
1 . j - 2 m + 4t + l j
+-min x
2 < | 2m-1 I
1/ 2m-2j-l\ 1/2m-27 + 1 \ 1 , ,

Now suppose Player 1 chooses a mixed strategy p.* and Player 2 chooses an arbitrary
pure strategy y = {y l 7 ... ,ym}.
Denote
2m - 2j - 1
] =0,l,...,2m-l.
' 2m-1
Then
im-i i

m
1 1 2 1
= 1 +
S^SW-^- '^ .igL^-*)! * 2 ^ 2 ^ ! = 2m^T
The statement of the theorem follows from inequalities (2.6.8), (2.6.9).
2.7. One class of games with a discontinuous payoff function 85

2.7 One class of games with a discontinuous pay


off function
The existence of the game value in mixed strategies cannot be guaranteed for the
games whose payoff functions are discontinuous (see, e.g., Example, 2.4.12). But
sometimes the discontinuity of the payoff function makes possible to obtain optimal
strategies and the value of the game. Empirical assumptions of the form of players'
optimal strategies can also assist in finding a solution.
2 . 7 . 1 . This section deals with games of timing or duel type games (see Examples
4,5 in 2.1.2). The main feature of this class of games on a square is the discontinuity
of the payoff function H(x,y) along the diagonal x = y.
We shall consider the game on a unit square with the payoff function [Karlin
(1959)]

( ip(x,y),
>(*),
6(x,y),
if x < y,
if* = y,
if x > y,
(2-7.1)

where ip{x, y) is defined and continuous on the set 0 < x < y < 1, the function <p is
continuous on [0,1], and &(x,y) is defined and continuous on the set 0 < y < x < 1.
Suppose the game T = (X, Y, H), where X = Y = [0,1], with H given (2.7.1), has
optimal mixed strategies (x" and v" for the players 1 and 2, respectively. Moreover,
the optimal mixed strategies n",v' are assumed to be the probability distributions
which have continuous densities f"(x) and g*(x), respectively.
Let us next denote the required strategy by / (or g, respectively) to be taken as
distribution density. We shall clarify the properties of optimal strategies.
Let / be a strategy for Player 1. For y [0,1] we have
K(f,y) =[ 4>(x,y)f(x)dx +f 0(x,y)f(x)dx. (2.7.2)

Suppose that / and g are optimal strategies for players 1 and 2, respectively. Then
for any point y0, at which
g(y0) > 0 (2.7.3)
(that is the point of the strategy spectrum g), the following equation holds:

K(f,yo) = v, (2.7.4)
where v is the value of the game. But (2.7.3) is strict and hence there is 6 > 0 such
that inequality (2.7.3) holds for all y : \y-yo\ < $ Thus, inequality (2.7.4) also holds
for these y, i.e. the equality K(f,y) = v is satisfied. This means that

dK
U>yl = Q. (2.7.5)
y
Equation (2.7.5) can be rewritten as

[%,v)-tf(v,y)]/(y) = f'0(*,v)/(*)k+ / ' ey(x,y)f(x)dx, y e S(yo,S). (2.7.6)


JO Jti
86 Chapter 2. Infinite zero-sum two-person games
We have thus obtained the integral equation (2.7.6) for the required strategy / .
2.7.2. Example 16. Consider the noisy duel formulated in Example 5, 2.1.2. The
payoff function H(x,y) in the game is of the form (2.7.1), where
t(>(x,y)=x-y + xy, (2.7.7)

0(x,y) = x-y-xy, (2.7.8)


<p(x) = 0. (2.7.9)
Note that this game is symmetric, because H(x, y) = ~H(y, x) (a skew-symmetric
payoff function). Therefore, analysis (similar to the analysis given in 1.9.2) shows that
the game value v, if any, is zero, and players' optimal strategies, if any, must be the
same.
We have: i>y(x,y) = - ! + *, 9v(x,y) = - 1 - x, 0(y,y) - $(y,y) = -2y 2 and the
integral equation (2.7.6) becomes

-2y2f(y) = f\x - l)f(x)dx - f\x + l)f(x)dx. (2.7.10)

We seek a strategy / in the class of the differentiable distributbn densities taking


positive values in the interval (a,/?) C [0,1], with the interval (a,/?) taken as a
strategy spectrum of / . Then (2.7.10) can be written as follows:

-2y3/(y) = l\x - i)f(x)dx - / % + i)f(x)dx. (2.7.11)


Differentiating both sides of (2.7.11) with respect to y we obtain a differential equation
of the form
- 4 y / - 2 j , 2 / ' = ( y - i ) / + (y + i ) /
or
y/' = - 3 / (y^O). (2.7.12)
Integrating equation (2.7.12) yields

/(y) = 7y~3, (2.7.13)


where <y is a constant.
It remains for us to find a, /? and 7. Recall that the players' optimal strategies in
the game are the same. From the assumption on a strategy spectrum of / it follows
that
K(f,y) = 0 (2.7.14)
for all y (a,p).
Let /? < 1. Since the function K(f,y) is continuous in y, from (2.7.14) we have
K(f,fi) = 0. Consequently,

/ (x - p + px)f(x)dx = 0. (2.7.15)
Ja
2.7. One class of games with a discontinuous payoff function 87
But in the case /? < 1 it follows from (2.7.15) that

t0
K(f, 1) = / (x - 1 + x)f(x)dx < 0
Ja

which contradicts the optimality of the strategy / . Thus $ = l and K(f, 1) = 0. By


substituting (2.7.13) into (2.7.15), with /? = 1, we obtain

ri 2x 1
7/ k = 0, 7 5^0.
Ja X

Hence it follows that


3 Q 2 - 4a + 1 = 0. (2.7.16)

Solving equation (2.7.16) we find two roots, a = 1 and a = 1/3, the first root being
extraneous. Consequently, or = 1/3. The coefficient 7 is found from the normality
condition for f(y)

I] f(v)dy = i[]
/1/3 Jl/3
if3* = 1,
Jl/3 Jl/3
whence follows 7 = 1/4.
We have thus obtained the solution of the game given in Example 5, 2.1.2: the
value of the game is v = 0, and the players' optimal strategies / and g (as distribution
densities) are equal to one another and are of the form

Q
f(x)-f> x<l/3,
/ w
~\l/(4i3), x>l/3.

2.7.3. Example 17. Find a solution to a "noisy duel" game (see Example 4, 2.1.2)
for the accuracy functions pi(x) = x and Pi(y) y. The payoff function H(x,y) in
the game is of the form (2.7.1), where

0(I,) = 2 I - 1 , (2.7.17)

(,) = l - 2 , (2.7.18)

(*) = 0. (2.7.19)

The game is symmetric, hence v = 0, and the players' optimal strategies coincide.
Here both players have an optimal pure strategy x" = y* = 1/2. In fact, H(l/2,y) =
0(1/2,y) = 1 - 2y > 0 if y < 1/2, H ( l / 2 , y ) = v ( l / 2 ) = 0 if y = 1/2, / / ( l / 2 , y ) =
V>(l/2,y) = 0 i f y > l / 2 .
From the game interpretation standpoint, the solution for the duelists is to fire
their bullets simultaneously after having advanced half the distance to the barrier.
In conclusion it may be said that the class of games of timing has been much
studied (see Davidov (1978), Karlin (1959), Vorobjev (1984)).
88 Chapter 2. infinite zero-sum two-person games
2.8 Solution of simultaneous infinite games of
search
This section provides a solution of the games of search with the infinite number of
strategies formulated in 2.1.2. It is of interest that, in the first of the games considered,
both players have optimal mixed strategies with a finite spectrum.
2.8,1. Example 18. (Search on a closed interval.) (Diubin and Suzdal (1981)].
Consider the game of search on closed interval (see Example 2 in 2.1.1) which is
modelled by the game on a unit square with the payoff function H(x, y) of the form

*(*')-{!: '&*'''^ ^
Note that for / > 1/2 Player 1 has an optimal pure strategy x" = 1/2 and the value
of the game is 1; in this case H(x',y) = H(l/2,y) = 1, since \y - 1/2| < 1/2 < / for
all y [0,1]. Let / < 1/2. Note that the strategy x = I dominates all pure strategies
x < I, and the strategy x = 1 / dominates all strategies x > 1 1. Indeed,

<*> -*.>=U: ffi


and if x < /, then
(X,y)
10, otherwise.
Thus, with x <l: H(x,y) < H(l,y) for all y [0,1). Similarly, we have

*(..,>-(.-w-{i: i2;2'A
and if x 6 [1 - 1,1], then

^ ' ^ 1 0 , otherwise.
Thus, with x [1 - /, 1], H(x,y) < H(l - I,y) for all y (0,1].
Consider the following mixed strategy fi* of Player 1. Let I = xx < x2 < ... <
xm = 1 / be the points for which the distance between any pair of adjacent points
does not exceed 21. Strategy p.* selects each of these points with equal probabilities
1/m. Evidently, in this case any point y G [0,1] falls within /-neighborhood of at
least one point x*. Hence
K(fi\y)>l/m. (2.8.2)
Now let v' be a strategy of Player 2 that is the equiprobable choice of points 0 =
yi < V2 < < Vn 1, the distance between a pair of adjacent points exceeding 21.
Then there apparently exists at most one point y* whose /-neighborhood contains the
point x. Consequently,
K{x,v>) < l/n. (2.8.3)
2.8. Solution of simultaneous infinite games of search 89

If strategies t*,f* were constructed so that ro = n, the quantity 1/n would be


the value of the game with strategies fi', v* as the players' optimal strategies.
It turns out that such strategies can actually be constructed. To do this, it suffices
to take
1
ro _ _ / /(20, >f V(2') is an integer, , ,
m n (2 8 4)
- - \ [ l / ( 2 0 ] + l, otherwise. ' "
Here [a] is the integer part of the number a. The points

x, = / + )~^-{i - 1), t = l,2,...,n, (2.8.5)


n 1
are spaced at most 21, and the distance between adjacent points

S ^ ^ - , i = l,2,...,n, (2.8.6)
n 1
faithfully exceeds 2/. Thus, 1/n is the value of the game, and the optimal strategies
/i*,i/* are the equiprobable mixtures of pure strategies determined by (2.8.5), (2,8.6).
2.S.2. Example 19. Consider an extension of the preceding problem to the case
where Player 1 (Searcher) chooses a system of s points x i , . . . , x , , x,- [0,1], i =
1,... ,s, and Player 2 (Hider) chooses independently and simultaneously with Player
1 a point y 6 [0,1]. Player 2 is considered to be discovered if there is j { 1 , . . . , s)
such that \y Xj\ < I, I > 0. Accordingly, the payoff function (the payoff to Player
1) is defined as follows:

#(x...,*.,y)={I' T ^ . - * ' ' 1 - ' ' (2.8.7)


\ u , xf |Qj ot j,erW]Se. *> >
Suppose Player 1 places the points x\,..., x, at the points Xj = / + (1 2/)(t
l ) / ( n - 1), 1 < i < n that are the points of the strategy spectrum ft* from the
preceding example. Evidently, arrangement of two points Xjs, Xj2 at one point of the
interval [0,1] (i.e. selection of coincident points) provides no advantage. Let (i* be
Player l's strategy selecting equiprobably any s-collections of unequal points {3?*}.
If s > n, then, by placing a point Xj at each of the points x^ Player 1 covers the
entire interval [0,1] with the segments of length 21 centered at the points x\ and thus
ensures that for any point y [0,1] there is miny \XJ y\ < I, i.e. here the value of
the game is 1. Therefore, we assume that s < n. The number of all possible distinct
selections of s-collections of points from the set {xi} is C*. We have

In fact, the point y is discovered if it falls within /-neighborhood of at least one of


the points {x<} selected by strategy /j*. In order for this to occur, Player 1 needs
to select the point x; from /-neighborhood of the point y. The number of collections
satisfying this requirement is at least C%l\-
90 Chapter 2. Infinite zero-sum two-person games
We now suppose that Player 2 uses strategy vm from the preceding example and
Player 1 uses an arbitrary pure strategy x = (x\,... ,x,). Then
" I s
K{xu...,xv*) = ^2H{xu...,xyi)- < -.

Thus, the value of the game is s/n, and /**, v* are the players' optimal strategies.
The value of the game is linearly dependent on the number of points to be chosen by
the Searcher.
2.8.3. Example 20. (Search on a sphere.) Consider the game of search on a
sphere (see Example 3 in 2.1.2). The payoff function H(x,y) is

*<.> H i ! otherwise, C2-8-8*


where x {x%,..., x,) is a collection of s points on a sphere C and Mx = UJ = 1 S(XJ, r);
S(ij,r) is the r-spherical neighborhood of the point Xj. The set of mixed strategies
for Player 1 is the family of probability measures determined on the Cartesian product
of s spheres C x C x . . . x C = ft, i.e. onfl = C.
Define the set of mixed strategies for Player 2 as the family of probability measures
{u} determined on the sphere C.
Consider a specific pair of strategies (/I*,J/*). We choose a uniform measure on
the sphere C to be the strategy v", i.e. we require that

L **-{$ <**>
where L(A) is Lebesgue measure (area) of the set A.
Parameters of the game, s, r and R, are taken such as to permit selection of the
system of points x = (xj, x3,..., x,) satisfying condition

L(Mz) = L ( S ( x , , r ) ) (2.8.10)
i=i
(spherical segments S(xj,r) do not intersect).
Let us fix a figure Mx on some sphere C". The mixed strategy p" is then generated
by throwing at random this figure Mx onto sphere C. To do this, we set in the figure
Mx an interior point z whereto rigidly connected are two noncollinear vectors a,b
(with an angle <p > 0 between them) lying in a tangent plane to Mx at point z.
Point z is "thrown" onto sphere C in accordance with uniform distribution (i.e.
density l/(4ir/t 2 )). Suppose this results in realization of point z' 6 C. Figure Mx
with the vectors set thereon, is transferred to sphere C in a parallel way so that the
points z and z' coincide. Thus, vectors a, b are lying in a tangent plane to sphere C
at point z'.
An angle ip' is then chosen in accordance with uniform distribution in the interval
[0,2T], and vector 6 lying in a tangent plane is turned clockwise together with its
2.8. Solution of simu/taneous infinite games of search 91

associated figure Mx through an angle <p'. This results in the transition of figure Mx
and vector 6 to a new position on sphere C. Random positioning of the set Mx on
a sphere in accordance with this two-stage procedure described, generates a random
choice of the points x\, x'2,..., x', 6 C whereas the centers xi,...,xt of the spherical
neighborhoods S(ij,r) making up the set M, are located.
Measure ft' is so constructed that it is invariant, i.e. the probability of covering
the set Mz of any point y 6 C is independent of y. Indeed, find the probability of
this event. Let ST = {<*>} be the space of all possible positions of Mx on sphere C.
Then the average area covered on sphere C by throwing the set Mx thereon (area
expectation) is equal to L(MX). At the same time

L{MX)= Lj J{y,w)dydn' (2.8.11)


Ju Jc
where J(y,w) is the characteristic function of the set on sphere C covered by the
domain Mz. By Fubini theorem, we have

/ / J(y,u>)dydf = [ Lj(y,u>)dfdy. (2.8.12)


JnJc Jc Ju
By the invariance of measure fi*, however, the integral /|jj(y,u>){fy**, which coincides
with the probability of covering the point y by the set Mx is independent of y and
equals p. Then, from (2.8.11), (2.8.12), we have
,)
- r = ^"'
Denote by K(ft,u) the payoff expectation when using mixed strategies (i {ft} and
v E {f}- If one of the players uses a pure strategy, then

K(x,v)~ f H(x,y)du = f dv = Pr(yMx),


Jc Ju,

K(f*,y) = LHix^dfi = Lj{x,y)dp = Pr(y e Mx),


Jo Ja
in which case mathematical expectation signify respective probabilities that a random
point falls within a fixed region and a random region to cover a fixed point. For all
y and x = (asi,... , x , ) , under conditions (2.8.9) and (2.8.13), we have

K(r L .., L(MX) ^ E ' = 1 L(S(x3,r)) w / r 2\

^'-^^-SM-<*>)
since L{S{xr)) = 2wR(R - yJ(B? - r)).
From the definition of a saddle point and the resulting inequality K(y.',y) >
K(x, 1/") it follows that the mixed strategies p.* and v* are optimal and

*V..->"i(i-,/i-<>)
92 Chapter 2. Infinite zero-sum two-person games

is the value of the game of search discussed above.


2.8.4. Consider an alternative to the preceding game assuming that Player 2,
chooses a simply connected set Y C C and Player 1 aims to maximize an intersection
area
L(Y n Mx) = L(Y n U'=1S(Xj,r)).
The objective of Player 2 is the opposite one. Otherwise the game coincides with
that considered at the beginning of this section. Player l's strategy p* coincides
with that given in the preceding game. Player 2's mixed strategy v* is constructed
analogously to strategy ft' and is the random throwing of the set Y onto a sphere (in
the preceding case Player 2 choose points y C at random). Thus, v* is constructed
as an invariant measure which is to choose at random (in accordance with a uniform
distribution over C) one of the fixed points of the set Y on C and to turn Y around
this point through a random angle (in accordance with a uniform distribution over
[0,2*]). Let K(x, t>), K(fi, y) be the mathematical expectation of the intersection area
L(YnMx). Then

LiY L z)
*(-, ) = K (, ') = *(-, O = l^.
If Y is the r-neighborhood of the point y, then the value of the game is

K(p; v') = *s{R - VrV - r) 2 .

2.9 Games of secondary search


2 . 9 . 1 . Search in the class of one parameter families of trajectories. Let us formulate
the game which is essentially dynamic, but, as shown below, can be reduced to a
simultaneous game on a square.
Consider the following conflict situation. Player 1 (Searcher) learns that at the
time t = 0 Player 2 (Hider) is at the point 0. The Hider knows that his position
has been disclosed, but the initial position x 0 of Player 1 is unknown to him. From
this point on the players know only their own positions, x(t) and y(t), but have no
current information about the opponent's behavior. Such problems are often called
a secondary search or a search by call, because the Searcher is informed about the
initial state of his opponent.
Let there be given the number / > 0 as a radius of a detection circle. Equations
of motion for a > 0 and ||xo|| > ' become

x = u, |||| < a, x(0) = x 0 ; x, R2, (2.9.1)

y = v, |M| < 0, y(0) = y 0 ; y, v R2 (2.9.2)


for the players 1 and 2, respectively.
Objective of Player 1(2) is to ensure (avoid) /-capture by Player 2(1).
Strategy. Since Player 2 has no information about the initial and current position
of the opponent, his behavior will be limited to a class of strategies E. To describe
2.9. Games of secondary search 93

the set E, let us introduce on a plane the polar coordinate system with its pole at
the point O, and with x0 as a polar axis.
Definition. By the strategy y E for Player 2 is meant a pair yv = {Tp, v) where
Tp is a random variable uniformly distributed over the interval [0,2it], and v [0,0\.
We assume that the strategy yv 6 E may be realized as follows. Having selected
a velocity v 6 [0,/?] under strategy yv, from the time t = 0 on, Player 2 moves from
point O with a constant velocity v along the ray <p = ipo, where y>0 is the realized
value of a random variable <p. Then the motion of Player 2 in the polar coordinate
system corresponding to the strategy yv is given by

p = vt, <p = <p0. (2.9.3)

A strategy for Player 1 will be chosen from the class P.


Definition. By the strategy xu P for Player 1 is meant a pair xu = (u,a(-)),
where u [0,0], and a(-) is the rule by which, from the time t = 0 Player 1 moves
at velocity a toward point 0 until the instant i" at which he meets Player 2, if the
latter moved to meet him with velocity u, after Player 1 chooses at the instant t* a
particular direction to bypass point O and continues to move with a maximum velocity
a keeping the radial velocity component equal to u.

Figure 2.1

It is clear that the strategy class P is denned correctly, since every strategy x P
uniquely defines Player l's trajectory. Indeed, to the strategy xu = (u,a(-)) corre
sponds Player l's trajectory of motion in the polar coordinate system as follows:

rtOHWI-at, G KM;], ? = 0, (2.9.4)

P{t) = M > *:, t) = ^ ^ in 1 , tl = J M . (2.9.5)


Eliminating the parameter t in (2.9.5) yields

p(<p) = ti||i 0 ||/(a + u) exp(uy./Va a - u 2 ), <p > 0. (2.9.6)


94 Chapter 2. Infinite zero-sum two-person games
It is evident from (2.9.5) and (2.9.6) that the players' trajectories are controlled
by parameters v [0,/?] and u[Q,0\.
We assume that the game terminates as soon as Player 1 completes one turn
around the point 0. Note that the logarithmic spiral (2.9.6) is a locus of capture
points provided Player 1 has been moving from the time t" with a constant velocity
a, with Player 2 moving along the ray y? = tp0 with the known velocity (for all
<po 6 [0,2*-]). Therefore the spiral (2.9.6) is often called a pointwise capture spiral
(Fig. 2.7).
Payoff. The payoff function K(xa,yv) for Player 1 is taken to be the probability of
detecting Player 2 provided the latter chooses strategy yv E and the former chooses
strategy P. The payoff to Player 2 is set equal to K(~xu,yv), since the game
is zero-sum. Derive an analytic expression for the function K(xu,yv). To do this, it
may be convenient to introduce the transformation of game space which is called the
circular velocity transformation.
Under this transformation the point O remains invariant. For every point the
polar angle remains invariant and the radial distance has a factor l/t, where t > 0
is the time reckoned from the start of the game. From (2.9.3) and (2.9.5) it follows
that, in the velocity space, the strategy yv for Player 2 is a random choice of the point
on a circle of radius t; [0, ft] with its center at point O, and Player l's motion in
accordance with strategy xu P becomes from the time t* a motion along the circle
of radius u G [0,0} centered at point O, with a linear velocity -^a2 u2/t (Fig. 2.8).
Here, in the velocity space, the radius of a detection circle is l/t.

Figure S.8

Suppose the Searcher chooses strategy xu g P. Fig. 2.8 shows the area swept by
the circle of detection. The shaded area is called the area of detection of Player 1
using the strategy x e P and is denoted by fiu.
Suppose the Hider chooses the strategy yv P and Player 1 chooses the strategy
z P. Compute K(xu,yv). If l/t$ > |u - i/|, then the area ftu covers the circle of
radius t; as long as l/t > |u |, i.e. until the time necessary for the first player to
turn around point 0 (u P is fixed).
2.9. Games of secondary search 95

Let DA be the arclength of the circle of radius u travelled by Player 1 from the
time <JJ to the time tu. Then we have

DA = / ' Va 2 - u2lt dt = \ / a 2 - u 2 l n ^ . (2.9.7)

Denote by DA' the part of the radius v arc covered by the area ftu. Then (see Fig. 2.8)
DA' = DA (vfu). Recall that the quantity Tp is uniformly distributed over [0,2JT],
and hence
H(^yv)^ -DA> = ^ V c T ^ ^ l . (2.9.8)

Note that for l/t$ < \u - v\


H(5u,yv) = Q, (2.9.9)
because the detection area 0 U does not intersect the circle of radius v.
Consider the last case where, after making a complete turn around point O, the
area fiu continues to cover the circle of radius v. In this case

H(xu,%) = 1, |u-|<(u), (2.9.10)

where e(u) is found from the relation

2 = % / a 2 - u 2 l n ( r / ^ ) = Va2-u2\n{l{a + u)/(||x 0 || (u))]. (2.9.11)

From this we have

t(u) = t(a + u)/||a:o|| e x p ( - 2 / v / a 2 ^ u 2 ) . (2.9.12)

Thus we finally get

_ fl, |u-|<e(u),
H(xu,yv) = H(u,v) = &^In jpHg^Lf, e(u) <\u-v\< //, (2.9.13)
U, |u-i >//<;.
Note that the resulting payoff function H(u, v) depends only on (u, v) [O,_0\ x [0, /?],
i.e. we have thus obtained the game on a square, with the payoff function H(-) being
continuous in its arguments; hence in our game there exist an equilibrium point in
mixed strategies. A closely related, but differently stated problem is solved in Danskin
(1968).
2.9.2. Discrete secondary search. Consider the following game theoretic problem.
The Searcher learns at the time t = 0 that the Hider is at the point y(Q) = 0 and
can move with velocity which does not exceed the magnitude of /?. At this time the
Searcher calls a team of pursuers S = {Si,..,,Sn} acting as one player (Player 1)
to conduct a secondary search for Player 2. It is assumed that each of the pursuers
Si,i = 1 , . . . , n, can conduct a discrete search at the fixed instants of time t], t2,...,tjv
by choosing the points x,-(tj) C(O,0tj), where C(0,f)t) is the circle of radius fit
96 Chapter 2. Infinite zero-sum two-person games
with its center at point 0. In this case Player 2 is considered to be detected if there
exist t and j such that
IM*i)-*<(ti)ll<f, (2-9-14)
where / is the given detecting radius (or capture radius), and || || is a norm in R2.
Player 2 (the Hider) knows that, starting from the time t = 0, he will be searched
for by the team of pursuers S. And he has no other information about the opponent.
Based on the available information is the assumption that Player 2 confines himself
to a linear motion along the ray proceeding from point O with a constant velocity
v [0,/?], and chooses randomly the direction of motion by the uniform distribution
law. This situation is visualized in Fig. 2.9.

Figure 2.9

For the purposes of further discussion it may be convenient to introduce the ve


locity circle transformation. Then, in the velocity space, a pure strategy yv for the
Hider is a choice of the point uniformly distributed over the circle of radius v 6 [0,0\,
whereas a strategy a; for the Searcher is a choice of the shape of a detection region ft,
i.e. the area a(ft) (Fig. 2.10), where for every ft

/t(ft) = 5 r n 5 3 ( - l = const. (2.9.15)

The region ft will be called a detection region of Player 2 provided Player 1 uses
strategy x. The probability of detecting Player 2 is taken to be the payoff function
H(x,y) of Player 1, i.e.
J7(*,y) = Pr(;eft).
Denote by M = rfP the area of a velocity circle C(0, ff). We assume that ft < M.
It is clear that, in this game, there is no optimal pure strategy for the Hider. Hence
we introduce mixed strategies for Player 2.
2.9. Games of secondary search 97

Figure 2.10

Definition. By the mixed strategy v(v) for Player 2 is meant any distribution
density of a random variable v [0,0], i.e. the function u(v),

" ( w ) > 0 , vG[0,/9], / u(v)dv = l.


Jo
In mixed strategies, the probability of detecting Player 1 then becomes

K(x,u)- H(x,yv)t/(v)dv. (2.9.16)


Jo

T h e o r e m . The game formulated above has a saddle point (x*,y*), where x* is


the area sector ft oriented in a random way by the uniform distribution law

v*(v) = 2 / M , (2.9.17)

with 7 = fi/M as the value of the game (see Fig. 2.10).


Proof. To prove this theorem it is necessary to show that inequalities

H(x',yv)>i, (2.9.18)

K{x,v*)<-r (2.9.19)
hold for all strategies x and yv.
Suppose Player 1 chooses strategy x'. Then for every strategy yv of Player 2 we
have
H(x%yv) = ej(2w) = n/M, (2.9.20)
where 6 = 2^/fi7 is the central angle of sector Q (Fig. 2.11).
We shall now show the inverse inequality (2.9.19). To this end, we assume that
Player 1 chooses an arbitrary pure strategy x and Player 2 chooses a strategy i/*.
Examine now dil which is the part of the region fi bounded by v, v+dv and 0 , 6 + 5 0 ,
as shown in Fig. 2.12.
If u'(v) < 2icv/M, then the probability of finding Player 2 in the region d(l is

PT(H edil) = ^- V'{v')dv' = ~[(v + dvf - v2}. (2.9.21)


lit Jv IM
98 Chapter 2. Infinite zero-sum two-person games

Figure 2.11

Figure 2.12

But the quantity 6Q[(v + dv)2 - t>2]/2 is the measure fi(dQ) of dSl. Therefore the
probability of K(x, v*) is

*(*'"*) = i i / ( d f t ) : = M (2 9 22)
--
for every pure strategy x of Player 1 (the Searcher). This completes the proof of the
theorem.
Note that the above-stated optimal strategy of search x" is not unique. In fact,
if the central sector Q will be "cut" by radial rays into several central sectors H,-, i =
1 , . . . , m, then the resulting new strategy would also be optimal. Similarly, "cutting"
by a circle arc of sector leaves its optimality unchanged. In particular, the following
"good" strategy of search can be proposed to the team of pursuers 5 = { S i , . . . , Sn}.
The central sector is cut by radial rays into n sectors with the area fijn each. And
each of these sectors is cut by the circle arcs into N segments with the respective
areas ff(//ii)2,ir(l/*2)2,... ,r(l/tN)2. The segments are then approximated by the
respective equal circles (Figs, 2.13, 2.14).
2.9.3. Reestablishing a contact with the evading submarine. [Diubin and Suz
dal (1981)]. Suppose a floating submarine has been detected by an aircraft radar.
Knowing that it has been detected, the submarine makes a crash dive and breaks
into evasion in a submerged condition. In order to reestablish a contact, the aircraft
is expected to use in time t a radio-sonic buoy whose range is taken to be 6. When
2.9- Games of secondary search 99

Figure 2.13

Figure 2.14

evading, the submarine is expected to appear in time t at one of the points of the
circle of a unit radius. It is evident that if 6 > 1, then, after setting up the buoy at the
center of this circle, the aircraft would reestablish a contact, otherwise reestablish-
ment of a contact would depend on a distance between the buoy and the submarine.
Therefore, with S < 1, the aircraft chooses the point for setting up the buoy and the
submarine chooses the point for its position in time t.
We assume without loss of generality that at the initial time the submarine is at
the point (0,0) and in time t appears at the point y = (5/1,1/2), where y* + y\ < 1.
The submarine is taken to be Player 2 whose pure strategy is the choice of the point
y Y = {(yi>y2)\(yi + y|) ^ *} Accordingly, the aircraft is taken to be Player 1
whose pure strategy is the choice of the point x X = {(x 1 ,i 2 )|(si + #2) ^ ! }
Then the model for this secondary search is a two-person zero-sum infinite game
r = < X,Y,H >, where

#(*,) = ( ! ' if>/(*i-yi)2 + ( * 3 - ) 2 < ' . (2.9.23)


(. 0, otherwise.
Let fi* and v" be optimal strategies for players 1 and 2, respectively, and let Ta
100 Chapter 2. Infinite zero-sum two-person games
and Tp be the transformations of the circles X and Y rotating them through the
angles a and /?, respectively. Denote by p* and v*& the measures on the circles X and
Y defined by the equations n'0(A) = ft'(T^lA), AcX, ^(B) = ^ ( T j ' B ) , BcY.
For any a, /? [0, 2TT] the strategies p*a and fjj are also found to be optimal strategies
for the players in the game T.
I y| *\ n f*f\ if

ff(x, V') < K(it\ S) < K(ti', y),xX,y F,


1 l
then, from H(T~ (x),T~ (y)) = H(x,y), the following inequalities hold:

K(fil,y) = j x H(x,yWa(x) =j g H(T;\x),T;l(y))dn;,(T;l(x))

= K ^ M T ' W ) > K(p\ v'), (2.9.24)


K{x,;) = / F ff(*,y)Afl*) = K(T^(x),u') < K(p\v'). (2.9.25)
On the other hand, integrating (2.9.24) and (2.9.25) with respect to j/jj and n'a
yields the equality #(;**, yj) = K(p',v"). From the last equality and inequalities
(2.9.24) and (2.9.25) it follows that for a,0, xX,ytY

K(x, V') < Kbl, v}) < K(ix-a, y). (2.9.26)


Thus, nl, and v*& are optimal strategies for the players. We shall now define the
measures ft and v by

M-hCL*'-*'' Acx>
^ a s f / , ^ BCY-
Show that the measures /i and v are optimal strategies for the players in the game F.
In fact, by (2.9.26),

K(x, v) = i - j**K{x, v;)d$ < K(jfat v$. (2.9.2?)

Payoffs in all situations of saddle point (ftl,vp) are equal; hence

Consequently, inequality (2.9.27) is equivalent to the following inequality:

K{x,i>)<K{fi,i>), xX.

The inequality
K(ii,y)>K(ii,u), yY
can be proved in much the same way.
2.9. Games of secondary search 101

Thus, the strategies ji and v are optimal strategies for the players.
The measures \i and v are invariant, by their definition, under rotations through
any angles a and f3, i.e.

p(A) = fL(TaA), A C A", a [ 0 , 2 ^ ] ,

y{B) = i>(T0B), B c V J e [0,2*],


Any rotation-invariant measure fi X can be given by the measure (i defined on
the interval X = [0,1]; the measure of the set

A = {(xx,Xi) \xi = i cos a, 2 = 1 sin a, xl <x< x2,cti < a < Q 2 }

is defined by

M^) = r - / da dji(x). (2.9.28)

If, however, the measure y. is defined by (2.9.28), where ~p is an arbitrary proba


bility measure on the interval X, then it appears to be optimal. A similar assertion
applies to any invariant strategy v Y and an arbitrary measure on Y = [0,1].
As noted above, the players in the game T have optimal invariant strategies. If the
measure Ji is concentrated in the point x X, then the measure is the one uniformly
distributed over the circle of radius x. This suggests that a solution of the game
T = (X,Y,K), in which the players take the uniform measures on concentric cir
cles as pure strategies, is a solution of the game T. The function K is equal to the
expectation of payoff provided the players use only the strategies described in this
paragraph.

Figure 2.15

The sets X and Y can be identified as the points of the interval [0,1]. Then the
choice of pure strategies 1 [0,1], y 6 [0,1] in the game T will mean the choice of
102 Chapter 2. Infinite zero-sum two-person games
mixed strategies in the game T which correspond to the uniform distributions over
the circles of radii x and y. In this case the function K is defined by

2xf
for (x + y) > I, |x - yj < I,
K{x,y) = for \x + y| < /, (2.9.29)
otherwise.

Indeed, (see Fig. 2.15) for any y G Y, \Jy\ + y\ = y, the expectation of the payoff
for the first player K(x, y) is equal to the arc length of the circle of radius x which
is inside of the circle of radius / with its center at point y. This length is equal to
2 2
2Q/(2JT), where the angle or is determined from the equation x + y 2xycosa = P.
Solving the last equation for a with x + J? > /, \x - y\ < I, y = (yt,y2), sjy\ + y\ = y
yields
X* + f - P
^( s > y) arccos
2xy

Figure 2.16

Since the last expression holds for all pure strategies y lying on the circle of radius
y, integrating with respect to a probability measure uniformly distributed over this
circle, with the indicated values x and y, yields (2.9.29). If, however, (x + y) < /,
then (see Fig. 2.16) it is evident that any circle of radius S circumscribed about any
point y lying on the circle of radius y completely covers the circle of radius x. Hence
K(x,y) = 1. Finally, if |x - y| > i, then (see Fig. 2.17) these circles do not intersect,
and hence K(x,y) = 0.
We shall prove that any pair of optimal strategies p* and V" in the game T defines
a solution of the game T. Let ft' be a measure on X defined by strategy jT, and
v' be a measure on Y defined by strategy F*. By the definition of the function K,
~K(r,T) = * ( , x > ' ) , T?{x,V) = K{x,V), K{n',y) = ^ / T . y ) . The arguments
x and y of the function K should be interpreted as the mixed strategies selecting
2.9. Games of secondary search 103

Figure 2.17

uniformly the points of the circles of radii x and y. Prom the above equalities and
optimality of strategies ]T and F", the inequalities

K{x,S)<K(iiW)<K(r',V) (2.9.30)

hold for all x and y. Suppose that for some x X

K{x,v')>K{y.\v'). (2.9.31)

Then, by the invariance of strategies v' for any a, K(Ta(x),v') = K{xQ,v'). Conse
quently equality K{x% v') = K(x, u') holds for all i X located at the same distance
from the center of the circle as x. In view of the last equality and inequality (2.9.31)
we obtain K(x, v') > K(fi',i/'), x X. By integrating this inequality with respect
to the measure uniformly distributed over the circle containing the point x, we arrive
at the inequality K(x~, v') > K((i',u'). This inequality contradicts the first of the
inequalities (2.9.30). Hence the inequality

K(x,v') < K{ti',v')

holds for all x. The following inequality may be proved in much the same way:

W O <Ww). yey.
Thus, to obtain optimal strategies for the players in the game T, it is sufficient to
solve the game T.
Note that, for l / \ / 2 < / < 1, the game T has a solution in pure strategies x* =
%/T^T 5 , y* = 1. Indeed, for l / \ / 2 < / < 1 we have \s/\ - P - y'| < /. Therefore

_ . % f l l_2/2 + f.
minKWl P,y) = min{l, mm arccos , }
v 7T y>i-s/v^P 2vl - ' y
1 . l-2P+f
= miti arccos . - .
ir v>i-v/TZ(T 2^/T^Py
104 Chapter 2. Infinite zero-sum two-person games
The function placed under the arccos sign is monotonically increasing. Since the
function arccos y is monotonically decreasing, a minimum of the function K((lP),y)
is achieved at the point y = 1. Hence

*((1 - 1%V) > K(VT^-P, 1). (2.9.32)


On the other hand,
1 x3 + 1 - P
max K(x, 1) = max{0, max arccos }.
* i ir x>i-i 2z
Since the function placed under arccos sign achieves a minimum at the point i =
\/l P and the function arccos x is monotonically decreasing, then max/T (7,1) =
#(VT=~P,l)and
K(x, 1) < KiVT^P, 1). (2.9.33)
Inequalities (2.9.32) and (2.9.33) prove the optimality of strategies x* = \/l -P,
y* = 1. And the value of the game is
1 2 - IP 1
v = arccos , = arccos \/\ P. (2.9.34)
5 v ;
* 2 V /nn ir
Strategies x* = \A P, p* = 1 in the game T mean that the submarine has to
arrive at one of the uniformly distributed points on a circle of radius 1, whereas the
aircraft has to drop with uniform probability distribution a buoy at one of the points
on a circle of radius y/l P. In this case the probability of detecting the submarine
will be given by (2.9.34).
In the general case the solution of the game T is unknown to us. Note only that,
with / = 1/2 an optimal strategy for Player 1 is the choice with probability 1/7 a zero
point and with probability 6/7 the point s/%/2. An optimal strategy for Player 2 is
to choose with probability 1/7 point 0 and with probability 6/7 point 1. The value
of the game is 1/7. This means that the submarine stays with probability 1/7 and
moves with probability 6/7 to one of the points on a circle of unit radius. The aircraft
drops a buoy with probability 1/7 at the center of a circle and with probability 6/7
at one of the points on a circle of radius >/3/2. With such a behavior, the probability
of detecting the submarine is 1/7. It can be easily seen that the strategies discussed
are optimal. The solution of the games similar to those discussed here is examined
more fully in Lutcenko (1978). The game T is classified with the games which are
invariant under a particular group of transformations. The general theory of such
games is contained in Karlin (1959).

2.10 A poker model


2.10.1, A poker model with one round of betting and one size of bet. [Bellman (1952),
Karlin (1959)]. The model examined is a special case of the model treated in 2.10.2,
which permits n possible sizes of bet. In this section we follow Karlin (1959).
2.JO. A poker model 105

The model. Two players, A and B, ante one unit each at the beginning of the
game. After each draws a card, A acts first: he may either bet a more units or fold
and forfeit his initial bet. If A bets, B has two choices: he may either fold (losing
his initial bet) or bet a units and "see" A's hand. If B sees, the two players compare
hands, and the one with the better hand wins the pot.
We shall denote A's hand by , whose distribution is assumed to be the uniform
distribution on the unit interval, and B's hand by i?, also distributed uniformly on
the unit interval. We shall write (f ,r/) = sign( 77) as before.
Strategies and payoff. The strategies are composed as follows: Let
<!>{) = probability that if A draws he will bet a,
1
~ <t>{0 = probability that if A draws he will fold,
4>(t)) = probability that if B draws rj he will see,
1 i>(ri) = probability that if B draws t] he will fold.
If the two players follow these strategies, the expected net return K(<j>,t[>) is the
sum of the returns corresponding to three mutually exclusive possibilities: A folds; A
bets a units and B sees; A bets and B folds. Thus

K(*,*) = ( - 1 ) / [ 1 - M)W + ( + l)jj<t>(0^n)L((,r,)d(dr,

+ //*(0[i-MM*-
The yield to A may also be more transparently written as

K(4>,*) = - ! + _ * ( 0 ( 2 + a j[Vfo)fy - (a + 2) | V f o ) * f ) ^ (2.10.1)

or

K(*,*) = -l+2/ o ' *(0^+j[V(?)(-(+2) J^'^O^+Ajf1 *))*. (2-10.2)


Method of analysis. We begin by observing that the existence of a pair of optimal
strategies is equivalent to the existence of two functions <j>' and rj>* satisfying the
inequalities
KW) < K(P,P) < K(4>m,*) (2-10.3)
for all strategies $ and V>> respectively. Thus <j>' maximizes K(<f>, t/>*) while V>* mini
mizes K(4>*,rl>). We shall therefore search for the strategy 4? that maximizes (2.10.1)
with ip replaced by />*; and we shall also search for the strategy rj>* that minimizes
(2.10.2) with <f> replaced by </>'. Since the constant terms are not important, the
problem is to find

max J ' *()(2 + ajl tf*(ij)rtj - (a + 2) j * </>*(?)*?)# (2.10.4)

and
min # j ) ( - ( a + 2) j f * * ( 0 # + *f **(0<f)*?. (2.10.5)
106 Cnapter 2. Infinite zero-sum two-person games
The crux of the argument is to verify that our results are consistent, i.e., that the
function <f>* that maximizes (2.10.4) is the same function (f>* that appears in (2.10.5),
and similarly for ^>*; if these assertions are valid, then (2.10.3) is satisfied and we
have found a solution.
At this point intuitive considerations suggest what type of solution we search for.
Since B has no chance to bluff, #'(77) = 1 for T) greater than some critical number
c, and t/>*(f?) = 0 otherwise; also, since B is minimizing, II>"(TJ) should be equal to 1
when the coefficient of 0(n) in (2.10.5) is negative. But this coefficient expresses a
decreasing function of q, and thus c is the value at which it first becomes zero. Hence

-(a + 2) j 4>\m + [ f(tW = 0. (2.10.6)


With this choice for tj)'(ri), we find that the coefficient of $(() in (2.10.4) is constant
for ( < c. If we assume that this constant is 0, we obtain at { = c
2 + 0 - (a + 2)(1 - c) = 0,
or
c=-^~. (2.10.7)
v
o+2 '
The reason we determine the constant c so as to make the coefficient zero in the
interval [0,c] is as follows. In maximizing (2.10.4) we are obviously compelled to
make #*() = 1 whenever its coefficient is positive, and <t>"(Z) = 0 whenever its
coefficient is negative. The only arbitrariness allowed in the values of <f>*(i) occurs
when its coefficient is zero. But we expect A to attempt some partial bluffing on
low hands, which means that probably 0 < <*() < 1 for these hands. As pointed
out, this is feasible if the coefficient of <^() is zero. With the determination of c
according to (2.10.7), the coefficient of <f>(() in (2.10.4) is zero for f < c and positive
for ( > c. Under these circumstances it follows from (2.10.4) that the maximizing
player is obligated to have <j>'() = 1 for ( > c while the values of <f>*{() for ( < c are
irrelevant, in the sense that they do not contribute to the payoff. However, in order
to satisfy (2.10.6) with this choice of <f>* we must have
-{a + 2)Jo/ V ( 0 * + <*(1 - c) = 0,

/ V ( ^ = ^ +C2) - (a -+a2) ' 2


Jo a
and this can be accomplished with <f>"(() < 1. It is easy to verify now that if

*'-{!: * ^
and

( arbitrary between 0 and 1 but satisfying

1, sfe<*<i
2.10. A poker model 107

then <f>" maximizes (2.10.4) for t/>* and if)' minimizes (2.10.5) for <f>".
The interpretation of the solution is of some interest:
(1) Both players bet or see on high hands. What is of special significance is that
both players use the identical critical number a/(a + 2) to distinguish high and low
hands.
(2) The element of bluffing shows up for Player 1 only to the extent that the
proportion of hands on which he should bluff is determined; he may choose the actual
hands in an arbitrary manner from [0,a/(a 4- 2)] subject only to the restriction

2.10.2. A poker model with several sizes of bet. [Karlin and Restrepo (1957)].
The model examined here is an extension of the one just analyzed.
As before, the unit interval is taken in the representation of all possible hands that
can be dealt to a player. Each hand is considered equally likely, and therefore the
operation of dealing a hand to a player may be considered as equivalent to selecting
a random number from the unit interval according to the uniform distribution. Of
course, a hand t is inferior to a hand 2 if ar*d nly if i < & The game proceeds as
follows. Two players A and B select points and 77, respectively, from the unit interval
according to the uniform distribution. Both players ante one unit. A, knowing his
value (, acts first and has the option of either folding immediately, thus forfeiting his
ante to B, or betting any one of the amounts a i , a 2 , . . . ,a, where 1 < aj < 02 <
. . . < a n . B must then respond by either passing immediately or seeing. In the first
circumstance A wins J3's ante. If B sees, the hands and TJ are compared and the
player with the better hand wins the pot. If = r\, no payment is made.
A strategy for A can be described as an n-tuple of functions

m = (*i(o,fc(o,- ..,*t(o),
where </>,() expresses the probability that A will bet the amount a, when his hand is
. The fait) must satisfy 4>i() > 0 and

i=l
The probability that A will fold immediately is

i-I>(0.
A strategy for B can be represented by the n-tuple

$(.V) = {tM'?).tM'?).--i<M'?)} 1
where $,(9) expresses the probability that B will see a bet of a; units when he holds
the value ?. The probability that B will pass after A has bet a, is 1 !&,*(?). Each
i>i(r}) is subject only to the condition that

0 < tpi(v) < 1.


108 Chapter 2. Infinite zero-sum two-person games
If A uses the strategy <j> and B uses the strategy tj>, the expected gain to A is
denoted by K(<f>,r)>). Enumerating all the possibilities, we find that

*(*, *) = (-i) jl [i - *(o] < + / 0 X *)[i - MvM*,

> + 1) / 7 1 MQkivmiVW*!,
7Z, JO JO
where (,?) = sffn( - tj).
Any pair of optimal strategies <^* and 0* satisfy the inequalities

A"(<A*,V>)> A"(^*,0*)forallV (2.10.8)


and
# ( & V*) < #(<% i>') for all <. (2.10.9)
Conversely, if the inequalities are satisfied, the * strategies are optimal. Thus, <j>* max
imizes JK"(^,V>*) and ^>* minimizes K(<j>m,i>)- In this case, rearranging the expression
for K(4,il>), we can write

K{*, r) = - i + i z / *) [2+. jf 0*(v)*i - (+2) jf * ^( 9 W


d (2.10.10)

and

(2.10.11)
Thus (2.10.10) and (2.10.11) have the form

*(*,*) = cl + J2[1 *(i) WQt (2-10.12)


and
K(f, ,/>) = C2 + / ' Ml)Ki(n)dri, (2.10.13)

where Ci and C2 are independent of <j> and V. respectively, and L, and if< stand
for the bracketed expressions in (2.10.10) and (2.10.11), respectively. In view of the
constraints on & , . . . , ^ n , it is clear that in order to maximize (2.10.10) or (2.10.12),
A must choose &() = 1 wherever ,() is positive and greater than all Lj(() < 0;
and finally, if Li(() = 0 and if the remaining coefficients Lj()(j ^ i) are nonpositive,
he can maximize K(if>,xl>*) by choosing &() arbitrary consistent with 0 < &() < 1
(or, if more than one Lj(|) is zero, & ( { ) S 1> where the sum is extended over
those indices corresponding to ,() = 0). Similarly, in order to minimize (2.10.11)
or (2.10.13) B must choose rl>i(ij) = 1 wherever K((t]) < 0, and ^<(?) = 0 wherever
Ki(t)) > 0. Where JC,-(ij) = 0, the values of V\(?) will not affect the payoff.
2.10. A poker model 109

Guided to some extent by intuitive considerations, we shall construct two strate


gies <f>" and ip*. It will be easy to verify that these strategies satisfy (2.10.8) and
(2.10.9) and are therefore optimal. The main problem in the construction is to make
sure that the strategies are consistent i.e., that the function <j>' that maximizes
(2.10.10) is the same function that appears in (2.10.11), and similarly for if>'. This is
another illustration of the fixed-point method. We now proceed with the details.
Since B has no opportunity to bluff, we may expect that

* ( , ) = {;; l ^ ; (2.10.H)

for some 6,-. This is in fact the case, since each KJ(TJ) is nonincreasing.
On the other hand, we may expect that A will sometimes bluff when his hand
is low. In order to allow for this possibility, we determine the critical numbers 6,
which define 0"(JJ) so that the coefficient L,() of fa is zero for < 6;. This can be
accomplished, since <() is constant on this interval. Hence we choose

h = r^-, (2.10.15)
I + a,
and thus ^ < ^ . . . < 6n < 1. The coefficient ,() of 4>i is zero in the interval (0,6,),
and thereafter it is a linear function of such that

M0= 4(1- -4o).


From this we deduce that the functions ;() and ,() intersect at the point

c = 1 - % -. (2.10.16)
v
" (2 + a,)(2 + aj)
Clearly, ctJ is a strictly increasing function of i and j . Define C\ = b\ and c, =
c,-!,, for i = 2, . . . , n and c n + i = 1. For in the interval (C;,CJ + I), it is clear that
.({) > ,() > 0 for j / j . Consequently, according to our previous discussion, if
<f>' maximizes K((f>^*), then <j>"(i) = 1 for Ci < < c, + ! . For definiteness we also
set <j>"(ci) = 1; this is of no consequence, since if a strategy is altered only at a finite
number of points (or on a set of Lebesgue measure zero), the yield K(4>,i>) remains
unchanged.
Summarizing, we have shown that if ip' is defined as in (2.10.14), with

b, ^
2 +a,'
then K(4>,ip~) is maximized by any strategy <j>' of the form

' arbitrary, < c t = bx,


c
m) ; .ff<c.. {2.10.i7)
v
i, c,i < i < c,-+i, '
I o, c-+, < i < l,
no Chapter 2. Infinite zero-sum two-person games

where
I>?(0 < i, *?(0 > o.
The values of $*(() in the interval 0 < ( < cx are still undetermined because of the
relations ;() = 0 which are valid for the same interval.
It remains to show that ip* as constructed actually minimizes K(4>*,xl>). In order
to guarantee this property for ^*, it might be necessary to impose some further
conditions on the # ; for this purpose, we shall utilize the flexibility present in the
definition of <j>* as ranges over the interval (0,Ci). In order to show that ij>m minimizes
^W>^0) w e must show that the coefficient /<(>?) of &(/) is non-negative for >j < 6,
and nonpositive for T) > &,-. Since Ki{rj) is a continuous monotone-decreasing function,
the last condition is equivalent to the relation

-(oi + 2) f' #[(),% + a j 1 m)<% = 0- (2.10.18)


JO Jti

Inserting the special form (2.10.17) of <j>' into (2.10.18) leads to the equations

2 t m)<% = M l - &1X&2 + 1). 2 ?l <Fn(m = M l - *)(! - *-i), (2-10.19)


Jo Jo
and
2 T tf ( 0 # = Ml - *s)(*i - *i-i), i = 2, -.. ,n - 1. (2.10.20)
Jo
JO
Since

=1
these equations can be satisfied if and only if

2/** x>:)*e < i - (2.10.21)

But the sum of the right-hand sides of (2.10.19) and (2.10.21) is at most (2+6 -&i)/4,
since always 6;(1 bi) < 1/4. As 61 > 1/3, we get

i(2 + 6 n - 6 1 ) < i ( 3 - 6 1 ) < | < 2 6 1 .

Thus the requirements in (2.10.19)-(2.10.21) can be fulfilled. We have consequently


established the inequalities (2.10.8) and (2.10.9) for the strategies <j>* and ip'. In
summary, we display the optimal strategies as follows:
(2 10 22)
*?<">-{!; l i t --
( arbitrary but satisfying

1, *<< C.-+1,
2.10. A poker model 111

where
k = j r - 7 , cj = 6,,
2 + a,
2
c
< = ! - 7^1wTl N> t = 2 , . . . ,n, c n + 1 = 1
(2 + Oi)(2 + o,-_i)
and
E(0<i.
2.10.3. Poker model with two rounds of betting. [Bellman (1952), Karlin and Re-
strepo (1957)]. In this section we generalize the poker model of the preceding section
to include two rounds of betting, but at the same time we restrict it by permitting
only one size of bet. We assume again, for convenience, that hands are dealt at
random to each player from the unit interval according to the uniform distribution.
Payoff and strategies. After making the initial bet of one unit, A acts first and
has two choices: he may fold or bet a units. B acts next and has three choices: he
may fold, he may see, or he may raise by betting a + 6 units. If B has raised, A must
either fold or see.
If A and B draw cards and r\, respectively, their strategies may be described as
follows:
<f>\{) = probability that A bets a and folds later if B raises.
<fo() = probability that A bets a and sees if B raises.
1 - 4>x{) fad) probability that A folds initially.
^jj(^) = probability that B sees the initial bet.
t/j(r;) = probability that B raises.
1 ^i(?) ^(?) = probability that B folds.
The expected return is

K{4>, *) = - j[\ - MO - MiM + / /(*( + MiW ~ Mv) - MvMdr,

+(o +1) / / <t>i(0Mn)Ht,v)dtdt, -(a + i)JJ MQMnW*!

+(o + l ) J ^ ( ^ 1 ( i , ) I ( f 1 ^ 4 J + (l+(.+ 4 ) J / ^ ) ^ ) I ( ( , i l W l
where L(,ti) = sign( - TJ). (This expected yield is derived by considering mutually
exclusive possibilities: A folds; A bets and B folds; A acts according to ^i and B sees
or raises; A acts according to <j>2 and B sees or raises.)
If we denote the optimal strategies by

<nz) = mamwiv) = wfo),*;o)>


and rearrange the terms as we have done in the previous examples, we obtain

K{4>, P) == -1JO+ / ' MO


I
[2JO+ a f #(!,)*, - (aJi+ 2) / ' tffo)*,
112 Chapter 2. Infinite zero-sum two-person games

- ( a + 2) ttfotojrfC + <h(0[?> + *jl *&i)*l - ( + 2) *?(*)*

+(a + 6) jf * fo)*J -(a + b + 2) (*)*?] # (2.10.24)


and

W,#) = A-i + 2tf (0 + 2^UM + f *fo)[-( + 2) f fc(0 + M


JO JO L JO

+ j[W ) + (01] *? + jf *(*) [-( + 2) m)d(


- ( o + 6 + 2) j T * 3 K - + ( + 6) j f ( 0 ] * J - (2-10.25)
Search for the optimal strategies. We shall search again for the functions <!>'(()
that maximiise (2.10.24) and the functions rl>*(t)) that minimize (2.10.25). Intuitive
considerations suggest that A may do a certain amount of bluffing with low cards
with the intention of folding if raised; he will also choose $() = 1 in an intermediate
range, say c < f < e, and he will select the strategy 4^(0 = 1 for e < ( < 1. B
should sometimes bluff by raising hands in the interval 0 < q < c; he should choose
ipl = 1 in an interval c < 7/ < d, and 0J = 1 in d < JJ < 1.
This is not to imply that this is the only possible form of the optimal strategies.
In fact, we shall see later that optimal strategies of different character exist. However,
once a pair of optimal strategies is determined, it is then relatively easy to calculate
all solutions; hence we first concentrate on determining optimal strategies of the form
indicated. To this end we shall attempt to find values of c, d, and e that produce a
solution of the given type. In view of the construction of />i we see immediately that
the coefficient of <j() in (2.10.24) is constant for < c; evaluating this constant and
setting it equal to zero, we obtain
2 - (a + 2) / rl>fa)dri - (a + 2)(d - c) = 0. (2.10.26)
Jo
The coefficients of i(>l and ipj should be equal at the point d, where B changes the
character of his action; this requires

(2a + 2) f # ( 0 i f = - 6 P fttfW + * / ' MM- (2-10.27)


Jd JO Jd
A similar condition at = e requires

(2a + 6 + 2) ip;{t))dt} = b tkm2(v)drj. (2.10.28)

At the point t) = c, where B begins to play $j and ^J without bluffing, the


corresponding coefficients (which are decreasing functions) should change from non-
negative to nonpositive, i.e., they should be zero attf = c. Hence
-(a+2) mm)+mm+I'mo+mm=o, (910
K
.]
- ( a + 2) ( 0 * + (a + b) Si PMW = 0. * ^
2.10. A poker model 113

(In writing (2.10.29) we postulate that ^ ( 0 = 0 for 0 < { < c; this is intuitively
clear.) At this point we introduce the notation

Jo Jo
Recalling the assumption made on the form of the solution and assuming that c <
e < d, equations (2.10.26)-(2.10.29) may be written as follows:
2 = (a + 2)(m, + 1 - c), (2.10.30)
1-d = d-eor2(l-d) = (l-e), (2.10.31)
(2a + b + 2)m 2 = 6(1 - d), (2.10.32)
(a + 2)m, = a(l - c), (2.10.33)
(a + 2)(rn 1 + e - c) = (a + 6)(1 - c). (2.10.34)
We have obtained a system of five equations in the five unknowns m,\,m2,c,d,e;
we now prove that this system of equations has a solution which is consistent with
the assumptions made previously, namely that 0 < c < e < d < 1 , 0 < I T I < 0 ,
0 < m 2 < c.
Solution of equations (2.10.30)-(2.10.34). The system of equations may be solved
explicitly as follows. We write the last equation as:

(a + 2)(m, + 1 - c) = (2a + 6 + 2)(1 - e).


Eliminating mi and 1 e by means of (2.10.33) and (2.10.31), we obtain
(a + 1)(1 - c) = (2a + 6 + 2)(1 - d). (2.10.35)
From the remaining equations we eliminate m 2 ; then

2
a +2 ' l - c )' = o2_a +A6 +, o2( l ~ < 0 -
v (2-10-36)
Therefore
b
_ d)(
(1 + 2 a +fe+ 2 ) = * (2.10.37)
v ; v
V2a + 6 + 2 a+1 / a+2
Having obtained 1 d, we can solve (2.10.36) for 1 c, and the remaining unknowns
are then calculated from the original equations.
In order to show that the solution is consistent, we first note that 1 d > 0.
Equation (2.10.35) shows that 1 - c > 1 - d, and therefore c < d. Also, from
(2.10.36),
c
=(2^TT2)( 1 -^ + ( 1 - T 2 ) > - ^10-38)
Since 2 ( a + l ) ( l - c ) = (2a + 6 + 2 ) ( l - e ) , we infer that 1 - e < l - c , o r c < e; and since
2d = 1 + e, we must have e < d. Summing up, we have shown that 0 < c < e < d < l .
For the two remaining conditions we note that (2.10.30) implies that

m-i = c-
( ' - ^ >
114 Chapter 2. Infinite zero-sum two-person games
so that m2 < c, and by (2.10.38), m2 > 0. Finally, using (2.10.33) and (2.10.30), we
conclude that
m1 = ^ - ( l - c ) = ( l - c ) - - ( 1 - c )
= (1 - c)[l - (m, + 1 - c)\ = (1 - c)(c - m,),
so that 0 < ni < c.
Optimality of the strategies <j>" and t^>*. We summarize the representation of <f>*
and rl>* in terms of the values of c, e, d, ni, and m2 as computed above:
,.m fl, c<<e, ,.m_f0, 0
0<< <
< ee ,,

(2.10.39)
f 0, 0 < i? < c, <r ^,;
l a r 1
U , <*<?<i, ' - >- -
In the remaining interval 0 < r] < c, the functions #*() and ^J(JJ) are chosen arbi
trarily but bounded between 0 and 1, satisfying

/ # ( ) # = m, and / ^(?)di? = m2,


Jo Jo
respectively.
It remains to verify that the strategies <j>' and V** prescribed above maximize
K{4>,$') of (2.10.24), and minimize K(4>",tj>) of (2.10.25), respectively. To do this,
we first examine the coefficients Afi() and M2() of fa and fa in K(fatpm). By
construction, the coefficient M\(Q of fa is identically zero on [0,c), increases linearly
on [c,d)t and afterwards remains constant. Also, Mi() is continuous throughout
[0,1]. Next we notice that M2({) is linear on [c,d] with the same slope as Af^f).
Furthermore, they agree at = e in [c,d] by (2.10.28), and hence Mi = M2 for in
[c,d].
We may also deduce immediately from the definition of ^>* that M2 increases
strictly throughout [0,1] (see Fig. 2.18). With these facts the maximization of
K(faJ>*) is now easy to perform. Clearly, the maximum is achieved for any 4> with
the properties (a) #2 = 0 and fa arbitrary (0 < #i < 1) for in [0, c); (b) fa + fa = 1
and otherwise arbitrary for in [c,d); (c) fa = 1 for in [d,l]. It is clear that <j>* as
specified above fulfills these conditions.
A study of the coefficients N\(i}) and iV2(ij) of fa(ti) and fa{q) in K(<f>*, tj>) shows
that they are as indicated in Fig, 2.19. We observe that K{<j>',^>) is minimized by
any ^ with the properties (a') i>t = 0 and fa arbitrary (0 < fa < 1) for IJ in (0,c);
(b') fa = 1 for TJ in [c,d); (c') fa = 1 for q in [d,l]. Clearly, tp* as specified above
obeys these requirements. The proof of the optimality of $* and ^>* is now complete.
For illustrative purposes we append the following example. Let a = b = 2; then
( /o 9/3S <t>ld( = 8/35, 0 < { < 19/35,
4ttt) = \ 1, 19/35 < i < 23/35,
U, 23/35<<l,
2.10. A poker model 115

Figure 2.18

Figure 2.19

0, 0 < < 2 3 / 3 5 ,
-{!:
<t>\ 23/35 < < 1,
f 0, 0 < r) < 19/35,
i/>;(V) = il, 19/35 < v < 29/35,
[o, 29/35 < t ? < l ,

[ /o19/351$% = 3/70, 0 < t) < 19/35,


tfjfo) = \ 0, 19/35 < ?? < 29/35,
I 1, 29/35 < TI < 1.
The value is 11/35.
The general solution. It follows readily from Fig. 2.19 that the minimizing solution
cannot have any other form than that given in (2.10.39). However, in maximizing
116 Chapter 2. Infinite zero-sum two-person games
K{fail>m) w e found that on the interval [c,d] the only necessary condition for 4> to
qualify as a maximum is that fa + fa = 1- Altering fa and fa in this interval subject
to this condition, we can determine all possible optimal strategies for A. To this end,
we specify fa{) on [0,c] such that

/ fad = mi

(calculated above) and fa(() 1 on [d, I). For in [c, d] we require only that fa +fa =
1. Writing out the conditions under which K(faij>) is minimized for V*, we obtain
rd - rd - bid - n)
l-d= 4{IW and / fa(t)d( < K 2 for r, [c,rf], (2.10.40)
Jc Jri 0+0+1
where c and d are the same as before. We obtain these constraints by equating the
coefficients of ^>i(i?) and ^(tj) in (2.10.25) at r? = d and by requiring that N\{r\) <
#2(9) for t) in [c, d].
This relation is easily seen to be necessary and sufficient for $ to be optimal.
2.10.4. Poker model with k raises, [Karlin and Restrepo (1957)]. In this section
we indicate the form of the optimal strategies of a poker model with several rounds of
betting. The methods of analysis are in principle extensions of those employed in the
preceding section, but far more complicated in detail. We omit the proofs, referring
the reader to the references.
JtuJes, strategies, and payoff. The two players ante one unit each and receive
independently hands and rj (which are identified with points of the unit interval)
according to the uniform distribution. There are k + 1 rounds of betting ("round"
in this section means one action by one player). In the first round A may either fold
(and lose his ante) or bet a units. A and B act alternately. In each subsequent round
a player may either fold or see (whereupon the game ends), or raise the bet by a
units. In the last round the player can only fold or see. If k is even, the last possible
round ends with A; if k is odd, the last possible round ends with B.
A strategy for A can be described by a i-tuple of functions 4> = (fa(Q>fa(0r>
fa(()). These functions indicate A's course of action when he receives the hand (.
Explicitly,

i-ixo
1=1
is the probability that A will fold immediately, and

I>(0
1=1
is the probability that A will bet at his first opportunity. Further,
^ i ( 0 = probability that A will fold in his second round,
fa(i) = probability that A will see in his second round,
?=3 &(0 probability that A will raise in his second round,
if the occasion arises, i.e., if B has raised in his first round and kept the game going.
Similarly, if the game continues until J4'S rth round, then
2.10. A poker model 117

<t>2r-3(0 = probability that A will fold in his rth round,


<f>2r~2() = probability that A will see in his rth round,
?=2r-i <t>i() = probability that A will raise in his rth round.
Analogously, a strategy for B can be expressed as a fc-tuple

tf> = (My), ,i>k(v))

which indicates B's course of action when he receives the hand rj. The probability
that B will fold at his first opportunity is

If the game continues until B's rth round, then


i>2r-2(i}) = probability that B will fold in his r t h round,
4>2r-i(v) = probability that B will see in his rth round,
Hk,=2rili(Tl) = probability that B will raise in his r t h round.
If the two players receive hands and 17 and choose the strategies <f> and tp, respectively,
then the payoff to A can be computed as in the previous examples by considering the
mutually exclusive ways in which the betting may terminate. The payoff to A is as
follows:

m0.tfO?)] = ( - i ) ( i - 5 > ( o )

L
.=1 j=i

+ I>ifo)H + l)Mt) + (2a + i)Mi)i>{i,v)}


1=2

+ E ^i('?){-[(2r-3)o + l]^ r _3(0 + [(2r-2) f l + l]*j r _ 2 (OI(f,i?)}


j=2r-2

+ E ^ , ( 0 { [ ( 2 r - 2 ) a + l ] ^ ^ ) + [ ( 2 r - l ) a + l]^ r -,(i ? )I(e,f?)}


i= 2r-l

+ E ^(?){-|(2r - 1) + % r - i ( 0 + (2ra + l J M O i t f . " ? ) }

+ E *(0{(2ro + 1)TM>?) + [(2r + l)a + 1 ] ^ + , (*?)(, *?)}


i=2r+l
where L(, 77) is the function already defined. The expected payoff is

*(*,*) = / T F W O - ^ M ^ . (2.10.41)
Jo Jo
118 Chapter 2. Infinite zero-sum two-person games

Description of the optimal strategies. There exist optimal strategies <* and V>*
characterized by 2k -r 1 numbers b, cx,..., ck, du..., dk. When a player gets a hand
( in (0, b) he will bluff part of the time and fold part of the time. We shall write

rm= ["mW, i = 1,3,5,... (2.10.42)


Jo

and
; = / ^-(ij)*J, j =2,4,6,... (2.10.43)
JO

to represent the probabilities of bluffing in the various rounds of betting. If A receives


a hand in (CJ_I,C<), where co = b, he will choose # ( ( ) = 1 and <*() = 0 for / ^ t.
Similarly, if B gets a hand n in (dj_i, dj), where do = 6, he will choose 0y (r;) = 1 and
I/,;(T)) = 0 for / ^ j . The solution is represented by Pig. 2.20. The fact that

Car-j < d2T-i < dir < c 2 r , r 1,2,...

is important.
The constants c,,^,m,-, and n;- are determined by solving an elaborate system of
equations analogous to (2.10.30)-(2.10.34). Explicitly, if k is even, b,a,dj,m.i, and tij
are evaluated as solutions of the following equations:
k
[(4r - l ) a + 2] rtj = a(l - <**_,), 2r = 2 , 4 , . . . . *,
i=2r

*
a(c 2 r _ 2 - d 2 ,- 2 ) = a(l - c 2r _ 2 ) + [(4r - 3)a + 2] n,, 2r = 4 , 6 , . . . , * ,
i=ir
k
[ ( 4 r - 3 ) o + 2] *, = a ( l - c 2 r _ 2 ) , 2r = 2 , 4 , . . . , * ,
i=2r-l
k
o ^ . , - C 2 r - i ) = a ( l - r f 2 r - i ) + [ ( 4 r - l ) o + 2] 5 3 m 2r = 2 , 4 , . . . ,fc,
i=2r+l

(4ra + 2)(c 2r - d 2 r _.) = [(4r + 2)a + 2](c2r - <f2r), 2r = 2 , 4 , . . . , k,


[(4r - 2)o + 2](d 2r _! - c 2r _ 2 ) = (4ra + 2)(<*2r_, - c 2 r _,),2r = 2 , 4 , . . . , k,

2 = (a + 2)[Yimv)dr].
J0
]=\

An analogous system applies for k odd. The solutions obtained are consistent with
the requirements of Fig. 2.20.
2.10.5. Poker with simultaneous moves, [von Neumann and Morgenstern (1944),
Karlin (1959)]. Two players, A and B, make simultaneous bets after drawing hands
according to the uniform distribution. The initial bet can be either 6 (the low bet) or
a (the high bet). If both bets are equal, the player with the higher hand wins. If one
player bets high and the other bets low, the low bettor has a choice: he may either
2.10. A poker model 119

Figure 2.20

thelow
fold (losing the low bet) or see by making an additional bet oia b. IfIfthe low' bettor
bettor
sees, the player with the higher hand wins the pot.
Since the game is symmetric, we need only describe the strategies of < one player.
If A draws , we shall write
<pi() = probability that A will bet low and fold if B bets high,
<h(0 = probability that A will bet low and subsequently see,
$3(0 = probability that A will bet high.
These functions are of course subject to the constraints

*(0 > 0, S > ( 0 = 1,


1=1

The
i. ne expected
expecieu yield
yieiu to
to A
s\ if
11 he
ne uses
uses strategy
si.rai.egy $
if) while
mine Ba employs
euipiuys strategy
suaregj 0y reduces to
raiuua IU

K(4>,,j>)
K(^i>) - = bn\m) + Hi))\Uv) + Un)]mv)df,dv
Jo Jo
Jo Jo

-- bJofflMOMvWdt,
Jo
+ Jo
6 Jo/ Y Mt)MnW*i
Jo Jo Jo Jo
+ aJ*j\(t)MvWWt*l
+ Jo Jo
Jo

+ a Jof Jof
+ MZ)Mv)L((,v)d(dr,.
Jo Jo
Because of the symmetry of the game, we may replace fl's strategy $(i)) in this
expression by an optimal strategy <j>*(f}). We make the plausible assumption that
120 Chapter 2. Infinite zero-sum two-person games
in this strategy M^) = 0, since there would appear to be no clear justification for
making a low bet initially and then seeing. The consistency of this assumption will
be established later. With Mv) = 0 we may write K((j>,4>") as

K(4>,f) ==bbJo//Jo7M0
K(faP) W )++ MOlMnWO^d^
M0mtW(,r))d(dr}
JO J0

-* ft
ff MOMvWdr, flflMOMriWdv
MOPMWi ++b b/ Y MOM*)^
Jo Jo Jo Jo
+a //'/'[I
7 V - MO ~- M0mv)L(0v)dSd
MtMivWCvWin
Jo Jo
+ H1
Jo Jo MOMmOvWdr,
Jo Jo
= M
=a M[ && r44''Mdri b iir )d, b 4 Mdv
Mdri bii *'iir>)d,> - b 4'Mdv
- Jo Jo

a [ r '' - ii *' > > - '


-a^<j>;(r))dr, + aj^'Mdrf]di

+
+ ffQQ *()[*
MO\bfj of <f>'Mdr}
PMdf} - -bj*
bf Ml)dri}dZ
$(tl)*l]dt

+ [ MO[*
MO[*jf j ftftf fo)fy]
fo)fy] + [fo rP33(v)L((, v)d(dr,,
(f,m,l)Wl, (2.10.44
(2-10-44)
or
or
K(<f>,f) = J*^M0Ti(0dt + Z,
where Z is a term independent of fa. The <j> maximizing K(<j>, <f>") is evaluated by
choosing the component fa as large as possible whenever 7j(() = maxj T,(). If the
maximum of T,(f) is attained simultaneously by two of the Ti% then the corresponding
fa may share any positive values provided their sum is 1.
Some bluffing is anticipated on low hands. This suggests that for < 0 we
should have Ti(0 = T3() > T2{0- Differentiating the identity T,() = T3(), and
remembering that MO + MO = 1 on the interval [0,f0]> we deduce that

MO == ~Tl a e forfor <


-j-a-e.
- - * < &
With this choice of <^, and where MO = 1 for > 0, we obtain that 7i() =
Ta(f) is possible only if
a-b
o
So =

a
a
The proposed solution is as follows:
* '

S8 ==i.P' ^
t' =l I(o

=
^ ' e!>K &.
f' (2-10.45)
(2.10.45)

It is now routine to verify that $* as exhibited is indeed optimal. It is clear that


Ti{Q = T3() > Ta() for < (0, and hence the maximum is achieved provided only
2.11. Exercises and problems 121

that <j>i + 4>3 = 1, which is certainly satisfied for <f>' of (2.10.45). Moreover, we have
seen that Tt() T3() uniquely determines <j>* to be as in (2.10.45) for < 0-
For > , by examining (2.10.44) we find that T2() = T3() > Tt((). Hence the
maximization of K{<j>, <j>") requires <f> to be such that 4n + <h = 1- But, if $\ > 0 in this
interval, a simple calculation shows that Tj() < Ti() for f > i where i < o- All
these inferences in conjunction prove that the <f>* of (2.10.45) is the unique optimal
strategy of the game.

2.11 Exercises and problems


1. Attack-defense game. Player 1 wishes to attack with A units one of the targets
C i , . . . , C n whose value is determined by the numbers TX > 0, T 3 > 0 , . . . , r n > 0, with
Ti > T2 > . . . > r. A pure strategy x for Player 1 is the vector x = (i, ,),
12?=i (> = ^ J where & is the part of the units of attack allocated to the target Ci. The
Defender (Player 2) has a total of B units. A pure strategy for Player 2 is the choice
of a collection of y non-negative numbers y = (fh,...,ij n ) satisfying the condition
53"= i V< = B where 77, is a part of the units of defense assigned to the target C,. The
result of an attack on the target C{ is proportional to the difference & % if the
Attacker's forces outnumber the Defender's forces, otherwise it is zero. Construct the
payoff function.
2. A game on a unit square has the payoff function

H(x,y) = xy- -x- -y.

Show that (1/2,1/3) is the equilibrium point in this game.

3. Show that the game on a unit square with the payoff function

H(x,y) = sign(x-y)

has a saddle point.


4. Show that the duel type game on a unit square with the payoff function
-l/x2,

i
x>y,
0, x = y,

1/y2,
x <y
has the saddle point (0,0).
5. Show that the game on a unit square with the payoff function H(x, y) = {xy)2
does not have the saddle point in pure strategies.
6. Show that in the game on a unit square with the payoff function
fx + y, zthyfO,
1/2 + !/, x = l,y^0,
H(x,y)
1/2+x, x ^ l , y = 0,
2, x = l,y = 0
122 Chapter 2. Infinite zero-sum two-person games

the pair (xe,yc), where i e = 1 e, yt = e, is an e-saddle point. Does the game have a
value?
7. Solve the game of "search for a noisy object" formulated in Example 6, 2.1.2.
8. Compute the payoff to Player 1 in the game on a unit square with the payoff
function H(x,y) in the situation (F(x),G(y)) (F and G are distribution functions),
if
(a) H{x,y) = (x + )/(4*y), F(x) = x2, G(y) = y2;
(b) H(x, y) = \x- y\(l - |* - If I), F(*) = *, G(y) = y;
(c) H(z,y) = (x- y)2, F(x) = l/2/ 0 (x) + l/2h(x), G(y) = / , ( * ) , where Ik(x)
is a step function.
9. Game of discrete search. Consider the following infinite game. A strategy for
Player 2 is the choice of the point uniformly distributed over the circle of radius y,
where y can take values from the interval [0,1]. Player 1 may survey in a unit circle
the simply connected region whose area a(Q) = a = const, where a < A, A = T is
the area of the unit circle. His strategy x is the choice of a shape of the region Q
which has the area a and lies entirely within the unit circle. The payoff H(x,y) to
Player 1 is the probability of being discovered, i.e. H(x, y) = Pr(y Q). The mixed
strategy g(y) for Player 2 means the density of the distribution function of a random
variable y [0,1]. Find a solution to the game.
10. Prove Helly theorem, 2.5.4.
11. Consider a continuous analog of the "town defense" game (see 1.1.3). Player
1 has x units to attack the first post and 1 x to attack the second post, x [0,1].
Player 2 has y units of defense, where y [0,1], to allocate to the first post and 1 y
units of defense to the second post at which the permanent forces of defense, 1/2, are
located. A player pays 1 to the other for every post at which he has less units than
his opponent, and pays nothing if the players' forces are equal in number.
Construct the payoff function H(x,y) for the game on a unit square. Show that
this game has no solution in mixed strategies.
Hint. Make use of the result of Example 10, 2.4.12.
12. Show that in the continuous game with the payoff function
H(x,y) = [l + (x + y)2}-i
the strategies F*(x) = h/i{x), G*(y) = l/2I0(y)+l/2Il(y) are optimal for the players
1 and 2, respectively.
13. Prove that the value of the continuous symmetric game on a unit square
is zero, and optimal mixed strategies coincide (the game is symmetric if the payoff
function is skew-symmetric, i.e. H(x,y) = H(y,x)).
14. Define the optimal strategies and the value of the game on a unit square with
the payoff function H(x, y) = y3 3xy + x3.
15. Show that in the game with the payoff function

H(x,y) = e^-'yjl - x2/y2, x (x 0 ,*i], V [ya,yi], 1 > 0.


Player 2 has an optimal pure strategy; clarify the form of this strategy depending on
the parameter 7 > 0. What can be said about Player l's optimal strategy?
2.11. Exercises and problems 123

16. Verify that the payoff function from Example 11, 2.5.5, H(x,y) = p(x,y),
x e 5(0,/), y g 5(0,/), where 5(0,/) is the circle with its center at 0 and radius /,
/>() being a distance in R2, is strictly convex in y for any x fixed.
17. Show that the sum of two convex functions is convex.
18. Prove that, when bounded, the convex function <p : [a, fi\ * R} is continuous
in any point x (,/3). At the ends a and (5 of the closed interval (a,/?), however,
the convex function <p is upper semicontinuous, i.e.

lim^j(i) < ^J(Q)

(in much the same way as x > /?).


19. Let there be given the game T = {X,Y,H), X ~Y - [0,1] with the bounded
convex payoff function H(x,-) : [0,1] > Rl. Show that Player 2 in this game has
either an optimal strategy or an e-optimal pure strategy for every t > 0. The result
of the theorem given in 2.5.6 applies to Player 1.
Hint. Make use of the result of Exercise 18 and consider the auxiliary game
To = (X,Y,H0), where

/fo(x,,) = ( f ^ ' , jj^if.1)' ,


v 'y) \ l i m v ^ y H{x,yn), ify = 0 o r y = l .
20. Solve the "attack-defense" game formulated in Exercise 1.
2 1 . Consider the simultaneous game of pursuit on a plane (see Example 1 in
2.1.2) in which the strategy sets Si = S2 = 5 , where 5 is bounded and closed convex
set.
(a) Show that the value of the game discussed is R, where R is the radius of a
minimal circle 5 ( 0 , R) containing 5 , and an optimal strategy for Player 2 is pure and
is the choice of a center O of the circle S(0, R).
(b) Show that an optimal strategy for Player 1 is mixed and constitutes a mixture
of two diametrically opposite points of tangency of the set 5 to the circle S(0, R) (if
such points xx and i 2 exist), or, alternatively a mixture of three points of tangency
x'x,x'2,x'3 such that the point O is inside a triangle whose vertices are these points.
22. Solve the simultaneous game of pursuit on a plane discussed in Example 21
assuming that Player 2 chooses not one point y 6 5 , but m points yi,... ,ym 5 .
The payoff function of the game is
m
1
H{x,y) = y(x,t/i),

where /() is a distance in R2.


2 3 . Player 1 selects a system x of m points in the interval [-1,1], i.e. x =
( i , - - , m ) , & S [-1,1], i = l , . . . , m . Player 2 selects independently and simul
taneously a system y of n points in the same interval [-1,1], i.e. y (?i, -,?),
r\j G [-1,1], j = 1,2,... ,n. The payoff function H(x,y) is of the form

H{x<y) = ^(maxmin 16 ~ n}\ + max mm |, - J/J|).


124 Chapter 2. Infinite zero-sum two-person games

Find a solution to the game.


24. Consider an extension of the problem given in 2.8.3, namely, the game of
search in which Player 2 selects a system of k points y = (yu..., j/*) on a sphere C
and Player 1 selects, as before, a system x of s points x = (x\,... ,x,) on the sphere
C. The payoff function is

H(x,y) = {M\M = \{Vi}\: Vi S(xhr); j = l,...,a},

where S(XJ, r) is a spherical segment with its apex at the point Xj and with r as a base
radius; |{y,}| means the number of points of the set {j/,}. The point y, is considered
to be discovered if & S(xj,r) for at least one Xj. Thus the payoff function is the
number of the points discovered in the situation {x,y).
Find a solution to the game.
Chapter 3
Nonzero-sum games

3.1 Definition of noncooperative game in normal


form
3.1.1. The preceding chapters concentrated on zero-sum two-person games, i.e. the
games in which the interests of the parties are strictly contradictory. However, a
special feature of the actual problems of decisions making in a conflict context is that
there are too many persons involved, with the result that the conflict situation is far
from being strictly contradictory. As for a two-person conflict and its models, it may
be said that such a conflict is not confined to the antagonistic case alone. Although
the players' interests may intersect, they are not necessarily contradictory. This, in
particular, can involve situations that are of mutual benefit to both players (which
is not possible in the antagonistic conflict). Cooperation (selection of an agreed
decision) is thus made meaningful and tends to increase a payoff to both players. At
the same time, there are conflicts for which the rules of a game do not specify any
agreement or cooperation. For this reason, in non zero-sum games, a distinction is
made between noncooperative behavior, where the rules do not allow any cooperation
(see Sees. 3.1-3.8), and cooperative behavior, where the rules allow cooperation in
the joint selection of strategies (see Sees. 3.9-3.10) and side payments making (see
Sees. 3.11-3.15). We shall consider the former case.
3.1.2. Definition. The system

r = {N,{Xi}ieN,{HiheN),
where N = {1,2,..., n} is the set of players, X,- is the strategy set for player i, Hi is
the payoff function for player i defined on Cartesian product of the players' strategy
sets X = n"=i Xi (ine sei of situations in the game), is called a noncooperative game.
A noncooperative n-person game is played as follows. Players choose simultane
ously and independently their strategies i , from the strategy sets Xif i = 1,2,... , n ,
thereby generating a situation x = {x\,...,x), n 6 Xi. Each player i receives the
amount //<(x), whereupon the game ends.
If the players' pure strategy sets X, are finite, the game is called a finite nonco
operative n-person game.

125
126 Cnapter 3. Nonzero-sum games

3.1.3. The noncooperative game T played by two players is called a two-person


game. The noncooperative two-person game T is then defined by the system T =
{Xi,X2,HitH2), where Xx is the strategy set of one player, X3 is the strategy set
of the other player, Xi x Xj is the set of situations, while Hi : X\ x X? ~* R1,
H3 : Xi x X2 R1 are the payoff functions to the players 1 and 2, respectively. The
finite noncooperative two-person game is called bimatrix game. This is due to the
fact that, once the pure strategy sets of players have been designated by the numbers
1,2, ...,rn and 1,2, . . . , n , the payoff functions can be written in the form of two
matrices

an ... aln Ai ... A*


H^A^ and Hi = B =
a Q
m! mn Anl fan
Here the elements a y and fa of the matrices A, B are respectively the payoffs to
players 1 and 2 in the situation (t, j), i g M,j N, M = { 1 , . . . , m } , N = { 1 , . . . , n}.
In line with the foregoing, the bimatrix game is played as follows. Player 1 chooses
number t (the row) and Player 2 (simultaneously and independently) chooses number
j (the column). Then Player 1 receives the amount ory = Hi(xi,yj) and Player 2
receives the amount fa = ^ ( x ^ y y ) .
Note that the bimatrix game with matrices A and B can also be described by
the (m x n) matrix (A,B), where each component is a pair (ctij,fa), = 1 , 2 , . . . , m ,
j = 1,2,... ,n. The game determined by the matrix A and B will be denoted as
T{AtB).
If the noncooperative two-person game T is such that H\{x,y) = #2(2:,$/) for
all x Xi, y Xi, then T appears to be a zero-sum two-person game discussed in
the preceding chapters. In the special bimatrix game, where there is a y = fa, we
have a matrix game examined in Chapter 1.
3.1.4. Example 1. ("Battle of the sexes".) Consider the bimatrix game deter
mined by
A A
(4,1) (0,0)
(A,B) =
2 (0,0) (1-4)
Although this game has a variety of interpretations, the best known seems to be
Luce and Raiffa (1957). Husband (Player 1) and his wife (Player 2) may choose one
of the two evening entertainments: football match ( a i , A ) or theatre ( a 2 , A ) - If they
have different desires, ( a i , A ) o r (2iA) t n e y stay at home. The husband shows
preference to the football match, while his wife prefers to go to the theatre. However,
it is more important for them to spend the evening together than to be alone at the
entertainment (though preferable).
Example 2. ("Crossroads" game.) [Moulin (1981)]. Two motorists move along
two mutually perpendicular routes and simultaneously meet each other at a crossroad.
Each motorist may make a stop (1st strategy, a j or A ) or continue on his way (2nd
strategy, a 2 or A ) .
3.1. Definition of noncooperative game in normal form 127

It is assumed that each player prefers to make a stop in order to avoid an accident,
or to continue on his way if the other player has made a stop. This conflict can be
formalized by the bimatrix game with the matrix

01 A
1 (1,1) (l-,2)
(A,B)
<*2 (2,1-c) (0,0)

(the non-negative number e corresponds to the feeling of dissatisfaction that one


player has to make a stop and let the other go).
Example 3. (Selection of a vehicle for a city tour.) [Moulin (1981)]. Suppose the
number of players is large and each of the sets X{ consists of two elements: X, = {0,1}
(for definiteness: 0 is the use of a private vehicle and 1 is the use of a public vehicle).
The payoff function is defined as follows:
with x = 1
H(x x ^--H')' ' >
,( 1
'--' n)
~~ \b(t), with x,= 0,
where * = * , , * ; .

1 yr-/ail)

a(0) / /
/

6(0) /
0 io i1 1 t

Figure 3.1

Let a and b be of the form shown in Fig. 3.1. From the form of the functions a(t)
and 6(f) it follows that if the number of players choosing 1 is greater than t\, then the
street traffic is light enough to make the driver of a private vehicle more comfortable
than the passenger in a public vehicle. However, if the number of motorists is greater
than 1 to, then the traffic becomes so heavy (with the natural priority for public
vehicles) that the passenger in a public vehicle compares favourably with the driver
of a private vehicle.
Example 4- (Allocation of a limited resource taking into account the users' inter
ests.) Suppose n users have a good chance of using (accumulating) some resource
whose volume is bounded by A > 0. Denote by i , the volume of the resource to
128 Ciapter 3. Nonzero-sum games

be used (accumulated) by the ith user. The users receive a payoff depending on the
values of the vector x = (x\, x 2 , . . . , x). The payoff for the ith user is evaluated by
the function /i,(xi, x 2 , . . . , x n ), if the total volume of the used (accumulated) resource
does not exceed a given positive value 0 < A, i.e.
n
x( < 0, x{ > 0.

If the inverse inequality is satisfied, the payoff to the ith user is calculated by the func
tion <?,(xi, x 2 , . . . , i ) . Here the resource utility shows a sharp decrease if J2?=i xi > >
i.e.
9i(xi,Xi,...,Xn) < hi(xi,X2,...,Xn).
Consider a nonzero-sum game in normal form

r = (N,{X<UN,{HiheN)

where the players' payoff functions is

L9i{zi, , Xn), L,=i a;.' > 0 ,

Xi = [0,ai], 0<a{<A, > = i4, N = { 1 , 2 , . . . , n } .


i=i
The players in this game are the users of the resource.
Example 5. (Game-theoretic model for air pollution control.) In an industrial
area there are n enterprises, each having an emission source. Also, in this area
there is an ecologically significant zone ft whose air pollution must not exceed a
maximum permissible level. The time and area-averaged emission from n emitters
can be approximately calculated by the formula
n
1 = Y,cix 0<Xi < o = l,2,...,n.
i=i

Let 0 < HC?=i Cj<ii be a maximum emission concentration level.


We shall consider the enterprises to be players and construct the game, modeling
an air pollution conflict situation. Suppose each enterprise can reduce its operating
expenses by increasing an emission x^. However, if the air pollution in the area fl
exceeds the maximum emission concentration level, the enterprise incurs a penalty
Si>0.
Suppose player i (enterprise) has an opportunity of choosing the values x; from
the set Xi = [Q,Oi]. The players' payoff functions are

tf/~ \ _ f Mxi,X2, . . . , X n ) ,
(ni(xi,x2,...,xn) -st, 9>0,

where /,(xi,Xa,...,x) are the functions that are continuous and increasing in the
variables x;.
3.2. Optimality principles in noncooperative games 129

Example 6. (Game-theoretic model for bargaining of divisible good.)


[Zenkevich and Voznyuk (1994a)]. Two players take part in an auction where q units
of good with minimal price po are offered. Assumed that players 1,2 have budgets
M\,M2 respectively. The players demand their quantities of good qi,q2 (<h,92,<?
-integers) and bid their prices pi,p2 for unit of the good simultaneously and indepen
dently in such a way that

9i + 92 > 9, 0 < 9i < 9, 0 < 92 < 9, pi [po.pl], p^ [po.pl],

where pT = MA/{q - 1), p7 = M2/(q - 1).


According to the bargaining process rules, a player who bids the higher price buys
demanded quantity of good at this price. The other buys the rest of good at his own
price. If bidden players' prices are equal then Player 1 has an advantage over Player
2. Each player objective is to maximize his profit.
This bargaining process can be described as a nonzero-sum two-person game in
normal form T = (X, Y, Hi,H2), where sets of the players' strategies are

X = {pilPi (po,pT]}, Y = W P ? [po-PT]}

and payoff functions are

ti / 1 _ J (PT - P i k i . Pi ^ P21
"l(Pl'P2)-n(pT-Pi)(9-92), P<P2,

H / \ _ / (PI - P2)?2, Pi < P2,


^(P"P2)-\(PJ-P2)(<?-91), Pl>P2-

3.2 Optimality principles in noncooperative


games
3.2.1. It is well known that for zero-sum games the principles of minimax, maximin
and equilibrium coincide (if they are realizable, i.e. there exists an equilibrium, while
maximin and minimax are reached and equal to each other). In such a case, they
define a unified notion of optimality and game solutions. The theory of nonzero-sum
games does not have a unified approach to optimality principles. Although, there are
actually many such principles, each of them is based on some additional assumptions
of players' behavior and a structure of a game.
It appears natural that each player in the game F seeks to reach a situation z in
which his payoff function has a maximum value. The payoff function //,, however,
depends not only on the strategy of the tth player, but also on the strategies chosen
by the other players. Because of this, the situations \x'} determining a maximum
payoff to the th player may not do the same thing for the other players. As in the
case of a zero-sum game, the quest for a maximum payoff involves a conflict, and even
formulation of a "good" or optimal behavior in the game becomes highly conjectural.
There are many approaches to this problem. One of these is the Nash equilibrium
130 Chapter 3. Nonzero-sum games

and its various extensions and refinements. When the game T is zero-sum, the Nash
equilibrium coincides with the notion of optimality (saddle point - equilibrium) that
is the basic principle of optimality in a zero-sum game.
Suppose x = ( x i , . . . , x x) is an arbitrary situation in the game
T and Xi is a strategy of player i. We construct a situation that is different from x
only in that the strategy x, of player i has been replaced by a strategy x|. As a result
we have a situation ( x , , . . . ,Xj_i,xJ,x; + 1 ,... , s ) denoted by (x||xj). Evidently, if Xj
and x\ coincide, then (x||x() = x.
Definition. The situation x' = ( x j , . . . , x * , . . . , x*) is called the Nash equilibrium
if for all x; G AT,- and i = 1 , . . . , n there is

#,(**) > #,(*!*,) (3- 2 - 1 )

Example 7. Consider the game from Example 3, 3.1.4. Here we regard as Nash
equilibrium the situation for which there is the condition
k < t' - 1/n, f + 1/n < t, (3.2.2)
where t* = " _ , x}. It follows from (3.2.2) that a payoff to a player remains
unaffected when he shifts from one pure strategy to another provided the other players
do not change their strategies.
Suppose a play of the game realizes the situation x corresponding to t = )"_, Xj,
t [to, U], and the quantity S is the share of the players who wish to shift from strategy
0 to strategy 1. Note that if 6 is such that b(t) = a(t) < a(t + 6), then the payoffs
to these players tend to increase (with such a strategy shift) provided the strategies
of the other players remain unchanged. However, if this shift is actually effected,
then the same players may wish to shift from strategy 1 to strategy 0, because the
condition a(t + S) < b(t + 6) is satisfied. If this wish is reahzed, then the share of
players, Z)" = i Xj, decreases and again falls in the interval [to!*i]-
Similarly, let 6 be the share of players, who decided, for some reason (e.g. because
of random errors), to shift from strategy 1 to strategy 0, when t S < to- Then, by
the condition b(t S) < a(t S), the players may wish to shift back to strategy 1.
When this wish is realized, the share of the players, ^ " = 1 x,, increases and again
comes back to the interval [<oi'i]-
3.2.2. It follows from the definition of the Nash equilibrium situation that none of
the players :' is interested to deviate from the strategy x* appearing in this situation
(by (3.2.1), when such a player uses strategy x, instead of x*, his payoff may decrease
provided the other players follow the strategies generating an equilibrium x*). Thus, if
the players agree on the strategies appearing in the equilibrium x*, then any individual
non-observance of this agreement is disadvantageous to such a player.
Definition. The strategy x* Xi is called equilibrium if it appears at least in
one Nash equilibrium.
For the noncooperative two-person game T = (X\,Xi,H\,H-i) the situation (x*, y*)
is equilibrium if the inequalities
^(x,y*)</f,(x*,y-), H3(x',y)<H2(x',y') (3.2.3)
3.2. Optimaiity principles in noncooperative games 131

hold for all x X\ and y X2


In particular, for the bimatrix (m x n) game T(A,B) the pair (*,J*) is the Nash
equilibrium if the inequalities

a . <<*.,-., ft.j<A.i. (3.2.4)

hold for all the rows * M and columns j N. Thus, Example 1 has two equilibria
at ( a , , f t ) and ( a 2 , ^ 3 ) , whereas Example 2 has equilibria at (aj,/? 2 ) and (a2,ffi).
Recall that for the zero-sum game T = (Xi,X2,H) the pair (x*,y") X\ x X2 is
an equilibrium if

H(x,y') < H(x',y') < H(x",y), x Xu y X2.

Equilibria in zero-sum games have the following properties:


1. A player is not interested to inform the opponent of the strategy (pure or
mixed) he wishes to use. (Of course, if the player announces in advance of the play,
the optimal strategy to be employed, then a payoff to him will not be reduced by the
announcement, though he will not win anything.)
2. If (x, y) 6 Z(T), (x\ y') Z(T) are equilibria in the game I \ and v is the value
of the game, then
(x',y)Z(r), (x,y')Z(T), (3.2.5)
v = H(x,y) = H(x',y') = H(x,y') = H(x\y). (3.2.6)
3. Players are not interested in any intercourse for the purposes of developing
joint courses of action before the game starts.
4. If the game T has an equilibrium, with x as a maximin and y as a minimax
strategy for the players 1 and 2, respectively, then (x,y) Z(T) is an equilibrium,
and vice versa.
We shall find whether these properties hold for bimatrix games.
Example 8. Consider a "battle of the sexes" game (see Example 1 and 3.1.4). As
already noted, this game has two equilibria: ( a i , f t ) and {a2,P2)- The former is ad
vantageous to Player 1, while the latter is advantageous to Player 2. This contradicts
(3.2.6), since the payoffs to the players in these situations are different. Although the
situations (ai,/?i), (0:2,ft) are equilibria, the pairs (ai,/?2) and (Q2,/?I) are not Nash
equilibria, i.e. property 2 (see (3.2.5)) is not satisfied.
If Player 1 informs his partner of the strategy ax to be employed, and if Player
2 is convinced that he is sure to do it, then he cannot do better than to announce
the first strategy 0t. Similar reasoning applies to Player 2. Thus, it is advantageous
to each player to announce his strategy, which contradicts property 1 for zero-sum
games.
Suppose the players establish no contact with each other and make their choices si
multaneously and independently (as specified by the rules of a noncooperative game),
Let us do the reasoning for Player 1. He is interested in realization of the situation
(ai,/?i), whereas the situation (a 2 ,/?2) is advantageous to Player 2. Therefore, if
Player 1 chooses strategy Qi, then Player 2 can choose strategy 02, with the result
132 Chapter 3. Nonzero-sum games

that both players become losers (the payoff vector (0,0)). Then it may be wise of
Player 1 to choose strategy a 2 , since in the situation (a 3 , #)) he would receive a payoff
1. Player 2, however, may follow a similar line of reasoning and choose 0i, then, in
the situation (a<2,/3i) both players again become losers.
Thus, this is the case where the situation is advantageous (but at the same time
unstable) to Player 1. Similarly, we may examine the situation (a 2 , & ) (from Player
2's point of view). For this reason, it may be wise of the players to make, in advance
of the play, contact and agree on a joint course of action, which contradicts property
3. Note that some difficulties may arise when the pairs of maximin strategies do not
form an equilibrium.
Thus we have an illustrative example, where none of the properties 1 4 of a
zero-sum game is satisfied.
Payoffs to players may vary with Nash equilibria. Furthermore, unlike the equi
librium set in a zero-sum game, the Nash equilibrium set is not rectangular. If
x = ( i i , . . . , a;,,..., x) and x' = (x\,..., x(-,..., x'n) are two different equilibria, then
the situation x" composed of the strategies, which form the situations x and x' and
coincides with none of these situations, may not be equilibrium. The Nash equilib
rium is a multiple optimality principle in that various equilibria may be preferable to
different players to a variable extent. It now remains for us to answer the question:
which of the equilibria can be taken as an optimality principle convenient to all play
ers? In what follows it will be shown that the multiplicity of the optimality principle
is characteristically and essential feature of an optimal behavior in the controlled
conflict processes, with many participants.
Note that, unlike a zero-sum case, the equilibrium strategy x* of the th player
may not always ensure at least the payoff Hi{x") in the Nash equilibrium, since this
essentially depends on whether the other players choose the strategies appearing in
the given Nash equilibrium. For this reason, the equilibrium strategy should not be
interpreted as an optimal strategy for the ith player. This interpretation makes sense
only for the n-tuples of players' strategies, i.e. for situations.
3.2.3. An important feature of the Nash equilibrium is that any deviation from
it made by two or more players may increase a payoff to one of deviating players.
Let S C N be a subset of the set of players (coalition) and let z = ( x i , . . . , x ) be a
situation in the game I \ Denote by (x||z$) the situation which is obtained from the
situation x by replacing therein the strategies x,, S, with the strategies xj- 6 A",-,
t S. In other words, the players appearing in the coalition S replace their strategies
x,- by the strategies X;. If x* is the Nash equilibrium, then (3.2.1) does not necessary
imply
Hi(x') > Hi(x"\\xs) for all i S. (3.2.7)
In what foDows this will be established by some simple examples.
But we may strengthen the notion of a Nash equilibrium by requiring the condition
(3.2.7) or the relaxed condition (3.2.7) to hold for at least one of the players t 6 S.
Then we arrive at the following definition.
Definition. The situation x" is called a strong equilibrium if for any coalition
S C N and xs n . s X> there is a player io S such that the following inequality is
3.2. Optimality principles in noncooperative games 133

satisfied:
H,0(x') > Hio(x'\\xs). (3.2.8)

Condition (3.2.8) guarantees that the players' agreement to enter a coalition 5


is inexpedient because any coalition has a player i0 who is not interested in this
agreement. Any strongly equilibrium situation is a Nash equilibrium.
If the strong equilibrium existed in a broad class of games, then it could be an
acceptable principle of optimality in a noncooperative games. However, it happens
extremely rare.
Example 9. ("Prisoners' dilemma".) Consider the bimatrix game determined by

01 02
(5,5) (0,10)
(A,B)= *
a2 (10,0) (1,1)

Here we have one equilibrium situation (02,(82) (though not strong equilibrium),
which yields the payoff vector (1,1). However, if both players play (01,^1), they
obtain the payoff vector (5,5), which is better to both of them. Zero-sum games have
no such paradoxes. As for this particular case, the result is due to the fact that a
simultaneous deviation from the equilibrium strategy may further increase a payoff
to each player.
3.2.4. Example 9 suggests the possibility of applying other optimality principles
to a noncooperative game which may bring about situations that are more advanta
geous to both players than in the case of equilibrium situations. Such an optimality
principle is Pareto optimality.
Consider a set of vectors {H(x)} = {Hx(x),..., Hn(x)}, x X, X = n"=i Xi, i.e.
the set of possible values of vector payoffs in all possible situations x X.
Definition. The situation x in the noncooperative game T is called Pareto
optimal if there is no situation x X for which the following inequalities hold:

H({x) > tf,(z) for allie N and

Hi0(x) > Hi0(x) for at least one io N.


The set of all Pareto optimal situations will be denoted by Xp.
Conceptually, the belonging of the situation x to the set Xp means that there is
no other situation x which might be more preferable to all players than the situation
1.
Following Vorobjev (1977), we conceptually distinguish the notion of an equi
librium situation from that of a Pareto optimal situation. In the first case, neither
player may individually increase his payoff, while in the second, all the players cannot
increase acting as one player (even not strictly) a payoff to each of them.
To be noted also is that the agreement on a fixed equilibrium does not allow each
individual player to deviate from it. In the Pareto optimal situation, the deviating
player can occasionally obtain an essentially greater payoff. Of course, a strong
134 Chapter 3. Nonzero-sum games
equilibrium situation is also Pareto optimal. Thus, Example 9 provides a situation
(aa.A) which is equilibrium, but is not Pareto optimal. Conversely, the situation
(aj,/?i) is Pareto optimal, but not an equilibrium. In the game "battle of the sexes",
both equilibrium situations (ori, /?i), (a2, ft) are strong equilibria and Pareto optimal,
but, as already noted in Example 8, they are not interchangeable. Similar reasoning
also applies to the following example.
Example 10, Consider the "crossroads" game (see Example 2, 3.1.4). The sit
uations (a2,ft),(ai,ft) form Nash equilibria and are Pareto optimal (the situation
(on, ft) is Pareto optimal, but not an equilibrium). For each player the "stop" strategy
c*i,ft is equilibrium if the other player decides to pass the crossroads and, conversely,
it is advantageous for him to choose the "continue" strategy a2< ft if the other player
decides to pass the crossroads and, conversely, it is advantageous for him to choose
the "continue" strategy a2,ft if the other player makes a stop. However, each player
receives a payoff of 2 units only if he chooses the "continue" strategy a^fc). This
necessarily involves competition for leadership, i.e. each player is interested to be the
first to announce the "continue" strategy.
Note that we have reached the same conclusion from examination of the "battle
of the sexes" game (see Example 8).
3.2.5. We shall now consider behavior of a "leader-follower" type in a two-person
game T = (Xi,X2,Hi,H2). Denote by Z1, Z2 the sets of best responses for players
1 and 2, respectively, here

Z1 {(xux2)\Hl(xllx2)=auPH1(yl,x2)}, (3.2.9)
VI

Z2 = {(xux2)\H2(Xl,x2) = suPH2(xuy2)} (3.2.10)


w
(suprema in (3.2.9) and (3.2.10) are supposed to be reached).
Definition. We call the situation (xi,x2) X\xX2 the Stakelberg i-equilibrium
in the two-person game T if

^jffi(*1,xa)= sup ftfa.jto), (3.2.11)


{v\,yi)Z!

where i = 1,2, t ^ j .
The notion of i-equilibrium may be interpreted as follows. Player 1 (Leader) knows
the payoff functions of both piayers Hi,H2, and hence he learns Player 2's (Follower)
set of best responses Z2 to any strategy X\ of Player 1. Having this information he
then maximizes his payoff by selecting strategy xl from condition (3.2.11). Thus, 7F,
is a payoff to the ith player acting as a "leader" in the game V.
Lemma. Let Z(T) be a set of Nash equilibria in the two-person game F. Then

Z(T) = Z1 (1 Z2, (3.2.12)

where Zl,Z2 are the sets of the best responses (3.2,9), (3.2.10) given by the players
1,2 in the game Y.
3.2. Optimality principles in noncooperative games 135

Proof. Let (xi,x 2 ) g Z(T) be the Nash equilibrium. Then the inequalities

Hi(x't,x2) < #i(x,,x2), H2(xux'2) < H2(xltxi)

hold for all x\ 6 Xi and x'2 6 A j ; whence it follows that

Hi(xi,x2) = supHi(x\,x2), (3.2.13)


J

H2(xux2) = sup H2(xi,x'2). (3.2.14)

Thus, (x,,x 2 ) 6 Z 1 and (x,,x 2 ) Z s , i.e. (x,,x 2 ) e Z ' n Z 2 .


The inverse inclusion follows immediately from (3.2.13), (3.2.14).
Definition. [Moulin (1981)]. We say that the two-person game T
(Xi,X2,Hi,H2) involves competition for leadership if there exists a situation
(xi,x 2 ) XiX X2 such that

7/;<tf;(x,,x2), i = 1,2. (3.2.15)

Theorem. [Moulin (1981)]. / / the two -person game T (Xi,X2,H\tH2)


has at least two Pareto optimal and Nash equilibrium situations (xi,x 2 ), (yi,y 2 ) with
different payoff vectors

( / f i ( z i , i j ) , t f a ( * i , u ) ) ? (tfi(Vi,w),#2(Vi,2)), (3-2.16)

then the game T involves competition for leadership.


Proof. By (3.2.12), for any Nash equilibrium (z\,z2) Z(T) we have

Hii'Wh) <T?i, ' = 1,2.

Suppose the opposite is true, i.e. the game T does not involve competition for lead
ership. Then there is a situation (zi,z 2 ) 6 X\ x X2 for which

Hi(xux2) sTIiK Hi(zu22), (3.2.17)

ff,(yi,y2)<H,<//^.,z2), (3.2.18)

i = 1,2. But (xj, x 2 ), (t/!, y2) are Pareto optimal situations, and hence the inequalities
(3.2.17), (3.2.18) are satisfied as equalities, which contradicts (3.2.16). This completes
the proof of the theorem.
In conclusion we may say that the games "battle of the sexes" and "crossroads"
(as in 3.1.4) satisfy the condition of the theorem (as in 3.2.5) and hence involve
competition for leadership.
136 Chapter 3. Nonzero-sum games

3.3 Mixed extension of noncooperative game


3.3.1. We shall examine a noncooperative two-person game T = (Xi,X2,Hi,Hi).
In the nonzero-sum case, we have already seen that an equilibrium in pure strategies
generally does not exist. The matrix games have an equilibrium in mixed strategies.
For this reason, it would appear natural that the Nash equilibrium in a noncooperative
game would be sought in the class of mixed strategies.
As in the case of zero-sum games, we identify the player's mixed strategy with
the probability distribution over the set of pure strategies. For simplicity, we assume
that the sets of strategies X, are finite, and introduce the notion of a mixed extension
of the game. Let
r = (N,{Xi}ieN,{Hi}ieN) (3.3.1)
be an arbitrary finite noncooperative game. For definitiveness, suppose the player i
in the game T has rrti strategies.
Denote by ^,- in arbitrary mixed strategy of the player i, i.e. some probability
distribution over the set of strategies Xi to be referred to as pure strategies. Also,
denote by /i,(x;) the probability prescribed by strategy p, to the particular pure
strategy x< e Xf. The set of all mixed strategies of player i will be denoted by Xj.
Suppose each player i N uses his mixed strategy /*,-, i.e. he chooses pure
strategies with probabilities f*i(xi). The probability that a situation x = (x\,... , x n )
may arise is equal to the product of the probabilities of choosing its component
strategies, i.e.
fi(x) = p,(x,) x p 2 (x 2 ) x . . . x ftn(xn). (3.3.2)
Formula (3.3.2) defines the probability distribution over the set of all situa
tions X = H?=iXi determined by mixed strategies fti,fi2,... ,ix. The n-tuple
p = (p\,,.., ftn) is called a situation in mixed strategies. The situation in mixed
strategies (i realizes various situations in pure strategies with some probabilities;
hence the value of the payoff function for each player turns out to be a random vari
able. The value of the payoff function for the ith player in the situation ft is taken to
be the mathematical expectation of this random variable:

= E E Hi(xu...,xn)xiti(xl)x...xitn(xn),

iN, x = ( ! , , . . . , i ) X. (3.3.3)
We introduce the notation

*MK-)= E - E E E *.-(*K)n/*(**) (3-3.4)


Let n'j be an arbitrary mixed strategy for player j in the game T. Multiplying (3.3.4)
by /i'(Xj) and summing over all Xy 6 Xj, we obtain

*',X,
3.3. Mixed extension of noncooperative game 137

Definition. The game T = (N,{Xi}iN{Ki}iN), in which N is the set of


players, Xi is the set of mixed strategies of each player i, and the payoff function is
defined by (3.3.3), is called a mixed extension of the game T.
If the inequality Kj((i\\xi) < a holds for any pure strategy x,- of player i, then the
inequality Kj(p\\p*) < a holds for any mixed strategy fi*. The truth of this assertion
follows from (3.3.3) and (3.3.4) by a standard shift to mixed strategies.
3.3.2. For the bimatrix (m x n) game V(A, B) we may define the respective sets
of mixed strategies X\, Xi for players 1 and 2 to be

A"i = {x | xu = 1, x > 0, x Rm},

X2 = {y | yw = 1, y > 0, y 6 " } ,
where u = ( 1 , . . . , 1) /T", w = ( 1 , . . . , 1) IT. We also define the players' payoffs
Ki and Ki at (x, y) in mixed strategies to be the payoff expectations

Ki(x,y) = xAy, /fa(x,y) = xBy, x X\, y X2.

Thus, we have constructed formally a mixed extension T(A,B) of the game T(A,B),
i.e. the noncooperative two-person game T(A,B) = {X\,X2,Ki, K2).
For the bimatrix game (just as for the matrix game) the set Mz = {t|, > 0} will be
called Player l's spectrum of mixed strategy x = ( f i , . . . , m ), while the strategy x, for
which Mx M, M = { 1 , 2 , . . . , m}, will be referred to as completely mixed. Similarly,
Wy = (il^i > 0} will be Player 2's spectrum of mixed strategy y = {r]i,... ,?} in
the bimatrix (m x n) game T(A,B). The situation (x,y), in which both strategies x
and y are completely mixed, will be referred to as completely mixed.
We shall now use the "battle of the sexes" game to demonstrate that the difficulties
encountered in examination of a noncooperative game (Example 8, 3.2.2) are not
resolved through introduction of mixed strategies.
Example 11. Suppose Player 1 in the "battle of the sexes" game wishes to maxi
mize his guaranteed payoff. This means that he is going to choose a mixed strategy
x (, 1), 0 < < 1 so as to maximize the least of the two quantities K\(x,fii)
and K\{x,^i), i.e.

m^v^n{Kx{x,^),K,{x,h)}-mm{Kx{x\^),K,{x,h)}-
X

The maximin strategy x of Player 1 is of the form x = (1/5,4/5) and guarantees


him, on the average, a payoff of 4/5. If Player 2 chooses strategy ft, then the players'
payoffs are (4/5,1/5). However, if he uses strategy ft, then the players' payoffs are
(4/5,16/5).
Thus, if Player 2 suspects that his partner pursues strategy x, then he will choose
ft and receive a payoff of 16/5. (If Player 1 can justify the choice of ft for Player
2, then he may also improve his own choice.) Similarly, suppose Player 2 uses a
maximin strategy that is y = (4/5,1/5). If Player 1 chooses strategy cti then the
players' payoffs are (16/5,4/5). However, if he chooses QJ, then the players' payoffs
138 Chapter 3. Nonzero-sum games

are (1/5,4/5). Therefore, it is advantageous for him to use his strategy e*i against
the maximin strategy y.
If both players follow this line of reasoning, they will arrive at a situation (c*!,/^),
in which the payoff vector is (0,0). Hence the situation (x,y) in maximin mixed
strategies is not a Nash equilibrium.
3.3.3. Definition. The situation ft" is called a Nash equilibrium in mixed
strategies in the game T if for any player i, and for any mixed strategies /, the
following inequality holds:

# , ( , < > , ) < Ki(f), =l,...,n.

Example 11 shows that a situation in maximin mixed strategies is not necessarily


a Nash equilibrium in mixed strategies.
Example 12. The game of "crossroads" (see Example 10, 3.2.4) has two Nash
equilibria in pure strategies: (a\,/}2) and (a2,0\). These situations are also Pareto
optimal. The mixed extension of the game gives rise to one more equilibrium situation,
namely the pair (x",y'):

. . l- , 1

where u t = (1,0), u 2 = (0,1) or x' = y* = ((1 - e)/(2 - ), 1/(2 - e)).


Indeed, we have

K1(a2,S) = 2 l 0 r l-^.
Furthermore, since for any pair of mixed strategies x = ((, 1 ) and y = (n, 1 - n),
we have
ffi(*,lf) = W a , , / ) + (1 - 0 # , ( a y * ) = 1 - j ^ ,

K2(x',y) = nK2(x%ft) + (1 - n)K2{x\fc) = 1- -^

then we get
Kx{x,f) = #fx(**,"), K2(x%y) = K2(x%f)
for all mixed strategies x 6 X\ and y X2. Therefore, (x*,ym) is a Nash equilibrium.
Furthermore, it is a completely mixed equilibrium. But the situation (x*,ym) is not
Pareto optimal, since the vector K(x*,y*) = (1 e/(2 e), 1 e/(2 )) is strictly
(component-wise) smaller than the payoff vector (1,1) in the situation (ai,/?i).
Let K{fi') = {Ki(p*)} be a payoff vector in some Nash equilibrium. Denote
v.- = K,(//*) and v = {,-}. While the zero-sum games have the same value v of the
payoff function in all equilibrium points and hence this value was uniquely defined
for each zero-sum game, which had such an equilibrium, in the nonzero-sum games
3.4. Existence of Nash equilibrium 139

there is a whole set of vectors v. Thus every vector v is connected with a special
equilibrium point /i*, v< = #,(/**), /i* X,J( = n " = i ^ f
In the game of "crossroads", the equilibrium payoff vector (i/i,j) at the equi
librium point ( a i , / ^ ) is of the form (1 e,2), whereas at (x',y*) it is equal to
(1 - e/(2 - e), 1 - t/(2 - c)) (see Example 12).
3.3.4. If the strategy spaces in the noncooperative game T = (Xi,Xj, H\, Jf/2)
are infinite, e.g. Xt C IP", X2 C IP1, then as in the case of zero-sum infinite game,
the mixed strategies of the players are identified with the probability measures given
on Borel a-algebras of the sets Xi and X2. If /i and v are respectively the mixed
strategies of Players 1 and 2, then a payoff to player i in this situation Ki(fi, u) is the
mathematical expectation of payoff, i.e.

K,{n,v)=f i /fc(x,y)ifo(*)<My), (3-3-5)

where the integrals are taken to be Lebesgue-Stieltjes integrals. Note that, the payoffs
to the players at (x,u) and (p,y) are

Ki{x,r)= / # ; ( * , y)<M),
JX3

Ki(t*,y) = / Hi{x,y)dn(x), i = 1,2.

(All integrals are assumed to exist.)


Formally, the mixed extension of the noncooperative two-person game T can be
defined as a system T = ( X j j X i , ^ , ^ ) , where Xx - {/i}, X2 = {v} with Ky and
Ki determined by (3.3.5). The game T is a noncooperative two-person game, and
hence the situation (n',i/*) is equilibrium if and only if the inequalities (as in (3.2.3))
are satisfied.

3.4 Existence of Nash equilibrium


3.4.1. In the theory of zero-sum games, the continuity of a payoff function and the
compactness of strategy sets (see 2.4.4) sufficed for the existence of an equilibrium in
mixed strategies. It turns out that these conditions also suffice for the existence of
a Nash equilibrium in mixed strategies where a noncooperative two-person game is
concerned.
First we prove the existence of an equilibrium in mixed strategies for a bimatrix
game. This proof is based on the familiar Kakutani's fixed point theorem. This
theorem will be given without proof (see 3.5.5).
T h e o r e m . Let S be a convex compact set in E" and i> be a multi-valued map
which corresponds to each point of S the convex compact subsets of S and satisfies
the condition: if xn S, xn -* x, yn rp{xn), yn y and y il>(x). Then then
exists x' S such that x* ip(x').
140 Chapter 3. Nonzero-sum games

Theorem. Let Γ(A, B) be a bimatrix (m × n) game. Then there are mixed
strategies x* ∈ X̄₁ and y* ∈ X̄₂ for Players 1 and 2, respectively, such that the pair
(x*, y*) is a Nash equilibrium.
Proof. The mixed strategy sets X̄₁ and X̄₂ of Players 1 and 2 are convex polyhedra.
Hence the set of situations X̄₁ × X̄₂ is a convex compact set.
Let ψ be a multi-valued map,

ψ : X̄₁ × X̄₂ → X̄₁ × X̄₂,

determined by the relationship

ψ(x₀, y₀) = {(x, y) : K₁(x, y₀) = max_{x′∈X̄₁} K₁(x′, y₀), K₂(x₀, y) = max_{y′∈X̄₂} K₂(x₀, y′)},

i.e. the image of the map ψ consists of the pairs of the players' best responses to the
strategies y₀ and x₀, respectively.
The functions K₁ and K₂, as the mathematical expectations of the payoffs in
the situation (x, y), are bilinear in x and y, and hence the image ψ(x₀, y₀) of the
situation (x₀, y₀) under the map ψ is a convex compact subset of X̄₁ × X̄₂.
Furthermore, if the sequences of pairs {(x₀ⁿ, y₀ⁿ)}, (x₀ⁿ, y₀ⁿ) ∈ X̄₁ × X̄₂, and {(xₙ′, yₙ′)},
(xₙ′, yₙ′) ∈ ψ(x₀ⁿ, y₀ⁿ), have limit points, i.e.

lim_{n→∞}(x₀ⁿ, y₀ⁿ) = (x₀, y₀),  lim_{n→∞}(xₙ′, yₙ′) = (x′, y′),

then, by the bilinearity of the functions K₁ and K₂ and the compactness
of the sets X̄₁ and X̄₂, we have (x′, y′) ∈ ψ(x₀, y₀). Then, by Kakutani's theorem,
there exists a situation (x*, y*) ∈ X̄₁ × X̄₂ for which (x*, y*) ∈ ψ(x*, y*), i.e.

K₁(x*, y*) ≥ K₁(x, y*),  K₂(x*, y*) ≥ K₂(x*, y)

for all x ∈ X̄₁ and y ∈ X̄₂. This completes the proof of the theorem.
3.4.2. The preceding theorem can be extended to the case of continuous payoff
functions H₁ and H₂. To prove this result, we use the well-known Brouwer
fixed point theorem [Parthasarathy and Raghavan (1971)].
Theorem. Let S be a convex compact set in Rᵐ which has an interior. If φ is a
continuous self-map of S, then there exists a fixed point x* of the map φ, i.e. x* ∈ S
and x* = φ(x*).
Theorem. Let Γ = (X₁, X₂, H₁, H₂) be a noncooperative two-person game,
where the strategy spaces X₁ ⊂ Rᵐ, X₂ ⊂ Rⁿ are convex compact subsets and the
set X₁ × X₂ has an interior. Also, let the payoff functions H₁(x, y) and H₂(x, y) be
continuous on X₁ × X₂, with H₁(x, y) concave in x for every fixed y and H₂(x, y)
concave in y for every fixed x.
Then the game Γ has a Nash equilibrium (x*, y*).
Proof. Let p = (x, y) ∈ X₁ × X₂ and q = (x̄, ȳ) ∈ X₁ × X₂ be two situations in
the game Γ. Consider the function

θ(p, q) = H₁(x, ȳ) + H₂(x̄, y).

First we show that there exists a situation q* = (x*, y*) for which

max_{p∈X₁×X₂} θ(p, q*) = θ(q*, q*).

Suppose this is not the case. Then for each q ∈ X₁ × X₂ there is a p ∈ X₁ × X₂,
p ≠ q, such that θ(p, q) > θ(q, q). Introduce the sets

G_p = {q | θ(p, q) > θ(q, q)}.

Since the function θ is continuous (H₁ and H₂ are continuous in all their variables)
and X₁ × X₂ is a convex compact set, the sets G_p are open. Furthermore, by
the assumption, X₁ × X₂ is covered by the sets of the family {G_p}.
It follows from the compactness of X₁ × X₂ that there is a finite collection of these
sets which covers X₁ × X₂. Suppose these are the sets G_{p₁}, ..., G_{p_k}. Denote

φⱼ(q) = max{θ(pⱼ, q) − θ(q, q), 0}.

The functions φⱼ(q) are non-negative and, by the definition of G_{pⱼ}, at least one
of the functions φⱼ takes a positive value at every point q.
We shall now define a self-map ψ of the set X₁ × X₂ as follows:

ψ(q) = (1/φ(q)) Σⱼ φⱼ(q) pⱼ,

where φ(q) = Σⱼ φⱼ(q). The functions φⱼ are continuous and hence ψ is a continuous
self-map of X₁ × X₂. By Brouwer's fixed point theorem, there is a point q̄ ∈ X₁ × X₂
such that ψ(q̄) = q̄, i.e.

q̄ = (1/φ(q̄)) Σⱼ φⱼ(q̄) pⱼ.

Consequently,

θ(q̄, q̄) = θ((1/φ(q̄)) Σⱼ φⱼ(q̄) pⱼ, q̄).

But the function θ(p, q) is concave in p, with q fixed, and hence

θ(q̄, q̄) ≥ (1/φ(q̄)) Σⱼ φⱼ(q̄) θ(pⱼ, q̄).    (3.4.1)

On the other hand, if φⱼ(q̄) > 0, then θ(q̄, q̄) < θ(pⱼ, q̄), and if φⱼ(q̄) = 0, then
φⱼ(q̄)θ(pⱼ, q̄) = φⱼ(q̄)θ(q̄, q̄). Since φⱼ(q̄) > 0 for some j, we get the inequality

θ(q̄, q̄) < (1/φ(q̄)) Σⱼ φⱼ(q̄) θ(pⱼ, q̄),

which contradicts (3.4.1).

Thus, there always exists q* for which

max_{p∈X₁×X₂} θ(p, q*) = θ(q*, q*),

which means that

H₁(x, y*) + H₂(x*, y) ≤ H₁(x*, y*) + H₂(x*, y*)

for all x ∈ X₁ and y ∈ X₂. Setting successively x = x* and y = y* in the last
inequality, we obtain the inequalities

H₂(x*, y) ≤ H₂(x*, y*),  H₁(x, y*) ≤ H₁(x*, y*),

which hold for all x ∈ X₁ and y ∈ X₂. This completes the proof of the theorem.
The result given below holds for noncooperative two-person games played on
compact sets (specifically, on the unit square) with continuous payoff functions.
Theorem. Let Γ = (X₁, X₂, H₁, H₂) be a noncooperative two-person game,
where H₁ and H₂ are continuous functions on X₁ × X₂, and X₁, X₂ are compact subsets
of finite-dimensional Euclidean spaces. Then the game Γ has an equilibrium (μ*, ν*) in
mixed strategies.
This theorem is given without proof, since it is based on the continuity and bilinearity
of the functions

Kᵢ(μ, ν) = ∫_{X₁} ∫_{X₂} Hᵢ(x, y) dμ(x) dν(y),  i = 1, 2,

over the set X̄₁ × X̄₂ and almost exactly repeats the proof of the preceding theorem.
We shall discuss in more detail the construction of mixed strategies in noncooperative
n-person games with an infinite number of strategies. Note that if the players'
payoff functions Hᵢ(x) are continuous on the Cartesian product X = ∏ᵢ₌₁ⁿ Xᵢ of the
compact sets of pure strategies, then such a noncooperative game always
has a Nash equilibrium in mixed strategies. As for the existence of Pareto optimal
situations, it suffices to ensure the compactness of the set {H(x)}, x ∈ X, which in
turn can be ensured by the compactness in some topology of the set of all situations
X and the continuity in this topology of all the payoff functions Kᵢ, i = 1, 2, ..., n.
It is evident that this is always true for finite noncooperative games.

3.5 Kakutani fixed-point theorem and proof of existence of an equilibrium in n-person games
3.5.1. The reader can read Section 3.5 without referring to the previous Section 3.4.
Given any game Γ = (N, {Xᵢ}ᵢ∈N, {Hᵢ}ᵢ∈N) in normal form with finite sets of strategies
Xᵢ (|N| = n), a mixed strategy for any player i is a probability distribution over Xᵢ.
We let X̄ᵢ denote the set of all possible mixed strategies for player i. To underline the
distinction from mixed strategies, the strategies in Xᵢ will be called pure strategies.
A mixed strategy profile is any vector that specifies one mixed strategy for each
player, so the set of all mixed strategy profiles (situations in mixed strategies) is
the Cartesian product X̄ = ∏ᵢ₌₁ⁿ X̄ᵢ. Here μ = (μ₁, ..., μₙ) is a mixed-strategy profile in
∏ᵢ₌₁ⁿ X̄ᵢ if and only if, for each player i and each pure strategy xᵢ ∈ Xᵢ, μ prescribes
a non-negative real number μᵢ(xᵢ), representing the probability that player i would
choose xᵢ, such that

Σ_{xᵢ∈Xᵢ} μᵢ(xᵢ) = 1, for all i ∈ N.

If the players choose their pure strategies independently, according to the mixed
strategy profile μ, then the probability that they will choose the pure strategy profile
x = (x₁, ..., xᵢ, ..., xₙ) is ∏ᵢ₌₁ⁿ μᵢ(xᵢ), the multiplicative product of the individual
strategy probabilities.
For any mixed strategy profile μ, let Kᵢ(μ) denote the mathematical expectation
of the payoff that player i would get when the players independently choose their pure
strategies according to μ. Denote X = ∏ᵢ₌₁ⁿ Xᵢ (X is the set of all possible situations
in pure strategies); then

Kᵢ(μ) = Σ_{x∈X} ( ∏ⱼ₌₁ⁿ μⱼ(xⱼ) ) Hᵢ(x), for all i ∈ N.

For any τᵢ ∈ X̄ᵢ we denote by (μ||τᵢ) the mixed strategy profile in which the i-th
component is τᵢ and all other components are as in μ. Thus

Kᵢ(μ||τᵢ) = Σ_{x∈X} ( ∏_{j≠i} μⱼ(xⱼ) ) τᵢ(xᵢ) Hᵢ(x).

We shall not use any special notation for the mixed strategy μᵢ that puts
probability 1 on the pure strategy xᵢ, denoting this mixed strategy by xᵢ (in the same
manner as the corresponding pure strategy).
If player i uses the pure strategy xᵢ, while all other players behave independently
according to the mixed-strategy profile μ, then player i's mathematical
expectation of payoff is

Kᵢ(μ||xᵢ) = Σ ( ∏_{j≠i} μⱼ(xⱼ) ) Hᵢ(x||xᵢ),

where the sum is over all choices of pure strategies xⱼ ∈ Xⱼ of the players j ≠ i, and
(x||xᵢ) denotes the profile in which player i uses xᵢ.
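These expectations can be computed directly by enumerating pure strategy profiles. The following Python sketch does this for a small finite game; the payoff functions and mixed strategies in the usage example are hypothetical, chosen only for illustration.

from itertools import product

def K(i, mu, H):
    # K_i(mu): sum over pure profiles x of (prod_j mu_j(x_j)) * H_i(x).
    total = 0.0
    for x in product(*(range(len(m)) for m in mu)):
        p = 1.0
        for j, xj in enumerate(x):
            p *= mu[j][xj]
        total += p * H[i](x)
    return total

def K_dev(i, mu, H, xi):
    # K_i(mu||x_i): player i plays the pure strategy xi, the others follow mu.
    forced = [m[:] for m in mu]
    forced[i] = [1.0 if k == xi else 0.0 for k in range(len(mu[i]))]
    return K(i, forced, H)

# Example: a hypothetical three-person game, two pure strategies per player.
H = [lambda x, i=i: float(sum(x) == 0) + i * 0.1 * x[i] for i in range(3)]
mu = [[0.5, 0.5], [0.25, 0.75], [0.9, 0.1]]
print(K(0, mu, H), K_dev(0, mu, H, 0), K_dev(0, mu, H, 1))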

3.5.2. Definition. The mixed strategy profile μ̄ is a Nash equilibrium in mixed
strategies if

Kᵢ(μ̄||τᵢ) ≤ Kᵢ(μ̄), for all τᵢ ∈ X̄ᵢ, i ∈ N.

3.5.3. Lemma. For any μ ∈ ∏ᵢ₌₁ⁿ X̄ᵢ and any player i ∈ N,

max_{τᵢ∈X̄ᵢ} Kᵢ(μ||τᵢ) = max_{xᵢ∈Xᵢ} Kᵢ(μ||xᵢ).

Furthermore, μᵢ ∈ argmax_{τᵢ∈X̄ᵢ} Kᵢ(μ||τᵢ) if and only if μᵢ(xᵢ) = 0 for every xᵢ such
that xᵢ ∉ argmax_{zᵢ∈Xᵢ} Kᵢ(μ||zᵢ).
Proof. Notice that for any τᵢ ∈ X̄ᵢ

Kᵢ(μ||τᵢ) = Σ_{xᵢ∈Xᵢ} τᵢ(xᵢ) Kᵢ(μ||xᵢ).

Kᵢ(μ||τᵢ) is thus a mathematical expectation of the terms Kᵢ(μ||xᵢ). This expectation
cannot be greater than the maximum value of Kᵢ(μ||xᵢ), and it is strictly less than
this maximum whenever any nonmaximal value of Kᵢ(μ||xᵢ) gets positive probability
(τᵢ(xᵢ) > 0, Σ_{xᵢ∈Xᵢ} τᵢ(xᵢ) = 1).
So the highest expected payoff that player i can get against any combination of the
other players' mixed strategies is the same whether he uses a mixed strategy or not.
3.5.4. As we have seen in the two-person case, the Kakutani fixed-point theorem
is a useful mathematical tool for proving the existence of solution concepts in game
theory, including the Nash equilibrium. To state the Kakutani fixed-point theorem we
first develop some terminology.
A set S of a finite-dimensional vector space Rᵐ is closed if for every convergent
sequence of vectors {xʲ}, j = 1, ..., ∞, with xʲ ∈ S for every j, we have limⱼ→∞ xʲ ∈ S.
The set S is bounded if there exists some positive number K such that for every
x ∈ S, Σᵢ₌₁ᵐ ξᵢ² ≤ K (here x = {ξᵢ}, the ξᵢ being the components of x).
A point-to-set correspondence F : X → Y is any mapping that specifies, for
every point x in X, a set F(x) that is a subset of Y. Suppose that X and Y are
metric spaces, so the notions of convergence and limit are defined for sequences
of points in X and in Y. A correspondence F : X → Y is upper-semicontinuous
if, for every sequence {xʲ, yʲ}, j = 1, ..., ∞, with xʲ ∈ X and yʲ ∈ F(xʲ) for every j,
such that the sequence {xʲ} converges to some point x and the sequence {yʲ} converges
to some point y, we have y ∈ F(x). Thus F : X → Y is upper-semicontinuous if the set
{(x, y) : x ∈ X, y ∈ F(x)} is a closed subset of X × Y.
A fixed point of a correspondence F : S → S is any x in S such that x ∈ F(x).
3.5.5. Theorem (Kakutani). Let S be any nonempty, convex, bounded, and
closed subset of a finite-dimensional vector space Rᵐ. Let F : S → S be any upper-
semicontinuous point-to-set correspondence such that, for every x in S, F(x) is a
nonempty convex subset of S. Then there exists some x̄ in S such that x̄ ∈ F(x̄).
Proofs of the Kakutani fixed-point theorem can be found in Burger (1963).
With the help of the previous theorem we shall prove the following fundamental
result.
3.5.6. Theorem. Given any finite n-person game Γ in normal form, there
exists at least one equilibrium in mixed strategies (in ∏ᵢ₌₁ⁿ X̄ᵢ).
Proof. Let Γ be any finite game in normal form,

Γ = (N, {Xᵢ}ᵢ∈N, {Hᵢ}ᵢ∈N).

The set of mixed-strategy profiles ∏ᵢ₌₁ⁿ X̄ᵢ is a nonempty, convex, closed, and bounded
subset of a finite-dimensional vector space. This set satisfies the above definition of
boundedness with K = |N|, and it is a subset of Rᵐ, where m = Σᵢ₌₁ⁿ |Xᵢ| (here |A|
denotes the number of elements of a finite set A).
For any μ ∈ ∏ᵢ₌₁ⁿ X̄ᵢ and any player j ∈ N, let

Rⱼ(μ) = argmax_{τⱼ∈X̄ⱼ} Kⱼ(μ||τⱼ).

That is, Rⱼ(μ) is the set of best responses in X̄ⱼ to the combination of independently
mixed strategies (μ₁, ..., μⱼ₋₁, μⱼ₊₁, ..., μₙ) of the other players. By the previous
3.6. Refinements of Nash equiJibria 145

lemma, Rⱼ(μ) is the set of all probability distributions ρⱼ over Xⱼ such that

ρⱼ(xⱼ) = 0 for every xⱼ such that xⱼ ∉ argmax_{yⱼ∈Xⱼ} Kⱼ(μ||yⱼ).

Thus, Rⱼ(μ) is convex, because it is a subset of X̄ⱼ defined by a collection of
linear equalities. Furthermore, Rⱼ(μ) is nonempty, because it contains every xⱼ from the
set argmax_{yⱼ∈Xⱼ} Kⱼ(μ||yⱼ), which is nonempty.
Let R : ∏ᵢ₌₁ⁿ X̄ᵢ → ∏ᵢ₌₁ⁿ X̄ᵢ be the point-to-set correspondence such that

R(μ) = ∏ⱼ₌₁ⁿ Rⱼ(μ), for all μ ∈ ∏ᵢ₌₁ⁿ X̄ᵢ.

That is, τ ∈ R(μ) if and only if τⱼ ∈ Rⱼ(μ) for every j ∈ N. For each μ, R(μ) is
nonempty and convex, because it is the Cartesian product of nonempty convex sets.
To show that R is upper-semicontinuous, suppose that {μᵏ} and {τᵏ}, k =
1, ..., ∞, are convergent sequences, μᵏ ∈ ∏ᵢ∈N X̄ᵢ, k = 1, 2, ...; τᵏ ∈ R(μᵏ),
k = 1, 2, ...; μ̄ = limₖ→∞ μᵏ, τ̄ = limₖ→∞ τᵏ.
We have to show that τ̄ ∈ R(μ̄). For every player j ∈ N and every ρⱼ ∈ X̄ⱼ

Kⱼ(μᵏ||τⱼᵏ) ≥ Kⱼ(μᵏ||ρⱼ),  k = 1, 2, ....

By continuity of the mathematical expectation Kⱼ(μ) on ∏ᵢ₌₁ⁿ X̄ᵢ, this in turn implies
that, for every j ∈ N and ρⱼ ∈ X̄ⱼ,

Kⱼ(μ̄||τ̄ⱼ) ≥ Kⱼ(μ̄||ρⱼ).

Thus τ̄ⱼ ∈ Rⱼ(μ̄) for every j ∈ N, and by the definition of R(μ̄), τ̄ ∈ R(μ̄). So we
have proved that R : ∏ᵢ∈N X̄ᵢ → ∏ᵢ∈N X̄ᵢ is an upper-semicontinuous correspondence.
By the Kakutani fixed-point theorem, there exists some mixed strategy profile μ̄
in ∏ᵢ∈N X̄ᵢ such that μ̄ ∈ R(μ̄). That is, μ̄ⱼ ∈ Rⱼ(μ̄) for every j ∈ N; thus Kⱼ(μ̄) ≥
Kⱼ(μ̄||τⱼ) for all j ∈ N, τⱼ ∈ X̄ⱼ, and so μ̄ is a Nash equilibrium of Γ.

3.6 Refinements of Nash equilibria


3.6.1. Many attempts have been made to choose a particular Nash equilibrium from the
set of all possible Nash equilibrium profiles. There are several approaches, but today
it is very difficult to distinguish among them and find out the most promising ones.
We shall introduce only some of them, but for a better understanding of this very
advanced topic we refer to the book of Eric van Damme (1991).
3.6.2. One of the ideas is that each player with a small probability makes mistakes,
and as a consequence every pure strategy is chosen with a positive (although small)
probability. This idea is modelled through perturbed games, i.e. games in which
players have to use only completely mixed strategies.
Let Γ = (N, X₁, ..., Xₙ, H₁, ..., Hₙ) be an n-person game in normal form. Denote
as before by X̄ᵢ the set of mixed strategies of player i, and by Kᵢ the mathematical
expectation of the payoff of player i in mixed strategies. For i ∈ N, let the numbers
ηᵢ(xᵢ) > 0, xᵢ ∈ Xᵢ, with Σ_{xᵢ∈Xᵢ} ηᵢ(xᵢ) < 1, be given, and let X̄ᵢ(ηᵢ) be defined by

X̄ᵢ(ηᵢ) = {μᵢ ∈ X̄ᵢ : μᵢ(xᵢ) ≥ ηᵢ(xᵢ) for all xᵢ ∈ Xᵢ}.

Let η = (η₁, ..., ηₙ) and X̄[η] = ∏ᵢ₌₁ⁿ X̄ᵢ(ηᵢ). The perturbed game (Γ, η) is the
infinite game in normal form

(Γ, η) = (N, X̄₁(η₁), ..., X̄ₙ(ηₙ), K₁, ..., Kₙ)

defined over the strategy sets X̄ᵢ(ηᵢ) with payoffs Kᵢ(μ₁, ..., μₙ), μᵢ ∈ X̄ᵢ(ηᵢ),
i = 1, ..., n.
3.6.3. It is easily seen that a perturbed game (Γ, η) satisfies the conditions under
which the Kakutani fixed point theorem can be used, and so such a game possesses at
least one equilibrium. It is clear that in such an equilibrium a pure strategy which
is not a best reply has to be chosen with the minimum probability. So we have the
following lemma.
Lemma. A strategy profile μ ∈ X̄[η] is an equilibrium of (Γ, η) if and only if
the following condition is satisfied:

if Kᵢ(μ||xₖ) < Kᵢ(μ||xₗ), then μᵢ(xₖ) = ηᵢ(xₖ), for all i, xₖ, xₗ.

3.6.4. Definition. Let Γ be a game in normal form. An equilibrium μ̄ of Γ
is a perfect equilibrium of Γ if μ̄ is a limit point of a sequence {μ(η)}_{η→0}, where μ(η)
is a Nash equilibrium of the perturbed game (Γ, η) for each η.
For an equilibrium μ̄ of Γ to be perfect it is sufficient that some perturbed games
(Γ, η) with η close to zero possess an equilibrium close to μ̄; it is not required
that all perturbed games (Γ, η) with η close to zero possess such an equilibrium.
Let {(Γ, ηᵏ)}, k = 1, ..., ∞, be a sequence of perturbed games for which ηᵏ → 0 as
k → ∞. Since every game (Γ, ηᵏ) possesses at least one equilibrium μᵏ, and since μᵏ is
an element of the compact set X̄ = ∏ᵢ₌₁ⁿ X̄ᵢ, there exists a limit point of {μᵏ}. It can
easily be seen that this limit point is an equilibrium of Γ, and it will be a perfect
equilibrium. Thus the following theorem holds.
Theorem [Selten (1975)]. Every game in normal form possesses at least one
perfect equilibrium.
3.6.5. Consider the bimatrix game Γ

          L₂        R₂
L₁    (1, 1)    (0, 0)
R₁    (0, 0)    (0, 0)

This game has two equilibria, (L₁, L₂) and (R₁, R₂). Consider a perturbed game
(Γ, η). In the situation (R₁, R₂) in the perturbed game the strategies R₁ and R₂ will
be chosen with probabilities 1 − η₁(L₁) and 1 − η₂(L₂) respectively, and the strategies
L₁ and L₂ will be chosen with probabilities η₁(L₁) and η₂(L₂). Thus the payoff
K₁^η(R₁, R₂) in (Γ, η) will be equal to

K₁^η(R₁, R₂) = η₁(L₁) η₂(L₂).

In the situation (L₁, R₂) the strategies L₁ and R₂ will be chosen with probabilities
1 − η₁(R₁) and 1 − η₂(L₂), and

K₁^η(L₁, R₂) = (1 − η₁(R₁)) η₂(L₂).

Since η is small we get

K₁^η(L₁, R₂) > K₁^η(R₁, R₂).

Then we see that in the perturbed game (R₁, R₂) is not an equilibrium; from this it
follows that (R₁, R₂) is not a perfect equilibrium in the original game. It is easily seen
that (L₁, L₂) is a perfect equilibrium.
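The computation above is easy to reproduce numerically. In the following Python sketch the payoff matrix of Player 1 is the one of the example, while the minimum probabilities η are assumed values; the sketch confirms that L₁ does better than R₁ against R₂ in the perturbed game.

A = {('L1','L2'): 1.0, ('L1','R2'): 0.0, ('R1','L2'): 0.0, ('R1','R2'): 0.0}
eta1 = {'L1': 0.01, 'R1': 0.01}   # assumed mistake probabilities of Player 1
eta2 = {'L2': 0.02, 'R2': 0.02}   # assumed mistake probabilities of Player 2

def dist(intended, eta):
    # The intended strategy gets probability 1 - eta(other); the "mistake"
    # strategy gets its minimum probability eta(other).
    other = {'L1': 'R1', 'R1': 'L1', 'L2': 'R2', 'R2': 'L2'}[intended]
    return {intended: 1.0 - eta[other], other: eta[other]}

def K1(s1, s2):
    d1, d2 = dist(s1, eta1), dist(s2, eta2)
    return sum(d1[a] * d2[b] * A[(a, b)] for a in d1 for b in d2)

# (R1, R2) is not an equilibrium of the perturbed game: L1 does better against R2,
# since (1 - eta1(R1)) * eta2(L2) > eta1(L1) * eta2(L2).
print(K1('L1', 'R2'), '>', K1('R1', 'R2'))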
Consider now the game with matrix

          L₂          R₂
L₁    (1, 1)      (10, 0)
R₁    (0, 10)    (10, 10)

In this game a perfect equilibrium (L₁, L₂) is payoff dominated by a non-perfect one.
The game has two different equilibria, (L₁, L₂) and (R₁, R₂).
Consider the perturbed game (Γ, η). Let us show that (L₁, L₂) is an equilibrium in
(Γ, η):

K₁^η(L₁, L₂) = (1 − η₁(R₁))(1 − η₂(R₂)) + 10(1 − η₁(R₁))η₂(R₂) + 10η₁(R₁)η₂(R₂),

K₁^η(R₁, L₂) = 10(1 − η₁(L₁))η₂(R₂) + η₁(L₁)(1 − η₂(R₂)) + 10η₁(L₁)η₂(R₂).

For η₁, η₂ small we have

K₁^η(L₁, L₂) > K₁^η(R₁, L₂).

In a similar way we can show that

K₂^η(L₁, L₂) > K₂^η(L₁, R₂).

Consider now (R₁, R₂) in (Γ, η):

K₁^η(R₁, R₂) = 10(1 − η₁(L₁))(1 − η₂(L₂)) + 10η₁(L₁)(1 − η₂(L₂)) + η₁(L₁)η₂(L₂)
            = 10(1 − η₂(L₂)) + η₁(L₁)η₂(L₂),

K₁^η(L₁, R₂) = 10(1 − η₁(R₁))(1 − η₂(L₂)) + 10η₁(R₁)(1 − η₂(L₂)) + (1 − η₁(R₁))η₂(L₂)
            = 10(1 − η₂(L₂)) + (1 − η₁(R₁))η₂(L₂).

For small η, K₁^η(L₁, R₂) > K₁^η(R₁, R₂). Thus (R₁, R₂) is not an equilibrium in
(Γ, η) and it cannot be a perfect equilibrium in Γ.

It can be seen that (L₁, L₂) is an equilibrium in (Γ, η), and the only perfect equilibrium
in Γ, but this equilibrium is payoff dominated by (R₁, R₂). We see that the perfectness
refinement eliminates equilibria with attractive payoffs. At the same time the
perfectness concept does not eliminate all intuitively unreasonable equilibria,
as is seen from the example of Myerson (1978):

          L₂           R₂          A₂
L₁    (1, 1)       (0, 0)     (−1, −2)
R₁    (0, 0)       (0, 0)     (0, −2)
A₁    (−2, −1)    (−2, 0)    (−2, −2)

It can be seen that the equilibrium (R₁, R₂) in this game is also perfect. Namely, if
the players have agreed to play (R₁, R₂) and if each player expects that the mistake
A will occur with a larger probability than the mistake L, then it is optimal for each
player to play R. Hence adding strictly dominated strategies may change the set of
perfect equilibria.
3.6.6. There is another refinement of the equilibrium concept, introduced by Myerson
(1978), which excludes some "unreasonable" perfect equilibria like (R₁, R₂) in the last
example.
This is the so-called proper equilibrium. The basic idea underlying the properness
concept is that a player, when making mistakes, will try much harder to prevent the
more costly mistakes than the less costly ones, i.e. that there is some
rationality in the mechanism of making mistakes. As a result, a more costly mistake
will occur with a probability which is of smaller order than the probability of a less
costly one.
3.6.7. Definition. Let Γ = (N, X̄₁, ..., X̄ₙ, K₁, ..., Kₙ) be an n-person normal
form game in mixed strategies. Let ε > 0 and μ^ε ∈ ∏ᵢ₌₁ⁿ X̄ᵢ. We say that the strategy
profile μ^ε is an ε-proper equilibrium of Γ if μ^ε is completely mixed and satisfies

if Kᵢ(μ^ε||xₖ) < Kᵢ(μ^ε||xₗ), then μᵢ^ε(xₖ) ≤ ε μᵢ^ε(xₗ), for all i, k, l.

A profile μ̄ ∈ ∏ᵢ₌₁ⁿ X̄ᵢ is a proper equilibrium of Γ if μ̄ is a limit point of a sequence
μ^ε (ε → 0), where μ^ε is an ε-proper equilibrium of Γ.
The following theorem holds.
Theorem [Myerson (1978)]. Every normal form game possesses at least one
proper equilibrium.
3.6.8. When introducing the perfectness and properness concepts we considered
refinements of the Nash equilibrium which are based on the idea that a reasonable
equilibrium should be stable against slight perturbations of the equilibrium strategies.
There are refinements based on the idea that a reasonable equilibrium should
be stable against perturbations of the payoffs of the game. But we do not cover all
possible refinements. We refer the readers to the book of Eric van Damme
(1991) for a complete investigation of the problem.

3.7 Properties of optimal solutions


3.7.1. We shall now present some equilibrium properties which may be helpful
in finding a solution of a noncooperative two-person game.
Theorem. In order for a mixed strategy situation (μ*, ν*) in the game Γ =
(X₁, X₂, H₁, H₂) to be an equilibrium, it is necessary and sufficient that for all the
players' pure strategies x ∈ X₁ and y ∈ X₂ the following inequalities be satisfied:

K₁(x, ν*) ≤ K₁(μ*, ν*),    (3.7.1)

K₂(μ*, y) ≤ K₂(μ*, ν*).    (3.7.2)

Proof. The necessity is evident, since every pure strategy is a special case of a
mixed strategy, and hence inequalities (3.7.1), (3.7.2) must be satisfied. To prove the
sufficiency, we pass to the mixed strategies of Players 1 and 2, respectively,
in inequalities (3.7.1), (3.7.2).
This theorem (as in the case of zero-sum games) shows that, to prove that a
situation forms an equilibrium in mixed strategies, it suffices to verify inequalities
(3.7.1), (3.7.2) for the opponent's pure strategies. For the bimatrix (m × n) game Γ(A, B)
these inequalities become

K₁(i, y*) = aᵢy* ≤ x*Ay* = K₁(x*, y*),    (3.7.3)

K₂(x*, j) = x*bʲ ≤ x*By* = K₂(x*, y*),    (3.7.4)

where aᵢ (bʲ) are the rows (columns) of the matrix A (B), i = 1, ..., m, j = 1, ..., n.
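Inequalities (3.7.3), (3.7.4) give a direct computational test of equilibrium. The Python sketch below checks a candidate mixed situation against all pure deviations; the matrices are an assumed "battle of the sexes"-type pair, chosen to be consistent with the values quoted in the example of 3.7.3 below.

import numpy as np

def is_equilibrium(A, B, x, y, tol=1e-9):
    # Check (3.7.3) and (3.7.4): no pure deviation improves on the mixed payoffs.
    v1, v2 = x @ A @ y, x @ B @ y
    return (A @ y <= v1 + tol).all() and (x @ B <= v2 + tol).all()

A = np.array([[4.0, 0.0], [0.0, 1.0]])   # assumed payoffs of Player 1
B = np.array([[1.0, 0.0], [0.0, 4.0]])   # assumed payoffs of Player 2
x = np.array([0.8, 0.2])                  # candidate equilibrium strategies
y = np.array([0.2, 0.8])
print(is_equilibrium(A, B, x, y))         # True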
3.7.2. Recall that, for matrix games, each essential pure strategy equalizes any
optimal strategy of the opponent (see 1.7.6). A similar result is also true for bimatrix
games.
Theorem. Let Γ(A, B) be a bimatrix (m × n) game and let (x, y) ∈ Z(Γ) be a
Nash equilibrium in mixed strategies. Then the equations

K₁(i, y) = K₁(x, y),    (3.7.5)

K₂(x, j) = K₂(x, y)    (3.7.6)

hold for all i ∈ Mₓ and j ∈ Nᵧ, where Mₓ (Nᵧ) is the spectrum of the mixed strategy
x (y).
Proof. By Theorem 3.7.1, we have

K₁(i, y) ≤ K₁(x, y)    (3.7.7)

for all i ∈ Mₓ. Suppose that at least one strict inequality in (3.7.7) is satisfied, that
is,

K₁(i₀, y) < K₁(x, y),    (3.7.8)

where i₀ ∈ Mₓ. Denote by ξᵢ the components of the vector x = (ξ₁, ..., ξₘ). Then
ξ_{i₀} > 0 and

K₁(x, y) = Σ_{i∈Mₓ} ξᵢ K₁(i, y) < K₁(x, y).

The contradiction proves the validity of (3.7.5). Equations (3.7.6) can be proved in
the same way.
This theorem provides a means of finding equilibrium strategies of the players in the
game Γ(A, B). Indeed, suppose we are looking for an equilibrium (x, y) with the
strategy spectra Mₓ, Nᵧ given. The optimal strategies must then satisfy the
system of linear equations

aᵢy = v₁,  xbʲ = v₂,    (3.7.9)

where i ∈ Mₓ, j ∈ Nᵧ, and v₁, v₂ are some numbers. If, however, the equilibrium (x, y) is
completely mixed, then the system (3.7.9) becomes

Ay = v₁u,  xB = v₂w,    (3.7.10)

where u = (1, ..., 1) and w = (1, ..., 1) are vectors of suitable dimensions composed
of unit elements, and the numbers v₁ = xAy, v₂ = xBy are the players' payoffs in the
situation (x, y).
3.7.3. Theorem. Let Γ(A, B) be a bimatrix (m × m) game, where A, B are
nonsingular matrices. If the game Γ has a completely mixed equilibrium (x, y), then it is
unique and is defined by the formulas

x = v₂uB⁻¹,    (3.7.11)

y = v₁A⁻¹u,    (3.7.12)

where

v₁ = 1/(uA⁻¹u),  v₂ = 1/(uB⁻¹u).    (3.7.13)

Conversely, if x > 0, y > 0 hold for the vectors x, y ∈ Rᵐ defined by (3.7.11)-
(3.7.13), then the pair (x, y) forms an equilibrium in mixed strategies in the game
Γ(A, B) with the equilibrium payoff vector (v₁, v₂).
Proof. If (x, y) is a completely mixed equilibrium, then x and y necessarily satisfy
system (3.7.10). Multiplying the first of the equations (3.7.10) by A⁻¹ and the second
by B⁻¹, we obtain (3.7.11), (3.7.12). On the other hand, since xu = 1 and yu = 1,
we find the values (3.7.13) for v₁ and v₂. The uniqueness of the completely mixed
situation (x, y) follows from the uniqueness of the solution of system (3.7.10) under the
assumptions of the theorem.
We shall now show that the converse is also true. By the construction of the vectors
x, y in terms of (3.7.11)-(3.7.13), we have xu = yu = 1. From this, and from the
conditions x > 0, y > 0, it follows that (x, y) is a situation in mixed strategies in the
game Γ.
By Theorem 3.7.1, for the situation (x, y) to be an equilibrium in mixed strategies
in the game Γ(A, B), it suffices to satisfy the conditions

aᵢy = K₁(i, y) ≤ xAy,  i = 1, ..., m,

xbʲ = K₂(x, j) ≤ xBy,  j = 1, ..., m,

or

Ay ≤ (xAy)u,  xB ≤ (xBy)u.

Let us check the validity of these relations for x = uB⁻¹/(uB⁻¹u) and y = A⁻¹u/(uA⁻¹u).
We have

Ay = AA⁻¹u/(uA⁻¹u) = u/(uA⁻¹u) = (xAy)u,

xB = uB⁻¹B/(uB⁻¹u) = u/(uB⁻¹u) = (xBy)u,

since xAy = uB⁻¹AA⁻¹u/((uB⁻¹u)(uA⁻¹u)) = 1/(uA⁻¹u) and
xBy = uB⁻¹BA⁻¹u/((uB⁻¹u)(uA⁻¹u)) = 1/(uB⁻¹u). This proves the statement.
We shall now demonstrate an application of the theorem with the example of the
"battle of the sexes" game from 3.1.4. Consider the mixed extension of the game. The
set of points representing the payoff vectors in mixed strategies can be represented
graphically (Fig. 3.2, Exercise 6).

Figure 3.2

It can be easily seen that the game satisfies the conditions of the theorem; therefore,
it has a unique completely mixed equilibrium (x, y) which can be computed by the
formulas (3.7.11)-(3.7.13): x = (4/5, 1/5), y = (1/5, 4/5), (v₁, v₂) = (4/5, 4/5).
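Formulas (3.7.11)-(3.7.13) translate directly into code. The sketch below, assuming the same illustrative "battle of the sexes" matrices as above, reproduces x = (4/5, 1/5), y = (1/5, 4/5), (v₁, v₂) = (4/5, 4/5).

import numpy as np

def completely_mixed_equilibrium(A, B):
    u = np.ones(A.shape[0])
    v1 = 1.0 / (u @ np.linalg.inv(A) @ u)   # v1 = 1/(u A^{-1} u), formula (3.7.13)
    v2 = 1.0 / (u @ np.linalg.inv(B) @ u)   # v2 = 1/(u B^{-1} u)
    x = v2 * (u @ np.linalg.inv(B))          # x = v2 u B^{-1}, formula (3.7.11)
    y = v1 * (np.linalg.inv(A) @ u)          # y = v1 A^{-1} u, formula (3.7.12)
    return x, y, v1, v2

A = np.array([[4.0, 0.0], [0.0, 1.0]])
B = np.array([[1.0, 0.0], [0.0, 4.0]])
x, y, v1, v2 = completely_mixed_equilibrium(A, B)
print(x, y, v1, v2)   # [0.8 0.2] [0.2 0.8] 0.8 0.8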
3.7.4. We shall now consider the properties of the various optimality principles. Note
that the definitions given in Sec. 3.2 of Nash equilibria and Pareto optimal situations
apply (in particular) to an arbitrary noncooperative game; therefore, they also hold
for the mixed extension Γ̄. For this reason, the theorem of competition for
leadership (see 3.2.2) holds for the two-person game:

Z(Γ̄) = Z̄₁ ∪ Z̄₂,

where Z(Γ̄) is the set of Nash equilibria, and Z̄₁ and Z̄₂ are the sets of the best responses
given by Players 1 and 2, respectively, in the game Γ̄.
Things become more complicated where the Nash equilibria and Pareto optimal
situations are concerned. The examples given in Sec. 3.2 suggest the possibility of
cases where a situation is a Nash equilibrium but not Pareto optimal, and vice
versa. However, the same situation can be optimal in both senses (see 3.2.4).
Example 12 in 3.3.3 shows that an additional equilibrium arising in the mixed
extension of the game Γ need not be Pareto optimal in the mixed extension of Γ. This
appears to be a fairly common property of bimatrix games.
Theorem. Let Γ(A, B) be a bimatrix (m × n) game. Then the following assertion
is true for almost all (m × n) games (except for no more than a countable set of games):
Nash equilibrium situations in mixed strategies which are not equilibria in the
original game are not Pareto optimal in the mixed extension.
For the proof of this theorem, see Moulin (1981).
3.7.5. In conclusion of this section, we examine the solution of
bimatrix games with a small number of strategies, which seems to be instructive in
many respects.
Example 13. (Bimatrix (2 × 2) games.) [Moulin (1981)]. Consider the game
Γ(A, B), in which each player has two pure strategies. Let

                  τ₁               τ₂
(A, B) =  δ₁   (α₁₁, β₁₁)   (α₁₂, β₁₂)
          δ₂   (α₂₁, β₂₁)   (α₂₂, β₂₂)

Here the indices δ₁, δ₂ and τ₁, τ₂ denote pure strategies of Players 1 and 2, respectively.
For simplicity, assume that the numbers α₁₁, α₁₂, α₂₁, α₂₂ (β₁₁, β₁₂, β₂₁, β₂₂) are all
different.
Case 1. In the original game Γ, at least one player, say Player 1, has a strictly
dominant strategy, say δ₁ (see Sec. 1.8). Then the game Γ and its mixed extension Γ̄
have a unique Nash equilibrium. In fact, the inequalities α₁₁ > α₂₁, α₁₂ > α₂₂ cause the
pure strategy δ₁ in the game Γ̄ to dominate strictly all the other mixed strategies of
Player 1. Therefore, an equilibrium is represented by the pair (δ₁, τ₁) if β₁₁ > β₁₂, or
by the pair (δ₁, τ₂) if β₁₁ < β₁₂.
Case 2. The game Γ does not have a Nash equilibrium in pure strategies. Here
two mutually exclusive cases a) and b) are possible:

a) α₂₁ < α₁₁, α₁₂ < α₂₂, β₁₁ < β₁₂, β₂₂ < β₂₁,

b) α₁₁ < α₂₁, α₂₂ < α₁₂, β₁₂ < β₁₁, β₂₁ < β₂₂,



where det A ≠ 0, det B ≠ 0, and hence the conditions of Theorem 3.7.3 are satisfied.
The game, therefore, has the equilibrium (x*, y*), where

x* = ( (β₂₂ − β₂₁)/(β₁₁ + β₂₂ − β₁₂ − β₂₁), (β₁₁ − β₁₂)/(β₁₁ + β₂₂ − β₁₂ − β₂₁) ),    (3.7.14)

y* = ( (α₂₂ − α₁₂)/(α₁₁ + α₂₂ − α₂₁ − α₁₂), (α₁₁ − α₂₁)/(α₁₁ + α₂₂ − α₂₁ − α₁₂) ),    (3.7.15)

while the corresponding equilibrium payoffs v₁ and v₂ are determined by

v₁ = (α₁₁α₂₂ − α₁₂α₂₁)/(α₁₁ + α₂₂ − α₂₁ − α₁₂),  v₂ = (β₁₁β₂₂ − β₁₂β₂₁)/(β₁₁ + β₂₂ − β₁₂ − β₂₁).

Case 3. The game Γ has two Nash equilibria in pure strategies. This occurs when one
of the following conditions is satisfied:

a) α₂₁ < α₁₁, α₁₂ < α₂₂, β₁₂ < β₁₁, β₂₁ < β₂₂,

b) α₁₁ < α₂₁, α₂₂ < α₁₂, β₁₁ < β₁₂, β₂₂ < β₂₁.

In case a), the situations (δ₁, τ₁), (δ₂, τ₂) are equilibria, whereas in case
b) the situations (δ₁, τ₂), (δ₂, τ₁) form equilibria. The mixed extension, however,
has one more, completely mixed, equilibrium (x*, y*) determined by (3.7.14), (3.7.15).
The above cases provide an exhaustive examination of the (2 × 2) games with
matrices having pairwise different elements.
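Formulas (3.7.14), (3.7.15) can be packaged as a small solver for the completely mixed equilibrium of a 2 × 2 bimatrix game; the matrices in the usage example are illustrative.

def mixed_2x2(a, b):
    # a, b: 2x2 payoff matrices [[a11, a12], [a21, a22]] of Players 1 and 2.
    # Denominators are nonzero under the assumptions of cases 2 and 3.
    db = b[0][0] + b[1][1] - b[0][1] - b[1][0]
    da = a[0][0] + a[1][1] - a[1][0] - a[0][1]
    x = ((b[1][1] - b[1][0]) / db, (b[0][0] - b[0][1]) / db)   # formula (3.7.14)
    y = ((a[1][1] - a[0][1]) / da, (a[0][0] - a[1][0]) / da)   # formula (3.7.15)
    v1 = (a[0][0] * a[1][1] - a[0][1] * a[1][0]) / da
    v2 = (b[0][0] * b[1][1] - b[0][1] * b[1][0]) / db
    return x, y, v1, v2

# "Battle of the sexes" data: x* = (4/5, 1/5), y* = (1/5, 4/5), payoffs (4/5, 4/5).
print(mixed_2x2([[4, 0], [0, 1]], [[1, 0], [0, 4]]))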

3.8 Symmetric bimatrix games and evolutionary stable strategies
3.8.1. Let Γ = (X, Y, A, B) be a bimatrix game. Γ is said to be symmetric if the sets
X and Y coincide, X = Y, and αᵢⱼ = βⱼᵢ for all i, j.
This definition of symmetry is not invariant with respect to permutations of the
strategy sets. Suppose |X| = |Y| = m; the pure strategies will be denoted by i or
j. In evolutionary game theory the payoff matrix A of Player 1 is also called the
fitness matrix of the game, and in the symmetric case the matrix A determines the game
completely; thus we shall identify the game Γ with A and speak of the game A. Mixed
strategies x, y are defined in the usual way, and the mathematical expectation of the
payoff to Player 1 in the situation (x, y) is equal to

E(x, y) = xAy = Σᵢ₌₁ᵐ Σⱼ₌₁ᵐ αᵢⱼ ξᵢ ηⱼ.

For any mixed strategy p = {πᵢ} define C(p) as the carrier of p and B(p) as the set of
pure best replies against p in the game A:

C(p) = {i : πᵢ > 0},  B(p) = {i : E(i, p) = maxⱼ E(j, p)}.

3.8.2. Consider now the example of the Hawk-Dove game of Maynard Smith and Price
(1973), which leads us to the notion of the evolutionary stable strategy (ESS).
Example 14. The Hawk-Dove game is the 2 × 2 symmetric bimatrix game with the
following matrices:

              H            D                        H           D
A =  H   (V − C)/2        V           B =  H   (V − C)/2       0        (3.8.1)
     D       0           V/2               D       V          V/2

Suppose two animals are contesting a resource (such as territory in a favourable
place) of value V, i.e. by obtaining the resource an animal increases the expected
number of offspring (fitness) by V. For simplicity assume that only two pure strategies,
hawk and dove, are possible. An animal adopting the hawk strategy always fights
as hard as it can, retreating only when seriously injured. A dove merely threatens in a
conventional way and quietly retreats when seriously challenged, without ever being
wounded. Two doves can share the resource peacefully, but two hawks go on fighting
until one is wounded and forced to retreat. It is assumed that a wound reduces the
fitness by an amount C. If we furthermore assume that there are no differences in size
or age that influence the probability of winning, then the conflict can be represented
by the symmetric bimatrix game (3.8.1).
If V > C, the Hawk-Dove game has a unique Nash equilibrium, (H, H); hence it
is always reasonable to fight. In a population of hawks and doves the hawks have
greater reproductive success; the doves will gradually die out, and in the long run only
hawks will exist.
If V < C, then (H, H) is not an equilibrium. Consequently, a monomorphic
population of hawks is not stable. In such a population a mutant dove has greater
reproductive success and, therefore, doves will spread through the population. Similarly,
a population of doves can also be successfully invaded by hawks, because (D, D)
is not a Nash equilibrium.
If V < C, the game has the unique symmetric Nash equilibrium in mixed strategies

x = (ζ, 1 − ζ),  y = (ζ, 1 − ζ),

where ζ = V/C.
There are also two asymmetric equilibria, (H, D) and (D, H).
3.8.3. Assume now that a monomorphic population is playing the mixed strategy
p in the game with fitness matrix A, and suppose that a mutant playing q arises.
Then we may suppose that the population will be in a perturbed state in which a
small fraction ε of the individuals is playing q. The population will return to its
original position if the mutant is selected against, i.e. if the fitness of a q-individual is
smaller than that of an individual playing p. Suppose that (p, p) is a symmetric Nash
equilibrium in a symmetric bimatrix game, and suppose that the second player in the
game, instead of playing the strategy p, decides to play a mixture of the two mixed
strategies p and q with the probabilities 1 − ε, ε, where ε is small enough. Then in
general, for the new mixed strategy y = (1 − ε)p + εq, the set of Player 1's best replies
against y will not necessarily contain the original p. It may also
happen that q is a better reply against y than p. But if for any q there exists an
ε₀ > 0 such that for all 0 < ε < ε₀ the strategy p is a better reply against
y = (1 − ε)p + εq than q, then the use
of p by Player 1 is in some sense stable against small perturbations of the opponent's
strategy, i.e. for any q ≠ p there exists ε₀ > 0 such that for 0 < ε < ε₀

qA((1 − ε)p + εq) < pA((1 − ε)p + εq).    (3.8.2)

If (p, p) is a strict equilibrium (pAp > qAp for all q ≠ p), then (3.8.2) always holds.
There is also an evolutionary interpretation of (3.8.2), based on the example of the
Hawk-Dove game.
If (3.8.2) holds, we have

(1 − ε)qAp + εqAq < (1 − ε)pAp + εpAq.    (3.8.3)

From (3.8.3) we see that qAp > pAp is impossible, because in this case (3.8.2) would
not hold for small ε > 0. Then from (3.8.3) it follows that either

qAp < pAp,    (3.8.4)

or

qAp = pAp and qAq < pAq.    (3.8.5)

Conversely, from (3.8.4) or (3.8.5) the condition (3.8.3), and hence (3.8.2), follows for all
sufficiently small ε > 0 (this ε₀ depends upon q).
3.8.4. Definition. A mixed strategy p is an ESS if (p, p) is a Nash equilibrium
and the following stability condition is satisfied:

if q ≠ p and qAp = pAp, then qAq < pAq,

i.e. (3.8.5) holds for every alternative best reply q ≠ p.
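The definition can be tested numerically on the Hawk-Dove game. The sketch below assumes concrete values V = 2, C = 4 (so V < C and ζ = V/C = 1/2) and checks both the equilibrium property and the stability condition on a grid of mutant strategies q.

import numpy as np

V, C = 2.0, 4.0
A = np.array([[(V - C) / 2, V], [0.0, V / 2]])   # fitness matrix (3.8.1), order (H, D)
p = np.array([V / C, 1 - V / C])                  # candidate ESS

def is_ess(A, p, grid=1000, tol=1e-9):
    # Test mutants q = (t, 1-t): p must be a best reply to p, and whenever
    # qAp = pAp the stability condition qAq < pAq must hold.
    pAp = p @ A @ p
    for t in np.linspace(0.0, 1.0, grid + 1):
        q = np.array([t, 1.0 - t])
        if np.allclose(q, p, atol=1e-12):
            continue
        if q @ A @ p > pAp + tol:
            return False                  # (p, p) is not an equilibrium
        if abs(q @ A @ p - pAp) <= tol and q @ A @ q >= p @ A @ q:
            return False                  # stability condition (3.8.5) violated
    return True

print(is_ess(A, p))   # True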

3.8.5. Consider the correspondence

p → {q ∈ Y : C(q) ⊆ B(p)}.

This correspondence satisfies the conditions of the Kakutani fixed point theorem, and
hence there exists a point p* for which

p* ∈ {q ∈ Y : C(q) ⊆ B(p*)},

and thus

C(p*) ⊆ B(p*).    (3.8.6)

From (3.8.6) it follows that

p*Ap* ≥ qAp*

for all q ∈ Y, and (p*, p*) is a symmetric Nash equilibrium. We have proved the
following theorem.
Theorem [Nash (1951)]. Every symmetric bimatrix game has a symmetric
Nash equilibrium.
3.8.6. We have already seen that if (p, p) is a strict Nash equilibrium, then p is
an ESS (this also follows directly from the definition of the ESS, since in this case
there is no q ≠ p with qAp = pAp).
Not every symmetric bimatrix game possesses an ESS. For example, if in A all αᵢⱼ = a
and do not depend upon i, j, then it is impossible to satisfy the stability condition (3.8.5).
3.8.7. Theorem. If A is a 2 × 2 matrix with α₁₁ ≠ α₂₁ and α₁₂ ≠ α₂₂, then A
has an ESS. If α₁₁ > α₂₁ and α₂₂ > α₁₂, then A has two strict equilibria (1, 1), (2, 2),
and they are ESS. If α₁₁ < α₂₁ and α₂₂ < α₁₂, then A has a unique symmetric equilibrium
(p, p), which is completely mixed (C(p) = B(p) = {1, 2}), and p is an ESS.
Proof. Consider the completely mixed case. Since every pure strategy is a best reply
against p, we have qAp = pAp for all q, and hence qAq − pAq = (q − p)A(q − p). If
q = (η₁, η₂) and p = (ξ₁, ξ₂), then

(q − p)A(q − p) = (η₁ − ξ₁)²(α₁₁ − α₂₁ − α₁₂ + α₂₂) < 0.

Hence the stability condition (3.8.5) is satisfied, and p is an ESS.
3.8.8. Consider the game where the matrix A is the 5 × 5 circulant

        b   a   a  −a  −a
       −a   b   a   a  −a
A =    −a  −a   b   a   a        (3.8.8)
        a  −a  −a   b   a
        a   a  −a  −a   b

If 0 < b < a, this game does not have an ESS. The game has a unique symmetric
equilibrium p = (1/5, 1/5, 1/5, 1/5, 1/5); here pAp = b/5, and every strategy is a best
reply against p, while for any i, eᵢAeᵢ = αᵢᵢ = b > b/5 = pAp = pAeᵢ (where eᵢ =
(0, ..., 0, 1, 0, ..., 0)), so the stability condition (3.8.5) is violated.
Thus for games with more than two pure strategies the theorem does not hold.
3.8.9. It is interesting that the number of ESS's in a game is always finite
(although it may be equal to zero).
If (p, p) and (q, q) are Nash equilibria of A with q ≠ p and C(q) ⊆ B(p), then p
cannot be an ESS, since q is a best reply against both p and q.
Theorem. If p is an ESS of A and (q, q) is a symmetric Nash equilibrium of A
with C(q) ⊆ B(p), then p = q.
3.8.10. Let {(pⁿ, pⁿ)} be a sequence of symmetric Nash equilibria of A such
that limₙ→∞ pⁿ = p. Then from the definition of the limit we get that there exists
an N such that for all n > N

C(p) ⊆ C(pⁿ) ⊆ B(pⁿ) ⊆ B(p).

From the previous theorem we have pⁿ = p for n > N. It follows that
every ESS is isolated within the set of symmetric equilibrium strategies. From the
compactness of the set of situations in mixed strategies we conclude that if there were
infinitely many ESS's, they would have a cluster point; but the previous discussion
shows that this is impossible. Thus the following theorem holds.
Theorem [Haigh (1975)]. The number of ESS's is finite (but possibly zero).

3.9 Equilibrium in joint mixed strategies


3.9.1. We shall continue the discussion of two-person games. As already noted in Sec. 3.2,
even when an equilibrium is not dominated (is Pareto optimal), we may have cases
where one equilibrium is advantageous to Player 1 while the other is advantageous to
Player 2. This presents certain problems in finding a mutually acceptable solution of the
nonantagonistic conflict which arises when a noncooperative game is formalized.
For this reason, we have to examine the nonantagonistic conflict in a formalization
which allows the players to make joint decisions. This approach can be illustrated
with the example of the "battle of the sexes" game (see 3.1.4).
Example 15. Let us consider the mixed extension of the "battle of the sexes" game.
The set of points corresponding to the payoff vectors in mixed strategies in the game
can be represented graphically (see Fig. 3.2). Figure 3.2 shows two Nash
equilibria in pure strategies with the payoff vectors (1, 4), (4, 1) and one completely
mixed equilibrium with the payoff vector (4/5, 4/5) (this may be found by employing
Theorem 3.7.3), which is less preferable to the players than either equilibrium in pure
strategies. Thus here the following situations form equilibria: (α₁, β₁), (α₂, β₂),
(x*, y*), where x* = (4/5, 1/5), y* = (1/5, 4/5); the situations (α₁, β₁), (α₂, β₂)
are also Pareto optimal.

Figure 3.3

If the game is repeated, then it may be wise for the players to make their choice
jointly, i.e. to choose with probability 1/2 the situation (α₁, β₁) or (α₂, β₂). Then
the expected payoff to the players is, on the average, (5/2, 5/2). This point, however,
does not lie in the set of payoff vectors corresponding to possible situations in the
noncooperative game (Fig. 3.2), i.e. it cannot be realized if the players choose their mixed
strategies independently.
A joint mixed strategy of the players is a probability distribution over the set of all
possible pairs (i, j) (situations in pure strategies) which is not necessarily generated
by the players' independent random choices of pure strategies. Such strategies can
be realized with the help of a mediator before the game starts.
Denote by M a joint mixed strategy in the game Γ(A, B). If this strategy is played
by Players 1 and 2, their expected payoffs K₁(M), K₂(M) are respectively

K₁(M) = Σᵢ Σⱼ αᵢⱼ μᵢⱼ,  K₂(M) = Σᵢ Σⱼ βᵢⱼ μᵢⱼ,

where A = {αᵢⱼ}, B = {βᵢⱼ} are the players' payoff matrices, M = {μᵢⱼ}, and
uMw = 1, M ≥ 0, u = (1, ..., 1) ∈ Rᵐ, w = (1, ..., 1) ∈ Rⁿ. Geometrically, the set
of vector payoffs corresponding to the joint mixed strategies is the convex hull of the
set of possible vector payoffs in pure strategies. For the game in Example 15 it has
the form shown in Fig. 3.3.
Note that the joint mixed strategy

M* = ( 1/2   0
        0   1/2 )

is Pareto optimal and corresponds to the payoff vector (5/2, 5/2). Thus, M* can be
suggested as a solution of the game "battle of the sexes".
Definition. For the bimatrix (m × n) game Γ(A, B), denote by M = {μᵢⱼ} a
joint probability distribution over the pairs (i, j), i = 1, ..., m, j = 1, ..., n. Denote
by μᵢ(j) the conditional probability of realizing strategy j provided strategy i has been
realized. Similarly, denote by νⱼ(i) the conditional probability of realizing strategy i
provided strategy j has been realized. Then

μᵢ(j) = μᵢⱼ / Σⱼ′ μᵢⱼ′ if Σⱼ′ μᵢⱼ′ ≠ 0, and μᵢ(j) = 0 otherwise, j = 1, ..., n;

νⱼ(i) = μᵢⱼ / Σᵢ′ μᵢ′ⱼ if Σᵢ′ μᵢ′ⱼ ≠ 0, and νⱼ(i) = 0 otherwise, i = 1, ..., m.

We say that M* = {μᵢⱼ*} is an equilibrium in joint mixed strategies in the game
Γ(A, B) if the inequalities

Σⱼ₌₁ⁿ αᵢⱼ μᵢ*(j) ≥ Σⱼ₌₁ⁿ αᵢ′ⱼ μᵢ*(j),  Σᵢ₌₁ᵐ βᵢⱼ νⱼ*(i) ≥ Σᵢ₌₁ᵐ βᵢⱼ′ νⱼ*(i)    (3.9.1)

hold for all i, i′ ∈ {1, 2, ..., m} and j, j′ ∈ {1, 2, ..., n}.

3.9.2. The game Γ(A, B) in joint mixed strategies can be interpreted as follows.
Suppose the players have reached an agreement on the joint strategy M* = {μᵢⱼ*}, and a
chance device has yielded the pair (i, j), i.e. Player 1 (2) has received the strategy
number i (j). Note that each player knows only his own course of action. In general,
he may not agree with the realization i (j, respectively) of the joint strategy and
may choose another strategy i′ (j′). If M* is an equilibrium, then it is disadvantageous for each
player to deviate from the proposed realization i (j, respectively), which follows from
(3.9.1), where the left-hand sides of the inequalities coincide with the expected payoff
to Player 1 (2) provided he agrees with the realization i (j).
Suppose the strategy i of Player 1 is such that μᵢⱼ* = 0 for all j = 1, 2, ..., n.
Then the first of the inequalities (3.9.1) is trivially satisfied. Similarly, if μᵢⱼ* = 0 for
all i = 1, ..., m, then the second inequality in (3.9.1) is satisfied. We substitute the
expressions for μᵢ(j) and νⱼ(i) in terms of μᵢⱼ into (3.9.1). Then it follows that the
necessary and sufficient condition for the situation M* = {μᵢⱼ*} to be an equilibrium is
that the inequalities

Σⱼ₌₁ⁿ αᵢⱼ μᵢⱼ* ≥ Σⱼ₌₁ⁿ αᵢ′ⱼ μᵢⱼ*,  Σᵢ₌₁ᵐ βᵢⱼ μᵢⱼ* ≥ Σᵢ₌₁ᵐ βᵢⱼ′ μᵢⱼ*,  μᵢⱼ* ≥ 0    (3.9.2)

hold for all i, i′ ∈ {1, 2, ..., m} and j, j′ ∈ {1, 2, ..., n}.


Denote by ZC(Γ) the set of equilibria in joint mixed strategies.
Theorem. The following assertions are true:

1. The set ZC(Γ) of equilibria in joint mixed strategies in the bimatrix (m × n)
game Γ(A, B) is a nonempty convex compact set in the space R^{m×n}.

2. If (x, y) is a situation in mixed strategies in the game Γ(A, B), then the joint
mixed strategy situation M = {ξᵢηⱼ} generated by the situation (x, y) is an equilibrium
if and only if (x, y) is a Nash equilibrium in mixed strategies in the game
Γ(A, B).

Proof. Suppose that (x, y), x = (ξ₁, ..., ξₘ), y = (η₁, ..., ηₙ), is a situation in
mixed strategies in the game Γ(A, B), while M = {μᵢⱼ} is the corresponding situation
in joint strategies, i.e. μᵢⱼ = ξᵢηⱼ, i = 1, ..., m, j = 1, ..., n. The necessary and
sufficient condition for M to be an equilibrium is provided by the system of inequalities
(3.9.2), i.e.

ξᵢK₁(i, y) ≥ ξᵢK₁(i′, y),  ηⱼK₂(x, j) ≥ ηⱼK₂(x, j′),    (3.9.3)

where i, i′ ∈ {1, 2, ..., m}, j, j′ ∈ {1, ..., n}. If ξᵢ = 0 (ηⱼ = 0), then the corresponding
inequalities (3.9.3) are trivially satisfied. Therefore, the system of inequalities (3.9.3)
is equivalent to the following:

K₁(i, y) ≥ K₁(i′, y),  K₂(x, j) ≥ K₂(x, j′),    (3.9.4)

i, i′ ∈ {1, ..., m}, j, j′ ∈ {1, ..., n}, where i and j belong to the spectra of the strategies
x and y. Let (x, y) be a Nash equilibrium in mixed strategies in the game Γ(A, B).
Then, by Theorem 3.7.2,

K₁(i, y) = K₁(x, y),  K₂(x, j) = K₂(x, y)

for all i and j from the spectra of the optimal strategies. Therefore, inequalities (3.9.4)
are satisfied and M ∈ ZC(Γ).
Conversely, if (3.9.3) is satisfied, then summing the inequalities (3.9.3) over i and
j, respectively, and applying Theorem 3.7.1, we conclude that the situation (x, y) is a
Nash equilibrium.

The convexity and compactness of the set ZC(Γ) follow from the fact that ZC(Γ)
is the set of solutions of the system of linear inequalities (3.9.2), which is bounded,
whereas its nonemptiness follows from the existence of a Nash equilibrium in mixed
strategies (see 3.4.1). This completes the proof of the theorem.
Note that the joint mixed strategy M* = ( 1/2, 0 ; 0, 1/2 ) is an equilibrium in the
game "battle of the sexes" (see 3.1.4), which may be established by direct
verification of the inequalities (3.9.2).

3.10 The bargaining problem


3.10.1. This section deals with the question of how rational players can come to an
agreement on a joint choice by negotiations. Before stating the problem, we return
to the game "battle of the sexes" once again.
Example 16. Consider the set R corresponding to the possible payoff vectors in joint
mixed strategies for the game "battle of the sexes" (this region is shaded in Fig. 3.4).
Acting together, the players can ensure any payoff vector in the region R.

Figure 3.4

However, this does not mean that they can agree on any outcome of the game.
Thus, the point (4, 1) is preferable to Player 1, whereas the point (1, 4) is preferable to
Player 2. Neither of the two players will agree, as a result of negotiations, to a payoff
smaller than his maximin value, since he can secure this payoff independently
of his partner. The maximin mixed strategies of the players in this game are respectively
x⁰ = (1/5, 4/5) and y⁰ = (4/5, 1/5), while the payoff vector in maximin strategies
(v₁⁰, v₂⁰) is (4/5, 4/5). Therefore, the set S which can be used in negotiations is
bounded by the points a, b, c, d, e (see Fig. 3.4). This set will be called a bargaining
set of the game. Furthermore, acting jointly, the players can always agree to choose
points on the line segment ab, since this is advantageous to both of them (the line
segment ab corresponds to Pareto optimal situations).
3.10.2. The problem of choosing points (v₁, v₂) from S by bargaining will be called a
bargaining problem. This brings us to the following consideration. Let the bargaining
set S and the maximin payoff vector (v₁⁰, v₂⁰) be given for the bimatrix game Γ(A, B).
We need to find a device capable of solving the bargaining problem, i.e. to find a function
φ such that

φ(S, v₁⁰, v₂⁰) = (v̄₁, v̄₂),    (3.10.1)

where (v̄₁, v̄₂) ∈ S is the solution. The point (v₁⁰, v₂⁰) is called a disagreement point.
It appears that, under some reasonable assumptions, it is possible to construct
such a function φ(S, v₁⁰, v₂⁰).
Theorem. Let S be a convex compact set in R², and let (v₁⁰, v₂⁰) be the maximin
payoff vector in the game Γ(A, B). Suppose the set S, the pair (v₁⁰, v₂⁰) and the function φ
satisfy the following conditions:

1. (v̄₁, v̄₂) ≥ (v₁⁰, v₂⁰).

2. (v̄₁, v̄₂) ∈ S.

3. If (v₁, v₂) ∈ S and (v₁, v₂) ≥ (v̄₁, v̄₂), then (v₁, v₂) = (v̄₁, v̄₂).

4. If (v̄₁, v̄₂) ∈ S̄ ⊂ S and (v̄₁, v̄₂) = φ(S, v₁⁰, v₂⁰), then (v̄₁, v̄₂) = φ(S̄, v₁⁰, v₂⁰).

5. Let T be obtained from S by the linear transformation v₁′ = α₁v₁ + β₁, v₂′ = α₂v₂ + β₂;
α₁ > 0, α₂ > 0. If φ(S, v₁⁰, v₂⁰) = (v̄₁, v̄₂), then φ(T, α₁v₁⁰ + β₁, α₂v₂⁰ + β₂) =
(α₁v̄₁ + β₁, α₂v̄₂ + β₂).

6. If for any (v₁, v₂) ∈ S also (v₂, v₁) ∈ S, v₁⁰ = v₂⁰, and φ(S, v₁⁰, v₂⁰) = (v̄₁, v̄₂),
then v̄₁ = v̄₂.

Then there exists a unique function φ satisfying 1-6, and

φ(S, v₁⁰, v₂⁰) = (v̄₁, v̄₂).

The function φ, which maps the bargaining game (S, v₁⁰, v₂⁰) into the payoff vector
(v̄₁, v̄₂) and satisfies conditions 1-6, is called a Nash bargaining scheme
[Owen (1968)], conditions 1-6 are called the Nash axioms, and the vector (v̄₁, v̄₂) is called
a bargaining solution vector. Thus, the bargaining scheme is a realizable optimality
principle in the bargaining game.
Before proving the theorem we discuss its conditions using the game
"battle of the sexes" as an example (see Fig. 3.4). Axioms 1 and 2 imply that the
payoff vector (v̄₁, v̄₂) is contained in the set bounded by the points a, b, c, d, e. Axiom 3
implies that (v̄₁, v̄₂) is Pareto optimal. Axiom 4 shows that the function φ
is independent of irrelevant alternatives: if the solution outcome of a
given problem remains feasible for a new problem obtained from it by contraction,
then it should also be the solution outcome of the new problem. Axiom 5 is the scale
invariance axiom, and axiom 6 says that the two players possess equal rights.
The proof of Theorem 3.10.2 is based on the following auxiliary results.
3.10.3. Lemma. If there are points (v₁, v₂) ∈ S such that v₁ > v₁⁰ and v₂ > v₂⁰,
then there exists a unique point (v̄₁, v̄₂) which maximizes the function

θ(v₁, v₂) = (v₁ − v₁⁰)(v₂ − v₂⁰)

over the subset S₁ ⊂ S, S₁ = {(v₁, v₂) | (v₁, v₂) ∈ S, v₁ ≥ v₁⁰, v₂ ≥ v₂⁰}.

Proof. By hypothesis, S₁ is a nonempty compact set, while θ is a continuous function,
and hence θ achieves its maximum θ̄ on this set. By assumption, θ̄ is positive.
Suppose there are two different points of maximum, (v₁′, v₂′) and (v₁″, v₂″), of the
function θ on S₁. Note that v₁′ ≠ v₁″, for otherwise the form of the function θ would
imply v₂′ = v₂″.
If v₁′ < v₁″, then v₂′ > v₂″. Since the set S₁ is convex, (v̂₁, v̂₂) ∈ S₁, where
v̂₁ = (v₁′ + v₁″)/2, v̂₂ = (v₂′ + v₂″)/2. We have

θ(v̂₁, v̂₂) = (1/4)[(v₁′ − v₁⁰)(v₂′ − v₂⁰) + (v₁″ − v₁⁰)(v₂″ − v₂⁰)]
          + (1/4)[(v₁′ − v₁⁰)(v₂″ − v₂⁰) + (v₁″ − v₁⁰)(v₂′ − v₂⁰)]
          = θ̄/2 + (1/4)[(v₁′ − v₁⁰)(v₂″ − v₂⁰) + (v₁″ − v₁⁰)(v₂′ − v₂⁰)].

Since v₁′ < v₁″ and v₂′ > v₂″, we have (v₁′ − v₁″)(v₂′ − v₂″) < 0, i.e. the bracketed
sum exceeds 2θ̄, so that θ(v̂₁, v̂₂) > θ̄, which is impossible, because θ̄ is the maximum of
the function θ on S₁. Thus, the point (v̄₁, v̄₂) which maximizes the function θ over the
set S₁ is unique.
3.10.4. Lemma. Suppose that S satisfies the conditions of Lemma 3.10.3,
and let (v̄₁, v̄₂) be the point of maximum of the function θ(v₁, v₂). Define

δ(v₁, v₂) = (v̄₂ − v₂⁰)v₁ + (v̄₁ − v₁⁰)v₂.

If (v₁, v₂) ∈ S, then the following inequality holds:

δ(v₁, v₂) ≤ δ(v̄₁, v̄₂).

Proof. Suppose there exists a point (v₁, v₂) ∈ S such that δ(v₁, v₂) > δ(v̄₁, v̄₂).
From the convexity of S we have (v₁′, v₂′) ∈ S, where v₁′ = v̄₁ + ε(v₁ − v̄₁) and
v₂′ = v̄₂ + ε(v₂ − v̄₂), 0 < ε < 1. By linearity, δ(v₁ − v̄₁, v₂ − v̄₂) > 0. We have

θ(v₁′, v₂′) = θ(v̄₁, v̄₂) + ε δ(v₁ − v̄₁, v₂ − v̄₂) + ε²(v₁ − v̄₁)(v₂ − v̄₂).

For a sufficiently small ε > 0 we obtain the inequality θ(v₁′, v₂′) > θ(v̄₁, v̄₂), but this
contradicts the maximality of θ(v̄₁, v̄₂).
3.10.5. We shall now prove Theorem 3.10.2. To do this, we shall show that the point
(v̄₁, v̄₂) which maximizes θ(v₁, v₂) is a solution of the bargaining problem.

Proof. Suppose the conditions of Lemma 3.10.3 are satisfied. Then the point
(v̄₁, v̄₂) maximizing θ(v₁, v₂) is defined. It is easy to verify that (v̄₁, v̄₂) satisfies
conditions 1-4 of Theorem 3.10.2. This point also satisfies condition 5 of the theorem,
because if v₁′ = α₁v₁ + β₁ and v₂′ = α₂v₂ + β₂, then

θ′(v₁′, v₂′) = [v₁′ − (α₁v₁⁰ + β₁)][v₂′ − (α₂v₂⁰ + β₂)] = α₁α₂ θ(v₁, v₂),

and if (v̄₁, v̄₂) maximizes θ(v₁, v₂), then (α₁v̄₁ + β₁, α₂v̄₂ + β₂) maximizes θ′(v₁′, v₂′).
Now suppose the set S is symmetric in the sense of condition 6 and v₁⁰ = v₂⁰. Then
(v̄₂, v̄₁) ∈ S and θ(v̄₁, v̄₂) = θ(v̄₂, v̄₁). Since (v̄₁, v̄₂) is the unique point maximizing
θ(v₁, v₂) over S₁, we get (v̄₁, v̄₂) = (v̄₂, v̄₁), i.e. v̄₁ = v̄₂.
Thus, the point (v̄₁, v̄₂) satisfies conditions 1-6. Let us show that it is the unique
solution of the bargaining problem. Consider the set

R = {(v₁, v₂) | δ(v₁, v₂) ≤ δ(v̄₁, v̄₂)}.    (3.10.2)

By Lemma 3.10.4, the inclusion S ⊂ R holds. Suppose T is obtained from R by the
transformation

v₁′ = (v₁ − v₁⁰)/(v̄₁ − v₁⁰),  v₂′ = (v₂ − v₂⁰)/(v̄₂ − v₂⁰).    (3.10.3)

Expressing v₁ and v₂ from (3.10.3) and substituting into (3.10.2), we obtain

T = {(v₁′, v₂′) | v₁′ + v₂′ ≤ 2}

and v₁⁰′ = v₂⁰′ = 0. Since T is symmetric, it follows from property 6 that a solution
(if any) must lie on the straight line v₁′ = v₂′, and, by condition 3, it must coincide with
the point (1, 1), i.e. (1, 1) = φ(T, 0, 0). Reversing the transformation (3.10.3) and using
property 5, we obtain (v̄₁, v̄₂) = φ(R, v₁⁰, v₂⁰). Since (v̄₁, v̄₂) ∈ S and S ⊂ R, then by
property 4 the pair (v̄₁, v̄₂) is the solution of (S, v₁⁰, v₂⁰).
Now suppose that the conditions of Lemma 3.10.3 are not satisfied, i.e. there are
no points (v₁, v₂) ∈ S for which v₁ > v₁⁰ and v₂ > v₂⁰. Then the following cases are
possible:

a) There are points at which v₁ > v₁⁰ and v₂ = v₂⁰. Then (v̄₁, v̄₂) is taken to be
the point of S which maximizes v₁ under the constraint v₂ = v₂⁰.

b) There are points at which v₁ = v₁⁰ and v₂ > v₂⁰. In this case, (v̄₁, v̄₂) is taken
to be the point of S which maximizes v₂ under the constraint v₁ = v₁⁰.

c) The bargaining set S degenerates into the point (v₁⁰, v₂⁰) of maximin payoffs
(e.g., the case of matrix games). Set v̄₁ = v₁⁰, v̄₂ = v₂⁰.

It can be immediately verified that these solutions satisfy properties 1-6, and
properties 1-3 imply uniqueness. This completes the proof of the theorem.
In the game "battle of the sexes", the Nash scheme yields the bargaining payoff
(v̄₁, v̄₂) = (5/2, 5/2) (see Fig. 3.4).
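Numerically, the Nash bargaining solution is the maximizer of θ(v₁, v₂) over the bargaining set; on the Pareto segment ab this reduces to a one-dimensional search. The sketch below recovers (5/2, 5/2) for the "battle of the sexes" data of Example 16.

import numpy as np

v0 = (0.8, 0.8)                            # disagreement point (4/5, 4/5)
a, b = np.array([1.0, 4.0]), np.array([4.0, 1.0])   # endpoints of segment ab

best, best_theta = None, -1.0
for t in np.linspace(0.0, 1.0, 100001):
    v = (1 - t) * a + t * b
    theta = (v[0] - v0[0]) * (v[1] - v0[1])   # the Nash product theta(v1, v2)
    if theta > best_theta:
        best, best_theta = v, theta

print(best)   # approximately (5/2, 5/2)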
3.10.6. In this subsection we survey the axiomatic theory of bargaining for n players.
Although alternatives to the Nash solution were proposed soon after the publication of
Nash's paper, it is fair to say that until the mid-1970s the Nash solution was often seen by
economists and game theorists as the main, if not the only, solution to the bargaining
problem. Since all existing solutions are invariant under translations of the
origin, and since our own formulation will also assume this invariance, it is convenient
to take as admissible only problems that have already been subjected to a translation
bringing their disagreement point to the origin. Consequently, v⁰ = (v₁⁰, ..., vₙ⁰) =
(0, ..., 0) ∈ Rⁿ always, and a typical problem is simply denoted S instead of (S, 0).
Finally, all problems are taken to be subsets of R₊ⁿ (instead of Rⁿ). This means that
all alternatives that would give any player less than what he gets at the disagreement
point v⁰ = 0 are disregarded.
Definition. The Nash solution N is defined by setting, for all convex, compact,
comprehensive subsets S ⊂ R₊ⁿ containing at least one vector with all positive
coordinates (denote this class of problems by Σⁿ), N(S) equal to the maximizer over v ∈ S
of the "Nash product" ∏ᵢ₌₁ⁿ vᵢ.
Nash's theorem is based on the following axioms.
1. Pareto optimality. For all S ∈ Σⁿ and for all v ∈ Rⁿ, if v ≥ φ(S) and v ≠ φ(S),
then v ∉ S [we write φ(S) ∈ PO(S)].
A slightly weaker condition is:
2. Weak Pareto optimality. For all S ∈ Σⁿ and for all v ∈ Rⁿ, if v > φ(S), then
v ∉ S.
Let Πⁿ be the class of permutations π : {1, ..., n} → {1, ..., n} of order n. Given
π ∈ Πⁿ and v ∈ Rⁿ, let π(v) = (v_{π(1)}, ..., v_{π(n)}). Also, given S ⊂ Rⁿ, let π(S) =
{v′ ∈ Rⁿ | ∃v ∈ S with v′ = π(v)}.
3. Symmetry. For all S ∈ Σⁿ, if π(S) = S for all π ∈ Πⁿ, then φᵢ(S) = φⱼ(S)
for all i, j (note that π(S) ∈ Σⁿ).
Let Lⁿ be the class of positive, independent person-by-person, linear
transformations of order n. Each l ∈ Lⁿ is characterized by n positive numbers
aᵢ such that, given v ∈ Rⁿ, l(v) = (a₁v₁, ..., aₙvₙ). Now, given S ⊂ Rⁿ, let l(S) =
{v′ ∈ Rⁿ | ∃v ∈ S with v′ = l(v)}.
4. Scale invariance. For all S ∈ Σⁿ and for all l ∈ Lⁿ, φ(l(S)) = l(φ(S)) [note that
l(S) ∈ Σⁿ].
5. Independence of irrelevant alternatives. For all S, S′ ∈ Σⁿ, if S′ ⊂ S and
φ(S) ∈ S′, then φ(S′) = φ(S).
In the previous subsection we proved the Nash theorem for n = 2, i.e. that only one
solution satisfies these axioms. This result extends directly to arbitrary n.
Theorem. A solution φ(S), S ∈ Σⁿ, satisfies 1, 3, 4, 5 if and only if it is
the Nash solution.
This theorem constitutes the foundation of the axiomatic theory of bargaining.
It shows that a unique point can be identified for each problem, representing an
equitable compromise.
In the mid-1970s Nash's result became the object of a considerable amount of
renewed attention, and the role played by each axiom in the characterization was
scrutinized by several authors.
6. Strong individual rationality. For all S ∈ Σⁿ, φ(S) > 0.
Theorem [Roth (1977)]. A solution φ(S), S ∈ Σⁿ, satisfies 3, 4, 5, 6 if and
only if it is the Nash solution.
If 3 is dropped from the list of axioms in Theorem 3.10.6, a somewhat wider but
still small family of additional solutions becomes admissible.
3.10.7. Definition. Given α = (α₁, ..., αₙ), αᵢ > 0, i = 1, ..., n, Σᵢ₌₁ⁿ αᵢ = 1,
the asymmetric Nash solution with weights α, Nᵅ, is defined by setting, for all S ∈ Σⁿ,
Nᵅ(S) = argmax{∏ᵢ₌₁ⁿ vᵢ^αᵢ : v ∈ S}.
These solutions were introduced by Harsanyi and Selten (1972).
Theorem. A solution φ(S), S ∈ Σⁿ, satisfies 4, 5, 6 if and only if it is an
asymmetric Nash solution.
If 6 is not used, a few other solutions became available.
3.10.8. Definition. Given $i \in \{1,\dots,n\}$, the $i$th Dictatorial solution $D^i$ is defined by setting, for all $S \in \Sigma^n$, $D^i(S)$ equal to the maximal point of $S$ in the direction of the $i$th unit vector.
Note that all $D^i$ satisfy axioms 4, 5, and 2 (but not 1). To recover full Pareto optimality, one may proceed as follows. First, select an ordering $\pi$ of the $n$ players. Then, given $S \in \Sigma^n$, pick $D^{\pi(1)}(S)$ if that point belongs to the Pareto optimal subset of $S$; otherwise, among the points whose $\pi(1)$th coordinate is equal to that of $D^{\pi(1)}(S)$, find the maximal point in the direction of the unit vector pertaining to player $\pi(2)$. Pick this point if it belongs to the Pareto optimal subset of $S$; otherwise, repeat the operation with $\pi(3)$, and so on. This algorithm is summarized in the following definition.
3.10.9. Definition. Given an ordering $\pi$ of $\{1,\dots,n\}$, the lexicographic Dictatorial solution relative to $\pi$, $D^\pi$, is defined by setting, for all $S \in \Sigma^n$, $D^\pi(S)$ to be the lexicographic maximizer over $v \in S$ of $v_{\pi(1)}, v_{\pi(2)}, \dots, v_{\pi(n)}$.
All of these solutions satisfy axioms 1, 4, 5, and there are no others if $n = 2$.
3.10.10. The Kalai-Smorodinsky solution. A new impetus was given to the axiomatic theory of bargaining when Kalai and Smorodinsky (1975) provided a characterization of the following solution (see Fig. 3.5).

Figure 3.5

Definition. The Kalai-Smorodinsky solution $K$ is defined by setting, for all $S \in \Sigma^n$, $K(S)$ to be the maximal point of $S$ on the segment connecting the origin to $a(S)$, the ideal point of $S$, defined by $a_i(S) = \max\{v_i \mid v \in S\}$ for each $i$.


An important distinguishing feature between the Nash solution and the Kalai-Smorodinsky solution is that the latter responds much more satisfactorily to expansions and contractions of the feasible set. In particular, it satisfies the following axiom.
7. Individual monotonicity. For all $S, S' \in \Sigma^2$, for all $i$, if $a_j(S) = a_j(S')$ for $j \ne i$ and $S' \supset S$, then $\varphi_i(S') \ge \varphi_i(S)$.
Theorem. A solution $\varphi(S)$, $S \in \Sigma^2$, satisfies axioms 1, 3, 7 if and only if it is the Kalai-Smorodinsky solution.
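For comparison with the Nash sketch above, here is a minimal sketch (not from the book) computing $K(S)$ for the same illustrative set $S = \{v \in R^2_+ : v_1 + 2v_2 \le 2\}$.

```python
# A minimal sketch, assuming the same illustrative set S as above: K(S) is
# the last point of S on the ray from the origin to the ideal point a(S).
import numpy as np

a = np.array([2.0, 1.0])        # ideal point: a_1(S) = 2, a_2(S) = 1
t = 2.0 / (a[0] + 2.0 * a[1])   # largest t with t*a still inside S
print(t * a)                    # (1.0, 0.5); here K(S) coincides with N(S)
```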
Although the extension of the definition of the Kalai-Smorodinsky solution to the n-person case itself causes no problem, the generalization of the preceding results to the n-person case is not as straightforward as the extension of the results concerning the Nash solution from $n = 2$ to arbitrary $n$. First of all, for $n > 2$, the n-person Kalai-Smorodinsky solution satisfies only axiom 2. This is not a serious limitation since, for most problems $S$, $K(S)$ is in fact (fully) Pareto optimal. But this is not the only change that has to be made in the axioms of the Theorem to extend the characterization of the Kalai-Smorodinsky solution to the case $n > 2$.
3.10.11. The Egalitarian solution. We now turn to a third solution, which differs substantially from the previous two.
Definition. The Egalitarian solution $E$ is defined by setting, for all $S \in \Sigma^n$, $E(S)$ to be the maximal point of $S$ with equal coordinates (see Fig. 3.6).

Figure 3.6

The most striking feature of this solution is that it satisfies the following monotonicity condition, which is very strong, since no restrictions are imposed in its hypotheses on the sort of expansions that take $S$ into $S'$. In fact, this axiom can serve to provide an easy characterization of the solution.
8. Strong monotonicity. For all $S, S' \in \Sigma^n$, if $S \subset S'$, then $\varphi(S) \le \varphi(S')$.
The following characterization result is a variant of a theorem due to Kalai (1977).
Theorem. A solution $\varphi(S)$, $S \in \Sigma^n$, satisfies axioms 2, 3, 8 if and only if it is the Egalitarian solution.

3.10.12. The Utilitarian solution. We close this review with a short discussion
of the Utilitarian solution.
Definition. A Utilitarian solution $U$ is defined by choosing, for each $S \in \Sigma^n$, $U(S)$ among the maximizers of $\sum_{i=1}^{n} v_i$ over $v \in S$ (see Fig. 3.7).

Figure 3.7

Obviously, all Utilitarian solutions satisfy axiom 1. They also satisfy axiom 3 if appropriate selections are made. However, no Utilitarian solution satisfies axiom 4. Also, no Utilitarian solution satisfies axiom 5, because of the impossibility of performing appropriate selections.
The Utilitarian solution has been characterized by Myerson (1981).
Other solutions have been discussed in the literature by Luce and Raiffa (1957), and Perles and Maschler (1981). In this section we follow Thomson and Lensberg (1989), where the reader can find proofs of the theorems.

3.11 Games in characteristic function form


Sections 3.6-3.7 demonstrated, using two-person games as an example, how the players can arrive at a mutually acceptable resolution of a conflict by an agreed choice of strategies (strategic cooperation). We now suppose that the conditions of the game admit joint actions by the players and redistribution of payoffs. This implies that the utilities of different players can be measured on a single scale (transferable payoffs), and hence mutual redistribution of payoffs does not affect the conceptual statement of the original problem. It appears natural that, from the point of view of each player, the best results may be produced by uniting the players into the maximal coalition (the coalition composed of all players). In this case, we are interested not only in the ways the coalition of players ensures its total payoff, but also in the way this payoff is distributed among the members of the coalition (the cooperative approach).
This and the following sections deal with the cooperative theory of n-person games. This theory focuses on the conditions under which integration of the players into the maximal coalition is advisable, so that individual players are not interested in forming smaller groups or in acting individually.
3.11.1. Let $N = \{1,\dots,n\}$ be the set of all players. Any nonempty subset $S \subset N$ is called a coalition.
Definition. A real-valued function $v$ defined on the coalitions $S \subset N$ is called a characteristic function of the n-person game if the inequality
$$v(T) + v(S) \le v(T \cup S), \qquad v(\emptyset) = 0, \qquad (3.11.1)$$
holds for any nonintersecting coalitions $T, S$ ($T \subset N$, $S \subset N$).


Property (3.11.1) is called the superadditivity property. This property is necessary for the number $v(T)$ to be conceptually interpreted as a guaranteed payoff to a coalition $T$ when this coalition is acting independently of the other players. This interpretation of inequality (3.11.1) implies that the coalition $S \cup T$ has no fewer opportunities than the two nonintersecting coalitions $S$ and $T$ acting independently.
From the superadditivity of $v$ it follows that for any system of pairwise nonintersecting coalitions $S_1,\dots,S_k$,
$$\sum_{i=1}^{k} v(S_i) \le v\Bigl(\bigcup_{i=1}^{k} S_i\Bigr).$$
This, in particular, implies that there is no decomposition of the set $N$ into coalitions such that the guaranteed total payoff to these coalitions exceeds the maximum payoff to all players $v(N)$.
3.11.2. We shall now consider a noncooperative game $\Gamma = (N, \{X_i\}_{i\in N}, \{H_i\}_{i\in N})$. Suppose the players in a coalition $S \subset N$ unite their efforts for the purpose of increasing their total payoff. Let us find the largest payoff they can guarantee themselves. The joint actions of the players from the coalition $S$ imply that this coalition $S$, acting for all its members as one player (call him Player 1), takes as its set of pure strategies the set of all possible combinations of strategies of its constituent players from $S$, i.e. the elements of the Cartesian product
$$X_S = \prod_{i\in S} X_i.$$
The community of interests of the players from $S$ means that the payoff to the coalition $S$ (Player 1) is the sum of the payoffs to the players from $S$, i.e.
$$H_S(x) = \sum_{i\in S} H_i(x),$$
where $x \in X_N$, $x = (x_1,\dots,x_n)$ is a situation in pure strategies.


We are interested in the largest payoff the players from $S$ can guarantee themselves. In the worst case for Player 1, the remaining players from $N\setminus S$ may also unite into a collective Player 2 with the set of strategies $X_{N\setminus S} = \prod_{i\in N\setminus S} X_i$, whose interests are diametrically opposite to those of Player 1 (i.e. Player 2's payoff at $x$ is $-H_S(x)$). As a result of this reasoning, the question of the largest guaranteed payoff to the coalition $S$ becomes the issue of the largest guaranteed payoff to Player 1 in the zero-sum game $\Gamma_S = (X_S, X_{N\setminus S}, H_S)$. In the mixed extension $\bar\Gamma_S = (\bar X_S, \bar X_{N\setminus S}, K_S)$ of the game $\Gamma_S$, the guaranteed payoff $v(S)$ to Player 1 can only increase in comparison with that in the game $\Gamma_S$. For this reason, the following discussion concentrates on the mixed extension of $\Gamma_S$. In particular, it should be noted that, according to this interpretation, $v(S)$ coincides with the value of the game $\bar\Gamma_S$ (if any), while $v(N)$ is the maximum total payoff to the players. Evidently, $v(S)$ depends only on the coalition $S$ (and on the original noncooperative game itself, which remains unaffected in our reasoning) and is a function of $S$. We shall verify that this function is a characteristic function of the noncooperative game. To do this, it suffices to show that condition (3.11.1) is satisfied.
Note that $v(\emptyset) = 0$ for every noncooperative game in the above construction.
Lemma (on superadditivity). For the noncooperative game $\Gamma = (N, \{X_i\}_{i\in N}, \{H_i\}_{i\in N})$, construct the function $v(S)$ as
$$v(S) = \sup_{\mu_S} \inf_{\nu_{N\setminus S}} K_S(\mu_S, \nu_{N\setminus S}), \quad S \subset N, \qquad (3.11.2)$$
where $\mu_S \in \bar X_S$, $\nu_{N\setminus S} \in \bar X_{N\setminus S}$, and $\bar\Gamma_S = (\bar X_S, \bar X_{N\setminus S}, K_S)$ is the mixed extension of the zero-sum game $\Gamma_S$. Then for all $S, T \subset N$ for which $S \cap T = \emptyset$, the following inequality holds:
$$v(S \cup T) \ge v(S) + v(T). \qquad (3.11.3)$$

Proof. Note that
$$v(S \cup T) = \sup \inf \sum_{i\in S\cup T} K_i(\mu_{S\cup T}, \nu_{N\setminus(S\cup T)}),$$
where $\mu_{S\cup T}$ is a mixed strategy of the coalition $S \cup T$, i.e. an arbitrary probability measure on $X_{S\cup T}$, $\nu_{N\setminus(S\cup T)}$ is a probability measure on $X_{N\setminus(S\cup T)}$, and $K_i$ is the payoff to player $i$ in mixed strategies. If we restrict ourselves to those probability measures on $X_{S\cup T}$ which are products of independent distributions $\mu_S$ and $\mu_T$ over the Cartesian product $X_S \times X_T$, then the range of the variable over which the supremum is taken shrinks, and the supremum can only decrease. Thus we have
$$v(S \cup T) \ge \sup_{\mu_S}\sup_{\mu_T}\inf_{\nu_{N\setminus(S\cup T)}} \sum_{i\in S\cup T} K_i(\mu_S \times \mu_T, \nu_{N\setminus(S\cup T)}).$$
Hence
$$v(S \cup T) \ge \inf_{\nu_{N\setminus(S\cup T)}} \sum_{i\in S\cup T} K_i(\mu_S \times \mu_T, \nu_{N\setminus(S\cup T)})$$
$$= \inf_{\nu_{N\setminus(S\cup T)}} \Bigl(\sum_{i\in S} K_i(\mu_S \times \mu_T, \nu_{N\setminus(S\cup T)}) + \sum_{i\in T} K_i(\mu_S \times \mu_T, \nu_{N\setminus(S\cup T)})\Bigr).$$
Since the sum of infima does not exceed the infimum of the sum, we have
$$v(S \cup T) \ge \inf_{\nu} \sum_{i\in S} K_i(\mu_S \times \mu_T, \nu_{N\setminus(S\cup T)}) + \inf_{\nu} \sum_{i\in T} K_i(\mu_S \times \mu_T, \nu_{N\setminus(S\cup T)}).$$
Minimizing the first summand on the right-hand side of the inequality over $\mu_T$, and the second over $\mu_S$ (for uniformity, these minimizing measures will be renamed $\nu_T$ and $\nu_S$), we obtain
$$v(S \cup T) \ge \inf_{\nu_T}\inf_{\nu} \sum_{i\in S} K_i(\mu_S \times \nu_T, \nu_{N\setminus(S\cup T)}) + \inf_{\nu_S}\inf_{\nu} \sum_{i\in T} K_i(\nu_S \times \mu_T, \nu_{N\setminus(S\cup T)})$$
$$\ge \inf_{\nu_{N\setminus S}} \sum_{i\in S} K_i(\mu_S, \nu_{N\setminus S}) + \inf_{\nu_{N\setminus T}} \sum_{i\in T} K_i(\mu_T, \nu_{N\setminus T}).$$
The last inequality holds for any values of the measures $\mu_S$ and $\mu_T$. Consequently, we may pass to suprema:
$$v(S \cup T) \ge \sup_{\mu_S}\inf_{\nu_{N\setminus S}} \sum_{i\in S} K_i(\mu_S, \nu_{N\setminus S}) + \sup_{\mu_T}\inf_{\nu_{N\setminus T}} \sum_{i\in T} K_i(\mu_T, \nu_{N\setminus T}),$$
whence, using (3.11.2), we obtain
$$v(S \cup T) \ge v(S) + v(T).$$
The superadditivity is proved.
Note that inequality (3.11.3) also holds if the function $v(S)$ is constructed by the rule
$$v(S) = \sup_{x_S}\inf_{x_{N\setminus S}} H_S(x_S, x_{N\setminus S}), \quad S \subset N,$$
where $x_S \in X_S$, $x_{N\setminus S} \in X_{N\setminus S}$, $\Gamma_S = (X_S, X_{N\setminus S}, H_S)$. In this case, the proof literally repeats the one given above.
3.11.3. Definition. The noncooperative game $\Gamma = (N, \{X_i\}_{i\in N}, \{H_i\}_{i\in N})$ is called a constant sum game if
$$\sum_{i\in N} H_i(x) = c = \text{const}$$
for all $x \in X_N$, $X_N = \prod_{i\in N} X_i$.
Lemma. Let $\Gamma = (N, \{X_i\}_{i\in N}, \{H_i\}_{i\in N})$ be a noncooperative constant sum game, the function $v(S)$, $S \subset N$, be defined as in Lemma 3.11.2, and the games $\Gamma_S$, $S \subset N$, have values in mixed strategies. Then
$$v(N) = v(S) + v(N\setminus S), \quad S \subset N.$$

Proof. The definition of the constant sum game implies that
$$v(N) = \sum_{i\in N} H_i(x) = \sum_{i\in N} K_i(\mu) = c$$
for all situations $x$ in pure strategies and all situations $\mu$ in mixed strategies.
On the other hand,
$$v(S) = \sup_{\mu_S}\inf_{\nu_{N\setminus S}} \sum_{i\in S} K_i(\mu_S, \nu_{N\setminus S}) = \sup_{\mu_S}\inf_{\nu_{N\setminus S}} \Bigl(c - \sum_{i\in N\setminus S} K_i(\mu_S, \nu_{N\setminus S})\Bigr)$$
$$= c - \inf_{\mu_S}\sup_{\nu_{N\setminus S}} \sum_{i\in N\setminus S} K_i(\mu_S, \nu_{N\setminus S}) = c - v(N\setminus S),$$
which is what we set out to prove.


3.11.4. In what follows, by a cooperative game we mean a pair $(N, v)$, where $v$ is a characteristic function satisfying inequality (3.11.1). The conceptual interpretation of the characteristic function justifying property (3.11.1) is not essential for what follows.
Example 17. ("Jazz band" game.) [Moulin (1981)]. The manager of a club promises singer S, pianist P, and drummer D to pay $100 for a joint performance. He values a singer-pianist duet at $80, a drummer-pianist duet at $65, and a pianist alone at $30. A singer-drummer duet may earn $50, and a singer, on the average, $20 for an evening performance. A drummer cannot earn anything by playing alone. Designating players S, P, and D by the numbers 1, 2, 3, respectively, we are facing a cooperative game $(N, v)$, where $N = \{1,2,3\}$, $v(1,2,3) = 100$, $v(1,3) = 50$, $v(1) = 20$, $v(1,2) = 80$, $v(2,3) = 65$, $v(2) = 30$, $v(3) = 0$.
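The superadditivity requirement (3.11.1) is easy to check mechanically; the following sketch (not from the book) does so for the characteristic function of Example 17, and the dictionary v it defines is reused in the later sketches of this chapter.

```python
# A minimal sketch: checking superadditivity (3.11.1) for the "jazz band" game.
from itertools import combinations

v = {frozenset(): 0, frozenset({1}): 20, frozenset({2}): 30, frozenset({3}): 0,
     frozenset({1, 2}): 80, frozenset({1, 3}): 50, frozenset({2, 3}): 65,
     frozenset({1, 2, 3}): 100}

subsets = [frozenset(c) for k in range(4) for c in combinations({1, 2, 3}, k)]
# v(S) + v(T) <= v(S u T) must hold for all nonintersecting S, T.
print(all(v[S] + v[T] <= v[S | T]
          for S in subsets for T in subsets if not (S & T)))   # True
```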
The main problem in the cooperative theory of n-person games is to construct
realizable principles for optimal distribution of a maximum total payoff v(N) among
players.
Let $\alpha_i$ be the amount player $i$ receives in a distribution of the maximum total payoff $v(N)$, $N = \{1,2,\dots,n\}$.
Definition. The vector $\alpha = (\alpha_1,\dots,\alpha_n)$ satisfying the conditions
$$\alpha_i \ge v(\{i\}), \quad i \in N, \qquad (3.11.4)$$
$$\sum_{i=1}^{n} \alpha_i = v(N), \qquad (3.11.5)$$
where $v(\{i\})$ is the value of the characteristic function for the single-element coalition $S = \{i\}$, is called an imputation.
Condition (3.11.4) is called the individual rationality condition and implies that every member of the coalition receives at least the amount he could ensure by acting alone, without any support from the other players. Furthermore, condition (3.11.5) must be satisfied, since in the case $\sum_{i\in N} \alpha_i < v(N)$ there is a distribution $\alpha'$ in which every player $i \in N$ receives more than his share $\alpha_i$. However, if $\sum_{i\in N} \alpha_i > v(N)$, then the players from $N$ would distribute among themselves an unrealized payoff. For this reason, the vector $\alpha$ can be taken to be admissible only if condition (3.11.5) is satisfied. This condition is called the collective (or group) rationality condition.
By (3.11.4), (3.11.5), for the vector $\alpha = (\alpha_1,\dots,\alpha_n)$ to be an imputation in the cooperative game $(N,v)$, it is necessary and sufficient that it can be represented as
$$\alpha_i = v(\{i\}) + \gamma_i, \quad i \in N,$$
where
$$\gamma_i \ge 0, \quad i \in N, \qquad \sum_{i\in N} \gamma_i = v(N) - \sum_{i\in N} v(\{i\}).$$

Definition. The game $(N,v)$ is called essential if
$$\sum_{i\in N} v(\{i\}) < v(N); \qquad (3.11.6)$$
otherwise it is called nonessential.
For any imputation $\alpha$, we denote the quantity $\sum_{i\in S} \alpha_i$ by $\alpha(S)$, and the set of all imputations by $D$. A nonessential game has a unique imputation $\alpha = (v(\{1\}), v(\{2\}), \dots, v(\{n\}))$.
In any essential game with more than one player, the imputation set is infinite.
We shall examine such games by using a dominance relation.
Definition. Imputation $\alpha$ dominates imputation $\beta$ in the coalition $S$ (denoted $\alpha \succ_S \beta$) if
$$\alpha_i > \beta_i, \quad i \in S, \qquad \alpha(S) \le v(S). \qquad (3.11.7)$$
The first condition in (3.11.7) implies that imputation $\alpha$ is more advantageous than imputation $\beta$ to all members of the coalition $S$, while the second condition accounts for the fact that imputation $\alpha$ can actually be realized by the coalition $S$ (that is, coalition $S$ can actually offer the amount $\alpha_i$ to every player $i \in S$).
Definition. Imputation $\alpha$ is said to dominate imputation $\beta$ if there is a coalition $S$ for which $\alpha \succ_S \beta$. Dominance of imputation $\beta$ by imputation $\alpha$ is denoted $\alpha \succ \beta$.
Dominance is not possible in a single-element coalition or in the set of all players $N$. Indeed, $\alpha \succ_{\{i\}} \beta$ would imply $\beta_i < \alpha_i \le v(\{i\})$, which contradicts (3.11.4) for $\beta$, while $\alpha \succ_N \beta$ would imply $\alpha_i > \beta_i$ for all $i \in N$, which contradicts (3.11.5).
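As a small illustration (not from the book), the sketch below tests the dominance relation (3.11.7) for two imputations of the "jazz band" game, using the dictionary v defined in the earlier sketch.

```python
# A minimal sketch: does imputation a dominate imputation b in some coalition S?
from itertools import combinations

def dominates(a, b, v, players):
    for k in range(1, len(players)):           # proper nonempty coalitions only
        for S in combinations(players, k):
            if (all(a[i] > b[i] for i in S)
                    and sum(a[i] for i in S) <= v[frozenset(S)]):
                return True
    return False

print(dominates({1: 30, 2: 50, 3: 20},
                {1: 25, 2: 45, 3: 30}, v, [1, 2, 3]))   # True, via S = {1, 2}
```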
3.11.5. Combining cooperative games into one class or another may substantially simplify their subsequent examination. We may examine equivalence classes of games.
Definition. The cooperative game $(N,v)$ is called equivalent to the game $(N,v')$ if there exist a positive number $k$ and $n$ arbitrary real numbers $c_i$, $i \in N$, such that for any coalition $S \subset N$,
$$v'(S) = k\,v(S) + \sum_{i\in S} c_i. \qquad (3.11.8)$$

The equivalence of the game (N, v) to (N, v') will be denoted as (N, v) ~ (N, v')
or v ~ v'.
It is obvious that $v \sim v$. This can be verified by setting $c_i = 0$, $k = 1$, $v' = v$ in (3.11.8). This property is called reflexivity.
We shall prove the symmetry of the relation, i.e. that the condition $v \sim v'$ implies $v' \sim v$. In fact, setting $k' = 1/k$, $c_i' = -c_i/k$, we obtain
$$v(S) = k'v'(S) + \sum_{i\in S} c_i',$$
i.e. $v' \sim v$.
Finally, if $v \sim v'$ and $v' \sim v''$, then $v \sim v''$. This property is called transitivity. It can be verified by successively applying (3.11.8).

Since the equivalence relation is reflexive, symmetric and transitive, it decomposes


the set of all n-person games into mutually nonintersecting classes of equivalent games.
Theorem. If two games $v$ and $v'$ are equivalent, then the map $\alpha \to \alpha'$, where
$$\alpha_i' = k\alpha_i + c_i, \quad i \in N,$$
establishes a one-to-one mapping of the set of all imputations in the game $v$ onto the imputation set in the game $v'$, such that $\alpha \succ_S \beta$ implies $\alpha' \succ_S \beta'$.
Proof. Let us verify that $\alpha'$ is an imputation in the game $(N,v')$. Indeed,
$$\alpha_i' = k\alpha_i + c_i \ge k\,v(\{i\}) + c_i = v'(\{i\}),$$
$$\sum_{i\in N} \alpha_i' = k\sum_{i\in N} \alpha_i + \sum_{i\in N} c_i = k\,v(N) + \sum_{i\in N} c_i = v'(N).$$
It follows that conditions (3.11.4), (3.11.5) hold for $\alpha'$. Furthermore, if $\alpha \succ_S \beta$, then
$$\alpha_i > \beta_i, \quad i \in S, \qquad \alpha(S) \le v(S),$$
and hence
$$\alpha_i' = k\alpha_i + c_i > k\beta_i + c_i = \beta_i' \quad (k > 0),$$
$$\sum_{i\in S} \alpha_i' = k\sum_{i\in S} \alpha_i + \sum_{i\in S} c_i \le k\,v(S) + \sum_{i\in S} c_i = v'(S),$$
i.e. $\alpha' \succ_S \beta'$. The one-to-one correspondence follows from the existence of the inverse mapping (it was used in the proof of the symmetry of the equivalence relation). This completes the proof of the theorem.
3.11.6. When decomposing the set of cooperative games into mutually disjoint equivalence classes, we are faced with the problem of choosing the simplest representative from each class.
Definition. The game $(N,v)$ is called a game in (0-1)-reduced form if for all $i \in N$
$$v(\{i\}) = 0, \qquad v(N) = 1.$$
Theorem. Every essential cooperative game is equivalent to some game in (0-1)-reduced form.
Proof. Set
$$k = \frac{1}{v(N) - \sum_{i\in N} v(\{i\})} > 0, \qquad c_i = -k\,v(\{i\}), \quad i \in N$$
(the denominator is positive, since the game is essential). Then $v'(\{i\}) = 0$, $v'(N) = 1$. This completes the proof of the theorem.
This theorem implies that game-theoretic properties involving the notion of dominance can be examined on games in (0-1)-reduced form. If $v$ is the characteristic function of an arbitrary essential game $(N,v)$, then
$$v'(S) = \frac{v(S) - \sum_{i\in S} v(\{i\})}{v(N) - \sum_{i\in N} v(\{i\})} \qquad (3.11.9)$$
is the (0-1)-normalization corresponding to the function $v$. In this case, an imputation is any vector $\alpha = (\alpha_1,\dots,\alpha_n)$ whose components satisfy the conditions
$$\alpha_i \ge 0, \quad i \in N, \qquad \sum_{i\in N} \alpha_i = 1, \qquad (3.11.10)$$
i.e. imputations can be regarded as points of the $(n-1)$-dimensional simplex generated by the unit vectors $w_j = (0,\dots,0,1,0,\dots,0)$, $j = 1,\dots,n$, in the space $R^n$.
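As a quick illustration (not from the book), the sketch below applies the (0-1)-normalization (3.11.9) to the "jazz band" game, reusing the dictionary v from the earlier sketch.

```python
# A minimal sketch: the (0-1)-normalization (3.11.9) of a characteristic function.
def normalize(v, players):
    singles = sum(v[frozenset({i})] for i in players)
    denom = v[frozenset(players)] - singles      # positive: the game is essential
    return {S: (val - sum(v[frozenset({i})] for i in S)) / denom
            for S, val in v.items()}

v01 = normalize(v, {1, 2, 3})
print(v01[frozenset({1, 2})])   # (80 - (20 + 30)) / 50 = 0.6
```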

3.12 The core and NM-solution


We shall now turn to the principles of optimal behavior in cooperative games. As
already noted in 3.11.4, we are dealing with the principles of optimal distribution of
a maximum total payoff among players.
3.12.1. The following approach is possible. Suppose the players in the cooperative game $(N,v)$ have come to an agreement on the distribution of the payoff to the whole coalition $N$ (an imputation $\alpha^*$) such that none of the imputations dominates $\alpha^*$. Then such a distribution is stable in the sense that it is disadvantageous for any coalition $S$ to separate from the other players and distribute the payoff $v(S)$ among its members. This suggests that it may be wise to examine the set of nondominant imputations.
Definition. The set of nondominant imputations in the cooperative game (N,v)
is called its core.
We then have the following theorem which characterizes the core.
Theorem. For the imputation $\alpha$ to belong to the core, it is necessary and sufficient that
$$v(S) \le \alpha(S) = \sum_{i\in S} \alpha_i \qquad (3.12.1)$$
hold for all $S \subset N$.


Proof. This theorem is straightforward for nonessential games, and, by Theorem 3.11.6, it suffices to prove it for games in (0-1)-reduced form.
We first prove sufficiency. Suppose that condition (3.12.1) holds for the imputation $\alpha$; we show that $\alpha$ belongs to the core. Suppose this is not so. Then there is an imputation $\beta$ and a coalition $S$ such that $\beta \succ_S \alpha$, i.e. $\beta(S) > \alpha(S)$ and $\beta(S) \le v(S)$, which contradicts (3.12.1).
We shall now prove the necessity of (3.12.1). For any imputation $\alpha$ which does not satisfy (3.12.1) there exists a coalition $S$ for which $\alpha(S) < v(S)$. Let
$$\beta_i = \alpha_i + \frac{v(S) - \alpha(S)}{|S|}, \quad i \in S, \qquad \beta_i = \frac{1 - v(S)}{n - |S|}, \quad i \in N\setminus S,$$
where $|S|$ is the number of elements of the set $S$. It can be easily seen that $\beta(N) = 1$, $\beta_i \ge 0$ and $\beta \succ_S \alpha$. It then follows that $\alpha$ does not belong to the core.
The theorem in 3.12.1 implies that the core is a closed convex subset of the set of all imputations (the core may also be an empty set).
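A direct computational check of (3.12.1) is straightforward; the sketch below (not from the book) tests core membership in the "jazz band" game with the dictionary v from the earlier sketch.

```python
# A minimal sketch: core membership test via condition (3.12.1).
def in_core(alpha, v):
    # alpha: dict player -> payoff, with payoffs summing to v(N)
    return all(sum(alpha[i] for i in S) >= v[S] for S in v if S)

print(in_core({1: 35, 2: 45, 3: 20}, v))   # True
print(in_core({1: 60, 2: 30, 3: 10}, v))   # False: coalition {2,3} gets 40 < 65
```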
3.12.2. Suppose the players are negotiating the choice of a cooperative agreement. It follows from the superadditivity of $v$ that such an agreement brings about the formation of the coalition $N$ of all players. The question to be tackled is the way of distributing the total payoff $v(N)$, i.e. the way of choosing a vector $\alpha \in R^n$ for which $\sum_{i\in N} \alpha_i = v(N)$.
The minimum requirement for obtaining the players' consent to the choice of a vector $\alpha$ is the individual rationality of this vector, i.e. the condition $\alpha_i \ge v(\{i\})$, $i \in N$. Suppose the players are negotiating the choice of a particular imputation $\alpha$. Some coalition $S$ demanding a more advantageous imputation may raise an objection against the choice of this imputation. The coalition $S$ lays down this demand, threatening to break up general cooperation (this threat is quite real, since the payoff $v(N)$ can only be ensured by unanimous consent on the part of all players). Suppose the other players $N\setminus S$ respond to this threat by uniting their efforts against the coalition $S$. The maximum guaranteed payoff to the coalition $S$ is then evaluated by the number $v(S)$. Condition (3.12.1) implies that there exists a stabilizing threat to the coalition $S$ from the coalition $N\setminus S$. Thus, the core of the game $(N,v)$ is the set of distributions of the maximum total payoff $v(N)$ which is immune to such coalitional threats.
We shall bring forward one more criterion to judge whether an imputation belongs
to the core.
Lemma. Let $\alpha$ be an imputation in the game $(N,v)$. Then $\alpha$ belongs to the core if and only if the inequality
$$\sum_{i\in S} \alpha_i \le v(N) - v(N\setminus S) \qquad (3.12.2)$$
holds for all coalitions $S \subset N$.
Proof. Since $\sum_{i\in N} \alpha_i = v(N)$, the above inequality can be written as
$$v(N\setminus S) \le \sum_{i\in N\setminus S} \alpha_i.$$
Now the assertion of the lemma follows from (3.12.1).


Condition (3.12.1) shows that if the imputation $\alpha$ belongs to the core, then no coalition $S$ can guarantee itself an amount exceeding $\sum_{i\in S} \alpha_i = \alpha(S)$, i.e. the total payoff ensured for the coalition members by the imputation $\alpha$. This makes the formation of coalitions $S$ other than the maximal coalition $N$ unreasonable.
The theorem in 3.12.1 provides enough reason to use the core as an important optimality principle in cooperative theory. However, in many cases the core is empty, while in other cases it represents a multivalued optimality principle, and the question as to which of the imputations is to be chosen from the core in a particular case remains open.
Example 18. Consider the "jazz band" game (see Example 17, 3.11.4). The total receipts of the three musicians are maximal ($100) when they perform jointly. If the singer performs separately from the pianist and drummer, they receive $65 + $20 all together. If the pianist performs alone, they receive $30 + $50. Finally, if the pianist and singer perform without the drummer, their total receipts amount to $80. What distribution of the maximum total receipts is to be considered rational in terms of the above-mentioned partial cooperation and individual behavior?
The vector $\alpha = (\alpha_1, \alpha_2, \alpha_3)$ in the "jazz band" game belongs to the core if and only if
$$\alpha_1 \ge 20, \quad \alpha_2 \ge 30, \quad \alpha_3 \ge 0,$$
$$\alpha_1 + \alpha_2 + \alpha_3 = 100,$$
$$\alpha_1 + \alpha_2 \ge 80, \quad \alpha_2 + \alpha_3 \ge 65, \quad \alpha_1 + \alpha_3 \ge 50.$$
This set is the convex hull of the following three imputations: $(35, 45, 20)$, $(35, 50, 15)$, $(30, 50, 20)$. Thus, the payoff of each player across these imputations differs by no more than $5. A typical representative of the core is the arithmetic mean of the extreme points of the core, namely $\alpha^* = (33.3, 48.3, 18.3)$. The characteristic feature of the imputation $\alpha^*$ is that all two-player coalitions have the same additional receipts: $\alpha_i + \alpha_j - v(\{i,j\}) = 5/3 \approx 1.67$. The imputation $\alpha^*$ is a fair compromise from the interior of the core.
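A quick numerical check of these claims (not from the book):

```python
# A minimal sketch: the mean of the core vertices of Example 18 yields equal
# surpluses over all two-player coalition values.
import numpy as np

vertices = np.array([[35, 45, 20], [35, 50, 15], [30, 50, 20]], dtype=float)
a_star = vertices.mean(axis=0)
pair_worth = {(0, 1): 80.0, (1, 2): 65.0, (0, 2): 50.0}
surplus = [a_star[i] + a_star[j] - w for (i, j), w in pair_worth.items()]
print(np.round(a_star, 2), np.round(surplus, 2))   # surpluses all equal 5/3
```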
3.12.3. The fact that the core is empty does not mean that cooperation of all players in $N$ is impossible. It simply means that no imputation can be stabilized with the help of simple threats as above. The core is empty when intermediate coalitions are too strong. This assertion can be explained as follows.
Example 19. (Symmetric games.) [Moulin (1981)]. In a symmetric game, coalitions with the same number of players have the same payoffs. The characteristic function $v$ is
$$v(S) = f(|S|)$$
for all $S \subset N$, where $|S|$ is the number of elements of the set $S$.
We may assume, without loss of generality, that $f(1) = 0$ and $N = \{1,\dots,n\}$. Then the imputation set in the game $(N,v)$ is the following simplex in $R^n$:
$$\sum_{i=1}^{n} \alpha_i = f(n) = v(N), \qquad \alpha_i \ge 0, \quad i = 1,\dots,n.$$
The core is the subset of the imputation set defined by the linear inequalities (3.12.1), i.e. a convex polyhedron. By the symmetry of $v(S)$, the core is also symmetric, i.e. invariant under any permutation of the components $\alpha_1,\dots,\alpha_n$. Furthermore, by the convexity of the core, it can be shown that the core is nonempty if and only if it contains the center $\alpha^*$ of the set of all distributions ($\alpha_i^* = f(n)/n$, $i = 1,\dots,n$). Returning to system (3.12.1), we see that the core is nonempty if and only if the inequality $(1/|S|)\,f(|S|) \le (1/n)\,f(n)$ holds for all $|S| = 1,\dots,n$. Thus, the core is
nonempty if and only if there is no intermediate coalition $S$ in which the average share of each player exceeds the corresponding amount in the coalition $N$. Fig. 3.8 (3.9) corresponds to the case where the core is nonempty (empty).

Figure 3.8

Figure 3.9
3.12.4. Example 20. [Vorobjev (1977)]. Consider a general three-person game in (0-1)-reduced form. For its characteristic function we have $v(\emptyset) = v(1) = v(2) = v(3) = 0$, $v(1,2,3) = 1$, $v(1,2) = c_3$, $v(1,3) = c_2$, $v(2,3) = c_1$, where $0 \le c_i \le 1$, $i = 1,2,3$. By the theorem in 3.12.1, for the imputation $\alpha$ to belong to the core it is necessary and sufficient that
$$\alpha_1 + \alpha_2 \ge c_3, \qquad \alpha_1 + \alpha_3 \ge c_2, \qquad \alpha_2 + \alpha_3 \ge c_1,$$
or
$$\alpha_3 \le 1 - c_3, \qquad \alpha_2 \le 1 - c_2, \qquad \alpha_1 \le 1 - c_1. \qquad (3.12.3)$$
Summing inequalities (3.12.3), we obtain
$$\alpha_1 + \alpha_2 + \alpha_3 \le 3 - (c_1 + c_2 + c_3),$$
or, since the sum of all $\alpha_i$, $i = 1,2,3$, is identically equal to 1,
$$c_1 + c_2 + c_3 \le 2. \qquad (3.12.4)$$

The last inequality is a necessary condition for the existence of a nonempty core in the game of interest. On the other hand, if (3.12.4) is satisfied, then there are non-negative $\xi_1, \xi_2, \xi_3$ such that
$$\xi_1 + \xi_2 + \xi_3 = 2 - (c_1 + c_2 + c_3), \qquad \xi_i \le 1 - c_i, \quad i = 1,2,3.$$
Let $\beta_i = 1 - c_i - \xi_i$, $i = 1,2,3$. The numbers $\beta_i$ satisfy inequalities (3.12.3) in such a way that the imputation $\beta = (\beta_1, \beta_2, \beta_3)$ belongs to the core of the game; hence relation (3.12.4) is also sufficient for a nonempty core to exist.

Figure 3.10
Geometrically, the imputation set in the game involved is the simplex $\alpha_1 + \alpha_2 + \alpha_3 = 1$, $\alpha_i \ge 0$, $i = 1,2,3$ (triangle $ABC$ shown in Fig. 3.10). The nonempty core is the intersection of the imputation set (triangle $ABC$) and the convex polyhedron (parallelepiped) $0 \le \alpha_i \le 1 - c_i$, $i = 1,2,3$. It is the part of triangle $ABC$ cut out by the lines of intersection of the planes
$$\alpha_i = 1 - c_i, \quad i = 1,2,3, \qquad (3.12.5)$$
with the plane of triangle $ABC$. Referring to Fig. 3.10, we have $a_i$, $i = 1,2,3$, standing for the line formed by the intersection of the planes $\alpha_i = 1 - c_i$ and $\alpha_1 + \alpha_2 + \alpha_3 = 1$. The intersection point of two lines, $a_i$ and $a_j$, belongs to triangle $ABC$ if the $k$th coordinate of this point, with $k \ne i$, $k \ne j$, is non-negative; otherwise it lies outside triangle $ABC$ (Fig. 3.11a, 3.11b). Thus, the core has the form of a triangle if the joint solution of any pair of equations (3.12.5) and the equation $\alpha_1 + \alpha_2 + \alpha_3 = 1$ is non-negative. This requirement holds for
$$c_1 + c_2 \ge 1, \qquad c_1 + c_3 \ge 1, \qquad c_2 + c_3 \ge 1. \qquad (3.12.6)$$
The core can take one form or another, as the case requires (a total of eight cases is possible here). For example, if none of the three inequalities (3.12.6) is satisfied, then the core is a hexagon (Fig. 3.11b).

Figure 3.11 (a, b)

3.12.5. Another optimality principle in cooperative games is the NM-solution, which, like the core, is a multivalued optimality principle in the set of all imputations. Although the elements of the core are not dominated by any other imputations, we cannot claim that for every imputation $\alpha$ outside the core there exists an imputation in the core that dominates it. For this reason, it seems wise to formulate an optimality principle which takes this situation into account.
Definition. The imputation set $L$ in the cooperative game $(N,v)$ is called an NM-solution if:
1) $\alpha \succ \beta$ implies that either $\alpha \notin L$ or $\beta \notin L$ (internal stability);
2) for any $\alpha \notin L$ there is an imputation $\beta \in L$ such that $\beta \succ \alpha$ (external stability).
Unfortunately, the definition is not constructive, and thus the notion of NM-solution rarely finds practical use; it has more philosophical than practical significance.
There is a particular relation between the core of a cooperative game and its NM-solution: if the core is nonempty and an NM-solution exists, then the NM-solution contains the core. Indeed, suppose the imputation $\alpha$ belongs to the core. If it did not belong to the NM-solution $L$, then, by property 2, there would be an imputation $\alpha' \in L$ such that $\alpha' \succ \alpha$. This, however, contradicts the fact that $\alpha$ belongs to the core as the set of nondominant imputations.

Theorem. If the inequalities
$$v(S) \le \frac{1}{n - |S| + 1},$$
where $|S|$ is the number of players in the coalition $S$, hold for the characteristic function of the game $(N,v)$ in (0-1)-reduced form ($|N| = n$), then the core of this game is nonempty and is its NM-solution.
Proof. Take an arbitrary imputation $\alpha$ which lies outside the core. Then the set of coalitions $S$ in which it is possible to dominate $\alpha$, i.e. the coalitions for which $\alpha(S) < v(S)$, is nonempty. The set $\{S\}$ is partially ordered by inclusion, i.e. $S_1 \ge S_2$ if $S_2 \subset S_1$. Take in it a minimal element $S_0$, which evidently exists. Let $k$ be the number of players in the coalition $S_0$. Evidently, $2 \le k \le n-1$.
Let us construct the distribution $\beta$ as follows:
$$\beta_i = \alpha_i + \frac{v(S_0) - \alpha(S_0)}{k}, \quad i \in S_0, \qquad \beta_i = \frac{1 - v(S_0)}{n-k}, \quad i \in N\setminus S_0.$$
Since $\beta(S_0) = v(S_0)$ and $\beta_i > \alpha_i$, $i \in S_0$, $\beta$ dominates $\alpha$ in the coalition $S_0$. We show that $\beta$ is contained in the core. To do this, it suffices to show that $\beta(S) \ge v(S)$ for an arbitrary $S$. At first, let $|S| \le k$. Note that $\beta(S) \ge v(S)$ for any coalition $S \subset S_0$, since $\beta_i > \alpha_i$ ($i \in S_0$), while $S_0$ is a minimal coalition in which it is possible to dominate $\alpha$, so that $\alpha(S) \ge v(S)$. If, however, at least one player from $S$ is not contained in $S_0$, then
$$\beta(S) \ge \frac{1 - v(S_0)}{n-k} \ge \frac{1}{n-k+1} \ge v(S).$$
Thus, $\beta(S) \ge v(S)$ for any coalition containing at most $k$ players.
Now let $|S| > k$. If $S_0 \subset S$, then
$$\beta(S) = v(S_0) + \frac{(|S|-k)(1 - v(S_0))}{n-k} \ge \frac{|S|-k}{n-k} \ge \frac{1}{n-|S|+1} \ge v(S).$$
However, if $S$ does not contain $S_0$, then the number of players of the set $S$ not contained in $S_0$ is at least $|S|-k+1$; hence
$$\beta(S) \ge (|S|-k+1)\,\frac{1 - v(S_0)}{n-k} \ge \frac{|S|-k+1}{n-k+1} \ge \frac{1}{n-|S|+1} \ge v(S).$$
Thus, $\beta(S) \ge v(S)$ for every coalition $S$. Therefore, $\beta$ is contained in the core. Furthermore, $\beta$ dominates $\alpha$. We have thus proved that the core is nonempty and satisfies property 2, which characterizes NM-solutions. By definition, the core satisfies property 1 automatically. This completes the proof of the theorem.

3.12.6. Definition. The game $(N,v)$ in (0-1)-reduced form is called simple if for any $S \subset N$, $v(S)$ takes only one of the two values 0 or 1. A cooperative game is called simple if its (0-1)-reduced form is simple.
Example 21. [Vorobjev (1977)]. Consider a three-person simple game in (0-1)-reduced form, in which any coalition composed of two or three players wins ($v(S) = 1$), while each one-player coalition loses ($v(\{i\}) = 0$). For this game, we consider three imputations
$$\alpha^{12} = (1/2, 1/2, 0), \quad \alpha^{13} = (1/2, 0, 1/2), \quad \alpha^{23} = (0, 1/2, 1/2). \qquad (3.12.7)$$
None of the three imputations dominates another. The imputation set (3.12.7) also has the following property: any imputation (except the three imputations $\alpha^{ij}$ themselves) is dominated by one of the imputations $\alpha^{ij}$. This can be verified by examining an arbitrary imputation $\alpha = (\alpha_1, \alpha_2, \alpha_3)$. Since we are examining a game in (0-1)-reduced form, $\alpha_i \ge 0$ and $\alpha_1 + \alpha_2 + \alpha_3 = 1$. Therefore, no more than two components of the vector $\alpha$ can be at least 1/2. If there are exactly two such components, then each of them equals 1/2, while the third component is 0. But this means that $\alpha$ coincides with one of the $\alpha^{ij}$. However, if $\alpha$ is some other imputation, then it has no more than one component which is at least 1/2. Thus there are at least two components, say $\alpha_i$ and $\alpha_j$ ($i < j$), which are less than 1/2. But then $\alpha^{ij} \succ \alpha$. Hence the three imputations (3.12.7) form an NM-solution. But this is not the only NM-solution.
Let $c$ be any number from the interval $[0, 1/2]$. It can be easily verified that the set
$$L_{3,c} = \{(a,\ 1-c-a,\ c) \mid 0 \le a \le 1-c\}$$
is also an NM-solution. Indeed, this set contains the imputations on which Player 3 receives the constant amount $c$, while players 1 and 2 divide the remainder in all possible proportions. Internal stability follows from the fact that for any two imputations $\alpha$ and $\beta$ from this set, if $\alpha_1 > \beta_1$ then $\alpha_2 < \beta_2$; and dominance in a single-player coalition is not possible. To prove the external stability of $L_{3,c}$, take any imputation $\beta \notin L_{3,c}$. This means that either $\beta_3 > c$ or $\beta_3 < c$. Let $\beta_3 > c$, say $\beta_3 = c + \epsilon$. Define the imputation $\alpha$ as follows:
$$\alpha_1 = \beta_1 + \epsilon/2, \quad \alpha_2 = \beta_2 + \epsilon/2, \quad \alpha_3 = c.$$
Then $\alpha \in L_{3,c}$ and $\alpha \succ \beta$ for the coalition $\{1,2\}$. Now let $\beta_3 < c$. It is clear that either $\beta_1 < 1/2$ or $\beta_2 < 1/2$ (otherwise their sum would be greater than 1). Let $\beta_1 < 1/2$. Set $\alpha = (1-c, 0, c)$. Since $1-c \ge 1/2 > \beta_1$, we have $\alpha \succ \beta$ for the coalition $\{1,3\}$. Evidently, $\alpha \in L_{3,c}$. However, if $\beta_2 < 1/2$, then we may show in a similar manner that $\gamma \succ \beta$, where $\gamma = (0, 1-c, c)$. Thus, aside from the symmetric NM-solution, the game involved has a whole family of solutions which allow Player 3 to obtain a fixed amount $c$ from the interval $0 \le c \le 1/2$. These NM-solutions are called discriminating. In the case of the set $L_{3,0}$, Player 3 is said to be completely discriminated against, or excluded. From symmetry it follows that there are also two families of NM-solutions, $L_{1,c}$ and $L_{2,c}$, which discriminate against Players 1 and 2, respectively.

The preceding example shows that a game may have many NM-solutions, and it is not clear which of them is to be chosen. Moreover, once an NM-solution has been chosen, it remains unclear which of the imputations is to be chosen from this particular solution.
Although the existence of NM-solutions in the general case has not been proved, some special results have been obtained. Some of them are concerned with the existence of NM-solutions, while others are related to the existence of NM-solutions of a particular type [Diubin and Suzdal (1981)].

3.13 Shapley value


3.13.1. The multiplicity of the previously discussed optimality principles (the core and the NM-solution) in cooperative games, and the rigid conditions for the existence of these principles, force us to search for optimality principles whose existence and uniqueness can be ensured in every cooperative game. Among such optimality principles is the Shapley value. The Shapley value is defined axiomatically.
Definition. A carrier of the game $(N,v)$ is a coalition $T$ such that $v(S) = v(S \cap T)$ for any coalition $S \subset N$.
Conceptually, this definition states that any player who is not a member of the carrier is a "dummy", i.e. he has nothing to contribute to any coalition.
We shall consider an arbitrary permutation $P$ of the ordered set of players $N = \{1,2,\dots,n\}$. With this permutation is associated the substitution $\pi$, i.e. a one-to-one function $\pi : N \to N$ such that for $i \in N$ the value $\pi(i) \in N$ is the element of $N$ to which $i$ is taken by the permutation $P$.
Definition. Suppose that $(N,v)$ is an n-person game, $P$ is a permutation of the set $N$, and $\pi$ is its associated substitution. Denote by $(N, \pi v)$ the game $(N, u)$ such that for any coalition $S \subset N$, $S = \{i_1, i_2, \dots, i_s\}$,
$$u(\{\pi(i_1), \pi(i_2), \dots, \pi(i_s)\}) = v(S).$$
The game $(N, \pi v)$ and the game $(N, v)$ differ only in that in the latter the players exchange their roles in accordance with the permutation $P$.
This definition permits the presentation of the Shapley axiomatics. First, note that since cooperative n-person games are essentially identified with real-valued (characteristic) functions, we may deal with the sum of two or more games and with the product of a game by a number.
3.13.2. We shall set up a correspondence between every cooperative game $(N,v)$ and a vector $\varphi[v] = (\varphi_1[v],\dots,\varphi_n[v])$ whose components are interpreted as the payoffs received by the players under an agreement or an arbitration award. Here, this correspondence is taken to satisfy the following axioms.
Shapley axioms.
1. If $S$ is any carrier of the game $(N,v)$, then
$$\sum_{i\in S} \varphi_i[v] = v(S).$$

2. For any substitution $\pi$ and any $i \in N$,
$$\varphi_{\pi(i)}[\pi v] = \varphi_i[v].$$
3. If $(N,u)$ and $(N,v)$ are any cooperative games, then
$$\varphi_i[u + v] = \varphi_i[u] + \varphi_i[v].$$
Definition. Suppose $\varphi$ is the function which, by axioms 1-3, sets up a correspondence between every game $(N,v)$ and a vector $\varphi[v]$. Then $\varphi[v]$ is called the vector of values, or the Shapley value, of the game $(N,v)$.
It turns out that these axioms suffice to define the values uniquely for all n-person games.
Theorem. There exists a unique function $\varphi$ which is defined for all games $(N,v)$ and satisfies axioms 1-3.
3.13.3. The proof of the theorem is based on the following results.
Lemma. Let the game $(N, w_S)$ be defined for any coalition $S \subset N$ as follows:
$$w_S(T) = \begin{cases} 1, & S \subset T, \\ 0, & \text{otherwise.} \end{cases} \qquad (3.13.1)$$
Then for the game $(N, w_S)$ the vector $\varphi[w_S]$ is uniquely defined by axioms 1, 2:
$$\varphi_i[w_S] = \begin{cases} 1/s, & i \in S, \\ 0, & i \notin S, \end{cases} \qquad (3.13.2)$$
where $s = |S|$ is the number of players in $S$.
Proof. It is obvious that $S$ is a carrier of $w_S$, as is any set $T$ containing the set $S$. Now, by axiom 1, if $S \subset T$, then
$$\sum_{i\in S} \varphi_i[w_S] = \sum_{i\in T} \varphi_i[w_S] = 1.$$
But this means that $\varphi_i[w_S] = 0$ for $i \notin S$. Further, if $\pi$ is any substitution which maps $S$ onto itself, then $\pi w_S = w_S$. Therefore, by axiom 2, for any $i, j \in S$ we have the equality $\varphi_i[w_S] = \varphi_j[w_S]$. Since there is a total of $s = |S|$ such equal components and their sum is 1, we have $\varphi_i[w_S] = 1/s$ if $i \in S$.
The game with the characteristic function $w_S$ defined by (3.13.1) is called a simple n-person game. The lemma states that for the simple game $(N, w_S)$ the Shapley value is determined in a unique manner.
Corollary. If $c \ge 0$, then $\varphi[c\,w_S] = c\,\varphi[w_S]$.
The proof is straightforward.
We shall now show that if $\sum_S c_S w_S$ is a characteristic function, then
$$\varphi_i\Bigl[\sum_S c_S w_S\Bigr] = \sum_S \varphi_i[c_S w_S] = \sum_S c_S\,\varphi_i[w_S]. \qquad (3.13.3)$$
In the case $c_S \ge 0$, the first equation in (3.13.3) is given by axiom 3, while the second follows from the corollary. Further, if $u$, $v$ and $u - v$ are characteristic functions, then, by axiom 3, $\varphi[u - v] = \varphi[u] - \varphi[v]$. Hence it follows that (3.13.3) holds for any $c_S$. Indeed, if $\sum_S c_S w_S$ is a characteristic function, then
$$v = \sum_S c_S w_S = \sum_{S\mid c_S>0} c_S w_S - \sum_{S\mid c_S<0} (-c_S) w_S;$$
hence
$$\varphi[v] = \varphi\Bigl[\sum_{S\mid c_S>0} c_S w_S\Bigr] - \varphi\Bigl[\sum_{S\mid c_S<0} (-c_S) w_S\Bigr]$$
$$= \sum_{S\mid c_S>0} c_S\,\varphi[w_S] - \sum_{S\mid c_S<0} (-c_S)\,\varphi[w_S] = \sum_S c_S\,\varphi[w_S].$$

3.13.4. Lemma. Let $(N,v)$ be any game. Then there are $2^n - 1$ real numbers $c_S$ such that
$$v = \sum_{S\subset N} c_S w_S, \qquad (3.13.4)$$
where the $w_S$ are defined by (3.13.1) and the summation is over all subsets $S$ of the set $N$, excluding the empty set. Moreover, representation (3.13.4) is unique.
Proof. Set
$$c_S = \sum_{T\mid T\subset S} (-1)^{s-t}\,v(T) \qquad (3.13.5)$$
(here $t$ is the number of elements in $T$). We show that these numbers $c_S$ satisfy the conditions of the lemma. Indeed, if $U$ is an arbitrary coalition, then
$$\sum_{S\mid S\subset N} c_S w_S(U) = \sum_{S\mid S\subset U} c_S = \sum_{S\mid S\subset U}\Bigl(\sum_{T\mid T\subset S} (-1)^{s-t}\,v(T)\Bigr) = \sum_{T\mid T\subset U}\Bigl(\sum_{S\mid T\subset S\subset U} (-1)^{s-t}\Bigr)v(T).$$
We now consider the quantity in brackets in the last expression. For every value $s$ between $t$ and $u$ there are $C_{u-t}^{s-t}$ sets $S$ with $s$ elements such that $T \subset S \subset U$. Therefore the bracketed expression can be replaced by the following:
$$\sum_{s=t}^{u} C_{u-t}^{s-t}\,(-1)^{s-t},$$
but this is the binomial expansion of $(1-1)^{u-t}$; hence it is 0 for all $t < u$, and 1 for $t = u$. Therefore, for all $U \subset N$,
$$\sum_{S\mid S\subset N} c_S w_S(U) = v(U).$$

We shall prove the uniqueness of representation (3.13.4). To every characteristic function $v$ corresponds an element of the space $R^{2^n-1}$: order the nonempty coalitions $T \subset N$; then to every nonempty coalition $T$ corresponds the component of the vector equal to $v(T)$. These vectors will be denoted, like the functions, by $v$. It is obvious that to the simple characteristic functions $w_S$ correspond vectors whose components are 0 or 1. We shall prove that the simple characteristic functions (or, more precisely, their associated vectors) are linearly independent. Indeed, let
$$\sum_{S\subset N} \lambda_S w_S(T) = 0 \quad \text{for all } T \subset N.$$
Then for $T = \{i\}$ we have $w_S(\{i\}) = 0$ if $S \ne \{i\}$, and $w_S(\{i\}) = 1$ if $S = \{i\}$. Hence $\lambda_{\{i\}} = 0$ for all $i \in N$. We continue the proof by induction. Let $\lambda_S = 0$ for all $S \subset T$, $S \ne T$. We show that $\lambda_T = 0$. Indeed,
$$\sum_{S\subset N} \lambda_S w_S(T) = \sum_{S\subset T} \lambda_S w_S(T) = \lambda_T = 0.$$
Now, we have $2^n - 1$ linearly independent vectors in $R^{2^n-1}$; therefore every vector, and hence every characteristic function $v$, is uniquely expressed as a linear combination (3.13.4) of the simple characteristic functions $w_S$. This completes the proof of the lemma.
3.13.5. We shall now turn to the proof of Theorem 3.13.2. Lemma 3.13.4 shows that any game can be represented as a linear combination of the games $w_S$ and that the representation (3.13.4) is unique. By 3.13.3, the function $\varphi[v]$ is then uniquely defined by relations (3.13.3), (3.13.2).
Let $(N,v)$ be an arbitrary game. We now obtain an expression for the vector $\varphi[v]$. By 3.13.3 and 3.13.4,
$$\varphi_i[v] = \sum_{S\mid S\subset N} c_S\,\varphi_i[w_S] = \sum_{S\mid i\in S\subset N} c_S\,(1/s),$$
where the $c_S$ are determined by (3.13.5). Substituting (3.13.5) into this expression, we obtain
$$\varphi_i[v] = \sum_{S\mid i\in S\subset N} (1/s)\Bigl[\sum_{T\mid T\subset S} (-1)^{s-t}\,v(T)\Bigr] = \sum_{T\mid T\subset N}\Bigl[\sum_{S\mid T\cup\{i\}\subset S\subset N} (-1)^{s-t}\,(1/s)\Bigr]v(T).$$
Set
$$\gamma_i(T) = \sum_{S\mid T\cup\{i\}\subset S\subset N} (-1)^{s-t}\,(1/s). \qquad (3.13.6)$$
If $i \notin T'$ and $T = T'\cup\{i\}$, then $\gamma_i(T') = -\gamma_i(T)$. In fact, all terms on the right-hand side of (3.13.6) are in both cases the same, and only $t = t'+1$; hence they differ only in sign. Thus we have
$$\varphi_i[v] = \sum_{T\mid i\in T\subset N} \gamma_i(T)\,[v(T) - v(T\setminus\{i\})].$$
Further, if $i \in T$, then there are exactly $C_{n-t}^{s-t}$ coalitions $S$ with $s$ elements such that $T \subset S$. This brings us to a well-known definite integral:
$$\gamma_i(T) = \sum_{s=t}^{n} (-1)^{s-t}\,C_{n-t}^{s-t}\,(1/s) = \sum_{s=t}^{n} (-1)^{s-t}\,C_{n-t}^{s-t}\int_0^1 x^{s-1}\,dx = \int_0^1 x^{t-1}(1-x)^{n-t}\,dx.$$
Thus we have
$$\gamma_i(T) = \frac{(t-1)!\,(n-t)!}{n!},$$
and hence
$$\varphi_i[v] = \sum_{T\mid i\in T\subset N} \frac{(t-1)!\,(n-t)!}{n!}\,[v(T) - v(T\setminus\{i\})]. \qquad (3.13.7)$$
Equation (3.13.7) explicitly determines the components of the Shapley value. This expression satisfies axioms 1-3 of 3.13.2.
Note that the vector $\varphi[v]$ is an imputation. Indeed, by the superadditivity of the function $v$,
$$\varphi_i[v] \ge v(\{i\})\sum_{T\mid i\in T\subset N} \frac{(t-1)!\,(n-t)!}{n!} = v(\{i\}),$$
since the coefficients in (3.13.7) sum to one.

3.13.6. Axiomatic definition apart, the Shapley value expressed by (3.13.7) can be interpreted conceptually as follows. Suppose the players (the elements of the set $N$) have decided to meet in a specified place at a specified time. It would appear natural that, because of random deviations, they would arrive at various instants of time. However, it is assumed that all the players' arrival orders (i.e. their permutations) have the same probability, namely $1/n!$. Suppose that if, on arrival, player $i$ finds in place (only) the members of the coalition $T\setminus\{i\}$, then he receives the payoff $v(T) - v(T\setminus\{i\})$, that is, the marginal amount he contributes to that coalition. Then the component $\varphi_i[v]$ of the Shapley value represents the mathematical expectation of player $i$'s payoff under this randomization.
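A minimal sketch (not from the book) implementing formula (3.13.7) directly, again for the "jazz band" game with the dictionary v from the earlier sketches:

```python
# A minimal sketch: the Shapley value by formula (3.13.7).
from itertools import combinations
from math import factorial

def shapley(v, players):
    n = len(players)
    phi = {}
    for i in players:
        total = 0.0
        for r in range(n):                       # T \ {i} ranges over subsets
            for rest in combinations(players - {i}, r):
                T = frozenset(rest) | {i}
                t = len(T)
                weight = factorial(t - 1) * factorial(n - t) / factorial(n)
                total += weight * (v[T] - v[T - {i}])
        phi[i] = total
    return phi

print(shapley(v, {1, 2, 3}))   # {1: 35.0, 2: 47.5, 3: 17.5}; sums to v(N) = 100
```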
3.13.7. For a simple game (as in 3.12.6), the formula for the Shapley value is particularly descriptive. Indeed, $v(T) - v(T\setminus\{i\})$ is always either 0 or 1, and this expression equals 1 exactly when the coalition $T$ wins while the coalition $T\setminus\{i\}$ does not. Hence we have
$$\varphi_i[v] = \sum_T \frac{(t-1)!\,(n-t)!}{n!},$$
where the summation extends over all those winning coalitions $T \ni i$ for which the coalition $T\setminus\{i\}$ is not winning.
Example 22. (Game with a major player.) [Vorobjev (1977)]. The game is played by n players. One of the players is called "major". A coalition $S$ wins 1 if it contains either the major player and at least one more player, or all the $n-1$ "ordinary" players. If $n$ is the major player, then the characteristic function of this game can be written as
$$v(S) = \begin{cases} 1, & S \supset \{i, n\},\ i \ne n, \\ 1, & S \supset \{1,\dots,n-1\}, \\ 0, & \text{otherwise.} \end{cases}$$
It is obvious that the conditions $v(T) = 1$ and $v(T\setminus\{n\}) = 0$ hold for a coalition $T \ni n$ if and only if $2 \le |T| \le n-1$. Hence
$$\varphi_n[v] = \sum_{t=2}^{n-1} C_{n-1}^{t-1}\,\frac{(t-1)!\,(n-t)!}{n!} = \frac{n-2}{n}.$$
Since the game is in (0-1)-reduced form,
$$\sum_{i=1}^{n-1} \varphi_i[v] = 1 - \varphi_n[v] = \frac{2}{n}.$$
All ordinary players possess equal rights; hence, by symmetry,
$$\varphi_i[v] = \frac{2}{n(n-1)}, \quad i = 1,\dots,n-1.$$
Now, the "monopolistic" position of the major player ensures him a payoff $(n-1)(n-2)/2$ times that of an ordinary player.
3.13.8. Example 23. ("Land-lord and farm labourers".) [Vorobjev (1977)]. Suppose there are $n-1$ farm labourers (players $i = 1,\dots,n-1$) and a land-lord (player $n$). The land-lord engages $k$ labourers and derives from the harvest a profit $f(k)$ ($f(k)$ increases monotonically). The farm labourers cannot derive any profit by themselves. This is described by the characteristic function
$$v(S) = \begin{cases} f(|S|-1), & n \in S, \\ 0, & \text{otherwise.} \end{cases}$$
Here, for all $T \ni n$, $|T| > 1$, we have $v(T) - v(T\setminus\{n\}) = f(t-1)$, where $t = |T|$, and from (3.13.7) it follows that
$$\varphi_n[v] = \sum_{t=2}^{n} C_{n-1}^{t-1}\,\frac{(t-1)!\,(n-t)!}{n!}\,f(t-1) = \frac{1}{n}\sum_{k=1}^{n-1} f(k).$$
By efficiency and the symmetry of all labourers, we have
$$\varphi_i[v] = \frac{1}{n-1}\Bigl(f(n-1) - \frac{1}{n}\sum_{k=1}^{n-1} f(k)\Bigr), \quad i = 1,\dots,n-1.$$
In what follows we shall denote $\varphi_i[v]$ by $Sh_i$.

3.14 The potential of the Shapley value
3.14.1. Consider, as before, n-person games in characteristic function form with transferable payoffs. We have studied different solution concepts, i.e. different optimality principles. Some of them constitute subsets of payoff vectors or imputations (such as the core and the NM-solution). Finally, the Shapley value represents an optimality principle consisting of a unique payoff vector (a unique imputation). In this section we follow Hart and Mas-Colell (1988) and introduce a single number which specifies the cooperative game. Using the "marginal contribution" principle, we assign to each player his marginal contribution according to the numbers defined for the game and its subgames. It turns out that the single requirement that the resulting payoff vector be "efficient" (i.e. that the payoffs add up to the worth of the grand coalition) determines this process uniquely.
3.14.2. A cooperative game with transferable payoffs is a pair $(N,v)$, where $N$ is a finite set of players and $v : 2^N \to R$ is the characteristic function, satisfying $v(\emptyset) = 0$. A subset $S \subset N$ is called a coalition, and $v(S)$ is the worth of the coalition $S$. Given a game $(N,v)$ and a coalition $S \subset N$, we write $(S,v)$ for the subgame obtained by restricting $v$ to (the subsets of) $S$; that is, the domain of the function $v$ is restricted to $2^S$.
3.14.3. Let $\Gamma$ denote the set of all games. Given a function $P : \Gamma \to R$ that associates a real number $P(N,v)$ with every game $(N,v)$, the marginal contribution of player $i$ in the game $(N,v)$ is defined as
$$D^iP(N,v) = P(N,v) - P(N\setminus\{i\}, v),$$
where $i \in N$. (The game $(N\setminus\{i\}, v)$ is the restriction of $(N,v)$ to $N\setminus\{i\}$.)
A function $P : \Gamma \to R$ with $P(\emptyset, v) = 0$ is called a potential function if it satisfies the condition
$$\sum_{i\in N} D^iP(N,v) = v(N) \qquad (3.14.1)$$
for all games $(N,v)$. Thus, a potential function is such that its marginals are always efficient; that is, they add up to the worth of the grand coalition.
3.14.4. Theorem. There exists a unique potential function $P$. For every game $(N,v)$ the resulting payoff vector $(D^iP(N,v))_{i\in N}$ of marginal contributions coincides with the Shapley value of the game. Moreover, the potential of a game $(N,v)$ is uniquely determined by (3.14.1) applied only to the game and its subgames (i.e., to $(S,v)$ for all $S \subset N$).
Proof. Rewrite (3.14.1) as
$$P(N,v) = \frac{1}{|N|}\Bigl[v(N) + \sum_{i\in N} P(N\setminus\{i\}, v)\Bigr]. \qquad (3.14.2)$$
Starting with $P(\emptyset, v) = 0$, (3.14.2) determines $P(N,v)$ recursively. This proves the existence and uniqueness of the potential function $P$, and that $P(N,v)$ is uniquely determined by (3.14.1) (or (3.14.2)) applied just to $(S,v)$ for all $S \subset N$.
It remains to show that $D^iP(N,v) = Sh_i(N,v)$ for all games $(N,v)$ and all players $i \in N$, where $P$ is the (unique) potential function and $Sh_i(N,v)$ denotes the Shapley value of player $i$ in the game $(N,v)$. We prove that all the axioms that uniquely determine the Shapley value are satisfied by $D^iP$. Efficiency is just (3.14.1); the other three axioms - dummy (null) player, symmetry, and additivity - are proved inductively using (3.14.2). Indeed, let $i$ be a null player in the game $(N,v)$ (i.e., $v(S) = v(S\setminus\{i\})$ for all $S$). We claim that this implies $P(N,v) = P(N\setminus\{i\}, v)$; hence $D^iP(N,v) = 0$. Assume the assertion holds for all games with fewer than $|N|$ players; in particular, $P(N\setminus\{j\}, v) = P(N\setminus\{j,i\}, v)$ for all $j \ne i$. Now subtract (3.14.2) for $N\setminus\{i\}$ from (3.14.2) for $N$ to obtain
$$|N|\,[P(N,v) - P(N\setminus\{i\}, v)] = [v(N) - v(N\setminus\{i\})] + \sum_{j\ne i}[P(N\setminus\{j\}, v) - P(N\setminus\{j,i\}, v)] = 0.$$
Next, assume players $i$ and $j$ are substitutes in the game $(N,v)$. This implies that $P(N\setminus\{i\}, v) = P(N\setminus\{j\}, v)$ (use (3.14.2), noting that $i$ and $j$ are substitutes in $(N\setminus\{k\}, v)$ for all $k \ne i, j$); thus $D^iP(N,v) = D^jP(N,v)$. Finally, another inductive argument on (3.14.2) shows that $P(N, v+w) = P(N,v) + P(N,w)$, implying additivity.
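A minimal computational sketch (not from the book): the recursion (3.14.2) memoized over coalitions, with the Shapley value recovered as marginal contributions, again for the "jazz band" game with the dictionary v from the earlier sketches.

```python
# A minimal sketch: the potential via recursion (3.14.2); its marginal
# contributions D^i P reproduce the Shapley value.
from functools import lru_cache

@lru_cache(maxsize=None)
def potential(coalition):           # coalition: frozenset of players
    if not coalition:
        return 0.0                  # P(empty) = 0
    return (v[coalition]
            + sum(potential(coalition - {i}) for i in coalition)) / len(coalition)

N = frozenset({1, 2, 3})
print({i: potential(N) - potential(N - {i}) for i in N})
# {1: 35.0, 2: 47.5, 3: 17.5} -- the Shapley value, matching formula (3.13.7)
```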
We now present another way of viewing the potential. Given a game $(N,v)$, the allocation of marginal contributions (i.e., $v(N) - v(N\setminus\{i\})$ to player $i$) is, in general, not efficient. One way to resolve this difficulty is to add a new player, say player 0, and extend the game to $N_0 = N\cup\{0\}$ in such a way that the allocation of marginal contributions in the extended game becomes efficient. Formally, let $(N_0, v_0)$ be an extension of $(N,v)$ (i.e., $v_0(S) = v(S)$ for all $S \subset N$). Then the requirement is
$$v_0(N_0) = \sum_{i\in N_0}[v_0(N_0) - v_0(N_0\setminus\{i\})] = [v_0(N_0) - v(N)] + \sum_{i\in N}[v_0(N_0) - v_0(N_0\setminus\{i\})]. \qquad (3.14.3)$$
This reduces to
$$v(N) = \sum_{i\in N}[v_0(N_0) - v_0(N_0\setminus\{i\})], \qquad (3.14.4)$$
which yields the following restatement of the result of the theorem.


3.14.5. Corollary. There exists a unique extension $v_0$ of $v$ whose marginal contributions to the grand coalition are always efficient (more precisely, (3.14.3) is satisfied for the game and all its subgames); it is given by $v_0(S\cup\{0\}) = P(S,v)$ for all $S \subset N$, where $P$ is the potential function.
Note that the payoffs to the original players (in $N$) add up correctly to $v(N)$ by (3.14.4); these are the Shapley values. Player 0, whose payoff is the residual $P(N,v) - v(N)$, may be regarded as a "hidden player", similarly to the "hidden factor" introduced by McKenzie in the study of production functions in order to explain the residual profit (or loss).
In (3.14.1) and (3.14.2) the potential is only given implicitly. We now present two explicit formulas. The T-unanimity game $u_T$ (where $T$ is a nonempty finite set) is defined by $u_T(S) = 1$ if $S \supset T$, and $u_T(S) = 0$ otherwise. It is well known that these games form a linear basis for $\Gamma$: each game $(N,v)$ has a unique representation (see Shapley (1953) and Sec. 3.13)
$$v = \sum_{T\subset N} a_T\,u_T,$$
where, for all $T \subset N$,
$$a_T = a_T(N,v) = \sum_{S\subset T} (-1)^{|T|-|S|}\,v(S). \qquad (3.14.5)$$

3.14.6. Theorem. The potential function $P$ satisfies
$$P(N,v) = \sum_{T\subset N} \frac{a_T}{|T|}$$
for all games $(N,v)$, where $a_T$ is given by (3.14.5).
Proof. Let $Q(N,v)$ denote the right-hand side of the preceding formula. Then $Q(\emptyset, v) = 0$ and $Q(N,v) - Q(N\setminus\{i\}, v) = \sum_{T\ni i} a_T/|T|$, which, when summed over $i$, shows that $Q$ satisfies (3.14.1). Therefore, by Theorem 3.14.4, $Q$ coincides with the unique potential function $P$.
The number $d_T = a_T/|T|$ is called the dividend of each member of the coalition $T$, and $Sh_i(N,v) = \sum_{T\ni i} d_T$ [Harsanyi (1963)].
3.14.7. Theorem. The potential function $P$ satisfies
$$P(N,v) = \sum_{S\subset N} \frac{(s-1)!\,(n-s)!}{n!}\,v(S)$$
for all games $(N,v)$, where $n = |N|$ and $s = |S|$.
Proof. The marginal contributions of the function on the right side are easily seen to yield the Shapley value.
To interpret this last formula, consider the following probabilistic model of choosing a random nonempty coalition $S \subset N$. First, choose a size $s = 1, 2, \dots, n = |N|$ uniformly (i.e., with probability $1/n$ each). Second, choose a subset $S$ of size $s$, again uniformly (i.e., each of the $C_n^s$ subsets has the same probability). Equivalently, choose a random order of the $n$ elements of $N$ (with probability $1/n!$ each), choose a cutting point $s$ ($1 \le s \le n$) uniformly, and let $S$ be the first $s$ elements in that order. The probability of choosing a set $S$ with $|S| = s$ is
$$\pi_S = \frac{1}{n}\cdot\frac{s!\,(n-s)!}{n!} = \frac{s}{n}\cdot\frac{(s-1)!\,(n-s)!}{n!}.$$
Therefore the formula of Theorem 3.14.7 may be rewritten as
$$P(N,v) = \sum_{S\subset N} \pi_S\,\frac{n\,v(S)}{s} = E\Bigl[\frac{n\,v(S)}{s}\Bigr], \qquad (3.14.6)$$
where $E$ denotes the expectation over $S$ with respect to the foregoing probability model. The interpretation of (3.14.6) is that the potential is the expected normalized worth or, equivalently, the per capita potential $P(N,v)/|N|$ equals the average per capita worth $v(S)/|S|$. This shows that the potential may be viewed as an appropriate "summary" of the characteristic function in one number (from which marginal contributions are then computed).
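As a consistency check (not from the book), the explicit formula of Theorem 3.14.7 can be compared against the recursive potential of the earlier sketch:

```python
# A minimal sketch: Theorem 3.14.7's explicit formula agrees with the
# recursion (3.14.2) on the "jazz band" game.
from itertools import combinations
from math import factorial

n = 3
P = sum(factorial(len(S) - 1) * factorial(n - len(S)) / factorial(n)
        * v[frozenset(S)]
        for k in range(1, n + 1) for S in combinations({1, 2, 3}, k))
print(P, potential(frozenset({1, 2, 3})))   # both equal 82.5
```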

3.15 The Shapley value for a minimum cost spanning tree game
3.15.1. The minimum cost spanning tree (MCST) game is a cooperative game that arises from a cost allocation problem in a complete weighted graph. A subset of the nodes of this graph represents sources that can provide the other nodes (which are identified with the players) with a certain good or service. The value of a coalition in the MCST game is determined by the minimal cost of providing the members of that coalition with the good or service involved, without help from the players outside the coalition. This minimal cost is precisely the cost of a MCST in the weighted subgraph induced by the coalition involved. MCST games were introduced by Bird (1976), but here we follow the definition from Aarts and Driessen (1993). We shall use the following notation and definitions concerning MCST games.
3.15.2. Let $\tilde N = M\cup N$, $M\cap N = \emptyset$, be the node set.
Definition. A network on $\tilde N$ is an ordered pair $(K_{\tilde N}, w)$, where $K_{\tilde N} = (N\cup M, E(K_{\tilde N}))$ represents the complete graph with node set $N\cup M$ and set of undirected edges $E(K_{\tilde N}) = \{(x,y) \mid x, y \in N\cup M,\ x \ne y\}$, and $w : E(K_{\tilde N}) \to R_+$ represents a non-negative function on $E(K_{\tilde N})$.
The nodes in $N$ are interpreted as the users in the network, and the nodes in $M$ as common suppliers. The function $w$ is called the weight function of the network. Let $2^{\tilde N} := \{S \mid S \subset \tilde N\}$.
3.15.3. Definition. Let $(K_{\tilde N}, w)$ be a network on $\tilde N = M\cup N$ and $S \in 2^{\tilde N}$. The subnetwork of $(K_{\tilde N}, w)$ on $S$ is the ordered pair $(K_S, w)$, where $K_S = (S, E(K_S))$ represents the complete graph with node set $S$ and edge set $E(K_S) = \{(x,y) \mid x, y \in S,\ x \ne y\}$.
The restriction of the weight function $w$ to the edge set $E(K_S)$ is also denoted by $w$.
3.15.4. Definition. The minimum cost spanning tree (MCST) game corresponding to the network $(K_{\tilde N}, w)$ is the cooperative cost game in characteristic function form $(N, c)$, where the characteristic function $c : 2^N \to R$ is given by $c(\emptyset) = 0$ and, for all $S = M\cup K$, $K \in 2^N$,
$c(S)$ = the total weight of a MCST $(S, E(T_S))$ in the subnetwork $(K_S, w)$, i.e.
$$c(S) = \sum_{l\in E(T_S)} w(l).$$
In the game-theoretic interpretation the elements of $N$ are called players.
3.15.5. Following Kazakova-Frehse (1994) we shall compute the Shapley value for a special class of MCST games. Suppose that $k(z', z'') > 0$ for $z' \in M$, $z'' \in N$; $k(z'', z') > 0$ for $z'' \in N$, $z' \in M$; and $k(x,y) = 0$ in all other cases.
Let $S = M\cup K$, where $K \subset N$ and $K \ne \emptyset$. Define the characteristic function $c(S)$ in the following way. For $S = M\cup K$,
$$c(M\cup K) = \sum_{x\in K}\sum_{z'\in M} k(x, z') + \sum_{x\in K}\sum_{z'\in M} k(z', x); \qquad (3.15.1)$$
for all other $S \subset M\cup N$, $c(S) = 0$.
We now compute the Shapley value for the cooperative MCST game with the characteristic function (3.15.1) defined above. To this end we define the potential $P(M\cup N, c)$ of the game and use the theorem which states that the resulting payoff vector $(D^xP(M\cup N, c))_{x\in M\cup N}$ of marginal contributions coincides with the Shapley value of the game.
The potential function is computed by the formula of Theorem 3.14.7:
$$P(N\cup M, c) = \sum_{S\subset M\cup N} \frac{(s-1)!\,(n+m-s)!}{(n+m)!}\,c(S),$$
where $n = |N|$, $m = |M|$, $s = |S|$. As we have seen in the theorem, the marginal contributions of the function on the right side yield the Shapley value.
contributions of the function on the right side yield the Shapley value.
Introduce $\alpha(x)$ for $x \in N$ by the formula
$$\alpha(x) = \sum_{z\in M} k(x, z) + \sum_{z\in M} k(z, x);$$
then it is easily seen that for $S = M\cup K$, $K \subset N$, $|K| = k \ge 1$,
$$c(S) = \sum_{x\in S\cap N} \alpha(x).$$

For the cooperative MCST game with the above defined characteristic function we
have
{ l a)
p(Mu^c
P(MUN,C)=)= E n m
SCNuM
E
!
^'- ;Zi:~
T i r ^<s)V )
SCNuM ((n +
+ m) !
)
^ (m
{m +
+ kk-- l)!(m + n - (m + A))!
-= 2_,
Z* /r nn -i-, mm'if,, ' <w
c
^'
S=MuK, \K\>i
\K\>1 \ "'" r

^ ( m + * - l ) !fc-l)!(n-*)!
(n-*)! ^ aa
~ Z,
~ Z, /(nn +, TnmV\t Z,
^ **
it=l
k=l (\ n ^
^ Tn>-
>- iSnJV=K, |K|=*>1
*SnJV=K, |K|=*>1
" (m + fc-l)!(n-*)! t _ i v
(3.15.2)
m
*=1 V" "" l- xN
Denote the coefficient

A(n, m) = Σ_{k=1}^{n} [(m + k − 1)!(n − k)!/(n + m)!] C_{n−1}^{k−1}.
For m = 1, A(n, 1) does not depend upon n and

A(n, 1) = [1/(n(n + 1))] Σ_{k=1}^{n} k = 1/2;

for m = 2, A(n, 2) does not depend upon n and

A(n, 2) = Σ_{k=1}^{n} k(k + 1)/[(n + 2)(n + 1)n] = 1/3.

Prove by induction that A(n, m) does not depend upon n and

A(n, m) = 1/(m + 1).

We have

A(n + 1, m) = Σ_{k=1}^{n+1} n!(m + k − 1)!/[(k − 1)!(m + n)!(n + m + 1)].

Separating the term with k = n + 1, which equals 1/(n + m + 1), and noting that each remaining term equals n/(n + m + 1) times the corresponding term of A(n, m), we get

A(n + 1, m) = [n/(n + m + 1)] A(n, m) + 1/(n + m + 1).

By the induction hypothesis

A(n, m) = 1/(m + 1),

and we have

A(n + 1, m) = [1/(n + m + 1)](n A(n, m) + 1) = [1/(n + m + 1)][n/(m + 1) + 1] = 1/(m + 1).
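The identity A(n, m) = 1/(m + 1) is also easy to confirm numerically; the following small fragment (an illustration of the defining sum, using exact rational arithmetic) checks it for a range of n and m.

    from math import comb, factorial
    from fractions import Fraction

    def A(n, m):
        # A(n, m) = sum_{k=1}^{n} (m+k-1)!(n-k)!/(n+m)! * C(n-1, k-1)
        return sum(Fraction(factorial(m + k - 1) * factorial(n - k),
                            factorial(n + m)) * comb(n - 1, k - 1)
                   for k in range(1, n + 1))

    assert all(A(n, m) == Fraction(1, m + 1)
               for n in range(1, 9) for m in range(1, 9))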
Using the expression (3.15.2),

P(N ∪ M, c) = A(n, m) Σ_{x∈N} α_x = [1/(m + 1)] Σ_{x∈N} α_x.

For x ∈ N we get for the Shapley value

Sh_x = P(N ∪ M, c) − P[(N \ {x}) ∪ M, c] = α_x/(m + 1),

and for x ∈ M, since P[N ∪ (M \ {x}), c] = 0,

Sh_x = P(N ∪ M, c) − P[N ∪ (M \ {x}), c] = [1/(m + 1)] Σ_{x∈N} α_x.
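The closed-form answer can be checked against the defining Shapley formula. In the sketch below (a minimal illustration; the labelling of players as ('M', j) and ('N', i) and the test data are assumptions) the brute-force Shapley value of the game (3.15.1) is compared with α_x/(m + 1) for users and with (1/(m + 1)) Σ_z α_z for suppliers.

    from itertools import combinations
    from math import factorial
    from fractions import Fraction

    def shapley(players, c):
        # Sh_x = sum over coalitions T not containing x of
        #        |T|!(n-|T|-1)!/n! * (c(T u {x}) - c(T)).
        n, sh = len(players), {}
        for x in players:
            others = [p for p in players if p != x]
            total = Fraction(0)
            for t in range(n):
                for T in combinations(others, t):
                    w = Fraction(factorial(t) * factorial(n - t - 1),
                                 factorial(n))
                    total += w * (c(frozenset(T) | {x}) - c(frozenset(T)))
            sh[x] = total
        return sh

    M = [('M', 1), ('M', 2)]
    N = [('N', 1), ('N', 2), ('N', 3)]
    alpha = {x: Fraction(j) for j, x in enumerate(N, start=1)}  # test data

    def c(S):
        # (3.15.1): c(S) is the sum of alpha(x) over users in S when S
        # contains all suppliers, and c(S) = 0 for every other coalition.
        if not set(M) <= S:
            return Fraction(0)
        return sum(alpha[x] for x in S if x in alpha)

    sh, m = shapley(M + N, c), len(M)
    assert all(sh[x] == alpha[x] / (m + 1) for x in N)
    assert all(sh[x] == sum(alpha.values()) / (m + 1) for x in M)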

3.16 Exercises and problems


1. Two companies are engaged in the exploration of n mineral deposits. The exploration funds allocated by companies 1 and 2 are α and β, respectively. The profit from mining the ith field is γ_i > 0. It is distributed between the companies in proportion to their investments in the ith field. If they make no investment in the ith field, then their profits from this field are zero.
(a) Describe this conflict as a two-person game, taking a payoff to each company
to be the total profit from mining of all fields.
(b) Find a Nash equilibrium.
Hint. Use the convexity of the function H₁ in x and that of H₂ in y.
2. In an ecologically significant region there are n industrial enterprises, each having one pollution source. The concentration q_i of emission from the ith enterprise is proportional to the value 0 ≤ x_i ≤ a_i, i = 1, ..., n, of emission from this enterprise. The losses incurred by the ith enterprise are made up of the waste utilization expenses f_i(x_i) and the pollution tax, which is proportional to the total concentration q of emission from all enterprises. The quantity q should not exceed q̄, the maximum permissible level of emission concentration; otherwise each ith enterprise has to pay an extra penalty s_i.
Describe this conflict as a noncooperative n-person game, taking the losses incurred by each enterprise to be the total environmental protection expenses.
Hint. Use the result of Example 5, 3.1.4.
3. Find the sets of all Nash equilibria (in pure strategies) in the following (m × n) bimatrix games with the matrices A = {α_ij} and B = {β_ij}.
(a) The matrices A and B are diagonal and positive, i.e. m = n, α_ij = β_ij = 0, i ≠ j, and α_ii > 0, β_ii > 0, i = 1, ..., m, j = 1, ..., n.
(b)

A = [ 2 0 5      B = [ 2 2 1
      2 2 3 ],         0 7 8 ].

(c)

A = [ 3 8 −1     B = [ 1 3 4
      4 0  2           2 1 8
      1 2  3 ],        2 3 0 ].
4. Show that in the bimatrix game with the matrices

A = [ 1 2 0      B = [ 3 4 0
      1 3 1            1 3 2
      2 2 1 ],         1 3 0 ]

the situation (2,2) is an equilibrium. Is it a strong equilibrium?


5. Find all Pareto optimal situations in pure strategies in the bimatrix game with the matrices

A = [ 4 1 0      B = [ 0 5 6
      2 7 5            7 0 2
      6 0 1 ],         2 6 1 ].

Does this game have pure strategy equilibria?
6. Show graphically, in coordinates (K₁, K₂), the set of all possible mixed strategy payoff vectors in the game "battle of the sexes" (see 3.1.4).
Hint. Arbitrary mixed strategies x and y for Players 1 and 2, respectively, can be written as x = (ξ, 1 − ξ), y = (η, 1 − η), ξ, η ∈ [0,1]. Writing the mixed strategy payoff functions K₁ and K₂ and eliminating one of the parameters we obtain a single-parameter family of line segments, the union of which is the required set (see Fig. 3.2). The curvilinear part of the boundary represents an envelope for this family of line segments and is a part of the parabola 5K₁² + 5K₂² − 10K₁K₂ − 18(K₁ + K₂) + 45 = 0.

7. Find a completely mixed Nash equilibrium in the bimatrix game with the matrices

A = [ 6 0 2      B = [ 6 0 7
      0 4 3            0 4 0
      7 0 0 ],         2 3 0 ].

Does this game have other equilibria in mixed strategies?
Hint. First find a completely mixed equilibrium (x, y), x = (ξ₁, ξ₂, ξ₃), y = (η₁, η₂, η₃), then an equilibrium for which ξ₁ = 0, etc.
8. "Originality game". [Vorobjev (1984)]. Consider a noncooperative n-person
game T = (N,{X,}ieN,{H,}ieN), where X, = {0,1}, /f<(0,... ,0||,1) = g, > 0,
i / , ( l , . . . , 1||,0) = hi > 0, Hi(x) = 0 in the remaining cases where ||, means that a
replacement is made in the ith position.
(a) Interpret the game in terms of advertising,
(b) Find a completely mixed equilibrium.
9. As is shown in 1.10.1, zero-sum two-person games can be solved by the "fictitious play" method. Examining the bimatrix game with the matrices

A = [ 2 0 1      B = [ 1 0 2
      1 2 0            2 1 0
      0 1 2 ],         0 2 1 ],

show that this method cannot be used in finding an equilibrium in bimatrix games.
10. "Musical chairs" game. [Moulin (1981)]. There are two players and three
chairs designated by numbers 1,2,3. A strategy of a player is to choose a chair number.
Both players may suffer losses due to a choice of the same chair. If, however, their
choices are different, then the player, say i, whose chair is located immediately after
player j ' s chair, wins twice as much as player j (it is assumed that chair 1 is located
after chair 3). We have the bimatrix game T(A, B),

(0,0) (1,2) (2,1)


(A,B) (2,1) (0,0) (1,2)
L (1,2) (2,1) (0,0)

(a) Show that the unique completely mixed Nash equilibrium is an equiprobable
choice of chairs to be made by each player.
(b) Show that an equilibrium in joint mixed strategies is of the form

1/6, if i ? i,
L(hf to, if i = j .
(c) Show that the payoffs in Nash equilibrium are not Pareto optimal, while a
joint mixed strategy equilibrium may result in Pareto optimal payoffs (3/2,3/2).
11. The equilibrium in joint mixed strategies does not imply that the players must necessarily follow the pure strategies resulting from the adopted joint mixed strategy (see the definition in 3.6.1). However, if we must adhere to the results of a particular realization of the joint mixed strategy, then it is possible to extend the concept of an

"equilibrium in joint mixed strategies". For all i 6 N, denote by /i(N \ {i}) the
restriction of distribution fi to the set XN\^} = n, 6 N\{i} Xi> namely

P{N\ {'= M(*II*)

for all a; 6 UieN & We say that fi is the weak equilibrium in joint mixed strategies
if the following inequalities hold for all i N and i/i Xi~.

Hi(x)p{x)> H(x\)yMN\ {*})

(a) Prove that any equilibrium in joint mixed strategies is the weak equilibrium
in joint mixed strategies.
(b) Let fi = (iii,..., fin) be a mixed strategy situation in the game T. Show that
the probability measure Ji = HiN in on the set X = ILeN Xi is a weak equilbrium in
joint mixed strategies and an equilibrium in joint strategies if and only if the situation
ft = (fii,..., fin) is Nash equilibrium.
12. (a) Prove that in the game formulated in Ex. 10 the set of Nash equilibria, the
set of joint strategy equilibria and the set of weak equilibria in joint mixed strategies
do not coincide.
(b) Show that the interval [(5/3,4/3), (4/3,5/3)] is covered by the set of vector
payoffs that are Pareto optimal among the payoffs in joint mixed strategy equilibria,
while the interval [(2,1),(1,2)] is covered by the payoffs that are Pareto optimal
among the weak equilibrium payoffs in joint mixed strategies.
13. Find an arbitration solution to the bimatrix game with the matrices

A = [  2 −1      B = [  1 −1
      −1  1 ],         −1  2 ]

by employing the Nash bargaining procedure.
14. Consider the bimatrix (2 × 2) game with the matrix

              β₁      β₂
(A, B) = α₁ (1,1)   (1,2)
         α₂ (2,1)   (−5,0)

This is a modification of the "crossroads" game (see Example 2 in 3.1.4) with the following distinction. A car driver (Player 1) and a truck driver (Player 2) make different assessments of an accident (situation (α₂, β₂)). Show that an analysis of the game in threat strategies prescribes the situation (α₂, β₁), i.e. the car must "go" and the truck must "make a stop".
15. Suppose the kernel has a nonempty intersection with all the bounds z_i = v({i}) of the imputation set. Show that in this case it is a unique NM-solution.
16. For the cooperative game (N, v) we define a semiimputation to be a vector α = (α₁, ..., α_n) for which α_i ≥ v({i}) and Σ_{i=1}^n α_i ≤ v(N). Show that if L is an NM-solution of the game (N, v) and α is a semiimputation which does not belong to L, then there exists an imputation β ∈ L such that β > α.

17. For the game (N, v) we define β_i by

β_i = max_{S⊆N\{i}} [v(S ∪ {i}) − v(S)].

Show that if there is an i for which α_i > β_i, then the imputation α can belong neither to the core nor to any of the NM-solutions.
18. Let (N, v) be a simple game in (0,1)-reduced form (see 3.10.6). Player i is called a "veto" player if v(N \ {i}) = 0.
(a) Prove that in order for the core to be nonempty in a simple game, it is necessary and sufficient that there be at least one "veto" player in the game.
(b) Let S₀ be the set of all "veto" players. Show that the imputation α = (α₁, ..., α_n) belongs to the core if Σ_{i∈S₀} α_i = 1, α_i ≥ 0 for i ∈ S₀, and α_i = 0 for i ∉ S₀.
19. In the game (N, v) we interpret a quasiimputation to mean a vector α = (α₁, ..., α_n) such that Σ_{i∈N} α_i = v(N). For every ε ≥ 0 we define a strict ε-core C_ε(v) to be the set of quasiimputations such that for every coalition S

Σ_{i∈S} α_i ≥ v(S) − ε.

(a) Show that if ε < ε′, then C_ε(v) ⊆ C_{ε′}(v).
(b) Show that there exists the smallest number ε for which C_ε(v) ≠ ∅. For such an ε the set C_ε(v) is called the minimal ε-core and is denoted by MC(v).
(c) Find a minimal ε-core in the game (N, v), where N = {1,2,3}, v({i}) = 0, v({1,2}) = 50, v({1,3}) = 80, v({2,3}) = 90, v(N) = 100.
(d) Let (N, v), (N, v′) be two cooperative games and suppose the equality C_ε(v) = C_{ε′}(v′) ≠ ∅ holds for some ε and ε′. Show that then C_{ε−δ}(v) = C_{ε′−δ}(v′) for all δ > 0, δ ≤ min[ε, ε′].
20. Show that if (N, v) is a constant sum game (see 3.9.3), then the Shapley value Sh is determined by

Sh_i(v) = 2 Σ_{S: i∈S} [(n − s)!(s − 1)!/n!] v(S) − v(N).
21. The game (N, v) is called convex if for all S, T ⊆ N

v(S ∪ T) + v(S ∩ T) ≥ v(S) + v(T).

(a) Prove that a convex game has a nonempty core and the Shapley value belongs to the core.
(b) Show that (N, v) is a convex game if

v(S) = (Σ_{i∈S} m_i)², S ⊆ N,

where m = (m₁, ..., m_n) is a non-negative vector.

22. Consider a simple game (N, v) in (0,1)-reduced form. We interpret player i's "jump" to mean a set S ⊆ N for which v(S) = 1 and v(S \ {i}) = 0.
Denote by θ_i the number of player i's jumps in the game. Then the vector β(v) = (β₁(v), ..., β_n(v)), where β_i(v) = θ_i / Σ_{j=1}^n θ_j, is called a Banzhaf vector for a simple game.
(a) Show that θ₁ = 6, θ₂ = θ₃ = θ₄ = 2, and hence β(v) = (1/2, 1/6, 1/6, 1/6) for the simple four-person game (N, v) in which a coalition S wins if it comprises either two players with {1} ⊂ S, or three or four players.
(b) Show that in the above game β(v) coincides with the Shapley value.
23. Let (N, v) be a simple three-person game in which the coalitions (1,2), (1,3), (1,2,3) are the only winning coalitions. Show that in this game θ₁ = 3, θ₂ = θ₃ = 1, and hence the Banzhaf vector is β(v) = (3/5, 1/5, 1/5), while the Shapley value is Sh(v) = (2/3, 1/6, 1/6).
24. Consider a non-negative vector w = (w₁, ..., w_n) and a number θ > 0. Let 0 < θ < Σ_{i=1}^n w_i. The weighted majority game is taken to be a simple game (N, v) in which the characteristic function v is determined by

v(S) = 1, if Σ_{i∈S} w_i ≥ θ;  v(S) = 0, otherwise.

Let θ = 8 and w = (4,3,3,2,2,1), n = 6. Compute the Shapley value and the Banzhaf vector for this simple weighted majority game.
Chapter 4
Positional games

4.1 Multistage games with perfect information


4.1.1. The preceding chapters dealt with games in normal form. A dynamic (i.e. continuing over a period of time, not instantaneous) conflict-controlled process can be reduced to a normal form by the formal introduction of the notion of a pure strategy. In the few cases where the strategy spaces are small and numerical solution is possible, such an approach seems allowable. However, in the majority of problems connected with the optimal behavior of participants in a conflict-controlled process, the passage to normal form, i.e. the reduction of the problem to a single choice of pure strategies as elements of large-dimensional or functional spaces, does not lead to effective ways of finding solutions, though it permits illustration of one or another of the optimality principles. In a number of cases the general existence theorems for games in normal form do not allow finding, or even specifying, the optimal behavior in the games whose normalizations they are. As is shown below, in "chess" there exists a solution in pure strategies. This result, however, cannot be obtained by a direct investigation of matrix games. By investigation of differential games of pursuit and evasion it is possible in a number of cases to find explicit solutions. In such cases, however, the normal form of a differential game is so general that, for practical purposes, it is impossible to obtain specific results.
4.1.2. Mathematical dynamic models of conflict are investigated in the theory of
positional games. The simplest class of positional games is the class of finite stage
games with perfect information. To define a finite stage n-person game with perfect
information we need a rudimentary knowledge of graph theory.
Let X be a finite set. The rule f setting up a correspondence between every element x ∈ X and an element f(x) ∈ X is called a single-valued map of X into X, or a function defined on X and taking values in X. A set-valued map F of the set X into X is a rule which sets up a correspondence between every element x ∈ X and a subset F_x ⊆ X (here F_x = ∅ is not ruled out). In what follows, for simplicity, the term "map" will be interpreted to mean a "set-valued map".
Let F be a map of X into X, and let A ⊆ X. By the image of the set A we shall mean the

set

F_A := ∪_{x∈A} F_x.

By definition, let F(∅) = ∅. It can be seen that if A_i ⊆ X, i = 1, ..., n, then

F(∪_{i=1}^n A_i) = ∪_{i=1}^n F_{A_i},   F(∩_{i=1}^n A_i) ⊆ ∩_{i=1}^n F_{A_i}.

Define the maps F², F³, ..., F^k, ... as follows:

F_x² := F(F_x), F_x³ := F(F_x²), ..., F_x^k := F(F_x^{k−1}), ....   (4.1.1)

The map F̂ of the set X into X is called the transitive closure of the map F if

F̂_x = {x} ∪ F_x ∪ F_x² ∪ ... ∪ F_x^k ∪ ....   (4.1.2)

The map F⁻¹ inverse to the map F is defined as

F_y⁻¹ = {x : y ∈ F_x},

i.e. this is the set of points x whose image contains the point y. The map (F⁻¹)^k is defined in much the same way as the map F_x^k, i.e.

(F⁻¹)_y² = F⁻¹((F⁻¹)_y),   (4.1.3)

(F⁻¹)_y³ = F⁻¹((F⁻¹)_y²), ..., (F⁻¹)_y^k = F⁻¹((F⁻¹)_y^{k−1}).

If B ⊆ X, then let

F⁻¹(B) = {x : F_x ∩ B ≠ ∅}.   (4.1.4)
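These set-theoretic operations are straightforward to realize on a computer. A minimal Python sketch is given below, under the assumption that the map F is stored as a dictionary from nodes to sets of successor nodes.

    def image(F, A):
        # F_A = union of F_x over x in A; F(empty set) = empty set.
        return set().union(*(F[x] for x in A)) if A else set()

    def transitive_closure(F, x):
        # \hat F_x = {x} u F_x u F^2_x u ...   (4.1.2)
        closure, frontier = {x}, {x}
        while frontier:
            frontier = image(F, frontier) - closure
            closure |= frontier
        return closure

    def preimage(F, B):
        # F^{-1}(B) = {x : F_x intersects B}   (4.1.4)
        return {x for x in F if F[x] & set(B)}

    F = {1: {2, 3}, 2: {4}, 3: {4}, 4: set()}
    assert transitive_closure(F, 1) == {1, 2, 3, 4}
    assert preimage(F, {4}) == {2, 3}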
Example 1. (Chess.) Every position on a chessboard is defined by the number and composition of pieces of each player, by the arrangement of the pieces at the given moment, and by the indication as to whose move it is. Suppose X is the set of positions and F_x, x ∈ X, is the set of those positions which can be realized immediately after the position x has been realized. If in the position x the number of black or white pieces is zero, then F_x = ∅. Now F_x^k defined by (4.1.1) is the set of positions which can be obtained from x in k moves, F̂_x is the set of all positions which can be obtained from x, and F⁻¹(A) (A ⊆ X) is the set of all positions from which it is possible to make, in one move, the transition to positions of the set A (see (4.1.2) and (4.1.4)).
Depicting positions by dots and connecting by an arrow two positions x and y, y ∈ F_x, it is possible to construct the graph of the game emanating from the original position. However, because of the very large number of positions it is impossible to draw such a graph in reality.
The use of set-valued maps over finite sets makes it possible to represent the structure of many multistage games: chess, draughts, go, and others.
Definition. The pair (X, F) is called a graph if X is a finite set and F is a map
of X into X.

The graph (X, F) is denoted by G. In what follows, the elements of the set X are represented by points on a plane, and the pairs of points x and y for which y ∈ F_x are connected by a solid line with the arrow pointing from x to y. Every element of the set X is called a vertex or a node of the graph, and a pair of elements (x, y), where y ∈ F_x, is called an arc of the graph. For the arc p = (x, y) the nodes x and y are called the boundary nodes of the arc, with x as the origin and y as the end point of the arc. Two arcs p and q are called contingent if they are distinct and have a boundary point in common.
The set of arcs in the graph is denoted by P. The set of arcs in the graph G = (X, F) determines the map F and, vice versa, the map F determines the set P. Therefore, the graph G can be represented as G = (X, F) or G = (X, P).
A path in the graph G = (X, F) is a sequence of arcs p = (p₁, p₂, ..., p_k, ...) such that the end of each preceding arc coincides with the origin of the next one. The length of the path p = (p₁, ..., p_k) is the number l(p) = k of arcs in the sequence; in the case of an endless path p we set l(p) = ∞.
An edge of the graph G = (X, P) is a set made up of two elements x, y ∈ X for which either (x, y) ∈ P or (y, x) ∈ P. The orientation is of no importance in the edge, as opposed to the arc. The edges are denoted by p̄, q̄, and the set of edges by P̄. By a chain is meant a sequence of edges (p̄₁, p̄₂, ...) where one of the boundary nodes of each edge p̄_k is also boundary for p̄_{k−1}, while the other is boundary for p̄_{k+1}.
A cycle is a finite chain starting in some node and terminating in the same node. The graph is called connected if any two of its nodes can be connected by a chain.
By definition, a tree or graph tree is a finite connected graph without cycles which has at least two nodes. Any graph tree has a unique node x₀ such that F̂_{x₀} = X. The node x₀ is called the initial node of the graph G.
Example 2. Fig. 4.1 shows a graph tree with its origin at x₀. The nodes x ∈ X, or the vertices of the graph, are marked by dots. The arcs are depicted as arrowed segments emphasizing the origin and the end point of each arc.
Example 3. Generally speaking, draughts or chess cannot be represented by a graph tree if by a node of the graph is meant an arrangement of draughtsmen or chess pieces on the board at a given time together with an indication of the move, since the same arrangement of pieces can be obtained in a variety of ways. However, if the node of the graph representing a structure of draughtsmen or chess pieces at a given time is taken to mean an arrangement of pieces on the board at a given time, an indication of the move, and the past course of the game (all successive positions of pieces on the earlier moves), then each node is reached from the original one in a unique way (i.e. there exists only one chain passing from the original node to any given node); hence the corresponding graph of the game contains no cycles and is a tree.
4.1.3. Let z ∈ X. The subgraph G_z of the tree graph G = (X, F) is the graph of the form (X_z, F_z), where X_z = F̂_z and F_{z,x} = F_x ∩ X_z. In Fig. 4.1 the dashed line encircles the subgraph starting in the node z. On a tree graph, for all x ∈ X_z the sets F_x and F_{z,x} coincide, i.e. the map F_z is the restriction of the map F to the set X_z. Therefore, for the subgraphs of a tree graph we use the notation G_z = (X_z, F).

Figure 4.1

4.1.4. We shall now define the multistage game with perfect information on a finite tree graph.
Let G = (X, F) be a tree graph. Consider the partition of the node set X into n + 1 sets X₁, ..., X_n, X_{n+1}, ∪_{i=1}^{n+1} X_i = X, X_k ∩ X_l = ∅, k ≠ l, where F_x = ∅ for x ∈ X_{n+1}. The set X_i, i = 1, ..., n, is called the priority set of the ith player, while the set X_{n+1} is called the set of final positions. The real-valued functions H₁(x), ..., H_n(x), x ∈ X_{n+1}, are defined on the set of final positions X_{n+1}. The function H_i(x), i = 1, ..., n, is called the payoff to the ith player.
The game proceeds as follows. Let there be given the set N of players designated by the natural numbers 1, ..., i, ..., n (hereafter denoted as N = {1, 2, ..., n}). Let x₀ ∈ X_{i₁}; then in the node (position) x₀ player i₁ "makes a move" and chooses the next node (position) x₁ ∈ F_{x₀}. If x₁ ∈ X_{i₂}, then in the node x₁ player i₂ "makes a move" and chooses the next node (position) x₂ ∈ F_{x₁}, and so on. Thus, if the node (position) x_{k−1} ∈ X_{i_k} is realized at the kth step, then in this node player i_k "makes a move" and selects the next node (position) from the set F_{x_{k−1}}. The game terminates as soon as a terminal node (position) x_l ∈ X_{n+1} (i.e. a node for which F_{x_l} = ∅) is reached.
Such a step-by-step selection implies a unique realization of some sequence x₀, x₁, ..., x_k, ..., x_l determining the path in the tree graph G which emanates from the initial position and reaches one of the final positions of the game. In what follows, such a
path is called a play of the game. Because of the tree-like structure of the graph G, each play uniquely determines the final position x_l to be reached and, conversely, the final position x_l uniquely determines the play. In the position x_l each of the players i, i = 1, ..., n, receives the payoff H_i(x_l).
We assume that player i, making his choice in position x ∈ X_i, knows this position and hence, because of the tree-like structure of the graph G, can restore all the previous positions. In this case the players are said to have perfect information. Chess and draughts provide good examples of games with perfect information, because the players can put down their moves, and hence they are said to know the past course of the game when making each move in turn.
Definition. A single-valued map u_i, which sets up a correspondence between each node (position) x ∈ X_i and some node (position) y ∈ F_x, is called a strategy of player i.
The set of all possible strategies of player i is denoted by U_i. Thus the strategy of the ith player prescribes for him, in any position x from his priority set X_i, a unique choice of the next position.
The ordered set u = (u₁, ..., u_i, ..., u_n), where u_i ∈ U_i, is called a situation in the game, while the Cartesian product U = Π_{i=1}^n U_i is called the set of situations. Each situation u = (u₁, ..., u_i, ..., u_n) uniquely determines a play of the game, and hence the payoffs to the players. Indeed, let x₀ ∈ X_{i₁}. In the situation u = (u₁, ..., u_i, ..., u_n) the next position x₁ is then uniquely determined by the rule u_{i₁}(x₀) = x₁. Now let x₁ ∈ X_{i₂}. Then x₂ is uniquely determined by the rule u_{i₂}(x₁) = x₂. If the position x_{k−1} ∈ X_{i_k} is realized at the kth step, then x_k is uniquely determined by the rule x_k = u_{i_k}(x_{k−1}), and so on.
Suppose that to the situation u = (u₁, ..., u_i, ..., u_n) in the above sense corresponds the play x₀, x₁, ..., x_l. Then we may introduce the notion of the payoff function K_i of player i by equating its value in each situation to the value of the payoff H_i in the final position of the play x₀, ..., x_l corresponding to the situation u = (u₁, ..., u_n), that is,

K_i(u₁, ..., u_i, ..., u_n) = H_i(x_l), i = 1, ..., n.

The functions K_i, i = 1, ..., n, are defined on the set of situations U = Π_{i=1}^n U_i. Thus, constructing the players' strategy sets U_i and defining the payoff functions K_i, i = 1, ..., n, on the Cartesian product of the strategy sets of the players, we obtain a game in normal form

Γ = (N, {U_i}_{i∈N}, {K_i}_{i∈N}),
4.1.5. For the purposes of further examination of the game T we need to introduce
the notion of a subgame, i.e. the game on a subgraph of the graph G in the main
game (see 1.1.1).
Let z 6 X. Consider a subgraph Gt = {XZ,F) which is associated with the
subgame Tz as follows. The players priority sets in the subgame Vz are determined
by the rule Y' = Xi fl Xz, i = 1 , . . . , n , the set of final positions Vjf+1 = Xn+i ("1 Xx,

and player i's payoff H_i^z(x) in the subgame is taken to be

H_i^z(x) = H_i(x), x ∈ Y_{n+1}^z, i = 1, ..., n.

Accordingly, player i's strategy u_i^z in the subgame Γ_z is defined to be the truncation of player i's strategy u_i in the game Γ to the set Y_i^z, i.e.

u_i^z(x) = u_i(x), x ∈ Y_i^z = X_i ∩ X_z, i = 1, ..., n.

The set of all strategies of player i in the subgame is denoted by U_i^z. Then each subgraph G_z is associated with the subgame in normal form

Γ_z = (N, {U_i^z}, {K_i^z}),

where the payoff functions K_i^z, i = 1, ..., n, are defined on the Cartesian product U^z = Π_{i=1}^n U_i^z.

4.2 Absolute equilibrium (subgame-perfect)


In Chapter 3 we introduced the notion of a Nash equilibrium for the n-person game
in normal form. It turns out that for multistage games it is possible to strengthen
the notion of equilibrium by introducing the notion of an absolute equilibrium.
4.2.1. Definition. The Nash equilibrium u* = (u₁*, ..., u_n*) is called an absolute Nash equilibrium in the game Γ if for any z ∈ X the situation (u*)^z = ((u₁*)^z, ..., (u_n*)^z), where (u_i*)^z is the truncation of the strategy u_i* to the subgame Γ_z, constitutes a Nash equilibrium in the subgame Γ_z.
Then the following fundamental theorem is valid.
Theorem. In any multistage game with perfect information on a finite tree
graph there exists an absolute Nash equilibrium.
Preparatory to proving this theorem we first introduce the notion of the game length. By definition, the length of the game Γ means the length of the longest path in the graph G = (X, F).
The proof is carried out by induction on the length of the game. If the length of the game Γ is 1, then a move can be made by only one of the players who, by choosing the next node from the maximization condition of his payoff, acts in accordance with a strategy constituting an absolute Nash equilibrium.
Now, suppose the game Γ has length k and x₀ ∈ X_{i₁} (i.e. in the initial position x₀ player i₁ makes his move). Consider the family of subgames Γ_z, z ∈ F_{x₀}, where the length of each subgame does not exceed k − 1. Suppose the theorem holds for all games whose length does not exceed k − 1, and prove it for a game of length k. Since each subgame Γ_z, z ∈ F_{x₀}, has length k − 1 at most, under the assumption of induction the theorem holds for it, and thus there exists an absolute Nash equilibrium. For each subgame Γ_z, z ∈ F_{x₀}, this situation will be denoted by

u*(z) = ((u₁*)^z, ..., (u_n*)^z).   (4.2.1)

Using the absolute equilibria in the subgames Γ_z we construct an absolute equilibrium in the game Γ. Let u_i*(x) = (u_i*(x))^z for x ∈ X_i ∩ X_z, z ∈ F_{x₀}, i = 1, ..., n, and u_{i₁}*(x₀) = z*, where z* is obtained from the condition

K_{i₁}^{z*}((u*)^{z*}) = max_{z∈F_{x₀}} K_{i₁}^z((u*)^z).   (4.2.2)

The function u_i* is defined on player i's priority set X_i, i = 1, ..., n, and for every fixed x ∈ X_i the value u_i*(x) ∈ F_x. Thus u_i*, i = 1, ..., n, is a strategy of player i in the game Γ, i.e. u_i* ∈ U_i. By construction, the truncation (u_i*)^z of the strategy u_i* to the set X_i ∩ X_z is the strategy appearing in the absolute Nash equilibrium of the subgame Γ_z, z ∈ F_{x₀}. Therefore, to complete the proof of the theorem, it suffices to show that the strategies u_i*, i = 1, ..., n, constructed by formulas (4.2.2) constitute a Nash equilibrium in the game Γ. Let i ≠ i₁. By the construction of the strategy u_{i₁}*, after the position z* has been chosen by player i₁ at the first step, the game Γ becomes the subgame Γ_{z*}. Therefore,

K_i(u*) = K_i^{z*}((u*)^{z*}) ≥ K_i^{z*}((u*‖u_i)^{z*}) = K_i(u*‖u_i), u_i ∈ U_i, i = 1, ..., n, i ≠ i₁,   (4.2.3)

since (u*)^{z*} is an absolute equilibrium in the subgame Γ_{z*}. Let u_{i₁} ∈ U_{i₁} be an arbitrary strategy of player i₁ in the game Γ. Denote z₀ = u_{i₁}(x₀). Then

K_{i₁}(u*) = K_{i₁}^{z*}((u*)^{z*}) = max_{z∈F_{x₀}} K_{i₁}^z((u*)^z) ≥ K_{i₁}^{z₀}((u*‖u_{i₁})^{z₀}) = K_{i₁}(u*‖u_{i₁}),   (4.2.4)

where the last inequality uses the fact that (u*)^{z₀} is an equilibrium in the subgame Γ_{z₀}.
The assertion of the theorem now follows from (4.2.3), (4.2.4).
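The construction in the proof is precisely the backward induction procedure, and it is short to program. A recursive sketch is given below; the representation of the tree, the indexing of players from 0, and the tie-breaking by the order of the children (which amounts to fixing one particular absolute equilibrium) are all illustrative assumptions.

    def absolute_equilibrium(children, owner, payoff, root, choice=None):
        # children[x]: successor list (empty at final positions);
        # owner[x]: the player i with x in X_i; payoff[x]: (H_1, ..., H_n).
        # Returns the equilibrium payoff vector of the subgame rooted at
        # `root` and records the equilibrium moves in `choice`.
        if choice is None:
            choice = {}
        if not children[root]:                 # final position
            return payoff[root], choice
        i, best = owner[root], None
        for z in children[root]:
            v, _ = absolute_equilibrium(children, owner, payoff, z, choice)
            if best is None or v[i] > best[1][i]:   # player i maximizes v_i
                best = (z, v)
        choice[root] = best[0]
        return best[1], choice

    # A toy tree: player 0 moves at 'a', player 1 at 'b' and 'c'.
    children = {'a': ['b', 'c'], 'b': ['t1', 't2'], 'c': ['t3'],
                't1': [], 't2': [], 't3': []}
    owner = {'a': 0, 'b': 1, 'c': 1}
    payoff = {'t1': (3, 1), 't2': (0, 2), 't3': (2, 0)}
    value, strategy = absolute_equilibrium(children, owner, payoff, 'a')
    # value == (2, 0): at 'b' player 1 prefers t2, so player 0 prefers 'c'.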
4.2.2. Example 4. Suppose the game Γ is played on the graph depicted in Fig. 4.2 and the set N is composed of two players: N = {1, 2}. Referring to Fig. 4.2 we determine the priority sets. The nodes of the set X₁ are represented by circles and those of the set X₂ by blocks. The players' payoffs are written in the final positions. Designate by double indices the positions appearing in the sets X₁ and X₂, and by single indices the arcs emanating from each node. The choice in the node x is equivalent to the choice of the next node x′ ∈ F_x; therefore we assume that the strategies indicate in each node the number of the arc along which it is necessary to move further. For example, Player 1's strategy u₁ = (2,1,2,3,1,2,1,1) tells him to choose arc 2 in node 1, arc 1 in node 2, arc 2 in node 3, arc 3 in node 4, and so on. Since the priority set of the first player is composed of eight nodes, a strategy for him is an eight-dimensional vector. Similarly, any strategy of Player 2 is a seven-dimensional vector. Altogether there are 864 strategies for Player 1 and 576 strategies for Player 2. Thus the corresponding normal form appears to be an 864 × 576 bimatrix game. The solution of such a bimatrix game by the methods proposed in Chap. 3 is not merely difficult but practically impossible. However, the game involved is simple enough to be solved by the backward construction of an absolute Nash equilibrium as proposed in the proof of Theorem 1 in 4.2.1.
Indeed, denote by v₁(x), v₂(x) the payoffs in the subgame Γ_x in a fixed absolute equilibrium. First we solve the subgames Γ_{1.6}, Γ_{1.7}, Γ_{2.7}. It can be easily seen that
206 Ch&pter 4. Positional games

Figure 4.2

v₁(1.6) = 6, v₂(1.6) = 2, v₁(1.7) = 2, v₂(1.7) = 4, v₁(2.7) = 1, v₂(2.7) = 8. Further, solve the subgames Γ_{2.5}, Γ_{2.6}, Γ_{1.8}. Subgame Γ_{2.5} has two Nash equilibria, because Player 2 does not care which of the alternatives to choose. At the same time, his choice appears to be essential for Player 1, because with Player 2's choice of the left-hand arc Player 1 scores +1, while with Player 2's choice of the right-hand arc Player 1 scores +6. We point out this feature and suppose Player 2 "favors" and chooses the right-hand arc in position (2.5). Then v₁(2.5) = v₁(1.6) = 6, v₂(2.5) = v₂(1.6) = 2, v₁(2.6) = v₁(1.7) = 2, v₂(2.6) = v₂(1.7) = 4, v₁(1.8) = 2, v₂(1.8) = 3. Further, solve the games Γ_{1.3}, Γ_{1.4}, Γ_{1.5}, Γ_{2.3}, Γ_{2.4}. Subgame Γ_{1.3} has two Nash equilibria, because Player 1 does not care which of the alternatives to choose. At the same time, his choice appears to be essential for Player 2, because with Player 1's choice of the left-hand alternative he scores +1, whereas with the choice of the right-hand alternative he scores +10. Suppose Player 1 "favors" and chooses in position (1.3) the right-hand alternative. Then v₁(1.3) = 5, v₂(1.3) = 10, v₁(1.4) = v₁(2.5) = 6, v₂(1.4) = v₂(2.5) = 2, v₁(1.5) = v₁(2.6) = 2, v₂(1.5) = v₂(2.6) = 4, v₁(2.3) = 0, v₂(2.3) = 6, v₁(2.4) = 3, v₂(2.4) = 5. Further, solve the games Γ_{2.1}, Γ_{1.2}, Γ_{2.2}: v₁(2.1) = v₁(1.3) = 5, v₂(2.1) = v₂(1.3) = 10, v₁(1.2) = v₁(2.4) = 3, v₂(1.2) = v₂(2.4) = 5, v₁(2.2) = −5, v₂(2.2) = 6. Now solve the game Γ = Γ_{1.1}. Here v₁(1.1) = v₁(2.1) = 5, v₂(1.1) = v₂(2.1) = 10.
As a result we have an absolute Nash equilibrium (u₁*, u₂*), where

u₁* = (1,2,2,2,2,3,2,1), u₂* = (1,3,2,2,2,1,2).   (4.2.5)
In the situation (u₁*, u₂*) the game follows the path (1.1), (2.1), (1.3). It is apparent from the construction that the strategies u_i*, i = 1, 2, are "favorable" in that the player i making his move, being equally interested in the choice of the subsequent alternatives, chooses the alternative which is favorable for player 3 − i.
The game Γ has absolute equilibria in which the payoffs to the players are different. To construct such equilibria, it suffices to replace the players' "favorableness" condition by the inverse condition, i.e. the "unfavorableness" condition. Denote by v̄₁(x), v̄₂(x) the payoffs to the players in the subgame Γ_x when the players use an "unfavorable" equilibrium. Then we have: v̄₁(1.6) = v₁(1.6) = 6, v̄₂(1.6) = v₂(1.6) = 2, v̄₁(1.7) = v₁(1.7) = 2, v̄₂(1.7) = v₂(1.7) = 4, v̄₁(2.7) = −2, v̄₂(2.7) = v₂(2.7) = 8. As noted before, subgame Γ_{2.5} has two Nash equilibria. Contrary to the preceding case, we assume that Player 2 "does not favor" and chooses the node which ensures a maximum payoff to him and a minimum payoff to Player 1. Then v̄₁(2.5) = 1, v̄₂(2.5) = 2, v̄₁(2.6) = v₁(1.7) = 2, v̄₂(2.6) = v₂(1.7) = 4, v̄₁(1.8) = v₁(1.8) = 2, v̄₂(1.8) = v₂(1.8) = 3. Further, we seek a solution to the games Γ_{1.3}, Γ_{1.4}, Γ_{1.5}, Γ_{2.3}, Γ_{2.4}. Subgame Γ_{1.3} has two Nash equilibria. As in the preceding case, we choose "unfavorable" actions for Player 1. Then we have v̄₁(1.3) = v₁(1.3) = 5, v̄₂(1.3) = 1, v̄₁(1.4) = 2, v̄₂(1.4) = 3, v̄₁(1.5) = v̄₁(2.6) = 2, v̄₂(1.5) = v̄₂(2.6) = 4, v̄₁(2.3) = v₁(2.3) = 0, v̄₂(2.3) = v₂(2.3) = 6, v̄₁(2.4) = v₁(2.4) = 3, v̄₂(2.4) = v₂(2.4) = 5. Further, we solve the games Γ_{2.1}, Γ_{1.2}, Γ_{2.2}. We have: v̄₁(2.1) = v̄₁(1.5) = 2, v̄₂(2.1) = v̄₂(1.5) = 4, v̄₁(1.2) = v̄₁(2.4) = 3, v̄₂(1.2) = v̄₂(2.4) = 5, v̄₂(2.2) = v₂(2.2) = 6, v̄₁(2.2) = v₁(2.2) = −5. Now solve the game Γ = Γ_{1.1}. Here v̄₁(1.1) = v̄₁(1.2) = 3, v̄₂(1.1) = v̄₂(1.2) = 5.
We have thus obtained a new Nash equilibrium

ū₁(·) = (2,2,1,1,2,3,2,1), ū₂(·) = (3,3,2,2,1,1,3).   (4.2.6)

The payoffs to both players in situation (4.2.6) are less than those in situation (4.2.5). Just as situation (4.2.5), situation (4.2.6) is an absolute equilibrium.
4.2.3. It is apparent that in parallel with the "favorable" and "unfavorable" absolute Nash equilibria there exists a whole family of intermediate absolute equilibria. Of interest is the question of when there cannot exist two distinct absolute equilibria differing in the payoffs to the players.
Theorem. [Rochet (1980)]. Let the players' payoffs H_i(x), i = 1, ..., n, in the game Γ be such that if there exist an i₀ and x, y such that H_{i₀}(x) = H_{i₀}(y), then H_i(x) = H_i(y) for all i ∈ N. Then in the game Γ the players' payoffs coincide in all absolute equilibria.
Proof. Consider the family of subgames Γ_x of the game Γ and prove the theorem by induction over their length l(x). Let l(x) = 1 and suppose player i₁ makes a move in the unique nonterminal position x. Then in the equilibrium he makes his choice from the condition

H_{i₁}(z̄) = max_{z∈F_x} H_{i₁}(z).

If the point z̄ is unique, then so is the payoff vector in the equilibrium, which here equals H(z̄) = (H₁(z̄), ..., H_n(z̄)). If there exists a point z̃ ≠ z̄ such
that H_{i₁}(z̃) = H_{i₁}(z̄), then there is one more equilibrium with payoffs H(z̃) = (H₁(z̃), ..., H_{i₁}(z̃), ..., H_n(z̃)). From the condition of the theorem, however, it follows that if H_{i₁}(z̃) = H_{i₁}(z̄), then H_i(z̃) = H_i(z̄) for all i ∈ N.
Let v(x) = (v_i(x)) be the payoff vector in the equilibrium in a single-stage subgame Γ_x which, as shown above, is determined in a unique way. Show that if the equality v_{i₀}(x′) = v_{i₀}(x″) holds for some i₀ (x′, x″ are such that the lengths of the subgames Γ_{x′}, Γ_{x″} are 1), then v_i(x′) = v_i(x″) for all i ∈ N. Indeed, let x′ ∈ X_{i₁}, x″ ∈ X_{i₂}; then

v_{i₁}(x′) = H_{i₁}(x̄′) = max_{y∈F_{x′}} H_{i₁}(y),  v_{i₂}(x″) = H_{i₂}(x̄″) = max_{y∈F_{x″}} H_{i₂}(y),

and v_i(x′) = H_i(x̄′), v_i(x″) = H_i(x̄″) for all i ∈ N. From the equality v_{i₀}(x′) = v_{i₀}(x″) it follows that H_{i₀}(x̄′) = H_{i₀}(x̄″). But then, under the condition of the theorem, H_i(x̄′) = H_i(x̄″) for all i ∈ N. Hence v_i(x′) = v_i(x″) for all i ∈ N.
We now assume that in all subgames Γ_x of length l(x) ≤ k − 1 the payoff vector in equilibria is determined uniquely, and that if for some two subgames Γ_{x′}, Γ_{x″} whose lengths do not exceed k − 1 we have v_{i₀}(x′) = v_{i₀}(x″) for some i₀, then v_i(x′) = v_i(x″) for all i ∈ N.
Suppose the game Γ_{x₀} is of length k and player i₁ makes his move in the initial position x₀. By the induction hypothesis, for every z ∈ F_{x₀} the payoffs in Nash equilibria in the game Γ_z are determined uniquely. Let the payoff vector in Nash equilibria in the game Γ_z be (v_i(z)). Then, as follows from (4.2.2), in the node x₀ player i₁ chooses the next node z̄ ∈ F_{x₀} from the condition

v_{i₁}(z̄) = max_{z∈F_{x₀}} v_{i₁}(z).   (4.2.8)

If the point z̄ determined by (4.2.8) is unique, then the vector with components v_i(x₀) = v_i(z̄), i = 1, ..., n, is the unique payoff vector in Nash equilibria in the game Γ_{x₀}. If, however, there exist two nodes z̄, z̃ for which v_{i₁}(z̄) = v_{i₁}(z̃), then, by the induction hypothesis, since the lengths of the subgames Γ_{z̄} and Γ_{z̃} do not exceed k − 1, the equality v_{i₁}(z̄) = v_{i₁}(z̃) implies the equality v_i(z̄) = v_i(z̃) for all i ∈ N. Thus, in this case the payoffs in equilibria v_i(x₀), i ∈ N, are also determined uniquely.
4.2.4. Example 5. We have seen in the previous example that "favorableness" of the players gives them higher payoffs in the corresponding Nash equilibria than the "unfavorable" behavior. But this is not always the case. Sometimes the "unfavorable" Nash equilibrium gives higher payoffs to all the players than the "favorable" one. We shall illustrate this rather nontrivial fact with an example. Consider the two-person game in Fig. 4.3. The nodes from the priority set X₁ are represented by circles and those from X₂ by blocks, with the players' payoffs written in the final positions. In the figure, positions from the sets X_i (i = 1, 2) are numbered by double indices (i, j), where i is the index of the player and j the index of the node x in the set X_i. One can easily see that the "favorable" equilibrium has the form ((2,2,1,1,1), (2,1)) with payoffs (2,1). The "unfavorable" equilibrium has the form ((1,1,2,1,1), (1,1)) with payoffs (5,3).
4.2.5. [Fudenberg and Tirole (1992)]. Consider the n-person game with perfect information where each player i ≤ n can either end the game by playing D or play A and give the move to player i + 1 (see Fig. 4.4).
4.2. Absolute equilibrium (subgame-perfect) 209

Figure 4.4: players 1, 2, ..., n move in turn; playing A passes the move on, all players choosing A yields the payoffs (2, 2, ..., 2), and player i playing D stops the game with the payoffs (1/i, 1/i, ..., 1/i).

If player i selects D, each player gets 1/i; if all players select A, each gets 2. The backward induction algorithm for computing the subgame perfect (absolute) equilibria predicts that all players should play A. Thus the situation (A, A, ..., A) is a subgame perfect Nash equilibrium. (Note that in the game under consideration each player moves only once and has two alternatives, which are also his strategies.) But there are also other equilibria. One class of Nash equilibria has the form (D, A, A, D, ...), where the first player selects D and at least one of the others selects D. The payoffs in the first case are (2, 2, ..., 2) and in the second (1, 1, ..., 1). On the basis of a robustness argument it seems that the equilibrium (A, A, ..., A) is inefficient if n is very large. The equilibrium (D, A, A, D, ...) is such because player 4 uses the punishment strategy to force player 1 to play D. This equilibrium is not subgame perfect, because it is not an equilibrium in the subgames starting from positions 2, 3.

4.3 Fundamental functional equations


4.3.1. We shall consider multistage zero-sum games with perfect information. If in the conditions of 4.1.4 the set of players is composed of two elements, N = {1, 2}, and H₂(x) = −H₁(x) for all x ∈ X₃ (X₃ is the set of final positions in the game Γ), then

Γ = (U₁, U₂, K₁)

appears to be a multistage zero-sum game with perfect information. It is apparent that this property is possessed by all subgames Γ_z of the game Γ.
Since an immediate consequence of the condition H₂(x) = −H₁(x) is that K₂(u₁, u₂) = −K₁(u₁, u₂) for all u₁ ∈ U₁, u₂ ∈ U₂, in the Nash equilibrium (u₁*, u₂*) the inequalities K₁(u₁, u₂*) ≤ K₁(u₁*, u₂*) ≤ K₁(u₁*, u₂) hold for all u₁ ∈ U₁, u₂ ∈ U₂. The pair (u₁*, u₂*) is now called an equilibrium or a saddle point, and the strategies forming an equilibrium are called optimal. The value of the payoff function in an equilibrium is denoted by v and is called the value of the game Γ.
4.3.2. From the Theorem in 4.2.1 it follows that in a multistage zero-sum game with perfect information on a tree graph there exists an absolute equilibrium, i.e. an equilibrium (u₁*, u₂*) whose truncation to any subgame Γ_z of the game Γ forms an equilibrium in Γ_z. For any subgame Γ_y it is possible to determine the number v(y) which represents the value of the payoff function in the equilibrium of this subgame and is called the value of the subgame Γ_y. As shown in 1.3.2, the value of a zero-sum game (i.e. the value of Player 1's payoff function in the equilibrium) is determined uniquely; therefore the function v(y) is defined for all y ∈ X and is unique.
4.3.3. Let us derive a functional equation for computing the function v(y). From the definition of v(y) it follows that

v(y) = K₁^y((u₁*)^y, (u₂*)^y) = −K₂^y((u₁*)^y, (u₂*)^y),

where ((u₁*)^y, (u₂*)^y) is the equilibrium in the subgame Γ_y that is the truncation of the absolute equilibrium (u₁*, u₂*).
Let y ∈ X₁ and z ∈ F_y. Then, as follows from (4.2.2), we have

v(y) = max_{z∈F_y} K₁^z((u₁*)^z, (u₂*)^z) = max_{z∈F_y} v(z).   (4.3.1)

Similarly, for y ∈ X₂ we have

v(y) = −K₂^y((u₁*)^y, (u₂*)^y) = −max_{z∈F_y} K₂^z((u₁*)^z, (u₂*)^z) = −max_{z∈F_y}(−v(z)) = min_{z∈F_y} v(z).   (4.3.2)

From (4.3.1) and (4.3.2) we finally get

v(y) = max_{z∈F_y} v(z), y ∈ X₁,   (4.3.3)

v(y) = min_{z∈F_y} v(z), y ∈ X₂.   (4.3.4)

Equations (4.3.3), (4.3.4) are solved under the boundary condition

v(y)|_{y∈X₃} = H₁(y).   (4.3.5)

The system of equations (4.3.3), (4.3.4) with the boundary condition (4.3.5) makes possible the backward recursion for finding the value of the game and the optimal strategies of the players. Indeed, suppose the values of all subgames Γ_z of length l(z) ≤ k − 1 are known and equal to v(z). Let Γ_y be a subgame of length l(y) = k. Now, if y ∈ X₁, then v(y) is determined from (4.3.3); if y ∈ X₂, then v(y) is determined from (4.3.4). Here the values v(z) in formulas (4.3.3), (4.3.4) are known, since the corresponding subgames have length not exceeding k − 1. The same formulas indicate the way of constructing the optimal strategies of the players. Indeed, if y ∈ X₁, then Player 1 (the maximizer) has to choose at the point y the node z ∈ F_y for which the value of the next subgame is maximal. However, if y ∈ X₂, then Player 2 (the minimizer) has to choose the position z ∈ F_y for which the value of the next subgame is minimal.
When the players' choices in a multistage zero-sum game alternate (an alternating game), equations (4.3.3), (4.3.4) can be written as one equation. Indeed, consider the subgame Γ_x and, for definiteness, let x ∈ X₁. Then in the next position Player 2 makes his move or this position is final (an alternating game), i.e. F_x ⊆ X₂ ∪ X₃. Therefore, we may write

v(x) = max_{y∈F_x} v(y), x ∈ X₁,   (4.3.6)

v(y) = min_{z∈F_y} v(z), y ∈ F_x ⊆ X₂ ∪ X₃.   (4.3.7)

Substituting (4.3.7) into (4.3.6) we obtain

v(x) = max_{y∈F_x} [min_{z∈F_y} v(z)], x ∈ X₁.   (4.3.8)

If x ∈ X₂, then in a similar way we get

v(x) = min_{y∈F_x} [max_{z∈F_y} v(z)].   (4.3.9)

Equations (4.3.8), (4.3.9) are equivalent and must be considered with the initial condition v(x)|_{x∈X₃} = H₁(x).
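For small trees the recursion (4.3.3)-(4.3.5) is directly programmable. A minimal sketch follows, under the assumption that maximizer[x] is True when x ∈ X₁ and that the tree is stored as a dictionary of successor lists.

    def game_value(children, maximizer, H1, x):
        # Backward recursion: v(x) = H_1(x) at final positions (4.3.5),
        # v(x) = max over F_x for x in X_1 (4.3.3),
        # v(x) = min over F_x for x in X_2 (4.3.4).
        if not children[x]:
            return H1[x]
        values = [game_value(children, maximizer, H1, z) for z in children[x]]
        return max(values) if maximizer[x] else min(values)

An optimal move at x is any z ∈ F_x attaining the corresponding maximum or minimum, which reproduces the optimal strategies described above.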
4.3.4. The Theorem in 4.2.1, considered for multistage zero-sum alternating games, shows the existence of an equilibrium in chess and draughts in the class of pure strategies, while equations (4.3.8), (4.3.9) show a way of finding the value of the game. At the same time, it is apparent that for the foreseeable future no computer implementation is possible for solving these functional equations in order to find the value of the game and the optimal strategies. It is highly improbable that we will ever know whether a player, "black" or "white", can guarantee a win in any play, or whether there can always be a draw. In chess and draughts, however, successful attempts are made to construct approximately optimal solutions by creating programs capable of looking several moves ahead. Use is also made of various (empirically obtained) estimates of the current positions. Such an approach is possible in the investigation of general multistage zero-sum games with perfect information. Successive iteration of estimates (for several steps ahead) may lead to desirable results.
212 C&apter 4. Positioned games

4.4 Penalty strategies


4.4.1. In 4.2.1 we proved the existence of absolute (Nash) equilibria in multistage games with perfect information on a finite tree graph. However, the investigation of particular games of this class may reveal a whole family of equilibria whose truncations are not necessarily equilibria in all subgames of the original game. Among such equilibria are equilibria in penalty strategies. We shall demonstrate this with the examples below.
Example 6. Suppose the game Γ proceeds on the graph depicted in Fig. 4.5. The set N = {1, 2} is made up of two players. In Fig. 4.5, as in Example 5, the circles represent the nodes making up the set X₁ and the blocks represent the set X₂. The nodes of the graph are designated by double indices and the arcs by single indices.

Figure 4.5

It can be easily seen that the situation u₁* = (1,1,2,2,2), u₂* = (1,1) is an absolute equilibrium in the game Γ. In this case, the payoffs to the players are 8 and 2 units, respectively. Now consider the situation ū₁ = (2,1,2,1,2), ū₂ = (2,2). In this situation the payoffs to the players are 10 and 1, respectively, and thus Player 1 receives a greater amount than in the situation (u₁*, u₂*). The situation (ū₁, ū₂) is an equilibrium in the game Γ but not an absolute equilibrium. In fact, in the subgame Γ_{1.4} the truncation of the strategy ū₁ tells Player 1 to choose the left-hand arc, which is not optimal for him in position 1.4. Such an action taken by Player 1 in position 1.4 can be interpreted as a "penalty" threat to Player 2 if he avoids Player 1's desirable choice of arc 2 in position 2.2, thereby depriving Player 1 of the maximum payoff of 10 units. But this "penalty" threat is unlikely to be treated as credible, because the penalizer (Player 1) may lose 5 units in this case (by acting nonoptimally).
4.4.2. We shall now give a strict definition of penalty strategies. For simplicity,
we shall restrict ourselves to the case of a nonzero-sum two-person game. Let there

be a multistage nonzero-sum two-person game

Γ = (U₁, U₂, K₁, K₂).
The game Γ will be associated with two zero-sum games Γ₁ and Γ₂ as follows. The game Γ₁ is the zero-sum game constructed in terms of the game Γ in which Player 2 plays against Player 1, i.e. K₂ = −K₁. The game Γ₂ is the zero-sum game constructed in terms of the game Γ in which Player 1 plays against Player 2, i.e. K₁ = −K₂.
The graphs of the games Γ₁, Γ₂, Γ and the priority sets therein coincide. Denote by (u₁₁*, u₂₁*) and (u₁₂*, u₂₂*) absolute equilibria in the games Γ₁, Γ₂, respectively. Let Γ₁ₓ, Γ₂ₓ be subgames of the games Γ₁, Γ₂, and let v₁(x), v₂(x) be the values of these subgames. Then the situations ((u₁₁*)ˣ, (u₂₁*)ˣ) and ((u₁₂*)ˣ, (u₂₂*)ˣ) are equilibria in the games Γ₁ₓ, Γ₂ₓ, respectively, and v₁(x) = K₁ˣ((u₁₁*)ˣ, (u₂₁*)ˣ), v₂(x) = K₂ˣ((u₁₂*)ˣ, (u₂₂*)ˣ).
Consider an arbitrary pair (ū₁, ū₂) of strategies in the game Γ. Of course, this pair is the same in the games Γ₁, Γ₂. Let Z = (x₀ = z₀, z₁, ..., z_l) be the path realized in the situation (ū₁, ū₂).
Definition. The strategy ū₁(·) is called a penalty strategy of Player 1 if

ū₁(z_k) = z_{k+1} for z_k ∈ Z ∩ X₁,   (4.4.1)
ū₁(y) = u₁₂*(y) for y ∈ X₁, y ∉ Z.

The strategy ū₂(·) is called a penalty strategy of Player 2 if

ū₂(z_k) = z_{k+1} for z_k ∈ Z ∩ X₂,   (4.4.2)
ū₂(y) = u₂₁*(y) for y ∈ X₂, y ∉ Z.
4.4.3. From the definition of penalty strategies we immediately obtain the following properties.
1. K₁(ū₁(·), ū₂(·)) = H₁(z_l), K₂(ū₁(·), ū₂(·)) = H₂(z_l).
2. Suppose one of the players, say Player 1, uses a strategy u₁(·) for which the position z_k ∈ Z ∩ X₁ is the first in the path Z where u₁(·) dictates the choice of a next position z′_{k+1} different from the choice dictated by the strategy ū₁(·), i.e. z′_{k+1} ≠ z_{k+1}. Then from the definition of the penalty strategy ū₂(·) it follows that

K₁(u₁(·), ū₂(·)) ≤ v₁(z_k).   (4.4.3)

Similarly, if Player 2 uses a strategy u₂(·) for which the position z_k ∈ Z ∩ X₂ is the first in the path Z where u₂(·) dictates the choice of a next position z′_{k+1} different from the choice dictated by ū₂(·), i.e. z′_{k+1} ≠ z_{k+1}, then from the definition of the penalty strategy ū₁(·) it follows that

K₂(ū₁(·), u₂(·)) ≤ v₂(z_k).   (4.4.4)

Hence, in particular, we obtain the following theorem.

Theorem. Let (ū₁(·), ū₂(·)) be a situation in penalty strategies. For the situation (ū₁(·), ū₂(·)) to be an equilibrium, it is sufficient that for all k = 0, 1, ..., l − 1 the inequalities

K₁(ū₁(·), ū₂(·)) ≥ v₁(z_k),   (4.4.5)
K₂(ū₁(·), ū₂(·)) ≥ v₂(z_k)

hold, where z₀, z₁, ..., z_l is the path realized in the situation (ū₁(·), ū₂(·)).
4.4.4. Suppose that u₁₁*(·) and u₂₂*(·) are optimal strategies for Players 1 and 2, respectively, in the auxiliary zero-sum games Γ₁ and Γ₂, and Z̄ = (z̄₀, z̄₁, ..., z̄_l) is the path corresponding to the situation (u₁₁*(·), u₂₂*(·)). Also, suppose the penalty strategies ū₁(·) and ū₂(·) are such that ū₁(z̄_k) = u₁₁*(z̄_k) for z̄_k ∈ Z̄ ∩ X₁ and ū₂(z̄_k) = u₂₂*(z̄_k) for z̄_k ∈ Z̄ ∩ X₂. Then the situation (ū₁(·), ū₂(·)) forms a Nash equilibrium in penalty strategies. To prove this assertion it suffices to show that

K₁(ū₁(·), ū₂(·)) = K₁(u₁₁*(·), u₂₂*(·)) ≥ v₁(z̄_k),   (4.4.6)
K₂(ū₁(·), ū₂(·)) = K₂(u₁₁*(·), u₂₂*(·)) ≥ v₂(z̄_k), k = 0, 1, ..., l − 1,

and use the Theorem in 4.4.3. Inequalities (4.4.6) follow from the optimality of the strategies u₁₁*(·) and u₂₂*(·) in the games Γ₁ and Γ₂, respectively. The proof is offered as an exercise. We have thus obtained the following theorem.
Theorem. In the game Γ there always exists an equilibrium in penalty strategies. In the special case described above (subsection 4.4.4), the payoffs in this situation are equal to K_i(u₁₁*(·), u₂₂*(·)), where u₁₁*(·) and u₂₂*(·) are optimal strategies for Players 1 and 2 in the auxiliary zero-sum games Γ₁ and Γ₂, respectively.
The meaning of penalty strategies is that a player causes his partner to follow a particular path in the game (particular choices) by constantly threatening to shift to a strategy that is optimal in the zero-sum game against the partner. Although the set of equilibria in the class of penalty strategies is sufficiently representative, these strategies should not be regarded as very "good", because by penalizing the partner a player may penalize himself to a greater extent.

4.5 Hierarchical games


There exists an important subclass of multistage nonzero-sum games referred to as hierarchical games. Hierarchical games model conflict-controlled systems with a hierarchical structure. This structure is determined by a sequence of control levels ranked in a particular priority order. Mathematically, it is convenient to classify hierarchical games according to the number of levels and the nature of the vertical relations. The simplest of them is the two-level system depicted in Fig. 4.6.
4.5.1. The functioning of a two-level conflict-controlled system is as follows. The control (coordinating) center A₀, standing at the first level of the hierarchy, selects a vector u = (u₁, ..., u_n) from a given control set U, where u_i is the control influence of the center on its subordinate division B_i, i = 1, ..., n, standing at the second level of the hierarchy. Each B_i, i = 1, ..., n, in its turn selects a control v_i ∈ V_i(u_i), where V_i(u_i)

Figure 4.6

is the set of controls of division B_i predetermined by the control u of center A₀. Thus the control center has the priority right of the first move and may restrict the possibilities of its subordinate divisions by channeling their actions as desired. The aim of center A₀ is to maximize the functional K₀(u, v₁, ..., v_n) over u, whereas the divisions B_i, i = 1, ..., n, which have their own goals to pursue, seek to maximize the functionals K_i(u_i, v_i) over v_i.
4.5.2. We shall formalize this problem as a noncooperative (n + 1)-person game Γ (an administrative center A₀ and production divisions B₁, ..., B_n) in normal form. Suppose Player A₀ selects a vector u ∈ U, where

U = {u = (u₁, ..., u_n) : u_i ≥ 0, u_i ∈ R^l, i = 1, ..., n, Σ_{i=1}^n u_i ≤ b}, b > 0,

is the set of strategies of Player A₀ in the game Γ. The vector u_i will be interpreted as the vector of resources of l items allocated by center A₀ to the ith production division.
Suppose each of the players B_i in the original problem (see 4.5.1) knows the choice made by A₀ and selects a vector v_i ∈ V_i(u_i), where

V_i(u_i) = {v_i ∈ R^m : v_i A_i ≤ u_i + a_i, v_i ≥ 0}.   (4.5.1)

The vector v_i is interpreted as the production program of the ith division for various products; A_i is the production or technological matrix of the ith production division (A_i ≥ 0); a_i is the vector of resources available to the ith production division (a_i ≥ 0).
By definition, the strategies of Player B_i in the game Γ are the functions v_i(·) setting up a correspondence between the elements u = (u₁, ..., u_i, ..., u_n) ∈ U and the vectors v_i(u_i) ∈ V_i(u_i). The set of such functions is denoted by V̄_i, i = 1, ..., n.
Let us define the players' payoff functions in the game Γ. The payoff function of Player A₀ is

K₀(u, v₁(·), ..., v_n(·)) = Σ_{i=1}^n α_i v_i(u_i),

where α_i ≥ 0, α_i ∈ R^m, is a fixed vector, i = 1, ..., n, and α_i v_i(u_i) is the scalar product of the vectors α_i and v_i(u_i). The payoff function of Player B_i is

K_i(u, v₁(·), ..., v_n(·)) = c_i v_i(u_i),


where c_i ≥ 0, c_i ∈ R^m, is a fixed vector, i = 1, ..., n.
Now the game Γ becomes Γ = (U, V̄₁, ..., V̄_n, K₀, K₁, ..., K_n).
4.5.3. We shall construct a Nash equilibrium in the game Γ. Let v_i*(u_i) ∈ V_i(u_i) be a solution of the linear parametric programming problem (with the vector u_i as parameter)

max_{v_i∈V_i(u_i)} c_i v_i = c_i v_i*(u_i), i = 1, ..., n,   (4.5.2)

and let u* ∈ U be a solution of the problem

max_{u∈U} K₀(u, v₁*(·), ..., v_n*(·)).   (4.5.3)

For simplicity assume that the maxima in (4.5.2) and (4.5.3) are achieved. Note that (4.5.3) is a nonlinear programming problem with an essentially discontinuous objective function (maximization is over u, and the v_i*(u_i) are generally discontinuous functions of the parameter u_i). Show that the point (u*, v₁*(·), ..., v_n*(·)) is an equilibrium in the game Γ. Indeed,

K₀(u*, v₁*(·), ..., v_n*(·)) ≥ K₀(u, v₁*(·), ..., v_n*(·)), u ∈ U.

Further, for all i = 1, ..., n the inequality

K_i(u*, v₁*(·), ..., v_n*(·)) = c_i v_i*(u_i*) ≥ c_i v_i(u_i*) = K_i(u*, v₁*(·), ..., v_{i−1}*(·), v_i(·), v_{i+1}*(·), ..., v_n*(·))

holds for any v_i(·) ∈ V̄_i. Thus, it is not advantageous for any of the players A₀, B₁, ..., B_n to depart individually from the situation (u*, v₁*(·), ..., v_n*(·)), i.e. it is an equilibrium. Note that this situation is also stable against departures from it of any coalition S ⊆ {B₁, ..., B_n}, since the payoff K_i of the ith player does not depend on the strategies of the players B_j, j ≠ i.
4.6 Hierarchical games (Cooperative version)


This section deals with a cooperative version of the simplest hierarchical games (including the games defined in 4.5.1, 4.5.2). Characteristic functions are constructed and existence conditions for a nonempty core are studied.
4.6.1. Starting from the conceptual basis of problems 4.5.1, 4.5.2 and using the strategies which form a Nash equilibrium, we define for every coalition S ⊆ N = {A₀, B₁, ..., B_n} its guaranteed profit v(S) as follows:

v(S) = 0, if S = {A₀},   (4.6.1)
v(S) = Σ_{i: B_i∈S} c_i v_i*(0), if A₀ ∉ S,   (4.6.2)
v(S) = max_{{u_i: Σ_{i:B_i∈S} u_i ≤ b}} Σ_{i: B_i∈S} (α_i + c_i) v_i*(u_i), if A₀ ∈ S, S ≠ {A₀},   (4.6.3)

where v_i*(u_i), i = 1, ..., n, is a solution of the linear parametric programming problem (4.5.2).

Equality (4.6.1) holds, since the coalition {B₁, ..., B_n} can ensure a zero payoff to Player A₀ by selecting all v_i = 0, i = 1, ..., n; equality (4.6.2) holds, since Player A₀ can always hold the coalition S down to the payoff (4.6.2) by allocating to every B_i ∈ S a zero resource; equality (4.6.3) holds, since a coalition S incorporating A₀ can always ensure distribution of the whole resource among its members only.
Let S be an arbitrary coalition containing A₀. Denote by u^S = (u₁^S, ..., u_n^S) the maximizing vector in the nonlinear programming problem (4.6.3) (the condition u_i^S = 0 holds for i : B_i ∉ S). The following relation holds for any coalition S̄ ⊆ S, S̄ ≠ {A₀}, A₀ ∈ S̄:

Σ_{i: B_i∈S} (α_i + c_i) v_i*(u_i^S) ≥ Σ_{i: B_i∈S} (α_i + c_i) v_i*(u_i^S̄) = Σ_{i: B_i∈S̄} (α_i + c_i) v_i*(u_i^S̄) + Σ_{i: B_i∈S\S̄} (α_i + c_i) v_i*(0).

Let S, R ⊆ N, S ∩ R = ∅ and A₀ ∈ S, S ≠ {A₀}. Then A₀ ∉ R. In view of the conditions α_i ≥ 0, c_i ≥ 0, v_i* ≥ 0, i = 1, ..., n, we have

v(S ∪ R) = Σ_{i: B_i∈S∪R} (α_i + c_i) v_i*(u_i^{S∪R}) ≥ Σ_{i: B_i∈S∪R} (α_i + c_i) v_i*(u_i^S)
= Σ_{i: B_i∈S} (α_i + c_i) v_i*(u_i^S) + Σ_{i: B_i∈R} (α_i + c_i) v_i*(0)
= v(S) + v(R) + Σ_{i: B_i∈R} α_i v_i*(0) ≥ v(S) + v(R),

where Σ_{i: B_i∈R} α_i v_i*(0) ≥ 0 is the profit of center A₀ from the "self-supporting" enterprises. When A₀ ∉ S ∪ R, or S = {A₀} and A₀ ∉ R, the inequality v(S ∪ R) ≥ v(S) + v(R) is obvious.
The function v(S) defined by (4.6.1)-(4.6.3) is therefore superadditive, and we may consider the cooperative game ({A₀, B₁, ..., B_n}, v) in characteristic function form.
4.6.2. Consider the (n+1)-dimensional vector

ξ = (Σ_{i=1}^n a_i v_i*(u_i), c_1 v_1*(u_1),..., c_n v_n*(u_n)),  (4.6.4)

where u = u^N. The vector ξ is an imputation, since the following relationships are satisfied:

1) Σ_{i=0}^n ξ_i = Σ_{i=1}^n (a_i + c_i) v_i*(u_i) = v(N);

2) ξ_0 = Σ_{i=1}^n a_i v_i*(u_i) ≥ 0 = v({A_0}),
   ξ_i = c_i v_i*(u_i) ≥ c_i v_i*(0) = v(B_i), i = 1,...,n.

By Theorem 3.10.1, the necessary and sufficient condition for the imputation ξ = (ξ_0, ξ_1,...,ξ_n) to belong to the core is that the inequality

Σ_{i∈S} ξ_i ≥ v(S)  (4.6.5)

holds for all coalitions S ⊂ {A_0, B_1,...,B_n}.


Let us introduce the conditions under which the imputation ξ belongs to the core. If S = {A_0} or S ⊂ {B_1,...,B_n}, then condition (4.6.5) is satisfied, since

ξ_0 = Σ_{i=1}^n a_i v_i*(u_i) ≥ 0 = v({A_0}),

Σ_{i∈S} ξ_i = Σ_{i:B_i∈S} c_i v_i*(u_i) ≥ Σ_{i:B_i∈S} c_i v_i*(0) = v(S).

If A_0 ∈ S ≠ {A_0}, then condition (4.6.5) can be written as

Σ_{i=1}^n a_i v_i*(u_i) + Σ_{i:B_i∈S} c_i v_i*(u_i)
= Σ_{i:B_i∈S} (a_i + c_i) v_i*(u_i) + Σ_{i:B_i∉S} a_i v_i*(u_i) ≥ Σ_{i:B_i∈S} (a_i + c_i) v_i*(u_i^S).

Therefore, the imputation (4.6.4) belongs to the core if the inequality

Σ_{i:B_i∉S} a_i v_i*(u_i) ≥ Σ_{i:B_i∈S} (a_i + c_i)[v_i*(u_i^S) - v_i*(u_i)]

holds for all S : A_0 ∈ S.


Note that in this case we have defined the characteristic function of the game by using the payoffs in a Nash equilibrium, and the quantity v(N) = max_{u∈U} Σ_{i=1}^n (a_i + c_i) v_i*(u_i) is generally less than the total maximum payoff to all players, that is

max_{u∈U} Σ_{i=1}^n max_{v_i∈V_i(u_i)} (a_i + c_i) v_i

(this is different from the definition of a characteristic function adopted in Ch. 3).
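The definitions (4.6.1)-(4.6.3) and the core test (4.6.5) can be checked mechanically. The sketch below continues the toy model above (v_i*(u_i) = u_i, hence v_i*(0) = 0); all numbers are illustrative.

```python
# Characteristic function (4.6.1)-(4.6.3) for the toy model, a check of
# super-additivity, and the core condition (4.6.5) for the imputation (4.6.4).
from itertools import combinations

n, b = 2, 10
a, c = [1.0, 3.0], [2.0, 1.0]
players = ["A0", "B1", "B2"]

def v(S):
    B = [i for i in range(n) if f"B{i+1}" in S]
    if "A0" not in S or not B:
        return 0.0                           # (4.6.1), (4.6.2): v_i*(0) = 0 here
    return b * max(a[i] + c[i] for i in B)   # (4.6.3): all resource to the best B_i

coalitions = [frozenset(s) for r in (1, 2, 3) for s in combinations(players, r)]
for S in coalitions:                         # super-additivity
    for R in coalitions:
        if not (S & R):
            assert v(S | R) >= v(S) + v(R) - 1e-9

i_star = max(range(n), key=lambda i: a[i] + c[i])    # u = u^N
u = [b if i == i_star else 0 for i in range(n)]
xi = [sum(a[i] * u[i] for i in range(n))] + [c[i] * u[i] for i in range(n)]
for S in coalitions:                         # core condition (4.6.5)
    tot = sum(xi[players.index(p)] for p in S)
    print(sorted(S), "sum xi =", tot, " v(S) =", v(S), " ok:", tot >= v(S) - 1e-9)
```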
4.6.3. The characteristic function of the game can be constructed in the ordinary way, that is, it can be defined for every coalition S as the value of a zero-sum game between this coalition and the coalition of the other players N \ S. We shall now construct the characteristic function exactly in this way. In this case we shall slightly generalize the preceding problem by introducing arbitrary payoff functions for the players.

As in the previous case, we assume that center A_0 distributes resources among divisions B_1,...,B_n which use these resources to manufacture products. The payoffs to the control center A_0 and the "production" divisions B_1,...,B_n depend on the output of products by B_1,...,B_n. The vector of resources available to center A_0 is denoted by b. Center (Player) A_0 selects a system of n vectors u = (u_1,...,u_n) from the set

U = {u = (u_1,...,u_n) : u_k ≥ 0, k = 1,...,n, Σ_{k=1}^n u_k ≤ b}.

Here u_k is interpreted as the vector of resources allocated by center A_0 to the production division B_k. The capacities of enterprise (Player) B_k are determined by the resource u_k obtained from A_0, i.e. enterprise B_k selects its production program x_k from the set B_k(u_k) ⊂ R^m of non-negative vectors. We assume that the sets B_k(u_k) for all u_k contain a zero vector and increase monotonically in the sense of inclusion, i.e. from u_k' ≥ u_k follows B_k(u_k') ⊃ B_k(u_k), and the condition B_k(0) = {0} (impossibility of production because of the lack of resources) is satisfied.
Let x = (x_1,...,x_n). The payoff to Player A_0 is determined by a function l_0(x) ≥ 0, whereas the payoffs to the players B_k are taken to be l_k(x_k) ≥ 0, k = 1,...,n (the payoff to Player B_k depends only upon his own production program). For simplicity, assume that the payoff to center A_0 satisfies the condition

l_0(x) = Σ_{k=1}^n l(x_k),

where the term l(x_k) is interpreted to mean the payoff to Player A_0 due from Player B_k. We also assume that l(x_k) ≥ 0 for all x_k ∈ B_k(u_k) and l_k(0) = 0, l(0) = 0, k = 1,...,n.
Just as in Sec. 4.5, the hierarchical game of 4.6.3 can be represented as a noncooperative (n+1)-person game in normal form, where the strategies for Player A_0 are the vectors u ∈ U, while the strategies for the players B_k are the functions x_k(u_k) with values in the corresponding sets B_k(u_k). Let us construct the characteristic function v(S) for this game following 3.9.2. For each subset S of players, we take v(S) to be the value (if it exists under the conditions of the subsection) of a zero-sum game between coalitions S and N \ S, in which the payoff to coalition S is determined as the sum of the payoffs to the players in S.

Let N = {A_0, B_1,...,B_n}. Then

v(N) = sup_{{u∈U: Σ_{k=1}^n u_k = b}} sup_{x_k∈B_k(u_k), k=1,...,n} Σ_{k=1}^n [l(x_k) + l_k(x_k)].

Note that for all S ⊂ {B_1,...,B_n}, v(S) = 0, since Player A_0 can always distribute the whole resource b among the members of coalition N \ S, to which he also belongs, thereby depriving coalition S of resources (i.e. A_0 can always set u_k = 0 for k : B_k ∈ S, which results in B_k(0) = {0} for all B_k ∈ S). Using this line of reasoning we get v(A_0) = 0, since the players B_1,...,B_n can always nullify the payoff to center A_0 by setting x_k = 0 for k = 1,...,n (turning out no products). It is apparent that A_0 will distribute the whole resource among the members of the coalition when coalition S contains center A_0. This reasoning leads to the following formula:

v(S) = sup_{{u∈U: Σ_{k:B_k∈S} u_k = b}} sup_{x_k∈B_k(u_k), k:B_k∈S} Σ_{k:B_k∈S} [l(x_k) + l_k(x_k)]

for S : A_0 ∈ S.
It can be shown that, under such a definition of the characteristic function, the core of the imputation set

{α = (α_0, α_1,...,α_n) : α_i ≥ 0, i = 0,1,...,n, Σ_{i=0}^n α_i = v(N)}

is always nonempty.
4.6.4. Hierarchical systems with double subordination are called diamond-shaped (Fig. 4.7). The control of a doubly subordinated division C depends on both control centers B_1 and B_2.

Figure 4.7

One can envision a situation in which center B_1 represents the interests of an industry, while B_2 represents regional interests, including the issues of environment
protection. A simple diamond-shaped system is an example of a hierarchical two-level
decision-making system. At the upper level there is an administrative center which
is in charge of material and labor resources. It brings an influence to bear upon
activities of its two subordinate centers belonging to the next level. The decisions
made by these centers determine an output of the enterprise standing at a lower level
of the hierarchical system.
We shall consider this decision-making process as a four-person game. Denote this game by Γ. Going over to the game setting, we assume that at the first step Player A_0 moves and selects an element (strategy) u = (u_1, u_2) from a certain set U, where U is the strategy set of Player A_0. The element u ∈ U restricts the possibilities for players B_1 and B_2 to make their choices at the next step. In other words, the set of choices for Player B_1 is found to be a function of the parameter u_1 (denoted by B_1(u_1)). Similarly, the set of choices for Player B_2 is found to be a function of the parameter u_2 (denoted by B_2(u_2)). Denote by w_1 ∈ B_1(u_1) and w_2 ∈ B_2(u_2) the elements of the sets of choices for players B_1 and B_2, respectively. The parameters w_1 and w_2 selected by players B_1 and B_2 specify restrictions on the set of choices for Player C at the third step of the game, i.e. this set turns out to be a function of the parameters w_1 and w_2. Denote it by C(w_1,w_2), and the elements of this set (production programs) by v.
Suppose the payoffs of all players A_0, B_1, B_2, C depend only on the production program v selected by Player C and are respectively equal to l_1(v), l_2(v), l_3(v), l_4(v), where l_i(v) ≥ 0.
This hierarchical game can be represented as a noncooperative four-person game in normal form if the strategies for Player A_0 are taken to be the elements u = (u_1,u_2) ∈ U, while the strategies for players B_1, B_2 and C are taken to be the functions w_1(u_1), w_2(u_2) and v(w_1,w_2) with values in the sets B_1(u_1), B_2(u_2), C(w_1,w_2), respectively (the sets of such functions will be denoted by B̄_1, B̄_2, C̄), which set up a correspondence between every possible choice by the player (or the players) standing at a higher level and the choice made by this player. Setting

K_i(u, w_1(·), w_2(·), v(·)) = l_i(v(w_1(u_1), w_2(u_2))), i = 1,...,4,

we obtain the normal form of the game Γ:

Γ = (U, B̄_1, B̄_2, C̄, K_1, K_2, K_3, K_4).

4.6.5. We shall now seek a Nash equilibrium in the game Γ. To this end, we shall perform additional constructions.

For every fixed pair (w_1,w_2), w_1 ∈ B_1(u_1), w_2 ∈ B_2(u_2), u ∈ U, we denote by v*(w_1,w_2) a solution to the parametric extremal problem

max_{v∈C(w_1,w_2)} l_4(v) = l_4(v*(w_1,w_2)).  (4.6.6)

(The maximum in (4.6.6) is taken to be achieved.) The solution v*(·) = v*(w_1,w_2) of problem (4.6.6) is found to be a function of the parameters w_1, w_2, and v*(·) ∈ C̄.

Let us consider an auxiliary parametric (with parameters u_1, u_2) nonzero-sum two-person (B_1 and B_2) game Γ'(u_1,u_2) = (B_1(u_1), B_2(u_2), l̄_2, l̄_3), where l̄_2 = l_2(v*(w_1,w_2)), l̄_3 = l_3(v*(w_1,w_2)). The elements w_1 ∈ B_1(u_1) are strategies for Player B_1 in Γ'(u_1,u_2), while the elements w_2 ∈ B_2(u_2) are strategies for Player B_2. Suppose the game Γ'(u_1,u_2) has a Nash equilibrium denoted as (w_1*(u_1), w_2*(u_2)). Note that w_i*(·) is a function of the parameter u_i, and w_i*(·) ∈ B̄_i, i = 1,2.

Further, let u* = (u_1*, u_2*) be a solution to the extremal problem

max_{u∈U} l_1(v*(w_1*(u_1), w_2*(u_2))).  (4.6.7)

Lemma. The situation (u*, w_1*(·), w_2*(·), v*(·)) is a Nash equilibrium in the game Γ.
Proof. By the definition of u*, from (4.6.7) follows the relationship

K_1(u*, w_1*(·), w_2*(·), v*(·)) = max_{u∈U} l_1(v*(w_1*(u_1), w_2*(u_2)))
≥ l_1(v*(w_1*(u_1), w_2*(u_2))) = K_1(u, w_1*(·), w_2*(·), v*(·))

for all u ∈ U. Since w_1*(u_1*), w_2*(u_2*) form a Nash equilibrium in the auxiliary game Γ'(u_1*, u_2*), the relationships

K_2(u*, w_1*(·), w_2*(·), v*(·)) = l_2(v*(w_1*(u_1*), w_2*(u_2*)))
≥ l_2(v*(w_1(u_1*), w_2*(u_2*))) = K_2(u*, w_1(·), w_2*(·), v*(·))

hold for any function w_1(·) ∈ B̄_1, w_1(u_1*) = w_1 ∈ B_1(u_1*).

A similar inequality holds for Player B_2.

By the definition of the function v*, from (4.6.6) we have

K_4(u*, w_1*(·), w_2*(·), v*(·)) = l_4(v*(w_1*(u_1*), w_2*(u_2*)))
= max_{v∈C(w_1*(u_1*), w_2*(u_2*))} l_4(v) ≥ l_4(v) = K_4(u*, w_1*(·), w_2*(·), v(·))

for any function v(·) ∈ C̄, v(w_1*(u_1*), w_2*(u_2*)) = v ∈ C(w_1*(u_1*), w_2*(u_2*)).

This completes the proof of the lemma.
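When all the sets are finite, the three-step construction of the lemma (the solution v* of (4.6.6), a Nash equilibrium of the auxiliary game Γ'(u_1,u_2), and the solution u* of (4.6.7)) can be carried out by brute force. Everything in the sketch below (the sets U, B_1(u_1), B_2(u_2), C(w_1,w_2) and the payoffs l_1,...,l_4) is an illustrative assumption.

```python
# A brute-force sketch of the construction of 4.6.5 on finite sets.
from itertools import product

U  = [(u1, u2) for u1 in (0, 1) for u2 in (0, 1)]
B1 = lambda u1: range(u1 + 2)                # choice sets grow with the resource
B2 = lambda u2: range(u2 + 2)
C  = lambda w1, w2: range(w1 + w2 + 1)       # production programs
l  = [lambda v: 2 * v, lambda v: v, lambda v: v, lambda v: v * v]  # l_1,...,l_4

def v_star(w1, w2):                          # problem (4.6.6): C maximizes l_4
    return max(C(w1, w2), key=l[3])

def ne_prime(u1, u2):                        # a Nash equilibrium of Gamma'(u1,u2)
    for w1, w2 in product(B1(u1), B2(u2)):
        ok1 = all(l[1](v_star(x, w2)) <= l[1](v_star(w1, w2)) for x in B1(u1))
        ok2 = all(l[2](v_star(w1, y)) <= l[2](v_star(w1, w2)) for y in B2(u2))
        if ok1 and ok2:
            return w1, w2

# problem (4.6.7): A_0 maximizes l_1 given the replies of the lower levels
u_best = max(U, key=lambda u: l[0](v_star(*ne_prime(*u))))
w1, w2 = ne_prime(*u_best)
print("u* =", u_best, " (w1*, w2*) =", (w1, w2), " v* =", v_star(w1, w2))
```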
4.6.6. Applying the maximin approach, for every coalition S ⊂ {A_0, B_1, B_2, C} we define v'(S) to be the guaranteed maximum payoff to S in a zero-sum game between coalition S, acting as the maximizer, and coalition S' = {A_0, B_1, B_2, C} \ S. Suppose there exists v_0 ∈ C(w_1,w_2) for all w_1, w_2 such that l_i(v_0) = 0, i = 1,2,3,4. We shall distinguish two forms of coalitions: 1) S : C ∉ S; 2) S : C ∈ S.

In the first case S ⊂ {A_0, B_1, B_2} and Player C, a member of the coalition N \ S, may choose the strategy v_0 : l_i(v_0) = 0, i = 1,2,3,4; therefore v'(S) = 0.

In the second case the characteristic function v'(S) is defined by the following equalities:

a) S = {C}:
v'(S) = min_{u∈U} min_{w_1∈B_1(u_1)} min_{w_2∈B_2(u_2)} max_{v∈C(w_1,w_2)} l_4(v)

(in what follows we assume that all max and min are achieved);

b) S = {A_0, C}:
v'(S) = max_{u∈U} min_{w_1∈B_1(u_1)} min_{w_2∈B_2(u_2)} max_{v∈C(w_1,w_2)} (l_1(v) + l_4(v));

c) S = {B_1, C}:
v'(S) = min_{u∈U} max_{w_1∈B_1(u_1)} min_{w_2∈B_2(u_2)} max_{v∈C(w_1,w_2)} (l_2(v) + l_4(v));

d) S = {B_2, C}:
v'(S) = min_{u∈U} max_{w_2∈B_2(u_2)} min_{w_1∈B_1(u_1)} max_{v∈C(w_1,w_2)} (l_3(v) + l_4(v));

e) S = {B_1, B_2, C}:
v'(S) = min_{u∈U} max_{w_1∈B_1(u_1)} max_{w_2∈B_2(u_2)} max_{v∈C(w_1,w_2)} Σ_{i≠1} l_i(v);

f) S = {A_0, B_1, C}:
v'(S) = max_{u∈U} max_{w_1∈B_1(u_1)} min_{w_2∈B_2(u_2)} max_{v∈C(w_1,w_2)} Σ_{i≠3} l_i(v);

g) S = {A_0, B_2, C}:
v'(S) = max_{u∈U} max_{w_2∈B_2(u_2)} min_{w_1∈B_1(u_1)} max_{v∈C(w_1,w_2)} Σ_{i≠2} l_i(v);

h) S = {A_0, B_1, B_2, C}:
v'(S) = max_{u∈U} max_{w_1∈B_1(u_1)} max_{w_2∈B_2(u_2)} max_{v∈C(w_1,w_2)} Σ_{i=1}^4 l_i(v).

Under this definition, the characteristic function exhibits super-additivity, i.e. the inequality v'(S ∪ R) ≥ v'(S) + v'(R) holds for any S, R ⊂ {A_0, B_1, B_2, C} for which S ∩ R = ∅.
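On the same finite data, the quantities a)-h) can be evaluated directly; the sketch below reuses the sets U, B_1, B_2, C and the payoffs l_1,...,l_4 of the previous sketch. It reads the formulas so that coalition members maximize, the minimizing outsiders at the middle level reply to the members' choices, and C always moves last; this ordering reproduces cases c), d), f), g) as written above.

```python
# Maximin characteristic function v'(S) of 4.6.6 for the toy diamond game.
def total(S, v):                             # payoff of coalition S at program v
    idx = {"A0": 0, "B1": 1, "B2": 2, "C": 3}
    return sum(l[idx[p]](v) for p in S)

def v_prime(S):
    if "C" not in S:                         # C, outside S, plays v_0 = 0
        return 0.0
    def val(w1, w2):                         # C in S maximizes the coalition sum
        return max(total(S, v) for v in C(w1, w2))
    def middle(u1, u2):
        b1_in, b2_in = "B1" in S, "B2" in S
        if b1_in and b2_in:
            return max(val(w1, w2) for w1 in B1(u1) for w2 in B2(u2))
        if not b1_in and not b2_in:
            return min(val(w1, w2) for w1 in B1(u1) for w2 in B2(u2))
        if b1_in:                            # case c): max over w1, worst reply w2
            return max(min(val(w1, w2) for w2 in B2(u2)) for w1 in B1(u1))
        return max(min(val(w1, w2) for w1 in B1(u1)) for w2 in B2(u2))
    opt_u = max if "A0" in S else min
    return opt_u(middle(u1, u2) for (u1, u2) in U)

for S in ({"C"}, {"A0", "C"}, {"B1", "C"}, {"B2", "C"}, {"B1", "B2", "C"},
          {"A0", "B1", "C"}, {"A0", "B2", "C"}, {"A0", "B1", "B2", "C"}):
    print(sorted(S), v_prime(S))
```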
4.7 Multistage games with incomplete information
4.7.1. In Secs. 4.1-4.4 we considered multistage games with perfect information defined in terms of a finite tree graph G = (X, F) in which each of the players knows exactly, at his move, the position or the tree node where he stays. That is why we were able to introduce the notion of player i's strategy as a single-valued function u_i(x) defined on the priority set X_i with its values in the set F_x. If, however, we wish to study a multistage game in which the players, making their choices, have no exact knowledge of the positions in which they make their moves, and may merely know that this position belongs to some subset A of the priority set X_i, then the realization of the player's strategy as a function of the position x ∈ X_i turns out to be impossible. In
this manner the wish to complicate the information structure of a game inevitably
involves changes in the notion of a strategy. In order to provide exact formulations,
we should first formalize the notion of information in the game. Here the notion of
an information set plays an important role. This will be illustrated with some simple,
already classical examples from texts on game theory [McKinsey (1952)].
Example 7. (Zero-sum game.) Player 1 selects at the first move a number from
the set {1,2}. The second move is to be made by Player 2. He is informed about
Player 1's choice and selects a number from the set {1,2}. The third move is again
to be made by Player 1. He knows Player 2's choice, remembers his own choice and
selects a number from the set {1,2}. At this point the game terminates and Player
1 receives a payoff H (Player 2 receives a payoff (-H), i.e. the game is zero-sum), where the function H is defined as follows:

H(1,1,1) = -3,  H(2,1,1) = 4,
H(1,1,2) = -2,  H(2,1,2) = 1,
H(1,2,1) = 2,   H(2,2,1) = 1,
H(1,2,2) = -5,  H(2,2,2) = 5.  (4.7.1)
The graph G = (X, F) of the game is depicted in Fig. 4.8. The circles in the graph represent positions in which Player 1 makes a move, whereas the blocks represent positions in which Player 2 makes a move.

If the set X_1 is denoted by X, the set X_2 by Y and the elements of these sets by x ∈ X, y ∈ Y, respectively, then Player 1's strategy u_1(·) is given by the five-dimensional vector u_1(·) = {u_1(x_1), u_1(x_2), u_1(x_3), u_1(x_4), u_1(x_5)} prescribing the choice of one of the two numbers {1,2} in each position of the set X. Similarly, Player 2's strategy u_2(·) is a two-dimensional vector u_2(·) = {u_2(y_1), u_2(y_2)} prescribing the choice of one of the two numbers {1,2} in each of the positions of the set Y. Now, in this game Player 1 has 32 strategies and Player 2 has 4 strategies. The corresponding normal form of the game has a 32×4 matrix which (this follows from the Theorem in 4.2.1) has an equilibrium in pure strategies. It can be seen that the value of this game is 4. Player 1 has four optimal pure strategies: (2,1,1,1,2), (2,1,2,1,2), (2,2,1,1,2), (2,2,2,1,2). Player 2 has two optimal strategies: (1,1), (2,1).
Figure 4.8

Example 8. We shall slightly modify the information conditions of Example 7. The game is zero-sum. The first move is made by Player 1. He selects a number from the set {1,2}. The second move is made by Player 2. He is informed about Player 1's choice and selects a number from the set {1,2}. The third move is made by Player 1. Without knowledge of Player 2's choice and with no memory of his own choice he chooses a number from the set {1,2}. At this point the game terminates and the payoff is determined by formula (4.7.1) in the same way as in Example 7.

The graph of the game, G = (X,F), remains unaffected. In the nodes x_2, x_3, x_4, x_5 (at the third move in the game) Player 1 cannot identify exactly the node in which he actually stays. With the knowledge of the priority of his move (the third move), he can be sure only that he is not in the node x_1. In the graph G the nodes x_2, x_3, x_4, x_5 are traced by the dashed line (Fig. 4.9).

Figure 4.9

The node x_1 is enclosed in a circle, which may be interpreted as Player 1's exact knowledge of this node when he stays in it. The nodes y_1, y_2 are enclosed in blocks, which also means that Player 2, staying in one of them at his move, can distinguish this node from the other. Combining the nodes x_2, x_3, x_4, x_5 into one set, we illustrate the fact that they are indistinguishable for Player 1.

The sets into which the nodes are collected in this way are called information sets.

We shall now describe strategies. The information state of Player 2 remains unchanged; therefore the set of his strategies is the same as in Example 7, i.e. it consists of the four vectors (1,1), (1,2), (2,1), (2,2). The information state of Player 1 has changed. At the third step of the game he knows only the number of this step, but does not know the position in which he stays. Therefore, he cannot make the choice of the next node (or the choice of a number from the set {1,2}) depend on the position in which he stays at the third step. For this reason, irrespective of the actually realized position, he has to choose at the third step one of the two numbers {1,2}. Thus a strategy for him is a pair of numbers (i,j), i,j ∈ {1,2}, where the number i is chosen in position x_1, while the number j chosen at the third step is the same in all positions x_2, x_3, x_4, x_5. Now the choice of the number j turns out to be a function of the information set and can be written as u_1({x_2, x_3, x_4, x_5}) = j. In this game each of the two
players has four strategies and the matrix of the game is

        (1,1)  (1,2)  (2,1)  (2,2)
(1,1)  [ -3     -3      2      2 ]
(1,2)  [ -2     -2     -5     -5 ]
(2,1)  [  4      1      4      1 ]
(2,2)  [  1      5      1      5 ]

This game has no equilibrium in pure strategies. The value of the game is 19/7, an optimal mixed strategy for Player 1 is the vector (0,0,4/7,3/7), and an optimal mixed strategy for Player 2 is (4/7,3/7,0,0). The guaranteed payoff to Player 1 is reduced as compared to the one in Example 7. This is due to the degradation of his information state.

It is interesting to note that the game in Example 8 has a 4×4 matrix, whereas the game in Example 7 has a 32×4 matrix. The deterioration of available information thus reduces the size of the payoff matrix and hence facilitates the solution of the game itself. But this contradicts the widespread belief that the deterioration of information results in a complication of decision-making.
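The value 19/7 and the optimal mixed strategies can be verified by the standard linear programming reduction of a matrix game. The sketch below uses scipy for the LP; this is an implementation choice, not part of the text.

```python
# Solving the 4x4 matrix game of Example 8 by LP: maximize v s.t. p^T A >= v.
import numpy as np
from scipy.optimize import linprog

A = np.array([[-3, -3,  2,  2],
              [-2, -2, -5, -5],
              [ 4,  1,  4,  1],
              [ 1,  5,  1,  5]], dtype=float)
m, n = A.shape

c = np.r_[np.zeros(m), -1.0]                  # variables p_1..p_m, v; minimize -v
A_ub = np.c_[-A.T, np.ones(n)]                # v <= sum_i p_i a_ij for each column j
b_ub = np.zeros(n)
A_eq = np.r_[np.ones(m), 0.0].reshape(1, -1)  # sum_i p_i = 1
b_eq = [1.0]
bounds = [(0, None)] * m + [(None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print("value =", res.x[m], " (19/7 =", 19 / 7, ")")
print("optimal p =", res.x[:m].round(4))      # (0, 0, 4/7, 3/7)
```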
Modifying the information conditions we may obtain other variants of the game described in Example 7.

Example 9. Player 1 chooses at the first move a number from the set {1,2}. The second move is made by Player 2, who, without knowing Player 1's choice, chooses a number from the set {1,2}. Further, the third move is made by Player 1. Being informed about Player 2's choice and with the memory of his own choice at the first step, he chooses a number from the set {1,2}. The payoff is determined in the same way as in Example 7 (Fig. 4.10).
Since on the third move the player knows the position in which he stays, the positions of the third level are enclosed in circles, and the two nodes in which Player 2 makes his move are traced by the dashed line and are included in one information set.

Figure 4.10
Example 10. Player 1 chooses a number from the set {1,2} at the first move. The second move is made by Player 2 without being informed about Player 1's choice. Further, at the third move Player 1 chooses a number from the set {1,2} without knowing Player 2's choice and with no memory of his own choice at the first step. The payoff is determined in the same way as in Example 7 (Fig. 4.11).

Figure 4.11

Here the strategy of Player 1 consists of a pair of numbers (i,j): the i-th choice is made at the first step, and the j-th choice at the third step; the strategy of Player 2 is the choice of a number j at the second step of the game. Now, Player 1 has four strategies

and Player 2 has two strategies. The game in normal form has a 4×2 matrix:

         1     2
(1,1)  [ -3    2 ]
(1,2)  [ -2   -5 ]
(2,1)  [  4    1 ]
(2,2)  [  1    5 ]

The value of the game is 19/7, an optimal mixed strategy for Player 1 is (0,0,4/7,3/7), whereas an optimal strategy for Player 2 is (4/7,3/7).

In this game the value is found to be the same as in Example 8, i.e. it turns out that the deterioration of the information conditions for Player 2 did not improve the state of Player 1. This coincidence is accidental and is accounted for by special features of the payoff function.
Example 11. In the previous examples the players fail to distinguish among positions placed at the same level of the game tree, but they do know the number of the move to be made. It is possible to construct games in which the players' ignorance goes still further.

Let us consider a zero-sum two-person game in which Player 1 is one person, whereas Player 2 is a team of two persons, A and B. All three persons are placed in different rooms and cannot communicate with each other. At the start of the game a mediator comes to Player 1 and suggests that he should choose a number from the set {1,2}. If Player 1 chooses 1, the mediator suggests that A should be the first to make his choice. However, if Player 1 chooses 2, the mediator suggests that B should be the first to make his choice. Once these three numbers have been chosen, Player 1 wins an amount K(x,y,z), where x is the choice made by Player 1, and y and z are the choices made at the second and the third move of the game, respectively. The payoff function K(x,y,z) is defined as follows:

K(1,1,1) = 1,  K(1,1,2) = 3,
K(1,2,1) = 7,  K(1,2,2) = 9,
K(2,1,1) = 5,  K(2,1,2) = 1,
K(2,2,1) = 6,  K(2,2,2) = 7.
From the rules of the game it follows that when a member of the team, A or B, is asked to make his choice, he does not know whether he makes his choice at the second or at the third step of the game. The structure of the game is shown in Fig. 4.12.

Now, the information sets of Player 2 contain nodes belonging to different levels, which corresponds to ignorance of the number of the move in the game. Here Player 1 has two strategies, whereas Player 2 has four strategies composed of all possible choices by the members of the team, A and B, i.e. strategies for him are the pairs (1,1), (1,2), (2,1), (2,2).
Figure 4.12

In order to understand how the elements of the payoff matrix are determined, we consider the situation (2,(2,1)). Since Player 1 has chosen 2, the mediator goes to B who, in accordance with the strategy (2,1), chooses 1. Then the mediator goes to A who chooses 2. Thus the payoff in situation (2,(2,1)) is K(2,1,2) = 1. The payoff matrix for the game in normal form is

     (1,1)  (1,2)  (2,1)  (2,2)
1  [   1      3      7      9 ]
2  [   5      6      1      7 ]

The value of the game is 17/5, and the optimal mixed strategies for Players 1 and 2, respectively, are (2/5,3/5) and (3/5,0,2/5,0).

Note that in multistage games with perfect information (see the Theorem in 4.2.1) there exists a Nash equilibrium in the class of pure strategies and, in particular, in multistage zero-sum games with perfect information there exists a saddle point in pure strategies. Yet none of the games with incomplete information discussed in Examples 8-11 has an equilibrium in pure strategies.
4.7.2. We shall now give a formal definition of a multistage game in extensive form.

Definition. [Kuhn (1953)]. The n-person game in extensive form is defined by:

1) Specifying the tree graph G = (X,F) with the initial vertex x_0 referred to as the initial position of the game.

2) Partitioning the set of all vertices X into n+1 sets X_1, X_2,...,X_n, X_{n+1}, where the set X_i is called the priority set of the i-th player, i = 1,...,n, and the set X_{n+1} = {x : F_x = ∅} is called the set of final positions.

3) Specifying the vector function K(x) = (K_1(x),...,K_n(x)) on the set of final positions x ∈ X_{n+1}; the function K_i(x) is called the payoff to the i-th player.

4) Subpartitioning each set X_i, i = 1,...,n, into nonoverlapping subsets X_i^j referred to as information sets of the i-th player. In this case, for any positions of one and the same information set the sets of their subsequent vertices should contain one and the same number of vertices, i.e. for any x,y ∈ X_i^j, |F_x| = |F_y| (|F_x| is the number of elements of the set F_x), and no vertex of an information set should follow another vertex of this set, i.e. if x ∈ X_i^j, then there is no other vertex y ∈ X_i^j such that y ∈ F̂_x (see 4.1.2).
(see 4.1.2).
The definition of a multistage game with perfect information (see 4.1.4) is distinguished from the one given here only by condition 4, where additional partitions of the players' priority sets X_i into information sets are introduced. As may be seen from the above examples, the conceptual meaning of such a partition is that when player i makes his move in position x ∈ X_i under incomplete information, he does not know the position x itself, but knows only that this position lies in a certain set X_i^j ⊂ X_i (x ∈ X_i^j). Some restrictions are imposed by condition 4 on the players' information sets. The requirement |F_x| = |F_y| for any two vertices of the same information set is introduced to make the vertices x,y ∈ X_i^j indistinguishable. In fact, with |F_x| ≠ |F_y| player i could distinguish between the vertices x,y ∈ X_i^j by the number of arcs emanating therefrom. If one information set could have two vertices x,y such that y ∈ F̂_x, this would mean that a play of the game could intersect an information set twice; but this in turn is equivalent to the fact that player i has no memory of the number of his move in this play, which can hardly be conceived in an actual play of the game.

4.8 Behavior strategy


We shall continue the examination of games in extensive form and show that in the case of complete memory (perfect recall) for all players such a game has an equilibrium in behavior strategies.
4.8.1. For the purposes of further discussion we need to introduce some additional
notions.
Definition. The arcs incident to x, i.e. {(x,y) : y ∈ F_x}, are called alternatives at the vertex x ∈ X.

If |F_x| = k, at the vertex x there are k alternatives. We assume that if at the vertex x there are k alternatives, then they are designated by the integers 1,...,k with the vertex x bypassed in a clockwise sense. The first alternative at the vertex x_0 is indicated in an arbitrary way. If some vertex x ≠ x_0 is bypassed in a clockwise sense, then the alternative which follows the single arc (F_x^{-1}, x) entering x (Fig. 4.13) is called the first alternative at x.

Figure 4.13

Suppose that in the game Γ all alternatives are enumerated as above. Let A_k be the set of all vertices x ∈ X having exactly k alternatives, i.e. A_k = {x : |F_x| = k}. Let J_i = {X_i^j : X_i^j ⊂ X_i} be the set of all information sets of player i. By definition, a pure strategy of player i is a function u_i mapping J_i into the set of positive integers so that u_i(X_i^j) ≤ k if X_i^j ⊂ A_k. We say that the strategy u_i chooses alternative l in position x ∈ X_i^j if u_i(X_i^j) = l, where l is the number of the alternative.

As in 4.1.4, we may show that to each situation u(·) = (u_1(·),...,u_n(·)) there uniquely corresponds a play ω, and hence the payoff in the final position of this play.
Let x ∈ X_{n+1} be a final position and ω_x the only path (G is a tree) leading from x_0 to x. The condition that the position y belongs to the path ω_x will be written as y ∈ ω_x or y < x.

Definition. Position x ∈ X is called possible for u_i(·) if there exists a situation u(·) containing u_i(·) such that the path ω containing the position x is realized in the situation u(·), i.e. x ∈ ω. The information set X_i^j is called relevant for u_i(·) if some position x ∈ X_i^j is possible for u_i(·).

The set of positions possible for u_i(·) is denoted by Poss u_i(·), while the collection of information sets relevant for u_i(·) is denoted by Rel u_i(·).
Lemma. Position x ∈ X is possible for u_i(·) if and only if u_i(·) chooses alternatives lying on the path ω_x from x_0 to x in all its information sets intersecting ω_x.

Proof. Let x ∈ Poss u_i(·). Then there exists a situation u(·) containing u_i(·) such that the path ω realized in this situation passes through x, which exactly means that in all its information sets intersecting the path ω_x the strategy u_i(·) chooses alternatives (arcs) belonging to ω_x.

Now let u_i(·) choose all alternatives for player i in ω_x. In order to prove the possibility of x for u_i(·) we need to construct a situation u(·) containing u_i(·) in which the path ω would pass through x. For each player k ≠ i we construct a strategy u_k(·) which, in the information sets X_k^j intersecting the path ω_x, chooses alternatives (arcs) lying on this path and is arbitrary otherwise. Since each information set intersects the path ω_x at most once, this can always be done. In the resulting situation u(·) the path ω necessarily passes through x; hence we have shown that x ∈ Poss u_i(·).
4.8.2. Mixed strategies in games in extensive form Γ are defined in the same way as in 1.4.2 for finite games.

Definition. The probability distribution over the set of pure strategies of player i which places his every pure strategy u_i(·) in correspondence with a probability q_{u_i(·)} (for simplicity we write q_{u_i}) is called a mixed strategy μ_i of player i.

The situation μ = (μ_1,...,μ_n) in mixed strategies determines the probability distribution over all plays (paths) ω (hence, over final positions x ∈ X_{n+1} as well) by the formula

P_μ(ω) = Σ_{u(·)} q_{u_1} ··· q_{u_n} P_u(ω),

where P_u(ω) = 1 if the play (path) ω is realized in the situation u(·) and P_u(ω) = 0 otherwise.
Lemma. Denote by P_μ(x) the probability that the position x is realized in the situation μ. Then we have

P_μ(x) = Σ_{{u(·): x∈Poss u_i(·), i=1,...,n}} q_{u_1} ··· q_{u_n} = Π_{i=1}^n Σ_{{u_i: x∈Poss u_i}} q_{u_i}.  (4.8.1)

The proof of this statement immediately follows from the Lemma in 4.8.1. The mathematical expectation of the payoff E_i(μ) of player i in the situation μ is

E_i(μ) = Σ_{x∈X_{n+1}} K_i(x) P_μ(x),  (4.8.2)

where P_μ(x) is computed by formula (4.8.1).


Definition. Position x ∈ X is possible for μ_i if there exists a mixed strategy situation μ containing μ_i such that P_μ(x) > 0. The information set X_i^j of player i is called relevant for μ_i if some x ∈ X_i^j is possible for μ_i.

The set of positions possible for μ_i is denoted by Poss μ_i, and the collection of information sets relevant for μ_i is denoted by Rel μ_i.
4.8.3. Examination of multistage games with perfect information (see 4.3.3) shows that the choices can be made step by step, at the position actually reached in the game, so that in the solution of specific problems it is not necessary (and not feasible) to determine a strategy in advance, i.e. a complete rule of recommended behavior in all positions (information sets), since such a rule (see the Example in 4.2.2) "suffers from strong redundancy". The question now arises of whether a similar simplification is feasible in games with incomplete information. In other words, is it possible to form the choice as the play reaches a given information set, rather than to construct the strategy as a rule, fixed in advance, for selection in all information sets? It turns out that in the general case this is not feasible. However, there exists a class of games in extensive form where such a simplification is feasible. Let us introduce the notion of a behavior strategy.

Definition. By definition, a behavior strategy β_i of player i is a rule which places each information set X_i^j ⊂ A_k of player i in correspondence with a system of k numbers b(X_i^j, ν) ≥ 0, ν = 1,...,k, such that

Σ_{ν=1}^k b(X_i^j, ν) = 1,

where A_k = {x : |F_x| = k}.

The numbers b(X_i^j, ν) can be interpreted as the probabilities of choosing alternative ν in the information set X_i^j ⊂ A_k, each position of which contains exactly k alternatives.
Any behavior strategy situation β = (β_1,...,β_n) of the n players determines the probability distribution over the plays of the game, and over final positions, as follows:

P_β(ω) = Π b(X_i^j, ν).  (4.8.3)

Here the product is taken over all X_i^j, ν such that X_i^j ∩ ω ≠ ∅ and the choice at the point X_i^j ∩ ω of the alternative numbered ν leads to a position belonging to the path ω.

In what follows it is convenient to interpret the notion of a "path" not only as the set of its component positions, but also as the set of the corresponding alternatives (arcs).
The expected payoff E_i(β) in the behavior strategy situation β = (β_1,...,β_n) is defined to be the expectation

E_i(β) = Σ_{x∈X_{n+1}} K_i(x) P_β(ω_x), i = 1,...,n,

where ω_x is the play terminating in the position x ∈ X_{n+1}.

4.8.4. To every mixed strategy μ_i there corresponds a particular behavior strategy β_i.

Definition. The behavior strategy β_i corresponding to player i's mixed strategy μ_i = {q_{u_i}} is the behavior strategy defined as follows.

If X_i^j ∈ Rel μ_i, then

b_i(X_i^j, ν) = ( Σ_{{u_i: X_i^j∈Rel u_i, u_i(X_i^j)=ν}} q_{u_i} ) / ( Σ_{{u_i: X_i^j∈Rel u_i}} q_{u_i} ).  (4.8.4)

If X_i^j ∉ Rel μ_i, then on the set X_i^j the strategy β_i can be defined, distinct from (4.8.4), in an arbitrary way. (In the case X_i^j ∉ Rel μ_i the denominator in (4.8.4) goes to zero.) For definiteness, let

b_i(X_i^j, ν) = Σ_{{u_i: u_i(X_i^j)=ν}} q_{u_i}.  (4.8.5)

We shall present the following result without proof.


Lemma. Let β_i be a behavior strategy of player i and μ_i = {q_{u_i}} be the mixed strategy determined by

q_{u_i} = Π_{X_i^j∈J_i} b_i(X_i^j, u_i(X_i^j)).

Then β_i is the behavior strategy corresponding to μ_i.
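The correspondence (4.8.4)-(4.8.5) is easy to trace on a miniature example. In the sketch below one player has two information sets, and the assumed relevance structure (the second set is relevant for a pure strategy exactly when the strategy chooses alternative 1 in the first set) is a stand-in for the tree of an actual game.

```python
# From a mixed strategy mu_i = {q_u} to the behavior strategy (4.8.4)-(4.8.5).
from itertools import product

pure = list(product((1, 2), repeat=2))       # u = (u(X1), u(X2))
q = {u: 0.25 for u in pure}                  # a mixed strategy mu_i (uniform)

def relevant(u, info):                       # assumed Rel u structure
    return True if info == "X1" else u[0] == 1

def b(info, nu):
    idx = {"X1": 0, "X2": 1}[info]
    num = sum(q[u] for u in pure if relevant(u, info) and u[idx] == nu)
    den = sum(q[u] for u in pure if relevant(u, info))
    if den > 0:
        return num / den                     # formula (4.8.4)
    return sum(q[u] for u in pure if u[idx] == nu)   # convention (4.8.5)

for info in ("X1", "X2"):
    print(info, [b(info, nu) for nu in (1, 2)])
```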


4.8.5. Definition. [Kuhn (1953)]. The game Γ is called a game with perfect recall for the i-th player if for any u_i(·), X_i^j, x, from the conditions X_i^j ∈ Rel u_i and x ∈ X_i^j it follows that x ∈ Poss u_i.

From the definition it follows that in a perfect recall game for the i-th player any position from an information set relevant for u_i(·) is also possible for u_i(·). The term "perfect recall" underlines the fact that, appearing in any one of his information sets, the i-th player can exactly reconstruct which of the alternatives (i.e. numbers) he has chosen on all his previous moves (by one-to-one correspondence) and remembers everything he has known about his opponents. A perfect recall game for all players becomes a game with perfect information if all its information sets contain one vertex each.
4.8.6. Lemma. Let Γ be a perfect recall game for all players and let ω be a play in Γ. Suppose x ∈ X_i^j is the last position of the path ω in which player i makes his move, and suppose he chooses at x an arc ν. Let

T_i(ω) = {u_i : X_i^j ∈ Rel u_i, u_i(X_i^j) = ν}.

If ω has no positions from X_i, then we denote by T_i(ω) the set of all pure strategies of player i. Then the play ω is realized only in those situations u(·) = (u_1(·),...,u_n(·)) for which u_i ∈ T_i(ω).

Proof. Sufficiency. It suffices to show that if u_i ∈ T_i(ω), then the strategy u_i chooses all the arcs (alternatives) for player i appearing in the play ω (if player i has a move in ω). However, if u_i ∈ T_i(ω), then X_i^j ∈ Rel u_i, and since the game Γ has perfect recall, x ∈ Poss u_i (x ∈ ω). Thus, by the Lemma in 4.8.1, the strategy u_i chooses all the alternatives for player i appearing in the play ω.

Necessity. Suppose the play ω is realized in a situation u(·), where u_i ∉ T_i(ω) for some i. Since X_i^j ∈ Rel u_i, this means that u_i(X_i^j) ≠ ν. But then the path ω is not realized. This contradiction completes the proof of the lemma.
4.8.7. Lemma. Let Γ be a perfect recall game for all players. Suppose ν is an alternative (arc) in a play ω that is incident to x ∈ X_i^j, where x ∈ ω, and the next position of player i (if any) on the path ω is y ∈ X_i^l. Consider the sets S and T, where

S = {u_i : X_i^j ∈ Rel u_i, u_i(X_i^j) = ν},
T = {u_i : X_i^l ∈ Rel u_i}.

Then S = T.

Proof. Let u_i ∈ S. Then X_i^j ∈ Rel u_i, and since Γ has perfect recall, x ∈ Poss u_i. By the Lemma in 4.8.1, it follows that the strategy u_i chooses all the arcs incident to player i's positions on the path from x_0 to x, and moreover u_i(X_i^j) = ν. Thus, u_i chooses all the arcs incident to player i's positions on the path from x_0 to y, i.e. y ∈ Poss u_i, X_i^l ∈ Rel u_i and u_i ∈ T.

Let u_i ∈ T. Then X_i^l ∈ Rel u_i, and since Γ has perfect recall, y ∈ Poss u_i. But this means that x ∈ Poss u_i and u_i(X_i^j) = ν, i.e. u_i ∈ S. This completes the proof of the lemma.
4.8.8. Theorem. Let β be a situation in behavior strategies corresponding to a situation in mixed strategies μ in the game Γ (in which all positions have at least two alternatives). Then for

E_i(β) = E_i(μ), i = 1,...,n,

it is necessary and sufficient that Γ be a perfect recall game for all players.

Proof. Sufficiency. Let Γ be a perfect recall game for all players. Fix an arbitrary μ. It suffices to show that P_μ(ω) = P_β(ω) for all plays ω. If in ω there exists a position of player i belonging to an information set that is irrelevant for μ_i, then there is X_i^j ∈ Rel μ_i, X_i^j ∩ ω ≠ ∅, such that the equality b(X_i^j, ν) = 0, where ν ∈ ω, holds for the behavior strategy β_i corresponding to μ_i. Hence we have P_β(ω) = 0. The validity of the relationship P_μ(ω) = 0 in this case is obvious.
We now assume that all the information sets of the i-th player through which the play ω passes are relevant for μ_i, i = 1,2,...,n. Suppose player i in the play ω makes his successive moves in positions belonging to the sets X_i^1,...,X_i^s and chooses in the set X_i^l the alternative ν_l, l = 1,...,s. Then, by formula (4.8.4) and Lemma 4.8.7, we have

Π_{l=1}^s b(X_i^l, ν_l) = Σ_{u_i∈T_i(ω)} q_{u_i}.

Indeed, since in the play ω player i makes his first move from the set X_i^1, this set is relevant for all u_i(·); therefore the denominator in formula (4.8.4) for b(X_i^1, ν_1) is equal to 1. Further, by Lemma 4.8.7, the numerator of b(X_i^l, ν_l) in formula (4.8.4) is equal to the denominator of b(X_i^{l+1}, ν_{l+1}), l = 1,...,s-1. By formula (4.8.3), we finally get

P_β(ω) = Π_{i=1}^n Σ_{u_i∈T_i(ω)} q_{u_i},

where T_i(ω) is determined in Lemma 4.8.6.

By Lemma 4.8.6,

P_μ(ω) = Σ_{{u(·): u_i∈T_i(ω), i=1,...,n}} q_{u_1} ··· q_{u_n} = Π_{i=1}^n Σ_{u_i∈T_i(ω)} q_{u_i},

i.e. P_μ(ω) = P_β(ω). This proves the sufficiency part of the theorem.
Necessity. Suppose Γ is not a perfect recall game for all players. Then there exist a player i, a strategy u_i, an information set X_i^j ∈ Rel u_i and two positions x, y ∈ X_i^j such that x ∈ Poss u_i, y ∉ Poss u_i. Let u_i' be a strategy of player i for which y ∈ Poss u_i', and let ω be the corresponding play passing through y in a situation u'. Denote by μ_i a mixed strategy of player i which prescribes with probability 1/2 the choice of the strategy u_i or u_i'. Then P_{u'||μ_i}(y) = P_{u'||μ_i}(ω) = 1/2 (here u'||μ_i is the situation in which the pure strategy u_i' is replaced by the mixed strategy μ_i). From the condition y ∉ Poss u_i it follows that the path ω̃ realized in the situation u'||u_i does not pass through y. This means that there exists X_i^k such that X_i^k ∩ ω = X_i^k ∩ ω̃ ≠ ∅ and u_i(X_i^k) ≠ u_i'(X_i^k). Hence, in particular, it follows that X_i^k ∈ Rel u_i, X_i^k ∈ Rel u_i'. Let β_i be the behavior strategy corresponding to μ_i. Then b(X_i^k, u_i'(X_i^k)) = 1/2. We may assume without loss of generality that u_i(X_i^j) ≠ u_i'(X_i^j). Then b(X_i^j, u_i'(X_i^j)) = 1/2. Denote by β the situation in behavior strategies corresponding to the mixed strategy situation u'||μ_i. Then P_β(ω) ≤ 1/4, whereas P_{u'||μ_i}(ω) = 1/2. This completes the proof of the theorem.

From Theorem 4.8.8, in particular, it follows that in order to find an equilibrium in games with perfect recall it is sufficient to restrict ourselves to the class of behavior strategies.

4.9 Functional equations for simultaneous multistage games
The behavior strategy theorem proved in the preceding section fails, in the general case, to provide a means of immediately solving multistage games with perfect recall. However, when the information sets have a simple structure this theorem provides a basis for the derivation of functional equations for the value of the game, and for methods of finding optimal strategies based on these equations. The simplest games with perfect recall, excluding games with perfect information, are the so-called repeated zero-sum games. We shall derive a functional equation for the value of such games and consider some popular examples [Diubin and Suzdal (1981), Owen (1968)] where these equations are solvable.

4.9.1. Conceptually, a repeated game is a multistage zero-sum game where at each step Players 1 and 2 choose their actions simultaneously, i.e. without being informed about the opponent's choice at this moment. After the choices have been made they become known to both players, and the players again make their choices simultaneously, and so on.
Such a game can be represented with the help of a graph which may have one of
the two representations a) or b) in Fig. 4.14.

Figure 4.14

The graph represents an alternating game with an even number of moves, where the information sets of the player who makes the first move are single-element, while the information sets of the other player are two-element. In such a game Γ both players have perfect recall. Therefore, in this game, by Theorem 4.8.8, the search for an equilibrium may be restricted to the class of behavior strategies.

For definiteness, we assume that the first move in Γ is made by Player 1 and that for every x ∈ X_1 there is a subgame Γ_x which has the same structure as the game Γ. The normal form of any finite-stage zero-sum game with incomplete information is a matrix game, i.e. a zero-sum game with a finite number of strategies; therefore in all subgames Γ_x, x ∈ X_1 (including the game Γ = Γ_{x_0}) there exists an equilibrium in the class of mixed strategies. By Theorem 4.8.8, such an equilibrium also exists in the class of behavior strategies, and the values of the game (i.e. the values of the payoff function in a mixed strategy equilibrium or in a behavior strategy equilibrium) are equal.

Denote the value of the game Γ_x by v(x), x ∈ X_1, and set up functional equations for v(x).
For each x ∈ X_1 the next position x' (if any) in which Player 1 makes his move is realized as a result of two consecutive choices: first by Player 1 of an arc incident to the vertex x, and then by Player 2 of an arc in the positions y ∈ F_x forming an information set of Player 2. Hence we may say that the position x' results from a mapping T_x depending on the choices α, β of Players 1 and 2, i.e.

x' = T_x(α, β).

Since the number of the various alternatives α and β is finite, for every x ∈ X_1 we may consider the matrix game with the payoff matrix A_x = {v(T_x(α,β))}. Let β_1*(x) = {b_1*(x,α)}, β_2*(x) = {b_2*(x,β)} be optimal mixed strategies in the game with the matrix A_x. Then we have the following theorem on the structure of optimal strategies in the game Γ.

Theorem. In the game Γ an optimal behavior strategy of Player 1 at the point x (each information set of Player 1 in the game Γ consists of one position x ∈ X_1) assigns probabilities to the alternatives α in accordance with an optimal mixed strategy of Player 1 in the matrix game A_x = {v(T_x(α,β))}, that is

b_1(x, α) = b_1*(x, α).

An optimal behavior strategy {b_2(X_2^j, β)} of Player 2 in the game Γ assigns probabilities to the alternatives β in accordance with an optimal mixed strategy of Player 2 in the game with the matrix A_x, i.e.

b_2(X_2^j, β) = b_2*(x, β),

where x = F_y^{-1} if y ∈ X_2^j.

The value of the game satisfies the functional equation

v(x) = Val{v(T_x(α,β))}, x ∈ X_1,  (4.9.1)

with the initial condition

v(x)|_{x∈X_3} = H(x).  (4.9.2)

(Here Val A denotes the value of the game with matrix A.)

The proof is carried out by induction and is completely analogous to the proof of Theorem 4.2.1.
4.9.2. Example 12. (Game of inspection.) [Diubin and Suzdal (1981)]. Player E (Violator) wishes to take a wrongful action. There are N periods of time during which this action can be performed. Player P (Inspector) wishes to prevent this action, but can perform only one inspection, during any one of these periods. The payoff to Player E is 1 if the wrongful action remains undetected after it has been performed, and is (-1) if the violator has been detained (this is possible when he chooses for his action the same period of time as the inspector chooses for his inspection); the payoff is zero if the violator takes no action. Denote this N-step game by Γ_N.

Each player has two alternatives during the first period (at the 1st step). Player E may or may not take the action; Player P may or may not perform the inspection. If Player E acts and Player P inspects, then the game terminates and the payoff is -1. If Player E acts while Player P fails to inspect, the game terminates and the payoff is 1. If Player E does not act while Player P inspects, then Player E may take the action during the next period of time (assuming that N > 1) and the payoff will also be 1. If Player E does not act and Player P does not inspect, they pass to the next step, which differs from the previous one only in that there are fewer periods left before the end of the game, i.e. they pass to a subgame Γ_{N-1}. Therefore, the game matrix for the 1st step is as follows:

[ -1       1      ]
[  1    v_{N-1}  ]  (4.9.3)

Equation (4.9.1) then becomes

v_N = Val [ -1       1      ]
          [  1    v_{N-1}  ]  (4.9.4)

Here v(x) is the same for all game positions of the same level and hence depends only on the number of periods until the end of the game. For this reason we write v_N in place of v(x). In what follows it will be shown that v_{N-1} < 1; hence the matrix in (4.9.4) does not have a saddle point, i.e. the game with matrix (4.9.4) is completely mixed. From this (see 1.9.1) we obtain the recursive equation

v_N = (v_{N-1} + 1) / (-v_{N-1} + 3),  (4.9.5)

which together with the initial condition

v_1 = 0  (4.9.6)

determines v_N. Let us transform equation (4.9.5) by substituting t_N = 1/(v_N - 1). We obtain a new recursive equation t_N = t_{N-1} - 1/2, t_1 = -1. This equation has the obvious solution t_N = -(N+1)/2, hence we have

v_N = (N - 1)/(N + 1).  (4.9.7)

We may now compute optimal behavior strategies at each step of the game. In fact, the game matrix (4.9.4) becomes

[ -1         1       ]
[  1    (N-2)/N  ]

and the optimal behavior strategies of Players E and P coincide and are equal to

( 1/(N+1), N/(N+1) ).
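The recursion (4.9.5)-(4.9.6) and the closed form (4.9.7) are easy to confirm numerically; a minimal sketch:

```python
# Value of the inspection game: iterate (4.9.5) from v_1 = 0 and compare
# with the closed form v_N = (N - 1)/(N + 1).
def inspection_value(N):
    v = 0.0                                  # v_1 = 0, condition (4.9.6)
    for _ in range(2, N + 1):
        v = (v + 1.0) / (-v + 3.0)           # recursion (4.9.5)
    return v

for N in (1, 2, 5, 10, 50):
    print(N, inspection_value(N), (N - 1) / (N + 1))
```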
Example 13. (Game-theoretic features of the optimal use of resources.) Suppose that initially Players 1 and 2 have respectively r and R - r units of some resource, and two pure strategies each. We also assume that if the players choose the same pure strategies, then Player 2's resource is reduced by one unit. If, however, the players choose different pure strategies, then Player 1's resource is reduced by one unit. The game terminates when the resource of one of the players becomes zero. The payoff to Player 1 is 1 if the resource of Player 2 is zero, and the payoff to him is -1 if his own resource is zero.

Denote by Γ_{k,l} the multistage game in which Player 1 has k (k = 1,2,...,r) units and Player 2 has l (l = 1,...,R-r) units of the resource. Then

Val Γ_{k,l} = Val [ Val Γ_{k,l-1}   Val Γ_{k-1,l} ]
                  [ Val Γ_{k-1,l}   Val Γ_{k,l-1} ]

where Val Γ_{k,0} = 1, Val Γ_{0,l} = -1.


Consider the 1st step from the end, i.e. when both players are left with one unit of resource each. Evidently, at this step the following matrix game is played:

Γ_{1,1} = [  1  -1 ]
          [ -1   1 ]

The game Γ_{1,1} is symmetric, its value v_{1,1} is zero, and the optimal strategies of the players coincide and are equal to (1/2,1/2).

At the 2nd step from the end, i.e. when the players are left with three units of resource in total, one of the two matrix games is played: Γ_{1,2} or Γ_{2,1}. In this case

v_{1,2} = Val Γ_{1,2} = Val [ v_{1,1}  -1 ; -1  v_{1,1} ] = -1/2,

v_{2,1} = Val Γ_{2,1} = Val [ 1  v_{1,1} ; v_{1,1}  1 ] = 1/2.
At the 3rd step from the end (i.e. when the players have a total of four units of resource) one of the following three games is played: Γ_{1,3}, Γ_{2,2}, Γ_{3,1}. In this case

v_{1,3} = Val Γ_{1,3} = Val [ v_{1,2}  -1 ; -1  v_{1,2} ] = -3/4,

v_{2,2} = Val Γ_{2,2} = Val [ v_{2,1}  v_{1,2} ; v_{1,2}  v_{2,1} ] = (v_{2,1} + v_{1,2})/2 = 0,

v_{3,1} = Val Γ_{3,1} = Val [ 1  v_{2,1} ; v_{2,1}  1 ] = (v_{2,1} + 1)/2 = 3/4.
Continuing analogous computations up to the N-th step from the end, we obtain the following expression for the value of the original game:

v_{r,R-r} = Val Γ_{r,R-r} = Val [ v_{r,R-r-1}  v_{r-1,R-r} ]
                                [ v_{r-1,R-r}  v_{r,R-r-1} ].

By the symmetry of the payoff matrix of the game Γ_{r,R-r} we have

v_{r,R-r} = (v_{r,R-r-1} + v_{r-1,R-r})/2,

and the optimal behavior strategies of the players at each step coincide and are equal to (1/2,1/2).
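The whole table of values v_{k,l} follows from the recursion by a few lines of memoized computation; a minimal sketch:

```python
# Values of Example 13: v_{k,l} = (v_{k,l-1} + v_{k-1,l})/2,
# with the boundary conditions v_{k,0} = 1 and v_{0,l} = -1.
from functools import lru_cache

@lru_cache(maxsize=None)
def v(k, l):
    if l == 0:
        return 1.0
    if k == 0:
        return -1.0
    return 0.5 * (v(k, l - 1) + v(k - 1, l))

print(v(1, 1), v(1, 2), v(2, 1), v(2, 2), v(3, 1))   # 0.0 -0.5 0.5 0.0 0.75
```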
Example 14. This jocular game is played by two teams: Player 1 (m_1 women and m_2 cats) and Player 2 (n_1 mice and n_2 men). Each team chooses at each step its representative. One of the two chosen representatives is "removed" according to the following rule: a woman "removes" a man; a man "removes" a cat; a mouse "removes" a woman; a cat "removes" a mouse. The game continues until only players of one type remain in one of the groups. When a group has nothing to choose, the other group evidently wins.

Denote the value of the original game by v(m_1, m_2, n_1, n_2). Here

v(m_1, m_2, n_1, 0) = v(m_1, m_2, 0, n_2) = 1, if m_1, m_2 > 0,
v(m_1, 0, n_1, n_2) = v(0, m_2, n_1, n_2) = -1, if n_1, n_2 > 0.  (4.9.8)

Let us introduce the following notation: v(m_1-1) = v(m_1-1, m_2, n_1, n_2), v(m_2-1) = v(m_1, m_2-1, n_1, n_2), v(n_1-1) = v(m_1, m_2, n_1-1, n_2), v(n_2-1) = v(m_1, m_2, n_1, n_2-1). By Theorem 4.9.1, the following relationship holds:

v(m_1, m_2, n_1, n_2) = Val [ v(m_1-1)  v(n_2-1) ]
                            [ v(n_1-1)  v(m_2-1) ].

It can be shown that this game is completely mixed. From this (see 1.9.1) we have

v(m_1, m_2, n_1, n_2) = (v(m_1-1) v(m_2-1) - v(n_1-1) v(n_2-1)) / (v(m_1-1) + v(m_2-1) - v(n_1-1) - v(n_2-1)).

In terms of the boundary condition (4.9.8) we obtain

v(m_1,1,1,1) = (v(m_1-1) + 1)/(-v(m_1-1) + 3)

and v(1,1,1,1) = 0. But these equations coincide with equations (4.9.5), (4.9.6); hence v(m,1,1,1) = (m-1)/(m+1), and the optimal strategies in this case also coincide with those in Example 12.
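Under the completely mixed assumption stated above, the recursion with the boundary condition (4.9.8) can be evaluated in the same way; the sketch below reproduces v(m,1,1,1) = (m-1)/(m+1).

```python
# Example 14: value recursion with the completely mixed 2x2 value formula
# (valid here by the assumption, stated in the text, that the games at
# each step are completely mixed).
from functools import lru_cache

@lru_cache(maxsize=None)
def v(m1, m2, n1, n2):
    if n1 == 0 or n2 == 0:
        return 1.0                           # boundary condition (4.9.8)
    if m1 == 0 or m2 == 0:
        return -1.0
    a, b = v(m1 - 1, m2, n1, n2), v(m1, m2, n1, n2 - 1)
    c, d = v(m1, m2, n1 - 1, n2), v(m1, m2 - 1, n1, n2)
    return (a * d - b * c) / (a + d - b - c)  # value of [[a, b], [c, d]]

for m in (1, 2, 3, 5):
    print(m, v(m, 1, 1, 1), (m - 1) / (m + 1))
```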
4.9.3. Repeated evolutionary games.

To define evolutionary games in extensive form it is necessary to define the symmetry of an extensive form game. We shall restrict our attention to two-person games with perfect recall and without chance moves. Following Selten (1983), a symmetry of the game Γ is defined as a mapping (·)^T from alternatives to alternatives with the following properties. Let M_i denote the set of alternatives (choices) of player i in Γ. If m ∈ M_i, then m^T ∈ M_j (i ≠ j ∈ {1,2}), and

(m^T)^T = m for all m.

For every information set u there exists an information set u^T such that every alternative at u is mapped onto a choice at u^T; for every endpoint x ∈ X_{n+1} there exists an endpoint x^T ∈ X_{n+1} such that if x is reached by the sequence m_1, m_2,...,m_s, then x^T is reached by (a permutation of) m_1^T, m_2^T,...,m_s^T; and the payoffs satisfy H_1(x) = H_2(x^T) for every endpoint x ∈ X_{n+1}, x^T ∈ X_{n+1}.

A symmetric game in extensive form is a pair (Γ,T), where Γ is a game in extensive form and T is a symmetry of Γ. If b is a behavior strategy of Player 1 in (Γ,T), then the symmetric image of b is the behavior strategy b^T of Player 2 defined by

b^T(m) = b(m^T)  (u ∈ U_2, m ∈ M_u).

If b_1, b_2 are behavior strategies of Player 1, then the probability that the endpoint x is reached when (b_1, b_2^T) is played is equal to the probability that x^T is reached when (b_2, b_1^T) is played. Therefore the expected payoff to Player 1 when (b_1, b_2^T) is played is equal to Player 2's expected payoff when (b_2, b_1^T) is played:

E_1(b_1, b_2^T) = E_2(b_2, b_1^T).

This equation defines the symmetric normal form of (Γ,T) if restricted to the pure strategies. Following van Damme (1991), define a direct ESS of (Γ,T) as a behavior strategy b̄ of Player 1 that satisfies

E_1(b̄, b̄^T) = max_b E_1(b, b̄^T),

and if b ∈ B_1, b ≠ b̄ and E_1(b, b̄^T) = E_1(b̄, b̄^T), then

E_1(b, b^T) < E_1(b̄, b^T)

(here B_i is the set of all behavior strategies of Player i). Van Damme (1991) notes that in many games intuitively acceptable solutions will fail to satisfy the condition of direct ESS.
We give here a "refinement" of this definition which does not reject intuitively acceptable solutions. First consider an example.

Example 15. (The repeated Hawk and Dove game.) The matrices of this bimatrix game have the form

          H          D                     H          D
A = H [ (v-c)/2      v  ]      B = H [ (v-c)/2      0  ]
    D [    0        v/2 ]          D [    v        v/2 ]

If v > c, then (H,H) is an ESS in this bimatrix game Γ. The game tree of the two-stage game is represented in Fig. 4.15.

Figure 4.15

In the two-stage game a strategy of Player 1 (2) is a rule which selects H or D in each of his information sets. Player 1 (2) has five information sets; thus Player 1 (2) has 32 strategies, which consist of sequences of the form (H,H,D,H,D) and so on. Denote the strategy of a player by u(·).


Consider the strategy u(·) = (H,H,H,H,H), which is formed from the ESS strategies (v > c) in each of the subgames (one-stage games). It would be desirable for this strategy to be an ESS in the two-stage game Γ. Unfortunately, by the definition of the direct ESS for games in extensive form, it is not.

It is easily seen that the first condition of the direct ESS is satisfied, since u(·) is a Nash equilibrium in Γ, but we can find a strategy

v(·) = (H,H,D,D,D)

for which the payoff (in pure strategies the expected payoff E coincides with the payoff function K) in the situation (v(·),u(·)) is equal to the payoff in the situation (u(·),u(·)):

K(v(·),u(·)) = K(u(·),u(·)) = v - c,

but K(v(·),v(·)) is also equal to K(u(·),v(·)):

K(v(·),v(·)) = K(u(·),v(·)) = v - c,

and the second condition of the direct ESS is not satisfied.


This unnatural outcome happens because the definition of the direct ESS is ill-suited to positional games. We propose a new definition.

Definition. The pair (u(·),u(·)) is a direct ESS if

1. K(u(·),u(·)) ≥ K(w(·),u(·)) for all w(·);

2. if v(·) is such that in the situations (u(·),u(·)), (v(·),u(·)) the realized paths in Γ are different (the terminal positions in the game Γ are different), then

K(v(·),u(·)) = K(u(·),u(·)) implies K(v(·),v(·)) < K(u(·),v(·)).

By this definition the strategy u(·) = (H,H,H,H,H) is a direct ESS, since the strategy v(·) = (H,H,D,D,D), giving the same payoff against u(·) as u(·) itself, is excluded from consideration by point 2 of this definition (the situations (u(·),u(·)) and (v(·),u(·)) realize the same path).
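The example can be verified by enumerating all 32 strategies of the two-stage game. In the sketch below the numbers v = 4, c = 2 and the encoding of the five information sets (the first move, then the reply to each first-stage outcome) are illustrative assumptions.

```python
# Checking Example 15: u = (H,H,H,H,H) is Nash, but the old second condition
# of direct ESS fails against v = (H,H,D,D,D).
from itertools import product

V, Cc = 4.0, 2.0                             # v > c
stage = {("H", "H"): (V - Cc) / 2, ("H", "D"): V,
         ("D", "H"): 0.0,          ("D", "D"): V / 2}
OUT = [("H", "H"), ("H", "D"), ("D", "H"), ("D", "D")]

def K(s1, s2):
    """Player 1's two-stage payoff; both players observe the first-stage
    outcome, each from his own viewpoint (my move, opponent's move)."""
    o1 = (s1[0], s2[0])
    o2 = (s1[1 + OUT.index(o1)], s2[1 + OUT.index((o1[1], o1[0]))])
    return stage[o1] + stage[o2]

u = ("H", "H", "H", "H", "H")
assert all(K(w, u) <= K(u, u) for w in product("HD", repeat=5))   # condition 1
v = ("H", "H", "D", "D", "D")
print(K(v, u) == K(u, u))                    # True: v earns the same against u
print(K(v, v) < K(u, v))                     # False: the old condition 2 fails
# under the new definition v is excluded: (v,u) and (u,u) realize the same path
```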
A slight modification of this definition for games with chance moves (as in the example in van Damme (1991)) shows that the situation which is natural to regard as an ESS, and which is excluded there from the direct ESS, is a direct ESS in our sense.

4.10 Cooperative multistage games with complete information
4.10.1. Consider the game Γ with complete information on a finite graph as defined in 4.1.4, with a slight difference in the definition of the payoffs of the players. In subsection 4.1.4 we supposed that the payoffs were defined only on the set X_{n+1} of final positions of the game. Here we suppose that for each x ∈ X, n real numbers h_i(x), i = 1,...,n, are given, and for each path of the game z = (z_0, z_1,...,z_l), z_l ∈ X_{n+1}, the payoff of player i is defined as

Σ_{k=0}^l h_i(z_k).

If h_i(x) = 0 for x ∉ X_{n+1}, i = 1,...,n, we have exactly the game defined in form (4.1.1). As was done in classical cooperative game theory (see Chap. 3), we suppose that before starting the game the players agree to choose an n-tuple of strategies

ū(·) = (ū_1(·),...,ū_i(·),...,ū_n(·))

which maximizes the sum of the payoffs of the players. If z̄ = (z̄_0,...,z̄_k,...,z̄_l), z̄_l ∈ X_{n+1}, is the path (trajectory) realized in the situation ū(·) = (ū_1,...,ū_i,...,ū_n), then by the definition of ū(·) we have

Σ_{i=1}^n Σ_{k=0}^l h_i(z̄_k) = max_{u(·)} Σ_{i=1}^n K_i(u(·)).  (4.10.1)


The cooperative game Γ develops along the trajectory z̄ = (z̄_0,...,z̄_k,...,z̄_l), which we shall call the optimal trajectory.

It is clear that in the game Γ we may have a whole family of "optimal trajectories", each one of them giving the same maximal total payoff to the players. In this section, for simplicity, we suppose that in the game Γ the optimal trajectory is unique. Define in the game Γ the characteristic function as this was done in Sec. 3.11. The characteristic function may be introduced axiomatically or as the value of the zero-sum game played between the coalitions S ⊂ N and N \ S.

As we have seen in Sec. 3.11, the important thing is that

V(N) = Σ_{i=1}^n Σ_{k=0}^l h_i(z̄_k), N = {1,...,n},

and for S_1 ⊂ N, S_2 ⊂ N, S_1 ∩ S_2 = ∅,

V(S_1 ∪ S_2) ≥ V(S_1) + V(S_2),  V(∅) = 0.
Once the characteristic function is defined, we can define the set of imputations

C = {ξ = (ξ_i) : Σ_{i=1}^n ξ_i = V(N), ξ_i ≥ V({i}), i = 1,...,n},

the core M ⊂ C,

M = {ξ = (ξ_i) : Σ_{i∈S} ξ_i ≥ V(S), S ⊂ N} ⊂ C,

the NM-solution, the Shapley value, and the other optimality principles of classical game theory. In what follows we shall denote by M ⊂ C any one of these optimality principles.
Suppose that at the beginning of the game the players agree to use the optimality principle M ⊂ C as the basis for the selection of the "optimal" imputation ξ̄ ∈ M. This means that, playing cooperatively by choosing the strategies maximizing the common payoff, each of them expects to get the payoff ξ̄_i from the optimal imputation ξ̄ ∈ M after the end of the game (after the maximal common payoff V(N) has really been earned by the players).
But when the game T actually develops along the "optimal" trajectory z =
(z 0 ,2),. . . , ) . , . . . , ; ) at each vertex z* the players find themselves in the new multi
stage game with complete information r j 4 , k = 0 , . . . , /, which is the subgame of the
original game V starting from z* with the payoffs

#(**) = E M*j)> hi > 0 * = 1, - , n. -

It is important to mention that for the problem (4.10.1) the Bellman optimality
principle holds, and the part z̄^k = (z̄_k, ..., z̄_j, ..., z̄_l) of the trajectory z̄, starting from
z̄_k, maximizes the sum of the payoffs in the subgame Γ_{z̄_k}, i.e.

max Σ_{i=1}^{n} Σ_{j=k}^{l} h_i(z_j) = Σ_{i=1}^{n} Σ_{j=k}^{l} h_i(z̄_j),   (4.10.2)

which means that the trajectory z̄^k = (z̄_k, ..., z̄_j, ..., z̄_l) is also "optimal" in the
subgame Γ_{z̄_k}.
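To make the backward-induction argument behind (4.10.1) and (4.10.2) concrete, here is a minimal Python sketch; the game tree succ and the stage payoffs h are hypothetical illustration data, not an example from the text:

```python
# A minimal sketch of the cooperative dynamic programming behind
# (4.10.1)-(4.10.2) on a finite tree.

def best_path(z, succ, h):
    """Return (value, path): the maximal total payoff over all paths
    of the subgame starting at z, and one optimal path."""
    total = sum(h[z])                      # sum_i h_i(z) earned at z
    if not succ.get(z):                    # z is a final position
        return total, [z]
    # Bellman optimality: an optimal path of the subgame at z extends
    # an optimal path of some immediate successor's subgame.
    value, tail = max((best_path(w, succ, h) for w in succ[z]),
                      key=lambda t: t[0])
    return total + value, [z] + tail

# Two players, stage payoffs h[x] = (h_1(x), h_2(x)).
succ = {'z0': ['a', 'b'], 'a': ['c'], 'b': ['d'], 'c': [], 'd': []}
h = {'z0': (1, 1), 'a': (0, 2), 'b': (3, 0), 'c': (2, 2), 'd': (0, 1)}

v_N, z_bar = best_path('z0', succ, h)      # V(N) and the trajectory
print(v_N, z_bar)                          # 8 ['z0', 'a', 'c']
```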
Before entering the subgame Γ_{z̄_k} each player i has already earned the
amount

H_i^{k-1} = Σ_{j=0}^{k-1} h_i(z̄_j).   (4.10.3)

At the same time, at the beginning of the game Γ = Γ(x_0) = Γ(z̄_0) the player i
was oriented to get the payoff ξ_i, the i-th component of the "optimal" imputation
ξ ∈ M ⊂ C. From this it follows that in the subgame Γ_{z̄_k} he expects to get the
payoff equal to

ξ_i - H_i^{k-1} = ξ_i^k,  i = 1, ..., n,   (4.10.4)

and then the question arises whether the new vector ξ^k = (ξ_1^k, ..., ξ_i^k, ..., ξ_n^k)
remains optimal in the same sense in the subgame Γ_{z̄_k} as the vector ξ was in
the game Γ(z̄_0). If this is not the case, the players in the
subgame Γ_{z̄_k} will not orient themselves on the same optimality principle as in the
game Γ(z̄_0), which may force them to break the cooperation by changing
the chosen cooperative strategies ū_i(·), i = 1, ..., n, and thus changing the optimal
trajectory z̄^k in the subgame Γ_{z̄_k}. We shall now try to formalize this reasoning.
Introduce in the subgame Γ_{z̄_k}, k = 1, ..., l, the characteristic function V(S; z̄_k),
S ⊂ N, in the same manner as it was done in the game Γ = Γ(z̄_0). Based on the
characteristic function V(S; z̄_k) we can introduce the set of imputations

C(z̄_k) = {ξ = (ξ_i) : Σ_{i=1}^{n} ξ_i = V(N; z̄_k), ξ_i ≥ V({i}; z̄_k), i = 1, ..., n},

the core M(z̄_k) ⊂ C(z̄_k):

M(z̄_k) = {ξ = (ξ_i) : Σ_{i∈S} ξ_i ≥ V(S; z̄_k), S ⊂ N} ⊂ C(z̄_k),

the NM-solution, the Shapley value and other optimality principles of the classical game
theory. Denote by M(z̄_k) ⊂ C(z̄_k) the optimality principle M ⊂ C (which was
selected by the players in the game Γ(z̄_0)) considered in the subgame Γ_{z̄_k}.
If we suppose that the players in the game Γ(z̄_0), when moving along the optimal
trajectory (z̄_0, ..., z̄_k, ..., z̄_l), follow the same ideology of optimal behavior, then the
vector ξ^k = ξ - H^{k-1} must belong to the set M(z̄_k), the corresponding optimality
principle in the cooperative game Γ_{z̄_k}, k = 0, ..., l.
It turns out to be very difficult to find games and corresponding optimality
principles for which this condition is satisfied. We shall illustrate this with the following
example.
Suppose that in the game Γ, h_i(z) ≠ 0 only for z ∈ X_{n+1} (the game Γ is a game
with terminal payoffs, as the game in Sec. 4.1). Then the last condition would
mean that

ξ = ξ^k ∈ M(z̄_k),  k = 0, ..., l,

which gives us

ξ ∈ ∩_{k=0}^{l} M(z̄_k).   (4.10.5)

For k = l we shall have

ξ ∈ M(z̄_l).

But M(z̄_l) = C(z̄_l) = {h(z̄_l)}, where h(z̄_l) = (h_1(z̄_l), ..., h_n(z̄_l)). This condition has to be valid for all imputations
of the set M(z̄_0) and for all optimality principles M(z̄_0) ⊂ C(z̄_0), which means that
in the cooperative game with terminal payoffs the only reasonable optimality principle
will be

ξ = h(z̄_l),

the payoff vector obtained at the end point of the trajectory in the game Γ(z̄_0). At
the same time, the simplest examples show that the intersection (4.10.5), except in
"dummy" cases, is void for games with terminal payoffs.

How can this difficulty be overcome? A plausible way of finding the outcome is to
introduce a special rule of payments (a stage salary) on each stage of the game, in such
a way that the payments on each stage do not exceed the total amount earned
by the players on this stage, and the payments received by the players starting from
the stage k (in the subgame Γ_{z̄_k}) belong to the same optimality principle as the
imputation on which the players agreed in the game Γ_{z̄_0} at the beginning of the game.
Whether this is possible or not we shall consider now.
Introduce the notion of the imputation distribution procedure (IDP).
Definition. Suppose that ξ = (ξ_1, ..., ξ_i, ..., ξ_n) ∈ M(z̄_0).
Any matrix β = {β_ik}, i = 1, ..., n, k = 0, ..., l, such that

ξ_i = Σ_{k=0}^{l} β_ik,  β_ik ≥ 0,   (4.10.6)

is called an imputation distribution procedure (IDP).
Denote β_k = (β_1k, ..., β_nk), β(k) = Σ_{m=0}^{k-1} β_m. The interpretation of the IDP β is: β_ik
is the payment to player i on the stage k of the game Γ_{z̄_0}, i.e. on the first stage of the
subgame Γ_{z̄_k}. From the definition (4.10.6) it follows that in the game Γ_{z̄_0} each player
i gets the amount ξ_i, i = 1, ..., n, which he expects to get as the i-th component of
the optimal imputation ξ ∈ M(z̄_0) in the game Γ_{z̄_0}.
The interpretation of β_i(k) is: β_i(k) is the amount received by player i on
the first k stages of the game Γ_{z̄_0}.
Definition. The optimality principle M(z̄_0) is called time-consistent if for every
ξ ∈ M(z̄_0) there exists an IDP β such that

ξ^k ≡ ξ - β(k) ∈ M(z̄_k),  k = 0, 1, ..., l.   (4.10.7)

Definition. The optimality principle M(z̄_0) is called strongly time-consistent if
for every ξ ∈ M(z̄_0) there exists an IDP β such that

β(k) ⊕ M(z̄_k) ⊂ M(z̄_0),  k = 0, ..., l.

Here a ⊕ A = {a + a' : a' ∈ A}, a ∈ R^n, A ⊂ R^n.


The time-consistency of the optimality principle M(z̄_0) implies that for each
imputation ξ ∈ M there exists an IDP β such that, if the payments in each position z̄_k on
the optimal trajectory z̄ are made to the players according to the IDP β, then in every
subgame Γ_{z̄_k} the players may expect to receive payments which are optimal in
the subgame Γ_{z̄_k} in the same sense as in the game Γ_{z̄_0}.
It remains an open problem whether one can find a single IDP β serving all ξ belonging to
M(z̄_0).
Strong time-consistency means that if the payments are made according to the
IDP β, then after earning the amount β(k) on the first k stages the players (if they are
oriented in the subgame Γ_{z̄_k} on the same optimality principle as in Γ_{z̄_0}), even after
reconsidering the imputation in this subgame (choosing another optimal one), will get as a result
in the game Γ_{z̄_0} payments according to some imputation, optimal in the previous
sense, i.e. an imputation belonging to the set M(z̄_0).
If we remove the condition of non-negativity imposed on the components of the IDP
β (β_ik ≥ 0), then for any optimality principle M(z̄_0) ⊂ C(z̄_0) and for every ξ ∈ M(z̄_0)
we can define β_ik by the following formulas:

β_ik = ξ_i^k - ξ_i^{k+1},  i = 1, ..., n,  k = 0, ..., l-1,   (4.10.8)

β_il = ξ_i^l.

From the definition it follows that

Σ_{k=0}^{l} β_ik = Σ_{k=0}^{l-1} (ξ_i^k - ξ_i^{k+1}) + ξ_i^l = ξ_i^0 = ξ_i.
And at the same time

ξ - β(k) = ξ^k ∈ M(z̄_k),  k = 0, ..., l.

The last inclusion would mean the time-consistency of M(z̄_0) if we could be sure that

β_ik = ξ_i^k - ξ_i^{k+1} ≥ 0  for all i = 1, ..., n,  k = 0, ..., l.   (4.10.9)

Unfortunately this cannot be guaranteed even in the simplest cases. One can
convince oneself of this fact by considering the games with terminal payoffs: for these
games the condition (4.10.9) does not hold in any "non-dummy" case. The strong
time-consistency condition is even more demanding; for it we cannot even write down a formula
like (4.10.9).
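As a small numerical illustration (hypothetical data, not an example from the text), the following sketch computes the IDP (4.10.8) from a given sequence of subgame imputations ξ^k and tests the non-negativity condition (4.10.9):

```python
# Sketch: compute the IDP (4.10.8) from a sequence of subgame
# imputations xi[k] = (xi_1^k, ..., xi_n^k), k = 0, ..., l, and test
# (4.10.9).  The numbers are hypothetical, not taken from the text.

xi = [(6.0, 4.0),   # xi^0 = xi, the imputation chosen in Gamma(z0)
      (5.0, 1.0),   # xi^1, what remains to be received from stage 1 on
      (2.0, 2.0)]   # xi^l, received on the last stage

l = len(xi) - 1
n = len(xi[0])
# beta[k][i] = xi_i^k - xi_i^{k+1} for k < l, and beta[l][i] = xi_i^l.
beta = [[xi[k][i] - xi[k + 1][i] for i in range(n)] for k in range(l)]
beta.append(list(xi[l]))

for i in range(n):   # each row of the IDP sums to xi_i
    assert abs(sum(b[i] for b in beta) - xi[0][i]) < 1e-9

nonneg = all(b_ik >= 0 for b in beta for b_ik in b)
print(beta)          # [[1.0, 3.0], [3.0, -1.0], [2.0, 2.0]]
print(nonneg)        # False: (4.10.9) fails for player 2 at stage 1
```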
Since the non-negativity of the IDP appears to be a condition which is rarely
satisfied for the optimality principles carried over to multistage cooperative games
from the classical simultaneous cooperative games, we shall try to make a regularization
(refinement) of the classical optimality principles which will lead us to
new strongly time-consistent and time-consistent optimality principles.
4.10.3. Now introduce the following functions:

β_i^k = ξ_i^k · (Σ_{j=1}^{n} h_j(z̄_k)) / V(N; z̄_k),  ξ^k ∈ C(z̄_k),  k = 0, 1, ..., l,   (4.10.10)

with ξ^0 = ξ ∈ C(z̄_0).

Define the IDP β^k = {β_i^k, i = 1, ..., n}, k = 0, ..., l. It is easily seen that β_i^k ≥ 0.
Consider the formula (4.10.10). For different imputations ξ^k ∈ C(z̄_k) we get
different values of β^k and, hence, different values of β. Let B_k be the set of all
possible β^k for all ξ^k ∈ C(z̄_k), k = 1, ..., l.
Consider the set

C̄(z̄_0) = {ξ̄ : ξ̄ = Σ_{k=0}^{l} β^k, β^k ∈ B_k},

and the sets

C̄(z̄_k) = {ξ̄^k : ξ̄^k = Σ_{m=k}^{l} β^m, β^m ∈ B_m}.

The set C̄(z̄_0) is called the regularized OP C(z̄_0) and, correspondingly, C̄(z̄_k) is the
regularized OP C(z̄_k).
We consider C̄(z̄_0) as a new optimality principle in the game Γ(z̄_0).
Theorem. If the IDP β is defined by (4.10.10), with β^k ∈ B_k, k = 0, ..., l, then always

β(k) ⊕ C̄(z̄_k) ⊂ C̄(z̄_0),

i.e. the OP C̄(z̄_0) is strongly time-consistent.


Proof. Suppose

ξ̄ ∈ β(k) ⊕ C̄(z̄_k);

then ξ̄ = β(k) + Σ_{m=k}^{l} β^m for some β^m ∈ B_m, m = k, ..., l.
But β(k) = Σ_{m=0}^{k-1} β'^m for some β'^m ∈ B_m, m = 0, ..., k-1.
Consider

(β'')^m = β'^m for m = 0, ..., k-1,  (β'')^m = β^m for m = k, ..., l;

then (β'')^m ∈ B_m and ξ̄ = Σ_{m=0}^{l} (β'')^m, and thus ξ̄ ∈ C̄(z̄_0). The theorem is proved.
The IDP just defined has the advantage (compared with β defined by (4.10.6)) that

Σ_{i=1}^{n} β_i^k = Σ_{i=1}^{n} h_i(z̄_k),  k = 0, ..., l,

and thus

Σ_{i=1}^{n} β_i(k) = Σ_{i=1}^{n} Σ_{m=0}^{k-1} h_i(z̄_m),   (4.10.11)

which is the actual amount to be divided between the players on the first k stages
and which, as is seen from the formula (4.10.11), is exactly equal to the amount earned
by them on these stages.
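The following sketch (again with hypothetical data) computes the regularized payments (4.10.10) and checks the two properties just mentioned: non-negativity and the stage-budget identity:

```python
# Sketch of the regularization (4.10.10): the stage payoffs h[k][i],
# the subgame values V_N[k] = V(N; z_k) and the chosen subgame
# imputations xi[k] in C(z_k) are hypothetical illustration data.

h   = [(1.0, 1.0), (0.0, 2.0), (2.0, 2.0)]   # h_i(z_k), k = 0, 1, 2
V_N = [8.0, 6.0, 4.0]                        # V(N; z_k), sum of h over the tail
xi  = [(5.0, 3.0), (3.5, 2.5), (2.0, 2.0)]   # xi^k in C(z_k), xi^0 = xi

n, l = len(h[0]), len(h) - 1
beta = [[xi[k][i] * sum(h[k]) / V_N[k] for i in range(n)]
        for k in range(l + 1)]

for k in range(l + 1):                       # stage budget (4.10.11)
    assert abs(sum(beta[k]) - sum(h[k])) < 1e-9
assert all(b >= 0 for row in beta for b in row)   # beta_i^k >= 0

print(beta)
# [[1.25, 0.75], [1.1666..., 0.8333...], [2.0, 2.0]]
```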

4.11 Voting the directorial council
Now we are going to demonstrate an interesting application of the considered n-person
games. Suppose that a concern, consisting of n independent companies A_1, ..., A_n,
has to elect a directorial council.
Denote by a_i the number of voters in A_i, i ∈ {1, ..., n} = N; a_i represents the
power of A_i. Each company has one or zero places in the council, and proposes one
candidate b_i ∈ A_i to take part in the vote for the council. Thus there are n candidates
(b_1, ..., b_i, ..., b_n) proposed for the election to the council. Every voter can decide
"yes" or "no" for each of the candidates. The result of the voting for each voter is
an n-dimensional vector, the i-th component of which takes one of two values,
"yes" or "no"; this can be represented in the form (Y, N, Y, ..., Y). The candidate
b_i is elected to the directorial council if he gets more than half of the possible positive votes (more
than (1/2) Σ_{i=1}^{n} a_i votes "yes").
Suppose the directorial council (DC) B = {b_i : i ∈ S ⊂ N} is elected. The DC
B gets a payoff K > 0, which is independent of the number of members of the DC
and of its staff, if Σ_{i∈S} a_i > a(N)/2, where a(N) = Σ_{i∈N} a_i; if Σ_{i∈S} a_i ≤ a(N)/2, the payoff is K = 0.
The amount K has to be shared between the DC staff members proportionally to
the power of the companies represented by the corresponding staff members. Thus
the payoff of the member b_i (or, what is the same, of the company A_i), if b_i ∈ B, is
equal to

β_i = (a_i / Σ_{j∈S} a_j) · K.   (4.11.1)

If we denote Σ_{i∈S} a_i = a(S), then (4.11.1) can be written in the form

β_i = (a_i / a(S)) · K;   (4.11.2)

for i ∉ S, β_i = 0.
The problem is now how the voters should vote, and what are the optimal size and
the optimal membership of the DC.
To solve the problem we shall construct a game-theoretic model and propose
two different approaches, both of which lead to a Nash equilibrium in a specially
constructed multistage game with complete information.
4.11.1. Simultaneous n-person voting game.
Consider the sets (coalitions) S ⊂ N for which the following condition is satisfied:

a(S) > a(N)/2,   (4.11.3)

where a(S) = Σ_{i∈S} a_i, a(N) = Σ_{i∈N} a_i.
The set S is called an "admissible" set. Define S̄ as a minimal "admissible" set, i.e.

a(S̄) = min_S a(S),   (4.11.4)

where S is a subset of N satisfying (4.11.3). The sets B̄ = {b_i, i ∈ S̄} and B = {b_i, i ∈ S}
are called the optimal DC and an admissible DC, correspondingly.
The coalition S̄ contains more than half of the voters and, at the same time, is
minimal among the coalitions with this property. It is clear that S̄ defined by (4.11.3),
(4.11.4) may not be the unique such coalition. The members of some given coalition S̄
are not interested in anyone else joining them, because then the payoff of each
of the S̄ members would decrease (see (4.11.1), (4.11.2)).
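The definitions (4.11.3) and (4.11.4) are easy to make concrete by brute force; in the following Python sketch the powers a_i and the prize K are hypothetical illustration data:

```python
# Sketch: find a minimal admissible coalition (4.11.3)-(4.11.4) by
# enumeration and compute the payoffs (4.11.2).  The powers a and the
# prize K are hypothetical illustration data.
from itertools import combinations

a = {1: 4, 2: 3, 3: 2, 4: 2}          # a_i, the power of company A_i
K = 10.0
N = list(a)
half = sum(a.values()) / 2             # a(N)/2

def weight(S):                         # a(S) = sum of a_i over S
    return sum(a[i] for i in S)

admissible = [set(S) for r in range(1, len(N) + 1)
              for S in combinations(N, r) if weight(S) > half]
S_bar = min(admissible, key=weight)    # a minimal admissible coalition

payoff = {i: (a[i] / weight(S_bar)) * K if i in S_bar else 0.0
          for i in N}
print(sorted(S_bar), payoff)
# [1, 3] with payoffs {1: 6.66..., 2: 0.0, 3: 3.33..., 4: 0.0}
```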

How should the members of S̄ behave to guarantee the forming of the DC from the
candidates of S̄?
Suppose that every company A_i decides how each of its members (voters) votes
(the members of A_i can always form a voting coalition). In this case, if they
say "yes" to the candidates from A_i, i ∈ S̄, and "no" to the other candidates, then
the DC from S̄ will indeed be elected.
Consider a simultaneous n-person voting game Γ. The number of players in Γ
is equal to n̄ = a(N) (each voter is considered as a player). Every player l has 2^n
strategies; the strategy set of the voter l consists of all possible vectors of the form
a^l = (a_1^l, ..., a_i^l, ..., a_n^l), where a_i^l can take one of the values "yes" or "no". In the
situation a = (a^1, ..., a^l, ..., a^n̄) the result of the voting is defined in the following
way: if the number of "yes" in the players' strategies in the i-th place is more than a(N)/2,
then the candidate from the company A_i is elected to the directorial council. In the
opposite case this candidate loses the vote. Suppose in the situation a the council
B = {b_i, i ∈ S}, where S is an admissible coalition, is elected. Then each voter l from
the company A_i, i ∈ S, wins the amount

k_l(a) = β_i / a_i = K / a(S),  l = 1, ..., a(N),   (4.11.5)

and the other voters' payoffs are equal to zero; if S is not an admissible coalition, then we
suppose that k_l(a) = 0 for all l. Construct now a Nash equilibrium in Γ. Suppose
that the set S̄ is defined by (4.11.3) and (4.11.4). If l ∈ A_i, i ∈ S̄, then in the strategy
ā^l of the player (voter) l

ā_j^l = "yes", if j ∈ S̄;  ā_j^l = "no", if j ∉ S̄.

For l ∈ A_i, i ∉ S̄, the strategies ā^l are arbitrary.


Theorem. The n̄-tuple of strategies ā = (ā^1, ..., ā^n̄) is a Nash equilibrium in Γ.
Proof. The payoffs when the n̄-tuple ā is used are equal to

k_l(ā) = K / a(S̄),  for l ∈ A_i, i ∈ S̄,   (4.11.6)

k_l(ā) = 0,  for l ∈ A_i, i ∉ S̄.   (4.11.7)

Show that

k_l(ā‖a^l) ≤ k_l(ā),  l = 1, ..., n̄ = a(N).   (4.11.8)

Suppose l ∈ A_k, k ∈ S̄. If the change of the strategy ā^l to a^l changes the result of
the vote, then two possibilities have to be considered:
a) The candidate b_k ∈ A_k wins the vote, i.e. b_k ∈ B, where B is the new DC
elected in the situation (ā‖a^l). If a(S) > a(N)/2 (S = {i : b_i ∈ B}), then

k_l(ā‖a^l) = K / a(S).

Since S is not necessary minimal admissible set, then a(S) > a(S), and from (4.11.6)
follows (4.11.8).
b) bk Ak does not win the vote. Then h g B, where B is the new DC elected
in the situation (olla*), and from (4.11.7) we have M"!!**') = 0 and the inequality
(4.11.8) holds also in this case.
Suppose now / 6 A*, k ~5. In this case the change of the strategy by the player
will not change the minimal admissible coalition and the DC. Thus in the situation
(5||a') the company A* will not be represented in the DC, and we shall also have
fci(|l') = 0.
The theorem is proved.
From theorem it follows that for different minimal admissible coalitions 5 we get
different Nash equilibria in T.
4.11.2. The multistage game generating the minimal admissible coalition.
Consider the n-person multistage game G with complete information with the
companies A_1, ..., A_i, ..., A_n as players. Let N = {A_1, ..., A_i, ..., A_n} and let S ⊂ N be
any coalition in G. In what follows we shall sometimes write i instead of A_i. In
this section the model for the formation of the minimal admissible coalition proposed
in 4.11.1 will be described with the help of the flow chart of Fig. 4.16. A dynamic
notation is used. Statements of the form a → β indicate that β assumes the value
a. Arrows at connecting lines show the directions of flow. Rectangles permit only
one continuation, whereas rhomboids contain questions whose answers "yes" or "no"
determine the branch along which the game is to be continued.
For every voting game G the model generates a finite extensive game G(Γ) with
complete information. (The structure of this game is described by the flow chart.)
The idea of using a flow chart for the representation of a multistage game
with some periodic properties is due to Selten (1991).
The process of forming the minimal admissible coalition proceeds as a succession of
stages. In the following, the rectangles and rhomboids of the flow chart will be
explained in detail (see Fig. 4.16).
Rectangle 2. M is the set of active players who have not made any decision yet. At
the beginning, M is the player set N = {A_1, ..., A_i, ..., A_n}. S is the set of players
who have agreed to form one coalition. At the beginning S is empty. The symbol r, the
stage number, indicates the number of the current stage. The game begins with the
first stage.
Rectangle 3. The term "random draw" means that every player A_i ∈ M has the
same probability of being chosen as the next decision maker.
Rectangle 4. The decision maker A_i is excluded from the set of active players.
Rhomboid 5. It is now important whether the decision maker joins the coalition S or not.
Rectangle 6. The player A_i joins the coalition S.
Rhomboid 7. The decision maker may be the last active player or not.
Rhomboid 8. The player A_i is the last active player. It is checked whether S is an
admissible coalition, i.e. whether a(S) > a(N)/2.

Figure 4.16
Rectangle 9. S is an admissible coalition and each player receives the payoff:

k_i = (a_i / a(S)) · K  for A_i ∈ S;

k_j = 0  for A_j ∉ S.
Rhomboid 10. A_i is not the last active player. Then he either selects the next decision
maker (invites a new member into the coalition S) or refuses to select the next decision
maker.
Rectangle 11. A_i selects the next decision maker A_j ∈ M. A_j now acts as A_i did.
The new stage begins.
Rhomboid 12. A_i refuses to select the next decision maker. The coalition S is
formed. It is checked whether S is an admissible coalition, i.e. whether

a(S) > a(N)/2.
Rhomboid 13. The decision maker A_i did not join the coalition S. It is checked
whether S is an admissible coalition.
Rhomboid 14. The admissibility condition is not satisfied, i.e. a(S) ≤ a(N)/2.
Then the player A_i has two alternatives: to propose to form a new coalition including
himself and some of the remaining active players, or to go out from the game.
Rectangle 15. The player A_i decides to form a new coalition including himself and
some of the remaining active players, and selects the next decision maker A_j ∈ M.
Rectangle 16. The player A_i went out from the game. The members of the
coalition S are wiped out (S = ∅).
Rhomboid 17. If A_i was the last active player, the game ends. If not, the game
continues with the current set of active players M.
Rectangle 18. The next stage begins.
The flow chart of Fig. 4.16 contains everything necessary for the construction of the
multistage game G(Γ) with complete information generating the minimal admissible
coalition. The precise mathematical statements require a more formal description
of G(Γ). For this reason we have to introduce the notions of positions, strategies, choice
sets, histories and payoffs.
Positions. The first position u_0 = N consists of the set of players in the
game G(Γ).
If at the preceding stage the player i was selected by chance or was invited into the coalition S
by the preceding decision maker, then the position is a triple

u = (M, S, i),

where M is the set of active players, S the forming coalition, and i the decision
maker.
If in the preceding position the decision maker went out of the game or refused to
continue the formation of the coalition, then the position u is a triple

u = (M, S, R_i),

where M is the set of active players, S the formed coalition, and R_i the negative
decision of the decision maker i in the preceding position.
Choice set A(u). In the position u_0 the choice set A(u_0) is equal to the set of players N of
the game G(Γ), and the choice is made with equal probabilities 1/n. In the position
u = (M, S, i) the choice set consists of the following alternatives:
a) {R}: go out of the game;
b) {RY_k, k ∈ M}: refuse to enter the coalition S, decide to form a new coalition
including himself, and propose to player k ∈ M to enter this coalition, suggesting him as the
next decision maker;
c) {YR}: agree with the proposal to enter the coalition S and refuse to invite into
the coalition anyone from the set of active players M (as the next decision maker);
d) {YY_k, k ∈ M}: agree to enter the coalition S, and invite into S the next player
k ∈ M, suggesting him as the next decision maker.
In the position u = (M, S, R_i), A(u) = M and the next decision maker is chosen
with equal probabilities (with the probability 1/|M|) if S = ∅ or a(S) ≤ a(N)/2. If a(S) > a(N)/2,
the position u = (M, S, R_i) is a terminal position (A(u) = ∅).
Thus

A(u) = {R} ∪ {RY_k, k ∈ M} ∪ {YR} ∪ {YY_k, k ∈ M}, if u = (M, S, i);
A(u) = M, if u = (M, S, R_i);
A(u) = N, if u = u_0.
Explanation of Table 4.1. Under the heading "Next positions" the table shows
which positions can follow a position u by a choice a ∈ A(u), according to the conditions
on u and a ∈ A(u) shown in the other columns. The set of all positions which
can follow u by a ∈ A(u) is denoted by D(u,a); "end" indicates that D(u,a) is empty.
D(u) stands for the union of all D(u,a) with a ∈ A(u).
Histories. A history q of G(Γ) is a sequence of positions

q = (u_0, ..., u_T),

where u_{t+1} ∈ D(u_t) for t = 0, ..., T-1. The set of all histories q is denoted by Q.


Plays. A play z is defined as a terminable history q = (u_0, ..., u_T) together with
a terminal choice a_T ∈ A(u_T) at its last position:

z = (u_0, ..., u_T; a_T).

The set of all plays is denoted by Z.
Payoffs. The payoff h_i(z) of player A_i is determined at the end of a play by
(4.11.2).
Next positions

conditions on the position | choices | conditions on the choices | next positions
the starting position u_0 | random draw of i ∈ N | | (M, S, i) with i ∈ M
the decision maker i is chosen before | R | M ≠ ∅ | (M, ∅, R_i)
 | | M = ∅ | end
 | RY_k, k ∈ M | | (M, {i}, k) with k ∈ M
 | YR | a(S ∪ i) > a(N)/2 | end
 | | a(S ∪ i) ≤ a(N)/2 | (M, ∅, R_i)
 | YY_k, k ∈ M | | (M, S ∪ i, k) with k ∈ M
the preceding decision maker went out of the game | random draw of i ∈ M | M ≠ ∅ | (M, ∅, i) with i ∈ M
 | | M = ∅ | end
the preceding decision maker refused to continue the formation of the coalition | random draw of i ∈ M | a(S) > a(N)/2 | end
 | | a(S) ≤ a(N)/2 | (M, ∅, i) with i ∈ M

Table 4.1

Strategies. A strategy a^i is a function which assigns a single choice to every
position where i is a decision maker.
The payoff function. In the situation a = (a^1, ..., a^i, ..., a^n) the payoff function
k_i(a) is defined as the mathematical expectation of the payoff h_i(z) when the players use the
strategies (a^1, ..., a^i, ..., a^n).
Theorem. The following n-tuple of strategies (ā^1, ..., ā^i, ..., ā^n) forms an
absolute (subgame perfect) Nash equilibrium in the game G.
Let S̄ be a minimal admissible set. If i ∉ S̄, ā^i is arbitrary. Suppose i ∈ S̄. In
the position u = (M, S, i) (i ∈ S̄ is a decision maker) player i decides to form a new
coalition if S ∩ S̄ = ∅, including himself, and invites any other player from S̄ to join
his coalition, suggesting him as the next decision maker (refusing to enter the coalition
S). If S ∩ S̄ ≠ ∅, the player i agrees to enter the coalition S and, if (M \ i) ∩ S̄ ≠ ∅,
invites any player k ∈ (M \ i) ∩ S̄ to join the coalition S, suggesting him as the
next decision maker. If (M \ i) ∩ S̄ = ∅, player i enters the coalition S and refuses to
invite into the coalition S anyone from the set M.
Proof. Compute the payoffs in the situation ā = (ā^1, ..., ā^n). From the construction
of the strategies ā^i, i = 1, ..., n, it follows that in the situation ā the coalition S̄ will
be formed. In this case the payoffs of the players will be

k_i(ā) = (a_i / a(S̄)) K, if i ∈ S̄,   (4.11.13)

k_i(ā) = 0, if i ∉ S̄.

Consider the situation (ā‖a^i). If i ∉ S̄, the player i cannot prevent the forming of
the coalition S̄ in the situation (ā‖a^i). Thus

k_i(ā) = k_i(ā‖a^i) = 0.

Suppose now that in the situation (ā‖a^i), i ∈ S̄. Then with positive probability a
coalition different from S̄ may be formed, which assigns to the player i a lower payoff
than he gets in the coalition S̄ (in S̄ the payoff of the player i ∈ S̄ is
maximal because of the structure of the coalition S̄); thus

k_i(ā‖a^i) ≤ k_i(ā) for i ∈ S̄.

The subgame perfectness of the situation ā can be proved in a similar way in any
subgame of the game G.
The theorem is proved.
4.11.3. The multistage game modeling the vote of the directorial council.
In this section we consider another approach to the problem of voting the directorial
council. This approach is based upon the construction of a corresponding
multistage game G with complete information, where the players have the possibility
not only to form coalitions, but also to make commitments on the payoffs they want
to get if the coalition is formed. The approach is very close to R. Selten (1991),
but the commitment intervals are different, and each stage has only one round.
We shall find the (subgame perfect) Nash equilibrium in G.
We remain within the formalization of 4.11.1.
Denote by S̄ any coalition S ⊂ N such that a(S) > a(N)/2 and for which there exists
i_0 ∈ S such that a(S \ {i_0}) ≤ a(N)/2.
Define now the upper bounds for the commitment possibilities of the players.
Consider the following problem of linear programming:

max Σ_{i∈N\S̄} ξ_i

subject to

Σ_{i∈S\S̄} ξ_i ≤ K - K · a(S ∩ S̄)/a(S̄)  for each S ⊂ N,   (4.11.14)

ξ_i ≥ 0,  i ∈ N \ S̄,

where S̄ is any fixed minimal admissible coalition.
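Under the reconstruction of (4.11.14) given above (whose exact constraint system is partly conjectural), a small instance can be solved with scipy.optimize.linprog; everything here (the powers a_i, K, the coalition S̄) is hypothetical illustration data, and linprog minimizes, so the objective is negated:

```python
# Sketch: solve a small instance of the bound problem (4.11.14) as
# reconstructed above.  All data (a, K, S_bar) are hypothetical.
from itertools import combinations
from scipy.optimize import linprog

a = {1: 4, 2: 3, 3: 2, 4: 2}
K = 10.0
S_bar = {1, 3}                          # a fixed minimal admissible coalition
outside = sorted(set(a) - S_bar)        # variables: xi_i, i in N \ S_bar

A_ub, b_ub = [], []
for r in range(1, len(a) + 1):
    for S in combinations(sorted(a), r):
        S = set(S)
        # sum over i in S \ S_bar of xi_i <= K - K * a(S & S_bar)/a(S_bar)
        row = [1.0 if i in S else 0.0 for i in outside]
        if any(row):
            A_ub.append(row)
            b_ub.append(K - K * sum(a[i] for i in S & S_bar)
                        / sum(a[i] for i in S_bar))

res = linprog(c=[-1.0] * len(outside),  # maximize the sum of xi_i
              A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * len(outside))
xi_bar = dict(zip(outside, res.x))
print(xi_bar)   # with these data every outside demand is forced to zero
```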

Suppose {ξ̄_i; i ∈ N \ S̄} is a solution of (4.11.14). Then the bounds for
the commitment variables ξ_i, i = 1, ..., n, are defined in the following way:

0 ≤ ξ_i ≤ ξ̄_i,  i ∈ N \ S̄,

0 ≤ ξ_i ≤ ξ̄_i,  i ∈ S̄,

where ξ̄_i = a_i K / a(S̄) for i ∈ S̄.
The structure of the game G is easier to illustrate by the flow chart of Fig. 4.17,
which is a simplified version of a flow chart proposed by Selten (1991).

1
f Start J

1 -*r

101
M\C ->M Random
S\C-*S draw of SUi- S
r + 1 ->r iM\S r+1 >r

8
names
permissible
payoff
demand &
each j &C\i
receives ,,
t receives
"(C)-Ej6cy6

11

( ^ )
Figure 4.17

Characteristic function. Define the characteristic function v in the game G: v
assigns a number v(C) to every coalition C ⊂ N in the following way:

v(C) = K, if C is an admissible coalition (a(C) > a(N)/2),

v(C) = 0, for all other C ⊂ N.

The process of forming the coalitions proceeds as a succession of stages. The
rectangles and rhomboids of the flow chart are explained in detail below (see Fig. 4.17).
Rectangle 2. M is the set of active players who are not yet in the coalitions formed.
At the beginning M is the player set N. The symbol r, the stage number, indicates
the current stage number. The game begins with the first stage. The set S is the set
of committed players. At the beginning S is empty.
Rectangle 3. The term "random draw" means that every player i ∈ M \ S has
the same probability of being chosen as the next decision maker.
Rhomboid 4. It is now important whether the decision maker has formed a coalition
C or not.
Rectangle 5. The decision maker i has formed the coalition including himself and
all committed players. The committed members of C receive their payoff demands
ξ_j, j ∈ C \ {i}, as their basic payoffs, and the player i who has formed C receives what
is left of the coalition value v(C), namely v(C) - Σ_{j∈C\{i}} ξ_j.
Rhomboid 6. It is checked whether C is an admissible coalition, i.e. whether a(C) > a(N)/2
or not.
Rhomboid 7. It is important whether the decision maker i is the last uncommitted
active player, i.e. whether M \ {i} = S or not.
Rectangle 8. The decision maker i is not the last active player. He names a
permissible payoff demand ξ_i.
Rectangle 9. Having named a payoff demand, player A_i joins the set of committed
players. The new stage begins.
Rectangle 10. The admissibility condition is not satisfied, i.e. a(C) ≤ a(N)/2. The
members of the coalition C are wiped out. The remaining players form the new set
of active players and the new set of committed players. The payoff demands of the players
from the coalition C are wiped out; the payoff demands of the players from the set
S \ C remain. The next stage begins.
The flow chart of Fig. 4.17 contains all the information necessary for the
construction of the extensive game G(Γ) generated by the model.
A position is described by a triple (M, ξ_S, i) whose components have the following
meanings: M is the set of active players, ξ_S is the system of payoff demands of the
committed players, and i is the decision maker. The set S of committed players does
not appear separately in the description, but is part of the information provided
by ξ_S. Formally, a triple

u = (M, ξ_S, i)

is a position of G(Γ) if the following conditions are satisfied:

i ∈ M ⊂ N and ξ_S = (ξ_j)_{j∈S} with S ⊂ M \ i.

The set of all positions u is denoted by U.
Choice sets. The choice set A(u) at the position u = (M, ξ_S, i) describes the set of
all choices available at a decision point corresponding to u and is defined as follows:

A(u) = {S ∪ i} ∪ [0, ξ̄_i],  for S ≠ M \ i,

A(u) = {M} ∪ {R},  for S = M \ i.

The symbol R stands for the choice of the last uncommitted active player not to form
a coalition.
Next positions

condition on the position | choices | conditions on the choices | next positions
 | coalition C = S ∪ i | a(C) ≤ a(N)/2 | (M\C, ξ_{S\C}, j) with j ∈ M\C
 | | a(C) > a(N)/2 | end
S ≠ M\i | payoff demand ξ_i ∈ [0, ξ̄_i] | | (M, ξ_{S∪i}, j) with j ∈ M\(S∪i)
S = M\i | no coalition: R | | end

Table 4.2

Explanation of Table 4.2. Under the heading "Next positions" the table shows
which positions can follow a position u by a choice a ∈ A(u), according to the conditions
on u and a ∈ A(u) shown in the other columns. The set of all positions which
can follow u by a ∈ A(u) is denoted by D(u,a). The entry "end" indicates that
D(u,a) is empty. D(u) stands for the union of all D(u,a) with a ∈ A(u). The symbol
ξ_{S∪i} indicates the demand system obtained if ξ_S is complemented by the player
i's demand ξ_i.
Histories. A history q of G(Γ) is a sequence of positions

q = (u_0, ..., u_T),

with u_0 = (N, ξ_∅, i) and with

u_{t+1} ∈ D(u_t) for t = 0, ..., T-1.

The set of all histories q is denoted by Q.


Terminal choices. A choice a ∈ A(u) at a position u is called terminal if D(u,a)
is empty. It is clear that a terminal choice at u = (M, ξ_S, i) is either a coalition C
(an admissible coalition) or the choice R for S = M \ i. Not every position u permits
a terminal choice. A position u with the property that A(u) contains at least one
terminal choice is called terminable. The set of all terminable positions is denoted by
U_0. A history q = (u_0, ..., u_T) is called terminable if its last position is terminable.
The set of all terminable histories is denoted by Q_0.
Plays. A play z is defined as a terminable history q = (u_0, ..., u_T) together with
a terminal choice a_T ∈ A(u_T) at its last position:

z = (u_0, ..., u_T; a_T).

The set of all plays is denoted by Z.
Payoffs. The payoff h_i(z) of player i is determined at the end of a play. If for a
given z no coalition is formed, h_i(z) = 0, i = 1, ..., n.
Strategy. A strategy a^i is a function which assigns a single choice from the choice
set A(u) to every position u where i is a decision maker.
The payoff function. In the situation a = (a^1, ..., a^i, ..., a^n) the payoff function
k_i(a) is defined as the mathematical expectation of the payoff h_i(z) when the players use the
strategies (a^1, ..., a^i, ..., a^n).
The subgame perfect Nash equilibrium. Introduce the following strategies
ā^i(u), i = 1, ..., n:

ā^i(u) = ξ̄_i, if in the position u = (M, ξ_S, i) the coalition S ∪ i is not admissible,
i.e. a(S ∪ i) ≤ a(N)/2, and u is not a terminal position;
ā^i(u) = S ∪ i, if in the position u = (M, ξ_S, i) the coalition S ∪ i is an admissible
coalition, a(S ∪ i) > a(N)/2, and u is not a terminal position;
ā^i(u) = R, if u = (M, ξ_S, i) is a terminal position and S ∪ i is not an admissible
coalition;
ā^i(u) = M, if u = (M, ξ_S, i) is a terminal position and S ∪ i is an admissible
coalition.
Theorem. The n-tuple of strategies ā^i(u) forms a subgame perfect Nash
equilibrium in G.
Proof. In the situation when the n-tuple ā is used, as the result of the game a
coalition S of the type S̄ is necessarily formed, and the payoffs of the players are equal
to

k_l(ā) = (a_l / a(S̄)) K,
if l ∈ S ∩ S̄ and l is not the player who forms the coalition S;

k_l(ā) = ξ̄_l,
if l ∈ S \ S̄ and l is not the player who forms the coalition S;

k_l(ā) = K - Σ_{j∈(S∩S̄)\{l}} (a_j / a(S̄)) K - Σ_{j∈S\S̄} ξ̄_j,
if l is the player who forms the coalition S;

k_l(ā) = 0 for l ∉ S.

We have to prove the inequality

k_l(ā) ≥ k_l(ā‖a^l),  l = 1, ..., n,

for all strategies a^l of the player l. If the strategy a^l(u) is different from ā^l(u), then
it prescribes different choices in some positions of the game.

Suppose that in the position

u = (M, ξ_S, i),

where S ∪ i is not an admissible coalition (denote this type of position by u_I),
we have a^l(u_I) ≠ ā^l(u_I). This means that a^l(u_I) prescribes the player l either to form a coalition
or to name a payoff demand

ξ_l < ξ̄_l = ā^l(u_I).

In the first case the formed coalition cannot be admissible, and thus the payoff of player
l is k_l(ā‖a^l) ≤ 0, since the value of the characteristic function of a nonadmissible
coalition is equal to zero. In the second case the player l will be invited into some
coalition S and will be paid ξ_l < ξ̄_l, which is less than in the situation ā. Thus we
have proved the inequality

k_l(ā‖a^l) ≤ k_l(ā)

if a^l differs from ā^l in the positions of the type u_I.


Consider now the positions

u = (M, ξ_S, i),

where the coalition S ∪ i is an admissible coalition, i.e.

a(S ∪ i) > a(N)/2,

and u is not a terminal position. Denote the positions of this type by u_II.
Suppose a^l(u_II) ≠ ā^l(u_II). This means that the player l, instead of forming the
coalition S ∪ i, named a payoff demand ξ_l ∈ [0, ξ̄_l]. From the conditions (4.11.14)
we have that

ξ̄_l ≤ v(S ∪ l) - Σ_{j∈S} ξ̄_j.

Thus by naming a payoff demand ξ_l ≤ ξ̄_l the player l gets not more than he would get by forming
the coalition S ∪ l. Thus we have proved

k_l(ā) ≥ k_l(ā‖a^l)

for the strategies a^l differing from ā^l in the positions of the second type.
The proof for the strategies a^l differing from ā^l in the terminal positions is similar.
The theorem is proved.
One can verify that the payoffs in the considered Nash equilibrium ā are also Pareto
optimal.

4.12 Exercises and problems


1. Find all absolute Nash equilibria in Example 4, 4.2.2.
2. Prove that in a finite-stage two-person zero-sum game with perfect information
the payoffs are equal in all "favorable" ("unfavorable") Nash equilibria.
3. Let v_1(x), v_2(x), ..., v_n(x) be the values of the payoff functions of players
1, 2, ..., n in an absolute equilibrium in the subgame Γ_x, which is unique in each
subgame.
(a) Show that the functions v_i(x), i = 1, 2, ..., n, satisfy the following system of
functional equations:

v_i(x) = max_{x'} v_i(x'),  x ∈ X_i,  i = 1, 2, ..., n,   (4.12.1)

where the maximum is taken over the positions x' immediately following x, with the boundary condition

v_i(x)|_{x∈X_{n+1}} = H_i(x).   (4.12.2)

(b) Give an example of the game in which the payoffs to players in a penalty
strategy equilibrium do not satisfy the system of functional equations (4.12.1) with
the boundary condition (4.12.2).
4. Construct an example of a multistage two-person nonzero-sum game where,
in a penalty strategy equilibrium, the penalizing player penalizes his opponent for
deviation from the chosen path and thus penalizes himself to a greater extent.
5. Construct Pareto-optimal sets in the game from Example 4, 4.2.2.
6. Construct an example of multistage nonzero-sum game where none of the Nash
equilibria leads to a Pareto-optimal solution.
7. Construct the map T which sets up a correspondence between each subgame
Γ_z of the game Γ and some subset of situations U_z in this subgame. Let T(Γ) = U_{x_0}.
We say that the map T is dynamically stable (time-consistent) if from u(·) ∈ U_{x_0}
it follows that u^{z_k}(·) ∈ U_{z_k}, where u^{z_k}(·) = (u_1^{z_k}(·), ..., u_n^{z_k}(·)) is the truncation of the
situation u(·) to the subgame Γ_{z_k}, and ω_0 = {x_0, z_1, ..., z_k, ...} is the play realized in
the situation u(·) ∈ U_{x_0}.
Show that if the map T places each subgame Γ_{z_k} in correspondence with the set
of Pareto-optimal situations, then it is dynamically stable (time-consistent).
8. The map T defined in Exercise 7 is called strongly dynamically stable (strongly time-consistent)
if for any situation u(·) ∈ U_{x_0}, any z_k ∈ {z_t} = ω, where {z_t} = ω is the
play in the situation u(·), and any situation u^{z_k}(·) ∈ U_{z_k}, there exists a situation w(·) ∈ U_{x_0} for
which the situation u^{z_k}(·) is its truncation to the positions of the subgame Γ_{z_k}.
Show that if the map T places each subgame Γ_{z_k} in correspondence with the set
of Nash equilibria, then it is strongly dynamically stable.
9. Construct an example where the map T placing each subgame Γ_{z_k} in correspondence
with the set of Pareto-optimal situations is not strongly dynamically stable.
10. For each subgame Γ_z we introduce the quantities v({i}, z), i = 1, ..., n,
representing the guaranteed payoff to the i-th player in the subgame Γ_z, i.e. v({i}, z) is
the value of the zero-sum game constructed on the graph of the subgame Γ_z between
player i and the players N \ i acting as one player. In this case, the strategy set of
the coalition of players N \ i is the Cartesian product of the strategy sets of the
players k ∈ N \ i, u_{N\i} ∈ Π_{k∈N\i} U_k; the payoff function of player i in the situation
(u_i, u_{N\i}) is defined to be H_i(u_i, u_{N\i}), and the payoff function of the coalition N \ i is
taken to be -H_i(u_i, u_{N\i}).
Construct the functions v({i}, z) for all subgames Γ_z of the game from Example
4, 4.2.2.
11. Show that if in a multistage nonzero-sum game Γ with non-negative payoffs
(H_i ≥ 0, i = 1, ..., n), v({i}, z) = 0 for all i = 1, ..., n and z ∈ ∪_{i=1}^{n} X_i, then any
play can be realized in some penalty strategy equilibrium.
12. Formalize the k-level control tree-like system as a hierarchical game in which a
control center at the i-th level (i = 1, ..., k-1) allocates resources among subordinate
control centers at the next level when i < k-1, and among its subordinate production
divisions when i = k-1. The payoff to each production division depends only
on its output, while the payoff to the control centers depends on their subordinate
production divisions.
13. Find a Nash equilibrium in the tree-like hierarchical k-level game constructed
in Exercise 12.
14. Show that the payoff vector α = (v(N), 0, ..., 0) belongs to the core of a
tree-like hierarchical two-level game with the characteristic function v(S). Show that
the equilibrium constructed in the tree-like hierarchical two-level game is also a strong
equilibrium.
15. In a diamond-shaped hierarchical game construct a characteristic function by
using a Nash equilibrium.
16. Describe the set of all Nash equilibria in a tree-like hierarchical two-level
game. Take into account the possibility that the players B_1, ..., B_n can "penalize"
the center A_0 (e.g., by stopping production when the allocation of resources runs counter
to the interests of player B_i).
17. Construct the payoff matrix for players in the game of Example 7, 4.7.1. Find
optimal pure strategies and the value of the matrix game obtained.
18. Convert the game from Example 9, 4.7.1, to the matrix form and solve it.
19. Consider the following multistage zero-sum game with delayed information
about the position of one of the players. The game is played by two players: target E
and shooter P. The target can move only along the points of the Ox axis with coordinates
0, 1, 2, .... If player E is at the point i, then at the next moment he can
move only to the points i+1, i-1, or stay where he is. Shooter P has j bullets
(j = 0, 1, ...) and can fire no more than one bullet at each time instant. It is assumed
that the shooter hits the point at which he is aiming.
At each time instant player P knows exactly the position of player E at the
previous step, i.e. if player E was at the point i at the previous step, then player
P has to aim at one of the points i+1, i, or i-1. Player E is informed about the number
of bullets that player P has at each time instant, but he does not know where player
P is aiming. The payoff to shooter P is determined by the number of his accurate hits, and so the
objective of shooter P is to maximize the number of his accurate hits before target E
can reach a "bunker". The objective of the target is the opposite one. Here "bunker"
means the point 0, where the target is inaccessible to player P.
Denote this game by Γ(i,j), with the proviso that at the initial time instant target E
is at the point with coordinate i, while shooter P has j bullets. Denote by v(i,j)
the value of the game (if any). It can readily be seen that v(i,0) = 0, i = 1, 2, ...,
and v(1,j) = 0, j = 1, 2, .... At each step of the game Γ(i,j), i = 2, 3, ..., j = 1, 2, ..., the
shooter has four strategies (actually he has more strategies, but the others are not rational),
whereas player E has three strategies. The strategies for shooter P are: shooting at
the point i-1, shooting at the point i, shooting at the point i+1, no shooting at
this step. The strategies for the target are: move to the point i-1, stay at the point
i, move to the point i+1. Thus at each step of the game we have the matrix game
with the payoff matrix

[ 1 + v(i-1, j-1)    v(i, j-1)        v(i+1, j-1)     ]
[ v(i-1, j-1)        1 + v(i, j-1)    v(i+1, j-1)     ]
[ v(i-1, j-1)        v(i, j-1)        1 + v(i+1, j-1) ]
[ v(i-1, j)          v(i, j)          v(i+1, j)       ]

Denote by x_1(i,j), x_2(i,j), x_3(i,j), x_4(i,j) the probabilities that shooter P will
use his 1st, 2nd, 3rd and 4th strategies. Also, denote by y_1(i,j), y_2(i,j), y_3(i,j) the
probabilities that target E will use its 1st, 2nd and 3rd strategies (behavior strategies
for the players P and E, respectively, are functions of the pairs (i,j)).
(a) Show that the value of the game v(i,j) and the optimal behavior strategies for
shooter P (x_1(i,j), x_2(i,j), x_3(i,j), x_4(i,j)) and target E (y_1(i,j), y_2(i,j), y_3(i,j)) are
connected by the following inequalities:

(1 + v(i - 1, j - l))xi + v(i - 1, j - l)x2 + v(i - l,j* - l)ar3 + v(i - l,j)x4 > v{i,j),

v(i,j - l)x, + (1 + v(ij - 1))* 2 + v(i,j - l)x 3 + v(i,j)xt > v(i,j),


v(i + l,j - l ) i ! + v{i + 1, j - l)x2 + (1 + v(i + 1, j - l))x3 + v(i + 1, j ) x 4 > v(i,j),
xi + x2 + X3 + X4 1, X\ > 0, x2 > 0, x 3 > 0, x 4 > 0;
(1 + v(t - l , j - l))y, + v(i, j - l)y2 + v(i + IJ - l)j/ 3 < v(i, j ) ,
v{i - l,j - l)y, + v(i,j - 1)2/2 + (1 + v(t + l,j - l))t/ 3 < v(,j),
v(i - 1,7 - 1)5/, + (\+v(i,j - l))y 2 + v(i + \,j - l)t/ 3 < v(i,j),
v(i - l,y)t/i + v(i,j)y2 + v(i + l,j)y3 < (t, j),
!/i + y2 + y3 = 1, yi > 0, 1/2 > o, y3 > 0.
Hint. The difficulty associated with this game is that in order to determine v(i,j)
we need to know v(i+1,j), in order to determine v(i+1,j) we need to know v(i+2,j),
and so on. The exercises below provide a solution to the game Γ(i,j) and some of its
properties.
(b) Let φ(i,j), i = 1, 2, ..., j = 0, 1, ..., be the double sequence defined by the
relationships

φ(i,0) = 0, i = 1, 2, ...;  φ(1,j) = 0, j = 1, 2, ...;

φ(i,j) = min{(1 + φ(i-1, j-1) + φ(i, j-1) + φ(i+1, j-1))/3, (1 + φ(i-1, j-1) + φ(i, j-1))/2}.
1) Prove that v(i,j) = φ(i,j), and if v(i,j) = (1 + v(i-1, j-1) + v(i, j-1) +
v(i+1, j-1))/3, then

x_1(i,j) = v(i,j) - v(i-1, j-1),
x_2(i,j) = v(i,j) - v(i, j-1),
x_3(i,j) = v(i,j) - v(i+1, j-1),
x_4(i,j) = 0,
y_1(i,j) = y_2(i,j) = y_3(i,j) = 1/3;

2) Prove that v(i,j) = φ(i,j), and if v(i,j) = (1 + v(i-1, j-1) + v(i, j-1))/2,
then

x_1(i,j) = v(i,j) - v(i-1, j-1),
x_2(i,j) = v(i,j) - v(i, j-1),
x_3(i,j) = x_4(i,j) = 0,
y_1(i,j) = y_2(i,j) = 1/2,
y_3(i,j) = 0;
(c) Prove that the following relationships hold for any j = 0, 1, 2, ...:
1) v(i,j) = j/3, i = j+1, j+2, ...;
2) v(i,j) ≤ v(i+1,j), i = 1, 2, ...;
3) v(i,j) ≤ v(i,j+1), i = 2, 3, ...;
4) v(i,j) + v(i+2,j) ≤ 2 v(i+1,j), i = 1, 2, ....
(d) Prove that:
1) lim_{i→+∞} v(i,j) = j/3 for any fixed j = 0, 1, 2, ...;
2) lim_{j→∞} v(i,j) = i - 1 for any fixed i = 1, 2, ....
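The recursion of part (b) is immediately computable; the following Python sketch evaluates v(i,j) and spot-checks the properties stated in parts (c) and (d):

```python
# Sketch: compute the value v(i, j) of the shooter-target game via the
# recursion of part (b) and spot-check the properties in (c)-(d).
from functools import lru_cache

@lru_cache(maxsize=None)
def v(i, j):
    if j == 0 or i == 1:
        return 0.0
    return min((1 + v(i-1, j-1) + v(i, j-1) + v(i+1, j-1)) / 3,
               (1 + v(i-1, j-1) + v(i, j-1)) / 2)

print(v(5, 3))                    # equals j/3 = 1.0 since i >= j + 1
print(all(v(i, 4) <= v(i+1, 4) for i in range(1, 20)))   # property (c)2
print([round(v(2, j), 4) for j in (1, 5, 20, 60)])
# approaches i - 1 = 1 as j grows, as claimed in part (d)
```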
20. Consider an extension of the game of shooter and target, where target E in
position i can move at most k units to the right or the left, i.e. it
can move to each of the points i-k, i-k+1, ..., i, ..., i+k. The other objectives
and possibilities of shooter P and target E remain unaffected, in terms of the new
definition of a strategy for player E.
Denote by G(i,j) the game with the proviso that at the initial time instant the
target is at the i-th point and the shooter has j bullets. Further, denote by v(i,j)
the value of the game G(i,j). From the definition of G(i,j) we have

v(i,0) = 0,  i = 1, 2, ...,

v(i,j) = 0,  i = 1, 2, ..., k,  j = 1, 2, ....

At each step of the game G(i,j), i = k+1, ..., j = 1, ..., shooter P has 2k+2
pure strategies, whereas target E has 2k+1 pure strategies. The pure strategies for
player P are: shooting at the point i-k, shooting at the point i-k+1, ..., shooting
at the point i+k, no shooting at this step. The strategies for player E are: move to
the point i-k, move to the point i-k+1, ..., move to the point i+k.
Thus, at each step of the game we have the game with the (2k+2) × (2k+1)
matrix {a_mn(i,j)}, where

a_mn(i,j) = 1 + v(i+n-k-1, j-1),  if m = n;  m, n = 1, ..., 2k+1,

a_mn(i,j) = v(i+n-k-1, j-1),  if m ≠ n;  m, n = 1, ..., 2k+1,

a_mn(i,j) = v(i+n-k-1, j),  if m = 2k+2,  n = 1, ..., 2k+1.
(a) Show that the game G(i,j) has the value v(i,j) if and only if there
exist (x_1, x_2, ..., x_{2k+2}), (y_1, y_2, ..., y_{2k+1}) such that

Σ_{m=1}^{2k+2} a_mn(i,j) x_m ≥ v(i,j),  n = 1, ..., 2k+1,

Σ_{m=1}^{2k+2} x_m = 1,  x_m ≥ 0,  m = 1, ..., 2k+2,

Σ_{n=1}^{2k+1} a_mn(i,j) y_n ≤ v(i,j),  m = 1, ..., 2k+2,

Σ_{n=1}^{2k+1} y_n = 1,  y_n ≥ 0,  n = 1, ..., 2k+1.
Hint. Denote by x_1(i,j), x_2(i,j), ..., x_{2k+2}(i,j) the optimal behavior strategies
for shooter P and by y_1(i,j), y_2(i,j), ..., y_{2k+1}(i,j) the optimal behavior strategies
for target E. The exercises below provide a solution of the game G(i,j) and its
properties.
(b) Denote by φ(i,j), j = 0, 1, ..., i = 1, 2, ..., the following double sequence:

φ(i,0) = 0, i = 1, 2, ...;
φ(i,j) = 0, i = 1, 2, ..., k; j = 1, 2, ...;

φ(i,j) = min_{r=1,...,k+1} (1 + Σ_{m=1}^{k+r} φ(i+m-k-1, j-1)) / (k+r),

i = k+1, k+2, ...;  j = 1, 2, ....   (4.12.3)

Prove that:
1) v(i,j) = φ(i,j);
2) for i = k+1, ...; j = 1, 2, ..., we have x_m(i,j) = v(i,j) - v(i+m-k-1, j-1)
for m = 1, ..., k+r*, otherwise x_m = 0. Here r = r* is the point at which the
minimum is attained in (4.12.3).
(c) Prove that for j = 0, 1, ...:
1) v(i,j) ≥ 0, i = 1, 2, ...;
2) v(i,j) = j/(2k+1), i = kj+1, kj+2, ...;
3) v(i,j) ≤ v(i+1,j), i = 1, 2, ...;
4) v(i,j) ≤ v(i,j+1), i = k+1, k+2, ...;
5) v(i,j+1) ≤ v(i,j) + 1/(2k+1), i = 1, 2, ....
(d) The game G(i,∞). Prove that lim_{j→∞} v(i,j) = w(i) for each i = 1, 2, ...,
where w(i) is a solution to the linear difference equation

w(i) - (1/k) Σ_{p=1}^{k} w(i-p) = 1,  i = k+1, k+2, ...,

with the initial conditions

w(1) = w(2) = ... = w(k) = 0.
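A sketch generalizing the previous computation to arbitrary k (under the reconstruction of (4.12.3) and of the difference equation above, both partly conjectural) spot-checks property (c)2 and the limit claim of part (d):

```python
# Sketch: the generalized recursion (4.12.3) for G(i, j) with step
# size k, checked against property (c)2 and the limit in part (d).
from functools import lru_cache

def make_value(k):
    @lru_cache(maxsize=None)
    def v(i, j):
        if j == 0 or i <= k:          # no bullets, or target already safe
            return 0.0
        return min(
            (1 + sum(v(i + m - k - 1, j - 1) for m in range(1, k + r + 1)))
            / (k + r)
            for r in range(1, k + 2))
    return v

def w(i, k):     # solution of the difference equation in part (d)
    ws = [0.0] * k                    # w(1) = ... = w(k) = 0
    while len(ws) < i:
        ws.append(1 + sum(ws[-k:]) / k)
    return ws[i - 1]

k = 2
vk = make_value(k)
print(vk(2 * k + 1, 2), 2 / (2 * k + 1))   # (c)2: both equal 0.4
print(vk(4, 40), w(4, k))    # v(4, j) approaches w(4) = 1.5 as j grows
```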
Chapter 5

Differential games

5.1 Differential zero-sum games with prescribed duration
Differential games are a generalization of multistage games to the case where the
number of steps in the game is infinite (a continuum) and the players 1 and 2 (denoted
as E and P, respectively) have the possibility of taking decisions continuously in
time. In this setting the trajectories of the players' motion are a solution of a system
of differential equations whose right-hand sides depend on parameters that are under
the control of the players.
5.1.1. Let x ∈ R^n, y ∈ R^n, u ∈ U ⊂ R^k, v ∈ V ⊂ R^l, and let f(x,u), g(y,v) be
vector functions of dimension n given on R^n × U and R^n × V, respectively. Consider
two systems of ordinary differential equations

ẋ = f(x,u),   (5.1.1)

ẏ = g(y,v),   (5.1.2)

with initial conditions x_0, y_0. Player P (E) starts his motion from the phase state x_0
(y_0) and moves in the phase space R^n in accordance with (5.1.1) ((5.1.2)), choosing
at each instant of time the value of the parameter u ∈ U (v ∈ V) to suit his objectives,
in terms of the information available in each current state.
The simplest case to describe is that of perfect information. In a differential
game this means that at each time instant t the players choosing the parameters u ∈ U,
v ∈ V know the time t and their own and the opponent's phase states. Sometimes
one of the players, say Player P, is required to know at each current instant t the
value of the parameter v ∈ V chosen by Player E at the same instant of time. In
this case Player E is said to be discriminated, and the game is called a game with
discrimination against Player E.
The parameters u ∈ U, v ∈ V are called controls of the players P and E,
respectively. The functions x(t), y(t) which satisfy equations (5.1.1), (5.1.2) and the initial
conditions are called trajectories of the players P, E.


5.1.2. Objectives in the differential game are determined by the payoff, which
may be defined by the realized trajectories x(t), y(t) in a variety of ways. For example,
suppose the game is played during a preassigned time T. Let
x(T), y(T) be the phase states of the players P and E at the time instant T when the game
terminates. Then the payoff to Player E is taken to be H(x(T), y(T)), where H(x,y)
is some function given on R^n × R^n. In the specific case when

H(x(T), y(T)) = ρ(x(T), y(T)),   (5.1.3)

where ρ(x(T), y(T)) = √(Σ_{i=1}^{n} (x_i(T) - y_i(T))^2) is the Euclidean distance between the
points x(T), y(T), the game describes a process of pursuit during which the objective
of Player E is to avoid Player P by moving a maximum distance from him by the
time the game ends. In all cases the game is assumed to be a differential zero-sum
game. Under condition (5.1.3), this means that the objective of Player P is to come
within the shortest distance of Player E by the time T the game ends.
With such a definition, the payoff depends only on the final states, and the results
obtained by each player during the game until the time T are not scored. It is also of
interest to state the problem in which the payoff to Player E is defined as the minimum
distance between the players during the game:

min_{0≤t≤T} ρ(x(t), y(t)).

There exist games in which the constraint on the game duration is not essential and
the game continues until the players obtain a particular result. Let an m-dimensional
surface F be given in R^{2n}. This surface will be called terminal. Let

t_n = min{t : (x(t), y(t)) ∈ F},   (5.1.4)

i.e. t_n is the first time the point (x(t), y(t)) falls on F. If (x(t), y(t)) ∉ F for all t ≥ 0,
then t_n = +∞. For the realized paths x(t), y(t) the payoff to Player
E is t_n (the payoff to Player P is -t_n). In particular, if F is a sphere of radius l > 0
given by the equation

Σ_{i=1}^{n} (x_i - y_i)^2 = l^2,

then we have the game of pursuit in which Player P seeks to come within a distance
l > 0 of Player E as soon as possible. If l = 0, then the capture is taken to mean
the coincidence of the phase coordinates of the players P and E, in which case Player E
seeks to postpone the capture time. Such games of pursuit are called time-optimal
games of pursuit.
The theory of differential games also deals with the problem of determining the
set of initial states of the players from which Player P can ensure the capture of
Player E within a distance l. And a definition is provided for the set of initial states
of the players from which Player E can avoid, in finite time, the encounter with
Player P within a distance l. One set is called a capture zone (C, Z) and the other an

escape zone (E, Z). It is apparent that these zones do not meet. However, a critical
question arises of whether the closure of the union of the capture and the escape zones
spans the entire phase space. Also the answer to this question is provided below, we
now note that in order to adequately describe this process, it suffices to define the
payoif as follows. If there exists tn < oo (see (5.1.4)), then the payoff to Player E is
1. If, however, tn = oo, then the payoff is +1 (the payoff to Player P is equal to
the payoff to Player E but opposite in sign, since the game is zero-sum). The games
of pursuit with such a payoff are called the pursuit games of kind.
5.1.3. Phase constraints. If we further require that the phase point (x, y) does
not leave some set F ⊂ R^{2n} during the game, then we obtain a differential game
with phase constraints. A special case of such a game is the "Life-line" game. The
"Life-line" game is a zero-sum game of kind in which the payoff to Player E is +1 if he
reaches the boundary of the set F (the "Life-line") before Player P captures him. Thus,
the objective of Player E is to reach the boundary of the set F before being captured
by Player P (before coming within a distance l, l > 0, of Player P). The objective of
Player P, however, is to come within a distance l of Player E while the latter is
still in the set F. It is assumed that Player P cannot abandon the set F.
5.1.4. Example 1. (Simple motion.) The game is played in a plane. The motions of
the players P and E are described by the system of differential equations

ẋ_1 = u_1,  ẋ_2 = u_2,  u_1^2 + u_2^2 ≤ α^2,

ẏ_1 = v_1,  ẏ_2 = v_2,  v_1^2 + v_2^2 ≤ β^2,

x_1(0) = x_1^0,  x_2(0) = x_2^0,  y_1(0) = y_1^0,  y_2(0) = y_2^0,  α > 0, β > 0.   (5.1.5)

The physical implication of equations (5.1.5) is that the players P and E move
in a plane at limited velocities: the maximum velocities α and β are constant in value,
and the velocities of the players P and E never exceed α and β, respectively.
Player P can change the direction of his motion (the velocity vector) by choosing
at each time instant the control u = (u_1, u_2) constrained by u_1^2 + u_2^2 ≤ α^2 (the set
U). Similarly, Player E can also change the direction of his motion by choosing at
each time instant the control v = (v_1, v_2) constrained by v_1^2 + v_2^2 ≤ β^2 (the set V).
It is obvious that if α > β, then the capture zone (C, Z) coincides with the entire
space, i.e. Player P can always ensure the capture of Player E within any distance
l in a finite time. To this end, it suffices to choose the motion with the maximum
velocity α and to direct the velocity vector at each time instant t towards the pursued
point y(t), i.e. to carry out pursuit along the pursuit line. If α < β, the escape zone
(E, Z) coincides with the entire space of the game except the points (x, y) for which
ρ(x, y) ≤ l. Indeed, if at the initial instant ρ(x_0, y_0) > l, then Player E can always
avoid capture by moving away from Player P along the straight line joining the initial
points x_0, y_0, with the maximum velocity β.
The special property manifested here will also be encountered in what follows. In
order to form the control which ensures the avoidance of capture, Player E
needs to know only the initial states x_0, y_0, while, to form the control which ensures
the capture of Player E in the case α > β, Player P needs information on his own
and the opponent's states at each current instant of time.
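As a crude numerical illustration of pursuit along the pursuit line in the simple-motion game (a time-discretized sketch with hypothetical speeds and initial points; it is not a solution of the continuous game):

```python
# Sketch: simple motion (5.1.5), discretized with a small time step.
# P moves with speed alpha straight at E's current position; E flees
# along the initial line of sight with speed beta.  All numbers are
# hypothetical illustration data.
import math

alpha, beta, ell, dt = 2.0, 1.0, 0.1, 0.01
x, y = [0.0, 0.0], [5.0, 3.0]

def unit(v):
    n = math.hypot(v[0], v[1])
    return [v[0] / n, v[1] / n]

flee = unit([y[0] - x[0], y[1] - x[1]])        # E's fixed escape direction

t = 0.0
while math.hypot(y[0] - x[0], y[1] - x[1]) > ell:
    chase = unit([y[0] - x[0], y[1] - x[1]])   # pursuit-line control of P
    for k in range(2):
        x[k] += alpha * chase[k] * dt
        y[k] += beta * flee[k] * dt
    t += dt

# capture time, close to (rho(x0, y0) - l)/(alpha - beta) = 5.73
print(round(t, 2))
```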
Example 2. The players P and E are material points with unit masses moving
in the plane under the control of modulus-constrained and frictional forces. The
equations of motion of the players are

ẋ_1 = x_3,  ẋ_2 = x_4,  ẋ_3 = u_1 - k_P x_3,

ẋ_4 = u_2 - k_P x_4,  u_1^2 + u_2^2 ≤ α^2,

ẏ_1 = y_3,  ẏ_2 = y_4,  ẏ_3 = v_1 - k_E y_3,   (5.1.6)

ẏ_4 = v_2 - k_E y_4,  v_1^2 + v_2^2 ≤ β^2,

where (x_1, x_2), (y_1, y_2) are the geometric coordinates, (x_3, x_4), (y_3, y_4) are respectively the
momenta of the points P and E, k_P and k_E are friction coefficients, and α and β are the maximum
forces which can be applied to the material points P and E. The motion starts from
the states x_i(0) = x_i^0, y_i(0) = y_i^0, i = 1, 2, 3, 4. Here, by the state is meant not
the locus of the players P and E, but their phase state in the space of coordinates
and momenta. The sets U, V are the circles U = {u = (u_1, u_2) : u_1^2 + u_2^2 ≤ α^2},
V = {v = (v_1, v_2) : v_1^2 + v_2^2 ≤ β^2}. This means that at each instant the players P and
E may choose the direction of the applied forces; however, the maximum values of these
forces are restricted by the constants α and β. In this formulation, as shown below,
the condition α > β (power superiority) is not sufficient for Player P to accomplish
pursuit from any initial state.
5.1.5. The ways of selecting the controls u ∈ U, v ∈ V by the players P and E in
terms of the incoming information have not yet been specified. In other words, the notion of
a strategy in the differential game remains to be defined.
Although there exist several approaches to this notion, we shall focus on those
intuitively obvious game-theoretic properties which the notion must possess. As noted
in Ch. 4, a strategy must describe the behavior of a player in all information states
in which he may find himself during the game. In what follows the information state
of each player will be determined by the phase vectors x(t), y(t) at the current time
instant t. Then it would be natural to regard a strategy for Player P (E) as a
function u(x,y,t) (v(x,y,t)) with values in the set of controls U (V). That is how
the strategy is defined in Isaacs (1965). Strategies of this type are called synthesizing.
However, this method of defining a strategy suffers from some grave disadvantages.
Indeed, suppose the players P and E have chosen the strategies u(x,y,t), v(x,y,t),
respectively. Then, to determine the paths of the players, and hence the payoff (which
depends on the paths), we substitute the functions u(x,y,t), v(x,y,t) into equations
(5.1.1), (5.1.2) in place of the control parameters u, v and integrate them with the
initial conditions x_0, y_0 on the time interval [0,T]. We obtain the following system of
ordinary differential equations:

ẋ = f(x, u(x,y,t)),  ẏ = g(y, v(x,y,t)).   (5.1.7)
For the existence and uniqueness of the solution to system (5.1.7) it is essential that some conditions be imposed on the functions f(x, u), g(y, v) and the strategies
u(x, y, t), v(x, y, t). The first group of conditions places no limitations on the players' capabilities, refers to the statement of the problem and is justified by the physical nature of the process involved. The case is different for the constraints on the class of functions (strategies) u(x, y, t), v(x, y, t). Such constraints on the players' capabilities contradict the notion adopted in game theory that the players are at liberty to choose a behavior. In some cases this leads to substantial impoverishment of the sets of strategies. For example, if we restrict ourselves to continuous functions u(x, y, t), v(x, y, t), problems arise where there are no solutions in the class of continuous functions. The assumption of a more general class of strategies makes impossible the unique solution of system (5.1.7) on the interval [0, T]. At times, to overcome this difficulty, one considers the sets of strategies u(x, y, t), v(x, y, t) under which the system (5.1.7) has a unique solution extendable to the interval [0, T]. However, such an approach (aside from the nonconstructivity of the definition of the strategy sets) is not adequately justified, since the set of all pairs of strategies u(x, y, t), v(x, y, t) under which the system (5.1.7) has a unique solution is found to be nonrectangular.
5.1.6. We shall consider the strategies in the differential game to be piecewise open-loop strategies.
The piecewise open-loop strategy u(·) for Player P consists of a pair {σ, a}, where σ is some partition 0 = t′₀ < t′₁ < … < t′ₖ < … of the time interval [0, ∞) by the points t′ₖ which have no finite accumulation points, and a is the map which places each point t′ₖ and phase coordinates x(t′ₖ), y(t′ₖ) in correspondence with some measurable open-loop control u(t) ∈ U for t ∈ [t′ₖ, t′ₖ₊₁) (the measurable function u(t) taking values from the set U). Similarly, the piecewise open-loop strategy v(·) for Player E consists of a pair {τ, b}, where τ is some partition 0 = t″₀ < t″₁ < … < t″ₖ < … of the time interval [0, ∞) by the points t″ₖ which have no accumulation points, and b is the map which places each point t″ₖ and positions x(t″ₖ), y(t″ₖ) in correspondence with some measurable open-loop control v(t) ∈ V on the interval [t″ₖ, t″ₖ₊₁) (the measurable function v(t) taking values from the set V). Using a piecewise open-loop strategy, the player responds to changes in information not continuously in time, but at the time instants tₖ of his partition, which are determined by the player himself.
Denote the set of all piecewise open-loop strategies for Player P by P, and the
set of all possible piecewise open-loop strategies for Player E by E.
Let u(t), v(t) be a pair of measurable open-loop controls for the players P and E (measurable functions with values in the control sets U, V). Consider the system of ordinary differential equations

ẋ = f(x, u(t)),  ẏ = g(y, v(t)),  t ≥ 0.  (5.1.8)

Impose the following constraints on the right-hand sides of system (5.1.8). The vector functions f(x, u), g(y, v) are continuous in all their independent variables and are uniformly bounded, i.e. f(x, u) is continuous on the set Rⁿ × U, while g(y, v) is continuous on the set Rⁿ × V, and ||f(x, u)|| ≤ α, ||g(y, v)|| ≤ β (here ||z|| is the vector norm in Rⁿ). Furthermore, the vector functions f(x, u) and g(y, v) satisfy the
Lipschitz condition in x and y uniformly with respect to u and v, respectively, that is

||f(x₁, u) − f(x₂, u)|| ≤ α₁ ||x₁ − x₂||,  u ∈ U,
||g(y₁, v) − g(y₂, v)|| ≤ β₁ ||y₁ − y₂||,  v ∈ V.

From the Carathéodory existence and uniqueness theorem it follows that, under the above conditions, for any initial states x₀, y₀ and any measurable open-loop controls u(t), v(t) given on the interval [t₁, t₂], 0 ≤ t₁ < t₂, there exist unique vector functions x(t), y(t) which satisfy, almost everywhere (i.e. everywhere except a set of measure zero) on the interval [t₁, t₂], the system of differential equations

ẋ(t) = f(x(t), u(t)),  ẏ(t) = g(y(t), v(t))  (5.1.9)

with the initial conditions x(t₁) = x₀, y(t₁) = y₀ (see Kolmogorov and Fomin (1981), Sansone (1954)).
5.1.7. Let (x₀, y₀) be a pair of initial conditions for equations (5.1.8). The system S = {x₀, y₀; u(·), v(·)}, where u(·) ∈ P, v(·) ∈ E, is called a situation in the differential game. For each situation S there is a unique pair of paths x(t), y(t) such that x(0) = x₀, y(0) = y₀ and relationships (5.1.9) hold for almost all t ∈ [0, T], T > 0.
Indeed, let u(·) = {σ, a}, v(·) = {τ, b}. Furthermore, let 0 = t₀ < t₁ < … < tₖ < … be the partition of the interval [0, ∞) that is the union of the partitions σ, τ. The solution to system (5.1.9) is constructed as follows. On each interval [tₖ, tₖ₊₁), k = 0, 1, …, the images of the maps a, b are measurable open-loop controls u(t), v(t); hence on the interval [t₀, t₁) the system of equations (5.1.9) with x(0) = x₀, y(0) = y₀ has a unique solution. On the interval [t₁, t₂), with x(t₁) = lim_{t→t₁−0} x(t), y(t₁) = lim_{t→t₁−0} y(t) as initial conditions, we construct the solution to (5.1.9) by using again the measurability of the controls u(t), v(t) as images of the maps a and b on the intervals [tₖ, tₖ₊₁), k = 1, 2, …. Setting x(t₂) = lim_{t→t₂−0} x(t), y(t₂) = lim_{t→t₂−0} y(t), we continue this process to find a unique solution x(t), y(t) such that x(0) = x₀, y(0) = y₀. Any path x(t) (y(t)) corresponding to some situation {x₀, y₀; u(·), v(·)} is called the path of Player P (Player E).
5.1.8. Payoff function. As shown above, each situation S = {x₀, y₀; u(·), v(·)} in piecewise open-loop strategies uniquely determines the paths x(t), y(t) of Players P and E, respectively. The priority degree of these paths will be estimated by the payoff function K which places each situation in correspondence with some real number, a payoff to Player E. The payoff to Player P is −K (this means that the game is zero-sum, since the sum of payoffs to players P and E in each situation is zero). We shall consider the games with payoff functions of four types.
Terminal payoff. Let there be given some number T > 0 and a function H(x, y) that is continuous in (x, y). The payoff in each situation S = {x₀, y₀; u(·), v(·)} is determined as follows:

K(x₀, y₀; u(·), v(·)) = H(x(T), y(T)),

where x(T) = x(t)|_{t=T}, y(T) = y(t)|_{t=T} (here x(t), y(t) are the paths of players P and E in the situation S). We have the game of pursuit when the function H(x, y) is the Euclidean distance between the points x and y.
Minimum result. Let H(x, y) be a real-valued continuous function. In the situation S = {x₀, y₀; u(·), v(·)} the payoff to Player E is taken to be min_{0≤t≤T} H(x(t), y(t)), where T > 0 is a given number. If H(x, y) = ρ(x, y), then the game describes the process of pursuit.
Integral payoff. Some manifold F of dimension m and a continuous function H(x, y) are given in Rⁿ × Rⁿ. Suppose that in the situation S = {x₀, y₀; u(·), v(·)}, t_n is the first instant at which the path (x(t), y(t)) falls on F. Then

K(x₀, y₀; u(·), v(·)) = ∫₀^{t_n} H(x(t), y(t)) dt

(if t_n = ∞, then K = ∞), where x(t), y(t) are the paths of players P and E corresponding to the situation S. In the case H ≡ 1, K = t_n, we have the time-optimal game of pursuit.
Qualitative payoff. The payoff function K can take only one of the three values +1, 0, −1 depending on the position of (x(t), y(t)) in Rⁿ × Rⁿ. Two manifolds F and L of dimensions m₁ and m₂, respectively, are given in Rⁿ × Rⁿ. Suppose that in the situation S = {x₀, y₀; u(·), v(·)}, t_n is the first instant at which the path (x(t), y(t)) falls on F ∪ L. Then

K(x₀, y₀; u(·), v(·)) = −1, if (x(t_n), y(t_n)) ∈ L;
K(x₀, y₀; u(·), v(·)) = 0, if t_n = ∞;
K(x₀, y₀; u(·), v(·)) = +1, if (x(t_n), y(t_n)) ∈ F.
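As an illustration, the four payoff types can be evaluated on sampled paths as in the following sketch (the paths, the step dt, the capture radius l and the target-set predicates are assumptions made only for the example):

import math

def rho(p, q):                       # Euclidean distance in the plane
    return math.hypot(p[0] - q[0], p[1] - q[1])

def terminal(xs, ys):                # K = H(x(T), y(T)) with H = rho
    return rho(xs[-1], ys[-1])

def minimum_result(xs, ys):          # K = min over t of rho(x(t), y(t))
    return min(rho(p, q) for p, q in zip(xs, ys))

def integral_time(xs, ys, dt, hit):  # K = t_n for H = 1 (time optimal)
    for k, (p, q) in enumerate(zip(xs, ys)):
        if hit(p, q):
            return k * dt            # t_n, the first instant on F
    return math.inf                  # t_n = infinity => K = infinity

def qualitative(xs, ys, on_F, on_L):
    for p, q in zip(xs, ys):
        if on_L(p, q): return -1
        if on_F(p, q): return +1
    return 0                         # the path never reaches F or L

dt, l = 0.1, 0.5
xs = [(0.1 * k, 0.0) for k in range(50)]          # P moves right, speed 1
ys = [(3.0 + 0.05 * k, 0.0) for k in range(50)]   # E moves right, speed .5
hit = lambda p, q: rho(p, q) <= l
print(terminal(xs, ys), minimum_result(xs, ys),
      integral_time(xs, ys, dt, hit),
      qualitative(xs, ys, hit, lambda p, q: False))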
5.1.9. Having defined the strategy sets for the players P and E and the payoff function, we may define the differential game as a game in normal form. In 1.1.1, we interpreted the normal form Γ as the triple Γ = ⟨X, Y, K⟩, where X × Y is the space of pairs of all possible strategies in the game Γ, and K is the payoff function defined on X × Y. In the case involved, the payoff function is defined not only on the set of pairs of all possible strategies in the game, but also on the set of all pairs of initial positions x₀, y₀. Therefore, for each pair (x₀, y₀) ∈ Rⁿ × Rⁿ there is a corresponding game in normal form, i.e. in fact some family of games in normal form depending on the parameters (x₀, y₀) ∈ Rⁿ × Rⁿ is defined.
Definition. The normal form of the differential game Γ(x₀, y₀) given on the space of strategy pairs P × E means the system

Γ(x₀, y₀) = ⟨x₀, y₀; P, E, K(x₀, y₀; u(·), v(·))⟩,

where K(x₀, y₀; u(·), v(·)) is the payoff function defined by any one of the above methods.
If the payoff function K in the game Γ is terminal, then the corresponding game Γ is called the game with terminal payoff. If the function K is defined by the second method, then we have the game for achievement of a minimum result. If the function K in the game Γ is integral, then the corresponding game Γ is called the game with integral payoff. When the payoff function in the game Γ is qualitative, the corresponding game Γ is called the game of kind.
5.1.10. It appears natural that optimal strategies cannot exist in the class of piecewise open-loop strategies (in view of the open structure of the class). However, we can show that in a sufficiently large number of cases, for any ε > 0 there is an ε-equilibrium point.
Recall the definition of the ε-equilibrium point (see 2.2.3).
Definition. Let ε > 0 be given. The situation s_ε = {x₀, y₀; u_ε(·), v_ε(·)} is called an ε-equilibrium in the game Γ(x₀, y₀) if for all u(·) ∈ P and v(·) ∈ E there is

K(x₀, y₀; u(·), v_ε(·)) + ε ≥ K(x₀, y₀; u_ε(·), v_ε(·))  (5.1.10)
≥ K(x₀, y₀; u_ε(·), v(·)) − ε.

The strategies u_ε(·), v_ε(·) determined in (5.1.10) are called ε-optimal strategies for players P and E, respectively.
The following Lemma is a rephrasing of Theorem 2.2.5 for differential games.
Lemma. Suppose that in the game Γ(x₀, y₀) for every ε > 0 there is an ε-equilibrium. Then there exists the limit

lim_{ε→0} K(x₀, y₀; u_ε(·), v_ε(·)).

Definition. The function V(x, y) defined at each point (x, y) of some set D ⊂ Rⁿ × Rⁿ by the rule

lim_{ε→0} K(x, y; u_ε(·), v_ε(·)) = V(x, y)  (5.1.11)

is called the value of the game Γ(x, y) on the set of initial conditions (x, y) ∈ D.
The existence of an ε-equilibrium in the game Γ(x₀, y₀) for any ε > 0 is equivalent to the fulfilment of the equality

sup_{v(·)∈E} inf_{u(·)∈P} K(x₀, y₀; u(·), v(·)) = inf_{u(·)∈P} sup_{v(·)∈E} K(x₀, y₀; u(·), v(·)).

If in the game Γ(x₀, y₀) for any ε > 0 there are ε-optimal strategies for players P and E, then the game Γ(x₀, y₀) is said to have a solution.
Definition. Let u*(·), v*(·) be a pair of strategies such that

K(x₀, y₀; u(·), v*(·)) ≥ K(x₀, y₀; u*(·), v*(·)) ≥ K(x₀, y₀; u*(·), v(·))  (5.1.12)

for all u(·) ∈ P and v(·) ∈ E. The situation s* = (x₀, y₀; u*(·), v*(·)) is then called an equilibrium in the game Γ(x₀, y₀). The strategies u*(·) ∈ P and v*(·) ∈ E from (5.1.12) are called optimal strategies for players P and E, respectively.
The existence of an equilibrium in the game Γ(x₀, y₀) is equivalent (see 1.3.4) to the fulfilment of the equality

max_{v(·)∈E} inf_{u(·)∈P} K(x₀, y₀; u(·), v(·)) = min_{u(·)∈P} sup_{v(·)∈E} K(x₀, y₀; u(·), v(·)).
Clearly, if there exists an equilibrium, then for any ε > 0 it is also an ε-equilibrium, i.e. here the function V(x, y) merely coincides with K(x, y; u*(·), v*(·)) (see 2.2.3).
5.1.11. We shall now consider synthesizing strategies.
Definition. The pair (u*(x, y, t), v*(x, y, t)) is called a synthesizing strategy equilibrium in the differential game if the inequality

K(x₀, y₀; u(x, y, t), v*(x, y, t)) ≥ K(x₀, y₀; u*(x, y, t), v*(x, y, t))
≥ K(x₀, y₀; u*(x, y, t), v(x, y, t))  (5.1.13)

holds for all situations (u(x, y, t), v*(x, y, t)) and (u*(x, y, t), v(x, y, t)) for which there exists a unique solution to system (5.1.7) that can be extended to [0, ∞) from the initial states x₀, y₀. The strategies u*(x, y, t), v*(x, y, t) are called optimal strategies for players P and E.
A distinction must be made between the notions of an equilibrium in piecewise open-loop and in synthesizing strategies. Note that in the ordinary sense the equilibrium in the class of functions u(x, y, t), v(x, y, t) cannot be defined because of the nonrectangularity of the space of situations, i.e. in the synthesizing strategies it is impossible to require that the inequality (5.1.13) hold for all strategies u(x, y, t), v(x, y, t), since some pairs (u*(x, y, t), v(x, y, t)), (u(x, y, t), v*(x, y, t)) may not be admissible (in the corresponding situation the system of equations (5.1.7) may have no solution or may have no unique solution).
In what follows we shall consider the classes of piecewise open-loop strategies, unless otherwise indicated. Preparatory to proving the existence of an ε-equilibrium in the differential game we will first consider one auxiliary class of multistage games with perfect information.

5.2 Multistage perfect-information games with an infinite number of alternatives
5.2.1. We shall consider a class of multistage games with perfect information that is a generalization of the games with perfect information from Sec. 4.1. The game proceeds in the n-dimensional Euclidean space Rⁿ. Denote by x ∈ Rⁿ the position of Player 1, and by y ∈ Rⁿ the position of Player 2. Suppose that the sets U_x, V_y are defined for each x ∈ Rⁿ, y ∈ Rⁿ, respectively. These are taken to be compact sets in the Euclidean space Rⁿ. The game starts from a position x₀, y₀. At the 1st step the players 1 and 2 choose the points x₁ ∈ U_{x₀} and y₁ ∈ V_{y₀}. In this case the choice by Player 2 is made known to Player 1 before he chooses the point x₁ ∈ U_{x₀}. At the points x₁, y₁ the players 1 and 2 choose x₂ ∈ U_{x₁} and y₂ ∈ V_{y₁}, and Player 2's choice is made known to Player 1 before he chooses the point x₂ ∈ U_{x₁}, and so on. In positions x_{k−1}, y_{k−1} at the k-th step the players choose x_k ∈ U_{x_{k−1}}, y_k ∈ V_{y_{k−1}}, and Player 2's choice is made known to Player 1 before he chooses the point x_k ∈ U_{x_{k−1}}. This process terminates at the N-th step by choosing x_N ∈ U_{x_{N−1}}, y_N ∈ V_{y_{N−1}} and passing to the state x_N, y_N.
The families of sets U_x, V_y, x ∈ Rⁿ, y ∈ Rⁿ, are taken to be continuous in x, y in the Hausdorff metric. This means that for any ε > 0 there is δ > 0 such that for |x − x₀| < δ (|y − y₀| < δ)

(U_{x₀})_ε ⊃ U_x,  (U_x)_ε ⊃ U_{x₀};  (V_{y₀})_ε ⊃ V_y,  (V_y)_ε ⊃ V_{y₀}.

Here U_ε (V_ε) is an ε-neighborhood of the set U (V).


The following result is well known in analysis (see Petrosjan (1993)).
Lemma. Let f(x′, y′) be a continuous function on the Cartesian product U_x × V_y. If the families {U_x}, {V_y} are Hausdorff-continuous in x, y, then the functionals

F₁(x, y) = max_{y′∈V_y} min_{x′∈U_x} f(x′, y′),
F₂(x, y) = min_{x′∈U_x} max_{y′∈V_y} f(x′, y′)

are continuous in x, y.
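Numerically, the functionals F₁ and F₂ of the Lemma can be approximated by brute force once the compact sets are replaced by finite samples. In the following sketch the choice of U_x, V_y as sampled circles anticipates Example 3 below and is an assumption of the illustration:

import math

def circle_samples(center, radius, n=24):
    # Coarse stand-in for a compact disc: its center plus n boundary points.
    cx, cy = center
    pts = [center]
    for k in range(n):
        a = 2 * math.pi * k / n
        pts.append((cx + radius * math.cos(a), cy + radius * math.sin(a)))
    return pts

def maxmin(f, Ux, Vy):            # F1 = max over Vy of min over Ux
    return max(min(f(xp, yp) for xp in Ux) for yp in Vy)

def minmax(f, Ux, Vy):            # F2 = min over Ux of max over Vy
    return min(max(f(xp, yp) for yp in Vy) for xp in Ux)

f = lambda xp, yp: math.hypot(xp[0] - yp[0], xp[1] - yp[1])
Ux = circle_samples((0.0, 0.0), 2.0)
Vy = circle_samples((5.0, 0.0), 1.0)
print(maxmin(f, Ux, Vy), minmax(f, Ux, Vy))   # maxmin <= minmax always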


Let x = (x₀, …, x_N) and y = (y₀, …, y_N) be the respective paths of players 1 and 2 realized during the game. The payoff to Player 2 is

max_{0≤k≤N} f(x_k, y_k) = F(x, y),  (5.2.1)

where f(x, y) is a continuous function of x, y. The payoff to Player 1 is −F (the game is zero-sum).
We assume that this game is a perfect-information game, i.e. at each moment (at each step) the players know the positions x_k, y_k and the time instant k + 1; moreover, Player 1 is informed about the choice y_{k+1} by Player 2.
Strategies for Player 1 are all possible functions u(x, y, t) such that u(x_{k−1}, y_k, k) ∈ U_{x_{k−1}}. Strategies for Player 2 are all possible functions v(x, y, t) such that v(x_{k−1}, y_{k−1}, k) ∈ V_{y_{k−1}}. These strategies are called pure strategies (as distinct from mixed strategies).
Suppose that players 1 and 2 use pure strategies u(x, y, t), v(x, y, t). In the situation (u(·), v(·)) the game proceeds as follows. At the first step Player 2 passes from the state y₀ to the state y₁ = v(x₀, y₀, 1), while Player 1 passes from the state x₀ to the state x₁ = u(x₀, y₁, 1) = u(x₀, v(x₀, y₀, 1), 1) (because Player 1 is informed about the choice by Player 2). At the 2nd step the players pass to the states y₂ = v(x₁, y₁, 2), x₂ = u(x₁, y₂, 2) = u(x₁, v(x₁, y₁, 2), 2), and so on. At the k-th step players 1 and 2 pass from the states x_{k−1}, y_{k−1} to the states y_k = v(x_{k−1}, y_{k−1}, k), x_k = u(x_{k−1}, y_k, k) = u(x_{k−1}, v(x_{k−1}, y_{k−1}, k), k). Thus to each situation (u(·), v(·)) there uniquely correspond the paths of the players 1 and 2, x = (x₀, x₁, …, x_k, …, x_N) and y = (y₀, y₁, …, y_k, …, y_N); hence the payoff K(u(·), v(·)) = F(x, y) is determined by (5.2.1).
This game depends on two parameters: the initial positions x₀, y₀ and the duration N. For this reason, we denote the game by Γ(x₀, y₀, N). For the purposes of further
discussion it is convenient to assign each game Γ(x₀, y₀, N) to the family of games Γ(x, y, T) depending on the parameters x, y, T.
5.2.2. The following result is a generalization of Theorem 4.2.1.
Theorem. The game Γ(x₀, y₀, N) has an equilibrium in pure strategies, and the value of the game V(x₀, y₀, N) satisfies the relationship

V(x₀, y₀, k) = max{f(x₀, y₀), max_{y∈V_{y₀}} min_{x∈U_{x₀}} V(x, y, k − 1)},  k = 1, …, N;
V(x, y, 0) = f(x, y).  (5.2.2)
Proof is carried out by induction on the number of steps. Let N = 1. Define the strategies u*(·), v*(·) for the players in the game Γ(x₀, y₀, 1) in the following way:

min_{x∈U_{x₀}} f(x, y) = f(u*(x₀, y, 1), y),  y ∈ V_{y₀}.

If max_{y∈V_{y₀}} min_{x∈U_{x₀}} f(x, y) = f(u*(x₀, y*, 1), y*), then v*(x₀, y₀, 1) = y*. Then

K(u*(·), v*(·)) = max{f(x₀, y₀), max_{y∈V_{y₀}} min_{x∈U_{x₀}} f(x, y)}

and for any strategies u(·), v(·) of the players in the game Γ(x₀, y₀, 1)

K(u*(·), v(·)) ≤ K(u*(·), v*(·)) ≤ K(u(·), v*(·)).

In view of this, the assertion of the Theorem holds for N ≤ 1.
We now assume that the assertion of the Theorem holds for N ≤ n. We shall prove it for N = n + 1, i.e. for the game Γ(x₀, y₀, n + 1). Let us consider the family of games Γ(x, y, n), x ∈ U_{x₀}, y ∈ V_{y₀}. Denote by u*_{x,y}(·), v*_{x,y}(·) an equilibrium in the game Γ(x, y, n). Then K(u*_{x,y}(·), v*_{x,y}(·)) = V(x, y, n), where V(x, y, n) is determined by relationships (5.2.2). Using the continuity of the function f(x, y) and Lemma 5.2.1, we may prove the continuity of the function V(x, y, n) in x, y. We define strategies ū^{n+1}(·), v̄^{n+1}(·) for the players in the game Γ(x₀, y₀, n + 1) as follows:

min_{x∈U_{x₀}} V(x, y, n) = V(ū^{n+1}(x₀, y, 1), y, n),  y ∈ V_{y₀}.

If max_{y∈V_{y₀}} min_{x∈U_{x₀}} V(x, y, n) = V(ū^{n+1}(x₀, ȳ, 1), ȳ, n), then v̄^{n+1}(x₀, y₀, 1) = ȳ (for x ≠ x₀, y ≠ y₀ the functions ū^{n+1}(x, y, 1) and v̄^{n+1}(x, y, 1) can be defined in an arbitrary way);

ū^{n+1}(·, k) = u*_{x₁,y₁}(·, k − 1),  k = 2, …, n + 1,
v̄^{n+1}(·, k) = v*_{x₁,y₁}(·, k − 1),  k = 2, …, n + 1.

Here x₁ ∈ U_{x₀}, y₁ ∈ V_{y₀} are the positions realized after the 1st step in the game Γ(x₀, y₀, n + 1). By construction,

K(ū^{n+1}(·), v̄^{n+1}(·)) = max{f(x₀, y₀), max_{y∈V_{y₀}} min_{x∈U_{x₀}} V(x, y, n)}.  (5.2.3)
Let us fix an arbitrary strategy u(·) for Player 1 in the game Γ(x₀, y₀, n + 1). Let u(x₀, ȳ, 1) = x̄₁, where ȳ = v̄^{n+1}(x₀, y₀, 1), and let ũ(·) be the truncation of the strategy u(·) to the game Γ(x̄₁, ȳ, n).
The following relationships are valid:

K(ū^{n+1}(·), v̄^{n+1}(·)) ≤ max{f(x₀, y₀), V(x̄₁, ȳ, n)}
= max{f(x₀, y₀), K(u*_{x̄₁,ȳ}(·), v*_{x̄₁,ȳ}(·))}
≤ max{f(x₀, y₀), K(ũ(·), v*_{x̄₁,ȳ}(·))} = K(u(·), v̄^{n+1}(·)).  (5.2.4)

In the same manner the inequality

K(ū^{n+1}(·), v̄^{n+1}(·)) ≥ K(ū^{n+1}(·), v(·))  (5.2.5)

can be proved for any strategy v(·) of Player 2 in the game Γ(x₀, y₀, n + 1). From relationships (5.2.3)–(5.2.5) it follows that the assertion of the Theorem holds for N = n + 1. This completes the proof of the theorem by induction.
We shall now consider the game Γ̄(x₀, y₀, N) which differs from the game Γ(x₀, y₀, N) in that the information about his choice is provided by Player 1. Now, at the step k in the game Γ̄(x₀, y₀, N) Player 2 knows not only the states x_{k−1}, y_{k−1} and the step k, but also the state x_k ∈ U_{x_{k−1}} chosen by Player 1. Similarly to the Theorem in 5.2.2, we may show that the game Γ̄(x₀, y₀, N) has an equilibrium in pure strategies and the game value V̄(x₀, y₀, N) satisfies the equation

V̄(x₀, y₀, k) = max{f(x₀, y₀), min_{x∈U_{x₀}} max_{y∈V_{y₀}} V̄(x, y, k − 1)},
k = 1, …, N,  V̄(x, y, 0) = f(x, y).  (5.2.6)
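The recursions (5.2.2) and (5.2.6) translate directly into a backward computation once U_x and V_y are replaced by finite samples. The following sketch (sample sizes, radii and the payoff f are illustrative assumptions, not data from the text) evaluates V(x, y, k) of (5.2.2) this way; swapping the order of max and min yields the analogue for (5.2.6):

import math
from functools import lru_cache

ALPHA, BETA = 1.0, 0.6

def samples(center, radius, n=12):        # finite stand-in for a disc
    cx, cy = center
    return tuple((cx + radius * math.cos(2 * math.pi * k / n),
                  cy + radius * math.sin(2 * math.pi * k / n))
                 for k in range(n)) + ((cx, cy),)

f = lambda x, y: math.hypot(x[0] - y[0], x[1] - y[1])

@lru_cache(maxsize=None)
def V(x, y, k):
    if k == 0:
        return f(x, y)                    # V(x, y, 0) = f(x, y)
    inner = max(min(V(xp, yp, k - 1) for xp in samples(x, ALPHA))
                for yp in samples(y, BETA))
    return max(f(x, y), inner)            # recursion (5.2.2)

print(V((0.0, 0.0), (3.0, 0.0), 2))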


5.2.3. Let us consider the games Γ′(x₀, y₀, N) and Γ̄′(x₀, y₀, N) that are distinguished from the games Γ(x₀, y₀, N) and Γ̄(x₀, y₀, N), respectively, in that the payoff function is equal to the distance between Player 1 and Player 2 at the final step of the game, i.e. ρ(x_N, y_N). Then the assertion of Theorem 5.2.2 and its corollary remain valid and, instead of relations (5.2.2), (5.2.6), the following equations hold:

V′(x, y, k) = max_{y′∈V_y} min_{x′∈U_x} V′(x′, y′, k − 1),  k = 1, …, N,
V′(x, y, 0) = ρ(x, y);  (5.2.7)

V̄′(x, y, k) = min_{x′∈U_x} max_{y′∈V_y} V̄′(x′, y′, k − 1),  k = 1, …, N,
V̄′(x, y, 0) = ρ(x, y).  (5.2.8)


Example 3. Let us consider a discrete game of pursuit in which the sets U_x are the circles of radius α centered at the point x, while the sets V_y are the circles of radius β centered at the point y (α > β). This corresponds to the game in which Player 2 (Evader) moves in a plane with a speed not exceeding β, while Player 1 (Pursuer) moves with a speed not exceeding α. The Pursuer has a speed advantage,
and the second move is made by Player 1. A game of this type is called a discrete game of "simple pursuit" with discrimination against the Evader. The duration of the game is N steps, and the payoff to Player 2 is equal to the distance between the players at the final step.
We shall find the value of the game and the optimal strategies of the players by using the functional equation (5.2.7).
We have

V′(x, y, 1) = max_{y′∈V_y} min_{x′∈U_x} ρ(x′, y′).  (5.2.9)

Since U_x and V_y are the circles of radii α and β with centers at x and y, we have that if U_x ⊃ V_y, then V′(x, y, 1) = 0; if, however, U_x ⊅ V_y, then V′(x, y, 1) = ρ(x, y) + β − α = ρ(x, y) − (α − β) (see Example 8 in 2.2.6). Thus,

V′(x, y, 1) = 0, if U_x ⊃ V_y, i.e. ρ(x, y) − (α − β) ≤ 0,
V′(x, y, 1) = ρ(x, y) − (α − β), if U_x ⊅ V_y;

in other words,

V′(x, y, 1) = max[0, ρ(x, y) − (α − β)].  (5.2.10)

Using induction on the number of steps k we shall prove that the following formula holds:

V′(x, y, k) = max[0, ρ(x, y) − k(α − β)],  k ≥ 2.  (5.2.11)
Suppose (5.2.11) holds for k = m − 1. Let us show that this formula holds for k = m. Using equation (5.2.7) and relationships (5.2.9), (5.2.10) we obtain

V′(x, y, m) = max_{y′∈V_y} min_{x′∈U_x} V′(x′, y′, m − 1)
= max_{y′∈V_y} min_{x′∈U_x} {max[0, ρ(x′, y′) − (m − 1)(α − β)]}
= max[0, max_{y′∈V_y} min_{x′∈U_x} {ρ(x′, y′)} − (m − 1)(α − β)]
= max[0, max{0, ρ(x, y) − (α − β)} − (m − 1)(α − β)] = max[0, ρ(x, y) − m(α − β)],

which is what we set out to prove.
If V′(x₀, y₀, m) = ρ(x₀, y₀) − m(α − β), i.e. ρ(x₀, y₀) − m(α − β) > 0, then the optimal strategy dictates Player 2 to choose at the k-th step of the game the point y_k of intersection of the line of centers x_{k−1}, y_{k−1} with the boundary of V_{y_{k−1}} that is the farthest from x_{k−1}. Here x_{k−1}, y_{k−1} are the players' positions after the (k−1)-th step, k = 1, …, N. The optimal strategy for Player 1 dictates him to choose at the k-th step of the game the point from the set U_{x_{k−1}} that is the nearest to the point y_k. If both players act optimally, then the sequence of the chosen points x₀, x₁, …, x_N, y₀, y₁, …, y_N lies along the straight line passing through x₀, y₀. If V′(x₀, y₀, m) = 0, then an optimal strategy for Player 2 is arbitrary, while the optimal strategy for Player 1 remains unaffected. In this case, after some step k the equality max_{y∈V_{y_k}} min_{x∈U_{x_k}} ρ(x, y) = 0 is satisfied; therefore, starting with the (k+1)-th step, the choices by Player 1 will repeat the choices by Player 2.
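The optimal behavior just described is easy to simulate. In the sketch below (the initial data and the number of steps are assumptions of the example) both players follow these rules, and the resulting final distance is compared with the closed form max[0, ρ(x₀, y₀) − N(α − β)] of (5.2.11):

import math

alpha, beta, N = 1.0, 0.6, 4
x, y = (0.0, 0.0), (5.0, 0.0)
rho0 = math.hypot(x[0] - y[0], x[1] - y[1])

for _ in range(N):
    d = math.hypot(y[0] - x[0], y[1] - x[1])
    e = ((y[0] - x[0]) / d, (y[1] - x[1]) / d)    # unit vector from x to y
    y = (y[0] + beta * e[0], y[1] + beta * e[1])  # E flees along the line
    d = math.hypot(y[0] - x[0], y[1] - x[1])
    step = min(alpha, d)                          # P cannot overshoot y_k
    x = (x[0] + step * e[0], x[1] + step * e[1])  # P pursues along it

print(math.hypot(y[0] - x[0], y[1] - x[1]),
      max(0.0, rho0 - N * (alpha - beta)))        # the two numbers agree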
5.3 Existence of ε-equilibria in differential games with prescribed duration
5.3.1. In this section we shall prove the existence of piecewise open-loop strategy ε-equilibria in differential games with prescribed duration as defined in 5.1.6. Let us discuss the case where the payoff to Player E is the distance ρ(x(T), y(T)) at the last instant T of the game.
Let the dynamics of the game be given by the following differential equations:

for P:  ẋ = f(x, u);  (5.3.1)
for E:  ẏ = g(y, v).  (5.3.2)

Here x, y ∈ Rⁿ, u ∈ U, v ∈ V, where U, V are compact sets in the Euclidean spaces R^k and R^l, respectively, and t ∈ [0, ∞). Suppose the requirements in 5.1.6 are all satisfied.
satisfied.
Definition. Denote by C_P^t(x₀) the set of points x ∈ Rⁿ for which there is a measurable open-loop control u(t) ∈ U sending the point x₀ to x in time t, i.e. x(t₀) = x₀, x(t₀ + t) = x. The set C_P^t(x₀) is called the reachability set for Player P from the initial state x₀ in time t.
In this manner we may also define the reachability set C_E^t(y₀) for Player E from the initial state y₀ in time t.
We assume that the functions f, g are such that the reachability sets C_P^t(x₀), C_E^t(y₀) for players P and E, respectively, satisfy the following conditions:

1. C_P^t(x₀), C_E^t(y₀) are defined for any x₀, y₀ ∈ Rⁿ, t₀, t ∈ [0, ∞) (t₀ ≤ t) and are compact sets of the space Rⁿ;

2. the point-to-set map C_P^t(x₀) is continuous in all its variables in the Hausdorff metric, i.e. for every ε > 0, x′₀ ∈ Rⁿ, t ∈ [0, ∞) there is δ > 0 such that if |t − t′| < δ, ρ(x₀, x′₀) < δ, then ρ*(C_P^t(x₀), C_P^{t′}(x′₀)) < ε. This also applies to C_E^t(y₀).

Recall that the Hausdorff metric ρ* in the space of compact subsets of Rⁿ is given as follows:

ρ*(A, B) = max(ρ′(A, B), ρ′(B, A)),  ρ′(A, B) = max_{a∈A} ρ(a, B),

and ρ(a, B) = min_{b∈B} ρ(a, b), where ρ is a standard metric in Rⁿ.
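For finite point sets (finite samples standing in for compact sets) the above formulas for ρ*, ρ′ and ρ(a, B) can be transcribed literally, as in this sketch:

import math

def rho(a, b):                            # standard metric in the plane
    return math.hypot(a[0] - b[0], a[1] - b[1])

def rho_point_set(a, B):                  # rho(a, B) = min over b of rho(a, b)
    return min(rho(a, b) for b in B)

def rho_directed(A, B):                   # rho'(A, B) = max over a of rho(a, B)
    return max(rho_point_set(a, B) for a in A)

def hausdorff(A, B):                      # rho*(A, B), symmetrized
    return max(rho_directed(A, B), rho_directed(B, A))

A = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
B = [(0.1, 0.0), (1.0, 0.2)]
print(hausdorff(A, B))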


We shall prove the existence theorem for the game of pursuit Γ(x₀, y₀, T) with prescribed duration, where x₀, y₀ ∈ Rⁿ are the initial positions of players P and E, respectively, and T is the duration of the game. The game Γ(x₀, y₀, T) proceeds as follows. The players P and E at the time t₀ = 0 start their motions from the positions x₀, y₀ in accordance with the chosen piecewise open-loop strategies. The game ends at the time t = T, and Player E receives from Player P an amount ρ(x(T), y(T)) (see 5.1.8). At each time instant t ∈ [0, T] of the game Γ(x₀, y₀, T) each player knows the instant of time t, his own position and the position of his opponent. Denote by
P(x₀, t₀, t) (E(y₀, t₀, t)) the set of trajectories of system (5.3.1) ((5.3.2)) emanating from the point x₀ (y₀) and defined on the interval [t₀, t].
5.3.2. Let us fix some natural number n ≥ 1. We set δ = T/2ⁿ and introduce the games Γᵢ^δ(x₀, y₀, T), i = 1, 2, 3, that are auxiliary with respect to the game Γ(x₀, y₀, T).
The game Γ₁^δ(x₀, y₀, T) proceeds as follows. At the 1st step Player E in the position y₀ chooses y₁ from the set C_E^δ(y₀). At this step Player P knows the choice y₁ of Player E and, in the position x₀, chooses the point x₁ ∈ C_P^δ(x₀). At the k-th step, k = 2, 3, …, 2ⁿ, Player E knows Player P's position x_{k−1} ∈ C_P^δ(x_{k−2}) and his own position y_{k−1} ∈ C_E^δ(y_{k−2}) and chooses the point y_k ∈ C_E^δ(y_{k−1}). Player P knows x_{k−1}, y_{k−1}, y_k and chooses x_k ∈ C_P^δ(x_{k−1}). At the 2ⁿ-th step the game ends and Player E receives an amount ρ(x(T), y(T)), where x(T) = x_{2ⁿ}, y(T) = y_{2ⁿ}.
Note that the players' choices at the k-th step of the points x_k, y_k from the reachability sets C_P^δ(x_{k−1}), C_E^δ(y_{k−1}) can be interpreted as their choices of the corresponding trajectories from the sets P(x_{k−1}, (k−1)δ, kδ), E(y_{k−1}, (k−1)δ, kδ) terminating at the points x_k, y_k at the time instant t = kδ (or as the choice of controls u(·), v(·) on [(k−1)δ, kδ] to which these trajectories correspond according to (5.3.1), (5.3.2)).
The game Γ₂^δ(x₀, y₀, T) differs from the game Γ₁^δ(x₀, y₀, T) in that at the k-th step Player P chooses x_k ∈ C_P^δ(x_{k−1}) with a knowledge of x_{k−1}, y_{k−1}, while Player E, with an additional knowledge of x_k, chooses y_k ∈ C_E^δ(y_{k−1}).
The game Γ₃^δ(x₀, y₀, T) differs from the game Γ₁^δ(x₀, y₀, T) in that at the 2ⁿ-th step Player P chooses x_{2ⁿ} ∈ C_P^δ(x_{2ⁿ−1}), then the game ends and Player E receives an amount ρ(x(T), y(T − δ)), where x(T) = x_{2ⁿ}, y(T − δ) = y_{2ⁿ−1}.
5.3.3. Lemma. The games Γᵢ^δ(x₀, y₀, T), i = 1, 2, 3, have an equilibrium for all x₀, y₀, T < ∞, and the value of the game Val Γᵢ^δ(x₀, y₀, T) is a continuous function of x₀, y₀, T. For any n ≥ 0 there is

Val Γ₁^δ(x₀, y₀, T) ≤ Val Γ₂^δ(x₀, y₀, T),  δ = T/2ⁿ.  (5.3.3)

Proof. The games Γᵢ^δ(x₀, y₀, T), i = 1, 2, 3, belong to the class of multistage games defined in Sec. 5.2. The existence of an equilibrium in the games Γᵢ^δ(x₀, y₀, T) and the continuity of the functions Val Γᵢ^δ(x₀, y₀, T) in x₀, y₀ immediately follow from the Theorem in 5.2.2 and its corollary. The following recursion equations hold for the values of the games Γᵢ^δ(x₀, y₀, T), i = 1, 2:

Val Γ₁^δ(x₀, y₀, T) = max_{y∈C_E^δ(y₀)} min_{x∈C_P^δ(x₀)} Val Γ₁^δ(x, y, T − δ),
Val Γ₂^δ(x₀, y₀, T) = min_{x∈C_P^δ(x₀)} max_{y∈C_E^δ(y₀)} Val Γ₂^δ(x, y, T − δ),

with the initial condition Val Γ₁^δ(x, y, 0) = Val Γ₂^δ(x, y, 0) = ρ(x, y). Sequential application of Lemma 1.2.2 shows the validity of inequality (5.3.3).
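The recursion equations above can be evaluated directly when the reachability sets are replaced by finite samples. The sketch below assumes simple motion with speeds α > β (so that C^δ is approximated by sampled circles of radii αδ, βδ; all data are assumptions of the illustration) and checks inequality (5.3.3) on a coarse grid:

import math

ALPHA, BETA = 1.0, 0.6

def reach(center, radius, n=10):      # sampled reachability set
    cx, cy = center
    return [(cx + radius * math.cos(2 * math.pi * k / n),
             cy + radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

def rho(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def val(x, y, steps, delta, e_moves_first):
    if steps == 0:
        return rho(x, y)
    CP, CE = reach(x, ALPHA * delta), reach(y, BETA * delta)
    if e_moves_first:                 # Gamma_1: value = max min
        return max(min(val(xp, yp, steps - 1, delta, True) for xp in CP)
                   for yp in CE)
    return min(max(val(xp, yp, steps - 1, delta, False) for yp in CE)
               for xp in CP)          # Gamma_2: value = min max

T, n = 2.0, 1                         # 2**n steps of length delta
delta, steps = T / 2 ** n, 2 ** n
x0, y0 = (0.0, 0.0), (4.0, 0.0)
v1 = val(x0, y0, steps, delta, True)
v2 = val(x0, y0, steps, delta, False)
print(v1, v2, v1 <= v2)               # inequality (5.3.3)

As the partitions are refined, the two quantities bracket the value of the continuous game ever more tightly, which is exactly the content of the next two results.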
5.3.4. Lemma. For any integer n ≥ 0 the following inequalities hold:

Val Γ₁^{δ_n}(x₀, y₀, T) ≤ Val Γ₁^{δ_{n+1}}(x₀, y₀, T),
Val Γ₂^{δ_n}(x₀, y₀, T) ≥ Val Γ₂^{δ_{n+1}}(x₀, y₀, T),

where δ_k = T/2^k.
Proof. Let us show the validity of the first of these inequalities; the second one can be proved in a similar manner. To avoid cumbersome notation, let Cⁿ(yᵢ) = C_E^{δ_n}(yᵢ), Cⁿ(xᵢ) = C_P^{δ_n}(xᵢ), i = 0, 1, …, 2ⁿ − 1. We have

Val Γ₁^{δ_{n+1}}(x₀, y₀, T)
= max_{y₁∈C^{n+1}(y₀)} min_{x₁∈C^{n+1}(x₀)} max_{y₂∈C^{n+1}(y₁)} min_{x₂∈C^{n+1}(x₁)} Val Γ₁^{δ_{n+1}}(x₂, y₂, T − 2δ_{n+1})
≥ max_{y₁∈C^{n+1}(y₀)} max_{y₂∈C^{n+1}(y₁)} min_{x₁∈C^{n+1}(x₀)} min_{x₂∈C^{n+1}(x₁)} Val Γ₁^{δ_{n+1}}(x₂, y₂, T − 2δ_{n+1})
= max_{y₂∈Cⁿ(y₀)} min_{x₂∈Cⁿ(x₀)} Val Γ₁^{δ_{n+1}}(x₂, y₂, T − δ_n),

since 2δ_{n+1} = δ_n and two successive reachability steps of length δ_{n+1} compose to one step of length δ_n. Continuation of this process yields

Val Γ₁^{δ_{n+1}}(x₀, y₀, T) ≥ max_{y₁∈Cⁿ(y₀)} min_{x₁∈Cⁿ(x₀)} … max_{y_{2ⁿ}∈Cⁿ(y_{2ⁿ−1})} min_{x_{2ⁿ}∈Cⁿ(x_{2ⁿ−1})} ρ(x_{2ⁿ}, y_{2ⁿ}) = Val Γ₁^{δ_n}(x₀, y₀, T).

5.3.5. Theorem. For all x₀, y₀ ∈ Rⁿ, T < ∞ there is the limit equality

lim_{n→∞} Val Γ₁^{δ_n}(x₀, y₀, T) = lim_{n→∞} Val Γ₂^{δ_n}(x₀, y₀, T),

where δ_n = T/2ⁿ.
Proof. Let us fix some n > 0. Let u(·), v(·) be a pair of strategies in the game Γ₂^{δ_n}(x₀, y₀, T). This pair remains the same in the game Γ₃^{δ_n}(x₀, y₀, T). Suppose that the sequence x₀, x₁, …, x_{2ⁿ}, y₀, y₁, …, y_{2ⁿ} is realized in the situation u(·), v(·). Denote the payoff functions in the games Γ₂^{δ_n}(x₀, y₀, T), Γ₃^{δ_n}(x₀, y₀, T) by K₂(u(·), v(·)) = ρ(x_{2ⁿ}, y_{2ⁿ}), K₃(u(·), v(·)) = ρ(x_{2ⁿ}, y_{2ⁿ−1}), respectively. Then

K₂(u(·), v(·)) ≤ K₃(u(·), v(·)) + ρ(y_{2ⁿ−1}, y_{2ⁿ}).

Hence, by the arbitrariness of u(·), v(·), we have:

Val Γ₂^{δ_n}(x₀, y₀, T) ≤ Val Γ₃^{δ_n}(x₀, y₀, T) + max_{y∈C_E^{T−δ_n}(y₀)} max_{y′∈C_E^{δ_n}(y)} ρ(y, y′).  (5.3.4)

Let y₁^{δ_n} ∈ C_E^{δ_n}(y₀); then C_E^{T−δ_n}(y₁^{δ_n}) ⊂ C_E^T(y₀). We now write inequality (5.3.4) for the games with the initial states x₀, y₁^{δ_n}. In view of the previous inclusion, we have

Val Γ₂^{δ_n}(x₀, y₁^{δ_n}, T) ≤ Val Γ₃^{δ_n}(x₀, y₁^{δ_n}, T) + max_{y∈C_E^T(y₀)} max_{y′∈C_E^{δ_n}(y)} ρ(y, y′).  (5.3.5)

From the definition of the games Γ₁^{δ_n}(x₀, y₀, T) and Γ₃^{δ_n}(x₀, y₀, T) follows the equality

Val Γ₁^{δ_n}(x₀, y₀, T) = max_{y₁∈C_E^{δ_n}(y₀)} Val Γ₃^{δ_n}(x₀, y₁, T).  (5.3.6)
Since the function C_E^t(y) is continuous in t and the condition C_E^0(y) = y is satisfied, the second term on the right-hand side of (5.3.5) tends to zero as n → ∞. Denote it by ε₁(n). From (5.3.5), (5.3.6) we obtain

Val Γ₁^{δ_n}(x₀, y₀, T) ≥ Val Γ₂^{δ_n}(x₀, y₁^{δ_n}, T) − ε₁(n).  (5.3.7)

By the continuity of the function Val Γ₂^{δ_n}(x₀, y₀, T) in the initial positions, from (5.3.7) we obtain

Val Γ₁^{δ_n}(x₀, y₀, T) ≥ Val Γ₂^{δ_n}(x₀, y₀, T) − ε₁(n) − ε₂(n),  (5.3.8)

where ε₂(n) → 0 as n → ∞. Passing in (5.3.8) to the limit as n → ∞ (which is possible in terms of the Lemmas in 5.3.3, 5.3.4 and the limit existence theorem for a monotone bounded sequence) we obtain

lim_{n→∞} Val Γ₁^{δ_n}(x₀, y₀, T) ≥ lim_{n→∞} Val Γ₂^{δ_n}(x₀, y₀, T).  (5.3.9)

From Lemma 5.3.3 the inverse inequality follows. Hence both limits in (5.3.9) coincide.
5.3.6. The statement of Theorem 5.3.5 was proved on the assumption that the partition sequence of the interval [0, T],

σ_n = {t₀ = 0 < t₁ < … < t_N = T},  n = 1, 2, …,

satisfies the condition t_{j+1} − t_j = T/2ⁿ, j = 0, 1, …, 2ⁿ − 1. The statements of Theorem 5.3.5 and Lemmas 5.3.3, 5.3.4 hold for any sequence σ_n of refined partitions of the interval [0, T], i.e. such that σ_{n+1} ⊃ σ_n (this means that the partition σ_{n+1} is obtained from σ_n by adding new points) and

δ(σ_n) = max_j (t_{j+1} − t_j) →_{n→∞} 0.

We shall now consider any two such partition sequences {σ_n} and {σ′_n} of the interval [0, T].
Lemma. The following equality holds:

lim_{n→∞} Val Γ₁^{σ_n}(x₀, y₀, T) = lim_{n→∞} Val Γ₁^{σ′_n}(x₀, y₀, T),

where x₀, y₀ ∈ Rⁿ, T < ∞.


Proof is carried out by reductio ad absurdum. Suppose the statement of the lemma is not true. For definiteness assume that the following inequality is satisfied:

lim_{n→∞} Val Γ₁^{σ_n}(x₀, y₀, T) > lim_{n→∞} Val Γ₁^{σ′_n}(x₀, y₀, T).

Then, by Theorem 5.3.5, we have

lim_{n→∞} Val Γ₁^{σ_n}(x₀, y₀, T) > lim_{n→∞} Val Γ₂^{σ′_n}(x₀, y₀, T).

Hence we may find natural numbers m₁, n₁ such that the following inequality is satisfied:

Val Γ₁^{σ_{m₁}}(x₀, y₀, T) > Val Γ₂^{σ′_{n₁}}(x₀, y₀, T).
Denote by σ̄ the partition of the interval [0, T] formed by the points of both the partitions σ_{m₁} and σ′_{n₁}. For this partition

Val Γ₁^{σ_{m₁}}(x₀, y₀, T) ≤ Val Γ₁^{σ̄}(x₀, y₀, T) ≤ Val Γ₂^{σ̄}(x₀, y₀, T) ≤ Val Γ₂^{σ′_{n₁}}(x₀, y₀, T),

whence

Val Γ₁^{σ_{m₁}}(x₀, y₀, T) ≤ Val Γ₂^{σ′_{n₁}}(x₀, y₀, T).

This contradicts the strict inequality established above; hence the assumption is not true and the statement of the lemma holds.
5.3.7. Theorem. For all x₀, y₀, T < ∞ in the game Γ(x₀, y₀, T) there exists an ε-equilibrium for any ε > 0. In this case

Val Γ(x₀, y₀, T) = lim_{n→∞} Val Γ₁^{σ_n}(x₀, y₀, T),  (5.3.10)

where {σ_n} is any sequence of refined partitions of the interval [0, T].
Proof. Let us specify an arbitrarily chosen number ε > 0 and show that for the players P and E there are respective strategies u_ε(·) and v_ε(·) such that for all strategies u(·) ∈ P and v(·) ∈ E the following inequalities hold:

K(x₀, y₀; u_ε(·), v(·)) − ε ≤ K(x₀, y₀; u_ε(·), v_ε(·)) ≤ K(x₀, y₀; u(·), v_ε(·)) + ε.  (5.3.11)

By Theorem 5.3.5, there is a partition σ of the interval [0, T] such that

Val Γ₂^σ(x₀, y₀, T) − lim_{n→∞} Val Γ₁^{σ_n}(x₀, y₀, T) < ε/2,
lim_{n→∞} Val Γ₁^{σ_n}(x₀, y₀, T) − Val Γ₁^σ(x₀, y₀, T) < ε/2.

Let u_ε(·) = (σ, a_ε), v_ε(·) = (σ, b_ε), where a_ε, b_ε are the optimal strategies for the players P and E, respectively, in the games Γ₂^σ(x₀, y₀, T) and Γ₁^σ(x₀, y₀, T).
Then the following relationships are valid:

K(x₀, y₀; u_ε(·), v(·)) ≤ Val Γ₂^σ(x₀, y₀, T)
≤ lim_{n→∞} Val Γ₁^{σ_n}(x₀, y₀, T) + ε/2,  v(·) ∈ E,  (5.3.12)

K(x₀, y₀; u(·), v_ε(·)) ≥ Val Γ₁^σ(x₀, y₀, T)
≥ lim_{n→∞} Val Γ₁^{σ_n}(x₀, y₀, T) − ε/2,  u(·) ∈ P.  (5.3.13)

From (5.3.12), (5.3.13) and Theorem 5.3.5 we have

−ε/2 ≤ K(x₀, y₀; u_ε(·), v_ε(·)) − lim_{n→∞} Val Γ₁^{σ_n}(x₀, y₀, T) ≤ ε/2.  (5.3.14)

From relationships (5.3.12)–(5.3.14) follows (5.3.11). By the arbitrariness of ε, from (5.3.14) follows (5.3.10). This completes the proof of the theorem.
5.3.8. Remark. The specific type of the payoff was not used in the proof of the existence theorem; only the continuous dependence of the payoff on the realized trajectories is essential. Therefore, Theorem 5.3.7 holds if any continuous functional of the trajectories x(t), y(t) is considered in place of ρ(x(T), y(T)). In particular, such a functional can be min_{0≤t≤T} ρ(x(t), y(t)), i.e. the minimum distance between the players during the game. Thus the result of this section also holds for the differential minimum-result game of pursuit with prescribed duration.
5.4 Differential time-optimal games of pursuit
5.4.1. Differential time-optimal games of pursuit are a special case of differential games with integral payoff as defined in 5.1.8. The classes of strategies P and E are the same as in the game with prescribed duration. We assume that the set F = {(x, y) : ρ(x, y) ≤ l}, l > 0, is given in Rⁿ × Rⁿ, and x(t), y(t) are the trajectories of the players P and E in the situation (u(·), v(·)) from the initial conditions x₀, y₀. Denote

t_n(x₀, y₀; u(·), v(·)) = min{t : (x(t), y(t)) ∈ F}.  (5.4.1)

If there is no t such that (x(t), y(t)) ∈ F, then t_n(x₀, y₀; u(·), v(·)) is +∞. In the differential time-optimal game of pursuit the payoff to Player E is

K(x₀, y₀; u(·), v(·)) = t_n(x₀, y₀; u(·), v(·)).  (5.4.2)

The game depends on the initial conditions x₀, y₀; therefore it is denoted by Γ(x₀, y₀). From the definition of the payoff function (5.4.2) it follows that the objective of Player E in the game Γ(x₀, y₀) is to maximize the time of approaching Player P within a given distance l > 0. Conversely, Player P wishes to minimize this time.
5.4.2. There is a close relation between the time-optimal game of pursuit Γ(x₀, y₀) and the minimum-result game of pursuit with prescribed duration. Let Γ(x₀, y₀, T) be the game of pursuit with prescribed duration T for achievement of a minimum result (the payoff to Player E is min_{0≤t≤T} ρ(x(t), y(t))). It was shown that for the games of this type there is an ε-equilibrium in the class of piecewise open-loop strategies for any ε > 0 (see 5.3.8). Let V(x₀, y₀, T) be the value of the game Γ(x₀, y₀, T) and V(x₀, y₀) be the value of the game Γ(x₀, y₀) if it exists.
Lemma. With x₀, y₀ fixed, the function V(x₀, y₀, T) is continuous and does not increase in T on the interval [0, ∞).
Proof. Let T₁ > T₂ > 0. Denote by v_ε^{T₁} a strategy for Player E in the game Γ(x₀, y₀, T₁) which guarantees that the distance between Player E and Player P on the interval [0, T₁] will be at least max[0, V(x₀, y₀, T₁) − ε]. Hence it also ensures the distance max[0, V(x₀, y₀, T₁) − ε] between the players on the interval [0, T₂], where T₂ < T₁. Therefore

V(x₀, y₀, T₂) ≥ max[0, V(x₀, y₀, T₁) − ε]  (5.4.3)

(the strategy ε-optimal in the game Γ(x₀, y₀, T₁) is not necessarily ε-optimal in the game Γ(x₀, y₀, T₂)). Since ε can be chosen arbitrarily, the statement of this Lemma follows from (5.4.3). The continuity of V(x₀, y₀, T) in T will be left without proof. To be noted only is that this property can be obtained by using the continuity of V(x₀, y₀, T) in x₀, y₀.
5.4.3. Let us consider the equation

V(x₀, y₀, T) = l  (5.4.4)

with respect to T. Three cases are possible here:

1) equation (5.4.4) has no roots;
2) it has a single root;
3) it has more than one root.
In case 3), the monotonicity and the continuity of the function V(x₀, y₀, T) in T imply that equation (5.4.4) has a whole segment of roots: the function V(x₀, y₀, T), as a function of T, has an interval of constancy. Let us consider each case individually.
Case 1. In this case the following is possible: a) V(x₀, y₀, T) < l for all T ≥ 0; b) inf_{T≥0} V(x₀, y₀, T) > l; c) inf_{T≥0} V(x₀, y₀, T) = l.
In case a) we have

V(x₀, y₀, 0) = ρ(x₀, y₀) < l,

i.e. t_n(x₀, y₀; u(·), v(·)) = 0 for all u(·), v(·). The value of the game Γ(x₀, y₀) is then V(x₀, y₀) = 0.
In case b) the following equality holds:

inf_{T>0} V(x₀, y₀, T) = lim_{T→∞} V(x₀, y₀, T) > l.

Hence for any T > 0 (arbitrarily large) Player E has a suitable strategy v_T(·) ∈ E which guarantees him l-capture avoidance on the interval [0, T]. But Player P then has no strategy which could guarantee him l-capture of Player E in finite time. However, we cannot claim that Player E has a strategy which ensures l-capture avoidance in infinite time. The problem of finding the initial states in which such a strategy exists reduces to solving the game of kind for Player E. Thus, for l < lim_{T→∞} V(x₀, y₀, T) it can be merely stated that the value of the game Γ(x₀, y₀), if any, is larger than any previously given T, i.e. it is +∞.
Case c) is considered together with case 3).
Case 2. Let T₀ be the single root of equation (5.4.4). Then it follows from the monotonicity and the continuity of the function V(x₀, y₀, T) in T that

V(x₀, y₀, T) > V(x₀, y₀, T₀) for all T < T₀,
V(x₀, y₀, T) < V(x₀, y₀, T₀) for all T > T₀,  (5.4.5)

lim_{T→T₀} V(x₀, y₀, T) = V(x₀, y₀, T₀).  (5.4.6)

Let us fix an arbitrary T > T₀ and consider the game of pursuit Γ(x₀, y₀, T). The game has an ε-equilibrium in the class of piecewise open-loop strategies for any ε > 0. This, in particular, means that for any ε > 0 there is Player P's strategy u_ε(·) ∈ P which ensures the capture of Player E within a distance V(x₀, y₀, T) + ε, i.e.

K(u_ε(·), v(·)) ≤ V(x₀, y₀, T) + ε,  v(·) ∈ E,  (5.4.7)

where K(u(·), v(·)) is the payoff function in the game Γ(x₀, y₀, T). Then (5.4.5), (5.4.6) imply the existence of ε̄ > 0 such that for any ε < ε̄ there is a number T̄(ε), T₀ < T̄(ε) < T, for which

ε = V(x₀, y₀, T₀) − V(x₀, y₀, T̄(ε)).  (5.4.8)


From (5.4.7), (5.4.8) it follows that for any ε < ε̄

K(u_ε(·), v(·)) ≤ V(x₀, y₀, T) + ε ≤ V(x₀, y₀, T̄(ε)) + ε = V(x₀, y₀, T₀) = l,  v(·) ∈ E,

i.e. the strategy u_ε(·) ensures l-capture in time T. Hence, by the arbitrariness of T > T₀, it follows that for any T > T₀ there is a corresponding strategy u_T(·) ∈ P which ensures l-capture in time T. In other words, for every δ > 0 there is u_δ(·) ∈ P such that

t_n(x₀, y₀; u_δ(·), v(·)) ≤ T₀ + δ for all v(·) ∈ E.  (5.4.9)

In a similar manner we may prove the existence of v_δ(·) ∈ E such that

t_n(x₀, y₀; u(·), v_δ(·)) ≥ T₀ − δ for all u(·) ∈ P.  (5.4.10)

It follows from (5.4.9), (5.4.10) that in the time-optimal game of pursuit Γ(x₀, y₀) for any ε > 0 there is an ε-equilibrium in piecewise open-loop strategies, and the value of the game is equal to T₀, with T₀ the single root of equation (5.4.4).
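Since V(x₀, y₀, ·) is continuous and nonincreasing, the least root T₀ of (5.4.4) can be located by bisection. The sketch below uses the closed-form value of simple pursuit, ρ₀ − T(α − β), as a stand-in for V (an assumption of the illustration, not the general construction of the text):

rho0, alpha, beta, l = 5.0, 2.0, 1.0, 0.5    # assumed data with rho0 > l

def V(T):
    return max(0.0, rho0 - T * (alpha - beta))

def least_root(lo=0.0, hi=1.0):
    while V(hi) > l:                 # grow the bracket: V(hi) <= l <= V(lo)
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if V(mid) > l:               # the root lies to the right of mid
            lo = mid
        else:                        # V(mid) <= l: keep mid as upper bound
            hi = mid
    return hi

print(least_root(), (rho0 - l) / (alpha - beta))   # both give T0 = 4.5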
Case 3. Denote by T₀ the minimal root of equation (5.4.4). Generally speaking, we cannot now state that the value of the game Val Γ(x₀, y₀) = T₀. Indeed, V(x₀, y₀, T₀) = l merely implies that in the game Γ(x₀, y₀, T₀) for any ε > 0 Player P has a strategy u_ε(·) which ensures for him, in time T₀, the capture of Player E within a distance of at most l + ε. From the existence of more than one root of equation (5.4.4) and from the monotonicity of V(x₀, y₀, T) in T we obtain the existence of an interval of constancy of the function V(x₀, y₀, T): V(x₀, y₀, T) = l for T ∈ [T₀, T₁]. Therefore, an increase in the duration of the game Γ(x₀, y₀, T₀) by δ, where δ ≤ T₁ − T₀, does not involve a decrease in the guaranteed approach to Player E, i.e. for all T ∈ [T₀, T₁] Player P can merely ensure approaching Player E within a distance l + ε (for any ε > 0), and it is beyond reason to hope for the excess ε to vanish for some T ∈ [T₀, T₁]. If the game Γ(x₀, y₀, T₀) had an equilibrium (and not merely an ε-equilibrium), then the value of the game Γ(x₀, y₀) would also be equal to T₀ in Case 3.
5.4.4. Let us modify the notion of an equilibrium in the game Γ(x₀, y₀). Further, in this section it may be convenient to use the notation Γ(x₀, y₀, l) instead of Γ(x₀, y₀), emphasizing the fact that the game Γ(x₀, y₀, l) terminates when the players come within a distance l of each other.
Let t_n^l(x₀, y₀; u(·), v(·)) be the time until coming within a distance l in the situation (u(·), v(·)), and let there be ε > 0, δ > 0.
Definition. We say that the pair of strategies u_ε^δ(·), v_ε^δ(·) constitutes an ε,δ-equilibrium in the game Γ(x₀, y₀, l) if

t_n^{l+δ}(x₀, y₀; u(·), v_ε^δ(·)) + ε ≥ t_n^{l+δ}(x₀, y₀; u_ε^δ(·), v_ε^δ(·)) ≥ t_n^{l+δ}(x₀, y₀; u_ε^δ(·), v(·)) − ε

for all strategies u(·) ∈ P, v(·) ∈ E.

Definition. Let there be a sequence {δ_k}, δ_k > 0, δ_k → 0, such that in all of the games Γ(x₀, y₀, l + δ_k) for every ε > 0 there is an ε-equilibrium. Then the limit

lim_{k→∞} V(x₀, y₀, l + δ_k) = V′(x₀, y₀, l)
is called the value of the game Γ(x₀, y₀, l) in the generalized sense.
Note that the quantity V′(x₀, y₀, l) does not depend on the choice of a sequence {δ_k} because of the monotone decrease of the function V(x₀, y₀, l) in l.
Definition. We say that the game Γ(x₀, y₀, l) has a solution in the generalized sense if there exists a sequence {δ_k}, δ_k → 0, such that for every ε > 0 and δ_k ∈ {δ_k} in the game Γ(x₀, y₀, l) there exists an ε,δ_k-equilibrium.
It can be shown that if the game Γ(x₀, y₀, l) has a solution in the generalized sense, then its value V′(x₀, y₀, l) (in the generalized sense) exists and is

lim_{ε→0, k→∞} t_n^{l+δ_k}(x₀, y₀; u_ε^{δ_k}(·), v_ε^{δ_k}(·)) = V′(x₀, y₀, l).

From the definition of the value and solution of the game Γ(x₀, y₀, l) (in the generalized sense) it follows that if in the game Γ(x₀, y₀, l) for every ε > 0 there is an ε-equilibrium in the ordinary sense (i.e. a solution in the ordinary sense), then V(x₀, y₀, l) = V′(x₀, y₀, l) (it suffices to take the sequence δ_k = 0 for all k).
Theorem. Let equation (5.4.4) have more than one root and let T₀ be the least root, T₀ < ∞. Then there exists the value V′(x₀, y₀, l) (in the generalized sense) of the time-optimal game of pursuit Γ(x₀, y₀, l), and V′(x₀, y₀, l) = T₀.
Proof. The monotonicity and continuity of the function V(x₀, y₀, T) in T imply the existence of a sequence T_k → T₀ from the left such that V(x₀, y₀, T_k) → V(x₀, y₀, T₀) = l and the function V(x₀, y₀, T) is strictly monotone at the points T_k. Let

δ_k = V(x₀, y₀, T_k) − l > 0.

The strict monotonicity of the function V(x₀, y₀, T) at the points T_k implies that the equation V(x₀, y₀, T) = l + δ_k has the single root T_k. This means that for every δ_k ∈ {δ_k} in the game Γ(x₀, y₀, l + δ_k) there is an ε-equilibrium for every ε > 0 (see Case 2 in 5.4.3). The game Γ(x₀, y₀, l) then has a solution in the generalized sense:

lim_{k→∞} V(x₀, y₀, l + δ_k) = lim_{k→∞} T_k = T₀ = V′(x₀, y₀, l).

This completes the proof of the theorem.


We shall now consider Case 1c) in 5.4.3. We have inf_T V(x₀, y₀, T) = l. Let T_k → ∞. Then lim_{k→∞} V(x₀, y₀, T_k) = l. From the monotonicity and continuity of V(x₀, y₀, T) in T it follows that the sequence {T_k} can be chosen such that the function V(x₀, y₀, T) is strictly monotone at the points T_k. Then, as in the proof of the Theorem in 5.4.4, it can be shown that there exists a sequence {δ_k} such that

lim_{k→∞} V(x₀, y₀, l + δ_k) = lim_{k→∞} T_k = T₀ = ∞.

Thus, in this case there also exists a generalized solution, while the generalized value of the game Γ(x₀, y₀, l) is infinity.
5.4.5. It is often important to find out whether Player P can guarantee /-capture
from the given initial positions x, y in finite time T. If it is impossible, then we have
to find out whether Player E can guarantee /-capture avoidance within a specified
period of time.
Let V(x, y, T) be the value of the game with prescribed duration T from the initial states x, y ∈ Rⁿ with the payoff min_{0≤t≤T} ρ(x(t), y(t)). Then the following alternatives are possible: 1) V(x, y, T) > l; 2) V(x, y, T) ≤ l.
Case 1. From the definition of the function V(x, y, T) it follows that for every ε > 0 there is a strategy v_ε*(·) for Player E such that for all strategies u(·)

K(x, y; u(·), v_ε*(·)) ≥ V(x, y, T) − ε.

Having chosen ε to be sufficiently small we may ensure that

K(x, y; u(·), v_ε*(·)) ≥ V(x, y, T) − ε > l

holds for all strategies u(·) ∈ P of Player P. From the form of the payoff function K it follows that, by employing the strategy v_ε*(·), Player E can ensure that the inequality min_{0≤t≤T} ρ(x(t), y(t)) > l is satisfied no matter what Player P does. That is, in this case Player E ensures l-capture avoidance on the interval [0, T] no matter what Player P does.
Case 2. Let T₀ be the minimal root of the equation V(x, y, T) = l with x, y fixed (if ρ(x, y) ≤ l, then T₀ is taken to be 0). From the definition of V(x, y, T₀) it then follows that in the game Γ(x, y, T₀) for every ε > 0 Player P has a strategy u_ε* which ensures that

K(x, y; u_ε*(·), v(·)) ≤ V(x, y, T₀) + ε = l + ε

for all strategies v(·) ∈ E of Player E. From the form of the payoff function K it follows that, by employing the strategy u_ε*(·), Player P can ensure that the inequality min_{0≤t≤T₀} ρ(x(t), y(t)) ≤ l + ε is satisfied no matter what Player E does. Extending arbitrarily the strategy u_ε*(·) to the interval [T₀, T], we have that, in Case 2, for every ε > 0 Player P can ensure (l + ε)-capture of Player E in time T no matter what the latter does.
This in fact proves the following theorem (of alternative).
Theorem. For every x, y ∈ Rⁿ, T > 0 one of the following assertions holds:

1. from the initial conditions x, y Player E can ensure l-capture avoidance during the time T no matter what Player P does;

2. for any ε > 0 Player P can ensure (l + ε)-capture of Player E from the initial states x, y during the time T no matter what Player E does.

5.4.6. For each fixed T > 0 the entire space Rⁿ × Rⁿ is divided into three nonoverlapping regions: the region A = {x, y : V(x, y, T) < l}, which is called the capture zone; the region B = {x, y : V(x, y, T) > l}, which is naturally called the escape zone; and the region C = {x, y : V(x, y, T) = l}, which is called the indifference zone.
Let x, y ∈ A. By the definition of A, for any ε > 0 Player P has a strategy u_ε*(·) such that

K(x, y; u_ε*(·), v(·)) ≤ V(x, y, T) + ε
for all strategies v(·) of Player E. By a proper choice of ε > 0 it is possible to ensure that the following inequality be satisfied:

K(x, y; u_ε*(·), v(·)) ≤ V(x, y, T) + ε < l.

This means that the strategy u_ε* of Player P guarantees him l-capture of Player E from the initial states during the time T. We thus obtain the following refinement of Theorem 5.4.5.
Theorem. For every fixed T > 0 the entire space is divided into three nonoverlapping regions A, B, C possessing the following properties:

1. for any x, y ∈ A Player P has a strategy u*(·) which ensures l-capture of Player E on the interval [0, T] no matter what the latter does;

2. for x, y ∈ B Player E has a strategy v*(·) which ensures l-capture avoidance of Player P on the interval [0, T] no matter what the latter does;

3. if x, y ∈ C and ε > 0, then Player P has a strategy u_ε*(·) which ensures (l + ε)-capture of Player E during the time T no matter what the latter does.

5.5 Necessary and sufficient condition for existence of optimal open-loop strategy for Evader
5.5.1. An important subclass of games of pursuit is represented by the games in which an optimal strategy of the Evader is a function of time only (this is what is called a regular case).
We shall restrict consideration to the games of pursuit with prescribed duration, although all of the results below can be extended to the time-optimal games of pursuit.
Let C_P^T(x) (C_E^T(y)) be the reachability set for Player P (E) from the initial state x (y) by the time T, i.e. the set of those positions at which Player P (E) can arrive from the initial state x (y) at the time T by employing all possible measurable open-loop controls u(t) (v(t)), t ∈ [0, T], provided the motion occurs in terms of the system ẋ = f(x, u) (ẏ = g(y, v)). Let us introduce the quantity

ρ_T(x₀, y₀) = max_{y∈C_E^T(y₀)} min_{x∈C_P^T(x₀)} ρ(x, y),  (5.5.1)

which may at times also be called (see Krasovskii (1985), Krasovskii and Subbotin (1974)) the hypothetical mismatch of the sets C_E^T(y₀) and C_P^T(x₀) (see Example 6 in 2.2.6).
The function ρ_T(x₀, y₀) has the following properties:

1. ρ_T(x₀, y₀) ≥ 0, ρ_T(x₀, y₀)|_{T=0} = ρ(x₀, y₀);

2. ρ_T(x₀, y₀) = 0 if C_P^T(x₀) ⊃ C_E^T(y₀);
3. if V(x₀, y₀, T) is the value of the game Γ(x₀, y₀, T) with prescribed duration and terminal payoff ρ(x(T), y(T)), then

V(x₀, y₀, T) ≥ ρ_T(x₀, y₀).

Indeed, property 1 follows from the non-negativity of the function ρ(x, y). Let C_P^T(x₀) ⊃ C_E^T(y₀). Then for every y′ ∈ C_E^T(y₀) there is x′ ∈ C_P^T(x₀) such that ρ(x′, y′) = 0 (x′ = y′), whence follows 2. Property 3 follows from the fact that Player E can always guarantee himself an amount ρ_T(x₀, y₀) by choosing the motion directed towards the point M ∈ C_E^T(y₀) for which

ρ_T(x₀, y₀) = min_{x∈C_P^T(x₀)} ρ(x, M).

The point M is called the center of pursuit.
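The quantity (5.5.1) and the center of pursuit M are straightforward to compute over finite samples of the reachability sets. The sketch below assumes simple motion, so that C_P^T and C_E^T are discs of radii αT and βT (an assumption of the illustration; with exact sets the max and min run over the full compacta):

import math

alpha, beta, T = 1.0, 0.6, 3.0
x0, y0 = (0.0, 0.0), (4.0, 0.0)

def circle(center, radius, n=60):     # boundary samples of a disc
    cx, cy = center
    return [(cx + radius * math.cos(2 * math.pi * k / n),
             cy + radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

def rho(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

CP, CE = circle(x0, alpha * T), circle(y0, beta * T)
mismatch, M = max((min(rho(xp, yp) for xp in CP), yp) for yp in CE)
print("rho_T =", mismatch, " center of pursuit M =", M)
# For these data the exact value is rho(x0, y0) + beta*T - alpha*T = 2.8.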


5.5.2. Let Γ_δ(x₀, y₀, T) be the discrete game of pursuit with step δ (δ = t_{k+1} − t_k), prescribed duration T, discrimination against Player E, and initial states x₀, y₀. Then the following theorem holds.
Theorem. In order for the following equality to hold for any x₀, y₀ ∈ Rⁿ and T = δk, k = 1, 2, …:

ρ_T(x₀, y₀) = Val Γ_δ(x₀, y₀, T),  (5.5.2)

it is necessary and sufficient that for all x₀, y₀ ∈ Rⁿ, δ > 0 and T = δk, k = 1, 2, …, there be

ρ_T(x₀, y₀) = max_{y∈C_E^δ(y₀)} min_{x∈C_P^δ(x₀)} ρ_{T−δ}(x, y)  (5.5.3)

(Val Γ_δ(x₀, y₀, T) is the value of the game Γ_δ(x₀, y₀, T)).
The proof of this theorem is based on the following result.
Lemma. The following inequality holds for any x₀, y₀ ∈ Rⁿ, T ≥ δ:

ρ_T(x₀, y₀) ≤ max_{y∈C_E^δ(y₀)} min_{x∈C_P^δ(x₀)} ρ_{T−δ}(x, y).

Proof. By the definition of the function ρ_T, we have

max_{y∈C_E^δ(y₀)} min_{x∈C_P^δ(x₀)} ρ_{T−δ}(x, y) = max_{y∈C_E^δ(y₀)} min_{x∈C_P^δ(x₀)} max_{y′∈C_E^{T−δ}(y)} min_{x′∈C_P^{T−δ}(x)} ρ(x′, y′).

For all x ∈ C_P^δ(x₀) there is the inclusion C_P^{T−δ}(x) ⊂ C_P^T(x₀). Hence for any x ∈ C_P^δ(x₀), y′ ∈ C_E^{T−δ}(y),

min_{x′∈C_P^{T−δ}(x)} ρ(x′, y′) ≥ min_{x′∈C_P^T(x₀)} ρ(x′, y′).

Then for all x ∈ C_P^δ(x₀), y ∈ C_E^δ(y₀)

max_{y′∈C_E^{T−δ}(y)} min_{x′∈C_P^{T−δ}(x)} ρ(x′, y′) ≥ max_{y′∈C_E^{T−δ}(y)} min_{x′∈C_P^T(x₀)} ρ(x′, y′)

and

min_{x∈C_P^δ(x₀)} max_{y′∈C_E^{T−δ}(y)} min_{x′∈C_P^{T−δ}(x)} ρ(x′, y′) ≥ max_{y′∈C_E^{T−δ}(y)} min_{x′∈C_P^T(x₀)} ρ(x′, y′).
Thus

max_{y∈C_E^δ(y₀)} min_{x∈C_P^δ(x₀)} ρ_{T−δ}(x, y) ≥ max_{y∈C_E^δ(y₀)} max_{y′∈C_E^{T−δ}(y)} min_{x′∈C_P^T(x₀)} ρ(x′, y′)
= max_{y′∈C_E^T(y₀)} min_{x′∈C_P^T(x₀)} ρ(x′, y′) = ρ_T(x₀, y₀)

(the last equality holds since C_E^T(y₀) is the union of the sets C_E^{T−δ}(y) over y ∈ C_E^δ(y₀)). This completes the proof of the lemma.
We shall now prove the Theorem.
Necessity. Suppose that condition (5.5.2) is satisfied and condition (5.5.3) is not. Then, by the Lemma, there exist δ > 0, x₀, y₀ ∈ Rⁿ, T₀ = δk₀, k₀ ≥ 1, such that

ρ_{T₀}(x₀, y₀) < max_{y∈C_E^δ(y₀)} min_{x∈C_P^δ(x₀)} ρ_{T₀−δ}(x, y).  (5.5.4)

Let u(·) be an optimal strategy for Player P in the game Γ_δ(x₀, y₀, T₀), and suppose that at the 1st step Player E chooses the point y* ∈ C_E^δ(y₀) for which

min_{x∈C_P^δ(x₀)} ρ_{T₀−δ}(x, y*) = max_{y∈C_E^δ(y₀)} min_{x∈C_P^δ(x₀)} ρ_{T₀−δ}(x, y).  (5.5.5)

Let x₀(δ) be the state to which Player P passes at the 1st step when he uses the strategy u(·), and let v̄(·) be an optimal strategy for Player E in the game Γ_δ(x₀(δ), y*, T₀ − δ). Let us consider the strategy v(·) for Player E in the game Γ_δ(x₀, y₀, T₀): at the time instant t = 0 he chooses the point y* and from the instant t = δ uses the strategy v̄(·).
Denote by u′(·) the truncation of the strategy u(·) to the interval [δ, T₀]. From (5.5.2), (5.5.4), (5.5.5) (by (5.5.2), ρ_{T₀}(x₀, y₀) is the value of the game Γ_δ(x₀, y₀, T₀)) we find

ρ_{T₀}(x₀, y₀) ≥ K(u(·), v(·); x₀, y₀, T₀) = K(u′(·), v̄(·); x₀(δ), y*, T₀ − δ)
≥ ρ_{T₀−δ}(x₀(δ), y*) ≥ min_{x∈C_P^δ(x₀)} ρ_{T₀−δ}(x, y*) = max_{y∈C_E^δ(y₀)} min_{x∈C_P^δ(x₀)} ρ_{T₀−δ}(x, y) > ρ_{T₀}(x₀, y₀).

This contradiction proves the necessity of condition (5.5.3).
Sufficiency. Note that the condition (5.5.3), in conjunction with the condition ρ_T(x₀, y₀)|_{T=0} = ρ(x₀, y₀), shows that the function ρ_T(x₀, y₀) satisfies the functional equation for the value of the game Γ_δ(x₀, y₀, T). As follows from the proof of the Theorem in 5.2.2, this condition is sufficient for ρ_T(x₀, y₀) to be the value of the game Γ_δ(x₀, y₀, T).
5.5.3. Lemma. In order for Player E's optimal open-loop strategy (i.e. the strategy which is a function of time only) to exist in the game Γ(x₀, y₀, T) it is necessary and sufficient that

Val Γ(x₀, y₀, T) = ρ_T(x₀, y₀).  (5.5.6)
Proof. Sufficiency. Let v*(t), t ∈ [0, T], be an admissible control for Player E which sends the point y₀ to a point M such that

ρ_T(x₀, y₀) = min_{x∈C_P^T(x₀)} ρ(x, M).

Denote v*(·) = {σ, v*(t)}, where the partition σ of the interval [0, T] consists of the two points t₀ = 0, t₁ = T. Evidently, v*(·) ∈ E. By the Theorem in 1.3.4, v*(·) is an optimal strategy for Player E in the game Γ(x₀, y₀, T) if

Val Γ(x₀, y₀, T) = inf_{u(·)∈P} K(u(·), v*(·); x₀, y₀, T).

But this equality follows from (5.5.6), since

inf_{u(·)∈P} K(u(·), v*(·); x₀, y₀, T) = ρ_T(x₀, y₀).

Necessity. Suppose that in the game Γ(x₀, y₀, T) there exists an optimal open-loop strategy for Player E. Then

Val Γ(x₀, y₀, T) = sup_{v(·)∈E} inf_{u(·)∈P} K(u(·), v(·); x₀, y₀, T)
= max_{y∈C_E^T(y₀)} inf_{u(·)∈P} ρ(x(T), y) = ρ_T(x₀, y₀).

This completes the proof of the lemma.


Theorem. In order for Player E to have an optimal open-loop strategy for any x₀, y₀ ∈ Rⁿ, T > 0 in the game Γ(x₀, y₀, T) it is necessary and sufficient that for any δ > 0, x₀, y₀ ∈ Rⁿ, T ≥ δ,

ρ_T(x₀, y₀) = max_{y∈C_E^δ(y₀)} min_{x∈C_P^δ(x₀)} ρ_{T−δ}(x, y).  (5.5.7)

Proof. Sufficiency. By the Theorem in 5.5.2, condition (5.5.7) implies relationship (5.5.2), from which, by passing to the limit (see the Theorem in 5.3.7), we obtain

ρ_T(x₀, y₀) = Val Γ(x₀, y₀, T).

By the Lemma in 5.5.3, this implies the existence of an optimal open-loop strategy for Player E.
Necessity of condition (5.5.7) follows from the Theorem in 5.5.2, since the existence of an optimal open-loop strategy for Player E in the game Γ(x₀, y₀, T) involves the existence of such a strategy in all games Γ_δ(x₀, y₀, T), T = δk, k ≥ 1, and the validity of relationship (5.5.3).
5.6 Fundamental equation
In this section we will show that, under some particular conditions, the value function of the differential game satisfies a partial differential equation which is called fundamental. Although in the monographic literature R. Isaacs (1965) was the first to consider this equation, it is often referred to as the Isaacs-Bellman equation.
5.6.1. By employing the Theorem in 5.5.3, we shall derive a partial differential equation for the value function of the differential game. We assume that the conditions of the Theorem in 5.5.3 hold for the game Γ(x, y, T). Then the function ρ_T(x, y) is the value of the game Γ(x, y, T) of duration T from the initial states x, y.
Suppose that in some domain Ω of the space Rⁿ × Rⁿ × [0, ∞) the function ρ_T(x, y) has continuous partial derivatives in all its variables. We shall show that in this case the function ρ_T(x, y) in the domain Ω satisfies the extremal differential equation

∂ρ_T/∂T − max_{v∈V} Σ_{i=1}^n (∂ρ_T/∂y_i) g_i(y, v) − min_{u∈U} Σ_{i=1}^n (∂ρ_T/∂x_i) f_i(x, u) = 0,  (5.6.1)

where the functions f_i(x, u), g_i(y, v), i = 1, …, n, determine the behavior of the players in the game Γ (see (5.3.1), (5.3.2)).
Suppose that (5.6.1) fails to hold at some point $(x,y,T)\in\Omega$. For definiteness, let

$$\frac{\partial\rho}{\partial T} - \max_{v\in V}\sum_{i=1}^n \frac{\partial\rho}{\partial y_i}\, g_i(y,v) - \min_{u\in U}\sum_{i=1}^n \frac{\partial\rho}{\partial x_i}\, f_i(x,u) < 0.$$

Let $\bar v\in V$ be such that at the point involved, $(x,y,T)\in\Omega$, the following relationship
is satisfied:

$$\sum_{i=1}^n \frac{\partial\rho}{\partial y_i}\, g_i(y,\bar v) = \max_{v\in V}\sum_{i=1}^n \frac{\partial\rho}{\partial y_i}\, g_i(y,v).$$

Then the following inequality holds for any $u\in U$ at the point $(x,y,T)\in\Omega$:

$$\frac{\partial\rho}{\partial T} - \sum_{i=1}^n \frac{\partial\rho}{\partial y_i}\, g_i(y,\bar v) - \sum_{i=1}^n \frac{\partial\rho}{\partial x_i}\, f_i(x,u) < 0. \eqno(5.6.2)$$

From the continuous differentiability of the function $\rho$ in all its variables it follows
that the inequality (5.6.2) also holds in some neighbourhood $S$ of the point $(x,y,T)$.
Let us choose a number $\delta > 0$ so small that the point $(x(\tau),y(\tau),T-\tau)\in S$ for all
$\tau\in[0,\delta]$. Here

$$x(\tau) = x + \int_0^\tau f(x(t),u(t))\,dt, \qquad y(\tau) = y + \int_0^\tau g(y(t),\bar v)\,dt$$

are the trajectories of systems (5.3.1), (5.3.2) corresponding to some admissible control
$u(t)$ and $v(t)\equiv\bar v$ and initial conditions $x(0) = x$, $y(0) = y$, respectively. Let us
define the function

$$G(\tau) = \frac{\partial\rho}{\partial T}(x(\tau),y(\tau),T-\tau) - \sum_{i=1}^n \frac{\partial\rho}{\partial y_i}(x(\tau),y(\tau),T-\tau)\, g_i(y(\tau),\bar v)$$
$$- \sum_{i=1}^n \frac{\partial\rho}{\partial x_i}(x(\tau),y(\tau),T-\tau)\, f_i(x(\tau),u(\tau)), \qquad \tau\in[0,\delta].$$

The function $G(\tau)$ is continuous in $\tau$; therefore there is a number $c < 0$ such that
$G(\tau) < c$ for $\tau\in[0,\delta]$. Hence we have

$$\int_0^\delta G(\tau)\,d\tau < c\delta. \eqno(5.6.3)$$

It can be readily seen that

$$G(\tau) = -\frac{d}{d\tau}\,\rho_{T-\tau}(x(\tau),y(\tau)).$$

From (5.6.3) we obtain

$$\rho_T(x,y) - \rho_{T-\delta}(x(\delta),y(\delta)) < c\delta.$$

Hence, by the arbitrariness of $u(t)$, it follows that

$$\rho_T(x,y) < \max_{y'\in C_E^\delta(y)}\,\min_{x'\in C_P^\delta(x)} \rho_{T-\delta}(x',y').$$

But this contradicts (5.5.7).


We have thus shown that in the case when Player E in the game $\Gamma(x,y,T)$ has
an optimal open-loop strategy for any $x,y\in R^n$, $T > 0$, the value of the game
$\Gamma(x,y,T)$ (it coincides with $\rho_T(x,y)$ by Lemma in 5.5.3) in the domain of the space
$R^n\times R^n\times[0,\infty)$, where this function has continuous partial derivatives, satisfies the
equation

$$\frac{\partial V}{\partial T} = \max_{v\in V}\sum_{i=1}^n \frac{\partial V}{\partial y_i}\, g_i(y,v) + \min_{u\in U}\sum_{i=1}^n \frac{\partial V}{\partial x_i}\, f_i(x,u) \eqno(5.6.4)$$

with the initial condition $V(x,y,T)|_{T=0} = \rho(x,y)$. Suppose we have defined $\bar u$, $\bar v$
delivering the max and min in (5.6.4) as functions of $x$, $y$ and $\frac{\partial V}{\partial x}$, $\frac{\partial V}{\partial y}$, that is

$$\bar u = \bar u\Big(x,\frac{\partial V}{\partial x}\Big), \qquad \bar v = \bar v\Big(y,\frac{\partial V}{\partial y}\Big). \eqno(5.6.5)$$

Substituting expressions (5.6.5) into (5.6.4) we obtain

$$\sum_{i=1}^n \frac{\partial V}{\partial y_i}\, g_i\Big(y,\bar v\Big(y,\frac{\partial V}{\partial y}\Big)\Big) + \sum_{i=1}^n \frac{\partial V}{\partial x_i}\, f_i\Big(x,\bar u\Big(x,\frac{\partial V}{\partial x}\Big)\Big) = \frac{\partial V}{\partial T} \eqno(5.6.6)$$

subject to

$$V(x,y,T)|_{T=0} = \rho(x,y). \eqno(5.6.7)$$

Thus, to define $V(x,y,T)$ we have the initial value problem for the first order
partial differential equation (5.6.6) with the initial condition (5.6.7).
Remark. In the derivation of the functional equations (5.6.4), (5.6.6), and in the
proof of Theorem in 5.5.3, no use was made of a specific payoff function; therefore
this theorem holds for any continuous terminal payoff $H(x(T),y(T))$. In this case,
however, instead of the quantity $\rho_T(x,y)$ we have to consider the quantity

$$H_T(x,y) = \max_{y'\in C_E^T(y)}\,\min_{x'\in C_P^T(x)} H(x',y').$$

Equation (5.6.4) also holds for the value of the differential game with prescribed
duration and any terminal payoff, i.e. if in the differential game $\Gamma(x,y,T)$ with
prescribed duration and terminal payoff $H(x(T),y(T))$ there is an optimal open-loop
strategy for Player E, then the value of the game $V(x,y,T)$ in the domain of the space
$R^n\times R^n\times[0,\infty)$, where there exist continuous partial derivatives, satisfies equation
(5.6.4) with the initial condition $V(x,y,T)|_{T=0} = H(x,y)$, or equation (5.6.6) with
the same initial condition.
5.6.2. We shall now consider the games of pursuit in which the payoff function
is equal to the time-to-capture. For definiteness, we assume that the terminal manifold
$F$ is a sphere $\rho(x,y) = l$, $l > 0$. We also assume that the sets $C_P^t(x)$ and $C_E^t(y)$ are
continuous in $t$ at zero uniformly with respect to $x$ and $y$.
Suppose the following quantity makes sense:

$$\Theta(x,y,l) = \max_{v(t)}\,\min_{u(t)}\, t_l(x,y;u(t),v(t)),$$

where $t_l(x,y;u(t),v(t))$ is the time of approach within $l$-distance for the players P
and E moving from initial points $x,y$ and using measurable open-loop controls $u(t)$
and $v(t)$, respectively. Also, suppose the function $\Theta(x,y,l)$ is continuous in all its
independent variables.
Let us denote the time-optimal game by $\Gamma(x_0,y_0)$. As in Secs. 5.4, 5.5, we may derive
necessary and sufficient conditions for existence of an optimal open-loop strategy
for Player E in the time-optimal game. The following theorem holds.
Theorem. In order for Player E to have an optimal open-loop strategy for any
$x_0,y_0\in R^n$ in the game $\Gamma(x_0,y_0)$ it is necessary and sufficient that for any $\delta > 0$ and
any $x_0,y_0\in R^n$

$$\Theta(x_0,y_0,l) = \delta + \max_{y'\in C_E^\delta(y_0)}\,\min_{x'\in C_P^\delta(x_0)} \Theta(x',y',l).$$
For the time-optimal game of pursuit the equation (5.6.4) becomes

$$\max_{v\in V}\sum_{i=1}^n \frac{\partial\Theta}{\partial y_i}\, g_i(y,v) + \min_{u\in U}\sum_{i=1}^n \frac{\partial\Theta}{\partial x_i}\, f_i(x,u) = -1 \eqno(5.6.8)$$

with the initial condition

$$\Theta(x,y,l)|_{\rho(x,y)=l} = 0. \eqno(5.6.9)$$

Here it is assumed that there exist the first order continuous partial derivatives of
the function $\Theta(x,y,l)$ with respect to $x,y$. Assuming that the $\bar u$, $\bar v$ delivering the max and
min in (5.6.8) can be defined as functions of $x$, $y$, $\partial\Theta/\partial x$, $\partial\Theta/\partial y$, i.e. $\bar u = \bar u(x,\frac{\partial\Theta}{\partial x})$,
$\bar v = \bar v(y,\frac{\partial\Theta}{\partial y})$, we can rewrite equation (5.6.8) as

$$\sum_{i=1}^n \frac{\partial\Theta}{\partial y_i}\, g_i\Big(y,\bar v\Big(y,\frac{\partial\Theta}{\partial y}\Big)\Big) + \sum_{i=1}^n \frac{\partial\Theta}{\partial x_i}\, f_i\Big(x,\bar u\Big(x,\frac{\partial\Theta}{\partial x}\Big)\Big) = -1 \eqno(5.6.10)$$

subject to

$$\Theta(x,y,l)|_{\rho(x,y)=l} = 0. \eqno(5.6.11)$$

The derivation of equation (5.6.8) is analogous to the derivation of equation (5.6.4)
for the game of pursuit with prescribed duration.
Both initial value problems (5.6.4), (5.6.7) and (5.6.8), (5.6.9) are nonlinear in the
partial derivatives; therefore their solution presents serious difficulties.
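As an illustration (not in the original text), consider simple motion $\dot x = \alpha u$, $\dot y = \beta v$,
$\|u\|\le 1$, $\|v\|\le 1$, $\alpha > \beta$, for which one expects $\Theta(x,y,l) = (\|x-y\|-l)/(\alpha-\beta)$ in the
region $\|x-y\| > l$. Then

$$\frac{\partial\Theta}{\partial x} = \frac{x-y}{(\alpha-\beta)\|x-y\|}, \qquad \frac{\partial\Theta}{\partial y} = -\frac{x-y}{(\alpha-\beta)\|x-y\|},$$

so that

$$\beta\max_{\|v\|\le 1}\Big(\frac{\partial\Theta}{\partial y},v\Big) + \alpha\min_{\|u\|\le 1}\Big(\frac{\partial\Theta}{\partial x},u\Big) = \frac{\beta}{\alpha-\beta} - \frac{\alpha}{\alpha-\beta} = -1,$$

which is exactly (5.6.8), and $\Theta$ vanishes on the manifold $\rho(x,y) = l$ as required by (5.6.9).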
5.6.3. We shall now derive the equations of characteristics for (5.6.4). We assume
that the function $V(x,y,T)$ has continuous mixed second derivatives over the entire
space, the functions $g_i(y,v)$, $f_i(x,u)$ and the functions $\bar u = \bar u(x,\frac{\partial V}{\partial x})$, $\bar v = \bar v(y,\frac{\partial V}{\partial y})$
have continuous first derivatives with respect to all their variables, and the sets $U$, $V$
are parallelepipeds $a_m \le u_m \le b_m$, $m = 1,\ldots,k$ and $c_q \le v_q \le d_q$,
$q = 1,\ldots,l$, where $u = (u_1,\ldots,u_k)\in U$, $v = (v_1,\ldots,v_l)\in V$. Denote

$$B(x,y,T) = \frac{\partial V}{\partial T} - \sum_{i=1}^n \frac{\partial V}{\partial x_i}\, f_i(x,\bar u) - \sum_{i=1}^n \frac{\partial V}{\partial y_i}\, g_i(y,\bar v).$$

The function $B(x,y,T)\equiv 0$; thus, taking partial derivatives with respect to $x_1,\ldots,x_n$,
we obtain

$$\frac{\partial B}{\partial x_k} = \frac{\partial^2 V}{\partial T\,\partial x_k} - \sum_{i=1}^n \frac{\partial^2 V}{\partial x_i\,\partial x_k}\, f_i(x,\bar u) - \sum_{i=1}^n \frac{\partial V}{\partial x_i}\frac{\partial f_i(x,\bar u)}{\partial x_k} - \sum_{i=1}^n \frac{\partial^2 V}{\partial y_i\,\partial x_k}\, g_i(y,\bar v)$$
$$- \sum_{m=1}^k \frac{\partial}{\partial u_m}\Big(\sum_{i=1}^n \frac{\partial V}{\partial x_i}\, f_i(x,u)\Big)\Big|_{u=\bar u}\frac{\partial\bar u_m}{\partial x_k} - \sum_{q=1}^l \frac{\partial}{\partial v_q}\Big(\sum_{i=1}^n \frac{\partial V}{\partial y_i}\, g_i(y,v)\Big)\Big|_{v=\bar v}\frac{\partial\bar v_q}{\partial x_k} = 0. \eqno(5.6.12)$$
For every fixed point $(x,y,T)\in R^n\times R^n\times[0,\infty)$ the maximizing value $\bar v$ and the
minimizing value $\bar u$ in (5.6.4) lie either inside or on the boundary of the interval of
constraints. At an interior point we have

$$\frac{\partial}{\partial u_m}\Big(\sum_{i=1}^n \frac{\partial V}{\partial x_i}\, f_i(x,u)\Big)\Big|_{u=\bar u} = 0.$$

If, however, $\bar u$ ($\bar v$) is at the boundary, then two cases are possible. Let us discuss these
cases for one of the components $\bar u_m(x,\frac{\partial V}{\partial x})$ of the vector $\bar u$. The other components of
the vector $\bar u$ and the vector $\bar v$ can be investigated in a similar manner. For simplicity assume
that at some point $(x',y',T')$

$$\bar u_m = \bar u_m\Big(x',\frac{\partial V(x',y',T')}{\partial x}\Big) = a_m.$$

Case 1. In the space $R^n$ there exists a ball with its center at the point $x'$ such that the
following equality holds for all points $x$ of the ball:

$$\bar u_m = \bar u_m\Big(x,\frac{\partial V(x,y',T')}{\partial x}\Big) = a_m.$$

The function $\bar u_m$ assumes on the ball a constant value; therefore at the point $x'$ we
have

$$\frac{\partial\bar u_m}{\partial x_i} = 0, \quad i = 1,\ldots,n.$$

Case 2. Such a ball does not exist. Then there is a sequence of interior points $x_r$,
$\lim_{r\to\infty} x_r = x'$, such that

$$\bar u_m\Big(x_r,\frac{\partial V(x_r,y',T')}{\partial x}\Big) \ne a_m.$$

Hence at the points $x_r$

$$\frac{\partial}{\partial u_m}\Big(\sum_{i=1}^n \frac{\partial V}{\partial x_i}\, f_i(x,u)\Big)\Big|_{u=\bar u} = 0.$$

From the continuity of the derivatives $\partial V/\partial x_i$, $\partial f_i/\partial u_m$ and of the function $\bar u =
\bar u(x,\frac{\partial V(x,y,T)}{\partial x})$ it follows that the preceding equality also holds at the point $(x',y',T')$.
Thus, the last two terms in (5.6.12) are zero and the following equality holds for
all $(x,y,T)\in R^n\times R^n\times[0,\infty)$:

$$\frac{\partial B}{\partial x_k} = \frac{\partial^2 V}{\partial T\,\partial x_k} - \sum_{i=1}^n \frac{\partial^2 V}{\partial x_i\,\partial x_k}\, f_i(x,\bar u) - \sum_{i=1}^n \frac{\partial V}{\partial x_i}\frac{\partial f_i(x,\bar u)}{\partial x_k} - \sum_{i=1}^n \frac{\partial^2 V}{\partial y_i\,\partial x_k}\, g_i(y,\bar v) = 0, \quad k = 1,2,\ldots,n.$$
Let $\bar x(t), \bar y(t)$, $t\in[0,T]$ be a solution of the system

$$\dot x = f\Big(x,\bar u\Big(x,\frac{\partial V}{\partial x}\Big)\Big), \qquad \dot y = g\Big(y,\bar v\Big(y,\frac{\partial V}{\partial y}\Big)\Big)$$

with the initial condition $x(0) = x_0$, $y(0) = y_0$. Along the solution $\bar x(t), \bar y(t)$ we have

$$\frac{\partial^2 V(\bar x(t),\bar y(t),T-t)}{\partial T\,\partial x_k} - \sum_{i=1}^n \frac{\partial^2 V(\bar x(t),\bar y(t),T-t)}{\partial x_i\,\partial x_k}\, f_i(\bar x(t),u(t))$$
$$- \sum_{i=1}^n \frac{\partial V(\bar x(t),\bar y(t),T-t)}{\partial x_i}\frac{\partial f_i(\bar x(t),u(t))}{\partial x_k}$$
$$- \sum_{i=1}^n \frac{\partial^2 V(\bar x(t),\bar y(t),T-t)}{\partial y_i\,\partial x_k}\, g_i(\bar y(t),v(t)) = 0, \quad k = 1,\ldots,n, \eqno(5.6.13)$$

where

$$u(t) = \bar u\Big(\bar x(t),\frac{\partial V(\bar x(t),\bar y(t),T-t)}{\partial x}\Big), \qquad v(t) = \bar v\Big(\bar y(t),\frac{\partial V(\bar x(t),\bar y(t),T-t)}{\partial y}\Big).$$
However,

$$\frac{d}{dt}\Big(\frac{\partial V(\bar x(t),\bar y(t),T-t)}{\partial x_k}\Big) = \sum_{i=1}^n \frac{\partial^2 V}{\partial x_i\,\partial x_k}\, f_i(\bar x(t),u(t)) + \sum_{i=1}^n \frac{\partial^2 V}{\partial y_i\,\partial x_k}\, g_i(\bar y(t),v(t)) - \frac{\partial^2 V}{\partial T\,\partial x_k}. \eqno(5.6.14)$$

Note that for the twice continuously differentiable function we may reverse the order
of differentiation. Now (5.6.13) can be rewritten in terms of (5.6.14) as

$$\frac{d}{dt}\Big(\frac{\partial V(\bar x(t),\bar y(t),T-t)}{\partial x_k}\Big) = -\sum_{i=1}^n \frac{\partial V(\bar x(t),\bar y(t),T-t)}{\partial x_i}\frac{\partial f_i(\bar x(t),u(t))}{\partial x_k}, \quad k = 1,\ldots,n.$$

In a similar manner we obtain the equations

$$\frac{d}{dt}\Big(\frac{\partial V(\bar x(t),\bar y(t),T-t)}{\partial y_i}\Big) = -\sum_{j=1}^n \frac{\partial V(\bar x(t),\bar y(t),T-t)}{\partial y_j}\frac{\partial g_j(\bar y(t),v(t))}{\partial y_i}, \quad i = 1,\ldots,n.$$

Since for $t\in[0,T]$

$$V(\bar x(t),\bar y(t),T-t) = H(\bar x(T),\bar y(T)),$$

we have

$$\frac{d}{dt}\Big(\frac{\partial V(\bar x(t),\bar y(t),T-t)}{\partial T}\Big) = 0.$$
Let us introduce the following notation:

$$V_{x_i}(t) = \frac{\partial V(\bar x(t),\bar y(t),T-t)}{\partial x_i}, \qquad V_{y_i}(t) = \frac{\partial V(\bar x(t),\bar y(t),T-t)}{\partial y_i}, \quad i = 1,\ldots,n,$$
$$V_x(t) = \{V_{x_i}(t)\}, \qquad V_y(t) = \{V_{y_i}(t)\}, \qquad V_T(t) = \frac{\partial V(\bar x(t),\bar y(t),T-t)}{\partial T}.$$

As a result we obtain the following system of ordinary differential equations for the
functions $x(t), y(t), V_x(t), V_y(t)$:

$$\dot x_i = f_i(x,\bar u(x,V_x)),$$
$$\dot y_i = g_i(y,\bar v(y,V_y)),$$
$$\dot V_{x_k} = -\sum_{i=1}^n V_{x_i}\frac{\partial f_i(x,\bar u(x,V_x))}{\partial x_k},$$
$$\dot V_{y_k} = -\sum_{i=1}^n V_{y_i}\frac{\partial g_i(y,\bar v(y,V_y))}{\partial y_k}, \eqno(5.6.15)$$
$$\dot V_T = 0, \quad i,k = 1,\ldots,n,$$

and, by (5.6.6), we have

$$V_T = \sum_{i=1}^n V_{y_i}\, g_i(y,\bar v(y,V_y)) + \sum_{i=1}^n V_{x_i}\, f_i(x,\bar u(x,V_x)).$$

In order to solve the system of nonlinear equations (5.6.15) with respect to the functions
$x(t), y(t), V_{x_k}(t), V_{y_k}(t), V_T(t)$, we need to define initial conditions. For the function
$V(x(t),y(t),T-t)$ such conditions are given at the time instant $t = T$; therefore
we introduce the variable $\tau = T - t$ and write the equations of characteristics in
reverse time. Let us denote the reverse-time functions by $\tilde x, \tilde y, \tilde V_x, \tilde V_y, \tilde V_T$. The
equations of characteristics become

$$\dot{\tilde x}_i = -f_i(\tilde x,\bar u),$$
$$\dot{\tilde y}_i = -g_i(\tilde y,\bar v),$$
$$\dot{\tilde V}_{x_k} = \sum_{i=1}^n \tilde V_{x_i}\frac{\partial f_i}{\partial x_k}, \qquad \dot{\tilde V}_{y_k} = \sum_{i=1}^n \tilde V_{y_i}\frac{\partial g_i}{\partial y_k}, \eqno(5.6.16)$$
$$\dot{\tilde V}_T = 0.$$

In the specification of initial conditions for system (5.6.16), use is made of the relationship
$V(x,y,T)|_{T=0} = H(x,y)$. Let $\tilde x|_{\tau=0} = s$, $\tilde y|_{\tau=0} = s'$. Then

$$\tilde V_{x_i}|_{\tau=0} = \frac{\partial H}{\partial x_i}\Big|_{x=s,\,y=s'},$$
$$\tilde V_{y_i}|_{\tau=0} = \frac{\partial H}{\partial y_i}\Big|_{x=s,\,y=s'}, \eqno(5.6.17)$$
$$\tilde V_T|_{\tau=0} = \sum_{i=1}^n \tilde V_{y_i}|_{\tau=0}\, g_i(s',\bar v(s',\tilde V_y|_0)) + \sum_{i=1}^n \tilde V_{x_i}|_{\tau=0}\, f_i(s,\bar u(s,\tilde V_x|_0)).$$
Possible ways of solving system (5.6.16)-(5.6.17) are discussed in detail in Isaacs
(1965).
In a similar manner, using equation (5.6.8), we may write the equations of characteristics
for the problem of time-optimal pursuit.
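As a minimal numerical sketch (illustrative, not from the book), the reverse-time system
(5.6.16)-(5.6.17) can be integrated by Euler's method for simple motion ($f = \alpha u$, $g = \beta v$,
$H = \|x-y\|$), where $\tilde V_x$, $\tilde V_y$ stay constant along each characteristic because $f, g$ do
not depend on $x, y$; all numbers are made-up test data:

```python
import numpy as np

# Euler integration of the reverse-time characteristic system (5.6.16) for
# simple motion (f = alpha*u, g = beta*v, H = ||x - y||). V_x, V_y stay
# constant along a characteristic since f, g do not depend on x, y.
alpha, beta, T, dtau = 2.0, 1.0, 1.0, 1e-3
s = np.array([0.0, 0.0])    # terminal position of P (tau = 0, i.e. t = T)
sp = np.array([3.0, 0.0])   # terminal position of E

Vx = (s - sp) / np.linalg.norm(s - sp)   # initial condition (5.6.17)
Vy = -Vx                                 # dH/dy = -dH/dx for H = ||x - y||

x, y = s.copy(), sp.copy()
for _ in range(int(T / dtau)):
    u = -Vx / np.linalg.norm(Vx)   # minimizing control u(x, V_x)
    v = Vy / np.linalg.norm(Vy)    # maximizing control v(y, V_y)
    x -= alpha * u * dtau          # reverse-time state equations
    y -= beta * v * dtau           # (the dot-V equations vanish here)

print(x, y)   # ~(-2, 0) and (2, 0): the straight-line optimal motions
```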

5.7 Methods of successive approximations for
solving differential games of pursuit
5.7.1. Let $\Gamma_\delta(x,y,T)$ be a discrete form of the differential game $\Gamma(x,y,T)$ of duration
$T > 0$ with a fixed step of partition $\delta$ and discrimination against Player E for the
time $\delta > 0$ in advance. Denote by $V_\delta(x,y,T)$ the value of the game $\Gamma_\delta(x,y,T)$.$^1$ Then

$$\lim_{\delta\to 0} V_\delta(x,y,T) = V(x,y,T)$$

and optimal strategies in the game $\Gamma_\delta(x,y,T)$ for sufficiently small $\delta$ can be efficiently
used to construct $\varepsilon$-equilibria in the game $\Gamma(x,y,T)$.
$^1$The terminal payoff is equal to $\rho(x(T),y(T))$, where $\rho(x,y)$ is the distance in $R^n$.
5.7.2. The essence of the numerical method is to construct an algorithm for finding
a solution of the game $\Gamma_\delta(x,y,T)$. We shall now expound this method.
Zero-order approximation. A zero-order approximation for the value function
$V_\delta(x,y,T)$ is taken to be the function

$$V_\delta^0(x,y,T) = \max_{\eta\in C_E^T(y)}\,\min_{\xi\in C_P^T(x)} \rho(\xi,\eta), \eqno(5.7.1)$$

where $C_P^T(x), C_E^T(y)$ are the reachability sets for the players P and E from initial states
$x, y\in R^n$ by the time $T$.
The choice of the function $V_\delta^0(x,y,T)$ as an initial approximation is justified by
the fact that in a sufficiently large class of games (what is called a regular case) it
turns out to be the value of the game $\Gamma(x,y,T)$. The following approximations are
constructed by the rule:

$$V_\delta^1(x,y,T) = \max_{\eta\in C_E^\delta(y)}\,\min_{\xi\in C_P^\delta(x)} V_\delta^0(\xi,\eta,T-\delta),$$
$$V_\delta^2(x,y,T) = \max_{\eta\in C_E^\delta(y)}\,\min_{\xi\in C_P^\delta(x)} V_\delta^1(\xi,\eta,T-\delta),$$
$$\ldots$$
$$V_\delta^k(x,y,T) = \max_{\eta\in C_E^\delta(y)}\,\min_{\xi\in C_P^\delta(x)} V_\delta^{k-1}(\xi,\eta,T-\delta) \eqno(5.7.2)$$

for $T > \delta$, and $V_\delta^k(x,y,T) = V_\delta^0(x,y,T)$ for $T \le \delta$, $k \ge 1$.
As may be seen from formulas (5.7.2), the max min operation is taken over the
reachability sets $C_E^\delta(y), C_P^\delta(x)$ for the time $\delta$, i.e. for one step of the discrete game
$\Gamma_\delta(x,y,T)$.
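A minimal sketch of the scheme (5.7.1)-(5.7.2) (illustrative, not from the book) for simple
motion on the line, where both reachability sets are intervals and the monotonicity of
$V_\delta^{k-1}$ in the distance $d = |x-y|$ collapses the one-step max min to a shift of $d$:

```python
# Sketch of the successive approximations (5.7.1)-(5.7.2) for simple motion
# on the line. Monotonicity of V in d = |x - y| collapses the max min over
# the one-step reachability intervals to a shift of d. Parameters illustrative.
alpha, beta, delta = 2.0, 1.0, 0.1

def V0(d, t):
    # zero-order approximation (5.7.1): the farthest E can stay from the
    # closest point of P's reachability set is max(0, d - t*(alpha - beta))
    return max(0.0, d - t * (alpha - beta))

def V(k, d, t):
    # recursion (5.7.2): one step shrinks d by delta*(alpha - beta) at best
    if t <= delta or k == 0:
        return V0(d, t)
    return V(k - 1, max(0.0, d - delta * (alpha - beta)), t - delta)

T, d0 = 1.0, 3.0
N = int(T / delta) + 1          # convergence bound from Theorem 5.7.4
print(V(N, d0, T))              # 2.0 = max(0, d0 - T*(alpha - beta))
```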
5.7.3. Theorem. For fixed $x,y,T,\delta$ the numerical sequence $\{V_\delta^k(x,y,T)\}$
does not decrease with the growth of $k$.
Proof. First we prove the inequality

$$V_\delta^1(x,y,T) \ge V_\delta^0(x,y,T).$$

For all $\xi\in C_P^\delta(x)$ there is $C_P^{T-\delta}(\xi)\subset C_P^T(x)$. For any $\bar\eta\in C_E^{T-\delta}(\eta)$, $\eta\in C_E^\delta(y)$, we
have

$$\min_{\bar\xi\in C_P^{T-\delta}(\xi)} \rho(\bar\xi,\bar\eta) \ge \min_{\bar\xi\in C_P^T(x)} \rho(\bar\xi,\bar\eta).$$

Hence

$$V_\delta^1(x,y,T) = \max_{\eta\in C_E^\delta(y)}\,\min_{\xi\in C_P^\delta(x)}\,\max_{\bar\eta\in C_E^{T-\delta}(\eta)}\,\min_{\bar\xi\in C_P^{T-\delta}(\xi)} \rho(\bar\xi,\bar\eta)$$
$$\ge \max_{\bar\eta\in C_E^T(y)}\,\min_{\bar\xi\in C_P^T(x)} \rho(\bar\xi,\bar\eta) = V_\delta^0(x,y,T).$$

We now assume that for $l \le k$ there is

$$V_\delta^l(x,y,T) \ge V_\delta^{l-1}(x,y,T). \eqno(5.7.3)$$

We prove this inequality for $l = k+1$. From relationships (5.7.2) and (5.7.3) it follows
that

$$V_\delta^{k+1}(x,y,T) = \max_{\eta\in C_E^\delta(y)}\,\min_{\xi\in C_P^\delta(x)} V_\delta^k(\xi,\eta,T-\delta)$$
$$\ge \max_{\eta\in C_E^\delta(y)}\,\min_{\xi\in C_P^\delta(x)} V_\delta^{k-1}(\xi,\eta,T-\delta) = V_\delta^k(x,y,T).$$

Thus, in the case $T > \delta$, by induction, the statement of the theorem is proved (in the case
$T \le \delta$ the statement of the theorem is obvious).
5.7.4. Theorem. The sequence $\{V_\delta^k(x,y,T)\}$ converges in a finite number of
steps $N$, with the estimate $N \le [\frac{T}{\delta}] + 1$, where the brackets stand for the integer
part.
Proof. Let $N = [T/\delta] + 1$. We show that

$$V_\delta^N(x,y,T) = V_\delta^{N+1}(x,y,T). \eqno(5.7.4)$$

Equation (5.7.4) can be readily obtained from the construction of the sequence
$\{V_\delta^k(x,y,T)\}$. Indeed,

$$V_\delta^N(x,y,T) = \max_{\eta^1\in C_E^\delta(y)}\,\min_{\xi^1\in C_P^\delta(x)} V_\delta^{N-1}(\xi^1,\eta^1,T-\delta)$$
$$= \max_{\eta^1\in C_E^\delta(y)}\,\min_{\xi^1\in C_P^\delta(x)}\,\max_{\eta^2\in C_E^\delta(\eta^1)} \ldots \max_{\eta^{N-1}\in C_E^\delta(\eta^{N-2})}\,\min_{\xi^{N-1}\in C_P^\delta(\xi^{N-2})} V_\delta^1(\xi^{N-1},\eta^{N-1},T-(N-1)\delta).$$

Similarly we get

$$V_\delta^{N+1}(x,y,T) = \max_{\eta^1\in C_E^\delta(y)}\,\min_{\xi^1\in C_P^\delta(x)}\,\max_{\eta^2\in C_E^\delta(\eta^1)} \ldots \max_{\eta^{N-1}\in C_E^\delta(\eta^{N-2})}\,\min_{\xi^{N-1}\in C_P^\delta(\xi^{N-2})} V_\delta^2(\xi^{N-1},\eta^{N-1},T-(N-1)\delta).$$

But $T-(N-1)\delta = \alpha \le \delta$, therefore

$$V_\delta^1(\xi^{N-1},\eta^{N-1},\alpha) = V_\delta^2(\xi^{N-1},\eta^{N-1},\alpha) = V_\delta^0(\xi^{N-1},\eta^{N-1},\alpha),$$

whence equality (5.7.4) follows.
The coincidence of the members of the sequence $V_\delta^k$ for $k \ge N$ is derived from (5.7.4)
by induction. This completes the proof of the theorem.
5.7.5. Theorem. The limit of the sequence $\{V_\delta^k(x,y,T)\}$ coincides with the
value of the game $\Gamma_\delta(x,y,T)$.
Proof. This theorem is essentially a corollary to Theorem in 5.7.4. Indeed, let

$$V_\delta(x,y,T) = \lim_{k\to\infty} V_\delta^k(x,y,T).$$

Convergence takes place in a finite number of steps not exceeding $N = [T/\delta] + 1$;
therefore in the recursion equation (5.7.2) we may pass to the limit as $k\to\infty$. The
limiting function $V_\delta(x,y,T)$ satisfies the equation

$$V_\delta(x,y,T) = \max_{\eta\in C_E^\delta(y)}\,\min_{\xi\in C_P^\delta(x)} V_\delta(\xi,\eta,T-\delta) \eqno(5.7.5)$$

with the initial condition

$$V_\delta(x,y,T)|_{0\le T\le\delta} = \max_{\eta\in C_E^T(y)}\,\min_{\xi\in C_P^T(x)} \rho(\xi,\eta), \eqno(5.7.6)$$

which is a sufficient condition for the function $V_\delta(x,y,T)$ to be the value of the game
$\Gamma_\delta(x,y,T)$ (this is also a "regularity" criterion).
5.7.6. We shall now provide a modification of the method of successive approximations
discussed above.
The initial approximation is taken to be the function $\tilde V_\delta^0(x,y,T) = V_\delta^0(x,y,T)$,
where $V_\delta^0(x,y,T)$ is defined by (5.7.1). The following approximations are constructed
by the rule:

$$\tilde V_\delta^{k+1}(x,y,T) = \max_{i\in[1:N]}\,\max_{\eta\in C_E^{i\delta}(y)}\,\min_{\xi\in C_P^{i\delta}(x)} \tilde V_\delta^k(\xi,\eta,T-i\delta)$$

for $T > \delta$, where $N = [T/\delta]$, and $\tilde V_\delta^{k+1}(x,y,T) = V_\delta^0(x,y,T)$ for $T \le \delta$.
The statements of the theorems in 5.7.3-5.7.5 hold for the sequence of functions
$\{V_\delta^k(x,y,T)\}$ and the sequence of functions $\{\tilde V_\delta^k(x,y,T)\}$.
The proof of these statements for the sequence of functions $\{\tilde V_\delta^k(x,y,T)\}$ is almost
an exact replica of the similar argument for the sequence of functions $\{V_\delta^k(x,y,T)\}$. In
the region $\{(x,y,T)\,|\,T > \delta\}$ the functional equation for the value function of
the game $\Gamma_\delta(x,y,T)$ becomes

$$V_\delta(x,y,T) = \max_{i\in[1:N]}\,\max_{\eta\in C_E^{i\delta}(y)}\,\min_{\xi\in C_P^{i\delta}(x)} V_\delta(\xi,\eta,T-i\delta), \eqno(5.7.7)$$

where $N = [T/\delta]$, while the initial condition remains unaffected, i.e. it is of the form
(5.7.6).
5.7.7. We shall now prove the equivalence of equations (5.7.5) and (5.7.7).
Theorem. Equations (5.7.5) and (5.7.7) with initial condition (5.7.6) are equivalent.
Proof. Suppose the function $V_\delta(x,y,T)$ satisfies equation (5.7.5) and initial
condition (5.7.6). Show that this function satisfies equation (5.7.7) in the region
$\{(x,y,T)\,|\,T>\delta\}$.
Indeed, the following relationships hold:

$$V_\delta(x,y,T) = \max_{\eta\in C_E^\delta(y)}\,\min_{\xi\in C_P^\delta(x)} V_\delta(\xi,\eta,T-\delta) = \max_{\eta\in C_E^\delta(y)}\,\min_{\xi\in C_P^\delta(x)}\,\max_{\eta'\in C_E^\delta(\eta)}\,\min_{\xi'\in C_P^\delta(\xi)} V_\delta(\xi',\eta',T-2\delta)$$
$$\ge \max_{\eta\in C_E^\delta(y)}\,\max_{\eta'\in C_E^\delta(\eta)}\,\min_{\xi\in C_P^\delta(x)}\,\min_{\xi'\in C_P^\delta(\xi)} V_\delta(\xi',\eta',T-2\delta)$$
$$= \max_{\eta\in C_E^{2\delta}(y)}\,\min_{\xi\in C_P^{2\delta}(x)} V_\delta(\xi,\eta,T-2\delta) \ge \ldots \ge \max_{\eta\in C_E^{i\delta}(y)}\,\min_{\xi\in C_P^{i\delta}(x)} V_\delta(\xi,\eta,T-i\delta) \ge \ldots.$$

When $i = 1$ we have

$$V_\delta(x,y,T) = \max_{\eta\in C_E^\delta(y)}\,\min_{\xi\in C_P^\delta(x)} V_\delta(\xi,\eta,T-\delta),$$

hence

$$V_\delta(x,y,T) = \max_{i\in[1:N]}\,\max_{\eta\in C_E^{i\delta}(y)}\,\min_{\xi\in C_P^{i\delta}(x)} V_\delta(\xi,\eta,T-i\delta),$$

where $N = [T/\delta]$, which proves the statement.
Now suppose the function $V_\delta(x,y,T)$ in the region $\{(x,y,T)\,|\,T>\delta\}$ satisfies
equation (5.7.7) and initial condition (5.7.6). Show that this function also satisfies
equation (5.7.5). Suppose the opposite is true. Then the following inequality must
hold in the region $\{(x,y,T)\,|\,T>\delta\}$:

$$V_\delta(x,y,T) > \max_{\eta\in C_E^\delta(y)}\,\min_{\xi\in C_P^\delta(x)} V_\delta(\xi,\eta,T-\delta).$$

However,

$$\max_{\eta\in C_E^\delta(y)}\,\min_{\xi\in C_P^\delta(x)} V_\delta(\xi,\eta,T-\delta) = \max_{\eta\in C_E^\delta(y)}\,\min_{\xi\in C_P^\delta(x)}\,\max_{i\in[1:N-1]}\,\max_{\eta'\in C_E^{i\delta}(\eta)}\,\min_{\xi'\in C_P^{i\delta}(\xi)} V_\delta(\xi',\eta',T-(i+1)\delta)$$
$$\ge \max_{\eta\in C_E^\delta(y)}\,\max_{i\in[1:N-1]}\,\max_{\eta'\in C_E^{i\delta}(\eta)}\,\min_{\xi\in C_P^\delta(x)}\,\min_{\xi'\in C_P^{i\delta}(\xi)} V_\delta(\xi',\eta',T-(i+1)\delta)$$
$$= \max_{i\in[1:N-1]}\,\max_{\eta\in C_E^{(i+1)\delta}(y)}\,\min_{\xi\in C_P^{(i+1)\delta}(x)} V_\delta(\xi,\eta,T-(i+1)\delta)$$
$$= \max_{i\in[1:N]}\,\max_{\eta\in C_E^{i\delta}(y)}\,\min_{\xi\in C_P^{i\delta}(x)} V_\delta(\xi,\eta,T-i\delta) = V_\delta(x,y,T).$$

Since for $i = 1$ the strict inequality holds, this contradiction proves the theorem.

5.8 Examples of solutions to differential games of
pursuit
5.8.1. Example 4. (Simple motion.) Let us consider the differential game $\Gamma(x_0,y_0,T)$
in which the motion by the players P and E in the Euclidean space $R^n$ is governed
by the following equations:

$$\text{for } P:\ \dot x = \alpha u,\ \|u\|\le 1,\ x(0) = x_0,$$
$$\text{for } E:\ \dot y = \beta v,\ \|v\|\le 1,\ y(0) = y_0, \eqno(5.8.1)$$

where $\alpha,\beta$ are constants, $\alpha > \beta > 0$, $x,y,u,v\in R^n$.
The payoff to Player E is

$$H(x(T),y(T)) = \|x(T)-y(T)\|.$$

Let $\Gamma_\delta(x,y,T)$ be a discrete form of the differential game $\Gamma(x,y,T)$ with the
partition step $\delta > 0$ and discrimination against Player E. The game $\Gamma_\delta(x,y,T)$ has
$N$ steps, where $N = T/\delta$. By Sec. 5.2 (see Example in 5.2.3) the game $\Gamma_\delta(x,y,T)$
has the value

$$V_\delta(x,y,T) = \max\{0, \|x-y\| - N\delta(\alpha-\beta)\} = \max\{0, \|x-y\| - T(\alpha-\beta)\},$$

and the optimal motion by the players is along the straight line connecting the initial
states $x,y$.
By the results of Sec. 5.3, the value of the original differential game is

$$V(x,y,T) = \lim_{\delta\to 0} V_\delta(x,y,T) = \max\{0, \|x-y\| - T(\alpha-\beta)\}. \eqno(5.8.2)$$

It can be seen that

$$V(x,y,T) = \max_{y'\in C_E^T(y)}\,\min_{x'\in C_P^T(x)} \|x'-y'\| = \rho_T(x,y),$$

where $C_E^T(y) = S(y,\beta T)$ is the ball in $R^n$ of radius $\beta T$ with its center at the point
$y$, and similarly $C_P^T(x) = S(x,\alpha T)$. Thus, by Lemma in 5.5.3, Player E in the game
$\Gamma(x_0,y_0,T)$ has the optimal open-loop strategy $v^*(t)$, $t\in[0,T]$, which leads Player
E's trajectory to the point $y^*\in C_E^T(y_0)$ for which

$$\rho_T(x_0,y_0) = \min_{x'\in C_P^T(x_0)} \|x'-y^*\|.$$

Evidently,

$$v^*(t) \equiv v^* = \begin{cases} \dfrac{y_0-x_0}{\|y_0-x_0\|}, & \text{if } y_0\ne x_0, \\[2mm] v, & \text{if } y_0 = x_0, \end{cases}$$

where $v\in R^n$ is an arbitrary vector such that $\|v\| = 1$. From the results of Sec. 5.6 it
follows that in the region $A$,

$$A = \{(x,y,T): \|x-y\| - T(\alpha-\beta) > 0\},$$

where there exist the continuous partial derivatives

$$\frac{\partial V}{\partial T} = -(\alpha-\beta), \qquad \frac{\partial V}{\partial x} = \frac{x-y}{\|x-y\|}, \qquad \frac{\partial V}{\partial y} = -\frac{x-y}{\|x-y\|},$$

the function $V(x,y,T)$ satisfies equation (5.6.4):

$$\frac{\partial V}{\partial T} - \alpha\min_{\|u\|\le 1}\Big(\frac{\partial V}{\partial x},u\Big) - \beta\max_{\|v\|\le 1}\Big(\frac{\partial V}{\partial y},v\Big) = 0. \eqno(5.8.3)$$

In equation (5.8.3), the minimum and maximum are achieved under the controls

$$\bar u = -\frac{\partial V/\partial x}{\|\partial V/\partial x\|} = \frac{y-x}{\|x-y\|}, \eqno(5.8.4)$$
$$\bar v = \frac{\partial V/\partial y}{\|\partial V/\partial y\|} = \frac{y-x}{\|x-y\|}. \eqno(5.8.5)$$

Strategies (5.8.4), (5.8.5) are optimal in the differential game (5.8.1). The strategy
$\bar u(x,y)$ determined by relationship (5.8.4) is called a "pursuit strategy", since at each
instant of time for Player P using this strategy the vector of his velocity is pointing
towards Evader E.
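The following sketch (illustrative, not from the book) simulates system (5.8.1) in the plane
under the feedback strategies (5.8.4)-(5.8.5) and checks that the final distance agrees with
the value (5.8.2); all numbers are made-up test data:

```python
import numpy as np

# Simulation of simple motion (5.8.1) under the optimal feedback strategies
# (5.8.4)-(5.8.5); the final distance should match the value (5.8.2),
# max(0, ||x0 - y0|| - T*(alpha - beta)).
alpha, beta, T, dt = 2.0, 1.0, 1.0, 1e-4
x = np.array([0.0, 0.0])   # Pursuer P
y = np.array([3.0, 0.0])   # Evader E

for _ in range(int(T / dt)):
    e = (y - x) / np.linalg.norm(y - x)   # unit vector from P toward E
    x = x + alpha * e * dt                # P pursues: u = (y - x)/||x - y||
    y = y + beta * e * dt                 # E flees along the same line

print(np.linalg.norm(x - y))   # ~2.0, in agreement with (5.8.2)
```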
5.8.2. Example 5. (Game of pursuit with frictional forces.) The pursuit takes
place over the plane. Equations of motion are of the form:
for P:

$$\dot q_i = p_i,$$
$$\dot p_i = \alpha u_i - k_P p_i, \quad i = 1,2, \quad \|u\|\le 1; \eqno(5.8.6)$$

for E:

$$\dot r_i = s_i,$$
$$\dot s_i = \beta v_i - k_E s_i, \quad i = 1,2, \quad \|v\|\le 1; \eqno(5.8.7)$$

$$q_i(0) = q_i^0,\ p_i(0) = p_i^0,\ r_i(0) = r_i^0,\ s_i(0) = s_i^0, \quad i = 1,2, \quad \alpha,\beta,k_E,k_P > 0. \eqno(5.8.8)$$

Here $q = (q_1,q_2)$ and $r = (r_1,r_2)$ are the positions on the plane of the players P and E,
respectively; $p = (p_1,p_2)$ and $s = (s_1,s_2)$ are the players' momenta; $k_P,k_E$ are
constants interpreted to mean friction coefficients.
The payoff to Player E is taken to be

$$H(q(T),r(T)) = \|q(T)-r(T)\| = \sqrt{(q_1(T)-r_1(T))^2 + (q_2(T)-r_2(T))^2}.$$

In the plane $q = (q_1,q_2)$, the reachability set $C_P^T(q,p)$ for Player P from the
initial states $p(0) = p^0$, $q(0) = q^0$ in the time $T$ is the circle (Exercise 18) of radius

$$R_P(T) = \frac{\alpha}{k_P^2}\big(e^{-k_P T} + k_P T - 1\big)$$

with its center at the point

$$a(q^0,p^0,T) = q^0 + p^0\,\frac{1-e^{-k_P T}}{k_P}.$$

Similarly, the set $C_E^T(r,s)$ is the circle of radius

$$R_E(T) = \frac{\beta}{k_E^2}\big(e^{-k_E T} + k_E T - 1\big)$$

with its center at the point

$$b(r^0,s^0,T) = r^0 + s^0\,\frac{1-e^{-k_E T}}{k_E}.$$

For the quantity $\rho_T(q,p,r,s)$ determined by relationship (5.5.1), in this differential
game there is

$$\rho_T(q^0,p^0,r^0,s^0) = \max_{r'\in C_E^T(r,s)}\,\min_{q'\in C_P^T(q,p)} \|q'-r'\|.$$

Hence (see formula (5.2.10)) we have

$$\rho_T(q,p,r,s) = \max\{0, \|a(q,p,T)-b(r,s,T)\| - (R_P(T)-R_E(T))\}$$
$$= \max\Big\{0, \Big\|q - r + p\,\frac{1-e^{-k_P T}}{k_P} - s\,\frac{1-e^{-k_E T}}{k_E}\Big\|$$
$$- \Big(\frac{\alpha(e^{-k_P T}+k_P T-1)}{k_P^2} - \frac{\beta(e^{-k_E T}+k_E T-1)}{k_E^2}\Big)\Big\}. \eqno(5.8.9)$$

In particular, the conditions $\alpha > \beta$, $\frac{\alpha}{k_P} > \frac{\beta}{k_E}$ suffice to ensure that for any initial
states $q,p,r,s$ there is a suitable $T$ for which $\rho_T(q,p,r,s) = 0$.
The function $\rho_T(q,p,r,s)$ satisfies the extremal differential equation (5.6.1) in the
domain $\Omega = \{(q,p,r,s,T): \rho_T(q,p,r,s) > 0\}$. In fact, in the domain $\Omega$ there exist the
continuous partial derivatives

$$\frac{\partial\rho}{\partial T},\ \frac{\partial\rho}{\partial q},\ \frac{\partial\rho}{\partial p},\ \frac{\partial\rho}{\partial r},\ \frac{\partial\rho}{\partial s}. \eqno(5.8.10)$$

Equation (5.6.1) becomes

$$\frac{\partial\rho}{\partial T} - \sum_{i=1}^2\Big(\frac{\partial\rho}{\partial q_i}\,p_i - \frac{\partial\rho}{\partial p_i}\,k_P p_i + \frac{\partial\rho}{\partial r_i}\,s_i - \frac{\partial\rho}{\partial s_i}\,k_E s_i\Big) - \alpha\min_{\|u\|\le 1}\sum_{i=1}^2 \frac{\partial\rho}{\partial p_i}\,u_i - \beta\max_{\|v\|\le 1}\sum_{i=1}^2 \frac{\partial\rho}{\partial s_i}\,v_i = 0. \eqno(5.8.11)$$

Here the extrema are achieved on the controls $\bar u,\bar v$ determined by the following formulas:

$$\bar u_i = -\frac{\partial\rho/\partial p_i}{\sqrt{\big(\frac{\partial\rho}{\partial p_1}\big)^2 + \big(\frac{\partial\rho}{\partial p_2}\big)^2}}, \eqno(5.8.12)$$

$$\bar v_i = \frac{\partial\rho/\partial s_i}{\sqrt{\big(\frac{\partial\rho}{\partial s_1}\big)^2 + \big(\frac{\partial\rho}{\partial s_2}\big)^2}}, \quad i = 1,2. \eqno(5.8.13)$$

Substituting these controls into (5.8.11) we obtain the nonlinear first-order partial
differential equation

$$\frac{\partial\rho}{\partial T} - \sum_{i=1}^2\Big(\frac{\partial\rho}{\partial q_i}\,p_i + \frac{\partial\rho}{\partial r_i}\,s_i - \frac{\partial\rho}{\partial p_i}\,k_P p_i - \frac{\partial\rho}{\partial s_i}\,k_E s_i\Big) + \alpha\sqrt{\sum_{i=1}^2\Big(\frac{\partial\rho}{\partial p_i}\Big)^2} - \beta\sqrt{\sum_{i=1}^2\Big(\frac{\partial\rho}{\partial s_i}\Big)^2} = 0. \eqno(5.8.14)$$

Computing the partial derivatives (5.8.10), we see that the function $\rho_T(q,p,r,s)$
in the domain $\Omega$ satisfies equation (5.8.14). Note that the quantity $\rho_T(q^0,p^0,r^0,s^0)$
is the value of the differential game (5.8.6)-(5.8.8) and the controls determined by
relationships (5.8.12), (5.8.13) are optimal in the domain $\Omega$.
From formulas (5.8.12), (5.8.13), (5.8.9) we find

$$\bar u_i = \bar v_i = \frac{r_i + s_i\,\frac{1-e^{-k_E T}}{k_E} - q_i - p_i\,\frac{1-e^{-k_P T}}{k_P}}{\|b(r,s,T)-a(q,p,T)\|}, \quad i = 1,2. \eqno(5.8.15)$$

In the situation $\bar u,\bar v$ the force direction for each of the players is parallel to the line
connecting the centers of the reachability circles (as follows from formula (5.8.15)) and
remains unaffected, since in this situation the centers of the reachability circles move
along this straight line.
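As a numerical companion (illustrative, not from the book), formula (5.8.9) can be evaluated
directly, and the earliest capture time, the smallest $T$ with $\rho_T = 0$, found by bisection;
the parameters satisfy $\alpha > \beta$, $\alpha/k_P > \beta/k_E$ and all data are made up:

```python
import numpy as np

# Evaluation of rho_T from (5.8.9) and bisection for the capture time (the
# smallest T with rho_T = 0).
alpha, beta, kP, kE = 2.0, 1.0, 0.5, 0.5

def center(z, w, k, T):
    return z + w * (1.0 - np.exp(-k * T)) / k          # circle center

def radius(c, k, T):
    return c / k**2 * (np.exp(-k * T) + k * T - 1.0)   # circle radius

def rho(q, p, r, s, T):
    gap = np.linalg.norm(center(q, p, kP, T) - center(r, s, kE, T))
    return max(0.0, gap - (radius(alpha, kP, T) - radius(beta, kE, T)))

q, p = np.zeros(2), np.zeros(2)
r, s = np.array([5.0, 0.0]), np.array([0.0, 1.0])

lo, hi = 0.0, 50.0            # rho > 0 at lo and rho = 0 at hi
while hi - lo > 1e-6:
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if rho(q, p, r, s, mid) > 0 else (lo, mid)
print(hi)                     # approximate capture time
```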

5.9 Games of pursuit with delayed information
for Pursuer
5.9.1. In this chapter we have examined conflict-controlled processes where each participant
(player) has perfect information, i.e. at each current instant of time Player
P (E) is aware of his state $x(t)$ $[y(t)]$ and the opponent's state $y(t)$ $[x(t)]$. Existence
theorems were obtained for pure strategy $\varepsilon$-equilibria in such games and various methods
for constructing solutions were illustrated. This was made possible by the fact
that the differential games with perfect information are the limiting case of multistage
games with perfect information where the time interval between two sequential
moves tends to zero. In differential games with incomplete information, where mixed
strategies play an important role, we have a completely different situation. Without
analyzing the entire problem, we will enlarge on the game of pursuit with prescribed
duration, terminal payoff and delayed information for Player P on the phase state of
Player E, the time of delay being $l > 0$.
5.9.2. Let there be given some number $l > 0$ referred to as the information delay.
For $0 \le t \le l$, Pursuer P at each instant of time $t$ knows his own state $x(t)$, the time
$t$ and the initial position $y_0$ of Evader E. For $l < t \le T$, Player P at each instant of
time $t$ knows his own state $x(t)$, the time $t$ and the state $y(t-l)$ of Player E at the
time instant $t-l$. Player E at each instant of time $t$ knows his own state $y(t)$, the
opponent's state $x(t)$ and the time $t$. His payoff is equal to the distance between the
players at the time instant $T$; the payoff to Player P is equal to the payoff to Player
E but opposite in sign (the game is zero-sum). Denote this game by $\Gamma(x_0,y_0,T)$.
Definition. The pure piecewise open-loop strategy $v(\cdot)$ for Player E means the
pair $\{\tau,b\}$, where $\tau$ is a partitioning of the time interval $[0,T]$ by a finite number of
points $0 = t_1 < \ldots < t_k = T$, and $b$ is the map which places each state $x(t_i),y(t_i),t_i$
in correspondence with the measurable open-loop control $v(t)$ of Player E for $t\in[t_i,t_{i+1})$.
Definition. The pure piecewise open-loop strategy $u(\cdot)$ for Player P means the
pair $\{\sigma,a\}$, where $\sigma$ is an arbitrary partitioning of the time interval $[0,T]$ by a finite
number of points $0 = t'_1 < t'_2 < \ldots < t'_s = T$, and $a$ is the map which places each
state $x(t'_i),y(t'_i-l),t'_i$ for $l \le t'_i$ in correspondence with the segment of Player P's
measurable open-loop control $u(t)$ for $t\in[t'_i,t'_{i+1})$. For $t'_i < l$, the map $a$ places each
state $x(t'_i),y_0,t'_i$ in correspondence with the segment of Player P's measurable control
$u(t)$ for $t\in[t'_i,t'_{i+1})$.
The sets of all pure piecewise open-loop strategies for the players P and E are
denoted by $P$ and $E$, respectively.
Equations of motion are of the form

$$\dot x = f(x,u), \quad u\in U\subset R^k, \quad x\in R^n,$$
$$\dot y = g(y,v), \quad v\in V\subset R^l, \quad y\in R^n. \eqno(5.9.1)$$

We assume that the conditions which ensure the existence and uniqueness of a solution
to system (5.9.1) for any pair of measurable open-loop controls $u(t),v(t)$ with the given
initial conditions $x_0,y_0$ are satisfied. This ensures the existence of a unique solution to
system (5.9.1) when the players P and E use piecewise open-loop strategies $u(\cdot)\in P$,
$v(\cdot)\in E$ with the given initial conditions $x_0,y_0$. Thus, in any situation $(u(\cdot),v(\cdot))$
with the given initial conditions $x_0,y_0$ the payoff function for Player E is determined
in a unique way:

$$K(x_0,y_0;u(\cdot),v(\cdot)) = \rho(x(T),y(T)), \eqno(5.9.2)$$

where $x(t),y(t)$ is the solution to system (5.9.1) with initial conditions $x_0,y_0$ in the
situation $(u(\cdot),v(\cdot))$, and $\rho$ is the Euclidean distance.
5.9.3. We can demonstrate with simple examples that in the game under study
$\Gamma(x_0,y_0,T)$ the $\varepsilon$-equilibria do not exist for all $\varepsilon > 0$. For this reason, to construct
equilibria, we shall follow the way proposed by John von Neumann and Oskar Morgenstern
(1944) for finite positional games with incomplete information. The strategy
spaces of the players P and E will be extended to what are called mixed piecewise
open-loop behavior strategies (MPOLBS), which allow for a random choice of control
at each step.
Example 6. Equations of motion are of the form

$$\text{for } P:\ \dot x = u,\ \|u\|\le\alpha,$$
$$\text{for } E:\ \dot y = v,\ \|v\|\le\beta, \eqno(5.9.3)$$

$$\alpha > \beta > 0, \quad x,y\in R^2, \quad u,v\in R^2.$$
The payoff to Player E is $\rho(x(T),y(T))$, where $x(t),y(t)$ is the solution to system
(5.9.3) with the initial conditions $x(t_0) = x_0$, $y(t_0) = y_0$. Player P is informed only
about the initial state $y_0$ of his opponent, while Player E is completely informed
about Player P's state ($l = T$).
Let $v(x,y,t)$ be some piecewise open-loop strategy for Player E. For each strategy
$v$ there is a strategy $u(x,t)$ of Player P, using only information about the initial
position of Player E, his current position and the time from the start of the game, for
which the payoff $\rho(x(T),y(T)) \le \varepsilon$ for $T \ge \rho(x_0,y_0)/(\alpha-\beta)$. Indeed, let $u^*(x,y,t)$
be a strategy for Player P in the game with perfect information. The strategy is as
follows: Player E is pursued until the capture time $t_n$ (while the capture of E takes
place), while for $t_n \le t \le T$ the point $x(t)$ is kept in some $\varepsilon$-neighbourhood of the
evading point. It is an easy matter to describe analytically such a strategy in the
game with perfect information (see Example 4, 5.8.1). Let us construct the players'
trajectories $\bar x(t),\bar y(t)$ in the situation $(u^*(x,y,t),v(x,y,t))$ from the initial states $x_0,y_0$.
To do this, it suffices to integrate the system

$$\dot x = u^*(x,y,t), \quad x(t_0) = x_0,$$
$$\dot y = v(x,y,t), \quad y(t_0) = y_0. \eqno(5.9.4)$$

By construction $\rho(\bar x(T),\bar y(T)) \le \varepsilon$. Now let $\bar u(t) = u^*(\bar x(t),\bar y(t),t)$. Although the
strategy $u^*(x,y,t)$, using the information about E's position, is inadmissible, the strategy
$\bar u(t)$ is admissible, since it uses only information about the time from the start of
the game and information about the initial state of Player E. It is apparent that in the
situations $(\bar u(t),v(x,y,t))$ and $(u^*(x,y,t),v(x,y,t))$ the players' paths coincide, since
the strategy $v(x,y,t)$ responds to the strategy $u^*(x,y,t)$ and the strategy $\bar u(t)$ by
choosing the same control $v(\bar x(t),\bar y(t),t)$.
We have thus shown that for each strategy $v(x,y,t)$ there is an open-loop control
$\bar u(t)$ which is an admissible strategy in the game with incomplete information and is
such that $\rho(x(T),y(T)) \le \varepsilon$, where $x(t),y(t)$ are the corresponding trajectories. The
choice of $v(x,y,t)$ is made in an arbitrary way, hence it follows that

$$\sup\,\inf\,\rho(x(T),y(T)) = 0, \eqno(5.9.5)$$

where the sup inf is taken over the players' strategy sets in the game with incomplete
information.
For any strategy $u(x,t)$ of Player P, however, we may construct a strategy $v(x,y,t)$
for Player E such that in the situation $(u(x,t),v(x,y,t))$ the payoff $\rho$ to Player E will
exceed $\beta T$. Indeed, let $u(x,t)$ be a strategy for Player P. Since his motion is independent
of $y(t)$, the path of Player P can be obtained by integrating the system

$$\dot x = u(x,t), \quad x(t_0) = x_0 \eqno(5.9.6)$$

irrespective of what motion is made by Player E. Let $\bar x(t)$ be the trajectory resulting
from integration of system (5.9.6). The points $\bar x(T)$ and $y_0$ are connected, and the
motion by Player E is oriented along the straight line $[\bar x(T),y_0]$ away from the point
$\bar x(T)$. His speed is taken to be maximum. Evidently, the motion by Player E ensures a
distance between him and the point $\bar x(T)$ which is greater than or equal to $\beta T$. Denote
the thus constructed strategy for Player E by $v(t)$. In the situation $(u(x,t),v(t))$, the
payoff to Player E is then greater than or equal to $\beta T$. From this it follows that

$$\inf\,\sup\,\rho(x(T),y(T)) \ge \beta T, \eqno(5.9.7)$$

where the inf sup is taken over the players' strategy sets in the game with incomplete
information.
It follows from (5.9.5) and (5.9.7) that the value of the game in the class of pure
strategies does not exist in the game under study.
5.9.4. Definition. The mixed piecewise open-loop behavior strategy (MPOLBS)
for Player P means the pair $\mu(\cdot) = \{\tau,d\}$, where $\tau$ is an arbitrary partitioning of the
time interval $[0,T]$ by a finite number of points $0 = t_1 < t_2 < \ldots < t_k = T$, and $d$ is
the map which places each state $x(t_i),y(t_i-l),t_i$ for $t_i \ge l$ and the state $x(t_i),y_0,t_i$
for $t_i < l$ in correspondence with the probability distribution $\mu_i(\cdot)$ concentrated on a
finite number of measurable open-loop controls $u(t)$ for $t\in[t_i,t_{i+1})$.
Similarly, MPOLBS for Player E means the pair $\nu(\cdot) = \{\sigma,c\}$, where $\sigma$ is an
arbitrary partitioning of the time interval $[0,T]$ by a finite number of points $0 =
t'_1 < t'_2 < \ldots < t'_l = T$, and $c$ is the map which places the state $x(t'_i),y(t'_i),t'_i$ in
correspondence with the probability distribution $\nu_i(\cdot)$ concentrated on a finite number
of measurable open-loop controls $v(t)$ for $t\in[t'_i,t'_{i+1})$.
MPOLBS for the players P and E are denoted respectively by $\overline P$ and $\overline E$ (compare
these strategies with "behavior strategies" in 4.8.3).
Each pair of MPOLBS $\mu(\cdot),\nu(\cdot)$ induces a probability distribution over the
space of trajectories $x(t), x(0) = x_0$; $y(t), y(0) = y_0$. For this reason, the payoff
$K(x_0,y_0;\mu(\cdot),\nu(\cdot))$ in MPOLBS is interpreted to mean the mathematical expectation
of the payoff averaged over the distributions over the trajectory spaces that are induced
by the MPOLBS $\mu(\cdot),\nu(\cdot)$. Having determined the strategy spaces $\overline P,\overline E$ and the
payoff $K$, we have determined the mixed extension $\overline\Gamma(x_0,y_0,T)$ of the game $\Gamma(x_0,y_0,T)$.
5.9.5. Denote by $C_P^T(x)$ and $C_E^T(y)$ the respective reachability sets of the players
P and E from initial states $x$ and $y$ at the instant of time $T$, and by $\overline C_E^T(y)$ the
convex hull of the set $C_E^T(y)$. We assume that the reachability sets are compact, and
introduce the quantity

$$\gamma(y,T) = \min_{\bar y\in\overline C_E^T(y)}\,\max_{y'\in C_E^T(y)} \rho(\bar y,y').$$

Let $\gamma(y,T) = \rho(\bar y,\hat y)$, where $\hat y\in C_E^T(y)$, $\bar y\in\overline C_E^T(y)$. From the definition of the point $\bar y$
it follows that it is the center of the minimal sphere containing the set $C_E^T(y)$. Hence it
follows that this point is unique. At the same time, there exist at least two points of
tangency of the set $C_E^T(y)$ to the minimal sphere containing it, these points coinciding
with the points $\hat y$.
Let $y(t)$ be a trajectory ($y(0) = y_0$) of Player E for $0 \le t \le T$. When Player
E moves along this trajectory the value of the quantity $\gamma(y(t),T-t)$ changes, and the
point $\bar y$ also changes. Let $\bar y(t)$ be the trajectory of the point $\bar y$ corresponding to the
trajectory $y(t)$. The point $M\in C_E^{T-l}(y_0)$ will be referred to as the center of pursuit
if

$$\gamma(M,l) = \max_{y'\in C_E^{T-l}(y_0)} \gamma(y',l).$$
5.9.6. We shall now consider an auxiliary simultaneous zero-sum game of pursuit
over the convex hull of the set $C_E^T(y)$. Pursuer chooses a point $\xi\in\overline C_E^T(y)$ and Evader
chooses a point $\eta\in C_E^T(y)$. The choices are made simultaneously. When choosing the
point $\xi$, Player P has no information on the choice of $\eta$ by Player E, and conversely.
Player E receives a payoff $\rho(\xi,\eta)$. We denote the value of this game by $V(y,T)$
in order to emphasize the dependence of the game value on the parameters $y$ and $T$
which determine the strategy sets $\overline C_E^T(y)$ and $C_E^T(y)$ for the players P and E, respectively.
The game in normal form can be written as follows:

$$\Gamma(y,T) = \big(\overline C_E^T(y),\, C_E^T(y),\, \rho(\xi,\eta)\big).$$

The strategy set of the minimizing player P is convex, and the function $\rho(\xi,\eta)$ is
convex in its independent variables and is continuous. Theorem in 2.5.5 can
be applied to such games. Therefore the game $\Gamma(y,T)$ has an equilibrium in mixed
strategies. An optimal strategy for Player P is pure, and an optimal strategy for
Player E assigns positive probability to at most $(n+1)$ points from the set
$C_E^T(y)$, with $V(y,T) = \gamma(y,T)$. An optimal strategy for Player P in the game $\Gamma(y,T)$
is the choice of the center $\bar y$ of the minimal sphere containing the set $C_E^T(y)$. An optimal
strategy for Player E assigns positive probabilities to at most $(n+1)$ points among
the points of tangency of the sphere to the set $C_E^T(y)$ (here $n$ is the dimension of the
space of $y$). The value of the game is equal to the radius of this sphere (see Example
11 in 2.5.5).
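Player P's optimal pure strategy above is the center of the minimal sphere containing
$C_E^T(y)$. For a finite sample of reachable points, that center can be approximated by the
simple "step toward the farthest point" iteration (a sketch, not from the book; convergence
of this scheme is a known property of the minimal enclosing ball problem):

```python
import numpy as np

# Approximating P's optimal pure strategy: the center of the minimal sphere
# containing a finite sample of C_E^T(y). Sample data are made up.
rng = np.random.default_rng(1)
pts = rng.uniform(-1.0, 1.0, size=(200, 2))   # stand-in for C_E^T(y)

c = pts.mean(axis=0)
for k in range(1, 2001):
    j = np.argmax(np.linalg.norm(pts - c, axis=1))   # farthest sample point
    c += (pts[j] - c) / (k + 1)                      # shrinking step toward it
radius = np.linalg.norm(pts - c, axis=1).max()       # approximates V(y, T)
print(c, radius)
```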
5.9.7. We shall now consider the simultaneous game $\Gamma(M,l)$, where $M$ is the center
of pursuit. Denote by $\bar y_1(M),\ldots,\bar y_{n+1}(M)$ the points from the set $C_E^l(M)$ appearing
in the spectrum of an optimal mixed strategy for Player E in the game $\Gamma(M,l)$ and
by $\bar y(M)$ an optimal strategy for Player P in this game.
Definition. The trajectory $y^*(t)$ is called conditionally optimal if $y^*(0) = y_0$,
$y^*(T-l) = M$, $y^*(T) = \bar y_i(M)$ for some $i$ from the numbers $1,\ldots,n+1$.
For each $i$ there can be several conditionally optimal trajectories of Player E.
Theorem. Let $T \ge l$ and suppose that for any number $\varepsilon > 0$ Player P can ensure
by the time $T$ the $\varepsilon$-capture of the center $\bar y(T)$ of the minimal sphere containing the
set $C_E^l(y(T-l))$. Then the game $\Gamma(x_0,y_0,T)$ has the value $\gamma(M,l)$, and the $\varepsilon$-optimal
strategy of Player P is pure and coincides with any one of his strategies which may
ensure the $\varepsilon/2$-capture of the point $\bar y(T)$. An optimal strategy for Player E is mixed:
during the time $0 \le t \le T-l$ he must move to the point $M$ along any conditionally
optimal trajectory $y^*(t)$ and then, with probabilities $p_1,\ldots,p_{n+1}$ (the optimal strategy
for Player E in the game $\Gamma(M,l)$), he must choose one of the conditionally optimal
trajectories sending the point $y^*(T-l) = M$ to the points $\bar y_i(M)$, $i = 1,\ldots,n+1$,
which appear in the spectrum of an optimal mixed strategy for Player E in the game
$\Gamma(M,l)$.
Proof. Denote by $u_\varepsilon(\cdot),\nu_*(\cdot)$ the strategies mentioned in the Theorem whose
optimality is to be proved. In order to prove the Theorem, it suffices to verify the validity
of the following relationships:

$$K(x_0,y_0;\mu(\cdot),\nu_*(\cdot)) + \varepsilon \ge K(x_0,y_0;u_\varepsilon(\cdot),\nu_*(\cdot))$$
$$\ge K(x_0,y_0;u_\varepsilon(\cdot),\nu(\cdot)) - \varepsilon, \quad \mu(\cdot)\in\overline P,\ \nu(\cdot)\in\overline E, \eqno(5.9.8)$$

$$\lim_{\varepsilon\to 0} K(x_0,y_0;u_\varepsilon(\cdot),\nu_*(\cdot)) = \gamma(M,l). \eqno(5.9.9)$$

The left-hand side of inequality (5.9.8) follows from the definition of the strategy $u_\varepsilon(\cdot)$,
by which for any piecewise open-loop strategy $u(\cdot)\in P$

$$K(x_0,y_0;u(\cdot),\nu_*(\cdot)) + \varepsilon \ge K(x_0,y_0;u_\varepsilon(\cdot),\nu_*(\cdot)).$$

Denote by $x^*(t)$ Pursuer's trajectory in the situation $(u_\varepsilon(\cdot),\nu_*(\cdot))$. Then

$$K(x_0,y_0;u_\varepsilon(\cdot),\nu_*(\cdot)) = \sum_{i=1}^{n+1} p_i\,\rho(x^*(T),\bar y_i(M)). \eqno(5.9.10)$$

Let $R$ be the radius of the minimal sphere containing the set $C_E^l(M)$, i.e. $R =
\gamma(M,l)$. Then $R-\varepsilon/2 \le \rho(x^*(T),\bar y_i(M)) \le R+\varepsilon/2$ for all $i = 1,\ldots,n+1$, since the
point $x^*(T)$ belongs to the $\varepsilon/2$-neighborhood of the point $\bar y(M)$. Since $\sum_{i=1}^{n+1} p_i = 1$,
$p_i \ge 0$, from (5.9.10) we get

$$R - \varepsilon/2 \le K(x_0,y_0;u_\varepsilon(\cdot),\nu_*(\cdot)) \le R + \varepsilon/2, \eqno(5.9.11)$$

and this proves (5.9.9).
Suppose the states $x(T),y(T-l)$ have been realized in the situation $(u_\varepsilon(\cdot),\nu(\cdot))$ and
$Q(\cdot)$ is the probability measure induced on the set $C_E^l(y(T-l))$. From the optimality
of the mixed strategy $p = (p_1,\ldots,p_{n+1})$ in the game $\Gamma(M,l)$ we have

$$R = \sum_{i=1}^{n+1} p_i\,\rho(\bar y(M),\bar y_i(M)) \ge \gamma(y(T-l),l) = \mathrm{Val}\,\Gamma(y(T-l),l)$$
$$\ge \int_{C_E^l(y(T-l))} \rho(\bar y[y(T-l)],y)\,dQ, \eqno(5.9.12)$$

where $\bar y[y(T-l)]$ is the center of the minimal sphere containing the set $C_E^l(y(T-l))$.
However, $\rho(x(T),\bar y[y(T-l)]) \le \varepsilon/2$, therefore for $y\in C_E^l(y(T-l))$ we have

$$\rho(x(T),y) \le \frac{\varepsilon}{2} + \rho(\bar y[y(T-l)],y) \le R + \varepsilon/2. \eqno(5.9.13)$$

From inequalities (5.9.11)-(5.9.13) it follows that

$$K(x_0,y_0;u_\varepsilon(\cdot),\nu_*(\cdot)) \ge \int_{C_E^l(y(T-l))} \rho(x(T),y)\,dQ - \varepsilon, \eqno(5.9.14)$$

but

$$\int_{C_E^l(y(T-l))} \rho(x(T),y)\,dQ = K(x_0,y_0;u_\varepsilon(\cdot),\nu(\cdot)). \eqno(5.9.15)$$

From formulas (5.9.14) and (5.9.15) we obtain the right-hand side of inequality (5.9.8).
This completes the proof of the theorem.
For $T < l$ the solution of the game does not differ essentially from the case $T \ge l$,
and the Theorem holds if we consider $C_E^T(y_0)$, $\overline C_E^T(y_0)$, $\gamma(M,T)$, $y_0$ instead of
$C_E^l(y(T-l))$, $\overline C_E^l(y(T-l))$, $\gamma(M,l)$, $y(T-l)$, respectively.
The diameter of the set $C_E^l(M)$ tends to zero as $l\to 0$, which is why the value of the
auxiliary game $\Gamma(M,l)$ also tends to zero. But the value of this auxiliary game is equal
to the value $V_l(x_0,y_0,T)$ of the game of pursuit with delayed information $\Gamma(x_0,y_0,T)$
(here the index $l$ indicates the information delay). The optimal mixed strategy for Player
E in $\Gamma(M,l)$, concentrating its mass on at most $n+1$ points from $C_E^l(M)$, concentrates
in the limit its entire mass at the one point $M$, i.e. it becomes a pure strategy. This agrees
with the fact that the game $\Gamma(x_0,y_0,T)$ becomes the game with perfect information
as $l\to 0$.
Example 7. Equations of motion are of the form

$$\dot x = u,\ \|u\|\le\alpha; \qquad \dot y = v,\ \|v\|\le\beta, \qquad \alpha > \beta, \quad x,y\in R^2.$$

Suppose the time $T$ satisfies the condition $T \ge \rho(x_0,y_0)/(\alpha-\beta) + l$. The reachability
set $C_E^l(y_0) = \overline C_E^l(y_0)$ and coincides with the circle of radius $\beta l$ with its center
at $y_0$. The value of the game $\Gamma(y,l)$ is equal to the radius of the circle $C_E^l(y)$, i.e.
$V(y,l) = \beta l$.
Since $V(y,l)$ is now independent of $y$, any point of the set $C_E^{T-l}(y_0)$ can be the
center of pursuit $M$. An optimal strategy for Player P in the game $\Gamma(y,l)$ is the choice
of the point $\bar y$, and an optimal strategy for Player E is mixed and is the choice of any
two diametrically opposite points of the circle $C_E^l(y)$ with probabilities $(1/2,1/2)$.
Accordingly, an optimal strategy for Pursuer in the game $\Gamma(x_0,y_0,T)$ is the linear
pursuit of the point $y(t-l)$ for $l \le t \le T$ (of the point $y_0$ for $0 \le t \le l$) until the
capture of this point; moreover, it must then remain in the $\varepsilon/2$-neighborhood of this point.
An optimal strategy for Player E (the mixed piecewise open-loop behavior strategy)
is the transition from the point $y_0$ to an arbitrary point $M\in C_E^{T-l}(y_0)$ during the
time $T-l$ and then the equiprobable choice of a direction towards one of the two
diametrically opposite points of the circle $C_E^l(M)$. In this case $\mathrm{Val}\,\Gamma(x_0,y_0,T) = \beta l$.
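A small numerical check (illustrative, not from the book) of the auxiliary game $\Gamma(y,l)$
in Example 7: against E's $(1/2,1/2)$ mix on two diametrically opposite points of a circle
of radius $r = \beta l$, every choice of P yields an expected distance of at least $r$, and the
center attains exactly $r$:

```python
import numpy as np

# The auxiliary game Gamma(y, l) of Example 7 over a disk of radius
# r = beta*l: E mixes 50/50 on two antipodal boundary points; P's expected
# payoff is then at least r everywhere and exactly r at the center.
r = 1.0
eta = np.array([[r, 0.0], [-r, 0.0]])   # E's two diametrically opposite points

def expected_dist(xi):
    return 0.5 * sum(np.linalg.norm(xi - p) for p in eta)

print(expected_dist(np.zeros(2)))       # r: the value of the game

# by the triangle inequality no choice of P does better; spot-check:
for xi in np.random.default_rng(0).normal(size=(5, 2)):
    assert expected_dist(xi) >= r - 1e-12
```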

5.10 Definition of cooperative differential game
in the characteristic function form
Consider the n-person differential game $\Gamma(x_0,T-t_0)$ from the initial state $x_0$ with
the duration $T-t_0$. The motion equations have the form

$$\dot x = f(x,u_1,\ldots,u_n), \quad u_i\in U_i\subset R^l,\ U_i\ \text{compact}, \quad i = 1,\ldots,n, \eqno(5.10.1)$$

$$x(t_0) = x_0. \eqno(5.10.2)$$

Here $u_i\in U_i$ is the control variable of player $i$. The payoff function of player $i$ is
defined in the following way:

$$K_i(x_0,T-t_0;u_1,\ldots,u_n) = \int_{t_0}^T h_i(x(t))\,dt + H_i(x(T)), \quad h_i \ge 0,\ H_i(x) \ge 0,\ i = 1,\ldots,n,$$
$$\eqno(5.10.3)$$

where $x(t)$ is the trajectory realized in the situation $(u_1,\ldots,u_n)$ from the initial
state $x_0$. In the cooperative differential game we consider only open-loop strategies
$u_i = u_i(t)$, $t\in[t_0,T]$, $i = 1,\ldots,n$ of the players.
5.10.1. Consider the cooperative form of the game $\Gamma(x_0,T-t_0)$. In this formalization,
as before, we suppose that before starting the game the players agree to
play $u_1^*,\ldots,u_n^*$ such that the corresponding trajectory $x^*(t)$ maximizes the sum of
the payoffs:

$$\max_{u_1,\ldots,u_n}\sum_{i=1}^n K_i(x_0,T-t_0;u_1,\ldots,u_n) = \sum_{i=1}^n K_i(x_0,T-t_0;u_1^*,\ldots,u_n^*)$$
$$= \sum_{i=1}^n\Big[\int_{t_0}^T h_i(x^*(t))\,dt + H_i(x^*(T))\Big] = v(N;x_0,T-t_0),$$

where $N$ is the set of all players in $\Gamma(x_0,T-t_0)$. The trajectory $x^*(t)$ is called
conditionally optimal. Let $S\subset N$, and let $v(S;x_0,T-t_0)$ be a characteristic function.
It follows from the superadditivity condition that it is advantageous for the players
to form the maximal coalition $N$ and obtain the maximal total payoff $v(N;x_0,T-t_0)$
that is possible in the game. Conceptually, the quantity $v(S;x_0,T-t_0)$ $(S\ne N)$ is
equal to the guaranteed payoff of the coalition $S$ obtained irrespective of the behavior
of the other players, even though the latter form a coalition $N\setminus S$ against $S$.
Note that the positiveness of the payoff functions $K_i$, $i = 1,\ldots,n$ implies that of the
characteristic function. From the superadditivity of $v$ it follows that $v(S';x_0,T-t_0) \ge
v(S;x_0,T-t_0)$ for any $S,S'\subset N$ such that $S\subset S'$, i.e. the superadditivity of the
function $v$ in $S$ implies that this function is monotone in $S$.
Since the essence of a cooperative game is the possibility of forming coalitions, and the
main problem therein is the distribution of the total payoff between the players, the
subject of cooperative theory is the characteristic function rather than the strategy. In fact,
the characteristic function displays the possibilities of coalitions in the best way and
can form the basis for an equitable distribution of the total payoff between the players.
The pair $(N, v(S;x_0,T-t_0))$, where $N$ is the set of players and $v$ the characteristic
function defined above, is called the cooperative differential game in the form of the
characteristic function $v$. For short, it will be denoted by $\Gamma_v(x_0,T-t_0)$.
5.10.2. Various methods for "equitable" distribution of the total profit between
players are treated as optimality principles in cooperative games. The set of such
distributions satisfying an optimality principle is called a solution to the cooperative
game (in the sense of this optimality principle). We will now define solutions of the
game $\Gamma_v(x_0,T-t_0)$.
Denote by $\xi_i$ the share of the player $i\in N$ in the total gain $v(N;x_0,T-t_0)$.
Definition. The vector $\xi = (\xi_1,\ldots,\xi_n)$ whose components satisfy the conditions:
1. $\xi_i \ge v(\{i\};x_0,T-t_0)$, $i\in N$,
2. $\sum_{i\in N}\xi_i = v(N;x_0,T-t_0)$,
is called an imputation in the game $\Gamma_v(x_0,T-t_0)$.
The equity of the distribution $\xi = (\xi_1,\ldots,\xi_n)$ representing an imputation is that
each player receives at least his safe payoff and the entire maximal payoff is divided
without a remainder.
5.10.3. Theorem. Suppose the function $w: 2^N\times R^n\times R^1\to R^1$ is additive
in $S\in 2^N$, i.e. for any $S,R\in 2^N$, $S\cap R = \varnothing$, we have $w(S\cup R;x_0,T-t_0) =
w(S;x_0,T-t_0) + w(R;x_0,T-t_0)$. Then in the game $\Gamma_w(x_0,T-t_0)$ there is a unique
imputation $\xi_i = w(\{i\};x_0,T-t_0)$, $i = 1,\ldots,n$.
Proof. From the additivity of $w$ we immediately obtain $w(N;x_0,T-t_0) =
w(\{1\};x_0,T-t_0) + \ldots + w(\{n\};x_0,T-t_0)$, whence follows the statement of the
theorem.
The game with an additive characteristic function is called inessential. In the essential
game $\Gamma_v(x_0,T-t_0)$ there is an infinite set of imputations. Indeed, any vector of the
form

$$(v(\{1\};x_0,T-t_0)+\alpha_1,\ldots,v(\{n\};x_0,T-t_0)+\alpha_n), \quad \alpha_i \ge 0,\ i\in N,$$
$$\sum_{i\in N}\alpha_i = v(N;x_0,T-t_0) - \sum_{i\in N} v(\{i\};x_0,T-t_0) > 0,$$

is an imputation. The imputation set in the game $\Gamma_v(x_0,T-t_0)$ is denoted by
$E_v(x_0,T-t_0)$.
5.10.4. Definition. We say that the imputation $\xi$ dominates the imputation $\eta$
in the coalition $S$ ($\xi \succ_S \eta$) if
1. $\xi_i > \eta_i$, $i\in S$;
2. $\sum_{i\in S}\xi_i \le v(S;x_0,T-t_0)$.
The imputation $\xi$ is said to dominate the imputation $\eta$ ($\xi\succ\eta$) if there is a coalition
$S\subset N$ such that $\xi \succ_S \eta$.
It follows from the definition of the imputation that domination in a single-element
coalition and in the coalition $N$ is not possible.
5.10.5. Definition. The set of nondominated imputations is called the core of
the game $\Gamma_v(x_0,T-t_0)$ and is denoted by $C_v(x_0,T-t_0)$.
The equity of an imputation belonging to the core is that none of the coalitions
can offer a reasonable alternative against this imputation.
5.10.6. Definition. The set $L_v(x_0,T-t_0)\subset E_v(x_0,T-t_0)$ is called the
Neumann-Morgenstern solution (the NM-solution) of the game $\Gamma_v(x_0,T-t_0)$ if:
1. $\xi,\eta\in L_v(x_0,T-t_0)$ implies $\xi\not\succ\eta$ ($\xi$ does not dominate $\eta$),
2. for $\eta\notin L_v(x_0,T-t_0)$ there is such $\xi\in L_v(x_0,T-t_0)$ that $\xi\succ\eta$.
As is seen from the Definition, the conditions placed on the imputations from
the NM-solution are weaker than those on the imputations from the core and, as
a result, the NM-solution always contains the core. Unlike the core and the NM-solution,
the Shapley value, representing an optimal distribution principle for the total
gain $v(N;x_0,T-t_0)$, is defined without using the concept of domination.
5.10.7. Definition. The vector $\Phi^v(x_0,T-t_0) = \{\Phi_i^v(x_0,T-t_0),\ i = 1,\ldots,n\}$
is called the Shapley value if it satisfies the following conditions:

1. if $v,w$ are two characteristic functions, then $\Phi^v(x_0,T-t_0) + \Phi^w(x_0,T-t_0) =
\Phi^{v+w}(x_0,T-t_0)$;

2. $\pi\Phi^v(x_0,T-t_0) = \Phi^{\pi v}(x_0,T-t_0)$, where $\pi$ is any permutation of the players,
$\pi\Phi^v(x_0,T-t_0) = \{\Phi_{\pi(i)}^v(x_0,T-t_0),\ i = 1,\ldots,n\}$, and $\pi v$
is the characteristic function such that for any coalition $S = \{i_1,\ldots,i_s\}$,
$\pi v(\{\pi(i_1),\ldots,\pi(i_s)\};x_0,T-t_0) = v(S;x_0,T-t_0)$;

3. $\sum_{i\in N}\Phi_i^v(x_0,T-t_0) = v(N;x_0,T-t_0)$;

4. if $v(S;x_0,T-t_0) - v(S\setminus\{i\};x_0,T-t_0) = 0$ for all $S\subset N$ $(S\ni i)$, then
$\Phi_i^v(x_0,T-t_0) = 0$.

As we have seen in Chap. 3, there exists a unique vector $\Phi^v(x_0,T-t_0)$ satisfying
these four conditions, and its components are computed by the formulas

$$\Phi_i^v(x_0,T-t_0) = \sum_{S\subset N\ (S\ni i)} \frac{(n-|S|)!\,(|S|-1)!}{n!}\,\big[v(S;x_0,T-t_0) - v(S\setminus\{i\};x_0,T-t_0)\big], \eqno(5.10.4)$$
$$i = 1,\ldots,n.$$

The components of the Shapley value have the meaning of the players' expected
shares in the total gain. Also, it may be shown that the Shapley value is an imputation.
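A short computational sketch of formula (5.10.4) (illustrative; the three-player characteristic
function below is made-up test data):

```python
from itertools import combinations
from math import factorial

# Shapley value by formula (5.10.4) for a characteristic function given as a
# dict over frozensets.
def shapley(v, players):
    n = len(players)
    phi = {}
    for i in players:
        total = 0.0
        others = [j for j in players if j != i]
        for r in range(n):                      # all coalitions S containing i
            for rest in combinations(others, r):
                S = frozenset(rest) | {i}
                s = len(S)
                weight = factorial(n - s) * factorial(s - 1) / factorial(n)
                total += weight * (v[S] - v[S - {i}])
        phi[i] = total
    return phi

# Made-up symmetric example: v(singleton) = 0, v(pair) = 1, v(N) = 3.
N = [1, 2, 3]
v = {frozenset(): 0.0}
for k in range(1, 4):
    for S in combinations(N, k):
        v[frozenset(S)] = {1: 0.0, 2: 1.0, 3: 3.0}[k]
print(shapley(v, N))   # symmetric game: each player gets 1.0
```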

5.11 Principle of dynamic stability (time-consistency)
5.11.1. Formalization of the notion of optimal behavior constitutes one of the fundamental
problems in the theory of n-person games. At present, for the various classes
of games different optimality principles are constructed. Some of them are stated in
Chap. 3. Recall that the players' behavior (strategies in noncooperative games or
imputations in cooperative games) satisfying some given optimality principle is called
a solution of the game in the sense of this principle and must possess two properties.
On the one hand, it must adequately reflect the conceptual notion of optimality, providing
for the special features of the class of games for which it is defined. On the other hand,
it must be feasible under the conditions of the game where it is applied. This property
reduces to the existence of the solution of the game generated by the specified principle
of optimality.
In dynamic games, one more requirement is naturally added to the above-mentioned
requirements, viz. the purposefulness and feasibility of an optimality
principle are to be preserved throughout the game. This requirement is called the
dynamic stability of a solution of the game (time consistency).
The dynamic stability of a solution of the differential game is the property that,
when the game proceeds along an "optimal" trajectory, at each instant of time the
players are to be guided by the same optimality principle, and hence do not have any
ground for deviation from the previously adopted "optimal" behavior throughout the
game. When dynamic stability is violated, at some instant of time there arise
conditions under which the continuation of the initial behavior becomes non-optimal
and hence the initially chosen principle of optimality proves to be unfeasible.
Assume that at the start of the game the players adopt an optimality principle
and construct a solution based on it (an imputation set satisfying the chosen principle
of optimality, say the core, NM-solution, etc.). From the definition of the cooperative
game it follows that the evolution of the game is to be along the trajectory providing
the maximal total payoff for the players. When moving, the players arrive at the
subgames featuring current initial states and current duration. In due course, not
only the conditions of the game and the players' opportunities, but even the players'
interests may change. Therefore, at an instant $t$ the initially optimal solution of the
current game may not exist or may not satisfy the players. Then, at the instant $t$, the
players will have no ground to keep to the initially chosen trajectory. The latter
exactly means the dynamic instability of the chosen optimality principle and, as a
result, the dynamic instability of the motion itself.
We now focus our attention on dynamically stable solutions in the cooperative
differential games with side payments.
5.11.2. Let an optimality principle be chosen in the game $\Gamma_v(x_0,T-t_0)$. The
solution of this game constructed in the initial state $x(t_0) = x_0$ based on the chosen
principle of optimality is denoted by $W_v(x_0,T-t_0)$. The set $W_v(x_0,T-t_0)$ is a
subset of the imputation set $E_v(x_0,T-t_0)$ in the game $\Gamma_v(x_0,T-t_0)$. Assume that
$W_v(x_0,T-t_0)\ne\varnothing$. Let $\bar x(t)$, $t\in[t_0,T]$ be the conditionally optimal trajectory.
Definition. Any trajectory $\bar x(\cdot)$ of the system (5.10.1)-(5.10.2) such that

$$\sum_{i\in N} K_i(\bar x(\cdot)) = v(N;x_0,T-t_0)$$

is called a conditionally optimal trajectory in the game $\Gamma_v(x_0,T-t_0)$.
The definition suggests that along the conditionally optimal trajectory the players
obtain the largest total payoff. For simplicity, we assume henceforth that such a
trajectory exists. In the absence of the conditionally optimal trajectory we may introduce
the notion of an "$\varepsilon$-conditionally optimal trajectory" and carry out the necessary
constructions with an accuracy $\varepsilon$.
5.11.3. We will now consider the behavior of the set $W_v(x_0,T-t_0)$ along the
conditionally optimal trajectory $\bar x(t)$. Towards this end, in each current state $\bar x(t)$
the current subgame $\Gamma_v(\bar x(t),T-t)$ is defined as follows. In the state $\bar x(t)$, we define
the characteristic function

$$v(S;\bar x(t),T-t) = \begin{cases} 0, & \text{if } S = \varnothing, \\ \mathrm{Val}\,\Gamma_S(\bar x(t),T-t), & \text{if } S\subset N\ (\varnothing\ne S\ne N), \\ \max_{u_N(\cdot)[t,T]\in D_N[t,T]} K_N(\bar x(t),u_N(\cdot)[t,T]), & \text{if } S = N. \end{cases}$$

Here $K_N(\bar x(t),u_N(\cdot)[t,T])$ is the remaining total payoff of the players from the
state $\bar x(t)$ on the conditionally optimal trajectory, i.e.

$$K_N(\bar x(t),u_N(\cdot)[t,T]) = \sum_{i\in N}\Big[\int_t^T h_i(\bar x(\tau))\,d\tau + H_i(\bar x(T))\Big];$$

$\mathrm{Val}\,\Gamma_S(\bar x(t),T-t)$ is the value of the zero-sum differential game $\Gamma_S(\bar x(t),T-t)$
between the coalitions $S$ and $N\setminus S$ which is described by the equation $\dot x = f(x,u_S,u_{N\setminus S})$
from the initial state $\bar x(t)$ and with duration $T-t$. Here the payoff of coalition
$S$ (the maximizer) in each situation $(u_S(\cdot)[t,T],u_{N\setminus S}(\cdot)[t,T])$, $u_S(\cdot)[t,T]\in D_S[t,T]$,
$u_{N\setminus S}(\cdot)[t,T]\in D_{N\setminus S}[t,T]$, equals

$$K_S(\bar x(t),u_S(\cdot)[t,T],u_{N\setminus S}(\cdot)[t,T]) = \sum_{i\in S} K_i(\bar x(t),u_S(\cdot)[t,T],u_{N\setminus S}(\cdot)[t,T]).$$

The payoff of the player $N\setminus S$ is set equal to $-K_S$. The truncations of the admissible
strategy $u_S(\cdot)$ and of the admissible strategy set $D_S$ of the coalition $S$ to the time
interval $[t,T]$ are denoted by $u_S(\cdot)[t,T]$ and $D_S[t,T]$, respectively. Since $\Gamma_S(\bar x(t),T-t)$
is a zero-sum differential game, the strategy sets $D_S[t,T]$, $D_{N\setminus S}[t,T]$ may be any of
the strategy sets defined for such games in Sec. 5.1. So are also the sets $D_S$ and $D_{N\setminus S}$
in $\Gamma_S(x_0,T-t_0)$.
The current subgame $\Gamma_v(\bar x(t),T-t)$ is defined as $(N, v(S;\bar x(t),T-t))$. The imputation
set in the game $\Gamma_v(\bar x(t),T-t)$ is of the form:

$$E_v(\bar x(t),T-t) = \Big\{\xi\in R^n\,\Big|\,\xi_i \ge v(\{i\};\bar x(t),T-t),\ i = 1,\ldots,n;\ \sum_{i\in N}\xi_i = v(N;\bar x(t),T-t)\Big\},$$

where

$$v(N;\bar x(t),T-t) = v(N;x_0,T-t_0) - \sum_{i\in N}\int_{t_0}^t h_i(\bar x(\tau))\,d\tau.$$

The quantity

$$\sum_{i\in N}\int_{t_0}^t h_i(\bar x(\tau))\,d\tau$$

is interpreted as the total gain of the players on the time interval $[t_0,t]$ when the
motion is carried out along the trajectory $\bar x(\cdot)$.
5.11.4. Consider the family of current games

$$\{\Gamma_v(\bar x(t),T-t) = (N, v(S;\bar x(t),T-t))\}, \quad t_0 \le t \le T,$$

determined along the conditionally optimal trajectory $\bar x(\cdot)$, and their solutions
$W_v(\bar x(t),T-t)\subset E_v(\bar x(t),T-t)$ generated by the same principle of optimality
as the initial solution $W_v(x_0,T-t_0)$.
Lemma. The set $W_v(\bar x(T),0)$ is a solution of the current game $\Gamma_v(\bar x(T),0)$
and is composed of the only imputation $H(\bar x(T)) = \{H_i(\bar x(T)),\ i = 1,\ldots,n\}$, where
$H_i(\bar x(T))$ is the terminal part of the player $i$'s payoff along the trajectory $\bar x(\cdot)$.
Proof. Since the game $\Gamma_v(\bar x(T),0)$ is of zero duration, for all $i\in N$,
$v(\{i\};\bar x(T),0) = H_i(\bar x(T))$. Hence

$$\sum_{i\in N} v(\{i\};\bar x(T),0) = \sum_{i\in N} H_i(\bar x(T)) = v(N;\bar x(T),0),$$

i.e. the characteristic function of the game $\Gamma_v(\bar x(T),0)$ is additive in $S$ and, by the
Theorem in 5.10.3,

$$E_v(\bar x(T),0) = H(\bar x(T)) = W_v(\bar x(T),0).$$

This completes the proof of the lemma.
5.11.5. Dynamic stability of a solution. Let the conditionally optimal trajectory
$\bar x(\cdot)$ be such that $W_v(\bar x(t),T-t)\ne\varnothing$, $t_0 \le t \le T$. If this condition is not satisfied,
it is impossible for the players to adhere to the chosen principle of optimality, since at
the very first instant $t$ when $W_v(\bar x(t),T-t) = \varnothing$ the players have no possibility to
follow this principle. Assume that in the initial state $x_0$ the players agree upon the
imputation $\xi\in W_v(x_0,T-t_0)$. This means that in the state $x_0$ the players agree
upon such a distribution of the gain that (when the game terminates at
the instant $T$) the share of the $i$th player is equal to $\xi_i$, i.e. the $i$th component of the
imputation $\xi$. Suppose the player $i$'s payoff (his share) on the time interval $[t_0,t]$
is $\gamma_i(\bar x(t))$. Then, on the remaining time interval $[t,T]$, according to the $\xi$, he is to
receive the gain $\eta_i^t = \xi_i - \gamma_i(\bar x(t))$. For the original agreement (the imputation $\xi$) to
remain in force at the instant $t$, it is essential that the vector $\eta^t = (\eta_1^t,\ldots,\eta_n^t)$ belong
to the set $W_v(\bar x(t),T-t)$, i.e. to a solution of the current game $\Gamma_v(\bar x(t),T-t)$. If such
a condition is satisfied at each instant of time $t\in[t_0,T]$ along the trajectory $\bar x(\cdot)$,
then the imputation $\xi$ is realized. Such is the conceptual meaning of the dynamic
stability of the sharing.
Along the trajectory $\bar x(\cdot)$ on the time interval $[t,T]$, $t_0 \le t \le T$, the coalition $N$
obtains the payoff

$$v(N;\bar x(t),T-t) = \sum_{i\in N}\Big[\int_t^T h_i(\bar x(\tau))\,d\tau + H_i(\bar x(T))\Big].$$

Then the difference

$$v(N;x_0,T-t_0) - v(N;\bar x(t),T-t) = \sum_{i\in N}\int_{t_0}^t h_i(\bar x(\tau))\,d\tau$$

is equal to the payoff the coalition $N$ obtains on the time interval $[t_0,t)$. The share
of the $i$th player in this payoff, considering the transferability of payoffs, may be
represented as

$$\gamma_i(\bar x(t),\beta) = \int_{t_0}^t \beta_i(\tau)\sum_{j\in N} h_j(\bar x(\tau))\,d\tau, \eqno(5.11.1)$$

where $\beta_i(\tau)$ is a $[t_0,T]$ integrable function satisfying the condition

$$\sum_{i=1}^n\beta_i(\tau) = 1, \quad \beta_i(\tau) \ge 0, \quad t_0 \le \tau \le T \quad (i = 1,\ldots,n).$$

From (5.11.1) we necessarily get

$$\frac{d\gamma_i}{dt} = \beta_i(t)\sum_{j\in N} h_j(\bar x(t)).$$

This quantity may be interpreted as the instantaneous gain of the player $i$ at the
moment $t$. Hence it is clear that the vector $\beta(t) = (\beta_1(t),\ldots,\beta_n(t))$ prescribes a
distribution of the total gain among the members of the coalition $N$. By properly choosing
$\beta(t)$, the players can ensure the desirable outcome, i.e. regulate the players' receipt of
the gain with respect to time, so that at each instant $t\in[t_0,T]$ there will be no objection
against realization of the original agreement (the imputation $\xi$).
Definition. The imputation $\xi\in W_v(x_0,T-t_0)$ is called dynamically stable in the
game $\Gamma_v(x_0,T-t_0)$ if the following conditions are satisfied:
1. there exists a conditionally optimal trajectory $\bar x(\cdot)$ along which

$$W_v(\bar x(t),T-t)\ne\varnothing, \quad t_0 \le t \le T;$$

2. there exists a $[t_0,T]$ integrable function $\beta(t) = (\beta_1(t),\ldots,\beta_n(t))$ such that for each
$t_0 \le t \le T$, $\beta_i(t) \ge 0$, $\sum_{i=1}^n\beta_i(t) = 1$ and

$$\xi\in\bigcap_{t_0\le t\le T}\big[\gamma(\bar x(t),\beta)\oplus W_v(\bar x(t),T-t)\big], \eqno(5.11.2)$$

where $\gamma(\bar x(t),\beta) = (\gamma_1(\bar x(t),\beta),\ldots,\gamma_n(\bar x(t),\beta))$, and $W_v(\bar x(t),T-t)$ is a solution
of the current game $\Gamma_v(\bar x(t),T-t)$.
The cooperative differential game $\Gamma_v(x_0,T-t_0)$ with side payments has a dynamically
stable solution $W_v(x_0,T-t_0)$ if all of the imputations $\xi\in W_v(x_0,T-t_0)$ are dynamically
stable.
The conditionally optimal trajectory along which there exists a dynamically stable
solution of the game $\Gamma_v(x_0,T-t_0)$ is called an optimal trajectory.
If there exists at least one dynamically stable imputation $\xi\in W_v(x_0,T-t_0)$, but
not all of the imputations from the set $W_v(x_0,T-t_0)$ have such a property, then we
may speak of a partial dynamic stability of the solution $W_v(x_0,T-t_0)$ of the game
$\Gamma_v(x_0,T-t_0)$.
The sum $\oplus$ in the above Definition has the following meaning: for $\eta\in R^n$ and
$A\subset R^n$, $\eta\oplus A = \{\eta + a\,|\,a\in A\}$.
From the definition of the dynamic stability at the instant $t = T$ we have
$\xi\in\gamma(\bar x(T),\beta)\oplus W_v(\bar x(T),0)$, where $W_v(\bar x(T),0)$ is a solution of the current game
$\Gamma_v(\bar x(T),0)$ and is made up of the only imputation $\xi^T = H(\bar x(T))$; the sharing may
be represented as $\xi = \gamma(\bar x(T),\beta) + H(\bar x(T))$, or

$$\xi = \int_{t_0}^T\beta(\tau)\sum_{i\in N} h_i(\bar x(\tau))\,d\tau + H(\bar x(T)).$$

The dynamic stable imputation $\xi\in W_v(x_0,T-t_0)$ may be realized as follows. From (5.11.2) at any instant $t_0\le t\le T$ we have
$$\xi\in\left[\gamma(\bar{x}(t),\beta)\oplus W_v(\bar{x}(t),T-t)\right], \qquad (5.11.3)$$
where
$$\gamma(\bar{x}(t),\beta) = \int_{t_0}^t \beta(\tau)\sum_{i\in N} h_i(\bar{x}(\tau))\,d\tau$$
is the payoff vector on the time interval $[t_0,t]$, the player $i$'s share in the gain on the same interval being
$$\gamma_i(\bar{x}(t),\beta) = \int_{t_0}^t \beta_i(\tau)\sum_{i\in N} h_i(\bar{x}(\tau))\,d\tau.$$

When the game proceeds along the optimal trajectory, the players on each time interval $[t_0,t]$ share the total gain
$$\sum_{i\in N}\int_{t_0}^t h_i(\bar{x}(\tau))\,d\tau$$
among themselves in such a way that the inclusion
$$\xi - \gamma(\bar{x}(t),\beta)\in W_v(\bar{x}(t),T-t) \qquad (5.11.4)$$
is satisfied. Furthermore, (5.11.4) implies the existence of such a vector $\xi^t\in W_v(\bar{x}(t),T-t)$ that $\xi = \gamma(\bar{x}(t),\beta)+\xi^t$. That is, in the described method of choosing $\beta(\tau)$, the vector of the gains to be obtained by the players at the remaining stage of the game
$$\xi^t = \xi - \gamma(\bar{x}(t),\beta) = \int_t^T \beta(\tau)\sum_{i\in N} h_i(\bar{x}(\tau))\,d\tau + H(\bar{x}(T))$$

belongs to the set $W_v(\bar{x}(t),T-t)$. Geometrically, this means that by varying the vector $\gamma(\bar{x}(t),\beta) = (\gamma_1(\bar{x}(t),\beta),\ldots,\gamma_n(\bar{x}(t),\beta))$, restricted by the only condition
$$\sum_{i\in N}\gamma_i(\bar{x}(t),\beta) = \int_{t_0}^t \sum_{i\in N} h_i(\bar{x}(\tau))\,d\tau,$$
the players ensure displacement of the set $\gamma(\bar{x}(t),\beta)\oplus W_v(\bar{x}(t),T-t)$ in such a way that the inclusion (5.11.3) is satisfied.
In general, it is fairly easy to see that there may exist an infinite number of vectors $\beta(\tau)$ satisfying conditions (5.11.3), (5.11.4). Therefore the sharing method proposed here seems to lack true uniqueness. However, for any vector $\beta(\tau)$ satisfying conditions (5.11.3)-(5.11.4), at each time instant $t_0\le t\le T$ the players are guided by the imputation $\xi^t\in W_v(\bar{x}(t),T-t)$ and the same optimality principle throughout the game, and hence have no reason to violate the previously concluded agreement.
Let us make the following additional assumptions:

a) the set $W_v(\bar{x}(t),T-t)$ is continuously dependent on $\bar{x}(t),t$ in the Hausdorff metric;

b) the vector $\xi^t\in W_v(\bar{x}(t),T-t)$ may be chosen as a continuously differentiable monotone nonincreasing function of the argument $t$.

We shall show that by properly choosing $\beta(t)$ we may always ensure dynamic stability of the sharing $\xi\in W_v(x_0,T-t_0)$ under assumptions a), b) and the first condition of the Definition (i.e. along the conditionally optimal trajectory at each time instant $t_0\le t\le T$, $W_v(\bar{x}(t),T-t)\ne\emptyset$).
We choose $\xi^t\in W_v(\bar{x}(t),T-t)$ to be a continuously differentiable monotone nonincreasing function of $t$, $t_0\le t\le T$, and construct the difference $\xi-\xi^t = \alpha(t)$, so that $\xi = \xi^t+\alpha(t)\in W_v(x_0,T-t_0)$. Let $\beta(t) = (\beta_1(t),\ldots,\beta_n(t))$ be the $[t_0,T]$ integrable vector function satisfying condition (5.11.4). Solve the equation (with respect to $\beta(t)$)
$$\int_{t_0}^t \beta_i(\tau)\sum_{i\in N} h_i(\bar{x}(\tau))\,d\tau = \alpha_i(t):$$
$$\beta_i(t) = \frac{d\alpha_i(t)/dt}{\sum_{i\in N} h_i(\bar{x}(t))} = -\frac{d\xi_i^t/dt}{\sum_{i\in N} h_i(\bar{x}(t))}. \qquad (5.11.5)$$
Make sure that for such $\beta(t)$ the condition (5.11.4) is satisfied. Indeed,
$$\sum_{i\in N}\beta_i(t) = -\frac{\frac{d}{dt}\,v(N;\bar{x}(t),T-t)}{\sum_{i\in N} h_i(\bar{x}(t))} = -\frac{\frac{d}{dt}\left[\sum_{i\in N}\left(\int_t^T h_i(\bar{x}(\tau))\,d\tau + H_i(\bar{x}(T))\right)\right]}{\sum_{i\in N} h_i(\bar{x}(t))} = \frac{\sum_{i\in N} h_i(\bar{x}(t))}{\sum_{i\in N} h_i(\bar{x}(t))} = 1$$
(since $\sum_{i\in N}\xi_i^t = v(N;\bar{x}(t),T-t)$).
From condition (5.10.3) we have $h_i, H_i\ge 0$, $i\in N$, and since $d\xi_i^t/dt\le 0$, then $\beta_i(\tau)\ge 0$.
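The passage above is effectively an algorithm: differentiate the chosen selector $\xi^t$ and normalize by the current total gain rate. A minimal numerical sketch follows (the selector, the $h_i$ and the grid are invented for illustration; here the selector is taken to be the vector of remaining gains, one valid nonincreasing choice):

import numpy as np

t0, T, m = 0.0, 1.0, 2001
t = np.linspace(t0, T, m)
dt = t[1] - t[0]
h = np.vstack([1.0 + t, 2.0 - t])      # assumed gain rates h_i(x-bar(t))
H_T = np.array([0.5, 0.5])             # assumed terminal payoffs H_i(x-bar(T))

# Assumed selector: xi^t_i = int_t^T h_i dtau + H_i, nonincreasing in t;
# it plays the role of xi^t in (5.11.5).
tail = (h.sum(axis=1, keepdims=True) - np.cumsum(h, axis=1)) * dt
xi_t = tail + H_T[:, None]

dxi = np.gradient(xi_t, dt, axis=1)    # d xi^t_i / dt  (nonpositive)
beta = -dxi / h.sum(axis=0)            # formula (5.11.5)
assert np.all(beta >= -1e-9)
assert np.allclose(beta.sum(axis=0), 1.0, atol=1e-3)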
Thus, if along the conditionally optimal trajectory all current games have nonempty solutions possessing properties a), b), then the original game $\Gamma_v(x_0,T-t_0)$ has a dynamic stable solution. Theoretically, the main problem is to study the conditions imposed on the vector function $\beta(t)$ in order to ensure dynamic stability of specific forms of solutions $W_v(x_0,T-t_0)$ in various classes of games. In what follows we shall try to make a classification of dynamic stable solutions.
Consider the new concept of strongly dynamic stability and define dynamic stable solutions for cooperative games with terminal payoffs.
5.11.6. Strongly-dynamic stable solution. For the dynamic stable imputation $\xi\in W_v(x_0,T-t_0)$, as follows from the Definition, for $t_0\le t\le T$ there exist such a $[t_0,T]$ integrable vector function $\beta(t)$ and an imputation $\xi^t$ (generally nonunique) from the solution $W_v(\bar{x}(t),T-t)$ of the current game $\Gamma_v(\bar{x}(t),T-t)$ that $\xi = \gamma(\bar{x}(t),\beta)+\xi^t$. The conditions of dynamic stability do not affect the imputations from the set $W_v(\bar{x}(t),T-t)$ which fail to satisfy this equation. Furthermore, of interest is the case where any imputation from the current solution $W_v(\bar{x}(t),T-t)$ may provide a "good" continuation of the original agreement, i.e. for a dynamic stable imputation $\xi\in W_v(x_0,T-t_0)$, at any instant $t_0\le t\le T$ and for every $\xi^t\in W_v(\bar{x}(t),T-t)$ the condition $\gamma(\bar{x}(t),\beta)+\xi^t\in W_v(x_0,T-t_0)$, where $\gamma(\bar{x}(T),\beta)+H(\bar{x}(T)) = \xi$, be satisfied. By slightly strengthening this requirement, we obtain a qualitatively new dynamic stability concept for the solution $W_v(x_0,T-t_0)$ of the game $\Gamma_v(x_0,T-t_0)$ and call it strongly dynamic stability.
Definition. The imputation $\xi\in W_v(x_0,T-t_0)$ is called strongly-dynamic stable in the game $\Gamma_v(x_0,T-t_0)$ if the following conditions are satisfied:

1. the imputation $\xi$ is dynamic stable;

2. for any $t_0\le t_1\le t_2\le T$ and $\beta(t)$ corresponding to the imputation $\xi$ according to (5.11.2),
$$\gamma(\bar{x}(t_2),\beta)\oplus W_v(\bar{x}(t_2),T-t_2)\subset\gamma(\bar{x}(t_1),\beta)\oplus W_v(\bar{x}(t_1),T-t_1). \qquad (5.11.6)$$

The cooperative differential game $\Gamma_v(x_0,T-t_0)$ with side payments has a strongly-dynamic stable solution $W_v(x_0,T-t_0)$ if all the imputations from $W_v(x_0,T-t_0)$ are strongly-dynamic stable.
The conditionally optimal trajectory along which there exists a strongly dynamic stable solution of the game $\Gamma_v(x_0,T-t_0)$ is called a strongly optimal trajectory.
If there exists at least one strongly-dynamic stable imputation $\xi\in W_v(x_0,T-t_0)$, but not all of the imputations from the set $W_v(x_0,T-t_0)$ have such a property, then we are dealing with a partial strongly dynamic stability of the solution $W_v(x_0,T-t_0)$ of the game $\Gamma_v(x_0,T-t_0)$.
The dynamic instability of the solution of the cooperative differential game leads to abandonment of the optimality principle generating this solution, since none of the imputations from the set $W_v(x_0,T-t_0)$ remains optimal until the game terminates. Therefore, the set $W_v(x_0,T-t_0)$ may be called a solution to the game $\Gamma_v(x_0,T-t_0)$ only if it is dynamic stable. Otherwise the game $\Gamma_v(x_0,T-t_0)$ is assumed to have no solution.
5.11.7. Terminal payoffs. In (5.10.3) let $h_i\equiv 0$, $i = 1,\ldots,n$. The cooperative differential game with terminal payoffs is denoted by the same symbol $\Gamma_v(x_0,T-t_0)$. In such games the payoffs are made when the game terminates.
Denote by $C^{T-t_0}(x_0)$ the set of points $y\in R^m$ for which there exists an open-loop control $(u_1(t),\ldots,u_n(t))$, $u_i(t)\in U_i$, $i = 1,\ldots,n$, transferring (because of system (5.10.2)) a phase point from the initial state $x_0$ to the point $y$ in time $T-t_0$. The set $C^{T-t_0}(x_0)$ is called the reachability set in the game $\Gamma_v(x_0,T-t_0)$.
It is naturally assumed that in the game with terminal payoffs
$$v(N;x_0,T-t_0) = \max_{x\in C^{T-t_0}(x_0)} H_N(x) = H_N(x^*), \qquad (5.11.7)$$

where
$$H_N(x) = \sum_{i\in N} H_i(x)$$
(for simplicity, assume that the maximum in (5.11.7) is achieved; otherwise the constructions become somewhat complicated and we have to deal with the $\epsilon$-dynamic stability).
Definition. Any trajectory $\bar{x}(\cdot)$ of system (5.10.1)-(5.10.2) such that $\bar{x}(T) = x^*$ is called a conditionally optimal trajectory in the cooperative differential game $\Gamma_v(x_0,T-t_0)$ with terminal payoffs.
The definition of a dynamic stable imputation from the solution $W_v(x_0,T-t_0)$ is obtained as a special case of the definition from 5.11.5. Since the games with terminal payoffs are frequently encountered in this book, this definition is provided separately.
Consider the current games $\Gamma_v(\bar{x}(t),T-t)$, $t_0\le t\le T$, along a conditionally optimal trajectory $\bar{x}(\cdot)$. As before, their solutions are denoted by $W_v(\bar{x}(t),T-t)\subset E_v(\bar{x}(t),T-t)$, $t_0\le t\le T$. The game $\Gamma_v(\bar{x}(t),T-t)$ is of duration $T-t$, has the initial state $\bar{x}(t)$, and the payoff functions therein are defined just as in the game $\Gamma_v(x_0,T-t_0)$. Note that, with the motion along the conditionally optimal trajectory, at each time instant $t_0\le t\le T$ the point $x^*$ remains in the reachability region
$$C^{T-t}(\bar{x}(t)) = \{y\in R^m \mid x(T;\bar{x}(t),u_1(\cdot),\ldots,u_n(\cdot)) = y,\ u_i(\cdot)\in U_i,\ i = 1,\ldots,n\},$$
and at the instant $t = T$, $C^0(\bar{x}(T)) = x^*$.


Definition. The imputation $\xi\in W_v(x_0,T-t_0)$ is called dynamic stable in the game with terminal payoffs if:

1. there is a conditionally optimal trajectory $\bar{x}(\cdot)$ along which
$$W_v(\bar{x}(t),T-t)\ne\emptyset, \quad t_0\le t\le T;$$

2. $\xi\in\bigcap_{t_0\le t\le T} W_v(\bar{x}(t),T-t)$.

The conditionally optimal trajectory along which there exists a dynamic stable imputation $\xi\in W_v(x_0,T-t_0)$ is called an optimal trajectory.
Theorem. In the cooperative differential game $\Gamma_v(x_0,T-t_0)$ with terminal payoffs $H_i(x(T))$, $i = 1,\ldots,n$, only the vector $H(x^*) = \{H_i(x^*),\ i = 1,\ldots,n\}$, whose components are equal to the players' payoffs at the end point of the conditionally optimal trajectory, may be dynamic stable.
Proof. It follows from the dynamic stability of the imputation $\xi\in W_v(x_0,T-t_0)$ that
$$\xi\in\bigcap_{t_0\le t\le T} W_v(\bar{x}(t),T-t).$$
But since the current game $\Gamma_v(\bar{x}(T),0)$ is of zero duration, therein $E_v(\bar{x}(T),0) = W_v(\bar{x}(T),0) = H(\bar{x}(T)) = H(x^*)$. Hence
$$\bigcap_{t_0\le t\le T} W_v(\bar{x}(t),T-t) = H(x^*),$$
i.e. $\xi = H(x^*)$ and there are no other imputations.


Theorem. For the existence of the dynamic stable solution in the game with terminal payoffs it is necessary and sufficient that for all $t_0\le t\le T$
$$H(x^*)\in W_v(\bar{x}(t),T-t),$$
where $H(x^*)$ is the players' payoff vector at the end point of the conditionally optimal trajectory $\bar{x}(\cdot)$, with $W_v(\bar{x}(t),T-t)$, $t_0\le t\le T$, being the solutions of the current games along the conditionally optimal trajectory generated by the chosen principle of optimality.
This theorem is a corollary of the previous one.
Thus, if in the game with terminal payoffs there is a dynamic stable imputation, then the players in the initial state $x_0$ have to agree upon realization of the vector (imputation) $H(x^*)\in W_v(x_0,T-t_0)$ and, with the motion along the optimal trajectory $\bar{x}(\cdot)$, at each time instant $t_0\le t\le T$ this imputation $H(x^*)$ belongs to the solution of the current games $\Gamma_v(\bar{x}(t),T-t)$.
As the Theorem shows, in the game with terminal payoffs only a unique imputation from the set $W_v(x_0,T-t_0)$ may be dynamic stable. Therefore, in such games there is no point in discussing either the dynamic stability of the solution $W_v(x_0,T-t_0)$ as a whole or its strong dynamic stability.

5.12 Integral optimality principles

As we have seen earlier, most of the optimality principles (OP) in n-person differential game theory are taken from the classical (static) theory and are dynamic unstable (time inconsistent), or strongly dynamic unstable (strongly time inconsistent). In this chapter, for the cooperative case, we try to propose a family of strongly time consistent optimality principles (STCOP) constructed on the basis of integration of the locally optimal behavior in the classical sense. Using the regularizing procedure we get new STCOPs based on the core, the Shapley value, and the NM-solution.
Consider the n-person differential game $\Gamma(x_0,T-t_0)$
$$\dot{x} = f(x,u_1,\ldots,u_n), \quad u_i\in U_i\subset\text{Comp}\,R^l,$$
with integral payoffs
$$K_i(x_0,T-t_0;u_1,\ldots,u_n) = \int_{t_0}^T h_i(x(t))\,dt, \quad h_i\ge 0, \quad i = 1,\ldots,n.$$

5.12.1. Cooperative form of $\Gamma(x_0,T-t_0)$. As is assumed in cooperative game theory, we suppose that the players before starting the game agree to play $u_1^*,\ldots,u_n^*$ such that the corresponding trajectory maximizes the sum of the payoffs
$$\max_{u_1,\ldots,u_n}\sum_{i=1}^n K_i(x_0,T-t_0;u_1,\ldots,u_n) = \sum_{i=1}^n K_i(x_0,T-t_0;u_1^*,\ldots,u_n^*) = \sum_{i=1}^n\int_{t_0}^T h_i(x^*(t))\,dt = v(N;x_0,T-t_0), \qquad (5.12.1)$$

where $N$ is the set of all players in $\Gamma(x_0,T-t_0)$. The trajectory $x^*(t)$ is called optimal. Let $v(S;x_0,T-t_0)$ be the characteristic function ($S\subset N$) and $C(x_0,T-t_0)$ the core. Consider the family of subgames $\Gamma(x^*(t),T-t)$ along $x^*(t)$, $t\in[t_0,T]$, the corresponding cores $C(x^*(t),T-t)$ (which are supposed to be nonvoid) and c.f. $v(S;x^*(t),T-t)$. The core $C(x_0,T-t_0)$ is strongly time inconsistent, and moreover in all nontrivial cases even time inconsistent. But using the c.f. $v(S;x^*(t),T-t)$ and $C(x^*(t),T-t)$, $t\in[t_0,T]$, we shall construct a new c.f. and, based on it, a new strongly time consistent (STC) optimality principle (OP).
Let us introduce the following function
$$\bar{v}(S;x_0,T-t_0) = \frac{1}{T-t_0}\int_{t_0}^T\left[\sum_{i\in S}\int_{t_0}^t h_i(x^*(\tau))\,d\tau + v(S;x^*(t),T-t)\right]dt, \qquad (5.12.2)$$
where $S\subset N$, $x^*(\tau)$ is the optimal trajectory from (5.12.1), and $v(S;x^*(t),T-t)$, $t\in[t_0,T]$, is the c.f. in the subgame $\Gamma(x^*(t),T-t)$. We suppose $v(S;x^*(t),T-t)$ integrable on $[t_0,T]$.
We have for $S = N$
$$\bar{v}(N;x_0,T-t_0) = \frac{1}{T-t_0}\int_{t_0}^T\left[\sum_{i\in N}\int_{t_0}^t h_i(x^*(\tau))\,d\tau + v(N;x^*(t),T-t)\right]dt \qquad (5.12.3)$$
$$= \frac{1}{T-t_0}\int_{t_0}^T v(N;x_0,T-t_0)\,dt = v(N;x_0,T-t_0),$$
because along the optimal trajectory $x^*(\tau)$, $\tau\in[t_0,T]$, the Bellman optimality principle holds for the function $v(N;x^*(t),T-t)$, i.e.
$$v(N;x_0,T-t_0) = \sum_{i\in N}\int_{t_0}^t h_i(x^*(\tau))\,d\tau + v(N;x^*(t),T-t), \quad t\in[t_0,T].$$
It is easily seen that
$$\bar{v}(S_1\cup S_2;x_0,T-t_0)\ge\bar{v}(S_1;x_0,T-t_0)+\bar{v}(S_2;x_0,T-t_0) \qquad (5.12.4)$$
for $S_1\cap S_2 = \emptyset$, $S_1\subset N$, $S_2\subset N$. From the superadditivity of the c.f. we have for all $t\in[t_0,T]$
$$v(S_1;x^*(t),T-t)+v(S_2;x^*(t),T-t)\le v(S_1\cup S_2;x^*(t),T-t). \qquad (5.12.5)$$
Adding to both sides of (5.12.5)
$$\sum_{i\in S_1}\int_{t_0}^t h_i(x^*(\tau))\,d\tau + \sum_{i\in S_2}\int_{t_0}^t h_i(x^*(\tau))\,d\tau$$
and integrating on $[t_0,T]$ we get (5.12.4).
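The averaging in (5.12.2) is straightforward to carry out numerically. The sketch below is a toy three-player example (the gain rates $h_i$ and the current c.f. $v(S;x^*(t),T-t)$ are invented stand-ins, not derived from a real game); it computes $\bar{v}(S)$ by quadrature and checks superadditivity (5.12.4) for a disjoint pair of coalitions.

import numpy as np

t0, T, m = 0.0, 2.0, 2001
t = np.linspace(t0, T, m)
dt = t[1] - t[0]
h = np.vstack([1.0 + 0 * t, 0.5 + t, 2.0 - 0.5 * t])  # assumed h_i(x*(t))

def v_current(S, tk):
    # Assumed superadditive current c.f.: an additive term for the
    # remaining gains plus a synergy term growing with |S|.
    rem = sum((T - tk) * h[i].mean() for i in S)
    return rem + 0.1 * len(S) ** 2

def v_bar(S):
    # (5.12.2): time-average of accumulated coalition gain + current c.f.
    acc = np.cumsum(h[list(S)].sum(axis=0)) * dt  # sum_{i in S} int_{t0}^t h_i
    vals = acc + np.array([v_current(S, tk) for tk in t])
    return vals.mean()  # approximates (1/(T-t0)) int_{t0}^T [...] dt

S1, S2 = (0,), (1, 2)
assert v_bar(S1 + S2) >= v_bar(S1) + v_bar(S2) - 1e-9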


Thus from (5.12.3) and (5.12.4) it follows that $\bar{v}(S;x_0,T-t_0)$, $S\subset N$, is a c.f. in the game $\Gamma(x_0,T-t_0)$. Define now the analogue of $\bar{v}(S;x_0,T-t_0)$, $S\subset N$, for the subgames $\Gamma(x^*(\Theta),T-\Theta)$, $\Theta\in[t_0,T]$. Let
$$\bar{v}(N;x^*(\Theta),T-\Theta) = \frac{1}{T-t_0}\int_\Theta^T\left[\sum_{i\in N}\int_\Theta^t h_i(x^*(\tau))\,d\tau + v(N;x^*(t),T-t)\right]dt \qquad (5.12.6)$$
and
$$\bar{v}(S;x^*(\Theta),T-\Theta) = \frac{1}{T-t_0}\int_\Theta^T\left[\sum_{i\in S}\int_\Theta^t h_i(x^*(\tau))\,d\tau + v(S;x^*(t),T-t)\right]dt, \quad S\subset N. \qquad (5.12.7)$$
From (5.12.6) we get that
$$\bar{v}(N;x^*(\Theta),T-\Theta) = \frac{T-\Theta}{T-t_0}\,v(N;x^*(\Theta),T-\Theta). \qquad (5.12.8)$$
We see that $\bar{v}$ is not a c.f. in the common sense in the subgame $\Gamma(x^*(\Theta),T-\Theta)$, $t_0\le\Theta\le T$, because $\bar{v}(N;x^*(\Theta),T-\Theta)$ is not equal to the maximal sum of the payoffs of all the players in this subgame.
In the way we have done it for $\bar{v}(S;x_0,T-t_0)$ we show that $\bar{v}(S;x^*(\Theta),T-\Theta)$, $\Theta\in[t_0,T]$, $S\subset N$, is a superadditive function of $S$.
Let $C(x_0,T-t_0)$ and $C(x^*(t),T-t)$ be the nonvoid cores in the games $\Gamma(x_0,T-t_0)$ and $\Gamma(x^*(t),T-t)$, $t\in[t_0,T]$, respectively. Let
$$\xi(t) = \{\xi_1(t),\ldots,\xi_i(t),\ldots,\xi_n(t)\}\in C(x^*(t),T-t), \quad t\in[t_0,T],$$
be an integrable selector which is an imputation from the core of the subgame $\Gamma(x^*(t),T-t)$ at each instant $t$. Consider the quantities
$$\bar{\xi}_i = \frac{1}{T-t_0}\int_{t_0}^T\left[\int_{t_0}^t h_i(x^*(\tau))\,d\tau + \xi_i(t)\right]dt, \qquad (5.12.9)$$
$i = 1,\ldots,n$.
Let $\bar{C}(x_0,T-t_0)$, $\bar{C}(x^*(\Theta),T-\Theta)$ be the sets of all possible vectors $\bar{\xi} = \{\bar{\xi}_i\}$ and, correspondingly, $\bar{\xi}^\Theta = \{\bar{\xi}_i^\Theta\}$ defined by (5.12.9) for all possible integrable selectors
$$\xi(t)\in C(x^*(t),T-t), \quad t\in[t_0,T],$$
from the cores of the subgames $\Gamma(x^*(t),T-t)$. Then, using the set integration, we may write
$$\bar{C}(x_0,T-t_0) = \frac{1}{T-t_0}\int_{t_0}^T\left[\int_{t_0}^t h(x^*(\tau))\,d\tau\oplus C(x^*(t),T-t)\right]dt,$$

$$\bar{C}(x^*(\Theta),T-\Theta) = \frac{1}{T-t_0}\int_\Theta^T\left[\int_\Theta^t h(x^*(\tau))\,d\tau\oplus C(x^*(t),T-t)\right]dt. \qquad (5.12.10)$$
The necessary and sufficient condition for an imputation $\xi(t)$ to belong to the core $C(x^*(t),T-t)$ is
$$\sum_{i\in S}\xi_i(t)\ge v(S;x^*(t),T-t), \quad S\subset N. \qquad (5.12.11)$$
Adding to both sides of (5.12.11) $\sum_{i\in S}\int_\Theta^t h_i(x^*(\tau))\,d\tau$ and integrating over $[\Theta,T]$ we get
$$\frac{1}{T-t_0}\int_\Theta^T\left[\sum_{i\in S}\left(\int_\Theta^t h_i(x^*(\tau))\,d\tau+\xi_i(t)\right)\right]dt \ge \frac{1}{T-t_0}\int_\Theta^T\left[\sum_{i\in S}\int_\Theta^t h_i(x^*(\tau))\,d\tau + v(S;x^*(t),T-t)\right]dt. \qquad (5.12.12)$$
Using (5.12.7) and (5.12.9) we get
$$\sum_{i\in S}\bar{\xi}_i^\Theta\ge\bar{v}(S;x^*(\Theta),T-\Theta), \quad S\subset N,$$
i.e. any vector $\bar{\xi}^\Theta\in\bar{C}(x^*(\Theta),T-\Theta)$, $\Theta\in[t_0,T]$, belongs also to the core of the subgame $\Gamma(x^*(\Theta),T-\Theta)$ defined by the c.f. $\bar{v}(S;x^*(\Theta),T-\Theta)$.
Theorem. The set $\bar{C}(x^*(\Theta),T-\Theta)$ belongs to the core of the subgame $\Gamma(x^*(\Theta),T-\Theta)$ with the c.f. $\bar{v}(S;x^*(\Theta),T-\Theta)$, $S\subset N$.
5.12.2. Now we have the intuitive background to introduce $\bar{C}(x^*(\Theta),T-\Theta)$ as an OP in the subgame $\Gamma(x^*(\Theta),T-\Theta)$, $\Theta\in[t_0,T]$ (in the case $\Theta = t_0$ we have an OP for the original game $\Gamma(x_0,T-t_0)$). Define now a natural procedure of distribution of the imputation on the time interval $[t_0,T]$ which leads to the STCOP.
Let $\bar{\xi}\in\bar{C}(x_0,T-t_0)$ and let a function $\beta_i(t)$, $i = 1,\ldots,n$, $t\in[t_0,T]$, satisfy the condition
$$\int_{t_0}^T\beta_i(t)\,dt = \bar{\xi}_i, \quad \beta_i(t)\ge 0.$$
The function $\beta(t) = \{\beta_i(t)\}$ shall be called the imputation distribution procedure (IDP). Define
$$\int_{t_0}^\Theta\beta_i(t)\,dt = \bar{\xi}_i(\Theta), \quad i = 1,\ldots,n. \qquad (5.12.13)$$
Definition. The OP $\bar{C}(x_0,T-t_0)$ is called STC if there exists such an IDP $\beta(t) = \{\beta_i(t)\}$ that
$$\bar{\xi}(\Theta)\oplus\bar{C}(x^*(\Theta),T-\Theta)\subset\bar{C}(x_0,T-t_0) \qquad (5.12.14)$$
for all $\Theta\in[t_0,T]$.


The definition of STC is applicable to a larger class of OPs (such as the Shapley value, the NM-solution, etc.).
The STC of the OP means that if an imputation $\bar{\xi}\in\bar{C}(x_0,T-t_0)$ and an IDP $\beta(t) = \{\beta_i(t)\}$ of $\bar{\xi}$ are selected, then, after the players receive on the time interval $[t_0,\Theta]$ the amount
$$\bar{\xi}_i(\Theta) = \int_{t_0}^\Theta\beta_i(t)\,dt, \quad i = 1,\ldots,n,$$
the optimal income (in the sense of the OP $\bar{C}(x^*(\Theta),T-\Theta)$) on the time interval $[\Theta,T]$ in the subgame $\Gamma(x^*(\Theta),T-\Theta)$ together with $\bar{\xi}(\Theta)$ constitutes an imputation belonging to the OP in the original game $\Gamma(x_0,T-t_0)$. This condition is stronger than time consistency, where we have only
$$\bar{\xi}-\bar{\xi}(\Theta)\in\bar{C}(x^*(\Theta),T-\Theta),$$
which means that the part of the previously considered "optimal" imputation belongs to the OP in the corresponding current subgame $\Gamma(x^*(\Theta),T-\Theta)$.
5.12.3. Theorem. The OP $\bar{C}(x_0,T-t_0)$ is STC in $\Gamma(x_0,T-t_0)$.
Proof. Define
$$\beta_i(t) = \frac{\xi_i(t)+(T-t)h_i(x^*(t))}{T-t_0}, \qquad (5.12.15)$$
where $\xi(t)\in C(x^*(t),T-t)$, $t\in[t_0,T]$. Consider the set $\bar{\xi}(\Theta)\oplus\bar{C}(x^*(\Theta),T-\Theta)$, where
$$\bar{\xi}(\Theta) = \int_{t_0}^\Theta\beta(t)\,dt. \qquad (5.12.16)$$
From (5.12.16) we get
$$\bar{\xi}(\Theta) = \frac{1}{T-t_0}\int_{t_0}^\Theta\xi(t)\,dt + \frac{1}{T-t_0}\int_{t_0}^\Theta(T-t)h(x^*(t))\,dt$$
$$= \frac{1}{T-t_0}\int_{t_0}^\Theta\xi(t)\,dt + \frac{T-\Theta}{T-t_0}\int_{t_0}^\Theta h(x^*(t))\,dt + \frac{1}{T-t_0}\int_{t_0}^\Theta(\Theta-t)h(x^*(t))\,dt$$
$$= \frac{1}{T-t_0}\int_{t_0}^\Theta\left[\int_{t_0}^t h(x^*(\tau))\,d\tau+\xi(t)\right]dt + \frac{T-\Theta}{T-t_0}\int_{t_0}^\Theta h(x^*(\tau))\,d\tau, \qquad (5.12.17)$$
by using the formula
$$\Theta\int_{t_0}^\Theta h(x^*(\tau))\,d\tau = \int_{t_0}^\Theta t\,h(x^*(t))\,dt + \int_{t_0}^\Theta\left[\int_{t_0}^t h(x^*(\tau))\,d\tau\right]dt.$$
From (5.12.17) and (5.12.16) we get
$$\bar{\xi}(\Theta)\oplus\bar{C}(x^*(\Theta),T-\Theta) = \frac{1}{T-t_0}\int_{t_0}^\Theta\left[\int_{t_0}^t h(x^*(\tau))\,d\tau+\xi(t)\right]dt + \frac{T-\Theta}{T-t_0}\int_{t_0}^\Theta h(x^*(\tau))\,d\tau\oplus\bar{C}(x^*(\Theta),T-\Theta),$$
where every element of $\bar{C}(x^*(\Theta),T-\Theta)$ has the form
$$\frac{1}{T-t_0}\int_\Theta^T\left[\int_\Theta^t h(x^*(\tau))\,d\tau+\xi'(t)\right]dt$$
for some selector $\xi'(t)\in C(x^*(t),T-t)$, $t\in[\Theta,T]$.
We see that every element of the set $\bar{\xi}(\Theta)\oplus\bar{C}(x^*(\Theta),T-\Theta)$ is represented in the form
$$\frac{1}{T-t_0}\int_{t_0}^\Theta\left[\int_{t_0}^t h(x^*(\tau))\,d\tau+\xi(t)\right]dt + \frac{T-\Theta}{T-t_0}\int_{t_0}^\Theta h(x^*(\tau))\,d\tau + \frac{1}{T-t_0}\int_\Theta^T\left[\int_\Theta^t h(x^*(\tau))\,d\tau+\xi'(t)\right]dt,$$
where $\xi'(t)$ is some imputation from the core $C(x^*(t),T-t)$, $t\in[\Theta,T]$, in the subgame $\Gamma(x^*(t),T-t)$. But the function
$$\xi''(t) = \begin{cases}\xi(t), & t\in[t_0,\Theta),\\ \xi'(t), & t\in[\Theta,T],\end{cases}$$
is also a selector from $C(x^*(t),T-t)$, $t\in[t_0,T]$; thus
$$\frac{1}{T-t_0}\int_{t_0}^\Theta\left[\int_{t_0}^t h(x^*(\tau))\,d\tau+\xi(t)\right]dt + \frac{T-\Theta}{T-t_0}\int_{t_0}^\Theta h(x^*(\tau))\,d\tau + \frac{1}{T-t_0}\int_\Theta^T\left[\int_\Theta^t h(x^*(\tau))\,d\tau+\xi'(t)\right]dt$$
$$= \frac{1}{T-t_0}\int_{t_0}^T\left[\int_{t_0}^t h(x^*(\tau))\,d\tau+\xi''(t)\right]dt\in\bar{C}(x_0,T-t_0),$$
and we have
$$\bar{\xi}(\Theta)\oplus\bar{C}(x^*(\Theta),T-\Theta)\subset\bar{C}(x_0,T-t_0)$$
for all $\Theta\in[t_0,T]$. The theorem is proved.
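A quick numerical sanity check of (5.12.15) (with invented $h_i$ and an invented integrable selector $\xi(t)$; this only illustrates the bookkeeping, not a real core computation): integrating the IDP over $[t_0,T]$ must reproduce $\bar{\xi}$ from (5.12.9).

import numpy as np

t0, T, m = 0.0, 1.0, 4001
t = np.linspace(t0, T, m)
dt = t[1] - t[0]
h = np.vstack([1.0 + t, 2.0 - t])               # assumed h_i(x*(t))
xi = np.vstack([0.6 * (T - t), 0.4 * (T - t)])  # assumed selector xi_i(t)

# IDP (5.12.15)
beta = (xi + (T - t) * h) / (T - t0)

# xi-bar from (5.12.9): time-average of accumulated gain + selector
acc = np.cumsum(h, axis=1) * dt                 # int_{t0}^t h_i(x*(tau)) dtau
xi_bar = (acc + xi).mean(axis=1)                # (1/(T-t0)) int_{t0}^T [...] dt

assert np.allclose(beta.sum(axis=1) * dt, xi_bar, atol=1e-3)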
It may be easily seen that, in the case of terminal payoffs, when
$$K_i(x_0,T-t_0;u_1,\ldots,u_n) = H_i(x(T)), \quad i = 1,\ldots,n,$$
all the results remain valid if we put
$$\bar{\xi}_i = \frac{1}{T-t_0}\int_{t_0}^T\xi_i(t)\,dt,$$
where $\xi(t)\in C(x^*(t),T-t)$, and
$$\bar{C}(x^*(\Theta),T-\Theta) = \frac{1}{T-t_0}\int_\Theta^T C(x^*(t),T-t)\,dt, \quad \Theta\in[t_0,T].$$
Using the c.f. $\bar{v}(S;x_0,T-t_0)$ we may compute the Shapley value $\overline{Sh} = \{\overline{Sh}_i\}$ in the game $\Gamma(x_0,T-t_0)$. We get the following formula:
$$\overline{Sh}(x_0,T-t_0) = \frac{1}{T-t_0}\int_{t_0}^T Sh(x^*(t),T-t)\,dt,$$
where $Sh(x^*(t),T-t)$ is the Shapley value for the subgame $\Gamma(x^*(t),T-t)$ with c.f. $v(S;x^*(t),T-t)$, $S\subset N$. The following equality holds:
$$\overline{Sh}(x_0,T-t_0) = \frac{1}{T-t_0}\int_{t_0}^\Theta Sh(x^*(t),T-t)\,dt + \overline{Sh}(x^*(\Theta),T-\Theta), \quad \Theta\in[t_0,T],$$
where $\overline{Sh}(x^*(\Theta),T-\Theta)$ is the Shapley value in the subgame $\Gamma(x^*(\Theta),T-\Theta)$ with c.f. $\bar{v}(S;x^*(t),T-t)$. As in the case of the core we may construct the IDP which guarantees the STC (which in this case coincides with TC) of $\overline{Sh}$. For this reason it is enough to put
$$\beta(t) = \frac{1}{T-t_0}\,Sh(x^*(t),T-t).$$
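The regularized Shapley value above is simply the time average of the current Shapley values. A minimal sketch (three players; the family of current characteristic functions is invented for illustration) computes $Sh(x^*(t),T-t)$ on a grid by the standard combinatorial formula and averages it:

import numpy as np
from math import factorial
from itertools import permutations

players = (0, 1, 2)
t0, T = 0.0, 1.0

def v_current(S, tk):
    # Assumed current c.f. of the subgame at time tk: superadditive in S,
    # shrinking as the remaining duration T - tk shrinks.
    return (T - tk) * len(S) ** 2 / len(players)

def shapley(tk):
    # Standard formula: average marginal contribution over all orderings.
    sh = np.zeros(len(players))
    for order in permutations(players):
        S = ()
        for i in order:
            sh[i] += v_current(S + (i,), tk) - v_current(S, tk)
            S += (i,)
    return sh / factorial(len(players))

grid = np.linspace(t0, T, 501)
sh_grid = np.array([shapley(tk) for tk in grid])
sh_bar = sh_grid.mean(axis=0)       # (1/(T-t0)) int_{t0}^T Sh(x*(t),T-t) dt
beta = sh_grid / (T - t0)           # the IDP from the text guaranteeing TC
print(sh_bar, beta[0])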

The construction of the STC NM-solution proceeds in much the same way.

5.13 Differential strongly time consistent optimality principles

In Sec. 5.12 we introduced a family of strongly time consistent optimality principles (STCOP) based upon the integration of optimality principles (OP) in current subproblems occurring along the optimal path. When trying to apply these new OPs to the optimization of complicated economic systems, one faces the problem of the redistribution of the investments on the time interval under consideration. The latter may require additional investments in intermediate time intervals. This restricts the applicability of the proposed integral STCOP. To overcome the difficulty we propose here another approach suitable for the optimization of developing economic systems.
5.13.1. As before, let the n-person dynamic game $\Gamma(x_0,T-t_0)$ with prescribed duration $T-t_0$, from the initial position $x_0$,
$$\dot{x} = f(x,u_1,\ldots,u_n), \quad x\in R^m, \quad u_i\in U_i\subset\text{Comp}\,R^l, \qquad (5.13.1)$$
with integral payoffs
$$K_i(x_0,T-t_0;u_1,\ldots,u_n) = \int_{t_0}^T h_i(x(t))\,dt, \quad h_i\ge 0, \quad i = 1,\ldots,n,$$
be given.
Denote by $E(x_0,T-t_0)$ the set of all imputations in $\Gamma(x_0,T-t_0)$, i.e.
$$E(x_0,T-t_0) = \left\{\xi = \{\xi_i\} : \sum_{i\in N}\xi_i = v(N;x_0,T-t_0),\ \xi_i\ge v(\{i\};x_0,T-t_0),\ i = 1,\ldots,n\right\}.$$
Let $C^{t-t_0}(x_0)$ ($t\in(t_0,T]$) be a reachable set of the system, i.e. the set of all points in $R^m$ which can be reached at the instant $t\in[t_0,T]$ from the initial position $x_0 = x(t_0)$ according to (5.13.1) with the help of some admissible open-loop control $u(\tau)$, $\tau\in[t_0,t]$. For each $y\in C^{t-t_0}(x_0)$ consider a subgame $\Gamma(y,T-t)$ of the game $\Gamma(x_0,T-t_0)$ with the corresponding characteristic function $v(S;y,T-t)$ and set of imputations $E(y,T-t)$.
Definition. A point-to-set mapping
$$C(y,T-t)\subset E(y,T-t),$$
defined for all $y\in C^{t-t_0}(x_0)$, $t\in[t_0,T]$, is called an optimality principle (OP) in the family of subgames $\Gamma(y,T-t)$.
In special cases $C(y,T-t)$ may be the core, an NM-solution, the Shapley value, etc.
Consider the family of subgames $\Gamma(x^*(t),T-t)$ along the optimal trajectory $x^*(t)$, $t\in[t_0,T]$, with corresponding characteristic functions $v(S;x^*(t),T-t)$ and sets of imputations $E(x^*(t),T-t)$.
5.13.2. Define now a natural procedure of distribution of the imputation on the time interval $[t_0,T]$ which leads to the differential STCOP.
Let $\xi\in C(x_0,T-t_0)$ and let a function $\beta_i(t)$, $i = 1,\ldots,n$, $t\in[t_0,T]$, satisfy the condition
$$\int_{t_0}^T\beta_i(t)\,dt = \xi_i, \quad \beta_i(t)\ge 0.$$
The function $\beta(t) = \{\beta_i(t)\}$ shall be called the imputation distribution procedure (IDP). Define
$$\int_{t_0}^\Theta\beta_i(t)\,dt = \xi_i(\Theta), \quad i = 1,\ldots,n.$$
Definition. The OP $C(x^*(t),T-t)$, $t\in[t_0,T]$, is called dynamic stable TC (time consistent) if there exists such an IDP $\beta(t) = \{\beta_i(t)\}$ that
$$\xi-\xi(\Theta)\in C(x^*(\Theta),T-\Theta) \qquad (5.13.2)$$
for all $\Theta\in[t_0,T]$.
5.13.3. Definition. The OP $C(x^*(t),T-t)$, $t\in[t_0,T]$, is called strongly dynamic stable STC (strongly time consistent) if there exists such an IDP $\beta(t) = \{\beta_i(t)\}$ that
$$\xi(\Theta)\oplus C(x^*(\Theta),T-\Theta)\subset C(x_0,T-t_0) \qquad (5.13.3)$$
for all $\Theta\in[t_0,T]$. Here $\xi\oplus C(x^*(\Theta),T-\Theta)$ means the set of all possible vectors $\xi+\eta$, for all $\eta\in C(x^*(\Theta),T-\Theta)$.
The STC of the OP means that if an imputation $\xi\in C(x_0,T-t_0)$ and an IDP $\beta(t) = \{\beta_i(t)\}$ of $\xi$ are selected, then, after the players receive on the time interval $[t_0,\Theta]$ the amount
$$\xi_i(\Theta) = \int_{t_0}^\Theta\beta_i(t)\,dt, \quad i = 1,\ldots,n,$$
any optimal income (in the sense of the OP $C(x^*(\Theta),T-\Theta)$) on the time interval $[\Theta,T]$ in the subgame $\Gamma(x^*(\Theta),T-\Theta)$ together with $\xi(\Theta)$ constitutes an imputation belonging to the OP in the original game $\Gamma(x_0,T-t_0)$.
Suppose $t_0 = \Theta_0 < \Theta_1 < \ldots < \Theta_k < \Theta_{k+1} < \ldots < \Theta_m = T$ is a partition of the time interval $[t_0,T]$ such that $\Theta_{k+1}-\Theta_k = \delta$, $k = 0,1,\ldots,m-1$.
If (5.13.2) holds only at the points $\Theta_k$, $k = 0,1,\ldots,m-1$, i.e.
$$\xi-\xi(\Theta_k)\in C(x^*(\Theta_k),T-\Theta_k), \qquad (5.13.4)$$
we call the OP $C(x^*(t),T-t)$ $\delta$TC ($\delta$ time-consistent). If (5.13.3) holds only at the points $\Theta_k$, $k = 0,1,\ldots,m-1$, i.e.
$$\xi(\Theta_k)\oplus C(x^*(\Theta_k),T-\Theta_k)\subset C(x_0,T-t_0), \qquad (5.13.5)$$
we call the OP $C(x^*(t),T-t)$ $\delta$STC ($\delta$ strongly time-consistent).
Now we have everything necessary to construct the differential STCOP. Introduce the following functions:
$$\beta^1(\tau) = \frac{\xi^1\sum_{i=1}^n h_i(x^*(\tau))}{v(N;x^*(\Theta_0),T-\Theta_0)}, \quad \tau\in[\Theta_0,\Theta_1), \text{ where } \xi^1\in C(x^*(t_0),T-t_0),\ x_0 = x^*(\Theta_0) = x^*(t_0);$$
$$\beta^2(\tau) = \frac{\xi^2\sum_{i=1}^n h_i(x^*(\tau))}{v(N;x^*(\Theta_1),T-\Theta_1)}, \quad \tau\in[\Theta_1,\Theta_2), \text{ where } \xi^2\in C(x^*(\Theta_1),T-\Theta_1);$$
$$\ldots$$
$$\beta^k(\tau) = \frac{\xi^k\sum_{i=1}^n h_i(x^*(\tau))}{v(N;x^*(\Theta_{k-1}),T-\Theta_{k-1})}, \quad \tau\in[\Theta_{k-1},\Theta_k), \text{ where } \xi^k\in C(x^*(\Theta_{k-1}),T-\Theta_{k-1});$$
$$\ldots$$
$$\beta^m(\tau) = \frac{\xi^m\sum_{i=1}^n h_i(x^*(\tau))}{v(N;x^*(\Theta_{m-1}),T-\Theta_{m-1})}, \quad \tau\in[\Theta_{m-1},\Theta_m = T], \text{ where } \xi^m\in C(x^*(\Theta_{m-1}),T-\Theta_{m-1}). \qquad (5.13.6)$$
Define the IDP $\bar{\beta}(\tau)$ by the formula
$$\bar{\beta}(\tau) = \beta^k(\tau), \quad \tau\in[\Theta_{k-1},\Theta_k), \quad k = 1,\ldots,m. \qquad (5.13.7)$$
It is easily seen that $\bar{\beta}\ge 0$.
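The construction (5.13.6)-(5.13.7) is directly implementable: on each partition interval the current core imputation $\xi^k$ is scaled by the current total gain rate and normalized by $v(N)$ of the current subgame. A small sketch with invented data (two players, a made-up core selector and gain rates) follows; it also checks the budget property discussed in 5.13.4, $\sum_i\bar{\beta}_i(\tau) = \sum_i h_i(x^*(\tau))$.

import numpy as np

t0, T, n_grid = 0.0, 1.0, 4001
t = np.linspace(t0, T, n_grid)
dt = t[1] - t[0]
h = np.vstack([1.0 + t, 2.0 - t])       # assumed gain rates h_i(x*(t))
tot = h.sum(axis=0)                     # sum_i h_i(x*(t))

def v_N(theta):
    # Total cooperative payoff of the subgame starting at theta (Riemann sum).
    return tot[t >= theta].sum() * dt

m = 10                                  # partition Theta_0,...,Theta_m, delta = 0.1
thetas = np.linspace(t0, T, m + 1)
lam = np.array([0.45, 0.55])            # assumed constant core shares, sum to 1

beta = np.zeros_like(h)
for k in range(1, m + 1):
    sel = (t >= thetas[k - 1]) & ((t < thetas[k]) | (k == m))
    xi_k = lam * v_N(thetas[k - 1])     # assumed xi^k in the current core
    beta[:, sel] = np.outer(xi_k, tot[sel]) / v_N(thetas[k - 1])  # (5.13.6)

# Budget property from 5.13.4: the IDP pays out exactly what is earned.
assert np.allclose(beta.sum(axis=0), tot)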

Consider the formulas (5.13.6). For different imputations $\xi^k\in C(x^*(\Theta_{k-1}),T-\Theta_{k-1})$, $k = 1,\ldots,m$, we get different functions $\beta^k(\tau)$, $\tau\in[\Theta_{k-1},\Theta_k)$, and thus different functions $\bar{\beta}(\tau)$ defined by (5.13.7). Let $B$ be the set of all such IDPs $\bar{\beta}(\tau)$, $\tau\in[t_0,T]$, for all possible $\xi^k\in C(x^*(\Theta_{k-1}),T-\Theta_{k-1})$, $k = 1,\ldots,m$. Consider the set
$$\bar{U}(x_0,T-t_0) = \left\{\xi : \xi = \int_{t_0}^T\bar{\beta}(\tau)\,d\tau,\ \bar{\beta}(\tau)\in B\right\}$$
and the sets
$$\bar{U}(x^*(\Theta_k),T-\Theta_k) = \left\{\xi : \xi = \int_{\Theta_k}^T\bar{\beta}(\tau)\,d\tau,\ \bar{\beta}(\tau)\in B\right\}.$$
The set $\bar{U}(x_0,T-t_0)$ is called the regularized OP $C(x_0,T-t_0)$, and correspondingly $\bar{U}(x^*(\Theta_k),T-\Theta_k)$ is the regularized OP $C(x^*(\Theta_k),T-\Theta_k)$.
We consider $\bar{U}(x_0,T-t_0)$ as a new optimality principle in the game $\Gamma(x_0,T-t_0)$.
Theorem. If the IDP $\bar{\beta}(\tau)$, $\tau\in[t_0,T]$, is defined by (5.13.7), then always
$$\xi(\Theta_k)\oplus\bar{U}(x^*(\Theta_k),T-\Theta_k)\subset\bar{U}(x_0,T-t_0),$$
i.e. the OP $\bar{U}(x^*(t),T-t)$ is a $\delta$STCOP.
Proof. Suppose $\tilde{\xi}\in\xi(\Theta_k)\oplus\bar{U}(x^*(\Theta_k),T-\Theta_k)$. Then $\tilde{\xi} = \xi(\Theta_k)+\int_{\Theta_k}^T\beta(\tau)\,d\tau$ for some $\beta(\tau)\in B$. But $\xi(\Theta_k) = \int_{t_0}^{\Theta_k}\beta'(\tau)\,d\tau$ for some $\beta'(\tau)\in B$. Consider
$$\beta''(\tau) = \begin{cases}\beta'(\tau), & \tau\in[t_0,\Theta_k),\\ \beta(\tau), & \tau\in[\Theta_k,T],\end{cases}$$
then $\beta''(\tau)\in B$, and
$$\tilde{\xi} = \int_{t_0}^T\beta''(\tau)\,d\tau,$$
and thus $\tilde{\xi}\in\bar{U}(x_0,T-t_0)$. The theorem is proved.
5.13.4. The defined IDP has the advantage (compared with the integral one defined in Sec. 5.12) that
$$\sum_{i=1}^n\bar{\beta}_i(\tau) = \sum_{i=1}^n h_i(x^*(\tau)), \quad \tau\in[t_0,T],$$
and thus
$$\sum_{i=1}^n\xi_i(\Theta) = \sum_{i=1}^n\int_{t_0}^\Theta h_i(x^*(\tau))\,d\tau, \qquad (5.13.8)$$
which is the actual amount to be divided between the players on the time interval $[t_0,\Theta]$ and which, as is seen from formula (5.13.8), is exactly equal to the amount earned by them on this time interval. Thus for the realization of the proposed IDP no additional investments are needed ((5.13.8) may not hold for integral OPs).
If $\delta$ tends to zero we may get an STCOP by introducing the IDP $\bar{\beta}(\tau)$, $\tau\in[t_0,T]$, by the formula
$$\bar{\beta}_i(\tau) = \frac{\xi_i(\tau)\sum_{j=1}^n h_j(x^*(\tau))}{v(N;x^*(\tau),T-\tau)},$$
where $\xi(\tau)\in C(x^*(\tau),T-\tau)$ is an integrable selector.
5.14 Strongly time consistent optimality principles for the games with discount payoffs
The problem of dynamic stability (time consistency) for n-person differential games with discount payoffs was first mentioned in Strotz (1955), where it was proved that even the Pareto optimal solutions may be time inconsistent in this case. The reason is that in the discount payoff case the payoffs of the players in subgames occurring along an optimal path essentially change their structure, implying the time inconsistency of the chosen optimality principle (OP). Until recently no attempts have been made to regularize the OPs in the discount payoff case. We refer to Kaitala and Pohjola (1992), where this question was once more stated. Here we try to use the approach from Sec. 5.13 to construct a family of strongly dynamic stable optimality principles in the case under discussion. We shall consider the core as the OP in the game, but all the results remain valid for any other subset of imputations considered as an optimality principle.
5.14.1. Consider the n-person differential game $\Gamma(x_0)$
$$\dot{x} = f(x,u_1,\ldots,u_n), \quad u_i\in U_i\subset\text{Comp}\,R^l, \quad x(t_0) = x_0,$$
with payoffs
$$K_i(x_0;u_1,\ldots,u_n) = \int_{t_0}^\infty e^{-\lambda_i(t-t_0)}h_i(x(t))\,dt, \quad h_i\ge 0, \quad \lambda_i > 0. \qquad (5.14.1)$$

Cooperative form of $\Gamma(x_0)$.
Consider the n-tuple of open-loop controls $u_1^*,\ldots,u_n^*$ such that the corresponding trajectory maximizes the sum of the payoffs (utilities)
$$\max_{u_1,\ldots,u_n}\sum_{i=1}^n K_i(x_0;u_1,\ldots,u_n) = \sum_{i=1}^n K_i(x_0;u_1^*,\ldots,u_n^*) = \sum_{i=1}^n\int_{t_0}^\infty e^{-\lambda_i(t-t_0)}h_i(x^{1*}(t))\,dt = v^1(N;x_0),$$
where $N$ is the set of all players in $\Gamma(x_0)$. The trajectory $x^{1*}(t)$ is called conditionally optimal. Let $v^1(S;x_0)$ be the characteristic function ($S\subset N$) and $C^1(x_0)$ be the core. Consider the family of subgames $\Gamma(x^{1*}(t))$ along $x^{1*}(t)$, $t\in(t_0,\infty)$, the corresponding cores and c.f. $v^1(S;x^{1*}(t))$. The payoff functions in the subgames have the form
$$K_i^t(x^{1*}(t);u_1,\ldots,u_n) = \int_t^\infty e^{-\lambda_i(\tau-t_0)}h_i(x^{1*}(\tau))\,d\tau,$$
and differ by the multiplier $e^{-\lambda_i(t-t_0)}$ from the payoff functions $\int_t^\infty e^{-\lambda_i(\tau-t)}h_i(x^{1*}(\tau))\,d\tau$ of the same subgames with the discounting restarted at $t$. This essentially changes the relative weights of the payoff functions of different players as the game develops, and thus the whole game itself.

5.14.2. Consider the partition of the time interval $[t_0,\infty)$ by the points $\Theta_k$, $t_0 = \Theta_0 < \Theta_1 < \ldots < \Theta_k < \Theta_{k+1} < \ldots$, where $\Theta_{k+1}-\Theta_k = \delta > 0$ does not depend upon $k$, the subgame $\Gamma(x^{1*}(\Theta_1))$, the c.f. $v^2(S;x^{1*}(\Theta_1))$ and the core $C^2(x^{1*}(\Theta_1))$. Let $x^{2*}(t)$, $t\ge\Theta_1$, be the conditionally optimal trajectory in the subgame $\Gamma(x^{1*}(\Theta_1))$, i.e. such that
$$\max_{u_1,\ldots,u_n}\sum_{i=1}^n K_i(x^{1*}(\Theta_1);u_1,\ldots,u_n) = \sum_{i=1}^n K_i(x^{1*}(\Theta_1);u_1^*,\ldots,u_n^*) = \sum_{i=1}^n\int_{\Theta_1}^\infty e^{-\lambda_i(\tau-\Theta_1)}h_i(x^{2*}(\tau))\,d\tau.$$
Then consider the subgame $\Gamma(x^{2*}(\Theta_2))$, the c.f. $v^3(S;x^{2*}(\Theta_2))$ and the core $C^3(x^{2*}(\Theta_2))$. Continuing in the same manner, we get the sequence of subgames $\Gamma(x^{k*}(\Theta_k))$, c.f. $v^{k+1}(S;x^{k*}(\Theta_k))$, cores $C^{k+1}(x^{k*}(\Theta_k))$ and conditionally optimal trajectories $x^{(k+1)*}(t)$, $t\ge\Theta_k$.
Definition. The trajectory $x^*(t) = x^{k*}(t)$, $t\in[\Theta_{k-1},\Theta_k)$, $k = 1,2,\ldots$, is called optimal in the game $\Gamma(x_0)$.
In this formalization of the cooperative game we suppose that the players starting the game agree to use the optimal trajectory $x^*(t)$, $t\ge\Theta_0$, $t_0 = \Theta_0$. Our formalization depends upon $\delta > 0$, and we denote the cooperative form of $\Gamma(x_0)$ by $\Gamma_\delta(x_0)$.
5.14.3. The vector function $\beta(\tau) = (\beta_1(\tau),\ldots,\beta_i(\tau),\ldots,\beta_n(\tau))$ is called the utility distribution procedure (UDP) in $\Gamma_\delta(x_0)$ if
$$\sum_{i=1}^n\int_{\Theta_k}^{\Theta_{k+1}}\beta_i(t)\,dt = \sum_{i=1}^n\int_{\Theta_k}^{\Theta_{k+1}}e^{-\lambda_i(t-\Theta_k)}h_i(x^{(k+1)*}(t))\,dt = v^{k+1}(N;x^{k*}(\Theta_k)) - v^{k+1}(N;x^{(k+1)*}(\Theta_{k+1})),$$
$$\beta_i(t)\ge 0, \quad i = 1,\ldots,n, \quad k = 0,1,\ldots.$$
Let $\xi^k\in C^{k+1}(x^{k*}(\Theta_k))$, $k = 0,1,\ldots$, be any imputation in the subgame $\Gamma(x^{k*}(\Theta_k))$ belonging to the core of this subgame. Define the function $\beta_i(t)$, $t\in[\Theta_k,\Theta_{k+1})$, by the formula
$$\beta_i(t) = \frac{\xi_i^k\sum_{j=1}^n\int_{\Theta_k}^{\Theta_{k+1}}e^{-\lambda_j(\tau-\Theta_k)}h_j(x^{(k+1)*}(\tau))\,d\tau}{v^{k+1}(N;x^{k*}(\Theta_k))(\Theta_{k+1}-\Theta_k)} = \frac{\xi_i^k\left[v^{k+1}(N;x^{k*}(\Theta_k)) - v^{k+1}(N;x^{(k+1)*}(\Theta_{k+1}))\right]}{v^{k+1}(N;x^{k*}(\Theta_k))\,\delta},$$
$i = 1,\ldots,n$, $k = 0,1,\ldots$. The functions $\{\beta_i(t)\}$, $t\ge t_0$, constitute for each $\xi^k\in C^{k+1}(x^{k*}(\Theta_k))$, $k = 0,1,\ldots$, a UDP in $\Gamma_\delta(x_0)$. Let $\bar{\xi}$ be the infinite sequence $\bar{\xi} = \{\int_{\Theta_k}^{\Theta_{k+1}}\beta(t)\,dt\}$. Denote by $\bar{C}(x_0)$ the set of all such sequences $\bar{\xi}$, for all possible UDPs $\beta(t)$, for different $\xi^k\in C^{k+1}(x^{k*}(\Theta_k))$, $k = 0,1,\ldots$. Consider $\bar{C}(x_0)$ as an optimality principle (OP) in $\Gamma_\delta(x_0)$, and call it the regularized core (RC). Define $\bar{C}(x^*(\Theta_k))$ for the subgame $\Gamma_\delta(x^*(\Theta_k))$.
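On each interval the UDP is a constant vector: $\xi^k$ rescaled by the fraction of the current cooperative value that is actually earned (in discounted terms) during $[\Theta_k,\Theta_{k+1})$. A toy stationary sketch (the constant gain rates and discount rates are invented, so the closed-form values below are stand-ins for $v^{k+1}(N;\cdot)$) illustrates the computation:

import numpy as np

lam = np.array([0.5, 1.0])         # discount rates lambda_i (assumed)
c = np.array([2.0, 3.0])           # assumed constant gain rates h_i along x*
delta = 0.25                       # partition step

# For this stationary example the cooperative value of every current
# subgame is v(N) = sum_i c_i / lambda_i, and the discounted amount earned
# on one interval is sum_i c_i (1 - exp(-lambda_i delta)) / lambda_i.
v_N = (c / lam).sum()
earned = (c * (1 - np.exp(-lam * delta)) / lam).sum()

xi_k = np.array([0.4, 0.6]) * v_N  # assumed core imputation of the subgame

# UDP on [Theta_k, Theta_{k+1}): a constant vector (the formula above).
beta_k = xi_k * earned / (v_N * delta)

# Over the interval the players receive beta_k * delta in total:
assert np.isclose((beta_k * delta).sum(), earned)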
Denote by $\bar{\xi}(\Theta_k)$ the finite sequence
$$\int_{\Theta_j}^{\Theta_{j+1}}\beta(t)\,dt, \quad j = 0,1,\ldots,k-1.$$
Define the operation: $\bar{\xi}(\Theta_k)\oplus\bar{C}(x^*(\Theta_k))$ is the set of all sequences $\int_{\Theta_j}^{\Theta_{j+1}}\beta'(t)\,dt$, $j = 0,1,\ldots,k,\ldots$, where
$$\int_{\Theta_j}^{\Theta_{j+1}}\beta'(t)\,dt = \int_{\Theta_j}^{\Theta_{j+1}}\beta(t)\,dt, \quad j = 0,1,\ldots,k-1,$$
i.e. $\beta'(t) = \beta(t)$, $t\in[t_0,\Theta_k]$, and the subsequence
$$\int_{\Theta_j}^{\Theta_{j+1}}\beta'(t)\,dt, \quad j = k,k+1,\ldots,$$
is an arbitrary element from $\bar{C}(x^*(\Theta_k))$.
Definition. The RC $\bar{C}(x_0)$ is called STC if for all $\bar{\xi}\in\bar{C}(x_0)$ we have
$$\bar{\xi}(\Theta_k)\oplus\bar{C}(x^*(\Theta_k))\subset\bar{C}(x_0) = \bar{C}(x^*(\Theta_0)).$$

Theorem. The RC $\bar{C}(x_0)$ is an STC optimality principle in $\Gamma_\delta(x_0)$.

5.15 Exercises and problems


1. Construct the reachability sets for Players P and E in a "simple motion" game.
2. Suppose Player E moves from the point $y_0 = (y_1^0,y_2^0)$ with velocity $\beta$ which is constant in value and direction. Show that for each such motion there is a unique motion by Player P from the point $x_0 = (x_1^0,x_2^0)$ with constant velocity $\alpha$ ($\alpha > \beta$) which realizes the capture ($l$-capture) of Player E within a minimal period of time. Such a motion by Player P is called a time-optimal response with respect to a capture point.
3. Suppose Player E moves from the point $y_0 = (y_1^0,y_2^0)$ with velocity $\beta$ that is constant in value and direction. At the same time, Player P responds immediately by moving towards a capture point. Construct a capture point for each pair of motions by the players P and E. Show that the obtained locus of capture points for the players E and P is the Apollonius circle, and write its equation.
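For a numerical illustration of Exercise 3: the capture points $M$ satisfy $|M-x_0|/|M-y_0| = \alpha/\beta$, and expanding this ratio condition yields the center and radius of the circle. The sketch below (initial data chosen arbitrarily) computes them and verifies the defining ratio on sampled points.

import numpy as np

x0 = np.array([0.0, 0.0])   # pursuer's initial position (assumed)
y0 = np.array([3.0, 1.0])   # evader's initial position (assumed)
alpha, beta = 2.0, 1.0      # speeds, alpha > beta
k2 = (alpha / beta) ** 2

# |M - x0|^2 = k^2 |M - y0|^2 expands to a circle with:
center = (k2 * y0 - x0) / (k2 - 1)
radius = np.sqrt(center @ center + (x0 @ x0 - k2 * (y0 @ y0)) / (k2 - 1))

# Verify |M - x0| / |M - y0| = alpha / beta on sampled boundary points.
phi = np.linspace(0, 2 * np.pi, 360, endpoint=False)
M = center + radius * np.stack([np.cos(phi), np.sin(phi)], axis=1)
ratios = np.linalg.norm(M - x0, axis=1) / np.linalg.norm(M - y0, axis=1)
assert np.allclose(ratios, alpha / beta)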
4. Under the conditions of the preceding exercise, construct the set of $l$-capture points for the players E and P.
5. Denote by $A(x_0,y_0)$ the set of capture points with respect to the initial states $x_0$, $y_0$ of the players P and E (the Apollonius circle). Suppose that for some instant of time $\tau$ ($\tau$ is less than the time-to-capture) the players E and P move along straight lines with maximum velocities towards the capture point $M$. Construct a new set of capture points $A(x(\tau),y(\tau))$ with respect to the states $x(\tau),y(\tau)$ that are initial at the instant of time $\tau$. This is a new Apollonius circle.
Show that the circles $A(x_0,y_0)$ and $A(x(\tau),y(\tau))$ are tangent to one another at the point $M$; hence $A(x(\tau),y(\tau))$ is contained in the set bounded by the circumference $A(x_0,y_0)$.

6. Suppose that Player E moves from the point $y_0$ along some smooth curve $y(t)$ with maximum velocity $\beta$. Player P moves with maximum velocity $\alpha$; at each instant of time $\tau$ he knows Player E's position $y(\tau)$ and the direction of the velocity vector $v(\tau) = \{v_1(\tau),v_2(\tau)\}$ ($v_1^2(\tau)+v_2^2(\tau) = \beta^2$). Construct the $\Pi$-strategy for Player P: in accordance with this strategy he chooses the direction of his velocity vector towards the capture point $M$, assuming that on the time interval $[\tau,\infty)$ Player E keeps the constant direction $\{v_1(\tau),v_2(\tau)\}$ (i.e. moves along the ray with constant velocity $\beta$).
Show that if Player P uses the $\Pi$-strategy, then the line segment $[x(\tau),y(\tau)]$ connecting the current positions of the players is kept parallel to the segment $[x_0,y_0]$ until the time of capture.
7. Suppose that Player E moves from $y_0$ along some smooth curve $y(\tau)$ with maximum velocity $\beta$. Write an analytical expression for the $\Pi$-strategy of Player P.
8. Show that when Player P uses the $\Pi$-strategy, the capture point is always contained in the set $\bar{A}(x_0,y_0)$ bounded by the Apollonius circle $A(x_0,y_0)$.
Hint. The proof is carried out for the motions of Player E along $k$-vertex broken lines in terms of the statement of Exercise 5; then a passage to the limit is made.
9. ("Driver the Killer" game.) In order to write equations of motion for the
players in this game, it suffices to specify five phase coordinates: two coordinates to
identify the position of Player P (a motor vehicle), two coordinates to identify the
position of Player E (a pedestrian), and one coordinate to indicate the direction of
pursuit. Denote these coordinates by x\,Xj,yi,yj,6 (Fig. 5.1). The state of the game

Zj.Sfil
_Ji >W2
E
\

&>
&l
*1 P V2

*2

0 \ xuV\

Figure 5.1

at each instant of time is determined completely and uniquely by specifying these


phase coordinates.
The control for Player E is simple: in order to describe the direction of his motion, it suffices to specify the angle $\varphi$ (see Fig. 5.1).
Let us choose the control for Player P. We draw through the point P the straight line $C'C$ ($|C'P| = |PC| = R$) that is perpendicular to the velocity vector of pursuit. Player P may choose the instantaneous center of curvature of his trajectory at any point, say, at the point $C_1$, lying on this straight line outside the interval $C'C$. The control $u$ is taken to be equal to $R/|PC_1|$ in absolute value, positive for the points $C_1$ to the left of P and negative to the right of P; thus, $-1\le u\le 1$. Prove that the equations of motion are:
$$\dot{x}_1 = w_1\sin\theta, \quad \dot{x}_2 = w_1\cos\theta,$$
$$\dot{y}_1 = w_2\sin\varphi, \quad \dot{y}_2 = w_2\cos\varphi, \quad \dot{\theta} = \frac{w_1}{R}\,u.$$


10. ("Driver The Killer" game. Reduction of dimension.) We assume that a
moving coordinate system related to motor vehicle P is chosen on the plane. In this
system the coordinates yi,y2 of pedestrian can be regarded as components of a single
variable vector x; the Xj axis is taken to be always directed along the velocity vector
of motor vehicle.
Suppose that Player P chooses at the instant of time t the curvature of his path to
be centered at the point C = (R/u,0), and the distance CE is equal to d (Fig. 5.2).
Then the rotation of Player P around the point C is equivalent to the rotation of x

Figure 5.2

around C in the opposite sense, but with the same angular velocity. Thus the vector
i moves with velocity that is equal to ui(dujR) in absolute value and perpendicular
to CE. The components of his velocity are obtained by multiplying the modulus by
xj/d and (ij R/<p)/d, respectively.
Show that the equations of motion are:
$$\dot{x}_1 = -\frac{w_1}{R}x_2u + w_2\sin\varphi,$$
$$\dot{x}_2 = \frac{w_1}{R}x_1u - w_1 + w_2\cos\varphi,$$
$$-1\le u\le 1, \quad 0\le\varphi < 2\pi.$$

11. Let $a$ and $b$ be numbers such that $\rho = \sqrt{a^2+b^2} > 0$. Show that $\max_\psi(a\cos\psi + b\sin\psi)$ is attained at the point $\psi$ such that $\cos\psi = a/\rho$, $\sin\psi = b/\rho$, and this maximum is equal to $\rho$.
12. Let the payoff be terminal and the equations of motion be
$$\dot{x}_1 = av + w\sin u,$$
$$\dot{x}_2 = -1 + w\cos u,$$
$$0\le u < 2\pi, \quad -1\le v\le 1,$$
where $a$ and $w$ are positive smooth functions of $x_1$ and $x_2$.
Write the equation for the value $V$ of the game in forms (5.5.64) and (5.5.66) and show that the equation in form (5.5.69) is
$$aV_{x_1}\bar{v} - w\rho - V_{x_2} = 0,$$
where
$$\rho = \sqrt{V_{x_1}^2+V_{x_2}^2}, \quad \bar{v} = \operatorname{sgn}V_{x_1}, \quad \sin\bar{u} = -V_{x_1}/\rho, \quad \cos\bar{u} = -V_{x_2}/\rho.$$
Hint. Make use of Exercise 11.
13. ("Driver The Killer" game.) Write the main equation in form (5.6.8) and
(5.6.10) for equations of motion in the natural space (Exercise 9) and in the re
duced space (Exercise 10). In the first case, for vx,vy,v we introduce the notation
v
\i v2i u3> vii u5> where the indices refer to the relevant phase coordinates following the
order in which they appear in equations of motion.
14. Find the equation of characteristics as a regression in the natural space for the "Driver the Killer" game. Here the main equation (5.6.10) becomes
$$w_1(v_1\sin\theta + v_2\cos\theta) + w_2\bar{\rho} + \frac{w_1}{R}v_5\bar{u} + 1 = 0,$$
where
$$\bar{\rho} = \sqrt{v_3^2+v_4^2}, \quad \bar{u} = -\operatorname{sgn}v_5, \quad \sin\bar{\varphi} = v_3/\bar{\rho}, \quad \cos\bar{\varphi} = v_4/\bar{\rho}.$$
15. Make use of the solution to Exercise 14 and show that the solution in the small of the "Driver the Killer" game for Player P is to make right or left turns as sharp as possible, while the solution for Player E is to move along a straight line.
16. Write and illustrate equation (5.6.6) for the "pulling" game
$$\dot{x} = u + v, \quad |u|\le\alpha, \quad |v|\le\beta, \quad x(0) = x_0, \quad x\in R^2,$$
with the terminal payoff $\rho(x(T),A)$, where $A$ is some point, $A\in R^2$, lying outside the system reachability set by the time-instant $T$ from the initial state $x_0$.
17. Write explicit expressions for the optimal strategies in the game of Exercise 16 and for its modification, where the duration of the game is not prefixed and the payoff to Player E is taken to be equal to the time of arrival at the origin of coordinates.
18. Prove that the reachability set of the controlled system
$$\dot{q}_i = p_i, \quad \dot{p}_i = \alpha u_i - kp_i,$$
$$q_i(0) = q_i^0, \quad p_i(0) = p_i^0, \quad u_1^2+u_2^2\le 1, \quad i = 1,2,$$
in the space of geometric coordinates $(q_1,q_2)$ is the circle of radius $R = \alpha(e^{-kT}+kT-1)/k^2$ with its center at the point $\bar{q} = q^0 + p^0(1-e^{-kT})/k$.
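A quick numerical check of Exercise 18 (simulation parameters chosen arbitrarily): integrate the system with a constant unit control in a fixed direction, which drives the state to the boundary of the reachability set, and compare the displacement from the drift center with the stated radius.

import numpy as np

alpha, k, T = 1.5, 0.8, 2.0
q0 = np.array([0.0, 0.0])
p0 = np.array([0.3, -0.2])

R = alpha * (np.exp(-k * T) + k * T - 1) / k**2      # claimed radius
center = q0 + p0 * (1 - np.exp(-k * T)) / k          # claimed center

for ang in np.linspace(0, 2 * np.pi, 8, endpoint=False):
    u = np.array([np.cos(ang), np.sin(ang)])         # constant |u| = 1
    q, p, dt = q0.copy(), p0.copy(), 1e-4
    for _ in range(int(T / dt)):                     # explicit Euler step
        q, p = q + dt * p, p + dt * (alpha * u - k * p)
    assert np.isclose(np.linalg.norm(q - center), R, atol=1e-3)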
19. Prove that the function $\rho_T(q,p,y,\beta)$ satisfies equation (5.6.6) written for this case.
20. The pursuit is carried out on the plane and the equations of motion are of the form:
for P:
$$\dot{q}_i = p_i, \quad \dot{p}_i = \alpha u_i - k_P p_i, \quad |u|\le 1, \quad i = 1,2,$$
for E:
$$\dot{y}_i = v_i, \quad |v|\le\beta, \quad i = 1,2.$$
Here $q$, $y$ are the positions of the players P and E respectively and $p$ is the momentum of Player P. In this case Player E moves in accordance with a "simple motion", while Player P, represented by a material unit mass point, moves under the control force $\alpha u$ and the frictional force $-k_P p$.
The payoff to a player is defined to be the distance between the geometric positions of the players at the time $T$ when the game ends:
$$H(q(T),y(T)) = \rho(q(T),y(T)) = \sqrt{\sum_{i=1,2}(q_i(T)-y_i(T))^2}.$$
Find the quantity $\rho_T(q,y)$.


21. Derive equation (5.6.6) for the problem from Exercise 20.
22. Consider the game of "simple pursuit" with prescribed duration $T$ in the half-plane $F$, i.e. under the additional assumption that in the process of pursuit the players cannot leave the set $F$. Construct the reachability sets for the players.
23. Find the quantity $\rho_T(x,y)$ for the game of "simple pursuit" in the half-plane with prescribed duration.
24. Consider a zero-sum game of "simple pursuit" with prescribed duration between the two pursuers $P = \{P_1,P_2\}$ acting as one player and the evading player E. The equations of motion are of the form:
$$\dot{x}^1 = u^1, \quad |u^1|\le\alpha_1, \quad 0 < \beta < \min\{\alpha_1,\alpha_2\},$$
$$\dot{x}^2 = u^2, \quad |u^2|\le\alpha_2, \quad x^1,x^2,y\in R^2,$$
$$\dot{y} = v, \quad |v|\le\beta, \quad u^1,u^2,v\in R^2,$$
$$x^1(0) = x_0^1, \quad x^2(0) = x_0^2, \quad y(0) = y_0.$$
The payoff to Player E is
$$\min_{i=1,2}\rho(x^i(T),y(T)),$$
i.e. Player E is interested in maximizing the distance to the nearest of the pursuers by the time the game ends.

Construct the reachability sets of the players and determine geometrically the maximin distance $\rho_T(x_0^1,x_0^2,y_0)$ between these sets.
25. Extend the Theorem in 5.9.7 to the case where the participants are several pursuers $P_1,\ldots,P_n$ acting as one player, and one evading player E.
Bibliography

Aarts, H. and T. Driessen. ZOR, 2, 1993.
Aumann, R. and L. Shapley. The Values of Non-atomic Games, p. 283. Princeton Univ. Press, Princeton, 1974.
Ashmanov, S. A. Linear Programming, p. 198. Nauka, Moscow, 1981.
Bellman, R. Rendiconti del Circolo Matematico di Palermo, ser. 2, 1, N2, 1952.
Berge, C. Théorie générale des jeux à n personnes, p. 114. Gauthier-Villars, Paris, 1957.
Blackwell, D. and M. Girshick. Theory of Games and Statistical Decisions, p. 330. Wiley, N.Y.; Chapman & Hall, London, 1954.
Bird, C. G. On cost allocation of a spanning tree: A game theoretic approach. Networks, 1976.
Basar, T. and G. J. Olsder. Dynamic Noncooperative Game Theory, p. 233. Acad. Press, London, 1984.
Bondareva, O. N. On Game-Theoretic Models in Economics, p. 115. Len. State Univ. Publ., Leningrad, 1974.
Burger, E. Introduction to the Theory of Games, p. 211. Prentice-Hall, Englewood Cliffs, N.J., 1963.
Danskin, J. M. The Theory of Maximin and its Applications to Weapons Allocation Problems, p. 126. Springer-Verlag, Berlin, 1967.
Danskin, J. M. Oper. Research, 16(3), 1968.
Davidov, E. G. Methods and Models in Theory of Antagonistic Games, p. 135. Moscow State Univ. Publ., Moscow, 1978.
Dresher, M. Games of Strategy. Theory and Applications, p. 186. Prentice-Hall, Englewood Cliffs, N.J., 1961.
Diubin, G. N. and V. G. Suzdal. Introduction to Applied Theory of Games, p. 311. Nauka, Moscow, 1981.
Feller, W. Introduction to Probability Theory and its Applications, p. 1230. John Wiley & Sons, N.Y.-London-Sydney-Toronto, 1971.
Friedman, A. Differential Games, p. 350. Wiley, N.Y., 1971.
Friedman, J. W. Game Theory with Applications to Economics, p. 361. Oxford Univ. Press, N.Y., Oxford, 1986.
Fudenberg, D. and J. Tirole. Game Theory, p. 580. The MIT Press, Cambridge, 1992.
Gale, D. The Theory of Linear Economic Models, p. 330. McGraw-Hill Book Comp., Inc., N.Y., London, 1960.
Grigorenko, N. L. Differential Games of Pursuit by Several Units, p. 217. Moscow State Univ. Publ., Moscow, 1983.
Haigh, J. Adv. Applied Prob., 7, 1975.
Harsanyi, J. C. International Economic Review, 4, 1963.
Harsanyi, J. C. Papers in Game Theory, p. 367. Reidel, Dordrecht, 1982.
Harsanyi, J. C. and R. Selten. Management Science, 18, 1972.
Hart, S. and A. Mas-Colell. In A. E. Roth, editor, The Shapley Value. Cambridge Univ. Press, Cambridge, 1988.
Hu, T. Integer Programming and Network Flows, p. 411. Addison-Wesley Publ. Comp., Menlo Park, Calif.-London-Don Mills, 1970.
Isaacs, R. Differential Games, p. 384. Wiley, N.Y., 1965.
Karlin, S. Reduction of certain classes of games to integral equations. In H. Kuhn and A. Tucker, editors, Contributions to the Theory of Games, II. Princeton Univ. Press, Princeton (N.J.), 1953.
Karlin, S. Mathematical Methods and Theory in Games, Programming and Economics, p. 840. Pergamon Press, London, 1959.
Kolmogorov, A. N. and S. V. Fomin. Elements of the Theory of Functions and Functional Analysis, p. 389. Nauka, Moscow, 1981.
Kazakova-Frehse, N. In M. Breton and G. Zaccour, editors, 6th International Symposium on Dynamic Games and Applications, Preprint Volume, Montreal, Canada, 1994. Ecole des Hautes Etudes Commerciales.
Kononenko, A. F. Soviet Math. Reports, 231(2), 1976.
Kovalenko, A. A. Set of Problems for Theory of Games. Visha Sch., Lvov, 1974.

Kaitala, V. and M. Pohjola. In Proceedings of the V International Symposium on


Dynamic Games, Geneva, 1992.

Karlin, S. and R. Restrepo. In H. Kuhn and A. Tucker, editors, Contributions to


the Theory of Games. Princeton Univ. Press, Princeton (N.Y.), 1957.

Krasovskii, N. N. Control of Dynamical System. Problem of Guaranteed Minimum


Result, p. 469. Nauka, Moscow, 1985.

Krasovskii, N. N. and A. I. Subbotin. Positional Differential Games, p. 456. Nauka,


Moscow, 1974.

Kalai, E. and M. Smorodinsky. Econometrica, 43, 1975.

Kuhn, H. In Annals of Mathematics Studies. Princeton Univ. Press, Princeton,


1953.

Kurahansky, A. B. Control and Observation Under Uncertainty, p. 325. Nauka,


Moscow, 1977.

Luce, R. D. and H. Raiffa. Games and decisions. Introduction and critical survey,
p. 509. Wiley, N.Y., 1957.

Lutcenko, M. M. In Game-theoretical decision making. Nauka, Leningrad, 1978.

Malafeyev, O. A. Vestnik of the Leningrad State University, 7, 1980.

McKinsey, J. C. Introduction to the Theory of Games, p. 371. McGraw-Hill, N.Y.,


1952.

Moulin, H. Theorie des jevx pour I'economie et la politique, p. 200. Hermann, Paris,
1981.

Moulin, H. Game Theory for the Social Sciences, p. 465. N.Y. Univ. Press, N.Y.,
2nd edition, 1986.

Maynard, S. J. and G. R. Price. The logic of animal conflict. Nature, London, 1973.

Morozov, V. V. and A. G. Sukharev. Operational Research in Problems and Exer


cises. Vish. Sch., Moscow, 1986.

Myerson, R. B. International Journal of Game Theory, 7, 1978.

Myerson, R. B. Econometrica, 45, 1981.

Nash, J. Annals of Mathematics, 54(2), 1951.

Owen, G. Game Theory, p. 230. Saunders, Philadelphia, 1968.

Owen, G. Game Theory, p. 230. Acad. Press, N.Y., 2nd edition, 1982.
348 Bibliography
Petrosjan, L. A., A. Azamov and H. Satimov. Controlled Systems, 13, 1974.
Peck, J. E. L. and A. L. Dulmage. Canad. J. Math., 9(3), 1957.
Petrosjan, L. A. Soviet Math. Reports, 161(1), 1965.
Petrosjan, L. A. Wissenschaftliche Zeitschrift der TU Dresden, 4, 1968.
Petrosjan, L. A. Soviet Math. Reports, 195(3), 1970.
Petrosjan, L. A. Vestnik of the Leningrad State University, 19, 1972.
Petrosjan, L. A. Vestnik of the Leningrad State University, 13, 1977.
Petrosjan, L. A. Vestnik of the Leningrad State University, 2, 1992.
Petrosjan, L. A. Differential Games of Pursuit, p. 325. World Scientific, Singapore, 1993.
Petrosjan, L. A. and Yu. Garnaev. Search Games, p. 217. Len. State Univ. Publ., Leningrad, 1992.
Perles, M. A. and M. Maschler. International Journal of Game Theory, 10, 1981.
Pontryagin, L. S. Advances in Math. Sci., 21(4), 1966.
Prokhorov, Y. V. and Y. A. Rozanov. Probability Theory. Basic Notions. Central Limit Theorems. Random Processes, p. 358. Nauka, Moscow, 1967.
Parthasarathy, T. and T. E. S. Raghavan. Some Topics in Two-person Games, p. 259. Amer. Elsevier, N.Y., 1971.
Petrosjan, L. A. and V. V. Zakharov. Introduction to Mathematical Ecology, p. 295. Len. State Univ. Publ., Leningrad, 1986.
Petrosjan, L. A. and N. A. Zenkevich. Optimal Search in Conflict Conditions, p. 96. Len. State Univ. Publ., Leningrad, 1986.
Robinson, G. B. An iteration method of solving a game, volume P-154, p. 9. RAND Corp., 1950.
Rockafellar, R. T. Convex Analysis, p. 470. Princeton Univ. Press, Princeton, 1970.
Rochet, J. C. Selection of unique equilibrium payoff for extension games with perfect information. Mimeo, Université de Paris IX, 1980.
Roth, A. E. Mathematics of Operations Research, 2, 1977.
Rozenmuller, J. Cooperative Games and Markets, p. 115. Springer-Verlag, Berlin, 1971.

Sadovsky, A. L. Soviet Math. Reports, 238(3), 1978.
Sakaguchi, M. Oper. Res. Soc. Jap., 16, 1973.
Sansone, G. Ordinary Differential Equations, volume 2, p. 269. I.L., Moscow, 1954.
Selten, R. International Journal of Game Theory, 4, 1975.
Selten, R. A demand commitment model of coalition bargaining. The discussion paper NB-191, Univ. of Bonn, 1991.
Shapley, L. S. In H. Kuhn and A. Tucker, editors, Contributions to the Theory of Games, II. Princeton Univ. Press, Princeton, 1953.
Sion, M. Pacif. Journ. of Math., 8(1), 1958.
Strotz, R. H. Review of Economic Studies, XXIII, 1955.
Sion, M. and Ph. Wolfe. In M. Dresher, A. Tucker, and Ph. Wolfe, editors, Contributions to the Theory of Games, III. Princeton Univ. Press, Princeton, 1957.
Thomson, W. and T. Lensberg. Axiomatic Theory of Bargaining with a Variable Number of Agents. Cambridge Univ. Press, Cambridge, 1989.
Tyniansky, N. T. and V. I. Zhukovsky. Results in Science and Engineering. Mathematical Analysis, volume 10. VINITI, Moscow, 1979.
van Damme, E. Stability and Perfection of Nash Equilibria, p. 215. Springer-Verlag, Berlin, 1991.
von Neumann, J. Math. Ann., 100, 1928.
von Neumann, J. and O. Morgenstern. Theory of Games and Economic Behavior, p. 625. Wiley, N.Y., 1944.
Vorobjev, N. N. Game Theory Lectures for Economists and System Scientists, p. 178. Springer, N.Y., 1977.
Vorobjev, N. N. Fundamentals of Game Theory. Non-cooperative Games, p. 496. Radio & Sviaz, Moscow, 1984.
Vaisbord, E. M. and V. I. Zhukovsky. Introduction to Differential Multiperson Games and Their Applications, p. 303. Sov. Radio, Moscow, 1980.
Voznyuk, S. N. and N. A. Zenkevich. In Controlled Systems and Applications. Yak. Univ. Publ., Yakutsk, 1994.
Yanovskaya, E. B. Bull. of Soviet Acad. of Sci. Engineering Cybernetics, 6, 1973.
Zenkevich, N. A. Vestnik of the Leningrad State University, 19, 1981.
Zenkevich, N. A. and I. V. Marchenko. In Mathematical Methods of Optimization and Control in Systems. Kal. State Univ. Publ., Kalinin, 1987.
Zenkevich, N. A. and I. V. Marchenko. Threat strategies in two-person games with dependent sets of strategies, p. 21. VINITI, Leningrad, 1990.
Zenkevich, N. A. and S. N. Voznyuk. In V. V. Mazalov and L. A. Petrosjan, editors, Year-book on Game Theory and Applications, vol. 2. Nauka (Novosibirsk) and Nova Science Publ. (N.Y.), 1994.
Zenkevich, N. A. and S. N. Voznyuk. In M. Breton and G. Zaccour, editors, 6th International Symposium on Dynamic Games and Applications, Preprint Volume, Montreal, Canada, 1994. Ecole des Hautes Etudes Commerciales.
Zenkevich, N. A. and S. N. Voznyuk. Mineral Processing Journal, 2, 1994.
Index

ε,δ-equilibrium, 287
absolute Nash equilibrium, 204
alternative, 229
Apollonius circle, 338
arc, 201
asymmetric Nash solution, 165
Banzhaf vector, 198
bargaining
  problem, 161
  set of the game, 161
  solution vector, 161
behavior strategy, 231
carrier, 182
characteristic function, 168
choice set, 253
coalition, 168
conditionally optimal trajectory, 325
cone, 16
convex
  cone, 16
  game, 70
  hull, 16
  polyhedral set, 15
  set, 15
core, 174
current subgame, 319
Dictatorial solution, 165
direct ESS, 241
duel, 3, 86, 87
Egalitarian solution, 166
equilibrium, 210
  in joint mixed strategies, 158
equilibrium point, 7
ESS, 154
evasion, 3
extreme point, 15
favorable equilibrium, 207
game
  bimatrix, 126
  concave, 71
  constant sum, 170
  continuous, 64
  convex, 70
  cooperative differential, 315
  differential with prescribed duration, 280
  evolutionary, 239
  hierarchical, 214
  in extensive form, 228
  infinite, 49
  matrix, 1
  multistage with complete information, 202
  noncooperative, 125
  of kind, 273
  of pursuit, 268
    with frictional forces, 306
  of search, 88
  on the unit square, 52
  repeated evolutionary, 239
  repeated zero-sum, 235
  symmetric, 37, 240
  two-person, 126
  with perfect recall, 232
  zero-sum, 1
graph, 200
graph tree, 201
hypothetical mismatch, 290
IDP, 245, 329
imputation distribution procedure (IDP), 245, 329
information set, 225
Isaacs-Bellman equation, 294
Kalai-Smorodinsky solution, 165
maximin principle, 5
minimax principle, 6
minimum result, 273
MPOLBS, 309
Nash equilibrium, 130, 138
NM-solution, 179
Pareto optimality, 133
path, 201
payoff, 202
  integral, 273
  terminal, 244
payoff function, 1, 49, 203, 272
penalty strategy, 213
perfect equilibrium, 146
play of the game, 203
poker, 104
positional games, 199
potential function, 188
priority set, 228
proper equilibrium, 148
pure strategy, 230
reachability set, 280, 324
saddle point, 7, 18, 210
  existence, 9
search, 88
search game, 4, 49
secondary search, 92
set
  admissible, 248
  minimal admissible, 248
Shapley value, 183
silent duel, 51
simple motion, 269, 304
situation, 1, 203
  completely mixed, 137
spectrum of mixed strategy, 137
Stackelberg i-equilibrium, 134
strategy, 203
  completely mixed, 35
  conditionally optimal, 312
  essential, 27
  evolutionary stable (ESS), 154
  joint mixed, 158
  maximin, 5
  minimax, 6
  mixed, 12, 57, 136
  mixed piecewise open-loop behavior (MPOLBS), 309
  optimal, 210
  optimal open-loop, 292
  piecewise open-loop, 271
  synthesizing, 270, 275
strong equilibrium, 133
strongly time consistency (STC), 245, 333
subgame, 1, 203
symmetry of game, 240
time consistency (TC), 245, 318, 333
time-optimal game of pursuit, 285
UDP, 337
unfavorable equilibrium, 207
Utilitarian solution, 167
utility distribution procedure (UDP), 337
value
  lower, 5, 52
  of the game, 8, 274
  of the subgame, 210
  upper, 6, 52
