Volume 97
Editors
J.E. Marsden L. Sirovich
Advisors
M. Ghil J.K. Hale T. Kambe
J. Keller K. Kirchgässner
B.J. Matkowsky C.S. Peskin
J.T. Stuart
Springer
New York
Berlin
Heidelberg
Barcelona
Hong Kong
London
Milan
Paris
Singapore
Tokyo
Andrzej Lasota
Michael C. Mackey
With 48 Illustrations
Springer
Andrzej Lasota
Institute of Mathematics
Silesian University
ul. Bankowa 14
Katowice 40-058, Poland
Michael C. Mackey
Center of Nonlinear Dynamics
McGill University
Montreal, Quebec H3G 1Y6
Canada
Editors
J.E. Marsden
Control and Dynamical Systems, 107-81
California Institute of Technology
Pasadena, CA 91125
USA
L. Sirovich
Division of Applied Mathematics
Brown University
Providence, RI 02912
USA
ISBN 0-387-94049-9
ISBN 3-540-94049-9
SPIN 10851267
To the memory of
The first edition of this book was originally published in 1985 under the title "Probabilistic Properties of Deterministic Systems." In the intervening
years, interest in so-called "chaotic" systems has continued unabated but
with a more thoughtful and sober eye toward applications, as befits a maturing field. This interest in the serious usage of the concepts and techniques
of nonlinear dynamics by applied scientists has probably been spurred more
by the availability of inexpensive computers than by any other factor. Thus,
computer experiments have been prominent, suggesting the wealth of phenomena that may be resident in nonlinear systems. In particular, they
allow one to observe the interdependence between the deterministic and
probabilistic properties of these systems such as the existence of invariant
measures and densities, statistical stability and periodicity, the influence
of stochastic perturbations, the formation of attractors, and many others.
The aim of the book, and especially of this second edition, is to present
recent theoretical methods which allow one to study these effects.
We have taken the opportunity in this second edition to not only correct
the errors of the first edition, but also to add substantially new material in
five sections and a new chapter. Thus, we have included the additional dynamic property of sweeping (Chapter 5) and included results useful in the
study of semigroups generated by partial differential equations (Chapters
7 and 11), as well as adding a completely new Chapter 12 on the evolution
of distributions. The material of this last chapter is closely related to the
subject of iterated function systems and their attractors (fractals). In addition, we have added a set of exercises to increase the utility of the work for
graduate courses and self-study.
In addition to those who helped with the first edition, we would like to
thank K. Alligood (George Mason), P. Kamthan, J. Losson, I. Nechayeva,
N. Provatas (McGill), and A. Longtin (Ottawa) for their comments.
A.L.
M.C.M.
This book is about densities. In the history of science, the concept of densities emerged only recently as attempts were made to provide unifying descriptions of phenomena that appeared to be statistical in nature. Thus, for
example, the introduction of the Maxwellian velocity distribution rapidly
led to a unification of dilute gas theory; quantum mechanics developed
from attempts to justify Planck's ad hoc derivation of the equation for the
density of blackbody radiation; and the field of human demography grew
rapidly after the introduction of the Gompertzian age distribution.
From these and many other examples, as well as the formal development
of probability and statistics, we have come to associate the appearance of
densities with the description of large systems containing inherent elements
of uncertainty. Viewed from this perspective one might find it surprising
to pose the questions: "What is the smallest number of elements that a
system must have, and how much uncertainty must exist, before a description in terms of densities becomes useful and/or necessary?" The answer is
surprising, and runs counter to the intuition of many. A one-dimensional
system containing only one object whose dynamics are completely deterministic (no uncertainty) can generate a density of states! This fact has
only become apparent in the past half-century due to the pioneering work
of E. Borel [1909], A. Rényi [1957], and S. Ulam and J. von Neumann.
These results, however, are not generally known outside that small group
of mathematicians working in ergodic theory.
The past few years have witnessed an explosive growth in interest in
physical, biological, and economic systems that could be profitably studied
using densities. Due to the general inaccessibility of the mathematical lit-
erature to the nonmathematician, there has been little diffusion of the concepts and techniques from ergodic theory into the study of these "chaotic"
systems. This book attempts to bridge that gap.
Here we give a unified treatment of a variety of mathematical systems
generating densities, ranging from one-dimensional discrete time transformations through continuous time systems described by integro-partial
differential equations. We have drawn examples from a variety of the sciences to illustrate the utility of the techniques we present. Although the
range of these examples is not encyclopedic, we feel that the ideas presented
here may prove useful in a number of the applied sciences.
This book was organized and written to be accessible to scientists with
a knowledge of advanced calculus and differential equations. In various
places, basic concepts from measure theory, ergodic theory, the geometry
of manifolds, partial differential equations, probability theory and Markov
processes, and stochastic integrals and differential equations are introduced.
This material is presented only as needed, rather than as a discrete unit
at the beginning of the book where we felt it would form an almost insurmountable hurdle to all but the most persistent. However, in spite of our
presentation of all the necessary concepts, we have not attempted to offer
a compendium of the existing mathematical literature.
The one mathematical technique that touches every area dealt with is the
use of the lower-bound function (first introduced in Chapter 5) for proving
the existence and uniqueness of densities evolving under the action of a
variety of systems. This, we feel, offers some partial unification of results
from different parts of applied ergodic theory.
The first time an important concept is presented, its name is given in
bold type. The end of the proof of a theorem, corollary, or proposition is
marked with a ■; the end of a remark or example is denoted by a □.
A number of organizations and individuals have materially contributed
to the completion of this book.
In particular the National Academy of Sciences (U.S.A.), the Polish
Academy of Sciences, the Natural Sciences and Engineering Research Council (Canada), and our home institutions, the Silesian University and McGill
University, respectively, were especially helpful.
For their comments, suggestions, and friendly criticism at various stages
of our writing, we thank J. Belair (Montreal), U. an der Heiden (Bremen), and R. Rudnicki (Katowice). We are especially indebted to P. Bugiel
(Krakow), who read the entire final manuscript, offering extensive mathematical and stylistic suggestions and improvements. S. James (McGill) has
cheerfully, accurately, and tirelessly reduced several rough drafts to a final
typescript.
Contents

1 Introduction
1.1 A Simple System Generating a Density of States
1.2 The Evolution of Densities: An Intuitive Point of View
1.3 Trajectories Versus Densities
Exercises

2 The Toolbox
2.1 Measures and Measure Spaces
2.2 Lebesgue Integration
2.3 Convergence of Sequences of Functions
Exercises

4.2 Ergodic Transformations
4.3 Mixing and Exactness
4.4 Using the Frobenius-Perron and Koopman Operators for Classifying Transformations
4.5 Kolmogorov Automorphisms
Exercises

Behavior of P^n f from H(P^n f)
Exercises

References
Index
1
Introduction
We begin by showing how densities may arise from the operation of a one-dimensional discrete time system and how the study of such systems can
be facilitated by the use of densities.
If a given system operates on a density as an initial condition, rather than
on a single point, then successive densities are given by a linear integral
operator, known as the Frobenius-Perron operator. Our main objective in
this chapter is to offer an intuitive interpretation of the Frobenius-Perron
operator. We make no attempt to be mathematically precise in either our
language or our arguments.
The precise definition of the Frobenius-Perron operator is left to Chapter
3, while the measure-theoretic background necessary for this definition is
presented in Chapter 2.
S(x) = ax(1 - x).   (1.1.1)

We assume that a = 4, so S maps the closed unit interval [0, 1] onto itself.
This is also expressed by saying that the state (or phase) space of the
system is [0, 1]. The graph of this transformation is shown in Figure 1.1.1a.
[Figure 1.1.1: (a) the graph of S; (b) a typical trajectory of S; (c) the trajectory for a slightly different initial state.]
Having defined S we may pick an initial point x_0 ∈ [0, 1] so that the
successive states of our system at times 1, 2, ... are given by the trajectory

x_n = S^n(x_0),   n = 1, 2, ....   (1.1.2)

A typical trajectory corresponding to a given initial state is shown in Figure
1.1.1b. It is visibly erratic or chaotic, as is the case for almost all x_0. What
is even worse is that the trajectory is significantly altered by a slight change
in the initial state, as shown in Figure 1.1.1c for an initial state differing
by 10^-3 from that used to generate Figure 1.1.1b. Thus we are seemingly
faced with a real problem in characterizing systems with behaviors like that
of (1.1.1).
By taking a clue from other areas, we might construct a histogram to
display the frequency with which states along a trajectory fall into given
regions of the state space. This is done in the following way. Imagine that
we divide the state space [0, 1] into n discrete nonintersecting intervals so
that the ith interval is (we neglect the end point 1)

[(i - 1)/n, i/n),   i = 1, ..., n.   (1.1.3)
We have carried out this procedure for the initial state used to generate
the trajectory of Figure 1.1.1b by taking n = 20 and using a trajectory of
length N = 5000. The result is shown in Figure 1.1.2. There is a surprising
symmetry in the result, for the states are clearly most concentrated near 0
and 1 with a minimum at 1/2. Repeating this process for other initial states
leads, in general, to the same result. Thus, in spite of the sensitivity of
trajectories to initial states, this is not usually reflected in the distribution
of states within long trajectories.
However, for certain select initial states, different behaviors may occur.
For some initial conditions the trajectory might arrive at one of the fixed
points of equation (1.1.1), that is, a point x* satisfying

x* = S(x*).
FIGURE 1.1.3. Exceptional initial conditions may confound the study of transformations via trajectories. In (a) we show how an initial condition on the quadratic
transformation (1.1.1) with a = 4 can lead to a fixed point x* of S. In (b) we see
that another initial condition leads to a period-2 trajectory, although all other
characteristics of S are the same.
(For the quadratic map with a = 4 there are two fixed points, x* = 0 and
x* = 3/4.) If this happens the trajectory will then have the constant value
x* forever after, as illustrated in Figure 1.1.3a. Alternately, for some other
initial states the trajectory might become periodic (see Figure 1.1.3b) and
also fail to exhibit the irregular behavior of Figures 1.1.1 b and c. The worst
part about these exceptional behaviors is that we have no a priori way of
predicting which initial states will lead to them.
In the next section we illustrate an alternative approach to avoid these
problems.
Remark 1.1.1. Map (1.1.1) has attracted the attention of many mathematicians. Ulam and von Neumann [1947] examined the case a = 4,
whereas Ruelle [1977], Jakobson [1978], Pianigiani [1979], Collet and Eckmann [1980], and Misiurewicz [1981] have studied its properties for values
of a < 4. May [1974], Smale and Williams [1976], and Lasota and Mackey
[1980], among others, have examined the applicability of (1.1.1) and similar
maps to biological population growth problems. Interesting properties re-
Here 1_Δ denotes the indicator (characteristic) function of a set Δ,

1_Δ(x) = { 1 if x ∈ Δ,
           0 if x ∉ Δ.

Loosely speaking, we say that a function f_0(x) is the density function for
the initial states x_1^0, ..., x_N^0 if, for every (not too small) interval Δ_0 ⊂ [0, 1],
we have

∫_{Δ_0} f_0(u) du ≃ (1/N) Σ_{j=1}^N 1_{Δ_0}(x_j^0).   (1.2.1)

Similarly, after one application of S the new states x_j^1 = S(x_j^0) are
distributed with a density f_1 satisfying, for every interval Δ ⊂ [0, 1],

∫_Δ f_1(u) du ≃ (1/N) Σ_{j=1}^N 1_Δ(x_j^1).   (1.2.2)
FIGURE 1.2.1. The counterimage of the set [0, x] under the quadratic transformation consists of the union of the two sets denoted by the heavy lines on the
x-axis.
Now observe that x_j^1 = S(x_j^0) belongs to a set A ⊂ [0, 1] if and only if
x_j^0 ∈ S^{-1}(A), so that

1_A(S(x)) = 1_{S^{-1}(A)}(x).   (1.2.3)

Combining these relations gives

∫_A f_1(u) du = ∫_{S^{-1}(A)} f_0(u) du.   (1.2.5)
Taking A = [a, x] we have

∫_a^x f_1(u) du = ∫_{S^{-1}([a,x])} f_0(u) du,

and, differentiating with respect to x,

f_1(x) = (d/dx) ∫_{S^{-1}([a,x])} f_0(u) du.   (1.2.6)

More generally, the operator P taking a density f into a new density Pf
is defined by

Pf(x) = (d/dx) ∫_{S^{-1}([a,x])} f(u) du.   (1.2.7)
For the quadratic transformation (1.1.1) with a = 4, the counterimage of
[0, x] is the union of the intervals [0, 1/2 - (1/2)√(1-x)] and
[1/2 + (1/2)√(1-x), 1] (see Figure 1.2.1), so that

Pf(x) = (d/dx) ∫_0^{1/2 - (1/2)√(1-x)} f(u) du + (d/dx) ∫_{1/2 + (1/2)√(1-x)}^1 f(u) du.

Carrying out the differentiation gives

Pf(x) = (1/(4√(1-x))) { f(1/2 - (1/2)√(1-x)) + f(1/2 + (1/2)√(1-x)) }.   (1.2.8)
This equation is an explicit formula for the Frobenius-Perron operator corresponding to the quadratic transformation and tells us how S transforms
a given density f into a new density Pf. Clearly the relationship can
be used in an iterative fashion.
To see how this equation works, pick an initial density f(x) = 1 for
x ∈ [0, 1]. Then, since both terms inside the braces in (1.2.8) are constant,
a simple calculation gives
Pf(x) = 1/(2√(1-x)).   (1.2.9)
FIGURE 1.2.2. The evolution of the constant density f(x) = 1, x ∈ [0, 1], by
the Frobenius-Perron operator corresponding to the quadratic transformation.
Compare the rapid and regular approach of P^n f to the density given in equation
(1.2.11) (shown as a dashed line) with the sustained irregularity shown by the
trajectories in Figure 1.1.1.
Continuing, a second application of P gives

P^2 f(x) = P(Pf(x)) = (1/(8√(1-x))) { 1/√(1/2 + (1/2)√(1-x)) + 1/√(1/2 - (1/2)√(1-x)) }.   (1.2.10)

As shown in Figure 1.2.2, the densities P^n f rapidly approach

f*(x) = 1/(π√(x(1-x))),   (1.2.11)

which satisfies Pf* = f*.
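The operator (1.2.8) is easy to evaluate numerically. The sketch below (in Python, our own choice of language and names) applies it to the constant density and to the density (1.2.11), checking equation (1.2.9) and the invariance Pf* = f* at a few sample points:

```python
import math

# Frobenius-Perron operator (1.2.8) for the quadratic map S(x) = 4x(1 - x).
def P(f):
    def Pf(x):
        r = math.sqrt(1.0 - x)
        return (f(0.5 - 0.5 * r) + f(0.5 + 0.5 * r)) / (4.0 * r)
    return Pf

one = lambda x: 1.0
g = P(one)        # by (1.2.9) this should equal 1/(2*sqrt(1 - x))

fstar = lambda x: 1.0 / (math.pi * math.sqrt(x * (1.0 - x)))
h = P(fstar)      # by the invariance of (1.2.11) this should reproduce fstar

for x in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(x, g(x), 1.0 / (2.0 * math.sqrt(1.0 - x)), h(x), fstar(x))
```

The agreement for f* is exact, because 1 - (1/2 ∓ (1/2)√(1-x)) is the other counterimage point, so the two terms in (1.2.8) coincide.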
As another example, consider the r-adic transformation

S(x) = rx (mod 1).   (1.2.12)
FIGURE 1.2.3. The dyadic transformation is a special case of the r-adic transformation. The heavy lines along the x-axis mark the two components of the
counterimage of the interval [0, x).
For this transformation the counterimage of [0, x] is

S^{-1}([0, x]) = ⋃_{i=0}^{r-1} [i/r, i/r + x/r],

and the Frobenius-Perron operator is thus

Pf(x) = (d/dx) Σ_{i=0}^{r-1} ∫_{i/r}^{i/r + x/r} f(u) du = (1/r) Σ_{i=0}^{r-1} f((i + x)/r).   (1.2.13)
This formula for the Frobenius-Perron operator corresponding to the r-adic
transformation (1.2.12) shows again that densities f will be rapidly
smoothed by P, as can be seen in Figure 1.2.4a for an initial density f(x) =
2x, x ∈ [0, 1]. It is clear that the density P^n f(x) rapidly approaches the
constant distribution f*(x) = 1, x ∈ [0, 1]. Indeed, it is trivial to show
that P1 = 1. This behavior should be contrasted with that of a typical
trajectory (Figure 1.2.4b). □
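The smoothing action of (1.2.13) can be checked directly. In the following sketch (Python; the function names are ours) the operator for r = 2 is applied five times to f(x) = 2x; each application halves the slope, so P^n f flattens toward the constant density 1:

```python
# Frobenius-Perron operator (1.2.13) for the r-adic map S(x) = rx (mod 1).
def P_radic(f, r=2):
    def Pf(x):
        return sum(f((i + x) / r) for i in range(r)) / r
    return Pf

# Starting from f(x) = 2x, successive densities flatten toward f* = 1;
# after n applications the density is 2**(1 - n) * x + (1 - 2**(-n)).
f = lambda x: 2.0 * x
for n in range(5):
    f = P_radic(f, r=2)
print(f(0.0), f(1.0))
```

A constant density is reproduced exactly by (1.2.13), in agreement with P1 = 1.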
FIGURE 1.2.4. Dynamics of the dyadic transformation. (a) With an initial density f(x) = 2x, x ∈ [0, 1], successive applications of the Frobenius-Perron operator corresponding to the dyadic transformation result in densities approaching
f* = 1, x ∈ [0, 1]. (b) A trajectory calculated from the dyadic transformation
with x_0 ≃ 0.0005. Compare the irregularity of this trajectory with the smooth
approach of the densities in (a) to a limit.
Let R denote the entire real line, that is, R = {x : -∞ < x < ∞}, and
consider the transformation S: R → R defined by

S(x) = ax,   a > 0.   (1.3.1)

The corresponding Frobenius-Perron operator is

Pf(x) = (1/a) f(x/a).
We first examine the behavior of S for a > 1. Since S^n(x) = a^n x, we see
that, for a > 1 and x ≠ 0, |S^n(x)| → ∞ as n → ∞, and thus the iterates
S^n(x) escape from any bounded interval.
This behavior is in total agreement with the behavior deduced from the
flow of densities.
FIGURE 1.3.1. The transformation S(x), defined by equation (1.3.2), has a single
weak repelling point at x = 0.
To see this, note that, by the qualitative definition of the Frobenius-Perron
operator of the previous section, we have, for any bounded interval [-A, A] ⊂ R,

∫_{-A}^{A} P^n f(x) dx = ∫_{-A/a^n}^{A/a^n} f(x) dx.
Since a > 1,

lim_{n→∞} ∫_{-A}^{A} P^n f(x) dx = 0,

and so, under the operation of S, densities are reduced to zero on every
finite interval when a > 1.
Conversely, for a < 1,

lim_{n→∞} |S^n(x)| = 0

for every x ∈ R, and therefore all trajectories converge to zero. Furthermore, for every neighborhood (-ε, ε) of zero, we have

lim_{n→∞} ∫_{-ε}^{ε} P^n f(x) dx = lim_{n→∞} ∫_{-ε/a^n}^{ε/a^n} f(x) dx = ∫_{-∞}^{∞} f(x) dx = 1,

so in this case all densities are concentrated in an arbitrarily small neighborhood of zero. Thus, again, the behaviors of trajectories and densities
seem to be in accord.
However, it is not always the case that the behaviors of trajectories and
densities seem to be in agreement. This may be simply illustrated by what
we call the paradox of the weak repellor. In Remark 6.2.1 we consider
the transformation S: [0, 1] → [0, 1] defined by

S(x) = { x/(1 - x)   for 0 ≤ x ≤ 1/2,
         2x - 1      for 1/2 < x ≤ 1.      (1.3.2)
FIGURE 1.3.2. Dynamics of the weak repellor defined by (1.3.2). (a) The evolution P^n f of an initial density f(x) = 1, x ∈ [0, 1]. (b) The trajectory
originating from an initial point x_0 ≃ 0.25.
In this case, for every 0 < ε < 1,

lim_{n→∞} ∫_0^ε P^n f(x) dx = 1   and   lim_{n→∞} ∫_ε^1 P^n f(x) dx = 0,

so the densities P^n f concentrate near zero (Figure 1.3.2a). On the other
hand, for a trajectory starting at a small x_0 the first branch of (1.3.2)
gives S(x_0) = ax_0, where a = 1/(1 - x_0) > 1. Thus initially, for small x_0,
this transformation behaves much like transformation (1.3.1), and the
behavior of the trajectory near zero apparently contradicts that expected
from the behavior of the densities.
This paradox is more apparent than real and may be easily understood.
First, note that even though all trajectories are repelled from zero (zero is
a repellor), once a trajectory is ejected from (0, 1/2] it is quickly reinjected
into (0, 1/2] from (1/2, 1]. Thus zero is a "weak repellor." The second essential
point to note is that the speed with which any trajectory leaves a small
neighborhood of zero is small; it is given by

S^n(x_0) - S^{n-1}(x_0) = x_0^2 / {(1 - nx_0)[1 - (n - 1)x_0]}.
Thus, starting with many initial points, as n increases we will see the progressive accumulation of more and more points near zero. This is precisely
the behavior predicted by examining the flow of densities.
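The slow escape is a consequence of the identity S^n(x_0) = x_0/(1 - n x_0), valid as long as the iterates remain in [0, 1/2]. The sketch below (Python; names ours) iterates (1.3.2) from x_0 = 0.01 and checks the identity over the first forty steps, during which the iterates stay well below 1/2:

```python
# The weak repellor (1.3.2): S(x) = x/(1-x) on [0,1/2] and 2x-1 on (1/2,1].
# While the iterates stay in [0, 1/2] they satisfy S^n(x0) = x0/(1 - n*x0),
# so a trajectory needs on the order of 1/x0 steps to leave (0, 1/2].

def S(x):
    return x / (1.0 - x) if x <= 0.5 else 2.0 * x - 1.0

x0 = 0.01
x, errs = x0, []
for n in range(1, 41):                    # 40 steps; iterates stay below 1/2
    x = S(x)
    errs.append(abs(x - x0 / (1.0 - n * x0)))
print(x, max(errs))
```

After 40 steps the iterate has only crept from 0.01 up to about 0.0167, which is the slow accumulation near zero that the densities predict.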
Although our comments in this chapter lack mathematical rigor, they
offer some insight into the power of looking at the evolution of densities
under the operation of deterministic transformations. The next two chapters are devoted to introducing the mathematical concepts required for a
precise treatment of this problem.
Exercises
Simple numerical experiments can greatly clarify the material of this and
subsequent chapters. Consequently, the first five exercises of this chapter involve the writing of simple utility programs to study the quadratic
map (1.1.1) from several perspectives. Exercises in subsequent chapters will
make use of these programs to study other maps. If you have access to a
personal computer (preferably with a math coprocessor), a workstation,
or a microcomputer with graphics capabilities, we strongly urge you to do
these exercises.
1.1. Write a program to numerically generate a sequence of iterates {xn}
from Xn+l = S(xn), where S is the quadratic map (1.1.1). Write your
program in such a way that the map S is called from a subroutine (so it
may be changed easily) and include graphics to display Xn versus n. When
displaying the sequence {xn} graphically, you will find it helpful to connect
successive values by a straight line so you can keep track of them. Save this
program under the name TRAJ so you can use it for further problems.
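One possible skeleton for TRAJ, stripped of the graphics step (Python; the language and the names traj and S are our own choices, not part of the exercise), is:

```python
# A minimal text-mode sketch of TRAJ: the map is kept in its own function
# so it can be swapped out easily, and the trajectory is returned as a
# list; plotting x_n versus n is left to whatever graphics you have.

def S(x, a=4.0):
    return a * x * (1.0 - x)            # quadratic map (1.1.1)

def traj(x0, n, step=S):
    xs = [x0]
    for _ in range(n):
        xs.append(step(xs[-1]))
    return xs

print(traj(0.1, 10))
```

Passing the map as the `step` argument is one way to satisfy the "callable from a subroutine" requirement.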
1.2. Using TRAJ study the behavior of (1.1.1) for various values of a
satisfying 3 ≤ a ≤ 4, and for various initial conditions x_0. (You can include
an option to generate x_0 using the random number generator if you wish,
but be careful to use a different seed number for each run.) At a given
value of a what can you say about the temporal behavior of the sequence
{x_n} for different x_0? What can you say concerning the qualitative and
quantitative differences in the trajectories {x_n} for different values of a?
1.3. To increase your understanding of the results in Exercise 1.2, write a
second program called BIFUR. This program will plot a large number of
iterates of the map (1.1.1) as a is varied between 3 and 4, and the result
will approximate the bifurcation diagram of (1.1.1). Procedurally, for each
value of a, use the random number generator (don't forget about the seed)
to select an initial x_0, discard the first 100 or so values of x_n to eliminate
transients, and then plot a large number (on the order of 1000 to 5000)
of the x_n vertically above the value of a. Then increment a and repeat
the process successively until you have reached the maximal value of a. A
good incremental value of a is Δa = 0.01 to 0.05; obviously, the smaller
Δa the better the resolution of the details of the bifurcation diagram, at
the expense of increased computation time. Use the resulting bifurcation
diagram you have produced, in conjunction with your results of Exercise
1.2, to more fully discuss the dynamics of (1.1.1). You may find it helpful
to make your graphics display flexible enough to "window" various parts
of the bifurcation diagram so you can examine fine detail.
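A text-mode core for BIFUR might look as follows (Python; the fixed initial point, transient length, and rounding tolerance are our own choices, and the plotting is omitted). Counting the distinct values that remain after the transients reveals the period-doubling cascade directly:

```python
# Sketch of the inner loop of BIFUR: for each a, discard transients and
# collect iterates; the set of distinct (rounded) values approximates the
# attractor, whose size shows the period at that parameter value.

def S(x, a):
    return a * x * (1.0 - x)

def attractor(a, x0=0.123, transients=1000, keep=1000, digits=6):
    x = x0
    for _ in range(transients):
        x = S(x, a)
    pts = set()
    for _ in range(keep):
        x = S(x, a)
        pts.add(round(x, digits))
    return sorted(pts)

for a in (3.2, 3.5, 3.9):
    print(a, len(attractor(a)))
```

A graphical BIFUR would simply plot each set `attractor(a)` vertically above a as a sweeps from 3 to 4.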
1.4. Write a program called DENTRAJ (Density from a Trajectory) to
display the histogram of the location of the iterates {x_n} of (1.1.1) for
various values of a satisfying 3 ≤ a ≤ 4, as was done in Figure 1.1.2 for
a = 4. [Constructing histograms from "data" like this is always a bit tricky
because there is a tradeoff between the number of points and the number of
bins in the histogram. However, a ratio of 200-300 of point number to bin
number should provide a satisfactory result, so, depending on the speed of
your computer (and thus the number of iterations that can be carried out
in a given time), you can obtain varying degrees of resolution.] Compare
your results with those from Exercise 1.3. Note that at a given value of
a, the bands you observed in the bifurcation diagram correspond to the
histogram supports (the places where the histogram is not zero).
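A minimal DENTRAJ, using the binning scheme (1.1.3) (Python; the initial point and the point/bin counts are our own choices, with N = 5000 points in n = 20 bins giving a ratio of 250, inside the suggested 200-300 range):

```python
# Sketch of DENTRAJ: bin the iterates of (1.1.1) into n equal
# subintervals of [0, 1], as in equation (1.1.3).

def S(x, a=4.0):
    return a * x * (1.0 - x)

def histogram(x0, N=5000, n=20, a=4.0):
    counts = [0] * n
    x = x0
    for _ in range(N):
        x = S(x, a)
        i = min(int(n * x), n - 1)    # lump the endpoint 1 into the last bin
        counts[i] += 1
    return counts

counts = histogram(0.123)
print(counts)
```

For a = 4 the counts are U-shaped, largest near 0 and 1 and smallest near 1/2, as in Figure 1.1.2.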
1.7. This exercise illustrates that there can sometimes be a danger in drawing conclusions about the behavior of even simple systems based on numerical experiments. Consider the Frobenius-Perron operator (1.2.13) corresponding to the r-adic transformation (1.2.12) when r is an integer. (a)
For every integer r show that f*(x) = 1_{[0,1]}(x) is a solution of Pf = f.
Can you prove that it is the unique solution? (b) For r = 2 and r = 3 use
TRAJ, DENTRAJ, and DENITER to study (1.2.12). What differences do
you see in the behaviors for r = 2 and r = 3? Why do these differences
exist? Discuss your numerical results in light of your computations in (a).
1.8. Consider the example of the weak repellor (1.3.2). (a) Derive the
Frobenius-Perron operator corresponding to the weak repellor without
looking in Chapter 6. Calculate a few terms of the sequence {P^n f} for
f(x) = 1_{[0,1]}(x). (b) Use TRAJ, DENTRAJ, and DENITER to study the
weak repellor (1.3.2). Discuss your results. Based on your observations,
what conjectures can you formulate about the behavior of the weak repellor? In what way do these differ from the properties of the quadratic map
(1.1.1) that you saw in Exercises 1.1-1.5?
2
The Toolbox
2.1 Measures and Measure Spaces

Definition 2.1.1. A collection A of subsets of a set X is called a σ-algebra
if:
(a) when A ∈ A, then X \ A ∈ A;
(b) given a finite or infinite sequence {A_k} of sets from A, the union
⋃_k A_k belongs to A; and
(c) X ∈ A.
From this definition it follows immediately, by properties (a) and (c), that the empty set ∅ belongs to A, since ∅ = X \ X. Further, given a
that the empty set 0 belongs to A, since 0 = X \ X. Further, given a
18
2. The Toolbox
sequence {Ak }, Ak
that
=X\U (X\Ak)
and then apply properties (a) and (b). Finally, the difference A \ B of two
sets A and B that belong to A also belongs to A because
Definition 2.1.2. A real-valued function μ defined on a σ-algebra A is
called a measure if:
(a) μ(∅) = 0;
(b) μ(A) ≥ 0 for all A ∈ A; and
(c) μ(⋃_k A_k) = Σ_k μ(A_k) if {A_k} is a finite or infinite sequence of pairwise disjoint sets from A, that is, A_i ⋂ A_j = ∅ for i ≠ j.
Remark 2.1.1. This definition of a measure and the properties of a σ-algebra A as detailed in Definition 2.1.1 ensure that (1) if we know the
measure of a set X and a subset A of X, we can determine the measure of
X \ A; and (2) if we know the measure of each disjoint subset A_k of A we
can calculate the measure of their union. □
Definition 2.1.3. If A is a σ-algebra of subsets of X and if μ is a measure
on A, then the triple (X, A, μ) is called a measure space. The sets belonging to A are called measurable sets because, for them, the measure
is defined.
Remark 2.1.3. If X = [0, 1] or R, the real line, then the most natural σ-algebra is the σ-algebra B of Borel sets (the Borel σ-algebra), which, by
definition, is the smallest σ-algebra containing intervals. (The word smallest
means that any other σ-algebra that contains intervals also contains any
set contained in B.) It can be proved that on the Borel σ-algebra there
exists a unique measure μ, called the Borel measure, such that μ([a, b]) =
b - a. Whenever considering spaces X = R or X = R^d or subsets of these
(intervals, squares, etc.) we always assume the Borel measure and will not
repeat this assumption again. □
As presented, Definition 2.1.3 is extremely general. In almost all applications a more specific measure space is adequate, as follows:

Definition 2.1.4. A measure space (X, A, μ) is called σ-finite if there is
a sequence {A_k}, A_k ∈ A, satisfying

X = ⋃_{k=1}^∞ A_k   and   μ(A_k) < ∞ for all k.
Remark 2.1.4. If X = R, the real line, and μ is the Borel measure, then
the A_k may be chosen as intervals of the form [-k, k]. In the d-dimensional
space R^d, the A_k may be chosen as balls of radius k. □
Definition 2.1.5. A measure space (X, A, μ) is called finite if μ(X) < ∞.
In particular, if μ(X) = 1, then the measure space is said to be normalized
or probabilistic.
Remark 2.1.5. We have defined a hierarchy of measure spaces from the
most general (Definition 2.1.3) down to the most specific (Definition 2.1.5).
Throughout this book, unless it is specifically stated to the contrary, a measure space will always be understood to be σ-finite. □
Remark 2.1.6. If a certain property involving the points of a measure
space is true except for a subset of that space having measure zero, then
we say that the property holds almost everywhere (abbreviated a.e.). □

2.2 Lebesgue Integration

For a function f: X → R, set

f⁺(x) = max(0, f(x))   and   f⁻(x) = max(0, -f(x)),

so that

f(x) = f⁺(x) - f⁻(x)   and   |f(x)| = f⁺(x) + f⁻(x).
Given a bounded measurable function f with 0 ≤ f(x) ≤ M, set a_i = iM/n
and

A_i = {x : f(x) ∈ [a_i, a_{i+1})},   i = 0, ..., n - 1.

The sets A_i are disjoint (A_i ⋂ A_j = ∅ for i ≠ j), and

0 ≤ f(x) - Σ_{i=0}^{n-1} a_i 1_{A_i}(x) ≤ M/n,

so simple functions of this form converge uniformly to f as n → ∞.
The Lebesgue integral of a simple function g(x) = Σ_i λ_i 1_{A_i}(x) is defined
as

∫_X g(x) μ(dx) = Σ_i λ_i μ(A_i),

and, for a bounded measurable function f, the Lebesgue integral of f is
defined by

∫_X f(x) μ(dx) = lim_{n→∞} ∫_X g_n(x) μ(dx),

where {g_n} is a sequence of simple functions converging uniformly to f.
Remark 2.2.1. It can be shown that the limit in Definition 2.2.3 exists
and is independent of the choice of the sequence of simple functions {g_n}
as long as they converge uniformly to f. □
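For a concrete instance of this construction, the sketch below (Python; the example function and all names are our own choices) integrates f(x) = x² over [0, 1] with the Borel measure by summing a_i μ(A_i) over the level sets A_i of an equally spaced partition of the range; the sums approach the familiar Riemann value 1/3 from below:

```python
import math

# Approximate a bounded nonnegative f from below by the simple function
# sum_i a_i 1_{A_i} built on the level sets A_i = {x : f(x) in [a_i, a_{i+1})},
# and integrate it as sum_i a_i mu(A_i).  For f(x) = x**2 on [0, 1] with the
# Borel measure, mu(A_i) = sqrt(a_{i+1}) - sqrt(a_i).

def lebesgue_lower_sum(levels, M=1.0):
    total = 0.0
    for i in range(levels):
        a_lo = M * i / levels
        a_hi = M * (i + 1) / levels
        mu_Ai = math.sqrt(a_hi) - math.sqrt(a_lo)  # length of {x : x**2 in [a_lo, a_hi)}
        total += a_lo * mu_Ai
    return total

approx = lebesgue_lower_sum(100000)
print(approx)
```

Note the slicing is horizontal (by values of f), in contrast to the vertical slicing of a Riemann sum; the two constructions agree here because x² is Riemann integrable.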
Definition 2.2.4. Let (X, A, μ) be a measure space, f: X → R a nonnegative unbounded measurable function, and define

f_M(x) = { f(x) if 0 ≤ f(x) ≤ M,
           M    if M < f(x).

Then the Lebesgue integral of f is defined by

∫_X f(x) μ(dx) = lim_{M→∞} ∫_X f_M(x) μ(dx).
Definition 2.2.5. For an arbitrary measurable function f: X → R, the
Lebesgue integral of f is defined by

∫_X f(x) μ(dx) = ∫_X f⁺(x) μ(dx) - ∫_X f⁻(x) μ(dx),

provided both integrals on the right-hand side are finite. In this case f is
called integrable.
Remark 2.2.3. The four Definitions 2.2.2-2.2.5 are for the Lebesgue integral of f over the entire space X. For A ∈ A we have, by definition,

∫_A f(x) μ(dx) = ∫_X f(x) 1_A(x) μ(dx). □
The Lebesgue integral has some important properties that we will often
use. We state them without proof. Throughout, a measure space (X, A, μ)
is assumed.

(L1) If f, g: X → R are measurable, g is integrable, and |f(x)| ≤ g(x),
then f is integrable and

|∫_X f(x) μ(dx)| ≤ ∫_X g(x) μ(dx).
(L2) If f is integrable and f(x) ≥ 0 a.e., then ∫_X f(x) μ(dx) ≥ 0, and the
integral is zero if and only if f(x) = 0 a.e.

(L3) If f₁, f₂: X → R are integrable, then for all λ₁, λ₂ ∈ R the linear
combination λ₁f₁ + λ₂f₂ is integrable and

∫_X [λ₁f₁(x) + λ₂f₂(x)] μ(dx) = λ₁ ∫_X f₁(x) μ(dx) + λ₂ ∫_X f₂(x) μ(dx).
(L4) Let f, g: X → R be measurable functions and f_n: X → R be measurable functions such that |f_n(x)| ≤ g(x) and {f_n(x)} converges to
f(x) almost everywhere. If g is integrable, then f and the f_n are also
integrable and

lim_{n→∞} ∫_X f_n(x) μ(dx) = ∫_X f(x) μ(dx).

The last formula is also true if the assumption |f_n(x)| ≤ g(x) with an
integrable g is replaced by 0 ≤ f₁(x) ≤ f₂(x) ≤ ⋯. In this case, however, the
integrals could be infinite.
Remark 2.2.4. The properties described in (L4) are often referred to as
the Lebesgue dominated convergence theorem (|f_n(x)| ≤ g(x)) and
the Lebesgue monotone convergence theorem (0 ≤ f₁(x) ≤ f₂(x) ≤ ⋯). □
(L5) Let f: X → R be an integrable function and the sets A_i ∈ A, i =
1, 2, ..., be disjoint. If A = ⋃_i A_i, then

Σ_i ∫_{A_i} f(x) μ(dx) = ∫_A f(x) μ(dx).
Note that f is integrable if and only if

∫_X |f(x)| μ(dx)

is finite, since

∫_X |f(x)| μ(dx) = ∫_X f⁺(x) μ(dx) + ∫_X f⁻(x) μ(dx).
Remark 2.2.6. Our definition of the Lebesgue integral was stated in four
distinct steps. It should be evident from this construction that for every
integrable function f there is a sequence of simple functions

f_n(x) = Σ_i λ_{i,n} 1_{A_{i,n}}(x)

such that

lim_{n→∞} f_n(x) = f(x) a.e.

and

|f_n(x)| ≤ |f(x)|.

Moreover, by (L4),

lim_{n→∞} ∫_X f_n(x) μ(dx) = ∫_X f(x) μ(dx). □
If X = [a, b] and μ is the Borel measure, then

∫_{[a,b]} f(x) μ(dx) = ∫_a^b f(x) dx,

where the left-hand side is the Lebesgue integral and the right-hand side
is the Riemann integral. This equality is true for any Riemann integrable
function f, since any Riemann integrable function is automatically Lebesgue
integrable. An analogous connection exists in higher dimensions. □
From the properties of the Lebesgue integral it is easy to demonstrate
that if f: X → R is a nonnegative integrable function, then μ_f(A), defined
by

μ_f(A) = ∫_A f(x) μ(dx),
is a measure. In particular, if μ(A) = 0, then

μ_f(A) = ∫_X 1_A(x) f(x) μ(dx) = 0,

since 1_A(x) f(x) = 0 a.e. Thus μ_f(A) satisfies all the properties of a measure as detailed in Definition 2.1.2, and μ_f(A) = 0 whenever μ(A) = 0.
This observation, that every integrable nonnegative function defines a finite
measure, can be reversed by the following theorem, which is of fundamental
importance for the development of the Frobenius-Perron operator.
Theorem 2.2.1 (Radon-Nikodym theorem). Let (X, A, μ) be a measure space and let ν be a second finite measure with the property that
ν(A) = 0 for all A ∈ A such that μ(A) = 0. Then there exists a nonnegative integrable function f: X → R such that

ν(A) = ∫_A f(x) μ(dx)   for all A ∈ A.

The function f is unique up to a set of measure zero. To see this, suppose
f₁ and f₂ both satisfy

ν(A) = ∫_A f₁(x) μ(dx) = ∫_A f₂(x) μ(dx)   for all A ∈ A,

so that

∫_A [f₁(x) - f₂(x)] μ(dx) = 0   for all A ∈ A.
Set

A₁ = {x : f₁(x) > f₂(x)}   and   A₂ = {x : f₁(x) ≤ f₂(x)}.

Then

0 = ∫_{A₁} [f₁(x) - f₂(x)] μ(dx) - ∫_{A₂} [f₁(x) - f₂(x)] μ(dx)
  = ∫_{A₁ ∪ A₂} |f₁(x) - f₂(x)| μ(dx)
  = ∫_X |f₁(x) - f₂(x)| μ(dx).
Hence f₁(x) = f₂(x) a.e. Thus, if

∫_A f₁(x) μ(dx) = ∫_A f₂(x) μ(dx)   for A ∈ A,

then f₁ = f₂ a.e.
Also, from property (L2) of the Lebesgue integral it is clear that two
measurable functions, f₁ and f₂, differing from each other only on a set of
measure zero, cannot be distinguished by calculating integrals. Thus we say
that in the space of measurable functions, every two functions f₁, f₂
differing only on a set of measure zero represent the same element of that
space. However, to simplify our notation, we will often write "measurable
function" instead of "an element of the space of measurable functions."
Because of property (L2) this should not lead to any confusion.

With these remarks in mind, we now introduce the concept of an L^p
space.
Definition 2.2.6. Let (X, A, μ) be a measure space and p a real number,
1 ≤ p < ∞. The family of all possible real-valued measurable functions
f: X → R satisfying

∫_X |f(x)|^p μ(dx) < ∞   (2.2.2)

is the L^p(X, A, μ) space. Here we use the term "measurable function" to
mean "an element of the space of measurable functions."

We shall sometimes write L^p instead of L^p(X, A, μ) if the measure space
is understood, or L^p(X) if A and μ are understood. Note that if p = 1, then
the L¹ space consists of all possible integrable functions.

The integral appearing in (2.2.2) is very important for an element f ∈ L^p.
Thus it is assigned the special notation

‖f‖_{L^p} = [∫_X |f(x)|^p μ(dx)]^{1/p}   (2.2.3)
and is called the lJ' norm of f. When property (12) of the Lebesgue
integral is applied to Ill", it immediately follows that the condition 11/IIL,. =
0 is equivalent to f(x) = 0 a..e. Or, more precisely, 11/IIL,. = 0 if and only if
f is a zero element in lJ' (which is an element represented by a.ll functions
equal to zero almost everywhere).
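For intuition, the norm (2.2.3) is easy to approximate numerically. The sketch below (our illustration, not part of the text; the helper name `lp_norm` is ours) computes ||f||_{L^p} on an interval with the Borel measure by midpoint quadrature.

```python
import numpy as np

def lp_norm(f, p, a=0.0, b=1.0, n=100_000):
    """Approximate ||f||_{L^p} on [a, b] with Borel (length) measure,
    using midpoint quadrature for the integral in (2.2.3)."""
    x = a + (b - a) * (np.arange(n) + 0.5) / n
    return (np.sum(np.abs(f(x)) ** p) * (b - a) / n) ** (1.0 / p)

# ||1||_{L^p([0,1])} = 1 for every p, and ||x||_{L^2([0,1])} = 1/sqrt(3)
print(lp_norm(lambda x: np.ones_like(x), 5))   # -> 1.0
print(lp_norm(lambda x: x, 2))                 # close to 1/sqrt(3) = 0.5773...
```

The same helper can be used to check the homogeneity and triangle-inequality properties of the norm stated next.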
Two other important properties of the norm are

||αf||_{L^p} = |α| ||f||_{L^p}    for f ∈ L^p, α ∈ R,    (2.2.4)
2. The Toolbox
and

||f + g||_{L^p} ≤ ||f||_{L^p} + ||g||_{L^p}    for f, g ∈ L^p.    (2.2.5)
The first condition, (2.2.4), simply says that the norm is homogeneous. The second is called the triangle inequality. As shown in Figure 2.2.2, if we think of f, g, and f + g as vectors, we can consider a triangle with sides f, g, and f + g. Then, by inequality (2.2.5), the length of the side (f + g) is not longer than the sum of the lengths of the other two sides.

From (2.2.4) it follows that for every f ∈ L^p and real α, the product αf belongs to L^p. Further, from (2.2.5) it follows that for every f, g ∈ L^p the sum f + g is also an element of L^p. This is denoted by saying that L^p is a vector space.
Because the value of ||f||_{L^p} is interpreted as the length of f, we say that ||f − g||_{L^p} is the distance between f and g. For every p, 1 < p < ∞, the adjoint index p' is defined by

1/p + 1/p' = 1,

and L^{p'} is called the space adjoint to L^p. For f, g ∈ L^2 (note that p = 2 is adjoint to itself) the scalar product is defined by

(f, g) = ∫_X f(x) g(x) μ(dx).
An important relation we will often use is the Cauchy-Holder inequality: if f ∈ L^p and g ∈ L^{p'}, then fg is integrable and

|∫_X f(x) g(x) μ(dx)| ≤ ||f||_{L^p} ||g||_{L^{p'}}.

In the special case p = 1 the adjoint space is L^∞, the family of all measurable functions g for which there is a constant c such that

|g(x)| ≤ c    for almost all x ∈ X.

The smallest such constant is denoted by ess sup |g(x)| and is called the essential supremum of g; it serves as the norm ||g||_{L^∞}.
Remark 2.2.10. As we almost always work in L^1, we will not indicate the space in which the norm is taken unless it is not L^1. Thus we will write ||f|| instead of ||f||_{L^1}. Observe that in L^1 the norm has the exceptional property that the triangle inequality is sometimes an equality. To see this, note from property (L3) that

||f + g|| = ||f|| + ||g||    for f, g ≥ 0, f, g ∈ L^1.

Finally, observe that every nonnegative f ∈ L^1 defines a finite measure ν by

ν(A) = ∫_A f(x) μ(dx)    for A ∈ A.
One of the most important notions in analysis, measure theory, and topology, as well as other areas of mathematics, is that of the Cartesian product.
To introduce this concept we start with a definition.
Definition 2.2.8. Given two arbitrary sets A_1 and A_2, the Cartesian product of A_1 and A_2 (note that the order is important) is the set of all pairs (x_1, x_2) such that x_1 ∈ A_1 and x_2 ∈ A_2. This is customarily written as

A_1 × A_2 = {(x_1, x_2): x_1 ∈ A_1, x_2 ∈ A_2}.
In a natural way this concept may be extended to more than two sets. Thus the Cartesian product of the sets A_1, ..., A_d is the set of all sequences (x_1, ..., x_d) such that x_i ∈ A_i, i = 1, ..., d, or

A_1 × ··· × A_d = {(x_1, ..., x_d): x_i ∈ A_i, i = 1, ..., d}.

Now let measure spaces (X_i, A_i, μ_i), i = 1, ..., d, be given and set

X = X_1 × ··· × X_d,    (2.2.6)

let A be the smallest σ-algebra of subsets of X containing all sets of the form A_1 × ··· × A_d with A_i ∈ A_i,    (2.2.7)

and define

μ(A_1 × ··· × A_d) = μ_1(A_1) ··· μ_d(A_d).    (2.2.8)

Unfortunately, by themselves these formulas do not define a measure space (X, A, μ). There is no problem with either X or A, but μ is defined only on special sets, namely A = A_1 × ··· × A_d, which do not form a σ-algebra. To show that μ, as defined by (2.2.8), can be extended to the entire σ-algebra A requires the following theorem.
Theorem 2.2.2. If measure spaces (X_i, A_i, μ_i), i = 1, ..., d, are given and X, A, and μ are defined by equations (2.2.6), (2.2.7), and (2.2.8), respectively, then there exists a unique extension of μ to a measure defined on A.

The measure space (X, A, μ) whose existence is guaranteed by Theorem 2.2.2 is called the product of the measure spaces (X_1, A_1, μ_1), ..., (X_d, A_d, μ_d), or more briefly a product space. The measure μ is called the product measure.

Observe that from equation (2.2.8) it follows that

μ(X_1 × ··· × X_d) = μ_1(X_1) ··· μ_d(X_d).

Thus, if all the measure spaces (X_i, A_i, μ_i) are finite or probabilistic, then (X, A, μ) will also be finite or probabilistic.
Theorem 2.2.2 allows us to define integration on the product space (X, A, μ) since it is also a measure space. A function f: X → R may be written as a function of d variables because every point x ∈ X is a sequence x = (x_1, ..., x_d), x_i ∈ X_i. Thus it is customary to write integrals on X either as

∫_X f(x) μ(dx),    (2.2.9)

or as the iterated integral

∫_{X_1} ··· ∫_{X_d} f(x_1, ..., x_d) μ_d(dx_d) ··· μ_1(dx_1).    (2.2.10)
Remark 2.2.11. As we noted in Remark 2.1.3, the "natural" Borel measure on the real line R is defined on the smallest σ-algebra B that contains all intervals. For every interval [a, b] this measure satisfies μ([a, b]) = b − a. Having the structure (R, B, μ), we define by Theorem 2.2.2 the product space (R^d, B^d, μ^d), where

R^d = R × ··· × R    (d times),

B^d = B × ··· × B,

and

μ^d([a_1, b_1] × ··· × [a_d, b_d]) = (b_1 − a_1) ··· (b_d − a_d).    (2.2.11)

The measure μ^d is again called the Borel measure. It is easily verified that B^d may be alternately defined as either the smallest σ-algebra containing all the rectangles

[a_1, b_1] × ··· × [a_d, b_d],

or as the smallest σ-algebra containing all the open subsets of R^d. From (2.2.11) it follows that μ^d([0, 1]^d) = 1, where

[0, 1]^d = [0, 1] × ··· × [0, 1].
In all cases (R^d, [0, 1]^d, etc.) we will omit the superscript d on B^d and μ^d and write (R^d, B, μ) instead of (R^d, B^d, μ^d). Furthermore, in all cases when the space is R, R^d, or any subset of these ([0, 1], [0, 1]^d, R+ = (0, ∞), etc.) and the measure and σ-algebra are not specified, we will assume that the measure space is taken with the Borel σ-algebra and Borel measure. Finally, all the integrals on R or R^d taken with respect to the Borel measure will be written with dx instead of μ(dx). □
Remark 2.2.12. From the additivity property of a measure (Definition 2.1.2c) it follows that every measure is monotonic, that is, if A and B are measurable sets and A ⊂ B then μ(A) ≤ μ(B). This follows directly from additivity applied to the decomposition B = A ∪ (B \ A), which gives μ(B) = μ(A) + μ(B \ A) ≥ μ(A). □
Using the scalar product (f, g), we may now define three notions of convergence in L^p spaces.

Definition 2.3.1. A sequence of functions {f_n}, f_n ∈ L^p, 1 ≤ p < ∞, is Cesaro convergent to f ∈ L^p if

lim_{n→∞} (1/n) Σ_{k=1}^{n} (f_k, g) = (f, g)    for all g ∈ L^{p'}.    (2.3.1)

Definition 2.3.2. A sequence of functions {f_n}, f_n ∈ L^p, 1 ≤ p < ∞, is weakly convergent to f ∈ L^p if

lim_{n→∞} (f_n, g) = (f, g)    for all g ∈ L^{p'}.    (2.3.2)

Definition 2.3.3. A sequence of functions {f_n}, f_n ∈ L^p, 1 ≤ p < ∞, is strongly convergent to f ∈ L^p if

lim_{n→∞} ||f_n − f||_{L^p} = 0.    (2.3.3)

By the Cauchy-Holder inequality, strong convergence implies weak convergence, and the condition for strong convergence is relatively straightforward to check. However, the condition for weak convergence requires a demonstration that it holds for all g ∈ L^{p'}, which seems difficult to do at first glance. In some special and important spaces, it is sufficient to check weak convergence for a restricted class of functions, defined as follows.
Definition 2.3.4. A subset K ⊂ L^p is called linearly dense if for each f ∈ L^p and ε > 0 there are g_1, ..., g_n ∈ K and constants λ_1, ..., λ_n such that

||f − (λ_1 g_1 + ··· + λ_n g_n)||_{L^p} ≤ ε.
and hence the sequence {11/nll2} is bounded. Now take an arbitrary function g(x) = sin(m1rx) from Ks. We have
1
(/n,g) = 1 sin(nx) sin(m1rx) dx
sin(n- m1r)
2(n- m1r)
sin(n + m1r)
2(n + m1r)
so that
lim (/n, g) = (0, g) = 0,
for g E Ks
n-+oo
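The decay of (f_n, g) is easy to watch numerically. The following sketch (ours; the quadrature helper `inner` is an assumption, not from the text) approximates the scalar product on [0, 1].

```python
import numpy as np

def inner(f, g, n=200_000):
    """Midpoint approximation of (f, g) = integral over [0,1] of f(x) g(x) dx."""
    x = (np.arange(n) + 0.5) / n
    return np.sum(f(x) * g(x)) / n

g = lambda x: np.sin(3 * np.pi * x)        # a fixed element of K_s (m = 3)
for n in (10, 100, 1000):
    fn = lambda x, n=n: np.sin(n * x)      # f_n(x) = sin(nx)
    print(n, inner(fn, g))                 # tends to 0 as n grows
```

Note that the f_n do not converge strongly: their norms stay bounded away from zero, which is exactly the gap between weak and strong convergence.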
We have seen that, in a given L^p space, strong convergence implies weak convergence. It also turns out that we may compare convergence in different L^p spaces using the following proposition.
Proposition 2.3.1. If (X, A, μ) is a finite measure space and 1 ≤ p_1 < p_2 ≤ ∞, then

||f||_{L^{p_1}} ≤ c ||f||_{L^{p_2}}    for every f ∈ L^{p_2},    (2.3.4)

where c depends on μ(X). Thus every element of L^{p_2} belongs to L^{p_1}, and strong convergence in L^{p_2} implies strong convergence in L^{p_1}.

Proof. The case p_2 = ∞ is immediate, so let p_2 < ∞. Setting p' = p_2/p_1 and denoting by p the number adjoint to p', that is, (1/p) + (1/p') = 1, we have by the Cauchy-Holder inequality

∫_X |f(x)|^{p_1} μ(dx) = ∫_X 1 · |f(x)|^{p_1} μ(dx) ≤ (μ(X))^{1/p} (∫_X |f(x)|^{p_1 p'} μ(dx))^{1/p'}

and, consequently, taking the p_1-th root,

||f||_{L^{p_1}} ≤ (μ(X))^{1/(p_1 p)} ||f||_{L^{p_2}},

which proves equation (2.3.4). Hence, if ||f||_{L^{p_2}} is finite, then ||f||_{L^{p_1}} is also finite, proving that L^{p_2} is contained in L^{p_1}. Furthermore, applying (2.3.4) to f_n − f shows that strong convergence in L^{p_2} implies strong convergence in L^{p_1}. In particular, strong convergence in L^1 implies

lim_{n→∞} ∫_X f_n(x) μ(dx) = ∫_X f(x) μ(dx).

To see this simply note that

|∫_X f_n(x) μ(dx) − ∫_X f(x) μ(dx)| ≤ ∫_X |f_n(x) − f(x)| μ(dx) = ||f_n − f||_{L^1}.
It is often necessary to define a function as a limit of a convergent sequence and/or as a sum of a convergent series. Thus the question arises how to show that a sequence {f_n} is convergent if the limit is unknown. The famous Cauchy condition for convergence provides such a tool.
To understand this condition, first assume that {f_n}, f_n ∈ L^p, is strongly convergent to f. Take ε > 0. Then there is an integer n_0 such that

||f_n − f||_{L^p} ≤ ε/2    for n ≥ n_0

and, in particular,

||f_{n+k} − f||_{L^p} ≤ ε/2    for n ≥ n_0 and k ≥ 0.

From this and the triangle inequality, we obtain

||f_{n+k} − f_n||_{L^p} ≤ ||f_{n+k} − f||_{L^p} + ||f − f_n||_{L^p} ≤ ε.

Thus we have proved that, if {f_n} is strongly convergent in L^p to f, then

lim_{n→∞} ||f_{n+k} − f_n||_{L^p} = 0    uniformly in k ≥ 0.    (2.3.5)
Theorem 2.3.1. Let (X, A, μ) be a measure space and let {f_n}, f_n ∈ L^p(X, A, μ), be a sequence such that equation (2.3.5) holds. Then there exists an element f ∈ L^p(X, A, μ) such that {f_n} converges strongly to f, that is, condition (2.3.3) holds.

The fact that Theorem 2.3.1 holds for L^p spaces is referred to by saying that L^p spaces are complete.
Theorem 2.3.1 enables us to prove the convergence of series by the use of a comparison series. Suppose we have a sequence {g_n} ⊂ L^p and we know the series of norms is convergent, that is,

Σ_{n=0}^{∞} ||g_n||_{L^p} < ∞.    (2.3.6)

Then the series Σ_{n=0}^{∞} g_n is strongly convergent to some f ∈ L^p, that is,

lim_{n→∞} ||f − Σ_{m=0}^{n} g_m||_{L^p} = 0.    (2.3.7)

To see this note that the convergence of (2.3.7) simply means that the
sequence of partial sums

s_n = Σ_{m=0}^{n} g_m

is strongly convergent. Set

a_n = Σ_{m=0}^{n} ||g_m||_{L^p}.

From equation (2.3.6) the sequence of real numbers {a_n} is convergent and, therefore, the Cauchy condition holds for this sequence. Thus

lim_{n→∞} |a_{n+k} − a_n| = 0    uniformly for k ≥ 0.

Further,

||s_{n+k} − s_n||_{L^p} = ||Σ_{m=n+1}^{n+k} g_m||_{L^p} ≤ Σ_{m=n+1}^{n+k} ||g_m||_{L^p} = |a_{n+k} − a_n|,

so finally

lim_{n→∞} ||s_{n+k} − s_n||_{L^p} = 0    uniformly for k ≥ 0,

and by Theorem 2.3.1 the sequence {s_n} converges strongly to some f ∈ L^p.
Exercises
2.1. Using Definition 2.1.2 prove the following "continuity properties" of the measure:

(a) If {A_n} is a sequence of sets belonging to A and A_1 ⊂ A_2 ⊂ ···, then

μ(∪_n A_n) = lim_{n→∞} μ(A_n).

(b) If {A_n} is a sequence of sets belonging to A, A_1 ⊃ A_2 ⊃ ···, and μ(A_1) < ∞, then

μ(∩_n A_n) = lim_{n→∞} μ(A_n).
2.2. Let X = {1, 2, ...} be the set of positive integers. For each A ⊂ X define

k(n, A) = the number of elements of the set A ∩ {1, ..., n}

and, whenever the limit exists,

μ(A) = lim_{n→∞} k(n, A)/n.

Is μ, defined on the family of those sets A for which the limit exists, a measure?
2.5. Consider the space (X, A, μ) where X = {1, 2, ...} is the set of positive integers, A all subsets of X, and μ the counting measure. Prove that a function f: X → R is integrable if and only if

Σ_{k=1}^{∞} |f(k)| < ∞,

and that

∫_X f(x) μ(dx) = Σ_{k=1}^{∞} f(k).

[Remark. L^1(X, A, μ) is therefore identical with the space of all absolutely convergent sequences. It is denoted by l^1.]
2.6. From Proposition 2.3.1 we have derived the statement: if 1 ≤ p_1 < p_2 ≤ ∞ and μ(X) < ∞, then the strong convergence of {f_n} to f in L^{p_2} (f_n, f ∈ L^{p_2}) implies the strong convergence of {f_n} to f in L^{p_1}. Construct an example showing that this statement is false when μ(X) = ∞ even if f_n, f ∈ L^{p_1} ∩ L^{p_2}.
2.7. Let (X, A, μ) be a finite measure space and let g ∈ L^∞(X) be fixed. Show that

lim_{p→∞} ||g||_{L^p} = ||g||_{L^∞}.
2.8. The spaces L^p(X, A, μ) are seldom considered for 0 < p < 1 because in this case an important property of the norm ||f||_{L^p} given by formula (2.2.3) is not satisfied. Which property?
3
Markov and Frobenius-Perron
Operators
Taking into account the concepts of the preceding chapter, we are now
ready to formally introduce the Frobenius-Perron operator, which, as we
saw in Chapter 1, is of considerable use in studying the evolution of densities
under the operation of deterministic systems.
However, as a preliminary step, we develop the more general concept of
the Markov operator and derive some of its properties. Our reasons for this
approach are twofold: First, as will become clear, many concepts concerning
the asymptotic behavior of densities may be equally well formulated for
both deterministic and stochastic systems. Second, many of the results that
we develop in later chapters concerning the behavior of densities evolving
under the influence of deterministic systems are simply special cases of
more general results for stochastic systems.
The theory of Markov operators is extremely rich and varied, and we have
chosen an approach particularly suited to an examination of the eventual
behavior of densities in dynamical systems. Foguel [1969] contains an exhaustive survey of the asymptotic properties of Markov operators.
3.1
Markov Operators
Definition 3.1.1. Let (X, A, μ) be a measure space. Any linear operator P: L^1 → L^1 satisfying

(a) Pf ≥ 0    for f ≥ 0, f ∈ L^1;

(b) ||Pf|| = ||f||    for f ≥ 0, f ∈ L^1    (3.1.1)

is called a Markov operator.

Remark 3.1.1. In conditions (a) and (b), the symbols f and Pf denote elements of L^1 represented by functions that can differ on a set of measure zero. Thus, for any such function, properties f ≥ 0 and Pf ≥ 0 hold almost everywhere. When it is clear that we are dealing with elements of L^1 (or L^p), we will drop the "almost everywhere" notation. □
Markov operators have a number of properties that we will have occasion to use. First, if f, g ∈ L^1, then:

(M1) If f(x) ≥ g(x), then Pf(x) ≥ Pg(x);    (3.1.2)

(M2) (Pf(x))^+ ≤ Pf^+(x);    (3.1.3)

(M3) (Pf(x))^- ≤ Pf^-(x);    (3.1.4)

and, as a consequence of (M2) and (M3),

|Pf(x)| ≤ P|f(x)|;    (3.1.5)

(M4) ||Pf|| ≤ ||f||.    (3.1.6)

Property (M4) follows from (3.1.5) by integration:

||Pf|| = ∫_X |Pf(x)| μ(dx) ≤ ∫_X P|f(x)| μ(dx) = ∫_X |f(x)| μ(dx) = ||f||,

where the last equality uses property (b) applied to the nonnegative function |f|.
An immediate consequence of (M4) and the linearity of P is

||Pf_1 − Pf_2|| ≤ ||f_1 − f_2||    for f_1, f_2 ∈ L^1.    (3.1.7)

Inequality (3.1.7) simply states that during the process of iteration of two individual functions the distance between them can decrease, but it can never increase. This is referred to as the stability property of iterates of Markov operators.
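On a finite space with counting measure, a Markov operator is simply multiplication by a column-stochastic matrix (cf. Exercise 3.3), and properties (a), (b), and (3.1.7) can be checked directly. The sketch below is our illustration; the particular matrix is arbitrary.

```python
import numpy as np

# Column-stochastic matrix: entries >= 0, each column sums to 1.
M = np.array([[0.5, 0.2, 0.0],
              [0.3, 0.5, 1.0],
              [0.2, 0.3, 0.0]])

def P(f):
    """(Pf)_i = sum_j p_ij f_j -- a Markov operator on L1({1,2,3})."""
    return M @ f

f = np.array([0.7, 0.1, 0.2])   # a density
g = np.array([0.2, 0.5, 0.3])   # another density
dist = np.abs(f - g).sum()
for _ in range(5):
    f, g = P(f), P(g)
    new_dist = np.abs(f - g).sum()
    assert np.all(f >= 0) and np.isclose(f.sum(), 1.0)  # density -> density
    assert new_dist <= dist + 1e-12                     # stability (3.1.7)
    dist = new_dist
print("densities preserved; L1 distance nonincreasing")
```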
By the support of a function g we simply mean the set of all x such that g(x) ≠ 0, that is,

supp g = {x: g(x) ≠ 0}.    (3.1.8)

Remark 3.1.3. Equalities and inclusions between sets, such as

A = B modulo zero,    (3.1.9)

are always understood modulo zero; saying that A and B differ by a set of measure zero means that the set of x in A that does not belong to B, or vice versa, has measure zero. □
One might wonder under what conditions the contractive property (3.1.6) is a strict inequality. The answer is quite simple.

Proposition 3.1.2. ||Pf|| = ||f|| if and only if Pf^+ and Pf^- have disjoint supports.

Proof. Pointwise we always have

|Pf(x)| = |Pf^+(x) − Pf^-(x)| ≤ Pf^+(x) + Pf^-(x).

Clearly the inequality will be strict if both Pf^+(x) > 0 and Pf^-(x) > 0, while equality holds if Pf^+(x) = 0 or Pf^-(x) = 0. Thus, by integrating over the space X, we have

∫_X |Pf(x)| μ(dx) = ∫_X [Pf^+(x) + Pf^-(x)] μ(dx)

if and only if there is no set A ∈ A, μ(A) > 0, such that Pf^+(x) > 0 and Pf^-(x) > 0 for x ∈ A, that is, Pf^+ and Pf^- have disjoint supports. Since f = f^+ − f^-, the left-hand integral is simply ||Pf||. Further, the right-hand side is ||Pf^+|| + ||Pf^-|| = ||f^+|| + ||f^-|| = ||f||, so the proposition is proved.
Having developed some of the more important elementary properties of
Markov operators, we now introduce the concept of a fixed point of P.
Definition 3.1.2. If P is a Markov operator and, for some f ∈ L^1, Pf = f, then f is called a fixed point of P.
From Proposition 3.1.1 it is easy to show the following.

Proposition 3.1.3. If Pf = f, then also Pf^+ = f^+ and Pf^- = f^-.

Proof. From Pf = f and properties (M2) and (M3) we have

f^+ = (Pf)^+ ≤ Pf^+  and  f^- = (Pf)^- ≤ Pf^-,

hence

∫_X [Pf^+(x) − f^+(x)] μ(dx) + ∫_X [Pf^-(x) − f^-(x)] μ(dx) = ∫_X P|f(x)| μ(dx) − ∫_X |f(x)| μ(dx) = ||P|f| || − || |f| || = 0,

since P preserves the norm of the nonnegative function |f|. Since both the integrands (Pf^+ − f^+) and (Pf^- − f^-) are nonnegative, this last equality is possible only if Pf^+ = f^+ and Pf^- = f^-.
Definition 3.1.3. Let (X, A, μ) be a measure space and the set D(X, A, μ) be defined by

D(X, A, μ) = {f ∈ L^1(X, A, μ): f ≥ 0 and ||f|| = 1}.

Any function f ∈ D(X, A, μ) is called a density.

Definition 3.1.4. If f ∈ L^1(X, A, μ) and f ≥ 0, then the measure

μ_f(A) = ∫_A f(x) μ(dx)    for all A ∈ A

is said to be absolutely continuous with respect to μ, and f is called the density of μ_f.
3.2 The Frobenius-Perron Operator

Let (X, A, μ) be a measure space and S: X → X a nonsingular transformation; recall that nonsingularity means μ(S^{-1}(A)) = 0 whenever μ(A) = 0. The Frobenius-Perron operator is constructed in two steps.

1. First assume f ∈ L^1 and f ≥ 0, and consider the set function

ν(A) = ∫_{S^{-1}(A)} f(x) μ(dx)    for A ∈ A.    (3.2.1)

Because

S^{-1}(∪_k A_k) = ∪_k S^{-1}(A_k)

for any countable family of disjoint sets A_k, it follows from property (L5) of the Lebesgue integral that the integral (3.2.1) defines a finite measure, and by the nonsingularity of S this measure vanishes on every set of μ-measure zero. Thus, by the Radon-Nikodym theorem and Corollary 2.2.1, there is a unique element in L^1, which we denote by Pf, such that

∫_A Pf(x) μ(dx) = ∫_{S^{-1}(A)} f(x) μ(dx)    for A ∈ A.

2. Now let f ∈ L^1 be arbitrary. Write f = f^+ − f^- and define Pf = Pf^+ − Pf^-. By the linearity of the integral,

∫_A Pf(x) μ(dx) = ∫_{S^{-1}(A)} f(x) μ(dx)    for A ∈ A.    (3.2.2)
From Proposition 2.2.1 and the nonsingularity of S, it follows that equation (3.2.2) uniquely defines P.
We summarize these comments as follows.
Definition 3.2.3. Let (X, A, μ) be a measure space. If S: X → X is a nonsingular transformation, the unique operator P: L^1 → L^1 defined by equation (3.2.2) is called the Frobenius-Perron operator corresponding to S.
It is straightforward to show from (3.2.2) that P has the following properties:

(FP1) P(λ_1 f_1 + λ_2 f_2) = λ_1 Pf_1 + λ_2 Pf_2 for all f_1, f_2 ∈ L^1, λ_1, λ_2 ∈ R, so P is a linear operator;    (3.2.3)

(FP2) Pf ≥ 0 if f ≥ 0;    (3.2.4)

(FP3) ∫_X Pf(x) μ(dx) = ∫_X f(x) μ(dx);    (3.2.5)

(FP4) If S_n = S ∘ ··· ∘ S (n times) and P_n is the Frobenius-Perron operator corresponding to S_n, then P_n = P^n, where P is the Frobenius-Perron operator corresponding to S.
If X is an interval and A = [0, x], then (3.2.2) takes the especially useful form

∫_0^x Pf(s) ds = ∫_{S^{-1}([0,x])} f(s) ds,

and by differentiating

Pf(x) = (d/dx) ∫_{S^{-1}([0,x])} f(s) ds.    (3.2.6)

If, in addition, S is differentiable and invertible with increasing inverse, then S^{-1}([0, x]) = [S^{-1}(0), S^{-1}(x)], so that

Pf(x) = (d/dx) ∫_{S^{-1}(0)}^{S^{-1}(x)} f(s) ds = f(S^{-1}(x)) (d/dx) S^{-1}(x).    (3.2.7)

Example 3.2.1. Let S(x) = exp(x), so that S^{-1}(x) = ln x and (d/dx)S^{-1}(x) = 1/x, and consider the initial density

f(x) = (1/2) 1_{[-1,1]}(x),

shown in Figure 3.2.1a. Under the action of P, the function f is carried into

Pf(x) = (1/(2x)) 1_{[e^{-1}, e]}(x),

as shown in Figure 3.2.1b. □
Two important points are illustrated by this example. The first is that for an initial f supported on a set [a, b], Pf will be supported on [S(a), S(b)]. Second, Pf is small where (dS/dx) is large and vice versa.

We generalize the first observation as follows.
Proposition 3.2.1. Let S: X → X be a nonsingular transformation and P the associated Frobenius-Perron operator. Assume that an f ≥ 0, f ∈ L^1, is given. Then

supp f ⊂ S^{-1}(supp Pf),    (3.2.8)

and, more generally, for every set A ∈ A the following equivalence holds: Pf(x) = 0 for x ∈ A if and only if f(x) = 0 for x ∈ S^{-1}(A).
Proof. The proof is straightforward. By the definition of the Frobenius-Perron operator, we have

∫_A Pf(x) μ(dx) = ∫_{S^{-1}(A)} f(x) μ(dx)

or

∫_X 1_A(x) Pf(x) μ(dx) = ∫_X 1_{S^{-1}(A)}(x) f(x) μ(dx).

Thus Pf(x) = 0 on A implies, by property (L2) of the Lebesgue integral, that f(x) = 0 for x ∈ S^{-1}(A), and vice versa. Now setting A = X \ supp(Pf), we have Pf(x) = 0 for x ∈ A and, consequently, f(x) = 0 for x ∈ S^{-1}(A), which means that supp f ⊂ X \ S^{-1}(A). Since S^{-1}(A) = X \ S^{-1}(supp(Pf)), this completes the proof.
Remark 3.2.3. In the case of arbitrary f ∈ L^1, then, in Proposition 3.2.1 we only have: If f(x) = 0 for all x ∈ S^{-1}(A), then Pf(x) = 0 for all x ∈ A. That the converse is not true can be seen by the following example. Take S(x) = 2x (mod 1) and let

f(x) = 1  for 0 ≤ x < 1/2,    f(x) = −1  for 1/2 ≤ x ≤ 1.

Then from (1.2.13), Pf(x) = 0 for all x ∈ [0, 1], even though f vanishes nowhere. □
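Formula (1.2.13) for the dyadic transformation makes this example easy to check in code; the small sketch below is our illustration (the helper name `P_dyadic` is ours).

```python
import numpy as np

def P_dyadic(f):
    """Frobenius-Perron operator of S(x) = 2x (mod 1), cf. (1.2.13):
    Pf(x) = (1/2) [f(x/2) + f(x/2 + 1/2)]."""
    return lambda x: 0.5 * (f(x / 2.0) + f(x / 2.0 + 0.5))

f = lambda x: np.where(x < 0.5, 1.0, -1.0)   # +1 on [0,1/2), -1 on [1/2,1]
x = (np.arange(256) + 0.5) / 256             # interior sample points
print(np.max(np.abs(P_dyadic(f)(x))))        # -> 0.0, although f vanishes nowhere
```

For nonnegative f, by contrast, (FP3) holds: the same operator preserves the integral, as a quick check with, say, f(x) = 2x confirms.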
In the two-dimensional case an analogous representation holds. Taking A = [a, x] × [c, y], equation (3.2.2) becomes

∫_a^x ds ∫_c^y Pf(s, t) dt = ∫∫_{S^{-1}([a,x] × [c,y])} f(s, t) ds dt.

Differentiating first with respect to x and then with respect to y, we have immediately that

Pf(x, y) = (∂²/∂x∂y) ∫∫_{S^{-1}([a,x] × [c,y])} f(s, t) ds dt.
For invertible transformations an even simpler representation is available. To derive it we first state and prove a change of variables theorem based on the Radon-Nikodym theorem.

Theorem 3.2.1. Let (X, A, μ) be a measure space, S: X → X a nonsingular transformation, and f: X → R a measurable function such that f ∘ S ∈ L^1(X, A, μ). Then for every A ∈ A,

∫_{S^{-1}(A)} f(S(x)) μ(dx) = ∫_A f(x) μS^{-1}(dx) = ∫_A f(x) J^{-1}(x) μ(dx),

where μS^{-1} denotes the measure A ↦ μ(S^{-1}(A)) and J^{-1} is its density with respect to μ.
Remark 3.2.4. We use the notation J^{-1}(x) to draw the connection with differentiable invertible transformations on R^d, in which case J(x) is the absolute value of the determinant of the Jacobian matrix,

J(x) = |det(dS(x)/dx)|,

so that J^{-1}(x) corresponds to |det(dS^{-1}(x)/dx)|. □
Proof. First take f = 1_B with B ∈ A, so that f(S(x)) = 1_{S^{-1}(B)}(x). The first integral of the theorem is then

∫_{S^{-1}(A)} f(S(x)) μ(dx) = ∫_X 1_{S^{-1}(A)}(x) f(S(x)) μ(dx) = ∫_X 1_{S^{-1}(A)}(x) 1_{S^{-1}(B)}(x) μ(dx)
= μ(S^{-1}(A) ∩ S^{-1}(B)) = μ(S^{-1}(A ∩ B)).

The second integral of the theorem may be written as

∫_A f(x) μS^{-1}(dx) = ∫_X 1_A(x) 1_B(x) μS^{-1}(dx) = μS^{-1}(A ∩ B) = μ(S^{-1}(A ∩ B)),

and the third, using the definition of J^{-1} as the density of μS^{-1}, as

∫_A f(x) J^{-1}(x) μ(dx) = ∫_X 1_A(x) 1_B(x) J^{-1}(x) μ(dx) = ∫_{A ∩ B} J^{-1}(x) μ(dx) = μS^{-1}(A ∩ B).

Thus we have shown that the theorem is true for functions of the form f(x) = 1_B(x). To complete the proof we need only to repeat it for simple functions f(x), which will certainly be true by linearity [property (L3)] of the Lebesgue integral. Finally, we may pass to the limit for arbitrary bounded and integrable functions f. [Note that f bounded is required for the integrability of f(x)J^{-1}(x).]
With this change of variables theorem it is easy to prove the following extension of equation (3.2.7).

Corollary 3.2.1. Let (X, A, μ) be a measure space, S: X → X an invertible nonsingular transformation (S^{-1} nonsingular) and P the associated Frobenius-Perron operator. Then for every f ∈ L^1,

Pf(x) = f(S^{-1}(x)) J^{-1}(x)    a.e.    (3.2.9)

Proof. By the definition of P, for A ∈ A we have

∫_A Pf(x) μ(dx) = ∫_{S^{-1}(A)} f(x) μ(dx).

Applying Theorem 3.2.1 to the transformation S^{-1} gives

∫_{S^{-1}(A)} f(x) μ(dx) = ∫_A f(S^{-1}(y)) J^{-1}(y) μ(dy),

and, since A ∈ A is arbitrary, (3.2.9) follows.
3.3 The Koopman Operator

Let (X, A, μ) be a measure space, S: X → X a nonsingular transformation, and f ∈ L^∞. Define the operator U: L^∞ → L^∞ by

Uf(x) = f(S(x)).    (3.3.1)

This operator was first introduced by Koopman [1931]. Due to the nonsingularity of S, U is well defined, since f_1(x) = f_2(x) a.e. implies f_1(S(x)) = f_2(S(x)) a.e. The operator U has some important properties:

(K1) U(λ_1 f_1 + λ_2 f_2) = λ_1 Uf_1 + λ_2 Uf_2 for f_1, f_2 ∈ L^∞, λ_1, λ_2 ∈ R, so U is linear;    (3.3.2)

(K2) ||Uf||_{L^∞} ≤ ||f||_{L^∞} for every f ∈ L^∞;    (3.3.3)

(K3) for every f ∈ L^1, g ∈ L^∞,

(Pf, g) = (f, Ug),    (3.3.4)

so that U is adjoint to the Frobenius-Perron operator P.

To verify (K3), take first g = 1_A. Then

(Pf, 1_A) = ∫_X Pf(x) 1_A(x) μ(dx) = ∫_A Pf(x) μ(dx),

while

(f, U1_A) = ∫_X f(x) U1_A(x) μ(dx) = ∫_X f(x) 1_A(S(x)) μ(dx) = ∫_{S^{-1}(A)} f(x) μ(dx),

and

∫_A Pf(x) μ(dx) = ∫_{S^{-1}(A)} f(x) μ(dx),

which is the equation defining Pf. Because (K3) is true for g(x) = 1_A(x), it is true for any simple function g(x). Thus, by Remark 2.2.6, property (K3) must be true for all g ∈ L^∞.
With the Koopman operator it is easy to prove that the Frobenius-Perron operator is weakly continuous. Precisely, this means that for every sequence {f_n} ⊂ L^1 the condition

f_n → f weakly    (3.3.5)

implies

Pf_n → Pf weakly.    (3.3.6)

To see this, note that for every g ∈ L^∞,

(Pf_n, g) = (f_n, Ug) → (f, Ug) = (Pf, g).
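The adjointness (K3) can be watched numerically for the dyadic transformation, whose Frobenius-Perron operator was given in Remark 3.2.3. The experiment below is our sketch; the helper `integral` and the test functions are assumptions, not from the text.

```python
import numpy as np

S = lambda x: (2.0 * x) % 1.0                                 # dyadic map
U = lambda g: (lambda x: g(S(x)))                             # Koopman: Ug = g o S
P = lambda f: (lambda x: 0.5 * (f(x / 2) + f(x / 2 + 0.5)))   # Frobenius-Perron

def integral(h, n=400_000):
    x = (np.arange(n) + 0.5) / n
    return np.sum(h(x)) / n

f = lambda x: np.exp(-x)              # f in L1
g = lambda x: np.cos(2 * np.pi * x)   # g in L-infinity
lhs = integral(lambda x: P(f)(x) * g(x))   # (Pf, g)
rhs = integral(lambda x: f(x) * U(g)(x))   # (f, Ug)
print(abs(lhs - rhs))                      # small: the two sides agree
```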
Exercises
3.1. The differential equation

−u″(x) + u(x) = f(x),    u′(0) = u′(1) = 0,

for every f ∈ L^1([0, 1]) has a unique solution u(x) defined for 0 ≤ x ≤ 1. Show that the mapping that adjoins the solution u to f is a Markov operator on L^1([0, 1]). This can be done without looking for the explicit formula for u.
3.2. Find the Frobenius-Perron operator P corresponding to the following transformations:

(a) S: [0, 1] → [0, 1], S(x) = 4x²(1 − x²).

3.3. Let X = {1, ..., N} be a finite set with the counting measure, so that each f ∈ L^1(X) may be identified with the vector (f_1, ..., f_N), f_i = f(i). Show that

(Pf)_i = Σ_{j=1}^{N} p_{ij} f_j,    i = 1, ..., N,

defines a Markov operator if and only if

p_{ij} ≥ 0  and  Σ_{i=1}^{N} p_{ij} = 1,

and that f is a density if and only if f_i ≥ 0 and Σ_i f_i = 1.
3.4. A mapping S: [0, 1] → [0, 1] is called a generalized tent transformation if S(x) = S(1 − x) for 0 ≤ x ≤ 1 and if S(x) is strictly increasing for 0 ≤ x ≤ 1/2. Show that there is a unique generalized tent transformation [given by (6.5.9)] for which the standard Borel measure is invariant.

3.5. Generalize the previous result showing that for every absolutely continuous measure μ on [0, 1] with positive density (dμ/dx > 0 a.e.) there is a unique generalized tent transformation S such that μ is invariant with respect to S.
3.6. Let (X, A, μ) be a measure space. A Markov operator P: L^1(X) → L^1(X) is called deterministic if its adjoint U = P* has the following property: For every A ∈ A the function U1_A is a characteristic function, i.e., U1_A = 1_B for some B ∈ A. Show that the Frobenius-Perron operator is a deterministic operator.
3.7. Let X = {1, ..., N} be a measure space with the counting measure considered in Exercise 3.3. Describe a general form of the matrix (p_{ij}) which corresponds to a deterministic operator.

3.8. Let P_i: L^1 → L^1, i = 1, 2, denote deterministic Markov operators. Are the operators P_1 P_2 and αP_1 + (1 − α)P_2, 0 < α < 1, also deterministic?
3.9. Let X = [0, 1]. Show that P: L^1([0, 1]) → L^1([0, 1]) given by the formula

Pf(x) = (1/2) f(x) + (1/4) f(x/2) + (1/4) f(x/2 + 1/2)

is a Markov operator.
3.10. Let P: L^1 → L^1 be a Markov operator. Prove that for every nonnegative f, g ∈ L^1 the condition supp f ⊂ supp g implies supp Pf ⊂ supp Pg.
4
Studying Chaos with Densities
4.1 Invariant Measures and Measure-Preserving Transformations

Definition 4.1.1. Let (X, A, μ) be a measure space and S: X → X a measurable transformation. Then S is said to be measure preserving, and the measure μ invariant under S, if

μ(S^{-1}(A)) = μ(A)    for all A ∈ A.
Theorem 4.1.1. Let (X, A, μ) be a measure space, S: X → X a nonsingular transformation, and P the Frobenius-Perron operator associated with S. Consider a nonnegative f ∈ L^1. Then the measure μ_f given by

μ_f(A) = ∫_A f(x) μ(dx)

is invariant if and only if f is a fixed point of P.

Proof. First we show the "only if" portion. Assume μ_f is invariant. Then, by the definition of an invariant measure, μ_f(S^{-1}(A)) = μ_f(A) for all A ∈ A, or

∫_A f(x) μ(dx) = ∫_{S^{-1}(A)} f(x) μ(dx)    for A ∈ A.    (4.1.1)

By the definition of the Frobenius-Perron operator,

∫_{S^{-1}(A)} f(x) μ(dx) = ∫_A Pf(x) μ(dx)    for A ∈ A.    (4.1.2)

Combining (4.1.1) and (4.1.2) gives ∫_A [Pf(x) − f(x)] μ(dx) = 0 for all A ∈ A, so Pf = f. The "if" portion follows by reversing the argument.
Remark 4.1.1. Note that the original measure μ is invariant if and only if P1 = 1. □

Example 4.1.1. Consider the r-adic transformation originally introduced in Example 1.2.1,

S(x) = rx    (mod 1),

where r > 1 is an integer, on the measure space ([0, 1], B, μ) where B is the Borel σ-algebra and μ is the Borel measure (cf. Remark 2.1.3). As we have shown in Example 1.2.1, for any interval [0, x] ⊂ [0, 1],

S^{-1}([0, x]) = ∪_{i=0}^{r-1} [i/r, i/r + x/r],

so that, by (3.2.6),

Pf(x) = (1/r) Σ_{i=0}^{r-1} f(x/r + i/r).

Thus

P1 = (1/r) Σ_{i=0}^{r-1} 1 = 1,

and by our previous remark the Borel measure is invariant under the r-adic transformation. □
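The computation P1 = 1 for the r-adic transformation is trivially checked in code; the sketch below is ours (the helper name `P_radic` is an assumption).

```python
import numpy as np

def P_radic(f, r):
    """Frobenius-Perron operator of S(x) = r x (mod 1):
    Pf(x) = (1/r) sum_{i=0}^{r-1} f((x + i)/r)."""
    return lambda x: sum(f((x + i) / r) for i in range(r)) / r

x = np.linspace(0.0, 1.0, 257)
one = lambda t: np.ones_like(t)
for r in (2, 3, 10):
    assert np.allclose(P_radic(one, r)(x), 1.0)   # P1 = 1: Borel measure invariant
print("P1 = 1 for r = 2, 3, 10")
```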
Example 4.1.2. Consider the quadratic transformation S: [0, 1] → [0, 1],

S(x) = 4x(1 − x).

A calculation similar to that of the preceding example gives the Frobenius-Perron operator

Pf(x) = (1/(4√(1 − x))) { f(1/2 − (1/2)√(1 − x)) + f(1/2 + (1/2)√(1 − x)) }.

Clearly,

P1 = 1/(2√(1 − x)) ≠ 1,

so that the Borel measure μ is not invariant under S by Remark 4.1.1. To find an invariant measure we must find a solution to the equation Pf = f. This problem was first solved by Ulam and von Neumann [1947], who showed that the solution is given by

f*(x) = 1/(π√(x(1 − x))).    (4.1.3)
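That (4.1.3) is indeed a fixed point of P can be verified symbolically, or numerically as in the following check (our illustration, using the explicit two-branch form of P given above).

```python
import numpy as np

fstar = lambda x: 1.0 / (np.pi * np.sqrt(x * (1.0 - x)))   # (4.1.3)

def P(f):
    """Frobenius-Perron operator of S(x) = 4x(1-x): the counterimage of
    [0, x] is governed by the two branches 1/2 -+ (1/2) sqrt(1-x)."""
    def Pf(x):
        u = 0.5 * np.sqrt(1.0 - x)
        return (f(0.5 - u) + f(0.5 + u)) / (4.0 * np.sqrt(1.0 - x))
    return Pf

x = np.linspace(0.01, 0.99, 99)
print(np.max(np.abs(P(fstar)(x) - fstar(x))))   # essentially zero: P f* = f*
```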
Example 4.1.3. An interesting example of an invertible transformation on the unit square, to which we will often return, is the baker transformation

S(x, y) = (2x, y/2)               for 0 ≤ x < 1/2, 0 ≤ y ≤ 1,
S(x, y) = (2x − 1, y/2 + 1/2)     for 1/2 ≤ x ≤ 1, 0 ≤ y ≤ 1.    (4.1.4)

To calculate the Frobenius-Perron operator we consider the counterimage of the rectangle [0, x] × [0, y]. In the first case, for 0 ≤ y < 1/2,

S^{-1}([0, x] × [0, y]) = [0, x/2] × [0, 2y],

so that

∫_0^x ds ∫_0^y Pf(s, t) dt = ∫_0^{x/2} ds ∫_0^{2y} f(s, t) dt,    0 ≤ y < 1/2.

In the second case, for 1/2 ≤ y ≤ 1, the counterimage consists of the two pieces [0, x/2] × [0, 1] and [1/2, 1/2 + x/2] × [0, 2y − 1];
FIGURE 4.1.1. Steps showing the operation of the baker transformation given in equation (4.1.4).
hence

Pf(x, y) = (∂²/∂x∂y) { ∫_0^{x/2} ds ∫_0^1 f(s, t) dt + ∫_{1/2}^{(1/2)+(x/2)} ds ∫_0^{2y−1} f(s, t) dt },    1/2 ≤ y ≤ 1.
FIGURE 4.1.2. Two cases for calculating the counterimage of a set A by the baker transformation.
Carrying out the differentiations gives

Pf(x, y) = f(x/2, 2y)               for 0 ≤ y < 1/2,
Pf(x, y) = f(1/2 + x/2, 2y − 1)     for 1/2 ≤ y ≤ 1,    (4.1.5)

so that P1 = 1, and the Borel measure is, therefore, invariant under the baker transformation. □
Remark 4.1.4. Note that the transformation of the x-coordinate in the baker transformation is the dyadic transformation. However, the dyadic transformation is not 1-1, whereas the baker transformation is a.e. Given an X ⊂ R and any (not necessarily invertible) one-dimensional transformation S: X → X, we may construct a two-dimensional invertible transformation T: X × X → X × X, with 0 < β, by

T(x, y) = (S(x) + y, βx).

As an example let S: [0, 1] → [0, 1] be the dyadic transformation S(x) = 2x (mod 1); the resulting T is closely related to the baker transformation (4.1.4), whose inverse is given by

S^{-1}(x, y) = (x/2, 2y)              for 0 ≤ x < 1, 0 ≤ y < 1/2,
S^{-1}(x, y) = (1/2 + x/2, 2y − 1)    for 0 ≤ x < 1, 1/2 < y < 1. □

Example 4.1.5. (Anosov diffeomorphism). Another invertible transformation on the unit square that we will consider is

S(x, y) = (x + y, x + 2y)    (mod 1).    (4.1.6)
To see the effect of this transformation consult Figure 4.1.3. In the first part (a) of the figure we depict the unit square in the plane and divide it into four triangular areas. In Figure 4.1.3b we show how the unit square is transformed after one application of (x, y) → (x + y, x + 2y), whereas Figure 4.1.3c shows the result of the full Anosov diffeomorphism. It is clear that the effect of this transformation will be to very quickly scramble, or mix, various regions of the unit square. This property of mixing, also shared by the baker transformation, is most important and is dealt with in more detail in Section 4.3.

The determinant of the Jacobian of transformation (4.1.6) is given by

J = det | 1  1 |
        | 1  2 |  = 1,
58
V5
2 2
3
and .X2
3 V5
= 2 + 2'
hence 0 < .X1 < 1 < .X2 Thus, as for the baker transformation, the Anosov
diffeomorphism involves a stretching in one direction and a corresponding
compression in the orthogonal direction.
With some patience it is possible to derive an explicit formula for the
Frobenius-Perron operator corresponding to the Anosov diffeomorphism
(4.1.6) using a technique analogous to that employed for the baker transformation of the previous example. However, we can obtain this result
immediately from Corollary 3.2.1 since the Anosov diffeomorphism is invertible. An easy calculation gives
S^{-1}(x, y) = (2x − y, y − x)    (mod 1),

and thus

Pf(x, y) = f(2x − y, y − x),    (4.1.7)

with the arguments on the right-hand side again taken mod 1. In particular P1 = 1, so the Borel measure is invariant under the Anosov diffeomorphism. □
Remark 4.1.6. Observe that, if we replace the unit square [0, 1] x [0, 1]
with the torus, that is, if we identify points (x, 1) with (x, 0) and (1, y) with
(0, y), then this example of an Anosov diffeomorphism becomes continuous
and differentiable just as the r-adic transformation does when the unit
interval is replaced by the unit circle. The word diffeomorphism comes
from the fact that the Anosov transformation is invertible, and that both
the transformation and its inverse are differentiable. 0
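Both the invertibility of (4.1.6) and the invariance of the Borel measure via (4.1.7) are easy to confirm numerically; the sketch below is our illustration (torus distance is used to avoid spurious 0/1 wrap-around in floating point).

```python
import numpy as np

def S(x, y):      # Anosov diffeomorphism (4.1.6)
    return (x + y) % 1.0, (x + 2.0 * y) % 1.0

def Sinv(x, y):   # its inverse, used in (4.1.7)
    return (2.0 * x - y) % 1.0, (y - x) % 1.0

P = lambda f: (lambda x, y: f(*Sinv(x, y)))   # Pf(x,y) = f(2x-y, y-x) mod 1

rng = np.random.default_rng(0)
x, y = rng.random(1000), rng.random(1000)
u, v = S(*Sinv(x, y))
d = lambda a, b: np.minimum(np.abs(a - b), 1.0 - np.abs(a - b))  # torus distance
print(max(d(u, x).max(), d(v, y).max()))      # ~0: S o S^{-1} = identity
one = lambda x, y: np.ones_like(x)
print(P(one)(x, y).mean())                     # -> 1.0: Borel measure invariant
```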
4.2 Ergodic Transformations

Let S: X → X be a nonsingular transformation. If a set A ∈ A satisfies

S^{-1}(A) = A,    (4.2.1)

then S may be studied on the sets A and X \ A separately. To see this, assume that A is fixed and condition (4.2.1) holds. Consider a trajectory

x_0, S(x_0), S²(x_0), ... .

If x_0 ∈ A, the entire trajectory remains in A, whereas if x_0 ∈ X \ A it remains in X \ A.

Example 4.2.1. Consider a transformation S operating on the space X = {1, ..., 2N} with the counting measure, and suppose that S maps odd integers to odd integers and even integers to even integers. This transformation can be studied separately on the sets A = {1, 3, ..., 2N − 1} and X \ A = {2, 4, ..., 2N} of odd and even integers. □
Any set A satisfying (4.2.1) is called invariant. We require this equality to be satisfied modulo zero (see Remark 3.1.3). Then we can make the following definition.

Definition 4.2.1. Let (X, A, μ) be a measure space and let a nonsingular transformation S: X → X be given. Then S is called ergodic if every invariant set A ∈ A is such that either μ(A) = 0 or μ(X \ A) = 0; that is, S is ergodic if all invariant sets are trivial subsets of X.

From this definition it follows that any ergodic transformation S must be studied on the entire space X. Determining ergodicity on the basis of Definition 4.2.1 is, in general, difficult except for simple examples on finite spaces. Thus, for example, the transformation in Example 4.2.1 is not ergodic on the space X of integers, but it is ergodic on the sets of even and odd integers.
In studying more interesting examples the following theorem may be of use.

Theorem 4.2.1. Let (X, A, μ) be a measure space and S: X → X a nonsingular transformation. S is ergodic if and only if every measurable function f: X → R satisfying

f(S(x)) = f(x)    for almost all x ∈ X    (4.2.2)

is constant almost everywhere.
60
and
B={x:f(x)>r}
= {x:f(x) ~ r} =A
and similarly for B. Because sets A and Bare invariant, 8 is not ergodic,
which is a contradiction. Thus, every f satisfying (4.2.2) must be constant.
To prove the converse, assume that 8 is not ergodic. Then, by Definition
4.2.1, there is a nontrivial set A E A that is invariant. Set f = 1A, and since
A is nontrivial, f is not a constant function. Moreover, since A= 8- 1 (A)
we have
/(8(x))
a.e.
Remark 4.2.1. It is clear from the proof that it is sufficient to verify only
(4.2.2) for bounded measurable functions since in the last part of the proof
we used characteristic functions that are bounded. 0
An immediate consequence of Theorem 4.2.1 in combination with the definition of the Koopman operator is the following corollary.

Corollary 4.2.1. Let (X, A, μ) be a measure space and S: X → X a nonsingular transformation. S is ergodic if and only if the only fixed points of the Koopman operator, Uf = f with f ∈ L^∞, are the constant functions.
Theorem 4.2.2. Let (X, A, μ) be a measure space, S: X → X a nonsingular transformation, and P the Frobenius-Perron operator associated with S. If S is ergodic, then there is at most one stationary density f* of P. Further, if there is a unique stationary density f* of P and f*(x) > 0 a.e., then S is ergodic.

Proof. To prove the first part of the theorem assume that S is ergodic and that f_1 and f_2 are different stationary densities of P. Set g = f_1 − f_2, so that Pg = g by the linearity of P. Thus, by Proposition 3.1.3, g^+ and g^- are both fixed points of P:

Pg^+ = g^+  and  Pg^- = g^-.    (4.2.3)

Since, by assumption, f_1 and f_2 are not only different but are also densities, neither g^+ nor g^- vanishes identically. Set

A = supp g^+  and  B = supp g^-.

It is evident that A and B are disjoint sets and both have positive (nonzero) measure. By equality (4.2.3) and Proposition 3.2.1, we have

A ⊂ S^{-1}(A)  and  B ⊂ S^{-1}(B).

Since A and B are disjoint sets, S^{-1}(A) and S^{-1}(B) are also disjoint. By induction,

S^{-n}(A) ⊂ S^{-(n+1)}(A)  and  S^{-n}(B) ⊂ S^{-(n+1)}(B),

where S^{-n}(A) and S^{-n}(B) are also disjoint for all n. Now define two sets by

Â = ∪_{n=0}^{∞} S^{-n}(A)  and  B̂ = ∪_{n=0}^{∞} S^{-n}(B).

These two sets Â and B̂ are also disjoint and, furthermore, they are invariant because

S^{-1}(Â) = ∪_{n=1}^{∞} S^{-n}(A) = ∪_{n=0}^{∞} S^{-n}(A) = Â
and

S^{-1}(B̂) = ∪_{n=1}^{∞} S^{-n}(B) = ∪_{n=0}^{∞} S^{-n}(B) = B̂.

Neither Â nor B̂ is of measure zero since A and B are not of measure zero. Thus, Â and B̂ are nontrivial invariant sets, which contradicts the ergodicity of S, and the first portion of the theorem is proved.

To prove the second portion of the theorem, assume that f* > 0 is the unique density satisfying Pf* = f* but that S is not ergodic. If S is not ergodic, then there exists a nontrivial set A such that

S^{-1}(A) = A  and, with B = X \ A,  S^{-1}(B) = B.

Since f* = 1_A f* + 1_B f*, we have

P(1_A f*) + P(1_B f*) = Pf* = f* = 1_A f* + 1_B f*.    (4.2.4)

The function 1_B f* is equal to zero in the set X \ B = A = S^{-1}(A). Thus, by Proposition 3.2.1, P(1_B f*) is equal to zero in A = X \ B and, likewise, P(1_A f*) is equal to zero in B = X \ A. Thus, equality (4.2.4) implies that

1_A f* = P(1_A f*)  and  1_B f* = P(1_B f*),

so that, after normalization, 1_A f*/||1_A f*|| and 1_B f*/||1_B f*|| are two different stationary densities of P, contradicting the assumed uniqueness of f* and completing the proof.
FIGURE 4.2.2. The two disjoint sets A (containing all the arcs denoted by thin lines) and B (containing arcs marked by heavy lines) are invariant under the rotational transformation when φ/2π is rational.
In this example the behavior of the trajectories is moderately regular and insensitive to changes in the initial value. Thus, independent of whether or not φ/2π is rational, if the value of φ is known precisely but the initial condition is located between α and β, x_0 ∈ (α, β), then

S^n(x_0) ∈ (α + nφ, β + nφ)    (mod 2π),

and all of the following points of the trajectory are known with the same accuracy, (β − α). □
Before closing this section we state, without proof, the Birkhoff individual ergodic theorem [Birkhoff, 1931a,b].

Theorem 4.2.3. Let (X, A, μ) be a measure space, S: X → X a measurable transformation, and f: X → R an integrable function. If the measure μ is invariant, then there exists an integrable function f* such that

lim_{n→∞} (1/n) Σ_{k=0}^{n−1} f(S^k(x)) = f*(x)    for almost all x ∈ X.    (4.2.5)

Moreover, f* has the properties

f*(x) = f*(S(x))    (4.2.6)

and, if μ(X) < ∞,

∫_X f*(x) μ(dx) = ∫_X f(x) μ(dx).    (4.2.7)

Equation (4.2.6) follows directly from (4.2.5) if x is replaced by S(x). The second property, (4.2.7), follows from the invariance of μ and equation
64
f(x)JJ(dx) = [
f(S(x))JJ(dx)
so that integrating equation (4.2.5) over X and passing to the limit yields
(4.2. 7) by the Lebesque-dominated convergence theorem when f is bounded.
When f is not bounded the argument is more difficult.
Theorem 4.2.4. Under the hypotheses of Theorem 4.2.3, if, in addition, μ(X) < ∞ and S is ergodic, then

lim_{n→∞} (1/n) Σ_{k=0}^{n−1} f(S^k(x)) = (1/μ(X)) ∫_X f(x) μ(dx)    a.e.    (4.2.8)

Proof. From (4.2.6) and Theorem 4.2.1 it follows that f* is constant almost everywhere. Hence, from (4.2.7), we have

∫_X f(x) μ(dx) = ∫_X f*(x) μ(dx) = f* ∫_X μ(dx) = f* μ(X),

so that

f*(x) = (1/μ(X)) ∫_X f(x) μ(dx)    a.e.

Thus equation (4.2.5) of the Birkhoff theorem and the preceding formula imply (4.2.8), and the theorem is proved.
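Equation (4.2.8) is easy to observe numerically. For the quadratic transformation of Example 4.1.2, whose invariant measure has density (4.1.3), the space average of f(x) = x is ∫ x f*(x) dx = 1/2, and the time average along an orbit agrees. The experiment below is our illustration (and implicitly assumes that double-precision rounding supplies a "typical" orbit, which is what one observes in practice).

```python
S = lambda x: 4.0 * x * (1.0 - x)   # quadratic map of Example 4.1.2

def time_average(f, x0, n=1_000_000):
    """Birkhoff average (1/n) sum_{k<n} f(S^k(x0))."""
    total, x = 0.0, x0
    for _ in range(n):
        total += f(x)
        x = S(x)
    return total / n

print(time_average(lambda x: x, 0.1234))   # close to 1/2, the space average
```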
One of the most quoted consequences of this theorem is the following.

Corollary 4.2.2. If S is ergodic and the invariant measure μ is finite, then, taking f = 1_A in (4.2.8), for every A ∈ A with μ(A) > 0 the average fraction of time that the trajectory of almost every x ∈ X spends in A is μ(A)/μ(X).

Remark 4.2.3. Corollary 4.2.2 says that every set of nonzero measure is visited infinitely often by the iterates of almost every x ∈ X. This result is a special case of the Poincare recurrence theorem. □
Definition 4.3.1. Let (X, A, μ) be a normalized measure space, and S: X → X a measure-preserving transformation. S is called mixing if

lim_{n→∞} μ(A ∩ S^{−n}(B)) = μ(A)μ(B)  for all A, B ∈ A.   (4.3.1)

In particular, taking A = X \ B, mixing implies

lim_{n→∞} μ((X \ B) ∩ S^{−n}(B)) = μ(A)μ(B) = (1 − μ(B))μ(B),

so that, when 0 < μ(B) < 1, a fixed positive fraction of the points of B eventually leaves B.
Example 4.3.1. (See also Example 4.1.3.) In considering the baker transformation, it is relatively easy to check the mixing condition (4.3.1) for generators of the σ-algebra B, namely, for rectangles. Although the transformation is simple, writing the algebraic expressions for the counterimages is tedious, and the property of mixing is easier to see pictorially. Consider Figure 4.3.1a, where two sets A and B are represented with μ(B) = 1/2. We take repeated counterimages of the set B by the baker transformation and find that after n such steps, S^{−n}(B) consists of 2^{n−1} vertical rectangles of equal area. Eventually the measure of A ∩ S^{−n}(B) approaches μ(A)/2, and condition (4.3.1) is evidently satisfied. The behavior of any pair of sets A and B is similar.
It is interesting that the baker transformation behaves in a similar fashion if, instead of examining S^{−n}(B), we look at S^n(B) as shown in Figure 4.3.1b. Now we have 2^n horizontal rectangles after n steps and all of our previous comments apply. So, for the baker transformation the behavior of images and counterimages is very similar and illustrates the property of mixing. This is not true for our next example, the dyadic transformation. □
In general, proving that a given transformation is mixing via Definition
4.3.1 is difficult. In the next section, Theorem 4.4.1 and Proposition 4.4.1,
we introduce easier and more powerful techniques for this purpose.
Example 4.3.2. (Cf. Examples 1.2.1 and 4.1.1.) To examine the mixing property (4.3.1) for the dyadic transformation, consider Figure 4.3.2a. Now we take the set B = [0, b] and find that the nth counterimage of B consists of intervals on [0, 1], each of the same length. Eventually, as before, μ(A ∩ S^{−n}(B)) → μ(A)μ(B).

As for the baker transformation, let us consider the behavior of images of a set B under the dyadic transformation (cf. Figure 4.3.2b). In this case, if B = [0, b], then S(B) = [0, 2b] and after a finite number of iterations S^n(B) = [0, 1). The same procedure with any arbitrary set B ⊂ [0, 1] of positive measure will show that μ(S^n(B)) → 1, and thus the behavior of images under the dyadic transformation is different from that under the baker transformation. □
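The statement μ(S^n(B)) → 1 for the dyadic transformation is easy to watch numerically: images of a small interval quickly spread over all of [0, 1). A sketch (the interval, bin count, and sample size are our illustrative choices):

```python
import random

random.seed(1)
b, n_iter, bins = 0.3, 10, 20
samples = [random.uniform(0.0, b) for _ in range(100_000)]   # points of B = [0, b]

for _ in range(n_iter):
    samples = [(2 * x) % 1.0 for x in samples]   # dyadic transformation S(x) = 2x (mod 1)

occupied = {int(x * bins) for x in samples}
# fraction of bins of [0, 1) containing images of B
print(len(occupied) / bins)   # 1.0 once the images have covered [0, 1)
```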
Exact Transformations

The behavior illustrated by images of the dyadic transformation is called exactness, and is made precise by the following definition due to Rochlin [1964].

Definition 4.3.2. Let (X, A, μ) be a normalized measure space and S: X → X a measure-preserving transformation such that S(A) ∈ A for each A ∈ A. If

lim_{n→∞} μ(S^n(A)) = 1  for every A ∈ A with μ(A) > 0,   (4.3.2)

then S is called exact.
FIGURE 4.3.1. (a) Counterimages S^{−n}(B) and (b) images S^n(B) of a set B under the baker transformation.

FIGURE 4.3.2. (a) Counterimages and (b) images of a set B = [0, b] under the dyadic transformation.
Every exact transformation is mixing but, as illustrated by the baker transformation, the converse is not true. We defer the proof until the next section when we have other tools at our disposal.
Condition (4.3.2) has a very simple interpretation. If we start with a set
A of initial conditions of nonzero measure, then after a large number of
iterations of an exact transformation S the points will have spread and
completely filled the space X.
Remark 4.3.1. It cannot be emphasized too strongly that invertible transformations cannot be exact. In fact, for any invertible measure-preserving transformation S, we have μ(S(A)) = μ(S^{−1}(S(A))) = μ(A) and, by induction, μ(S^n(A)) = μ(A), which violates (4.3.2). □
In this and the previous section we have defined and examined a hierarchy of "chaotic" behaviors. However, by themselves the definitions are a bit sterile and may not convey the full distinction between the behaviors of ergodic, mixing, and exact transformations. To remedy this we present the first six successive iterates of a random distribution of 1000 points in the set X = [0, 1] × [0, 1] by the ergodic transformation

S(x, y) = (x + 2^{1/2}, y + 3^{1/2})   (mod 1),   (4.3.3)

the mixing transformation

S(x, y) = (x + y, x + 2y)   (mod 1),   (4.3.4)
69
I (d)
I (a)
I (b)
I (e)
0
I (c)
and the exact transformation

S(x, y) = (3x + y, x + 3y)   (mod 1).   (4.3.5)
FIGURE 4.3.4. The effect of the mixing transformation (4.3.4) on the same initial distribution of points used in Figure 4.3.3.
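The numerical experiment behind these figures is easy to reproduce: iterate a small cluster of random points under one of the maps (4.3.3)-(4.3.5) and watch it spread over the square. A sketch for the mixing transformation (4.3.4) (the cluster size, grid, and counts are our illustrative choices):

```python
import random

def anosov(x, y):
    """The mixing transformation S(x, y) = (x + y, x + 2y) (mod 1) of (4.3.4)."""
    return (x + y) % 1.0, (x + 2 * y) % 1.0

random.seed(2)
pts = [(random.uniform(0, 0.1), random.uniform(0, 0.1)) for _ in range(1000)]

def coverage(points, g=10):
    """Fraction of cells of a g-by-g grid containing at least one point."""
    return len({(int(x * g), int(y * g)) for x, y in points}) / g**2

before = coverage(pts)
for _ in range(6):
    pts = [anosov(x, y) for x, y in pts]
after = coverage(pts)
print(before, after)   # the cluster starts in one cell and spreads over the square
```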
FIGURE 4.3.5. The effect of the exact transformation (4.3.5) on the same initial distribution of points.
exactness in terms of the behavior of sequences of iterates of Frobenius-Perron and Koopman operators and show how they can be used to determine whether a given transformation S with an invariant measure is ergodic, mixing, or exact. The techniques of this chapter rely heavily on the notions of Cesàro, weak, and strong convergence, which were developed in Section 2.3.
We will first state and prove the main theorem of this section and then
show its utility.
Theorem 4.4.1. Let (X, A, μ) be a normalized measure space, S: X → X a measure-preserving transformation, and P the Frobenius-Perron operator corresponding to S. Then

(a) S is ergodic if and only if the sequence {P^n f} is Cesàro convergent to 1 for all f ∈ D;

(b) S is mixing if and only if {P^n f} is weakly convergent to 1 for all f ∈ D;

(c) S is exact if and only if {P^n f} is strongly convergent to 1 for all f ∈ D.
Before giving the proof of Theorem 4.4.1, we note that, since P is linear, convergence of {P^n f} to 1 for f ∈ D is equivalent to the convergence of {P^n f} to ⟨f, 1⟩ for every f ∈ L^1. This observation is, of course, valid for all types of convergence: Cesàro, weak, and strong. Thus we may restate Theorem 4.4.1 in the equivalent form.
Corollary 4.4.1. Let (X, A, μ), S, and P be as in Theorem 4.4.1. Then

(a) S is ergodic if and only if

lim_{n→∞} (1/n) Σ_{k=0}^{n−1} ⟨P^k f, g⟩ = ⟨f, 1⟩⟨1, g⟩  for f ∈ L^1, g ∈ L^∞;

(b) S is mixing if and only if

lim_{n→∞} ⟨P^n f, g⟩ = ⟨f, 1⟩⟨1, g⟩  for f ∈ L^1, g ∈ L^∞;

(c) S is exact if and only if

lim_{n→∞} ‖P^n f − ⟨f, 1⟩‖ = 0  for f ∈ L^1.
Proof of Theorem 4.4.1. The proof of part (a) follows easily from Corollary 5.2.3.
We next prove part (b). If S is mixing, then for every A, B ∈ A,

lim_{n→∞} μ(A ∩ S^{−n}(B)) = lim_{n→∞} ∫_X 1_A(x) 1_B(S^n(x)) μ(dx) = μ(A)μ(B).

By applying the definitions of the Koopman operator and the scalar product to this equation, we obtain

lim_{n→∞} ⟨1_A, U^n 1_B⟩ = ⟨1_A, 1⟩⟨1, 1_B⟩   (4.4.1)

or

lim_{n→∞} ⟨P^n f, g⟩ = ⟨f, 1⟩⟨1, g⟩
for f = 1_A and g = 1_B. Since this relation holds for characteristic functions, it must also hold for the simple functions

f̄ = Σ_i λ_i 1_{A_i}  and  ḡ = Σ_j σ_j 1_{B_j}.

Now take arbitrary f ∈ L^1 and g ∈ L^∞ and choose simple f̄ and ḡ with ‖f − f̄‖ ≤ ε and ‖g − ḡ‖_{L^∞} ≤ ε. Then

|⟨P^n f, g⟩ − ⟨P^n f̄, ḡ⟩| ≤ ε(‖g‖_{L^∞} + ‖f‖ + ε)

and, analogously,

|⟨f, 1⟩⟨1, g⟩ − ⟨f̄, 1⟩⟨1, ḡ⟩| ≤ ε(‖g‖_{L^∞} + ‖f‖ + ε).

Since ε > 0 is arbitrary, the relation extends to all f ∈ L^1 and g ∈ L^∞.
To prove (c), assume {P^n f} converges strongly to 1 for all f ∈ D. Take A ∈ A with μ(A) > 0 and set

f_A(x) = (1/μ(A)) 1_A(x).

Clearly, f_A is a density. If the sequence {r_n} is defined by

r_n = ‖P^n f_A − 1‖,

then it is also clear that the sequence is convergent to zero. By the definition of r_n, we have

μ(S^n(A)) = ∫_{S^n(A)} μ(dx) ≥ ∫_{S^n(A)} P^n f_A(x) μ(dx) − r_n.   (4.4.3)

Furthermore,

∫_{S^n(A)} P^n f_A(x) μ(dx) = ∫_{S^{−n}(S^n(A))} f_A(x) μ(dx) = 1,

since A ⊂ S^{−n}(S^n(A)) and f_A vanishes outside A. Thus μ(S^n(A)) ≥ 1 − r_n and, passing to the limit, μ(S^n(A)) → 1, so that S is exact.
that the Koopman operator is much easier to calculate than the Frobenius-Perron operator. Unfortunately, this reformulation cannot be extended to condition (c) for exactness of Corollary 4.4.1 since it is not expressed in terms of a scalar product.

Thus, from Corollary 4.4.1, the following proposition can easily be stated.
Proposition 4.4.1. Let (X, A, μ) be a normalized measure space, S: X → X a measure-preserving transformation, and U the Koopman operator corresponding to S. Then

(a) S is ergodic if and only if

lim_{n→∞} (1/n) Σ_{k=0}^{n−1} ⟨f, U^k g⟩ = ⟨f, 1⟩⟨1, g⟩  for f ∈ L^1, g ∈ L^∞;

(b) S is mixing if and only if

lim_{n→∞} ⟨f, U^n g⟩ = ⟨f, 1⟩⟨1, g⟩  for f ∈ L^1, g ∈ L^∞.

The proof is immediate from the relation

⟨f, U^n g⟩ = ⟨P^n f, g⟩  for f ∈ L^1, g ∈ L^∞, n = 1, 2, ... ,

which shows that conditions (a) and (b) of Corollary 4.4.1 and Proposition 4.4.1 are identical.
Remark 4.4.1. We stated Theorem 4.4.1 and Corollary 4.4.1 in terms of L^1 and L^∞ spaces to underline the role of the Frobenius-Perron operator as a transformation of densities. The same results can be proved using adjoint spaces L^p and L^{p'} instead of L^1 and L^∞, respectively. Moreover, when verifying conditions (a) through (c) of Theorem 4.4.1 and Corollary 4.4.1, or conditions (a) and (b) of Proposition 4.4.1, it is not necessary to check their validity for all f ∈ L^p and g ∈ L^{p'}. Due to special properties of the operators P and U, which are linear contractions, it is sufficient to check these conditions for f and g belonging to linearly dense subsets of L^p and L^{p'}, respectively (see Section 2.3). □
Example 4.4.1. In Example 4.2.2 we showed that the rotational transformation

S(x) = x + φ   (mod 2π)

is not ergodic when φ/2π is rational. Here we prove that it is ergodic when φ/2π is irrational.
Consider the functions g(x) = e^{ikx}, k = 0, ±1, ±2, ..., which span a linearly dense subset of L^∞([0, 2π]). By Remark 4.4.1 it is enough to show that

lim_{n→∞} (1/n) Σ_{l=0}^{n−1} U^l g(x) = ⟨1, g⟩   (4.4.4)

uniformly for all x, thus implying that condition (a) of Proposition 4.4.1 is satisfied for all f. To simplify the calculations, note that

U^l g(x) = g(S^l(x)) = e^{ik(x+lφ)},

so that

u_n(x) = (1/n) Σ_{l=0}^{n−1} U^l g(x)

obeys

u_n(x) = (1/n) Σ_{l=0}^{n−1} e^{ik(x+lφ)} = (1/n) e^{ikx} (e^{inkφ} − 1)/(e^{ikφ} − 1),

and, for k ≠ 0,

|u_n(x)| ≤ 2/(n|e^{ikφ} − 1|),

where e^{ikφ} ≠ 1 because φ/2π is irrational. Thus u_n(x) converges uniformly to zero. Also, however, with our choice of g(x),

⟨1, g⟩ = ∫_0^{2π} e^{ikx} (dx/2π) = (1/2πik)[e^{2πik} − 1] = 0,

and condition (a) of Proposition 4.4.1 for ergodicity is satisfied with k ≠ 0. When k = 0 the calculation is even simpler, since g(x) ≡ 1 and thus u_n(x) ≡ 1, while

⟨1, g⟩ = ∫_0^{2π} (dx/2π) = 1,

so that (4.4.4) again holds. □
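The closed-form expression for u_n is just a geometric sum, and comparing it against the directly computed Cesàro average gives a quick sanity check of the calculation. A sketch (the values of k, φ, and n are arbitrary illustrative choices):

```python
import cmath

k, phi, n, x = 3, 2 ** 0.5, 500, 0.4

# Direct Cesaro average of the Koopman iterates U^l g(x) = e^{ik(x + l*phi)}
direct = sum(cmath.exp(1j * k * (x + l * phi)) for l in range(n)) / n

# Closed form (1/n) e^{ikx} (e^{i n k phi} - 1) / (e^{i k phi} - 1)
closed = (cmath.exp(1j * k * x) / n) \
    * (cmath.exp(1j * n * k * phi) - 1) / (cmath.exp(1j * k * phi) - 1)

print(abs(direct - closed))   # agreement to rounding error
print(abs(direct))            # small: u_n -> 0, matching <1, g> = 0 for k != 0
```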
Example 4.4.2. In this example we demonstrate the exactness of the r-adic transformation

S(x) = rx   (mod 1).

The Frobenius-Perron operator corresponding to S is

Pf(x) = (1/r) Σ_{i=0}^{r−1} f((x + i)/r),

and, by induction,

P^n f(x) = (1/r^n) Σ_{i=0}^{r^n−1} f(i/r^n + x/r^n).

For every continuous f this is a Riemann sum, so that

lim_{n→∞} P^n f(x) = ∫_0^1 f(s) ds,  uniformly in x.

Since the continuous densities are linearly dense in D, condition (c) of Theorem 4.4.1 is satisfied and S is exact. □
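The convergence P^n f → 1 in this example can be watched directly by iterating the Frobenius-Perron operator of the dyadic transformation (r = 2) on a sample density; the density f(x) = 2x, the grid, and the tolerance below are our illustrative choices:

```python
def P(f, r=2):
    """Frobenius-Perron operator of S(x) = rx (mod 1): (Pf)(x) = (1/r) sum_i f((x+i)/r)."""
    return lambda x: sum(f((x + i) / r) for i in range(r)) / r

f = lambda x: 2 * x          # a density on [0, 1]
for _ in range(12):
    f = P(f)                 # f is now P^12 applied to the original density

grid = [j / 100 for j in range(101)]
deviation = max(abs(f(x) - 1.0) for x in grid)
print(deviation)             # P^n f is nearly the constant density 1
```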
Example 4.4.3. Consider again the transformation

S(x, y) = (x + y, x + 2y)   (mod 1)

of equation (4.3.4). To demonstrate that S is mixing, take

f(x, y) = exp[−2πi(px + qy)]  and  g(x, y) = exp[2πi(kx + ly)]

with integers p, q, k, l; such functions span linearly dense subsets of L^1 and L^∞, respectively. Since S^n(x, y) = (a_{2n−2}x + a_{2n−1}y, a_{2n−1}x + a_{2n}y) (mod 1), where {a_n} is the Fibonacci sequence (a_0 = a_1 = 1, a_{n+1} = a_n + a_{n−1}), a direct calculation gives

⟨f, U^n g⟩ = 1  if (ka_{2n−2} + la_{2n−1} − p) = (ka_{2n−1} + la_{2n} − q) = 0,
           = 0  otherwise.
Now we show that for large n either k = l = 0 or

ka_{2n−2} + la_{2n−1} − p ≠ 0.

Dividing by a_{2n−1} and noting that a_{2n−1}/a_{2n−2} converges to the golden mean, we obtain

lim_{n→∞} [k (a_{2n−2}/a_{2n−1}) + l − p/a_{2n−1}] = 2k/(1 + √5) + l.

However, since (1 + √5)/2 is irrational, this limit can never be zero when k and l are integers not both zero. Thus, ka_{2n−2} + la_{2n−1} − p ≠ 0 for large n. Therefore, for large n,

⟨f, U^n g⟩ = 1  if k = l = p = q = 0,
           = 0  otherwise.
But

⟨1, g⟩ = ∫_0^1 ∫_0^1 exp[2πi(kx + ly)] dx dy = 0  if k ≠ 0 or l ≠ 0,
                                            = 1  if k = l = 0,

so that

⟨f, 1⟩⟨1, g⟩ = ⟨1, g⟩ ∫_0^1 ∫_0^1 exp[−2πi(px + qy)] dx dy
  = ⟨1, g⟩  if p = q = 0,
  = 0       if p ≠ 0 or q ≠ 0,

that is,

⟨f, 1⟩⟨1, g⟩ = 1  if k = l = p = q = 0,
            = 0  otherwise.
Thus

⟨f, U^n g⟩ = ⟨f, 1⟩⟨1, g⟩

for large n and, as a consequence, {U^n g} converges weakly to ⟨1, g⟩. Therefore, mixing of the Anosov diffeomorphism is demonstrated. □
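The Fibonacci entries a_{2n−2}, a_{2n−1}, a_{2n} of the n-th matrix power used above are easy to confirm numerically. A sketch (pure stdlib, using the convention a_0 = a_1 = 1):

```python
def mat_mult(A, B):
    """2x2 integer matrix product."""
    return [[A[0][0]*B[0][0] + A[0][1]*B[1][0], A[0][0]*B[0][1] + A[0][1]*B[1][1]],
            [A[1][0]*B[0][0] + A[1][1]*B[1][0], A[1][0]*B[0][1] + A[1][1]*B[1][1]]]

def fib(m):
    """Fibonacci sequence a_0 = a_1 = 1, a_{k+1} = a_k + a_{k-1}."""
    a = [1, 1]
    while len(a) <= m:
        a.append(a[-1] + a[-2])
    return a

M = [[1, 1], [1, 2]]          # the matrix of S(x, y) = (x + y, x + 2y)
Mn = [[1, 0], [0, 1]]
for n in range(1, 8):
    Mn = mat_mult(Mn, M)
    a = fib(2 * n)
    # M^n = (a_{2n-2}, a_{2n-1}; a_{2n-1}, a_{2n})
    assert Mn == [[a[2*n - 2], a[2*n - 1]], [a[2*n - 1], a[2*n]]]
print("matrix powers match the Fibonacci entries")
```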
In this chapter we have shown how the study of ergodicity, mixing, and
exactness for transformations S can be greatly facilitated by the use of
the Frobenius-Perron operator P corresponding to S (cf. Theorem 4.4.1
and Corollary 4.4.1). Since the Frobenius-Perron operator is a special type
of Markov operator, there is a certain logic to extending the notions of
ergodicity, mixing, and exactness for transformations to Markov operators
in general. Thus, we close this section with the following definition.
4.5  Kolmogorov Automorphisms
  exact        K-automorphism
      \           /
       \         /
         mixing
           |
      weakly mixing
           |
        ergodic

where K-automorphism is the usual abbreviation for a Kolmogorov automorphism and the arrows indicate that the property above implies the one below. Before giving the precise definition of K-automorphisms, we introduce two simple notations.
If S: X → X is a given transformation and A is a collection of subsets of X (e.g., a σ-algebra), then S(A) denotes the collection of sets of the form S(A) for A ∈ A, and S^{−1}(A) the collection of S^{−1}(A) for A ∈ A. More generally, S^n(A) and S^{−n}(A) denote the corresponding collections of S^n(A) and S^{−n}(A), n = 0, 1, 2, ....
Definition 4.5.1. Let (X, A, μ) be a normalized measure space and let S: X → X be an invertible transformation such that S and S^{−1} are measurable and measure preserving. The transformation S is called a K-automorphism if there exists a σ-algebra A_0 ⊂ A such that the following three conditions are satisfied:

(i) S^{−1}(A_0) ⊂ A_0;
the word endomorphism is used for measure preserving but not necessarily
invertible transformations).
Example 4.5.1. The baker transformation is a K-automorphism. For A_0 we can take all the sets of the form B = A × [0, 1], where A is a Borel subset of [0, 1]; A is called the basis of B. For such a set, S^{−1}(B) = A_1 × [0, 1] with

A_1 = ½A ∪ (½ + ½A),   (4.5.3)

and thus condition (i) is satisfied. From this follows a hint of how to prove condition (ii). Namely, from (4.5.3) it follows that the basis A_1 of the set B_1 = S^{−1}(B) is the union of two sets of equal measure that are contained in the intervals [0, ½] and (½, 1], respectively. Furthermore, the set B_2 = S^{−2}(B) has the form A_2 × [0, 1] and its basis A_2 is the union of four sets of equal measure contained in the intervals [0, ¼], ..., [¾, 1]. Finally, every set B_∞ belonging to the σ-algebra (4.5.1) is of the form A_∞ × [0, 1] and A_∞ has the property that for each integer n the measure of the intersection of A_∞ with [k/2^n, (k + 1)/2^n] does not depend on k. From this it is easy to show that the measure of the intersection of A_∞ with [0, x] is a linear function of x, or

∫_0^x 1_{A_∞}(y) dy = cx,
considered in Example 4.1.4 has the same property. As we have observed, the Jacobian of S has two eigenvalues λ_1, λ_2 such that 0 < λ_1 < 1 < λ_2. To these eigenvalues correspond the eigenvectors e_1 and e_2.

Recall that a transformation S: Y → Y, preserving a normalized measure ν, is called a factor of a transformation T: X → X, preserving a normalized measure μ, if there is a measurable map F: X → Y such that

ν(F^{−1}(A)) = μ(A)  for A ∈ A,

and S∘F = F∘T. The condition S∘F = F∘T may be expressed by saying that the diagram

          T
     X ------> X
     |         |
   F |         | F          (4.5.4)
     v         v
     Y ------> Y
          S

commutes. We have the following theorem due to Rochlin [1961].
Theorem 4.5.1. Every exact transformation is a factor of a K-automorphism.

The relationship between K-automorphisms and mixing transformations is much simpler; it is given by the following theorem.

Theorem 4.5.2. Every K-automorphism is mixing.
Exercises
4.1. Study the rotation on the circle transformation (Examples 4.2.2 and
4.4.1) numerically. Is the behavior a consequence of the properties of the
transformation or of the computer? Why?
4.2. Write a series of programs, analogous to those you wrote in the exercises of Chapter 1, to study the behavior of two-dimensional transformations. In particular, write a program to examine the successive locations of an initial cluster of initial conditions as presented in our study of the baker transformation and of equations (4.3.3)-(4.3.5).
4.3. Let (X, A, μ) be a finite measure space and let S: X → X be a measurable transformation such that

μ(S^{−1}(A)) ≤ μ(A)  for A ∈ A.

Show that μ is invariant with respect to S. Is the assumption μ(X) < ∞ essential?
4.4. Consider the space (X, A, μ) where X = {..., −2, −1, 0, 1, 2, ...} is the set of all integers, A is the family of all subsets of X, and μ is the counting measure. Let S(x) = x + k for x ∈ X, where k is an integer. For which k is the transformation S ergodic?
4.5. Prove that the baker transformation of Examples 4.1.3 and 4.3.1 is
mixing by using the mixing condition (4.3.1).
4.6. Let X = [0, 1) × [0, 1) be the unit square with the standard Borel measure. Let r ≥ 2 be an integer. Consider the following generalization of the baker transformation:

S(x, y) = (rx (mod 1), y/r + k/r)  for k/r ≤ x < (k + 1)/r,  k = 0, ..., r − 1.
5
The Asymptotic Properties of
Densities
A_n f = (1/n) Σ_{k=0}^{n−1} P^k f

and show how this may be used to demonstrate the existence of a stationary density of P. We then show that under certain conditions {P^n f} can display a new property, namely, asymptotic periodicity. Finally, we introduce the concept of asymptotic stability for Markov operators, which is a generalization of exactness for Frobenius-Perron operators. We then show how the lower-bound function technique may be used to demonstrate asymptotic stability.
5.1  Weak and Strong Precompactness
In calculus one of the most important observations, originally due to Weierstrass, is that any bounded sequence of numbers contains a convergent subsequence. This observation can be extended to spaces of any finite dimension. Unfortunately, for more complicated objects, such as densities, this is not the case. One example is

f_n(x) = n 1_{[0,1/n]}(x),  0 ≤ x ≤ 1,

which is bounded in L^1 norm, that is, ‖f_n‖ = 1, but which does not converge weakly or strongly in L^1([0, 1]) to any density. In fact, as n → ∞, f_n(x) → δ(x), the Dirac delta function that is supported on a single point, x = 0.

One of the great achievements in mathematical analysis was the discovery of sufficient conditions for the existence of convergent subsequences of functions, which subsequently found applications in the calculus of variations, optimal control theory, and proofs for the existence of solutions to ordinary and partial differential equations and integral equations.
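The delta-function behavior of {f_n} can be seen with a direct Riemann-sum computation of ⟨f_n, g⟩ for a continuous test function g, which tends to g(0) even though f_n converges to no density. A sketch (the test function g = cos and the grid are our illustrative choices):

```python
import math

def inner(n, g, steps=100_000):
    """Riemann-sum approximation of the integral of f_n * g over [0, 1],
    where f_n = n * 1_{[0, 1/n]}."""
    h = 1.0 / steps
    return sum(n * g(i * h) * h for i in range(steps) if i * h <= 1.0 / n)

g = math.cos
for n in (10, 100, 1000):
    print(n, inner(n, g))   # tends to g(0) = 1, the delta-function behavior
# yet ||f_n|| = 1 for all n, and f_n converges to no L^1 density
```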
To make these comments more precise we introduce the following definitions. Let (X, A, μ) be a measure space and F a set of functions in L^1.

Definition 5.1.1. The set F will be called strongly precompact if every sequence of functions {f_n}, f_n ∈ F, contains a strongly convergent subsequence {f_{a_n}} that converges to an f̄ ∈ L^1.

Remark 5.1.1. The prefix "pre-" is used because we take f̄ ∈ L^1 rather than f̄ ∈ F. □
There are several simple and general criteria useful for demonstrating the weak precompactness of sets in L^p [see Dunford and Schwartz, 1957]. The three we will have occasion to use are as follows:

1. Let g ∈ L^1 be a nonnegative function. Then the set of all functions f ∈ L^1 such that

|f(x)| ≤ g(x)  for x ∈ X a.e.   (5.1.1)

is weakly precompact in L^1.

2. Let M > 0 be a positive number and p > 1 be given. If μ(X) < ∞, then the set of all functions f ∈ L^1 such that

‖f‖_{L^p} ≤ M   (5.1.2)

is weakly precompact in L^1.

3. A set of functions F ⊂ L^1, μ(X) < ∞, is weakly precompact if and only if:

(a) there is an M < ∞ such that ‖f‖ ≤ M for all f ∈ F; and

(b) for every ε > 0 there is a δ > 0 such that

∫_A |f(x)| μ(dx) < ε  if μ(A) < δ, for all f ∈ F.

Remark 5.1.4. If the measure is not finite these two conditions must be supplemented by

(c) for every ε > 0 there is a set B ∈ A with μ(B) < ∞ such that

∫_{X\B} |f(x)| μ(dx) < ε  for all f ∈ F. □
4. A set F ⊂ L^1(Δ), where Δ ⊂ R is a bounded interval, is strongly precompact if and only if there is an M < ∞ such that

‖f‖ ≤ M  for all f ∈ F,   (5.1.3a)

and for every ε > 0 there is a δ > 0 such that

∫_Δ |f(x + h) − f(x)| dx < ε   (5.1.3b)

for all f ∈ F and all h such that |h| < δ. To ensure that this integral is well defined we assume f(x + h) − f(x) = 0 for x + h ∉ Δ.
Remark 5.1.5. This necessary and sufficient condition for strong precompactness is valid for unbounded intervals Δ if, in addition, for every ε > 0 there is an r > 0 such that

∫_{|x|≥r} |f(x)| dx < ε   (5.1.4)

for all f ∈ F. □
Remark 5.1.6. In practical situations it is often difficult to verify inequality (5.1.3b). However, if the functions f ∈ F have uniformly bounded derivatives, that is, if there is a constant K such that |f'(x)| ≤ K, then the condition is automatically satisfied. To see this, note that

∫_Δ |f(x + h) − f(x)| dx ≤ K|h| μ(Δ) ≤ Kδ μ(Δ),

so that, given ε, if we pick δ = ε/Kμ(Δ), the condition (5.1.3b) is satisfied. Clearly this will not work for unbounded intervals because for μ(Δ) → ∞, δ → 0. □
To close this section we state the following corollary.

Corollary 5.1.1. For every f ∈ L^1(Δ), Δ bounded or not,

lim_{h→0} ∫_Δ |f(x + h) − f(x)| dx = 0.   (5.1.5)

Proof. To see this, note that the set {f} consisting of only one function f is obviously strongly precompact since the sequence {f, f, ...} is always convergent. Thus equation (5.1.5) follows from the foregoing condition (b) of criterion 4 for strong precompactness.
5.2  Properties of the Averages A_n f

In this section we assume a measure space (X, A, μ) and a Markov operator P: L^1 → L^1, and study the behavior of the averages

A_n f = (1/n) Σ_{k=0}^{n−1} P^k f.   (5.2.1)

We then state and prove a special case of the Kakutani-Yosida abstract ergodic theorem as well as two corollaries to that theorem.

Theorem 5.2.1. Let P: L^1 → L^1 be a Markov operator. If, for some f ∈ L^1, a subsequence {A_{a_n} f} of {A_n f} converges weakly to f* ∈ L^1, then Pf* = f* and {A_n f} converges strongly to f*.

Proof. Observe first that

A_n f − A_n Pf = (1/n)(f − P^n f),

and thus

‖A_n f − A_n Pf‖ ≤ (1/n)(‖f‖ + ‖P^n f‖) ≤ (2/n)‖f‖,

so that lim_{n→∞} ‖A_n f − A_n Pf‖ = 0.
Write f ∈ L^1 in the form

f = (f − f*) + f*   (5.2.2)

and assume for the time being that for every ε > 0 the function f − f* can be written in the form

f − f* = Pg − g + r,   (5.2.3)

where g ∈ L^1 and ‖r‖ < ε. Thus, from equations (5.2.2) and (5.2.3), we have

A_n f = A_n(Pg − g) + A_n r + A_n f*.

Because Pf* = f* implies A_n f* = f*, this gives

‖A_n f − f*‖ ≤ ‖A_n(Pg − g)‖ + ‖A_n r‖ ≤ (2/n)‖g‖ + ε,

so that {A_n f} converges strongly to f*.
It remains to verify the representation (5.2.3). Suppose it is not always possible, so that f − f* does not belong to the closure E of the set of all functions of the form Pg − g. Then there is a g_0 ∈ L^∞ such that

⟨f − f*, g_0⟩ ≠ 0   (5.2.4)

and

⟨h, g_0⟩ = 0  for all h ∈ E.

In particular,

⟨(P − I)P^j f, g_0⟩ = 0.

Thus

⟨P^{j+1} f, g_0⟩ = ⟨P^j f, g_0⟩  for j = 0, 1, ... ,   (5.2.5)

and, consequently,

⟨A_n f, g_0⟩ = ⟨f, g_0⟩.   (5.2.6)

Since {A_{a_n} f} was assumed to converge weakly to f*, we have

lim_{n→∞} ⟨A_{a_n} f, g_0⟩ = ⟨f*, g_0⟩

and, by (5.2.6),

⟨f, g_0⟩ = ⟨f*, g_0⟩,

which gives

⟨f − f*, g_0⟩ = 0.

However, this result contradicts (5.2.4), and therefore we conclude that the representation (5.2.3) is, indeed, always possible.
FIGURE 5.2.1. Diagram showing that, for f_0 ∉ E, we can find a g_0 such that g_0 is not orthogonal to f_0 but is orthogonal to all f ∈ E. (Since g_0 belongs to L^∞, but not necessarily to L^1, the picture is only schematic.)

Corollary 5.2.1. Let P: L^1 → L^1 be a Markov operator. If for some f ∈ L^1 there is a g ∈ L^1, g ≥ 0, such that |P^n f| ≤ g for all n, then there is an f* ∈ L^1 with Pf* = f* such that

lim_{n→∞} A_n f = f*  (strongly).   (5.2.7)

Proof. For 0 ≤ f ≤ g we have

0 ≤ A_n f = (1/n) Σ_{k=0}^{n−1} P^k f ≤ g,

and, thus, |A_n f| ≤ g. By applying our first criterion for weak precompactness (Section 5.1), we know that {A_n f} is weakly precompact. Then Theorem 5.2.1 completes the argument.
Corollary 5.2.2. Again let (X, A, μ) be a finite measure space and P: L^1 → L^1 a Markov operator. If for some f ∈ D there exist M > 0 and p > 1 such that

‖P^n f‖_{L^p} ≤ M  for all n,   (5.2.8)

then there is an f* ∈ D with Pf* = f* such that

lim_{n→∞} A_n f = f*.
Proof. We have

‖A_n f‖_{L^p} = ‖(1/n) Σ_{k=0}^{n−1} P^k f‖_{L^p} ≤ (1/n) Σ_{k=0}^{n−1} ‖P^k f‖_{L^p} ≤ (1/n)(nM) = M.

Thus, by the second criterion for weak precompactness (Section 5.1), the sequence {A_n f} is weakly precompact, and Theorem 5.2.1 again completes the argument.
Now let f* be a stationary density of P (Pf* = f*), and for f ∈ D and c > 0 set f_c = min(f, cf*). Then

P^n f_c ≤ P^n(cf*) = cP^n f* = cf*,

and f may be written in the form

f = f_c/‖f_c‖ + r_c,   (5.2.9)

where

r_c = (1 − 1/‖f_c‖) f_c + f − f_c.
Since f_c(x) → f(x) as c → ∞ for all x, we have

lim_{c→∞} ‖r_c‖ = 0.   (5.2.10)

However, since f_c/‖f_c‖ is a density bounded by c‖f_c‖^{−1} f*, according to the first part of the proof,

‖A_n(f_c/‖f_c‖) − f*‖ ≤ ε   (5.2.11)

for sufficiently large n. Combining inequalities (5.2.10) and (5.2.11) with the decomposition (5.2.9), we immediately obtain

‖A_n f − f*‖ ≤ ε

for sufficiently large n.
In the case that P is the Frobenius-Perron operator corresponding to a nonsingular transformation S, Theorem 5.2.2 offers a convenient criterion for ergodicity. As we have seen in Theorem 4.2.2, the ergodicity of S is equivalent to the uniqueness of the solution to Pf = f. Using this relationship, we can prove the following corollary.

Corollary 5.2.3. Let (X, A, μ) be a normalized measure space, S: X → X a measure-preserving transformation, and P the corresponding Frobenius-Perron operator. Then S is ergodic if and only if

lim_{n→∞} A_n f = 1  (strongly)  for every f ∈ D.   (5.2.12)
Definition 5.3.1. Let (X, A, μ) be a finite measure space. A Markov operator P is called constrictive if there exist δ > 0 and κ < 1 such that for every f ∈ D there is an integer n_0(f) for which

∫_E P^n f(x) μ(dx) ≤ κ  for n ≥ n_0(f) and μ(E) ≤ δ.   (5.3.1)

Note that for every density f the integral in inequality (5.3.1) is bounded above by one. Thus condition (5.3.1) for constrictiveness means that eventually [n ≥ n_0(f)] this integral cannot be close to one for sufficiently small sets E. This clearly indicates that constrictiveness rules out the possibility that P^n f is eventually concentrated on a set of very small or vanishing measure.
If the space X is not finite, we wish to have a definition of constrictiveness that also prevents P^n f from being dispersed throughout the entire space. To accomplish this we extend Definition 5.3.1.

Definition 5.3.2. Let (X, A, μ) be a measure space. A Markov operator P is called constrictive if there exist a set B ∈ A with μ(B) < ∞, a δ > 0, and a κ < 1 such that for every f ∈ D there is an integer n_0(f) for which

∫_{(X\B)∪E} P^n f(x) μ(dx) ≤ κ  for n ≥ n_0(f) and μ(E) ≤ δ.   (5.3.2)

Note that in (5.3.2) the set E may always be replaced by F = E ∩ B, since

∫_{(X\B)∪E} P^n f(x) μ(dx) = ∫_{(X\B)∪F} P^n f(x) μ(dx).
Proposition 5.3.1. Let (X, A, μ) be a finite measure space and P a Markov operator. If there exist p > 1 and K ≥ 0 such that

lim sup_{n→∞} ‖P^n f‖_{L^p} ≤ K  for every f ∈ D,   (5.3.3)

then P is constrictive.

Proof. From (5.3.3) there is an integer n_0(f) such that

‖P^n f‖_{L^p} ≤ K + 1  for n ≥ n_0(f).

Thus, by criterion 2 of Remark 5.1.3, the family {P^n f}, for n ≥ n_0(f), f ∈ D, is weakly precompact. Finally, for a fixed ε ∈ (0, 1), criterion 3 of the same remark implies there is a δ > 0 such that

∫_E P^n f(x) μ(dx) < ε  if μ(E) < δ,

so that P is constrictive.
that

lim sup_{n→∞} ‖(P^n f − h)^+‖ ≤ λ  for f ∈ D,   (5.3.4)

then P is constrictive.

Proof. Let ε = ¼(1 − λ) and take F = {h}. Since F, which contains only one element, is evidently weakly precompact (it is also strongly precompact, but this property is not useful to us here), by criterion 3 of Remark 5.1.3 there exists a δ > 0 such that

∫_E h(x) μ(dx) < ε  if μ(E) ≤ δ,   (5.3.5)

and, by condition (c) of Remark 5.1.4, a set B ∈ A with μ(B) < ∞ such that

∫_{X\B} h(x) μ(dx) < ε.   (5.3.6)

Now fix f ∈ D. By (5.3.4) there is an n_0(f) such that

‖(P^n f − h)^+‖ ≤ λ + ε  for n ≥ n_0(f),

and, as a consequence,

∫_{(X\B)∪E} P^n f(x) μ(dx) ≤ ∫_{(X\B)∪E} h(x) μ(dx) + ‖(P^n f − h)^+‖
  ≤ ∫_{X\B} h(x) μ(dx) + ∫_E h(x) μ(dx) + λ + ε ≤ λ + 3ε < 1,   (5.3.7)

so that P is constrictive.
To verify constrictiveness it is enough to check (5.3.1) on a dense subset D_0 of D(X). Indeed, given f ∈ D(X), choose f_0 ∈ D_0 with ‖f − f_0‖ small; then

∫_E P^n f_0(x) μ(dx) ≤ κ  for n ≥ n_0(f_0)

and

∫_E P^n f(x) μ(dx) ≤ ∫_E P^n f_0(x) μ(dx) + ‖f − f_0‖.

Thus, when (5.3.1) holds for f_0 ∈ D_0 it holds (with a slightly larger κ) for all densities f ∈ D(X). Precisely the same argument shows that it is also sufficient to verify (5.3.2) for densities drawn from dense sets. As a consequence of these observations, in verifying either (5.3.3) or (5.3.4) of Propositions 5.3.1 and 5.3.2 we may confine our attention to f ∈ D_0.
The main result of this section, which is proved in Komornik and Lasota [1987] (see also Lasota, Li, and Yorke [1984]; Schaefer [1980]; and Keller [1982]), is as follows.
Theorem 5.3.1 (spectral decomposition theorem). Let P be a constrictive Markov operator. Then there is an integer r, two sequences of nonnegative functions g_i ∈ D and k_i ∈ L^∞, i = 1, ..., r, and an operator Q: L^1 → L^1 such that for every f ∈ L^1, Pf may be written in the form

Pf(x) = Σ_{i=1}^r λ_i(f) g_i(x) + Qf(x),   (5.3.8)

where

λ_i(f) = ∫_X f(x) k_i(x) μ(dx).   (5.3.9)

The functions g_i and the operator Q have the following properties:

(1) g_i(x)g_j(x) = 0 for all i ≠ j, so that the functions g_i have disjoint supports;

(2) for each integer i there exists a unique integer α(i) such that Pg_i = g_{α(i)}, where α(i) ≠ α(j) for i ≠ j; thus the operator P just serves to permute the functions g_i;

(3) ‖P^n Qf‖ → 0 as n → ∞ for every f ∈ L^1.
Remark 5.3.2. Note from representation (5.3.8) that the operator Q is automatically determined if we know the functions g_i and λ_i, that is,

Qf(x) = f(x) − Σ_{i=1}^r λ_i(f) g_i(x). □

From (5.3.8) and property (2), repeated application of P gives

P^{n+1} f(x) = Σ_{i=1}^r λ_i(f) g_{α^n(i)}(x) + Q_n f(x),   (5.3.10)

where Q_n = P^n Q, so that ‖Q_n f‖ → 0 as n → ∞. The summation

Σ_{i=1}^r λ_i(f) g_{α^n(i)}(x) = Σ_{i=1}^r λ_{α^{−n}(i)}(f) g_i(x)   (5.3.11)

is therefore simply permuted by each application of P, so that {P^n f} is asymptotically periodic.
Suppose now that there is an h ∈ L^1 such that

lim_{n→∞} ‖(P^n f − h)^+‖ = 0  for f ∈ D.   (5.3.12)

Taking f = g_k and using P^n g_k = g_{α^n(k)} together with the fact that α is a permutation, we obtain g_k ≤ h a.e. Hence

Σ_{k=1}^r ∫_{supp g_k} g_k(x) μ(dx) ≤ Σ_{k=1}^r ∫_{supp g_k} h(x) μ(dx) ≤ ‖h‖.

Since g_k ∈ D, this reduces to

r ≤ ‖h‖,   (5.3.13)

so that the number of terms in the representation (5.3.8) is bounded by ‖h‖.
Let P: L^1 → L^1 be a constrictive Markov operator and let f be defined by

f(x) = (1/r) Σ_{i=1}^r g_i(x),   (5.4.1)
We will show that, when the constant density 1_X is stationary (P1_X = 1_X), representation (5.3.10) takes the form

P^{n+1} f(x) = Σ_{i=1}^r λ_{α^{−n}(i)}(f) f_{A_i}(x) + Q_n f(x)  for all f ∈ L^1,   (5.4.2)

where

f_{A_i}(x) = [1/μ(A_i)] 1_{A_i}(x).

The sets A_i form a partition of X, that is,

A_i ∩ A_j = ∅  for i ≠ j.

Furthermore, μ(A_i) = μ(A_j) whenever j = α^k(i) for some k. To demonstrate this, apply (5.3.10) to 1_X:

P^{n+1} 1_X(x) = Σ_{i=1}^r λ_{α^{−n}(i)}(1_X) g_i(x) + Q_n 1_X(x).   (5.4.3)

From our considerations in the preceding section, we know that the summation in equation (5.4.3) is periodic. Let τ be the period of the summation portion of P^{n+1} (remember that τ ≤ r!), so that α^{−nτ}(i) = i. Passing to the limit n → ∞ through multiples of τ in (5.4.3), and using ‖Q_n 1_X‖ → 0 together with P^n 1_X = 1_X, we obtain

1_X(x) = Σ_{i=1}^r λ_i(1_X) g_i(x),   (5.4.4)
where A_i ⊂ X denotes the support of g_i, that is, the set of all x such that g_i(x) ≠ 0. From (5.4.4) it also follows that ⋃_i A_i = X. Apply the operator P^n to equation (5.4.4) to give

P^n 1_X(x) = 1_X(x) = Σ_{i=1}^r λ_i(1_X) g_{α^n(i)}(x).

Moreover, since the g_i have disjoint supports, (5.4.4) implies g_i(x) = 1/λ_i(1_X) for x ∈ A_i, and integrating over A_i gives

∫_{A_i} g_i(x) μ(dx) = 1 = μ(A_i)/λ_i(1_X).

Thus μ(A_i) = λ_i(1_X) and

g_i(x) = [1/μ(A_i)] 1_{A_i}(x).   (5.4.5)

Moreover, μ(A_{α^n(i)}) = μ(A_i), which establishes the properties stated after (5.4.2).

We next examine the ergodicity of P in terms of the averages

A_n f(x) = (1/n) Σ_{j=0}^{n−1} P^j f(x).
From (5.4.2), since ‖Q_n f‖ → 0 and the coefficients λ_{α^{−n}(i)}(f) are permuted cyclically, the averages converge:

lim_{n→∞} A_n f = λ̄(f) Σ_{i=1}^r f_{A_i},  where λ̄(f) = (1/r) Σ_{i=1}^r λ_i(f).

For f ∈ D we have Σ_i λ_i(f) = 1, and the μ(A_i) are equal with Σ_i μ(A_i) = 1, so that μ(A_i) = 1/r and λ̄(f) = 1/r. Hence

lim_{n→∞} A_n f = (1/r) Σ_{i=1}^r [1/μ(A_i)] 1_{A_i} = 1,

and we have proved that if the permutation {α(1), ..., α(r)} of {1, ..., r} is cyclical, then {P^n f} is Cesàro convergent to 1 and P is, therefore, ergodic.
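The Cesàro convergence for a cyclical permutation is visible even in the simplest finite-dimensional Markov operator, a permutation matrix acting on probability vectors. A sketch (the dimension, initial density, and number of terms are our illustrative choices):

```python
def apply_P(v):
    """Markov operator that cyclically permutes three states: 1 -> 2 -> 3 -> 1."""
    return [v[2], v[0], v[1]]

f = [1.0, 0.0, 0.0]            # initial density concentrated on one state
total = [0.0, 0.0, 0.0]
n = 999                         # a multiple of the period 3
v = f
for _ in range(n):
    total = [t + x for t, x in zip(total, v)]
    v = apply_P(v)

A_n = [t / n for t in total]
print(A_n)    # [1/3, 1/3, 1/3]: P^n f never converges, but the averages A_n f do
```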
The converse is also easy to prove. Suppose P is ergodic and that {α(i)} is not a cyclical permutation. Then {α(i)} has a proper invariant subset I. As an initial f take

f(x) = Σ_{i=1}^r c_i f_{A_i}(x),

wherein

c_i = c ≠ 0  for i ∈ I,
    = 0      for i ∉ I.

Then

lim_{n→∞} A_n f = (1/r) Σ_{i=1}^r λ̄_i(f) 1_{A_i},

where λ̄_i(f) ≠ 0 if i is contained in the invariant subset I, and λ̄_i(f) = 0 otherwise. Thus the limit of A_n f as n → ∞ is not a constant function with respect to x, so that P cannot be ergodic. This is a contradiction; hence, if P is ergodic, {α(i)} must be a cyclical permutation.
Mixing and Exactness

If r = 1 in the representation (5.4.2), then A_1 = X and μ(A_1) = 1, so by (5.4.2) we have

P^{n+1} f = λ_1(f) 1_X + Q_n f

and, thus,

lim_{n→∞} P^{n+1} f = λ_1(f) 1_X.
Theorem 5.5.3. Again let (X, A, μ) be a normalized measure space and P: L^1 → L^1 a constrictive Markov operator. If P is mixing, then r = 1 in representation (5.3.8).

Proof. To see this, assume P is mixing but that r > 1, and take an initial f ∈ D given by

f = c_1 1_{A_1},  where c_1 = 1/μ(A_1).

Therefore

P^n f(x) = c_1 1_{A(n)}(x),

where A(n) = A_{α^n(1)} cycles through the sets A_i. Hence {P^n f} will converge weakly to 1 only if α^n(1) = 1 for all sufficiently large n. Since α is a cyclical permutation, r cannot be greater than 1, thus demonstrating that r = 1.
Remark 5.5.1. It is somewhat surprising that in this case P mixing implies P exact. □

Remark 5.5.2. Observe that, except for the remainder Q_n f, P^{n+1} f behaves like a permutation, for which the notions of ergodicity, mixing, and exactness are quite simple. □
lim_{n→∞} ‖P^n f − f*‖ = 0  for every f ∈ D.   (5.6.1)
P^n f(x) > 0   (5.6.2)

for almost all x ∈ A and all n > n_0(f). Then {P^n} is asymptotically stable.
Proof. Since, by assumption, P is constrictive, representation (5.3.8) is valid. We will first show that r = 1. Suppose r > 1. Since the supports of the g_i are disjoint, there is an i_0 such that A is not contained in the support of g_{i_0}; take f = g_{i_0} and choose n with α^n(i_0) = i_0. Clearly, P^n f(x) = g_{i_0}(x) is then not positive on the set A since A is not contained in the support of g_{i_0}. This result contradicts (5.6.2) of the theorem and, thus, we must have r = 1.
Since r = 1, equation (5.3.10) reduces to

P^{n+1} f(x) = λ(f) g(x) + Q_n f(x),

so

lim_{n→∞} P^n f = λ(f) g.   (5.6.3)

For f ∈ D, λ(f) = 1 since every P^n f is a density and ‖Q_n f‖ → 0, and asymptotic stability follows with f* = g.
Proof. The "only if" part is obvious since (5.6.1) implies (5.6.3) with h = f*. The proof of the "if" part is not so direct, and will be done in two steps. We first show that

lim_{n→∞} ‖P^n(f_1 − f_2)‖ = 0   (5.6.4)

for all f_1, f_2 ∈ D. Now set g = f_1 − f_2. Since ∫_X g(x) μ(dx) = 0, we have

‖g^+‖ = ‖g^−‖ = ½‖g‖,   (5.6.5)

and (assuming g ≠ 0) both g^+/‖g^+‖ and g^−/‖g^−‖ are densities.
Next write, for n ≥ n_1 [where n_1 is chosen so that ‖(h − P^n(g^±/‖g^±‖))^+‖ ≤ ¼‖h‖],

‖P^n(g^+/‖g^+‖) − h‖ = ∫_X [P^n(g^+/‖g^+‖)(x) − h(x)] μ(dx) + 2 ∫_X [h(x) − P^n(g^+/‖g^+‖)(x)]^+ μ(dx)
  ≤ 1 − ‖h‖ + ½‖h‖ = 1 − ½‖h‖  for n ≥ n_1.

Analogously,

‖P^n(g^−/‖g^−‖) − h‖ ≤ 1 − ½‖h‖  for n ≥ n_1,

so that, using (5.6.5),

‖P^n g‖ ≤ ½‖g‖ [‖P^n(g^+/‖g^+‖) − h‖ + ‖P^n(g^−/‖g^−‖) − h‖] ≤ ‖g‖(1 − ½‖h‖)  for n ≥ n_1.   (5.6.6)
From (5.6.6), for any f_1, f_2 ∈ D, we can find an integer n_1 such that ‖P^n(f_1 − f_2)‖ contracts by the factor (1 − ½‖h‖) every n_1 steps, from which (5.6.4) follows.

Choose a sequence {h_j} of lower-bound functions such that ‖h_j‖ → ρ. Replacing, if necessary, h_j by max(h_1, ..., h_j), we can construct an increasing sequence {h_j} of lower-bound functions, which will always have a limit (finite or infinite). This limiting function

h* = lim_{j→∞} h_j
for n ≥ n_0(f), then either f = f* or f = −f*.

Proof. From Proposition 3.1.3, equation (5.6.7) implies that both f^+ and f^− are fixed points of P. Assume ‖f^+‖ > 0, so that f̃ = f^+/‖f^+‖ is a density and Pf̃ = f̃. Uniqueness of f* implies f̃ = f*, hence f^+ = ‖f^+‖ f*, and, analogously, f^− = ‖f^−‖ f*, so that

f = f^+ − f^− = (‖f^+‖ − ‖f^−‖) f* = a f*.

Since ‖f‖ = ‖f*‖ = 1, we have |a| = 1, and thus f = f* or f = −f*.
Before closing this section we state and prove a result that draws the connection between statistical stability and exactness when P is a Frobenius-Perron operator.

Proposition 5.6.2. Let (X, A, μ) be a measure space, S: X → X a nonsingular transformation such that S(A) ∈ A for A ∈ A, and P the Frobenius-Perron operator corresponding to S. If S is statistically stable and f* is the density of the unique invariant measure, then the transformation S with the measure

μ_{f*}(A) = ∫_A f*(x) μ(dx)  for A ∈ A

is exact.

Proof. By assumption the measure μ_{f*} is invariant. Fix A ∈ A with μ_{f*}(A) > 0 and set

f_A(x) = [1/μ_{f*}(A)] f*(x) 1_A(x)  for x ∈ X.
Since S is statistically stable, the sequence

r_n = ‖P^n f_A − f*‖

converges to zero. By the definition of μ_{f*}, we have

μ_{f*}(S^n(A)) = ∫_{S^n(A)} f*(x) μ(dx) ≥ ∫_{S^n(A)} P^n f_A(x) μ(dx) − r_n.   (5.6.8)

Furthermore, since P^n f_A is supported on S^n(A),

∫_{S^n(A)} P^n f_A(x) μ(dx) = ∫_X P^n f_A(x) μ(dx) = 1.

Substituting this result into (5.6.8) and taking the limit as n → ∞ gives

lim_{n→∞} μ_{f*}(S^n(A)) = 1;

hence S: X → X is exact by definition.
Remark 5.6.1. In the most general case, Proposition 5.6.2 is not invertible; that is, statistical stability of S implies the existence of a unique invariant measure and exactness, but not vice versa. Lin [1971] has shown that the inverse implication is true when the initial measure μ is invariant. □
A measurable function K: X × X → R is called a stochastic kernel if

0 ≤ K(x, y)   (5.7.1)

and

∫_X K(x, y) dx = 1   (5.7.2)

[where, throughout this section, we abbreviate dx = μ(dx)]. Every stochastic kernel defines an operator

Pf(x) = ∫_X K(x, y) f(y) dy.   (5.7.3)

Since

∫_X Pf(x) dx = ∫_X dx ∫_X K(x, y) f(y) dy = ∫_X f(y) dy ∫_X K(x, y) dx = ∫_X f(y) dy,

P is therefore a Markov operator. In the special case that X is a finite set and μ is a counting measure, we have a Markov chain and P is a stochastic matrix.
Now consider two Markov operators P_a and P_b and their corresponding stochastic kernels, K_a and K_b. Clearly, P_a P_b is also a Markov operator, and we wish to know how its kernel is related to K_a and K_b. Thus, write

(P_a P_b) f(x) = P_a(P_b f)(x) = ∫_X K_a(x, z)(P_b f)(z) dz
  = ∫_X K_a(x, z) { ∫_X K_b(z, y) f(y) dy } dz
  = ∫_X { ∫_X K_a(x, z) K_b(z, y) dz } f(y) dy,

so that the kernel of P_a P_b is

K(x, y) = ∫_X K_a(x, z) K_b(z, y) dz.   (5.7.4)

We write this composition as

K = K_a ∗ K_b

and note that the composition has the properties

(K_a ∗ K_b) ∗ K_c = K_a ∗ (K_b ∗ K_c),  and K_a ∗ K_b is again stochastic.   (5.7.5)

However, in general, kernels K_a and K_b do not commute, that is, K_a ∗ K_b ≠ K_b ∗ K_a. Note that the foregoing operation of composition is just a generalization of matrix multiplication.
Now we are in a position to show that Theorem 5.6.2 can be applied to operators P defined by stochastic kernels and, in fact, gives a simple sufficient condition for the asymptotic stability of {P^n}. Suppose that for some m the m-fold composed kernel K_m satisfies

∫_X inf_y K_m(x, y) dx > 0.   (5.7.6)

Denote by K_n the kernel of P^n, so that

P^n f(x) = ∫_X K_n(x, y) f(y) dy

and

K_{n+m}(x, y) = ∫_X K_m(x, z) K_n(z, y) dz,

so that

P^{n+m} f(x) = ∫_X K_{n+m}(x, y) f(y) dy = ∫_X { ∫_X K_m(x, z) K_n(z, y) dz } f(y) dy.

If we set

h(x) = inf_y K_m(x, y),

then

P^{n+m} f(x) ≥ h(x) ∫_X { ∫_X K_n(z, y) dz } f(y) dy = h(x) ∫_X f(y) dy

for f ∈ D(X), ∫_X f(y) dy = 1, and, therefore,

P^{n+m} f(x) ≥ h(x)  for n ≥ 1, f ∈ D(X).

Thus

P^n f ≥ h  for n ≥ m + 1,

which implies that (5.6.3) holds, and we have finished the proof.
In the case that X is a finite set and K is a stochastic matrix, this result
is equivalent to one originally obtained by Markov.
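In the finite (matrix) case this condition is easy to demonstrate: when every entry of a stochastic matrix is positive, h(x) = min_y K[x][y] is a nontrivial lower bound and all initial densities are driven to the same stationary density. A sketch (the particular matrix, iteration count, and tolerance are our illustrative choices):

```python
def step(K, f):
    """One application of the Markov operator: (Pf)(x) = sum_y K[x][y] f[y]."""
    return [sum(K[x][y] * f[y] for y in range(len(f))) for x in range(len(K))]

# A column-stochastic matrix with all entries positive, so that
# h(x) = min_y K[x][y] satisfies sum_x h(x) > 0 (Markov's condition).
K = [[0.6, 0.3, 0.1],
     [0.3, 0.4, 0.3],
     [0.1, 0.3, 0.6]]

f1, f2 = [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]   # two different initial densities
for _ in range(60):
    f1, f2 = step(K, f1), step(K, f2)

dist = sum(abs(a - b) for a, b in zip(f1, f2))
print(dist)   # essentially zero: both densities approach the same stationary density
```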
Although condition (5.7.6) on the kernel is quite simple, it is seldom satisfied when K(x, y) is defined on an unbounded space. For example, in Section 8.9 we discuss the evolution of densities under the operation of a Markov operator defined by the kernel [cf. equation (8.9.6)]

K(x, y) = −e^y Ei(−y)  for 0 < x < y,
        = −e^y Ei(−x)  for 0 < y < x,

where

−Ei(−x) = ∫_x^∞ (e^{−y}/y) dy,  x > 0,

is the exponential integral. For this kernel

inf_y K(x, y) = 0  for all x > 0,

and the same holds for all of its iterates K_m(x, y). A similar problem occurs with the kernel

K(x, y) = g(ax + by),

where b ≠ 0 and g is an integrable function defined on R or even on R^+ (cf. Example 5.7.2).
In these and other cases where condition (5.7.6) is not satisfied, an alternative approach, reminiscent of the stability methods developed by Liapunov, offers a way to examine the asymptotic properties of iterates of densities by Markov operators.

Let G be an unbounded measurable subset of a d-dimensional Euclidean space R^d, G ⊂ R^d, and K: G × G → R a measurable stochastic kernel. We will call any measurable nonnegative function V: G → R satisfying

lim_{|x|→∞} V(x) = ∞   (5.7.8)

a Liapunov function.
Proposition 5.7.1. Let (X, A, μ) be a measure space, V: X → R an arbitrary nonnegative measurable function, and for all f ∈ D set

E(V|f) = ∫_X V(x) f(x) μ(dx).

If G_a = {x ∈ X : V(x) < a}, then

∫_{G_a} f(x) μ(dx) ≥ 1 − E(V|f)/a.   (5.7.9)

Proof. We have

E(V|f) ≥ ∫_{X\G_a} V(x) f(x) μ(dx) ≥ a ∫_{X\G_a} f(x) μ(dx) = a { 1 − ∫_{G_a} f(x) μ(dx) },

from which (5.7.9) follows.
f in K(x,y)dx
Jal11l<r
and has a Liapunov function V: G
Ia
V(x)PJ(x)dx :5 a
>0
-+
for every r
>0
(5.7.10)
R such that
fooo V(X)f(x)dx+{J
0 :5
< 1, {J 2:: 0
(5.7.11)
Remark 5.7.1. Before giving the proof, we note that sometimes instead of verifying inequality (5.7.11) it is sufficient to check the simpler condition

$$\int_G V(x) K(x,y)\,dx \le \alpha V(y) + \beta, \tag{5.7.11a}$$

since (5.7.11a) implies (5.7.11). To see this, note that from (5.7.11a)

$$\int_G V(x) Pf(x)\,dx = \int_G \int_G V(x) K(x,y) f(y)\,dx\,dy \le \int_G [\alpha V(y) + \beta] f(y)\,dy = \alpha \int_G V(y) f(y)\,dy + \beta. \ \Box$$
Proof of Theorem 5.7.1. For every $f \in D$ with $E(V|f) < \infty$, consider the sequence of integrals

$$E_n(V|f) = \int_G V(x) P^n f(x)\,dx \tag{5.7.12}$$

that can be thought of as the expected value of $V(x)$ with respect to the density $P^n f(x)$. From (5.7.11) we have directly

$$E_{n+1}(V|f) \le \alpha E_n(V|f) + \beta \tag{5.7.13}$$

and, by induction,

$$E_n(V|f) \le \alpha^n E(V|f) + \beta(1 + \alpha + \cdots + \alpha^{n-1}), \tag{5.7.14}$$

so that, since $\alpha < 1$,

$$E_n(V|f) \le \alpha^n E(V|f) + \frac{\beta}{1-\alpha}. \tag{5.7.15}$$

Now let

$$G_a = \{x \in G : V(x) < a\}.$$

By the Chebyshev inequality (5.7.9),

$$\int_{G_a} P^n f(x)\,dx \ge 1 - \frac{E_n(V|f)}{a}. \tag{5.7.16}$$

From (5.7.15) there is an $n_0 = n_0(f)$ such that $E_n(V|f) \le 1 + \beta/(1-\alpha)$ for $n \ge n_0$. Choosing $a$ so large that

$$\frac{E_n(V|f)}{a} \le \frac{1}{a}\left(1 + \frac{\beta}{1-\alpha}\right) < 1 \qquad \text{for } n \ge n_0,$$

we obtain

$$\int_{G_a} P^n f(x)\,dx \ge 1 - \frac{1}{a}\left(1 + \frac{\beta}{1-\alpha}\right) > 0 \qquad \text{for } n \ge n_0. \tag{5.7.17}$$
Since $V(x) \to \infty$ as $|x| \to \infty$, there is an $r > 0$ such that $V(x) > a$ for $|x| > r$. Thus the set $G_a$ is entirely contained in the ball $|x| \le r$, and we may write

$$P^{n+1} f(x) = \int_G K(x,y) P^n f(y)\,dy \ge \int_{G_a} K(x,y) P^n f(y)\,dy \ge \inf_{y \in G_a} K(x,y) \int_{G_a} P^n f(y)\,dy. \tag{5.7.18}$$

Setting

$$h(x) = \inf_{|y| \le r} K(x,y)$$

in inequality (5.7.18), we have, by assumption (5.7.10), that $\|h\| > 0$, and from (5.7.17) that $P^{n+1} f \ge \varepsilon h$ for $n \ge n_0$, where $\varepsilon = 1 - (1/a)(1 + \beta/(1-\alpha)) > 0$. Finally, because of the continuity of $V$, the set $D_0 \subset D$ of all $f$ such that $E(V|f) < \infty$ is dense in $D$. Thus all the conditions of Theorem 5.6.2 are satisfied. $\Box$
Another important property of Markov operators defined by a stochastic kernel is that they may generate an asymptotically periodic sequence $\{P^n f\}$ for every $f \in D$. This may happen if condition (5.7.10) on the kernel is replaced by a different one, namely: for every bounded set $B \subset G$ there are $\lambda < 1$ and $\delta > 0$ such that

$$\int_E K(x,y)\,dx \le \lambda \qquad \text{for } \mu(E) < \delta,\ y \in B. \tag{5.7.19}$$

Under (5.7.19) and (5.7.11), $\{P^n f\}$ is asymptotically periodic for every $f \in D$. To see this, note first that, as in the proof of Theorem 5.7.1,

$$\int_{G \setminus G_a} P^n f(x)\,dx = 1 - \int_{G_a} P^n f(x)\,dx \le \frac{E_n(V|f)}{a} \le \frac{1}{a}\left(1 + \frac{\beta}{1-\alpha}\right) \qquad \text{for } n \ge n_0(f).$$
Given $\varepsilon > 0$, choose

$$a \ge \frac{1}{\varepsilon}\left(1 + \frac{\beta}{1-\alpha}\right),$$

so that

$$\int_{G \setminus G_a} P^n f(x)\,dx \le \varepsilon \qquad \text{for } n \ge n_0(f). \tag{5.7.20}$$

Consequently, for every measurable set $E$ with $\mu(E) < \delta$,

$$\int_{(G \setminus G_a) \cup E} P^n f(x)\,dx \le \varepsilon + \int_E P^n f(x)\,dx = \varepsilon + \int_{G \setminus G_a} P^{n-1} f(y)\,dy \int_E K(x,y)\,dx + \int_{G_a} P^{n-1} f(y)\,dy \int_E K(x,y)\,dx.$$

Since $\int_E K(x,y)\,dx \le 1$ always and, taking $B = \bar{G}_a$ in (5.7.19), $\int_E K(x,y)\,dx \le \lambda$ for $y \in G_a$, we finally have

$$\int_{(G \setminus G_a) \cup E} P^n f(x)\,dx \le 2\varepsilon + \lambda \qquad \text{for } n \ge n_0(f) + 1.$$

Choosing $\varepsilon$ so small that $2\varepsilon + \lambda < 1$, it follows that $P$ is constrictive and, by Theorem 5.3.1, that $\{P^n f\}$ is asymptotically periodic for every $f \in D$.
For the kernel (5.7.7), on the other hand, condition (5.7.10) is satisfied: for every $r > 0$,

$$\inf_{0 \le y \le r} K(x,y) \ge \min\{-\mathrm{Ei}(-x),\ -e^{r}\mathrm{Ei}(-r)\} > 0,$$

and the right-hand side has a positive integral over $x > 0$. Moreover, a direct calculation shows that

$$Pf(x) = \int_0^\infty K(x,y) e^{-y}\,dy = e^{-x},$$

so the exponential density $f(x) = e^{-x}$ is invariant.
Example 5.7.2. Let $g\colon R \to R$ be a continuous density with

$$\int_{-\infty}^{\infty} g(x)\,dx = 1 \qquad \text{and} \qquad m_1 = \int_{-\infty}^{\infty} |x|\,g(x)\,dx < \infty,$$

and consider the stochastic kernel

$$K(x,y) = |a|\,g(ax + by), \qquad a \neq 0,$$

with the corresponding operator

$$Pf(x) = \int_{-\infty}^{\infty} K(x,y) f(y)\,dy.$$

Let $V(x) = |x|$. Substituting $z = ax + by$ gives

$$\int_{-\infty}^{\infty} |x|\,|a|\,g(ax + by)\,dx = \int_{-\infty}^{\infty} \left|\frac{z - by}{a}\right| g(z)\,dz \le \left|\frac{b}{a}\right| |y| + \frac{m_1}{|a|},$$

so condition (5.7.11a) holds with $\alpha = |b/a|$ and $\beta = m_1/|a|$, and $\{P^n\}$ is asymptotically stable whenever $|b| < |a|$. With $a = 1/c_2$ and $b = -c_1/c_2$, $Pf$ is the density of the random variable $c_1\xi + c_2\eta$ [cf. equation (10.1.8)], where $\xi$ and $\eta$ are independent with densities $f$ and $g$, respectively. $\Box$
Example 5.7.3. As a final example, consider a model for the cell cycle in which the amount of mitogen in a cell determines the probability of division. First, it is assumed that mitogen is produced at a rate

$$\frac{dm}{dt} = g(m), \qquad m(0) = r,$$

with solution $m(r,t)$. The rate $g$ is a $C^1$ function on $[0,\infty)$ and $g(x) > 0$ for $x > 0$. Second, it is assumed that the probability of mitosis in the interval $[t, t + \Delta t]$ is given by $\phi(m(t))\Delta t + o(\Delta t)$, where $\phi$ is a nonnegative function such that $q(x) = \phi(x)/g(x)$ is locally integrable (that is, integrable on bounded sets $[0,c]$) and satisfies

$$\lim_{x \to \infty} Q(x) = \infty, \qquad \text{where } Q(x) = \int_0^x q(y)\,dy. \tag{5.7.21}$$

At division the mitogen is divided equally between the two daughter cells. Under these assumptions the densities $f_n$ of the amount of mitogen at birth in successive generations satisfy

$$f_n(x) = \int_0^\infty K(x,r) f_{n-1}(r)\,dr,$$
where

$$K(x,r) = \begin{cases} 0 & x \in [0, \tfrac{1}{2}r) \\[4pt] 2q(2x)\exp\left[-\displaystyle\int_r^{2x} q(y)\,dy\right] & x \in [\tfrac{1}{2}r, \infty). \end{cases} \tag{5.7.22}$$

Thus the operator governing the model is

$$Pf(x) = \int_0^\infty K(x,r) f(r)\,dr. \tag{5.7.23}$$

We are going to show that $\{P^n\}$ is asymptotically stable if, in addition to (5.7.21), there are constants $p > 1$ and $x_0 \ge 0$ such that

$$Q(2x) - Q(x) \ge p \qquad \text{for } x \ge x_0. \tag{5.7.24}$$

To verify inequality (5.7.11) it is convenient to evaluate integrals of the form

$$I = \int_0^\infty u(Q(2x)) Pf(x)\,dx \tag{5.7.25}$$

for nonnegative measurable $u$. Setting $z = Q(2x) - Q(y)$, so that $dz = 2q(2x)\,dx$, we obtain

$$\int_0^\infty u(Q(2x)) Pf(x)\,dx = \int_0^\infty f(y)\,dy \int_0^\infty u(z + Q(y)) e^{-z}\,dz. \tag{5.7.26}$$

Taking $u \equiv 1$ gives $\int_0^\infty Pf(x)\,dx = \int_0^\infty f(y)\,dy$, confirming that $P$ is a Markov operator. Now choose the Liapunov function

$$V(x) = e^{\varepsilon Q(2x)}, \qquad 0 < \varepsilon < 1,$$

that is, $u(z) = e^{\varepsilon z}$. From (5.7.26),

$$\int_0^\infty V(x) Pf(x)\,dx = \int_0^\infty f(y) e^{\varepsilon Q(y)}\,dy \int_0^\infty e^{-(1-\varepsilon)z}\,dz = \frac{1}{1-\varepsilon}\int_0^\infty e^{\varepsilon Q(y)} f(y)\,dy. \tag{5.7.27}$$
By (5.7.24), for $y \ge x_0$ we have $e^{\varepsilon Q(y)} = V(y) e^{-\varepsilon[Q(2y)-Q(y)]} \le e^{-\varepsilon p} V(y)$, whereas for $y < x_0$ simply $e^{\varepsilon Q(y)} \le e^{\varepsilon Q(x_0)}$. Thus (5.7.27) gives

$$\int_0^\infty V(x) Pf(x)\,dx \le \frac{e^{-\varepsilon p}}{1-\varepsilon} \int_0^\infty V(y) f(y)\,dy + \frac{e^{\varepsilon Q(x_0)}}{1-\varepsilon}.$$

Setting

$$a(\varepsilon) = \frac{e^{-\varepsilon p}}{1-\varepsilon},$$

we have $a(0) = 1$ and $a'(0) = 1 - p < 0$. Thus for some $\varepsilon > 0$ we have $a(\varepsilon) < 1$. Take such an $\varepsilon$ and set

$$\alpha = a(\varepsilon), \qquad \beta = \frac{1}{1-\varepsilon}\, e^{\varepsilon Q(x_0)}.$$

With these values of $\alpha$ and $\beta$ we have shown that the operator $P$ defined by (5.7.22)-(5.7.23) satisfies inequality (5.7.11) of Theorem 5.7.1 under the assumption (5.7.24). It only remains to be shown that $K$ satisfies (5.7.10).
Let $r_0 \ge 0$ be an arbitrary finite real number, and consider $K(x,r)$ for $0 \le r \le r_0$ and $x \ge \tfrac{1}{2}r$. Then

$$K(x,r) = 2q(2x)\exp\left[-\int_r^{2x} q(y)\,dy\right] \ge 2q(2x)\exp\left[-\int_0^{2x} q(y)\,dy\right] \qquad \text{for } 0 \le r \le r_0,\ x \ge \tfrac{1}{2}r,$$

and, as a consequence,

$$\inf_{0 \le r \le r_0} K(x,r) \ge h(x) = \begin{cases} 0 & \text{for } x < \tfrac{1}{2}r_0 \\[2pt] 2q(2x) e^{-Q(2x)} & \text{for } x \ge \tfrac{1}{2}r_0. \end{cases}$$

Further,

$$\int_0^\infty h(x)\,dx = \int_{r_0/2}^\infty 2q(2x) e^{-Q(2x)}\,dx = e^{-Q(r_0)} > 0;$$

hence $K(x,r)$ satisfies (5.7.10). Thus, in this simple model for cell division, we know that there is a globally asymptotically stable distribution of mitogen. Generalizations of this model have appeared in the work of Tyson and Hannsgen [1986], Tyrcha [1988], and Lasota, Mackey, and Tyrcha [1992]. $\Box$
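A stochastic simulation makes the stability of the mitogen distribution tangible. The sketch below uses the illustrative (not from the text) choices $g(m) = 1$ and $\phi(m) = m$, so that $q = \phi/g = m$ and $Q(x) = x^2/2$; a cell born with mitogen $r$ then divides at a level $s$ with $Q(s) = Q(r) + E$, $E$ exponentially distributed, and each daughter starts with $s/2$:

```python
import numpy as np

rng = np.random.default_rng(1)

def next_generation(x, rng):
    # Division level s solves Q(s) = Q(x) + E with Q(x) = x**2 / 2, E ~ Exp(1);
    # each daughter cell is born with half the mitogen at division.
    E = rng.exponential(size=x.shape)
    s = np.sqrt(x**2 + 2.0 * E)
    return s / 2.0

a = np.full(20000, 0.1)   # two very different initial populations
b = np.full(20000, 5.0)
for _ in range(30):
    a = next_generation(a, rng)
    b = next_generation(b, rng)
print(a.mean(), b.mean())  # nearly equal: a globally stable mitogen distribution
```

Here $Q(2x) - Q(x) = 3x^2/2 \to \infty$, so the stability condition of the example is comfortably satisfied.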
5.8 Asymptotic Stability via Lower-Bound Functions

For Markov operators acting on functions of a real variable, lower-bound functions (and thus, via Theorem 5.6.2, asymptotic stability) can often be obtained from differential inequalities. For a function $f\colon (a,b) \to R$ we write

$$\frac{df}{dx}(x) = \liminf_{\substack{\delta \to 0 \\ \delta > 0}} \frac{1}{\delta}\,[f(x+\delta) - f(x)],$$

and we call $f$ lower left semicontinuous if $\liminf_{\delta \to 0,\,\delta > 0} f(x - \delta) \ge f(x)$ for every $x$.

Proposition 5.8.1. Let $(a,b)$ be an interval (bounded or not) and $P\colon L^1((a,b)) \to L^1((a,b))$ a Markov operator. Assume that for every $f$ in a dense subset $D_0 \subset D$ the iterates

$$f_n = P^n f \tag{5.8.1}$$

are lower left semicontinuous, satisfy

$$f_n(x) \le g(x) \qquad \text{a.e. in } (a,b) \tag{5.8.2}$$

with some $g \in L^1((a,b))$, and satisfy the inequality

$$\frac{df_n}{dx} \le k f_n \tag{5.8.3}$$

with a constant $k$ independent of $n$. Then $\{P^n\}$ is asymptotically stable.

Proof. From (5.8.3) and the lower left semicontinuity of $f_n$,

$$f_n(x) \le f_n(y) e^{k(x-y)} \qquad \text{for } x \ge y.$$

Choose points $a < x_0 < x_1 < x_2 < b$ such that

$$\int_a^{x_1} g(x)\,dx < \tfrac{1}{4} \qquad \text{and} \qquad \int_{x_2}^b g(x)\,dx < \tfrac{1}{4}. \tag{5.8.4}$$

Since

$$\int_a^b f_n(x)\,dx = 1 \tag{5.8.5}$$

and $f_n \le g$, we have

$$\int_a^{x_1} f_n(x)\,dx + \int_{x_2}^b f_n(x)\,dx \le \tfrac{1}{2},$$

so that $\int_{x_1}^{x_2} f_n \ge \tfrac{1}{2}$, and for every $n$ there is a point $y = y_n \in [x_1, x_2]$ with $f_n(y) \ge 1/[2(x_2 - x_1)]$. Applying the evaluation above with this $y$ gives

$$f_n(x) \ge f_n(y) e^{-k(y-x)} \ge M = \frac{1}{2(x_2 - x_1)} \exp[-k(x_2 - x_0)] \qquad \text{for } x \in [x_0, x_1]. \tag{5.8.6}$$

Thus $h = M 1_{[x_0, x_1]}$ is a nontrivial lower-bound function for $\{P^n f\}$, $f \in D_0$, and asymptotic stability follows from Theorem 5.6.2. $\Box$
Remark 5.8.1. In the proof of Proposition 5.8.1, the left lower semicontinuity of $f_n$ and inequality (5.8.3) were only used to obtain the evaluation

$$f_n(x) \le f_n(y) e^{k(x-y)} \qquad \text{for } x \ge y.$$

Therefore Proposition 5.8.1 remains true under this condition; for example, it is true if all $f_n$ are nonincreasing. $\Box$

It is obvious that in Proposition 5.8.1 we can replace (5.8.3) by $df_n/dx \ge -k f_n$ and assume $f_n$ right lower semicontinuous (or assume $f_n$ nondecreasing;
cf. Remark 5.8.1). In the case of a bounded interval, we may omit condition (5.8.2) and replace (5.8.3) by a two-sided inequality. This observation is summarized as follows.

Proposition 5.8.2. Let $(a,b)$ denote a bounded interval and let $P\colon L^1((a,b)) \to L^1((a,b))$ be a Markov operator. Assume that for each $f \in D_0$ the functions $f_n$ in (5.8.1) are differentiable and satisfy the inequality

$$\left|\frac{df_n}{dx}\right| \le k f_n. \tag{5.8.7}$$

Then $\{P^n\}$ is asymptotically stable.

Proof. Set $h = \varepsilon 1_{(a,b)}$ with

$$\varepsilon = \frac{1}{2(b-a)}\, e^{-k(b-a)}.$$

Now it is easy to show that $f_n \ge h$ for $n \ge n_0$. If not, then $f_n(y) < \varepsilon$ for some $y \in (a,b)$ and $n \ge n_0$. Consequently, by (5.8.7),

$$f_n(x) \le f_n(y) e^{k|x-y|} < \varepsilon e^{k(b-a)} = \frac{1}{2(b-a)}.$$

This evidently contradicts (5.8.5). The inequality $f_n \ge h$ completes the proof. $\Box$
5.9 Sweeping
Until now we have considered the situation in which the sequence $\{P^n f\}$ either converges to a unique density (asymptotic stability) or approaches a set spanned by a finite number of densities (asymptotic periodicity) for every initial density $f$. In this section we consider quite a different property, in which the densities are dispersed under the action of a Markov operator $P$. We call this new behavior sweeping, and introduce the concept through two definitions and several examples.
Our first definition is as follows.
Definition 5.9.1. Let $(X, \mathcal{A}, \mu)$ be a measure space and $\mathcal{A}_* \subset \mathcal{A}$ be a subfamily of the family of measurable sets. Also let $P\colon L^1(X) \to L^1(X)$ be a Markov operator. Then $\{P^n\}$ is said to be sweeping with respect to $\mathcal{A}_*$ if

$$\lim_{n \to \infty} \int_A P^n f(x)\,\mu(dx) = 0 \qquad \text{for every } f \in D \text{ and } A \in \mathcal{A}_*. \tag{5.9.1}$$

Although sweeping is defined for $f \in D$, for a sweeping operator $P$ condition (5.9.1) also holds for every $f \in L^1$.
This follows from the decomposition $f = f^+ - f^-$, the inequality

$$\left|\int_A P^n f(x)\,\mu(dx)\right| \le \|f^+\| \int_A P^n\!\left(\frac{f^+}{\|f^+\|}\right)\mu(dx) + \|f^-\| \int_A P^n\!\left(\frac{f^-}{\|f^-\|}\right)\mu(dx), \tag{5.9.2}$$

and the fact that both terms on the right-hand side of (5.9.2) can be made arbitrarily small.
Example 5.9.1. Let $X = R$ with the standard Borel measure, and consider the Markov operator

$$Pf(x) = f(x - r) \qquad \text{for } f \in D, \tag{5.9.3}$$

so that

$$P^n f(x) = f(x - nr) \qquad \text{for } f \in D.$$

With $r > 0$ the sequence $\{P^n\}$ is sweeping with respect to the family of intervals

$$\mathcal{A}_0 = \{(-\infty, c] : c \in R\}.$$

To prove this, note first that for every $f \in D$ with support bounded from below,

$$\int_{-\infty}^{c} P^n f(x)\,dx = \int_{-\infty}^{c} f(x - nr)\,dx = \int_{-\infty}^{c - nr} f(y)\,dy.$$

Thus the integral on the right-hand side will eventually become zero, since

$$(-\infty, c - nr] \cap \operatorname{supp} f = \emptyset$$

for sufficiently large $n$; such $f$ are dense in $D$, which suffices. In an analogous fashion we can also prove that for $r < 0$ the sequence $\{P^n\}$, where $P$ is given by (5.9.3), is sweeping with respect to the family of intervals

$$\mathcal{A}_1 = \{[c, \infty) : c \in R\}. \ \Box$$
Example 5.9.2. Again take $X = R$ and $\mu$ to be the Borel measure. Further, let $P$ be an integral operator with Gaussian kernel

$$Pf(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} \exp\left[-\frac{(x-y)^2}{2\sigma^2}\right] f(y)\,dy.$$

It is easy to show (see also Example 7.4.1 and Remark 7.9.1) that

$$P^n f(x) = \frac{1}{\sqrt{2\pi\sigma^2 n}} \int_{-\infty}^{\infty} \exp\left[-\frac{(x-y)^2}{2\sigma^2 n}\right] f(y)\,dy \tag{5.9.4}$$

and, as a consequence,

$$P^n f(x) \le \frac{1}{\sqrt{2\pi\sigma^2 n}}.$$

Thus the sequence $\{P^n\}$ defined by (5.9.4) is sweeping with respect to the family of bounded intervals, since for every interval $[a,b]$

$$\int_a^b P^n f(x)\,dx \le \frac{b-a}{\sqrt{2\pi\sigma^2 n}} \to 0 \qquad \text{as } n \to \infty. \ \Box$$
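The Gaussian example can be checked in closed form: starting from a point-like density at the origin, $P^n f$ is the $N(0, n\sigma^2)$ density, so the mass on any fixed bounded interval decays like $n^{-1/2}$. A small sketch:

```python
import math

sigma = 1.0

def mass(n, c=1.0):
    # Mass of P^n f (the N(0, n*sigma^2) density) on the fixed interval [-c, c].
    return math.erf(c / (sigma * math.sqrt(2.0 * n)))

for n in (1, 10, 100, 1000):
    print(n, mass(n))   # decreases toward 0: sweeping w.r.t. bounded intervals
```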
These two examples motivate a more restricted version of the general
Definition 5.9.1 of sweeping appropriate to the situation where X c R is
an interval.
Definition 5.9.2. Let $X \subset R$ be an interval (bounded or not) with endpoints $\alpha, \beta$ and let $P\colon L^1(X) \to L^1(X)$ be a Markov operator. We say the following:

a. $\{P^n\}$ is sweeping to $\beta$ if it is sweeping with respect to the family of intervals

$$\mathcal{A}_0 = \{(\alpha, c) : c < \beta\}; \tag{5.9.5}$$

b. $\{P^n\}$ is sweeping to $\alpha$ if it is sweeping with respect to the family of intervals

$$\mathcal{A}_1 = \{[c, \beta) : \alpha < c\}; \tag{5.9.6}$$

c. $\{P^n\}$ is central sweeping if it is sweeping with respect to the family of closed intervals

$$\mathcal{A}_2 = \{[c_1, c_2] : \alpha < c_1 < c_2 < \beta\}. \tag{5.9.7}$$
In Examples 5.9.1 and 5.9.2 the sweeping was almost self-evident from the structure of the operator $P$. However, this is often not the case, and we are going to present a sufficient condition often useful for proving sweeping. We start with a definition reminiscent of the definition of the Liapunov function.

Let $(X, \mathcal{A}, \mu)$ be a measure space and let a subfamily $\mathcal{A}_* \subset \mathcal{A}$ be given. A bounded Borel measurable function $V\colon X \to R$ is called a Bielecki function if it is nonnegative and if

$$\inf_{x \in A} V(x) > 0 \qquad \text{for } A \in \mathcal{A}_*.$$

For example, if $X = [\alpha, \beta)$ and $\mathcal{A}_* = \mathcal{A}_0$ from Definition 5.9.2a, then every continuous and strictly positive function $V\colon [\alpha, \beta) \to R$ is a Bielecki function, since the infimum of a positive continuous function on a closed bounded (compact) interval is always positive.
Proposition 5.9.1. Let $(X, \mathcal{A}, \mu)$ be a measure space, $\mathcal{A}_* \subset \mathcal{A}$ a subfamily of measurable sets, and $P\colon L^1(X) \to L^1(X)$ a Markov operator. If there are a Bielecki function $V$ and a constant $\gamma$ with $0 \le \gamma < 1$ such that

$$\int_X V(x) Pf(x)\,\mu(dx) \le \gamma \int_X V(x) f(x)\,\mu(dx) \qquad \text{for } f \in D, \tag{5.9.8}$$

then $\{P^n\}$ is sweeping with respect to $\mathcal{A}_*$.

Proof. Take $f \in D$ and $A \in \mathcal{A}_*$. Then

$$\int_A P^n f(x)\,\mu(dx) \le \frac{1}{\inf_{x \in A} V(x)} \int_A V(x) P^n f(x)\,\mu(dx) \le \frac{\gamma^n}{\inf_{x \in A} V(x)} \int_X V(x) f(x)\,\mu(dx).$$

Since $V$ is bounded, the last integral is finite, and since $\gamma < 1$ the right-hand side converges to zero as $n \to \infty$. $\Box$
Example 5.9.3. Consider once again the cell cycle operator $P$ defined by (5.7.22) and (5.7.23), and assume now that, in contrast to (5.7.24), there are constants $x_0 \ge 0$ and $p < 1$ such that

$$Q(2y) - Q(y) \le p \qquad \text{for } y \ge x_0.$$

Set $z_0 = Q(2x_0)$, define

$$u(z) = \begin{cases} e^{-\varepsilon z_0} & \text{for } z < z_0 \\ e^{-\varepsilon z} & \text{for } z \ge z_0, \end{cases}$$

and take $V(x) = u(Q(2x))$, which is a bounded, strictly positive Bielecki function for the family $\mathcal{A}_0 = \{[0,c) : c > 0\}$. By (5.7.26),

$$\int_0^\infty V(x) Pf(x)\,dx = \int_0^\infty f(y)\,dy \int_0^\infty u(x + Q(y)) e^{-x}\,dx,$$

or

$$\int_0^\infty V(x) Pf(x)\,dx \le \int_0^\infty W(y) V(y) f(y)\,dy \qquad \text{for } f \in D, \tag{5.9.9}$$

where

$$W(y) = \frac{1}{V(y)} \int_0^\infty u(x + Q(y)) e^{-x}\,dx. \tag{5.9.10}$$

We estimate $W$ in two cases. If $Q(2y) < z_0$, then $V(y) = e^{-\varepsilon z_0}$ and, since $u$ is nonincreasing and $Q(y) \ge 0$,

$$W(y) \le e^{\varepsilon z_0} \int_0^\infty u(x) e^{-x}\,dx = 1 - e^{-z_0}\left[\frac{\varepsilon}{1+\varepsilon}\right] = \alpha_1(\varepsilon).$$

If $Q(2y) \ge z_0$, then $y \ge x_0$, $V(y) = e^{-\varepsilon Q(2y)}$, and $u(z) \le e^{-\varepsilon z}$ gives

$$W(y) \le e^{\varepsilon Q(2y)} \int_0^\infty e^{-\varepsilon(x + Q(y))} e^{-x}\,dx = \frac{e^{\varepsilon[Q(2y) - Q(y)]}}{1+\varepsilon} \le \frac{e^{\varepsilon p}}{1+\varepsilon} = \alpha_2(\varepsilon).$$

It is clear that $\alpha_2(0) = 1$ and that $\alpha_2'(0) = p - 1 < 0$. Thus there must be an $\varepsilon > 0$ such that $\alpha_2(\varepsilon) < 1$. Choose such an $\varepsilon$; since $\alpha_1(\varepsilon) < 1$ for every $\varepsilon > 0$, the constant $\gamma = \max(\alpha_1(\varepsilon), \alpha_2(\varepsilon))$ satisfies $\gamma < 1$. Then $W(y) \le \gamma < 1$ for all $y \ge 0$, and from (5.9.9) we have

$$\int_0^\infty V(x) Pf(x)\,dx \le \gamma \int_0^\infty V(x) f(x)\,dx \qquad \text{for all } f \in D.$$

Thus, by Proposition 5.9.1, we have shown that the cell cycle model defined by equations (5.7.21)-(5.7.23) is characterized by a sweeping Markov operator when $Q(2y) - Q(y) \le p < 1$ for all sufficiently large $y$. $\Box$
5.10 The Foguel Alternative

We say that a Markov operator $P\colon L^1 \to L^1$ is an integral operator with stochastic kernel $K$ if

$$Pf(x) = \int_X K(x,y) f(y)\,\mu(dy) \qquad \text{for } x \in X \text{ a.e.} \tag{5.10.1}$$

A family $\mathcal{A}_* \subset \mathcal{A}$ of measurable sets is called regular if there is a sequence $\{A_n\}$, $A_n \in \mathcal{A}_*$, such that

$$\bigcup_{n=0}^{\infty} A_n = X. \tag{5.10.2}$$

A measurable function $f\colon X \to R$ is locally integrable if

$$\int_A f(x)\,\mu(dx) < \infty \qquad \text{for } A \in \mathcal{A}_*,$$

and a nonnegative measurable $f$ is subinvariant if $\int_X K(x,y) f(y)\,\mu(dy) \le f(x)$ a.e.

With these definitions we state the following result, which will be referred to as the Foguel alternative.

Theorem 5.10.1. Let $(X, \mathcal{A}, \mu)$ be a measure space and $\mathcal{A}_* \subset \mathcal{A}$ a regular family. Assume that $P\colon L^1 \to L^1$ is an integral operator with a stochastic kernel. If $P$ has a locally integrable and positive ($f > 0$ a.e.) subinvariant function $f$, then either $P$ has an invariant density or $\{P^n\}$ is sweeping.
In the statement of this theorem there are two implications: if $P$ has no invariant density, then $\{P^n\}$ is sweeping; and, conversely, if $\{P^n\}$ is sweeping, then $P$ has no invariant density. Only the first part is hard to prove; the second part can be demonstrated using condition (5.10.2).
To prove the second implication, suppose that $\{P^n\}$ is sweeping and that $f_* = Pf_*$ is an invariant density. Further define $B_k = A_0 \cup \cdots \cup A_k$, where $\{A_n\}$ is the sequence appearing in (5.10.2). Then

$$\lim_{k \to \infty} \int_{B_k} f_*(x)\,\mu(dx) = \int_X f_*(x)\,\mu(dx) = 1. \tag{5.10.3}$$

On the other hand, since $f_* = P^n f_*$ for every $n$,

$$\int_{B_k} f_*(x)\,\mu(dx) \le \sum_{i=0}^{k} \int_{A_i} f_*(x)\,\mu(dx) = \sum_{i=0}^{k} \int_{A_i} P^n f_*(x)\,\mu(dx),$$

and, by the sweeping property, the right-hand side converges to zero as $n \to \infty$. Thus $\int_{B_k} f_*\,d\mu = 0$ for every $k$, which contradicts (5.10.3).
Example 5.10.1. Take $X = (0, \infty)$ with the standard Borel measure and consider the integral operator

$$Pf(x) = \int_x^\infty \psi\!\left(\frac{x}{y}\right) f(y)\,\frac{dy}{y}, \tag{5.10.4}$$

where $\psi$ is a nonnegative measurable function satisfying

$$\int_0^1 \psi(z)\,dz = 1. \tag{5.10.5}$$

The operator (5.10.4) appears on the right-hand side of the Chandrasekhar-Münch equation describing the fluctuations in the brightness of the Milky Way. This equation will be discussed in Examples 7.9.2 and 11.10.2. Here we are going to study the properties of the operator (5.10.4) alone.
Let $V\colon R^+ \to R$ be a bounded nonnegative measurable function. For every $f \in D$ we have

$$\int_0^\infty V(x) Pf(x)\,dx = \int_0^\infty V(x)\,dx \int_x^\infty \psi\!\left(\frac{x}{y}\right) f(y)\,\frac{dy}{y},$$

and, changing the order of integration and substituting $z = x/y$,

$$\int_0^\infty V(x) Pf(x)\,dx = \int_0^\infty f(y)\,dy \int_0^1 \psi(z) V(zy)\,dz. \tag{5.10.6}$$

In particular, taking $V \equiv 1$ gives

$$\int_0^\infty Pf(x)\,dx = \int_0^\infty f(y)\,dy \int_0^1 \psi(z)\,dz = \int_0^\infty f(y)\,dy,$$

which, together with the nonnegativity of $\psi$, implies that (5.10.4) defines a Markov operator.
Now set $f_\beta(x) = x^{-\beta}$ in (5.10.4). Then

$$Pf_\beta(x) = \int_x^\infty \psi\!\left(\frac{x}{y}\right) y^{-\beta - 1}\,dy = x^{-\beta} \int_0^1 \psi(z) z^{\beta - 1}\,dz. \tag{5.10.7}$$

For $\beta = 1$ the last integral equals $\int_0^1 \psi(z)\,dz = 1$, so the positive function $f_1(x) = 1/x$ satisfies $Pf_1 = f_1$ and, in particular, is subinvariant; it is locally integrable with respect to the regular family of compact intervals $[1/n, n]$. Thus, by Theorem 5.10.1, the operator $P$ defined by (5.10.4) is either sweeping to zero or has an invariant density.
It is easy to exclude the possibility that $P$ has an invariant density. Suppose that there is an invariant density $f_*$. Then the equality (5.10.6) gives

$$\int_0^\infty V(y) f_*(y)\,dy = \int_0^\infty f_*(y)\,dy \int_0^1 \psi(z) V(zy)\,dz,$$

or

$$\int_0^\infty f_*(y)\,dy \int_0^1 \psi(z)[V(y) - V(zy)]\,dz = 0. \tag{5.10.8}$$

Now choose for $V$ a bounded, strictly increasing function [for example, $V(x) = x/(1+x)$]. Then $V(y) - V(zy) > 0$ for $y > 0$ and $0 \le z < 1$, so the integral

$$I(y) = \int_0^1 \psi(z)[V(y) - V(zy)]\,dz$$

is strictly positive for $y > 0$. Consequently, the product $f_*(y) I(y)$ is a nonnegative and nonvanishing function. This shows that the equality (5.10.8) is not satisfied, and thus there is no invariant density for $P$.

Thus, for every $\psi$ satisfying (5.10.5) the operator $P$ given by equation (5.10.4) is sweeping. This is both interesting and surprising, since we will show in Section 11.10 that the stochastic semigroup generated by the Chandrasekhar-Münch equation is asymptotically stable! $\Box$
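The sweeping here has a simple pathwise interpretation: the operator (5.10.4) is the transfer operator of multiplying a random variable by an independent factor $z \in [0,1]$ with density $\psi$, and products of such factors tend to zero. A Monte Carlo sketch (the uniform choice of $\psi$ is an assumption of this illustration):

```python
import numpy as np

# One application of P corresponds, pathwise, to X -> Z*X with Z ~ psi on [0,1]
# (here psi uniform).  Repeated multiplication drives all mass toward 0, so the
# mass on any set bounded away from zero is swept out.
rng = np.random.default_rng(2)
x = np.ones(100000)
for n in range(50):
    x *= rng.random(100000)
print((x > 1e-6).mean())   # fraction of mass above 1e-6: essentially 0
```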
The alternative formulated in Theorem 5.10.1 does not specify the behavior of the sequence $\{P^n f\}$ in the case when an invariant density exists. We now formulate a stronger form of the Foguel alternative, first introducing the notion of an expanding operator.
Definition 5.10.1. Let $(X, \mathcal{A}, \mu)$ be a measure space, $\mathcal{A}_* \subset \mathcal{A}$ a regular family, and let $P\colon L^1 \to L^1$ be a Markov operator. Then $P$ is called expanding if

$$\lim_{n \to \infty} \mu(A \setminus \operatorname{supp} P^n f) = 0 \qquad \text{for every } f \in D \text{ and } A \in \mathcal{A}_*. \tag{5.10.9}$$

As an example, let $X = [a, \infty)$ with the standard Borel measure, $\mathcal{A}_*$ the family of bounded measurable subsets of $X$, and

$$Pf(x) = \int_a^\infty K(x,y) f(y)\,dy, \tag{5.10.10}$$

where

$$K(x,y) > 0 \qquad \text{for } x > \lambda^{-1}(y) \tag{5.10.11}$$

and $\lambda\colon [a,\infty) \to [a,\infty)$ is a continuous strictly increasing function such that

$$\lambda(x) > x \qquad \text{for } a < x. \tag{5.10.12}$$

Assume, moreover, that

$$\int_{\lambda^{-1}(y)}^\infty K(x,y)\,dx = 1 \qquad \text{for } y > a. \tag{5.10.13}$$

To see that $P$ is expanding, take $f \in D$ and pick $x_0 > a$ such that $\mu(\operatorname{supp} f \cap [a, x_0]) \neq 0$. By (5.10.11), $\operatorname{supp} P^n f \supset (x_n, \infty)$, where $x_n = \lambda^{-1}(x_{n-1})$, and consequently, for every bounded $A$,

$$\mu(A \setminus \operatorname{supp} P^n f) \le x_n - a. \tag{5.10.14}$$

The sequence $\{x_n\}$ is bounded from below ($x_n \ge a$). It is also decreasing, since $x_n = \lambda^{-1}(x_{n-1}) \le x_{n-1}$. Thus $\{x_n\}$ is convergent to a number $x_* \ge a$. Since $\lambda(x_n) = x_{n-1}$, in the limit as $n \to \infty$ we have $\lambda(x_*) = x_*$. From inequality (5.10.12) it follows that $x_* = a$, which according to (5.10.14) shows that $P$ is expanding.
For expanding operators, the Foguel alternative can be formulated as
follows.
Example 5.10.2. We return to the modeling of the cell cycle (see Example 5.7.3) by considering the following model proposed by Tyson and Hannsgen [1986]. They assume that the probability of cell division depends on cell size $m$, so cell size plays the role of the mitogen considered in Example 5.7.3. It is further assumed that during the lifetime of the cell, growth proceeds exponentially, that is,

$$\frac{dm}{dt} = km.$$

When the size is smaller than a given value, which for simplicity is denoted by 1, the cell cannot divide. When the size is larger than 1, the cell must traverse two phases $A$ and $B$. The end of phase $B$ coincides with cell division. The duration of phase $B$ is constant and is denoted by $T_B$. The length $T_A$ of phase $A$ is a random variable with the exponential distribution

$$\operatorname{prob}(T_A \ge t) = e^{-pt}.$$

At cell division the two daughter cells have sizes exactly one-half that of the mother cell.
Using these assumptions it can be shown that the process of the replication of size may be described by the equation

$$f_{n+1}(x) = Pf_n(x) = \int_\sigma^{x/\sigma} K(x,r) f_n(r)\,dr, \tag{5.10.15}$$

where $f_n$ is the density function of the distribution of the initial size in the $n$th generation of cells, $\sigma = \tfrac{1}{2} e^{kT_B}$ is the smallest attainable initial size, and the kernel $K$ is given by

$$K(x,r) = \begin{cases} \dfrac{p}{k}\,\sigma^{p/k}\,x^{-1-(p/k)} & \text{for } \sigma \le r < 1 \\[8pt] \dfrac{p}{k}\,(\sigma r)^{p/k}\,x^{-1-(p/k)} & \text{for } 1 \le r \le \dfrac{x}{\sigma}, \end{cases} \tag{5.10.16}$$

with $K(x,r) = 0$ otherwise. Substituting $f_*(x) = c x^{-1-\gamma}$ into the invariance equation $Pf_* = f_*$ leads, after an integration, to the condition

$$\gamma = \frac{p}{k}\left(1 - \sigma^{\gamma}\right), \tag{5.10.17}$$

or, equivalently, $(\sigma^{-\gamma} - 1)/\gamma = k/(p - k\gamma)$. The function $(p/k)(1 - \sigma^{\gamma}) - \gamma$ vanishes at $\gamma = 0$ and has a positive derivative there precisely when

$$\frac{k}{p} < -\ln\sigma, \tag{5.10.18}$$

so a root $\gamma > 0$ exists in this case. Thus, for $k$, $p$, and $\sigma$ satisfying (5.10.18), there exists $\gamma > 0$ for which the function $f_*(x) = c x^{-1-\gamma}$ is invariant with respect to $P$. It can be normalized on the interval $[\sigma, \infty)$, namely, for $c = \gamma\sigma^{\gamma}$,

$$\int_\sigma^\infty f_*(x)\,dx = 1.$$
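For the Tyson-Hannsgen model one can look numerically for the exponent of a power-law invariant density $f_*(x) = c\,x^{-1-\gamma}$. This sketch assumes the balance condition $\gamma = (p/k)(1 - \sigma^{\gamma})$, which has a positive root exactly when $k/p < -\ln\sigma$; the parameter values are illustrative assumptions:

```python
import math

def find_gamma(k, p, sigma, lo=1e-9, hi=None):
    # Bisection for the positive root of (p/k)*(1 - sigma**g) - g on (0, p/k).
    if hi is None:
        hi = p / k
    g = lambda x: (p / k) * (1.0 - sigma**x) - x
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

k, p, sigma = 1.0, 1.0, 0.2          # k/p = 1 < -ln(0.2), so a root exists
gamma = find_gamma(k, p, sigma)
print(gamma, (p / k) * (1 - sigma**gamma))  # the two numbers agree
```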
Exercises
5.1. Let $(X, \mathcal{A}, \mu)$ be a finite measure space and let $1 \le p_1 \le p_2 \le \infty$. Show that every set $\mathcal{F} \subset L^{p_2}$ strongly precompact in $L^{p_2}$ is also strongly precompact in $L^{p_1}$. Is the same true for weak precompactness?

5.2. Let $X = R^+$ with the standard Borel measure. Consider the four families of functions:

(1) $f_a(x) = a e^{-ax}$, for $a \ge 1$;
(2) $f_a(x) = a e^{-ax}$, for $0 \le a \le 1$;
(3) $f_a(x) = e^{-x}\sin ax$, for $a \ge 1$;
(4) $f_a(x) = e^{-x}\sin ax$, for $0 \le a \le 1$.

In each case determine whether the family is weakly or strongly precompact.

5.3. Let $g \in L^1$ be fixed and set $\mathcal{F} = \{f \in L^1 : |f| \le g\}$. Show that $\mathcal{F}$ is weakly precompact.

5.4. Let $P$ be a Markov operator given by

$$Pf(x) = \int K(x,y) f(y)\,dy,$$

where $K$ is a stochastic kernel, and let $V\colon X \to R$ be a continuous function.
5.5. Let $X = R^+$ with the standard Borel measure and let

$$Pf(x) = \int_0^x K(x,y) f(y)\,dy.$$

Study the sweeping of $\{P^n\}$ with respect to the family

$$\mathcal{A}_* = \{A \in \mathcal{B}(R) : m(A) < \infty\},$$

and the behavior of $\mu(\operatorname{supp} P^{n_0} f)$.
6
The Behavior of Transformations
on Intervals and Manifolds
This chapter is devoted to a series of examples of transformations on intervals and manifolds whose asymptotic behavior can be explored through the use of the material developed in Chapter 5. Although results are often stated in terms of the asymptotic stability of $\{P^n\}$, where $P$ is a Frobenius-Perron operator corresponding to a transformation $S$, remember that, according to Proposition 5.6.2, $S$ is exact when $\{P^n\}$ is asymptotically stable and $S$ is measure preserving.

In applying the results of Chapter 5, in several examples we will have occasion to calculate the variation of a function. Thus the first section presents an exposition of the properties of functions of bounded variation.
6.1 Functions of Bounded Variation

One of the most common ways of characterizing a function $f$ on an interval $[a,b]$ is by its mean value

$$m(f) = \frac{1}{b-a}\int_a^b f(x)\,dx,$$

and its variance, $D^2(f) = m((f - m(f))^2)$. However, these are not always satisfactory. Consider, for example, the sequence of functions $\{f_n\}$ with $f_n(x) = \sin 2n\pi x$, $n = 1, 2, \ldots$. They have the same mean value on $[0,1]$, namely, $m(f_n) = 0$, and the same variance $D^2(f_n) = \tfrac{1}{2}$; but they behave quite differently for $n \gg 1$ than they do for $n = 1$. To describe these
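A quick numerical check of this motivating example: the mean and variance of $\sin 2n\pi x$ on $[0,1]$ do not depend on $n$, but the variation (defined below as the supremum of partition sums) grows like $4n$, since each of the $2n$ monotone pieces contributes 2:

```python
import math

def variation(f, a, b, points=100000):
    # Partition sum over a fine uniform grid; approximates the variation (6.1.3).
    xs = [a + (b - a) * i / points for i in range(points + 1)]
    return sum(abs(f(xs[i]) - f(xs[i - 1])) for i in range(1, points + 1))

for n in (1, 2, 5):
    v = variation(lambda x, n=n: math.sin(2 * math.pi * n * x), 0.0, 1.0)
    print(n, v)   # approximately 4*n
```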
differences we introduce the notion of the variation of a function. Take a partition of $[a,b]$,

$$a = x_0 < x_1 < \cdots < x_n = b, \tag{6.1.1}$$

and write

$$s_n(f) = \sum_{i=1}^{n} |f(x_i) - f(x_{i-1})|. \tag{6.1.2}$$

If all possible sums $s_n(f)$, corresponding to all subdivisions of $[a,b]$, are bounded by a number that does not depend on the subdivision, $f$ is said to be of bounded variation on $[a,b]$. Further, the smallest number $c$ such that $s_n \le c$ for all $s_n$ is called the variation of $f$ on $[a,b]$ and is denoted by $\bigvee_a^b f$. Notationally this is written as

$$\bigvee_a^b f = \sup s_n(f), \tag{6.1.3}$$

where the supremum is taken over all possible partitions of the form (6.1.1).
Consider a simple example. Assume that $f$ is a monotonic function, either decreasing or increasing. Then

$$|f(x_i) - f(x_{i-1})| = s[f(x_i) - f(x_{i-1})], \qquad \text{where } s = \begin{cases} 1 & \text{for } f \text{ increasing} \\ -1 & \text{for } f \text{ decreasing,} \end{cases}$$

and, consequently,

$$s_n(f) = s\sum_{i=1}^{n}[f(x_i) - f(x_{i-1})] = s[f(b) - f(a)] = |f(b) - f(a)|,$$

so that for monotonic $f$ we have $\bigvee_a^b f = |f(b) - f(a)|$.

Next note that $s_n(f+g) \le s_n(f) + s_n(g)$ and, consequently,

$$\bigvee_a^b (f+g) \le \bigvee_a^b f + \bigvee_a^b g.$$

If $f_1, \ldots, f_n$ are of bounded variation on $[a,b]$, then by an induction argument

$$\text{(V1)} \qquad \bigvee_a^b (f_1 + \cdots + f_n) \le \sum_{i=1}^{n} \bigvee_a^b f_i \tag{6.1.4}$$

follows immediately.
Variation on the Union of Intervals
Assume that $a < b < c$ and that the function $f$ is of bounded variation on $[a,b]$ as well as on $[b,c]$. Consider partitions of the intervals $[a,b]$ and $[b,c]$,

$$a = x_0 < \cdots < x_n = b, \qquad b = y_0 < \cdots < y_m = c, \tag{6.1.5}$$

and the corresponding sums

$$s_n^{[a,b]}(f) = \sum_{i=1}^{n} |f(x_i) - f(x_{i-1})|, \qquad s_m^{[b,c]}(f) = \sum_{i=1}^{m} |f(y_i) - f(y_{i-1})|.$$

Then

$$s_n^{[a,b]}(f) + s_m^{[b,c]}(f) = s_{n+m}^{[a,c]}(f), \tag{6.1.6}$$

where the right-hand side of equation (6.1.6) denotes the sum corresponding to the variation of $f$ over $[a,c]$. Observe that (6.1.6) holds only for partitions of $[a,c]$ that contain the point $b$. However, any additional point can only increase the sum and, since we are interested in the supremum, this is irrelevant. From equation (6.1.6) it follows that

$$\bigvee_a^b f + \bigvee_b^c f = \bigvee_a^c f.$$
More generally, if $a_0 < a_1 < \cdots < a_n$ and $f$ is of bounded variation on each $[a_{i-1}, a_i]$, $i = 1, \ldots, n$, then

$$\text{(V2)} \qquad \bigvee_{a_0}^{a_1} f + \cdots + \bigvee_{a_{n-1}}^{a_n} f = \bigvee_{a_0}^{a_n} f. \tag{6.1.7}$$

Next let $g$ be a monotonic function and consider the composition $f \circ g$. For a partition $u_0 < u_1 < \cdots < u_n$ of the domain of $g$, write

$$s_n(f \circ g) = \sum_{i=1}^{n} |f(g(u_i)) - f(g(u_{i-1}))|. \tag{6.1.8}$$

Observe that, due to the monotonicity of $g$, the points $g(u_i)$ define a partition of $[a,b]$. Thus $s_n(f \circ g)$ is a particular sum for the variation of $f$ and, therefore,

$$s_n(f \circ g) \le \bigvee_a^b f,$$

so that

$$\text{(V3)} \qquad \bigvee (f \circ g) \le \bigvee_a^b f. \tag{6.1.9}$$
Now assume that $f$ is of bounded variation and $g$ is continuously differentiable on $[a,b]$. By the mean value theorem, $g(x_i) - g(x_{i-1}) = g'(\bar{x}_i)(x_i - x_{i-1})$ with $\bar{x}_i \in (x_{i-1}, x_i)$, so that

$$s_n(fg) \le \sup_{[a,b]}|g|\; s_n(f) + \sum_{i=1}^{n} |f(x_{i-1}) g'(\bar{x}_i)|(x_i - x_{i-1}).$$

Observe that the last term is simply an approximating sum for the Riemann integral of the product $|f(x) g'(x)|$. Thus the function $f(x)g(x)$ is of bounded variation and

$$\text{(V4)} \qquad \bigvee_a^b (fg) \le \sup_{[a,b]}|g| \bigvee_a^b f + \int_a^b |f(x) g'(x)|\,dx. \tag{6.1.10}$$

Taking in particular $f = 1$,

$$\text{(V4')} \qquad \bigvee_a^b g \le \int_a^b |g'(x)|\,dx. \tag{6.1.11}$$

However, in this case, the left- and right-hand sides are strictly equal, since $s_n(g)$ is a Riemann sum for the integral of $|g'|$.
Yorke Inequality
Now let $f$ be defined on $[0,1]$ and be of bounded variation on $[a,b] \subset [0,1]$. We want to evaluate the variation of the product of $f$ and the characteristic function $1_{[a,b]}$. Without any loss of generality, assume that the partitions of the interval $[0,1]$ always contain the points $a$ and $b$. Then

$$s_n^{[0,1]}(f 1_{[a,b]}) \le s_n^{[a,b]}(f) + |f(a)| + |f(b)|.$$

Let $c$ be an arbitrary point in $[a,b]$. Since $|f(a)| + |f(b)| \le 2|f(c)| + \bigvee_a^c f + \bigvee_c^b f = 2|f(c)| + \bigvee_a^b f$, the preceding inequality gives

$$s_n^{[0,1]}(f 1_{[a,b]}) \le 2\bigvee_a^b f + 2|f(c)|.$$

Since $c$ may be chosen so that

$$|f(c)| \le \frac{1}{b-a}\int_a^b |f(x)|\,dx,$$

we finally obtain

$$\text{(V5)} \qquad \bigvee_0^1 f 1_{[a,b]} \le 2\bigvee_a^b f + \frac{2}{b-a}\int_a^b |f(x)|\,dx. \tag{6.1.12}$$

6.2 Piecewise Monotonic Mappings
In 1957 Rényi considered transformations of the form

$$S(x) = rx \pmod 1, \tag{6.2.2}$$

where $r > 1$ is a real constant. (The $r$-adic transformation considered earlier is clearly a special case of the Rényi transformation.) Using a number-theoretic argument, Rényi was able to prove the existence of a unique invariant measure for such transformations. Rochlin was able to prove that the Rényi transformations on a measure space with the Rényi measure were, in fact, exact.

In this section we unify and generalize the results of Rényi and Rochlin through the use of Theorem 5.6.2.
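Before turning to the general theorems, a numerical sketch shows the phenomenon: discretizing the Frobenius-Perron operator of $S(x) = rx \pmod 1$ on a grid and iterating it drives any starting density to a (generally nonuniform) invariant one. The golden-ratio slope and midpoint grid are assumptions of this illustration:

```python
import numpy as np

# Discretized Frobenius-Perron operator for S(x) = r*x (mod 1):
# Pf(x) = (1/r) * sum of f((x + i)/r) over branches with (x + i)/r in [0, 1).
r = (1 + np.sqrt(5)) / 2           # a non-integer slope
N = 4000
x = (np.arange(N) + 0.5) / N       # grid midpoints
f = np.ones(N)                     # start from the uniform density

def P(f):
    out = np.zeros(N)
    for i in range(int(np.ceil(r))):
        y = (x + i) / r
        mask = y < 1.0
        idx = (y[mask] * N).astype(int)
        out[mask] += f[idx] / r
    return out

for _ in range(60):
    f = P(f)
f /= f.mean()                      # renormalize (grid quadrature of the integral)
print(abs(P(f) / P(f).mean() - f).max())  # small: f is (nearly) invariant
```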
Consider a mapping $S\colon [0,1] \to [0,1]$ that satisfies the following four properties:

(2i) There is a partition $0 = a_0 < a_1 < \cdots < a_r = 1$ of $[0,1]$ such that for each integer $i = 1, \ldots, r$ the restriction of $S$ to the interval $[a_{i-1}, a_i)$ is a $C^2$ function;

(2ii) $S(a_{i-1}) = 0$ for $i = 1, \ldots, r$;

(2iii) There is a $\lambda > 1$ such that $S'(x) \ge \lambda$ for $0 \le x < 1$ [at the points $a_i$ the derivatives denote the right derivatives]; and

(2iv) There is a real constant $c$ such that

$$\frac{|S''(x)|}{[S'(x)]^2} \le c, \qquad 0 \le x < 1. \tag{6.2.3}$$

Theorem 6.2.1. If $S\colon [0,1] \to [0,1]$ satisfies properties (2i)-(2iv) and $P$ is the Frobenius-Perron operator associated with $S$, then $\{P^n\}$ is asymptotically stable.
Proof. We first derive an explicit expression for the Frobenius-Perron operator. Note that, for any $x \in [0,1)$,

$$S^{-1}([0,x]) = \bigcup_{i=1}^{r} [a_{i-1}, g_i(x)],$$

where

$$g_i(x) = \begin{cases} S_{(i)}^{-1}(x) & 0 \le x < b_i \\ a_i & b_i \le x < 1, \end{cases}$$

$S_{(i)}$ denotes the restriction of $S$ to $[a_{i-1}, a_i)$, and $b_i = \lim_{x \to a_i^-} S(x)$. Hence

$$Pf(x) = \frac{d}{dx}\int_{S^{-1}([0,x])} f(u)\,du = \frac{d}{dx}\sum_{i=1}^{r} \int_{a_{i-1}}^{g_i(x)} f(u)\,du,$$

or

$$Pf(x) = \sum_{i=1}^{r} g_i'(x) f(g_i(x)). \tag{6.2.4}$$

If $b_i < 1$, then $g_i'(b_i)$ denotes the right derivative. Thus, $g_i'(x) = 0$ for $b_i \le x < 1$, and all the $g_i'$ are lower left semicontinuous.

Now let $D_0$ denote the subset of $D([0,1])$ consisting of all functions $f$ that, on the interval $[0,1)$, are bounded, lower left semicontinuous, and satisfy the inequality

$$\frac{df}{dx} \le k_f f(x) \qquad \text{for } 0 \le x < 1, \tag{6.2.5}$$

where $k_f$ is a constant depending on $f$; the set $D_0$ is dense in $D([0,1])$.
Pf as
i=l
and
so, as a consequence,
(PI)~ :5 c tgHf
i=l
g,)
+X tgHf~
g,).
i=l
Choose a real k
147
kfn
{6.2.6)
for n sufficiently large, say n ~ no{!), and thus condition {5.8.3) of Proposition 5.8.1 is satisfied.
We now show that the fn are bounded and hence satisfy condition {5.8.2)
of Proposition 5.8.1. First note that from equation (6.2.4) we may write
r
fn+l(x)
= L,u;(x)fn(Ui(x)).
i=l
L fn(ai_l).
{6.2.7)
i=2
fn(ai) $ fn(x)elc,
so that
1 ~loa' fn(x) dx
~ e-lc fn(ai)ai,
for i = 1, ... , r.
for n
no{/),
L = :~:::::>lc /ai-l
i=2
no{!),
so
for 0 $ x
< 1, n
~ n1.
{6.2.8)
148
Theorem 6.2.1 is valid only for mappings that are monotonically increasing on each subinterval [ai_ 1,ai) of the partition of (0, 1]. However, by
modification of some of the foregoing properties {2i)-{2iv), we may also
prove another theorem valid for transformations that are either monotonically increasing or decreasing on the subintervals of the partition. The
disadvantage is that the mapping must be onto for every (ai-l, ai)
We now consider a mapping S: [0, 1] -+ (0, 1] that satisfies a condition
slightly different from property {2i):
{2i)' There is a partition 0 = a0 < a 1 < < ar = 1 of (0, 1] such that for
each integer i = 1, ... , r the restriction of S to the interval (ai-l! ai)
is a C 2 function; as well as
(2ii)' S((ai-l,ai))
IS"(x)I/[S'(x)] 2 5 c,
for x =f ai,i
= 0, ... ,r.
(6.2.9)
Theorem 6.2.2. If S: (0, 1] -+ (0, 1] satisfies the preceding conditions (2i)'(2iv)' and Pis the Frobenius-Pemm operator associated with S, then {Pn}
is asymptotically stable.
Proof. The proof proceeds much as for Theorem 6.2.1. Using the same
149
8- 1 ((0,x))
= U<a;-l!Y;(x)) + U<YA:(x),aA:),
A:
where Yi = 8~)1 , 8(i) is as before, and the first union is over all intervals in
which 8(i) is an increasing function of x whereas the second is over intervals
in which 8(i) is decreasing. Thus
= lgHx)l,
r
Pf(x)
=L
ui(x)f(gi(x)).
(6.2.10)
i=l
(Pf)'
f. For every f
i=l
i=l
Do, differentiating
ui ~ sup(1/l8'1) ~ 1/>.
and
I(Pf)'l
[c+ ~] Pf.
(6.2.12)
150
Xj+J
FIGURE 6.2.3. Successive maxima in the variable z(t) from the Lorenz equations
are labeled :z:,, and one maximum is plotted against the previous (z,+l vs. :z:,)
after rescaling so that all :z;, E (0, 1).
Example 6.2.1. When u = 10, b = 8/3, a.nd r = 28, then a.ll three
variables x, y, a.nd z in the Lorenz [1963] equations,
dx
-dt =yz-bx,
-dy
= -xz+rz-y'
dt
dz
dt
= u(y- z),
S(x)
(~==~X
for x E [0, !]
(6.2.13)
(2- a)(1- x)
1-a(1-x)
for x E
(!, 1],
where a= 1-e, shown in Figure 6.2.4 fore= 0.01. Clearly, S(O) = 8(1) =
0, S{!) = 1, a.nd, since S'(x) = (2- a)/(1- ax) 2 , we will always have
IS'(x)l > 1 for x E [0, !) if e > 0. Fina.lly, since S"(x) = 2a(2-a)/(1-ax?,
IS"(x)l is always bounded above. For x E (!, 1] the calculations a.re similar.
Thus the transformation (6.2.13) satisfies a.ll the requirements of Theorem
6.2.2: {pn} is a.symptotica.lly stable a.nd S is exact. 0
151
FIGURE 6.2.4. The transformation S(z) given by equation (6.2.13) with= 0.01
as an approximation to the data of Figure 6.2.3.
Remark 6.2.1. The condition that IS'(z)l > 1 in Theorem 6.2.2 is essential
for S to be exact. We could easily demonstrate this by using (6.2.13) with
e = 0, thus making IS'(O)I = 18'(1)1 = 1. However, even if IS'(z)l = 1 for
only one point z E [0, 1), it is sufficient to destroy the exactness, as can be
demonstrated by the transformation
S(z)
= { z/(1- z)
2x -1
for z E [~, ~]
for x E ( 2 , 1] ,
(6.2.14)
(6.2.15)
Set qn(x) = xln(x), where In = pn lo, and pick the initial density to be
lo = 1. Thus q0 (x) = x, and from (6.2.15) we have the recursive formula,
1 ( +X
qn+l(x) = 1 + x qn
(1
X
%)
+ 1 +X
qn 2 + 2 '
(6.2.16)
152
we have
which shows that
exists. Write Zo
Qn+t(z~c) =
1:
Zlc Qn(Zic+t)
:lcZic Qn
(~ + ~).
Take k to be fixed and assume that limn,_, 00 Qn(x) = Co for Zlc ::::; x ::::; 1
{which is certainly true fork= 0). Since Zlc ::::; + !z~c, taking the limit as
n -+ oo, we have
Zlc
Co= - -1 - 1"1m Qn (Zlc+t ) + - --eo,
1 + Zlc n-+oo
1 + Zlc
so limn..... oo Qn(Zic+l) =Co Since the functions Qn(x) are increasing, we know
that limn-+oo Qn(x) = Co for all x E [z1c+1, 1]. By induction it follows that
limn.....oo Qn(x) = Co in any interval [z~c, 1] and, since lim~c.... 00 Zlc = 0, we
have limn,_, 00 Qn(x) =Co for all x E {0, 1]. Thus
lim fn(x) = eo/x.
n-+oo
Actually, the limit Co is zero; to show this, assume Co
must exist some e > 0 such that
1
1
lim
n-+oo
0. Then there
1
1
fn(x) dx
(eo/x) dx > 1.
1
1
Thus, since
1
1
1
1
u-- h)+dx +
u+- h)+dx::::; 6.
1
= 21
1
1
1
+1
1
P"' j+ dx +
1
P"'1dx+6
1
1
P"'hdx
:::;2h
P"' f- dx
P"'(J+- h)dx +
P"'(r- h)dx
153
154
s- 1 ([0,x]) =
where
gi(x)
= { s~) (x)
a,
Ulai-l,g,(x)],
i=l
for X E S([a,_l,ai))
for x E [0, 1] \ S([a,-1, a,))
and, as before, S(i) denotes the restriction of S to the interval [ai-l, a,).
Thus, as in Section 6.2, we obtain
r
Pf(x)
= L:gHx)f(g,(x)).
(6.3.1)
i=l
Even though equations (6.2.4) and (6.3.1) appear to be identical, the functions g, have different properties. For instance, by using the inverse function
155
theorem, we have
g~
= 1/8' > 0
and g~'
g~ is
1~
l(x) :5 1/x,
Hence, for i
E (0, 1].
2, we must have
gHx)l(gi(x)) :5 gHO)I(g,(O))
:5 gHO)
9i(O)
= gHO),
ai-l
= 2, ... ,r.
= 0. However, we do
Combining these two results with equation (6.3.1) for P, we can write
r
Set
8'(0)
and
= 1/g~(O) = ~ > 1
Li;(O)/ai-1
=M
i=2
so
Pl(x) :5
{1/~)1(0)
+ M.
pn l(x) :5
{1/~n)I{O)
+ ~M/(~- 1)
156
Thus, for decreasing f e D((O, 1]), since /(0) < oo the sequence {P" /}is
bounded above by a constant. From Corollary 5.2.1 we therefore know that
there is a density, /. e D such that Pf. =/.,and by Theorem 4.1.1 the
measure J.I.J. is invariant.
Example 6.3.1. In the experimental study of ftuid ftow it is commonly observed that for Reynolds numbers R less than a certain value, RL, strictly
laminar ftow occurs; for Reynolds numbers greater than another value,
RT, continuously turbulent ftow occurs. For Reynolds numbers satisfying
RL < R < RT, a transitional type behavior (intermittency) is found.
Intermittency is characterized by alternating periods of laminar and turbulent ftow, each of a variable and apparently unpredictable length.
Intermittency is also observed in mathematical models of ftuid ftow, for
example, the Lorenz equations (Manneville and Pomeau, 1979]. Manneville
[1980] argues that, in the parameter ranges where intermittency occurs
in the Lorenz equations, the model behavior can be approximated by the
transformationS: (0, 1]- [0, 1] given by
S(x) = (1 + e)x + (1- e)x2
(mod 1)
(6.3.2)
[j,
157
union has measure x. However, 8 is obviously not exact and, indeed, is not
even ergodic since 8- 1 {[0, !D = [o, !l and 8- 1 {[!, 1]) = l!. 1]. 8 that is
restricted to either (0, !J or [!, 1] behaves like the dyadic transformation.
The loss of asymptotic stability by {pn} may, under certain circumstances, be replaced by the asymptotic periodicity of {pn}. To see this,
consider a mapping 8: [0, 1] -+ [0, 1] satisfying the following three conditions:
{4i) There is a partition 0 = ao < a1 < < ar = 1 of [0, 1] such that
for each integer i = 1, ... , r the restriction of 8 to (ai-l ~) is a C 2
function;
(4ii) l8'(x)l;::: -X> 1,
x =I= ai, i
= 1, ... , r;
(6.4.1)
l8"(x)l
x 'I ai, i
= 0, ... , r.
(6.4.2)
Theorem 6.4.1. Let 8: [0, 1] -+ [0, 1] satisfy conditions {4i)-(4iii) and let
P be the Frobenius-Penvn operator associated with 8. Then, for all fED,
{ pn I} is asymptotically periodic.
158
----- I
----~'--,--------t--
:/
:
I
:
I
0~~--~--~~~~~-.-~~~x
8- 1((0,x))
= UAi(x)
i=l
~(x)
and, as before, 9i
Therefore,
~~I
f(u)du
""" Js-l([o,z))
r d I
L
dx J~
i=l
f(u)du,
(6.4.3)
a.(z)
where
dx J~
a.(z)
f(u)du
x E Ii,g~ > 0
-gHx)f(gi(x)), x E hg~ < 0
0
xh
{ gHx)f(gi(x)),
(6.4.4)
The right-hand side of equation (6.4.3) is not defined on the set of end
points of the intervals Ii, 8(ai_ 1 ), and 8(~). However, this set is finite and
159
where ui(x) = lgHx)l and 11, (x) is the characteristic function of the interval
h Thus (6.4.3) may be written as
r
Pf(x) = Lui(x)f(gi(x))1I,(x).
(6.4.5)
i=1
Equation (6.4.5) for the Frobenius-Perron operator is made more complicated than those in Sections 6.2 and 6.3 by the presence of the characteristic
functions 11,(x). The effect of these is such that even when a completely
smooth initial function f E L 1 is chosen, P f and all subsequent iterates of
f may be discontinuous. As a consequence we do not have simple criteria,
such as decreasing functions, to examine the behavior of pn f. Thus we
must examine the variation of pn f.
We start by examining the variation of P f as given by equation (6.4.5).
Let a function f E D be of bounded variation on [0, 1]. From property (V1)
of Section 6.1, the Yorke inequality (V5), and equation (6.4.5),
1
V Pf(x) :5 L:Vfui(x)f(gi(x))1I,(x)]
0
i=1 0
:52 L:Vfui(x)/(gi(x))]
(6.4.6)
Further, by property (V4),
Vfui(x)f(gi(x))] :5
~
~ V /(gi(x)) + c
~
O"i
:5 1/A and
ui(x)f(gi(x)) dx,
lu~l
:5
160
i=1
Ill] i. ui(x)f(gi(x))dx.
+2t [c+
{6.4.7)
Define a new variable y = 9i(x) for the integral in {6.4.7) and use property
{V3) for the first term to give
Set L = maJCi 2{c + 1/llil) and use property (V2) to rewrite this last inequality as
1
2
11
2 1
V
Pj(x) ~ XV f + L
f(y) dy =XV f + L
1
(6.4.8)
VPnJ~(~)\jJ+L~(~Y
0
Thus, if ..\
> 2, then
pnj
(6.4.9)
J=O
~ (~)ny J+ /:2
J E D of bounded variation,
1
lim sup
n-+oo
Vpnj < K,
(6.4.10)
:F = {g E D:
y~ K} .
g
g(x) - g(y) ~
Vg
0
161
for all x, y E [0, 1]. Since g E D, there is some y E [0, 1] such that g(y) :5 1
and, thus,
g(x) :5 K + 1.
Hence, by criterion 1 of Section 5.1, :F is weakly precompact. (Actually, it
is strongly precompact, but we will not use this fact.) Since :F is weakly
precompact, then Pis constrictive by Proposition 5.3.1. Finally, by Theorem 5.3.1, { pn!} is asymptotically periodic and the theorem is proved
when A> 2.
To see that the theorem is also true for A > 1, consider another transformation S: [0, 1] -+ [0, 1] defined by
-
= Sq(x).
(6.4.11)
Let q be the smallest integer such that Aq > 2 and set X= Aq. It is easy to
see that S satisfies conditions (4i)-(4ii). By the chain rule,
=X> 2.
lim supVPn/
n-+oo
<k
VpmI :5 k,
> mo.
:5 k
o~ 3 ~q-1
;=o
(mo + 1)q.
Remark 6.4.1. From the results of Kosjakin and Sandler [1972] or Li and
Yorke [1978a], it follows that transformations S satisfying the assumptions
of Theorem 6.4.1 are ergodic if r = 2. 0
Example 6.4.1. In this example we consider one of the simplest heuristic
models for the effects of periodic modulation of an autonomous oscillator
[Glass and Mackey, 1979].
FIGURE 6.4.3. The periodic threshold θ(t) is shown as a solid curved line, and
the activity x(t) as dashed lines. (See Example 6.4.1 for further details.)
Consider a system (see Figure 6.4.3) whose activity x(t) increases linearly
from a starting time t_i until it reaches a periodic threshold θ(t) at a time t̄_i:

    x(t̄_i) = θ(t̄_i).   (6.4.12)

We take

    x(t) = λ(t - t_i)   and   θ(t) = 1 + φ(t),   (6.4.13)

where φ is periodic with period 1, so that the threshold condition becomes

    λ(t̄_i - t_i) = 1 + φ(t̄_i).   (6.4.14)

Equation (6.4.14) has exactly one smallest solution t̄_i ≥ t_i for every t_i ∈ R.
We wish to examine the behavior of the starting times t_i. Set

    F(t_i) = t̄_i(t_i) + γ^{-1} x(t̄_i(t_i))

so that the transformation

    S(t) = F(t)   (mod 1)   (6.4.15)

maps each starting time into the next. A typical choice of modulation is
φ(t) = K sin 2πt.
The analysis is simpler for the piecewise linear periodic threshold

    θ(t) = { 4Kt + 1 - K         for t ∈ (0, 1/2]
           { 4K(1 - t) + 1 - K   for t ∈ (1/2, 1).

The calculation of F(t) depends on the sign of λ - 4K. For example, if
λ > 4K, a simple computation shows that F(t) is piecewise linear in t, with
slopes determined by λ and K [equation (6.4.16)].
where

    F = u²/gR

is the Froude number, the ratio of the kinetic and potential energies.
The Froude number F contains all of the important parameters characterizing this process. It is moderately straightforward to show that with
S̄ = S ∘ S, |S̄'(x)| > 1 if F > 2. However, the transformation (6.4.17) is
not generally onto, so that by Theorem 6.4.1 the most that we can say is
that for F > 2, if P is the Frobenius-Perron operator corresponding to S̄,
then {P^n f} is asymptotically periodic.
Example 6.4.3. Kitano, Yabuzaki, and Ogawa [1983] experimentally examined the dynamics of a simple, nonlinear, acoustic feedback system with
a time delay. A voltage x, the output of an operational amplifier with response time γ^{-1}, is fed to a speaker. The resulting acoustic signal is picked
up by a microphone after a delay τ (due to the finite propagation velocity
of sound waves), passed through a full-wave rectifier, and then fed back to
the input of the operational amplifier.
Kitano and co-workers have shown that the dynamics of this system are
described by the delay-differential equation

    γ^{-1} dx(t)/dt = -x(t) + μF(x(t - τ)),   (6.4.18)

where

    F(x) = -|x + 1/2| + 1/2   (6.4.19)

is the output of the full-wave rectifier with an input x, and μ is the circuit
loop gain.
In a series of experiments, Kitano et al. found that increasing the loop
gain μ above 1 resulted in very complicated dynamics in x, whose exact
nature depends on the value of γτ. To understand these behaviors they
considered the one-dimensional difference equation

    x_{n+1} = μF(x_n),   (6.4.20)

and introduced a linear change of variables taking the invariant interval of
(6.4.20) onto [0,1],
so that (6.4.20) is equivalent to the transformation S: [0,1] → [0,1], defined
by

    S(x') = { μx'       for x' ∈ [0, 1/μ]
            { 2 - μx'   for x' ∈ (1/μ, 1].   (6.4.21)

For 1 < μ ≤ 2, the transformation S defined by (6.4.21) satisfies all the
conditions of Theorem 6.4.1, and S is thus asymptotically periodic. If μ = 2,
then, by Theorem 6.2.2, S is statistically stable. Furthermore, from Remark
6.4.1 it follows that S is ergodic for 1 < μ < 2 and will, therefore, exhibit
disordered dynamical behavior. This is in agreement with the experimental
results. □
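As a quick numerical illustration (ours, not part of the original text; the function names and parameter values are arbitrary choices), one can iterate the map (6.4.21) and histogram the orbit to see where the asymptotic density concentrates; for 1 < μ < 2 the orbit settles into the invariant interval [2 - μ, 1]:

```python
import random

def S(x, mu):
    # the rectifier map of equation (6.4.21)
    return mu * x if x <= 1.0 / mu else 2.0 - mu * x

def orbit_histogram(mu, n_iter=100_000, n_bins=20, seed=0):
    """Histogram the orbit of S on [0,1] after discarding transients."""
    x = random.Random(seed).random()
    counts = [0] * n_bins
    for k in range(n_iter):
        x = S(x, mu)
        if k >= 1000:  # discard transients
            counts[min(int(x * n_bins), n_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]

hist = orbit_histogram(mu=1.9)
# for mu = 1.9 the orbit is confined to [2 - mu, 1] = [0.1, 1],
# so the bins below 0.1 remain empty
```

Plotting `hist` against the bin centers gives a rough picture of the stationary density; increasing `n_iter` sharpens it.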
Remark 6.4.2. As we have observed in the example of Figure 6.4.1, piecewise monotonic transformations satisfying properties (4i)-(4iii) may not
have a unique invariant measure. If the transformation is ergodic, and the
invariant measure is thus unique by Theorem 4.2.2, then the invariant measure has many interesting properties. For example, in this case Kowalski
[1976] has shown that the invariant measure is continuously dependent on
the transformation. □
6.5
Change of Variables

Theorem 6.5.1. Let S: (0,1) → (0,1) be a transformation, let φ be a positive
density on (0,1), and set

    p(x) = |S'(x)| φ(S(x)) / φ(x).   (6.5.1)

Assume that there are constants λ and c such that

    p(x) ≥ λ > 1   and   (1/φ(x)) |d/dx (1/p(x))| ≤ c < ∞,   for 0 < x < 1.   (6.5.2)

Then {P^n}, where P is the Frobenius-Perron operator corresponding to S, is
asymptotically stable.

Proof. Define

    g(x) = (1/||φ||) ∫_0^x φ(u) du,   for x ∈ (0,1],   (6.5.3)

and the new transformation

    T(x) = g(S(g^{-1}(x))).   (6.5.4)

Differentiating (6.5.4) gives

    T'(g(x)) = S'(x) φ(S(x)) / φ(x),

so that |T'(g(x))| = p(x) ≥ λ > 1, while

    |T''(g)| / [T'(g)]² = ||φ|| (1/φ(x)) |d/dx (1/p(x))| ≤ c ||φ|| < ∞.

Thus the new transformation T satisfies all the conditions of Theorem 6.2.2,
and {P_T^n} is asymptotically stable, as is {P^n} by (6.5.14).
Example 6.5.1. Consider the quadratic transformation S(x) = 4x(1 - x)
with x ∈ [0,1] and set

    φ(x) = 1 / (π √(x(1 - x))).   (6.5.5)

Using equations (6.5.3) and (6.5.5), it is easy to verify that all the conditions
of Theorem 6.5.1 are satisfied in this case and, thus, for the quadratic
transformation, {P^n} is asymptotically stable.
Note that with φ as given by (6.5.5), the associated function g, as defined
by (6.5.3), is given by

    g(x) = (1/π) ∫_0^x du / √(u(1 - u)) = 1/2 - (1/π) sin^{-1}(1 - 2x),   (6.5.6)

and thus

    g^{-1}(x) = sin²(πx/2).   (6.5.7)

Hence, when S(x) = 4x(1 - x), the transformation T defined by

    T(x) = g(S(g^{-1}(x)))   (6.5.8)

is easily shown to be

    T(x) = { 2x         for x ∈ [0, 1/2)
           { 2(1 - x)   for x ∈ [1/2, 1].   (6.5.9)
Theorem 6.5.2. Let T: (0,1) → (0,1) be a measurable, nonsingular transformation and let φ ∈ D((a,b)), with a and b finite or not, be a given positive density, that is, φ > 0 a.e. Let a second transformation S: (a,b) → (a,b)
be given by S = g^{-1} ∘ T ∘ g, where

    g(x) = ∫_a^x φ(y) dy,   a < x < b.   (6.5.10)
Then T is exact if and only if S is statistically stable and φ is the density
of the measure invariant with respect to S.
Proof. Let P_T and P_S be the Frobenius-Perron operators corresponding to
the transformations T and S, respectively. We start with the derivation of
the relation between P_T and P_S. By the definition of P_S, we have

    ∫_a^y P_S f(x) dx = ∫_{S^{-1}((a,y))} f(x) dx,   for f ∈ L¹((a,b)),

where S^{-1}((a,y)) = g^{-1}(T^{-1}((g(a), g(y)))). Set x = g^{-1}(z) and use equation
(6.5.10) to change the variables so that the last integral may be rewritten
to give

    ∫_a^y P_S f(x) dx = ∫_{T^{-1}((g(a),g(y)))} f(g^{-1}(z)) dz / φ(g^{-1}(z)).

Defining

    P_g f(z) = f(g^{-1}(z)) / φ(g^{-1}(z)),   for f ∈ L¹((a,b)),   (6.5.11)

we have

    ∫_a^y P_S f(x) dx = ∫_{T^{-1}((g(a),g(y)))} P_g f(z) dz = ∫_{g(a)}^{g(y)} P_T P_g f(z) dz.   (6.5.12)

Setting

    P_g^{-1} f̃(x) = f̃(g(x)) φ(x),   for f̃ ∈ L¹((0,1)),   (6.5.13)

and differentiating (6.5.12) with respect to y, we obtain

    P_S f = P_g^{-1} P_T P_g f,   for f ∈ L¹((a,b)).   (6.5.14)

Further, P_g^{-1}, as given by (6.5.13), is the inverse operator to P_g, and integration of (6.5.13) gives

    ∫_a^b P_g^{-1} f̃(x) dx = ∫_0^1 f̃(z) dz,   for f̃ ∈ L¹((0,1)).   (6.5.15)
Note that P_g φ = 1 and P_g^{-1} 1 = φ. Hence, if P_T 1 = 1, then
P_S φ = P_g^{-1} P_T P_g φ = φ, which shows that φ is the density of the measure invariant with respect
to S. Analogously, from P_S φ = φ, it follows that P_T 1 = 1. By using an
induction argument with equation (6.5.14), we obtain

    P_S^n f = P_g^{-1} P_T^n P_g f,   for f ∈ L¹((a,b)).

Since P_g^{-1} preserves the L¹ norm,

    ||P_S^n f - φ||_{L¹((a,b))} = ||P_T^n P_g f - 1||_{L¹((0,1))}.   (6.5.16)

By substituting f = P_g^{-1} f̃, for f̃ ∈ L¹((0,1)),

    ||P_S^n P_g^{-1} f̃ - φ||_{L¹((a,b))} = ||P_T^n f̃ - 1||_{L¹((0,1))}.   (6.5.17)

Thus, from equations (6.5.16) and (6.5.17), it follows that the strong convergence of {P_S^n f} to φ for f ∈ D((a,b)) is equivalent to the strong convergence of {P_T^n f̃} to 1 for f̃ ∈ D((0,1)).
Example 6.5.2. Let T be the hat transformation of (6.5.9) and pick φ(x) =
k exp(-kx) for 0 < x < ∞, which is the density distribution function for
the lifetime of an atom with disintegration constant k > 0. Then it is
straightforward to show that the transformation S = g^{-1} ∘ T ∘ g is given
by

    S(x) = ln { |1 - 2e^{-kx}|^{-1/k} } = -(1/k) ln |1 - 2e^{-kx}|.

Since the hat transformation T is exact, by Theorem 6.5.2, {P_S^n} is asymptotically stable with the stationary density φ(x) = k exp(-kx). □
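A Monte Carlo sketch (ours; the sample size and seed are arbitrary) of the invariance claimed in this example: applying S to exponentially distributed samples should leave their distribution, and in particular their mean 1/k, unchanged up to sampling error:

```python
import math
import random

def S(x, k=1.0):
    # the transformation of Example 6.5.2
    return -math.log(abs(1.0 - 2.0 * math.exp(-k * x))) / k

k = 1.0
rng = random.Random(1)
sample = [rng.expovariate(k) for _ in range(200_000)]   # density k*exp(-k*x)
image = [S(x, k) for x in sample]

# invariance of phi(x) = k*exp(-k*x) predicts that the image sample keeps
# the exponential mean 1/k, up to Monte Carlo error
mean_before = sum(sample) / len(sample)
mean_after = sum(image) / len(image)
```

A full comparison of the empirical distribution functions of `sample` and `image` tells the same story; the mean is just the cheapest summary.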
Example 6.5.3. As a final example, consider the transformations on [-2,2]
given by the modified Chebyshev polynomials

    S_m(x) = 2 cos[m cos^{-1}(x/2)],   m = 0, 1, 2, ….
Define

    g(x) = (1/π) ∫_{-2}^{x} du / √(4 - u²)   and   φ(x) = 1 / (π √(4 - x²)).   (6.5.18)

Then the transformation T_m = g ∘ S_m ∘ g^{-1} is given by

    T_m(x) = { m(x - 2n/m)        for x ∈ [2n/m, (2n+1)/m)
             { m((2n+2)/m - x)    for x ∈ [(2n+1)/m, (2n+2)/m),   (6.5.19)

where n = 0, 1, …, [(m - 1)/2], and [y] denotes the integer part of y.
For m ≥ 2, by Theorem 6.2.2, {P_{T_m}^n} is asymptotically stable. An explicit
computation is easy and shows that f ≡ 1 is the stationary density of P_{T_m}.
Thus T_m is exact. Hence, by Theorem 6.5.2, the Chebyshev polynomials
S_m are statistically stable for m ≥ 2 with a stationary density given by
equation (6.5.18). This may also be proved more directly as shown by Adler
and Rivlin [1964].
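The stationarity of the density (6.5.18) under S₂ can be verified directly, since S₂(x) = x² - 2 has the two inverse branches ±√(x + 2). The following pointwise check of the Frobenius-Perron fixed-point identity is ours, not from the text:

```python
import math

def phi(x):
    # the density of equation (6.5.18)
    return 1.0 / (math.pi * math.sqrt(4.0 - x * x))

def P_phi(x):
    # Frobenius-Perron operator applied to phi for S_2(x) = x^2 - 2:
    # inverse branches +/- sqrt(x + 2), each with |g'(x)| = 1/(2*sqrt(x+2))
    root = math.sqrt(x + 2.0)
    return (phi(root) + phi(-root)) / (2.0 * root)

max_err = max(abs(P_phi(k / 10.0) - phi(k / 10.0)) for k in range(-19, 20))
# the identity P phi = phi holds exactly; max_err is pure rounding noise
```

Algebraically, φ(√(x+2)) + φ(-√(x+2)) = 2/(π√(2 - x)), and dividing by 2√(x+2) reproduces φ(x), which is what the loop confirms numerically.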
This example is of interest from several standpoints. First, it illustrates
in a concrete way the nonuniqueness of statistically stable transformations (S_m) with the same stationary density derived from different exact
transformations (T_m). Second, it should be noted that the transformation
S̃_m: (0,1) → (0,1), given by

    S̃_m(x) = -(1/4) S_m(4x - 2) + 1/2,
when m = 2, is just the familiar parabola, S̃₂(x) = 4x(1 - x). Finally, we
note that these techniques allow the construction of statistically stable
transformations with quite exotic prescribed stationary densities.
For example, the eigenfunctions of the quantum-mechanical harmonic oscillator with mass m and spring constant k are

    u_n(x) = [α / (√π 2^n n!)]^{1/2} H_n(αx) e^{-(1/2)α²x²},   for n = 0, 1, …,

where α⁴ = mk/ℏ² (ℏ is Planck's constant) and H_n(y) denotes the nth-order Hermite polynomial, defined recursively by H₀(y) = 1, H₁(y) = 2y, and
H_{n+1}(y) = 2yH_n(y) - 2nH_{n-1}(y). The corresponding stationary densities are

    φ_n(x) = (α / (√π 2^n n!)) H_n²(αx) e^{-α²x²},   for n = 0, 1, …,

with the associated functions

    g_n(x) = (α / (√π 2^n n!)) ∫_{-∞}^{x} H_n²(αy) e^{-α²y²} dy,   for n = 0, 1, ….
Corollary 6.5.1. Let S: (0,1) → (0,1) be statistically stable with the stationary density φ, and let φ̄ be a second positive density on (0,1). Then the
transformation S̄ = ḡ^{-1} ∘ g ∘ S ∘ g^{-1} ∘ ḡ is statistically stable with the
stationary density φ̄, where

    g(x) = ∫_0^x φ(y) dy   and   ḡ(x) = ∫_0^x φ̄(y) dy.

Proof. First set T: (0,1) → (0,1) equal to T = g ∘ S ∘ g^{-1}. This is equivalent
to S = g^{-1} ∘ T ∘ g and, by Theorem 6.5.2, T is exact. Again, using Theorem
6.5.2 with the exactness of T, we have that S̄ = ḡ^{-1} ∘ T ∘ ḡ is statistically
stable.
Remark 6.5.1. Nonlinear transformations with a specified stationary density can be used as pseudorandom number generators. For details see Li
and Yorke [1978]. □
6.6
Transformations on the Real Line

In this section we consider transformations S on the real line satisfying the
following conditions:

(6i) There is a partition ⋯ < a_{-1} < a₀ < a₁ < ⋯ of R, without finite
accumulation points, such that S restricted to each interval (a_{i-1}, a_i)
is a C² function;

(6ii) S((a_{i-1}, a_i)) = R;

(6iii) There is a constant λ > 1 such that |S'(x)| ≥ λ for x ≠ a_i,
i = 0, ±1, ±2, …;

(6iv) There is a constant L ≥ 0 and a function q ∈ L¹(R) such that

    σ_i(x) ≤ L q(x),   i = 0, ±1, ±2, …,   (6.6.1)

where g_i = (S|_{(a_{i-1},a_i)})^{-1} and σ_i = |g_i'|; and

(6v) There is a constant c such that

    |S''(x)| / [S'(x)]² ≤ c,   x ≠ a_i, i = 0, ±1, ±2, ….   (6.6.2)

Theorem 6.6.1. If S satisfies conditions (6i)-(6v), then {P^n}, where P is
the Frobenius-Perron operator corresponding to S, is asymptotically stable.

Proof. Since S((a_{i-1}, a_i)) = R for every i, the counterimage of the half-line
(-∞, x) is

    S^{-1}((-∞, x)) = ⋃_i (a_{i-1}, g_i(x)) ∪ ⋃_k (g_k(x), a_k),

where the first union extends over the increasing and the second over the
decreasing branches of S. Thus

    Pf(x) = d/dx ∫_{S^{-1}((-∞,x))} f(u) du
          = Σ_i d/dx ∫_{a_{i-1}}^{g_i(x)} f(u) du + Σ_k d/dx ∫_{g_k(x)}^{a_k} f(u) du,

or

    Pf(x) = Σ_{i=-∞}^{∞} σ_i(x) f(g_i(x)).   (6.6.3)
Now take f ∈ D of bounded variation on R. Then

    V_{-∞}^{∞} Pf ≤ Σ_{i=-∞}^{∞} V_{-∞}^{∞} [σ_i(x) f(g_i(x))]
               ≤ Σ_{i=-∞}^{∞} { (1/λ) V_{-∞}^{∞} f(g_i(x)) + ∫_{-∞}^{∞} |σ_i'(x)| f(g_i(x)) dx }.

Using |σ_i'(x)| ≤ c σ_i(x), which follows from inequality (6.6.2), and making
the change of variables y = g_i(x), we have

    V_{-∞}^{∞} Pf ≤ (1/λ) V_{-∞}^{∞} f + c ∫_{-∞}^{∞} f(y) dy = (1/λ) V_{-∞}^{∞} f + c.   (6.6.4)

Since λ > 1, then, for real a > λc/(λ - 1), there must exist a sufficiently
large n, say n > n₀(f), such that

    V_{-∞}^{∞} P^n f ≤ a.   (6.6.5)
Further, from the representation (6.6.3) and property (6iv) it follows that

    Pf(x) ≤ q(x) { L V_{-∞}^{∞} f + 1 },   (6.6.6)

and, consequently,

    P^n f(x) ≤ q(x)(aL + 1),   x ∈ R, n > n₀(f) + 1.   (6.6.7)

Finally, for strictly positive C¹ densities f, differentiation of (6.6.3) and an
induction argument analogous to that leading to (6.6.4) give

    |(P^n f)'| / P^n f ≤ cλ/(λ - 1) + (1/λ^n) sup_x |f'(x)| / f(x).
Pick a constant K > λc/(λ - 1) so that, since λ > 1, for n sufficiently large
(n > n₁(f)), we have

    |(P^n f)'| ≤ K P^n f.   (6.6.8)

Thus the iterates f_n = P^n f satisfy condition (5.8.3) of Proposition 5.8.1.
Therefore, by Proposition 5.8.1, P^n f has a nontrivial lower-bound function,
and thus, by Theorem 5.6.2, {P^n} is asymptotically stable.
Remark 6.6.1. Observe that in the special case where S is periodic (in x)
with period L = a_i - a_{i-1}, condition (6iv) is automatically satisfied. In fact,
in this case σ_i(x) = σ₁(x) so, by setting q = |σ₁|/L, we obtain inequality
(6.6.1) and, moreover,

    ||q|| = (1/L) ∫_{-∞}^{∞} |g₁'(x)| dx = (1/L) |[g₁(x)]_{-∞}^{∞}| = L/L = 1,

showing that q ∈ L¹(R). The remaining conditions simply generalize the
properties of the transformation S(x) = β tan(γx + δ) with |βγ| > 1. For
this transformation,

    S''(x) / [S'(x)]² = (1/β) sin[2(γx + δ)],

so that

    |S''(x)| / [S'(x)]² ≤ 1/|β|. □
6.7
Manifolds
The last goal of this chapter is to show how the techniques described in
Chapter 5 may be used to study the behavior of transformations in higher-dimensional spaces. The simplest, and probably most striking, use of the
Frobenius-Perron operator in d-dimensional spaces is for expanding mappings on manifolds. To illustrate this, the results of Krzyżewski and Szlenk
[1969], which may be considered as a generalization of the results of Renyi
presented in Section 6.2, are developed in detail in Section 6.8. However,
in this section we preface these results by presenting some basic concepts
from the theory of manifolds, which will be helpful for understanding the
geometrical ideas related to the Krzyżewski-Szlenk results. This elementary description of manifolds is by no means an exhaustive treatment of
differential geometry.
First consider the paraboloid z = x² + y². This paraboloid is embedded
in three-dimensional space, even though it is a two-dimensional object. If
the paraboloid is the state space of a system, then, to study this system,
each point on the paraboloid must be described by precisely two numbers.
Thus, any point m on the paraboloid with coordinates (x, y, x² + y²) is
simply described by its x,y-coordinates. This two-dimensional system of
coordinates may be described in a more abstract way as follows. Denote by
M the graph of the paraboloid, that is,

    M = {(x, y, z): z = x² + y²},

and take as the (single) coordinate map φ(x, y, z) = (x, y). A surface such
as the unit sphere M = {(x, y, z): x² + y² + z² = 1} cannot be covered by
one such map; instead one may use the six functions

    φ₁(x, y, z) = (x, y),   for z > 0;
    φ₂(x, y, z) = (x, y),   for z < 0;
    φ₃(x, y, z) = (x, z),   for y > 0;
    φ₄(x, y, z) = (x, z),   for y < 0;
    φ₅(x, y, z) = (y, z),   for x > 0;
    φ₆(x, y, z) = (y, z),   for x < 0.

Each of these functions φ_i maps a hemisphere of M onto an open unit disk.
This coordinate system has the property that for any m ∈ M there is an
open hemisphere that contains m and on each of these hemispheres one φ_i
is defined.
In the same spirit, we give a general definition of a smooth manifold.

Definition 6.7.1. A smooth d-dimensional manifold consists of a topological Hausdorff space M and a system {φ_i} of local coordinates satisfying
the following properties:

(a) Each function φ_i is defined and continuous on an open subset W_i ⊂ M
and maps it onto an open subset U_i = φ_i(W_i) of R^d. The inverse
functions φ_i^{-1} exist and are continuous (i.e., φ_i is a homeomorphism
of W_i onto U_i);

(b) For each m ∈ M there is a W_i such that m ∈ W_i, that is, M = ⋃_i W_i;

(c) If the intersection W_i ∩ W_j is nonempty, then the mapping φ_i ∘ φ_j^{-1},
which is defined on φ_j(W_i ∩ W_j) ⊂ R^d and has values in R^d, is a
C^∞ mapping.

(Note that a topological space is called a Hausdorff space if every two
distinct points have nonintersecting neighborhoods.)

Any map φ_i gives a coordinate system of a part of M, namely, W_i. A
local coordinate of a point m ∈ W_i is φ_i(m). Having a coordinate system,
we may now define what we mean by a C^k function on M. We say that
f: M → R is of class C^k if for each φ_i: W_i → U_i the composed mapping
f ∘ φ_i^{-1} is of class C^k on U_i.
Next consider the gradient of a function defined on the manifold. For
f: R^d → R the gradient of f at a point x ∈ R^d is simply the vector
(sequence of real numbers)

    grad f(x) = (∂f(x)/∂x₁, …, ∂f(x)/∂x_d).

Analogously, the gradient of f at a point m ∈ M can be defined, in a given
system of local coordinates φ, by

    grad f(m) = (D_{x₁} f, …, D_{x_d} f),   (6.7.1a)

where

    D_{x_i} f = ∂(f ∘ φ^{-1})/∂x_i,   evaluated at φ(m).   (6.7.1b)

Thus the gradient is again a sequence of real numbers that depends on the
choice of the local coordinates.
The most important notion from the theory of manifolds is that of tangent vectors and tangent spaces. A continuous mapping γ: [a,b] → M represents an arc on M with the end points γ(a) and γ(b). We say that γ
starts from m = γ(a). The arc γ is C^k if, for any coordinate system φ, the
composed function φ ∘ γ is of class C^k. The tangent vector to γ at a point
m = γ(a) in a coordinate system φ is defined by

    (d/dt)[φ(γ(t))]|_{t=a} = (ξ¹, …, ξ^d).   (6.7.2)

Two C¹ arcs starting from the same point m are called equivalent if they
produce the same sequence

    (ξ¹, …, ξ^d)   (6.7.3)

of components; the class of all equivalent arcs produces the same sequence
(6.7.3) for any given system of coordinates. Such a class represents the
tangent vector. Tangent vectors are denoted by the Greek letters ξ and η.
Assume that a tangent vector in a coordinate system φ has components
ξ¹, …, ξ^d. What are the components η¹, …, η^d in another coordinate
system ψ? Now,

    (d/dt)[ψ(γ(t))] = (d/dt)[H(φ(γ(t)))],

where H = ψ ∘ φ^{-1} and, therefore,

    η^i = Σ_{j=1}^{d} (∂H^i/∂x_j) ξ^j.   (6.7.4)

Equation (6.7.4) shows the transformation of the tangent vector coordinates under the change of coordinate system. Thus from an abstract (tensor
analysis) point of view the tangent vector at a point m is nothing but a
sequence of numbers in each coordinate system given in such a way that
these numbers satisfy condition (6.7.4) when we pass from one coordinate
system to another. From this description it is clear that the tangent vectors
at m form a linear space, the tangent space, which we denote by T_m.
Now consider a transformation F from a d₁-dimensional manifold M into
a d₂-dimensional manifold N, F: M → N. The transformation F is said to
be of class C^k if, for any two coordinate systems φ on M and ψ on N, the
composed function ψ ∘ F ∘ φ^{-1} is of class C^k, or its domain is empty. Let
ξ be a tangent vector at m, represented by a C¹ arc γ: [a,b] → M starting
from m. Then F ∘ γ is an arc starting from F(m), and it is of class C¹ if
F is of class C¹. The tangent vector to F ∘ γ in a coordinate system ψ is
given by

    (d/dt)[ψ ∘ F ∘ γ]|_{t=a} = (η¹, …, η^{d₂}).

Setting G = ψ ∘ F ∘ φ^{-1}, this becomes

    η^i = Σ_{j=1}^{d₁} (∂G^i/∂x_j) ξ^j.   (6.7.5)

The linear mapping ξ → η defined by (6.7.5) is called the differential of F
at m and is denoted by dF(m).
Note that the differential of F is represented in any two coordinate systems,
φ on M and ψ on N, by the matrix (∂G^i/∂x_j).
The same matrix appears in the formula for the gradient of the composed
function: If F: M → N and f: N → R are C¹ functions, then differentiation of (f ∘ F) ∘ φ^{-1} = (f ∘ ψ^{-1}) ∘ (ψ ∘ F ∘ φ^{-1}) gives

    grad(f ∘ F)(m) = (D_{x₁(m)}(f ∘ F), …, D_{x_d(m)}(f ∘ F)),

where

    D_{x_i(m)}(f ∘ F) = Σ_j D_{y_j(F(m))} f · (∂G^j/∂x_i),

so that, symbolically,

    grad(f ∘ F)(m) = (grad f)(dF(m)).

Observe that now dF(m) appears on the right-hand side of the vector.
Finally observe the relationship between tangent vectors and gradients.
Let f: M → R be of class C¹ and let γ: [a,b] → M start from m. Consider
the composed function f ∘ γ: [a,b] → R that is also of class C¹. Using the
local system of coordinates,

    (d/dt)[f(γ(t))]|_{t=a} = Σ_{i=1}^{d} D_{x_i} f · ξ^i.   (6.7.6)

Observe that the numbers D_{x_i} f and ξ^i depend on φ even though the left-hand side of (6.7.6) does not. Equation (6.7.6) may be more compactly
written as

    (d/dt)[f(γ(t))]|_{t=a} = ⟨grad f(m), ξ⟩.   (6.7.7)
In order to construct a calculus on manifolds, concepts such as the length
of a tangent vector, the norm of a gradient, and the area of Borel subsets
of M are necessary. The most effective way of introducing these is via the
Riemannian metric. Generally speaking the Riemannian metric is a scalar
product on T_m. This means that to any two vectors ξ₁, ξ₂ ∈ T_m there
corresponds a real number denoted by ⟨ξ₁, ξ₂⟩. In local coordinates the
scalar product has the form

    ⟨ξ₁, ξ₂⟩ = Σ_{i,j=1}^{d} g^φ_{ij}(m) ξ₁^i ξ₂^j,   (6.7.8)

where the coefficients g^φ_{ij}(m) depend on the coordinate system φ and
transform appropriately when we pass to another system ψ. The length of
a tangent vector is then ||ξ|| = ⟨ξ, ξ⟩^{1/2}, and the length of a C¹ arc γ is

    l(γ) = ∫_a^b ||γ'(t)|| dt.   (6.7.10)
This equation may be used for any arc γ that is continuous and piecewise
C¹. If a manifold M is such that any two points m₀, m₁ ∈ M can be joined
by a continuous piecewise C¹ arc, it is said to be connected. On connected
manifolds the distance between points is given by

    ρ(m₀, m₁) = inf l(γ),

where the inf is taken over all possible arcs joining m₀ and m₁. With this
distance, M becomes a metric space.
From equation (6.7.7) it is easy to define the length of grad f at a point
m. It is, by definition,

    |grad f(m)| = sup |(d/dt)[f(γ(t))]|_{t=a}|,

where the sup is taken over all possible arcs γ: [a,b] → M with γ(a) = m
and ||γ'(a)|| = 1. From this definition, it follows that, for an arbitrary C¹
arc γ and C¹ function f,

    |(d/dt) f(γ(t))| ≤ |grad f(γ(t))| · ||γ'(t)||.   (6.7.11)

Analogously, for the differential one may define

    |dF(m)| = sup { ||dF(m)ξ|| : ξ ∈ T_m, ||ξ|| = 1 },   m ∈ M,

and then

    |grad(f ∘ F)(m)| ≤ |(grad f)(F(m))| · |dF(m)|,   m ∈ M.

Finally, the product of two functions on M is defined pointwise by
(fg)(m) = f(m)g(m).
An orthonormal basis e₁, …, e_d of T_m is one for which

    ⟨e_k, e_l⟩ = Σ_{i,j=1}^{d} g^φ_{ij}(m) e_k^i e_l^j = δ_{kl},

where δ_{kl} is the Kronecker delta. Given such a basis, set

    V_φ(m) = | det (e_j^i) |,

the absolute value of the determinant of the matrix whose jth row consists
of the coordinates of e_j. With this notation the Riemannian (Borel) measure μ on M is defined locally by

    μ(B) = ∫_{φ(B)} dx / V_φ(φ^{-1}(x)),   (6.7.13)

for Borel sets B contained in a single coordinate neighborhood. Further,
for a C¹ transformation F: M → N we define

    |det dF(m)| = |det (∂G^i/∂x_j)| · V_φ(m) / V_ψ(F(m)),

where, as before, G = ψ ∘ F ∘ φ^{-1}.
It can be shown that this definition does not depend on the choice of
coordinate systems φ and ψ in M and N, respectively. Note also that
the determinant per se is not defined, but only its absolute value. This is
because our manifolds M, N are not assumed to be oriented.
The following calculation will justify our definition of |det dF(m)|. Let
B be a small set on M, and F(B) its image on N. What is the ratio
μ(F(B))/μ(B)? From equation (6.7.13),

    μ(F(B)) = ∫_{ψ(F(B))} dy / V_ψ(ψ^{-1}(y)).

Setting u = G(x) = ψ(F(φ^{-1}(x))) and changing the variables gives

    μ(F(B)) = ∫_{φ(B)} |det(du/dx)| dx / V_ψ(F(φ^{-1}(x))),

so that

    μ(F(B)) / μ(B) = [ ∫_{φ(B)} |det(du/dx)| dx / V_ψ(F(φ^{-1}(x))) ] / [ ∫_{φ(B)} dx / V_φ(φ^{-1}(x)) ].

As B shrinks to the point m, this ratio converges to

    |det(du/dx)| · V_φ(m) / V_ψ(F(m)) = |det(dF(m))|.   (6.7.14)
6.8
Expanding Mappings on Manifolds

Suppose that the densities f_n = P^n f satisfy

    |grad f_n(m)| ≤ k f_n(m),   for m ∈ M,   (6.8.2)

and set

    ε = [1/(2μ(M))] e^{-kr},

where

    r = sup_{m₀,m₁ ∈ M} ρ(m₀, m₁).

Let γ(t), a ≤ t ≤ b, be a piecewise smooth arc joining points m₀ = γ(a)
and m₁ = γ(b). Differentiation of f_n ∘ γ gives [see inequality (6.7.11)]

    |d[f_n(γ(t))]/dt| ≤ |grad f_n(γ(t))| · ||γ'(t)|| ≤ k ||γ'(t)|| f_n(γ(t)),

so that

    f_n(m₁) ≤ f_n(m₀) exp { k ∫_a^b ||γ'(s)|| ds }.   (6.8.3)
Since S is expanding, each local inverse g_i contracts tangent vectors:

    ||dg_i(m) ξ|| ≤ (1/λ) ||ξ||.   (6.8.4)

Moreover, for a sufficiently small Borel set B ⊂ W, the counterimage of B
decomposes as

    S^{-1}(B) = ⋃_{i=1}^{k} g_i(B),

so that, by the definition of the Frobenius-Perron operator,

    ∫_B Pf(m) μ(dm) = ∫_{S^{-1}(B)} f(m) μ(dm) = Σ_{i=1}^{k} ∫_{g_i(B)} f(m) μ(dm).

Rewriting the right-hand side,

    ∫_B Pf(m) μ(dm) = Σ_{i=1}^{k} μ(g_i(B)) · [1/μ(g_i(B))] ∫_{g_i(B)} f(m) μ(dm),

and letting B shrink to the point m, by (6.7.14),

    μ(g_i(B)) / μ(B) → |det(dg_i(m))|,   i = 1, …, k,

which gives

    Pf(m) = Σ_{i=1}^{k} |det(dg_i(m))| f(g_i(m)).   (6.8.5)
Now let D₀ ⊂ D(M) be the set of all strictly positive C¹ densities. For
f ∈ D₀, write J_i(m) = |det(dg_i(m))|. Differentiation of Pf(m) as given by
(6.8.5) yields

    |grad(Pf)| / Pf = | Σ_{i=1}^{k} grad[J_i (f ∘ g_i)] | / Σ_{i=1}^{k} J_i (f ∘ g_i)
                    ≤ c + (1/λ) sup_m |grad f(m)| / f(m),

where

    c = sup_{i,m} |grad J_i(m)| / J_i(m).   (6.8.6)
define the local coordinate system. In these local coordinates the Riemannian metric is given by g_{ik} = δ_{ik}, the Kronecker delta, and defines a Borel
measure μ identical with that obtained from the product of the Borel measures on the circle.
We define a mapping S: M → M that, in local coordinates, has the form

    S(x₁, x₂) = (a₁₁x₁ + a₁₂x₂, a₂₁x₁ + a₂₂x₂)   (mod 2π).   (6.8.8)
Exercises
6.1. Let (X, A, μ) be a measure space and let S: X → X be a nonsingular
transformation. Fix an integer k ≥ 1. Prove that S^k is statistically stable
if and only if S is statistically stable.
6.2. Consider the transformation S: [0,1] → [0,1] defined by

    S(x) = { 2x   for 0 ≤ x < 1/2
           { …    for 1/2 ≤ x ≤ 1.

Using the result proved in Exercise 6.1, show that S is statistically stable.
6.3. Consider the transformations S: [0,1] → [0,1] of the form

    S(x) = cx(1 - x²).

Fix c at the value for which S maps [0,1] onto [0,1] and, using the change
of variables formulas (Theorem 6.5.1), show that for this value of c the
transformation S is statistically stable.
6.4. Consider the transformations S: [0,1] → [0,1] of the form

    S(x) = …

Again fix c to be that value at which S maps [0,1] onto [0,1] and use a
change of variables and the Birkhoff individual ergodic theorem to show
that

    lim_{n→∞} S^n(x) = 0,   for x ∈ [0,1] a.e.

Observe that S has periodic points of period 3 and thus is chaotic in the
sense of Šarkovskii and Li-Yorke [Jama, 1989; Li and Yorke, 1975].
6.5. Consider the transformation S: [0, l] → [0, l] defined by

    S(x) = x [ -a + b/(1 + x) ],

where l = (b/a) - 1 > 0. Find b = b₀ such that S maps [0, l] onto [0, l] and,
using the change of variables formulas, prove that S is statistically stable.
6.6. Consider the "tent" transformation S: [0,1] → [0,1] of the form

    S(x) = { cx         for 0 ≤ x < 1/2
           { c(1 - x)   for 1/2 ≤ x ≤ 1,

where 1 < c ≤ 2. Let P be the corresponding Frobenius-Perron operator. Determine the values of the parameter c for which {P^n} is asymptotically
periodic but not asymptotically stable (r ≥ 2 in formula (5.3.8)).
6.7. Let M be the two-dimensional torus described in Example 6.8.1. Consider the transformation S: M → M which in local coordinates has the
form

    S(x₁, x₂) = (a₁₁x₁ + a₁₂x₂, a₂₁x₁ + a₂₂x₂)   (mod 2π).

Assume that the coefficients a_{ij} are positive integers and find conditions
concerning the matrix (a_{ij}) which imply the exactness of S.
6.8. Using TRAJ, BIFUR, DENTRAJ, and DENITER, numerically study
the behavior of the transformations defined in Exercises 6.5 and 6.6 for
a < b ≤ b₀ and 0 < c < 2, respectively.
7
Continuous Time Systems:
An Introduction
7.1
As illustrated in Figure 7.1.1, the value S_t(x⁰) is the position of the system
at a time t that started from an initial point x⁰ ∈ X at time t = 0.
We consider only those processes in which the dynamical law S_t does not
explicitly depend on time so that the property

    S_{t+t'}(x⁰) = S_t(S_{t'}(x⁰))   (7.1.1)

holds. This simply means that the dynamics governing the evolution of the
system are the same on the intervals [0, t'] and [t, t + t'].
Example 7.1.1. A well-known example of a continuous time process is
given by an autonomous d-dimensional system of ordinary differential equations

    dx/dt = F(x),   (7.1.2)

where x = (x₁, …, x_d) and F: R^d → R^d is sufficiently smooth to ensure
the existence and uniqueness of solutions, such as F is C¹ and satisfies
|F(x)| ≤ α + β|x|, with α and β finite. In this case, S_t(x⁰) is the solution
of (7.1.2) with the initial condition

    x(0) = x⁰.   (7.1.3)

In this example time t need not be restricted to t ≥ 0, and the system can
also be studied for t ≤ 0. As we will see in Section 7.8, this is a commonly
encountered situation for problems in finite dimensional phase spaces. □
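The group property (7.1.1) for flows of ordinary differential equations can be observed numerically. The sketch below is ours (the right-hand side F(x) = -x and the step counts are arbitrary choices); it integrates (7.1.2) with a classical Runge-Kutta scheme and compares S_{0.7} ∘ S_{0.5} with S_{1.2}:

```python
import math

def F(x):
    return -x          # illustrative right-hand side of (7.1.2)

def flow(x0, t, n_steps=1000):
    """Approximate S_t(x0) for dx/dt = F(x) with classical 4th-order Runge-Kutta."""
    h = t / n_steps
    x = x0
    for _ in range(n_steps):
        k1 = F(x)
        k2 = F(x + 0.5 * h * k1)
        k3 = F(x + 0.5 * h * k2)
        k4 = F(x + h * k3)
        x += (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return x

x0 = 1.0
composed = flow(flow(x0, 0.5), 0.7)   # S_0.7(S_0.5(x0))
direct = flow(x0, 1.2)                # S_1.2(x0)
exact = math.exp(-1.2) * x0           # known solution for F(x) = -x
```

For this linear F the exact flow is known, so both the semigroup identity and the accuracy of the integrator can be checked at once.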
A second example is provided by a differential-delay equation such as

    dx(t)/dt = F(x(t), x(t - 1)),   (7.1.4)

which, in contrast to (7.1.2), must be supplemented by an initial condition
specifying x(t) on an entire interval of unit length [equations (7.1.5) and
(7.1.6)].
FIGURE 7.1.1. The trajectory of a continuous time process in the phase space
X. At time t = 0 the system is at x⁰, and at time t it is at S_t(x⁰).
Remark 7.2.1. System (7.1.2) of ordinary differential equations, introduced in the preceding section, is clearly an example of a dynamical system.
□

Remark 7.2.2. It is clear from the group property of Definition 7.2.1 that

    S_{-t}(S_t(x)) = S₀(x) = x   for all t ∈ R.

Thus, for all t₀ ∈ R, any transformation S_{t₀} of a dynamical system {S_t}_{t∈R}
is invertible. □
In applied problems the space X is customarily called the phase space
of the dynamical system {S_t}_{t∈R}, whereas, for every fixed x⁰ ∈ X, the
function S_t(x⁰), considered as a function of t, is called a trajectory of
the system. The trajectories of a dynamical system {S_t}_{t∈R} in its phase
space X are of only three possible types, as shown in Figure 7.2.1a,b,c for
X = R². First (Figure 7.2.1a), the trajectory can be a stationary point x⁰
such that

    S_t(x⁰) = x⁰   for all t ∈ R.

Second, as shown in Figure 7.2.1b, the trajectory can be periodic with
period ω > 0, that is,

    S_{t+ω}(x⁰) = S_t(x⁰)   for all t ∈ R.

Finally, the trajectory can be nonintersecting (see Figure 7.2.1c), by which
we mean that

    S_{t₁}(x⁰) = S_{t₂}(x⁰)   implies   t₁ = t₂.

These cases exhaust the possibilities, since if S_{t₁}(x⁰) = S_{t₂}(x⁰) for some
t₁ < t₂, then

    S_{t-t₁}(S_{t₂}(x⁰)) = S_{t+(t₂-t₁)}(x⁰)   and   S_{t-t₁}(S_{t₁}(x⁰)) = S_t(x⁰).

Hence, with ω = t₂ - t₁,

    S_{t+ω}(x⁰) = S_t(x⁰),

and the trajectory is periodic.
FIGURE 7.2.1. The possible types of trajectories of a dynamical system, panels
(a)-(d); see text.
    g(t) = φ(S_t(y))   for all t ∈ R.

From our precise definition of the trace of a dynamical system, the following obvious question arises: Given an observed continuous function in
for all t ∈ R.

Proof. Let Y be the space of all continuous functions from R into X (note
that the elements of the space Y are functions, not points). Let a dynamical
system {S_{t'}}_{t'∈R}, S_{t'}: Y → Y, operating on Y, be a simple shift so that,
starting from a given y ∈ Y, we have, after the operation of S_{t'}, a new
function y(t + t'). This may be represented by a diagram,

    y(t) → y(t + t'),

that is,

    S_{t'}(y)(t) = y(t + t').

Define φ: Y → X by

    φ(y) = y(0).

Then

    S_{t'}(g)(t) = g(t + t')

and

    φ(S_{t'}(g)) = S_{t'}g(0) = g(t').

Remark 7.2.3. Note that the proof of this theorem rests on the identification of the functions on X as the objects on which the new dynamical
system {S_{t'}}_{t'∈R} operates. □
Semidynamical Systems

Definition 7.2.3. A semidynamical system {S_t}_{t≥0} is a family of transformations S_t: X → X, t ≥ 0, satisfying:

(a) S₀(x) = x for all x ∈ X; and

(b) S_t(S_{t'}(x)) = S_{t+t'}(x) for all x ∈ X and t, t' ≥ 0.

Remark 7.2.4. The only difference between dynamical and semidynamical systems is contained in the group property [compare conditions (b)
of Definitions 7.2.1 and 7.2.3]. The consequence of this difference is most
important, however, because semidynamical systems, in contrast to dynamical systems, are not invertible. It is this property that makes the study of
semidynamical systems so important for applications. Henceforth, we will
confine our attention to semidynamical systems. □

Remark 7.2.5. An examination of the proof of Theorem 7.2.1 shows that
it is also true for semidynamical systems. □

Remark 7.2.6. On occasion a family of transformations {S_t}_{t≥0} satisfying
properties (a) and (b) will be called a semigroup of transformations.
This is because property (b) in Definition 7.2.3 ensures that the transformations S_t form an Abelian semigroup in which the group operation is the
composition of two functions. Thus a semidynamical system is a continuous semigroup. □
A measure μ is called invariant with respect to the semidynamical system
{S_t}_{t≥0} if

    μ(S_t^{-1}(A)) = μ(A)   for all A ∈ A and t ≥ 0.   (7.3.1)

Theorem 7.3.1. Let {S_t}_{t≥0} be a semidynamical system on the measure
space (X, A, μ) with invariant measure μ, and let f ∈ L¹(X). Then the limit

    f*(x) = lim_{T→∞} (1/T) ∫_0^T f(S_t(x)) dt   (7.3.2)

exists for almost all x ∈ X.
Proof. This theorem may be rather easily demonstrated using the corresponding discrete time result, Theorem 4.2.3, if we assume, in addition,
that for almost all x ∈ X the integrand f(S_t(x)) is a bounded measurable
function of t.
Set

    g(x) = ∫_0^1 f(S_t(x)) dt

and assume at first that T is an integer, T = n. Note also that the group
property (b) of semidynamical systems implies that

    (1/T) ∫_0^T f(S_t(x)) dt = (1/n) Σ_{k=0}^{n-1} ∫_k^{k+1} f(S_t(x)) dt
        = (1/n) Σ_{k=0}^{n-1} ∫_k^{k+1} f(S_{t-k}(S_k(x))) dt
        = (1/n) Σ_{k=0}^{n-1} ∫_0^1 f(S_{t'}(S_k(x))) dt'
        = (1/n) Σ_{k=0}^{n-1} g(S_k(x)).

However, S_k = S₁ ∘ S_{k-1} = S₁ ∘ ⋯ ∘ S₁ = S₁^k, so that the limit

    lim_{n→∞} (1/n) Σ_{k=0}^{n-1} g(S₁^k(x))

exists by Theorem 4.2.3. Call this limit f*(x).
If T is not an integer, let n be the largest integer such that n < T. Then
we may write

    (1/T) ∫_0^T f(S_t(x)) dt = (n/T) · (1/n) ∫_0^n f(S_t(x)) dt + (1/T) ∫_n^T f(S_t(x)) dt.
Since the integrand is bounded, the second term converges to zero and
n/T → 1 as T → ∞, so the limit in (7.3.2) is again f*(x). Moreover, in
analogy with property (C2) of the discrete time case,

    f*(S_t(x)) = f*(x)   for all t ≥ 0,   (7.3.3)

and

    ∫_X f*(x) μ(dx) = ∫_X f(x) μ(dx).   (7.3.4)
Definition 7.3.2. A semidynamical system {S_t}_{t≥0}, consisting of nonsingular transformations S_t: X → X, is ergodic if every invariant set A ∈ A
is such that either μ(A) = 0 or μ(X \ A) = 0. (Recall that a set A for which
μ(A) = 0 or μ(X \ A) = 0 is called trivial.)
For example, the rotation on the circle,

    S_t(x) = x + ωt   (mod 1),   (7.3.6)

is ergodic for every ω ≠ 0.

Mixing

Definition 7.3.3. A measure-preserving semidynamical system {S_t}_{t≥0}
on a normalized measure space (X, A, μ) is mixing if

    lim_{t→∞} μ(A ∩ S_t^{-1}(B)) = μ(A) μ(B)   for all A, B ∈ A.   (7.3.8)

Exactness

Definition 7.3.4. Let (X, A, μ) be a normalized measure space. A measure-preserving semidynamical system {S_t}_{t≥0} such that S_t(A) ∈ A for A ∈ A
is exact if

    lim_{t→∞} μ(S_t(A)) = 1   for every A ∈ A with μ(A) > 0.   (7.3.9)
7.4
Semigroups of the Frobenius-Perron and Koopman Operators

Let {S_t}_{t≥0} be a semidynamical system of nonsingular transformations on
the measure space (X, A, μ). For every t ≥ 0 the Frobenius-Perron operator
P_t: L¹(X) → L¹(X) corresponding to S_t is defined by

    ∫_A P_t f(x) μ(dx) = ∫_{S_t^{-1}(A)} f(x) μ(dx),   for A ∈ A.   (7.4.1)

From this definition it follows that

(FP1) P_t(λ₁f₁ + λ₂f₂) = λ₁ P_t f₁ + λ₂ P_t f₂,   λ₁, λ₂ ∈ R;   (7.4.2)

(FP2) P_t f ≥ 0,   if f ≥ 0;   (7.4.3)

(FP3) ∫_X P_t f(x) μ(dx) = ∫_X f(x) μ(dx),   for all f ∈ L¹.   (7.4.4)

Thus, for every fixed t, the operator P_t: L¹(X) → L¹(X) is a Markov
operator.
The entire family of Frobenius-Perron operators P_t: L¹(X) → L¹(X)
satisfies some properties similar to (a) and (b) of Definition 7.2.3. To see
this, first note that since S_{t+t'} = S_t ∘ S_{t'}, then S_{t+t'}^{-1}(A) = S_{t'}^{-1}(S_t^{-1}(A)) and,
thus,

    ∫_A P_{t+t'} f(x) μ(dx) = ∫_{S_{t+t'}^{-1}(A)} f(x) μ(dx) = ∫_{S_{t'}^{-1}(S_t^{-1}(A))} f(x) μ(dx)
        = ∫_{S_t^{-1}(A)} P_{t'} f(x) μ(dx) = ∫_A P_t(P_{t'} f(x)) μ(dx),

so that

    P_{t+t'} f = P_t(P_{t'} f),   f ∈ L¹(X), t, t' ≥ 0.   (7.4.5)

Also,

    ∫_A P₀ f(x) μ(dx) = ∫_{S₀^{-1}(A)} f(x) μ(dx) = ∫_A f(x) μ(dx),

implying that

    P₀ f = f.   (7.4.6)
Since each P_t is a Markov operator, it is contractive,

    ||P_t f₁ - P_t f₂|| ≤ ||f₁ - f₂||,   (7.4.7)

and, consequently, ||P_t f₁ - P_t f₂|| is a nonincreasing function of t,

    ||P_{t+t'} f₁ - P_{t+t'} f₂|| ≤ ||P_t f₁ - P_t f₂||,

which follows from (7.4.7). By using this property, we may now proceed to
prove a continuous time analog of Theorem 5.6.2.
Suppose that for some t₀ > 0 the sequence {P_{t₀}^n} is asymptotically stable
with limiting density f*, so that P_{nt₀} = P_{t₀}^n, P_{t₀} f* = f*, and

    lim_{n→∞} ||P_{t₀}^n f - f*|| = 0.

We must show that

    lim_{t→∞} ||P_t f - f*|| = 0   for every f ∈ D.   (7.4.9)

Having shown that P_t f* = f* for the set {t₀, 2t₀, …}, we now turn to
a demonstration that P_t f* = f* for all t. Pick a particular time t', set
f₁ = P_{t'} f*, and note that f* = P_{t₀}^n f* = P_{nt₀} f*. Therefore,

    ||P_{t'} f* - f*|| = ||P_{t'}(P_{nt₀} f*) - f*|| = ||P_{nt₀}(P_{t'} f*) - f*||
                      = ||P_{nt₀} f₁ - f*||.   (7.4.10)
Thus, since ||P_{nt₀} f₁ - f*|| → 0 as n → ∞, and the left-hand side of (7.4.10)
is independent of n, we must have ||P_{t'} f* - f*|| = 0, so P_{t'} f* = f*. Since t'
is arbitrary, we have P_t f* = f* for all t ≥ 0.
Finally, to show (7.4.9), pick a function f ∈ D, so that ||P_t f - f*|| =
||P_t f - P_t f*|| is a nonincreasing function of t. Pick the subsequence t_n = nt₀. We
know from before that lim_{n→∞} ||P_{t_n} f - f*|| = 0. Thus we have a nonincreasing function that converges to zero on a subsequence and, hence,

    lim_{t→∞} ||P_t f - f*|| = 0.
Remark 7.4.2. From the above definition, it immediately follows that the
asymptotic stability of a semigroup {P_t}_{t≥0} implies the asymptotic stability
of the sequence {P_{t₀}^n f} for arbitrary t₀ > 0. The proof of Theorem 7.4.1
shows that the converse holds, that is, if for some t₀ > 0 the sequence {P_{t₀}^n}
is asymptotically stable, then the semigroup {P_t}_{t≥0} is also asymptotically
stable. □

Stochastic semigroups that are not semigroups of Frobenius-Perron operators can arise, as illustrated by the following example.
Example 7.4.1. Let X = R, and define

    P_t f(x) = ∫_{-∞}^{∞} K(t, x, y) f(y) dy,   t > 0,   (7.4.11)

    P₀ f(x) = f(x),

where

    K(t, x, y) = (1/√(2πσ²t)) exp[ -(x - y)² / (2σ²t) ].   (7.4.12)

It is straightforward to verify that

(a) K(t, x, y) ≥ 0;

(b) ∫_{-∞}^{∞} K(t, x, y) dx = 1; and

(c) K(t + t', x, y) = ∫_{-∞}^{∞} K(t, x, z) K(t', z, y) dz.

From these properties it follows that P_t defined by (7.4.11) forms a continuous stochastic semigroup. The demonstration that {P_t}_{t≥0} defined by
(7.4.11) and (7.4.12) is not a semigroup of Frobenius-Perron operators is
postponed to Remark 7.10.2.
That (7.4.11) and (7.4.12) look familiar should come as no surprise, as
the function u(t,x) = P_t f(x) is the solution to the heat equation

    ∂u/∂t = (σ²/2) ∂²u/∂x²,   for t > 0, x ∈ R,   (7.4.13)

with the initial condition

    u(0, x) = f(x),   for x ∈ R.   (7.4.14)
For the Koopman operators U_t f(x) = f(S_t(x)) an analogous calculation
gives

    U_{t+t'} f = U_t(U_{t'} f)   for all f ∈ L^∞.

Furthermore, U₀ f(x) = f(S₀(x)) = f(x), or

    U₀ f = f   for all f ∈ L^∞.

The family of Koopman operators is, in general, not a stochastic semigroup because U_t does not map L¹ into itself (though it does map L^∞ into
itself) and satisfies the inequality

    ess sup |U_t f| ≤ ess sup |f|.
FIGURE 7.4.1. Plots of f(x) and T_t f(x) = f(x - ct), for c > 0.
To/=/,
(d) Tt+t'f
for
E L; and
= 1t(Tt/), for IE L.
Moreover, if
for
E L, to ~ 0,
IITt!IFr_P =
i:
lf(x- ct)IP dx
Ill
c!
205
I+C!
whenp = oo. The remaining properties (a), (c), and (d) follow immediately
from the definition of 7t in equation (7.4.17).
Finally, we note that if p = 1 then this semigroup of contractions is
continuous. To see this, first use
ll7t/- TtofiiLl
f(x- cto)l dx
Ttf(x)
and, as a consequence,
117t/- !IlL""
for 0
"'
< ct < 1. ThusiiTd- filL""
may be simply interpreted as shown in Figure 7.4.2 where the hatched areas
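The contrast between L¹-continuity and the failure of L^∞-continuity can be seen numerically. The sketch below (with c = 1 and f = 1_{[0,1]}, both choices mine) approximates ||T_t f − f|| in each norm on a grid:

```python
import numpy as np

c = 1.0
x = np.linspace(-2.0, 4.0, 600001)   # uniform grid with spacing dx
dx = x[1] - x[0]
f = ((x >= 0.0) & (x <= 1.0)).astype(float)   # indicator of [0,1]

def Tt(vals, t):
    # T_t f(x) = f(x - ct), realized as a right shift by round(ct/dx) samples
    s = int(round(c * t / dx))
    out = np.zeros_like(vals)
    out[s:] = vals[:len(vals) - s]
    return out

t = 0.01
diff = Tt(f, t) - f
l1 = np.sum(np.abs(diff)) * dx    # approx 2ct: mass differs only near the endpoints
linf = np.max(np.abs(diff))       # stays equal to 1 for 0 < ct < 1
print(round(l1, 3), linf)         # prints: 0.02 1.0
```

So the L¹ distance shrinks linearly with t while the L^∞ distance does not shrink at all, exactly as claimed.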
may appear more complicated in the continuous case because of the use of integrals rather than summations, for example, in the Birkhoff ergodic theorem. However, there is one great advantage in the study of continuous time problems over discrete time dynamics, and this is the existence of a new tool: the infinitesimal operator.
In the case of a semidynamical system {S_t}_{t≥0} arising from a system of ordinary differential equations (7.1.2), the infinitesimal operator is simply the function F(x). This connection between the infinitesimal operator and F(x) stems from the formula

    lim_{t→0} [x(t) − x(0)]/t = F(x_0),

where x(t) is the solution of (7.1.2) with the initial condition (7.1.3). This can be rewritten in terms of the transformations S_t as

    lim_{t→0} [S_t(x_0) − x_0]/t = F(x_0).
This relation offers some insight into how the infinitesimal operator may
be defined for semigroups of contractions in general, and for semigroups of
the Frobenius-Perron and Koopman operators in particular.
Definition 7.5.1. Let L = L^p, 1 ≤ p ≤ ∞, and let {T_t}_{t≥0} be a semigroup of contractions. We denote by D(A) the set of all f ∈ L such that the limit

    Af = lim_{t→0} (T_t f − f)/t    (7.5.1)

exists, where the limit is considered in the sense of strong convergence (cf. Definition 2.3.3). Thus (7.5.1) is equivalent to

    lim_{t→0} ||Af − (T_t f − f)/t||_L = 0.    (7.5.2)

The operator A: D(A) → L is called the infinitesimal operator. It follows immediately from (7.5.1) that D(A) is a linear subspace of L: if f_1, f_2 ∈ D(A), then

    λ_1 f_1 + λ_2 f_2 ∈ D(A)    for all scalars λ_1, λ_2.

Example 7.5.1. Consider the semigroup {T_t}_{t≥0} of Example 7.4.2, T_t f(x) = f(x − ct), and let f be differentiable with f' = df/dx. Thus, if f' is sufficiently regular,

    Af = lim_{t→0} (T_t f − f)/t = −cf'.
By using this concept, we can see that the value of the infinitesimal operator for f ∈ D(A), namely Af, is simply the derivative of the function u(t) = T_t f at t = 0. The following theorem gives a more sophisticated relation between the strong derivative and the infinitesimal operator.
Theorem 7.5.1. Let {T_t}_{t≥0} be a continuous semigroup of contractions and let f ∈ D(A). Then the function u(t) = T_t f is strongly differentiable and satisfies

    u'(t) = Au(t)    (7.5.3)

with the initial condition

    u(0) = f.    (7.5.4)
Proof. Fix t_0 ≥ 0. For t > t_0 we have T_t = T_{t_0} T_{t−t_0}, and hence

    (u(t) − u(t_0))/(t − t_0) = T_{t_0}((T_{t−t_0} f − f)/(t − t_0))    for t > t_0.    (7.5.5)

Because f ∈ D(A), the limit of

    (T_{t−t_0} f − f)/(t − t_0)

exists as t → t_0 and gives Af. Thus the limit of (7.5.5) as t → t_0 also exists and is equal to T_{t_0} Af. In an analogous fashion, if t < t_0, we have T_{t_0} = T_t T_{t_0−t} and, as a consequence,

    (u(t) − u(t_0))/(t − t_0) = T_t((T_{t_0−t} f − f)/(t_0 − t))    for t < t_0    (7.5.6)

and

    ||T_t((T_{t_0−t} f − f)/(t_0 − t)) − T_{t_0} Af||_L ≤ ||(T_{t_0−t} f − f)/(t_0 − t) − Af||_L + ||T_t Af − T_{t_0} Af||_L.

Again, since T_t Af converges to T_{t_0} Af as t → t_0, the limit of (7.5.6) exists as t → t_0 and is equal to T_{t_0} Af. Thus the existence of the derivative u'(t_0) is proved.

Now we can rewrite equation (7.5.5) in the form

    (u(t) − u(t_0))/(t − t_0) = (T_{t−t_0}(T_{t_0} f) − T_{t_0} f)/(t − t_0)    for t > t_0.

Since the limit of the difference quotient on the left-hand side exists as t → t_0, the limit on the right-hand side also exists as t → t_0, and we obtain

    u'(t_0) = A T_{t_0} f.
Remark 7.5.1. The main property of the set D(A) that follows directly from Theorem 7.5.1 is that, for f ∈ D(A), the function u(t) = T_t f is a solution of equations (7.5.3) and (7.5.4). Moreover, the solution can be proved to be unique. Unfortunately, in general D(A) is not the entire space L, although it can be proved that, for continuous semigroups of contractions, D(A) is dense in L. □

In Theorem 7.5.1, the notion of a function u: [0,∞) → L, where L is again a space of functions, may seem strange. In fact, u actually represents a function of two variables, t and x, since, for each t ≥ 0, u(t) ∈ L^p. Thus we frequently write u(t)(x) = u(t,x), and equation (7.5.3) is to be interpreted as an equation in two variables.
Applying this theorem to the semigroup considered in Examples 7.4.2 and 7.5.1 with L = L^p, 1 ≤ p < ∞, it is clear that this semigroup satisfies equation (7.5.3), where

    u(t,x) = T_t f(x),    Af = −c df/dx    for f ∈ D(A).

Thus u(t,x) satisfies

    ∂u/∂t + c ∂u/∂x = 0    (7.5.7)

with the initial condition

    u(0,x) = f(x).
Remark 7.5.2. It is important to stress the large difference in the two interpretations of this problem as embodied in equations (7.5.3) and (7.5.7). From the point of view of (7.5.7), u(t,x) is thought of as a function of isolated coordinates t and x that evolve independently and whose derivatives ∂u/∂t and ∂u/∂x are evaluated at specific points in the (t,x)-plane. However, in the semigroup approach that leads to (7.5.3), we are considering the evolution in time of a family of functions, and the derivative du(t)/dt is to be thought of as taken over an entire ensemble of points. This is made somewhat clearer when we take into account that u(t) = T_t f has a time derivative u'(t_0) at a point t_0 if (7.5.2) is satisfied, that is,

    lim_{t→t_0} ||(u(t) − u(t_0))/(t − t_0) − u'(t_0)||_L = 0.
Consider a system of ordinary differential equations

    dx/dt = F(x)    (7.6.1a)

or

    dx_i/dt = F_i(x_1,...,x_d),    i = 1,...,d,    (7.6.1b)

whose solution with the initial condition x(0) = x^0 is

    x(t) = S_t(x^0).    (7.6.2)

Let f be continuously differentiable with compact support. By the mean value theorem there is a θ ∈ (0,1) such that

    U_t f(x^0) − f(x^0) = f(x(t)) − f(x(0)) = t Σ_{i=1}^d (∂f/∂x_i)(x(θt)) F_i(x(θt)),

and therefore

    (U_t f(x^0) − f(x^0))/t = Σ_{i=1}^d (∂f/∂x_i)(S_{θt}(x^0)) F_i(S_{θt}(x^0)).    (7.6.4)

As t → 0 the right-hand side converges to Σ_i (∂f/∂x_i)(x^0) F_i(x^0) uniformly for all x^0. Thus (7.6.4) has a strong limit in L^∞, and the infinitesimal operator A_K is given by

    A_K f = Σ_{i=1}^d F_i ∂f/∂x_i.    (7.6.5)
Observe that equation (7.6.5) was derived only for functions f with some special properties, namely, continuously differentiable f with compact support. These functions do not form a dense set in L^∞, which is not surprising since it can be proved that the semigroup {U_t}_{t≥0} is not, in general, continuous in L^∞. It does become continuous in a subspace of L^∞ consisting of all continuous functions with compact support (see Remark 7.6.2).

Hence, if f is continuously differentiable with compact support, then by Theorem 7.5.1 for such f the function

    u(t,x) = U_t f(x)

satisfies the first-order partial differential equation (7.5.3). From (7.6.5) it may be written as

    ∂u/∂t = Σ_{i=1}^d F_i ∂u/∂x_i.    (7.6.6)
Remark 7.6.1. It should be noted that the same equation can be immediately derived for u(t,x) = f(S_t(x)) by differentiating the equality u(t, S_{−t}(x)) = f(x) with respect to t. In this case f may be an arbitrary continuously differentiable function, not necessarily having compact support. However, in this case (7.6.6) is satisfied locally at every point (t,x) and is not an evolution equation in L^∞ (cf. Remark 7.5.2). □
We now turn to a derivation of the infinitesimal operator for the semigroup of Frobenius–Perron operators generated by the semigroup of (7.6.1a). This is difficult to do if we start from the formal definition of the Frobenius–Perron operator, that is,

    ∫_A P_t f(x) μ(dx) = ∫_{S_t^{-1}(A)} f(x) μ(dx)    for A ∈ A.
212
(Ptf,g}
= (/,Utg},
(7.6.7)
(7.6.8)
Now let f E V(App) and g E V(AK), where App and AK denote, respectively, the infinitesimal operators for the semigroups of Frobenius-Perron
and Koopman operators. Take the limit as t-+ 0 in (7.6.8) to obtain
(Appf,g}
= {/,AKg}.
(7.6.9)
However, from equation (7.6.5) the right-hand side of (7.6.9) can be written
as
and thus
213
(AFPI,g}
= (-
t. {}~:i) ,g).
u(t,x)
= Ptf(x)
au+ 2: a(uFi) = o.
8t
i=l
(7.6.11)
OXi
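A quick numerical sanity check of the continuity equation (an illustration of mine, not from the text): for the one-dimensional flow F(x) = −x the semigroup is S_t(x) = x e^{−t}, and the Frobenius–Perron operator has the explicit form P_t f(x) = f(x e^t) e^t, so the equation can be verified pointwise by central differences:

```python
import numpy as np

# Illustration: flow dx/dt = F(x) = -x, so S_t(x) = x e^{-t} and
# P_t f(x) = f(x e^t) e^t.  Check du/dt + d(uF)/dx = 0 as in (7.6.11).
f = lambda x: np.exp(-x ** 2) / np.sqrt(np.pi)      # initial density (integral 1)
u = lambda t, x: f(x * np.exp(t)) * np.exp(t)       # u(t,x) = P_t f(x)
uF = lambda t, x: u(t, x) * (-x)                    # flux u * F

t0, x0, h = 0.3, 0.4, 1e-5                          # arbitrary test point
du_dt = (u(t0 + h, x0) - u(t0 - h, x0)) / (2 * h)
duF_dx = (uF(t0, x0 + h) - uF(t0, x0 - h)) / (2 * h)
print(abs(du_dt + duF_dx) < 1e-6)  # prints: True
```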
Example 7.6.1. As a special case of the system (7.6.1) of ordinary differential equations, let d = 2n and consider a Hamiltonian system whose dynamics are governed by the canonical equations of motion (Hamilton's equations)

    dq_i/dt = ∂H/∂p_i,    dp_i/dt = −∂H/∂q_i,    i = 1,...,n.    (7.6.12)

In this case equation (7.6.11) becomes

    ∂u/∂t + Σ_{i=1}^n [(∂H/∂p_i)(∂u/∂q_i) − (∂H/∂q_i)(∂u/∂p_i)] = 0,

or

    ∂u/∂t + [u,H] = 0,

where [u,H] is the Poisson bracket of u with H. For Hamiltonian systems, the change with time of an arbitrary function g of the variables q_1,...,q_n, p_1,...,p_n is given by dg/dt = [g,H]. In particular, for any differentiable function g(H) of the Hamiltonian,

    dg(H)/dt = (dg/dH)(dH/dt) = (dg/dH)[H,H] = 0

since [H,H] = 0. Thus any function of the generalized energy H is a constant of the motion. □
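Since [H,H] = 0 implies dH/dt = 0 along trajectories, H itself is conserved. The sketch below (my own illustration, using the harmonic oscillator H = (p² + q²)/2 as an arbitrary test Hamiltonian) integrates Hamilton's equations with a fourth-order Runge–Kutta step and checks that H stays constant up to the integrator's error:

```python
import numpy as np

def rhs(state):
    # Hamilton's equations for H = (p^2 + q^2)/2: dq/dt = dH/dp, dp/dt = -dH/dq
    q, p = state
    return np.array([p, -q])

def rk4_step(state, h):
    k1 = rhs(state); k2 = rhs(state + h / 2 * k1)
    k3 = rhs(state + h / 2 * k2); k4 = rhs(state + h * k3)
    return state + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

H = lambda s: 0.5 * (s[0] ** 2 + s[1] ** 2)   # the conserved energy
s = np.array([1.0, 0.0]); h = 0.01
E0 = H(s)
for _ in range(10000):                        # integrate up to t = 100
    s = rk4_step(s, h)
drift = abs(H(s) - E0)
print(drift < 1e-6)  # prints: True
```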
Remark 7.6.2. The semigroup of Frobenius–Perron operators {P_t}_{t≥0} corresponding to the system {S_t}_{t≥0} generated by equation (7.6.1) is continuous. To show this note that, since S_t is invertible (S_t^{-1} = S_{−t}), by Corollary 3.2.1 we have

    P_t f(x) = f(S_{−t}(x)) L_t(x),    (7.6.13)

where L_t is the Jacobian of the transformation S_{−t}. Thus, for every continuous f with compact support, ||P_t f − P_{t_0} f||_{L^1} → 0 as t → t_0, since the integrals are, in actuality, over a bounded set. Because continuous functions with compact support form a dense subset of L¹, this completes the proof that {P_t}_{t≥0} is continuous.

Much the same argument holds for the semigroup {U_t}_{t≥0} if we restrict ourselves to continuous functions with compact support. In this case, from the relation

    U_t f(x) = f(S_t(x)),

it immediately follows that U_t f is uniformly convergent to U_{t_0} f as t → t_0. For this class of functions the proof of Theorem 7.5.1 can be repeated, thus showing that equation (7.5.3) is true for f ∈ D(A_K). □

In the whole space L^∞, it may certainly be the case that {U_t}_{t≥0} is not a continuous semigroup. As an example, consider the differential equation

    dx/dt = −c
Theorem 7.7.1. Let (X, A, μ) be a measure space, and S_t: X → X a family of nonsingular transformations. Also let P_t: L¹ → L¹ be the Frobenius–Perron operator corresponding to {S_t}_{t≥0}. Then the measure

    μ_f(A) = ∫_A f(x) μ(dx)

is invariant with respect to {S_t}_{t≥0} if and only if

    P_t f = f    for all t ≥ 0.

The invariance of μ_f gives, for every A ∈ A,

    ∫_A P_t f(x) μ(dx) = ∫_{S_t^{-1}(A)} f(x) μ(dx) = ∫_A f(x) μ(dx),

which, with the definition of P_t, implies P_t f = f. The converse is equally straightforward. Moreover, if the semigroup {P_t}_{t≥0} is continuous, P_t f = f gives

    A_FP f = lim_{t→0} (P_t f − f)/t = 0.
Theorem 7.7.2. A semidynamical system {S_t}_{t≥0}, with nonsingular transformations S_t: X → X, is ergodic if and only if the fixed points of {U_t}_{t≥0} are constant functions.
Proof. The proof is quite similar to that of Theorem 4.2.1. First note that if {S_t}_{t≥0} is not ergodic then there is an invariant nontrivial subset C ⊂ X, that is,

    S_t^{-1}(C) = C    for t ≥ 0.

By setting f = 1_C, we have

    U_t f = 1_C ∘ S_t = 1_{S_t^{-1}(C)} = 1_C = f.

Since C is not a trivial set, f is not a constant function (cf. Theorem 4.2.1). Thus, if {S_t}_{t≥0} is not ergodic, then there is a nonconstant fixed point of {U_t}_{t≥0}.

Conversely, assume there exists a nonconstant fixed point f of {U_t}_{t≥0}. Then it is possible to find a number r such that the set

    C = {x: f(x) < r}

is nontrivial and satisfies S_t^{-1}(C) = C for all t ≥ 0. Thus, if the only solutions of A_K f = 0 are constant, then the semidynamical system {S_t}_{t≥0} must be ergodic.
Consider the d-dimensional torus

    T^d = S¹ × ... × S¹ = {(m_1,...,m_d): m_k = e^{i x_k}, x_k ∈ R, k = 1,...,d},

which, from a measure-theoretic point of view, may be identified with the Cartesian product

    [0,2π) × ... × [0,2π).    (7.7.1)

7.7. Applications of Semigroups

The coordinates x_k have an important geometrical interpretation since they are arc lengths on S¹. The natural Borel measure on S¹ is generated by these arc lengths and, by Fubini's theorem, these measures, in turn, generate a Borel measure on T^d. Thus, from a measure-theoretic point of view, we identify T^d with the Cartesian product (7.7.1), and the measure μ on T^d with the Borel measure on R^d. We have, in fact, used exactly this identification in the intuitively simpler cases d = 1 (r-adic transformation; see Example 4.1.1 and Remark 4.1.2) and d = 2 (Anosov diffeomorphism; see Example 4.1.4 and Remark 4.1.6). The disadvantage of this identification is that curves that are continuous on the torus may not be continuous on the Cartesian product (7.7.1).
Thus we consider a dynamical system {S_t}_{t∈R} that, in the coordinate system {x_k}, is defined by

    S_t(x_1,...,x_d) = (x_1 + w_1 t, ..., x_d + w_d t)    (mod 2π).

We call this system rotation on the torus with angular velocities w_1,...,w_d. Since det(dS_t(x)/dx) = 1, the transformation S_t preserves the measure. We will prove that {S_t}_{t∈R} is ergodic if and only if the angular velocities w_1,...,w_d are linearly independent over the ring of integers. This linear independence means that the only integers k_1,...,k_d satisfying

    k_1 w_1 + ... + k_d w_d = 0    (7.7.2)

are k_1 = ... = k_d = 0.
To prove this, we will use Theorem 7.7.2. Choose f ∈ L²(T^d) and assume U_t f = f for t ∈ R, where U_t f = f ∘ S_t is the group of Koopman operators corresponding to S_t. Write f as a Fourier series

    f(x) = Σ a_{k_1...k_d} exp[i(k_1 x_1 + ... + k_d x_d)],

where the summation is taken over all possible integers k_1,...,k_d. Substitution of this series into the identity f(x) = f(S_t(x)) yields

    Σ a_{k_1...k_d} exp[i(k_1 x_1 + ... + k_d x_d)] = Σ a_{k_1...k_d} exp[it(w_1 k_1 + ... + w_d k_d)] exp[i(k_1 x_1 + ... + k_d x_d)]    (7.7.3)

for all t and all sequences k_1,...,k_d. Equation (7.7.3) will be satisfied either when a_{k_1...k_d} = 0 or when (7.7.2) holds. If w_1,...,w_d are linearly independent, then the only Fourier coefficient that can be different from zero is a_{0...0}. In this case, then, f(x) = a_{0...0} is constant and the ergodicity of {S_t}_{t∈R} is proved.

Conversely, if the w_1,...,w_d are not linearly independent, and condition (7.7.2) is thus satisfied for a nontrivial sequence k_1,...,k_d, then (7.7.3) holds for a_{k_1...k_d} = 1. In this case the nonconstant function

    f(x) = exp[i(k_1 x_1 + ... + k_d x_d)]

satisfies f(x) = f(S_t(x)).
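This dichotomy is easy to observe numerically (an illustration of mine): along the trajectory x(t) = (w_1 t, w_2 t) the function f(x) = exp[i(k_1 x_1 + k_2 x_2)] depends only on (k_1 w_1 + k_2 w_2)t, so its time average can be computed directly; the horizon T and sample count below are arbitrary:

```python
import numpy as np

def time_average(w1, w2, k1, k2, T=20000.0, n=200000):
    # Average of f(x) = exp(i (k1 x1 + k2 x2)) along x(t) = (w1 t, w2 t) mod 2pi
    t = np.linspace(0.0, T, n)
    return np.mean(np.exp(1j * (k1 * w1 + k2 * w2) * t))

# Incommensurate w = (1, sqrt(2)): the average tends to the space mean 0.
a = time_average(1.0, np.sqrt(2), 1, -1)
# Commensurate w = (1, 2) with k = (2, -1): k.w = 0, so f is a nonconstant invariant.
b = time_average(1.0, 2.0, 2, -1)
print(abs(a) < 0.01, abs(b - 1.0) < 1e-12)  # prints: True True
```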
Remark 7.7.1. The reason why rotation on the torus is so important stems from its frequent occurrence in applied problems. As a simple example, consider a system of d independent and autonomous oscillators

    dq_k/dt = p_k,    dp_k/dt = −w_k² q_k,    k = 1,...,d,    (7.7.4)

where q_1,...,q_d are the positions of the oscillators and p_1,...,p_d are their corresponding velocities. For this system the total energy of each oscillator is given by

    E_k = (1/2) p_k² + (1/2) w_k² q_k²,    k = 1,...,d,

and it is clear that the E_k are constants of the motion. Assuming that E_1,...,E_d are given and positive, equations (7.7.4) may be solved to give

    q_k(t) = A_k sin(w_k t + a_k),    p_k(t) = A_k w_k cos(w_k t + a_k),

where A_k = √(2E_k)/w_k and the a_k are determined, modulo 2π, by the initial conditions of the system. Set p̄_k = p_k/(A_k w_k) and q̄_k = q_k/A_k so that the vector (p̄(t), q̄(t)) describes the position of a point on a d-dimensional torus moving with the angular velocities w_1,...,w_d. Thus, for fixed and positive E_1,...,E_d, all possible trajectories of the system (7.7.4) are described by the group {S_t}_{t∈R} of the rotation on the torus.
At first it might appear that the set of oscillators described by (7.7.4) is
a very special mechanical system. Such is not the case, as equations (7.7.4)
are approximations to a very general situation. We present an argument
below that supports this claim.
Consider a Hamiltonian system

    dq_k/dt = ∂H/∂p_k,    dp_k/dt = −∂H/∂q_k,    k = 1,...,d,

with a Hamiltonian of the form

    H(p,q) = (1/2) Σ_{j,k} a_{jk}(q) p_j p_k + V(q),    (7.7.5)

where the first term represents the kinetic energy and V is a potential function. Because the first term in H is associated with the kinetic energy, the quadratic form Σ_{j,k} a_{jk}(q) p_j p_k is symmetric and positive definite. Further, if q⁰ is a stable equilibrium point, then the matrix of second derivatives

    (∂²V/∂q_j ∂q_k)|_{q⁰},    j,k = 1,...,d,

is also positive definite (we neglect some special cases in which it might be semidefinite). Further, we assume that H(0,q⁰) = V(q⁰) = 0 since the potential is only defined up to an additive constant. Thus, developing H in a Taylor series in the neighborhood of (0,q⁰), and neglecting terms of order three and higher, we obtain

    H(p,q) = (1/2) Σ_{j,k} a_{jk} p_j p_k + (1/2) Σ_{j,k} b_{jk} (q_j − q_j⁰)(q_k − q_k⁰),    (7.7.6)

where a_{jk} = a_{jk}(q⁰) and b_{jk} = (∂²V/∂q_j∂q_k)|_{q⁰}. Both quadratic forms Σ_{j,k} a_{jk} and Σ_{j,k} b_{jk} are symmetric and positive definite. With approximation (7.7.6), the original Hamiltonian equations (7.7.5) may be rewritten as

    dq_k/dt = Σ_j a_{kj} p_j,    dp_k/dt = −Σ_j b_{kj} (q_j − q_j⁰),    (7.7.7)

where the variables p_k and q_k − q_k⁰ denote, respectively, the deviation of the system from the equilibrium point (0,q⁰).

Since the matrices A = (a_{jk}) and B = (b_{jk}) are symmetric and positive definite, there exists a nonsingular matrix C such that (Gantmacher, 1959) the kinetic form is reduced to the identity while the potential form becomes diagonal,

    C A Cᵀ = I,    (C⁻¹)ᵀ B C⁻¹ = diag(λ_1,...,λ_d),    (7.7.8)

and the corresponding canonical change of variables transforms (7.7.7) into d independent harmonic oscillators. This new system is completely equivalent to our system (7.7.4) of independent oscillators with angular velocities w_k² = λ_k.

Finally we note that, although our approximation shows the correspondence between rotation on the torus and Hamiltonian systems, the terms
Remark 7.7.2. Note that the statement and proof of Theorem 7.7.2 are virtually identical with the corresponding discrete time result given in Theorem 4.2.1. Indeed, necessary and sufficient conditions for ergodicity, mixing, and exactness using the Frobenius–Perron operator, identical to those in Theorem 4.4.1, can be stated by replacing n by t. Analogously, conditions for ergodicity and mixing in continuous time systems using the Koopman operator can be obtained from Proposition 4.4.1 by setting n = t. Since all of these conditions are completely equivalent we will not rewrite them for continuous time systems. □
Example 7.7.2. To illustrate the property of mixing in a continuous time system we consider a model for an ideal gas in R³ adapted from Cornfeld, Fomin, and Sinai [1982]. However, our proof of the mixing property is based on a different technique. At any given moment of time the state of this system is described by the set of pairs

    y = {(x_i, v_i)},

where x_i denotes the position, and v_i the velocity, of a particle. We emphasize that y is a set of pairs and not a sequence of pairs, which means that the coordinate pairs (x_i, v_i) are not taken in any specific order. Physically this means that the particles are not distinguishable. It is further assumed that the gas is sufficiently dilute, both in spatial position and in velocity, so that the only states that must be considered are such that in every bounded set B ⊂ R^6 there is, at most, a finite number of pairs (x_i, v_i).

The collection of all possible states of this gas will be denoted by Y, and we assume that the motion of each particle of the gas is governed by a group of transformations S_t: Y → Y given by

    S_t(y) = {s_t(x_i, v_i)},

where {s_t}_{t∈R} is the family of transformations in R^6 such that

    s_t(x,v) = (x + vt, v).

Thus particles move with a constant speed and do not interact. The surprising result, proved below, is that this system is mixing.
To study the asymptotic properties of {S_t}_{t∈R}, we must define a σ-algebra and a measure on Y. We do this by first introducing a special measure on R^6, which is the phase space for the motion of a single particle. Let g be a density on R³. As usual, the measure associated with g is

    m_g(A) = ∫_A g(v) dv

for every Borel set A ⊂ R³, and the measure m in R^6 = R³ × R³ is defined as the product of the usual Borel measure and m_g, that is,

    m(A × B) = (Borel measure of A) m_g(B).

From a physical point of view this definition of the measure simply reflects the fact that the particle positions are uniformly distributed in R³, whereas the velocities are distributed with a given density g, for instance, the Maxwellian g(v) = c exp(−|v|²).
With these comments we now proceed to define a σ-algebra and a measure on Y. Let B_1,...,B_n be a given sequence of bounded Borel subsets of R^6 for an arbitrary n, and k_1,...,k_n be a given sequence of integers. We use C(B_1,...,B_n; k_1,...,k_n) to denote the set of all y = {(x_i, v_i)} such that the number of elements (x_i, v_i) that belong to B_j is equal to k_j, that is,

    C(B_1,...,B_n; k_1,...,k_n) = {y ∈ Y: #(y ∩ B_1) = k_1, ..., #(y ∩ B_n) = k_n},    (7.7.9)

where #Z denotes the number of elements of the set Z. Sets of the form (7.7.9) are called cylinders. If the sets B_1,...,B_n are disjoint, then the cylinder is said to be proper. For every proper cylinder, we define

    μ(C(B_1,...,B_n; k_1,...,k_n)) = ([m(B_1)]^{k_1} ⋯ [m(B_n)]^{k_n} / (k_1! ⋯ k_n!)) exp[−Σ_{i=1}^n m(B_i)].    (7.7.10)

When the cylinder is not proper, its measure is defined by writing it as a disjoint union of proper cylinders. For example, with B_1⁰ = B_1 \ B_2, B⁰ = B_1 ∩ B_2, and B_2⁰ = B_2 \ B_1,

    C(B_1, B_2; k_1, k_2) = ∪_r C(B_1⁰, B⁰, B_2⁰; k_1 − r, r, k_2 − r),    (7.7.11)

so that

    μ(C(B_1, B_2; k_1, k_2)) = Σ_r ([m(B_1⁰)]^{k_1−r} [m(B⁰)]^r [m(B_2⁰)]^{k_2−r} / ((k_1−r)! r! (k_2−r)!)) exp[−m(B_1 ∪ B_2)].    (7.7.12)
It can be verified that μ extends to a measure on the σ-algebra A generated by the cylinders, and that the characteristic functions of proper cylinders form a linearly dense subset of L²(Y, A, μ). We omit the proof of these facts as they are quite technical in nature and, instead, turn to consider the asymptotic properties of the system {S_t}_{t∈R} on the phase space Y.
First we note that the measure μ is normalized. To show this, take an arbitrary bounded Borel set B. Then

    Y = ∪_{k=0}^∞ C(B; k)

since every y belongs to one of the cylinders C(B;k), namely, the one for which #(y ∩ B) = k. As the cylinders C(B;k), k = 0,1,..., are mutually disjoint, we have

    μ(Y) = Σ_{k=0}^∞ μ(C(B;k)) = Σ_{k=0}^∞ ([m(B)]^k / k!) e^{−m(B)} = 1.

Next observe that the transformations s_t preserve the measure m, since

    m(s_t(B_j)) = ∫∫_{s_t(B_j)} g(v) dx dv = ∫∫_{B_j} g(v) dx dv = m(B_j),

the substitution x → x + vt leaving the dx-integral unchanged for each fixed v.
223
and {lei, 1}
= lc (St(y)) = ls_e(Cl)(Y)
1
t-+OO
= p.(Cl)JJ.(C2)
(7.7.14)
We will verify that (7.7.14) holds only in the simplest case when each of the cylinders C_j is determined by only one bounded Borel set. Thus we assume

    C_j = C(B_j; k_j),    j = 1,2.    (7.7.15)

(This is not an essential simplification, since the argument proceeds in exactly the same way for arbitrary proper cylinders. However, in the general case the formulas are so complicated that the simple geometrical ideas behind the calculations are obscured.) When the C_j are given by (7.7.15), the right-hand side of equation (7.7.14) may be easily calculated by (7.7.10). Thus it remains to evaluate

    μ(S_{−t}(C_1) ∩ C_2).    (7.7.17)
Since S_{−t}(C_1) = C(s_{−t}(B_1); k_1), formula (7.7.12) applies with B_1 replaced by s_{−t}(B_1). Writing

    B_1^t = s_{−t}(B_1) \ B_2,    B^t = s_{−t}(B_1) ∩ B_2,    B_2^t = B_2 \ s_{−t}(B_1),    (7.7.18)

we obtain

    μ(S_{−t}(C_1) ∩ C_2) = Σ_r ([m(B_1^t)]^{k_1−r} [m(B^t)]^r [m(B_2^t)]^{k_2−r} / ((k_1−r)! r! (k_2−r)!)) exp[−m(B_1^t) − m(B^t) − m(B_2^t)].

Now

    m(B^t) = m(s_{−t}(B_1) ∩ B_2) = ∫∫ 1_{B_1}(x + vt, v) 1_{B_2}(x, v) g(v) dx dv,

and, since for every v ≠ 0 the set of x with (x,v) ∈ B_2 and (x + vt, v) ∈ B_1 shrinks to the empty set as t → ∞,

    lim_{t→∞} m(B^t) = 0,    lim_{t→∞} m(B_1^t) = m(B_1),    lim_{t→∞} m(B_2^t) = m(B_2).    (7.7.21)

Substituting these limits, only the r = 0 term survives, and μ(S_{−t}(C_1) ∩ C_2) converges to μ(C_1)μ(C_2) as given by (7.7.10).
The limits (7.7.21) reflect the fact that the sets s_{−t}(B_1) and B_2 are "almost" disjoint for large t. Taken together these produce the surprising result that mixing can appear in a system without particle interaction. □
Example 7.7.3. The preceding example gave a continuous time dynamical system that was mixing. The phase space of this system was infinite dimensional. This fact is not essential. There is a large class of finite-dimensional, mixing dynamical systems that play an important role in classical mechanics. In this example we briefly describe these systems. An exhaustive treatment requires highly specialized techniques from differential geometry and cannot be given within the measure-theoretic framework that we have adopted. All necessary information can be found in the books by Arnold and Avez [1968] and by Abraham and Marsden [1978], and in articles by Anosov [1967] and by Smale [1967].
Let M be a compact connected smooth Riemannian manifold. Having M, we define the sphere bundle Σ as the set of all pairs (m,ξ), where m is an arbitrary point of M and ξ is a unit tangent vector starting at m. For every (m,ξ) ∈ Σ there is a unique geodesic γ such that

    γ(0) = m,    γ'(0) = ξ,    ||γ'(t)|| = 1    for t ∈ R.    (7.7.22)

The dynamical system {S_t}_{t∈R} on Σ is then defined by

    S_t(m,ξ) = (γ(t), γ'(t))    for t ∈ R,

where the geodesic γ satisfies (7.7.22). This system is called a geodesic flow.
In the case dim M = 2, the geodesic flow has an especially simple interpretation: It describes the motion of a point that moves on the surface M in
the absence of external forces and without friction. The motion described
by the geodesic flow looks quite specific but, in fact, it represents a rather
general situation. If M is the configuration space of a mechanical system
with the typical Hamiltonian function (see Remark 7.7.1),
    H(q,p) = (1/2) Σ_{j,k} a_{jk}(q) p_j p_k + V(q),
(b) For each f ∈ L there exists a unique solution g ∈ D(A) of the resolvent equation

    λg − Ag = f;    (7.8.1)

(c) the solution satisfies

    λ||g||_L ≤ ||f||_L.    (7.8.2)

Then A generates a unique continuous semigroup of contractions {T_t}_{t≥0} given by

    T_t f = lim_{λ→∞} exp(tA_λ) f,    (7.8.3)

where A_λ = λ A R_λ and R_λ f = g (the resolvent operator) is the unique solution of λg − Ag = f.

Consult Dynkin [1965] or Dunford and Schwartz [1957] for the proof.

The operator A_λ = λ A R_λ can be written in several alternative forms, each of which is useful in different situations. Thus, after substitution of g = R_λ f into (7.8.1), we have

    λ R_λ f − A R_λ f = f.    (7.8.4)

By applying the operator R_λ to both sides of (7.8.1) and using g = R_λ f, we also obtain

    λ R_λ g − R_λ A g = g    for g ∈ D(A).    (7.8.5)

Comparison of (7.8.4) and (7.8.5) shows that

    R_λ A f = A R_λ f    for f ∈ D(A),    (7.8.6)

and, from (7.8.4),

    A_λ f = λ A R_λ f = λ² R_λ f − λf    for f ∈ L.    (7.8.7)
From (7.8.7), ||A_λ f||_L ≤ ||λ² R_λ f||_L + ||λf||_L ≤ 2λ||f||_L, so A_λ is a bounded operator and exp(tA_λ) may be defined by the series

    e^{tA_λ} f = Σ_{n=0}^∞ (tⁿ/n!) A_λⁿ f.    (7.8.12)

Using (7.8.7) this can be rewritten as

    e^{tA_λ} f = e^{−λt} Σ_{n=0}^∞ ((λt)ⁿ/n!) (λR_λ)ⁿ f.    (7.8.14)

Now assume that λR_λ preserves the integral, that is,

    λ ∫_X R_λ f(x) μ(dx) = ∫_X f(x) μ(dx)    for f ≥ 0.    (7.8.15)

Then, from (7.8.14),

    ∫_X e^{tA_λ} f(x) μ(dx) = e^{−λt} Σ_{n=0}^∞ ((λt)ⁿ/n!) ∫_X (λR_λ)ⁿ f(x) μ(dx) = e^{−λt} Σ_{n=0}^∞ ((λt)ⁿ/n!) ∫_X f(x) μ(dx) = ∫_X f(x) μ(dx),    (7.8.16)

and passing to the limit λ → ∞ in (7.8.3) gives

    ∫_X T_t f(x) μ(dx) = ∫_X f(x) μ(dx)    for f ≥ 0, t ≥ 0.
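In finite dimensions the whole construction can be carried out with matrices; the sketch below (using a two-state Markov generator chosen arbitrarily by me) builds the resolvent R_λ = (λI − A)^{-1}, the bounded operator A_λ = λAR_λ, and checks that exp(tA_λ) approximates exp(tA) for large λ, as in (7.8.3):

```python
import numpy as np

def expm(M, terms=60):
    # matrix exponential via its power series (adequate for small, mild matrices)
    out = np.eye(M.shape[0]); term = np.eye(M.shape[0])
    for n in range(1, terms):
        term = term @ M / n
        out = out + term
    return out

A = np.array([[-1.0, 1.0], [1.0, -1.0]])   # generator of a two-state Markov chain
t = 0.7
Tt = expm(t * A)                            # the semigroup T_t = e^{tA}

lam = 200.0
R = np.linalg.inv(lam * np.eye(2) - A)      # resolvent R_lambda = (lam I - A)^{-1}
A_lam = lam * A @ R                         # Yosida approximation A_lambda = lam A R_lambda
Tt_approx = expm(t * A_lam)                 # exp(t A_lambda), cf. (7.8.12)

err = np.max(np.abs(Tt - Tt_approx))
row_sums = Tt.sum(axis=1)                   # stochasticity: rows of e^{tA} sum to 1
print(err < 1e-2, np.max(np.abs(row_sums - 1.0)) < 1e-12)  # prints: True True
```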
Now, for f ≥ 0, the preservation of the integral reduces to

    ||f|| ≤ ||λR_λ f||.

This is always satisfied if λR_λ is a Markov operator, as we have shown in Section 3.1 [cf. inequality (3.1.6)].
The Hille–Yosida theorem has several other important applications. The first is that it provides an immediate and simple way to demonstrate that A_FP f = 0 is a sufficient condition for μ_f to be an invariant measure. Thus, Af = 0 implies, from (7.8.10), that A_λ f = 0 and, from (7.8.12),

    e^{tA_λ} f = f,

so that, by (7.8.3),

    T_t f = f    for all t ≥ 0.

Thus, in the special case A_FP f = 0, this implies that P_t f = f and thus μ_f is invariant.
By combining this result with that of Section 7.7, we obtain the following
theorem.
Theorem 7.8.2. Let {S_t}_{t≥0} be a semidynamical system such that the corresponding semigroup of Frobenius–Perron operators is continuous. Under this condition, an absolutely continuous measure μ_f is invariant if and only if A_FP f = 0.

Consider the special case where A_FP is the infinitesimal operator for a d-dimensional system of ordinary differential equations [cf. equation (7.6.10)]. Then the necessary and sufficient condition that μ_f be invariant, that is, A_FP f = 0, reduces to

    Σ_{i=1}^d ∂(f F_i)/∂x_i = 0.    (7.8.17)
Remark 7.8.1. Equation (7.8.17) is a necessary and sufficient condition for the invariance of the measure

    μ_f(A) = ∫_A f(x) μ(dx).

In particular, the Borel measure μ(A) = ∫_A dx is invariant if and only if

    Σ_{i=1}^d ∂F_i/∂x_i = 0,    (7.8.18)

which is obtained by substituting f ≡ 1 into (7.8.17).

Remark 7.8.2. It is quite straightforward to show that Hamiltonian systems (see Example 7.6.1) satisfy (7.8.18) since

    Σ_{i=1}^n [∂/∂q_i (∂H/∂p_i) + ∂/∂p_i (−∂H/∂q_i)] = 0.
Then {S_t}_{t≥0} is ergodic if and only if A_K g = 0 has only constant solutions in L^∞.

Proof. The "if" part follows from Theorem 7.7.2. The proof of the "only if" part is more difficult since, in general, the semigroup {U_t} is not continuous and we cannot use the Hille–Yosida theorem. Thus, assume that A_K g = 0 for some nonconstant g. Choose an arbitrary f ∈ L¹ and define the real-valued function ψ by the formula ψ(t) = ⟨f, U_t g⟩. Then

    (ψ(t+h) − ψ(t))/h = ⟨f, (U_{t+h} g − U_t g)/h⟩ = ⟨P_t f, (U_h g − g)/h⟩    for h > 0, t ≥ 0.

Passing to the limit as h → 0, we obtain

    ψ'(t) = ⟨P_t f, A_K g⟩ = 0,

so that ψ is constant and

    ⟨f, U_t g − g⟩ = ψ(t) − ψ(0) = 0    for t ≥ 0.

Since f ∈ L¹ was arbitrary, g is a nonconstant fixed point of {U_t}_{t≥0}, and by Theorem 7.7.2 the system is not ergodic.
For a Hamiltonian system the Koopman infinitesimal operator is

    A_K f = Σ_{i=1}^n [(∂f/∂q_i)(∂H/∂p_i) − (∂f/∂p_i)(∂H/∂q_i)] = [f,H].

Since

    [f(H), H] = (df/dH)[H,H] = 0,

every smooth function of H is a fixed point, and therefore Hamiltonian systems are not ergodic on the whole space. However, if we fix the total energy, or the energy for each degree of freedom as in Remark 7.7.1, then the system may become ergodic. □
Consider the operator

    Af = d²f/dx²,    (7.9.1)

which can, of course, only be defined for some f ∈ L¹. Let D(A) be the set of all f ∈ L¹ such that f''(x) exists almost everywhere, is integrable on R, and

    f'(x) = f'(0) + ∫_0^x f''(s) ds.

In other words, D(A) is the set of all f such that f' is absolutely continuous and f'' is integrable on R. We will show that there is a unique semigroup corresponding to the infinitesimal operator A.

The set D(A) is evidently dense in L¹ (even the set of C^∞ functions is dense in L¹), therefore we may concentrate on verifying properties (b) and (c) of the Hille–Yosida theorem.

The resolvent equation (7.8.1) has the form

    λg − d²g/dx² = f,    (7.9.2)
which is a second-order ordinary differential equation in the unknown function g. Using standard arguments, the general solution of (7.9.2) may be written as

    g(x) = C_1 e^{−ax} + C_2 e^{ax} + (1/2a) ∫_{x_0}^x e^{−a(x−y)} f(y) dy − (1/2a) ∫_{x_1}^x e^{a(x−y)} f(y) dy,

where a = √λ, and C_1, C_2, x_0, and x_1 are arbitrary constants. To be specific, pick x_0 = −∞, x_1 = +∞, and set

    K(x−y) = (1/2a) e^{−a|x−y|}.    (7.9.3)

Then the solution of (7.9.2) can be written in the more compact form

    g(x) = C_1 e^{−ax} + C_2 e^{ax} + ∫_{−∞}^∞ K(x−y) f(y) dy.    (7.9.4)

Note that, since ∫_{−∞}^∞ K(s) ds = 1/a² = 1/λ, the convolution term satisfies

    ∫_{−∞}^∞ {∫_{−∞}^∞ K(x−y) f(y) dy} dx = (1/λ) ∫_{−∞}^∞ f(y) dy.    (7.9.5)

Thus, since neither exp(−ax) nor exp(ax) is integrable over R, a necessary and sufficient condition for g to be integrable over R is that C_1 = C_2 = 0. In this case we have shown that the resolvent equation (7.9.2) has a unique solution g ∈ L¹ given by

    g(x) = ∫_{−∞}^∞ K(x−y) f(y) dy.    (7.9.6)
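That (7.9.6) really solves λg − g'' = f can be confirmed numerically (an illustration of mine; λ = 2, a Gaussian f, and the grid are arbitrary): compute g = K * f by quadrature and check the residual with a central second difference:

```python
import numpy as np

lam = 2.0
a = np.sqrt(lam)
x = np.linspace(-15.0, 15.0, 2001)
dx = x[1] - x[0]
f = np.exp(-x ** 2)                       # a convenient integrable right-hand side

# g = K * f with K(s) = (1/2a) e^{-a|s|}, as in (7.9.3) and (7.9.6)
K = np.exp(-a * np.abs(x[:, None] - x[None, :])) / (2 * a)
g = K @ f * dx

# residual of lam*g - g'' = f at interior points (second central difference)
gpp = (g[2:] - 2 * g[1:-1] + g[:-2]) / dx ** 2
resid = lam * g[1:-1] - gpp - f[1:-1]
print(np.max(np.abs(resid)) < 5e-3)  # prints: True
```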
Denote by φ_f(w) the Fourier transform of f. For the kernel K of (7.9.3) a direct calculation gives

    φ_K(w) = 1/(λ + w²),

where λ = a². Since, by (7.9.6), R_λ f is the convolution of the functions K and f, and it is well known that

    φ_{f*g}(w) = φ_f(w) φ_g(w),    (7.9.8)

where f*g denotes the convolution of f with g, the Fourier transformation of R_λ f is

    φ_{R_λ f}(w) = φ_f(w)/(λ + w²).

As a consequence, the Fourier transformation of the series in (7.9.7) is

    e^{−λt} Σ_{n=0}^∞ (tⁿ λ^{2n}/n!)(λ + w²)^{−n} φ_f(w) = exp[λ²t/(λ + w²) − λt] φ_f(w) = exp[−λtw²/(λ + w²)] φ_f(w).

Passing to the limit λ → ∞, this converges to e^{−tw²} φ_f(w), which is the Fourier transform of

    T_t f(x) = (1/√(4πt)) ∫_{−∞}^∞ exp[−(x−y)²/4t] f(y) dy.    (7.9.9)

Hence, using the semigroup method we have shown that u(t,x) = T_t f(x) is the solution of the heat equation

    ∂u/∂t = ∂²u/∂x²    with    u(0,x) = f(x).
Remark 7.9.1. Analogous results hold in higher dimensions. In particular, the Laplacian

    Af = Δf = Σ_{i=1}^n ∂²f/∂x_i²    (7.9.10)

and, more generally, elliptic second-order operators, as well as the operator Af = d²f/dx² on an interval [x_a, x_b] with the boundary conditions

    df/dx = 0    at x = x_a and x = x_b,

are infinitesimal operators for stochastic semigroups. More details concerning such general elliptic operators may be found in Dynkin [1965].

Finally, we note that semigroups generated by second-order differential operators are never semigroups of Frobenius–Perron operators for a semidynamical system and, thus, cannot arise from deterministic processes. This is quite contrary to the situation for first-order differential operators, as already discussed in Section 7.8.

Remark 7.9.2. Equation (7.8.3) of the Hille–Yosida theorem allows the construction of the semigroup {T_t}_{t≥0} if the resolvent operator R_λ is known. As it turns out, the construction of the resolvent operator when the continuous semigroup of contractions is given is even simpler. Thus it can be shown that (Dynkin, 1965)

    R_λ f = ∫_0^∞ e^{−λt} T_t f dt    for f ∈ L, λ > 0.    (7.9.12)
In (7.9.12) the integral on the half-line [0,∞) is considered as the limit of Riemann integrals on [0,a] as a → ∞. This limit exists since

    ∫_0^∞ e^{−λt} ||T_t f|| dt ≤ (1/λ)||f|| < ∞.

It is an immediate consequence of (7.9.12) that for every stochastic semigroup T_t: L¹ → L¹, the operator λR_λ is a Markov operator. To show this note first that, for f ≥ 0, equation (7.9.12) implies λR_λ f ≥ 0. Furthermore, for f ≥ 0,

    ||R_λ f|| = ∫_R R_λ f(x) dx = ∫_0^∞ e^{−λt} {∫_R T_t f(x) dx} dt = ∫_0^∞ e^{−λt} ||f|| dt = (1/λ)||f||.

Observe also that, for every g ∈ L^∞,

    ⟨g, R_λ f⟩ = ∫_0^∞ e^{−λt} ⟨g, T_t f⟩ dt    for λ > 0,

which shows that ⟨g, R_λ f⟩, as a function of λ, is the Laplace transformation of ⟨g, T_t f⟩ with respect to t. Since the Laplace transformation is one to one, this implies that ⟨g, T_t f⟩ is uniquely determined by ⟨g, R_λ f⟩. Further, since g ∈ L^∞ is arbitrary, {T_t f} is uniquely determined by {R_λ f}. The same argument also shows that for a bounded continuous function u(t), with values in L¹, the equality

    R_λ f = ∫_0^∞ e^{−λt} u(t) dt    for all λ > 0

implies u(t) = T_t f.
Here the semigroup {P_t}_{t≥0} generated by A_0 = A + P − I is given by

    P_t f = e^{−t} Σ_{n=0}^∞ T_n(t) f,    f ∈ L¹,    (7.9.13)

where T_0(t) = T_t and

    T_{n+1}(t) f = ∫_0^t T_0(t − r) P T_n(r) f dr,    n = 0,1,... .    (7.9.14)

Proof. For the operator A_0, the resolvent equation

    λg − A_0 g = f    (7.9.15)

may be rewritten as

    (λ + 1)g − Ag = f + Pg,    (7.9.16)

whose solution, obtained by iteration, is

    g = R_λ(A_0) f = Σ_{n=0}^∞ [R_{λ+1}(A) P]ⁿ R_{λ+1}(A) f.    (7.9.17)

Since T_t is stochastic and P is a Markov operator, for f ≥ 0,

    ||R_λ(A_0) f|| = Σ_{n=0}^∞ (1/(λ+1))^{n+1} ||f|| = (1/λ)||f||    for f ≥ 0.

Thus λR_λ(A_0) is a Markov operator and A_0 satisfies all of the assumptions of the Hille–Yosida theorem. Hence the infinitesimal operator A_0 generates a unique stochastic semigroup and the first part of the theorem is proved.
Now we show that this semigroup is given by equations (7.9.13) and (7.9.14). Using (7.9.14) it is easy to show by induction that

    ||T_n(t) f|| ≤ (tⁿ/n!) ||f||,    (7.9.18)

so the series in (7.9.13) converges. Define

    Q_{λ,n} f = ∫_0^∞ e^{−λt} T_n(t) f dt,    n = 0,1,...,

so that Q_{λ,0} = R_λ(A),
and, by (7.9.14) and a change of the order of integration,

    Q_{λ,n} f = ∫_0^∞ e^{−λt} {∫_0^t T_0(t − r) P T_{n−1}(r) f dr} dt
             = ∫_0^∞ e^{−λr} {∫_0^∞ e^{−λs} T_0(s) P T_{n−1}(r) f ds} dr
             = ∫_0^∞ e^{−λs} T_0(s) P {∫_0^∞ e^{−λr} T_{n−1}(r) f dr} ds
             = R_λ(A) P Q_{λ,n−1} f.
Hence, by induction, we have

    Q_{λ,n} = [R_λ(A) P]ⁿ R_λ(A).

Define

    Q_λ f = ∫_0^∞ e^{−λt} P_t f dt.

Then, by (7.9.13),

    Q_λ f = Σ_{n=0}^∞ ∫_0^∞ e^{−(λ+1)t} T_n(t) f dt = Σ_{n=0}^∞ Q_{λ+1,n} f = Σ_{n=0}^∞ [R_{λ+1}(A) P]ⁿ R_{λ+1}(A) f.

By comparing this result with (7.9.17), we see that

    Q_λ = R_λ(A_0).    (7.9.19)

From (7.9.19) (see also the end of Remark 7.9.2), it follows that {P_t}_{t≥0} is the semigroup corresponding to A_0.
Consider the integro-differential equation

    ∂u(t,x)/∂t + u(t,x) = (σ²/2) ∂²u(t,x)/∂x² + ∫_{−∞}^∞ K(x,y) u(t,y) dy,    t > 0, x ∈ R,    (7.9.20)

with the initial condition

    u(0,x) = φ(x),    x ∈ R,    (7.9.21)

where the kernel K is stochastic, that is,

    K(x,y) ≥ 0    and    ∫_{−∞}^∞ K(x,y) dx = 1.

To treat the initial value problem, equations (7.9.20) and (7.9.21), using semigroup theory, we rewrite it in the form

    du/dt = (A + P − I)u,    u(0) = φ,    (7.9.22)

where A = (σ²/2) d²/dx² generates the semigroup [cf. (7.9.9)]

    T_t f(x) = (1/√(2σ²πt)) ∫_{−∞}^∞ exp[−(x−y)²/(2σ²t)] f(y) dy    (7.9.23)

and

    P f(x) = ∫_{−∞}^∞ K(x,y) f(y) dy.
where

    g_t = Σ_{n=1}^∞ P T_{n−1}(t) φ.

Thus, using (7.9.23) (with T_0(t) = T_t), we have the explicit representation

    u(t,x) = e^{−t} T_t φ(x) + e^{−t} ∫_0^t (1/√(2σ²π(t − r))) {∫_{−∞}^∞ exp[−(x−y)²/(2σ²(t − r))] g_r(y) dy} dr.
A similar technique applies to the equation

    ∂u(t,x)/∂t + ∂u(t,x)/∂x + u(t,x) = ∫_x^∞ K(x,y) u(t,y) dy,    t > 0, x ≥ 0,    (7.9.24)

with

    u(t,0) = 0    and    u(0,x) = φ(x),    (7.9.25)

where the kernel K satisfies

    K(x,y) ≥ 0    and    ∫_0^y K(x,y) dx = 1.    (7.9.26)
Equation (7.9.24) occurs in queuing theory and astrophysics [Bharucha-Reid, 1960]. In its astrophysical form,

    K(x,y) = (1/y) ψ(x/y),    (7.9.27)

and, with this specific expression for K, equation (7.9.24) is called the Chandrasekhar–Münch equation. As developed by Chandrasekhar and Münch [1952], equation (7.9.24) with K as given by (7.9.27) describes fluctuations in the brightness x of the Milky Way as a function of the extent of the system t along the line of sight. The unknown function u(t,x) is the probability density of the fluctuations, and the given function ψ in (7.9.27) is related to the probability density of light transmission through interstellar gas clouds. This function satisfies

    ψ(z) ≥ 0    and    ∫_0^1 ψ(z) dz = 1.    (7.9.28)
In this case the semigroup generated by the differentiation operator is

    T_t f(x) = 1_{[0,∞)}(x − t) f(x − t),    (7.9.29)

and the initial value problem (7.9.24)–(7.9.25) may be rewritten as

    du/dt = (A + P − I)u,    u(0) = φ,    (7.9.30)

where

    P f(x) = ∫_x^∞ K(x,y) f(y) dy.

By Theorem 7.9.1 there is a unique continuous semigroup {P_t}_{t≥0} corresponding to the infinitesimal operator A + P − I. For every φ ∈ D(A), the function u(t) = P_t φ is a solution of (7.9.30). □
    ∫_X g(x) P_t f(x) μ(dx) = ∫_X f(x) g(S_t(x)) μ(dx).

For some A ⊂ X such that A and S_t(A) are in A, take f(x) = 0 for all x ∉ A and g = 1_{X\S_t(A)}, so the preceding formula becomes

    ∫_X 1_{X\S_t(A)}(x) P_t f(x) μ(dx) = ∫_X f(x) 1_{X\S_t(A)}(S_t(x)) μ(dx) = ∫_A f(x) 1_{X\S_t(A)}(S_t(x)) μ(dx).

The right-hand side of this equation is obviously equal to zero since S_t(x) ∈ S_t(A) for x ∈ A. The left-hand side is, however, just the L¹ norm of the integrand, so that

    ||1_{X\S_t(A)} P_t f|| = 0,

or

    P_t f(x) = 0    for x ∉ S_t(A).    (7.10.1)
A similar result holds for the Koopman operator
$$U_t f(x) = f(S_t(x)). \tag{7.10.2}$$
Assume $f \in L^{\infty}$ and $f(x) = 0$ for $x \notin A$. Then $f(S_t(x)) = 0$ if $S_t(x) \notin A$, so that
$$U_t f(x) = 0 \qquad\text{for } x \notin S_t^{-1}(A). \tag{7.10.3}$$
Now assume that $\{S_t\}$ is a group of transformations, so that $S_t^{-1}(x) = S_{-t}(x)$. If, in addition, the group $\{S_t\}$ preserves the measure $\mu$, we have
$$\int_A P_t f(x)\,\mu(dx) = \int_{S_{-t}(A)} f(x)\,\mu(dx) = \int_A f(S_{-t}(x))\,\mu(dx),$$
which gives $P_t f(x) = f(S_{-t}(x))$, or
$$P_t f(x) = U_{-t} f(x). \tag{7.10.4}$$
Equation (7.10.4) makes totally explicit our earlier comments on the forward and backward transport of densities in time by the Frobenius–Perron and Koopman operators.
Furthermore, from (7.10.4) we have directly that
$$\lim_{t\to 0}\frac{P_t f - f}{t} = \lim_{t\to 0}\frac{U_{-t} f - f}{t} \tag{7.10.5}$$
for $f$ in a dense subset of $L^1$. This relation was previously derived, although not explicitly stated, for dynamical systems generated by a system of ordinary differential equations [cf. equations (7.6.5) and (7.6.10)].
Remark 7.10.1. Equation (7.10.4) may, in addition, be interpreted as saying that the operator adjoint to $P_t$ is also its inverse. In the terminology of Hilbert spaces [and thus in $L^2(X)$] this means simply that $\{P_t\}$ is a semigroup of unitary operators. The original discovery that $\{U_t\}$, generated by a group $\{S_t\}$ of measure-preserving transformations, forms a group of unitary operators is due to Koopman [1931]. It was later used by von Neumann [1932] in his proof of the statistical ergodic theorem. □

Remark 7.10.2. Equation (7.10.1) can sometimes be used to show that a semigroup of Markov operators cannot arise from a deterministic dynamical system, which means that it is not a semigroup of Frobenius–Perron operators for any semidynamical system $\{S_t\}_{t\ge 0}$.
For example, consider the semigroup $\{P_t\}$ given by equations (7.4.11) and (7.4.12). Setting $f(y) = 1_{[0,1]}(y)$, we have
$$P_t f(x) = \frac{1}{\sqrt{2\pi\sigma^2 t}}\int_0^1 \exp\Big[-\frac{(x-y)^2}{2\sigma^2 t}\Big]\,dy > 0 \qquad\text{for all } x. \tag{7.10.6}$$
However, according to (7.10.1), if $P_t f(x)$ were the Frobenius–Perron operator generated by a semidynamical system $\{S_t\}_{t\ge 0}$, then it should be zero outside a bounded interval $S_t([0,1])$. [The interval $S_t([0,1])$ is a bounded interval since a continuous function maps bounded intervals into bounded intervals.] Thus $\{P_t\}$, where $P_t f(x)$ is given by (7.10.6), does not correspond to any semidynamical system. □
7.11

A stochastic semigroup $\{P_t\}_{t\ge 0}$ is called sweeping with respect to a family $\mathcal{A}_* \subset \mathcal{A}$ if
$$\lim_{t\to\infty}\int_A P_t f(x)\,\mu(dx) = 0 \qquad\text{for every } f \in D \text{ and } A \in \mathcal{A}_*. \tag{7.11.1}$$
As in the discrete time case, it is easy to verify that condition (7.11.1) for a sweeping semigroup $\{P_t\}_{t\ge 0}$ also holds for every $f \in L^1(X)$. Alternately, if $D_0 \subset D$ is dense in $D$, then it is sufficient to verify (7.11.1) for $f \in D_0$.
In the special case that $X \subset R$ is an interval (bounded or not) with endpoints $\alpha$ and $\beta$, $\alpha < \beta$, we will use notions analogous to those in Definition 5.9.2. Namely, we will say that a stochastic semigroup $P_t\colon L^1(X)\to L^1(X)$ is sweeping to $\alpha$, sweeping to $\beta$, or simply sweeping if it is sweeping with respect to the families $\mathcal{A}_0$, $\mathcal{A}_1$, or $\mathcal{A}_2$ defined in equations (5.9.5)–(5.9.7), respectively.
Example 7.11.1. Let $X = R$. We consider the semigroups generated by the infinitesimal operators $c\,d/dx$ and $(\sigma^2/2)\,d^2/dx^2$ discussed in Example 7.5.1 and Remark 7.9.1.
The operator $c\,d/dx$ corresponds to the semigroup
$$P_t f(x) = f(x - ct),$$
which, for $c > 0$, is sweeping to $+\infty$ and, for $c < 0$, to $-\infty$. The verification of these properties is analogous to the procedure in Example 5.9.1. Thus, for $c > 0$ we have
$$\lim_{t\to\infty}\int_{-\infty}^{b} P_t f(x)\,dx = \lim_{t\to\infty}\int_{-\infty}^{b} f(x-ct)\,dx = \lim_{t\to\infty}\int_{-\infty}^{b-ct} f(y)\,dy = 0$$
for every $f \in D$; for $c < 0$ the argument is analogous.
The operator $(\sigma^2/2)\,d^2/dx^2$ corresponds to the semigroup
$$P_t f(x) = \frac{1}{\sqrt{2\pi\sigma^2 t}}\int_{-\infty}^{\infty}\exp\Big[-\frac{(x-y)^2}{2\sigma^2 t}\Big]\,f(y)\,dy,$$
which is sweeping, since, for every $f \in D$ and every bounded interval $[a,b]$,
$$\int_a^b P_t f(x)\,dx \le \frac{b-a}{\sqrt{2\pi\sigma^2 t}} \to 0 \qquad\text{as } t\to\infty. \quad\square$$
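The sweeping of the heat semigroup can be seen numerically: for $f = 1_{[0,1]}$ the mass remaining in a fixed interval decays like $(b-a)/\sqrt{2\pi\sigma^2 t}$. A short Python sketch (the closed-form evaluation of the convolution via the error function is a standard identity; the interval and function names are illustrative choices):

```python
import math

def heat_density(x, t, sigma2=1.0):
    # P_t f for f = 1_[0,1]: Gaussian convolution in closed form via erf
    s = math.sqrt(2.0 * sigma2 * t)
    return 0.5 * (math.erf(x / s) - math.erf((x - 1.0) / s))

def mass(a, b, t, n=2000):
    # trapezoid approximation of the mass remaining in [a, b] at time t
    h = (b - a) / n
    vals = [heat_density(a + i * h, t) for i in range(n + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

a, b, sigma2 = -1.0, 2.0, 1.0
for t in (1.0, 10.0, 100.0, 1000.0):
    bound = (b - a) / math.sqrt(2 * math.pi * sigma2 * t)
    print(t, mass(a, b, t), bound)   # mass stays below the bound and decays
```

The printed masses decrease with $t$ and never exceed the theoretical bound, as the inequality in the example predicts.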
Comparing Examples 5.9.1, 5.9.2, and 7.11.1, we observe that the sweeping property of a semigroup $\{P_t\}_{t\ge 0}$ appears simultaneously with the sweeping of the sequence $\{P_{t_0}^n\}$ for some $t_0 > 0$. This is not a coincidence. It is evident from Definitions 5.9.1 and 7.11.1 that if $\{P_t\}_{t\ge 0}$ is sweeping, then $\{P_{t_0}^n\}$ is also sweeping for an arbitrary $t_0 > 0$. The converse is more delicate, but is assured by the following result.

Theorem 7.11.1. Let $(X,\mathcal{A},\mu)$ be a measure space, $\mathcal{A}_* \subset \mathcal{A}$ be a given family of measurable sets, and $P_t\colon L^1(X)\to L^1(X)$ a continuous stochastic semigroup. If for some $t_0 > 0$ the sequence $\{P_{t_0}^n\}$ is sweeping, then the semigroup $\{P_t\}_{t\ge 0}$ is also sweeping.
Proof. Fix an $\varepsilon > 0$ and $f \in D$. Since the semigroup is continuous, there is a $\delta > 0$ such that
$$\|P_t f - f\| \le \varepsilon \qquad\text{for } 0 \le t \le \delta.$$
Let
$$0 = s_0 < s_1 < \cdots < s_k = t_0$$
be a partition of the interval $[0, t_0]$ such that $s_i - s_{i-1} \le \delta$ for $i = 1,\ldots,k$. Every $t \ge 0$ may be written in the form $t = n t_0 + s_i + r$ with $0 \le r < \delta$. Defining $f_i = P_{s_i} f$, we have
$$\|P_t f - P_{t_0}^n f_i\| = \|P_{n t_0 + s_i}(P_r f - f)\| \le \varepsilon,$$
and consequently, for every $A \in \mathcal{A}_*$,
$$\int_A P_t f(x)\,\mu(dx) \le \int_A P_{t_0}^n f_i(x)\,\mu(dx) + \varepsilon.$$
Since each sequence $\{P_{t_0}^n f_i\}$ is sweeping and $\varepsilon$ is arbitrary, condition (7.11.1) follows. □
for $f \in D$.

Proof. Since the operator $P_{t_0}$ satisfies the conditions of Proposition 5.9.1, the sequence $\{P_{t_0}^n\}$ is sweeping. Theorem 7.11.1 completes the proof. □
More sophisticated applications of Theorem 7.11.1 will be given in the next section.
$$f_*(x) = \frac{1}{t_0}\int_0^{t_0} P_t f_0(x)\,dt$$
is a density and satisfies $P_t f_* = f_*$ for all $t \ge 0$. That $f_*$ is a density follows from
$$\int_X f_*(x)\,\mu(dx) = \frac{1}{t_0}\int_0^{t_0}\Big[\int_X P_t f_0(x)\,\mu(dx)\Big]\,dt = 1.$$
Furthermore,
$$P_t f_* = \frac{1}{t_0}\int_0^{t_0} P_{s+t} f_0\,ds = \frac{1}{t_0}\int_t^{t_0+t} P_s f_0\,ds = \frac{1}{t_0}\int_t^{t_0} P_s f_0\,ds + \frac{1}{t_0}\int_{t_0}^{t_0+t} P_s f_0\,ds,$$
and, since $P_{s+t_0} f_0 = P_s P_{t_0} f_0 = P_s f_0$,
$$\frac{1}{t_0}\int_{t_0}^{t_0+t} P_s f_0\,ds = \frac{1}{t_0}\int_0^{t} P_s f_0\,ds,$$
so that
$$P_t f_* = \frac{1}{t_0}\int_t^{t_0} P_s f_0\,ds + \frac{1}{t_0}\int_0^{t} P_s f_0\,ds = f_*.$$
Now, using Theorems 5.9.1, 5.9.2, and 7.12.1, it is easy to establish the following alternative.

Theorem 7.12.1. Let $(X,\mathcal{A},\mu)$ be a measure space, and $\mathcal{A}_* \subset \mathcal{A}$ be a given regular family of measurable sets. Furthermore, let $P_t\colon L^1(X)\to L^1(X)$ be a continuous stochastic semigroup such that for some $t_0 > 0$ the operator $P_{t_0}$ satisfies the conditions of Theorem 5.10.1. Under these conditions, the semigroup $\{P_t\}_{t\ge 0}$ either has an invariant density, or it is sweeping. If an invariant density exists and, in addition, $P_{t_0}$ is an expanding operator, then the semigroup is asymptotically stable.
Proof. The proof is quite straightforward. Assume first that $\{P_t\}_{t\ge 0}$ is not sweeping, so by Theorem 7.11.1 the sequence $\{P_{t_0}^n\}$ is also not sweeping. In this case, by Theorem 5.10.1, the operator $P_{t_0}$ has an invariant density. Proposition 7.12.1 then implies that $\{P_t\}_{t\ge 0}$ must have an invariant density. In the particular case that $P_{t_0}$ is also an expanding operator, it follows from Theorem 5.10.2 that $\{P_{t_0}^n\}$ is asymptotically stable. Finally, Remark 7.4.2 implies that $\{P_t\}_{t\ge 0}$ is also asymptotically stable.
In the second case that $\{P_t\}_{t\ge 0}$ is sweeping, $\{P_{t_0}^n\}$ is also, and by Theorem 5.10.1 the operator $P_{t_0}$ does not have an invariant density. As a consequence, $\{P_t\}_{t\ge 0}$ also does not have an invariant density. □
Exercises

7.1. Let $A\colon L\to L$ be a linear bounded operator, that is,
$$\|Af\| \le \|A\|\,\|f\| \qquad\text{for } f \in L, \text{ with } \|A\| < \infty.$$
Prove that the series
$$e^{tA} f = \sum_{n=0}^{\infty}\frac{t^n}{n!}A^n f$$
converges for every $f \in L$.

7.2. Again, let $A\colon L\to L$ be a linear bounded operator. Using the results of Exercise 7.1, prove that $A$ is the infinitesimal operator of the semigroup
$$T_t f = e^{tA} f$$
and that $\mathcal{D}(A) = L$.
7.3. An operator $A\colon \mathcal{D}(A)\to L$ is called closed if the conditions
$$\|f_n - f\| \to 0, \qquad \|Af_n - g\| \to 0, \qquad f_n \in \mathcal{D}(A), \qquad f, g \in L,$$
imply that $f \in \mathcal{D}(A)$ and $g = Af$. Prove that the following operators are closed:
(a) The operator $Af = df/dx$ defined on the set $\mathcal{D}(A) \subset L^1$ of all absolutely continuous $f \in L^1$ such that $f' \in L^1$.
(b) The operator $Af = d^2f/dx^2$ defined on the set $\mathcal{D}(A) \subset L^1$ of all $f \in L^1$ such that $f'$ is absolutely continuous and $f'' \in L^1$.
7.4. Generalize the previous results and show that every operator $A$ satisfying the conditions of the Hille–Yosida theorem is closed.

7.5. In Section 7.9, using the Hille–Yosida theorem, we proved that $A = d^2/dx^2$ generates the semigroup $\{T_t\}$ given by formula (7.9.9). Reverse the calculation: assuming that $\{T_t\}$ is defined by (7.9.9), show that $A = d^2/dx^2$ is its infinitesimal operator.
7.6. Consider the equation
$$\frac{\partial u}{\partial t} + \frac{\partial}{\partial x}\big(F(x)u\big) = 0, \qquad u(0,x) = f(x), \qquad\text{for } x \in R,$$
and construct an example showing that for $r > 1$ the semigroup $\{P_t\}$ may not be well defined or stochastic.
7.7. Consider the semigroup $\{P_t\}$ defined in the previous exercise (with $r = 1$). Show that
(a) $\{P_t\}$ is not asymptotically stable;
(b) $\{P_t\}$ is sweeping to $+\infty$ if and only if all solutions of the equation $x' = F(x)$ satisfy $\lim_{t\to\infty} x(t) = \infty$.
7.8. Prove that the equation
$$\frac{\partial u(t,x)}{\partial t} = \frac{\partial^2 u(t,x)}{\partial x^2} + \int_0^{\pi} K(x,y)\,u(t,y)\,dy,$$
with the boundary conditions $u_x(t,0) = u_x(t,\pi) = 0$ and the initial condition $u(0,x) = f(x)$, generates the stochastic semigroup $P_t f(x) = u(t,x)$ on the space $L^1([0,\pi])$. In particular, define precisely the domain $\mathcal{D}(A)$ of $A = d^2/dx^2$ for which the conditions of the Hille–Yosida theorem are satisfied (Jama, 1986).
8
Discrete Time Processes Embedded
in Continuous Time Systems
8.1
FIGURE 8.1.1. Determination of the first return (or Poincaré) map for a semidynamical system $\{S_t\}_{t\ge 0}$.
Also assume that we can find a closed set $A \subset X$ such that, if $x \in A$, then, for $t > 0$ sufficiently small, $S_t(x) \notin A$, that is, each trajectory leaves $A$ immediately (see Figure 8.1.1). Further, if every trajectory that starts in $A$ eventually returns to $A$, that is, for every $x \in A$ there is a $t' > 0$ such that $S_{t'}(x) \in A$, then we may define a new mapping, the first return map. This is given by
$$S(x) = S_{t'}(x),$$
where $t'$ is the smallest time $t' > 0$ such that $S_{t'}(x) \in A$. Again, by studying $S$, we may gain some insight into the properties of $\{S_t\}_{t\ge 0}$. This method was introduced by Poincaré, and the first return map is often called the Poincaré map.
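The first return map can be computed numerically for a concrete flow. The sketch below (Python) integrates the planar system $\dot r = r(1-r^2)$, $\dot\theta = 1$, which has an attracting limit cycle at $r = 1$, and takes the positive $x$-axis as the set $A$; the particular system and all function names are illustrative assumptions, not from the text:

```python
import math

def field(x, y):
    # Cartesian form of r' = r(1 - r^2), theta' = 1
    r2 = x * x + y * y
    return (-y + x * (1 - r2), x + y * (1 - r2))

def rk4_step(x, y, h):
    k1 = field(x, y)
    k2 = field(x + h / 2 * k1[0], y + h / 2 * k1[1])
    k3 = field(x + h / 2 * k2[0], y + h / 2 * k2[1])
    k4 = field(x + h * k3[0], y + h * k3[1])
    return (x + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
            y + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))

def first_return(r0, h=1e-3):
    # start on A = positive x-axis; integrate until the trajectory
    # crosses A again from below (y changes sign from - to +, x > 0)
    x, y = r0, 0.0
    x, y = rk4_step(x, y, h)          # leave the section first
    while True:
        xn, yn = rk4_step(x, y, h)
        if y < 0.0 <= yn and xn > 0.0:
            return math.hypot(xn, yn)  # S(r0) = radius at first return
        x, y = xn, yn

print(first_return(1.0))   # starting on the limit cycle: return near 1
print(first_return(0.4))   # starting inside: pushed outward toward 1
```

Here the return map contracts toward $r = 1$, which is exactly the discrete-time reflection of the attracting limit cycle.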
Thus it is relatively straightforward to devise ways to study continuous time processes by a reduction to a discrete time system. However, given a discrete time system $S\colon X\to X$, it is much more difficult to embed it in a continuous time system and, indeed, such an embedding is, in general, impossible. That is, given $S\colon X\to X$, generally there does not exist a $\{S_t\}_{t\ge 0}$ such that $S(x) = S_{t_0}(x)$ for some $t_0 \ge 0$ [see Zdun, 1977]. For example, in previous chapters we considered the quadratic transformation $S(x) = 4x(1-x)$, $x \in [0,1]$. It can be proved that there does not exist a semidynamical system $\{S_t\}_{t\ge 0}$ on $[0,1]$ such that $S_{t_0}(x) = 4x(1-x)$. Of course, it is always possible to embed a discrete time process into a continuous time system by altering the phase space in an appropriate way.
The notation
$$\mathrm{prob}(A) = p, \qquad A \in \mathcal{F},$$
means that the probability of event $A$ is $p$. From the fact that prob is a measure, it immediately follows that
$$\mathrm{prob}\Big(\bigcup_i A_i\Big) = \sum_i \mathrm{prob}(A_i) \tag{8.2.1}$$
for any sequence of mutually disjoint events, $A_i \cap A_j = \emptyset$ for $i \ne j$.
Definition 8.2.1. In a sequence of events $A_1, A_2, \ldots$ (finite or not), the events are called independent if, for any increasing sequence of integers $k_1 < k_2 < \cdots < k_m$,
$$\mathrm{prob}(A_{k_1}\cap A_{k_2}\cap\cdots\cap A_{k_m}) = \mathrm{prob}(A_{k_1})\cdots\mathrm{prob}(A_{k_m}). \tag{8.2.2}$$
Equation (8.2.2) just means that the probability of all the events $A_{k_i}$ occurring is the product of the probabilities that each will occur separately.
Random variables are defined next.
Definition 8.2.2. A random variable $\xi$ is a measurable transformation from $\Omega$ into $R$. More precisely, $\xi\colon\Omega\to R$ is a random variable if, for any Borel set $B \subset R$, the set $\xi^{-1}(B)$ belongs to $\mathcal{F}$. If there is a function $f \ge 0$ such that
$$\mathrm{prob}\{\xi \in B\} = \int_B f(x)\,dx \tag{8.2.3}$$
for every Borel set $B$, then $f$ is called the density of $\xi$. Random variables $\xi_1, \xi_2, \ldots$ are called independent if, for any Borel sets $B_i \subset R$,
$$\mathrm{prob}\{\xi_1 \in B_1, \ldots, \xi_n \in B_n\} = \prod_{i=1}^{n}\mathrm{prob}\{\xi_i \in B_i\}, \tag{8.2.4}$$
and the probability that all events $\{\xi_i \in B_i\}$ will occur is simply given by the product of the probabilities that each will occur separately.
We are now in a position to make the concept of a stochastic process
precise with the following definition.
Definition 8.2.3. A stochastic process $\{\xi_t\}$ is a family of random variables that depends on a parameter $t$, usually called time. If $t$ assumes only integer values, $t = 1, 2, \ldots$, then the stochastic process reduces to a sequence $\{\xi_n\}$ of random variables called a discrete time stochastic process. However, if $t$ belongs to an interval (bounded or not) of $R$, then the stochastic process is called a continuous time stochastic process.
By its very definition, a stochastic process $\{\xi_t\}$ is a function of two variables, namely, time $t$ and event $\omega$, but this is seldom made explicit by writing $\{\xi_t(\omega)\}$. If the time is fixed, then $\xi_t$ is simply a random variable. However, if $\omega$ is fixed, then the mapping $t \to \xi_t(\omega)$ is called the sample path of the stochastic process.
Two important properties that stochastic processes may have are given in the following definition.
Definition 8.2.4. A continuous time stochastic process $\{\xi_t\}_{t\ge 0}$ has independent increments if, for any sequence of times $t_0 < t_1 < \cdots < t_n$, the random variables
$$\xi_{t_1} - \xi_{t_0},\ \xi_{t_2} - \xi_{t_1},\ \ldots,\ \xi_{t_n} - \xi_{t_{n-1}}$$
are independent. Further, if for any $t_1$ and $t_2$ and Borel set $B \subset R$,
$$\mathrm{prob}\{\xi_{t_2+t'} - \xi_{t_1+t'} \in B\} \tag{8.2.5}$$
does not depend on $t'$, then the continuous time stochastic process $\{\xi_t\}$ has stationary independent increments.
Before giving the definition of a Poisson process, we note that a stochastic process $\{\xi_t\}$ is called a counting process if its sample paths are nondecreasing functions of time with integer values. Counting processes will be denoted by $\{N_t\}_{t\ge 0}$.
A counting process $\{N_t\}_{t\ge 0}$ is called a Poisson process if:
$$N_0 = 0; \tag{8.2.6a}$$
$$\{N_t\}_{t\ge 0} \text{ has stationary independent increments; and} \tag{8.2.6b}$$
$$\lambda = \lim_{t\to 0}(1/t)\,\mathrm{prob}\{N_t = 1\} \text{ exists and is finite.} \tag{8.2.6c}$$
Set
$$p_k(t) = \mathrm{prob}\{N_t = k\}. \tag{8.2.7}$$
In terms of the $p_k$, these conditions give
$$p_0(0) = 1, \tag{8.2.8a}$$
$$\lim_{t\to 0}\frac{1}{t}\sum_{i=2}^{\infty} p_i(t) = 0, \tag{8.2.8b}$$
and
$$\lambda = \lim_{t\to 0}\frac{1}{t}\,p_1(t). \tag{8.2.8c}$$
To obtain the differential equations for the $p_k(t)$, we first start with $p_0(t)$, noting that $p_0(t+h)$ may be written as
$$p_0(t+h) = \mathrm{prob}\{N_{t+h} - N_t = 0\}\,\mathrm{prob}\{N_t - N_0 = 0\} = p_0(h)\,p_0(t) \tag{8.2.9}$$
by the stationary independent increments property. Hence
$$\frac{p_0(t+h)-p_0(t)}{h} = p_0(t)\,\frac{p_0(h)-1}{h},$$
where
$$\frac{p_0(h)-1}{h} = -\frac{p_1(h)}{h} - \frac{1}{h}\sum_{i=2}^{\infty} p_i(h), \tag{8.2.10}$$
and, thus, by taking the limit of both sides of (8.2.10) as $h\to 0$, we obtain
$$\frac{dp_0(t)}{dt} = -\lambda p_0(t). \tag{8.2.11}$$
The derivation of the equation for $p_k(t)$ proceeds in a similar fashion, starting from
$$p_k(t+h) = \mathrm{prob}\{N_t - N_0 = k \text{ and } N_{t+h} - N_t = 0\} + \mathrm{prob}\{N_t - N_0 = k-1 \text{ and } N_{t+h} - N_t = 1\} + \sum_{i=2}^{k}\mathrm{prob}\{N_t - N_0 = k-i \text{ and } N_{t+h} - N_t = i\}.$$
As before, passing to the limit $h\to 0$ gives
$$\frac{dp_k(t)}{dt} = -\lambda p_k(t) + \lambda p_{k-1}(t), \qquad k \ge 1. \tag{8.2.12}$$
FIGURE 8.2.1. Probabilities $p_0(t)$, $p_1(t)$, $p_2(t)$ versus $\lambda t$ for a Poisson process.
In particular, for $k = 1$, equation (8.2.12) becomes
$$\frac{dp_1(t)}{dt} = -\lambda p_1(t) + \lambda e^{-\lambda t},$$
whose solution is
$$p_1(t) = \lambda t\,e^{-\lambda t}.$$
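The probabilities $p_k(t)$ can be checked by simulating a Poisson process as a sum of independent exponential interarrival times (a standard construction, not derived in the text). A minimal Monte Carlo sketch in Python; the parameter values are arbitrary illustrative choices:

```python
import math
import random

def poisson_count(lam, t, rng):
    # simulate N_t by accumulating exponential interarrival times
    n, s = 0, rng.expovariate(lam)
    while s <= t:
        n += 1
        s += rng.expovariate(lam)
    return n

rng = random.Random(0)
lam, t, trials = 2.0, 1.5, 200_000
counts = [poisson_count(lam, t, rng) for _ in range(trials)]
p0 = counts.count(0) / trials
p1 = counts.count(1) / trials
print(p0, math.exp(-lam * t))            # empirical vs e^{-lambda t}
print(p1, lam * t * math.exp(-lam * t))  # empirical vs lambda t e^{-lambda t}
```

The empirical frequencies of $\{N_t = 0\}$ and $\{N_t = 1\}$ agree with $e^{-\lambda t}$ and $\lambda t\,e^{-\lambda t}$ to Monte Carlo accuracy.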
258
>.10.1
>.tI.O
L---..L---L.---1.
Pk(t)
0.1
>.t 10
.d
10
II
20
FIGURE 8.2.2. Plots of p,.(t) versus k for a Poisson process with >.t
or 10.
= 0.1, 1.0,
We now consider the behavior of the process $S_{N_t}(x)$ for times in the interval $[0,\infty)$. Specifically, we consider the following problem. Given an initial distribution of points $x \in X$, with density $f$, how does this distribution evolve in time? We denote the time-dependent density by $u(t,x)$ and set $u(0,x) = f(x)$.
The solution of this problem starts with a calculation of the probability
that
$$S_{N_t}(x) \in A \tag{8.3.1}$$
for a given set $A \in \mathcal{A}$ and time $t > 0$. This probability depends on two factors: the initial density $f$ and the counting process $\{N_t\}_{t\ge 0}$.
To be more precise, we need to calculate the measure of the set
$$\{(\omega,x)\colon S_{N_t(\omega)}(x) \in A\}. \tag{8.3.2}$$
This, in turn, requires some assumptions concerning the measure on the product space $\Omega\times X$, which we take to be given by
$$\mathrm{Prob}(C\times A) = \mathrm{prob}(C)\,\mu_f(A), \qquad \mu_f(A) = \int_A f(x)\,\mu(dx). \tag{8.3.3}$$
This measure is denoted by "Prob" since it is a probability measure. Equation (8.3.3) intuitively corresponds to the assumption that the initial position $x$ and the stochastic process $\{N_t\}_{t\ge 0}$ are independent.
Now we may proceed to calculate the measure of the set (8.3.2). This set may be rewritten as the union of disjoint subsets in the following way:
$$\{(\omega,x)\colon S_{N_t(\omega)}(x)\in A\} = \bigcup_{k=0}^{\infty}\{(\omega,x)\colon N_t(\omega) = k \text{ and } S^k(x)\in A\}.$$
Thus
$$\mathrm{Prob}\{S_{N_t}\in A\} = \sum_{k=0}^{\infty}\mathrm{Prob}\{N_t(\omega) = k,\ S^k(x)\in A\} = \sum_{k=0}^{\infty}p_k(t)\int_{S^{-k}(A)} f(x)\,\mu(dx) = \sum_{k=0}^{\infty}p_k(t)\int_A P^k f(x)\,\mu(dx), \tag{8.3.4}$$
where $P$ is the Frobenius–Perron operator corresponding to $S$, so that
$$\mathrm{Prob}\{S_{N_t}\in A\} = \int_A \sum_{k=0}^{\infty} p_k(t)\,P^k f(x)\,\mu(dx) \qquad\text{for } A \in \mathcal{A}. \tag{8.3.5}$$
Hence the density of the distribution of $S_{N_t}(x)$ is
$$u(t,x) = \sum_{k=0}^{\infty} p_k(t)\,P^k f(x). \tag{8.3.6}$$
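On a finite state space the series (8.3.6) is easy to evaluate and to check against the differential equation derived next. A minimal Python sketch with $\lambda = 1$ (the $3\times 3$ column-stochastic matrix and all names are arbitrary illustrative choices):

```python
import math

# a column-stochastic matrix acting on densities over 3 states:
# (Pf)_i = sum_j P[i][j] f_j, with every column summing to 1
P = [[0.5, 0.2, 0.3],
     [0.3, 0.7, 0.1],
     [0.2, 0.1, 0.6]]

def applyP(f):
    return [sum(P[i][j] * f[j] for j in range(3)) for i in range(3)]

def u(t, f, lam=1.0, K=60):
    # truncated series u(t) = sum_k p_k(t) P^k f with Poisson weights p_k
    out, pk, Pkf = [0.0, 0.0, 0.0], math.exp(-lam * t), f[:]
    for k in range(K):
        out = [out[i] + pk * Pkf[i] for i in range(3)]
        pk *= lam * t / (k + 1)
        Pkf = applyP(Pkf)
    return out

f0 = [1.0, 0.0, 0.0]
ut = u(2.0, f0)
print(sum(ut))        # u(t) remains a density: total mass stays 1
# check du/dt = lam*(P u - u) by a central difference
h = 1e-4
lhs = [(a - b) / (2 * h) for a, b in zip(u(2.0 + h, f0), u(2.0 - h, f0))]
Pu = applyP(ut)
rhs = [Pu[i] - ut[i] for i in range(3)]
print(max(abs(l - r) for l, r in zip(lhs, rhs)))  # small residual
```

The truncated sum conserves mass and satisfies the evolution equation to finite-difference accuracy, illustrating that (8.3.6) really is the solution formula.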
Differentiating (8.3.6) with respect to time and using (8.2.11)–(8.2.12) gives
$$\frac{\partial u(t,x)}{\partial t} = \sum_{k=0}^{\infty}\frac{dp_k(t)}{dt}\,P^k f(x) = -\lambda\sum_{k=0}^{\infty} p_k(t)\,P^k f(x) + \lambda\sum_{k=1}^{\infty} p_{k-1}(t)\,P^k f(x).$$
Since the last two series are strongly convergent in $L^1$, the initial differentiation was proper. Thus we have
$$\frac{\partial u(t,x)}{\partial t} = -\lambda u(t,x) + \lambda Pu(t,x) \tag{8.3.7}$$
with the initial condition
$$u(0,x) = f(x). \tag{8.3.8}$$
In the special case $X = \{0, 1, 2, \ldots\}$ with $S(n) = n + 1$, equation (8.3.7) becomes
$$\frac{\partial u(t,n)}{\partial t} = -\lambda u(t,n) + \lambda u(t,n-1), \qquad n \ge 1,$$
and
$$\frac{\partial u(t,0)}{\partial t} = -\lambda u(t,0),$$
which are identical with equations (8.2.12) and (8.2.11), respectively, except that the initial condition is more general than for the Poisson process since $u(0,n) = f(n)$.
Then the change over a time $\Delta t$ in the number of particles with velocity in a set $A$ is
$$N\int_A u(t+\Delta t, x)\,dx - N\int_A u(t,x)\,dx. \tag{8.4.1}$$
From our assumption, such a change can only take place through collisions with the walls of the container. Take $\Delta t$ to be sufficiently small so that a negligible number of particles make two or more collisions with a wall during $\Delta t$. Thus, the number of particles striking the wall during a time $\Delta t$ with velocity in $A$ before the collision [and, therefore, having velocities in $S(A)$ after the collision] is
$$N\lambda\Delta t\int_A u(t,x)\,dx, \tag{8.4.2}$$
where $\lambda N$ is the number of particles striking the walls per unit time. In this idealized, abstract example we neglect the quite important physical fact that the faster particles strike the walls of the container more frequently than do the slower particles.
Conversely, to find the number of particles whose velocity is in $A$ after the collision, we must calculate the number having velocities in the set $S^{-1}(A)$ before the collision. Again, assuming $\Delta t$ to be sufficiently small to make the number of double collisions by single particles negligible, we have
$$N\lambda\Delta t\int_{S^{-1}(A)} u(t,x)\,dx. \tag{8.4.3}$$
Hence the total change in the number of particles with velocity in the set $A$ over a short time $\Delta t$ is given by the difference between (8.4.3) and (8.4.2):
$$N\lambda\Delta t\Big\{\int_{S^{-1}(A)} u(t,x)\,dx - \int_A u(t,x)\,dx\Big\}. \tag{8.4.4}$$
Since
$$\int_{S^{-1}(A)} u(t,x)\,dx = \int_A Pu(t,x)\,dx,$$
where $P$ is the Frobenius–Perron operator corresponding to $S$, equating (8.4.4) with (8.4.1) yields the balance equation
$$N\int_A[u(t+\Delta t,x) - u(t,x)]\,dx = N\lambda\Delta t\Big\{\int_A Pu(t,x)\,dx - \int_A u(t,x)\,dx\Big\}. \tag{8.4.5}$$
Dividing by $N\Delta t$ and passing to the limit $\Delta t\to 0$ gives
$$\int_A \frac{\partial u(t,x)}{\partial t}\,dx = \lambda\int_A[-u(t,x) + Pu(t,x)]\,dx,$$
which, since $A$ is arbitrary, again gives (8.3.7).
More generally,
$$N\int_A u(t,x)\,dx$$
is the number of particles with velocities in $A$. Once again,
$$\lambda N\Delta t\int_A Pu(t,x)\,dx - \lambda N\Delta t\int_A u(t,x)\,dx$$
is the net change, due to collisions over a time $\Delta t$, in the number of particles whose velocities are in $A$.
Combining this result with (8.4.1), we immediately obtain the balance equation (8.4.5), which leads once again to (8.3.7). The only difference is that $P$ is no longer a Frobenius–Perron operator corresponding to a given one-to-one deterministic transformation $S$, but is an arbitrary Markov operator.
Since, in our intuitive derivations of (8.3.7) presented in this section, we used arguments that are employed to derive a Boltzmann equation, we will call equation (8.3.7) a linear abstract Boltzmann equation corresponding to a collision (Markov) operator $P$. To avoid confusion with the usual Boltzmann equation, bear in mind that $x$ corresponds to the particle velocity and not to position. Indeed, it is because we assume that the only source of change for particle velocity is collisions with the wall that drift and external force terms do not appear in (8.3.7).
Our next goal will not be to apply equation (8.3.7) to specific physical systems. Rather, we will demonstrate the interdependence between the properties of discrete time deterministic processes, governed by $S\colon X\to X$ or a Markov operator, and the continuous time process determined by (8.3.7). The next four sections are devoted to an examination of the most important properties of (8.3.7), and then in the last section we demonstrate that the Tjon–Wu representation of the Boltzmann equation is a special case of (8.3.7).
8.5

Writing the linear Boltzmann equation (8.3.7) (with $\lambda = 1$) in the form
$$\frac{du}{dt} = (P - I)u, \tag{8.5.1}$$
where $P$ is a given Markov operator and $I$ is the identity operator, we may apply the Hille–Yosida theorem 7.8.1 to the study of equation (8.3.8).
All three assumptions (a)–(c) of the Hille–Yosida theorem are easily shown to be satisfied by the operator $(P-I)$ of equation (8.5.1). First, since $A = P - I$ is defined on the whole space $L^1$, we have $\mathcal{D}(A) = L^1$ and property (a) is thus trivially satisfied.
To check property (b), rewrite the resolvent equation $\lambda f - Af = g$ using $A = P - I$ to give
$$(\lambda + 1)f - Pf = g. \tag{8.5.2}$$
Equation (8.5.2) may be easily solved by the method of successive approximations. Starting from an arbitrary $f_0$, we define $f_n$ by
$$f_n = \frac{1}{(\lambda+1)^n}\,P^n f_0 + \sum_{k=1}^{n}\frac{1}{(\lambda+1)^k}\,P^{k-1} g. \tag{8.5.3}$$
Since $\|P^k g\| \le \|g\|$, the series in (8.5.3) is convergent, and the unique solution $f$ of the resolvent equation (8.5.2) is
$$f = R_\lambda g = \sum_{k=1}^{\infty}\frac{1}{(\lambda+1)^k}\,P^{k-1} g. \tag{8.5.4}$$
To verify property (c), note that, for $g \ge 0$, each term of the series (8.5.4) is nonnegative and, since $P$ preserves the integral,
$$\int_X R_\lambda g(x)\,\mu(dx) = \sum_{k=1}^{\infty}\frac{1}{(\lambda+1)^k}\int_X g(x)\,\mu(dx) = \frac{1}{\lambda}\int_X g(x)\,\mu(dx),$$
so that
$$\int_X \lambda R_\lambda g(x)\,\mu(dx) = 1 \qquad\text{for every density } g,$$
and, since $\lambda R_\lambda$ is linear, nonnegative, and also preserves the integral, it is a Markov operator. Thus condition (c) is automatically satisfied (see Corollary 7.8.1).
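The resolvent series (8.5.4) can be verified numerically on a two-state example: the truncated sum solves (8.5.2), and $\lambda R_\lambda$ maps densities to densities. A Python sketch (the matrix and all names are illustrative assumptions):

```python
P = [[0.9, 0.05],
     [0.1, 0.95]]          # columns sum to 1: a Markov operator on R^2

def applyP(f):
    return [P[0][0] * f[0] + P[0][1] * f[1],
            P[1][0] * f[0] + P[1][1] * f[1]]

def resolvent(g, lam, K=400):
    # R_lam g = sum_{k=1}^K (lam+1)^{-k} P^{k-1} g  (truncated series)
    f, term, w = [0.0, 0.0], g[:], 1.0
    for _ in range(K):
        w /= (lam + 1.0)
        f = [f[0] + w * term[0], f[1] + w * term[1]]
        term = applyP(term)
    return f

lam, g = 0.7, [0.3, 0.7]
f = resolvent(g, lam)
Pf = applyP(f)
# residual of the resolvent equation (lam+1) f - P f = g
res = [(lam + 1) * f[i] - Pf[i] - g[i] for i in range(2)]
print(res)                        # essentially zero
print(lam * (f[0] + f[1]))        # lam * R_lam preserves total mass: ~ 1
```

Because the geometric factor $(\lambda+1)^{-1} < 1$, the truncation error is negligible for modest $K$, which is the same estimate that makes the series (8.5.4) convergent.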
Therefore, by the Hille–Yosida theorem, the linear Boltzmann equation (8.3.8) generates a continuous semigroup of Markov operators $\{P_t\}_{t\ge 0}$. Moreover, from (8.5.4) one finds
$$\lim_{\lambda\to\infty} A_\lambda f = Pf - f.$$
Thus, by the Hille–Yosida theorem and equation (7.8.3), the unique semigroup corresponding to $A = P - I$ is given by
$$P_t f = e^{t(P-I)} f, \tag{8.5.5}$$
and the unique solution to equation (8.3.8) with the initial condition $u(0,x) = f(x)$ is
$$u(t,x) = e^{t(P-I)} f(x). \tag{8.5.6}$$
Although we have determined the solution of (8.3.8) using the Hille–Yosida theorem, precisely the same result could have been obtained by applying the method of successive approximations to equation (8.5.1). However, our derivation once again illustrates the techniques involved in using the Hille–Yosida theorem and establishes that (8.3.8) generates a continuous semigroup of Markov operators. Finally, we note that if $P$ in equation (8.3.8) is a Frobenius–Perron operator corresponding to a nonsingular transformation $S$, the solution can be obtained by substituting equation (8.2.14) into equation (8.3.6).
In addition to the existence and uniqueness of the solution to (8.3.8), other properties of $P_t$ may be demonstrated.

Property 1. From inequality (7.4.7) we know that, given $f_1, f_2 \in L^1$, the norm
$$\|P_t f_1 - P_t f_2\| \tag{8.5.7}$$
is a nonincreasing function of time $t$.

Property 2. If for some $f \in L^1$ the limit
$$f_* = \lim_{t\to\infty} P_t f \tag{8.5.8}$$
exists, then, for the same $f$,
$$\lim_{t\to\infty} P_t(Pf) = f_*. \tag{8.5.9}$$
To see this, it is enough to show that
$$\lim_{t\to\infty}\|P_t f - P_t(Pf)\| = 0 \tag{8.5.10}$$
for all $f \in L^1$.
Now, from (8.5.5),
$$P_t f = e^{-t}\sum_{n=0}^{\infty}\frac{t^n}{n!}\,P^n f \tag{8.5.11}$$
and
$$P_t(Pf) = e^{-t}\sum_{n=0}^{\infty}\frac{t^n}{n!}\,P^{n+1} f = e^{-t}\sum_{n=1}^{\infty}\frac{t^{n-1}}{(n-1)!}\,P^n f,$$
so that
$$\|P_t f - P_t(Pf)\| \le e^{-t}\,\|f\|\Big(1 + \sum_{n=1}^{\infty}\Big|\frac{t^n}{n!} - \frac{t^{n-1}}{(n-1)!}\Big|\Big).$$
If $t = m$, an integer, then, since the sequence $m^n/n!$ increases for $n \le m$ and decreases for $n \ge m$, almost all of the terms in the series cancel, and
$$1 + \sum_{n=1}^{\infty}\Big|\frac{m^n}{n!} - \frac{m^{n-1}}{(n-1)!}\Big| = \frac{2m^m}{m!}.$$
However, by Stirling's formula, $m! = m^m e^{-m}\sqrt{2\pi m}\,\theta_m$, where $\theta_m\to 1$ as $m\to\infty$. Thus, for integer $t = m$,
$$\|P_m f - P_m(Pf)\| \le 2e^{-m}\frac{m^m}{m!}\,\|f\| = \frac{2\|f\|}{\sqrt{2\pi m}\,\theta_m},$$
which converges to zero as $m\to\infty$. Since, by Property 1, this quantity is a nonincreasing function of $t$, (8.5.10) is demonstrated for all $t\to\infty$. Finally, inserting (8.5.8) into (8.5.10) gives the desired result, (8.5.9).
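The Stirling estimate used in this argument is easy to check numerically (Python sketch; the values of $m$ are arbitrary illustrative choices, and logarithms are used to avoid overflow):

```python
import math

# the bound 2 e^{-m} m^m / m! from the proof, against its Stirling
# estimate 2 / sqrt(2 pi m); computed via logs for numerical stability
rows = []
for m in (1, 10, 100, 1000, 10000):
    exact = 2.0 * math.exp(m * math.log(m) - m - math.lgamma(m + 1))
    stirling = 2.0 / math.sqrt(2.0 * math.pi * m)
    rows.append((m, exact, stirling))
    print(m, exact, stirling)
```

Both columns decrease toward zero and their ratio approaches 1, confirming the rate $O(m^{-1/2})$ at which $\|P_m f - P_m(Pf)\|$ is forced to vanish.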
Remark 8.5.2. Note that the sum of the coefficients of $P_t f$ given in equation (8.5.11) is identically 1, and thus the solutions of the linear Boltzmann equation $u(t,x) = P_t f(x)$ bear a strong correspondence to the averages $A_n f$ studied earlier in Chapter 5, with $n$ and $t$ playing analogous roles. □

Property 3. The operators $P$ and $P_t$ commute, that is, $PP_t f = P_t Pf$ for all $f \in L^1$. This is easily demonstrated by applying $P$ to (8.5.11).

Property 4. If $f_* = \lim_{t\to\infty} P_t f$ exists, then $Pf_* = f_*$. Indeed, writing
$$Pf_* = P\Big(\lim_{t\to\infty}P_t f\Big) = \lim_{t\to\infty}P_t(Pf)$$
and using (8.5.9), we obtain $Pf_* = f_*$, which gives the desired result. Further, the same argument shows that, if $f_* = \lim_{n\to\infty} P_{t_n} f$ exists for some subsequence $\{t_n\}$, then $Pf_* = f_*$.

Property 5. If $Pf_* = f_*$ for some $f_* \in L^1$, then also $P_t f_* = f_*$. This is also easy to show. Write $Pf_* = f_*$ as
$$(P - I)f_* = 0,$$
so that, by (8.5.5), $P_t f_* = e^{t(P-I)} f_* = f_*$.
Suppose now that, for some sequence $t_n\to\infty$, the limit $f_* = \lim_{n\to\infty}P_{t_n} f$ exists. Then, by (8.5.10),
$$\lim_{n\to\infty}P_{t_n}(f - Pf) = 0, \tag{8.6.2}$$
so that $Pf_* = f_*$, which establishes the claim. Note also from Property 5 of $P_t$ (Section 8.5) that this implies $P_t f_* = f_*$.
Write
$$f = f - f_* + f_*$$
and assume that for every $\varepsilon > 0$ we have the representation
$$f - f_* = Pg - g + r, \tag{8.6.3}$$
where $g \in L^1$ and $\|r\| \le \varepsilon$. (We will prove in the following that this representation is possible.) By using (8.6.3) and $P_t f_* = f_*$, we may write
$$P_t f = f_* + P_t(Pg) - P_t g + P_t r,$$
and, thus,
$$\|P_t f - f_*\| \le \|P_t(Pg) - P_t g\| + \|r\|.$$
From (8.5.10), the first term on the right-hand side approaches zero as $t\to\infty$, whereas the second term is not greater than $\varepsilon$. Thus
$$\|P_t f - f_*\| \le 2\varepsilon$$
for $t$ sufficiently large, and, since $\varepsilon$ is arbitrary,
$$\lim_{t\to\infty}\|P_t f - f_*\| = 0,$$
which completes the proof if (8.6.3) is true. Suppose (8.6.3) is not true, which implies that there is a $g_0 \in L^{\infty}$ such that
$$\langle f - f_*, g_0\rangle \ne 0 \tag{8.6.4}$$
and
$$\langle h, g_0\rangle = 0 \qquad\text{for all } h \in \overline{(P-I)L^1(X)}.$$
In particular,
$$\langle (P-I)P^n f, g_0\rangle = 0,$$
since $(P-I)P^n f \in (P-I)L^1(X)$, so
$$\langle P^{n+1} f, g_0\rangle = \langle P^n f, g_0\rangle \tag{8.6.5}$$
or
$$\langle P_t f, g_0\rangle = \langle f, g_0\rangle. \tag{8.6.6}$$
Substituting $t = t_n$ and taking the limit as $n\to\infty$ in (8.6.6) gives
$$\langle f_*, g_0\rangle = \langle f, g_0\rangle,$$
and, thus,
$$\langle f_* - f, g_0\rangle = 0,$$
which contradicts equation (8.6.4). Thus (8.6.3) is true. □
Corollary 8.7.1. If for some $f \in L^1$ there is an integrable function $g$ such that
$$|P_t f| \le g \qquad\text{for all } t \ge 0, \tag{8.7.1}$$
then the strong limit
$$f_* = \lim_{t\to\infty} P_t f \tag{8.7.2}$$
exists. That is, either $P_t f$ is not bounded by any integrable function or $P_t f$ is strongly convergent.

Corollary 8.7.2. If the (Markov) operator $P$ has a positive fixed point $f_*$, $f_*(x) > 0$ a.e., then the strong limit $\lim_{t\to\infty} P_t f$ exists for all $f \in L^1$.
Proof. First note that when the initial function $f$ satisfies
$$|f| \le c f_*, \tag{8.7.3}$$
then
$$|P^n f| \le P^n(c f_*) = c P^n f_* = c f_*.$$
Multiply both sides by $e^{-t}t^n/n!$ and sum the result over $n$ to obtain $|P_t f|$ on the left, so that
$$|P_t f| \le c f_*,$$
and, since $P_t f$ is bounded by the integrable function $cf_*$, by Corollary 8.7.1 we know that the strong limit $\lim_{t\to\infty}P_t f$ exists.
In the more general case, when the initial function $f$ does not satisfy (8.7.3), we proceed as follows. Define a new function by
$$f_c(x) = \begin{cases} f(x) & \text{if } |f(x)| \le c f_*(x) \\ 0 & \text{if } |f(x)| > c f_*(x), \end{cases}$$
so that
$$\lim_{c\to\infty}\|f_c - f\| = 0.$$
Thus, by writing $f = f_c + (f - f_c)$, we may treat the two pieces separately.
Since $f_c$ satisfies
$$|f_c| \le c f_*,$$
from (8.7.3) we know that $\{P_t f_c\}$ converges strongly. Now take $\varepsilon > 0$. Since $\{P_t f_c\}$ is strongly convergent, there is a $t_0 > 0$, which in general depends on $c$, such that
$$\|P_t f_c - P_{t+t'} f_c\| \le \varepsilon \qquad\text{for } t \ge t_0,\ t' \ge 0. \tag{8.7.4}$$
Further,
$$\|P_t f - P_t f_c\| \le \|f - f_c\| \le \varepsilon \qquad\text{for } t \ge 0 \tag{8.7.5}$$
for a fixed but sufficiently large $c$. From equations (8.7.4) and (8.7.5) it follows that
$$\|P_t f - P_{t+t'} f\| \le 3\varepsilon \qquad\text{for } t \ge t_0,\ t' \ge 0,$$
which is the Cauchy condition for $\{P_t f\}$. Thus $\{P_t f\}$ also converges strongly, and the proof is complete. □
The existence of the strong limit (8.7.2) is interesting, but from the point of view of applications we would like to know what the limit is. In the following corollary we give a sufficient condition for the existence of a unique limit to (8.7.2), noting, of course, that, since (8.7.2) is linear, uniqueness is determined only up to a multiplicative constant.

Corollary 8.7.3. Assume that in the set of all densities $f \in D$ the equation $Pf = f$ has a unique solution $f_*$ and that $f_*(x) > 0$ a.e. Then, for any initial density $f \in D$,
$$\lim_{t\to\infty} P_t f = f_*. \tag{8.7.6}$$

Proof. The proof is straightforward. From Corollary 8.7.2 the limit $\lim_{t\to\infty} P_t f$ exists and is also a nonnegative normalized function. However, by Property 4 of $P_t$ (Section 8.5), we know that this limit is a fixed point of the Markov operator $P$. Since, by our assumption, the fixed point is unique, it must be $f_*$, and the proof is complete. □
In the special case that $P$ is a Frobenius–Perron operator for a nonsingular transformation $S\colon X\to X$, the condition $Pf_* = f_*$ is equivalent to the fact that the measure
$$\mu_{f_*}(A) = \int_A f_*(x)\,\mu(dx)$$
is invariant with respect to $S$. Thus, in this case, from Corollary 8.7.2 the existence of an invariant measure $\mu_{f_*}$ with a density $f_*(x) > 0$ is sufficient for the existence of the strong limit (8.7.2) for the solutions of (8.3.8). If, in addition, the invariant density is unique, then by Corollary 8.7.3,
$$\lim_{t\to\infty} P_t f = f_* \tag{8.7.7}$$
for all $f \in D$.
Now consider the more special case where $(X,\mathcal{A},\mu)$ is a finite measure space and $S\colon X\to X$ is a measure-preserving transformation. Since $S$ is measure preserving, $f_*$ exists and is given by
$$f_*(x) = 1/\mu(X) \qquad\text{for } x \in X.$$
Thus $\lim_{t\to\infty} P_t f$ always exists. Furthermore, if this invariant density is unique, then
$$\lim_{t\to\infty} P_t f = f_* = 1/\mu(X) \tag{8.7.8}$$
for every $f \in D$.
We now consider the linear Boltzmann equation
$$\frac{\partial u(t,x)}{\partial t} + u(t,x) = Pu(t,x),$$
where $P$ is the integral operator
$$Pf(x) = \int_X K(x,y)\,f(y)\,dy \tag{8.8.1}$$
and $K(x,y)\colon X\times X\to R$ is a stochastic kernel, that is,
$$K(x,y) \ge 0 \tag{8.8.2}$$
and
$$\int_X K(x,y)\,dx = 1. \tag{8.8.3}$$
Assume further that for some $m$,
$$\int_X \inf_y K_m(x,y)\,dx > 0 \tag{8.8.4}$$
($K_m$ is the $m$ times iterated kernel $K$). In this case we will show that the strong limit
$$\lim_{t\to\infty} P_t f = f_* \tag{8.8.5}$$
exists for all densities $f$, where $f_*$ is the unique density solution of
$$f(x) = \int_X K(x,y)\,f(y)\,dy. \tag{8.8.6}$$
To see this, set
$$h(x) = \inf_y K_m(x,y).$$
Then, for $n \ge m$ and $f \in D$, we may write
$$P^n f(x) = \int_X K_m(x,y)\,P^{n-m} f(y)\,dy \ge h(x).$$
Multiplying by $e^{-t}t^n/n!$ and summing over $n$, we obtain for $P_t f$
$$P_t f(x) \ge h(x) - e^{-t}\sum_{n=0}^{m}\frac{t^n}{n!}\,h(x),$$
so that
$$(P_t f - h)^- \le e^{-t}\sum_{n=0}^{m}\frac{t^n}{n!}\,h$$
and
$$\lim_{t\to\infty}\|(P_t f - h)^-\| = 0,$$
so that $h$ is a nontrivial lower-bound function for $\{P_t\}$ and the semigroup is asymptotically stable; its limit $f_*$ satisfies (8.8.6).
Now replace assumption (8.8.4) by
$$K_m(x,y) \le g(x), \qquad g \in L^1(X). \tag{8.8.7}$$
We will show that under this condition the strong limit
$$f_* = \lim_{t\to\infty} P_t f \tag{8.8.8}$$
exists. For $n \ge m$,
$$|P^n f(x)| = |P^m(P^{n-m} f(x))| \le \int_X K_m(x,y)\,|P^{n-m} f(y)|\,dy \le g(x)\,\|f\|.$$
Therefore,
$$|P_t f| \le e^{-t}\sum_{n=0}^{m}\frac{t^n}{n!}\,|P^n f| + e^{-t}\sum_{n=m+1}^{\infty}\frac{t^n}{n!}\,g\|f\| \le e^{-t}\sum_{n=0}^{m}\frac{t^n}{n!}\,|P^n f| + g\|f\|.$$
Further, setting
$$r = c\sum_{n=0}^{m}|P^n f|, \qquad c = \sup_{\substack{t > 0 \\ 0\le n\le m}} e^{-t}\frac{t^n}{n!},$$
we finally obtain
$$|P_t f| \le g\|f\| + r.$$
Evidently, $(g\|f\| + r)$ is an integrable function, and from Corollary 8.7.1 we know that the strong limit (8.8.8) exists.
Under assumption (8.8.7) we have no assurance that the strong limit (8.8.8) is unique. However, some additional properties of $K(x,y)$ may ensure this uniqueness. For example, if $X$ is a bounded interval of the real line or the half-line, (8.8.7) holds, and $K_m(x,y)$ is monotonically increasing or decreasing in $x$, then
$$\lim_{t\to\infty} P_t f = f_* \qquad\text{for all } f \in D, \tag{8.8.9}$$
provided that
$$\sum_{n=1}^{\infty} K_n(x,y) > 0 \qquad\text{for } x \in A,\ y \in X.$$
$$\frac{\partial F(t,v)}{\partial t} = C(F(t,v)). \tag{8.9.1}$$
Bobylev [1976], Krook and Wu [1977], and Tjon and Wu [1979] have shown that in some cases equation (8.9.1) may be transformed into
$$\frac{\partial u(t,x)}{\partial t} = -u(t,x) + \int_x^{\infty}\frac{dy}{y}\int_0^y u(t,y-z)\,u(t,z)\,dz, \qquad x > 0, \tag{8.9.2}$$
where $x = \mathrm{const}\cdot v^2$ and $u(t,x)$ is obtained from $F(t,v)$ by an integral transformation of the form
$$u(t,x) = \mathrm{const}\cdot\int_x^{\infty}\frac{F(t,v)}{\sqrt{v-x}}\,dv.$$
Replacing one factor of $u$ under the integral in (8.9.2) by the equilibrium density $e^{-z}$ leads to the linear equation
$$\frac{\partial u(t,x)}{\partial t} + u(t,x) = \int_x^{\infty}\frac{dy}{y}\int_0^y e^{-(y-z)}\,u(t,z)\,dz, \qquad x > 0. \tag{8.9.3}$$
Thus (8.9.3) is a linear abstract Boltzmann equation with
$$Pf(x) = \int_x^{\infty}\frac{dy}{y}\int_0^y e^{-(y-z)}\,f(z)\,dz. \tag{8.9.4}$$
Changing the order of integration, $P$ may be written in the kernel form
$$Pf(x) = \int_0^{\infty} K(x,y)\,f(y)\,dy, \tag{8.9.5}$$
where
$$K(x,y) = \begin{cases} -e^{y}\,\mathrm{Ei}(-y), & 0 < x \le y \\ -e^{y}\,\mathrm{Ei}(-x), & 0 < y < x, \end{cases} \tag{8.9.6}$$
and $-\mathrm{Ei}(-x) = \int_x^{\infty}(e^{-s}/s)\,ds$ denotes the exponential integral.
We will show that for every initial density $f$,
$$\lim_{t\to\infty} u(t,x) = \lim_{t\to\infty} P_t f(x) = e^{-x}. \tag{8.9.7}$$
By Corollary 8.7.3, it is enough to show that $f_*(x) = e^{-x}$ is the unique density satisfying
$$f(x) = \int_x^{\infty}\frac{dy}{y}\int_0^y e^{-(y-z)}\,f(z)\,dz, \tag{8.9.8}$$
which must be solved for $f$. Since the right-hand side of (8.9.8) is differentiable, $f$ must be differentiable. Its first derivative is
$$\frac{df(x)}{dx} = -\frac{1}{x}\int_0^x e^{-(x-z)}\,f(z)\,dz.$$
Multiply both sides by $x\exp(x)$ and differentiate again to obtain the linear second-order differential equation
$$x\frac{d^2 f}{dx^2} + (x+1)\frac{df}{dx} + f = 0. \tag{8.9.9}$$
We know that one solution of (8.9.9) is $f_1(x) = \exp(-x)$, and a second independent solution may be determined using the d'Alembert reduction method [Kamke, 1959]. This simply consists of substituting $f(x) = g(x)\exp(-x)$ into (8.9.9) and solving the resulting equation for $g(x)$. Once $g$ is determined, the second independent solution of (8.9.9) is $h(x) = g(x)\exp(-x)$.
Making this substitution and simplifying gives
$$x\frac{d^2 g}{dx^2} + (1-x)\frac{dg}{dx} = 0,$$
which has
$$\frac{dg}{dx} = \frac{1}{x}\,e^{x}$$
as a particular solution. Thus
$$g(x) = \mathrm{Ei}(x), \qquad h(x) = e^{-x}\,\mathrm{Ei}(x).$$
Therefore, the general solution of (8.9.9) is
$$f(x) = C_1 e^{-x} + C_2 e^{-x}\,\mathrm{Ei}(x).$$
If $f$ is to be a density, then
$$\int_0^{\infty} f(x)\,dx = C_1 + C_2\int_0^{\infty} e^{-x}\,\mathrm{Ei}(x)\,dx = 1. \tag{8.9.10}$$
However, since $\mathrm{Ei}(x)$ grows like $e^{x}/x$ as $x\to\infty$, the function $e^{-x}\mathrm{Ei}(x)$ behaves like $1/x$ for large $x$ and is not integrable on $(0,\infty)$; hence $C_2 = 0$ and $C_1 = 1$. Thus $f_*(x) = e^{-x}$ is the unique density satisfying (8.9.8), and
$$\lim_{t\to\infty} P_t f = e^{-x} \tag{8.9.12}$$
for every $f \in D$.
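That $f_*(x) = e^{-x}$ solves the fixed-point equation (8.9.8) can also be confirmed by direct quadrature. A rough Python sketch (the trapezoid rules, the truncation of the infinite upper limit, and all names are ad hoc illustrative choices):

```python
import math

def inner(y, f, n=300):
    # trapezoid rule for the inner integral: int_0^y e^{-(y-z)} f(z) dz
    h = y / n
    vals = [math.exp(-(y - i * h)) * f(i * h) for i in range(n + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

def Pf(x, f, ymax=40.0, n=1500):
    # trapezoid rule for the outer integral int_x^ymax (1/y)(...) dy,
    # truncating the infinite upper limit at ymax (tail ~ e^{-ymax})
    h = (ymax - x) / n
    vals = [inner(x + i * h, f) / (x + i * h) for i in range(n + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

f = lambda z: math.exp(-z)
results = {x: Pf(x, f) for x in (0.5, 1.0, 2.0)}
for x, v in results.items():
    print(x, v, math.exp(-x))   # the two columns agree closely
```

For $f = e^{-x}$ the inner integrand is constant in $z$, so the agreement is limited only by the outer quadrature and the truncation point.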
Exercises
8.3. Let $\{N_t\}_{t\ge 0}$ be a Poisson process and $S\colon R\to R$ a nonsingular mapping. Consider the following procedure: in a time $t > 0$ a point $x \in R$ is transformed into $S(x) + N_t$. Given an initial density distribution function $f$ of the initial point $x$, find the density $u(t,x)$ of $S(x) + N_t$. (As in Section 8.3, assume that the position $x$ of the initial point and the process $N_t$ are independent.) Prove that $u(t,x)$ satisfies the differential equation
$$\frac{\partial u(t,x)}{\partial t} = -\lambda u(t,x) + \lambda u(t,x-1) \qquad\text{for } t > 0,\ x \in R,\ f \in L^1.$$
8.6. Again consider the linear Boltzmann equation (8.5.1) and assume that $P\colon L^1(X,\mathcal{A},\mu)\to L^1(X,\mathcal{A},\mu)$ is sweeping with respect to a family $\mathcal{A}_* \subset \mathcal{A}$. Prove that the semigroup $\{P_t\}_{t\ge 0}$ is sweeping with respect to the same family.
8.7. The nonlinear Tjon–Wu equation (8.9.2) may be written in the form
$$\frac{du}{dt} = -u + P(u,u),$$
where
$$P(f,g)(x) = \int_x^{\infty}\frac{dy}{y}\int_0^y f(y-z)\,g(z)\,dz.$$
Show that the solution $u(t)$ with $u(0) = u_0$ is given by
$$u(t) = \sum_{n=0}^{\infty} e^{-t}\big(1 - e^{-t}\big)^n u_n,$$
with
$$u_n = \frac{1}{n}\sum_{k=0}^{n-1} P(u_k, u_{n-1-k}), \qquad u_0 = f \in D(R^+)$$
(Kielek).
9
Entropy

The concept of entropy was first introduced by Clausius and later used in a different form by L. Boltzmann in his pioneering work on the kinetic theory of gases published in 1866. Since then, entropy has played a pivotal role in the development of many areas in physics and chemistry and has had important ramifications in ergodic theory. However, the Boltzmann entropy is different from the Kolmogorov–Sinai–Ornstein entropy [Walters, 1975; Parry, 1981] that has been so successfully used in solving the problem of isomorphism of dynamical systems, and which is related to the work of Shannon [see Shannon and Weaver, 1949].
In this short chapter we consider the Boltzmann entropy of sequences of densities $\{P^n f\}$ and give conditions under which the entropy may be constant or increase to a maximum. We then consider the inverse problem of determining the behavior of $\{P^n f\}$ from the behavior of the entropy.
9.1 Basic Definitions

If $(X,\mathcal{A},\mu)$ is an arbitrary measure space and $P\colon L^1\to L^1$ a Markov operator, then under certain circumstances valuable information concerning the behavior of $\{P^n f\}$ (or, in the continuous time case, $\{P_t f\}$) can be obtained from the behavior of the sequence
$$H(P^n f) = \int_X \eta(P^n f(x))\,\mu(dx), \tag{9.1.1}$$
where
$$\eta(u) = -u\log u, \qquad \eta(0) = 0. \tag{9.1.2}$$
More generally, for every $f \ge 0$ the entropy of $f$ is defined by
$$H(f) = \int_X \eta(f(x))\,\mu(dx). \tag{9.1.3}$$
FIGURE 9.1.1. The graph of $\eta(u) = -u\log u$.
Remark 9.1.1. If $\mu(X) < \infty$, then the integral (9.1.3) is always well defined for every $f \ge 0$. In fact, the integral of the positive part of $\eta(f(x))$,
$$[\eta(f(x))]^+ = \max[0, \eta(f(x))],$$
is always finite. Thus $H(f)$ is either finite or equal to $-\infty$. □

Since we take $\eta(0) = 0$, the function $\eta(u)$ is continuous for all $u \ge 0$. The graph of $\eta$ is shown in Figure 9.1.1. One of the most important properties of $\eta$ is that it is convex (from above). To see this, note that
$$\eta''(u) = -1/u,$$
so $\eta''(u) < 0$ for all $u > 0$. From this it follows immediately that the graph of $\eta$ always lies below the tangent line, or
$$\eta(u) \le (u-v)\,\eta'(v) + \eta(v), \tag{9.1.4}$$
which, using $\eta'(v) = -\log v - 1$, may be rewritten as
$$u - u\log u \le v - u\log v. \tag{9.1.5}$$
If $f$ and $g$ are two densities such that $\eta(f(x))$ and $f(x)\log g(x)$ are integrable, then from (9.1.5) we have the useful integral inequality
$$H(f) \le -\int_X f(x)\log g(x)\,\mu(dx), \tag{9.1.6}$$
and the equality holds only for $f = g$. Inequality (9.1.6) is often of help in proving some extremal properties of $H(f)$, as shown in the following.

Proposition 9.1.1. Let $\mu(X) < \infty$, and consider all the possible densities $f$ defined on $X$. Then, in the family of all such densities, the maximal entropy occurs for the constant density
$$f_0(x) = 1/\mu(X). \tag{9.1.7}$$

Proof. For every $f \in D$, inequality (9.1.6) with $g = f_0$ gives
$$H(f) \le -\int_X f(x)\log\Big[\frac{1}{\mu(X)}\Big]\,\mu(dx) = \log\mu(X),$$
whereas a direct calculation shows
$$H(f_0) = -\int_X \frac{1}{\mu(X)}\log\Big[\frac{1}{\mu(X)}\Big]\,\mu(dx) = \log\mu(X),$$
so that $H(f) \le H(f_0)$. □
Example 9.1.1. Let $X = [0,\infty)$ and consider all densities $f \in D$ with a given mean,
$$\int_0^{\infty} x\,f(x)\,dx = 1/\lambda. \tag{9.1.8}$$
Then the exponential density
$$f_0(x) = \lambda e^{-\lambda x} \tag{9.1.9}$$
maximizes the entropy.
The proof proceeds as in Proposition 9.1.1. From inequality (9.1.6) we have, for arbitrary $f \in D$ satisfying (9.1.8),
$$H(f) \le -\int_0^{\infty} f(x)\log(\lambda e^{-\lambda x})\,dx = -\log\lambda\int_0^{\infty} f(x)\,dx + \lambda\int_0^{\infty} x\,f(x)\,dx = -\log\lambda + 1.$$
For $f_0$ given by (9.1.9), a direct calculation gives $H(f_0) = -\log\lambda + 1$, so that $H(f) \le H(f_0)$ for every $f \in D$ satisfying (9.1.8). □
Example 9.1.2. For our next example take X = (-∞, ∞) and consider all possible densities f ∈ D such that the second moment of f is finite, that is,
∫_{-∞}^{∞} x² f(x) dx = σ².    (9.1.10)
Then the maximal entropy is achieved for the Gaussian density
f₀(x) = (1/√(2πσ²)) exp(-x²/2σ²).    (9.1.11)
To see this, use inequality (9.1.6) to write, for arbitrary f ∈ D satisfying (9.1.10),
H(f) ≤ -∫_{-∞}^{∞} f(x) log[(1/√(2πσ²)) exp(-x²/2σ²)] dx
= -log[1/√(2πσ²)] ∫_{-∞}^{∞} f(x) dx + (1/2σ²) ∫_{-∞}^{∞} x² f(x) dx
= ½ - log[1/√(2πσ²)].
Further, a direct calculation gives
H(f₀) = ½ - log[1/√(2πσ²)],
so H(f) ≤ H(f₀) for all f ∈ D satisfying (9.1.10). □
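Example 9.1.2 is easy to check numerically. The sketch below (a numerical illustration, not from the text; the Laplace comparison density and the value σ² = 1 are arbitrary choices) computes H by a Riemann sum for a Gaussian and for a Laplace density rescaled to have the same second moment, and compares both with the closed-form maximal entropy ½ - log[1/√(2πσ²)] = ½ log(2πeσ²).

```python
import numpy as np

def entropy(fx, dx):
    # H(f) = -integral of f log f; eta(0) = 0 by convention
    integrand = np.where(fx > 0, -fx * np.log(fx), 0.0)
    return integrand.sum() * dx

x = np.linspace(-30.0, 30.0, 200001)
dx = x[1] - x[0]
sigma2 = 1.0  # common second moment (illustrative choice)

gauss = np.exp(-x**2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
b = np.sqrt(sigma2 / 2.0)                # Laplace scale giving second moment sigma2
laplace = np.exp(-np.abs(x) / b) / (2 * b)

H_gauss, H_laplace = entropy(gauss, dx), entropy(laplace, dx)
H_max = 0.5 * np.log(2 * np.pi * np.e * sigma2)   # entropy of (9.1.11)
```

As predicted, the Gaussian attains the maximum and any other density with the same second moment falls strictly below it.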
These two examples are simply special cases covered by the following
simple statement.
Proposition 9.1.2. Let (X, 𝒜, μ) be a measure space. Assume that a sequence g₁, ..., g_m of measurable functions is given, as well as two sequences of real constants ḡ₁, ..., ḡ_m and ν₁, ..., ν_m that satisfy
ḡᵢ = [∫_X gᵢ(x) Π_{j=1}^m exp(-νⱼgⱼ(x)) μ(dx)] / [∫_X Π_{j=1}^m exp(-νⱼgⱼ(x)) μ(dx)],  i = 1, ..., m,
where all of the integrals are finite. Then the maximum of the entropy H(f) for all f ∈ D, subject to the conditions
ḡᵢ = ∫_X gᵢ(x) f(x) μ(dx),  i = 1, ..., m,
occurs for
f₀(x) = [Π_{i=1}^m exp(-νᵢgᵢ(x))] / [∫_X Π_{i=1}^m exp(-νᵢgᵢ(x)) μ(dx)].
Proof. Set
Z = ∫_X Π_{i=1}^m exp(-νᵢgᵢ(x)) μ(dx),
so
f₀(x) = Z⁻¹ Π_{i=1}^m exp(-νᵢgᵢ(x)).
From inequality (9.1.6), for every f ∈ D satisfying the constraints,
H(f) ≤ -∫_X f(x) log f₀(x) μ(dx) = -∫_X f(x)[-log Z - Σ_{i=1}^m νᵢgᵢ(x)] μ(dx)
= log Z + Σ_{i=1}^m νᵢ ∫_X f(x)gᵢ(x) μ(dx)
= log Z + Σ_{i=1}^m νᵢḡᵢ.
Furthermore, it is easy to show that
H(f₀) = log Z + Σ_{i=1}^m νᵢḡᵢ,
so that f₀ maximizes the entropy. □
In the special case m = 1 with g₁ = g, the maximizing density takes the form
f₀(x) = Z⁻¹e^{-νg(x)},
which is just the Gibbs canonical distribution function, with the partition function Z given by
Z = ∫_X e^{-νg(x)} μ(dx),
and with maximal entropy
H(f₀) = log Z + νḡ.    (9.1.12)
Theorem 9.2.1. Let (X, 𝒜, μ) be a measure space and P: L¹ → L¹ a Markov operator with a constant stationary density (P1 = 1). Then
H(Pf) ≥ H(f)  for every f ∈ D.
The proof of this result is difficult and requires many specialized techniques. However, the following considerations provide some insight into why it is true. Let η(y) be a concave function defined for y ≥ 0. Pick u, v, and z such that 0 ≤ u ≤ z ≤ v. Since z ∈ [u, v] there exist nonnegative constants α and β, with α + β = 1, such that
z = αu + βv.
Further, from the concavity of η it is clear that η(z) ≥ r, where
r = αη(u) + βη(v),
that is,
η(αu + βv) ≥ αη(u) + βη(v).
Proceeding by induction, this extends to the finite Jensen inequality
η(Σᵢ aᵢuᵢ) ≥ Σᵢ aᵢη(uᵢ),    (9.1.13)
where again aᵢ ≥ 0 and Σᵢ aᵢ = 1. Now consider a Markov operator on a finite space given by a doubly stochastic matrix (kᵢⱼ),
(Pf)ᵢ = Σ_{j=1}^n kᵢⱼ fⱼ,
where f = (f₁, ..., fₙ) and Σᵢ kᵢⱼ = Σⱼ kᵢⱼ = 1, kᵢⱼ ≥ 0. By applying inequality (9.1.13) to (Pf)ᵢ, we have
η((Pf)ᵢ) ≥ Σ_{j=1}^n kᵢⱼ η(fⱼ) = (P(ηf))ᵢ.    (9.1.14)
Passing from matrices to an arbitrary Markov operator P: L¹ → L¹ with constant stationary density, the same reasoning suggests
η(Pf(x)) ≥ Pη(f(x))  for all f ≥ 0, f ∈ L¹.
Integrating this inequality over the space X, and remembering that P preserves the integral, gives
∫_X η(Pf(x)) μ(dx) ≥ ∫_X Pη(f(x)) μ(dx) = ∫_X η(f(x)) μ(dx),
which is precisely H(Pf) ≥ H(f).
Remark 9.2.1. For a finite measure space, we know that the maximal entropy H_max is -log[1/μ(X)], so that
-log[1/μ(X)] ≥ H(Pⁿf) ≥ H(f).
This, in conjunction with Theorem 9.2.1, tells us that in a finite measure space, when P has a constant stationary density, the entropy never decreases and is bounded above by -log[1/μ(X)]. Thus, in this case the entropy H(Pⁿf) always converges as n → ∞, although not necessarily to the maximum. Note further that, if we have a normalized measure space, then μ(X) = 1 and H_max = 0. □
Remark 9.2.2. In the case of a Markov operator without a constant stationary density, it may happen that the sequence H(Pⁿf) is not increasing as n increases. As a simple example consider the quadratic transformation S(x) = 4x(1-x). The Frobenius–Perron operator for S, derived in Section 1.2, is
Pf(x) = (1/(4√(1-x))) [f(½(1-√(1-x))) + f(½(1+√(1-x)))],
and
f*(x) = 1/(π√(x(1-x)))
is a stationary density for P. Take as an initial density f = 1, so H(f) = 0, and
Pf(x) = 1/(2√(1-x)).
Then
H(Pf) = ∫₀¹ (1/(2√(1-x))) log(2√(1-x)) dx = (log 2) - 1.
Clearly H(Pf) = (log 2) - 1 < H(f) = 0. □
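The value H(Pf) = log 2 - 1 in Remark 9.2.2 can be confirmed numerically. In the sketch below (a check added for illustration, not from the text), the substitutions t = 1 - x and t = s² turn the entropy integral for Pf(x) = 1/(2√(1-x)) into the smooth integral ∫₀¹ log(2s) ds, which a plain Riemann sum handles well despite the original integrable singularity at x = 1.

```python
import numpy as np

# H(Pf) = -int_0^1 Pf log Pf dx with Pf(x) = 1/(2*sqrt(1-x)).
# With t = 1 - x and then t = s**2 this equals int_0^1 log(2s) ds.
s = np.linspace(0.0, 1.0, 1_000_001)[1:]   # drop s = 0 where log diverges
ds = s[1] - s[0]
H_Pf = np.sum(np.log(2.0 * s)) * ds        # numerical value of the integral

H_f = 0.0                                  # entropy of the initial density f = 1
```

The computed value agrees with log 2 - 1 ≈ -0.3069 and is indeed below H(f) = 0.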
It is for this reason that it is necessary to introduce the concept of conditional entropy for Markov operators with nonconstant stationary densities.
Definition 9.2.1. Let f, g ∈ D be such that supp f ⊂ supp g. Then the conditional entropy of f with respect to g is defined by
H(f | g) = -∫_X f(x) log[f(x)/g(x)] μ(dx).
The conditional entropy H(f | g) has two properties, which we will use later. They are:
1. If f, g ∈ D, then, by inequality (9.1.6), H(f | g) ≤ 0, and equality holds if and only if f = g.
2. If μ(X) = 1 and g = 1_X, then H(f | 1_X) = H(f), so the conditional entropy generalizes the entropy.
For f, g ∈ D, the condition supp f ⊂ supp g implies supp Pf ⊂ supp Pg (see Exercise 3.10), and given H(f | g) we may evaluate H(Pf | Pg) through the following.
Theorem 9.2.2. Let (X, 𝒜, μ) be an arbitrary measure space and P: L¹ → L¹ a Markov operator. Then
H(Pf | Pg) ≥ H(f | g)    (9.2.3)
for f, g ∈ D, supp f ⊂ supp g.
In particular, if g is a stationary density of P, then
H(Pf | g) ≥ H(f | g).
Proof. Define the operator
Rh = P(hg)/Pg  for h ∈ L¹, h ≥ 0,
which is well defined on supp Pg. The operator R is linear and positive, and satisfies
R1 = Pg/Pg = 1.
As in the derivation of (9.1.14), it follows that
η(Rh(x)) ≥ Rη(h(x)).    (9.2.4)
Setting h = f/g, integrating this last inequality over the space X, and remembering that P preserves the integral, we have
H(Pf | Pg) = -∫_X Pf(x) log[Pf(x)/Pg(x)] μ(dx) ≥ -∫_X P{f(x) log[f(x)/g(x)]} μ(dx) = H(f | g). □
Theorem 9.3.1. Let (X, 𝒜, μ) be a finite measure space and S: X → X be an invertible measure-preserving transformation. If P is the Frobenius–Perron operator corresponding to S, then H(Pⁿf) = H(f) for all n.
Proof. If S is invertible and measure preserving, then by equation (3.2.10) we have Pf(x) = f(S⁻¹(x)), since J⁻¹ ≡ 1. If P₁ is the Frobenius–Perron
operator corresponding to S⁻¹, then P₁Pf = f, and applying Theorem 9.2.1 twice gives
H(f) = H(P₁Pf) ≥ H(Pf) ≥ H(f),
so that H(Pf) = H(f) and, by induction, H(Pⁿf) = H(f) for all n. □
Theorem 9.3.2. Let (X, 𝒜, μ) be a normalized measure space and P the Frobenius–Perron operator corresponding to an exact transformation S: X → X, so that Pⁿf → 1 as n → ∞ for every f ∈ D. Then H(Pⁿf) → 0 for every bounded initial density f.
Proof. First assume f ≤ c with some constant c, so that Pⁿf ≤ Pⁿc = cPⁿ1 = c. Without any loss of generality, we can assume that c > 1. Further, since η(u) ≤ 0 for u ≥ 1, we have (note μ(X) = 1 and H_max = 0)
0 ≥ H(Pⁿf) ≥ ∫_{Aₙ} η(Pⁿf(x)) μ(dx),    (9.3.1)
where
Aₙ = {x: 1 ≤ Pⁿf(x) ≤ c}.
Since η(1) = 0, we have
|∫_{Aₙ} η(Pⁿf(x)) μ(dx)| ≤ k ∫_{Aₙ} |Pⁿf(x) - 1| μ(dx) ≤ k ||Pⁿf - 1||_{L¹},
where
k = sup_{1≤u≤c} |η'(u)|.
Since S is exact, ||Pⁿf - 1|| → 0 as n → ∞, so
lim_{n→∞} ∫_{Aₙ} η(Pⁿf(x)) μ(dx) = 0,
and from (9.3.1) it follows that H(Pⁿf) → 0.
To pass to an arbitrary f ∈ D, fix ε > 0 and write
f = f₁ + f₂,
where
f₁(x) = f(x) if 0 ≤ f(x) ≤ c, and f₁(x) = 0 if f(x) > c,
with the constant c chosen so large that
δ = ||f₂|| < ε and H(f₂) > -ε.
Now f₁/(1-δ) is a bounded density, and so from the first part of our proof we know that for n sufficiently large
H(Pⁿ(f₁/(1-δ))) ≥ -ε.
Furthermore, since H(f₂/δ) = (1/δ)H(f₂) + log δ and, by Theorem 9.2.1, the entropy is nondecreasing,
δH(Pⁿ(f₂/δ)) ≥ δH(f₂/δ) = H(f₂) + δ log δ ≥ -ε + δ log δ.
Since Pⁿf = (1-δ)Pⁿ(f₁/(1-δ)) + δPⁿ(f₂/δ), the concavity of η gives, for n sufficiently large,
H(Pⁿf) ≥ (1-δ)H(Pⁿ(f₁/(1-δ))) + δH(Pⁿ(f₂/δ)) ≥ -ε(1-δ) - ε + δ log δ = -2ε + δε + δ log δ.    (9.3.2)
Since ε, and hence δ < ε, may be taken arbitrarily small, H(Pⁿf) → 0. □
It is interesting to compare the behavior of the entropy for the baker transformation
S(x, y) = (2x, y/2)  for 0 ≤ x < 1/2, 0 ≤ y ≤ 1,
S(x, y) = (2x - 1, (y+1)/2)  for 1/2 ≤ x ≤ 1, 0 ≤ y ≤ 1,
originally introduced in Example 4.1.3, with that of the dyadic transformation. Observe that the x-coordinate of the baker transformation is transformed by the dyadic transformation
S₁(x) = 2x (mod 1).
From our considerations of Chapter 4, we know that the baker transformation is invertible and measure preserving. Thus by Theorem 9.3.1 it follows that the entropy of the sequence {Pⁿf}, where P is the Frobenius–Perron operator corresponding to the baker transformation, is constant for every density f.
In contrast, the dyadic transformation S₁ is exact. Hence, from Theorem 9.3.2, the entropy of {P₁ⁿf}, where P₁ is the Frobenius–Perron operator corresponding to S₁, increases to zero for all bounded initial densities f. □
Remark 9.3.2. Observe that in going from the baker to the dyadic transformation, we are going from an invertible (reversible) to a noninvertible (irreversible) system through the loss of information about the y-coordinate.
This loss of information is accompanied by an alteration of the behavior of
the entropy. An analogous situation occurs in statistical mechanics where,
in going from the Liouville equation to the Boltzmann equation, we also lose
coordinate information and go from a situation where entropy is constant
(Liouville equation) to one in which the entropy increases to its maximal
value (Boltzmann H theorem). 0
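The monotone increase of H(P₁ⁿf) to zero for the dyadic transformation can be observed directly. The sketch below (a numerical illustration, not from the text; the grid size and the initial density f(x) = 2x are arbitrary choices) represents densities as piecewise-constant step functions on dyadic cells, on which the Frobenius–Perron operator of S₁(x) = 2x (mod 1) acts exactly.

```python
import numpy as np

N = 2**12                               # number of dyadic cells on [0, 1]
f = 2.0 * (np.arange(N) + 0.5) / N      # f(x) = 2x, a bounded initial density

def P_dyadic(f):
    # Frobenius-Perron operator of S(x) = 2x (mod 1):
    # Pf(x) = (f(x/2) + f((x+1)/2)) / 2, exact on dyadic step densities
    n = f.size
    i = np.arange(n)
    return 0.5 * (f[i // 2] + f[(i + n) // 2])

def H(f):
    # entropy on the normalized space [0, 1]; eta(0) = 0 by convention
    return np.sum(np.where(f > 0, -f * np.log(f), 0.0)) / f.size

entropies = []
for _ in range(12):
    entropies.append(H(f))
    f = P_dyadic(f)
entropies.append(H(f))
```

The recorded entropies rise monotonically from H(2x) = ½ - log 2 ≈ -0.193 toward the maximal value 0, exactly as Theorem 9.3.2 predicts for an exact transformation.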
Theorem 9.4.1. Let (X, 𝒜, μ) be a finite measure space and P: L¹ → L¹ a Markov operator. If there is a constant c ≥ 0 such that
H(Pⁿf) ≥ -c
for every bounded f ∈ D and all sufficiently large n, then P is constrictive.
Proof. Consider the family
F = {f ∈ D: H(f) ≥ -c}.
We first show that F is weakly precompact.
We will use criterion 3 of Section 5.1 to demonstrate the weak precompactness of F. Since ||f|| = 1 for all f ∈ D, the first part of the criterion is satisfied. To check the second part, take ε > 0. Pick l = e⁻¹μ(X), N = exp[2(c + l)/ε], and δ = ε/2N, and take a set A ⊂ X such that μ(A) < δ. Then
∫_A f(x) μ(dx) = ∫_{A₁} f(x) μ(dx) + ∫_{A₂} f(x) μ(dx),    (9.4.1)
where
A₁ = {x ∈ A: f(x) ≤ N} and A₂ = {x ∈ A: f(x) > N}.
For the first integral,
∫_{A₁} f(x) μ(dx) ≤ Nδ = ε/2.
For the second, since H(f) ≥ -c, it follows that
∫_{A₂} f(x) log f(x) μ(dx) = -H(f) + ∫_{X∖A₂} η(f(x)) μ(dx) ≤ c + ∫_{X∖A₂} η_max μ(dx) ≤ c + (1/e)μ(X) = c + l,
since η_max = 1/e. Therefore
∫_{A₂} f(x) log N μ(dx) ≤ c + l,
or
∫_{A₂} f(x) μ(dx) ≤ (c + l)/log N = ε/2.
Thus
∫_A f(x) μ(dx) < ε,
and F is weakly precompact.
By assumption,
H(Pⁿf) ≥ -c
holds for every bounded f ∈ D and n sufficiently large, so {Pⁿf} is eventually contained in the weakly precompact set F, and P is constrictive. Since P is a Markov operator and is constrictive, we may write Pf in the form given by the spectral decomposition Theorem 5.3.1, and, for every initial f, the sequence {Pⁿf} will be asymptotically periodic.
Now assume, in addition, that
lim_{n→∞} H(Pⁿf) = 0.
Take one of the densities [1/μ(A₁)]1_{A₁}(x) of the spectral decomposition as an initial f. If τ is the asymptotic period of Pⁿf, then for large n the values H(Pⁿτf) form a constant sequence. However, by assumption,
lim_{n→∞} H(Pⁿf) = 0,
and, since the sequence {H(Pⁿτf)} is a constant sequence, this constant must be zero. By Proposition 9.1.1 and the equality condition in (9.1.6), the only density with zero entropy on a normalized measure space is the uniform density, so
[1/μ(A₁)]1_{A₁}(x) = 1_X(x).
Assume now that g ∈ D is a positive stationary density of the Frobenius–Perron operator P corresponding to a transformation S, and define a new normalized measure
μ̃(A) = ∫_A g(x) μ(dx)  for A ∈ 𝒜.
The Frobenius–Perron operator P̃ with respect to the measure μ̃ satisfies
∫_A P̃h(x) μ̃(dx) = ∫_{S⁻¹(A)} h(x) μ̃(dx)  for A ∈ 𝒜,
and satisfies P̃1 = 1, since Pg = g. Rewriting both sides in terms of μ gives
∫_A [P̃h(x)]g(x) μ(dx) = ∫_{S⁻¹(A)} h(x)g(x) μ(dx) = ∫_A P(h(x)g(x)) μ(dx),
so that (P̃h)g = P(hg), or
P̃h = (1/g)P(hg).
Furthermore, by induction,
P̃ⁿh = (1/g)Pⁿ(hg).
In this new space (X, 𝒜, μ̃), we may also calculate the entropy H̃(P̃ⁿh) as
H̃(P̃ⁿh) = -∫_X P̃ⁿh(x) log[P̃ⁿh(x)] μ̃(dx) = -∫_X Pⁿ(hg)(x) log[Pⁿ(hg)(x)/g(x)] μ(dx) ≤ 0.
Setting f = hg (note that h is a density with respect to μ̃ precisely when
∫_X h(x)g(x) μ(dx) = 1),
we obtain
H̃(P̃ⁿh) = H(Pⁿf | g).
Thus the behavior of the conditional entropy H(Pⁿf | g) may be studied through the entropy in the new space, for every f ∈ D such that f/g is bounded.
Consider, finally, the linear Boltzmann equation
∂u(t, x)/∂t + u(t, x) = Pu(t, x),
with the initial condition u(0, x) = f(x), which we examined in Chapter 8. There we showed that the solution of this equation converges to the stationary density f* of P whenever f* is positive and unique, so that H(u(t, ·) | f*) → 0 as t → ∞. Thus, in the case in which f* is positive and unique, the conditional entropy for the solutions of the linear Boltzmann equation always achieves its maximal value.
Exercises
9.1. Let X = {(x₁, ..., x_k) ∈ R^k: x₁ ≥ 0, ..., x_k ≥ 0}. Consider the family F_{m₁,...,m_k} of densities f: X → R⁺ such that
∫₀^∞ ··· ∫₀^∞ xᵢ f(x₁, ..., x_k) dx₁ ··· dx_k = mᵢ,  i = 1, ..., k.
Find the density in F_{m₁,...,m_k} having the maximal entropy.
9.2. Consider the family F_{mα} of densities f satisfying
∫_X y f(x, y) dx dy = m > 0.
Show that for α > 0 there is a density in F_{mα} having the maximal entropy and that for α ≤ 0 the entropy in F_{mα} is unbounded.
9.3. Consider the space X = {1, ..., N} with the counting measure. In this space D(X) consists of all probabilistic vectors (f₁ = f(1), ..., f_N = f(N)) satisfying
Σ_{k=1}^N f_k = 1,  f_k ≥ 0.
Show that f_k = 1/N, k = 1, ..., N, maximizes the entropy. For which vectors is the entropy minimal?
9.4. Consider the heat equation
∂u/∂t = (σ²/2) ∂²u/∂x²  for t > 0, x ∈ R,
with an initial density u(0, x) = f(x). Show that the entropy of the solution is nondecreasing, that is,
dH(u)/dt = (σ²/2) ∫_{-∞}^{+∞} u (∂ ln u/∂x)² dx ≥ 0.
9.5. Consider the Fokker–Planck equation
∂u/∂t = (σ²/2) ∂²u/∂x² - ∂(b(x)u)/∂x  for t > 0, 0 ≤ x ≤ 1,
with the boundary conditions
u_x(t, 0) = u_x(t, 1) = 0  for t > 0.
Assume that b is a C² function and that b(0) = b(1) = 0. Without looking for the explicit formula for the solutions (which, for arbitrary b, is difficult) prove the following properties:
(a) ∫₀¹ u(t, x) dx = const.;
(b) for any two solutions u₁ and u₂,
d/dt H(u₁ | u₂) = (σ²/2) ∫₀¹ u₁ (∂/∂x ln(u₁/u₂))² dx ≥ 0.
9.6. Consider the conditional entropy
H(f | g) = -∫₀¹ f(x) log[f(x)/g(x)] dx  for f, g ∈ D([0, 1]).
Compare, for different pairs of sequences {fₙ}, {gₙ} ⊂ D([0, 1]), the asymptotic behavior of
||fₙ - gₙ||_{L¹} and H(fₙ | gₙ).
9.7. Let (X, 𝒜, μ) be a measure space. Prove that for every two sequences {fₙ}, {gₙ} ⊂ D the convergence H(fₙ | gₙ) → 0 implies ||fₙ - gₙ||_{L¹} → 0. Is the converse implication also true? Exercise 9.6 can be helpful in guessing the proper answer (Loskot and Rudnicki, 1991).
9.8. Consider a density f_κ: R³ → R⁺ of the form
f_κ(x) = α exp(-β|x|² + kx),
where |x|² = x₁² + x₂² + x₃², kx = k₁x₁ + k₂x₂ + k₃x₃, and α, β are constants.
10
Stochastic Perturbation of Discrete Time Systems
We have seen two ways in which uncertainty (and thus probability) may appear in the study of strictly deterministic systems. The first was the consequence of following a random distribution of initial states, which, in turn, led to a development of the notion of the Frobenius–Perron operator and an examination of its properties as a means of studying the asymptotic properties of flows of densities. The second resulted from the random application of a transformation S to a system and led naturally to our study of the linear Boltzmann equations.
In this chapter we consider yet another source of probabilistic distributions in deterministic systems. Specifically, we examine discrete time situations in which at each time the value xₙ₊₁ = S(xₙ) is reached with some error. An extremely interesting situation occurs when this error is small and the system is "primarily" governed by a deterministic transformation S. We consider two possible ways in which this error might be small: Either the error occurs rather rarely and is thus small on the average, or the error occurs constantly but is small in magnitude. In both cases, we consider the situation in which the error is independent of S(xₙ) and are, thus, led to first recall the notion of independent random variables in the next section and to explore some of their properties in Sections 10.2 and 10.3.
10.1 Independent Random Variables
Let (Ω, ℱ, prob) be a probability space. A finite sequence of random variables (ξ₁, ..., ξ_k) is called a k-dimensional random vector. Equivalently, we could say that a random vector ξ = (ξ₁, ..., ξ_k) is a measurable transformation from Ω into R^k. Measurability means that for every Borel subset B ⊂ R^k the set
ξ⁻¹(B) = {ω: ξ(ω) ∈ B}
belongs to ℱ.
Thus, having a k-dimensional random vector (ξ₁, ..., ξ_k), we may consider two different kinds of densities: the density of each random component ξᵢ and the joint density function for the random vector (ξ₁, ..., ξ_k). Let the density of ξᵢ be denoted by fᵢ(x), and the joint density of ξ = (ξ₁, ..., ξ_k) be f(x₁, ..., x_k). Then by definition, we have
prob{ξᵢ ∈ B} = ∫_B fᵢ(x) dx    (10.1.1)
and
prob{ξ ∈ B} = ∫···∫_B f(x₁, ..., x_k) dx₁ ··· dx_k  for B ⊂ R^k.
Taking B = B₁ × R × ··· × R (with R appearing k-1 times), we have
prob{(ξ₁, ..., ξ_k) ∈ B} = prob{ξ₁ ∈ B₁} = ∫_{B₁} {∫···∫_{R^{k-1}} f(x, x₂, ..., x_k) dx₂ ··· dx_k} dx,    (10.1.2)
so that
f₁(x) = ∫···∫_{R^{k-1}} f(x, x₂, ..., x_k) dx₂ ··· dx_k.    (10.1.3)
Thus, having the joint density function f for (ξ₁, ..., ξ_k), we can always find the density for ξ₁ from equation (10.1.3). In an entirely analogous fashion, f₂ can be obtained by integrating f(x₁, x₂, ..., x_k) over x₁, x₃, ..., x_k. The same procedure will yield each of the densities fᵢ.
However, the converse is certainly not true in general since, having the density fᵢ of each random variable ξᵢ (i = 1, ..., k), it is not usually possible to find the joint density f of the random vector (ξ₁, ..., ξ_k). The one important special case in which this construction is possible occurs when ξ₁, ..., ξ_k are independent random variables. Thus, we have the following theorem.
Theorem 10.1.1. If the random variables ξ₁, ..., ξ_k are independent and have densities f₁, ..., f_k, respectively, then the joint density function for the random vector (ξ₁, ..., ξ_k) is given by
f(x₁, ..., x_k) = f₁(x₁) ··· f_k(x_k).    (10.1.4)
Proof. Consider a set B ⊂ R^k of the form
B = B₁ × ··· × B_k,    (10.1.5)
where B₁, ..., B_k are Borel subsets of R. Then, by independence,
prob{ξ ∈ B} = prob{ξ₁ ∈ B₁} ··· prob{ξ_k ∈ B_k} = ∫_{B₁} f₁(x₁) dx₁ ··· ∫_{B_k} f_k(x_k) dx_k.    (10.1.6)
Since, by definition, sets of the form (10.1.5) are generators of the Borel subsets in R^k, it is clear that (10.1.6) must hold for arbitrary Borel sets B ⊂ R^k. By the definition of the joint density, this implies that f₁(x₁) ··· f_k(x_k) is the joint density for the random vector (ξ₁, ..., ξ_k). □
As a simple application of Theorem 10.1.1, we consider two independent random variables ξ₁ and ξ₂ with densities f₁ and f₂, respectively. We wish to obtain the density of ξ₁ + ξ₂. Observe that, by Theorem 10.1.1, the random vector (ξ₁, ξ₂) has the joint density f₁(x₁)f₂(x₂). Thus, for an arbitrary Borel set B ⊂ R, we have
prob{ξ₁ + ξ₂ ∈ B} = ∬_{x₁+x₂∈B} f₁(x₁) f₂(x₂) dx₁ dx₂,
or, substituting x = x₁ + x₂, y = x₂,
prob{ξ₁ + ξ₂ ∈ B} = ∬_{B×R} f₁(x - y) f₂(y) dx dy = ∫_B {∫_{-∞}^∞ f₁(x - y) f₂(y) dy} dx,
so that
f(x) = ∫_{-∞}^∞ f₁(x - y) f₂(y) dy    (10.1.7)
is the density of ξ₁ + ξ₂. Similarly, if ξ has density f and c ≠ 0, then, from
prob{cξ ∈ A} = prob{ξ ∈ (1/c)A} = ∫_{(1/c)A} f(y) dy = (1/|c|) ∫_A f(x/c) dx,
the random variable cξ has the density (1/|c|)f(x/c). Thus, from (10.1.7), if ξ₁ and ξ₂ are independent and have densities f₁ and f₂, respectively, then c₁ξ₁ + c₂ξ₂ has the density
f(x) = (1/|c₁c₂|) ∫_{-∞}^∞ f₁((x - y)/c₁) f₂(y/c₂) dy.    (10.1.8)
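The convolution formula (10.1.7) is easy to check numerically. The sketch below (an added illustration; the choice of two U(0,1) variables is arbitrary) evaluates the convolution integral by a Riemann sum and compares it with the known triangular density of the sum of two independent uniform random variables.

```python
import numpy as np

# Density of xi1 + xi2 via (10.1.7): f(x) = integral of f1(x - y) f2(y) dy.
dx = 1e-3
x = np.arange(0.0, 2.0 + dx, dx)
y = np.arange(0.0, 1.0 + dx, dx)

f1 = lambda t: ((t >= 0) & (t <= 1)).astype(float)   # U(0,1) density
f2 = f1

# Riemann-sum evaluation of the convolution on the grid
f_sum = np.array([np.sum(f1(xv - y) * f2(y)) * dx for xv in x])

triangle = np.where(x <= 1.0, x, 2.0 - x)            # exact density of the sum
err = np.max(np.abs(f_sum - triangle))
mass = np.sum(f_sum) * dx                            # should integrate to 1
```

The small grid-level error confirms that (10.1.7) reproduces the triangular density.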
10.2 Mathematical Expectation and Variance
If ξ is a random variable on (Ω, ℱ, prob), its mathematical expectation is
E(ξ) = ∫_Ω ξ(ω) prob(dω).
In particular, for a constant random variable ξ ≡ c,
E(c) = ∫_Ω c prob(dω) = c.    (10.2.1)
More generally, if ξ = (ξ₁, ..., ξ_k) is a random vector with density f and h: R^k → R is a Borel measurable function such that hf is integrable, then the mathematical expectation of h∘ξ exists and is given by
E(h∘ξ) = ∫···∫_{R^k} h(x₁, ..., x_k) f(x₁, ..., x_k) dx₁ ··· dx_k.    (10.2.2)
To see this, first take h to be a simple function,
h(x) = Σ_{i=1}^n λᵢ 1_{Aᵢ}(x),
where the Aᵢ are mutually disjoint Borel subsets of R^k such that ∪ᵢ Aᵢ = R^k. Then
h(ξ(ω)) = Σ_{i=1}^n λᵢ 1_{Aᵢ}(ξ(ω)) = Σ_{i=1}^n λᵢ 1_{ξ⁻¹(Aᵢ)}(ω),
and, by the definition of the Lebesgue integral,
E(h∘ξ) = ∫_Ω h(ξ(ω)) prob(dω) = Σ_{i=1}^n λᵢ prob{ξ⁻¹(Aᵢ)}.
As a consequence,
E(h∘ξ) = Σ_{i=1}^n λᵢ prob{ξ ∈ Aᵢ} = Σ_{i=1}^n λᵢ ∫_{Aᵢ} f(x) dx = ∫_{R^k} h(x)f(x) dx.
Thus, for the h that are simple functions, equality (10.2.2) is proved. For an arbitrary h, hf integrable, we can find a sequence {hₙ} of simple functions converging to h and such that |hₙ| ≤ |h|. From equality (10.2.2), already proved for simple functions, we thus have
E(hₙ∘ξ) = ∫_{R^k} hₙ(x)f(x) dx,
and, since |hₙf| ≤ |h|f, it follows from the Lebesgue dominated convergence theorem that we may pass to the limit on both sides, proving (10.2.2). In the special case k = 1, h(x) = x, equation (10.2.2) gives
E(ξ) = ∫_{-∞}^∞ x f(x) dx.    (10.2.3)
From the linearity of the integral it follows that
E(Σᵢ λᵢξᵢ) = Σᵢ λᵢE(ξᵢ)    (10.2.4)
whenever the expectations E(ξᵢ) exist. The variance of a random variable ξ with expectation m = E(ξ) is defined by
D²(ξ) = E((ξ - m)²)    (10.2.5)
if the corresponding integral is finite.
Thus the variance of a random variable is just the average value of the square of the deviation of ξ away from m. By the additivity of the mathematical expectation, equation (10.2.5) may also be written as
D²(ξ) = E(ξ²) - m².    (10.2.6)
If ξ has a density f(x), then by the use of equation (10.2.2), we can also write
D²(ξ) = ∫_{-∞}^∞ (x - m)² f(x) dx
whenever the integral on the right-hand side exists. Finally, we note that for any constant λ,
D²(λξ) = λ²D²(ξ),
and the standard deviation of ξ is
σ(ξ) = √(D²(ξ)).
For our purposes here, two of the most important properties of the mathematical expectation and variance of a random variable are contained in the next theorem.
Theorem 10.2.1. If ξ₁, ..., ξ_k are independent random variables with finite expectations and variances, then
E(ξ₁ ··· ξ_k) = E(ξ₁) ··· E(ξ_k)    (10.2.7)
and
D²(ξ₁ + ··· + ξ_k) = D²(ξ₁) + ··· + D²(ξ_k).    (10.2.8)
Proof. The proof is easy even in the general case. However, to illustrate again the usefulness of (10.2.2), we will prove this theorem in the case when all the ξᵢ have densities. Thus, assume that ξᵢ has density fᵢ, i = 1, ..., k, and pick h(x₁, ..., x_k) = x₁ ··· x_k. Since the ξ₁, ..., ξ_k are independent random variables, by Theorem 10.1.1, the joint density function for the random vector (ξ₁, ..., ξ_k) is f₁(x₁) ··· f_k(x_k), so that
E(ξ₁ ··· ξ_k) = ∫···∫_{R^k} x₁ ··· x_k f₁(x₁) ··· f_k(x_k) dx₁ ··· dx_k = E(ξ₁) ··· E(ξ_k).
To prove (10.2.8), set mᵢ = E(ξᵢ), so that
D²(ξ₁ + ··· + ξ_k) = E((ξ₁ + ··· + ξ_k - m₁ - ··· - m_k)²) = E(Σ_{i,j=1}^k (ξᵢ - mᵢ)(ξⱼ - mⱼ)).
Since the ξ₁, ..., ξ_k are independent, (ξ₁ - m₁), ..., (ξ_k - m_k) are also independent. Therefore, by (10.2.4) and (10.2.7), we have
D²(ξ₁ + ··· + ξ_k) = Σ_{i=1}^k E((ξᵢ - mᵢ)²) + Σ_{i≠j} E(ξᵢ - mᵢ)E(ξⱼ - mⱼ).
Since E(ξᵢ - mᵢ) = 0 for each i, all the mixed terms vanish and (10.2.8) follows. □
One extremely useful consequence of the notion of variance is the Chebyshev inequality: for every ε > 0,
prob{|ξ - m| ≥ ε} ≤ D²(ξ)/ε²,  m = E(ξ).    (10.2.9)
To see this, note that for any nonnegative random variable ξ and a > 0,
E(ξ) = ∫_Ω ξ(ω) prob(dω) ≥ ∫_{{ω: ξ(ω)≥a}} ξ(ω) prob(dω) ≥ a ∫_{{ω: ξ(ω)≥a}} prob(dω) = a prob{ξ ≥ a},    (10.2.10)
and (10.2.9) follows upon applying this bound to the nonnegative random variable (ξ - m)² with a = ε².
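The Chebyshev inequality (10.2.9) can be illustrated by sampling. The sketch below (a numerical check, not part of the text; the exponential distribution and ε = 1.5 are arbitrary choices) compares the empirical tail probability with the bound D²(ξ)/ε².

```python
import numpy as np

rng = np.random.default_rng(0)
xi = rng.exponential(scale=1.0, size=200_000)   # E(xi) = 1, D^2(xi) = 1

m, var = xi.mean(), xi.var()
eps = 1.5
empirical = np.mean(np.abs(xi - m) >= eps)      # prob{|xi - m| >= eps}
chebyshev_bound = var / eps**2                  # right-hand side of (10.2.9)
```

For this distribution the true tail probability (about e^{-2.5} ≈ 0.08) sits well below the Chebyshev bound (about 0.44), showing that the inequality, while universal, is usually far from sharp.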
10.3 Stochastic Convergence
A sequence of random variables {ξₙ} converges in mean to ξ if
lim_{n→∞} E(|ξₙ - ξ|) = 0;    (10.3.1)
this convergence is denoted by
l.i.m. ξₙ = ξ.
Further, {ξₙ} converges stochastically (in probability) to ξ if, for every ε > 0,
lim_{n→∞} prob{|ξₙ - ξ| ≥ ε} = 0,    (10.3.2)
which we write as
st-lim ξₙ = ξ.    (10.3.3)
Note that in terms of Lᵖ norms, the mathematical expectation and variance of a random variable may be written as
E(|ξ|) = ||ξ||_{L¹(Ω)}
and
D²(ξ) = ∫_Ω |ξ - m|² prob(dω) = ||ξ - m||²_{L²(Ω)}.
This observation allows us to derive a connection between stochastic convergence and strong convergence from the Chebyshev inequality, as contained in the following proposition.
Proposition 10.3.1. If a sequence {ξₙ} of random variables, ξₙ ∈ Lᵖ(Ω), is strongly convergent in Lᵖ(Ω) to ξ, then {ξₙ} is stochastically convergent to ξ. Thus, convergence in mean implies stochastic convergence.
Proof. We only consider p < ∞, since for p = ∞ the proposition is trivial. Applying the Chebyshev inequality (10.2.9) to |ξₙ - ξ|ᵖ, we have
prob{|ξₙ - ξ|ᵖ ≥ εᵖ} ≤ (1/εᵖ)E(|ξₙ - ξ|ᵖ)
or, equivalently,
prob{|ξₙ - ξ| ≥ ε} ≤ (1/εᵖ)||ξₙ - ξ||ᵖ_{Lᵖ(Ω)} → 0  as n → ∞. □
Remark 10.3.1. For all of the types of convergence we have defined (strong and weak Lᵖ convergence, convergence in mean, stochastic convergence, and almost sure convergence), the limiting function is determined up to a set of measure zero. That is, if ξ and ξ̄ are both limits of the sequence {ξₙ}, then ξ and ξ̄ differ only on a set of measure zero. □
We now show the connection between almost sure and stochastic convergence with the following proposition.
Proposition 10.3.2. If a sequence {ξₙ} of random variables converges to ξ almost surely, then {ξₙ} converges to ξ stochastically.
Proof. Set
ηₙ = |ξₙ - ξ|/(1 + |ξₙ - ξ|),
so that 0 ≤ ηₙ ≤ 1 and ηₙ(ω) → 0 almost surely. By the Lebesgue dominated convergence theorem,
lim_{n→∞} E(ηₙ) = lim_{n→∞} ∫_Ω ηₙ(ω) prob(dω) = 0.
Moreover, by (10.2.10),
E(ηₙ) ≥ c prob{ηₙ ≥ c}  for 0 < c < 1.
Thus the stochastic convergence of {ηₙ} to zero implies the stochastic convergence of {ξₙ} to ξ, and the proof is complete. □
Theorem 10.3.1 (weak law of large numbers). Let {ξₙ} be a sequence of independent random variables with
E(ξᵢ) = mᵢ and D²(ξᵢ) ≤ M < ∞.
Then
lim_{n→∞} prob{ |(1/n) Σ_{i=1}^n (ξᵢ - mᵢ)| ≥ ε } = 0
for every ε > 0. In particular, if m₁ = m₂ = ··· = mₙ = m, then
st-lim (1/n) Σ_{i=1}^n ξᵢ = m.    (10.3.4)
Proof. Set
ηₙ = (1/n) Σ_{i=1}^n ξᵢ,
and, clearly,
E(ηₙ) = (1/n) Σ_{i=1}^n mᵢ.
By (10.2.8), D²(ηₙ) = (1/n²) Σ_{i=1}^n D²(ξᵢ) ≤ M/n, so the Chebyshev inequality (10.2.9) gives
prob{|ηₙ - E(ηₙ)| ≥ ε} ≤ M/(nε²) → 0  as n → ∞. □
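The weak law of large numbers is easy to observe by simulation. The sketch below (an added illustration; the U(0,1) terms, sample sizes, and ε = 0.01 are arbitrary choices) estimates the deviation probability in (10.3.4) for two values of n and checks it against the Chebyshev bound M/(nε²) used in the proof.

```python
import numpy as np

rng = np.random.default_rng(1)
m, M = 0.5, 1.0 / 12.0          # mean and variance bound for U(0,1) terms
eps = 0.01

def deviation_prob(n, trials=500):
    # empirical prob{ |(1/n) * sum(xi_i) - m| >= eps }
    means = rng.random((trials, n)).mean(axis=1)
    return float(np.mean(np.abs(means - m) >= eps))

p_small, p_large = deviation_prob(100), deviation_prob(10_000)
```

The deviation probability drops sharply as n grows, consistent with the M/(nε²) rate from the proof.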
There are stronger versions of these results; the most celebrated is the so-called strong law of large numbers, as contained in the Kolmogorov theorem.
Theorem 10.3.2 (Kolmogorov). Let {ξₙ} be a sequence of independent random variables with
E(ξₙ) = mₙ and Σ_{n=1}^∞ D²(ξₙ)/n² < ∞.
Then
lim_{n→∞} (1/n) Σ_{i=1}^n (ξᵢ - mᵢ) = 0
with probability 1.
We will not give a proof of the Kolmogorov theorem [see Breiman (1968) for the proof] as it is not used in our studies of the flow of densities. We stated it only because of its close correspondence to the Birkhoff individual ergodic Theorem 4.2.4, which also deals with the pointwise convergence of averages. To illustrate this correspondence, consider the sequence
ξₙ(x) = f(Sⁿ(x)),    (10.3.5)
where S is a measure-preserving transformation on a normalized measure space (X, 𝒜, μ) and f is integrable. The Birkhoff theorem asserts the almost sure convergence of the averages (1/n) Σ_{i=1}^n ξᵢ even though the ξₙ defined by (10.3.5) are, in general, not independent.
such that each ηₙ: Ω → R takes only two values, 1 and 0, with the following probabilities:
prob(ηₙ = 0) = ε,  prob(ηₙ = 1) = 1 - ε,  n = 0, 1, ...,
and each ξₙ has the same density g. We assume all of these random variables are independent of each other and of the initial value x₀, and consider the perturbed system
xₙ₊₁ = ηₙ S(xₙ) + (1 - ηₙ) ξₙ.    (10.4.2)
To find the density of xₙ₊₁, write
Prob{xₙ₊₁ ∈ A} = Prob{ξₙ ∈ A and ηₙ = 0} + Prob{S(xₙ) ∈ A and ηₙ = 1}.
Since the events {ξₙ ∈ A} and {ηₙ = 0} are independent of each other and independent of the initial vector x₀, we have
Prob{ξₙ ∈ A and ηₙ = 0} = ε ∫_A g(x) μ(dx).
Finally, since xₙ has the density fₙ by assumption, this last formula implies
Prob{S(xₙ) ∈ A and ηₙ = 1} = (1 - ε) ∫_{S⁻¹(A)} fₙ(x) μ(dx).
Thus
Prob{xₙ₊₁ ∈ A} = (1 - ε) ∫_{S⁻¹(A)} fₙ(x) μ(dx) + ε ∫_A g(x) μ(dx)
for all Borel sets A ⊂ R. Hence, if xₙ has density fₙ, then this demonstrates that xₙ₊₁ also has a density fₙ₊₁ given by
fₙ₊₁ = (1 - ε)Pfₙ + εg,    (10.4.3)
where P is the Frobenius–Perron operator corresponding to S. Now define
P_ε f(x) = (1 - ε)Pf(x) + εg(x) ∫_X f(x) μ(dx)    (10.4.4)
for all f ∈ L¹. Using the definition of P_ε, we may rewrite equation (10.4.3) in the form
fₙ₊₁ = P_ε fₙ.    (10.4.5)
Our goal is to deduce as much as possible concerning the asymptotic behavior of P_εⁿ f₀ for f₀ ∈ D.
The first result is contained in the following proposition.
Proposition 10.4.1. Let the operator P_ε: D → D be defined by equation (10.4.4). Then {P_εⁿ} is asymptotically stable.
Proof. The proof is trivial. From the definition of P_ε in (10.4.4), we have ||P_εⁿf - P_εⁿh|| = (1-ε)ⁿ||Pⁿ(f - h)|| ≤ (1-ε)ⁿ||f - h|| for f, h ∈ D, and the unique stationary density of P_ε is
f_ε* = ε Σ_{k=0}^∞ (1 - ε)^k P^k g.    (10.4.6)
Consider, for example, the transformation S(x) = x/2 on [0, 1], for which
P^k g(x) = 2^k g(2^k x),    (10.4.7)
and, thus,
f_ε*(x) = ε Σ_{k=0}^∞ (1 - ε)^k 2^k g(2^k x).
Now pick an arbitrarily small h > 0 and integrate f_ε* over [0, h]:
∫₀^h f_ε*(x) dx = ε Σ_{k=0}^∞ (1 - ε)^k ∫₀^{2^k h} g(y) dy.
For δ > 0 arbitrarily small, we can always find an m such that, for all k > m,
∫₀^{2^k h} g(y) dy ≥ ∫₀^∞ g(y) dy - δ = 1 - δ,
so
lim inf_{ε→0} ∫₀^h f_ε*(x) dx ≥ lim_{ε→0} ε Σ_{k=m+1}^∞ (1 - ε)^k (1 - δ) = 1 - δ.
If f_ε* converged in L¹ to a density f* as ε → 0, we would then have ∫₀^h f*(x) dx ≥ 1 - δ for every h > 0, while lim_{h→0} ∫₀^h f*(x) dx = 0, which is a contradiction.
Proof. Since f_ε* is a stationary density for P_ε, we have P_ε f_ε* = f_ε*. More explicitly, if Pg = g, then from (10.4.6),
f_ε* = εg Σ_{k=0}^∞ (1 - ε)^k = g,
so f_ε* = g. □
Although this example shows that it is quite possible for f_ε* to depend on g, the following theorem gives sufficient conditions for not only the existence of the limit of f_ε* as ε → 0 but also its value: when S preserves the normalized measure μ and is ergodic, one may show, using ε Σ_{k=0}^∞ (1 - ε)^k = 1, that
f* = lim_{ε→0} f_ε* = 1/μ(X).
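The series (10.4.6) for the stationary density of P_ε can be verified numerically. The sketch below (an added illustration, with arbitrary choices: the dyadic map as S, a parabolic reset density g, and ε = 0.2) represents densities on a dyadic grid, sums the series until the weights (1-ε)^k are negligible, and checks stationarity under P_ε.

```python
import numpy as np

N = 2**10
eps = 0.2

def P(f):
    # Frobenius-Perron operator of S(x) = 2x (mod 1) on N dyadic cells
    i = np.arange(f.size)
    return 0.5 * (f[i // 2] + f[(i + f.size) // 2])

x = (np.arange(N) + 0.5) / N
g = 6.0 * x * (1.0 - x)          # reset density on [0, 1]
g /= g.sum() / N                 # normalize exactly on the grid

# stationary density (10.4.6), truncated when (1-eps)^k is negligible
f_star = np.zeros(N)
Pkg, weight = g.copy(), eps
for _ in range(200):
    f_star += weight * Pkg
    Pkg, weight = P(Pkg), weight * (1.0 - eps)

P_eps = lambda f: (1 - eps) * P(f) + eps * g * (f.sum() / N)
residual = float(np.max(np.abs(P_eps(f_star) - f_star)))
```

The residual is at the level of the truncated series tail, confirming that (10.4.6) is indeed the fixed point of P_ε.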
E(h(xₙ₊₁)) = ∫_{R^d} h(x) fₙ₊₁(x) dx.    (10.5.2)
Furthermore, because of (10.5.1) and the fact that the joint density of (xₙ, ξₙ) is just fₙ(y)g(z), we also have
E(h(xₙ₊₁)) = E(h(S(xₙ) + ξₙ)) = ∫_{R^d} ∫_{R^d} h(S(y) + z) fₙ(y) g(z) dy dz = ∫_{R^d} ∫_{R^d} h(x) fₙ(y) g(x - S(y)) dx dy,    (10.5.3)
where we substituted x = S(y) + z. Equating (10.5.2) and (10.5.3), and using the fact that h was an arbitrary, bounded, measurable function, we immediately obtain
fₙ₊₁(x) = ∫_{R^d} fₙ(y) g(x - S(y)) dy.    (10.5.4)
Remark 10.5.1. Our derivation of (10.5.4), though mathematically precise, is somewhat different from the usual method. Our reasons for this are
prob{xₙ₊₁ ∈ A} = ∫_A ∫_{R^d} fₙ(y) g(x - S(y)) dy dx,
and, thus, by the definition of density, if fₙ exists then fₙ₊₁ also exists and is given by (10.5.4). Finally, we have introduced this method of obtaining (10.5.4) because we use it later in deriving the Fokker–Planck equation that describes the evolution of densities for continuous time systems in the presence of a stochastic process. □
From our equation (10.5.4), we define an operator P: L¹ → L¹ by
Pf(x) = ∫_{R^d} f(y) g(x - S(y)) dy    (10.5.5)
for f ∈ L¹. That P is a Markov operator is quite easy to prove. Note first that if we set K(x, y) = g(x - S(y)), then, for g ∈ D, K is a stochastic kernel (Section 5.7) and P is a Markov operator. Thus, in examining the behavior of the systems forming the subject of this section, we have available all of the tools developed in Section 5.7.
Remark 10.5.2. In the special case in which d = 1 and S(x) = λx, equation (10.5.5) reduces to that considered in Example 5.7.2 with a = 1 and b = -λ. □
However, because of the characteristics of the function g identified as a kernel, we can prove more than in Section 5.7. We start by stating and proving a result for the asymptotic periodicity of {Pⁿ}.
Theorem 10.5.1. Let P: L¹(R^d) → L¹(R^d) be the Markov operator defined by (10.5.5). If there exist a nonnegative measurable function V: R^d → R with V(x) → ∞ as |x| → ∞ and constants 0 ≤ α < 1, β ≥ 0 such that
∫_{R^d} g(x - S(y)) V(x) dx ≤ αV(y) + β  for all y ∈ R^d,
then {Pⁿ} is asymptotically periodic.
Proof. We will use Theorem 5.7.2 in the proof, noting that now the stochastic kernel is explicitly given by K(x, y) = g(x - S(y)).
We first verify that (5.7.19) holds. Since g is integrable, for every λ > 0 there is a δ > 0 such that
∫_E g(x) dx < λ  whenever μ(E) < δ.
In particular,
∫_E K(x, y) dx = ∫_E g(x - S(y)) dx = ∫_{E-S(y)} g(x) dx < λ
for μ(E - S(y)) = μ(E) < δ. Thus, (5.7.19) holds uniformly for all bounded sets B.
Further, from (10.5.5) and the assumptions of the theorem we have
∫_{R^d} V(x)Pf(x) dx = ∫_{R^d} ∫_{R^d} V(x) g(x - S(y)) f(y) dx dy ≤ α ∫_{R^d} V(y)f(y) dy + β,    (10.5.6)
so that all the conditions of Theorem 5.7.2 are satisfied and {Pⁿ} is asymptotically periodic. □
The perturbed dynamical system
xₙ₊₁ = S(xₙ) + ξₙ  (mod 1),    (10.5.8)
with S: [0, 1] → [0, 1] and the ξₙ independent with common density g, satisfies the conditions of Theorem 10.5.1, so the corresponding sequence {Pⁿ} is asymptotically periodic; a numerically observed period-3 cycle of densities is shown in Figure 10.5.1.
Corollary 10.5.1. If P given by (10.5.5) satisfies the conditions of Theorem 10.5.1, and g(x) > 0, then {Pⁿ} is asymptotically stable.
Proof. We start with the observation that, for every fixed x, the product g(x - S(y))Pⁿ⁻¹f(y), considered as a function of y, does not vanish everywhere. As a consequence,
Pⁿf(x) = ∫_{R^d} g(x - S(y)) Pⁿ⁻¹f(y) dy > 0  for all x.
Suppose that there were two different stationary densities of P, namely f₁ and f₂. Set f = f₁ - f₂, so that Pf = f and, in particular,
||Pf|| = ||f||.    (10.5.9)
Writing f = f⁺ - f⁻, we have
Pf⁺(x) = ∫_{R^d} g(x - S(y)) f⁺(y) dy,    (10.5.10)
FIGURE 10.5.1. Asymptotic periodicity illustrated. Here we show the histograms obtained after iterating 10⁴ initial points uniformly distributed on [0, 1] under equation (10.5.8). In (a) n = 10; (b) n = 11; (c) n = 12; and (d) n = 13. The correspondence of the histograms for n = 10 and n = 13 indicates that, with these parameter values, numerically the sequence of densities has period 3.
and similarly for Pf⁻. Since f⁺ is not identically zero and g is strictly positive, the integral in (10.5.10) is a nonzero function for every x, and, thus, Pf⁺(x) > 0 for all x. Clearly, too, Pf⁻(x) > 0 for all x and, thus, the supports of Pf⁺ and Pf⁻ are not disjoint. By Proposition 3.1.2, then, we must have ||Pf|| < ||f||, which contradicts equality (10.5.9). Thus, f₁ and f₂ must be identical almost everywhere if they exist. □
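Corollary 10.5.1 can be illustrated by simulation. In the sketch below (an added illustration; the linear map S(x) = λx with Gaussian noise is chosen only because its stationary density is known in closed form, a standard AR(1) fact not taken from the text), trajectories started far from equilibrium settle onto the unique stationary Gaussian with variance σ²/(1 - λ²).

```python
import numpy as np

rng = np.random.default_rng(42)
lam, sigma = 0.5, 1.0                    # S(x) = lam * x, so |S(x)| <= lam |x|
n_paths, n_steps = 20_000, 60

x = rng.normal(0.0, 5.0, n_paths)        # spread-out initial condition
for _ in range(n_steps):
    x = lam * x + sigma * rng.normal(0.0, 1.0, n_paths)   # x_{n+1} = S(x_n) + xi_n

# stationary density here is Gaussian with variance sigma^2 / (1 - lam^2)
target_var = sigma**2 / (1 - lam**2)
```

Regardless of the (widely spread) initial distribution, the empirical mean and variance converge to those of the unique stationary density, as the corollary predicts.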
Take, for example, S(x) = x and
g(x) = (1/√(2π)) exp(-x²/2),
so that
Pf(x) = (1/√(2π)) ∫_{-∞}^∞ exp[-(x - y)²/2] f(y) dy.
Note that Pf(x) is simply the solution u(t, x) of the heat equation (7.4.13) with σ² = 1 at time t = 1, assuming an initial condition u(0, y) = f(y). Since this solution is given by a semigroup of operators [cf. equations (7.4.11) and (7.9.9)], it can be shown that
Pⁿf(x) = (1/√(2πn)) ∫_{-∞}^∞ exp[-(x - y)²/2n] f(y) dy ≤ (1/√(2πn)) ∫_{-∞}^∞ f(y) dy = 1/√(2πn),
so that Pⁿf(x) → 0 uniformly in x, and no stationary density can exist; the condition of Theorem 10.5.1 fails for S(x) = x.
Assume now that g has a finite first moment,
∫_{R^d} g(x)|x| dx < ∞,    (10.5.11)
and take V(x) = |x|; then the condition of Theorem 10.5.1 becomes
∫_{R^d} g(x - S(y))|x| dx ≤ α|y| + β.    (10.5.12)
Suppose there is a constant λ, 0 ≤ λ < 1, and an M ≥ 0 such that, for every x ∈ R^d,
|S(x)| ≤ λ|x|  for |x| ≥ M.    (10.5.13)
Then
∫_{R^d} g(x - S(y))|x| dx = ∫_{R^d} g(x)|x + S(y)| dx ≤ ∫_{R^d} g(x)(|x| + |S(y)|) dx = |S(y)| + ∫_{R^d} g(x)|x| dx.
From (10.5.13) we also have
|S(y)| ≤ λ|y| + max_{|x|≤M} |S(x)|,
so that (10.5.12) holds with α = λ.
Finally, observe that the operator defined by (10.5.5) may be decomposed using the Frobenius–Perron operator P corresponding to S alone. By the duality ∫ Pf(y)h(y) dy = ∫ f(y)h(S(y)) dy applied to h(y) = g(x - y), equation (10.5.5) takes the convolution form
P̄f(x) = ∫_{R^d} g(x - y) Pf(y) dy    (10.5.14)
or, after a change of variables,
P̄f(x) = ∫_{R^d} g(y) Pf(x - y) dy,    (10.5.15)
where P̄ denotes the operator (10.5.5) and P is the Frobenius–Perron operator of S.
Consider now the system
xₙ₊₁ = S(xₙ) + εξₙ,  ε > 0,    (10.6.1)
in which the perturbations are small in magnitude. Since the density of εξₙ is ε^{-d}g(y/ε), the Markov operator corresponding to (10.6.1) is, by (10.5.15),
P_ε f(x) = ε^{-d} ∫_{R^d} g(y/ε) Pf(x - y) dy,    (10.6.2)
which, after the change of variables y = εz, becomes
P_ε f(x) = ∫_{R^d} g(z) Pf(x - εz) dz.    (10.6.4)
Since
∫_{R^d} g(y) Pf(x) dy = Pf(x),
we may expect P_ε f to be close to Pf for small ε. Indeed,
lim_{ε→0} ||P_ε f - Pf|| = 0  for all f ∈ L¹.
To prove this, write
Pf(x) = ∫_{R^d} g(y) Pf(x) dy,
so that
P_ε f(x) - Pf(x) = ∫_{R^d} g(y)[Pf(x - εy) - Pf(x)] dy.
Pick an arbitrarily small δ > 0. Since g and Pf are both integrable functions on R^d, there must exist an r > 0 such that
∫_{|y|≥r} g(y) dy ≤ δ/4.
From the preceding equality,
||P_ε f - Pf|| ≤ ∫_{R^d} ∫_{R^d} g(y)|Pf(x - εy) - Pf(x)| dx dy,
so that
||P_ε f - Pf|| ≤ I₁ + I₂,
where
I₁ = ∫_{R^d} ∫_{|y|≤r} g(y)|Pf(x - εy) - Pf(x)| dx dy
and
I₂ = ∫_{R^d} ∫_{|y|≥r} g(y)|Pf(x - εy) - Pf(x)| dx dy.
For |y| ≤ r the translations εy are uniformly small, and, since translation is continuous in L¹, for ε sufficiently small
∫_{R^d} |Pf(x - εy) - Pf(x)| dx ≤ δ/2  uniformly in |y| ≤ r,
so that
I₁ ≤ (δ/2) ∫_{|y|≤r} g(y) dy ≤ δ/2.
For I₂ we have
I₂ ≤ ∫_{R^d} ∫_{|y|≥r} g(y)Pf(x - εy) dx dy + ∫_{R^d} ∫_{|y|≥r} g(y)Pf(x) dx dy = 2 ∫_{|y|≥r} g(y) dy ≤ δ/2,
since ∫_{R^d} Pf(x - εy) dx = ∫_{R^d} Pf(x) dx = 1.
Thus ||P_ε f - Pf|| ≤ δ; that is,
lim_{ε→0} ||P_ε f - Pf|| = 0.
Corollary 10.6.1. Suppose that S and g are given and that for every small ε, 0 < ε < ε₀, the operator P_ε, defined by (10.6.4), has a stationary density f_ε. If the limit
f* = lim_{ε→0} f_ε
exists in L¹, then f* is a stationary density of P.
Proof. Write
||Pf* - f*|| ≤ ||Pf* - P_ε f*|| + ||P_ε f* - P_ε f_ε|| + ||P_ε f_ε - f*||.
Since P_ε is contractive, ||P_ε f* - P_ε f_ε|| ≤ ||f* - f_ε||, and P_ε f_ε = f_ε, so every term on the right-hand side converges to zero as ε → 0 (for the first term, use the limit just proved). Thus Pf* = f*. □
We turn now to systems on R⁺ = [0, ∞) with multiplicative perturbations,
xₙ₊₁ = ξₙ S(xₙ),    (10.7.1)
where the ξₙ are independent nonnegative random variables with common density g, independent of xₙ. Proceeding as before, for any bounded measurable h,
E(h(xₙ₊₁)) = E(h(ξₙ S(xₙ))) = ∫₀^∞ ∫₀^∞ h(zS(y)) fₙ(y) g(z) dy dz = ∫₀^∞ ∫₀^∞ h(x) fₙ(y) g(x/S(y)) (1/S(y)) dx dy,    (10.7.3)
where we substituted x = zS(y).
It follows that the Markov operator P: L¹(R⁺) → L¹(R⁺) corresponding to (10.7.1) is
Pf(x) = ∫₀^∞ f(y) g(x/S(y)) (1/S(y)) dy,    (10.7.5)
a Markov operator with the stochastic kernel
K(x, y) = g(x/S(y)) (1/S(y)).    (10.7.6)
Theorem 10.7.1. Let P be given by (10.7.5), where S: R⁺ → R⁺ is continuous with S(x) > 0 for x > 0, and assume
S(x) ≤ αx + β  for x ≥ 0,    (10.7.7)
with constants α, β ≥ 0 satisfying αm < 1, where
m = ∫₀^∞ x g(x) dx.    (10.7.8)
Then {Pⁿ} is asymptotically periodic.
Proof. Using V(x) = x, we have
∫₀^∞ x Pf(x) dx = ∫₀^∞ f(y) S(y) dy ∫₀^∞ z g(z) dz = m ∫₀^∞ f(y) S(y) dy ≤ αm ∫₀^∞ y f(y) dy + βm.
It remains to verify condition (5.7.19) on bounded sets B ⊂ (0, ∞). Since g is integrable, for every λ > 0 there is a δ₁ > 0 such that ∫_E g(x) dx ≤ λ whenever μ(E) < δ₁. Define
δ = δ₁ inf_{y∈B} S(y).
Then, for μ(E) < δ,
∫_E K(x, y) dx = ∫_E g(x/S(y)) (1/S(y)) dx = ∫_{E/S(y)} g(x) dx ≤ λ,
since μ(E/S(y)) = μ(E)/S(y) < δ₁ for y ∈ B, and all of the conditions of Theorem 5.7.2 are satisfied. Thus P is constrictive and a simple application of the spectral decomposition Theorem 5.3.1 finishes the proof. □
We close with a second theorem concerning asymptotic stability induced by multiplicative noise.
Theorem 10.7.2. If the Markov operator P: L¹(R⁺) → L¹(R⁺) defined by (10.7.5) satisfies the conditions of Theorem 10.7.1 and, in addition, g(x) > 0, then {Pⁿ} is asymptotically stable.
Proof. Note that, for fixed x, the quantity
Pⁿf(x) = ∫₀^∞ g(x/S(y)) (1/S(y)) Pⁿ⁻¹f(y) dy
is strictly positive, since the integrand, considered as a function of y, does not vanish everywhere; Theorem 5.6.1 then finishes the proof of the asymptotic stability of {Pⁿ}. □
Theorems 10.7.1 and 10.7.2 illustrate the behaviors that may be induced
by multiplicative noise in discrete time systems. A number of other results concerning asymptotic periodicity and asymptotic stability induced
by multiplicative noise may be proved, but rather than giving these we
refer the reader to Horbacz [1989a,b].
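The stabilizing effect of multiplicative noise under condition (10.7.7) can be seen in simulation. The sketch below (an added illustration; the affine map S(x) = αx + β and uniform ξₙ on [0, 2] are arbitrary choices satisfying αm < 1; the closed-form stationary mean mβ/(1 - αm) follows from taking expectations in xₙ₊₁ = ξₙS(xₙ) for this linear S and is not from the text) shows trajectories settling to a stationary regime.

```python
import numpy as np

rng = np.random.default_rng(7)
alpha, beta = 0.5, 1.0
S = lambda x: alpha * x + beta        # satisfies (10.7.7) with these constants
m = 1.0                               # mean of xi ~ U(0, 2), so alpha * m < 1

n_paths, n_steps = 50_000, 80
x = np.full(n_paths, 10.0)            # all paths started far from equilibrium
for _ in range(n_steps):
    xi = rng.uniform(0.0, 2.0, n_paths)
    x = xi * S(x)                     # x_{n+1} = xi_n * S(x_n), eq. (10.7.1)

# stationary mean for this linear S: E(x) = m*beta / (1 - alpha*m)
expected_mean = m * beta / (1 - alpha * m)
```

The empirical mean converges to mβ/(1 - αm) = 2, independent of the common starting point, consistent with the asymptotic stability asserted by Theorem 10.7.2.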
Exercises
10.1. Let ξₙ: Ω → R^d, n = 1, 2, ..., be a sequence of independent random vectors, and let φₙ: R^d → R^d be a sequence of Borel measurable functions. Prove that the random vectors ηₙ(ω) = φₙ(ξₙ(ω)) are independent.
10.2. Replace inequality (10.7.7) in Theorem 10.7.1 by
0 ≤ S(x) ≤ ax,  a < 1,
and show that in this case the sequence {Pⁿ} is sweeping to zero. Formulate an analogous sufficient condition for sweeping to +∞.
10.3. Let S: [0, 1] → [0, 1] be a measurable transformation and let {ξₙ} be a sequence of independent random variables each having the same density g. Consider the process defined by
xₙ₊₁ = S(xₙ) + ξₙ  (mod 1),
and denote by fₙ the density of the distribution of xₙ. Find an explicit expression for the Markov operator P: L¹([0, 1]) → L¹([0, 1]) such that
fₙ₊₁ = Pfₙ.
10.4. Under the assumptions of the previous exercise, show that {Pⁿ} is asymptotically periodic. Find sufficient conditions for the asymptotic stability of {Pⁿ}.
10.5. Consider the dynamical system (10.7.1) on the unit interval. Assume that S: [0, 1] → [0, 1] is continuous and that ξₙ: Ω → [0, 1] are independent random variables with the same density g ∈ D([0, 1]). Introduce the corresponding Markov operator and reformulate Theorems 10.7.1 and 10.7.2 in this case.
10.6. As a specific example of the dynamical system (10.7.1) on the unit interval (see the previous exercise), consider the quadratic map S(x) = ax(1 - x) and ξₙ having a density g ∈ D([0, 1]) such that g(x) ≥ K for 0 ≤ x ≤ 1. Show that for every a ∈ (1, 4] there is a K for which {Pⁿ} is asymptotically stable (Horbacz, 1989a).
10.7. Note that if
xₙ₊₁ = ξₙ S(xₙ)
and we set y = ln x, η = ln ξ, and T = ln S, then
yₙ₊₁ = T(e^{yₙ}) + ηₙ
results. Examine the results for additive noise that can be obtained using this technique on the theorems of Section 10.7 pertaining to multiplicative noise.
11
Stochastic Perturbation of Continuous Time Systems
In this chapter continuous time systems in the presence of noise are considered. This leads us to examine systems of stochastic differential equations
and to a derivation of the forward Fokker-Planck equation, describing the
evolution of densities for these systems. We close with some results concerning the asymptotic stability of solutions to the Fokker-Planck equation.
dx/dt = b(x) + σ(x)ξ,    (11.1.1)
w(0) = 0; and the increments w(t) - w(s) have the Gaussian density
g(t - s, x) = (1/√(2π(t - s))) exp(-x²/2(t - s)),    (11.1.2)
where
g(t, x) = (1/√(2πt)) exp(-x²/2t).
An easy calculation shows that
E((w(t) - w(s))ⁿ) = (1/√(2π(t - s))) ∫_{-∞}^∞ xⁿ exp[-x²/2(t - s)] dx,    (11.1.3)
so
E(w(t) - w(s)) = 0    (11.1.5)
FIGURE 11.1.1. Sample trajectories w(t) of a one-dimensional Wiener process, panels (a) and (b).
and
D²(w(t) - w(s)) = E((w(t) - w(s))²) = t - s.    (11.1.6)
To see that the trajectories of w(t), although continuous, are not differentiable, set Δw = w(t₀ + Δt) - w(t₀). We have
E(|Δw/Δt|) = (1/|Δt|) E(|w(t₀ + Δt) - w(t₀)|) = (1/|Δt|) √(2|Δt|/π) = √(2/(π|Δt|)),
using E(|w(t) - w(s)|) = √(2(t - s)/π), which follows from (11.1.2). Thus E(|Δw/Δt|) diverges as Δt → 0, and the difference quotients cannot converge in mean. One is therefore led to interpret (11.1.1) through its integrated form
x(t) = x(0) + ∫₀ᵗ b(x(s)) ds + ∫₀ᵗ σ(x(s)) dw(s).
However, this approach leads to the new problem of defining what the integrals on the right-hand side mean, which will be dealt with in Section 11.3.
To obtain further insight into the nature of the process w(t), examine the alternative sequence {zₙ} of processes given by the piecewise linear interpolation of w on the mesh tⁿᵢ = i/n:
zₙ(t) = w(tⁿᵢ₋₁) + [(t - tⁿᵢ₋₁)/(tⁿᵢ - tⁿᵢ₋₁)] [w(tⁿᵢ) - w(tⁿᵢ₋₁)]  for tⁿᵢ₋₁ ≤ t ≤ tⁿᵢ,
and set ηₙ(t) = dzₙ(t)/dt wherever the derivative exists.
The process ηₙ(t) is piecewise constant. The heights of the individual segments are independent, have a mean value zero, and variance D²ηₙ(t) = n.
Thus, the variance grows linearly with n. If we look at this process approximating white noise, we see that it consists of a sequence of independent
impulses of width (1/n) and variance n. For very large n we will see peaks
of almost all possible sizes uniformly spread along the t-axis.
Note that the random variable zₙ(t) for fixed t and large n is the sum of many independent increments. Thus the density of zₙ(t) must be close to a Gaussian by the central limit theorem. The limiting process w(t) will, therefore, also have a Gaussian density, which is why we assumed that w(t) had a Gaussian density in Definition 11.1.2.
Historically, Wiener processes (or Brownian motion) first became of interest because of the findings of the English biologist Brown, who observed
the microscopic movement of pollen particles in water due to the random
collisions of water molecules with the particles. The impulses coming from
these collisions are almost ideal realizations of the process of white noise,
somewhat similar to our process "'n(t) for large n.
In other applications, however, much slower processes are admitted as
"white noise" perturbations, for example, waves striking the side of a large
ship or the influence of atmospheric turbulence on an airplane. In the example of the ship, the reason that this assumption is a valid approximation
stems from the fact that waves of quite varied energies strike both sides of
the ship almost independently with a frequency much larger than the free
oscillation frequency of the ship.
Example 11.1.1. Having defined a one-dimensional Wiener process {w(t)}_{t≥0}, it is rather easy to construct an exact, continuous time, semidynamical system that corresponds to the partial differential equation
∂u/∂t + s ∂u/∂s = ½u.   (11.1.7)
Our arguments follow those of Rudnicki (1985), which generalize results of
Lasota (1981), Brunovsky (1983), and Brunovsky and Komornik (1984).
The first step in this process is to construct the Wiener measure. Let X be the space of all continuous functions x: [0, 1] → R such that x(0) = 0.
We are going to define some special subsets of X that are called cylinders.
Thus, given a sequence of real numbers
0 < s_1 < s_2 < ··· < s_n ≤ 1
and a sequence of Borel sets A_1, ..., A_n ⊂ R, we set
C(s_1, ..., s_n; A_1, ..., A_n) = {x ∈ X: x(s_i) ∈ A_i, i = 1, ..., n}.   (11.1.8)
Thus the cylinder defined by (11.1.8) is the set of all functions x ∈ X passing through the sets A_i at the times s_i (see Figure 11.1.2). The Wiener measure μ_w of the cylinders (11.1.8) is defined by
μ_w(C(s_1, ..., s_n; A_1, ..., A_n))
  = ∫_{A_1} ··· ∫_{A_n} g(s_1, x_1) g(s_2 - s_1, x_2 - x_1) ··· g(s_n - s_{n-1}, x_n - x_{n-1}) dx_n ··· dx_1,   (11.1.9)
where
g(s, y) = (1/√(2πs)) exp(-y²/2s).   (11.1.12)
Thus we have
prob{w(s_1) ∈ A_1, ..., w(s_n) ∈ A_n}
  = ∫_{A_1} ··· ∫_{A_n} g(s_1, x_1) g(s_2 - s_1, x_2 - x_1) ··· g(s_n - s_{n-1}, x_n - x_{n-1}) dx_n ··· dx_1
  = μ_w(C(s_1, ..., s_n; A_1, ..., A_n)).
FIGURE 11.1.2. Schematic representation of the cylinder defined by equation (11.1.8).
u(0, s) = x(s),   (11.1.15)
The solutions of (11.1.7) with the initial condition (11.1.15) define a semigroup of transformations of X,
(S_t x)(s) = e^{t/2} x(s e^{-t}).   (11.1.18)
Exactness follows once we show that
lim_{t→∞} μ_w(S_t(C)) = 1  if μ_w(C) > 0.   (11.1.19)
Set y(s) = e^{t/2} x(s e^{-t}). Since s ∈ [0, 1] and, thus, s e^{-t} ∈ [0, e^{-t}], the conditions x(s_i) ∈ A_i are irrelevant for s_i > e^{-t}. Thus
S_t(C(s_1, ..., s_n; A_1, ..., A_n)) = C(s_1 e^t, ..., s_k e^t; e^{t/2} A_1, ..., e^{t/2} A_k),   (11.1.20)
where k = k(t) is the largest integer k ≤ n such that s_k ≤ e^{-t}. Once t becomes sufficiently large, that is, t > -log s_1, then from (11.1.20) we see that the last condition x(s_1) ∈ A_1 disappears and we are left with
S_t(C(s_1, ..., s_n; A_1, ..., A_n)) = {y ∈ X: y(s) = e^{t/2} x(s e^{-t})}.
However, since X is the space of all possible continuous functions x: [0, 1] → R, the set on the right-hand side is just X and, as a consequence,
for t > -log s_1,
S_t(C(s_1, ..., s_n; A_1, ..., A_n)) = X.   (11.1.21)
Let A denote the σ-algebra generated by the cylinders, and set
A_∞ = ∩_{t≥0} S_t^{-1}(A).
From the Blumenthal zero-one law [see Remark 11.2.1] it may be shown that the σ-algebra A_∞ contains only trivial sets. Thus, since μ_w(B) ≥ μ_w(C), we must have μ_w(B) = 1 whenever μ_w(C) > 0. Thus (11.1.19) follows immediately from (11.1.21).
A proof of exactness may also be carried out for equations more general than the linear version (11.1.7). The nonlinear equation
∂u/∂t + c(s) ∂u/∂s = f(s, u),   (11.1.23)
F_1, F_2, ...   (11.2.1)
of σ-algebras. We define the independence of (11.2.1) as follows.
Definition 11.2.1. A sequence (11.2.1) consists of independent σ-algebras if all possible sequences of sets A_1, ..., A_n such that A_i ∈ F_i are independent.
Further, for every random variable ξ we denote by F(ξ) the σ-algebra of all events of the form {ω: ξ(ω) ∈ B}, where the B are Borel sets, or, more explicitly,
F(ξ) = {ξ^{-1}(B): B is a Borel set}.
Having a stochastic process {η(t)}_{t∈Δ} on an interval Δ, we denote the smallest σ-algebra that contains all sets of the form
{ω: η(t, ω) ∈ B},  t ∈ Δ, B a Borel set,
by F(η(t): t ∈ Δ).
With this notation we can restate our definition of independent random variables as follows. The random variables ξ_1, ..., ξ_n are independent if F(ξ_1), ..., F(ξ_n) are independent. In an analogous fashion, stochastic processes {η_1(t)}_{t∈Δ_1}, ..., {η_n(t)}_{t∈Δ_n} are independent if
F(η_1(t): t ∈ Δ_1), ..., F(η_n(t): t ∈ Δ_n)
are independent. We will also say that a stochastic process {η(t)}_{t∈Δ} and a σ-algebra F_0 are independent if F(η(t): t ∈ Δ) and F_0 are independent.
Now it is straightforward to define a d-dimensional Wiener process.
Definition 11.2.2. A d-dimensional vector-valued process
w(t) = (w_1(t), ..., w_d(t)),  t ≥ 0,
is called a d-dimensional Wiener process if its components w_i(t) are independent one-dimensional Wiener processes. In this case the joint density of w(t) is
g(t, x) = (2πt)^{-d/2} exp(-(1/2t) Σ_{i=1}^d x_i²),  x ∈ R^d,   (11.2.2)
and it satisfies
∫_{R^d} g(t, x) dx = 1,   (11.2.3)
∫_{R^d} x_i g(t, x) dx = 0,  i = 1, ..., d,   (11.2.4)
and
∫_{R^d} x_i x_j g(t, x) dx = δ_ij t,  i, j = 1, ..., d.   (11.2.5)
Remark 11.2.1. For a Wiener process w(t), consider the σ-algebras
∩_{h>0} F(w(u): 0 ≤ u ≤ t + h).   (11.2.6)
In particular, the σ-algebra
∩_{h>0} F(w(u): 0 ≤ u ≤ h)
contains only sets of measure zero or one. The last statement is referred to as the Blumenthal zero-one law (Friedman [1975]). □
∫_α^β η(t) dw(t).   (11.3.1)
Proceeding naively from the classical rules of calculus would suggest that (11.3.1) should be replaced by
∫_α^β η(t) w'(t) dt.
A better approach is to approximate the integral by sums
s = Σ_{i=1}^k η(t̄_i)[w(t_i) - w(t_{i-1})],   (11.3.2)
where
α = t_0 < t_1 < ··· < t_k = β
is a partition of the interval [α, β] and the intermediate points t̄_i ∈ [t_{i-1}, t_i]. This turns out to be a more fruitful idea but has the surprising consequence that the limit of the approximating sums s of the form (11.3.2) depends on the choice of the intermediate points t̄_i, in sharp contrast to the situation for the Riemann and Stieltjes integrals. This occurs because w(t), at fixed ω, is not a function of bounded variation.
Definition 11.3.1. A family {F_t}, α ≤ t ≤ β, of σ-algebras is called nonanticipative (with respect to the Wiener process w(t)) if:
(1) F_s ⊂ F_t for s ≤ t;
(2) F_t ⊃ F(w(u): α ≤ u ≤ t), so w(u), α ≤ u ≤ t, is measurable with respect to F_t; and
(3) w(t + h) - w(t) is independent of F_t for h ≥ 0, so all pairs of sets A_1, A_2 such that A_1 ∈ F_t and A_2 ∈ F(w(t + h) - w(t)) are independent.
From this point on we will assume that a Wiener process w(t) and a family of nonanticipative σ-algebras {F_t}, α ≤ t ≤ β, are given.
We next define a fourth condition.
Definition 11.3.2. A stochastic process {η(t)}, α ≤ t ≤ β, is called nonanticipative with respect to {F_t} if
(4) F_t ⊃ F(η(u): α ≤ u ≤ t) for every t ∈ [α, β].
For every random process {η(t)}, α ≤ t ≤ β, we define the Ito sum s by
s = Σ_{i=1}^k η(t_{i-1})[w(t_i) - w(t_{i-1})].   (11.3.3)
Note that in the definition of the Ito sum (11.3.3), we have specified the intermediate points t̄_i of (11.3.2) to be the left end of each interval, t̄_i = t_{i-1}. For a given Ito sum s, we define
δ(s) = max_i (t_i - t_{i-1})
and call a sequence of Ito sums {s_n} regular if δ(s_n) → 0 as n → ∞.
We now define the Ito integral as follows.
Definition 11.3.3. Let {η(t)}, α ≤ t ≤ β, be a nonanticipative stochastic process. If there exists a random variable ζ such that
ζ = st-lim_{n→∞} s_n   (11.3.4)
for every regular sequence of Ito sums {s_n}, then we say that ζ is the Ito integral of {η(t)} on the interval [α, β] and denote it by
ζ = ∫_α^β η(t) dw(t).   (11.3.5)
Remark 11.3.1. It can be proved that for every continuous nonanticipative process the limit (11.3.4) always exists. □
Remark 11.3.2. Definition 11.3.1 of a nonanticipative σ-algebra is complicated, and the reason for introducing each element of the definition, as well as the implication of each, may appear somewhat obscure. Condition (1) is easy, for it merely means that the σ-algebra F_t of events grows as time proceeds. The second condition ensures that F_t contains all of the events that can be described by the Wiener process w(s) for times s ∈ [α, t]. Finally, condition (3) says that no information concerning the behavior of the process w(u) - w(t) for u > t can influence calculations involving the probability of the events in F_t. Definition 11.3.2 gives to a stochastic process η(u) the same property that condition (2) of Definition 11.3.1 gives to w(u). Thus, all of the information that can be obtained from η(u) for u ∈ [α, t] is contained in F_t.
Taken together, these four conditions ensure that the integrand η(t) of the Ito integral (11.3.5) does not depend on the behavior of w(t) for times greater than β and aid in the proof of the convergence of the Ito approximating sums. Further, the nonanticipatory assumption plays an important role in the proof of the existence and uniqueness of solutions to stochastic differential equations since it guarantees that the behavior of a solution in a time interval [0, t] is not influenced by the Wiener process for times larger than t. □
Example 11.3.1. For our first example of the calculation of a specific Ito integral, we take η(t) ≡ 1 and consider
∫_0^T dw(t).
In this case every Ito sum telescopes,
s = Σ_{i=1}^k [w(t_i) - w(t_{i-1})] = w(t_k) - w(t_0) = w(T),
and, thus,
∫_0^T dw(t) = w(T).
As a second example, consider
∫_0^T w(t) dw(t).
In this case, η(t) = w(t), so that condition (4) of Definition 11.3.2 follows from condition (2) of Definition 11.3.1. The Ito sum,
s = Σ_{i=1}^k w(t_{i-1})[w(t_i) - w(t_{i-1})],
may be rewritten as
s = ½ Σ_{i=1}^k [w²(t_i) - w²(t_{i-1})] - ½ Σ_{i=1}^k γ_i = ½w²(T) - ½ Σ_{i=1}^k γ_i,   (11.3.6)
where
γ_i = [w(t_i) - w(t_{i-1})]².
To evaluate the last summation in (11.3.6), observe that, from the Chebyshev inequality (10.2.10),
prob{|½ Σ_{i=1}^k γ_i - ½ Σ_{i=1}^k m_i| ≥ ε} ≤ (1/ε²) D²(½ Σ_{i=1}^k γ_i),   (11.3.7)
where
m_i = E(γ_i) = E([w(t_i) - w(t_{i-1})]²) = t_i - t_{i-1}.
Thus,
Σ_{i=1}^k m_i = T,
and, since the increments are independent Gaussian variables,
D²(Σ_{i=1}^k γ_i) ≤ Σ_{i=1}^k E(γ_i²) = 3 Σ_{i=1}^k (t_i - t_{i-1})² ≤ 3T max_i (t_i - t_{i-1}).
Setting δ(s) = max_i (t_i - t_{i-1}) as before and using (11.3.7), we finally obtain
prob{|½ Σ_{i=1}^k γ_i - ½T| ≥ ε} ≤ (3T/4ε²) δ(s)
or, combining this with (11.3.6),
prob{|s - (½w²(T) - ½T)| ≥ ε} ≤ (3T/4ε²) δ(s).
Thus, for every regular sequence {s_n} of Ito sums,
st-lim_{n→∞} s_n = ½w²(T) - ½T,
and, consequently,
∫_0^T w(t) dw(t) = ½w²(T) - ½T.
If, instead, the intermediate points are taken to be the midpoints t̄_i = ½(t_{i-1} + t_i), we obtain the Stratonovich sum
ŝ = Σ_{i=1}^k w(t̄_i)[w(t_i) - w(t_{i-1})] = ½w²(T) - ½ Σ_{i=1}^k γ_i + ½ Σ_{i=1}^k ε_i,
where γ_i = [w(t_i) - w(t̄_i)]² and ε_i = [w(t̄_i) - w(t_{i-1})]². Arguing as before,
st-lim_{n→∞} Σ_{i=1}^k γ_i = st-lim_{n→∞} Σ_{i=1}^k ε_i = ½T.
Thus the Stratonovich sums {ŝ_n} converge to ½w²(T), and the Stratonovich integral gives a result more in accord with our experience from calculus.
However, the use of the Stratonovich integral in solving stochastic differential equations leads to other more serious problems. □
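The contrast between the two integrals is easy to reproduce on a sampled Brownian path. The sketch below (plain Python; helper names are ours) computes the left-endpoint (Ito) sum and the average-of-endpoint-values sum, which converges to the same limit as the Stratonovich midpoint sums, for ∫_0^T w dw, and compares them with ½w²(T) − ½T and ½w²(T).

```python
import math
import random

def brownian_path(n_steps, T, rng):
    """Wiener values on the uniform partition t_i = i*T/n_steps, with w(0) = 0."""
    dt = T / n_steps
    w = [0.0]
    for _ in range(n_steps):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return w

def ito_sum(w):
    """Left-endpoint sum: sum of w(t_{i-1}) [w(t_i) - w(t_{i-1})]."""
    return sum(w[i - 1] * (w[i] - w[i - 1]) for i in range(1, len(w)))

def stratonovich_sum(w):
    """Average-of-endpoints sum; algebraically it telescopes to w(T)^2 / 2."""
    return sum(0.5 * (w[i - 1] + w[i]) * (w[i] - w[i - 1]) for i in range(1, len(w)))

T = 2.0
rng = random.Random(1)
w = brownian_path(20000, T, rng)
ito = ito_sum(w)
strat = stratonovich_sum(w)
```

On a fine partition the Ito sum lands near ½w²(T) − ½T, while the symmetric sum equals ½w²(T) exactly (up to rounding), mirroring the calculation in the text.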
To close this section, we extend our definition of the Ito integral to the multidimensional case. If G(t) = (η_ij(t)), i, j = 1, ..., d, is a d × d matrix of continuous stochastic processes, defined for α ≤ t ≤ β, and w(t) = (w_i(t)), i = 1, ..., d, is a d-dimensional Wiener process, then
∫_α^β G(t) dw(t) = (ζ_i),  where  ζ_i = Σ_{j=1}^d ∫_α^β η_ij(t) dw_j(t),   (11.3.8)
defines the Ito integral. Thus, equation (11.3.8) is integrated term by term. In this case the family {F_t} of nonanticipative σ-algebras must satisfy conditions (2) and (3) of Definition 11.3.1 with respect to all {w_i(t)}, i = 1, ..., d, and condition (4) of Definition 11.3.2 must be satisfied by all {η_ij(t)}, i, j = 1, ..., d.
When the integrand is a deterministic continuous function f, the Ito sums
s_n = Σ_{i=1}^{k_n} f(t_{i-1}^n)[w(t_i^n) - w(t_{i-1}^n)]
converge stochastically to a limit, which we denote by
ζ = ∫_α^β f(t) dw(t).   (11.4.1)
Although we will not prove this assertion, it suffices to say that the proof proceeds in a fashion similar to the proof of the following proposition.
Proposition 11.4.1. If f: [α, β] → R is a continuous function, then
E(∫_α^β f(t) dw(t)) = 0   (11.4.2)
and
D²(∫_α^β f(t) dw(t)) = ∫_α^β [f(t)]² dt.   (11.4.3)
Proof. Set Δw_i = w(t_i) - w(t_{i-1}), so that
s = Σ_{i=1}^k f(t_{i-1}) Δw_i  and  s² = Σ_{i,j=1}^k f(t_{i-1}) f(t_{j-1}) Δw_i Δw_j.
We have immediately that
E(s) = Σ_{i=1}^k f(t_{i-1}) E(Δw_i) = 0.   (11.4.4)
We also have
D²(s) = E(s²) = Σ_{i,j=1}^k f(t_{i-1}) f(t_{j-1}) E(Δw_i Δw_j) = Σ_{i=1}^k [f(t_{i-1})]² (t_i - t_{i-1}),   (11.4.5)
since the increments Δw_i are independent with E(Δw_i Δw_j) = δ_ij (t_i - t_{i-1}). Since, from the remarks preceding the proposition, {s_n} converges in mean to the integral given in equation (11.4.1), we have lim_{n→∞} E(s_n) = E(ζ) and lim_{n→∞} D²(s_n) = D²(ζ), which, by (11.4.4) and (11.4.5), completes the proof. □
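Proposition 11.4.1 lends itself to a quick Monte Carlo check. In this sketch (plain Python; the function name and the choice f(t) = t are ours, not from the text), the Ito sums for ∫_0^1 t dw(t) are sampled repeatedly and their empirical mean and variance are compared with 0 and ∫_0^1 t² dt = 1/3.

```python
import math
import random

def ito_sum_deterministic(f, alpha, beta, n_steps, rng):
    """One realization of the Ito sum (left endpoints) for a
    deterministic integrand f on [alpha, beta]."""
    dt = (beta - alpha) / n_steps
    total, t = 0.0, alpha
    for _ in range(n_steps):
        # f(t_{i-1}) * (w(t_i) - w(t_{i-1})), increment ~ N(0, dt)
        total += f(t) * rng.gauss(0.0, math.sqrt(dt))
        t += dt
    return total

rng = random.Random(3)
vals = [ito_sum_deterministic(lambda t: t, 0.0, 1.0, 200, rng) for _ in range(20000)]
mean = sum(vals) / len(vals)
var = sum((v - mean) ** 2 for v in vals) / len(vals)
```

The empirical mean should be near 0 by (11.4.2) and the variance near 1/3 by (11.4.3), up to Monte Carlo and discretization error.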
A second special case of the stochastic integral occurs when the integrand is a stochastic process but it is desired to have the integral only with respect to time,
ζ = ∫_α^β η(t) dt,   (11.4.6)
defined via the approximating sums
s = Σ_{i=1}^k η(t̄_i)(t_i - t_{i-1})
with arbitrary intermediate points t̄_i ∈ [t_{i-1}, t_i]. We now have the following definition.
Definition 11.4.1. If every regular [δ(s_n) → 0] sequence {s_n} of approximating sums is stochastically convergent and
ζ = st-lim_{n→∞} s_n,   (11.4.7)
then this common limit is called the integral of η(t) on [α, β] and is denoted by (11.4.6).
Observe that, when η(t, ω) possesses continuous sample paths, that is, it is a continuous function of t, the limit
lim_{n→∞} s_n(ω)
exists for almost every ω as an ordinary Riemann integral. For continuously differentiable f we also have the integration by parts formula
∫_α^β f(t) dw(t) = f(β)w(β) - f(α)w(α) - ∫_α^β f'(t)w(t) dt.   (11.4.8)
Proof. Since the integrals in (11.4.8) both exist, we may pick special approximating sums for ∫_α^β f'(t)w(t) dt of the form
s_n = Σ_{i=1}^{k_n} f'(t̄_i^n) w(t̄_i^n)(t_i^n - t_{i-1}^n),   (11.4.9)
where the intermediate points t̄_i^n ∈ [t_{i-1}^n, t_i^n] are chosen, by the mean value theorem, so that f(t_i^n) - f(t_{i-1}^n) = f'(t̄_i^n)(t_i^n - t_{i-1}^n). Then
s_n = Σ_{i=1}^{k_n} [f(t_i^n) - f(t_{i-1}^n)] w(t̄_i^n).   (11.4.10)
Setting t̄_0^n = α and t̄_{k_n}^n = β, we may rewrite (11.4.10), by summation by parts, in the form
s_n = -s̄_n + w(β)f(β) - w(α)f(α),   (11.4.11)
where
s̄_n = Σ_{i=0}^{k_n - 1} [w(t̄_{i+1}^n) - w(t̄_i^n)] f(t_i^n).
As n → ∞ the sums s_n converge to
∫_α^β f'(t)w(t) dt,
while the s̄_n are approximating sums for
∫_α^β f(t) dw(t),
so that (11.4.8) follows from (11.4.11). □
Remark 11.4.1. In our short development of the Ito integral and presentation of its main properties, we have restricted ourselves to the special situation where the integrand is a continuous stochastic process. This allowed us to define the Ito integral in a relatively simple and direct way as the limit of the Ito sums (11.3.3). Generally, such an approach is inconvenient because of the restrictive nature of the assumption concerning the continuity of the integrand. Usually, the definition of the Ito integral is given in a more sophisticated manner. It is first defined for stochastic processes that are piecewise constant in time, and then, by using a limiting procedure in L²(Ω), the definition is extended to a quite general class of integrands. An exhaustive treatment of this procedure may be found in Gikhman and Skorokhod [1969, 1975]. □
dx/dt = b(x) + σ(x)ξ,   (11.5.1)
with the initial condition
x(0) = x⁰,   (11.5.2)
where b(x) = (b_1(x), ..., b_d(x)) is a d-dimensional vector function, σ(x) = (σ_ij(x)) is a d × d matrix function, x(t) = (x_1(t), ..., x_d(t)) is the unknown d-dimensional process, and ξ = (ξ_1, ..., ξ_d) is a d-dimensional white noise vector, interpreted formally as dw/dt for a d-dimensional Wiener process w(t).
Equations (11.5.1) and (11.5.2) are interpreted through their integral form
x(t) = x⁰ + ∫_0^t b(x(s)) ds + ∫_0^t σ(x(s)) dw(s).   (11.5.3)
Since the integrals that appear on the right-hand side of (11.5.3) are defined from our considerations of the previous sections, we are close to a formal definition of the solution.
First, however, it is necessary to choose a specific family of nonanticipative σ-algebras {F_t}_{t≥0}. We may, for example, assume that F_t is the smallest σ-algebra containing all events of the form {ω: w(u, ω) ∈ B} and (x⁰)^{-1}(B) for 0 ≤ u ≤ t and Borel sets B, that is, F_t is the smallest σ-algebra with respect to which w(u), 0 ≤ u ≤ t, and x⁰ are measurable. This family is nonanticipative since conditions (1) and (2) of Definition 11.3.1 are evidently satisfied, and condition (3) follows from the fact that {w(t)} is a process with independent increments and that x⁰ and {w(t)} are independent.
With this family of nonanticipative σ-algebras, we define the solution to equations (11.5.1) and (11.5.2).
Definition 11.5.1. A continuous stochastic process {x(t)}_{t≥0} is called a solution of equations (11.5.1) and (11.5.2) if:
(a) {x(t)} is nonanticipative, that is, it satisfies condition (4) of Definition 11.3.2; and
(b) For every t ≥ 0, equation (11.5.3) is satisfied with probability 1.
It is well known from the theory of ordinary differential equations that
it is necessary to assume some special conditions on the right-hand side in
order to guarantee the existence and uniqueness of a solution. It is interesting that analogous conditions are also sufficient for stochastic differential
equations. Thus we have the following theorem.
Theorem 11.5.1. If b(x) and σ(x) satisfy the Lipschitz conditions
|b(x) - b(y)| ≤ L|x - y|   (11.5.4)
and
|σ(x) - σ(y)| ≤ L|x - y|,  x, y ∈ R^d,   (11.5.5)
with some constant L, then the initial value problem, equations (11.5.1) and (11.5.2), has a unique solution {x(t)}_{t≥0}.
Theorem 11.5.1 can be proved by the method of successive approximations as can the corresponding result for ordinary differential equations.
In this procedure one starts from x⁰(t) ≡ x⁰ and forms the successive approximations
x^{n+1}(t) = x⁰ + ∫_0^t b(x^n(s)) ds + ∫_0^t σ(x^n(s)) dw(s),
and then proves that the sequence {x^n(t)} converges and that its limit x(t) is, indeed, the desired solution. We omit the details as this proof is quite complicated, but a full proof may be found in Gikhman and Skorokhod [1969].
An alternative way to generate an approximating solution is to use the Euler linear extrapolation formula. Suppose that the solution x(t) is given on the interval [0, t_0]. Then for values t_0 + Δt larger than, but close to, t_0, we write
x(t_0 + Δt) = x(t_0) + b(x(t_0)) Δt + σ(x(t_0)) Δw,   (11.5.6)
where Δw = w(t_0 + Δt) - w(t_0). (Observe that for an ordinary differential equation, this equation defines a ray tangent to the solution on [0, t_0] at t_0.) In particular, when an interval [0, T] is given, we may take a partition
0 = t_0 < t_1 < ··· < t_n = T
and define
Δx(t_i) = b(x(t_{i-1})) Δt_i + σ(x(t_{i-1})) Δw_i,   (11.5.7)
where Δx(t_i) = x(t_i) - x(t_{i-1}), Δt_i = t_i - t_{i-1}, Δw_i = w(t_i) - w(t_{i-1}), and x(t_0) = x⁰.
It is evident that in some respects this approach is much simpler than the
method of successive approximations, since no knowledge concerning the
Ito integral is even necessary. Indeed, S. Bernstein employed this technique
in his original investigations into stochastic differential equations, so we
will call equations (11.5.6) and (11.5.7) the Euler-Bernstein equations.
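Equation (11.5.7) translates directly into code. The sketch below (plain Python; the function name is ours) integrates a scalar version of (11.5.1) on a uniform grid; as a check, it is applied to the Langevin equation of the next example, whose stationary variance is σ²/2b.

```python
import math
import random

def euler_bernstein(b, sigma, x0, T, n_steps, rng):
    """One trajectory of dx/dt = b(x) + sigma(x)*xi via scheme (11.5.7):
    x(t_i) = x(t_{i-1}) + b(x(t_{i-1})) dt + sigma(x(t_{i-1})) dw_i."""
    dt = T / n_steps
    x = x0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # increment w(t_i) - w(t_{i-1})
        x = x + b(x) * dt + sigma(x) * dw
    return x

# Langevin equation dx/dt = -2x + xi: stationary variance sigma^2/(2b) = 1/4.
rng = random.Random(2)
end_points = [euler_bernstein(lambda x: -2.0 * x, lambda x: 1.0, 1.0, 3.0, 300, rng)
              for _ in range(4000)]
m = sum(end_points) / len(end_points)
v = sum((s - m) ** 2 for s in end_points) / len(end_points)
```

After the transient e^{-bt} has decayed, the sampled endpoints should have mean near 0 and variance near 1/4, up to Monte Carlo and step-size error.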
Example 11.5.1. The oldest and best-known example of a stochastic differential equation is probably the Langevin equation
dx/dt = -bx + σξ,  x(0) = x⁰,   (11.5.8)
whose integral version is
x(t) = -b ∫_0^t x(s) ds + σ ∫_0^t dw(s) + x⁰.   (11.5.9)
Equation (11.5.9) is rather easy to deal with since it does not contain an Ito integral, and, since the one integral that does appear exists for almost every ω taken separately, we may use the usual rules of calculus. Setting
z(t) = ∫_0^t x(s) ds,   (11.5.10)
we obtain
dz/dt = -bz(t) + σw(t) + x⁰.
For fixed ω, this is an ordinary differential equation and, thus,
z(t) = ∫_0^t e^{-b(t-s)} [σw(s) + x⁰] ds.   (11.5.11)
Differentiating z(t) and using the integration by parts formula (11.4.8), we may write the solution x(t) in the form
x(t) = x⁰ e^{-bt} + σ ∫_0^t e^{-b(t-s)} dw(s).
From this representation and Proposition 11.4.1,
E(x(t)) = e^{-bt} E(x⁰)
and
D²(x(t)) = e^{-2bt} D²(x⁰) + σ² ∫_0^t e^{-2b(t-s)} ds = e^{-2bt} D²(x⁰) + (σ²/2b)(1 - e^{-2bt}).   (11.5.12)
Now consider the general system of stochastic differential equations
dx/dt = b(x) + σ(x)ξ,   (11.6.1)
x(0) = x⁰,   (11.6.2)
where x⁰ is a random vector with density f, and let u(t, z) denote the density of x(t), so that
prob{x(t) ∈ B} = ∫_B u(t, z) dz.   (11.6.3)
Further, define
a_ij(x) = Σ_{k=1}^d σ_ik(x) σ_jk(x).   (11.6.4)
Observe that, for every vector (λ_1, ..., λ_d), the quadratic form
Σ_{i,j=1}^d a_ij(x) λ_i λ_j   (11.6.5)
is nonnegative.
We are now ready to state the main theorem of this section, which gives
the Fokker-Planck equation.
Theorem 11.6.1. If the functions σ_ij, ∂σ_ij/∂x_k, ∂²σ_ij/∂x_k∂x_l, b_i, ∂b_i/∂x_j, ∂u/∂t, ∂u/∂x_i, and ∂²u/∂x_i∂x_j are continuous for t > 0 and x ∈ R^d, and if b_i, σ_ij and their first derivatives are bounded, then u(t, x) satisfies the equation
∂u/∂t = ½ Σ_{i,j=1}^d ∂²[a_ij(x)u]/∂x_i∂x_j - Σ_{i=1}^d ∂[b_i(x)u]/∂x_i,  t > 0, x ∈ R^d.   (11.6.6)
Proof of Theorem 11.6.1. We will use the Euler-Bernstein approximation formula (11.5.6) in the proof of this theorem as it allows us to derive
(11.6.6) in an extremely simple and transparent fashion.
Thus let t_0 > 0 be arbitrary, and let x(t) be the solution to equations (11.6.1) and (11.6.2) on the interval [0, t_0]. Define x(t) on [t_0, t_0 + ε] by
x(t_0 + Δt) = x(t_0) + b(x(t_0)) Δt + σ(x(t_0)) Δw,  0 ≤ Δt ≤ ε.   (11.6.7)
Let h be an arbitrary C³ function with compact support. Then
E(h(x(t_0 + Δt))) = ∫_{R^d} h(x) u(t_0 + Δt, x) dx.   (11.6.8)
However, using equation (11.6.7), we may write the random variable h(x(t_0 + Δt)) in the form
h(x(t_0 + Δt)) = h(Q(x(t_0), Δw)),  Q(x, y) = x + b(x)Δt + σ(x)y.   (11.6.9)
Since x(t_0) and Δw are independent, with densities u(t_0, x) and g(Δt, y), respectively, we have
E(h(x(t_0 + Δt))) = ∫_{R^d} ∫_{R^d} h(Q(x, y)) u(t_0, x) g(Δt, y) dx dy
  = ∫_{R^d} ∫_{R^d} h(x + b(x)Δt + σ(x)y) u(t_0, x) g(Δt, y) dx dy.
From this and (11.6.8), we obtain
∫_{R^d} h(x) u(t_0 + Δt, x) dx = ∫_{R^d} ∫_{R^d} h(x + b(x)Δt + σ(x)y) u(t_0, x) g(Δt, y) dx dy.
By developing h in a Taylor expansion, we have
∫_{R^d} h(x) u(t_0 + Δt, x) dx
  = ∫_{R^d} ∫_{R^d} { h(x) + Σ_{i=1}^d (∂h/∂x_i)[b_i(x)Δt + (σ(x)y)_i]
  + ½ Σ_{i,j=1}^d (∂²h/∂x_i∂x_j)[b_i(x)Δt + (σ(x)y)_i][b_j(x)Δt + (σ(x)y)_j]
  + r(Δt) } u(t_0, x) g(Δt, y) dx dy,   (11.6.10)
where r(Δt) denotes the remainder and (σ(x)y)_i is the ith coordinate of the vector σ(x)y.
On the right-hand side of (11.6.10) we have a finite collection of integrals that we will first integrate with respect to y. Observe that
(σ(x)y)_i (σ(x)y)_j = Σ_{k,l=1}^d σ_ik(x) σ_jl(x) y_k y_l.
By equations (11.2.3)-(11.2.5),
∫_{R^d} g(Δt, y) dy = 1,
∫_{R^d} (σ(x)y)_i g(Δt, y) dy = 0,
and
∫_{R^d} (σ(x)y)_i (σ(x)y)_j g(Δt, y) dy = a_ij(x) Δt.
Carrying out the integration over y in (11.6.10), we obtain
∫_{R^d} h(x)[u(t_0 + Δt, x) - u(t_0, x)] dx
  = Δt ∫_{R^d} { Σ_{i=1}^d (∂h/∂x_i) b_i(x) + ½ Σ_{i,j=1}^d (∂²h/∂x_i∂x_j) a_ij(x) } u(t_0, x) dx + R(Δt),   (11.6.11)
i,j=l
R(~t)
R(~t)
r ,j;
= ! JRd.
is
82h
x, x; b,(x)b;(x)(~t) u(t0 , x) dx
8 8
1
r r
r(~t)u(to,x)g(~t,y)dxdy.
(11.6.12)
}Rt~.}Rd.
It is straightforward to show that R(Δt)/Δt goes to zero as Δt → 0. The first integral on the right-hand side of (11.6.12) contains (Δt)², so this is easy. The second integral may be evaluated by using the classical formula for the remainder r(Δt):
r(Δt) = (1/3!) Σ_{i,j,k=1}^d (∂³h/∂x_i∂x_j∂x_k)[b_iΔt + (σ(x)y)_i][b_jΔt + (σ(x)y)_j][b_kΔt + (σ(x)y)_k],
where the third derivatives are evaluated at an intermediate point.
Dividing (11.6.11) by Δt and passing to the limit Δt → 0 gives
∫_{R^d} h(x) (∂u/∂t) dx = ∫_{R^d} { Σ_{i=1}^d (∂h/∂x_i) b_i(x) + ½ Σ_{i,j=1}^d (∂²h/∂x_i∂x_j) a_ij(x) } u(t_0, x) dx.   (11.6.13)
Since h has compact support we may easily integrate the right-hand side of (11.6.13) by parts. Doing this and shifting all terms to the left-hand side, we finally have
∫_{R^d} h(x) { ∂u/∂t - ½ Σ_{i,j=1}^d ∂²[a_ij(x)u]/∂x_i∂x_j + Σ_{i=1}^d ∂[b_i(x)u]/∂x_i } dx = 0.   (11.6.14)
Since h(x) is a C³ function with compact support, but otherwise arbitrary, the integral condition (11.6.14), which is satisfied for every such h, implies that the term in braces vanishes. This completes the proof that u(t_0, x) satisfies equation (11.6.6). □
Remark 11.6.2. To deal with the stochastic differential equations (11.6.1) with (11.6.2), we were forced to introduce many abstract and difficult concepts. It is ironic that, once we pass to a consideration of the density function u(t, x) of the random process x(t), all this material becomes unnecessary, as we must only insert the appropriate coefficients a_ij and b_i into the Fokker-Planck equation (11.6.6)! □
We now study the Fokker-Planck equation
∂u/∂t = ½ Σ_{i,j=1}^d ∂²[a_ij(x)u]/∂x_i∂x_j - Σ_{i=1}^d ∂[b_i(x)u]/∂x_i,  t > 0, x ∈ R^d,   (11.7.1)
with the initial condition
u(0, x) = f(x).   (11.7.2)
Carrying out the indicated differentiations, (11.7.1) takes the standard parabolic form
∂u/∂t = ½ Σ_{i,j=1}^d a_ij(x) ∂²u/∂x_i∂x_j + Σ_{i=1}^d b̄_i(x) ∂u/∂x_i + c̄(x)u,   (11.7.3)
where
b̄_i(x) = Σ_{j=1}^d ∂a_ij/∂x_j - b_i(x)
and
c̄(x) = ½ Σ_{i,j=1}^d ∂²a_ij/∂x_i∂x_j - Σ_{i=1}^d ∂b_i/∂x_i.   (11.7.4)
As we have observed, the quadratic form
Σ_{i,j=1}^d a_ij(x) λ_i λ_j,
corresponding to the term of (11.7.3) with second-order derivatives, is always nonnegative. We will assume that the somewhat stronger inequality
Σ_{i,j=1}^d a_ij(x) λ_i λ_j ≥ ρ Σ_{i=1}^d λ_i²,   (11.7.5)
where ρ is a positive constant, holds. This is called the uniform parabolicity condition.
then the classical solution of the Cauchy problem, equations (11.7.2) and (11.7.3), is unique and given by the integral formula
u(t, x) = ∫_{R^d} Γ(t, x, y) f(y) dy,   (11.7.7)
where Γ is the fundamental solution of (11.7.3), and
lim_{t→0} u(t, x) = f(x).   (11.7.8)
Condition (a) is necessary because for functions which grow faster than e^{a|x|²}, the Cauchy problem, even for the heat equation u_t = ½σ²u_xx, is not uniquely determined. Condition (b) is obvious, and (c) is necessary since (11.7.3) is satisfied only for t > 0 and, thus, the values of u(t, x) for t > 0 must be related to the initial condition u(0, x) = f(x).
The existence and uniqueness of solutions for the initial value (Cauchy) problems (11.7.1)-(11.7.2) or (11.7.3)-(11.7.2) are given in every standard textbook on parabolic equations. General results may be found in Friedman [1964], Eidelman [1969], Chabrowski [1970], and Bessala [1975].
To state a relatively simple existence and uniqueness theorem, we require the next definition.
Definition 11.7.2. We say that the coefficients a_ij and b_i of equation (11.7.1) are regular for the Cauchy problem if they are C⁴ functions such that the corresponding coefficients of equation (11.7.3) satisfy the uniform parabolicity condition (11.7.5) and the growth conditions (11.7.6).
The theorem that ensures the existence and uniqueness of classical solutions may be stated as follows.
Theorem 11.7.1. Assume that the coefficients a_ij and b_i are regular for the Cauchy problem and that f is a continuous function satisfying the inequality |f(x)| ≤ ce^{a|x|} with constants c > 0 and a > 0. Then there is a unique classical solution of (11.7.1)-(11.7.2) which is given by (11.7.7). The kernel Γ(t, x, y), defined for t > 0, x, y ∈ R^d, is continuous and differentiable with respect to t, is twice differentiable with respect to x_i, and satisfies (11.7.3) as a function of (t, x) for every fixed y. Further, in every strip 0 < t ≤ T, x ∈ R^d, |y| ≤ r, Γ satisfies the inequalities
0 < Γ(t, x, y) ≤ Φ(t, x - y),  |∂Γ/∂t| ≤ Φ(t, x - y),  |∂Γ/∂x_i| ≤ Φ(t, x - y),   (11.7.9)
where Φ is a Gaussian-type bound,
Φ(t, x) = k t^{-m} exp(-ρ|x|²/t),   (11.7.10)
with positive constants k, m, and ρ depending on T and r.
The explicit construction of the fundamental solution Γ for general coefficients a_ij, b_i, and c̄ is usually impossible. It is easy only for some special cases, such as the heat equation
u_t = (σ²/2) u_xx,
for which
Γ(t, x, y) = (1/√(2πσ²t)) exp(-(x - y)²/2σ²t).
In any case, the inequalities (11.7.9) allow integrals of the form ∫_{B_r} Γ(t, x, y)|f(y)| dy, |y| ≤ r, to be estimated by the corresponding integrals of Φ(t, x - y),
and, consequently, such bounds hold uniformly. As an example, consider again the Langevin equation
dx/dt = -bx + σξ,
first introduced in Example 11.5.1. The corresponding Fokker-Planck equation is
∂u/∂t = ½σ² ∂²u/∂x² + b ∂(xu)/∂x.   (11.7.12)
(11.7.12)
-d
dt
~0'2
-oo
-oo
-oo
Let
mn(t)
1:
xnu(t,x)dx
be the nth moment of the function u(t,x). From (11.7.13) we thus have an
infinite system of ordinary differential equations in the moments,
dmo
"dt = 0,
dmn = 20'
l2(
"dt
n n-
dm1
"dt = -bml,
1) mn-2 - nbmn1
n 2: 2,
mo(t)
= mo(O) =
1:
I dx = 1,
is a
368
Ct
= mt(O),
m2(t)
= ~b + C2e- 2bt,
ms(t)
C2
= m2(0) - ~b,
e-3bt),
Cs
= ms(O).
Successive formulas for higher moments become progressively more complicated. However, it is straightforward to demonstrate inductively that
lim_{t→∞} m_n(t) = 1·3·5···(n - 1) (σ²/2b)^{n/2}  for n even,
lim_{t→∞} m_n(t) = 0  for n odd.
Thus the limiting moments are the same as the moments of the Gaussian density
g_{σ,b}(x) = √(b/πσ²) exp(-bx²/σ²)
with mean zero and variance σ²/2b. At the end of the next section it will become clear that not only do the moments of the solution of equation (11.7.12) converge to the moments of the Gaussian density g_{σ,b}, but also that u(t, x) → g_{σ,b}(x) as t → ∞. □
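The moment system derived above is an ordinary initial value problem and can be integrated directly. In this sketch (plain Python; the function name, parameters, and the point-mass initial moments are our choices) the first five moments are evolved with explicit Euler steps for σ = 1, b = 1/2; the even moments should approach 1·(σ²/2b) = 1 and 3·(σ²/2b)² = 3, the odd ones 0.

```python
def evolve_moments(sigma, b, m_init, T=20.0, dt=1e-3):
    """Euler integration of the moment system:
    dm_0/dt = 0, dm_n/dt = (1/2) sigma^2 n(n-1) m_{n-2} - n b m_n."""
    m = list(m_init)
    for _ in range(int(T / dt)):
        dm = [0.0] * len(m)
        for n in range(1, len(m)):
            dm[n] = -n * b * m[n]
            if n >= 2:
                dm[n] += 0.5 * sigma ** 2 * n * (n - 1) * m[n - 2]
        m = [mi + dt * dmi for mi, dmi in zip(m, dm)]
    return m

# initial moments of a unit point mass at x = 1: m_n(0) = 1 for all n
m = evolve_moments(sigma=1.0, b=0.5, m_init=[1.0] * 5)
```

The equilibrium of each linear equation in the chain is exactly the corresponding Gaussian moment, so after the exponential transients decay the computed values should match the limits stated above.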
Definition 11.8.1. Assume that the coefficients a_ij and b_i of (11.7.1) are regular for the Cauchy problem. Then, for every f ∈ L¹, not necessarily continuous, the function
u(t, x) = ∫_{R^d} Γ(t, x, y) f(y) dy   (11.8.1)
will be called a generalized solution of the Cauchy problem (11.7.1) and (11.7.2).
Since Γ(t, x, y), as a function of (t, x), satisfies (11.7.1) for t > 0, u(t, x) has the same property. However, if f is discontinuous, then condition (11.7.8) might not hold at a point of discontinuity.
For every f ∈ L¹, define
P_0 f(x) = f(x),  P_t f(x) = ∫_{R^d} Γ(t, x, y) f(y) dy,  t > 0.   (11.8.2)
We will now show that, from the properties of Γ stated in Theorem 11.7.1, we obtain the following corollary.
Corollary 11.8.1. The family of operators {P_t}_{t≥0} defined by (11.8.2) is a stochastic semigroup; that is,
(1) P_t f ≥ 0 for f ≥ 0;
(2) P_t: L¹ → L¹ is a linear operator;
(3) ||P_t f|| = ||f|| for f ≥ 0; and
(4) P_{t_1 + t_2} f = P_{t_1}(P_{t_2} f) for f ∈ L¹.
Proof. Properties (1) and (2) follow immediately from equation (11.8.1) since the right-hand side is an integral operator with a positive kernel.
To verify (3), first assume that f is continuous with compact support. By multiplying the Fokker-Planck equation by a C² bounded function h(x) and integrating, we obtain
∫_{R^d} h(x) u_t dx = ∫_{R^d} h(x) { ½ Σ_{i,j=1}^d ∂²(a_ij u)/∂x_i∂x_j - Σ_{i=1}^d ∂(b_i u)/∂x_i } dx.
Setting h ≡ 1, we have
(d/dt) ∫_{R^d} u dx = ∫_{R^d} u_t dx = 0.
Since u ≥ 0 for f ≥ 0, we have
(d/dt) ||u|| = 0  for t > 0.
Further, the initial condition (11.7.8), inequality (11.7.11), and the boundedness of u imply, by the Lebesgue dominated convergence theorem, that ||P_t f|| is continuous at t = 0. This proves that ||P_t f|| is constant for all t ≥ 0. If f ∈ L¹ is an arbitrary function, we can choose a sequence {f_k} of continuous functions with compact supports converging strongly to f, and property (3) follows by passing to the limit.
To verify (4), observe that for continuous f with compact support the uniqueness of solutions of the Cauchy problem gives
P_{t_1 + t_2} f = P_{t_1}(P_{t_2} f),
which proves (4) for all continuous f with compact support. If f ∈ L¹ is arbitrary, we again pick a sequence {f_k} of continuous functions with compact supports that converges strongly to f and for which
P_{t_1 + t_2} f_k = P_{t_1}(P_{t_2} f_k)
holds. Since the P_t have been shown to be continuous, we may pass to the limit k → ∞ and obtain (4) for arbitrary f. □
Remark 11.8.1. In developing the material of Theorems 11.6.1, 11.7.1, and Corollary 11.8.1, we have passed from the description of u(t, x) as the density of the random variable x(t), through a derivation of the Fokker-Planck equation for u(t, x), and then shown that the solutions of the Fokker-Planck equation define a stochastic semigroup {P_t}_{t≥0}. This semigroup describes the behavior of the semidynamical system, equations (11.6.1) and (11.6.2). In actuality, our proof of Theorem 11.6.1 shows that the right-hand side of the Fokker-Planck equation is the infinitesimal operator for P_t f, although our results were not stated in this fashion. Further, Theorem 11.7.1 and Corollary 11.8.1 give the construction of the semigroup generated by this infinitesimal operator. □
Remark 11.8.2. Observe that, when the stochastic perturbation disappears (σ_ij ≡ 0), the Fokker-Planck equation reduces to the Liouville equation and {P_t} is simply the semigroup of Frobenius-Perron operators corresponding to the dynamical system
dx_i/dt = b_i(x),  i = 1, ..., d.
To study the asymptotic stability of {P_t}, we use a Liapunov function V: R^d → R, that is, a C² function such that:
(1) V(x) ≥ 0 for all x;
(2) lim_{|x|→∞} V(x) = ∞; and
(3) the derivatives ∂V/∂x_i and ∂²V/∂x_i∂x_j, i, j = 1, ..., d, grow at most polynomially as |x| → ∞.   (11.9.1)
We also assume that V satisfies the inequality
½ Σ_{i,j=1}^d a_ij(x) ∂²V/∂x_i∂x_j + Σ_{i=1}^d b_i(x) ∂V/∂x_i ≤ -αV(x) + β   (11.9.2)
with positive constants α and β. Specifically, we can state the following theorem.
Theorem 11.9.1. If the coefficients a_ij and b_i are regular for the Cauchy problem and there is a Liapunov function V satisfying (11.9.1) and (11.9.2), then the semigroup {P_t}_{t≥0} generated by (11.7.1) is asymptotically stable.
Proof. The proof is similar to that of Theorem 5.7.1. First pick a continuous density f with compact support and then consider the mathematical expectation of V calculated with respect to the solution u of equations (11.7.1) and (11.7.2). We have
E(V | u) = ∫_{R^d} V(x) u(t, x) dx.   (11.9.3)
By inequalities (11.7.11) and (11.9.1), u(t, x)V(x) and u_t(t, x)V(x) are integrable. Thus, differentiation of (11.9.3) with respect to t gives
dE(V | u)/dt = ∫_{R^d} V(x) u_t(t, x) dx
  = ∫_{R^d} V(x) { ½ Σ_{i,j=1}^d ∂²[a_ij(x)u]/∂x_i∂x_j - Σ_{i=1}^d ∂[b_i(x)u]/∂x_i } dx.
dE(V 1 u)
dt
=
1 { ,,,_
1
Rrl
2
8 V
8 X, 8 x3
. .
2 :E ai;(x).. _
dx.
Uz, V,
and
d
8V}
+ :E~(x)a.
u(t,x)dx.
._
x,
,_
1
[E(V I u)eat]
~ {Jeat.
Integrating this differential inequality gives
E(V | u) ≤ (β/α) + 1
for all t larger than some t_0. Now let G_q = {x: V(x) ≤ q}, where q > (β/α) + 1 is fixed. By the Chebyshev inequality we have
∫_{G_q} u(t, x) dx ≥ 1 - (1/q)[(β/α) + 1] = ε > 0
for t ≥ t_0. Since V(x) → ∞ as |x| → ∞, there is an r > 0 such that V(x) ≥ q for |x| ≥ r. Thus the set G_q is contained in the ball B_r and, as a consequence,
u(t, x) = ∫_{R^d} Γ(1, x, y) u(t - 1, y) dy ≥ ∫_{B_r} Γ(1, x, y) u(t - 1, y) dy
  ≥ inf_{|y|≤r} Γ(1, x, y) ∫_{B_r} u(t - 1, y) dy ≥ ε inf_{|y|≤r} Γ(1, x, y)   (11.9.4)
for t ≥ t_0 + 1. Setting
h(x) = ε inf_{|y|≤r} Γ(1, x, y),
we therefore have
P_t f(x) = u(t, x) ≥ h(x)  for t ≥ t_0 + 1,
so h is a lower-bound function for {P_t}. By Theorem 7.4.1, the semigroup is asymptotically stable; that is, there is a unique density u_* such that
lim_{t→∞} ||P_t f - u_*|| = 0,  f ∈ D.   (11.9.5)
Moreover, u_* is a stationary density of {P_t}. Indeed, writing
u(t + s, x) = ∫_{R^d} Γ(t, x, y) u(s, y) dy   (11.9.6)
and passing to the limit s → ∞, we obtain
u_*(x) = ∫_{R^d} Γ(t, x, y) u_*(y) dy.
Example 11.9.1. Consider again the Langevin equation
dx/dt = -bx + σξ
with the corresponding Fokker-Planck equation
∂u/∂t = ½σ² ∂²u/∂x² + b ∂(xu)/∂x.
In this case condition (11.9.2) takes the form
½σ² d²V/dx² - bx dV/dx ≤ -αV(x) + β,
which is satisfied with V(x) = x², α = 2b, and β = σ². Thus all solutions u(t, x), such that u(0, x) = f(x) is a density, converge to the unique (nonnegative and normalized) solution u_* of
½σ² d²u/dx² + b d(xu)/dx = 0.   (11.9.7)
The function
u_*(x) = √(b/πσ²) exp(-bx²/σ²),
which is the Gaussian density with mean zero and variance σ²/2b, satisfies (11.9.7), and, by Proposition 11.9.1, it is the unique solution. □
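That the Gaussian density solves (11.9.7) can also be confirmed numerically. The following sketch (plain Python; the function names and the test parameters b = 1.5, σ = 0.8 are ours) evaluates the residual ½σ²u'' + b(xu)' by central finite differences and checks that it vanishes to discretization accuracy.

```python
import math

def u_star(x, b, sigma):
    """Gaussian density with mean 0 and variance sigma^2/(2b)."""
    return math.sqrt(b / (math.pi * sigma ** 2)) * math.exp(-b * x ** 2 / sigma ** 2)

def residual(x, b, sigma, h=1e-4):
    """Central-difference residual of (1/2) sigma^2 u'' + b d(xu)/dx at x."""
    upp = (u_star(x + h, b, sigma) - 2.0 * u_star(x, b, sigma)
           + u_star(x - h, b, sigma)) / h ** 2
    dxu = ((x + h) * u_star(x + h, b, sigma)
           - (x - h) * u_star(x - h, b, sigma)) / (2.0 * h)
    return 0.5 * sigma ** 2 * upp + b * dxu

residuals = [abs(residual(x, b=1.5, sigma=0.8)) for x in (-1.0, -0.3, 0.0, 0.4, 1.2)]
```

The analytic residual is identically zero, so the computed values reflect only the O(h²) truncation and rounding errors of the difference quotients.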
Example 11.9.2. Next consider the system of stochastic differential equations
dx/dt = Bx + σξ,   (11.9.8)
where B = (b_ij) and σ = (σ_ij) are constant d × d matrices, σ is nonsingular, and
a_ij = Σ_{k=1}^d σ_ik σ_jk.
We assume that the unperturbed system
dx/dt = Bx   (11.9.9)
is asymptotically stable. In this case the drift coefficients are
b_i(x) = Σ_{j=1}^d b_ij x_j.
Since the coefficients a_ij are constant and the matrix (a_ij) is nonsingular, this guarantees that the uniform parabolicity condition (11.7.5) is satisfied.
All of the remaining conditions appearing in Theorem 11.7.1 in this case are obvious. Since (11.9.9) is asymptotically stable, the real parts of all eigenvalues of B are negative, and from the classical results of Liapunov stability theory there is a Liapunov function V such that
Σ_{i=1}^d b_i(x) ∂V/∂x_i ≤ -αV(x)   (11.9.10)
for some α > 0.
376
V(x)
(11.9.11)
k,;XiX;
i',j'=1
!a
a2v
a,;{f"7):
x, x,
i,j=1
av
i=1
x,
+ Lb,(x)a-:- ~ -aV(x) +
Ld a,;ki;
i,j=1
In this case the unique stationary density has the Gaussian form
u_*(x) = c exp(Σ_{i,j=1}^d p_ij x_i x_j),
where the (negative definite) matrix (p_ij) and the normalizing constant c are determined by substituting u_* into the stationary Fokker-Planck equation
½ Σ_{i,j=1}^d a_ij ∂²u/∂x_i∂x_j - Σ_{i=1}^d ∂/∂x_i (Σ_{j=1}^d b_ij x_j u) = 0.
As a final example, consider the motion of a particle of mass m subject to damping, a restoring force F(x), and a white noise perturbation,
m d²x/dt² + β dx/dt + F(x) = σξ,   (11.9.12)
which is equivalent to the system
dx/dt = v  and  m dv/dt = -βv - F(x) + σξ.   (11.9.13)
The corresponding Fokker-Planck equation is
∂u/∂t = (σ²/2m²) ∂²u/∂v² - ∂(vu)/∂x + (1/m) ∂/∂v {[βv + F(x)]u},   (11.9.14)
and the stationary density satisfies
(σ²/2m²) ∂²u/∂v² - ∂(vu)/∂x + (1/m) ∂/∂v {[βv + F(x)]u} = 0,
= 0,
.!!_ - .!!_)
( p_
m ov
ox
[vu
+~
au] + .!!_
[..!..F(x)u + ~ au] = 0.
8v m
2mf3 ox
2mf3 8v
Set u(x, v)
13
( m .!!_
8v
!...) [x
ox
(vv
+~
dV)] + [..!..F(x)X
+ ~ dX] dV = o
m
2mf3 dx dv
'
2m/3 dv
+ u 2 F(x)X = 0
(11.9.15)
dV
dv
+ 2mf3 vV = 0
(11.9.16)
a.nd
2/3
o-2
'
respectively.
Integrating equations (11.9.15) and (11.9.16) and combining the results gives
u(x, v) = c exp(-(2β/σ²) ∫_0^x F(s) ds - (mβ/σ²) v²),   (11.9.17)
where the constant c is determined from the normalization condition
∫_{-∞}^{∞} ∫_{-∞}^{∞} u(x, v) dx dv = 1.
Carrying out the integration over v gives c = c_1 √(βm/πσ²), where
c_1^{-1} = ∫_{-∞}^{∞} exp(-(2β/σ²) ∫_0^x F(s) ds) dx.   (11.9.18)
Thus the stationary density is
u_*(x, v) = c_1 √(βm/πσ²) exp(-(2β/σ²) ∫_0^x F(s) ds - (mβ/σ²) v²).   (11.9.19)
Throughout this section we assume that V: R^d → [0, ∞) is a continuous function with
lim_{|x|→∞} V(x) = ∞,   (11.10.1)
and set
E(V | P_t f) = ∫ V(x) P_t f(x) dx.   (11.10.2)
We assume there is a constant M > 0 such that
E(V | P_t f) ≤ M   (11.10.3)
for every f ∈ D_0 and sufficiently large t, say t ≥ t_1(f). Let r be such that V(x) ≥ M + 1 for |x| ≥ r and x ∈ G. If, for some t_0 > 0, there is a nontrivial function h_r (h_r ≥ 0 and ||h_r|| > 0) such that
P_{t_0} f ≥ h_r  for every density f supported in G_r = {x ∈ G: |x| ≤ r},   (11.10.4)
then the semigroup {P_t}_{t≥0} is asymptotically stable.
Proof. Set a = M + 1. By the Chebyshev inequality, (11.10.3) implies
∫_{G_r} P_t f(x) dx ≥ 1 - M/a   (11.10.5)
for t ≥ t_1(f), since
V(x) ≥ a  for x ∈ G, |x| ≥ r.   (11.10.6)
Fix such a t and define the normalized density
f̄ = 1_{G_r} P_t f / ||1_{G_r} P_t f||,  ||1_{G_r} P_t f|| = ∫_{G_r} P_t f(x) dx ≥ 1 - M/a.
Then f̄ is supported in G_r and, by (11.10.4), P_{t_0} f̄ ≥ h_r; hence
P_{t + t_0} f ≥ ||1_{G_r} P_t f|| P_{t_0} f̄ ≥ [1 - (M/a)] h_r.
Thus [1 - (M/a)] h_r is a lower-bound function for the semigroup {P_t}_{t≥0}. Since, by assumption, h_r is a nontrivial function and we took a > M, it follows that this lower-bound function is also nontrivial. Application of Theorem 7.4.1 completes the proof. □
Example 11.10.1. Consider the integro-differential equation
∂u(t, x)/∂t + u(t, x) = ½ ∂²u/∂x² + ∫_{-∞}^{∞} K(x, y) u(t, y) dy,  t > 0, x ∈ R,   (11.10.7)
with the initial condition
u(0, x) = φ(x),  x ∈ R,   (11.10.8)
where K is a stochastic kernel satisfying
∫_{-∞}^{∞} |x| K(x, y) dx ≤ α|y| + β  for y ∈ R,   (11.10.9)
with constants β ≥ 0 and α < 1.
The solution may be written as the Phillips perturbation series
P_t f = e^{-t} Σ_{n=0}^∞ T_n(t) f,   (11.10.10)
where
T_0(t) f(x) = ∫_{-∞}^{∞} g(t, x - y) f(y) dy,  g(t, x) = (1/√(2πt)) exp(-x²/2t),   (11.10.11)
T_n(t) f = ∫_0^t T_0(t - τ) P T_{n-1}(τ) f dτ, and
P f(x) = ∫_{-∞}^{∞} K(x, y) f(y) dy.
Let
E(t) = E(|x| | P_t f) = ∫_{-∞}^{∞} |x| P_t f(x) dx,
so that
E(t) = e^{-t} Σ_{n=0}^∞ e_n(t),
where
e_n(t) = ∫_{-∞}^{∞} |x| T_n(t) f(x) dx.
We are going to show that E(t), as given here, satisfies condition (11.10.3). If we set
f_{nτ} = P T_{n-1}(τ) f  and  q_{nτ}(t) = ∫_{-∞}^{∞} |x| T_0(t - τ) f_{nτ}(x) dx,   (11.10.13)
then
e_n(t) = ∫_0^t q_{nτ}(t) dτ.   (11.10.14)
Since T_0(t - τ) is convolution with the Gaussian kernel g(t - τ, ·), and the corresponding Gaussian variable has mean absolute value √(2(t - τ)/π), we have
q_{nτ}(t) ≤ ∫_{-∞}^{∞} |y| f_{nτ}(y) dy + √(2(t - τ)/π) ∫_{-∞}^{∞} f_{nτ}(y) dy,   (11.10.15)
and, as a consequence, both terms on the right may be estimated separately.
By using equation (7.9.18) from the proof of the Phillips perturbation theorem and noting that P is a Markov operator (since K is a stochastic kernel) and ||f|| = 1, we have
∫_{-∞}^{∞} f_{nτ}(y) dy = ||P T_{n-1}(τ) f|| = ||T_{n-1}(τ) f|| ≤ τ^{n-1}/(n-1)!.   (11.10.16)
Further, by (11.10.9),
∫_{-∞}^{∞} |y| f_{nτ}(y) dy ≤ α e_{n-1}(τ) + β τ^{n-1}/(n-1)!.
Substituting this and (11.10.16) into (11.10.15) and integrating according to (11.10.14) gives
e_n(t) ≤ α ∫_0^t e_{n-1}(τ) dτ + ∫_0^t [β + √(2(t - τ)/π)] τ^{n-1}/(n-1)! dτ,  n = 1, 2, ....   (11.10.17)
Similarly,
e_0(t) ≤ m_1 + √(2t/π),  m_1 = ∫_{-∞}^{∞} |y| f(y) dy.   (11.10.18)
382
With equations (11.10.17) and (11.10.18) we may now proceed to examine E(t). Sum (11.10.17) from n = 1 to m and add (11.10.18). This
gives
m
{2i
{2 (t
m
Een(t)~m1+y-;+f3et+v;Jo vt-Te.,.dT+ lo Een(T)dT,
n=1
Define Em(t)
n=O
= et.
Em(t)
where
p=f3+mp-x
(11.10.19)
To solve the integral inequality (11.10.19), it is enough to solve the corresponding equality and note that Em(t) is below this solution (Walter,
1970). This process leads to
-+
oo,
(11.10.20)
Since the constant p does not depend on /, (11.10.20) proves that the
semigroup {Pth;::o, generated by (11.10.7) and (11.10.8), satisfies equation
(11.10.3) with V(x) = lxl.
Next we verify equation (11.10.4). Assume that $f \in D(R)$ is supported on $[-r,r]$. Then we have

$$P_1 f \ge e^{-1}T_0(1)f = \frac{e^{-1}}{\sqrt{2\pi}}\int_{-r}^{r}\exp[-(x-y)^2/2]\,f(y)\,dy \ge \frac{e^{-1}}{\sqrt{2\pi}}\exp[-(x^2+r^2)]\int_{-r}^{r} f(y)\,dy = \frac{1}{\sqrt{2\pi}}\exp[-(x^2+r^2+1)],$$

since $(x-y)^2 \le 2(x^2+r^2)$ for $|y| \le r$. Thus (11.10.4) holds with $t_0 = 1$ and the evidently nontrivial function $h_r(x) = (2\pi)^{-1/2}\exp[-(x^2+r^2+1)]$.
As a second example consider the equation

$$\frac{\partial u(t,x)}{\partial t} + \frac{\partial u(t,x)}{\partial x} + u(t,x) = \int_0^{\infty} K(x,y)\,u(t,y)\,dy \tag{11.10.21}$$

with the conditions

$$u(t,0) = 0 \quad\text{and}\quad u(0,x) = \phi(x) \qquad\text{for } x > 0, \tag{11.10.22}$$

where the stochastic kernel $K$ satisfies

$$\int_0^{y} x\,K(x,y)\,dx \le \alpha y + \beta \qquad\text{for } y > 0, \tag{11.10.23}$$

where $\alpha$ and $\beta$ are nonnegative constants and $\alpha < 1$. In the Chandrasekhar–Münch equation, $K(x,y) = \psi(x/y)/y$, and (11.10.23) is automatically satisfied since

$$\int_0^{y} x\,K(x,y)\,dx = \int_0^{y}\frac{x}{y}\,\psi(x/y)\,dx = y\int_0^1 z\,\psi(z)\,dz,$$

so that $\alpha = \int_0^1 z\,\psi(z)\,dz < 1$ and $\beta = 0$. In this case

$$T_0(t)f(x) = 1_{[0,\infty)}(x-t)\,f(x-t) \tag{11.10.24}$$

and

$$Pf(x) = \int_0^{\infty} K(x,y)\,f(y)\,dy. \tag{11.10.25}$$
Proceeding as before, set

$$E(t) = \int_0^{\infty} x\,P_t f(x)\,dx, \qquad E(t) = e^{-t}\sum_{n=0}^{\infty} e_n(t), \qquad e_n(t) = \int_0^{\infty} x\,T_n(t)f(x)\,dx,$$

and

$$q_{n\tau}(t) = \int_{t-\tau}^{\infty} x\,T_0(t-\tau)f_{n\tau}(x)\,dx,$$

or, setting $x - t + \tau = z$,

$$q_{n\tau}(t) = \int_0^{\infty} (z + t - \tau)\,f_{n\tau}(z)\,dz = \int_0^{\infty}\left[\int_0^{\infty} z\,K(z,y)\,T_{n-1}(\tau)f(y)\,dy\right]dz + (t-\tau)\int_0^{\infty}\left[\int_0^{\infty} K(z,y)\,T_{n-1}(\tau)f(y)\,dy\right]dz.$$

Using (11.10.23) and the norm estimate for $T_{n-1}(\tau)f$, this gives

$$e_n(t) \le \alpha\int_0^t e_{n-1}(\tau)\,d\tau + \beta\frac{t^n}{n!} + \int_0^t (t-\tau)\frac{\tau^{n-1}}{(n-1)!}\,d\tau, \qquad n = 1,2,\ldots.$$

Further,

$$e_0(t) = \int_0^{\infty} x\,T_0(t)f(x)\,dx = \int_0^{\infty} z\,f(z)\,dz + t\int_0^{\infty} f(z)\,dz \tag{11.10.26}$$

or

$$e_0(t) = m_1 + t, \qquad m_1 = \int_0^{\infty} z\,f(z)\,dz. \tag{11.10.27}$$
Observe the similarity between equations (11.10.26)–(11.10.27) and equations (11.10.17)–(11.10.18). Thus, proceeding as in Example 11.10.1, we again obtain (11.10.20) with

$$p = \beta + \int_0^{\infty} ue^{-u}\,du + \max_t\,(te^{-t}).$$
Thus we have shown that the semigroup generated by equations (11.10.21)–(11.10.22) satisfies condition (11.10.3). However, the proof that (11.10.4) holds is more difficult for the reasons set out at the beginning of this example. To start, pick $r > 0$ as in Proposition 11.10.1, that is,

$$r = M + 1 = [p/(1-\alpha)] + 1.$$
For an arbitrary $f \in D([0,r])$ we have

$$P_{t_0}f(x) \ge e^{-t_0}\int_0^{t_0} T_0(t_0-\tau)\,P\,T_0(\tau)f(x)\,d\tau = e^{-t_0}\int_0^{t_0} 1_{[0,\infty)}(x-t_0+\tau)\left[\int_{x-t_0+\tau}^{\infty} K(x-t_0+\tau,\,y)\,T_0(\tau)f(y)\,dy\right]d\tau.$$

In particular, for $0 \le x \le t_0$ the indicator restricts the integration to $\int_{t_0-x}^{t_0}$. Now set $z = y - \tau$ and $s = x - t_0 + \tau$ to obtain, for every $f \in D([0,r])$,

$$P_{t_0}f(x) \ge h_r(x) \qquad\text{for } 0 \le x \le t_0,$$

where

$$h_r(x) = e^{-t_0}\inf_{0\le z\le r}\int_0^{x} K(s,\,z+s+t_0-x)\,ds.$$
For the Chandrasekhar–Münch kernel $K(x,y) = \psi(x/y)/y$ this gives

$$h_r(x) = e^{-t_0}\inf_{0\le z\le r}\int_0^{x}\psi\!\left(\frac{s}{z+s+t_0-x}\right)\frac{ds}{z+s+t_0-x}.$$

If we set $q = s/(z+s+t_0-x)$, then

$$h_r(x) = e^{-t_0}\inf_{0\le z\le r}\int_0^{x/(z+t_0)}\psi(q)\,\frac{dq}{1-q}.$$

Since $1/(1-q) \ge 1$ and $x/(z+t_0) \ge r/(r+t_0)$ for $r \le x \le t_0$ and $0 \le z \le r$, it follows that

$$h_r(x) \ge e^{-t_0}\int_0^{r/(r+t_0)}\psi(q)\,dq > 0 \qquad\text{for } r \le x \le t_0,$$

so that $h_r$ is nontrivial and Proposition 11.10.1 applies.
In the opposite case the semigroup is sweeping, that is,

$$\lim_{t\to\infty}\int_A P_t f(x)\,dx = \lim_{t\to\infty}\int_A u(t,x)\,dx = 0$$

for every bounded measurable set $A$.
Assume that there exists a function $V(x)$ having continuous derivatives

$$\frac{\partial V}{\partial x_i}, \quad \frac{\partial^2 V}{\partial x_i\,\partial x_j}, \qquad i,j = 1,\ldots,d,$$

and satisfying the growth conditions

$$V(x) \le \rho e^{\delta|x|}, \qquad \left|\frac{\partial V(x)}{\partial x_i}\right| \le \rho e^{\delta|x|}, \qquad \left|\frac{\partial^2 V(x)}{\partial x_i\,\partial x_j}\right| \le \rho e^{\delta|x|}, \tag{11.11.2}$$

with $V(x) > 0$ for $x \in A$, $A \in \mathcal{A}_c$, and such that

$$\sum_{i,j=1}^{d} a_{ij}(x)\frac{\partial^2 V}{\partial x_i\,\partial x_j} + \sum_{i=1}^{d} b_i(x)\frac{\partial V}{\partial x_i} \le -aV(x), \tag{11.11.3}$$

with a constant $a > 0$. Then the semigroup $\{P_t\}_{t\ge 0}$ generated by (11.7.1)–(11.7.2) is sweeping.
Proof. The proof proceeds exactly as the proof of Theorem 11.9.1, but is much shorter. First we pick a continuous density $f$ with compact support and consider the mathematical expectation (11.9.3). Using inequality (11.11.3), we obtain $dE(V \mid P_t f)/dt \le -aE(V \mid P_t f)$ and, consequently,

$$E(V \mid P_t f) \le E(V \mid f)\,e^{-at},$$

which completes the proof. □

Example 11.11.1. Consider the stochastic differential equation

$$\frac{dx}{dt} = bx + \sigma\xi, \tag{11.11.4}$$

where $b$ and $\sigma$ are positive constants and $\xi$ is a white noise perturbation. Equation (11.11.4) differs from the Langevin equation because the coefficient of $x$ is positive. The Fokker–Planck equation corresponding to (11.11.4) is

$$\frac{\partial u}{\partial t} = \frac{\sigma^2}{2}\frac{\partial^2 u}{\partial x^2} - b\frac{\partial(xu)}{\partial x}. \tag{11.11.5}$$
Now the inequality (11.11.3) becomes

$$\frac{\sigma^2}{2}\frac{\partial^2 V}{\partial x^2} + bx\frac{\partial V}{\partial x} \le -aV. \tag{11.11.6}$$

Pick a Bielecki function of the form $V(x) = e^{-\varepsilon x^2}$ and substitute it into (11.11.6) to obtain

$$2\varepsilon(\varepsilon\sigma^2 - b)x^2 - \varepsilon\sigma^2 \le -a.$$

This inequality is satisfied for arbitrary positive $\varepsilon \le b/\sigma^2$ and $a \le \varepsilon\sigma^2$. This demonstrates that for $b > 0$ the semigroup $\{P_t\}_{t\ge 0}$ generated by equation (11.11.5) is sweeping.
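The parameter condition at the end of this example can be checked mechanically. A minimal numeric sketch — the constants $b$, $\sigma^2$ below are illustrative choices, not values from the text:

```python
# Check 2*eps*(eps*sigma2 - b)*x^2 - eps*sigma2 <= -a on a grid of x values,
# for the admissible choice eps = b / sigma^2 and a = eps * sigma^2.
b, sigma2 = 1.5, 2.0          # illustrative constants (not from the text)
eps = b / sigma2
a = eps * sigma2

worst = max(2.0 * eps * (eps * sigma2 - b) * x * x - eps * sigma2
            for x in (i / 100.0 - 50.0 for i in range(10_001)))
print(worst, -a)   # with eps*sigma2 == b the expression is constant, equal to -a
```

With the maximal choice $\varepsilon = b/\sigma^2$ the coefficient of $x^2$ vanishes, so the left-hand side is constant and the inequality holds with equality for $a = \varepsilon\sigma^2$.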
$$\int_{R^d} u_*(x)\,dx < \infty. \tag{11.12.1}$$
Proof. We are going to use Theorem 7.12.1 in the proof, sequentially verifying conditions (a), (b), and (c).

First we are going to show that the kernel $\Gamma(t,x,y)$ in equation (11.7.7) is stochastic for each $t > 0$. We already know that $\Gamma$ is positive and that $\{P_t\}_{t\ge 0}$ is stochastic. Furthermore, for each $f \in L^1(R^d)$ we have

$$\int_{R^d} f(y)\,dy = \int_{R^d} P_t f(x)\,dx = \int_{R^d}\int_{R^d}\Gamma(t,x,y)\,f(y)\,dx\,dy,$$

and consequently

$$\int_{R^d}\left[\int_{R^d}\Gamma(t,x,y)\,dx - 1\right]f(y)\,dy = 0.$$

Since $f$ is arbitrary, this implies

$$\int_{R^d}\Gamma(t,x,y)\,dx = 1.$$

Next, $u(t,x) = P_t u_*(x)$ is a solution, and the stationarity of $u_*$ means that $u_*(x) = P_t u_*(x)$ for all $t > 0$.
$$\frac{\partial u}{\partial t} = \frac{1}{2}\frac{\partial^2[\sigma^2(x)u]}{\partial x^2} - \frac{\partial[b(x)u]}{\partial x} \tag{11.12.3}$$
$$xb(x) \le 0 \qquad\text{for } |x| \ge r, \tag{11.12.4}$$

where $r$ is a positive constant. This last condition simply means that the interval $[-r,r]$ is attracting (or at least not repelling) for trajectories of the unperturbed equation $\dot{x} = b(x)$.
To find a stationary solution of (11.12.3) we must solve the differential equation

$$\frac{1}{2}\frac{d^2[\sigma^2(x)u]}{dx^2} - \frac{d[b(x)u]}{dx} = 0,$$

or, after one integration and with $z = \sigma^2(x)u(x)$,

$$\frac{dz}{dx} = \frac{2b(x)}{\sigma^2(x)}\,z + c_1,$$

whose general solution is

$$z(x) = e^{B(x)}\left[c_2 + c_1\int_0^x e^{-B(y)}\,dy\right], \qquad B(x) = \int_0^x \frac{2b(y)}{\sigma^2(y)}\,dy.$$

The solution $z(x)$ will be positive if and only if

$$c_2 + c_1\int_0^x e^{-B(y)}\,dy > 0 \qquad\text{for all } x. \tag{11.12.5}$$

By (11.12.4) the integral $\int_0^x e^{-B(y)}\,dy$ converges to $+\infty$ if $x \to +\infty$ and to $-\infty$ if $x \to -\infty$. This shows that for $c_1 \ne 0$ inequality (11.12.5) cannot be satisfied. Thus, the unique (up to a multiplicative constant) positive stationary solution of equation (11.12.3) is given by

$$u_*(x) = \frac{c}{\sigma^2(x)}\,e^{B(x)}.$$

If

$$\int_{-\infty}^{\infty}\frac{1}{\sigma^2(x)}\,e^{B(x)}\,dx < \infty,$$
then the semigroup $\{P_t\}_{t\ge 0}$ generated by equation (11.12.3) is asymptotically stable. If the integral is infinite, then $\{P_t\}_{t\ge 0}$ is sweeping.
Example 11.12.1. Consider equation (11.12.3) with $\sigma \equiv 1$ and

$$b(x) = -\frac{\lambda x}{1+x^2}, \qquad \lambda \ge 0.$$

Then

$$B(x) = \int_0^x \frac{-2\lambda y}{1+y^2}\,dy = -\lambda\ln(1+x^2)$$

and

$$u_*(x) = c\,e^{-\lambda\ln(1+x^2)} = \frac{c}{(1+x^2)^{\lambda}}.$$

The function $u_*$ is integrable on $R$ only for $\lambda > \frac{1}{2}$, and thus the semigroup $\{P_t\}_{t\ge 0}$ is asymptotically stable for $\lambda > \frac{1}{2}$ and sweeping for $0 \le \lambda \le \frac{1}{2}$. This example shows that even though the origin $x = 0$ is attracting in the unperturbed system, asymptotic stability may vanish in a perturbed system whenever the coefficient of the attracting term is not sufficiently strong.
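The integrability threshold in this example is easy to check numerically. A minimal sketch — the truncation lengths and grid sizes below are arbitrary choices:

```python
import math

# Stationary density u_*(x) = c / (1 + x^2)^lam from the example
# (b(x) = -lam*x/(1+x^2), sigma = 1); the constant c is dropped since
# only integrability matters here.
def u_star(x, lam):
    return (1.0 + x * x) ** (-lam)

def mass(lam, L, n=200_000):
    # Trapezoidal approximation of the integral of u_* over [-L, L].
    h = 2.0 * L / n
    s = 0.5 * (u_star(-L, lam) + u_star(L, lam))
    for i in range(1, n):
        s += u_star(-L + i * h, lam)
    return s * h

# lam = 1: the integral converges (to pi), so u_* is a genuine density;
# lam = 1/2: the tail decays like 1/|x| and the mass keeps growing with L.
print(mass(1.0, 1e4), mass(0.5, 1e2), mass(0.5, 1e4))
```

For $\lambda = 1$ the truncated mass stabilizes near $\pi$; for $\lambda = 1/2$ it grows like $2\ln(2L)$, reflecting the sweeping regime.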
$$\int_0^{\infty} e^{-B(x)}\,dx = \int_{-\infty}^{0} e^{-B(x)}\,dx = \infty. \tag{11.12.6}$$
Exercises
11.1. Let $\{w(t)\}_{t\ge 0}$ be a one-dimensional Wiener process defined on a complete probabilistic measure space. Show that for every $t_0 \ge 0$, $r > 0$, and $M > 0$ the probability of the event

$$\{|w(t) - w(t_0)| \le M|t - t_0| \text{ for all } t \text{ with } |t - t_0| \le r\}$$

is equal to zero.

11.2. Generalize the previous result and show that the probability of the event

$$\{w'(t) \text{ exists for at least one } t \ge 0\}$$

is equal to zero.
11.3. Show that $\frac{1}{t}\,w^2(t)$ converges to [...] strongly in $L^1(\Omega)$.

11.4. Consider the stochastic differential equation

$$\frac{dx}{dt} = [\ldots], \qquad t > 0,\ x \in R,$$

where $c$ and $\sigma > 0$ are constant and $\xi$ is a normalized white noise. Show that the corresponding stochastic semigroup $\{P_t\}_{t\ge 0}$ is asymptotically stable (Mackey, Longtin, and Lasota, 1990).
11.5. Show that the stochastic semigroup $\{P_t\}_{t\ge 0}$ defined in Exercise 7.8 is asymptotically stable for an arbitrary stochastic kernel $K$ (Jama, 1986).

11.6. A stochastic semigroup $\{P_t\}_{t\ge 0}$ is called weakly (strongly) mixing if, for every $f_1,f_2 \in D$, the difference $P_t f_1 - P_t f_2$ converges weakly (strongly) to zero in $L^1$. Show that the stochastic semigroup $\{T_t\}_{t\ge 0}$ given by equation (7.9.9), corresponding to the heat equation, is strongly mixing.

11.7. Consider equation (11.12.3) with $b(x) = x/(1+x^2)$ and $\sigma = 1$. Prove that the stochastic semigroup $\{P_t\}_{t\ge 0}$ corresponding to this equation satisfies

$$\lim_{t\to\infty}[\ldots] \qquad\text{for } f_1,f_2 \in D,$$

where $H$ denotes the conditional entropy.
12
Markov and Foias Operators
Throughout this book we have studied the asymptotic behavior of densities. However, in some cases the statistical properties of dynamical systems
are better described if we use a more general notion than a density, namely,
a measure. In fact, the sequences (or flows) of measures generated by dynamical systems simultaneously generalize the notion of trajectories and
the sequences (or flows) of densities. They are of particular value in studying fractals.
The study of the evolution of measures related to dynamical systems is
difficult. It is more convenient to study them by use of functionals on the
space C0 (X) of continuous functions with bounded support. Thus, we start
in Section 12.1 by examining the relationship between measures and linear
functionals given by the Riesz representation theorem, and then look at
weak and strong convergence notions for measures in Section 12.2. After
defining the notions of Markov and Foias operators on measures (Sections
12.3 and 12.4, respectively), we study the behavior of dynamical systems
with stochastic perturbations. Finally, we apply these results to the theory
of fractals in Section 12.8.
$$\mu(A) < \infty \qquad\text{for } A \in \mathcal{B},\ A \text{ bounded}.$$

Of course, every locally finite measure $\mu$ is $\sigma$-finite, since $X$ may be written as a countable union of bounded sets:

$$X = \bigcup_{n=1}^{\infty} X_n, \tag{12.1.1}$$

where the $X_n$ are bounded.
It is easy to verify that $\operatorname{supp}\mu$ is a closed set. Observe that it also has the property that if $A$ is a closed set and $\mu$ is supported on $A$, then $A \supset \operatorname{supp}\mu$. To see this, assume that $x \notin A$. Since $X \setminus A$ is an open set, there exists a ball $B_\varepsilon(x)$ contained in $X \setminus A$. Thus,

$$\mu(B_\varepsilon(x)) \le \mu(X \setminus A) = 0,$$

and $x \notin \operatorname{supp}\mu$. This shows that $x \notin A$ implies $x \notin \operatorname{supp}\mu$, and consequently $A \supset \operatorname{supp}\mu$.

From the above arguments it follows that the support of a measure $\mu$ can be equivalently defined as the smallest closed set on which $\mu$ is supported. (The adjective closed is important here.)

It should also be noted that the definition of the support of a measure $\mu$ does not coincide exactly with the definition of the support of an element $f \in L^1$. The main difference is that $\operatorname{supp}\mu$ is defined precisely for every single point, but $\operatorname{supp} f$ is not (see Remarks 3.12 and 3.13).
The simplest locally finite measure is the $\delta$-Dirac measure

$$\delta_{x_0}(A) = \begin{cases} 1 & \text{if } x_0 \in A, \\ 0 & \text{if } x_0 \notin A. \end{cases} \tag{12.1.2}$$

Another standard example is an absolutely continuous measure

$$\mu_f(A) = \int_A f(x)\,dx \qquad\text{for } A \in \mathcal{B}, \tag{12.1.3}$$

where $f$ is a nonnegative Borel measurable function. A functional $\varphi\colon C_0 \to R$ is linear if

$$\varphi(\lambda_1 h_1 + \lambda_2 h_2) = \lambda_1\varphi(h_1) + \lambda_2\varphi(h_2), \tag{12.1.4}$$

and positive if $\varphi(h) \ge 0$ for $h \ge 0$. Every measure $\mu \in \mathcal{M}$ defines a positive linear functional by

$$\varphi(h) = \int_X h(x)\,\mu(dx) \qquad\text{for } h \in C_0. \tag{12.1.5}$$
Theorem 12.1.1. For every positive linear functional $\varphi\colon C_0 \to R$ there is a unique measure $\mu \in \mathcal{M}$ such that condition (12.1.5) is satisfied.

The proof can be found in Halmos [1974].

Observe that Theorem 12.1.1 is somewhat similar to the Radon–Nikodym theorem. In the Radon–Nikodym theorem, a measure is represented by integrals with a given density. In the Riesz theorem a functional is represented by integrals with a given measure. However, it should be noted that in the Riesz theorem even the uniqueness of the measure $\mu$ is not obvious. Namely,
setting $h = 1_X$ formally in (12.1.5) would give

$$\varphi(1_X) = \mu(X),$$

and for a probabilistic measure

$$\varphi(1_X) = 1. \tag{12.1.6}$$

However, for unbounded $X$ the function $1_X$ does not belong to $C_0$, and this equality must be understood as a limit. Choose $h_n \in C_0$ with

$$0 \le h_1 \le h_2 \le \cdots, \qquad \lim_{n\to\infty} h_n(x) = 1 \qquad\text{for } x \in X. \tag{12.1.7}$$

Then, by the monotone convergence theorem,

$$\lim_{n\to\infty}\varphi(h_n) = \int_X 1\,\mu(dx) = \mu(X). \tag{12.1.8}$$

Thus $\mu$ is probabilistic if and only if

$$\lim_{n\to\infty}\varphi(h_n) = 1, \tag{12.1.9}$$

and $\mu$ is finite if and only if

$$\lim_{n\to\infty}\varphi(h_n) < \infty. \tag{12.1.10}$$
For example, for $\mu = \delta_{x_0}$ the corresponding functional is

$$\varphi(h) = \int_X h(x)\,\mu(dx) = \int_{\{x_0\}} h(x)\,\mu(dx) = h(x_0).$$

A sequence $\{\mu_n\}$ of measures converges weakly to $\mu$ if

$$\lim_{n\to\infty}\int_X h(x)\,\mu_n(dx) = \int_X h(x)\,\mu(dx) \qquad\text{for } h \in C_0. \tag{12.2.1}$$

If the $\mu_n$ and $\mu$ are absolutely continuous with densities $f_n$ and $f$, condition (12.2.1) takes the form

$$\int_X h(x)f_n(x)\,dx \to \int_X h(x)f(x)\,dx = \langle h, f\rangle \qquad\text{for } h \in C_0. \tag{12.2.2}$$
This looks quite similar to condition (2.3.2) in Definition 2.3.1 for the weak convergence of a sequence $\{f_n\}$ of functions in $L^p$ space. However, there is an important difference: here $h$ ranges only over $C_0$. It is convenient to introduce the scalar product notation

$$\langle h,\mu\rangle = \int_X h(x)\,\mu(dx) \qquad\text{for } h \in C_0, \tag{12.2.3}$$

so that $\{\mu_n\}$ converges weakly to $\mu$ if $\langle h,\mu_n\rangle \to \langle h,\mu\rangle$ for $h \in C_0$.

As a first example, let $x_n \to x_*$ and set $\mu_n = \delta_{x_n}$, $\mu_* = \delta_{x_*}$. For each fixed $h \in C_0$, from the continuity of $h$ the sequence $\{h(x_n)\}$ converges to $h(x_*)$. Consequently, the sequence of measures $\{\mu_n\}$ converges weakly to $\mu_*$. □
Consider next the sequence $\{\mu_n\}$ of measures with the Gaussian densities

$$f_n(x) = \frac{1}{\sqrt{2\pi}\,\sigma_n}\exp\left\{-\frac{x^2}{2\sigma_n^2}\right\}, \qquad n = 1,2,\ldots. \tag{12.2.4}$$

Assume that $\sigma_n \to 0$ as $n \to \infty$ and denote by $\mu_* = \delta_0$ the $\delta$-Dirac measure supported at $x = 0$. We have
$$|\langle h,\mu_n\rangle - \langle h,\mu_*\rangle| = \left|\int h(x)f_n(x)\,dx - h(0)\right| = \left|\int h(x)f_n(x)\,dx - \int f_n(x)h(0)\,dx\right| \le \int |h(x) - h(0)|\,f_n(x)\,dx.$$

Choose an $\varepsilon > 0$. Let $r$ be such that $|h(x) - h(0)| \le \varepsilon$ for $|x| \le r$. Then

$$|\langle h,\mu_n - \mu_*\rangle| \le \int_{|x|\le r}|h(x)-h(0)|\,f_n(x)\,dx + \int_{|x|\ge r}|h(x)-h(0)|\,f_n(x)\,dx \le \varepsilon + 2M\int_{|x|\ge r} f_n(x)\,dx,$$
where $M = \max|h(x)|$. Using (12.2.4) and setting $x/\sigma_n = y$ we finally have

$$|\langle h,\mu_n - \mu_*\rangle| \le \varepsilon + \frac{4M}{\sqrt{2\pi}}\int_{r/\sigma_n}^{\infty}\exp\left(-\frac{y^2}{2}\right)dy.$$

Since the sequence $\{\sigma_n\}$ converges to zero the last integral also converges to zero, which implies that

$$\lim_{n\to\infty}|\langle h,\mu_n - \mu_*\rangle| = 0.$$
Now assume instead that $\sigma_n \to \infty$ and take $\mu_* \equiv 0$. For $h \in C_0$ supported in an interval $[a,b]$,

$$|\langle h,\mu_n - \mu_*\rangle| = |\langle h,\mu_n\rangle| \le \int_{-\infty}^{+\infty}|h(x)|\,f_n(x)\,dx = \int_a^b |h(x)|\,f_n(x)\,dx \le \max|h|\,\frac{1}{\sqrt{2\pi}\,\sigma_n}\int_a^b \exp\left\{-\frac{x^2}{2\sigma_n^2}\right\}dx. \tag{12.2.5}$$

Since the sequence $\{\sigma_n\}$ converges to infinity, the integrals on the right-hand side converge to zero. This shows that the Gaussian measures converge weakly to zero when the standard deviations go to infinity. Observe, however, that in this case the sequence of densities $\{f_n\}$ does not converge weakly in $L^1$ to $f_* \equiv 0$. In fact, setting $g \equiv 1$ in (2.3.2) we have

$$\langle g, f_n\rangle = \int f_n(x)\,dx = 1, \qquad \langle g, f_*\rangle = 0,$$

and the sequence $\{\langle g,f_n\rangle\}$ does not converge to $\langle g,f_*\rangle$. □
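Both limit behaviors in this example are easy to verify numerically by evaluating $\langle h,\mu_n\rangle$ for a fixed test function. A minimal sketch — the tent function $h$ and the grid parameters are arbitrary choices, not from the text:

```python
import math

def gauss(x, sigma):
    # Gaussian density f_n from (12.2.4)
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))

def h(x):
    # a continuous "tent" function in C_0: support [-1, 1], h(0) = 1
    return max(0.0, 1.0 - abs(x))

def pairing(sigma, L=2.0, n=40_000):
    # trapezoidal approximation of <h, mu_n> = integral of h * f_n
    step = 2.0 * L / n
    s = 0.0
    for i in range(n + 1):
        x = -L + i * step
        w = 0.5 if i in (0, n) else 1.0
        s += w * h(x) * gauss(x, sigma)
    return s * step

# sigma -> 0:  <h, mu_n> -> h(0) = 1   (weak convergence to the Dirac measure)
# sigma -> oo: <h, mu_n> -> 0          (the mass spreads out and escapes)
print(pairing(0.01), pairing(100.0))
```

The first value is close to $h(0) = 1$, the second close to $0$, mirroring the two regimes of the example.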
The weak convergence of $\{\mu_n\}$ to $\mu$ does not imply the convergence of $\{\mu_n(A)\}$ to $\mu(A)$ for all measurable sets $A$. However, it is easy to obtain some inequalities between $\mu(A)$ and $\mu_n(A)$ for large $n$ and some special sets $A$.

We say that $G \subset X$ is open in $X$ if $X \setminus G$ is a closed set. For example, the ball

$$B_r(x) = \{y \in X\colon |x - y| < r\}$$
is open in $X$ since its complement is closed. If $\{\mu_n\}$ converges weakly to $\mu$, then

$$\liminf_{n\to\infty}\mu_n(G) \ge \mu(G) \qquad\text{for } G \subset X,\ G \text{ open in } X. \tag{12.2.6}$$

To prove this, let $\{F_k\}$ be an increasing sequence of closed sets with $\bigcup_k F_k = G$. Thus, $\lim_k\mu(F_k) = \mu(G)$ and for any given $\varepsilon > 0$ there is a set $F_k$ such that $\mu(F_k) \ge \mu(G) - \varepsilon$. Let $h \in C_0(X)$ be such that $0 \le h \le 1$ and

$$h(x) = \begin{cases} 1 & \text{for } x \in F_k, \\ 0 & \text{for } x \in X\setminus G. \end{cases}$$

Since $F_k$ and $X \setminus G$ are closed and disjoint, the function $h$ always exists. Evidently $h \le 1_G$ and

$$\langle h,\mu_n\rangle \le \mu_n(G),$$

which gives, in the limit,

$$\langle h,\mu\rangle \le \liminf_{n\to\infty}\mu_n(G).$$

On the other hand, $h \ge 1_{F_k}$, so $\langle h,\mu\rangle \ge \mu(F_k) \ge \mu(G) - \varepsilon$, and since $\varepsilon$ was arbitrary, (12.2.6) follows.
Now we are going to show how the Riesz representation theorem may be used to show that a given sequence of measures is convergent. Suppose that the limit $\varphi(h) = \lim_n\langle h,\mu_n\rangle$ exists for every $h \in C_0$. Then $\varphi$ is a positive linear functional, and by Theorem 12.1.1 there is a unique measure $\mu$ such that $\varphi(h) = \langle h,\mu\rangle$ for $h \in C_0$. From this and the definition of $\varphi$, it follows that the sequence $\{\mu_n\}$ converges to $\mu$ weakly.
Remark 12.2.2. In the special case when the $\mu_n$ are probabilistic measures the use of Theorem 12.2.2 can be greatly simplified. Namely, it is not necessary to verify the convergence of sequences $\{\langle h,\mu_n\rangle\}$ for all $h \in C_0$. Let $C_* \subset C_0$ be a dense subset of $C_0$, which means that for every $h \in C_0$ and $\varepsilon > 0$ there is $g \in C_*$ such that

$$\sup_{x\in X}|g(x) - h(x)| \le \varepsilon.$$

Then the bound $|\langle g,\mu_n\rangle - \langle h,\mu_n\rangle| \le \varepsilon$ and the Cauchy condition for all sequences $\{\langle g,\mu_n\rangle\}$ imply the Cauchy condition for $\{\langle h,\mu_n\rangle\}$. □
We close this section by introducing the concept of the strong convergence of measures. First we need to define the distance between two measures $\mu_1,\mu_2 \in \mathcal{M}_{\mathrm{fin}}$. Let $(X_1,\ldots,X_n)$ be a measurable partition of $X$, that is,

$$X = \bigcup_{i=1}^{n} X_i, \qquad X_i \cap X_j = \emptyset \ \text{for } i \ne j, \qquad X_i \in \mathcal{B}.$$

We set

$$\|\mu_1 - \mu_2\| = \sup\sum_{i=1}^{n}|\mu_1(X_i) - \mu_2(X_i)|, \tag{12.2.7}$$

where the supremum is taken over all measurable partitions of $X$. In particular,
$$\|\mu\| = \sup\sum_{i=1}^{n}\mu(X_i) = \mu(X). \tag{12.2.8}$$

This value will be called the norm of the measure $\mu$. It is the distance from $\mu$ to zero. The norm of a probabilistic measure is equal to 1. A sequence $\{\mu_n\}$ converges strongly to $\mu$ if

$$\lim_{n\to\infty}\|\mu_n - \mu\| = 0. \tag{12.2.9}$$
If $\mu_1$ and $\mu_2$ are absolutely continuous with densities $f_1$ and $f_2$, then

$$\mu_1(X_i) - \mu_2(X_i) = \int_{X_i}(f_1(x) - f_2(x))\,dx,$$

so that

$$\|\mu_1 - \mu_2\| = \sup\sum_i\left|\int_{X_i}(f_1(x) - f_2(x))\,dx\right| \le \int_X |f_1(x) - f_2(x)|\,dx. \tag{12.2.10}$$

Now let

$$X_1 = \{x\colon f_1(x) > f_2(x)\}, \qquad X_2 = \{x\colon f_1(x) \le f_2(x)\}.$$

For this particular partition the supremum is attained, and thus

$$\|\mu_1 - \mu_2\| = \int_X |f_1(x) - f_2(x)|\,dx. \tag{12.2.11}$$
From this equality a necessary and sufficient condition for the strong convergence of absolutely continuous measures follows immediately. Namely, if the $\mu_n$ are absolutely continuous with densities $f_n$, and $\mu$ is absolutely continuous with density $f$, then $\{\mu_n\}$ converges strongly to $\mu$ if and only if $\|f_n - f\| \to 0$.
Example 12.2.4. Assume $X = R$. Let $x_0 \in X$. Denote by $\mu_0 = \delta_{x_0}$ the $\delta$-Dirac measure supported at $x_0$. Further, let $\{\mu_n\}$ be a sequence of absolutely continuous measures with densities $f_n$. Write

$$X_1 = \{x_0\}, \qquad X_2 = X\setminus\{x_0\},$$

where, as usual, $\{x_0\}$ denotes the set that contains only the one point $x = x_0$. We have

$$\mu_0(X_1) = 1, \qquad \mu_0(X_2) = 0,$$

$$\mu_n(X_1) = \int_{\{x_0\}} f_n(x)\,dx = 0, \qquad \mu_n(X_2) = \int_{X\setminus\{x_0\}} f_n(x)\,dx = \int_X f_n(x)\,dx = 1.$$

Consequently,

$$\|\mu_n - \mu_0\| \ge |\mu_n(X_1) - \mu_0(X_1)| + |\mu_n(X_2) - \mu_0(X_2)| = |0 - 1| + |1 - 0| = 2.$$

This shows that a sequence of absolutely continuous measures cannot converge strongly to a $\delta$-Dirac measure. □
$$\mu = \sum_{k=0}^{\infty} c_k\delta_k, \qquad c_k \ge 0, \quad \sum_{k=0}^{\infty} c_k = 1, \tag{12.2.12}$$

where $\delta_k$ is the $\delta$-Dirac measure supported at $x = k$. Further, let

$$\mu_n = \sum_{k=0}^{\infty} c_{kn}\delta_k, \qquad c_{kn} \ge 0, \quad \sum_{k=0}^{\infty} c_{kn} = 1. \tag{12.2.13}$$

Assume that for each fixed $k$ ($k = 0,1,\ldots$) the sequence $\{c_{kn}\}$ converges to $c_k$ as $n \to \infty$. We are going to show that under this condition the sequence of measures $\{\mu_n\}$ converges strongly to $\mu$. Thus we must evaluate the distance $\|\mu_n - \mu\|$.
For any partition $\{X_i\}$,

$$\sum_{i=1}^{m}|\mu_n(X_i) - \mu(X_i)| = \sum_{i=1}^{m}\Bigl|\sum_{k=0}^{\infty}(c_{kn} - c_k)\delta_k(X_i)\Bigr| \le \sum_{k=0}^{\infty}\sum_{i=1}^{m}|c_{kn} - c_k|\,\delta_k(X_i),$$

where the supremum defining $\|\mu_n - \mu\|$ is taken over all partitions $\{X_i\}$ of $X$. Since for every partition

$$\sum_{i=1}^{m}\delta_k(X_i) = 1, \qquad k = 0,1,\ldots,$$

this gives

$$\|\mu_n - \mu\| \le \sum_{k=0}^{\infty}|c_{kn} - c_k|. \tag{12.2.14}$$

Fix $\varepsilon > 0$ and choose $N$ such that

$$\sum_{k=N+1}^{\infty} c_k < \frac{\varepsilon}{4},$$

and then $n_0$ such that

$$\sum_{k=0}^{N}|c_{kn} - c_k| < \frac{\varepsilon}{4} \qquad\text{for } n \ge n_0.$$

We have, therefore,

$$\sum_{k=N+1}^{\infty} c_{kn} = 1 - \sum_{k=0}^{N} c_{kn} \le 1 - \sum_{k=0}^{N} c_k + \frac{\varepsilon}{4} = \sum_{k=N+1}^{\infty} c_k + \frac{\varepsilon}{4} < \frac{\varepsilon}{2},$$

and, finally,

$$\sum_{k=0}^{\infty}|c_{kn} - c_k| \le \sum_{k=0}^{N}|c_{kn} - c_k| + \sum_{k=N+1}^{\infty} c_{kn} + \sum_{k=N+1}^{\infty} c_k \le \frac{\varepsilon}{4} + \frac{\varepsilon}{2} + \frac{\varepsilon}{4} = \varepsilon.$$
From the last inequality and (12.2.14) it follows that $\{\mu_n\}$ is strongly convergent to $\mu$.

As a typical situation described in this example consider a sequence of measures $\{\mu_n\}$ corresponding to the binomial distribution

$$c_{kn} = \begin{cases} \binom{n}{k}p_n^k q_n^{n-k} & \text{if } k = 0,\ldots,n, \\ 0 & \text{if } k > n, \end{cases}$$

where $0 < p_n < 1$ and $q_n = 1 - p_n$. Further, let $\mu$ be a measure corresponding to the Poisson distribution

$$c_k = \frac{\lambda^k}{k!}e^{-\lambda}.$$

If $p_n = \lambda/n$, then

$$c_{kn} = \frac{n(n-1)\cdots(n-k+1)}{n^k}\,\frac{\lambda^k}{k!}\left(1 - \frac{\lambda}{n}\right)^{n-k}.$$

Evidently the first $k$ factors converge to 1 and the $(k+1)$th to $e^{-\lambda}$. Thus, $c_{kn} \to c_k$ as $n \to \infty$ for every fixed $k$, and the sequence of measures corresponding to the binomial distribution converges strongly to the measure corresponding to the Poisson distribution. This is a classical result of probability theory known as Poisson's theorem, but it is seldom stated in terms of strong convergence. □
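Poisson's theorem in the strong-convergence form above can be checked directly by computing the coefficient distance (12.2.14) between the binomial and Poisson weights. A numeric sketch — the value $\lambda = 2$ is an arbitrary choice:

```python
import math

def binom_pmf(k, n, p):
    # computed in log form to avoid overflow of large factorials
    return math.exp(math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * math.log(p) + (n - k) * math.log(1.0 - p))

def poisson_pmf(k, lam):
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def strong_distance(n, lam=2.0):
    # sum_k |c_kn - c_k| as in (12.2.14); the binomial puts no mass on
    # k > n, so the Poisson tail there is added separately.
    p = lam / n
    d = sum(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(n + 1))
    d += 1.0 - sum(poisson_pmf(k, lam) for k in range(n + 1))
    return d

print(strong_distance(10), strong_distance(100), strong_distance(1000))
```

The distance shrinks roughly like $1/n$, consistent with strong convergence of the binomial measures to the Poisson measure.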
Recall that an absolutely continuous measure has the form

$$\mu_f(A) = \int_A f(x)\,dx \qquad\text{for } A \in \mathcal{B}. \tag{12.3.1}$$

If $P$ is a Markov operator on measures that maps absolutely continuous measures into absolutely continuous ones, then $P\mu_f$ has a density $g$,

$$P\mu_f(A) = \int_A g(x)\,dx \qquad\text{for } A \in \mathcal{B}, \tag{12.3.2}$$

and we may define a corresponding operator $\bar{P}$ on densities by $\bar{P}f = g$. [Diagram (12.3.3): $P$ maps $\mathcal{M}_{\mathrm{fin}}$ into $\mathcal{M}_{\mathrm{fin}}$ along the top row, $\bar{P}$ maps $L^1$ into $L^1$ along the bottom row, and the vertical arrows are $f \mapsto \mu_f$.] Explicitly,

$$\bar{P}f = \frac{dP\mu_f}{dx}. \tag{12.3.4}$$
and analogously

$$\|f\| = \int_X f(x)\,dx = \mu_f(X),$$

so that

$$\|\bar{P}f\| = \|f\| \qquad\text{for } f \in L^1_+. \tag{12.3.6}$$

Finally, $\bar{P}$ may be extended from densities to all of $L^1$ by linearity. (12.3.7)

Using this extension and condition (12.3.5) one can verify that $\bar{P}$ is a linear operator. Further, from our construction, and in particular from (12.3.4), it follows that $\bar{P}f \ge 0$ for $f \ge 0$. Finally, (12.3.6) shows that $\bar{P}$ preserves the norm of nonnegative functions. We may summarize this discussion with the following.

Proposition 12.3.1. Let $P\colon \mathcal{M}_{\mathrm{fin}} \to \mathcal{M}_{\mathrm{fin}}$ be a Markov operator on measures such that for every absolutely continuous measure $\mu$ the measure $P\mu$ is also absolutely continuous. Then the corresponding operator $\bar{P}$ defined by formulas (12.3.4) and (12.3.7) is a Markov operator on densities and the diagram (12.3.3) commutes.

The commutative property of diagram (12.3.3) has an important consequence. Namely, if $\bar{P}$ is the operator on densities corresponding to an operator $P$ on measures, then $(\bar{P})^n$ corresponds to $P^n$. To prove this consider the following row of $n$ blocked diagrams (12.3.8).
[Diagram (12.3.8): $n$ copies of diagram (12.3.3) placed side by side, with $P$ applied repeatedly along the top row on $\mathcal{M}_{\mathrm{fin}}$ and $\bar{P}$ along the bottom row on $L^1$.]

Since each of the blocks commutes, the total diagram (12.3.8) also commutes. This shows that $(\bar{P})^n$ corresponds to $P^n$.
Remark 12.3.1. There is an evident asymmetry in our approach to the definition of Markov operators. In Section 3.1 we defined a Markov operator on the whole space $L^1$, which contains positive and negative functions $f\colon X \to R$. Now we have defined a Markov operator on $\mathcal{M}_{\mathrm{fin}}$, which contains only nonnegative functions $\mu\colon \mathcal{B} \to R$. This asymmetry can be avoided. Namely, we extend the definition of $P$ to the set of signed measures, that is, all possible differences $\mu_1 - \mu_2$, where $\mu_1,\mu_2 \in \mathcal{M}_{\mathrm{fin}}$, by setting

$$P(\mu_1 - \mu_2) = P\mu_1 - P\mu_2.$$

Such an extension is unnecessary for our purposes and leads to some difficulties in calculating integrals, and in the use of the Riesz representation theorem, which is more complicated for signed measures on unbounded regions. □
Example 12.3.1. Let $X = R_+$. For a given $\mu \in \mathcal{M}_{\mathrm{fin}}$ define

$$P\mu(A) = \mu([0,1))\,\delta_0(A) + \mu(A \cap [1,\infty)), \tag{12.3.9}$$

where, as usual, $\delta_0$ denotes the $\delta$-Dirac measure supported at $x = 0$. Evidently, $P$ satisfies the linearity condition (a) of Definition 12.3.1. Moreover,

$$P\mu(R_+) = \mu([0,1)) + \mu([1,\infty)) = \mu(R_+),$$

which shows that condition (b) is also satisfied. Thus, (12.3.9) defines a Markov operator on measures.

The operator $P$ is relatively simple, but it has an interesting property. Namely, if a measure $\mu \in \mathcal{M}_{\mathrm{fin}}$ is supported on $[0,1)$, then $P\mu$ is a $\delta$-Dirac measure. If $\mu$ is supported on $[1,\infty)$, then $P\mu = \mu$. In other words, $P$ shrinks all of the measure on $[0,1)$ down to the point $x = 0$ and leaves the remaining portion of the measure untouched. In particular, $P$ does not map absolutely continuous measures into absolutely continuous ones, and the corresponding Markov operator $\bar{P}$ on densities cannot be defined. □
Example 12.3.2. Let $X = R$ and let $t > 0$ be a fixed number. For every $\mu \in \mathcal{M}_{\mathrm{fin}}$ define

$$P_t\mu(A) = \int_A\left\{\frac{1}{\sqrt{2\pi t}}\int_R \exp\left(-\frac{(x-y)^2}{2t}\right)\mu(dy)\right\}dx. \tag{12.3.10}$$

Inside the braces we have the integral of the Gaussian density, and consequently

$$P_t\mu(R) = \int_R \mu(dy) = \mu(R),$$

so $P_t$ is a Markov operator.
To understand the meaning of the family of operators $\{P_t\}$, first observe that for every $\mu \in \mathcal{M}_{\mathrm{fin}}$ the measure $P_t\mu$ is given by the integral (12.3.10) and has the Radon–Nikodym derivative

$$g_t(x) = \frac{1}{\sqrt{2\pi t}}\int_R \exp\left(-\frac{(x-y)^2}{2t}\right)\mu(dy). \tag{12.3.11}$$

If $\mu$ is absolutely continuous with density $f$, we may replace $\mu(dy)$ by $f(y)\,dy$ and in this way obtain an explicit formula for the operator $\bar{P}_t$ on densities:

$$\bar{P}_t f(x) = g_t(x) = \frac{1}{\sqrt{2\pi t}}\int_R \exp\left(-\frac{(x-y)^2}{2t}\right)f(y)\,dy.$$

The function $u(t,x) = g_t(x)$ is the familiar solution (7.4.11), (7.4.12) of the heat equation (7.4.13)

$$\frac{\partial u}{\partial t} = \frac{1}{2}\frac{\partial^2 u}{\partial x^2} \quad\text{for } t > 0,\ x \in R, \qquad u(0,x) = f(x).$$

It is interesting that $u(t,x) = g_t(x)$ satisfies the heat equation even in the case when $\mu$ has no density. This can be verified simply by differentiation of the integral formula (12.3.11). (Such a procedure is always possible since $\mu$ is a finite measure and the integrand

$$\frac{1}{\sqrt{2\pi t}}\,e^{-(x-y)^2/2t}$$

and its derivatives are bounded $C^{\infty}$ functions for $t \ge \varepsilon > 0$.)
Further, in the case of arbitrary $\mu$ the initial condition is also satisfied. Namely, the measures $P_t\mu$ converge weakly to $\mu$ as $t \to 0$. To prove this choose an arbitrary $h \in C_0(R)$. Since $g_t$ is the Radon–Nikodym derivative of $P_t\mu$ we have

$$\langle h, P_t\mu\rangle = \int_R h(x)\,P_t\mu(dx) = \int_R h(x)g_t(x)\,dx = \int_R h(x)\left\{\frac{1}{\sqrt{2\pi t}}\int_R \exp\left(-\frac{(x-y)^2}{2t}\right)\mu(dy)\right\}dx = \int_R v(t,y)\,\mu(dy), \tag{12.3.12}$$

where

$$v(t,y) = \frac{1}{\sqrt{2\pi t}}\int_R \exp\left(-\frac{(x-y)^2}{2t}\right)h(x)\,dx.$$

Observe that $v(t,y)$ is the solution of the heat equation corresponding to the initial function $h(y)$. Since $h$ is continuous and bounded, this is a classical solution and we have

$$\lim_{t\to 0} v(t,y) = h(y) \qquad\text{for } y \in R.$$

Evidently

$$|v(t,y)| \le \max|h|\,\frac{1}{\sqrt{2\pi t}}\int_R \exp\left(-\frac{(x-y)^2}{2t}\right)dx = \max|h|,$$

so that, by the dominated convergence theorem,

$$\lim_{t\to 0}\int_R v(t,y)\,\mu(dy) = \int_R h(y)\,\mu(dy).$$

From this and (12.3.12) it follows that $P_t\mu$ converges weakly to $\mu$.

Thus, we can say that the family of measures $\{P_t\mu\}$ describes the transport of the initial measure by the heat equation. From a physical point of view, if $u(t,x) = g_t(x)$ is the temperature at time $t$ at the point $x$, then

$$P_t\mu(A) = \int_A g_t(x)\,dx$$

is the amount of heat contained in the set $A$ at time $t$.
For $\mu = \delta_{x_0}$ the derivative (12.3.11) is identical to the fundamental solution $\Gamma(t,x,x_0)$ of the heat equation (see Section 11.7), and this gives a simple physical interpretation of that solution. Namely, $\Gamma(t,x,x_0)$ is the temperature at time $t$ and point $x$ corresponding to the situation in which the initial amount of heat was concentrated at a single point $x_0$. □
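Mass conservation under $P_t$ is easy to check numerically from the Radon–Nikodym derivative (12.3.11) applied to a purely atomic $\mu$. A sketch — the atom locations, masses, and $t$ are arbitrary choices:

```python
import math

def heat_density(x, t, atoms):
    # Radon-Nikodym derivative (12.3.11) of P_t mu for a purely atomic
    # measure mu; atoms is a list of (location, mass) pairs.
    return sum(m * math.exp(-(x - y) ** 2 / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)
               for y, m in atoms)

atoms = [(-1.0, 0.3), (2.0, 0.7)]   # a probability measure with two atoms
t = 0.5
L, n = 20.0, 40_000
step = 2.0 * L / n
total = sum(heat_density(-L + (i + 0.5) * step, t, atoms) for i in range(n)) * step
print(total)   # P_t is a Markov operator: the total mass stays 1
```

Each atom spreads into a Gaussian bump of the corresponding mass, and the integral of the resulting density equals $\mu(R)$.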
Given a measurable transformation $S\colon X \to X$, the Frobenius–Perron operator on measures is defined by

$$P\mu(A) = \mu(S^{-1}(A)) \qquad\text{for } A \in \mathcal{B}. \tag{12.4.1}$$

For $\mu = \delta_{x_0}$,

$$P\delta_{x_0}(A) = \delta_{x_0}(S^{-1}(A)) = \begin{cases} 0 & \text{if } S(x_0) \notin A, \\ 1 & \text{if } S(x_0) \in A. \end{cases}$$

Thus, $P\delta_{x_0} = \delta_{S(x_0)}$, and by induction

$$P^n\delta_{x_0} = \delta_{S^n(x_0)}.$$

This shows that the iterates of the Markov operator (12.4.1) can produce a trajectory of a transformation $S$. To obtain this trajectory it is sufficient to start from a $\delta$-Dirac measure.
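The identity $P^n\delta_{x_0} = \delta_{S^n(x_0)}$ means that pushing a Dirac measure forward simply traces the trajectory of $S$. A minimal sketch — the logistic map here is an arbitrary illustrative choice:

```python
def S(x):
    # an illustrative transformation (the logistic map)
    return 4.0 * x * (1.0 - x)

def push_dirac(x0, n):
    # P^n delta_{x0} = delta_{S^n(x0)}: applying the Frobenius-Perron
    # operator n times to a Dirac measure just moves its support point
    # along the trajectory of S.
    x = x0
    for _ in range(n):
        x = S(x)
    return x

traj = [push_dirac(0.2, n) for n in range(4)]
print(traj)
```

The list of support points is exactly the orbit $x_0, S(x_0), S^2(x_0), \ldots$.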
We next show that $P$ can also transform densities. Consider the special case when $\mu$ is absolutely continuous with density $f$ and $S$ is a nonsingular transformation. Then

$$\mu(A) = \int_A f(x)\,dx,$$

and

$$\mu(S^{-1}(A)) = \int_{S^{-1}(A)} f(x)\,dx = \int_A \bar{P}f(x)\,dx,$$

where $\bar{P}$ denotes the Frobenius–Perron operator acting on densities.
This is a special case of formula (12.3.4) and it shows that the Frobenius–Perron operator $\bar{P}$ on densities corresponds, in the sense of diagram (12.3.3), to the Frobenius–Perron operator $P$ on measures.

This correspondence was obtained under the additional assumption that $S$ is nonsingular. For an arbitrary Borel measurable transformation $S$, the operator $P$ given by (12.4.1) may transform absolutely continuous measures into measures without density. For example, let

$$S(x) = \begin{cases} 0 & \text{for } 0 \le x < 1, \\ x & \text{for } x \ge 1. \end{cases}$$

Then

$$S^{-1}(A)\cap[0,1) = \begin{cases} [0,1) & \text{if } 0 \in A, \\ \emptyset & \text{if } 0 \notin A, \end{cases}$$

so that for a measure $\mu$ supported on $[0,1)$,

$$P\mu(A) = 1_A(0)\,\mu([0,1)),$$

which is a multiple of a $\delta$-Dirac measure even when $\mu$ is absolutely continuous.

We now turn to stochastically perturbed dynamical systems of the form

$$x_{n+1} = T(x_n,\xi_n), \qquad n = 0,1,\ldots, \tag{12.4.2}$$

where $T\colon X\times W \to X$ is a measurable transformation and the following conditions are satisfied:
(ii) The random vectors $\xi_0,\xi_1,\ldots$ have values in $W$ and have the same distribution, that is, the measure

$$\nu(B) = \mathrm{prob}(\xi_n \in B) \qquad\text{for } B \in \mathcal{B}(W)$$

does not depend on $n$.

(iii) The initial random vector $x_0$ has values in $X$ and the vectors $x_0,\xi_0,\xi_1,\ldots$ are independent.

A dynamical system of the form (12.4.2) satisfying conditions (i)–(iii) will be called a regular stochastic dynamical system. We emphasize that in studying (12.4.2) it is assumed that the transformation $T$ and the random vectors $\xi_n$ are given. The initial vector $x_0$ can be arbitrary, but must be such that condition (iii) is satisfied. Observe that in particular if $\xi_0,\xi_1,\ldots$ are independent and $x_0 \in X$ is constant (not random) then the vectors $x_0,\xi_0,\xi_1,\ldots$ are also independent. This can be easily verified using the definition of the independence of random vectors and the fact that the value of $\mathrm{prob}(x_0 \in A)$ is either 0 or 1 for $x_0$ constant.
According to (12.4.2) the random vector $x_n$ is a function of $x_0$ and $\xi_0,\xi_1,\ldots,\xi_{n-1}$. From this and condition (iii) it follows that $x_n$ and $\xi_n$ are independent. Using this fact we will derive a recurrence formula for the measures

$$\mu_n(A) = \mathrm{prob}(x_n \in A), \qquad A \in \mathcal{B}(X), \tag{12.4.3}$$

which statistically describe the behavior of the dynamical system (12.4.2).

Thus, choose a bounded Borel measurable function $h\colon X \to R$ and for some integer $n \ge 0$ consider the random vector $z_{n+1} = h(x_{n+1})$. Observe that

$$\mu_{n+1}(A) = \mathrm{prob}(x_{n+1}^{-1}(A)).$$

Using this equality and the change of variables Theorem 3.2.1, the mathematical expectation $E(z_{n+1})$ can be calculated as follows:

$$E(z_{n+1}) = \int_{\Omega} h(x_{n+1}(\omega))\,\mathrm{prob}(d\omega) = \int_X h(x)\,\mathrm{prob}(x_{n+1}^{-1}(dx)) = \int_X h(x)\,\mu_{n+1}(dx) = \langle h,\mu_{n+1}\rangle. \tag{12.4.4}$$

On the other hand, since $z_{n+1} = h(T(x_n,\xi_n))$ we have

$$E(z_{n+1}) = \int_{\Omega} h(T(x_n(\omega),\xi_n(\omega)))\,\mathrm{prob}(d\omega) = \int_{X\times W} h(T(x,y))\,\mathrm{prob}((x_n,\xi_n)^{-1}(dx\,dy)). \tag{12.4.5}$$
Since $x_n$ and $\xi_n$ are independent, the measure $\mathrm{prob}((x_n,\xi_n)^{-1}(C))$ is the product of the measures

$$\mu_n(A) = \mathrm{prob}(x_n^{-1}(A)) \qquad\text{and}\qquad \nu(B) = \mathrm{prob}(\xi_n^{-1}(B)).$$

Consequently, comparing (12.4.4) and (12.4.5),

$$\langle h,\mu_{n+1}\rangle = \int_X\left\{\int_W h(T(x,y))\,\nu(dy)\right\}\mu_n(dx). \tag{12.4.6}$$

This is the desired recurrence formula, derived under the assumption that $h$ is Borel measurable and bounded. The boundedness of $h$ asserts that all the integrals appearing in the derivation are well defined and finite, since the measures $\mu_n$, $\mu_{n+1}$, $\mathrm{prob},\ldots$, were probabilistic. The same derivation can be repeated for unbounded $h$ as long as all the integrals are well defined. In particular the derivation can be made for an arbitrary measurable nonnegative $h$. However, in this case the integrals on both sides of (12.4.5) could be infinite.

Using (12.4.6) we may calculate the values of $\mu_{n+1}(A)$ for an arbitrary measurable set $A \subset X$. Namely, setting $h = 1_A$ we obtain

$$\mu_{n+1}(A) = \int_X\left\{\int_W 1_A(T(x,y))\,\nu(dy)\right\}\mu_n(dx).$$

The operator

$$P\mu(A) = \int_X\left\{\int_W 1_A(T(x,y))\,\nu(dy)\right\}\mu(dx) \tag{12.4.7}$$

will be called the Foias operator corresponding to the dynamical system (12.4.2). Since $\nu$ is a probabilistic measure, it is obvious that $P$ is a Markov operator. Moreover, from the definition of $P$ it follows that $\mu_n = P^n\mu_0$, where $\{\mu_n\}$ denotes the sequence of distributions (12.4.3) described by the dynamical system (12.4.2).
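The recurrence $\mu_{n+1} = P\mu_n$ can be explored by Monte Carlo: sampling $x_{n+1} = T(x_n,\xi_n)$ propagates an empirical approximation of $\mu_n$. A minimal sketch — the map $T$, the noise law, and the sample sizes are illustrative assumptions, not from the text:

```python
import random

def T(x, xi):
    # an illustrative system: a contracting map with additive noise
    return 0.5 * x + xi

def evolve(n_steps, n_samples=20_000, seed=1):
    # Monte Carlo version of mu_{n+1} = P mu_n: each sample point is
    # moved by T with an independent draw xi ~ U(-1, 1) from nu.
    rng = random.Random(seed)
    xs = [0.0] * n_samples          # mu_0 = delta_0 (x_0 constant)
    for _ in range(n_steps):
        xs = [T(x, rng.uniform(-1.0, 1.0)) for x in xs]
    return xs

xs = evolve(30)
mean = sum(xs) / len(xs)
second = sum(x * x for x in xs) / len(xs)
# stationary limit: mean 0, second moment Var(xi) * sum 4^-k = (1/3)(4/3) = 4/9
print(mean, second)
```

After a few dozen steps the empirical moments settle near the stationary values, illustrating how $P^n\mu_0$ can converge.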
Setting

$$Uh(x) = \int_W h(T(x,y))\,\nu(dy) \qquad\text{for } x \in X,$$
we obtain the dual relation

$$\langle h, P\mu\rangle = \langle Uh,\mu\rangle. \tag{12.4.8}$$

This equality may be verified first for simple functions

$$g_n = \sum_{i=1}^{m}\lambda_i 1_{A_i}, \qquad n = 1,2,\ldots, \tag{12.4.10}$$

for which $\langle g_n,P\mu\rangle = \langle Ug_n,\mu\rangle$ follows directly from (12.4.7), and then extended by a passage to the limit. Iterating (12.4.8) gives

$$U^n h(x) = \int_{W^n} h(T^n(x,y^n))\,\nu^n(dy^n), \qquad n = 1,2,\ldots, \tag{12.4.12}$$

where $y^n = (y_1,\ldots,y_n)$, $W^n = W\times\cdots\times W$ is the Cartesian product of $n$ copies of $W$, $\nu^n = \nu\times\cdots\times\nu$, and $T^n$ is defined recursively by

$$T^1 = T, \qquad T^n(x,y^n) = T(T^{n-1}(x,y^{n-1}),\,y_n). \tag{12.4.11}$$
Finally, observe that

$$E(h(x_n)) = \int_{\Omega} h(x_n(\omega))\,\mathrm{prob}(d\omega) = \int_X h(x)\,\mathrm{prob}(x_n^{-1}(dx)) = \int_X h(x)\,\mu_n(dx),$$

or

$$E(h(x_n)) = \langle h, P^n\mu_0\rangle = \langle U^n h,\mu_0\rangle. \tag{12.4.13}$$

In particular, if the starting point $x_0$ is fixed, corresponding to $\mu_0 = \delta_{x_0}$, we have

$$E(h(x_n)) = U^n h(x_0). \tag{12.4.14}$$

Thus, $U^n h$ gives the mathematical expectation of $h(x_n)$ as a function of the initial position $x_0$.

We close this section by discussing the relationship between the Frobenius–Perron and Foias operators. Having a continuous transformation $S\colon X \to X$ we may formally write

$$T(x,y) = S(x) + 0\cdot y.$$

Then the Foias operator becomes

$$P\mu(A) = \int_X\left\{\int_W 1_A(S(x))\,\nu(dy)\right\}\mu(dx) = \int_X 1_A(S(x))\,\mu(dx) = \mu(S^{-1}(A)),$$

and is identical with (12.4.1). Thus, in the case when $T(x,y)$ does not depend on $y$ the notions of the Foias operator and the Frobenius–Perron operator coincide. Moreover, in this case

$$Uh(x) = \int_W h(S(x))\,\nu(dy) = h(S(x)), \tag{12.4.15}$$

so $U$ reduces to the Koopman operator.
A distribution $\mu_* \in M_1$ is called stationary (or invariant) for the system (12.4.2) if, when $x_0$ is distributed according to $\mu_*$, all the $x_n$ have the same distribution, that is,

$$\mathrm{prob}(x_n \in A) = \mu_*(A) \qquad\text{for } A \in \mathcal{B}(X),\ n = 0,1,\ldots;$$

equivalently, $P\mu_* = \mu_*$.

Theorem 12.5.1. Let $P$ be the Foias operator corresponding to a regular stochastic dynamical system (12.4.2). Assume that for every $\varepsilon > 0$ there is a bounded set $B \in \mathcal{B}(X)$ such that

$$P^n\mu_0(B) \ge 1 - \varepsilon \qquad\text{for } n = 0,1,2,\ldots. \tag{12.5.1}$$

Then $P$ has a stationary distribution.

Proof. Define

$$\zeta_n = \frac{1}{n}\sum_{i=0}^{n-1} P^i\mu_0 = \frac{1}{n}\sum_{i=0}^{n-1}\mu_i \qquad\text{for } n = 1,2,\ldots. \tag{12.5.2}$$
Choose a countable subset $\{h_1,h_2,\ldots\}$ of $C_0(X)$ dense in $C_0(X)$ (see Exercises 12.1 and 12.2). The sequence $\{\langle h_1,\zeta_n\rangle\}$ is bounded since the $\zeta_n$ are probabilistic and $|\langle h_1,\zeta_n\rangle| \le \max|h_1|$. Thus, there is a subsequence $\{\zeta_{1n}\}$ of $\{\zeta_n\}$ such that $\{\langle h_1,\zeta_{1n}\rangle\}$ is convergent. Again, since $\{\langle h_2,\zeta_{1n}\rangle\}$ is bounded we can choose a subsequence $\{\zeta_{2n}\}$ of $\{\zeta_{1n}\}$ such that $\{\langle h_2,\zeta_{2n}\rangle\}$ is convergent. By induction, for every integer $k > 1$ we may construct a sequence $\{\zeta_{kn}\}$ such that all sequences $\{\langle h_j,\zeta_{kn}\rangle\}$ for $j = 1,\ldots,k$ are convergent and $\{\zeta_{kn}\}$ is a subsequence of $\{\zeta_{k-1,n}\}$. Evidently the diagonal sequence $\{\zeta_{nn}\}$ has the property that $\{\langle h_j,\zeta_{nn}\rangle\}$ is convergent for every $j = 1,2,\ldots$.
Now we may prove that $\{\langle h,\zeta_{nn}\rangle\}$ converges for every bounded continuous $h$, the limits defining a measure $\mu_*$ via the Riesz representation theorem. Given $h$ and $\varepsilon > 0$, define $h_\varepsilon = hg_\varepsilon$, where $g_\varepsilon \in C_0$ is such that $0 \le g_\varepsilon \le 1$ and $g_\varepsilon(x) = 1$ for $x \in B$, with $B$ the bounded set from (12.5.1). Then $h_\varepsilon \in C_0$ and, by (12.5.1), $|\langle h,\zeta_{nn}\rangle - \langle h_\varepsilon,\zeta_{nn}\rangle| \le \varepsilon\max|h|$, so that

$$\lim_{n\to\infty}\langle h,\zeta_{nn}\rangle = \langle h,\mu_*\rangle.$$

Setting $h = 1_X$ we obtain $\mu_*(X) = 1$, so $\mu_*$ is probabilistic.

Now we are ready to prove that $\mu_*$ is invariant. The sequence $\{\zeta_{nn}\}$, as a subsequence of $\{\zeta_n\}$, may be written in the form

$$\zeta_{nn} = \frac{1}{k_n}\sum_{i=0}^{k_n-1} P^i\mu_0,$$

and, consequently,

$$|\langle Uh,\zeta_{nn}\rangle - \langle h,\zeta_{nn}\rangle| = |\langle h,P\zeta_{nn}\rangle - \langle h,\zeta_{nn}\rangle| \le \frac{2}{k_n}\sup|h|.$$

Passing to the limit $n \to \infty$ gives

$$\langle Uh,\mu_*\rangle - \langle h,\mu_*\rangle = 0, \qquad\text{or}\qquad \langle h,P\mu_*\rangle = \langle h,\mu_*\rangle.$$

The last equality holds for every bounded continuous $h$ and in particular for $h \in C_0$. Thus, by the Riesz representation theorem 12.1.1, $P\mu_* = \mu_*$. The proof is completed. □
Condition (12.5.1) is not only sufficient for the existence of an invariant distribution $\mu_*$ but also necessary. To see this, assume that $\mu_*$ exists. Let $\{B_k\}$ be an increasing sequence of bounded measurable sets such that $\bigcup_k B_k = X$. Then

$$\lim_{k\to\infty}\mu_*(B_k) = \mu_*(X) = 1.$$

Thus, for every $\varepsilon > 0$ there is a bounded set $B_k$ such that $\mu_*(B_k) \ge 1 - \varepsilon$. Setting $\mu_0 = \mu_*$ we have $\mu_n = \mu_*$ and, consequently, $\mu_n(B_k) \ge 1 - \varepsilon$ for $n = 0,1,\ldots$.

Proposition 12.5.1. Let $P$ be the Foias operator corresponding to a regular stochastic dynamical system (12.4.2), and let $V\colon X \to R$ be a nonnegative function such that each set $B_a = \{x\colon V(x) \le a\}$, $a \ge 0$, is bounded and

$$\sup_n E(V(x_n)) < \infty. \tag{12.5.3}$$

Then $P$ has a stationary distribution.

Proof. Consider the sets $B_a$ for $a \ge 0$.
By the Chebyshev inequality,

$$\mu_n(X\setminus B_a) \le \frac{E(V(x_n))}{a} \le \frac{K}{a} \qquad\text{for } n = 0,1,\ldots,$$

or $\mu_n(B_a) \ge 1 - K/a$, where $K = \sup_n E(V(x_n))$. Thus, for every $\varepsilon > 0$ inequality (12.5.1) is satisfied with $B = B_a$ and $a = K/\varepsilon$. It follows from Theorem 12.5.1 that $P$ has an invariant distribution and the proof is complete. □

It is easy to formulate a sufficient condition for (12.5.3) related explicitly to properties of the function $T$ of (12.4.2) and the distribution $\nu$. Thus we have the following.
Proposition 12.5.2. Let $P$ be the Foias operator corresponding to a regular stochastic dynamical system (12.4.2). Assume that there exists a Liapunov function $V$ and nonnegative constants $\alpha,\beta$, $\alpha < 1$, such that

$$\int_W V(T(x,y))\,\nu(dy) \le \alpha V(x) + \beta \qquad\text{for } x \in X. \tag{12.5.4}$$

Then $P$ has a stationary distribution.

Proof. Iterating (12.5.4) gives

$$U^n V(x) \le \alpha^n V(x) + \beta(1 + \alpha + \cdots + \alpha^{n-1}) \le V(x) + \frac{\beta}{1-\alpha}.$$

Fix an $x_0 \in X$ and define $\mu_0 = \delta_{x_0}$. By (12.4.12) we have

$$E(V(x_n)) = U^n V(x_0) = \int_{W^n} V(T^n(x_0,y^n))\,\nu^n(dy^n) \le V(x_0) + \frac{\beta}{1-\alpha},$$

so condition (12.5.3) is satisfied and Proposition 12.5.1 applies. □
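The bound $E(V(x_n)) \le V(x_0) + \beta/(1-\alpha)$ can be checked by simulation. A hedged sketch with $V(x) = |x|$ and an illustrative system satisfying (12.5.4) with $\alpha = \beta = 0.5$ — my choice, not a system from the text:

```python
import random

def step(x, rng):
    # T(x, xi) = x/2 + xi with xi ~ U(-1, 1), so with V(x) = |x|:
    # E(V(T(x, xi))) <= 0.5 V(x) + 0.5, i.e. (12.5.4) with alpha = beta = 0.5
    return 0.5 * x + rng.uniform(-1.0, 1.0)

def mean_V(x0, n, samples=20_000, seed=7):
    # Monte Carlo estimate of E(V(x_n)) for mu_0 = delta_{x0}
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        x = x0
        for _ in range(n):
            x = step(x, rng)
        total += abs(x)
    return total / samples

x0 = 3.0
bound = abs(x0) + 0.5 / (1.0 - 0.5)   # V(x0) + beta/(1 - alpha) = 4
print(mean_V(x0, 25), bound)
```

The simulated expectation stays well below the bound; in fact it settles near the much smaller stationary value of $E|x_n|$.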
we will introduce two types of asymptotic stability. We will start from the following: a sequence $\{P^n\}$ is called weakly asymptotically stable if there is a distribution $\mu_* \in M_1$ such that

$$\lim_{n\to\infty}\langle h,P^n\mu\rangle = \langle h,\mu_*\rangle \qquad\text{for } h \in C_0(X),\ \mu \in M_1. \tag{12.6.1}$$

In the special case that $P$ is a Foias operator corresponding to a stochastic dynamical system (12.4.2) and $\{P^n\}$ is weakly asymptotically stable, we say that the system is weakly asymptotically stable.

It may be shown that the uniqueness of the stationary distribution $\mu_*$ is a consequence of the condition (12.6.1). To show this, let $\bar{\mu} \in M_1$ be another stationary distribution. Then $P^n\bar{\mu} = \bar{\mu}$ and from (12.6.1) applied to $\mu = \bar{\mu}$ we obtain

$$\langle h,\bar{\mu}\rangle = \langle h,\mu_*\rangle \qquad\text{for } h \in C_0(X).$$

By the Riesz representation theorem 12.1.1, this gives $\bar{\mu} = \mu_*$. On the other hand, condition (12.6.1) does not imply that $\mu_*$ is stationary for an arbitrary Markov operator.
Example 12.6.1. Let $X = [0,1]$. Consider the Frobenius–Perron operator $P$ on measures and the Koopman operator $U$ corresponding to the transformation

$$S(x) = \begin{cases} cx, & x > 0, \\ [\ldots], & x = 0, \end{cases}$$

where $c \in [0,1]$ is a constant. Now

$$Uh(x) = \begin{cases} h(cx), & x > 0, \\ [\ldots], & x = 0. \end{cases}$$

Thus, for every $\mu \in M_1$ condition (12.6.1) holds, yet the limiting distribution is not stationary.

A weaker condition than (12.6.1) is

$$\lim_{n\to\infty}\bigl[\langle h,P^n\mu\rangle - \langle h,P^n\bar{\mu}\rangle\bigr] = 0 \qquad\text{for } h \in C_0(X),\ \mu,\bar{\mu} \in M_1, \tag{12.6.2}$$

and (12.6.1) implies (12.6.2). Alternately, if (12.6.2) holds and $\mu_*$ is stationary, then substituting $\bar{\mu} = \mu_*$ in (12.6.2) we obtain (12.6.1).

The main advantage of condition (12.6.2) in comparison with (12.6.1) is that in proving the convergence we may restrict the verification to subsets of $C_0$ and $M_1$.
More precisely, given $\mu \in M_1$ and $\varepsilon > 0$, choose a bounded set $B$ with $\mu(B) \ge 1 - \varepsilon$ and define the conditional measure

$$\bar{\mu}(A) = \frac{\mu(A\cap B)}{\mu(B)} \qquad\text{for } A \in \mathcal{B}(X).$$

Then $\bar{\mu}$ is a probabilistic measure with bounded support and

$$|\bar{\mu}(A) - \mu(A)| \le 4\varepsilon \qquad\text{for } A \in \mathcal{B}(X).$$

Now let a function $g \in C_*$ be given. Combining this estimate with the convergence for measures with bounded supports, for $g \in C_*$, $h \in C_0$, and using the density of $C_*$ in $C_0$, condition (12.6.2) follows for all $h \in C_0$. Thus the proof is complete.
Now we may establish the main result of this section, which is an effective criterion for the weak asymptotic stability of the stochastic system (12.4.2).

Theorem 12.6.1. Let $P$ be the Foias operator corresponding to the regular stochastic dynamical system (12.4.2). Assume that there are constants $\alpha < 1$ and $\beta \ge 0$ such that

$$E(|T(x,\xi_n) - T(z,\xi_n)|) \le \alpha|x - z| \qquad\text{for } x,z \in X \tag{12.6.3}$$

and

$$E(|T(0,\xi_n)|) \le \beta. \tag{12.6.4}$$

Then the system (12.4.2) is weakly asymptotically stable. In terms of the measure $\nu$, conditions (12.6.3) and (12.6.4) can be written as

$$\int_W |T(x,y) - T(z,y)|\,\nu(dy) \le \alpha|x - z| \tag{12.6.5}$$

and

$$\int_W |T(0,y)|\,\nu(dy) \le \beta. \tag{12.6.6}$$
Proof. By induction from (12.6.5),

$$\int_{W^n}|T^n(x,y^n) - T^n(z,y^n)|\,\nu^n(dy^n) = \int_{W^{n-1}}\left\{\int_W |T(T^{n-1}(x,y^{n-1}),y_n) - T(T^{n-1}(z,y^{n-1}),y_n)|\,\nu(dy_n)\right\}\nu^{n-1}(dy^{n-1})$$
$$\le \alpha\int_{W^{n-1}}|T^{n-1}(x,y^{n-1}) - T^{n-1}(z,y^{n-1})|\,\nu^{n-1}(dy^{n-1}) \le \cdots \le \alpha^n|x - z|. \tag{12.6.7}$$

Now consider the subset $C_*$ of $C_0$ which consists of functions $h$ satisfying the Lipschitz condition

$$|h(x) - h(z)| \le k|x - z|.$$

For $\mu,\bar{\mu} \in M_1$ with supports contained in a bounded set $B$ (so that $\mu(B) = \bar{\mu}(B) = 1$) we must estimate

$$\left|\int_X U^n h(x)\,\mu(dx) - \int_X U^n h(x)\,\bar{\mu}(dx)\right|. \tag{12.6.8}$$

Since the measures $\mu$ and $\bar{\mu}$ are probabilistic there exist points $q_n, r_n \in B$ such that

$$\left|\int_X U^n h(x)\,\mu(dx) - \int_X U^n h(x)\,\bar{\mu}(dx)\right| \le |U^n h(q_n) - U^n h(r_n)| \le k\int_{W^n}|T^n(q_n,y^n) - T^n(r_n,y^n)|\,\nu^n(dy^n) \le k\alpha^n d,$$

where $d = \sup\{|x - z|\colon x,z \in B\}$. Since $kd\alpha^n \to 0$ as $n \to \infty$, this implies (12.6.2) for arbitrary $h \in C_*$ and $\mu,\bar{\mu} \in M_1$ with bounded supports. According to Propositions 12.6.1 and 12.6.2 the proof of the weak asymptotic stability is complete. □
Remark 12.6.1. When T(x, y) = S(x) does not depend on y, condition
(12.6.5) is automatically satisfied with β = |S(0)|, and inequality (12.6.4)
reduces to

    |S(x) − S(z)| ≤ α|x − z|   for x, z ∈ X.
In this case the statement of Theorem 12.6.1 is close to the Banach contraction principle. However, it still gives something new. Namely, the classical
Banach theorem shows that all the trajectories {Sⁿ(x₀)} converge to the
unique fixed point x_* = S(x_*). From Theorem 12.6.1 it follows also that the
measures μ(S^{−n}(A)) (with μ ∈ M₁) converge to δ_{x_*}, which is the unique
stationary distribution. □
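This convergence of transported measures to δ_{x_*} is easy to observe numerically; a minimal sketch, in which the contraction S(x) = x/2 + 1 and its fixed point x_* = 2 are chosen purely for illustration:

```python
import random

def S(x):
    # a contraction with constant 1/2; unique fixed point x* = 2
    return 0.5 * x + 1.0

# push an arbitrary initial distribution (uniform on [-10, 10]) through S
random.seed(0)
samples = [random.uniform(-10.0, 10.0) for _ in range(10_000)]
for _ in range(30):
    samples = [S(x) for x in samples]

# after 30 iterations every sample is within (1/2)**30 * 12 of x* = 2,
# so the push-forward measures concentrate at the point mass delta_{x*}
print(max(abs(x - 2.0) for x in samples))
```

The same experiment works for any contraction; only the fixed point changes.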
In this section we consider the special case of (12.4.2) in which the perturbation is additive, that is, the stochastic dynamical system

    x_{n+1} = S(x_n) + ξ_n,   n = 0, 1, ...,   (12.7.3)

where S: X → X is a continuous transformation and the ξ_n are independent
random vectors with the common distribution

    prob(ξ_n ∈ B) = ν(B)   for B ∈ B(X),   (12.7.1)

and where we assume that x ∈ X, y ∈ W implies x + y ∈ X. In this case
T(x, y) = S(x) + y, and the Foias and Koopman operators corresponding
to (12.7.3) take the form

    Pμ(A) = ∫_X { ∫_W 1_A(S(x) + y) ν(dy) } μ(dx)   for A ∈ B(X)

and

    Uh(x) = ∫_W h(S(x) + y) ν(dy)   for x ∈ X.   (12.7.5)

Consequently,

    ⟨Uⁿh, μ⟩ = ⟨h, Pⁿμ⟩.   (12.7.6)
Proposition 12.7.1. If in the regular stochastic dynamical system (12.7.3)
the transformation S and perturbations {ξ_n} satisfy the conditions

    |S(x)| ≤ a|x| + γ   for x ∈ X   (12.7.7)

and

    ∫_W |y| ν(dy) ≤ k,   (12.7.8)

where a, γ, k are nonnegative constants with a < 1, then (12.7.3) has a
stationary distribution. Moreover, if (12.7.7) is replaced by the stronger
condition

    |S(x) − S(z)| ≤ a|x − z|   for x, z ∈ X,   (12.7.9)

then (12.7.3) is weakly asymptotically stable.
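The proposition can be illustrated numerically for a linear transformation. A sketch, assuming S(x) = x/2 (so a = 1/2, γ = 0) and ξ_n uniform on [0, 1] (so k = 1); trajectories started from an arbitrary distribution settle down to a stationary one whose mean is E(ξ_n)/(1 − a) = 1:

```python
import random

random.seed(1)
a = 0.5  # Lipschitz constant of S(x) = a*x, with a < 1

def step(x):
    # one step of the stochastic system x_{n+1} = S(x_n) + xi_n
    return a * x + random.random()

# run many independent trajectories to sample the stationary distribution
finals = []
for _ in range(20_000):
    x = random.uniform(-50.0, 50.0)   # arbitrary initial condition
    for _ in range(60):
        x = step(x)
    finals.append(x)

mean = sum(finals) / len(finals)
print(mean)  # close to E(xi)/(1 - a) = 1
```

The influence of the initial condition decays like aⁿ, so after 60 steps only the stationary behavior remains.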
Assume now that μ ∈ M₁ is absolutely continuous with density f. Then
we obtain

    Pμ(A) = ∫_X { ∫_W 1_A(S(x) + y) ν(dy) } f(x) dx.

For fixed y ∈ W the function 1_A(S(x) + y) is the result of the application of
the Koopman operator to 1_A(x + y). Denoting by P_S the Frobenius-Perron
operator (acting on densities) corresponding to S, we may rewrite the last
integral to obtain

    Pμ(A) = ∫_W { ∫_{X+y} 1_A(x) P_S f(x − y) dx } ν(dy).
Inside the braces the integration runs over all x such that x ∈ A and
x ∈ X + y, or, equivalently, x ∈ A and x − y ∈ X. Thus,
    Pμ(A) = ∫_W { ∫_A 1_X(x − y) P_S f(x − y) dx } ν(dy)
          = ∫_A { ∫_W 1_X(x − y) P_S f(x − y) ν(dy) } dx.   (12.7.10)
The function

    q(x) = ∫_W 1_X(x − y) P_S f(x − y) ν(dy)   (12.7.11)

appearing in (12.7.10) is therefore the density of Pμ. For an arbitrary
μ ∈ M₁ denote by μ_a its absolutely continuous part and by μ_s its singular
part, so that

    μ = μ_a + μ_s   for μ ∈ M₁.   (12.7.12)

Since (Pμ)_a is the maximal absolutely continuous measure which does not
exceed Pμ, we have (Pμ)_a ≥ Pμ_a. In particular, if f is the density of μ_a,
then the density of Pμ_a is

    Pf(x) = ∫_W 1_X(x − y) P_S f(x − y) ν(dy).   (12.7.13)
If S and ν are both nonsingular, we can say much more about the asymptotic behavior of (Pⁿμ)_a. This behavior is described as follows.

Theorem 12.7.1. Let P be the Foias operator corresponding to the regular
stochastic system (12.7.3). If the transformation S and the distribution ν
of the random vectors {ξ_n} are nonsingular, then

    lim_{n→∞} (Pⁿμ)_a(X) = 1   for μ ∈ M₁.   (12.7.14)
Proof. Let μ ∈ M₁ and let μ_s be its singular part. Denote by g the density
of the absolutely continuous part of ν. We have

    Pμ_s(A) ≥ ∫_X { ∫_{A∩(W+S(x))} g(y − S(x)) dy } μ_s(dx).

The integration in the braces of the last integral runs over all y such that
y ∈ A and y ∈ W + S(x), or equivalently all y ∈ A and y − S(x) ∈ W.
Thus, the last inequality may be rewritten in the form

    Pμ_s(A) ≥ ∫_X { ∫_A 1_W(y − S(x)) g(y − S(x)) dy } μ_s(dx)
            = ∫_A { ∫_X 1_W(y − S(x)) g(y − S(x)) μ_s(dx) } dy.

Setting

    r(y) = ∫_X 1_W(y − S(x)) g(y − S(x)) μ_s(dx),

we obtain Pμ_s(A) ≥ σ(A), where σ(A) = ∫_A r(y) dy.
The measure Pμ_a + σ is absolutely continuous and consequently the absolutely continuous part of Pμ satisfies (Pμ)_a ≥ Pμ_a + σ.
In particular,

    (Pμ)_a(X) ≥ Pμ_a(X) + σ(X) = μ_a(X) + σ(X).   (12.7.15)

Further,

    σ(X) = ∫_X { ∫_{W∩(X−S(x))} g(y) dy } μ_s(dx),

and, by the nonsingularity of S and ν, this quantity is strictly positive
whenever μ_s(X) > 0. From (12.7.15) it follows that the singular part of
Pⁿμ shrinks to zero as n → ∞, which is (12.7.14). □
Proof. Step I. Choose ε > 0. Given a density f₀, there are a bounded set B
and an integer n₀ = n₀(f₀) such that

    ∫_B Pⁿf₀(x) dx = μ_n(B) ≥ 1 − ε,   that is,   ∫_{X\B} Pⁿf₀(x) dx ≤ ε,   for n ≥ n₀.

Now let F ⊂ X be a Borel set. Then

    ∫_F Pⁿf₀(x) dx = μ_n(F) = Pμ_{n−1}(F),   (12.7.16)

and an evaluation of Pμ_{n−1}(F) analogous to that in the proof of Theorem
12.7.1, using the density g_a of the absolutely continuous part of ν, shows
that ∫_F Pⁿf₀(x) dx is small whenever

    sup_{x∈X} ∫_{W∩(F−x)} g_a(y) dy ≤ ε,

that is, whenever the Lebesgue measure of F is sufficiently small. This
shows that P is constrictive.

Step II. By the spectral decomposition theorem,

    Pⁿf = Σ_{i=1}^{r} λ_i(f) g_{α^n(i)} + Q_n f,   n = 0, 1, ...,   (12.7.17)

where the g_{α^n(i)} are densities with mutually disjoint supports, α is a
permutation of {1, ..., r}, and ‖Q_n f‖ → 0.
The last equality implies that g_i is the density of μ_*. Thus, there is only one
term in the summation portion of (12.7.17) and g₁ is the invariant density.
Step III. Consider the sequence {Pⁿμ₀} with an arbitrary μ₀ ∈ M₁.
Choose an ε > 0. According to Theorem 12.7.1 there exists an integer k
such that

    (P^k μ₀)_a(X) = μ_{ka}(X) ≥ 1 − ε.

Define θ = μ_{ka}(X), so that

    μ_k = θ(θ^{−1}μ_{ka}) + μ_{ks},   (12.7.19)

where μ_{ks} is the singular part of μ_k and
where ‖·‖ denotes the distance defined by equation (12.2.7). The last two
terms coming from (12.7.19) are easy to evaluate since

    ‖Pⁿμ_{ks}‖ ≤ μ_{ks}(X) ≤ ε   (12.7.20)

and

    (1 − θ)‖Pⁿ(θ^{−1}μ_{ka})‖ = 1 − θ ≤ ε.   (12.7.21)

The measure θ^{−1}μ_{ka} is absolutely continuous and normalized. Denote its
density by f_a. Pⁿ(θ^{−1}μ_{ka}) clearly has density Pⁿf_a, and from equation
(12.2.11)

    ‖Pⁿ(θ^{−1}μ_{ka}) − P^{n+k}(θ^{−1}μ_{ka})‖ = ‖Pⁿf_a − P^{n+k}f_a‖_{L¹}.

Since {Pⁿ} is asymptotically stable the right-hand side of this equality converges to zero as n → ∞. From this convergence and inequalities (12.7.20)
and (12.7.21) applied to (12.7.19), it follows that

    lim_{n→∞} ‖μ_{n+k} − μ_n‖ = 0.
Consider N continuous transformations S_i: X → X, i = 1, ..., N, and a
probabilistic vector (p₁, ..., p_N),

    p_i ≥ 0,   Σ_{i=1}^N p_i = 1.

Let ξ₀, ξ₁, ... be a sequence of independent random variables such that

    prob(ξ_n = i) = p_i   for i = 1, ..., N,  n = 0, 1, ....

Consider the stochastic dynamical system

    x_{n+1} = S_{ξ_n}(x_n),   n = 0, 1, ....   (12.8.1)

It is clear that in this case T(x, y) = S_y(x) and W = {1, ..., N}. The
system (12.8.1) is called (Barnsley, 1988) an iterated function system
(IFS).

Using the general equations (12.4.7) and (12.4.8) it is easy to find explicit
formulas for the operators U and P corresponding to an iterated function
system. Namely,

    Uh(x) = Σ_{i=1}^N p_i h(S_i(x))   for x ∈ X.   (12.8.2)

Further,

    Pμ(A) = Σ_{i=1}^N p_i μ(S_i^{−1}(A))   for A ∈ B(X).   (12.8.3)
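Both operators are easy to realize on a computer for a finite IFS. A sketch, using the illustrative dyadic system S₁(x) = x/2, S₂(x) = x/2 + 1/2 with p₁ = p₂ = 1/2, which also checks the duality ⟨Uh, μ⟩ = ⟨h, Pμ⟩ on a point measure:

```python
# illustrative dyadic IFS: S1(x) = x/2, S2(x) = x/2 + 1/2, p = (1/2, 1/2)
maps = [lambda x: 0.5 * x, lambda x: 0.5 * x + 0.5]
probs = [0.5, 0.5]

def U(h):
    # Koopman operator (12.8.2): (Uh)(x) = sum_i p_i h(S_i(x))
    return lambda x: sum(p * h(S(x)) for p, S in zip(probs, maps))

def P(points):
    # Foias operator (12.8.3) on a discrete measure {point: mass}:
    # a point mass at x is sent to sum_i p_i * (point mass at S_i(x))
    out = {}
    for x, m in points.items():
        for p, S in zip(probs, maps):
            y = S(x)
            out[y] = out.get(y, 0.0) + p * m
    return out

h = lambda x: x * x
mu = {0.0: 1.0}                                 # the measure delta_0
lhs = U(h)(0.0)                                 # <Uh, delta_0>
rhs = sum(m * h(x) for x, m in P(mu).items())   # <h, P delta_0>
print(lhs, rhs)  # both equal 0.125
```

Iterating `P` on a point measure produces the discrete measures μ_n whose supports are the sets A_n studied below.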
Assume that each transformation S_i satisfies a Lipschitz condition

    |S_i(x) − S_i(z)| ≤ L_i|x − z|   for x, z ∈ X.   (12.8.4)

Proposition 12.8.1. If

    Σ_{i=1}^N p_i L_i < 1,   (12.8.5)

then the iterated function system (12.8.1) is weakly asymptotically stable.

Proof. The assumptions of Theorem 12.6.1 are satisfied, since

    E(|S_{ξ_n}(x) − S_{ξ_n}(z)|) = Σ_{i=1}^N p_i |S_i(x) − S_i(z)| ≤ |x − z| Σ_{i=1}^N p_i L_i

and

    E(|S_{ξ_n}(0)|) = Σ_{i=1}^N p_i |S_i(0)|. □
An iterated function system satisfying

    L = max_i L_i < 1   and   p_i > 0,   i = 1, ..., N,   (12.8.6)

is called hyperbolic. Our goal now is to study the structure of the set

    A_* = supp μ_*,   (12.8.7)

where μ_* is the unique stationary distribution of the system. For this
purpose define, for every set A ⊂ X,

    F(A) = ∪_{i=1}^N S_i(A).   (12.8.8)

We first show how the supports of the iterates μ_n = Pⁿμ₀ propagate: if
μ₀ ∈ M₁ has compact support, then

    supp μ_{n+1} = F(supp μ_n)   for n = 0, 1, ....   (12.8.9)
Proof. It is clearly sufficient to verify that supp μ₁ = F(supp μ₀), since the
situation repeats. Let x ∈ F(supp μ₀) and ε > 0 be fixed. Then x = S_j(z)
for some integer j and z ∈ supp μ₀. Consequently, for the ball B_r(z) we
have μ₀(B_r(z)) > 0 for every r > 0. Further, due to the continuity of S_j
there is an r > 0 such that S_j(B_r(z)) ⊂ B_ε(x). This gives

    μ₁(B_ε(x)) = Σ_{i=1}^N p_i μ₀(S_i^{−1}(B_ε(x))) ≥ p_j μ₀(B_r(z)) > 0.

Since ε > 0 was arbitrary this shows that x ∈ supp μ₁. We have proved the
inclusion F(supp μ₀) ⊂ supp μ₁.

Now, suppose that this inclusion is proper and there is a point x ∈
supp μ₁ such that x ∉ F(supp μ₀). Due to the compactness of F(supp μ₀)
there must exist an ε > 0 such that the ball B_ε(x) is disjoint from F(supp μ₀).
This implies

    μ₀(S_i^{−1}(B_ε(x))) = 0   for i = 1, ..., N,

or

    μ₁(B_ε(x)) = Σ_{i=1}^N p_i μ₀(S_i^{−1}(B_ε(x))) = 0,

which contradicts x ∈ supp μ₁. □
436
A.
= supp I',
(12.8.10)
which is called the attractor of the iterated function system, can be obtained as the limit of the sequence of sets
An = supp J.'n
= F(Ao).
{12.8.11)
To state this fact precisely we introduce the notion of the Hausdorff distance
between two sets.
Definition 12.8.2. Let A₁, A₂ ⊂ R^d be nonempty compact sets and let
r > 0 be a real number. We say that A₁ approximates A₂ with accuracy r
if, for every point x₁ ∈ A₁, there is a point x₂ ∈ A₂ such that |x₁ − x₂| ≤ r,
and for every x₂ ∈ A₂ there is an x₁ ∈ A₁ such that the same inequality
holds. The infimum of all r such that A₁ approximates A₂ with accuracy
r is called the Hausdorff distance between A₁ and A₂ and is denoted by
dist(A₁, A₂).

We say that a sequence {A_n} of compact sets converges to a compact
set A if

    lim_{n→∞} dist(A_n, A) = 0.

From the compactness of A it easily follows that the limit of the sequence
{A_n}, if it exists, must be unique. This limit will be denoted by lim_{n→∞} A_n.
Example 12.8.1. Let X = R, A = [0, 1], and

    A_n = { 1/2n, 2/2n, ..., (2n − 1)/2n }   for n = 1, 2, ....

Every point of A lies within 1/2n of some point of A_n. Thus, A_n approximates A with accuracy 1/2n. Moreover, for x = 0 ∈ A
the nearest point in A_n is 1/2n. Consequently,

    dist(A_n, A) = 1/2n.
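For finite point sets the Hausdorff distance of Definition 12.8.2 can be computed directly; a minimal sketch, with the sets taken from Example 12.8.1 and the interval A replaced by a fine grid on [0, 1]:

```python
def hausdorff(A, B):
    # Hausdorff distance between two finite nonempty subsets of R:
    # the larger of the two one-sided deviations sup_a dist(a, B), sup_b dist(b, A)
    d_ab = max(min(abs(a - b) for b in B) for a in A)
    d_ba = max(min(abs(a - b) for a in A) for b in B)
    return max(d_ab, d_ba)

# A_3 = {1/6, ..., 5/6} from Example 12.8.1 against a fine grid on A = [0, 1]
n = 3
An = [k / (2 * n) for k in range(1, 2 * n)]
A = [k / 1000 for k in range(1001)]
print(hausdorff(An, A))  # approximately 1/(2n) = 1/6
```

The quadratic cost of this double loop is acceptable for the small sets used here; for large point clouds one would use spatial indexing instead.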
This example shows that sets which are close in the sense of Hausdorff
distance can be quite different from a topological point of view. In fact, each
A_n is a finite set, whereas A is an interval.
We first verify that A_* = supp μ_* is bounded. Fix a point x₀ ∈ X and set

    c = max{ |S_i(x₀) − x₀| : i = 1, ..., N }.

Then, by induction,

    |S_{i₁} ∘ ⋯ ∘ S_{i_n}(x₀) − x₀| ≤ c(1 + L + ⋯ + L^{n−1}) ≤ c/(1 − L) = r   (12.8.12)

for every sequence of integers i₁, ..., i_n with 1 ≤ i_k ≤ N. Choose an arbitrary point z ∈ X such that

    |z − x₀| ≥ r + 1.   (12.8.13)

We are going to prove that z ∉ supp μ_*. Fix an ε ∈ (0, 1). From inequality
(12.2.6) and equation (12.8.3) we obtain

    μ_*(B_ε(z)) ≤ liminf_{n→∞} μ_n(B_ε(z))
               = liminf_{n→∞} Σ p_{i₁} ⋯ p_{i_n} μ₀(S_{i₁}^{−1} ∘ ⋯ ∘ S_{i_n}^{−1}(B_ε(z))),   (12.8.14)

where the summation extends over all sequences i₁, ..., i_n. Choosing μ₀ =
δ_{x₀}, each term of the sum vanishes by (12.8.12) and (12.8.13), and consequently

    μ_*(B_ε(z)) = 0.
Theorem 12.8.1. Let (12.8.1) be a hyperbolic iterated function system.
Then

    lim_{n→∞} Fⁿ(A₀) = A_* = supp μ_*   (12.8.15)

whenever A₀ ⊂ X is a nonempty compact set.

Proof. We divide the proof into two steps. First we show that the limit
of {Fⁿ(A₀)} does not depend on the particular choice of A₀, and then we
will prove that this limit is equal to supp μ_*.

Step I. Consider two initial compact sets A₀, Z₀ ⊂ X and the corresponding sequences

    A_{n+1} = F(A_n),   Z_{n+1} = F(Z_n),   n = 0, 1, ....

We are going to show that dist(A_n, Z_n) converges to zero. Let r > 0 be
sufficiently large so that A₀ and Z₀ are contained in a ball of radius r. Now fix
an integer n and a point x ∈ A_n. According to the definition of F there
exists a sequence of integers k₁, ..., k_n and a point u ∈ A₀ such that
x = S_{k₁} ∘ ⋯ ∘ S_{k_n}(u).
Substituting this into (12.8.16) we obtain (12.8.15) and the proof is complete.
It is worth noting that for systems which are not hyperbolic, equality
(12.8.15) may be violated even if condition (12.8.5) is satisfied. In general
the set lim_{n→∞} Fⁿ(A₀) is larger than A_* = supp μ_*.
Example 12.8.2. Let X = R, S₁(x) = x and S₂(x) = 0 for x ∈ R.
Evidently for every probabilistic vector (p₁, p₂) with p₁ < 1 the condition
(12.8.5) is satisfied. Thus the system is weakly asymptotically stable and
there exists a unique stationary distribution μ_*. It is easy to guess that μ_* =
δ₀. In fact, according to (12.8.3),

    Pδ₀(A) = p₁ δ₀(S₁^{−1}(A)) + p₂ δ₀(S₂^{−1}(A)) = δ₀(A),

since S₁^{−1}(A) = A and

    S₂^{−1}(A) = R if 0 ∈ A,   S₂^{−1}(A) = ∅ if 0 ∉ A.

Therefore supp μ_* = {0}, whereas for A₀ = [0, 1] we have

    Fⁿ(A₀) = [0, 1],   n = 0, 1, .... □
FIGURE 12.8.1.

since, by the first part of the proof, A_n ⊃ A_{n+k(j)}. The set A_n is closed, and
the conditions x_j ∈ A_n, x_j → x imply x ∈ A_n. This verifies the inclusion
A_n ⊃ A_* and completes the proof. □
Our first example of the construction of an attractor deals with a one-dimensional system given by two linear transformations. Despite the simplicity of the system the attractor is quite complicated.
Example 12.8.3. Let X = R and

    S₁(x) = x/3   and   S₂(x) = x/3 + 2/3   for x ∈ R.

Choose A₀ = [0, 1]. Then

    A₁ = F(A₀) = S₁(A₀) ∪ S₂(A₀) = [0, 1/3] ∪ [2/3, 1],
    A₂ = F(A₁) = [0, 1/9] ∪ [2/9, 1/3] ∪ [2/3, 7/9] ∪ [8/9, 1],

and, in general, A_n is
the union of 2ⁿ intervals of length 1/3ⁿ. The Borel measure of A_n is (2/3)ⁿ and
converges to zero as n → ∞. The limiting set A_* has Borel measure zero
since it is contained in all sets A_n. This is the famous Cantor set, the
source of many examples in analysis and topology. □
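The sets A_n = Fⁿ([0, 1]) can be generated mechanically by applying S₁ and S₂ to each interval of A_{n−1}; a sketch using exact rational arithmetic:

```python
from fractions import Fraction

def F(intervals):
    # one application of F(A) = S1(A) ∪ S2(A), S1(x) = x/3, S2(x) = x/3 + 2/3,
    # acting on a list of pairwise disjoint closed intervals (a, b)
    third = Fraction(1, 3)
    out = []
    for (a, b) in intervals:
        out.append((a * third, b * third))                          # S1 image
        out.append((a * third + 2 * third, b * third + 2 * third))  # S2 image
    return sorted(out)

A = [(Fraction(0), Fraction(1))]
for n in range(1, 6):
    A = F(A)
    total = sum(b - a for a, b in A)
    print(n, len(A), total)   # 2**n intervals, total length (2/3)**n
```

With exact fractions the interval endpoints (and hence the measure (2/3)ⁿ) come out without rounding error.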
Example 12.8.4. Let X = R² and

    S_i(x) = (1/2)x + b_i,   i = 1, 2, 3,

where

    b₁ = (0, 0),   b₂ = (1/2, 0),   b₃ = (1/4, 1/2).

Choose A₀ to be the isosceles triangle with vertices (0, 0), (1, 0), (1/2, 1) (see
Figure 12.8.2a). S₁(A₀) is a triangle with vertices (0, 0), (1/2, 0), (1/4, 1/2). The
triangles S₂(A₀) and S₃(A₀) are congruent to S₁(A₀) but shifted to the
right, and to the right and up, respectively. As a result, the set

    A₁ = F(A₀) = S₁(A₀) ∪ S₂(A₀) ∪ S₃(A₀)

is the union of three triangles as shown in Figure 12.8.2b. Observe that A₁
is obtained from A₀ by taking out the middle open triangle with vertices
(1/2, 0), (1/4, 1/2), (3/4, 1/2). Analogously each set S_i(A₁), i = 1, 2, 3, consists of
three congruent triangles of height 1/4, and A₂ = F(A₁) is the union of nine
triangles shown in Figure 12.8.2c. Again A₂ can be obtained from A₁ by
taking out three middle triangles.

This process repeats and in general A_n consists of 3ⁿ triangles with height
(1/2)ⁿ, base (1/2)ⁿ, and total area

    m(A_n) = (1/2)(3/4)ⁿ,
which converges to zero as n → ∞. The limiting set A_*, called the Sierpinski triangle, has Borel measure zero. It is shown in Figure 12.8.2d. Unlike
the Cantor set, the Sierpinski triangle is a continuum (compact connected
set) and from a geometric point of view it is a line whose every point is a
ramification point. The Sierpinski triangle also appears in cellular automata
theory [Wolfram, 1983]. □
In these two examples the construction of the sets A_n approximating A_*
was ad hoc. We simply guessed the procedure leading from A_n to A_{n+1},
taking out the middle intervals or middle triangles. In general, for an arbitrary iterated function system the connection between A_n and A_{n+1} is not
so simple. In the next theorem we develop another way of approximating
A_* which is especially effective with the aid of a computer.
Theorem 12.8.2. Let (12.8.1) be a hyperbolic system. Then for every x₀ ∈
X and ε > 0 there exist two numbers n₀ = n₀(ε) and k₀ = k₀(ε) such that

    prob(dist({x_n, ..., x_{n+k}}, A_*) < ε) > 1 − ε   for n ≥ n₀, k ≥ k₀.   (12.8.18)
FIGURE 12.8.2.
In other words Theorem 12.8.2 says the following. If we cancel the first
n₀ or more elements of the trajectory {x_n}, then the probability that a
sufficiently long segment x_n, ..., x_{n+k} approximates A_* with accuracy ε is
greater than 1 − ε.

Proof. First we find n₀. By Theorem 12.8.1, applied with A₀ = {x₀}, we
have dist(A_n, A_*) < ε/2 for n ≥ n₀, where A_n = Fⁿ(A₀). Since x_n ∈ A_n,
for every x_n there is a point z_n ∈ A_* such that

    |x_n − z_n| < ε/2.   (12.8.19)
Now we are going to find k₀. Since A_* is a compact set there is a finite
sequence of points a_i ∈ A_*, i = 1, ..., q, such that

    A_* ⊂ ∪_{i=1}^q B_{ε/2}(a_i).   (12.8.20)

Pick a point u ∈ A_{n₀}. The set {u}, which contains the single point u, is
compact and according to Theorem 12.8.1 there exists an integer r such
that

    dist(F^r({u}), A_*) < ε/4.

The points of F^r({u}) are given by S_{a₁} ∘ ⋯ ∘ S_{a_r}(u). Thus, for every
i = 1, ..., q, there exists a sequence of integers a(i, 1), ..., a(i, r) for which

    |S_{a(i,1)} ∘ ⋯ ∘ S_{a(i,r)}(u) − a_i| < ε/4.

This inequality holds for a fixed u ∈ A_{n₀}. When u moves in A_{n₀}, the
corresponding value S_{a(i,1)} ∘ ⋯ ∘ S_{a(i,r)}(u) changes by at most L^r c, where
c = max{|u − v| : u, v ∈ A_{n₀}}. Choosing r large enough, we have L^r c < ε/4,
and consequently

    |S_{a(i,1)} ∘ ⋯ ∘ S_{a(i,r)}(u) − a_i| < ε/2   for i = 1, ..., q, u ∈ A_{n₀}.   (12.8.21)

Now consider the segment x_n, ..., x_{n+k} of the trajectory given by (12.8.1)
with n ≥ n₀. We have

    x_{n+j} = S_{ξ_{n+j−1}} ∘ ⋯ ∘ S_{ξ_n}(x_n).

If the sequence ξ_n, ..., ξ_{n+k} contains the segment a(i, 1), ..., a(i, r), that
is,

    ξ_{n+j+r−1} = a(i, 1), ..., ξ_{n+j} = a(i, r)   (12.8.22)

for some j, 0 ≤ j ≤ k − r, then (12.8.21) implies x_{n+j+r} ∈ B_{ε/2}(a_i). The
probability of the event (12.8.22), with fixed j, is equal to p_{a(i,1)} ⋯ p_{a(i,r)},
and the probability of the opposite event is smaller than or equal to 1 − p^r,
where p = min_i p_i. The probability that ξ_n, ..., ξ_{n+k} with k ≥ rm does
not contain the sequence a(i, 1), ..., a(i, r) is at most (1 − p^r)^m. For sufficiently large m we have (1 − p^r)^m ≤ ε/q. With this m and k ≥ k₀ = rm
the probability of the event that ξ_n, ..., ξ_{n+k} contains all the sequences
a(i, 1), ..., a(i, r), for i = 1, ..., q, is at least 1 − q(1 − p^r)^m ≥ 1 − ε. When
the last event occurs, then for every point a_i there is a point x_{n+j+r} such
that |x_{n+j+r} − a_i| < ε/2. In this case according to (12.8.20) every point
x ∈ A_* is approximated by a point of the segment x_n, ..., x_{n+k} with accuracy ε. From this and (12.8.19) it follows that

    dist({x_n, ..., x_{n+k}}, A_*) < ε.

The proof is complete. □
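Theorem 12.8.2 justifies the standard computer procedure for drawing the attractor, the random iteration ("chaos game") of Barnsley [1988]: iterate (12.8.1) with randomly chosen indices, discard an initial segment of the trajectory, and plot the remaining points. A sketch for the Sierpinski triangle of Example 12.8.4 (the starting point and the lengths n₀ and k are arbitrary illustrative choices):

```python
import random

# S_i(x) = x/2 + b_i, the hyperbolic IFS of Example 12.8.4
shifts = [(0.0, 0.0), (0.5, 0.0), (0.25, 0.5)]

def chaos_game(n0=100, k=20_000, seed=0):
    rng = random.Random(seed)
    x, y = 0.3, 0.3                       # arbitrary starting point
    pts = []
    for n in range(n0 + k):
        bx, by = rng.choice(shifts)       # p_1 = p_2 = p_3 = 1/3
        x, y = 0.5 * x + bx, 0.5 * y + by
        if n >= n0:                       # discard the transient, cf. (12.8.18)
            pts.append((x, y))
    return pts

pts = chaos_game()
# all retained points stay inside the unit square containing the attractor
print(len(pts), max(y for _, y in pts))
```

Plotting `pts` with any scatter-plot tool reproduces Figure 12.8.2d; only one orbit is needed, in contrast with the set iteration Fⁿ(A₀), whose cost grows like 3ⁿ.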
As a further illustration, consider an iterated function system on R² given
by three affine transformations S₁, S₂, S₃, where S₁ is a contraction with
ratio c and S₂ and S₃ are built from rotation matrices with angles φ and ψ
and contraction ratios r and q, with

    c = 0.255,   r = 0.75,   q = 0.625.

Choosing n₀ = 100, k₀ = 150000, |x₀| ≤ 1, and p₁ = p₂ = p₃ = 1/3, the
random iteration described in Theorem 12.8.2 produces the approximation
of the attractor shown in Figure 12.8.3.
FIGURE 12.8.3.
When the size l of the cube K is arbitrary the calculation is a little more
complicated. Namely, K may be divided into cubes of size ε_n = l/n and
the number of these cubes is N(ε_n) = n^d. Consequently,

    d = lim_{n→∞} d log n/(log n − log l) = lim_{n→∞} log N(ε_n)/log(1/ε_n).

These observations motivate the following definition.

Definition 12.8.3. Let K ⊂ R^d be a compact set. The fractal dimension
of K is

    dim K = lim_{ε→0} log N(ε)/log(1/ε)   (12.8.23)

if this limit exists.
Calculation of the fractal dimension by a direct application of Definition 12.8.3 is difficult. It may be simplified and the continuous variable ε
replaced by an appropriate sequence {ε_n}. Namely, if for some c > 0 and
0 < q < 1 we define ε_n = cqⁿ, and if

    d_{cq} = lim_{n→∞} log N(ε_n)/log(1/ε_n)   (12.8.24)

exists, then the limit (12.8.23) also exists and dim K = d_{cq} [Barnsley, 1988;
Chapter 5].

Using this property we may find the dimension of the attractors A_*
described in Examples 12.8.3 and 12.8.4. First consider the case when A_*
is the Cantor set. We have

    A_* ⊂ A_n = Fⁿ([0, 1]),

and A_* can be covered by 2ⁿ disjoint intervals of length 3⁻ⁿ whose union is
equal to A_n. Since A_* contains the endpoints of these intervals the number
of covering intervals cannot be made smaller. We have, therefore, N(ε_n) =
2ⁿ for ε_n = 3⁻ⁿ, which gives

    dim A_Cantor = log 2/log 3.

In the same way, for the Sierpinski triangle we have A_* ⊂ A_n = Fⁿ(A₀),
which can be covered by 3ⁿ boxes of size 2⁻ⁿ, so that N(ε_n) = 3ⁿ for
ε_n = 2⁻ⁿ and

    dim A_Sierpinski = log 3/log 2.
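The relation N(ε_n) = 2ⁿ for ε_n = 3⁻ⁿ can be confirmed by a small box-counting computation on the left endpoints of the intervals of A_m; a sketch, using exact rationals so that no boxes are merged by rounding:

```python
import math
from fractions import Fraction
from itertools import product

# left endpoints of the 2**m intervals of A_m = F^m([0,1]): ternary digits 0 or 2
m = 8
points = [sum(Fraction(d, 3 ** (i + 1)) for i, d in enumerate(digits))
          for digits in product((0, 2), repeat=m)]

for n in range(1, m + 1):
    # count the boxes of size eps_n = 3**-n that meet the point set
    N = len({p * 3 ** n // 1 for p in points})
    est = math.log(N) / (n * math.log(3))      # log N(eps_n) / log(1/eps_n)
    print(n, N, round(est, 4))
# every estimate equals log 2 / log 3 = 0.6309..., since N = 2**n
```

The same scheme (points from a deep chaos-game orbit, boxes of size 2⁻ⁿ) estimates log 3/log 2 ≈ 1.585 for the Sierpinski triangle.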
Exercises

12.1. Let X ⊂ R^d be a compact set and C(X) be the space of continuous functions f: X → R. Using the Weierstrass approximation theorem
prove that in C(X) there exists a dense countable subset of Lipschitzean
functions.
12.2. Let X = R and S_n(x) = w(x) max(1 − n⁻¹|x|, 0), ....

12.7. Let (12.8.1) be a hyperbolic iterated function system and let x₀ ∈ X.
Prove that the limit

    lim_{n→∞} S_{i₁} ∘ S_{i₂} ∘ ⋯ ∘ S_{i_n}(x₀)

exists, and that the set of all such points x corresponding to all possible
sequences {i_n} is equal to A_* [Barnsley, 1988; Chapter 4].
12.8. Consider the hyperbolic dynamical system given, on X = R², by the
eight transformations

    S_i(x) = (1/3)x + (a_i, b_i),   i = 1, ..., 8,

where (a_i, b_i) are all possible pairs made from the numbers 0, 1/3, 2/3, excluding (1/3, 1/3). The attractor A_* of this system is called a Sierpinski carpet.
Make a picture of A_* and calculate dim A_*.
References
Abraham, R. and Marsden, J.E. 1978. Foundations of Mechanics, Benjamin/Cummings, Reading, Massachusetts.
Adler, R.L. and Rivlin, T.J. 1964. Ergodic and mixing properties of Chebyshev polynomials, Proc. Am. Math. Soc. 15:794-796.
Anosov, D.V. 1963. Ergodic properties of geodesic flows on closed Riemannian manifolds of negative curvature, Sov. Math. Dokl. 4:1153-1156.
Anosov, D.V. 1967. Geodesic flows on compact Riemannian manifolds of
negative curvature, Proc. Steklov Inst. Math., 90:1-209.
Arnold, V.I. 1963. Small denominators and problems of stability of motion
in classical and celestial mechanics, Russian Math. Surveys, 18:85-193.
Arnold, V.I. and Avez, A. 1968. Ergodic Problems of Classical Mechanics,
Benjamin, New York.
Barnsley, M. 1988. Fractals Everywhere, Academic Press, New York.
Barnsley, M. and Cornille, H. 1981. General solution of a Boltzmann equation and the formation of Maxwellian tails, Proc. R. Soc. London, Sect.
A, 374:371-400.
Baron, K. and Lasota A. 1993. Asymptotic properties of Markov operators
defined by Volterra type integrals, Ann. Polon. Math., 58:161-175.
Bessala, P. 1975. On the existence of a fundamental solution for a parabolic
differential equation with unbounded coefficients, Ann. Polon. Math.,
29:403-409.
Bharucha-Reid, A.T. 1960. Elements of the Theory of Markov Processes
and Their Applications, McGraw-Hill, New York.
Horbacz, K. 1989b. Asymptotic stability of dynamical systems with multiplicative perturbations, Ann. Polon. Math., 50:209-218.
Jablonski, M. and Lasota, A. 1981. Absolutely continuous invariant measures for transformations on the real line, Zesz. Nauk. Uniw. Jagiellon.
Pr. Mat., 22:7-13.
Jakobson, M. 1978. Topological and metric properties of one-dimensional
endomorphisms, Dokl. Akad. Nauk SSSR, 243:866-869 [in Russian].
Jama, D. 1986. Asymptotic behavior of an integro-differential equation of
parabolic type, Ann. Polon. Math., 47:65-78.
Jama, D. 1989. Period three and the stability almost everywhere, Rivista
Mat. Pura Appl., 5:85-95.
Jaynes, E.T. 1957. Information theory and statistical mechanics, Phys.
Rev., 106:620-630.
Kamke, E. 1959. Differentialgleichungen: Lösungsmethoden und Lösungen.
Band 1. Gewöhnliche Differentialgleichungen, Chelsea, New York.
Katz, A. 1967. Principles of Statistical Mechanics, Freeman, San Francisco.
Kauffman, S. 1974. Measuring a mitotic oscillator: The arc discontinuity,
Bull. Math. Biol., 36:161-182.
Keener, J.P. 1980. Chaotic behavior in piecewise continuous difference
equations, Trans. Amer. Math. Soc., 261:589-604.
Keller, G. 1982. Stochastic stability in some chaotic dynamical systems,
Mh. Math., 94:313-333.
Kemperman, J.H.B. 1975. The ergodic behavior of a class of real transformations, in Stochastic Processes and Related Topics, pp. 249-258 (vol.
1 of Proceedings of the Summer Research Institute on Statistical Inference, Ed. Madan Lal Puri). Academic Press, New York.
Kielek, Z. 1988. An application of the convolution iterates to evolution
equation in Banach space, Universitatis Jagellonicae Acta Mathematica,
27:247-257.
Kifer, Y.l. 1974. On small perturbations of some smooth dynamical systems, Math. USSR Izv., 8:1083-1107.
Kitano, M., Yabuzaki, T., and Ogawa, T. 1983. Chaos and period doubling
bifurcations in a simple acoustic system, Phys. Rev. Lett., 50:713-716.
Knight, B.W. 1972a. Dynamics of encoding in a population of neurons, J.
Gen. Physiol., 59:734-766.
Knight, B.W. 1972b. The relationship between the firing rate of a single
neuron and the level of activity in a population of neurons. Experimental
evidence for resonant enhancement in the population response, J. Gen.
Physiol., 59:767-778.
Komornik, J. and Lasota, A. 1987. Asymptotic decomposition of Markov
operators, Bull. Polon. Acad. Sci. Math., 35:321-327.
Li, T.Y. and Yorke, J.A. 1978b. Ergodic maps on [0, 1] and nonlinear pseudorandom number generators, Nonlinear Anal., 2:473-481.
Lin, M. 1971. Mixing for Markov operators, Z. Wahrscheinlichkeitstheorie
Verw. Gebiete, 19:231-242.
Lorenz, E.N. 1963. Deterministic nonperiodic flow, J. Atmos. Sci., 20:130-141.
Loskot, K. and Rudnicki, R. 1991. Relative entropy and stability of stochastic semigroups, Ann. Polon. Math., 53:139-145.
Mackey, M.C. and Dormer, P. 1982. Continuous maturation of proliferating
erythroid precursers, Cell Tissue Kinet., 15:381-392.
Mackey, M.C., Longtin, A., and Lasota, A. 1990. Noise-induced global
asymptotic stability, J. Stat. Phys., 60:735-751.
Malczak, J. 1992. An application of Markov operators in differential and
integral equations, Rend. Sem. Univ. Padova, 87:281-297.
Mandelbrot, B.B. 1977. Fractals: Form, Chance, and Dimension, Freeman,
San Francisco.
Manneville, P. 1980. Intermittency, self-similarity and 1/f spectrum in dissipative dynamical systems, J. Physique, 41:1235-1243.
Manneville, P. and Pomeau, Y. 1979. Intermittency and the Lorenz model,
Phys. Lett., 75A:1-2.
May, R.M. 1974. Biological populations with nonoverlapping generations:
stable points, stable cycles, and chaos, Science, 186:645-647.
May, R.M. 1980. Nonlinear phenomena in ecology and epidemology, Ann.
N.Y. Acad. Sci., 357:267-281.
Misiurewicz, M. 1981. Absolutely continuous measures for certain maps of
an interval, Publ. Math. IHES, 53:17-51.
von Neumann, J. 1932. Proof of the quasi-ergodic hypothesis, Proc. Nat.
Acad. Sci. USA, 18:31-38.
Parry, W. 1981. Topics in Ergodic Theory, Cambridge University Press,
Cambridge, England.
Petrillo, G.A. and Glass, L. 1984. A theory for phase locking of respiration
in cats to a mechanical ventilator, Am. J. Physiol., 246:R311-320.
Pianigiani, G. 1979. Absolutely continuous invariant measures for the process x_{n+1} = λx_n(1 − x_n), Boll. Un. Mat. Ital., 16A:374-378.
Pianigiani, G. 1983. Existence of invariant measures for piecewise continuous transformations, Ann. Polon. Math., 40:39-45.
Procaccia, I. and Schuster, H. 1983. Functional renormalization group theory of 1/f noise in dynamical systems, Phys. Rev. A, 28:1210-1212.
Renyi, A. 1957. Representation for real numbers and their ergodic properties, Acta Math. Acad. Sci. Hung., 8:477-493.
If A and B are sets, then x ∈ B means that "x is an element of B," whereas
A ⊂ B means that "A is contained in B." For x ∉ B and A ⊄ B substitute
"is not" for "is" in these statements. Furthermore, A ∪ B = {x : x ∈ A or
x ∈ B}, A ∩ B = {x : x ∈ A and x ∈ B}, A \ B = {x : x ∈ A and x ∉ B},
and A × B = {(x, y) : x ∈ A and y ∈ B}, respectively, define the union,
intersection, difference, and Cartesian product of two sets A and B.
The symbol ∅ denotes the empty set, and

    1_A(x) = 1 if x ∈ A,   1_A(x) = 0 if x ∉ A.

If b is a positive number, then

    g(x) = f(x) (mod b)

means that g(x) = f(x) − nb, where n is the largest integer less than or
equal to f(x)/b. ‖f‖_{L^p} and ⟨f, g⟩, respectively, denote the L^p norm of the
function f and the scalar product of f and g.
a.e.                    almost everywhere
A                       σ-algebra
B                       Borel σ-algebra
d(g, F)                 L¹ distance between a function g and a set of functions F
d₊/dx                   right lower derivative
D, D(X, A, μ)           set of densities
D²(ξ)                   variance of a random variable
D(A)                    domain of an infinitesimal operator A
E(ξ)                    mathematical expectation of a random variable
E(V | f)                expected value of V with respect to f
Ei(x)                   exponential integral
{η_t}                   continuous time stochastic process
f                       an element of D, often a density
f_*                     stationary density
F                       set of functions; σ-algebra in a probability space
{F_t}                   family of σ-algebras
g_{σb}(x)               Gaussian density with variance σ²/2b
g_{ij}                  Riemannian metric
H_n(x)                  Hermite polynomial
H(f)                    entropy of a density f
H(f | g)                conditional entropy of f with respect to g
I                       identity operator
K(x, y)                 stochastic kernel
L^p, L^p(X, A, μ)       L^p space
L^{p'}                  space adjoint to L^p
μ(A)                    measure of a set A
μ_f(A)                  measure of a set A with respect to a density f
μ_w                     Wiener measure
{N_t}_{t≥0}             counting process
ω                       an element of Ω; angular frequency
Ω                       space of elementary events
(Ω, F, prob)            probability space
P                       Markov or Frobenius-Perron operator
P_ε, P                  Markov operator
{P_t}_{t≥0}             continuous semigroup generated by the linear
                        Boltzmann equation
prob                    probability measure
Prob                    probability measure on a product space
R_λ                     resolvent operator
S                       transformation
S⁻¹(A)                  counterimage of a set A
S^m                     mth iterate of a transformation S
{S_t}_{t∈R}, {S_t}_{t≥0}  group and semigroup of transformations
σ(ξ)                    standard deviation of a random variable
T                       transformation, e.g., T(x, y) in a stochastic dynamical system
{T_t}_{t≥0}             semigroup of operators
U                       Koopman operator
V                       Liapunov (Bielecki) function
{w(t)}_{t≥0}            Wiener process
(X, A, μ)               measure space
ξ, ξ_t; {ξ_n}, {ξ_t}    random variables and sequences (processes) of random variables
Index
tems, 317, 323, 325-326,
332, (333)
of strong repellor, 154
of transformations on R, 172
automorphism, 80
Chandrasekhar-Miinch equation,
131, 240, 383
change of variables
and asymptotic stability, 165-171
in Lebesgue integral, 46
characteristic function, 5
Chebyshev inequality, 114, 310
Chebyshev polynomials, 169
classical solution of Fokker-Planck
equation, 365
closed linear operator, (248)
closed linear subspace, 91
compact support, 207
comparison series, 31
complete space, 34
complete measure, 30
conditional entropy, 291-292, 299
connected manifold, 180
constant of motion, 214
constrictive Markov operator, 95,
96
and asymptotic periodicity,
98, 321-323
and perturbations, 321
continuous semigroup, 195
of contractions, 204
of contractions and infinitesimal operator, 226
and ordinary differential equations, 210-214
continuous stochastic processes, 336
continuous stochastic semigroup,
229
continuous time stochastic process,
254
continuous time system
asymptotically stable, 202
and discrete time systems, 198,
251-252
ergodic, 197
exact, 199, 338-344
mixing, 198, 220-224
sweeping, 244-245
contracting operator, 39,204
convergence
almost sure, 312
Cauchy condition for, 34
Cesaro, 31
comparison series for, 31
in different spaces, 33
in mean, 311
to set of functions, 160
stochastic, 311
strong, 31
strong, of measures, 402, 425
weak, 31
weak Cesaro, 31
weak convergence of measures,
397, 421
convex transformation, 153-156
counterimage, 5
counting measure, 18
counting process, 254
curvature, 225-226
cycle, 102
cyclical permutation, 102
cylinder, 221, 340
dense set, 97
dense subset of densities, 97
density, 5, 9, 41
of absolutely continuous measure, 41
Gaussian, 398, 409
evolution of by Frobenius-Perron
operator, 38, 241-243
of random variable, 253
stationary, 41
and sweeping, 125, 129, 386
derivative
Radon-Nikodym, 41
right lower, 123
strong, 207
determinant of differential on manifold, 182
diffeomorphism, 58
differential
determinant of, on manifold,
182
196
ergodic dynamical system, 197
ergodic Markov operator, 79, (83),
102
ergodic semidynamical system, 197
ergodic transformation, 59
ergodicity
conditions for via Frobenius-Perron operator, 61, 72,
94, 220
conditions for via Koopman
operator, 59, 75,215,220,
230
and Hamiltonian systems, 230
and linear Boltzmann equation, 273-276
illustrated, 68
of motion on torus, 216-218
necessary and sufficient conditions for, 59
relation to mixing, exactness
and K -automorphisms, 80
and rotational transformation,
62, 75, 198
essential supremum, 27
Euler-Bernstein equations, 357
events
elementary, 253
independent, 253
mutually disjoint, 253
exact Markov operator, 79, (83),
103
exact semidynamical system, 199
gradient
of function, 177
length of, 181
Hahn-Banach theorem, 91
Hamiltonian, 213
system, 213, 218, 225, 230-231, 293
hat map, (50), 167, (188)
Hausdorff distance, 436
and capacity, 444
and fractal dimension, 444
Hausdorff space, 177
heat equation, 203,234,243, (300),
409
Henon map, 56
Hille-Yosida theorem, 226, (248)
homeomorphism, 176
hyperbolic iterated function system, 434, (447)
ideal gas, 220-224, 277-280
independent events, 253, (280)
independent increments, 254
independent σ-algebras, 344
independent random variables, 253,
304, 314
independent random vectors, 304,
(333)
indicator function, 5
inequality
Cauchy-Holder, 27
Gibbs, 284
Jensen, 288
triangle, 26
infinitesimal operator, 206
of continuous semigroup of contractions, 226
and differential equations, 210
and ergodicity, 215
and Frobenius-Perron operator, 212-214, 229
and Hamiltonian systems, 213214
and Hille-Yosida theorem, 226
Jacobian matrix, 46
Jensen inequality, 288
joint density function, 304
K -automorphism, 80
and exactness, 82
and geodesic flows, 226
and mixing, 82
Keener map, 322-323
K-flow, 226
Kolmogorov automorphism, 80
Kolmogorov equation, see FokkerPlanck equation
Koopman operator, 47-49, 203
and Anosov diffeomorphism,
77
and motion on torus, 216-218
relation to ergodicity, 59, 75,
215-216, 220, 230
relation to Frobenius-Perron
operator, 48, 204, 241243
relation to infinitesimal operators, 210-211, 230
relation to mixing, 75, 220
relation to ordinary differential equations, 210-211
and rotational transformation,
75
Krylov-Bogolubov theorem, 419
constrictive, 95, 96
contractive property of, 39,
201
deterministic, (50)
ergodic, 79, (83), 102
exact, 79, (83), 103
expanding, 132
fixed point of, 40
and Foias operator, 414
and Frobenius-Perron operator, 43
and linear abstract Boltzmann
equation, 261-268
lower-bound function for, 106
for measures, 405
mixing, 79, (83), 104
and parabolic equation, 368
properties of, 38-39
relation to entropy, 289--292
semigroup of, 201
stability property of, 39, 202
stationary density of, 41
with stochastic kernel, 111,
(136), 243, 270
and stochastic perturbation,
317, 320, 327, 331
sweeping, 125-129
weak continuity of, 49
mathematical expectation, 306
maximal entropy, 285-288
maximal measure, 435
Maxwellian distribution, 378
mean value
of function, 139
of random variable, 306
measurable function, 19
space of, 25
measurable set, 18
measurable transformation, 41
measure, 18
absolutely continuous, 41
absolutely continuous part, 425
Borel, 19
complete, 30
continuous part of, 425
density of, 41
Dirac, 395, 399, 403, 408, 411
distance between, 401
with Gaussian density, 398,
409
invariant, 52, (83), 196, 417
Lebesgue, 30
Lebesgue decomposition of,
426
locally finite, 393
maximal, 425
and Markov operator, 405
nonsingular, 425
norm of, 402
normalized, 41
preserving transformation, 52
product, 30, 259
probabilistic, 402, 403
singular part of, 425
stationary, 417
strong convergence of, 402, 425
support of, 394
uniqueness, 395
weak convergence of 397, 421
Wiener, 340-341
measure-preserving transformation,
52, 196
measure space, 18
finite, 19
normalized, 19
probabilistic, 19
product of, 30
σ-finite, 19
metric, Riemannian, 179
mixing, 65
of Anosov diffeomorphism, 77
of baker transformation, 65,
(83)
of dyadic transformation, 66
dynamical system, 198
illustrated, 70
Markov operator, 79, (83), 104
necessary and sufficient conditions for via Frobenius-Perron operator, 72, 220
semigroup, 199
of Koopman operator, 203
stochastic, 201
sweeping, 243
of transformations, 195
σ-algebra, 18
Borel, 18
independent, 344
nonanticipative, 347
trivial, 80
σ-finite measure, 394
σ-finite measure space, 19
Sierpinski
carpet, 447
triangle, 441
simple function, 21
space
adjoint, 26
of measurable functions, 25
space and time averages, 64, 196
spectral decomposition theorem,
98
sphere bundle, 225
stability property of Markov operators, 39
254
stationary measure, 417
statistical stability, 105, (187)
relation to asymptotic stability, 105
relation to exactness, 110
statistically stable transformation,
construction of, 167
Stirling's formula, 267
stochastic convergence, 311
stochastic differential equations,
335,355
relation to Fokker-Planck equation, 359-360
stochastic integrals, 347, 353
stochastic kernel, 111, (136), 243,
274, 277
stochastic perturbation
additive, 315, 320, 327, (333)
and asymptotic periodicity,
254
with stationary independent
increments, 254
stochastic semigroup, 201, (248)
asymptotic stability of, 202
and Bielecki function, 245
relation to Fokker-Planck
equation, 369
and sweeping, 245, (392)
Stratonovich sum, 350
strong asymptotic stability of measures, 425
in regular stochastic systems,
430
and weak asymptotic stability, 426, 434
strong convergence, 31
Cauchy condition for, 34
of densities, 72
of measures, 397, 402
strong law of large numbers, 314
strong precompactness, 86, (135)
conditions for, 87-88
strong repellor, 153-156
subinvariant function, 129
support, 39
compact, 207
and Frobenius-Perron operator, 44
of measure, 394
sweeping, 125-127, (136), 243-244,
(333)
and Bielecki function, 127, 245,
387
and Foguel alternative, 130,
133
and Fokker-Planck equation,
386-388
and invariant density, 130, 247
and stochastic semigroup, 243
tangent space, 178
tangent vector, 177
tent map, (50), 167, (188)
time and space averages, 64, 196
torus, 186
Anosov diffeomorphism on, 57
d-dimensional, 216
exact transformation on, 186
rotation on, 216-218
trace of dynamical system, 193-194
trajectory, 192
versus density, 10
transformation
asymptotically periodic, 156165
convex, 153-156
ergodic, 59, 197
exact, 66, 199
factor of, 82
Frobenius-Perron operator
for, 7, 42, 199-200, 215
Koopman operator for, 47,
203
measurable, 41
measure-preserving, 52
mixing, 65, 198
nonsingular, 42
piecewise monotonic, 144,
153, 156, 165, 172
statistically stable, 105, 110
weakly mixing, 80
triangle inequality, 26
trivial set, 59, 197
trivial σ-algebra, 80
uniform parabolicity, 364
unit volume function, 181
variance
of function, 139
of random variable, 308
of Wiener process, 337
variation of function, 140
vector
norm of, 180
scalar product of, 180
space, 26
von Neumann series, 265
weak continuity, 49
weak convergence, 31
of densities, 72
of measures, 397, 400, 401
weak law of large numbers, 313
weak precompactness, 86
condition for, 87-88
weak repellor, paradox of, 11, (15),
151