
Applied Mathematical Sciences

Volume 97
Editors
J.E. Marsden L. Sirovich

Advisors
M. Ghil J.K. Hale T. Kambe
J. Keller K. Kirchgassner
B.J. Matkowsky C.S. Peskin
J.T. Stuart

Springer
New York
Berlin
Heidelberg
Barcelona
Hong Kong
London
Milan
Paris
Singapore
Tokyo

Applied Mathematical Sciences


1. John: Partial Differential Equations, 4th ed.
2. Sirovich: Techniques of Asymptotic Analysis.
3. Hale: Theory of Functional Differential Equations, 2nd ed.
4. Percus: Combinatorial Methods.
5. von Mises/Friedrichs: Fluid Dynamics.
6. Freiberger/Grenander: A Short Course in Computational Probability and Statistics.
7. Pipkin: Lectures on Viscoelasticity Theory.
8. Giacaglia: Perturbation Methods in Non-linear Systems.
9. Friedrichs: Spectral Theory of Operators in Hilbert Space.
10. Stroud: Numerical Quadrature and Solution of Ordinary Differential Equations.
11. Wolovich: Linear Multivariable Systems.
12. Berkovitz: Optimal Control Theory.
13. Bluman/Cole: Similarity Methods for Differential Equations.
14. Yoshizawa: Stability Theory and the Existence of Periodic Solutions and Almost Periodic Solutions.
15. Braun: Differential Equations and Their Applications, 3rd ed.
16. Lefschetz: Applications of Algebraic Topology.
17. Collatz/Wetterling: Optimization Problems.
18. Grenander: Pattern Synthesis: Lectures in Pattern Theory, Vol. I.
19. Marsden/McCracken: Hopf Bifurcation and Its Applications.
20. Driver: Ordinary and Delay Differential Equations.
21. Courant/Friedrichs: Supersonic Flow and Shock Waves.
22. Rouche/Habets/Laloy: Stability Theory by Liapunov's Direct Method.
23. Lamperti: Stochastic Processes: A Survey of the Mathematical Theory.
24. Grenander: Pattern Analysis: Lectures in Pattern Theory, Vol. II.
25. Davies: Integral Transforms and Their Applications, 2nd ed.
26. Kushner/Clark: Stochastic Approximation Methods for Constrained and Unconstrained Systems.
27. de Boor: A Practical Guide to Splines.
28. Keilson: Markov Chain Models-Rarity and Exponentiality.
29. de Veubeke: A Course in Elasticity.
30. Śniatycki: Geometric Quantization and Quantum Mechanics.
31. Reid: Sturmian Theory for Ordinary Differential Equations.
32. Meis/Markowitz: Numerical Solution of Partial Differential Equations.
33. Grenander: Regular Structures: Lectures in Pattern Theory, Vol. III.

34. Kevorkian/Cole: Perturbation Methods in Applied Mathematics.
35. Carr: Applications of Centre Manifold Theory.
36. Bengtsson/Ghil/Källén: Dynamic Meteorology: Data Assimilation Methods.
37. Saperstone: Semidynamical Systems in Infinite Dimensional Spaces.
38. Lichtenberg/Lieberman: Regular and Chaotic Dynamics, 2nd ed.
39. Piccini/Stampacchia/Vidossich: Ordinary Differential Equations in R^n.
40. Naylor/Sell: Linear Operator Theory in Engineering and Science.
41. Sparrow: The Lorenz Equations: Bifurcations, Chaos, and Strange Attractors.
42. Guckenheimer/Holmes: Nonlinear Oscillations, Dynamical Systems, and Bifurcations of Vector Fields.
43. Ockendon/Taylor: Inviscid Fluid Flows.
44. Pazy: Semigroups of Linear Operators and Applications to Partial Differential Equations.
45. Glashoff/Gustafson: Linear Optimization and Approximation: An Introduction to the Theoretical Analysis and Numerical Treatment of Semi-Infinite Programs.
46. Wilcox: Scattering Theory for Diffraction Gratings.
47. Hale et al.: An Introduction to Infinite Dimensional Dynamical Systems-Geometric Theory.
48. Murray: Asymptotic Analysis.
49. Ladyzhenskaya: The Boundary-Value Problems of Mathematical Physics.
50. Wilcox: Sound Propagation in Stratified Fluids.
51. Golubitsky/Schaeffer: Singularities and Groups in Bifurcation Theory, Vol. I.
52. Chipot: Variational Inequalities and Flow in Porous Media.
53. Majda: Compressible Fluid Flow and Systems of Conservation Laws in Several Space Variables.
54. Wasow: Linear Turning Point Theory.
55. Yosida: Operational Calculus: A Theory of Hyperfunctions.
56. Chang/Howes: Nonlinear Singular Perturbation Phenomena: Theory and Applications.
57. Reinhardt: Analysis of Approximation Methods for Differential and Integral Equations.
58. Dwoyer/Hussaini/Voigt (eds): Theoretical Approaches to Turbulence.
59. Sanders/Verhulst: Averaging Methods in Nonlinear Dynamical Systems.
60. Ghil/Childress: Topics in Geophysical Fluid Dynamics: Atmospheric Dynamics, Dynamo Theory, and Climate Dynamics.

(continued following index)

Andrzej Lasota

Michael C. Mackey

Chaos, Fractals, and Noise


Stochastic Aspects of Dynamics
Second Edition

With 48 Illustrations

Springer

Andrzej Lasota
Institute of Mathematics
Silesian University
ul. Bankowa 14
Katowice 40-058, Poland

Michael C. Mackey
Center of Nonlinear Dynamics
McGill University
Montreal, Quebec H3G 1Y6
Canada

Editors

J.E. Marsden
Control and Dynamical Systems, 107-81
California Institute of Technology
Pasadena, CA 91125
USA

L. Sirovich
Division of Applied Mathematics
Brown University
Providence, RI 02912
USA

Mathematics Subject Classifications (1991): 60Gxx, 60Bxx, 58F30


Library of Congress Cataloging-in-Publication Data
Lasota, Andrzej, 1932-
  Chaos, fractals, and noise : stochastic aspects of dynamics /
  Andrzej Lasota, Michael C. Mackey.
    p. cm. - (Applied mathematical sciences ; v. 97)
  Rev. ed. of: Probabilistic properties of deterministic systems. 1985.
  Includes bibliographical references and index.
  ISBN 0-387-94049-9
  1. System analysis. 2. Probabilities. 3. Chaotic behavior in systems.
  I. Mackey, Michael C., 1942- . II. Lasota, Andrzej, 1932- .
  Probabilistic properties of deterministic systems. III. Title.
  IV. Series: Applied mathematical sciences (Springer-Verlag New York Inc.) ; v. 97.
QA1.A647 vol. 97
[QA402]
510 s-dc20
[003'.75]
93-10432
Printed on acid-free paper.
© 1994 Springer-Verlag New York, Inc. First edition published by Cambridge University Press
as Probabilistic Properties of Deterministic Systems, 1985.
All rights reserved. This work may not be translated or copied in whole or in part without the
written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New
York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly
analysis. Use in connection with any form of information storage and retrieval, electronic
adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use of general descriptive names, trade names, trademarks, etc., in this publication, even
if the former are not especially identified, is not to be taken as a sign that such names, as
understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely
by anyone.
Production managed by Hal Henglein; manufacturing supervised by Vincent R. Scelta.
Photocomposed copy prepared from a TeX file.
Printed and bound by Berryville Graphics, Berryville, VA.
Printed in the United States of America.

9 8 7 6 5 4 3
ISBN 0-387-94049-9
ISBN 3-540-94049-9

SPIN 10851267

Springer-Verlag New York Berlin Heidelberg


A member of BertelsmannSpringer Science+Business Media GmbH

To the memory of

Maria Ważewska-Czyżewska

Preface to the Second Edition

The first edition of this book was originally published in 1985 under the title "Probabilistic Properties of Deterministic Systems." In the intervening
years, interest in so-called "chaotic" systems has continued unabated but
with a more thoughtful and sober eye toward applications, as befits a maturing field. This interest in the serious usage of the concepts and techniques
of nonlinear dynamics by applied scientists has probably been spurred more
by the availability of inexpensive computers than by any other factor. Thus,
computer experiments have been prominent, suggesting the wealth of phenomena that may be resident in nonlinear systems. In particular, they
allow one to observe the interdependence between the deterministic and
probabilistic properties of these systems such as the existence of invariant
measures and densities, statistical stability and periodicity, the influence
of stochastic perturbations, the formation of attractors, and many others.
The aim of the book, and especially of this second edition, is to present
recent theoretical methods which allow one to study these effects.
We have taken the opportunity in this second edition to not only correct
the errors of the first edition, but also to add substantially new material in
five sections and a new chapter. Thus, we have included the additional
dynamic property of sweeping (Chapter 5) and included results useful in the
study of semigroups generated by partial differential equations (Chapters
7 and 11), as well as adding a completely new Chapter 12 on the evolution
of distributions. The material of this last chapter is closely related to the
subject of iterated function systems and their attractors, fractals. In addition,
we have added a set of exercises to increase the utility of the work for
graduate courses and self-study.
In addition to those who helped with the first edition, we would like to
thank K. Alligood (George Mason), P. Kamthan, J. Losson, I. Nechayeva,
N. Provatas (McGill), and A. Longtin (Ottawa) for their comments.

A.L.
M.C.M.

Preface to the First Edition

This book is about densities. In the history of science, the concept of densities emerged only recently as attempts were made to provide unifying descriptions of phenomena that appeared to be statistical in nature. Thus, for
example, the introduction of the Maxwellian velocity distribution rapidly
led to a unification of dilute gas theory; quantum mechanics developed
from attempts to justify Planck's ad hoc derivation of the equation for the
density of blackbody radiation; and the field of human demography grew
rapidly after the introduction of the Gompertzian age distribution.
From these and many other examples, as well as the formal development
of probability and statistics, we have come to associate the appearance of
densities with the description of large systems containing inherent elements
of uncertainty. Viewed from this perspective one might find it surprising
to pose the questions: "What is the smallest number of elements that a
system must have, and how much uncertainty must exist, before a description in terms of densities becomes useful and/or necessary?" The answer is
surprising, and runs counter to the intuition of many. A one-dimensional
system containing only one object whose dynamics are completely deterministic (no uncertainty) can generate a density of states! This fact has
only become apparent in the past half-century due to the pioneering work
of E. Borel [1909], A. Rényi [1957], and S. Ulam and J. von Neumann.
These results, however, are not generally known outside that small group
of mathematicians working in ergodic theory.
The past few years have witnessed an explosive growth in interest in
physical, biological, and economic systems that could be profitably studied
using densities. Due to the general inaccessibility of the mathematical
literature to the nonmathematician, there has been little diffusion of the
concepts and techniques from ergodic theory into the study of these "chaotic"
systems. This book attempts to bridge that gap.
Here we give a unified treatment of a variety of mathematical systems
generating densities, ranging from one-dimensional discrete time transformations
through continuous time systems described by integro-partial
differential equations. We have drawn examples from a variety of the sciences
to illustrate the utility of the techniques we present. Although the
range of these examples is not encyclopedic, we feel that the ideas presented
here may prove useful in a number of the applied sciences.
This book was organized and written to be accessible to scientists with
a knowledge of advanced calculus and differential equations. In various
places, basic concepts from measure theory, ergodic theory, the geometry
of manifolds, partial differential equations, probability theory and Markov
processes, and stochastic integrals and differential equations are introduced.
This material is presented only as needed, rather than as a discrete unit
at the beginning of the book where we felt it would form an almost insurmountable hurdle to all but the most persistent. However, in spite of our
presentation of all the necessary concepts, we have not attempted to offer
a compendium of the existing mathematical literature.
The one mathematical technique that touches every area dealt with is the
use of the lower-bound function (first introduced in Chapter 5) for proving
the existence and uniqueness of densities evolving under the action of a
variety of systems. This, we feel, offers some partial unification of results
from different parts of applied ergodic theory.
The first time an important concept is presented, its name is given in
bold type. The end of the proof of a theorem, corollary, or proposition is
marked with a ■; the end of a remark or example is denoted by a □.
A number of organizations and individuals have materially contributed
to the completion of this book.
In particular the National Academy of Sciences (U.S.A.), the Polish
Academy of Sciences, the Natural Sciences and Engineering Research Council
(Canada), and our home institutions, the Silesian University and McGill
University, respectively, were especially helpful.
For their comments, suggestions, and friendly criticism at various stages
of our writing, we thank J. Belair (Montreal), U. an der Heiden (Bremen),
and R. Rudnicki (Katowice). We are especially indebted to P. Bugiel
(Krakow), who read the entire final manuscript, offering extensive mathematical
and stylistic suggestions and improvements. S. James (McGill) has
cheerfully, accurately, and tirelessly reduced several rough drafts to a final
typescript.

Contents

Preface to the Second Edition

Preface to the First Edition

1 Introduction
  1.1 A Simple System Generating a Density of States
  1.2 The Evolution of Densities: An Intuitive Point of View
  1.3 Trajectories Versus Densities
  Exercises

2 The Toolbox
  2.1 Measures and Measure Spaces
  2.2 Lebesgue Integration
  2.3 Convergence of Sequences of Functions
  Exercises

3 Markov and Frobenius-Perron Operators
  3.1 Markov Operators
  3.2 The Frobenius-Perron Operator
  3.3 The Koopman Operator
  Exercises

4 Studying Chaos with Densities
  4.1 Invariant Measures and Measure-Preserving Transformations
  4.2 Ergodic Transformations
  4.3 Mixing and Exactness
  4.4 Using the Frobenius-Perron and Koopman Operators for Classifying Transformations
  4.5 Kolmogorov Automorphisms
  Exercises

5 The Asymptotic Properties of Densities
  5.1 Weak and Strong Precompactness
  5.2 Properties of the Averages A_n f
  5.3 Asymptotic Periodicity of {P^n f}
  5.4 The Existence of Stationary Densities
  5.5 Ergodicity, Mixing, and Exactness
  5.6 Asymptotic Stability of {P^n}
  5.7 Markov Operators Defined by a Stochastic Kernel
  5.8 Conditions for the Existence of Lower-Bound Functions
  5.9 Sweeping
  5.10 The Foguel Alternative and Sweeping
  Exercises

6 The Behavior of Transformations on Intervals and Manifolds
  6.1 Functions of Bounded Variation
  6.2 Piecewise Monotonic Mappings
  6.3 Piecewise Convex Transformations with a Strong Repellor
  6.4 Asymptotically Periodic Transformations
  6.5 Change of Variables
  6.6 Transformations on the Real Line
  6.7 Manifolds
  6.8 Expanding Mappings on Manifolds
  Exercises

7 Continuous Time Systems: An Introduction
  7.1 Two Examples of Continuous Time Systems
  7.2 Dynamical and Semidynamical Systems
  7.3 Invariance, Ergodicity, Mixing, and Exactness in Semidynamical Systems
  7.4 Semigroups of the Frobenius-Perron and Koopman Operators
  7.5 Infinitesimal Operators
  7.6 Infinitesimal Operators for Semigroups Generated by Systems of Ordinary Differential Equations
  7.7 Applications of the Semigroups of the Frobenius-Perron and Koopman Operators
  7.8 The Hille-Yosida Theorem and Its Consequences
  7.9 Further Applications of the Hille-Yosida Theorem
  7.10 The Relation Between the Frobenius-Perron and Koopman Operators
  7.11 Sweeping for Stochastic Semigroups
  7.12 Foguel Alternative for Continuous Time Systems
  Exercises

8 Discrete Time Processes Embedded in Continuous Time Systems
  8.1 The Relation Between Discrete and Continuous Time Processes
  8.2 Probability Theory and Poisson Processes
  8.3 Discrete Time Systems Governed by Poisson Processes
  8.4 The Linear Boltzmann Equation: An Intuitive Point of View
  8.5 Elementary Properties of the Solutions of the Linear Boltzmann Equation
  8.6 Further Properties of the Linear Boltzmann Equation
  8.7 Effect of the Properties of the Markov Operator on Solutions of the Linear Boltzmann Equation
  8.8 Linear Boltzmann Equation with a Stochastic Kernel
  8.9 The Linear Tjon-Wu Equation
  Exercises

9 Entropy
  9.1 Basic Definitions
  9.2 Entropy of P^n f When P is a Markov Operator
  9.3 Entropy H(P^n f) When P is a Frobenius-Perron Operator
  9.4 Behavior of P^n f from H(P^n f)
  Exercises

10 Stochastic Perturbation of Discrete Time Systems
  10.1 Independent Random Variables
  10.2 Mathematical Expectation and Variance
  10.3 Stochastic Convergence
  10.4 Discrete Time Systems with Randomly Applied Stochastic Perturbations
  10.5 Discrete Time Systems with Constantly Applied Stochastic Perturbations
  10.6 Small Continuous Stochastic Perturbations of Discrete Time Systems
  10.7 Discrete Time Systems with Multiplicative Perturbations
  Exercises

11 Stochastic Perturbation of Continuous Time Systems
  11.1 One-Dimensional Wiener Processes (Brownian Motion)
  11.2 d-Dimensional Wiener Processes (Brownian Motion)
  11.3 The Stochastic Itô Integral: Development
  11.4 The Stochastic Itô Integral: Special Cases
  11.5 Stochastic Differential Equations
  11.6 The Fokker-Planck (Kolmogorov Forward) Equation
  11.7 Properties of the Solutions of the Fokker-Planck Equation
  11.8 Semigroups of Markov Operators Generated by Parabolic Equations
  11.9 Asymptotic Stability of Solutions of the Fokker-Planck Equation
  11.10 An Extension of the Liapunov Function Method
  11.11 Sweeping for Solutions of the Fokker-Planck Equation
  11.12 Foguel Alternative for the Fokker-Planck Equation
  Exercises

12 Markov and Foias Operators
  12.1 The Riesz Representation Theorem
  12.2 Weak and Strong Convergence of Measures
  12.3 Markov Operators
  12.4 Foias Operators
  12.5 Stationary Measures: Krylov-Bogolubov Theorem for Stochastic Dynamical Systems
  12.6 Weak Asymptotic Stability
  12.7 Strong Asymptotic Stability
  12.8 Iterated Function Systems and Fractals
  Exercises

References

Notation and Symbols

Index

1
Introduction

We begin by showing how densities may arise from the operation of a one-dimensional discrete time system and how the study of such systems can
be facilitated by the use of densities.
If a given system operates on a density as an initial condition, rather than
on a single point, then successive densities are given by a linear integral
operator, known as the Frobenius-Perron operator. Our main objective in
this chapter is to offer an intuitive interpretation of the Frobenius-Perron
operator. We make no attempt to be mathematically precise in either our
language or our arguments.
The precise definition of the Frobenius-Perron operator is left to Chapter
3, while the measure-theoretic background necessary for this definition is
presented in Chapter 2.

1.1 A Simple System Generating a Density of States

One of the most studied systems capable of generating a density of states
is that defined by the quadratic map

    S(x) = ax(1 - x)    for 0 ≤ x ≤ 1.    (1.1.1)

We assume that a = 4 so S maps the closed unit interval [0, 1] onto itself.
This is also expressed by the saying that the state (or phase) space of the
system is [0, 1]. The graph of this transformation is shown in Figure 1.1.1a.

FIGURE 1.1.1. The quadratic transformation (1.1.1) with a = 4 is shown in
(a). In (b) we show the trajectory (1.1.2) determined by (1.1.1) with x^0 = π/10.
Panel (c) illustrates the sensitive dependence of trajectories on initial conditions
by using x^0 = (π/10) + 0.001. In (b) and (c), successive points on the trajectories
have been connected by lines for clarity of presentation.

Having defined S we may pick an initial point x^0 ∈ [0, 1] so that the
successive states of our system at times 1, 2, ... are given by the trajectory

    x^0, S(x^0), S^2(x^0), ... .    (1.1.2)

A typical trajectory corresponding to a given initial state is shown in Figure
1.1.1b. It is visibly erratic or chaotic, as is the case for almost all x^0. What
is even worse is that the trajectory is significantly altered by a slight change

FIGURE 1.1.2. The histogram constructed according to equation (1.1.3) with
n = 20, N = 5000, and x^0 = π/10.

in the initial state, as shown in Figure 1.1.1c for an initial state differing
by 10^-3 from that used to generate Figure 1.1.1b. Thus we are seemingly
faced with a real problem in characterizing systems with behaviors like that
of (1.1.1).
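This sensitive dependence is easy to reproduce numerically. The sketch below is our own illustration, not from the book; the trajectory length of 50 steps is an arbitrary choice, and the two initial states match Figures 1.1.1b and 1.1.1c.

```python
import math

def S(x):
    # The quadratic map (1.1.1) with a = 4
    return 4.0 * x * (1.0 - x)

def trajectory(x0, n):
    # x0, S(x0), S^2(x0), ..., S^n(x0), as in (1.1.2)
    xs = [x0]
    for _ in range(n):
        xs.append(S(xs[-1]))
    return xs

a = trajectory(math.pi / 10, 50)            # the trajectory of Figure 1.1.1b
b = trajectory(math.pi / 10 + 0.001, 50)    # shifted by 10^-3, Figure 1.1.1c

# The tiny initial separation quickly becomes order one.
gap = [abs(u - v) for u, v in zip(a, b)]
print(gap[0], max(gap))
```

Although the two starting points differ by only 10^-3, the separation grows roughly exponentially until it saturates at the size of the state space.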
By taking a clue from other areas, we might construct a histogram to
display the frequency with which states along a trajectory fall into given
regions of the state space. This is done in the following way. Imagine that
we divide the state space [0, 1] into n discrete nonintersecting intervals so
that the ith interval is (we neglect the end point 1)

    [(i - 1)/n, i/n),    i = 1, ..., n.

Next we pick an initial system state x^0 and calculate a long trajectory

    x^0, S(x^0), S^2(x^0), ..., S^N(x^0)

of length N, where N >> n. Then it is straightforward to determine the
fraction, call it f_i, of the N system states that fall in the ith interval, from

    f_i = (1/N) #{j : S^j(x^0) ∈ [(i - 1)/n, i/n), j = 1, ..., N}.    (1.1.3)

We have carried out this procedure for the initial state used to generate
the trajectory of Figure 1.1.1b by taking n = 20 and using a trajectory of
length N = 5000. The result is shown in Figure 1.1.2. There is a surprising
symmetry in the result, for the states are clearly most concentrated near 0
and 1 with a minimum at 1/2. Repeating this process for other initial states
leads, in general, to the same result. Thus, in spite of the sensitivity of
trajectories to initial states, this is not usually reflected in the distribution
of states within long trajectories.
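The histogram construction (1.1.3) takes only a few lines. This is our own sketch; n = 20 and N = 5000 match Figure 1.1.2, and clipping the rare value x = 1.0 into the last cell is our convention.

```python
import math

def S(x):
    # The quadratic map (1.1.1) with a = 4
    return 4.0 * x * (1.0 - x)

def histogram(x0, n_cells, N):
    # Fraction f_i of the N trajectory points in cell [(i-1)/n, i/n), as in (1.1.3)
    counts = [0] * n_cells
    x = x0
    for _ in range(N):
        x = S(x)
        i = min(int(x * n_cells), n_cells - 1)  # clip the rare x = 1.0 into the last cell
        counts[i] += 1
    return [c / N for c in counts]

f = histogram(math.pi / 10, n_cells=20, N=5000)
print(f[0], f[10], f[19])   # mass piles up near 0 and 1, cf. Figure 1.1.2
```

The end cells hold several times the mass of the middle cells, reproducing the U-shaped profile of Figure 1.1.2.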
However, for certain select initial states, different behaviors may occur.
For some initial conditions the trajectory might arrive at one of the fixed
points of equation (1.1.1), that is, a point x_* satisfying

    x_* = S(x_*).

FIGURE 1.1.3. Exceptional initial conditions may confound the study of transformations
via trajectories. In (a) we show how an initial condition on the quadratic
transformation (1.1.1) with a = 4 can lead to a fixed point x_* of S. In (b) we see
that another initial condition leads to a period 2 trajectory, although all other
characteristics of S are the same.

(For the quadratic map with a = 4 there are two fixed points, x_* = 0 and
x_* = 3/4.) If this happens the trajectory will then have the constant value
x_* forever after, as illustrated in Figure 1.1.3a. Alternately, for some other
initial states the trajectory might become periodic (see Figure 1.1.3b) and
also fail to exhibit the irregular behavior of Figures 1.1.1b and c. The worst
part about these exceptional behaviors is that we have no a priori way of
predicting which initial states will lead to them.
In the next section we illustrate an alternative approach to avoid these
problems.

Remark 1.1.1. Map (1.1.1) has attracted the attention of many mathematicians.
Ulam and von Neumann [1947] examined the case a = 4,
whereas Ruelle [1977], Jakobson [1978], Pianigiani [1979], Collet and Eckmann
[1980], and Misiurewicz [1981] have studied its properties for values
of a < 4. May [1974], Smale and Williams [1976], and Lasota and Mackey
[1980], among others, have examined the applicability of (1.1.1) and similar
maps to biological population growth problems. Interesting properties related
to the existence of periodic orbits in the transformation (1.1.1) follow
from the classical results of Sarkovskii [1964]. □

1.2 The Evolution of Densities: An Intuitive Point of View

The problems that we pointed out in the previous section can be partially
circumvented by abandoning the study of individual trajectories in favor of
an examination of the flow of densities. In this section we give a heuristic
introduction to this concept.
Again we assume that we have a transformation S: [0, 1] → [0, 1] (a
shorthand way of saying S maps [0, 1] onto itself) and pick a large number
N of initial states

    x_1^0, x_2^0, ..., x_N^0.

To each of these states we apply the map S, thereby obtaining N new states
denoted by

    x_1^1 = S(x_1^0), x_2^1 = S(x_2^0), ..., x_N^1 = S(x_N^0).
To define what we mean by the densities of the initial and final states, it
is helpful to introduce the concept of the characteristic (or indicator)
function for a set Δ. This is simply defined by

    1_Δ(x) = { 1  if x ∈ Δ
             { 0  if x ∉ Δ.

Loosely speaking, we say that a function f_0(x) is the density function for
the initial states x_1^0, ..., x_N^0 if, for every (not too small) interval Δ_0 ⊂ [0, 1],
we have

    ∫_{Δ_0} f_0(u) du ≅ (1/N) Σ_{j=1}^{N} 1_{Δ_0}(x_j^0).    (1.2.1)

Likewise, the density function f_1(x) for the states x_1^1, ..., x_N^1 satisfies, for
Δ ⊂ [0, 1],

    ∫_Δ f_1(u) du ≅ (1/N) Σ_{j=1}^{N} 1_Δ(x_j^1).    (1.2.2)

We want to find a relationship between f_1 and f_0.


To do this it is necessary to introduce the notion of the counterimage
of an interval Δ ⊂ [0, 1] under the operation of the map S. This is the set
of all points that will be in Δ after one application of S, or

    S^{-1}(Δ) = {x : S(x) ∈ Δ}.

As illustrated in Figure 1.2.1, for the quadratic map considered in Section
1.1, the counterimage of an interval will be the union of two intervals.

FIGURE 1.2.1. The counterimage of the set [0, x] under the quadratic transformation
consists of the union of the two sets denoted by the heavy lines on the
x-axis.

Now note that for any x_j^0,

    x_j^1 ∈ Δ ⊂ [0, 1]    if and only if    x_j^0 ∈ S^{-1}(Δ).

We thus have the very useful relation

    1_Δ(S(x)) = 1_{S^{-1}(Δ)}(x).    (1.2.3)

With (1.2.3) we may rewrite equation (1.2.2) as

    ∫_Δ f_1(u) du ≅ (1/N) Σ_{j=1}^{N} 1_{S^{-1}(Δ)}(x_j^0).    (1.2.4)

Because Δ_0 and Δ have been arbitrary up to this point, we simply pick
Δ_0 = S^{-1}(Δ). With this choice the right-hand sides of (1.2.1) and (1.2.4)
are equal and therefore

    ∫_Δ f_1(u) du = ∫_{S^{-1}(Δ)} f_0(u) du.    (1.2.5)

This is the relationship that we sought between f_0 and f_1, and it tells us
how a density of initial states f_0 will be transformed by a given map S into
a new density f_1.
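Relation (1.2.5) can be checked by a small Monte Carlo experiment. This is our own illustration; the choice Δ = [0, 1/2] and the sample size are arbitrary. We draw points from the uniform density f_0 = 1, push them through the quadratic map S(x) = 4x(1 - x), and compare the fraction landing in Δ with the f_0-mass of the counterimage S^{-1}(Δ).

```python
import random

def S(x):
    # The quadratic map of Section 1.1
    return 4.0 * x * (1.0 - x)

random.seed(1)
N = 100_000
xs = [random.random() for _ in range(N)]   # samples from the uniform density f0 = 1

# Left side of (1.2.5): mass of the new density f1 on Delta = [0, 1/2],
# estimated as the fraction of transformed points landing in Delta.
lhs = sum(1 for x in xs if S(x) <= 0.5) / N

# Right side: mass of f0 on S^(-1)(Delta) = [0, c] U [1 - c, 1],
# with c = 1/2 - (1/2) sqrt(1/2); since f0 = 1 this is just the total length 2c.
c = 0.5 - 0.5 * 0.5 ** 0.5
rhs = 2.0 * c

print(lhs, rhs)   # the two masses agree to Monte Carlo accuracy
```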
If Δ is an interval, say Δ = [a, x], then we can obtain an explicit representation
for f_1. In this case, equation (1.2.5) becomes

    ∫_a^x f_1(u) du = ∫_{S^{-1}([a,x])} f_0(u) du,

and differentiating with respect to x gives

    f_1(x) = (d/dx) ∫_{S^{-1}([a,x])} f_0(u) du.    (1.2.6)

It is clear that f_1 will depend on f_0. This is usually indicated by writing
f_1 = P f_0, so that (1.2.6) becomes

    P f(x) = (d/dx) ∫_{S^{-1}([a,x])} f(u) du    (1.2.7)

(we have dropped the subscript on f_0 as it is arbitrary). Equation (1.2.7)
explicitly defines the Frobenius-Perron operator P corresponding to
the transformation S; it is very useful for studying the evolution of densities.
To illustrate the utility of (1.2.7) and, incidentally, the Frobenius-Perron
operator concept, we return to the quadratic map S(x) = 4x(1 - x) of the
preceding section. To apply (1.2.7) it is obvious that we need an analytic
formula for the counterimage of the interval [0, x]. Reference to Figure 1.2.1
shows that the end points of the two intervals constituting S^{-1}([0, x]) are
very simply calculated by solving a quadratic equation. Thus

    S^{-1}([0, x]) = [0, 1/2 - (1/2)√(1 - x)] ∪ [1/2 + (1/2)√(1 - x), 1].

With this, equation (1.2.7) becomes

    P f(x) = (d/dx) ∫_0^{1/2 - (1/2)√(1-x)} f(u) du + (d/dx) ∫_{1/2 + (1/2)√(1-x)}^1 f(u) du,

or, after carrying out the indicated differentiation,

    P f(x) = (1/(4√(1 - x))) {f(1/2 - (1/2)√(1 - x)) + f(1/2 + (1/2)√(1 - x))}.    (1.2.8)

This equation is an explicit formula for the Frobenius-Perron operator corresponding
to the quadratic transformation and will tell us how S transforms
a given density f into a new density P f. Clearly the relationship can
be used in an iterative fashion.
To see how this equation works, pick an initial density f(x) = 1 for
x ∈ [0, 1]. Then, since both terms inside the braces in (1.2.8) are constant,
a simple calculation gives

    P f(x) = 1/(2√(1 - x)).    (1.2.9)

Now substitute this expression for P f in place of f on the right-hand side
of (1.2.8) to give

FIGURE 1.2.2. The evolution of the constant density f(x) = 1, x ∈ [0, 1], by
the Frobenius-Perron operator corresponding to the quadratic transformation.
Compare the rapid and regular approach of P^n f to the density given in equation
(1.2.11) (shown as a dashed line) with the sustained irregularity shown by the
trajectories in Figure 1.1.1.

    P(P f(x)) = P^2 f(x)
              = (1/(4√(1 - x))) {1/(2√(1/2 + (1/2)√(1 - x))) + 1/(2√(1/2 - (1/2)√(1 - x)))}.    (1.2.10)

In Figure 1.2.2 we have plotted f(x) = 1, P f(x) given by (1.2.9), and
P^2 f(x) given by (1.2.10) to show how rapidly they seem to approach a
limiting density. Actually, this limiting density is given by

    f_*(x) = 1/(π√(x(1 - x))).    (1.2.11)

If f_* is really the ultimate limit of P^n f as n → ∞, then we should find
that P f_* = f_* when we substitute f_* into equation (1.2.8) for the Frobenius-Perron
operator. A few elementary calculations confirm this. Note also the
close similarity between the graph of f_* in Figure 1.2.2 and the histogram
of Figure 1.1.2. Later we will show that for the quadratic map the density
of states along a trajectory approaches the same unique limiting density f_*
as the iterates of densities approach.
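These calculations are easy to mimic numerically. The sketch below is our own illustration (the function names, the six iterations, and the sample points are arbitrary choices): it applies the operator (1.2.8) repeatedly to f(x) = 1 and compares the result with the limiting density (1.2.11).

```python
import math

def P(f):
    # The Frobenius-Perron operator (1.2.8) of S(x) = 4x(1 - x)
    def Pf(x):
        r = math.sqrt(1.0 - x)
        return (f(0.5 - 0.5 * r) + f(0.5 + 0.5 * r)) / (4.0 * r)
    return Pf

f = lambda x: 1.0          # start from the constant density
for _ in range(6):
    f = P(f)               # f is now P^6 applied to 1

f_star = lambda x: 1.0 / (math.pi * math.sqrt(x * (1.0 - x)))  # (1.2.11)

for x in (0.2, 0.5, 0.8):
    print(x, f(x), f_star(x))   # P^6 f is already close to f_*
```

Even after only a handful of iterations the values of P^n f away from the endpoints agree with f_* to well under a percent, echoing the rapid convergence visible in Figure 1.2.2.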

Example 1.2.1. Consider the transformation S: [0, 1] → [0, 1] given by

    S(x) = rx (mod 1),    (1.2.12)


FIGURE 1.2.3. The dyadic transformation is a special case of the r-adic transformation.
The heavy lines along the x-axis mark the two components of the
counterimage of the interval [0, x].

where r is an integer. The notation rx (mod 1) means rx - n, where n is the
largest integer such that rx - n ≥ 0. This expression is customarily called
the r-adic transformation and is illustrated in Figure 1.2.3 for r = 2 (the
dyadic transformation).
Pick an interval [0, x] ⊂ [0, 1] so that the counterimage of [0, x] under S
is given by

    S^{-1}([0, x]) = ⋃_{i=0}^{r-1} [i/r, i/r + x/r],

and the Frobenius-Perron operator is thus

    P f(x) = (d/dx) Σ_{i=0}^{r-1} ∫_{i/r}^{i/r + x/r} f(u) du = (1/r) Σ_{i=0}^{r-1} f((i + x)/r).    (1.2.13)

This formula for the Frobenius-Perron operator corresponding to the r-adic
transformation (1.2.12) shows again that densities f will be rapidly
smoothed by P, as can be seen in Figure 1.2.4a for an initial density f(x) =
2x, x ∈ [0, 1]. It is clear that the density P^n f(x) rapidly approaches the
constant distribution f_*(x) ≡ 1, x ∈ [0, 1]. Indeed, it is trivial to show
that P1 = 1. This behavior should be contrasted with that of a typical
trajectory (Figure 1.2.4b). □
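Formula (1.2.13) can also be sketched directly. This is our own illustration; the initial density f(x) = 2x matches Figure 1.2.4a, and the iteration count is arbitrary.

```python
def P_radic(f, r=2):
    # Frobenius-Perron operator (1.2.13) of the r-adic map S(x) = rx (mod 1)
    return lambda x: sum(f((x + i) / r) for i in range(r)) / r

f = lambda x: 2.0 * x        # the initial density of Figure 1.2.4a
for _ in range(10):
    f = P_radic(f)           # iterate: f becomes P^n f

one = P_radic(lambda x: 1.0)

print(f(0.1), f(0.9))   # both near 1: P^n f flattens toward f_* = 1
print(one(0.37))        # constants are invariant: P1 = 1
```

For this initial density one can even verify by hand that P f(x) = x + 1/2, so each application halves the slope and P^n f converges geometrically to the constant density.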

1.3 Trajectories Versus Densities

In closing this chapter we offer a qualitative examination of the behavior
of two transformations from both the trajectory and the density flow
viewpoints.

FIGURE 1.2.4. Dynamics of the dyadic transformation. (a) With an initial density
f(x) = 2x, x ∈ [0, 1], successive applications of the Frobenius-Perron operator
corresponding to the dyadic transformation result in densities approaching
f_* = 1, x ∈ [0, 1]. (b) A trajectory calculated from the dyadic transformation
with x^0 ≈ 0.0005. Compare the irregularity of this trajectory with the smooth
approach of the densities in (a) to a limit.

Let R denote the entire real line, that is, R = {x: -∞ < x < ∞}, and
consider the transformation S: R → R defined by

S(x) = ax,   a > 0.    (1.3.1)

Although in Section 1.2 our study was confined to transformations on the unit
interval, this does not affect expression (1.2.7) for the Frobenius-Perron
operator. Thus (1.3.1) has the associated Frobenius-Perron operator

Pf(x) = (1/a) f(x/a).
We first examine the behavior of S for a > 1. Since S^n(x) = a^n x, we see
that, for a > 1,

lim_{n→∞} |S^n(x)| = ∞   for x ≠ 0,

and thus the iterates S^n(x) escape from any bounded interval.
This behavior is in total agreement with the behavior deduced from the
flow of densities. To see this, note that

P^n f(x) = (1/a^n) f(x/a^n).


FIGURE 1.3.1. The transformation S(x), defined by equation (1.3.2), has a single
weak repelling point at x = 0.

By the definition of the Frobenius-Perron operator of the previous section, we have, for any bounded interval [-A, A] ⊂ R,

∫_{-A}^{A} P^n f(x) dx = ∫_{-A/a^n}^{A/a^n} f(x) dx.

Since a > 1,

lim_{n→∞} ∫_{-A}^{A} P^n f(x) dx = 0,

and so, under the operation of S, densities are reduced to zero on every
finite interval when a > 1.
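This escape of mass is easy to see numerically. The sketch below (in Python; the initial density, the half-width A = 5, and a = 2 are arbitrary choices of ours) evaluates the mass of P^n f on [-A, A] directly from P^n f(x) = a^{-n} f(x/a^n):

```python
# For S(x) = a*x the Frobenius-Perron operator is Pf(x) = (1/a) f(x/a), so
# P^n f(x) = a**(-n) * f(x / a**n).  For a > 1 the mass of P^n f on any
# bounded interval [-A, A] equals the mass of f on [-A/a**n, A/a**n] -> 0.

def f(x):
    # initial density: uniform on [-1, 1]
    return 0.5 if -1.0 <= x <= 1.0 else 0.0

def mass(n, a=2.0, A=5.0, n_grid=10_000):
    """Midpoint-rule integral of P^n f over [-A, A]."""
    dx = 2.0 * A / n_grid
    return sum(a ** (-n) * f((-A + (j + 0.5) * dx) / a ** n)
               for j in range(n_grid)) * dx

masses = [mass(n) for n in range(8)]
# masses decreases toward zero as n grows
```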
Conversely, for a < 1,

lim_{n→∞} |S^n(x)| = 0

for every x ∈ R, and therefore all trajectories converge to zero. Furthermore, for every neighborhood (-ε, ε) of zero, we have

lim_{n→∞} ∫_{-ε}^{ε} P^n f(x) dx = lim_{n→∞} ∫_{-ε/a^n}^{ε/a^n} f(x) dx = ∫_{-∞}^{∞} f(x) dx = 1,

so in this case all densities are concentrated in an arbitrarily small neighborhood of zero. Thus, again, the behaviors of trajectories and densities
seem to be in accord.
However, it is not always the case that the behavior of trajectories and
densities seem to be in agreement. This may be simply illustrated by what
we call the paradox of the weak repellor. In Remark 6.2.1 we consider
the transformation S: [0, 1] → [0, 1] defined by

S(x) = { x/(1 - x)   for x ∈ [0, 1/2]
       { 2x - 1      for x ∈ (1/2, 1]     (1.3.2)



FIGURE 1.3.2. Dynamics of the weak repellor defined by (1.3.2). (a) The evolution P^n f of an initial distribution f(x) = 1, x ∈ [0, 1]. (b) The trajectory
originating from an initial point x_0 ≈ 0.25.

(see Figure 1.3.1). There we prove that, for every ε > 0,

lim_{n→∞} ∫_{ε}^{1} P^n f(x) dx = 0.

Thus, since P^n f is a density,

lim_{n→∞} ∫_{0}^{ε} P^n f(x) dx = 1,

and all densities are concentrated in an arbitrarily small neighborhood of
zero. This behavior is graphically illustrated in Figure 1.3.2a.
If one picks an initial point x_0 > 0 very close to zero (see Figure 1.3.2b),
then, as long as S^n(x_0) ∈ (0, 1/2], we have

S^n(x_0) ≥ a^n x_0,

where a = 1/(1 - x_0) > 1. Thus initially, for small x_0, this transformation
behaves much like transformation (1.3.1), and the behavior of the trajectory


near zero apparently contradicts that expected from the behavior of the
densities.
This paradox is more apparent than real and may be easily understood.
First, note that even though all trajectories are repelled from zero (zero is
a repellor), once a trajectory is ejected from (0, 1/2] it is quickly reinjected
into (0, 1/2] from (1/2, 1]. Thus zero is a "weak repellor." The second essential
point to note is that the speed with which any trajectory leaves a small
neighborhood of zero is small; it is given by

S^n(x_0) - S^{n-1}(x_0) = x_0^2 / {[1 - n x_0][1 - (n - 1) x_0]}.

Thus, starting with many initial points, as n increases we will see the progressive accumulation of more and more points near zero. This is precisely
the behavior predicted by examining the flow of densities.
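A small simulation makes this accumulation visible. The sketch below (in Python; the ensemble size, iteration count, and the threshold 0.1 are arbitrary choices of ours) iterates many uniformly distributed initial points under (1.3.2) and compares the fraction lying near zero before and after:

```python
import random

def S(x):
    # the weak repellor (1.3.2)
    return x / (1.0 - x) if x <= 0.5 else 2.0 * x - 1.0

random.seed(1)
pts = [random.random() for _ in range(20_000)]   # ~ uniform density on [0, 1]

frac_before = sum(1 for x in pts if x < 0.1) / len(pts)
for _ in range(100):                             # iterate the whole ensemble
    pts = [S(x) for x in pts]
frac_after = sum(1 for x in pts if x < 0.1) / len(pts)
# frac_after noticeably exceeds frac_before: mass piles up near the repellor
```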
Although our comments in this chapter lack mathematical rigor, they
offer some insight into the power of looking at the evolution of densities
under the operation of deterministic transformations. The next two chapters are devoted to introducing the mathematical concepts required for a
precise treatment of this problem.

Exercises
Simple numerical experiments can greatly clarify the material of this and
subsequent chapters. Consequently, the first five exercises of this chapter involve the writing of simple utility programs to study the quadratic
map (1.1.1) from several perspectives. Exercises in subsequent chapters will
make use of these programs to study other maps. If you have access to a
personal computer (preferably with a math coprocessor), a workstation,
or a microcomputer with graphics capabilities, we strongly urge you to do
these exercises.
1.1. Write a program to numerically generate a sequence of iterates {xn}
from Xn+l = S(xn), where S is the quadratic map (1.1.1). Write your
program in such a way that the map S is called from a subroutine (so it
may be changed easily) and include graphics to display Xn versus n. When
displaying the sequence {xn} graphically, you will find it helpful to connect
successive values by a straight line so you can keep track of them. Save this
program under the name TRAJ so you can use it for further problems.
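A minimal sketch of such a program (in Python; the quadratic map here is taken as S(x) = αx(1 - x), and the plotting itself is left to whatever graphics facility is available):

```python
# TRAJ -- sketch of Exercise 1.1 (display of x_n versus n omitted).

def quadratic_map(x, alpha=4.0):
    """The quadratic map (1.1.1): S(x) = alpha * x * (1 - x)."""
    return alpha * x * (1.0 - x)

def traj(x0, n, S=quadratic_map):
    """Return the trajectory [x0, S(x0), S(S(x0)), ...] of length n + 1.

    The map S is passed in as an argument (the 'subroutine' of the
    exercise), so it can easily be swapped for other maps later."""
    xs = [x0]
    for _ in range(n):
        xs.append(S(xs[-1]))
    return xs

xs = traj(0.1, 500)
```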

1.2. Using TRAJ study the behavior of (1.1.1) for various values of α
satisfying 3 ≤ α ≤ 4, and for various initial conditions x_0. (You can include
an option to generate x_0 using the random number generator if you wish,
but be careful to use a different seed number for each run.) At a given
value of α, what can you say about the temporal behavior of the sequence


{x_n} for different x_0? What can you say concerning the qualitative and
quantitative differences in the trajectories {x_n} for different values of α?
1.3. To increase your understanding of the results in Exercise 1.2, write a
second program called BIFUR. This program will plot a large number of
iterates of the map (1.1.1) as α is varied between 3 and 4, and the result
will approximate the bifurcation diagram of (1.1.1). Procedurally, for each
value of α, use the random number generator (don't forget about the seed)
to select an initial x_0, discard the first 100 or so values of x_n to eliminate
transients, and then plot a large number (on the order of 1000 to 5000)
of the x_n vertically above the value of α. Then increment α and repeat
the process successively until you have reached the maximal value of α. A
good incremental value of α is Δα = 0.01 to 0.05; obviously, the smaller
Δα the better the resolution of the details of the bifurcation diagram, at
the expense of increased computation time. Use the resulting bifurcation
diagram you have produced, in conjunction with your results of Exercise
1.2, to more fully discuss the dynamics of (1.1.1). You may find it helpful
to make your graphics display flexible enough to "window" various parts
of the bifurcation diagram so you can examine fine detail.
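A sketch of the computational core of BIFUR (in Python; for simplicity a fixed x_0 replaces the random choice suggested above, and plotting is omitted since the collected (α, x_n) pairs are exactly what would be drawn):

```python
# BIFUR -- sketch of Exercise 1.3 (plotting omitted).

def bifurcation_data(a_min=3.0, a_max=4.0, da=0.05,
                     n_transient=200, n_keep=200, x0=0.3):
    """For each alpha, discard transients, then collect iterates to plot."""
    data = []
    steps = int(round((a_max - a_min) / da))
    for k in range(steps + 1):
        alpha = a_min + k * da
        x = x0
        for _ in range(n_transient):          # eliminate transients
            x = alpha * x * (1.0 - x)
        for _ in range(n_keep):               # points plotted above alpha
            x = alpha * x * (1.0 - x)
            data.append((alpha, x))
    return data

def attractor(alpha, n_transient=1000, n_keep=200, x0=0.3, digits=6):
    """Approximate attractor: distinct post-transient iterates (rounded)."""
    x = x0
    for _ in range(n_transient):
        x = alpha * x * (1.0 - x)
    pts = set()
    for _ in range(n_keep):
        x = alpha * x * (1.0 - x)
        pts.add(round(x, digits))
    return sorted(pts)
```

Windowing fine detail then amounts to restricting the (α, x) pairs to a sub-rectangle before plotting.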
1.4. Write a program called DENTRAJ (Density from a Trajectory) to
display the histogram of the location of the iterates {x_n} of (1.1.1) for
various values of α satisfying 3 ≤ α ≤ 4, as was done in Figure 1.1.2 for
α = 4. [Constructing histograms from "data" like this is always a bit tricky
because there is a tradeoff between the number of points and the number of
bins in the histogram. However, a ratio of 200-300 of point number to bin
number should provide a satisfactory result, so, depending on the speed of
your computer (and thus the number of iterations that can be carried out
in a given time), you can obtain varying degrees of resolution.] Compare
your results with those from Exercise 1.3. Note that at a given value of
α, the bands you observed in the bifurcation diagram correspond to the
histogram supports (the places where the histogram is not zero).
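A sketch of DENTRAJ (in Python; 60,000 iterates in 200 bins gives a point-to-bin ratio of 300, in the range suggested above):

```python
# DENTRAJ -- sketch of Exercise 1.4: histogram of iterates of one trajectory.

def trajectory_histogram(alpha=4.0, x0=0.1, n_iter=60_000, n_bins=200):
    """Per-bin density estimate built from the iterates of (1.1.1)."""
    counts = [0] * n_bins
    x = x0
    for _ in range(n_iter):
        x = alpha * x * (1.0 - x)
        j = min(int(x * n_bins), n_bins - 1)   # bin index of x
        counts[j] += 1
    # normalize counts to a density (bin width = 1/n_bins)
    return [c * n_bins / n_iter for c in counts]

hist = trajectory_histogram()
# for alpha = 4 the histogram approximates f_*(x) = 1/(pi*sqrt(x(1-x))):
# large near the endpoints, with a minimum near x = 1/2
```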

1.5. Redo Exercise 1.4 by writing a program called DENITER (Density
Iteration) that takes a large number N of initial points {x_i^0}_{i=1}^N distributed
with some density f_0(x) (e.g., f_0(x) could be uniform on [0, 1] for (1.1.1),
or f_0(x) = 2x, etc.), and iterates them sequentially to give {x_i^1}_{i=1}^N =
{S(x_i^0)}_{i=1}^N, {x_i^2}_{i=1}^N = {S(x_i^1)}_{i=1}^N, etc. Construct your program to display the histogram of the {x_i^j}_{i=1}^N for the initial (j = 0) and successive iterations. Do the histograms appear to converge to an invariant histogram?
How does the choice of the initial histogram affect the result after many
iterations? Discuss the rate of convergence of the sequence of histograms.
1.6. Prove that f_* given by (1.2.11) is a solution of the equation Pf = f,
where P, given by (1.2.8), is the Frobenius-Perron operator corresponding
to the quadratic map (1.1.1) with α = 4.
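Before attempting the proof, the identity can be checked numerically. Recalling from Section 1.2 (the equations are not reproduced in this excerpt) that (1.2.8) reads Pf(x) = [f((1 - √(1-x))/2) + f((1 + √(1-x))/2)] / (4√(1-x)) and that (1.2.11) is f_*(x) = 1/(π√(x(1-x))), a sketch:

```python
from math import pi, sqrt

def f_star(x):
    # the density (1.2.11) for the quadratic map with alpha = 4
    return 1.0 / (pi * sqrt(x * (1.0 - x)))

def P(f, x):
    # the Frobenius-Perron operator (1.2.8), recalled from Section 1.2
    s = sqrt(1.0 - x)
    return (f(0.5 * (1.0 - s)) + f(0.5 * (1.0 + s))) / (4.0 * s)

# compare P f_* with f_* on a grid away from the endpoint singularities
max_err = max(abs(P(f_star, x) - f_star(x))
              for x in (0.01 + 0.98 * k / 500 for k in range(501)))
# max_err is at rounding level: P f_* = f_* holds identically
```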


1.7. This exercise illustrates that there can sometimes be a danger in drawing conclusions about the behavior of even simple systems based on numerical experiments. Consider the Frobenius-Perron operator (1.2.13) corresponding to the r-adic transformation (1.2.12) when r is an integer. (a)
For every integer r show that f_*(x) = 1_{[0,1]}(x) is a solution of Pf = f.
Can you prove that it is the unique solution? (b) For r = 2 and r = 3 use
TRAJ, DENTRAJ, and DENITER to study (1.2.12). What differences do
you see in the behaviors for r = 2 and r = 3? Why do these differences
exist? Discuss your numerical results in light of your computations in (a).
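One source of the differences in part (b) can be seen directly in floating-point arithmetic: every machine number is a dyadic rational, so the computed dyadic map collapses to the fixed point 0 within roughly 53-56 steps, while the triadic map, whose multiplications round, shows no such collapse. A sketch (the particular initial condition is an arbitrary choice of ours):

```python
# Every Python float is a dyadic rational m / 2**k, so iterating x -> 2x
# (mod 1) shifts away one binary digit per step and reaches exactly 0 --
# even though, for the true dyadic map, almost every trajectory is chaotic.
# The triadic map x -> 3x (mod 1) keeps injecting new low-order bits
# through rounding and does not collapse.

def iterate_radic(x0, r, n):
    xs = [x0]
    for _ in range(n):
        xs.append((r * xs[-1]) % 1.0)
    return xs

x0 = 0.123456789
dyadic = iterate_radic(x0, 2, 100)
triadic = iterate_radic(x0, 3, 100)

collapsed = 0.0 in dyadic    # True: the computed dyadic trajectory dies at 0
```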
1.8. Consider the example of the weak repellor (1.3.2). (a) Derive the
Frobenius-Perron operator corresponding to the weak repellor without
looking in Chapter 6. Calculate a few terms of the sequence {P^n f} for
f(x) = 1_{[0,1]}(x). (b) Use TRAJ, DENTRAJ, and DENITER to study the
weak repellor (1.3.2). Discuss your results. Based on your observations,
what conjectures can you formulate about the behavior of the weak repellor? In what way do these differ from the properties of the quadratic map
(1.1.1) that you saw in Exercises 1.1-1.5?

2
The Toolbox

In this and the following chapter, we introduce basic concepts necessary
for understanding the flow of densities. These concepts may be studied in
detail before continuing on to the core of our subject matter, which starts
in Chapter 4, or they may be skimmed on first reading to fix the location
of important concepts for later reference.
We briefly outline here some essential concepts from measure theory, the
theory of Lebesgue integration, and the theory of the convergence
of sequences of functions. This material is in no sense exhaustive; those
desiring more detailed treatments should refer to Halmos [1974] and Royden
[1968].

2.1 Measures and Measure Spaces

We start with the definition of a σ-algebra.


Definition 2.1.1. A collection A of subsets of a set X is a σ-algebra if:
(a) When A ∈ A then X \ A ∈ A;
(b) Given a finite or infinite sequence {A_k} of subsets of X, A_k ∈ A, then
the union ⋃_k A_k ∈ A; and
(c) X ∈ A.

From this definition it follows immediately, by properties (a) and (c),
that the empty set ∅ belongs to A, since ∅ = X \ X. Further, given a


sequence {A_k}, A_k ∈ A, then the intersection ⋂_k A_k ∈ A. To see this, note
that

⋂_k A_k = X \ ⋃_k (X \ A_k)

and then apply properties (a) and (b). Finally, the difference A \ B of two
sets A and B that belong to A also belongs to A because

A \ B = A ∩ (X \ B).


Definition 2.1.2. A real-valued function μ defined on a σ-algebra A is a
measure if:

(a) μ(∅) = 0;
(b) μ(A) ≥ 0 for all A ∈ A; and
(c) μ(⋃_k A_k) = ∑_k μ(A_k) if {A_k} is a finite or infinite sequence of pairwise disjoint sets from A, that is, A_i ∩ A_j = ∅ for i ≠ j.

We do not exclude the possibility that μ(A) = ∞ for some A ∈ A.

Remark 2.1.1. This definition of a measure and the properties of a σ-algebra A as detailed in Definition 2.1.1 ensure that (1) if we know the
measure of a set X and a subset A of X, we can determine the measure of
X \ A; and (2) if we know the measure of each disjoint subset A_k of A, we
can calculate the measure of their union. □

Definition 2.1.3. If A is a σ-algebra of subsets of X and if μ is a measure
on A, then the triple (X, A, μ) is called a measure space. The sets belonging to A are called measurable sets because, for them, the measure
is defined.

Remark 2.1.2. A simple example of a measure space is the finite set
X = {x_1, ..., x_N}, in which the σ-algebra is all possible subsets of X and
the measure is defined by ascribing to each element x_i ∈ X a nonnegative number, say p_i. From this it follows that the measure of a subset
{x_{α_1}, ..., x_{α_n}} of X is just p_{α_1} + ··· + p_{α_n}. If p_i = 1 for every i, then the measure is
called a counting measure because it counts the number of elements in
the set. □

Remark 2.1.3. If X = [0, 1] or R, the real line, then the most natural σ-algebra is the σ-algebra B of Borel sets (the Borel σ-algebra), which, by
definition, is the smallest σ-algebra containing intervals. (The word smallest
means that any other σ-algebra that contains intervals also contains any
set contained in B.) It can be proved that on the Borel σ-algebra there
exists a unique measure μ, called the Borel measure, such that μ([a, b]) =
b - a. Whenever considering spaces X = R or X = R^d or subsets of these
(intervals, squares, etc.) we always assume the Borel measure and will not
repeat this assumption again. □
As presented, Definition 2.1.3 is extremely general. In almost all applications a more specific measure space is adequate, as follows:
Definition 2.1.4. A measure space (X, A, μ) is called σ-finite if there is
a sequence {A_k}, A_k ∈ A, satisfying

X = ⋃_{k=1}^{∞} A_k   and   μ(A_k) < ∞  for all k.

Remark 2.1.4. If X = R, the real line, and μ is the Borel measure, then
the A_k may be chosen as intervals of the form [-k, k]. In the d-dimensional
space R^d, the A_k may be chosen as balls of radius k. □

Definition 2.1.5. A measure space (X, A, μ) is called finite if μ(X) < ∞.
In particular, if μ(X) = 1, then the measure space is said to be normalized
or probabilistic.
Remark 2.1.5. We have defined a hierarchy of measure spaces from the
most general (Definition 2.1.3) down to the most specific (Definition 2.1.5).
Throughout this book, unless it is specifically stated to the contrary, a measure space will always be understood to be σ-finite. □

Remark 2.1.6. If a certain property involving the points of a measure
space is true except for a subset of that space having measure zero, then
we say that property is true almost everywhere (abbreviated as a.e.). □

2.2 Lebesgue Integration

In the material we deal with it is often necessary to use a type of integration more general than the customary Riemann integration. In this section
we introduce the Lebesgue integral, which is defined for abstract measure
spaces in which no other structures except a σ-algebra A and a measure μ
must be introduced.

Definition 2.2.1. Let (X, A, μ) be a measure space. A real-valued function
f: X → R is measurable if f^{-1}(Δ) ∈ A for every interval Δ ⊂ R.

In developing the concept of the Lebesgue integral, we need the notation

f^+(x) = max(0, f(x))   and   f^-(x) = max(0, -f(x))


FIGURE 2.2.1. Illustration of the notation f^+(x) and f^-(x).

(see Figure 2.2.1). Observe that

f(x) = f^+(x) - f^-(x)   and   |f(x)| = f^+(x) + f^-(x).

Before presenting the formal definitions for the Lebesgue integral of a
function, consider the following. Let f: X → R be a bounded, nonnegative
measurable function, 0 ≤ f(x) < M < ∞. Take the partition of the interval
[0, M], 0 = a_0 < a_1 < ··· < a_n = M, a_i = Mi/n, i = 0, ..., n, and define
the sets A_i by

A_i = {x: f(x) ∈ [a_i, a_{i+1})},   i = 0, ..., n - 1.

Then it is clear that the 1_{A_i} are measurable and

0 ≤ f(x) - ∑_{i=0}^{n-1} a_i 1_{A_i}(x) ≤ M/n.

Therefore, we must conclude that every bounded nonnegative measurable
function can be approximated by a finite linear combination of characteristic functions. This observation is crucial to our development of the
Lebesgue integral embodied in the following four definitions.
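This range-partition construction can even be carried out numerically. The sketch below (in Python; taking f(x) = x^2 on [0, 1] with the length measure, estimated on a fine grid, is an arbitrary choice of ours) computes the simple-function integral ∑ a_i μ(A_i) for finer and finer partitions of [0, M] and watches it climb toward ∫_0^1 x^2 dx = 1/3:

```python
# Approximate a bounded nonnegative f by the simple function
# sum_i a_i * 1_{A_i} with a_i = M*i/n and A_i = {x : a_i <= f(x) < a_(i+1)},
# then integrate the simple function.  The measure of each A_i is estimated
# by the fraction of a fine grid on [0, 1] falling inside it.

def lower_lebesgue_sum(f, M, n_levels, n_grid=200_000):
    total = 0.0
    for j in range(n_grid):
        x = (j + 0.5) / n_grid                            # grid point in [0, 1]
        i = min(int(f(x) * n_levels / M), n_levels - 1)   # level index of f(x)
        total += (M * i / n_levels) / n_grid              # a_i * mu(dx)
    return total

approx = [lower_lebesgue_sum(lambda x: x * x, 1.0, n) for n in (10, 100, 1000)]
# approx increases toward the true integral 1/3 as the partition is refined
```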
Definition 2.2.2. Let (X, A, μ) be a measure space, and let the sets A_i ∈ A,
i = 1, ..., n, be such that A_i ∩ A_j = ∅ for all i ≠ j. Then the Lebesgue
integral of the function

g(x) = ∑_{i=1}^{n} λ_i 1_{A_i}(x)    (2.2.1)

is defined as

∫_X g(x) μ(dx) = ∑_i λ_i μ(A_i).

A function g of the form (2.2.1) is called a simple function.


Definition 2.2.3. Let (X, A, μ) be a measure space, f: X → R an arbitrary
nonnegative bounded measurable function, and {g_n} a sequence of simple
functions converging uniformly to f. Then the Lebesgue integral of f is
defined as

∫_X f(x) μ(dx) = lim_{n→∞} ∫_X g_n(x) μ(dx).

Remark 2.2.1. It can be shown that the limit in Definition 2.2.3 exists
and is independent of the choice of the sequence of simple functions {g_n}
as long as they converge uniformly to f. □
Definition 2.2.4. Let (X, A, μ) be a measure space, f: X → R a nonnegative unbounded measurable function, and define

f_M(x) = { f(x)  if 0 ≤ f(x) ≤ M
         { M     if M < f(x).

Then the Lebesgue integral of f is defined by

∫_X f(x) μ(dx) = lim_{M→∞} ∫_X f_M(x) μ(dx).

Remark 2.2.2. Note that ∫_X f_M(x) μ(dx) is an increasing function of M,
so that the limit in Definition 2.2.4 always exists even though it might be
infinite. □

Definition 2.2.5. Let (X, A, μ) be a measure space and f: X → R a
measurable function. Then the Lebesgue integral of f is defined by

∫_X f(x) μ(dx) = ∫_X f^+(x) μ(dx) - ∫_X f^-(x) μ(dx)

if at least one of the terms


is finite. If both of these terms are finite, then the function f is called
integrable.
Remark 2.2.3. The four Definitions 2.2.2-2.2.5 are for the Lebesgue integral of f over the entire space X. For A ∈ A we have, by definition,

∫_A f(x) μ(dx) = ∫_X f(x) 1_A(x) μ(dx).

The Lebesgue integral has some important properties that we will often
use. We state them without proof. Throughout, a measure space (X, A, μ)
is assumed.

(L1) If f, g: X → R are measurable, g is integrable, and |f(x)| ≤ g(x),
then f is integrable and

|∫_X f(x) μ(dx)| ≤ ∫_X g(x) μ(dx).

(L2) ∫_X |f(x)| μ(dx) = 0 if and only if f(x) = 0 a.e.

(L3) If f_1, f_2: X → R are integrable functions, then for λ_1, λ_2 ∈ R the
linear combination λ_1 f_1 + λ_2 f_2 is integrable and

∫_X [λ_1 f_1(x) + λ_2 f_2(x)] μ(dx) = λ_1 ∫_X f_1(x) μ(dx) + λ_2 ∫_X f_2(x) μ(dx).

(L4) Let f, g: X → R be measurable functions and f_n: X → R be measurable functions such that |f_n(x)| ≤ g(x) and {f_n(x)} converges to
f(x) almost everywhere. If g is integrable, then f and the f_n are also
integrable and

lim_{n→∞} ∫_X f_n(x) μ(dx) = ∫_X f(x) μ(dx).

The last formula is also true if the assumption |f_n(x)| ≤ g(x) with an
integrable g is replaced by 0 ≤ f_1(x) ≤ f_2(x) ≤ ···. In this case, however, the
integrals could be infinite.

Remark 2.2.4. The properties described in (L4) are often referred to as
the Lebesgue dominated convergence theorem (|f_n(x)| ≤ g(x)) and
the Lebesgue monotone convergence theorem (0 ≤ f_1(x) ≤ f_2(x) ≤ ···). □
(L5) Let f: X → R be an integrable function and the sets A_i ∈ A, i =
1, 2, ..., be disjoint. If A = ⋃_i A_i, then

∑_i ∫_{A_i} f(x) μ(dx) = ∫_A f(x) μ(dx).


Remark 2.2.5. Observe that f is integrable if and only if |f| is integrable.
This is easy to see since |f| = f^+ + f^-. If f is integrable, f^+ and f^- are
also, and thus

∫_X |f(x)| μ(dx) = ∫_X f^+(x) μ(dx) + ∫_X f^-(x) μ(dx)

is finite. Hence |f| is integrable. The converse is equally easy to prove. □

Remark 2.2.6. Our definition of the Lebesgue integral was stated in four
distinct steps. It should be evident from this construction that for every
integrable function f there is a sequence of simple functions

f_n(x) = ∑_i λ_{i,n} 1_{A_{i,n}}(x)

such that

lim_{n→∞} f_n(x) = f(x) a.e.   and   |f_n(x)| ≤ |f(x)|.

Thus, by the Lebesgue dominated convergence theorem (L4), we have

lim_{n→∞} ∫_X f_n(x) μ(dx) = ∫_X f(x) μ(dx).

This observation will be used many times in simplifying proofs since it
enables us to reduce our arguments to two steps: first, we must only verify
some formula for simple functions and then, second, pass to the limit. □
Remark 2.2.7. The notion of the Lebesgue integral is quite important
since it is defined for very abstract measure spaces (X, A, μ) in which no
other structures are introduced except for the existence of a σ-algebra A
and a measure μ. In calculus the definition of the Riemann integral is
intimately related to the algebraic properties of the real line, and it is easy
to establish a connection between the Lebesgue and Riemann integrals. For
example, if we define μ as in Remark 2.1.3, then

∫_{[a,b]} f(x) μ(dx) = ∫_a^b f(x) dx,

where the left-hand side is the Lebesgue integral and the right-hand side
is the Riemann integral. This equality is true for any Riemann integrable
function f since any Riemann integrable function is automatically Lebesgue
integrable. An analogous connection exists in higher dimensions. □

From the properties of the Lebesgue integral it is easy to demonstrate
that if f: X → R is a nonnegative integrable function then μ_f(A), defined
by

μ_f(A) = ∫_A f(x) μ(dx),


is a finite measure. In fact, by the definition of the Lebesgue integral it is
clear that μ_f(A) is nonnegative and finite, and from property (L5) it is also
additive. Further, from (L2), if μ(A) = 0 then

μ_f(A) = ∫_X 1_A(x) f(x) μ(dx) = 0

since 1_A(x) f(x) = 0 a.e. Thus μ_f(A) satisfies all the properties of a measure as detailed in Definition 2.1.2, and μ_f(A) = 0 whenever μ(A) = 0.
This observation that every integrable nonnegative function defines a finite
measure can be reversed by the following theorem, which is of fundamental
importance for the development of the Frobenius-Perron operator.

Theorem 2.2.1 (Radon-Nikodym theorem). Let (X, A, μ) be a measure space and let ν be a second finite measure with the property that
ν(A) = 0 for all A ∈ A such that μ(A) = 0. Then there exists a nonnegative integrable function f: X → R such that

ν(A) = ∫_A f(x) μ(dx)   for all A ∈ A.

Remark 2.2.8. It should be observed that we have not explicitly stated
that (X, A, μ) is a σ-finite measure space, which is an important assumption
in the Radon-Nikodym theorem. Once again we wish to stress our earlier
assumption that all measure spaces are taken to be σ-finite unless a contrary
assumption is made. □
Although we omit the proof of the Radon-Nikodym theorem, it is easy
to show that the function f is in some sense unique. To see this, assume
that there are two functions f_1, f_2: X → R such that

ν(A) = ∫_A f_1(x) μ(dx)   and   ν(A) = ∫_A f_2(x) μ(dx).

Then for all A ∈ A we have

∫_A [f_1(x) - f_2(x)] μ(dx) = 0.

Define two sets A_1 and A_2 by

A_1 = {x: f_1(x) > f_2(x)}   and   A_2 = {x: f_1(x) ≤ f_2(x)}.

Then

0 = ∫_{A_1} [f_1(x) - f_2(x)] μ(dx) - ∫_{A_2} [f_1(x) - f_2(x)] μ(dx)
  = ∫_{A_1} |f_1(x) - f_2(x)| μ(dx) + ∫_{A_2} |f_1(x) - f_2(x)| μ(dx)
  = ∫_{A_1 ∪ A_2} |f_1(x) - f_2(x)| μ(dx).


Hence, from property (L2) of Lebesgue integrals, we have |f_1(x) - f_2(x)| = 0
a.e., so that f_1(x) and f_2(x) differ only on a set of measure zero.
Observe that our argument is quite general, and we have in fact proved
the following.

Proposition 2.1.1. If f_1 and f_2 are integrable functions such that

∫_A f_1(x) μ(dx) = ∫_A f_2(x) μ(dx)   for A ∈ A,

then f_1 = f_2 a.e.

Also from property (L2) of the Lebesgue integral it is clear that two
measurable functions, f_1 and f_2, differing from each other only on a set of
measure zero, cannot be distinguished by calculating integrals. Thus we say
that in the space of measurable functions, every two functions f_1, f_2,
differing only on a set of measure zero, represent the same element of that
space. However, to simplify our notation, we will often write "measurable
function" instead of "an element of the space of measurable functions."
Because of property (L2) this should not lead to any confusion.
With these remarks in mind, we now introduce the concept of an L^p
space.
Definition 2.2.6. Let (X, A, μ) be a measure space and p a real number,
1 ≤ p < ∞. The family of all possible real-valued measurable functions
f: X → R satisfying

∫_X |f(x)|^p μ(dx) < ∞    (2.2.2)

is the L^p(X, A, μ) space. Here we use the term "measurable function" to
mean "an element of the space of measurable functions."

We shall sometimes write L^p instead of L^p(X, A, μ) if the measure space
is understood, or L^p(X) if A and μ are understood. Note that if p = 1 then
the L^1 space consists of all possible integrable functions.
The integral appearing in (2.2.2) is very important for an element f ∈ L^p.
Thus it is assigned the special notation

||f||_{L^p} = [∫_X |f(x)|^p μ(dx)]^{1/p}    (2.2.3)

and is called the L^p norm of f. When property (L2) of the Lebesgue
integral is applied to |f|^p, it immediately follows that the condition ||f||_{L^p} =
0 is equivalent to f(x) = 0 a.e. Or, more precisely, ||f||_{L^p} = 0 if and only if
f is a zero element in L^p (which is an element represented by all functions
equal to zero almost everywhere).
Two other important properties of the norm are

||af||_{L^p} = |a| ||f||_{L^p}   for f ∈ L^p, a ∈ R    (2.2.4)


FIGURE 2.2.2. A geometric interpretation of the triangle inequality (2.2.5).

and

||f + g||_{L^p} ≤ ||f||_{L^p} + ||g||_{L^p}   for f, g ∈ L^p.    (2.2.5)

The first condition, (2.2.4), simply says that the norm is homogeneous. The
second is called the triangle inequality. As shown in Figure 2.2.2, if we
think of f, g, and f + g as vectors, we can consider a triangle with sides f,
g, and f + g. Then, by equation (2.2.5), the length of the side f + g is
shorter than the sum of the lengths of the other two sides.
From (2.2.4) it follows that for every f ∈ L^p and real a, the product af
belongs to L^p. Further, from (2.2.5) it follows that for every f, g ∈ L^p the
sum f + g is also an element of L^p. This is expressed by saying that L^p is a
vector space.
Because the value of ||f||_{L^p} is interpreted as the length of f, we say that

||f - g||_{L^p} = [∫_X |f(x) - g(x)|^p μ(dx)]^{1/p}

is the L^p distance between f and g.

It is important to note that the product fg of two functions f, g ∈ L^p is
not necessarily in L^p; for example, f(x) = x^{-1/2} is integrable on [0, 1] but
[f(x)]^2 = x^{-1} is not.
This leads us to define the space adjoint to L^p.

Definition 2.2.7. Let (X, A, μ) be a measure space. The space adjoint
to L^p(X, A, μ) is L^{p'}(X, A, μ), where

(1/p) + (1/p') = 1.

Remark 2.2.9. If p = 1, Definition 2.2.7 of the adjoint space fails. The adjoint
space in the case p = 1 consists, by definition, of all almost everywhere
bounded measurable functions and is denoted by L^∞. Functions that
differ only on a set of measure zero are considered to represent the same
element. □


It is well known that if f ∈ L^p and g ∈ L^{p'}, then fg is integrable, and
hence we define the scalar product of two functions by

⟨f, g⟩ = ∫_X f(x) g(x) μ(dx).

An important relation we will often use is the Cauchy-Hölder inequality: if f ∈ L^p and g ∈ L^{p'}, then

|⟨f, g⟩| ≤ ||f||_{L^p} ||g||_{L^{p'}}.

For this inequality to make sense when f ∈ L^1, g ∈ L^∞, we take the L^∞
norm of g to be the smallest constant c such that

|g(x)| ≤ c

for almost all x ∈ X. This constant is denoted by ess sup |g(x)|, called the
essential supremum of g.

Remark 2.2.10. As we almost always work in L^1, we will not indicate the
space in which the norm is taken unless it is not L^1. Thus we will write
||f|| instead of ||f||_{L^1}. Observe that in L^1 the norm has the exceptional
property that the triangle inequality is sometimes an equality. To see this,
note from property (L3) that

||f + g|| = ||f|| + ||g||   for f ≥ 0, g ≥ 0; f, g ∈ L^1.

Thus geometrical intuition in some abstract spaces may be misleading. □


The concept of the Lt space simplifies the Radon-Nikodym theorem as
shown by the following corollary.
Corollary 2.2.1. If(X,A,tt) is a measure space and vis a finite measure
on A such that v(A) = 0 whenever tt(A) = 0, then there exists a unique
element fELt such that

v(A)

f(x) tt(dx)

forA EA.

One of the most important notions in analysis, measure theory, and topology, as well as other areas of mathematics, is that of the Cartesian product.
To introduce this concept we start with a definition.

Definition 2.2.8. Given two arbitrary sets A_1 and A_2, the Cartesian
product of A_1 and A_2 (note that the order is important) is the set of all
pairs (x_1, x_2) such that x_1 ∈ A_1 and x_2 ∈ A_2. This is customarily written
as

A_1 × A_2 = {(x_1, x_2): x_1 ∈ A_1, x_2 ∈ A_2}.


In a natural way this concept may be extended to more than two sets.
Thus the Cartesian product of the sets A_1, ..., A_d is the set of all sequences
(x_1, ..., x_d) such that x_i ∈ A_i, i = 1, ..., d, or

A_1 × ··· × A_d = {(x_1, ..., x_d): x_i ∈ A_i for i = 1, ..., d}.

An important consequence following from the concept of the Cartesian
product is that if a structure is defined on each of the factors A_i, for
example, a measure, then it is possible to extend that property to the
Cartesian product. Thus, given d measure spaces (X_i, A_i, μ_i), i = 1, ..., d,
we define

X = X_1 × ··· × X_d,    (2.2.6)

A to be the smallest σ-algebra of subsets of X containing all sets of the
form

A_1 × ··· × A_d   with A_i ∈ A_i, i = 1, ..., d,    (2.2.7)

and

μ(A_1 × ··· × A_d) = μ_1(A_1) ··· μ_d(A_d).    (2.2.8)

Unfortunately, by themselves these do not define a measure space (X, A, μ).
There is no problem with either X or A, but μ is defined only on special
sets, namely A = A_1 × ··· × A_d, that do not form a σ-algebra. To show that
μ, as defined by (2.2.8), can be extended to the entire σ-algebra A requires
the following theorem.
Theorem 2.2.2. If measure spaces (X_i, A_i, μ_i), i = 1, ..., d, are given
and X, A, and μ are defined by equations (2.2.6), (2.2.7), and (2.2.8),
respectively, then there exists a unique extension of μ to a measure defined
on A.

The measure space (X, A, μ), whose existence is guaranteed by Theorem
2.2.2, is called the product of the measure spaces (X_1, A_1, μ_1), ...,
(X_d, A_d, μ_d), or more briefly a product space. The measure μ is called
the product measure.
Observe that from equation (2.2.8) it follows that

μ(X_1 × ··· × X_d) = μ_1(X_1) ··· μ_d(X_d).

Thus, if all the measure spaces (X_i, A_i, μ_i) are finite or probabilistic, then
(X, A, μ) will also be finite or probabilistic.
Theorem 2.2.2 allows us to define integration on the product space (X, A, μ) since it is also a measure space. A function f: X → R may be written
as a function of d variables because every point x ∈ X is a sequence x =
(x_1, ..., x_d), x_i ∈ X_i. Thus it is customary to write integrals on X either
as

∫_X f(x) μ(dx),

where it is implicitly understood that x = (x_1, ..., x_d) and X = X_1 × ··· ×
X_d, or in the more explicit form

∫_X f(x_1, ..., x_d) μ(dx_1 ··· dx_d).

Integrals on the product of measure spaces are related to integrals on
the individual factors by a theorem associated with the name of Fubini.
For simplicity, we first formulate it for product spaces containing only two
factors.

Theorem 2.2.3 (Fubini's theorem). Let (X, A, μ) be the product space
formed by (X_1, A_1, μ_1) and (X_2, A_2, μ_2), and let a μ-integrable function
f: X → R be given. Then, for almost every x_1, the function f(x_1, x_2) is μ_2
integrable with respect to x_2. Furthermore, the function

∫_{X_2} f(x_1, x_2) μ_2(dx_2)

of the variable x_1 is μ_1 integrable and

∫_{X_1} {∫_{X_2} f(x_1, x_2) μ_2(dx_2)} μ_1(dx_1) = ∫∫_X f(x_1, x_2) μ(dx_1 dx_2).    (2.2.9)

Theorem 2.2.3 extends, in a natural way, to product spaces with an
arbitrary number of factors. If (X, A, μ) is the product of the measure
spaces (X_i, A_i, μ_i), i = 1, ..., d, and f: X → R is μ integrable, then

∫ ··· ∫_X f(x_1, ..., x_d) μ(dx_1 ··· dx_d)    (2.2.10)
  = ∫_{X_1} {··· ∫_{X_{d-1}} [∫_{X_d} f(x_1, ..., x_d) μ_d(dx_d)] μ_{d-1}(dx_{d-1}) ···} μ_1(dx_1).

Remark 2.2.11. As we noted in Remark 2.1.3, the "natural" Borel measure on the real line R is defined on the smallest σ-algebra B that contains
all intervals. For every interval [a, b] this measure satisfies μ([a, b]) = b - a.
Having the structure (R, B, μ), we define by Theorem 2.2.2 the product
space (R^d, B^d, μ^d), where

R^d = R × ··· × R   (d factors),


B^d is the smallest σ-algebra containing all sets of the form

B_1 × ··· × B_d   with B_i ∈ B,

and

μ^d(B_1 × ··· × B_d) = μ(B_1) ··· μ(B_d).    (2.2.11)

The measure μ^d is again called the Borel measure. It is easily verified that
B^d may be alternately defined as either the smallest σ-algebra containing
all the rectangles

[a_1, b_1] × ··· × [a_d, b_d],

or as the smallest σ-algebra containing all the open subsets of R^d. From
(2.2.11) it follows that

μ^d([a_1, b_1] × ··· × [a_d, b_d]) = (b_1 - a_1) ··· (b_d - a_d),

which is the classical formula for the volume of a d-dimensional box.


The same construction may be repeated by starting, not from the whole
real line R, but from the unit interval [0, 1] or from any other finite interval.
Thus, from Theorem 2.2.2, we will obtain the Borel measure on the unit
square [0, 1] × [0, 1] or on the d-dimensional cube

[0, 1]^d = [0, 1] × ··· × [0, 1].

In all cases (R^d, [0, 1]^d, etc.) we will omit the superscript d on B^d and μ^d
and write (R^d, B, μ) instead of (R^d, B^d, μ^d). Furthermore, in all cases when
the space is R, R^d, or any subset of these ([0, 1], [0, 1]^d, R^+ = (0, ∞), etc.)
and the measure and σ-algebra are not specified, we will assume that the
measure space is taken with the Borel σ-algebra and Borel measure. Finally,
all the integrals on R or R^d taken with respect to the Borel measure will
be written with dx instead of μ(dx). □
Remark 2.2.12. From the additivity property of a measure (Definition 2.1.2c) it follows that every measure is monotonic, that is, if A and B are measurable sets and A ⊂ B, then μ(A) ≤ μ(B). This follows directly from

    μ(B) = μ(A ∪ (B \ A)) = μ(A) + μ(B \ A).

Thus, if μ(B) = 0 and A ⊂ B, then μ(A) = 0. However, it could happen that A ⊂ B and B is a measurable set while A is not. In this case, if μ(B) = 0, then it does not follow that μ(A) = 0, because μ(A) is not defined, which is a peculiar situation.
It is rather natural, therefore, to require that a "good" measure have the property that subsets of measurable sets of measure zero should also be measurable with, of course, measure zero. If a measure has this property it is called complete. Indeed, it can be proved that, if (X, A, μ) is a measure


space, then there exists a smallest σ-algebra A_1 ⊃ A and a measure μ_1 on A_1, identical with μ on A, such that (X, A_1, μ_1) is complete.
Every Borel measure on R (or R^d, [0, 1], [0, 1]^d, etc.) can be completed. This complete measure is called the Lebesgue measure. However, when working in R (or R^d, etc.), we will use the Borel measure and not the Lebesgue measure because, with the Lebesgue measure, we encounter problems with the measurability of the composition of measurable functions that are avoided with the Borel measure. □

2.3 Convergence of Sequences of Functions


Having defined L^p spaces and introduced the notions of norms and scalar products, we now consider three different types of convergence for a sequence of functions.

Definition 2.3.1. A sequence of functions {f_n}, f_n ∈ L^p, 1 ≤ p < ∞, is (weakly) Cesàro convergent to f ∈ L^p if

    lim_{n→∞} (1/n) Σ_{k=1}^{n} (f_k, g) = (f, g)    for all g ∈ L^{p'}.    (2.3.1)

Definition 2.3.2. A sequence of functions {f_n}, f_n ∈ L^p, 1 ≤ p < ∞, is weakly convergent to f ∈ L^p if

    lim_{n→∞} (f_n, g) = (f, g)    for all g ∈ L^{p'}.    (2.3.2)

Definition 2.3.3. A sequence of functions {f_n}, f_n ∈ L^p, 1 ≤ p < ∞, is strongly convergent to f ∈ L^p if

    lim_{n→∞} ||f_n - f||_{L^p} = 0.    (2.3.3)

From the Cauchy-Hölder inequality, we have

    |(f_n - f, g)| ≤ ||f_n - f||_{L^p} ||g||_{L^{p'}}

and, thus, if ||f_n - f||_{L^p} converges to zero, so must (f_n - f, g). Hence strong convergence implies weak convergence, and the condition for strong convergence is relatively straightforward to check. However, the condition for weak convergence requires a demonstration that it holds for all g ∈ L^{p'}, which seems difficult to do at first glance. In some special and important spaces, it is sufficient to check weak convergence for a restricted class of functions, defined as follows.

Definition 2.3.4. A subset K ⊂ L^{p'} is called linearly dense if for each f ∈ L^{p'} and ε > 0 there are g_1, ..., g_n ∈ K and constants λ_1, ..., λ_n such


that

    ||f - ḡ||_{L^{p'}} < ε,    where ḡ = Σ_{i=1}^{n} λ_i g_i.

By using the notion of linearly dense sets, it is possible to simplify the proof of weak convergence. If the sequence {f_n} is bounded in norm, that is, ||f_n||_{L^p} ≤ c < ∞, and if K is linearly dense in L^{p'}, then it is sufficient to check weak convergence in Definition 2.3.2 for g ∈ K only.
It is well known that in the space L^p([0, 1]) (1 ≤ p < ∞) the following sets are linearly dense:

    K_1 = {the set of characteristic functions 1_Δ(x) of the Borel sets Δ ⊂ [0, 1]},
    K_2 = {the set of continuous functions on [0, 1]},
    K_3 = {sin(nπx); n = 1, 2, ...}.

In K_1 it is enough to take a family of sets Δ that are generators of the Borel sets on [0, 1]; for example, {Δ} could be the family of subintervals of [0, 1]. Observe that the linear density of K_3 follows from the Fourier expansion theorem. In higher dimensions, for instance on a square in the plane, we may take analogous sets K_1 and K_2 but replace K_3 with

    K_3' = {sin(mπx) sin(nπy); n, m = 1, 2, ...}.

Example 2.3.1. Consider the sequence of functions f_n(x) = sin(nx) on L^2([0, 1]). We are going to show that {f_n} converges weakly to f ≡ 0. First observe that

    ||f_n||_2 = ( ∫_0^1 sin^2(nx) dx )^{1/2} = [ 1/2 - sin(2n)/(4n) ]^{1/2} ≤ 1,

and hence the sequence {||f_n||_2} is bounded. Now take an arbitrary function g(x) = sin(mπx) from K_3. We have

    (f_n, g) = ∫_0^1 sin(nx) sin(mπx) dx = sin(n - mπ)/(2(n - mπ)) - sin(n + mπ)/(2(n + mπ)),

so that

    lim_{n→∞} (f_n, g) = (0, g) = 0    for g ∈ K_3,

and {f_n} thus converges weakly to f = 0.
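Example 2.3.1 can be illustrated numerically. Replacing the integrals by Riemann sums on a fine grid (an approximation introduced here for the sketch; the grid size and the choice m = 3 are arbitrary), the scalar products (f_n, g) against a fixed test function from K_3 shrink as n grows:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 20000, endpoint=False)
dx = x[1] - x[0]

def inner(u, v):
    # L^2([0,1]) scalar product approximated by a left Riemann sum
    return float(np.sum(u * v) * dx)

g = np.sin(3 * np.pi * x)               # a test function from K_3 (m = 3)
vals = [inner(np.sin(n * x), g) for n in (10, 100, 1000)]
# (f_n, g) decays roughly like 1/n, consistent with weak convergence to 0
```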


We have seen that, in a given L^p space, strong convergence implies weak convergence. It also turns out that we may compare convergence in different L^p spaces using the following proposition.

Proposition 2.3.1. If (X, A, μ) is a finite measure space and 1 ≤ p_1 < p_2 ≤ ∞, then

    ||f||_{L^{p_1}} ≤ c ||f||_{L^{p_2}}    for every f ∈ L^{p_2},    (2.3.4)

where c depends on μ(X). Thus every element of L^{p_2} belongs to L^{p_1}, and strong convergence in L^{p_2} implies strong convergence in L^{p_1}.

Proof. Let f ∈ L^{p_2} and let p_2 < ∞. By setting g = |f|^{p_1}, we obtain

    ||f||^{p_1}_{L^{p_1}} = ∫_X |f(x)|^{p_1} μ(dx) = ∫_X g(x) · 1 μ(dx).

Setting p' = p_2/p_1 and denoting by p the number adjoint to p', that is, (1/p) + (1/p') = 1, we have, by the Hölder inequality,

    ∫_X g(x) · 1 μ(dx) ≤ ||1||_{L^p} ||g||_{L^{p'}} = μ(X)^{1/p} ( ∫_X |f(x)|^{p_2} μ(dx) )^{p_1/p_2}

and, consequently,

    ||f||^{p_1}_{L^{p_1}} ≤ μ(X)^{1/p} ||f||^{p_1}_{L^{p_2}},

which proves equation (2.3.4). Hence, if ||f||_{L^{p_2}} is finite, then ||f||_{L^{p_1}} is also finite, proving that L^{p_2} is contained in L^{p_1}. Furthermore, the inequality

    ||f_n - f||_{L^{p_1}} ≤ c ||f_n - f||_{L^{p_2}}

implies that strong convergence in L^{p_2} is stronger than strong convergence in L^{p_1}. If p_2 = ∞, the inequality (2.3.4) is obvious, and thus the proof is complete.
Observe that the strong convergence of f_n to f in L^1 (with arbitrary measure) as well as the strong convergence of f_n to f in L^p (p > 1) with finite measure both imply

    lim_{n→∞} ∫_X f_n μ(dx) = ∫_X f μ(dx).

To see this simply note that

    | ∫_X f_n μ(dx) - ∫_X f μ(dx) | ≤ ∫_X |f_n - f| μ(dx) = ||f_n - f||_{L^1} ≤ c ||f_n - f||_{L^p},

where the last inequality follows from Proposition 2.3.1 in the finite measure case.


It is often necessary to define a function as a limit of a convergent sequence and/or as a sum of a convergent series. Thus the question arises how to show that a sequence {f_n} is convergent if the limit is unknown. The famous Cauchy condition for convergence provides such a tool.
To understand this condition, first assume that {f_n}, f_n ∈ L^p, is strongly convergent to f. Take ε > 0. Then there is an integer n_0 such that

    ||f_n - f||_{L^p} ≤ ε/2    for n ≥ n_0

and, in particular,

    ||f_{n+k} - f||_{L^p} ≤ ε/2    for n ≥ n_0 and k ≥ 0.

From this and the triangle inequality, we obtain

    ||f_{n+k} - f_n||_{L^p} ≤ ||f_{n+k} - f||_{L^p} + ||f - f_n||_{L^p} ≤ ε    for n ≥ n_0 and k ≥ 0.

Thus we have proved that, if {f_n} is strongly convergent in L^p to f, then

    lim_{n→∞} ||f_{n+k} - f_n||_{L^p} = 0    uniformly for all k ≥ 0.    (2.3.5)

This is the Cauchy condition for convergence.


It can be proved that all L^p spaces (1 ≤ p ≤ ∞) have the property that condition (2.3.5) is also sufficient for convergence. This is stated more precisely in the following theorem.

Theorem 2.3.1. Let (X, A, μ) be a measure space and let {f_n}, f_n ∈ L^p(X, A, μ), be a sequence such that equation (2.3.5) holds. Then there exists an element f ∈ L^p(X, A, μ) such that {f_n} converges strongly to f, that is, condition (2.3.3) holds.

The fact that Theorem 2.3.1 holds for L^p spaces is referred to by saying that L^p spaces are complete.
Theorem 2.3.1 enables us to prove the convergence of series by the use of a comparison series. Suppose we have a sequence {g_n} ⊂ L^p and we know the series of norms Σ ||g_n||_{L^p} is convergent, that is,

    Σ_{n=0}^{∞} ||g_n||_{L^p} < ∞.    (2.3.6)

Then, using Theorem 2.3.1, it is easy to verify that the series

    Σ_{n=0}^{∞} g_n    (2.3.7)

is also strongly convergent and that its sum is an element of L^p.


To see this note that the convergence of (2.3.7) simply means that the sequence of partial sums

    s_n = Σ_{m=0}^{n} g_m

is convergent. To verify that {s_n} is convergent, set

    a_n = Σ_{m=0}^{n} ||g_m||_{L^p}.

From equation (2.3.6) the sequence of real numbers {a_n} is convergent and, therefore, the Cauchy condition holds for this sequence. Thus

    lim_{n→∞} |a_{n+k} - a_n| = 0    uniformly for k ≥ 0.

Further,

    ||s_{n+k} - s_n||_{L^p} ≤ Σ_{m=n+1}^{n+k} ||g_m||_{L^p} = |a_{n+k} - a_n|,

so finally

    lim_{n→∞} ||s_{n+k} - s_n||_{L^p} = 0    uniformly for k ≥ 0,

which is the Cauchy condition for {s_n}.
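The argument above can be mimicked in a finite-dimensional L^1 space (a sketch; the 50-point space, the random terms g_n, and the norm bound 2^{-n} are our own choices): when the norms are summable, the tails of the partial sums obey ||s_N - s_n|| ≤ |a_N - a_n| ≤ 2^{-n}, which is exactly the Cauchy condition.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50                                   # L^1 on a 50-point space with counting measure

def norm1(v):
    # L^1 norm with respect to the counting measure
    return float(np.sum(np.abs(v)))

# Terms g_n normalized so that ||g_n||_1 = 2^{-n} (a summable series of norms)
gs = []
for n in range(40):
    v = rng.standard_normal(dim)
    gs.append(v * (0.5 ** n / norm1(v)))

partial = np.cumsum(gs, axis=0)            # partial sums s_n = g_0 + ... + g_n
# Tail distances ||s_39 - s_n||, each bounded by sum_{m>n} ||g_m|| <= 2^{-n}
tails = [norm1(partial[-1] - partial[n]) for n in range(40)]
```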

Exercises

2.1. Using Definition 2.1.2 prove the following "continuity properties" of the measure:

(a) If {A_n} is a sequence of sets belonging to A and A_1 ⊂ A_2 ⊂ ⋯, then

    μ( ∪_{n=1}^{∞} A_n ) = lim_{n→∞} μ(A_n).

(b) If {A_n} is a sequence of sets belonging to A and A_1 ⊃ A_2 ⊃ ⋯, then

    μ( ∩_{n=1}^{∞} A_n ) = lim_{n→∞} μ(A_n),

provided μ(A_1) < ∞.

2.2. Let X = {1, 2, ...} be the set of positive integers. For each A ⊂ X define

    k(n, A) = the number of elements of the set A ∩ {1, ..., n}.


Let A be the family of all A ⊂ X for which there exists "the average density of A in X" given by

    μ(A) = lim_{n→∞} (1/n) k(n, A).

Is μ a measure? [More precisely, is (X, A, μ) a measure space?]

2.3. Let X = [a, b] be a compact interval and μ the standard Borel measure. Prove that for a continuous f: [a, b] → R the values of the Lebesgue and the Riemann integrals coincide.

2.4. Let X = R^+ and μ be the standard Borel measure. Prove that a continuous function f: R^+ → R is Lebesgue integrable if and only if

    lim_{a→∞} ∫_0^a |f(x)| dx < ∞,

and that

    ∫_{R^+} f(x) μ(dx) = lim_{a→∞} ∫_0^a f(x) dx.

2.5. Consider the space (X, A, μ) where X = {1, 2, ...} is the set of positive integers, A all subsets of X, and μ the counting measure. Prove that a function f: X → R is integrable if and only if

    Σ_{k=1}^{∞} |f(k)| < ∞,

and that

    ∫_X f(x) μ(dx) = Σ_{k=1}^{∞} f(k).

[Remark. L^1(X, A, μ) is therefore identical with the space of all absolutely convergent sequences. It is denoted by l^1.]
2.6. From Proposition 2.3.1 we have derived the statement: if 1 ≤ p_1 < p_2 ≤ ∞ and μ(X) < ∞, then the strong convergence of {f_n} to f in L^{p_2} (f_n, f ∈ L^{p_2}) implies the strong convergence of {f_n} to f in L^{p_1}. Construct an example showing that this statement is false when μ(X) = ∞, even if f_n, f ∈ L^{p_1} ∩ L^{p_2}.

2.7. Let (X, A, μ) be a finite measure space and let f ∈ L^∞(X) be fixed. Show that the function

    φ(p) = ||f||_{L^p},    1 ≤ p < ∞,

is continuous and that

    lim_{p→∞} φ(p) = ess sup |f|.

2.8. The spaces L^p(X, A, μ) are seldom considered for 0 < p < 1 because in this case an important property of the norm || · ||_{L^p} given by formula (2.2.2) is not satisfied. Which property?

3
Markov and Frobenius-Perron Operators

Taking into account the concepts of the preceding chapter, we are now
ready to formally introduce the Frobenius-Perron operator, which, as we
saw in Chapter 1, is of considerable use in studying the evolution of densities
under the operation of deterministic systems.
However, as a preliminary step, we develop the more general concept of
the Markov operator and derive some of its properties. Our reasons for this
approach are twofold: First, as will become clear, many concepts concerning
the asymptotic behavior of densities may be equally well formulated for
both deterministic and stochastic systems. Second, many of the results that
we develop in later chapters concerning the behavior of densities evolving
under the influence of deterministic systems are simply special cases of
more general results for stochastic systems.
The theory of Markov operators is extremely rich and varied, and we have
chosen an approach particularly suited to an examination of the eventual
behavior of densities in dynamical systems. Foguel [1969] contains an exhaustive survey of the asymptotic properties of Markov operators.

3.1 Markov Operators

We define the Markov operator as follows.

Definition 3.1.1. Let (X, A, μ) be a measure space. Any linear operator P: L^1 → L^1 satisfying


    (a) Pf ≥ 0    for f ≥ 0, f ∈ L^1; and

    (b) ||Pf|| = ||f||    for f ≥ 0, f ∈ L^1,    (3.1.1)

is called a Markov operator.

Remark 3.1.1. In conditions (a) and (b), the symbols f and Pf denote elements of L^1 represented by functions that can differ on a set of measure zero. Thus, for any such function, properties f ≥ 0 and Pf ≥ 0 hold almost everywhere. When it is clear that we are dealing with elements of L^1 (or L^p), we will drop the "almost everywhere" notation. □

Markov operators have a number of properties that we will have occasion to use. First, if f, g ∈ L^1, then

    Pf(x) ≥ Pg(x)    whenever f(x) ≥ g(x).    (3.1.2)

Any operator P satisfying (3.1.2) is said to be monotonic. To show the monotonicity of P is trivial, since f - g ≥ 0 implies P(f - g) ≥ 0.
To demonstrate further inequalities that Markov operators satisfy, we offer the following proposition.

Proposition 3.1.1. If (X, A, μ) is a measure space and P is a Markov operator, then, for every f ∈ L^1,

    (M1) (Pf(x))^+ ≤ Pf^+(x),    (3.1.3)

    (M2) (Pf(x))^- ≤ Pf^-(x),    (3.1.4)

    (M3) |Pf(x)| ≤ P|f(x)|,    (3.1.5)

and

    (M4) ||Pf|| ≤ ||f||.    (3.1.6)

Proof. These inequalities are straightforward to derive. To obtain (3.1.3), note that from the definition of f^+ and f^-, it follows that

    (Pf)^+ = (Pf^+ - Pf^-)^+ = max(0, Pf^+ - Pf^-) ≤ max(0, Pf^+) = Pf^+,

and inequality (3.1.4) is obtained in an analogous fashion. Inequality (3.1.5) follows from (M1) and (M2), namely,

    |Pf| = (Pf)^+ + (Pf)^- ≤ Pf^+ + Pf^- = P(f^+ + f^-) = P|f|.


Finally, by integrating (3.1.5) over X, we have

    ||Pf|| = ∫_X |Pf(x)| μ(dx) ≤ ∫_X P|f(x)| μ(dx) = ∫_X |f(x)| μ(dx) = ||f||,

which confirms (3.1.6).
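On a finite set with counting measure, a Markov operator is given by a column-stochastic matrix (this is Exercise 3.3 in the next chapter's exercises), and the inequalities (M1)-(M4) can be checked numerically. This sketch uses a random matrix of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(1)

# On X = {1,...,N} with counting measure: (Pf)_i = sum_j p_ij f_j,
# with p_ij >= 0 and each column of (p_ij) summing to 1.
N = 6
Pmat = rng.random((N, N))
Pmat /= Pmat.sum(axis=0, keepdims=True)

def P(f):
    return Pmat @ f

f = rng.standard_normal(N)
fp, fm = np.maximum(f, 0), np.maximum(-f, 0)                  # f^+ and f^-

m1 = bool(np.all(np.maximum(P(f), 0) <= P(fp) + 1e-12))       # (M1)
m2 = bool(np.all(np.maximum(-P(f), 0) <= P(fm) + 1e-12))      # (M2)
m3 = bool(np.all(np.abs(P(f)) <= P(np.abs(f)) + 1e-12))       # (M3)
m4 = bool(np.sum(np.abs(P(f))) <= np.sum(np.abs(f)) + 1e-12)  # (M4)
```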


Inequality (3.1.6) is extremely important, and any operator P that satisfies it is called a contraction. The actual inequality (3.1.6) is known as the contractive property of P. To illustrate its power, note that for any f ∈ L^1 we have

    ||P^n f|| ≤ ||P^{n-1} f|| ≤ ⋯ ≤ ||f||

and, thus, for any two f_1, f_2 ∈ L^1, f_1 ≠ f_2,

    ||P^n f_1 - P^n f_2|| = ||P^n(f_1 - f_2)|| ≤ ||P^{n-1}(f_1 - f_2)|| = ||P^{n-1} f_1 - P^{n-1} f_2||.    (3.1.7)

Inequality (3.1.7) simply states that during the process of iteration of two individual functions the distance between them can decrease, but it can never increase. This is referred to as the stability property of iterates of Markov operators.
By the support of a function g we simply mean the set of all x such that g(x) ≠ 0, that is,

    supp g = {x : g(x) ≠ 0}.    (3.1.8)

Remark 3.1.2. This is not the customary definition of the support of a function, which is usually defined by

    supp g = closure{x : g(x) ≠ 0}.    (3.1.9)

But, because the customary definition (3.1.9) requires the introduction of topological notions not used elsewhere, we have presented the slightly unusual definition (3.1.8). □

Remark 3.1.3. If g is an element of L^p, then the set (3.1.8) is not defined in a completely unique manner, since g may be represented by functions that differ on a set of measure zero. This inaccuracy never leads to any difficulties in calculating measures and integrals. Thus, it is customary to simplify the terminology and to talk about the supports of elements from L^p as if we were speaking of sets. However, if we want to emphasize that a relation between sets does not hold precisely but may be violated on a set of measure zero, we say that it holds modulo zero. Thus A = B modulo

zero means that the set of x in A that does not belong to B, or vice versa, has measure zero. □
One might wonder under what conditions the contractive property (3.1.6) is a strong inequality. The answer is quite simple.

Proposition 3.1.2. ||Pf|| = ||f|| if and only if Pf^+ and Pf^- have disjoint supports.

Proof. We start from the inequality

    |Pf^+(x) - Pf^-(x)| ≤ Pf^+(x) + Pf^-(x).

Clearly the inequality will be strong if both Pf^+(x) > 0 and Pf^-(x) > 0, while the equality holds if Pf^+(x) = 0 or Pf^-(x) = 0. Thus, by integrating over the space X, we have

    ∫_X |Pf^+(x) - Pf^-(x)| μ(dx) = ∫_X [Pf^+(x) + Pf^-(x)] μ(dx)

if and only if there is no set A ∈ A, μ(A) > 0, such that Pf^+(x) > 0 and Pf^-(x) > 0 for x ∈ A, that is, Pf^+ and Pf^- have disjoint supports. Since f = f^+ - f^-, the left-hand integral is simply ||Pf||. Further, the right-hand side is ||Pf^+|| + ||Pf^-|| = ||f^+|| + ||f^-|| = ||f||, so the proposition is proved.
Having developed some of the more important elementary properties of Markov operators, we now introduce the concept of a fixed point of P.

Definition 3.1.2. If P is a Markov operator and, for some f ∈ L^1, Pf = f, then f is called a fixed point of P.

From Proposition 3.1.1 it is easy to show the following.

Proposition 3.1.3. If Pf = f, then Pf^+ = f^+ and Pf^- = f^-.

Proof. Note that from Pf = f we have

    Pf^+ - Pf^- = f^+ - f^-,

hence

    ∫_X [Pf^+(x) - f^+(x)] μ(dx) + ∫_X [Pf^-(x) - f^-(x)] μ(dx)
        = ∫_X [Pf^+(x) + Pf^-(x)] μ(dx) - ∫_X [f^+(x) + f^-(x)] μ(dx)
        = ∫_X P|f(x)| μ(dx) - ∫_X |f(x)| μ(dx) = || P|f| || - || |f| ||.


However, by the contractive property of P we know that

    || P|f| || - || |f| || ≤ 0.

Since both the integrands (Pf^+ - f^+) and (Pf^- - f^-) are nonnegative, this last inequality is possible only if Pf^+ = f^+ and Pf^- = f^-.
Definition 3.1.3. Let (X, A, μ) be a measure space and the set D(X, A, μ) be defined by

    D(X, A, μ) = {f ∈ L^1(X, A, μ) : f ≥ 0 and ||f|| = 1}.

Any function f ∈ D(X, A, μ) is called a density.

Definition 3.1.4. If f ∈ L^1(X, A, μ) and f ≥ 0, then the measure

    μ_f(A) = ∫_A f(x) μ(dx)

is said to be absolutely continuous with respect to μ, and f is called the Radon-Nikodym derivative of μ_f with respect to μ. In the special case that f ∈ D(X, A, μ), we also say that f is the density of μ_f and that μ_f is a normalized measure.

From Corollary 2.2.1 it follows that a normalized measure ν is absolutely continuous with respect to μ if ν(A) = 0 whenever μ(A) = 0. This property is often used as the definition of an absolutely continuous measure.
Using the notion of densities we may extend the concept of a fixed point of a Markov operator with the following definition.

Definition 3.1.5. Let (X, A, μ) be a measure space and P be a Markov operator. Any f ∈ D that satisfies Pf = f is called a stationary density of P.

The concept of a stationary density of an operator is extremely important and plays a central role in many of the sections that follow.
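For a concrete picture, return to the finite counting-measure case, where a Markov operator is a column-stochastic matrix (the matrix below is an illustrative choice of our own). Iterating P on an initial density converges, for this matrix, to a stationary density f with Pf = f:

```python
import numpy as np

# A Markov operator on X = {1,2,3} with counting measure; columns sum to 1,
# so ||Pf|| = ||f|| for f >= 0.
Pmat = np.array([[0.5, 0.2, 0.3],
                 [0.3, 0.6, 0.1],
                 [0.2, 0.2, 0.6]])

f = np.array([1.0, 0.0, 0.0])          # an initial density
for _ in range(500):
    f = Pmat @ f                       # iterate the Markov operator
# f is now (numerically) a stationary density: Pf = f, f >= 0, ||f|| = 1
```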

3.2 The Frobenius-Perron Operator

Having developed the concept of Markov operators and some of their properties, we are in a position to examine a special class of Markov operators, the Frobenius-Perron operator, which we introduced intuitively in Chapter 1.
We start with the following definitions.

Definition 3.2.1. Let (X, A, μ) be a measure space. A transformation S: X → X is measurable if

    S^{-1}(A) ∈ A    for all A ∈ A.


Definition 3.2.2. A measurable transformation S: X → X on a measure space (X, A, μ) is nonsingular if μ(S^{-1}(A)) = 0 for all A ∈ A such that μ(A) = 0.

Before stating a precise definition of the Frobenius-Perron operator, consider the following. Assume that a nonsingular transformation S: X → X on a measure space is given. We define an operator P: L^1 → L^1 in two steps.

1. Let f ∈ L^1 and f ≥ 0. Write

    ∫_{S^{-1}(A)} f(x) μ(dx).    (3.2.1)

Because

    S^{-1}( ∪_k A_k ) = ∪_k S^{-1}(A_k),

it follows from property (L5) of the Lebesgue integral that the integral (3.2.1) defines a finite measure. Thus, by Corollary 2.2.1, there is a unique element in L^1, which we denote by Pf, such that

    ∫_A Pf(x) μ(dx) = ∫_{S^{-1}(A)} f(x) μ(dx)    for A ∈ A.

2. Now let f ∈ L^1 be arbitrary, that is, not necessarily nonnegative. Write f = f^+ - f^- and define

    Pf = Pf^+ - Pf^-.

From this definition we have

    ∫_A Pf(x) μ(dx) = ∫_{S^{-1}(A)} f^+(x) μ(dx) - ∫_{S^{-1}(A)} f^-(x) μ(dx)

or, more completely,

    ∫_A Pf(x) μ(dx) = ∫_{S^{-1}(A)} f(x) μ(dx)    for A ∈ A.    (3.2.2)

From Proposition 2.2.1 and the nonsingularity of S, it follows that equation (3.2.2) uniquely defines P.
We summarize these comments as follows.

Definition 3.2.3. Let (X, A, μ) be a measure space. If S: X → X is a nonsingular transformation, the unique operator P: L^1 → L^1 defined by equation (3.2.2) is called the Frobenius-Perron operator corresponding to S.
It is straightforward to show from (3.2.2) that P has the following properties:

(FP1) P(λ_1 f_1 + λ_2 f_2) = λ_1 P f_1 + λ_2 P f_2 for all f_1, f_2 ∈ L^1, λ_1, λ_2 ∈ R, so P is a linear operator;    (3.2.3)

(FP2) Pf ≥ 0 if f ≥ 0;    (3.2.4)

(FP3) ∫_X Pf(x) μ(dx) = ∫_X f(x) μ(dx);    (3.2.5)

and

(FP4) if S_n = S ∘ ⋯ ∘ S is the n-fold composition of S with itself, and P_n is the Frobenius-Perron operator corresponding to S_n, then P_n = P^n, where P is the Frobenius-Perron operator corresponding to S.

Remark 3.2.1. Although the definition of the Frobenius-Perron operator P by (3.2.2) is given by a quite abstract mathematical theorem of Radon-Nikodym, it should be realized that it describes the evolution of f by a transformation S. Properties (3.2.4)-(3.2.5) of the transformed distribution Pf(x) are exactly what one would expect on intuitive grounds. □

Remark 3.2.2. From the preceding section, the Frobenius-Perron operator is also a Markov operator.

As we wish to emphasize the close connection between the behavior of stochastic systems and the chaotic behavior of deterministic systems, we will formulate concepts and results for Markov operators wherever possible. The Frobenius-Perron operator is a particular Markov operator, and thus any property of Markov operators is immediately applicable to the Frobenius-Perron operator.
In some special cases equation (3.2.2) allows us to obtain an explicit form for Pf. As we showed in Chapter 1, if X = [a, b] is an interval on the real line R, and A = [a, x], then (3.2.2) becomes

    ∫_a^x Pf(s) ds = ∫_{S^{-1}([a,x])} f(s) ds,

and by differentiating,

    Pf(x) = (d/dx) ∫_{S^{-1}([a,x])} f(s) ds.    (3.2.6)

It is important to note that in the special case where the transformation S is differentiable and invertible, an explicit form for Pf is available. If S is differentiable and invertible, then S must be monotone. Suppose S is an increasing function and S^{-1} has a continuous derivative. Then

    S^{-1}([a, x]) = [S^{-1}(a), S^{-1}(x)],

and from (3.2.6),

    Pf(x) = (d/dx) ∫_{S^{-1}(a)}^{S^{-1}(x)} f(s) ds = f(S^{-1}(x)) (d/dx)[S^{-1}(x)].


FIGURE 3.2.1. Operation of the Frobenius-Perron operator corresponding to S(x) = e^x, x ∈ R. (a) An initial density f(x) = (1/2) 1_{[-1,1]}(x) is transformed into the density Pf(x) = (2x)^{-1} 1_{[e^{-1},e]}(x) by S, as shown in (b).

If S is decreasing, then the sign of the right-hand side is reversed. Thus, in the general one-dimensional case, for S differentiable and invertible with continuous dS^{-1}/dx,

    Pf(x) = f(S^{-1}(x)) |(d/dx) S^{-1}(x)|.    (3.2.7)

Example 3.2.1. To see how the Frobenius-Perron operator works, pick S(x) = exp(x). Thus from (3.2.7), we have

    Pf(x) = (1/x) f(ln x).

Consider what happens to an initial f given by

    f(x) = (1/2) 1_{[-1,1]}(x),

shown in Figure 3.2.1a. Under the action of P, the function f is carried into

    Pf(x) = (1/(2x)) 1_{[e^{-1},e]}(x),

as shown in Figure 3.2.1b. □

Two important points are illustrated by this example. The first is that for an initial f supported on a set [a, b], Pf will be supported on [S(a), S(b)]. Second, Pf is small where dS/dx is large, and vice versa.
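The closed form Pf(x) = (2x)^{-1} 1_{[e^{-1},e]}(x) of Example 3.2.1 can be cross-checked by Monte Carlo: draw samples from the initial density, push them through S(x) = e^x, and compare a normalized histogram with the predicted density (the sample size and bin count are arbitrary choices for this sketch):

```python
import numpy as np

rng = np.random.default_rng(2)

# Samples distributed with density f(x) = (1/2) 1_[-1,1](x)
xs = rng.uniform(-1.0, 1.0, 200_000)
ys = np.exp(xs)                          # push forward through S(x) = e^x

# Empirical density of the transformed samples on [e^-1, e]
bins = np.linspace(np.exp(-1.0), np.exp(1.0), 21)
hist, edges = np.histogram(ys, bins=bins, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

predicted = 1.0 / (2.0 * centers)        # Pf(x) = (2x)^{-1} on [e^-1, e]
# The histogram tracks the predicted density up to sampling/binning error
```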
We generalize the first observation as follows.

Proposition 3.2.1. Let S: X → X be a nonsingular transformation and P the associated Frobenius-Perron operator. Assume that an f ≥ 0, f ∈ L^1, is given. Then

    supp f ⊂ S^{-1}(supp Pf)    (3.2.8)


and, more generally, for every set A ∈ A the following equivalence holds: Pf(x) = 0 for x ∈ A if and only if f(x) = 0 for x ∈ S^{-1}(A).

Proof. The proof is straightforward. By the definition of the Frobenius-Perron operator, we have

    ∫_A Pf(x) μ(dx) = ∫_{S^{-1}(A)} f(x) μ(dx)

or

    ∫_X 1_A(x) Pf(x) μ(dx) = ∫_X 1_{S^{-1}(A)}(x) f(x) μ(dx).

Thus Pf(x) = 0 on A implies, by property (L2) of the Lebesgue integral, that f(x) = 0 for x ∈ S^{-1}(A), and vice versa. Now setting A = X \ supp(Pf), we have Pf(x) = 0 for x ∈ A and, consequently, f(x) = 0 for x ∈ S^{-1}(A), which means that supp f ⊂ X \ S^{-1}(A). Since S^{-1}(A) = X \ S^{-1}(supp(Pf)), this completes the proof.
Remark 3.2.3. In the case of arbitrary f ∈ L^1, in Proposition 3.2.1 we only have: If f(x) = 0 for all x ∈ S^{-1}(A), then Pf(x) = 0 for all x ∈ A. That the converse is not true can be seen by the following example. Take S(x) = 2x (mod 1) and let

    f(x) = 1     for 0 ≤ x < 1/2,
    f(x) = -1    for 1/2 ≤ x ≤ 1.

Then from (1.2.13), Pf(x) = 0 for all x ∈ [0, 1], but f(x) ≠ 0 for any x ∈ [0, 1].

For a second important case consider the rectangle X = [a, b] × [c, d] in the plane R^2. Set A = [a, x] × [c, y] so that (3.2.2) now becomes

    ∫_a^x ds ∫_c^y Pf(s, t) dt = ∫∫_{S^{-1}([a,x]×[c,y])} f(s, t) ds dt.

Differentiating first with respect to x and then with respect to y, we have immediately that

    Pf(x, y) = (∂^2/∂y∂x) ∫∫_{S^{-1}([a,x]×[c,y])} f(s, t) ds dt.

Analogous formulas can be derived in the case of X ⊂ R^d.


In the general case, where X = R^d and S: X → X is invertible, we can derive an interesting and useful generalization of equation (3.2.7). To do


this we first state and prove a change of variables theorem based on the Radon-Nikodym theorem.

Theorem 3.2.1. Let (X, A, μ) be a measure space, S: X → X a nonsingular transformation, and f: X → R a measurable function such that f ∘ S ∈ L^1(X, A, μ). Then for every A ∈ A,

    ∫_{S^{-1}(A)} f(S(x)) μ(dx) = ∫_A f(x) μS^{-1}(dx) = ∫_A f(x) J^{-1}(x) μ(dx),

where μS^{-1} denotes the measure

    μS^{-1}(B) = μ(S^{-1}(B))    for B ∈ A,

and J^{-1} is the density of μS^{-1} with respect to μ, that is,

    μS^{-1}(B) = ∫_B J^{-1}(x) μ(dx)    for B ∈ A.

Remark 3.2.4. We use the notation J^{-1}(x) to draw the connection with differentiable invertible transformations on R^d, in which case J(x) is the absolute value of the determinant of the Jacobian matrix:

    J(x) = |dS(x)/dx|    or    J^{-1}(x) = |dS^{-1}(x)/dx|.

Proof of Theorem 3.2.1. To prove this change of variables theorem, we recall Remark 2.2.6 and first take f(x) = 1_B(x), so that f(S(x)) = 1_B(S(x)) = 1_{S^{-1}(B)}(x) and, hence,

    ∫_{S^{-1}(A)} f(S(x)) μ(dx) = ∫_X 1_{S^{-1}(A)}(x) f(S(x)) μ(dx)
        = ∫_X 1_{S^{-1}(A)}(x) 1_{S^{-1}(B)}(x) μ(dx)
        = μ(S^{-1}(A) ∩ S^{-1}(B)) = μ(S^{-1}(A ∩ B)).

The second integral of the theorem may be written as

    ∫_A f(x) μS^{-1}(dx) = ∫_X 1_A(x) 1_B(x) μS^{-1}(dx) = μ(S^{-1}(A ∩ B)),

whereas the third and last integral has the form

    ∫_A f(x) J^{-1}(x) μ(dx) = ∫_X 1_A(x) 1_B(x) J^{-1}(x) μ(dx) = ∫_{A∩B} J^{-1}(x) μ(dx) = μ(S^{-1}(A ∩ B)).


Thus we have shown that the theorem is true for functions of the form f(x) = 1_B(x). To complete the proof we need only to repeat it for simple functions f(x), which will certainly be true by linearity [property (L3)] of the Lebesgue integral. Finally, we may pass to the limit for an arbitrary bounded and integrable function f. [Note that f bounded is required for the integrability of f(x)J^{-1}(x).]
With this change of variables theorem it is easy to prove the following extension of equation (3.2.7).

Corollary 3.2.1. Let (X, A, μ) be a measure space, S: X → X an invertible nonsingular transformation (S^{-1} nonsingular) and P the associated Frobenius-Perron operator. Then for every f ∈ L^1,

    Pf(x) = f(S^{-1}(x)) J^{-1}(x).    (3.2.9)

Proof. By the definition of P, for A ∈ A we have

    ∫_A Pf(x) μ(dx) = ∫_{S^{-1}(A)} f(x) μ(dx).

Change the variables in the right-hand integral with y = S(x), so that

    ∫_{S^{-1}(A)} f(x) μ(dx) = ∫_A f(S^{-1}(y)) J^{-1}(y) μ(dy)

by Theorem 3.2.1. Thus we have

    ∫_A Pf(x) μ(dx) = ∫_A f(S^{-1}(y)) J^{-1}(y) μ(dy),

with the result that, by Proposition 2.2.1, equation (3.2.9) holds.
3.3 The Koopman Operator

To close this chapter, we define a third type of operator closely related to the Frobenius-Perron operator.

Definition 3.3.1. Let (X, A, μ) be a measure space, S: X → X a nonsingular transformation, and f ∈ L^∞. The operator U: L^∞ → L^∞ defined by

    Uf(x) = f(S(x))    (3.3.1)

is called the Koopman operator with respect to S.

This operator was first introduced by Koopman [1931]. Due to the nonsingularity of S, U is well defined, since f_1(x) = f_2(x) a.e. implies f_1(S(x)) = f_2(S(x)) a.e. The operator U has some important properties:

(K1) U(λ_1 f_1 + λ_2 f_2) = λ_1 U f_1 + λ_2 U f_2 for all f_1, f_2 ∈ L^∞, λ_1, λ_2 ∈ R;    (3.3.2)

(K2) for every f ∈ L^∞,

    ||Uf||_{L^∞} ≤ ||f||_{L^∞},    (3.3.3)

that is, U is a contraction of L^∞;

(K3) for every f ∈ L^1, g ∈ L^∞,

    (Pf, g) = (f, Ug),    (3.3.4)

so that U is adjoint to the Frobenius-Perron operator P.


Property (K1) is trivial to check. Further, property (K2) follows immediately from the definition of the norm, since |f(x)| ≤ ||f||_{L^∞} a.e. implies |f(S(x))| ≤ ||f||_{L^∞} a.e. The latter inequality gives equation (3.3.3) since, by (3.3.1), Uf(x) = f(S(x)).
Finally, to obtain (K3) we first check it with g = 1_A. Then the left-hand side of (3.3.4) becomes

    (Pf, g) = ∫_X Pf(x) 1_A(x) μ(dx) = ∫_A Pf(x) μ(dx),

while the right-hand side becomes

    (f, Ug) = ∫_X f(x) U1_A(x) μ(dx) = ∫_X f(x) 1_A(S(x)) μ(dx) = ∫_{S^{-1}(A)} f(x) μ(dx).

Thus (K3) is equivalent to

    ∫_A Pf(x) μ(dx) = ∫_{S^{-1}(A)} f(x) μ(dx),

which is the equation defining Pf. Because (K3) is true for g(x) = 1_A(x), it is true for any simple function g(x). Thus, by Remark 2.2.6, property (K3) must be true for all g ∈ L^∞.
With the Koopman operator it is easy to prove that the Frobenius-Perron operator is weakly continuous. Precisely, this means that for every sequence {f_n} ⊂ L^1 the condition

    f_n → f weakly    (3.3.5)

implies

    Pf_n → Pf weakly.    (3.3.6)

To show this note that by property (K3) we have

    (Pf_n, g) = (f_n, Ug)    for g ∈ L^∞.

Furthermore, from (3.3.5) it follows that (f_n, Ug) converges to (f, Ug) = (Pf, g), which means that Pf_n converges weakly to Pf.
The same proof can be carried out for an arbitrary Markov operator P (or, even more generally, for every bounded linear operator). In this case we must use the fact that for every Markov operator there exists a unique adjoint operator P*: L^∞ → L^∞ that satisfies

    (Pf, g) = (f, P*g)    for f ∈ L^1, g ∈ L^∞.
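The adjointness relation (K3) is transparent on a finite set with counting measure: a map S induces Uf = f ∘ S, while P moves the mass sitting at point i to S(i). This sketch (the map and the vectors are random, of our own choosing) verifies ⟨Pf, g⟩ = ⟨f, Ug⟩:

```python
import numpy as np

rng = np.random.default_rng(3)

# A (generally noninvertible) map S on X = {0,...,N-1} with counting measure
N = 8
S = rng.integers(0, N, size=N)           # S(i) = S[i]

def U(g):
    # Koopman operator: (Ug)(i) = g(S(i))
    return g[S]

def P(f):
    # Frobenius-Perron operator: the mass f_i at i is moved to S(i)
    out = np.zeros(N)
    np.add.at(out, S, f)
    return out

f = rng.standard_normal(N)
g = rng.standard_normal(N)
lhs, rhs = float(np.dot(P(f), g)), float(np.dot(f, U(g)))   # <Pf,g> and <f,Ug>
```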

Exercises

3.1. The differential equation

    u'' - u + f(x) = 0,

with the boundary value conditions

    u'(0) = u'(1) = 0,

for every f ∈ L^1([0, 1]) has a unique solution u(x) defined for 0 ≤ x ≤ 1. Show that the mapping that adjoins the solution u to f is a Markov operator on L^1([0, 1]). This can be done without looking for the explicit formula for u.
3.2. Find the Frobenius-Perron operator P corresponding to the following transformations:

(a) S: [0, 1] → [0, 1], S(x) = 4x^2(1 - x^2);

(b) S: [0, 1] → [0, 1], S(x) = sin πx;

(c) S: R → R, S(x) = a tan(bx + c).

In (c) observe that the values of S(x) for bx + c = π/2 + nπ are irrelevant for the calculation of P.
3.3. Consider the set X = {1, ..., N} with the counting measure. Prove that any Markov operator P: L^1(X) → L^1(X) is given by a formula

    (Pf)_i = Σ_{j=1}^{N} p_{ij} f_j,    i = 1, ..., N,

where (p_{ij}) is a stochastic matrix, that is,

    p_{ij} ≥ 0,    Σ_{i=1}^{N} p_{ij} = 1,

and f_i stands for f(i).

3.4. A mapping S: [0, 1] → [0, 1] is called a generalized tent transformation if S(x) = S(1 - x) for 0 ≤ x ≤ 1 and if S(x) is strictly increasing for 0 ≤ x ≤ 1/2. Show that there is a unique generalized tent transformation [given by (6.5.9)] for which the standard Borel measure is invariant.

3.5. Generalize the previous result by showing that for every absolutely continuous measure μ on [0, 1] with positive density (dμ/dx > 0 a.e.) there is a unique generalized tent transformation S such that μ is invariant with respect to S.

3.6. Let (X, A, μ) be a measure space. A Markov operator P: L^1(X) → L^1(X) is called deterministic if its adjoint U = P* has the following property: For every A ∈ A the function U1_A is a characteristic function, that is, U1_A = 1_B for some B ∈ A. Show that the Frobenius-Perron operator is a deterministic operator.

3.7. Let X= {1, ... ,N} be a measure space with the counting measure
considered in Exercise 3.3. Describe a general form of the matrix (p,;) which
corresponds to a deterministic operator.
3.8. Let P,: L 1 -+ L 1 , i = 1, 2, denote deterministic Markov operators. Are
the operators P 1 P 2 and o:P1 + {1- o:)P2 , 0 < o: < 1, also deterministic?
3.9. Let X
mula

= [0, 1]. Show that P: L 1 {[0, 1]) -+ L 1 ([0, 1]) given by the forPf(x) =-1 f(x)
2

+-41 f (X)
- +-1 f (X
- +-1)
2
4
2 2

is not a deterministic Markov operator.

3.10. Let $P\colon L^1 \to L^1$ be a Markov operator. Prove that for every nonnegative $f, g \in L^1$ the condition $\operatorname{supp} f \subset \operatorname{supp} g$ implies $\operatorname{supp} Pf \subset \operatorname{supp} Pg$.

4
Studying Chaos with Densities

Here we introduce the concept of measure-preserving transformations and
then define and illustrate three levels of irregular behavior that such transformations can display. These three levels are known as ergodicity, mixing,
and exactness. The central theme of the chapter is to show the utility of the
Frobenius-Perron and Koopman operators in the study of these behaviors.
All these basic notions arise in ergodic theory. Roughly speaking, preservation of an initial measure $\mu$ by a transformation corresponds to the fact
that the constant density $f(x) \equiv 1$ is a stationary density of the Frobenius-Perron operator, $P1 = 1$. Ergodicity corresponds to the fact that $f(x) \equiv 1$
is the unique stationary density of the Frobenius-Perron operator. Finally,
mixing and exactness correspond to two different kinds of stability of the
stationary density $f(x) \equiv 1$.
In Section 4.5, we briefly introduce Kolmogorov automorphisms, which
are closely related to exact transformations. This section is only of a reference nature and, therefore, all proofs are omitted and the examples are
treated superficially.

4.1 Invariant Measures and Measure-Preserving Transformations

We start with a definition.

Definition 4.1.1. Let $(X, \mathcal{A}, \mu)$ be a measure space and $S\colon X \to X$ a

measurable transformation. Then S is said to be measure preserving if

$$\mu(S^{-1}(A)) = \mu(A) \qquad \text{for all } A \in \mathcal{A}.$$

Since the property of measure preservation is dependent on S as well as
$\mu$, we will alternately say that the measure $\mu$ is invariant under S if S is
measure preserving. Note that every measure-preserving transformation is
necessarily nonsingular.

Theorem 4.1.1. Let $(X, \mathcal{A}, \mu)$ be a measure space, $S\colon X \to X$ a nonsingular transformation, and P the Frobenius-Perron operator associated with
S. Consider a nonnegative $f \in L^1$. Then the measure $\mu_f$ given by

$$\mu_f(A) = \int_A f(x)\,\mu(dx)$$

is invariant if and only if f is a fixed point of P.

Proof. First we show the "only if" portion. Assume $\mu_f$ is invariant. Then,
by the definition of an invariant measure,

$$\mu_f(A) = \mu_f(S^{-1}(A)) \qquad \text{for all } A \in \mathcal{A},$$

or

$$\int_A f(x)\,\mu(dx) = \int_{S^{-1}(A)} f(x)\,\mu(dx) \qquad \text{for } A \in \mathcal{A}. \tag{4.1.1}$$

However, by the very definition of the Frobenius-Perron operator, we have

$$\int_{S^{-1}(A)} f(x)\,\mu(dx) = \int_A Pf(x)\,\mu(dx) \qquad \text{for } A \in \mathcal{A}. \tag{4.1.2}$$

Comparing (4.1.1) with (4.1.2) we immediately have $Pf = f$.


Conversely, if $Pf = f$ for some $f \in L^1$, $f \ge 0$, then from the definition
of the Frobenius-Perron operator equation (4.1.1) follows, and thus $\mu_f$ is
invariant.

Remark 4.1.1. Note that the original measure $\mu$ is invariant if and only
if $P1 = 1$. □
Example 4.1.1. Consider the r-adic transformation originally introduced
in Example 1.2.1,

$$S(x) = rx \pmod 1,$$

where $r > 1$ is an integer, on the measure space $([0,1], \mathcal{B}, \mu)$ where $\mathcal{B}$ is the
Borel $\sigma$-algebra and $\mu$ is the Borel measure (cf. Remark 2.1.3). As we have
shown in Example 1.2.1, for any interval $[0,x] \subset [0,1]$

$$S^{-1}([0,x]) = \bigcup_{i=0}^{r-1} \left[\frac{i}{r},\ \frac{i}{r} + \frac{x}{r}\right]$$

and the Frobenius-Perron operator P corresponding to S is given by equation (1.2.13):

$$Pf(x) = \frac{1}{r} \sum_{i=0}^{r-1} f\left(\frac{x}{r} + \frac{i}{r}\right).$$

Thus

$$P1 = \frac{1}{r} \sum_{i=0}^{r-1} 1 = 1$$

and by our previous remark the Borel measure is invariant under the r-adic
transformation. □
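The stationarity of the constant density is easy to confirm numerically. A small sketch (the grid, the value $r = 3$, and the test density are arbitrary choices):

```python
import numpy as np

def fp_radic(f, x, r):
    """Frobenius-Perron operator of S(x) = r x (mod 1), equation (1.2.13):
    Pf(x) = (1/r) * sum_{i=0}^{r-1} f((x + i)/r)."""
    return sum(f((x + i) / r) for i in range(r)) / r

x = (np.arange(100_000) + 0.5) / 100_000      # midpoint grid on [0,1]

# P1 = 1: the constant density is stationary, so the Borel (Lebesgue)
# measure is invariant under the r-adic transformation.
one = lambda t: np.ones_like(t)
assert np.allclose(fp_radic(one, x, r=3), 1.0)

# P also preserves integrals of arbitrary densities (Markov property).
f = lambda t: 2.0 * t
assert abs(fp_radic(f, x, r=3).mean() - f(x).mean()) < 1e-9
```
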

Remark 4.1.2. It should be noted that, as defined, the r-adic transformation is not continuous at the points $x = i/r$, $i = 1, \ldots, r-1$. However, if instead of defining the r-adic
transformation on the interval [0,1] we define it on the unit circle (circle
with circumference of 1) obtained by identifying 0 with 1 on the interval
[0,1], then it is continuous and differentiable throughout. □
Example 4.1.2. Again consider the measure space $([0,1], \mathcal{B}, \mu)$ where $\mu$
is the Borel measure. Let $S\colon [0,1] \to [0,1]$ be the quadratic map ($S(x) =
4x(1-x)$ of Chapter 1). As was shown there, for $[0,x] \subset [0,1]$,

$$S^{-1}([0,x]) = \left[0,\ \tfrac{1}{2} - \tfrac{1}{2}\sqrt{1-x}\right] \cup \left[\tfrac{1}{2} + \tfrac{1}{2}\sqrt{1-x},\ 1\right]$$

and the Frobenius-Perron operator is given by

$$Pf(x) = \frac{1}{4\sqrt{1-x}} \left\{ f\left(\tfrac{1}{2} - \tfrac{1}{2}\sqrt{1-x}\right) + f\left(\tfrac{1}{2} + \tfrac{1}{2}\sqrt{1-x}\right) \right\}.$$

Clearly,

$$P1 = \frac{1}{2\sqrt{1-x}},$$

so that the Borel measure $\mu$ is not invariant under S by Remark 4.1.1. To
find an invariant measure we must find a solution to the equation $Pf = f$,
or

$$f(x) = \frac{1}{4\sqrt{1-x}} \left\{ f\left(\tfrac{1}{2} - \tfrac{1}{2}\sqrt{1-x}\right) + f\left(\tfrac{1}{2} + \tfrac{1}{2}\sqrt{1-x}\right) \right\}.$$

This problem was first solved by Ulam and von Neumann [1947], who showed
that the solution is given by

$$f_*(x) = \frac{1}{\pi\sqrt{x(1-x)}}, \tag{4.1.3}$$

which justifies our assertion in Section 1.2. It is straightforward to show
that $f_*$ as given by (4.1.3) does, indeed, constitute a solution to $Pf = f$.
Hence the measure

$$\mu_{f_*}(A) = \int_A \frac{\mu(dx)}{\pi\sqrt{x(1-x)}}$$

is invariant under the quadratic transformation $S(x) = 4x(1-x)$. □


Remark 4.1.3. The factor of $\pi$ appearing in equation (4.1.3) ensures that
$f_*$ is a density and thus that the measure $\mu_{f_*}$ is normalized. □
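The claim that (4.1.3) is a fixed point of P is easy to verify numerically. A sketch (the grid is an arbitrary choice; the endpoints are avoided because $f_*$ is singular there):

```python
import numpy as np

def fp_quadratic(f, x):
    """Frobenius-Perron operator of S(x) = 4x(1-x) on [0,1]."""
    u = np.sqrt(1.0 - x)
    return (f(0.5 - 0.5 * u) + f(0.5 + 0.5 * u)) / (4.0 * u)

f_star = lambda x: 1.0 / (np.pi * np.sqrt(x * (1.0 - x)))   # equation (4.1.3)

x = np.linspace(0.01, 0.99, 99)
# The Ulam-von Neumann density is a fixed point of P ...
assert np.allclose(fp_quadratic(f_star, x), f_star(x))
# ... while the constant density is not: P1 = 1/(2 sqrt(1-x)).
one = lambda t: np.ones_like(t)
assert np.allclose(fp_quadratic(one, x), 0.5 / np.sqrt(1.0 - x))
```
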
Example 4.1.3 (The baker transformation). Now let X be the unit
square in the plane, which we denote by X = [0,1] × [0,1] (see Section 2.2).
The Borel $\sigma$-algebra $\mathcal{B}$ is now generated by all possible rectangles of the
form [0,a] × [0,b], and the Borel measure $\mu$ is the unique measure on $\mathcal{B}$ such
that

$$\mu([0,a] \times [0,b]) = ab.$$

(Thus the Borel measure is a generalization of the concept of area.) We
define a transformation $S\colon X \to X$ by

$$S(x,y) = \begin{cases} \left(2x,\ \tfrac{1}{2}y\right), & 0 \le x < \tfrac{1}{2},\ 0 \le y \le 1 \\[4pt] \left(2x - 1,\ \tfrac{1}{2}y + \tfrac{1}{2}\right), & \tfrac{1}{2} \le x \le 1,\ 0 \le y \le 1. \end{cases} \tag{4.1.4}$$

To understand the operation of this transformation, examine Figure 4.1.1,
where X is shown in Figure 4.1.1a. The first operation of S involves a
compression of X in the y direction by $\frac{1}{2}$ and a stretching of X in the x direction by a factor of 2 (Figure 4.1.1b). The transformation S is completed
by vertically dividing the compressed and stretched rectangle, shown in
Figure 4.1.1b, into two equal parts and then placing the right-hand part
on top of the left-hand part (Figure 4.1.1c). This transformation has become known as the baker transformation because it mimics some aspects
of kneading dough. From Figure 4.1.1 it is obvious that the counterimage
of any rectangle is again a rectangle or a pair of rectangles with the same
total area. Thus the baker transformation is measurable.
Now we calculate the Frobenius-Perron operator for the baker transformation. It will help to refer to Figure 4.1.2 and to note that two cases must
be distinguished: $0 \le y < \frac{1}{2}$ and $\frac{1}{2} \le y \le 1$. Thus, for the simpler case of
$0 \le y < \frac{1}{2}$ and $0 \le x \le 1$ we have

$$S^{-1}([0,x] \times [0,y]) = [0, \tfrac{1}{2}x] \times [0, 2y]$$

so from equation (3.2.9)

$$Pf(x,y) = \frac{\partial^2}{\partial x\,\partial y} \int_0^{x/2} ds \int_0^{2y} f(s,t)\,dt = f\left(\tfrac{1}{2}x,\ 2y\right), \qquad 0 \le y < \tfrac{1}{2}.$$

In the second case, for $\frac{1}{2} \le y \le 1$, we find that

$$S^{-1}([0,x] \times [0,y]) = \left([0, \tfrac{1}{2}x] \times [0,1]\right) \cup \left(\left(\tfrac{1}{2},\ \tfrac{1}{2} + \tfrac{1}{2}x\right] \times [0, 2y-1]\right)$$


FIGURE 4.1.1. Steps showing the operation of the baker transformation given in
equation (4.1.4).

hence

$$Pf(x,y) = \frac{\partial^2}{\partial x\,\partial y} \left\{ \int_0^{x/2} ds \int_0^1 f(s,t)\,dt + \int_{1/2}^{(1/2)+(x/2)} ds \int_0^{2y-1} f(s,t)\,dt \right\} = f\left(\tfrac{1}{2} + \tfrac{1}{2}x,\ 2y - 1\right), \qquad \tfrac{1}{2} \le y \le 1.$$

Thus, finally,


FIGURE 4.1.2. Two cases for calculating the counterimage of a set A by the
baker transformation.

$$Pf(x,y) = \begin{cases} f\left(\tfrac{1}{2}x,\ 2y\right), & 0 \le y < \tfrac{1}{2} \\[4pt] f\left(\tfrac{1}{2} + \tfrac{1}{2}x,\ 2y - 1\right), & \tfrac{1}{2} \le y \le 1 \end{cases} \tag{4.1.5}$$

so that $P1 = 1$, and the Borel measure is, therefore, invariant under the
baker transformation. □
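Both formula (4.1.5) and its consistency with the map (4.1.4) can be spot-checked in a few lines. A sketch (the grid, the test points, and the test function are arbitrary choices):

```python
import numpy as np

def baker(x, y):
    """The baker transformation, equation (4.1.4)."""
    return (np.where(x < 0.5, 2 * x, 2 * x - 1),
            np.where(x < 0.5, y / 2, y / 2 + 0.5))

def fp_baker(f, x, y):
    """Frobenius-Perron operator of the baker map, equation (4.1.5)."""
    return np.where(y < 0.5, f(x / 2, 2 * y), f(0.5 + x / 2, 2 * y - 1))

# P1 = 1: the Borel measure (area) is invariant.
one = lambda u, v: np.ones_like(u)
X, Y = np.meshgrid(np.linspace(0, 0.99, 50), np.linspace(0, 0.99, 50))
assert np.allclose(fp_baker(one, X, Y), 1.0)

# Since the baker map is invertible a.e., Pf(x,y) = f(S^{-1}(x,y)):
f = lambda u, v: u + 10.0 * v
assert np.isclose(fp_baker(f, 0.4, 0.3), f(0.2, 0.6))   # S(0.2, 0.6) = (0.4, 0.3)
assert np.isclose(fp_baker(f, 0.4, 0.8), f(0.7, 0.6))   # S(0.7, 0.6) = (0.4, 0.8)
```
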
Remark 4.1.4. Note that the transformation of the x-coordinate in the
baker transformation is the dyadic transformation. However, the dyadic
transformation is not 1-1, whereas the baker transformation is 1-1 a.e. Given
an $X \subset R$ and any (not necessarily invertible) one-dimensional transformation $S\colon X \to X$, we may construct a two-dimensional invertible transformation $T\colon X \times X \to X \times X$ with $0 < \beta < 1$ and

$$T(x,y) = (S(x) + y,\ \beta x).$$

As an example let $S\colon [0,1] \to [0,1]$ be the quadratic map $S(x) = 4x(1-x)$.
Then T is given by

$$T(x,y) = (4x(1-x) + y,\ \beta x),$$

which is equivalent to the Hénon map first studied by Hénon [1976]. □
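The construction in this remark is easy to make concrete. A minimal sketch (the value $\beta = 0.3$ and the test point are arbitrary choices):

```python
def make_T(S, beta):
    """Invertible planar extension T(x,y) = (S(x) + y, beta*x) of a
    (possibly non-invertible) one-dimensional map S, as in Remark 4.1.4."""
    def T(x, y):
        return S(x) + y, beta * x
    def T_inv(u, v):
        x = v / beta              # recover x from the second coordinate
        return x, u - S(x)        # then recover y from the first
    return T, T_inv

S = lambda x: 4.0 * x * (1.0 - x)       # the quadratic map; 2-to-1 on [0,1]
T, T_inv = make_T(S, beta=0.3)

# T is invertible even though S is not:
x, y = 0.2, 0.7
xb, yb = T_inv(*T(x, y))
assert abs(xb - x) < 1e-12 and abs(yb - y) < 1e-12
```

The second coordinate $\beta x$ stores the preimage point, which is exactly what makes the inverse well defined.
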

Remark 4.1.5. Our derivation of the Frobenius-Perron operator (4.1.5)
corresponding to the baker transformation is longer than it need be. Since
the baker transformation is invertible (except on the line $y = \frac{1}{2}$), and indeed

$$S^{-1}(x,y) = \begin{cases} \left(\tfrac{1}{2}x,\ 2y\right), & 0 \le x < 1,\ 0 \le y < \tfrac{1}{2} \\[4pt] \left(\tfrac{1}{2} + \tfrac{1}{2}x,\ 2y - 1\right), & 0 \le x < 1,\ \tfrac{1}{2} < y < 1, \end{cases}$$

equation (4.1.5) may be immediately obtained from Corollary 3.2.1. □

FIGURE 4.1.3. Operation of the Anosov diffeomorphism (equation (4.1.6)).

Example 4.1.4 (Anosov diffeomorphisms). The baker transformation
of the previous example may be considered to be a prototype of a very
important class of transformations originally introduced by Anosov [1963].
One of the simplest of the Anosov diffeomorphisms is given by

$$S(x,y) = (x + y,\ x + 2y) \pmod 1. \tag{4.1.6}$$

To see the effect of this transformation consult Figure 4.1.3. In the first
part (a) of the figure we depict the unit square in the plane and divide it
into four triangular areas. In Figure 4.1.3b we show how the unit square is
transformed after one application of $(x,y) \to (x+y, x+2y)$, whereas Figure
4.1.3c shows the result of the full Anosov diffeomorphism. It is clear that
the effect of this transformation will be to very quickly scramble, or mix,
various regions of the unit square. This property of mixing, also shared
by the baker transformation, is most important and is dealt with in more
detail in Section 4.3.
The determinant of the Jacobian of transformation (4.1.6) is given by

$$\det \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} = 1,$$


so that the transformation is measure preserving; we already noted this
result on geometric grounds in Figure 4.1.3. The two eigenvectors associated
with S have eigenvalues

$$\lambda_1 = \frac{3}{2} - \frac{\sqrt{5}}{2} \qquad \text{and} \qquad \lambda_2 = \frac{3}{2} + \frac{\sqrt{5}}{2},$$

hence $0 < \lambda_1 < 1 < \lambda_2$. Thus, as for the baker transformation, the Anosov
diffeomorphism involves a stretching in one direction and a corresponding
compression in the orthogonal direction.
With some patience it is possible to derive an explicit formula for the
Frobenius-Perron operator corresponding to the Anosov diffeomorphism
(4.1.6) using a technique analogous to that employed for the baker transformation of the previous example. However, we can obtain this result
immediately from Corollary 3.2.1 since the Anosov diffeomorphism is invertible. An easy calculation gives

$$S^{-1}(x,y) = (2x - y,\ y - x) \pmod 1$$

and thus

$$Pf(x,y) = f(2x - y,\ y - x), \tag{4.1.7}$$

where, as in (4.1.6), the terms $2x - y$ and $y - x$ should be interpreted
modulo 1. From (4.1.7) it is clear that $P1 = 1$, which corresponds to the
fact that S preserves the Borel measure. □
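The facts quoted in this example — the unit determinant, the eigenvalues $3/2 \mp \sqrt{5}/2$, and the inverse formula — can all be confirmed in a few lines:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0]])      # linear part of S(x,y) = (x + y, x + 2y)

# det = 1, so S preserves Lebesgue measure (area) on the torus.
assert np.isclose(np.linalg.det(A), 1.0)

# Eigenvalues 3/2 -/+ sqrt(5)/2: one contracting, one expanding direction.
lam = np.sort(np.linalg.eigvalsh(A))
assert np.allclose(lam, [1.5 - np.sqrt(5) / 2, 1.5 + np.sqrt(5) / 2])
assert 0 < lam[0] < 1 < lam[1]

# S^{-1}(x,y) = (2x - y, y - x) (mod 1) really inverts S on the torus.
S     = lambda x, y: ((x + y) % 1.0, (x + 2 * y) % 1.0)
S_inv = lambda x, y: ((2 * x - y) % 1.0, (y - x) % 1.0)
x, y = 0.37, 0.81
assert np.allclose(S_inv(*S(x, y)), (x, y))
```
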

Remark 4.1.6. Observe that, if we replace the unit square [0,1] × [0,1]
with the torus, that is, if we identify points (x, 1) with (x, 0) and (1, y) with
(0, y), then this example of an Anosov diffeomorphism becomes continuous
and differentiable, just as the r-adic transformation does when the unit
interval is replaced by the unit circle. The word diffeomorphism comes
from the fact that the Anosov transformation is invertible and that both
the transformation and its inverse are differentiable. □

4.2 Ergodic Transformations

The fact that a transformation S has an invariant measure, or that the
Frobenius-Perron operator P associated with S has a stationary density,
does not imply that S has interesting statistical properties. For example,
if S is the identity on X, that is, $S(x) = x$ for every $x \in X$, then

$$S^{-1}(A) = A \tag{4.2.1}$$

for every $A \subset X$ and, consequently, $Pf = f$ for every $f \in L^1$. This is,
of course, not an interesting transformation. However, even if (4.2.1) holds
for just one subset A of X, then the transformation S may be studied on

the sets A and $X \setminus A$ separately. To see this, assume that A is fixed and
condition (4.2.1) holds. Consider a trajectory

$$x_0,\ S(x_0),\ S^2(x_0), \ldots.$$

Equality (4.2.1) implies that S maps A into itself and no element of $X \setminus A$
is mapped into A. Thus, if $x_0 \in A$ then $S^n(x_0) \in A$ for all n, and if $x_0 \notin A$
then $S^n(x_0) \notin A$ for all n.
Example 4.2.1. A simple example is

$$S(k) = \begin{cases} k + 2 & \text{for } k = 1, \ldots, 2(N-1) \\ 1 & \text{for } k = 2N - 1 \\ 2 & \text{for } k = 2N \end{cases}$$

operating on the space X = {1, ..., 2N} with the counting measure. This
transformation can be studied separately on the sets A = {1, 3, ..., 2N-1}
and X \ A = {2, 4, ..., 2N} of odd and even integers. □
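For a finite state space, invariance of a set A ($S^{-1}(A) = A$) can be checked by brute force. A sketch with N = 4 (an arbitrary choice):

```python
def S(k, N):
    """The transformation of Example 4.2.1 on X = {1, ..., 2N}."""
    if k <= 2 * (N - 1):
        return k + 2
    return 1 if k == 2 * N - 1 else 2

N = 4
X = set(range(1, 2 * N + 1))
odd = {k for k in X if k % 2 == 1}
even = X - odd

preimage = lambda A: {k for k in X if S(k, N) in A}

# The odd and even integers are both invariant: S^{-1}(A) = A ...
assert preimage(odd) == odd and preimage(even) == even
# ... and neither is trivial, so S cannot be ergodic on X.
assert 0 < len(odd) < len(X)
```
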
Any set A satisfying (4.2.1) is called invariant. We require this equality
to be satisfied modulo zero (see Remark 3.1.3). Then we can make the
following definition.

Definition 4.2.1. Let $(X, \mathcal{A}, \mu)$ be a measure space and let a nonsingular
transformation $S\colon X \to X$ be given. Then S is called ergodic if every
invariant set $A \in \mathcal{A}$ is such that either $\mu(A) = 0$ or $\mu(X \setminus A) = 0$; that is,
S is ergodic if all invariant sets are trivial subsets of X.

From this definition it follows that any ergodic transformation S must
be studied on the entire space X. Determining ergodicity on the basis
of Definition 4.2.1 is, in general, difficult except for simple examples on
finite spaces. Thus, for example, the transformation in Example 4.2.1 is
not ergodic on the space X of integers, but it is ergodic on the sets of even
and odd integers.
In studying more interesting examples the following theorem may be of
use.

Theorem 4.2.1. Let $(X, \mathcal{A}, \mu)$ be a measure space and $S\colon X \to X$ a nonsingular transformation. Then S is ergodic if and only if, for every measurable
function $f\colon X \to R$,

$$f(S(x)) = f(x) \qquad \text{for almost all } x \in X \tag{4.2.2}$$

implies that f is constant almost everywhere.


FIGURE 4.2.1. Definition of the sets A and $B = B_1 \cup B_2$.

Proof. We first show that ergodicity implies f is constant. Assume that,
as in Figure 4.2.1, we have a function f satisfying (4.2.2) which is not
constant almost everywhere, and that S is ergodic. Then there is some r


such that the sets

$$A = \{x : f(x) \le r\} \qquad \text{and} \qquad B = \{x : f(x) > r\}$$

have positive measure. These sets are also invariant because

$$S^{-1}(A) = \{x : S(x) \in A\} = \{x : f(S(x)) \le r\} = \{x : f(x) \le r\} = A$$

and similarly for B. Because the sets A and B are nontrivial and invariant, S is not ergodic,
which is a contradiction. Thus, every f satisfying (4.2.2) must be constant.
To prove the converse, assume that S is not ergodic. Then, by Definition
4.2.1, there is a nontrivial set $A \in \mathcal{A}$ that is invariant. Set $f = 1_A$; since
A is nontrivial, f is not a constant function. Moreover, since $A = S^{-1}(A)$
we have

$$f(S(x)) = 1_A(S(x)) = 1_{S^{-1}(A)}(x) = 1_A(x) = f(x) \qquad \text{a.e.}$$

and (4.2.2) is satisfied by a nonconstant function.

Remark 4.2.1. It is clear from the proof that it is sufficient to verify
(4.2.2) only for bounded measurable functions, since in the last part of the proof
we used characteristic functions, which are bounded. □
An immediate consequence of Theorem 4.2.1 in combination with the
definition of the Koopman operator is the following corollary.

Corollary 4.2.1. Let $(X, \mathcal{A}, \mu)$ be a measure space, $S\colon X \to X$ a nonsingular transformation, and U the Koopman operator with respect to S. Then
S is ergodic if and only if all the fixed points of U are constant functions.

In addition to Theorem 4.2.1 and the preceding corollary, another result
of use in checking the ergodicity of S using the Frobenius-Perron operator
is contained in the following theorem.


Theorem 4.2.2. Let $(X, \mathcal{A}, \mu)$ be a measure space, $S\colon X \to X$ a nonsingular transformation, and P the Frobenius-Perron operator associated with
S. If S is ergodic, then there is at most one stationary density $f_*$ of P.
Further, if there is a unique stationary density $f_*$ of P and $f_*(x) > 0$ a.e.,
then S is ergodic.
Proof. To prove the first part of the theorem assume that S is ergodic and
that $f_1$ and $f_2$ are different stationary densities of P. Set $g = f_1 - f_2$, so
that $Pg = g$ by the linearity of P. Thus, by Proposition 3.1.3, $g^+$ and $g^-$
are both fixed points of P:

$$Pg^+ = g^+ \qquad \text{and} \qquad Pg^- = g^-. \tag{4.2.3}$$

Since, by assumption, $f_1$ and $f_2$ are not only different but are also densities,
we have $g^+ \not\equiv 0$ and $g^- \not\equiv 0$. Set

$$A = \operatorname{supp} g^+ = \{x : g^+(x) > 0\} \qquad \text{and} \qquad B = \operatorname{supp} g^- = \{x : g^-(x) > 0\}.$$

It is evident that A and B are disjoint sets and both have positive (nonzero)
measure. By equality (4.2.3) and Proposition 3.2.1, we have

$$A \subset S^{-1}(A) \qquad \text{and} \qquad B \subset S^{-1}(B).$$

Since A and B are disjoint sets, $S^{-1}(A)$ and $S^{-1}(B)$ are also disjoint. By
induction we, therefore, have

$$S^{-1}(A) \subset S^{-2}(A) \subset \cdots \subset S^{-n}(A)$$

and

$$S^{-1}(B) \subset S^{-2}(B) \subset \cdots \subset S^{-n}(B),$$

where $S^{-n}(A)$ and $S^{-n}(B)$ are also disjoint for all n. Now define two sets
by

$$\bar{A} = \bigcup_{n=0}^{\infty} S^{-n}(A) \qquad \text{and} \qquad \bar{B} = \bigcup_{n=0}^{\infty} S^{-n}(B).$$

These two sets $\bar{A}$ and $\bar{B}$ are also disjoint and, furthermore, they are invariant
because

$$S^{-1}(\bar{A}) = \bigcup_{n=1}^{\infty} S^{-n}(A) = \bigcup_{n=0}^{\infty} S^{-n}(A) = \bar{A}$$

and

$$S^{-1}(\bar{B}) = \bigcup_{n=1}^{\infty} S^{-n}(B) = \bigcup_{n=0}^{\infty} S^{-n}(B) = \bar{B}.$$

Neither $\bar{A}$ nor $\bar{B}$ is of measure zero, since A and B are not of measure
zero. Thus, $\bar{A}$ and $\bar{B}$ are nontrivial invariant sets, which contradicts the
ergodicity of S. Thus, the first portion of the theorem is proved.
To prove the second portion of the theorem, assume that $f_* > 0$ is the
unique density satisfying $Pf_* = f_*$ but that S is not ergodic. If S is not
ergodic, then there exists a nontrivial set A such that

$$S^{-1}(A) = A$$

and, with $B = X \setminus A$,

$$S^{-1}(B) = B.$$

With these two sets A and B, we may write $f_* = 1_A f_* + 1_B f_*$, so that

$$P(1_A f_*) + P(1_B f_*) = f_*. \tag{4.2.4}$$

The function $1_B f_*$ is equal to zero on the set $X \setminus B = A = S^{-1}(A)$. Thus,
by Proposition 3.2.1, $P(1_B f_*)$ is equal to zero in $A = X \setminus B$ and, likewise,
$P(1_A f_*)$ is equal to zero in $B = X \setminus A$. Thus, equality (4.2.4) implies that

$$1_A f_* = P(1_A f_*) \qquad \text{and} \qquad 1_B f_* = P(1_B f_*).$$

Since $f_*$ is positive on A and B, we may replace $1_A f_*$ by $f_A = 1_A f_* / \|1_A f_*\|$
and $1_B f_*$ by $f_B = 1_B f_* / \|1_B f_*\|$ in the last pair of equalities to obtain

$$f_A = P f_A \qquad \text{and} \qquad f_B = P f_B.$$

This implies that there exist two different stationary densities of P, which is in contradiction to our assumption. Thus, if there is a unique positive stationary
density $f_*$ of P, then S is ergodic.

Example 4.2.2. Consider a circle of radius 1, and let S be a rotation
through an angle $\phi$. This transformation is equivalent to the map $S\colon [0, 2\pi) \to [0, 2\pi)$ defined by

$$S(x) = x + \phi \pmod{2\pi}.$$

If $\phi$ is commensurate with $2\pi$ (that is, $\phi/2\pi$ is rational), then S is evidently
nonergodic. For example, if $\phi = \pi/3$, then the sets A and B of Figure 4.2.2
are invariant. For any $\phi = 2\pi(k/n)$, where k and n are integers, we will still
find two invariant sets A and B, each containing n parts. As n becomes large,
the intermingling of the two sets A and B becomes more complicated and
suggests that the rotational transformation S may be ergodic for $\phi/2\pi$
irrational. This does in fact hold, but it will be proved later when we have
more techniques at our disposal.

FIGURE 4.2.2. The two disjoint sets A (containing all the arcs denoted by thin
lines) and B (containing arcs marked by heavy lines) are invariant under the
rotational transformation when $\phi/2\pi$ is rational.

In this example the behavior of the trajectories is moderately regular and
insensitive to changes in the initial value. Thus, independent of whether or
not $\phi/2\pi$ is rational, if the value of $\phi$ is known precisely but the initial
condition is located between $\alpha$ and $\beta$, $x_0 \in (\alpha, \beta)$, then

$$S^n(x_0) \in (\alpha + n\phi,\ \beta + n\phi) \pmod{2\pi}$$

and all of the following points of the trajectory are known with the same
accuracy, $\beta - \alpha$. □
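The dichotomy described in this example is easy to see numerically by counting how often an orbit visits a fixed arc. A sketch (the arc (1, 2), the seed $x_0 = 0$, and $\phi = \sqrt{2}$ are arbitrary choices; the irrational case anticipates the ergodicity result proved later):

```python
import math

def orbit_fraction(phi, a, b, n=120_000, x0=0.0):
    """Fraction of the first n orbit points of x -> x + phi (mod 2*pi)
    that land in the arc (a, b)."""
    x, hits = x0, 0
    for _ in range(n):
        if a < x < b:
            hits += 1
        x = (x + phi) % (2 * math.pi)
    return hits / n

a, b = 1.0, 2.0

# Irrational rotation: the time spent in (a, b) matches the normalized
# arc length (b - a)/(2*pi), as ergodicity predicts.
assert abs(orbit_fraction(math.sqrt(2), a, b) - (b - a) / (2 * math.pi)) < 0.01

# Rational rotation phi = 2*pi/6: the orbit of 0 visits only six points,
# exactly one of which lies in (1, 2), so the fraction is 1/6.
assert abs(orbit_fraction(2 * math.pi / 6, a, b) - 1 / 6) < 1e-6
```
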
Before closing this section we state, without proof, the Birkhoff individual ergodic theorem [Birkhoff, 1931a,b].

Theorem 4.2.3. Let $(X, \mathcal{A}, \mu)$ be a measure space, $S\colon X \to X$ a measurable
transformation, and $f\colon X \to R$ an integrable function. If the measure $\mu$ is
invariant, then there exists an integrable function $f^*$ such that

$$f^*(x) = \lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} f(S^k(x)) \qquad \text{for almost all } x \in X. \tag{4.2.5}$$

Without additional assumptions the limit $f^*(x)$ is generally difficult to
determine. However, it can be shown that $f^*(x)$ satisfies

$$f^*(x) = f^*(S(x)) \qquad \text{for almost all } x \in X, \tag{4.2.6}$$

and when $\mu(X) < \infty$

$$\int_X f^*(x)\,\mu(dx) = \int_X f(x)\,\mu(dx). \tag{4.2.7}$$

Equation (4.2.6) follows directly from (4.2.5) if x is replaced by S(x). The
second property, (4.2.7), follows from the invariance of $\mu$ and equation
(4.2.5). Thus, by Theorem 3.2.1,

$$\int_X f(x)\,\mu(dx) = \int_X f(S(x))\,\mu(dx)$$

so that integrating equation (4.2.5) over X and passing to the limit yields
(4.2.7) by the Lebesgue dominated convergence theorem when f is bounded.
When f is not bounded the argument is more difficult.

Remark 4.2.2. Theorem 4.2.3 is known as the individual ergodic theorem
because it may be used to give information concerning the asymptotic
behavior of trajectories starting from a given point $x \in X$. As our emphasis
is on densities and not on individual trajectories, we will seldom use this
theorem. □
With the notion of ergodicity we may derive an important and often
quoted extension of the Birkhoff individual ergodic theorem.

Theorem 4.2.4. Let $(X, \mathcal{A}, \mu)$ be a finite measure space and $S\colon X \to X$ be
measure preserving and ergodic. Then, for any integrable f, the average of
f along the trajectory of S is equal almost everywhere to the average of f
over the space X; that is,

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} f(S^k(x)) = \frac{1}{\mu(X)} \int_X f(x)\,\mu(dx) \qquad \text{a.e.} \tag{4.2.8}$$

Proof. From (4.2.6) and Theorem 4.2.1 it follows that $f^*$ is constant almost
everywhere. Hence, from (4.2.7), we have

$$\int_X f^*(x)\,\mu(dx) = f^* \int_X \mu(dx) = f^* \mu(X) = \int_X f(x)\,\mu(dx),$$

so that

$$f^*(x) = \frac{1}{\mu(X)} \int_X f(x)\,\mu(dx) \qquad \text{a.e.}$$

Thus equation (4.2.5) of the Birkhoff theorem and the preceding formula
imply (4.2.8), and the theorem is proved.
One of the most quoted consequences of this theorem is the following.

Corollary 4.2.2. Let $(X, \mathcal{A}, \mu)$ be a finite measure space and $S\colon X \to X$
be measure preserving and ergodic. Then for any set $A \in \mathcal{A}$, $\mu(A) > 0$, and
almost all $x \in X$, the fraction of the points $\{S^k(x)\}$ in A as $k \to \infty$ is
given by $\mu(A)/\mu(X)$.

Proof. Using the characteristic function $1_A$ of A, the fraction of points
from $\{S^k(x)\}$ in A is

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} 1_A(S^k(x)).$$

However, from (4.2.8) this is simply $\mu(A)/\mu(X)$.


Remark 4.2.3. Corollary 4.2.2 says that every set of nonzero measure is
visited infinitely often by the iterates of almost every $x \in X$. This result is
a special case of the Poincaré recurrence theorem. □
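Theorem 4.2.4 can be illustrated with the quadratic map of Example 4.1.2, which is in fact ergodic with respect to the measure $\mu_{f_*}$ (a fact anticipated here and established by techniques developed later). A sketch (the seed and iteration count are arbitrary; the space average of $f(x) = x$ is $1/2$ because $f_*$ is symmetric about $x = 1/2$):

```python
def birkhoff_average(S, f, x0, n):
    """Time average (1/n) * sum_{k<n} f(S^k(x0)) along a trajectory."""
    x, total = x0, 0.0
    for _ in range(n):
        total += f(x)
        x = S(x)
    return total / n

S = lambda x: 4.0 * x * (1.0 - x)   # quadratic map, ergodic w.r.t. mu_{f*}

# The Birkhoff time average of f(x) = x agrees with the space average
# of f with respect to f_*(x) = 1/(pi*sqrt(x(1-x))), which equals 1/2.
avg = birkhoff_average(S, lambda x: x, x0=0.123456789, n=500_000)
assert abs(avg - 0.5) < 0.01
```
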

4.3 Mixing and Exactness


Mixing Transformations
The examples of the previous section show that ergodic behavior per se
need not be very complicated, and they suggest the necessity of introducing
another concept, that of mixing.

Definition 4.3.1. Let $(X, \mathcal{A}, \mu)$ be a normalized measure space, and $S\colon X \to X$ a measure-preserving transformation. S is called mixing if

$$\lim_{n \to \infty} \mu(A \cap S^{-n}(B)) = \mu(A)\mu(B) \qquad \text{for all } A, B \in \mathcal{A}. \tag{4.3.1}$$

Condition (4.3.1) for mixing has a very simple interpretation. Consider
points x belonging to the set $A \cap S^{-n}(B)$. These are the points such that
$x \in A$ and $S^n(x) \in B$. Thus, from (4.3.1), as $n \to \infty$ the measure of the set
of such points is just $\mu(A)\mu(B)$. This can be interpreted as meaning that
the fraction of points starting in A that end up in B after n iterations
(n must be a large number) is just given by the product of the measures
of A and B and is independent of the position of A and B in X.
It is easy to see that any mixing transformation must be ergodic. Assume
that $B \in \mathcal{A}$ is an invariant set, so that $B = S^{-1}(B)$ and, even further,
$B = S^{-n}(B)$ by induction. Take $A = X \setminus B$ so that $\mu(A \cap B) = \mu(A \cap S^{-n}(B)) = 0$. However, from (4.3.1), we must have

$$\lim_{n \to \infty} \mu(A \cap S^{-n}(B)) = \mu(A)\mu(B) = (1 - \mu(B))\mu(B),$$

and thus $\mu(B)$ is either 0 or 1, which proves ergodicity.
Many of the transformations considered in our examples to this point
are mixing, for example, the baker, quadratic, Anosov, and r-adic transformations. (The
rotation transformation is not mixing according to our foregoing discussion.) To illustrate the mixing property we consider the baker and r-adic
transformations in more detail.

Example 4.3.1. (See also Example 4.1.3.) In considering the baker transformation, it is relatively easy to check the mixing condition (4.3.1) for
generators of the $\sigma$-algebra $\mathcal{B}$, namely, for rectangles. Although the transformation is simple, writing the algebraic expressions for the counterimages
is tedious, and the property of mixing is easier to see pictorially. Consider
Figure 4.3.1a, where two sets A and B are represented with $\mu(B) = \frac{1}{2}$. We
take repeated counterimages of the set B by the baker transformation and
find that after n such steps, $S^{-n}(B)$ consists of $2^{n-1}$ vertical rectangles of
equal area. Eventually the measure of $A \cap S^{-n}(B)$ approaches $\mu(A)/2$, and
condition (4.3.1) is evidently satisfied. The behavior of any pair of sets A
and B is similar.
It is interesting that the baker transformation behaves in a similar fashion
if, instead of examining $S^{-n}(B)$, we look at $S^n(B)$ as shown in Figure
4.3.1b. Now we have $2^n$ horizontal rectangles after n steps and all of our
previous comments apply. So, for the baker transformation the behavior
of images and counterimages is very similar and illustrates the property of
mixing. This is not true for our next example, the dyadic transformation. □
In general, proving that a given transformation is mixing via Definition
4.3.1 is difficult. In the next section, Theorem 4.4.1 and Proposition 4.4.1,
we introduce easier and more powerful techniques for this purpose.

Example 4.3.2. (Cf. Examples 1.2.1 and 4.1.1.) To examine the mixing
property (4.3.1) for the dyadic transformation, consider Figure 4.3.2a. Now
we take the set $B = [0, b]$ and find that the nth counterimage of B consists
of $2^n$ intervals on [0,1], each of the same length. Eventually, as before, $\mu(A \cap S^{-n}(B)) \to \mu(A)\mu(B)$.
As for the baker transformation, let us consider the behavior of images of
a set B under the dyadic transformation (cf. Figure 4.3.2b). In this case,
if $B = [0, b]$, then $S(B) = [0, 2b]$ and after a finite number of iterations
$S^n(B) = [0, 1]$. The same procedure with any arbitrary set $B \subset [0,1]$ of
positive measure will show that $\mu(S^n(B)) \to 1$, and thus the behavior of
images of the dyadic transformation is different from that of the baker transformation. □
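For the dyadic transformation the measure $\mu(A \cap S^{-n}(B))$ can be computed exactly, since $S^{-n}([0,b])$ is the union of the $2^n$ intervals $[i/2^n, (i+b)/2^n]$. A sketch with $A = [0, 1/3]$ and $B = [0, 1/2]$ (arbitrary choices), showing convergence to $\mu(A)\mu(B)$:

```python
from fractions import Fraction

def intersection_measure(a, b, n):
    """Exact Lebesgue measure of [0,a] intersected with S^{-n}([0,b]) for
    the dyadic map, using S^{-n}([0,b]) = union_i [i/2^n, (i + b)/2^n]."""
    N = 2 ** n
    total = Fraction(0)
    for i in range(N):
        lo = Fraction(i, N)
        hi = lo + Fraction(b) / N
        total += max(Fraction(0), min(hi, Fraction(a)) - lo)
    return total

a, b = Fraction(1, 3), Fraction(1, 2)
vals = [intersection_measure(a, b, n) for n in range(1, 12)]

# The measures converge to mu(A)*mu(B) = 1/6, the mixing limit (4.3.1).
assert abs(vals[-1] - a * b) < Fraction(1, 1000)
assert abs(vals[0] - a * b) > abs(vals[-1] - a * b)
```
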

Exact Transformations

The behavior illustrated by images of the dyadic transformation is called
exactness, and is made precise by the following definition due to Rochlin
[1964].

Definition 4.3.2. Let $(X, \mathcal{A}, \mu)$ be a normalized measure space and $S\colon X \to X$ a measure-preserving transformation such that $S(A) \in \mathcal{A}$ for each
$A \in \mathcal{A}$. If

$$\lim_{n \to \infty} \mu(S^n(A)) = 1 \qquad \text{for every } A \in \mathcal{A},\ \mu(A) > 0, \tag{4.3.2}$$

then S is called exact.


FIGURE 4.3.1. Mixing illustrated by the behavior of counterimages and images
of a set B by the baker transformation. (a) The nth counterimage of the set B
consists of $2^{n-1}$ vertical rectangles, each of equal area. (b) Successive iterates of
the same set B result in $2^n$ horizontal rectangles after n iterations.

FIGURE 4.3.2. The behavior of counterimages and images of a set B by the
dyadic transformation. (a) Successive counterimages of a set B result, after
n such counterimages, in $2^n$ disjoint sets on [0,1]. (b) The behavior of images of a
set B generated by the dyadic transformation, which is quite different from that
for the baker transformation. (See the text for further details.)

It can be proved, although it is not easy to do so from the definition,
that exactness of S implies that S is mixing. As we have seen from the
baker transformation, the converse is not true. We defer the proof until the
next section when we have other tools at our disposal.
Condition (4.3.2) has a very simple interpretation. If we start with a set
A of initial conditions of nonzero measure, then after a large number of
iterations of an exact transformation S the points will have spread and
completely filled the space X.

Remark 4.3.1. It cannot be emphasized too strongly that invertible transformations cannot be exact. In fact, for any invertible measure-preserving
transformation S, we have $\mu(S(A)) = \mu(S^{-1}(S(A))) = \mu(A)$, and by induction $\mu(S^n(A)) = \mu(A)$, which violates (4.3.2). □
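For the dyadic transformation, condition (4.3.2) is transparent: $S([0,c]) = [0, \min(1, 2c)]$, so the images of any initial interval grow geometrically until they fill [0,1]. A sketch (the initial length $10^{-3}$ is an arbitrary choice):

```python
def image_measure(eps, n):
    """mu(S^n([0, eps])) for the dyadic map S(x) = 2x (mod 1),
    using S([0, c]) = [0, min(1, 2c)]."""
    c = eps
    for _ in range(n):
        c = min(1.0, 2.0 * c)
    return c

vals = [image_measure(1e-3, n) for n in range(12)]

# The image measures increase monotonically and reach 1 -- exactness --
# in contrast with an invertible map, where they would stay constant.
assert vals[0] == 1e-3 and vals[-1] == 1.0
assert all(u <= v for u, v in zip(vals, vals[1:]))
```
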
In this and the previous section we have defined and examined a hierarchy
of "chaotic" behaviors. However, by themselves the definitions are a bit
sterile and may not convey the full distinction between the behaviors of
ergodic, mixing, and exact transformations. To remedy this we present the
first six successive iterates of a random distribution of 1000 points in the
set X = [0,1] × [0,1] by the ergodic transformation

$$S(x,y) = \left(x + \sqrt{2},\ y + \sqrt{3}\right) \pmod 1 \tag{4.3.3}$$

in Figure 4.3.3; by the mixing transformation

$$S(x,y) = (x + y,\ x + 2y) \pmod 1 \tag{4.3.4}$$

FIGURE 4.3.3. Successive iterates of a random distribution of 1000 points in
[0, 0.1] × [0, 0.1] by the ergodic transformation (4.3.3). Note how the distribution
moves about in the space [0,1] × [0,1].

in Figure 4.3.4; and by the exact transformation

$$S(x,y) = (3x + y,\ x + 3y) \pmod 1 \tag{4.3.5}$$

in Figure 4.3.5. Techniques to prove these assertions will be developed in
the next two chapters.
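Point-cloud experiments of this kind are easy to reproduce for the linear torus maps (4.3.4) and (4.3.5). A sketch (the grid resolution, seed, and the cell-occupancy statistic are our own arbitrary choices, used here only as a crude measure of spreading):

```python
import numpy as np

rng = np.random.default_rng(2)
pts0 = rng.random((1000, 2)) * 0.1       # 1000 points in [0,0.1] x [0,0.1]

def iterate(pts, M, n):
    """Apply the torus map (x,y) -> M @ (x,y) (mod 1) n times."""
    for _ in range(n):
        pts = (pts @ M.T) % 1.0
    return pts

def occupied(pts, k=8):
    """Fraction of the k x k grid cells containing at least one point."""
    cells = np.unique((pts * k).astype(int), axis=0)
    return len(cells) / k ** 2

mixing = np.array([[1, 1], [1, 2]])      # transformation (4.3.4)
exact = np.array([[3, 1], [1, 3]])       # transformation (4.3.5)

# The initial cloud covers few cells; after several iterates of either
# map the points have spread over essentially the whole square.
assert occupied(pts0) < 0.05
assert occupied(iterate(pts0, mixing, 8)) > 0.9
assert occupied(iterate(pts0, exact, 8)) > 0.9
```
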

FIGURE 4.3.4. The effect of the mixing transformation (4.3.4) on the same initial
distribution of points used in Figure 4.3.3.

FIGURE 4.3.5. Successive applications of the exact transformation [equation
(4.3.5)]. Note the rapid spread of the initial distribution of points throughout
the phase space.

4.4 Using the Frobenius-Perron and Koopman Operators for Classifying Transformations

The concepts developed in the previous two sections for classifying various degrees of irregular behavior (ergodicity, mixing, and exactness) were
stated in terms of the behavior of sequences of sets. The proof of ergodicity,
mixing, or exactness using these definitions is difficult. Indeed, in all the
examples we gave to illustrate these concepts, no rigorous proofs were ever
given, although it is possible to do so.
In this section we reformulate the concepts of ergodicity, mixing, and
exactness in terms of the behavior of sequences of iterates of Frobenius-Perron and Koopman operators and show how they can be used to determine whether a given transformation S with an invariant measure is
ergodic, mixing, or exact. The techniques of this chapter rely heavily on
the notions of Cesàro, weak, and strong convergence, which were developed
in Section 2.3.
We will first state and prove the main theorem of this section and then
show its utility.

Theorem 4.4.1. Let $(X, \mathcal{A}, \mu)$ be a normalized measure space, $S\colon X \to X$
a measure-preserving transformation, and P the Frobenius-Perron operator
corresponding to S. Then

(a) S is ergodic if and only if the sequence $\{P^n f\}$ is Cesàro convergent
to 1 for all $f \in D$;

(b) S is mixing if and only if $\{P^n f\}$ is weakly convergent to 1 for all
$f \in D$;

(c) S is exact if and only if $\{P^n f\}$ is strongly convergent to 1 for all
$f \in D$.

Before giving the proof of Theorem 4.4.1, we note that, since P is linear,
convergence of $\{P^n f\}$ to 1 for $f \in D$ is equivalent to the convergence of
$\{P^n f\}$ to $\langle f, 1\rangle$ for every $f \in L^1$. This observation is, of course, valid for
all types of convergence: Cesàro, weak, and strong. Thus we may restate
Theorem 4.4.1 in the equivalent form.

Corollary 4.4.1. Under the assumptions of Theorem 4.4.1, the following
equivalences hold:

(a) S is ergodic if and only if

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} \langle P^k f, g\rangle = \langle f, 1\rangle \langle 1, g\rangle \qquad \text{for all } f \in L^1,\ g \in L^\infty;$$

(b) S is mixing if and only if

$$\lim_{n \to \infty} \langle P^n f, g\rangle = \langle f, 1\rangle \langle 1, g\rangle \qquad \text{for all } f \in L^1,\ g \in L^\infty;$$

(c) S is exact if and only if

$$\lim_{n \to \infty} \left\| P^n f - \langle f, 1\rangle \right\| = 0 \qquad \text{for all } f \in L^1.$$
Proof of Theorem 4.4.1. The proof of part (a) follows easily from Corollary 5.2.3.


Next consider the mixing portion of the theorem. Assume S is mixing,
which, by definition, means

$$\lim_{n \to \infty} \mu(A \cap S^{-n}(B)) = \mu(A)\mu(B) \qquad \text{for all } A, B \in \mathcal{A}.$$

The mixing condition can be rewritten in integral form as

$$\lim_{n \to \infty} \int_X 1_A(x)\,1_B(S^n(x))\,\mu(dx) = \int_X 1_A(x)\,\mu(dx) \int_X 1_B(x)\,\mu(dx).$$

By applying the definitions of the Koopman operator and the scalar product to this equation, we obtain

    lim_{n→∞} ⟨1_A, U^n 1_B⟩ = ⟨1_A, 1⟩⟨1, 1_B⟩.   (4.4.1)

Since the Koopman operator is adjoint to the Frobenius-Perron operator, equation (4.4.1) may be rewritten as

    lim_{n→∞} ⟨P^n 1_A, 1_B⟩ = ⟨1_A, 1⟩⟨1, 1_B⟩,

or

    lim_{n→∞} ⟨P^n f, g⟩ = ⟨f, 1⟩⟨1, g⟩

for f = 1_A and g = 1_B. Since this relation holds for characteristic functions, it must also hold for the simple functions

    f = Σ_i λ_i 1_{A_i}   and   g = Σ_j σ_j 1_{B_j}.

Further, every function g ∈ L^∞ is the uniform limit of a sequence of simple functions g_k ∈ L^∞, and every function f ∈ L^1 is the strong (in L^1 norm) limit of a sequence of simple functions f_k ∈ L^1. Obviously,

    |⟨P^n f, g⟩ − ⟨f, 1⟩⟨1, g⟩| ≤ |⟨P^n f, g⟩ − ⟨P^n f_k, g_k⟩|
        + |⟨P^n f_k, g_k⟩ − ⟨f_k, 1⟩⟨1, g_k⟩|
        + |⟨f_k, 1⟩⟨1, g_k⟩ − ⟨f, 1⟩⟨1, g⟩|.   (4.4.2)

If ‖f_k − f‖ ≤ ε and ‖g_k − g‖_{L^∞} ≤ ε, then the first and last terms on the right-hand side of (4.4.2) satisfy

    |⟨P^n f, g⟩ − ⟨P^n f_k, g_k⟩| ≤ |⟨P^n f, g⟩ − ⟨P^n f_k, g⟩| + |⟨P^n f_k, g⟩ − ⟨P^n f_k, g_k⟩|
        ≤ ε‖g‖_{L^∞} + ε‖f_k‖ ≤ ε(‖g‖_{L^∞} + ‖f‖ + ε)

and analogously

    |⟨f_k, 1⟩⟨1, g_k⟩ − ⟨f, 1⟩⟨1, g⟩| ≤ ε(‖g‖_{L^∞} + ‖f‖ + ε).


Thus these terms are arbitrarily small for small ε. Finally, for fixed k the middle term of (4.4.2),

    |⟨P^n f_k, g_k⟩ − ⟨f_k, 1⟩⟨1, g_k⟩|,

converges to zero as n → ∞, which shows that the right-hand side of inequality (4.4.2) can be as small as we wish it to be for large n. This completes the proof that mixing implies the convergence of ⟨P^n f, g⟩ to ⟨f, 1⟩⟨1, g⟩ for all f ∈ L^1 and g ∈ L^∞. Conversely, this convergence implies the mixing condition (4.4.1) if we set f = 1_A and g = 1_B.
Lastly, we show that the strong convergence of {P^n f} to ⟨f, 1⟩ implies exactness. Assume μ(A) > 0 and define

    f_A(x) = (1/μ(A)) 1_A(x).

Clearly, f_A is a density. If the sequence {r_n} is defined by

    r_n = ‖P^n f_A − 1‖,

then it is also clear that the sequence is convergent to zero. By the definition of r_n, we have

    μ(S^n(A)) = ∫_{S^n(A)} μ(dx)
              = ∫_{S^n(A)} P^n f_A(x) μ(dx) − ∫_{S^n(A)} (P^n f_A(x) − 1) μ(dx)
              ≥ ∫_{S^n(A)} P^n f_A(x) μ(dx) − r_n.   (4.4.3)

From the definition of the Frobenius-Perron operator, we have

    ∫_{S^n(A)} P^n f_A(x) μ(dx) = ∫_{S^{−n}(S^n(A))} f_A(x) μ(dx)

and, since S^{−n}(S^n(A)) contains A, the last integral is equal to 1. Thus inequality (4.4.3) gives

    μ(S^n(A)) ≥ 1 − r_n,

which completes the proof that the strong convergence of {P^n f} to ⟨f, 1⟩ implies exactness.
We omit the proof of the converse (that exactness implies the strong convergence of {P^n f} to ⟨f, 1⟩) since we will never use this fact and its proof is based on quite different techniques (see Lin [1971]).
Because the Koopman and Frobenius-Perron operators are adjoint, it is possible to rewrite conditions (a) and (b) of Corollary 4.4.1 in terms of the Koopman operator. The advantage of such a reformulation lies in the fact that the Koopman operator is much easier to calculate than the Frobenius-Perron operator. Unfortunately, this reformulation cannot be extended to condition (c) for exactness of Corollary 4.4.1, since that condition is not expressed in terms of a scalar product.
Thus, from Corollary 4.4.1, the following proposition can easily be stated.

Proposition 4.4.1. Let (X, A, μ) be a normalized measure space, S: X → X a measure-preserving transformation, and U the Koopman operator corresponding to S. Then

(a) S is ergodic if and only if

    lim_{n→∞} (1/n) Σ_{k=0}^{n−1} ⟨f, U^k g⟩ = ⟨f, 1⟩⟨1, g⟩;

(b) S is mixing if and only if

    lim_{n→∞} ⟨f, U^n g⟩ = ⟨f, 1⟩⟨1, g⟩.

Proof. The proof of this proposition is trivial since, according to equation (3.3.4), we have

    ⟨f, U^n g⟩ = ⟨P^n f, g⟩   for f ∈ L^1, g ∈ L^∞, n = 1, 2, …,

which shows that conditions (a) and (b) of Corollary 4.4.1 and Proposition 4.4.1 are identical.
Remark 4.4.1. We stated Theorem 4.4.1 and Corollary 4.4.1 in terms of L^1 and L^∞ spaces to underline the role of the Frobenius-Perron operator as a transformation of densities. The same results can be proved using adjoint spaces L^p and L^{p′} instead of L^1 and L^∞, respectively. Moreover, when verifying conditions (a) through (c) of Theorem 4.4.1 and Corollary 4.4.1, or conditions (a) and (b) of Proposition 4.4.1, it is not necessary to check their validity for all f ∈ L^p and g ∈ L^{p′}. Due to special properties of the operators P and U, which are linear contractions, it is sufficient to check these conditions for f and g belonging to linearly dense subsets of L^p and L^{p′}, respectively (see Section 2.3). □
Example 4.4.1. In Example 4.2.2 we showed that the rotational transformation

    S(x) = x + φ   (mod 2π)

is not ergodic when φ/2π is rational. Here we prove that it is ergodic when φ/2π is irrational.
It is straightforward to show that S preserves the Borel measure μ and the normalized measure μ/2π. We take as our linearly dense set in L^2([0, 2π]) the set consisting of the functions {sin kx, cos lx : k, l = 0, 1, …}. We will show that, for each function g belonging to this set,

    lim_{n→∞} (1/n) Σ_{k=0}^{n−1} U^k g(x) = ⟨1, g⟩   (4.4.4)

uniformly for all x, thus implying that condition (a) of Proposition 4.4.1 is satisfied for all f. To simplify the calculations, note that

    sin kx = (e^{ikx} − e^{−ikx})/2i   and   cos lx = (e^{ilx} + e^{−ilx})/2,

where i = √−1. Consequently, it is sufficient to verify (4.4.4) only for g(x) = exp(ikx) with k an arbitrary (not necessarily positive) integer.
We have, for k ≠ 0,

    U^l g(x) = g(S^l(x)) = e^{ik(x+lφ)},

so that

    u_n(x) = (1/n) Σ_{l=0}^{n−1} U^l g(x)

obeys

    u_n(x) = (1/n) Σ_{l=0}^{n−1} e^{ik(x+lφ)} = (1/n) e^{ikx} (e^{inkφ} − 1)/(e^{ikφ} − 1),

and

    ‖u_n‖_{L^2} = (1/(n|e^{ikφ} − 1|)) { ∫_0^{2π} |e^{ikx}(e^{inkφ} − 1)|^2 dx/2π }^{1/2}
                ≤ 2/(n|e^{ikφ} − 1|).

(Note that e^{ikφ} ≠ 1, since φ/2π is irrational and k ≠ 0.)
Thus u_n(x) converges in L^2 to zero. Also, however, with our choice of g(x),

    ⟨1, g⟩ = ∫_0^{2π} e^{ikx} dx/2π = (1/2πik)[e^{2πik} − 1] = 0,

and condition (a) of Proposition 4.4.1 for ergodicity is satisfied with k ≠ 0.
When k = 0 the calculation is even simpler, since g(x) ≡ 1 and thus u_n(x) ≡ 1. Noting also that

    ⟨1, g⟩ = ∫_0^{2π} dx/2π = 1,

we have again that u_n(x) converges to ⟨1, g⟩.
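The Cesàro convergence established in this example is easy to watch numerically. Below is a minimal sketch, assuming NumPy is available; the particular irrational angle (φ/2π equal to the golden mean) and the test function g(x) = cos x are our own illustrative choices, not taken from the text:

```python
import numpy as np

# Rotation S(x) = x + phi (mod 2*pi) with phi/(2*pi) irrational
# (here the golden mean, an arbitrary illustrative choice).
phi = np.pi * (np.sqrt(5.0) - 1.0)
g = np.cos  # one member of the linearly dense set {sin kx, cos lx}

def cesaro_average(x, n):
    """(1/n) * sum_{l=0}^{n-1} U^l g(x), where U^l g(x) = g(x + l*phi)."""
    l = np.arange(n)
    return float(np.mean(g(x + l * phi)))

for n in (10, 100, 10000):
    print(n, cesaro_average(1.0, n))  # tends to <1, g> = 0 as n grows
```

The decay rate matches the bound 2/(n|e^{ikφ} − 1|) derived above: doubling n roughly halves the average.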

Example 4.4.2. In this example we demonstrate the exactness of the r-adic transformation

    S(x) = rx   (mod 1).

From Corollary 4.4.1 it is sufficient to demonstrate that {P^n f} converges strongly to ⟨f, 1⟩ for f in a linearly dense set in L^1([0, 1]). We take that linearly dense set to be the set of continuous functions.
From equation (1.2.13) we have

    P f(x) = (1/r) Σ_{i=0}^{r−1} f(i/r + x/r),

and thus by induction

    P^n f(x) = (1/r^n) Σ_{i=0}^{r^n−1} f(i/r^n + x/r^n).

However, in the limit as n → ∞, the right-hand side of this equation approaches the Riemann integral of f over [0, 1], that is,

    lim_{n→∞} P^n f(x) = ∫_0^1 f(s) ds,   uniformly in x,

which, by definition, is just ⟨f, 1⟩. Thus the condition for exactness is fulfilled. □
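The limit in this example can be observed directly from the explicit formula for P^n f. A minimal sketch, assuming NumPy; the choices r = 2 and the continuous density f(s) = 2s are ours:

```python
import numpy as np

r = 2                      # the dyadic case of S(x) = r*x (mod 1)
f = lambda s: 2.0 * s      # a continuous density on [0, 1]

def Pn_f(x, n):
    """P^n f(x) = (1/r^n) * sum_{i=0}^{r^n - 1} f(i/r^n + x/r^n)."""
    rn = r ** n
    i = np.arange(rn)
    return float(np.sum(f(i / rn + x / rn)) / rn)

for n in (1, 5, 15):
    # Both values approach the integral of f over [0, 1], which is 1.
    print(n, Pn_f(0.0, n), Pn_f(0.7, n))
```

Note that the two columns agree ever more closely, reflecting the uniformity in x of the convergence.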

Example 4.4.3. Here we show that the Anosov diffeomorphism

    S(x, y) = (x + y, x + 2y)   (mod 1)

is mixing. For this, from Proposition 4.4.1, it is sufficient to show that U^n g(x, y) = g(S^n(x, y)) converges weakly to ⟨1, g⟩ for g in a linearly dense set in L^2([0, 1] × [0, 1]).
Observe that for g(x, y) periodic in x and y with period 1, g(S(x, y)) = g(x + y, x + 2y), g(S^2(x, y)) = g(2x + 3y, 3x + 5y), and so on. By induction we easily find that

    g(S^n(x, y)) = g(a_{2n−2} x + a_{2n−1} y, a_{2n−1} x + a_{2n} y),

where the a_n are the Fibonacci numbers given by a_0 = a_1 = 1, a_{n+1} = a_n + a_{n−1}. Thus, if we take g(x, y) = exp[2πi(kx + ly)] and f(x, y) = exp[−2πi(px + qy)], then we have

    ⟨f, U^n g⟩ = ∫_0^1 ∫_0^1 exp{2πi[(k a_{2n−2} + l a_{2n−1} − p)x + (k a_{2n−1} + l a_{2n} − q)y]} dx dy,
and it is straightforward to show that

    ⟨f, U^n g⟩ = 1 if (k a_{2n−2} + l a_{2n−1} − p) = (k a_{2n−1} + l a_{2n} − q) = 0,
               and 0 otherwise.

Now we show that for large n either k a_{2n−2} + l a_{2n−1} − p or k a_{2n−1} + l a_{2n} − q is different from zero if at least one of k, l, p, q is different from zero. If k = l = 0 but p ≠ 0 or q ≠ 0, this is obvious. Thus we may suppose that either k or l is not zero. Assume k ≠ 0 and that k a_{2n−2} + l a_{2n−1} − p = 0 for infinitely many n. Then

    k (a_{2n−2}/a_{2n−1}) + l − p/a_{2n−1} = 0.
It is well known [Hardy and Wright, 1959] that

    lim_{n→∞} a_{2n−2}/a_{2n−1} = 2/(1 + √5)   and   lim_{n→∞} a_n = ∞;

hence

    lim_{n→∞} [k (a_{2n−2}/a_{2n−1}) + l − p/a_{2n−1}] = 2k/(1 + √5) + l.

However, this limit can never be zero, because k and l are integers, k ≠ 0, and 2/(1 + √5) is irrational. Thus, k a_{2n−2} + l a_{2n−1} − p ≠ 0 for large n. Therefore, for large n,

    ⟨f, U^n g⟩ = 1 if k = l = p = q = 0, and 0 otherwise.
But

    ⟨1, g⟩ = ∫_0^1 ∫_0^1 exp[2πi(kx + ly)] dx dy = 0 if k ≠ 0 or l ≠ 0,
                                                   1 if k = l = 0,

so that

    ⟨f, 1⟩⟨1, g⟩ = ∫_0^1 ∫_0^1 ⟨1, g⟩ exp[−2πi(px + qy)] dx dy
                 = ⟨1, g⟩ if p = q = 0, and 0 if p ≠ 0 or q ≠ 0
                 = 1 if k = l = p = q = 0, and 0 otherwise.

Thus

    ⟨f, U^n g⟩ = ⟨f, 1⟩⟨1, g⟩

for large n and, as a consequence, {U^n g} converges weakly to ⟨1, g⟩. Therefore, mixing of the Anosov diffeomorphism is demonstrated. □
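The induction step used above—that the coefficients of S^n are consecutive Fibonacci numbers—can be checked by raising the matrix of S to successive powers. A minimal sketch, assuming NumPy:

```python
import numpy as np

# S(x, y) = (x + y, x + 2y) (mod 1) acts through the integer matrix M.
M = np.array([[1, 1], [1, 2]], dtype=np.int64)

def fib(n):
    """Fibonacci numbers with a_0 = a_1 = 1, a_{n+1} = a_n + a_{n-1}."""
    a, b = 1, 1
    for _ in range(n):
        a, b = b, a + b
    return a

for n in range(1, 8):
    Mn = np.linalg.matrix_power(M, n)
    # Claim of the example: M^n = [[a_{2n-2}, a_{2n-1}], [a_{2n-1}, a_{2n}]].
    expected = np.array([[fib(2*n - 2), fib(2*n - 1)],
                         [fib(2*n - 1), fib(2*n)]])
    assert (Mn == expected).all()
print("coefficients of S^n are the Fibonacci numbers, as claimed")
```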
In this chapter we have shown how the study of ergodicity, mixing, and
exactness for transformations S can be greatly facilitated by the use of
the Frobenius-Perron operator P corresponding to S (cf. Theorem 4.4.1
and Corollary 4.4.1). Since the Frobenius-Perron operator is a special type
of Markov operator, there is a certain logic to extending the notions of
ergodicity, mixing, and exactness for transformations to Markov operators
in general. Thus, we close this section with the following definition.

Definition 4.4.1. Let (X, A, μ) be a normalized measure space and P: L^1(X, A, μ) → L^1(X, A, μ) be a Markov operator with stationary density 1, that is, P1 = 1. Then we say:
(a) The operator P is ergodic if {P^n f} is Cesàro convergent to 1 for all f ∈ D;
(b) The operator P is mixing if {P^n f} is weakly convergent to 1 for all f ∈ D; and
(c) The operator P is exact if {P^n f} is strongly convergent to 1 for all f ∈ D.

4.5 Kolmogorov Automorphisms

Until now we have considered three types of transformations exhibiting gradually stronger chaotic properties: ergodicity, mixing, and exactness. This is not a complete list of possible behaviors. These three types are probably the most important, but it is possible to find some intermediate types and some new, unexpected connections between them. For example, between ergodic and mixing transformations there is a class of weakly mixing transformations that, by definition, are measure preserving [on a normalized measure space (X, A, μ)] and satisfy the condition

    lim_{n→∞} (1/n) Σ_{k=0}^{n−1} |μ(A ∩ S^{−k}(B)) − μ(A)μ(B)| = 0   for all A, B ∈ A.

It is not easy to construct an example of a weakly mixing transformation that is not mixing. Interesting comments on this problem can be found in Brown [1976].
However, Kolmogorov automorphisms, which are invertible and therefore cannot be exact, are stronger than mixing. As we will see later, to some extent they are parallel to exact transformations. Schematically this situation can be visualized as follows:

    K-automorphisms        exact
              ↘            ↙
                  mixing
                    ↓
              weakly mixing
                    ↓
                 ergodic

where K-automorphism is the usual abbreviation for a Kolmogorov automorphism and the arrows indicate that the property above implies the one below. Before giving the precise definition of K-automorphisms, we introduce two simple notations.
If S: X → X is a given transformation and A is a collection of subsets of X (e.g., a σ-algebra), then S(A) denotes the collection of sets of the form S(A) for A ∈ A, and S^{−1}(A) the collection of S^{−1}(A) for A ∈ A. More generally, S^n(A) and S^{−n}(A) denote the corresponding collections of sets S^n(A) and S^{−n}(A) for A ∈ A, n = 0, 1, 2, ….

Definition 4.5.1. Let (X, A, μ) be a normalized measure space and let S: X → X be an invertible transformation such that S and S^{−1} are measurable and measure preserving. The transformation S is called a K-automorphism if there exists a σ-algebra A_0 ⊂ A such that the following three conditions are satisfied:

(i) S^{−1}(A_0) ⊂ A_0;

(ii) the σ-algebra

    ⋂_{n=0}^∞ S^{−n}(A_0)   (4.5.1)

is trivial, that is, it contains only sets of measure 1 or 0; and

(iii) the smallest σ-algebra containing

    ⋃_{n=0}^∞ S^n(A_0)   (4.5.2)

is identical to A.
The word automorphism comes from algebra, and in this case it means that the transformation S is invertible and measure preserving (analogously, the word endomorphism is used for measure-preserving but not necessarily invertible transformations).
Example 4.5.1. The baker transformation is a K-automorphism. For A_0 we can take all the sets of the form

    A_0 = {A × [0, 1] : A ⊂ [0, 1], A is a Borel set}.

It is easy to verify condition (i) of Definition 4.5.1. Thus, if B = A × [0, 1], then B_1 = S^{−1}(B) has the form B_1 = A_1 × [0, 1], where

    A_1 = ½A ∪ (½ + ½A),   (4.5.3)

and thus condition (i) is satisfied. From this follows a hint of how to prove condition (ii). Namely, from (4.5.3) it follows that the basis A_1 of the set B_1 = S^{−1}(B) is the union of two sets of equal measure that are contained in the intervals [0, ½] and [½, 1], respectively. Furthermore, the set B_2 = S^{−2}(B) has the form A_2 × [0, 1], and its basis A_2 is the union of four sets of equal measure contained in the intervals [0, ¼], …, [¾, 1]. Finally, every set B_∞ belonging to the σ-algebra (4.5.1) is of the form A_∞ × [0, 1], and A_∞ has the property that for each integer n the measure of the intersection of A_∞ with [k/2^n, (k + 1)/2^n] does not depend on k. From this it is easy to show that the measure of the intersection of A_∞ with [0, x] is a linear function of x, or

    ∫_0^x 1_{A_∞}(y) dy = cx,

where c is a constant. Differentiation gives

    1_{A_∞}(x) = c   for 0 ≤ x ≤ 1.

Since 1_{A_∞} is a characteristic function, either c = 1 or c = 0. In the first case, A_∞ as well as B_∞ have measure 1; if c = 0, then A_∞ and B_∞ have measure 0. Thus condition (ii) is verified.
To verify (iii), observe that A_0 ∪ S(A_0) contains not only sets of the form A × [0, 1] but also sets of the form A × [0, ½] and A × [½, 1]. Further, A_0 ∪ S(A_0) ∪ S^2(A_0) also contains the sets A × [0, ¼], …, A × [¾, 1], and so on. Thus, by using the sets from the family (4.5.2), we can approximate every rectangle contained in [0, 1] × [0, 1]. Consequently, the smallest σ-algebra containing (4.5.2) is the σ-algebra of Borel sets. □
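Relation (4.5.3) can be checked pointwise: a point (x, y) lies in S^{−1}(A × [0, 1]) exactly when x ∈ ½A ∪ (½ + ½A). A minimal sketch, assuming the baker transformation of Example 4.1.3 and taking A = [0.2, 0.6] as an illustrative test set of our own choosing:

```python
import random

def baker(x, y):
    """The baker transformation of Example 4.1.3 on the unit square."""
    if x < 0.5:
        return 2.0 * x, 0.5 * y
    return 2.0 * x - 1.0, 0.5 * y + 0.5

# Test set A = [0.2, 0.6], so B = A x [0, 1].
in_A = lambda x: 0.2 <= x <= 0.6
# By (4.5.3): A_1 = (1/2)A union (1/2 + (1/2)A) = [0.1, 0.3] union [0.6, 0.8].
in_A1 = lambda x: 0.1 <= x <= 0.3 or 0.6 <= x <= 0.8

random.seed(0)
for _ in range(100_000):
    x, y = random.random(), random.random()
    sx, _ = baker(x, y)
    # (x, y) lies in S^{-1}(B) if and only if x lies in A_1, for every y.
    assert in_A(sx) == in_A1(x)
print("S^{-1}(A x [0, 1]) = A_1 x [0, 1], matching (4.5.3)")
```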
Example 4.5.2. The baker transformation considered in the previous example has an important geometrical property: at every point it is contracting in one direction and expanding in the orthogonal one. The transformation

    S(x, y) = (x + y, x + 2y)   (mod 1)

considered in Example 4.1.4 has the same property. As we have observed, the Jacobian of S has two eigenvalues λ_1, λ_2 such that 0 < λ_1 < 1 < λ_2. To these eigenvalues correspond the eigenvectors

    e_1 = (1, ½ − ½√5),   e_2 = (1, ½ + ½√5).

Thus, S contracts in the direction e_1 and expands in the direction e_2. With this fact it can be verified that S is also a K-automorphism. The construction of A_0 is related to the vectors e_1 and e_2; that is, A_0 may be defined as a σ-algebra generated by a class of rectangles with sides parallel to the vectors e_1 and e_2. The precise definition of A_0 requires some technical details, which can be found in an article by Arnold and Avez [1968]. □
As we observed in Remark 4.1.4, the first coordinate in the baker transformation is transformed independently of the second, and this first-coordinate map is the dyadic transformation. The baker transformation is a K-automorphism and the dyadic transformation is exact. This fact is not a coincidence. It may be shown that every exact transformation is, in some sense, a restriction of a K-automorphism. To make this statement precise we need the following definition.

Definition 4.5.2. Let (X, A, μ) and (Y, B, ν) be two normalized measure spaces and let S: X → X and T: Y → Y be two measure-preserving transformations. If there exists a transformation F: Y → X that is also measure preserving, namely,

    ν(F^{−1}(A)) = μ(A)   for A ∈ A,

and such that S ∘ F = F ∘ T, then S is called a factor of T.


The situation described by Definition 4.5.2 can be visualized by the diagram

...!...

y
(4.5.4}

__!..X

and the condition SoF = FoT may be expressed by saying that the diagram
(4.5.4} commutes. We have the following theorem due to Rochlin [1961].
Theorem 4.5.1. Every exact transformation is a factor of a K-automorphism.
The relationship between K-automorphisms and mixing transformations is much simpler; it is given by the following theorem.
Theorem 4.5.2. Every K-automorphism is mixing.

The proofs and more information concerning K-automorphisms can be found in the books by Walters [1982] and by Parry [1981].

Exercises
4.1. Study the rotation on the circle transformation (Examples 4.2.2 and
4.4.1) numerically. Is the behavior a consequence of the properties of the
transformation or of the computer? Why?
4.2. Write a series of programs, analogous to those you wrote in the exercises of Chapter 1, to study the behavior of two-dimensional transformations. In particular, write a program to examine the successive locations of an initial cluster of initial conditions, as presented in our study of the baker transformation and of equations (4.3.3)-(4.3.5).
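For Exercise 4.1, one possible starting point (a sketch assuming NumPy; the bin count and step number are arbitrary choices) is to histogram a long orbit of the rotation and compare rational and irrational values of φ/2π; keep in mind that in floating-point arithmetic every representable φ is rational, which bears on the question asked:

```python
import numpy as np

def orbit_histogram(phi, n_steps=100_000, bins=32, x0=0.1):
    """Fraction of the orbit x_k = x0 + k*phi (mod 2*pi) in each bin."""
    x = (x0 + phi * np.arange(n_steps)) % (2 * np.pi)
    counts, _ = np.histogram(x, bins=bins, range=(0.0, 2 * np.pi))
    return counts / n_steps

# phi/(2*pi) rational: the orbit is periodic and fills only a few bins.
print(orbit_histogram(2 * np.pi / 4))
# phi/(2*pi) irrational: the orbit equidistributes over the circle.
print(orbit_histogram(np.sqrt(2.0)))
```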
4.3. Let (X, A, μ) be a finite measure space and let S: X → X be a measurable transformation such that

    μ(S^{−1}(A)) ≥ μ(A)   for A ∈ A.

Show that μ is invariant with respect to S. Is the assumption μ(X) < ∞ essential?
4.4. Consider the space (X, A, μ), where

    X = {…, −2, −1, 0, 1, 2, …}

is the set of all integers, A is the family of all subsets of X, and μ is the counting measure. Let S(x) = x + k for x ∈ X, where k is an integer. For which k is the transformation S ergodic?

4.5. Prove that the baker transformation of Examples 4.1.3 and 4.3.1 is
mixing by using the mixing condition (4.3.1).
4.6. Let X = [0, 1) × [0, 1) be the unit square with the standard Borel measure. Let r ≥ 2 be an integer. Consider the following generalization of the baker transformation:

    S(x, y) = (rx (mod 1), y/r + k/r)   for k/r ≤ x < (k + 1)/r, k = 0, …, r − 1.

Prove that S is mixing.


4.7. Let (X, A, μ) be a normalized measure space and let P: L^1(X) → L^1(X) be a Markov operator such that P1 = 1. Fix an integer k ≥ 1. Prove that the following statements are true:
(a) P^k is ergodic ⇒ P is ergodic;
(b) P^k is mixing ⇒ P is mixing;
(c) P^k is exact ⇒ P is exact;
where the arrow, as usual, means "implies that." Where may the arrow be reversed?

5
The Asymptotic Properties of Densities

The preceding chapter was devoted to an examination of the various degrees of "chaotic" behavior (ergodicity, mixing, and exactness) that measure-preserving transformations may display. In particular, we saw the usefulness of the Koopman and Frobenius-Perron operators in answering these questions.
Theorem 4.1.1 reduced the problem of finding an invariant measure to one of finding solutions to the equation Pf = f. Perhaps the most obvious, although not the simplest, way to find these solutions is to pick an arbitrary f ∈ D and examine the sequence {P^n f} of successive iterations of f by the Frobenius-Perron operator. If {P^n f} converges to f_*, then clearly {P^{n+1} f} = {P(P^n f)} converges simultaneously to f_* and Pf_*, and we are done. However, to prove that {P^n f} converges (weakly or strongly) to a function f_* is difficult.
In this chapter we first examine the convergence of the sequence {A_n f} of averages defined by

    A_n f = (1/n) Σ_{k=0}^{n−1} P^k f

and show how this may be used to demonstrate the existence of a stationary density of P. We then show that under certain conditions {P^n f} can display a new property, namely, asymptotic periodicity. Finally, we introduce the concept of asymptotic stability for Markov operators, which is a generalization of exactness for Frobenius-Perron operators. We then show how the lower-bound function technique may be used to demonstrate asymptotic stability. This technique is used throughout the remainder of the book.

5.1 Weak and Strong Precompactness

In calculus one of the most important observations, originally due to Weierstrass, is that any bounded sequence of numbers contains a convergent subsequence. This observation can be extended to spaces of any finite dimension. Unfortunately, for more complicated objects, such as densities, this is not the case. One example is

    f_n(x) = n 1_{[0,1/n]}(x),   0 ≤ x ≤ 1,

which is bounded in L^1 norm, that is, ‖f_n‖ = 1, but which does not converge weakly or strongly in L^1([0, 1]) to any density. In fact, as n → ∞, f_n(x) → δ(x), the Dirac delta function that is supported on a single point, x = 0.
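The failure of weak convergence for this sequence can be seen numerically: the pairings ⟨f_n, g⟩ approach g(0), a "mass at a point" that no L^1 function can represent. A minimal sketch, assuming NumPy; the test function g(x) = cos x is our own choice:

```python
import numpy as np

g = np.cos  # a bounded (L-infinity) test function on [0, 1]

def pairing(n, points=200_000):
    """<f_n, g> for f_n = n * 1_[0,1/n], by a midpoint-rule integral."""
    dx = 1.0 / points
    x = (np.arange(points) + 0.5) * dx
    fn = np.where(x <= 1.0 / n, float(n), 0.0)
    return float(np.sum(fn * g(x)) * dx)

for n in (1, 10, 100):
    print(n, pairing(n))  # approaches g(0) = 1.0 as n grows
```

The exact value is n sin(1/n), which tends to 1 = g(0), confirming the Dirac-delta behavior described above.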
One of the great achievements in mathematical analysis was the discovery of sufficient conditions for the existence of convergent subsequences of functions, which subsequently found applications in the calculus of variations, optimal control theory, and proofs for the existence of solutions to ordinary and partial differential equations and integral equations.
To make these comments more precise we introduce the following definitions. Let (X, A, μ) be a measure space and F a set of functions in L^p.
Definition 5.1.1. The set F will be called strongly precompact if every sequence of functions {f_n}, f_n ∈ F, contains a strongly convergent subsequence {f_{a_n}} that converges to an f̄ ∈ L^p.
Remark 5.1.1. The prefix "pre-" is used because we take f̄ ∈ L^p rather than f̄ ∈ F. □

Definition 5.1.2. The set F will be called weakly precompact if every sequence of functions {f_n}, f_n ∈ F, contains a weakly convergent subsequence {f_{a_n}} that converges to an f̄ ∈ L^p.
Remark 5.1.2. These two definitions are often applied to sets consisting simply of sequences of functions. In this case the precompactness of F = {f_n} simply means that every sequence {f_n} contains a convergent subsequence {f_{a_n}}. □
Remark 5.1.3. From the definitions it immediately follows that any subset of a weakly or strongly precompact set is itself weakly or strongly precompact. □


There are several simple and general criteria useful for demonstrating the weak precompactness of sets in L^p [see Dunford and Schwartz, 1957]. The three we will have occasion to use are as follows:

1. Let g ∈ L^1 be a nonnegative function. Then the set of all functions f ∈ L^1 such that

    |f(x)| ≤ g(x)   for x ∈ X a.e.   (5.1.1)

is weakly precompact in L^1.

2. Let M > 0 be a positive number and p > 1 be given. If μ(X) < ∞, then the set of all functions f ∈ L^1 such that

    ‖f‖_{L^p} ≤ M   (5.1.2)

is weakly precompact in L^1.

3. A set of functions F ⊂ L^1, μ(X) < ∞, is weakly precompact if and only if:
(a) There is an M < ∞ such that ‖f‖ ≤ M for all f ∈ F; and
(b) For every ε > 0 there is a δ > 0 such that

    ∫_A |f(x)| μ(dx) < ε   if μ(A) < δ and f ∈ F.

Remark 5.1.4. If the measure is not finite these two conditions must be supplemented by
(c) For every ε > 0 there is a set B ∈ A, μ(B) < ∞, such that

    ∫_{X∖B} |f(x)| μ(dx) < ε   for all f ∈ F. □

0

Strong precompactness is generally more difficult to demonstrate than weak precompactness. One of the simplest criteria, which we present only for one-dimensional spaces, is as follows:

4. Let F be a set of functions defined on a bounded interval Δ of the real line. F is strongly precompact in L^1(Δ) if and only if:
(a) There exists a constant M > 0, independent of f, such that

    ‖f‖ ≤ M   for all f ∈ F;   (5.1.3a)

(b) For all ε > 0 there exists a δ > 0 such that

    ∫_Δ |f(x + h) − f(x)| dx < ε   (5.1.3b)

for all f ∈ F and all h such that |h| < δ. To ensure that this integral is well defined, we assume f(x + h) − f(x) = 0 for x + h ∉ Δ.

Remark 5.1.5. This necessary and sufficient condition for strong precompactness is valid for unbounded intervals Δ if, in addition, for every ε > 0 there is an r > 0 such that

    ∫_{|x| ≥ r} |f(x)| dx < ε   for all f ∈ F.   (5.1.4)

Remark 5.1.6. In practical situations it is often difficult to verify inequality (5.1.3b). However, if the functions f ∈ F have uniformly bounded derivatives, that is, if there is a constant K such that |f′(x)| ≤ K, then the condition is automatically satisfied. To see this, note that

    |f(x + h) − f(x)| ≤ K|h|

implies

    ∫_Δ |f(x + h) − f(x)| dx ≤ K|h| μ(Δ),

and thus if, for a given ε, we pick

    δ = ε/(K μ(Δ)),

the condition (5.1.3b) is satisfied. Clearly this will not work for unbounded intervals because, as μ(Δ) → ∞, δ → 0. □
To close this section we state the following corollary.
Corollary 5.1.1. For every f ∈ L^1(Δ), Δ bounded or not,

    lim_{h→0} ∫_Δ |f(x + h) − f(x)| dx = 0.   (5.1.5)

Proof. To see this, note that the set {f} consisting of only one function f is obviously strongly precompact, since the sequence {f, f, …} is always convergent. Thus equation (5.1.5) follows from the foregoing condition (4b) for strong precompactness.

5.2 Properties of the Averages A_n f

In this section we assume a measure space (X, A, μ) and a Markov operator P: L^1 → L^1. We are going to demonstrate some simple properties of the averages defined by

    A_n f = (1/n) Σ_{k=0}^{n−1} P^k f.   (5.2.1)

We then state and prove a special case of the Kakutani-Yosida abstract ergodic theorem as well as two corollaries to that theorem.

Proposition 5.2.1. For all f ∈ L^1,

    lim_{n→∞} ‖A_n f − A_n Pf‖ = 0.

Proof. By the definition (5.2.1) of A_n f we have

    A_n f − A_n Pf = (1/n)(f − P^n f)

and thus

    ‖A_n f − A_n Pf‖ ≤ (1/n)(‖f‖ + ‖P^n f‖).

Since it is an elementary property of Markov operators that ‖P^n f‖ ≤ ‖f‖, we have

    ‖A_n f − A_n Pf‖ ≤ (2/n)‖f‖ → 0

as n → ∞, which completes the proof.

Proposition 5.2.2. If, for f ∈ L^1, there is a subsequence {A_{a_n} f} of the sequence {A_n f} that converges weakly to f_* ∈ L^1, then Pf_* = f_*.
Proof. First, since P A_{a_n} f = A_{a_n} Pf, the sequence {A_{a_n} Pf} converges weakly to Pf_*. Since, by Proposition 5.2.1, {A_{a_n} Pf} has the same limit as {A_{a_n} f}, we have Pf_* = f_*.
The following theorem is a special case of an abstract ergodic theorem originally due to Kakutani and Yosida (see Dunford and Schwartz [1957]). The usefulness of the theorem lies in the establishment of a simple condition for the existence of a fixed point for a given Markov operator P.

Theorem 5.2.1. Let (X, A, μ) be a measure space and P: L^1 → L^1 a Markov operator. If for a given f ∈ L^1 the sequence {A_n f} is weakly precompact, then it converges strongly to some f_* ∈ L^1 that is a fixed point of P, namely, Pf_* = f_*. Furthermore, if f ∈ D, then f_* ∈ D, so that f_* is a stationary density.
Proof. Because {A_n f} is weakly precompact by assumption, there exists a subsequence {A_{a_n} f} that converges weakly to some f_* ∈ L^1. Further, by Proposition 5.2.2, we know Pf_* = f_*.


Write f ∈ L^1 in the form

    f = (f − f_*) + f_*   (5.2.2)

and assume for the time being that for every ε > 0 the function f − f_* can be written in the form

    f − f_* = Pg − g + r,   (5.2.3)

where g ∈ L^1 and ‖r‖ < ε. Thus, from equations (5.2.2) and (5.2.3), we have

    A_n f = A_n(Pg − g) + A_n r + A_n f_*.

Because Pf_* = f_*, A_n f_* = f_*, and we obtain

    ‖A_n f − f_*‖ = ‖A_n(f − f_*)‖ ≤ ‖A_n(Pg − g)‖ + ‖A_n r‖.

By Proposition 5.2.1 we know that ‖A_n(Pg − g)‖ is strongly convergent to zero as n → ∞, and by our assumptions ‖A_n r‖ ≤ ‖r‖ < ε. Thus, for sufficiently large n, we must have

    ‖A_n f − f_*‖ < 2ε.

Since ε is arbitrary, this proves that {A_n f} is strongly convergent to f_*.


To show that iff E D, then /. E D, recall from Definition 3.1.3 that
f E D means that
I ~ 0 and 11/11 = 1.
Therefore Pf ~ 0 and IlP/II = 1 so that pn/ ~ 0 and IIPn/11 = 1. As
a consequence, Anf ~ 0 and IIAn/11 = 1 and, since {An/} is strongly
convergent to /., we must have /. e D. This completes the proof under
the assumption that representation (5.2.3) is possible for every E.
In proving this aBSumption, we will use a simplified version of the HahnBanach theorem (see Remark 5.2.1). Suppose that for some E there does
not exist an r such that equation (5.2.3) is true. If this were the case, then
f- /. closure(P- J)L 1 (X) and, thus, by the Hahn-Banach theorem,
there must exist a go E L 00 such that
(5.2.4)

(!-/.,go};:/:- 0
and,
(h,go)=O

for all hE closure(P- J)L 1 (X).

In particular,

    ⟨(P − I)P^j f, g_0⟩ = 0.

Thus

    ⟨P^{j+1} f, g_0⟩ = ⟨P^j f, g_0⟩   for j = 0, 1, …,

and by induction we must, therefore, have

    ⟨P^j f, g_0⟩ = ⟨f, g_0⟩.   (5.2.5)

As a consequence,

    ⟨A_n f, g_0⟩ = ⟨f, g_0⟩.   (5.2.6)

Since {A_{a_n} f} was assumed to converge weakly to f_*, we have

    lim_{n→∞} ⟨A_{a_n} f, g_0⟩ = ⟨f_*, g_0⟩

and, by (5.2.6),

    ⟨f, g_0⟩ = ⟨f_*, g_0⟩,

which gives

    ⟨f − f_*, g_0⟩ = 0.

However, this result contradicts (5.2.4), and therefore we conclude that the representation (5.2.3) is, indeed, always possible.

Remark 5.2.1. The Hahn-Banach theorem is one of the classical results of functional analysis. Although it is customarily stated as a general property of some linear topological spaces (e.g., Banach spaces and locally convex spaces), here we state it for L^p spaces. We need only two concepts. A set E ⊂ L^p is a linear subspace of L^p if λ_1 f_1 + λ_2 f_2 ∈ E for all f_1, f_2 ∈ E and all scalars λ_1, λ_2. A linear subspace is closed if lim f_n ∈ E for every strongly convergent sequence {f_n} ⊂ E. □
Next we state a simple consequence of the Hahn-Banach theorem in the language of L^p spaces [see Dunford and Schwartz, 1957].
Proposition 5.2.3. Let 1 ≤ p < ∞ and let p′ be adjoint to p, that is, (1/p) + (1/p′) = 1 for p > 1 and p′ = ∞ for p = 1. Further, let E ⊂ L^p be a linear closed subspace. If f_0 ∈ L^p and f_0 ∉ E, then there is a g_0 ∈ L^{p′} such that ⟨f_0, g_0⟩ ≠ 0 and ⟨f, g_0⟩ = 0 for f ∈ E.
Geometrically, this proposition means that, if we have a closed subspace E and a vector f_0 ∉ E, then we can find another vector g_0 orthogonal to E but not orthogonal to f_0 (see Figure 5.2.1).

Remark 5.2.2. By proving Theorem 5.2.1 we have reduced the problem of demonstrating the existence of a stationary density f_* for the operator P, that is, Pf_* = f_*, to the simpler problem of demonstrating the weak precompactness of the sequence {A_n f}. In the special case that P is a Frobenius-Perron operator, this also suffices to demonstrate the existence of an invariant measure. □

FIGURE 5.2.1. Diagram showing that, for f_0 ∉ E, we can find a g_0 such that g_0 is not orthogonal to f_0 but is orthogonal to all f ∈ E. Since g_0 belongs to L^{p′}, but not necessarily to L^p, it is drawn as a dashed line.
There are two simple and useful corollaries to Theorem 5.2.1.

Corollary 5.2.1. Let (X, A, μ) be a measure space and P: L^1 → L^1 a Markov operator. If, for some f ∈ D, there is a g ∈ L^1 such that

    P^n f ≤ g   (5.2.7)

for all n, then there is an f_* ∈ D such that Pf_* = f_*, that is, f_* is a stationary density.
Proof. By assumption, P^n f ≤ g, so that

    0 ≤ A_n f = (1/n) Σ_{k=0}^{n−1} P^k f ≤ g

and, thus, |A_n f| ≤ g. By applying our first criterion for weak precompactness (Section 5.1), we know that {A_n f} is weakly precompact. Then Theorem 5.2.1 completes the argument.

Corollary 5.2.2. Again let (X, A, μ) be a finite measure space and P: L^1 → L^1 a Markov operator. If for some f ∈ D there exist M > 0 and p > 1 such that

    ‖P^n f‖_{L^p} ≤ M   (5.2.8)

for all n, then there is an f_* ∈ D such that Pf_* = f_*.
Proof. We have

    ‖A_n f‖_{L^p} = ‖(1/n) Σ_{k=0}^{n−1} P^k f‖_{L^p} ≤ (1/n) Σ_{k=0}^{n−1} ‖P^k f‖_{L^p} ≤ (1/n)(nM) = M.

Hence, by our second criterion for weak precompactness, {A_n f} is weakly precompact, and again Theorem 5.2.1 completes the proof.

Remark 5.2.3. The conditions P^n f ≤ g and ‖P^n f‖_{L^p} ≤ M of these two corollaries, guaranteeing the existence of a stationary density f_*, rely on the properties of {P^n f} for large n. To make this clearer, suppose P^n f ≤ g only for n > n_0. Then, of course, P^{n+n_0} f ≤ g for all n, but this can be rewritten in the alternate form P^n P^{n_0} f = P^n f̃ ≤ g, where f̃ = P^{n_0} f. The same argument holds for ‖P^n f‖_{L^p}, thus demonstrating that it is sufficient for some n_0 to exist such that for all n > n_0 either (5.2.7) or (5.2.8) holds. □
We have proved that either convergence or precompactness of {A_n f} implies the existence of a stationary density. We may reverse the question to ask whether the existence of a stationary density gives any clues to the asymptotic properties of the sequences {A_n f}. The following theorem gives a partial answer to this question.
Theorem 5.2.2. Let $(X, \mathcal{A}, \mu)$ be a measure space and $P: L^1 \to L^1$ a Markov operator with a unique stationary density $f_*$. If $f_*(x) > 0$ for all $x \in X$, then
$$\lim_{n \to \infty} A_n f = f_* \quad \text{for all } f \in D.$$
Proof. First assume $f/f_*$ is bounded. By setting $c = \sup(f/f_*)$, we have
$$P^n f \le P^n (c f_*) = c P^n f_* = c f_*$$
and $A_n f \le c A_n f_* = c f_*$. Thus the sequence $\{A_n f\}$ is weakly precompact and, by Theorem 5.2.1, is convergent to a stationary density. Since $f_*$ is the unique stationary density, $\{A_n f\}$ must converge to $f_*$. Thus the theorem is proved when $f/f_*$ is bounded.
In the general case, write $f_c = \min(f, c f_*)$. We then have
$$f = \frac{f_c}{\|f_c\|} + r_c, \tag{5.2.9}$$
where
$$r_c = \left( 1 - \frac{1}{\|f_c\|} \right) f_c + f - f_c.$$
Since $f_*(x) > 0$ we also have
$$\lim_{c \to \infty} f_c(x) = f(x) \quad \text{for all } x,$$
and, evidently, $f_c(x) \le f(x)$. Thus, by the Lebesgue dominated convergence theorem, $\|f_c - f\| \to 0$ and $\|f_c\| \to \|f\| = 1$ as $c \to \infty$, so the remainder $r_c$ converges strongly to zero as $c \to \infty$. Given $\varepsilon > 0$, we can therefore find a value of $c$ such that $\|r_c\| < \varepsilon/2$. Then
$$\|A_n r_c\| \le \|r_c\| < \varepsilon/2. \tag{5.2.10}$$
However, since $f_c/\|f_c\|$ is a density bounded by $c\|f_c\|^{-1} f_*$, according to the first part of the proof,
$$\left\| A_n\!\left( \frac{f_c}{\|f_c\|} \right) - f_* \right\| \le \varepsilon/2 \tag{5.2.11}$$
for sufficiently large $n$. Combining inequalities (5.2.10) and (5.2.11) with the decomposition (5.2.9), we immediately obtain
$$\|A_n f - f_*\| \le \varepsilon$$
for sufficiently large $n$. $\square$
In the case that P is the Frobenius-Perron operator corresponding to
a nonsingular transformation S, Theorem 5.2.2 offers a convenient criterion for ergodicity. As we have seen in Theorem 4.2.2, the ergodicity of
S is equivalent to the uniqueness of the solution to Pf = f. Using this
relationship, we can prove the following corollary.

Corollary 5.2.3. Let $(X, \mathcal{A}, \mu)$ be a normalized measure space, $S: X \to X$ a measure-preserving transformation, and $P$ the corresponding Frobenius-Perron operator. Then $S$ is ergodic if and only if
$$\lim_{n \to \infty} A_n f = 1 \quad \text{for every } f \in D. \tag{5.2.12}$$

Proof. The proof is immediate. Since $S$ is measure preserving, we have $P1 = 1$. If $S$ is ergodic, then by Theorem 4.2.2, $f_*(x) \equiv 1$ is the unique stationary density of $P$ and, by Theorem 5.2.2, the convergence (5.2.12) follows. Conversely, if the convergence (5.2.12) holds, then applying (5.2.12) to a stationary density $f$ gives $f = 1$. Thus $f_*(x) \equiv 1$ is the unique stationary density of $P$ and again, by Theorem 4.2.2, the transformation $S$ is ergodic. $\square$
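Corollary 5.2.3 can be illustrated numerically. The sketch below is an illustration, not an example from the text: it uses the dyadic map $S(x) = 2x \pmod 1$, which preserves Lebesgue measure and is ergodic. Its Frobenius-Perron operator is $Pf(x) = \frac{1}{2}[f(x/2) + f((x+1)/2)]$, and on densities that are piecewise constant over $N = 2^m$ equal bins this formula is represented exactly (no discretization error).

```python
import numpy as np

# Illustrative sketch: the dyadic map S(x) = 2x (mod 1) on [0,1) is
# ergodic with invariant density 1, so Corollary 5.2.3 predicts that
# the Cesaro averages A_n f converge to 1 for every density f.

N = 64                                   # 2^6 bins on [0, 1)

def P(f):
    """Frobenius-Perron operator of the dyadic map on bin values."""
    j = np.arange(N)
    return 0.5 * (f[j // 2] + f[j // 2 + N // 2])

f = np.zeros(N)
f[:N // 4] = 4.0                         # density concentrated on [0, 1/4)

iterates = [f]
for _ in range(99):
    iterates.append(P(iterates[-1]))
A_n = np.mean(iterates, axis=0)          # Cesaro average A_n f

deviation = np.max(np.abs(A_n - 1.0))
print("max |A_n f - 1| =", deviation)
```

For this map the iterates themselves flatten out after finitely many steps, so the average converges quickly; the residual deviation comes only from the first few transient iterates.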


5.3 Asymptotic Periodicity of $\{P^n f\}$


In the preceding section we reduced the problem of examining the asymptotic
properties of the averages $A_n f$ to one of determining the precompactness
of $\{A_n f\}$. This, in turn, was reduced by Corollaries 5.2.1 and 5.2.2 to the
problem of finding an upper-bound function for $P^n f$ or an upper bound
for $\|P^n f\|_{L^p}$. In this section we show that if conditions similar to those
in Corollaries 5.2.1 and 5.2.2 are satisfied for Frobenius-Perron operators,
then the surprising result is that $\{P^n f\}$ is asymptotically periodic. Even
more generally, we will show that almost any kind of upper bound on the
iterates $P^n f$ of a Markov operator $P$ suffices to establish that $\{P^n f\}$ will
have very regular (asymptotically periodic) behavior.

Definition 5.3.1. Let $(X, \mathcal{A}, \mu)$ be a finite measure space. A Markov operator $P$ is called constrictive if there exist $\delta > 0$ and $\kappa < 1$ such that for every $f \in D$ there is an integer $n_0(f)$ for which
$$\int_E P^n f(x)\,\mu(dx) \le \kappa \quad \text{for } n \ge n_0(f) \text{ and } \mu(E) \le \delta. \tag{5.3.1}$$

Note that for every density $f$ the integral in inequality (5.3.1) is bounded
above by one. Thus condition (5.3.1) for constrictiveness means that eventually [for $n \ge n_0(f)$] this integral cannot be close to one for sufficiently small
sets $E$. This clearly indicates that constrictiveness rules out the possibility
that $P^n f$ is eventually concentrated on a set of very small or vanishing
measure.
If the space $X$ is not finite, we wish to have a definition of constrictiveness
that also prevents $P^n f$ from being dispersed throughout the entire space.
To accomplish this we extend Definition 5.3.1.

Definition 5.3.2. Let $(X, \mathcal{A}, \mu)$ be a ($\sigma$-finite) measure space. A Markov operator $P$ is called constrictive if there exist $\delta > 0$, $\kappa < 1$, and a measurable set $B$ of finite measure such that for every density $f$ there is an integer $n_0(f)$ for which
$$\int_{(X \setminus B) \cup E} P^n f(x)\,\mu(dx) \le \kappa \quad \text{for } n \ge n_0(f) \text{ and } \mu(E) \le \delta. \tag{5.3.2}$$

Clearly this definition reduces to that of Definition 5.3.1 when $X$ is finite and we take $X = B$.

Remark 5.3.1. Observe that in inequality (5.3.2) we may always assume that $E \subset B$. To see this, take $F = B \cap E$. Then $(X \setminus B) \cup E = (X \setminus B) \cup F$ and, as a consequence,
$$\int_{(X \setminus B) \cup E} P^n f(x)\,\mu(dx) = \int_{(X \setminus B) \cup F} P^n f(x)\,\mu(dx),$$


FIGURE 5.3.1. Graph showing convergence of a sequence of functions $\{f_n\}$ to a set $\mathcal{F}$, where the hatched region contains all possible functions drawn from $\mathcal{F}$. (See Example 5.3.1 for details.)

and $\mu(F) \le \mu(E)$. $\square$


From the definition, one might think that verifying constrictiveness is
difficult, since it requires finding two constants $\delta$ and $\kappa$ as well as a set
$B$ with rather specific properties. However, it is often rather easy to verify
constrictiveness using one of the two following propositions.

Proposition 5.3.1. Let $(X, \mathcal{A}, \mu)$ be a finite measure space and $P: L^1(X) \to L^1(X)$ a Markov operator. Assume there are a $p > 1$ and $K > 0$ such that for every density $f \in D$ we have $P^n f \in L^p$ for sufficiently large $n$, and
$$\limsup_{n \to \infty} \|P^n f\|_{L^p} \le K. \tag{5.3.3}$$
Then $P$ is constrictive.
Proof. From (5.3.3) there is an integer $n_0(f)$ such that
$$\|P^n f\|_{L^p} \le K + 1 \quad \text{for } n \ge n_0(f).$$
Thus, by criterion 2 of Remark 5.1.3, the family $\{P^n f\}$, for $n \ge n_0(f)$ and $f \in D$, is weakly precompact. Finally, for a fixed $\varepsilon \in (0, 1)$, criterion 3 of the same remark implies there is a $\delta > 0$ such that
$$\int_E P^n f(x)\,\mu(dx) < \varepsilon \quad \text{if } \mu(E) < \delta.$$
Thus the constrictiveness of $P$ follows from (5.3.3). $\square$


Our next proposition may be even more useful in demonstrating the
constrictiveness of an operator P.

Proposition 5.3.2. Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space and $P: L^1(X) \to L^1(X)$ a Markov operator. If there exist an $h \in L^1$ and a $\lambda < 1$ such that
$$\limsup_{n \to \infty} \|(P^n f - h)^+\| \le \lambda \quad \text{for } f \in D, \tag{5.3.4}$$
then $P$ is constrictive.

Proof. Let $\varepsilon = \frac{1}{4}(1 - \lambda)$ and take $\mathcal{F} = \{h\}$. Since $\mathcal{F}$, which contains only one element, is evidently weakly precompact (it is also strongly precompact, but this property is not useful to us here), by criterion 3 of Remark 5.1.3 there exists a $\delta > 0$ such that
$$\int_E h(x)\,\mu(dx) < \varepsilon \quad \text{for } \mu(E) < \delta. \tag{5.3.5}$$
Furthermore, by Remark 5.1.4 there is a measurable set $B$ of finite measure for which
$$\int_{X \setminus B} h(x)\,\mu(dx) < \varepsilon. \tag{5.3.6}$$
Now fix $f \in D$. From (5.3.4) we may choose an integer $n_0(f)$ such that
$$\|(P^n f - h)^+\| \le \lambda + \varepsilon \quad \text{for } n \ge n_0(f),$$
and, as a consequence,
$$\int_C P^n f(x)\,\mu(dx) \le \int_C h(x)\,\mu(dx) + \lambda + \varepsilon \tag{5.3.7}$$
for an arbitrary set $C \in \mathcal{A}$. Setting $C = (X \setminus B) \cup E$ in (5.3.7) and using (5.3.5) and (5.3.6), we have
$$\int_{(X \setminus B) \cup E} P^n f(x)\,\mu(dx) \le \int_{X \setminus B} h(x)\,\mu(dx) + \int_E h(x)\,\mu(dx) + \lambda + \varepsilon < 3\varepsilon + \lambda = 1 - \varepsilon.$$
This completes the proof. $\square$
The interpretation of Proposition 5.3.2 is quite straightforward. Namely,
for those regions where $P^n f > h$, if the area between $P^n f$
and $h$ is asymptotically bounded above by $\lambda < 1$, then $P$ is constrictive.
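A quick numerical sketch of checking (5.3.4) follows; it is illustrative, and the matrix together with the choices of $h$ and $\lambda$ are assumptions, not examples from the text. On a finite state space a Markov operator is a column-stochastic matrix, and it suffices to test the extreme densities, since $f \mapsto \|(P^n f - h)^+\|$ is convex on $D$.

```python
import numpy as np

# Sketch: verify condition (5.3.4) for a finite-state Markov operator
# (column-stochastic matrix).  We seek h in L^1 and lambda < 1 with
# ||(P^n f - h)^+|| <= lambda for large n; Proposition 5.3.2 then says
# P is constrictive.  All numbers below are illustrative choices.

K = np.array([[0.4, 0.3, 0.3],
              [0.3, 0.4, 0.3],
              [0.3, 0.3, 0.4]])          # columns sum to 1

h = np.full(3, 0.25)                     # candidate upper-bound function
lam = 0.5                                # candidate lambda < 1

def excess(f, n):
    """||(P^n f - h)^+||_{L^1} (counting measure: a plain sum)."""
    for _ in range(n):
        f = K @ f
    return np.maximum(f - h, 0.0).sum()

# The vertices of the density simplex are the extreme cases.
worst = max(excess(np.eye(3)[i], 30) for i in range(3))
print("sup_f ||(P^30 f - h)^+|| =", worst)
assert worst <= lam   # condition (5.3.4) holds, so P is constrictive
```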
In checking conditions (5.3.1)-(5.3.4), it is not necessary to verify them
for all $f \in D$. Rather, it is sufficient to verify them for an arbitrary class of
densities $f \in D_0 \subset D$, where the set $D_0$ is dense in $D$. To be more precise,
we give the following definition.

Definition 5.3.3. A set $D_0 \subset D(X)$ is called dense in $D(X)$ if, for every
$h \in D$ and $\varepsilon > 0$, there is a $g \in D_0$ such that $\|h - g\| < \varepsilon$.

If $X$ is an interval of the real line $R$ or, more generally, an open set in
$R^d$, then, for example, the following subsets of $D(X)$ are dense:


$$D_1 = \{\text{nonnegative continuous functions on } X\} \cap D(X),$$
$$D_2 = \{\text{nonnegative continuous functions with compact support in } X\} \cap D(X),$$
$$D_3 = \{\text{nonnegative differentiable functions on } X\} \cap D(X),$$
$$D_4 = \{\text{positive differentiable functions on } X\} \cap D(X).$$


If a set $D_0 \subset D(X)$ is dense in $D(X)$, one need only verify inequality (5.3.1) for $f \in D_0$ when checking for constrictiveness. Then, for any other $f \in D(X)$, this inequality will be automatically satisfied with $\kappa$ replaced by $\kappa_1 = \frac{1}{2}(1 + \kappa)$. To show this, choose an $f \in D$. Then there is a density $f_0 \in D_0$ such that $\|f - f_0\| \le \kappa_1 - \kappa$. Since, by assumption, (5.3.1) holds for $f_0 \in D_0$, we have
$$\int_E P^n f_0(x)\,\mu(dx) \le \kappa \quad \text{for } n \ge n_0(f_0)$$
and
$$\int_E P^n f(x)\,\mu(dx) = \int_E P^n f_0(x)\,\mu(dx) + \int_E \left[ P^n f(x) - P^n f_0(x) \right] \mu(dx) \le \int_E P^n f_0(x)\,\mu(dx) + \|f - f_0\| \le \kappa_1.$$
Thus, when (5.3.1) holds for $f_0 \in D_0$, it holds for all densities $f \in D(X)$.
Precisely the same argument shows that it is also sufficient to verify (5.3.2) for densities drawn from dense sets. As a consequence of these observations, in verifying either (5.3.3) or (5.3.4) of Propositions 5.3.1 and 5.3.2, we may confine our attention to $f \in D_0$.
The main result of this section, which is proved in Komornik and Lasota
([1987]; see also Lasota, Li, and Yorke [1984]; Schaefer [1980]; and Keller
[1982]), is as follows.

Theorem 5.3.1 (spectral decomposition theorem). Let $P$ be a constrictive Markov operator. Then there is an integer $r$, two sequences of nonnegative functions $g_i \in D$ and $k_i \in L^\infty$, $i = 1, \ldots, r$, and an operator $Q: L^1 \to L^1$ such that for every $f \in L^1$, $Pf$ may be written in the form
$$Pf(x) = \sum_{i=1}^{r} \lambda_i(f) g_i(x) + Qf(x), \tag{5.3.8}$$
where
$$\lambda_i(f) = \int_X f(x) k_i(x)\,\mu(dx). \tag{5.3.9}$$
The functions $g_i$ and operator $Q$ have the following properties:

(1) $g_i(x) g_j(x) = 0$ for all $i \ne j$, so that the functions $g_i$ have disjoint supports;

(2) for each integer $i$ there exists a unique integer $\alpha(i)$ such that $P g_i = g_{\alpha(i)}$; further, $\alpha(i) \ne \alpha(j)$ for $i \ne j$, and thus the operator $P$ just serves to permute the functions $g_i$;

(3) $\|P^n Q f\| \to 0$ as $n \to \infty$ for every $f \in L^1$.

Remark 5.3.2. Note from representation (5.3.8) that the operator $Q$ is automatically determined if we know the functions $g_i$ and $k_i$, that is,
$$Qf(x) = Pf(x) - \sum_{i=1}^{r} \lambda_i(f) g_i(x).$$

From representation (5.3.8) of Theorem 5.3.1 for $Pf$, it immediately follows that the structure of $P^{n+1} f$ is given by
$$P^{n+1} f(x) = \sum_{i=1}^{r} \lambda_i(f)\,g_{\alpha^n(i)}(x) + Q_n f(x), \tag{5.3.10}$$
where $Q_n = P^n Q$, $\alpha^n(i) = \alpha(\alpha^{n-1}(i)) = \cdots$, and $\|Q_n f\| \to 0$ as $n \to \infty$. The terms under the summation in (5.3.10) are just permuted with each application of $P$, and since $r$ is finite the sequence
$$\sum_{i=1}^{r} \lambda_i(f)\,g_{\alpha^n(i)}(x) \tag{5.3.11}$$
must be periodic with a period $\tau \le r!$. Since $\{\alpha^n(1), \ldots, \alpha^n(r)\}$ is simply a permutation of $\{1, \ldots, r\}$, there is a unique $i$ corresponding to each $\alpha^n(i)$. Thus it is clear that summation (5.3.11) may be rewritten as
$$\sum_{i=1}^{r} \lambda_{\alpha^{-n}(i)}(f)\,g_i(x),$$
where $\{\alpha^{-n}(i)\}$ denotes the inverse permutation of $\{\alpha^n(i)\}$.
Rewriting the summation in this form clarifies how successive applications of the operator $P$ really work. Since the functions $g_i$ are supported on disjoint sets, each successive application of $P$ leads to a new set of scaling coefficients $\lambda_{\alpha^{-n}(i)}(f)$ associated with each function $g_i(x)$.
A sequence $\{P^n f\}$ for which representation (5.3.8) is satisfied will be called asymptotically periodic. Using this notion, Theorem 5.3.1 may be rephrased as follows: If $P$ is a constrictive Markov operator, then $\{P^n f\}$ is asymptotically periodic.
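Asymptotic periodicity is easy to watch numerically. The following sketch is illustrative (the matrix is an arbitrary choice, not from the text): a column-stochastic matrix on four states sends all mass on $\{0,1\}$ to $\{2,3\}$ and back, mixing within the blocks, so that $\{P^n f\}$ approaches a period-2 cycle of two densities with disjoint supports, exactly the structure of Theorem 5.3.1 with $r = 2$ and $\alpha$ a transposition.

```python
import numpy as np

# Illustrative sketch of asymptotic periodicity: the iterates P^n f do
# not converge, but they approach a period-2 cycle g1 -> g2 -> g1 -> ...

K = np.array([[0.0, 0.0, 0.6, 0.4],
              [0.0, 0.0, 0.4, 0.6],
              [0.7, 0.3, 0.0, 0.0],
              [0.3, 0.7, 0.0, 0.0]])     # columns sum to 1

f = np.array([1.0, 0.0, 0.0, 0.0])       # start with all mass on state 0

for _ in range(200):                     # burn in (an even number of steps)
    f = K @ f
g1 = f.copy()                            # even-step limit
g2 = K @ g1                              # odd-step limit

# Periodicity: applying P twice returns (numerically) to the same density.
period2_error = np.linalg.norm(K @ (K @ g1) - g1, 1)
print("||P^2 g1 - g1|| =", period2_error)
# g1 and g2 have disjoint supports, as property (1) of Theorem 5.3.1 demands.
print("g1 =", g1.round(4), " g2 =", g2.round(4))
```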
It is actually rather easy to obtain an upper bound on the integer $r$ appearing in equation (5.3.8) if we can find an upper-bound function for $P^n f$. Assume that $P$ is a Markov operator and that there exists a function $h \in L^1$ such that
$$\lim_{n \to \infty} \|(P^n f - h)^+\| = 0 \quad \text{for } f \in D. \tag{5.3.12}$$

Thus, by Proposition 5.3.2, $P$ is constrictive and representation (5.3.8) is valid. Let $\tau$ be the period of the sequence (5.3.11), so that, from (5.3.8) and (5.3.12), we have
$$Lf(x) \equiv \lim_{n \to \infty} P^{n\tau} f(x) = \sum_{i=1}^{r} \lambda_i(f) g_i(x) \le h(x), \qquad f \in D.$$
Set $f = g_k$, so that $Lf(x) = g_k(x) \le h(x)$. By integrating over the support of $g_k$, bearing in mind that the supports of the $g_k$ are disjoint, and summing from $k = 1$ to $k = r$, we have
$$\sum_{k=1}^{r} \int_{\operatorname{supp} g_k} g_k(x)\,\mu(dx) \le \sum_{k=1}^{r} \int_{\operatorname{supp} g_k} h(x)\,\mu(dx) \le \|h\|.$$
Since $g_k \in D$, this reduces to
$$r \le \|h\|, \tag{5.3.13}$$
which is the desired result.


If the explicit representation (5.3.8) for P f for a given Markov operator
P is known, then it is especially easy to check for the existence of invariant
measures and to determine ergodicity, mixing, or exactness, as shown in the
following sections. Unfortunately, we seldom have an explicit representation
for a given Markov operator, but in the remainder of this chapter we show
that the mere existence of representation (5.3.8) allows us to deduce some
interesting properties.

5.4 The Existence of Stationary Densities


In this section we first show that every constrictive Markov operator has a
stationary density and then give an explicit representation for $P^n f$ when
that stationary density is a constant. We start with a proposition.
Proposition 5.4.1. Let $(X, \mathcal{A}, \mu)$ be a measure space and $P: L^1 \to L^1$ a constrictive Markov operator. Then $P$ has a stationary density.

Proof. Let a density $f$ be defined by
$$f(x) = \frac{1}{r} \sum_{i=1}^{r} g_i(x), \tag{5.4.1}$$

where $r$ and the $g_i$ were defined in Theorem 5.3.1. Because of property (2) of Theorem 5.3.1,
$$Pf(x) = \frac{1}{r} \sum_{i=1}^{r} g_{\alpha(i)}(x) = \frac{1}{r} \sum_{i=1}^{r} g_i(x),$$
and thus $Pf = f$, which completes the proof. $\square$
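Proposition 5.4.1 can be checked directly in a finite-state sketch (illustrative, not from the text): the column-stochastic matrix below swaps the supports $\{0,1\}$ and $\{2,3\}$, so the decomposition of Theorem 5.3.1 has $r = 2$, and averaging the two permuted limit densities produces a stationary density exactly as in (5.4.1).

```python
import numpy as np

# Sketch of Proposition 5.4.1: for a constrictive P that permutes the
# densities g_i, the average (1/r) sum_i g_i is stationary.  The block
# matrix below (an illustrative choice) swaps two supports, so r = 2.

K = np.array([[0.0, 0.0, 0.6, 0.4],
              [0.0, 0.0, 0.4, 0.6],
              [0.7, 0.3, 0.0, 0.0],
              [0.3, 0.7, 0.0, 0.0]])     # columns sum to 1

g1 = np.array([1.0, 0.0, 0.0, 0.0])
for _ in range(200):                     # converge onto the period-2 cycle
    g1 = K @ g1
g2 = K @ g1                              # P g1 = g2 and P g2 = g1

fstar = 0.5 * (g1 + g2)                  # the density (5.4.1) with r = 2
stationarity = np.abs(K @ fstar - fstar).sum()
print("||P f* - f*||_1 =", stationarity)
```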


Now assume that the measure $\mu$ is normalized [$\mu(X) = 1$] and examine
the consequences for the representation of $P^n f$ when we have a constant
stationary density $f = 1_X$. Remember that, if $P$ is a Frobenius-Perron
operator, this is equivalent to $\mu$ being invariant.

Proposition 5.4.2. Let $(X, \mathcal{A}, \mu)$ be a finite measure space and $P: L^1 \to L^1$ a constrictive Markov operator. If $P$ has a constant stationary density, then the representation of $P^{n+1} f$ takes the simple form
$$P^{n+1} f(x) = \sum_{i=1}^{r} \lambda_{\alpha^{-n}(i)}(f)\,\hat{1}_{A_i}(x) + Q_n f(x) \quad \text{for all } f \in L^1, \tag{5.4.2}$$
where
$$\hat{1}_{A_i}(x) = [1/\mu(A_i)]\,1_{A_i}(x).$$
The sets $A_i$ form a partition of $X$, that is,
$$\bigcup_{i=1}^{r} A_i = X \quad \text{and} \quad A_i \cap A_j = \emptyset \quad \text{for } i \ne j.$$
Furthermore, $\mu(A_i) = \mu(A_j)$ whenever $j = \alpha^n(i)$ for some $n$.

Proof. First observe that, with the constant stationary density $f = 1_X$, we have $P 1_X = 1_X$, so that $P^n 1_X = 1_X$. However, if $P$ is constrictive, then, from Theorem 5.3.1,
$$P^{n+1} 1_X(x) = \sum_{i=1}^{r} \lambda_{\alpha^{-n}(i)}(1_X)\,g_i(x) + Q_n 1_X(x). \tag{5.4.3}$$
From our considerations in the preceding section, we know that the summation in equation (5.4.3) is periodic. Let $\tau$ be the period of the summation portion of $P^{n+1} 1_X$ (remember that $\tau \le r!$), so that $\alpha^{-n\tau}(i) = i$ and
$$P^{(n+1)\tau} 1_X(x) = \sum_{i=1}^{r} \lambda_i(1_X)\,g_i(x) + Q_{n\tau} 1_X(x).$$
Passing to the limit as $n \to \infty$ and using the stationarity of $1_X$, we have
$$1_X(x) = \sum_{i=1}^{r} \lambda_i(1_X)\,g_i(x). \tag{5.4.4}$$


However, since the functions $g_i$ are supported on disjoint sets, it follows from (5.4.4) that each $g_i$ must be constant on its support or, more specifically,
$$g_i(x) = [1/\lambda_i(1_X)]\,1_{A_i}(x),$$
where $A_i \subset X$ denotes the support of $g_i$, that is, the set of all $x$ such that $g_i(x) \ne 0$. From (5.4.4) it also follows that $\bigcup_i A_i = X$.
Apply the operator $P^n$ to equation (5.4.4) to give
$$P^n 1_X(x) = 1_X(x) = \sum_{i=1}^{r} \lambda_i(1_X)\,g_{\alpha^n(i)}(x),$$
and, by the same reasoning employed earlier, we have
$$g_{\alpha^n(i)}(x) = [1/\lambda_i(1_X)]\,1_{A_{\alpha^n(i)}}(x) \quad \text{for all } x \in A_{\alpha^n(i)}.$$
Thus the functions $g_i$ and $g_{\alpha^n(i)}$ must be equal to the same constant on their respective supports. And, since the functions $g_i$ are densities, we must have
$$\int_{A_i} g_i(x)\,\mu(dx) = 1 = \mu(A_i)/\lambda_i(1_X).$$
Thus $\mu(A_i) = \lambda_i(1_X)$ and
$$g_i(x) = [1/\mu(A_i)]\,1_{A_i}(x). \tag{5.4.5}$$
Moreover, $\mu(A_{\alpha^n(i)}) = \mu(A_i)$ for all $n$. $\square$

5.5 Ergodicity, Mixing, and Exactness


We now turn our attention to the determination of ergodicity, mixing, and
exactness for operators $P$ that can be written in the form of equation
(5.3.8). We assume throughout that $\mu(X) = 1$ and that $P 1_X = 1_X$. We
further note that a permutation $\{\alpha(1), \ldots, \alpha(r)\}$ of the set $\{1, \ldots, r\}$ (see
Theorem 5.3.1) for which there is no proper nonempty invariant subset is called a cycle or
cyclical permutation.
Ergodicity
Theorem 5.5.1. Let $(X, \mathcal{A}, \mu)$ be a normalized measure space and $P: L^1 \to L^1$ a constrictive Markov operator. Then $P$ is ergodic if and only if the permutation $\{\alpha(1), \ldots, \alpha(r)\}$ of the sequence $\{1, \ldots, r\}$ is cyclical.
Proof. We start the proof with the "if" portion. Recall from equation (5.2.1) that the average $A_n f$ is defined by
$$A_n f(x) = \frac{1}{n} \sum_{j=0}^{n-1} P^j f(x).$$

Thus, with representation (5.4.2), $A_n f$ can be written as
$$A_n f(x) = \sum_{i=1}^{r} \left[ \frac{1}{n} \sum_{j=0}^{n-1} \lambda_{\alpha^{-j}(i)}(f) \right] \hat{1}_{A_i}(x) + \tilde{Q}_n f(x),$$
where the remainder $\tilde{Q}_n f$ is given by
$$\tilde{Q}_n f(x) = \frac{1}{n} \sum_{j=0}^{n-1} Q_j f(x), \qquad \|\tilde{Q}_n f\| \to 0 \text{ as } n \to \infty.$$
Now consider the coefficients
$$\frac{1}{n} \sum_{j=0}^{n-1} \lambda_{\alpha^{-j}(i)}(f). \tag{5.5.1}$$
Since, as we showed in Section 5.4, the sequence $\{\lambda_{\alpha^{-j}(i)}(f)\}$ is periodic in $j$, the summation (5.5.1) must always have a limit as $n \to \infty$. Let this limit be $\bar{\lambda}_i(f)$. Assume there are no invariant subsets of $\{1, \ldots, r\}$ under the permutation $\alpha$. Then the limits $\bar{\lambda}_i(f)$ must be independent of $i$, since every piece of the summation (5.5.1) of length $r$ for different $i$ consists of the same numbers but in a different order. Write $\bar{\lambda}(f)$ for this common value.

Further, since $\alpha$ is cyclical, Proposition 5.4.2 implies that $\mu(A_i) = \mu(A_j) = 1/r$ for all $i, j$ and $\hat{1}_{A_i} = r\,1_{A_i}$, so that
$$\lim_{n \to \infty} A_n f = r \bar{\lambda}(f) \sum_{i=1}^{r} 1_{A_i} = r \bar{\lambda}(f)\,1_X.$$
Hence, for $f \in D$, $\bar{\lambda}(f) = 1/r$, and we have proved that if the permutation $\{\alpha(1), \ldots, \alpha(r)\}$ of $\{1, \ldots, r\}$ is cyclical, then $\{P^n f\}$ is Cesàro convergent to $1_X$; therefore, $P$ is ergodic.
The converse is also easy to prove. Suppose $P$ is ergodic and that $\{\alpha(i)\}$ is not a cyclical permutation, so that $\{\alpha(i)\}$ has a proper invariant subset $I$. As an initial $f$ take
$$f(x) = \sum_{i=1}^{r} c_i\,\hat{1}_{A_i}(x),$$
wherein
$$c_i = \begin{cases} c \ne 0 & \text{if } i \text{ belongs to the invariant subset } I \text{ of the permutation of } \{1, \ldots, r\}, \\ 0 & \text{otherwise.} \end{cases}$$


Then
$$\lim_{n \to \infty} A_n f = \sum_{i=1}^{r} \bar{\lambda}_i(f)\,\hat{1}_{A_i},$$
where $\bar{\lambda}_i(f) \ne 0$ if $i$ is contained in the invariant subset $I$, and $\bar{\lambda}_i(f) = 0$ otherwise. Thus the limit of $A_n f$ as $n \to \infty$ is not a constant function of $x$, so that $P$ cannot be ergodic. This is a contradiction; hence, if $P$ is ergodic, $\{\alpha(i)\}$ must be a cyclical permutation. $\square$
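The dichotomy in Theorem 5.5.1 can be seen in the smallest possible setting. The sketch below is illustrative: on three states, a cyclic permutation matrix yields Cesàro averages equal to the constant density, while a permutation with the invariant subset $\{0, 1\}$ leaves the averages nonconstant.

```python
import numpy as np

# Sketch for Theorem 5.5.1 on a finite state space: when P just permutes
# disjoint densities, A_n f is constant iff the permutation is one cycle.

def cesaro(K, f, n=300):
    out, g = np.zeros_like(f), f.copy()
    for _ in range(n):
        out += g
        g = K @ g
    return out / n

# (a) cyclic permutation of 3 states: 0 -> 1 -> 2 -> 0
C = np.roll(np.eye(3), 1, axis=0)
a = cesaro(C, np.array([1.0, 0.0, 0.0]))
print("cyclic:", a)        # uniform: the ergodic case

# (b) permutation with invariant subset {0, 1}: 0 <-> 1, state 2 fixed
Knc = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0]])
b = cesaro(Knc, np.array([1.0, 0.0, 0.0]))
print("non-cyclic:", b)    # mass never reaches state 2: not ergodic
```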
Mixing and Exactness

Theorem 5.5.2. Let $(X, \mathcal{A}, \mu)$ be a normalized measure space and $P: L^1 \to L^1$ a constrictive Markov operator. If $r = 1$ in representation (5.3.8) for $P$, then $P$ is exact.

Proof. The proof is simple. Assume $r = 1$, so by (5.4.2) we have
$$P^{n+1} f(x) = \lambda(f)\,1_X(x) + Q_n f(x)$$
and, thus,
$$\lim_{n \to \infty} P^{n+1} f = \lambda(f)\,1_X.$$
In particular, when $f \in D$, then $\lambda(f) = 1$ since $P$ preserves the norm. Hence, for all $f \in D$, $\{P^n f\}$ converges strongly to $1_X$, and $P$ is therefore exact (and, of course, also mixing). $\square$
The converse is surprising, for we can prove that P mixing implies that
r = 1.

Theorem 5.5.3. Again let $(X, \mathcal{A}, \mu)$ be a normalized measure space and $P: L^1 \to L^1$ a constrictive Markov operator. If $P$ is mixing, then $r = 1$ in representation (5.3.8).

Proof. To see this, assume $P$ is mixing but that $r > 1$, and take an initial $f \in D$ given by
$$f(x) = c_1\,1_{A_1}(x), \quad \text{where } c_1 = 1/\mu(A_1).$$
Therefore
$$P^n f(x) = c_1\,1_{A(n)}(x),$$
where $A(n) = A_{\alpha^n(1)}$. Since $P$ was assumed to be mixing, $\{P^n f\}$ converges weakly to $1_X$. However, note that the scalar product
$$\langle P^n f, 1_{A_1} \rangle = \int_{A_1} P^n f(x)\,\mu(dx)$$
takes the value $1$ when $\alpha^n(1) = 1$ and the value $0$ otherwise. Hence $\{P^n f\}$ will converge weakly to $1_X$ only if $\alpha^n(1) = 1$ for all sufficiently large $n$. Since $\alpha$ is a cyclical permutation, $r$ cannot be greater than $1$, thus demonstrating that $r = 1$. $\square$
Remark 5.5.1. It is somewhat surprising that in this case $P$ mixing implies $P$ exact. $\square$

Remark 5.5.2. Observe that, except for the remainder $Q_n f$, $P^{n+1} f$ behaves like a permutation, for which the notions of ergodicity, mixing, and exactness are quite simple. $\square$
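A minimal numerical sketch of the $r = 1$ case (illustrative, not from the text): for a strictly positive column-stochastic matrix, the decomposition collapses to a single term, and $P^n f$ converges strongly to the unique stationary density from every initial density, i.e., $P$ behaves exactly.

```python
import numpy as np

# Sketch for Theorems 5.5.2/5.5.3 on a finite state space: with r = 1,
# {P^n f} converges strongly to the stationary density for every f.
# A strictly positive column-stochastic matrix is the simplest case.

K = np.array([[0.2, 0.5, 0.3],
              [0.5, 0.2, 0.3],
              [0.3, 0.3, 0.4]])          # strictly positive, columns sum to 1

# Stationary density f*: the eigenvector of K for eigenvalue 1.
w, V = np.linalg.eig(K)
fstar = np.real(V[:, np.argmax(np.real(w))])
fstar = fstar / fstar.sum()

errs = []
for i in range(3):                       # every extreme density
    f = np.eye(3)[i]
    for _ in range(100):
        f = K @ f
    errs.append(np.abs(f - fstar).sum()) # L^1 distance
print("max L1 error after 100 steps:", max(errs))
```

Here $K$ is doubly stochastic, so the stationary density is uniform; strong convergence from every vertex of the simplex is the finite-state analogue of exactness.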

5.6 Asymptotic Stability of $\{P^n\}$


Our considerations of ergodicity, mixing, and exactness for Markov operators in the previous section were based on the assumption that we were
working with a normalized measure space $(X, \mathcal{A}, \mu)$. We now turn to a more
general situation and take $(X, \mathcal{A}, \mu)$ to be an arbitrary measure space. We
show how Theorem 5.3.1 allows us to obtain a most interesting result concerning the asymptotic stability of $\{P^n f\}$.
We first present a generalization, for Markov operators, of the concept of
exactness for Frobenius-Perron operators associated with a transformation.
Definition 5.6.1. Let $(X, \mathcal{A}, \mu)$ be a measure space and $P: L^1 \to L^1$ a Markov operator. Then $\{P^n\}$ is said to be asymptotically stable if there exists a unique $f_* \in D$ such that $P f_* = f_*$ and
$$\lim_{n \to \infty} \|P^n f - f_*\| = 0 \quad \text{for every } f \in D. \tag{5.6.1}$$
When $P$ is a Frobenius-Perron operator, the following definition holds.

Definition 5.6.2. Let $(X, \mathcal{A}, \mu)$ be a measure space and $P: L^1 \to L^1$ the Frobenius-Perron operator corresponding to a nonsingular transformation $S: X \to X$. If $\{P^n\}$ is asymptotically stable, then the transformation $S$ is said to be statistically stable.
The following theorem is a direct consequence of Theorem 5.3.1.

Theorem 5.6.1. Let $P$ be a constrictive Markov operator. Assume there is a set $A \subset X$ of nonzero measure, $\mu(A) > 0$, with the property that for every $f \in D$ there is an integer $n_0(f)$ such that
$$P^n f(x) > 0 \tag{5.6.2}$$
for almost all $x \in A$ and all $n > n_0(f)$. Then $\{P^n\}$ is asymptotically stable.

Proof. Since, by assumption, $P$ is constrictive, representation (5.3.8) is valid. We will first show that $r = 1$.


Assume $r > 1$, and choose an integer $i_0$ such that $A$ is not contained in the support of $g_{i_0}$. Take a density $f \in D$ of the form $f(x) = g_{i_0}(x)$ and let $\tau$ be the period of the permutation $\alpha$. Then we have
$$P^{n\tau} f(x) = g_{i_0}(x).$$
Clearly, $P^{n\tau} f(x)$ is not positive on the set $A$, since $A$ is not contained in the support of $g_{i_0}$. This result contradicts (5.6.2) of the theorem and, thus, we must have $r = 1$.
Since $r = 1$, equation (5.3.10) reduces to
$$P^{n+1} f(x) = \lambda(f) g(x) + Q_n f(x),$$
so
$$\lim_{n \to \infty} P^n f = \lambda(f) g.$$
If $f \in D$, then $\lim_{n \to \infty} P^n f \in D$ also; therefore, by integrating over $X$ we have $1 = \lambda(f)$. Thus $\lim_{n \to \infty} P^n f = g$ for all $f \in D$, and $\{P^n\}$ is asymptotically stable; this finishes the proof. $\square$
The disadvantage of this theorem is that it requires checking two
different criteria: (i) that $P$ is constrictive, and (ii) the existence of the set
$A$. It is interesting that, by a slight modification of the assumption that
$P^n f$ is positive on a set $A$, we can completely eliminate the necessity of
assuming $P$ to be constrictive. To do this, we first introduce the notion of
a lower-bound function.

Definition 5.6.3. A function $h \in L^1$ is a lower-bound function for a Markov operator $P: L^1 \to L^1$ if
$$\lim_{n \to \infty} \|(P^n f - h)^-\| = 0 \quad \text{for every } f \in D. \tag{5.6.3}$$
Condition (5.6.3) may be rewritten as
$$P^n f \ge h - \varepsilon_n, \qquad \text{where } \|\varepsilon_n\| \to 0 \text{ as } n \to \infty,$$
or, even more explicitly, as
$$P^n f(x) \ge h(x) - \varepsilon_n(x) \quad \text{for almost all } x \in X, \text{ with } \varepsilon_n \ge 0 \text{ and } \|\varepsilon_n\| \to 0.$$
Thus, figuratively speaking, a lower-bound function $h$ is one such that, for every density $f$, successive iterates of that density by $P$ are eventually almost everywhere above $h$.
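A lower-bound function can be exhibited concretely in a finite-state sketch; the matrix and the choice $h \equiv 0.1$ below are illustrative assumptions, not examples from the text.

```python
import numpy as np

# Sketch of Definition 5.6.3 on a finite state space: h is a lower-bound
# function if ||(P^n f - h)^-|| -> 0 for every density f, i.e., the
# iterates eventually dominate h up to a vanishing error.

K = np.array([[0.2, 0.5, 0.3],
              [0.5, 0.2, 0.3],
              [0.3, 0.3, 0.4]])          # strictly positive, columns sum to 1

h = np.full(3, 0.1)                      # nontrivial candidate: h >= 0, ||h|| > 0

def shortfall(f, n):
    """||(P^n f - h)^-||_{L^1}: how far P^n f still dips below h."""
    for _ in range(n):
        f = K @ f
    return np.maximum(h - f, 0.0).sum()

# Iterates of every extreme density approach the uniform density (1/3 each),
# which dominates h, so the shortfall vanishes.
worst = max(shortfall(np.eye(3)[i], 20) for i in range(3))
print("worst shortfall after 20 steps:", worst)
```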
It is, of course, clear that any nonpositive function is a lower-bound function but, since $f \in D$ and thus $P^n f \in D$, and all densities are nonnegative, a nonpositive lower-bound function is of no interest. Thus we give a second definition.

Definition 5.6.4. A lower-bound function $h$ is called nontrivial if $h \ge 0$ and $\|h\| > 0$.
Having introduced the concept of nontrivial lower-bound functions, we can now state the following theorem.

Theorem 5.6.2. Let $P: L^1 \to L^1$ be a Markov operator. Then $\{P^n\}$ is asymptotically stable if and only if there is a nontrivial lower-bound function for $P$.
Proof. The "only if" part is obvious, since (5.6.1) implies (5.6.3) with $h = f_*$. The proof of the "if" part is not so direct and will be done in two steps. We first show that
$$\lim_{n \to \infty} \|P^n(f_1 - f_2)\| = 0 \tag{5.6.4}$$
for every $f_1, f_2 \in D$ and then proceed to construct the function $f_*$.


Step I. For every pair of densities fi, 12 E D, the IIP,.(h - 12)11 is a
decreasing function of n. To see this, note that, since every Markov operator
is contractive,
IlP/II ~II/II
and, as a consequence,

Now set g = h

- 12 and note that, since /I, 12 ED,


c = llg+ll

= IIY-11 = ~11911

Assume c > 0. We have g = g+ - g- and

IIP,.gll = cii(P,.(g+ /c)- h)- (P,.(g- /c)- h) II.

(5.6.5)

Since g+ fc and g- fc belong to D, by equation (5.6.3), there must exist an


integer n1 such that for all n ;::: n1

and

II(P"(g- /c)- h)_ II~ ~llhll.


Now we wish to establish upper bounds for IIP,.(g+ /c) - hll and
llpn(g- /c)- hll. To do this, first note that, for any pair of nonnegative
real numbers a and b,
la-bl =a-b+2(a-b)-.


Next write
$$\begin{aligned}
\|P^n(g^+/c) - h\| &= \int_X |P^n(g^+/c)(x) - h(x)|\,\mu(dx) \\
&= \int_X P^n(g^+/c)(x)\,\mu(dx) - \int_X h(x)\,\mu(dx) + 2\int_X \left( P^n(g^+/c)(x) - h(x) \right)^- \mu(dx) \\
&= \|P^n(g^+/c)\| - \|h\| + 2\left\| \left( P^n(g^+/c) - h \right)^- \right\| \\
&\le 1 - \|h\| + 2 \cdot \tfrac{1}{4}\|h\| = 1 - \tfrac{1}{2}\|h\| \qquad \text{for } n \ge n_1.
\end{aligned}$$
Analogously,
$$\|P^n(g^-/c) - h\| \le 1 - \tfrac{1}{2}\|h\| \qquad \text{for } n \ge n_1.$$
Thus equation (5.6.5) gives
$$\|P^n g\| \le c\|P^n(g^+/c) - h\| + c\|P^n(g^-/c) - h\| \le c(2 - \|h\|) = \|g\|\left( 1 - \tfrac{1}{2}\|h\| \right) \quad \text{for } n \ge n_1. \tag{5.6.6}$$

From (5.6.6), for any $f_1, f_2 \in D$, we can find an integer $n_1$ such that
$$\|P^{n_1}(f_1 - f_2)\| \le \|f_1 - f_2\|\left( 1 - \tfrac{1}{2}\|h\| \right).$$
By applying the same argument to the pair $P^{n_1} f_1$, $P^{n_1} f_2$, we may find a second integer $n_2$ such that
$$\|P^{n_1 + n_2}(f_1 - f_2)\| \le \|P^{n_1}(f_1 - f_2)\|\left( 1 - \tfrac{1}{2}\|h\| \right) \le \|f_1 - f_2\|\left( 1 - \tfrac{1}{2}\|h\| \right)^2.$$
After $k$ repetitions of this procedure, we have
$$\|P^{n_1 + \cdots + n_k}(f_1 - f_2)\| \le \|f_1 - f_2\|\left( 1 - \tfrac{1}{2}\|h\| \right)^k,$$
and since $\|P^n(f_1 - f_2)\|$ is a decreasing function of $n$, this implies (5.6.4).
Step II. To complete the proof, we construct a maximal lower-bound function for $P$. Thus, let
$$\rho = \sup\{\|h\| : h \text{ is a lower-bound function for } P\}.$$
Since by assumption there is a nontrivial $h$, we must have $0 < \rho \le 1$.
Observe that for any two lower-bound functions $h_1$ and $h_2$, the function $h = \max(h_1, h_2)$ is also a lower-bound function. To see this, note that
$$\left( P^n f - \max(h_1, h_2) \right)^- \le \left( P^n f - h_1 \right)^- + \left( P^n f - h_2 \right)^-,$$
so that $\|(P^n f - \max(h_1, h_2))^-\| \to 0$ as $n \to \infty$. Choose a sequence $\{h_j\}$ of lower-bound functions such that $\|h_j\| \to \rho$. Replacing, if necessary, $h_j$ by $\max(h_1, \ldots, h_j)$, we can construct an increasing


sequence $\{h_j\}$ of lower-bound functions, which will always have a limit (finite or infinite). This limiting function
$$h_* = \lim_{j \to \infty} h_j$$
is also a lower-bound function since
$$\|(P^n f - h_*)^-\| \le \|(P^n f - h_j)^-\| + \|h_* - h_j\|$$
and, by the Lebesgue monotone convergence theorem, $\|h_* - h_j\| \to 0$ as $j \to \infty$.
Now the limiting function $h_*$ is also the maximal lower-bound function. To see this, note that for any other lower-bound function $h$, the function $\max(h, h_*)$ is also a lower-bound function and that
$$\|\max(h, h_*)\| \le \rho = \|h_*\|,$$
which implies $h \le h_*$.
Observe that, since $(Pf)^- \le Pf^-$, for every $m$ and $n$ ($n > m$),
$$\left\| \left( P^n f - P^m h_* \right)^- \right\| = \left\| \left( P^m(P^{n-m} f - h_*) \right)^- \right\| \le \left\| P^m\left( P^{n-m} f - h_* \right)^- \right\| \le \left\| \left( P^{n-m} f - h_* \right)^- \right\|,$$
which implies that, for every $m$, the function $P^m h_*$ is a lower-bound function. Thus, since $h_*$ is the maximal lower-bound function, $P^m h_* \le h_*$, and since $P^m$ preserves the integral, $P^m h_* = h_*$. Thus the function $f_* = h_*/\|h_*\|$ is a density satisfying $P f_* = f_*$.
Finally, by equation (5.6.4), we have
$$\lim_{n \to \infty} \|P^n f - f_*\| = \lim_{n \to \infty} \|P^n(f - f_*)\| = 0 \quad \text{for } f \in D,$$
which automatically gives equation (5.6.1). $\square$
In checking the conditions of Theorem 5.6.2 it is once again sufficient
to demonstrate that (5.6.3) holds for densities $f$ drawn from a dense set
$D_0 \subset D(X)$.
Remark 5.6.1. Before continuing, it is interesting to point out the connection between Theorems 5.3.1 and 5.6.2 concerning asymptotic periodicity and asymptotic stability. Namely, from the spectral decomposition Theorem 5.3.1 we can actually shorten the proof of asymptotic stability in Theorem 5.6.2.
To show this, assume $P$ satisfies the lower-bound function condition (5.6.3). Pick an $f \in D$ and choose a number $n_0(f)$ such that
$$\|(P^n f - h)^-\| \le \tfrac{1}{4}\|h\| \quad \text{for } n \ge n_0(f). \tag{5.6.7}$$
From $|a - b| = a - b + 2(a - b)^-$ we have
$$\|(P^n f - h)^+\| \le \|P^n f - h\| \le \|P^n f\| - \|h\| + 2\|(P^n f - h)^-\|,$$
and since $\|P^n f\| = 1$, equation (5.6.7) gives
$$\|(P^n f - h)^+\| \le 1 - \tfrac{1}{2}\|h\| \quad \text{for } n \ge n_0(f).$$
Thus, by Proposition 5.3.2 we know that the operator $P$ is constrictive.


Since $P$ is constrictive, it satisfies Theorem 5.3.1 and in particular we have the decomposition formula (5.3.8). Using the assumed existence of a lower-bound function $h$, we will show that $r = 1$ by necessity.
Assume the contrary and take $r \ge 2$. Consider two basis functions $g_1$ and $g_2$ in the decomposition (5.3.8). From $P g_i = g_{\alpha(i)}$ we obviously have $P^{nm} g_i = g_i$ for $m = r!$ and an arbitrary $n$. However, from (5.6.3) it also follows that
$$P^{nm} g_i \ge h - \varepsilon_{nm}, \qquad i = 1, 2,$$
so $g_i \ge h - \varepsilon_{nm}$ for $i = 1, 2$. Letting $n \to \infty$ gives $g_i \ge h$, which implies $g_1 g_2 > 0$ on a set of positive measure, contradicting the disjointness of the supports of the $g_i$ required by Theorem 5.3.1. We are thus led to a contradiction and therefore must have $r = 1$. Thus (5.3.8) implies the asymptotic stability of $\{P^n\}$ with $f_* = g_1$.
Hence, by the expedient of using Theorem 5.3.1 we have been able to considerably shorten the proof of Theorem 5.6.2. $\square$
The results of Theorem 5.6.2 with respect to the uniqueness of stationary
densities for asymptotically stable Markov operators may be generalized by
the following observation.

Proposition 5.6.1. Let $(X, \mathcal{A}, \mu)$ be a measure space and $P: L^1 \to L^1$ a Markov operator. If $\{P^n\}$ is asymptotically stable and $f_*$ is the unique stationary density of $P$, then for every normalized $f \in L^1$ ($\|f\| = 1$) the condition
$$Pf = f \tag{5.6.7}$$
implies that either $f = f_*$ or $f = -f_*$.

Proof. From Proposition 3.1.3, equation (5.6.7) implies that both $f^+$ and $f^-$ are fixed points of $P$. Assume $\|f^+\| > 0$, so that $\tilde{f} = f^+/\|f^+\|$ is a density and $P\tilde{f} = \tilde{f}$. Uniqueness of $f_*$ implies $\tilde{f} = f_*$; hence
$$f^+ = \|f^+\| f_*,$$
which must also hold for $\|f^+\| = 0$. In an analogous fashion,
$$f^- = \|f^-\| f_*,$$
so that
$$f = f^+ - f^- = \left( \|f^+\| - \|f^-\| \right) f_* = a f_*.$$
Since $\|f\| = \|f_*\| = 1$, we have $|a| = 1$, and the proof is complete. $\square$


Before closing this section we state and prove a result that draws the connection between statistical stability and exactness when $P$ is a Frobenius-Perron operator.

Proposition 5.6.2. Let $(X, \mathcal{A}, \mu)$ be a measure space, $S: X \to X$ a nonsingular transformation such that $S(A) \in \mathcal{A}$ for $A \in \mathcal{A}$, and $P$ the Frobenius-Perron operator corresponding to $S$. If $S$ is statistically stable and $f_*$ is the density of the unique invariant measure, then the transformation $S$ with the measure
$$\mu_{f_*}(A) = \int_A f_*(x)\,\mu(dx) \quad \text{for } A \in \mathcal{A}$$
is exact.

Proof. From Theorem 4.1.1 it follows immediately that $\mu_{f_*}$ is invariant. Thus, it only remains to prove the exactness.

Assume $\mu_{f_*}(A) > 0$ and define
$$f_A(x) = [1/\mu_{f_*}(A)]\,f_*(x)\,1_A(x) \quad \text{for } x \in X.$$
Clearly, $f_A \in D(X, \mathcal{A}, \mu)$ and
$$\lim_{n \to \infty} r_n \equiv \lim_{n \to \infty} \|P^n f_A - f_*\| = 0.$$
From the definition of $\mu_{f_*}$, we have
$$\mu_{f_*}(S^n(A)) = \int_{S^n(A)} f_*(x)\,\mu(dx) \ge \int_{S^n(A)} P^n f_A(x)\,\mu(dx) - r_n. \tag{5.6.8}$$
By Proposition 3.2.1, we know that $P^n f_A$ is supported on $S^n(A)$, so that
$$\int_{S^n(A)} P^n f_A(x)\,\mu(dx) = \int_X P^n f_A(x)\,\mu(dx) = 1.$$
Substituting this result into (5.6.8) and taking the limit as $n \to \infty$ gives
$$\lim_{n \to \infty} \mu_{f_*}(S^n(A)) = 1;$$
hence $S: X \to X$ is exact by definition. $\square$

Remark 5.6.2. In the most general case, Proposition 5.6.2 is not invertible; that is, statistical stability of $S$ implies the existence of a unique invariant measure and exactness, but not vice versa. Lin [1971] has shown that the inverse implication is true when the initial measure $\mu$ is invariant. $\square$


5.7 Markov Operators Defined by a Stochastic Kernel
As a sequel to Section 5.6, we wish to develop some important consequences of Theorems 5.6.1 and 5.6.2. Let $(X, \mathcal{A}, \mu)$ be a measure space and $K: X \times X \to R$ a measurable function that satisfies
$$0 \le K(x, y) \tag{5.7.1}$$
and
$$\int_X K(x, y)\,dx = 1 \qquad [dx = \mu(dx)]. \tag{5.7.2}$$
Any function $K$ satisfying (5.7.1) and (5.7.2) is called a stochastic kernel.


Further, we define an integral operator $P$ by
$$Pf(x) = \int_X K(x, y) f(y)\,dy. \tag{5.7.3}$$
The operator $P$ is clearly linear and nonnegative. Since we also have
$$\int_X Pf(x)\,dx = \int_X dx \int_X K(x, y) f(y)\,dy = \int_X f(y)\,dy \int_X K(x, y)\,dx = \int_X f(y)\,dy,$$
$P$ is therefore a Markov operator. In the special case that $X$ is a finite set and $\mu$ is a counting measure, we have a Markov chain and $P$ is a stochastic matrix.
Now consider two Markov operators $P_a$ and $P_b$ and their corresponding stochastic kernels $K_a$ and $K_b$. Clearly, $P_a P_b$ is also a Markov operator, and we wish to know how its kernel is related to $K_a$ and $K_b$. Thus, write
$$\begin{aligned}
(P_a P_b) f(x) = P_a(P_b f)(x) &= \int_X K_a(x, z)\,(P_b f)(z)\,dz \\
&= \int_X K_a(x, z) \left\{ \int_X K_b(z, y) f(y)\,dy \right\} dz \\
&= \int_X \left\{ \int_X K_a(x, z) K_b(z, y)\,dz \right\} f(y)\,dy.
\end{aligned}$$
Then $P_a P_b$ is also an integral operator with the kernel
$$K(x, y) = \int_X K_a(x, z) K_b(z, y)\,dz. \tag{5.7.4}$$
We denote this composed kernel $K$ by
$$K = K_a * K_b \tag{5.7.5}$$
and note that the composition has the properties:

(i) K_a * (K_b * K_c) = (K_a * K_b) * K_c (associative law); and

(ii) any kernel formed by the composition of stochastic kernels is stochastic.

However, in general kernels K_a and K_b do not commute; that is, K_a * K_b ≠ K_b * K_a. Note that the foregoing composition operation is just a generalization of matrix multiplication.
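This matrix analogy can be made concrete. In the finite-state case a stochastic kernel becomes a matrix with unit column sums [the discrete form of the normalization (5.7.2)], and the operation * becomes matrix multiplication. A minimal sketch with two arbitrary illustrative 2 × 2 matrices:

```python
Ka = [[0.9, 0.2],
      [0.1, 0.8]]
Kb = [[0.5, 0.7],
      [0.5, 0.3]]

def compose(A, B):
    # discrete form of (5.7.4): (A*B)[x][y] = sum_z A[x][z] * B[z][y]
    n = len(A)
    return [[sum(A[x][z] * B[z][y] for z in range(n)) for y in range(n)]
            for x in range(n)]

KaKb, KbKa = compose(Ka, Kb), compose(Kb, Ka)
for y in range(2):  # property (ii): the composition again has unit column sums
    assert abs(sum(row[y] for row in KaKb) - 1.0) < 1e-12
print(KaKb[0][0], KbKa[0][0])  # the two entries differ: K_a * K_b != K_b * K_a
```

Associativity, property (i), holds for the same reason that matrix multiplication is associative.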
Now we are in a position to show that Theorem 5.6.2 can be applied to operators P defined by stochastic kernels and, in fact, gives a simple sufficient condition for the asymptotic stability of {P^n}.

Corollary 5.7.1. Let (X, 𝒜, μ) be a measure space, K: X × X → R a stochastic kernel, that is, K satisfies (5.7.1) and (5.7.2), and P the corresponding Markov operator defined by (5.7.3). Denote by K_n the kernel corresponding to P^n. If, for some m,

$$\int_X \inf_y K_m(x,y)\,dx > 0, \tag{5.7.6}$$

then {P^n} is asymptotically stable.

Proof. By the definition of K_n, for every f ∈ D(X) we have

$$P^n f(x) = \int_X K_n(x,y)\,f(y)\,dy.$$

Furthermore, from the associative property of the composition of kernels,

$$K_{n+m}(x,y) = \int_X K_m(x,z)\,K_n(z,y)\,dz,$$

so that

$$P^{n+m} f(x) = \int_X K_{n+m}(x,y)\,f(y)\,dy = \int_X \left\{\int_X K_m(x,z)\,K_n(z,y)\,dz\right\} f(y)\,dy.$$

If we set

$$h(x) = \inf_y K_m(x,y),$$

then

$$P^{n+m} f(x) \ge h(x) \int_X \left\{\int_X K_n(z,y)\,dz\right\} f(y)\,dy = h(x)\int_X f(y)\,dy,$$

since K_n is a stochastic kernel. Furthermore, since f ∈ D(X),

$$\int_X f(y)\,dy = 1,$$

and, therefore,

$$P^{n+m} f(x) \ge h(x) \qquad \text{for } n \ge 1,\ f \in D(X).$$

Thus

$$P^n f \ge h \qquad \text{for } n \ge m+1,$$

which implies that (5.6.3) holds, and we have finished the proof.

In the case that X is a finite set and K is a stochastic matrix, this result is equivalent to one originally obtained by Markov.
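The finite-state case can be checked directly: a column-stochastic matrix with strictly positive entries satisfies the discrete analogue of (5.7.6), and iterating it drives every initial density to the same stationary one. A sketch with an arbitrary 3 × 3 example:

```python
# Discrete analogue of Corollary 5.7.1: strictly positive column-stochastic
# matrix, so inf_y K(x,y) > 0 and {P^n} is asymptotically stable.
K = [[0.5, 0.3, 0.2],
     [0.3, 0.4, 0.3],
     [0.2, 0.3, 0.5]]

def apply_K(K, f):
    # (Pf)(x) = sum_y K[x][y] f[y], the matrix form of (5.7.3)
    n = len(f)
    return [sum(K[x][y] * f[y] for y in range(n)) for x in range(n)]

f, g = [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]   # two extreme initial densities
for _ in range(60):
    f, g = apply_K(K, f), apply_K(K, g)
dist = sum(abs(u - v) for u, v in zip(f, g))
# both iterates converge to the uniform stationary density [1/3, 1/3, 1/3]
print([round(v, 4) for v in f], round(dist, 10))
```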
Although condition (5.7.6) on the kernel is quite simple, it is seldom satisfied when K(x,y) is defined on an unbounded space. For example, in Section 8.9 we discuss the evolution of densities under the operation of a Markov operator defined by the kernel [cf. equation (8.9.6)]

$$K(x,y) = \begin{cases} -e^{y}\,\mathrm{Ei}(-y), & 0 < x < y \\ -e^{y}\,\mathrm{Ei}(-x), & 0 < y < x, \end{cases} \tag{5.7.7}$$

where

$$-\mathrm{Ei}(-x) = \int_x^\infty \frac{e^{-y}}{y}\,dy, \qquad x > 0,$$

is the exponential integral. In this case

$$\inf_y K(x,y) = 0 \qquad \text{for all } x > 0,$$

and the same holds for all of its iterates K_m(x,y). A similar problem occurs with the kernel

$$K(x,y) = g(ax + by),$$

where b ≠ 0 and g is an integrable function defined on R or even on R⁺ (cf. Example 5.7.2).

In these and other cases where condition (5.7.6) is not satisfied, an alternative approach, reminiscent of the stability methods developed by Liapunov, offers a way to examine the asymptotic properties of iterates of densities by Markov operators.

Let G be an unbounded measurable subset of a d-dimensional Euclidean space, G ⊂ R^d, and K: G × G → R a measurable stochastic kernel. We will call any measurable nonnegative function V: G → R satisfying

$$\lim_{|x|\to\infty} V(x) = \infty \tag{5.7.8}$$

a Liapunov function.

Next, we introduce the Chebyshev inequality through the following proposition.

Proposition 5.7.1. Let (X, 𝒜, μ) be a measure space, V: X → R an arbitrary nonnegative measurable function, and for all f ∈ D set

$$E(V|f) = \int_X V(x)\,f(x)\,\mu(dx).$$

If

$$G_a = \{x : V(x) < a\},$$

then

$$\int_{G_a} f(x)\,\mu(dx) \ge 1 - E(V|f)/a \tag{5.7.9}$$

(the Chebyshev inequality).

Proof. The proof is easy. Clearly,

$$E(V|f) \ge \int_{X\setminus G_a} V(x)\,f(x)\,\mu(dx) \ge a \int_{X\setminus G_a} f(x)\,\mu(dx) = a\left\{1 - \int_{G_a} f(x)\,\mu(dx)\right\},$$

and rearranging gives (5.7.9). Thus the Chebyshev inequality is proved.
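The inequality (5.7.9) is easy to check numerically. A sketch with V(x) = x and f the exponential density on [0, ∞), so that E(V|f) = 1; the sample size and the values of a are arbitrary choices:

```python
import random

# Check of the Chebyshev inequality (5.7.9): for the exponential density,
# the mass of f on G_a = {x : x < a} must be at least 1 - E(V|f)/a = 1 - 1/a.
random.seed(2)
sample = [random.expovariate(1.0) for _ in range(100_000)]
for a in (2.0, 5.0, 10.0):
    mass = sum(1 for x in sample if x < a) / len(sample)
    print(a, round(mass, 4), ">= 1 - 1/a =", 1 - 1.0 / a)
    assert mass >= 1 - 1.0 / a
```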


With the lower-bound Theorem 5.6.2 and the Chebyshev inequality, it is possible to prove the following theorem.

Theorem 5.7.1. Let K: G × G → R be a stochastic kernel and let P, defined by (5.7.3) with G replacing X, be the corresponding Markov operator. If the kernel K(x,y) satisfies

$$\int_G \inf_{|y|\le r} K(x,y)\,dx > 0 \qquad \text{for every } r > 0 \tag{5.7.10}$$

and has a Liapunov function V: G → R such that

$$\int_G V(x)\,Pf(x)\,dx \le \alpha \int_G V(x)\,f(x)\,dx + \beta, \qquad 0 \le \alpha < 1,\ \beta \ge 0, \tag{5.7.11}$$

for f ∈ D, then {P^n} is asymptotically stable.

Remark 5.7.1. Before giving the proof, we note that sometimes, instead of verifying inequality (5.7.11), it is sufficient to check the simpler condition

$$\int_G K(x,y)\,V(x)\,dx \le \alpha V(y) + \beta, \tag{5.7.11a}$$

since (5.7.11a) implies (5.7.11). To see this, note that from (5.7.11a)

$$\int_G V(x)\,Pf(x)\,dx = \int_G \int_G V(x)\,K(x,y)\,f(y)\,dx\,dy \le \int_G [\alpha V(y) + \beta]\,f(y)\,dy = \alpha \int_G V(y)\,f(y)\,dy + \beta. \quad \Box$$

Proof. First define the function

$$E_n(V|f) = \int_G V(x)\,P^n f(x)\,dx, \tag{5.7.12}$$

which can be thought of as the expected value of V(x) with respect to the density P^n f(x). From (5.7.11) we have directly

$$E_n(V|f) \le \alpha E_{n-1}(V|f) + \beta. \tag{5.7.13}$$

By an induction argument, it is easy to show that from this equation we obtain

$$E_n(V|f) \le [\beta/(1-\alpha)] + \alpha^n E_0(V|f).$$

Even though E₀(V|f) is clearly dependent on our initial choice of f, it is equally clear that, for every f such that

$$E_0(V|f) < \infty, \tag{5.7.14}$$

there is some integer n₀ = n₀(f) such that

$$E_n(V|f) \le [\beta/(1-\alpha)] + 1 \qquad \text{for all } n \ge n_0. \tag{5.7.15}$$

Now let

$$G_a = \{x \in G : V(x) < a\},$$

so that from the Chebyshev inequality we have

$$\int_{G_a} P^n f(x)\,dx \ge 1 - \frac{E_n(V|f)}{a}. \tag{5.7.16}$$

Further, set

$$a > 1 + [\beta/(1-\alpha)];$$

then

$$\frac{E_n(V|f)}{a} \le \frac{1}{a}\left(1 + \frac{\beta}{1-\alpha}\right) < 1 \qquad \text{for } n \ge n_0,$$

and thus (5.7.16) becomes

$$\int_{G_a} P^n f(x)\,dx \ge 1 - \frac{1}{a}\left(1 + \frac{\beta}{1-\alpha}\right) > 0 \qquad \text{for } n \ge n_0. \tag{5.7.17}$$

Since V(x) → ∞ as |x| → ∞, there is an r > 0 such that V(x) > a for |x| > r. Thus the set G_a is entirely contained in the ball |x| ≤ r, and we may write

$$P^{n+1} f(x) = \int_G K(x,y)\,P^n f(y)\,dy \ge \int_{G_a} K(x,y)\,P^n f(y)\,dy \ge \inf_{y\in G_a} K(x,y)\int_{G_a} P^n f(y)\,dy \ge \varepsilon \inf_{|y|\le r} K(x,y) \tag{5.7.18}$$

for all n ≥ n₀, where ε = 1 − (1/a)[1 + β/(1−α)] is the positive lower bound from (5.7.17). By setting

$$h(x) = \varepsilon \inf_{|y|\le r} K(x,y)$$

in inequality (5.7.18), we have, by assumption (5.7.10), that ‖h‖ > 0. Finally, because of the continuity of V, the set D₀ ⊂ D of all f such that (5.7.14) is satisfied is dense in D. Thus all the conditions of Theorem 5.6.2 are satisfied.
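The step from (5.7.13) to (5.7.15) can be watched numerically: the recursion E_n ≤ αE_{n−1} + β pulls any finite starting value below β/(1−α) + 1 in finitely many steps. A sketch with arbitrary values of α, β, and E₀:

```python
alpha, beta = 0.7, 2.0          # 0 <= alpha < 1, beta >= 0, as in (5.7.11)
E = 1000.0                      # E_0(V|f), assumed finite as in (5.7.14)
limit = beta / (1 - alpha)      # the asymptotic bound beta/(1 - alpha)
n = 0
while E > limit + 1:            # look for the n_0(f) of (5.7.15)
    E = alpha * E + beta        # one application of (5.7.13), taken with equality
    n += 1
print(n, round(E, 3))           # n_0(f) and the first value below the bound
```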
Another important property of Markov operators defined by a stochastic kernel is that they may generate an asymptotically periodic sequence {P^n f} for every f ∈ D. This may happen if condition (5.7.10) on the kernel is replaced by a different one.

Theorem 5.7.2. Let K: G × G → R be a stochastic kernel and P the corresponding Markov operator. Assume that there is a nonnegative λ < 1 such that for every bounded B ⊂ G there is a δ = δ(B) > 0 for which

$$\int_E K(x,y)\,dx \le \lambda \qquad \text{for all measurable } E \text{ with } \mu(E) < \delta \text{ and all } y \in B. \tag{5.7.19}$$

Assume further that there exists a Liapunov function V: G → R such that (5.7.11) holds. Then P is constrictive. Consequently, for every f ∈ L¹ the sequence {P^n f} is asymptotically periodic.
Proof. Again consider E_n(V|f) defined by (5.7.12). Using condition (5.7.11) we once more obtain inequality (5.7.15). Thus, by the Chebyshev inequality, with G_a defined as in the proof of Theorem 5.7.1,

$$\int_{G\setminus G_a} P^n f(x)\,dx = 1 - \int_{G_a} P^n f(x)\,dx \le \frac{E_n(V|f)}{a} \le \frac{1}{a}\left(1 + \frac{\beta}{1-\alpha}\right) \qquad \text{for } n \ge n_0(f).$$

Set ε = (1−λ)/3. Choosing a sufficiently large a that satisfies

$$\varepsilon \ge \frac{1}{a}\left(1 + \frac{\beta}{1-\alpha}\right),$$

we have

$$\int_{G\setminus G_a} P^n f(x)\,dx \le \varepsilon \qquad \text{for } n \ge n_0(f). \tag{5.7.20}$$

Consequently, for any measurable set E with μ(E) < δ(G_a), we have

$$\int_{(G\setminus G_a)\cup E} P^n f(x)\,dx \le \varepsilon + \int_E P^n f(x)\,dx = \varepsilon + \int_G P^{n-1} f(y)\,dy \int_E K(x,y)\,dx$$
$$= \varepsilon + \int_{G\setminus G_a} P^{n-1} f(y)\,dy \int_E K(x,y)\,dx + \int_{G_a} P^{n-1} f(y)\,dy \int_E K(x,y)\,dx.$$

Using (5.7.19) and (5.7.20) applied to B = G_a, we finally have

$$\int_{(G\setminus G_a)\cup E} P^n f(x)\,dx \le \varepsilon + \varepsilon + \lambda = 2\varepsilon + \lambda = 1 - \varepsilon \qquad \text{for } n \ge n_0(f) + 1.$$

Thus, inequality (5.3.2) in Definition 5.3.2 of constrictiveness is satisfied.


A simple application of Theorem 5.3.1 completes the proof.
Before passing to some examples of the application of Theorems 5.7.1 and 5.7.2, we give two simple results concerning the eventual behavior of {P^n} when P is a Markov operator defined by a stochastic kernel.

Theorem 5.7.3. If there exists an integer m and a g ∈ L¹ such that

$$K_m(x,y) \le g(x),$$

where K_m(x,y) is the mth iterate of a stochastic kernel, then the sequence {P^n}, with P defined by (5.7.3), is asymptotically periodic.

Proof. Since K_m(x,y) ≤ g(x), we have

$$P^n f(x) = \int_X K_m(x,y)\,P^{n-m} f(y)\,dy \le g(x) \qquad \text{for } n \ge m.$$

Set h = g and take λ = 0; then by Proposition 5.3.2 the sequence {P^n} is asymptotically periodic.


A slight restriction on K_m(x,y) in Theorem 5.7.3 leads to a different result, given next.

Theorem 5.7.4. If there exists an integer m and a g ∈ L¹ such that

$$K_m(x,y) \le g(x),$$

where K_m(x,y) is the mth iterate of a stochastic kernel, and there is a set A ⊂ X with μ(A) > 0 such that

$$0 < K_m(x,y) \qquad \text{for } x \in A,\ y \in X,$$

then the sequence {P^n} is asymptotically stable.

Proof. The proof is a trivial consequence of the constrictiveness of P from Theorem 5.7.3, the assumptions, and Theorem 5.6.1.
Example 5.7.1. To see the power of Theorem 5.7.1, we first consider the case where the kernel K(x,y) is given by the exponential integrals in equation (5.7.7). It is easy to show that −e^y Ei(−y) is decreasing, and consequently

$$\inf_{0\le y\le r} K(x,y) \ge \min\{-\mathrm{Ei}(-x),\ -e^{r}\,\mathrm{Ei}(-r)\} > 0.$$

Furthermore, taking V(x) = x, we have, after integration,

$$\int_0^\infty x\,K(x,y)\,dx = \tfrac12(1+y).$$

Therefore it is clear that V(x) = x is a Liapunov function for this system, with α = β = ½. Also, observe that with f(x) = exp(−x) we have

$$Pf(x) = \int_0^\infty K(x,y)\,e^{-y}\,dy = e^{-x}.$$

Thus, the limiting density attained by repeated application of the Markov operator P is f_*(x) = exp(−x). □
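Both computations in this example can be verified by quadrature. The sketch below builds the exponential integral from its defining integral and checks that K(·, y) integrates to one and that ∫₀^∞ x K(x,y) dx = ½(1+y); the grids, cutoffs, and the value of y are arbitrary numerical choices:

```python
import math

def E1(x, n=1000, span=40.0):
    """Exponential integral E1(x) = -Ei(-x) = int_x^inf e^{-t}/t dt (Simpson)."""
    h = span / n
    s = 0.0
    for i in range(n + 1):
        t = x + i * h
        w = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
        s += w * math.exp(-t) / t
    return s * h / 3.0

y = 1.3
e1y = E1(y)
n, b = 1200, 40.0
h = b / n
norm = mean = 0.0
for i in range(n):
    x = (i + 0.5) * h                  # midpoint rule on (0, b)
    # kernel (5.7.7): K(x,y) = -e^y Ei(-y) for x < y, -e^y Ei(-x) for y < x
    k = math.exp(y) * (e1y if x < y else E1(x))
    norm += k * h                      # accumulates  int K(x,y) dx    -> 1
    mean += x * k * h                  # accumulates  int x K(x,y) dx  -> (1+y)/2
print(round(norm, 3), round(mean, 3), (1 + y) / 2)
```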
Example 5.7.2. As a second example, let g: R → R be a continuous positive function satisfying

$$\int_{-\infty}^{\infty} g(x)\,dx = 1 \quad \text{and} \quad m_1 = \int_{-\infty}^{\infty} |x|\,g(x)\,dx < \infty.$$

Further, let a stochastic kernel be defined by

$$K(x,y) = |a|\,g(ax + by), \qquad |a| > |b|,\ b \ne 0,$$

and consider the corresponding Markov operator

$$Pf(x) = \int_{-\infty}^{\infty} K(x,y)\,f(y)\,dy.$$

Let V(x) = |x|, so that we have

$$\int_{-\infty}^{\infty} K(x,y)\,V(x)\,dx = |a|\int_{-\infty}^{\infty} |x|\,g(ax+by)\,dx = \int_{-\infty}^{\infty} g(s)\left|\frac{s-by}{a}\right| ds$$
$$\le \int_{-\infty}^{\infty} g(s)\,\frac{|s|}{|a|}\,ds + \int_{-\infty}^{\infty} g(s)\left|\frac{by}{a}\right| ds = \frac{m_1}{|a|} + \left|\frac{b}{a}\right| |y|.$$

Thus, with α = |b/a| and β = m₁/|a|, it is clear that V(x) satisfies condition (5.7.11a), and hence Theorem 5.7.1 applies. As will become evident in Section 10.5, in this example Pf has the following interesting probabilistic interpretation. If ξ and η are two independent random variables with densities f(x) and g(x), respectively, then

$$Pf(x) = |a|\int_{-\infty}^{\infty} g(ax+by)\,f(y)\,dy, \qquad \text{with } a = \frac{1}{c_2} \text{ and } b = -\frac{c_1}{c_2},$$

is the density of the random variable c₁ξ + c₂η [cf. equation (10.1.8)].
Example 5.7.3. As a final example of the applicability of the results of this section, we consider a simple model for the cell cycle [Lasota and Mackey, 1984]. First, it is assumed that there exists an intracellular substance (mitogen), necessary for mitosis, and that the rate of change of mitogen is governed by

$$\frac{dm}{dt} = g(m), \qquad m(0) = r,$$

with solution m(r,t). The rate g is a C¹ function on [0,∞) and g(x) > 0 for x > 0. Second, it is assumed that the probability of mitosis in the interval [t, t+Δt] is given by φ(m(t))Δt + o(Δt), where φ is a nonnegative function such that q(x) = φ(x)/g(x) is locally integrable (that is, integrable on bounded sets [0,c]) and satisfies

$$\lim_{x\to\infty} Q(x) = \infty, \qquad \text{where } Q(x) = \int_0^x q(y)\,dy. \tag{5.7.21}$$

Finally, it is assumed that at mitosis each daughter cell receives exactly one-half of the mitogen present in the mother cell.

Under these assumptions it can be shown that for a distribution f_{n−1}(x) of mitogen in the (n−1)st generation of a large population of cells, the mitogen distribution in the following generation is given by

$$f_n(x) = \int_0^\infty K(x,r)\,f_{n-1}(r)\,dr,$$

where

$$K(x,r) = \begin{cases} 0, & x \in [0, \tfrac12 r) \\[4pt] 2q(2x)\exp\left[-\displaystyle\int_r^{2x} q(y)\,dy\right], & x \in [\tfrac12 r, \infty). \end{cases} \tag{5.7.22}$$

It is straightforward to show that K(x,r) satisfies (5.7.1) and (5.7.2) and is, thus, a stochastic kernel. Hence the operator P: L¹(R⁺) → L¹(R⁺) defined by

$$Pf(x) = \int_0^\infty K(x,r)\,f(r)\,dr \tag{5.7.23}$$

is a Markov operator. To show that there is a unique stationary density f_* ∈ D to which {P^n f} converges strongly, we use Theorem 5.7.1 under the assumption that

$$\liminf_{x\to\infty}\,[Q(2x) - Q(x)] > 1. \tag{5.7.24}$$

First we consider the integral

$$I = \int_0^\infty u(Q(2x))\,Pf(x)\,dx, \tag{5.7.25}$$

where u is a continuous nonnegative function. Using equations (5.7.21) through (5.7.23) we can rewrite (5.7.25) as follows:

$$I = 2\int_0^\infty u(Q(2x))\,q(2x)\,dx \int_0^{2x} \exp[Q(y) - Q(2x)]\,f(y)\,dy$$
$$= \int_0^\infty u(Q(z))\,q(z)\,dz \int_0^z \exp[Q(y) - Q(z)]\,f(y)\,dy$$
$$= \int_0^\infty f(y)\,dy \int_y^\infty u(Q(z))\,\exp[Q(y) - Q(z)]\,q(z)\,dz.$$

Setting Q(z) − Q(y) = x, so that q(z)dz = dx, we finally obtain the useful equality

$$\int_0^\infty u(Q(2x))\,Pf(x)\,dx = \int_0^\infty f(y)\,dy \int_0^\infty u(x + Q(y))\,e^{-x}\,dx. \tag{5.7.26}$$

Note in particular from (5.7.26) that for u(z) ≡ 1 we have

$$\int_0^\infty Pf(x)\,dx = \int_0^\infty f(y)\,dy,$$

which also proves that P is a Markov operator.

Now take u(x) = e^{εx} with 0 < ε < 1, and V(x) = u(Q(2x)). From (5.7.26) it therefore follows that

$$\int_0^\infty V(x)\,Pf(x)\,dx = \int_0^\infty f(y)\,e^{\varepsilon Q(y)}\,dy \int_0^\infty e^{-(1-\varepsilon)x}\,dx = \frac{1}{1-\varepsilon}\int_0^\infty f(y)\,e^{\varepsilon Q(y)}\,dy. \tag{5.7.27}$$

Now pick a p > 1 and x₀ ≥ 0 such that

$$Q(2y) - Q(y) \ge p \qquad \text{for } y \ge x_0.$$

Then we can write (5.7.27) as

$$\int_0^\infty V(x)\,Pf(x)\,dx \le \frac{1}{1-\varepsilon}\int_0^{x_0} f(y)\,e^{\varepsilon Q(y)}\,dy + \frac{1}{1-\varepsilon}\int_{x_0}^\infty f(y)\,e^{\varepsilon Q(2y)}\,e^{-\varepsilon p}\,dy$$
$$\le \frac{1}{1-\varepsilon}\,e^{\varepsilon Q(x_0)} + \frac{e^{-\varepsilon p}}{1-\varepsilon}\int_0^\infty V(y)\,f(y)\,dy.$$

For the function

$$\alpha(\varepsilon) = \frac{e^{-\varepsilon p}}{1-\varepsilon}$$

we have α(0) = 1 and α′(0) = 1 − p < 0. Thus for some ε > 0 we have α(ε) < 1. Take such an ε and set

$$\alpha = \alpha(\varepsilon), \qquad \beta = \frac{1}{1-\varepsilon}\,e^{\varepsilon Q(x_0)}.$$

With these values of α and β we have shown that the operator P defined by (5.7.22)–(5.7.23) satisfies inequality (5.7.11) of Theorem 5.7.1 under the assumption (5.7.24). It only remains to be shown that K satisfies (5.7.10).

Let r₀ ≥ 0 be an arbitrary finite real number, and consider K(x,r) for 0 ≤ r ≤ r₀ and x ≥ ½r. Then

$$K(x,r) = 2q(2x)\exp\left[-\int_r^{2x} q(y)\,dy\right] \ge 2q(2x)\exp\left[-\int_0^{2x} q(y)\,dy\right] \qquad \text{for } 0 \le r \le r_0,\ x \ge \tfrac12 r,$$

and, as a consequence,

$$\inf_{0\le r\le r_0} K(x,r) \ge h(x) = \begin{cases} 0 & \text{for } x < \tfrac12 r_0 \\[4pt] 2q(2x)\exp\left[-\displaystyle\int_0^{2x} q(y)\,dy\right] & \text{for } x \ge \tfrac12 r_0. \end{cases}$$

Further,

$$\int_0^\infty h(x)\,dx = \int_{r_0/2}^\infty 2q(2x)\exp\left[-\int_0^{2x} q(y)\,dy\right] dx = \exp\left[-\int_0^{r_0} q(y)\,dy\right] > 0;$$

hence K(x,r) satisfies (5.7.10). Thus, in this simple model for cell division, we know that there is a globally asymptotically stable distribution of mitogen. Generalizations of this model have appeared in the work of Tyson and Hannsgen [1986], Tyrcha [1988], and Lasota, Mackey, and Tyrcha [1992]. □
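The stability asserted here can be illustrated by simulation with the simplest admissible choice q ≡ 1, so that Q(x) = x and Q(2x) − Q(x) = x, which satisfies (5.7.24). From the kernel (5.7.22), the quantity w = Q(2x) − Q(r) is exponentially distributed with mean one, so a daughter's mitogen is x = (r + w)/2. Two ensembles started from very different initial distributions end up with the same statistics (compared here through the mean, which for this q is 1 at stationarity):

```python
import random

random.seed(1)
N, GENS = 20_000, 40

def mean_after(x0):
    # daughters: x = (r + w)/2 with w ~ Exp(1), from kernel (5.7.22) with q = 1
    xs = [x0] * N
    for _ in range(GENS):
        xs = [(x + random.expovariate(1.0)) / 2 for x in xs]
    return sum(xs) / N

m_hi, m_lo = mean_after(8.0), mean_after(0.01)
print(round(m_hi, 2), round(m_lo, 2))   # both near the stationary mean 1
```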


5.8 Conditions for the Existence of Lower-Bound Functions
The consequences of the theorems of this chapter for the Frobenius–Perron operator are so far-reaching that an entire theory of invariant measures for a large class of transformations on the interval [0,1], and even on manifolds, may be constructed. This forms the subject of Chapter 6. In this last section, we develop some simple criteria for the existence of lower-bound functions that will be of use in our specific examples of the next chapter.

Our first criterion for the existence of a lower-bound function will be formulated in the special case when X = (a,b) is an interval on the real line [(a,b) bounded or not] with the usual Borel measure. We will use some standard notions from the theory of differential inequalities [Szarski, 1967]. A function f: (a,b) → R is called lower semicontinuous if

$$\liminf_{\delta\to 0} f(x+\delta) \ge f(x) \qquad \text{for } x \in (a,b).$$

It is left lower semicontinuous if

$$\liminf_{\delta\to 0,\ \delta>0} f(x-\delta) \ge f(x) \qquad \text{for } x \in (a,b).$$

For any function f: (a,b) → R, we define its right lower derivative by

$$\frac{d^+ f(x)}{dx} = \liminf_{\delta\to 0,\ \delta>0} \frac{1}{\delta}\,[f(x+\delta) - f(x)] \qquad \text{for } x \in (a,b).$$

It is well known that every left lower semicontinuous function f: (a,b) → R satisfying

$$\frac{d^+ f(x)}{dx} \le 0 \qquad \text{for } x \in (a,b)$$

is nonincreasing on (a,b). (The same is true for functions defined on a half-closed interval [a,b).)


For every f ∈ D₀, a dense subset of D (Definition 5.6.5), write the trajectory P^n f as

$$P^n f(x) = f_n(x) \qquad \text{for } n \ge n_0(f). \tag{5.8.1}$$

Then we have the following proposition.

Proposition 5.8.1. Let P: L¹((a,b)) → L¹((a,b)) be a Markov operator. Assume that there exists a nonnegative function g ∈ L¹((a,b)) and a constant k ≥ 0 such that for every f ∈ D₀ the functions f_n in (5.8.1) are left lower semicontinuous and satisfy the following conditions:

$$f_n(x) \le g(x) \qquad \text{a.e. in } (a,b) \tag{5.8.2}$$

and

$$\frac{d^+ f_n(x)}{dx} \le k\,f_n(x) \qquad \text{for all } x \in (a,b). \tag{5.8.3}$$

Then there exists an interval Δ ⊂ (a,b) and an ε > 0 such that h = ε 1_Δ is a lower-bound function for {P^n}.

is

Proof. Let x₀ < x₁ < x₂ be chosen in (a,b) such that

$$\int_a^{x_1} g(x)\,dx < \tfrac14 \quad \text{and} \quad \int_{x_2}^{b} g(x)\,dx < \tfrac14. \tag{5.8.4}$$

Set

$$M = \tfrac14\exp[-k(x_2-x_0)], \qquad \varepsilon = M/(x_2-x_0).$$

Since ‖P^n f‖ = 1, condition (5.8.1) implies

$$\int_a^b f_n(x)\,dx = 1. \tag{5.8.5}$$

Now we are going to show that h = ε 1_{(x₀,x₁)} is a lower function. Suppose it is not. Then there are n ≥ n₀ and y ∈ (x₀,x₁) such that f_n(y) < h(y) = ε. By integrating inequality (5.8.3), we obtain

$$f_n(x) \le f_n(y)\,e^{k(x-y)} < \varepsilon\,e^{k(x_2-x_0)} = \varepsilon/(4M) \qquad \text{for } x \in [y, x_2]. \tag{5.8.6}$$

Furthermore, since f_n ≤ g, we have

$$\int_a^b f_n(x)\,dx \le \int_a^{x_1} g(x)\,dx + \int_{x_1}^{x_2} f_n(x)\,dx + \int_{x_2}^{b} g(x)\,dx.$$

Finally, by applying inequalities (5.8.4) and (5.8.6), we obtain

$$\int_a^b f_n(x)\,dx \le \tfrac14 + (x_2-y)(\varepsilon/4M) + \tfrac14 \le \tfrac34,$$

which contradicts equation (5.8.5).

Remark 5.8.1. In the proof of Proposition 5.8.1, the left lower semicontinuity of f_n and inequality (5.8.3) were only used to obtain the evaluation

$$f_n(x) \le f_n(y)\,e^{k(x-y)} \qquad \text{for } x \ge y.$$

Therefore Proposition 5.8.1 remains true under any condition ensuring this evaluation; for example, it is true if all f_n are nonincreasing. □

It is obvious that in Proposition 5.8.1 we can replace (5.8.3) by d⁻f_n/dx ≥ −k f_n and assume the f_n right lower semicontinuous (or assume the f_n nondecreasing; cf. Remark 5.8.1). In the case of a bounded interval, we may omit condition (5.8.2) and replace (5.8.3) by a two-sided inequality. This observation is summarized as follows.
Proposition 5.8.2. Let (a,b) denote a bounded interval and let P: L¹((a,b)) → L¹((a,b)) be a Markov operator. Assume that for each f ∈ D₀ the functions f_n in (5.8.1) are differentiable and satisfy the inequality

$$\left|\frac{df_n(x)}{dx}\right| \le k\,f_n(x) \qquad \text{for all } x \in (a,b), \tag{5.8.7}$$

where k ≥ 0 is a constant independent of f. Then there exists an ε > 0 such that h = ε 1_{(a,b)} is a lower-bound function.

Proof. As in the preceding proof, we have equation (5.8.5). Set

$$\varepsilon = [1/2(b-a)]\,e^{-k(b-a)}.$$

Now it is easy to show that f_n ≥ h for n ≥ n₀. If not, then f_n(y) < ε for some y ∈ (a,b) and n ≥ n₀. Consequently, by (5.8.7),

$$f_n(x) \le f_n(y)\,e^{k|x-y|} \le [1/2(b-a)] \qquad \text{for all } x \in (a,b),$$

so that ∫_a^b f_n(x) dx ≤ ½. This evidently contradicts (5.8.5). The inequality f_n ≥ h completes the proof.

5.9 Sweeping
Until now we have considered the situation in which the sequence {P^n f} either converges to a unique density (asymptotic stability) or approaches a set spanned by a finite number of densities (asymptotic periodicity) for every initial density f. In this section we consider quite a different property, in which the densities are dispersed under the action of a Markov operator P. We call this new behavior sweeping, and introduce the concept through two definitions and several examples.

Our first definition is as follows.
Definition 5.9.1. Let (X, 𝒜, μ) be a measure space and 𝒜_* ⊂ 𝒜 be a subfamily of the family of measurable sets. Also let P: L¹(X) → L¹(X) be a Markov operator. Then {P^n} is said to be sweeping with respect to 𝒜_* if

$$\lim_{n\to\infty} \int_A P^n f(x)\,\mu(dx) = 0 \qquad \text{for every } f \in D \text{ and } A \in \mathcal{A}_*. \tag{5.9.1}$$

Since every element f ∈ L¹ can be written as a linear combination of two densities,

$$f = \lambda_1 f_1 - \lambda_2 f_2 \qquad \text{for } \lambda_1, \lambda_2 \ge 0,\ f_1, f_2 \in D,$$

for a sweeping operator P, condition (5.9.1) also holds for every f ∈ L¹.


In particular examples, it is sufficient to verify condition (5.9.1) for f ∈ D₀, where D₀ is an arbitrary dense subset of D. That this is so follows immediately from the inequality

$$\int_A P^n f(x)\,\mu(dx) \le \int_A P^n f_0(x)\,\mu(dx) + \|f - f_0\| \qquad \text{for } f \in D,\ f_0 \in D_0, \tag{5.9.2}$$

and the fact that both terms on the right-hand side of (5.9.2) can be made arbitrarily small.

Example 5.9.1. Let X = R and μ be the standard Borel measure. Further, let

$$Pf(x) = f(x - r) \qquad \text{for } f \in D, \tag{5.9.3}$$

so

$$P^n f(x) = f(x - nr) \qquad \text{for } f \in D.$$

With r > 0 the sequence {P^n} is sweeping with respect to the family of intervals

$$\mathcal{A}_0 = \{(-\infty, c] : c \in R\}.$$

To prove this, note that for every f ∈ D with compact support we have

$$\int_{-\infty}^{c} P^n f(x)\,dx = \int_{-\infty}^{c} f(x-nr)\,dx = \int_{-\infty}^{c-nr} f(y)\,dy.$$

Thus the integral on the right-hand side will eventually become zero, since

$$(-\infty, c-nr] \cap \operatorname{supp} f = \emptyset$$

for sufficiently large n. In an analogous fashion we can also prove that for r < 0 the sequence {P^n}, where P is given by (5.9.3), is sweeping with respect to the family of intervals

$$\mathcal{A}_1 = \{[c, \infty) : c \in R\}.$$
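The computation in this example can be traced numerically. For f the uniform density on [0,1] and r = 0.5 (both arbitrary choices), the mass of P^n f to the left of a fixed c is just the overlap of the translated support [nr, nr+1] with (−∞, c]:

```python
r, c = 0.5, 3.0     # translation step and the right endpoint; both arbitrary

def mass_left_of_c(n):
    # P^n f has support [n*r, n*r + 1]; its mass on (-inf, c] is the overlap length
    lo, hi = n * r, n * r + 1.0
    return max(0.0, min(hi, c) - lo)

for n in (0, 4, 5, 6):
    print(n, mass_left_of_c(n))   # 1.0, 1.0, 0.5, 0.0 -- the mass is swept away
```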

Example 5.9.2. Again take X = R and μ to be the Borel measure. Further, let P be an integral operator with Gaussian kernel

$$Pf(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} \exp\left[-\frac{(x-y)^2}{2\sigma^2}\right] f(y)\,dy. \tag{5.9.4}$$

It is easy to show (see also Example 7.4.1 and Remark 7.9.1) that

$$P^n f(x) = \frac{1}{\sqrt{2\pi n}\,\sigma}\int_{-\infty}^{\infty} \exp\left[-\frac{(x-y)^2}{2\sigma^2 n}\right] f(y)\,dy,$$

and as a consequence

$$P^n f(x) \le \frac{1}{\sqrt{2\pi\sigma^2 n}}.$$

Thus the sequence {P^n} defined by (5.9.4) is sweeping with respect to the family of bounded intervals

$$\mathcal{A}_2 = \{[a,b] : -\infty < a < b < \infty\},$$

since

$$\int_a^b P^n f(x)\,dx \le \frac{b-a}{\sqrt{2\pi\sigma^2 n}} \to 0 \qquad \text{as } n \to \infty. \quad \Box$$
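A small numerical check of this bound: starting from a density concentrated at 0 (the limiting point-mass case, for which P^n f is the N(0, σ²n) density), the mass on a fixed interval [a,b] stays below (b−a)/√(2πσ²n) and decays; σ, a, b are arbitrary choices:

```python
import math

sigma, a, b = 1.0, -1.0, 1.0

def Phi(t):
    # standard normal cumulative distribution function via erf
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

for n in (1, 10, 100, 1000):
    s = sigma * math.sqrt(n)
    mass = Phi(b / s) - Phi(a / s)                        # mass of P^n f on [a,b]
    bound = (b - a) / math.sqrt(2 * math.pi * sigma ** 2 * n)
    assert mass <= bound + 1e-12                          # the sweeping bound
    print(n, round(mass, 4), "<=", round(bound, 4))
```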
These two examples motivate a more restricted version of the general Definition 5.9.1 of sweeping, appropriate to the situation where X ⊂ R is an interval.

Definition 5.9.2. Let X ⊂ R be an interval (bounded or not) with endpoints α, β and let P: L¹(X) → L¹(X) be a Markov operator. We say the following:

(a) {P^n} is sweeping to β if it is sweeping with respect to the family of intervals

$$\mathcal{A}_0 = \{(\alpha, c] : c < \beta\}; \tag{5.9.5}$$

(b) {P^n} is sweeping to α if it is sweeping with respect to the family of intervals

$$\mathcal{A}_1 = \{[c, \beta) : \alpha < c\}; \tag{5.9.6}$$

(c) {P^n} is central sweeping if it is sweeping with respect to the family of closed intervals

$$\mathcal{A}_2 = \{[a,b] : \alpha < a < b < \beta\}. \tag{5.9.7}$$

In Examples 5.9.1 and 5.9.2 the sweeping was almost self-evident from the structure of the operator P. However, this is often not the case, and we are going to present a sufficient condition often useful for proving sweeping. We start with a definition reminiscent of the definition of the Liapunov function.

Let (X, 𝒜, μ) be a measure space and let a subfamily 𝒜_* ⊂ 𝒜 be given. A bounded Borel measurable function V: X → R is called a Bielecki function if it is nonnegative and if

$$\inf_{x\in A} V(x) > 0 \qquad \text{for } A \in \mathcal{A}_*.$$

For example, if X = [α, β) and 𝒜_* = 𝒜₀ from Definition 5.9.2a, then every continuous and strictly positive function V: [α,β) → R is a Bielecki function, since the infimum of a positive continuous function on a closed bounded (compact) interval is always positive.


Having the concept of Bielecki functions, it is straightforward to verify the following proposition.

Proposition 5.9.1. Let (X, 𝒜, μ) be a measure space and let 𝒜_* ⊂ 𝒜 be fixed. Further, let P: L¹(X) → L¹(X) be a Markov operator for which there exists a Bielecki function V: X → R and a constant γ < 1 such that

$$\int_X V(x)\,Pf(x)\,\mu(dx) \le \gamma \int_X V(x)\,f(x)\,\mu(dx) \qquad \text{for } f \in D.$$

Then {P^n} is sweeping.

Proof. Fix an f ∈ D and A ∈ 𝒜_*. Then

$$\int_A P^n f(x)\,\mu(dx) \le \frac{1}{\inf_{x\in A} V(x)}\int_A V(x)\,P^n f(x)\,\mu(dx) \le \frac{\gamma^n}{\inf_{x\in A} V(x)}\int_X V(x)\,f(x)\,\mu(dx).$$

Since γ < 1 by assumption, the right-hand side converges to zero as n → ∞ and condition (5.9.1) holds. Thus the proof is complete.
Despite the fact that the proof was relatively easy, Proposition 5.9.1 can be extremely useful in proving sweeping, as the next example demonstrates.

Example 5.9.3. We once again turn to the cell cycle model of Example 5.7.3, defined by the Markov operator (5.7.23) on L¹(R⁺) with kernel (5.7.22) constrained by condition (5.7.21). There we showed that condition (5.7.24), that is,

$$\liminf_{x\to\infty}\,[Q(2x) - Q(x)] > 1,$$

was sufficient to guarantee the asymptotic stability of {P^n}. In this example we will show that for the same system (5.7.22)–(5.7.23) the condition

$$\limsup_{x\to\infty}\,[Q(2x) - Q(x)] < 1 \tag{5.9.8}$$

implies that {P^n} is sweeping to +∞.

We start by choosing 0 < ε < 1, an x₀, and p < 1 such that Q(2x) − Q(x) ≤ p < 1 for x ≥ x₀. Define

$$u(z) = \begin{cases} e^{-\varepsilon z_0} & \text{for } z < z_0 \\ e^{-\varepsilon z} & \text{for } z \ge z_0, \end{cases}$$

where z₀ = Q(2x₀), and set V(x) = u(Q(2x)). From (5.7.26) we have

$$\int_0^\infty V(x)\,Pf(x)\,dx = \int_0^\infty f(y)\,dy \int_0^\infty u(x + Q(y))\,e^{-x}\,dx,$$

or

$$\int_0^\infty V(x)\,Pf(x)\,dx = \int_0^\infty V(y)\,f(y)\,W(y)\,dy \qquad \text{for } f \in D, \tag{5.9.9}$$

where

$$W(y) = \frac{1}{V(y)}\int_0^\infty u(x + Q(y))\,e^{-x}\,dx. \tag{5.9.10}$$

We will evaluate W as given by (5.9.10) separately for y ≤ x₀ and for y ≥ x₀.

When y ≤ x₀, observe that u is a nonincreasing function and that V(y) = e^{−εz₀}. Thus

$$W(y) \le e^{\varepsilon z_0}\int_0^\infty u(x)\,e^{-x}\,dx = e^{\varepsilon z_0}\left\{\int_0^{z_0} u(x)\,e^{-x}\,dx + \int_{z_0}^\infty u(x)\,e^{-x}\,dx\right\}$$
$$= \int_0^{z_0} e^{-x}\,dx + \int_{z_0}^\infty e^{-\varepsilon(x-z_0)-x}\,dx = 1 - e^{-z_0}\left[\frac{\varepsilon}{1+\varepsilon}\right] = \alpha_1(\varepsilon),$$

and it is evident that α₁(ε) < 1 for all ε > 0.

When y ≥ x₀ we have V(y) = e^{−εQ(2y)}. Furthermore, u(x) ≤ e^{−εx} for all x, so

$$W(y) \le \int_0^\infty \exp\{-x - \varepsilon[x + Q(y) - Q(2y)]\}\,dx.$$

Since, by assumption, Q(2y) − Q(y) ≤ p, this can also be rewritten as

$$W(y) \le \int_0^\infty e^{-\varepsilon(x-p)-x}\,dx = \frac{e^{\varepsilon p}}{1+\varepsilon} = \alpha_2(\varepsilon).$$

It is clear that α₂(0) = 1 and that α₂′(0) = p − 1 < 0. Thus, there must be an ε > 0 such that α₂(ε) < 1.

Choose an ε such that α₂(ε) < 1 and define γ = max(α₁(ε), α₂(ε)). Then W(y) ≤ γ < 1 for all y ≥ 0, and from (5.9.9) we have

$$\int_0^\infty V(x)\,Pf(x)\,dx \le \gamma\int_0^\infty V(x)\,f(x)\,dx \qquad \text{for all } f \in D.$$

Thus, by Proposition 5.9.1, we have shown that the cell cycle model defined by equations (5.7.21)–(5.7.23) is characterized by a sweeping Markov operator when (5.9.8) holds.
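The sweeping predicted here can be seen in simulation. Take the illustrative choice q(y) = 1/(1+y), so Q(x) = log(1+x) and Q(2x) − Q(x) → log 2 < 1, which satisfies (5.9.8). Sampling daughters as in Example 5.7.3 [Q(2x) = Q(r) + w with w ~ Exp(1)] gives x = ((1+r)e^w − 1)/2, and the fraction of cells below any fixed level c shrinks toward zero:

```python
import math
import random

random.seed(3)
N, GENS, c = 10_000, 100, 100.0
xs = [1.0] * N
for _ in range(GENS):
    # Q(2x) = Q(r) + w, w ~ Exp(1)  =>  x = ((1+r) e^w - 1)/2 for Q(x) = log(1+x)
    xs = [((1.0 + x) * math.exp(random.expovariate(1.0)) - 1.0) / 2 for x in xs]
frac_below = sum(1 for x in xs if x < c) / N
print(round(frac_below, 3))   # almost all of the mass has moved beyond c
```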

5.10 The Foguel Alternative and Sweeping


From Example 5.9.3 it is clear that the demonstration of sweeping is neither necessarily straightforward nor trivial and may, in fact, require a rather

130

5. The Asymptotic Properties of Densities

strong effort. In this section we present a sufficient condition for sweeping


that is sometimes especially helpful in the study of integral Markov operators with stochastic kernels.
Let (X, .A, IJ.) be a measure space and P: L 1 (x) - t L 1 (X) be the operator
Pf(x)

fx

(5.10.1)

K(x,y)f(Y)IJ.(dy)

where K is a stochastic kernel and thus satisfies conditions (5.7.1) and


(5.7.2). We have already shown in Section 5.7 that Pis a Markov operator
and hence defined for all f e L 1 .
However, the right-hand side of (5.10.1) is well defined for every measurable f;?: 0 even though it may, of course, be infinite for some x. With this
observation we make the following definitions.

Definition 5.10.1. Let P: L¹ → L¹ be the integral Markov operator (5.10.1) and let f: X → R be a measurable and nonnegative function. We say that f is subinvariant if

$$Pf(x) \le f(x) \qquad \text{for almost all } x \in X.$$

Definition 5.10.2. Let a subfamily 𝒜_* ⊂ 𝒜 be fixed. We say that 𝒜_* is regular if there is a sequence of sets A_n ∈ 𝒜_*, n = 0, 1, ..., such that

$$\bigcup_{n=0}^{\infty} A_n = X. \tag{5.10.2}$$

Definition 5.10.3. A nonnegative measurable function f: X → R is locally integrable if

$$\int_A f(x)\,\mu(dx) < \infty \qquad \text{for } A \in \mathcal{A}_*.$$
With these definitions, we state the following result, which will be referred to as the Foguel alternative.

Theorem 5.10.1. Let (X, 𝒜, μ) be a measure space and 𝒜_* ⊂ 𝒜 a regular family. Assume that P: L¹ → L¹ is an integral operator with a stochastic kernel. If P has a locally integrable and positive (f > 0 a.e.) subinvariant function f, then either P has an invariant density or {P^n} is sweeping.

In the statement of this theorem, there are two implications:

(1) if {P^n} is not sweeping, then P has an invariant density; and

(2) if {P^n} is sweeping, then P has no invariant density.

Only the first part is hard to prove; the second part can be demonstrated using condition (5.10.2). To prove the second implication, suppose that {P^n} is sweeping and that f_* = Pf_* is an invariant density. Further define

$$B_k = \bigcup_{i=0}^{k} A_i.$$

Then, according to (5.10.2),

$$\lim_{k\to\infty}\int_{B_k} f_*(x)\,\mu(dx) = \int_X f_*(x)\,\mu(dx) = 1,$$

and in particular for some fixed k

$$\int_{B_k} f_*(x)\,\mu(dx) \ge \tfrac12. \tag{5.10.3}$$

On the other hand, since f_* = P^n f_*,

$$\int_{B_k} f_*(x)\,\mu(dx) \le \sum_{i=0}^{k}\int_{A_i} f_*(x)\,\mu(dx) = \sum_{i=0}^{k}\int_{A_i} P^n f_*(x)\,\mu(dx).$$

Since {P^n} is sweeping by assumption, the right-hand side of this relation converges to zero as n → ∞. This, however, contradicts (5.10.3), and we thus conclude that a sweeping {P^n} cannot have an invariant density.

Remark 5.10.1. This theorem was proved by Komorowski and Tyrcha [1989], and the assumptions concerning the regular family 𝒜_* were simplified by Malczak [1992]. Similar theorems, when 𝒜_* is the family of all measurable subsets, have been proved by several authors; see Foguel [1966] and Lin [1971].

Example 5.10.1. Let X = R⁺, and consider the integral operator

$$Pf(x) = \int_x^\infty \psi\!\left(\frac{x}{y}\right) f(y)\,\frac{dy}{y} \qquad \text{for } x \ge 0, \tag{5.10.4}$$

where ψ: [0,1] → R is a given integrable function such that

$$\psi(z) \ge 0 \quad \text{and} \quad \int_0^1 \psi(z)\,dz = 1. \tag{5.10.5}$$

The operator (5.10.4) appears on the right-hand side of the Chandrasekhar–Münch equation describing the fluctuations in the brightness of the Milky Way. This equation will be discussed in Examples 7.9.2 and 11.10.2. Here we are going to study the properties of the operator (5.10.4) alone.

Let V: R⁺ → R be a nonnegative measurable function. For f ∈ D we have

$$\int_0^\infty V(x)\,Pf(x)\,dx = \int_0^\infty V(x)\,dx \int_x^\infty \psi\!\left(\frac{x}{y}\right) f(y)\,\frac{dy}{y} = \int_0^\infty f(y)\,\frac{dy}{y}\int_0^y \psi\!\left(\frac{x}{y}\right) V(x)\,dx.$$

Substituting x/y = z, this becomes

$$\int_0^\infty V(x)\,Pf(x)\,dx = \int_0^\infty f(y)\,dy \int_0^1 \psi(z)\,V(zy)\,dz. \tag{5.10.6}$$

This equality with V(x) ≡ 1 gives

$$\int_0^\infty Pf(x)\,dx = \int_0^\infty f(y)\,dy \int_0^1 \psi(z)\,dz = \int_0^\infty f(y)\,dy,$$

which, together with the nonnegativity of ψ, implies that (5.10.4) defines a Markov operator.
Now set f_β(x) = x^{−β} in (5.10.4). Then

$$Pf_\beta(x) = \int_x^\infty \psi\!\left(\frac{x}{y}\right)\frac{dy}{y^{\beta+1}} = \frac{1}{x^\beta}\int_0^1 \psi(z)\,z^{\beta-1}\,dz. \tag{5.10.7}$$

For β ≥ 1 we have ψ(z)z^{β−1} ≤ ψ(z) on [0,1] and, as a consequence,

$$Pf_\beta(x) \le f_\beta(x) \qquad \text{for } x > 0.$$

Thus, by Theorem 5.10.1, the operator P defined by (5.10.4) is either sweeping to zero or has an invariant density.
It is easy to exclude the possibility that P has an invariant density. Suppose that there is an invariant density f_*. Then the equality (5.10.6) gives

$$\int_0^\infty V(y)\,f_*(y)\,dy = \int_0^\infty f_*(y)\,dy \int_0^1 \psi(z)\,V(zy)\,dz,$$

or

$$\int_0^\infty f_*(y)\,dy \int_0^1 \psi(z)\,[V(y) - V(zy)]\,dz = 0. \tag{5.10.8}$$

Now take V: [0,∞) → R to be positive, bounded, and strictly increasing [e.g., V(z) = z/(1+z)]. Then

$$V(y) - V(zy) > 0 \qquad \text{for } y > 0,\ 0 \le z < 1,$$

and the integral

$$I(y) = \int_0^1 \psi(z)\,[V(y) - V(zy)]\,dz$$

is strictly positive for y > 0. Consequently, the product f_*(y)I(y) is a nonnegative and nonvanishing function. This shows that the equality (5.10.8) cannot be satisfied, and thus there is no invariant density for P.

Thus, for every ψ satisfying (5.10.5), the operator P given by equation (5.10.4) is sweeping. This is both interesting and surprising, since we will show in Section 11.10 that the stochastic semigroup generated by the Chandrasekhar–Münch equation is asymptotically stable! □
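The sweeping of (5.10.4) has a simple probabilistic reading (an interpretation not stated in the text, but which follows from the change-of-variables formula for a product): Pf is the density of zY, where z is drawn from ψ on [0,1] and Y has density f. Multiplying repeatedly by such factors drives every trajectory to zero, so the mass on [c, ∞) vanishes. A Monte Carlo sketch with ψ ≡ 1, the uniform density:

```python
import random

random.seed(4)
N, c = 50_000, 0.01
xs = [1.0] * N
frac_above = 1.0
for gen in range(1, 31):
    xs = [random.random() * x for x in xs]   # one application of P: x -> z*x, z ~ U(0,1)
    if gen % 10 == 0:
        frac_above = sum(1 for x in xs if x >= c) / N
        print(gen, frac_above)               # mass on [c, inf) shrinks toward 0
```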
The alternative formulated in Theorem 5.10.1 does not specify the behavior of the sequence {P^n} in the case when an invariant density exists. We now formulate a stronger form of the Foguel alternative, first introducing the notion of an expanding operator.

Definition 5.10.4. Let (X, 𝒜, μ) be a measure space and P: L¹ → L¹ be a Markov operator. We say that P is expanding if

$$\lim_{n\to\infty} \mu(A \setminus \operatorname{supp} P^n f) = 0 \qquad \text{for } f \in D \text{ and } \mu(A) < \infty. \tag{5.10.9}$$

The simplest example of an expanding operator is an integral operator with a strictly positive stochastic kernel. In fact, from equation (5.7.3) with K(x,y) > 0 it follows that P^n f(x) > 0 for all x ∈ X and n ≥ 1. In this case, supp P^n f = X and condition (5.10.9) is automatically satisfied.

A more sophisticated example of an expanding operator is given by

$$Pf(x) = \int_a^{\lambda(x)} K(x,y)\,f(y)\,dy, \tag{5.10.10}$$

where K(x,y) is a measurable kernel satisfying

$$K(x,y) > 0 \qquad \text{for } a < y < \lambda(x),\ a < x, \tag{5.10.11}$$

and λ: [a,∞) → [a,∞) is a continuous strictly increasing function such that

$$\lambda(x) > x \qquad \text{for } a < x. \tag{5.10.12}$$

A straightforward calculation shows that P is a Markov operator on L¹([a,∞)) when

$$\int_{\lambda^{-1}(y)}^{\infty} K(x,y)\,dx = 1 \qquad \text{for } y > a. \tag{5.10.13}$$

We also have the following.

Proposition 5.10.1. If K and λ satisfy conditions (5.10.11)–(5.10.13), then the Markov operator P: L^1([a, ∞)) → L^1([a, ∞)) defined by (5.10.10) is expanding.

Proof. Let f ∈ D be given and let

x_0 = ess inf {x : f(x) > 0}.
134

5. The Asymptotic Properties of Densities

This means that x_0 is the largest possible real number satisfying

μ(supp f ∩ [a, x_0]) = 0.

Further, let x_1 = λ^{-1}(x_0). It is evident from the defining equation (5.10.10) that Pf(x) > 0 for λ(x) > x_0, or x > x_1. Define x_n = λ^{-n}(x_0). By an induction argument it is easy to verify that P^n f(x) > 0 for x > x_n. Thus, for an arbitrary measurable set A ⊂ [a, ∞) we have

μ(A − supp P^n f) ≤ x_n − a.   (5.10.14)

The sequence {x_n} is bounded from below (x_n ≥ a). It is also decreasing since x_n = λ^{-1}(x_{n−1}) ≤ x_{n−1}. Thus {x_n} converges to a number x* ≥ a. Since λ(x_n) = x_{n−1}, in the limit as n → ∞ we have λ(x*) = x*. From inequality (5.10.12) it follows that x* = a, which according to (5.10.14) shows that P is expanding.
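The mechanism of the proof can be checked numerically. In the sketch below (an illustration, not from the text; the concrete choice λ(x) = x^2 on [a, ∞) with a = 1 is an assumption) the points x_n = λ^{-n}(x_0) decrease to the fixed point x* = a, so the bound (5.10.14) forces μ(A − supp P^n f) → 0.

```python
import math

a = 1.0                  # left endpoint of the state space [a, oo)
lam_inv = math.sqrt      # inverse of the assumed lambda(x) = x**2, which satisfies lambda(x) > x for x > a

x0 = 5.0                 # x0 = ess inf of the support of f, say
seq = [x0]
for _ in range(60):
    seq.append(lam_inv(seq[-1]))   # x_n = lambda^{-n}(x0)

# {x_n} is decreasing and converges to x* = a, exactly as in the proof above
```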
For expanding operators, the Foguel alternative can be formulated as
follows.

Theorem 5.10.2. Let (X, A, μ) be a measure space and A* ⊂ A be a regular family of measurable sets. Assume that P: L^1(X) → L^1(X) is an expanding integral operator with a stochastic kernel. If P has a locally integrable positive (f > 0 a.e.) subinvariant function, then either {P^n} is asymptotically stable or it is sweeping.

The proof can be found in Malczak (1992). Theorem 5.10.2 can also be derived from a new criterion for asymptotic stability given by Baron and Lasota (1993). See Exercise 5.8.

Example 5.10.2. We return to the modeling of the cell cycle (see Example 5.7.3) by considering the following model proposed by Tyson and Hannsgen (1986).
They assume that the probability of cell division depends on cell size m, so cell size plays the role of the mitogen considered in Example 5.7.3. It is further assumed that during the lifetime of the cell, growth proceeds exponentially, that is,

dm/dt = km.

When the size is smaller than a given value, which for simplicity is denoted by 1, the cell cannot divide. When the size is larger than 1, the cell must traverse two phases A and B. The end of phase B coincides with cell division. The duration of phase B is constant and is denoted by T_B. The length T_A of phase A is a random variable with the exponential distribution

prob(T_A ≥ t) = e^{−pt}.

At cell division the two daughter cells have sizes exactly one-half that of the mother cell.


Using these assumptions it can be shown that the process of the replication of size may be described by the equation

f_{n+1}(x) = Pf_n(x) = ∫_u^{x/u} K(x, r) f_n(r) dr,   (5.10.15)

where f_n is the density function of the distribution of the initial size in the nth generation of cells, and the kernel K is given by

K(x, r) = (p/(ku)) (x/u)^{−1−(p/k)}            for u ≤ r < 1,
K(x, r) = (p/(ku)) (x/u)^{−1−(p/k)} r^{p/k}    for 1 ≤ r ≤ x/u.   (5.10.16)

It is assumed that u = (1/2) e^{kT_B} < 1. A straightforward calculation shows that P given by formulas (5.10.15) and (5.10.16) is a Markov operator on the space L^1([u, ∞)).
Following Tyson and Hannsgen, we look for an invariant density of the form f*(x) = c x^{−1−γ}. From the equation f* = Pf* we obtain

x^{−1−γ} = (p/(ku)) (x/u)^{−1−(p/k)} [ (u^{−γ} − 1)/γ + ((x/u)^{(p/k)−γ} − 1)/((p/k) − γ) ],

and matching the coefficients of x^{−1−γ} and x^{−1−(p/k)} shows that this condition is satisfied when γ is a solution of the transcendental equation

u^γ + (k/p)γ = 1.   (5.10.17)

The left-hand side of this equation is equal to 1 for γ = 0 and tends to ∞ as γ → ∞. Thus, in order to have a positive solution of (5.10.17) it is sufficient to assume that the left-hand side is decreasing at γ = 0, that is,

ln u + k/p < 0,

which is equivalent to

k/p < −ln u.   (5.10.18)

Thus, for k, p, and u satisfying (5.10.18), there exists γ > 0 for which the function f*(x) = c x^{−1−γ} is invariant with respect to P. It can be normalized on the interval [u, ∞), namely, for c = γu^γ,

∫_u^∞ f*(x) dx = γu^γ ∫_u^∞ x^{−1−γ} dx = 1.

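The root γ of (5.10.17) is easy to obtain numerically. The sketch below (a hedged illustration, not part of the original text; the parameter values k, p, u are arbitrary choices satisfying (5.10.18)) solves u^γ + (k/p)γ = 1 by bisection and confirms the normalization c = γu^γ.

```python
from math import log

def gamma_root(k, p, u):
    """Solve u**g + (k/p)*g = 1 for g > 0 by bisection.

    A positive root exists when k/p < -ln(u), i.e., condition (5.10.18)."""
    assert k / p < -log(u), "condition (5.10.18) must hold"
    lhs = lambda g: u ** g + (k / p) * g
    lo, hi = 0.0, 1.0
    while lhs(hi) < 1.0:          # the left-hand side tends to infinity, so a bracket exists
        lo, hi = hi, 2.0 * hi
    for _ in range(200):          # bisection keeps lhs(lo) < 1 <= lhs(hi)
        mid = 0.5 * (lo + hi)
        if lhs(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

k, p, u = 1.0, 2.0, 0.4           # k/p = 0.5 < -ln(0.4), so (5.10.18) holds
gamma = gamma_root(k, p, u)
c = gamma * u ** gamma            # normalizing constant of f*(x) = c * x**(-1-gamma)
mass = c * u ** (-gamma) / gamma  # integral of f* over [u, oo); equals 1 analytically
```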

Now we can apply Theorem 5.10.2. The function f* is simultaneously a positive locally integrable (with respect to the family of compact subsets of [u, ∞)) invariant function and an invariant density. Further, according to Proposition 5.10.1, the operator (5.10.15) is expanding since the kernel K(x, r) given by equation (5.10.16) and λ(x) = x/u with u < 1 satisfy conditions (5.10.11) and (5.10.12). Therefore, according to Theorem 5.10.2, the sequence {P^n} is asymptotically stable. □

Remark 5.10.2. The asymptotic stability of {P^n} under condition (5.10.18) was predicted by Tyson and Hannsgen (1986) and proved by Tyrcha (1989) using Liapunov function techniques (see Exercise 5.7). It should be noted that in our approach using the Foguel alternative the major effort was expended in finding the invariant density f*: once f* was available, the asymptotic stability followed automatically from the general properties of Markov operators.

Exercises
5.1. Let (X, A, μ) be a finite measure space and let 1 ≤ p_1 ≤ p_2 ≤ ∞. Show that every set F ⊂ L^{p_2} strongly precompact in L^{p_2} is also strongly precompact in L^{p_1}. Is the same true for weak precompactness?

5.2. Let X = R^+ with the standard Borel measure. Consider the four families of functions:

(1) f_a(x) = a e^{−ax}, for a ≥ 1;
(2) f_a(x) = a e^{−ax}, for 0 ≤ a ≤ 1;
(3) f_a(x) = e^{−x} sin ax, for a ≥ 1;
(4) f_a(x) = e^{−x} sin ax, for 0 ≤ a ≤ 1.

Which of these families is weakly and/or strongly precompact in L^1(R^+)?


5.3. Consider the measure space (X, A, μ) described in Exercise 2.1. Let g ∈ l^1 = L^1(X, A, μ) be nonnegative and let

F = {f ∈ l^1 : |f| ≤ g}.

Show that F is strongly precompact in l^1.

5.4. Generalize the previous result and show that every weakly precompact set F ⊂ l^1 is also strongly precompact in l^1.

5.5. Let X ⊂ R be an interval and let K: X × X → R be a continuous function. Assume that

Pf(x) = ∫_X K(x, y) f(y) dy


is a Markov operator. Prove that K is a stochastic kernel. Try to generalize this result to the case when (X, A, μ) is an arbitrary measure space and K: X × X → R is measurable on the product space X × X.
5.6. Let X = R^+ and let

Pf(x) = ∫_0^x K(x, y) f(y) dy

be a Markov operator with a stochastic kernel; that is, K is measurable and

K(x, y) ≥ 0,   ∫_y^∞ K(x, y) dx = 1.

Prove that if K is bounded (K(x, y) ≤ M), then {P^n} is sweeping to +∞. Try to generalize this result to the case when K is unbounded.

5.7. Let X = R and let

Prove that {P^n} is sweeping with respect to the family

A* = {(−∞, c) : c ∈ R},

but it is not sweeping with respect to

A_fin = {A ∈ B(R) : m(A) < ∞},

where m denotes the standard Borel measure (Komorowski and Tyrcha, 1989).
5.8. Let (X, A, μ) be a measure space and P: L^1 → L^1 be a Markov operator. We say that P overlaps supports if, for every f, g ∈ D, there is an integer n_0(f, g) such that

μ(supp P^{n_0} f ∩ supp P^{n_0} g) > 0.

It can be proved (Baron and Lasota, 1993) that {P^n} is asymptotically stable for every integral operator with a stochastic kernel which overlaps supports and has an invariant positive (a.e.) density. Using this result and Theorem 5.10.1, prove Theorem 5.10.2.

6
The Behavior of Transformations
on Intervals and Manifolds

This chapter is devoted to a series of examples of transformations on intervals and manifolds whose asymptotic behavior can be explored through the use of the material developed in Chapter 5. Although results are often stated in terms of the asymptotic stability of {P^n}, where P is the Frobenius–Perron operator corresponding to a transformation S, remember that, according to Proposition 5.6.2, S is exact when {P^n} is asymptotically stable and S is measure preserving.
In applying the results of Chapter 5, in several examples we will have occasion to calculate the variation of a function. Thus the first section presents an exposition of the properties of functions of bounded variation.

6.1

Functions of Bounded Variation

There are a number of descriptors of the "average" behavior of a function f: [a, b] → R. Two of the most common are the mean value of f,

m(f) = (1/(b − a)) ∫_a^b f(x) dx,

and its variance, D^2(f) = m((f − m(f))^2). However, these are not always satisfactory. Consider, for example, the sequence of functions {f_n} with f_n(x) = sin 2nπx, n = 1, 2, .... They have the same mean value on [0, 1], namely, m(f_n) = 0, and the same variance D^2(f_n) = 1/2; but they behave quite differently for n ≫ 1 than they do for n = 1. To describe these
140

6. The Behavior of Transformations on Intervals and Manifolds

kinds of differences in the behavior of functions, it is useful to introduce the variation of a function (sometimes called the total variation).
Let f be a real-valued function defined on an interval A ⊂ R and let [a, b] be a subinterval of A. Consider a partition of [a, b] given by

a = x_0 < x_1 < ⋯ < x_n = b   (6.1.1)

and write

s_n(f) = Σ_{i=1}^n |f(x_i) − f(x_{i−1})|.   (6.1.2)

If all possible sums s_n(f), corresponding to all subdivisions of [a, b], are bounded by a number that does not depend on the subdivision, f is said to be of bounded variation on [a, b]. Further, the smallest number c such that s_n ≤ c for all s_n is called the variation of f on [a, b] and is denoted by V_a^b f. Notationally this is written as

V_a^b f = sup s_n(f),   (6.1.3)

where the supremum is taken over all possible partitions of the form (6.1.1).
Consider a simple example. Assume that f is a monotonic function, either decreasing or increasing. Then

|f(x_i) − f(x_{i−1})| = σ[f(x_i) − f(x_{i−1})],

where

σ = 1 for f increasing,   σ = −1 for f decreasing,

and, consequently,

s_n(f) = σ Σ_{i=1}^n [f(x_i) − f(x_{i−1})] = σ[f(x_n) − f(x_0)] = |f(b) − f(a)|.

Thus, any function that is defined and monotonic on a closed interval is of bounded variation. It is interesting (the proof is not difficult) that any function f of bounded variation can be written in the form f = f_1 + f_2, where f_1 is increasing and f_2 is decreasing.
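These definitions translate directly into a few lines of code. The sketch below (an illustration, not part of the text) computes the partition sum s_n(f) of (6.1.2) on a uniform partition; for a monotonic function the sum telescopes to |f(b) − f(a)|, exactly as derived above, while for an oscillating function it approaches the total variation only as the partition is refined.

```python
import math

def partition_sum(f, a, b, n):
    """s_n(f) of equation (6.1.2) over the uniform partition a = x_0 < ... < x_n = b."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    return sum(abs(f(xs[i]) - f(xs[i - 1])) for i in range(1, n + 1))

# monotone function: every partition sum telescopes to |f(b) - f(a)|
v_mono = partition_sum(math.exp, 0.0, 1.0, 1000)        # = e - 1

# f(x) = sin(2*pi*n*x) has variation 4n on [0, 1]; here n = 2, so the supremum is 8
v_osc = partition_sum(lambda x: math.sin(4 * math.pi * x), 0.0, 1.0, 10000)
```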
Variation of the Sum

Let f and g be of bounded variation on [a, b]. Then

|f(x_i) + g(x_i) − f(x_{i−1}) − g(x_{i−1})| ≤ |f(x_i) − f(x_{i−1})| + |g(x_i) − g(x_{i−1})|


and, consequently,

s_n(f + g) ≤ s_n(f) + s_n(g) ≤ V_a^b f + V_a^b g.

Thus (f + g) is of bounded variation and

V_a^b (f + g) ≤ V_a^b f + V_a^b g.

If f_1, ..., f_n are of bounded variation on [a, b], then by an induction argument

(V1)   V_a^b (f_1 + ⋯ + f_n) ≤ V_a^b f_1 + ⋯ + V_a^b f_n   (6.1.4)

follows immediately.
Variation on the Union of Intervals
Assume that a < b < c and that the function I is of bounded variation on
[a, b]as well as on [b, c). Consider a partition of the intervals [a, b] and [b, c),
a

= Xo < X1 < < Xn = b = Yo < Y1 < < Ym = c

(6.1.5)

and the corresponding sums


n

Bn

(!) =

[o,b)

L ll(xi)- l(xi-1)1,
i=1
m

Bm

(f)=

(b,c)

i=1

II(Yi)- I(Yi-1)1.

It is evident that the partitions (6.1.5) jointly give a partition of [a,c].


Therefore,
Bn

(f) +

(o,b)

Bm

(f)

= Bn+m (f)

(b,c)

(6.1.6)

[o,c)

where the right-hand side of equation (6.1.6) denotes the sum corresponding
to the variation of I over [a, c]. Observe that (6.1.6) holds only for partitions
of [a, c] that contain the point b. However, any additional point in the sum
Bn can only increase Bm but, since we are interested in the supremum, this
is irrelevant. From equation (6.1.6) it follows that
b

VI+VI=VJ.
0


Again by an induction argument the last formula may be generalized to

(V2)   V_{a_0}^{a_1} f + ⋯ + V_{a_{n−1}}^{a_n} f = V_{a_0}^{a_n} f,   (6.1.7)

where a_0 < a_1 < ⋯ < a_n and f is of bounded variation on [a_{i−1}, a_i], i = 1, ..., n.

Variation of the Composition of Functions

Now let g: [α, β] → [a, b] be monotonically increasing or decreasing on the interval [α, β] and let f: [a, b] → R be given. Then the composition f ∘ g is well defined and, for any partition of [α, β],

α = u_0 < u_1 < ⋯ < u_n = β,   (6.1.8)

the corresponding sum is

s_n(f ∘ g) = Σ_{i=1}^n |f(g(u_i)) − f(g(u_{i−1}))|.

Observe that, due to the monotonicity of g, the points g(u_i) define a partition of [a, b]. Thus, s_n(f ∘ g) is a particular sum for the variation of f and, therefore,

s_n(f ∘ g) ≤ V_a^b f

for any partition (6.1.8). Consequently,

(V3)   V_α^β (f ∘ g) ≤ V_a^b f.   (6.1.9)

Variation of the Product

Let f be of bounded variation on [a, b] and let g be C^1 on [a, b]. To evaluate the variation of the product f(x)g(x), x ∈ [a, b], start from the well-known Abel equality,

Σ_{i=1}^n |a_i b_i − a_{i−1} b_{i−1}| = Σ_{i=1}^n |b_i(a_i − a_{i−1}) + a_{i−1}(b_i − b_{i−1})|.

Applying this equality to the sum [substituting a_i = f(x_i) and b_i = g(x_i)]

s_n(fg) = Σ_{i=1}^n |f(x_i)g(x_i) − f(x_{i−1})g(x_{i−1})|,


the immediate result is

s_n(fg) ≤ Σ_{i=1}^n {|g(x_i)||f(x_i) − f(x_{i−1})| + |f(x_{i−1})||g(x_i) − g(x_{i−1})|}.

Now, by applying the mean value theorem, we have

s_n(fg) ≤ (sup |g|) s_n(f) + Σ_{i=1}^n |f(x_{i−1}) g′(x̄_i)| (x_i − x_{i−1})
        ≤ (sup |g|) V_a^b f + Σ_{i=1}^n |f(x_{i−1}) g′(x̄_i)| (x_i − x_{i−1}),

with x̄_i ∈ (x_{i−1}, x_i). Observe that the last term is simply an approximating sum for the Riemann integral of the product |f(x)g′(x)|. Thus the function f(x)g(x) is of bounded variation and

(V4)   V_a^b fg ≤ (sup |g|) V_a^b f + ∫_a^b |f(x)g′(x)| dx.   (6.1.10)

Taking in particular f ≡ 1,

(V4′)   V_a^b g ≤ ∫_a^b |g′(x)| dx.   (6.1.11)

However, in this case, the left- and right-hand sides are strictly equal since s_n(g) is a Riemann sum for the integral of |g′|.
Yorke Inequality

Now let f be defined on [0, 1] and be of bounded variation on [a, b] ⊂ [0, 1]. We want to evaluate the variation of the product of f and the characteristic function 1_{[a,b]}. Without any loss of generality, assume that the partitions of the interval [0, 1] always contain the points a and b. Then

s_n^{[0,1]}(f 1_{[a,b]}) ≤ s_n^{[a,b]}(f) + |f(a)| + |f(b)|.

Let c be an arbitrary point in [a, b]. Then, from the preceding inequality,

s_n^{[0,1]}(f 1_{[a,b]}) ≤ s_n^{[a,b]}(f) + |f(b) − f(c)| + |f(c) − f(a)| + 2|f(c)|
                       ≤ 2 V_a^b f + 2|f(c)|.

It is always possible to choose the point c such that

|f(c)| ≤ (1/(b − a)) ∫_a^b |f(x)| dx,

so that

s_n^{[0,1]}(f 1_{[a,b]}) ≤ 2 V_a^b f + (2/(b − a)) ∫_a^b |f(x)| dx,

which gives

(V5)   V_0^1 f 1_{[a,b]} ≤ 2 V_a^b f + (2/(b − a)) ∫_a^b |f(x)| dx.   (6.1.12)
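The Yorke inequality (V5) is easy to test numerically. The sketch below (an illustration, not part of the text; the function f and the interval [a, b] are arbitrary choices) compares the two sides of (6.1.12) using partition sums and a midpoint-rule integral.

```python
import math

def partition_sum(f, a, b, n=20000):
    """Partition sum s_n(f) approximating the variation of f over [a, b]."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    return sum(abs(f(xs[i]) - f(xs[i - 1])) for i in range(1, n + 1))

def integral(f, a, b, n=20000):
    """Midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

f = lambda x: math.cos(3.0 * x) + 2.0             # smooth, positive test function
a, b = 0.2, 0.9

cutoff = lambda x: f(x) if a <= x <= b else 0.0   # the product f * 1_[a,b]

lhs = partition_sum(cutoff, 0.0, 1.0)             # variation of f * 1_[a,b] over [0, 1]
rhs = 2.0 * partition_sum(f, a, b) \
      + (2.0 / (b - a)) * integral(lambda x: abs(f(x)), a, b)
# Yorke's inequality (V5): lhs <= rhs
```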

6.2

Piecewise Monotonic Mappings

Two of the most important results responsible for stimulating interest in transformations on intervals of the real line were obtained by Rényi (1957) and by Rochlin (1964). Both were considering two classes of mappings, namely,

S(x) = τ(x)   (mod 1), 0 ≤ x ≤ 1,   (6.2.1)

where τ: [0, 1] → [0, ∞) is a C^2 function such that inf_x τ′ > 1, τ(0) = 0, and τ(1) is an integer; and the Rényi transformation

S(x) = rx   (mod 1), 0 ≤ x ≤ 1,   (6.2.2)

where r > 1 is a real constant. (The r-adic transformation considered earlier is clearly a special case of the Rényi transformation.) Using a number-theoretic argument, Rényi was able to prove the existence of a unique invariant measure for such transformations. Rochlin was able to prove that the Rényi transformations on a measure space with the Rényi measure were, in fact, exact.
In this section we unify and generalize the results of Rényi and Rochlin through the use of Theorem 5.6.2.
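For the r-adic case (integer slope), the Frobenius–Perron operator has the explicit form Pf(x) = (1/r) Σ_{i=0}^{r−1} f((x + i)/r), and exactness shows up numerically as convergence of the iterates P^n f to the constant density 1. The sketch below (an illustration, not part of the text) iterates this operator for r = 2; any integer r ≥ 2 behaves the same way.

```python
r = 2                               # integer slope: the r-adic special case of (6.2.2)

def fp(f):
    """Frobenius-Perron operator of S(x) = r*x (mod 1):
       Pf(x) = (1/r) * sum_i f((x + i)/r)."""
    return lambda x: sum(f((x + i) / r) for i in range(r)) / r

f = lambda x: 2.0 * x               # an arbitrary initial density on [0, 1]
for _ in range(10):
    f = fp(f)                       # f is now P^10 applied to the initial density

samples = [f(j / 100.0) for j in range(101)]
err = max(abs(s - 1.0) for s in samples)   # distance from the invariant density f* = 1
```

For f_0(x) = 2x one can check by hand that P^n f_0(x) = x/2^{n−1} + 1 − 2^{−n}, so err above is exactly 2^{−10} up to rounding.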
Consider a mapping S: [0, 1] → [0, 1] that satisfies the following four properties:

(2i) There is a partition 0 = a_0 < a_1 < ⋯ < a_r = 1 of [0, 1] such that for each integer i = 1, ..., r the restriction of S to the interval [a_{i−1}, a_i) is a C^2 function;

(2ii) S(a_{i−1}) = 0 for i = 1, ..., r;

(2iii) There is a λ > 1 such that S′(x) ≥ λ for 0 ≤ x < 1 [S′(a_i) and S″(a_i) denote the right derivatives]; and


(2iv) There is a real finite constant c such that

−S″(x)/[S′(x)]^2 ≤ c,   0 ≤ x < 1.   (6.2.3)

An example of a mapping satisfying these conditions is shown in Figure 6.2.1.

FIGURE 6.2.1. The function S(x) = 3x + (1/2) sin(7x/4) (mod 1) as an example of a transformation on [0, 1] satisfying the conditions (2i)–(2iv). In this case r = 3, and the counterimage of the set [0, x] consists of the union of the three intervals indicated as heavy lines along the x-axis.

Then we may state the following theorem.

Theorem 6.2.1. If S: [0, 1] → [0, 1] satisfies the foregoing conditions (2i)–(2iv) and P is the Frobenius–Perron operator associated with S, then {P^n} is asymptotically stable.

Proof. We first derive an explicit expression for the Frobenius–Perron operator. Note that, for any x ∈ [0, 1],

S^{-1}([0, x]) = ⋃_{i=1}^r [a_{i−1}, g_i(x)],

where

g_i(x) = S_{(i)}^{-1}(x)   for 0 ≤ x < b_i,
g_i(x) = a_i               for b_i ≤ x ≤ 1,

and S_{(i)}(x) denotes the restriction of S to the interval [a_{i−1}, a_i), whereas b_i is the left-hand limit of S_{(i)}(x) as x → a_i.


Thus, by the definition of the Frobenius–Perron operator,

Pf(x) = (d/dx) ∫_{S^{-1}([0,x])} f(u) du = (d/dx) Σ_{i=1}^r ∫_{a_{i−1}}^{g_i(x)} f(u) du,

or

Pf(x) = Σ_{i=1}^r g_i′(x) f(g_i(x)).   (6.2.4)

If b_i < 1, then g_i′(b_i) denotes the right derivative. Thus, g_i′(x) = 0 for b_i ≤ x < 1, and all the g_i′ are lower left semicontinuous.
Now let D_0 denote the subset of D([0, 1]) consisting of all functions f that, on the interval [0, 1), are bounded, lower left semicontinuous, and satisfy the inequality

f′(x) ≤ k_f f(x)   for 0 ≤ x < 1,   (6.2.5)

where k_f is a constant that depends on f. For any f ∈ D_0, the function Pf as calculated from equation (6.2.4) will be bounded and lower left semicontinuous.
For every f ∈ D_0, differentiation of expression (6.2.4) for the Frobenius–Perron operator gives

(Pf)′ = Σ_{i=1}^r g_i″ (f ∘ g_i) + Σ_{i=1}^r (g_i′)^2 (f′ ∘ g_i).

By using the inverse function theorem, we have

g_i′ ≤ sup(1/S′) ≤ 1/λ

and

g_i″ = −[(S″/[S′]^2) ∘ g_i] g_i′ ≤ c g_i′,

so, as a consequence,

(Pf)′ ≤ c Σ_{i=1}^r g_i′ (f ∘ g_i) + (1/λ) Σ_{i=1}^r g_i′ (f′ ∘ g_i).

Using inequality (6.2.5), this expression may be further simplified to

(Pf)′ ≤ [c + k_f/λ] Pf.

Set f_n = P^n f. An induction argument shows that

(f_n)′ ≤ [cλ/(λ − 1) + k_f/λ^n] f_n.

Choose a real k > cλ/(λ − 1). Then

(f_n)′ ≤ k f_n   (6.2.6)

for n sufficiently large, say n ≥ n_0(f), and thus condition (5.8.3) of Proposition 5.8.1 is satisfied.
We now show that the f_n are bounded and hence satisfy condition (5.8.2) of Proposition 5.8.1. First note that from equation (6.2.4) we may write

f_{n+1}(x) = Σ_{i=1}^r g_i′(x) f_n(g_i(x)).

Thus, since g_i′ ≤ 1/λ and S(a_{i−1}) = 0 for i = 1, ..., r,

f_{n+1}(0) ≤ (1/λ) f_n(0) + (1/λ) Σ_{i=2}^r f_n(a_{i−1}).   (6.2.7)

From (6.2.6) it follows that

f_n(a_i) ≤ f_n(x) e^k   for 0 ≤ x ≤ a_i,

so that

1 ≥ ∫_0^{a_i} f_n(x) dx ≥ e^{−k} f_n(a_i) a_i,   for i = 1, ..., r.

Thus f_n(a_i) ≤ e^k/a_i, and from (6.2.7) we have

f_{n+1}(0) ≤ (1/λ) f_n(0) + L/λ   for n ≥ n_0(f),


where

for n

no{/),

L = :~:::::>lc /ai-l
i=2

Again, using a simple induction argument, it follows that


for n

no{!),

so

fn(O) $ 1 + (Lj(>..- 1)]


for sufficiently large n, say n ~ n1 {!). By using this relation in conjunction
with the differential inequality {6.2.6), we therefore obtain

fn(x) $ {1 + [Lj(>.. -1)J}elc,

for 0 $ x

< 1, n

~ n1.

{6.2.8)

Thus, by inequalities (6.2.6) and {6.2.8), all the conditions of Proposition


5.8.1 are satisfied and {pn} is asymptotically stable by Theorem 5.6.2.


FIGURE 6.2.2. A piecewise monotonic transformation satisfying the conditions of Theorem 6.2.2.

Theorem 6.2.1 is valid only for mappings that are monotonically increasing on each subinterval [a_{i−1}, a_i) of the partition of [0, 1]. However, by modification of some of the foregoing properties (2i)–(2iv), we may also prove another theorem valid for transformations that are either monotonically increasing or decreasing on the subintervals of the partition. The disadvantage is that the mapping must be onto for every (a_{i−1}, a_i).
We now consider a mapping S: [0, 1] → [0, 1] that satisfies a condition slightly different from property (2i):

(2i)′ There is a partition 0 = a_0 < a_1 < ⋯ < a_r = 1 of [0, 1] such that for each integer i = 1, ..., r the restriction of S to the interval (a_{i−1}, a_i) is a C^2 function; as well as

(2ii)′ S((a_{i−1}, a_i)) = (0, 1), that is, S is onto;

(2iii)′ There is a λ > 1 such that |S′(x)| ≥ λ for x ≠ a_i, i = 0, ..., r; and

(2iv)′ There is a real finite constant c such that

|S″(x)|/[S′(x)]^2 ≤ c,   for x ≠ a_i, i = 0, ..., r.   (6.2.9)

(See Figure 6.2.2 for an example.) Then we have the following theorem.

Theorem 6.2.2. If S: [0, 1] → [0, 1] satisfies the preceding conditions (2i)′–(2iv)′ and P is the Frobenius–Perron operator associated with S, then {P^n} is asymptotically stable.

Proof. The proof proceeds much as for Theorem 6.2.1. Using the same

notation as before, it is easy to show that, for x ∈ [0, 1],

S^{-1}([0, x]) = ⋃_j [a_{j−1}, g_j(x)] ∪ ⋃_k [g_k(x), a_k],

where g_i = S_{(i)}^{-1}, S_{(i)} is as before, and the first union is over all intervals on which S_{(i)} is an increasing function of x whereas the second is over intervals on which S_{(i)} is decreasing. Thus

Pf(x) = Σ_j g_j′(x) f(g_j(x)) − Σ_k g_k′(x) f(g_k(x)),

or, with σ_i(x) = |g_i′(x)|,

Pf(x) = Σ_{i=1}^r σ_i(x) f(g_i(x)).   (6.2.10)

Let D_0 ⊂ D be the set of all bounded, continuously differentiable densities f such that

|f′(x)| ≤ k_f f(x)   for 0 < x < 1,   (6.2.11)

where the constant k_f depends on f. For every f ∈ D_0, differentiating equation (6.2.10) gives

(Pf)′ = Σ_{i=1}^r σ_i′ (f ∘ g_i) + Σ_{i=1}^r σ_i g_i′ (f′ ∘ g_i).

Exactly as in the proof of Theorem 6.2.1, we have

σ_i ≤ sup(1/|S′|) ≤ 1/λ

and

|σ_i′|/|g_i′| ≤ sup |S″|/[S′]^2 ≤ c.

These two inequalities, in combination with (6.2.11), allow us to evaluate (Pf)′ as

|(Pf)′| ≤ [c + k_f/λ] Pf.

Set f_n = P^n f and use an induction argument to show that

|f_n′| ≤ [cλ/(λ − 1) + k_f/λ^n] f_n.

Again we may always pick a k > cλ/(λ − 1) such that

|f_n′| ≤ k f_n   (6.2.12)


for sufficiently large n [say n ≥ n_0(f)], and thus Proposition 5.8.2 is satisfied. □

FIGURE 6.2.3. Successive maxima in the variable x(t) from the Lorenz equations are labeled x_i, and one maximum is plotted against the previous (x_{i+1} vs. x_i) after rescaling so that all x_i ∈ [0, 1].

Example 6.2.1. When σ = 10, b = 8/3, and r = 28, all three variables x, y, and z in the Lorenz [1963] equations,

dx/dt = σ(y − x),
dy/dt = −xz + rx − y,
dz/dt = xy − bz,

show very complicated dynamics. If we label successive maxima in x(t) as x_i (i = 0, 1, ...), plot each maximum against the previous maximum (i.e., x_{i+1} vs. x_i), and rescale the results so that the x_i are contained in the interval [0, 1], then numerical computations show that the points (x_i, x_{i+1}) are located approximately on the graph of a one-dimensional mapping, as shown in Figure 6.2.3.
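The numerical experiment described above is easy to reproduce. The sketch below (an illustration, not from the text; the step size, initial condition, and run length are arbitrary choices) integrates the Lorenz equations with a fourth-order Runge–Kutta scheme, collects the successive maxima of x(t), and rescales them into [0, 1] to form the points (x_i, x_{i+1}) of Figure 6.2.3.

```python
sigma, b, r = 10.0, 8.0 / 3.0, 28.0        # the parameter values used in the text

def lorenz(s):
    x, y, z = s
    return (sigma * (y - x), r * x - y - x * z, x * y - b * z)

def rk4_step(s, h):
    add = lambda u, v, c: tuple(ui + c * vi for ui, vi in zip(u, v))
    k1 = lorenz(s)
    k2 = lorenz(add(s, k1, h / 2))
    k3 = lorenz(add(s, k2, h / 2))
    k4 = lorenz(add(s, k3, h))
    return tuple(si + h / 6 * (a + 2 * p + 2 * q + d)
                 for si, a, p, q, d in zip(s, k1, k2, k3, k4))

h, state = 0.005, (1.0, 1.0, 1.0)
for _ in range(4000):                       # discard the transient
    state = rk4_step(state, h)

xs = []
for _ in range(40000):                      # integrate over t = 200
    state = rk4_step(state, h)
    xs.append(state[0])

maxima = [xs[i] for i in range(1, len(xs) - 1) if xs[i - 1] < xs[i] >= xs[i + 1]]
lo, hi = min(maxima), max(maxima)
scaled = [(m - lo) / (hi - lo) for m in maxima]   # all rescaled maxima lie in [0, 1]
pairs = list(zip(scaled, scaled[1:]))             # the points (x_i, x_{i+1})
```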
As an approximation to this mapping of one maximum to the next, we can consider the transformation

S(x) = (2 − a)x / (1 − ax)                for x ∈ [0, 1/2],
S(x) = (2 − a)(1 − x) / (1 − a(1 − x))    for x ∈ (1/2, 1],   (6.2.13)

where a = 1 − ε, shown in Figure 6.2.4 for ε = 0.01. Clearly, S(0) = S(1) = 0, S(1/2) = 1, and, since S′(x) = (2 − a)/(1 − ax)^2 on [0, 1/2], we will always have |S′(x)| > 1 for x ∈ [0, 1/2] if ε > 0. Finally, since S″(x) = 2a(2 − a)/(1 − ax)^3, |S″(x)| is always bounded above. For x ∈ (1/2, 1] the calculations are similar. Thus the transformation (6.2.13) satisfies all the requirements of Theorem 6.2.2: {P^n} is asymptotically stable and S is exact. □
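Because both branches of (6.2.13) invert in closed form, the Frobenius–Perron operator (6.2.10) for this map can be iterated directly. The sketch below (an illustration, not from the text) uses ε = 0.5 rather than the 0.01 of the figure, only so that the convergence guaranteed by Theorem 6.2.2 is visible after a few iterations; successive iterates of P applied to f_0 = 1 nearly agree, and each iterate remains a density.

```python
eps = 0.5
a = 1.0 - eps

def fp(f):
    """One application of the Frobenius-Perron operator for the map (6.2.13),
    using the closed-form branch inverses."""
    def pf(y):
        g1 = y / (2.0 - a + a * y)               # inverse of the increasing branch
        g2 = 1.0 - g1                            # inverse of the decreasing branch
        w = (2.0 - a) / (2.0 - a + a * y) ** 2   # |g'|, the same for both branches
        return w * (f(g1) + f(g2))
    return pf

f = lambda x: 1.0                            # initial density f_0 = 1
for _ in range(10):
    f = fp(f)
g = fp(f)                                    # one further iterate

xs = [j / 64.0 for j in range(65)]
diff = max(abs(f(x) - g(x)) for x in xs)     # successive iterates nearly agree
mass = sum(g(x) for x in xs[:-1]) / 64.0     # Riemann sum of the iterate over [0, 1]
```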


FIGURE 6.2.4. The transformation S(x) given by equation (6.2.13) with ε = 0.01 as an approximation to the data of Figure 6.2.3.

Remark 6.2.1. The condition that |S′(x)| > 1 in Theorem 6.2.2 is essential for S to be exact. We could easily demonstrate this by using (6.2.13) with ε = 0, thus making |S′(0)| = |S′(1)| = 1. However, even if |S′(x)| = 1 at only one point x ∈ [0, 1], that is sufficient to destroy the exactness, as can be demonstrated by the transformation

S(x) = x/(1 − x)   for x ∈ [0, 1/2],
S(x) = 2x − 1      for x ∈ (1/2, 1],   (6.2.14)

which we originally considered in Section 1.3 (paradox of the weak repellor). Now the condition |S′(x)| > 1 is violated only at the single point x = 0, and, for any f ∈ L^1, the sequence {P^n f} converges to zero on (0, 1]. Thus, the only solution to the equation Pf = f is the trivial solution f ≡ 0, and therefore there is no invariant density for S.
This is quite difficult to prove. First write the Frobenius–Perron operator corresponding to S as

Pf(x) = (1/(1 + x)^2) f(x/(1 + x)) + (1/2) f((1 + x)/2).   (6.2.15)

Set q_n(x) = x f_n(x), where f_n = P^n f_0, and pick the initial density to be f_0 = 1. Thus q_0(x) = x, and from (6.2.15) we have the recursive formula

q_{n+1}(x) = (1/(1 + x)) q_n(x/(1 + x)) + (x/(1 + x)) q_n(x/2 + 1/2).   (6.2.16)

Proceeding inductively, it is easy to prove that q_n′(x) ≥ 0 for all n, so that the functions q_n(x) are all positive and increasing. From equation (6.2.16)


we have

q_{n+1}(1) = (1/2) q_n(1/2) + (1/2) q_n(1) ≤ q_n(1),

which shows that

c_0 = lim_{n→∞} q_n(1)

exists. Write z_0 = 1 and z_{k+1} = z_k/(1 + z_k). Then from (6.2.16) we have

q_{n+1}(z_k) = (1/(1 + z_k)) q_n(z_{k+1}) + (z_k/(1 + z_k)) q_n(z_k/2 + 1/2).

Take k to be fixed and assume that lim_{n→∞} q_n(x) = c_0 for z_k ≤ x ≤ 1 (which is certainly true for k = 0). Since z_k ≤ 1/2 + z_k/2, taking the limit as n → ∞, we have

c_0 = (1/(1 + z_k)) lim_{n→∞} q_n(z_{k+1}) + (z_k/(1 + z_k)) c_0,

so lim_{n→∞} q_n(z_{k+1}) = c_0. Since the functions q_n(x) are increasing, we know that lim_{n→∞} q_n(x) = c_0 for all x ∈ [z_{k+1}, 1]. By induction it follows that lim_{n→∞} q_n(x) = c_0 on any interval [z_k, 1] and, since lim_{k→∞} z_k = 0, we have lim_{n→∞} q_n(x) = c_0 for all x ∈ (0, 1]. Thus

lim_{n→∞} f_n(x) = c_0/x.

Actually, the limit c_0 is zero; to show this, assume c_0 > 0. Then there must exist some ε > 0 such that

lim_{n→∞} ∫_ε^1 f_n(x) dx = ∫_ε^1 (c_0/x) dx > 1.

However, this is impossible since ‖f_n‖ = 1 for every n. By induction, each of the functions f_n(x) is decreasing, so the convergence of f_n(x) to zero is uniform on any interval [ε, 1] with ε > 0.
Now, let f be an arbitrary function in L^1, and write f = f^+ − f^−. Given δ > 0, choose a constant h such that

∫_0^1 (f^+ − h)^+ dx + ∫_0^1 (f^− − h)^+ dx ≤ δ.

Thus, since |P^n f| ≤ P^n |f| = P^n f^+ + P^n f^−, we have

∫_ε^1 |P^n f| dx ≤ ∫_ε^1 P^n f^+ dx + ∫_ε^1 P^n f^− dx
  ≤ 2h ∫_ε^1 P^n 1 dx + ∫_0^1 P^n (f^+ − h)^+ dx + ∫_0^1 P^n (f^− − h)^+ dx
  ≤ 2h ∫_ε^1 P^n 1 dx + δ


and, since {P^n 1} converges uniformly to zero on [ε, 1], we have

lim_{n→∞} ∫_ε^1 |P^n f| dx ≤ δ   for ε > 0.

Hence the sequence {P^n f} converges to zero in L^1([ε, 1]) norm for every ε > 0, and the equation Pf = f cannot have a solution f ∈ L^1 except f ≡ 0. □
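The slow escape of mass just described can be watched directly. The sketch below (an illustration, not from the text) iterates the operator (6.2.15) starting from f_0 = 1 and records q_n(1) = 1·f_n(1); the sequence is nonincreasing, matching the argument for the existence of c_0, and the final iterate q(x) = x f_n(x) is increasing in x, as the proof claims.

```python
def fp(f):
    """Frobenius-Perron operator (6.2.15) of the weak-repellor map (6.2.14)."""
    return lambda x: f(x / (1.0 + x)) / (1.0 + x) ** 2 + 0.5 * f((1.0 + x) / 2.0)

f = lambda x: 1.0                # f_0 = 1, so q_0(x) = x
q_at_1 = []
for n in range(13):
    q_at_1.append(1.0 * f(1.0))  # q_n(1) = 1 * f_n(1)
    f = fp(f)                    # advance to f_{n+1}

xs = [0.1 * j for j in range(1, 11)]
q_final = [x * f(x) for x in xs]  # q_13(x) = x * f_13(x), increasing in x
```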

6.3 Piecewise Convex Transformations with a Strong Repellor

Although the theorems of the preceding section were moderately easy to prove using the techniques of Chapter 5, the conditions that the transformation S must satisfy are highly restrictive. Thus, in specific cases of interest, it may often not be the case that S′(x) > 1 or |S′(x)| > 1, or that condition (6.2.3) or (6.2.9) is obeyed.
However, for a class of convex transformations, it is known that {P^n} is asymptotically stable. Consider S: [0, 1] → [0, 1] having the following properties:

(3i) There is a partition 0 = a_0 < a_1 < ⋯ < a_r = 1 of [0, 1] such that for each integer i = 1, ..., r the restriction of S to [a_{i−1}, a_i) is a C^2 function;

(3ii) S′(x) > 0 and S″(x) ≥ 0 for all x ∈ (0, 1) [S′(a_i) and S″(a_i) are right derivatives];

(3iii) For each integer i = 1, ..., r, S(a_{i−1}) = 0; and

(3iv) S′(0) > 1.

An example of a mapping satisfying these criteria is shown in Figure 6.3.1.

Remark 6.3.1. Property (3iv) implies that the point x = 0 is a strong repellor (see also Section 1.3 and Remark 6.2.1); that is, the trajectory {S(x_0), S^2(x_0), ...}, starting from a point x_0 ∈ (0, a_1), will eventually leave [0, a_1). To see this, note that as long as S^n(x_0) ∈ (0, a_1) there is a ξ ∈ (0, a_1) such that

S^n(x_0) = S(S^{n−1}(x_0)) − S(0) = S′(ξ) S^{n−1}(x_0) ≥ λ S^{n−1}(x_0),

where λ = S′(0). By an induction argument, S^n(x_0) ≥ λ^n x_0 and, since λ > 1, S^n(x_0) must eventually exceed a_1. After leaving the interval [0, a_1)


the trajectory will, in general, exhibit very complicated behavior. If at some point it returns to [0, a_1), then it will, again, eventually leave [0, a_1). □

FIGURE 6.3.1. An example of a piecewise convex transformation satisfying the conditions of Theorem 6.3.1.

With these comments in mind, we can state the following theorem.

Theorem 6.3.1. Let S: [0, 1] → [0, 1] be a transformation satisfying the foregoing conditions (3i)–(3iv), and let P be the Frobenius–Perron operator associated with S. Then {P^n} is asymptotically stable.

Proof. The complete proof of this theorem, which may be found in Lasota and Yorke [1982], is long and requires some technical details we have not introduced. Rather than give the full proof, here we show only that {P^n f} is bounded above, thus implying that there is a measure invariant under S.
We first derive the Frobenius–Perron operator. For any x ∈ [0, 1] we have

S^{-1}([0, x]) = ⋃_{i=1}^r [a_{i−1}, g_i(x)],

where

g_i(x) = S_{(i)}^{-1}(x)   for x ∈ S([a_{i−1}, a_i)),
g_i(x) = a_i               for x ∈ [0, 1] \ S([a_{i−1}, a_i)),

and, as before, S_{(i)} denotes the restriction of S to the interval [a_{i−1}, a_i). Thus, as in Section 6.2, we obtain

Pf(x) = Σ_{i=1}^r g_i′(x) f(g_i(x)).   (6.3.1)

Even though equations (6.2.4) and (6.3.1) appear to be identical, the functions g_i have different properties. For instance, by using the inverse function

theorem, we have

g_i′ = 1/S′ > 0   and   g_i″ = −S″/[S′]^3 ≤ 0.

Thus, since g_i′ > 0, we know that g_i is an increasing function of x, whereas g_i′ is a decreasing function of x since g_i″ ≤ 0.
Let f ∈ D([0, 1]) be a decreasing density, that is, x ≤ y implies f(x) ≥ f(y). Then, by our previous observations, f(g_i(x)) is a decreasing function of x, as is g_i′(x) f(g_i(x)). Since Pf, as given by (6.3.1), is the sum of decreasing functions, Pf is a decreasing function of x and, by induction, so is P^n f.
Observe further that, for any decreasing density f ∈ D([0, 1]), we have

1 ≥ ∫_0^x f(u) du ≥ ∫_0^x f(x) du = x f(x),

so that, for any decreasing density,

f(x) ≤ 1/x,   x ∈ (0, 1].

Hence, for i ≥ 2, we must have

g_i′(x) f(g_i(x)) ≤ g_i′(0) f(g_i(0)) ≤ g_i′(0)/g_i(0) = g_i′(0)/a_{i−1},   i = 2, ..., r.

This formula is not applicable when i = 1 since a_0 = 0. However, we do have

g_1′(x) f(g_1(x)) ≤ g_1′(0) f(0).

Combining these two results with equation (6.3.1) for P, we can write

Pf(x) ≤ g_1′(0) f(0) + Σ_{i=2}^r g_i′(0)/a_{i−1}.

Set

S′(0) = 1/g_1′(0) = λ > 1

and

M = Σ_{i=2}^r g_i′(0)/a_{i−1},

so

Pf(x) ≤ (1/λ) f(0) + M.

Proceeding inductively, we therefore have

P^n f(x) ≤ (1/λ^n) f(0) + λM/(λ − 1) ≤ f(0) + λM/(λ − 1).

Thus, for decreasing f ∈ D([0, 1]), since f(0) < ∞ the sequence {P^n f} is bounded above by a constant. From Corollary 5.2.1 we therefore know that there is a density f* ∈ D such that Pf* = f*, and by Theorem 4.1.1 the measure μ_{f*} is invariant. □

Example 6.3.1. In the experimental study of fluid flow it is commonly observed that for Reynolds numbers R less than a certain value, R_L, strictly
laminar flow occurs; for Reynolds numbers greater than another value,
R_T, continuously turbulent flow occurs. For Reynolds numbers satisfying
R_L < R < R_T, a transitional type of behavior (intermittency) is found.
Intermittency is characterized by alternating periods of laminar and turbulent flow, each of a variable and apparently unpredictable length.
Intermittency is also observed in mathematical models of fluid flow, for
example, the Lorenz equations [Manneville and Pomeau, 1979]. Manneville
[1980] argues that, in the parameter ranges where intermittency occurs
in the Lorenz equations, the model behavior can be approximated by the
transformation S: [0,1] → [0,1] given by

    S(x) = (1 + ε)x + (1 − ε)x²  (mod 1)    (6.3.2)

with ε > 0, where x corresponds to a normalized fluid velocity. This
transformation clearly satisfies all of the properties of Theorem 6.2.1 for
0 < ε < 2 and is thus exact.
The utility of equation (6.3.2) in the study of intermittency stems from
the fact that x = 0 is a strong repellor. From Remark 6.3.1 it is clear
that any transformation S satisfying conditions (3i)–(3iv) will serve equally
well in this approach to the intermittency problem. Exactly this point of
view has been adopted by Procaccia and Schuster [1983] in their heuristic
treatment of noise spectra in dynamical systems. □
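The alternation of laminar and turbulent episodes can be seen directly by iterating (6.3.2). The following sketch is not from the text; the values ε = 0.05 and the laminar cutoff 0.2 are illustrative choices, and "laminar" is simply taken to mean "close to the repelling fixed point x = 0".

```python
# Iterate the intermittency map S(x) = (1 + eps)*x + (1 - eps)*x**2 (mod 1)
# of equation (6.3.2) and record the lengths of "laminar" episodes, i.e.
# maximal runs of iterates below an illustrative cutoff near x = 0.

def S(x, eps=0.05):
    """One step of the map (6.3.2)."""
    y = (1.0 + eps) * x + (1.0 - eps) * x * x
    return y % 1.0

def laminar_runs(x0=0.3, n=200_000, eps=0.05, cutoff=0.2):
    """Lengths of maximal runs with x < cutoff (laminar episodes)."""
    runs, current, x = [], 0, x0
    for _ in range(n):
        x = S(x, eps)
        if x < cutoff:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    return runs

runs = laminar_runs()
# Intermittency shows up as laminar episodes of widely varying length.
print(len(runs), min(runs), max(runs))
```

The slow escape from x = 0 (where S'(0) = 1 + ε is only slightly larger than 1) produces the long laminar phases, while the (mod 1) reinjection produces the irregular turbulent bursts.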

6.4 Asymptotically Periodic Transformations


In order to prove the asymptotic stability of {P^n} in the two preceding
sections, we were forced to consider transformations S with very special
properties. Thus, for every subinterval of the partition of [0,1], we used
either S((a_{i−1}, a_i)) = (0,1) or S(a_{i−1}) = 0. Eliminating either or both
of these requirements may well lead to the loss of asymptotic stability of
{P^n}, as is illustrated in the following example.
Let S: [0,1] → [0,1] be defined by

    S(x) = { 2x,         for x ∈ [0, 1/4)
           { 2x − 1/2,   for x ∈ [1/4, 3/4)
           { 2x − 1,     for x ∈ [3/4, 1],

as shown in Figure 6.4.1. Examination of the figure shows that the Borel
measure is invariant since S⁻¹([0,x]) always consists of two intervals whose


FIGURE 6.4.1. An example showing that a piecewise monotonic transformation
that is not onto might not even be ergodic. (See the text for details.)

union has measure x. However, S is obviously not exact and, indeed, is not
even ergodic since S⁻¹([0, 1/2]) = [0, 1/2] and S⁻¹([1/2, 1]) = [1/2, 1]. Restricted
to either [0, 1/2] or [1/2, 1], S behaves like the dyadic transformation.
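The invariance of the two halves can be checked mechanically. A small sketch, not from the text:

```python
# Check that for the map of Figure 6.4.1 each half of [0,1] is invariant,
# so no orbit can mix between [0,1/2] and [1/2,1] -- the map is not ergodic.

def S(x):
    """Piecewise map: 2x on [0,1/4), 2x - 1/2 on [1/4,3/4), 2x - 1 on [3/4,1]."""
    if x < 0.25:
        return 2 * x
    elif x < 0.75:
        return 2 * x - 0.5
    return 2 * x - 1.0

x = 0.137  # a point in the lower half [0, 1/2]
for _ in range(1000):
    x = S(x)
    assert x <= 0.5 + 1e-9   # the lower half is invariant

y = 0.637  # a point in the upper half [1/2, 1]
for _ in range(1000):
    y = S(y)
    assert y >= 0.5 - 1e-9   # the upper half is invariant
print("both halves invariant")
```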
The loss of asymptotic stability by {P^n} may, under certain circumstances, be replaced by the asymptotic periodicity of {P^n}. To see this,
consider a mapping S: [0,1] → [0,1] satisfying the following three conditions:

(4i) There is a partition 0 = a_0 < a_1 < ··· < a_r = 1 of [0,1] such that
for each integer i = 1, ..., r the restriction of S to (a_{i−1}, a_i) is a C²
function;

(4ii) |S'(x)| ≥ λ > 1,  x ≠ a_i, i = 1, ..., r;    (6.4.1)

(4iii) There is a real constant c such that

    |S''(x)|/[S'(x)]² ≤ c < ∞,  x ≠ a_i, i = 0, ..., r.    (6.4.2)

An example of a transformation satisfying these conditions is shown in


Figure 6.4.2.
We now state the following theorem.

Theorem 6.4.1. Let S: [0,1] → [0,1] satisfy conditions (4i)–(4iii) and let
P be the Frobenius–Perron operator associated with S. Then, for all f ∈ D,
{P^n f} is asymptotically periodic.

Proof. We first construct the Frobenius-Perron operator corresponding to



FIGURE 6.4.2. An example of a transformation on [0,1] satisfying the conditions


of Theorem 6.4.1.

S. For any x ∈ [0,1], we have

    S⁻¹([0,x)) = ⋃_{i=1}^r A_i(x),

where

    A_i(x) = { (a_{i−1}, g_i(x)),     for x ∈ I_i, g_i' > 0
             { (g_i(x), a_i),         for x ∈ I_i, g_i' < 0
             { ∅ or (a_{i−1}, a_i),   for x ∉ I_i,

and, as before, g_i = S_(i)⁻¹, where S_(i) denotes the restriction of S to
(a_{i−1}, a_i) and I_i = S((a_{i−1}, a_i)). Therefore,

    Pf(x) = d/dx ∫_{S⁻¹([0,x))} f(u) du = Σ_{i=1}^r d/dx ∫_{A_i(x)} f(u) du,    (6.4.3)

where

    d/dx ∫_{A_i(x)} f(u) du = { g_i'(x) f(g_i(x)),    x ∈ I_i, g_i' > 0
                              { −g_i'(x) f(g_i(x)),   x ∈ I_i, g_i' < 0
                              { 0,                    x ∉ I_i.    (6.4.4)

The right-hand side of equation (6.4.3) is not defined on the set of end
points of the intervals I_i, S(a_{i−1}), and S(a_i). However, this set is finite and


thus of measure zero. Since a function representing Pf that is an element
of L¹ is defined only up to a set of measure zero, we neglect these end points.
Equation (6.4.4) may be rewritten as

    d/dx ∫_{A_i(x)} f(u) du = σ_i(x) f(g_i(x)) 1_{I_i}(x),

where σ_i(x) = |g_i'(x)| and 1_{I_i}(x) is the characteristic function of the interval
I_i. Thus (6.4.3) may be written as

    Pf(x) = Σ_{i=1}^r σ_i(x) f(g_i(x)) 1_{I_i}(x).    (6.4.5)

Equation (6.4.5) for the Frobenius–Perron operator is made more complicated than those in Sections 6.2 and 6.3 by the presence of the characteristic
functions 1_{I_i}(x). The effect of these is such that, even when a completely
smooth initial function f ∈ L¹ is chosen, Pf and all subsequent iterates of
f may be discontinuous. As a consequence we do not have simple criteria,
such as decreasing functions, with which to examine the behavior of P^n f. Thus we
must examine the variation of P^n f.
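Equation (6.4.5) can be evaluated pointwise once the inverse branches g_i and their images I_i are known. A sketch, not from the text, for the three-branch map of Figure 6.4.1, whose inverse branches are g_1(x) = x/2 on I_1 = [0,1/2), g_2(x) = (x + 1/2)/2 on I_2 = [0,1), and g_3(x) = (x + 1)/2 on I_3 = [1/2,1], each with σ_i = 1/2:

```python
# Pointwise evaluation of the Frobenius-Perron operator (6.4.5) for the
# map of Figure 6.4.1: Pf(x) = sum_i sigma_i(x) f(g_i(x)) 1_{I_i}(x).

def P(f):
    """Return Pf as a function of x in [0,1]."""
    def Pf(x):
        total = 0.5 * f((x + 0.5) / 2)      # branch 2: its image is all of [0,1)
        if x < 0.5:
            total += 0.5 * f(x / 2)         # branch 1: image [0,1/2)
        else:
            total += 0.5 * f((x + 1) / 2)   # branch 3: image [1/2,1]
        return total
    return Pf

one = P(lambda x: 1.0)      # the constant density is invariant: P1 = 1
f = lambda x: 2 * x         # a smooth density on [0,1]
Pf = P(f)
print(one(0.3), one(0.7))   # both 1.0
print(Pf(0.49), Pf(0.51))   # Pf of a smooth f jumps at x = 1/2
```

The jump of Pf at x = 1/2, even though f is smooth, illustrates the discontinuities introduced by the characteristic functions 1_{I_i}.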
We start by examining the variation of Pf as given by equation (6.4.5).
Let a function f ∈ D be of bounded variation on [0,1]. From property (V1)
of Section 6.1, the Yorke inequality (V5), and equation (6.4.5),

    ⋁_0^1 Pf(x) ≤ Σ_{i=1}^r ⋁_0^1 [σ_i(x) f(g_i(x)) 1_{I_i}(x)]
                ≤ 2 Σ_{i=1}^r ⋁_{I_i} [σ_i(x) f(g_i(x))]
                  + Σ_{i=1}^r (2/|I_i|) ∫_{I_i} σ_i(x) f(g_i(x)) dx.    (6.4.6)

Further, by property (V4),

    ⋁_{I_i} [σ_i(x) f(g_i(x))] ≤ sup_{I_i} σ_i ⋁_{I_i} f(g_i(x)) + ∫_{I_i} |σ_i'(x)| f(g_i(x)) dx.

Because, from the inverse function theorem, we have σ_i ≤ 1/λ and |σ_i'| ≤ cσ_i,
the preceding inequality becomes

    ⋁_{I_i} [σ_i(x) f(g_i(x))] ≤ (1/λ) ⋁_{I_i} f(g_i(x)) + c ∫_{I_i} σ_i(x) f(g_i(x)) dx,

and, thus, (6.4.6) becomes

    ⋁_0^1 Pf(x) ≤ (2/λ) Σ_{i=1}^r ⋁_{I_i} f(g_i(x))
                  + 2 Σ_{i=1}^r [c + 1/|I_i|] ∫_{I_i} σ_i(x) f(g_i(x)) dx.    (6.4.7)

Define a new variable y = g_i(x) for the integral in (6.4.7) and use property
(V3) for the first term to give

    ⋁_0^1 Pf(x) ≤ (2/λ) Σ_{i=1}^r ⋁_{a_{i−1}}^{a_i} f + 2 Σ_{i=1}^r [c + 1/|I_i|] ∫_{a_{i−1}}^{a_i} f(y) dy.

Set L = max_i 2(c + 1/|I_i|) and use property (V2) to rewrite this last inequality as

    ⋁_0^1 Pf(x) ≤ (2/λ) ⋁_0^1 f + L ∫_0^1 f(y) dy = (2/λ) ⋁_0^1 f + L,    (6.4.8)

since f ∈ D([0,1]).

By using an induction argument with inequality (6.4.8), we have

    ⋁_0^1 P^n f ≤ (2/λ)^n ⋁_0^1 f + L Σ_{j=0}^{n−1} (2/λ)^j.    (6.4.9)

Thus, if λ > 2, then

    ⋁_0^1 P^n f ≤ (2/λ)^n ⋁_0^1 f + λL/(λ − 2)

and, therefore, for every f ∈ D of bounded variation,

    lim sup_{n→∞} ⋁_0^1 P^n f < K,    (6.4.10)

where K > λL/(λ − 2) is independent of f.


Now let the set F be defined by

    F = { g ∈ D : ⋁_0^1 g ≤ K }.

From (6.4.10) it follows that P^n f ∈ F for all sufficiently large n and, thus,
{P^n f} converges to F in the sense that

    lim_{n→∞} inf_{g∈F} ||P^n f − g|| = 0.

We want to show that F is weakly precompact. From the definition of the
variation, it is clear that, for any positive function g defined on [0,1],

    g(x) − g(y) ≤ ⋁_0^1 g

for all x, y ∈ [0,1]. Since g ∈ D, there is some y ∈ [0,1] such that g(y) ≤ 1
and, thus,

    g(x) ≤ K + 1.

Hence, by criterion 1 of Section 5.1, F is weakly precompact. (Actually, it
is strongly precompact, but we will not use this fact.) Since F is weakly
precompact, P is constrictive by Proposition 5.3.1. Finally, by Theorem 5.3.1, {P^n f} is asymptotically periodic and the theorem is proved
when λ > 2.
To see that the theorem is also true for λ > 1, consider another transformation S̃: [0,1] → [0,1] defined by

    S̃(x) = S ∘ ··· ∘ S(x) = S^q(x).    (6.4.11)

Let q be the smallest integer such that λ^q > 2 and set λ̃ = λ^q. It is easy to
see that S̃ satisfies conditions (4i)–(4iii). By the chain rule,

    |S̃'(x)| ≥ (inf |S'(x)|)^q ≥ λ^q = λ̃ > 2.
Thus, by the preceding part of the proof, the iterates {P̃^n} of the
Frobenius–Perron operator P̃ corresponding to S̃ satisfy

    lim sup_{n→∞} ⋁_0^1 P̃^n f < K̃

for every f ∈ D of bounded variation, where the constant K̃ is independent
of f. Write an integer n in the form n = mq + s, where the remainder s
satisfies 0 ≤ s < q. Take m sufficiently large, m > m_0, so that

    ⋁_0^1 P̃^m f ≤ K̃,    m > m_0.

Now, using inequality (6.4.9), we have

    ⋁_0^1 P^n f = ⋁_0^1 P^s (P̃^m f) ≤ K̃ sup_{0≤s≤q−1} (2/λ)^s + L Σ_{j=0}^{q−1} (2/λ)^j,    n > (m_0 + 1)q.

Thus, for n sufficiently large, the variation of P^n f is bounded by a constant
independent of f and the proof proceeds as before. ∎

Remark 6.4.1. From the results of Kosjakin and Sandler [1972] or Li and
Yorke [1978a], it follows that transformations S satisfying the assumptions
of Theorem 6.4.1 are ergodic if r = 2. □

Example 6.4.1. In this example we consider one of the simplest heuristic
models for the effects of periodic modulation of an autonomous oscillator
[Glass and Mackey, 1979].



FIGURE 6.4.3. The periodic threshold θ(t) is shown as a solid curved line, and
the activity x(t) as dashed lines. (See Example 6.4.1 for further details.)

Consider a system (see Figure 6.4.3) whose activity x(t) increases linearly
from a starting time t_i until it reaches a periodic threshold θ(t) at time t̂_i:

    x(t̂_i) = θ(t̂_i).    (6.4.12)

We take

    x(t) = λ(t − t_i)  and  θ(t) = 1 + φ(t),

where φ is a continuous periodic function with period 1 whose amplitude
satisfies

    1 ≥ sup φ(t) = −inf φ(t) = K ≥ 0.

When the activity reaches threshold it instantaneously resets to zero, and
the process begins anew at the starting time

    t_{i+1} = t̂_i + γ⁻¹ x(t̂_i).    (6.4.13)

In (6.4.13), t̂_i is an implicit function of t_i given by (6.4.12) or by

    λ(t̂_i − t_i) = 1 + φ(t̂_i).    (6.4.14)

Equation (6.4.14) has exactly one smallest solution t̂_i ≥ t_i for every t_i ∈ R.
We wish to examine the behavior of the starting times t_i. Set

    F(t_i) = t̂_i(t_i) + γ⁻¹ x(t̂_i(t_i))

so that the transformation

    S(t) = F(t)  (mod 1)

gives the connection between successive starting times.
Many authors have considered the specific cases of φ(t) = K sin 2πt,
γ⁻¹ = 0, so t̂_i = t_{i+1} and, thus, t_{i+1} is given implicitly by

    λ(t_{i+1} − t_i) = 1 + K sin 2πt_{i+1}.    (6.4.15)

6.4. Asymptotically Periodic 'Iransformations

163

Here, to illustrate the application of the material of this and previous
sections, we restrict ourselves to the simpler situation in which φ(t) is a
piecewise linear function of t, and θ is given by

    θ(t) = { 4Kt + 1 − K,          t ∈ [0, 1/2)
           { 4K(1 − t) + 1 − K,    t ∈ [1/2, 1).

The calculation of F(t) depends on the sign of λ − 4K. For example, if
λ > 4K, a simple computation shows that

    F(t) = { [(1 + α)/(1 − β)](t + a) + (λ/γ)a,           t ∈ [−a, (1 − β)/2 − a]
           { [(1 − α)/(1 + β)](t + a + β) + α + (λ/γ)a,   t ∈ ((1 − β)/2 − a, 1 − a],    (6.4.16)

where α = 4K/γ, β = 4K/λ, and a = (1 − K)/λ.

Since 0 ≤ β < 1, it is clear that F'(t) > 1 for all t ∈ [−a, (1 − β)/2 − a].
However, if (1 − α)/(1 + β) < −1, then |S'(t)| > 1 for all t and {P^n} is
asymptotically periodic by Theorem 6.4.1. Should it happen in this case
that S is onto for every subinterval of the partition, then {P^n} is asymptotically stable by Theorem 6.2.2.
Despite the obvious simplifications in such models, they have enjoyed
great popularity: in neurobiology, the "integrate and fire" model [Knight,
1972a,b]; in respiratory physiology, the "inspiratory off-switch" model [Petrillo and Glass, 1984]; in cardiac electrophysiology, the "circle model" [Guevara and Glass, 1982]; and in cell biology, the "mitogen" model [Kauffman,
1974; Tyson and Sachsenmaier, 1978]. □
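The construction above is easy to simulate. The following sketch is not from the text; the parameter values λ = 3, γ = 5, K = 0.25 are illustrative. For each starting time it solves the implicit equation (6.4.14) for the firing time by bisection (the left side is increasing in t̂ when λ exceeds the maximal slope of φ), then applies (6.4.13) modulo 1:

```python
# Simulate the integrate-and-fire model of Example 6.4.1 with
# phi(t) = K*sin(2*pi*t): solve lambda*(that - t) = 1 + phi(that) for the
# firing time, then advance t_{i+1} = that + x(that)/gamma (mod 1).

import math

LAM, GAMMA, K = 3.0, 5.0, 0.25   # illustrative values; LAM > 2*pi*K

def phi(t):
    return K * math.sin(2 * math.pi * t)

def firing_time(t):
    """Smallest that >= t with LAM*(that - t) = 1 + phi(that), by bisection."""
    h = lambda s: LAM * (s - t) - (1 + phi(s))
    lo, hi = t, t + (1 + K) / LAM + 1.0   # h(lo) < 0 < h(hi); h is increasing
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if h(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def S(t):
    that = firing_time(t)
    return (that + LAM * (that - t) / GAMMA) % 1.0   # x(that) = LAM*(that - t)

t, times = 0.1, []
for _ in range(50):
    t = S(t)
    times.append(t)
print(times[:5])
```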
Example 6.4.2. An interesting problem arises in the rotary drilling of
rocks. Usually the drilling tool is in the form of a toothed cone (mass
M and radius R) that rotates on the surface of the rock with tangential
velocity u. At rest the tool exerts a pressure Q on the rock. In practice it
is found that, for sufficiently large tool velocities, after each impact of a
tooth with the rock the tool rebounds before the next blow. The energy of
each impact, and thus the efficiency of the cutting process, is a function of
the angle at which the impact occurs.
Let x be the normalized impact angle, which is in the interval [0,1]. Lasota
and Rusek [1974] have shown that the next impact angle is given by the
transformation S: [0,1] → [0,1] defined by

    S(x) = x + aq(x) − √([aq(x)]² + 2axq(x) − aq(x)[1 + q(x)])  (mod 1),    (6.4.17)

where

    q(x) = 1 + int[(1 − 2x)/(a − 1)];

int(y) denotes the integer part of y, namely, the largest integer smaller than
or equal to y, and

    a = F/(F − 1),


where

    F = Mu²/QR

is the Froude number, the ratio of the kinetic and potential energies.
The Froude number F contains all of the important parameters characterizing this process. It is moderately straightforward to show that, with
S̃ = S ∘ S, |S̃'(x)| > 1 if F > 2. However, the transformation (6.4.17) is
not generally onto, so that by Theorem 6.4.1 the most that we can say is
that for F > 2, if P is the Frobenius–Perron operator corresponding to S,
then {P^n} is asymptotically periodic. However, it seems natural to expect
that {P^n} is in fact asymptotically stable. This prediction is supported
experimentally because, once u > (2QR/M)^{1/2}, there is a transition from
smooth cutting to extremely irregular behavior (chattering) of the tool.
□

Example 6.4.3. Kitano, Yabuzaki, and Ogawa [1983] experimentally examined the dynamics of a simple, nonlinear, acoustic feedback system with
a time delay. A voltage x, the output of an operational amplifier with response time γ⁻¹, is fed to a speaker. The resulting acoustic signal is picked
up by a microphone after a delay τ (due to the finite propagation velocity
of sound waves), passed through a full-wave rectifier, and then fed back to
the input of the operational amplifier.
Kitano and co-workers have shown that the dynamics of this system are
described by the delay-differential equation

    γ⁻¹ ẋ(t) = −x(t) + μF(x(t − τ)),    (6.4.18)

where

    F(x) = −|x + 1/2| + 1/2    (6.4.19)

is the output of the full-wave rectifier with an input x, and μ is the circuit
loop gain.
In a series of experiments, Kitano et al. found that increasing the loop
gain μ above 1 resulted in very complicated dynamics in x, whose exact
nature depends on the value of γτ. To understand these behaviors they
considered the one-dimensional difference equation

    x_{n+1} = μF(x_n),

derived from expressions (6.4.18) and (6.4.19) as γ⁻¹ → 0. In our notation
this is equivalent to the map T: [−μ/(μ − 1), μ/2] → [−μ/(μ − 1), μ/2],
defined by

    T(x) = { μ(1 + x),   for x ∈ [−μ/(μ − 1), −1/2]
           { −μx,        for x ∈ (−1/2, μ/2],    (6.4.20)


for 1 < μ ≤ 2. Make the change of variables

    x' = 2[(μ − 1)x + μ] / [μ(μ + 1)]

so that (6.4.20) is equivalent to the transformation S: [0,1] → [0,1], defined
by

    S(x') = { μx',        for x' ∈ [0, 1/μ]
            { 2 − μx',    for x' ∈ (1/μ, 1].    (6.4.21)

For 1 < μ ≤ 2, the transformation S defined by (6.4.21) satisfies all the
conditions of Theorem 6.4.1, and S is thus asymptotically periodic. If μ = 2,
then, by Theorem 6.2.2, S is statistically stable. Furthermore, from Remark
6.4.1 it follows that S is ergodic for 1 < μ < 2 and will, therefore, exhibit
disordered dynamical behavior. This is in agreement with the experimental
results. □
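The skewed tent map (6.4.21) is easy to explore numerically. A sketch, not from the text, with the illustrative loop gain μ = 1.9 (chosen strictly below 2, where Theorem 6.4.1 gives asymptotic periodicity):

```python
# Iterate the map S of (6.4.21) for mu = 1.9 and histogram the orbit;
# the orbit remains in [0,1] and spreads over a mu-dependent sub-interval.

MU = 1.9

def S(x):
    return MU * x if x <= 1 / MU else 2 - MU * x

x, orbit = 0.123, []
for _ in range(10_000):
    x = S(x)
    orbit.append(x)

counts = [0] * 10
for v in orbit:
    counts[min(int(10 * v), 9)] += 1
print(counts)   # a crude 10-bin histogram of the long-run behavior
```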

Remark 6.4.2. As we have observed in the example of Figure 6.4.1, piecewise monotonic transformations satisfying properties (4i)–(4iii) may not
have a unique invariant measure. If the transformation is ergodic, and the
invariant measure is thus unique by Theorem 4.2.2, then the invariant measure has many interesting properties. For example, in this case Kowalski
[1976] has shown that the invariant measure is continuously dependent on
the transformation. □

6.5 Change of Variables


In the three preceding sections, we have examined transformations S: [0,1] →
[0,1] with very restrictive conditions on the derivatives S'(x) and S''(x).
However, most transformations do not satisfy these conditions. A good
example is the quadratic transformation

    S(x) = 4x(1 − x),  for x ∈ [0,1].

For this transformation, S'(x) = 4 − 8x, and |S'(x)| < 1 for x ∈ (3/8, 5/8).
Furthermore, |S''(x)/[S'(x)]²| = (1/2)(1 − 2x)⁻², which is clearly not bounded
at x = 1/2. However, iteration of any initial density on [0,1] indicates that
the iterates rapidly approach the same density (Figure 1.2.2), leading one
to suspect that, for the quadratic transformation, {P^n} is asymptotically
stable.
In this section we show how, by a change of variables, we can sometimes
utilize the results of the previous sections to prove asymptotic stability.
The idea is originally due to Ruelle [1977] and Pianigiani [1983].
Theorem 6.5.1. Let S: [0,1] → [0,1] be a transformation satisfying properties (2i)' and (2ii)' of Section 6.2, and let P_S be the Frobenius–Perron operator
corresponding to S. If there exists an a.e. positive C¹ function φ ∈ L¹([0,1])
such that, for some real λ and c,

    ρ(x) = |S'(x)| φ(S(x)) / φ(x) ≥ λ > 1,  0 < x < 1,    (6.5.1)

and

    | (1/φ(x)) (d/dx)(1/ρ(x)) | ≤ c < ∞,  0 < x < 1,    (6.5.2)

then {P_S^n} is asymptotically stable.


Proof. Set

    g(x) = (1/||φ||) ∫_0^x φ(u) du,  for x ∈ [0,1],    (6.5.3)

and consider a new transformation T defined by

    T(x) = g(S(g⁻¹(x)))    (6.5.4)

with associated Frobenius–Perron operator P_T. From (6.5.4), T(g(x)) =
g(S(x)), so

    (dT/dg)(dg/dx) = (dg/dS)(dS/dx)

or

    T'(g(x)) = g'(S(x)) S'(x) / g'(x).

Using (6.5.3), this may be rewritten as

    T'(g(x)) = S'(x) φ(S(x)) / φ(x).

Hence, by (6.5.1), we have |T'(g)| ≥ λ > 1. Further, by comparing this
equation with (6.5.1), we see that ρ(x) = |T'(g(x))|. It follows that

    | (1/φ(x)) (d/dx)(1/ρ(x)) | = |T''(g(x))| / ([T'(g(x))]² ||φ||),

so that, from inequality (6.5.2),

    |T''(g)| / [T'(g)]² ≤ c ||φ|| < ∞.

Thus the new transformation T satisfies all the conditions of Theorem 6.2.2,
and {P_T^n} is asymptotically stable, as is {P_S^n} by (6.5.14). ∎
Example 6.5.1. Consider the quadratic transformation S(x) = 4x(1 − x)
with x ∈ [0,1] and set

    φ(x) = 1 / (π √(x(1 − x))).    (6.5.5)


Using equations (6.5.3) and (6.5.5), it is easy to verify that all the conditions
of Theorem 6.5.1 are satisfied in this case and, thus, for the quadratic
transformation, {P^n} is asymptotically stable.
Note that, with φ as given by (6.5.5), the associated function g, as defined
by (6.5.3), is given by

    g(x) = (1/π) ∫_0^x du/√(u(1 − u)) = 1/2 − (1/π) sin⁻¹(1 − 2x),    (6.5.6)

and thus

    g⁻¹(x) = sin²(πx/2).    (6.5.7)

Hence, when S(x) = 4x(1 − x), the transformation T: [0,1] → [0,1], defined
by

    T(x) = g ∘ S ∘ g⁻¹(x),    (6.5.8)

is easily shown to be

    T(x) = { 2x,         for x ∈ [0, 1/2)
           { 2(1 − x),   for x ∈ [1/2, 1].    (6.5.9)

[The transformation defined by (6.5.9) is often referred to as the tent map
or hat map.] The Frobenius–Perron operator P_T corresponding to T is
given by

    P_T f(x) = (1/2) f(x/2) + (1/2) f(1 − x/2),

and, by Theorem 6.2.2, {P_T^n} is asymptotically stable. Furthermore, it is
clear that f_* ≡ 1 is the unique stationary density of P_T, so T is, in fact,
exact by Theorem 4.4.1. Reversing the foregoing procedure by constructing
a transformation S = g⁻¹ ∘ T ∘ g from T given by (6.5.9) and from g, g⁻¹
given by equations (6.5.6) and (6.5.7) yields the transformation S(x) =
4x(1 − x). From this, {P_S^n} is asymptotically stable, and φ, given by (6.5.5),
is the stationary density of P_S. □
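The conjugacy underlying (6.5.8) can be verified numerically. A sketch, not from the text:

```python
# Check the conjugacy g(S(x)) = T(g(x)) between the quadratic map
# S(x) = 4x(1-x) and the tent map T, with g from (6.5.6).

import math

def g(x):
    return 0.5 - math.asin(1 - 2 * x) / math.pi

def S(x):
    return 4 * x * (1 - x)

def T(x):
    return 2 * x if x < 0.5 else 2 * (1 - x)

for x in [0.05, 0.2, 0.37, 0.6, 0.93]:
    assert abs(g(S(x)) - T(g(x))) < 1e-12
print("g o S = T o g verified at sample points")
```

Since the identity holds exactly, any discrepancy above floating-point roundoff would signal an error in g, S, or T.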
These comments illustrate the construction of a statistically stable transformation S with a given stationary density from an exact transformation
T. Clearly, the use of a different exact transformation T₁ will yield a different statistically stable transformation S₁, but one that has the same
stationary density as S. Thus we are led to the next theorem.

Theorem 6.5.2. Let T: (0,1) → (0,1) be a measurable, nonsingular transformation and let φ ∈ D((a,b)), with a and b finite or not, be a given positive density, that is, φ > 0 a.e. Let a second transformation S: (a,b) → (a,b)
be given by S = g⁻¹ ∘ T ∘ g, where

    g(x) = ∫_a^x φ(y) dy,  a < x < b.    (6.5.10)


Then T is exact if and only if S is statistically stable and φ is the density
of the measure invariant with respect to S.
Proof. Let P_T and P_S be the Frobenius–Perron operators corresponding to
the transformations T and S, respectively. We start with the derivation of
the relation between P_T and P_S. By the definition of P_S, we have

    ∫_a^y P_S f(x) dx = ∫_{S⁻¹((a,y))} f(x) dx,  for f ∈ L¹((a,b)),

where S⁻¹((a,y)) = g⁻¹(T⁻¹((g(a), g(y)))). Set x = g⁻¹(z) and use equation
(6.5.10) to change the variables so that the last integral may be rewritten
to give

    ∫_a^y P_S f(x) dx = ∫_{T⁻¹((g(a),g(y)))} f(g⁻¹(z)) / φ(g⁻¹(z)) dz.

Defining

    P_g f(z) = f(g⁻¹(z)) / φ(g⁻¹(z)),  for f ∈ L¹((a,b)),    (6.5.11)

we have

    ∫_a^y P_S f(x) dx = ∫_{T⁻¹((g(a),g(y)))} P_g f(z) dz = ∫_{g(a)}^{g(y)} P_T P_g f(z) dz.    (6.5.12)

Setting

    P_g⁻¹ f̃(x) = f̃(g(x)) φ(x),  for f̃ ∈ L¹((0,1)),    (6.5.13)

and substituting z = g(x) in the last integral in (6.5.12) yields

    ∫_a^y P_S f(x) dx = ∫_a^y P_g⁻¹ P_T P_g f(x) dx.

Thus P_S and P_T are related by

    P_S f = P_g⁻¹ P_T P_g f,  for f ∈ L¹((a,b)).    (6.5.14)

By integrating equation (6.5.11) over the entire space, we have

    ∫_0^1 P_g f(z) dz = ∫_a^b f(x) dx,  for f ∈ L¹((a,b)).

Further, P_g⁻¹, as given by (6.5.13), is the inverse operator to P_g, and integration of (6.5.13) gives

    ∫_a^b P_g⁻¹ f̃(x) dx = ∫_0^1 f̃(z) dz,  for f̃ ∈ L¹((0,1)).    (6.5.15)


If T is measure preserving, we have P_T 1 = 1. Furthermore, from the
definition of P_g in (6.5.11), we have P_g φ = 1. As a consequence,

    P_S φ = P_g⁻¹ P_T P_g φ = P_g⁻¹ P_T 1 = P_g⁻¹ 1 = φ,

which shows that φ is the density of the measure invariant with respect
to S. Analogously, from P_S φ = φ, it follows that P_T 1 = 1. By using an
induction argument with equation (6.5.14), we obtain

    P_S^n f = P_g⁻¹ P_T^n P_g f,  for f ∈ L¹((a,b)).

This, in conjunction with (6.5.15) and the equality P_g φ = 1, gives

    ||P_S^n f − φ||_{L¹((a,b))} = ||P_g⁻¹ P_T^n P_g f − P_g⁻¹ P_g φ||_{L¹((a,b))}
                                = ||P_T^n P_g f − P_g φ||_{L¹((0,1))} = ||P_T^n P_g f − 1||_{L¹((0,1))}.    (6.5.16)

By substituting

    f = P_g⁻¹ f̃,  for f̃ ∈ L¹((0,1)),

into (6.5.16), we have

    ||P_S^n P_g⁻¹ f̃ − φ||_{L¹((a,b))} = ||P_T^n f̃ − 1||_{L¹((0,1))}.    (6.5.17)

Thus, from equations (6.5.16) and (6.5.17), it follows that the strong convergence of {P_S^n f} to φ for f ∈ D((a,b)) is equivalent to the strong convergence of {P_T^n f̃} to 1 for f̃ ∈ D((0,1)). ∎

Example 6.5.2. Let T be the hat transformation of (6.5.9) and pick φ(x) =
k exp(−kx) for 0 < x < ∞, which is the density distribution function for
the lifetime of an atom with disintegration constant k > 0. Then it is
straightforward to show that the transformation S = g⁻¹ ∘ T ∘ g is given
by

    S(x) = −(1/k) ln |1 − 2e^{−kx}|.

The Frobenius–Perron operator associated with S is given by

    P_S f(x) = [e^{−kx}/(1 + e^{−kx})] f((1/k) ln[2/(1 + e^{−kx})])
             + [e^{−kx}/(1 − e^{−kx})] f((1/k) ln[2/(1 − e^{−kx})]).

By Theorem 6.5.2, {P_S^n} is asymptotically stable with the stationary density φ(x) = k exp(−kx). □
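A numerical check of Example 6.5.2, not from the text; the disintegration constant k = 0.7 is an illustrative value. Here g(x) = 1 − exp(−kx) is the cumulative distribution of φ, and the conjugacy relation g ∘ S = T ∘ g characterizes S = g⁻¹ ∘ T ∘ g:

```python
# Verify g(S(x)) = T(g(x)) for S(x) = -(1/k)*ln|1 - 2*exp(-k*x)|,
# g(x) = 1 - exp(-k*x), and T the tent (hat) map of (6.5.9).

import math

K = 0.7

def g(x):
    return 1 - math.exp(-K * x)

def T(y):
    return 2 * y if y < 0.5 else 2 * (1 - y)

def S(x):
    return -math.log(abs(1 - 2 * math.exp(-K * x))) / K

for x in [0.1, 0.5, 1.3, 2.0, 4.0]:
    assert abs(g(S(x)) - T(g(x))) < 1e-12
print("S is conjugate to the tent map through the exponential CDF")
```

By Theorem 6.5.2 this conjugacy with an exact map is exactly what makes the exponential density stationary for S.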

Example 6.5.3. As a second example, consider the Chebyshev polynomials

    S_m: (−2, 2) → (−2, 2),  S_m(x) = 2 cos[m cos⁻¹(x/2)],  m = 0, 1, 2, ....


Define

    g(x) = (1/π) ∫_{−2}^x du/√(4 − u²),

corresponding to the density

    φ(x) = 1/(π √(4 − x²)).    (6.5.18)

The Chebyshev polynomials satisfy S_{m+1}(x) = xS_m(x) − S_{m−1}(x) with
S_0(x) = 2 and S_1(x) = x. It is straightforward, but tedious, to show that
the transformation T_m = g ∘ S_m ∘ g⁻¹ is given by

    T_m(x) = { m(x − 2n/m),         for x ∈ [2n/m, (2n+1)/m)
             { m((2n+2)/m − x),     for x ∈ [(2n+1)/m, (2n+2)/m),    (6.5.19)

where n = 0, 1, ..., [(m − 1)/2], and [y] denotes the integer part of y.
For m ≥ 2, by Theorem 6.2.2, {P_{T_m}^n} is asymptotically stable. An explicit
computation is easy and shows that f_* ≡ 1 is the stationary density of P_{T_m}.
Thus T_m is exact. Hence, by Theorem 6.5.2, the Chebyshev polynomials
S_m are statistically stable for m ≥ 2 with a stationary density given by
equation (6.5.18). This may also be proved more directly, as shown by Adler
and Rivlin [1964].
This example is of interest from several standpoints. First, it illustrates
in a concrete way the nonuniqueness of statistically stable transformations (S_m) with the same stationary density derived from different exact
transformations (T_m). Second, it should be noted that the transformation
S̃_m: (0,1) → (0,1), given by

    S̃_m(x) = −(1/4) S_m(4x − 2) + 1/2,

when m = 2, is just the familiar parabola, S̃_2(x) = 4x(1 − x). Finally, we
note in passing that cubic maps equivalent to S_3 have arisen in a study of
a simple genetic model involving one locus and two alleles [May, 1980] and
have also been studied in their own right by Rogers and Whitley [1983].
□
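Two of the identities quoted in Example 6.5.3 are easy to confirm numerically. A sketch, not from the text:

```python
# Check that S_2(x) = 2*cos(2*acos(x/2)) equals x**2 - 2 on (-2,2), and
# that the rescaled map -S_2(4x - 2)/4 + 1/2 is the parabola 4x(1 - x).

import math

def S2(x):
    return 2 * math.cos(2 * math.acos(x / 2))

for x in [-1.9, -0.7, 0.0, 0.4, 1.8]:
    assert abs(S2(x) - (x * x - 2)) < 1e-12      # Chebyshev identity for m = 2

for x in [0.1, 0.25, 0.5, 0.8]:
    rescaled = -S2(4 * x - 2) / 4 + 0.5
    assert abs(rescaled - 4 * x * (1 - x)) < 1e-12
print("Chebyshev identities verified")
```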

Example 6.5.4. As a further illustration of the power of Theorem 6.5.2, we
consider an example drawn from quantum mechanics. Consider a particle
of mass m free to move in the x direction and subjected to a restoring
force −kx. This is equivalent to the particle being placed in a potential
V(x) = kx²/2. The standard solution to this quantized harmonic oscillator
problem is [Schiff, 1955]

    u_n(x) = [α/(√π 2^n n!)]^{1/2} H_n(αx) e^{−(1/2)α²x²},  for n = 0, 1, ...,

where

    α⁴ = mk/ħ²

(ħ is Planck's constant) and H_n(y) denotes the nth-order Hermite polynomial, defined recursively by

    H_{n+1}(y) = 2y H_n(y) − 2n H_{n−1}(y)

[H_0(y) = 1, H_1(y) = 2y, H_2(y) = 4y² − 2, ...]. In accord with the usual
interpretation of quantum mechanics, the associated densities are given by
φ_n(x) = [u_n(x)]², or

    φ_n(x) = [α/(√π 2^n n!)] H_n²(αx) e^{−α²x²},  for n = 0, 1, ...,

and the g_n are

    g_n(x) = [α/(√π 2^n n!)] ∫_{−∞}^x H_n²(αy) e^{−α²y²} dy,  for n = 0, 1, ....

Then, for any exact transformation T, the transformations S_n(x) = g_n⁻¹ ∘
T ∘ g_n(x) have the requisite stationary densities φ_n. □
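For the ground state n = 0, the density φ_0 is a Gaussian, so g_0 is a normal cumulative distribution function and the construction of Example 6.5.4 can be carried out with the standard library. A sketch, not from the text; the tent map is used as the exact transformation T, and α = 1.3 is illustrative:

```python
# For n = 0, phi_0 is a normal density with sigma = 1/(alpha*sqrt(2)), so
# g_0 is a normal CDF and S_0 = g_0^{-1} o T o g_0 satisfies the conjugacy
# g_0(S_0(x)) = T(g_0(x)).  Uses statistics.NormalDist from the stdlib.

from statistics import NormalDist

ALPHA = 1.3
nd = NormalDist(mu=0.0, sigma=1 / (ALPHA * 2 ** 0.5))   # phi_0 as a normal density

def T(y):
    """Tent map on [0,1], an exact transformation."""
    return 2 * y if y < 0.5 else 2 * (1 - y)

def S0(x):
    """S_0 = g_0^{-1} o T o g_0, with g_0 the CDF of phi_0."""
    return nd.inv_cdf(T(nd.cdf(x)))

# The defining conjugacy relation holds at sample points:
for x in [-1.2, -0.4, 0.05, 0.7, 1.5]:
    assert abs(nd.cdf(S0(x)) - T(nd.cdf(x))) < 1e-7
print("S_0 has stationary density phi_0 by Theorem 6.5.2")
```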
To close this section, we note that the following result is a direct extension
of Theorem 6.5.2.

Corollary 6.5.1. Let S: (a,b) → (a,b), with a and b finite or not, be a
statistically stable transformation with a stationary density φ ∈ D((a,b)),
and let φ̄ ∈ D((α,β)) be given, with α and β also finite or not. Further, set

    g(x) = ∫_a^x φ(y) dy  and  ḡ(x) = ∫_α^x φ̄(y) dy.

Then the transformation S̄: (α,β) → (α,β), defined by

    S̄ = ḡ⁻¹ ∘ g ∘ S ∘ g⁻¹ ∘ ḡ,

is statistically stable with stationary density φ̄.

Proof. First set T: (0,1) → (0,1) equal to T = g ∘ S ∘ g⁻¹. This is equivalent
to S = g⁻¹ ∘ T ∘ g and, by Theorem 6.5.2, T is exact. Again using Theorem
6.5.2 with the exactness of T, we have that S̄ = ḡ⁻¹ ∘ T ∘ ḡ is statistically
stable. ∎
Remark 6.5.1. Nonlinear transformations with a specified stationary density can be used as pseudorandom number generators. For details see Li
and Yorke [1978]. □


6.6 Transformations of the Real Line


All of the transformations considered in previous sections were defined on
the interval [0,1]. The particular choice of the interval [0,1] is not restrictive
since, given S: [a,b] → [a,b], we can always consider T(x) = Q⁻¹(S(Q(x))),
T: [0,1] → [0,1], where Q(x) = a + (b − a)x. All of the asymptotic properties
of S are the same as those of T.
However, if S maps the whole real line (or half-line) into itself, no linear
change of variables is available to reduce this problem to an equivalent
transformation on a finite interval. Further, transformations on the real line
may have some anomalous properties. For example, the requirement that
|S'(x)| ≥ λ > 1 for S: R → R is not sufficient for the asymptotic stability of
{P^n}. This is amply illustrated by the specific example S(x) = 2x, which
was considered in Section 1.3.
There are, however, transformations on the real line for which the asymptotic stability of {P^n} can be demonstrated; one example is S(x) =
β tan(γx + δ), |βγ| > 1. This section will treat a class of such transformations.
Assume the transformation S: R → R satisfies the following conditions:

(6i) There is a partition ··· < a_{−2} < a_{−1} < a_0 < a_1 < a_2 < ··· of the real line
such that, for every integer i = 0, ±1, ±2, ..., the restriction S_(i) of S
to the interval (a_{i−1}, a_i) is a C² function;

(6ii) S((a_{i−1}, a_i)) = R;

(6iii) There is a constant λ > 1 such that |S'(x)| ≥ λ for x ≠ a_i,
i = 0, ±1, ±2, ...;

(6iv) There is a constant L ≥ 0 and a function q ∈ L¹(R) such that

    σ_i(x) ≤ q(x)(a_i − a_{i−1}) ≤ L q(x),  x ∈ R,    (6.6.1)

where g_i = S_(i)⁻¹ and σ_i = |g_i'|, for i = 0, ±1, ±2, ...; and

(6v) There is a real constant c such that

    |S''(x)|/[S'(x)]² ≤ c,  for x ≠ a_i, i = 0, ±1, ±2, ....    (6.6.2)

Then the following theorem summarizes results of Kemperman [1975],
Schweiger [1978], Jablonski and Lasota [1981], and Bugiel [1982].

Theorem 6.6.1. If S: R → R satisfies conditions (6i)–(6v) and P is the
associated Frobenius–Perron operator, then {P^n} is asymptotically stable.
Proof. We first calculate the Frobenius–Perron operator. To do this, note
that

    S⁻¹((−∞, x)) = ⋃_j (a_{j−1}, g_j(x)) ∪ ⋃_k (g_k(x), a_k),

where the first union is over intervals on which g_i is an increasing function
of x, and the second is over intervals on which g_i is decreasing. Thus

    Pf(x) = d/dx ∫_{S⁻¹((−∞,x))} f(u) du
          = d/dx Σ_j ∫_{a_{j−1}}^{g_j(x)} f(u) du + d/dx Σ_k ∫_{g_k(x)}^{a_k} f(u) du

or

    Pf(x) = Σ_{i=−∞}^∞ σ_i(x) f(g_i(x)),    (6.6.3)

where σ_i(x) = |g_i'(x)|.


Having an expression for Pf(x) we now calculate the variation of Pf
to show that the sequence In = pn f satisfies assumptions (5.8.2) and
(5.8.3) of Proposition 5.8.1. Denote by Do C D(R) the set of all densities
of bounded variation on R that are positive, continuously differentiable,
and satisfy
for x E R,
(6.6.4)
1/'(x)l ~ kJI(x),
where the constant kJ depends on
00

VP/(x) ~
-oo

00

f. Now

00

VCTi(x)f(g,(x))

i=-oo -oo
00
{ 1
~ i~oo
:X

y
00

f(g,(x))

100 ICT;(x)lf(g,(x)) dx }.
-oo

Using ICTHx)l ~ CCTi(x), which follows from inequality (6.6.2), and making
the change of variables y = g,(x), we have

Pf(x)

{.

~ J~ ~ ~,'(y) + c [,'(y) dy}


1

~ :X

v f(y) +c.
00

-oo

By an induction argument, we obtain


oo

oo

V pnf(x) ~An V f(y) +A~ 1"


-oo

-oo


Since λ > 1, then, for real a > λc/(λ − 1), there must exist a sufficiently
large n, say n > n_0(f), such that

    ⋁_{−∞}^∞ P^n f(x) ≤ a.    (6.6.5)

Now we are in a position to evaluate P^n f. From inequalities (6.6.1) and
(6.6.3), we have

    Pf(x) ≤ q(x) Σ_{i=−∞}^∞ f(g_i(x)) (a_i − a_{i−1}).    (6.6.6)

For every interval (a_{i−1}, a_i) pick a z_i ∈ (a_{i−1}, a_i) such that

    f(z_i)(a_i − a_{i−1}) ≤ ∫_{a_{i−1}}^{a_i} f(x) dx,  for i = 0, ±1, ....

Thus, from (6.6.1) and (6.6.6), we obtain

    Pf(x) ≤ q(x) Σ_{i=−∞}^∞ { L |f(g_i(x)) − f(z_i)| + ∫_{a_{i−1}}^{a_i} f(x) dx }
          ≤ L q(x) ⋁_{−∞}^∞ f + q(x) ∫_{−∞}^∞ f(x) dx
          = q(x) { L ⋁_{−∞}^∞ f + 1 }.

By substituting P^{n−1} f instead of f in this expression and using (6.6.5), we
have

    P^n f(x) ≤ q(x)(aL + 1),  x ∈ R, n > n_0(f) + 1.    (6.6.7)

Thus the sequence of functions f_n = P^n f satisfies condition (5.8.2) of
Proposition 5.8.1.
Now, differentiating equation (6.6.3) and using |σ_i'| ≤ c σ_i, σ_i ≤ 1/λ, and
|f'| ≤ k_f f gives

    |(Pf)'| ≤ (c + k_f/λ) Pf

and, by induction,

    |(P^n f)'| / P^n f ≤ cλ/(λ − 1) + k_f/λ^n.

Pick a constant K > λc/(λ − 1); then, since λ > 1, for n sufficiently large
(n > n_1(f)) we have

    |(P^n f)'| ≤ K P^n f.    (6.6.8)

Thus the iterates f_n = P^n f satisfy condition (5.8.3) of Proposition 5.8.1.
Therefore, by Proposition 5.8.1, P^n f has a nontrivial lower-bound function,
and thus, by Theorem 5.6.2, {P^n} is asymptotically stable. ∎

Remark 6.6.1. Observe that, in the special case where S is periodic (in x)
with period L = a_i − a_{i−1}, condition (6iv) is automatically satisfied. In fact,
in this case σ_i(x) = σ_0(x) so, by setting q = |g_0'|/L, we obtain inequality
(6.6.1) and, moreover,

    ||q|| = (1/L) ∫_{−∞}^∞ |g_0'(x)| dx = (1/L) |[g_0(x)]_{−∞}^∞| = L/L = 1,

showing that q ∈ L¹(R). The remaining conditions simply generalize the properties of the transformation S(x) = β tan(γx + δ) with |βγ| > 1. □

Example 6.6.1. It is easy to show that the Frobenius–Perron operator P
associated with S(x) = β tan(γx + δ), |βγ| > 1, is asymptotically stable.
We have

    S'(x) = βγ / cos²(γx + δ),

hence |S'(x)| ≥ |βγ|. Further,

    S''(x)/[S'(x)]² = (1/β) sin[2(γx + δ)],

so that

    |S''(x)|/[S'(x)]² ≤ 1/|β|.
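The derivative bounds of Example 6.6.1 can be checked numerically. A sketch, not from the text; the parameter values β = 1.5, γ = 0.9, δ = 0.2 (so |βγ| = 1.35 > 1) are illustrative:

```python
# Check |S'(x)| >= |beta*gamma| and |S''(x)|/S'(x)**2 = |sin(2u)|/|beta|
# <= 1/|beta| for S(x) = beta*tan(gamma*x + delta), u = gamma*x + delta.

import math

BETA, GAMMA, DELTA = 1.5, 0.9, 0.2

def Sp(x):    # S'(x) = beta*gamma/cos^2(u)
    return BETA * GAMMA / math.cos(GAMMA * x + DELTA) ** 2

def Spp(x):   # S''(x) = 2*beta*gamma^2*tan(u)/cos^2(u)
    u = GAMMA * x + DELTA
    return 2 * BETA * GAMMA ** 2 * math.tan(u) / math.cos(u) ** 2

for x in [-1.0, -0.3, 0.0, 0.4, 1.1]:
    ratio = abs(Spp(x)) / Sp(x) ** 2
    assert abs(ratio - abs(math.sin(2 * (GAMMA * x + DELTA))) / BETA) < 1e-12
    assert ratio <= 1 / BETA + 1e-12
    assert abs(Sp(x)) >= BETA * GAMMA
print("derivative bounds verified")
```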

6.7

Manifolds

The last goal of this chapter is to show how the techniques described in
Chapter 5 may be used to study the behavior of transformations in higher-dimensional spaces. The simplest, and probably most striking, use of the
Frobenius–Perron operator in d-dimensional spaces is for expanding mappings on manifolds. To illustrate this, the results of Krzyżewski and Szlenk
[1969], which may be considered as a generalization of the results of Rényi
presented in Section 6.2, are developed in detail in Section 6.8. However,
in this section we preface these results by presenting some basic concepts
from the theory of manifolds, which will be helpful for understanding the


geometrical ideas related to the Krzyżewski–Szlenk results. This elementary description of manifolds is by no means an exhaustive treatment of
differential geometry.
First consider the paraboloid z = x² + y². This paraboloid is embedded
in three-dimensional space, even though it is a two-dimensional object. If
the paraboloid is the state space of a system, then, to study this system,
each point on the paraboloid must be described by precisely two numbers.
Thus, any point m on the paraboloid with coordinates (x, y, x² + y²) is
simply described by its x, y-coordinates. This two-dimensional system of
coordinates may be described in a more abstract way as follows. Denote by
M the graph of the paraboloid, that is,

    M = {(x, y, z) : z = x² + y²},

and, as a consequence, there is a one-to-one transformation φ: M → R²
described by φ(x, y, z) = (x, y) for (x, y, z) ∈ M. Of course, other coordinate systems on M are possible, that is, another one-to-one mapping,
φ*: M → R², but φ is probably the simplest one.
Now let M be the unit sphere,

M = {(x, y, z): x² + y² + z² = 1}.

In this example it is impossible to find a single smooth invertible function φ: M → R². However, six functions φᵢ: M → R² may be defined as follows:

φ₁(x, y, z) = (x, y), for z > 0;
φ₂(x, y, z) = (x, y), for z < 0;
φ₃(x, y, z) = (x, z), for y > 0;
φ₄(x, y, z) = (x, z), for y < 0;
φ₅(x, y, z) = (y, z), for x > 0;
φ₆(x, y, z) = (y, z), for x < 0.

Each of these functions φᵢ maps a hemisphere of M onto an open unit disk. This coordinate system has the property that for any m ∈ M there is an open hemisphere that contains m, and on each of these hemispheres one φᵢ is defined.
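A small numerical sanity check of this atlas (a sketch; the random sampling scheme is our own): every point of the sphere with a nonzero coordinate lies in at least one of the six hemispheres, and the corresponding chart sends it into the open unit disk.

```python
import math
import random

# The six charts phi_i from the text: each is a projection restricted to an
# open hemisphere, stored as a pair (projection, domain test).
charts = [
    (lambda p: (p[0], p[1]), lambda p: p[2] > 0),
    (lambda p: (p[0], p[1]), lambda p: p[2] < 0),
    (lambda p: (p[0], p[2]), lambda p: p[1] > 0),
    (lambda p: (p[0], p[2]), lambda p: p[1] < 0),
    (lambda p: (p[1], p[2]), lambda p: p[0] > 0),
    (lambda p: (p[1], p[2]), lambda p: p[0] < 0),
]

random.seed(0)
for _ in range(1000):
    # a random point on the unit sphere (normalized Gaussian vector)
    v = [random.gauss(0, 1) for _ in range(3)]
    n = math.sqrt(sum(c * c for c in v))
    p = (v[0] / n, v[1] / n, v[2] / n)
    images = [proj(p) for proj, dom in charts if dom(p)]
    assert images            # p lies in some hemisphere
    for u, w in images:
        assert u * u + w * w < 1   # chart image lies in the open unit disk
```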
In the same spirit, we give a general definition of a smooth manifold.

Definition 6.7.1. A smooth d-dimensional manifold consists of a topological Hausdorff space M and a system {φᵢ} of local coordinates satisfying the following properties:

(a) Each function φᵢ is defined and continuous on an open subset Wᵢ ⊂ M and maps it onto an open subset Uᵢ = φᵢ(Wᵢ) of R^d. The inverse functions φᵢ⁻¹ exist and are continuous (i.e., φᵢ is a homeomorphism of Wᵢ onto Uᵢ);

(b) For each m ∈ M there is a Wᵢ such that m ∈ Wᵢ, that is, M = ⋃ᵢ Wᵢ;


(c) If the intersection Wᵢ ∩ Wⱼ is nonempty, then the mapping φᵢ ∘ φⱼ⁻¹, which is defined on φⱼ(Wᵢ ∩ Wⱼ) ⊂ R^d and has values in R^d, is a C^∞ mapping.

(Note that a topological space is called a Hausdorff space if every two distinct points have nonintersecting neighborhoods.)
Any map φᵢ gives a coordinate system of a part of M, namely, Wᵢ. A local coordinate of a point m ∈ Wᵢ is φᵢ(m). Having a coordinate system, we may now define what we mean by a C^k function on M. We say that f: M → R is of class C^k if for each φᵢ: Wᵢ → Uᵢ the composed mapping f ∘ φᵢ⁻¹ is of class C^k on Uᵢ.
Next consider the gradient of a function defined on the manifold. For f: R^d → R the gradient of f at a point x ∈ R^d is simply the vector (sequence of real numbers)

grad f(x) = (∂f(x)/∂x₁, ..., ∂f(x)/∂x_d).

For f: M → R of class C¹, the gradient of f at a point m ∈ M can be calculated in local coordinates as follows:

grad f(m) = (D_{x₁(m)} f, ..., D_{x_d(m)} f),    (6.7.1a)

where

D_{xᵢ(m)} f = ∂/∂xᵢ [f(φ⁻¹(x))] evaluated at x = φ(m).    (6.7.1b)

Thus the gradient is again a sequence of real numbers that depends on the choice of the local coordinates.
The most important notion from the theory of manifolds is that of tangent vectors and tangent spaces. A continuous mapping γ: [a, b] → M represents an arc on M with the end points γ(a) and γ(b). We say that γ starts from m = γ(a). The arc γ is C^k if, for any coordinate system φ, the composed function φ ∘ γ is of class C^k. The tangent vector to γ at a point m = γ(a) in a coordinate system φ is defined by

d/dt [φ(γ(t))] at t = a = (ξ¹, ..., ξ^d),    (6.7.2)

where, again, the numbers ξ¹, ..., ξ^d depend on the choice of the coordinate system φ. Of course, γ must be at least of class C¹. Two arcs γ₁ and γ₂ starting from m are called equivalent if they produce the same coordinates, that is,

d/dt [φ(γ₁(t))] at t = a₁ = d/dt [φ(γ₂(t))] at t = a₂,    (6.7.3)

where γ₁(a₁) = γ₂(a₂) = m. Observe that, if (6.7.3) holds in a given system of coordinates φ, then it holds in any other coordinate system. The class of


all equivalent arcs produces the same sequence (ξ¹, ..., ξ^d) for any given system of coordinates. Such a class represents the tangent vector. Tangent vectors are denoted by the Greek letters ξ and η.

Assume that a tangent vector in a coordinate system φ has components ξ¹, ..., ξ^d. What are the components in another coordinate system ψ? Now,

d/dt [ψ(γ(t))] = d/dt [H(φ(γ(t)))],

where H = ψ ∘ φ⁻¹ and, therefore, setting d(ψ ∘ γ)/dt = (η¹, ..., η^d),

η^i = Σ_{j=1}^d (∂Hᵢ/∂xⱼ) ξ^j.    (6.7.4)

Equation (6.7.4) shows the transformation of the tangent vector coordinates under the change of coordinate system. Thus from an abstract (tensor analysis) point of view the tangent vector at a point m is nothing but a sequence of numbers in each coordinate system given in such a way that these numbers satisfy condition (6.7.4) when we pass from one coordinate system to another. From this description it is clear that the tangent vectors at m form a linear space, the tangent space, which we denote by T_m.
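The transformation rule (6.7.4) can be checked numerically on a concrete pair of coordinate systems, say Cartesian and polar coordinates on a patch of R². This is a sketch: the coordinate systems, the arc, and the finite-difference scheme are our own illustrative choices.

```python
import math

# H = psi o phi^{-1}: phi is the identity (Cartesian coordinates) and psi is
# polar coordinates (r, theta) on a patch away from the origin.
def H(x):
    return (math.hypot(x[0], x[1]), math.atan2(x[1], x[0]))

def gamma(t):                  # an arc written in phi-coordinates
    return (1.0 + t, 0.5 + 2.0 * t)

h = 1e-6
# xi: tangent components in phi-coordinates, as in (6.7.2)
xi = [(gamma(h)[i] - gamma(-h)[i]) / (2 * h) for i in range(2)]
# eta computed directly by differentiating psi(gamma(t))
eta_direct = [(H(gamma(h))[i] - H(gamma(-h))[i]) / (2 * h) for i in range(2)]
# eta via the Jacobian matrix (dH_i/dx_j) of H, as in (6.7.4)
x0 = gamma(0.0)
J = [[(H((x0[0] + (j == 0) * h, x0[1] + (j == 1) * h))[i]
      - H((x0[0] - (j == 0) * h, x0[1] - (j == 1) * h))[i]) / (2 * h)
     for j in range(2)] for i in range(2)]
eta_chain = [sum(J[i][j] * xi[j] for j in range(2)) for i in range(2)]
assert all(abs(a - b) < 1e-5 for a, b in zip(eta_direct, eta_chain))
```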
Now consider a transformation F from a d-dimensional manifold M into a d-dimensional manifold N, F: M → N. The transformation F is said to be of class C^k if, for any two coordinate systems φ on M and ψ on N, the composed function ψ ∘ F ∘ φ⁻¹ is of class C^k, or its domain is empty. Let ξ be a tangent vector at m, represented by a C¹ arc γ: [a, b] → M starting from m. Then F ∘ γ is an arc starting from F(m), and it is of class C¹ if F is of class C¹. The tangent vector to F ∘ γ in a coordinate system ψ is given by

d/dt [ψ ∘ F ∘ γ] at t = a = (η¹, ..., η^d).

Setting u = ψ ∘ F ∘ φ⁻¹, where φ is a coordinate system on M,

η^i = Σ_{j=1}^d (∂uᵢ/∂xⱼ) ξ^j    (6.7.5)

results. Equation (6.7.5) gives the linear transformation of a tangent vector ξ at m to a tangent vector η at F(m) without explicit reference to the arc γ. This transformation is called the differential of F at a point m and is denoted by dF(m), and thus symbolically

η = dF(m)ξ.

Note that the differential of F is represented in any two coordinate systems, φ on M and ψ on N, by the matrix

(∂uᵢ/∂xⱼ),  i, j = 1, ..., d.


The same matrix appears in the formula for the gradient of the composed function: If F: M → N and f: N → R are C¹ functions, then the differentiation of (f ∘ F) ∘ φ⁻¹ = (f ∘ ψ⁻¹) ∘ (ψ ∘ F ∘ φ⁻¹) gives

grad(f ∘ F)(m) = (D_{x₁(m)}(f ∘ F), ..., D_{x_d(m)}(f ∘ F)),

where

D_{xᵢ(m)}(f ∘ F) = Σ_{j=1}^d ∂/∂xⱼ [f(ψ⁻¹(x))] at x = ψ(F(m)) · ∂uⱼ/∂xᵢ.

This last formula may be written more compactly as

grad(f ∘ F)(m) = (grad f)(dF(m)).

Observe that now dF(m) appears on the right-hand side of the vector.

Finally observe the relationship between tangent vectors and gradients. Let f: M → R be of class C¹ and let γ: [a, b] → M start from m. Consider the composed function f ∘ γ: [a, b] → R that is also of class C¹. Using the local system of coordinates,

f ∘ γ = (f ∘ φ⁻¹) ∘ (φ ∘ γ),

and, consequently,

d/dt [f(γ(t))] at t = a = Σᵢ (D_{xᵢ(m)} f) ξ^i.    (6.7.6)

Observe that the numbers D_{xᵢ} f and ξ^i depend on φ even though the left-hand side of (6.7.6) does not. Equation (6.7.6) may be more compactly written as

d/dt [f(γ(t))] at t = a = (grad f)(ξ).    (6.7.7)
In order to construct a calculus on manifolds, concepts such as the length of a tangent vector, the norm of a gradient, and the area of Borel subsets of M are necessary. The most effective way of introducing these is via the Riemannian metric. Generally speaking, the Riemannian metric is a scalar product on T_m. This means that, to any two vectors ξ₁, ξ₂ ∈ T_m, there corresponds a real number denoted by (ξ₁, ξ₂). However, the coordinates

(ξ₁¹, ..., ξ₁^d), (ξ₂¹, ..., ξ₂^d)

depend on the coordinate system φ. Thus the rule that allows (ξ₁, ξ₂) to be calculated given (ξ₁^i), (ξ₂^i) must also depend on φ. These facts are summarized in the following definition.

Definition 6.7.2. A Riemannian metric on the manifold M is a system of functions

g^φ_{ij}(m): M → R,  i, j = 1, ..., d,

such that


(a) For any choice of local coordinates φ: W → U the functions g^φ_{ij}(φ⁻¹(x)) are defined and C^∞ for x ∈ U.

(b) For each m ∈ M the quadratic form

Σ_{i,j=1}^d g^φ_{ij} ξ^i ξ^j

is symmetric and positive definite (i.e., g^φ_{ij} = g^φ_{ji}, and the value of this sum is positive except if all ξ^i = 0).

(c) For every ξ₁, ξ₂ ∈ T_m the scalar product

(ξ₁, ξ₂) = Σ_{i,j=1}^d g^φ_{ij}(m) ξ₁^i ξ₂^j    (6.7.8)

does not depend on φ.


The last condition (6.7.8) looks somewhat mysterious, but it simply means that

Σ_{k,l} g^ψ_{kl}(m) η₁^k η₂^l = Σ_{i,j} g^φ_{ij}(m) ξ₁^i ξ₂^j,

where η₁, η₂ are calculated from ξ₁, ξ₂ by equation (6.7.4). Thus

Σ_{k,l} g^ψ_{kl}(m) [Σᵢ (∂H_k/∂xᵢ) ξ₁^i] [Σⱼ (∂H_l/∂xⱼ) ξ₂^j] = Σ_{i,j} g^φ_{ij}(m) ξ₁^i ξ₂^j,

which implies that

g^φ_{ij} = Σ_{k,l} g^ψ_{kl} (∂H_k/∂xᵢ)(∂H_l/∂xⱼ).    (6.7.9)

Now having introduced the scalar product, the norm of ξ ∈ T_m is defined by ‖ξ‖ = (ξ, ξ)^{1/2}. If a C¹ arc γ: [a, b] → M is given, it defines, at each point m = γ(t₀), the tangent vector γ'(t₀). Thus the length of an arc γ is just

l(γ) = ∫_a^b ‖γ'(t)‖ dt.    (6.7.10)

This equation may be used for any arc γ that is continuous and piecewise C¹. If a manifold M is such that any two points m₀, m₁ ∈ M can be joined by a continuous piecewise C¹ arc, it is said to be connected. On connected manifolds the distance between points is given by

ρ(m₀, m₁) = inf l(γ),


where the inf is taken over all possible arcs joining m₀ and m₁. With this distance, M becomes a metric space.

From equation (6.7.7) it is easy to define the length of grad f at a point m. It is, by definition,

|grad f(m)| = sup |d/dt [f(γ(t))] at t = a|,

where the sup is taken over all possible arcs γ: [a, b] → M with γ(a) = m and ‖γ'(a)‖ = 1. From this definition, it follows that, for an arbitrary C¹ arc γ and C¹ function f,

|d/dt [f(γ(t))]| ≤ |grad f(γ(t))| ‖γ'(t)‖.    (6.7.11)

Analogously, for a C¹ mapping F: M → N we introduce the norm of the differential dF(m) by

|dF(m)| = sup ‖dF(m)ξ‖,

where the supremum is taken over all ξ ∈ T_m such that ‖ξ‖ = 1. Using this notion it can be verified that

‖dF(m)ξ‖ ≤ |dF(m)| ‖ξ‖,  m ∈ M, ξ ∈ T_m,

and

|grad(f ∘ F)(m)| ≤ |(grad f)(F(m))| |dF(m)|,  m ∈ M,

where f: N → R is a C¹ function. Differentiation on manifolds satisfies some other properties analogous to those on R^d. We have, for example,

|grad(fg)| ≤ |f| |grad g| + |g| |grad f|

and

|grad Σᵢ fᵢ| ≤ Σᵢ |grad fᵢ|,

where f, g, and fᵢ are C¹ functions.


To introduce the measure on M associated with the Riemannian metric, it is first necessary to define the unit volume function V_φ(m). Consider the Riemannian form g^φ_{ij}(m) corresponding to a coordinate system φ: W → U. We can find d normalized vectors

ξ_k = (ξ_k¹, ..., ξ_k^d),  k = 1, ..., d,    (6.7.12)

orthogonal with respect to this form, that is,

(ξ_k, ξ_l) = Σ_{i,j=1}^d g^φ_{ij}(m) ξ_k^i ξ_l^j = δ_{kl},

where δ_{kl} = 1 if k = l, and δ_{kl} = 0 for k ≠ l, is the Kronecker delta. Write

V_φ(m) = |det (ξ_k^i)|,

the determinant of the d × d matrix whose kth row is (ξ_k¹, ..., ξ_k^d). This same procedure can be carried out algebraically by setting V_φ(m) = |det(g^φ_{ij}(m))|^{-1/2}.

The function V_φ(m) has a simple heuristic interpretation. The vectors ξ₁, ..., ξ_d, which correspond to the components (6.7.12), are orthogonal and normalized. Thus the volume spanned by them should be equal to 1. The volume spanned by the representation of ξ₁, ..., ξ_d in a local coordinate system is V_φ(m). Thus V_φ(m) gives the volume in local coordinates corresponding to a unit volume in M. By using this interpretation we define the measure μ of a Borel set B ⊂ W as

μ(B) = ∫_{φ(B)} dx / V_φ(φ⁻¹(x)).    (6.7.13)

The idea leading to this definition is obvious, as the elementary volume dx = dx₁ ⋯ dx_d in R^d corresponds to the volume V_φ(φ⁻¹(x)) dx₁ ⋯ dx_d in M. Thus, in order to reproduce the "original" volume in M, we must divide dx by V_φ. It can be shown that μ(B) defined by (6.7.13) does not depend on the choice of φ, which is quite obvious from the heuristic interpretation of V_φ(m).
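As a concrete illustration of (6.7.13), consider the hemisphere chart from the sphere example of this section. The computation below is a sketch (the chart, the closed-form determinant, and the quadrature scheme are our own choices, not from the text): for φ(x, y, z) = (x, y) on the upper hemisphere one finds det(g_{ij}) = 1/(1 − x² − y²), so V_φ = √(1 − x² − y²), and integrating dx/V_φ over the unit disk should reproduce the hemisphere area 2π.

```python
import math

# mu(hemisphere) = integral over the unit disk of dx dy / sqrt(1 - x^2 - y^2);
# in polar coordinates this is int_0^{2pi} int_0^1 r / sqrt(1 - r^2) dr dtheta.
N = 200_000
mu = 0.0
for k in range(N):
    r = (k + 0.5) / N            # midpoint rule in the radial variable
    mu += 2 * math.pi * r / math.sqrt(1 - r * r) / N

# the integrand is singular (but integrable) at r = 1, so expect only a few
# digits of accuracy from this crude rule
assert abs(mu - 2 * math.pi) < 0.05
```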
Analogous considerations lead to the definition of the determinant of the differential of a C¹ transformation F from a d-dimensional manifold M into a d-dimensional manifold N. Take a point m ∈ M and define

|det dF(m)| = |du/dx| V_φ(m) / V_ψ(F(m)),

where |du/dx| denotes the absolute value of the determinant of the d × d matrix

(∂uᵢ/∂xⱼ),  i, j = 1, ..., d.

It can be shown that this definition does not depend on the choice of coordinate systems φ and ψ in M and N, respectively. Note also that the determinant per se is not defined, but only its absolute value. This is because our manifolds M, N are not assumed to be oriented.


The following calculation will justify our definition of |det dF(m)|. Let B be a small set on M, and F(B) its image on N. What is the ratio μ(F(B))/μ(B)? From equation (6.7.13),

μ(F(B))/μ(B) = [∫_{ψ(F(B))} dy / V_ψ(ψ⁻¹(y))] / [∫_{φ(B)} dx / V_φ(φ⁻¹(x))].

Setting u = ψ ∘ F ∘ φ⁻¹ and substituting y = u(x),

μ(F(B))/μ(B) = [∫_{φ(B)} |du/dx| dx / V_ψ(F(φ⁻¹(x)))] / [∫_{φ(B)} dx / V_φ(φ⁻¹(x))]

results. Thus, for small B containing a point m, we have approximately

μ(F(B))/μ(B) ≈ |du/dx| V_φ(m) / V_ψ(F(m)) = |det(dF(m))|.    (6.7.14)

6.8 Expanding Mappings on Manifolds


With the background material of the preceding section, we now turn to an
examination of the asymptotic behavior of expanding mappings on manifolds.
We assume that M is a finite-dimensional compact connected smooth
(C 00 ) manifold with a Riemannian metric. As we have seen in Section 6.7,
this metric induces the natural (Borel) measure 1J and distance p on M.
We use 1/'(m)l to denote the length of the gradient off at point mE M.
Before starting and proving our main result, we give a sufficient condition
for the existence of a lower-bound function in the same spirit as contained
in Propositions 5.8.1 and 5.8.2. We use the notation of Section 5.8.
Proposition 6.8.1. Let P: L 1 (M) --+ L 1 (M) be a Markov operator and if
we assume that there is a set Do, dense in D, so that for every f E Do the
trajectory
pnf=f,,
for n ?: no(/),
(6.8.1)

is such that the functions In are C 1 and satisfy


1/~(m)l $ kfn(m),

formE M,

(6.8.2)

where k ?: 0 is a constant independent of I, then there exists e > 0 such


that h =elM is a lower-bound function for P.
Proof. The proof of this proposition proceeds much as for Proposition
5.8.2. As before, 11/nll = 1. Set

e = [1/2J.t(M)]e-kr,


where

r = sup_{m₀,m₁ ∈ M} ρ(m₀, m₁).

Let γ(t), a ≤ t ≤ b, be a piecewise smooth arc joining points m₀ = γ(a) and m₁ = γ(b). Differentiation of f_n ∘ γ gives [see inequality (6.7.11)]

|d[f_n(γ(t))]/dt| ≤ |f_n'(γ(t))| ‖γ'(t)‖ ≤ k ‖γ'(t)‖ f_n(γ(t)),

so that

f_n(m₁) ≤ f_n(m₀) exp{ k ∫_a^b ‖γ'(s)‖ ds }.

Since γ was an arbitrary arc, this gives

f_n(m₁) ≤ f_n(m₀) e^{kρ(m₀,m₁)} ≤ f_n(m₀) e^{kr}.

Now suppose that h = ε 1_M is not a lower-bound function for P. This means that there must be some n' > n₀ and m₀ ∈ M such that f_{n'}(m₀) < ε. Therefore,

f_{n'}(m₁) < ε e^{kr} = 1/[2μ(M)],  for m₁ ∈ M,

which contradicts ‖f_n‖ = 1 for all n > n₀(f). Thus we must have f_n ≥ h = ε 1_M for n > n₀.
Next we turn to a definition of an expanding mapping on a manifold.

Definition 6.8.1. Let M be a finite-dimensional compact connected smooth (C^∞) manifold with Riemannian metric and let μ be the corresponding Borel measure. A C¹ mapping S: M → M is called expanding if there exists a constant λ > 1 such that the differential dS(m) satisfies

‖dS(m)ξ‖ ≥ λ ‖ξ‖    (6.8.3)

at each m ∈ M for each tangent vector ξ ∈ T_m.

With this definition, Krzyzewski and Szlenk [1969] and Krzyzewski [1977] demonstrate the existence of a unique absolutely continuous normalized measure invariant under S and establish many of its properties. Most of these results are contained in the next theorem.

Theorem 6.8.1. Let S: M → M be an expanding mapping of class C², and P the Frobenius-Perron operator corresponding to S. Then {P^n} is asymptotically stable.

Proof. From equation (6.7.5) with F = S, since S is expanding, η ≠ 0 for any ξ ≠ 0, and, thus, the matrix (∂uᵢ/∂xⱼ) must be nonsingular for every m ∈ M.


In local coordinates the transformation S has the form

x → φ(S(φ⁻¹(x))) = u(x)

and consequently is locally invertible. Therefore, for any point m ∈ M the counterimage S⁻¹(m) consists of isolated points, and, since M is compact, the number of these points is finite. Denote the counterimages of m by m₁, ..., m_k. Because S is locally invertible there exists a neighborhood W of m and neighborhoods Wᵢ of mᵢ such that S restricted to Wᵢ is a one-to-one mapping from Wᵢ onto W. Denote the inverse mapping of S on Wᵢ by gᵢ. We have S ∘ gᵢ = 1_W, where 1_W is the identity mapping on W, and, consequently, (dS) ∘ (dgᵢ) is the identity mapping on the tangent vector space. From this, in conjunction with (6.8.3), it immediately follows that

‖(dgᵢ)ξ‖ ≤ (1/λ) ‖ξ‖.    (6.8.4)

Now take a set B ⊂ W, so

S⁻¹(B) = ⋃_{i=1}^k gᵢ(B),

and, by the definition of the Frobenius-Perron operator,

∫_B Pf(m) μ(dm) = ∫_{S⁻¹(B)} f(m) μ(dm) = Σ_{i=1}^k ∫_{gᵢ(B)} f(m) μ(dm).

This may be rewritten as

[1/μ(B)] ∫_B Pf(m) μ(dm) = Σ_{i=1}^k [μ(gᵢ(B))/μ(B)] · [1/μ(gᵢ(B))] ∫_{gᵢ(B)} f(m) μ(dm).

If B shrinks to m, then gᵢ(B) shrinks to gᵢ(m),

[1/μ(B)] ∫_B Pf(m) μ(dm) → Pf(m)  a.e.

and

[1/μ(gᵢ(B))] ∫_{gᵢ(B)} f(m) μ(dm) → f(gᵢ(m))  a.e.,  i = 1, ..., k.

Moreover, by (6.7.14),

μ(gᵢ(B))/μ(B) → |det(dgᵢ(m))|.

Thus, by combining all the preceding expressions, we have

Pf(m) = Σ_{i=1}^k |det(dgᵢ(m))| f(gᵢ(m)),    (6.8.5)

which is quite similar to the result in equation (6.2.10).
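Formula (6.8.5) is easy to exercise numerically for a simple expanding map. A sketch for the circle map S(x) = 2x (mod 1), whose two inverse branches are g₀(x) = x/2 and g₁(x) = (x + 1)/2 with |det dgᵢ| = 1/2 (both the map and the initial density below are illustrative choices, not from the text):

```python
import math

# Frobenius-Perron operator for S(x) = 2x (mod 1) via (6.8.5):
#   Pf(x) = (1/2) f(x/2) + (1/2) f((x+1)/2)
def P(f):
    return lambda x: 0.5 * f(x / 2) + 0.5 * f((x + 1) / 2)

# a smooth strictly positive density on [0, 1]
f = lambda x: 1 + 0.5 * math.sin(2 * math.pi * x) + 0.3 * math.cos(4 * math.pi * x)
for _ in range(8):
    f = P(f)

# iterates of P flatten the density toward the invariant density f* = 1
xs = [i / 200 for i in range(200)]
assert max(abs(f(x) - 1.0) for x in xs) < 1e-9
```

Each application of P halves the frequency of a Fourier mode and annihilates the odd ones, which is why the convergence here is so fast.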


Now let D₀ ⊂ D(M) be the set of all strictly positive C¹ densities. For f ∈ D₀, differentiation of Pf(m) as given by (6.8.5) yields

|(Pf)'| / Pf = |Σ_{i=1}^k (Jᵢ (f ∘ gᵢ))'| / Σ_{i=1}^k Jᵢ (f ∘ gᵢ)

≤ Σ_{i=1}^k |Jᵢ'| (f ∘ gᵢ) / Σ_{i=1}^k Jᵢ (f ∘ gᵢ) + Σ_{i=1}^k Jᵢ |f' ∘ gᵢ| |dgᵢ| / Σ_{i=1}^k Jᵢ (f ∘ gᵢ)

≤ max_i (|Jᵢ'| / Jᵢ) + max_i [|f' ∘ gᵢ| / (f ∘ gᵢ)] |dgᵢ|,

where Jᵢ = |det dgᵢ(m)|. From equation (6.8.4), it follows that |dgᵢ| ≤ 1/λ, so that

sup (|(Pf)'| / Pf) ≤ c + (1/λ) sup (|f'| / f),

where

c = sup_{i,m} |Jᵢ'(m)| / Jᵢ(m).

Thus, by induction, for n = 1, 2, ..., we have

sup (|(P^n f)'| / P^n f) ≤ cλ/(λ − 1) + (1/λ^n) sup (|f'| / f).

Choose a real K > λc/(λ − 1); then

sup (|(P^n f)'| / P^n f) ≤ K    (6.8.6)

for n sufficiently large, say n > n₀(f). A straightforward application of Proposition 6.8.1 and Theorem 5.6.2 finishes the proof.
Example 6.8.1. Let M be the two-dimensional torus, namely, the Cartesian product of two unit circles:

M = {(m₁, m₂): m₁ = e^{ix₁}, m₂ = e^{ix₂}, x₁, x₂ ∈ R}.

M is evidently a Riemannian manifold, and the inverse functions to

m₁ = e^{ix₁}, m₂ = e^{ix₂}    (6.8.7)

define the local coordinate system. In these local coordinates the Riemannian metric is given by g_{jk} = δ_{jk}, the Kronecker delta, and defines a Borel measure μ identical with that obtained from the product of the Borel measures on the circle.

We define a mapping S: M → M that, in local coordinates, has the form

S(x₁, x₂) = (3x₁ + x₂, x₁ + 3x₂)  (mod 2π).    (6.8.8)

Thus S maps each point (m₁, m₂) given by (6.8.7) to the point (m̄₁, m̄₂), where

m̄₁ = exp[i(3x₁ + x₂)]  and  m̄₂ = exp[i(x₁ + 3x₂)].

We want to show that S is an expanding mapping.

From (6.8.8) we see that dS(m) maps the vector ξ = (ξ¹, ξ²) into the vector (3ξ¹ + ξ², ξ¹ + 3ξ²). Also, since g_{jk} = δ_{jk}, we have (ξ, ξ) = (ξ¹)² + (ξ²)² from (6.7.8). Thus

‖dS(m)ξ‖² = (3ξ¹ + ξ²)² + (ξ¹ + 3ξ²)²
= 4[(ξ¹)² + (ξ²)²] + 6(ξ¹ + ξ²)²
≥ 4‖ξ‖²,

and we see that inequality (6.8.3) is satisfied with λ = 2; therefore S is an expanding mapping. Further, if P is the Frobenius-Perron operator corresponding to S, then, by Theorem 6.8.1, {P^n} is asymptotically stable. It is also possible to show that S is measure preserving, so by Proposition 5.6.2 this transformation is exact. This proves our earlier assertion in Section 4.3. □
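The expansion estimate in this example can be confirmed numerically (a sketch; the random sampling is our own):

```python
import random

# The differential of the torus map (6.8.8) in local coordinates is the
# constant matrix dS = [[3, 1], [1, 3]].
A = ((3.0, 1.0), (1.0, 3.0))

def push(xi):
    return (A[0][0] * xi[0] + A[0][1] * xi[1],
            A[1][0] * xi[0] + A[1][1] * xi[1])

random.seed(1)
for _ in range(1000):
    xi = (random.uniform(-1, 1), random.uniform(-1, 1))
    eta = push(xi)
    # the identity (3a+b)^2 + (a+3b)^2 = 4(a^2+b^2) + 6(a+b)^2 gives the bound
    assert eta[0] ** 2 + eta[1] ** 2 >= 4 * (xi[0] ** 2 + xi[1] ** 2) - 1e-12

# lambda = 2 is sharp: equality holds along xi = (1, -1)
eta = push((1.0, -1.0))
assert eta[0] ** 2 + eta[1] ** 2 == 4 * 2.0
```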

Exercises

6.1. Let (X, A, μ) be a measure space and let S: X → X be a nonsingular transformation. Fix an integer k ≥ 1. Prove that S^k is statistically stable if and only if S is statistically stable.

6.2. Consider the transformation S: [0, 1] → [0, 1] defined by

S(x) = { 2x,        0 ≤ x < 1/2,
       { x − 1/2,   1/2 ≤ x ≤ 1.

Using the result proved in Exercise 6.1, show that S is statistically stable.

6.3. Consider the transformation S: [0, 1] → [0, 1] of the form

S(x) = cx(1 − x²).

Fix c at the value for which S maps [0, 1] onto [0, 1] and, using the change of variables formulas (Theorem 6.5.1), show that for this value of c the transformation S is statistically stable.

6.4. Consider the transformation S: [0, 1] → [0, 1] of the form

S(x) = cx²(1 − x).

Again fix c to be that value at which S maps [0, 1] onto [0, 1] and use a change of variables and the Birkhoff individual ergodic theorem to show that

lim_{n→∞} S^n(x) = 0,  for x ∈ [0, 1] a.e.


Observe that S has periodic points of period 3 and thus is chaotic in the sense of Sarkovskii and Li-Yorke [Jama, 1989; Li and Yorke, 1975].

6.5. Consider the transformation S: [0, l] → [0, l] defined by

S(x) = x[−a + b/(1 + x)],

where l = (b/a) − 1 > 0. Find b = b₀ such that S maps [0, l] onto [0, l] and, using the change of variables formulas, prove that S is statistically stable.

6.6. Consider the "tent" transformation S: [0, 1] → [0, 1] of the form

S(x) = { cx,        0 ≤ x < 1/2,
       { c(1 − x),  1/2 ≤ x ≤ 1,

where 1 < c ≤ 2. Let P be the corresponding Frobenius-Perron operator. Determine the values of the parameter c for which {P^n} is asymptotically periodic but not asymptotically stable (r ≥ 2 in formula (5.3.8)).

6.7. Let M be the two-dimensional torus described in Example 6.8.1. Consider the transformation S: M → M which in local coordinates has the form

S(x₁, x₂) = (a₁₁x₁ + a₁₂x₂, a₂₁x₁ + a₂₂x₂)  (mod 2π).

Assume that the coefficients a_{ij} are positive integers and find conditions concerning the matrix (a_{ij}) which imply the exactness of S.

6.8. Using TRAJ, BIFUR, DENTRAJ, and DENITER numerically study the behavior of the transformations defined in Exercises 6.5 and 6.6 for a < b ≤ b₀ and 0 < c < 2, respectively.
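The programs named in Exercise 6.8 are not reproduced here; as a stand-in, the following sketch performs a DENITER-style density iteration for the tent map of Exercise 6.6 at c = 2 (this parameter value and the initial density are our own choices; the operator formula Pf(x) = ½[f(x/2) + f(1 − x/2)] comes from summing over the two inverse branches, as in (6.2.10) and (6.8.5)):

```python
import math

# Frobenius-Perron operator for the tent map with c = 2: the inverse branches
# of S at y are y/2 and 1 - y/2, each with slope of magnitude 2.
def P(f):
    return lambda x: 0.5 * (f(x / 2) + f(1 - x / 2))

f = lambda x: 1 + 0.5 * math.cos(2 * math.pi * x)   # initial density on [0, 1]
for _ in range(8):
    f = P(f)

xs = [i / 100 for i in range(101)]
# the iterated density converges to the uniform (invariant) density
assert max(abs(f(x) - 1.0) for x in xs) < 1e-9
```

Iterating the density rather than a single trajectory sidesteps the well-known floating-point collapse of tent-map orbits at c = 2.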

7
Continuous Time Systems:
An Introduction

In previous chapters we concentrated on discrete time systems because they offer a convenient way of introducing many concepts and techniques of importance in the study of irregular behaviors in model systems. Now we turn to a study of continuous time systems.

Continuous and discrete systems differ in several important and interesting ways, which we will touch on throughout the remainder of this book. For example, in a continuous time system, complicated irregular behaviors are possible only if the dimension of the phase space of the system is three or greater. As we have seen, this is in sharp contrast to discrete time processes that can have extremely complicated dynamics in only one dimension. Further, continuous time processes in a finite-dimensional phase space are in general invertible, which immediately implies that exactness is a property that will not occur for these systems (recall that noninvertibility is a necessary condition for exactness). However, systems in an infinite-dimensional phase space, namely, time delay equations and some partial differential equations, are generally not invertible and, thus, may display exactness.

This chapter is devoted to an introduction of the concept of continuous time systems, an extension of many properties developed previously for discrete time systems, and the development of tools and techniques specifically designed for studying continuous time systems.


7.1 Two Examples of Continuous Time Systems

Here a continuous time process in a phase space X is given by a family of mappings

S_t: X → X,  t ≥ 0.

As illustrated in Figure 7.1.1, the value S_t(x⁰) is the position of the system at a time t that started from an initial point x⁰ ∈ X at time t = 0. We consider only those processes in which the dynamical law S does not explicitly depend on time, so that the property

S_t(S_{t'}(x)) = S_{t+t'}(x)    (7.1.1)

holds. This simply means that the dynamics governing the evolution of the system are the same on the intervals [0, t'] and [t, t + t'].

Example 7.1.1. A well-known example of a continuous time process is given by an autonomous d-dimensional system of ordinary differential equations

dx/dt = F(x),    (7.1.2)

where x = (x₁, ..., x_d) and F: R^d → R^d is sufficiently smooth to ensure the existence and uniqueness of solutions; for example, F is C¹ and satisfies |F(x)| ≤ α + β|x|, with α and β finite. In this case, S_t(x⁰) is the solution of (7.1.2) with the initial condition

x(0) = x⁰.    (7.1.3)

In this example time t need not be restricted to t ≥ 0, and the system can also be studied for t ≤ 0. As we will see in Section 7.8, this is a commonly encountered situation for problems in finite dimensional phase spaces. □
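The semigroup property (7.1.1) can be checked numerically for a concrete system of the form (7.1.2). A sketch, assuming the illustrative right-hand side F(x) = −x + sin x (which satisfies the growth condition above) and a standard fourth-order Runge-Kutta integrator:

```python
import math

def F(x):
    return -x + math.sin(x)

def S(t, x0, n=2000):
    # RK4 approximation of the flow map S_t for (7.1.2)-(7.1.3)
    h, x = t / n, x0
    for _ in range(n):
        k1 = F(x)
        k2 = F(x + 0.5 * h * k1)
        k3 = F(x + 0.5 * h * k2)
        k4 = F(x + h * k3)
        x += (h / 6) * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

x0 = 1.3
# S_{0.7}(S_{0.5}(x0)) should agree with S_{1.2}(x0) up to integration error
assert abs(S(0.7, S(0.5, x0)) - S(1.2, x0)) < 1e-8
```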

Example 7.1.2. Consider the delay-differential equation

dx(t)/dt = F(x(t), x(t − 1))    (7.1.4)

with the initial condition

x(τ) = x⁰(τ),  for τ ∈ [−1, 0].    (7.1.5)

For rather simple restrictions on F (namely, that F is C¹ and |F(x, y)| ≤ α(y) + β(y)|x|, where α and β are arbitrary continuous functions of y), there is a unique solution to (7.1.4) with (7.1.5) [see Hale, 1977].

Let X be the space of all continuous functions [−1, 0] → R with the usual uniform convergence topology. Given x⁰ ∈ X and the solution x of (7.1.4) and (7.1.5), we may define

S_t x⁰(τ) = x(t + τ),  for τ ∈ [−1, 0].    (7.1.6)



FIGURE 7.1.1. The trajectory of a continuous time process in the phase space X. At time t = 0 the system is at x⁰, and at time t it is at S_t(x⁰).

If x(t) is the solution of (7.1.4)-(7.1.5), then, since F is not an explicit function of t, x(t + a), a ≥ 0, is also a solution to the problem. Using this fact it is easy to verify that transformation (7.1.6) satisfies property (7.1.1), although it is impossible to define S_t x(τ) for t < 0. □

A very important difference exists between these two examples with respect to their invertibility. Thus, although the solution to the system of ordinary differential equations in Example 7.1.1 may be studied for t ≤ 0, in general no solution exists for the differential-delay equation of Example 7.1.2 when t < 0. This lack of invertibility is generally the case for delay-differential equations and, indeed, for many continuous time systems whose phase space X is not finite dimensional (e.g., some partial differential equations).
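The construction (7.1.6) can be made concrete with the classical method of steps. A sketch for the illustrative choice F(x, y) = −y in (7.1.4), which satisfies the stated growth condition with α(y) = |y| and β = 0, using Euler steps on a fixed grid; because all three flows below use the same grid, the semigroup property (7.1.1) holds exactly for the discretization:

```python
N = 1000                        # grid points per unit delay interval
h = 1.0 / N

def advance(seg, t):
    # seg: samples of x on an interval of length 1; returns the segment
    # shifted forward by t, i.e. a discrete version of S_t applied to seg
    hist = list(seg)
    for _ in range(round(t / h)):
        x_now, x_delayed = hist[-1], hist[-1 - N]
        hist.append(x_now + h * (-x_delayed))   # Euler step for dx/dt = -x(t-1)
    return hist[-(N + 1):]

x0 = [1.0] * (N + 1)            # constant initial function on [-1, 0]
a = advance(advance(x0, 0.5), 0.7)   # S_{0.7}(S_{0.5} x0)
b = advance(x0, 1.2)                 # S_{1.2} x0
assert max(abs(u - v) for u, v in zip(a, b)) < 1e-9
```

Note that `advance` cannot be run with negative t: the scheme consumes history and never reconstructs it, mirroring the noninvertibility discussed above.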

7.2 Dynamical and Semidynamical Systems

It is possible to establish many results for continuous time processes in a phase space X endowed with no other property than a measure μ, as was done in earlier chapters for discrete time processes. However it is simpler to consider continuous time processes in a measure space that is also equipped with a topology. Thus, from this point on, let X be a topological Hausdorff space and A the σ-algebra of Borel sets, that is, the smallest σ-algebra that contains all open, and thus closed, subsets of X.

Dynamical Systems

Definition 7.2.1. A dynamical system {S_t}_{t∈R} on X is a family of transformations S_t: X → X, t ∈ R, satisfying

(a) S₀(x) = x for all x ∈ X;

(b) S_t(S_{t'}(x)) = S_{t+t'}(x) for all x ∈ X, with t, t' ∈ R; and

(c) The mapping (t, x) → S_t(x) from X × R into X is continuous.


Remark 7.2.1. System (7.1.2) of ordinary differential equations, introduced in the preceding section, is clearly an example of a dynamical system. □

Remark 7.2.2. It is clear from the group property of Definition 7.2.1 that

S_{−t}(S_t(x)) = x  for all t ∈ R.

Thus, for all t₀ ∈ R, any transformation S_{t₀} of a dynamical system {S_t}_{t∈R} is invertible. □

In applied problems the space X is customarily called the phase space of the dynamical system {S_t}_{t∈R}, whereas, for every fixed x⁰ ∈ X, the function S_t(x⁰), considered as a function of t, is called a trajectory of the system. The trajectories of a dynamical system {S_t}_{t∈R} in its phase space X are of only three possible types, as shown in Figure 7.2.1a,b,c for X = R². First (Figure 7.2.1a), the trajectory can be a stationary point x⁰ such that

S_t(x⁰) = x⁰  for all t ∈ R.

Second, as shown in Figure 7.2.1b, the trajectory can be periodic with period ω > 0, that is,

S_{t+ω}(x⁰) = S_t(x⁰)  for all t ∈ R.

Finally, the trajectory can be nonintersecting (see Figure 7.2.1c), by which we mean that

S_t(x⁰) ≠ S_{t'}(x⁰)  for t ≠ t'.

It is straightforward to show that the trajectory of a dynamical system cannot be of the intersecting nonperiodic form shown in Figure 7.2.1d. To demonstrate this, assume the contrary, that, for a given x⁰ ∈ X, we have

S_{t₁}(x⁰) = S_{t₂}(x⁰)  for some t₁ < t₂.

By applying S_{t−t₁} to both sides of this equation, we have

S_{t−t₁}(S_{t₁}(x⁰)) = S_{t−t₁}(S_{t₂}(x⁰)).

By the group property (b) of Definition 7.2.1, we also have

S_{t−t₁}(S_{t₁}(x⁰)) = S_t(x⁰)

and

S_{t−t₁}(S_{t₂}(x⁰)) = S_{t+(t₂−t₁)}(x⁰).

Hence, with ω = t₂ − t₁, our assumption leads to

S_t(x⁰) = S_{t+ω}(x⁰),



FIGURE 7.2.1. Trajectories of a dynamical system in its phase space X. In (a) the trajectory is a stationary point, whereas in (b) the trajectory is a periodic orbit. Trajectory (c) is of the nonintersecting type. The intersecting trajectory shown in (d) is not possible in a dynamical system.

implying that the only possible intersecting trajectories of a dynamical system are periodic.

However, it is often the case that the evolution in time of data is observed to be of the intersecting nonperiodic type. For example, the two-dimensional projection of the trajectory of a three-dimensional system might easily be of this type. The projection of a trajectory of a dynamical system is called the trace of the system. The following is a more precise definition.

Definition 7.2.2. Let X and Y be two topological Hausdorff spaces, φ: Y → X a given continuous function, and S_t: Y → Y a given dynamical system on Y. A function g: R → X is called the trace of the dynamical system {S_t}_{t∈R} if there is a y ∈ Y such that

g(t) = φ(S_t(y))  for all t ∈ R.

From our precise definition of the trace of a dynamical system, the following obvious question arises: Given an observed continuous function in


a space X that is intersecting and nonperiodic, when is this function the trace of a dynamical system {S_t}_{t∈R} operating in some higher-dimensional phase space Y? The answer is as surprising as it is simple: it turns out that all continuous functions in X are traces of a single dynamical system! This is stated more formally in the following theorem.

Theorem 7.2.1. Let X be an arbitrary topological Hausdorff space. Then there is another topological Hausdorff space Y, a dynamical system {S_t}_{t∈R} operating in Y, and a continuous function φ: Y → X such that every continuous function g: R → X is the trace of {S_t}_{t∈R}; that is, for every g there is a y ∈ Y such that

g(t) = φ(S_t(y))  for all t ∈ R.

Proof. Let Y be the space of all continuous functions from R into X (note that the elements of the space Y are functions, not points). Let a dynamical system {S_{t'}}_{t'∈R}, S_{t'}: Y → Y, operating on Y, be a simple shift, so that starting from a given y ∈ Y we have, after the operation of S_{t'}, a new function y(t + t'). This may be represented by a diagram,

y(t) → y(t + t'),

or, more formally,

S_{t'}(y)(t) = y(t + t').

Define a projection φ: Y → X by

φ(y) = y(0);

then the projection φ is just the evaluation of y at the point t = 0. Let g: R → X be an arbitrary continuous function so that, by our definitions,

S_{t'}(g)(t) = g(t + t')

and

φ(S_{t'}(g)) = S_{t'}(g)(0) = g(t'),

showing that g is the trace of a trajectory of the dynamical system {S_{t'}}_{t'∈R} operating in Y, namely, a trajectory starting from the initial point g. Further, Y will be a topological Hausdorff space, and (t', y) → S_{t'}(y) a continuous mapping, if we equip the function space Y with the topology of uniform convergence on compact intervals. This, coupled with the trivial observation that S_{t'+t''}(y) = S_{t'}(S_{t''}(y)) and S₀(y) = y, completes the proof of the theorem.
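The shift construction in this proof is short enough to transcribe directly. A sketch for X = R (the sample function g is an arbitrary choice of ours):

```python
import math

# The shift system from the proof: S_{t'}(y)(t) = y(t + t') on the space Y of
# continuous functions, with the projection phi(y) = y(0).
def shift(y, tp):
    return lambda t: y(t + tp)

phi = lambda y: y(0.0)

g = lambda t: math.cos(t) + 0.5 * math.sin(2.0 * t)   # a continuous g: R -> R

# g is the trace of the trajectory starting from the "point" g in Y
for t in [0.0, 0.3, 1.7, -2.2]:
    assert phi(shift(g, t)) == g(t)

# group property S_{t'} o S_{t''} = S_{t'+t''}, checked at one time point
a = shift(shift(g, 0.4), 0.6)(1.0)
b = shift(g, 1.0)(1.0)
assert abs(a - b) < 1e-12
```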

Remark 7.2.3. Note that the proof of this theorem rests on the identification of the functions on X as the objects on which the new dynamical system {S_{t'}}_{t'∈R} operates. □


Semidynamical Systems

Definition 7.2.3. A semidynamical system {S_t}_{t≥0} on X is a family of transformations S_t: X → X, t ∈ R⁺, satisfying

(a) S₀(x) = x for all x ∈ X;

(b) S_t(S_{t'}(x)) = S_{t+t'}(x) for all x ∈ X, with t, t' ∈ R⁺;

(c) The mapping (t, x) → S_t(x) from X × R⁺ into X is continuous.

Remark 7.2.4. The only difference between dynamical and semidynamical systems is contained in the group property [compare conditions (b) of Definitions 7.2.1 and 7.2.3]. The consequence of this difference is most important, however, because semidynamical systems, in contrast to dynamical systems, are not invertible. It is this property that makes the study of semidynamical systems so important for applications. Henceforth, we will confine our attention to semidynamical systems. □

Remark 7.2.5. An examination of the proof of Theorem 7.2.1 shows that it is also true for semidynamical systems. □

Remark 7.2.6. On occasion a family of transformations {S_t}_{t≥0} satisfying properties (a) and (b) will be called a semigroup of transformations. This is because property (b) in Definition 7.2.3 ensures that the transformations S_t form an Abelian semigroup in which the group operation is the composition of two functions. Thus a semidynamical system is a continuous semigroup. □

Remark 7.2.7. The area of topological dynamics examines the behavior of semidynamical systems from a topological perspective. Here, however, since we are primarily interested in highly irregular behaviors, our main tools will be measures on X. □

7.3 Invariance, Ergodicity, Mixing, and Exactness in Semidynamical Systems

Invariance and the Individual Ergodic Theorem
From the continuity property (c) of Definition 7.2.3, all our transformations S_t are measurable, that is,

    S_t^{-1}(A) ∈ A    for all A ∈ A,

where, as usual, S_t^{-1}(A) denotes the counterimage of A, namely, the set of all points x such that S_t(x) ∈ A. Thus we can state the following definition.

Definition 7.3.1. A measure μ is called invariant under a family {S_t} of measurable transformations S_t: X → X if

    μ(S_t^{-1}(A)) = μ(A)    for all A ∈ A.    (7.3.1)

As for discrete time processes, we will say interchangeably either that a measure is invariant under {S_t} or that the transformations {S_t} are measure preserving when equation (7.3.1) holds.

Given a finite invariant measure μ, we can formulate a continuous time analog of Theorem 4.2.3, which is also known as the Birkhoff individual ergodic theorem.
Theorem 7.3.1. Let μ be a finite invariant measure with respect to the semidynamical system {S_t}_{t≥0}, and let f: X → R be an arbitrary integrable function. Then the limit

    f*(x) = lim_{T→∞} (1/T) ∫_0^T f(S_t(x)) dt    (7.3.2)

exists for all x ∈ X except perhaps for a set of measure zero.

Proof. This theorem may be rather easily demonstrated using the corresponding discrete time result, Theorem 4.2.3, if we assume, in addition, that for almost all x ∈ X the integrand f(S_t(x)) is a bounded measurable function of t.

Set

    g(x) = ∫_0^1 f(S_t(x)) dt

and assume at first that T is an integer, T = n. Note also that the group property (b) of semidynamical systems implies that

    S_t(x) = S_{t-k}(S_k(x))    for 0 ≤ k ≤ t.

Then the integral on the right-hand side of (7.3.2) may be written as

    (1/T) ∫_0^T f(S_t(x)) dt = (1/n) ∫_0^n f(S_t(x)) dt

        = (1/n) Σ_{k=0}^{n-1} ∫_k^{k+1} f(S_t(x)) dt

        = (1/n) Σ_{k=0}^{n-1} ∫_k^{k+1} f(S_{t-k}(S_k(x))) dt

        = (1/n) Σ_{k=0}^{n-1} ∫_0^1 f(S_{t'}(S_k(x))) dt'

        = (1/n) Σ_{k=0}^{n-1} g(S_k(x)).

However, S_k = S_1 ∘ S_{k-1} = S_1 ∘ ... ∘ S_1 = S_1^k, so that

    lim_{n→∞} (1/n) ∫_0^n f(S_t(x)) dt = lim_{n→∞} (1/n) Σ_{k=0}^{n-1} g(S_1^k(x)),

and the right-hand side exists by Theorem 4.2.3. Call this limit f*(x).
If T is not an integer, let n be the largest integer such that n < T. Then we may write

    (1/T) ∫_0^T f(S_t(x)) dt = (n/T) · (1/n) ∫_0^n f(S_t(x)) dt + (1/T) ∫_n^T f(S_t(x)) dt.

As T → ∞, the first term on the right-hand side converges to f*(x), as we have shown previously, whereas the second term converges to zero since f(S_t(x)) is bounded.
As in the discrete time case, the limit f*(x) satisfies two conditions:

(C1) f*(S_t(x)) = f*(x)    a.e. in x for every t ≥ 0,    (7.3.3)

and

(C2) ∫_X f*(x) μ(dx) = ∫_X f(x) μ(dx).    (7.3.4)

Ergodicity and Mixing

We now develop the notions of ergodicity and mixing for semidynamical systems. Exact semidynamical systems are considered in the next section.

Under the action of a semidynamical system {S_t}_{t≥0}, a set A ∈ A is called invariant if

    S_t^{-1}(A) = A    for t ≥ 0.    (7.3.5)

Again we require that for every t ≥ 0 the equality (7.3.5) is satisfied modulo zero (see Remark 3.1.3). By using this notion of invariant sets, we can define ergodicity for semidynamical systems.

Definition 7.3.2. A semidynamical system {S_t}_{t≥0}, consisting of nonsingular transformations S_t: X → X, is ergodic if every invariant set A ∈ A is such that either μ(A) = 0 or μ(X\A) = 0. (Recall that a set A for which μ(A) = 0 or μ(X\A) = 0 is called trivial.)

Example 7.3.1. Again we consider the example of rotation on the unit circle, originally introduced in Example 4.2.2. Now X = [0, 2π) and

    S_t(x) = x + ωt    (mod 2π).    (7.3.6)

S_t is measure preserving (with respect to the natural Borel measure on the circle) and, for ω ≠ 0, it is also ergodic. To see this, first pick t = t_0 such that ωt_0/2π is irrational. Then the transformation S_{t_0}: X → X is ergodic, as was shown in Example 4.4.1. Since S_{t_0} is ergodic for at least one t_0, every (invariant) set A that satisfies S_{t_0}^{-1}(A) = A must be trivial by Definition 4.2.1. Thus, any set A that satisfies (7.3.5) must likewise be trivial, and the semidynamical system {S_t}_{t≥0} with S_t given by (7.3.6) is ergodic. □
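This ergodic behavior can be checked numerically. The following Python sketch (all parameter values are illustrative choices, not from the text) approximates the time average (7.3.2) of f(x) = cos x along a trajectory of the rotation (7.3.6); since the mean of cos over the circle is 0, ergodicity predicts the time average approaches 0 for almost every starting point.

```python
import math

# Rotation semigroup on the circle X = [0, 2*pi): S_t(x) = x + w*t (mod 2*pi).
# Theorem 7.3.1 plus ergodicity imply the time average of f along almost
# every trajectory equals its space average; here f(x) = cos(x), whose
# normalized space average over [0, 2*pi) is 0.
w = 1.0          # rotation speed (illustrative)
dt = 0.1         # sampling step, chosen so w*dt/(2*pi) is irrational
x0 = 0.7         # initial point (illustrative)
n = 200_000      # number of samples approximating (1/T) * integral_0^T

x = x0
acc = 0.0
for _ in range(n):
    acc += math.cos(x)
    x = (x + w * dt) % (2 * math.pi)

time_average = acc / n
space_average = 0.0  # (1/(2*pi)) * integral of cos over the circle

print(time_average)  # close to 0
```

The sampled time average tends to the space average regardless of the initial point x0, as the individual ergodic theorem combined with ergodicity predicts.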

Remark 7.3.1. It is interesting to note that, for any t_0 commensurate with 2π/ω (e.g., t_0 = π/ω), the transformation S_{t_0} is not ergodic. This curious result illustrates a very general property of semidynamical systems: For a given ergodic semidynamical system {S_t}_{t≥0}, there might be a specific t_0 for which S_{t_0} is not ergodic. However, if at least one S_{t_0} is ergodic, then the entire semidynamical system {S_t}_{t≥0} is ergodic. □

We now turn our attention to mixing in semidynamical systems, starting with the following definition.

Definition 7.3.3. A semidynamical system {S_t}_{t≥0} on a measure space (X, A, μ) with a normalized invariant measure μ is mixing if

    lim_{t→∞} μ(A ∩ S_t^{-1}(B)) = μ(A)μ(B)    for all A, B ∈ A.    (7.3.7)

Thus, in continuous time systems, the interpretation of mixing is the same as for discrete time systems. For example, consider all points x in the set A ∩ S_t^{-1}(B), that is, points x such that x ∈ A and S_t(x) ∈ B. From (7.3.7), for large t the measure of these points is just μ(A)μ(B), which means that the fraction of points starting in A that eventually are in B is given by the product of the measures of A and B in the phase space X.

By Definition 7.3.3 the semidynamical system {S_t}_{t≥0}, consisting of rotation on the unit circle given by (7.3.6), is evidently not mixing. This is because, given any two nontrivial disjoint sets A, B ∈ A, the left-hand side of (7.3.7) is always zero for ωt = 2πn (n an integer), whereas μ(A)μ(B) ≠ 0. A continuous time system that is mixing is illustrated in Example 7.7.2.
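The failure of mixing for the rotation can also be seen numerically. In this Python sketch (the grid size and the choice A = B = [0, π) are illustrative assumptions), the correlation μ(A ∩ S_t^{-1}(A)) keeps returning to μ(A) = 1/2 and dropping to 0 as t varies, instead of settling at μ(A)μ(A) = 1/4:

```python
import math

# Rotation on the circle is ergodic but NOT mixing: the correlation
# mu(A intersect S_t^{-1}(A)) never settles down to mu(A)*mu(A).  We
# estimate it on a grid for A = [0, pi) with the normalized measure
# mu = Lebesgue/(2*pi), so mu(A)*mu(A) = 1/4.
w = 1.0          # rotation speed (illustrative)
N = 100_000      # grid points on the circle

def correlation(t):
    """Grid estimate of mu(A intersect S_t^{-1}(A)) for A = [0, pi)."""
    hits = 0
    for k in range(N):
        x = 2 * math.pi * k / N
        if x < math.pi and (x + w * t) % (2 * math.pi) < math.pi:
            hits += 1
    return hits / N

c_return = correlation(2 * math.pi / w)   # w*t = 2*pi: S_t^{-1}(A) = A
c_opposite = correlation(math.pi / w)     # w*t = pi: image disjoint from A

print(c_return, c_opposite)  # about 0.5 and 0.0, never 0.25
```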
Remark 7.3.2. The concepts of ergodicity and mixing are also applicable to dynamical systems. In this case, condition (7.3.7) can be replaced by

    lim_{t→∞} μ(A ∩ S_t(B)) = μ(A)μ(B)    (7.3.8)

since, by invertibility, S_t(B) = S_{-t}^{-1}(B). □

Exactness

Definition 7.3.4. Let (X, A, μ) be a normalized measure space. A measure-preserving semidynamical system {S_t}_{t≥0} such that S_t(A) ∈ A for A ∈ A is exact if

    lim_{t→∞} μ(S_t(A)) = 1    for all A ∈ A, μ(A) > 0.    (7.3.9)

Example 11.1.1 illustrates exactness for a continuous time semidynamical system.

Remark 7.3.3. As in discrete time systems, exactness of {S_t}_{t≥0} implies that {S_t}_{t≥0} is mixing. □

Remark 7.3.4. Due to their invertibility, dynamical systems cannot be exact. This is easily seen, since μ(S_{-t}(S_t(A))) = μ(A) and, thus, the limit in (7.3.9) is μ(A) and not 1, for all A ∈ A. If the system is nontrivial and contains a set A such that 0 < μ(A) < 1, then, of course, condition (7.3.9) is not satisfied. □

7.4 Semigroups of the Frobenius-Perron and Koopman Operators
As we have seen in the discrete time case, many properties of dynamical systems are more easily studied by examining ensembles of trajectories rather than single trajectories. This is primarily because the ensemble approach leads to semigroups of linear operators, and, hence, the techniques of linear functional analysis may be applied to a study of their properties. Since, for any fixed t in a semidynamical system {S_t}_{t≥0}, the transformation S_t is measurable, we can adopt the discrete time definitions of the Frobenius-Perron and Koopman operators directly for the continuous time case.
Frobenius-Perron Operator

Assume that a measure μ on X is given and that all transformations S_t of a semidynamical system {S_t}_{t≥0} are nonsingular, that is,

    μ(S_t^{-1}(A)) = 0    for each A ∈ A such that μ(A) = 0.

Then, analogously to (3.2.2), the condition

    ∫_A P_t f(x) μ(dx) = ∫_{S_t^{-1}(A)} f(x) μ(dx)    for A ∈ A    (7.4.1)

for each fixed t ≥ 0 uniquely defines the Frobenius-Perron operator P_t: L^1(X) → L^1(X) corresponding to the transformation S_t.

It is easy to show, with the aid of (7.4.1), that P_t has the following properties:

(FP1) P_t(λ_1 f_1 + λ_2 f_2) = λ_1 P_t f_1 + λ_2 P_t f_2,  for f_1, f_2 ∈ L^1, λ_1, λ_2 ∈ R;    (7.4.2)

(FP2) P_t f ≥ 0,  if f ≥ 0;    (7.4.3)

(FP3) ∫_X P_t f(x) μ(dx) = ∫_X f(x) μ(dx),  for all f ∈ L^1.    (7.4.4)

Thus, for every fixed t, the operator P_t: L^1(X) → L^1(X) is a Markov operator.
The entire family of Frobenius-Perron operators P_t: L^1(X) → L^1(X) satisfies some properties similar to (a) and (b) of Definition 7.2.3. To see this, first note that since S_{t+t'} = S_t ∘ S_{t'}, then S_{t+t'}^{-1} = S_{t'}^{-1}(S_t^{-1}), and, thus,

    ∫_A P_{t+t'} f(x) μ(dx) = ∫_{S_{t+t'}^{-1}(A)} f(x) μ(dx) = ∫_{S_{t'}^{-1}(S_t^{-1}(A))} f(x) μ(dx)

        = ∫_{S_t^{-1}(A)} P_{t'} f(x) μ(dx) = ∫_A P_t(P_{t'} f(x)) μ(dx).

This implies that

    P_{t+t'} f = P_t(P_{t'} f)    for all f ∈ L^1(X), t, t' ≥ 0,    (7.4.5)

and, thus, P_t satisfies a group property analogous to (b) of Definition 7.2.3.


Further, since S0 (x) = x, we have S0 1 (A) =A and, consequently,

}A

Pof(x)p.(dx)

= f

Js0 (A)
1

f(x)p.(dx)

=f

}A

f(x)p.(dx)

implying that

Pol=!

(7.4.6)

Hence Pt satisfies properties (a) and (b) of the definition of a semidynamical


system.


The properties of P_t in (7.4.2)-(7.4.6) are important enough to warrant the following definition.

Definition 7.4.1. Let (X, A, μ) be a measure space. A family of operators P_t: L^1(X) → L^1(X), t ≥ 0, satisfying properties (7.4.2)-(7.4.6) is called a stochastic semigroup. Further, if, for every f ∈ L^1 and t_0 ≥ 0,

    lim_{t→t_0} ‖P_t f - P_{t_0} f‖ = 0,

then this semigroup is called continuous.


A very important and useful property of stochastic semigroups is that
(7.4.7)
and, thus, from the group property (7.4.5), the function t-+
is a nonincreasing function of t. This is simply shown by

IIPt+t'ft- Pt+thll

IIPt/1- Pthll

= 11Pt(Ptf1- Pth)ll:::; IIPtft- Pthll,

which follows from (7.4. 7). By using this property, we may now proceed to
prove a continuous time analog of Theorem 5.6.2.

Theorem 7.4.1. Let {P_t}_{t≥0} be a semigroup of Markov operators, not necessarily continuous. Assume that there is an h ∈ L^1, h(x) ≥ 0, ‖h‖ > 0, such that

    lim_{t→∞} ‖(P_t f - h)^-‖ = 0    for every f ∈ D.    (7.4.8)

Then there is a unique density f* such that P_t f* = f* for all t ≥ 0. Furthermore,

    lim_{t→∞} P_t f = f*    for every f ∈ D.    (7.4.9)
Proof. Take any t_0 > 0 and define P = P_{t_0}, so that P_{nt_0} = P^n. Then, from (7.4.8),

    lim_{n→∞} ‖(P^n f - h)^-‖ = 0    for each f ∈ D.

Thus, by Theorem 5.6.2, there is a unique f* ∈ D such that P f* = f* and

    lim_{n→∞} P^n f = f*    for every f ∈ D.

Having shown that P_t f* = f* for the set {t_0, 2t_0, ...}, we now turn to a demonstration that P_t f* = f* for all t. Pick a particular time t', set f_1 = P_{t'} f*, and note that f* = P^n f* = P_{nt_0} f*. Therefore,

    ‖P_{t'} f* - f*‖ = ‖P_{t'}(P_{nt_0} f*) - f*‖
        = ‖P_{nt_0}(P_{t'} f*) - f*‖
        = ‖P^n(P_{t'} f*) - f*‖
        = ‖P^n f_1 - f*‖.    (7.4.10)

Thus, since

    lim_{n→∞} ‖P^n f_1 - f*‖ = 0

and the left-hand side of (7.4.10) is independent of n, we must have ‖P_{t'} f* - f*‖ = 0, so P_{t'} f* = f*. Since t' is arbitrary, we have P_t f* = f* for all t ≥ 0.

Finally, to show (7.4.9), pick a function f ∈ D and recall that ‖P_t f - f*‖ = ‖P_t f - P_t f*‖ is a nonincreasing function of t. Pick a subsequence t_n = nt_0. We know from before that lim_{n→∞} ‖P_{t_n} f - f*‖ = 0. Thus we have a nonincreasing function that converges to zero on a subsequence and, hence,

    lim_{t→∞} ‖P_t f - f*‖ = 0.

Remark 7.4.1. The proof of this theorem illustrates a property of stochastic semigroups important enough to deserve a name: a stochastic semigroup {P_t}_{t≥0} is called asymptotically stable if there exists a unique f* ∈ D such that

    P_t f* = f*    for all t ≥ 0

and if condition (7.4.9) holds for every f ∈ D. □
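A discrete-time, finite-dimensional analog may clarify the statement. In the Python sketch below (the matrix entries are hypothetical), a strictly positive column-stochastic matrix plays the role of a Markov operator satisfying a lower-bound condition like (7.4.8); iterates of two different densities approach the same invariant density, illustrating asymptotic stability:

```python
# Discrete analog (illustrative sketch): a strictly positive stochastic
# matrix acts as a Markov operator; every density converges under iteration
# to a unique invariant density f*, so the iterates are asymptotically stable.
P = [[0.5, 0.2, 0.3],
     [0.3, 0.5, 0.3],
     [0.2, 0.3, 0.4]]  # each column sums to 1: a Markov (stochastic) operator

def apply(P, f):
    """(Pf)_i = sum_j P[i][j] * f[j]; preserves nonnegativity and total mass."""
    return [sum(P[i][j] * f[j] for j in range(3)) for i in range(3)]

f = [1.0, 0.0, 0.0]   # a density (nonnegative, sums to 1)
g = [0.0, 0.0, 1.0]   # a different density
for _ in range(200):
    f = apply(P, f)
    g = apply(P, g)

distance = sum(abs(a - b) for a, b in zip(f, g))
total = sum(f)
print(f, distance)  # both iterates are near the invariant density
```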

Remark 7.4.2. From the above definition, it immediately follows that the asymptotic stability of a semigroup {P_t}_{t≥0} implies the asymptotic stability of the sequence {P_{t_0}^n} for arbitrary t_0 > 0. The proof of Theorem 7.4.1 shows that the converse also holds; that is, if for some t_0 > 0 the sequence {P_{t_0}^n} is asymptotically stable, then the semigroup {P_t}_{t≥0} is also asymptotically stable. □

Stochastic semigroups that are not semigroups of Frobenius-Perron operators can arise, as illustrated by the following example.
Example 7.4.1. Let X = R, f ∈ L^1(X), and define P_t: L^1(X) → L^1(X) by

    P_t f(x) = ∫_{-∞}^{∞} K(t, x, y) f(y) dy,    P_0 f(x) = f(x),    (7.4.11)

where

    K(t, x, y) = (1/√(2πσ²t)) exp[-(x - y)²/(2σ²t)].    (7.4.12)

It may be easily shown that the kernel K(t, x, y) satisfies:

(a) K(t, x, y) ≥ 0;

(b) ∫_{-∞}^{∞} K(t, x, y) dx = 1; and

(c) K(t + t', x, y) = ∫_{-∞}^{∞} K(t, x, z) K(t', z, y) dz.

From these properties it follows that P_t defined by (7.4.11) forms a continuous stochastic semigroup. The demonstration that {P_t}_{t≥0} defined by (7.4.11) and (7.4.12) is not a semigroup of Frobenius-Perron operators is postponed to Remark 7.10.2.

That (7.4.11) and (7.4.12) look familiar should come as no surprise, as the function u(t, x) = P_t f(x) is the solution of the heat equation

    ∂u/∂t = (σ²/2) ∂²u/∂x²    for t > 0, x ∈ R    (7.4.13)

with the initial condition

    u(0, x) = f(x)    for x ∈ R.    (7.4.14)
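Properties (b) and (c) of the kernel can be checked numerically. The following Python sketch (σ = 1 and a truncated integration interval are illustrative choices) verifies the normalization (b) and the Chapman-Kolmogorov identity (c) by Riemann sums:

```python
import math

# Numerical check (sketch) of kernel properties (b) and (c) in Example 7.4.1,
# with sigma = 1, using midpoint Riemann sums on a truncated interval [-L, L].
def K(t, x, y, sigma=1.0):
    return (math.exp(-(x - y) ** 2 / (2 * sigma**2 * t))
            / math.sqrt(2 * math.pi * sigma**2 * t))

L, n = 20.0, 4000
h = 2 * L / n
zs = [-L + (k + 0.5) * h for k in range(n)]

# (b): the integral of K(t, x, y) over x equals 1
norm = sum(K(0.7, z, 0.0) for z in zs) * h

# (c): Chapman-Kolmogorov, K(t+t', x, y) = integral of K(t,x,z)*K(t',z,y) dz
lhs = K(0.5 + 0.3, 1.0, -0.5)
rhs = sum(K(0.5, 1.0, z) * K(0.3, z, -0.5) for z in zs) * h

print(norm, lhs, rhs)
```

Since the Gaussian kernel is smooth and rapidly decaying, the midpoint sums reproduce the integrals essentially to machine precision here.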

The Koopman Operator

Again let {S_t}_{t≥0} be a semigroup of nonsingular transformations S_t in our topological Hausdorff space X with Borel σ-algebra A and measure μ. Recall that the S_t are nonsingular if, and only if, for every A ∈ A such that μ(A) = 0, μ(S_t^{-1}(A)) = 0. Further, let f ∈ L^∞(X). Then the function U_t f, defined by

    U_t f(x) = f(S_t(x)),    (7.4.15)

is again a function in L^∞(X). Equation (7.4.15) defines, for every t ≥ 0, the Koopman operator associated with the transformation S_t. The family of operators {U_t}_{t≥0}, defined by (7.4.15), satisfies all the properties of the discrete time Koopman operator introduced in Section 3.3.

It is also straightforward to show that {U_t}_{t≥0} is a semigroup. To check this, first note from the defining formula (7.4.15) that

    U_{t+t'} f(x) = f(S_{t+t'}(x)) = f(S_{t'}(S_t(x))) = U_t(U_{t'} f(x)),

which implies

    U_{t+t'} f = U_t(U_{t'} f)    for all f ∈ L^∞.

Furthermore, U_0 f(x) = f(S_0(x)) = f(x), or

    U_0 f = f    for all f ∈ L^∞,

so that {U_t}_{t≥0} is a semigroup.

Finally, the Koopman operator is adjoint to the Frobenius-Perron operator, or

    ⟨P_t f, g⟩ = ⟨f, U_t g⟩    for all f ∈ L^1(X), g ∈ L^∞(X), and t ≥ 0.    (7.4.16)

The family of Koopman operators is, in general, not a stochastic semigroup because U_t does not map L^1 into itself (though it does map L^∞ into itself) and satisfies the inequality

    ess sup |U_t f| ≤ ess sup |f|

instead of preserving the norm.

[FIGURE 7.4.1. Plots of f(x) and T_t f(x) = f(x - ct), for c > 0.]

In order to have a common notion for families of operators such as {P_t} and {U_t}, we introduce the following definition.
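The adjointness relation (7.4.16) can be illustrated numerically for a concrete transformation. In the Python sketch below (the rotation angle and the test functions are illustrative assumptions), S is a rotation of the circle, P f(x) = f(x - ω) its Frobenius-Perron operator, and U g(x) = g(x + ω) its Koopman operator; the two inner products agree up to quadrature error:

```python
import math

# Sketch of the adjointness <P f, g> = <f, U g> for the circle rotation
# S(x) = x + w (mod 2*pi), with P f(x) = f(x - w) and U g(x) = g(x + w).
# Inner products are approximated by Riemann sums over a uniform grid.
w = 0.9
N = 1000
xs = [2 * math.pi * k / N for k in range(N)]
h = 2 * math.pi / N

f = lambda x: math.cos(x)
g = lambda x: math.sin(2 * x) + 1.0

lhs = sum(f((x - w) % (2 * math.pi)) * g(x) for x in xs) * h  # <P f, g>
rhs = sum(f(x) * g((x + w) % (2 * math.pi)) for x in xs) * h  # <f, U g>

print(lhs, rhs)  # equal up to rounding
```

For trigonometric test functions the uniform-grid quadrature is exact up to roundoff, so the agreement is essentially to machine precision.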

Definition 7.4.2. Let L = L^p, 1 ≤ p ≤ ∞. A family {T_t}_{t≥0} of operators, T_t: L → L, defined for t ≥ 0, is called a semigroup of contracting linear operators (or a semigroup of contractions) if T_t satisfies the following conditions:

(a) T_t(λ_1 f_1 + λ_2 f_2) = λ_1 T_t f_1 + λ_2 T_t f_2, for f_1, f_2 ∈ L, λ_1, λ_2 ∈ R;

(b) ‖T_t f‖_L ≤ ‖f‖_L for f ∈ L;

(c) T_0 f = f, for f ∈ L; and

(d) T_{t+t'} f = T_t(T_{t'} f), for f ∈ L.

Moreover, if

    lim_{t→t_0} ‖T_t f - T_{t_0} f‖_L = 0    for f ∈ L, t_0 ≥ 0,

then this semigroup is called continuous.


Example 7.4.2. Consider the family of operators {T_t}_{t≥0} defined by (see Figure 7.4.1)

    T_t f(x) = f(x - ct)    for x ∈ R, t ≥ 0.    (7.4.17)

These operators map L = L^p(R), 1 ≤ p ≤ ∞, into itself, satisfy properties (a)-(d) of Definition 7.4.2, and form a semigroup of contractions.

To see that property (b) holds for T_t, use the "change of variables" formula,

    ‖T_t f‖_{L^p}^p = ∫_{-∞}^{∞} |f(x - ct)|^p dx = ∫_{-∞}^{∞} |f(y)|^p dy = ‖f‖_{L^p}^p

when p < ∞, and the obvious equality,

    ‖T_t f‖_{L^∞} = ess sup_x |f(x - ct)| = ess sup_x |f(x)| = ‖f‖_{L^∞},

when p = ∞. The remaining properties (a), (c), and (d) follow immediately from the definition of T_t in equation (7.4.17).

[FIGURE 7.4.2. The function |1_{[ct,1+ct]}(x) - 1_{[0,1]}(x)| versus x.]
Finally, we note that if p = 1 then this semigroup of contractions is continuous. To see this, first use

    ‖T_t f - T_{t_0} f‖_{L^1} = ∫_{-∞}^{∞} |f(x - ct) - f(x - ct_0)| dx = ∫_{-∞}^{∞} |f(y) - f(y - c(t_0 - t))| dy

and note that the right-hand side converges to zero by Corollary 5.1.1. A slightly more complicated calculation shows that {T_t}_{t≥0} is a continuous semigroup of contractions for every 1 ≤ p < ∞. However, in L^∞ the semigroup {T_t}_{t≥0} given by (7.4.17) is not continuous except in the trivial case when c = 0. This may be easily shown by setting f = 1_{[0,1]}. We then have

    T_t f(x) = 1_{[0,1]}(x - ct) = 1_{[ct,ct+1]}(x)

and, as a consequence,

    ‖T_t f - f‖_{L^∞} = ess sup_x |1_{[ct,1+ct]}(x) - 1_{[0,1]}(x)| = 1

for 0 < ct < 1. Thus ‖T_t f - f‖_{L^∞} does not converge to zero as t → 0. This may be simply interpreted as shown in Figure 7.4.2, where the hatched areas corresponding to the function |1_{[ct,1+ct]} - 1_{[0,1]}| disappear as t → 0 but the heights do not. □
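The contrast between L^1 continuity and L^∞ discontinuity is easy to observe numerically. In this Python sketch (grid and step sizes are illustrative), ‖T_t f - f‖ is estimated in both norms for f = 1_{[0,1]} as t → 0:

```python
# Sketch: the translation semigroup T_t f(x) = f(x - ct) of Example 7.4.2 is
# continuous in the L^1 norm but not in the L-infinity norm.  With the
# indicator f = 1_[0,1] we estimate ||T_t f - f|| in both norms on a grid.
c = 1.0
N = 100_000
xs = [-1.0 + 3.0 * k / N for k in range(N)]  # grid on [-1, 2]
h = 3.0 / N

def f(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

for t in (0.1, 0.01, 0.001):
    diffs = [abs(f(x - c * t) - f(x)) for x in xs]
    l1 = sum(diffs) * h   # tends to 0 as t -> 0 (it equals about 2*c*t)
    linf = max(diffs)     # stays equal to 1 whenever 0 < c*t < 1

print(l1, linf)  # l1 is small for t = 0.001; linf is still 1.0
```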

7.5 Infinitesimal Operators


The problems associated with the study of continuous time processes are
more difficult than those encountered in discrete time systems. This is
partially due to concerns over continuity of processes with respect to time.
Also, equivalent formulations of discrete and continuous time properties

206

7. Continuous Time Systems: An Introduction

may appear more complicated in the continuous case because of the use of integrals rather than summations, for example, in the Birkhoff ergodic theorem. However, there is one great advantage in the study of continuous time problems over discrete time dynamics, and this is the existence of a new tool: the infinitesimal operator.
In the case of a semidynamical system {S_t}_{t≥0} arising from a system of ordinary differential equations (7.1.2), the infinitesimal operator is simply the function F(x). This connection between the infinitesimal operator and F(x) stems from the formula

    lim_{t→0} (x(t) - x(0))/t = F(x^0),

where x(t) is the solution of (7.1.2) with the initial condition (7.1.3). This can be rewritten in terms of the transformations S_t as

    lim_{t→0} (S_t(x^0) - x^0)/t = F(x^0).

This relation offers some insight into how the infinitesimal operator may be defined for semigroups of contractions in general, and for semigroups of the Frobenius-Perron and Koopman operators in particular.
Definition 7.5.1. Let L = L^p, 1 ≤ p ≤ ∞, and {T_t}_{t≥0} be a semigroup of contractions. We denote by D(A) the set of all f ∈ L such that the limit

    A f = lim_{t→0} (T_t f - f)/t    (7.5.1)

exists, where the limit is considered in the sense of strong convergence (cf. Definition 2.3.3). Thus (7.5.1) is equivalent to

    lim_{t→0} ‖A f - (T_t f - f)/t‖_L = 0.

The operator A: D(A) → L is called the infinitesimal operator. It is evident that the subspace D(A) is linear, that is,

    λ_1 f_1 + λ_2 f_2 ∈ D(A)    for all f_1, f_2 ∈ D(A) and λ_1, λ_2 ∈ R.

Furthermore, the operator A: D(A) → L is linear, that is,

    A(λ_1 f_1 + λ_2 f_2) = λ_1 A f_1 + λ_2 A f_2    for all f_1, f_2 ∈ D(A) and λ_1, λ_2 ∈ R.

In general, the domain D(A) of the operator A is not the entire space L.

Before deriving the infinitesimal operators for the Frobenius-Perron and Koopman semigroups, we consider the following example.

Example 7.5.1. Let X = R and L = L^p(R), 1 ≤ p < ∞. Consider a semigroup {T_t}_{t≥0} on L defined, as in Example 7.4.2, by

    T_t f(x) = f(x - ct)

(cf. Figure 7.4.1). By the mean value theorem, if f is C^1 on R, then

    (f(x - ct) - f(x))/t = -c f'(x - θct),

where |θ| ≤ 1 and f' = df/dx. Thus, if f' is bounded and uniformly continuous on R, then

    A f = lim_{t→0} (T_t f - f)/t = -c f',

and the limit is uniform on R and consequently strong in L^∞. Further, if f (and thus f') has compact support (zero outside a bounded interval), then the limit is strong in every L^p, 1 ≤ p ≤ ∞. Thus, all such f belong to D(A) and for them A is just differentiation with respect to x and multiplication by -c. □
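The limit A f = -c f' can be checked pointwise. The following Python sketch (the choice f(x) = exp(-x²), the speed c, and the evaluation point x are illustrative) shows the difference quotient (T_t f - f)/t approaching -c f'(x) as t shrinks:

```python
import math

# Sketch: for the semigroup T_t f(x) = f(x - ct) of Example 7.5.1, the
# infinitesimal operator is A f = -c f'.  We check pointwise that
# (T_t f - f)/t approaches -c f'(x) as t -> 0, for a smooth f.
c = 2.0
f = lambda x: math.exp(-x * x)                 # smooth, rapidly decaying
fprime = lambda x: -2 * x * math.exp(-x * x)   # exact derivative of f

x = 0.4
errors = []
for t in (1e-2, 1e-3, 1e-4):
    quotient = (f(x - c * t) - f(x)) / t
    errors.append(abs(quotient - (-c) * fprime(x)))

print(errors)  # decreasing toward 0, roughly linearly in t
```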
In studying infinitesimal operators and many other problems of analysis, functions that are equal to zero outside a compact set play an important role. It is customary to call such mappings functions with compact support. This notion does not coincide very well with our definition of support given by equation (3.1.8), even though it is commonly accepted. Thus, we will also use this traditional terminology, hoping that it will not lead to confusion or misunderstanding.
Having introduced the notion of infinitesimal operators, and illustrated
their calculation in Example 7.5.1, we now wish to state a theorem that
makes explicit the relation among semigroups of contractions, infinitesimal
operators, and differential equations.
First, however, we must define the strong derivative of a function with values in L = L^p. Given a function u: Δ → L, where Δ ⊂ R, and a point t_0 ∈ Δ, we define the strong derivative u'(t_0) by

    u'(t_0) = lim_{t→t_0} (u(t) - u(t_0))/(t - t_0),

where the limit is considered in the sense of strong convergence. This definition is equivalent to

    lim_{t→t_0} ‖(u(t) - u(t_0))/(t - t_0) - u'(t_0)‖_L = 0.    (7.5.2)

By using this concept, we can see that the value of the infinitesimal operator for f ∈ D(A), A f, is simply the derivative of the function u(t) = T_t f at t = 0. The following theorem gives a more sophisticated relation between the strong derivative and the infinitesimal operator.

Theorem 7.5.1. Let {T_t}_{t≥0} be a continuous semigroup of contractions acting on L, and A: D(A) → L the corresponding infinitesimal operator. Further, let u(t) = T_t f for fixed f ∈ D(A). Then u(t) satisfies the following properties:

(1) u(t) ∈ D(A) for t ≥ 0;

(2) u'(t) exists for t ≥ 0; and

(3) u(t) satisfies the differential equation

    u'(t) = A u(t)    for t ≥ 0    (7.5.3)

and the initial condition

    u(0) = f.    (7.5.4)

Proof. For t = 0, properties (1)-(3) are satisfied by assumption. Thus we may concentrate on t > 0. Let t_0 > 0 be fixed. By the definition of u(t), we have

    (u(t) - u(t_0))/(t - t_0) = (T_t f - T_{t_0} f)/(t - t_0).

Noting that T_t = T_{t-t_0} T_{t_0} for t > t_0, this differential quotient may be rewritten as

    (u(t) - u(t_0))/(t - t_0) = T_{t_0}((T_{t-t_0} f - f)/(t - t_0))    for t > t_0.    (7.5.5)

Because f ∈ D(A), the limit of

    (T_{t-t_0} f - f)/(t - t_0)

exists as t → t_0 and gives A f. Thus the limit of (7.5.5) as t → t_0 also exists and is equal to T_{t_0} A f. In an analogous fashion, if t < t_0, we have T_{t_0} = T_t T_{t_0-t} and, as a consequence,

    (u(t) - u(t_0))/(t - t_0) = T_t((T_{t_0-t} f - f)/(t_0 - t))    for t < t_0    (7.5.6)

and

    ‖(u(t) - u(t_0))/(t - t_0) - T_{t_0} A f‖_L
        ≤ ‖T_t((T_{t_0-t} f - f)/(t_0 - t) - A f)‖_L + ‖T_t A f - T_{t_0} A f‖_L
        ≤ ‖(T_{t_0-t} f - f)/(t_0 - t) - A f‖_L + ‖T_t A f - T_{t_0} A f‖_L.

Again, since T_t A f converges to T_{t_0} A f as t → t_0, the limit of (7.5.6) exists as t → t_0 and is equal to T_{t_0} A f. Thus the existence of the derivative u'(t_0) is proved.

Now we can rewrite equation (7.5.5) in the form

    (u(t) - u(t_0))/(t - t_0) = (T_{t-t_0}(T_{t_0} f) - (T_{t_0} f))/(t - t_0)    for t > t_0.

Since the limit of the differential quotient on the left-hand side exists as t → t_0, the limit on the right-hand side also exists as t → t_0, and we obtain

    u'(t_0) = A T_{t_0} f,

which proves that T_{t_0} f ∈ D(A) and that u'(t_0) = A u(t_0).

Remark 7.5.1. The main property of the set D(A) that follows directly from Theorem 7.5.1 is that, for f ∈ D(A), the function u(t) = T_t f is a solution of equations (7.5.3) and (7.5.4). Moreover, the solution can be proved to be unique. Unfortunately, in general D(A) is not the entire space L, although it can be proved that, for continuous semigroups of contractions, D(A) is dense in L. □
In Theorem 7.5.1, the notion of a function u: [0, ∞) → L, where L is again a space of functions, may seem strange. In fact, u actually represents a function of two variables, t and x, since, for each t ≥ 0, u(t) ∈ L^p. Thus we frequently write u(t)(x) = u(t, x), and equation (7.5.3) is to be interpreted as an equation in two variables.
Applying this theorem to the semigroup considered in Examples 7.4.2 and 7.5.1 with L = L^p, 1 ≤ p < ∞, it is clear that this semigroup satisfies equation (7.5.3), where

    u(t, x) = T_t f(x) = f(x - ct)

and

    A f = -c df/dx,    f ∈ D(A).

These relations can, in turn, be interpreted as meaning that u(t, x) satisfies the first-order partial differential equation

    ∂u/∂t + c ∂u/∂x = 0    (7.5.7)

with the initial condition

    u(0, x) = f(x).

Remark 7.5.2. It is important to stress the large difference in the two interpretations of this problem as embodied in equations (7.5.3) and (7.5.7). From the point of view of (7.5.7), u(t, x) is thought of as a function of isolated coordinates t and x that evolve independently and whose derivatives ∂u/∂t and ∂u/∂x are evaluated at specific points in the (t, x)-plane. However, in the semigroup approach that leads to (7.5.3), we are considering the evolution in time of a family of functions, and the derivative du(t)/dt is to be thought of as taken over an entire ensemble of points. This is made somewhat clearer when we take into account that u(t) = T_t f has a time derivative u'(t_0) at a point t_0 if (7.5.2) is satisfied, that is,

    lim_{t→t_0} ∫_{-∞}^{∞} |(u(t)(x) - u(t_0)(x))/(t - t_0) - u'(t_0)(x)|^p dx = 0.

Moreover, u(t)(x) and u'(t)(x) with fixed t are defined as functions of x only up to a set of measure zero. □

7.6 Infinitesimal Operators for Semigroups Generated by Systems of Ordinary Differential Equations

We now turn to an explicit calculation of the infinitesimal operators for the semigroups {P_t}_{t≥0} and {U_t}_{t≥0} generated by a d-dimensional system of ordinary differential equations

    dx/dt = F(x)    (7.6.1a)

or

    dx_i/dt = F_i(x),    i = 1, ..., d,    (7.6.1b)

where x = (x_1, ..., x_d).


The semigroup of transformations {Stlt2:0 corresponding to equations
(7.6.1) is defined by the formula
St(x0 )

= x(t),

(7.6.2)

where x(t) is the solution of (7.6.1) corresponding to the initial condition


(7.6.3)
We will assume that the F, have continuous derivatives 8F,j8x;, i,j =
1, ... , d, and that for every x 0 e Jld the solution x(t) exists for all t E R.
This guarantees that (7.6.2) actually defines a group of transformations.
Because of a well-known theorem on the continuous dependence of solutions
of differential equations on the initial condition, {St}t;::o is a dynamical
system (see Example 7.1.1).
As the derivation of the infinitesimal operator A_K for the Koopman operator is simpler, we start from there. By definition we have

    U_t f(x^0) = f(S_t(x^0)).

Therefore

    (U_t f(x^0) - f(x^0))/t = (f(S_t(x^0)) - f(x^0))/t = (f(x(t)) - f(x^0))/t,

so that, if f is continuously differentiable with compact support, then by the mean value theorem

    (U_t f(x^0) - f(x^0))/t = Σ_{i=1}^d (∂f/∂x_i)(x(θt)) x_i'(θt) = Σ_{i=1}^d (∂f/∂x_i)(x(θt)) F_i(x(θt)),

where 0 < θ < 1. Now by using equation (7.6.2), we obtain

    (U_t f(x^0) - f(x^0))/t = Σ_{i=1}^d (∂f/∂x_i)(S_{θt}(x^0)) F_i(S_{θt}(x^0)).    (7.6.4)

Since the derivatives ∂f/∂x_i are continuous with compact support, the right-hand side of (7.6.4) converges as t → 0 uniformly for all x^0. Thus (7.6.4) has a strong limit in L^∞, and the infinitesimal operator A_K is given by

    A_K f = Σ_{i=1}^d F_i ∂f/∂x_i.    (7.6.5)
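Formula (7.6.5) can be tested on a system whose flow is known explicitly. In the Python sketch below (the planar system dx/dt = y, dy/dt = -x, whose flow S_t is a rotation of the plane, and the observable f are illustrative choices), the difference quotient (U_t f - f)/t approaches A_K f = Σ F_i ∂f/∂x_i at a chosen point:

```python
import math

# Sketch of the Koopman infinitesimal operator (7.6.5) for the planar
# system dx/dt = y, dy/dt = -x, whose flow S_t is an explicit rotation.
# We check that (U_t f - f)/t = (f(S_t(z)) - f(z))/t approaches
# A_K f = F_1 df/dx + F_2 df/dy = y*df/dx - x*df/dy at a point z.
def S(t, x, y):
    return (x * math.cos(t) + y * math.sin(t),
            -x * math.sin(t) + y * math.cos(t))

f = lambda x, y: x * x * y

def A_K(x, y):
    # F = (y, -x), grad f = (2*x*y, x**2)
    return y * (2 * x * y) + (-x) * (x * x)

x0, y0 = 0.8, -0.3  # illustrative point
errs = []
for t in (1e-2, 1e-3, 1e-4):
    xt, yt = S(t, x0, y0)
    errs.append(abs((f(xt, yt) - f(x0, y0)) / t - A_K(x0, y0)))

print(errs)  # shrinking with t
```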
Observe that equation (7.6.5) was derived only for functions f with some special properties, namely, continuously differentiable f with compact support. These functions do not form a dense set in L^∞, which is not surprising since it can be proved that the semigroup {U_t}_{t≥0} is not, in general, continuous in L^∞. It does become continuous in a subspace of L^∞ consisting of all continuous functions with compact support (see Remark 7.6.2).

Hence, if f is continuously differentiable with compact support, then by Theorem 7.5.1 for such f the function

    u(t, x) = U_t f(x)

satisfies the first-order partial differential equation (7.5.3). From (7.6.5) it may be written as

    ∂u/∂t = Σ_{i=1}^d F_i(x) ∂u/∂x_i.    (7.6.6)

Remark 7.6.1. It should be noted that the same equation can be immediately derived for u(t, x) = f(S_t(x)) by differentiating the equality u(t, S_{-t}(x)) = f(x) with respect to t. In this case f may be an arbitrary continuously differentiable function, not necessarily having compact support. However, in this case (7.6.6) is satisfied locally at every point (t, x) and is not an evolution equation in L^∞ (cf. Remark 7.5.2). □
We now turn to a derivation of the infinitesimal operator for the semigroup of Frobenius-Perron operators generated by the semigroup of (7.6.1a). This is difficult to do if we start from the formal definition of the Frobenius-Perron operator, that is,

    ∫_A P_t f(x) μ(dx) = ∫_{S_t^{-1}(A)} f(x) μ(dx)    for A ∈ A.

However, the derivation is straightforward if we start from the fact that the Frobenius-Perron and Koopman operators are adjoint, that is,

    ⟨P_t f, g⟩ = ⟨f, U_t g⟩.    (7.6.7)

Subtract ⟨f, g⟩ from both sides of (7.6.7) to give

    ⟨P_t f - f, g⟩ = ⟨f, U_t g - g⟩

or, after division on both sides by t,

    ⟨(P_t f - f)/t, g⟩ = ⟨f, (U_t g - g)/t⟩.    (7.6.8)

Now let f ∈ D(A_FP) and g ∈ D(A_K), where A_FP and A_K denote, respectively, the infinitesimal operators for the semigroups of Frobenius-Perron and Koopman operators. Take the limit as t → 0 in (7.6.8) to obtain

    ⟨A_FP f, g⟩ = ⟨f, A_K g⟩.    (7.6.9)

However, from equation (7.6.5) the right-hand side of (7.6.9) can be written as

    ⟨f, A_K g⟩ = ⟨f, Σ_{i=1}^d F_i ∂g/∂x_i⟩,

provided g is a continuously differentiable function with compact support. If we write out this scalar product explicitly and note that X = R^d and dμ = dx_1 ··· dx_d = dx, we obtain

    ⟨f, A_K g⟩ = ∫_{R^d} f(x) Σ_{i=1}^d F_i(x) (∂g/∂x_i)(x) dx

for f ∈ D(A_FP), which is also continuously differentiable. Since g has compact support,

    ∫_{R^d} ∂(f F_i g)/∂x_i dx = 0,

and thus

    ∫_{R^d} f F_i (∂g/∂x_i) dx = -∫_{R^d} (∂(f F_i)/∂x_i) g dx,

which is a d-dimensional version of the "integration by parts" formula.


From this and equation (7.6.9), we finally obtain

    ⟨A_FP f, g⟩ = ⟨-Σ_{i=1}^d ∂(f F_i)/∂x_i, g⟩.

This formula holds for every continuously differentiable f ∈ D(A_FP) and for every continuously differentiable g with compact support. Such a function g is automatically contained in D(A_K). Therefore

    A_FP f = -Σ_{i=1}^d ∂(f F_i)/∂x_i    (7.6.10)

for continuously differentiable f ∈ D(A_FP). Again, by using Theorem 7.5.1, we conclude that the function

    u(t, x) = P_t f(x)

satisfies the partial differential equation (continuity equation)

    ∂u/∂t + Σ_{i=1}^d ∂(u F_i)/∂x_i = 0.    (7.6.11)
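For a one-dimensional system the continuity equation (7.6.11) can be verified directly. In this Python sketch (the field F(x) = -x, whose flow S_t(x) = x·exp(-t) and Frobenius-Perron semigroup are known in closed form, and the initial density shape are illustrative choices), the residual ∂u/∂t + ∂(uF)/∂x is computed by centered finite differences and found to be near zero:

```python
import math

# Sketch: for dx/dt = F(x) = -x the flow is S_t(x) = x*exp(-t), and the
# Frobenius-Perron semigroup is given explicitly by
# u(t, x) = P_t f(x) = exp(t) * f(x*exp(t)).  We verify by centered finite
# differences that u satisfies the continuity equation (7.6.11):
# du/dt + d(u*F)/dx = 0.
f = lambda x: math.exp(-x * x)          # initial density shape (unnormalized)
u = lambda t, x: math.exp(t) * f(x * math.exp(t))

t0, x0, h = 0.3, 0.5, 1e-5              # evaluation point and step (illustrative)
du_dt = (u(t0 + h, x0) - u(t0 - h, x0)) / (2 * h)
uF = lambda t, x: u(t, x) * (-x)        # flux u*F with F(x) = -x
duF_dx = (uF(t0, x0 + h) - uF(t0, x0 - h)) / (2 * h)

residual = du_dt + duF_dx
print(residual)  # near 0
```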

Example 7.6.1. As a special case of the system (7.6.1) of ordinary differential equations, let d = 2n and consider a Hamiltonian system whose dynamics are governed by the canonical equations of motion (Hamilton's equations)

    dq_i/dt = ∂H/∂p_i,    dp_i/dt = -∂H/∂q_i,    i = 1, ..., n,    (7.6.12)

where H(p, q) is the system Hamiltonian. In systems of this type, q and p are referred to as the generalized position and momenta, respectively, whereas H is called the energy. Equation (7.6.11) for Hamiltonian systems takes the form

    ∂u/∂t + Σ_{i=1}^n (∂u/∂q_i ∂H/∂p_i - ∂u/∂p_i ∂H/∂q_i) = 0,

which is often written as

    ∂u/∂t + [u, H] = 0,

where [u, H] is the Poisson bracket of u with H. For Hamiltonian systems, the change with time of an arbitrary function g of the variables q_1, ..., q_n, p_1, ..., p_n is given by

    dg/dt = [g, H].
In particular, if we take g to be a function of the energy H, then

    dg/dt = (dg/dH)(dH/dt) = (dg/dH)[H, H] = 0

since [H, H] = 0. Thus any function of the generalized energy H is a constant of the motion. □
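This conservation is easy to confirm for a concrete system. In the Python sketch below (the harmonic-oscillator Hamiltonian and initial condition are illustrative), the exact flow of H(p, q) = (p² + q²)/2 under Hamilton's equations dq/dt = p, dp/dt = -q is a rotation of the (q, p) plane, and H is constant along it:

```python
import math

# Sketch: for H(p, q) = (p^2 + q^2)/2, Hamilton's equations (7.6.12) read
# dq/dt = dH/dp = p, dp/dt = -dH/dq = -q; the exact flow is a rotation in
# the (q, p) plane, so the energy H is a constant of the motion.
def flow(t, q, p):
    return (q * math.cos(t) + p * math.sin(t),
            -q * math.sin(t) + p * math.cos(t))

H = lambda q, p: 0.5 * (p * p + q * q)

q0, p0 = 1.0, 0.5  # illustrative initial condition
drift = max(abs(H(*flow(t, q0, p0)) - H(q0, p0))
            for t in [0.1 * k for k in range(100)])

print(drift)  # zero up to rounding
```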

Remark 7.6.2. The semigroup of Frobenius-Perron operators {P_t}_{t≥0} corresponding to the system {S_t}_{t≥0} generated by equation (7.6.1) is continuous. To show this, note that, since S_t is invertible (S_t^{-1} = S_{-t}), by Corollary 3.2.1 we have

    P_t f(x) = f(S_{-t}(x)) L_t(x),    (7.6.13)

where L_t is the Jacobian of the transformation S_{-t}. Thus, for every continuous f with compact support,

    P_t f(x) → P_{t_0} f(x)    as t → t_0,

uniformly with respect to x. This implies that

    lim_{t→t_0} ‖P_t f - P_{t_0} f‖ = lim_{t→t_0} ∫_{R^d} |P_t f(x) - P_{t_0} f(x)| dx = 0,

since the integrals are, in actuality, over a bounded set. Because continuous functions with compact support form a dense subset of L^1, this completes the proof that {P_t}_{t≥0} is continuous.

Much the same argument holds for the semigroup {U_t}_{t≥0} if we restrict ourselves to continuous functions with compact support. In this case, from the relation

    U_t f(x) = f(S_t(x)),

it immediately follows that U_t f is uniformly convergent to U_{t_0} f as t → t_0. For this class of functions the proof of Theorem 7.5.1 can be repeated, thus showing that equation (7.5.3) is true for f ∈ D(A_K). □
In the whole space L 00 , it may certainly be the case that {Uth~o is not
a continuous semigroup. As an example, consider the differential equation

dx
dt

= -c

whose corresponding dynamical system is Stx = x- ct. Thus the semigroup


{Uth~o is given by Utf(x) = l(x- ct). As we know from Example 7.4.2,
when c -::/= 0, this semigroup is certainly not continuous in L 00
The continuity of {Pt }t~ 0 is very important since it proves that the set
'D(AFP) is dense in L 1 . Using equation (7.6.13) it may also be shown that
'D(AFP) contains all I with compact support, that have continuous firstand second-order derivatives.
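The contrast between the two norms can be made concrete with the discontinuous indicator f = 1_[0,1] (an illustrative choice, not from the text): the supremum distance between U_t f and f stays at 1 for every t > 0, while the L¹ distance shrinks linearly in t.

```python
# For U_t f(x) = f(x - ct) with the discontinuous f = 1_[0,1]:
# sup_x |U_t f(x) - f(x)| stays 1 for every t > 0 (no L^inf continuity),
# while the L^1 distance ~ 2*c*t -> 0 (L^1 continuity of translation).
c = 1.0

def f(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def sup_dist(t, n=60000):
    # sample x on a grid covering [-1, 2]
    return max(abs(f(-1 + 3 * j / n - c * t) - f(-1 + 3 * j / n)) for j in range(n))

def l1_dist(t, n=60000):
    # Riemann sum of |U_t f - f| over [-1, 2]
    dx = 3.0 / n
    return sum(abs(f(-1 + k * dx - c * t) - f(-1 + k * dx)) for k in range(n)) * dx

for t in (0.1, 0.01, 0.001):
    print(t, sup_dist(t), round(l1_dist(t), 3))
```

This is the same phenomenon as in Example 7.4.2: translation is continuous in L¹ but not in L^∞.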


7.7 Applications of the Semigroups of the Frobenius-Perron and Koopman Operators

After developing the concept of the semigroups of the Frobenius-Perron operators in Section 7.4, and introducing the general notion of an infinitesimal operator in Section 7.5 and of infinitesimal operators for semigroups generated by a system of ordinary differential equations in Section 7.6, we are now in a position to examine the utility and applications of these semigroups to questions concerning the existence of invariant measures and ergodicity. This material forms the core of this and the following section.

Theorem 7.7.1. Let (X, 𝒜, μ) be a measure space, and S_t: X → X a family of nonsingular transformations. Also let P_t: L¹ → L¹ be the Frobenius-Perron operator corresponding to {S_t}_{t≥0}. Then the measure

$$\mu_f(A) = \int_A f(x)\,\mu(dx)$$

is invariant with respect to {S_t}_{t≥0} if and only if P_t f = f for all t ≥ 0.

Proof. The proof is trivial, since the invariance of μ_f implies

$$\int_{S_t^{-1}(A)} f(x)\,\mu(dx) = \int_A f(x)\,\mu(dx) \qquad \text{for } A \in \mathcal{A},$$

which, with the definition of P_t, implies P_t f = f. The converse is equally easy to prove.

Now assume that μ_f is invariant. Since by the preceding theorem we know that P_t f = f, and

$$A_{FP}f = \lim_{t\to 0}\frac{P_t f - f}{t},$$

then A_FP f = 0. Thus the condition A_FP f = 0 is necessary for μ_f to be invariant. To demonstrate that A_FP f = 0 is also sufficient for μ_f to be invariant is not so easy, since we must pass from the infinitesimal operator to the semigroup. To deal with this very general and difficult problem, we must examine the way in which semigroups are constructed from their infinitesimal operators. This construction is very elegantly demonstrated by the Hille-Yosida theorem, which is described in Section 7.8.

Analogously to the way in which the semigroup of the Frobenius-Perron operator is employed in studying invariant measures of a semidynamical system {S_t}_{t≥0}, the semigroup of the Koopman operator can be used to study the ergodicity of {S_t}_{t≥0}. We start by stating the following theorem.

Theorem 7.7.2. A semidynamical system {S_t}_{t≥0}, with nonsingular transformations S_t: X → X, is ergodic if and only if the fixed points of {U_t}_{t≥0} are constant functions.


Proof. The proof is quite similar to that of Theorem 4.2.1. First note that if {S_t}_{t≥0} is not ergodic then there is an invariant nontrivial subset C ⊂ X, that is,

$$S_t^{-1}(C) = C \qquad \text{for } t \ge 0.$$

By setting f = 1_C, we have

$$U_t f = 1_C \circ S_t = 1_{S_t^{-1}(C)} = 1_C = f.$$

Since C is not a trivial set, f is not a constant function (cf. Theorem 4.2.1). Thus, if {S_t}_{t≥0} is not ergodic, then there is a nonconstant fixed point of {U_t}_{t≥0}.

Conversely, assume there exists a nonconstant fixed point f of {U_t}_{t≥0}. Then it is possible to find a number r such that the set

$$C = \{x\colon f(x) < r\}$$

is nontrivial (cf. Figure 4.2.1). Since, for each t ≥ 0,

$$S_t^{-1}(C) = \{x\colon S_t(x) \in C\} = \{x\colon f(S_t(x)) < r\} = \{x\colon U_t f(x) < r\} = \{x\colon f(x) < r\} = C,$$

the subset C is invariant, implying that {S_t}_{t≥0} is not ergodic. ∎


Proceeding further with an examination of the infinitesimal operator generated by the Koopman operator, note that the condition U_t f = f, t ≥ 0, implies that

$$A_K f = \lim_{t\to 0}\frac{U_t f - f}{t} = 0.$$

Thus, if the only solutions of A_K f = 0 are constant, then the semidynamical system {S_t}_{t≥0} must be ergodic.

Example 7.7.1. In this example we consider the ergodic motion of a point on a d-dimensional torus, which is a generalization of the rotation of the circle treated in Example 7.3.1. We first note that the unit circle S¹ is a circle of radius 1, or

$$S^1 = \{m\colon m = e^{ix},\ x \in R\}.$$

Formally, the d-dimensional torus T^d is defined as the Cartesian product of d unit circles S¹, that is,

$$T^d = S^1\times\cdots\times S^1 = \{(m_1,\dots,m_d)\colon m_k = e^{ix_k},\ x_k \in R,\ k = 1,\dots,d\}$$

(cf. Example 6.8.1 where we introduced the two-dimensional torus). T^d is clearly a d-dimensional Riemannian manifold, and the functions m_k = e^{ix_k}, k = 1,…,d, give a one-to-one correspondence between points on the torus T^d and points on the Cartesian product

$$[0,2\pi)\times\cdots\times[0,2\pi) \qquad (d\ \text{times}). \qquad (7.7.1)$$

The x_k have an important geometrical interpretation since they are arc lengths on S¹. The natural Borel measure on S¹ is generated by these arc lengths and, by Fubini's theorem, these measures, in turn, generate a Borel measure on T^d. Thus, from a measure-theoretic point of view, we identify T^d with the Cartesian product (7.7.1), and the measure μ on T^d with the Borel measure on R^d. We have, in fact, used exactly this identification in the intuitively simpler cases d = 1 (r-adic transformation; see Example 4.1.1 and Remark 4.1.2) and d = 2 (Anosov diffeomorphism; see Example 4.1.4 and Remark 4.1.6). The disadvantage of this identification is that curves that are continuous on the torus may not be continuous on the Cartesian product (7.7.1).
Thus we consider a dynamical system {S_t}_{t∈R} that, in the coordinate system {x_k}, is defined by

$$S_t(x_1,\dots,x_d) = (x_1 + \omega_1 t,\dots,x_d + \omega_d t) \pmod{2\pi}.$$

We call this system rotation on the torus with angular velocities ω_1,…,ω_d. Since det(dS_t(x)/dx) = 1, the transformation S_t preserves the measure. We will prove that {S_t}_{t∈R} is ergodic if and only if the angular velocities ω_1,…,ω_d are linearly independent over the ring of integers. This linear independence means that the only integers k_1,…,k_d satisfying

$$k_1\omega_1 + \cdots + k_d\omega_d = 0 \qquad (7.7.2)$$

are k_1 = ⋯ = k_d = 0.

To prove this, we will use Theorem 7.7.2. Choose f ∈ L²(T^d) and assume U_t f = f for t ∈ R, where U_t f = f ∘ S_t is the group of Koopman operators corresponding to S_t. Write f as a Fourier series

$$f(x_1,\dots,x_d) = \sum a_{k_1\dots k_d}\exp[i(k_1x_1 + \cdots + k_dx_d)],$$

where the summation is taken over all possible integers k_1,…,k_d. Substitution of this series into the identity f(x) = f(S_t(x)) yields

$$\sum a_{k_1\dots k_d}\exp[i(k_1x_1 + \cdots + k_dx_d)] = \sum a_{k_1\dots k_d}\exp[it(\omega_1k_1 + \cdots + \omega_dk_d)]\exp[i(k_1x_1 + \cdots + k_dx_d)].$$

As a consequence we must have

$$a_{k_1\dots k_d} = a_{k_1\dots k_d}\exp[it(\omega_1k_1 + \cdots + \omega_dk_d)] \qquad \text{for } t \in R \qquad (7.7.3)$$

and all sequences k_1,…,k_d. Equation (7.7.3) will be satisfied either when a_{k_1…k_d} = 0 or when (7.7.2) holds. If ω_1,…,ω_d are linearly independent, then the only Fourier coefficient that can be different from zero is a_{0…0}. In this case, then, f(x) = a_{0…0} is constant and the ergodicity of {S_t}_{t∈R} is proved.

Conversely, if the ω_1,…,ω_d are not linearly independent, and condition (7.7.2) is thus satisfied for a nontrivial sequence k_1,…,k_d, then (7.7.3) holds for a_{k_1…k_d} = 1. In this case the nonconstant function

$$f(x) = \exp[i(k_1x_1 + \cdots + k_dx_d)]$$

satisfies f(x) = f(S_t(x)), and {S_t}_{t∈R} is not ergodic. □
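The dichotomy can be illustrated numerically: along the trajectory started at the origin, the time average of f(x) = exp[i(k_1x_1 + k_2x_2)] is governed by k_1ω_1 + k_2ω_2. A sketch for d = 2 (the frequency choices are illustrative):

```python
import cmath
import math

# Sketch: time averages of f(x) = exp(i(k1*x1 + k2*x2)) along the trajectory
# of the torus rotation S_t(0) = t*(w1, w2) (mod 2pi), started at the origin.
def time_average(k, w, T=2000.0, n=200000):
    dt = T / n
    s = sum(cmath.exp(1j * (k[0] * w[0] + k[1] * w[1]) * (j * dt))
            for j in range(n))
    return abs(s) / n

# Independent frequencies (w2/w1 irrational): average -> 0 for (k1,k2) != (0,0).
print(time_average((1, 1), (1.0, math.sqrt(2))))
# Dependent frequencies with k.w = 0: f is invariant along the flow, average = 1.
print(time_average((2, -1), (1.0, 2.0)))
```

In the first case the time average of every nonconstant Fourier mode tends to the space average 0, as ergodicity demands; in the second, the invariant mode with k·ω = 0 is the nonconstant fixed point constructed in the example.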

Remark 7.7.1. The reason why rotation on the torus is so important stems from its frequent occurrence in applied problems. As a simple example, consider a system of d independent and autonomous oscillators

$$\frac{dq_k}{dt} = p_k, \qquad \frac{dp_k}{dt} = -\omega_k^2 q_k, \qquad k = 1,\dots,d, \qquad (7.7.4)$$

where q_1,…,q_d are the positions of the oscillators and p_1,…,p_d are their corresponding velocities. For this system the total energy of each oscillator is given by

$$E_k = \tfrac12 p_k^2 + \tfrac12\omega_k^2 q_k^2, \qquad k = 1,\dots,d,$$

and it is clear that the E_k are constants of the motion. Assuming that E_1,…,E_d are given and positive, equations (7.7.4) may be solved to give

$$p_k(t) = A_k\omega_k\cos(\omega_k t + \alpha_k), \qquad q_k(t) = A_k\sin(\omega_k t + \alpha_k),$$

where A_k = √(2E_k)/ω_k and the α_k are determined, modulo 2π, by the initial conditions of the system. Set p̃_k = p_k/(A_kω_k) and q̃_k = q_k/A_k, so that the vector (p̃(t), q̃(t)) describes the position of a point on a d-dimensional torus moving with the angular velocities ω_1,…,ω_d. Thus, for fixed and positive E_1,…,E_d, all possible trajectories of the system (7.7.4) are described by the group {S_t}_{t∈R} of the rotation on the torus.
At first it might appear that the set of oscillators described by (7.7.4) is
a very special mechanical system. Such is not the case, as equations (7.7.4)
are approximations to a very general situation. We present an argument
below that supports this claim.
Consider a Hamiltonian system

$$\frac{dq_k}{dt} = \frac{\partial H}{\partial p_k}, \qquad \frac{dp_k}{dt} = -\frac{\partial H}{\partial q_k}, \qquad k = 1,\dots,d.$$

Typically the energy H has the form

$$H(p,q) = \tfrac12\sum_{j,k} a_{jk}(q)\,p_jp_k + V(q), \qquad (7.7.5)$$


where the first term represents the kinetic energy and V is a potential function. Because the first term in H is associated with the kinetic energy, the quadratic form ∑_{j,k} a_{jk}(q)ξ_jξ_k is symmetric and positive definite. Further, if q⁰ is a stable equilibrium point, then

$$\frac{\partial V}{\partial q_k}\Big|_{q=q^0} = 0, \qquad k = 1,\dots,d,$$

and the quadratic form

$$\sum_{j,k}\frac{\partial^2 V}{\partial q_j\,\partial q_k}\Big|_{q=q^0}\,\xi_j\xi_k$$

is also positive definite (we neglect some special cases in which it might be semidefinite). Further, we assume that H(0, q⁰) = V(q⁰) = 0 since the potential is only defined up to an additive constant. Thus, developing H in a Taylor series in the neighborhood of (0, q⁰), and neglecting terms of order three and higher, we obtain

$$H(p,q) = \tfrac12\sum_{j,k} a_{jk}\,p_jp_k + \tfrac12\sum_{j,k} b_{jk}\,(q_j - q_j^0)(q_k - q_k^0), \qquad (7.7.6)$$

where a_{jk} = a_{jk}(q⁰) and b_{jk} = (∂²V/∂q_j∂q_k)|_{q⁰}. Both the quadratic forms ∑_{j,k} a_{jk}ξ_jξ_k and ∑_{j,k} b_{jk}ξ_jξ_k are symmetric and positive definite. With approximation (7.7.6), the original Hamiltonian equations (7.7.5) may be rewritten as

$$\frac{d(q_k - q_k^0)}{dt} = \sum_j a_{kj}p_j, \qquad \frac{dp_k}{dt} = -\sum_j b_{kj}(q_j - q_j^0), \qquad (7.7.7)$$

where the variables p_k and q_k − q_k^0 denote, respectively, the deviation of the system from the equilibrium point (0, q⁰).

Since the matrices A = (a_{jk}) and B = (b_{jk}) are symmetric and positive definite, there exists a nonsingular matrix C such that (Gantmacher, 1959)

$$CA^{-1}C^T = I \qquad \text{and} \qquad CBC^T = \begin{pmatrix}\lambda_1 & & 0\\ & \ddots & \\ 0 & & \lambda_d\end{pmatrix}$$

with positive elements λ_i on the diagonal. By introducing the new variables q − q⁰ = C^Tq̃ and p = C^{-1}p̃ into equations (7.7.7), we obtain

$$\frac{d\tilde q_k}{dt} = \tilde p_k, \qquad \frac{d\tilde p_k}{dt} = -\lambda_k\tilde q_k, \qquad k = 1,\dots,d. \qquad (7.7.8)$$

This new system is completely equivalent to our system (7.7.4) of independent oscillators with angular velocities given by ω_k² = λ_k.
Finally we note that, although our approximation shows the correspondence between rotation on the torus and Hamiltonian systems, the terms we neglected in our expansion of H might play a very important role in modifying the eventual asymptotic behavior of a Hamiltonian system. □
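The reduction to normal modes can be made concrete in a small case. Assuming unit masses (A = I) and an illustrative 2×2 coupling matrix B (two unit oscillators joined by a spring of stiffness κ — a choice not from the text), the λ_k of (7.7.8) are the eigenvalues of B and the mode frequencies are ω_k = √λ_k:

```python
import math

# Sketch: normal-mode frequencies for the quadratic Hamiltonian (7.7.6) with
# d = 2, A = I (unit masses) and an illustrative coupling matrix B:
# two unit-frequency oscillators joined by a spring of stiffness kappa.
kappa = 0.5
B = [[1 + kappa, -kappa],
     [-kappa, 1 + kappa]]

# With A = I the reduction to (7.7.8) is the eigenvalue problem for B:
# lambda_k = eigenvalues of the symmetric 2x2 matrix B, omega_k = sqrt(lambda_k).
tr = B[0][0] + B[1][1]
det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
disc = math.sqrt(tr * tr - 4 * det)
lam = [(tr - disc) / 2, (tr + disc) / 2]
omegas = [math.sqrt(l) for l in lam]
print(omegas)  # in-phase mode omega = 1, out-of-phase mode omega = sqrt(1 + 2*kappa)
```

For this κ the frequency ratio is √2, which is irrational, so by Example 7.7.1 the corresponding linearized motion fills its two-dimensional torus ergodically.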

Remark 7.7.2. Note that the statement and proof of Theorem 7.7.2 are
virtually identical with the corresponding discrete time result given in Theorem 4.2.1. Indeed, necessary and sufficient conditions for ergodicity, mixing, and exactness using the Frobenius-Perron operator, identical to those
in Theorem 4.4.1, can be stated by replacing n by t. Analogously, conditions
for ergodicity and mixing in continuous time systems using the Koopman
operator can be obtained from Proposition 4.4.1 by setting n = t. Since all
of these conditions are completely equivalent we will not rewrite them for
continuous time systems. 0
Example 7.7.2. To illustrate the property of mixing in a continuous time system we consider a model for an ideal gas in R³ adapted from Cornfeld, Fomin, and Sinai [1982]; however, our proof of the mixing property is based on a different technique. At any given moment of time the state of this system is described by the set of pairs

$$y = \{(x_i, v_i)\},$$

where x_i denotes the position, and v_i the velocity, of a particle. We emphasize that y is a set of pairs and not a sequence of pairs, which means that the coordinate pairs (x_i, v_i) are not taken in any specific order. Physically this means that the particles are not distinguishable. It is further assumed that the gas is sufficiently dilute, both in spatial position and in velocity, so that the only states that must be considered are such that in every bounded set B ⊂ R⁶ there is, at most, a finite number of pairs (x_i, v_i).
The collection of all possible states of this gas will be denoted by Y, and we assume that the motion of each particle of the gas is governed by a group of transformations S_t: Y → Y given by

$$S_t(y) = \{(x_i + v_it,\, v_i)\}$$

or, more compactly, by S_t(y) = {s_t(x_i, v_i)}, where {s_t}_{t∈R} is the family of transformations in R⁶ such that

$$s_t(x, v) = (x + vt,\, v).$$

Thus particles move with a constant speed and do not interact. The surprising result, proved below, is that this system is mixing.

To study the asymptotic properties of {S_t}_{t∈R}, we must define a σ-algebra and a measure on Y. We do this by first introducing a special measure on R⁶, which is the phase space for the motion of a single particle. Let g be a density on R³. As usual, the measure associated with g is

$$m_g(A) = \int_A g(v)\,dv$$

for every Borel set A ⊂ R³, and the measure m on R⁶ = R³ × R³ is defined as the product of the usual Borel measure and m_g, that is,

$$m(B) = \int\!\!\int_B g(v)\,dx\,dv.$$

From a physical point of view this definition of the measure simply reflects the fact that the particle positions are uniformly distributed in R³, whereas the velocities are distributed with a given density g, for instance, the Maxwellian g(v) = c·exp(−|v|²).
With these comments we now proceed to define a σ-algebra and a measure on Y. Let B_1,…,B_n be a given sequence of bounded Borel subsets of R⁶ for an arbitrary n, and k_1,…,k_n be a given sequence of integers. We use C(B_1,…,B_n; k_1,…,k_n) to denote the set of all y = {(x_i, v_i)} such that the number of elements (x_i, v_i) that belong to B_j is equal to k_j, that is,

$$C(B_1,\dots,B_n; k_1,\dots,k_n) = \{y \in Y\colon \#(y\cap B_1) = k_1,\dots,\#(y\cap B_n) = k_n\}, \qquad (7.7.9)$$

where #Z denotes the number of elements of the set Z. Sets of the form (7.7.9) are called cylinders. If the sets B_1,…,B_n are disjoint, then the cylinder is said to be proper. For every proper cylinder, we define

$$\mu(C(B_1,\dots,B_n; k_1,\dots,k_n)) = \frac{[m(B_1)]^{k_1}\cdots[m(B_n)]^{k_n}}{k_1!\cdots k_n!}\exp\Bigl[-\sum_{i=1}^{n}m(B_i)\Bigr]. \qquad (7.7.10)$$

From (7.7.10) it follows immediately that

$$\mu(C(B_1,\dots,B_n; k_1,\dots,k_n)) = \mu(C(B_1;k_1))\cdots\mu(C(B_n;k_n)) \qquad (7.7.11)$$

whenever the sets B_1,…,B_n are mutually disjoint.


It is also easy to calculate the measure of C(B_1, B_2; k_1, k_2) when B_1 and B_2 are not disjoint, by writing C as the union of proper cylinders. Thus, y belongs to C(B_1, B_2; k_1, k_2) if, for some r ≤ min(k_1, k_2), the set B_1^0 = B_1 \ B_2 contains k_1 − r particles, B^0 = B_1 ∩ B_2 contains r particles, and B_2^0 = B_2 \ B_1 has k_2 − r particles. As a consequence,

$$C(B_1, B_2; k_1, k_2) = \bigcup_{r=0}^{\min(k_1,k_2)}\bigl[C(B_1^0; k_1 - r)\cap C(B^0; r)\cap C(B_2^0; k_2 - r)\bigr],$$

so that

$$\mu(C(B_1, B_2; k_1, k_2)) = \sum_{r=0}^{\min(k_1,k_2)}\frac{[m(B_1^0)]^{k_1-r}[m(B^0)]^{r}[m(B_2^0)]^{k_2-r}}{(k_1-r)!\,r!\,(k_2-r)!}\exp[-m(B_1^0) - m(B^0) - m(B_2^0)]. \qquad (7.7.12)$$


By employing arguments of this type we can calculate the measure μ of any cylinder. However, the formulas for arbitrary cylinders are much more complicated, as it is necessary to sum the various contributions first with respect to q = \binom{n}{2} parameters r_1,…,r_q corresponding to all possible intersections B_i ∩ B_j, i ≠ j, then with respect to \binom{n}{3} parameters corresponding to all possible intersections B_i ∩ B_j ∩ B_l, i ≠ j ≠ l, and so forth.
With respect to the σ-algebra, we define 𝒜 to be the smallest σ-algebra that contains all the cylinders or, equivalently, all proper cylinders. Using standard results from measure theory, it is possible to prove that μ given by (7.7.10) for proper cylinders can be uniquely extended to a measure on 𝒜, and that the characteristic functions of proper cylinders,

$$1_{C(B_1,\dots,B_n;\,k_1,\dots,k_n)},$$

form a linearly dense subset of L²(Y, 𝒜, μ). We omit the proof of these facts as they are quite technical in nature and, instead, turn to consider the asymptotic properties of the system {S_t}_{t∈R} on the phase space Y.
First we note that the measure μ is normalized. To show this, take an arbitrary bounded Borel set B. Then

$$Y = \bigcup_{k=0}^{\infty} C(B; k)$$

since every y belongs to one of the cylinders C(B; k), namely, the one for which #(y ∩ B) = k. As the cylinders C(B; k), k = 0, 1,…, are mutually disjoint, we have

$$\mu(Y) = \sum_{k=0}^{\infty}\mu(C(B;k)) = \sum_{k=0}^{\infty}\frac{[m(B)]^k}{k!}e^{-m(B)} = 1.$$
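The measure (7.7.10) is exactly the law of a Poisson point process, and its Poisson weights can be sampled directly. A sketch using the standard thinning property of Poisson random variables (the numbers M and mB are illustrative): sample the total particle count in a region of m-measure M, thin each particle into a subset B of m-measure mB, and compare the empirical frequencies of the cylinders C(B; k) with the weights in (7.7.10):

```python
import math
import random

# Sketch: empirical check that #(y ∩ B) follows the Poisson law (7.7.10).
random.seed(1)
M = 3.0                      # m-measure of the whole sampled region (illustrative)
mB = 1.2                     # m-measure of a subset B of it (illustrative)

def poisson(lam):
    # inverse-transform sampling of a Poisson(lam) variate
    l, k, p = math.exp(-lam), 0, random.random()
    s = l
    while p > s:
        k += 1
        l *= lam / k
        s += l
    return k

def sample_counts(trials=200000):
    hits = [0] * 10
    for _ in range(trials):
        n = poisson(M)       # total number of particles in the region
        # each particle lands in B independently with probability mB / M
        k = sum(1 for _ in range(n) if random.random() < mB / M)
        if k < 10:
            hits[k] += 1
    return [h / trials for h in hits]

freq = sample_counts()
for k in range(4):
    print(k, round(freq[k], 3), round(mB**k * math.exp(-mB) / math.factorial(k), 3))
```

The empirical and theoretical columns agree to Monte Carlo accuracy, and for disjoint sets the same thinning argument reproduces the product formula (7.7.11).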

Second, the measure μ is invariant with respect to {S_t}_{t∈R}. To show this, note that for every cylinder

$$S_t(C(B_1,\dots,B_n; k_1,\dots,k_n)) = C(s_t(B_1),\dots,s_t(B_n); k_1,\dots,k_n).$$

It is clear that (x, v) ∈ s_t(B_j) if and only if (x̄, v̄) ∈ B_j, where x̄ = x − vt, v̄ = v, and, as a consequence,

$$m(s_t(B_j)) = \int\!\!\int_{s_t(B_j)} g(v)\,dx\,dv = \int\!\!\int_{B_j} g(\bar v)\,d\bar x\,d\bar v = m(B_j).$$

From this equality, m(s_t(B_j)) = m(B_j), and from equation (7.7.10), we therefore have

$$\mu(S_t(C(B_1,\dots,B_n; k_1,\dots,k_n))) = \mu(C(B_1,\dots,B_n; k_1,\dots,k_n))$$

for every proper cylinder. Writing μ_t(E) = μ(S_t(E)) for E ∈ 𝒜, we define for every fixed t a measure μ_t on 𝒜 that is identical with μ for proper cylinders. Since μ is uniquely determined by its values on cylinders, we must have μ_t(E) = μ(E) for all E ∈ 𝒜, and thus the invariance of μ with respect to S_t is proved.
With these results in hand, we now prove that the dynamical system {S_t}_{t∈R} is mixing. Since the characteristic functions of proper cylinders are linearly dense in L²(Y, 𝒜, μ), by Remark 7.7.2 it is sufficient to verify the condition

$$\lim_{t\to\infty}\langle U_t 1_{C_1},\, 1_{C_2}\rangle = \langle 1_{C_1}, 1\rangle\,\langle 1, 1_{C_2}\rangle \qquad (7.7.13)$$

for every two proper cylinders C_1 and C_2. Since

$$U_t 1_{C_1}(y) = 1_{C_1}(S_t(y)) = 1_{S_{-t}(C_1)}(y)$$

and ⟨1_{C_j}, 1⟩ = μ(C_j), condition (7.7.13) is equivalent to

$$\lim_{t\to\infty}\mu(S_{-t}(C_1)\cap C_2) = \mu(C_1)\,\mu(C_2). \qquad (7.7.14)$$

We will verify that (7.7.14) holds only in the simplest case, when each of the cylinders C_j is determined by only one bounded Borel set. Thus we assume

$$C_j = C(B_j; k_j), \qquad j = 1, 2. \qquad (7.7.15)$$

(This is not an essential simplification, since the argument proceeds in exactly the same way for arbitrary proper cylinders. However, in the general case the formulas are so complicated that the simple geometrical ideas behind the calculations are obscured.) When the C_j are given by (7.7.15), the right-hand side of equation (7.7.14) may be easily calculated by (7.7.10). Thus

$$\mu(C_1)\,\mu(C_2) = \frac{[m(B_1)]^{k_1}[m(B_2)]^{k_2}}{k_1!\,k_2!}\exp[-m(B_1) - m(B_2)]. \qquad (7.7.16)$$

To compute the left-hand side of equation (7.7.14), observe that

$$S_{-t}(C(B_1; k_1)) = C(s_{-t}(B_1); k_1),$$

so

$$\mu(S_{-t}(C_1)\cap C_2) = \mu(C(s_{-t}(B_1); k_1)\cap C(B_2; k_2)) = \mu(C(s_{-t}(B_1), B_2; k_1, k_2)). \qquad (7.7.17)$$


With (7.7.12) we have

$$\mu(S_{-t}(C_1)\cap C_2) = \sum_{r=0}^{\min(k_1,k_2)}\frac{[m(B_1^0)]^{k_1-r}[m(B^0)]^{r}[m(B_2^0)]^{k_2-r}}{(k_1-r)!\,r!\,(k_2-r)!}\exp[-m(B_1^0) - m(B^0) - m(B_2^0)], \qquad (7.7.18)$$

where B_1^0 = s_{-t}(B_1) \ B_2, B^0 = s_{-t}(B_1) ∩ B_2, and B_2^0 = B_2 \ s_{-t}(B_1). From our definition of m, we have

$$m(B^0) = \int\!\!\int_{B_2} 1_{B_1}(x + vt,\, v)\,g(v)\,dx\,dv.$$

Since B_1 and B_2 are bounded, 1_{B_1}(x + vt, v) = 0 for almost every point (x, v) ∈ B_2 if t is sufficiently large (except for some points at which v = 0). Thus, by the Lebesgue dominated convergence theorem,

$$\lim_{t\to\infty} m(B^0) = 0. \qquad (7.7.19)$$

Furthermore, since B_2^0 = B_2 \ B^0, it follows that

$$\lim_{t\to\infty} m(B_2^0) = m(B_2). \qquad (7.7.20)$$

Finally, since B_1^0 = s_{-t}(B_1) \ B^0 and s_t is measure preserving,

$$m(B_1^0) = m(s_{-t}(B_1)) - m(B^0) = m(B_1) - m(B^0),$$

and

$$\lim_{t\to\infty} m(B_1^0) = m(B_1). \qquad (7.7.21)$$

Passing to the limit in equation (7.7.18) and using (7.7.19) through (7.7.21) gives (only the r = 0 term survives)

$$\lim_{t\to\infty}\mu(S_{-t}(C_1)\cap C_2) = \frac{[m(B_1)]^{k_1}[m(B_2)]^{k_2}}{k_1!\,k_2!}\exp[-m(B_1) - m(B_2)],$$

which, together with (7.7.16), proves (7.7.14).


From this proof, it should be clear that mixing in this model is a consequence of the following two facts. The first is that, for disjoint B_1 and B_2 and given k_1 and k_2, the events consisting of B_1 containing k_1 particles and B_2 containing k_2 particles are independent [this follows from equation (7.7.11)]. Second, for every two bounded Borel sets B_1 and B_2, the sets s_{-t}(B_1) and B_2 are "almost" disjoint for large t. Taken together these produce the surprising result that mixing can appear in a system without particle interaction. □
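The second fact — the decay of the overlap m(B⁰) = m(s_{−t}(B_1) ∩ B_2) — can be seen numerically in a one-dimensional analog (boxes B_1, B_2 in the (x, v) plane, Gaussian g; all choices illustrative):

```python
import math
import random

# Sketch (1-D analog): m(B^0) = ∫∫_{B2} 1_{B1}(x + v*t, v) g(v) dx dv with
# Gaussian g and boxes B1, B2 in (x, v).  Monte Carlo shows m(B^0) -> 0 as t
# grows: s_{-t}(B1) and B2 become "almost" disjoint, driving the mixing proof.
random.seed(0)
B1 = ((-1.0, 1.0), (-2.0, 2.0))     # x-range, v-range (illustrative)
B2 = ((-1.0, 1.0), (-2.0, 2.0))

def g(v):
    # normalized Gaussian velocity density
    return math.exp(-v * v) / math.sqrt(math.pi)

def m_B0(t, n=100000):
    (xa, xb), (va, vb) = B2
    area = (xb - xa) * (vb - va)
    s = 0.0
    for _ in range(n):
        x = random.uniform(xa, xb)
        v = random.uniform(va, vb)
        # (x, v) in s_{-t}(B1)  <=>  s_t(x, v) = (x + v*t, v) in B1
        if B1[0][0] <= x + v * t <= B1[0][1] and B1[1][0] <= v <= B1[1][1]:
            s += g(v)
    return s * area / n

for t in (0.0, 1.0, 5.0, 50.0):
    print(t, round(m_B0(t), 4))
```

The overlap never reaches exactly zero for any finite t (the particles with v ≈ 0 never leave), which is why the proof needs the dominated convergence theorem rather than a simple disjointness argument.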

Example 7.7.3. The preceding example gave a continuous time dynamical system that was mixing. The phase space of this system was infinite dimensional. This fact is not essential: there is a large class of finite dimensional, mixing, dynamical systems that play an important role in classical mechanics. In this example we briefly describe these systems. An exhaustive treatment requires highly specialized techniques from differential geometry and cannot be given within the measure-theoretic framework that we have adopted. All necessary information can be found in the books by Arnold and Avez [1968] and by Abraham and Marsden [1978], and in the articles by Anosov [1967] and by Smale [1967].

Let M be a compact connected smooth Riemannian manifold. Having M, we define the sphere bundle E as the set of all pairs (m, ξ), where m is an arbitrary point of M and ξ is a unit tangent vector starting at m. This definition can be written as

$$E = \{(m, \xi)\colon m \in M,\ \xi \in T_m,\ \|\xi\| = 1\}.$$

It can be proved that E, with an appropriately defined metric, is also a Riemannian manifold. Thus a measure μ_E is automatically given on E. In a physical interpretation, M is the configuration space of a system that moves with constant speed and E is its phase space. To describe precisely the dynamical system that corresponds to this interpretation we need only the concept of geodesics. Let γ: R → M be a C¹ curve. This curve is called a geodesic if for every point m_0 = γ(t_0) there is an ε > 0 such that for every m_1 = γ(t_1), with |t_1 − t_0| ≤ ε, the length of the arc of γ between the points m_0 and m_1 is equal to the distance between m_0 and m_1. It can be proved that, for every (m, ξ) ∈ E, there exists exactly one geodesic satisfying

$$\gamma(0) = m, \qquad \gamma'(0) = \xi, \qquad \|\gamma'(t)\| = 1 \quad \text{for } t \in R. \qquad (7.7.22)$$

We define a dynamical system {S_t}_{t∈R} on E by setting

$$S_t(m, \xi) = (\gamma(t), \gamma'(t)) \qquad \text{for } t \in R,$$

where the geodesic γ satisfies (7.7.22). This system is called a geodesic flow.

In the case dim M = 2, the geodesic flow has an especially simple interpretation: it describes the motion of a point that moves on the surface M in the absence of external forces and without friction. The motion described by the geodesic flow looks quite specific but, in fact, it represents a rather general situation. If M is the configuration space of a mechanical system with the typical Hamiltonian function (see Remark 7.7.1)

$$H(q, p) = \tfrac12\sum_{j,k} a_{jk}(q)\,p_jp_k + V(q),$$

then it is possible to change the Riemannian metric on M in such a way that trajectories of the system become geodesics.
The behavior of the geodesic flow depends on the geometrical properties of the manifold M, and most of all on its curvature. In the simplest case, dim M = 2, the curvature K is a scalar function and has a clear geometrical interpretation. In order to define K at a point m ∈ M, we consider, in a neighborhood W of m, a triangle made by three geodesics. We denote the angles of that triangle by α_1, α_2, α_3, and its area by σ. Then

$$K(m) = \lim\bigl[(\alpha_1 + \alpha_2 + \alpha_3 - \pi)/\sigma\bigr],$$

where the limit is taken over a sequence of neighborhoods that shrinks to the point m. In the general case, dim M > 2, the curvature must be defined separately for every two-dimensional section of a neighborhood of the point m. (Thus, in this case, the curvature becomes a tensor.) When the curvature of M is negative, the behavior of the geodesic flow is quite specific and highly chaotic. Such flows have been studied since the beginning of the century, starting with Hadamard [1898]. Results were first obtained for manifolds with constant negative curvature and then finally completed by Anosov [1967]. It follows that the geodesic flow on a compact, connected, smooth Riemannian manifold with negative curvature is mixing and even a K-flow (a continuous time analog of a K-automorphism). This fact has some profound consequences for the foundations of classical statistical mechanics. A heuristic geometrical argument of Arnold [1963] shows that the Boltzmann-Gibbs model of a dilute gas (ideal balls with elastic collisions) may be considered as a geodesic flow on a manifold with negative curvature. Thus, such a system is not only ergodic but also mixing. A sophisticated proof of the ergodicity and mixing of the Boltzmann-Gibbs model has been given by Sinai [1963, 1970]. □

7.8 The Hille-Yosida Theorem and Its Consequences

Theorem 7.8.1 (Hille-Yosida). Let A: D(A) → L be a linear operator, where D(A) ⊂ L is a linear subspace of L. In order for A to be an infinitesimal operator for a continuous semigroup of contractions, it is necessary and sufficient that the following three conditions are satisfied:

(a) D(A) is dense in L, that is, every point in L is a strong limit of a sequence of points from D(A);

(b) For each f ∈ L and each λ > 0 there exists a unique solution g ∈ D(A) of the resolvent equation

$$\lambda g - Ag = f; \qquad (7.8.1)$$

(c) For every g ∈ D(A) and λ > 0,

$$\|\lambda g - Ag\|_L \ge \lambda\|g\|_L. \qquad (7.8.2)$$

Further, if A satisfies (a)-(c), then the semigroup corresponding to A is unique and is given by

$$T_t f = \lim_{\lambda\to\infty} e^{tA_\lambda}f, \qquad f \in L, \qquad (7.8.3)$$

where A_λ = λAR_λ and R_λf = g (the resolvent operator) is the unique solution of λg − Ag = f.
Consult Dynkin [1965] or Dunford and Schwartz [1957] for the proof.

The operator A_λ = λAR_λ can be written in several alternative forms, each of which is useful in different situations. Thus, after substitution of g = R_λf into (7.8.1), we have

$$\lambda R_\lambda f - AR_\lambda f = f. \qquad (7.8.4)$$

By applying the operator R_λ to both sides of (7.8.1) and using g = R_λf, we also obtain

$$\lambda R_\lambda g - R_\lambda Ag = g \qquad \text{for } g \in D(A). \qquad (7.8.5)$$

Equations (7.8.4) and (7.8.5) immediately give

$$R_\lambda Af = AR_\lambda f \qquad \text{for } f \in D(A). \qquad (7.8.6)$$

Equation (7.8.4) also gives

$$AR_\lambda f = (\lambda R_\lambda - I)f \qquad \text{for } f \in L, \qquad (7.8.7)$$

where I is the identity operator (If = f for all f). Thus we have three possible representations for A_λ: the original definition,

$$A_\lambda f = \lambda AR_\lambda f, \qquad (7.8.8)$$

or, from (7.8.7),

$$A_\lambda f = \lambda^2 R_\lambda f - \lambda f, \qquad (7.8.9)$$

and, finally, from (7.8.6),

$$A_\lambda f = \lambda R_\lambda Af. \qquad (7.8.10)$$

The representations in (7.8.8) and (7.8.9) hold in the entire space L, whereas (7.8.10) holds in D(A).

From conditions (b) and (c) of the Hille-Yosida theorem, using g = R_λf, it follows that

$$\|\lambda R_\lambda f\|_L \le \|f\|_L. \qquad (7.8.11)$$


Consequently, using (7.8.9),

$$\|A_\lambda f\|_L = \|\lambda^2 R_\lambda f - \lambda f\|_L \le \|\lambda^2 R_\lambda f\|_L + \|\lambda f\|_L \le 2\lambda\|f\|_L,$$

so that the operator exp(tA_λ) can be interpreted as the series

$$e^{tA_\lambda}f = \sum_{n=0}^{\infty}\frac{t^n}{n!}A_\lambda^n f, \qquad (7.8.12)$$

which is strongly convergent.


In addition to demonstrating the existence of a semigroup {Tt}t~o corresponding to a given operator A, the Hille-Yosida theorem also allows us
to determine some properties of {Tth~o
One very interesting corollary is the following. Suppose we have an operator A: V(A) - t L (remembering that L = V) that satisfies conditions
(a)-(c) of the Hille-Yosida theorem, and such that the solution g = R~f
of equation (7.8.1} has the property that R~f 2: 0 for f 2: 0. Then, as we
will show next, Ttf 2: 0 for every f 2: 0.
To see this, note that from (7.8.9} we have
(7.8.13}
where

et~ R" f
2

=L -

oo tn An

n=O

n.1

(>.R~tf.

(7.8.14}

Further, for any f 2: 0, R~f 2: 0 and, by induction, ~/ 2: 0. Thus, from


(7.8.14}, since >. > 0 and t 2: 0, exp(t>.2 R~)f 2: 0 and so, from (7.8.13},
exp(tA~}/ 2: 0. Finally, from (7.8.3}, we have Ttl 2: 0 since it is the limit
of nonnegative functions.
Now suppose that L = L¹ and that the operator λR_λ preserves the integral, that is,

$$\lambda\int_X R_\lambda f(x)\,\mu(dx) = \int_X f(x)\,\mu(dx) \qquad \text{for all } f \in L^1,\ \lambda > 0. \qquad (7.8.15)$$

We will show that these properties imply that

$$\int_X T_t f(x)\,\mu(dx) = \int_X f(x)\,\mu(dx), \qquad f \ge 0,\ t \ge 0.$$

This is straightforward. Since (7.8.14) is strongly convergent, using equation (7.8.15) we obtain

$$\int_X e^{t\lambda^2 R_\lambda}f(x)\,\mu(dx) = \sum_{n=0}^{\infty}\frac{t^n\lambda^n}{n!}\int_X(\lambda R_\lambda)^n f(x)\,\mu(dx) = \sum_{n=0}^{\infty}\frac{t^n\lambda^n}{n!}\int_X f(x)\,\mu(dx) = e^{t\lambda}\int_X f(x)\,\mu(dx). \qquad (7.8.16)$$

Now,

$$\int_X e^{tA_\lambda}f(x)\,\mu(dx) = e^{-\lambda t}\int_X e^{t\lambda^2 R_\lambda}f(x)\,\mu(dx) = \int_X f(x)\,\mu(dx)$$

by the use of equation (7.8.16), and the claim is demonstrated.


These two results may be summarized in the following corollary.

Corollary 7.8.1. Let A: D(A) → L¹ be an operator satisfying conditions (a)-(c) of the Hille-Yosida theorem. If the solution g = R_λf of (7.8.1) is such that λR_λ is a Markov operator, then {T_t}_{t≥0} generated by A is a continuous semigroup of Markov operators.

In fact, in this corollary only conditions (a) and (b) of the Hille-Yosida theorem need be checked, as condition (c) is automatically satisfied for any Markov operator. To see this, set f = λg − Ag and write inequality (7.8.2) in the form

$$\|f\| \ge \|\lambda R_\lambda f\|.$$

This is always satisfied if λR_λ is a Markov operator, as we have shown in Section 3.1 [cf. inequality (3.1.6)].
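The whole circle of ideas can be exercised on a toy example that is not from the text: a two-state continuous-time Markov chain, where the generator is a 2×2 matrix A with columns summing to zero, the resolvent is R_λ = (λI − A)^{-1}, λR_λ is a stochastic matrix, and the Yosida approximation (7.8.3) converges to T_t = e^{tA}. A sketch (all numbers illustrative):

```python
# Sketch: Hille-Yosida machinery for a 2-state Markov chain generator
# A = [[-a, b], [a, -b]] (columns sum to 0).  R_l = (l*I - A)^{-1};
# l*R_l is stochastic (Corollary 7.8.1), and the Yosida approximation
# exp(t*A_l), A_l = l^2*R_l - l*I (7.8.9), approaches T_t = exp(t*A).
a, b = 1.0, 2.0
A = [[-a, b], [a, -b]]

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def resolvent(l):
    # (l*I - A)^{-1} by the 2x2 inverse formula
    m = [[l + a, -b], [-a, l + b]]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def expm(M, terms=60):
    # matrix exponential by its power series (7.8.12)
    out = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        term = [[term[i][k] / n for k in range(2)] for i in range(2)]
        term = mat_mul(term, M)
        out = [[out[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return out

l, t = 200.0, 0.5
R = resolvent(l)
lR = [[l * R[i][j] for j in range(2)] for i in range(2)]
Al = [[l * l * R[i][j] - (l if i == j else 0.0) for j in range(2)] for i in range(2)]
Tt_approx = expm([[t * Al[i][j] for j in range(2)] for i in range(2)])
Tt_exact = expm([[t * A[i][j] for j in range(2)] for i in range(2)])
print(lR[0][0] + lR[1][0])                 # column sum of l*R_l: Markov
print(abs(Tt_approx[0][0] - Tt_exact[0][0]))  # small for large l
```

Taking λ larger shrinks the gap between exp(tA_λ) and exp(tA), mirroring the limit in (7.8.3).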
The Hille-Yosida theorem has several other important applications. The first is that it provides an immediate and simple way to demonstrate that A_FP f = 0 is a sufficient condition for μ_f to be an invariant measure. Thus, Af = 0 implies, from (7.8.10), that A_λf = 0, and from (7.8.12),

$$e^{tA_\lambda}f = f.$$

This, combined with (7.8.3), gives

$$T_t f = f \qquad \text{for all } t \ge 0.$$

In the special case A = A_FP, the condition A_FP f = 0 thus implies that P_t f = f, and hence μ_f is invariant. By combining this result with that of Section 7.7, we obtain the following theorem.

Theorem 7.8.2. Let {S_t}_{t≥0} be a semidynamical system such that the corresponding semigroup of Frobenius-Perron operators is continuous. Under this condition, an absolutely continuous measure μ_f is invariant if and only if A_FP f = 0.

Consider the special case where A_FP is the infinitesimal operator for a d-dimensional system of ordinary differential equations [cf. equation (7.6.10)]. Then the necessary and sufficient condition that μ_f be invariant, that is, A_FP f = 0, reduces to

$$\sum_{i=1}^{d}\frac{\partial(fF_i)}{\partial x_i} = 0 \qquad (7.8.17)$$

for continuously differentiable f ∈ L¹. This result was originally obtained by Liouville using quite different techniques and is known as Liouville's theorem.

Remark 7.8.1. Equation (7.8.17) is also a necessary and sufficient condition for the invariance of the measure

$$\mu_f(A) = \int_A f(x)\,\mu(dx)$$

even if f is an arbitrary continuously differentiable function that is not necessarily integrable on R^d. This is related to the fact that the operators P_t f as given by (7.6.13) can also be considered for nonintegrable functions. Thus, if one wishes to determine when the Lebesgue measure

$$\mu(A) = \int_A dx_1\cdots dx_d = \int_A dx$$

is invariant, it is necessary to substitute its density f(x) ≡ 1 into (7.8.17). This gives

$$\sum_{i=1}^{d}\frac{\partial F_i}{\partial x_i} = 0 \qquad (7.8.18)$$

as a necessary and sufficient condition for the invariance of the Lebesgue measure. [In many sources, equation (7.8.18) is called Liouville's equation, even though it is a special case of equation (7.8.17).] □

Remark 7.8.2. It is quite straightforward to show that Hamiltonian systems (see Example 7.6.1) satisfy (7.8.18) since

$$\sum_{i=1}^{n}\Bigl[\frac{\partial}{\partial q_i}\Bigl(\frac{\partial H}{\partial p_i}\Bigr) + \frac{\partial}{\partial p_i}\Bigl(-\frac{\partial H}{\partial q_i}\Bigr)\Bigr] = 0$$

automatically, and thus they preserve the Lebesgue measure. □
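Condition (7.8.18) for a Hamiltonian vector field can also be checked mechanically; a finite-difference sketch with an illustrative nonseparable Hamiltonian (not from the text):

```python
import math
import random

# Sketch: finite-difference check of Liouville's condition (7.8.18) for the
# Hamiltonian vector field F = (dH/dp, -dH/dq), with an illustrative
# nonseparable Hamiltonian H(q, p) = p^2/2 + cos(q) + q*p.
def H(q, p):
    return 0.5 * p * p + math.cos(q) + q * p

def F(q, p, h=1e-5):
    # the Hamiltonian vector field, by central differences
    dHdp = (H(q, p + h) - H(q, p - h)) / (2 * h)
    dHdq = (H(q + h, p) - H(q - h, p)) / (2 * h)
    return dHdp, -dHdq

def divergence(q, p, h=1e-4):
    # dF_q/dq + dF_p/dp: should vanish identically for any Hamiltonian field
    dFq = (F(q + h, p)[0] - F(q - h, p)[0]) / (2 * h)
    dFp = (F(q, p + h)[1] - F(q, p - h)[1]) / (2 * h)
    return dFq + dFp

random.seed(0)
print(max(abs(divergence(random.uniform(-3, 3), random.uniform(-3, 3)))
          for _ in range(100)))   # ≈ 0 up to finite-difference error
```

The cross term q·p contributes +1 to ∂F_q/∂q and −1 to ∂F_p/∂p, so the cancellation in Remark 7.8.2 is exact even though neither partial derivative vanishes.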


Returning now to the problem of determining the ergodicity of a semidynamical system {S_t}_{t≥0}, recall that U_t g = g implies A_K g = 0. Using this relation and Theorem 7.7.2 we are going to prove the following theorem.

Theorem 7.8.3. Let {S_t}_{t≥0} be a semidynamical system such that the corresponding semigroup {P_t} of Frobenius-Perron operators is continuous. Then {S_t}_{t≥0} is ergodic if and only if A_K g = 0 has only constant solutions in L^∞.
Proof. The "if" part follows from Theorem 7.7.2. The proof of the "only if" part is more difficult since, in general, the semigroup {U_t} is not continuous and we cannot use the Hille-Yosida theorem. Thus, assume that A_K g = 0 for some nonconstant g. Choose an arbitrary f ∈ L¹ and define the real-valued function ψ by the formula

$$\psi(t) = \langle f, U_tg\rangle = \langle P_tf, g\rangle.$$

Due to the continuity of {P_t}, the function ψ is also continuous. Further, we have

$$\frac{\psi(t+h) - \psi(t)}{h} = \Bigl\langle f,\ \frac{U_{t+h}g - U_tg}{h}\Bigr\rangle = \Bigl\langle P_tf,\ \frac{U_hg - g}{h}\Bigr\rangle \qquad \text{for } h > 0,\ t \ge 0.$$

Since A_K g = 0, passing to the limit as h → 0, we obtain

$$\psi'(t) = \langle P_tf,\, A_Kg\rangle = 0.$$

The function ψ is thus continuous with a right-hand derivative identically equal to zero, implying that ψ(t) = ψ(0) for all t ≥ 0. Consequently,

$$\langle f,\, U_tg - g\rangle = \psi(t) - \psi(0) = 0 \qquad \text{for } t \ge 0.$$

Since f is arbitrary this, in turn, implies that U_t g = g for t ≥ 0, which, by Theorem 7.7.2, completes the proof. ∎
In particular, if {S_t}_{t≥0} is a semigroup generated by a system of ordinary differential equations then, from equation (7.6.5), A_K f = 0 is equivalent to

    Σ_{i=1}^d F_i(x) ∂f/∂x_i = 0                    (7.8.19)

for continuously differentiable f with compact support. However, it must be pointed out that (7.8.19) is of negligible usefulness in checking ergodicity, because the property "A_K f = 0 implies f constant" must be checked for all functions in L^∞, not just for the continuously differentiable ones. This is quite different from the situation in which the Liouville theorem (7.8.17) is used to check for invariant measures. In the latter case, it is necessary to find only a single solution of A_FP f = 0.

Example 7.8.1. Theorem 7.8.3 allows us easily to prove that Hamiltonian systems (see Example 7.6.1) are not ergodic. To show this, note that for a Hamiltonian system defined by equation (7.6.12), equation (7.6.5) becomes

AK/ =

[8/ 8H _ 8/ 8H]

i=l

8qi 8pi

8pi 8qi

= [f,H].

Take f E L 00 to be any nonconstant function of the energy H. By Example


7.6.1, we know that AKf 0 since

[/(H), H]

= 88/H [H, H] = 0

and therefore Hamiltonian systems are not ergodic on the whole space.
However, if we fix the total energy, or the energy for each degree of freedom
as in Remark 7.7.1, then the system may become ergodic. 0
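The obstruction to ergodicity can be seen directly in a simulation: along any trajectory the energy H, and hence any function g(H), is constant, so g∘H is a nonconstant invariant of the flow. The sketch below (the harmonic oscillator H = (q² + p²)/2 and g = cos are illustrative choices) integrates the Hamiltonian equations with RK4 and confirms that g(H) does not drift.

```python
import numpy as np

# Harmonic-oscillator Hamiltonian H = (q^2 + p^2)/2, an illustrative choice
def rhs(state):
    q, p = state
    return np.array([p, -q])          # (dH/dp, -dH/dq)

def rk4_step(state, dt):
    k1 = rhs(state)
    k2 = rhs(state + 0.5 * dt * k1)
    k3 = rhs(state + 0.5 * dt * k2)
    k4 = rhs(state + dt * k3)
    return state + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def H(state):
    q, p = state
    return 0.5 * (q**2 + p**2)

g = np.cos          # any bounded nonconstant function of the energy

state = np.array([1.0, 0.3])
g0 = g(H(state))
drift = 0.0
for _ in range(2000):                 # integrate up to t = 20
    state = rk4_step(state, 0.01)
    drift = max(drift, abs(g(H(state)) - g0))
```

Since U_t(g∘H) = g∘H for a nonconstant g∘H ∈ L^∞, Theorem 7.8.3 rules out ergodicity on the whole space, exactly as the text argues.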

7.9 Further Applications of the Hille-Yosida Theorem
Thus far we have used the Hille-Yosida theorem to demonstrate some simple properties of semigroups that followed directly from properties of the infinitesimal operator A and the resolvent equation (7.8.1). In these cases the semigroups were given. Now we are going to show a simple application of the theorem to the problem of determining a semigroup corresponding to a given infinitesimal operator A.

Let X = R and L = L¹(R), and consider the infinitesimal operator

    A f = d²f/dx²                    (7.9.1)

that can, of course, be defined only for some f ∈ L¹. Let D(A) be the set of all f ∈ L¹ such that f″(x) exists almost everywhere, is integrable on R, and

    f′(x) = f′(0) + ∫₀ˣ f″(s) ds.

In other words, D(A) is the set of all f such that f′ is absolutely continuous and f″ is integrable on R. We will show that there is a unique semigroup corresponding to the infinitesimal operator A.

The set D(A) is evidently dense in L¹ (even the set of C^∞ functions is dense in L¹), and therefore we may concentrate on verifying properties (b) and (c) of the Hille-Yosida theorem.
The resolvent equation (7.8.1) has the form

    λg − d²g/dx² = f,                (7.9.2)
7.9. FUrther Applications of the Hille-Yosida Theorem

233

which is a second-order ordinary differential equation in the unknown function g. Using standard arguments, the general solution of (7.9.2) may be written as

    g(x) = C₁ e^{−αx} + C₂ e^{αx} + (1/2α) ∫_{x₀}^x e^{−α(x−y)} f(y) dy − (1/2α) ∫_{x₁}^x e^{α(x−y)} f(y) dy,

where α = √λ, and C₁, C₂, x₀, and x₁ are arbitrary constants. To be specific, pick x₀ = −∞, x₁ = +∞, and set

    K(x − y) = (1/2α) e^{−α|x−y|}.            (7.9.3)

Then the solution of (7.9.2) can be written in the more compact form

    g(x) = C₁ e^{−αx} + C₂ e^{αx} + ∫_{−∞}^{∞} K(x − y) f(y) dy.        (7.9.4)

The last term on the right-hand side of (7.9.4) is an integrable function on R, since

    ∫_{−∞}^{∞} dx ∫_{−∞}^{∞} K(x − y) f(y) dy = ∫_{−∞}^{∞} K(x) dx ∫_{−∞}^{∞} f(y) dy = (1/λ) ∫_{−∞}^{∞} f(y) dy.        (7.9.5)

Thus, since neither exp(−αx) nor exp(αx) is integrable over R, a necessary and sufficient condition for g to be integrable over R is that C₁ = C₂ = 0. In this case we have shown that the resolvent equation (7.9.2) has a unique solution g ∈ L¹ given by

    g(x) = R_λ f(x) = ∫_{−∞}^{∞} K(x − y) f(y) dy,        (7.9.6)

and thus condition (b) of the Hille-Yosida theorem is satisfied.


Combining equations (7.9.5) and (7.9.6), it follows immediately that the operator λR_λ preserves the integral. Moreover, R_λ f ≥ 0 if f ≥ 0, so that λR_λ is a Markov operator. Thus condition (c) of the Hille-Yosida theorem is automatically satisfied, and we have shown that the operator d²/dx² is the infinitesimal operator of a continuous semigroup {T_t}_{t≥0} of Markov operators, where

    T_t f = lim_{λ→∞} e^{−λt} Σ_{n=0}^{∞} (tⁿ λ²ⁿ / n!) R_λⁿ f        (7.9.7)

and R_λ is defined by (7.9.3) and (7.9.6).

It is interesting that the limit (7.9.7) can be calculated explicitly. To do this, denote by φ_f the Fourier transform of f, that is,

    φ_f(ω) = ∫_{−∞}^{∞} e^{−iωx} f(x) dx.

The Fourier transform of K(x) given by equation (7.9.3) is 1/(λ + ω²), where λ = α². Since, by (7.9.6), R_λ f is the convolution of the functions K and f, and it is well known that

    φ_{f*g} = φ_f φ_g,                (7.9.8)

where f * g denotes the convolution of f with g, the Fourier transform of R_λⁿ f is (λ + ω²)^{−n} φ_f(ω). As a consequence, the Fourier transform of the series in (7.9.7) is

    Σ_{n=0}^{∞} (tⁿ λ²ⁿ / n!) (λ + ω²)^{−n} φ_f(ω) = exp[λ²t/(λ + ω²)] φ_f(ω).

Thus the Fourier transform of T_t f is

    lim_{λ→∞} exp(−λt) exp[λ²t/(λ + ω²)] φ_f(ω) = exp(−ω²t) φ_f(ω).

Using the fact that exp(−ω²t) is the Fourier transform of

    (1/√(4πt)) exp(−x²/4t)

and (7.9.8), we then have

    T_t f(x) = (1/√(4πt)) ∫_{−∞}^{∞} exp[−(x − y)²/4t] f(y) dy.        (7.9.9)

Hence, using the semigroup method, we have shown that u(t, x) = T_t f(x) is the solution of the heat equation

    ∂u/∂t = ∂²u/∂x²

with the initial condition

    u(0, x) = f(x).
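The two defining properties of {T_t} established above, that each T_t is a Markov operator (it preserves the integral of densities) and that the family satisfies the semigroup property T_s T_t = T_{s+t}, can be checked numerically from the kernel representation (7.9.9). The sketch below discretizes the convolution on a truncated grid (the grid and the Gaussian initial profile are illustrative choices).

```python
import numpy as np

x = np.linspace(-20, 20, 2001)
dx = x[1] - x[0]

def T(t, f):
    """Apply the heat semigroup (7.9.9) by direct kernel quadrature."""
    K = np.exp(-(x[:, None] - x[None, :])**2 / (4 * t)) / np.sqrt(4 * np.pi * t)
    return K @ f * dx

f = np.exp(-x**2)             # an arbitrary nonnegative initial profile
f /= np.sum(f) * dx           # normalize to a density

g1 = T(0.5, T(0.3, f))        # T_{0.5} T_{0.3} f
g2 = T(0.8, f)                # T_{0.8} f

mass_err = abs(np.sum(g2) * dx - 1.0)   # integral preservation (Markov property)
semigroup_err = np.max(np.abs(g1 - g2)) # T_{0.5} T_{0.3} = T_{0.8}
```

Both errors are at the level of quadrature and truncation noise, consistent with {T_t}_{t≥0} being a stochastic semigroup.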

Remark 7.9.1. It is a direct consequence of the elementary properties of the differential quotient (see Definition 7.5.1) that if A is the infinitesimal operator corresponding to a semigroup {T_t}_{t≥0}, then cA is the infinitesimal operator corresponding to {T_{ct}}_{t≥0}. Thus, since we have proved that A = d²/dx² is the infinitesimal operator corresponding to the semigroup {T_t}_{t≥0} given by (7.9.9), we know immediately that

    T_t f(x) = (1/√(2σ²πt)) ∫_{−∞}^{∞} exp[−(x − y)²/2σ²t] f(y) dy

has the corresponding infinitesimal operator (σ²/2)(d²/dx²). (This is in perfect agreement with our observations in Example 7.4.1.) For simplicity, we have omitted the coefficient σ²/2 in the foregoing calculations. □
The proof that d²/dx² is an infinitesimal operator for a stochastic semigroup on R may be extended to R^d. Thus, for example, the operator

    A f = Δf = Σ_{i=1}^{d} ∂²f/∂x_i²            (7.9.10)

on R^d may be shown to be an infinitesimal operator for a stochastic semigroup, as can

    A f = Σ_{i,j=1}^{d} a_{ij} ∂²f/(∂x_i ∂x_j),        (7.9.11)

where the a_{ij} are constant, or sufficiently regular functions of x, and Σ_{i,j} a_{ij} ξ_i ξ_j is positive definite. The procedure for proving these assertions is similar to that for the operator d²/dx² on R, but requires some special results from the theory of partial differential equations and functional analysis, allowing us to extend the definitions of the differential operators (7.9.10) and (7.9.11).
Operators such as d²/dx², (7.9.10), or (7.9.11) may be considered not only on the whole space (R or R^d), but also on bounded subsets. In this case, however, additional boundary conditions must be specified; for example,

    A f = d²f/dx²    with    df/dx = 0 at x = a  and  df/dx = 0 at x = b

is an infinitesimal operator for a stochastic semigroup. More details concerning such general elliptic operators may be found in Dynkin [1965].

Finally, we note that semigroups generated by second-order differential operators are never semigroups of Frobenius-Perron operators for a semidynamical system and, thus, cannot arise from deterministic processes. This is quite contrary to the situation for first-order differential operators, as already discussed in Section 7.8.
Remark 7.9.2. Equation (7.8.3) of the Hille-Yosida theorem allows the construction of the semigroup {T_t}_{t≥0} if the resolvent operator R_λ is known. As it turns out, the construction of the resolvent operator when the continuous semigroup of contractions is given is even simpler. Thus it can be shown (Dynkin, 1965) that

    R_λ f = ∫₀^{∞} e^{−λt} T_t f dt    for f ∈ L, λ > 0.        (7.9.12)

In (7.9.12) the integral on the half-line [0, ∞) is considered as the limit of Riemann integrals on [0, a] as a → ∞. This limit exists since

    ∫₀^{∞} e^{−λt} ‖T_t f‖ dt ≤ ‖f‖ ∫₀^{∞} e^{−λt} dt = (1/λ)‖f‖.

It is an immediate consequence of (7.9.12) that for every stochastic semigroup T_t: L¹ → L¹, the operator λR_λ is a Markov operator. To show this, note first that, for f ≥ 0, equation (7.9.12) implies R_λ f ≥ 0. Furthermore, for f ≥ 0,

    ‖R_λ f‖ = ∫_X R_λ f(x) dx = ∫₀^{∞} e^{−λt} { ∫_X T_t f(x) dx } dt = ∫₀^{∞} e^{−λt} ‖f‖ dt = (1/λ)‖f‖.

In addition to demonstrating that λR_λ is a Markov operator, (7.9.12) also demonstrates that the semigroup corresponding to a given resolvent R_λ is unique. To see this, choose g ∈ L^∞ and take the scalar product of both sides of equation (7.9.12) with g. We obtain

    ⟨g, R_λ f⟩ = ∫₀^{∞} e^{−λt} ⟨g, T_t f⟩ dt    for λ > 0,

which shows that ⟨g, R_λ f⟩, as a function of λ, is the Laplace transform of ⟨g, T_t f⟩ with respect to t. Since the Laplace transform is one to one, this implies that ⟨g, T_t f⟩ is uniquely determined by ⟨g, R_λ f⟩. Further, since g ∈ L^∞ is arbitrary, {T_t f} is uniquely determined by {R_λ f}. The same argument also shows that, for a bounded continuous function u(t) with values in L¹, the equality

    R_λ f = ∫₀^{∞} e^{−λt} u(t) dt    for all λ > 0

implies u(t) = T_t f.
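The identity (7.9.12) can be tested numerically for the heat semigroup of Section 7.9. The sketch below (evaluation point, λ, and the Gaussian initial density are illustrative choices) uses the fact that for f the N(0, 1/2) density, the heat evolution under the kernel (7.9.9) is the N(0, 1/2 + 2t) density, and compares the Laplace transform of t ↦ T_t f(x₀) against the resolvent (7.9.6) built from the kernel K of (7.9.3) with α = √λ.

```python
import numpy as np

lam = 1.0
alpha = np.sqrt(lam)
x0 = 0.7                      # evaluation point (arbitrary)

# f(y) = exp(-y^2)/sqrt(pi); its heat evolution T_t f is known in closed
# form, which keeps the time quadrature simple
def Ttf(t, x):
    return np.exp(-x**2 / (1 + 4 * t)) / np.sqrt(np.pi * (1 + 4 * t))

# Left side of (7.9.12): Laplace transform of t -> T_t f(x0)
t = np.linspace(0, 40, 400001)
dt = t[1] - t[0]
lhs = np.sum(np.exp(-lam * t) * Ttf(t, x0)) * dt

# Right side: resolvent (7.9.6) with kernel K from (7.9.3)
y = np.linspace(-20, 20, 200001)
dy = y[1] - y[0]
K = np.exp(-alpha * np.abs(x0 - y)) / (2 * alpha)
f = np.exp(-y**2) / np.sqrt(np.pi)
rhs = np.sum(K * f) * dy

err = abs(lhs - rhs)
```

Up to quadrature error the two quantities agree, illustrating that the resolvent really is the Laplace transform of the semigroup.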

Some of the most sophisticated applications of semigroup theory occur in treating integro-differential equations. Thus we may not only prove the existence and uniqueness of solutions to such equations, but also determine the asymptotic properties of the solutions. One of the main tools in this area is the following extension of the Hille-Yosida theorem, generally known as the Phillips perturbation theorem.

Theorem 7.9.1. Let a continuous stochastic semigroup {T_t}_{t≥0} and a Markov operator P be given. Further, let A be the infinitesimal operator of {T_t}_{t≥0}. Then there exists a unique continuous stochastic semigroup {P_t}_{t≥0} for which

    A₀ = A + P − I

(I is the identity operator on L¹) is the infinitesimal operator. Furthermore, the semigroup {P_t}_{t≥0} is defined by

    P_t f = e^{−t} Σ_{n=0}^{∞} T_n(t) f,    f ∈ L¹,        (7.9.13)

where T₀(t) = T_t and

    T_n(t) f = ∫₀^t T₀(t − τ) P T_{n−1}(τ) f dτ.        (7.9.14)

Proof. Denote by R_λ(A) the resolvent corresponding to the operator A, that is, g = R_λ(A) f is the solution of

    λg − Ag = f.

Since {T_t}_{t≥0} is a stochastic semigroup, λR_λ(A) is a Markov operator (see Remark 7.9.2). Now we observe that the resolvent equation for the operator A₀,

    λg − A₀g = f,                    (7.9.15)

may be rewritten as

    (λ + 1)g − Ag = f + Pg.

Thus (7.9.15) is equivalent to

    g = R_{λ+1}(A) f + R_{λ+1}(A) P g.        (7.9.16)

From inequality (7.8.11) we have ‖R_{λ+1}(A) P g‖ ≤ ‖Pg‖/(λ + 1). Since P is a Markov operator, this becomes

    ‖R_{λ+1}(A) P g‖ ≤ ‖g‖/(λ + 1).

Thus, equation (7.9.16) has a unique solution that can be constructed by the method of successive approximations. The result is given by

    g = R_λ(A₀) f = Σ_{n=0}^{∞} [R_{λ+1}(A) P]ⁿ R_{λ+1}(A) f,        (7.9.17)

and the existence of a solution g to (7.9.15) is proved. Further, from (7.9.17) it follows that R_λ(A₀) f ≥ 0 for f ≥ 0 and that

    ‖R_λ(A₀) f‖ = Σ_{n=0}^{∞} (1/(λ + 1))^{n+1} ‖f‖ = (1/λ)‖f‖    for f ≥ 0.

Thus λR_λ(A₀) is a Markov operator and A₀ satisfies all of the assumptions of the Hille-Yosida theorem. Hence the infinitesimal operator A₀ generates a unique stochastic semigroup, and the first part of the theorem is proved.

Now we show that this semigroup is given by equations (7.9.13) and (7.9.14). Using (7.9.14), it is easy to show by induction that

    ‖T_n(t) f‖ ≤ (tⁿ/n!) ‖f‖.            (7.9.18)

Thus, the series (7.9.13) is uniformly convergent with respect to t on bounded intervals, and P_t f is a continuous function of t. Now set

    Q_{λ,n} f = ∫₀^{∞} e^{−λt} T_n(t) f dt,    n = 0, 1, …,

so

    Q_{λ,0} f = ∫₀^{∞} e^{−λt} T_t f dt = R_λ(A) f

and

    Q_{λ,n} f = ∫₀^{∞} e^{−λt} { ∫₀^t T₀(t − τ) P T_{n−1}(τ) f dτ } dt
              = ∫₀^{∞} { ∫_τ^{∞} e^{−λt} T₀(t − τ) P T_{n−1}(τ) f dt } dτ
              = ∫₀^{∞} e^{−λτ} { ∫₀^{∞} e^{−λs} T₀(s) P T_{n−1}(τ) f ds } dτ
              = ∫₀^{∞} e^{−λs} T₀(s) P { ∫₀^{∞} e^{−λτ} T_{n−1}(τ) f dτ } ds
              = R_λ(A) P Q_{λ,n−1} f.

Hence, by induction, we have

    Q_{λ,n} = [R_λ(A) P]ⁿ R_λ(A).

Define

    Q_λ f = ∫₀^{∞} e^{−λt} P_t f dt

and substitute equation (7.9.13) to give

    Q_λ f = Σ_{n=0}^{∞} ∫₀^{∞} e^{−(λ+1)t} T_n(t) f dt = Σ_{n=0}^{∞} Q_{λ+1,n} f = Σ_{n=0}^{∞} [R_{λ+1}(A) P]ⁿ R_{λ+1}(A) f.

By comparing this result with (7.9.17), we see that Q_λ = R_λ(A₀), or

    R_λ(A₀) f = ∫₀^{∞} e^{−λt} P_t f dt.        (7.9.19)

From (7.9.19) (see also the end of Remark 7.9.2), it follows that {P_t f}_{t≥0} is the semigroup corresponding to A₀. □
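The construction (7.9.13)-(7.9.14) can be exercised in a toy finite-dimensional setting (an illustration, not a case covered by the theorem's L¹ framework verbatim): take the unperturbed semigroup trivial, T_t = I (so A = 0), and let P be a column-stochastic matrix playing the role of the Markov operator. Then the recursion (7.9.14) gives T_n(t)f = (tⁿ/n!)Pⁿf, so (7.9.13) reduces to P_t f = e^{t(P−I)} f. The sketch below sums the series and checks that the result is again a density and that the semigroup property holds.

```python
import numpy as np

rng = np.random.default_rng(1)

# A column-stochastic matrix standing in for the Markov operator P
P = rng.uniform(size=(5, 5))
P /= P.sum(axis=0, keepdims=True)

f = rng.uniform(size=5)
f /= f.sum()                        # a density

def phillips(t, f, terms=60):
    """P_t f = e^{-t} sum_n T_n(t) f with T_t = I, so T_n(t) f = t^n P^n f / n!."""
    out = np.zeros_like(f)
    term = f.copy()                 # T_0(t) f = f
    for n in range(terms):
        out += term
        term = t * (P @ term) / (n + 1)
    return np.exp(-t) * out

g = phillips(1.3, f)
density_err = abs(g.sum() - 1.0)                 # P_t preserves total mass
semi_err = np.max(np.abs(phillips(0.9, phillips(0.4, f)) - phillips(1.3, f)))
min_val = g.min()                                # P_t preserves positivity
```

The series converges rapidly (the bound (7.9.18) with the extra e^{−t} factor makes the tail factorially small), so sixty terms are far more than enough here.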

Example 7.9.1. Consider the integro-differential equation

    ∂u(t,x)/∂t + u(t,x) = (σ²/2) ∂²u(t,x)/∂x² + ∫_{−∞}^{∞} K(x,y) u(t,y) dy,    t > 0, x ∈ R,        (7.9.20)

with the initial condition

    u(0,x) = φ(x),    x ∈ R.            (7.9.21)

We assume that the kernel is measurable and stochastic, that is,

    K(x,y) ≥ 0    and    ∫_{−∞}^{∞} K(x,y) dx = 1.

To treat the initial value problem, equations (7.9.20) and (7.9.21), using semigroup theory, we rewrite it in the form

    du/dt = (A + P − I)u,    u(0) = φ,        (7.9.22)

where A = ½σ²(d²/dx²) is the infinitesimal operator for the semigroup

    T_t f(x) = (1/√(2σ²πt)) ∫_{−∞}^{∞} exp[−(x − y)²/2σ²t] f(y) dy        (7.9.23)

(see Remark 7.9.1) and

    P f(x) = ∫_{−∞}^{∞} K(x, y) f(y) dy.

From Theorem 7.9.1 it follows that there is a unique continuous semigroup {P_t}_{t≥0} corresponding to the operator A₀ = A + P − I, and, by Theorem 7.5.1, the function u(t) = P_t φ is the solution of (7.9.22) for every φ ∈ D(A₀) = D(A). Thus u(t,x) = P_t φ(x) can be interpreted as the generalized solution to equations (7.9.20) and (7.9.21) for every φ ∈ L¹(R).

This method of treating equation (7.9.20) is convenient from several points of view. First, it demonstrates the existence and uniqueness of the solution u(t,x) for every φ ∈ L¹(R) and stochastic kernel K. Second, it shows that P_t φ is a density for t ≥ 0 whenever φ is a density. Furthermore, some additional properties of the solution can be demonstrated by using the explicit representation for P_t given in Theorem 7.9.1. For this example, it follows directly from (7.9.13) and (7.9.14) that

    P_t φ = e^{−t} ∫₀^t T₀(t − τ) g_τ dτ + e^{−t} T₀(t) φ,

where

    g_τ = Σ_{n=1}^{∞} P T_{n−1}(τ) φ.

Thus, using (7.9.23) (with T₀(t) = T_t), we have the explicit representation

    P_t φ(x) = e^{−t} ∫₀^t { (1/√(2σ²π(t−τ))) ∫_{−∞}^{∞} exp[−(x − y)²/2σ²(t−τ)] g_τ(y) dy } dτ
               + e^{−t} (1/√(2σ²πt)) ∫_{−∞}^{∞} exp[−(x − y)²/2σ²t] φ(y) dy.

This shows directly that the function u(t,x) = P_t φ(x) is continuous and strictly positive for t > 0 and every φ ∈ L¹(R), even if φ and the stochastic kernel K are not continuous! Finally, we will come back to this semigroup approach in Section 11.10 and use it to demonstrate some asymptotic properties of the solution u(t,x). □

Example 7.9.2. As a second example of the applicability of the Phillips perturbation theorem, we consider the first-order integro-differential equation

    ∂u(t,x)/∂t + ∂u(t,x)/∂x + u(t,x) = ∫_x^{∞} K(x,y) u(t,y) dy,    t > 0, x ≥ 0,        (7.9.24)

with

    u(t,0) = 0    and    u(0,x) = φ(x).        (7.9.25)

Again the kernel K is assumed to be measurable and stochastic, that is,

    K(x,y) ≥ 0    and    ∫₀^y K(x,y) dx = 1.        (7.9.26)

Equation (7.9.24) occurs in queuing theory and astrophysics [Bharucha-Reid, 1960]. In its astrophysical form,

    K(x,y) = (1/y) ψ(x/y),            (7.9.27)

and, with this specific expression for K, equation (7.9.24) is called the Chandrasekhar-Münch equation. As developed by Chandrasekhar and Münch [1952], equation (7.9.24) with K as given by (7.9.27) describes fluctuations in the brightness x of the Milky Way as a function of the extent of the system t along the line of sight. The unknown function u(t,x) is the probability density of the fluctuations, and the given function ψ in (7.9.27) is related to the probability density of light transmission through interstellar gas clouds. This function satisfies

    ψ(z) ≥ 0    and    ∫₀^1 ψ(z) dz = 1,        (7.9.28)

and, thus, K as given by (7.9.27) automatically satisfies (7.9.26).
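That (7.9.27) automatically satisfies the normalization in (7.9.26) follows from the substitution z = x/y; the sketch below checks it numerically for an illustrative transmission density ψ (ψ(z) = 2z on [0,1] is an assumption made for the example, not from the text).

```python
import numpy as np

def psi(z):
    # an illustrative transmission density on [0,1]: psi >= 0, integral 1
    return 2.0 * z

def K(x, y):
    # the Chandrasekhar-Munch kernel (7.9.27)
    return psi(x / y) / y

# check the normalization in (7.9.26): integral over 0 <= x <= y of K(x,y) dx = 1
errs = []
for y0 in (0.5, 1.0, 3.7):
    x = np.linspace(0, y0, 100001)
    dx = x[1] - x[0]
    vals = K(x, y0)
    trap = np.sum((vals[:-1] + vals[1:]) / 2) * dx   # trapezoidal rule
    errs.append(abs(trap - 1.0))
max_err = max(errs)
```

The same check passes for any admissible ψ, since ∫₀^y (1/y)ψ(x/y) dx = ∫₀¹ ψ(z) dz = 1.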


To rewrite (7.9.24) as a differential equation in L¹, recall (see Example 7.5.1) that −d/dx is the infinitesimal operator for the semigroup T_t f(x) = f(x − t) defined on L¹(R). On L¹([0, ∞)),

    T_t f(x) = 1_{[0,∞)}(x − t) f(x − t)        (7.9.29)

plays an analogous role. Proceeding much as in Example 7.5.1, a simple calculation shows that for continuously differentiable f with compact support in [0, ∞) the infinitesimal operator corresponding to the semigroup in (7.9.29) is given by A f = −df/dx. Further, it is clear that u(t,x) = T_t f(x) satisfies u(t,0) = 0 for t > 0. Hence we may rewrite equations (7.9.24)-(7.9.25) in the form

    du/dt = (A + P − I)u,    u(0) = φ,        (7.9.30)

where

    P f(x) = ∫_x^{∞} K(x,y) f(y) dy.

By Theorem 7.9.1 there is a unique continuous semigroup {P_t}_{t≥0} corresponding to the infinitesimal operator A + P − I. For every φ ∈ D(A), the function u(t) = P_t φ is a solution of (7.9.30). □

7.10 The Relation Between the Frobenius-Perron and Koopman Operators

The semigroup of Frobenius-Perron operators {P_t} and the semigroup {U_t} of Koopman operators, both generated by the same semidynamical system {S_t}_{t≥0}, are closely related because they are adjoint. However, each describes the behavior of the system {S_t}_{t≥0} in a different fashion, and in this section we show the connection between the two.

Equation (7.4.16), ⟨P_t f, g⟩ = ⟨f, U_t g⟩, which says that P_t and U_t are adjoint, may be written explicitly as

    ∫_X g(x) P_t f(x) μ(dx) = ∫_X f(x) g(S_t(x)) μ(dx).

For some A ⊂ X such that A and S_t(A) are in 𝒜, take f(x) = 0 for all x ∉ A and g = 1_{X∖S_t(A)}, so the preceding formula becomes

    ∫_X 1_{X∖S_t(A)}(x) P_t f(x) μ(dx) = ∫_X f(x) 1_{X∖S_t(A)}(S_t(x)) μ(dx) = ∫_A f(x) 1_{X∖S_t(A)}(S_t(x)) μ(dx).

The right-hand side of this equation is obviously equal to zero since S_t(x) ∉ X∖S_t(A) for x ∈ A. The left-hand side is, however, just the L¹ norm of the integrand, so that

    ‖1_{X∖S_t(A)} P_t f‖ = 0.

This, in turn, implies

    P_t f(x) = 0    for x ∉ S_t(A).        (7.10.1)

Thus the operator P_t "carries" the function f, supported on A, forward in time to a function supported on a subset of S_t(A) (see Example 3.2.1 and Proposition 3.2.1). Figuratively speaking, we may say that the density is transformed by P_t analogously to the way in which initial points x are transformed into S_t(x).
Now consider the definition of the Koopman operator,

    U_t f(x) = f(S_t(x)).

Assume f ∈ L^∞ is zero outside a set A, so we have

    f(S_t(x)) = 0    if S_t(x) ∉ A.        (7.10.2)

This, in turn, implies that

    U_t f(x) = 0    for x ∉ S_t^{−1}(A).        (7.10.3)

In contrast to P_t, therefore, U_t may be thought of as transporting the function supported on A backward in time to a function supported on S_t^{−1}(A).

These observations become even clearer when {S_t} is a group of transformations, that is, when the group property

    S_{t+t′}(x) = S_t(S_{t′}(x))    for all t, t′ ∈ R, x ∈ X,

holds for both positive and negative time, and all the S_t are at least nonsingular. In this case, S_t^{−1}(x) = S_{−t}(x) and (7.10.3) becomes

    U_t f(x) = 0    for x ∉ S_{−t}(A).

If, in addition, the group {S_t} preserves the measure μ, we have

    ∫_A P_t f(x) μ(dx) = ∫_{S_{−t}(A)} f(x) μ(dx) = ∫_A f(S_{−t}(x)) μ(dx),

which gives

    P_t f(x) = f(S_{−t}(x))

or, finally,

    P_t f(x) = U_{−t} f(x).            (7.10.4)

Equation (7.10.4) makes totally explicit our earlier comments on the forward and backward transport of densities in time by the Frobenius-Perron and Koopman operators.

Furthermore, from (7.10.4) we have directly that

    lim_{t→0} [(P_t f − f)/t] = lim_{t→0} [(U_{−t} f − f)/t],

and thus, for f in a dense subset of L¹,

    A_FP f = −A_K f.                (7.10.5)

This relation was previously derived, although not explicitly stated, for dynamical systems generated by a system of ordinary differential equations [cf. equations (7.6.5) and (7.6.10)].

Remark 7.10.1. Equation (7.10.4) may, in addition, be interpreted as saying that the operator adjoint to P_t is also its inverse. In the terminology of Hilbert spaces [and thus in L²(X)] this means simply that {P_t} is a semigroup of unitary operators. The original discovery that {U_t}, generated by a group {S_t} of measure-preserving transformations, forms a group of unitary operators is due to Koopman [1931]. It was later used by von Neumann [1932] in his proof of the statistical ergodic theorem. □
Remark 7.10.2. Equation (7.10.1) can sometimes be used to show that a semigroup of Markov operators cannot arise from a deterministic dynamical system, which means that it is not a semigroup of Frobenius-Perron operators for any semidynamical system {S_t}_{t≥0}.

For example, consider the semigroup {P_t} given by equations (7.4.11) and (7.4.12):

    P_t f(x) = (1/√(2πσ²t)) ∫_{−∞}^{∞} f(y) exp[−(x − y)²/2σ²t] dy.        (7.10.6)

Setting f(y) = 1_{[0,1]}(y), it is evident that we obtain

    P_t f(x) > 0    for all x and t > 0.

However, according to (7.10.1), if P_t f(x) were given by the Frobenius-Perron operator of a semidynamical system {S_t}_{t≥0}, then it should be zero outside a bounded interval S_t([0,1]). [The interval S_t([0,1]) is a bounded interval since a continuous function maps bounded intervals into bounded intervals.] Thus {P_t}, where P_t f(x) is given by (7.10.6), does not correspond to any semidynamical system. □
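The instantaneous spreading of support that drives this argument is easy to exhibit numerically: the sketch below (σ = 1 and the sample points are illustrative choices) applies (7.10.6) to f = 1_{[0,1]} by quadrature and finds strictly positive values even far outside [0,1], which (7.10.1) forbids for any Frobenius-Perron semigroup.

```python
import numpy as np

def Pt_indicator(x, t, sigma=1.0):
    # (7.10.6) applied to f = 1_[0,1], by direct quadrature over [0,1]
    y = np.linspace(0.0, 1.0, 20001)
    dy = y[1] - y[0]
    kern = np.exp(-(x - y)**2 / (2 * sigma**2 * t)) / np.sqrt(2 * np.pi * sigma**2 * t)
    return np.sum(kern) * dy

# values far to the left, inside, and far to the right of [0,1] at t = 0.5
vals = [Pt_indicator(x, 0.5) for x in (-5.0, 0.5, 7.0)]
```

All three values are positive (those outside [0,1] merely very small), whereas a density transported by a deterministic flow would be exactly zero outside the bounded image S_t([0,1]).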

7.11 Sweeping for Stochastic Semigroups

The notion of sweeping for operators as developed in Section 5.9 is easily extended to semigroups. We start with the following.

Definition 7.11.1. Let (X, 𝒜, μ) be a measure space and 𝒜* ⊂ 𝒜 be a given family of measurable sets. A stochastic semigroup P_t: L¹(X) → L¹(X) is called sweeping with respect to 𝒜* if

    lim_{t→∞} ∫_A P_t f(x) μ(dx) = 0    for f ∈ D and A ∈ 𝒜*.        (7.11.1)

As in the discrete time case, it is easy to verify that condition (7.11.1) for a sweeping semigroup {P_t}_{t≥0} also holds for every f ∈ L¹(X). Alternatively, if D₀ ⊂ D is dense in D, then it is sufficient to verify (7.11.1) for f ∈ D₀.

In the special case that X ⊂ R is an interval (bounded or not) with endpoints α and β, α < β, we will use notions analogous to those in Definition 5.9.2. Namely, we will say that a stochastic semigroup P_t: L¹(X) → L¹(X) is sweeping to α, sweeping to β, or simply sweeping if it is sweeping with respect to the families 𝒜₀, 𝒜₁, or 𝒜₂ defined in equations (5.9.5)-(5.9.7), respectively.
Example 7.11.1. Let X = R. We consider the semigroups generated by the infinitesimal operators −c d/dx and (σ²/2) d²/dx² discussed in Example 7.5.1 and Remark 7.9.1.

The operator −c d/dx corresponds to the semigroup

    P_t f(x) = f(x − ct),

which, for c > 0, is sweeping to +∞ and, for c < 0, to −∞. The verification of these properties is analogous to the procedure in Example 5.9.1. Thus, for c > 0 we have

    ∫_{−∞}^b P_t f(x) dx = ∫_{−∞}^b f(x − ct) dx = ∫_{−∞}^{b−ct} f(y) dy = 0

when f has compact support and t is sufficiently large. For c < 0 the argument is similar.

The operator (σ²/2) d²/dx² generates the semigroup

    P_t f(x) = (1/√(2σ²πt)) ∫_{−∞}^{∞} exp[−(x − y)²/2σ²t] f(y) dy,

which is evidently sweeping since, for f ∈ D,

    ∫_a^b P_t f(x) dx ≤ (b − a)/√(2πσ²t) → 0    as t → ∞.
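The diffusion bound above can be observed numerically. The sketch below (the standard normal initial density is an illustrative choice) uses the fact that the heat semigroup with coefficient σ²/2 carries the N(0,1) density to the N(0, 1 + σ²t) density, so the mass remaining in [0,1] has a closed form via the error function, and compares it with the bound (b − a)/√(2πσ²t).

```python
import math

def mass_in_01(t, sigma=1.0):
    # P_t f for f = N(0,1) density is the N(0, 1 + sigma^2 t) density,
    # so the mass in [0,1] is Phi(1/s) - Phi(0) with s = sqrt(1 + sigma^2 t)
    s = math.sqrt(1.0 + sigma**2 * t)
    return 0.5 * (math.erf(1.0 / (s * math.sqrt(2))) - math.erf(0.0))

ts = (1.0, 10.0, 100.0, 1000.0)
masses = [mass_in_01(t) for t in ts]
bounds = [1.0 / math.sqrt(2 * math.pi * t) for t in ts]   # (b-a)/sqrt(2 pi sigma^2 t)
```

The mass in the fixed interval decays monotonically to zero, always staying below the t^{−1/2} bound, which is exactly the sweeping behavior.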

Comparing Examples 5.9.1, 5.9.2, and 7.11.1, we observe that the sweeping property of a semigroup {P_t}_{t≥0} appears simultaneously with the sweeping of the sequence {P_{t₀}ⁿ} for some t₀ > 0. This is not a coincidence. It is evident from Definitions 5.9.1 and 7.11.1 that if {P_t}_{t≥0} is sweeping, then {P_{t₀}ⁿ} is also sweeping for an arbitrary t₀ > 0. The converse is more delicate, but is assured by the following result.

Theorem 7.11.1. Let (X, 𝒜, μ) be a measure space, 𝒜* ⊂ 𝒜 be a given family of measurable sets, and P_t: L¹(X) → L¹(X) a continuous stochastic semigroup. If for some t₀ > 0 the sequence {P_{t₀}ⁿ} is sweeping, then the semigroup {P_t}_{t≥0} is also sweeping.
Proof. Fix an ε > 0 and f ∈ D. Since P_t is continuous, there is a δ > 0 such that

    ‖P_t f − f‖ ≤ ε    for 0 ≤ t ≤ δ.

Let

    0 = s₀ < s₁ < ⋯ < s_k = t₀

be a partition of the interval [0, t₀] such that

    s_i − s_{i−1} ≤ δ    for i = 1, …, k.

Define f_i = P_{s_i} f. Every value t ≥ 0 can be written in the form

    t = n t₀ + s_i + r,

where n and i are integers (n = 0, 1, …; i = 1, …, k) and 0 ≤ r < δ. Therefore,

    P_t f = P_{t₀}ⁿ P_{s_i} P_r f = P_{t₀}ⁿ f_i + P_{t₀}ⁿ P_{s_i} (P_r f − f).

Since ‖P_r f − f‖ ≤ ε and P_{t₀}ⁿ and P_{s_i} are contractive, we have

    ‖P_{t₀}ⁿ P_{s_i} (P_r f − f)‖ ≤ ε.

As a consequence, for every A ∈ 𝒜*,

    ∫_A P_t f(x) μ(dx) ≤ ∫_A P_{t₀}ⁿ f_i(x) μ(dx) + ε.

Evidently, n → ∞ as t → ∞ and the integrals on the right-hand side converge to zero, thus completing the proof. □
The main advantage of Theorem 7.11.1 is that it allows us to obtain many corollaries concerning sweeping for semigroups from previous results for iterates of a single operator. As an example, from Theorem 7.11.1 and Proposition 5.9.1 we have the following:

Proposition 7.11.1. Let (X, 𝒜, μ) be a measure space, and 𝒜* ⊂ 𝒜 be a given family of measurable sets. Furthermore, let P_t: L¹(X) → L¹(X) be a continuous stochastic semigroup for which there exists a Bielecki function V: X → R, a constant γ < 1, and a point t₀ > 0 such that

    ∫_X V(x) P_{t₀} f(x) μ(dx) ≤ γ ∫_X V(x) f(x) μ(dx)    for f ∈ D.

Then the semigroup {P_t}_{t≥0} is sweeping.

Proof. Since the operator P_{t₀} satisfies the conditions of Proposition 5.9.1, the sequence {P_{t₀}ⁿ} is sweeping. Theorem 7.11.1 completes the proof. □
More sophisticated applications of Theorem 7.11.1 will be given in the
next section.

7.12 Foguel Alternative for Continuous Time Systems

We start from a question concerning the relationship between the existence of an invariant density for a stochastic semigroup {P_t}_{t≥0} and for an operator P_{t₀} with a fixed t₀. Clearly, if f* is invariant with respect to P_t, so P_t f* = f* for all t ≥ 0, then f* is invariant for every operator P_{t₀}. The converse is, however, unfortunately false. Rather, we have the following result.

Proposition 7.12.1. If P_t: L¹(X) → L¹(X) is a continuous stochastic semigroup and if P_{t₀} f₀ = f₀ for some t₀ > 0 with f₀ ∈ D, then

    f*(x) = (1/t₀) ∫₀^{t₀} P_t f₀(x) dt

is a density and satisfies P_t f* = f* for all t ≥ 0.
Proof. From the definition of f* we have

    ∫_X f*(x) μ(dx) = ∫_X [ (1/t₀) ∫₀^{t₀} P_t f₀(x) dt ] μ(dx) = (1/t₀) ∫₀^{t₀} [ ∫_X P_t f₀(x) μ(dx) ] dt = 1.

Furthermore,

    P_t f* = (1/t₀) ∫₀^{t₀} P_{s+t} f₀ ds = (1/t₀) ∫_t^{t₀+t} P_s f₀ ds
           = (1/t₀) ∫₀^{t₀} P_s f₀ ds − (1/t₀) ∫₀^t P_s f₀ ds + (1/t₀) ∫_{t₀}^{t₀+t} P_s f₀ ds
           = f* + (1/t₀) ∫₀^t (P_{s+t₀} f₀ − P_s f₀) ds.

Since P_{t₀} f₀ = f₀ we have P_{s+t₀} f₀ − P_s f₀ = 0, and the last integral vanishes, thus completing the proof. □
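The averaging construction of Proposition 7.12.1 can be seen at work in a simple discretized example (the rotation semigroup on the circle is an illustrative choice): for S_t(x) = x + t (mod 1), P_1 is the identity, so every density f₀ is invariant for t₀ = 1, yet a nonuniform f₀ is not invariant under P_t for general t. The time average f* = ∫₀¹ P_t f₀ dt, however, comes out uniform, and hence invariant for all t.

```python
import numpy as np

N = 512
x = np.arange(N) / N                 # grid on [0,1); S_t(x) = x + t mod 1

def Pt(t, f):
    # Frobenius-Perron operator of the rotation: shift the density by t
    shift = int(round(t * N)) % N
    return np.roll(f, shift)

f0 = 1.0 + 0.8 * np.sin(2 * np.pi * x)   # a nonuniform density; P_1 f0 = f0
# f_* = (1/t0) * integral_0^{t0} P_t f0 dt with t0 = 1, via a Riemann sum
fstar = np.mean([Pt(k / N, f0) for k in range(N)], axis=0)

flat_err = np.max(np.abs(fstar - 1.0))          # f_* is the uniform density
invar_err = np.max(np.abs(Pt(0.3, fstar) - fstar))  # invariant for other t as well
```

Averaging over one period washes out the oscillatory part of f₀, leaving the constant density, exactly the f* the proposition produces.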

Now, using Theorems 5.9.1, 5.9.2, and 7.12.1, it is easy to establish the following alternative.

Theorem 7.12.1. Let (X, 𝒜, μ) be a measure space, and 𝒜* ⊂ 𝒜 be a given regular family of measurable sets. Furthermore, let P_t: L¹(X) → L¹(X) be a continuous stochastic semigroup such that for some t₀ > 0 the operator P_{t₀} satisfies the following conditions:

(1) P_{t₀} is an integral operator given by a stochastic kernel; and

(2) there is a locally integrable function f* such that

    P_{t₀} f ≤ f*    and    f* > 0 a.e.

Under these conditions, the semigroup {P_t}_{t≥0} either has an invariant density, or it is sweeping. If an invariant density exists and, in addition, P_{t₀} is an expanding operator, then the semigroup is asymptotically stable.

Proof. The proof is quite straightforward. Assume first that {P_t}_{t≥0} is not sweeping, so by Theorem 7.11.1 the sequence {P_{t₀}ⁿ} is also not sweeping. In this case, by Theorem 5.10.1 the operator P_{t₀} has an invariant density. Proposition 7.12.1 then implies that {P_t}_{t≥0} must have an invariant density. In the particular case that P_{t₀} is also an expanding operator, it follows from Theorem 5.10.2 that {P_{t₀}ⁿ} is asymptotically stable. Finally, Remark 7.4.2 implies that {P_t}_{t≥0} is also asymptotically stable.

In the second case, that {P_t}_{t≥0} is sweeping, {P_{t₀}ⁿ} is also, and by Theorem 5.10.1 the operator P_{t₀} does not have an invariant density. As a consequence, {P_t}_{t≥0} also does not have an invariant density. □

Exercises

7.1. Let A: L → L be a linear bounded operator, that is,

    ‖A‖ = sup{‖Af‖ : ‖f‖ ≤ 1} < ∞.

Using a comparison series, prove that

(a) e^{tA} f = Σ_{n=0}^{∞} (tⁿ/n!) Aⁿ f is strongly convergent in L for t ∈ R and f ∈ L;

(b) e^{(t₁+t₂)A} f = e^{t₁A} e^{t₂A} f for t₁, t₂ ∈ R, f ∈ L.

7.2. Again, let A: L → L be a linear bounded operator. Using the results of Exercise 7.1, prove that A is the infinitesimal operator of the semigroup

    T_t f = e^{tA} f

and that D(A) = L.

7.3. A linear operator A: D(A) → L¹ is called closed if the conditions

    ‖f_n − f‖ → 0,    ‖A f_n − g‖ → 0,    f_n ∈ D(A),    f, g ∈ L

imply that f ∈ D(A) and g = Af. Prove that the following operators are closed:

(a) The operator Af = df/dx defined on the set D(A) ⊂ L¹ of all absolutely continuous f ∈ L¹ such that f′ ∈ L¹.

(b) The operator Af = d²f/dx² defined on the set D(A) ⊂ L¹ of all f ∈ L¹ such that f′ is absolutely continuous and f″ ∈ L¹.
7.4. Generalize the previous results and show that every operator A satisfying the conditions of the Hille-Yosida theorem is closed.

7.5. In Section 7.9, using the Hille-Yosida theorem, we proved that A = d²/dx² generates the semigroup {T_t} given by formula (7.9.9). Reverse the calculation: assuming that {T_t} is defined by (7.9.9), show that A = d²/dx² is its infinitesimal operator.

7.6. Consider the one-dimensional (X = R) case of the continuity equation

    ∂u/∂t + ∂(F(x)u)/∂x = 0,    u(0,x) = f(x),

where F: R → R is a C¹ function satisfying

    |F(x)| ≤ c(1 + |x|^r)    for x ∈ R.

Assuming the above inequality with r = 1, show that the formula P_t f(x) = u(t,x) defines a stochastic semigroup on L¹(R). Find counterexamples showing that for r > 1 the semigroup {P_t} may not be well defined or stochastic.

7.7. Consider the semigroup {P_t} defined in the previous exercise (with r = 1). Show that

(a) {P_t} is not asymptotically stable;

(b) {P_t} is sweeping to +∞ if and only if all solutions of the equation x′ = F(x) satisfy lim_{t→∞} x(t) = ∞.

7.8. Consider the integro-differential equation

    ∂u(t,x)/∂t + u(t,x) = ∂²u(t,x)/∂x² + ∫₀^π K(x,y) u(t,y) dy    for t > 0, 0 ≤ x ≤ π,

where K: [0,π] × [0,π] → R is a stochastic kernel. Using the Hille-Yosida theorem and the Phillips perturbation theorem, show that this equation with the boundary value condition

    u′_x(t, 0) = u′_x(t, π) = 0

and the initial condition

    u(0,x) = f(x)

generates the stochastic semigroup P_t f(x) = u(t,x) on the space L¹([0,π]). In particular, define precisely the domain D(A) of A = d²/dx² for which the conditions of the Hille-Yosida theorem are satisfied (Jama, 1986).

8
Discrete Time Processes Embedded in Continuous Time Systems

In this chapter, our goal is to introduce a way in which discrete time processes may be embedded in continuous time systems without altering the phase space. To do this, we adopt a strictly probabilistic point of view, not embedding the deterministic system S: X → X in a continuous time process, but rather embedding its Frobenius-Perron operator P: L¹(X) → L¹(X) that acts on L¹ functions. The result of this embedding is an abstract form of the Boltzmann equation. This chapter requires some elementary definitions from probability theory and a knowledge of Poisson processes, which are introduced following the preliminary remarks of the next section.

8.1

The Relation Between Discrete and


Continuous Time Processes

For a semidynamical system {S_t}t≥0 on a phase space X, if we fix the time t at some value t₀, then by property (b) of Definition 7.2.3 the map S = S_{t₀} satisfies

Sⁿ(x) = S_{nt₀}(x)   for all x ∈ X.

Thus, in this fashion, a discrete time system may be generated from any continuous time (semidynamical) system. It is possible that a study of the discrete time system may yield some partial information concerning the continuous time system from which it was derived.
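This sampling construction is easy to see numerically. The sketch below is a hypothetical illustration, not an example from the text: it uses the flow S_t(x) = x e^{−t} generated by dx/dt = −x and checks that iterating the time-t₀ map S = S_{t₀} reproduces the flow at the times nt₀.

```python
import math

def S_t(x, t):
    # semidynamical system generated by dx/dt = -x, so S_t(x) = x * e^{-t}
    return x * math.exp(-t)

t0 = 0.7
S = lambda x: S_t(x, t0)   # discrete time system obtained by sampling at t0

y = 3.0
for n in range(5):
    y = S(y)               # five iterates of the sampled map
# by the semigroup property, y agrees with the flow at time 5 * t0
```

The agreement is exact up to floating-point roundoff, which is just property (b) of Definition 7.2.3 in action.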
Another way of obtaining a discrete time system from a continuous one is as follows. Again, suppose we are given a semidynamical system {S_t}t≥0.

FIGURE 8.1.1. Determination of the first return (or Poincaré) map for a semidynamical system {S_t}t≥0.

Also assume that we can find a closed set A ⊂ X such that, if x ∈ A, then, for t > 0 sufficiently small, S_t(x) ∉ A; that is, each trajectory leaves A immediately (see Figure 8.1.1). Further, if every trajectory that starts in A eventually returns to A, that is, for every x ∈ A there is a t′ > 0 such that S_{t′}(x) ∈ A, then we may define a new mapping, the first return map. This is given by

S(x) = S_{t′}(x),

where t′ is the smallest time t′ > 0 such that S_{t′}(x) ∈ A. Again, by studying S, we may gain some insight into the properties of {S_t}t≥0. This method was introduced by Poincaré, and the first return map is often called the Poincaré map.
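A first return map can also be located numerically. The sketch below is an illustrative construction, not an example from the text: it follows the planar flow x′ = y, y′ = −x, whose trajectories are circles, and detects the first return to the section A = {(x,0): x > 0}; since the motion is periodic, the return map is the identity and the return time is 2π.

```python
import math

def flow_step(x, y, dt):
    # one step of the flow x' = y, y' = -x, advanced exactly
    # (the flow is a clockwise rotation by the angle dt)
    c, s = math.cos(dt), math.sin(dt)
    return c * x + s * y, -s * x + c * y

def first_return(x0, dt=1e-4, max_time=100.0):
    """Return (S(x0), t'): the first return to the section {y = 0, x > 0}."""
    x, y = x0, 0.0
    t, prev_y = 0.0, 0.0
    while t < max_time:
        x, y = flow_step(x, y, dt)
        t += dt
        # the section is crossed when y passes through 0 with x > 0
        if prev_y > 0.0 >= y and x > 0.0:
            return x, t
        prev_y = y
    raise RuntimeError("no return found")

xr, t_ret = first_return(1.0)
```

Here the return point and return time are recovered to within the step size dt; for a nonperiodic flow the same crossing test yields a genuinely nontrivial map S.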
Thus it is relatively straightforward to devise ways to study continuous time processes by a reduction to a discrete time system. However, given a discrete time system S: X → X, it is much more difficult to embed it in a continuous time system and, indeed, such an embedding is, in general, impossible. That is, given S: X → X, generally there does not exist a {S_t}t≥0 such that S(x) = S_{t₀}(x) for some t₀ ≥ 0 [see Zdun, 1977]. For example, in previous chapters we considered the quadratic transformation S(x) = 4x(1 − x), x ∈ [0,1]. It can be proved that there does not exist a semidynamical system {S_t}t≥0 on [0,1] such that S_{t₀}(x) = 4x(1 − x). Of course, it is always possible to embed a discrete time process into a continuous time system by altering the phase space in an appropriate way.

8.2 Probability Theory and Poisson Processes


Up to this point we have almost never used the word probabilistic even
though we have dealt, from the outset, with normalized measures that are
also measures on a probability space. In this short section, we review all of
the material necessary for an understanding of Poisson processes.


The fundamental notion of probability theory is that of a probability space (Ω, F, prob), where Ω is a nonempty set called the space of all possible elementary events, F is a σ-algebra of subsets of Ω, which are called events, and "prob" is a normalized measure on F. The equality

prob(A) = p,   A ∈ F,

means that the probability of event A is p. From the fact that prob is a measure, it immediately follows that

prob(⋃ᵢ Aᵢ) = Σᵢ prob(Aᵢ),   (8.2.1)

where the Aᵢ ∈ F are mutually disjoint, that is, Aᵢ ∩ Aⱼ = ∅ for all i ≠ j.
To introduce the concept of independence, we define it as follows.

Definition 8.2.1. In a sequence of events A₁, A₂, ... (finite or not), the events are called independent if, for any increasing sequence of integers k₁ < k₂ < ⋯ < k_m,

prob(A_{k₁} ∩ A_{k₂} ∩ ⋯ ∩ A_{k_m}) = prob(A_{k₁}) ⋯ prob(A_{k_m}).   (8.2.2)

Equation (8.2.2) just means that the probability of all the events A_{k_i} occurring is the product of the probabilities that each will occur separately.
Random variables are defined next.
Definition 8.2.2. A random variable ξ is a measurable transformation from Ω into R. More precisely, ξ: Ω → R is a random variable if, for any Borel set B ⊂ R,

ξ⁻¹(B) = {ω ∈ Ω: ξ(ω) ∈ B} ∈ F.

This set is customarily written in the more compact notation {ξ ∈ B}. Thus, for any Borel set B ⊂ R, prob{ξ ∈ B} is well defined.
A function f ∈ D(R) is called the density of the random variable ξ if

prob{ξ ∈ B} = ∫_B f(x) dx   (8.2.3)

for any Borel set B ⊂ R.


Let ξ₁, ξ₂, ... be a sequence of random variables. We say the ξᵢ are independent if, for any sequence of Borel sets B₁, B₂, ..., the events

{ξ₁ ∈ B₁}, {ξ₂ ∈ B₂}, ...

are independent. Thus a finite sequence of independent random variables satisfies

prob{ξ₁ ∈ B₁, ..., ξₙ ∈ Bₙ} = prob{ξ₁ ∈ B₁} ⋯ prob{ξₙ ∈ Bₙ},   (8.2.4)


and the probability that all events {ξᵢ ∈ Bᵢ} will occur is simply given by the product of the probabilities that each will occur separately.
We are now in a position to make the concept of a stochastic process
precise with the following definition.

Definition 8.2.3. A stochastic process {ξ_t} is a family of random variables that depends on a parameter t, usually called time. If t assumes only integer values, t = 1, 2, ..., then the stochastic process reduces to a sequence {ξₙ} of random variables called a discrete time stochastic process. However, if t belongs to an interval (bounded or not) of R, then the stochastic process is called a continuous time stochastic process.
By its very definition, a stochastic process {ξ_t} is a function of two variables, namely, time t and event ω, but this is seldom made explicit by writing {ξ_t(ω)}. If the time is fixed, then ξ_t is simply a random variable. However, if ω is fixed, then the mapping t → ξ_t(ω) is called the sample path of the stochastic process.
Two important properties that stochastic processes may have are given in the following definition.
Definition 8.2.4. A continuous time stochastic process {ξ_t}t≥0 has independent increments if, for any sequence of times t₀ < t₁ < ⋯ < tₙ, the random variables

ξ_{t₁} − ξ_{t₀}, ξ_{t₂} − ξ_{t₁}, ..., ξ_{tₙ} − ξ_{tₙ₋₁}

are independent. Further, if for any t₁ and t₂ and Borel set B ⊂ R,

prob{ξ_{t₂+t′} − ξ_{t₁+t′} ∈ B}   (8.2.5)

does not depend on t′, then the continuous time stochastic process {ξ_t} has stationary independent increments.
Before giving the definition of a Poisson process, we note that a stochastic process {ξ_t} is called a counting process if its sample paths are nondecreasing functions of time with integer values. Counting processes will be denoted by {N_t}t≥0.

Definition 8.2.5. A Poisson process is a counting process {N_t}t≥0 with stationary independent increments satisfying:

(a) N₀ = 0;   (8.2.6a)

(b) lim_{t→0} (1/t) prob{N_t ≥ 2} = 0;   (8.2.6b)

(c) The limit

λ = lim_{t→0} (1/t) prob{N_t = 1}   (8.2.6c)

exists and is positive; and


(d) prob{N_t = k} as functions of t are continuous.


A classic example of a Poisson process is illustrated by a radioactive substance placed in a chamber equipped with a device for detecting and counting the total number of atomic disintegrations N_t that have occurred up to a time t. The amount of the substance must be sufficiently large that during the time of observation there is no significant decrease in the mass. This ensures that the probability (8.2.5) is independent of t′. It is an experimental observation that the number of disintegrations that occur during any given interval of time is independent of the number occurring during any other disjoint interval, thus giving stationary independent increments. Conditions (a)-(c) in Definition 8.2.5 have the following interpretations within this example: N₀ = 0 simply means that we start to count disintegrations from time t = 0. Condition (b) states that two or more disintegrations are unlikely in a short time, whereas (c) simply means that during a short time t the probability of one disintegration is proportional to t.
Also, the classical derivations of the Boltzmann equation implicitly assume that molecular collisions are a Poisson process. This fact will turn out to be important later.
It is interesting that from the properties of the Poisson process we may derive a complete description of the way the process depends on time. Thus we may derive an explicit formula for

p_k(t) = prob{N_t = k}.   (8.2.7)

This is carried out in two steps. First we derive an ordinary differential equation for p_k(t), and then we solve it. In our construction it will be useful to rewrite equations (8.2.6a) through (8.2.6c) using the notation of (8.2.7):

p₀(0) = 1,   (8.2.8a)

lim_{t→0} (1/t) Σ_{i=2}^∞ p_i(t) = 0,   (8.2.8b)

and

λ = lim_{t→0} (1/t) p₁(t).   (8.2.8c)

To obtain the differential equation for p_k(t), we first start with p₀(t), noting that p₀(t+h) may be written as

p₀(t+h) = prob{N_{t+h} = 0} = prob{N_{t+h} − N_t + N_t − N₀ = 0}.

Since N_t is nondecreasing, (N_{t+h} − N_t) + (N_t − N₀) = 0 if and only if (N_{t+h} − N_t) = 0 and (N_t − N₀) = 0. Thus,

p₀(t+h) = prob{(N_{t+h} − N_t) = 0 and (N_t − N₀) = 0}

= prob{N_{t+h} − N_t = 0} prob{N_t − N₀ = 0}

= prob{N_h − N₀ = 0} prob{N_t − N₀ = 0}

= p₀(h)p₀(t),   (8.2.9)

where we have used the property of stationary independent increments.


From (8.2.9) we may write

[p₀(t+h) − p₀(t)]/h = [(p₀(h) − 1)/h] p₀(t).   (8.2.10)

Since Σ_{i=0}^∞ p_i(t) = 1, we have

(p₀(h) − 1)/h = −p₁(h)/h − (1/h) Σ_{i=2}^∞ p_i(h),

and, thus, by taking the limit of both sides of (8.2.10) as h → 0, we obtain

dp₀(t)/dt = −λp₀(t).   (8.2.11)

The derivation of the differential equation for p_k(t) proceeds in a similar fashion. Thus

p_k(t+h) = prob{N_{t+h} = k}

= prob{N_{t+h} − N_t + N_t − N₀ = k}

= prob{N_t − N₀ = k and N_{t+h} − N_t = 0}

+ prob{N_t − N₀ = k − 1 and N_{t+h} − N_t = 1}

+ Σ_{i=2}^k prob{N_t − N₀ = k − i and N_{t+h} − N_t = i}

= p_k(t)p₀(h) + p_{k−1}(t)p₁(h) + Σ_{i=2}^k p_{k−i}(t)p_i(h).

As before, we have

[p_k(t+h) − p_k(t)]/h = [(p₀(h) − 1)/h] p_k(t) + [p₁(h)/h] p_{k−1}(t) + (1/h) Σ_{i=2}^k p_{k−i}(t)p_i(h),

and, by taking the limit as h → 0, we obtain

dp_k(t)/dt = −λp_k(t) + λp_{k−1}(t),   k ≥ 1.   (8.2.12)

The initial conditions for p₀(t) and p_k(t), k ≥ 1, are just p₀(0) = 1 (by definition), and this immediately gives p_k(0) = 0 for all k ≥ 1. Thus, from (8.2.11), we have

p₀(t) = e^{−λt}.   (8.2.13)


FIGURE 8.2.1. Probabilities p₀(t), p₁(t), p₂(t) versus λt for a Poisson process.

Substituting this into (8.2.12) when k = 1 gives

dp₁(t)/dt = −λp₁(t) + λe^{−λt},

whose solution is

p₁(t) = λt e^{−λt}.

Repeating this procedure for k = 2, ..., we find, by induction, that

p_k(t) = (λt)^k e^{−λt}/k!.   (8.2.14)

The behavior of p_k(t) as a function of t is shown in Figure 8.2.1 for k = 0, 1, and 2. Figure 8.2.2 shows p_k(t) versus k for several values of λt.
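Formula (8.2.14) is easy to check by simulation. The sketch below is an illustration, not from the text; it uses the standard fact that a Poisson process can be realized through independent exponentially distributed interarrival times, and compares the empirical frequencies of {N_t = k} with p_k(t) = (λt)^k e^{−λt}/k!.

```python
import math
import random

def poisson_count(lam, t, rng):
    """Number of events of a rate-lam Poisson process in [0, t],
    simulated from independent exponential interarrival times."""
    n, s = 0, rng.expovariate(lam)
    while s <= t:
        n += 1
        s += rng.expovariate(lam)
    return n

def p_k(k, lam, t):
    # equation (8.2.14): p_k(t) = (lam*t)**k * exp(-lam*t) / k!
    return (lam * t) ** k * math.exp(-lam * t) / math.factorial(k)

rng = random.Random(0)
lam, t, trials = 1.0, 2.0, 20000
counts = [poisson_count(lam, t, rng) for _ in range(trials)]
empirical = {k: counts.count(k) / trials for k in range(5)}
```

With 20000 trials the empirical frequencies match the exact probabilities to within a few standard errors, mirroring Figure 8.2.2.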

Remark 8.2.1. Note that in our derivation of equation (8.2.12) we have only used h > 0 and, therefore, the derivative dp_k/dt on the left-hand side of (8.2.12) is, in fact, the right-hand derivative of p_k. However, it is known [Szarski, 1967] that, if the right-hand derivative exists and the p_k are continuous [as they are here by assumption (d) of Definition 8.2.5], then there is a unique solution to (8.2.12). Thus the functions (8.2.14) give the unique solution to the problem. □

Although the way we have introduced Poisson processes and derived the expressions for p_k(t) is the most common, there are other ways in which this may be accomplished. However, all these derivations, as indicated by properties (a)-(c) of Definition 8.2.5, show that a Poisson process results if the events counted by N_t are caused by a large number of independent factors, each of which has a small probability of incrementing N_t.


FIGURE 8.2.2. Plots of p_k(t) versus k for a Poisson process with λt = 0.1, 1.0, or 10.

8.3 Discrete Time Systems Governed by Poisson Processes
A particular sample path for a Poisson process might look like the one shown in Figure 8.3.1. In this section we develop some ideas and tools that will allow us to study the behavior of a deterministic discrete time process, given by a nonsingular transformation S: X → X on a measure space (X, 𝒜, μ), coupled with a Poisson process {N_t}t≥0. The coupling is such that, even though the dynamics are deterministic, the times at which the transformation S operates are determined by the Poisson process. Thus we consider the situation in which each point x ∈ X is transformed into S^{N_t}(x). This may be written symbolically as

x → S^{N_t}(x)

for times in the interval [0, ∞). Specifically, we consider the following problem. Given an initial distribution of points x ∈ X, with density f, how does this distribution evolve in time? We denote the time-dependent density by u(t,x) and set u(0,x) = f(x).
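A sample path of t → S^{N_t}(x) is straightforward to simulate. The sketch below is an illustration, not from the text; S is the quadratic transformation 4x(1 − x) met earlier, the event times come from exponential interarrivals (the standard realization of a Poisson process), and the state jumps from x to S(x) at each event.

```python
import random

def S(x):
    # the quadratic transformation considered in previous chapters
    return 4.0 * x * (1.0 - x)

def sample_path(x0, lam, t_max, rng):
    """One realization of t -> S^{N_t}(x0): the state jumps from x to S(x)
    at each event time of a rate-lam Poisson process."""
    times, states = [0.0], [x0]
    t, x = 0.0, x0
    while True:
        t += rng.expovariate(lam)      # exponential interarrival time
        if t > t_max:
            break
        x = S(x)                       # apply the map at the event
        times.append(t)
        states.append(x)
    return times, states

rng = random.Random(1)
times, states = sample_path(0.2, lam=2.0, t_max=5.0, rng=rng)
n_events = len(times) - 1              # this realization's value of N_{t_max}
```

Between events the state is constant, so the path is a step function like the one in Figure 8.3.1, with heights S^k(x₀).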
The solution of this problem starts with a calculation of the probability


that

S^{N_t}(x) ∈ A   (8.3.1)

for a given set A ∈ 𝒜 and time t > 0. This probability depends on two factors: the initial density f and the counting process {N_t}t≥0.

FIGURE 8.3.1. A sample path for a Poisson process.

To be more precise, we need to calculate the measure of the set

{(ω,x): S^{N_t(ω)}(x) ∈ A}.   (8.3.2)
This, in turn, requires some assumptions concerning the product space Ω × X given by

Ω × X = {(ω,x): ω ∈ Ω, x ∈ X}

that contains all sets of the form (8.3.2). In the space Ω × X we define (see Theorem 2.2.2) a product measure Prob that, for the sets C × A, C ∈ F, A ∈ 𝒜, is given by

Prob(C × A) = prob(C)μ_f(A),   (8.3.3)

where, as usual,

μ_f(A) = ∫_A f(x)μ(dx).

This measure is denoted by "Prob" since it is a probability measure. Equation (8.3.3) intuitively corresponds to the assumption that the initial position x and the stochastic process {N_t}t≥0 are independent.
Now we may proceed to calculate the measure of the set (8.3.2). This set may be rewritten as the union of disjoint subsets in the following way:

{(ω,x): S^{N_t(ω)}(x) ∈ A} = ⋃_{k=0}^∞ {N_t(ω) = k, S^k(x) ∈ A}

= ⋃_{k=0}^∞ {N_t(ω) = k} × {x: S^k(x) ∈ A}.


Thus the Prob of this set is

Prob{S^{N_t} ∈ A} = Σ_{k=0}^∞ Prob{N_t(ω) = k, S^k(x) ∈ A}

= Σ_{k=0}^∞ prob{N_t = k} μ_f(x ∈ S^{−k}(A))

= Σ_{k=0}^∞ p_k(t) ∫_{S^{−k}(A)} f(x)μ(dx)

= Σ_{k=0}^∞ p_k(t) ∫_A P^k f(x)μ(dx),   (8.3.4)

so that

Prob{S^{N_t} ∈ A} = ∫_A Σ_{k=0}^∞ p_k(t) P^k f(x) μ(dx)   for A ∈ 𝒜,   (8.3.5)

where, as before, P denotes the Frobenius-Perron operator associated with S, and we have assumed that S: X → X is nonsingular.
The integrand on the right-hand side of (8.3.5) is just the desired density, u(t,x):

u(t,x) = Σ_{k=0}^∞ p_k(t) P^k f(x).   (8.3.6)

[Note that the change in order of integration and summation in arriving at (8.3.5) is correct since ‖P^k f‖ = 1 and Σ_{k=0}^∞ p_k(t) = 1. Thus the series on the right-hand side of (8.3.6) is strongly convergent in L¹.]
Differentiating (8.3.6) with respect to t and using (8.2.12), we have

∂u(t,x)/∂t = Σ_{k=0}^∞ [dp_k(t)/dt] P^k f(x)

= −λ Σ_{k=0}^∞ p_k(t) P^k f(x) + λ Σ_{k=1}^∞ p_{k−1}(t) P^k f(x).

Since the last two series are strongly convergent in L¹, the term-by-term differentiation was justified. Thus we have

∂u(t,x)/∂t = −λu(t,x) + λ Σ_{k=0}^∞ p_k(t) P^{k+1} f(x)

= −λu(t,x) + λP Σ_{k=0}^∞ p_k(t) P^k f(x)

= −λu(t,x) + λPu(t,x).


Therefore u(t,x) satisfies the differential equation

∂u(t,x)/∂t = −λu(t,x) + λPu(t,x)   (8.3.7)

with, from (8.3.6), the initial condition

u(0,x) = f(x).

We may always change the time scale in (8.3.7) to give

∂u(t,x)/∂t = −u(t,x) + Pu(t,x).   (8.3.8)

From a formal point of view, equation (8.3.7) is a generalization of the system of differential equations (8.2.11) and (8.2.12) derived for the Poisson process. Consider the special case where X is the set of nonnegative integers {0, 1, ...}, μ is the counting measure, and S(x) = x + 1. For a single point n ≥ 1,

Pf(n) = f(n − 1),

and when n = 0, Pf(0) = 0. Thus, from (8.3.7), we have

∂u(t,n)/∂t = −λu(t,n) + λu(t,n−1),   n ≥ 1,

and

∂u(t,0)/∂t = −λu(t,0),

which are identical with equations (8.2.12) and (8.2.11), respectively, except that the initial condition is more general than for the Poisson process since u(0,n) = f(n).
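For this special case the series (8.3.6) can be evaluated directly. In the numerical illustration below, P is the shift operator Pf(n) = f(n − 1), so P^k f(n) = f(n − k); with the initial density concentrated at n = 0 the solution u(t,n) reduces to the Poisson probabilities p_n(t).

```python
import math

def p_k(k, lam, t):
    # p_k(t) = (lam*t)**k * exp(-lam*t) / k!, equation (8.2.14)
    return (lam * t) ** k * math.exp(-lam * t) / math.factorial(k)

def u(t, n, f, lam):
    """Series (8.3.6): u(t,n) = sum_k p_k(t) P^k f(n) for the shift
    S(x) = x + 1, whose Frobenius-Perron operator satisfies
    P f(n) = f(n-1) and P f(0) = 0, hence P^k f(n) = f(n - k)."""
    return sum(p_k(k, lam, t) * f(n - k) for k in range(n + 1))

# initial density concentrated at 0 (counting measure on {0, 1, ...})
delta0 = lambda n: 1.0 if n == 0 else 0.0
```

With f = delta0 only the k = n term survives, so u(t,n) = p_n(t), recovering the Poisson process itself as a solution of the abstract equation.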

8.4 The Linear Boltzmann Equation: An Intuitive Point of View
Our derivation in the preceding section of equation (8.3.8) for the density u(t,x) was quite long, as we wished to be precise and show the connection with Poisson processes. In this section we present a more intuitive derivation of the same result, using arguments similar to those often employed in statistical mechanics.
Assume that we have a hypothetical system consisting of N particles enclosed in a container, where N is a large number. Each particle may change its velocity x = (v₁, v₂, v₃) from x to S(x) only by colliding with the walls of the container. Our problem is to determine how the velocity

distribution of particles evolves with time. Thus we must determine the function u(t,x) such that

N ∫_A u(t,x) dx

is the number of particles having, at time t, velocities in the set A.
The change in the number of particles whose velocity is in A, between t and t + Δt, is given by

N ∫_A u(t+Δt,x) dx − N ∫_A u(t,x) dx.   (8.4.1)

From our assumption, such a change can only take place through collisions with the walls of the container. Take Δt to be sufficiently small so that a negligible number of particles make two or more collisions with a wall during Δt. Thus, the number of particles striking the wall during a time Δt with velocity in A before the collision [and, therefore, having velocities in S(A) after the collision] is

NλΔt ∫_A u(t,x) dx,   (8.4.2)

where λN is the number of particles striking the walls per unit time. In this idealized, abstract example we neglect the quite important physical fact that the faster particles strike the walls of the container more frequently than do the slower particles.
Conversely, to find the number of particles whose velocity is in A after the collision, we must calculate the number having velocities in the set S⁻¹(A) before the collision. Again, assuming Δt to be sufficiently small to make the number of double collisions by single particles negligible, we have

NλΔt ∫_{S⁻¹(A)} u(t,x) dx.   (8.4.3)

Hence the total change in the number of particles with velocity in the set A over a short time Δt is given by the difference between (8.4.3) and (8.4.2):

NλΔt ∫_{S⁻¹(A)} u(t,x) dx − NλΔt ∫_A u(t,x) dx.   (8.4.4)

By combining equation (8.4.1) with equation (8.4.4), we have

N ∫_A [u(t+Δt,x) − u(t,x)] dx = λN Δt { ∫_{S⁻¹(A)} u(t,x) dx − ∫_A u(t,x) dx },

and, since

∫_{S⁻¹(A)} u(t,x) dx = ∫_A Pu(t,x) dx,


where P is the Frobenius-Perron operator associated with S, we have

N ∫_A [u(t+Δt,x) − u(t,x)] dx = λN Δt ∫_A [−u(t,x) + Pu(t,x)] dx.   (8.4.5)

Equation (8.4.5) is exact to within an error that is small compared to Δt. By dividing through in (8.4.5) by Δt and passing to the limit Δt → 0, we obtain

∫_A [∂u(t,x)/∂t] dx = λ ∫_A [−u(t,x) + Pu(t,x)] dx,

which gives

∂u(t,x)/∂t = −λu(t,x) + λPu(t,x).

Thus we have again arrived at equation (8.3.7).
In this derivation we assumed that the particle, upon striking the wall, changed its velocity from x to S(x), where S: X → X is a point-to-point transformation. An alternative physical assumption, which is more general from a mathematical point of view, would be to assume that the change in velocity is not uniquely determined but is a probabilistic event. In other words, we might assume that collision with the walls of the container alters the distribution of particle velocities. Thus, if before the collision the particles have a velocity distribution with density g, then after the collision they have a distribution with density Pg, where P: L¹(X) → L¹(X) is a Markov operator.
So, assume as before that u(t,x) is the density of the distribution of particles having velocity x at time t, so

N ∫_A u(t,x) dx

is the number of particles with velocities in A. Once again,

λN Δt ∫_A u(t,x) dx

is the number of particles with velocity in A that will collide with the walls in a time Δt, whereas

λN Δt ∫_A Pu(t,x) dx

is the number of particles whose velocities go into A because of collisions over a time Δt. Thus,

−λN Δt ∫_A u(t,x) dx + λN Δt ∫_A Pu(t,x) dx


is the net change, due to collisions over a time Δt, in the number of particles whose velocities are in A.
Combining this result with (8.4.1), we immediately obtain the balance equation (8.4.5), which leads once again to (8.3.7). The only difference is that P is no longer a Frobenius-Perron operator corresponding to a given one-to-one deterministic transformation S, but is an arbitrary Markov operator.
Since in our intuitive derivations of (8.3.7) presented in this section we used arguments that are employed to derive a Boltzmann equation, we will call equation (8.3.7) a linear abstract Boltzmann equation corresponding to a collision (Markov) operator P. To avoid confusion with the usual Boltzmann equation, bear in mind that x corresponds to the particle velocity and not to position. Indeed, it is because we assume that the only source of change for particle velocity is collisions with the wall that drift and external force terms do not appear in (8.3.7).
Our next goal will not be to apply equation (8.3.7) to specific physical systems. Rather, we will demonstrate the interdependence between the properties of discrete time deterministic processes, governed by S: X → X or a Markov operator, and the continuous time process determined by (8.3.7). The next four sections are devoted to an examination of the most important properties of (8.3.7), and then in the last section we demonstrate that the Tjon-Wu representation of the Boltzmann equation is a special case of (8.3.7).

8.5 Elementary Properties of the Solutions of the Linear Boltzmann Equation

To facilitate our study of the linear Boltzmann equation (8.3.8), we will consider the solution u(t,x) as a function from the positive real numbers R⁺ into L¹:

u: R⁺ → L¹.

Thus, by writing (8.3.8) in the form

du/dt = (P − I)u,   (8.5.1)

where P is a given Markov operator and I is the identity operator, we may apply the Hille-Yosida theorem 7.8.1 to the study of equation (8.3.8).
All three assumptions (a)-(c) of the Hille-Yosida theorem are easily shown to be satisfied by the operator (P − I) of equation (8.5.1). First, since A = P − I is defined on the whole space L¹, D(A) = L¹, and property (a) is thus trivially satisfied.
To check property (b), rewrite the resolvent equation λf − Af = g using A = P − I to give

(λ + 1)f − Pf = g.   (8.5.2)

Equation (8.5.2) may be easily solved by the method of successive approximations. Starting from an arbitrary f₀, we define fₙ by

(λ + 1)fₙ − Pfₙ₋₁ = g,

so, as a consequence,

fₙ = (λ + 1)^{−n} Pⁿ f₀ + Σ_{k=1}^n (λ + 1)^{−k} P^{k−1} g.   (8.5.3)

Since ‖P^k g‖ ≤ ‖g‖, the series in (8.5.3) is convergent, and the unique solution f of the resolvent equation (8.5.2) is

f = R_λ g = Σ_{k=1}^∞ (λ + 1)^{−k} P^{k−1} g.   (8.5.4)

Remark 8.5.1. The method of successive approximations applied to an equation such as (8.5.2) will always result in a solution (8.5.3) that converges to a unique limit, as n → ∞, when ‖P‖ < λ + 1. The limiting solution given by (8.5.4) is called a von Neumann series. □
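The von Neumann series can be checked numerically on a finite state space, where a Markov operator is a matrix with nonnegative entries and unit column sums (the matrix below is an arbitrary illustrative choice, not from the text). Truncating the series (8.5.4) produces an f that satisfies the resolvent equation (8.5.2) and the normalization used in checking property (c).

```python
def apply_P(f):
    # toy Markov operator on a 3-point space; columns sum to 1,
    # so sum(P f) = sum(f) (an illustrative assumption, not from the text)
    M = [[0.5, 0.2, 0.0],
         [0.3, 0.5, 0.5],
         [0.2, 0.3, 0.5]]
    return [sum(M[i][j] * f[j] for j in range(3)) for i in range(3)]

def resolvent(g, lam, terms=200):
    """Partial sum of the von Neumann series (8.5.4):
    R_lam g = sum_{k>=1} (lam+1)**(-k) P^{k-1} g."""
    f = [0.0, 0.0, 0.0]
    pg = g[:]                          # holds P^{k-1} g, starting at k = 1
    for k in range(1, terms + 1):
        w = (lam + 1.0) ** (-k)
        f = [fi + w * pi for fi, pi in zip(f, pg)]
        pg = apply_P(pg)
    return f

g = [0.2, 0.5, 0.3]                    # a density on the 3-point space
lam = 1.0
f = resolvent(g, lam)
# f solves (lam+1) f - P f = g, and lam * R_lam preserves the integral
```

Since ‖P‖ = 1 < λ + 1, the geometric factor (λ + 1)^{−k} makes the truncation error negligible after a few hundred terms.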
To check that the linear Boltzmann equation satisfies property (c) of the Hille-Yosida theorem, integrate (8.5.4) over the entire space X to give

∫_X R_λ g(x) μ(dx) = Σ_{k=1}^∞ (λ + 1)^{−k} ∫_X P^{k−1} g(x) μ(dx)

= Σ_{k=1}^∞ (λ + 1)^{−k} ∫_X g(x) μ(dx)

= (1/λ) ∫_X g(x) μ(dx) = 1/λ

for every density g, where we used the integral-preserving property of Markov operators in passing from the first to the second line. Thus,

∫_X λR_λ g(x) μ(dx) = 1,

and, since λR_λ is linear, nonnegative, and also preserves the integral, it is a Markov operator. Thus condition (c) is automatically satisfied (see Corollary 7.8.1).
Therefore, by the Hille-Yosida theorem, the linear Boltzmann equation (8.3.8) generates a continuous semigroup of Markov operators, {P_t}t≥0.


To determine an explicit formula for P_t, we first write

A_λ f = λ(λR_λ f − f),

so

lim_{λ→∞} A_λ f = Pf − f.

Thus, by the Hille-Yosida theorem and equation (7.8.3), the unique semigroup corresponding to A = P − I is given by

P_t f = e^{t(P−I)} f,   (8.5.5)

and the unique solution to equation (8.3.8) with the initial condition u(0,x) = f(x) is

u(t,x) = e^{t(P−I)} f(x).   (8.5.6)
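On a finite state space, formula (8.5.6) can be evaluated by truncating the exponential series e^{t(P−I)} f = e^{−t} Σₙ (tⁿ/n!) Pⁿf. The sketch below, built on an arbitrary illustrative Markov matrix (an assumption, not an operator from the text), confirms that P_t f remains a density and that u(t) = P_t f satisfies du/dt = (P − I)u.

```python
import math

def apply_P(f):
    # toy column-stochastic Markov operator (illustrative assumption only)
    M = [[0.5, 0.2, 0.0],
         [0.3, 0.5, 0.5],
         [0.2, 0.3, 0.5]]
    return [sum(M[i][j] * f[j] for j in range(3)) for i in range(3)]

def P_t(f, t, nmax=60):
    """Truncation of (8.5.6): P_t f = e^{-t} sum_n (t^n / n!) P^n f."""
    out = [0.0] * len(f)
    pnf = f[:]                             # holds P^n f, starting at n = 0
    for n in range(nmax + 1):
        w = math.exp(-t) * t ** n / math.factorial(n)
        out = [o + w * p for o, p in zip(out, pnf)]
        pnf = apply_P(pnf)
    return out

f = [1.0, 0.0, 0.0]                        # an initial density
u = P_t(f, 3.0)
```

The truncation level nmax is chosen so that the Poissonian weights e^{−t}tⁿ/n! beyond it are negligible for the t used.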
Although we have determined the solution of (8.3.8) using the Hille-Yosida theorem, precisely the same result could have been obtained by applying the method of successive approximations to equation (8.5.1). However, our derivation once again illustrates the techniques involved in using the Hille-Yosida theorem and establishes that (8.3.8) generates a continuous semigroup of Markov operators. Finally, we note that if P in equation (8.3.8) is a Frobenius-Perron operator corresponding to a nonsingular transformation S, the solution can be obtained by substituting equation (8.2.14) into equation (8.3.6).
In addition to the existence and uniqueness of the solution to (8.3.8), other properties of P_t may be demonstrated.
Property 1. From inequality (7.4.7) we know that, given f₁, f₂ ∈ L¹, the norm

‖P_t f₁ − P_t f₂‖   (8.5.7)

is a nonincreasing function of time t.

Property 2. If for some f ∈ L¹ the limit

f_* = lim_{t→∞} P_t f   (8.5.8)

exists, then, for the same f,

lim_{t→∞} P_t(Pf) = f_*.   (8.5.9)

To show this, we prove even more, namely that

lim_{t→∞} P_t(f − Pf) = 0   (8.5.10)

for all f ∈ L¹. Now,

P_t f = e^{−t} Σ_{n=0}^∞ (tⁿ/n!) Pⁿ f   (8.5.11)

and

P_t(Pf) = e^{−t} Σ_{n=0}^∞ (tⁿ/n!) Pⁿ⁺¹ f = e^{−t} Σ_{n=1}^∞ [t^{n−1}/(n−1)!] Pⁿ f.

Taking the norm of P_t f − P_t(Pf), we have

‖P_t f − P_t(Pf)‖ ≤ e^{−t} Σ_{n=1}^∞ |tⁿ/n! − t^{n−1}/(n−1)!| ‖Pⁿ f‖ + e^{−t} ‖f‖

≤ e^{−t} Σ_{n=1}^∞ |tⁿ/n! − t^{n−1}/(n−1)!| ‖f‖ + e^{−t} ‖f‖.

If t = m, an integer, then

e^{−t} Σ_{n=1}^∞ |tⁿ/n! − t^{n−1}/(n−1)!| = 2e^{−m} mᵐ/m! − e^{−m},

since almost all of the terms in the series cancel. However, by Stirling's formula, m! = mᵐ e^{−m} √(2πm) θ_m, where θ_m → 1 as m → ∞. Thus, for integer t, ‖P_t f − P_t(Pf)‖ converges to zero as t → ∞. Since, by Property 1, this quantity is a nonincreasing function of t, (8.5.10) is demonstrated for all t → ∞. Finally, inserting (8.5.8) into (8.5.10) gives the desired result, (8.5.9).

Remark 8.5.2. Note that the sum of the coefficients of P_t f given in equation (8.5.11) is identically 1, and thus the solutions of the linear Boltzmann equation u(t,x) = P_t f(x) bear a strong correspondence to the averages Aₙf studied earlier in Chapter 5, with n and t playing analogous roles. □

Property 3. The operators P and P_t commute, that is, PP_t f = P_t Pf for all f ∈ L¹. This is easily demonstrated by applying P to (8.5.11).

Property 4. If for some f ∈ L¹ the limit (8.5.8), f_* = lim_{t→∞} P_t f, exists, then f_* is a fixed point of the Markov operator P, that is,

Pf_* = f_*.

To show this, note that if f_* = lim_{t→∞} P_t f, then, by (8.5.9),

lim_{t→∞} P_t(Pf) = f_*,

while, by Property 3 and the continuity of P,

lim_{t→∞} P_t(Pf) = lim_{t→∞} P(P_t f) = Pf_*,

which gives the desired result. Further, the same argument shows that, if f_* = lim_{n→∞} P_{tₙ} f exists for some subsequence {tₙ}, then Pf_* = f_*.
Property 5. If Pf_* = f_* for some f_* ∈ L¹, then also P_t f_* = f_*.
This is also easy to show. Write Pf_* = f_* as

(P − I)f_* = 0.

Since (P − I) = A is an infinitesimal operator, and every solution of Af = 0 is a fixed point of the semigroup (see Section 7.8), we have immediately that P_t f_* = f_*.

8.6 Further Properties of the Linear Boltzmann Equation
As shown in the preceding section, the solutions of the linear Boltzmann equation are rather regular in their behavior; that is, the distance between any two solutions never increases. Now we will show that, under a rather mild condition, P_t f always converges to a limit.
Recall our definition of precompactness (Section 5.1) and observe that every sequence {fₙ} that is weakly precompact contains a subsequence that is weakly convergent. Analogously, if for a given f the trajectory {P_t f} is weakly precompact, then there exists a sequence {tₙ} such that {P_{tₙ} f} is weakly convergent as tₙ → ∞. To see this, take an arbitrary sequence of numbers tₙ′ → ∞ and, then, applying the definition of precompactness to {P_{tₙ′} f}, choose a weakly convergent subsequence {P_{tₙ} f} of {P_{tₙ′} f}.
Theorem 8.6.1. If the trajectory {P_t f} is weakly precompact, then there exists a fixed point of P.

Proof. If {P_t f} is weakly precompact, then there exists a sequence {tₙ} such that the weak limit

f_* = lim_{n→∞} P_{tₙ} f   (8.6.1)

exists. This implies the weak convergence

lim_{n→∞} P_{tₙ}(Pf) = Pf_*.   (8.6.2)


However, from (8.5.10), we have

lim_{n→∞} P_{tₙ}(f − Pf) = 0,

and, thus, from equations (8.6.1) and (8.6.2), we have

Pf_* = f_*,

which establishes the claim. Note also from Property 5 of P_t (Section 8.5) that this implies P_t f_* = f_*.

Theorem 8.6.2. For a given f ∈ L¹, if the trajectory {P_t f} is weakly precompact, then P_t f strongly converges to a limit.

Proof. From Theorem 8.6.1 we know that P_{tₙ} f converges weakly to an f_* that is a fixed point of P and P_t. Write f ∈ L¹ in the form

f = f − f_* + f_*.
Assume that for every ε > 0 the function f − f_* may be written in the form

f − f_* = Pg − g + r,   (8.6.3)

where g ∈ L¹ and ‖r‖ ≤ ε. (We will prove in the following that this representation is possible.) By using (8.6.3), we may write

P_t f = P_t(f − f_* + f_*) = P_t(Pg − g) + P_t f_* + P_t r.

However,

P_t f_* = f_*,

and, thus,

P_t f − f_* = P_t(Pg − g) + P_t r.

From (8.5.10), the first term on the right-hand side approaches zero as t → ∞, whereas the norm of the second term is not greater than ε. Thus

‖P_t f − f_*‖ ≤ 2ε

for t sufficiently large, and, since ε is arbitrary,

lim_{t→∞} ‖P_t f − f_*‖ = 0,

which completes the proof if (8.6.3) is true. Suppose (8.6.3) is not true, which implies that

f − f_* ∉ closure((P − I)L¹(X)).

This, in turn, implies by the Hahn-Banach theorem (see Proposition 5.2.3) that there is a g₀ ∈ L^∞ such that

⟨f − f_*, g₀⟩ ≠ 0   (8.6.4)


and

⟨h, g₀⟩ = 0

for all h ∈ closure((P − I)L¹(X)). In particular,

⟨(P − I)Pⁿf, g₀⟩ = 0,

since (P − I)Pⁿf ∈ (P − I)L¹(X), so

⟨Pⁿ⁺¹f, g₀⟩ = ⟨Pⁿf, g₀⟩

for n = 0, 1, .... Thus, by induction, we have

⟨Pⁿf, g₀⟩ = ⟨f, g₀⟩.   (8.6.5)

Furthermore, since e^{−t} Σ_{n=0}^∞ tⁿ/n! = 1, we may multiply both sides of (8.6.5) by e^{−t}tⁿ/n! and sum over n to obtain

⟨P_t f, g₀⟩ = ⟨f, g₀⟩.   (8.6.6)

Substituting t = tₙ and taking the limit as tₙ → ∞ in (8.6.6) gives

⟨f_*, g₀⟩ = ⟨f, g₀⟩,

and, thus,

⟨f_* − f, g₀⟩ = 0,

which contradicts equation (8.6.4). Thus (8.6.3) is true.

8.7 Effect of the Properties of the Markov Operator on Solutions of the Linear Boltzmann Equation
From the results of Section 8.6, some striking properties of the solutions of the linear Boltzmann equation emerge. The first of these is stated in the following corollary.

Corollary 8.7.1. If for f ∈ L¹ there exists a g ∈ L¹ such that

|P_t f| ≤ g,   t ≥ 0,   (8.7.1)

then the (strong) limit

f_* = lim_{t→∞} P_t f   (8.7.2)


exists. That is, either P_t f is not bounded by any integrable function or P_t f is strongly convergent.

Proof. Observe that {P_t f} is weakly precompact by our first criterion of precompactness; see Section 5.1. Thus the limit (8.7.2) exists according to Theorem 8.6.2.
With this result available to us, we may go on to state and demonstrate some important corollaries that give information concerning the convergence of solutions P_t f of (8.3.8) when the operator P has various properties.

Corollary 8.7.2. If the (Markov) operator P has a positive fixed point f_*, f_*(x) > 0 a.e., then the strong limit, lim_{t→∞} P_t f, exists for all f ∈ L¹.

Proof. First note that when the initial function f satisfies

|f| ≤ cf_*   (8.7.3)

for some sufficiently large constant c > 0, we have

|Pⁿf| ≤ Pⁿ(cf_*) = cPⁿf_* = cf_*.

Multiply both sides by e^{−t}tⁿ/n! and sum the result over n to give

e^{−t} Σ_{n=0}^∞ (tⁿ/n!) |Pⁿf| ≤ cf_*.

The left-hand side of this inequality dominates |P_t f|, so that

|P_t f| ≤ cf_*,

and, since P_t f is bounded, by Corollary 8.7.1 we know that the strong limit lim_{t→∞} P_t f exists.
In the more general case when the initial function f does not satisfy (8.7.3), we proceed as follows. Define a new function by

f_c(x) = f(x) if |f(x)| ≤ cf_*(x), and f_c(x) = 0 if |f(x)| > cf_*(x).

It follows from the Lebesgue dominated convergence theorem that

lim_{c→∞} ‖f_c − f‖ = 0.

Thus, by writing f = f_c + f − f_c, we have

P_t f = P_t f_c + P_t(f − f_c).

Since f_c satisfies

|f_c| ≤ cf_*,

we know from (8.7.3) that {P_t f_c} converges strongly. Now take ε > 0. Since {P_t f_c} is strongly convergent, there is a t₀ > 0, which in general depends on c, such that

‖P_t f_c − P_{t+t′} f_c‖ ≤ ε   for t ≥ t₀, t′ ≥ 0.   (8.7.4)

Further,

‖P_t(f − f_c)‖ ≤ ε   for t ≥ 0   (8.7.5)

for a fixed but sufficiently large c. From equations (8.7.4) and (8.7.5) it follows that

‖P_t f − P_{t+t′} f‖ ≤ 3ε   for t ≥ t₀, t′ ≥ 0,

which is the Cauchy condition for {P_t f}. Thus {P_t f} also converges strongly, and the proof is complete.
The existence of the strong limit (8.7.2) is interesting, but from the point of view of applications we would like to know what the limit is. In the following corollary we give a sufficient condition for the existence of a unique limit in (8.7.2), noting, of course, that, since (8.7.2) is linear, uniqueness is determined only up to a multiplicative constant.

Corollary 8.7.3. Assume that in the set of all densities $f \in D$ the equation $Pf = f$ has a unique solution $f_*$ and $f_*(x) > 0$ a.e. Then, for any initial density $f \in D$,
$$\lim_{t\to\infty} P_t f = f_*, \tag{8.7.6}$$
and the convergence is strong.

Proof. The proof is straightforward. From Corollary 8.7.2 the limit $\lim_{t\to\infty} P_t f$ exists and is also a nonnegative normalized function. However, by property 4 of $P_t$ (Section 8.5), we know that this limit is a fixed point of the Markov operator $P$. Since, by our assumption, the fixed point is unique, it must be $f_*$, and the proof is complete.
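On a finite state space, the convergence asserted in Corollary 8.7.3 can be watched numerically. The sketch below is an illustrative construction, not from the text: the matrix $P$ is an arbitrary column-stochastic choice with a strictly positive fixed point, and the code evaluates the truncated series $P_t f = e^{-t}\sum_n (t^n/n!)\,P^n f$ and measures the distance from the fixed point.

```python
import math

# Assumed example: a 3-state column-stochastic matrix (columns sum to 1),
# playing the role of the Markov operator P.  Its rows also sum to 1, so
# the fixed point is the uniform density (1/3, 1/3, 1/3), which is positive.
P = [[0.5, 0.2, 0.3],
     [0.3, 0.6, 0.1],
     [0.2, 0.2, 0.6]]

def apply_P(f):
    """One application of the Markov operator (matrix-vector product)."""
    return [sum(P[i][j] * f[j] for j in range(3)) for i in range(3)]

def P_t(f, t, nmax=150):
    """Truncated series P_t f = e^{-t} * sum_n (t^n / n!) P^n f."""
    out = [0.0, 0.0, 0.0]
    g = list(f)               # g holds P^n f
    w = math.exp(-t)          # w = e^{-t} t^n / n!, starting at n = 0
    for n in range(nmax + 1):
        out = [out[i] + w * g[i] for i in range(3)]
        g = apply_P(g)
        w *= t / (n + 1)      # update the Poisson weight without factorials
    return out

f0 = [1.0, 0.0, 0.0]          # initial density concentrated on state 0
u5, u20 = P_t(f0, 5.0), P_t(f0, 20.0)
gap = max(abs(u - 1.0 / 3.0) for u in u20)   # distance from the fixed point
```

For this matrix the deviation of $P_t f$ from the fixed point decays exponentially in $t$, so `u20` is already very close to the uniform density while `u5` is only part of the way there.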
In the special case that $P$ is a Frobenius--Perron operator for a nonsingular transformation $S\colon X \to X$, the condition $P f_* = f_*$ is equivalent to the fact that the measure
$$\mu_{f_*}(A) = \int_A f_*(x)\,\mu(dx)$$
is invariant with respect to $S$. Thus, in this case, from Corollary 8.7.2 the existence of an invariant measure $\mu_{f_*}$ with a density $f_*(x) > 0$ is sufficient for the existence of the strong limit (8.7.2) for the solutions of (8.3.8). Since, for ergodic transformations, $f_*$ is unique (cf. Theorem 4.2.2), these results may be summarized in the following corollary.

Corollary 8.7.4. Suppose $S\colon X \to X$ is a nonsingular transformation and $P$ is the corresponding Frobenius--Perron operator. Then with respect to the trajectories $\{P_t f\}$ that generate the solutions of the linear Boltzmann equation (8.3.8):

1. If there exists an absolutely continuous invariant measure $\mu_{f_*}$ with a positive density $f_*(x) > 0$ a.e., then for every $f \in L^1$ the strong limit $\lim_{t\to\infty} P_t f$ exists; and

2. If, in addition, the transformation $S$ is ergodic, then
$$\lim_{t\to\infty} P_t f = f_* \tag{8.7.7}$$
for all $f \in D$.
Now consider the more special case where $(X, \mathcal{A}, \mu)$ is a finite measure space and $S\colon X \to X$ is a measure-preserving transformation. Since $S$ is measure preserving, $f_*$ exists and is given by
$$f_*(x) = 1/\mu(X) \qquad \text{for } x \in X.$$
Thus $\lim_{t\to\infty} P_t f$ always exists. Furthermore, this limit is unique, that is,
$$\lim_{t\to\infty} P_t f = f_* = 1/\mu(X) \tag{8.7.8}$$
if and only if $S$ is ergodic (cf. Theorem 4.2.2).

In closing this section we would like to recall that, from Definition 4.4.1, a Markov operator $P\colon L^1 \to L^1$ is exact if and only if the sequence $\{P^n f\}$ has a strong limit that is a constant for every $f \in L^1$. Although the term exactness is never used in talking about the behavior of stochastic semigroups, for the situation where (8.7.8) holds the behavior of the trajectory $\{P_t f\}$ is precisely analogous to our original definition of exactness. Figuratively speaking, then, we could say that $S$ is ergodic if and only if $\{P_t\}_{t\ge 0}$ is exact.

8.8 Linear Boltzmann Equation with a Stochastic Kernel

In this section we consider the linear Boltzmann equation
$$\frac{\partial u(t,x)}{\partial t} + u(t,x) = Pu,$$
where the Markov operator $P$ is given by
$$Pf(x) = \int_X K(x,y)\,f(y)\,dy \tag{8.8.1}$$
and $K(x,y)\colon X \times X \to R$ is a stochastic kernel, that is,
$$K(x,y) \ge 0 \tag{8.8.2}$$
and
$$\int_X K(x,y)\,dx = 1. \tag{8.8.3}$$

For this particular formulation of the linear Boltzmann equation, we will show some straightforward applications of the general results presented earlier.

The simplest case occurs when we are able to evaluate the stochastic kernel from below. Thus we assume that for some integer $m$ the function $\inf_y K_m(x,y)$ is not identically zero, so that
$$\int_X \inf_y K_m(x,y)\,dx > 0 \tag{8.8.4}$$
($K_m$ is the $m$ times iterated kernel $K$). In this case we will show that the strong limit
$$\lim_{t\to\infty} P_t f = f_* \tag{8.8.5}$$
exists for all densities $f \in D$, where $f_*$ is the unique density that is a solution of
$$f(x) = \int_X K(x,y)\,f(y)\,dy. \tag{8.8.6}$$

The proof of this is quite direct. Set
$$h(x) = \inf_y K_m(x,y).$$
By using the explicit formula (8.5.11) for the solution $P_t f$, we have
$$P_t f = e^{-t}\sum_{n=0}^{\infty}\frac{t^n}{n!}\,P^n f.$$
However, for $n \ge m$, we may write
$$P^n f(x) = \int_X K_m(x,y)\,P^{n-m} f(y)\,dy \ge h(x),$$
and thus we have immediately that
$$P_t f(x) - h(x) \ge -e^{-t}\sum_{n=0}^{m}\frac{t^n}{n!}\,h(x),$$
so that
$$(P_t f - h)^- \le e^{-t}\sum_{n=0}^{m}\frac{t^n}{n!}\,h.$$
Since, however, $e^{-t} t^n \to 0$ as $t \to \infty$, we have
$$\lim_{t\to\infty}\|(P_t f - h)^-\| = 0,$$
and, by Theorem 7.4.1, the strong limit $f_*$ of (8.8.5) exists and is unique. Properties 4 and 5 of the solution $P_t f$, outlined in Section 8.5, tell us that $f_*$ is the unique solution of $Pf = f$, namely, equation (8.8.6). Thus the proof is complete.
Now we assume, as before, that $K(x,y)$ is a stochastic kernel for which there is an integer $m$ and a $g \in L^1$ such that
$$K_m(x,y) \le g(x) \qquad \text{for } x, y \in X. \tag{8.8.7}$$
Then the strong limit
$$\lim_{t\to\infty} P_t f \tag{8.8.8}$$
exists for all $f \in L^1$.

As before, to prove this we use the explicit series representation of $P_t f$, noting first that, because of (8.8.7), we have, for $n \ge m$,
$$|P^n f(x)| = |P^m(P^{n-m} f(x))| \le \int_X K_m(x,y)\,|P^{n-m} f(y)|\,dy \le g(x)\int_X |P^{n-m} f(y)|\,dy \le g(x)\,\|f\|.$$

Thus we can evaluate $P_t f$ as
$$|P_t f| \le e^{-t}\sum_{n=0}^{m}\frac{t^n}{n!}\,|P^n f| + e^{-t}\sum_{n=m+1}^{\infty}\frac{t^n}{n!}\,g\,\|f\| \le e^{-t}\sum_{n=0}^{m}\frac{t^n}{n!}\,|P^n f| + g\,\|f\|.$$
Further, setting
$$r = c\sum_{n=0}^{m}|P^n f|, \qquad c = \sup_{\substack{t > 0 \\ 0 \le n \le m}} e^{-t}\,\frac{t^n}{n!},$$
we finally obtain
$$|P_t f| \le g\,\|f\| + r.$$
Evidently, $g\|f\| + r$ is an integrable function, and from Corollary 8.7.1 we know that the strong limit (8.8.8) exists.
Under assumption (8.8.7) we have no assurance that the strong limit (8.8.8) is unique. However, some additional properties of $K(x,y)$ may ensure this uniqueness. For example, if $X$ is a bounded interval of the real line or the half-line, (8.8.7) holds, and $K_m(x,y)$ is monotonically increasing or decreasing in $x$, then
$$\lim_{t\to\infty} P_t f = f_* \qquad \text{for all } f \in D, \tag{8.8.9}$$
where $f_*$ is the unique solution of (8.8.6).

To demonstrate this, note that by repeating the proof of Proposition 5.8.1 we may construct an $h(x)$, $h(x) \ge 0$, $\|h\| > 0$, such that $K_m(x,y) \ge h(x)$. Then the proof follows directly from the assertion following equation (8.8.4).

Analogously, if (8.8.7) holds and $K_m(x,y) > 0$ for $x \in A$, $y \in X$, where $A$ is a set of positive measure, then the limit (8.8.9) exists and is unique. To prove this, set $\bar{P} = P^m$ and observe that for $f \in D$ the operator $\bar{P}$ satisfies
$$\bar{P} f \le g \quad \text{and} \quad \bar{P} f(x) > 0 \qquad \text{for } x \in A.$$
Thus by Theorem 5.6.1 the limiting function $\lim_{n\to\infty} \bar{P}^n f$ does not depend on $f$ for $f \in D$. Since $\bar{P}^n = P^{mn}$, the limit (8.8.9) is also independent of $f$.

It should be noted that the same result holds under even weaker conditions, that is, if (8.8.7) holds and for some integer $k$
$$\sum_{n=1}^{k} K_n(x,y) > 0 \qquad \text{for } x \in A,\ y \in X.$$


8.9 The Linear Tjon--Wu Equation

To illustrate the application of the results developed in this chapter we close with an example drawn from the kinetic theory of gases [see Dlotko and Lasota, 1983].

In the theory of dilute gases [Chapman and Cowling, 1960] the Boltzmann equation
$$\frac{DF(t,x,v)}{Dt} = C(F(t,x,v))$$
is studied to obtain information about the particle distribution function $F$ that depends on time ($t$), position ($x$), and velocity ($v$). $DF/Dt$ denotes the total rate of change of $F$ due to spatial gradients and any external forces, whereas the collision operator $C(\cdot)$ determines the way in which particle collisions affect $F$. In the case of a spatially homogeneous gas with no external forces the Boltzmann equation reduces to
$$\frac{\partial F(t,v)}{\partial t} = C(F(t,v)). \tag{8.9.1}$$

Bobylev [1976], Krook and Wu [1977], and Tjon and Wu [1979] have shown that in some cases equation (8.9.1) may be transformed into
$$\frac{\partial u(t,x)}{\partial t} = -u(t,x) + \int_x^{\infty}\frac{dy}{y}\int_0^y u(t,y-z)\,u(t,z)\,dz, \qquad x > 0, \tag{8.9.2}$$
where $x = v^2/2$ (note that $x$ is not a spatial coordinate) and
$$u(t,x) = \mathrm{const}\int_x^{\infty}\frac{F(t,v)}{\sqrt{v - x}}\,dv.$$

Equation (8.9.2), called the Tjon--Wu equation [Barnsley and Cornille, 1981], is nonlinear because of the presence of $u(t,y-z)\,u(t,z)$ in the integrand on the right-hand side. Thus the considerations of this chapter are of no help in studying the behavior of $u(t,x)$ as $t \to \infty$.

However, note that $\exp(-x)$ is a solution of (8.9.2), a fact that we can use to study a linear problem. Here we will investigate the situation where a small number of particles with an arbitrary velocity distribution $f$ are introduced into a gas, containing many more particles, at equilibrium, so that $u_*(x) = \exp(-x)$. We want to know what the eventual distribution of velocities of the small number of particles tends to.

Thus, on the right-hand side of (8.9.2), we set $u(t,y-z) = u_*(y-z) = \exp[-(y-z)]$, so the resulting linear Tjon--Wu equation is of the form
$$\frac{\partial u(t,x)}{\partial t} + u(t,x) = \int_x^{\infty}\frac{dy}{y}\int_0^y e^{-(y-z)}\,u(t,z)\,dz, \qquad x > 0. \tag{8.9.3}$$

Equation (8.9.3) is a special case of the linear Boltzmann equation of this chapter with a Markov operator defined by
$$Pf(x) = \int_x^{\infty}\frac{dy}{y}\int_0^y e^{-(y-z)}\,f(z)\,dz \tag{8.9.4}$$
for $f \in L^1((0,\infty))$. Using the definition of the exponential integral,
$$-\mathrm{Ei}(-x) = \int_x^{\infty}\frac{e^{-y}}{y}\,dy,$$
equation (8.9.4) may be rewritten as
$$Pf(x) = \int_0^{\infty} K(x,y)\,f(y)\,dy, \tag{8.9.5}$$
where
$$K(x,y) = \begin{cases} -e^{y}\,\mathrm{Ei}(-y), & 0 < x \le y \\ -e^{y}\,\mathrm{Ei}(-x), & 0 < y < x. \end{cases} \tag{8.9.6}$$

To examine the behavior of the solutions $u(t,x)$ of (8.9.3) as $t \to \infty$, we have a number of potential aids available. First, from the preceding section, if $\inf_y K_m(x,y) > 0$ for some $m$, then we could determine $\lim_{t\to\infty} P_t f$. However, $\inf_y K(x,y) = 0$, and further composition of the kernel with itself leads to analytically complex results. Second, if we were able to find a $g(x) \ge K_m(x,y)$ for some $m$, then the results of the preceding section could be applied. However, the maximum of $K(x,y)$ in $y$ occurs at $y = x$, and $-e^{x}\,\mathrm{Ei}(-x)$ is not integrable. As before, compositions of $K(x,y)$ become so complicated that it is difficult to work with them.

A third alternative is the following. Note that $f(x) = \exp(-x)$ is a fixed point of (8.9.4). If we can show that $\exp(-x)$ is the unique fixed point of (8.9.4), then we may apply Corollary 8.7.3 to show that
$$\lim_{t\to\infty} u(t,x) = \lim_{t\to\infty} P_t f(x) = e^{-x} \tag{8.9.7}$$
for all densities $f \in D((0,\infty))$.
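That $f(x) = e^{-x}$ is indeed a fixed point of (8.9.4) is a two-line computation, spelled out here for completeness: the inner integral is elementary, and the outer one telescopes.

```latex
Pf(x) = \int_x^{\infty}\frac{dy}{y}\int_0^y e^{-(y-z)}e^{-z}\,dz
      = \int_x^{\infty}\frac{dy}{y}\,y\,e^{-y}
      = \int_x^{\infty}e^{-y}\,dy
      = e^{-x}.
```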


From $Pf = f$ and (8.9.4), we have
$$f(x) = \int_x^{\infty}\frac{dy}{y}\int_0^y e^{-(y-z)}\,f(z)\,dz, \tag{8.9.8}$$
which must be solved for $f$. Since the right-hand side of (8.9.8) is differentiable, $f$ must be differentiable. Its first derivative is
$$\frac{df(x)}{dx} = -\frac{1}{x}\int_0^x e^{-(x-z)}\,f(z)\,dz.$$
Multiply both sides by $x e^{x}$ and differentiate again to obtain the linear second-order differential equation
$$x\frac{d^2 f}{dx^2} + (x+1)\frac{df}{dx} + f = 0. \tag{8.9.9}$$

We know that one solution of (8.9.9) is $f_1(x) = \exp(-x)$, and a second independent solution may be determined using the d'Alembert reduction method [Kamke, 1959]. This simply consists of substituting $f(x) = g(x)\exp(-x)$ into (8.9.9) and solving the resulting equation for $g(x)$. Once $g$ is determined, the second independent solution of (8.9.9) is $h(x) = g(x)\exp(-x)$.

Making this substitution and simplifying gives
$$x\frac{d^2 g}{dx^2} + (1-x)\frac{dg}{dx} = 0,$$
which is a first-order equation in $dg/dx$, easily solved to give
$$\frac{dg}{dx} = \frac{1}{x}\,e^{x}$$
as a particular solution. Thus
$$g(x) = \mathrm{Ei}(x),$$
and the second solution of (8.9.9) is
$$h(x) = e^{-x}\,\mathrm{Ei}(x).$$
Therefore, the general solution of (8.9.9) is
$$f(x) = C_1 e^{-x} + C_2 e^{-x}\,\mathrm{Ei}(x). \tag{8.9.10}$$
Since we are searching for an $f \in D((0,\infty))$, we must determine $C_1$ and $C_2$ such that $f \ge 0$ and
$$\int_0^{\infty} f(x)\,dx = C_1 + C_2\int_0^{\infty} e^{-x}\,\mathrm{Ei}(x)\,dx = 1.$$
However, $e^{-x}\,\mathrm{Ei}(x)$ is not integrable, so we must have $C_1 = 1$, $C_2 = 0$, and thus the unique normalized solution of equation (8.9.9) is
$$f_*(x) = e^{-x}. \tag{8.9.11}$$
Hence $f_*$ is also the unique normalized solution of (8.9.8).

Therefore, since $Pf = f$ has a unique nonnegative normalized solution $f_* \in D$ given by (8.9.11), which is also positive, by Corollary 8.7.3 all solutions of the linear Tjon--Wu equation have the limit
$$\lim_{t\to\infty} u(t,x) = e^{-x} \tag{8.9.12}$$
for all initial conditions $u(0,x) = f(x)$, $f \in D((0,\infty))$.


This illustration of applying the tools developed in this chapter to deal
with the Tjon-Wu equation is meant to show their power. Given the

280

8. Discrete Time Processes/Continuous Time Systems

integro-differential equation (8.9.3), we have been able to show the global


convergence of its solutions by examining only the fixed points of the righthand side. This led to a second-order ordinary differential equation that
was easily solved, in spite of its nonlinearity. Finally, once the solution was
available and shown to satisfy the requirements of Corollary 8.7.3, then the
asymptotic behavior of u(t, x), for all initial conditions, was also known.

Exercises

8.1. Let $(\Omega, \mathcal{F}, \mathrm{prob})$ be a probability space and let $A, B \in \mathcal{F}$. Define $\bar{A} = \Omega \setminus A$ and $\bar{B} = \Omega \setminus B$. Prove that the independence of the events $A, B$ implies the independence of the events $\bar{A}, B$ as well as $A, \bar{B}$ and $\bar{A}, \bar{B}$.

8.2. Let $(\Omega, \mathcal{F}, \mathrm{prob})$ be a probability space and $A_1, A_2, \ldots$ be a sequence of events. Define $\xi_n = 1_{A_n}$, $n = 1, 2, \ldots$. Prove that $A_1, A_2, \ldots$ are independent if and only if the random variables $\xi_1, \xi_2, \ldots$ are independent.

8.3. Let $\{N_t\}_{t\ge 0}$ be a Poisson process and $S\colon R \to R$ a nonsingular mapping. Consider the following procedure: In a time $t > 0$ a point $x \in R$ is transformed into $S(x) + N_t$. Given an initial density distribution function $f$ of the initial point $x$, find the density $u(t,x)$ of $S(x) + N_t$. (As in Section 8.3, assume that the position $x$ of the initial point and the process $N_t$ are independent.) Prove that $u(t,x)$ satisfies the differential equation
$$\frac{\partial u(t,x)}{\partial t} = -\lambda u(t,x) + \lambda u(t,x-1) \qquad \text{for } t > 0,\ x \in R,$$
which does not depend explicitly on $S$. [$\lambda$ is defined in (8.2.8c).] Explain this paradox.

8.4. Derive formula (8.5.5) for the solution of the linear Boltzmann equation by the use of the Phillips perturbation theorem.
8.5. Consider the linear Boltzmann equation (8.5.1) and the corresponding semigroup $\{P_t\}_{t\ge 0}$. Assuming that $P\colon L^1 \to L^1$ is a constrictive operator, prove that $\lim_{t\to\infty} P_t f$ exists for every $f \in L^1$.

8.6. Again consider the linear Boltzmann equation (8.5.1) and assume that $P\colon L^1(X,\mathcal{A},\mu) \to L^1(X,\mathcal{A},\mu)$ is sweeping with respect to a family $\mathcal{A}_* \subset \mathcal{A}$. Prove that the semigroup $\{P_t\}_{t\ge 0}$ is sweeping with respect to the same family.
8.7. The nonlinear Tjon--Wu equation (8.9.2) may be written in the form
$$\frac{du}{dt} = -u + P(u,u),$$
where
$$P(f,g)(x) = \int_x^{\infty}\frac{dy}{y}\int_0^y f(y-z)\,g(z)\,dz.$$
Verify that the series
$$u(t) = e^{-t}\sum_{n=0}^{\infty}(1 - e^{-t})^n u_n,$$
with
$$u_n = \frac{1}{n}\sum_{k=0}^{n-1} P(u_k, u_{n-1-k}), \qquad u_0 = f \in D(R^+),$$
is uniformly convergent on compact subintervals of $R^+$ and satisfies the nonlinear Tjon--Wu equation with the initial condition $u(0) = f$ (Kielek, 1988).

9
Entropy

The concept of entropy was first introduced by Clausius and later used in a different form by L. Boltzmann in his pioneering work on the kinetic theory of gases published in 1866. Since then, entropy has played a pivotal role in the development of many areas in physics and chemistry and has had important ramifications in ergodic theory. However, the Boltzmann entropy is different from the Kolmogorov--Sinai--Ornstein entropy [Walters, 1975; Parry, 1981] that has been so successfully used in solving the problem of isomorphism of dynamical systems, and which is related to the work of Shannon [see Shannon and Weaver, 1949].

In this short chapter we consider the Boltzmann entropy of sequences of densities $\{P^n f\}$ and give conditions under which the entropy may be constant or increase to a maximum. We then consider the inverse problem of determining the behavior of $\{P^n f\}$ from the behavior of the entropy.

9.1 Basic Definitions

If $(X, \mathcal{A}, \mu)$ is an arbitrary measure space and $P\colon L^1 \to L^1$ a Markov operator, then under certain circumstances valuable information concerning the behavior of $\{P^n f\}$ (or, in the continuous time case, $\{P_t f\}$) can be obtained from the behavior of the sequence
$$H(P^n f) = \int_X \eta(P^n f(x))\,\mu(dx), \tag{9.1.1}$$
where $\eta(u)$ is some function appropriately defined for $u \ge 0$.

FIGURE 9.1.1. Plot of the function $\eta(u) = -u\log u$.

The classical work of Boltzmann on the statistical properties of dilute gases suggested that the function $\eta$ should be of the form
$$\eta(u) = -u\log u, \qquad \eta(0) = 0, \tag{9.1.2}$$
and gives us our definition of entropy.


Definition 9.1.1. If $f \ge 0$ and $\eta(f) \in L^1$, then the entropy of $f$ is defined by
$$H(f) = \int_X \eta(f(x))\,\mu(dx). \tag{9.1.3}$$

Remark 9.1.1. If $\mu(X) < \infty$, then the integral (9.1.3) is always well defined for every $f \ge 0$. In fact, the integral over the positive part of $\eta(f(x))$,
$$[\eta(f(x))]^+ = \max[0, \eta(f(x))],$$
is always finite. Thus $H(f)$ is either finite or equal to $-\infty$. $\square$
Since we take $\eta(0) = 0$, the function $\eta(u)$ is continuous for all $u \ge 0$. The graph of $\eta$ is shown in Figure 9.1.1. One of the most important properties of $\eta$ is that it is concave. To see this, note that
$$\eta''(u) = -1/u,$$
so $\eta''(u) < 0$ for all $u > 0$. From this it follows immediately that the graph of $\eta$ always lies below the tangent line, or
$$\eta(u) \le (u-v)\eta'(v) + \eta(v) \tag{9.1.4}$$
for every $u, v > 0$. Combining (9.1.4) with the definition of $\eta$ given in equation (9.1.2) leads to the Gibbs inequality
$$u - u\log u \le v - u\log v \qquad \text{for } u, v > 0, \tag{9.1.5}$$
which we shall have occasion to use frequently.

If $f$ and $g$ are two densities such that $\eta(f(x))$ and $f(x)\log g(x)$ are integrable, then from (9.1.5) we have the useful integral inequality
$$-\int_X f(x)\log f(x)\,\mu(dx) \le -\int_X f(x)\log g(x)\,\mu(dx), \tag{9.1.6}$$
and the equality holds only for $f = g$. Inequality (9.1.6) is often of help in proving some extremal properties of $H(f)$, as shown in the following.
Proposition 9.1.1. Let $\mu(X) < \infty$, and consider all the possible densities $f$ defined on $X$. Then, in the family of all such densities, the maximal entropy occurs for the constant density
$$f_0(x) = 1/\mu(X), \tag{9.1.7}$$
and for any other $f$ the entropy is strictly smaller.

Proof. Pick an arbitrary $f \in D$, so that the entropy of $f$ is given by
$$H(f) = -\int_X f(x)\log f(x)\,\mu(dx)$$
and, by inequality (9.1.6),
$$H(f) \le -\int_X f(x)\log f_0(x)\,\mu(dx) = -\log\left[\frac{1}{\mu(X)}\right]\int_X f(x)\,\mu(dx)$$
or
$$H(f) \le -\log\left[\frac{1}{\mu(X)}\right],$$
and the equality is satisfied only for $f = f_0$. However, the entropy of $f_0$ is simply
$$H(f_0) = -\int_X \frac{1}{\mu(X)}\log\left[\frac{1}{\mu(X)}\right]\mu(dx) = -\log\left[\frac{1}{\mu(X)}\right],$$
so $H(f) \le H(f_0)$ for all $f \in D$.


If $\mu(X) = \infty$, then there are no constant densities and this proposition fails. However, if additional constraints are placed on the density, then we may obtain other results for maximal entropies, as illustrated in the following two examples.

Example 9.1.1. Let $X = [0,\infty)$ and consider all possible densities $f$ such that the first moment of $f$ is given by
$$\int_0^{\infty} x f(x)\,dx = \frac{1}{\lambda}. \tag{9.1.8}$$
Then the density
$$f_0(x) = \lambda e^{-\lambda x} \tag{9.1.9}$$
maximizes the entropy.

The proof proceeds as in Proposition 9.1.1. From inequality (9.1.6) we have, for arbitrary $f \in D$ satisfying (9.1.8),
$$H(f) \le -\int_0^{\infty} f(x)\log(\lambda e^{-\lambda x})\,dx = -\log\lambda\int_0^{\infty} f(x)\,dx + \lambda\int_0^{\infty} x f(x)\,dx = -\log\lambda + 1.$$
Also, however, with $f_0$ given by (9.1.9),
$$H(f_0) = -\int_0^{\infty}\lambda e^{-\lambda x}\log(\lambda e^{-\lambda x})\,dx = -\log\lambda + 1,$$
and thus $H(f) \le H(f_0)$ for all $f \in D$ satisfying (9.1.8).
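A quick numerical check of this example (an illustrative sketch, not from the text; the rate $\lambda = 2.5$ is an arbitrary choice): integrate $-f\log f$ for the exponential density by the midpoint rule and compare with the closed form $-\log\lambda + 1$.

```python
import math

lam = 2.5                                   # arbitrary positive rate

def f(x):
    return lam * math.exp(-lam * x)

# midpoint rule for H(f) = -∫_0^∞ f log f dx, truncated at x = 40/lam,
# where the integrand is already negligible
N = 200_000
L = 40.0 / lam
h = L / N
H_f = 0.0
for i in range(N):
    v = f((i + 0.5) * h)
    H_f -= v * math.log(v) * h

H_exact = 1.0 - math.log(lam)               # the value -log λ + 1 above
```

`H_f` agrees with `H_exact` to several decimal places, as the closed-form computation in the example predicts.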

Example 9.1.2. For our next example take $X = (-\infty,\infty)$ and consider all possible densities $f \in D$ such that the second moment of $f$ is finite, that is,
$$\int_{-\infty}^{\infty} x^2 f(x)\,dx = \sigma^2. \tag{9.1.10}$$
Then the maximal entropy is achieved for the Gaussian density
$$f_0(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{x^2}{2\sigma^2}\right). \tag{9.1.11}$$
As before, we calculate that, for arbitrary $f \in D$ satisfying (9.1.10),
$$H(f) \le -\int_{-\infty}^{\infty} f(x)\log\left[\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{x^2}{2\sigma^2}\right)\right]dx = -\log\left[\frac{1}{\sqrt{2\pi\sigma^2}}\right]\int_{-\infty}^{\infty} f(x)\,dx + \frac{1}{2\sigma^2}\int_{-\infty}^{\infty} x^2 f(x)\,dx = \frac12 - \log\left[\frac{1}{\sqrt{2\pi\sigma^2}}\right].$$
Further,
$$H(f_0) = -\int_{-\infty}^{\infty} f_0(x)\log f_0(x)\,dx = \frac12 - \log\left[\frac{1}{\sqrt{2\pi\sigma^2}}\right],$$
so that the entropy is maximized with the Gaussian density (9.1.11).
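The Gaussian's extremal role can be seen by comparing closed-form entropies of a few densities normalized to the same second moment $\sigma^2$. The values below are standard results, not computations from the text; the uniform and Laplace comparisons are illustrative choices.

```python
import math

sigma = 1.0   # common second moment sigma^2 for all three densities

# differential entropies H(f) = -∫ f log f dx (standard closed forms):
H_gauss   = 0.5 + 0.5 * math.log(2.0 * math.pi * sigma**2)  # density (9.1.11)
H_uniform = math.log(2.0 * math.sqrt(3.0) * sigma)          # uniform on [-√3σ, √3σ]
H_laplace = 1.0 + math.log(math.sqrt(2.0) * sigma)          # Laplace with scale σ/√2
```

For $\sigma = 1$ these evaluate to roughly 1.419, 1.242, and 1.347 respectively; the Gaussian dominates both alternatives, as Example 9.1.2 requires.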


These two examples are simply special cases covered by the following
simple statement.
Proposition 9.1.2. Let $(X, \mathcal{A}, \mu)$ be a measure space. Assume that a sequence $g_1, \ldots, g_m$ of measurable functions is given, as well as two sequences of real constants $\bar{g}_1, \ldots, \bar{g}_m$ and $\nu_1, \ldots, \nu_m$, that satisfy
$$\bar{g}_i = \frac{\int_X g_i(x)\prod_{j=1}^m \exp[-\nu_j g_j(x)]\,\mu(dx)}{\int_X \prod_{j=1}^m \exp[-\nu_j g_j(x)]\,\mu(dx)},$$
where all of the integrals are finite. Then the maximum of the entropy $H(f)$ for all $f \in D$, subject to the conditions
$$\bar{g}_i = \int_X g_i(x) f(x)\,\mu(dx), \qquad i = 1, \ldots, m,$$
occurs for
$$f_0(x) = \frac{\prod_{j=1}^m \exp[-\nu_j g_j(x)]}{\int_X \prod_{j=1}^m \exp[-\nu_j g_j(x)]\,\mu(dx)}.$$

Proof. For simplicity, set
$$Z = \int_X \prod_{j=1}^m \exp[-\nu_j g_j(x)]\,\mu(dx),$$
so
$$f_0(x) = Z^{-1}\prod_{j=1}^m \exp[-\nu_j g_j(x)].$$
From inequality (9.1.6), we have
$$H(f) \le -\int_X f(x)\log f_0(x)\,\mu(dx) = -\int_X f(x)\left[-\log Z - \sum_{j=1}^m \nu_j g_j(x)\right]\mu(dx) = \log Z + \sum_{j=1}^m \nu_j\int_X f(x)\,g_j(x)\,\mu(dx) = \log Z + \sum_{j=1}^m \nu_j\,\bar{g}_j.$$
Furthermore, it is easy to show that
$$H(f_0) = \log Z + \sum_{j=1}^m \nu_j\,\bar{g}_j,$$
and thus $H(f) \le H(f_0)$.

Remark 9.1.2. Note that if $m = 1$ and $g(x)$ is identified as the energy of a system, then the maximal entropy occurs for
$$f_0(x) = Z^{-1} e^{-\nu g(x)},$$
which is just the Gibbs canonical distribution function, with the partition function $Z$ given by
$$Z = \int_X e^{-\nu g(x)}\,\mu(dx).$$
Further, the maximal entropy
$$H(f_0) = \log Z + \nu\bar{g}$$
is just the thermodynamic entropy. As is well known, all of the results of classical thermodynamics can be derived with the partition function $Z$ and the preceding entropy $H(f_0)$. Indeed, the contents of Proposition 9.1.2 have been extensively used by Jaynes [1957] and Katz [1967] in an alternative formulation and development of classical and quantum statistical mechanics. $\square$
Thus, the simple Gibbs inequality has far-reaching implications in pure mathematics as well as in more applied fields. Another inequality that we will have occasion to use often is the Jensen inequality: If $\eta(u)$, $u \ge 0$, is a function such that $\eta'' \le 0$ (i.e., the graph of $\eta$ is concave), $P\colon L^p \to L^p$, $1 \le p \le \infty$, is a linear operator such that $P1 = 1$ and $Pf \ge 0$ for all $f \ge 0$, then for every $f \in L^p$, $f \ge 0$,
$$\eta(Pf) \ge P\eta(f) \tag{9.1.12}$$
whenever $P\eta(f)$ exists.

The proof of this result is difficult and requires many specialized techniques. However, the following considerations provide some insight into why it is true. Let $\eta(y)$ be a concave function defined for $y \ge 0$. Pick $u$, $v$, and $z$ such that $0 \le u \le z \le v$. Since $z \in [u,v]$ there exist nonnegative constants $\alpha$ and $\beta$, with $\alpha + \beta = 1$, such that
$$z = \alpha u + \beta v.$$
Further, from the concavity of $\eta$ it is clear that $\eta(z) \ge r$, where
$$r = \alpha\eta(u) + \beta\eta(v).$$
Thus $\eta(z) \ge r$ gives
$$\eta(\alpha u + \beta v) \ge \alpha\eta(u) + \beta\eta(v).$$


Further, it is easy to verify by induction that for every sequence $0 \le u_1 < u_2 < \cdots < u_n$,
$$\eta\left(\sum_i \alpha_i u_i\right) \ge \sum_i \alpha_i\,\eta(u_i), \tag{9.1.13}$$
where $\alpha_i \ge 0$ and $\sum_i \alpha_i = 1$. Now suppose we have a linear operator $P\colon R^n \to R^n$ satisfying $P1 = 1$. Since $P$ is linear its coordinates must be of the form
$$(Pf)_i = \sum_{j=1}^n k_{ij} f_j,$$
where $f = (f_1, \ldots, f_n)$ and $\sum_j k_{ij} = 1$, $k_{ij} \ge 0$. By applying inequality (9.1.13) to $(Pf)_i$, we have
$$\eta((Pf)_i) \ge \sum_{j=1}^n k_{ij}\,\eta(f_j) = (P\eta(f))_i,$$
or, suppressing the coordinate index,
$$\eta(Pf) \ge P\eta(f).$$
In an arbitrary (not necessarily finite dimensional) space the proof of the Jensen inequality is much more difficult, but still uses (9.1.13) as a starting point.

The final inequality we will have occasion to use is a direct consequence of integrating inequality (9.1.13) over the entire space $X$, namely,
$$H\left(\sum_i \alpha_i f_i\right) \ge \sum_i \alpha_i H(f_i), \tag{9.1.14}$$
where again $\alpha_i \ge 0$ and $\sum_i \alpha_i = 1$.

9.2 Entropy of $P^n f$ when $P$ is a Markov Operator

We are now in a position to examine the behavior of the entropy $H(P^n f)$ when $P$ is a Markov operator. We begin with the following theorem.

Theorem 9.2.1. Let $(X, \mathcal{A}, \mu)$ be a finite measure space [$\mu(X) < \infty$] and $P\colon L^1 \to L^1$ a Markov operator. If $P$ has a constant stationary density [$P1 = 1$], then
$$H(Pf) \ge H(f) \tag{9.2.1}$$
for all $f \ge 0$, $f \in L^1$.

Proof. Integrating Jensen's inequality (9.1.12) over the entire space $X$ gives
$$\int_X \eta(Pf(x))\,\mu(dx) \ge \int_X P\eta(f(x))\,\mu(dx) = \int_X \eta(f(x))\,\mu(dx),$$
since $P$ preserves the integral. However, the left-most integral is $H(Pf)$, and the last integral is $H(f)$, so that (9.2.1) is proved.

Remark 9.2.1. For a finite measure space, we know that the maximal entropy $H_{\max}$ is $-\log[1/\mu(X)]$, so that
$$-\log[1/\mu(X)] \ge H(P^n f) \ge H(f).$$
This, in conjunction with Theorem 9.2.1, tells us that in a finite measure space when $P$ has a constant stationary density, the entropy never decreases and is bounded above by $-\log[1/\mu(X)]$. Thus, in this case the entropy $H(P^n f)$ always converges as $n \to \infty$, although not necessarily to the maximum. Note further that, if we have a normalized measure space, then $\mu(X) = 1$ and $H_{\max} = 0$. $\square$
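Theorem 9.2.1 is easy to watch numerically. In the sketch below (an illustrative construction, not from the text) $X = \{0,1,2\}$ with counting measure, and $P$ is a doubly stochastic matrix, so $P1 = 1$; the discrete entropy $H(f) = \sum_i \eta(f_i)$ then never decreases along the iterates $P^n f$.

```python
import math

# Assumed example: a symmetric doubly stochastic matrix (P1 = 1), so
# Theorem 9.2.1 applies with mu = counting measure on {0, 1, 2}.
P = [[0.6, 0.3, 0.1],
     [0.3, 0.4, 0.3],
     [0.1, 0.3, 0.6]]

def eta(u):
    """eta(u) = -u log u with eta(0) = 0, as in (9.1.2)."""
    return 0.0 if u == 0.0 else -u * math.log(u)

def H(f):
    """Discrete entropy H(f) = sum_i eta(f_i)."""
    return sum(eta(u) for u in f)

def apply_P(f):
    return [sum(P[i][j] * f[j] for j in range(3)) for i in range(3)]

f = [0.9, 0.1, 0.0]
entropies = [H(f)]
for _ in range(15):
    f = apply_P(f)
    entropies.append(H(f))
```

The sequence climbs toward the maximal value $\log 3$ attained at the constant density, in line with Remark 9.2.1.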

Remark 9.2.2. In the case of a Markov operator without a constant stationary density, it may happen that the sequence $H(P^n f)$ is not increasing as $n$ increases. As a simple example consider the quadratic transformation $S(x) = 4x(1-x)$. The Frobenius--Perron operator for $S$, derived in Section 1.2, is
$$Pf(x) = \frac{1}{4\sqrt{1-x}}\left\{f\left(\tfrac12 - \tfrac12\sqrt{1-x}\right) + f\left(\tfrac12 + \tfrac12\sqrt{1-x}\right)\right\},$$
and it is easy to verify that
$$f_*(x) = \frac{1}{\pi\sqrt{x(1-x)}}$$
is a stationary density for $P$. Take as an initial density $f = 1$, so $H(f) = 0$ and
$$Pf(x) = \frac{1}{2\sqrt{1-x}}.$$
Then
$$H(Pf) = \int_0^1 \frac{1}{2\sqrt{1-x}}\log\left(2\sqrt{1-x}\right)dx = (\log 2) - 1.$$
Clearly $H(Pf) < H(f) = 0$.
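The value $H(Pf) = \log 2 - 1$ can be confirmed by a short numerical check. With the substitution $t = \sqrt{1-x}$ (a step not written out above), $dx = -2t\,dt$ and the entropy integral collapses to $\int_0^1 \log(2t)\,dt$, which a midpoint rule evaluates directly. A sketch:

```python
import math

# After t = sqrt(1 - x), H(Pf) = -∫_0^1 Pf log Pf dx becomes ∫_0^1 log(2t) dt.
N = 100_000
h = 1.0 / N
H_Pf = sum(math.log(2.0 * (i + 0.5) * h) for i in range(N)) * h
```

`H_Pf` evaluates to about $-0.3069$, matching $\log 2 - 1$ and confirming the drop below $H(f) = 0$.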

It is for this reason that it is necessary to introduce the concept of conditional entropy for Markov operators with nonconstant stationary densities.

Definition 9.2.1. Let $f, g \in D$ be such that $\mathrm{supp}\,f \subset \mathrm{supp}\,g$. Then the conditional entropy of $f$ with respect to $g$ is defined by
$$H(f \mid g) = \int_X g(x)\,\eta\!\left[\frac{f(x)}{g(x)}\right]\mu(dx) = -\int_X f(x)\log\left[\frac{f(x)}{g(x)}\right]\mu(dx). \tag{9.2.2}$$

Remark 9.2.3. Since $g$ is a density and $\eta(x) = -x\log x$ is bounded above ($\sup\eta < \infty$), the integral $H(f \mid g)$ is always defined; that is, it is either finite or equal to $-\infty$. In some sense, which is suggested by equation (9.2.2), the value $H(f \mid g)$ measures the deviation of $f$ from the density $g$. $\square$

The conditional entropy $H(f \mid g)$ has two properties, which we will use later. They are:

1. If $f, g \in D$, then, by inequality (9.1.6), $H(f \mid g) \le 0$. The equality holds if and only if $f = g$.

2. If $g$ is the constant density, $g = 1$, then $H(f \mid 1) = H(f)$. Thus the conditional entropy $H(f \mid g)$ is a generalization of the entropy $H(f)$.

For $f, g \in D$, the condition $\mathrm{supp}\,f \subset \mathrm{supp}\,g$ implies $\mathrm{supp}\,Pf \subset \mathrm{supp}\,Pg$ (see Exercise 3.10), and given $H(f \mid g)$ we may evaluate $H(Pf \mid Pg)$ through the following.

Theorem 9.2.2. Let $(X, \mathcal{A}, \mu)$ be an arbitrary measure space and $P\colon L^1 \to L^1$ a Markov operator. Then
$$H(Pf \mid Pg) \ge H(f \mid g) \qquad \text{for } f, g \in D,\ \mathrm{supp}\,f \subset \mathrm{supp}\,g. \tag{9.2.3}$$

Remark 9.2.4. Note from this theorem that if $g$ is a stationary density of $P$, then $H(Pf \mid Pg) = H(Pf \mid g)$ and thus
$$H(Pf \mid g) \ge H(f \mid g).$$
Thus the conditional entropy with respect to a stationary density is always increasing and bounded above by zero. It follows that $H(P^n f \mid g)$ always converges, but not necessarily to zero, as $n \to \infty$. $\square$
Proof of Theorem 9.2.2. Here we give the proof of Theorem 9.2.2 only in the case when $Pg > 0$, $g > 0$, and the function $f/g$ is bounded. [Consult Voigt (1981) for the full proof.] Take $g \in L^1$ with $g > 0$. Define an operator $R\colon L^\infty \to L^\infty$ by
$$Rh = P(hg)/Pg \qquad \text{for } h \in L^\infty,$$
where $hg$ denotes multiplication, not composition. $R$ has the following properties:

1. $Rh \ge 0$ for $h \ge 0$; and

2. $R1 = Pg/Pg = 1$.

Thus $R$ satisfies the assumptions of Jensen's inequality, giving
$$\eta(Rh) \ge R\eta(h). \tag{9.2.4}$$
Setting $h = f/g$, the left-hand side of (9.2.4) may be written in the form
$$\eta(Rh) = -(Pf/Pg)\log(Pf/Pg),$$
and the right-hand side is given by
$$R\eta(h) = (1/Pg)\,P[(\eta\circ h)g] = -(1/Pg)\,P[f\log(f/g)].$$
Hence inequality (9.2.4) becomes
$$-Pf\log(Pf/Pg) \ge -P[f\log(f/g)].$$
Integrating this last inequality over the space $X$, and remembering that $P$ preserves the integral, we have
$$H(Pf \mid Pg) \ge -\int_X P\left\{f(x)\log\left[\frac{f(x)}{g(x)}\right]\right\}\mu(dx) = -\int_X f(x)\log\left[\frac{f(x)}{g(x)}\right]\mu(dx) = H(f \mid g),$$
which finishes the proof.

9.3 Entropy $H(P^n f)$ when $P$ is a Frobenius--Perron Operator

Inequalities (9.2.1) and (9.2.3) of Theorems 9.2.1 and 9.2.2 are not strong. In fact, the entropy may not increase at all during successive iterations of $f$. This is always the case when $P$ is the Frobenius--Perron operator corresponding to an invertible transformation, which leads to the following theorem.

Theorem 9.3.1. Let $(X, \mathcal{A}, \mu)$ be a finite measure space and $S\colon X \to X$ be an invertible measure-preserving transformation. If $P$ is the Frobenius--Perron operator corresponding to $S$, then $H(P^n f) = H(f)$ for all $n$.

Proof. If $S$ is invertible and measure preserving, then by equation (3.2.10) we have $Pf(x) = f(S^{-1}(x))$ since $J^{-1} \equiv 1$. If $P_1$ is the Frobenius--Perron operator corresponding to $S^{-1}$, we also have $P_1 f(x) = f(S(x))$. Thus $P_1 P f = P P_1 f = f$, so $P_1 = P^{-1}$. From Theorem 9.2.1 we also have
$$H(P_1 P f) \ge H(Pf) \ge H(f),$$
but, since $P_1 P f = P^{-1} P f = f$, we conclude that $H(Pf) = H(f)$, so $H(P^n f) = H(f)$ for all $n$.

Remark 9.3.1. For any discrete or continuous time system that is invertible and measure preserving the entropy is always constant. In particular, for a continuous time system evolving according to the set of differential equations $\dot{x} = F(x)$, the entropy is constant if $\mathrm{div}\,F = 0$ [see equation (7.8.18)]. Every Hamiltonian system satisfies this condition. $\square$

However, for noninvertible (irreversible) systems this is not the case, and we have the following theorem.

Theorem 9.3.2. Let $(X, \mathcal{A}, \mu)$ be a measure space, $\mu(X) = 1$, $S\colon X \to X$ a measure-preserving transformation, and $P$ the Frobenius--Perron operator corresponding to $S$. If $S$ is exact, then
$$\lim_{n\to\infty} H(P^n f) = 0$$
for all $f \in D$ such that $H(f) > -\infty$.

Proof. Assume initially that $f$ is bounded, that is, $0 \le f \le c$. Then
$$0 \le P^n f \le P^n c = c\,P^n 1 = c.$$
Without any loss of generality, we can assume that $c > 1$. Further, since $\eta(u) \le 0$ for $u \ge 1$, we have [note $\mu(X) = 1$ and $H_{\max} = 0$]
$$0 \ge H(P^n f) \ge \int_{A_n}\eta(P^n f(x))\,\mu(dx), \tag{9.3.1}$$
where
$$A_n = \{x : 1 \le P^n f(x) \le c\}.$$
Now, by the mean value theorem [using $\eta(1) = 0$], we obtain
$$\left|\int_{A_n}\eta(P^n f(x))\,\mu(dx)\right| \le \int_{A_n}|\eta(P^n f(x)) - \eta(1)|\,\mu(dx) \le k\int_{A_n}|P^n f(x) - 1|\,\mu(dx) \le k\int_X |P^n f(x) - 1|\,\mu(dx) = k\,\|P^n f - 1\|,$$
where
$$k = \sup_{1 \le u \le c}|\eta'(u)|.$$

Since $S$ is exact, from Theorem 4.4.1, we have
$$\|P^n f - 1\| \to 0 \qquad \text{as } n \to \infty$$
for all $f \in D$, and thus
$$\lim_{n\to\infty}\int_{A_n}\eta(P^n f(x))\,\mu(dx) = 0.$$
From inequality (9.3.1), it follows that $H(P^n f)$ converges to zero.

Now relax the assumption that $f$ is bounded and write $f$ in the form
$$f = f_1 + f_2,$$
where
$$f_1(x) = \begin{cases} 0 & \text{if } f(x) > c \\ f(x) & \text{if } 0 \le f(x) \le c \end{cases}$$
and $f_2 = f - f_1$. Fixing $\varepsilon > 0$, we may choose $c$ sufficiently large so that
$$\|f_2\| < \varepsilon \qquad \text{and} \qquad H(f_2) > -\varepsilon.$$
Write $P^n f$ in the form
$$P^n f = (1-\delta)\,P^n\!\left(\frac{1}{1-\delta}\,f_1\right) + \delta\,P^n\!\left(\frac{1}{\delta}\,f_2\right),$$
where $\delta = \|f_2\|$. Now $f_1/(1-\delta)$ is a bounded density, and so from the first part of our proof we know that for $n$ sufficiently large
$$H\!\left(P^n\!\left(\frac{1}{1-\delta}\,f_1\right)\right) \ge -\varepsilon.$$
Furthermore,
$$\delta H\!\left(P^n\!\left(\frac{1}{\delta}\,f_2\right)\right) = H(P^n f_2) - \log\left(\frac{1}{\delta}\right)\int_X P^n f_2(x)\,\mu(dx) = H(P^n f_2) - \|f_2\|\log\left(\frac{1}{\delta}\right) = H(P^n f_2) + \delta\log\delta.$$
Since $H(P^n f_2) \ge H(f_2) > -\varepsilon$, this last expression becomes
$$\delta H\!\left(P^n\!\left(\frac{1}{\delta}\,f_2\right)\right) \ge -\varepsilon + \delta\log\delta.$$
Combining these results and inequality (9.1.14), we have
$$H(P^n f) \ge (1-\delta)H\!\left(P^n\!\left(\frac{1}{1-\delta}\,f_1\right)\right) + \delta H\!\left(P^n\!\left(\frac{1}{\delta}\,f_2\right)\right) \ge -\varepsilon(1-\delta) - \varepsilon + \delta\log\delta = -2\varepsilon + \delta\varepsilon + \delta\log\delta. \tag{9.3.2}$$
Since $\mu(X) = 1$, we have $H(P^n f) \le 0$. Further, since $\delta < \varepsilon$ and $\varepsilon$ is arbitrary, the right-hand side of (9.3.2) is also arbitrarily small, and the theorem is proved.

Example 9.3.1. We wish to compare the entropy of the baker transformation

$$S(x,y) = \begin{cases} (2x, \tfrac{1}{2}y), & 0 \le x < \tfrac{1}{2},\ 0 \le y \le 1 \\ (2x - 1, \tfrac{1}{2}y + \tfrac{1}{2}), & \tfrac{1}{2} \le x \le 1,\ 0 \le y \le 1, \end{cases}$$

originally introduced in Example 4.1.3, with that of the dyadic transformation. Observe that the $x$-coordinate of the baker transformation is transformed by the dyadic transformation

$$S_1(x) = 2x \pmod 1.$$

From our considerations of Chapter 4, we know that the baker transformation is invertible and measure preserving. Thus by Theorem 9.3.1 it follows that the entropy of the sequence $\{P^n f\}$, where $P$ is the Frobenius–Perron operator corresponding to the baker transformation, is constant for every density $f$.

Conversely, the dyadic transformation $S_1$ is exact. Hence, from Theorem 9.3.2, the entropy of $\{P_1^n f\}$, where $P_1$ is the Frobenius–Perron operator corresponding to $S_1$, increases to zero for all bounded initial densities $f$. ∎
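The increase of entropy under the dyadic transformation can be watched numerically. The sketch below is a minimal Python illustration (not part of the text): a density is stored as a piecewise-constant function on $N$ equal bins, on which the Frobenius–Perron operator of $S_1$, $P_1 f(x) = \frac{1}{2}[f(x/2) + f((x+1)/2)]$, acts exactly, and $H(P_1^n f)$ is tracked for the initial density $f(x) = 2x$.

```python
import math

N = 1024

def fp_dyadic(f):
    """Frobenius-Perron operator of S1(x) = 2x (mod 1) on a density given
    by its values on N equal bins: Pf(x) = (f(x/2) + f((x+1)/2)) / 2,
    which is exact for piecewise-constant densities on dyadic bins."""
    return [0.5 * (f[i // 2] + f[(i + N) // 2]) for i in range(N)]

def entropy(f):
    """H(f) = -int_0^1 f log f dx, with bin width 1/N."""
    return -sum(v * math.log(v) for v in f if v > 0) / len(f)

f = [2.0 * (i + 0.5) / N for i in range(N)]   # density f(x) = 2x
H = [entropy(f)]
for _ in range(10):
    f = fp_dyadic(f)
    H.append(entropy(f))
# H starts near -0.19, increases monotonically, and approaches the
# maximal value 0, as Theorem 9.3.2 predicts for an exact transformation.
```

Since each bin of $P_1 f$ is an average of two bins of $f$, concavity of $\eta(u) = -u\log u$ makes the computed sequence increase exactly, not just approximately.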
Remark 9.3.2. Observe that in going from the baker to the dyadic transformation, we are going from an invertible (reversible) to a noninvertible (irreversible) system through the loss of information about the y-coordinate.
This loss of information is accompanied by an alteration of the behavior of
the entropy. An analogous situation occurs in statistical mechanics where,
in going from the Liouville equation to the Boltzmann equation, we also lose
coordinate information and go from a situation where entropy is constant
(Liouville equation) to one in which the entropy increases to its maximal
value (Boltzmann H theorem). ∎

9.4 Behavior of $P^n f$ from $H(P^n f)$


In this section we wish to see what aspects of the eventual behavior of
pn f can be deduced from H(Pn f). This is a somewhat difficult problem,
and the major stumbling block arises from the fact that $\eta$ changes its
sign. Thus, because of the integration in the definition of the entropy, it is
difficult to determine f or its properties from H(f). However, by use of the
spectral representation Theorem 5.3.1 for Markov operators, we are able
to circumvent this problem.


In our first theorem we wish to show that, if $H(P^n f)$ is bounded below, then $P$ is constrictive. This is presented more precisely in the following theorem.

Theorem 9.4.1. Let $(X, \mathcal{A}, \mu)$ be a measure space, $\mu(X) < \infty$, and $P\colon L^1 \to L^1$ a Markov operator such that $P1 = 1$. If there exists a constant $c > 0$ such that for every bounded $f \in D$

$$H(P^n f) \ge -c \quad\text{for } n \text{ sufficiently large},$$

then $P$ is constrictive.

Proof. Observe that $P1 = 1$ implies that $Pf$ is bounded for bounded $f$. Thus, to prove our theorem, it is sufficient to show that the set $\mathcal{F}$ of all bounded $f \in D$ that satisfy

$$H(f) \ge -c$$

is weakly precompact.

We will use criterion 3 of Section 5.1 to demonstrate the weak precompactness of $\mathcal{F}$. Since $\|f\| = 1$ for all $f \in D$, the first part of the criterion is satisfied. To check the second part, take $\varepsilon > 0$. Pick $\ell = e^{-1}\mu(X)$, $N = \exp[2(c + \ell)/\varepsilon]$, and $\delta = \varepsilon/2N$, and take a set $A \subset X$ such that $\mu(A) < \delta$. Then

$$\int_A f(x)\,\mu(dx) = \int_{A_1} f(x)\,\mu(dx) + \int_{A_2} f(x)\,\mu(dx), \tag{9.4.1}$$

where

$$A_1 = \{x \in A : f(x) \le N\} \quad\text{and}\quad A_2 = \{x \in A : f(x) > N\}.$$

The first integral on the right-hand side of (9.4.1) clearly satisfies

$$\int_{A_1} f(x)\,\mu(dx) \le N\delta = \varepsilon/2.$$

In evaluating the second integral, note that from $H(f) \ge -c$ it follows that

$$\int_{A_2} f(x)\log f(x)\,\mu(dx) \le c - \int_{X\setminus A_2} f(x)\log f(x)\,\mu(dx) \le c + \int_{X\setminus A_2} \eta_{\max}\,\mu(dx) \le c + (1/e)\mu(X) = c + \ell.$$


Therefore

$$\int_{A_2} f(x)\log N\,\mu(dx) < c + \ell,$$

or

$$\int_{A_2} f(x)\,\mu(dx) < \frac{c + \ell}{\log N} = \frac{\varepsilon}{2}.$$

Thus

$$\int_A f(x)\,\mu(dx) < \varepsilon,$$

and $\mathcal{F}$ is weakly precompact. Thus, by Definition 5.3.3, the operator $P$ is constrictive. ∎
Before stating our next theorem, consider the following. Let $(X, \mathcal{A}, \mu)$ be a finite measure space, $S\colon X \to X$ a nonsingular transformation, and $P$ the Frobenius–Perron operator corresponding to $S$. Assume that for some $c > 0$ the condition

$$H(P^n f) \ge -c$$

holds for every bounded $f \in D$ and $n$ sufficiently large. Since $P$ is a Markov operator and is constrictive, we may write $Pf$ in the form given by the spectral decomposition Theorem 5.3.1, and, for every initial $f$, the sequence $\{P^n f\}$ will be asymptotically periodic.

Theorem 9.4.2. Let $(X, \mathcal{A}, \mu)$ be a normalized measure space, $S\colon X \to X$ a measure-preserving transformation, and $P$ the Frobenius–Perron operator corresponding to $S$. If

$$\lim_{n\to\infty} H(P^n f) = 0$$

for all bounded $f \in D$, then $S$ is exact.

Proof. It follows from Theorem 9.4.1 that $P$ is constrictive. Furthermore, since $S$ is measure preserving, we know that $P$ has a constant stationary density. From Proposition 5.4.2 we, therefore, have

$$P^{n+1} f(x) = \sum_{i=1}^{r} \lambda_{\alpha^{-n}(i)}(f)\,1_{A_i}(x) + Q_n f(x).$$

If we can demonstrate that $r = 1$, then from Theorem 5.5.2 we will have shown $S$ to be exact. Pick

$$f(x) = [1/\mu(A_1)]\,1_{A_1}(x)$$

as an initial $f$. If $\tau$ is the asymptotic period of $P^n f$, then we must have $P^{n\tau} f = f$ for $n = 1, 2, \ldots$. However, by assumption,

$$\lim_{n\to\infty} H(P^n f) = 0,$$


and, since the sequence $\{H(P^{n\tau} f)\}$ is a constant sequence, we must have $H(f) = 0$. Note that, by Proposition 9.1.1, $H(f) = 0$ only if

$$f(x) = 1_X(x).$$

So, clearly, we must have

$$[1/\mu(A_1)]\,1_{A_1}(x) = 1_X(x).$$

This is possible if and only if $A_1$ is the entire space $X$, and thus $r = 1$. Hence $S$ is exact. ∎
This theorem in conjunction with Theorem 9.3.2 tells us that the convergence of $H(P^n f)$ to zero as $n \to \infty$ is both necessary and sufficient for the exactness of measure-preserving transformations. If the transformation is not measure preserving, then an analogous result using the conditional entropy may be proved.

To see this, suppose we have an arbitrary measure space $(X, \mathcal{A}, \mu)$ and a nonsingular transformation $S\colon X \to X$. Let $P$ be the Frobenius–Perron operator corresponding to $S$ and $g \in D$ ($g > 0$) the stationary density of $P$, so $Pg = g$. Since $S$ is not measure preserving, our previous results cannot be used directly in examining the exactness of $S$.
However, consider the new measure space $(X, \mathcal{A}, \bar\mu)$, where

$$\bar\mu(A) = \int_A g(x)\,\mu(dx) \quad\text{for } A \in \mathcal{A}.$$

Since $Pg = g$, $\bar\mu$ is an invariant measure. Thus, in this new space the corresponding Frobenius–Perron operator $\bar P$ is defined by

$$\int_A \bar P h(x)\,\bar\mu(dx) = \int_{S^{-1}(A)} h(x)\,\bar\mu(dx) \quad\text{for } A \in \mathcal{A},$$

and satisfies $\bar P 1 = 1$. This may be rewritten as

$$\int_A [\bar P h(x)]g(x)\,\mu(dx) = \int_{S^{-1}(A)} h(x)g(x)\,\mu(dx).$$

However, we also have

$$\int_{S^{-1}(A)} h(x)g(x)\,\mu(dx) = \int_A P(h(x)g(x))\,\mu(dx),$$

so that $(\bar P h)g = P(hg)$, or

$$\bar P h = (1/g)P(hg).$$
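On a finite state space the relation $\bar P h = (1/g)P(hg)$ can be checked directly. The Python sketch below is an illustration only (the 3-state Markov operator is chosen arbitrarily; it is not an example from the text): $P$ acts on densities with respect to counting measure, $g$ is its positive stationary density, and the rescaled operator indeed satisfies $\bar P 1 = 1$.

```python
# A Markov operator on densities over {0,1,2}: (Pf)[j] = sum_i K[i][j] f[i].
# Each row of K sums to 1, so P preserves total mass.
K = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.4, 0.2, 0.4]]

def apply_P(f):
    return [sum(K[i][j] * f[i] for i in range(3)) for j in range(3)]

# Find the stationary density g (Pg = g) by iteration; K has all entries
# positive, so the iteration converges geometrically.
g = [1 / 3, 1 / 3, 1 / 3]
for _ in range(500):
    g = apply_P(g)

def apply_Pbar(h):
    """Operator of the rescaled space: Pbar h = (1/g) P(hg)."""
    Phg = apply_P([h[i] * g[i] for i in range(3)])
    return [Phg[i] / g[i] for i in range(3)]

ones = apply_Pbar([1.0, 1.0, 1.0])   # equals 1 identically, since Pg = g
```

The same two lines of algebra that define `apply_Pbar` are exactly the change-of-density construction in the text, with integrals replaced by finite sums.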


Furthermore, by induction,

$$\bar P^n h = (1/g)P^n(hg).$$

In this new space $(X, \mathcal{A}, \bar\mu)$, we may also calculate the entropy $\bar H(\bar P^n h)$ as

$$\bar H(\bar P^n h) = -\int_X \bar P^n h(x)\log[\bar P^n h(x)]\,\bar\mu(dx) = -\int_X \frac{1}{g(x)}P^n(h(x)g(x))\log\left[\frac{P^n(h(x)g(x))}{g(x)}\right]g(x)\,\mu(dx) = H(P^n(hg) \mid g).$$

Observe that $h \in D(X, \mathcal{A}, \bar\mu)$ is equivalent to

$$h \ge 0 \quad\text{and}\quad \int_X h(x)g(x)\,\mu(dx) = 1,$$

which is equivalent to $hg \in D(X, \mathcal{A}, \mu)$. Set $f = hg$, so

$$\bar H(\bar P^n h) = H(P^n f \mid g).$$

We may, therefore, use our previous theorems to examine the exactness of $S$ in the new space $(X, \mathcal{A}, \bar\mu)$, or its asymptotic stability in the original space $(X, \mathcal{A}, \mu)$; that is, $S$ is statistically stable in $(X, \mathcal{A}, \mu)$ if and only if

$$\lim_{n\to\infty} H(P^n f \mid g) = 0 \tag{9.4.2}$$

for all $f \in D$ such that $f/g$ is bounded.

Example 9.4.1. Consider the linear Boltzmann equation [equation (8.3.8)]

$$\frac{\partial u(t,x)}{\partial t} + u(t,x) = Pu(t,x),$$

with the initial condition $u(0,x) = f(x)$, which we examined in Chapter 8. There we showed that the solution of this equation was given by

$$u(t,x) = e^{t(P-I)}f(x) = P^t f(x),$$

and $e^{t(P-I)}$ is a semigroup of Markov operators. From Theorem 9.2.2 we know immediately that the conditional entropy $H(P^t f \mid f_*)$ is continually increasing for every $f_*$ that is a stationary density of $P$. Furthermore, by (9.4.2) and Corollary 8.7.3, if $f_*(x) > 0$ and $f_*$ is the unique stationary density of $P$, then

$$\lim_{t\to\infty} H(P^t f \mid f_*) = H(f_* \mid f_*) = 0.$$

Thus, in the case in which $f_*$ is positive and unique, the conditional entropy for the solutions of the linear Boltzmann equation always achieves its maximal value. ∎


Exercises

9.1. Let $X = \{(x_1, \ldots, x_k) \in R^k : x_1 \ge 0, \ldots, x_k \ge 0\}$. Consider the family $F_{m_1\cdots m_k}$ of densities $f\colon X \to R^+$ such that

$$\int_0^\infty\!\!\cdots\!\int_0^\infty x_i f(x_1, \ldots, x_k)\,dx_1\cdots dx_k = m_i > 0, \quad i = 1, \ldots, k.$$

Find the density in $F_{m_1\cdots m_k}$ that maximizes the entropy.

9.2. Let $X = \{(x, y) \in R^2 : y \ge a|x|\}$, where $a$ is a constant. Consider the family $F_{ma}$ of densities $f\colon X \to R^+$ such that

$$\int_X y f(x,y)\,dx\,dy = m > 0.$$

Show that for $a > 0$ there is a density in $F_{ma}$ having the maximal entropy and that for $a \le 0$ the entropy in $F_{ma}$ is unbounded.
9.3. Consider the space $X = \{1, \ldots, N\}$ with the counting measure. In this space $D(X)$ consists of all probabilistic vectors $(f_1 = f(1), \ldots, f_N = f(N))$ satisfying

$$f_k \ge 0, \qquad \sum_{k=1}^{N} f_k = 1.$$

Show that $f_k = 1/N$, $k = 1, \ldots, N$, maximizes the entropy. For which vectors is the entropy minimal?
9.4. Consider the heat equation

$$\frac{\partial u}{\partial t} = \frac{\sigma^2}{2}\frac{\partial^2 u}{\partial x^2} \quad\text{for } t > 0,\ x \in R,$$

and prove that every positive solution $u(t,x)$ corresponding to the bounded initial condition $u(0,x) = f(x)$, $f \in D$ with compact support, satisfies

$$\frac{d}{dt}H(u) = \frac{\sigma^2}{2}\int_{-\infty}^{+\infty} u\left(\frac{\partial}{\partial x}\ln u\right)^2 dx \ge 0.$$

9.5. Consider the differential equation

$$\frac{\partial u}{\partial t} = \frac{\sigma^2}{2}\frac{\partial^2 u}{\partial x^2} - \frac{\partial}{\partial x}(b(x)u) \quad\text{for } t > 0,\ 0 \le x \le 1,$$

with the boundary value conditions

$$u_x(t,0) = u_x(t,1) = 0 \quad\text{for } t > 0.$$

Assume that $b$ is a $C^2$ function and that $b(0) = b(1) = 0$. Without looking for the explicit formula for the solutions (which, for arbitrary $b$, is difficult), prove the following properties:


(a) For every solution

$$\int_0^1 u(t,x)\,dx = \text{const.}$$

(b) For every two positive normalized solutions $u_1$ and $u_2$,

$$\frac{d}{dt}H(u_1 \mid u_2) = \frac{\sigma^2}{2}\int_0^1 u_1\left(\frac{\partial}{\partial x}\ln\frac{u_1}{u_2}\right)^2 dx \ge 0$$

(Risken, 1984; Sec. 6.1).

9.6. Write a program called CONDENT (conditional entropy) to study the value

$$H(f \mid g) = -\int_0^1 f(x)\log\left[\frac{f(x)}{g(x)}\right]dx \quad\text{for } f, g \in D([0,1]).$$

Compare, for different pairs of sequences $\{f_n\}, \{g_n\} \subset D([0,1])$, the asymptotic behavior of

$$\|f_n - g_n\|_{L^1} \quad\text{and}\quad H(f_n \mid g_n).$$
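A minimal version of such a program can be sketched in Python (the Riemann-sum discretization and the sample sequences $f_n(x) = 1 + \frac{1}{n}\cos 2\pi x$, $g_n \equiv 1$ are choices made here, not prescribed by the exercise):

```python
import math

M = 10_000                                 # quadrature points on [0, 1]
xs = [(j + 0.5) / M for j in range(M)]     # midpoint rule

def condent(f, g):
    """H(f | g) = -int_0^1 f(x) log(f(x)/g(x)) dx."""
    return -sum(f(x) * math.log(f(x) / g(x)) for x in xs) / M

def l1_distance(f, g):
    return sum(abs(f(x) - g(x)) for x in xs) / M

results = []
for n in (1, 2, 4, 8):
    f = lambda x, n=n: 1.0 + math.cos(2 * math.pi * x) / n   # a density
    g = lambda x: 1.0                                        # uniform
    results.append((l1_distance(f, g), condent(f, g)))
# The L1 distance decays like 1/n, while H(f_n | g_n) stays <= 0 and
# shrinks roughly like the *square* of the L1 distance -- a hint toward
# the answer to Exercise 9.7.
```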
9.7. Let $(X, \mathcal{A}, \mu)$ be a measure space. Prove that for every two sequences $\{f_n\}, \{g_n\} \subset D$ the convergence $H(f_n \mid g_n) \to 0$ implies $\|f_n - g_n\|_{L^1} \to 0$. Is the converse implication also true? Exercise 9.6 can be helpful in guessing the proper answer (Loskot and Rudnicki, 1991).
9.8. Consider a density $f_*\colon R^3 \to R^+$ of the form

$$f_*(x) = \alpha\exp(-\beta|x|^2 + kx),$$

where $|x|^2 = x_1^2 + x_2^2 + x_3^2$ and $kx = k_1x_1 + k_2x_2 + k_3x_3$. Assume that a sequence of densities $f_n \subset D(R^3)$ satisfies

$$\int_{R^3} g_i(x)f_n(x)\,dx = \int_{R^3} g_i(x)f_*(x)\,dx, \quad i = 0, 1, 2, 3,$$

with $g_0(x) = |x|^2$ and $g_i(x) = x_i$, $i = 1, 2, 3$. Prove that the convergence $H(f_n) \to H(f_*)$ implies $\|f_n - f_*\| \to 0$ (Elmroth, 1984; Loskot and Rudnicki, 1991).

10
Stochastic Perturbation of Discrete Time Systems

We have seen two ways in which uncertainty (and thus probability) may appear in the study of strictly deterministic systems. The first was the consequence of following a random distribution of initial states, which, in turn, led to a development of the notion of the Frobenius–Perron operator and an examination of its properties as a means of studying the asymptotic properties of flows of densities. The second resulted from the random application of a transformation $S$ to a system and led naturally to our study of the linear Boltzmann equations.

In this chapter we consider yet another source of probabilistic distributions in deterministic systems. Specifically, we examine discrete time situations in which at each time the value $x_{n+1} = S(x_n)$ is reached with some error. An extremely interesting situation occurs when this error is small and the system is "primarily" governed by a deterministic transformation $S$. We consider two possible ways in which this error might be small: Either the error occurs rather rarely and is thus small on the average, or the error occurs constantly but is small in magnitude. In both cases, we consider the situation in which the error is independent of $S(x_n)$ and are, thus, led to first recall the notion of independent random variables in the next section and to explore some of their properties in Sections 10.2 and 10.3.


10.1 Independent Random Variables

Let $(\Omega, \mathcal{F}, \text{prob})$ be a probability space. A finite sequence of random variables $(\xi_1, \ldots, \xi_k)$ is called a $k$-dimensional random vector. Equivalently, we could say that a random vector $\xi = (\xi_1, \ldots, \xi_k)$ is a measurable transformation from $\Omega$ into $R^k$. Measurability means that for every Borel subset $B \subset R^k$ the set

$$\xi^{-1}(B) = \{\omega : \xi(\omega) \in B\}$$

belongs to $\mathcal{F}$.

Thus, having a $k$-dimensional random vector $(\xi_1, \ldots, \xi_k)$, we may consider two different kinds of densities: the density of each random component $\xi_i$ and the joint density function for the random vector $(\xi_1, \ldots, \xi_k)$. Let the density of $\xi_i$ be denoted by $f_i(x)$, and the joint density of $\xi = (\xi_1, \ldots, \xi_k)$ be $f(x_1, \ldots, x_k)$. Then, by definition, we have

$$\int_{B_i} f_i(x)\,dx = \text{prob}\{\xi_i \in B_i\} \tag{10.1.1}$$

and

$$\int\cdots\int_B f(x_1, \ldots, x_k)\,dx_1\cdots dx_k = \text{prob}\{(\xi_1, \ldots, \xi_k) \in B\}, \quad\text{for } B \subset R^k,$$

where $B_i$ and $B$ are Borel subsets of $R$ and $R^k$,
respectively. In this last integral take

$$B = B_1 \times \underbrace{R \times \cdots \times R}_{k-1 \text{ times}}$$

so that we have

$$\text{prob}\{(\xi_1, \ldots, \xi_k) \in B\} = \text{prob}\{\xi_1 \in B_1\} = \int_{B_1}\left\{\int\cdots\int_{R^{k-1}} f(x, x_2, \ldots, x_k)\,dx_2\cdots dx_k\right\}dx. \tag{10.1.2}$$

By comparing (10.1.1) with (10.1.2), we see immediately that

$$f_1(x) = \int\cdots\int_{R^{k-1}} f(x, x_2, \ldots, x_k)\,dx_2\cdots dx_k. \tag{10.1.3}$$

Thus, having the joint density function $f$ for $(\xi_1, \ldots, \xi_k)$, we can always find the density for $\xi_1$ from equation (10.1.3). In an entirely analogous fashion, $f_2$ can be obtained by integrating $f(x_1, x, x_3, \ldots, x_k)$ over $x_1, x_3, \ldots, x_k$. The same procedure will yield each of the densities $f_i$.

However, the converse is certainly not true in general since, having the density $f_i$ of each random variable $\xi_i$ ($i = 1, \ldots, k$), it is not usually possible to find the joint density $f$ of the random vector $(\xi_1, \ldots, \xi_k)$. The one important special case in which this construction is possible occurs when $\xi_1, \ldots, \xi_k$ are independent random variables. Thus, we have the following theorem.

Theorem 10.1.1. If the random variables $\xi_1, \ldots, \xi_k$ are independent and have densities $f_1, \ldots, f_k$, respectively, then the joint density function for the random vector $(\xi_1, \ldots, \xi_k)$ is given by

$$f(x_1, \ldots, x_k) = f_1(x_1)\cdots f_k(x_k), \tag{10.1.4}$$

where the right-hand side is a product.

Proof. Consider a Borel set $B \subset R^k$ of the form

$$B = B_1 \times \cdots \times B_k, \tag{10.1.5}$$

where $B_1, \ldots, B_k \subset R$ are Borel sets. Then

$$\text{prob}\{(\xi_1, \ldots, \xi_k) \in B\} = \text{prob}\{\xi_1 \in B_1, \ldots, \xi_k \in B_k\},$$

and, since the random variables $\xi_1, \ldots, \xi_k$ are independent,

$$\text{prob}\{(\xi_1, \ldots, \xi_k) \in B\} = \text{prob}\{\xi_1 \in B_1\}\cdots\text{prob}\{\xi_k \in B_k\}.$$

With this equation and (10.1.1), we obtain

$$\text{prob}\{(\xi_1, \ldots, \xi_k) \in B\} = \int\cdots\int_B f_1(x_1)\cdots f_k(x_k)\,dx_1\cdots dx_k. \tag{10.1.6}$$

Since, by definition, sets of the form (10.1.5) are generators of the Borel subsets in $R^k$, it is clear that (10.1.6) must hold for arbitrary Borel sets $B \subset R^k$. By the definition of the joint density, this implies that $f_1(x_1)\cdots f_k(x_k)$ is the joint density for the random vector $(\xi_1, \ldots, \xi_k)$. ∎
As a simple application of Theorem 10.1.1, we consider two independent random variables $\xi_1$ and $\xi_2$ with densities $f_1$ and $f_2$, respectively. We wish to obtain the density of $\xi_1 + \xi_2$. Observe that, by Theorem 10.1.1, the random vector $(\xi_1, \xi_2)$ has the joint density $f_1(x_1)f_2(x_2)$. Thus, for an arbitrary Borel set $B \subset R$, we have

$$\text{prob}\{\xi_1 + \xi_2 \in B\} = \iint_{x_1 + x_2 \in B} f_1(x_1)f_2(x_2)\,dx_1\,dx_2,$$

or, setting $x = x_1 + x_2$ and $y = x_2$,

$$\text{prob}\{\xi_1 + \xi_2 \in B\} = \iint_{B\times R} f_1(x-y)f_2(y)\,dx\,dy = \int_B\left\{\int_{-\infty}^{\infty} f_1(x-y)f_2(y)\,dy\right\}dx.$$

From the definition of a density, this last equation shows that

$$f(x) = \int_{-\infty}^{\infty} f_1(x-y)f_2(y)\,dy \tag{10.1.7}$$

is the density of $\xi_1 + \xi_2$.

Remark 10.1.1. From the definition of the density, it follows that, if $\xi$ has a density $f$, then $c\xi$ has a density $(1/|c|)f(x/c)$. To see this, write

$$\text{prob}\{c\xi \in A\} = \text{prob}\left\{\xi \in \frac{1}{c}A\right\} = \int_{(1/c)A} f(y)\,dy = \frac{1}{|c|}\int_A f\!\left(\frac{x}{c}\right)dx.$$

Thus, from (10.1.7), if $\xi_1$ and $\xi_2$ are independent and have densities $f_1$ and $f_2$, respectively, then $c_1\xi_1 + c_2\xi_2$ has the density

$$f(x) = \frac{1}{|c_1 c_2|}\int_{-\infty}^{\infty} f_1\!\left(\frac{x-y}{c_1}\right)f_2\!\left(\frac{y}{c_2}\right)dy. \tag{10.1.8}$$
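The convolution formula (10.1.7) is easy to verify numerically. In the Python sketch below (an illustration only; the uniform densities are our choice), the convolution integral is evaluated with a midpoint Riemann sum for $f_1 = f_2 = 1_{[0,1]}$ and reproduces the triangular density of the sum of two independent uniform random variables.

```python
def f_uniform(x):
    """Density of the uniform distribution on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def density_of_sum(x, f1=f_uniform, f2=f_uniform, M=2000):
    """f(x) = int f1(x - y) f2(y) dy, eq. (10.1.7), via a midpoint sum."""
    return sum(f1(x - (j + 0.5) / M) * f2((j + 0.5) / M)
               for j in range(M)) / M

# The sum of two independent uniforms has the triangular density
# f(x) = x on [0, 1] and f(x) = 2 - x on [1, 2].
vals = [density_of_sum(x) for x in (0.25, 0.5, 1.0, 1.5)]
```

With the midpoint rule the integrand's jumps never fall on a grid point for these sample arguments, so the computed values match the triangular density essentially exactly.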

10.2 Mathematical Expectation and Variance

In previous chapters we have, on numerous occasions, used the concept of mathematical expectation in rather specialized situations without specifically noting that it was, indeed, the mathematical expectation that was involved. We now wish to explicitly introduce this concept in its general sense.

Let $(\Omega, \mathcal{F}, \text{prob})$ be a probability space and let $\xi\colon \Omega \to R$ be a random variable. Then we have the following definition.

Definition 10.2.1. If $\xi$ is integrable with respect to the measure prob, then the mathematical expectation (or mean value) of $\xi$ is given by

$$E(\xi) = \int_\Omega \xi(\omega)\,\text{prob}(d\omega).$$


Remark 10.2.1. By definition, $E(\xi)$ is the average value of $\xi$. A more illuminating interpretation of $E(\xi)$ is given by the law of large numbers [see equation (10.3.4)]. ∎

In the case when $\xi$ is a constant, $\xi = c$, it is trivial to derive $E(c)$. Since $\text{prob}(\Omega) = 1$, for any constant $c$ we have

$$E(c) = c\int_\Omega \text{prob}(d\omega) = c. \tag{10.2.1}$$

Now we show how the mathematical expectation may be calculated via the use of a density function. Let $h\colon R^k \to R$ be a Borel measurable function, that is, $h^{-1}(\Delta)$ is a Borel subset of $R^k$ for each interval $\Delta$. Further, let $\xi = (\xi_1, \ldots, \xi_k)$ be a random vector with the joint density function $f(x_1, \ldots, x_k)$. Then we have the following theorem.

Theorem 10.2.1. If $hf$ is integrable, that is,

$$\int\cdots\int_{R^k} |h(x_1, \ldots, x_k)|f(x_1, \ldots, x_k)\,dx_1\cdots dx_k < \infty,$$

then the mathematical expectation of the random variable $h \circ \xi$ exists and is given by

$$E(h \circ \xi) = \int\cdots\int_{R^k} h(x_1, \ldots, x_k)f(x_1, \ldots, x_k)\,dx_1\cdots dx_k. \tag{10.2.2}$$

Proof. First assume that $h$ is a simple function, that is,

$$h(x) = \sum_{i=1}^n \lambda_i 1_{A_i}(x),$$

where the $A_i$ are mutually disjoint Borel subsets of $R^k$ such that $\bigcup_i A_i = R^k$. Then

$$h(\xi(\omega)) = \sum_{i=1}^n \lambda_i 1_{A_i}(\xi(\omega)) = \sum_{i=1}^n \lambda_i 1_{\xi^{-1}(A_i)}(\omega),$$

and, by the definition of the Lebesgue integral,

$$E(h \circ \xi) = \int_\Omega h(\xi(\omega))\,\text{prob}(d\omega) = \sum_{i=1}^n \lambda_i\,\text{prob}\{\xi^{-1}(A_i)\}.$$

Further, since $f$ is the density for $\xi$, we have

$$\text{prob}\{\xi^{-1}(A_i)\} = \text{prob}\{\xi \in A_i\} = \int_{A_i} f(x)\,dx.$$


As a consequence,

$$E(h \circ \xi) = \sum_{i=1}^n \lambda_i\int_{A_i} f(x)\,dx = \int_{R^k} h(x)f(x)\,dx.$$

Thus, for the $h$ that are simple functions, equality (10.2.2) is proved. For an arbitrary $h$, $hf$ integrable, we can find a sequence $\{h_n\}$ of simple functions converging to $h$ and such that $|h_n| \le |h|$. From equality (10.2.2), already proved for simple functions, we thus have

$$E(h_n \circ \xi) = \int_{R^k} h_n(x)f(x)\,dx.$$

By the Lebesgue dominated convergence theorem, since $|h_n f| \le |h|f$, it follows that

$$\int_\Omega h(\xi(\omega))\,\text{prob}(d\omega) = \int_{R^k} h(x)f(x)\,dx,$$

which completes the proof. ∎


In the particular case that $k = 1$ and $h(x) = x$, we have from equation (10.2.2)

$$E(\xi) = \int_{-\infty}^{\infty} xf(x)\,dx. \tag{10.2.3}$$

Thus, if $f(x)$ is taken to be the mass density of a rod of infinite length, then $E(\xi)$ gives the center of mass of the rod.
From Definition 10.2.1, it follows that, for every sequence of random variables $\xi_1, \ldots, \xi_k$ and constants $\lambda_1, \ldots, \lambda_k$, we have

$$E\left(\sum_{i=1}^k \lambda_i\xi_i\right) = \sum_{i=1}^k \lambda_i E(\xi_i), \tag{10.2.4}$$

since the mathematical expectation is simply a Lebesgue integral on the probability space $(\Omega, \mathcal{F}, \text{prob})$. Moreover, the mathematical expectation of $\sum_i \lambda_i\xi_i$ exists whenever all of the $E(\xi_i)$ exist.

We now turn to a consideration of the variance, starting with a definition.

Definition 10.2.2. Let $\xi\colon \Omega \to R$ be a random variable such that $m = E(\xi)$ exists. Then the variance of $\xi$ is

$$D^2(\xi) = E((\xi - m)^2), \tag{10.2.5}$$

if the corresponding integral is finite.

Thus the variance of a random variable $\xi$ is just the average value of the square of the deviation of $\xi$ away from $m$. By the additivity of the mathematical expectation, equation (10.2.5) may also be written as

$$D^2(\xi) = E(\xi^2) - m^2. \tag{10.2.6}$$

If $\xi$ has a density $f(x)$, then by the use of equation (10.2.2) we can also write

$$D^2(\xi) = \int_{-\infty}^{\infty}(x - m)^2 f(x)\,dx,$$

whenever the integral on the right-hand side exists. Finally, we note that for any constant $\lambda$,

$$D^2(\lambda\xi) = \lambda^2 D^2(\xi).$$

Since in any application there is a certain inconvenience in the fact that $D^2(\xi)$ does not have the same dimension as $\xi$, it is sometimes more convenient to use the standard deviation of $\xi$, defined by

$$\sigma(\xi) = \sqrt{D^2(\xi)}.$$
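Equations (10.2.3), (10.2.6), and the density form of $D^2(\xi)$ can be checked by simple quadrature. The Python sketch below (an illustration only; the uniform density is our choice) does so for the uniform density on $[0,1]$, for which $E(\xi) = 1/2$ and $D^2(\xi) = 1/12$.

```python
M = 100_000
xs = [(j + 0.5) / M for j in range(M)]   # midpoint grid on [0, 1]
f = lambda x: 1.0                        # uniform density on [0, 1]

mean = sum(x * f(x) for x in xs) / M                       # eq. (10.2.3)
var = sum((x - mean) ** 2 * f(x) for x in xs) / M          # density form
var_alt = sum(x * x * f(x) for x in xs) / M - mean ** 2    # eq. (10.2.6)
```

The two variance computations agree to machine precision, as (10.2.6) demands, and both approximate $1/12$.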

For our purposes here, two of the most important properties of the mathematical expectation and variance of a random variable are contained in the next theorem.

Theorem 10.2.2. Let $\xi_1, \ldots, \xi_k$ be independent random variables such that $E(\xi_i)$, $D^2(\xi_i)$, $i = 1, \ldots, k$, exist. Then

$$E(\xi_1\cdots\xi_k) = E(\xi_1)\cdots E(\xi_k) \tag{10.2.7}$$

and

$$D^2(\xi_1 + \cdots + \xi_k) = D^2(\xi_1) + \cdots + D^2(\xi_k). \tag{10.2.8}$$

Proof. The proof is easy even in the general case. However, to illustrate again the usefulness of (10.2.2), we will prove this theorem in the case when all the $\xi_i$ have densities. Thus, assume that $\xi_i$ has density $f_i$, $i = 1, \ldots, k$, and pick $h(x_1, \ldots, x_k) = x_1\cdots x_k$. Since the $\xi_1, \ldots, \xi_k$ are independent random variables, by Theorem 10.1.1, the joint density function for the random vector $(\xi_1, \ldots, \xi_k)$ is

$$f(x_1, \ldots, x_k) = f_1(x_1)\cdots f_k(x_k).$$

Hence, by equation (10.2.2),

$$E(\xi_1\cdots\xi_k) = \int\cdots\int_{R^k} x_1\cdots x_k f_1(x_1)\cdots f_k(x_k)\,dx_1\cdots dx_k = \int_{-\infty}^{\infty} x_1 f_1(x_1)\,dx_1\cdots\int_{-\infty}^{\infty} x_k f_k(x_k)\,dx_k = E(\xi_1)\cdots E(\xi_k),$$

and (10.2.7) is therefore proved.


Now set $E(\xi_i) = m_i$, so that

$$D^2(\xi_1 + \cdots + \xi_k) = E\big((\xi_1 + \cdots + \xi_k - m_1 - \cdots - m_k)^2\big) = E\left(\sum_{i,j=1}^{k}(\xi_i - m_i)(\xi_j - m_j)\right).$$

Since the $\xi_1, \ldots, \xi_k$ are independent, $(\xi_1 - m_1), \ldots, (\xi_k - m_k)$ are also independent. Therefore, by (10.2.4) and (10.2.7), we have

$$D^2(\xi_1 + \cdots + \xi_k) = \sum_{i=1}^{k} E\big((\xi_i - m_i)^2\big) + \sum_{i\ne j} E\big((\xi_i - m_i)(\xi_j - m_j)\big) = \sum_{i=1}^{k} D^2(\xi_i) + \sum_{i\ne j}\big(E(\xi_i) - m_i\big)\big(E(\xi_j) - m_j\big).$$

Since $E(\xi_i) = m_i$, equation (10.2.8) results immediately. ∎

Remark 10.2.2. In Theorem 10.2.2, it is sufficient to assume that the $\xi_i$ are mutually independent, that is, $\xi_i$ is independent of $\xi_j$ for $i \ne j$. ∎

To close this section on mathematical expectation and variance, we give two versions of the Chebyshev inequality, originally introduced in a special context in Section 5.7.

Theorem 10.2.3. If $\xi$ is nonnegative and $E(\xi)$ exists, then

$$\text{prob}\{\xi \ge a\} \le E(\xi)/a \quad\text{for every } a > 0. \tag{10.2.9}$$

If $\xi$ is arbitrary but such that $m = E(\xi)$ and $D^2(\xi)$ exist, then

$$\text{prob}\{|\xi - m| \ge \varepsilon\} \le D^2(\xi)/\varepsilon^2 \quad\text{for every } \varepsilon > 0. \tag{10.2.10}$$

Proof. By the definition of mathematical expectation,

$$E(\xi) = \int_\Omega \xi(\omega)\,\text{prob}(d\omega) \ge \int_{\{\omega:\,\xi(\omega)\ge a\}} \xi(\omega)\,\text{prob}(d\omega) \ge a\int_{\{\omega:\,\xi(\omega)\ge a\}} \text{prob}(d\omega) = a\,\text{prob}\{\xi \ge a\},$$

which proves (10.2.9). [This is, of course, analogous to equation (5.7.9).]

Now replace $\xi$ by $(\xi - m)^2$ and $a$ by $\varepsilon^2$ in (10.2.9) to give

$$\text{prob}\{(\xi - m)^2 \ge \varepsilon^2\} \le (1/\varepsilon^2)E\big((\xi - m)^2\big) = (1/\varepsilon^2)D^2(\xi),$$

which is equivalent to (10.2.10) and completes the proof. ∎
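For a concrete distribution the two bounds can be compared with exact tail probabilities. The sketch below is a Python illustration (the exponential density is our choice, not the text's): for $f(x) = e^{-x}$ on $[0, \infty)$ we have $E(\xi) = 1$, $D^2(\xi) = 1$, and $\text{prob}\{\xi \ge a\} = e^{-a}$ exactly.

```python
import math

E, D2 = 1.0, 1.0   # mean and variance of the exponential(1) distribution

for a in (2.0, 4.0, 8.0):
    tail = math.exp(-a)            # exact prob{xi >= a}
    assert tail <= E / a           # bound (10.2.9)

for eps in (2.0, 3.0):
    # For eps >= 1 and xi >= 0: prob{|xi - 1| >= eps} = exp(-(1 + eps)).
    tail = math.exp(-(1.0 + eps))
    assert tail <= D2 / eps ** 2   # bound (10.2.10)
```

The exact tails decay exponentially while the Chebyshev bounds decay only polynomially, so the inequalities hold with plenty of room, which is typical: Chebyshev trades sharpness for complete generality.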


10.3 Stochastic Convergence

There are several different ways in which the convergence of a sequence $\{\xi_n\}$ of random variables may be defined. For example, if $\xi_n \in L^p(\Omega, \mathcal{F}, \text{prob})$, then we may define both strong and weak convergence of $\{\xi_n\}$ to $\xi$ in $L^p(\Omega)$ space, as treated in Section 2.3.

In probability theory some of these types of convergence have special names. Thus, strong convergence of $\{\xi_n\}$ in $L^2(\Omega)$, defined by the relation

$$\lim_{n\to\infty}\int_\Omega |\xi_n(\omega) - \xi(\omega)|^2\,\text{prob}(d\omega) = 0, \tag{10.3.1}$$

is denoted by

$$\text{l.i.m. } \xi_n = \xi$$

and called convergence in mean.

A second type of convergence useful in the treatment of probabilistic phenomena is given in the following definition.

Definition 10.3.1. A sequence $\{\xi_n\}$ of random variables is said to be stochastically convergent to the random variable $\xi$ if, for every $\varepsilon > 0$,

$$\lim_{n\to\infty}\text{prob}\{|\xi_n - \xi| \ge \varepsilon\} = 0. \tag{10.3.2}$$

The stochastic convergence of $\{\xi_n\}$ to $\xi$ is denoted by

$$\text{st-lim } \xi_n = \xi. \tag{10.3.3}$$

Note that in terms of $L^p$ norms, the mathematical expectation and variance of a random variable may be written as

$$E(\xi) = \int_\Omega \xi(\omega)\,\text{prob}(d\omega) \quad\text{and}\quad D^2(\xi) = \int_\Omega |\xi - m|^2\,\text{prob}(d\omega) = \|\xi - m\|^2_{L^2(\Omega)}.$$

This observation allows us to derive a connection between stochastic convergence and strong convergence from the Chebyshev inequality, as contained in the following proposition.

Proposition 10.3.1. If a sequence $\{\xi_n\}$ of random variables, $\xi_n \in L^p(\Omega)$, is strongly convergent in $L^p(\Omega)$ to $\xi$, then $\{\xi_n\}$ is stochastically convergent to $\xi$. Thus, convergence in mean implies stochastic convergence.

Proof. We only consider $p < \infty$, since for $p = \infty$ the proposition is trivial. Applying the Chebyshev inequality (10.2.9) to $|\xi_n - \xi|^p$, we have

$$\text{prob}\{|\xi_n - \xi|^p \ge \varepsilon^p\} \le (1/\varepsilon^p)E(|\xi_n - \xi|^p)$$


or, equivalently,

$$\text{prob}\{|\xi_n - \xi| \ge \varepsilon\} \le (1/\varepsilon^p)\|\xi_n - \xi\|^p_{L^p(\Omega)},$$

which completes the proof. ∎


A third type of convergence useful for random variables is defined next.

Definition 10.3.2. A sequence $\{\xi_n\}$ of random variables is said to converge almost surely to $\xi$ (or to converge to $\xi$ with probability 1) if

$$\lim_{n\to\infty}\xi_n(\omega) = \xi(\omega)$$

for almost all $\omega$. Equivalently, this condition may be written as

$$\text{prob}\{\lim_{n\to\infty}\xi_n(\omega) = \xi(\omega)\} = 1.$$

Remark 10.3.1. For all of the types of convergence we have defined (strong and weak $L^p$ convergence, convergence in mean, stochastic convergence, and almost sure convergence), the limiting function is determined up to a set of measure zero. That is, if $\xi$ and $\zeta$ are both limits of the sequence $\{\xi_n\}$, then $\xi$ and $\zeta$ differ only on a set of measure zero. ∎

We now show the connection between almost sure and stochastic convergence with the following proposition.
We now show the connection between almost sure and stochastic convergence with the following proposition.

Proposition 10.3.2. If a sequence of random variables $\{\xi_n\}$ converges almost surely to $\xi$, then it also converges stochastically to $\xi$.

Proof. Set

$$\eta_n(\omega) = \min(1, |\xi_n(\omega) - \xi(\omega)|).$$

Clearly, $|\eta_n| \le 1$. If $\{\xi_n\}$ converges almost surely to $\xi$, then $\{\eta_n\}$ converges to zero almost surely, and, by the Lebesgue dominated convergence theorem,

$$\lim_{n\to\infty}\|\eta_n\|_{L^1(\Omega)} = \lim_{n\to\infty}\int_\Omega \eta_n(\omega)\,\text{prob}(d\omega) = 0.$$

By Proposition 10.3.1 this implies that $\{\eta_n\}$ converges stochastically to zero. Since in the definition of stochastic convergence it suffices to consider only $\varepsilon < 1$, it then follows that

$$\text{prob}\{|\xi_n - \xi| \ge \varepsilon\} = \text{prob}\{\eta_n \ge \varepsilon\} \quad\text{for } 0 < \varepsilon < 1.$$

Thus the stochastic convergence of $\{\eta_n\}$ to zero implies the stochastic convergence of $\{\xi_n\}$ to $\xi$, and the proof is complete. ∎


As a simple illustration of the usefulness of the concept of stochastic convergence, we prove the simplest version of the law of large numbers, given in the next theorem.

Theorem 10.3.1 (Weak law of large numbers). Let $\{\xi_n\}$ be a sequence of independent random variables with

$$E(\xi_n) = m_n \quad\text{and}\quad M = \sup_n D^2(\xi_n) < \infty.$$

Then

$$\text{prob}\left\{\left|\frac{1}{n}\sum_{i=1}^n (\xi_i - m_i)\right| \ge \varepsilon\right\} \le \frac{M}{n\varepsilon^2}$$

for every $\varepsilon > 0$. In particular, if $m_1 = m_2 = \cdots = m_n = m$, then

$$\text{st-lim}\,\frac{1}{n}\sum_{i=1}^n \xi_i = m. \tag{10.3.4}$$

Proof. Set

$$\eta_n = \frac{1}{n}\sum_{i=1}^n \xi_i.$$

Since the $\xi_i$ are independent random variables,

$$D^2(\eta_n) = \frac{1}{n^2}\sum_{i=1}^n D^2(\xi_i) \le \frac{M}{n},$$

and, clearly,

$$E(\eta_n) = \frac{1}{n}\sum_{i=1}^n m_i.$$

Thus, by the Chebyshev inequality (10.2.10),

$$\text{prob}\{|\eta_n - E(\eta_n)| \ge \varepsilon\} \le (1/\varepsilon^2)D^2(\eta_n) \le M/n\varepsilon^2,$$

which completes the proof, as equation (10.3.4) is a trivial consequence. ∎
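The bound $M/n\varepsilon^2$ just derived can be watched at work in a simulation. The Python sketch below is an illustration only (the uniform distribution, sample sizes, and seed are our choices): it averages $n$ independent uniform $[0,1]$ variables, for which $m = 1/2$ and $M = D^2(\xi_i) = 1/12$.

```python
import random

random.seed(0)
m, M, eps = 0.5, 1.0 / 12.0, 0.05

deviations = []
for n in (100, 1_000, 10_000):
    avg = sum(random.random() for _ in range(n)) / n
    deviations.append(abs(avg - m))
# Chebyshev gives prob{|avg - m| >= eps} <= M / (n * eps^2); for
# n = 10_000 that is less than 0.004, so the observed deviation is
# essentially always far below eps.
```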

Equation (10.3.4) is a precise statement of our intuitive notion that the mathematical expectation or mean value of a random variable may be obtained by averaging the results of many independent experiments.

The term "weak law of large numbers" specifically refers to equation (10.3.4) because stochastic convergence is weaker than other types of convergence for which similar results can be proved. One of the most famous


versions of these is the so-called strong law of large numbers, as contained in the Kolmogorov theorem.

Theorem 10.3.2 (Kolmogorov). Let $\{\xi_n\}$ be a sequence of independent random variables with

$$E(\xi_n) = m_n \quad\text{and}\quad M = \sup_n D^2(\xi_n) < \infty.$$

Then

$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^n (\xi_i - m_i) = 0$$

with probability 1.

We will not give a proof of the Kolmogorov theorem [see Breiman (1968) for the proof] as it is not used in our studies of the flow of densities. We stated it only because of its close correspondence to the Birkhoff individual ergodic Theorem 4.2.4, which also deals with the pointwise convergence of averages. To illustrate this correspondence, consider the sequence

$$f(x), f(S(x)), f(S^2(x)), \ldots, \tag{10.3.5}$$

which appears in the Birkhoff theorem, as a sequence of random variables on the probability space $(X, \mathcal{A}, \text{prob})$, where

$$\text{prob}(A) = \mu(A)/\mu(X).$$

These variables (10.3.5) are, in general, highly dependent since $S(x)$ is a function of $x$, $S^2(x) = S(S(x))$ is a function of $x$ and $S(x)$, and so on. The reason that a probabilistic treatment of deterministic systems is often more difficult than problems in classical probability theory is directly related to the absence of independent random variables in the former. It is only in some special circumstances that independence may appear in deterministic systems under certain limiting cases, such as mixing and exactness.

However, independence appears in a natural way in the definition of perturbed dynamical systems, which we consider in the following Sections 10.4–10.6. There we use the notion of independent random vectors, which is an immediate generalization of the definition of independent random variables. Let $\xi_n\colon \Omega \to R^{k_n}$, $n = 1, 2, \ldots$, be a sequence (finite or not) of random vectors. We say that the $\xi_n$ are independent if, for every sequence of Borel sets $B_n \subset R^{k_n}$, the events

$$\xi_1^{-1}(B_1), \xi_2^{-1}(B_2), \ldots$$

are independent. Observe that in this definition the $\xi_n$ may have different dimensions for different $n$.


FIGURE 10.4.1. Schematic representation of the operation of a deterministic system with a randomly applied stochastic perturbation.

10.4 Discrete Time Systems with Randomly Applied Stochastic Perturbations
In this section we consider the asymptotic behavior of a nonsingular transformation when a stochastic perturbation is randomly applied.

Let $(X, \mathcal{A}, \mu)$ be a measure space, and $S\colon X \to X$ a nonsingular transformation with associated Frobenius–Perron operator $P$. The following rules apply to the evolution of the point $x \in X$: At the $n$th instant of time we do not know the precise location of $x_n$, although we do know the density $f_n(x)$. At the next instant of time $(n+1)$, the point moves with probability $(1-\varepsilon)$ to the next location $S(x_n)$. However, there is a probability $\varepsilon$ that this new location will not be given by $S(x_n)$ but rather by a random variable, independent of $x_n$, with density $g(x)$. This process can be visualized as shown in Figure 10.4.1. To make it more precise, we follow the ideas of Chapter 8 in which we derived the linear Boltzmann equation. Thus, consider the space $(X, \mathcal{A}, \mu_f)$, where $f$ is the density of the initial position of the point, and a probability space $(\Omega, \mathcal{F}, \text{prob})$ related to the perturbations. With these two spaces we define the product space

$$\Omega \times X = \{(\omega, x)\colon \omega \in \Omega,\ x \in X\}$$

and the product measure

$$\text{Prob}(O \times A) = \text{prob}(O)\mu_f(A) \quad\text{for } O \in \mathcal{F},\ A \in \mathcal{A}$$

(see Theorem 2.2.2).


To describe the perturbations, consider a sequence of independent random vectors

$$(\xi_0, \eta_0), (\xi_1, \eta_1), \ldots$$

such that each $\eta_n\colon \Omega \to R$ takes only two values, 1 and 0, with the following probabilities:

$$\text{prob}(\eta_n = 0) = \varepsilon, \qquad \text{prob}(\eta_n = 1) = 1 - \varepsilon,$$

and each $\xi_n$ has the same density $g$. Then the equation

$$x_{n+1} = \eta_n S(x_n) + (1 - \eta_n)\xi_n, \quad n = 0, 1, \ldots \tag{10.4.1}$$

gives the precise description of our intuitively introduced behavior of the sequence $\{x_n\}$. Denote the density of $x_n$ by $f_n$. Our task now is to derive a relation between $f_n$ and $f_{n+1}$, assuming that the initial density $f_0 = f$ is given.
Note that

$$\text{Prob}\{x_{n+1} \in A\} = \text{Prob}\{x_{n+1} \in A \text{ and } \eta_n = 0\} + \text{Prob}\{x_{n+1} \in A \text{ and } \eta_n = 1\}. \tag{10.4.2}$$

Since, from (10.4.1), $x_{n+1}(\omega, x) = \xi_n(\omega)$ if $\eta_n(\omega) = 0$, and $x_{n+1}(\omega, x) = S(x_n)$ if $\eta_n(\omega) = 1$, we can rewrite equation (10.4.2) as

$$\text{Prob}\{x_{n+1} \in A\} = \text{Prob}\{\xi_n \in A \text{ and } \eta_n = 0\} + \text{Prob}\{S(x_n) \in A \text{ and } \eta_n = 1\}.$$

Since the events $\{\xi_n \in A\}$ and $\{\eta_n = 0\}$ are independent of each other and independent of the initial vector $x_0$, we have

$$\text{Prob}\{\xi_n \in A \text{ and } \eta_n = 0\} = \text{prob}\{\xi_n \in A\}\,\text{prob}\{\eta_n = 0\} = \varepsilon\int_A g(x)\,\mu(dx).$$

Further, since $x_n$ is dependent only on $\xi_1, \ldots, \xi_{n-1}$ and $\eta_1, \ldots, \eta_{n-1}$, we have

$$\text{Prob}\{S(x_n) \in A \text{ and } \eta_n = 1\} = \text{Prob}\{S(x_n) \in A\}\,\text{prob}\{\eta_n = 1\}.$$

Finally, since $x_n$ has the density $f_n$ by assumption, this last formula implies

$$\text{Prob}\{S(x_n) \in A \text{ and } \eta_n = 1\} = (1-\varepsilon)\int_{S^{-1}(A)} f_n(x)\,\mu(dx).$$

Thus, combining the foregoing probabilities, we have

$$\text{Prob}\{x_{n+1} \in A\} = (1-\varepsilon)\int_{S^{-1}(A)} f_n(x)\,\mu(dx) + \varepsilon\int_A g(x)\,\mu(dx).$$


Using the definition of the Frobenius–Perron operator $P$ corresponding to $S$, this may be rewritten as

$$\text{Prob}\{x_{n+1} \in A\} = \int_A [(1-\varepsilon)Pf_n(x) + \varepsilon g(x)]\,\mu(dx)$$

for all Borel sets $A \subset R$. Hence, if $x_n$ has density $f_n$, then this demonstrates that $x_{n+1}$ also has a density $f_{n+1}$ given by

$$f_{n+1} = (1-\varepsilon)Pf_n + \varepsilon g. \tag{10.4.3}$$

We want to write the right-hand side of equation (10.4.3) in the form of a linear operator, and so we define $P_\varepsilon\colon L^1 \to L^1$ by

$$P_\varepsilon f = (1-\varepsilon)Pf + \varepsilon g\int_X f(x)\,\mu(dx) \tag{10.4.4}$$

for all $f \in L^1$. Using the definition of $P_\varepsilon$, we may rewrite equation (10.4.3) in the form

$$f_{n+1} = P_\varepsilon f_n. \tag{10.4.5}$$

Our goal is to deduce as much as possible concerning the asymptotic behavior of $P_\varepsilon^n f_0$ for $f_0 \in D$.
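Equation (10.4.1) is straightforward to simulate. In the Python sketch below (an illustration only; the tent map $S$, the uniform density $g$, the value $\varepsilon = 0.25$, and the seed are our choices, not the text's), $\eta_n$ decides at each step whether the deterministic map or a fresh sample from $g$ is applied.

```python
import random

random.seed(1)
eps = 0.25
S = lambda x: 1.0 - abs(1.0 - 2.0 * x)   # tent map on [0, 1]

x = random.random()                      # x_0 drawn from f_0 = g (uniform)
jumps, n = 0, 100_000
for _ in range(n):
    if random.random() < eps:            # eta_n = 0: perturbation applied
        x = random.random()              # xi_n has density g
        jumps += 1
    else:                                # eta_n = 1: deterministic step
        x = S(x)
freq = jumps / n
# freq is close to eps, and a histogram of the orbit approximates the
# stationary density f*_eps described in Proposition 10.4.2 below.
```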
The first result is contained in the following proposition.
Proposition 10.4.1. Let the operator Pe: D-+ D be defined by equation
{10.4.4}. Then {P:} is asymptotically stable.
Proof. The proof is trivial. From the definition of Pf: in (10.4.4}, we have

P: f = Pf:(P;'- f)~ eg fx f(x)JL(dx) = eg


1

for all f E D. Thus, eg is a nontrivial lower-bound function for P: f.


Further since Pe is clearly a Markov operator, we have, by Theorem 5.6.2,
that {P:} is asymptotically stable.
Remark 10.4.1. This simple result tells us that given any nonsingular
transformation, the addition of even the smallest stochastic perturbation
ensures that the system will be asymptotically stable regardless of the
character of the deterministic system in the unperturbed case. D
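The effect described in Remark 10.4.1 is easy to see numerically. The sketch below is our own illustration (the quadratic map S(x) = 4x(1 − x) on [0, 1] and the uniform noise density g are assumed choices, not taken from the text): two very different initial densities produce the same limiting histogram, which moreover stays above the lower-bound function εg.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(x0, n_steps, eps):
    # x_{n+1} = S(x_n) when eta_n = 1 (prob 1 - eps); a fresh draw from g
    # when eta_n = 0 (prob eps).  Illustrative choices: S(x) = 4x(1 - x)
    # on [0, 1] and g the uniform density on [0, 1].
    x = x0.copy()
    for _ in range(n_steps):
        reset = rng.random(x.size) < eps
        x = np.where(reset, rng.random(x.size), 4.0 * x * (1.0 - x))
    return x

eps = 0.25
a = simulate(rng.random(200_000), 50, eps)        # f_0 uniform
b = simulate(rng.beta(5, 1, 200_000), 50, eps)    # f_0 piled up near 1
ha, _ = np.histogram(a, bins=20, range=(0, 1), density=True)
hb, _ = np.histogram(b, bins=20, range=(0, 1), density=True)
print(ha.min(), np.max(np.abs(ha - hb)))
```

The two histograms agree to within Monte Carlo error, and no bin falls below εg = 0.25, exactly as the lower-bound argument predicts.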
However, much more can be determined about this stochastically perturbed deterministic system. We have the following result that explicitly
gives the stationary density for P_ε.

Proposition 10.4.2. Let the operator P_ε: L¹ → L¹ be defined by equation
(10.4.4). Then, for ε > 0, the unique stationary density of P_ε is given by

    f_ε^* = ε Σ_{k=0}^∞ (1 − ε)^k P^k g.     (10.4.6)


Proof. Since ‖(1 − ε)^k P^k g‖ ≤ (1 − ε)^k ‖g‖, the series in (10.4.6) is absolutely
convergent. Substitution of (10.4.6) into

    P_ε f = (1 − ε)Pf + εg     (10.4.7)

shows that P_ε f_ε^* = f_ε^*. ∎

Remark 10.4.2. It may happen that the limit of the stationary densities f_ε^*
defined by equation (10.4.6) fails to exist as ε → 0. As a simple example,
consider S: R⁺ → R⁺ given by S(x) = x/2. In this case, the kth iterate of
the Frobenius–Perron operator is given by

    P^k g(x) = 2^k g(2^k x)

and, thus,

    f_ε^*(x) = ε Σ_{k=0}^∞ (1 − ε)^k 2^k g(2^k x).

Now pick an arbitrarily small h > 0 and integrate f_ε^* over [0, h]:

    ∫_0^h f_ε^*(x) dx = ε Σ_{k=0}^∞ 2^k (1 − ε)^k ∫_0^h g(2^k x) dx
                      = ε Σ_{k=0}^∞ (1 − ε)^k ∫_0^{2^k h} g(y) dy.

For δ > 0 arbitrarily small, we can always find an m such that, for all
k ≥ m,

    ∫_0^{2^k h} g(y) dy ≥ ∫_0^∞ g(y) dy − δ = 1 − δ,

so

    ∫_0^h f_ε^*(x) dx ≥ (1 − δ) ε Σ_{k=m}^∞ (1 − ε)^k = (1 − δ)(1 − ε)^m.

Thus, holding δ and m fixed, set f_0^*(x) = lim_{ε→0} f_ε^*(x), so that

    ∫_0^h f_0^*(x) dx ≥ 1 − δ     for every h > 0.

Now it follows directly that f_0^* ∈ D cannot exist, for, if it did, then

    lim_{h→0} ∫_0^h f_0^*(x) dx = 0,

which is a contradiction. □
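The escape of mass toward the origin in Remark 10.4.2 can be observed directly. The sketch below is our own illustration, assuming g uniform on [0, 1]: a draw from f_ε^* is obtained by halving a draw from g a geometric number of times (the number of steps since the last reset), and the mass on a fixed small interval [0, h] approaches 1 as ε → 0.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_stationary(eps, n):
    # Draw from f_eps* for S(x) = x/2: the last reset happened k steps ago
    # with probability eps*(1-eps)^k, after which the draw from g (uniform
    # on [0, 1] here) has been halved k times.
    k = rng.geometric(eps, n) - 1     # numpy's geometric is supported on {1, 2, ...}
    return rng.random(n) * 0.5 ** k

h, masses = 0.01, []
for eps in (0.5, 0.1, 0.02):
    x = sample_stationary(eps, 100_000)
    masses.append(np.mean(x < h))     # mass of f_eps* on [0, h]
print(masses)
```

The masses increase toward 1 as ε shrinks, so any limiting "density" would have to put all of its mass at 0 and thus cannot belong to D.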

Theorem 10.4.1. Let (X, 𝒜, μ) be a measure space, S: X → X a nonsingular transformation, P the Frobenius–Perron operator corresponding to
S, and P_ε the operator defined by equation (10.4.4) with unique stationary
density f_ε^* given by (10.4.6). If the limit (strong or weak in L¹)

    f_0^*(x) = lim_{ε→0} f_ε^*(x)

exists, then f_0^* is a stationary density for P.

Proof. Since f_ε^* is a stationary density for P_ε, we have P_ε f_ε^* = f_ε^* or, more
explicitly,

    (1 − ε)P f_ε^* + εg = f_ε^*.

Under the assumption that f_0^* exists, letting ε → 0 gives P f_0^* = f_0^*,
finishing the proof. ∎

Remark 10.4.3. In this context it is interesting to note that if f_0^* exists,
it may depend on g. A simple example comes from S: R → R given by
S(x) = x. Then Pg = g for all g ∈ D and

    f_ε^* = εg Σ_{k=0}^∞ (1 − ε)^k = g,

so f_0^* = g. □

Although this example shows that it is quite possible for f_0^* to depend
on g, the following theorem gives sufficient conditions for not only the
existence of f_0^* but also its value.

Theorem 10.4.2. Let (X, 𝒜, μ) be a finite measure space, S: X → X a
measure-preserving ergodic transformation, and P_ε the operator defined by
equation (10.4.4) with the unique stationary density f_ε^* given by (10.4.6).
Then f_0^* = lim_{ε→0} f_ε^* exists and is given by

    f_0^* = 1/μ(X).

Although the proof of this theorem is straightforward, we will not give
it in detail. It suffices to note that the proof is very similar to those of
Sections 5.2 and 8.7, and the point of similarity resides in the fact that
the series representations for P^n f and P_t f in each of those sections have
coefficients that sum to 1. Exactly the same situation occurs in the explicit
representation (10.4.6) for f_ε^*, since

    ε Σ_{k=0}^∞ (1 − ε)^k = 1.


10.5 Discrete Time Systems with Constantly Applied Stochastic Perturbations
In Section 10.4 we examined the asymptotic behavior of deterministic systems with a randomly applied stochastic perturbation. We now turn our
attention to deterministic systems with constantly applied stochastic perturbations. Such dynamical systems have been considered by Kifer [1974]
and Boyarsky [1984], and in a physical context by Feigenbaum and Hasslacher [1982].
Specifically, consider the process defined by

    x_{n+1} = S(x_n) + ξ_n,     (10.5.1)

where S: R^d → R^d is a measurable, though not necessarily nonsingular,
transformation and ξ_0, ξ_1, … are independent random vectors, each having
the same density g. We let the density of x_n be denoted by f_n and desire
a relation connecting f_n and f_{n+1}.
Assume f_n ∈ D. By (10.5.1), x_{n+1} is the sum of two independent random
vectors: S(x_n) and ξ_n. Note that S(x_n) and ξ_n are clearly independent
since, in calculating x_1, …, x_n, we only need ξ_0, …, ξ_{n−1}. Let h: R^d →
R be an arbitrary, bounded, measurable function. It is easy to find the
mathematical expectation of h(x_{n+1}) since, by Theorem 10.2.1,

    E(h(x_{n+1})) = ∫_{R^d} h(x) f_{n+1}(x) dx.     (10.5.2)

Furthermore, because of (10.5.1) and the fact that the joint density of
(x_n, ξ_n) is just f_n(y)g(z), we also have

    E(h(x_{n+1})) = E(h(S(x_n) + ξ_n))
                  = ∫_{R^d} ∫_{R^d} h(S(y) + z) f_n(y) g(z) dy dz.

By the change of variables x = S(y) + z, this can be rewritten as

    E(h(x_{n+1})) = ∫_{R^d} ∫_{R^d} h(x) f_n(y) g(x − S(y)) dx dy.     (10.5.3)

Equating (10.5.2) and (10.5.3), and using the fact that h was an arbitrary,
bounded, measurable function, we immediately obtain

    f_{n+1}(x) = ∫_{R^d} f_n(y) g(x − S(y)) dy.     (10.5.4)

Remark 10.5.1. Our derivation of (10.5.4), though mathematically precise, is somewhat different from the usual method. Our reasons for this are
threefold. First, we were able to avoid introducing the concept of
conditional probability. Second, the technique provides a clear proof that
if f_n(x) exists, then f_{n+1}(x) must also exist. To see this, take h(x) = 1_A(x)
in (10.5.3), so (10.5.3) becomes

    Prob{x_{n+1} ∈ A} = ∫_A ∫_{R^d} f_n(y) g(x − S(y)) dy dx

and, thus, by the definition of density, if f_n exists then f_{n+1} also exists and
is given by (10.5.4). Finally, we have introduced this method of obtaining
(10.5.4) because we use it later in deriving the Fokker–Planck equation
that describes the evolution of densities for continuous time systems in the
presence of a stochastic process. □
From our equation (10.5.4), we define an operator P: L¹ → L¹ by

    Pf(x) = ∫_{R^d} f(y) g(x − S(y)) dy     (10.5.5)

for f ∈ L¹. That P is a Markov operator is quite easy to prove. Note first
that if we set K(x, y) = g(x − S(y)), then, for g ∈ D, K is a stochastic kernel
(Section 5.7) and P is a Markov operator. Thus, in examining the behavior
of the systems forming the subject of this section, we have available all of
the tools developed in Section 5.7.
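The Markov property of (10.5.5) is easy to verify on a grid. The sketch below is our own illustration; the particular S and g are assumed choices (any measurable S and density g would do), and the check is simply that P preserves positivity and total mass.

```python
import numpy as np

# A direct discretization of (10.5.5), P f(x) = ∫ f(y) g(x - S(y)) dy, used to
# check numerically that P maps densities to densities (the Markov property).
x = np.linspace(-12.0, 12.0, 1201)
dx = x[1] - x[0]

def S(u):                                 # an example (non-invertible) transformation
    return np.cos(u)

def g(u):                                 # standard Gaussian noise density
    return np.exp(-u ** 2 / 2.0) / np.sqrt(2.0 * np.pi)

def P(f):                                 # (P f)(x_i) ≈ sum_j g(x_i - S(y_j)) f(y_j) dy
    return g(x[:, None] - S(x)[None, :]) @ f * dx

f = np.where(np.abs(x) < 2.0, 1.0, 0.0)
f /= f.sum() * dx                         # normalize the initial density
for _ in range(3):
    f = P(f)
print(f.min(), f.sum() * dx)              # nonnegative, and total mass stays 1
```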
Remark 10.5.2. In the special case in which d = 1 and S(x) = λx, equation
(10.5.5) reduces to that considered in Example 5.7.2 with a = 1 and b = −λ.
□

However, because of the characteristics of the function g identified as a
kernel, we can prove more than in Section 5.7. We start by stating and
proving a result on the asymptotic periodicity of {P^n}.

Theorem 10.5.1. Let the operator P: L¹(R^d) → L¹(R^d) be defined by
(10.5.5) and let g ∈ D. If there exists a Liapunov function [see (5.7.8)]
V: R^d → R such that, with α < 1,

    ∫_{R^d} g(x − S(y)) V(x) dx ≤ αV(y) + β     for all y ∈ R^d,

then the operator P is constrictive and, as a consequence, for every f ∈ L¹
the sequence {P^n f} is asymptotically periodic.

Proof. We will use Theorem 5.7.2 in the proof, noting that now the stochastic kernel is explicitly given by K(x, y) = g(x − S(y)).
We first verify that (5.7.19) holds. Since g is integrable, for every λ > 0
there is a δ > 0 such that

    ∫_A g(x) dx < λ     for μ(A) < δ.


In particular,

    ∫_E K(x, y) dx = ∫_E g(x − S(y)) dx = ∫_{E−S(y)} g(x) dx < λ

for μ(E − S(y)) = μ(E) < δ. Thus, (5.7.19) holds uniformly for all bounded
sets B.
Further, from (10.5.5) and the assumptions of the theorem, we have

    ∫_{R^d} V(x) Pf(x) dx = ∫_{R^d} V(x) dx ∫_{R^d} f(y) g(x − S(y)) dy
                          ≤ α ∫_{R^d} V(y) f(y) dy + β,

so inequality (5.7.11) also holds. Thus, by Theorem 5.7.2, we have shown
that P is constrictive. ∎
Theorem 10.5.1 implies that for a very broad class of transformations
the addition of a stochastic perturbation will cause the limiting sequence
of densities to become asymptotically periodic. For some transformations
this would not be at all surprising. For example, the addition of a small
stochastic perturbation to a transformation with exponentially stable periodic orbits will induce asymptotic periodicity. However, it is surprising that
even for a transformation S that has no particularly interesting behavior
from a density standpoint, the addition of noise may result in asymptotic
periodicity. We may easily illustrate this through an example on [0, 1], since
this makes numerical experiments feasible.

Example 10.5.1 (Lasota and Mackey, 1987). Consider the transformation

    S(x) = (ax + λ) mod 1,     0 < a < 1, 0 < λ < 1,     (10.5.6)

which is an example of a general class of transformations considered by
Keener [1980]. From Keener's general results, for 0 < a < 1 there exists an
uncountable set Λ such that, for all λ ∈ Λ, the rotation number corresponding to (10.5.6) is irrational. As a consequence, for these λ the sequence
{x_n} is not periodic and the invariant limiting set

    ∩_{k=0}^∞ S^k([0, 1])     (10.5.7)

is a Cantor set. The proof of Keener's general result offers a constructive
tool for numerically determining values of λ that approximate elements of
Λ.
The perturbed dynamical system

    x_{n+1} = (ax_n + λ + ξ_n) mod 1     (10.5.8)

leads to an integral operator for which it is easy to verify the conditions of
Theorem 5.7.2 (see Exercises 10.3 and 10.4).
For illustration we fix a value of a in (0, 1); Keener's results then show
how to pick λ close to an element of Λ, so that the invariant limiting set
(10.5.7) is a Cantor set and the sequence {x_n} is not periodic. Using the
explicit transformation (10.5.8), where the ξ_n are random numbers uniformly
distributed on [0, θ] with θ = 1/15, in Figure 10.5.1 we show the eventual
limiting behavior of the sequence {P^n f} of densities for an initially uniform
density on [0, 1]. It is clear that P^13 f(x) is the same as P^10 f(x), and
P^14 f(x) is identical to P^11 f(x). Thus, in this example, we have a noise-induced
period-three asymptotic periodicity.
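The experiment is easy to repeat. The sketch below mirrors the construction in spirit; the values of a and λ are illustrative stand-ins (the original experiment used a specific pair obtained from Keener's construction, which we do not reproduce here), with θ = 1/15 as in the text.

```python
import numpy as np

rng = np.random.default_rng(2)

# Iterate 10^4 points of x_{n+1} = (a x_n + lam + xi_n) mod 1, with xi_n
# uniform on [0, theta], and histogram them at successive times, as in
# Figure 10.5.1.  a and lam below are assumed, illustrative values.
a, lam, theta = 0.5, 0.732, 1.0 / 15.0
x = rng.random(10_000)
hists = {}
for n in range(1, 15):
    x = (a * x + lam + rng.uniform(0.0, theta, x.size)) % 1.0
    if n >= 10:
        hists[n], _ = np.histogram(x, bins=25, range=(0, 1), density=True)
print(np.abs(hists[13] - hists[10]).mean())   # near zero would indicate period 3
```

With parameters matching the original experiment, the histograms for n = 10 and n = 13 should nearly coincide, signaling the period-three cycle of densities.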
Theorem 10.5.1 also implies that P has a stationary density f_*, since this
is a consequence of the spectral decomposition Theorem 5.3.3. This does
not, of course, guarantee the uniqueness of f_*, but a simple assumption
concerning the positivity of g will not only ensure uniqueness of f_* but also
asymptotic stability. More specifically, we have the following result.

Corollary 10.5.1. If P given by (10.5.5) satisfies the conditions of Theorem 10.5.1, and g(x) > 0, then {P^n} is asymptotically stable.

Proof. We start with the observation that, for every fixed x, the product
g(x − S(y)) P^{n−1} f(y), considered as a function of y, does not vanish everywhere. As a consequence,

    P^n f(x) = ∫_{R^d} g(x − S(y)) P^{n−1} f(y) dy > 0     for all x ∈ R^d, n ≥ 1, f ∈ D.

The asymptotic stability of {P^n} is thus proved by applying Theorem 5.6.1. ∎

It is interesting that we may also prove the uniqueness of a stationary
density f_* of P defined by (10.5.5) without the rather restrictive conditions
required by Corollary 10.5.1.

Theorem 10.5.2. Let the operator P: L¹ → L¹ be defined by equation
(10.5.5) and let g ∈ D. If g(y) > 0 for all y ∈ R^d and if a stationary
density f_* for P exists, then f_* is unique.

Proof. Assume there are two stationary densities for P, namely, f_1 and
f_2. Set f = f_1 − f_2, so we clearly have

    Pf = f.     (10.5.9)

We may write f = f⁺ − f⁻ by definition, so that, if f_1 ≠ f_2, then neither
f⁺ nor f⁻ is zero. Since Pf⁺ = f⁺ (by Proposition 3.1.3), from (10.5.5)
we have

    Pf⁺(x) = ∫_{R^d} g(x − S(y)) f⁺(y) dy,     (10.5.10)



FIGURE 10.5.1. Asymptotic periodicity illustrated. Here we show the histograms
obtained after iterating 10^4 initial points uniformly distributed on [0, 1] under
equation (10.5.8), with a, λ, and θ = 1/15 as in Example 10.5.1. In (a) n = 10;
(b) n = 11; (c) n = 12; and (d) n = 13. The correspondence of the histograms
for n = 10 and n = 13 indicates that, with these parameter values, numerically
the sequence of densities has period 3.


and similarly for Pf⁻. Since f⁺ is not identically zero and g is strictly
positive, the integral in (10.5.10) is a nonzero function for every x and,
thus, Pf⁺(x) > 0 for all x. Clearly, too, Pf⁻(x) > 0 for all x and, thus,
the supports of Pf⁺ and Pf⁻ are not disjoint. By Proposition 3.1.2, then,
we must have ‖Pf‖ < ‖f‖, which contradicts equality (10.5.9). Thus, f_1
and f_2 must be identical almost everywhere if they exist. ∎

Remark 10.5.3. It certainly may happen that there is no solution to
Pf = f in D. As a simple example, consider S(x) = x for all x ∈ R. Take
g to be the Gaussian density

    g(x) = (1/√(2π)) exp(−x²/2),

so the operator P defined in (10.5.5) becomes

    Pf(x) = (1/√(2π)) ∫_{−∞}^{∞} exp[−(x − y)²/2] f(y) dy.

Note that Pf(x) is simply the solution u(t, x) of the heat equation (7.4.13)
with σ² = 1 at time t = 1, assuming an initial condition u(0, y) = f(y).
Since this solution is given by a semigroup of operators [cf. equations
(7.4.11) and (7.9.9)], it can be shown that

    P^n f(x) = (1/√(2πn)) ∫_{−∞}^{∞} exp[−(x − y)²/2n] f(y) dy
             ≤ (1/√(2πn)) ∫_{−∞}^{∞} f(y) dy = 1/√(2πn).

Thus P^n f converges uniformly to zero as n → ∞ for all f ∈ D, and there
is no solution to Pf = f. □
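The 1/√(2πn) decay of the iterates is easy to watch on a grid. The sketch below is our own illustration of the remark: P is convolution with the unit Gaussian, and the supremum of P^n f stays below the stated bound while decreasing to zero.

```python
import numpy as np

# Grid check of Remark 10.5.3: with S(x) = x and unit Gaussian noise, P is
# convolution with the Gaussian kernel, and sup P^n f <= 1/sqrt(2*pi*n),
# so the iterates flatten out and no stationary density can exist.
x = np.linspace(-60.0, 60.0, 6001)
dx = x[1] - x[0]
kernel = np.exp(-x ** 2 / 2.0) / np.sqrt(2.0 * np.pi)

f = np.where(np.abs(x) < 0.5, 1.0, 0.0)
f /= f.sum() * dx                                 # uniform density on [-1/2, 1/2]
sups = []
for n in range(1, 6):
    f = np.convolve(f, kernel, mode="same") * dx  # one application of P
    sups.append(f.max())
    print(n, sups[-1], 1.0 / np.sqrt(2.0 * np.pi * n))
```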
If these conditions for the existence and uniqueness of stationary densities
of P are strengthened somewhat, we can prove that {P^n} is asymptotically
stable. In fact, from our results of Theorem 5.7.1, we have the following
corollary.

Corollary 10.5.2. Let the operator P: L¹ → L¹ be defined by equation
(10.5.5) and let g ∈ D. If there is a Liapunov function V: R^d → R such
that

    ∫_{R^d} g(x − S(y)) V(x) dx ≤ αV(y) + β     (10.5.11)

for some nonnegative constants α, β with α < 1, and

    ∫_{R^d} inf_{|y|≤r} g(x − S(y)) dx > 0     (10.5.12)

for every r > 0, then {P^n} is asymptotically stable.


Remark 10.5.4. Note that condition (10.5.12) is automatically satisfied
if g: R^d → R is positive and continuous and S: R^d → R^d is continuous,
because

    inf_{|y|≤r} g(x − S(y)) = min_{|y|≤r} g(x − S(y)) > 0

for every x ∈ R^d. □

Example 10.5.2. Consider a point moving through R^d whose trajectory
is determined by

    x_{n+1} = S(x_n) + ξ_n,

where S: R^d → R^d is continuous and satisfies

    |S(x)| ≤ λ|x|     for |x| ≥ M,     (10.5.13)

where λ < 1 and M > 0 are given constants. Assume that ξ_0, ξ_1, … are
independent random variables with the same density g, which is continuous
and positive, and such that E(|ξ_n|) exists. Then {P^n} defined by (10.5.5) is
asymptotically stable.
To show this, it is enough to confirm that condition (10.5.11) is satisfied.
Set V(x) = |x|, so

    ∫_{R^d} g(x − S(y)) V(x) dx = ∫_{R^d} g(x − S(y)) |x| dx
                                = ∫_{R^d} g(x) |x + S(y)| dx
                                ≤ ∫_{R^d} g(x) (|x| + |S(y)|) dx
                                = |S(y)| + ∫_{R^d} g(x) |x| dx.

From (10.5.13) we also have

    |S(y)| ≤ λ|y| + max_{|x|≤M} |S(x)|,

so that

    ∫_{R^d} g(x − S(y)) V(x) dx ≤ λ|y| + max_{|x|≤M} |S(x)| + ∫_{R^d} g(x) |x| dx.

Thus, since E(|ξ_n|) exists, inequality (10.5.11) is satisfied with α = λ and

    β = ∫_{R^d} g(x) |x| dx + max_{|x|≤M} |S(x)|. □

It is important to note that throughout it has not been necessary to
require that S be a nonsingular transformation. Indeed, one of the goals of
this section was to demonstrate that the addition of random perturbations
to a singular transformation may lead to interesting results.
However, if S is nonsingular, then the Frobenius–Perron operator corresponding to S exists and allows us to rewrite (10.5.5) in an alternate form
that will be of use in the following section. By definition,

    Pf(x) = ∫_{R^d} g(x − S(y)) f(y) dy.

Assume S is nonsingular, so that the Frobenius–Perron operator P_S and
the Koopman operator U corresponding to S exist. Let h_x(y) = g(x − y), so
we can write Pf as

    Pf(x) = ∫_{R^d} h_x(S(y)) f(y) dy = ⟨f, U h_x⟩ = ⟨P_S f, h_x⟩,

or, more explicitly,

    Pf(x) = ∫_{R^d} g(x − y) P_S f(y) dy.     (10.5.14)

By a change of variables, (10.5.14) may also be written as

    Pf(x) = ∫_{R^d} g(y) P_S f(x − y) dy.     (10.5.15)

Remark 10.5.5. Observe that, for d = 1, equations (10.5.14) and (10.5.15)
could also be obtained as an immediate consequence of equation (10.1.7)
applied to equation (10.5.1), since ξ_n and S(x_n) are independent. □

Remark 10.5.5. Observe that ford= 1, equations (10.5.14) and (10.5.15)


could also be obtained as an immediate consequence of equation (10.1.7)
applied to equation (10.5.1) since {n and 8(xn) are independent. D

10.6 Small Continuous Stochastic Perturbations of Discrete Time Systems

This section examines the behavior of the system

    x_{n+1} = S(x_n) + εξ_n,     ε > 0,     (10.6.1)

where S: R^d → R^d is measurable and nonsingular. As in the preceding
section, we assume the ξ_n to be independent random variables, each having
the same density g.
Since the variables εξ_n have the density (1/ε^d) g(x/ε) (see Remark 10.1.1),
equation (10.5.15), with P now denoting the Frobenius–Perron operator
corresponding to S, takes the form

    P_ε f(x) = (1/ε^d) ∫_{R^d} g(y/ε) Pf(x − y) dy     (10.6.2)


and gives the recursive relation

    f_{n+1} = P_ε f_n     (10.6.3)

connecting the successive densities f_n of x_n.
The operator P_ε can also be written, via the change of variables y → εy, as

    P_ε f(x) = ∫_{R^d} g(y) Pf(x − εy) dy.     (10.6.4)

Since

    ∫_{R^d} g(y) Pf(x) dy = Pf(x),

we should expect that, in some sense, lim_{ε→0} P_ε f(x) = Pf(x). To make this
more precise, we state the following theorem.
Theorem 10.6.1. For the system defined by equation (10.6.1),

    lim_{ε→0} ‖P_ε f − Pf‖ = 0     for all f ∈ L¹,

where P is the Frobenius–Perron operator corresponding to S and P_ε is
given by (10.6.4).
Proof. Since P and P_ε are linear, we may restrict ourselves to f ∈ D. Write

    Pf(x) = ∫_{R^d} g(y) Pf(x) dy;

then

    P_ε f(x) − Pf(x) = ∫_{R^d} g(y) [Pf(x − εy) − Pf(x)] dy.

Pick an arbitrarily small δ > 0. Since g and Pf are both integrable functions on R^d, there must exist an r > 0 such that

    ∫_{|y|≥r} g(y) dy ≤ δ/4     and     ∫_{|x|≥r/2} Pf(x) dx ≤ δ/4.

To calculate the norm of P_ε f − Pf,

    ‖P_ε f − Pf‖ ≤ ∫_{R^d} ∫_{R^d} g(y) |Pf(x − εy) − Pf(x)| dx dy,

we split the integral into two parts,

    ‖P_ε f − Pf‖ ≤ I₁ + I₂,

where

    I₁ = ∫_{R^d} ∫_{|y|≤r} g(y) |Pf(x − εy) − Pf(x)| dy dx

and

    I₂ = ∫_{R^d} ∫_{|y|≥r} g(y) |Pf(x − εy) − Pf(x)| dy dx.

We consider each in turn.
With respect to I₁, note that, since the function Pf is integrable, by
Corollary 5.1.1 we may assume

    ∫_{R^d} |Pf(x − εy) − Pf(x)| dx ≤ δ/2     for |y| ≤ r,

for ε ≤ ε₀ with ε₀ sufficiently small. Hence

    I₁ ≤ (δ/2) ∫_{|y|≤r} g(y) dy ≤ (δ/2) ∫_{R^d} g(y) dy = δ/2.

In examining I₂, we use the triangle inequality to write

    I₂ ≤ ∫_{R^d} ∫_{|y|≥r} g(y) Pf(x − εy) dy dx + ∫_{R^d} ∫_{|y|≥r} g(y) Pf(x) dy dx.

Change the variables in the first integral to v = y and z = x − εy; then

    ∫_{R^d} ∫_{|y|≥r} g(y) Pf(x − εy) dy dx = ∫_{R^d} ∫_{|v|≥r} g(v) Pf(z) dv dz
                                            = ∫_{|v|≥r} g(v) dv ≤ δ/4.

Further, we also have

    ∫_{R^d} ∫_{|y|≥r} g(y) Pf(x) dy dx ≤ δ/4,

so that I₂ ≤ δ/2. Thus

    ‖P_ε f − Pf‖ ≤ δ     for any ε ≤ ε₀,

that is,

    lim_{ε→0} ‖P_ε f − Pf‖ = 0. ∎

As an immediate consequence of Theorem 10.6.1, we have the following
corollary.

Corollary 10.6.1. Suppose that S and g are given and that, for every small
ε, 0 < ε < ε₀, the operator P_ε defined by (10.6.4) has a stationary density
f_ε. If the limit

    f_* = lim_{ε→0} f_ε

exists, then f_* is a stationary density for the Frobenius–Perron operator
corresponding to S.

Proof. Write

    P_ε f_* = f_ε + P_ε(f_* − f_ε).

Since P_ε is contractive,

    ‖P_ε(f_* − f_ε)‖ ≤ ‖f_* − f_ε‖ → 0     as ε → 0.

Thus f_ε + P_ε(f_* − f_ε) → f_* as ε → 0 and, as a consequence, P_ε f_* → f_*.
However, Theorem 10.6.1 also tells us that P_ε f_* → Pf_*, so Pf_* = f_*. ∎
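Theorem 10.6.1 can be checked on a grid for a concrete nonsingular map. The sketch below is our own illustration: for the assumed choice S(x) = x/2 on R, the Frobenius–Perron operator is (Pf)(x) = 2f(2x), and P_ε f is Pf convolved with the density of εξ (unit Gaussian ξ here); the L¹ distance ‖P_ε f − Pf‖ shrinks with ε.

```python
import numpy as np

# Grid check that ||P_eps f - P f||_1 -> 0 as eps -> 0, for the illustrative
# nonsingular map S(x) = x/2 on R with unit Gaussian noise.
x = np.linspace(-20.0, 20.0, 8001)
dx = x[1] - x[0]

f = np.exp(-x ** 2)
f /= f.sum() * dx                            # a smooth initial density
Pf = 2.0 * np.interp(2.0 * x, x, f)          # Frobenius-Perron operator of x/2

dists = []
for eps in (0.5, 0.1, 0.02):
    g_eps = np.exp(-x ** 2 / (2.0 * eps ** 2)) / np.sqrt(2.0 * np.pi * eps ** 2)
    Pef = np.convolve(Pf, g_eps, mode="same") * dx   # P_eps f = g_eps * (P f)
    dists.append(np.sum(np.abs(Pef - Pf)) * dx)
print(dists)
```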

10.7 Discrete Time Systems with Multiplicative Perturbations

Up to now in this chapter we have confined our attention to situations
in which a discrete time system is perturbed in an additive fashion, for
example, (10.4.1), (10.5.1), and (10.6.1). We now turn to a consideration
of the influence of perturbations that appear in a multiplicative way. Since
in many applied problems this arises because of noise in parameters, it is
also known as parametric noise.
Specifically, we examine a process

    x_{n+1} = ξ_n S(x_n),     (10.7.1)

where S: R⁺ → R⁺ is continuous and positive a.e. and, as before, the ξ_n
are independent random variables, each distributed with the same density
g. We denote the density of x_n by f_n, and our first task in the study of
(10.7.1) will be to derive a relation connecting f_n and f_{n+1}.
Using exactly the same approach employed in Section 10.5, let h: R⁺ →
R⁺ be an arbitrary bounded and Borel measurable function. The expectation of h(x_{n+1}) is given by

    E(h(x_{n+1})) = ∫_0^∞ h(x) f_{n+1}(x) dx.     (10.7.2)

However, using (10.7.1), we also have

    E(h(x_{n+1})) = E(h(ξ_n S(x_n)))
                  = ∫_0^∞ ∫_0^∞ h(zS(y)) f_n(y) g(z) dy dz
                  = ∫_0^∞ ∫_0^∞ h(x) f_n(y) g(x/S(y)) (1/S(y)) dx dy,     (10.7.3)


where we used the change of variables z = x/S(y) in passing from the second
to the third line of (10.7.3). Equating (10.7.2) and (10.7.3), and using the fact
that h was arbitrary by assumption, we arrive at

    f_{n+1}(x) = ∫_0^∞ f_n(y) g(x/S(y)) (1/S(y)) dy,     (10.7.4)

which is the desired relation.
From (10.7.4) we may also write f_{n+1} = Pf_n, where the operator P,
given by

    Pf(x) = ∫_0^∞ f(y) g(x/S(y)) (1/S(y)) dy,     (10.7.5)

is a Markov operator with the stochastic kernel

    K(x, y) = g(x/S(y)) (1/S(y)).     (10.7.6)

Our first result is related to the generation of asymptotic periodicity by
multiplicative noise. Though originally formulated by Horbacz [1989a], the
proof we give is different from the original.

Theorem 10.7.1. Let the Markov operator P: L¹(R⁺) → L¹(R⁺) be defined by (10.7.5). Assume that g ∈ D,

    0 < S(x) ≤ αx + β     for x ≥ 0,     (10.7.7)

and

    αm < 1     with m = ∫_0^∞ x g(x) dx,     (10.7.8)

where α and β are nonnegative constants. Then P is constrictive. As a
consequence, the sequence {P^n f} is asymptotically periodic.

Proof. Once again we will employ Theorem 5.7.2 in the proof. We first
show that (5.7.11) holds for the kernel (10.7.6) with V(x) = x. We have

    ∫_0^∞ x Pf(x) dx = ∫_0^∞ x dx ∫_0^∞ g(x/S(y)) (1/S(y)) f(y) dy
                     = ∫_0^∞ f(y) dy ∫_0^∞ x g(x/S(y)) (1/S(y)) dx.

Using the change of variables z = x/S(y) and then (10.7.7), we obtain

    ∫_0^∞ x Pf(x) dx = ∫_0^∞ f(y) S(y) dy ∫_0^∞ z g(z) dz
                     = m ∫_0^∞ f(y) S(y) dy ≤ αm ∫_0^∞ y f(y) dy + βm.

Thus, we have verified inequality (5.7.11).


We next show that the kernel K(x, y) given by (10.7.6) satisfies inequality
(5.7.19) of Theorem 5.7.2. Fix an arbitrary positive λ < 1 and choose a
bounded set B ⊂ R⁺. Since g is integrable, there must be a δ₁ > 0 such
that

    ∫_E g(x) dx ≤ λ     for μ(E) < δ₁.

Define

    δ = δ₁ inf_{y∈B} S(y).

Then, for μ(E) < δ, we have μ(E/S(y)) < δ₁ and

    ∫_E K(x, y) dx = ∫_E g(x/S(y)) (1/S(y)) dx = ∫_{E/S(y)} g(x) dx ≤ λ

for y ∈ B and μ(E) < δ, and all of the conditions of Theorem 5.7.2 are
satisfied. Thus P is constrictive and a simple application of the spectral
decomposition Theorem 5.3.1 finishes the proof. ∎
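The first-moment computation in the proof above is easy to confirm by simulation. The sketch below is our own illustration with assumed parameters: for the exactly linear choice S(x) = αx + β, the recursion E x_{n+1} = m E S(x_n) = αm E x_n + βm drives the mean to the fixed point βm/(1 − αm); the noise ξ_n is taken uniform on [0.6, 1.6], so m = 1.1 and αm = 0.88 < 1.

```python
import numpy as np

rng = np.random.default_rng(4)

# Monte Carlo check of E x_{n+1} = m E S(x_n) for x_{n+1} = xi_n S(x_n),
# with the assumed linear map S(x) = 0.8 x + 1 and xi_n uniform on
# [0.6, 1.6] (mean m = 1.1).  The mean converges to beta*m/(1 - alpha*m).
alpha, beta, m = 0.8, 1.0, 1.1
x = rng.random(200_000) * 10.0
for _ in range(60):
    x = rng.uniform(0.6, 1.6, x.size) * (alpha * x + beta)
print(np.mean(x), beta * m / (1.0 - alpha * m))
```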
We close with a second theorem concerning asymptotic stability induced
by multiplicative noise.

Theorem 10.7.2. If the Markov operator P: L¹(R⁺) → L¹(R⁺) defined by
(10.7.5) satisfies the conditions of Theorem 10.7.1 and, in addition, g(x) >
0, then {P^n} is asymptotically stable.

Proof. Note that, for fixed x, the quantity

    g(x/S(y)) (1/S(y)) P^{n−1} f(y),

as a function of y, does not vanish everywhere. Consequently,

    P^n f(x) = ∫_0^∞ g(x/S(y)) (1/S(y)) P^{n−1} f(y) dy > 0
                    for all x ∈ R⁺, n ≥ 1, f ∈ D,

and Theorem 5.6.1 finishes the proof of the asymptotic stability of {P^n}. ∎

Theorems 10.7.1 and 10.7.2 illustrate the behaviors that may be induced
by multiplicative noise in discrete time systems. A number of other results concerning asymptotic periodicity and asymptotic stability induced
by multiplicative noise may be proved, but rather than giving these we
refer the reader to Horbacz [1989a,b].


Exercises

10.1. Let ξ_n: Ω → R^d, n = 1, 2, …, be a sequence of independent random
vectors, and let φ_n: R^d → R^d be a sequence of Borel measurable functions. Prove that the random vectors η_n(ω) = φ_n(ξ_n(ω)) are independent.

10.2. Replace inequality (10.7.7) in Theorem 10.7.1 by

    0 ≤ S(x) ≤ αx,     α < 1,

and show that in this case the sequence {P^n} is sweeping to zero. Formulate
an analogous sufficient condition for sweeping to +∞.

10.3. Let S: [0, 1] → [0, 1] be a measurable transformation and let {ξ_n} be
a sequence of independent random variables each having the same density
g. Consider the process defined by

    x_{n+1} = S(x_n) + ξ_n     (mod 1),

and denote by f_n the density of the distribution of x_n. Find an explicit
expression for the Markov operator P: L¹([0, 1]) → L¹([0, 1]) such that

    f_{n+1} = Pf_n.

10.4. Under the assumptions of the previous exercise, show that {P^n}
is asymptotically periodic. Find sufficient conditions for the asymptotic
stability of {P^n}.
10.5. Consider the dynamical system (10.7.1) on the unit interval. Assume
that S: [0, 1] → [0, 1] is continuous and that ξ_n: Ω → [0, 1] are independent
random variables with the same density g ∈ D([0, 1]). Introduce the corresponding Markov operator and reformulate Theorems 10.7.1 and 10.7.2 in
this case.

10.6. As a specific example of the dynamical system (10.7.1) on the unit
interval (see the previous exercise), consider the quadratic map S(x) =
ax(1 − x) and ξ_n having a density g ∈ D([0, 1]) such that

    g(x) ≥ K x^r,     0 ≤ x ≤ 1.

Show that for every a ∈ (1, 4] there are K > 0 and r > 0 such that {P^n}
is asymptotically stable (Horbacz, 1989a).

10.7. Consider the system

    x_{n+1} = S(x_n) + ξ_n

with additive noise. Note that with the definitions y = e^x, T = e^S, and
η = e^ξ, this can be rewritten in the alternative form

    y_{n+1} = η_n T(ln y_n),

as if there were multiplicative noise. Using this transformation, discuss the
results for multiplicative noise that can be obtained from the theorems and
corollaries of Section 10.5.

10.8. As a counterpoint to the previous exercise, note that if

    x_{n+1} = ξ_n S(x_n)

and we set y = ln x, η = ln ξ, and T = ln S, then

    y_{n+1} = T(e^{y_n}) + η_n

results. Examine the results for additive noise that can be obtained by using
this technique on the theorems of Section 10.7 pertaining to multiplicative
noise.
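The change of variables behind Exercises 10.7 and 10.8 is a pure conjugacy, which a short computation confirms. The sketch below is our own check with an arbitrary illustrative map S: iterating the additive system on x and the transformed multiplicative system on y = e^x, with η_n = e^{ξ_n} and T(u) = e^{S(u)}, produces y_n = e^{x_n} along the whole trajectory.

```python
import numpy as np

rng = np.random.default_rng(5)

# Verify that the additive recursion on x and the multiplicative recursion
# on y = e^x stay conjugate under y_{n+1} = eta_n T(ln y_n), T = e^S.
def S(u):                                      # an arbitrary illustrative map
    return 0.5 * u + 0.3

xi = rng.standard_normal(20)
x, y = 1.0, np.exp(1.0)                        # y_0 = e^{x_0}
for n in range(20):
    x = S(x) + xi[n]                           # additive form
    y = np.exp(xi[n]) * np.exp(S(np.log(y)))   # multiplicative form eta_n T(ln y_n)
print(abs(y - np.exp(x)))
```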

11
Stochastic Perturbation of
Continuous Time Systems

In this chapter continuous time systems in the presence of noise are considered. This leads us to examine systems of stochastic differential equations
and to a derivation of the forward Fokker-Planck equation, describing the
evolution of densities for these systems. We close with some results concerning the asymptotic stability of solutions to the Fokker-Planck equation.

11.1 One-Dimensional Wiener Processes


(Brownian Motion)
In this and succeeding sections of this chapter, we turn to a consideration of
continuous time systems with stochastic perturbations. We are specifically
interested in the behavior of the system

    dx/dt = b(x) + σ(x)ξ,     (11.1.1)

where σ(x) is the amplitude of the perturbation and ξ = dw/dt is known
as a "white noise" term that may be considered to be the time derivative of
a Wiener process. The system (11.1.1) is the continuous time analog of the
discrete time problem with a constantly applied stochastic perturbation
considered in Section 10.5.
The consideration of continuous time problems such as (11.1.1) will offer
new insight into the possible behavior of systems, but at the expense of introducing new concepts and techniques. Even though the remainder of this


chapter is written to be self-contained, it does not constitute an exhaustive
treatment of stochastic differential equations such as (11.1.1). A definitive
treatment of this subject may be found in Gikhman and Skorokhod [1969].
In this section and the material following, we will denote stochastic processes by {ξ(t)}, {η(t)}, … as well as {ξ_t}, {η_t}, …, depending on the
situation. Remember that in this notation ξ(t) or ξ_t denotes, for fixed t, a
random variable, namely, a measurable function ξ_t: Ω → R. Thus ξ(t) and
ξ_t are really abbreviations for ξ(t, ω) and ξ_t(ω), respectively. The symbol ξ
will be reserved for white noise stochastic processes (to be described later),
whereas η will be used for other stochastic processes.
Let a probability space (Ω, 𝓕, Prob) be given. We start with a definition.

Definition 11.1.1. A stochastic process {η(t)} is called continuous if,
for almost all ω (except for a set of probability zero), the sample path
t → η(t, ω) is a continuous function.

A Wiener process can now be defined as follows.

Definition 11.1.2. A one-dimensional normalized Wiener process
(or Brownian motion) {w(t)}_{t≥0} is a continuous stochastic process with
independent increments such that

(a) w(0) = 0; and

(b) for every s, t, 0 ≤ s < t, the random variable w(t) − w(s) has the
Gaussian density

    g(t − s, x) = (1/√(2π(t − s))) exp[−x²/2(t − s)].     (11.1.2)

Figure 11.1.1a shows a sample path for a process approximating a Wiener
process.
It is clear that a Wiener process has stationary increments, since w(t) −
w(s) and w(t + t′) − w(s + t′) have the same density function (11.1.2).
Further, since w(t) = w(t) − w(0), the random variable w(t) has the density

    g(t, x) = (1/√(2πt)) exp(−x²/2t).     (11.1.3)

An easy calculation shows

    E((w(t) − w(s))^n) = (1/√(2π(t − s))) ∫_{−∞}^{∞} x^n exp[−x²/2(t − s)] dx
                       = 1·3⋯(n − 1)(t − s)^{n/2}     for n even,
                       = 0                            for n odd,     (11.1.4)

and thus, in particular,

    E(w(t) − w(s)) = 0     (11.1.5)


FIGURE 11.1.1. A process approximating a Wiener process. In (a) we show a
single sample path for this process. In (b) we superimpose the points of many
sample paths to show the progressive increase in the variance.
and

    D²(w(t) − w(s)) = t − s.     (11.1.6)

This last equation demonstrates that the variance of a Wiener process
increases linearly with t.
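The moment formulas (11.1.4)–(11.1.6) are easy to confirm by simulation. The sketch below is our own illustration: Wiener increments over steps of length dt are independent N(0, dt) variables, and at t = 1 the simulated w(1) has mean 0, variance 1, and fourth moment 1·3·(1)² = 3.

```python
import numpy as np

rng = np.random.default_rng(6)

# Build w(1) as a sum of independent N(0, dt) increments and check the
# moment formulas (11.1.4)-(11.1.6): E w(1) = 0, D^2 w(1) = 1, E w(1)^4 = 3.
dt, n_steps, n_paths = 2e-3, 500, 20_000
increments = rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt)
w1 = increments.sum(axis=1)                    # w(1) for each path
print(w1.mean(), w1.var(), (w1 ** 4).mean())
```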

Remark 11.1.1. The adjective normalized in our definition of the Wiener
process is used because D²(w(t)) = t. It is clear that multiplication of a
normalized Wiener process by a constant σ > 0 again yields a process with
properties similar to those of Definition 11.1.2, but now with the density

    (1/√(2πσ²t)) exp(−x²/2σ²t)

and with the variance σ²t. These processes are also called Wiener processes.
From this point on we will always refer to a normalized Wiener process as
a Wiener process. □

In Figure 11.1.1b we have drawn a number of sample paths for a process
approximating a Wiener process. Note that as time increases they all seem
to be bounded by a convex envelope. This is due to the fact that the
standard deviation of a Wiener process, from (11.1.6), increases as √t,
that is,

    √(D²(w(t))) = √t.


The highly irregular behavior of these individual trajectories is such that
magnification of any part of the trajectory by a factor a² in the time
direction and a in the x direction yields a picture indistinguishable from
the original trajectory. This procedure can be repeated as often as one
wishes and, indeed, the sample paths of a Wiener process are fractal curves
[Mandelbrot, 1977]. To obtain some insight into the origin of this behavior,
consider the absolute value of the differential quotient

    |Δw/Δt| = (1/|Δt|) |w(t₀ + Δt) − w(t₀)|.

We have

    E(|Δw/Δt|) = (1/|Δt|) E(|w(t₀ + Δt) − w(t₀)|)

and, since the density of w(t₀ + Δt) − w(t₀) is given by (11.1.3),

    E(|w(t₀ + Δt) − w(t₀)|) = (1/√(2πΔt)) ∫_{−∞}^{∞} |x| exp(−x²/2Δt) dx
                            = √(2Δt/π),

or

    E(|Δw/Δt|) = √(2/(π|Δt|)).

Thus the mathematical expectation of |Δw/Δt| goes to infinity, with a
speed proportional to |Δt|^{−1/2}, as |Δt| → 0. This is the origin of the
irregular behavior shown in Figure 11.1.1.
Extending the foregoing argument, it can be proved that the sample
paths of a Wiener process are not differentiable at any point almost surely.
Thus, the white noise terme = dw/dt in (11.1.1) does not exist as a stochastic process. However, since we do wish ultimately to consider (11.1.1) with
such a perturbation, we must inquire how this can be accomplished. AB
shown in following sections, this is simply done by formally integrating
(11.1.1) and treating the resulting system,

    x(t) = ∫_0^t b(x(s)) ds + ∫_0^t σ(x(s)) dw(s) + x^0.

However, this approach leads to the new problem of defining what the
integrals on the right-hand side mean, which will be dealt with in Section
11.3.
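Both scaling facts just described are easy to see numerically. The following
minimal sketch (the path count, step size, and seed are arbitrary illustrative
choices, not from the text) builds approximate Wiener paths from independent
Gaussian increments and checks the √t growth of the standard deviation and the
(Δt)^{-1/2} growth of E(|Δw/Δt|):

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps, T = 5000, 1000, 1.0
dt = T / n_steps

# Independent Gaussian increments with D^2(w(t) - w(s)) = t - s, per (11.1.6)
dw = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
w = np.cumsum(dw, axis=1)

# The sample standard deviation at time T grows like sqrt(T) (the envelope)
print(w[:, -1].std(), np.sqrt(T))

# The mean slope |dw/dt| over one step grows like (dt)**-0.5
print(np.mean(np.abs(dw[:, 0])) / dt, np.sqrt(2 / (np.pi * dt)))
```

Refining the step size dt makes the empirical mean slope larger, exactly as the
formula E(|Δw/Δt|) = √(2/(πΔt)) predicts.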

To obtain further insight into the nature of the process w(t), examine
the alternative sequence {zn} of processes, defined by

    z_n(t) = w(t^n_{i-1}) + [(t - t^n_{i-1}) / (t^n_i - t^n_{i-1})][w(t^n_i) - w(t^n_{i-1})]
                                        for t ∈ [t^n_{i-1}, t^n_i],

where t^n_i = i/n, n = 1, 2, ..., i = 0, 1, 2, .... In other words, z_n is
obtained by sampling the Wiener process w(t) at the times t^n_i and then
applying a linear interpolation between t^n_i and t^n_{i+1}. Any sample path
of the process {z_n(t)} is differentiable, except at the points t^n_i, and the
derivative η_n = z'_n is given by

    η_n(t) = n[w(t^n_i) - w(t^n_{i-1})]        for t ∈ (t^n_{i-1}, t^n_i).

"'n

The process "'n(t) is piecewise constant. The heights of the individual segments are independent, have a mean value zero, and variance D 2 'f/n(t) = n.
Thus, the variance grows linearly with n. If we look at this process approximating white noise, we see that it consists of a sequence of independent
impulses of width (1/n) and variance n. For very large n we will see peaks
of almost all possible sizes uniformly spread along the t-axis.
Note that the random variable z_n(t) for fixed t and large n is the sum
of many independent increments. Thus the density of z_n(t) must be close
to a Gaussian by the central limit theorem. The limiting process w(t) will,
therefore, also have a Gaussian density, which is why we assumed that w(t)
had a Gaussian density in Definition 11.1.2.
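The construction of z_n and η_n can be sketched directly; n and the seed below
are illustrative choices, not from the text:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
ti = np.arange(n + 1) / n           # sampling times t_i^n = i/n
# w sampled at the points t_i^n (independent N(0, 1/n) increments)
w = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(1.0 / n), n))])

def z_n(t):
    """Piecewise-linear interpolation of w between the sample points t_i^n."""
    return np.interp(t, ti, w)

# eta_n is piecewise constant, equal to n[w(t_i^n) - w(t_{i-1}^n)] per interval
eta = n * np.diff(w)
print(eta.var())   # approximately n: the variance grows linearly with n
```

Plotting eta for increasing n reproduces the picture described above: ever
narrower, ever larger independent impulses spread uniformly along the t-axis.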
Historically, Wiener processes (or Brownian motion) first became of interest
because of the findings of the Scottish botanist Robert Brown, who observed
the microscopic movement of pollen particles in water due to the random
collisions of water molecules with the particles. The impulses coming from
these collisions are almost ideal realizations of the process of white noise,
somewhat similar to our process η_n(t) for large n.
In other applications, however, much slower processes are admitted as
"white noise" perturbations, for example, waves striking the side of a large
ship or the influence of atmospheric turbulence on an airplane. In the example of the ship, the reason that this assumption is a valid approximation
stems from the fact that waves of quite varied energies strike both sides of
the ship almost independently with a frequency much larger than the free
oscillation frequency of the ship.
Example 11.1.1. Having defined a one-dimensional Wiener process
{w(t)}_{t≥0}, it is rather easy to construct an exact, continuous time,
semidynamical system that corresponds to the partial differential equation

    ∂u/∂t + s ∂u/∂s = u/2.                                     (11.1.7)
Our arguments follow those of Rudnicki (1985), which generalize results of
Lasota (1981), Brunovsky (1983), and Brunovsky and Komornik (1984).
The first step in this process is to construct the Wiener measure. Let X
be the space of all continuous functions x: [0, 1] → R such that x(0) = 0.
We are going to define some special subsets of X that are called cylinders.
Thus, given a sequence of real numbers,

    0 < s_1 < ··· < s_n ≤ 1,


and a sequence of Borel subsets of R,

    A_1, ..., A_n,

we define the corresponding cylinder by

    C(s_1, ..., s_n; A_1, ..., A_n) = {x ∈ X : x(s_i) ∈ A_i, i = 1, ..., n}.   (11.1.8)

Thus the cylinder defined by (11.1.8) is the set of all functions x ∈ X passing
through the set A_i at time s_i (see Figure 11.1.2). The Wiener measure μ_w of
the cylinders (11.1.8) is defined by

    μ_w(C(s_1, ..., s_n; A_1, ..., A_n))
        = prob{w(s_1) ∈ A_1, ..., w(s_n) ∈ A_n}.               (11.1.9)

To derive an explicit formula for μ_w, consider a transformation y = F(x)
of R^n into itself given by

    y_1 = x_1,  y_2 = x_2 - x_1,  ...,  y_n = x_n - x_{n-1},   (11.1.10)

and set A = A_1 × ··· × A_n. Then the condition
    (w(s_1), ..., w(s_n)) ∈ A

is equivalent to the requirement that the random vector

    (w(s_1), w(s_2) - w(s_1), ..., w(s_n) - w(s_{n-1}))        (11.1.11)

belong to F(A). Since {w(t)}_{t≥0} is a random process with independent
increments, the density function of the random vector (11.1.11) is given by

    g(s_1, y_1) g(s_2 - s_1, y_2) ··· g(s_n - s_{n-1}, y_n),


where, by the definition of the Wiener process [see equation (11.1.3)],

1
g(s, y) = rn= exp( -y2 f2s).
v211's

(11.1.12)

Thus we have

    prob{w(s_1) ∈ A_1, ..., w(s_n) ∈ A_n}
        = ∫∫_{F(A)} g(s_1, y_1) g(s_2 - s_1, y_2) ··· g(s_n - s_{n-1}, y_n) dy_1 ··· dy_n.

Using the variables defined in (11.1.10), this becomes

    prob{w(s_1) ∈ A_1, ..., w(s_n) ∈ A_n}
        = ∫_{A_1} ··· ∫_{A_n} g(s_1, x_1) g(s_2 - s_1, x_2 - x_1)
              ··· g(s_n - s_{n-1}, x_n - x_{n-1}) dx_1 ··· dx_n.

FIGURE 11.1.2. Schematic representation of the implications of the cylinder
definition [equation (11.1.8)].

By combining this expression with equations (11.1.9) and (11.1.12), we
obtain the famous formula for the Wiener measure:

    μ_w(C(s_1, ..., s_n; A_1, ..., A_n))
        = ∫_{A_1} ··· ∫_{A_n} g(s_1, x_1) g(s_2 - s_1, x_2 - x_1)
              ··· g(s_n - s_{n-1}, x_n - x_{n-1}) dx_1 ··· dx_n.    (11.1.13)

(We assume, for simplicity, that s_0 = x_0 = 0.)
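Formula (11.1.13) can be checked numerically for a simple cylinder. In the
sketch below, the times, sets, sample size, and seed are illustrative choices;
for C(s_1, s_2; [0, ∞), [0, ∞)) the double integral reduces to the classical
Gaussian orthant probability 1/4 + arcsin(ρ)/2π with ρ = √(s_1/s_2) (Sheppard's
formula, a standard fact not taken from the text), which we compare with a
simulation built from the independent increments (11.1.11):

```python
import numpy as np

s1, s2 = 0.25, 0.75
# closed form of the double integral in (11.1.13) for A1 = A2 = [0, inf)
exact = 0.25 + np.arcsin(np.sqrt(s1 / s2)) / (2 * np.pi)

# Monte Carlo estimate of prob{w(s1) >= 0, w(s2) >= 0} using the
# independent increments w(s1) and w(s2) - w(s1), as in (11.1.11)
rng = np.random.default_rng(2)
n = 400000
w1 = rng.normal(0.0, np.sqrt(s1), n)
w2 = w1 + rng.normal(0.0, np.sqrt(s2 - s1), n)
mc = np.mean((w1 >= 0) & (w2 >= 0))
print(mc, exact)
```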


To extend the definition of μ_w, we can define the σ-algebra A to be
the smallest σ-algebra of subsets of X that contains all the cylinders
defined by (11.1.8) for arbitrary n. By definition, the Wiener measure μ_w
is the (unique) extension of μ_w, given by (11.1.13) on cylinders, to the
entire σ-algebra A. The proof that μ_w given by (11.1.13) on cylinders can
be extended to the entire σ-algebra is technically difficult, and we omit
it. However, note that if a Wiener process {w(t)}_{t≥0} is given, then it is a
direct consequence of our construction of the Wiener measure for cylinders
that

    μ_w(E) = prob{w̄ ∈ E}        for E ∈ A,                    (11.1.14)

where w̄ is the restriction of w to the interval [0, 1]. (Incidentally, from
this equation, it also follows that the assumption that a Wiener process
{w(t)}_{t≥0} exists is not trivial, but, in fact, is equivalent to the existence
of the Wiener measure.)
With the elements of the measure space (X, A, μ_w) defined, we now turn to
a definition of the semidynamical system {S_t}_{t≥0} corresponding to (11.1.7).
With the initial condition

    u(0, s) = x(s),                                            (11.1.15)


equation (11.1.7) has the solution

    u(t, s) = e^{t/2} x(se^{-t}).

Thus, if we set

    (S_t x)(s) = e^{t/2} x(se^{-t}),                           (11.1.16)

this equation defines {S_t}_{t≥0}.


We first show that {S_t}_{t≥0} preserves the Wiener measure μ_w. Since the
measures μ_w on cylinders generate the Wiener measure on the entire
σ-algebra A, we will only verify the measure-preservation condition

    μ_w(S_t^{-1}(C)) = μ_w(C)                                  (11.1.17)

for cylinders. First observe that for every a ∈ (0, 1),

    μ_w(C(a^2 s_1, ..., a^2 s_n; aA_1, ..., aA_n))
        = μ_w(C(s_1, ..., s_n; A_1, ..., A_n)).                (11.1.18)

This follows directly from equation (11.1.13) if we set y_i = ax_i in the
integral on the right-hand side. Further, from (11.1.16), it is clear that
(S_t x)(s_i) ∈ A_i if and only if x(s_i e^{-t}) ∈ e^{-t/2} A_i. Thus,

    S_t^{-1}(C(s_1, ..., s_n; A_1, ..., A_n))
        = {x ∈ X : (S_t x)(s_i) ∈ A_i, i = 1, ..., n}
        = C(e^{-t} s_1, ..., e^{-t} s_n; e^{-t/2} A_1, ..., e^{-t/2} A_n).

From this relation and (11.1.18) with a = e^{-t/2}, we immediately obtain
(11.1.17), thereby verifying that {S_t}_{t≥0} preserves the Wiener measure μ_w.
To demonstrate the exactness of {S_t}_{t≥0}, we will be content to show that

    lim_{t→∞} μ_w(S_t(C)) = 1        if μ_w(C) > 0             (11.1.19)

for cylinders. In this case we have

    S_t(C) = S_t(C(s_1, ..., s_n; A_1, ..., A_n))
           = {S_t x : x ∈ C} = {e^{t/2} x(se^{-t}) : x ∈ C}.

Set y(s) = e^{t/2} x(se^{-t}), so this becomes

    S_t(C) = S_t(C(s_1, ..., s_n; A_1, ..., A_n))
           = {y ∈ X : y(s) = e^{t/2} x(se^{-t}), x(s_i) ∈ A_i, i = 1, ..., n}.   (11.1.20)

Since s ∈ [0, 1], and, thus, se^{-t} ∈ [0, e^{-t}], the conditions x(s_i) ∈ A_i
are irrelevant for s_i > e^{-t}. Thus

    S_t(C(s_1, ..., s_n; A_1, ..., A_n)) = C(s_1 e^t, ..., s_k e^t; e^{t/2} A_1, ..., e^{t/2} A_k),


where k = k(t) is the largest integer k ≤ n such that s_k ≤ e^{-t}. Once t
becomes sufficiently large, that is, t > -log s_1, then from (11.1.20) we see
that the last condition x(s_1) ∈ A_1 disappears and we are left with

    S_t(C(s_1, ..., s_n; A_1, ..., A_n)) = {y ∈ X : y(s) = e^{t/2} x(se^{-t})}.

However, since X is the space of all possible continuous functions
x: [0, 1] → R, the set on the right-hand side is just X and, as a consequence,

    S_t(C(s_1, ..., s_n; A_1, ..., A_n)) = X        for t > -log s_1,

which proves equation (11.1.19) for cylinders.


In the general case, for an arbitrary C ∈ A, the demonstration that
(11.1.19) holds is more difficult, but the outline of the argument is as
follows. Starting with the equality

    μ_w(S_t(C)) = μ_w(S_t^{-1} S_t(C)),

and using the fact that the family {S_t^{-1} S_t(C)}_{t>0} is increasing with t,
we obtain

    lim_{t→∞} μ_w(S_t(C)) = μ_w(B),                            (11.1.21)

where

    B = ⋃_{t≥t_0} S_t^{-1} S_t(C),                             (11.1.22)

and t_0 is an arbitrary nonnegative number. From (11.1.22), it follows that
B belongs to the σ-algebra

    A_∞ = ⋂_{t≥0} S_t^{-1}(A).

From the Blumenthal zero-one law [see Remark 11.2.1] it may be shown that
the σ-algebra A_∞ contains only trivial sets. Thus, since μ_w(B) ≥ μ_w(C),
we must have μ_w(B) = 1 whenever μ_w(C) > 0. Thus (11.1.19) follows
immediately from (11.1.21).
A proof of exactness may also be carried out for equations more general
than the linear version (11.1.7). The nonlinear equation

    ∂u/∂t + c(s) ∂u/∂s = f(s, u)                               (11.1.23)

has been used to model the dynamics of a population of cells undergoing
simultaneous proliferation and maturation [Lasota, Mackey, and Ważewska-
Czyżewska, 1981; Mackey and Dormer, 1982], where s is the maturation
variable. When the coefficients c and f satisfy some additional conditions,
it can be shown that all the solutions of (11.1.23) with the initial condition
(11.1.15) converge to the same limit if x(0) > 0. However, if x(0) = 0, then
the solutions of (11.1.23) will exhibit extremely irregular behavior that


can be identified with the exactness of the semidynamical system {S_t}_{t≥0}
corresponding to u(t, s). This latter situation [x(0) = 0] corresponds to the
destruction of the most primitive cell type (maturity = 0), and in such
situations the erratic behavior corresponding to exactness of {S_t}_{t≥0} is
noted clinically. □

11.2 d-Dimensional Wiener Processes (Brownian Motion)
In considering d-dimensional Wiener processes we will require an extension
of our definition of independent sets. Suppose we have a finite sequence

    F_1, ..., F_n                                              (11.2.1)

of σ-algebras. We define the independence of (11.2.1) as follows.

Definition 11.2.1. A sequence (11.2.1) consists of independent σ-algebras
if all possible sequences of sets A_1, ..., A_n such that

    A_i ∈ F_i,        i = 1, ..., n,

are independent.
Further, for every random variable ξ we denote by F(ξ) the σ-algebra of
all events of the form {ω : ξ(ω) ∈ B}, where the B are Borel sets, or, more
explicitly,

    F(ξ) = {ξ^{-1}(B) : B is a Borel set}.

Having a stochastic process {η(t)}_{t∈Δ} on an interval Δ, we denote the
smallest σ-algebra that contains all sets of the form

    {ω : η(t, ω) ∈ B},        t ∈ Δ, B a Borel set,

by F(η(t) : t ∈ Δ).

With this notation we can restate our definition of independent random
variables as follows. The random variables ξ_1, ..., ξ_n are independent
if F(ξ_1), ..., F(ξ_n) are independent. In an analogous fashion, stochastic
processes {η_1(t)}_{t∈Δ_1}, ..., {η_n(t)}_{t∈Δ_n} are independent if

    F(η_1(t) : t ∈ Δ_1), ..., F(η_n(t) : t ∈ Δ_n)

are independent.
Finally, having m random variables ξ_1, ..., ξ_m and n stochastic processes
{η_1(t)}_{t∈Δ_1}, ..., {η_n(t)}_{t∈Δ_n}, we say that they are independent if the
σ-algebras

    F(ξ_1), ..., F(ξ_m), F(η_1(t) : t ∈ Δ_1), ..., F(η_n(t) : t ∈ Δ_n)

are independent. We will also say that a stochastic process {η(t)}_{t∈Δ} and
a σ-algebra F_0 are independent if F(η(t) : t ∈ Δ) and F_0 are independent.
Now it is straightforward to define a d-dimensional Wiener process.
Definition 11.2.2. A d-dimensional vector valued process

    w(t) = (w_1(t), ..., w_d(t)),        t ≥ 0,

is a d-dimensional Wiener process (Brownian motion) if its components
{w_1(t)}_{t≥0}, ..., {w_d(t)}_{t≥0} are one-dimensional independent Wiener
processes (Brownian motions).
From this definition it follows that for every fixed t the random variables
w_1(t), ..., w_d(t) are independent. Thus, it is an immediate consequence
of Theorem 10.1.1 that the joint density of the random vector
(w_1(t), ..., w_d(t)) is given by

    g(t, x_1, ..., x_d) = g(t, x_1) ··· g(t, x_d)
                        = (2πt)^{-d/2} exp[-(1/2t) Σ_{i=1}^d x_i^2].    (11.2.2)

The joint density g has the following properties:

    ∫···∫_{R^d} g(t, x_1, ..., x_d) dx_1 ··· dx_d = 1,                  (11.2.3)

    ∫···∫_{R^d} x_i g(t, x_1, ..., x_d) dx_1 ··· dx_d = 0,
                                        i = 1, ..., d,                  (11.2.4)

and

    ∫···∫_{R^d} x_i x_j g(t, x_1, ..., x_d) dx_1 ··· dx_d = δ_{ij} t,
                                        i, j = 1, ..., d,               (11.2.5)

where δ_{ij} is the Kronecker delta (δ_{ij} = 0 for i ≠ j, δ_{ii} = 1).
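The moment formulas (11.2.3)-(11.2.5) are easy to check by simulation; in
the sketch below d, t, and the sample size are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
d, t, n = 3, 2.0, 200000

# w(t) has independent N(0, t) components, per the joint density (11.2.2)
w = rng.normal(0.0, np.sqrt(t), size=(n, d))

print(w.mean(axis=0))   # approximately 0, as in (11.2.4)
print(np.cov(w.T))      # approximately delta_ij * t, as in (11.2.5)
```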

Remark 11.2.1. The family F(w(u) : 0 ≤ u ≤ t) of σ-algebras generated by
the Wiener process (or d-dimensional Wiener process) has the interesting
property that it is right-hand continuous. We have (modulo zero)

    F(w(u) : 0 ≤ u ≤ t) = ⋂_{h>0} F(w(u) : 0 ≤ u ≤ t + h).     (11.2.6)


In particular, at t = 0, since w(0) = 0 and the σ-algebra generated by w(0)
is trivial, we can see from equality (11.2.6) that the intersection

    ⋂_{h>0} F(w(u) : 0 ≤ u ≤ h)

contains only sets of measure zero or one. This last statement is referred to
as the Blumenthal zero-one law (Friedman [1975]). □

11.3 The Stochastic Ito Integral: Development

To understand what is meant by a solution to the stochastic differential
equation (11.1.1), it is necessary to introduce the concept of the stochastic
Ito integral. In this section we offer a simple but precise definition of this
integral and calculate some specific cases so that a comparison with the
usual Lebesgue integral may be made.
Let a probability space (Ω, F, prob) be given, and let {w(t)}_{t≥0} be a
one-dimensional Wiener process. If {η(t)}_{t∈[α,β]} is another stochastic
process defined for t ∈ [α, β], α ≥ 0, we wish to know how to interpret the
integral

    ∫_α^β η(t) dw(t).                                          (11.3.1)

Proceeding naively from the classical rules of calculus would suggest that
(11.3.1) should be replaced by

    ∫_α^β η(t) w'(t) dt.

However, this integral is only defined if w(t) is a differentiable function,
which we have already observed is not the case for a Wiener process.
Another possibility suggested by classical analysis is to consider (11.3.1)
as the limit of approximating sums s of the form
    s = Σ_{i=1}^k η(t̄_i)[w(t_i) - w(t_{i-1})],                 (11.3.2)

where

    α = t_0 < t_1 < ··· < t_k = β

is a partition of the interval [α, β] and the intermediate points
t̄_i ∈ [t_{i-1}, t_i]. This turns out to be a more fruitful idea but has the
surprising consequence that the limit of the approximating sums s of the form
(11.3.2) depends on the choice of the intermediate points t̄_i, in sharp
contrast to the situation for the Riemann and Stieltjes integrals. This occurs
because w(t), at fixed ω, is not a function of bounded variation.


With these preliminary remarks in mind, we now proceed to develop
some concepts of use in the definition of the Ito integral.

Definition 11.3.1. A family {F_t}, α ≤ t ≤ β, of σ-algebras contained in
F is called nonanticipative if the following three conditions are satisfied:

(1) F_u ⊂ F_t for u ≤ t, so F_t increases as t increases;

(2) F_t ⊃ F(w(u) : α ≤ u ≤ t), so w(u), α ≤ u ≤ t, is measurable with
    respect to F_t;

(3) w(t + h) - w(t) is independent of F_t for h ≥ 0, so all pairs of sets A_1,
    A_2 such that A_1 ∈ F_t and A_2 ∈ F(w(t + h) - w(t)) are independent.

From this point on we will assume that a Wiener process w(t) and a
family of nonanticipative σ-algebras {F_t}, α ≤ t ≤ β, are given.
We next define a fourth condition.

Definition 11.3.2. A stochastic process {η(t)}, α ≤ t ≤ β, is called
nonanticipative with respect to {F_t} if

(4) F_t ⊃ F(η(u) : α ≤ u ≤ t), so η(u) is measurable with respect to F_t.

For every random process {η(t)}, α ≤ t ≤ β, we define the Ito sum s by

    s = Σ_{i=1}^k η(t_{i-1})[w(t_i) - w(t_{i-1})].             (11.3.3)

Note that in the definition of the Ito sum (11.3.3), we have specified the
intermediate points t̄_i of (11.3.2) to be the left end of each interval,
t̄_i = t_{i-1}. For a given Ito sum s, we define

    δ(s) = max_i (t_i - t_{i-1}),

and call a sequence of Ito sums {s_n} regular if δ(s_n) → 0 as n → ∞.
We now define the Ito integral as follows.

Definition 11.3.3. Let {η(t)}, α ≤ t ≤ β, be a nonanticipative stochastic
process. If there exists a random variable ζ such that

    ζ = st-lim_{n→∞} s_n                                       (11.3.4)

for every regular sequence of Ito sums {s_n}, then we say that ζ is the
Ito integral of {η(t)} on the interval [α, β] and denote it by

    ζ = ∫_α^β η(t) dw(t).                                      (11.3.5)


Remark 11.3.1. It can be proved that for every continuous nonanticipative
process the limit (11.3.4) always exists. □

Remark 11.3.2. Definition 11.3.1 of a nonanticipative σ-algebra is
complicated, and the reason for introducing each element of the definition,
as well as the implication of each, may appear somewhat obscure. Condition (1)
is easy, for it merely means that the σ-algebra F_t of events grows as time
proceeds. The second condition ensures that F_t contains all of the events
that can be described by the Wiener process w(s) for times s ∈ [α, t].
Finally, condition (3) says that no information concerning the behavior of
the process w(u) - w(t) for u > t can influence calculations involving the
probability of the events in F_t. Definition 11.3.2 gives to a stochastic
process η(u) the same property that condition (2) of Definition 11.3.1 gives
to w(u). Thus, all of the information that can be obtained from η(u) for
u ∈ [α, t] is contained in F_t.
Taken together, these four conditions ensure that the integrand η(t) of
the Ito integral (11.3.5) does not depend on the behavior of w(t) for times
greater than β and aid in the proof of the convergence of the Ito
approximating sums. Further, the nonanticipatory assumption plays an important
role in the proof of the existence and uniqueness of solutions to stochastic
differential equations since it guarantees that the behavior of a solution on
a time interval [0, t] is not influenced by the Wiener process for times
larger than t. □

Example 11.3.1. For our first example of the calculation of a specific Ito
integral, we take

    ∫_0^T dw(t).

In this case the integrand of (11.3.5) is η(t) ≡ 1. Thus F(η(t) : 0 ≤ t ≤ T)
is a trivial σ-algebra that contains only the whole space Ω and the empty
set ∅. To see this, note that, if 1 ∈ B, then {ω : η(t) ∈ B} = Ω and, if
1 ∉ B, then {ω : η(t) ∈ B} = ∅. This trivial σ-algebra {∅, Ω} is contained
in any other σ-algebra, and thus condition (4) of Definition 11.3.2 is
satisfied.
By definition,

    s = Σ_{i=1}^k [w(t_i) - w(t_{i-1})] = w(t_k) - w(t_0) = w(T)

and, thus,

    ∫_0^T dw(t) = w(T).

Example 11.3.2. In this example we will evaluate

    ∫_0^T w(t) dw(t),

which is not as trivial as our previous example.

In this case, η(t) = w(t), so that condition (4) of Definition 11.3.2 follows
from condition (2) of Definition 11.3.1. The Ito sum,

    s = Σ_{i=1}^k w(t_{i-1})[w(t_i) - w(t_{i-1})],

may be rewritten as

    s = (1/2) Σ_{i=1}^k [w^2(t_i) - w^2(t_{i-1})]
          - (1/2) Σ_{i=1}^k [w(t_i) - w(t_{i-1})]^2
      = (1/2) w^2(T) - (1/2) Σ_{i=1}^k γ_i,                    (11.3.6)

where

    γ_i = [w(t_i) - w(t_{i-1})]^2.

To evaluate the last summation in (11.3.6), observe that, from the Chebyshev
inequality (10.2.10),

    prob{|(1/2) Σ_{i=1}^k γ_i - (1/2) Σ_{i=1}^k m_i| ≥ ε}
        ≤ (1/ε^2) D^2((1/2) Σ_{i=1}^k γ_i),                    (11.3.7)

where m_i = E(γ_i). Further, by (11.1.4),

    E(γ_i) = E([w(t_i) - w(t_{i-1})]^2) = t_i - t_{i-1}

and, by equations (10.2.6) and (11.1.4),

    D^2(γ_i) ≤ E(γ_i^2) = E([w(t_i) - w(t_{i-1})]^4) = 3(t_i - t_{i-1})^2.

Thus,

    Σ_{i=1}^k m_i = Σ_{i=1}^k (t_i - t_{i-1}) = T

and

    Σ_{i=1}^k D^2(γ_i) ≤ 3 Σ_{i=1}^k (t_i - t_{i-1})^2 ≤ 3T max_i (t_i - t_{i-1}).

Setting δ(s) = max_i (t_i - t_{i-1}) as before and using (11.3.7), we finally
obtain

    prob{|(1/2) Σ_{i=1}^k γ_i - (1/2) T| ≥ ε} ≤ (3T/4ε^2) δ(s)


or, from (11.3.6),

    prob{|s - ((1/2) w^2(T) - (1/2) T)| ≥ ε} ≤ (3T/4ε^2) δ(s).

If {s_n} is a regular sequence, then δ(s_n) converges to zero as n → ∞ and

    st-lim_{n→∞} s_n = (1/2) w^2(T) - (1/2) T.

Thus we have shown that

    ∫_0^T w(t) dw(t) = (1/2) w^2(T) - (1/2) T,

clearly demonstrating that the stochastic Ito integral does not obey the
usual rules of integration. □
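This computation can also be checked numerically: on a fine partition the Ito
sums (11.3.3) with η = w lie close to (1/2)w^2(T) - (1/2)T path by path. A
minimal sketch, in which the partition size, path count, and seed are arbitrary
illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
T, k, n_paths = 1.0, 4000, 2000
dw = rng.normal(0.0, np.sqrt(T / k), size=(n_paths, k))  # w(t_i) - w(t_{i-1})
w = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dw, axis=1)], axis=1)

# Ito sums with the left endpoint eta(t_{i-1}) = w(t_{i-1}), as in (11.3.3)
ito = np.sum(w[:, :-1] * dw, axis=1)
target = 0.5 * w[:, -1] ** 2 - 0.5 * T

print(np.max(np.abs(ito - target)))   # shrinks as the partition is refined
```

Exactly as in (11.3.6), the gap on each path equals (1/2)T - (1/2) Σ γ_i,
which the Chebyshev estimate forces to zero as δ(s) → 0.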
This last example illustrates the fact that the calculation of stochastic
integrals is, in general, not an easy matter and requires many analytical
tools that may vary from situation to situation. What is even more
interesting is that the sufficient conditions for the existence of stochastic
integrals, related to the construction of nonanticipative σ-algebras, are
quite complicated in comparison with those for the Lebesgue integration of
deterministic functions.
Remark 11.3.3. From Example 11.3.2, it is rather easy to demonstrate
how the choice of the intermediate point t̄_i influences the value of the
integral. For example, picking t̄_i = (1/2)(t_{i-1} + t_i), we obtain, in place
of the Ito sum, the Stratonovich sum,

    s̄ = Σ_{i=1}^k w((1/2)(t_{i-1} + t_i))[w(t_i) - w(t_{i-1})]
      = (1/2) w^2(T) - (1/2) Σ_{i=1}^k γ̄_i + (1/2) Σ_{i=1}^k ρ_i,

where

    γ̄_i = [w(t_i) - w((1/2)(t_{i-1} + t_i))]^2

and

    ρ_i = [w((1/2)(t_{i-1} + t_i)) - w(t_{i-1})]^2.

Since the variables γ̄_1, ..., γ̄_k are independent, as are ρ_1, ..., ρ_k, we
may use the Chebyshev inequality as in the previous example to show that

    st-lim_{n→∞} Σ_{i=1}^k γ̄_i = (1/2) T = st-lim_{n→∞} Σ_{i=1}^k ρ_i.

Thus the Stratonovich sums {s̄_n} converge to (1/2) w^2(T), and the
Stratonovich integral gives a result more in accord with our experience from
calculus.

However, the use of the Stratonovich integral in solving stochastic
differential equations leads to other, more serious problems. □
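The effect of the intermediate-point choice is visible numerically. The sketch
below (grid size and seed are illustrative choices) simulates w on a grid
containing both the partition points and their midpoints, and forms the two
kinds of sums for ∫_0^T w dw:

```python
import numpy as np

rng = np.random.default_rng(1)
T, k = 1.0, 4000
# w on a fine grid: even indices are partition points, odd indices midpoints
fine = np.concatenate([[0.0],
                       np.cumsum(rng.normal(0.0, np.sqrt(T / (2 * k)), 2 * k))])
w_part = fine[::2]            # w(t_0), w(t_1), ..., w(t_k)
w_mid = fine[1::2]            # w((t_{i-1} + t_i)/2)
dw = np.diff(w_part)

ito = np.sum(w_part[:-1] * dw)     # left endpoints:  near (1/2)w^2(T) - (1/2)T
strat = np.sum(w_mid * dw)         # midpoints:       near (1/2)w^2(T)
wT = w_part[-1]
print(ito - (0.5 * wT**2 - 0.5 * T), strat - 0.5 * wT**2)
```

The two sums differ by approximately T/2, the systematic gap between the Ito
and Stratonovich integrals of w.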
To close this section, we extend our definition of the Ito integral to the
multidimensional case. If G(t) = (η_{ij}(t)), i, j = 1, ..., d, is a d × d
matrix of continuous stochastic processes, defined for α ≤ t ≤ β, and
w(t) = (w_i(t)), i = 1, ..., d, is a d-dimensional Wiener process, then

    ∫_α^β G(t) dw(t) = (ζ_1, ..., ζ_d),                        (11.3.8)

where

    ζ_i = Σ_{j=1}^d ∫_α^β η_{ij}(t) dw_j(t),

defines the Ito integral. Thus, equation (11.3.8) is integrated term by term.
In this case the family {F_t} of nonanticipative σ-algebras must satisfy
conditions (2) and (3) of Definition 11.3.1 with respect to all {w_i(t)},
i = 1, ..., d, and condition (4) of Definition 11.3.2 must be satisfied by all
{η_{ij}(t)}, i, j = 1, ..., d.

11.4 The Stochastic Ito Integral: Special Cases


In the special case when the integrand of the Ito integral does not depend
on ω, that is to say, it is not a stochastic process, the convergence of the
approximating sums is quite strong. This section is devoted to an examination
of this situation and one in which we are simply integrating a stochastic
process with respect to t.
Before stating our first proposition, we note that, if f: [α, β] → R is a
continuous function, then every regular sequence {s_n} of approximating
sums

    s_n = Σ_{i=1}^{k_n} f(t^n_{i-1})[w(t^n_i) - w(t^n_{i-1})]

converges in the mean [i.e., strongly in L^2(Ω)] to the integral

    ζ = ∫_α^β f(t) dw(t).                                      (11.4.1)

Although we will not prove this assertion, it suffices to say that the proof
proceeds in a fashion similar to the proof of the following proposition.

Proposition 11.4.1. If f: [α, β] → R is a continuous function, then

    E(∫_α^β f(t) dw(t)) = 0                                    (11.4.2)

and

    D^2(∫_α^β f(t) dw(t)) = ∫_α^β [f(t)]^2 dt.                 (11.4.3)

Proof. Set

    s = Σ_{i=1}^k f(t_{i-1})[w(t_i) - w(t_{i-1})] = Σ_{i=1}^k f(t_{i-1}) Δw_i,

so that

    s^2 = Σ_{i,j=1}^k f(t_{i-1}) f(t_{j-1}) Δw_i Δw_j.

We have immediately that

    E(s) = Σ_{i=1}^k f(t_{i-1}) E(Δw_i) = 0

and, since w(t) is a Wiener process with independent increments,

    E(Δw_i Δw_j) = δ_{ij}(t_i - t_{i-1}).

We also have

    D^2(s) = E(s^2) = Σ_{i,j=1}^k f(t_{i-1}) f(t_{j-1}) E(Δw_i Δw_j)
           = Σ_{i=1}^k [f(t_{i-1})]^2 (t_i - t_{i-1}).

Thus for any regular sequence {s_n},

    lim_{n→∞} E(s_n) = 0                                       (11.4.4)

and

    lim_{n→∞} D^2(s_n) = ∫_α^β [f(t)]^2 dt.                    (11.4.5)

Since, from the remarks preceding the proposition, {s_n} converges in mean
to the integral ζ given in equation (11.4.1), we have lim_{n→∞} E(s_n) = E(ζ)
and lim_{n→∞} D^2(s_n) = D^2(ζ), which, by (11.4.4) and (11.4.5), completes
the proof.
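Proposition 11.4.1 is easy to check by simulation; in the sketch below the
integrand f(t) = cos t on [0, 1], the partition size, and the path count are
illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, beta, k, n_paths = 0.0, 1.0, 1000, 20000
t = np.linspace(alpha, beta, k + 1)
dw = rng.normal(0.0, np.sqrt((beta - alpha) / k), size=(n_paths, k))

# approximating sums for zeta = int f(t) dw(t) with f(t) = cos t
zeta = np.sum(np.cos(t[:-1]) * dw, axis=1)

exact_var = 0.5 + np.sin(2.0) / 4.0    # int_0^1 cos^2 t dt, per (11.4.3)
print(zeta.mean(), zeta.var(), exact_var)
```

The sample mean is near 0, matching (11.4.2), and the sample variance is near
∫_0^1 cos^2 t dt, matching (11.4.3).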


A second special case of the stochastic integral occurs when the integrand
is a stochastic process, but it is desired to have the integral only with
respect to time. Hence we wish to consider

    ζ = ∫_α^β η(t) dt                                          (11.4.6)

when {η(t)}, α ≤ t ≤ β, is a given stochastic process. To define (11.4.6) we
consider approximating sums of the form

    s = Σ_{i=1}^k η(t̄_i)(t_i - t_{i-1})

corresponding to the partition

    α = t_0 < t_1 < ··· < t_k = β

with arbitrary intermediate points t̄_i ∈ [t_{i-1}, t_i]. We now have the
following definition.

Definition 11.4.1. If every regular [δ(s_n) → 0] sequence {s_n} of
approximating sums is stochastically convergent and

    ζ = st-lim_{n→∞} s_n,                                      (11.4.7)

then this common limit is called the integral of η(t) on [α, β] and is denoted
by (11.4.6).
Observe that, when η(t, ω) possesses continuous sample paths, that is, it
is a continuous function of t, the limit

    lim_{n→∞} s_n(ω)

exists as the classical Riemann integral. Thus when {η(t)}, α ≤ t ≤ β, is
a continuous stochastic process, this limit exists for almost all ω. Further,
since, by Proposition 10.3.2, almost sure convergence implies stochastic
convergence, the limit (11.4.7) must exist.
There is an interesting connection between the Ito integral (11.3.5) and
the integral of (11.4.6) reminiscent of the classical "integration by parts"
formula. It can be stated formally as follows.

Proposition 11.4.2. If f: [α, β] → R is differentiable with a continuous
derivative f', then

    ∫_α^β f(t) dw(t) = -∫_α^β f'(t)w(t) dt + f(β)w(β) - f(α)w(α).    (11.4.8)


Proof. Since the integrals in (11.4.8) both exist, we may pick special
approximating sums of the form

    s̄_n = Σ_{i=1}^{k_n} f'(t̄^n_i) w(t̄^n_i)(t^n_i - t^n_{i-1}),       (11.4.9)

where the intermediate points t̄^n_i are chosen in such a way that

    f(t^n_i) - f(t^n_{i-1}) = f'(t̄^n_i)(t^n_i - t^n_{i-1}).

Substituting this expression into (11.4.9), we may rewrite s̄_n as

    s̄_n = Σ_{i=1}^{k_n} [f(t^n_i) - f(t^n_{i-1})] w(t̄^n_i)
        = -Σ_{i=1}^{k_n-1} [w(t̄^n_{i+1}) - w(t̄^n_i)] f(t^n_i)
          + f(t^n_{k_n}) w(t̄^n_{k_n}) - f(t^n_0) w(t̄^n_1).          (11.4.10)

The sum on the right-hand side of (11.4.10) corresponds to the partition

    t̄^n_1 < ··· < t̄^n_{k_n},

which does not contain the intervals (α, t̄^n_1) and (t̄^n_{k_n}, β). Setting
t̄^n_0 = α and t̄^n_{k_n+1} = β, we may rewrite (11.4.10) in the form

    s̄_n = -s_n + w(β)f(β) - w(α)f(α),                          (11.4.11)

where

    s_n = Σ_{i=0}^{k_n} [w(t̄^n_{i+1}) - w(t̄^n_i)] f(t^n_i).

The sequence {s̄_n} converges to

    ∫_α^β f'(t)w(t) dt,

whereas {s_n} converges, by our remarks preceding Proposition 11.4.1, to
the integral

    ∫_α^β f(t) dw(t).

Thus, passing to the limit in (11.4.11) finishes the proof.
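The mechanism of this proof is an Abel (summation-by-parts) rearrangement,
which can be verified directly on a discrete grid; f(t) = t and the grid size
below are illustrative choices:

```python
import numpy as np

# Discrete integration by parts, the identity behind (11.4.8), with f(t) = t
rng = np.random.default_rng(3)
k = 100000
t = np.linspace(0.0, 1.0, k + 1)
w = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(1.0 / k), k))])

lhs = np.sum(t[:-1] * np.diff(w))      # sum approximating int f dw
# -sum approximating int f' w dt, plus the boundary terms f(b)w(b) - f(a)w(a)
rhs = -np.sum(w[1:] * np.diff(t)) + t[-1] * w[-1] - t[0] * w[0]
print(lhs - rhs)   # the rearrangement is exact, up to rounding
```

Unlike the stochastic limits themselves, this discrete identity holds exactly
on every partition, which is why passing to the limit yields (11.4.8).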

Remark 11.4.1. In our short development of the Ito integral and presentation
of its main properties, we have restricted ourselves to the special
situation where the integrand is a continuous stochastic process. This
allowed us to define the Ito integral in a relatively simple and direct way
as the limit of the Ito sums (11.3.3). Generally, such an approach is
inconvenient because of the restrictive nature of the assumption concerning
the continuity of the integrand. Usually, the definition of the Ito integral
is given in a more sophisticated manner. It is first defined for stochastic
processes that are piecewise constant in time, and then, by using a limiting
procedure in L^2(Ω), the definition is extended to a quite general class of
integrands. An exhaustive treatment of this procedure may be found in
Gikhman and Skorokhod [1969, 1975]. □

11.5 Stochastic Differential Equations


All the material developed in the previous sections was a necessary prelude
to be able to study the stochastic differential equation

    dx/dt = b(x) + σ(x)ξ                                       (11.5.1)

with the initial condition

    x(0) = x^0,                                                (11.5.2)

where

    b(x) = (b_1(x), ..., b_d(x))    and    σ(x) = (σ_{ij}(x)), i, j = 1, ..., d,

are given functions of x and

    x(t) = (x_1(t), ..., x_d(t))

is the unknown. In (11.5.1), the "white noise" vector

    ξ = (ξ_1, ..., ξ_d)

should be considered, from a mathematical point of view, as a pure symbol,
much like the letters "dt" in the notation for the derivative. However, from
an application standpoint, ξ denotes a very specific process consisting of
"infinitely" many independent random impulses as discussed in Section
11.1. We assume that the initial vector x^0 and the Wiener process {w(t)}


are independent. To examine the solution of equations (11.5.1) and (11.5.2),
we formally integrate (11.5.1) over the interval [0, t] to give

    x(t) = ∫_0^t b(x(s)) ds + ∫_0^t σ(x(s)) dw(s) + x^0.       (11.5.3)

Since the integrals that appear on the right-hand side of (11.5.3) are defined
from our considerations of the previous sections, we are close to a formal
definition of the solution.
First, however, it is necessary to choose a specific family of nonanticipative
σ-algebras {F_t}_{t≥0}. We may, for example, assume that F_t is the
smallest σ-algebra containing all events of the form {ω : w(u, ω) ∈ B} and
(x^0)^{-1}(B) for 0 ≤ u ≤ t and Borel sets B, that is, F_t is the smallest
σ-algebra with respect to which w(u), 0 ≤ u ≤ t, and x^0 are measurable.
This family is nonanticipative since conditions (1) and (2) of Definition
11.3.1 are evidently satisfied, and condition (3) follows from the fact that
{w(t)} is a process with independent increments and that x^0 and {w(t)}
are independent.
With this family of nonanticipative σ-algebras, we define the solution to
equations (11.5.1) and (11.5.2).

Definition 11.5.1. A continuous stochastic process {x(t)}_{t≥0} is called a
solution of equations (11.5.1) and (11.5.2) if:

(a) {x(t)} is nonanticipative, that is, it satisfies condition (4) of
    Definition 11.3.2; and

(b) for every t ≥ 0, equation (11.5.3) is satisfied with probability 1.
It is well known from the theory of ordinary differential equations that
it is necessary to assume some special conditions on the right-hand side in
order to guarantee the existence and uniqueness of a solution. It is
interesting that analogous conditions are also sufficient for stochastic
differential equations. Thus we have the following theorem.

Theorem 11.5.1. If b(x) and σ(x) satisfy the Lipschitz conditions

    |b(x) - b(y)| ≤ L|x - y|,        x, y ∈ R^d,               (11.5.4)

and

    |σ(x) - σ(y)| ≤ L|x - y|,        x, y ∈ R^d,               (11.5.5)

with some constant L, then the initial value problem, equations (11.5.1) and
(11.5.2), has a unique solution {x(t)}_{t≥0}.

Theorem 11.5.1 can be proved by the method of successive approximations,
as can the corresponding result for ordinary differential equations.
Thus a sequence {x^i(t)}_{t≥0} of stochastic processes would be defined with
x^0(t) ≡ x^0 and

    x^i(t) = ∫_0^t b(x^{i-1}(s)) ds + ∫_0^t σ(x^{i-1}(s)) dw(s) + x^0.

Then, using the Lipschitz conditions (11.5.4) and (11.5.5), it is possible to
majorize the series

    x(t) = Σ_{i=1}^∞ [x^i(t) - x^{i-1}(t)] + x^0

in L^2(Ω) norm by a convergent series, and to prove that x(t) is, indeed, the
desired solution. We omit the details as this proof is quite complicated, but
a full proof may be found in Gikhman and Skorokhod [1969].
An alternative way to generate an approximating solution is to use the
Euler linear extrapolation formula. Suppose that the solution x(t) is given
on the interval [0, to]. Then for values to+ tit larger than, but close to, t 0 ,
we write
x(to +tit)

= x(to) + b(x(to))tit + u(x(to))tiw,

(11.5.6)

where tiw = w(to +tit)- w(to). (Observe that for an ordinary differential
equation, this equation defines a ray tangent to the solution on [0, to] at
to.) In particular, when an interval [0, T] is given, we may take a partition

O=to<<tn=T
and define

(11.5.7)
where tix(ti) = x(ti) - x(ti-1), titi = ti -ti-l! tiwi = w(ti) - w(ti-1),
and x(to) = x 0
It is evident that in some respects this approach is much simpler than the
method of successive approximations, since no knowledge concerning the
Ito integral is even necessary. Indeed, S. Bernstein employed this technique
in his original investigations into stochastic differential equations, so we
will call equations (11.5.6) and (11.5.7) the Euler-Bernstein equations.
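The Euler–Bernstein scheme (11.5.6)–(11.5.7) is what is nowadays usually called the Euler–Maruyama method, and it is trivial to implement. A minimal sketch, with illustrative coefficients b(x) = −x and σ(x) = 0.5 (assumptions, not from the text):

```python
import numpy as np

# A minimal sketch of the Euler-Bernstein scheme (11.5.7), now usually
# called the Euler-Maruyama method.  The coefficients b(x) = -x and
# sigma(x) = 0.5 are illustrative assumptions.

def euler_bernstein(b, sigma, x0, T, n, rng):
    dt = T / n
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        dw = rng.normal(0.0, np.sqrt(dt))   # Delta w_i ~ N(0, Delta t_i)
        x[i + 1] = x[i] + b(x[i]) * dt + sigma(x[i]) * dw
    return x

rng = np.random.default_rng(0)
path = euler_bernstein(lambda x: -x, lambda x: 0.5, x0=1.0, T=1.0, n=1000, rng=rng)
```

Note that, as the text observes, no knowledge of the Ito integral is needed: each step only draws an independent Gaussian increment with variance Δt.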

Example 11.5.1. The oldest and best-known example of a stochastic
differential equation is probably the Langevin equation

    dx/dt = −bx + σξ,   x(0) = x⁰,   (11.5.8)

where x is a scalar and the coefficients b and σ are constant.

11. Stochastic Perturbation of Continuous Time Systems

By definition, the solution of (11.5.8) satisfies

    x(t) = −b ∫₀ᵗ x(s) ds + σ ∫₀ᵗ dw(s) + x⁰

or, using our calculations of Example 11.3.1,

    x(t) = −b ∫₀ᵗ x(s) ds + σw(t) + x⁰.   (11.5.9)

Equation (11.5.9) is rather easy to deal with since it does not contain an
Ito integral, and, since the one integral that does appear exists for almost
every ω taken separately, we may use the usual rules of calculus.
Setting

    z(t) = ∫₀ᵗ x(s) ds,   (11.5.10)

equation (11.5.9) becomes, for almost all ω,

    dz/dt = −bz(t) + σw(t) + x⁰.

For fixed ω, this is an ordinary differential equation and, thus,

    z(t) = ∫₀ᵗ e^{−b(t−s)}(σw(s) + x⁰) ds.   (11.5.11)

Combining equations (11.5.9) through (11.5.11), after some manipulation,
yields

    x(t) = x⁰e^{−bt} − bσ ∫₀ᵗ e^{−b(t−s)}w(s) ds + σw(t).

Using the integration by parts formula (11.4.8), this becomes

    x(t) = x⁰e^{−bt} + σ ∫₀ᵗ e^{−b(t−s)} dw(s).   (11.5.12)

From (11.5.12) and (11.4.2), it follows that

    E(x(t)) = e^{−bt}E(x⁰)

and, taking note of the independence of x⁰ and w(t),

    D²(x(t)) = e^{−2bt}D²(x⁰) + σ²D²(∫₀ᵗ e^{−b(t−s)} dw(s)).

With (11.4.3), this finally reduces to

    D²(x(t)) = e^{−2bt}D²(x⁰) + σ² ∫₀ᵗ e^{−2b(t−s)} ds
             = e^{−2bt}D²(x⁰) + (σ²/2b)[1 − e^{−2bt}].


Thus, for the Langevin equation,

    lim_{t→∞} D²(x(t)) = σ²/2b.

This asymptotic property of the variance is a special case of a more general
result that we will establish in the next section, where we examine uses of
the Fokker-Planck equation. □
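The moment formulas just derived are easy to confirm by simulating (11.5.8) with the Euler–Bernstein scheme of the previous section. All parameter values below are illustrative assumptions.

```python
import numpy as np

# Monte Carlo check of the Langevin moment formulas derived above:
#   E(x(t))   = e^{-bt} E(x0),
#   D^2(x(t)) = e^{-2bt} D^2(x0) + (sigma^2/2b)(1 - e^{-2bt}).
# All parameter values are illustrative assumptions.

b, sigma, t_end, n_steps, n_paths = 1.0, 1.0, 2.0, 400, 20000
dt = t_end / n_steps
rng = np.random.default_rng(1)

m0, v0 = 0.5, 0.04                         # mean and variance of x0
x = rng.normal(m0, np.sqrt(v0), size=n_paths)
for _ in range(n_steps):                   # Euler-Bernstein steps
    x = x - b * x * dt + sigma * rng.normal(0.0, np.sqrt(dt), size=n_paths)

mean_exact = np.exp(-b * t_end) * m0
var_exact = np.exp(-2 * b * t_end) * v0 + (sigma**2 / (2 * b)) * (1 - np.exp(-2 * b * t_end))
mean_err = abs(x.mean() - mean_exact)
var_err = abs(x.var() - var_exact)
```

Both errors should be small, limited only by the O(Δt) bias of the scheme and the Monte Carlo sampling noise.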

11.6 The Fokker-Planck (Kolmogorov Forward) Equation

The preceding sections were aimed at obtaining an understanding of the
dynamical system

    dx/dt = b(x) + σ(x)ξ   (11.6.1)

with

    x(0) = x⁰   (11.6.2)

under a stochastic perturbation ξ. This required us to first introduce the
abstract concept of nonanticipative σ-algebras. Then we had to define the
Ito integral, which is generally quite difficult to calculate. Finally we gave
the solution to equations (11.6.1)-(11.6.2) in terms of a general formula,
generated by the method of successive approximations, which contains
infinitely many Ito integrals.
In this section we extend this to a discussion of the density function of
the process x(t), which is a solution of (11.6.1) and (11.6.2). This density
is defined as the function u(t, x) that satisfies

    prob{x(t) ∈ B} = ∫_B u(t, z) dz.   (11.6.3)

The uniqueness of u(t, x) follows immediately from Proposition 2.2.1, but
the existence requires some regularity conditions on the coefficients b(x)
and σ(x), which are given in the following. We will also show how u(t, x)
can be found without any knowledge concerning the solution x(t) of the
stochastic differential equation (11.6.1) with (11.6.2). It will turn out that
u(t, x) is given by the solution of a partial differential equation, known
as the Fokker-Planck (or Kolmogorov forward) equation, and that it is
completely specified by the coefficients b(x) and σ(x) of equation (11.6.1).
Now set

    a_{ij}(x) = Σ_{k=1}^d σ_{ik}(x)σ_{jk}(x).   (11.6.4)


From (11.6.4) it is clear that a_{ij} = a_{ji} and, thus, the quadratic form

    Σ_{i,j=1}^d a_{ij}(x)λ_iλ_j   (11.6.5)

is symmetric. Further, since

    Σ_{i,j=1}^d a_{ij}(x)λ_iλ_j = Σ_{k=1}^d (Σ_{i=1}^d σ_{ik}(x)λ_i)² ≥ 0,

(11.6.5) is nonnegative.
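In matrix notation (11.6.4) is simply a = σσᵀ, which makes the symmetry and nonnegativity immediate, since Σ a_{ij}λ_iλ_j = |σᵀλ|². A quick numerical illustration with an arbitrary matrix σ (an assumption, not from the text):

```python
import numpy as np

# Equation (11.6.4) says a_ij = sum_k sigma_ik sigma_jk, i.e. a = sigma sigma^T.
# The matrix sigma below is an arbitrary illustrative choice.

sigma = np.array([[1.0, 0.5, 0.0],
                  [0.0, 2.0, 0.3],
                  [0.2, 0.0, 1.5]])
a = sigma @ sigma.T

symmetric = np.allclose(a, a.T)
# Nonnegativity of the form sum a_ij l_i l_j = |sigma^T l|^2 shows up as
# nonnegative eigenvalues of a:
eigs = np.linalg.eigvalsh(a)
```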
We are now ready to state the main theorem of this section, which gives
the Fokker-Planck equation.
Theorem 11.6.1. If the functions a_{ij}, ∂a_{ij}/∂x_k, ∂²a_{ij}/∂x_k∂x_l, b_i, ∂b_i/∂x_j,
∂u/∂t, ∂u/∂x_i, and ∂²u/∂x_i∂x_j are continuous for t > 0 and x ∈ R^d, and
if b_i, a_{ij} and their first derivatives are bounded, then u(t, x) satisfies the
equation

    ∂u/∂t = ½ Σ_{i,j=1}^d ∂²[a_{ij}(x)u]/∂x_i∂x_j − Σ_{i=1}^d ∂[b_i(x)u]/∂x_i,   t > 0, x ∈ R^d.   (11.6.6)

Equation (11.6.6) is called the Fokker-Planck equation or Kolmogorov
forward equation.


Remark 11.6.1. In Theorem 11.6.1 we assumed ∂b_i/∂x_j and ∂σ_{ij}/∂x_k
were bounded, since this implies the Lipschitz conditions (11.5.4) and (11.5.5)
which, in turn, guarantee the existence and uniqueness of the solution to
the stochastic equations (11.6.1) with (11.6.2). In order to assure the existence
and differentiability of u, it is sufficient, for example, that a_{ij} and
b_i, together with their derivatives up to the third order, are continuous,
bounded, and satisfy the uniform parabolicity condition (11.7.5). □

Proof of Theorem 11.6.1. We will use the Euler-Bernstein approximation formula (11.5.6) in the proof of this theorem as it allows us to derive
(11.6.6) in an extremely simple and transparent fashion.
Thus let t₀ > 0 be arbitrary, and let x(t) be the solution to equations
(11.6.1) and (11.6.2) on the interval [0, t₀]. Define x(t) on [t₀, t₀ + ε] by

    x(t₀ + Δt) = x(t₀) + b(x(t₀))Δt + σ(x(t₀))[w(t₀ + Δt) − w(t₀)],   (11.6.7)

where 0 ≤ Δt ≤ ε and ε is a positive number. We assume (and this is


the only additional assumption needed for simplifying the proof) that x(t),


extended according to (11.6.7), has a density u(t, x) for 0 ≤ t ≤ t₀ + ε and
that for t = t₀, u_t(t, x) exists. Observe that at the point t = t₀, u(t, x) (and
u_t(t, x)) is simultaneously the density (and its derivative) for the exact and
for the extended solution.

Now let h: R^d → R be a C³ function with compact support. We wish to
calculate the mathematical expectation of h(x(t₀ + Δt)). First note that,
since u(t₀ + Δt, x) is the density of x(t₀ + Δt), we have, by (10.2.2),

    E(h(x(t₀ + Δt))) = ∫_{R^d} h(x)u(t₀ + Δt, x) dx.   (11.6.8)

However, using equation (11.6.7), we may write the random variable h(x(t₀ + Δt))
in the form

    h(x(t₀ + Δt)) = h(Q(x(t₀), w(t₀ + Δt) − w(t₀))),   (11.6.9)

where

    Q(x, y) = x + b(x)Δt + σ(x)y.

The variables x(t₀) and Δw(t₀) = w(t₀ + Δt) − w(t₀) are independent for
each 0 ≤ Δt ≤ ε, since x(t₀) is F_{t₀}-measurable and Δw(t₀) is independent
with respect to F_{t₀}. Thus the random vector (x(t₀), Δw(t₀)) has the joint
density

    u(t₀, x)g(Δt, y),


where g is given by (11.1.3). As a consequence, the mathematical expectation
of (11.6.9) is given by

    ∫_{R^d} ∫_{R^d} h(Q(x, y))u(t₀, x)g(Δt, y) dx dy
        = ∫_{R^d} ∫_{R^d} h(x + b(x)Δt + σ(x)y)u(t₀, x)g(Δt, y) dx dy.

From this and (11.6.8), we obtain

    ∫_{R^d} h(x)u(t₀ + Δt, x) dx = ∫_{R^d} ∫_{R^d} h(x + b(x)Δt + σ(x)y)u(t₀, x)g(Δt, y) dx dy.
By developing h in a Taylor expansion, we have

    ∫_{R^d} h(x)u(t₀ + Δt, x) dx
        = ∫_{R^d} ∫_{R^d} { h(x) + Σ_{i=1}^d (∂h/∂x_i)[b_i(x)Δt + (σ(x)y)_i]
          + ½ Σ_{i,j=1}^d (∂²h/∂x_i∂x_j)[b_i(x)Δt + (σ(x)y)_i][b_j(x)Δt + (σ(x)y)_j]
          + r(Δt) } u(t₀, x)g(Δt, y) dx dy,   (11.6.10)


where r(Δt) denotes the remainder and (σ(x)y)_i is the ith coordinate of
the vector σ(x)y.

On the right-hand side of (11.6.10) we have a finite collection of integrals
that we will first integrate with respect to y. Observe that

    (σ(x)y)_i(σ(x)y)_j = Σ_{k,l=1}^d σ_{ik}(x)σ_{jl}(x)y_k y_l.

By equation (11.2.3),

    ∫_{R^d} g(Δt, y) dy = 1,

whereas from (11.2.4),

    ∫_{R^d} (σ(x)y)_i g(Δt, y) dy = 0.

Finally, from (11.2.5), we have

    ∫_{R^d} (σ(x)y)_i(σ(x)y)_j g(Δt, y) dy = a_{ij}(x)Δt,

where a_{ij} is as defined in (11.6.4). By combining all of these results, we can


write equation (11.6.10) as

    ∫_{R^d} h(x)[u(t₀ + Δt, x) − u(t₀, x)] dx
        = Δt ∫_{R^d} { Σ_{i=1}^d (∂h/∂x_i)b_i(x) + ½ Σ_{i,j=1}^d (∂²h/∂x_i∂x_j)a_{ij}(x) } u(t₀, x) dx
          + R(Δt),   (11.6.11)

where the new remainder R(Δt) is

    R(Δt) = ½ ∫_{R^d} Σ_{i,j=1}^d (∂²h/∂x_i∂x_j)b_i(x)b_j(x)(Δt)² u(t₀, x) dx
            + ∫_{R^d} ∫_{R^d} r(Δt)u(t₀, x)g(Δt, y) dx dy.   (11.6.12)

It is straightforward to show that R(Δt)/Δt goes to zero as Δt → 0. The
first integral on the right-hand side of (11.6.12) contains (Δt)², so this is
easy. The second integral may be evaluated by using the classical formula
for the remainder r(Δt):

    r(Δt) = (1/3!) Σ_{i,j,k=1}^d [b_iΔt + (σy)_i][b_jΔt + (σy)_j][b_kΔt + (σy)_k] ∂³h/∂x_i∂x_j∂x_k.


The third derivatives of h are evaluated at some intermediate point z, which
is irrelevant because we only use the fact that these derivatives are bounded,
since h is of compact support.

All of the components appearing in r(Δt) can be evaluated by terms of
the form

    M(Δt)³,   M(Δt)²|y_i|,   M(Δt)|y_i y_j|,   M|y_i y_j y_k|,

where M is a constant. To evaluate R(Δt) we must integrate these terms
with respect to x and y. Using

    ∫_{−∞}^∞ |z|ⁿ g(Δt, z) dz = a_n(Δt)^{n/2},

where the constants a_n depend only on n, integration of M(Δt)³ again
gives M(Δt)³ since u(t₀, x) and g(Δt, y) are both densities. Integration
of M(Δt)²|y_i| gives M(Δt)²C_i(Δt)^{1/2}, where C_i = a₁. Analogously,
integration of the third term gives M(Δt)C_{ij}(Δt), whereas the fourth yields
MC_{ijk}(Δt)^{3/2}, where C_{ij} depends on a₁ and a₂, and C_{ijk} depends on a₁,
a₂, and a₃. All these terms divided by Δt approach zero as Δt → 0.
Returning to (11.6.11), dividing by Δt and passing to the limit as Δt → 0,
we obtain

    ∫_{R^d} h(x)u_t(t₀, x) dx
        = ∫_{R^d} { Σ_{i=1}^d (∂h/∂x_i)b_i(x) + ½ Σ_{i,j=1}^d (∂²h/∂x_i∂x_j)a_{ij}(x) } u(t₀, x) dx.   (11.6.13)
Since h has compact support, we may easily integrate the right-hand side of
(11.6.13) by parts. Doing this and shifting all terms to the left-hand side,
we finally have

    ∫_{R^d} h(x) { u_t(t₀, x) − ½ Σ_{i,j=1}^d ∂²[a_{ij}(x)u]/∂x_i∂x_j + Σ_{i=1}^d ∂[b_i(x)u]/∂x_i } dx = 0.   (11.6.14)

Since h(x) is a C³ function with compact support, but otherwise arbitrary,
the integral condition (11.6.14), which is satisfied for every such h, implies
that the term in braces vanishes. This completes the proof that u(t₀, x)
satisfies equation (11.6.6).
Remark 11.6.2. To deal with the stochastic differential equation (11.6.1)
with (11.6.2), we were forced to introduce many abstract and difficult
concepts. It is ironic that, once we pass to a consideration of the density
function u(t, x) of the random process x(t), all this material becomes
unnecessary, as we must only insert the appropriate coefficients a_{ij} and b_i
into the Fokker-Planck equation (11.6.6)! □


11.7 Properties of the Solutions of the Fokker-Planck Equation

As we have shown in the previous section, the density function u(t, x) of the
solution x(t) of the stochastic differential equation (11.6.1) with (11.6.2)
satisfies the partial differential equation (11.6.6). Moreover, if the initial
condition x(0) = x⁰, which is a random variable, has a density f, then
u(0, x) = f(x). Thus, to understand the behavior of the densities u(t, x),
we must study the initial-value (Cauchy) problem:

    ∂u/∂t = ½ Σ_{i,j=1}^d ∂²[a_{ij}(x)u]/∂x_i∂x_j − Σ_{i=1}^d ∂[b_i(x)u]/∂x_i,   t > 0, x ∈ R^d,   (11.7.1)

    u(0, x) = f(x).   (11.7.2)

Observe that equation (11.7.1) is of second order and may be rewritten in
the form

    ∂u/∂t = ½ Σ_{i,j=1}^d a_{ij}(x) ∂²u/∂x_i∂x_j + Σ_{i=1}^d c_i(x) ∂u/∂x_i + c(x)u,   (11.7.3)

where

    c_i(x) = Σ_{j=1}^d ∂a_{ij}(x)/∂x_j − b_i(x)

and

    c(x) = ½ Σ_{i,j=1}^d ∂²a_{ij}(x)/∂x_i∂x_j − Σ_{i=1}^d ∂b_i(x)/∂x_i.   (11.7.4)

As was shown in Section 11.6, the quadratic form

    Σ_{i,j=1}^d a_{ij}(x)λ_iλ_j,

corresponding to the term of (11.7.3) with second-order derivatives, is always
nonnegative. We will assume that the somewhat stronger inequality

    Σ_{i,j=1}^d a_{ij}(x)λ_iλ_j ≥ ρ Σ_{i=1}^d λ_i²,   (11.7.5)

where ρ is a positive constant, holds. This is called the uniform parabolicity
condition.

It is known that, if the coefficients a_{ij}, c_i, and c are smooth and satisfy
the growth conditions (11.7.6), then the classical solution of the Cauchy
problem, equations (11.7.2) and (11.7.3), is unique and given by the integral
formula

    u(t, x) = ∫_{R^d} Γ(t, x, y)f(y) dy,   (11.7.7)

where the kernel Γ, called the fundamental solution, is independent of
the initial density function f.
However, we are more interested in studying equation (11.7.1) than
(11.7.3), which plays an ancillary role in our considerations. To this end, we
start with the following.

Definition 11.7.1. Let f: R^d → R be a continuous function. A function
u(t, x), t > 0, x ∈ R^d, is called a classical solution of equation (11.7.1)
with the initial condition (11.7.2) if it satisfies the following conditions:

(a) For every T > 0 there is a c > 0 and an a > 0 such that

    |u(t, x)| ≤ ce^{a|x|²}   for 0 < t ≤ T, x ∈ R^d;

(b) u(t, x) has continuous derivatives u_t, u_{x_i}, u_{x_ix_j} and satisfies equation
(11.7.3) for every t > 0, x ∈ R^d; and

(c) lim_{t→0} u(t, x) = f(x).   (11.7.8)

Condition (a) is necessary because, for functions which grow faster than
e^{a|x|²}, the Cauchy problem, even for the heat equation u_t = ½σ²u_{xx}, is not
uniquely determined. Condition (b) is obvious, and (c) is necessary since
(11.7.3) is satisfied only for t > 0 and, thus, the values of u(t, x) for t > 0
must be related to the initial condition u(0, x) = f(x).

The existence and uniqueness of solutions for the initial value (Cauchy)
problems (11.7.1)-(11.7.2) or (11.7.3)-(11.7.2) are given in every standard
textbook on parabolic equations. General results may be found in Friedman
[1964], Eidelman [1969], Chabrowski [1970], and Bessala [1975].
To state a relatively simple existence and uniqueness theorem, we require
the next definition.
Definition 11.7.2. We say that the coefficients a_{ij} and b_i of equation
(11.7.1) are regular for the Cauchy problem if they are C⁴ functions
such that the corresponding coefficients a_{ij}, c_i, and c of equation (11.7.3)
satisfy the uniform parabolicity condition (11.7.5) and the growth conditions
(11.7.6).


The theorem that ensures the existence and uniqueness of classical solutions
may be stated as follows.

Theorem 11.7.1. Assume that the coefficients a_{ij} and b_i are regular for the
Cauchy problem and that f is a continuous function satisfying the inequality
|f(x)| ≤ ce^{a|x|} with constants c > 0 and a > 0. Then there is a unique
classical solution of (11.7.1)-(11.7.2), which is given by (11.7.7). The kernel
Γ(t, x, y), defined for t > 0, x, y ∈ R^d, is continuous and differentiable with
respect to t, is twice differentiable with respect to x_i, and satisfies (11.7.3)
as a function of (t, x) for every fixed y. Further, in every strip 0 < t ≤ T,
x ∈ R^d, |y| ≤ r, Γ satisfies the inequalities

    0 < Γ(t, x, y) ≤ φ(t, x − y),   |∂Γ/∂t| ≤ φ(t, x − y),
    |∂Γ/∂x_i| ≤ φ(t, x − y),   |∂²Γ/∂x_i∂x_j| ≤ φ(t, x − y),   (11.7.9)

where

    φ(t, x − y) = kt^{−(n+2)/2} exp[−δ|x − y|²/t]   (11.7.10)

and the constants δ and k depend on T and r.

The explicit construction of the fundamental solution Γ for general
coefficients a_{ij}, b_i, and c is usually impossible. It is easy only for some special
cases, such as the heat equation,

    u_t = (σ²/2)u_{xx}.

In this case, Γ is the familiar kernel

    Γ(t, x, y) = (1/√(2πσ²t)) exp[−(x − y)²/2σ²t].
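For the heat equation the kernel is explicit, so the defining properties of a fundamental solution can be checked directly. The sketch below verifies by quadrature that the kernel integrates to one and satisfies the semigroup (Chapman–Kolmogorov) identity Γ(t + s, x, y) = ∫ Γ(t, x, z)Γ(s, z, y) dz; the parameter values and grid are illustrative assumptions.

```python
import numpy as np

# For the heat equation u_t = (sigma^2/2) u_xx the fundamental solution is
#   Gamma(t, x, y) = (2 pi sigma^2 t)^{-1/2} exp[-(x - y)^2 / (2 sigma^2 t)].
# We check by quadrature that it is a density in y and satisfies the
# semigroup identity Gamma(t+s, x, y) = \int Gamma(t, x, z) Gamma(s, z, y) dz.
# sigma, t, s, x, y and the grid are illustrative assumptions.

sigma = 0.8

def kernel(t, x, y):
    return np.exp(-(x - y)**2 / (2 * sigma**2 * t)) / np.sqrt(2 * np.pi * sigma**2 * t)

t, s, x, y = 0.7, 0.4, 0.3, -0.5
z = np.linspace(-12.0, 12.0, 20001)
dz = z[1] - z[0]

mass = np.sum(kernel(t, x, z)) * dz                     # should be ~1
lhs = kernel(t + s, x, y)
rhs = np.sum(kernel(t, x, z) * kernel(s, z, y)) * dz    # Chapman-Kolmogorov
gap = abs(lhs - rhs)
```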

Nevertheless, the properties (11.7.9) of Γ given in Theorem 11.7.1 allow us
to deduce some very interesting properties of the solution u(t, x).

Let f be a continuous function with bounded support, say on the ball
B_r = {x: |x| ≤ r}, and let u be the corresponding solution of equations
(11.7.2) and (11.7.3). Then, from the first inequality of (11.7.9), we have

    |u(t, x)| ≤ ∫_{B_r} Γ(t, x, y)|f(y)| dy ≤ M ∫_{B_r} φ(t, x − y) dy,

where M = max_y |f|. Further, since |x − y|² ≥ ½|x|² − r² for |y| ≤ r, we have


and, consequently,

    |u(t, x)| ≤ Kt^{−(n+2)/2} exp(−½δ|x|²/t),

where K = kMe^{δr²}|B_r|. By using the remaining inequalities of (11.7.9), we
may derive analogous inequalities for the derivatives of u, as summarized in

    |u_t(t, x)|, |u_{x_i}(t, x)|, |u_{x_ix_j}(t, x)| ≤ Kt^{−(n+2)/2} exp(−½δ|x|²/t).   (11.7.11)

These inequalities are quite important, for they allow us to multiply equation
(11.7.3) by any function that increases more slowly than exp(−½δ|x|²)
decreases (e.g., x, x², ..., e^{rx}), and then to integrate term by term to, for
example, calculate the moments of u(t, x).

Example 11.7.1. Again consider the Langevin equation

    dx/dt = −bx + σξ

first introduced in Example 11.5.1. The corresponding Fokker-Planck
equation is

    ∂u/∂t = ½σ² ∂²u/∂x² + b ∂(xu)/∂x.   (11.7.12)

Multiply (11.7.12) by xⁿ and integrate to obtain

    d/dt ∫_{−∞}^∞ xⁿu dx = ½σ² ∫_{−∞}^∞ xⁿ (∂²u/∂x²) dx + b ∫_{−∞}^∞ xⁿ ∂(xu)/∂x dx.

Since, by our foregoing discussion, u and its derivatives decay exponentially
as |x| → ∞, we can integrate by parts to give

    d/dt ∫_{−∞}^∞ xⁿu dx = ½σ²n(n − 1) ∫_{−∞}^∞ x^{n−2}u dx − nb ∫_{−∞}^∞ xⁿu dx.   (11.7.13)

Let

    m_n(t) = ∫_{−∞}^∞ xⁿu(t, x) dx

be the nth moment of the function u(t, x). From (11.7.13) we thus have an
infinite system of ordinary differential equations in the moments,

    dm₀/dt = 0,   dm₁/dt = −bm₁,
    dm_n/dt = ½σ²n(n − 1)m_{n−2} − nbm_n,   n ≥ 2,

which can be solved sequentially. Assuming that the initial function f is a
density, we have

    m₀(t) = m₀(0) = ∫_{−∞}^∞ f dx = 1,



mt(t) = C1e-bt,

Ct

= mt(O),

m2(t)

= ~b + C2e- 2bt,

ms(t)

= Cse-3bt + 3~~2 (e-bt -

C2

= m2(0) - ~b,
e-3bt),

Cs

= ms(O).

Successive formulas for higher moments become progressively more complicated. However, it is straightforward to demonstrate inductively that
12

lim mn(t)
t-+oo

= { 1 3 5 .. (n -1) (~f ,
0,

for n even
for n odd.

Thus the limiting moments are the same as the moments of the Gaussian
density
At the end of the next section it will become clear that not only do the
moments of the solution of equation (11.7.12) converge to the moments of
the Gaussian density 9ub 1 but also that u(t,x)-+ 9ub(x) as t-+ oo. 0
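The moment system is also convenient to integrate numerically. The sketch below applies explicit Euler steps to the equations for m₀, …, m₆ and compares the long-time values with the limits stated above; the parameters, step size, and initial moments are illustrative assumptions.

```python
import numpy as np

# Explicit Euler integration of the moment equations
#   m0' = 0,  m1' = -b m1,  mn' = (1/2) sigma^2 n (n-1) m_{n-2} - n b mn,
# followed by a comparison with the limits 1*3*...*(n-1)(sigma^2/2b)^{n/2}
# (n even) and 0 (n odd).  Parameters and initial moments are illustrative.

b, sigma = 1.0, 1.0
n_max, dt, T = 6, 1e-3, 20.0
m = np.array([1.0, 0.7, 0.9, 0.4, 1.1, 0.2, 2.0])   # m_0(0), ..., m_6(0)

for _ in range(int(T / dt)):
    dm = np.zeros_like(m)
    dm[1] = -b * m[1]
    for n in range(2, n_max + 1):
        dm[n] = 0.5 * sigma**2 * n * (n - 1) * m[n - 2] - n * b * m[n]
    m = m + dm * dt

s2 = sigma**2 / (2 * b)
limits = np.array([1.0, 0.0, s2, 0.0, 3 * s2**2, 0.0, 15 * s2**3])
err = np.max(np.abs(m - limits))
```

The triangular structure (each m_n depends only on m_{n−2}) is exactly what allows the system to be solved sequentially.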

Remark 11.7.1. A comparison of the discrete and continuous time systems
with stochastic perturbations considered here reveals a close analogy between
the dynamical laws, equations (10.4.1) and (10.5.1), and the stochastic
differential equation (11.6.1), as well as between equations (10.4.3) and
(10.5.4), for the evolution of densities, and the Cauchy problem, equations
(11.7.1) and (11.7.2). □

11.8 Semigroups of Markov Operators Generated by Parabolic Equations
In this section we examine the solutions of the Fokker-Planck equation as
a flow of densities governed by a semigroup of Markov operators. We start
with the following definition.

Definition 11.8.1. Assume that the coefficients a_{ij} and b_i of (11.7.1) are
regular for the Cauchy problem. Then, for every f ∈ L¹, not necessarily
continuous, the function

    u(t, x) = ∫_{R^d} Γ(t, x, y)f(y) dy   (11.8.1)

will be called a generalized solution of the Cauchy problem (11.7.1) and
(11.7.2).

Since Γ(t, x, y), as a function of (t, x), satisfies (11.7.1) for t > 0, u(t, x)
has the same property. However, if f is discontinuous, then condition
(11.7.8) might not hold at a point of discontinuity.


Having a generalized solution, we define a family of operators {P_t}_{t≥0}
by

    P₀f(x) = f(x),
    P_tf(x) = ∫_{R^d} Γ(t, x, y)f(y) dy,   t > 0.   (11.8.2)

We will now show that, from the properties of Γ stated in Theorem 11.7.1,
we obtain the following corollary.

Corollary 11.8.1. The family of operators {P_t}_{t≥0} is a stochastic semigroup,
that is,

(1) P_t(λ₁f₁ + λ₂f₂) = λ₁P_tf₁ + λ₂P_tf₂ for f₁, f₂ ∈ L¹;

(2) P_tf ≥ 0 for f ≥ 0;

(3) ‖P_tf‖ = ‖f‖ for f ≥ 0;

(4) P_{t₁+t₂}f = P_{t₁}(P_{t₂}f) for f ∈ L¹.

Proof. Properties (1) and (2) follow immediately from equation (11.8.1),
since the right-hand side is an integral operator with a positive kernel.

To verify (3), first assume that f is continuous with compact support.
By multiplying the Fokker-Planck equation by a C² bounded function h(x)
and integrating, we obtain

    ∫_{R^d} h(x)u_t dx = ∫_{R^d} h(x) { ½ Σ_{i,j=1}^d ∂²(a_{ij}u)/∂x_i∂x_j − Σ_{i=1}^d ∂(b_iu)/∂x_i } dx,

and integration by parts gives

    ∫_{R^d} h(x)u_t dx = ∫_{R^d} { ½ Σ_{i,j=1}^d a_{ij}(x) ∂²h/∂x_i∂x_j + Σ_{i=1}^d b_i(x) ∂h/∂x_i } u dx.

Setting h = 1, we have

    d/dt ∫_{R^d} u dx = ∫_{R^d} u_t dx = 0.

Since u ≥ 0 for f ≥ 0, we have

    d‖u‖/dt = 0   for t > 0.

Further, the initial condition (11.7.8), inequality (11.7.11), and the
boundedness of u imply, by the Lebesgue dominated convergence theorem, that
‖P_tf‖ is continuous at t = 0. This proves that ‖P_tf‖ is constant for all
t ≥ 0. If f ∈ L¹ is an arbitrary function, we can choose a sequence {f_k} of


continuous functions with compact support that converges strongly to f.
Now,

    | ‖P_tf‖ − ‖f‖ | ≤ | ‖P_tf‖ − ‖P_tf_k‖ | + | ‖P_tf_k‖ − ‖f_k‖ | + ‖f_k − f‖.   (11.8.3)

Since, as we just showed, P_t preserves the norm, the term ‖P_tf_k‖ − ‖f_k‖ is
zero. To evaluate the first term, note that

    | ‖P_tf‖ − ‖P_tf_k‖ | ≤ ‖P_tf − P_tf_k‖ ≤ ∫_{R^d} Γ(t, x, y)‖f − f_k‖ dy ≤ M_t‖f − f_k‖,

where M_t = sup_{x,y} Γ. Thus the right-hand side of (11.8.3) converges
to zero as k → ∞. Since the left-hand side is independent of k, we have
‖P_tf‖ = ‖f‖, which completes the proof of (3). As we know, conditions
(1)-(3) imply that ‖P_tf‖ ≤ ‖f‖ for all f and, thus, the operators P_t are
continuous.
Finally, to prove (4), again assume f is a continuous function with compact
support and set ū(t, x) = u(t + t₁, x). An elementary calculation shows
that ū(t, x) satisfies the Fokker-Planck equation with the initial condition
ū(0, x) = u(t₁, x). Thus, by the uniqueness of solutions to the Fokker-Planck
equation,

    ū(t, x) = P_t(P_{t₁}f)(x)

and, at the same time,

    ū(t, x) = u(t + t₁, x) = P_{t+t₁}f(x).

From these it is immediate that

    P_{t+t₁}f = P_t(P_{t₁}f),

which proves (4) for all continuous f with compact support. If f ∈ L¹
is arbitrary, we again pick a sequence {f_k} of continuous functions with
compact supports that converges strongly to f and for which

    P_{t₂+t₁}f_k = P_{t₂}(P_{t₁}f_k)

holds. Since the P_t have been shown to be continuous, we may pass to the
limit k → ∞ and obtain (4) for arbitrary f.
Remark 11.8.1. In developing the material of Theorems 11.6.1, 11.7.1,
and Corollary 11.8.1, we have passed from the description of u(t, x) as the
density of the random variable x(t), through a derivation of the Fokker-Planck
equation for u(t, x), and then shown that the solutions of the Fokker-Planck
equation define a stochastic semigroup {P_t}_{t≥0}. This semigroup describes
the behavior of the semi-dynamical system, equations (11.6.1) and
(11.6.2). In actuality, our proof of Theorem 11.6.1 shows that the right-hand
side of the Fokker-Planck equation is the infinitesimal operator for
P_tf, although our results were not stated in this fashion. Further, Theorem
11.7.1 and Corollary 11.8.1 give the construction of the semigroup
generated by this infinitesimal operator. □

Remark 11.8.2. Observe that, when the stochastic perturbation disappears
(σ_{ij} ≡ 0), then the Fokker-Planck equation reduces to the Liouville
equation and {P_t} is simply the semigroup of Frobenius-Perron operators
corresponding to the dynamical system

    dx_i/dt = b_i(x),   i = 1, ..., d.

11.9 Asymptotic Stability of Solutions of the Fokker-Planck Equation
As we have seen, the fundamental solution Γ may be extremely useful.
However, since a formula for Γ is not available in the general case, it is
not of much use in the determination of asymptotic stability properties of
u(t, x). Thus, we would like to have other techniques available, and in this
section we develop the use of Liapunov functions for this purpose, following
Dlotko and Lasota [1983].

Here, by a Liapunov function we mean any function V: R^d → R that
satisfies the following four properties:

(1) V(x) ≥ 0 for all x;

(2) lim_{|x|→∞} V(x) = ∞;

(3) V has continuous derivatives ∂V/∂x_i, ∂²V/∂x_i∂x_j, i, j = 1, ..., d;
and

(4) V(x) ≤ ρe^{δ|x|},   |∂V/∂x_i| ≤ ρe^{δ|x|},   |∂²V/∂x_i∂x_j| ≤ ρe^{δ|x|}   (11.9.1)

for some constants ρ, δ.


Conditions (1)-(4) are not very restrictive; for example, any positive definite
form of even order m,

    V(x) = Σ_{i₁,...,i_m=1}^d a_{i₁...i_m} x_{i₁} ··· x_{i_m},

is a Liapunov function. Our main purpose will be to use a Liapunov function


V that satisfies the differential inequality

    ½ Σ_{i,j=1}^d a_{ij}(x) ∂²V/∂x_i∂x_j + Σ_{i=1}^d b_i(x) ∂V/∂x_i ≤ −αV(x) + β   (11.9.2)

with positive constants α and β. Specifically, we can state the following
theorem.

Theorem 11.9.1. Assume that the coefficients a_{ij} and b_i of equation
(11.7.1) are regular for the Cauchy problem and that there is a Liapunov
function V satisfying (11.9.2). Then the stochastic semigroup {P_t}_{t≥0}
defined by the generalized solution of the Fokker-Planck equation and given
in (11.8.2) is asymptotically stable.

Proof. The proof is similar to that of Theorem 5.7.1. First pick a continuous
density f with compact support and then consider the mathematical
expectation of V calculated with respect to the solution u of equations
(11.7.1) and (11.7.2). We have

    E(V | u) = ∫_{R^d} V(x)u(t, x) dx.   (11.9.3)

By inequalities (11.7.11) and (11.9.1), u(t, x)V(x) and u_t(t, x)V(x) are
integrable. Thus, differentiation of (11.9.3) with respect to t gives

    dE(V | u)/dt = ∫_{R^d} V(x)u_t(t, x) dx
        = ∫_{R^d} V(x) { ½ Σ_{i,j=1}^d ∂²[a_{ij}(x)u]/∂x_i∂x_j − Σ_{i=1}^d ∂[b_i(x)u]/∂x_i } dx.

Integrating by parts and using the fact that the products uV, u_{x_i}V, and
uV_{x_i} vanish exponentially as |x| → ∞, we obtain

    dE(V | u)/dt = ∫_{R^d} { ½ Σ_{i,j=1}^d a_{ij}(x) ∂²V/∂x_i∂x_j + Σ_{i=1}^d b_i(x) ∂V/∂x_i } u(t, x) dx.

From this and inequality (11.9.2), we have

    dE(V | u)/dt ≤ −αE(V | u) + β.


To solve this differential inequality, multiply through by e^{αt}, which gives

    d/dt [E(V | u)e^{αt}] ≤ βe^{αt}.

Since E(V | u) at t = 0 equals E(V | f), integration on the interval [0, t]
yields

    E(V | u)e^{αt} − E(V | f) ≤ (β/α)(e^{αt} − 1)

or

    E(V | u) ≤ e^{−αt}E(V | f) + (β/α)(1 − e^{−αt}).

Since E(V | f) is finite, we can find a t₀ = t₀(f) such that

    E(V | u) ≤ (β/α) + 1   for t ≥ t₀.

Now let G_q = {x: V(x) < q}. From the Chebyshev inequality (5.7.9), we
have

    ∫_{G_q} u(t, x) dx ≥ 1 − E(V | u)/q,

and taking q > 1 + (β/α) gives

    ∫_{G_q} u(t, x) dx ≥ 1 − (1/q)[1 + (β/α)] = ε > 0

for t ≥ t₀. Since V(x) → ∞ as |x| → ∞, there is an r > 0 such that
V(x) ≥ q for |x| ≥ r. Thus the set G_q is contained in the ball B_r and, as a
consequence,

    u(t, x) = ∫_{R^d} Γ(1, x, y)u(t − 1, y) dy ≥ ∫_{B_r} Γ(1, x, y)u(t − 1, y) dy
        ≥ inf_{|y|≤r} Γ(1, x, y) ∫_{B_r} u(t − 1, y) dy ≥ ε inf_{|y|≤r} Γ(1, x, y)   (11.9.4)

for t ≥ t₀ + 1, x ∈ R^d.

Since Γ(1, x, y) is strictly positive and continuous, the function

    h(x) = ε inf_{|y|≤r} Γ(1, x, y)

is also positive. From (11.9.4), we have

    P_tf(x) = u(t, x) ≥ h(x)   for t ≥ t₀ + 1,

which shows that {P_t} has a nontrivial lower-bound function. Hence, by
Theorem 7.4.1, the proof is complete.
When {P_t} is asymptotically stable, the next problem is to determine
the limiting function

    lim_{t→∞} P_tf(x) = u_*(x),   f ∈ D.   (11.9.5)


This may be accomplished by using the following proposition.

Proposition 11.9.1. If the assumptions of Theorem 11.9.1 are satisfied,
then the limiting function u_* of (11.9.5) is the unique density satisfying the
elliptic equation

    ½ Σ_{i,j=1}^d ∂²[a_{ij}(x)u]/∂x_i∂x_j − Σ_{i=1}^d ∂[b_i(x)u]/∂x_i = 0.   (11.9.6)

Proof. Assume that u(x) is a solution of (11.9.6). To prove the uniqueness
of u(x), note that, because u is a solution of (11.9.6), it follows that u(t, x) =
u(x) is a time-independent solution of the Fokker-Planck equation (11.7.1).
Thus, by Theorem 11.9.1,

    u(x) = lim_{t→∞} u(t, x) = u_*(x)

and u_*(x) = u(x) is unique.

Next we show that u_* satisfies (11.9.6). Let f ∈ D(R^d) be a continuous
function with compact support. We have u(t + s, x) = P_t u(s, x), or

    u(t + s, x) = ∫_{R^d} Γ(t, x, y)u(s, y) dy.

Passing to the limit as s → ∞, we obtain

    u_*(x) = ∫_{R^d} Γ(t, x, y)u_*(y) dy.

Since Γ is a fundamental solution of the Fokker-Planck equation, u_*(x) is
also a solution, and, since u_*(x) is independent of t, it must satisfy equation
(11.9.6). Thus the proof is complete.

Example 11.9.1. Again consider the Langevin equation

    dx/dt = −bx + σξ

and the corresponding Fokker-Planck equation

    ∂u/∂t = ½σ² ∂²u/∂x² + b ∂(xu)/∂x.

Inequality (11.9.2) becomes

    ½σ² ∂²V/∂x² − bx ∂V/∂x ≤ −αV + β,

which is satisfied with V(x) = x², α = 2b, and β = σ². Thus all solutions
u(t, x), such that u(0, x) = f(x) is a density, converge to the unique
(nonnegative and normalized) solution u_* of

    ½σ² d²u/dx² + b d(xu)/dx = 0.   (11.9.7)

The function

    u_*(x) = √(b/πσ²) exp(−bx²/σ²),

which is the Gaussian density with mean zero and variance σ²/2b, satisfies
(11.9.7), and, by Proposition 11.9.1, it is the unique solution. □
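The claim is easy to check numerically: a finite-difference evaluation of ½σ²u″ + b(xu)′ on the stated Gaussian should vanish to within truncation error. The values of b and σ below are illustrative assumptions.

```python
import numpy as np

# Finite-difference check that the Gaussian density with mean zero and
# variance sigma^2/2b satisfies (1/2) sigma^2 u'' + b (x u)' = 0.
# The values of b and sigma are illustrative assumptions.

b, sigma = 2.0, 1.5
x = np.linspace(-4.0, 4.0, 4001)
h = x[1] - x[0]
u = np.sqrt(b / (np.pi * sigma**2)) * np.exp(-b * x**2 / sigma**2)

upp = (u[2:] - 2 * u[1:-1] + u[:-2]) / h**2          # u''
xup = (x[2:] * u[2:] - x[:-2] * u[:-2]) / (2 * h)    # (x u)'
residual = 0.5 * sigma**2 * upp + b * xup
max_res = np.max(np.abs(residual))
mass = np.sum(u) * h                                 # normalization check
```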
Example 11.9.2. Next consider the system of stochastic differential
equations

    dx/dt = Bx + σξ,   (11.9.8)

where B = (b_{ij}) and σ = (σ_{ij}) are constant matrices. Assume that the
matrix (a_{ij}) with

    a_{ij} = Σ_{k=1}^d σ_{ik}σ_{jk}

is nonsingular and that the unperturbed system

    dx/dt = Bx   (11.9.9)

is asymptotically stable, that is, all solutions converge to zero as t → ∞.


The Fokker-Planck equation corresponding to (11.9.8) has the form

    ∂u/∂t = ½ Σ_{i,j=1}^d a_{ij} ∂²u/∂x_i∂x_j − Σ_{i=1}^d ∂[b_i(x)u]/∂x_i,

where

    b_i(x) = Σ_{j=1}^d b_{ij}x_j.

Since the coefficients a_{ij} are constant and the matrix (a_{ij}) is nonsingular,
the uniform parabolicity condition (11.7.5) is satisfied. All of the remaining
conditions appearing in Theorem 11.7.1 are obvious in this case. Since
(11.9.9) is asymptotically stable, the real parts of all eigenvalues of B are
negative, and from the classical results of Liapunov stability theory there
is a Liapunov function V such that

    Σ_{i=1}^d b_i(x) ∂V/∂x_i ≤ −αV(x),   (11.9.10)


where V is a positive definite quadratic form

    V(x) = Σ_{i,j=1}^d k_{ij}x_ix_j.   (11.9.11)

Differentiating (11.9.11) with respect to x_i and then x_j, multiplying by
½a_{ij}, summing over i and j, and adding the result to (11.9.10) gives

    ½ Σ_{i,j=1}^d a_{ij} ∂²V/∂x_i∂x_j + Σ_{i=1}^d b_i(x) ∂V/∂x_i ≤ −αV(x) + Σ_{i,j=1}^d a_{ij}k_{ij}.

Thus inequality (11.9.2) is satisfied. Hence the semigroup {P_t} generated
by the perturbed system (11.9.8) is asymptotically stable.
To summarize, if the unperturbed system (11.9.9) is asymptotically stable,
then any stochastic perturbation with a nonsingular matrix (a_{ij}) leads
to a stochastic semigroup that is also asymptotically stable. In this case
the limiting density is also Gaussian and can be found by the method of
undetermined coefficients by substituting

    u(x) = c exp(Σ_{i,j=1}^d p_{ij}x_ix_j)

into the equation

    ½ Σ_{i,j=1}^d a_{ij} ∂²u/∂x_i∂x_j − Σ_{i=1}^d ∂/∂x_i (Σ_{j=1}^d b_{ij}x_j u) = 0.   □
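For a Gaussian ansatz the method of undetermined coefficients reduces to linear algebra: the stationary covariance R of (11.9.8) satisfies the matrix Lyapunov equation BR + RBᵀ + a = 0 (a standard reformulation, not the book's notation). The sketch below solves this equation with a Kronecker-product trick; the matrices B and σ are illustrative assumptions.

```python
import numpy as np

# Stationary covariance R of dx/dt = Bx + sigma xi from the matrix
# Lyapunov equation B R + R B^T + a = 0, a = sigma sigma^T (a standard
# reformulation of the undetermined-coefficients substitution; B and
# sigma below are illustrative choices).

B = np.array([[-1.0,  2.0],
              [-2.0, -1.5]])        # eigenvalues have negative real parts
sig = np.array([[1.0, 0.0],
                [0.5, 1.0]])
a = sig @ sig.T

I2 = np.eye(2)
# Row-major vectorization: vec(B R + R B^T) = (kron(B, I) + kron(I, B)) vec(R)
M = np.kron(B, I2) + np.kron(I2, B)
R = np.linalg.solve(M, -a.flatten()).reshape(2, 2)
residual = np.max(np.abs(B @ R + R @ B.T + a))
```

The solution R is symmetric positive definite, consistent with a Gaussian limiting density with inverse covariance −2(p_{ij}).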

Example 11.9.3. Consider the second-order system

    m d²x/dt² + β dx/dt + F(x) = σξ   (11.9.12)

with constant coefficients m, β, and σ. Equation (11.9.12) describes the
dynamics of many mechanical and electrical systems in the presence of
"white noise." In the mechanical interpretation, m would be the mass of
a body whose position is x, β is a friction coefficient, and F = ∂φ/∂x is
a conservative force (with a corresponding potential function φ) acting on
the body. Introducing the velocity v = dx/dt as a new variable, equation
(11.9.12) is equivalent to the system

    dx/dt = v   and   m dv/dt = −βv − F(x) + σξ.   (11.9.13)


The Fokker-Planck equation corresponding to (11.9.13) is

    ∂u/∂t = (σ²/2m²) ∂²u/∂v² − ∂(vu)/∂x + (1/m) ∂/∂v {[βv + F(x)]u}.   (11.9.14)

Unfortunately, the asymptotic stability of the solutions of (11.9.14) cannot
be studied by Theorem 11.9.1, as the quadratic form associated with the
second-order term is

    0·λ₁² + (σ²/m²)λ₂²,

which is clearly not positive definite. Using some sophisticated techniques,
it is possible to prove that the solutions to some parabolic equations with
associated semidefinite quadratic forms are asymptotically stable. However,
in this example we wish only to derive the steady-state solution to (11.9.14)
and to bypass the question of asymptotic stability.
In a steady state, ∂u/∂t = 0, so (11.9.14) becomes

    (σ²/2m²) ∂²u/∂v² − ∂(vu)/∂x + (1/m) ∂/∂v {[βv + F(x)]u} = 0,

which may be written in the alternate form

    ((β/m) ∂/∂v − ∂/∂x) [vu + (σ²/2mβ) ∂u/∂v] + ∂/∂v [(1/m)F(x)u + (σ²/2mβ) ∂u/∂x] = 0.

Set u(x, v) = X(x)V(v), so that the last equation becomes

    ((β/m) ∂/∂v − ∂/∂x) [X(x)(vV(v) + (σ²/2mβ) dV/dv)]
        + [(1/m)F(x)X(x) + (σ²/2mβ) dX/dx] dV/dv = 0,

which will certainly be satisfied if X and V satisfy

    dX/dx + (2β/σ²)F(x)X = 0   (11.9.15)

and

    dV/dv + (2mβ/σ²)vV = 0,   (11.9.16)

respectively.
Integrating equations (11.9.15) and (11.9.16) and combining the results
gives

    u(x, v) = c exp{−(2β/σ²)[½mv² + φ(x)]}.   (11.9.17)

The constant c in (11.9.17) is determined from the normalization condition

    ∫_{−∞}^∞ ∫_{−∞}^∞ u(x, v) dx dv = 1.

The velocity integration is easily carried out, and we have

    c = c₁√(βm/πσ²),

378

11. Stochastic Perturbation of Continuous Time Systems

where

-1 =
c1

00

-oo

exp[(-2P/u2 )<f>(x)] dx.

(11.9.18)

Thus (11.9.17) becomes

u(x,v)

= c1 v'Pm/1ru2 exp { -(2P/u2 ) [!mv2 + <f>(x)]}.

(11.9.19)

The interesting feature of (11.9.19) is that the right-hand side may be written as the product of two functions, one dependent on $v$ and the other on $x$. This can be interpreted to mean that in the steady state the positions and velocities are independent. Furthermore, observe that for every $\phi$ for which the integral (11.9.18) is convergent, $u(x,v)$, as given by (11.9.19), is a well-defined solution of the steady-state equation and that the distribution of velocities is Maxwellian, independent of the nature of the potential function $\phi$. The Maxwellian nature of the velocity distribution is a natural consequence of the characteristics of the noise perturbation term in the force balance equation (11.9.13). □
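The steady-state density (11.9.19) can be checked numerically. The sketch below assumes a harmonic potential $\phi(x) = kx^2/2$ and illustrative constants $m$, $\beta$, $\sigma$, $k$ (none of these choices come from the text); it verifies that the density is normalized and that the velocity marginal has the Maxwellian variance $\sigma^2/(2m\beta)$.

```python
import numpy as np

# Numerical check of (11.9.19) for an assumed harmonic potential
# phi(x) = k*x**2/2; the constants m, beta, sigma, k are illustrative.
m, beta, sigma, k = 1.0, 0.5, 0.8, 2.0

x = np.linspace(-10.0, 10.0, 801)
v = np.linspace(-10.0, 10.0, 801)
dx = x[1] - x[0]
dv = v[1] - v[0]

phi = 0.5 * k * x**2
# c1 from the normalization condition (11.9.18)
c1 = 1.0 / (np.sum(np.exp(-(2.0 * beta / sigma**2) * phi)) * dx)

X, V = np.meshgrid(x, v)
u = (c1 * np.sqrt(beta * m / (np.pi * sigma**2))
     * np.exp(-(2.0 * beta / sigma**2) * (0.5 * m * V**2 + 0.5 * k * X**2)))

total_mass = np.sum(u) * dx * dv          # should be ~1 by construction
v_variance = np.sum(V**2 * u) * dx * dv   # Maxwellian: sigma**2/(2*m*beta)
print(total_mass, v_variance)
```

The velocity variance matches $\sigma^2/(2m\beta)$ regardless of $k$, which is the "Maxwellian independent of the potential" observation made above.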

11.10 An Extension of the Liapunov Function Method

A casual inspection of the proofs of Theorems 5.7.1 and 11.9.1 shows that they are based on the same idea: We first prove that the mathematical expectation $E(V \mid P_t f)$ is bounded for large $t$ and then show, by the Chebyshev inequality, that the density $P_t f$ is concentrated on some bounded region. With these facts we are then able to construct a lower-bound function. This technique may be formalized as follows.
Let a stochastic semigroup $\{P_t\}_{t\ge 0}$, $P_t\colon L^1(G)\to L^1(G)$, be given, where $G$ is an unbounded measurable subset of $R^d$. Further, let $V\colon G\to R$ be a continuous nonnegative function such that
$$\lim_{|x|\to\infty} V(x) = \infty. \tag{11.10.1}$$
Also set, as before,
$$E(V\mid P_t f) = \int_G V(x)P_t f(x)\,dx. \tag{11.10.2}$$
With these definitions it is easy to prove the following proposition.


Proposition 11.10.1. Assume there exists a linearly dense subset $D_0 \subset D(G)$ and a constant $M < \infty$ such that
$$E(V\mid P_t f) \le M \tag{11.10.3}$$
for every $f \in D_0$ and sufficiently large $t$, say $t \ge t_1(f)$. Let $r$ be such that $V(x) \ge M + 1$ for $|x| \ge r$ and $x \in G$. If, for some $t_0 > 0$, there is a nontrivial function $h_r$ with $h_r \ge 0$ and $\|h_r\| > 0$ such that
$$P_{t_0} f \ge h_r \qquad\text{for } f \in D \tag{11.10.4}$$
whose support is contained in the ball $B_r = \{x \in R^d\colon |x| \le r\}$, then the stochastic semigroup $\{P_t\}_{t\ge 0}$ is asymptotically stable.

Proof. Pick $f \in D_0$. From the Chebyshev inequality and (11.10.3), it follows that
$$\int_{G_a} P_t f(x)\,dx \ge 1 - \frac{M}{a}, \tag{11.10.5}$$
where $G_a = \{x \in G\colon V(x) < a\}$. Pick $a = M + 1$ so $V(x) \ge a$ for $|x| \ge r$. Then $G_a \subset B_r$ and
$$P_t f = P_{t_0}P_{t-t_0}f \ge P_{t_0}f_t = \|f_t\|\,P_{t_0}\bar f, \tag{11.10.6}$$
where $f_t = (P_{t-t_0}f)1_{G_a}$ and $\bar f = f_t/\|f_t\|$. From (11.10.5), we have
$$\|f_t\| = \int_{G_a} P_{t-t_0}f(x)\,dx \ge 1 - \frac{M}{a} \qquad\text{for } t \ge t_0 + t_1,$$
and, by (11.10.4), $P_{t_0}\bar f \ge h_r$. Thus, using (11.10.6), we have shown that
$$[1 - (M/a)]h_r$$
is a lower-bound function for the semigroup $\{P_t\}_{t\ge 0}$. Since, by assumption, $h_r$ is a nontrivial function and we took $a > M$, it follows that the lower-bound function for the semigroup $\{P_t\}_{t\ge 0}$ is also nontrivial. Application of Theorem 7.4.1 completes the proof. □

Example 11.10.1. As an example of the application of Proposition 11.10.1, we will first prove the asymptotic stability of the semigroup generated by the integro-differential equation
$$\frac{\partial u(t,x)}{\partial t} + u(t,x) = \frac{\sigma^2}{2}\frac{\partial^2 u}{\partial x^2} + \int_{-\infty}^{\infty} K(x,y)u(t,y)\,dy, \qquad t > 0,\ x \in R, \tag{11.10.7}$$
with the initial condition
$$u(0,x) = \phi(x), \qquad x \in R, \tag{11.10.8}$$
which we first considered in Example 7.9.1. As in that example, we assume that $K$ is a stochastic kernel, but we also assume that $K$ satisfies
$$\int_{-\infty}^{\infty} |x|K(x,y)\,dx \le \alpha|y| + \beta \qquad\text{for } y \in R, \tag{11.10.9}$$
where $\alpha$ and $\beta$ are nonnegative constants and $\alpha < 1$.


To slightly simplify an intricate series of calculations we assume, without any loss of generality, that $\sigma = 1$. (This is equivalent to defining a new $x = x/\sigma$.) Our proof of the asymptotic stability of the stochastic semigroup corresponding to equations (11.10.7) and (11.10.8) follows arguments given by Jama [1986] in verifying (11.10.3) and (11.10.4) of Proposition 11.10.1.
From Example 7.9.1, we know that the stochastic semigroup $\{P_t\}_{t\ge 0}$ generated by equations (11.10.7) and (11.10.8) is defined by (with $\sigma^2 = 1$)
$$P_t\phi = e^{-t}\sum_{n=0}^{\infty} T_n(t)\phi, \tag{11.10.10}$$
where
$$T_n(t)f = \int_0^t T_0(t-\tau)PT_{n-1}(\tau)f\,d\tau, \qquad T_0(t)f(x) = \int_{-\infty}^{\infty} g(t,x-y)f(y)\,dy, \tag{11.10.11}$$
and
$$Pf(x) = \int_{-\infty}^{\infty} K(x,y)f(y)\,dy, \qquad g(t,x) = \frac{1}{\sqrt{2\pi t}}\exp(-x^2/2t). \tag{11.10.12}$$
Let $f \in D(R)$ be a continuous function with compact support. Define
$$E(t) = E(|x| \mid P_t f) = \int_{-\infty}^{\infty} |x|P_t f(x)\,dx,$$
which may be rewritten using (11.10.10) as
$$E(t) = e^{-t}\sum_{n=0}^{\infty} e_n(t),$$
where
$$e_n(t) = \int_{-\infty}^{\infty} |x|T_n(t)f(x)\,dx.$$

We are going to show that $E(t)$, as given here, satisfies condition (11.10.3). If we set
$$f_{n\tau} = PT_{n-1}(\tau)f \qquad\text{and}\qquad q_{n\tau}(t) = \int_{-\infty}^{\infty} |x|T_0(t-\tau)f_{n\tau}(x)\,dx,$$
then, using (11.10.11), we may write $e_n(t)$ as
$$e_n(t) = \int_0^t q_{n\tau}(t)\,d\tau. \tag{11.10.13}$$
Using the second relation in equations (11.10.11), $q_{n\tau}(t)$ can be written as
$$q_{n\tau}(t) = \int_{-\infty}^{\infty} f_{n\tau}(y)\left[\int_{-\infty}^{\infty} |x|g(t-\tau,x-y)\,dx\right]dy. \tag{11.10.14}$$
Since $|x| \le |x-y| + |y|$, it is evident that
$$\int_{-\infty}^{\infty} |x|g(t-\tau,x-y)\,dx \le \sqrt{\frac{2(t-\tau)}{\pi}} + |y| \tag{11.10.15}$$

and, as a consequence,
$$q_{n\tau}(t) \le \int_{-\infty}^{\infty} |y|f_{n\tau}(y)\,dy + \sqrt{\frac{2(t-\tau)}{\pi}}\int_{-\infty}^{\infty} f_{n\tau}(y)\,dy.$$
By using equation (7.9.18) from the proof of the Phillips perturbation theorem and noting that $P$ is a Markov operator (since $K$ is a stochastic kernel) and $\|f\| = 1$, we have
$$\int_{-\infty}^{\infty} f_{n\tau}(y)\,dy = \|PT_{n-1}(\tau)f\| = \|T_{n-1}(\tau)f\| \le \frac{\tau^{n-1}}{(n-1)!}. \tag{11.10.16}$$
Furthermore, from equations (11.10.9) and (7.9.18),
$$\int_{-\infty}^{\infty} |y|f_{n\tau}(y)\,dy = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} |y|K(y,z)T_{n-1}(\tau)f(z)\,dy\,dz$$
$$\le \alpha\int_{-\infty}^{\infty} |z|T_{n-1}(\tau)f(z)\,dz + \beta\int_{-\infty}^{\infty} T_{n-1}(\tau)f(z)\,dz \le \alpha e_{n-1}(\tau) + \beta\,\frac{\tau^{n-1}}{(n-1)!}.$$
Substituting this and (11.10.16) into (11.10.15) gives
$$q_{n\tau}(t) \le \alpha e_{n-1}(\tau) + \left[\beta + \sqrt{\frac{2(t-\tau)}{\pi}}\right]\frac{\tau^{n-1}}{(n-1)!},$$
so that (11.10.13) becomes
$$e_n(t) \le \alpha\int_0^t e_{n-1}(\tau)\,d\tau + \beta\,\frac{t^n}{n!} + \sqrt{\frac{2}{\pi}}\int_0^t \sqrt{t-\tau}\,\frac{\tau^{n-1}}{(n-1)!}\,d\tau, \qquad n = 1,2,\ldots. \tag{11.10.17}$$

To obtain $e_0(t)$ we again use (11.10.14) to give
$$e_0(t) = \int_{-\infty}^{\infty} |x|T_0(t)f(x)\,dx = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} |x|g(t,x-y)f(y)\,dx\,dy \le \sqrt{\frac{2t}{\pi}} + m_1, \tag{11.10.18}$$
where
$$m_1 = \int_{-\infty}^{\infty} |y|f(y)\,dy.$$

With equations (11.10.17) and (11.10.18) we may now proceed to examine $E(t)$. Sum (11.10.17) from $n = 1$ to $m$ and add (11.10.18). This gives
$$\sum_{n=0}^{m} e_n(t) \le m_1 + \sqrt{\frac{2t}{\pi}} + \beta e^t + \sqrt{\frac{2}{\pi}}\int_0^t \sqrt{t-\tau}\,e^{\tau}\,d\tau + \alpha\int_0^t \sum_{n=0}^{m} e_n(\tau)\,d\tau,$$
where we used the fact that
$$\sum_{n=1}^{m} \frac{t^n}{n!} \le \sum_{n=0}^{\infty} \frac{t^n}{n!} = e^t.$$
Define $E_m(t) = e^{-t}\sum_{n=0}^{m} e_n(t)$; hence we can write
$$E_m(t) \le m_1 e^{-t} + \rho + \alpha\int_0^t e^{-(t-\tau)}E_m(\tau)\,d\tau, \tag{11.10.19}$$
where
$$\rho = \beta + \max_t\left[\sqrt{\frac{2t}{\pi}}\,e^{-t}\right] + \sqrt{\frac{2}{\pi}}\int_0^{\infty} \sqrt{u}\,e^{-u}\,du.$$
To solve the integral inequality (11.10.19), it is enough to solve the corresponding equality and note that $E_m(t)$ is below this solution (Walter, 1970). This process leads to
$$E_m(t) \le [\rho/(1-\alpha)] + m_1 e^{-(1-\alpha)t},$$
or, passing to the limit as $m \to \infty$,
$$E(t) \le [\rho/(1-\alpha)] + m_1 e^{-(1-\alpha)t}. \tag{11.10.20}$$

Since the constant $\rho$ does not depend on $f$, (11.10.20) proves that the semigroup $\{P_t\}_{t\ge 0}$, generated by (11.10.7) and (11.10.8), satisfies equation (11.10.3) with $V(x) = |x|$.
Next we verify equation (11.10.4). Assume that $f \in D(R)$ is supported on $[-r,r]$. Then we have
$$P_1 f \ge e^{-1}T_0(1)f = e^{-1}\frac{1}{\sqrt{2\pi}}\int_{-r}^{r} f(y)\exp\left[-\tfrac{1}{2}(x-y)^2\right]dy$$
$$\ge \frac{1}{\sqrt{2\pi}}\exp[-(x^2 + r^2 + 1)]\int_{-r}^{r} f(y)\,dy = \frac{1}{\sqrt{2\pi}}\exp[-(x^2 + r^2 + 1)],$$
and the function on the right-hand side is clearly nontrivial.

Thus we have shown that the semigroup $\{P_t\}_{t\ge 0}$ generated by equations (11.10.7) and (11.10.8) is asymptotically stable, and therefore the solution with every initial condition $\phi \in D$ converges to the same limit. □

Example 11.10.2. Using a quite analogous approach, we now prove the asymptotic stability of the semigroup generated by the equation
$$\frac{\partial u(t,x)}{\partial t} + c\frac{\partial u(t,x)}{\partial x} + u(t,x) = \int_0^{\infty} K(x,y)u(t,y)\,dy \tag{11.10.21}$$
with the conditions
$$u(t,0) = 0 \qquad\text{and}\qquad u(0,x) = \phi(x) \tag{11.10.22}$$
(see Example 7.9.2). However, in this case some additional constraints on the kernel $K$ will be introduced at the end of our calculations. The necessity of these constraints is related to the fact that the smoothing properties of the semigroup generated by the infinitesimal operator $(d^2/dx^2)$ of the previous example are not present now (see Example 7.4.1). Rather, in the present example the operator $(d/dx)$ generates a semigroup that merely translates functions (see Example 7.4.2). Thus, in general the properties of equations (11.10.7) and (11.10.21) are quite different in spite of the fact that we are able to write the explicit equations for the semigroups generated by both equations using the formulas of the Phillips perturbation theorem. Our treatment follows that of Dlotko and Lasota [1986].
To start, we assume $K$ is a stochastic kernel and satisfies
$$\int_0^{\infty} xK(x,y)\,dx \le \alpha y + \beta \qquad\text{for } y > 0, \tag{11.10.23}$$

where $\alpha$ and $\beta$ are nonnegative constants and $\alpha < 1$. In the Chandrasekhar-Münch equation, $K(x,y) = \psi(x/y)/y$, and (11.10.23) is automatically satisfied since
$$\int_0^{y} xK(x,y)\,dx = \int_0^{y} (x/y)\psi(x/y)\,dx = y\int_0^{1} z\psi(z)\,dz$$
and
$$\int_0^{1} z\psi(z)\,dz < \int_0^{1} \psi(z)\,dz = 1.$$
As in the preceding example, the semigroup $\{P_t\}_{t\ge 0}$ generated by equations (11.10.21) and (11.10.22) is given by equations (11.10.10) and (11.10.11), but now (assuming $c = 1$ for ease of calculations)
$$T_0(t)f(x) = 1_{[0,\infty)}(x-t)f(x-t) \tag{11.10.24}$$
and
$$Pf(x) = \int_0^{\infty} K(x,y)f(y)\,dy. \tag{11.10.25}$$

To verify condition (11.10.3), assume that $f \in D([0,\infty))$ is a continuous function with compact support contained in $(0,\infty)$ and consider
$$E(t) = \int_0^{\infty} xP_t f(x)\,dx.$$
By using notation similar to that introduced in Example 11.10.1, we have
$$E(t) = e^{-t}\sum_{n=0}^{\infty} e_n(t), \qquad e_n(t) = \int_0^{\infty} xT_n(t)f(x)\,dx,$$
and, in analogy with (11.10.13), $e_n(t) = \int_0^t q_{n\tau}(t)\,d\tau$ with
$$q_{n\tau}(t) = \int_0^{\infty} xT_0(t-\tau)Pf_{n\tau}(x)\,dx,$$
where $f_{n\tau} = T_{n-1}(\tau)f$.


00

qn'T(t) = {

lt-T

or, setting x- t

[1

00

z-t+T

K(x- t + T, y)fnT(Y) dy] dx,

+,. = z and using (11.10.23),

1 [1
00

qnT(t)

00

zK(z,y)fnT(y)dy] dz

1 [1
00

+(t-r)

00

K(z,y)fnT(y)dy] dz
00

fooo Yfn'T(y) dy + {31 fn-r(Y) dy


+(t-r) koo [1 K(z,y)fnT(y)dy] dz.

$a:

00

Since K is stochastic and

this inequality reduces to

qnT(t) $ a:en-1{'1") + [{3 + t- r][rn- 1/(n- 1)!],


Thus

en(t) $a:

tn

1
t

en-1(r)dr + {31
~

n= 1,2, ....

lot (t- r) ( ,.n-1


_ ) dr.
o

1 1.

Further,

eo(t)

= koo xTo(t)f(x) dx = [oo xf(x- t) dx

00

00

zf(z) dz + t

f(z) dz

{11.10.26)

11.10. An Extension of the Liapunov Function Method

or

385

00

= m1 + t,

eo(t)

m1

(11.10.27)

zl(z) dz.

Observe the similarity between equations (11.10.26)-(11.10.27) and equations (11.10.17)-(11.10.18). Thus, proceeding as in Example 11.10.1, we again obtain (11.10.20) with
$$\rho = \beta + \int_0^{\infty} ue^{-u}\,du + \max_t\,(te^{-t}).$$
Thus we have shown that the semigroup generated by equations (11.10.21)-(11.10.22) satisfies condition (11.10.3).
However, the proof that (11.10.4) holds is more difficult for the reasons set out at the beginning of this example. To start, pick $r > 0$ as in Proposition 11.10.1, that is,
$$r = M + 1 = [\rho/(1-\alpha)] + 1.$$
For an arbitrary $f \in D([0,r])$ and $t_0 > 0$, we have
$$P_{t_0}f(x) \ge e^{-t_0}T_1(t_0)f(x) = e^{-t_0}\int_0^{t_0} T_0(t_0-\tau)PT_0(\tau)f(x)\,d\tau$$
$$= e^{-t_0}\int_0^{t_0}\left[1_{[0,\infty)}(x-t_0+\tau)\int_0^{\infty} K(x-t_0+\tau,\,y)\,1_{[0,\infty)}(y-\tau)f(y-\tau)\,dy\right]d\tau.$$
~

In particular, for 0

~to,

Ptol(x);:::: e-to rto

[10Q K(x- to+

lto-:J:

Now set z
obtain

=y-

and

=x -

T,

y)l(y- r) dy] dr.

to + T and remember that I

D([O, r]) to

Pt 0 l(x) 2::: e-to 1:~: [for K(8, z + 8 +to- x)l(z) dz] d8


2::: hr(x) 1r l(z) dz

where

hr(x)

= hr(x)

for 0

~ x ~to,

= e-to O:Sz$r
inf 1:~: K(8, z + 8 +to- x) d8.
0

386

11. Stochastic Perturbation of Continuous Time Systems

It is therefore clear that $h_r \ge 0$, and it is easy to find a sufficient condition for $h_r$ to be nontrivial. For example, if $K(s,u) = \psi(s/u)/u$, as in the Chandrasekhar-Münch equation, then
$$h_r(x) = e^{-t_0}\inf_{0\le z\le r}\int_0^{x} \psi\left(\frac{s}{z+s+t_0-x}\right)\frac{ds}{z+s+t_0-x}.$$
If we set $q = s/(z+s+t_0-x)$ in this expression, then
$$h_r(x) = e^{-t_0}\inf_{0\le z\le r}\int_0^{x/(z+t_0)} \frac{\psi(q)}{1-q}\,dq \ge e^{-t_0}\int_0^{x/(r+t_0)} \psi(q)\,dq.$$
Since $\psi(q)$ is a density, we have
$$\lim_{t_0\to\infty}\int_0^{x/(r+t_0)} \psi(q)\,dq = 1$$
uniformly for $x \in [t_0-1, t_0]$. Thus, for some sufficiently large $t_0$, we obtain
$$h_r(x) \ge e^{-t_0}\int_0^{x/(r+t_0)} \psi(q)\,dq > 0 \qquad\text{for } x \in [t_0-1, t_0],$$
showing that $h_r$ is a nontrivial function. Therefore all the assumptions of Proposition 11.10.1 are satisfied and the semigroup $\{P_t\}_{t\ge 0}$ generated by the Chandrasekhar-Münch equation is asymptotically stable. □
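Condition (11.10.23) for the Chandrasekhar-Münch kernel can be checked numerically. The sketch below assumes the illustrative density $\psi(z) = 2z$ on $[0,1]$ (this particular $\psi$ is not taken from the text); for it, $\int_0^1 z\psi(z)\,dz = 2/3$, so (11.10.23) should hold with $\alpha = 2/3 < 1$ and $\beta = 0$.

```python
import numpy as np

# Check of condition (11.10.23) for the Chandrasekhar-Munch kernel
# K(x, y) = psi(x/y)/y, with the assumed density psi(z) = 2z on [0, 1].
# For this psi, int_0^y x K(x,y) dx = (2/3) y exactly.

def psi(z):
    return 2.0 * z * ((z >= 0.0) & (z <= 1.0))

def K(x, y):
    return psi(x / y) / y

alpha = 2.0 / 3.0
ratios = []
for y in (0.5, 1.0, 3.0, 10.0):
    x = np.linspace(0.0, y, 200001)
    dx = x[1] - x[0]
    first_moment = np.sum(x * K(x, y)) * dx   # int_0^y x K(x, y) dx
    ratios.append(first_moment / (alpha * y)) # should be ~1 for every y

print(ratios)
```

Since the first moment is exactly proportional to $y$ here, the bound holds with $\beta = 0$; any density $\psi$ on $[0,1]$ gives $\alpha = \int_0^1 z\psi(z)\,dz < 1$ by the argument in the text.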

11.11 Sweeping for Solutions of the Fokker-Planck Equation

As we have seen in Section 11.9, semigroups generated by the Fokker-Planck equation may, for some values of the coefficients, be asymptotically stable. The example provided was the Langevin equation. On the other hand, the heat equation, perhaps the simplest Fokker-Planck equation, generates a sweeping semigroup. In this and the next section we develop a technique to distinguish between these two possibilities.
We return to equation (11.7.1) with the initial condition (11.7.2) and consider the stochastic semigroup $\{P_t\}_{t\ge 0}$ given by equations (11.8.2) generated by these conditions. We say that $\{P_t\}_{t\ge 0}$ is sweeping if it is sweeping with respect to the family $\mathcal{A}_c$ of all compact subsets of $R^d$. Thus, $\{P_t\}_{t\ge 0}$ is sweeping if
$$\lim_{t\to\infty}\int_A P_t f(x)\,dx = \lim_{t\to\infty}\int_A u(t,x)\,dx = 0 \qquad\text{for } f \in D,\ A \in \mathcal{A}_c. \tag{11.11.1}$$

In this section, we understand a Bielecki function to be any function $V\colon R^d \to R$ that satisfies the following three conditions:
(1) $V(x) > 0$ for all $x$;
(2) $V$ has continuous derivatives
$$\frac{\partial V}{\partial x_i}, \qquad \frac{\partial^2 V}{\partial x_i\partial x_j}, \qquad i,j = 1,\ldots,d;$$
and
(3)
$$V(x) \le \rho e^{\delta|x|}, \qquad \left|\frac{\partial V(x)}{\partial x_i}\right| \le \rho e^{\delta|x|}, \qquad \left|\frac{\partial^2 V(x)}{\partial x_i\partial x_j}\right| \le \rho e^{\delta|x|}, \tag{11.11.2}$$
for some constants $\rho$ and $\delta$.
From condition (1) and the continuity of $V$ it follows that
$$\inf_{x\in A} V(x) > 0 \qquad\text{for } A \in \mathcal{A}_c,$$
and consequently our new definition of a Bielecki function is completely consistent with the general definition given in Section 5.9.
With these preliminaries we are in a position to state an analog of Theorem 11.9.1, which gives a sufficient sweeping condition for semigroups generated by the Fokker-Planck equation.

Theorem 11.11.1. Assume that the coefficients $a_{ij}$ and $b_i$ of equation (11.7.1) are regular for the Cauchy problem, and that there is a Bielecki function $V\colon R^d \to R$ satisfying the inequality
$$\sum_{i,j=1}^{d} a_{ij}(x)\frac{\partial^2 V}{\partial x_i\partial x_j} + \sum_{i=1}^{d} b_i(x)\frac{\partial V}{\partial x_i} \le -\alpha V(x), \tag{11.11.3}$$
with a constant $\alpha > 0$. Then the semigroup $\{P_t\}_{t\ge 0}$ generated by (11.7.1)-(11.7.2) is sweeping.

Proof. The proof proceeds exactly as the proof of Theorem 11.9.1, but is much shorter. First we pick a continuous density $f$ with compact support and consider the mathematical expectation (11.9.3). Using inequality (11.11.3), we obtain
$$\frac{dE(V\mid u)}{dt} \le -\alpha E(V\mid u),$$
and, consequently,
$$E(V\mid P_t f) = E(V\mid u) \le e^{-\alpha t}E(V\mid f).$$
Since $e^{-\alpha t} < 1$ for $t > 0$, Proposition 7.11.1 completes the proof. □

Example 11.11.1. Consider the stochastic equation
$$\frac{dx}{dt} = bx + \sigma\xi, \tag{11.11.4}$$
where $b$ and $\sigma$ are positive constants and $\xi$ is a white noise perturbation. Equation (11.11.4) differs from the Langevin equation because the coefficient of $x$ is positive. The Fokker-Planck equation corresponding to (11.11.4) is
$$\frac{\partial u}{\partial t} = \frac{\sigma^2}{2}\frac{\partial^2 u}{\partial x^2} - b\frac{\partial(xu)}{\partial x}. \tag{11.11.5}$$
Now the inequality (11.11.3) becomes
$$\frac{\sigma^2}{2}\frac{d^2 V}{dx^2} + bx\frac{dV}{dx} \le -\alpha V. \tag{11.11.6}$$
Pick a Bielecki function of the form $V(x) = e^{-\varepsilon x^2}$ and substitute it into (11.11.6) to obtain
$$2\varepsilon(\varepsilon\sigma^2 - b)x^2 - \varepsilon\sigma^2 \le -\alpha.$$
This inequality is satisfied for arbitrary positive $\varepsilon \le b/\sigma^2$ and $\alpha \le \varepsilon\sigma^2$. This demonstrates that for $b > 0$ the semigroup $\{P_t\}_{t\ge 0}$ generated by equation (11.11.5) is sweeping.
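The sweeping behavior of (11.11.4) can be illustrated by a direct Euler-Maruyama simulation: the fraction of sample paths remaining in a fixed compact set $[-A,A]$ decays toward zero. All numerical constants below ($b$, $\sigma$, $A$, step sizes, path counts) are illustrative choices, not values from the text.

```python
import numpy as np

# Monte Carlo illustration of sweeping for dx/dt = b*x + sigma*xi (11.11.4)
# via Euler-Maruyama: mass of P_t f over the compact set [-A, A] decays.
rng = np.random.default_rng(0)
b, sigma, A = 1.0, 1.0, 5.0
dt, n_steps, n_paths = 1e-2, 600, 20000

x = rng.normal(0.0, 0.1, size=n_paths)   # initial density concentrated near 0
mass_in_A = []
for step in range(n_steps):
    x = x + b * x * dt + sigma * np.sqrt(dt) * rng.normal(size=n_paths)
    if (step + 1) % 100 == 0:            # record once per unit of time
        mass_in_A.append(np.mean(np.abs(x) <= A))

print(mass_in_A)   # decreasing toward zero
```

This matches the exact solution: $x(t)$ is Gaussian with variance $\sigma^2(e^{2bt}-1)/(2b)$, which grows without bound, so the probability of any fixed interval tends to zero.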

11.12 Foguel Alternative for the Fokker-Planck Equation

Stochastic semigroups generated by the Fokker-Planck equation are especially easy to study using the Foguel alternative introduced in Section 7.12. This is due to the fact that these semigroups are given by the integral formula (11.8.2).
We have the following.

Theorem 11.12.1. Assume that the coefficients $a_{ij}$ and $b_i$ of equation (11.7.1) are regular for the Cauchy problem. Further assume that all stationary nonnegative solutions of equation (11.7.1) are of the form $cu_*(x)$, where $u_*(x) > 0$ a.e. and $c$ is a nonnegative constant. Then the semigroup $\{P_t\}_{t\ge 0}$ generated by equations (11.7.1)-(11.7.2) is either asymptotically stable or sweeping. Asymptotic stability occurs when
$$I = \int_{R^d} u_*(x)\,dx < \infty \tag{11.12.1}$$
and sweeping when $I = \infty$.

Proof. We are going to use Theorem 7.12.1 in the proof, sequentially verifying conditions (a), (b), and (c).
First we are going to show that the kernel $\Gamma(t,x,y)$ in equation (11.7.7) is stochastic for each $t > 0$. We already know that $\Gamma$ is positive and that $\{P_t\}_{t\ge 0}$ is stochastic. Furthermore, for each $f \in L^1(R^d)$ we have
$$\int_{R^d} f(y)\,dy = \int_{R^d} P_t f(x)\,dx = \int_{R^d}\int_{R^d} \Gamma(t,x,y)f(y)\,dx\,dy,$$
and consequently
$$\int_{R^d}\left[\int_{R^d} \Gamma(t,x,y)\,dx - 1\right]f(y)\,dy = 0.$$
Since $f \in L^1(R^d)$ is arbitrary, this implies
$$\int_{R^d} \Gamma(t,x,y)\,dx = 1 \qquad\text{for } t > 0,\ y \in R^d.$$
Thus, $\Gamma$ is a stochastic kernel and condition (a) of Theorem 7.12.1 is satisfied.
In verifying condition (b), note that according to the definition of the semigroup $\{P_t\}_{t\ge 0}$ the function
$$u(t,x) = P_t u_*(x)$$
is a solution of equations (11.7.1) and (11.7.2) with $f = u_*$. Since $u_*$ is a stationary solution and the Cauchy problem is uniquely solvable, we have
$$u_*(x) = P_t u_*(x) \qquad\text{for } t \ge 0.$$
Thus, condition (b) of Theorem 7.12.1 is satisfied for $f_* = u_*$.
To verify (c) simply observe that the positivity of $\Gamma$ implies that $P_t f(x) > 0$ for every $t > 0$ and $f \in D$. Thus, $\operatorname{supp} P_t f = R^d$ and $P_t$ is expanding for every $t > 0$. This completes the proof. □
It is rather easy to illustrate the general theory developed above with a simple example in one dimension. Consider the stochastic differential equation
$$\frac{dx}{dt} = b(x) + \sigma(x)\xi, \tag{11.12.2}$$
where $\sigma$, $b$, and $x$ are scalar functions, and $\xi$ is a one-dimensional white noise. The corresponding Fokker-Planck equation is of the form
$$\frac{\partial u}{\partial t} = \frac{1}{2}\frac{\partial^2[\sigma^2(x)u]}{\partial x^2} - \frac{\partial[b(x)u]}{\partial x}. \tag{11.12.3}$$
Assume that $a(x) = \sigma^2(x)$ and $b(x)$ are regular for the Cauchy problem, and that
$$xb(x) \le 0 \qquad\text{for } |x| \ge r, \tag{11.12.4}$$
where $r$ is a positive constant. This last condition simply means that the interval $[-r,r]$ is attracting (or at least not repelling) for trajectories of the unperturbed equation $\dot{x} = b(x)$.
To find a stationary solution of (11.12.3) we must solve the differential equation
$$\frac{1}{2}\frac{d^2[\sigma^2(x)u]}{dx^2} - \frac{d[b(x)u]}{dx} = 0$$
or
$$\frac{dz}{dx} = \frac{2b(x)}{\sigma^2(x)}z + c_1,$$
where $z = \sigma^2 u$ and $c_1$ is a constant. A straightforward calculation gives
$$z(x) = e^{B(x)}\left\{c_2 + c_1\int_0^x e^{-B(y)}\,dy\right\},$$
where $c_2$ is a second constant and
$$B(x) = \int_0^x \frac{2b(y)}{\sigma^2(y)}\,dy.$$
The solution $z(x)$ will be positive if and only if
$$c_2 + c_1\int_0^x e^{-B(y)}\,dy > 0 \qquad\text{for } -\infty < x < \infty. \tag{11.12.5}$$
From condition (11.12.4) it follows that the integral
$$\int_0^x e^{-B(y)}\,dy$$
converges to $+\infty$ if $x \to +\infty$ and to $-\infty$ if $x \to -\infty$. This shows that for $c_1 \ne 0$ inequality (11.12.5) cannot be satisfied. Thus, the unique (up to a multiplicative constant) positive stationary solution of equation (11.12.3) is given by
$$u_*(x) = \frac{c}{\sigma^2(x)}\,e^{B(x)}$$

with $c > 0$. Applying Theorem 11.12.1 to equation (11.12.3) we obtain the following.

Corollary 11.12.1. Assume that the coefficients $a = \sigma^2$ and $b$ of equation (11.12.3) are regular for the Cauchy problem and that inequality (11.12.4) is satisfied. If
$$I = \int_{-\infty}^{\infty} \frac{1}{\sigma^2(x)}\,e^{B(x)}\,dx < \infty,$$
then the semigroup $\{P_t\}_{t\ge 0}$ generated by equation (11.12.3) is asymptotically stable. If $I = \infty$, then $\{P_t\}_{t\ge 0}$ is sweeping.

Example 11.12.1. Consider the differential equation (11.12.2) with $\sigma = 1$ and
$$b(x) = -\frac{\lambda x}{1+x^2},$$
where $\lambda \ge 0$ is a constant. Then
$$B(x) = -\int_0^x \frac{2\lambda y}{1+y^2}\,dy = -\lambda\ln(1+x^2)$$
and
$$u_*(x) = ce^{-\lambda\ln(1+x^2)} = \frac{c}{(1+x^2)^{\lambda}}.$$
The function $u_*$ is integrable on $R$ only for $\lambda > \frac{1}{2}$, and thus the semigroup $\{P_t\}_{t\ge 0}$ is asymptotically stable for $\lambda > \frac{1}{2}$ and sweeping for $0 \le \lambda \le \frac{1}{2}$. This example shows that even though the origin $x = 0$ is attracting in the unperturbed system, asymptotic stability may vanish in a perturbed system whenever the coefficient of the attracting term is not sufficiently strong.
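The integrability threshold $\lambda = \frac{1}{2}$ can be seen numerically by comparing truncated integrals of $(1+x^2)^{-\lambda}$ at two radii: they stabilize in the supercritical case and keep growing in the subcritical case. The truncation radii, grid, and the two sample values of $\lambda$ below are illustrative choices.

```python
import numpy as np

# Numerical look at the integrability threshold in Example 11.12.1:
# u_*(x) ~ (1+x^2)^(-lambda) has finite integral only for lambda > 1/2.

def truncated_integral(lam, R, n=2_000_001):
    x = np.linspace(-R, R, n)
    dx = x[1] - x[0]
    return np.sum((1.0 + x**2) ** (-lam)) * dx

# ratio of truncated integrals at R = 1000 and R = 100:
growth_supercritical = truncated_integral(1.0, 1000.0) / truncated_integral(1.0, 100.0)
growth_subcritical = truncated_integral(0.4, 1000.0) / truncated_integral(0.4, 100.0)
print(growth_supercritical, growth_subcritical)
# ratio ~1 for lambda = 1 (convergent), noticeably > 1 for lambda = 0.4
```

For $\lambda = 1$ the integral is exactly $\pi$, while for $\lambda = 0.4$ the truncated integral grows like $R^{0.2}$, matching the sweeping regime $0 \le \lambda \le \frac{1}{2}$.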

Remark 11.12.1. In Corollary 11.12.1, the condition (11.12.4) may be replaced by the less restrictive assumption
$$\int_0^{\infty} e^{-B(x)}\,dx = \int_{-\infty}^{0} e^{-B(x)}\,dx = \infty. \tag{11.12.6}$$

Exercises

11.1. Let $\{w(t)\}_{t\ge 0}$ be a one-dimensional Wiener process defined on a complete probabilistic measure space. Show that for every $t_0 \ge 0$, $r > 0$, and $M > 0$ the probability of the event
$$\left\{\left|\frac{w(t_0+h) - w(t_0)}{h}\right| \le M \text{ for } 0 < h \le r\right\}$$
is equal to zero. Using this, show that for every fixed $t_0 \ge 0$ the probability of the event
$$\{w'(t_0) \text{ exists}\}$$
is equal to zero.

11.2. Generalize the previous result and show that the probability of the event
$$\{w'(t) \text{ exists at least for one } t \ge 0\}$$
is equal to zero.

11.3. Show that every regular sequence $\{s_n\}$ of Ito approximation sums for the integral
$$\int_0^T w(t)\,dw(t)$$
converges to $\tfrac{1}{2}w^2(T) - \tfrac{1}{2}T$ not only stochastically but also in the mean [i.e., strongly in $L^2(\Omega)$].

11.4. Consider the stochastic differential equation
$$\frac{dx}{dt} = \cdots, \qquad t > 0,\ x \in R,$$
where $c$ and $\sigma > 0$ are constant and $\xi$ is normalized white noise. Show that the corresponding stochastic semigroup $\{P_t\}_{t\ge 0}$ is asymptotically stable (Mackey, Longtin, and Lasota, 1990).

11.5. Show that the stochastic semigroup $\{P_t\}_{t\ge 0}$ defined in Exercise 7.8 is asymptotically stable for an arbitrary stochastic kernel $K$ (Jama, 1986).

11.6. A stochastic semigroup $\{P_t\}_{t\ge 0}$ is called weakly (strongly) mixing if, for every $f_1, f_2 \in D$, the difference $P_t f_1 - P_t f_2$ converges weakly (strongly) to zero in $L^1$. Show that the stochastic semigroup $\{T_t\}_{t\ge 0}$ given by equation (7.9.9), corresponding to the heat equation, is strongly mixing.

11.7. Consider equation (11.12.3) with $b(x) = x/(1+x^2)$ and $\sigma = 1$. Prove that the stochastic semigroup $\{P_t\}_{t\ge 0}$ corresponding to this equation satisfies
$$\int_{-\infty}^{\infty} (\arctan x)\,P_t f(x)\,dx = \text{constant}$$
and is not weakly mixing (Brzezniak and Szafirski, 1991).

11.8. Consider the semigroup $\{P_t\}_{t\ge 0}$ defined in the previous exercise. Show that the limit
$$\lim_{t\to\infty} H(P_t f_1 \mid P_t f_2), \qquad f_1, f_2 \in D,$$
depends on the choice of $f_1$ and $f_2$, where $H$ denotes the conditional entropy, cf. Chapter 9.

12
Markov and Foias Operators

Throughout this book we have studied the asymptotic behavior of densities. However, in some cases the statistical properties of dynamical systems are better described if we use a more general notion than a density, namely, a measure. In fact, the sequences (or flows) of measures generated by dynamical systems simultaneously generalize the notion of trajectories and the sequences (or flows) of densities. They are of particular value in studying fractals.
The study of the evolution of measures related to dynamical systems is difficult. It is more convenient to study them by use of functionals on the space $C_0(X)$ of continuous functions with bounded support. Thus, we start in Section 12.1 by examining the relationship between measures and linear functionals given by the Riesz representation theorem, and then look at weak and strong convergence notions for measures in Section 12.2. After defining the notions of Markov and Foias operators on measures (Sections 12.3 and 12.4, respectively), we study the behavior of dynamical systems with stochastic perturbations. Finally, we apply these results to the theory of fractals in Section 12.8.

12.1 The Riesz Representation Theorem

Let $X \subset R^d$ be a nonempty closed set which, in general, is unbounded. We denote by $\mathcal{B} = \mathcal{B}(X)$ the $\sigma$-algebra of Borel subsets of $X$. A measure $\mu\colon \mathcal{B} \to R_+$ will be called locally finite if it is finite on every bounded measurable subset of $X$, that is,
$$\mu(A) < \infty \qquad\text{for } A \in \mathcal{B},\ A \text{ bounded}.$$
Of course, every locally finite measure $\mu$ is $\sigma$-finite, since $X$ may be written as a countable sum of bounded sets:
$$X = \bigcup_{n=1}^{\infty} X_n, \tag{12.1.1}$$
where
$$X_n = \{x \in X\colon |x| \le n\}.$$


The space of all locally finite measures on X will be denoted by M =
M(X). The subspace of M which contains only finite or probabilistic measures will be denoted by Mftn and M 1 , respectively. We say that a measure
I' is supported on a set A if JJ(X \A) = 0. Observe that the set A on which
I' is supported is in general not unique, since if B is measurable and con-

tains A, then X\ A ::J X\ B and consequently JJ(X \B) = 0. The elements


of M 1 are often called distributions.
In general, the smallest measurable set on which a measure 1 is supported
does not exist. However, this difficulty may be partially avoided. Denote
by Br (x) a ball in X with center located at x E X and radius r, that is,

Br(x) = {y EX: IY- xl < r}.


Let 1 E M. We define the support of the measure 1 by setting
supp Jl.

= {x EX: JJ(Be(x)) > 0 for every e > 0}.

It is easy to verify that supp 1 is a closed set. Observe that it also has the
property that if A is a closed set and Jl. is supported on A, then A ::J supp IJ.
To see this, assume that x A. Since X \ A is an open set, there exists a
ball Be (x) contained in X \ A. Thus,

JJ(Be(x)) $

~t(X \A)

= 0,

and x supp IJ This shows that x A implies x supp 1, and consequently A ::J supp IJ
From the above arguments it follows that the support of a measure Jl. can
be equivalently defined as the smallest closed set on which Jl. is supported.
(The adjective closed is important here.)
It should also be noted that the definition of the support of a measure J.1.
does not coincide exactly with the definition of the support of an element
I E 1 The main difference is that supp I' is defined precisely for every
single point, but supp I is not (see Remarks 3.12 and 3.13).

We will often discuss measures that are supported on finite or countable sets. Perhaps the simplest of these is the $\delta$-Dirac measure defined by
$$\delta_{x_0}(A) = \begin{cases} 1 & \text{if } x_0 \in A, \\ 0 & \text{if } x_0 \notin A. \end{cases} \tag{12.1.2}$$
Another important class of measures are those absolutely continuous with respect to the standard Borel measure on $X$. According to Definition 3.1.4, every measure that is absolutely continuous with respect to the Borel measure is given by
$$\mu_f(A) = \int_A f(x)\,dx \qquad\text{for } A \in \mathcal{B}, \tag{12.1.3}$$
where $f \in L^1(X)$ and $f \ge 0$.


Let Co = Co(X) be the space of all continuous functions h: X -+ R with
compact support. Our goal is to study the relationship between locally finite
measures on X and linear functionals on Co. We start with the following.

Definition 12.1.1. A mapping <p: Co-+ R is called a linear functional


if
<p(Atht +..\2h2)

= At<p(ht)+..X2<p(h2)

for At.A2 E R;h11h2 E Co. (12.1.4)

A linear functional is positive if <p(h) ;::: 0 for every hE Co with h;::: 0.


It is easy to define a linear functional corresponding to a locally finite
measure I' Namely, we may write

<p(h) =

h(x)J(dx)

for hE Co.

(12.1.5)

Since the support of h is bounded and J is finite on bounded sets, this


integral is always well defined. Further, from the known properties of integrals (see Section 2.2) it follows that condition {12.1.4) is satisfied and
that <p(h) ;::: 0 for h;::: 0. Thus by (12.1.5) every measure J E M defines a
positive linear functional on Co in a natural way.
It is surprising that formula (12.1.5) gives all positive functionals on Co.
Namely, the following celebrated Riesz representation theorem holds.

Theorem 12.1.1. For every positive linear functional <p: Co-+ R there is
a unique measure J EM such that condition (12.1.5) is satisfied.
The proof can be found in Halmas [1974].
Observe that Theorem 12.1.1 is somewhat similar to the Radon-Nikodym
theorem. In the Radon-Nikodym theorem, a measure is represented by integrals with a given density. In the ruesz theorem a functional is represented
by integrals with a given measure. However, it should be noted that in the
ruesz theorem even the uniqueness of the measure J is not obvious. Namely,

396

12. Markov and Foias Operators

we cannot substitute the characteristic function h = 1A of a measurable


set A c X into formula (12.1.5) to find an explicit value of J.&(A). In general, except for some trivial cases like A = 0 or A = X, the characteristic
function 1A is not continuous and tp(1A) is not defined.
The Riesz theorem allows us to also characterize finite and probabilistic measures by the use of corresponding functionals. Consider first the simplest case when $X$ is bounded. Then $h = 1_X$ has bounded support and is continuous on $X$. From (12.1.5) it follows immediately that
$$\varphi(1_X) = \mu(X).$$
Thus, probabilistic measures correspond to those functionals for which
$$\varphi(1_X) = 1. \tag{12.1.6}$$
In the case when $X$ is bounded it is not necessary to characterize the finite measures, since every locally finite measure is automatically finite.
However, if $X$ is unbounded we cannot substitute $h = 1_X$ into (12.1.5) and we must use a more sophisticated method. Namely, let $\{h_n\}$ with $h_n \in C_0$ be a sequence of functions such that
$$0 \le h_1 \le h_2 \le \cdots, \qquad \lim_{n\to\infty} h_n(x) = 1 \qquad\text{for } x \in X. \tag{12.1.7}$$
Substituting $h_n$ into (12.1.5) we obtain
$$\varphi(h_n) = \int_X h_n(x)\mu(dx),$$
which by the Lebesgue monotone convergence theorem (see Remark 2.2.4) gives
$$\lim_{n\to\infty}\varphi(h_n) = \int_X 1\,\mu(dx) = \mu(X). \tag{12.1.8}$$
Thus, probabilistic measures correspond to functionals $\varphi$ such that
$$\lim_{n\to\infty}\varphi(h_n) = 1 \tag{12.1.9}$$
for any sequence $\{h_n\}$ with $h_n \in C_0$ satisfying conditions (12.1.7). Further, from equality (12.1.8) it follows that the validity of condition (12.1.9) does not depend on the particular choice of the sequence $\{h_n\}$. In other words, if (12.1.9) holds for one sequence $\{h_n\}$ with $h_n \in C_0$ and satisfying (12.1.7), then it is also valid for any other sequence of the same type.
Using (12.1.8) we may also characterize finite measures. Namely, all of the finite measures correspond to functionals such that
$$\lim_{n\to\infty}\varphi(h_n) < \infty \tag{12.1.10}$$

for any sequence $\{h_n\}$ with $h_n \in C_0$ satisfying conditions (12.1.7). Again the validity of (12.1.10) does not depend on the choice of $\{h_n\}$.

Example 12.1.1. Consider a $\delta$-Dirac measure $\mu = \delta_{x_0}$ supported on the point set $\{x_0\}$ and given by conditions (12.1.2). Then formula (12.1.5) implies
$$\varphi(h) = \int_X h(x)\mu(dx) = \int_{\{x_0\}} h(x)\mu(dx) = h(x_0).$$
Thus, the functional that corresponds to the $\delta$-Dirac measure supported on $\{x_0\}$ is simply the map that adjoins to each function $h \in C_0$ its value at $x_0$. This observation is, incidentally, the starting point for the Schwartz [1966] approach to the theory of generalized functions. □

Example 12.1.2. Consider an absolutely continuous measure $\mu_f$ with a density $f$. In this case, formula (12.1.5) gives
$$\varphi(h) = \int_X h(x)\mu_f(dx) = \int_X h(x)f(x)\,dx = \langle h, f\rangle.$$
Thus, the functional corresponding to an absolutely continuous measure is given by a scalar product. □

12.2 Weak and Strong Convergence of Measures

In Section 2.3 we introduced the notions of the weak and strong convergence of sequences of $L^p$ functions. In a somewhat similar (but not identical!) way we may introduce the concepts of weak and strong convergence of sequences of measures. We start from the definition of weak convergence, since it is quite simple and natural.

Definition 12.2.1. Let $\{\mu_n\}$ with $\mu_n \in M$ be a sequence of measures and let $\mu \in M$. We say that $\{\mu_n\}$ is weakly convergent to $\mu$ if
$$\lim_{n\to\infty}\int_X h(x)\mu_n(dx) = \int_X h(x)\mu(dx) \qquad\text{for every } h \in C_0. \tag{12.2.1}$$

Before giving examples of weak convergence, observe that in the case when $\mu_n$ and $\mu$ are absolutely continuous, and have densities $f_n$ and $f$, respectively, condition (12.2.1) reduces to
$$\langle h, f_n\rangle = \int h(x)f_n(x)\,dx \to \int h(x)f(x)\,dx = \langle h, f\rangle \qquad\text{for } h \in C_0. \tag{12.2.2}$$
This looks quite similar to condition (2.3.2) in Definition 2.3.1 for the weak convergence of a sequence $\{f_n\}$ of functions in $L^p$ space. However, there is
398

12. Markov and Foias Operators

an important difference between conditions (2.3.2) and (12.2.2). Namely, in


(2.3.2) the space of "test functions" g is larger and we must verify (2.3.2)
for all g which belong to the space v' adjoint to[)'. In (12.2.2) all the ''test
functions" h belong to Co and thus are continuous with compact supports.
To simplify the notation we will quite often use the notion of scalar product for measures. Thus, we write
\[
\langle h, \mu\rangle = \int_X h(x)\,\mu(dx).
\]
In this notation the weak convergence of measures has an especially simple form. Namely, $\{\mu_n\}$ converges to $\mu$ weakly if
\[
\lim_{n\to\infty} \langle h, \mu_n\rangle = \langle h, \mu\rangle \qquad \text{for } h \in C_0. \tag{12.2.3}
\]

Example 12.2.1. Let $X = R$ and let $\mu_n = \delta_{x_n}$ be a sequence of $\delta$-Dirac measures supported at points $x_n \in R$. Assume that $\{x_n\}$ converges to $x_*$ and denote by $\mu_* = \delta_{x_*}$ the $\delta$-Dirac measure supported at $x_*$. We have
\[
\langle h, \mu_n\rangle = h(x_n) \quad\text{and}\quad \langle h, \mu_*\rangle = h(x_*) \qquad \text{for } h \in C_0.
\]
For each fixed $h \in C_0$, the continuity of $h$ implies that the sequence $\{h(x_n)\}$ converges to $h(x_*)$. Consequently, the sequence of measures $\{\mu_n\}$ converges weakly to $\mu_*$. $\square$

Example 12.2.2. Let $X = R$ and let $\{\mu_n\}$ be a sequence of measures with Gaussian densities
\[
f_n(x) = \frac{1}{\sqrt{2\pi}\,\sigma_n}\exp\Bigl(-\frac{x^2}{2\sigma_n^2}\Bigr), \qquad n = 1, 2, \ldots. \tag{12.2.4}
\]
Assume that $\sigma_n \to 0$ as $n \to \infty$ and denote by $\mu_* = \delta_0$ the $\delta$-Dirac measure supported at $x = 0$. We have
\[
|\langle h, \mu_n\rangle - \langle h, \mu_*\rangle|
= \Bigl|\int_R h(x)f_n(x)\,dx - h(0)\Bigr|
= \Bigl|\int_R h(x)f_n(x)\,dx - \int_R f_n(x)h(0)\,dx\Bigr|
\le \int_R |h(x) - h(0)|\,f_n(x)\,dx.
\]
Choose an $\varepsilon > 0$ and let $r > 0$ be such that $|h(x) - h(0)| \le \varepsilon$ for $|x| \le r$. Then
\[
|\langle h, \mu_n - \mu_*\rangle|
\le \int_{|x|\le r} |h(x) - h(0)|\,f_n(x)\,dx + \int_{|x|\ge r} |h(x) - h(0)|\,f_n(x)\,dx
\le \varepsilon + 2M\int_{|x|\ge r} f_n(x)\,dx,
\]
where $M = \max|h(x)|$. Using (12.2.4) and setting $x/\sigma_n = y$ we finally have
\[
|\langle h, \mu_n - \mu_*\rangle| \le \varepsilon + \frac{4M}{\sqrt{2\pi}}\int_{r/\sigma_n}^{\infty}\exp\Bigl(-\frac{y^2}{2}\Bigr)dy.
\]
Since the sequence $\{\sigma_n\}$ converges to zero, the last integral also converges to zero, which implies that
\[
\lim_{n\to\infty}\langle h, \mu_n - \mu_*\rangle = 0.
\]
Thus, the Gaussian measures converge weakly to a $\delta$-Dirac measure when the standard deviations go to zero. $\square$
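This convergence is easy to watch numerically. The following sketch (an illustration of ours, not part of the text: the test function $h(x) = \cos x$, the integration interval, and the grid size are arbitrary choices) approximates $\langle h, \mu_n\rangle$ by a Riemann sum and observes it approaching $h(0) = 1$ as $\sigma_n \to 0$.

```python
import numpy as np

def pairing(h, sigma, lo=-10.0, hi=10.0, n=200000):
    # <h, mu_n> = integral of h(x) f_n(x) dx, with f_n the Gaussian
    # density (12.2.4); midpoint Riemann sum on a uniform grid
    step = (hi - lo) / n
    x = np.linspace(lo, hi, n, endpoint=False) + step / 2
    f = np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    return float(np.sum(h(x) * f) * step)

# as sigma shrinks, <h, mu_n> approaches h(0) = 1
errors = [abs(pairing(np.cos, s) - 1.0) for s in (1.0, 0.1, 0.01)]
```

For $h = \cos$ the exact value of $\langle h, \mu_n\rangle$ is $e^{-\sigma_n^2/2}$, so the errors above shrink roughly like $\sigma_n^2/2$.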

Example 12.2.3. As in the previous example, let the $\mu_n$ be Gaussian measures with densities given by (12.2.4). This time, however, assume that $\sigma_n \to \infty$. Denote by $\mu$ the measure identically equal to zero. We have
\[
|\langle h, \mu_n - \mu\rangle| = |\langle h, \mu_n\rangle|
\le \int_{-\infty}^{+\infty} |h(x)|\,f_n(x)\,dx
= \int_K |h(x)|\,f_n(x)\,dx, \tag{12.2.5}
\]
where $K$ is the support of $h$. Let $[a, b]$ denote a bounded interval which contains $K$. From (12.2.4) and (12.2.5) it follows immediately that
\[
|\langle h, \mu_n - \mu\rangle|
\le \frac{\max|h|}{\sqrt{2\pi}\,\sigma_n}\int_a^b \exp\Bigl(-\frac{x^2}{2\sigma_n^2}\Bigr)dx
\le \frac{\max|h|\,(b-a)}{\sqrt{2\pi}\,\sigma_n}.
\]
Since the sequence $\{\sigma_n\}$ converges to infinity, the integrals on the right-hand side converge to zero. This shows that the Gaussian measures converge weakly to zero when the standard deviations go to infinity. Observe, however, that in this case the sequence of densities $\{f_n\}$ does not converge weakly in $L^1$ to $f_* = 0$. In fact, setting $g \equiv 1$ in (2.3.2) we have
\[
\langle g, f_n\rangle = \int_R f_n(x)\,dx = 1, \qquad \langle g, f_*\rangle = 0,
\]
and the sequence $\{\langle g, f_n\rangle\}$ does not converge to $\langle g, f_*\rangle$. $\square$
The weak convergence of $\{\mu_n\}$ to $\mu$ does not imply the convergence of $\{\mu_n(A)\}$ to $\mu(A)$ for all measurable sets $A$. However, it is easy to obtain some inequalities between $\mu(A)$ and $\mu_n(A)$ for large $n$ and some special sets $A$.
We say that $G \subset X$ is open in $X$ if $X \setminus G$ is a closed set. For example, the ball
\[
B_r(x) = \{y \in X : |x - y| < r\}
\]
is open in $X$ since
\[
X \setminus B_r(x) = \{y \in X : |x - y| \ge r\}
\]
is a closed set.

Theorem 12.2.1. Assume that a sequence $\mu_n \in M_{\mathrm{fin}}$ converges weakly to $\mu \in M_{\mathrm{fin}}$. Then
\[
\liminf_{n\to\infty}\mu_n(G) \ge \mu(G) \qquad \text{for } G \subset X,\ G \text{ open in } X. \tag{12.2.6}
\]

Proof. Since $G$ is open in $X$ there exists a sequence of compact sets $F_1 \subset F_2 \subset \cdots$ such that
\[
G = \bigcup_{k=1}^{\infty} F_k.
\]
Thus, $\lim_{k\to\infty}\mu(F_k) = \mu(G)$, and for any given $\varepsilon > 0$ there is a set $F_k$ such that $\mu(F_k) \ge \mu(G) - \varepsilon$. Let $h \in C_0(X)$ be such that $0 \le h \le 1$ and
\[
h(x) = \begin{cases} 1 & \text{for } x \in F_k,\\ 0 & \text{for } x \in X\setminus G. \end{cases}
\]
Since $F_k$ and $X\setminus G$ are closed and disjoint, such a function $h$ always exists. Evidently $h \le 1_G$, so
\[
\langle h, \mu_n\rangle \le \mu_n(G),
\]
which gives, in the limit,
\[
\langle h, \mu\rangle \le \liminf_{n\to\infty}\mu_n(G).
\]
On the other hand, $h \ge 1_{F_k}$ and
\[
\langle h, \mu\rangle \ge \mu(F_k) \ge \mu(G) - \varepsilon.
\]
Consequently,
\[
\liminf_{n\to\infty}\mu_n(G) \ge \mu(G) - \varepsilon.
\]
Since $\varepsilon > 0$ was arbitrary this completes the proof.

Remark 12.2.1. It is easy to observe that in general the inequality in (12.2.6) cannot be replaced by equality. In fact, let $X = R$, $\mu_n = \delta_{1/n}$, $\mu = \delta_0$ and $A = (0, 1)$. In this case the sequence $\{\mu_n\}$ converges weakly to $\mu$, but $\mu_n(A) = 1$, $\mu(A) = 0$, and the inequality (12.2.6) is strict. $\square$
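The failure of set-wise convergence in this remark can be traced directly. In the sketch below (our illustration, not part of the text; the continuous test function is an arbitrary choice) a Dirac measure is represented simply by its atom, so that $\langle h, \delta_x\rangle = h(x)$.

```python
import math

def pair(h, atom):
    # <h, delta_x> = h(x) for a Dirac measure represented by its atom x
    return h(atom)

h = lambda x: math.exp(-x * x)          # a continuous test function
ns = (2, 10, 100)
gaps = [abs(pair(h, 1.0 / n) - pair(h, 0.0)) for n in ns]   # -> 0: weak convergence
in_A = [0.0 < 1.0 / n < 1.0 for n in ns]                    # mu_n(A) = 1 for A = (0, 1)
# yet delta_0(A) = 0, so mu_n(A) does not converge to mu(A)
```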


Now we are going to show how the Riesz representation theorem may be used to show that a given sequence of measures is convergent.

Theorem 12.2.2. Let a sequence of measures $\{\mu_n\}$, $\mu_n \in M$, be given. If for each $h \in C_0$ the sequence $\{\langle h, \mu_n\rangle\}$ is convergent, then there is a unique measure $\mu$ such that $\{\mu_n\}$ converges weakly to $\mu$.

Proof. Define
\[
\varphi(h) = \lim_{n\to\infty}\langle h, \mu_n\rangle \qquad \text{for } h \in C_0.
\]
Evidently $\varphi$ is a linear positive functional. Thus, according to Theorem 12.1.1 there is a unique measure $\mu$ such that
\[
\varphi(h) = \langle h, \mu\rangle \qquad \text{for } h \in C_0.
\]
From this and the definition of $\varphi$, it follows that the sequence $\{\mu_n\}$ converges to $\mu$ weakly.
Remark 12.2.2. In the special case when the $\mu_n$ are probabilistic measures the use of Theorem 12.2.2 can be greatly simplified. Namely, it is not necessary to verify the convergence of the sequences $\{\langle h, \mu_n\rangle\}$ for all $h \in C_0$. Let $C_* \subset C_0$ be a dense subset of $C_0$, which means that for every $h \in C_0$ and $\varepsilon > 0$ there is $g \in C_*$ such that
\[
\sup_{x\in X}|g(x) - h(x)| \le \varepsilon.
\]
Then the convergence of the sequences $\{\langle g, \mu_n\rangle\}$ for $g \in C_*$ implies the convergence of $\{\langle h, \mu_n\rangle\}$ for $h \in C_0$. In fact, the inequality
\[
|\langle h, \mu_n - \mu_m\rangle| \le |\langle g, \mu_n - \mu_m\rangle| + 2\sup|g - h|
\]
and the Cauchy condition for all sequences $\{\langle g, \mu_n\rangle\}$ imply the Cauchy condition for $\{\langle h, \mu_n\rangle\}$. $\square$
We close this section by introducing the concept of the strong convergence of measures. First we need to define the distance between two measures $\mu_1, \mu_2 \in M_{\mathrm{fin}}$. Let $(X_1, \ldots, X_n)$ be a measurable partition of $X$, that is,
\[
X = \bigcup_{i=1}^{n} X_i, \qquad X_i \cap X_j = \emptyset \ \text{ for } i \ne j, \qquad X_i \in \mathcal{B}.
\]
We set
\[
\|\mu_1 - \mu_2\| = \sup \sum_{i=1}^{n} |\mu_1(X_i) - \mu_2(X_i)|, \tag{12.2.7}
\]
where the supremum is taken over all possible measurable partitions of $X$ (with arbitrary $n$). The value $\|\mu_1 - \mu_2\|$ is the desired distance. In the special case where $\mu = \mu_1$ is arbitrary and $\mu_2 = 0$ we have
\[
\|\mu\| = \sup \sum_{i=1}^{n} \mu(X_i) = \mu(X). \tag{12.2.8}
\]
This value will be called the norm of the measure $\mu$. It is the distance from $\mu$ to zero. The norm of a probabilistic measure is equal to 1.

Definition 12.2.2. We say that a sequence $\{\mu_n\}$, $\mu_n \in M_{\mathrm{fin}}$, is strongly convergent to a measure $\mu \in M_{\mathrm{fin}}$ if
\[
\lim_{n\to\infty}\|\mu_n - \mu\| = 0. \tag{12.2.9}
\]

Before passing to examples of strong convergence, we will calculate the norm $\|\mu_1 - \mu_2\|$ in the case when the measures $\mu_1$ and $\mu_2$ are absolutely continuous with Radon–Nikodym derivatives $f_1$ and $f_2$, respectively. We have
\[
\mu_1(X_i) - \mu_2(X_i) = \int_{X_i} (f_1(x) - f_2(x))\,dx.
\]
Substituting this into (12.2.7) we obtain immediately
\[
\|\mu_1 - \mu_2\| = \sup \sum_i \Bigl|\int_{X_i} (f_1(x) - f_2(x))\,dx\Bigr|
\le \sup \sum_i \int_{X_i} |f_1(x) - f_2(x)|\,dx
= \int_X |f_1(x) - f_2(x)|\,dx. \tag{12.2.10}
\]
Now let
\[
X_1 = \{x : f_1(x) \ge f_2(x)\}, \qquad X_2 = \{x : f_1(x) < f_2(x)\}.
\]
Then $(X_1, X_2)$ is a partition of $X$ and, consequently,
\[
\|\mu_1 - \mu_2\| \ge |\mu_1(X_1) - \mu_2(X_1)| + |\mu_1(X_2) - \mu_2(X_2)|
= \int_{X_1} (f_1(x) - f_2(x))\,dx + \int_{X_2} (f_2(x) - f_1(x))\,dx
= \int_X |f_1(x) - f_2(x)|\,dx.
\]
This and (12.2.10) imply
\[
\|\mu_1 - \mu_2\| = \int_X |f_1(x) - f_2(x)|\,dx. \tag{12.2.11}
\]
From this equality a necessary and sufficient condition for the strong convergence of absolutely continuous measures follows immediately. Namely, if the $\mu_n$ are absolutely continuous with densities $f_n$, and $\mu$ is absolutely continuous with density $f$, then $\{\mu_n\}$ converges strongly to $\mu$ if and only if $\|f_n - f\|_{L^1} \to 0$.
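Formula (12.2.11) can be checked numerically. The sketch below (our own illustration; the two Gaussian densities and the grid are arbitrary choices) computes the $L^1$ distance of two densities and verifies that the two-set partition $X_1 = \{f_1 \ge f_2\}$, $X_2 = \{f_1 < f_2\}$ already attains it.

```python
import numpy as np

x = np.linspace(-12.0, 12.0, 400001)
dx = x[1] - x[0]
f1 = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)           # density of N(0, 1)
f2 = np.exp(-(x - 1.0)**2 / 2) / np.sqrt(2 * np.pi)   # density of N(1, 1)

# right-hand side of (12.2.11): the L^1 distance of the densities
l1 = float(np.sum(np.abs(f1 - f2)) * dx)

# lower bound coming from the partition (X1, X2) of the text
m1 = float(np.sum((f1 - f2)[f1 >= f2]) * dx)
m2 = float(np.sum((f2 - f1)[f1 < f2]) * dx)
tv = m1 + m2   # equals l1: the supremum in (12.2.7) is attained here
```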
Example 12.2.4. Assume $X = R$ and let $x_0 \in X$. Denote by $\mu_0 = \delta_{x_0}$ the $\delta$-Dirac measure supported at $x_0$. Further, let $\{\mu_n\}$ be a sequence of absolutely continuous measures with densities $f_n$. Write
\[
X_1 = \{x_0\}, \qquad X_2 = X \setminus \{x_0\},
\]
where, as usual, $\{x_0\}$ denotes the set that contains only the one point $x = x_0$. We have
\[
\mu_0(X_1) = 1, \qquad \mu_0(X_2) = 0,
\]
and
\[
\mu_n(X_1) = \int_{\{x_0\}} f_n(x)\,dx = 0, \qquad
\mu_n(X_2) = \int_{X\setminus\{x_0\}} f_n(x)\,dx = \int_X f_n(x)\,dx = 1.
\]
Thus, since $(X_1, X_2)$ is a partition of $X$,
\[
\|\mu_n - \mu_0\| \ge |\mu_n(X_1) - \mu_0(X_1)| + |\mu_n(X_2) - \mu_0(X_2)| = |0 - 1| + |1 - 0| = 2.
\]
This shows that a sequence of absolutely continuous measures cannot converge strongly to a $\delta$-Dirac measure. $\square$

Example 12.2.5. Assume $X = R$ and consider a probabilistic measure $\mu$ supported on the set of nonnegative integers $\{0, 1, \ldots\}$. The measure $\mu$ may be written in the form
\[
\mu = \sum_{k=0}^{\infty} c_k\delta_k, \qquad \sum_{k=0}^{\infty} c_k = 1, \tag{12.2.12}
\]
where $\delta_k$ denotes the $\delta$-Dirac measure supported at $x = k$. Further, let $\{\mu_n\}$ be a sequence of similar measures, so
\[
\mu_n = \sum_{k=0}^{\infty} c_{kn}\delta_k, \qquad \sum_{k=0}^{\infty} c_{kn} = 1. \tag{12.2.13}
\]
Assume that for each fixed $k$ ($k = 0, 1, \ldots$) the sequence $\{c_{kn}\}$ converges to $c_k$ as $n \to \infty$. We are going to show that under this condition the sequence of measures $\{\mu_n\}$ converges strongly to $\mu$. Thus we must evaluate the distance $\|\mu_n - \mu\|$.

From (12.2.12) and (12.2.13) it follows that
\[
|\mu_n(X_i) - \mu(X_i)| \le \sum_{k=0}^{\infty} |c_{kn} - c_k|\,\delta_k(X_i)
\]
for each measurable subset $X_i$ of $X$. Consequently,
\[
\|\mu_n - \mu\| = \sup \sum_{i=1}^{m} |\mu_n(X_i) - \mu(X_i)|
\le \sup \sum_{i=1}^{m} \sum_{k=0}^{\infty} |c_{kn} - c_k|\,\delta_k(X_i)
\le \sup \sum_{k=0}^{\infty} |c_{kn} - c_k| \sum_{i=1}^{m} \delta_k(X_i),
\]
where the supremum is taken over all partitions $\{X_i\}$ of $X$. Since for every partition
\[
\sum_{i=1}^{m} \delta_k(X_i) = 1, \qquad k = 0, 1, \ldots,
\]
this gives
\[
\|\mu_n - \mu\| \le \sum_{k=0}^{\infty} |c_{kn} - c_k|. \tag{12.2.14}
\]
Now fix an $\varepsilon > 0$ and choose an integer $N$ such that
\[
\sum_{k=N+1}^{\infty} c_k < \frac{\varepsilon}{4}.
\]
When $N$ is fixed we can find an integer $n_0$ such that
\[
\sum_{k=0}^{N} |c_{kn} - c_k| \le \frac{\varepsilon}{4} \qquad \text{for } n \ge n_0.
\]
We have, therefore,
\[
\sum_{k=N+1}^{\infty} c_{kn} = 1 - \sum_{k=0}^{N} c_{kn}
\le 1 - \sum_{k=0}^{N} c_k + \sum_{k=0}^{N} |c_{kn} - c_k|
= \sum_{k=N+1}^{\infty} c_k + \sum_{k=0}^{N} |c_{kn} - c_k|
\le \frac{\varepsilon}{4} + \frac{\varepsilon}{4} = \frac{\varepsilon}{2},
\]
and, finally,
\[
\sum_{k=0}^{\infty} |c_{kn} - c_k|
\le \sum_{k=0}^{N} |c_{kn} - c_k| + \sum_{k=N+1}^{\infty} c_{kn} + \sum_{k=N+1}^{\infty} c_k
\le \frac{\varepsilon}{4} + \frac{\varepsilon}{2} + \frac{\varepsilon}{4} = \varepsilon \qquad \text{for } n \ge n_0.
\]

From the last inequality and (12.2.14) it follows that $\{\mu_n\}$ is strongly convergent to $\mu$.
As a typical situation described in this example, consider a sequence of measures $\{\mu_n\}$ corresponding to the binomial distribution,
\[
c_{kn} = \begin{cases} \binom{n}{k} p_n^k q_n^{n-k} & \text{if } k = 0, \ldots, n,\\ 0 & \text{if } k > n, \end{cases}
\]
where $0 < p_n < 1$ and $q_n = 1 - p_n$. Further, let $\mu$ be the measure corresponding to the Poisson distribution,
\[
c_k = \frac{\lambda^k}{k!}\,e^{-\lambda}.
\]
If $p_n = \lambda/n$, then
\[
c_{kn} = \frac{(n-k+1)\cdots(n-1)\,n}{k!}\,\frac{\lambda^k}{n^k}\Bigl(1 - \frac{\lambda}{n}\Bigr)^{n-k}
= \Bigl(\frac{n-k+1}{n}\Bigr)\cdots\Bigl(\frac{n-1}{n}\Bigr)\Bigl(1 - \frac{\lambda}{n}\Bigr)^{n-k}\frac{\lambda^k}{k!}.
\]
Evidently the first $k$ factors converge to 1 and the factor $(1 - \lambda/n)^{n-k}$ to $e^{-\lambda}$. Thus, $c_{kn} \to c_k$ as $n \to \infty$ for every fixed $k$, and the sequence of measures corresponding to the binomial distribution converges strongly to the measure corresponding to the Poisson distribution. This is a classical result of probability theory known as Poisson's theorem, but it is seldom stated in terms of strong convergence. $\square$

12.3 Markov Operators


In Chapter 3 we introduced Markov operators in Definition 3.1.1, taking a Markov operator to be a linear, positive, and norm-preserving mapping on the space $L^1$. Now we will extend this notion to the space of all finite measures $M_{\mathrm{fin}}$ and, in particular, to all probabilistic measures $M_1$. We start from a formal definition of this extension.

Definition 12.3.1. A mapping $P\colon M_{\mathrm{fin}}(X) \to M_{\mathrm{fin}}(X)$ will be called a Markov operator on measures if it satisfies the following two conditions:
(a) $P(\lambda_1\mu_1 + \lambda_2\mu_2) = \lambda_1 P\mu_1 + \lambda_2 P\mu_2$ for $\lambda_1, \lambda_2 \ge 0$, $\mu_1, \mu_2 \in M_{\mathrm{fin}}$, and
(b) $P\mu(X) = \mu(X)$ for $\mu \in M_{\mathrm{fin}}$.

Assumption (a) will often be called the linearity condition; note, however, that it is restricted to nonnegative $\lambda_1, \lambda_2$ only. Assumption (b) may be written in the form $\|P\mu\| = \|\mu\|$ (see (12.2.8)) and will be called the preservation of the norm.
In the following we will quite often omit the qualifying phrase "on measures" if this does not lead to a misunderstanding. On the other hand, when necessary we will add the words "on densities" for the Markov operators described by Definition 3.1.1.
Our first goal is to show how these two definitions of Markov operators are related. Thus, suppose that the Borel measure of the set $X$ is positive (finite or not) and consider an operator $P\colon M_{\mathrm{fin}} \to M_{\mathrm{fin}}$. Assume that it satisfies conditions (a) and (b) of Definition 12.3.1 and that, moreover, for every absolutely continuous $\mu$ the measure $P\mu$ is also absolutely continuous. Take an arbitrary $f \in L^1$, $f \ge 0$, and define
\[
\mu_f(A) = \int_A f(x)\,dx \qquad \text{for } A \in \mathcal{B}. \tag{12.3.1}
\]
Since $P\mu_f$ is absolutely continuous it can be written in the form
\[
P\mu_f(A) = \int_A g(x)\,dx \qquad \text{for } A \in \mathcal{B}, \tag{12.3.2}
\]
where $g$ is the Radon–Nikodym derivative of $P\mu_f$ with respect to the Borel measure on $X$. In this way to every $f \in L^1$, $f \ge 0$, we adjoin a unique $g \in L^1$, $g \ge 0$, for which conditions (12.3.1) and (12.3.2) are satisfied. The uniqueness follows immediately from Proposition 2.2.1 or from the Radon–Nikodym theorem. Thus, $f$ is mapped to $g$. Denote this mapping by $\bar P$, so $g = \bar Pf$. We may illustrate this situation by the diagram
\[
\begin{array}{ccc}
M_a & \stackrel{P}{\longrightarrow} & M_a\\
{\scriptstyle\mathrm{IF}}\,\uparrow & & \downarrow\,{\scriptstyle\mathrm{RN}}\\
L^1_+ & \stackrel{\bar P}{\longrightarrow} & L^1_+
\end{array} \tag{12.3.3}
\]
where $M_a$ denotes the family of absolutely continuous measures, $L^1_+$ is the subspace of $L^1$ which contains the nonnegative functions, IF denotes the integral formula (12.3.1), and RN stands for taking the Radon–Nikodym derivative. The operator $\bar P$ is defined as a "shortcut" between $L^1_+$ and $L^1_+$ or, more precisely, in such a way that the diagram (12.3.3) commutes. Thus, $\bar P$ is the unique operator on densities that corresponds to the operator $P$ on measures.
Substituting (12.3.1) and (12.3.2) with $g = \bar Pf$ we obtain
\[
P\Bigl\{\int f(x)\,dx\Bigr\}(A) = \int_A \bar Pf(x)\,dx \qquad \text{for } A \in \mathcal{B},\ f \in L^1_+. \tag{12.3.4}
\]

This is the shortest analytical description of $\bar P$. To understand this formula correctly we must remember that on the left-hand side the operator $P$ is applied to the measure given by the integral in braces, and then the new measure is applied to the set $A$.
From condition (a) and formula (12.3.4) it follows immediately that $\bar P$ satisfies the linearity condition for nonnegative functions, that is,
\[
\bar P(\lambda_1 f_1 + \lambda_2 f_2) = \lambda_1 \bar Pf_1 + \lambda_2 \bar Pf_2 \qquad \text{for } \lambda_1, \lambda_2 \ge 0,\ f_1, f_2 \in L^1_+. \tag{12.3.5}
\]
Further, using (12.3.4) we obtain
\[
\|\bar Pf\| = \int_X \bar Pf(x)\,dx = P\mu_f(X),
\]
and analogously
\[
\|f\| = \int_X f(x)\,dx = \mu_f(X).
\]
From condition (b) this implies
\[
\|\bar Pf\| = \|f\| \qquad \text{for } f \in L^1_+. \tag{12.3.6}
\]
Now we may extend the definition of $\bar P$ to the whole space $L^1$, which contains all integrable (not necessarily nonnegative) functions, by setting
\[
\bar Pf = \bar Pf^+ - \bar Pf^-, \qquad f^+ = \max(0, f),\ f^- = \max(0, -f). \tag{12.3.7}
\]
Using this extension and condition (12.3.5) one can verify that $\bar P$ is a linear operator. Further, from our construction, and in particular from (12.3.4), it follows that $\bar Pf \ge 0$ for $f \ge 0$. Finally, (12.3.6) shows that $\bar P$ preserves the norm of nonnegative functions. We may summarize this discussion with the following.

Proposition 12.3.1. Let $P\colon M_{\mathrm{fin}} \to M_{\mathrm{fin}}$ be a Markov operator on measures such that for every absolutely continuous measure $\mu$ the measure $P\mu$ is also absolutely continuous. Then the corresponding operator $\bar P$ defined by formulas (12.3.4) and (12.3.7) is a Markov operator on densities and the diagram (12.3.3) commutes.

The commutativity of diagram (12.3.3) has an important consequence. Namely, if $\bar P$ is the operator on densities corresponding to an operator $P$ on measures, then $(\bar P)^n$ corresponds to $P^n$. To prove this consider the following row of $n$ blocked diagrams:
\[
\begin{array}{ccccccc}
M_a & \stackrel{P}{\longrightarrow} & M_a & \stackrel{P}{\longrightarrow} & \cdots & \stackrel{P}{\longrightarrow} & M_a\\
{\scriptstyle\mathrm{IF}}\,\uparrow & & {\scriptstyle\mathrm{IF}}\,\uparrow\ \downarrow\,{\scriptstyle\mathrm{RN}} & & & & \downarrow\,{\scriptstyle\mathrm{RN}}\\
L^1_+ & \stackrel{\bar P}{\longrightarrow} & L^1_+ & \stackrel{\bar P}{\longrightarrow} & \cdots & \stackrel{\bar P}{\longrightarrow} & L^1_+
\end{array} \tag{12.3.8}
\]
Since each of the blocks commutes, the total diagram (12.3.8) also commutes. This shows that $(\bar P)^n$ corresponds to $P^n$.
Remark 12.3.1. There is an evident asymmetry in our approach to the definition of Markov operators. In Section 3.1 we defined a Markov operator on the whole space $L^1$, which contains positive and negative functions $f\colon X \to R$. Now we have defined a Markov operator on $M_{\mathrm{fin}}$, which contains only nonnegative set functions $\mu\colon \mathcal{B} \to R$. This asymmetry can be avoided. Namely, we may extend the definition of $P$ to the set of signed measures, that is, to all possible differences $\mu_1 - \mu_2$ with $\mu_1, \mu_2 \in M_{\mathrm{fin}}$, by setting
\[
P(\mu_1 - \mu_2) = P\mu_1 - P\mu_2.
\]
Such an extension is unnecessary for our purposes; it also leads to some difficulties in calculating integrals and in the use of the Riesz representation theorem, which is more complicated for signed measures on unbounded regions. $\square$
Example 12.3.1. Let $X = R_+$. For a given $\mu \in M_{\mathrm{fin}}$ define
\[
P\mu(A) = \mu([0,1))\,\delta_0(A) + \mu([1,\infty)\cap A), \tag{12.3.9}
\]
where, as usual, $\delta_0$ denotes the $\delta$-Dirac measure supported at $x = 0$. Evidently, $P$ satisfies the linearity condition (a) of Definition 12.3.1. Moreover,
\[
P\mu(R_+) = \mu([0,1))\,\delta_0(R_+) + \mu([1,\infty)\cap R_+) = \mu([0,1)) + \mu([1,\infty)) = \mu(R_+),
\]
which shows that condition (b) is also satisfied. Thus, (12.3.9) defines a Markov operator on measures.
The operator $P$ is relatively simple, but it has an interesting property. Namely, if a measure $\mu \in M_1$ is supported on $[0,1)$, then $P\mu$ is a $\delta$-Dirac measure. If $\mu$ is supported on $[1,\infty)$, then $P\mu = \mu$. In other words, $P$ shrinks all of the measure on $[0,1)$ down to the point $x = 0$ and leaves the remaining portion of the measure untouched. In particular, $P$ does not map absolutely continuous measures into absolutely continuous ones, and the corresponding Markov operator $\bar P$ on densities cannot be defined. $\square$
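Although $P$ has no counterpart on densities, its action on purely atomic measures is easy to compute exactly. The sketch below (our illustration, not from the text; representing a finite atomic measure as a dict of atom-mass pairs is an arbitrary implementation choice) applies (12.3.9) and confirms that the norm is preserved.

```python
def apply_P(atoms):
    # atoms: {position: mass} for a purely atomic measure on R_+
    # (12.3.9): P mu = mu([0,1)) * delta_0 + (mu restricted to [1, oo))
    out = {}
    mass_01 = sum(m for x, m in atoms.items() if 0.0 <= x < 1.0)
    if mass_01 > 0.0:
        out[0.0] = out.get(0.0, 0.0) + mass_01
    for x, m in atoms.items():
        if x >= 1.0:
            out[x] = out.get(x, 0.0) + m
    return out

mu = {0.25: 0.3, 0.5: 0.2, 2.0: 0.5}
Pmu = apply_P(mu)   # all mass in [0, 1) is shrunk to the atom at 0
```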
Example 12.3.2. Let $X = R$ and let $t > 0$ be a fixed number. For every $\mu \in M_{\mathrm{fin}}$ define the measure $P_t\mu$ by
\[
P_t\mu(A) = \int_A \Bigl\{\frac{1}{\sqrt{2\pi t}}\int_R \exp\Bigl(-\frac{(x-y)^2}{2t}\Bigr)\mu(dy)\Bigr\}dx. \tag{12.3.10}
\]
Again the linearity of $P_t$ is obvious, and to verify that $P_t$ is a Markov operator it is sufficient to check the preservation of the norm.
To do this, substitute $A = R$ into (12.3.10) and change the order of integration to obtain
\[
P_t\mu(R) = \int_R \Bigl\{\frac{1}{\sqrt{2\pi t}}\int_R \exp\Bigl(-\frac{(x-y)^2}{2t}\Bigr)dx\Bigr\}\mu(dy).
\]
Inside the braces we have the integral of a Gaussian density, and consequently
\[
P_t\mu(R) = \int_R \mu(dy) = \mu(R),
\]
so $P_t$ is a Markov operator.
To understand the meaning of the family of operators $\{P_t\}$, first observe that for every $\mu \in M_{\mathrm{fin}}$ the measure $P_t\mu$ is given by the integral (12.3.10) and has the Radon–Nikodym derivative
\[
g_t(x) = \frac{1}{\sqrt{2\pi t}}\int_R \exp\Bigl(-\frac{(x-y)^2}{2t}\Bigr)\mu(dy). \tag{12.3.11}
\]
If $\mu$ is absolutely continuous with density $f$, we may replace $\mu(dy)$ by $f(y)\,dy$ and in this way obtain an explicit formula for the operator $\bar P_t$ on densities corresponding to $P_t$. Namely,
\[
\bar P_tf(x) = g_t(x) = \frac{1}{\sqrt{2\pi t}}\int_R \exp\Bigl(-\frac{(x-y)^2}{2t}\Bigr)f(y)\,dy.
\]
The function $u(t,x) = g_t(x)$ is the familiar solution (7.4.11), (7.4.12) of the heat equation (7.4.13),
\[
\frac{\partial u}{\partial t} = \frac{1}{2}\,\frac{\partial^2 u}{\partial x^2} \qquad \text{for } t > 0,\ x \in R,
\]
with the initial condition $u(0,x) = f(x)$.
It is interesting that $u(t,x) = g_t(x)$ satisfies the heat equation even in the case when $\mu$ has no density. This can be verified simply by differentiation of the integral formula (12.3.11). (Such a procedure is always possible since $\mu$ is a finite measure and the integrand
\[
\frac{1}{\sqrt{2\pi t}}\,e^{-(x-y)^2/2t}
\]
and its derivatives are bounded $C^\infty$ functions for $t \ge \varepsilon > 0$.)

Further, in the case of an arbitrary $\mu$ the initial condition is also satisfied. Namely, the measures $P_t\mu$ converge weakly to $\mu$ as $t \to 0$. To prove this choose an arbitrary $h \in C_0(R)$. Since $g_t$ is the Radon–Nikodym derivative of $P_t\mu$ we have
\[
\langle h, P_t\mu\rangle = \int_R h(x)\,P_t\mu(dx) = \int_R h(x)g_t(x)\,dx
= \int_R h(x)\Bigl\{\frac{1}{\sqrt{2\pi t}}\int_R \exp\Bigl(-\frac{(x-y)^2}{2t}\Bigr)\mu(dy)\Bigr\}dx,
\]
or, by changing the order of integration,
\[
\langle h, P_t\mu\rangle = \int_R v(t,y)\,\mu(dy), \tag{12.3.12}
\]
where
\[
v(t,y) = \frac{1}{\sqrt{2\pi t}}\int_R \exp\Bigl(-\frac{(x-y)^2}{2t}\Bigr)h(x)\,dx.
\]
Observe that $v(t,y)$ is the solution of the heat equation corresponding to the initial function $h(y)$. Since $h$ is continuous and bounded, this is a classical solution and we have
\[
\lim_{t\to 0} v(t,y) = h(y) \qquad \text{for } y \in R.
\]
Evidently
\[
|v(t,y)| \le \frac{\max|h|}{\sqrt{2\pi t}}\int_R \exp\Bigl(-\frac{(x-y)^2}{2t}\Bigr)dx = \max|h|.
\]
Thus by the Lebesgue dominated convergence theorem (see Remark 2.2.4)
\[
\lim_{t\to 0}\int_R v(t,y)\,\mu(dy) = \int_R h(y)\,\mu(dy).
\]
From this and (12.3.12) it follows that $P_t\mu$ converges weakly to $\mu$.
Thus, we can say that the family of measures $\{P_t\mu\}$ describes the transport of the initial measure by the heat equation. From a physical point of view, if $u(t,x) = g_t(x)$ is the temperature at time $t$ at the point $x$, then
\[
P_t\mu(A) = \int_A g_t(x)\,dx
\]
is equal (up to a multiplicative constant) to the amount of heat carried by a segment $A$ at time $t$. In particular, substituting $\mu = \delta_{x_0}$ (the $\delta$-Dirac measure supported at $x = x_0$) into (12.3.11) we obtain
\[
u(t,x) = g_t(x) = \frac{1}{\sqrt{2\pi t}}\,e^{-(x-x_0)^2/2t}.
\]
This expression is identical to the fundamental solution $\Gamma(t,x,x_0)$ of the heat equation (see Section 11.7) and it gives a simple physical interpretation of this solution. Namely, $\Gamma(t,x,x_0)$ is the temperature at time $t$ and point $x$ corresponding to the situation in which the initial amount of heat was concentrated at the single point $x_0$. $\square$
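The claim that $u(t,x) = g_t(x)$ satisfies the heat equation even when $\mu$ has no density can be checked by finite differences. In the sketch below (our illustration, not from the text; the two-atom measure, the evaluation point, and the step sizes are arbitrary choices) the residual of (12.3.11) in the heat equation is computed numerically.

```python
import numpy as np

def g(t, x, atoms):
    # the Radon-Nikodym derivative (12.3.11) of P_t mu for an atomic mu
    return sum(m * np.exp(-(x - y)**2 / (2.0 * t)) / np.sqrt(2.0 * np.pi * t)
               for y, m in atoms.items())

atoms = {-1.0: 0.5, 2.0: 0.5}     # a measure with no density: two point masses
t, x = 0.7, 0.3
dt, dx = 1e-5, 1e-4
u_t = (g(t + dt, x, atoms) - g(t - dt, x, atoms)) / (2.0 * dt)
u_xx = (g(t, x + dx, atoms) - 2.0 * g(t, x, atoms) + g(t, x - dx, atoms)) / dx**2
residual = abs(u_t - 0.5 * u_xx)   # near zero: u_t = (1/2) u_xx holds
```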

12.4 Foias Operators


At the end of the previous section we gave two examples of Markov operators constructed by two different methods. The goal of the present section is to develop these methods in detail.
Let $X \subset R^d$ be a nonempty closed set. We start from the following.

Definition 12.4.1. Let $S\colon X \to X$ be a Borel measurable transformation. Then the operator $P\colon M_{\mathrm{fin}} \to M_{\mathrm{fin}}$ defined by
\[
P\mu(A) = \mu(S^{-1}(A)) \qquad \text{for } A \in \mathcal{B}(X) \tag{12.4.1}
\]
is called the Frobenius–Perron operator on measures corresponding to $S$.

Evidently $P$ defined by (12.4.1) is a Markov operator. Now observe how $P$ acts on measures supported on a single point. Let $x_0 \in X$ be fixed. Then
\[
P\delta_{x_0}(A) = \delta_{x_0}\bigl(S^{-1}(A)\bigr) = \begin{cases} 0 & \text{if } x_0 \notin S^{-1}(A)\\ 1 & \text{if } x_0 \in S^{-1}(A) \end{cases}
\]
or
\[
P\delta_{x_0}(A) = \begin{cases} 0 & \text{if } S(x_0) \notin A\\ 1 & \text{if } S(x_0) \in A. \end{cases}
\]
Thus, $P\delta_{x_0} = \delta_{S(x_0)}$. By induction we obtain
\[
P^n\delta_{x_0} = \delta_{S^n(x_0)} \qquad \text{for } n = 1, 2, \ldots.
\]
This shows that the iterates of the Markov operator (12.4.1) can produce a trajectory of the transformation $S$. To obtain this trajectory it is sufficient to start from a $\delta$-Dirac measure.
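The identity $P^n\delta_{x_0} = \delta_{S^n(x_0)}$ can be traced in a few lines. The sketch below (our illustration, not from the text; the logistic map and the starting point are arbitrary choices) pushes an atomic measure forward with (12.4.1) and compares it with the ordinary trajectory of $S$.

```python
def S(x):
    return 4.0 * x * (1.0 - x)   # the logistic map on [0, 1]

def push_forward(atoms, S):
    # Frobenius-Perron operator (12.4.1) on a purely atomic measure:
    # the atom at x moves to S(x); masses of coinciding images add up
    out = {}
    for x, m in atoms.items():
        out[S(x)] = out.get(S(x), 0.0) + m
    return out

x0, n = 0.2, 5
mu, traj = {x0: 1.0}, x0
for _ in range(n):
    mu = push_forward(mu, S)
    traj = S(traj)
# mu is exactly the Dirac measure concentrated at S^n(x0)
```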
We next show that $P$ can also transform densities. Consider the special case when $\mu$ is absolutely continuous with density $f$ and $S$ is a nonsingular transformation. Then
\[
\mu(A) = \int_A f(x)\,dx,
\]
and the right-hand side of (12.4.1) may be written in the form
\[
\mu(S^{-1}(A)) = \int_{S^{-1}(A)} f(x)\,dx = \int_A \bar Pf(x)\,dx,
\]
where $\bar P$ is the Frobenius–Perron operator on densities corresponding to $S$. Now equality (12.4.1) may be explicitly written in the form
\[
P\mu(A) = \int_A \bar Pf(x)\,dx.
\]
This is a special case of formula (12.3.4), and it shows that the Frobenius–Perron operator $\bar P$ on densities corresponds, in the sense of diagram (12.3.3), to the Frobenius–Perron operator $P$ on measures.
This correspondence was obtained under the additional assumption that $S$ is nonsingular. For an arbitrary Borel measurable transformation $S$, the operator $P$ given by (12.4.1) may transform absolutely continuous measures into measures without a density.

Example 12.4.1. Let $X = R_+$ and
\[
S(x) = \begin{cases} 0 & 0 \le x < 1,\\ x & x \ge 1. \end{cases}
\]
Then
\[
S^{-1}(A) = S^{-1}\bigl(A\cap[0,1)\bigr) \cup S^{-1}\bigl(A\cap[1,\infty)\bigr),
\]
where $S^{-1}(A\cap[1,\infty)) = A\cap[1,\infty)$ and
\[
S^{-1}\bigl(A\cap[0,1)\bigr) = \begin{cases} [0,1) & \text{if } 0 \in A,\\ \emptyset & \text{if } 0 \notin A. \end{cases}
\]
From the last formula it follows that
\[
\mu\bigl(S^{-1}(A\cap[0,1))\bigr) = 1_A(0)\,\mu([0,1)).
\]
Consequently, the Frobenius–Perron operator for $S$ is given by
\[
P\mu(A) = \mu(S^{-1}(A)) = 1_A(0)\,\mu([0,1)) + \mu(A\cap[1,\infty)),
\]
which is identical with (12.3.9). $\square$


Now we are going to study a more general, and more complicated, situation in which the dynamical system includes random perturbations. Thus, we consider the system
\[
x_{n+1} = T(x_n, \xi_n) \qquad \text{for } n = 0, 1, \ldots, \tag{12.4.2}
\]
where $T$ is a given transformation and the $\xi_n$ are independent random vectors. We make the following assumptions:

(i) $T$ is defined on the subset $X \times W$ of $R^d \times R^k$ with values in $X$. The set $X \subset R^d$ is closed and $W \subset R^k$ is Borel measurable. For every fixed $y \in W$ the function $T(x, y)$ is continuous in $x$, and for every fixed $x \in X$ it is measurable in $y$.

(ii) The random vectors $\xi_0, \xi_1, \ldots$ have values in $W$ and have the same distribution, that is, the measure
\[
\nu(B) = \operatorname{prob}(\xi_n \in B) \qquad \text{for } B \in \mathcal{B}(W)
\]
is the same for all $n$.

(iii) The initial random vector $x_0$ has values in $X$ and the vectors $x_0, \xi_0, \xi_1, \ldots$ are independent.

A dynamical system of the form (12.4.2) satisfying conditions (i)–(iii) will be called a regular stochastic dynamical system. We emphasize that in studying (12.4.2) it is assumed that the transformation $T$ and the random vectors $\xi_n$ are given. The initial vector $x_0$ can be arbitrary, but it must be such that condition (iii) is satisfied. Observe, in particular, that if $\xi_0, \xi_1, \ldots$ are independent and $x_0 \in X$ is constant (not random), then the vectors $x_0, \xi_0, \xi_1, \ldots$ are also independent. This can be easily verified using the definition of the independence of random vectors and the fact that the value of $\operatorname{prob}(x_0 \in A)$ is either 0 or 1 for $x_0$ constant.

According to (12.4.2) the random vector $x_n$ is a function of $x_0$ and $\xi_0, \xi_1, \ldots, \xi_{n-1}$. From this and condition (iii) it follows that $x_n$ and $\xi_n$ are independent. Using this fact we will derive a recurrence formula for the measures
\[
\mu_n(A) = \operatorname{prob}(x_n \in A), \qquad A \in \mathcal{B}(X), \tag{12.4.3}
\]
which statistically describe the behavior of the dynamical system (12.4.2).
Thus, choose a bounded Borel measurable function $h\colon X \to R$ and for some integer $n \ge 0$ consider the random vector $z_{n+1} = h(x_{n+1})$. Observe that
\[
\mu_{n+1}(A) = \operatorname{prob}\bigl(x_{n+1}^{-1}(A)\bigr).
\]
Using this equality and the change of variables Theorem 3.2.1, the mathematical expectation $E(z_{n+1})$ can be calculated as follows:
\[
E(z_{n+1}) = \int_\Omega h(x_{n+1}(\omega))\,\operatorname{prob}(d\omega)
= \int_X h(x)\,\operatorname{prob}\bigl(x_{n+1}^{-1}(dx)\bigr)
= \int_X h(x)\,\mu_{n+1}(dx) = \langle h, \mu_{n+1}\rangle. \tag{12.4.4}
\]
However, since $z_{n+1} = h(T(x_n, \xi_n))$ we have
\[
E(z_{n+1}) = \int_\Omega h\bigl(T(x_n(\omega), \xi_n(\omega))\bigr)\,\operatorname{prob}(d\omega)
= \int_{X\times W} h(T(x,y))\,\operatorname{prob}\bigl((x_n, \xi_n)^{-1}(dx\,dy)\bigr). \tag{12.4.5}
\]
The independence of the random vectors $x_n$ and $\xi_n$ implies that
\[
\operatorname{prob}\bigl((x_n,\xi_n) \in A\times B\bigr) = \operatorname{prob}(x_n \in A,\ \xi_n \in B)
= \operatorname{prob}(x_n \in A)\operatorname{prob}(\xi_n \in B),
\]
or
\[
\operatorname{prob}\bigl((x_n,\xi_n)^{-1}(A\times B)\bigr) = \operatorname{prob}\bigl(x_n^{-1}(A)\bigr)\operatorname{prob}\bigl(\xi_n^{-1}(B)\bigr),
\]
which shows that the measure $\operatorname{prob}((x_n,\xi_n)^{-1}(C))$ is the product of the measures
\[
\mu_n(A) = \operatorname{prob}\bigl(x_n^{-1}(A)\bigr) \quad\text{and}\quad \nu(B) = \operatorname{prob}\bigl(\xi_n^{-1}(B)\bigr).
\]
Thus, by the Fubini Theorem 2.2.3, equality (12.4.5) may be rewritten in the form
\[
E(z_{n+1}) = \int_X \Bigl\{\int_W h(T(x,y))\,\nu(dy)\Bigr\}\mu_n(dx).
\]
Equating this expression with (12.4.4) we immediately obtain
\[
\langle h, \mu_{n+1}\rangle = \int_X \Bigl\{\int_W h(T(x,y))\,\nu(dy)\Bigr\}\mu_n(dx). \tag{12.4.6}
\]
This is the desired recurrence formula, derived under the assumption that $h$ is Borel measurable and bounded. The boundedness of $h$ ensures that all the integrals appearing in the derivation are well defined and finite, since the measures $\mu_n, \mu_{n+1}, \operatorname{prob}, \ldots$ are probabilistic. The same derivation can be repeated for unbounded $h$ as long as all the integrals are well defined. In particular the derivation can be made for an arbitrary measurable nonnegative $h$; however, in this case the integrals on both sides of (12.4.6) could be infinite.
Using (12.4.6) we may calculate the values of $\mu_{n+1}(A)$ for an arbitrary measurable set $A \subset X$. Namely, setting $h = 1_A$ we obtain
\[
\mu_{n+1}(A) = \int_X \Bigl\{\int_W 1_A(T(x,y))\,\nu(dy)\Bigr\}\mu_n(dx).
\]

Now we are in a position to define the Foias operator corresponding to the dynamical system (12.4.2).

Definition 12.4.2. Let a function $T\colon X\times W \to X$ satisfying condition (i) and a probabilistic measure $\nu$ (supported on $W$) be given. Then the operator $P\colon M_{\mathrm{fin}} \to M_{\mathrm{fin}}$ given by
\[
P\mu(A) = \int_X \Bigl\{\int_W 1_A(T(x,y))\,\nu(dy)\Bigr\}\mu(dx) \qquad \text{for } \mu \in M_{\mathrm{fin}},\ A \in \mathcal{B}(X) \tag{12.4.7}
\]
will be called the Foias operator corresponding to the dynamical system (12.4.2).

Since $\nu$ is a probabilistic measure, it is obvious that $P$ is a Markov operator. Moreover, from the definition of $P$ it follows that $\mu_n = P^n\mu_0$, where $\{\mu_n\}$ denotes the sequence of distributions (12.4.3) generated by the dynamical system (12.4.2).
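For purely atomic $\mu$ and $\nu$ the integrals in (12.4.7) become finite sums, so the distributions $\mu_n = P^n\mu_0$ can be computed exactly. The sketch below (our illustration, not from the text; the map $T$, the state space, and the noise weights are arbitrary choices) iterates the Foias operator for a toy system on $\{0, 1, 2\}$.

```python
def foias_step(mu, nu, T):
    # one application of (12.4.7) for atomic mu (on X) and nu (on W):
    # P mu({z}) = sum over pairs (x, y) with T(x, y) = z of mu({x}) nu({y})
    out = {}
    for x, mx in mu.items():
        for y, ny in nu.items():
            z = T(x, y)
            out[z] = out.get(z, 0.0) + mx * ny
    return out

T = lambda x, y: (x + y) % 3     # a toy perturbed rotation on X = {0, 1, 2}
nu = {1: 0.6, 2: 0.4}            # common distribution of the perturbations
mu = {0: 1.0}                    # mu_0 = delta_0
mu1 = foias_step(mu, nu, T)      # masses approx. {1: 0.6, 2: 0.4}
mu2 = foias_step(mu1, nu, T)     # masses approx. {0: 0.48, 1: 0.16, 2: 0.36}
```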

Setting
\[
Uh(x) = \int_W h(T(x,y))\,\nu(dy) \qquad \text{for } x \in X, \tag{12.4.8}
\]
we may rewrite (12.4.7) in the form
\[
P\mu(A) = \langle U1_A, \mu\rangle.
\]
Due to the linearity of the scalar product this implies
\[
\langle g_n, P\mu\rangle = \langle Ug_n, \mu\rangle,
\]
where
\[
g_n = \sum_{i=1}^{m} \lambda_i 1_{A_i}
\]
is a simple function. Further, since every measurable function $h$ can be approximated by a sequence $\{g_n\}$ of simple functions, we obtain in the limit
\[
\langle h, P\mu\rangle = \langle Uh, \mu\rangle \tag{12.4.9}
\]
if $\{g_n\}$ and $\{Ug_n\}$ satisfy the conditions of the Lebesgue dominated or Lebesgue monotone convergence theorem. In particular, (12.4.9) is valid if $h$ is Borel measurable and bounded or nonnegative.
From (12.4.9) it follows by an induction argument that
\[
\langle h, P^n\mu\rangle = \langle U^nh, \mu\rangle \qquad \text{for } n = 1, 2, \ldots. \tag{12.4.10}
\]
Now define a sequence of functions $T_n(x, y_1, \ldots, y_n)$ by setting $T_1 = T$ and
\[
T_n(x, y_1, \ldots, y_n) = T\bigl(T_{n-1}(x, y_1, \ldots, y_{n-1}), y_n\bigr).
\]
Using this notation we obtain from (12.4.8)
\[
U^nh(x) = \int_W\cdots\int_W h\bigl(T_n(x, y_1, \ldots, y_n)\bigr)\,\nu(dy_1)\cdots\nu(dy_n), \tag{12.4.11}
\]
or, more briefly,
\[
U^nh(x) = \int_{W^n} h\bigl(T_n(x, y^n)\bigr)\,\nu^n(dy^n), \tag{12.4.12}
\]
where $y^n = (y_1, \ldots, y_n)$, $W^n = W\times\cdots\times W$ is the Cartesian product of $n$ copies of $W$, and $\nu^n(dy^n) = \nu(dy_1)\cdots\nu(dy_n)$ is the corresponding product measure on $W^n$.
Equations (12.4.10) and (12.4.12) give convenient tools for studying the asymptotic behavior of the sequence $\{P^n\mu\}$. Moreover, $U^n$ and $T_n$ have a simple dynamical interpretation. Namely, from (12.4.2) and the definition of $T_n$ it follows that
\[
x_n = T_n(x_0, \xi_0, \ldots, \xi_{n-1}) \qquad \text{for } n = 1, 2, \ldots,
\]

which shows that $T_n$ describes the position of $x_n$ as a function of the initial position $x_0$ and the perturbations. Further, repeating the calculation of the mathematical expectation $E(h(x_n))$ we obtain
\[
E(h(x_n)) = \int_\Omega h(x_n(\omega))\,\operatorname{prob}(d\omega)
= \int_X h(x)\,\operatorname{prob}\bigl(x_n^{-1}(dx)\bigr)
= \int_X h(x)\,\mu_n(dx),
\]
or
\[
E(h(x_n)) = \langle h, \mu_n\rangle = \langle h, P^n\mu_0\rangle = \langle U^nh, \mu_0\rangle. \tag{12.4.13}
\]
In particular, if the starting point $x_0$ is fixed, corresponding to $\mu_0 = \delta_{x_0}$, we have
\[
E(h(x_n)) = U^nh(x_0). \tag{12.4.14}
\]
Thus, $U^nh$ gives the mathematical expectation of $h(x_n)$ as a function of the initial position $x_0$.
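Formula (12.4.14) can be verified exactly when the noise takes finitely many values, since the product measure $\nu^n$ in (12.4.12) is then a finite sum over noise sequences. The sketch below (our illustration, not from the text; the affine map, the two-valued noise, and $h(x) = x^2$ are arbitrary choices) compares the iterated operator $U^nh$ with the direct average over all paths.

```python
from itertools import product

T = lambda x, y: 0.5 * x + y
noise = {-1.0: 0.5, 1.0: 0.5}          # two-valued perturbation distribution
h = lambda x: x * x

def U(g):
    # the operator (12.4.8) for the discrete noise distribution above
    return lambda x: sum(p * g(T(x, y)) for y, p in noise.items())

def Un_h(x0, n):
    g = h
    for _ in range(n):
        g = U(g)                        # g becomes U^k h
    return g(x0)

def mean_over_paths(x0, n):
    # direct evaluation of (12.4.12): average of h(T_n(x0, y1..yn)) over W^n
    total = 0.0
    for ys in product(noise, repeat=n):
        xval, p = x0, 1.0
        for y in ys:
            xval, p = T(xval, y), p * noise[y]
        total += p * h(xval)
    return total

lhs, rhs = Un_h(2.0, 6), mean_over_paths(2.0, 6)   # equal, by (12.4.14)
```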
We close this section by discussing the relationship between the Frobenius–Perron and Foias operators. Given a continuous transformation $S\colon X \to X$ we may formally write
\[
T(x, y) = S(x) + 0\cdot y.
\]
In this case (12.4.7) takes the form
\[
P\mu(A) = \int_X \Bigl\{\int_W 1_A(S(x))\,\nu(dy)\Bigr\}\mu(dx)
= \int_X 1_A(S(x))\,\mu(dx) = \mu(S^{-1}(A)),
\]
and is identical with (12.4.1). Thus, in the case when $T(x,y)$ does not depend on $y$ the notions of the Foias operator and the Frobenius–Perron operator coincide. Moreover, in this case
\[
Uh(x) = \int_W h(S(x))\,\nu(dy) = h(S(x)), \tag{12.4.15}
\]
and $U$ is the Koopman operator.
It is evident that the operator $U$ given by equation (12.4.8) (with $\nu \in M_1$), or by (12.4.15), maps a bounded function $h$ into a bounded function $Uh$. Moreover, if $S$ is continuous [or, more generally, $T$ satisfies condition (i)], then $Uh$ is continuous for continuous bounded $h$. However, in general, the support of $Uh$ is not bounded for $h \in C_0(X)$.

12.5 Stationary Measures: Krylov–Bogolubov Theorem for Stochastic Dynamical Systems

We begin our study of the asymptotic properties of $\{P^n\mu\}$ by looking for stationary measures.

Definition 12.5.1. A measure $\mu \in M_{\mathrm{fin}}$ is called invariant or stationary with respect to a Markov operator $P$ if $P\mu = \mu$. In particular, when $P$ is a Foias operator corresponding to the dynamical system (12.4.2) and $P\mu = \mu$, we say that $\mu$ is stationary with respect to (12.4.2). A stationary probabilistic measure is called a stationary distribution.

If $\mu$ is a stationary distribution for (12.4.2) and if the initial vector $x_0$ is distributed according to $\mu$, that is,
\[
\operatorname{prob}(x_0 \in A) = \mu(A) \qquad \text{for } A \in \mathcal{B}(X),
\]
then all the vectors $x_n$ have the same property, that is,
\[
\operatorname{prob}(x_n \in A) = \mu(A) \qquad \text{for } A \in \mathcal{B}(X),\ n = 0, 1, \ldots.
\]

Our main result concerning the existence of a stationary distribution is contained in the following.

Theorem 12.5.1. Let $P$ be the Foias operator corresponding to a regular stochastic dynamical system (12.4.2). Assume that there is a $\mu_0 \in M_1$ having the following property: for every $\varepsilon > 0$ there is a bounded set $B \in \mathcal{B}(X)$ such that
\[
\mu_n(B) = P^n\mu_0(B) \ge 1 - \varepsilon \qquad \text{for } n = 0, 1, 2, \ldots. \tag{12.5.1}
\]
Then $P$ has an invariant distribution.

Proof. Define

$$ \zeta_n = \frac{1}{n}\sum_{i=0}^{n-1} P^i\mu_0 = \frac{1}{n}\sum_{i=0}^{n-1} \mu_i \qquad \text{for } n = 1, 2, \ldots. \tag{12.5.2} $$

Choose a countable subset $\{h_1, h_2, \ldots\}$ of $C_0(X)$ dense in $C_0(X)$ (see Exercises 12.1 and 12.2). The sequence $\{\langle h_1, \zeta_n\rangle\}$ is bounded since the $\zeta_n$ are probabilistic and $|\langle h_1, \zeta_n\rangle| \le \max |h_1|$. Thus, there is a subsequence $\{\zeta_{1n}\}$ of $\{\zeta_n\}$ such that $\{\langle h_1, \zeta_{1n}\rangle\}$ is convergent. Again, since $\{\langle h_2, \zeta_{1n}\rangle\}$ is bounded, we can choose a subsequence $\{\zeta_{2n}\}$ of $\{\zeta_{1n}\}$ such that $\{\langle h_2, \zeta_{2n}\rangle\}$ is convergent. By induction, for every integer $k > 1$ we may construct a sequence $\{\zeta_{kn}\}$ such that all sequences $\{\langle h_j, \zeta_{kn}\rangle\}$ for $j = 1, \ldots, k$ are convergent and $\{\zeta_{kn}\}$ is a subsequence of $\{\zeta_{k-1,n}\}$. Evidently the diagonal sequence $\{\zeta_{nn}\}$ has the property that $\{\langle h_j, \zeta_{nn}\rangle\}$ is convergent for every $j = 1, 2, \ldots$.


This procedure of choosing subsequences is known as the Cantor diagonal process [Dunford and Schwartz, 1957, Chapter I.6]. Since the set $\{h_j\}$ is dense in $C_0$, then according to Remark 12.2.1 the sequence $\{\zeta_{nn}\}$ is weakly convergent to a measure $\mu$. It remains to prove that $\mu$ is probabilistic and invariant.

Without any loss of generality we may assume that the set $B$ in (12.5.1) is compact. Then $X \setminus B$ is open and according to Theorem 12.2.1

$$ \mu(X \setminus B) \le \liminf_{n\to\infty} \zeta_{nn}(X \setminus B) \le 1 - \inf_n \mu_n(B) \le 1 - (1 - \varepsilon) = \varepsilon. $$

Now we may prove that $\{\langle h, \zeta_{nn}\rangle\}$ converges to $\langle h, \mu\rangle$ for every bounded continuous $h$. Let $h$ be given. Define $h_\varepsilon = h g_\varepsilon$, where $g_\varepsilon \in C_0$ is such that

$$ 0 \le g_\varepsilon \le 1 \quad \text{and} \quad g_\varepsilon(x) = 1 \qquad \text{for } x \in B. $$

Then

$$ |\langle h, \mu - \zeta_{nn}\rangle| \le |\langle h_\varepsilon, \mu - \zeta_{nn}\rangle| + |\langle h(1 - g_\varepsilon), \mu - \zeta_{nn}\rangle| \le |\langle h_\varepsilon, \mu - \zeta_{nn}\rangle| + \sup|h|\,\bigl(\mu(X \setminus B) + \zeta_{nn}(X \setminus B)\bigr) $$

or

$$ |\langle h, \mu - \zeta_{nn}\rangle| \le |\langle h_\varepsilon, \mu - \zeta_{nn}\rangle| + 2\varepsilon \sup|h|. $$

Since $h_\varepsilon \in C_0$ and $\{\zeta_{nn}\}$ converges weakly to $\mu$, this implies

$$ \lim_{n\to\infty} \langle h, \zeta_{nn}\rangle = \langle h, \mu\rangle $$

for every bounded continuous $h$. In particular, setting $h = 1_X$ we obtain $\mu(X) = 1$, so $\mu$ is probabilistic.
Now we are ready to prove that $\mu$ is invariant. The sequence $\{\zeta_{nn}\}$, as a subsequence of $\{\zeta_n\}$, may be written in the form

$$ \zeta_{nn} = \frac{1}{k_n}\sum_{i=0}^{k_n-1} P^i\mu_0, $$

where $\{k_n\}$ is a strictly increasing sequence of integers. Thus,

$$ P\zeta_{nn} - \zeta_{nn} = \frac{1}{k_n}\bigl(P^{k_n}\mu_0 - \mu_0\bigr) $$

and, consequently,

$$ |\langle Uh, \zeta_{nn}\rangle - \langle h, \zeta_{nn}\rangle| = |\langle h, P\zeta_{nn}\rangle - \langle h, \zeta_{nn}\rangle| \le \frac{2}{k_n}\sup|h|. $$


Passing to the limit we obtain

$$ \langle Uh, \mu\rangle - \langle h, \mu\rangle = 0, \quad\text{or}\quad \langle h, P\mu\rangle = \langle h, \mu\rangle. $$

The last equality holds for every bounded continuous $h$ and in particular for $h \in C_0$. Thus, by the Riesz representation theorem 12.1.1, $P\mu = \mu$. The proof is completed.

Condition (12.5.1) is not only sufficient for the existence of an invariant distribution $\mu$ but also necessary. To see this, assume that $\mu$ exists. Let $\{B_k\}$ be an increasing sequence of bounded measurable sets such that $\bigcup_k B_k = X$. Then

$$ \lim_{k\to\infty} \mu(B_k) = \mu(X) = 1. $$

Thus, for every $\varepsilon > 0$ there is a bounded set $B_k$ such that $\mu(B_k) \ge 1 - \varepsilon$. Setting $\mu_0 = \mu$ we have $\mu_n = \mu$ and, consequently,

$$ \mu_n(B_k) \ge 1 - \varepsilon \qquad \text{for } n = 0, 1, \ldots. $$

Remark 12.5.1. In the case when $X$ is bounded (and hence compact, because we always assume that $X$ is closed), condition (12.5.1) is automatically satisfied with $B = X$. Thus for a regular stochastic dynamical system there always exists a stationary distribution. In particular, for a continuous transformation $S\colon X \to X$ of a compact set $X$ there always exists an invariant probabilistic measure. This last assertion is known as the Krylov-Bogolubov theorem. It is valid not only when $X$ is a compact subset of $R^d$, but also for arbitrary compact Hausdorff topological spaces. $\square$

Now we will concentrate on the case when $X \subset R^d$ is unbounded (but closed!), and formulate some sufficient conditions for (12.5.1) based on the technique of Liapunov functions. Recall from (5.7.8) that a Borel measurable function $V\colon X \to R$ is called a Liapunov function if $V(x) \to \infty$ as $|x| \to \infty$.

Proposition 12.5.1. Let $P$ be the Foias operator corresponding to a regular stochastic dynamical system (12.4.2). Assume that there is an initial random vector $x_0$ and a Liapunov function $V$ such that

$$ \sup_n E(V(x_n)) < \infty. \tag{12.5.3} $$

Then $P$ has an invariant distribution.


Proof. Consider the family of bounded sets

$$ B_a = \{x \in X : V(x) \le a\} \qquad \text{for } a \ge 0. $$


By Chebyshev's inequality (10.2.9) we have

$$ \mu_n(X \setminus B_a) = \operatorname{prob}(V(x_n) > a) \le \frac{E(V(x_n))}{a} $$

or

$$ \mu_n(X \setminus B_a) \le \frac{K}{a} \qquad \text{for } n = 0, 1, \ldots, $$

where $K = \sup_n E(V(x_n))$. Thus, for every $\varepsilon > 0$ inequality (12.5.1) is satisfied with $B = B_a$ and $a = K/\varepsilon$. It follows from Theorem 12.5.1 that $P$ has an invariant distribution, and the proof is complete.
It is easy to formulate a sufficient condition for (12.5.3) related explicitly to properties of the function $T$ of (12.4.2) and the distribution $\nu$. Thus we have the following.

Proposition 12.5.2. Let $P$ be the Foias operator corresponding to a regular stochastic dynamical system (12.4.2). Assume that there exists a Liapunov function $V$ and nonnegative constants $\alpha$, $\beta$, with $\alpha < 1$, such that

$$ \int_W V(T(x,y))\,\nu(dy) \le \alpha V(x) + \beta \qquad \text{for } x \in X. \tag{12.5.4} $$

Then $P$ has an invariant distribution.


Proof. By an induction argument from inequality (12.5.4), it follows that

$$ \int_{W^n} V(T^n(x, y^n))\,\nu^n(dy^n) \le \alpha^n V(x) + \alpha^{n-1}\beta + \cdots + \alpha\beta + \beta \le V(x) + \frac{\beta}{1 - \alpha}. $$

Fix an $x_0 \in X$ and define $\mu_0 = \delta_{x_0}$. Then according to (12.4.14) and (12.4.12) we have

$$ E(V(x_n)) = U^n V(x_0) = \int_{W^n} V(T^n(x_0, y^n))\,\nu^n(dy^n) \le V(x_0) + \frac{\beta}{1 - \alpha}, $$

which implies (12.5.3), and Proposition 12.5.1 completes the proof.
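The bound $E(V(x_n)) \le V(x_0) + \beta/(1-\alpha)$ obtained in the proof can be checked by simulation. Below is a hedged sketch, not from the text, for the illustrative linear system $T(x, y) = \alpha x + y$ with $V(x) = |x|$ and noise uniform on $[-1, 1]$ (so the drift condition (12.5.4) holds with $\beta = E|\xi_n| = 1/2$):

```python
import random

def worst_mean_V(a=0.5, n_steps=50, n_paths=2000, seed=0):
    """Monte Carlo check of (12.5.3), sup_n E(V(x_n)) < infinity, for the
    illustrative system x_{n+1} = a*x_n + xi_n with V(x) = |x| and xi_n
    uniform on [-1, 1].  The drift condition (12.5.4) holds with
    alpha = a and beta = E|xi_n| = 1/2, so the proof's bound is
    E(V(x_n)) <= V(x_0) + beta/(1 - alpha) = 0 + 0.5/0.5 = 1."""
    rng = random.Random(seed)
    xs = [0.0] * n_paths                     # every path starts at x_0 = 0
    worst = 0.0
    for _ in range(n_steps):
        xs = [a * x + rng.uniform(-1.0, 1.0) for x in xs]
        worst = max(worst, sum(abs(x) for x in xs) / n_paths)
    return worst

print(worst_mean_V())   # stays below the theoretical bound 1
```

The observed supremum of the sample means of $V(x_n)$ remains below the bound $1$, consistent with Proposition 12.5.2.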

12.6 Weak Asymptotic Stability

In the previous section we developed sufficient conditions for the existence of a stationary measure $\mu$. Now we are going to prove conditions that ensure that this measure is asymptotically stable. Since in the space of measures there are two natural notions of convergence (weak and strong), we will introduce two types of asymptotic stability. We will start from the following.

Definition 12.6.1. Let $P\colon M_{\mathrm{fin}} \to M_{\mathrm{fin}}$ be a Markov operator. We say that the sequence $\{P^n\}$ is weakly asymptotically stable if $P$ has a unique invariant distribution $\mu_*$ and

$$ \{P^n\mu\} \text{ converges weakly to } \mu_* \qquad \text{for every } \mu \in M_1. \tag{12.6.1} $$

In the special case that $P$ is a Foias operator corresponding to a stochastic dynamical system (12.4.2) and $\{P^n\}$ is weakly asymptotically stable, we say that the system is weakly asymptotically stable.

It may be shown that the uniqueness of the stationary distribution $\mu_*$ is a consequence of condition (12.6.1). To show this, let $\bar\mu \in M_1$ be another stationary distribution. Then $P^n\bar\mu = \bar\mu$, and from (12.6.1) applied to $\mu = \bar\mu$ we obtain

$$ \langle h, \bar\mu\rangle = \langle h, \mu_*\rangle \qquad \text{for } h \in C_0(X). $$

By the Riesz representation theorem 12.1.1, this gives $\bar\mu = \mu_*$. On the other hand, condition (12.6.1) does not imply that $\mu_*$ is stationary for an arbitrary Markov operator.

Example 12.6.1. Let $X = [0,1]$. Consider the Frobenius-Perron operator $P$ on measures and the Koopman operator $U$ corresponding to the transformation

$$ S(x) = \begin{cases} \tfrac{1}{2}x & x > 0 \\ c & x = 0, \end{cases} $$

where $c \in [0,1]$ is a constant. Now

$$ S^n(x) = \begin{cases} x/2^n & x > 0 \\ c/2^{n-1} & x = 0. \end{cases} $$

Thus, for every $\mu \in M_1$ and $h \in C_0(X)$ we have

$$ \langle h, P^n\mu\rangle = \int_X h(S^n(x))\,\mu(dx). $$

Since $h$ is continuous this implies

$$ \lim_{n\to\infty} \langle h, P^n\mu\rangle = \mu(\{0\})h(0) + \mu((0,1])h(0) = h(0), $$

and consequently $\{P^n\mu\}$ converges to $\delta_0$. On the other hand, $P\delta_0 = \delta_c$, and the system is weakly asymptotically stable only for $c = 0$, when $S$ is continuous. If $c > 0$ the operator $P$ has no invariant distribution, but condition (12.6.1) holds with $\mu_* = \delta_0$. $\square$
Next we give two easily proved criteria for the weak asymptotic stability of a sequence $\{P^n\}$.

Proposition 12.6.1. Let $P\colon M_{\mathrm{fin}} \to M_{\mathrm{fin}}$ be a Markov operator. The sequence $\{P^n\}$ is weakly asymptotically stable if and only if $P$ has an invariant distribution and

$$ \lim_{n\to\infty} \langle h, P^n\mu - P^n\bar\mu\rangle = 0 \qquad \text{for } h \in C_0;\ \mu, \bar\mu \in M_1. \tag{12.6.2} $$

Proof. First assume that $\{P^n\}$ is weakly asymptotically stable. Then by the triangle inequality

$$ |\langle h, P^n\mu - P^n\bar\mu\rangle| \le |\langle h, P^n\mu - \mu_*\rangle| + |\langle h, \mu_* - P^n\bar\mu\rangle|, $$

and (12.6.1) implies (12.6.2). Alternately, if (12.6.2) holds and $\mu_*$ is stationary, then substituting $\bar\mu = \mu_*$ in (12.6.2) we obtain (12.6.1).

The main advantage of condition (12.6.2) in comparison with (12.6.1) is that in proving the convergence we may restrict the verification to subsets of $C_0$ and $M_1$.

Proposition 12.6.2. Let $C_* \subset C_0$ be a dense subset. If condition (12.6.2) holds for every $h \in C_*$ and $\mu, \bar\mu \in M_1$ with bounded supports, then it is satisfied for arbitrary $h \in C_0$ and $\mu, \bar\mu \in M_1$.

Proof. Choose $\mu, \bar\mu \in M_1$ and fix an $\varepsilon > 0$. Without any loss of generality we may assume that $\varepsilon \le 1/2$. Since $\mu$ and $\bar\mu$ are probabilistic, there is a bounded set $B \subset X$ such that

$$ \mu(X \setminus B) \le \varepsilon \quad \text{and} \quad \bar\mu(X \setminus B) \le \varepsilon. $$

Define

$$ \rho(A) = \frac{\mu(A \cap B)}{\mu(B)} \quad \text{and} \quad \bar\rho(A) = \frac{\bar\mu(A \cap B)}{\bar\mu(B)} \qquad \text{for } A \in \mathcal{B}(X). $$

Evidently $\rho$ and $\bar\rho$ are probabilistic measures with bounded supports. We have $\mu(B) \ge 1 - \varepsilon \ge \tfrac{1}{2}$, and consequently

$$ \begin{aligned} |\mu(A) - \rho(A)| &\le 2|\mu(A)\mu(B) - \mu(A \cap B)| \\ &= 2|\mu(A)(1 - \mu(X \setminus B)) - \mu(A \cap B)| \\ &\le 2|\mu(A) - \mu(A \cap B)| + 2\mu(A)\mu(X \setminus B) \\ &\le 2\mu(A \setminus B) + 2\mu(X \setminus B) \le 4\varepsilon \qquad \text{for } A \in \mathcal{B}(X). \end{aligned} $$


In an analogous fashion we may verify that

$$ |\bar\mu(A) - \bar\rho(A)| \le 4\varepsilon \qquad \text{for } A \in \mathcal{B}(X). $$

Now let a function $g \in C_*$ be given. Then

$$ |\langle g, P^n\mu - P^n\bar\mu\rangle| = |\langle U^n g, \mu - \bar\mu\rangle| \le |\langle U^n g, \rho - \bar\rho\rangle| + 8\varepsilon \sup|U^n g|, $$

and, since $\sup|U^n g| \le \sup|g|$, finally

$$ |\langle g, P^n\mu - P^n\bar\mu\rangle| \le |\langle g, P^n\rho - P^n\bar\rho\rangle| + 8\varepsilon \sup|g|. $$

Since $\rho$ and $\bar\rho$ have bounded supports, the sequence $\{\langle g, P^n\rho - P^n\bar\rho\rangle\}$ converges to zero, and, since $\varepsilon > 0$ was arbitrary, $\{\langle g, P^n\mu - P^n\bar\mu\rangle\}$ converges to zero for every $g \in C_*$ and $\mu, \bar\mu \in M_1$. Now from the inequality

$$ |\langle h, P^n\mu - P^n\bar\mu\rangle| \le |\langle g, P^n\mu - P^n\bar\mu\rangle| + 2\sup|g - h| \qquad \text{for } g \in C_*,\ h \in C_0 $$

and the density of $C_*$ in $C_0$, condition (12.6.2) follows for all $h \in C_0$. Thus the proof is complete.
Now we may establish the main result of this section, which is an effective criterion for the weak asymptotic stability of the stochastic system (12.4.2).

Theorem 12.6.1. Let $P$ be the Foias operator corresponding to the regular stochastic dynamical system (12.4.2). Assume that

$$ E(|T(x, \xi_n) - T(z, \xi_n)|) \le \alpha|x - z| \qquad \text{for } x, z \in X \tag{12.6.3} $$

and

$$ E(|T(0, \xi_n)|) \le \beta, \tag{12.6.4} $$

where $E$ is the mathematical expectation and $\alpha, \beta$ are nonnegative constants with $\alpha < 1$. Then the system (12.4.2) is weakly asymptotically stable.

Before passing to the proof, observe that conditions (12.6.3) and (12.6.4) can be rewritten, using the distribution $\nu$ appearing in the definition of the stochastic system (12.4.2), in the form

$$ \int_W |T(x,y) - T(z,y)|\,\nu(dy) \le \alpha|x - z| \tag{12.6.5} $$

and

$$ \int_W |T(0,y)|\,\nu(dy) \le \beta. \tag{12.6.6} $$

Proof of Theorem 12.6.1. From the inequality

$$ \int_W |T(x,y)|\,\nu(dy) \le \int_W |T(x,y) - T(0,y)|\,\nu(dy) + \int_W |T(0,y)|\,\nu(dy) $$

and conditions (12.6.5) and (12.6.6), inequality (12.5.4) follows immediately if we take $V(x) = |x|$. Thus, according to Proposition 12.5.2, there exists a stationary probabilistic measure $\mu_*$. Using the definition of $T^n$ from Section 12.4 and inequality (12.6.5) we obtain

$$ \begin{aligned} \int_{W^n} |T^n(x,y^n) - T^n(z,y^n)|\,\nu^n(dy^n) &= \int_{W^{n-1}} \Bigl\{\int_W |T(T^{n-1}(x,y^{n-1}), y_n) - T(T^{n-1}(z,y^{n-1}), y_n)|\,\nu(dy_n)\Bigr\}\,\nu^{n-1}(dy^{n-1}) \\ &\le \alpha \int_{W^{n-1}} |T^{n-1}(x,y^{n-1}) - T^{n-1}(z,y^{n-1})|\,\nu^{n-1}(dy^{n-1}) \\ &\le \cdots \le \alpha^n|x - z|. \end{aligned} \tag{12.6.7} $$
(12.6. 7)
Now consider the subset $C_*$ of $C_0$ which consists of functions $h$ satisfying the Lipschitz condition

$$ |h(x) - h(z)| \le k|x - z| \qquad \text{for } x, z \in X, $$

where the constant $k$ depends, in general, on $h$. Further, let $\mu$ and $\bar\mu$ be two distributions with bounded support. Then

$$ |\langle h, P^n\mu - P^n\bar\mu\rangle| = |\langle U^n h, \mu - \bar\mu\rangle| = \Bigl|\int_B U^n h(x)\,\mu(dx) - \int_B U^n h(x)\,\bar\mu(dx)\Bigr|, \tag{12.6.8} $$

where $B$ is a bounded set such that $\mu(B) = \bar\mu(B) = 1$. Since the measures $\mu$ and $\bar\mu$ are probabilistic, there exist points $q_n, r_n \in B$ such that

$$ \Bigl|\int_B U^n h(x)\,\mu(dx) - \int_B U^n h(x)\,\bar\mu(dx)\Bigr| \le |U^n h(q_n) - U^n h(r_n)|. $$

From this and (12.6.8) we have

$$ |\langle h, P^n\mu - P^n\bar\mu\rangle| \le |U^n h(q_n) - U^n h(r_n)| \le \int_{W^n} |h(T^n(q_n, y^n)) - h(T^n(r_n, y^n))|\,\nu^n(dy^n). $$

Using the Lipschitz condition for $h$ and (12.6.7) we finally obtain

$$ |\langle h, P^n\mu - P^n\bar\mu\rangle| \le k \int_{W^n} |T^n(q_n, y^n) - T^n(r_n, y^n)|\,\nu^n(dy^n) \le k\alpha^n|q_n - r_n| \le k\,d\,\alpha^n, $$


where $d = \sup\{|x - z| : x, z \in B\}$. Since $k d \alpha^n \to 0$ as $n \to \infty$, this implies (12.6.2) for arbitrary $h \in C_*$ and $\mu, \bar\mu \in M_1$ with bounded supports. According to Propositions 12.6.1 and 12.6.2, the proof of the weak asymptotic stability is complete.

Remark 12.6.1. When $T(x,y) = S(x)$ does not depend on $y$, condition (12.6.4) is automatically satisfied with $\beta = |S(0)|$ and inequality (12.6.3) reduces to

$$ |S(x) - S(z)| \le \alpha|x - z| \qquad \text{for } x, z \in X. $$

In this case the statement of Theorem 12.6.1 is close to the Banach contraction principle. However, it still gives something new. Namely, the classical Banach theorem shows that all the trajectories $\{S^n(x_0)\}$ converge to the unique fixed point $x_* = S(x_*)$. From Theorem 12.6.1 it follows also that the measures $\mu(S^{-n}(A))$ (with $\mu \in M_1$) converge to $\delta_{x_*}$, which is the unique stationary distribution. $\square$

12.7 Strong Asymptotic Stability

In Example 12.2.1 we have shown that if a sequence of points $\{x_n\}$ converges to $x_*$, then the corresponding sequence of measures $\{\delta_{x_n}\}$ converges weakly to $\delta_{x_*}$. In general, this convergence is not strong, since $\|\delta_{x_n} - \delta_{x_*}\| = 2$ for $x_n \ne x_*$. Thus, in the space of measures, weak convergence seems to be a more convenient and natural notion than strong convergence. However, this is not necessarily true for stochastic dynamical systems in which the perturbations are nonsingular. To make this notion precise we introduce the following.

Definition 12.7.1. A measure $\mu \in M_{\mathrm{fin}}(X)$ is called nonsingular if there is an absolutely continuous measure $\mu_a$ such that

$$ \mu_a(B) \le \mu(B) \qquad \text{for } B \in \mathcal{B}(X) \tag{12.7.1} $$

and $\mu_a(X) > 0$.

It can be proved that for every measure $\mu \in M_{\mathrm{fin}}$ there exists a maximal absolutely continuous measure $\mu_a$ satisfying (12.7.1). The word maximal means that for any other absolutely continuous measure $\mu_a'$ satisfying $\mu_a'(B) \le \mu(B)$ for all measurable sets $B$, we also have $\mu_a'(B) \le \mu_a(B)$ for all measurable $B$. This maximal measure $\mu_a$ is called the absolutely continuous part of $\mu$. The remaining component, $\mu_s = \mu - \mu_a$, is called the singular part. Thus, Definition 12.7.1 may be restated as follows: the measure $\mu \in M_{\mathrm{fin}}$ is nonsingular if its absolutely continuous part $\mu_a$ is not identically equal to zero. We always denote the absolutely continuous and singular parts of any measure by subscripts $a$ and $s$, respectively. The equation

$$ \mu = \mu_a + \mu_s \tag{12.7.2} $$

is called the Lebesgue decomposition of the measure $\mu$.


In this section we will exclusively consider regular stochastic dynamical systems of the form

$$ x_{n+1} = S(x_n) + \xi_n \qquad \text{for } n = 0, 1, \ldots, \tag{12.7.3} $$

where $S\colon X \to X$ is a continuous mapping of a closed set $X \subset R^d$ into itself, and $x_0, \xi_0, \xi_1, \ldots$ are independent random vectors. The values of $\xi_n$ belong to a Borel measurable set $W \subset R^d$ such that

$$ x \in X,\ y \in W \quad \text{implies} \quad x + y \in X. $$

This condition is satisfied, for example, when $X = W = R^d$ or $X = W = R_+$. The dynamical system (12.7.3) with additive perturbations reduces to the general form (12.4.2) for $T(x,y) = S(x) + y$. Then equations (12.4.7) and (12.4.8) for the Foias operator $P$ and its adjoint $U$ take the form

$$ P\mu(A) = \int_X \Bigl\{\int_W 1_A(S(x) + y)\,\nu(dy)\Bigr\}\,\mu(dx) \qquad \text{for } A \in \mathcal{B}(X) \tag{12.7.4} $$

and

$$ Uh(x) = \int_W h(S(x) + y)\,\nu(dy) \qquad \text{for } x \in X. \tag{12.7.5} $$

Consequently, for the scalar product we obtain

$$ \langle h, P^n\mu\rangle = \langle U^n h, \mu\rangle, \quad\text{and in particular}\quad \langle h, P\mu\rangle = \int_X \Bigl\{\int_W h(S(x) + y)\,\nu(dy)\Bigr\}\,\mu(dx). \tag{12.7.6} $$

From Proposition 12.5.2 and Theorem 12.6.1 we immediately obtain the following result.

Proposition 12.7.1. If in the regular stochastic dynamical system (12.7.3) the transformation $S$ and perturbations $\{\xi_n\}$ satisfy the conditions

$$ |S(x)| \le \alpha|x| + \gamma \qquad \text{for } x \in X \tag{12.7.7} $$

and

$$ E(|\xi_n|) \le k, \tag{12.7.8} $$

where $\alpha, \gamma, k$ are nonnegative constants with $\alpha < 1$, then (12.7.3) has a stationary distribution. Moreover, if (12.7.7) is replaced by the stronger condition

$$ |S(x) - S(z)| \le \alpha|x - z| \qquad \text{for } x, z \in X, \tag{12.7.9} $$

then (12.7.3) is weakly asymptotically stable.

Proof. The proof is immediate. It is sufficient to verify conditions (12.5.4) and (12.6.5). First observe that (12.7.8) is equivalent to

$$ \int_W |y|\,\nu(dy) \le k. $$

Consequently, setting $T(x,y) = S(x) + y$ and using (12.7.7) and (12.7.8), we obtain

$$ \int_W |T(0,y)|\,\nu(dy) = \int_W |S(0) + y|\,\nu(dy) \le \int_W |S(0)|\,\nu(dy) + \int_W |y|\,\nu(dy) \le |S(0)| + k \le \gamma + k. $$

This is a special case of (12.6.6) with $\beta = \gamma + k$. Further, (12.7.9) yields

$$ \int_W |T(x,y) - T(z,y)|\,\nu(dy) = \int_W |S(x) - S(z)|\,\nu(dy) = |S(x) - S(z)| \le \alpha|x - z|, $$

which gives (12.6.5).
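A quick numerical illustration of Proposition 12.7.1: for a contractive $S$ the long-run behavior of (12.7.3) forgets the initial condition. The specific map $S(x) = \alpha x$ and the uniform noise below are illustrative assumptions chosen for simplicity, not from the text:

```python
import random

def trajectory(x0, n=5000, a=0.5, seed=1):
    """Sample path of x_{n+1} = S(x_n) + xi_n with S(x) = a*x, which
    satisfies (12.7.9) with alpha = a < 1, and xi_n uniform on [0, 1]."""
    rng = random.Random(seed)
    x, out = x0, []
    for _ in range(n):
        x = a * x + rng.uniform(0.0, 1.0)
        out.append(x)
    return out

# Weak asymptotic stability: long-run averages forget the initial point.
m0 = sum(trajectory(0.0)) / 5000
m1 = sum(trajectory(100.0)) / 5000
print(m0, m1)   # both near the stationary mean E(xi)/(1 - a) = 1
```

With the same noise realization, trajectories started at $x_0 = 0$ and $x_0 = 100$ produce almost identical time averages, since their difference decays like $\alpha^n$.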
We will now show that under rather mild additional assumptions the asymptotic stability guaranteed by Proposition 12.7.1 is, in fact, strong. This is related to an interesting property of the absolutely continuous part $\mu_{na}$ of the distribution $\mu_n$: namely, $\|\mu_{na}\| = \mu_{na}(X)$ increases to 1 as $n \to \infty$. Our first result in this direction is the following.
Proposition 12.7.2. Let $P$ be the Foias operator corresponding to a regular stochastic dynamical system (12.7.3) in which $S$ is a nonsingular transformation. If $\mu \in M_{\mathrm{fin}}$ is absolutely continuous, then $P\mu$ is also.

Proof. Let $f$ be the Radon-Nikodym derivative of $\mu$. Then equation (12.7.4) gives

$$ P\mu(A) = \int_X \Bigl\{\int_W 1_A(S(x) + y)\,\nu(dy)\Bigr\} f(x)\,dx = \int_W \Bigl\{\int_X 1_A(S(x) + y) f(x)\,dx\Bigr\}\,\nu(dy). $$

For fixed $y \in W$ the function $1_A(S(x) + y)$ is the result of the application of the Koopman operator to $1_A(x + y)$. Denoting by $P_S$ the Frobenius-Perron operator (acting on densities) corresponding to $S$, we may rewrite the last integral to obtain

$$ P\mu(A) = \int_W \Bigl\{\int_X 1_A(x + y) P_S f(x)\,dx\Bigr\}\,\nu(dy) = \int_W \Bigl\{\int_{X+y} 1_A(x) P_S f(x - y)\,dx\Bigr\}\,\nu(dy). $$

Inside the braces the integration runs over all $x$ such that $x \in A$ and $x \in X + y$ or, equivalently, $x \in A$ and $x - y \in X$. Thus,


$$ P\mu(A) = \int_W \Bigl\{\int_A 1_X(x - y) P_S f(x - y)\,dx\Bigr\}\,\nu(dy) = \int_A \Bigl\{\int_W 1_X(x - y) P_S f(x - y)\,\nu(dy)\Bigr\}\,dx. \tag{12.7.10} $$

The function

$$ q(x) = \int_W 1_X(x - y) P_S f(x - y)\,\nu(dy) \tag{12.7.11} $$

inside the braces of (12.7.10) is the convolution of the element $P_S f \in L^1$ with the measure $\nu$. Thus we have verified that $P\mu$ is an absolutely continuous measure with density $q$.

From Proposition 12.7.2, an important consequence concerning the behavior of the absolutely continuous part of $P^n\mu$ follows directly. Namely, we have
Corollary 12.7.1. Let $P$ be the Foias operator corresponding to the regular stochastic system (12.7.3) with nonsingular $S$. Then

$$ (P\mu)_a(X) \ge \mu_a(X) \qquad \text{for } \mu \in M_{\mathrm{fin}}, \tag{12.7.12} $$

and the sequence $\mu_{na}(X)$ is increasing.

Proof. By the linearity of $P$ we have

$$ P\mu = P\mu_a + P\mu_s \ge P\mu_a. $$

Since $(P\mu)_a$ is the maximal absolutely continuous measure which does not exceed $P\mu$, we have $(P\mu)_a \ge P\mu_a$. In particular,

$$ (P\mu)_a(X) \ge P\mu_a(X) = \mu_a(X), $$

and the proof is complete.

Proposition 12.7.2 also implies that when $S$ is nonsingular the operator $\bar P$ on densities corresponding to $P$ exists. In fact, the right-hand side of (12.7.11) gives an explicit equation for this operator, that is,

$$ \bar P f(x) = \int_W 1_X(x - y) P_S f(x - y)\,\nu(dy). \tag{12.7.13} $$
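Equation (12.7.13) can be evaluated numerically on a concrete example. The sketch below (all choices are illustrative assumptions) takes $X = R$, so $1_X \equiv 1$, the map $S(x) = x/2$, whose Frobenius-Perron operator is $(P_S f)(x) = 2f(2x)$, and standard normal noise; the grid sizes are arbitrary:

```python
import math

def frobenius_perron_halving(f):
    """Frobenius-Perron operator P_S for S(x) = x/2 on the real line:
    (P_S f)(x) = 2 f(2x), since S^{-1}(x) = 2x has derivative 2."""
    return lambda x: 2.0 * f(2.0 * x)

def pbar(f, g, x, lo=-10.0, hi=10.0, m=400):
    """One step of the density operator (12.7.13) with X = R, so 1_X = 1:
    (Pbar f)(x) = integral of (P_S f)(x - y) g(y) dy, the convolution of
    P_S f with the noise density g, evaluated by the trapezoid rule."""
    psf = frobenius_perron_halving(f)
    step = (hi - lo) / m
    total = 0.0
    for i in range(m + 1):
        y = lo + i * step
        w = 0.5 if i in (0, m) else 1.0
        total += w * psf(x - y) * g(y)
    return total * step

gauss = lambda x: math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

# Pbar must map densities to densities: check the total mass is ~1.
dx = 0.05
mass = sum(pbar(gauss, gauss, -6.0 + k * dx) for k in range(int(12.0 / dx) + 1)) * dx
print(round(mass, 3))
```

The output density is the law of $\tfrac{1}{2}X + Y$ with $X, Y$ standard normal, and its total mass integrates to one, as a Markov operator on densities requires.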

If $S$ and $\nu$ are both nonsingular, we can say much more about the asymptotic behavior of $(P^n\mu)_a$. This behavior is described as follows.

Theorem 12.7.1. Let $P$ be the Foias operator corresponding to the regular stochastic system (12.7.3). If the transformation $S$ and the distribution $\nu$ of the random vectors $\{\xi_n\}$ are nonsingular, then

$$ \lim_{n\to\infty}\,(P^n\mu)_a(X) = 1 \qquad \text{for } \mu \in M_1. \tag{12.7.14} $$

Proof. Let $g_a$ be the Radon-Nikodym derivative of the measure $\nu_a$. Using the inequality $\nu \ge \nu_a$ in equation (12.7.4) applied to $\mu_s$, we obtain

$$ P\mu_s(A) \ge \int_X \Bigl\{\int_W 1_A(S(x) + y)\,\nu_a(dy)\Bigr\}\,\mu_s(dx) = \int_X \Bigl\{\int_W 1_A(S(x) + y) g_a(y)\,dy\Bigr\}\,\mu_s(dx) = \int_X \Bigl\{\int_{W+S(x)} 1_A(y)\,g_a(y - S(x))\,dy\Bigr\}\,\mu_s(dx). $$

The integration in the braces of the last integral runs over all $y$ such that $y \in A$ and $y \in W + S(x)$ or, equivalently, all $y \in A$ and $y - S(x) \in W$. Thus, the last inequality may be rewritten in the form

$$ P\mu_s(A) \ge \int_X \Bigl\{\int_A 1_W(y - S(x))\,g_a(y - S(x))\,dy\Bigr\}\,\mu_s(dx) = \int_A \Bigl\{\int_X 1_W(y - S(x))\,g_a(y - S(x))\,\mu_s(dx)\Bigr\}\,dy. $$

Setting

$$ r(y) = \int_X 1_W(y - S(x))\,g_a(y - S(x))\,\mu_s(dx) \quad \text{and} \quad \sigma(A) = \int_A r(y)\,dy, $$

we may easily evaluate the measure $P\mu$ from below:

$$ P\mu = P\mu_a + P\mu_s \ge P\mu_a + \sigma. $$

The measure $P\mu_a + \sigma$ is absolutely continuous and consequently the absolutely continuous part of $P\mu$ satisfies

$$ (P\mu)_a \ge P\mu_a + \sigma. $$

In particular,

$$ (P\mu)_a(X) \ge P\mu_a(X) + \sigma(X) = \mu_a(X) + \sigma(X). \tag{12.7.15} $$

We may easily evaluate $\sigma(X)$ since

$$ \sigma(X) = \int_X \Bigl\{\int_X 1_W(y - S(x))\,g_a(y - S(x))\,dy\Bigr\}\,\mu_s(dx) = \int_X \Bigl\{\int_{X - S(x)} 1_W(y)\,g_a(y)\,dy\Bigr\}\,\mu_s(dx). $$

In the braces we integrate over all $y$ such that $y \in W$ and $y \in X - S(x)$ or, equivalently, $y \in W$ and $y + S(x) \in X$. Since $W + X \subset X$, the condition $y + S(x) \in X$ is always satisfied with $y \in W$ and $x \in X$. Thus,

$$ \sigma(X) = \int_X \Bigl\{\int_W g_a(y)\,dy\Bigr\}\,\mu_s(dx) = \nu_a(W)\,\mu_s(X) = \nu_a(W)\bigl(1 - \mu_a(X)\bigr). $$

Set $\nu_a(W) = \varepsilon$ and use (12.7.15) to obtain

$$ (P\mu)_a(X) \ge \mu_a(X) + \varepsilon\bigl(1 - \mu_a(X)\bigr) = \varepsilon + (1 - \varepsilon)\mu_a(X). $$

From this, we obtain by an induction argument

$$ (P^n\mu)_a(X) \ge \varepsilon + \varepsilon(1 - \varepsilon) + \cdots + \varepsilon(1 - \varepsilon)^{n-1} + (1 - \varepsilon)^n\mu_a(X) \ge 1 - (1 - \varepsilon)^n. $$

Since $\varepsilon = \nu_a(W) > 0$, this completes the proof.
Now we are in a position to state our main result concerning the strong asymptotic stability of (12.7.3).

Theorem 12.7.2. Assume that (12.7.3) is a regular stochastic system and that the transformation $S$ and the distribution $\nu$ are nonsingular. If (12.7.3) is weakly asymptotically stable, then it is also strongly asymptotically stable and the limiting measure $\mu_*$ is absolutely continuous.

Proof. Let $P$ be the Foias operator given by equation (12.7.4) and $\bar P$ the corresponding operator (12.7.13) for densities. The proof will be constructed in three steps. First we are going to show that $\bar P$ is constrictive. Then we will prove that $r = 1$ in equation (5.3.10) and that $\{\bar P^n\}$ is asymptotically stable in the sense of Definition 5.6.1. Finally, using Theorem 12.7.1, we will show that $\{P^n\}$ is strongly asymptotically stable.

Step I. Since (12.7.3) is weakly asymptotically stable, there exists a stationary measure $\mu_*$. Choose $\varepsilon = \nu_a(W)/3$ and an open bounded set $B \subset X$ such that

$$ \mu_*(B) > 1 - \varepsilon. $$
Now consider an absolutely continuous $\mu_0 \in M_1$ with a density $f_0$. According to the diagram (12.3.8), for each integer $n \ge 1$ the function $\bar P^n f_0$ is the density of $\mu_n = P^n\mu_0$. The sequence $\{\mu_n\}$ converges weakly to $\mu_*$ and, according to Theorem 12.2.1, there is an integer $n_0$ such that

$$ \int_B \bar P^n f_0(x)\,dx = \mu_n(B) \ge 1 - \varepsilon \qquad \text{for } n \ge n_0, $$

or

$$ \int_{X\setminus B} \bar P^n f_0(x)\,dx \le \varepsilon \qquad \text{for } n \ge n_0. \tag{12.7.16} $$

Now let $F \subset X$ be a measurable set. We have

$$ \int_F \bar P^n f_0(x)\,dx = \mu_n(F) = P\mu_{n-1}(F), $$

and from (12.7.4) with $\nu = \nu_a + \nu_s$,

$$ P\mu_{n-1}(F) = \int_X \Bigl\{\int_W 1_F(S(x) + y)\,\nu_a(dy)\Bigr\}\,\mu_{n-1}(dx) + \int_X \Bigl\{\int_W 1_F(S(x) + y)\,\nu_s(dy)\Bigr\}\,\mu_{n-1}(dx). $$

Since $\mu_{n-1}$ is a probabilistic measure and

$$ \nu_s(W) = 1 - \nu_a(W) = 1 - 3\varepsilon, $$

this implies

$$ P\mu_{n-1}(F) \le \sup_{z\in X} \Bigl\{\int_W 1_F(y + z)\,\nu_a(dy)\Bigr\} + 1 - 3\varepsilon. $$

Let $g_a$ be the Radon-Nikodym derivative of $\nu_a$, so we may rewrite the last inequality in the form

$$ P\mu_{n-1}(F) \le \sup_{z\in X} \Bigl\{\int_W 1_F(y + z)\,g_a(y)\,dy\Bigr\} + 1 - 3\varepsilon = \sup_{z\in X} \Bigl\{\int_{W\cap(F-z)} g_a(y)\,dy\Bigr\} + 1 - 3\varepsilon. $$

The standard Borel measure of $W \cap (F - z)$ is smaller than the measure of $F$. Thus there exists a $\delta > 0$ such that

$$ \int_{W\cap(F-z)} g_a(y)\,dy \le \varepsilon \qquad \text{for } F \in \mathcal{B}(X),\ m(F) \le \delta, $$

and consequently

$$ \int_F \bar P^n f_0(x)\,dx = P\mu_{n-1}(F) \le \varepsilon + (1 - 3\varepsilon) = 1 - 2\varepsilon. $$

From this and (12.7.16) we obtain

$$ \int_{(X\setminus B)\cup F} \bar P^n f_0(x)\,dx \le \varepsilon + (1 - 2\varepsilon) = 1 - \varepsilon \qquad \text{for } n \ge n_0(f),\ m(F) \le \delta, $$

which proves that $\bar P$ is a constrictive operator. According to the spectral decomposition theorem [see equation (5.3.10)], the iterates of $\bar P$ may be written in the form

$$ \bar P^n f = \sum_{i=1}^{r} \lambda_i(f)\,g_{\alpha^n(i)} + Q_n f, \qquad n = 0, 1, \ldots, \tag{12.7.17} $$

where the densities $g_i$ have disjoint supports and $\bar P g_i = g_{\alpha(i)}$.


Step II. Now we are going to prove that $r = 1$ in equation (12.7.17). Let $k = r!$ and let $g_i$ be an arbitrary density in (12.7.17). Then $\alpha^k(i) = i$ and, consequently, $\bar P^{kn} g_i = g_i$ for all $n$. Since (12.7.3) is weakly asymptotically stable, the sequence $\{\langle h, \bar P^{nk} g_i\rangle\}$ converges to $\langle h, \mu_*\rangle$. However, this sequence is constant, so

$$ \langle h, g_i\rangle = \langle h, \mu_*\rangle \qquad \text{for } h \in C_0. \tag{12.7.18} $$

The last equality implies that $g_i$ is the density of $\mu_*$. Thus, there is only one term in the summation portion of (12.7.17) and $g_1$ is the invariant density.
Step III. Consider the sequence $\{P^n\mu_0\}$ with an arbitrary $\mu_0 \in M_1$. Choose an $\varepsilon > 0$. According to Theorem 12.7.1 there exists an integer $k$ such that

$$ (P^k\mu_0)_a(X) = \mu_{ka}(X) \ge 1 - \varepsilon. $$

Define $\theta = \mu_{ka}(X)$. Since $\mu_k = \mu_{ka} + \mu_{ks}$ we have

$$ \mu_{n+k} - \mu_* = P^n\mu_k - \mu_* = P^n\mu_{ka} - \theta\mu_* + P^n\mu_{ks} - (1 - \theta)\mu_*, $$

or

$$ \|\mu_{n+k} - \mu_*\| \le \|P^n\mu_{ka} - \theta\mu_*\| + \|P^n\mu_{ks}\| + (1 - \theta)\|\mu_*\|, \tag{12.7.19} $$

where $\|\cdot\|$ denotes the distance defined by equation (12.2.7). The last two terms are easy to evaluate, since

$$ \|P^n\mu_{ks}\| = \mu_{ks}(X) = 1 - \theta \le \varepsilon \tag{12.7.20} $$

and

$$ (1 - \theta)\|\mu_*\| = (1 - \theta)\mu_*(X) = 1 - \theta \le \varepsilon. \tag{12.7.21} $$

The measure $\theta^{-1}\mu_{ka}$ is absolutely continuous and normalized. Denote its density by $f_a$. $P^n(\theta^{-1}\mu_{ka})$ clearly has density $\bar P^n f_a$, and from equation (12.2.11)

$$ \|P^n\mu_{ka} - \theta\mu_*\| = \theta \int_X |\bar P^n f_a(x) - g_1(x)|\,dx. $$

Since $\{\bar P^n\}$ is asymptotically stable, the right-hand side of this equality converges to zero as $n \to \infty$. From this convergence and inequalities (12.7.20) and (12.7.21) applied to (12.7.19), it follows that $\limsup_{n\to\infty} \|\mu_{n+k} - \mu_*\| \le 2\varepsilon$, and, since $\varepsilon > 0$ was arbitrary,

$$ \lim_{n\to\infty} \|\mu_n - \mu_*\| = 0. $$

This completes the proof.

12.8 Iterated Function Systems and Fractals

In the previous section we considered a special case of a regular stochastic dynamical system with additive nonsingular perturbations. As we have seen, these systems produce absolutely continuous limiting distributions. In this section we consider another special class in which the set $W$ is finite. We will see that such systems produce limiting measures supported on very special sets: fractals.

Intuitively a system with finite $W$ can be described as follows. Consider $N$ continuous transformations

$$ S_i\colon X \to X, \qquad i = 1, \ldots, N, $$

of a closed nonempty subset $X \subset R^d$. If the initial point $x_0 \in X$ is chosen, we toss an $N$-sided die, and if the number $i_0$ is drawn we define $x_1 = S_{i_0}(x_0)$. Then we toss the die again, and if the number $i_1$ is drawn we define $x_2 = S_{i_1}(x_1)$, and so on.
This procedure can be easily formalized. Consider a probabilistic vector

$$ (p_1, \ldots, p_N), \qquad p_i \ge 0, \quad \sum_{i=1}^N p_i = 1, $$

and the sequence of independent random variables $\xi_0, \xi_1, \ldots$ such that

$$ \operatorname{prob}(\xi_n = i) = p_i \qquad \text{for } i = 1, \ldots, N. $$

The dynamical system is defined by the formula

$$ x_{n+1} = S_{\xi_n}(x_n) \qquad \text{for } n = 0, 1, \ldots. \tag{12.8.1} $$

It is clear that in this case $T(x,y) = S_y(x)$ and $W = \{1, \ldots, N\}$. The system (12.8.1) is called (Barnsley, 1988) an iterated function system (IFS).

Using the general equations (12.4.7) and (12.4.8) it is easy to find explicit formulas for the operators $U$ and $P$ corresponding to an iterated function system. Namely,

$$ Uh(x) = \int_W h(T(x,y))\,\nu(dy) = \int_W h(S_y(x))\,\nu(dy) $$

or

$$ Uh(x) = \sum_{i=1}^N p_i\,h(S_i(x)) \qquad \text{for } x \in X. \tag{12.8.2} $$

Further,

$$ P\mu(A) = \langle U 1_A, \mu\rangle = \sum_{i=1}^N p_i \int_X 1_A(S_i(x))\,\mu(dx) $$

or

$$ P\mu(A) = \sum_{i=1}^N p_i\,\mu(S_i^{-1}(A)) \qquad \text{for } A \in \mathcal{B}(X). \tag{12.8.3} $$
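The die-tossing iteration (12.8.1) is exactly the popular chaos-game algorithm for drawing IFS attractors. A minimal sketch for the Sierpinski-triangle IFS (the particular maps, the uniform probabilities, and the burn-in length are illustrative assumptions, not from the text):

```python
import random

# Sierpinski-triangle IFS: N = 3 maps S_i(x) = (x + v_i)/2, each with
# Lipschitz constant L_i = 1/2, and uniform probabilities p_i = 1/3.
VERTS = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]

def chaos_game(n=20000, seed=0):
    """Iterate x_{n+1} = S_{xi_n}(x_n) as in (12.8.1); after a short
    burn-in the orbit points accumulate on the attractor A* = supp mu*."""
    rng = random.Random(seed)
    x, y = 0.3, 0.3
    pts = []
    for k in range(n):
        vx, vy = rng.choice(VERTS)           # toss the N-sided die
        x, y = (x + vx) / 2.0, (y + vy) / 2.0
        if k >= 100:                         # discard transient points
            pts.append((x, y))
    return pts

pts = chaos_game()
print(len(pts))   # 19900 points approximating the Sierpinski gasket
```

Plotting the returned points (with any plotting tool) produces the familiar gasket; all orbit points stay inside the triangle spanned by the three fixed vertices.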


Now assume that the $S_i$ satisfy the Lipschitz condition

$$ |S_i(x) - S_i(z)| \le L_i|x - z| \qquad \text{for } x, z \in X;\ i = 1, \ldots, N, \tag{12.8.4} $$

where the $L_i$ are nonnegative constants. In this case Theorem 12.6.1 implies the following result.

Proposition 12.8.1. If

$$ \sum_{i=1}^N p_i L_i < 1, \tag{12.8.5} $$

then the iterated function system (12.8.1) is weakly asymptotically stable.

Proof. It is sufficient to verify conditions (12.6.3) and (12.6.4). We have

$$ E(|S_{\xi_n}(x) - S_{\xi_n}(z)|) = \sum_{i=1}^N p_i\,|S_i(x) - S_i(z)| \le |x - z| \sum_{i=1}^N p_i L_i $$

and

$$ E(|S_{\xi_n}(0)|) = \sum_{i=1}^N p_i\,|S_i(0)|. $$

Consequently, (12.6.3) and (12.6.4) are satisfied with $\alpha = \sum p_i L_i$ and $\beta = \sum p_i |S_i(0)|$, and by Theorem 12.6.1 the proof is complete.
Condition (12.8.5) is automatically satisfied when $L_i < 1$ for $i = 1, \ldots, N$. An iterated function system for which

$$ L = \max_i L_i < 1 \quad \text{and} \quad p_i > 0, \qquad i = 1, \ldots, N, \tag{12.8.6} $$

is called hyperbolic. Our goal now is to study the structure of the set

$$ A_* = \operatorname{supp}\mu_*, \tag{12.8.7} $$

where $\mu_*$ is a stationary distribution, for hyperbolic systems. We will show that $A_*$ does not depend on the probabilistic vector $(p_1, \ldots, p_N)$ as long as all the $p_i$ are strictly positive. To show an alternative, nonprobabilistic method of constructing $A_*$, we introduce a transformation $F$ on the subsets of $X$ such that the iterates $F^n$ approximate $A_*$.

Definition 12.8.1. Let an iterated function system (12.8.1) be given. Then the transformation

$$ F(A) = \bigcup_{i=1}^N S_i(A) \qquad \text{for } A \subset X, \tag{12.8.8} $$

mapping subsets of $X$ into subsets of $X$, is called the Barnsley operator corresponding to (12.8.1).
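The Barnsley operator can be iterated directly on finite point sets. A small sketch for the classical Cantor-set IFS (the choice of maps and of the seed set is illustrative):

```python
def barnsley_step(points, maps):
    """One application of the Barnsley operator F(A) = S_1(A) u ... u S_N(A)
    of (12.8.8), acting on a finite approximation of the set A."""
    return {s(p) for s in maps for p in points}

# Cantor-set IFS on [0, 1]: S_1(x) = x/3 and S_2(x) = x/3 + 2/3.
maps = [lambda x: x / 3.0, lambda x: x / 3.0 + 2.0 / 3.0]
A = {0.0, 1.0}
for _ in range(5):
    A = barnsley_step(A, maps)
print(len(A), min(A), max(A))   # the 64 endpoints of the 32 stage-5 intervals
```

Starting from the two endpoints of $[0,1]$, five applications of $F$ produce exactly the endpoints of the fifth-stage Cantor intervals, a finite approximation of the attractor.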
It is easy to observe that for every compact set $A \subset X$ its image $F(A)$ is also a compact set. In fact, the $S_i(A)$ are compact, since the images of compact sets by continuous transformations are compact and the finite union of compact sets is compact. To show the connection between $F$ and the dynamical system (12.8.1) we prove the following.

Proposition 12.8.2. Let $F$ be the Barnsley operator corresponding to (12.8.1). Moreover, let $\{\mu_n\}$ be the sequence of distributions corresponding to (12.8.1), that is, $\mu_n = P^n\mu_0$. If $\operatorname{supp}\mu_0$ is a compact set, then

$$ \operatorname{supp}\mu_n = F^n(\operatorname{supp}\mu_0). \tag{12.8.9} $$

Proof. It is clearly sufficient to verify that $\operatorname{supp}\mu_1 = F(\operatorname{supp}\mu_0)$, since the situation repeats. Let $x \in F(\operatorname{supp}\mu_0)$ and $\varepsilon > 0$ be fixed. Then $x = S_j(z)$ for some integer $j$ and $z \in \operatorname{supp}\mu_0$. Consequently, for the ball $B_r(z)$ we have $\mu_0(B_r(z)) > 0$ for every $r > 0$. Further, due to the continuity of $S_j$ there is an $r > 0$ such that

$$ S_j(B_r(z)) \subset B_\varepsilon(x). $$

This gives

$$ \mu_1(B_\varepsilon(x)) = \sum_{i=1}^N p_i\,\mu_0\bigl(S_i^{-1}(B_\varepsilon(x))\bigr) \ge p_j\,\mu_0(B_r(z)) > 0. $$

Since $\varepsilon > 0$ was arbitrary, this shows that $x \in \operatorname{supp}\mu_1$. We have proved the inclusion $F(\operatorname{supp}\mu_0) \subset \operatorname{supp}\mu_1$.
Now, suppose that this inclusion is proper and there is a point $x \in \operatorname{supp}\mu_1$ such that $x \notin F(\operatorname{supp}\mu_0)$. Due to the compactness of $F(\operatorname{supp}\mu_0)$ there must exist an $\varepsilon > 0$ such that the ball $B_\varepsilon(x)$ is disjoint from $F(\operatorname{supp}\mu_0)$. This implies

$$ S_i(\operatorname{supp}\mu_0) \cap B_\varepsilon(x) = \emptyset \qquad \text{for } i = 1, \ldots, N, $$

or

$$ S_i^{-1}(B_\varepsilon(x)) \cap \operatorname{supp}\mu_0 = \emptyset \qquad \text{for } i = 1, \ldots, N. $$

The last condition implies that

$$ \mu_1(B_\varepsilon(x)) = \sum_{i=1}^N p_i\,\mu_0\bigl(S_i^{-1}(B_\varepsilon(x))\bigr) = 0, $$

which contradicts the assumption that $x \in \operatorname{supp}\mu_1$. This contradiction shows that $F(\operatorname{supp}\mu_0) = \operatorname{supp}\mu_1$. An induction argument completes the proof.
Formula (12.8.9) allows us to construct the supports of $\mu_n$ from the support of $\mu_0$ by purely geometrical methods, without any use of probabilistic arguments. Now we will show that the set

$$ A_* = \operatorname{supp}\mu_*, \tag{12.8.10} $$

which is called the attractor of the iterated function system, can be obtained as the limit of the sequence of sets

$$ A_n = \operatorname{supp}\mu_n = F^n(A_0). \tag{12.8.11} $$

To state this fact precisely we introduce the notion of the Hausdorff distance between two sets.

Definition 12.8.2. Let $A_1, A_2 \subset R^d$ be nonempty compact sets and let $r > 0$ be a real number. We say that $A_1$ approximates $A_2$ with accuracy $r$ if, for every point $x_1 \in A_1$, there is a point $x_2 \in A_2$ such that $|x_1 - x_2| \le r$, and for every $x_2 \in A_2$ there is an $x_1 \in A_1$ such that the same inequality holds. The infimum of all $r$ such that $A_1$ approximates $A_2$ with accuracy $r$ is called the Hausdorff distance between $A_1$ and $A_2$ and is denoted by $\operatorname{dist}(A_1, A_2)$.

We say that a sequence $\{A_n\}$ of compact sets converges to a compact set $A$ if

$$ \lim_{n\to\infty} \operatorname{dist}(A_n, A) = 0. $$

From the compactness of $A$ it easily follows that the limit of the sequence $\{A_n\}$, if it exists, must be unique. This limit will be denoted by $\lim_{n\to\infty} A_n$.
Example 12.8.1. Let $X = R$, $A = [0, 1]$, and

$$ A_n = \Bigl\{\frac{1}{2^n}, \frac{2}{2^n}, \ldots, \frac{2^n - 1}{2^n}\Bigr\} \qquad \text{for } n = 1, 2, \ldots. $$

Clearly, $A_n \subset [0, 1]$. Moreover, for every $x \in [0, 1]$ there is an integer $k$, $1 \le k \le 2^n - 1$, such that

$$ \Bigl|x - \frac{k}{2^n}\Bigr| \le \frac{1}{2^n}. $$

Thus, $A_n$ approximates $A$ with accuracy $1/2^n$. Moreover, for $x = 0 \in A$ the nearest point in $A_n$ is $1/2^n$. Consequently,

$$ \operatorname{dist}(A_n, A) = \frac{1}{2^n}. $$

This example shows that sets which are close in the sense of the Hausdorff distance can be quite different from a topological point of view. In fact, each $A_n$ consists of a finite number of points, whereas $A = [0, 1]$ is a continuum. This is a typical situation in the technical reproduction of pictures; on a television screen a picture is composed of a finite number of pixels. $\square$
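The computation in Example 12.8.1 is easy to reproduce for finite sets. A sketch, using a fine grid as a stand-in for the interval $A = [0, 1]$ (the grid resolution is an arbitrary choice):

```python
def hausdorff(A, B):
    """Hausdorff distance between two finite subsets of R, following
    Definition 12.8.2: the least r such that each set approximates the
    other with accuracy r."""
    d_ab = max(min(abs(a - b) for b in B) for a in A)
    d_ba = max(min(abs(a - b) for a in A) for b in B)
    return max(d_ab, d_ba)

# A fine grid stands in for the interval A = [0, 1] of Example 12.8.1.
grid = [i / 1000.0 for i in range(1001)]
for n in (2, 3, 4):
    An = [k / 2.0 ** n for k in range(1, 2 ** n)]
    print(n, hausdorff(An, grid))   # equals 1/2^n, as computed in the text
```

The largest gap is always at the endpoints $0$ and $1$, whose nearest points of $A_n$ lie at distance $1/2^n$, matching the value $\operatorname{dist}(A_n, A) = 1/2^n$ derived above.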
We have introduced the notion of distance between compact sets only. We already know that for compact $A_0 = \operatorname{supp}\mu_0$ all the sets $A_n = \operatorname{supp}\mu_n$ are compact. Now we are going to show the compactness of the limiting set $A_* = \operatorname{supp}\mu_*$.
Proposition 12.8.3. If the iterated function system (12.8.1) is hyperbolic
and p.,. is the stationary distribution, then the set A,. = supp p.,. is compact.
Proof. Since the support of every measure is a closed set, it is sufficient
to verify that A,. is bounded. Further, since p.,. does not depend on p.o we
may assume that p.o = 6z0 for an xo E X. Define
r = max{|S_i(x_0) − x_0| : i = 1, …, N}.

Then |S_{i_1}(x_0) − x_0| ≤ r, or by induction,

|S_{i_1} ∘ ⋯ ∘ S_{i_n}(x_0) − x_0| ≤ L^{n−1}r + ⋯ + Lr + r ≤ r/(1 − L)   (12.8.12)

for every sequence of integers i_1, …, i_n with 1 ≤ i_k ≤ N. Choose an arbitrary point z ∈ X such that

|z − x_0| ≥ r/(1 − L) + 1.   (12.8.13)

We are going to prove that z ∉ supp μ_*. Fix an ε ∈ (0, 1). From inequality (12.2.6) and equation (12.8.3) we obtain

μ_*(B_ε(z)) ≤ liminf_{n→∞} μ_n(B_ε(z)) = liminf_{n→∞} Σ_{i_1,…,i_n} p_{i_1} ⋯ p_{i_n} δ_{x_0}(S_{i_1}^{-1} ∘ ⋯ ∘ S_{i_n}^{-1}(B_ε(z))).   (12.8.14)

According to (12.8.12) and (12.8.13) we have

|z − S_{i_n} ∘ ⋯ ∘ S_{i_1}(x_0)| ≥ 1,

which implies that

x_0 ∉ S_{i_1}^{-1} ∘ ⋯ ∘ S_{i_n}^{-1}(B_ε(z)).

Thus the right-hand side of (12.8.14) is equal to zero and as a consequence μ_*(B_ε(z)) = 0. We have proved that z ∉ supp μ_* and that the support of μ_* is contained in a ball centered at x_0 with radius 1 + r/(1 − L).

12. Markov and Foias Operators

Now we formulate a convergence theorem which allows us to construct the set A_* without any use of probabilistic tools.
Theorem 12.8.1. Let (12.8.1) be a hyperbolic system and let F be the corresponding Barnsley operator. Further, let A_* be the support of the invariant distribution μ_*. Then

A_* = lim_{n→∞} F^n(A_0)   (12.8.15)

whenever A_0 ⊂ X is a nonempty compact set.

Proof. We divide the proof into two steps. First we show that the limit of {F^n(A_0)} does not depend on the particular choice of A_0, and then we will prove that this limit is equal to supp μ_*.
Step I. Consider two initial compact sets A_0, Z_0 ⊂ X and the corresponding sequences

A_n = F^n(A_0), Z_n = F^n(Z_0),   n = 0, 1, ….

We are going to show that dist(A_n, Z_n) converges to zero. Let r > 0 be sufficiently large so that A_0 and Z_0 are contained in a ball of radius r. Now fix an integer n and a point x ∈ A_n. According to the definition of F there exists a sequence of integers k_1, …, k_n and a point u ∈ A_0 such that

x = S_{k_1} ∘ ⋯ ∘ S_{k_n}(u).

Now choose an arbitrary point v ∈ Z_0 and define z ∈ Z_n by

z = S_{k_1} ∘ ⋯ ∘ S_{k_n}(v).

Since the S_i are Lipschitzean we have

|x − z| ≤ L^n |u − v| ≤ 2rL^n.

We have proved that for every x ∈ A_n there is a z ∈ Z_n such that |x − z| ≤ 2rL^n. Since the assumptions concerning the sets A_0 and Z_0 are symmetric this shows that the distance between A_n and Z_n is smaller than 2rL^n. Consequently,

lim_{n→∞} dist(A_n, Z_n) = 0.   (12.8.16)

Step II. Choose an arbitrary nonempty compact set A_0 ⊂ X and define

Z_0 = A_* = supp μ_*.

Since μ_* is invariant we also have

Z_n = F^n(Z_0) = A_*   for n = 0, 1, ….

Substituting this into (12.8.16) we obtain (12.8.15) and the proof is complete.
It is worth noting that for systems which are not hyperbolic, equality (12.8.15) may be violated even if condition (12.8.5) is satisfied. In general the set lim_{n→∞} F^n(A_0) is larger than A_* = supp μ_*.
Example 12.8.2. Let X = R, S_1(x) = x, and S_2(x) = 0 for x ∈ R. Evidently for every probabilistic vector (p_1, p_2) with p_1 < 1 the condition (12.8.5) is satisfied. Thus the system is weakly asymptotically stable and there exists a unique stationary distribution μ_*. It is easy to guess that μ_* = δ_0. In fact, according to (12.8.3),

Pδ_0(A) = p_1 δ_0(S_1^{-1}(A)) + p_2 δ_0(S_2^{-1}(A)),

where S_1^{-1}(A) = A and

S_2^{-1}(A) = R if 0 ∈ A,   S_2^{-1}(A) = ∅ if 0 ∉ A.

Therefore

Pδ_0(A) = p_1 δ_0(A) + p_2 δ_0(A) = δ_0(A).

On the other hand, for A_0 = [0, 1] we have

F(A_0) = S_1(A_0) ∪ S_2(A_0) = [0, 1] ∪ {0} = [0, 1]

and by induction

F^n(A_0) = [0, 1],   n = 0, 1, ….

This sequence does not converge to A_* = supp μ_* = {0}. □
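The failure in this non-hyperbolic example is easy to see in simulation: the deterministic images F^n(A_0) never shrink, while a random trajectory is absorbed at 0 as soon as S_2 is chosen. A short Python sketch (the probabilities p_1 = p_2 = 1/2, the seed, and the starting point are arbitrary choices with p_1 < 1):

```python
import random

# The non-hyperbolic system of Example 12.8.2: S1(x) = x, S2(x) = 0.
S = [lambda x: x, lambda x: 0.0]

random.seed(1)
x = 0.7                                # arbitrary starting point in [0, 1]
traj = []
for _ in range(1000):
    x = S[random.randrange(2)](x)      # S1 or S2, each with probability 1/2
    traj.append(x)

# Once S2 fires the trajectory is trapped at 0, so the empirical distribution
# concentrates on supp mu_* = {0}, although F^n([0, 1]) = [0, 1] for all n.
trapped = sum(1 for v in traj if v == 0.0)
print(trapped)                         # almost all of the 1000 points
```

This is exactly the gap between weak asymptotic stability of the measures and convergence of the sets F^n(A_0).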


Now we are going to use equation (12.8.15) for the construction of attractors of hyperbolic systems. This procedure can often be simplified using
the following result concerning the Barnsley operator (12.8.8).
Proposition 12.8.3. Assume that the Si:X-+ X, i = 1, ... ,N appearing
in equation (12.8.8) are continuous and that Ao C X is a compact set.
Denote An = pn(Ao) and assume that A. = limn-+oo An exists. If Ao :,)
F(Ao), then
(12.8.17)
Proof. The Barnsley operator F is monotonic, that is, A C B implies
F(A) C F(B). Thus from A1 :,) Ao it follows F"(Al) :,) pn(A0 ) or An+l :,)
An. It remains to prove that An :,:, A . Fix an integer n and a point x E A .
Consider a sequence e; = 1/j. Since {An+lc} converges to A. as k -+ oo
we can find a set An+A:U> which approximates A. with accuracy e;. There
exists, therefore, x; E An+A:(j) such that lx;- xl 5 e;. Evidently x; E An

FIGURE 12.8.1.

since, by the first part of the proof, A_n ⊃ A_{n+k(j)}. The set A_n is closed and the conditions x_j ∈ A_n, x_j → x imply x ∈ A_n. This verifies the inclusion A_n ⊃ A_* and completes the proof.
Our first example of the construction of an attractor deals with a one-dimensional system given by two linear transformations. Despite the simplicity of the system the attractor is quite complicated.
Example 12.8.3. Let X = R and

S_1(x) = (1/3)x and S_2(x) = (1/3)x + 2/3   for x ∈ R.

Choose A_0 = [0, 1] (see Figure 12.8.1). Then

A_1 = F(A_0) = S_1([0, 1]) ∪ S_2([0, 1]) = [0, 1/3] ∪ [2/3, 1].

Thus, A_1 is obtained from A_0 by taking out the middle open interval (1/3, 2/3). Now

A_2 = F(A_1) = S_1([0, 1/3] ∪ [2/3, 1]) ∪ S_2([0, 1/3] ∪ [2/3, 1])
    = [0, 1/9] ∪ [2/9, 1/3] ∪ [2/3, 7/9] ∪ [8/9, 1].

Again A_2 is obtained from A_1 by taking out two middle open intervals (1/9, 2/9) and (7/9, 8/9). Proceeding further we observe that this operation repeats and A_3 can be obtained from A_2 by taking out the four middle intervals. Thus, the set A_3 consists of eight intervals of length 1/27. In general, A_n is the sum of 2^n intervals of length 1/3^n. The Borel measure of A_n is (2/3)^n and converges to zero as n → ∞. The limiting set A_* has Borel measure zero since it is contained in all sets A_n. This is the famous Cantor set, the source of many examples in analysis and topology. □
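The construction above is easy to mechanize: the Barnsley operator maps a finite union of intervals to a finite union of intervals. A small Python sketch with exact rational arithmetic (the helper F below is our own formulation of F(A) = S_1(A) ∪ S_2(A)):

```python
from fractions import Fraction

def F(intervals):
    """Barnsley operator for S1(x) = x/3 and S2(x) = x/3 + 2/3,
    applied to a disjoint union of closed intervals [a, b]."""
    s1 = [(a / 3, b / 3) for a, b in intervals]
    s2 = [(a / 3 + Fraction(2, 3), b / 3 + Fraction(2, 3)) for a, b in intervals]
    return sorted(s1 + s2)

A = [(Fraction(0), Fraction(1))]       # A_0 = [0, 1]
for n in range(1, 4):
    A = F(A)
    total = sum(b - a for a, b in A)
    print(n, len(A), total)            # 2^n intervals of total length (2/3)^n
```

The printed lengths 2/3, 4/9, 8/27 are the Borel measures of A_1, A_2, A_3 computed in the text.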
Example 12.8.4. Let X = R² and

S_i(x) = (1/2  0; 0  1/2) x + (a_i, b_i),   i = 1, 2, 3,

where

a_1 = b_1 = 0,   a_2 = 1/2, b_2 = 0,   a_3 = 1/4, b_3 = 1/2.

Choose A_0 to be the isosceles triangle with vertices (0, 0), (1, 0), (1/2, 1) (see Figure 12.8.2a). S_1(A_0) is a triangle with vertices (0, 0), (1/2, 0), (1/4, 1/2). The triangles S_2(A_0) and S_3(A_0) are congruent to S_1(A_0) but shifted to the right, and to the right and up, respectively. As a result, the set

A_1 = F(A_0) = S_1(A_0) ∪ S_2(A_0) ∪ S_3(A_0)

is the union of three triangles as shown in Figure 12.8.2b. Observe that A_1 is obtained from A_0 by taking out the middle open triangle with vertices (1/2, 0), (1/4, 1/2), (3/4, 1/2). Analogously, each set S_i(A_1), i = 1, 2, 3, consists of three congruent triangles of height 1/4, and A_2 = F(A_1) is the union of nine triangles shown in Figure 12.8.2c. Again A_2 can be obtained from A_1 by taking out three middle triangles.

This process repeats and in general A_n consists of 3^n triangles with height (1/2)^n, base (1/2)^n, and total area

m(A_n) = (1/2)(3/4)^n,

which converges to zero as n → ∞. The limiting set A_*, called the Sierpinski triangle, has Borel measure zero. It is shown in Figure 12.8.2d. Unlike the Cantor set, the Sierpinski triangle is a continuum (compact connected set) and from a geometric point of view it is a line whose every point is a ramification point. The Sierpinski triangle also appears in cellular automata theory [Wolfram, 1983]. □
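The same mechanization works in the plane. Iterating the three affine maps of this example on a single seed point produces 3^n distinct points, one in each of the 3^n triangles of A_n (a Python sketch with exact arithmetic; starting from the vertex (0, 0) is our own choice):

```python
from fractions import Fraction

half, quarter = Fraction(1, 2), Fraction(1, 4)
# Translations (a_i, b_i) of the three maps S_i(x) = x/2 + (a_i, b_i).
shifts = [(Fraction(0), Fraction(0)), (half, Fraction(0)), (quarter, half)]

def F(points):
    """Barnsley operator of Example 12.8.4 acting on a finite set of points."""
    return {(x / 2 + a, y / 2 + b) for (x, y) in points for (a, b) in shifts}

points = {(Fraction(0), Fraction(0))}  # a single seed: the vertex (0, 0) of A_0
for n in range(1, 6):
    points = F(points)
    print(n, len(points))              # 3^n points, one per triangle of A_n
```

The counts 3, 9, 27, … grow exactly as the number of triangles, while the total area (1/2)(3/4)^n shrinks to zero.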
In these two examples the construction of the sets A_n approximating A_* was ad hoc. We simply guessed the procedure leading from A_n to A_{n+1}, taking out the middle intervals or middle triangles. In general, for an arbitrary iterated function system the connection between A_n and A_{n+1} is not so simple. In the next theorem we develop another way of approximating A_* which is especially effective with the aid of a computer.

Theorem 12.8.2. Let (12.8.1) be a hyperbolic system. Then for every x_0 ∈ X and ε > 0 there exist two numbers n_0 = n_0(ε) and k_0 = k_0(ε) such that

prob(dist({x_n, …, x_{n+k}}, A_*) < ε) > 1 − ε   for n ≥ n_0, k ≥ k_0,   (12.8.18)

where {x_n} denotes the trajectory starting from x_0.


FIGURE 12.8.2.

In other words Theorem 12.8.2 says the following. If we cancel the first n_0 or more elements of the trajectory {x_n}, then the probability that a sufficiently long segment x_n, …, x_{n+k} approximates A_* with accuracy ε is greater than 1 − ε.

Proof. Let ε > 0 be fixed. Choose a compact set A_0 ⊂ X such that x_0 ∈ A_0 and F(A_0) ⊂ A_0. [From conditions (12.8.4) and (12.8.6) it follows that such a set exists.] The sequence A_n = F^n(A_0) is decreasing, and by Theorem 12.8.1 there is an integer n_0(ε) such that

dist(A_n, A_*) < ε   for n ≥ n_0.

From this inequality, since every value of the random vector x_n belongs to A_n, there is a z_n ∈ A_* for which

|x_n − z_n| < ε   for n ≥ n_0.   (12.8.19)

This determines the number n_0 appearing in condition (12.8.18).


Now we are going to find k_0. Since A_* is a compact set there is a finite sequence of points a_i ∈ A_*, i = 1, …, q, such that

A_* ⊂ ⋃_{i=1}^{q} B_{ε/2}(a_i).   (12.8.20)

Pick a point u ∈ A_{n_0}. The set {u}, which contains the single point u, is compact and according to Theorem 12.8.1 there exists an integer r such that

dist(F^r({u}), A_*) < ε/4.

The points of F^r({u}) are given by S_{α_1} ∘ ⋯ ∘ S_{α_r}(u). Thus, for every i = 1, …, q, there exists a sequence of integers α(i, 1), …, α(i, r) for which

|S_{α(i,1)} ∘ ⋯ ∘ S_{α(i,r)}(u) − a_i| < ε/4.

This inequality holds for a fixed u ∈ A_{n_0}. When u moves in A_{n_0}, the corresponding value S_{α(i,1)} ∘ ⋯ ∘ S_{α(i,r)}(u) changes by at most L^r c, where c = max{|u − v| : u, v ∈ A_{n_0}}. Choosing r large enough, we have L^r c < ε/4, and consequently

|S_{α(i,1)} ∘ ⋯ ∘ S_{α(i,r)}(u) − a_i| < ε/2   for i = 1, …, q, u ∈ A_{n_0}.   (12.8.21)

Now consider the segment x_n, …, x_{n+k} of the trajectory given by (12.8.1) with n ≥ n_0. We have

x_{n+j} = S_{ξ_{n+j−1}} ∘ ⋯ ∘ S_{ξ_n}(x_n)

and x_{n+j} ∈ A_{n_0} for 0 ≤ j ≤ k. If the sequence ξ_n, …, ξ_{n+k} contains the segment α(i, 1), …, α(i, r), that is,

ξ_{n+j+r−1} = α(i, 1), …, ξ_{n+j} = α(i, r)   (12.8.22)

for some j, 0 ≤ j ≤ k − r, then (12.8.21) implies x_{n+j+r} ∈ B_{ε/2}(a_i). The probability of the event (12.8.22), with fixed j, is equal to p_{α(i,1)} ⋯ p_{α(i,r)}, and the probability of the opposite event is smaller than or equal to 1 − p^r, where p = min_i p_i. The probability that ξ_n, …, ξ_{n+k} with k ≥ rm does not contain the sequence α(i, 1), …, α(i, r) is at most (1 − p^r)^m. For sufficiently large m we have (1 − p^r)^m ≤ ε/q. With this m and k ≥ k_0 = rm the probability of the event that ξ_n, …, ξ_{n+k} contains all the sequences α(i, 1), …, α(i, r), for i = 1, …, q, is at least 1 − q(1 − p^r)^m ≥ 1 − ε. When the last event occurs, then for every point a_i there is a point x_{n+j+r} such that |x_{n+j+r} − a_i| < ε/2. In this case, according to (12.8.20), every point x ∈ A_* is approximated by a point of the segment x_n, …, x_{n+k} with accuracy ε. From this and (12.8.19) it follows

dist({x_n, …, x_{n+k}}, A_*) < ε.

The proof is completed.


Theorem 12.8.2 gives a practical way of constructing a picture of A_* with the use of a computer. We simply must generate a segment of the trajectory {x_n} according to equation (12.8.1). Neglecting the first segment x_0, …, x_{n−1}, we are assured that the remaining points x_n, x_{n+1}, … are in the ε-neighborhood of A_*, where ε = ε_0 L^n and ε_0 is a constant which depends only on the choice of the initial point x_0. Since the system is hyperbolic the sequence {L^n} quickly converges to zero, and in practice, the value ε_0 is irrelevant. We generate the points x_n, x_{n+1}, x_{n+2}, … until the picture no longer substantially changes. Theoretically, the approach to the set A_* does not depend on the choice of the probabilities p_1, …, p_N as long as p_i > 0 for i = 1, …, N. However, the convergence to A_* may be slow if the value p = min_i p_i is small. Changing the p_i does not change A_* = supp μ_*, but may change μ_*.
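For the Cantor system of Example 12.8.3 this random iteration is easy to test: after discarding a burn-in segment, every generated point must lie (up to rounding error) in the covering set A_10 of the construction. A Python sketch (the burn-in length 50, the sample size, the seed, and the probabilities p_1 = p_2 = 1/2 are all arbitrary choices):

```python
import random

# Random iteration (Theorem 12.8.2) for S1(x) = x/3, S2(x) = x/3 + 2/3.
S = [lambda x: x / 3, lambda x: x / 3 + 2 / 3]

random.seed(0)
x = 0.5                                # arbitrary x0; its influence decays like L^n
for _ in range(50):                    # discard the first points (the burn-in n0)
    x = random.choice(S)(x)

points = []
for _ in range(5000):
    x = random.choice(S)(x)
    points.append(x)

# Build the covering set A_10 = F^10([0, 1]): 2^10 intervals of length 3^-10.
ivs = [(0.0, 1.0)]
for _ in range(10):
    ivs = [(a / 3, b / 3) for a, b in ivs] + \
          [(a / 3 + 2 / 3, b / 3 + 2 / 3) for a, b in ivs]

# Every retained point lies in A_10 up to floating-point error.
ok = all(any(a - 1e-9 <= p <= b + 1e-9 for a, b in ivs) for p in points)
print(ok)
```

A two-dimensional version of the same loop, with the maps of Example 12.8.4, draws the Sierpinski triangle when the points are plotted.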
Example 12.8.5. Let X = R² and take S_i, i = 1, 2, 3, to be affine transformations of the form

S_1(x) = (c  0; 0  c) x + b_1,

S_2(x) = (r cos φ  −r sin φ; r sin φ  r cos φ) x + b_2,

S_3(x) = (q cos ψ  −q sin ψ; q sin ψ  q cos ψ) x + b_3,

where b_1, b_2, b_3 are fixed translation vectors and

c = 0.255,   r = 0.75,   q = 0.625.

Choosing

n_0 = 100,   k_0 = 150000,   |x_0| ≤ 1,   p_1 = p_2 = p_3 = 1/3,

we obtain a representation of a tree as shown in Figure 12.8.3, as prepared for this volume by Dr. Z. Kielek. It should be noted that the "tree" has some natural asymmetry as usually appears in nature. □
The objects shown in Figures 12.8.1, 12.8.2, and 12.8.3 are called fractals, a name derived from the Latin word fractus meaning broken or partial. One possible definition says that a fractal is a set which has a fractional dimension. To make this definition precise it is necessary to define a dimension applicable to a large class of sets. This is not an easy task, and the several existing definitions of dimension are, in general, not equivalent.

Here we give a simplification of the Hausdorff dimension proposed by A. N. Kolmogorov. It is called the capacity or fractal dimension. To understand the ideas that lead to this notion, define a d-cube of size l by

K = {(x_1, …, x_d): a_i ≤ x_i ≤ a_i + l for i = 1, …, d}.

FIGURE 12.8.3.

This set K is evidently a d-dimensional object. The question arises how to derive the number d from the intrinsic properties of the cube, neglecting the trivial fact that the index i takes on d values in the definition of K. Assume first that l = 1 and that K is subdivided into cubes of size ε_n = 1/n. The number of these cubes is N(ε_n) = n^d. From this we obtain immediately

d = (log n^d)/(log n) = (log N(ε_n))/(log(1/ε_n)).

When the size l of the cube K is arbitrary the calculation is a little more complicated. Namely, K may be divided into cubes of size ε_n = l/n, and the number of these cubes is N(ε_n) = n^d. Consequently,

d = lim_{n→∞} (d log n)/(log n − log l) = lim_{n→∞} (log N(ε_n))/(log(1/ε_n)).

These calculations suggest the following.

Definition 12.8.3. Let A ⊂ R^d be a compact set. For every ε > 0 denote by N(ε) the minimum number of cubes of size ε needed to cover A. We define the dimension of A by the formula

dim A = lim_{ε→0} (log N(ε))/(log(1/ε))   (12.8.23)

if this limit exists.


Calculation of the fractal dimension by a direct application of Definition 12.8.3 is difficult. It may be simplified and the continuous variable ε replaced by an appropriate sequence {ε_n}. Namely, if for some c > 0 and 0 < q < 1 we define ε_n = cq^n, and if

d_cq = lim_{n→∞} (log N(ε_n))/(log(1/ε_n))   (12.8.24)

exists, then the limit (12.8.23) also exists and dim A = d_cq [Barnsley, 1988; Chapter 5].

Using this property we may find the dimension of the attractors A_* described in Examples 12.8.3 and 12.8.4. First consider the case when A_* is the Cantor set. We have

A_* ⊂ A_n = F^n([0, 1]),

and A_* can be covered by 2^n disjoint intervals of length 3^{−n} whose sum is equal to A_n. Since A_* contains the endpoints of these intervals the number of covering intervals cannot be made smaller. We have, therefore, N(ε_n) = 2^n for ε_n = 3^{−n}, which gives

dim A_Cantor = (log 2)/(log 3).

For the Sierpinski triangle the situation is similar. We have

A_* ⊂ A_n = F^n(A_0),

where A_0 is the initial triangle. The set A_n consists of 3^n isosceles triangles of height 2^{−n} and base length 2^{−n}. Every such triangle can be covered by four squares of size 2^{−(n+1)}. On the other hand the vertices of these triangles belong to A_*. It is necessary to use 3^n different squares of size 2^{−(n+1)} just to cover the top vertices of these triangles. Thus, for ε_n = 2^{−(n+1)} we have

3^n ≤ N(ε_n) ≤ 4 · 3^n,

and consequently

(n log 3)/((n + 1) log 2) ≤ (log N(ε_n))/(log(1/ε_n)) ≤ (log 4 + n log 3)/((n + 1) log 2),

which gives in the limit

dim A_Sierpinski = (log 3)/(log 2).
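The limit (12.8.24) can also be estimated numerically from a sample of the attractor. The sketch below (the sample size, the seed, the burn-in, and the choice ε_n = 3^{−n} are all our own choices) box-counts chaos-game points of the Cantor set; the estimates settle near log 2 / log 3 ≈ 0.631:

```python
import math
import random

# Sample the Cantor set by random iteration of S1(x) = x/3, S2(x) = x/3 + 2/3.
random.seed(0)
S = [lambda x: x / 3, lambda x: x / 3 + 2 / 3]
x, pts = 0.5, []
for i in range(200000):
    x = random.choice(S)(x)
    if i >= 100:                       # discard a burn-in segment
        pts.append(x)

for n in (4, 6, 8):
    eps = 3.0 ** -n
    boxes = {math.floor(p / eps) for p in pts}      # occupied boxes of size eps
    est = math.log(len(boxes)) / math.log(1.0 / eps)
    print(n, est)                      # close to log 2 / log 3 = 0.6309...
```

With the aligned scale ε_n = 3^{−n} the occupied boxes are essentially the 2^n covering intervals of A_n, so the estimate matches the exact computation above.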


Exercises
12.1. Let X ⊂ R^d be a compact set and C(X) be the space of continuous functions f: X → R. Using the Weierstrass approximation theorem prove that in C(X) there exists a dense countable subset of Lipschitzean functions.

12.2. Let X ⊂ R^d be a closed unbounded set. Using the family of functions

f_{nw}(x) = w(x) max(1 − n^{−1}|x|, 0),

where n is a positive integer and w: R^d → R is a polynomial, show that in the space C_0(X) there exists a dense countable subset of Lipschitzean functions.
12.3. Let X = [0, 2π] and let {μ_n} be the sequence of probabilistic measures with densities (1/π) sin² nx. Find the weak limit μ of {μ_n}. Is μ also the strong limit of {μ_n}?

12.4. Let S: R → R be a continuous function such that S(x) ≠ x for x ∈ R. Show that for the operator Pμ(A) = μ(S^{−1}(A)) there does not exist an invariant probabilistic measure.
12.5. Let X = {0, 1, …} be the set of nonnegative integers. Consider the iterated function system given by the two transformations

for x ∈ X,

and the probability vector p_1 = p_2 = 1/2. Show that this system is strongly asymptotically stable.

12.6. Generalize the previous result and consider an arbitrary iterated function system (12.8.1). Show that if S_1 ≡ 0 and p_1 > 0, then this system is strongly asymptotically stable.
12.7. Let (12.8.1) be a hyperbolic dynamical system. Fix an arbitrary x_0 ∈ X. Prove that for every sequence {i_n} with i_n ∈ {1, …, N} the limit

x = lim_{n→∞} S_{i_1} ∘ S_{i_2} ∘ ⋯ ∘ S_{i_n}(x_0)

exists, and that the set of all such points x corresponding to all possible sequences {i_n} is equal to A_* [Barnsley, 1988; Chapter 4].
12.8. Consider the hyperbolic dynamical system given, on X = R², by the eight transformations

S_i(x) = (1/3)x + (a_i, b_i),   i = 1, …, 8,

where (a_i, b_i) are all possible pairs made from the numbers 0, 1/3, 2/3, excluding (1/3, 1/3). The attractor A_* of this system is called a Sierpinski carpet. Make a picture of A_* and calculate dim A_*.
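A starting point for the last exercise: the random iteration of Theorem 12.8.2 renders the carpet directly. In the Python sketch below (the seed, sample size, equal probabilities, and the level-3 box count are our own choices) the number of occupied boxes of size 3^{−3} approaches 8³ = 512, consistent with the capacity dimension log 8 / log 3 ≈ 1.893:

```python
import random

# Chaos game for the Sierpinski carpet: S_i(x) = x/3 + (a_i, b_i), with
# (a_i, b_i) running over {0, 1/3, 2/3}^2 minus the center (1/3, 1/3).
shifts = [(a / 3, b / 3) for a in range(3) for b in range(3) if (a, b) != (1, 1)]

random.seed(0)
x, y = 0.5, 0.5
cells = set()
for i in range(20000):
    a, b = random.choice(shifts)
    x, y = x / 3 + a, y / 3 + b
    if i >= 20:                        # discard a short burn-in
        cells.add((int(x * 27), int(y * 27)))   # boxes of size 3^-3

print(len(cells))                      # approaches 8^3 = 512 occupied boxes
```

Plotting the points (x, y) instead of counting boxes produces the picture asked for in the exercise.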

References

Abraham, R. and Marsden, J.E. 1978. Foundations of Mechanics, Benjamin/Cummings, Reading, Massachusetts.
Adler, R.L. and Rivlin, T.J. 1964. Ergodic and mixing properties of Chebyshev polynomials, Proc. Am. Math. Soc., 15:794-796.
Anosov, D.V. 1963. Ergodic properties of geodesic flows on closed Riemannian manifolds of negative curvature, Sov. Math. Dokl., 4:1153-1156.
Anosov, D.V. 1967. Geodesic flows on compact Riemannian manifolds of negative curvature, Proc. Steklov Inst. Math., 90:1-209.
Arnold, V.I. 1963. Small denominators and problems of stability of motion in classical and celestial mechanics, Russian Math. Surveys, 18:85-193.
Arnold, V.I. and Avez, A. 1968. Ergodic Problems of Classical Mechanics,
Benjamin, New York.
Barnsley, M. 1988. Fractals Everywhere, Academic Press, New York.
Barnsley, M. and Cornille, H. 1981. General solution of a Boltzmann equation and the formation of Maxwellian tails, Proc. R. Soc. London, Sect.
A, 374:371-400.
Baron, K. and Lasota A. 1993. Asymptotic properties of Markov operators
defined by Volterra type integrals, Ann. Polon. Math., 58:161-175.
Bessala, P. 1975. On the existence of a fundamental solution for a parabolic
differential equation with unbounded coefficients, Ann. Polon. Math.,
29:403-409.
Bharucha-Reid, A.T. 1960. Elements of the Theory of Markov Processes
and Their Applications, McGraw-Hill, New York.


Birkhoff, G.D. 1931a. Proof of a recurrence theorem for strongly transitive systems, Proc. Natl. Acad. Sci. USA, 17:650-655.
Birkhoff, G.D. 1931b. Proof of the ergodic theorem, Proc. Natl. Acad. Sci.
USA, 17:656-660.
Bobylev, A.V. 1976. Exact solutions of the Boltzmann equations, Sov. Phys.
Dokl., 20:822-824.
Borel, E. 1909. Les probabilités dénombrables et leurs applications arithmétiques, Rendiconti Circ. Mat. Palermo, 27:247-271.
Boyarsky, A. 1984. On the significance of absolutely continuous invariant measures, Physica, 11D:130-146.
Breiman, L. 1968. Probability, Addison-Wesley, Reading, Massachusetts.
Brown, J.R. 1976. Ergodic Theory and Topological Dynamics, Academic
Press, New York.
Brunovsky, P. 1983. Notes on chaos in the cell population partial differential
equation. Nonlin. Anal., 7:167-176.
Brunovsky, P. and Komornik, J. 1984. Ergodicity and exactness of the shift on C[0, ∞) and the semiflow of a first-order partial differential equation, J. Math. Anal. Applic., 104:235-245.
Brzeźniak, Z. and Szafirski, B. 1991. Asymptotic behavior of L¹ norm of solutions to parabolic equations, Bull. Polon. Acad. Sci. Math., 39:1-10.
Bugiel, P. 1982. Approximation for the measure of ergodic transformations on the real line, Z. Wahrscheinlichkeitstheorie Verw. Gebiete, 59:27-38.
Chabrowski, J. 1970. Sur la construction de la solution fondamentale de l'équation parabolique aux coefficients non bornés, Colloq. Math., 21:141-148.
Chandrasekhar, S. and Münch, G. 1952. The theory of fluctuations in brightness of the Milky Way, Astrophys. J., 125:94-123.
Chapman, S. and Cowling, T.G. 1960. The Mathematical Theory of Non-Uniform Gases, Cambridge University Press, Cambridge, England.
Collet, P. and Eckmann, J.P. 1980. Iterated Maps on the Interval as Dynamical Systems, Birkhäuser, Boston.
Cornfeld, I.P., Fomin, S.V., and Sinai, Ya.G. 1982. Ergodic Theory, Springer-Verlag, New York.
Dlotko, T. and Lasota, A. 1983. On the Tjon-Wu representation of the
Boltzmann equation, Ann. Polon. Math., 42:73-82.
Dlotko, T. and Lasota, A. 1986. Statistical stability and the lower bound
function technique, in Proceedings of the Autumn Course on Semigroups: Theory and Applications (H. Brezis, M. Crandall, and F. Kappel, eds.). International Center for Theoretical Physics, Trieste, Pitman
Res. Notes Math., 141:75-95.


Dunford, N. and Schwartz, J.T. 1957. Linear Operators. Part I: General Theory, Wiley, New York.
Dynkin, E.B. 1965. Markov Processes, Springer-Verlag, New York.
Eidel'man, S.D. 1969. Parabolic Systems, North-Holland, Amsterdam.
Elmroth, T. 1984. On the H-function and convergence toward equilibrium for a space-homogeneous molecular density, SIAM J. Appl. Math.,
44:150-159.
Feigenbaum, M.J. and Hasslacher, B. 1982. Irrational decimations and path integrals for external noise, Phys. Rev. Lett., 49:605-609.
Foguel, S.R. 1966. Limit theorems for Markov processes, Trans. Amer. Math. Soc., 121:200-209.
Foguel, S.R. 1969. The Ergodic Theory of Markov Processes, Van Nostrand Reinhold, New York.
Friedman, A. 1964. Partial Differential Equations of Parabolic Type, Prentice-Hall, Englewood Cliffs, New Jersey.
Friedman, A. 1975. Stochastic Differential Equations and Applications, vol.
1, Academic Press, New York.
Gantmacher, F.R. 1959. Matrix Theory, Chelsea, New York.
Gihman [Gikhman], I.I. and Skorohod [Skorokhod], A.V. 1975. The Theory of Stochastic Processes, vol. 2, Springer-Verlag, New York.
Gikhman, I.I. and Skorohod, A.V. 1969. Introduction to the Theory of Random Processes, Saunders, Philadelphia. [Trans. from Russian.]
Glass, L. and Mackey, M.C. 1979. A simple model for phase locking of
biological oscillators, J. Math. Biology, 7:339-352.
Guevara, M.R. and Glass, L. 1982. Phase locking, period doubling bifurcations, and chaos in a mathematical model of a periodically driven
oscillator: A theory for the entrainment of biological oscillators and the
generation of cardiac dysrhythmias, J. Math. Biology, 14:1-23.
Hadamard, J. 1898. Les surfaces à courbures opposées et leurs lignes géodésiques, J. Math. Pures Appl., 4:27-73.
Hale, J. 1977. Theory of Functional Differential Equations, Springer-Verlag,
New York.
Halmos, P.R. 1974. Measure Theory, Springer-Verlag, New York.
Hardy, G.H. and Wright, E.M. 1959. An Introduction to the Theory of
Numbers, 4th Edition, Oxford University Press, London.
Hénon, M. 1976. A two-dimensional mapping with a strange attractor, Commun. Math. Phys., 50:69-77.
Horbacz, K. 1989a. Dynamical systems with multiplicative perturbations,
Ann. Pol. Math., 50:11-26.


Horbacz, K. 1989b. Asymptotic stability of dynamical systems with multiplicative perturbations, Ann. Polon. Math., 50:209-218.
Jablonski, M. and Lasota, A. 1981. Absolutely continuous invariant measures for transformations on the real line, Zesz. Nauk. Uniw. Jagiellon.
Pr. Mat., 22:7-13.
Jakobson, M. 1978. Topological and metric properties of one-dimensional endomorphisms, Dokl. Akad. Nauk SSSR, 243:866-869 [in Russian].
Jama, D. 1986. Asymptotic behavior of an integro-differential equation of parabolic type, Ann. Polon. Math., 47:65-78.
Jama, D. 1989. Period three and the stability almost everywhere, Rivista Mat. Pura Appl., 5:85-95.
Jaynes, E.T. 1957. Information theory and statistical mechanics, Phys. Rev., 106:620-630.
Kamke, E. 1959. Differentialgleichungen: Lösungsmethoden und Lösungen. Band 1. Gewöhnliche Differentialgleichungen, Chelsea, New York.
Katz, A. 1967. Principles of Statistical Mechanics, Freeman, San Francisco.
Kauffman, S. 1974. Measuring a mitotic oscillator: The arc discontinuity,
Bull. Math. Biol., 36:161-182.
Keener, J.P. 1980. Chaotic behavior in piecewise continuous difference equations, Trans. Amer. Math. Soc., 261:589-604.
Keller, G. 1982. Stochastic stability in some chaotic dynamical systems,
Mh. Math., 94:313-333.
Kemperman, J.H.B. 1975. The ergodic behavior of a class of real transformations, in Stochastic Processes and Related Topics, pp. 249-258 (vol.
1 of Proceedings of the Summer Research Institute on Statistical Inference, Ed. Madan Lal Puri). Academic Press, New York.
Kielek, Z. 1988. An application of the convolution iterates to evolution
equation in Banach space, Universitatis Jagellonicae Acta Mathematica,
27:247-257.
Kifer, Y.I. 1974. On small perturbations of some smooth dynamical systems, Math. USSR Izv., 8:1083-1107.
Kitano, M., Yabuzaki, T., and Ogawa, T. 1983. Chaos and period doubling
bifurcations in a simple acoustic system, Phys. Rev. Lett., 50:713-716.
Knight, B.W. 1972a. Dynamics of encoding in a population of neurons, J.
Gen. Physiol., 59:734-766.
Knight, B.W. 1972b. The relationship between the firing rate of a single
neuron and the level of activity in a population of neurons. Experimental
evidence for resonant enhancement in the population response, J. Gen.
Physiol., 59:767-778.
Komornik, J. and Lasota, A. 1987. Asymptotic decomposition of Markov
operators, Bull. Polon. Acad. Sci. Math., 35:321-327.


Komorowski, T. and Tyrcha, J. 1989. Asymptotic properties of some Markov operators, Bull. Acad. Polon. Sci. Math., 37:221-228.
Koopman, B.O. 1931. Hamiltonian systems and transformations in Hilbert
space, Proc. Nat. Acad. Sci. USA, 17:315-318.
Kosjakin, A.A. and Sandler, E.A. 1972. Ergodic properties of a certain class
of piecewise smooth transformations of a segment, lzv. Vyssh. Uchebn.
Zaved. Matematika, 118:32-40.
Kowalski, Z.S. 1976. Invariant measures for piecewise monotonic transformations, Lect. Notes Math., 472:77-94.
Krook, M. and Wu, T.T. 1977. Exact solutions of the Boltzmann equation,
Phys. Fluids, 20:1589-1595.
Krzyzewski, K. 1977. Some results on expanding mappings, Soc. Math. France Astérisque, 50:205-218.
Krzyzewski, K. and Szlenk, W. 1969. On invariant measures for expanding
differential mappings, Stud. Math., 33:83-92.
Lasota, A. 1981. Stable and chaotic solutions of a first-order partial differential equation, Nonlin. Anal., 5:1181-1193.
Lasota, A., Li, T.Y., and Yorke, J.A. 1984. Asymptotic periodicity of the iterates of Markov operators, Trans. Amer. Math. Soc., 286:751-764.
Lasota, A. and Mackey, M.C. 1980. The extinction of slowly evolving dynamical systems, J. Math. Biology, 10:333-345.
Lasota, A. and Mackey, M.C. 1984. Globally asymptotic properties of proliferating cell populations, J. Math. Biology, 19:43-62.
Lasota, A. and Mackey, M.C. 1989. Stochastic perturbation of dynamical
systems: The weak convergence of measures, J. Math. Anal. Applic.,
138:232-248.
Lasota, A., Mackey, M.C., and Tyrcha, J. 1992. The statistical dynamics
of recurrent biological events, J. Math. Biology, 30:775-800.
Lasota, A., Mackey, M.C., and Ważewska-Czyżewska, M. 1981. Minimizing therapeutically induced anemia, J. Math. Biology, 13:149-158.
Lasota, A. and Rusek, P. 1974. An application of ergodic theory to the determination of the efficiency of cogged drilling bits, Arch. Górnictwa, 19:281-295. [In Polish with Russian and English summaries.]
Lasota, A. and Tyrcha, J. 1991. On the strong convergence to equilibrium for randomly perturbed dynamical systems, Ann. Polon. Math., 53:79-89.
Lasota, A. and Yorke, J.A. 1982. Exact dynamical systems and the Frobenius-Perron operator, Trans. Amer. Math. Soc., 273:375-384.
Li, T.Y. and Yorke, J.A. 1978a. Ergodic transformations from an interval into itself, Trans. Am. Math. Soc., 235:183-192.


Li, T.Y. and Yorke, J.A. 1978b. Ergodic maps on [0, 1] and nonlinear pseudorandom number generators, Nonlinear Anal., 2:473-481.
Lin, M. 1971. Mixing for Markov operators, Z. Wahrscheinlichkeitstheorie
Verw. Gebiete, 19:231-242.
Lorenz, E.N. 1963. Deterministic nonperiodic flow, J. Atmos. Sci., 20:130-141.
Loskot, K. and Rudnicki, R. 1991. Relative entropy and stability of stochastic semigroups, Ann. Polon. Math., 53:139-145.
Mackey, M.C. and Dörmer, P. 1982. Continuous maturation of proliferating erythroid precursors, Cell Tissue Kinet., 15:381-392.
Mackey, M.C., Longtin, A., and Lasota, A. 1990. Noise-induced global
asymptotic stability, J. Stat. Phys., 60:735-751.
Malczak, J. 1992. An application of Markov operators in differential and
integral equations, Rend. Sem. Univ. Padova, 87:281-297.
Mandelbrot, B.B. 1977. Fractals: Form, Chance, and Dimension, Freeman,
San Francisco.
Manneville, P. 1980. Intermittency, self-similarity and 1/f spectrum in dissipative dynamical systems, J. Physique, 41:1235-1243.
Manneville, P. and Pomeau, Y. 1979. Intermittency and the Lorenz model,
Phys. Lett., 75A:1-2.
May, R.M. 1974. Biological populations with nonoverlapping generations:
stable points, stable cycles, and chaos, Science, 186:645-647.
May, R.M. 1980. Nonlinear phenomena in ecology and epidemiology, Ann. N.Y. Acad. Sci., 357:267-281.
Misiurewicz, M. 1981. Absolutely continuous measures for certain maps of
an interval, Publ. Math. IHES, 53:17-51.
von Neumann, J. 1932. Proof of the quasi-ergodic hypothesis, Proc. Nat.
Acad. Sci. USA, 18:31-38.
Parry, W. 1981. Topics in Ergodic Theory, Cambridge University Press,
Cambridge, England.
Petrillo, G.A. and Glass, L. 1984. A theory for phase locking of respiration in cats to a mechanical ventilator, Am. J. Physiol., 246:R311-320.
Pianigiani, G. 1979. Absolutely continuous invariant measures for the process x_{n+1} = λx_n(1 − x_n), Boll. Un. Mat. Ital., 16A:374-378.
Pianigiani, G. 1983. Existence of invariant measures for piecewise continuous transformations, Ann. Polon. Math., 40:39-45.
Procaccia, I. and Schuster, H. 1983. Functional renormalization group theory of 1/f noise in dynamical systems, Phys. Rev. A, 28:1210-1212.
Renyi, A. 1957. Representation for real numbers and their ergodic properties, Acta Math. Acad. Sci. Hung., 8:477-493.


Risken, H. 1984. The Fokker-Planck Equation, Springer-Verlag, New York.
Rochlin, V.A. 1964. Exact endomorphisms of Lebesgue spaces, Am. Math. Soc. Transl., (2) 39:1-36.
Rogers, T.D. and Whitley, D.C. 1983. Chaos in the cubic mapping, Math.
Modelling, 4:9-25.
Royden, H.L. 1968. Real Analysis, Macmillan, London.
Rudnicki, R. 1985. Invariant measures for the flow of a first-order partial
differential equation, Ergod. Th. & Dynam. Sys., 5:437-443.
Ruelle, D. 1977. Applications conservant une mesure absolument continue par rapport à dx sur [0, 1], Commun. Math. Phys., 55:477-493.
Sarkovskii, A.N. 1964. Coexistence of cycles of a continuous map of a line into itself, Ukr. Mat. Zh., 16:61-71.
Schaefer, H.H. 1980. On positive contractions in L^p spaces, Trans. Am. Math. Soc., 257:261-268.
Schiff, L.I. 1955. Quantum Mechanics, McGraw-Hill, New York.
Schwartz, L. 1965. Méthodes mathématiques de la physique, Hermann, Paris.
Schwartz, L. 1966. Théorie des distributions, Hermann, Paris.
Schweiger, F. 1978. tan x is ergodic, Proc. Am. Math. Soc., 71:54-56.
Shannon, C.E. and Weaver, W. 1949. The Mathematical Theory of Communication, University of Illinois Press, Urbana.
Sinai, Ya. 1963. On the foundations of ergodic hypothesis for a dynamical
system of statistical mechanics, Sov. Math. Dokl., 4:1818-1822.
Smale, S. 1967. Differentiable dynamical systems, Bull. Am. Math. Soc.,
73:741-817.
Smale, S. and Williams, R.F. 1976. The qualitative analysis of a difference
equation of population growth, J. Math. Biology, 3:1-5.
Szarski, J. 1967. Differential Inequalities (2nd Ed.), Polish Scientific Publishers, Warsaw.
Tjon, J.A. and Wu, T.T. 1979. Numerical aspects of the approach to a
Maxwellian distribution, Phys. Rev. A, 19:883-888.
Tyrcha, J. 1988. Asymptotic stability in a generalized probabilistic/deterministic model of the cell cycle, J. Math. Biology, 26:465-475.
Tyson, J.J. and Hannsgen, K.B. 1986. Cell growth and division: a deterministic/probabilistic model of the cell cycle, J. Math. Biology, 23:231-246.
Tyson, J.J. and Sachsenmaier, W. 1978. Is nuclear division in Physarum controlled by a continuous limit cycle oscillator? J. Theor. Biol., 73:723-738.
Ulam, S.M. and von Neumann, J. 1947. On combination of stochastic and
deterministic processes, Bull. Am. Math. Soc., 53:1120.


Voigt, J. 1981. Stochastic operators, information and entropy, Commun. Math. Phys., 81:31-38.
Walter, W. 1970. Differential and Integral Inequalities, Springer-Verlag,
New York.
Walters, P. 1975. Ergodic Theory: Introductory Lectures, Lecture Notes in
Mathematics 458, Springer-Verlag, New York.
Walters, P. 1982. An Introduction to Ergodic Theory, Springer-Verlag, New
York.
Wolfram, S. 1983. Statistical mechanics of cellular automata, Reviews of
Modern Physics, 55:601-644.
Zdun, M.C. 1977. Continuous iteration semigroups, Boll. Un. Mat. Ital.,
14A:65-70.

Notation and Symbols

If A and B are sets, then x ∈ B means that "x is an element of B," whereas
A ⊂ B means that "A is contained in B." For x ∉ B and A ⊄ B substitute
"is not" for "is" in these statements. Furthermore, A ∪ B = {x: x ∈ A or
x ∈ B}, A ∩ B = {x: x ∈ A and x ∈ B}, A \ B = {x: x ∈ A and x ∉ B},
and A × B = {(x, y): x ∈ A and y ∈ B}, respectively, define the union,
intersection, difference, and Cartesian product of two sets A and B.
The symbol ∅ denotes the empty set, and

1_A(x) = { 1 if x ∈ A
         { 0 if x ∉ A

is the characteristic (or indicator) function for set A.
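For readers who like to check definitions computationally, the set operations and the indicator function above can be mirrored on small finite sets (a Python sketch; the variable names are ours, not the book's):

```python
# Set operations from the text, illustrated on small finite sets.
A = {1, 2, 3}
B = {3, 4}

union = A | B                               # {x: x in A or x in B}
intersection = A & B                        # {x: x in A and x in B}
difference = A - B                          # {x: x in A and x not in B}
cartesian = {(x, y) for x in A for y in B}  # {(x, y): x in A and y in B}

def indicator(S):
    """Characteristic (indicator) function 1_S: 1 on S, 0 elsewhere."""
    return lambda x: 1 if x in S else 0

one_A = indicator(A)
assert one_A(2) == 1 and one_A(4) == 0
assert union == {1, 2, 3, 4} and intersection == {3} and difference == {1, 2}
```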


When a < b the closed interval [a, b] = {x: a ≤ x ≤ b}, whereas the
open interval (a, b) = {x: a < x < b}. The half-open intervals [a, b) and
(a, b] are similarly defined. The real line is denoted by R, and the positive
half-line by R+. If A is a set and a ∈ R, then aA = {y: x ∈ A and y = ax}.
The notation f: A → B means that "f is a function whose domain is
A and whose range is in B," or "f maps A into B." Given two functions
f: A → B and g: B → C, then g ∘ f denotes the composition of g with f,
and g ∘ f: A → C. If f maps R (or a subset of R) into R, and b is a positive
number, then

g(x) = f(x) (mod b)

means that g(x) = f(x) − nb, where n is the largest integer less than or
equal to f(x)/b. ‖f‖_{L^p} and ⟨f, g⟩, respectively, denote the L^p norm of the
function f, and the scalar product of the functions f and g. ⋁_a^b f is used
for the variation of the function f over the interval [a, b].
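The mod-b reduction defined above is easy to state in code (a Python illustration of the book's definition; `mod_b` is our name, not the book's notation):

```python
import math

def mod_b(f_x: float, b: float) -> float:
    # g(x) = f(x) (mod b) means g(x) = f(x) - n*b,
    # where n is the largest integer <= f(x)/b.
    n = math.floor(f_x / b)
    return f_x - n * b

# For f(x) = 7.5 and b = 2: n = floor(3.75) = 3, so g(x) = 7.5 - 6 = 1.5
assert mod_b(7.5, 2.0) == 1.5
# Negative values also wrap into [0, b): n = floor(-0.5) = -1, giving 1.0
assert mod_b(-1.0, 2.0) == 1.0
```

For b > 0 this agrees with Python's built-in `%` operator, which uses the same floor convention.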
The following is a list of the most commonly used symbols and their
meaning:

a.e. – almost everywhere
A – σ-algebra
B – Borel σ-algebra
d(g, F) – L¹ distance between the function g and the set F
d⁺f/dx – right lower derivative
D, D(X, A, μ) – set of densities
D²(ξ) – variance of a random variable
𝔇(A) – domain of an infinitesimal operator A
E(ξ) – mathematical expectation of a random variable
E(V | f) – expected value of V with respect to f
Ei(x) – exponential integral
{η_t} – continuous time stochastic process
f – an element of L¹, often a density
f_* – stationary density
F – set of functions; σ-algebra in a probability space
{F_t} – family of σ-algebras
g_{σ²/2b}(x) – Gaussian density with variance σ²/2b
g_ij – Riemannian metric
H_n(x) – Hermite polynomial
H(f) – entropy of a density f
H(f | g) – conditional entropy of f with respect to g
I – identity operator
K(x, y) – stochastic kernel
L^p, L^p(X, A, μ) – L^p space
L^{p'} – space adjoint to L^p
μ(A) – measure of a set A
μ_f(A) – measure of a set A with respect to a density f
μ_W – Wiener measure
{N_t}_{t≥0} – counting process
ω – an element of Ω; angular frequency
Ω – space of elementary events
(Ω, F, prob) – probability space
P – Markov or Frobenius-Perron operator
P_ε, P – Markov operator
{P_t}_{t≥0} – continuous semigroup generated by the linear Boltzmann equation
prob – probability measure
Prob – probability measure on a product space
R_λ – resolvent operator
S – transformation
S⁻¹(A) – counterimage of a set A under a transformation S
S_m – Chebyshev polynomial
S¹ – unit circle
{S_t}_{t∈R}, {S_t}_{t≥0} – dynamical or semidynamical system
σ(ξ) – standard deviation of a random variable
T – transformation
T^d – d-dimensional torus
{T_t}_{t≥0} – semigroup corresponding to an infinitesimal operator A
U – Koopman operator
V – Liapunov function, potential function
{w(t)}_{t≥0} – Wiener process
(X, A, μ) – measure space
ξ, ξ_t – random variables
{ξ_n}, {ξ_t} – discrete or continuous time stochastic process

Index

Abel Inequality, 142


abstract ergodic theorem, 89
acoustic feedback system, 164
adjoint operator, 48, 49
almost everywhere (a.e.), 19, 38
almost sure convergence, 312
Anosov diffeomorphism, 57, 77
arcs, equivalent, 177
asymptotic periodicity, 95, 117, 156
and constrictive operators, 99
and asymptotic stability, 105
of stochastically perturbed systems, 321, 322, 331, (333)
asymptotic stability, 104, 202
of Chandrasekhar-Münch equation, 386
via change of variables, 165
of convex transformation, 154
of expanding mappings on manifolds, 184
of fluid flow, 156
of integral operators, 112, 115
of integro-differential equations,
379-386

of iterated function systems,


434
of Lorenz equations, 150
of measures, weak, 397, 421
of measures, strong, 402, 425,
430
of monotonically increasing
transformation, 144
and overlapping support (137)
proof via lower-bound function, 106, 201
of quadratic transformation,
166
relation to asymptotic periodicity, 105
relation to conditional entropy,
299
relation to exactness, 110
relation to Liapunov function,
115, 372, 378,
relation to statistical stability, 105
of Renyi transformation, 145
of stochastic semigroups, 202
of stochastically perturbed systems, 317, 323, 325-326, 332, (333)
of strong repellor, 154
of transformations on R, 172
automorphism, 80

baker transformation, 54, 65, 81,


(83), 295
relation to dyadic transformation, 56, 295
Barnsley operator, 435
Bielecki function, 127, 387
and sweeping of stochastic semigroup, 245, 387
Birkhoff ergodic theorem
continuous time, 196
discrete time, 63, 64
Boltzmann equation and entropy,
295, 299
Borel measure, 19, 29-30
on manifolds, 182
Borel sets, 18
Borel u-algebra, 18
bounded variation, function of,
139-144
Brownian motion
d-dimensional, 345
one-dimensional, 336

Cantor set, 441


dimension of, 446
capacity, 444
Cartesian product, 27
Cauchy condition for convergence,
34
Cauchy-Holder inequality, 27
Cauchy problem for Fokker-Planck
equation, 364-365
cell proliferation, 114, 119-122,
127-129, 133-135, 343
Cesaro convergence, 31, 122
of Frobenius-Perron operator,
72
of Koopman operator, 75

Chandrasekhar-Münch equation, 131, 240, 383
change of variables
and asymptotic stability, 165-171
in Lebesgue integral, 46
characteristic function, 5
Chebyshev inequality, 114, 310
Chebyshev polynomials, 169
classical solution of Fokker-Planck
equation, 365
closed linear operator (248)
closed linear subspace, 91
compact support, 207
comparison series, 31
complete space, 34
complete measure, 30
conditional entropy, 291-292, 299
connected manifold, 180
constant of motion, 214
constrictive Markov operator, 95,
96
and asymptotic periodicity,
98, 321-323
and perturbations, 321
continuous semigroup, 195
of contractions, 204
of contractions and infinitesimal operator, 226
and ordinary differential equations, 210-214
continuous stochastic processes, 336
continuous stochastic semigroup,
229
continuous time stochastic process,
254
continuous time system
asymptotically stable, 202
and discrete time systems, 198, 251-252
ergodic, 197
exact, 199, 338-344
mixing, 198, 220-224
sweeping, 244-245
contracting operator, 39, 204


convergence
almost sure, 312
Cauchy condition for, 34
Cesaro, 31
comparison series for, 31
in different spaces, 33
in mean, 311
to set of functions, 160
stochastic, 311
strong, 31
strong, of measures, 402, 425
weak, 31
weak Cesaro, 31
weak convergence of measures,
397, 421
convex transformation, 153-156
counterimage, 5
counting measure, 18
counting process, 254
curvature, 225-226
cycle, 102
cyclical permutation, 102
cylinder, 221, 340

dense set, 97
dense subset of densities, 97
density, 5, 9, 41
of absolutely continuous measure, 41
Gaussian, 398, 409
evolution of by Frobenius-Perron
operator, 38, 241-243
of random variable, 253
stationary, 41
and sweeping, 125, 129, 386
derivative
Radon-Nikodym, 41
right lower, 123
strong, 207
determinant of differential on manifold, 182
diffeomorphism, 58
differential
determinant of, on manifold,
182


of transformation on manifold, 178


differential delay equation as semidynamical system, 190
differential equation as dynamical system, 190
dimension, 445
fractal, 444
Dirac measure, 395, 399, 403, 408,
411
discrete time stochastic process,
254
discrete time system
and Poisson processes, 258-261
as sampled semidynamical system, 251-252
embedded in continuous time
system, 252
distance
Hausdorff, 436
between measures, 401
in V spaces, 26
on manifold, 180
distribution, 394
invariant, 417
stationary, 417
dyadic transformation, 8, 66, 77,
295
related to baker transformation, 56
dynamical system, 191
and exactness, 199
ergodic, 197
mixing, 198
trace of, 193
elementary events, 253
endomorphism, 81
entropy, 284
conditional, 291
and exact transformations,
293, 295, 297
and Frobenius-Perron operator, 292-295


and Hamiltonian systems, 293


and heat equation (300)
and invertible transformations,
292
and Liouville equation, 295
and Markov operators, 289292
and quadratic map, 290
of reversible and irreversible
systems, 295
equivalent arcs on manifold, 177
ergodic Birkhoff theorem, 63-64,

196
ergodic dynamical system, 197
ergodic Markov operator, 79, (83),
102
ergodic semidynamical system, 197
ergodic transformation, 59
ergodicity
conditions for via Frobenius-Perron operator, 61, 72,
94, 220
conditions for via Koopman
operator, 59, 75, 215, 220,
230
and Hamiltonian systems, 230
and linear Boltzmann equation, 273-276
illustrated, 68
of motion on torus, 216-218
necessary and sufficient conditions for, 59
relation to mixing, exactness
and K-automorphisms, 80
and rotational transformation,
62, 75, 198
essential supremum, 27
Euler-Bernstein equations, 357
events
elementary, 253
independent, 253
mutually disjoint, 253
exact Markov operator, 79, (83),
103
exact semidynamical system, 199

exact semidynamical system with


continuous time, 339-344
exact semigroup of linear Boltzmann equation, 273
exact transformation, 66
exactness
and asymptotic periodicity, 103
and entropy, 293, 295, 297
illustrated, 69
necessary and sufficient conditions for via FrobeniusPerron operator, 72,220
of r-adic transformation, 77
relation to dynamical systems,
199
relation to ergodicity, mixing,
and K-automorphisms, 66,
79, 82, 199
relation to statistical stability, 110, 167
of transformations on torus,
186, (188)
expanding mappings, 183
expanding Markov operator, 132
and asymptotic stability, 247
factor of transformation, 82
finite measure space, 19
first return map, 252
fixed point of Markov operator,
40
fluid flow, 156
Foias operator, 414
relation to Frobenius-Perron
operator, 416
relation to Koopman operator, 416
Foguel alternative to sweeping, 130
and expanding Markov operators, 133
and Fokker-Planck equation,
388
Fokker-Planck equation
asymptotic stability of solutions, 372, 388, 390


and Cauchy problem, 364-365


classical solution, 365
derivation of, 359-363
existence and uniqueness of
solutions, 366
fundamental solution, 365
generalized solution, 368
for Langevin equation, 367,
374
and Liouville equation, 371
for second-order system, 376
stationary solutions, 374-376
for stochastic differential equations, 360
and stochastic semigroups, 369
and sweeping, 386-388
forced oscillator, 161-163
fractal, 444
dimension, 444
Frobenius-Perron operator for densities, 37, 200
for Anosov diffeomorphism,
57
for baker transformation, 54
for dyadic transformation, 9
and evolution of densities, 5-13, 38
for Hamiltonian system, 214
and invariant measure, 52, 215,
229
for invertible transformations,
43, 45, 47
and Koopman operator, 48,
203
and Liouville equation, 230
as Markov operator, 43
and ordinary differential equations, 211-213
for quadratic map, 7, 53
for r-adic transformation, 9,
52
relation to entropy, 292-295,
297-298
relation to ergodicity, 61, 72,
94, 220


relation to infinitesimal operator, 211-213


relation to Koopman operator, 48, 203, 241-243
relation to mixing, 72, 220
and semidynamical systems,
199-200, 215
semigroups of, 199
support of, 44
for transformations on R, 10,
43, 172
for transformations on R 2 , 45
weak continuity of, 48
Frobenius-Perron operator for measures, 411
relation to Foias operator, 416
Fubini's theorem, 29
function
Bielecki, 127
of bounded variation, 140
with compact support, 207
left lower semicontinuous, 123
Liapunov, 114, 115, 117
locally integrable, 130
lower bound, 106
lower semicontinuous, 122
subinvariant, 129
support of, 39
functional, linear, 395
fundamental solution of FokkerPlanck equation, 365
gas dynamics, 220-224, 277-280
Gaussian density, 286, 325, 336,
345, 398, 409, (410)
Gaussian kernel, 202, 234, 239, 243,
366
generalized solution of Fokker-Planck
equation, 368
geodesic, 225
flow, 225
motion on, 224-226
Gibbs canonical distribution function, 288
Gibbs inequality, 284


gradient
of function, 177
length of, 181
Hahn-Banach theorem, 91
Hamiltonian, 213
system, 213, 218, 225, 230-231, 293
hat map, (50), 167, (188)
Hausdorff distance, 436
and capacity, 444
and fractal dimension, 444
Hausdorff space, 177
heat equation, 203,234,243, (300),
409
Henon map, 56
Hille-Yosida theorem, 226, (248)
homeomorphism, 176
hyperbolic iterated function system, 434, (447)
ideal gas, 220-224, 277-280
independent events, 253, (280)
independent increments, 254
independent u-algebras, 344
independent random variables, 253,
304, 314
independent random vectors, 304,
(333)
indicator function, 5
inequality
Cauchy-Holder, 27
Gibbs, 284
Jensen, 288
triangle, 26
infinitesimal operator, 206
of continuous semigroup of contractions, 226
and differential equations, 210
and ergodicity, 215
and Frobenius-Perron operator, 212-214, 229
and Hamiltonian systems, 213214
and Hille-Yosida theorem, 226

illustrated by parabolic differential equations, 234-235


illustrated by heat equation,
234
and invariant measure, 229
and Koopman operator, 210-212, 230
and ordinary differential equations, 21Q-213
and partial differential equation, 206-209
as strong derivative, 206
integrable function, 22
integral
Ito, 347
Lebesgue, 19-22
Riemann, 23
integro-differential equations, 238,
240, 379, 383
intermittency, 156
invariant measure, 52, (83), 196,
417
and differential equations, 230
and Frobenius-Perron operator, 52, 215
and Hamiltonian systems, 230
and infinitesimal operators,
229
and Liouville's theorem, 229-230
and sweeping, 130
invariant set, 59, 197
invertibility, 56, 68, 190-191, 292,
295
iterated function system, 433
attractor of, 436
and Cantor set, 441
hyperbolic, 434
and Sierpinski triangle, 441
weak asymptotic stability
of, 434
Ito integral, 346-351
Ito sum, 347, (392)


Jacobian matrix, 46
Jensen inequality, 288
joint density function, 304

K -automorphism, 80
and exactness, 82
and geodesic flows, 226
and mixing, 82
Keener map, 322-323
K-flow, 226
Kolmogorov automorphism, 80
Kolmogorov equation, see Fokker-Planck equation
Koopman operator, 47-49, 203
and Anosov diffeomorphism,
77
and motion on torus, 216-218
relation to ergodicity, 59, 75,
215-216, 220, 230
relation to Frobenius-Perron
operator, 48, 204, 241-243
relation to infinitesimal operators, 210-211, 230
relation to mixing, 75, 220
relation to ordinary differential equations, 210-211
and rotational transformation,
75
Krylov-Bogolubov theorem, 419

Langevin equation, 357, 367, 374


law of large numbers
strong, 314
weak, 313
Lebesgue decomposition of measures, 426
Lebesgue dominated convergence
theorem, 22
Lebesgue integral, 19-22
on product spaces, 29
relation to Riemann integral,
23
Lebesgue measure, 30


Lebesgue monotone convergence


theorem, 22
left lower semicontinuous function,
123
length of gradient on manifolds,
181
Liapunov function, 114, 117, 321,
325, 371, 378, 419
linear abstract Boltzmann equation, 264
linear Boltzmann equation, 261-268, (280), 299
linear functional, 395
linear operator, closed (248)
linear subspace, 91
linear Tjon-Wu equation, 277, (280)
linearly dense set, 31
Liouville equation, 229, 295, 371
Liouville's theorem, 229
locally finite measure, 393
locally integrable function, 130
Lorenz equations, 150
lower bound function, 106
conditions for existence, 122-124, 183
relation to asymptotic stability, 106, 112, 210
lower semicontinuous function, 122
L^p, space adjoint to, 26
L^p distance, 26
L^p norm, 25
L^p space, 25
complete, 34
manifold, 175-183
connected, 180
d-dimensional, 176
geodesic flow on, 225
Markov operator, 37-38
adjoint operator to, 49
asymptotic periodicity, 95-100, 117, 118, 321, 331, (333)
asymptotic stability, 104, 202

constrictive, 95, 96
contractive property of, 39,
201

deterministic, (50)
ergodic, 79, (83), 102
exact, 79, (83), 103
expanding, 132
fixed point of, 40
and Foias operator, 414
and Frobenius-Perron operator, 43
and linear abstract Boltzmann
equation, 261-268
lower-bound function for, 106
for measures, 405
mixing, 79, (83), 104
and parabolic equation, 368
properties of, 38-39
relation to entropy, 289-292
semigroup of, 201
stability property of, 39, 202
stationary density of, 41
with stochastic kernel, 111,
(136), 243, 270
and stochastic perturbation,
317, 320, 327, 331
sweeping, 125-129
weak continuity of, 49
mathematical expectation, 306
maximal entropy, 285-288
maximal measure, 435
Maxwellian distribution, 378
mean value
of function, 139
of random variable, 306
measurable function, 19
space of, 25
measurable set, 18
measurable transformation, 41
measure, 18
absolutely continuous, 41
absolutely continuous part, 425
Borel, 19
complete, 30
continuous part of, 425

density of, 41
Dirac, 395, 399, 403, 408, 411
distance between, 401
with Gaussian density, 398,
409
invariant, 52, (83), 196, 417
Lebesgue, 30
Lebesgue decomposition of,
426
locally finite, 393
maximal, 425
and Markov operator, 405
nonsingular, 425
norm of, 402
normalized, 41
preserving transformation, 52
product, 30, 259
probabilistic, 402, 403
singular part of, 425
stationary, 417
strong convergence of, 402, 425
support of, 394
uniqueness, 395
weak convergence of 397, 421
Wiener, 340-341
measure-preserving transformation,
52, 196
measure space, 18
finite, 19
normalized, 19
probabilistic, 19
product of, 30
u-finite, 19
metric, Riemannian, 179
mixing, 65
of Anosov diffeomorphism, 77
of baker transformation, 65,
(83)
of dyadic transformation, 66
dynamical system, 198
illustrated, 70
Markov operator, 79, (83), 104
necessary and sufficient conditions for via Frobenius-Perron operator, 72, 220


necessary and sufficient conditions for via Koopman


operator, 75, 220
relation to ergodicity, exactness, and K-automorphisms,
66, 79, 82, 199
semidynamical system, 198
transformation, 65
modulo zero equality, 39
moments of solutions, 367-368
nonanticipative u-algebra, 347
nonsingular semidynamical system,
199
nonsingular transformation, 42
nontrivial lower-bound function,
106
norm
in V, 25
of measure, 402
preservation of, 406
of vector on manifold, 180
normalized measure space, 19
normalized Wiener process, 336
one-dimensional Brownian motion,
336
one-dimensional Wiener process,
336
operator
constrictive, 95-96
contractive, 39, 201
expanding, 132
Frobenius-Perron, 37, 200
infinitesimal, 206
Koopman, 47-49, 203
Markov, 37-38
resolvent, 227
oscillators, 161-163, 218-220
parabolic equation, 203
parabolicity condition, 364
paradox of weak repellor, 11, 150
partition function, 288
phase space, 1, 192


Phillips' perturbation theorem, 236,


(248)
piecewise convex transformations,
153-156
piecewise monotonic mappings,
144-153, 172
Poincare map, 252
Poincare recurrence theorem, 65
Poisson bracket, 213
Poisson distribution, 405
Poisson processes, 254-257, (280)
Poisson's theorem, 405
probabilistic measure space, 19,
253
product measure, 30, 259
product space, 28, 259
proper cylinder, 221
quadratic transformation, 1, 7, (14),
53, 56, 166, 290, (333)
r-adic transformation, 9, (15), 52,
77
Radon-Nikodym theorem, 24, 27
Radon-Nikodym derivative, 41
random number generator, 171
random variable, 253
density of, 253
independent, 253, 305, 314
mathematical expectation of,
306
mean value of, 306
standard deviation of, 309
variance of, 308
random vector, 304
randomly applied stochastic perturbation, 315-319
regular family, 129
regular Ito sum, 347
regular stochastic dynamical system, 413
Renyi transformation, 144
resolvent operator, 227
Riemann integral, relation to
Lebesgue integral, 23


Riemannian metric, 179


Riesz representation theorem, 395
right lower derivative, 123
rock drilling, 163
rotation on circle, 62, 75, (83), 198
rotation on torus, 216-218
sample path, 254
scalar product, 27
on manifolds, 180
semidynamical system, 195
and ergodicity, 197
and exactness, 199
and mixing, 198
semigroup
continuous, 195
continuous stochastic, 201
of contracting operators, 204
of contractions, 204
of Frobenius-Perron operator,

199
of Koopman operator, 203
stochastic, 201
sweeping, 243
of transformations, 195
u-algebra, 18
Borel, 18
independent, 344
nonanticipative, 347
trivial, 80
u-finite measure, 394
u-finite measure space, 19
Sierpinski
carpet, 447
triangle, 441
simple function, 21
space
adjoint, 26
of measurable functions, 25
space and time averages, 64, 196
spectral decomposition theorem,

98
sphere bundle, 225
stability property of Markov operators, 39

standard deviation, 309


state space, 1
stationary density, 41
stationary distribution, 417
stationary independent increments,

254
stationary measure, 417
statistical stability, 105, (187)
relation to asymptotic stability, 105
relation to exactness, 110
statistically stable transformation,
construction of, 167
Stirling's formula, 267
stochastic convergence, 311
stochastic differential equations,

335, 355
relation to Fokker-Planck equation, 359-360
stochastic integrals, 347, 353
stochastic kernel, 111, (136), 243,

274, 277
stochastic perturbation
additive, 315, 320, 327, (333)
and asymptotic periodicity,

321-323, 331, (333)


constantly applied, 320
multiplicative, 330, (333-334)
randomly applied, 315
small, 315, 327
stochastic processes, 254
continuous time, 254
and convergence of measures,
410
discrete time, 254
with independent increments,

254
with stationary independent
increments, 254
stochastic semigroup, 201, (248)
asymptotic stability of, 202
and Bielecki function, 245
relation to Fokker-Planck
equation, 369
and sweeping, 245, (392)

Stratonovich sum, 350
strong asymptotic stability of measures, 425
in regular stochastic systems,
430
and weak asymptotic stability, 426, 434
strong convergence, 31
Cauchy condition for, 34
of densities, 72
of measures, 397, 402
strong law of large numbers, 314
strong precompactness, 86, (135)
conditions for, 87-88
strong repellor, 153-156
subinvariant function, 129
support, 39
compact, 207
and Frobenius-Perron operator, 44
of measure, 394
sweeping, 125-127, (136), 243-244,
(333)
and Bielecki function, 127, 245,
387
and Foguel alternative, 130,
133
and Fokker-Planck equation,
386-388
and invariant density, 130, 247
and stochastic semigroup, 243
tangent space, 178
tangent vector, 177
tent map, (50), 167, (188)
time and space averages, 64, 196
torus, 186
Anosov diffeomorphism on, 57
d-dimensional, 216
exact transformation on, 186
rotation on, 216-218
trace of dynamical system, 193-194
trajectory, 192
versus density, 10


transformation
asymptotically periodic, 156-165
convex, 153-156
ergodic, 59, 197
exact, 66, 199
factor of, 82
Frobenius-Perron operator
for, 7, 42, 199-200, 215
Koopman operator for, 47,
203
measurable, 41
measure-preserving, 52
mixing, 65, 198
nonsingular, 42
piecewise monotonic, 144,
153, 156, 165, 172
statistically stable, 105, 110
weakly mixing, 80
triangle inequality, 26
trivial set, 59, 197
trivial u-algebra, 80
uniform parabolicity, 364
unit volume function, 181
variance
of function, 139
of random variable, 308
of Wiener process, 337
variation of function, 140
vector
norm of, 180
scalar product of, 180
space, 26
von Neumann series, 265

weak asymptotic stability


of iterated function systems, 434
of measures, 397, 421
and strong asymptotic stability, 426
weak Cesaro convergence, 31
of densities, 72


weak continuity, 49
weak convergence, 31
of densities, 72
of measures, 397, 400, 401
weak law of large numbers, 313
weak precompactness, 86
condition for, 87-88
weak repellor, paradox of, 11, (15),
151

weakly mixing transformation, 80


Wiener measure, 340-341
Wiener process
d-dimensional, 345
normalized, 337
one-dimensional, 336, (391)
variance of, 337
Yorke inequality, 143-144

Applied Mathematical Sciences


(continued from page ii)
61. Sattinger/Weaver: Lie Groups and Algebras with Applications to Physics, Geometry, and Mechanics.
62. LaSalle: The Stability and Control of Discrete Processes.
63. Grasman: Asymptotic Methods of Relaxation Oscillations and Applications.
64. Hsu: Cell-to-Cell Mapping: A Method of Global Analysis for Nonlinear Systems.
65. Rand/Armbruster: Perturbation Methods, Bifurcation Theory and Computer Algebra.
66. Hlaváček/Haslinger/Nečas/Lovíšek: Solution of Variational Inequalities in Mechanics.
67. Cercignani: The Boltzmann Equation and Its Applications.
68. Temam: Infinite-Dimensional Dynamical Systems in Mechanics and Physics, 2nd ed.
69. Golubitsky/Stewart/Schaeffer: Singularities and Groups in Bifurcation Theory, Vol. II.
70. Constantin/Foias/Nicolaenko/Temam: Integral Manifolds and Inertial Manifolds for Dissipative Partial Differential Equations.
71. Catlin: Estimation, Control, and the Discrete Kalman Filter.
72. Lochak/Meunier: Multiphase Averaging for Classical Systems.
73. Wiggins: Global Bifurcations and Chaos.
74. Mawhin/Willem: Critical Point Theory and Hamiltonian Systems.
75. Abraham/Marsden/Ratiu: Manifolds, Tensor Analysis, and Applications, 2nd ed.
76. Lagerstrom: Matched Asymptotic Expansions: Ideas and Techniques.
77. Aldous: Probability Approximations via the Poisson Clumping Heuristic.
78. Dacorogna: Direct Methods in the Calculus of Variations.
79. Hernández-Lerma: Adaptive Markov Control Processes.
80. Lawden: Elliptic Functions and Applications.
81. Bluman/Kumei: Symmetries and Differential Equations.
82. Kress: Linear Integral Equations, 2nd ed.
83. Bebernes/Eberly: Mathematical Problems from Combustion Theory.
84. Joseph: Fluid Dynamics of Viscoelastic Liquids.
85. Yang: Wave Packets and Their Bifurcations in Geophysical Fluid Dynamics.
86. Dendrinos/Sonis: Chaos and Socio-Spatial Dynamics.
87. Weder: Spectral and Scattering Theory for Wave Propagation in Perturbed Stratified Media.
88. Bogaevski/Povzner: Algebraic Methods in Nonlinear Perturbation Theory.

89. O'Malley: Singular Perturbation Methods for Ordinary Differential Equations.
90. Meyer/Hall: Introduction to Hamiltonian Dynamical Systems and the N-body Problem.
91. Straughan: The Energy Method, Stability, and Nonlinear Convection.
92. Naber: The Geometry of Minkowski Spacetime.
93. Colton/Kress: Inverse Acoustic and Electromagnetic Scattering Theory, 2nd ed.
94. Hoppensteadt: Analysis and Simulation of Chaotic Systems, 2nd ed.
95. Hackbusch: Iterative Solution of Large Sparse Systems of Equations.
96. Marchioro/Pulvirenti: Mathematical Theory of Incompressible Nonviscous Fluids.
97. Lasota/Mackey: Chaos, Fractals, and Noise: Stochastic Aspects of Dynamics, 2nd ed.
98. de Boor/Höllig/Riemenschneider: Box Splines.
99. Hale/Lunel: Introduction to Functional Differential Equations.
100. Sirovich (ed): Trends and Perspectives in Applied Mathematics.
101. Nusse/Yorke: Dynamics: Numerical Explorations, 2nd ed.
102. Chossat/Iooss: The Couette-Taylor Problem.
103. Chorin: Vorticity and Turbulence.
104. Farkas: Periodic Motions.
105. Wiggins: Normally Hyperbolic Invariant Manifolds in Dynamical Systems.
106. Cercignani/Illner/Pulvirenti: The Mathematical Theory of Dilute Gases.
107. Antman: Nonlinear Problems of Elasticity.
108. Zeidler: Applied Functional Analysis: Applications to Mathematical Physics.
109. Zeidler: Applied Functional Analysis: Main Principles and Their Applications.
110. Diekmann/van Gils/Verduyn Lunel/Walther: Delay Equations: Functional-, Complex-, and Nonlinear Analysis.
111. Visintin: Differential Models of Hysteresis.
112. Kuznetsov: Elements of Applied Bifurcation Theory, 2nd ed.
113. Hislop/Sigal: Introduction to Spectral Theory: With Applications to Schrödinger Operators.
114. Kevorkian/Cole: Multiple Scale and Singular Perturbation Methods.
115. Taylor: Partial Differential Equations I, Basic Theory.
116. Taylor: Partial Differential Equations II, Qualitative Studies of Linear Equations.
117. Taylor: Partial Differential Equations III, Nonlinear Equations.

(continued on next page)

Applied Mathematical Sciences


(continued from previous page)
118. Godlewski/Raviart: Numerical Approximation of Hyperbolic Systems of Conservation Laws.
119. Wu: Theory and Applications of Partial Functional Differential Equations.
120. Kirsch: An Introduction to the Mathematical Theory of Inverse Problems.
121. Brokate/Sprekels: Hysteresis and Phase Transitions.
122. Gliklikh: Global Analysis in Mathematical Physics: Geometric and Stochastic Methods.
123. Le/Schmitt: Global Bifurcation in Variational Inequalities: Applications to Obstacle and Unilateral Problems.
124. Polak: Optimization: Algorithms and Consistent Approximations.
125. Arnold/Khesin: Topological Methods in Hydrodynamics.
126. Hoppensteadt/Izhikevich: Weakly Connected Neural Networks.
127. Isakov: Inverse Problems for Partial Differential Equations.
128. Li/Wiggins: Invariant Manifolds and Fibrations for Perturbed Nonlinear Schrödinger Equations.
129. Müller: Analysis of Spherical Symmetries in Euclidean Spaces.
130. Feintuch: Robust Control Theory in Hilbert Space.
131. Ericksen: Introduction to the Thermodynamics of Solids, Revised ed.
132. Ihlenburg: Finite Element Analysis of Acoustic Scattering.
133. Vorovich: Nonlinear Theory of Shallow Shells.
134. Vein/Dale: Determinants and Their Applications in Mathematical Physics.
135. Drew/Passman: Theory of Multicomponent Fluids.
136. Cioranescu/Saint Jean Paulin: Homogenization of Reticulated Structures.
137. Gurtin: Configurational Forces as Basic Concepts of Continuum Physics.
138. Haller: Chaos Near Resonance.
139. Sulem/Sulem: The Nonlinear Schrödinger Equation: Self-Focusing and Wave Collapse.
140. Cherkaev: Variational Methods for Structural Optimization.
141. Naber: Topology, Geometry, and Gauge Fields: Interactions.
142. Schmid/Henningson: Stability and Transition in Shear Flows.
143. Sell/You: Dynamics of Evolutionary Equations.
144. Nédélec: Acoustic and Electromagnetic Equations: Integral Representations for Harmonic Problems.
145. Newton: The N-Vortex Problem: Analytical Techniques.
146. Allaire: Shape Optimization by the Homogenization Method.
147. Aubert/Kornprobst: Mathematical Problems in Image Processing: Partial Differential Equations and the Calculus of Variations.

In recent years there has been an explosive growth in the study of physical, biological, and economic systems that can be profitably studied using densities. Because of the general inaccessibility of the mathematical literature to the nonspecialist, little diffusion of the applicable mathematics into the study of these "chaotic" systems has taken place. This book will help bridge that gap.

To show how densities arise in simple deterministic systems, the authors give a unified treatment of a variety of mathematical systems generating densities, ranging from one-dimensional discrete time transformations through continuous time systems described by integro-partial-differential equations. Examples have been drawn from many fields to illustrate the utility of the concepts and techniques presented, and the ideas in this book should thus prove useful in the study of a number of applied sciences.

The authors assume that the reader has a knowledge of advanced calculus and differential equations. Basic concepts from measure theory, ergodic theory, the geometry of manifolds, partial differential equations, probability theory and Markov processes, and stochastic integrals and differential equations are introduced as needed.

Physicists, chemists, and biomathematicians studying chaotic behavior will find this book of value. It will also be a useful reference or text for mathematicians and graduate students working in ergodic theory and dynamical systems.

ISBN 0-387-94049-9
ISBN 3-540-94049-9
www.springer-ny.com
