
Universitext

Ross G. Pinsky

Problems from the Discrete to the Continuous
Probability, Number Theory, Graph Theory, and Combinatorics

Universitext

Series Editors:

Sheldon Axler
San Francisco State University
Vincenzo Capasso
Università degli Studi di Milano
Carles Casacuberta
Universitat de Barcelona
Angus MacIntyre
Queen Mary University of London
Kenneth Ribet
University of California, Berkeley
Claude Sabbah
CNRS, École Polytechnique, Paris
Endre Süli
University of Oxford
Wojbor A. Woyczynski
Case Western Reserve University, Cleveland, OH

Universitext is a series of textbooks that presents material from a wide variety of mathematical disciplines at master's level and beyond. The books, often well class-tested by their author, may have an informal, personal, even experimental approach to their subject matter. Some of the most successful and established books in the series have evolved through several editions, always following the evolution of teaching curricula, to very polished texts.

Thus, as research topics trickle down into graduate-level teaching, first textbooks written for new, cutting-edge courses may make their way into Universitext.

For further volumes: http://www.springer.com/series/223
Ross G. Pinsky

Problems from the Discrete to the Continuous
Probability, Number Theory, Graph Theory, and Combinatorics
Ross G. Pinsky
Department of Mathematics
Technion-Israel Institute of Technology
Haifa, Israel

ISSN 0172-5939    ISSN 2191-6675 (electronic)
ISBN 978-3-319-07964-6    ISBN 978-3-319-07965-3 (eBook)
DOI 10.1007/978-3-319-07965-3
Springer Cham Heidelberg New York Dordrecht London
Library of Congress Control Number: 2014942157

Mathematics Subject Classification (2010): 05A, 05C, 05D, 11N, 60

© Springer International Publishing Switzerland 2014


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection
with reviews or scholarly analysis or material supplied specifically for the purpose of being entered
and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of
this publication or parts thereof is permitted only under the provisions of the Copyright Law of the
Publisher’s location, in its current version, and permission for use must always be obtained from Springer.
Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations
are liable to prosecution under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of
publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for
any errors or omissions that may be made. The publisher makes no warranty, express or implied, with
respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)


In most sciences one generation tears down
what another has built and what one has
established another undoes. In Mathematics
alone each generation builds a new story to
the old structure.

—Hermann Hankel

A peculiar beauty reigns in the realm of
mathematics, a beauty which resembles not
so much the beauty of art as the beauty of
nature and which affects the reflective mind,
which has acquired an appreciation of it,
very much like the latter.

—Ernst Kummer
To Jeanette
and to
E. A. P.
Y. U. P.
L. A. T-P.
M. D. P.
Preface

It is often averred that two contrasting cultures coexist in mathematics—the theory-building culture and the problem-solving culture. The present volume was certainly
spawned by the latter. This book takes an array of specific problems and solves
them, with the needed tools developed along the way in the context of the particular
problems.
The book is an unusual hybrid. It treats a mélange of topics from combinatorial
probability theory, multiplicative number theory, random graph theory, and combi-
natorics. Objectively, what the problems in this book have in common is that they
involve the asymptotic analysis of a discrete construct, as some natural parameter
of the system tends to infinity. Subjectively, what these problems have in common
is that both their statements and their solutions resonate aesthetically with me.
The results in this book lend themselves to the title “Problems from the Finite to
the Infinite”; however, with regard to the methods of proof, the chosen appellation
is the more apt. In particular, generating functions in their various guises are
a fundamental bridge “from the discrete to the continuous,” as the book’s title
would have it; such functions work their magic often in these pages. Besides
bridging discrete mathematics and mathematical analysis, the book makes a modest
attempt at bridging disciplines—probability, number theory, graph theory, and
combinatorics.
In addition to the considerations mentioned above, the problems were selected
with an eye toward accessibility to a wide audience, including advanced undergrad-
uate students. The technical prerequisites for the book are a good grounding in basic
undergraduate analysis, a touch of familiarity with combinatorics, and a little basic
probability theory. One appendix provides the necessary probabilistic background,
and another appendix provides a warm-up for dealing with generating functions.
That said, a moderate dose of the elusive quality known as mathematical maturity
will certainly be helpful throughout the text and will be necessary on occasion.
The primary intent of the book is to introduce a number of beautiful problems in
a variety of subjects quickly, pithily, and completely rigorously to graduate students
and advanced undergraduates. The book could be used for a seminar/capstone
course in which students present the lectures. It is hoped that the book might also be
of interest to mathematicians whose fields of expertise lie far from the subjects
treated herein. In light of the primary intended audience, the level of detail in proofs
is a bit greater than what one sometimes finds in graduate mathematics texts.
I conclude with some brief comments on the novelty or lack thereof in the
various chapters. A bit more information in this vein may be found in the chapter
notes at the end of each chapter. Chapter 1 follows a standard approach to the
problem it solves. The same is true for Chap. 2 except for the probabilistic proof
of Theorem 2.1, which I haven’t seen in the literature. The packing problem
result in Chap. 3 seems to be new, and the proof almost certainly is. My approach
to the arcsine laws in Chap. 4 is somewhat different from the standard one; it
exploits generating functions to the hilt and is almost completely combinatorial.
The traditional method of proof is considerably more probabilistic. The proofs of
the results in Chap. 5 on the distribution of cycles in random permutations are
almost exclusively combinatorial, through the method of generating functions. In
particular, the proof of Theorem 5.2 makes quite sophisticated use of this technique.
In the setting of weighted permutations, it seems that the method of proof offered
here cannot be found elsewhere. The number theoretic topics in Chaps. 6–8 are
developed in a standard fashion, although the route has been streamlined a bit to
provide a rapid approach to the primary goal, namely, the proof of the Hardy–
Ramanujan theorem. In Chap. 9, the proof concerning the number of cliques in a
random graph is more or less standard. The result on tampering detection constitutes
material with a new twist and the methods are rather probabilistic; a little additional
probabilistic background and sophistication on the part of the reader would be useful
here. The results from Ramsey theory are presented in a standard way. Chapter 10,
which deals with the phase transition concerning the giant component in a sparse
random graph, is the most demanding technically. The reader with a modicum of
probabilistic sophistication will be at quite an advantage here. It appears to me that
a complete proof of the main results in this chapter, with all the details, is not to be
found in the literature.

Acknowledgements It is a pleasure to thank my editor, Donna Chernyk, for her professionalism and superb diligence.

Haifa, Israel                                        Ross G. Pinsky
April 2014
Contents

1 Partitions with Restricted Summands or “the Money Changing Problem”
2 The Asymptotic Density of Relatively Prime Pairs and of Square-Free Numbers
3 A One-Dimensional Probabilistic Packing Problem
4 The Arcsine Laws for the One-Dimensional Simple Symmetric Random Walk
5 The Distribution of Cycles in Random Permutations
6 Chebyshev's Theorem on the Asymptotic Density of the Primes
7 Mertens' Theorems on the Asymptotic Behavior of the Primes
8 The Hardy–Ramanujan Theorem on the Number of Distinct Prime Divisors
9 The Largest Clique in a Random Graph and Applications to Tampering Detection and Ramsey Theory
   9.1 Graphs and Random Graphs: Basic Definitions
   9.2 The Size of the Largest Clique in a Random Graph
   9.3 Detecting Tampering in a Random Graph
   9.4 Ramsey Theory
10 The Phase Transition Concerning the Giant Component in a Sparse Random Graph: A Theorem of Erdős and Rényi
   10.1 Introduction and Statement of Results
   10.2 Construction of the Setup for the Proofs of Theorems 10.1 and 10.2
   10.3 Some Basic Large Deviations Estimates
   10.4 Proof of Theorem 10.1
   10.5 The Galton–Watson Branching Process
   10.6 Proof of Theorem 10.2

Appendix A A Quick Primer on Discrete Probability

Appendix B Power Series and Generating Functions

Appendix C A Proof of Stirling's Formula

Appendix D An Elementary Proof of $\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}$

References

Index
A Note on Notation

$\mathbb{Z}$ denotes the set of integers
$\mathbb{Z}_+$ denotes the set of nonnegative integers
$\mathbb{N}$ denotes the set of natural numbers: $\{1, 2, \dots\}$
$\mathbb{R}$ denotes the set of real numbers
$f(x) = O(g(x))$ as $x \to a$ means that $\limsup_{x \to a} \bigl|\frac{f(x)}{g(x)}\bigr| < \infty$; in particular, $f(x) = O(1)$ as $x \to a$ means that $f(x)$ remains bounded as $x \to a$
$f(x) = o(g(x))$ as $x \to a$ means that $\lim_{x \to a} \frac{f(x)}{g(x)} = 0$; in particular, $f(x) = o(1)$ as $x \to a$ means $\lim_{x \to a} f(x) = 0$
$f \sim g$ as $x \to a$ means that $\lim_{x \to a} \frac{f(x)}{g(x)} = 1$
$\gcd(x_1, \dots, x_m)$ denotes the greatest common divisor of the positive integers $x_1, \dots, x_m$
The symbol $[\ ]$ is used in two contexts:
1. $[n] = \{1, 2, \dots, n\}$, for $n \in \mathbb{N}$
2. $[x]$ is the greatest integer function; that is, $[x] = n$, if $n \in \mathbb{Z}$ and $n \le x < n + 1$
$\mathrm{Bin}(n, p)$ is the binomial distribution with parameters $n$ and $p$
$\mathrm{Pois}(\lambda)$ is the Poisson distribution with parameter $\lambda$
$\mathrm{Ber}(p)$ is the Bernoulli distribution with parameter $p$
$X \sim \mathrm{Bin}(n, p)$ means the random variable $X$ is distributed according to the distribution $\mathrm{Bin}(n, p)$
Chapter 1
Partitions with Restricted Summands
or “the Money Changing Problem”

Imagine a country with coins of denominations 5 cents, 13 cents, and 27 cents. How many ways can you make change for $51,419.48? That is, how many solutions $(b_1, b_2, b_3)$ are there to the equation $5b_1 + 13b_2 + 27b_3 = 5{,}141{,}948$, with the restriction that $b_1, b_2, b_3$ be nonnegative integers? This is a specific case of the following general problem. Fix $m$ distinct, positive integers $\{a_j\}_{j=1}^m$. Count the number of solutions $(b_1, \dots, b_m)$ with integral entries to the equation

$$b_1 a_1 + b_2 a_2 + \cdots + b_m a_m = n, \qquad b_j \ge 0, \; j = 1, \dots, m. \tag{1.1}$$

A partition of $n$ is a sequence of integers $(x_1, \dots, x_k)$, where $k$ is a positive integer, such that

$$\sum_{i=1}^{k} x_i = n \quad \text{and} \quad x_1 \ge x_2 \ge \cdots \ge x_k \ge 1.$$

Let $P_n$ denote the number of different partitions of $n$. The problem of obtaining an asymptotic formula for $P_n$ is celebrated and very difficult. It was solved in 1918 by G.H. Hardy and S. Ramanujan, who proved that

$$P_n \sim \frac{1}{4n\sqrt{3}}\, e^{\pi\sqrt{\frac{2n}{3}}}, \quad \text{as } n \to \infty.$$

Now consider partitions of $n$ where we restrict the values of the summands $x_i$ above to the set $\{a_j\}_{j=1}^m$. Denote the number of such restricted partitions by $P_n(\{a_j\}_{j=1}^m)$. A moment's thought reveals that the number of solutions to (1.1) is $P_n(\{a_j\}_{j=1}^m)$.

Does there exist a solution to (1.1) for every sufficiently large integer $n$? And if so, can one evaluate asymptotically the number of such solutions for large $n$? Without posing any restrictions on $\{a_j\}_{j=1}^m$, the answer to the first question is negative. For example, if $m = 3$ and $a_1 = 5, a_2 = 10, a_3 = 30$, then clearly there is no solution to (1.1) if $5 \nmid n$. Indeed, it is clear that a necessary condition

for the existence of a solution for all large $n$ is that $\{a_j\}_{j=1}^m$ be relatively prime: $\gcd(a_1, \dots, a_m) = 1$. This is the time to recall a well-known result concerning solutions $(b_1, \dots, b_m)$ with (not necessarily nonnegative) integral entries to the equation $b_1 a_1 + b_2 a_2 + \cdots + b_m a_m = n$. A fundamental theorem in algebra/number theory states that there exists an integral solution to this equation for all $n \in \mathbb{Z}$ if and only if $\gcd(a_1, \dots, a_m) = 1$. This result has an elegant group theoretical proof. We will prove that for all large $n$, (1.1) has a solution $(b_1, \dots, b_m)$ with integral entries if and only if $\gcd(a_1, \dots, a_m) = 1$, and we will give a precise asymptotic estimate for the number of such solutions for large $n$.
Theorem 1.1. Let $m \ge 2$ and let $\{a_j\}_{j=1}^m$ be distinct, positive integers. Assume that the greatest common divisor of $\{a_j\}_{j=1}^m$ is 1: $\gcd(a_1, \dots, a_m) = 1$. Then for all sufficiently large $n$, there exists at least one integral solution to (1.1). Furthermore, the number $P_n(\{a_j\}_{j=1}^m)$ of such solutions satisfies

$$P_n(\{a_j\}_{j=1}^m) \sim \frac{n^{m-1}}{(m-1)!\,\prod_{j=1}^m a_j}, \quad \text{as } n \to \infty. \tag{1.2}$$

Remark. In particular, we see (not surprisingly) that for fixed $m$ and sufficiently large $n$, the smaller the $\{a_j\}_{j=1}^m$ are, the more solutions there are. We also see that given $m_1$ and $\{a_j^{(1)}\}_{j=1}^{m_1}$, and given $m_2$ and $\{a_j^{(2)}\}_{j=1}^{m_2}$, with $m_2 > m_1$, then for sufficiently large $n$ there will be more solutions for the latter set of parameters.
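As a quick empirical sanity check (an illustration of mine, not from the original text), the following Python sketch counts the solutions $h_n$ of (1.1) by the standard coin-counting dynamic program and compares them with the right hand side of (1.2); the denominations and the value of $n$ are arbitrary choices.

```python
from math import factorial, prod

def count_solutions(denoms, n_max):
    # h[n] = number of nonnegative-integer solutions of
    # b_1*a_1 + ... + b_m*a_m = n, as in (1.1)
    h = [1] + [0] * n_max
    for a in denoms:
        for n in range(a, n_max + 1):
            h[n] += h[n - a]
    return h

denoms = [5, 13, 27]   # the denominations from the opening paragraph
m = len(denoms)
n = 200_000
h = count_solutions(denoms, n)
asymptotic = n ** (m - 1) / (factorial(m - 1) * prod(denoms))
print(h[n], round(asymptotic), h[n] / asymptotic)  # the ratio tends to 1
```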
Proof. We will prove the asymptotic estimate in (1.2), from which the first statement of the theorem will also follow. Let $h_n$ denote the number of solutions to (1.1). (For the proof, the notation $h_n$ will be a lot more convenient than $P_n(\{a_j\}_{j=1}^m)$.) Thus, we need to show that (1.2) holds with $h_n$ in place of $P_n(\{a_j\}_{j=1}^m)$. We define the generating function of $\{h_n\}_{n=1}^{\infty}$:

$$H(x) = \sum_{n=1}^{\infty} h_n x^n. \tag{1.3}$$

A simple, rough estimate shows that $h_n \le \frac{n^m}{\prod_{j=1}^m a_j}$, from which it follows that the power series on the right hand side of (1.3) converges for $|x| < 1$. See Exercise 1.1.
It turns out that we can exhibit H explicitly. We demonstrate this for the case m D 2,
from which the general case will become clear.
For $k = 1, 2$, we have

$$\frac{1}{1 - x^{a_k}} = 1 + x^{a_k} + x^{2a_k} + x^{3a_k} + \cdots,$$

and the series converges absolutely for $|x| < 1$. Thus,



$$\frac{1}{(1-x^{a_1})(1-x^{a_2})} = \bigl(1 + x^{a_1} + x^{2a_1} + x^{3a_1} + \cdots\bigr)\bigl(1 + x^{a_2} + x^{2a_2} + x^{3a_2} + \cdots\bigr) = \bigl(1 + x^{a_1} + x^{2a_1} + x^{3a_1} + \cdots\bigr) + \bigl(x^{a_2} + x^{a_1+a_2} + x^{2a_1+a_2} + x^{3a_1+a_2} + \cdots\bigr) + \bigl(x^{2a_2} + x^{a_1+2a_2} + x^{2a_1+2a_2} + x^{3a_1+2a_2} + \cdots\bigr) + \cdots \tag{1.4}$$

A little thought now reveals that on the right hand side of (1.4), the number of times the term $x^n$ appears is the number of integral solutions $(b_1, b_2)$ to (1.1) with $m = 2$; that is, $h_n$ is the coefficient of $x^n$ on the right hand side of (1.4). So $H(x) = \frac{1}{(1-x^{a_1})(1-x^{a_2})}$. Clearly, the same argument works for all $m$; thus we conclude that

$$H(x) = \frac{1}{(1-x^{a_1})(1-x^{a_2})\cdots(1-x^{a_m})}, \quad |x| < 1. \tag{1.5}$$
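One can watch (1.5) at work numerically by multiplying truncated geometric series as polynomials; the coefficients that emerge are exactly the counts $h_n$. (A small illustration of mine, with arbitrary choices of the $a_j$ and of the truncation degree.)

```python
def poly_mul(p, q, n_max):
    # multiply coefficient lists, truncating at degree n_max
    r = [0] * (n_max + 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            if i + j <= n_max:
                r[i + j] += pi * qj
    return r

n_max, a = 60, [5, 13, 27]
H = [1] + [0] * n_max
for ak in a:
    # truncation of the geometric series 1/(1 - x^{a_k})
    geom = [1 if n % ak == 0 else 0 for n in range(n_max + 1)]
    H = poly_mul(H, geom, n_max)
print(H)   # H[n] is h_n, the number of solutions of (1.1)
```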

We now begin an analysis of $H$, as given in its closed form in (1.5), which will lead us to the asymptotic behavior as $n \to \infty$ of the coefficients $h_n$ in its power series representation in (1.3). Consider the polynomial

$$p(x) = (1-x^{a_1})(1-x^{a_2})\cdots(1-x^{a_m}).$$

For each $k$, the roots of $1 - x^{a_k}$ are the $a_k$th roots of unity: $\{e^{\frac{2\pi i j}{a_k}}\}_{j=0}^{a_k-1}$. Clearly 1 is a root of $p(x)$ of multiplicity $m$. Because of the assumption that $\gcd(a_1, \dots, a_m) = 1$, it follows that every other root of $p(x)$ is of multiplicity less than $m$—that is, there is no complex number $r$ that can be written in the form $r = e^{\frac{2\pi i j_k}{a_k}}$, simultaneously for $k = 1, \dots, m$, where $1 \le j_k < a_k$. Indeed, if $r$ can be written in the above form for all $k$, then it follows that $\frac{j_k}{a_k}$ is independent of $k$. In particular, $a_k = \frac{j_k a_1}{j_1}$, for $k = 2, \dots, m$. Since $1 \le j_1 < a_1$, it follows that there is at least one prime factor of $a_1$ which is a factor of all of the $a_k$, $k = 2, \dots, m$, and this contradicts the assumption that $\gcd(a_1, \dots, a_m) = 1$.

Denote the distinct roots of $p(x)$ by $1, r_2, \dots, r_l$, and note from above that $|r_j| = 1$, for all $j$. Let $m_k$ denote the multiplicity of the root $r_k$, for $k = 2, \dots, l$. Also, note that $p(0) = 1$. Then we can write

$$(1-x^{a_1})(1-x^{a_2})\cdots(1-x^{a_m}) = (1-x)^m \Bigl(1 - \frac{x}{r_2}\Bigr)^{m_2} \cdots \Bigl(1 - \frac{x}{r_l}\Bigr)^{m_l}, \tag{1.6}$$

where $1 \le m_j < m$, for $j = 2, \dots, l$.

In light of (1.5) and (1.6), we can write the generating function $H(x)$ in the form

$$H(x) = \frac{1}{(1-x)^m \bigl(1 - \frac{x}{r_2}\bigr)^{m_2} \cdots \bigl(1 - \frac{x}{r_l}\bigr)^{m_l}}. \tag{1.7}$$

By the method of partial fractions, we can rewrite $H$ from (1.7) in the form

$$H(x) = \Bigl(\frac{A_{11}}{(1-x)^m} + \frac{A_{12}}{(1-x)^{m-1}} + \cdots + \frac{A_{1m}}{1-x}\Bigr) + \Bigl(\frac{A_{21}}{(1-\frac{x}{r_2})^{m_2}} + \cdots + \frac{A_{2m_2}}{1-\frac{x}{r_2}}\Bigr) + \cdots + \Bigl(\frac{A_{l1}}{(1-\frac{x}{r_l})^{m_l}} + \cdots + \frac{A_{lm_l}}{1-\frac{x}{r_l}}\Bigr). \tag{1.8}$$

For positive integers $k$, the function $F(x) = (1-x)^{-k}$ has the power series expansion

$$(1-x)^{-k} = \sum_{n=0}^{\infty} \binom{n+k-1}{k-1} x^n.$$

To prove this, just verify that $\frac{F^{(n)}(0)}{n!} = \binom{n+k-1}{k-1}$. Thus, the first term on the right hand side of (1.8) can be expanded as

$$\frac{A_{11}}{(1-x)^m} = A_{11} \sum_{n=0}^{\infty} \binom{n+m-1}{m-1} x^n. \tag{1.9}$$

The coefficient of $x^n$ on the right hand side above is

$$A_{11}\,\frac{(n+m-1)(n+m-2)\cdots(n+1)}{(m-1)!} \sim A_{11}\,\frac{n^{m-1}}{(m-1)!}, \quad \text{as } n \to \infty.$$

Every other term on the right hand side of (1.8) is of the form $\frac{A}{(1-\frac{x}{r})^k}$, where $1 \le k < m$ and $|r| = 1$. By the same argument as above, the coefficient of $x^n$ in the expansion of $\frac{A}{(1-\frac{x}{r})^k}$ is asymptotic to $\frac{A}{r^n}\,\frac{n^{k-1}}{(k-1)!}$ as $n \to \infty$ (substitute $\frac{x}{r}$ for $x$ in the appropriate series expansion). Thus, each of these terms is of smaller order than the coefficient of $x^n$ in (1.9). We thereby conclude that the coefficient of $x^n$ in $H(x)$ is asymptotic to $A_{11}\,\frac{n^{m-1}}{(m-1)!}$ as $n \to \infty$. By (1.3), this gives

$$h_n \sim A_{11}\,\frac{n^{m-1}}{(m-1)!}, \quad \text{as } n \to \infty. \tag{1.10}$$
It remains to evaluate the constant $A_{11}$. From (1.8), it follows that

$$H(x) = \frac{A_{11}}{(1-x)^m} + O\Bigl(\frac{1}{(1-x)^{m-1}}\Bigr), \quad \text{as } x \to 1.$$

Thus,

$$\lim_{x \to 1} (1-x)^m H(x) = A_{11}. \tag{1.11}$$

But on the other hand, from (1.5), we have

$$(1-x)^m H(x) = \frac{(1-x)^m}{(1-x^{a_1})(1-x^{a_2})\cdots(1-x^{a_m})} = \prod_{j=1}^{m} \frac{x-1}{x^{a_j}-1}. \tag{1.12}$$

Since $(x^{a_j})'|_{x=1} = a_j x^{a_j-1}|_{x=1} = a_j$, we conclude from (1.12) that

$$\lim_{x \to 1} (1-x)^m H(x) = \frac{1}{\prod_{j=1}^m a_j}. \tag{1.13}$$

From (1.11) and (1.13) we obtain $A_{11} = \frac{1}{\prod_{j=1}^m a_j}$, and thus from (1.10) we conclude that $h_n \sim \frac{n^{m-1}}{(m-1)!\,\prod_{j=1}^m a_j}$. $\square$

Exercise 1.1. If $b_1 a_1 + b_2 a_2 + \cdots + b_m a_m = n$, then of course $b_j a_j \le n$, for all $j \in [m]$. Use this to obtain the following rough upper bound on the number of solutions $h_n$ to (1.1): $h_n \le \frac{n^m}{\prod_{j=1}^m a_j}$. Then use this estimate together with the third “fundamental result” in Appendix B to show that the series defining $H(x)$ in (1.3) converges for $|x| < 1$.
Exercise 1.2. Go through the proof of Theorem 1.1 and convince yourself that the result of the theorem holds even if the integers $\{a_j\}_{j=1}^m$ are not distinct. That is, the number of solutions to (1.1) is asymptotic to the expression on the right hand side of (1.2). Note though that the number of such solutions is not equal to $P_n(\{a_j\}_{j=1}^m)$. What is the leading asymptotic term as $n \to \infty$ for the number of ways to make $n$ cents out of quarters and pennies, where one distinguishes the quarters by their mint marks—“p” for Philadelphia, “d” for Denver, and “s” for San Francisco—but where the pennies are not distinguished?
Exercise 1.3. In the case that $d = \gcd(a_1, \dots, a_m) > 1$, use Theorem 1.1 to formulate and prove a corresponding result.
Exercise 1.4. A composition of $n$ is an ordered sequence of positive integers $(x_1, \dots, x_k)$, where $k$ is a positive integer, such that $\sum_{i=1}^k x_i = n$. A favorite method of combinatorialists to calculate the size of some combinatorial object is to find a bijection between the object in question and some other object whose size is known. Let $C_n$ denote the number of compositions of $n$. To calculate $C_n$, we construct a bijection as follows. Consider a sequence of $n$ dots in a row. Between each pair of adjacent dots, choose to place or choose not to place a vertical line. Consider the set of all possible dot and line combinations. (For example, if $n = 5$, here are two possible such combinations: (1) $\bullet\,\bullet\,|\,\bullet\,|\,\bullet\,\bullet$  (2) $\bullet\,\bullet\,\bullet\,\bullet\,\bullet$.)
(a) Show that there are $2^{n-1}$ dot and line combinations.
(b) Show that there is a bijection between the set of compositions of $n$ and the set of dot and line combinations.
(c) Conclude from (a) and (b) that $C_n = 2^{n-1}$.

Exercise 1.5. Let $C_n^{\{1,2\}}$ denote the number of compositions of $n$ with summands restricted to the integers 1 and 2, that is, compositions $(x_1, \dots, x_k)$ of $n$ with the restriction that $x_i \in \{1, 2\}$, for all $i$. The series

$$F(x) := \frac{1}{1 - x - x^2} = \sum_{n=0}^{\infty} (x + x^2)^n \tag{1.14}$$

converges absolutely for $|x| < \frac{\sqrt{5}-1}{2}$, since $|x + x^2| \le |x| + |x|^2 < 1$ if $|x| < \frac{\sqrt{5}-1}{2}$.

(a) Similar to the argument leading from (1.3) to (1.5), argue that $C_n^{\{1,2\}}$ is the coefficient of $x^n$ in the power series expansion of $F$.
(b) Show that $F(x) = \sum_{n=0}^{\infty} f_n x^n$, where $\{f_n\}_{n=0}^{\infty}$ is the Fibonacci sequence—see (B.2) in Appendix B. (Hint: One has $(x + x^2)F(x) = F(x) - 1$.)
(c) Conclude from (a) and (b) that $C_n^{\{1,2\}}$ is the $n$th Fibonacci number; thus, from (B.10) in Appendix B,

$$C_n^{\{1,2\}} = \frac{1}{\sqrt{5}}\Bigl[\Bigl(\frac{1+\sqrt{5}}{2}\Bigr)^{n+1} - \Bigl(\frac{1-\sqrt{5}}{2}\Bigr)^{n+1}\Bigr].$$
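A few lines of Python (my own illustration, not from the text) verify the count and the closed form for small $n$; the exponent $n+1$ matches the normalization $C_0^{\{1,2\}} = 1$ coming from (1.14).

```python
from math import sqrt

def comp_12(n):
    # compositions of n with parts in {1, 2}: condition on the last part
    if n < 0:
        return 0
    if n == 0:
        return 1
    return comp_12(n - 1) + comp_12(n - 2)

phi, psi = (1 + sqrt(5)) / 2, (1 - sqrt(5)) / 2
for n in range(1, 20):
    closed_form = (phi ** (n + 1) - psi ** (n + 1)) / sqrt(5)
    assert comp_12(n) == round(closed_form)
print("closed form verified for n = 1, ..., 19")
```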

Chapter Notes

For a leisurely and folksy introduction to the use of generating functions in combinatorics, see Wilf's little book [34]. For a recent encyclopedic treatment, see the book of Flajolet and Sedgewick [20]. The asymptotic formula for $P_n$, noted at the beginning of the chapter, was proved by Hardy and Ramanujan in [23]. For a modern account, see [4]. The asymptotic estimate in Theorem 1.1 is due to I. Schur. As noted in the text, this asymptotic formula also proves that (1.1) has a solution for all sufficiently large $n$. However, this latter fact can be proved more easily; see, for example, Brauer [11]. Given $\{a_j\}_{j=1}^m$, what is the exact minimal value of $n_0$ such that every $n \ge n_0$ can be written in the form (1.1)? When $m = 2$, the answer is $(a_1 - 1)(a_2 - 1)$. A proof can be found in [34]. For $m \ge 3$ the answer is not known.
Chapter 2
The Asymptotic Density of Relatively Prime
Pairs and of Square-Free Numbers

Pick a positive integer at random. What is the probability of it being even? As stated, this question is not well posed, because there is no uniform probability measure on the set $\mathbb{N}$ of positive integers. However, what one can do is fix a positive integer $n$, and choose a number uniformly at random from the finite set $[n] = \{1, \dots, n\}$. Letting $\delta_n$ denote the probability that the chosen number was even, we have $\lim_{n\to\infty} \delta_n = \frac{1}{2}$, and we say that the asymptotic density of even numbers is equal to $\frac{1}{2}$.

In this spirit, we ask: if one selects two positive integers at random, what is the probability that they are relatively prime? Fixing $n$, we choose two positive integers uniformly at random from $[n]$. Of course, there are two natural ways to interpret this. Do we choose a number uniformly at random from $[n]$ and then choose a second number uniformly at random from the remaining $n-1$ integers, or, alternatively, do we select the second number again from $[n]$, thereby allowing for doubles? The answer is that it doesn't matter, because under the second alternative the probability of getting doubles is only $\frac{1}{n}$, and this doesn't affect the asymptotic probability. Here is the theorem we will prove.

Theorem 2.1. Choose two integers uniformly at random from $[n]$. As $n \to \infty$, the asymptotic probability that they are relatively prime is $\frac{6}{\pi^2} \approx 0.6079$.
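Before either proof, the claim is easy to probe numerically; the following sketch of mine (not part of the book) computes $q_n$ exactly for a modest $n$, under the second interpretation with doubles allowed.

```python
from math import gcd, pi

def q(n):
    # exact probability that two integers chosen uniformly,
    # with replacement, from [n] are relatively prime
    coprime = sum(gcd(i, j) == 1
                  for i in range(1, n + 1) for j in range(1, n + 1))
    return coprime / n ** 2

print(q(1000), 6 / pi ** 2)   # both close to 0.608
```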
We will give two very different proofs of Theorem 2.1: one completely number theoretic and one completely probabilistic. The number theoretic proof is elegant, even a little magical. However, it does require the preparation of some basic number theoretic tools, and it provides little intuition. The number theoretic proof gives the asymptotic probability as $\bigl(\sum_{n=1}^{\infty} \frac{1}{n^2}\bigr)^{-1}$. The well-known fact that $\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}$ is proved in Appendix D. The probabilistic proof requires very little preparation; it is enough to know just the most rudimentary notions from discrete probability theory: probability space, event, and independence. A heuristic, non-rigorous version of the probabilistic proof provides a lot of intuition, some of which the reader might find obscured in the rigorous proof. The probabilistic proof gives the asymptotic probability as $\prod_{k=1}^{\infty}\bigl(1 - \frac{1}{p_k^2}\bigr)$, where $\{p_k\}_{k=1}^{\infty}$ is an enumeration of the primes. One then must use the Euler product formula to show that this is equal to $\bigl(\sum_{n=1}^{\infty} \frac{1}{n^2}\bigr)^{-1}$. We will first give the number theoretic proof and then give the heuristic and the rigorous probabilistic proofs.
The number theoretic ideas we develop along the way to our first proof of Theorem 2.1 will bring us close to proving another result, which we now describe. Every positive integer $n \ge 2$ can be factored uniquely as $n = p_1^{k_1} \cdots p_m^{k_m}$, where $m \ge 1$, $\{p_j\}_{j=1}^m$ are distinct primes, and $k_j \in \mathbb{N}$, for $j \in [m]$. If in this factorization, one has $k_j = 1$, for all $j \in [m]$, then we say that $n$ is square-free. Thus, an integer $n \ge 2$ is square-free if and only if it is of the form $n = p_1 \cdots p_m$, where $m \ge 1$ and $\{p_j\}_{j=1}^m$ are distinct primes. The integer 1 is also called square-free. There are 61 square-free positive integers that are no greater than 100:

1, 2, 3, 5, 6, 7, 10, 11, 13, 14, 15, 17, 19, 21, 22, 23, 26, 29, 30, 31, 33, 34, 35, 37, 38, 39, 41, 42, 43, 46, 47, 51, 53, 55, 57, 58, 59, 61, 62, 65, 66, 67, 69, 70, 71, 73, 74, 77, 78, 79, 82, 83, 85, 86, 87, 89, 91, 93, 94, 95, 97.

Let $C_n = \{k : 1 \le k \le n,\ k \text{ is square-free}\}$. If $\lim_{n\to\infty} \frac{|C_n|}{n}$ exists, we call this limit the asymptotic density of square-free numbers. After giving the number theoretic proof of Theorem 2.1, we will prove the following theorem.

Theorem 2.2. The asymptotic density of square-free integers is $\frac{6}{\pi^2} \approx 0.6079$.
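(Another quick check of mine, not from the text.) A few lines of Python confirm the count of 61 square-free integers up to 100 and show the density approaching $6/\pi^2$:

```python
from math import pi

def is_square_free(n):
    # square-free means no square of an integer > 1 divides n
    d = 2
    while d * d <= n:
        if n % (d * d) == 0:
            return False
        d += 1
    return True

print(sum(is_square_free(j) for j in range(1, 101)))   # 61
n = 100_000
print(sum(is_square_free(j) for j in range(1, n + 1)) / n, 6 / pi ** 2)
```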
For the number theoretic proof of Theorem 2.1, the first alternative suggested above in the second paragraph of this chapter will be more convenient. In fact, once we have chosen the two distinct integers, it will be convenient to order them by size; thus, we may consider the set $B_n$ of all possible (and equally likely) outcomes to be

$$B_n = \{(j,k) : 1 \le j < k \le n\}.$$

Let $A_n \subset B_n$ denote those pairs which are relatively prime:

$$A_n = \{(j,k) : 1 \le j < k \le n,\ \gcd(j,k) = 1\}.$$

Then the probability $q_n$ that the two selected integers are relatively prime is

$$q_n = \frac{|A_n|}{|B_n|} = \frac{2|A_n|}{n(n-1)}. \tag{2.1}$$

We proceed to develop a circle of ideas that will facilitate the calculation of $\lim_{n\to\infty} q_n$ and thus give a proof of Theorem 2.1. A function $a : \mathbb{N} \to \mathbb{R}$ is called an arithmetic function. The Möbius function $\mu$ is the arithmetic function defined by

$$\mu(n) = \begin{cases} 1, & \text{if } n = 1; \\ (-1)^m, & \text{if } n = \prod_{j=1}^m p_j, \text{ where } \{p_j\}_{j=1}^m \text{ are distinct primes}; \\ 0, & \text{otherwise}. \end{cases}$$

Thus, for example, we have $\mu(3) = -1$, $\mu(15) = 1$, and $\mu(12) = 0$.
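A direct implementation of this definition (a sketch of mine, using naive trial division) can help keep the three cases straight:

```python
def mobius(n):
    # mu(1) = 1; mu(n) = (-1)^m if n is a product of m distinct primes;
    # mu(n) = 0 if some prime divides n to a power >= 2
    if n == 1:
        return 1
    m, d = 0, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:   # the square of d divides the original n
                return 0
            m += 1
        d += 1
    if n > 1:                # one leftover prime factor
        m += 1
    return (-1) ** m

assert [mobius(k) for k in (3, 15, 12)] == [-1, 1, 0]
```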



Given arithmetic functions $a$ and $b$, we define their convolution $a * b$ to be the arithmetic function satisfying

$$(a * b)(n) = \sum_{d|n} a(d)\, b\Bigl(\frac{n}{d}\Bigr), \quad n \in \mathbb{N}.$$

Clearly, $a * b = b * a$. The convolution arises naturally in the following context. Define formally

$$f(x) = \sum_{n=1}^{\infty} \frac{a(n)}{n^x} \tag{2.2}$$

and

$$g(x) = \sum_{n=1}^{\infty} \frac{b(n)}{n^x}. \tag{2.3}$$

When we say “formally,” what we mean is that we ignore questions of convergence and manipulate these infinite series according to the laws of addition, subtraction, multiplication, and division, which are valid for series with a finite number of terms and for absolutely convergent infinite series. Their formal product is given by

$$f(x)g(x) = \Bigl(\sum_{d=1}^{\infty} \frac{a(d)}{d^x}\Bigr)\Bigl(\sum_{k=1}^{\infty} \frac{b(k)}{k^x}\Bigr) = \sum_{d,k=1}^{\infty} \frac{a(d)b(k)}{(dk)^x} = \sum_{n=1}^{\infty} \frac{1}{n^x} \sum_{d,k:\,dk=n} a(d)b(k) = \sum_{n=1}^{\infty} \frac{1}{n^x} \sum_{d|n} a(d)\,b\Bigl(\frac{n}{d}\Bigr) = \sum_{n=1}^{\infty} \frac{(a*b)(n)}{n^x}. \tag{2.4}$$

If the series on the right hand side of (2.2) and (2.3) are in fact absolutely convergent, then the series on the right hand side of (2.4) is also absolutely convergent. In such case, the equality $\bigl(\sum_{d=1}^{\infty} \frac{a(d)}{d^x}\bigr)\bigl(\sum_{k=1}^{\infty} \frac{b(k)}{k^x}\bigr) = \sum_{n=1}^{\infty} \frac{(a*b)(n)}{n^x}$ is a rigorous statement in mathematical analysis.
An arithmetic function $a$ is called multiplicative if $a(nm) = a(n)a(m)$ whenever $\gcd(n,m) = 1$. It follows that if $a \not\equiv 0$ is multiplicative, then $a(1) = 1$. If $a \not\equiv 0$ is multiplicative, then it is completely determined by its values on the prime powers; indeed, if $n = \prod_{j=1}^m p_j^{k_j}$ is the factorization of $n$ into a product of distinct prime powers, then $a(n) = a\bigl(\prod_{j=1}^m p_j^{k_j}\bigr) = \prod_{j=1}^m a(p_j^{k_j})$.

It is trivial to verify that $\mu$ is multiplicative. For the first proposition below, the following lemma will be useful.

Lemma 2.1. The arithmetic function $\sum_{d|n} \mu(d)$ is multiplicative.
Proof. Let $n$ and $m$ be positive integers satisfying $\gcd(n,m) = 1$. We have

$$\sum_{d_1|n} \mu(d_1) \sum_{d_2|m} \mu(d_2) = \sum_{d_1|n,\,d_2|m} \mu(d_1)\mu(d_2) = \sum_{d_1|n,\,d_2|m} \mu(d_1 d_2) = \sum_{d|nm} \mu(d),$$

where the second equality follows from the fact that $\mu$ is multiplicative and the fact that if $\gcd(n,m) = 1$, $d_1|n$ and $d_2|m$, then $\gcd(d_1, d_2) = 1$, while the final equality follows from the fact that if $\gcd(n,m) = 1$ and $d|nm$, then $d$ can be written as $d = d_1 d_2$ for a unique pair $d_1, d_2$ satisfying $d_1|n$ and $d_2|m$. (The reader should verify these facts.) $\square$
We introduce three more arithmetic functions that will be used in the sequel:

$$\mathbf{1}(n) = 1, \text{ for all } n; \qquad i(n) = n, \text{ for all } n; \qquad e(n) = \begin{cases} 1, & \text{if } n = 1; \\ 0, & \text{otherwise}. \end{cases}$$

Note that $a * e = a$, for all $a$, and that $(a * \mathbf{1})(n) = \sum_{d|n} a(d)$. A key result we need is the Möbius inversion formula.

Proposition 2.1. Let $a$ be an arithmetic function. Define $b = a * \mathbf{1}$. Then $a = b * \mu$.

Remark. Written out explicitly, the theorem asserts that if

$$b(n) := \sum_{d|n} a(d),$$

then $a(n) = \sum_{d|n} b(d)\,\mu\bigl(\frac{n}{d}\bigr)$.
Proof. To prove the proposition, it suffices to prove that

$$\mathbf{1} * \mu = e. \tag{2.5}$$

Indeed, using this along with the easily verified associativity of the convolution, we have

$$b * \mu = (a * \mathbf{1}) * \mu = a * (\mathbf{1} * \mu) = a * e = a.$$

We now prove (2.5). We have

$$(\mathbf{1} * \mu)(n) = (\mu * \mathbf{1})(n) = \sum_{d|n} \mu(d).$$

By Lemma 2.1, the function $\sum_{d|n} \mu(d)$ is multiplicative. Clearly, the function $e$ is multiplicative. Obviously, $e(1) = 1$ and $e(p^k) = 0$, for any prime $p$ and any positive integer $k$. We have $\sum_{d|1} \mu(d) = \mu(1) = 1$. Thus, since a nonzero, multiplicative, arithmetic function is completely determined by its values on prime powers, to complete the proof that $\mathbf{1} * \mu = e$, it suffices to show that $\sum_{d|p^k} \mu(d) = 0$. We have $\sum_{d|p^k} \mu(d) = \sum_{j=0}^k \mu(p^j) = \mu(1) + \mu(p) = 1 - 1 = 0$. $\square$
We introduce one final arithmetic function—the well-known Euler $\phi$-function:

$$\phi(n) = |\{j : 1 \le j \le n,\ \gcd(j,n) = 1\}|.$$

That is, $\phi(n)$ counts the number of positive integers less than or equal to $n$ which are relatively prime to $n$. For our calculation of $\lim_{n\to\infty} q_n$, we will use a result that is a corollary of the following proposition.

Proposition 2.2. $\phi * \mathbf{1} = i$; that is,

$$\sum_{d|n} \phi(d) = n.$$

From Proposition 2.2 and the Möbius inversion formula, the following corollary is immediate.

Corollary 2.1. $\phi = \mu * i$; that is,

$$\phi(n) = \sum_{d|n} \mu(d)\,\frac{n}{d}.$$

Remark. For the proofs of Theorems 2.1 and 2.2, we do not need Proposition 2.2, but only Corollary 2.1. In Exercise 2.1, the reader is guided through a direct proof of the corollary. The proof also will reveal why the seemingly strange Möbius function has such nice properties.

Proof of Proposition 2.2. Let $d|n$. It is easy to see that $\phi(d)$ is equal to the number of $k \in [n]$ satisfying $\gcd(k,n) = \frac{n}{d}$. Indeed, $k \in [n]$ satisfies $\gcd(k,n) = \frac{n}{d}$ if and only if $k = j\,\frac{n}{d}$, for some $j \in [d]$ satisfying $\gcd(d,j) = 1$. (The reader should verify this.) Also, clearly, every $k \in [n]$ satisfies $\gcd(k,n) = \frac{n}{d}$ for some $d|n$. The proposition follows from these facts. $\square$

Remark. For an alternative proof of Proposition 2.2, exactly in the spirit of Lemma 2.1 and the proof of (2.5), see Exercise 2.2.
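Both results are easy to test numerically. The sketch below (mine, not from the text; it recomputes $\mu$ by trial division as in the earlier sketch) checks Proposition 2.2 and Corollary 2.1 for the first few hundred integers:

```python
from math import gcd

def mobius(n):
    if n == 1:
        return 1
    m, d = 0, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0
            m += 1
        d += 1
    return (-1) ** (m + (1 if n > 1 else 0))

def phi(n):
    # Euler phi straight from its definition
    return sum(gcd(j, n) == 1 for j in range(1, n + 1))

for n in range(1, 301):
    divs = [d for d in range(1, n + 1) if n % d == 0]
    assert sum(phi(d) for d in divs) == n                      # Prop. 2.2
    assert phi(n) == sum(mobius(d) * (n // d) for d in divs)   # Cor. 2.1
print("both identities hold for n = 1, ..., 300")
```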
We are now in a position to prove Theorem 2.1.

Number Theoretic Proof of Theorem 2.1. For each $k \ge 2$, there are $\phi(k)$ integers $j$ satisfying $1 \le j < k$ and $\gcd(j,k) = 1$. Thus,

$$|A_n| = |\{(j,k) : 1 \le j < k \le n,\ \gcd(j,k) = 1\}| = \sum_{k=2}^{n} \phi(k).$$

Therefore, from (2.1), we have

$$q_n = \frac{2\sum_{k=2}^n \phi(k)}{n(n-1)}.$$

To calculate

$$\lim_{n\to\infty} q_n = \lim_{n\to\infty} \frac{2\sum_{k=2}^n \phi(k)}{n(n-1)}, \tag{2.6}$$

we analyze the behavior of the sum $\sum_{k=1}^n \phi(k)$ for large $n$.
Remark. The function $\phi$ can be written explicitly as

$$\phi(n) = n \prod_{p|n} \Bigl(1 - \frac{1}{p}\Bigr), \quad n \ge 2, \tag{2.7}$$

where $\prod_{p|n}$ indicates that the product is over all primes that divide $n$; see Exercise 2.3. However, this formula is of no help whatsoever for analyzing the above sum.
We will use Corollary 2.1 to analyze $\sum_{k=1}^n \phi(k)$. From Corollary 2.1, we have

$$\sum_{k=1}^{n} \phi(k) = \sum_{k=1}^{n} (\mu * i)(k) = \sum_{k=1}^{n} \sum_{d|k} \mu(d)\,\frac{k}{d} = \sum_{k=1}^{n} \sum_{dd'=k} d'\,\mu(d) = \sum_{d=1}^{n} \mu(d) \sum_{d' \le \frac{n}{d}} d'.$$

Since $\sum_{j=1}^m j = \frac{1}{2}m(m+1)$, we have

$$\sum_{k=1}^{n} \phi(k) = \sum_{d=1}^{n} \mu(d) \sum_{d' \le \frac{n}{d}} d' = \frac{1}{2}\sum_{d=1}^{n} \mu(d)\Bigl[\frac{n}{d}\Bigr]\Bigl(\Bigl[\frac{n}{d}\Bigr] + 1\Bigr). \tag{2.8}$$

We have $[\frac{n}{d}]([\frac{n}{d}] + 1) \le \frac{n}{d}(\frac{n}{d} + 1) = (\frac{n}{d})^2 + \frac{n}{d}$, and $[\frac{n}{d}]([\frac{n}{d}] + 1) \ge (\frac{n}{d} - 1)\frac{n}{d} = (\frac{n}{d})^2 - \frac{n}{d}$; thus,

$$\Bigl(\frac{n}{d}\Bigr)^2 - \frac{n}{d} \le \Bigl[\frac{n}{d}\Bigr]\Bigl(\Bigl[\frac{n}{d}\Bigr] + 1\Bigr) \le \Bigl(\frac{n}{d}\Bigr)^2 + \frac{n}{d}.$$

Substituting this two-sided inequality in (2.8), we obtain

$$\frac{n^2}{2}\sum_{d=1}^{n} \frac{\mu(d)}{d^2} - \frac{n}{2}\sum_{d=1}^{n} \frac{\mu(d)}{d} \le \sum_{k=1}^{n} \phi(k) \le \frac{n^2}{2}\sum_{d=1}^{n} \frac{\mu(d)}{d^2} + \frac{n}{2}\sum_{d=1}^{n} \frac{\mu(d)}{d}. \tag{2.9}$$

Now

$$\Bigl|\sum_{d=1}^{n} \frac{\mu(d)}{d}\Bigr| \le \sum_{d=1}^{n} \frac{1}{d} = 1 + \sum_{d=2}^{n} \frac{1}{d} \le 1 + \log n, \tag{2.10}$$

since the final sum is a lower Riemann sum for $\int_1^n \frac{1}{x}\,dx$. From (2.9) and (2.10), we obtain

$$\lim_{n\to\infty} \frac{\sum_{k=2}^n \phi(k)}{n(n-1)} = \frac{1}{2}\sum_{d=1}^{\infty} \frac{\mu(d)}{d^2}. \tag{2.11}$$

It remains to evaluate $\sum_{d=1}^{\infty} \frac{\mu(d)}{d^2}$. On the face of it, from the definition of $\mu$, it would seem very difficult to evaluate this explicitly. However, Möbius inversion saves the day. Consider (2.2)–(2.4) with $a = \mathbf{1}$ and $b = \mu$ and with $x = 2$. With these choices, the right hand sides of (2.2) and (2.3) are absolutely convergent. By (2.5), we have $\mathbf{1} * \mu = e$; that is, $a * b = e$. Therefore, we conclude from (2.2)–(2.4) that

$$\Bigl(\sum_{d=1}^{\infty} \frac{1}{d^2}\Bigr)\Bigl(\sum_{d=1}^{\infty} \frac{\mu(d)}{d^2}\Bigr) = 1. \tag{2.12}$$

Recall the well-known formula

$$\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}. \tag{2.13}$$

We give a completely elementary proof of this fact in Appendix D. From (2.12) and (2.13) we obtain

$$\sum_{d=1}^{\infty} \frac{\mu(d)}{d^2} = \frac{6}{\pi^2}. \tag{2.14}$$

Using (2.14) with (2.11) and (2.6) gives

$$\lim_{n\to\infty} q_n = \frac{6}{\pi^2},$$

completing the proof of the theorem. $\square$
Remark. If $a$ is an arithmetic function and $f$ is a nondecreasing function, we say that the function $f$ is the average order of the arithmetic function $a$ if $\frac{1}{n}\sum_{k=1}^n a(k) = f(n) + o(f(n))$. Of course this doesn't uniquely define $f$; we usually choose a particular such $f$ which has a simple form. From (2.11) and (2.14), it follows that the average order of the Euler $\phi$-function is $\frac{3n}{\pi^2}$.
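Numerically (my check, not from the text), the average order $\frac{3n}{\pi^2}$ is already visible for moderate $n$:

```python
from math import gcd, pi

def phi(n):
    return sum(gcd(j, n) == 1 for j in range(1, n + 1))

n = 2000
average = sum(phi(k) for k in range(1, n + 1)) / n
print(average, 3 * n / pi ** 2)   # agree to within a fraction of a percent
```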

We now turn to Theorem 2.2.

Proof of Theorem 2.2. From the definition of the Möbius function, it follows that

$$\mu^2(n) = \begin{cases} 1, & \text{if } n \text{ is square-free}; \\ 0, & \text{otherwise}. \end{cases} \tag{2.15}$$

Thus, letting

$$A_n = \{j \in [n] : j \text{ is square-free}\},$$

we have

$$|A_n| = \sum_{j=1}^{n} \mu^2(j). \tag{2.16}$$

To prove the theorem, we need to show that

$$\lim_{n\to\infty} \frac{|A_n|}{n} = \frac{6}{\pi^2}. \tag{2.17}$$

We need the following lemma.

Lemma 2.2.

$$\mu^2(n) = \sum_{k^2|n} \mu(k).$$

Proof. Let $\Lambda(n) := \sum_{k^2|n} \mu(k)$. If $n$ is square-free, then the only integer $k$ that satisfies $k^2|n$ is $k = 1$. Thus, since $\mu(1) = 1$, we have $\Lambda(n) = 1$. On the other hand, if $n$ is not square-free, then $n$ can be written in the form $n = m^2 l$, where $m > 1$ and $l$ is square-free. Now $k^2|m^2 l$ if and only if $k|m$. (The reader should verify this.) Thus, we have

$$\Lambda(n) = \sum_{k^2|n} \mu(k) = \sum_{k^2|m^2 l} \mu(k) = \sum_{k|m} \mu(k) = (\mu * \mathbf{1})(m) = 0,$$

where the last equality follows from (2.5). The lemma now follows from (2.15). $\square$
Using Lemma 2.2, we have

$$\sum_{j=1}^{n} \mu^2(j) = \sum_{j=1}^{n} \sum_{k^2|j} \mu(k). \tag{2.18}$$

If $k^2 > n$, then $\mu(k)$ will not appear on the right hand side of (2.18). If $k^2 \le n$, then $\mu(k)$ will appear on the right hand side of (2.18) $[\frac{n}{k^2}]$ times, namely, when $j = k^2, 2k^2, \dots, [\frac{n}{k^2}]k^2$. Thus, we have

$$\sum_{j=1}^{n} \mu^2(j) = \sum_{j=1}^{n} \sum_{k^2|j} \mu(k) = \sum_{k^2 \le n} \Bigl[\frac{n}{k^2}\Bigr]\mu(k) = \sum_{k \le [n^{1/2}]} \Bigl[\frac{n}{k^2}\Bigr]\mu(k) = n \sum_{k \le [n^{1/2}]} \frac{\mu(k)}{k^2} + \sum_{k \le [n^{1/2}]} \Bigl(\Bigl[\frac{n}{k^2}\Bigr] - \frac{n}{k^2}\Bigr)\mu(k). \tag{2.19}$$

Since each summand in the second term on the right hand side of (2.19) is bounded in absolute value by 1, we have

$$\Bigl|\sum_{k \le [n^{1/2}]} \Bigl(\Bigl[\frac{n}{k^2}\Bigr] - \frac{n}{k^2}\Bigr)\mu(k)\Bigr| \le n^{1/2}. \tag{2.20}$$

It follows from (2.16), (2.19), and (2.20) that

$$\lim_{n\to\infty} \frac{|A_n|}{n} = \sum_{k=1}^{\infty} \frac{\mu(k)}{k^2}.$$

Using this with (2.14) gives (2.17) and completes the proof of the theorem. $\square$
We now give a heuristic probabilistic proof and a rigorous probabilistic proof of Theorem 2.1. In the heuristic proof, we put quotation marks around the steps that are not rigorous.

Heuristic Probabilistic Proof of Theorem 2.1. Let $\{p_k\}_{k=1}^{\infty}$ be an enumeration of the primes. In the spirit described in the first paragraph of the chapter, if we pick a positive integer “at random,” then the “probability” of it being divisible by the prime number $p_k$ is $\frac{1}{p_k}$. (Of course, this is true also with $p_k$ replaced by an arbitrary positive integer.) If we pick two positive integers “independently,” then the “probability” that they are both divisible by $p_k$ is $\frac{1}{p_k} \cdot \frac{1}{p_k} = \frac{1}{p_k^2}$, by “independence.” So the “probability” that at least one of them is not divisible by $p_k$ is $1 - \frac{1}{p_k^2}$. The “probability” that a “randomly” selected positive integer is divisible by the two distinct primes, $p_j$ and $p_k$, is $\frac{1}{p_j p_k} = \frac{1}{p_j} \cdot \frac{1}{p_k}$. (The reader should check that this “holds” more generally if $p_j$ and $p_k$ are replaced by an arbitrary pair of relatively prime positive integers, but not otherwise.) Thus, the events of being divisible by $p_j$ and being divisible by $p_k$ are “independent.” Now two “randomly” selected positive integers are relatively prime if and only if, for every $k$, at least one of the integers is not divisible by $p_k$. But since the “probability” that at least one of them is not divisible by $p_k$ is $1 - \frac{1}{p_k^2}$, and since being divisible by a prime $p_j$ and being divisible by a different prime $p_k$ are “independent” events, the “probability” that the two “randomly” selected positive integers are such that, for every $k$, at least one of them is not divisible by $p_k$ is $\prod_{k=1}^{\infty}\bigl(1 - \frac{1}{p_k^2}\bigr)$. Thus, this should be the “probability” that two “randomly” selected positive integers are relatively prime. $\square$
Rigorous Probabilistic Proof of Theorem 2.1. For the probabilistic proof, the second alternative suggested in the second paragraph of the chapter will be more convenient. Thus, we choose an integer from $[n]$ uniformly at random and then choose a second integer from $[n]$ uniformly at random. Let $\Omega_n = [n]$. The appropriate probability space on which to analyze the model described above is the space $(\Omega_n \times \Omega_n, P_n)$, where the probability measure $P_n$ on $\Omega_n \times \Omega_n$ is the uniform measure; that is, $P_n(A) = \frac{|A|}{n^2}$, for any $A \subset \Omega_n \times \Omega_n$. The point $(i,j) \in \Omega_n \times \Omega_n$ indicates that the integer $i$ was chosen the first time and the integer $j$ was chosen the second time. Let $C_n$ denote the event that the two selected integers are relatively prime; that is,

$$C_n = \{(i,j) \in \Omega_n \times \Omega_n : \gcd(i,j) = 1\}.$$

Then the probability $q_n$ that the two selected integers are relatively prime is

$$q_n = P_n(C_n) = \frac{|C_n|}{n^2}.$$

Let $\{p_k\}_{k=1}^{\infty}$ denote the prime numbers arranged in increasing order. (Any enumeration of the primes would do, but for the proof it is more convenient to choose the increasing enumeration.) For each $k \in \mathbb{N}$, let $B_{n;k}^1$ denote the event that the first integer chosen is divisible by $p_k$ and let $B_{n;k}^2$ denote the event that the second integer chosen is divisible by $p_k$. That is,

$$B_{n;k}^1 = \{(i,j) \in \Omega_n \times \Omega_n : p_k|i\}, \qquad B_{n;k}^2 = \{(i,j) \in \Omega_n \times \Omega_n : p_k|j\}.$$

Note of course that the above sets are empty if $p_k > n$. The event $B_{n;k}^1 \cap B_{n;k}^2 = \{(i,j) \in \Omega_n \times \Omega_n : p_k|i \text{ and } p_k|j\}$ is the event that both selected integers have $p_k$ as a factor. There are $[\frac{n}{p_k}]$ integers in $\Omega_n$ that are divisible by $p_k$, namely, $p_k, 2p_k, \dots, [\frac{n}{p_k}]p_k$. Thus, there are $[\frac{n}{p_k}]^2$ pairs $(i,j) \in \Omega_n \times \Omega_n$ for which both coordinates are divisible by $p_k$; therefore,

$$P_n\bigl(B_{n;k}^1 \cap B_{n;k}^2\bigr) = \frac{[\frac{n}{p_k}]^2}{n^2}. \tag{2.21}$$

Note that $\cup_{k=1}^{\infty}\bigl(B_{n;k}^1 \cap B_{n;k}^2\bigr) = \cup_{k=1}^{n}\bigl(B_{n;k}^1 \cap B_{n;k}^2\bigr)$ is the event that the two selected integers have at least one common prime factor. (The equality above follows from the fact that $B_{n;k}^1$ and $B_{n;k}^2$ are clearly empty for $k > n$.) Consequently, $C_n$ can be expressed as

$$C_n = \Bigl(\bigcup_{k=1}^{n} \bigl(B_{n;k}^1 \cap B_{n;k}^2\bigr)\Bigr)^c = \bigcap_{k=1}^{n} \bigl(B_{n;k}^1 \cap B_{n;k}^2\bigr)^c,$$

where $A^c := \Omega_n \times \Omega_n - A$ denotes the complement of an event $A \subset \Omega_n \times \Omega_n$. Thus,

$$P_n(C_n) = P_n\Bigl(\bigcap_{k=1}^{n} \bigl(B_{n;k}^1 \cap B_{n;k}^2\bigr)^c\Bigr). \tag{2.22}$$

Let $R < n$ be a positive integer. We have

$$\bigcap_{k=1}^{n} \bigl(B_{n;k}^1 \cap B_{n;k}^2\bigr)^c = \bigcap_{k=1}^{R} \bigl(B_{n;k}^1 \cap B_{n;k}^2\bigr)^c - \bigcup_{k=R+1}^{n} \bigl(B_{n;k}^1 \cap B_{n;k}^2\bigr)$$

and, of course, $\bigcap_{k=1}^{n} (B_{n;k}^1 \cap B_{n;k}^2)^c \subset \bigcap_{k=1}^{R} (B_{n;k}^1 \cap B_{n;k}^2)^c$. Thus,

$$P_n\Bigl(\bigcap_{k=1}^{R} \bigl(B_{n;k}^1 \cap B_{n;k}^2\bigr)^c\Bigr) - P_n\Bigl(\bigcup_{k=R+1}^{n} \bigl(B_{n;k}^1 \cap B_{n;k}^2\bigr)\Bigr) \le P_n\Bigl(\bigcap_{k=1}^{n} \bigl(B_{n;k}^1 \cap B_{n;k}^2\bigr)^c\Bigr) \le P_n\Bigl(\bigcap_{k=1}^{R} \bigl(B_{n;k}^1 \cap B_{n;k}^2\bigr)^c\Bigr). \tag{2.23}$$

Using the sub-additivity property of probability measures for the first inequality below, and using (2.21) for the equality below, we have

$$P_n\Bigl(\bigcup_{k=R+1}^{n} \bigl(B_{n;k}^1 \cap B_{n;k}^2\bigr)\Bigr) \le \sum_{k=R+1}^{n} P_n\bigl(B_{n;k}^1 \cap B_{n;k}^2\bigr) = \sum_{k=R+1}^{n} \frac{[\frac{n}{p_k}]^2}{n^2} \le \sum_{k=R+1}^{\infty} \frac{1}{p_k^2}. \tag{2.24}$$

Up until now, we have made no assumption on $n$. Now assume that $p_k|n$, for $k = 1, \dots, R$; that is, assume that $n$ is a multiple of $\prod_{k=1}^R p_k$. Denote the set of such $n$ by $D_R$; that is,

$$D_R = \{n \in \mathbb{N} : p_k|n \text{ for } k = 1, \dots, R\}.$$

Recall that the event $B_{n;k}^1 \cap B_{n;k}^2$ is the event that both selected integers are divisible by $p_k$. We claim that if $n \in D_R$, then the events $\{B_{n;k}^1 \cap B_{n;k}^2\}_{k=1}^{R}$ are independent. That is, for any subset $I \subset \{1, 2, \dots, R\}$, one has

$$P_n\Bigl(\bigcap_{k \in I} \bigl(B_{n;k}^1 \cap B_{n;k}^2\bigr)\Bigr) = \prod_{k \in I} P_n\bigl(B_{n;k}^1 \cap B_{n;k}^2\bigr), \quad \text{if } n \in D_R. \tag{2.25}$$

The proof of (2.25) is a straightforward counting exercise and is left as Exercise 2.4. If events $\{A_k\}_{k=1}^{R}$ are independent, then the complementary events $\{A_k^c\}_{k=1}^{R}$ are also independent. See Exercise A.3 in Appendix A. Thus, we conclude that

$$P_n\Bigl(\bigcap_{k=1}^{R} \bigl(B_{n;k}^1 \cap B_{n;k}^2\bigr)^c\Bigr) = \prod_{k=1}^{R} P_n\bigl(\bigl(B_{n;k}^1 \cap B_{n;k}^2\bigr)^c\bigr), \quad \text{if } n \in D_R. \tag{2.26}$$

By (2.21) we have $P_n\bigl((B_{n;k}^1 \cap B_{n;k}^2)^c\bigr) = 1 - P_n\bigl(B_{n;k}^1 \cap B_{n;k}^2\bigr) = 1 - \frac{[\frac{n}{p_k}]^2}{n^2}$, for any $n$. Thus, from the definition of $D_R$, we have

$$P_n\bigl(\bigl(B_{n;k}^1 \cap B_{n;k}^2\bigr)^c\bigr) = 1 - \frac{1}{p_k^2}, \quad \text{if } n \in D_R. \tag{2.27}$$

From (2.22) to (2.24), (2.26), and (2.27), we conclude that

$$\prod_{k=1}^{R} \Bigl(1 - \frac{1}{p_k^2}\Bigr) - \sum_{k=R+1}^{\infty} \frac{1}{p_k^2} \le P_n(C_n) \le \prod_{k=1}^{R} \Bigl(1 - \frac{1}{p_k^2}\Bigr), \quad \text{for } R \in \mathbb{N} \text{ and } n \in D_R. \tag{2.28}$$

We now use (2.28) to obtain an estimate on $P_n(C_n)$ for general $n$. Let $n \ge \prod_{k=1}^R p_k$. Let $n'$ denote the largest integer in $D_R$ which is smaller than or equal to $n$, and let $n''$ denote the smallest integer in $D_R$ which is larger than or equal to $n$. Since $D_R$ is the set of positive multiples of $\prod_{k=1}^R p_k$, we obviously have

$$n' > n - \prod_{k=1}^{R} p_k \quad \text{and} \quad n'' < n + \prod_{k=1}^{R} p_k. \tag{2.29}$$

For any $n$, note that $n^2 P_n(C_n) = |C_n|$ is the number of pairs $(i,j) \in \Omega_n \times \Omega_n$ that are relatively prime. Obviously, the number of such pairs is increasing in $n$. Thus $(n')^2 P_{n'}(C_{n'}) \le n^2 P_n(C_n) \le (n'')^2 P_{n''}(C_{n''})$, or equivalently,

$$\Bigl(\frac{n'}{n}\Bigr)^2 P_{n'}(C_{n'}) \le P_n(C_n) \le \Bigl(\frac{n''}{n}\Bigr)^2 P_{n''}(C_{n''}). \tag{2.30}$$

Since $n', n'' \in D_R$, we conclude from (2.28)–(2.30) that

$$\Bigl(\frac{n - \prod_{k=1}^R p_k}{n}\Bigr)^2 \Bigl[\prod_{k=1}^{R} \Bigl(1 - \frac{1}{p_k^2}\Bigr) - \sum_{k=R+1}^{\infty} \frac{1}{p_k^2}\Bigr] < P_n(C_n) < \Bigl(\frac{n + \prod_{k=1}^R p_k}{n}\Bigr)^2 \prod_{k=1}^{R} \Bigl(1 - \frac{1}{p_k^2}\Bigr). \tag{2.31}$$
Letting $n \to \infty$ in (2.31), we obtain

$$\prod_{k=1}^{R} \Bigl(1 - \frac{1}{p_k^2}\Bigr) - \sum_{k=R+1}^{\infty} \frac{1}{p_k^2} \le \liminf_{n\to\infty} P_n(C_n) \le \limsup_{n\to\infty} P_n(C_n) \le \prod_{k=1}^{R} \Bigl(1 - \frac{1}{p_k^2}\Bigr). \tag{2.32}$$

Now (2.32) holds for arbitrary $R$; thus letting $R \to \infty$, we conclude that

$$\lim_{n\to\infty} P_n(C_n) = \prod_{k=1}^{\infty} \Bigl(1 - \frac{1}{p_k^2}\Bigr). \tag{2.33}$$
The celebrated Euler product formula states that

$$\frac{1}{\prod_{k=1}^{\infty}\bigl(1 - \frac{1}{p_k^r}\bigr)} = \sum_{n=1}^{\infty} \frac{1}{n^r}, \quad r > 1; \tag{2.34}$$

see Exercise 2.5. From (2.33), (2.34), and (2.13), we conclude that

$$\lim_{n\to\infty} q_n = \lim_{n\to\infty} P_n(C_n) = \frac{1}{\sum_{n=1}^{\infty} \frac{1}{n^2}} = \frac{6}{\pi^2}. \qquad \square$$

Exercise 2.1. Give a direct proof of Corollary 2.1. (Hint: The Euler $\phi$-function $\phi(n)$ counts the number of positive integers that are less than or equal to $n$ and relatively prime to $n$. We employ the sieve method, which from the point of view of set theory is the method of inclusion–exclusion. Start with a list of all $n$ integers between 1 and $n$ as potential members of the set of the $\phi(n)$ integers relatively prime to $n$. Let $\{p_j\}_{j=1}^m$ be the prime divisors of $n$. For any such $p_j$, the $\frac{n}{p_j}$ numbers $p_j, 2p_j, \dots, \frac{n}{p_j}p_j$ are not relatively prime to $n$. So we should strike these numbers from our list. When we do this for each $j$, the remaining numbers on the list are those numbers that are relatively prime to $n$, and the size of the list is $\phi(n)$. Now we haven't necessarily reduced the size of our list to $N_1 := n - \sum_{j=1}^m \frac{n}{p_j}$, because some of the numbers we have deleted might be multiples of two different primes, $p_i$ and $p_j$, in which case they were subtracted above twice. Thus we need to add back to $N_1$ all of the $\frac{n}{p_i p_j}$ multiples of $p_i p_j$, for $i \ne j$. That is, we now have $N_2 := N_1 + \sum_{i \ne j} \frac{n}{p_i p_j}$. Continue in this vein.)
Exercise 2.2. This exercise presents an alternative proof of Proposition 2.2:
(a) Show that the arithmetic function $\sum_{d|n} \phi(d)$ is multiplicative. Use the fact that $\phi$ is multiplicative—see Exercise 2.3.
(b) Show that $\sum_{d|n} \phi(d) = n$, when $n$ is a prime power.
(c) Conclude that Proposition 2.2 holds.

Exercise 2.3. The Chinese remainder theorem states that if $n$ and $m$ are relatively prime positive integers, and $a \in [n]$ and $b \in [m]$, then there exists a unique $c \in [nm]$ such that $c = a \bmod n$ and $c = b \bmod m$. (For a proof, see [27].) Use this to prove that the Euler $\phi$-function is multiplicative. Then use the fact that $\phi$ is multiplicative to prove (2.7).
Exercise 2.4. Prove (2.25).

Exercise 2.5. Prove the Euler product formula (2.34). (Hint: Let $N_\ell$ denote the set of positive integers all of whose prime factors are in the set $\{p_k\}_{k=1}^{\ell}$. Using the fact that

$$\frac{1}{1 - \frac{1}{p_k^r}} = \sum_{m=0}^{\infty} \frac{1}{p_k^{rm}},$$

for all $k \in \mathbb{N}$, first show that $\frac{1}{1 - \frac{1}{p_1^r}} \cdot \frac{1}{1 - \frac{1}{p_2^r}} = \sum_{n \in N_2} \frac{1}{n^r}$, and then show that $\prod_{k=1}^{\ell} \frac{1}{1 - \frac{1}{p_k^r}} = \sum_{n \in N_\ell} \frac{1}{n^r}$, for any $\ell \in \mathbb{N}$.)
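For a numerical feel for (2.34) (my illustration, not from the text), truncate the product over the primes below some bound and compare with $\zeta(2) = \pi^2/6$:

```python
from math import pi

def primes_up_to(n):
    # simple sieve of Eratosthenes
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            for q in range(p * p, n + 1, p):
                sieve[q] = False
    return [p for p, is_prime in enumerate(sieve) if is_prime]

product = 1.0
for p in primes_up_to(10_000):
    product *= 1.0 / (1.0 - p ** -2)
print(product, pi ** 2 / 6)   # 1.6449... on both sides
```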

Exercise 2.6. Using Theorem 2.1, prove the following result: Let $2 \le d \in \mathbb{N}$. Choose two integers uniformly at random from $[n]$. As $n \to \infty$, the asymptotic probability that their greatest common divisor is $d$ is $\frac{6}{\pi^2 d^2}$.

Exercise 2.7. Give a probabilistic proof of Theorem 2.2.

Chapter Notes

It seems that Theorem 2.1 was first proven by E. Cesàro in 1881. A good source for the results in this chapter is Nathanson's book [27]. See also the more advanced treatment of Tenenbaum [33], which contains many interesting and nontrivial exercises. The heuristic probabilistic proof of Theorem 2.1 is well known and can be found readily, including via a Google search. I am unaware of a rigorous probabilistic proof in the literature.
Chapter 3
A One-Dimensional Probabilistic Packing
Problem

Consider $n$ molecules lined up in a row. From among the $n - 1$ nearest neighbor pairs, select one pair at random and “bond” the two molecules together. Now from all the remaining nearest neighbor pairs, select one pair at random and bond the two molecules together. Continue like this until no nearest neighbor pairs remain. Let $M_{n;2}$ denote the random variable that counts the number of bonded molecules. Let $EM_{n;2}$ denote the expected value of $M_{n;2}$, that is, the average number of bonded molecules. The first thing we would like to do is to compute the limiting average fraction of bonded molecules: $\lim_{n\to\infty} \frac{EM_{n;2}}{n}$. Then we would like to show that $\frac{M_{n;2}}{n}$ is close to this limiting average with high probability as $n \to \infty$; that is, we would like to prove that $\frac{M_{n;2}}{n}$ satisfies the weak law of large numbers.

Of course, by definition, $EM_{n;2} = \sum_{j=0}^n jP(M_{n;2} = j)$, where $P(M_{n;2} = j)$ is the probability that $M_{n;2}$ is equal to $j$. However, it would be fruitless to pursue this formula to evaluate $EM_{n;2}$ asymptotically because the calculation of $P(M_{n;2} = j)$ is hopelessly complicated. We will solve the problem with the help of generating functions.

Actually, we will consider a slightly more general problem, where the pairs are replaced by $k$-tuples, for some $k \ge 2$. So the problem is as follows. There are $n$ molecules on a line. From among the $n - k + 1$ nearest neighbor $k$-tuples, select one at random and “bond” the $k$ molecules together. Now from among all the remaining nearest neighbor $k$-tuples, select one at random and bond the $k$ molecules together. Continue like this until there are no nearest neighbor $k$-tuples left. Let $M_{n;k}$ denote the random variable that counts the number of bonded molecules, and let $EM_{n;k}$ denote the expected value of $M_{n;k}$. See Fig. 3.1. Here is our result.
Theorem 3.1. For each integer $k \ge 2$,

$$\lim_{n\to\infty} \frac{EM_{n;k}}{n} = k \exp\Bigl(-2\sum_{j=1}^{k-1} \frac{1}{j}\Bigr) \int_0^1 \exp\Bigl(2\sum_{j=1}^{k-1} \frac{s^j}{j}\Bigr)\,ds := p_k. \tag{3.1}$$

Furthermore, $\frac{M_{n;k}}{n}$ satisfies the weak law of large numbers; that is, for all $\epsilon > 0$,

$$\lim_{n\to\infty} P\Bigl(\Bigl|\frac{M_{n;k}}{n} - p_k\Bigr| \ge \epsilon\Bigr) = 0. \tag{3.2}$$

Fig. 3.1 A realization with $n = 21$ and $k = 3$ that gives $M_{21;3} = 15$

Remark 1. Only when $k = 2$ can $p_k$ be calculated explicitly; one obtains $p_2 = 1 - e^{-2} \approx 0.865$. Numerical integration gives $p_3 \approx 0.824$, $p_4 \approx 0.804$, $p_5 \approx 0.792$, $p_{10} \approx 0.770$, $p_{100} \approx 0.750$, $p_{1000} \approx 0.748$, and $p_{10{,}000} \approx 0.748$. The expression $p_k$ seems surprisingly difficult to analyze. We suggest the following open problem to the reader.

Open Problem. Prove that $p_k$ is monotone decreasing and calculate $\lim_{k\to\infty} p_k$.

Remark 2. Any molecule that remains unbonded at the end of the nearest neighbor $k$-tuple bonding process occurs in a maximal row of $j$ unbonded molecules, for some $j \in [k-1]$. In the limit as $n \to \infty$, what fraction of molecules ends up in a maximal row of $j$ unbonded molecules? See Exercise 3.2. (In Fig. 3.1, numbering from left to right, molecules #4 and #8 occur in a maximal row of one unbonded molecule, while molecules #15, #16, #20, and #21 occur in a maximal row of two unbonded molecules.)
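Theorem 3.1 invites a simulation check. The sketch below (my own illustration; the parameters are arbitrary) exploits the standard equivalence, easy to verify on small cases, between repeatedly choosing uniformly among the currently available $k$-tuples and visiting all $k$-tuples once in a uniformly random order:

```python
import random
from math import exp

def bonded_fraction(n, k):
    # Visit the k-tuples in uniformly random order and bond a tuple
    # whenever all of its k molecules are still unbonded; this has the
    # same distribution as the sequential process described above.
    free = [True] * n
    order = list(range(n - k + 1))
    random.shuffle(order)
    for i in order:
        if all(free[i:i + k]):
            for j in range(i, i + k):
                free[j] = False
    return (n - sum(free)) / n

random.seed(1)
trials, n = 50, 20_000
for k, target in ((2, 1 - exp(-2)), (3, 0.824)):   # targets from Remark 1
    est = sum(bonded_fraction(n, k) for _ in range(trials)) / trials
    print(k, round(est, 4), round(target, 4))
```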
Proof. For notational convenience, let $H_n^{(k)} = EM_{n;k}$ and $L_n^{(k)} = EM_{n;k}^2$. To prove the theorem, it suffices to show that

$$EM_{n;k} = H_n^{(k)} = p_k n + o(n), \quad \text{as } n \to \infty, \tag{3.3}$$

and that

$$EM_{n;k}^2 = L_n^{(k)} = p_k^2 n^2 + o(n^2), \quad \text{as } n \to \infty. \tag{3.4}$$

This method of proof is known as the second moment method. It is clear that (3.1) follows from (3.3). An application of Chebyshev's inequality shows that (3.2) follows from (3.3) and (3.4). To see this, note that if $Z$ is a random variable with expected value $EZ$ and variance $\sigma^2(Z)$, then Chebyshev's inequality states that

$$P(|Z - EZ| \ge \delta) \le \frac{\sigma^2(Z)}{\delta^2}, \quad \text{for any } \delta > 0.$$

Also, $\sigma^2(Z) = EZ^2 - (EZ)^2$. We apply Chebyshev's inequality with $Z = \frac{M_{n;k}}{n}$. Using (3.3) and (3.4), we have

$$EZ = \frac{H_n^{(k)}}{n} = p_k + o(1), \quad \text{as } n \to \infty, \tag{3.5}$$
and

$$\sigma^2(Z) = \frac{L_n^{(k)}}{n^2} - \frac{\bigl(H_n^{(k)}\bigr)^2}{n^2} = p_k^2 + o(1) - (p_k + o(1))^2 = o(1), \quad \text{as } n \to \infty.$$

Thus, we obtain for all $\delta > 0$,

$$P\Bigl(\Bigl|\frac{M_{n;k}}{n} - \frac{H_n^{(k)}}{n}\Bigr| \ge \delta\Bigr) \le \frac{o(1)}{\delta^2}, \quad \text{as } n \to \infty,$$

or, equivalently,

$$\lim_{n\to\infty} P\Bigl(\Bigl|\frac{M_{n;k}}{n} - \frac{H_n^{(k)}}{n}\Bigr| \ge \delta\Bigr) = 0, \quad \text{for all } \delta > 0. \tag{3.6}$$

We now show that (3.2) follows from (3.3) and (3.6). Fix $\epsilon > 0$. We have

$$\Bigl|\frac{M_{n;k}}{n} - p_k\Bigr| = \Bigl|\frac{M_{n;k}}{n} - \frac{H_n^{(k)}}{n} + \frac{H_n^{(k)}}{n} - p_k\Bigr| \le \Bigl|\frac{M_{n;k}}{n} - \frac{H_n^{(k)}}{n}\Bigr| + \Bigl|\frac{H_n^{(k)}}{n} - p_k\Bigr|.$$

For sufficiently large $n_\epsilon$, one has from (3.3) that $\bigl|\frac{H_n^{(k)}}{n} - p_k\bigr| \le \frac{\epsilon}{2}$, for $n \ge n_\epsilon$. Thus, for $n \ge n_\epsilon$, a necessary condition for $\bigl|\frac{M_{n;k}}{n} - p_k\bigr| \ge \epsilon$ is that $\bigl|\frac{M_{n;k}}{n} - \frac{H_n^{(k)}}{n}\bigr| \ge \frac{\epsilon}{2}$. Consequently,

$$P\Bigl(\Bigl|\frac{M_{n;k}}{n} - p_k\Bigr| \ge \epsilon\Bigr) \le P\Bigl(\Bigl|\frac{M_{n;k}}{n} - \frac{H_n^{(k)}}{n}\Bigr| \ge \frac{\epsilon}{2}\Bigr), \quad \text{for } n \ge n_\epsilon.$$

Now (3.2) follows from this and (3.6).
Our proofs of (3.3) and (3.4) will follow similar lines. Before commencing with the proof of (3.3), we trace its general architecture. Only the first step of the proof involves probability. In this step, we employ probabilistic reasoning to produce a recursion equation that gives $H_n^{(k)}$ in terms of $H_0^{(k)}, H_1^{(k)}, \dots, H_{n-k}^{(k)}$. In this form, the equation is not useful because as $n \to \infty$, it gives $H_n^{(k)}$ in terms of a growing number of its predecessors. However, defining $S_n^{(k)} = \sum_{j=0}^n H_j^{(k)}$, and using the abovementioned recursion equation, we find that $S_n^{(k)}$ is given in terms of only two of its predecessors. We then construct the generating function $g(t)$ whose coefficients are $\{S_n^{(k)}\}_{n=0}^{\infty}$. Using the recursion equation for $S_n^{(k)}$, we show that $g$ solves a linear, first order differential equation. We solve this differential equation to obtain an explicit formula for $g(t)$. This explicit formula reveals that $g$ possesses a singularity at $t = 1$. Exploiting this singularity allows us to evaluate $\lim_{n\to\infty} \frac{S_n^{(k)}}{n^2}$, and then a simple observation allows us to obtain $\lim_{n\to\infty} \frac{H_n^{(k)}}{n}$ from $\lim_{n\to\infty} \frac{S_n^{(k)}}{n^2}$.

We now commence with the proof of (3.3). Note that if we start with $n < k$ molecules, then none of them will get bonded. Thus,
$$H_n^{(k)} = 0, \quad \text{for } n = 0, \dots, k - 1. \tag{3.7}$$

We now derive a recursion relation for $H_n^{(k)}$. The method we use is called first step analysis. We begin with a line of $n \ge k$ unbonded molecules, and in the first step, one of the nearest neighbor $k$-tuples is chosen at random and its $k$ molecules are bonded. In order from left to right, denote the original $n - k + 1$ nearest neighbor $k$-tuples by $\{B_j\}_{j=1}^{n-k+1}$. If $B_j$ was chosen in the first step, then the original row now contains a row of $j - 1$ unbonded molecules to the left of the bonded $k$-tuple $B_j$ and a row of $n + 1 - j - k$ unbonded molecules to the right of $B_j$. To complete the random bonding process, we choose random $k$-tuples from these two sub-rows until there are no more $k$-tuples to choose from. This gives us the following formula for the conditional expectation of $M_{n;k}$ given that $B_j$ was selected first: for $n \ge k$,

$$E(M_{n;k} \mid B_j \text{ selected first}) = k + E\bigl(M_{j-1;k} + M_{n+1-j-k;k}\bigr) = k + H_{j-1}^{(k)} + H_{n+1-j-k}^{(k)}. \tag{3.8}$$

Of course, for each $j \in [n-k+1]$, the probability that $B_j$ was chosen first is $\frac{1}{n-k+1}$. Thus, we obtain the formula

$$EM_{n;k} = H_n^{(k)} = \sum_{j=1}^{n-k+1} P(B_j \text{ selected first})\, E(M_{n;k} \mid B_j \text{ selected first}) = \frac{1}{n-k+1} \sum_{j=1}^{n-k+1} \bigl(k + H_{j-1}^{(k)} + H_{n+1-j-k}^{(k)}\bigr), \quad n \ge k.$$

We can rewrite this as

2 X
nk
.k/
Hn.k/ DkC H ; n  k: (3.9)
n  k C 1 j D0 j

The above recursion equation is not useful directly because it gives H_n^{(k)} in terms
of n - k + 1 of its predecessors; we want a recursion equation that expresses a given
term in terms of a fixed finite number of its predecessors. To that end, we define

  S_n^{(k)} = Σ_{j=0}^n H_j^{(k)}.  (3.10)

Substituting this in (3.9) gives

  H_n^{(k)} = k + (2/(n - k + 1)) S_{n-k}^{(k)}, n ≥ k.  (3.11)
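Equations (3.7), (3.10), and (3.11) already give an efficient way to compute H_n^{(k)} exactly, which makes the limit in (3.3) easy to probe numerically. A minimal sketch (ours, for illustration only):

    def H_values(n_max, k):
        # H_n^{(k)} via (3.11): H_n = k + 2 S_{n-k} / (n - k + 1) for n >= k,
        # with the boundary condition (3.7): H_n = 0 for n < k.
        H = [0.0] * (n_max + 1)
        S = [0.0] * (n_max + 1)  # S_n = H_0 + ... + H_n, as in (3.10)
        for n in range(1, n_max + 1):
            if n >= k:
                H[n] = k + 2.0 * S[n - k] / (n - k + 1)
            S[n] = S[n - 1] + H[n]
        return H

    H = H_values(2000, 2)
    print(H[2000] / 2000)  # about 0.8647, approaching p_2 = 1 - e^{-2}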
Writing (3.7) and (3.11) in terms of {S_n^{(k)}}_{n=0}^∞, we obtain

  S_n^{(k)} = 0, for n = 0, ..., k - 1,  (3.12)

and

  S_n^{(k)} - S_{n-1}^{(k)} = k + (2/(n - k + 1)) S_{n-k}^{(k)}, n ≥ k.  (3.13)

This recursion equation has the potential to be useful since it gives S_n^{(k)} in terms of
only two of its predecessors, S_{n-1}^{(k)} and S_{n-k}^{(k)}. Of course, we have paid a price: we
are now working with S_n^{(k)} instead of H_n^{(k)}; but this will be dealt with easily. For
convenience, we drop the superscript k from S_n^{(k)}, H_n^{(k)}, and L_n^{(k)} for the rest of the
chapter, except in the statement of propositions. We rewrite (3.13) as

  (n - k + 1)S_n = (n - k + 1)S_{n-1} + 2S_{n-k} + k(n - k + 1), n ≥ k.  (3.14)

We now define the generating function for {S_n}_{n=0}^∞ and use (3.14) to derive a
linear, first-order differential equation that is satisfied by this generating function.
The generating function g(t) is defined by

  g(t) = Σ_{n=0}^∞ S_n t^n = Σ_{n=k}^∞ S_n t^n,  (3.15)

where the second equality follows from (3.12). From the definitions, it follows that
H_n ≤ n, and thus S_n ≤ (1/2)n(n + 1). Consequently, the sum on the right hand side
of (3.15) converges for |t| < 1, with the convergence being uniform for |t| ≤ ρ, for
any ρ ∈ (0, 1). It follows then that

  g′(t) = Σ_{n=k}^∞ n S_n t^{n-1}, |t| < 1.  (3.16)

Multiply equation (3.14) by t^n and group the terms in the following way:

  nS_n t^n - (k - 1)S_n t^n = (n - 1)S_{n-1} t^n - (k - 2)S_{n-1} t^n + 2S_{n-k} t^n + k(n - k + 1)t^n.

Now summing the equation over all n ≥ k, and appealing to (3.15), (3.16),
and (3.12), we obtain the differential equation

  t g′(t) - (k - 1)g(t) = t² g′(t) - (k - 2)t g(t) + 2t^k g(t) + kt Σ_{n=k}^∞ n t^{n-1} - k(k - 1) Σ_{n=k}^∞ t^n.  (3.17)

Since Σ_{n=k}^∞ n t^{n-1} is the derivative of Σ_{n=k}^∞ t^n = t^k/(1 - t), it follows that
Σ_{n=k}^∞ n t^{n-1} = (t^k/(1 - t))′ = ((1 - t)k t^{k-1} + t^k)/(1 - t)². Using these facts and doing some algebra,
which leads to many cancelations, we obtain

  kt Σ_{n=k}^∞ n t^{n-1} - k(k - 1) Σ_{n=k}^∞ t^n = k t^k/(1 - t)².  (3.18)

Substituting this in (3.17), and doing a little algebra, we obtain

  g′(t) = [((k - 1) - (k - 2)t + 2t^k)/(t(1 - t))] g(t) + k t^{k-1}/(1 - t)³, 0 < t < 1.  (3.19)

Note that we have excluded t = 0 because we have divided by t.


There are two singularities in the above equation, one at t = 0 and one at t = 1.
The singularity at t = 0 is removable; indeed, g(0) = 0, so the first term on the right
hand side of (3.19) can be defined at 0. The singularity at 1, on the other hand, is
authentic, and actually contains the solution to our problem; we will just need to
"unzip" it.
The linear, first-order differential equation in (3.19) is written in the form
g′(t) = a(t)g(t) + b(t), where

  a(t) = ((k - 1) - (k - 2)t + 2t^k)/(t(1 - t)), b(t) = k t^{k-1}/(1 - t)³.  (3.20)

Let ρ ∈ (0, 1) and rewrite the differential equation as

  (g(t) e^{-∫_ρ^t a(s) ds})′ = b(t) e^{-∫_ρ^t a(s) ds}.

Integrating from ρ to t ∈ (ρ, 1) gives

  g(t) e^{-∫_ρ^t a(r) dr} = g(ρ) + ∫_ρ^t b(s) e^{-∫_ρ^s a(r) dr} ds, t ∈ (ρ, 1),

which we rewrite as

  g(t) = g(ρ) e^{∫_ρ^t a(r) dr} + ∫_ρ^t b(s) e^{∫_s^t a(r) dr} ds, t ∈ (ρ, 1).  (3.21)

Since lim_{t→0} t a(t) = k - 1, there exists a t_0 > 0 such that a(t) ≤ (k - 1/2)/t, for
0 < t ≤ t_0. Thus, for ρ < t_0, one has

  e^{∫_ρ^{t_0} a(r) dr} ≤ e^{∫_ρ^{t_0} ((k - 1/2)/r) dr} = (t_0/ρ)^{k - 1/2}.

By (3.15) we have g(ρ) = O(ρ^k) as ρ → 0. Therefore,

  lim_{ρ→0} g(ρ) e^{∫_ρ^t a(r) dr} = lim_{ρ→0} g(ρ) e^{∫_ρ^{t_0} a(r) dr} e^{∫_{t_0}^t a(r) dr} ≤ e^{∫_{t_0}^t a(r) dr} lim_{ρ→0} g(ρ)(t_0/ρ)^{k - 1/2} = 0.

Thus, letting ρ → 0 in (3.21) gives

  g(t) = ∫_0^t b(s) e^{∫_s^t a(r) dr} ds, 0 ≤ t < 1.  (3.22)

Using partial fractions, one finds that

  ((k - 1) - (k - 2)r)/(r(1 - r)) = (k - 1)/r + 1/(1 - r).

We also have

  r^{k-1}/(1 - r) = 1/(1 - r) - (1 + r + ⋯ + r^{k-2}).

Thus, we can rewrite a(r) from (3.20) as

  a(r) = (k - 1)/r + 3/(1 - r) - 2(1 + r + ⋯ + r^{k-2}).

We then obtain

  ∫^t a(r) dr = (k - 1) log t - 3 log(1 - t) - 2 Σ_{j=1}^{k-1} t^j/j,

and thus

  e^{∫_s^t a(r) dr} = t^{k-1} (1 - t)^{-3} e^{-2 Σ_{j=1}^{k-1} t^j/j} · s^{1-k} (1 - s)³ e^{2 Σ_{j=1}^{k-1} s^j/j}.  (3.23)

Substituting this in (3.22) and recalling the definition of b from (3.20), we obtain

  g(t) = [t^{k-1}/(1 - t)³] e^{-2 Σ_{j=1}^{k-1} t^j/j} ∫_0^t k e^{2 Σ_{j=1}^{k-1} s^j/j} ds.  (3.24)

We see that g has a third-order singularity at t = 1. We proceed to "unzip" this
singularity to reveal the answer to our problem.
We have the following proposition which connects the limiting behavior of H_n
with that of S_n.

Proposition 3.1.

  lim_{n→∞} H_n^{(k)}/n = ℓ

if and only if

  lim_{n→∞} S_n^{(k)}/n² = ℓ/2.

Proof. The proof is immediate from (3.11). □
And we have the following proposition which connects the limiting behavior of
S_n with the singularity in g at t = 1.

Proposition 3.2. If

  lim_{n→∞} S_n^{(k)}/n² = L,

then

  lim_{t→1} (1 - t)³ g(t) = 2L.

Proof. Since lim_{n→∞} S_n/n² = L, we also have lim_{n→∞} S_n/(n(n - 1)) = L. Let ε > 0.
Choose n_0 such that |S_n/(n(n - 1)) - L| ≤ ε, for n > n_0. Then recalling (3.15), we have

  Σ_{n=0}^{n_0} S_n t^n + (L - ε) Σ_{n=n_0+1}^∞ n(n - 1)t^n ≤ g(t) ≤ Σ_{n=0}^{n_0} S_n t^n + (L + ε) Σ_{n=n_0+1}^∞ n(n - 1)t^n.  (3.25)

Now

  Σ_{n=0}^∞ n(n - 1)t^n = t² (Σ_{n=0}^∞ t^n)″ = t² (1/(1 - t))″ = 2t²/(1 - t)³,

so

  Σ_{n=n_0+1}^∞ n(n - 1)t^n = 2t²/(1 - t)³ - Σ_{n=0}^{n_0} n(n - 1)t^n.

Substituting this latter equality in (3.25), multiplying by (1 - t)³, and letting t → 1,
we obtain

  2L - 2ε ≤ lim inf_{t→1} (1 - t)³ g(t) ≤ lim sup_{t→1} (1 - t)³ g(t) ≤ 2L + 2ε.

As ε > 0 is arbitrary, the proposition follows. □


In order to exploit Propositions 3.1 and 3.2, we will establish the existence of the
limit lim_{n→∞} S_n/n².

Proposition 3.3. lim_{n→∞} S_n^{(k)}/n² exists.
Proof. Rewriting the recursion equation for S_n in (3.13) so that only S_n appears on
the left hand side, then dividing both sides by n² and subtracting S_{n-1}/(n - 1)² from both
sides, we have

  S_n/n² - S_{n-1}/(n - 1)² = k/n² + S_{n-1}/n² - S_{n-1}/(n - 1)² + 2S_{n-k}/(n²(n - k + 1))
    = k/n² - [(2n - 1)/(n²(n - 1)²)] S_{n-1} + 2S_{n-k}/(n²(n - k + 1))
    = k/n² - [(2n - 1)/(n²(n - 1)²)] S_{n-1} + 2S_{n-1}/(n²(n - k + 1)) - [2/(n²(n - k + 1))](H_{n-k+1} + ⋯ + H_{n-1})
    = k/n² + [((2k - 5)n + 3 - k)/(n²(n - 1)²(n - k + 1))] S_{n-1} - [2/(n²(n - k + 1))](H_{n-k+1} + ⋯ + H_{n-1}).  (3.26)

As already noted, from the definitions, we have H_l ≤ l and S_l ≤ (1/2)l(l + 1). Thus,
there exists a C > 0 such that

  |((2k - 5)n + 3 - k)/(n²(n - 1)²(n - k + 1))| S_{n-1} ≤ C/n² and
  [2/(n²(n - k + 1))](H_{n-k+1} + ⋯ + H_{n-1}) ≤ C/n².  (3.27)

This shows that the right hand side of (3.26) is O(1/n²) and thus so is the left hand
side. Consequently, the telescopic series Σ_{n=2}^∞ (S_n/n² - S_{n-1}/(n - 1)²) is convergent. Since

  S_n/n² = Σ_{j=2}^n (S_j/j² - S_{j-1}/(j - 1)²),

we conclude that lim_{n→∞} S_n/n² exists. □
By Propositions 3.1 and 3.3, ℓ := lim_{n→∞} H_n/n exists. Then by Propositions 3.1
and 3.2 (with L = ℓ/2), it follows that

  lim_{t→1} (1 - t)³ g(t) = ℓ.

However, from the explicit formula for g in (3.24), we have

  lim_{t→1} (1 - t)³ g(t) = k e^{-2 Σ_{j=1}^{k-1} 1/j} ∫_0^1 e^{2 Σ_{j=1}^{k-1} s^j/j} ds = p_k.

Thus, ℓ = p_k, completing the proof of (3.3).
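The constant p_k is now explicit and easy to evaluate numerically. The following sketch (our own illustration) applies the trapezoidal rule to the integral; for k = 2 the integral can be done in closed form, giving p_2 = 1 - e^{-2}.

    import math

    def p(k, m=100_000):
        # p_k = k e^{-2 sum_{j=1}^{k-1} 1/j} * integral_0^1 e^{2 sum_{j=1}^{k-1} s^j/j} ds
        def f(s):
            return math.exp(2 * sum(s**j / j for j in range(1, k)))
        h = 1.0 / m
        integral = h * (0.5 * f(0.0) + sum(f(i * h) for i in range(1, m)) + 0.5 * f(1.0))
        return k * math.exp(-2 * sum(1.0 / j for j in range(1, k))) * integral

    print(p(2), 1 - math.exp(-2))  # both about 0.864664
    print(p(3))                    # about 0.8237, the value quoted in Exercise 3.2(d)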


We now turn to the proof of (3.4). We derive a formula by the method used to
obtain (3.8). Recall the discussion preceding (3.8). Note that conditioned on B_j
being chosen on the first step, the final state of the j - 1 molecules to the left of B_j
and the final state of the n + 1 - j - k molecules to the right of B_j are independent
of one another. Let M_{j-1;k;1} and M_{n+1-j-k;k;2} be independent random variables
distributed according to the distributions of M_{j-1;k} and M_{n+1-j-k;k}, respectively.
Then similar to (3.8), we have

  E(M_{n;k}² | B_j selected first) = E(k + M_{j-1;k;1} + M_{n+1-j-k;k;2})²
    = k² + L_{j-1} + L_{n+1-j-k} + 2kH_{j-1} + 2kH_{n+1-j-k} + 2H_{j-1}H_{n+1-j-k},  (3.28)

where the last term comes from the fact that the independence gives

  EM_{j-1;k;1} M_{n+1-j-k;k;2} = EM_{j-1;k;1} EM_{n+1-j-k;k;2}.

Thus, similar to the passage from (3.8) to (3.9), we have

  L_n = k² + (2/(n - k + 1)) Σ_{j=0}^{n-k} L_j + (4k/(n - k + 1)) Σ_{j=0}^{n-k} H_j + (2/(n - k + 1)) Σ_{j=0}^{n-k} H_j H_{n-k-j},
    for n ≥ k.  (3.29)

We simplify the above recursion relation by defining

  R_n = Σ_{j=0}^n L_j.

Of course, we have L_n = 0, for n = 0, ..., k - 1, and thus,

  R_n = 0, for n = 0, ..., k - 1.  (3.30)

Recalling (3.10), we can now rewrite (3.29) in the form

  R_n = R_{n-1} + k² + (2/(n - k + 1)) R_{n-k} + (4k/(n - k + 1)) S_{n-k} + (2/(n - k + 1)) Σ_{j=0}^{n-k} H_j H_{n-k-j}, n ≥ k.  (3.31)
Proposition 3.4.

  lim_{n→∞} (1/n³) Σ_{j=0}^{n-k} H_j^{(k)} H_{n-k-j}^{(k)} = p_k²/6.

Proof. Let ε > 0. Since lim_{n→∞} H_n/n = p_k, we can find an n_ε such that (p_k - ε)n ≤
H_n ≤ (p_k + ε)n, for n > n_ε. Thus

  (p_k - ε)² Σ_{n_ε<j<n-n_ε-k} j(n - k - j) + Σ_{0≤j≤n_ε or n-n_ε-k≤j≤n-k} H_j H_{n-k-j}
    ≤ Σ_{j=0}^{n-k} H_j H_{n-k-j}
    ≤ (p_k + ε)² Σ_{n_ε<j<n-n_ε-k} j(n - k - j) + Σ_{0≤j≤n_ε or n-n_ε-k≤j≤n-k} H_j H_{n-k-j}.  (3.32)

Since H_j ≤ j, for all j, we have

  Σ_{0≤j≤n_ε or n-n_ε-k≤j≤n-k} H_j H_{n-k-j} ≤ 2(n_ε + 1)n_ε n.  (3.33)

(There are 2(n_ε + 1) summands on the left hand side of (3.33), and each summand,
H_j H_{n-k-j}, is less than or equal to n_ε n.) Using the identity Σ_{j=1}^n j² = (1/6)n(n + 1)(2n + 1), we have

  Σ_{1≤j<n-n_ε-k} j(n - k - j) = (n - k) Σ_{1≤j<n-n_ε-k} j - Σ_{1≤j<n-n_ε-k} j²
    = (1/2)(n - k)(n - n_ε - k - 1)(n - n_ε - k)
      - (1/6)(n - n_ε - k - 1)(n - n_ε - k)(2(n - n_ε - k - 1) + 1) = (1/6)n³ + o(n³), as n → ∞.  (3.34)

Of course,

  Σ_{1≤j≤n_ε} j(n - k - j) ≤ n Σ_{1≤j≤n_ε} j ≤ (1/2)n n_ε(n_ε + 1).  (3.35)

From (3.32)-(3.35), we conclude that

  (1/6)(p_k - ε)² ≤ lim inf_{n→∞} (1/n³) Σ_{j=0}^{n-k} H_j H_{n-k-j} ≤ lim sup_{n→∞} (1/n³) Σ_{j=0}^{n-k} H_j H_{n-k-j} ≤ (1/6)(p_k + ε)²,

which completes the proof, since ε > 0 is arbitrary. □


We can rewrite (3.31) as

  R_n = [(n - k + 3)/(n - k + 1)] R_{n-1} + k² - [2/(n - k + 1)](L_{n-k+1} + ⋯ + L_{n-1})
    + (4k/(n - k + 1)) S_{n-k} + (2/(n - k + 1)) Σ_{j=0}^{n-k} H_j H_{n-k-j}.

Since L_j ≤ j² and S_j ≤ (1/2)j(j + 1), we conclude from Proposition 3.4 that R_n
satisfies an equation of the form

  R_n = [(n - k + 3)/(n - k + 1)] R_{n-1} + W_n, where W_n satisfies lim_{n→∞} W_n/n² = p_k²/3.  (3.36)

In Exercise 3.1 the reader is asked to show that if for some n_0, the positive sequence
{R̂_n}_{n=n_0}^∞ satisfies R̂_n ≤ [(n - k + 3)/(n - k + 1)]R̂_{n-1} + cn² (respectively, R̂_n ≥ [(n - k + 3)/(n - k + 1)]R̂_{n-1} + cn²),
then lim sup_{n→∞} R̂_n/n³ ≤ c (respectively, lim inf_{n→∞} R̂_n/n³ ≥ c). Using this with (3.36), we conclude
that

  lim_{n→∞} R_n/n³ = p_k²/3.  (3.37)

Writing (3.31) in the form

  L_n = k² + (2/(n - k + 1)) R_{n-k} + (4k/(n - k + 1)) S_{n-k} + (2/(n - k + 1)) Σ_{j=0}^{n-k} H_j H_{n-k-j}, n ≥ k,  (3.38)

dividing both sides of this equation by n², and using (3.37), Proposition 3.4, and the
fact that S_n is on the order n², we conclude that

  lim_{n→∞} L_n/n² = 2 p_k²/3 + 2 p_k²/6 = p_k².

This gives (3.4) and completes the proof of Theorem 3.1. □
Exercise 3.1. Show that if for some n_0, the positive sequence {R̂_n}_{n=n_0}^∞ satisfies
R̂_n ≤ [(n - k + 3)/(n - k + 1)]R̂_{n-1} + cn² (respectively, R̂_n ≥ [(n - k + 3)/(n - k + 1)]R̂_{n-1} + cn²),
then lim sup_{n→∞} R̂_n/n³ ≤ c (respectively, lim inf_{n→∞} R̂_n/n³ ≥ c).
Exercise 3.2. Any molecule that remains unbonded at the end of the nearest neighbor
k-tuple bonding process occurs in a maximal row of j unbonded molecules,
for some j ∈ [k - 1]. In the limit as n → ∞, what fraction of molecules ends up
in a maximal row of j unbonded molecules? Let's denote these fractions by q_{k;j},
j ∈ [k - 1]. Of course Σ_{j=1}^{k-1} q_{k;j} = 1 - p_k.
(a) Let k ≥ 3 and fix j ∈ [k - 1]. Consider the following bonding process:
implement the bonding of nearest neighbor k-tuples as described in the chapter.
When this process terminates, bond all the unbonded molecules that occur in
a maximal row of j unbonded molecules, but leave untouched all unbonded
molecules that occur in a maximal row of i unbonded molecules, for some
i ≠ j. Let M_{n;k,j} denote the number of bonded molecules at the end of
the process, and let H_n^{(k,j)} = EM_{n;k,j}. Let S_n^{(k,j)} = Σ_{i=0}^n H_i^{(k,j)}. Convince
yourself that {H_n^{(k,j)}}_{n=0}^∞ satisfies the recursion equation (3.9) and that it
satisfies the boundary condition (3.7) with one change, namely H_j^{(k,j)} = j,
instead of H_j^{(k,j)} = 0. Thus, {S_n^{(k,j)}}_{n=0}^∞ satisfies the recursion equation (3.13),
and in place of the boundary condition (3.12), it satisfies the boundary condition
S_n^{(k,j)} = 0, n = 0, ..., j - 1; S_n^{(k,j)} = j, n = j, ..., k - 1.
(b) Let g_j(t) = Σ_{n=0}^∞ S_n^{(k,j)} t^n denote the generating function for {S_n^{(k,j)}}_{n=0}^∞.
Show that g_j solves the differential equation g_j′(t) = a(t)g_j(t) + b_j(t), where
a is as in (3.20) and

  b_j(t) = b(t) + [j(k - 1 - j)t^{j-1} + j(k - j)t^j - jt^{k-1}]/(1 - t)³,

with b as in (3.20).
(c) In particular, note that b_{k-1} = b; therefore, g_{k-1} satisfies the same differential
equation satisfied by g. Thus, (3.21) holds for g_{k-1}; that is,

  g_{k-1}(t) = g_{k-1}(ρ) e^{∫_ρ^t a(r) dr} + ∫_ρ^t b(s) e^{∫_s^t a(r) dr} ds, t ∈ (ρ, 1).

Use the fact that g_{k-1}(ρ) = (k - 1)ρ^{k-1} + O(ρ^k), as ρ → 0, along with (3.23)
to show that

  lim_{ρ→0} g_{k-1}(ρ) e^{∫_ρ^t a(r) dr} = (k - 1) [t^{k-1}/(1 - t)³] e^{-2(t + t²/2 + ⋯ + t^{k-1}/(k-1))}.

(d) Use (c) to show that

  q_{k;k-1} = (k - 1) e^{-2(1 + 1/2 + ⋯ + 1/(k-1))}.

In particular then, q_{3;2} = 2e^{-3} ≈ 0.0996 ≈ 0.100, and consequently q_{3;1} =
1 - p_3 - q_{3;2} ≈ 1 - 0.8237 - 0.0996 ≈ 0.077.
(e) It is well known that lim_{n→∞} (Σ_{r=1}^n 1/r - log n) exists; the limit is called Euler's
constant and is denoted by γ. One has γ ≈ 0.5772. For a proof, see, for
example, [25]. Show that

  q_{k;k-1} ~ (1/(k - 1)) e^{-2γ}, as k → ∞.
(f) For j ∈ [k - 2], one obtains

  g_j(t) = g_j(ρ) e^{∫_ρ^t a(r) dr} + ∫_ρ^t b_j(s) e^{∫_s^t a(r) dr} ds, t ∈ (ρ, 1).  (3.39)

Show that since g_j(ρ) = jρ^j + O(ρ^{j+1}), as ρ → 0, one has

  lim_{ρ→0} g_j(ρ) e^{∫_ρ^t a(r) dr} = ∞.

On the other hand, since b_j appears instead of b_{k-1} = b, show that one also has

  lim_{ρ→0} ∫_ρ^t b_j(s) e^{∫_s^t a(r) dr} ds = -∞.

You are invited to show that the appropriate terms in g_j(ρ) e^{∫_ρ^t a(r) dr} and
∫_ρ^t b_j(s) e^{∫_s^t a(r) dr} ds cancel each other out and to obtain a finite limiting
expression as ρ → 0 on the right hand side of (3.39). This limiting expression
is then also g_j(t). One then has lim_{t→1} (1 - t)³ g_j(t) = p_k + q_{k;j}, which gives
an explicit formula for q_{k;j}. The above analysis gets more involved the smaller
j is. Try it first for j = k - 2.

Chapter Notes

The calculation of (3.1) in the case k D 2 goes back to an article by the Nobel Prize
winning chemist Flory in 1939 [21]. The problem was rediscovered by Page, who
obtained the asymptotic behavior for the mean and variance in the case k D 2 [28].
The method used there does not generalize to k > 2. Theorem 3.1 seems to be new.
A continuous space version of this problem was considered by Rényi [31].
Chapter 4
The Arcsine Laws for the One-Dimensional
Simple Symmetric Random Walk

The simple, symmetric random walk {S_n}_{n=0}^∞ on Z starts at step n = 0 at 0 ∈ Z and
at each successive step jumps one unit to the right or left, each with probability 1/2.
The random walk is called "simple" because the sizes of its jumps are restricted to
the set {-1, 1}. One way to realize this random walk is as follows. Let {X_n}_{n=1}^∞
be an infinite sequence of independent, identically distributed random variables
distributed according to the Bernoulli distribution with parameter 1/2; that is,
P(X_j = 1) = P(X_j = -1) = 1/2. Now define S_0 = 0 and S_n = Σ_{j=1}^n X_j, n ≥ 1.
We begin with a fundamental fact about the simple, symmetric random walk
on Z.

Proposition 4.1.

  P(lim sup_{n→∞} S_n = ∞ and lim inf_{n→∞} S_n = -∞) = 1.  (4.1)

Remark 1. A moment's thought shows that (4.1) is equivalent to the statement that
the random walk is recurrent; that is, with probability one, {S_n}_{n=0}^∞ visits every site
in Z infinitely often.
Remark 2. One can consider a simple, symmetric random walk {S_n}_{n=0}^∞ on Z^d, the
d-dimensional lattice; at each step it jumps in one of the 2d directions with
probability 1/(2d). Again, the random walk is called recurrent if with probability one every
site is visited infinitely often. It is called transient if P(lim_{n→∞} |S_n| = ∞) = 1. In
1921, G. Pólya proved the quite surprising result that this random walk is recurrent
in two dimensions but transient in three or more dimensions. For a proof of this, see,
for example, [15].
Proof. By Remark 1 above, to prove the proposition, it suffices to prove that with
probability one, the random walk visits every site in Z infinitely often. Let p denote
the probability that the random walk {S_n}_{n=0}^∞ ever returns to its starting point 0.
We will show that p = 1. Let N_0 denote the number of times the random walk is
at 0 after time n = 0. Then of course, P(N_0 = 0) = 1 - p. Now let's calculate
P(N_0 = 1). In order to have N_0 = 1, the random walk must return to 0 and then
never return to 0 again. The probability of returning to 0 is p. If the random walk
returns to 0, it continues independently of everything that has already transpired.
Thus, conditioned on returning to 0, the probability that the random walk does not
return to 0 again is 1 - p. So P(N_0 = 1) = p(1 - p). Continuing with this line of
reasoning, we obtain

  P(N_0 = n) = p^n (1 - p), n = 0, 1, ....

If p = 1, it follows from the above reasoning that P(N_0 = ∞) = 1; that
is, with probability one, the random walk visits 0 infinitely often. If p ∈ (0, 1),
then the above calculation shows that N_0 is distributed according to the geometric
distribution with parameter p. For p ∈ (0, 1), the expected value EN_0 of N_0 is
given by

  EN_0 = Σ_{n=0}^∞ n P(N_0 = n) = Σ_{n=0}^∞ n p^n (1 - p) = p(1 - p) Σ_{n=0}^∞ n p^{n-1}
    = p(1 - p) (d/dp) Σ_{n=0}^∞ p^n = p(1 - p) (1/(1 - p))′ = p/(1 - p).  (4.2)

(The term by term differentiation above is permitted because for any p_0 < 1, the
series is uniformly absolutely convergent over p ∈ [0, p_0].) Of course, if p = 1,
then EN_0 = ∞. Thus, the formula for EN_0 in (4.2) also holds if p = 1.
We now calculate EN_0 in a different way. Let 1_{{S_n=0}} denote the indicator
random variable that is equal to 1 if S_n = 0 and is equal to 0 otherwise. Then
N_0, the number of times the random walk returns to 0, can be represented as

  N_0 = Σ_{n=1}^∞ 1_{{S_n=0}}.

By the linearity of the expectation and the nonnegativity of the summands, we
conclude that

  EN_0 = Σ_{n=1}^∞ P(S_n = 0),  (4.3)

since E1_{{S_n=0}} = 0 · P(S_n ≠ 0) + 1 · P(S_n = 0) = P(S_n = 0).
Since the random walk starts at 0, it can only return to 0 at even times; thus,
P(S_{2n+1} = 0) = 0. Since the random walk has two equally likely choices at each
step, there are 2^{2n} equally likely paths that the random walk can traverse during its
first 2n steps. Now one has S_{2n} = 0 if and only if from among the first 2n jumps, n
of them were to the right and n of them were to the left. There are \binom{2n}{n} such paths;
thus,

  P(S_{2n} = 0) = \binom{2n}{n}/2^{2n}.  (4.4)

Using Stirling's formula, namely, n! ~ n^n e^{-n} √(2πn) as n → ∞, we have

  \binom{2n}{n}/2^{2n} = (2n)!/(2^{2n}(n!)²) ~ (2n)^{2n} e^{-2n} √(4πn)/(2^{2n} n^{2n} e^{-2n} (2πn)) = 1/√(πn), as n → ∞.  (4.5)

Since Σ_{n=1}^∞ 1/√(πn) = ∞, it follows from (4.3)-(4.5) that EN_0 = ∞. In light of (4.2),
we conclude that p = 1.
We have shown that with probability one, the random walk returns to 0.
Upon returning to 0, the random walk continues independently of everything that
transpired previously; thus, in fact, with probability one, the random walk visits 0
infinitely often. From this, it is easy to show that in fact with probability one the
random walk visits every site infinitely often. We leave this as Exercise 4.1. □
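The divergence that drives this argument is easy to see numerically; the sketch below (ours, not part of the text) compares the exact probability (4.4) with the asymptotics (4.5).

    import math

    # P(S_{2n} = 0) = binom(2n, n) / 2^{2n} decays like 1/sqrt(pi n); since
    # sum_n 1/sqrt(pi n) diverges, EN_0 is infinite, which forces p = 1 above.
    for n in (10, 100, 1000, 10000):
        exact = math.comb(2 * n, n) / 4**n
        print(n, exact, 1 / math.sqrt(math.pi * n))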
Define

  T_0 = inf{n > 0 : S_n = 0}.

The random time T_0 is called the first return time to 0. By Proposition 4.1, it follows
that P(T_0 < ∞) = 1. However, perhaps surprisingly, one has ET_0 = ∞; the reader
is guided through a proof of this in Exercise 4.2. This result suggests that there is
quite some tendency for the random walk to take a long time to return to 0. In this
chapter we present two results which give vivid expression to this phenomenon.
The arcsine distribution will figure prominently in the results of this chapter. The
distribution function for this distribution is defined by

  F_arcsin(x) = (2/π) arcsin √x, 0 ≤ x ≤ 1.

The corresponding density function f_arcsin(x) = F_arcsin′(x) is given by

  f_arcsin(x) = 1/(π√(x(1 - x))), 0 < x < 1.

Our first theorem concerns the random time

  L_0^{(2n)} = max{k ≤ 2n : S_k = 0},

which is the last return time to 0 up to step 2n. By parity considerations, L_0^{(2n)} can
take on only even values.

Theorem 4.1.

  P(L_0^{(2n)} = 2k) = \binom{2k}{k}\binom{2n-2k}{n-k}/2^{2n}, k = 0, 1, ..., n.  (4.6)

Furthermore,

  lim_{n→∞} P(L_0^{(2n)}/2n ≤ x) = (2/π) arcsin √x, 0 ≤ x ≤ 1.  (4.7)
Remark. This theorem highlights the tendency of the random walk to take a long
time to return to 0. Indeed, since the density f_arcsin(x) blows up at x = 0, 1, it
follows from (4.7) that for large n the most likely epochs k for the last visit to 0 up
to time 2n are those satisfying k = o(n) or k = 2n - o(n), that is, those epochs
at the very beginning or at the very end of the trajectory. Since (2/π) arcsin √(1/2) = 1/2,
from (4.7) it also follows that for large n, there is a probability of about 1/2 that a
random walk trajectory of 2n steps will never return to 0 during the second half of
its life.
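Since (4.6) is exact for every n, the arcsine approximation (4.7) can be tested directly; here is a short sketch (our illustration):

    import math

    def last_zero_cdf(n, x):
        # P(L_0^{(2n)} / 2n <= x), summing the exact distribution (4.6)
        return sum(math.comb(2 * k, k) * math.comb(2 * (n - k), n - k) / 4**n
                   for k in range(int(x * n) + 1))

    n = 500
    for x in (0.1, 0.25, 0.5, 0.9):
        print(x, last_zero_cdf(n, x), (2 / math.pi) * math.asin(math.sqrt(x)))
        # the two columns agree to a few decimal places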
Our second theorem concerns the random variable O_2n^+, which should be thought
of as the number of steps k ∈ [2n] at which the random walk is positive
(or nonnegative). Of course, the number of steps between 1 and 2n that the random
walk is positive is usually not equal to the number of steps that it is nonnegative.
In order to obtain an exact result in closed form for all n, we need to work in a
symmetric setting. Therefore, if the random walk is equal to 0 at some step 2k, we
classify that step as "positive" if the previous step was positive and "negative" if the
previous step was negative. That is,

  O_2n^+ = |{k ∈ [2n] : S_k > 0, or S_k = 0 and S_{k-1} > 0}|.

We call O_2n^+ the occupation time of the positive half line up to time 2n. Then O_2n^+/2n
gives the fraction of steps among the first 2n steps that the random walk is in the
positive half line. Note that by parity considerations, O_2n^+ can only take on even
values.

Theorem 4.2.

  P(O_2n^+ = 2k) = \binom{2k}{k}\binom{2n-2k}{n-k}/2^{2n}, k = 0, 1, ..., n.  (4.8)

Furthermore,

  lim_{n→∞} P(O_2n^+/2n ≤ x) = (2/π) arcsin √x, 0 ≤ x ≤ 1.  (4.9)
[Fig. 4.1 A random walk path of length 17]

Remark 1. Since the density f_arcsin(x) takes on its minimum at x = 1/2, and since
it blows up at x = 0, 1, it follows that for large n the most likely percentages
of time that a random walk trajectory is nonnegative are around 0 % and 100 %,
while the least likely percentage is around 50 %! To put it in a different way, if two
players bet a dollar each on a succession of fair coin flips, then after a long time it
is overwhelmingly more likely that one of the players was leading almost the whole
time than that each player was leading about half the time. This result even more
vividly highlights the tendency of the random walk to take a long time to return to 0.
Remark 2. Let O_2n^0 = |{k ∈ [n] : S_{2k} = 0}| denote the number of visits to 0 of the
random walk up to step 2n. It is not hard to show that the random variable O_2n^0/2n,
denoting the fraction of steps up to 2n at which the random walk is at 0, converges
to 0 in probability; that is,

  lim_{n→∞} P(O_2n^0/2n > ε) = 0, for all ε > 0.  (4.10)

We leave this as Exercise 4.3. In light of this, it follows that (4.9) would also hold if
we had defined O_2n^+ in an asymmetric fashion as the number of steps up to 2n for
which the random walk is nonnegative: |{k ∈ [2n] : S_k ≥ 0}|.
Our approach to proving the above two theorems will be completely combinatorial
rather than probabilistic. Generating functions will play a seminal role.
A random walk path of length m is a path {x_j}_{j=0}^m which satisfies

  x_0 = 0;
  x_j - x_{j-1} = ±1, j ∈ [m].  (4.11)

See Fig. 4.1. Since a random walk path has two choices at each step, there are 2^m
random walk paths of length m. The probability that the simple, symmetric random
walk behaves in a certain way up until time m is simply the number of random walk
paths that behave in that certain way divided by 2^m.
[Fig. 4.2 A Dyck path of length 16]

Our basic combinatorial object upon which our results will be developed is the
Dyck path. A Dyck path of length 2n is a nonnegative random walk path {x_j}_{j=0}^{2n} of
length 2n which returns to 0 at step 2n; that is, in addition to satisfying (4.11) with
m = 2n, it also satisfies the following conditions:

  x_j ≥ 0, j ∈ [2n];
  x_{2n} = 0.  (4.12)

See Fig. 4.2. We use generating functions to determine the number of Dyck paths.
Let d_n denote the number of Dyck paths of length 2n. We also define d_0 = 1.

Proposition 4.2. The number of Dyck paths of length 2n is given by

  d_n = (1/(n + 1)) \binom{2n}{n}, n ≥ 1.

Remark. The number C_n := (1/(n + 1)) \binom{2n}{n} is known as the nth Catalan number.
Proof. We derive a recursion formula for {d_n}_{n=0}^∞. A primitive Dyck path of length
2k is a Dyck path {x_j}_{j=0}^{2k} of length 2k which satisfies x_j > 0 for j = 1, ..., 2k - 1.
Let v_k denote the number of primitive Dyck paths of length 2k. Every Dyck path
of length 2n returns to 0 for the first time at 2k, for some k ∈ [n]. Consider a Dyck
path of length 2n that returns to 0 for the first time at 2k. The part of the path from
time 0 to time 2k is a primitive Dyck path of length 2k, and the part of the path from
time 2k to 2n is an arbitrary Dyck path of length 2n - 2k. (In Fig. 4.2, the Dyck
path of length 16 is composed of an initial primitive Dyck path of length 6, followed
by a Dyck path of length 10.) This reasoning yields the recurrence relation

  d_n = Σ_{k=1}^n v_k d_{n-k}, n ≥ 1.  (4.13)

Now we claim that

  v_k = d_{k-1}, k ≥ 1.  (4.14)

Indeed, a primitive Dyck path {x_j}_{j=0}^{2k} must satisfy x_1 = 1, x_j ≥ 1, for j ∈ [2k - 2],
x_{2k-1} = 1, x_{2k} = 0. Thus, letting y_j = x_{j+1} - 1, 0 ≤ j ≤ 2k - 2, it follows
that {y_j}_{j=0}^{2k-2} is a Dyck path. Of course, this analysis can be reversed. This shows
that there is a 1-1 correspondence between primitive Dyck paths of length 2k and
arbitrary Dyck paths of length 2(k - 1), proving (4.14). From (4.13) and (4.14) we
obtain the Dyck path recursion formula

  d_n = Σ_{k=1}^n d_{k-1} d_{n-k}.  (4.15)

Let

  D(x) = Σ_{n=0}^∞ d_n x^n  (4.16)

be the generating function for {d_n}_{n=0}^∞. Since there are 2^{2n} random walk paths of
length 2n, we have the trivial estimate d_n ≤ 2^{2n} = 4^n. Thus, the power series
defining D(x) is absolutely convergent for |x| < 1/4. The product of two absolutely
convergent power series Σ_{n=0}^∞ a_n x^n and Σ_{n=0}^∞ b_n x^n is Σ_{n=0}^∞ c_n x^n, where
c_n = Σ_{j=0}^n a_j b_{n-j}. Thus, if in (4.15), the term d_{k-1} were d_k instead, and the summation
started from k = 0 instead of from k = 1, then we would have had D²(x) = D(x).
As it is, we "correct" for these deficiencies by multiplying by x and adding 1: it is
easy to check that (4.16) and (4.15) give

  D(x) = xD²(x) + 1.  (4.17)

Solving this quadratic equation in D gives D(x) = (1 ± √(1 - 4x))/(2x). Since we know
from (4.16) that D(0) = 1, we conclude that the generating function for {d_n}_{n=0}^∞ is
given by

  D(x) = (1 - √(1 - 4x))/(2x), |x| < 1/4.  (4.18)

Now (1 - 4x)^{1/2}|_{x=0} = 1, ((1 - 4x)^{1/2})′|_{x=0} = -2, and

  (1/n!)((1 - 4x)^{1/2})^{(n)}|_{x=0} = -(2^n/n!) Π_{j=1}^{n-1} (2j - 1) = -(2^n (2n - 2)!)/(n! 2^{n-1} (n - 1)!)
    = -(2/(2n - 1)) \binom{2n - 1}{n}, for n ≥ 2;
thus, the Taylor series for √(1 - 4x) is given by

  √(1 - 4x) = 1 - 2x - Σ_{n=2}^∞ (2/(2n - 1)) \binom{2n - 1}{n} x^n.  (4.19)

The coefficient of x^{n+1} in (4.19) is -(2/(2n + 1)) \binom{2n + 1}{n + 1} = -(2/(n + 1)) \binom{2n}{n}. Using this along
with (4.18) and (4.19), we conclude that

  D(x) = Σ_{n=0}^∞ (1/(n + 1)) \binom{2n}{n} x^n, |x| < 1/4.  (4.20)

From (4.20) and (4.16) it follows that d_n = (1/(n + 1)) \binom{2n}{n}. □
The proof of the proposition gives us the following corollary.

Corollary 4.1. The generating function for the sequence {d_n}_{n=0}^∞, which counts
Dyck paths, is given by

  D(x) = (1 - √(1 - 4x))/(2x), |x| < 1/4.
Let w_n denote the number of nonnegative random walk paths of length 2n. The
difference between such a path and a Dyck path is that for such a path there is no
requirement that it return to 0 at time 2n. We also define w_0 = 1. We now calculate
{w_n}_{n=0}^∞ by deriving a recursion formula which involves {d_n}_{n=0}^∞.

Proposition 4.3. The number w_n of nonnegative random walk paths of length 2n is
given by

  w_n = \binom{2n}{n}, n ≥ 1.  (4.21)

Remark. The number of random walk paths of length 2n that return to 0 at time 2n
is also given by \binom{2n}{n}, since to obtain such a path, we must choose n jumps of +1 and
n jumps of -1. Thus, we have the following somewhat surprising corollary.

Corollary 4.2.

  P(S_1 ≥ 0, ..., S_{2n} ≥ 0) = P(S_{2n} = 0).

Proof of Proposition 4.3. Of course every nonnegative random walk path of length
2n + 2, when restricted to its first 2n steps, constitutes a nonnegative random walk
path of length 2n. A nonnegative random walk path of length 2n which does not
return to 0 at time 2n, that is, which is not a Dyck path, can be extended in four
different ways to create a nonnegative random walk path of length 2n + 2. On the
other hand, a nonnegative random walk path of length 2n which is a Dyck path can
only be extended in two different ways to create a nonnegative random walk path of
length 2n + 2. Thus, we have the relation

  w_{n+1} = 4(w_n - d_n) + 2d_n = 4w_n - 2d_n, n ≥ 0.  (4.22)

Let

  W(x) = Σ_{n=0}^∞ w_n x^n

be the generating function for {w_n}_{n=0}^∞. As with the power series defining D(x),
it is clear that the power series defining W(x) converges for |x| < 1/4. Multiply
equation (4.22) by x^n and sum over n from 0 to ∞. On the left side we obtain
Σ_{n=0}^∞ w_{n+1} x^n = (1/x)(W(x) - 1), and on the right hand side we obtain 4W(x) - 2D(x).
From the resulting equation, (1/x)(W(x) - 1) = 4W(x) - 2D(x), we obtain

  W(x) = (1 - 2xD(x))/(1 - 4x).  (4.23)

Substituting for D(x) in (4.23) from Corollary 4.1, we obtain

  W(x) = 1/√(1 - 4x), |x| < 1/4.

We have W(0) = 1, and for n ≥ 1,

  W^{(n)}(0)/n! = (1/n!)((1 - 4x)^{-1/2})^{(n)}|_{x=0} = (2^n/n!) Π_{j=1}^n (2j - 1) = (2^n/n!) (2n)!/(2^n n!) = \binom{2n}{n}.

Thus the Taylor series for W(x) is given by

  W(x) = Σ_{n=0}^∞ \binom{2n}{n} x^n,

and we conclude that w_n = \binom{2n}{n}. □
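Proposition 4.3 invites a brute-force check for small n; the sketch below (ours) enumerates all 2^{2n} paths:

    import math
    from itertools import product

    def count_nonnegative(n):
        # count random walk paths of length 2n all of whose partial sums are >= 0
        count = 0
        for steps in product((1, -1), repeat=2 * n):
            s = 0
            for x in steps:
                s += x
                if s < 0:
                    break
            else:
                count += 1
        return count

    for n in range(1, 8):
        print(n, count_nonnegative(n), math.comb(2 * n, n))  # the two counts agree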
Armed with Propositions 4.2 and 4.3, we can give a quick proof of (4.6).

Proof of Theorem 4.1. By the remark after Proposition 4.3, it follows that (4.6)
holds for k = n. So we now assume that k ∈ {0, 1, ..., n - 1}. Given a random
walk path {x_j}_{j=0}^l, we define the negative of the path to be the path {-x_j}_{j=0}^l.
If a random walk path of length 2n satisfies L_0^{(2n)} = 2k, then its first 2k steps
constitute a random walk path that returns to 0 at time 2k, and its last 2n - 2k
steps constitute either a random walk path that is strictly positive or the negative
of such a path. As noted in the remark after Proposition 4.3, there are \binom{2k}{k} random
walk paths of length 2k that return to 0 at time 2k. How many strictly positive
random walk paths of length 2n - 2k are there? Let {x_j}_{j=0}^{2n-2k} be such a path. Then
x_1 = 1, and by parity considerations, x_{2n-2k} ≥ 2. Consider now the part of the
path from time 1 to time 2n - 2k. If we relabel and subtract one, y_j = x_{j+1} - 1,
j = 0, 1, ..., 2n - 2k - 1, then we obtain a nonnegative random walk path of length
2n - 2k - 1. By defining y_{2n-2k} = y_{2n-2k-1} ± 1, we can extend this path in two
ways to get a nonnegative random walk path of length 2n - 2k. This reasoning
shows that there is a two-to-one correspondence between nonnegative random walk
paths of length 2n - 2k and strictly positive random walk paths of length 2n - 2k.
We know that there are w_{n-k} = \binom{2n-2k}{n-k} nonnegative random walk paths of length
2n - 2k; thus, we conclude that the number of strictly positive random walk paths
of length 2n - 2k is equal to (1/2)\binom{2n-2k}{n-k}. We conclude from the above analysis that
the number of random walk paths of length 2n that satisfy L_0^{(2n)} = 2k is equal to
\binom{2k}{k}\binom{2n-2k}{n-k}, from which (4.6) follows.
We now consider (4.7). In Exercise 4.4 the reader is asked to apply Stirling's
formula and show that for any ε > 0,

  \binom{2k}{k}\binom{2n-2k}{n-k}/2^{2n} ~ 1/(π√(k(n - k))), uniformly over εn ≤ k ≤ (1 - ε)n, as n → ∞.  (4.24)

Using (4.24) and (4.6), we have for 0 < a < b < 1

  P(a < L_0^{(2n)}/2n ≤ b) = Σ_{k=[na]+1}^{[nb]} \binom{2k}{k}\binom{2n-2k}{n-k}/2^{2n} ~ Σ_{k=[na]+1}^{[nb]} 1/(π√(k(n - k)))
    = (1/π) Σ_{k=[na]+1}^{[nb]} (1/n) 1/√((k/n)(1 - k/n)), as n → ∞.  (4.25)

But the last term on the right hand side of (4.25) is a Riemann sum for
(1/π) ∫_a^b 1/√(x(1 - x)) dx. Thus, letting n → ∞ in (4.25) gives

  lim_{n→∞} P(a < L_0^{(2n)}/2n ≤ b) = (1/π) ∫_a^b 1/√(x(1 - x)) dx = (2/π) arcsin √b - (2/π) arcsin √a,
    for 0 < a < b < 1,

which is equivalent to (4.7). This completes the proof of Theorem 4.1. □


We now turn to the proof of Theorem 4.2.

Proof of Theorem 4.2. We need to prove (4.8). Of course, (4.9) follows from (4.8)
just like (4.7) followed from (4.6). Recalling the symmetric definition of O_2n^+, for
the purpose of this proof, we will refer to S_{2k} as "positive" if either S_{2k} > 0 or
S_{2k} = 0 and S_{2k-1} > 0. Let c_{n,k} denote the number of random walk paths of length
2n which are positive at exactly 2k steps. Since there are 2^{2n} random walk paths of
length 2n, in order to prove (4.8), we need to prove that

  c_{n,k} = \binom{2k}{k} \binom{2n - 2k}{n - k}, k = 0, 1, ..., n.  (4.26)

By Proposition 4.3, we have c_{n,n} = \binom{2n}{n}, and by symmetry, c_{n,0} = \binom{2n}{n}; thus, (4.26)
holds for k = 0, n.
Consider now k ∈ [n - 1]. A random walk path that satisfies O_2n^+ = 2k
must return to 0 before step 2n. Consider the first return to 0. If the path was
positive before the first return to 0, then the first return to 0 must occur at step 2j, for
some j ∈ [k] (for otherwise, the path would be positive for more than 2k steps). If
the path was negative before the first return to 0, then the first return to 0 must occur
at step 2j, for some j ∈ [n - k] (for otherwise the path would be positive for fewer
than 2k steps). In light of these facts, and recalling that v_j = d_{j-1} is the number of
primitive Dyck paths of length 2j, it follows that for j ∈ [k], the number of random
walk paths of length 2n which start out positive, return to 0 for the first time at step
2j, and are positive for exactly 2k steps is equal to d_{j-1} c_{n-j,k-j}. Similarly, for
j ∈ [n - k], the number of random walk paths of length 2n which start out negative,
return to 0 for the first time at step 2j, and are positive for exactly 2k steps is equal
to d_{j-1} c_{n-j,k}. Thus, we obtain the recursion relation

  c_{n,k} = Σ_{j=1}^k d_{j-1} c_{n-j,k-j} + Σ_{j=1}^{n-k} d_{j-1} c_{n-j,k}, k ∈ [n - 1].  (4.27)

Let e_n := \binom{2n}{n}, n ≥ 0. As follows from the remark after Proposition 4.3, for
n ≥ 1, e_n is the number of random walk paths of length 2n that are equal to 0 at
step 2n. We derive a recursion formula for {e_n}_{n=0}^∞. A random walk path of length
2n which is equal to 0 at step 2n must return to 0 for the first time at step 2k, for
some k ∈ [n]. The number of random walk paths of length 2n which are equal to 0
at time 2n and which return to 0 for the first time at step 2k is equal to 2v_k e_{n-k} =
2d_{k-1} e_{n-k}. Consequently, we obtain the recursion formula

  e_n = Σ_{k=1}^n 2d_{k-1} e_{n-k}.  (4.28)

We can now prove (4.26) by considering (4.27) and (4.28) and applying induction.
To prove (4.26) we need to show that for all n ≥ 1,

  c_{n,k} = e_k e_{n-k}, for k = 0, 1, ..., n.  (4.29)

When n = 1, (4.29) clearly holds. We now assume that (4.29) holds for all n ≤ n_0
and prove that it also holds for n = n_0 + 1. When n = n_0 + 1 and k = 0 or
k = n_0 + 1, we already know that (4.29) holds. So we need to show that (4.29)
holds for n = n_0 + 1 and k ∈ [n_0]. Using (4.27) for the first equality, using the
inductive assumption for the second equality, and using (4.28) for the third equality,
we have

  c_{n_0+1,k} = Σ_{j=1}^k d_{j-1} c_{n_0+1-j,k-j} + Σ_{j=1}^{n_0+1-k} d_{j-1} c_{n_0+1-j,k}
    = Σ_{j=1}^k d_{j-1} e_{k-j} e_{n_0+1-k} + Σ_{j=1}^{n_0+1-k} d_{j-1} e_k e_{n_0+1-k-j}
    = (1/2) e_k e_{n_0+1-k} + (1/2) e_{n_0+1-k} e_k = e_k e_{n_0+1-k},  (4.30)

which proves that (4.29) holds for n = n_0 + 1 and completes the proof of
Theorem 4.2. □
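For small n, (4.26) can also be confirmed by direct enumeration with the symmetric convention for "positive" steps; a sketch (our illustration):

    import math
    from collections import Counter
    from itertools import product

    def positive_step_counts(n):
        # for each path of length 2n, count the steps k with S_k > 0,
        # or S_k = 0 and S_{k-1} > 0 (the symmetric convention above)
        tally = Counter()
        for steps in product((1, -1), repeat=2 * n):
            prev, s, pos = 0, 0, 0
            for x in steps:
                prev, s = s, s + x
                if s > 0 or (s == 0 and prev > 0):
                    pos += 1
            tally[pos] += 1
        return tally

    n = 5
    tally = positive_step_counts(n)
    for k in range(n + 1):
        print(2 * k, tally[2 * k],
              math.comb(2 * k, k) * math.comb(2 * (n - k), n - k))  # columns agree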
Exercise 4.1. This exercise completes the proof of Proposition 4.1. We proved that
with probability one, the simple, symmetric random walk on Z visits 0 infinitely
often.
(a) For fixed x ∈ Z, use the fact that with probability one the random walk visits
0 infinitely often to show that with probability one the random walk visits x
infinitely often. (Hint: Every time the process returns to 0, it has probability
(1/2)^{|x|} of moving directly to x in |x| steps.)
(b) Show that with probability one the random walk visits every x ∈ Z infinitely
often.

Exercise 4.2. In this exercise, you will prove that ET_0 = ∞, where T_0 is the first
return time to 0. We can consider the random walk starting from any j ∈ Z, rather
than just from 0. When we start the random walk from j, denote the corresponding
probabilities and expectations by P_j and E_j. Fix n ≥ 1 and consider starting the
random walk from some j ∈ {0, 1, ..., n}. Let T_{0,n} denote the first nonnegative
time that the random walk is at 0 or n.
(a) Define g(j) = E_j T_{0,n}. By analyzing what happens on the first step, show that
g solves the difference equation g(j) = 1 + (1/2)g(j + 1) + (1/2)g(j - 1), for
j = 1, ..., n - 1. Note that one has the boundary conditions g(0) = g(n) = 0.
(b) Use (a) to show that E_j T_{0,n} = j(n - j). (Hint: Write the difference equation
in the form g(j + 1) - g(j) = g(j) - g(j - 1) - 2.)
(c) In particular, (b) gives E_1 T_{0,n} = n - 1. From this, conclude that ET_0 = ∞.
Exercise 4.3. Prove (4.10): lim_{n→∞} P(O_2n^0/2n > ε) = 0, for all ε > 0. (Hint:
Represent O_2n^0 by O_2n^0 = Σ_{j=1}^{2n} 1_{{S_j=0}}, where 1_{{S_j=0}} is as in the proof of
Proposition 4.1. From this representation, show that lim_{n→∞} E(O_2n^0/2n) = 0. Conclude
from this that (4.10) holds.)

Exercise 4.4. Use Stirling's formula to prove (4.24). That is, show that for any
ε, δ > 0, there exists an n_{ε,δ} such that if n ≥ n_{ε,δ}, then

  1 - δ ≤ [\binom{2k}{k}\binom{2n-2k}{n-k}/2^{2n}] π√(k(n - k)) ≤ 1 + δ,

for all k satisfying εn ≤ k ≤ (1 - ε)n.


Exercise 4.5. If one considers a simple, symmetric random walk {S_k}_{k=0}^{2n} up to
time 2n, the probability of seeing any particular one of the 2^{2n} random walk paths
of length 2n is equal to 2^{-2n}. Recall from the remark after Proposition 4.3 that there
are \binom{2n}{n} random walk paths of length 2n that return to 0 at time 2n. It follows from
symmetry that conditioned on S_{2n} = 0, the probability of seeing any particular one
of the \binom{2n}{n} random walk paths of length 2n which return to 0 at time 2n is equal
to 1/\binom{2n}{n}.
(a) Let p ∈ (0, 1)\{1/2} and consider the simple random walk on Z which jumps
one unit to the right with probability p and one unit to the left with probability
1 - p. Denote the random walk by {S_n^{(p)}}_{n=0}^∞. Consider this random walk up
to time 2n. For each particular random walk path of length 2n, calculate the
probability of seeing this path. The answer now depends on the path.
(b) Conditioned on S_{2n}^{(p)} = 0, show that the probability of seeing any particular one
of the \binom{2n}{n} random walk paths of length 2n which return to 0 at time 2n is equal
to 1/\binom{2n}{n}.

Exercise 4.6. Let 0 ≤ j ≤ m. Consider the random walk {S_n^{(p)}}_{n=0}^∞ as in
Exercise 4.5, with p ∈ (0, 1), but starting from j, and denote probabilities by P_j.
Let T_{0,m}^{(p)} denote the first nonnegative time that this random walk is at 0 or at m. Use
the method of Exercise 4.2 (analyzing what happens on the first step) to calculate
P_j(S^{(p)}_{T_{0,m}^{(p)}} = 0), that is, the probability that starting from j, the random walk reaches
0 before it reaches m. (Hint: The calculation in the case p = 1/2 needs to be treated
separately.)

Chapter Notes

The arcsine law in Theorem 4.2 was first proven by P. Lévy in 1939 in the context
of Brownian motion, which is a continuous time and continuous path version of the
simple, symmetric random walk. The proof of Theorem 4.2 is due to K.L. Chung and
W. Feller. One can find a proof in volume 1 of Feller's classic text in probability [19].
One can also find there a proof of Theorem 4.1. Our proofs of these theorems are
a little different from Feller's proofs. As expected, the proofs in Feller's book have
a probabilistic flavor. We have taken a more combinatorial/counting approach via
generating functions. Proposition 4.3 and Corollary 4.2 can be derived alternatively
via the "reflection principle"; see [19]. For a nice little book on random walks from
the point of view of electrical networks, see Doyle and Snell [15]; for a treatise on
random walks, see the book by Spitzer [32].
Chapter 5
The Distribution of Cycles in Random
Permutations

In this chapter we study the limiting behavior of the total number of cycles and of
the number of cycles of fixed length in random permutations of [n] as n → ∞. This
class of problems springs from a classical question in probability called the envelope
matching problem. You have n letters and n addressed envelopes. If you randomly
place one letter in each envelope, what is the asymptotic probability as n → ∞ that
no letter is in its correct envelope?
Let S_n denote the set of permutations of [n]. Of course, S_n is a group, but the
group structure will not be relevant for our purposes. For us, a permutation σ ∈
S_n is simply a 1-1 map of [n] onto [n]. The notation σ_j will be used to denote
the image of j ∈ [n] under this map. We have |S_n| = n!. Let P_n^U denote the
uniform probability measure on S_n. That is, P_n^U(A) = |A|/n!, for any subset A ⊂ S_n.
If σ_j = j, then j is called a fixed point for the permutation σ. Let D_n ⊂ S_n
denote the set of permutations that do not fix any points; that is, σ ∈ D_n if σ_j ≠ j,
for all j ∈ [n]. Such permutations are called derangements. The classical envelope
matching problem then asks for lim_{n→∞} P_n^U(D_n).
The standard way to solve the envelope matching problem is by the method of
inclusion-exclusion. Define G_i = {σ ∈ S_n : σ_i = i}. (We suppress the dependence
of G_i on n since n is fixed in this discussion.) Then the complement D_n^c of D_n is
given by D_n^c = ∪_{i=1}^n G_i, and the inclusion-exclusion principle states that

  P(∪_{i=1}^n G_i) = Σ_{i=1}^n P(G_i) - Σ_{1≤i<j≤n} P(G_i ∩ G_j)
    + Σ_{1≤i<j<k≤n} P(G_i ∩ G_j ∩ G_k) - ⋯ + (-1)^{n-1} P(∩_{i=1}^n G_i).

(See Exercise A.2 in Appendix A.) Each of the probabilities above can be computed
readily. After some calculations one finds that P(D_n) = 1 - P(∪_{i=1}^n G_i) = 1 - 1 +
1/2! - 1/3! + ⋯ + (-1)^n 1/n!; thus, lim_{n→∞} P(D_n) = e^{-1}.
Here is an elegant, alternative proof using generating functions. Let d_k^{(n)} denote
the number of permutations in S_n that fix exactly k points. We need to calculate
lim_{n→∞} d_0^{(n)}/n!. Clearly,

  Σ_{k=0}^n d_k^{(n)} = n!,  (5.1)

since every permutation fixes k points, for some k. To construct a permutation in
S_n that fixes exactly k points, first we can choose k numbers from [n] for the fixed
points, and then we must choose a permutation of the other n - k numbers that fixes
none of them; thus,

  d_k^{(n)} = \binom{n}{k} d_0^{(n-k)}.

Substituting this in (5.1) gives

  Σ_{k=0}^n \binom{n}{k} d_0^{(n-k)} = n!,

or equivalently

  Σ_{k=0}^n d_0^{(n-k)}/(k!(n - k)!) = 1.  (5.2)

If one multiplies the absolutely convergent power series Σ_{n=0}^∞ a_n x^n by the
absolutely convergent power series Σ_{n=0}^∞ b_n x^n, one gets the absolutely convergent
power series Σ_{n=0}^∞ c_n x^n, where c_n = Σ_{k=0}^n a_k b_{n-k}. Thus, it follows from (5.2)
that

  (Σ_{n=0}^∞ x^n/n!)(Σ_{n=0}^∞ (d_0^{(n)}/n!) x^n) = Σ_{n=0}^∞ x^n, |x| < 1,

or

  Σ_{n=0}^∞ (d_0^{(n)}/n!) x^n = e^{-x}/(1 - x), |x| < 1.  (5.3)

Thus d_0^{(n)}/n! is the coefficient of x^n in

  e^{-x}/(1 - x) = (1 - x + x²/2! - x³/3! + ⋯)(1 + x + x² + x³ + ⋯),

and this is easily seen to give d_0^{(n)}/n! = 1 - 1 + 1/2! - 1/3! + ⋯ + (-1)^n 1/n!.
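Both derivations are easy to confirm numerically; the following brute-force sketch (ours) compares the exact fraction of derangements with the partial sums of the series for e^{-1}:

    import math
    from itertools import permutations

    def derangement_fraction(n):
        # exact value of d_0^{(n)} / n! by enumeration (feasible only for small n)
        return sum(all(p[i] != i for i in range(n))
                   for p in permutations(range(n))) / math.factorial(n)

    for n in range(1, 9):
        partial = sum((-1)**k / math.factorial(k) for k in range(n + 1))
        print(n, derangement_fraction(n), partial)  # equal columns, tending to 1/e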
In order to begin our study of the behavior of the number of cycles and of the
number of cycles of fixed length in random permutations, we recall some basic facts
and notation concerning cycles of permutations. Consider the permutation σ ∈ S_4
given in two-line form by (1 2 3 4 / 2 4 1 3). This means that σ_1 = 2, σ_2 = 4, etc. Since
1 goes to 2, 2 goes to 4, 4 goes to 3, and 3 goes back to 1, we call σ cyclic and
denote this by writing σ = (1 2 4 3). (We could also just as well write it as (4 3 1 2),
for example.) Recall that every permutation can be decomposed into a product of
disjoint cycles. For example, consider σ ∈ S_8 given by (1 2 3 4 5 6 7 8 / 3 2 5 8 6 7 1 4). Under σ,
1 goes to 3, 3 goes to 5, 5 goes to 6, 6 goes to 7, and 7 goes back to 1, closing a
cycle. Now 2 goes to 2, which makes a cycle unto itself, and finally, 4 goes to 8
and 8 goes back to 4. Therefore, we write σ = (1 3 5 6 7)(2)(4 8) or, alternatively,
σ = (1 3 5 6 7)(4 8); in the latter form, the convention is that every number that does
not appear at all forms a cycle unto itself. Note that σ has one cycle of length 5, one
cycle of length 2, and one cycle of length 1.
For σ ∈ S_n and j ∈ [n], let C_j^{(n)}(σ) denote the number of cycles of length j in
σ. Note that for all σ ∈ S_n, one has the identity

  Σ_{j=1}^n j C_j^{(n)}(σ) = n.

We call (C_1^{(n)}(σ), C_2^{(n)}(σ), ..., C_n^{(n)}(σ)) the cycle type of the permutation σ. Let

  N^{(n)}(σ) = Σ_{j=1}^n C_j^{(n)}(σ)

denote the number of cycles in the permutation σ ∈ S_n. Under the probability
measure P_n^U, we may think of N^{(n)} and C_j^{(n)} as random variables. In this chapter
we will investigate the limiting distribution of the random variable N^{(n)} and of
the random variable C_j^{(n)} for fixed j, as n → ∞. In fact, more generally,
we will investigate the limiting distribution of the j-dimensional random vector
(C_1^{(n)}, C_2^{(n)}, ..., C_j^{(n)}). We call these cycles small cycles because their lengths are
fixed as n → ∞.
Instead of just considering permutations under the uniform measure P_n^U, we will
consider permutations under a one-parameter family of probability measures which
includes the uniform measure as a particular case. For each θ ∈ (0, ∞), we define a
probability measure P_n^{(θ)} on S_n by

  P_n^{(θ)}({σ}) = θ^{N^{(n)}(σ)}/K_n(θ),
where

  K_n(θ) = Σ_{σ∈S_n} θ^{N^{(n)}(σ)}

is the normalizing constant required to make P_n^{(θ)} a probability measure. Thus, under
the measure P_n^{(θ)}, every permutation is weighted proportionally by the parameter θ
raised to an exponent equal to the number of cycles in the permutation.
Consequently, for θ > 1, P_n^{(θ)} favors permutations with many cycles, and for θ < 1,
it favors permutations with few cycles. Of course θ = 1 corresponds to the uniform
measure: P_n^U = P_n^{(1)}. The original reason for considering the probability measures
P_n^{(θ)} can be attributed to Proposition 5.1 below, which gives the exact distribution
of the cycle types under P_n^{(θ)}. In Exercise 5.1, the reader is asked to verify that
Proposition 5.1 follows from the definition of P_n^{(θ)} along with Proposition 5.2 and
Lemma 5.1, which are stated and proved in the course of the proofs of Theorems 5.1
and 5.2 below. We use the standard notation

  θ^{(n)} := θ(θ + 1) ⋯ (θ + n - 1), n ≥ 1.

This expression is sometimes referred to as a rising factorial; the notation θ^{(n)} is called
the Pochhammer symbol.

Proposition 5.1. If Σ_{j=1}^n j a_j = n, then

  P_n^{(θ)}(C_1^{(n)} = a_1, C_2^{(n)} = a_2, ..., C_n^{(n)} = a_n) = (n!/θ^{(n)}) Π_{j=1}^n (θ/j)^{a_j} (1/a_j!).

The distribution in Proposition 5.1 is known as the Ewens sampling formula; it
arose originally in the context of population genetics.
We will prove a weak law of large numbers for the distribution of the number of
cycles N^{(n)}.

Theorem 5.1. Let θ ∈ (0, ∞). Under P_n^{(θ)}, the distribution of the number of cycles
N^{(n)} in a permutation satisfies

  N^{(n)}/log n → θ in probability;

that is, for all ε > 0,

  lim_{n→∞} P_n^{(θ)}(|N^{(n)}/log n - θ| ≥ ε) = 0.
We now consider the small cycles. A random variable Z is distributed according
to the Poisson distribution with parameter λ > 0 (Z ~ Pois(λ)) if

  P(Z = j) = e^{-λ} λ^j/j!, for j = 0, 1, ....

The j discrete random variables {X_i}_{i=1}^j are called independent if P(X_1 =
x_1, ..., X_j = x_j) = Π_{i=1}^j P(X_i = x_i), for all choices of {x_i}_{i=1}^j ⊂ R. In the
sequel, Z_λ will denote a random variable distributed according to Pois(λ), and it
will always be assumed that {Z_{λ_i}}_{i=1}^j are independent for distinct {λ_i}_{i=1}^j.
We will prove a weak convergence result for small cycles.

Theorem 5.2. Let θ ∈ (0, ∞). Let j be a positive integer. Under the measure P_n^{(θ)},
the distribution of the random vector (C_1^{(n)}, C_2^{(n)}, ..., C_j^{(n)}) converges weakly to the
distribution of (Z_θ, Z_{θ/2}, ..., Z_{θ/j}). That is,

  lim_{n→∞} P_n^{(θ)}(C_1^{(n)} = m_1, C_2^{(n)} = m_2, ..., C_j^{(n)} = m_j) = Π_{i=1}^j e^{-θ/i} (θ/i)^{m_i}/m_i!,
    m_i ≥ 0, i = 1, ..., j.  (5.4)

Remark. Let j be a positive integer and let 1 ≤ k_1 < k_2 < ⋯ < k_j. In
Exercise 5.7 the reader is asked to show that by making a small change in the proof
of Theorem 5.2, one has

  lim_{n→∞} P_n^{(θ)}(C_{k_1}^{(n)} = m_1, C_{k_2}^{(n)} = m_2, ..., C_{k_j}^{(n)} = m_j) = Π_{i=1}^j e^{-θ/k_i} (θ/k_i)^{m_i}/m_i!,
    m_i ≥ 0, i = 1, ..., j.  (5.5)

In particular, for any fixed j, the distribution of C_j^{(n)} converges weakly to the
Pois(θ/j) distribution. Actually, (5.5) can be deduced directly from (5.4); see
Exercises 5.2 and 5.3.
Our proofs of these two theorems will be very combinatorial, through the method
of generating functions. The use of purely probabilistic reasoning will be rather
minimal.
For the proofs of the two theorems, we will need to evaluate the normalizing
constant K_n(θ). Of course, this is trivial in the case of the uniform measure, that
is, the case θ = 1. Let s(n, k) denote the number of permutations in S_n that have
exactly k cycles. From the definition of K_n(θ), we have

  K_n(θ) = Σ_{k=1}^n s(n, k) θ^k.  (5.6)
Proposition 5.2.

  K_n(θ) = θ^{(n)}.

Remark. The numbers s(n, k) are called unsigned Stirling numbers of the first kind.
Proposition 5.2 and (5.6) show that they arise as the coefficients of the polynomials
q_n(θ) := θ^{(n)} = θ(θ + 1) ⋯ (θ + n - 1).

Proof. There are (n - 1)! permutations in S_n that contain only one cycle and one
permutation in S_n that contains n cycles:

  s(n, 1) = (n - 1)!, s(n, n) = 1.  (5.7)

We prove the following recursion relation:

  s(n + 1, k) = n s(n, k) + s(n, k - 1), n ≥ 2, 2 ≤ k ≤ n.  (5.8)

Note that (5.7) and (5.8) uniquely determine s(n, k) for all n ≥ 1 and all k ∈ [n].
To create a permutation σ′ ∈ S_{n+1}, we can start with a permutation σ ∈ S_n
and then take the number n + 1 and either insert it into one of the existing cycles
of σ or let it stand alone as a cycle of its own. If we insert n + 1 into one of the
existing cycles, then σ′ will have k cycles if and only if σ has k cycles. There are n
possible locations in which one can place the number n + 1 and preserve the number
of cycles. (The reader should verify this.) Thus, from each permutation in S_n with
k cycles, we can construct n permutations in S_{n+1} with k cycles. If, on the other
hand, we let n + 1 stand alone in its own cycle, then σ′ will have k cycles if and
only if σ has k - 1 cycles. Thus, from each permutation in S_n with k - 1 cycles, we
can construct one permutation in S_{n+1} with k cycles. Now (5.8) is the mathematical
expression of this verbal description.
Let c_{n,k} denote the coefficient of θ^k in q_n(θ) = θ(θ + 1) ⋯ (θ + n - 1). Clearly
c_{n,1} = (n - 1)! and c_{n,n} = 1, for n ≥ 1. Writing q_{n+1}(θ) = q_n(θ)(θ + n), one sees
that c_{n+1,k} = n c_{n,k} + c_{n,k-1}, for n ≥ 2, 2 ≤ k ≤ n. Thus, c_{n,k} satisfies the same
recursion relation (5.8) and the same boundary condition (5.7) as does s(n, k). We
conclude that c_{n,k} = s(n, k). The proposition follows from this along with (5.6). □
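The recursion (5.8) makes it easy to tabulate s(n, k) and to verify Proposition 5.2 numerically; here is a minimal sketch (ours):

    def stirling_rows(n_max):
        # unsigned Stirling numbers of the first kind, built row by row from
        # (5.7) and (5.8): s(n, k) = (n - 1) s(n - 1, k) + s(n - 1, k - 1)
        rows = [[1]]  # row n = 0, with the convention s(0, 0) = 1
        for n in range(1, n_max + 1):
            prev = rows[-1] + [0]
            rows.append([0] + [(n - 1) * prev[k] + prev[k - 1]
                               for k in range(1, n + 1)])
        return rows

    n, theta = 6, 2.5
    s = stirling_rows(n)
    lhs = sum(s[n][k] * theta**k for k in range(1, n + 1))  # K_n(theta), by (5.6)
    rhs = 1.0
    for i in range(n):
        rhs *= theta + i  # the rising factorial theta^{(n)}
    print(lhs, rhs)  # equal, as Proposition 5.2 asserts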
In light of Proposition 5.2, from now on, we write the probability measure P_n^{(θ)}
in the form

  P_n^{(θ)}({σ}) = θ^{N^{(n)}(σ)}/θ^{(n)}.

We now set the stage to prove Theorem 5.1. The probability generating function
P_X(s) of a random variable X taking nonnegative integral values is defined by

  P_X(s) = Es^X = Σ_{i=0}^∞ s^i P(X = i), |s| ≤ 1.

The probability generating function uniquely determines the distribution; indeed,
(1/i!)(d^i P_X(s)/ds^i)|_{s=0} = P(X = i). Let P_{N^{(n)}}(s; θ) denote the probability generating
function for the random variable N^{(n)} under P_n^{(θ)}:

  P_{N^{(n)}}(s; θ) = Σ_{i=1}^n s^i P_n^{(θ)}(N^{(n)} = i).

Recalling that s(n, i) denotes the number of permutations in S_n with i cycles, it
follows that

  P_n^{(θ)}(N^{(n)} = i) = θ^i s(n, i)/θ^{(n)}.

Using this with (5.6) and Proposition 5.2 gives

  P_{N^{(n)}}(s; θ) = Σ_{i=1}^n s^i θ^i s(n, i)/θ^{(n)} = (sθ)^{(n)}/θ^{(n)} = [sθ(sθ + 1) ⋯ (sθ + n - 1)]/[θ(θ + 1) ⋯ (θ + n - 1)]
    = Π_{i=1}^n (θs/(θ + i - 1) + (i - 1)/(θ + i - 1)).  (5.9)

A random variable X is distributed according to the Bernoulli distribution with
parameter p ∈ [0, 1] if P(X = 1) = p and P(X = 0) = 1 - p. We write
X ~ Ber(p). The probability generating function for such a random variable is
ps + 1 - p. Now let {X_{θ/(θ+i-1)}}_{i=1}^n be independent random variables, where
X_{θ/(θ+i-1)} ~ Ber(θ/(θ + i - 1)). Let Z_{n,θ} = Σ_{i=1}^n X_{θ/(θ+i-1)}. Then the probability
generating function for Z_{n,θ} is given by

  P_{Z_{n,θ}}(s) = Es^{Z_{n,θ}} = Es^{Σ_{i=1}^n X_{θ/(θ+i-1)}} = Π_{i=1}^n Es^{X_{θ/(θ+i-1)}}
    = Π_{i=1}^n (θs/(θ + i - 1) + (i - 1)/(θ + i - 1)).  (5.10)

For the third equality above we have used the fact that the expected value of a
product of independent random variables is equal to the product of their expected
values. From (5.9), (5.10), and the uniqueness of the probability generating function,
we obtain the following proposition.

Proposition 5.3. Under P_n^{(θ)}, the distribution of N^{(n)} is equal to the distribution of
Σ_{i=1}^n X_{θ/(θ+i-1)}, where {X_{θ/(θ+i-1)}}_{i=1}^n are independent random variables, and
X_{θ/(θ+i-1)} ~ Ber(θ/(θ + i - 1)).



Remark. As an alternative way of arriving at the result in the proposition, there is
a nice probabilistic construction of uniformly random permutations (θ = 1) that
immediately yields the result, and the construction can be amended to cover the
case of general θ. See Exercise 5.4.
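Proposition 5.3 also yields a fast way to sample N^{(n)} without constructing a permutation at all, which makes Theorem 5.1 easy to observe numerically. A sketch (our illustration):

    import math
    import random

    def sample_num_cycles(n, theta, rng):
        # N^{(n)} under P_n^{(theta)}, sampled via Proposition 5.3 as a sum of
        # independent Bernoulli(theta / (theta + i - 1)) variables, i = 1, ..., n
        return sum(rng.random() < theta / (theta + i - 1) for i in range(1, n + 1))

    rng = random.Random(0)
    n, theta, trials = 10_000, 2.0, 200
    mean = sum(sample_num_cycles(n, theta, rng) for _ in range(trials)) / trials
    print(mean / math.log(n), theta)  # ratio near theta, as Theorem 5.1 predicts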
We now use Proposition 5.3 and Chebyshev's inequality to prove the first
theorem.

Proof of Theorem 5.1. Let Z_{n,θ} = Σ_{i=1}^n X_{θ/(θ+i-1)}. By Proposition 5.3, it
suffices to show that

  lim_{n→∞} P(|Z_{n,θ}/log n - θ| ≥ ε) = 0, for all ε > 0.  (5.11)

If X_p ~ Ber(p), then the expected value of X_p is EX_p = p, and the variance
is Var(X_p) = p(1 - p). Since the expectation is linear, we have EZ_{n,θ} =
Σ_{i=1}^n θ/(θ + i - 1). By considering the above sum simultaneously as an upper Riemann
sum and as a lower Riemann sum of appropriate integrals, we have

  θ(log(n + θ) - log θ) = ∫_0^n θ/(θ + x) dx ≤ EZ_{n,θ} ≤ 1 + ∫_0^{n-1} θ/(θ + x) dx
    = 1 + θ(log(n - 1 + θ) - log θ).

Since log(n + θ) = log n + log(1 + θ/n) and log(n - 1 + θ) = log n + log(1 + (θ - 1)/n),
the above inequality immediately yields

  EZ_{n,θ} = θ log n + O(1), as n → ∞.  (5.12)

Since the variance of a sum of independent random variables is the sum of the
variances of the random variables, we have Var(Z_{n,θ}) = Σ_{i=1}^n θ(i - 1)/(θ + i - 1)². Similar to
the integral estimate above for the expectation, we have

  Var(Z_{n,θ}) ≤ θ Σ_{i=2}^n 1/(i - 1) ≤ θ + θ ∫_1^{n-1} (1/x) dx = θ + θ log(n - 1).  (5.13)

Using (5.12) for the last inequality below, we have for sufficiently large n

  P(|Z_{n,θ}/log n - θ| ≥ ε) = P(|Z_{n,θ} - θ log n| ≥ ε log n)
    = P(|Z_{n,θ} - EZ_{n,θ} + EZ_{n,θ} - θ log n| ≥ ε log n)
    ≤ P(|Z_{n,θ} - EZ_{n,θ}| ≥ ε log n - |EZ_{n,θ} - θ log n|)
    ≤ P(|Z_{n,θ} - EZ_{n,θ}| ≥ (ε/2) log n).  (5.14)

Applying Chebyshev's inequality to the last term in (5.14), it follows from (5.13)
and (5.14) that for sufficiently large n,

  P(|Z_{n,θ}/log n - θ| ≥ ε) ≤ (θ + θ log(n - 1))/((ε²/4) log² n).  (5.15)

Now (5.11) follows from (5.15). □


We now develop a framework that will lead to the proof of Theorem 5.2. Given a
positive integer n and given a collection {a_i}_{i=1}^n of nonnegative integers satisfying
Σ_{i=1}^n i a_i = n, let c_n(a_1, ..., a_n) denote the number of permutations σ ∈ S_n with
cycle type (a_1, ..., a_n). From the definition of P_n^{(θ)}, we have

  P_n^{(θ)}(C_1^{(n)} = a_1, C_2^{(n)} = a_2, ..., C_n^{(n)} = a_n) = θ^{Σ_{i=1}^n a_i} c_n(a_1, ..., a_n)/θ^{(n)}.

To prove Theorem 5.2, we need to analyze P_n^{(θ)}(C_1^{(n)} = m_1, ..., C_j^{(n)} = m_j), for
large n and fixed j. We have

  P_n^{(θ)}(C_1^{(n)} = m_1, C_2^{(n)} = m_2, ..., C_j^{(n)} = m_j)
    = Σ P_n^{(θ)}(C_1^{(n)} = m_1, ..., C_j^{(n)} = m_j, C_{j+1}^{(n)} = a_{j+1}, ..., C_n^{(n)} = a_n)
    = Σ θ^{Σ_{i=1}^j m_i + Σ_{i=j+1}^n a_i} c_n(m_1, ..., m_j, a_{j+1}, ..., a_n)/θ^{(n)},  (5.16)

where both sums run over all a_{j+1} ≥ 0, ..., a_n ≥ 0 satisfying Σ_{i=1}^j i m_i + Σ_{i=j+1}^n i a_i = n.
We calculate c_n(a_1, ..., a_n) by direct combinatorial reasoning.

Lemma 5.1.

  c_n(a_1, ..., a_n) = n!/(Π_{i=1}^n i^{a_i} a_i!).

Remark. From the lemma and (5.16), we obtain

  P_n^{(θ)}(C_1^{(n)} = m_1, C_2^{(n)} = m_2, ..., C_j^{(n)} = m_j)
    = (n!/θ^{(n)}) Π_{i=1}^j [(θ/i)^{m_i}/m_i!] Σ Π_{i=j+1}^n [(θ/i)^{a_i}/a_i!],

where the sum runs over all a_{j+1} ≥ 0, ..., a_n ≥ 0 with Σ_{i=1}^j i m_i + Σ_{i=j+1}^n i a_i = n.
58 5 Cycles in Random Permutations

The sum on the right hand side above is a real mess; however, a sophisticated
application of generating functions in conjunction with the lemma will allow us
to evaluate the right hand side of (5.16) indirectly.
Proof of Lemma 5.1. First we separate out a1 numbers for 1 cycles, 2a2 numbers
for 2 cycles,: : :, .n  1/an1 numbers for .n  1/ cycles, and finally the last nan
numbers for n cycles. The number of ways of doing this is
! ! ! !
n n  a1 n  a1  2a2 n  a1      .n  1/an1
 D
a1 2a2 3a3 nan

:
a1 Š.2a2 /Š    .nan /Š

The a1 numbers selected for 1 cycles need no further differentiation. The 2a2
numbers selected for 2 cycles must be separated out into a2 pairs. Of course the
order of the pairs is irrelevant, so the number of ways of doing this is
! ! ! !
1 2a2 2a2  2 4 2 .2a2 /Š
 D :
a2 Š 2 2 2 2 a2 Š.2Š/a2

The 3a3 numbers selected for 3 cycles must be separated out into a3 triplets, and then
each such triplet must be ordered in a cycle. The number of ways of separating the
3a3 numbers into triplets is
! ! ! !
1 3a2 3a3  3 6 3 .3a3 /Š
 D :
a3 Š 3 3 3 3 a3 Š.3Š/a3

Each such triplet can be ordered into a cycle in .31/Š ways.a Thus, we conclude that
3 .3a3 /Š
the 3a3 numbers can be arranged into a3 3 cycles in ..31/Š/
a3 Š.3Š/a3
ways. Continuing
like this, we obtain

nŠ .2a2 /Š ..3  1/Š/a3 .3a3 /Š ..n1/Š/an .nan /Š


c.a1 ; : : : ; an /D  D
a1 Š.2a2 /Š    .nan /Š a2 Š.2Š/ a2 a3 Š.3Š/ a3 an Š.nŠ/an

:
a1 Ša2 Š    an Š2a2 3a3    nan


We now turn to generating functions. Consider an infinite dimensional vector
x D .x1 ; x2 ; : : :/, and for any positive integer n, define x .n/ D .x1 ; : : : ; xn /. For
a D .a1 ; : : : ; an /, let x a D .x .n/ /a WD x1a1    xnan . Let T .
/ denote the cycle type
of
2 Sn . Define the cycle index of Sn , n  1, by
5 Cycles in Random Permutations 59

1 X T .
/ 1 X
n .x/ D n .x .n/ / D x D cn .a/x a :

2G nŠ Pn
i D1 iai Dn
a1 0;:::;an 0

We also define 0 .x/ D 1. We now consider (formally for the moment) the
generating function for n . x/:
1
X
G . / .x; t / D n . x/t n ; x D .x1 ; x2 ; : : :/:
nD0

Using Lemma 5.1, we can obtain a very nice representation for G . / , as well as a
domain on which its defining series converges. Let jjxjj1 WD supn1 jxn j.
Proposition 5.4.
1
X xi t i
G . / .x; t / D exp. /; for jt j < 1; jjxjj1 < 1:
iD1
i

Proof. Consider t 2 Œ0; 1/ and x with xj  0 for all j , and jjxjj1 < 1. Using
Lemma 5.1 and the definition of n .x/, we have
1
X X cn .a/. x/a t n
G . / .x; t / D D
Pn nŠ
j D1 jaj Dn
nD0
a1 0;:::;an 0
1
X X nŠ . x1 /a1    . xn /an t n
Qn ai
D
Pn iD1 i ai Š nŠ
j D1 jaj Dn
nD0
a1 0;:::;an 0

1
X X Y
n i
X 1 xi t i ai
Y 1
Y
. xi t /ai . / xi t i
i
D i
D e i D
Pn ai Š a1 0;a2 0;::: iD1
ai Š
j D1 jaj Dn
nD0 iD1 iD1
a1 0;:::;an 0
1
X xi t i
exp. /: (5.17)
iD1
i

The right hand side above converges for t and x in the range specified at the
beginning of the proof. Since all of the summands in sight are nonnegative, it follows
that the series defining G . / is convergent in this range. For t and x in the range
specified in the statement of the theorem, the above calculation shows that there is
absolute convergence and hence convergence. 
We now exploit the formula for G . / .x; t / in Proposition 5.4 in a clever way.
Recall that
60 5 Cycles in Random Permutations

1 i
X t
log.1  t / D  : (5.18)
iD1
i

For x D .x1 ; x2 ; : : :/ and a positive integer j , let x .j /I1 D .x1 ; : : : ; xj ; 1; 1; : : :/. In


other words, x .j /I1 is the infinite dimensional vector which coincides with x in its
first j places and has 1 in all of its other places. From Proposition 5.4 and (5.18) we
have
1 i
X X
j
X .xi 1/t i
j
t .xi 1/t i 1
G . / .x .j /I1 ; t /D exp. / exp. /D exp. /:
iD1
i iD1
i .1t / iD1
i
(5.19)
We will need the following lemma.
P
Lemma 5.2. Let 2 .0; 1/. Let 1 iD0 bi be a convergent series, and assume that

X1 X1
1
b i t i
D i t i ; jt j < 1:
.1  t / iD0 iD0

P1
P 1 >i 1, also assume that iD0 jbi j < 1. If 2 .0; 1/, also assume that
If
iD0 s jbi j < 1, for some s > 1. Then

X 1

lim .n/
n D bi :
n!1
iD0

1
Proof. Since . .1t/ /
.n/
jtD0 D . C 1/    . C n  1/ D .n/ , the Taylor expansion
1
for .1t/
is given by

X .n/ 1
1
D t n; (5.20)
.1  t / nD0

where Pfor convenience we have defined .0/ D 1. Thus, the Taylor expansion for
1 1 i
.1t/ iD0 bi t is given by

X1 X1
1
b i t i
D dn t n ;
.1  t / iD0 nD0

Pn .ni /
where dn D iD0

bi .ni/Š . Therefore, by the assumption in the lemma, we have

X
n
.ni/
n D bi :
iD0
.n  i /Š
5 Cycles in Random Permutations 61

If DP1, then kŠ D .k/ , for all k. Consequently the above equation reduces to
n
n D iD0 bi , and thus the statement of the lemma holds. When ¤ 1, then using
the additional assumptions on fbi g1iD0 , we can show that

1
nŠ X X
n
.ni/
lim b i D bi ; (5.21)
n!1 .n/
iD0
.n  i /Š iD0

which finishes the proof of the lemma. The reader is guided through a proof of (5.21)
in Exercise 5.5. 
We can now give the proof of Theorem 5.2.
Proof of Theorem 5.2. From (5.19) and the original definition of G . / .x; t /, we have

Xj
X1
1 .xi  1/t i
exp. / D n . x .j /I1 /t n : (5.22)
.1  t / iD1
i nD0

Considering x and as constants, we apply Lemma 5.2 to (5.22). In terms of the


lemma, we have

Xj
X1
.xi  1/t i
n D n . x .j /I1
/ and exp. /D bi t i : (5.23)
iD1
i iD0

P1In order to be able to apply the lemma for all > 0, we need to show that
N 1
iD0 s jbi j < 1, for some s > 1. Define fbi giD0 by
i

Xj
X1
jxi  1jt i
exp. /D bNi t i : (5.24)
iD1
i iD0

Since all of the coefficients in the sum in the exponent on the left hand side of (5.24)
are nonnegative, we have bNi  jbi j  0, for all i . The reader is asked to prove this
in Exercise 5.6. The function on the left hand side of (5.24) is real analytic for all
t 2 R (and complex analytic for all complex t ); consequently, its power series on
the right handPside converges for all t 2 R. From this and the nonnegativity of bNi , it
1 iN N
P1thati iD0 s bi < 1, for all s  0, and then, since jbi j  bi , we conclude
follows
that iD0 s jbi j < 1, for all s  0.
By definition, from (5.23), we have

1
X Xj
xi  1
bi D exp. /: (5.25)
iD0 iD1
i
62 5 Cycles in Random Permutations

Consider now
nŠ nŠ 1 X
n D .n/ n . x .j /I1 / D .n/ cn .a/. x .j /I1 /a : (5.26)
.n/ Pn
i D1iai Dn
a1 0;:::;an 0

For any given j -vector .m1 ; : : : ; mj / with nonnegative integral entries, the coeffi-
m
cient of x1m1 x2m2    xj j in (5.26) is

1 X Pj Pn
mi C i Dj C1 ai
i D1 cn .m1 ; : : : ; mj ; aj C1 ; : : : ; an /:
.n/ Pj Pn
i D1imi C i Dj C1 iai Dn
aj C1 0;:::;an 0

. / .n/ .n/ .n/


But by (5.16), this is exactly Pn .C1 D m1 ; C2 D m2 ; : : : ; Cj D mj /. By
Lemma 5.2, limn!1 nŠ.n/ n exists, and this is true for every choice of x and ; thus,
we conclude that
nŠ nŠ X m
lim .n/
n D lim .n/ n . x .j /I1 / D pm1 ;:::;mj . /x1m1    xj j ;
n!1 n!1
m 1 0;:::;mj 0
(5.27)
where
.n/ .n/ .n/
pm1 ;:::;mj . / D lim Pn. / .C1 D m 1 ; C2 D m 2 ; : : : ; Cj D mj /: (5.28)
n!1

Applying Lemma 5.2, we conclude from (5.25) and (5.27) that

Xj
xi  1 X m
exp. /D pm1 ;:::;mj . / x1m1    xj j : (5.29)
i
iD1 m 1 0;:::;mj 0

m
On the one hand, (5.29) shows that the coefficient of x1m1    xj j in the Taylor
Pj
expansion about x D 0 of the function exp. iD1 xi i1 / is pm1 ;:::;mj . /. On the
other hand, by Taylor’s formula, this coefficient is equal to
 Pj xi 1

1 @m1 CCmj exp. iD1 /
mj
i
jxD0 D
m1 Š    mj Š @m
x1    @xj
1

1 Xj
1 Y mi
j
Yj mi
. /
exp. / . / D e i i : (5.30)
m1 Š    mj Š iD1
i iD1 i iD1
mi Š
5 Cycles in Random Permutations 63

Thus, from (5.28)–(5.30), we conclude that

Y
j
. i /mi
e i
.n/ .n/ .n/
lim Pn. / .C1 D m 1 ; C2 D m 2 ; : : : ; Cj D mj / D ;
n!1
iD1
mi Š

completing the proof of Theorem 5.2. 


. /
Exercise 5.1. Verify that Proposition 5.1 follows from the definition of Pn along
with Proposition 5.2 and Lemma 5.1.
Exercise 5.2. Show that (5.4) is equivalent to

X j
Y . i /ri
e i
. / .n/ .n/ .n/
lim Pn .C1  m1 ; C2  m2 ; : : : ; Cj  mj /D ;
n!1 ri Š
0r1 m1 ;:::;0rj mj iD1

mi  0; i D 1; : : : ; j: (5.31)

Exercise 5.3. In this exercise you will show directly that (5.5) follows from (5.4).
(a) Fix an integer j  2. Use (5.31) to show that for any > 0, there exists an N
such that if n  N and m  N , then
.n/
Pn. / .Ci > m; for some i 2 Œj  / < : (5.32)

(b) From (5.31) and (5.32), deduce that (5.31) also holds if some of the mi are equal
to 1.
(c) Prove that (5.5) follows from (5.4).
Exercise 5.4. This exercise gives an alternative probabilistic proof of Proposi-
tion 5.3. A uniformly random ( D 1) permutation
2 Sn can be constructed in
the following manner via its cycles. We begin with the number 1. Now we randomly
choose a number from Œn. If we chose j , then we declare that
1 D j . This is the
first stage of the construction. If j ¤ 1, then we randomly choose a number from
Œn  fj g. If we chose k, then we declare that
j D k. This is the second stage of
the construction. If k ¤ 1, then we randomly choose a number from Œn  fj; kg.
We continue like this until we finally choose 1, which closes the cycle. For example,
if after k we chose 1, then the permutation
would contain the cycle .1j k/. Once
we close a cycle, we begin again, starting with the smallest number that has not yet
been used. We continue like this for n stages, at which point the permutation
has
been defined completely.
(a) The above construction has n stages. Show that the probability of completing a
1
cycle on the j th stage is nC1j . Thus, letting
64 5 Cycles in Random Permutations

(
.n/ 1; if a cycle was completed at stage j I
Xj D
0; otherwise;

.n/
it follows that Xj  Ber. nC1j
1
/.
.n/
(b) Argue that fXj gnj D1 are independent.
P .n/
(c) Show that the number of cycles N .n/ can be represented as N .n/ D nj D1 Xj ,
thereby proving Proposition 5.3 in the case D 1.
(d) Let 2 .0; 1/. Amend the above construction as follows. At any stage j ,

close the cycle with probability nC j , and choose any other particular number
1
that has not yet been used with probability nC j . Show that this construction
. /
yields a permutation distributed according to Pn , and use the above reasoning
to prove Proposition 5.3 for all > 0.
P
Exercise 5.5. (a) Show that if 1 iD0 jbi j < 1 and the triangular array fcn;i W i D
0; 1; : : : ; nI n DP0; 1; : : :g is bounded
P and satisfies limn!1 cn;ni D 1, for all
i , then limn!1 niD1 bi cn;ni D 1 iD1 bi . Then use this to prove (5.21) in the
case that > 1.
nŠ .ni / .0/
(b) Show that if 2 .0; 1/, then .ni/Š .n/
 ni
n
, if i < n. Also, nŠ0Š .n/
 n,
where we recall, P .0/
D 1.
(c) Show that if 1 i
iD0 jbi js < 1, where s > 1, then jbi j  s , for all large i .
i
nŠ Pn .ni /
(d) For 2 .0; 1/, prove (5.21) as follows. Break the sum .n/ iD0 bi .ni/Š into
three parts—from i D 0 to i D N , from i D N C 1 to i D Œ n2 , and from
i D Πn2  C 1 to i D n. Use the reasoning in the proof of (a) to show that by
choosing N sufficiently P large, the limit as n !P 1 of the first part can be made
arbitrarily close to 1 iD0 ib . Use the fact that 1
iD0 jbi j < 1 to show that by
choosing N sufficiently large, the lim supn!1 of the second part can be made
arbitrarily small. Use (b) and (c) to show that the limit as n ! 1 of the third
part is 0.
Exercise 5.6. Prove that bNi  jbi j, where fbi g1 N 1
iD0 and fbi giD0 are defined in (5.23)
and (5.24).
Exercise 5.7. Make a small change in the proof of Theorem 5.2 to show that (5.5)
holds.
.1/ .1/
Exercise 5.8. Consider the uniform probability measure Pn on Sn and let En
.1/
denote the expectation under Pn . Let Xn D Xn .
/ be the random variable denoting
the number of nearest neighbor pairs in the permutation
2 Sn , and let Yn D Yn .
/
be the random variable denoting the number of nearest neighbor triples in
2 Sn .
(A nearest neighbor pair for
is a pair k; k C 1, with k 2 Œn  1, such that
i D k
and
iC1 D k C 1, for some i 2 Œn  1, and a nearest neighbor triple is a triple
.k; k C 1; k C 2/ with k 2 Œn  2 such that
i D k,
iC1 D k C 1 and
iC2 D k C 2,
for some i 2 Œn  2.)
5 Cycles in Random Permutations 65

.1/
(a) Show that En Xn D 1, for all n. (Hint: Represent Xn as the sum of indicator
random variables fIk gn1
kD1 , where Ik .
/ is equal to 1 if k; k C 1 is a nearest
neighbor pair in
and is equal to 0 otherwise.) It can be shown that the
distribution of Xn converges weakly to the Pois.1/ distribution as n ! 1;
see [17].
.1/ .1/
(b) Show that limn!1 En Yn D 0 and conclude that limn!1 Pn .Yn D 0/ D 1.

Chapter Notes

In this chapter we investigated the limiting distribution as n ! 1 of the


random vector denoting the number of cycles of lengths 1 through j in a random
permutation from Sn . It is very interesting and more challenging to investigate
the limiting distribution of the random vector denoting the j longest cycles or,
alternatively, the j shortest cycles. For everything you want to know about cycles in
random permutations, and lots of references, see the book by Arratia et al. [6]. Our
approach in this chapter was almost completely combinatorial, through the use of
generating functions. Such methods are used occasionally in [6], but the emphasis is
on more sophisticated probabilistic analysis. Our method is similar to the generating
function approach of Wilf in [34], which deals only with the case D 1. For
an expository account of the intertwining of combinatorial objects with stochastic
processes, see the lecture notes of Pitman [30].
Chapter 6
Chebyshev’s Theorem on the Asymptotic
Density of the Primes

Let .n/ denote the number of primes that are no larger than n; that is,
X
.n/ D 1;
pn

where here and elsewhere in this chapter and the next two, the letter p in a
summation denotes a prime. Euclid proved that there are infinitely many primes:
limn!1 .n/ D 1. The asymptotic density of the primes is 0; that is,

.n/
lim D 0:
n!1 n

The prime number theorem gives the leading order asymptotic behavior of .n/. It
states that

.n/ log n
lim D 1:
n!1 n
This landmark result was proved in 1896 independently by J. Hadamard and by C.J.
de la Vallée Poussin. Their proofs used contour integration and Cauchy’s theorem
from analytic function theory. A so-called “elementary” proof, that is, a proof that
does not use analytic function theory, was given by P. Erdős and A. Selberg in 1949.
Although their proof uses only elementary methods, it is certainly more involved
than the proofs of Hadamard and de la Vallée Poussin. We will not prove the prime
number theorem in this book. In this chapter we prove a precursor of the prime
number theorem, due to Chebyshev in 1850. Chebyshev was the first to prove that
.n/ grows on the order logn n . Chebyshev’s methods were ingenious but entirely
elementary. Given the truly elementary nature of his approach, it is quite impressive
how close his result is to the prime number theorem. Here is Chebyshev’s result.

R.G. Pinsky, Problems from the Discrete to the Continuous, Universitext, 67


DOI 10.1007/978-3-319-07965-3__6, © Springer International Publishing Switzerland 2014
68 6 Chebyshev’s Theorem

Theorem 6.1 (Chebyshev).

.n/ log n .n/ log n


0:693  log 2  lim inf  lim sup  log 4  1:386:
n!1 n n!1 n

Chebyshev’s result is not the type of result we are emphasizing in this book, since
it is not an exact asymptotic result but rather only an estimate. We have included the
result because we will need it to prove Mertens’ theorems in Chap. 7, and one of
Mertens’ theorems will be used to prove the Hardy–Ramanujan theorem in Chap. 8.
Define Chebyshev’s -function by
X
.n/ D log p: (6.1)
pn

Chebyshev realized that an understanding of the asymptotic behavior of .n/ allows


one to infer the asymptotic behavior of .n/ (and vice versa), and that the direct
asymptotic analysis of the function is much more tractable than that of the function
, because the sum of logarithms is the logarithm of the product. Indeed, note that
Y
.n/ D log p: (6.2)
pn

We will give an exceedingly simple proof of the following result, which links the
asymptotic behavior of to that of .
Proposition 6.1.
(i) lim infn!1 .n/
n
D lim infn!1 .n/nlog n ;
(ii) lim supn!1 n D lim supn!1 .n/nlog n .
.n/

Proof. We have the trivial inequality


X
.n/ D log p  .n/ log n:
pn

Dividing this by n and letting n ! 1, we obtain

.n/ .n/ log n .n/ .n/ log n


lim inf  lim inf I lim sup  lim sup : (6.3)
n!1 n n!1 n n!1 n n!1 n

We have for 2 .0; 1/,


X  
.n/  log p  .n/  .Œn1 / log n1 
Œn1 <pn
 
.1  / .n/ log n  Œn1  log n ;
6 Chebyshev’s Theorem 69

where the last inequality comes from the trivial fact that .y/  y. Dividing this by
n and letting n ! 1, and using the fact that 2 .0; 1/ is arbitrary, we obtain

.n/ .n/ log n .n/ .n/ log n


lim inf  lim inf I lim sup  lim sup : (6.4)
n!1 n n!1 n n!1 n n!1 n

The proposition follows from (6.3) and (6.4). 


The following theorem gives an upper bound on the Chebyshev -function.
.n/
Theorem 6.2. n
 log 4, n  1.
Proof. The proof is by induction, the inductive hypothesis being that .n/  n log 4.
Note that the hypothesis holds for n D 1; 2. If n C 1  3 is even, then .n C 1/ D
.n/  n log 4  .n C 1/ log 4, where the first inequality comes from the inductive
hypothesis. If n C 1 is odd,
 then write n C 1 D 2m C 1, and note that the
.2mC1/.2m/.mC2/
binomial coefficient 2mC1
m
D mŠ
is divisible by every prime between
m C 2 and 2m C 1 (since all such primes appear  in the
 numerator of the latter
expression, but not in the denominator). Since 2mC1 m
is a positive integer (all
binomial coefficients are integers) which contains as factors all the primes between
m C 2 and 2m C 1, we have
!
Y 2m C 1
p : (6.5)
mC2p2mC1
m

By the binomial formula,


! ! ! !
X
2mC1
2m C 1 2m C 1 2m C 1 2m C 1
2 2mC1
D.1 C 1/ 2mC1
D  C D2 I
j D0
j m mC1 m

thus,
!
2m C 1
 22m : (6.6)
m

From (6.2), (6.5), and (6.6) we have


Y
.2m C 1/  .m C 1/ D log p  log 22m D m log 4: (6.7)
mC2p2mC1

From (6.7) and the inductive hypothesis, we have

.2m C 1/  .m C 1/ C m log 4  .m C 1/ log 4 C m log 4 D .2m C 1/ log 4I

that is, .n C 1/  .n C 1/ log 4. 


70 6 Chebyshev’s Theorem

As we noted above, the direct asymptotic analysis of is much more tractable


than that of , and Theorem 6.2 carried out an upper bound analysis for . It turns
out that for the lower-bound analysis it is better to work with Chebyshev’s -
function instead of Chebyshev’s -function. One defines
X
.n/ D log p: (6.8)
p k n;k1

That is, in the sum above, a term log p appears for every prime p and integer k  1
for which p k  n. So, for example, .14/ D 3 log 2 C 2 log 3 C log 5 C log 7 C
log 11 C log 13. Of course, .n/  .n/. We show now that and have the same
asymptotic behavior.
Proposition 6.2.
(i) lim infn!1 .n/
n
D lim infn!1 .n/
n
;
.n/ .n/
(ii) lim supn!1 n D lim supn!1 n
.
Proof. Since .n/  .n/, we have

.n/ .n/ .n/ .n/


lim inf  lim inf I lim sup  lim sup : (6.9)
n!1 n n!1 n n!1 n n!1 n
n
Since 2k  n if and only if k log 2  log n, or equivalently, k  Πlog
log 2
, it follows
n
that p k > n for every prime p and every k > Πlog
log 2
; thus

log n
Πlog 2 
X X X
.n/  .n/ D log p D log p D
p k n;k2 kD2 1
pn k

log n
Πlog 2 
X 1 log n 1
.Œn k /  .Œn 2 /: (6.10)
log 2
kD2

P
Now trivially, .k/ D pk log p  k log k. Using this with (6.10) gives

1
.log n/2 n 2
.n/  .n/  : (6.11)
2 log 2

From (6.11) it follows that

.n/ .n/ .n/ .n/


lim inf  lim inf I lim sup  lim sup : (6.12)
n!1 n n!1 n n!1 n n!1 n

The proposition follows from (6.9) and (6.12). 


6 Chebyshev’s Theorem 71

Remark. The bound obtained in (6.11) can be improved by replacing the trivial
bound on , namely, .k/  k log k, by the bound obtained from Theorem 6.2.
We will carry out a lower-bound analysis of . This will be somewhat more
involved than the upper bound analysis for but still entirely elementary. For n 2 N
and p a prime, let vp .n/ denote the largest exponent k such that p k jn. One calls
vp .n/ the p-adic value of n. It follows from the definition of vp that any positive
integer n can be written as
Y
nD p vp .n/ : (6.13)
p

In Exercise 6.1 the reader is asked to prove the following simple formula:

vp .mn/ D vp .m/ C vp .n/; m; n 2 N: (6.14)

From (6.14) it follows that

X
n
vp .nŠ/ D vp .m/: (6.15)
mD1

We will need the following result.


Proposition 6.3.
1
X n
vp .nŠ/ D Œ k :
p
kD1

Proof. We can write


X
vp .m/ D 1:
1k<1;p k jm

Using this with (6.15), we have

X
n X
n X 1
X X
vp .nŠ/ D vp .m/ D 1D 1: (6.16)
mD1 mD1 1k<1;p k jm kD1 1mn;p k jm

If p k > n, then obviously there is no m 2 Œn for which p k jm. If p k  n,


then the integers m 2 Œn for which p k jm are the Œ pnk  integers p k ; : : : ; Œ pnk p k .
P
Thus, 1mn;pk jm 1 D Πpnk . Substituting this in (6.16) completes the proof of the
proposition. 
We can now carry out a lower-bound analysis of .
72 6 Chebyshev’s Theorem

Theorem 6.3.

.n/
 log 2:
lim inf
n n!1

 
Proof. Consider the binomial coefficient 2n
n
D .2n/Š
.nŠ/2
. Using (6.13) we have
!
2n .2n/Š Y Y
D 2
D p vp ..2n/Š/2vp .nŠ/ D p vp ..2n/Š/2vp .nŠ/ ; (6.17)
n .nŠ/ p p2n

where the final equality comes from the fact that neither .2n/Š nor nŠ has a prime
factor larger than 2n. From Proposition 6.3, we have
1 
X  2n   n 
vp ..2n/Š/  2vp .nŠ/ D 2 k : (6.18)
pk p
kD1

 2n     log 2n 
Of course, pk
D n
pk
D 0 if p k > 2n, that is, if k >log p
. Thus,
 log 2n 
in the summation over k above, we may replace the upper limit 1 by log p .
Furthermore, it is easy to verify that Œ2x  2Œx is equal to either 0 or 1, for all
real numbers x. From these two facts we obtain from (6.18) the estimate

 log 2n 
0  vp ..2n/Š/  2vp .nŠ/  : (6.19)
log p

From (6.17) and (6.19) we have the estimate


!
2n Y Πlog 2n 
 p log p : (6.20)
n p2n

On the other hand we have the easy estimate


!
2n 22n
 : (6.21)
n 2n
   
To prove (6.21), note that the middle binomial coefficient 2nn
maximizes 2nk
over
k 2 Œ2n. The reader is asked to prove this in Exercise 6.2. Thus, we have
! ! ! !
X
2n
2n X 2n
2n1
2n 2n
22n
D .1 C 1/ 2n
D D 2C  2 C .2n  1/  2n :
k k n n
kD0 kD1
6 Chebyshev’s Theorem 73

From (6.20) and (6.21), we conclude that

22n Y Πlog 2n 
 p log p
2n p2n

or, equivalently,
X  log 2n 
2n log 2  log 2n  log p: (6.22)
p2n
log p

P
Recalling from (6.8) that .2n/ D pk 2n;k1 log p, it follows that the summand
log p appears in .2n/ one time for each k  1 that satisfies p k  2n; that is, the
2n
summand log p appears Πlog
log p
 times. Thus, the right hand side of (6.22) is equal to
.2n/, giving the inequality

.2n/  2n log 2  log 2n: (6.23)

Of course then we also have

.2n C 1/  2n log 2  log 2n: (6.24)

Dividing (6.23) by 2n and dividing (6.24) by 2n C 1, and letting n ! 1, we


conclude that

.n/
lim inf  log 2;
n!1 n
which completes the proof of the theorem. 
We can now prove Chebyshev’s theorem in one line.
Proof of Theorem 6.1. The upper bound follows from Theorem 6.2 and part (ii)
of Proposition 6.1, while the lower bound follows from Theorem 6.3, part (i) of
Proposition 6.2, and part (i) of Proposition 6.1. 
Exercise 6.1. Prove (6.14): vp .mn/ D vp .m/ C vp .n/; m; n 2 N.
   
Exercise 6.2. Prove that 2n
n
D maxk2Œ2n 2n
k
.
Exercise 6.3. Bertrand’s postulate states that for each positive integer n, there
exists a prime in the interval .n; 2n/. This result was first proven by Chebyshev.
Use the upper and lower bounds obtained in this chapter for Chebyshev’s -function
to prove the following weak form of Bertrand’s postulate: For every > 0, there
exists an n0 . / such that for every n  n0 . / there exists a prime in the interval
.n; .2 C /n/.
74 6 Chebyshev’s Theorem

Chapter Notes

Chebyshev also proved that if limn!1 .n/nlog n exists, then this limit must be equal
to 1. For a proof, see Tenenbaums’ book [33]. Late in his life, in a letter, Gauss
recollected that in the early 1790s, when he was 15 or 16, he conjectured the prime
number theorem; however, he never published the conjecture. The theorem was
conjectured by Dirichlet in 1838. For some references for further reading, see the
notes at the end of Chap. 8.
Chapter 7
Mertens’ Theorems on the Asymptotic
Behavior of the Primes

Given a sequence of positive numbers fan g1 nD1 satisfying limn!1 an D 1, one


way to measure theP rate at which the sequence approaches 1 is to consider the rate
at which the series nj D1 a1j grows. For aj D j , it is well known that the harmonic
P P
series nj D1 j1 satisfies nj D1 j1 D log nCO.1/ as n ! 1. How does the harmonic
series of the primes behave? The goal of this chapter is to prove a theorem known
as Mertens’ second theorem.
Theorem 7.1.
X1
D log log n C O.1/; as n ! 1:
pn
p

Mertens’ second theorem will play a key role in the proof of the Hardy–
Ramanujan theorem in Chap. 8. For our proof of Mertens’ second theorem, we will
need a result known as Mertens’ first theorem.
Theorem 7.2.
X log p
D log n C O.1/; as n ! 1:
pn
p

We now prove Mertens’ two theorems.


Proof of Mertens’ first theorem. We will analyze the asymptotic behavior of log nŠ
in two different ways. Comparing the two results will prove the theorem. First we
show that

log nŠ D n log n C O.n/; as n ! 1: (7.1)

R.G. Pinsky, Problems from the Discrete to the Continuous, Universitext, 75


DOI 10.1007/978-3-319-07965-3__7, © Springer International Publishing Switzerland 2014
76 7 Mertens’ Theorems

p
We note that (7.1) follows from Stirling’s formula: nŠ  nn e n 2 n. However, we
certainly don’t need such a precise estimate of nŠ to obtain (7.1). We give a quick
direct proof of (7.1). Consider an integer m  2 and x 2 Œm  1; m. Integrating the
inequality log.m  1/  log x  log m over x 2 Œm  1; m gives
Z m
log.m  1/  log x dx  log m;
m1

which we rewrite as
Z m
0  log m  log x dx  log m  log.m  1/:
m1

Summing this inequality from m D 2 to m D n, and noting that the resulting series
on the right hand side is telescopic, we obtain
Z n
0  log nŠ  log x dx  log n: (7.2)
1
Rn
An integration by parts shows that 1 log x dx D n log n  n C 1. Substituting this
in (7.2) gives

n log n  n C 1  log nŠ  n log n  n C 1 C log n;

which completes the proof of (7.1).


To analyze log nŠ in another way, we utilize the function vp .n/ introduced in
Chap. 6. Recall that vp .n/, the p-adic value of n, is equal to the largest
Q exponent
k such that p k jn and that by the definition of vp , we have n D pp
vp .n/
D
Q vp .n/
pm p , for any integer m that is greater than or equal to the largest prime
divisor of n. Recall that Proposition 6.3 states that
1
X n
vp .nŠ/ D Œ k :
p
kD1

Thus, we have
Y Y P1 n
kD1 Πp k 
nŠ D p vp .nŠ/ D p ;
pn pn

and

XX 1 X n XX 1
n  n 
log nŠ D Œ k  log p D Œ  log p C Œ k  log p: (7.3)
pn
p pn
p pn
p
kD1 kD2
7 Mertens’ Theorems 77

We now analyze the two terms on the right hand of (7.3), beginning with the
second term. We have

X1 X1 1
n 1 p2 n
Πk  n Dn D :
p pk 1 p1 p.p  1/
kD2 kD2

Thus, we obtain

XX 1 X log p
n 
Πk  log p  n  C n; (7.4)
pn
p pn
p.p  1/
kD2

for some constant


P log p PC
1
> 0, the latter inequality following from the fact that
log m
pn p.p1/ < mD2 m.m1/ < 1. We write the first term on the right hand
side of (7.3) as
X n X log p X n n
Π log p D n  .  Π/ log p: (7.5)
pn
p pn
p pn
p p

Recalling that Theorem 6.2 gives .n/  .log 4/n, we can estimate the second term
on the right hand side of (7.5) by
X n n X
0 .  Π/ log p  log p D .n/  .log 4/n: (7.6)
pn
p p pn

From (7.3)–(7.6), we conclude that


X log p
log nŠ D n C O.n/; as n ! 1: (7.7)
pn
p

P log p
Comparing (7.1) with (7.7) allows us to conclude that pn p
D log n C O.1/,
completing the proof of Mertens’ first theorem. 
In order to use Mertens’ first theorem to prove his second theorem, we need to
introduce Abel summation, a tool that is used extensively in number theory. Abel
summation is a discrete version of integration by parts. It appears in a variety of
guises, the following of which is the most suitable in the present context.
Proposition 7.1 (Abel Summation). Let j0 ; n 2 Z with j0 < n. Let a W Œj0 ; n \
PŒt
Z ! R, and let A W Œj0 ; n ! R be defined by A.t / D kDj0 a.k/. Let f W
Œj0 ; n ! R be continuously differentiable. Then
X Z n
a.r/f .r/ D A.n/f .n/  A.j0 /f .j0 /  A.t /f 0 .t / dt: (7.8)
j0 <rn j0
78 7 Mertens’ Theorems

Remark. Since A.j0 / D a.j0 /, we could also write the above formula in the more
compact form
X Z n
a.r/f .r/ D A.n/f .n/  A.t /f 0 .t / dt: (7.9)
j0 rn j0

The form in the proposition of course mimics the standard integration by parts
formula.
Proof. Since A is constant between integers, we have

Z n1 Z
X X
n1
n
0
rC1
0
 
A.t /f .t / dt D A.t /f .t / dt D A.r/ f .r C 1/  f .r/ :
j0 rDj0 r rDj0
(7.10)
Substituting for A in the last term on the right hand side, and interchanging the order
of the resulting summation, we obtain

X
n1 n1  X
X r 
  
A.r/ f .r C 1/  f .r/ D a.k/ f .r C 1/  f .r/ D
rDj0 rDj0 kDj0

X
n1 X
n1
  X
n1
 
a.k/ f .r C 1/  f .r/ D a.k/ f .n/  f .k/ D
kDj0 rDk kDj0

X
n1
A.n  1/f .n/  a.k/f .k/: (7.11)
kDj0

From (7.10) and (7.11) we obtain


Z n X
n1
A.t /f 0 .t / dt D A.n  1/f .n/  a.k/f .k/:
j0 kDj0

Substituting this in the right hand side of (7.8) gives


Z n
A.n/f .n/  A.j0 /f .j0 /  A.t /f 0 .t / dt D
j0

X
n1 X
n
A.n/f .n/  A.j0 /f .j0 /  A.n  1/f .n/ C a.k/f .k/ D a.k/f .k/;
kDj0 kDj0 C1
(7.12)

which proves the proposition. 


7 Mertens’ Theorems 79

Proof of Mertens’ second theorem. Let


(
log p
; if n D pI
a.n/ D p
0; otherwise;

and let
1
f .t / D ; t > 1:
log t

We use Abel summation in the form (7.9) with j0 D 2. By Mertens’ first theorem,
we have

X
Œt
X log p
A.t / D a.k/ D D log t C O.1/; as t ! 1: (7.13)
p
kD2 pŒt

Thus, we obtain from (7.9) and (7.13),


X1 X log p 1 X Z n
D D a.r/f .r/ D A.n/f .n/  A.t /f 0 .t / dt D
pn
p pn
p log p 2rn 2
Z n
log n C O.1/ log t C O.1/
C dt: (7.14)
log n 2 t .log t /2

We have
Z n
1
dt D log log t jn2 D log log n  log log 2;
2 t log t
R
and since 1
t.log t/2
dt D  log1 t , we have
Z 1
1
dt < 1:
2 t .log t /2

Using these two facts in (7.14) gives


X1
D log log n C O.1/; as n ! 1; (7.15)
pn
p

completing the proof of Mertens’ second theorem. 


80 7 Mertens’ Theorems

Exercise 7.1. (a) Use Mertens’ first theorem and Abel summation to prove that

X log2 p 1
D log2 n C O.log n/:
pn
p 2

P 2 P
(Hint: Write pn logp p D 1rn a.r/ log r, where a.r/ is as in the proof of
Mertens’ second theorem.)
(b) Use induction and the result in (a) to prove that

X logk p 1
D logk n C O.logk1 n/;
pn
p k

for all positive integers k.


Exercise
P 7.2. Proposition 6.1Pin Chap. 6 showed that the two statements,
pn log p  n and .n/ D pn 1  logn n , can easily be derived one from the
other. The prime number theorem cannot be derived
P from Mertens’ second theorem.
Derive Mertens’ second theorem in the form pn p1  log log n from the prime
number theorem, .n/  logn n . (Hint: Use Abel summation.)

Chapter Notes

The two theorems in this chapter were proven by F. Mertens in 1874. For some
references for further reading, see the notes at the end of Chap. 8.
Chapter 8
The Hardy–Ramanujan Theorem
on the Number of Distinct Prime Divisors

Let !.n/ denote the number of distinct prime divisors of n; that is,
X
!.n/ D 1:
pjn

Thus, for example, !.1/ D 0, !.2/ D 1, !.9/ D 1, !.60/ D 3. The values


of !.n/ obviously fluctuate wildly as n ! 1, since !.p/ D 1, for every
prime p. However, there are not very many prime numbers, in the sense that
the asymptotic density of the primes is 0. In this chapter we prove the Hardy–
Ramanujan theorem, which in colloquial language states that “almost every” integer
n has “approximately” log log n distinct prime divisors. The meaning of “almost
every” is that the asymptotic density of those integers n for which the number of
distinct prime divisors is not “approximately” log log n is zero. The meaning of
“approximately” is that the actual number of distinct prime divisors of n falls in
1 1
the interval Œlog log n  .log log n/ 2 Cı ; log log n C .log log n/ 2 Cı , where ı > 0 is
arbitrarily small.
Theorem 8.1 (Hardy–Ramanujan). For every ı > 0,
1
jfn 2 ŒN  W j!.n/  log log nj  .log log n/ 2 Cı gj
lim D 1: (8.1)
N !1 N

Remark. From the proof of the theorem, it is very easy to infer that the statement of
the theorem is equivalent to the following statement: For every ı > 0,
1
jfn 2 ŒN  W j!.n/  log log N j  .log log N / 2 Cı gj
lim D 1:
N !1 N

R.G. Pinsky, Problems from the Discrete to the Continuous, Universitext, 81


DOI 10.1007/978-3-319-07965-3__8, © Springer International Publishing Switzerland 2014
82 8 Hardy–Ramanujan Theorem

While the statement of the theorem is probably more aesthetically pleasing than
this latter statement, the latter statement is more practical. Thus, for example, take
ı D :1. Then for sufficiently large n, a very high percentage of the positive integers
n
up to the astronomical number N D e e will have between nn:6 and nCn:6 distinct
prime factors. Let n D 109 . We leave it to the interested reader to estimate the
O.1/ terms appearing in the proofs of Mertens’ theorems, and to keep track of how
they appear in the proof of the Hardy–Ramanujan theorem below, and to conclude
109
that over ninety percent of the positive integers up to N D e e have between
109  .109 /:6 and 109 C .109 /:6 distinct prime factors. That is, over ninety percent
109
of the positive integers up to e e have between 109  251; 188 and 109 C 251; 188
distinct prime factors.
Our proof of the Hardy–Ramanujan theorem will have a probabilistic flavor. For
any positive integer N , let PN denote the uniform probability measure on ŒN ; that
is, PN .fj g/ D N1 , for j 2 ŒN . Then we may think of the distinct prime divisor
function ! D !.n/ as a random variable on the space ŒN  with the probability
measure PN . For the sequel, note that when we write PN .! 2 A/, where A  ŒN ,
what we mean is

jfn 2 ŒN  W !.n/ 2 Agj


PN .! 2 A/ D PN .fn 2 ŒN  W !.n/ 2 Ag/ D :
N
Let EN denote the expected value with respect to the measure PN . The expected
value of ! is given by

1 X
N
EN ! D !.n/: (8.2)
N nD1

The second moment of ! is given by

1 X 2
N
EN ! 2 D ! .n/: (8.3)
N nD1

The variance VarN .!/ of ! is defined by

VarN .!/ D EN .!  EN !/2 D EN ! 2  .EN !/2 : (8.4)

We will prove the Hardy–Ramanujan theorem by applying Chebyshev’s inequal-


ity to the random variable !:

VarN .!/
PN .j!  EN !j  /  ; for  > 0: (8.5)
2
8 Hardy–Ramanujan Theorem 83

In order to implement this, we need to calculate EN ! and VarN .!/ or, equivalently,
EN ! and EN ! 2 . The next two theorems give the asymptotic behavior as N ! 1
of EN ! and of EN ! 2 . The proofs of these two theorems will use Mertens’ second
theorem.
Theorem 8.2.

EN ! D log log N C O.1/; as N ! 1:

Remark. Recall the definition of the average order of an arithmetic function, given
in the remark following the number-theoretic proof of Theorem 2.1. Theorem 8.2
shows that the average order of !, the function counting the number of distinct
prime divisors, is given by the function log log n.
Proof. From the definition of the divisor function we have

X
N X
N X X X X N
!.n/ D 1D 1D ΠD
nD1 nD1 pjn pN pjn;nN pN
p
X 1 X N N
N  .  Π/: (8.6)
pN
p pN p p

The second term above satisfies the inequality


X N N X
0 .  Π/  1 D .N /  N: (8.7)
pN
p p pN

(We could use Chebyshev’s theorem (Theorem 6.1) to get the better bound O. logNN /
on the right hand side above, but that wouldn’t improve the order of the final bound
we obtain for EN !.) Mertens’ second theorem (Theorem 7.1) gives
X 1
D log log N C O.1/; as N ! 1: (8.8)
pN
p

From (8.6)–(8.8), we obtain

X
N
!.n/ D N log log N C O.N /; as N ! 1;
nD1

and dividing this by N gives

EN ! D log log N C O.1/; as N ! 1; (8.9)

completing the proof of the theorem. 


84 8 Hardy–Ramanujan Theorem

Theorem 8.3.

EN ! 2 D .log log N /2 C O.log log N /; as N ! 1:

Remark. To prove the Hardy–Ramanujan theorem, we only need the upper bound

EN ! 2  .log log N /2 C O.log log N /; as N ! 1: (8.10)

Proof. We have
X X X X X X
! 2 .n/ D . 1/2 D . 1/. 1/ D 1C 1D 1 C !.n/: (8.11)
pjn p1 jn p2 jn p1 p2 jn pjn p1 p2 jn
p1 ¤p2 p1 ¤p2

Thus,

X
N X
N X X
N
! 2 .n/ D 1C !.n/: (8.12)
nD1 nD1 p1 p2 jn nD1
p1 ¤p2

The second term on the right hand side of (8.12) can be estimated by Theorem 8.2,
giving

X
N
!.n/ D NEN ! D N log log N C O.N /; as N ! 1: (8.13)
nD1

To estimate the first term on the right hand side of (8.12), we write

X
N X X X X  N 
1D 1D D
nD1 p1 p2 jn p1 p2 N nN p p N
p1 p2
1 2
p1 ¤p2 p1 ¤p2 p1 p2 jn p1 ¤p2

X 1 X  N  N 
N   : (8.14)
p1 p2 N
p1 p2 p p N p1 p2 p1 p2
1 2
p1 ¤p2 p1 ¤p2

The number of ordered pairs of distinct primes .p1 ; p2 / such that p1 p2  N is of


course equal to twice the number of such unordered pairs fp1 ; p2 g. The fundamental
theorem of arithmetic states that each integer has a unique factorization into primes;
thus, if p1 p2 D p3 p4 , then necessarily fp1 ; p2 g D fp3 ; p4 g. Consequently the
number of unordered pairs fp1 ; p2 g such that p1 p2  N is certainly no greater
than N . Thus, the second term on the right hand side of (8.14) satisfies
X  N  N  X
0   1  2N: (8.15)
p p N
p1 p2 p1 p2 p p N
1 2 1 2
p1 ¤p2 p1 ¤p2
8 Hardy–Ramanujan Theorem 85

Using Mertens’ second theorem for the second inequality below, we bound from
above the summation in the first term on the right hand side of (8.14) by
X 1 X 1  2
. /2  log log N C O.1/ ; as N ! 1: (8.16)
p1 p2 N
p1 p2 pN
p
p1 ¤p2

From (8.12)–(8.16), we conclude that (8.10) holds.


To complete the proof of the theorem, we need to show (8.10) with the reverse
inequality. The easiest way to do this is to note simply that the variance is a
nonnegative quantity. Thus,
 2
EN ! 2  .EN !/2 D log log N C O.1/ D .log log N /2 C O.log log N /;

where the first equality follows from Theorem 8.2. For an alternative proof, see
Exercise 8.1. 
We now use Chebyshev’s inequality along with the estimates in Theorems 8.2
and 8.3 to prove the Hardy–Ramanujan theorem.
Proof of Theorem 8.1. From Theorems 8.2 and 8.3 we have
 2
VarN .!/DEN ! 2 .EN !/2 D.log log N /2 CO.log log N / log log N C O.1/ D
O.log log N /; as N ! 1: (8.17)

Theorem 8.2 gives

EN ! D log log N C RN ; where RN is bounded as N ! 1: (8.18)


1
Applying Chebyshev’s inequality with  D .log log N / 2 Cı , where ı > 0, we obtain
from (8.5), (8.17), and (8.18)
 1
 O.log log N /
PN j!  log log N  RN j  .log log N / 2 Cı  ; as N ! 1:
.log log N /1C2ı

Thus,
 1

lim PN j!  log log N  RN j  .log log N / 2 Cı D 1: (8.19)
N !1

Translating (8.19) back to the notation in the statement of the theorem, we have for
every ı > 0
1
jfn 2 ŒN  W j!.n/  log log N  RN j  .log log N / 2 Cı gj
lim D 1: (8.20)
N !1 N
86 8 Hardy–Ramanujan Theorem

The main difference between (8.20) and the statement of the Hardy–Ramanujan
theorem is that log log N appears in (8.20) and log log n appears in (8.1). Because
log log x is such a slowly varying function, this difference is not very significant.
The remainder of the proof consists of showing that if (8.20) holds for all ı > 0,
then (8.1) also holds for all ı > 0.
Fix an arbitrary ı > 0. Using the fact that (8.20) holds with ı replaced by 2ı , we
will show that (8.1) holds for ı. This will then complete the proof of the theorem.
The term RN in (8.20) may vary with N , but it is bounded in absolute value, say
1
by M . For N 2  n  N , we have
1
log log N  log log n  log log N  log log N 2 D log 2: (8.21)

Therefore, writing !.n/  log log n D .!.n/  log log N  RN / C .log log N 
log log n/ C RN , the triangle inequality and (8.21) give
1
j!.n/log log nj  j!.n/log log N RN jClog 2CM; for N 2  n  N: (8.22)

ı
Using (8.20) with ı replaced by 2
, along with (8.22) and the fact that
1
limN !1 NN 2
D 0, we have

1 1
jfn 2 ŒN  W j!.n/  log log nj  .log log N / 2 C 2 ı C log 2 C M gj
lim D 1:
N !1 N
(8.23)
1 1 1
By (8.21), it follows that .log log n/ 2 Cı  .log log N log 2/ 2 Cı , for N 2  n  N .
Clearly, we have
1 1 1
.log log N  log 2/ 2 Cı  .log log N / 2 C 2 ı C log 2 C M; for sufficiently large N:

Thus,
1 1 1 1
.log log n/ 2 Cı  .log log N / 2 C 2 ı C log 2 C M; for N 2  n  N and sufficiently large N:
(8.24)
1
From (8.23), (8.24), and the fact that limN !1 N
N
2
D 0, we conclude that

1
jfn 2 ŒN  W j!.n/  log log nj  .log log n/ 2 Cı gj
lim D 1:
N !1 N


Exercise 8.1. Prove the lower bound

EN ! 2  .log log N /2 C O.log log N /


8 Hardy–Ramanujan Theorem 87

P
by using (8.12)–(8.15) and an inequality that begins with 1
p1 p2 N p p
1 2

p1 ¤p2
P p 1
p1 ;p2  N p1 p2 .
p1 ¤p2

Exercise 8.2. Let .n/ denote the number of prime divisors of n,Qcounted with
prime factorization of n is given by n D m
repetitions. Thus, if the P ki
iD1 pi , then
m
!.n/ D m, but .n/ D iD1 ki . Use the method of proof in Theorem 8.2 to prove
that

1 X
N
EN  D .n/ D log log N C O.1/; as N ! 1:
N nD1

Exercise 8.3. Let d.n/ denote the number of divisors of n. Thus, d.12/ D 6
because the divisors of 12 are 1,2,3,4,6,12. Show that

1X
n
d.j / D log n C O.1/:
n j D1

This shows that the average order of the divisor function is the function log n. Recall
from the remark after Theorem 8.2 that the average order of !.n/, the function
counting the number
P of distinct
P prime divisors,
P Pis the function
P logP log n. (Hint: We
have d.k/ D mjk 1, so nkD1 d.k/ D k2Œn mjk 1 D m2Œn k2ŒnWmjk 1.)

Chapter Notes

The theorem of G. H. Hardy and S. Ramanujan was proved in 1917. The proof we
give is along the lines of the 1934 proof of P. Turán, which is much simpler than the
original proof. For more on multiplicative number theory and primes, the subject
of the material in Chaps. 6–8, the reader is referred to Nathanson’s book [27] and
to the more advanced treatment of Tenenbaum in [33]. In [27] one can find a proof
of the prime number theorem by “elementary” methods. For very accessible books
on analytic number theory and a proof of the prime number theorem using analytic
function theory, see, for example, Apostol’s book [5] or Jameson’s book [25]. For
a somewhat more advanced treatment, see the book of Montgomery and Vaughan
[26]. One can also find a proof of the prime number theorem using analytic function
theory, as well as a whole trove of sophisticated material, in [33].
Chapter 9
The Largest Clique in a Random Graph
and Applications to Tampering Detection
and Ramsey Theory

9.1 Graphs and Random Graphs: Basic Definitions

A finite graph G is a pair .V; E/, where V is a finite set of vertices and E is a
subset of V .2/ , the set of unordered pairs of elements of V . The elements of E
are called edges. (This is what graph theorists call a simple graph. That is, there
are no loops—edges connecting a vertex to itself—and there are no multiple edges,
more than one edge connecting the same pair of vertices.) If x; y 2 V and the pair
fx; yg 2 E, then we say that an edge joins the vertices x and y;otherwise, we say
that there is no edge joining x and y. If jV j D n, then jV .2/ j D n2 D 12 n.n1/. The
size of the graph is the number of vertices it contains, that is, jV j. We will identify
the vertex set V of a graph of size n with Œn. The graph G D .V; E/ with jV j D n
and E D V .2/ is called the complete graph of size n and is henceforth denoted
by Kn . This graph has n vertices and an edge connects every one of the 12 n.n  1/
pairs of vertices. See Fig. 9.1.
For a graph G D .V; E/ of size n, a clique of size k 2 Œn is a complete subgraph
.2/
K of G of size k; that is, K D .VK ; EK /, where VK  V; jVK j D k and EK D VK .
See Fig. 9.2.
Consider the vertex set V D Œn. Now construct the edge set E  Œn.2/ in the
following random fashion. Let p 2 .0; 1/. For each pair fx; yg 2 Œn.2/ , toss a coin
with probability p of heads and 1p of tails. If heads occurs, include the pair fx; yg
in E, and if tails occurs, do not include it in E. Do this independently for every
pair fx; yg 2 Œn.2/ . Denote the resulting random edge set by En .p/. The resulting
random graph is sometimes called an Erdős–Rényi graph; it will be denoted by
Gn .p/ D .Œn; En .p//. In this chapter, the generic notation P for probability and E
for expectation will be used throughout.
To get a feeling for how many edges one expects to see in the random graph,
attach to each of the N WD 12 n.n  1/ potential edges a random variable which is
equal to 1 if the edge exists in the random set of edges En .p/ and is equal to 0
if the edge does not exist in En .p/. Denote these random variables by fWm gN mD1 .
The random variables are distributed according to the Bernoulli distribution with

R.G. Pinsky, Problems from the Discrete to the Continuous, Universitext, 89


DOI 10.1007/978-3-319-07965-3__9, © Springer International Publishing Switzerland 2014
90 9 The Largest Clique in a Random Graph and Applications

Fig. 9.1 The complete graph with 5 vertices, G D K5

10
3
6
9
5

2 7 8
1
4

Fig. 9.2 A graph with 10 vertices and 13 edges. The largest clique is the one of size 4, formed by
the vertices f4; 5; 6; 7g

parameter p; that is, P .Wm D 1/ D 1  P .Wm D 0/ D p. Thus, the expectation


and the variance of Wm are given by EWm D p and
2 .Wm / D p.1  p/. Let SN D
P N
mD1 Wm denote the number of edges in the random graph. By the linearity of the
expectation, one has ESN D Np. Because edges have been selected independently,
the random variables fWm gNmD1 are independent. Thus, the variance of SN is the sum
of the variances of fWm gN
mD1 ; that is,
.SN / D Np.1p/. Therefore, Chebyshev’s
2

inequality gives

1C Np.1  p/
P .jSN  Npj  N 2 / :
N 1C
1C
Consequently, for any > 0, one has limN !1 P .jSN  Npj  N 2 / D 0. Thus,
for any > 0 and large n (depending on ), with high probability the Erdős–Rényi
graph Gn .p/ will have 12 n2 p C O.n1C / edges.
The main question we address in this chapter is this: how large is the largest
complete subgraph, that is, the largest clique, in Gn .p/, as n ! 1? We study this
question in Sect. 9.2. In Sect. 9.3 we apply the results of Sect. 9.2 to a problem in
tampering detection. In Sect. 9.4, we discuss Ramsey theory for cliques in graphs
and use random graphs to give a bound on the size of a fundamental deterministic
quantity.
9.2 The Size of the Largest Clique 91

9.2 The Size of the Largest Clique in a Random Graph

Let Ln;p be the random variable denoting the size of the largest clique in Gn .p/. Let
.2/
log 1 n WD log 1 log 1 n.
p p p

Theorem 9.1. Let Ln;p denote the size of the largest clique in the Erdős–Rényi
graph Gn .p/. Then
(
 .2/  0; if c < 2I
lim P Ln;p  2 log 1 n  c log 1 n D
n!1 p p 1; if c > 2:

Remark. Despite the increasing randomness and disorder in Gn .p/ as n grows,


the theorem shows that Ln;p behaves almost deterministically—with probability
approaching 1 as n ! 1, the size of the largest clique will be very close to
.2/
2 log 1 n  2 log 1 n. In fact, it is known that for each n, there exists a value dn
p p
such that limn!1 P .Ln equals either dn or dn C 1/ D 1. That is, with probability
approaching 1 as n ! 1, Ln is restricted to two specific values. The proof of this
is similar to the proof of Theorem 9.1 but a little more delicate; see [9]. We have
chosen the formulation in Theorem 9.1 in particular because it is natural for the
topic discussed in Sect. 9.3.
Let Nn;p .k/ be the random variable denoting the number of cliques of size
k in the random graph Gn .p/. We will always assume tacitly that the argu-
ment of Nn;p is a positive integer. Of course it follows from Theorem 9.1 that
.2/
limn!1 P .Nn;p .kn / D 0/ D 1, if kn  2 log 1 n  c log 1 n, for some c < 2.
p p
We say then that the random variable Nn;p .kn / converges in probability to 0 as
n ! 1. The proof of Theorem 9.1 will actually show that if kn  2 log 1 n 
p
.2/
c log 1 n, for some c > 2, then limn!1 P .Nn;p .kn / > M / D 1, for any M 2 R.
p
We say then that the random variable Nn;p .kn / converges in probability to 1 as
n ! 1. We record this as a corollary.
Corollary 9.1.
.2/
i. If kn  2 log 1 n  c log 1 n, for some c < 2, then Nn;p .kn / converges to 0 in
p p
probability; that is,

lim P .Nn;p .kn / D 0/ D 1I


n!1

.2/
ii. If kn  2 log 1 n  c log 1 n, for some c > 2, then Nn;p .kn / converges to 1 in
p p
probability; that is,

lim P .Nn;p .kn / > M / D 1; for all M 2 R:


n!1
92 9 The Largest Clique in a Random Graph and Applications

 of Theorem 9.1. The number of cliques of size kn in the complete graph Kn
Proof
is knn ; denote these cliques by fKjn W j D 1; : : : ; knn g. Let IKjn be the indicator
random variable defined to be equal to 1 or 0, according to whether the clique Kjn is
or is not contained in the random graph Gn .p/. Then we can represent the random
variable Nn;p .kn /, denoting the number of cliques of size kn in the random graph
Gn .p/, as

.knn /
X
Nn;p .kn / D IKjn : (9.1)
j D1

Let P .Kjn / denote the probability that the clique Kjn is contained in Gn .p/; that is,
the probability that the edges of the clique Kjn are all contained in the random edge
 
set En .p/ of Gn .p/. Since each clique Kjn contains k2n edges, we have

kn
P .Kjn / D p . 2 / :

The expected value EIKjn of IKjn is given by EIKjn D P .Kjn /. Thus, the expected
value of Nn;p .kn / is given by

.knn / !
X n kn
ENn;p .kn / D EIKjn D p. 2 /: (9.2)
j D1
kn

We will first prove that if c < 2, then

.2/
lim P .Ln;p  2 log 1 n  c log 1 n/ D 0: (9.3)
n!1 p p

We have

ENn;p .kn /  P .Nn;p .kn /  1/ D P .Ln;p  kn /;

where the equality follows from the fact that a clique of size l contains sub-cliques
of size j for all j 2 Œl  1. Thus, to prove (9.3) it suffices to prove that

.2/
lim ENn;p .2 log 1 n  cn log 1 n/ D 0; (9.4)
n!1 p p

where 0  cn  c < 2, for all n. (We have written cn instead of c in (9.4) because
we need the argument of Nn;p to be an integer.) This approach to proving (9.3) is
known as the first moment method.
9.2 The Size of the Largest Clique 93

To prove (9.4), we need the following lemma.


1
Lemma 9.1. If kn D o.n 2 /, as n ! 1, then
!
n nkn
 ; as n ! 1:
kn kn Š

n n.n1/.nkn C1/


Proof. We have kn
D kn Š
. Thus, to prove the lemma we need to show
that

n.n  1/    .n  kn C 1/
lim D 1;
n!1 nkn
or, equivalently,

n 1
kX
j
lim log.1  / D 0: (9.5)
n!1
j D1
n

Letting f .x/ D  log.1  x/, and applying Taylor’s remainder theorem in the form
f .x/ D f .0/ C f 0 .x  .x//x, for x > 0, where x  .x/ 2 .0; x/, we have

1
0   log.1  x/  2x; 0  x  :
2

Thus, for n sufficiently large so that kn


n
 12 , we have

n 1
kX kXn 1
j j .kn  1/kn
0 log.1  /2 D :
j D1
n j D1
n n

1
Letting n ! 1 in the above equation, and using the assumption that kn D o.n 2 /,
we obtain (9.5). 
.2/
We can now prove (9.4). Let kn D 2 log 1 n  cn log 1 n, where 0  cn  c < 2,
p p
for all n. Stirling’s formula gives
p
kn Š  knkn e kn 2kn ; as n ! 1:

Using this with Lemma 9.1 and (9.2), we have


! kn .kn 1/
n kn nkn kn .kn 1/ nkn p 2
ENn;p .kn / D p. 2 /  p 2  k p ; as n ! 1;
kn kn Š knn e kn 2kn
94 9 The Largest Clique in a Random Graph and Applications

and thus
!
n kn
log 1 ENn;p .kn / D log 1 p. 2 / 
p p kn
1 1 1
kn log 1 n  kn2 C kn  kn log 1 kn C kn log 1 e  log 1 2kn ; as n ! 1:
p 2 2 p p 2 p
(9.6)
Note that
.2/
  cn log 1 n 
.2/
log 1 kn D log 1 .2 log 1 n  cn log 1 n/ D log 1 .log 1 n/ 2  D
p
p p p p p p log 1 n
p

.2/
 cn log 1 n 
.2/ .2/
log 1 n C log 1 2  D log 1 n C O.1/; as n ! 1:
p
(9.7)
p p log 1 n p
p

Substituting for kn and using (9.7), we have

1 .2/
kn log 1 n  kn2  kn log 1 kn D .2 log 1 n  cn log 1 n/ log 1 n
p 2 p p p p

1 .2/ .2/ .2/


.2 log 1 n  cn log 1 n/2  .2 log 1 n  cn log 1 n/.log 1 n C O.1// D
2 p p p p p

.2/  
.cn  2/.log 1 n/ log 1 n C O log 1 n : (9.8)
p p p

Since 12 kn C kn log 1 e  12 log 1 2kn D O.log 1 n/, it follows from (9.6), (9.8), and
p p p
the fact that 0  cn  c < 2 that
.2/
lim log 1 ENn;p .2 log 1 n  cn log 1 n/ D 1:
n!1 p p p

Thus, (9.4) holds, completing the proof of (9.3).


We now prove that if c > 2, then

.2/
lim P .Ln;p  2 log 1 n  c log 1 n/ D 1: (9.9)
n!1 p p

The analysis in the above paragraph shows that if cn  c > 2, for all n, then

.2/
lim ENn;p .2 log 1 n  cn log 1 n/ D 1: (9.10)
n!1 p p

The first moment method used above exploits the fact that (9.4) implies (9.3).
Now (9.10) does not imply (9.9). To prove (9.9), we employ the second moment
9.2 The Size of the Largest Clique 95

method. (This method was also used in Chap. 3 and Chap. 8.) The variance of
Nn;p .kn / is given by

   2  2
Var Nn;p .kn / D E Nn;p .kn /  ENn;p .kn / D ENn;p
2
.kn /  ENn;p .kn / :
(9.11)
.2/
Our goal now is to show that if kn D 2 log 1 n  cn log 1 n with cn  c > 2, for all
p p
n, then
   2 
Var Nn;p .kn / D o ENn;p .kn / ; as n ! 1: (9.12)

Chebyshev’s inequality gives for any > 0


 
  Var Nn;p .kn /
P jNn;p .kn /  ENn;p .kn /j  jENn;p .kn /j   2 : (9.13)
2 ENn;p .kn /

Thus, (9.12) and (9.13) yield

Nn;p .kn /
lim P .j  1j < / D 1; for all > 0: (9.14)
n!1 ENn;p .kn /

From (9.14) and (9.10), it follows that

lim P .Nn;p .kn / > M / D 1; for all M 2 R: (9.15)


n!1

In particular then, (9.9) follows from (9.15). Thus, the proof of the theorem will be
complete when we prove (9.12), or, in light of (9.11), when we prove that
 2  2 
2
ENn;p .kn / D ENn;p .kn / C o .ENn;p .kn / ; as n ! 1: (9.16)
 
We relabel the cliques fKjn W j D 1; : : : ; knn g, of size kn in Kn according to
the vertices that are contained in each clique. Thus, we write Kin1 ;i2 ;:::;ikn to denote
the clique whose vertices are i1 ; i2 ; : : : ; ikn . The representation for Nn;p .kn / in (9.1)
becomes
X
Nn;p .kn / D IKin ;i ;:::;i : (9.17)
1 2 kn
1i1 <i2 <<ikn n

Note that the random variable IKin ;i ;:::;i IKln ;l ;:::;l is equal to 1 if the edges of the
1 2 kn 1 2 kn
two cliques Kin1 ;i2 ;:::;ikn and Kln1 ;l2 ;:::;lk are all contained in Gn .p/ and is equal to 0
n
otherwise. Thus,

EIKin ;i IKln ;l D P .Kin1 ;i2 ;:::;ikn [ Kln1 ;l2 ;:::;lkn /;


1 2 ;:::;ikn 1 2 ;:::;lkn
96 9 The Largest Clique in a Random Graph and Applications

where P .Kin1 ;i2 ;:::;ikn [ Kln1 ;l2 ;:::;lk / is the probability that the edges of Kin1 ;i2 ;:::;ikn and
n
Kln1 ;l2 ;:::;lk are all contained in the random edge set En .p/ of Gn .p/. Consequently,
n
we have
X
ENn;p2
.kn / D EIKin ;i ;:::;i IKln ;l ;:::;l D
1 2 kn 1 2 kn
1i1 <i2 <<ikn n
1l1 <l2 <<lkn n
X
P .Kin1 ;i2 ;:::;ikn [ Kln1 ;l2 ;:::;lkn /: (9.18)
1i1 <i2 <<ikn n
1l1 <l2 <<lkn n

Now by symmetry considerations, it follows that the sum


X
P .Kin1 ;i2 ;:::;ikn [ Kln1 ;l2 ;:::;lkn /
1l1 <l2 <<lkn n

over all kn -tuples 1  l1 < l2 <    < lkn  n is independent of the particular
choice of kn -tuple i1 ; i2 ; : : : ; ikn . (The reader should
 verify
 this.) For convenience,
we select the kn -tuple 1; 2; : : : ; kn . Since there are knn different kn -tuples, we have
!
n X
2
ENn;p .kn / D n
P .K1;2;:::;kn
[ Kln1 ;l2 ;:::;lkn /: (9.19)
kn
1l1 <l2 <<lkn n

Let

J D J.l1 ; l2 ; : : : ; lkn / D jŒkn  \ fl1 ; l2 ; : : : ; lkn gj

n
denote the number of vertices shared by the cliques K1;2;:::;k and Kln1 ;l2 ;:::;lk . Each
kn  n n
of these two cliques has 2 edges. Since the cliques share J vertices, the number
   
n
of edges in K1;2;:::;k [ Kln1 ;l2 ;:::;lk is equal to 2 k2n  J2 , if J  2, and is equal to
  n n
2 knn , if J D 0 or J D 1. Thus,
( kn J
p 2. 2 /. 2 / ; if J D J.l1 ; l2 ; : : : ; lkn /  2I
n
P .K1;2;:::;k [ Kln1 ;l2 :::;lkn / D kn (9.20)
p 2. 2 / ; if J D J.l1 ; l2 ; : : : ; lk /  1:
n
n

Substituting (9.20) into (9.19), we have


! !
n X n X
2.k2n /.J2 / kn
2
ENn;p .kn / D p C p 2. 2 / :
kn kn
1l1 <l2 <<lkn n 1l1 <l2 <<lkn n
J.l1 ;l2 ;:::;lkn /2 J.l1 ;l2 ;:::;lkn /1
(9.21)
9.2 The Size of the Largest Clique 97

Keep in mind that our aim is to prove (9.16). We will do this  by showing that
2 
the first term on the right hand side of (9.21) is equal to o .ENn;p .kn / and
 2
that
 the second term  on the right hand side of (9.21) is equal to ENn;p .kn / C
2
o .ENn;p .kn / .
In order to analyze the two terms on the right hand side of (9.21), we need to
count the number of kn -tuples l1 ; l2 ; : : : ; lkn for which J.l1 ; l2 ; : : : ; lkn / D j , for
j D 0; 1; : : : ; kn . Denote this number by #.j /. In order that J.l1 ; l2 ; : : : ; lkn / D j ,
we need to choose j of the vertices of l1 ; l2 ; : : : ; lkn from the set Œkn  and the other
kn  j vertices of l1 ; l2 ; : : : ; lkn from the set Œn  Œkn . Thus,
! !
kn n  kn
#.j / D ; j D 0; 1; : : : ; kn : (9.22)
j kn  j

We first show that


 the second term on the right hand side of (9.21) is equal to
 2 2 
ENn;p .kn / C o .ENn;p .kn / . Using (9.22), we have
! ! ! ! ! !
n X kn n h k n  k k n  k i kn
p 2. / D p 2. 2 / D
n n n n
2 C
kn kn 0 kn  0 1 kn  1
1l1 <l2 <<lkn n
J.l1 ;l2 ;:::;lkn /1

! ! ! h  nkn i
n h n  kn n  kn i kn  2
nkn
kn
C k n kn 1
C kn p 2. 2 / D ENn;p .kn / n ;
kn kn kn  1 kn
(9.23)
  kn
where (9.2) was used for the final equality. By Lemma 9.1, knn  nkn Š , and applying
 n  .nkn /kn kn
Lemma 9.1 with n replaced by nkn , we have nk kn
 kn Š D nkn Š .1 knn /kn 
nk n 1  n kn 1
kn Š
, since kn D o.n 2 /. Of course then also nk
kn 1
 .knn 1/Š . Thus,

nkn  nkn  nk n kn 1
kn
C kn kn 1 kn Š
C kn .knn 1/Š kn2
n  D1C : (9.24)
nk n n
kn kn Š

From (9.23) and (9.24), we conclude that the secondterm on the right hand side
 2 2
of (9.21) is equal to ENn;p .kn / C o .ENn;p .kn / .
Now we consider the first term on the right hand side of (9.21). Of course, $\binom{n-k_n}{k_n-j} \le \frac{n^{k_n-j}}{(k_n-j)!}$ and $\binom{k_n}{j} \le \frac{k_n^j}{j!}$. Also, by Lemma 9.1, $\binom{n}{k_n} \sim \frac{n^{k_n}}{k_n!}$. Using these estimates and (9.22), and recalling from (9.2) that $\big(E N_{n,p}(k_n)\big)^2 = \binom{n}{k_n}^2 p^{2\binom{k_n}{2}}$, we can estimate the first term on the right hand side of (9.21) by

$\binom{n}{k_n} \sum_{\substack{1\le l_1<\cdots<l_{k_n}\le n\\ J(l_1,\ldots,l_{k_n})\ge 2}} p^{2\binom{k_n}{2}-\binom{J}{2}} = \binom{n}{k_n}\sum_{j=2}^{k_n}\binom{k_n}{j}\binom{n-k_n}{k_n-j}\,p^{2\binom{k_n}{2}-\binom{j}{2}} = \big(E N_{n,p}(k_n)\big)^2 \sum_{j=2}^{k_n} \frac{\binom{k_n}{j}\binom{n-k_n}{k_n-j}}{\binom{n}{k_n}}\, p^{-\binom{j}{2}} \le \big(1+o(1)\big)\big(E N_{n,p}(k_n)\big)^2 \sum_{j=2}^{k_n} \frac{k_n^j\, n^{k_n-j}\, k_n!}{(k_n-j)!\, j!\, n^{k_n}}\, p^{-\binom{j}{2}} \le \big(1+o(1)\big)\big(E N_{n,p}(k_n)\big)^2 \sum_{j=2}^{k_n} \frac{k_n^{2j}}{n^j\, j!}\, p^{-\frac{j(j-1)}{2}}.$   (9.25)
By Stirling's formula, $j! \sim j^j e^{-j}\sqrt{2\pi j}$, as $j\to\infty$, and thus there exists a constant $C > 0$ such that

$j! \ge C j^j e^{-j}, \quad \text{for all } j \ge 2.$   (9.26)

It is easy to check that $j p^{\frac{j-1}{2}}$ is decreasing in $j$ for $j$ sufficiently large. Using this and the fact that $\lim_{j\to\infty} j p^{\frac{j-1}{2}} = 0$, it follows that

$\min_{2\le j\le k_n} j p^{\frac{j-1}{2}} = k_n p^{\frac{k_n-1}{2}}, \quad \text{for sufficiently large } k_n.$   (9.27)

Using (9.26) for the first inequality below and (9.27) for the second inequality below, for sufficiently large $n$ the summation in the last term on the right hand side of (9.25) can be estimated by

$\sum_{j=2}^{k_n} \frac{k_n^{2j}}{n^j\, j!}\, p^{-\frac{j(j-1)}{2}} \le \frac{1}{C}\sum_{j=2}^{k_n}\Big(\frac{e k_n^2}{j\, n\, p^{\frac{j-1}{2}}}\Big)^j \le \frac{1}{C}\sum_{j=2}^{k_n}\Big(\frac{e k_n}{n\, p^{\frac{k_n-1}{2}}}\Big)^j \le \frac{1}{C}\sum_{j=2}^{\infty}\Big(\frac{\sqrt{p}\, e\, k_n}{n\, p^{\frac{k_n}{2}}}\Big)^j = \frac{1}{C}\,\frac{\gamma_n^2}{1-\gamma_n}, \quad \text{if } \gamma_n := \frac{\sqrt{p}\, e\, k_n}{n\, p^{\frac{k_n}{2}}} < 1.$   (9.28)

Using the fact that $k_n = 2\log_{\frac{1}{p}} n - c_n \log^{(2)}_{\frac{1}{p}} n$ with $c_n \ge c > 2$, we now show that

$\lim_{n\to\infty} \gamma_n = \sqrt{p}\, e \lim_{n\to\infty} \frac{k_n}{n\, p^{\frac{k_n}{2}}} = 0.$   (9.29)

Using (9.7) (which of course holds for $\{c_n\}_{n=1}^{\infty}$ as above) for the second equality below, we have

$\log_{\frac{1}{p}} \frac{k_n}{n p^{\frac{k_n}{2}}} = \log_{\frac{1}{p}} k_n - \log_{\frac{1}{p}} n - \frac{k_n}{2}\log_{\frac{1}{p}} p = \log^{(2)}_{\frac{1}{p}} n + O(1) - \log_{\frac{1}{p}} n + \frac{k_n}{2} = \log^{(2)}_{\frac{1}{p}} n + O(1) - \log_{\frac{1}{p}} n + \log_{\frac{1}{p}} n - \frac{c_n}{2}\log^{(2)}_{\frac{1}{p}} n = \Big(1 - \frac{c_n}{2}\Big)\log^{(2)}_{\frac{1}{p}} n + O(1), \quad \text{as } n\to\infty.$

Since $c_n \ge c > 2$, it follows from this that $\lim_{n\to\infty} \log_{\frac{1}{p}} \frac{k_n}{n p^{\frac{k_n}{2}}} = -\infty$ and consequently that (9.29) holds. From (9.25), (9.28), and (9.29) we conclude that the first term on the right hand side of (9.21) is indeed $o\big((E N_{n,p}(k_n))^2\big)$. This completes the proof of Theorem 9.1. $\square$

9.3 Detecting Tampering in a Random Graph

The tampering detection problem we discuss is intimately related to Theorem 9.1 and Corollary 9.1. Consider the random graph $G_n(p) = ([n], E_n(p))$. Of course, $E_n(p) \subset [n]^{(2)}$ is a random subset of $[n]^{(2)}$. Consider now the complete graph $K_n$, whose edge set is $[n]^{(2)}$. Let $k_n$ satisfy $1 \le k_n \le n$. There are $\binom{n}{k_n}$ different cliques of size $k_n$ in $K_n$. We choose one of these $\binom{n}{k_n}$ cliques at random and "add" all of its edges to the random edge set $E_n(p)$ (of course some of these additional edges might already be in $E_n(p)$); that is, we take the union of $E_n(p)$ and the edges of the randomly chosen clique. We denote this new augmented edge set by $E_n^{tam;k_n}(p)$ and denote the corresponding tampered graph by $G_n^{tam;k_n}(p)$. See Fig. 9.3.

The question we ask is whether one can detect the tampering asymptotically as $n \to \infty$. Of course, we need to define what we mean by detecting the tampering. For this we need to define a distance between measures.

Consider a finite set $\Omega$ and consider probability measures $\mu$ and $\nu$ on $\Omega$. We define the total variation distance between $\mu$ and $\nu$ by

$D_{TV}(\mu,\nu) := \max_{A\subset\Omega} |\mu(A) - \nu(A)|.$   (9.30)

Fig. 9.3 The graph from Fig. 9.2 of size $n = 10$ has been tampered with by adding to it the clique of size $k_n = 3$ formed by the vertices {3, 6, 10}
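To make the tampering construction concrete, here is a minimal Python sketch (ours, not from the text; all names are our own) that samples the edge set $E_n(p)$ and adjoins the edges of a uniformly chosen clique of size $k_n$:

```python
import random
from itertools import combinations

def tampered_graph(n, p, k, seed=None):
    """Sample E_n(p) and adjoin the edges of a uniformly chosen k-clique."""
    rng = random.Random(seed)
    # E_n(p): each of the C(n,2) possible edges is present independently with prob. p
    edges = {e for e in combinations(range(1, n + 1), 2) if rng.random() < p}
    # choose one of the C(n,k) cliques uniformly at random and take the union
    clique = rng.sample(range(1, n + 1), k)
    tampered = edges | set(combinations(sorted(clique), 2))
    return edges, tampered

original, tampered = tampered_graph(10, 0.5, 3, seed=1)
print(len(original), len(tampered))  # the union can add at most C(3,2) = 3 edges
```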

In Exercise 9.1, the reader is asked to show that the distance $D_{TV}(\mu,\nu)$ can be written in two other fashions:

$D_{TV}(\mu,\nu) = \max_{A\subset\Omega}\big(\mu(A) - \nu(A)\big) = \frac{1}{2}\sum_{x\in\Omega}|\mu(x) - \nu(x)|.$   (9.31)

It is easy to see that $D_{TV}(\mu,\nu)$ takes on values in $[0,1]$, vanishes if and only if $\mu = \nu$, and equals 1 if and only if $\mu$ and $\nu$ are mutually singular. We recall that two probability measures $\mu$ and $\nu$ are called mutually singular if there exists a subset $A \subset \Omega$ such that $\mu(A) = \nu(\Omega - A) = 1$ (and then of course $\mu(\Omega - A) = \nu(A) = 0$).

Consider now an $\Omega$-valued random variable $X$ (defined on some probability space $(S, P)$). The random variable $X$ induces a probability measure $\mu_X$ on $\Omega$, namely for any subset $A \subset \Omega$, we define $\mu_X(A) = P(X \in A)$. This probability measure is called the distribution of $X$. Given two random variables $X, Y$ taking values in $\Omega$, we define the total variation distance between them by

$D_{TV}(X,Y) := D_{TV}(\mu_X, \mu_Y).$
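For a small finite $\Omega$, both the definition (9.30) and the half-sum formula in (9.31) can be evaluated directly; the following sketch (ours) checks that they agree:

```python
from itertools import chain, combinations

def tv_max(mu, nu):
    """D_TV via (9.30): maximize |mu(A) - nu(A)| over all subsets A of Omega."""
    omega = list(mu)
    subsets = chain.from_iterable(combinations(omega, r) for r in range(len(omega) + 1))
    return max(abs(sum(mu[x] for x in A) - sum(nu[x] for x in A)) for A in subsets)

def tv_half_sum(mu, nu):
    """D_TV via (9.31): one half of the l^1 distance between the two measures."""
    return 0.5 * sum(abs(mu[x] - nu[x]) for x in mu)

mu = {'a': 0.5, 'b': 0.3, 'c': 0.2}
nu = {'a': 0.25, 'b': 0.25, 'c': 0.5}
print(tv_max(mu, nu), tv_half_sum(mu, nu))  # both give 0.3
```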

We now apply the above concepts to the random graph. The original random graph $G_n(p)$ has as its edge set $E_n(p)$, whereas the tampered random graph $G_n^{tam;k_n}(p)$ has the augmented edge set $E_n^{tam;k_n}(p)$. Each of the random variables $E_n(p)$ and $E_n^{tam;k_n}(p)$ takes values in the space $\mathcal{P}([n]^{(2)}) := 2^{[n]^{(2)}}$, the set of all subsets of $[n]^{(2)}$. (Given a set $A$, the set of all subsets of $A$ is sometimes denoted by $2^A$; it is known as the power set of $A$.) We define the tamper detection problem as follows.
Definition.
i. If

$\lim_{n\to\infty} D_{TV}\big(E_n(p), E_n^{tam;k_n}(p)\big) = 0,$

we say that the tampering is strongly undetectable.
ii. If

$\lim_{n\to\infty} D_{TV}\big(E_n(p), E_n^{tam;k_n}(p)\big) = 1,$

we say that the tampering is detectable.
iii. If

$\liminf_{n\to\infty} D_{TV}\big(E_n(p), E_n^{tam;k_n}(p)\big) > 0 \quad\text{and}\quad \limsup_{n\to\infty} D_{TV}\big(E_n(p), E_n^{tam;k_n}(p)\big) < 1,$

we say that the tampering is weakly undetectable.


We will prove the following theorem.

Theorem 9.2. Consider the Erdős–Rényi graph $G_n(p)$ with random edge set $E_n(p)$ and consider the tampered graph $G_n^{tam;k_n}(p)$ obtained by choosing at random a clique of size $k_n$ from the complete graph $K_n$ and adjoining its edges to $E_n(p)$ to create the augmented edge set $E_n^{tam;k_n}(p)$.
i. If $k_n \ge 2\log_{\frac{1}{p}} n - c\log^{(2)}_{\frac{1}{p}} n$, for some $c < 2$, then the tampering is detectable; that is, $\lim_{n\to\infty} D_{TV}(E_n(p), E_n^{tam;k_n}(p)) = 1$.
ii. If $k_n \le 2\log_{\frac{1}{p}} n - c\log^{(2)}_{\frac{1}{p}} n$, for some $c > 2$, then the tampering is strongly undetectable; that is, $\lim_{n\to\infty} D_{TV}(E_n(p), E_n^{tam;k_n}(p)) = 0$.

Remark. In light of Corollary 9.1, Theorem 9.2 seems quite intuitive. Indeed, if $k_n \ge 2\log_{\frac{1}{p}} n - c\log^{(2)}_{\frac{1}{p}} n$, with $c < 2$, then $N_{n,p}(k_n)$, the number of cliques of size $k_n$ in the random graph $G_n(p)$, converges to 0 in probability. However, by construction, the tampered graph will always have such a clique. Thus, clearly, one can distinguish between the corresponding measures. On the other hand, if $k_n \le 2\log_{\frac{1}{p}} n - c\log^{(2)}_{\frac{1}{p}} n$, with $c > 2$, then $N_{n,p}(k_n)$ converges to $\infty$ in probability. That is, for arbitrary $M$, the number of cliques of size $k_n$ in $G_n(p)$ will be larger than $M$ with probability approaching 1 as $n \to \infty$. Since the tampered graph $G_n^{tam;k_n}(p)$ is obtained from the original graph $G_n(p)$ by adjoining a randomly chosen clique of size $k_n$ from the complete graph $K_n$, and since the number of cliques of size $k_n$ in $G_n(p)$ grows unboundedly as $n \to \infty$ with probability approaching 1, it seems intuitive that the addition of a single randomly chosen clique would hardly be felt, and that asymptotically, the two graphs would be indistinguishable. Despite the above intuition, which leads to the correct answer in the present situation, there are situations in which this intuition leads one astray. See the notes at the end of the chapter.
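The dichotomy described in the remark is easy to see numerically for small $n$; the sketch below (ours, not from the text) counts cliques of a given size in one sample of $G_n(p)$ by brute force:

```python
import random
from itertools import combinations

def count_cliques(n, p, k, seed=0):
    """Brute-force count of k-cliques in one sample of G_n(p); feasible for small n."""
    rng = random.Random(seed)
    edges = {e for e in combinations(range(n), 2) if rng.random() < p}
    return sum(all((u, v) in edges for u, v in combinations(S, 2))
               for S in combinations(range(n), k))

for k in (3, 4, 5, 6):
    print(k, count_cliques(30, 0.5, k))  # the counts fall off sharply as k grows
```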
Proof. For notational clarity at a certain point in the proof, we will denote $N_{n,p}(k_n)$, the random variable denoting the number of cliques of size $k_n$ in the random graph $G_n(p)$, by $N^{(k_n)}_{n,p}$. For the proof of part (ii) of the theorem, we will need the weak law of large numbers for the random variable $N^{(k_n)}_{n,p}$:

If $k_n \le 2\log_{\frac{1}{p}} n - c\log^{(2)}_{\frac{1}{p}} n$, for some $c > 2$, then for all $\epsilon > 0$,
$\lim_{n\to\infty} P\Big(\Big|\frac{N^{(k_n)}_{n,p}}{E N^{(k_n)}_{n,p}} - 1\Big| < \epsilon\Big) = 1.$   (9.32)

This result was actually proved in the course of the proof of Theorem 9.1—it appears as (9.14).

Let $\mu_n$ denote the distribution of the random variable $E_n(p)$ and let $\mu_{n;tam}$ denote the distribution of the random variable $E_n^{tam;k_n}(p)$. Let $\{K^n_j : j = 1,\ldots,\binom{n}{k_n}\}$
denote the $\binom{n}{k_n}$ cliques of size $k_n$ in the complete graph $K_n$. Recall that $\mathcal{P}([n]^{(2)})$ denotes the set of subsets of $[n]^{(2)}$; thus, a point $\omega \in \mathcal{P}([n]^{(2)})$ is a subset of $[n]^{(2)}$, while a subset $A \subset \mathcal{P}([n]^{(2)})$ is a collection of subsets of $[n]^{(2)}$. Denote by $A^n_j \subset \mathcal{P}([n]^{(2)})$ the subset of $\mathcal{P}([n]^{(2)})$ consisting of all those subsets of $[n]^{(2)}$ which contain all of the $\binom{k_n}{2}$ edges of the clique $K^n_j$. Let $A^n = \cup_{j=1}^{\binom{n}{k_n}} A^n_j \subset \mathcal{P}([n]^{(2)})$ denote the set of all those subsets of $[n]^{(2)}$ which possess at least one clique of size $k_n$. The tampered graph is obtained by choosing at random one of the $\binom{n}{k_n}$ cliques of size $k_n$ in $K_n$ and adding all of its edges to the original random edge set $E_n(p)$. That is, one of the $K^n_j$, $j = 1,\ldots,\binom{n}{k_n}$, is chosen at random, and its edges are adjoined to $E_n(p)$ to form $E_n^{tam;k_n}(p)$. Of course then, by construction, the tampered edge set $E_n^{tam;k_n}(p)$ must possess a clique of size $k_n$; thus,

$E_n^{tam;k_n}(p) \in A^n.$   (9.33)

We first prove part (i) of the theorem. Let $k_n \ge 2\log_{\frac{1}{p}} n - c\log^{(2)}_{\frac{1}{p}} n$, for some $c < 2$. By Corollary 9.1 (or Theorem 9.1), the probability of there being at least one clique of size $k_n$ in $E_n(p)$ converges to 0 as $n \to \infty$; thus,

$\lim_{n\to\infty} \mu_n(A^n) = 0.$

On the other hand, by (9.33),

$\mu_{n;tam}(A^n) = 1, \quad \text{for all } n.$

Consequently,

$D_{TV}\big(E_n(p), E_n^{tam;k_n}(p)\big) = D_{TV}(\mu_n, \mu_{n;tam}) = \max_{A\subset\mathcal{P}([n]^{(2)})} |\mu_n(A) - \mu_{n;tam}(A)| \ge |\mu_n(A^n) - \mu_{n;tam}(A^n)| = 1 - \mu_n(A^n) \to 1, \quad \text{as } n\to\infty,$

proving part (i).

We now prove part (ii). The conditional $\mu_n$-probability that a set $A \subset \mathcal{P}([n]^{(2)})$ occurs given that the set $A^n_j \subset \mathcal{P}([n]^{(2)})$ occurs is denoted by $\mu_n(A|A^n_j)$ and is given by $\mu_n(A|A^n_j) = \frac{\mu_n(A\cap A^n_j)}{\mu_n(A^n_j)}$. From the description of the construction of the tampered graph in the first paragraph of this section, along with the fact that under $\mu_n$ the existence of any particular edge is independent of the existence of any other particular edges, it follows that

$\mu_{n;tam}(A) = \frac{1}{\binom{n}{k_n}}\sum_{j=1}^{\binom{n}{k_n}} \mu_n(A|A^n_j), \quad \text{for } A \subset \mathcal{P}([n]^{(2)}).$   (9.34)

(The reader should verify this.)



For a point $\omega \in \mathcal{P}([n]^{(2)})$, we write $\{\omega\} \subset \mathcal{P}([n]^{(2)})$ to denote the subset of $\mathcal{P}([n]^{(2)})$ consisting of the singleton $\omega$. Note that

$\mu_n(\{\omega\} \cap A^n_j) = \begin{cases} \mu_n(\{\omega\}), & \text{if } \omega \in A^n_j;\\ 0, & \text{otherwise.} \end{cases}$

Consequently, from the definition of $N^{(k_n)}_{n,p}$ and the definition of $\{A^n_j : j = 1,\ldots,\binom{n}{k_n}\}$, we have

$\sum_{j=1}^{\binom{n}{k_n}} \mu_n(\{\omega\} \cap A^n_j) = \mu_n(\{\omega\})\, N^{(k_n)}_{n,p}(\omega), \quad \omega \in \mathcal{P}([n]^{(2)}).$   (9.35)

Note that $\mu_n(A^n_j) = p^{\binom{k_n}{2}}$, for all $j$. Recall from (9.2) that $E N^{(k_n)}_{n,p} = \binom{n}{k_n} p^{\binom{k_n}{2}}$. Using these facts with (9.34) and (9.35), we have

$\mu_{n;tam}(\{\omega\}) = \frac{1}{\binom{n}{k_n}}\sum_{j=1}^{\binom{n}{k_n}} \mu_n(\{\omega\}|A^n_j) = \frac{1}{\binom{n}{k_n}}\sum_{j=1}^{\binom{n}{k_n}} \frac{\mu_n(\{\omega\} \cap A^n_j)}{\mu_n(A^n_j)} = \frac{\mu_n(\{\omega\})\, N^{(k_n)}_{n,p}(\omega)}{\binom{n}{k_n}\, p^{\binom{k_n}{2}}} = \frac{N^{(k_n)}_{n,p}(\omega)}{E N^{(k_n)}_{n,p}}\, \mu_n(\{\omega\}).$   (9.36)

Equation (9.36) shows that the probability measure $\mu_{n;tam}$ is the tilted probability measure of $\mu_n$, tilted by the random variable $N^{(k_n)}_{n,p}$.
For $\epsilon > 0$, let

$B^{\epsilon}_n = \Big\{\omega \in \mathcal{P}([n]^{(2)}) : \Big|\frac{N^{(k_n)}_{n,p}(\omega)}{E N^{(k_n)}_{n,p}} - 1\Big| < \epsilon\Big\}.$

Since $k_n \le 2\log_{\frac{1}{p}} n - c\log^{(2)}_{\frac{1}{p}} n$, for some $c > 2$, it follows from the law of large numbers in (9.32) that

$\lim_{n\to\infty} \mu_n(B^{\epsilon}_n) = 1.$   (9.37)

From (9.36), we have

$|\mu_{n;tam}(B^{\epsilon}_n) - \mu_n(B^{\epsilon}_n)| = \Big|\sum_{\omega\in B^{\epsilon}_n} \mu_n(\{\omega\})\Big(\frac{N^{(k_n)}_{n,p}(\omega)}{E N^{(k_n)}_{n,p}} - 1\Big)\Big| < \epsilon \sum_{\omega\in B^{\epsilon}_n} \mu_n(\{\omega\}) = \epsilon\, \mu_n(B^{\epsilon}_n) \le \epsilon,$   (9.38)

where the first inequality follows from the definition of $B^{\epsilon}_n$. From (9.37) and (9.38), it follows that

$\liminf_{n\to\infty} \mu_{n;tam}(B^{\epsilon}_n) \ge 1 - \epsilon.$   (9.39)

Now let $A \subset \mathcal{P}([n]^{(2)})$ be arbitrary. Note that (9.38) holds also with $B^{\epsilon}_n$ replaced by $A \cap B^{\epsilon}_n$; so $|\mu_{n;tam}(A \cap B^{\epsilon}_n) - \mu_n(A \cap B^{\epsilon}_n)| < \epsilon$. Let $(B^{\epsilon}_n)^c = \mathcal{P}([n]^{(2)}) - B^{\epsilon}_n$ denote the complement of $B^{\epsilon}_n$. Then we have

$|\mu_n(A) - \mu_{n;tam}(A)| = |\mu_n(A \cap B^{\epsilon}_n) + \mu_n(A \cap (B^{\epsilon}_n)^c) - \mu_{n;tam}(A \cap B^{\epsilon}_n) - \mu_{n;tam}(A \cap (B^{\epsilon}_n)^c)| \le |\mu_n(A \cap B^{\epsilon}_n) - \mu_{n;tam}(A \cap B^{\epsilon}_n)| + \mu_n(A \cap (B^{\epsilon}_n)^c) + \mu_{n;tam}(A \cap (B^{\epsilon}_n)^c) < \epsilon + \mu_n((B^{\epsilon}_n)^c) + \mu_{n;tam}((B^{\epsilon}_n)^c).$   (9.40)

From (9.40) and the definition of the total variation distance, it follows that

$D_{TV}\big(E_n(p), E_n^{tam;k_n}(p)\big) = D_{TV}(\mu_n, \mu_{n;tam}) = \max_{A\subset\mathcal{P}([n]^{(2)})} |\mu_n(A) - \mu_{n;tam}(A)| < \epsilon + \mu_n((B^{\epsilon}_n)^c) + \mu_{n;tam}((B^{\epsilon}_n)^c).$   (9.41)

From (9.37), (9.39), (9.41), and the fact that $\epsilon > 0$ is arbitrary, we conclude that

$\lim_{n\to\infty} D_{TV}\big(E_n(p), E_n^{tam;k_n}(p)\big) = 0. \qquad \square$   (9.42)

Remark. The final two paragraphs of the proof can be replaced by a shorter argument using $L^2$-convergence and the Cauchy–Schwarz inequality. See Exercise 9.2.

9.4 Ramsey Theory

Consider the complete graph $K_n$. For each edge in $K_n$, choose either blue or red, and color the edge with that color. We call this a 2-coloring of $K_n$. For $2 \le k \le n$, one can ask whether there exists a monochromatic clique of size $k$, that is, a clique with all of its edges blue or with all of its edges red. For $k = 2$, obviously there exists such a monochromatic clique, for all $n \ge 2$. The fundamental theorem of Ramsey theory states the following:

For each integer $k \ge 3$, there exists an integer $R(k) > k$ such that if $n \ge R(k)$, then every 2-coloring of $K_n$ will necessarily have a monochromatic clique of size $k$, while if $k \le n < R(k)$, then it is possible to find a 2-coloring of $K_n$ with no monochromatic clique of size $k$.

Fig. 9.4 The above example shows that $R(3) > 5$

Note that this result is purely deterministic—it says that no matter how we arrange the coloring of $K_n$, there must be a monochromatic clique of size $k$, if $n \ge R(k)$. The exact computation of the Ramsey numbers $R(k)$ is notoriously hard. One has $R(3) = 6$ and $R(4) = 18$, but the exact value of $R(5)$ is unknown! See Fig. 9.4.

Remark. It is known that $43 \le R(5) \le 49$. The complete graph $K_{43}$ has $\frac{1}{2}\cdot 43\cdot 42 = 903$ edges. There are $2^{903}$ different two-colorings of $K_{43}$ and $\binom{43}{5} = 962{,}598$ different cliques of size 5.
We will prove the above fundamental result by providing upper and lower bounds on $R(k)$. A nice, elementary combinatorial argument yields the following result.

Theorem 9.3.

$R(k) \le 4^{k-1}, \quad k \ge 3.$   (9.43)

Remark. The above estimate is not far from the best known asymptotic upper bound for $R(k)$. In particular, it is not known if $R(k) \le c^k$, for large $k$ and some $c < 4$. For the best known upper bound, see [12].
Proof. Let $k \ge 3$. Consider an arbitrary coloring of the complete graph $K_{4^{k-1}}$ of size $4^{k-1} = 2^{2k-2}$. Define $x_1 = 1$ and $S_0 = K_{4^{k-1}}$. Since $x_1$ shares an edge with $2^{2k-2} - 1$ vertices, there must be a set of vertices $S_1$ of size at least $2^{2k-3}$ such that every edge from $x_1$ to a vertex in $S_1$ is the same color. This is the so-called pigeonhole principle. Let $x_2$ denote the vertex in $S_1$ with the lowest number. By the same reasoning, since $x_2$ shares an edge with all the other vertices in $S_1$, of which there are at least $2^{2k-3} - 1$, there must be a set $S_2 \subset S_1$ of size at least $2^{2k-4}$ such that every edge from $x_2$ to a vertex in $S_2$ has the same color. Continuing like this, we obtain a sequence $x_1,\ldots,x_{2k-2}$ of vertices and a decreasing, nested sequence of sets of vertices $\{S_j\}_{j=0}^{2k-3}$ such that $x_j \in S_{j-1}$, $j \in [2k-2]$. By the construction, it follows that for each $i$, the color of the edge joining $x_i$ to $x_j$ is the same for all $j > i$. Now look at the $2k-3$ edges $\{\{x_i, x_{i+1}\}\}_{i=1}^{2k-3}$. Obviously, we can choose at least $k-1$ of these edges to be all the same color. Find such a set of edges and denote the set of vertices in these edges by $S$. Note that $|S| \ge k$. Because the color of the edge joining $x_i$ to $x_j$ is the same for all $j > i$, it follows in fact that the color of the edge joining any two vertices in $S$ is the same. We have thus exhibited a monochromatic clique of size at least $k$. $\square$
Despite the fact that the Ramsey number $R(k)$ is a quantity associated with a purely deterministic result, one can give a very short and ingenious probabilistic proof of a lower bound for $R(k)$.

Theorem 9.4. $R(k) > k$, for all $k \ge 3$, and

$R(k) \ge \big(1 + o(1)\big)\frac{1}{e}\, k\, 2^{\frac{k}{2}}, \quad \text{as } k\to\infty.$   (9.44)

Remark. The best known lower bound is just $\sqrt{2}$ times the above estimate; see [2]. Thus, a real chasm lies between the best known upper bound and the best known lower bound!
Proof. Consider a random two-coloring of the graph $K_n$, where each edge is colored red or blue with equal probability, and independently of what occurs at other edges. Let $W$ be a clique in $K_n$ of size $k$, with $3 \le k \le n$. Let $I_W$ be the indicator random variable, which is equal to 1 if $W$ is monochromatic, and equal to 0 otherwise. Since there are $\binom{k}{2}$ edges in $W$, the probability that $W$ is all blue (or all red) is $(\frac{1}{2})^{\binom{k}{2}}$; consequently, the probability that $W$ is monochromatic is $2^{1-\binom{k}{2}}$. Of course, the expected value $E I_W$ of $I_W$ is also equal to $2^{1-\binom{k}{2}}$.

For $3 \le k \le n$, let $X_k = \sum_{|W|=k} I_W$. The random variable $X_k$ counts the number of monochromatic cliques of size $k$ in $K_n$. We have

$E X_k = \sum_{|W|=k} E I_W = \binom{n}{k} 2^{1-\binom{k}{2}}.$

Since the average number of monochromatic cliques of size $k$ in this random two-coloring is equal to $\binom{n}{k} 2^{1-\binom{k}{2}}$, there certainly must exist some particular two-coloring with exactly $M$ monochromatic cliques of size $k$, for some $M \le \binom{n}{k} 2^{1-\binom{k}{2}}$. Consider such a two-coloring. From each of the $M$ monochromatic cliques of size $k$, remove one of the vertices. Let $M'$ denote the number of vertices removed. We have $M' \le M$. (It is possible that $M' < M$ because we might have removed the same vertex from more than one of the cliques.) What remains is a two-coloring of the complete graph on $n - M'$ vertices, and by construction, this two-coloring has no monochromatic cliques of size $k$. We conclude that

$R(k) > n - \binom{n}{k} 2^{1-\binom{k}{2}}, \quad \text{for any } n \ge k.$

In particular, choosing $n = k+1$, one obtains $R(k) > k + 1 - 2(k+1)2^{-\binom{k}{2}}$, and it is easy to check that the right hand side is greater than or equal to $k$, for all $k \ge 3$. In Exercise 9.3 the reader is asked to show that

$\max_{k\le n<\infty}\Big(n - \binom{n}{k} 2^{1-\binom{k}{2}}\Big) = \big(1 + o(1)\big)\frac{1}{e}\, k\, 2^{\frac{k}{2}}, \quad \text{as } k\to\infty. \qquad \square$   (9.45)

Remark. The strategy used to prove Theorem 9.4 is known as the probabilistic method. It was pioneered by P. Erdős. He used the method in a slightly different way from above and obtained a lower bound on $R(k)$ with an extra factor of $\sqrt{2}$ in the denominator on the right hand side of (9.44).
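The deletion bound $R(k) > n - \binom{n}{k}2^{1-\binom{k}{2}}$ is easy to explore numerically. The sketch below (ours, not from the text) maximizes the bound over $n$ with exact arithmetic and compares it with the asymptotic value $\frac{1}{e}k2^{k/2}$ from (9.45):

```python
from fractions import Fraction
from math import comb, e

def deletion_bound(k):
    """max over n >= k of n - C(n,k) * 2^(1 - C(k,2)); R(k) exceeds this value."""
    weight = Fraction(2) ** (1 - comb(k, 2))
    best, n = 0, k
    while True:
        val = n - comb(n, k) * weight
        if val < best:          # the expression increases and then decreases in n
            return best
        best, n = val, n + 1

for k in range(5, 9):
    print(k, float(deletion_bound(k)), k * 2 ** (k / 2) / e)
```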
Exercise 9.1. Show that the total variation distance $D_{TV}(\mu,\nu)$ defined in (9.30) satisfies (9.31).

Exercise 9.2. This exercise presents an alternative approach in place of the final two paragraphs of the proof of part (ii) of Theorem 9.2. Recall that the Cauchy–Schwarz inequality states that $|\sum_{i=1}^m a_i b_i| \le \sqrt{(\sum_{i=1}^m a_i^2)(\sum_{i=1}^m b_i^2)}$, where $\{a_i\}_{i=1}^m, \{b_i\}_{i=1}^m$ are real numbers and $m$ is a positive integer.
a. Use (9.36) and the Cauchy–Schwarz inequality to show that for any $A \subset \mathcal{P}([n]^{(2)})$, one has

$|\mu_{n;tam}(A) - \mu_n(A)| \le \sqrt{\mu_n(A)}\, \sqrt{\sum_{\omega\in A}\Big(\frac{N^{(k_n)}_{n,p}(\omega)}{E N^{(k_n)}_{n,p}} - 1\Big)^2 \mu_n(\omega)} \le \sqrt{\sum_{\omega\in\mathcal{P}([n]^{(2)})}\Big(\frac{N^{(k_n)}_{n,p}(\omega)}{E N^{(k_n)}_{n,p}} - 1\Big)^2 \mu_n(\omega)}.$   (9.46)

b. The expression on the right hand side of (9.46) is called the $L^2$-norm with respect to the measure $\mu_n$ of the function $\frac{N^{(k_n)}_{n,p}}{E N^{(k_n)}_{n,p}} - 1$, which is defined on the domain $\mathcal{P}([n]^{(2)})$. We denote this norm by $\big\|\frac{N^{(k_n)}_{n,p}}{E N^{(k_n)}_{n,p}} - 1\big\|_{2;\mu_n}$. Use (9.16) (where the notation $N_{n,p}(k_n)$ instead of $N^{(k_n)}_{n,p}$ is used), which holds for $k_n$ as in part (ii) of Theorem 9.2, to prove that

$\lim_{n\to\infty} \Big\|\frac{N^{(k_n)}_{n,p}}{E N^{(k_n)}_{n,p}} - 1\Big\|_{2;\mu_n} = 0.$   (9.47)

c. Conclude from (9.46) and (9.47) that (9.42) holds.



Exercise 9.3. Show that (9.45) holds. (Hint: Let $f_{1,k}(x) = x - \frac{2^{1-\binom{k}{2}} x^k}{k!}$ and $f_{2,k}(x) = x - \frac{2^{1-\binom{k}{2}} (x-k)^k}{k!}$. Show that $\max_{k\le x<\infty} f_{1,k}(x) \sim \max_{k\le x<\infty} f_{2,k}(x)$ as $k\to\infty$. Since $\frac{(n-k)^k}{k!} \le \binom{n}{k} \le \frac{n^k}{k!}$, it then follows that $\max_{k\le n<\infty}\big(n - \binom{n}{k} 2^{1-\binom{k}{2}}\big) \sim \max_{k\le x<\infty} f_{1,k}(x)$, as $k\to\infty$. To obtain the asymptotic behavior of $\max_{k\le x<\infty} f_{1,k}(x)$, you will need Stirling's formula.)

Exercise 9.4. Figure 9.4 shows that the Ramsey number $R(3)$ satisfies $R(3) > 5$. Prove that $R(3) = 6$.

Chapter Notes

For a wide scope of results concerning graphs, deterministic and random, see
Bollobás’ books [9] and [10].
For a paper that considers tampering detection, see [29]. In particular, one
finds there two examples that show that the intuition for Theorem 9.2, discussed
in the remark following the theorem, can fail. It should be noted that the word
“detection” must be understood here in a very theoretical way, as there are no known
algorithms for detecting this clique in a reasonable amount of time, namely an
amount of time which grows no more than polynomially in the number of vertices n.
The construction of such algorithms is known in the theoretical computer science
literature as the “planted clique” problem. See, for example, the paper of Alon et al.
[3], where for $p = \frac{1}{2}$ it is shown that a planted clique of order $n^{\frac{1}{2}}$ can be detected in polynomial time. (This order for the clique is of course far, far larger than the order $\log n$ for the cliques discussed in this chapter.)

The proof of the existence of the Ramsey number $R(k)$ goes back to F. Ramsey in 1930. The nice little book by Alon and Spencer [2] is devoted entirely to the probabilistic method in combinatorics. The book by Graham et al. [22] is devoted entirely to Ramsey theory.
Chapter 10
The Phase Transition Concerning the Giant
Component in a Sparse Random Graph:
A Theorem of Erdős and Rényi

10.1 Introduction and Statement of Results

Let $G_n(p_n) = ([n], E_n(p_n))$ denote the Erdős–Rényi graph of size $n$ which was introduced in Chap. 9. As in Chap. 9, the generic notation $P$ for probability and $E$ for expectation will be used in this chapter. Note that whereas in Chap. 9 the edge probability $p$ was fixed independent of the graph size, in this chapter the edge probability $p_n$ will vary with $n$. A subset $A \subset [n]$ of the vertex set $[n]$ is called connected if for every $x, y \in A$, there exists a path between $x$ and $y$ along edges in $E_n(p_n)$. The vertex set $[n]$ is of course equal to the disjoint union of its connected components. Let $C_n^{lg}$ be the random variable denoting the size of the largest connected component in the random graph $G_n(p_n)$. It turns out that the size of the largest connected component undergoes a striking phase transition as the edge probability passes from $\frac{c}{n}$ with $c < 1$ to $\frac{c}{n}$ with $c > 1$. In this chapter we will prove the following two theorems.

Theorem 10.1. Let $p_n = \frac{c}{n}$, with $c < 1$. Then there exists a $\kappa = \kappa(c)$ such that the size $C_n^{lg}$ of the largest connected component of $G_n(p_n)$ satisfies

$\lim_{n\to\infty} P(C_n^{lg} \le \kappa \log n) = 1.$

Theorem 10.2. Let $p_n = \frac{c}{n}$, with $c > 1$. Then there exists a unique solution $\beta = \beta(c) \in (0,1)$ to the equation $1 - e^{-cx} - x = 0$. For any $\epsilon > 0$, the size $C_n^{lg}$ of the largest connected component of $G_n(p_n)$ satisfies

$\lim_{n\to\infty} P\big((1-\epsilon)\beta n \le C_n^{lg} \le (1+\epsilon)\beta n\big) = 1.$   (10.1)

Furthermore, every other connected component of $G_n(p_n)$ is of size $O(\log n)$ as $n\to\infty$; that is, letting $C_n^{2nd\text{-}lg}$ denote the size of the second largest component, then for some $\kappa = \kappa(c)$,

$\lim_{n\to\infty} P(C_n^{2nd\text{-}lg} \le \kappa \log n) = 1.$   (10.2)

Remark 1. In light of (10.1) and (10.2), when $p_n = \frac{c}{n}$ with $c > 1$, the largest component is referred to as the giant component.

Remark 2. It follows from the above theorems that when $p_n = \frac{c}{n}$, for some $c > 0$, the probability that the graph is connected approaches 0 as $n\to\infty$. This can be proved directly far more easily than the above theorems can be proved. Indeed, in Exercise 10.3, the reader is guided through a proof of the following fact concerning disconnected vertices, that is, vertices that are not connected to any other vertices: If $p_n = \frac{\log n + c_n}{n}$, then as $n\to\infty$, the probability of there being at least one disconnected vertex approaches 0 if $\lim_{n\to\infty} c_n = \infty$, while for any $M$, the probability of there being at least $M$ disconnected vertices approaches 1 if $\lim_{n\to\infty} c_n = -\infty$. Actually, there is a totally trivial way to see that if $p_n = \frac{c}{n}$, then the probability that the graph is connected does not approach 1. Indeed, simply note that the probability that any particular vertex is disconnected is $(1 - \frac{c}{n})^{n-1}$; thus as $n\to\infty$, the probability that any particular vertex is disconnected converges to $e^{-c}$. The above theorems and this discussion naturally elicit the question, how large must $p_n$ be in order that the graph be connected? The answer to this was given also by Erdős and Rényi, who proved that the above threshold probability concerning whether or not the graph possesses disconnected vertices is also the threshold for connectivity: If $p_n = \frac{\log n + c_n}{n}$, then as $n\to\infty$, the probability of the graph being connected approaches 1 if $\lim_{n\to\infty} c_n = \infty$ and approaches 0 if $\lim_{n\to\infty} c_n = -\infty$. See [9].

Remark 3. If a connected component of a graph contains $m$ vertices, then it must contain at least $m - 1$ edges. Thus, it follows from (10.1) that if $c > 1$, then for any $\epsilon > 0$ and for large $n$, with high probability, the random graph $G_n(\frac{c}{n})$ will contain at least $(1-\epsilon)\beta(c)n$ edges. In Sect. 9.1 of Chap. 9, it was shown that for any $\epsilon > 0$, with high probability, the graph $G_n(p)$ has $\frac{1}{2}n^2 p + O(n^{1+\epsilon})$ edges. The same type of analysis shows that for any $\epsilon > 0$ and large $n$, with high probability the graph $G_n(\frac{c}{n})$ has $\frac{1}{2}cn + O(n^{\frac{1}{2}+\epsilon})$ edges. Thus, one must have $\beta(c) \le \frac{c}{2}$, for $1 < c < 2$. See Exercise 10.1.
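The constant $\beta(c)$ of Theorem 10.2 is easily computed numerically: when $c > 1$, the function $1 - e^{-cz} - z$ is positive just to the right of 0 and negative at $z = 1$, so bisection applies. A sketch (ours, not from the text):

```python
import math

def beta(c, tol=1e-12):
    """Unique root in (0,1) of 1 - exp(-c*z) - z = 0, for c > 1 (Theorem 10.2)."""
    f = lambda z: 1.0 - math.exp(-c * z) - z
    lo, hi = tol, 1.0          # f(lo) > 0 for tiny lo when c > 1, and f(1) < 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

for c in (1.1, 1.5, 2.0, 3.0):
    print(c, round(beta(c), 6))
# e.g. beta(2.0) is about 0.7968: the giant component holds ~79.7% of the vertices
```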
In Sect. 10.2 we construct the setup that will be used for the proofs of
Theorems 10.1 and 10.2. In particular, we construct and analyze probabilistically
an algorithm that calculates for each vertex of the graph the size of the connected
component to which it belongs. In Sect. 10.3 we present a couple of basic large
deviations estimates that will be needed for the proofs of the theorems. The results
of Sects. 10.2 and 10.3 will allow for a quick proof of Theorem 10.1 in Sect. 10.4. In
Sect. 10.5, we give a concise presentation of the Galton–Watson branching process
and prove the most basic theorem of this subject, concerning the probability of

extinction. This will be used for one part of the proof of Theorem 10.2, which
is presented in Sect. 10.6. The proof of Theorem 10.2 requires considerably more
technical work over and above that which is required for the proof of Theorem 10.1.

10.2 Construction of the Setup for the Proofs of Theorems 10.1 and 10.2
of Theorems 10.1 and 10.2

Let $x \in [n]$ be a vertex of the random graph. All the random quantities that we define below depend on $x$ and $n$, but we suppress this dependence in the notation. We construct an algorithm that produces the connected component to which $x$ belongs.

We begin by calling $x$ "alive" and calling all of the other vertices in $[n]$ "neutral." We define $Y_0 = 1$, to indicate that at the beginning there is one vertex that is alive. Each of the neutral vertices $y$ is now observed. If there is an edge connecting $x$ to $y$, that is, if $\{x,y\} \in E_n(p_n)$, then $y$ is declared alive; if not, then $y$ remains neutral. After every such $y$ has been checked, we declare $x$ to be "dead." We define $Y_1$ to be the new number of vertices that are alive. We also say that at time $t = 1$ there is one dead vertex. This ends the first step of the algorithm. We continue like this. If at the end of step $t$ there are $Y_t > 0$ vertices that are alive (and $t$ dead vertices), we begin step $t+1$ by selecting one of the alive vertices (it doesn't matter which one) and call it $z$. Each of the currently neutral vertices $y$ is now observed. If there is an edge connecting $z$ and $y$, then $y$ is declared alive; if not, then $y$ remains neutral. After every such $y$ has been checked, we declare $z$ to be "dead." We define $Y_{t+1}$ to be the new number of vertices that are alive, and we say that at time $t+1$ there are $t+1$ dead vertices. The process stops at the end of the step $T$ for which $Y_T = 0$. It follows that at the end of step $T$, there are $T$ dead vertices. A little thought shows that these dead vertices form the connected component to which $x$ belongs. Thus $T$ is the size of the connected component to which $x$ belongs. (The reader should verify this.) See Fig. 10.1. Of course, $T$ is a random variable since it depends on the random edge configuration $E_n(p_n)$.

x = 1
y = 2 : neutral → alive
y = 3 : neutral → neutral
y = 4 : neutral → alive
y = 5 : neutral → neutral
y = 6 : neutral → neutral
x = 1 is declared dead
Take alive vertex z = 2
y = 3 : neutral → neutral
y = 5 : neutral → alive
y = 6 : neutral → neutral
z = 2 is declared dead
Take alive vertex z = 4
y = 3 : neutral → neutral
y = 6 : neutral → neutral
z = 4 is declared dead
Take alive vertex z = 5
y = 3 : neutral → neutral
y = 6 : neutral → neutral
z = 5 is declared dead
There are no more alive vertices, so algorithm ends
Dead sites : {1, 2, 4, 5} = the connected component containing x = 1

Fig. 10.1 The algorithm

For $1 \le t \le T$, define $Z_t$ to be the number of neutral vertices that are declared alive at step $t$. Then from the description of the algorithm, we have for $1 \le t \le T$,

$Y_t = Y_{t-1} + Z_t - 1.$   (10.3)

Assuming that $t \le T$, at the end of step $t-1$, there are $t-1$ dead vertices and $Y_{t-1} > 0$ alive vertices. Thus there are $n - t - Y_{t-1} + 1$ neutral vertices. A key feature of the above algorithm is that no pair of vertices is ever checked twice. Consequently, for every pair of vertices that is checked, the probability of there being an edge between them is equal to $p_n$, independently of what occurred when checking other pairs of vertices. Thus, since $Z_t$ counts how many of the $n - t - Y_{t-1} + 1$ neutral vertices have a common edge with the alive vertex $z$ that has been selected for implementing step $t$, and since the probability of there being an edge from $z$ to any given neutral vertex is $p_n$, it follows that $Z_t$ is distributed according to the binomial distribution with parameters $n - t - Y_{t-1} + 1$ and $p_n$: for $1 \le t \le T$,

$Z_t \sim \mathrm{Bin}(n - t - Y_{t-1} + 1,\ p_n).$   (10.4)
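The exploration in Fig. 10.1 translates directly into code. The following sketch (ours; the function name is our own) runs the alive/neutral/dead algorithm on a sampled edge set and returns $T$, the size of the component containing $x$:

```python
import random
from itertools import combinations

def component_size(n, edges, x):
    """Run the alive/neutral/dead exploration from vertex x; returns T."""
    alive, neutral, dead = {x}, set(range(1, n + 1)) - {x}, set()
    while alive:                      # at step t: Y_t = len(alive), t = len(dead)
        z = alive.pop()               # any alive vertex may be selected
        found = {y for y in neutral if (min(z, y), max(z, y)) in edges}
        alive |= found                # the Z_t newly alive vertices
        neutral -= found
        dead.add(z)                   # z is declared dead
    return len(dead)                  # the dead vertices form the component of x

rng = random.Random(0)
n, p = 200, 1.5 / 200                 # p_n = c/n with c = 1.5
edges = {e for e in combinations(range(1, n + 1), 2) if rng.random() < p}
print(component_size(n, edges, 1))
```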

Of course, $Y_{t-1}$, which appears in the size parameter of the binomial distribution in (10.4), is itself a random variable. The meaning of (10.4) is that conditioned on knowing that $Y_{t-1} = y$, then $Z_t \sim \mathrm{Bin}(n - t - y + 1,\ p_n)$. Since no pair of vertices is ever checked twice, and since from (10.3), $Y_{t-1}$ only depends on $\{Z_s\}_{s=1}^{t-1}$, it follows that given the value of $Y_{t-1}$, and given that $T \ge t$, the random variable $Z_t$ and the random variables $\{Z_s\}_{s=1}^{t-1}$ are conditionally independent; that is, for all $m \ge 1$ and all $t \ge 2$,

$P\big(Z_t \in \cdot\,,\ \{Z_s\}_{s=1}^{t-1} \in \cdot \mid Y_{t-1} = m,\ T \ge t\big) = P\big(Z_t \in \cdot \mid Y_{t-1} = m,\ T \ge t\big)\, P\big(\{Z_s\}_{s=1}^{t-1} \in \cdot \mid Y_{t-1} = m,\ T \ge t\big).$   (10.5)

As noted, (10.3) and (10.4) hold only up to time $T$; however it will be convenient to define $Y_t$ and $Z_t$ recursively from (10.3) and (10.4) for all integers $0 \le t \le n$. (Thus, e.g., if $T = t_0$, then we have $Y_{t_0} = 0$ (as well as $Z_{t_0} = 0$), and thus $Z_{t_0+1} \sim \mathrm{Bin}(n - t_0,\ p_n)$ and $Y_{t_0+1} = Z_{t_0+1} - 1$.) In particular, for $t > T$, $Y_t$ can take on negative values. For $1 \le t \le T$, note that the number $N_t$ of neutral vertices at the end of step $t$ is given by $N_t = n - t - Y_t$. We use this equation to define $N_0$, namely, $N_0 = n-1$, indicating that there are $n-1$ neutral vertices before the first step begins. We now use this equation to extend $N_t$ also to all $0 \le t \le n$. We have the following key lemma.

Lemma 10.1.

$Y_t - 1 + t \sim \mathrm{Bin}\big(n-1,\ 1-(1-p_n)^t\big), \quad t \ge 0.$

Proof. Since $N_t = n - t - Y_t = (n-1) - (Y_t - 1 + t)$, the statement of the lemma is equivalent to

$N_t \sim \mathrm{Bin}\big(n-1,\ (1-p_n)^t\big).$   (10.6)

We prove (10.6) by induction. Clearly (10.6) holds for $t = 0$. Now assume that for some $t \ge 1$,

$N_{t-1} \sim \mathrm{Bin}\big(n-1,\ (1-p_n)^{t-1}\big).$   (10.7)

Using (10.3), we have

$N_t = n - t - Y_t = n - t - Y_{t-1} + 1 - Z_t = N_{t-1} - Z_t.$   (10.8)

However, from (10.4) and the definition of $N_{t-1}$, we have $Z_t \sim \mathrm{Bin}(N_{t-1}, p_n)$. Thus, $N_{t-1} - Z_t \sim \mathrm{Bin}(N_{t-1}, 1-p_n)$, and it follows from (10.8) that

$N_t \sim \mathrm{Bin}(N_{t-1},\ 1-p_n).$   (10.9)

By the inductive hypothesis (10.7), $N_{t-1}$ is the number of heads in $n-1$ independent coin flips, where on each flip the probability of heads is $(1-p_n)^{t-1}$. Then (10.9) states that $N_t$ is the number of "successes" in $n-1$ independent trials, where each trial consists of first tossing a coin with probability $(1-p_n)^{t-1}$ of heads and then tossing a second coin with probability $1-p_n$ of heads, and a "success" is defined as obtaining heads on both flips. This description of $N_t$ is the description of a random variable distributed according to $\mathrm{Bin}(n-1,\ (1-p_n)^t)$. For an alternative derivation that (10.9) and (10.7) imply (10.6), using generating functions, see Exercise 10.4.
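Lemma 10.1 can also be checked by simulation; the sketch below (ours, not from the text) iterates $N_t = N_{t-1} - Z_t$ with $Z_t \sim \mathrm{Bin}(N_{t-1}, p_n)$, as in (10.8) and (10.9), and compares the empirical mean of $N_t$ with the binomial mean $(n-1)(1-p_n)^t$:

```python
import random

def neutral_counts(n, p, t_max, rng):
    """One run of the recursion (10.8)-(10.9); returns N_t for t = 0..t_max."""
    counts, N = [n - 1], n - 1
    for _ in range(t_max):
        Z = sum(rng.random() < p for _ in range(N))  # Z_t ~ Bin(N_{t-1}, p)
        N -= Z                                       # N_t = N_{t-1} - Z_t
        counts.append(N)
    return counts

rng, n, p, runs, t_max = random.Random(0), 100, 0.03, 2000, 5
totals = [0] * (t_max + 1)
for _ in range(runs):
    for t, N in enumerate(neutral_counts(n, p, t_max, rng)):
        totals[t] += N
for t in range(t_max + 1):
    print(t, totals[t] / runs, (n - 1) * (1 - p) ** t)  # empirical vs Bin mean
```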


10.3 Some Basic Large Deviations Estimates

We present two propositions which are known as large deviations estimates. The
first proposition will be used in the proof of Theorem 10.1 and the second one will
be used in the proof of Theorem 10.2.
Proposition 10.1. Let $c \in (0,1)$. For $n \in \mathbb{Z}^+$ and $t > 0$ with $\frac{tc}{n} \le 1$, let $S_{n,t} \sim \mathrm{Bin}(n, \frac{tc}{n})$. Then there exists a $\lambda = \lambda(c) > 0$, independent of $n$ and $t$, such that

$P(S_{n,t} \ge t) \le e^{-\lambda t}.$

Remark. Note that $E S_{n,t} = n\big(\frac{tc}{n}\big) = tc < t$, since $c \in (0,1)$.

Proof. For any $\lambda > 0$, we have

$P(S_{n,t} \ge t) \le \exp(-\lambda t)\, E\exp(\lambda S_{n,t}),$   (10.10)

since $\exp(\lambda(S_{n,t} - t)) \ge 1$ on the event $\{S_{n,t} \ge t\}$.

Since $S_{n,t}$ is the number of successes in $n$ independent Bernoulli trials, each of which has probability $\frac{tc}{n}$ of success, it follows that $S_{n,t}$ can be represented as $S_{n,t} = \sum_{j=1}^n B_j$, where the $\{B_j\}_{j=1}^n$ are independent and identically distributed Bernoulli random variables with parameter $\frac{tc}{n}$; that is, $P(B_j = 1) = 1 - P(B_j = 0) = \frac{tc}{n}$. Using the fact that these random variables are independent and identically distributed, we have

$E\exp(\lambda S_{n,t}) = E\exp\Big(\lambda\sum_{j=1}^n B_j\Big) = \prod_{j=1}^n E\exp(\lambda B_j) = \big(E\exp(\lambda B_1)\big)^n = \Big(1 - \frac{tc}{n} + \frac{tc}{n}e^{\lambda}\Big)^n.$   (10.11)

Since $1 + y \le e^y$, for all $y$, we have $(1 + \frac{x}{n})^n \le e^x$, for all $x \ge 0$ and all $n \ge 1$. Thus, $\big(1 - \frac{tc}{n} + \frac{tc}{n}e^{\lambda}\big)^n \le e^{tc(e^{\lambda}-1)}$, and consequently, (10.10) and (10.11) give for any $\lambda > 0$

$P(S_{n,t} \ge t) \le e^{-\lambda t} e^{tc(e^{\lambda}-1)} = \exp\big(-(\lambda - ce^{\lambda} + c)t\big).$   (10.12)

The function $f(\lambda) := \lambda - ce^{\lambda} + c$ satisfies $f(0) = 0$, and $f'(0) > 0$, since $c \in (0,1)$. Thus, there exist $\lambda = \lambda(c) > 0$ and $\lambda_0 > 0$ such that $f(\lambda_0) = \lambda$. We then conclude from (10.12) that $P(S_{n,t} \ge t) \le e^{-\lambda t}$. $\square$
Proposition 10.2. For each $n \in \mathbb{Z}^+$, let $S_n \sim \mathrm{Bin}(n, \theta)$, where $\theta \in (0,1)$. Let

$\Lambda(\theta_0, \theta) = \theta_0\log\frac{\theta_0}{\theta} + (1-\theta_0)\log\frac{1-\theta_0}{1-\theta}, \quad 0 < \theta, \theta_0 < 1.$

Then $\Lambda(\theta_0, \theta) > 0$, if $\theta \ne \theta_0$, and

(i) if $\theta < \theta_0 < 1$, then

$P(S_n \ge \theta_0 n) \le e^{-\Lambda(\theta_0,\theta)n}, \quad \text{for all } n;$

(ii) if $0 < \theta_0 < \theta$, then

$P(S_n \le \theta_0 n) \le e^{-\Lambda(\theta_0,\theta)n}, \quad \text{for all } n.$



Remark. The function $\Lambda(\theta_0,\theta)$ is a relative entropy. For more about this, see the notes at the end of the chapter.

Proof. The following three facts show that (ii) follows from (i): $\hat{S}_n := n - S_n$ is distributed according to the distribution $\mathrm{Bin}(n, 1-\theta)$, $P(S_n \le \theta_0 n) = P(\hat{S}_n \ge (1-\theta_0)n)$ and $\Lambda(1-\theta_0, 1-\theta) = \Lambda(\theta_0, \theta)$. So it suffices to show that (i) holds and that $\Lambda(\theta_0,\theta) > 0$, if $\theta \ne \theta_0$.

Let $\theta_0 > \theta$. For any $\lambda > 0$, we have

$P(S_n \ge \theta_0 n) \le \exp(-\lambda\theta_0 n)\, E\exp(\lambda S_n),$   (10.13)

since $\exp(\lambda(S_n - \theta_0 n)) \ge 1$ on the event $\{S_n \ge \theta_0 n\}$. We can represent the random variable $S_n$ as $S_n = \sum_{j=1}^n B_j$, where the $\{B_j\}_{j=1}^n$ are independent and identically distributed Bernoulli random variables with parameter $\theta$; that is, $P(B_j = 1) = 1 - P(B_j = 0) = \theta$. Using the fact that these random variables are independent and identically distributed, we have

$E\exp(\lambda S_n) = E\exp\Big(\lambda\sum_{j=1}^n B_j\Big) = \prod_{j=1}^n E\exp(\lambda B_j) = \big(E\exp(\lambda B_1)\big)^n = (\theta e^{\lambda} + 1 - \theta)^n.$   (10.14)

Thus, from (10.13), we obtain the inequality

$P(S_n \ge \theta_0 n) \le \big(e^{-\lambda\theta_0}(\theta e^{\lambda} + 1 - \theta)\big)^n, \quad \text{for all } n \ge 1 \text{ and all } \lambda > 0.$   (10.15)

The function $f(\lambda) := e^{-\lambda\theta_0}(\theta e^{\lambda} + 1 - \theta)$, $\lambda \ge 0$, satisfies $f(0) = 1$, $\lim_{\lambda\to\infty} f(\lambda) = \infty$, and $f'(0) = -\theta_0 + \theta < 0$. Consequently, $f$ possesses a global minimum at some $\lambda_0 > 0$, and $f(\lambda_0) \in (0,1)$ [indeed, $f(\lambda_0) \le 0$ would contradict (10.15)]. In Exercise 10.5, the reader is asked to show that $f(\lambda_0) = \big(\frac{1-\theta}{1-\theta_0}\big)^{1-\theta_0}\big(\frac{\theta}{\theta_0}\big)^{\theta_0}$. Note now that $\Lambda(\theta_0,\theta)$, defined in the statement of the proposition, is equal to $-\log f(\lambda_0)$. Thus, $\Lambda(\theta_0,\theta) > 0$, for $\theta_0 > \theta$, and $e^{-\Lambda(\theta_0,\theta)} = f(\lambda_0)$. Substituting $\lambda = \lambda_0$ in (10.15) gives

$P(S_n \ge \theta_0 n) \le e^{-\Lambda(\theta_0,\theta)n}.$

Finally, since $\Lambda(\theta_0,\theta) = \Lambda(1-\theta_0, 1-\theta)$, it follows that $\Lambda(\theta_0,\theta) > 0$, if $\theta \ne \theta_0$. $\square$
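As a quick numerical sanity check on part (i) of Proposition 10.2 (ours, not part of the text), one can compare the exact binomial tail with the bound $e^{-\Lambda(\theta_0,\theta)n}$:

```python
from math import comb, log, exp, ceil

def Lambda(t0, t):
    """Relative entropy Lambda(theta_0, theta) from Proposition 10.2."""
    return t0 * log(t0 / t) + (1 - t0) * log((1 - t0) / (1 - t))

def upper_tail(n, theta, theta0):
    """Exact P(S_n >= theta0 * n) for S_n ~ Bin(n, theta)."""
    k0 = ceil(theta0 * n)
    return sum(comb(n, k) * theta**k * (1 - theta)**(n - k) for k in range(k0, n + 1))

n, theta, theta0 = 200, 0.3, 0.4
print(upper_tail(n, theta, theta0), exp(-Lambda(theta0, theta) * n))
# the exact tail probability sits below the large deviations bound
```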

10.4 Proof of Theorem 10.1

In this section, and also in Sect. 10.6, we will use tacitly the following facts, which are left to the reader in Exercise 10.6:
1. If $X_i \sim \mathrm{Bin}(n_i, p)$, $i = 1,2$, and $n_1 > n_2$, then $P(X_1 \ge k) \ge P(X_2 \ge k)$, for all integers $k \ge 0$.
2. If $X_i \sim \mathrm{Bin}(n, p_i)$, $i = 1,2$, and $p_1 > p_2$, then $P(X_1 \ge k) \ge P(X_2 \ge k)$, for all integers $k \ge 0$.

We assume that $p_n = \frac{c}{n}$ with $c \in (0,1)$. From the analysis in Sect. 10.2, we have seen that for $x \in [n]$, the size of the connected component of $G_n(p_n)$ containing $x$ is given by $T = \min\{t \ge 0 : Y_t = 0\}$. (As noted in Sect. 10.2, the quantities $T$ and $Y_t$ depend on $x$ and $n$, but this dependence is suppressed in the notation.) Let $\hat{Y}_t$ be a random variable distributed according to the distribution $\mathrm{Bin}(n-1,\ 1-(1-\frac{c}{n})^t)$. Then from Lemma 10.1,

$P(T > t) \le P(Y_t > 0) = P(\hat{Y}_t > t-1) = P(\hat{Y}_t \ge t).$

(The inequality above is not an equality because we have continued the definition of $Y_t$ past the time $T$.) Let $\bar{Y}_t$ be a random variable distributed according to the distribution $\mathrm{Bin}(n-1, \frac{tc}{n})$. By Taylor's remainder formula, $(1-x)^t \ge 1 - tx$, for $x \ge 0$ and $t$ a positive integer. Thus, $\frac{tc}{n} \ge 1 - (1-\frac{c}{n})^t$, and consequently $P(\hat{Y}_t \ge t) \le P(\bar{Y}_t \ge t)$. Thus, we have

$P(T > t) \le P(\bar{Y}_t \ge t).$   (10.16)

If $S_{n,t} \sim \mathrm{Bin}(n, \frac{tc}{n})$ as in Proposition 10.1, then $P(\bar{Y}_t \ge t) \le P(S_{n,t} \ge t)$. Using this with (10.16) and Proposition 10.1, we conclude that there exists a $\lambda > 0$ such that

$P(T > t) \le e^{-\lambda t}, \quad t \ge 0,\ n \ge 1.$   (10.17)

Let $\kappa > 0$ satisfy $\lambda\kappa > 1$. Then from (10.17) we have

$P(T > \kappa\log n) \le e^{-\lambda\kappa\log n} = n^{-\lambda\kappa}.$   (10.18)

We have proven that the probability that the connected component containing $x$ is larger than $\kappa\log n$ is no greater than $n^{-\lambda\kappa}$. There are $n$ vertices in $G_n(p_n)$; thus the probability that at least one of them is in a connected component larger than $\kappa\log n$ is certainly no larger than $n \cdot n^{-\lambda\kappa} = n^{1-\lambda\kappa} \to 0$ as $n\to\infty$. This completes the proof of Theorem 10.1. $\square$
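Both phases of the transition are visible in small simulations. The sketch below (ours; union-find is just one convenient way to extract components) measures the largest component of $G_n(\frac{c}{n})$ on either side of $c = 1$:

```python
import random

def largest_component(n, c, rng):
    """Largest component of G_n(c/n) via union-find over the random edge set."""
    parent = list(range(n))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]   # path halving
            u = parent[u]
        return u
    p = c / n
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                parent[find(u)] = find(v)
    sizes = {}
    for u in range(n):
        r = find(u)
        sizes[r] = sizes.get(r, 0) + 1
    return max(sizes.values())

rng, n = random.Random(0), 2000
for c in (0.5, 1.5):
    print(c, largest_component(n, c, rng))   # O(log n) vs. roughly beta(c) * n
```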

10.5 The Galton–Watson Branching Process

We define a random population process in discrete time. Let $\{q_n\}_{n=0}^{\infty}$ be a nonnegative sequence satisfying $\sum_{n=0}^{\infty} q_n = 1$. We will refer to $\{q_n\}_{n=0}^{\infty}$ as the offspring distribution of the process. Consider an initial particle alive at time $t = 0$ and set $X_0 = 1$ to indicate that the size of the initial population is 1. At time $t = 1$, this particle gives birth to a random number of offspring and then dies. For each $n \ge 0$, the probability that there were $n$ offspring is $q_n$. Let $X_1$ denote the population size at time 1, namely the number of offspring of the initial particle. In general, at any time $t \ge 1$, all of the $X_{t-1}$ particles alive at time $t-1$ give birth to random numbers of offspring and die. The new number of particles is $X_t$. The numbers of offspring of the different particles throughout all the generations are assumed independent of one another and are all distributed according to the same offspring distribution $\{q_n\}_{n=0}^{\infty}$.

Fig. 10.2 A realization of a branching process that becomes extinct at n = 5

The random population process $\{X_t\}_{t=0}^{\infty}$ is called a Galton–Watson branching process. Clearly, if $X_t = 0$ for some $t$, then $X_r = 0$ for all $r \ge t$. If this occurs, we say the process becomes extinct; otherwise we say that the process survives. See Fig. 10.2. If $q_0 = 0$, then the probability of survival is 1. Otherwise, there is a positive probability of extinction, since at any time $t \ge 1$, there is a positive probability (namely $q_0^{X_{t-1}}$) that all of the particles die without leaving any offspring, in which case $X_t = 0$. The most fundamental question we can ask about this process is whether it has a positive probability of surviving.

Let $W$ be a random variable distributed according to the offspring distribution: $P(W = n) = q_n$. Let

$\mu = EW = \sum_{n=0}^{\infty} n q_n$

denote the mean number of offspring of a particle. It is easy to show that $E X_{t+1} = \mu E X_t$ (Exercise 10.7), from which it follows that $E X_t = \mu^t$, $t \ge 0$. From this, it follows that if $\mu < 1$, then $\lim_{t\to\infty} E X_t = 0$. Since $E X_t \ge P(X_t \ge 1)$, it follows that $\lim_{t\to\infty} P(X_t \ge 1) = 0$, which means that the process has probability 1 of extinction. The fact that $E X_t$ is growing exponentially in $t$ when $\mu > 1$ would suggest, but not prove, that for $\mu > 1$ the probability of extinction is less than 1. In fact, we can use the method of generating functions to prove the following result. Define

$\phi(s) = \sum_{n=0}^{\infty} q_n s^n, \quad s \in [0,1].$   (10.19)

The function $\phi(s)$ is the probability generating function for the distribution $\{q_n\}_{n=0}^{\infty}$.

Theorem 10.3. Consider a Galton–Watson branching process with offspring distribution $\{q_n\}_{n=0}^{\infty}$, where $q_0 > 0$. Let $\mu = \sum_{n=0}^{\infty} n q_n \in [0,\infty]$ denote the mean number of offspring of a particle.
(i) If $\mu \le 1$, then the Galton–Watson process becomes extinct with probability 1.
(ii) If $\mu > 1$, then the Galton–Watson process becomes extinct with probability $\alpha \in (0,1)$, where $\alpha$ is the unique root $s \in (0,1)$ of the equation $\phi(s) = s$.

Proof. If $q_0 + q_1 = 1$, then necessarily, $\mu < 1$. Thus, it follows from the paragraph before the statement of the theorem that extinction occurs with probability 1. Assume now that $q_0 + q_1 < 1$. Since the power series for $\phi(s)$ converges uniformly for $s \in [0, 1-\epsilon]$, for any $\epsilon > 0$, it follows that we can differentiate term by term to get

$\phi'(s) = \sum_{n=0}^{\infty} n q_n s^{n-1} \ge 0, \quad \phi''(s) = \sum_{n=0}^{\infty} n(n-1) q_n s^{n-2} \ge 0, \quad 0 \le s < 1.$

In particular then, since $q_0 + q_1 < 1$, $\phi$ is a strictly convex function on $[0,1]$, and consequently, so is $\psi(s) := \phi(s) - s$. We have $\psi(0) = q_0 > 0$ and $\psi(1) = 0$. Also, $\lim_{s\to1} \psi'(s) = \lim_{s\to1} \phi'(s) - 1 = \mu - 1$. Since $\psi$ is strictly convex, it follows that if $\mu \le 1$, then $\psi'(s) < 0$ for $s \in [0,1)$, and consequently $\psi(s) > 0$, for $s \in [0,1)$. However, if $\mu > 1$, then $\psi'(s) > 0$ for $s < 1$ and sufficiently close to 1. Using this along with the strict convexity and the fact that $\psi(0) > 0$ and $\psi(1) = 0$, it follows that there exists a unique $\alpha \in (0,1)$ such that $\psi(\alpha) = 0$ and that $\psi(s) > 0$, for $s \in (0,\alpha)$, and $\psi(s) < 0$, for $s \in (\alpha,1)$. (The reader should verify this.) We have thus shown that

the smallest root $\alpha \in [0,1]$ of the equation $\phi(z) = z$ satisfies $\alpha \in (0,1)$, if $\mu > 1$, and $\alpha = 1$, if $\mu \le 1$. Furthermore, in the case $\mu > 1$, one has $\phi(s) > s$, for $s \in [0,\alpha)$, and $\phi(s) < s$, for $s \in (\alpha,1)$.   (10.20)

Now let $\eta_t := P(X_t = 0)$ denote the probability that extinction has occurred by time $t$. Of course, $\eta_0 = 0$. We claim that

$\eta_t = \phi(\eta_{t-1}), \quad \text{for } t \ge 1.$   (10.21)

To prove this, first note that when $t = 1$, (10.21) says that $\eta_1 = \phi(0) = q_0$, which is of course true. Now consider $t > 1$. We first calculate $P(X_t = 0 | X_1 = n)$, the probability that $X_t = 0$, conditioned on $X_1 = n$. By the conditioning, at time $t = 1$, there are $n$ particles, and each of these particles will contribute independently to the population size $X_t$ at time $t$, through $t-1$ generations of branching. In order to have $X_t = 0$, each of these $n$ "new" branching processes must become extinct by time $t-1$. The probability that any one of them becomes extinct by time $t-1$ is, by definition, $\eta_{t-1}$. By the independence, it follows that the probability that they all become extinct by time $t-1$ is $\eta_{t-1}^n$. We have thus proven that

$P(X_t = 0 | X_1 = n) = \eta_{t-1}^n.$

Since $P(X_1 = n) = q_n$, we conclude that

$\eta_t = P(X_t = 0) = \sum_{n=0}^{\infty} P(X_1 = n) P(X_t = 0 | X_1 = n) = \sum_{n=0}^{\infty} q_n \eta_{t-1}^n = \phi(\eta_{t-1}),$

proving (10.21). From its definition, $\eta_t$ is nondecreasing, and $\eta_{ext} := \lim_{t\to\infty} \eta_t$ is the extinction probability. Letting $t\to\infty$ in (10.21) gives

$\eta_{ext} = \phi(\eta_{ext}).$   (10.22)

It follows immediately from (10.22) and (10.20) that $\eta_{ext} = 1$, if $\mu \le 1$. If $\mu > 1$, then there are two roots $s \in [0,1]$ of the equation $\phi(s) = s$, namely $s = \alpha$ and $s = 1$. If $\eta_{ext} = 1$, then $\eta_t > \alpha$ for sufficiently large $t$, and then by (10.20) and (10.21), for such $t$, we have $\eta_{t+1} = \phi(\eta_t) < \eta_t$, which contradicts the fact that $\eta_t$ is nondecreasing. Thus, we conclude that $\eta_{ext} = \alpha$. $\square$
At one point in the proof of Theorem 10.2, we will use the above result on the extinction probability of a Galton–Watson branching process. However, we will need to consider this process in an alternative form. In the original formulation, at time $t$, the entire population of size $X_{t-1}$ that was alive at time $t-1$ reproduces and dies, and then $X_t$ is the new population size. In other words, time $t$ referred to the $t$th generation of particles. In our alternative formulation, at each time $t$, only one of the particles that was alive at time $t-1$ reproduces and dies. Thus, as before, we have $X_0 = 1$ to denote that we start with a single particle, and $X_1$ denotes the number of offspring that the original particle produces before it dies. At time $t = 2$, instead of having all $X_1$ particles reproduce and die simultaneously, we choose (arbitrarily) just one of these particles that was alive at time $t = 1$ and have it reproduce and die. Then $X_2$ is equal to the new total population. We continue in this way, at each step choosing just one of the particles that was alive at the previous step. Since in any case, the number of offspring of any particle is independent of the number of offspring of the other particles, it is clear that this new process has the same extinction probability as the original one.
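Theorem 10.3, combined with the recursion $\eta_t = \phi(\eta_{t-1})$ from its proof, gives a practical way to compute the extinction probability: iterate $\phi$ starting from 0. The sketch below (ours, not from the text) does this for the Poisson($c$) offspring law used in the next section, whose generating function is $e^{c(s-1)}$:

```python
import math

def extinction_probability(c, steps=500):
    """Iterate eta_t = phi(eta_{t-1}) with eta_0 = 0 and phi(s) = exp(c*(s-1))."""
    eta = 0.0
    for _ in range(steps):
        eta = math.exp(c * (eta - 1.0))  # eta_t is nondecreasing and tends to alpha
    return eta

for c in (0.8, 1.5, 2.0):
    alpha = extinction_probability(c)
    print(c, round(alpha, 6), round(1.0 - alpha, 6))
# For c <= 1 the iteration approaches 1; for c > 1 it converges to alpha in (0,1),
# and 1 - alpha is then the root of 1 - exp(-c*z) - z = 0 appearing below in (10.25).
```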

10.6 Proof of Theorem 10.2

We assume that $p_n = \frac{c}{n}$, with $c > 1$. From the analysis in Sect. 10.2, we have seen that for $x \in [n]$, the size of the connected component of $G_n(p_n)$ containing $x$ is given by $T = \min\{t \ge 0 : Y_t = 0\}$.

Consider a Galton–Watson branching process $\{X_t\}_{t=0}^{\infty}$ in the alternative form described at the end of Sect. 10.5, and let the offspring distribution be the Poisson distribution with parameter $c$; that is, $q_m = e^{-c}\frac{c^m}{m!}$. The probability generating function of this distribution is given by

$\phi(s) = \sum_{m=0}^{\infty} q_m s^m = e^{-c}\sum_{m=0}^{\infty} \frac{c^m}{m!} s^m = e^{c(s-1)}.$   (10.23)

The expected number of offspring is equal to $c$. Since $c > 1$, it follows from Theorem 10.3 that the extinction time $T_{ext} = \inf\{t \ge 1 : X_t = 0\}$ satisfies

$P(T_{ext} < \infty) = \alpha,$   (10.24)

where $\alpha \in (0,1)$ is the unique solution $s \in (0,1)$ to the equation $\phi(s) = s$, that is, to the equation $e^{c(s-1)} = s$. Substituting $z = 1 - s$ in this equation, this becomes

$1 - \alpha$ is the unique root $z \in (0,1)$ of the equation $1 - e^{-cz} - z = 0.$   (10.25)

Let $\{W_t\}_{t=1}^{\infty}$ be a sequence of independent, identically distributed random variables distributed according to the Poisson distribution with parameter $c$. If $X_{t-1} \ne 0$, then $W_t$ will serve as the number of offspring of the particle chosen for reproduction and death from among the $X_{t-1}$ particles alive at time $t-1$. Then we may represent the process $\{X_t\}_{t=0}^{\infty}$ by $X_0 = 1$ and

$X_t = X_{t-1} + W_t - 1, \quad 1 \le t \le T_{ext}.$   (10.26)

If $T_{ext} < \infty$, then of course $X_t = 0$ for all $t \ge T_{ext}$. For any fixed $t \ge 1$, as soon as one knows the values of $\{W_s\}_{s=1}^{t}$, one knows the values of $\{X_s\}_{s=1}^{t}$. (We note that it might happen that these values of $\{W_s\}_{s=1}^{t}$ result in $X_{s_0} = 0$ for some $s_0 < t$, in which case the values of $\{W_s\}_{s=s_0+1}^{t}$ are superfluous for determining the values of $\{X_s\}_{s=1}^{t}$.) If $\bar{r} := \{r_s\}_{s=1}^{t}$ are the values obtained for $\{W_s\}_{s=1}^{t}$, let $\bar{l} := \{l_s\}_{s=1}^{t}$ denote the corresponding values for $\{X_s\}_{s=1}^{t}$. We write $\bar{l} = \bar{l}(\bar{r})$. Note that $T_{ext} \ge t$ occurs if and only if $l_s > 0$, for $0 \le s \le t-1$, or, equivalently, if and only if $l_{t-1} > 0$.

Now consider the process $\{Y_t\}_{t=0}^{\infty}$ introduced in Sect. 10.2. Recall that $T$ is equal to the smallest $t$ for which $Y_t = 0$. Note from (10.3) and (10.26) that $\{Y_t\}_{t=0}^{T}$ is defined recursively in a way very similar to the way $\{X_t\}_{t=0}^{T_{ext}}$ is defined. The difference is that the independent sequence of random variables $\{W_t\}_{t=1}^{\infty}$
distributed according to the Poisson distribution with parameter $c$ is replaced by the sequence $\{Z_t\}_{t=1}^{\infty}$. The distribution of these latter random variables is given by (10.4) and (10.5). (As noted in Sect. 10.2, $Y_t$, $T$, and $Z_t$ depend on $x$ and $n$, but that dependence has been suppressed in the notation.) Because the form of the recursion formula is the same for $\{X_s\}_{s=1}^{T_{ext}}$ and $\{Y_s\}_{s=1}^{T}$, and because $X_0 = Y_0 = 1$, it follows that if $\bar{r} = \{r_s\}_{s=1}^{t}$ are the values obtained for $\{Z_s\}_{s=1}^{t}$, and if $\bar{l}(\bar{r})$ satisfies $l_{t-1} > 0$, then $\bar{l}(\bar{r})$ are the corresponding values for $\{Y_s\}_{s=1}^{t}$.

Since the random variables $\{W_s\}_{s=1}^{t}$ are independent, we have

$P(\{W_s\}_{s=1}^{t} = \bar{r}) = \prod_{s=1}^{t} P(W_s = r_s).$   (10.27)

By (10.4) and by the conditional independence condition (10.5), if $l_{t-1} > 0$, we have

$P(\{Z_s\}_{s=1}^{t} = \bar{r}) = \prod_{s=1}^{t} P(Z_s = r_s | Y_{s-1} = l_{s-1}),$   (10.28)

where, for convenience, we define $l_0 = 1$. By (10.4), the distribution of $Z_s$, conditioned on $Y_{s-1} = l_{s-1}$, is given by $\mathrm{Bin}(n - s - l_{s-1} + 1, \frac{c}{n})$. Since $\lim_{n\to\infty}(n - s - l_{s-1} + 1)\frac{c}{n} = c$, it follows from the Poisson approximation to the binomial distribution (see Proposition A.3 in Appendix A) that

$\lim_{n\to\infty} P(Z_s = r_s | Y_{s-1} = l_{s-1}) = P(W_s = r_s).$

Thus, we conclude from (10.27) and (10.28) that for any fixed $t$,

$\lim_{n\to\infty} P(\{Z_s\}_{s=1}^{t} = \bar{r}) = P(\{W_s\}_{s=1}^{t} = \bar{r}), \quad \text{for all } \bar{r} = \{r_s\}_{s=1}^{t} \text{ for which } l_{t-1}(\bar{r}) > 0,$

and, consequently, that for any fixed $t$,

$\lim_{n\to\infty} P(\{Y_s\}_{s=1}^{t} = \bar{l}) = P(\{X_s\}_{s=1}^{t} = \bar{l}), \quad \text{for all } \bar{l} = \{l_s\}_{s=1}^{t} \text{ satisfying } l_{t-1} > 0.$   (10.29)

Since $T_{ext}$, the extinction time for $\{X_t\}_{t=0}^{\infty}$, is the smallest $t$ for which $X_t = 0$, and $T_{ext} \ge t$ is equivalent to $l_{t-1} > 0$, and since $T$ is the smallest $t$ for which $Y_t = 0$, it follows from (10.29) that

$\lim_{n\to\infty} P(T \le t) = P(T_{ext} \le t), \quad \text{for any fixed } t \ge 1.$   (10.30)

From (10.24), we have $\lim_{t\to\infty} P(T_{ext} \le t) = \alpha$; thus, for any $\epsilon > 0$, there exists an integer $\tau_{\epsilon}$ such that $P(T_{ext} \le t) \in (\alpha - \frac{\epsilon}{2}, \alpha)$, if $t \ge \tau_{\epsilon}$. It then follows from (10.30) that there exists an $n_{1,\epsilon} = n_{1,\epsilon}(t)$ such that

$P(T \le t) \in (\alpha - \epsilon, \alpha + \epsilon), \quad \text{if } t \ge \tau_{\epsilon} \text{ and } n \ge n_{1,\epsilon}(t).$   (10.31)

We now analyze the probabilities $P(\epsilon n \le T \le (1-\alpha-\epsilon)n)$ and $P((1-\alpha+\epsilon)n \le T \le n)$. We will show that these probabilities are very small. From (10.25), it follows that $1 - e^{-cz} > z$, for $z \in (0, 1-\alpha)$, and $1 - e^{-cz} < z$, for $z \in (1-\alpha, 1]$. Consequently, for $\epsilon > 0$, choosing $\delta = \delta(\epsilon)$ sufficiently small, we have

$1 - e^{-cz} - 2\delta > z, \text{ for } z \in (\epsilon, 1-\alpha-\epsilon); \quad 1 - e^{-cz} + 2\delta < z, \text{ for } z \in (1-\alpha+\epsilon, 1].$   (10.32)

Since $T$ is the smallest $t$ for which $Y_t = 0$, we have

$\{\epsilon n \le T \le (1-\alpha-\epsilon)n\} \subset \cup_{\epsilon n \le t \le (1-\alpha-\epsilon)n} \{Y_t \le 0\}.$

(Recall that $Y_t$ has also been defined recursively for $t > T$ and can take on negative values for such $t$.) Thus, letting $\hat{Y}_t$ be the random variable distributed according to the distribution $\mathrm{Bin}(n-1,\ 1-(1-\frac{c}{n})^t)$, it follows from Lemma 10.1 that

$P(\epsilon n \le T \le (1-\alpha-\epsilon)n) \le \sum_{\epsilon n \le t \le (1-\alpha-\epsilon)n} P(\hat{Y}_t \le t-1).$   (10.33)

One has $\lim_{n\to\infty}(1-\frac{c}{n})^{bn} = e^{-cb}$, uniformly over $b$ in a bounded set. (The reader should verify this by taking the logarithm of $(1-\frac{c}{n})^{bn}$ and applying Taylor's formula.) Applying this with $b = \frac{t}{n}$, with $0 \le t \le n$, it follows that $(1-\frac{c}{n})^t - e^{-\frac{ct}{n}}$ is small for large $n$, uniformly over $t \in [0,n]$. Thus, for $\delta = \delta(\epsilon)$, which has been defined above, there exists an $n_{2,\delta} = n_{2,\delta(\epsilon)}$ such that $1 - (1-\frac{c}{n})^t \ge 1 - e^{-\frac{ct}{n}} - \delta$, for $n \ge n_{2,\delta}$ and $0 \le t \le n$. Let $\bar{Y}_t$ be a random variable distributed according to the distribution $\mathrm{Bin}(n-1,\ 1-e^{-\frac{ct}{n}}-\delta)$. Then $P(\hat{Y}_t \le t-1) \le P(\bar{Y}_t \le t-1)$, if $n \ge n_{2,\delta}$. Using this with (10.33), we obtain

$P(\epsilon n \le T \le (1-\alpha-\epsilon)n) \le \sum_{\epsilon n \le t \le (1-\alpha-\epsilon)n} P(\bar{Y}_t \le t-1), \quad n \ge n_{2,\delta}.$   (10.34)

Every $t$ in the summation on the right hand side of (10.34) is of the form $t = b_n n$, with $\epsilon \le b_n \le 1-\alpha-\epsilon$. Thus, it follows from (10.32) that $1 - e^{-\frac{ct}{n}} - \delta = 1 - e^{-cb_n} - \delta \ge b_n + \delta$. We now apply part (ii) of Proposition 10.2 with $n-1$ in place of $n$, with $\theta = 1 - e^{-cb_n} - \delta$, and with $\theta_0 = b_n$. Note that $\theta$ and $\theta_0$ are bounded from 0 and from 1 as $n$ varies and as $t$ varies over the above range. Also, we have $\theta > \theta_0 + \delta$. Consequently, there exists a constant $\lambda > 0$ such that $\Lambda(\theta_0,\theta) \ge \lambda$, for all $\theta, \theta_0$ as above. Thus, we have for $n \ge n_{2,\delta}$,

$P(\bar{Y}_t \le t-1) = P(\bar{Y}_t \le b_n n - 1) \le P(\bar{Y}_t \le b_n(n-1)) \le e^{-\lambda(n-1)}.$   (10.35)

From (10.34) and (10.35) we conclude that

$P(\epsilon n \le T \le (1-\alpha-\epsilon)n) \le (1-\alpha)n e^{-\lambda(n-1)}, \quad n \ge n_{2,\delta(\epsilon)}.$   (10.36)

A very similar analysis shows that

$P((1-\alpha+\epsilon)n \le T \le n) \le \alpha n e^{-\lambda(n-1)}, \quad n \ge n_{3,\delta(\epsilon)},$   (10.37)

for some $n_{3,\delta} = n_{3,\delta(\epsilon)}$. This is left to the reader as Exercise 10.8.
We now analyze the probability $P(t < T < \epsilon n)$, for fixed $t$. As in (10.33), we have

$P(t < T < \epsilon n) \le \sum_{t < s < \epsilon n} P(\hat{Y}_s \le s-1),$   (10.38)

where, we recall, $\hat{Y}_s$ is distributed according to the distribution $\mathrm{Bin}(n-1,\ 1-(1-\frac{c}{n})^s)$. Let $\tilde{Y}_s$ be a random variable distributed according to the distribution $\mathrm{Bin}(n-1,\ (1-\frac{c}{n})^s)$. Then

$P(\hat{Y}_s \le s-1) = P(\tilde{Y}_s \ge n-s).$   (10.39)

As in the proofs of Propositions 10.1 and 10.2, we have for any $\lambda > 0$

$P(\tilde{Y}_s \ge n-s) \le e^{-\lambda(n-s)}\, E e^{\lambda\tilde{Y}_s}.$   (10.40)

We can represent the random variable $\tilde{Y}_s$ as $\tilde{Y}_s = \sum_{j=1}^{n-1} B_j$, where the $\{B_j\}_{j=1}^{n-1}$ are independent and identically distributed Bernoulli random variables with parameter $(1-\frac{c}{n})^s$; that is, $P(B_j = 1) = 1 - P(B_j = 0) = (1-\frac{c}{n})^s$. Using the fact that these random variables are independent and identically distributed, we have

$E e^{\lambda\tilde{Y}_s} = \prod_{j=1}^{n-1} E e^{\lambda B_j} = \Big(\big(1-\frac{c}{n}\big)^s e^{\lambda} + 1 - \big(1-\frac{c}{n}\big)^s\Big)^{n-1}.$   (10.41)

Thus, from (10.40) and (10.41), we obtain

$P(\tilde{Y}_s \ge n-s) \le e^{-\lambda(n-s)}\Big(\big(1-\frac{c}{n}\big)^s e^{\lambda} + 1 - \big(1-\frac{c}{n}\big)^s\Big)^{n-1}.$   (10.42)

We now substitute $n = Ms$ in (10.42) to obtain

$P(\tilde{Y}_s \ge n-s) \le e^{-\lambda s(M-1)}\Big(\big(1-\frac{c}{Ms}\big)^s(e^{\lambda}-1) + 1\Big)^{Ms-1} \le e^{-\lambda s(M-1)}\Big(\big(1-\frac{c}{Ms}\big)^s(e^{\lambda}-1) + 1\Big)^{Ms} = e^{s\big[-\lambda(M-1) + M\log\big((1-\frac{c}{Ms})^s(e^{\lambda}-1)+1\big)\big]}, \quad \text{for all } \lambda > 0.$   (10.43)

We will show that for an appropriate choice of $\lambda > 0$, the expression in the square brackets above is negative and bounded away from 0 for all $s \ge 1$ and sufficiently large $M$. Let

$f_{s,M}(\lambda) := -\lambda(M-1) + M\log\Big(\big(1-\frac{c}{Ms}\big)^s(e^{\lambda}-1)+1\Big).$

Then $f_{s,M}(0) = 0$ and

$f'_{s,M}(\lambda) = -(M-1) + \frac{M\big(1-\frac{c}{Ms}\big)^s e^{\lambda}}{\big(1-\frac{c}{Ms}\big)^s(e^{\lambda}-1)+1}.$   (10.44)

For any fixed $\lambda$, defining $g(y) = \frac{y e^{\lambda}}{y(e^{\lambda}-1)+1}$, for $y > 0$, it is easy to check that $g'(y) > 0$; therefore, $g$ is increasing. The last term on the right hand side of (10.44) is $M g(y)$, with $y = (1-\frac{c}{Ms})^s$. Since $1 - x \le e^{-x}$, for $x \ge 0$, we have $(1-\frac{c}{Ms})^s \le e^{-\frac{c}{M}}$, if $n = Ms \ge c$, and thus the last term on the right hand side of (10.44) is bounded from above by $M g(e^{-\frac{c}{M}})$, independent of $s$, for $s \ge \frac{c}{M}$. Thus, from (10.44), we have

$f'_{s,M}(\lambda) \le -(M-1) + \frac{M e^{-\frac{c}{M}} e^{\lambda}}{e^{-\frac{c}{M}}(e^{\lambda}-1)+1} = -M + 1 + \frac{M e^{\lambda}}{e^{\lambda} - 1 + e^{\frac{c}{M}}} = 1 + M\,\frac{1 - e^{\frac{c}{M}}}{e^{\lambda} - 1 + e^{\frac{c}{M}}}, \quad \text{for all } s \ge \frac{c}{M}.$

Since $\lim_{M\to\infty} M\,\frac{1-e^{\frac{c}{M}}}{e^{\lambda}-1+e^{\frac{c}{M}}} = -c e^{-\lambda}$, uniformly over $\lambda \in [0,1]$, and since $c > 1$, it follows that there exists a $\lambda_0 > 0$ and an $M_0$ such that if $\lambda \in [0,\lambda_0]$ and $M \ge M_0$, then $f'_{s,M}(\lambda) \le \frac{1-c}{2}$, for all $s \ge 1$. It then follows that $f_{s,M}(\lambda_0) \le -\frac{\lambda_0(c-1)}{2}$, for all $M \ge M_0$ and $s \ge 1$. Choosing $\lambda = \lambda_0$ in (10.43) and using this last inequality for $f_{s,M}(\lambda_0)$, we conclude that

$P(\tilde{Y}_s \ge n-s) \le e^{-\frac{\lambda_0(c-1)}{2}s}, \quad \text{for } n \ge M_0 s,\ s \ge 1.$   (10.45)

From (10.38), (10.39), and (10.45), we obtain the estimate

$P(t < T < \epsilon n) \le \sum_{t < s < \epsilon n} e^{-\frac{\lambda_0(c-1)}{2}s} < \sum_{s=t}^{\infty} e^{-\frac{\lambda_0(c-1)}{2}s} = \frac{e^{-\frac{\lambda_0(c-1)}{2}t}}{1 - e^{-\frac{\lambda_0(c-1)}{2}}}, \quad \text{if } \epsilon \le \frac{1}{M_0}.$   (10.46)

Now (10.31), (10.36), (10.37), and (10.46) guarantee that for any $\epsilon \in (0,1)$, we can choose $t_{\epsilon}$ and $n_{\epsilon}$ such that for all $n \ge n_{\epsilon}$, one has

$\alpha - \epsilon \le P(T \le t_{\epsilon}) \le \alpha + \epsilon;$
$P\big(T > t_{\epsilon},\ T \notin ((1-\alpha-\epsilon)n,\ (1-\alpha+\epsilon)n)\big) \le \epsilon;$   (10.47)
$1 - \alpha - 2\epsilon \le P\big(T \in ((1-\alpha-\epsilon)n,\ (1-\alpha+\epsilon)n)\big) \le 1 - \alpha + \epsilon.$

(The third set of inequalities above is a consequence of the first two sets of inequalities.)
We recall that the above estimates have been obtained when $p = \frac{c}{n}$, with $c > 1$, and where $1 - \alpha = 1 - \alpha(c)$ is the unique root $z \in (0,1)$ of the equation $1 - e^{-cz} - z = 0$. The reader can check that the above estimates hold uniformly for $c \in [c_1, c_2]$, for any $1 < c_1 < c_2$. Thus, consider as before a fixed $c > 1$ and $\alpha = \alpha(c)$, and let $\delta > 0$ satisfy $c - \delta > 1$. For $c' \in [c-\delta, c]$, let $\alpha' := \alpha(c')$. Then for all $\epsilon > 0$, there exists a $t_{\epsilon} > 0$ and an $n_{\epsilon} > 0$ such that for all $n \ge n_{\epsilon}$ and all $c' \in [c-\delta, c]$, one has for the graph $G(n, \frac{c'}{n})$,

$\alpha' - \epsilon \le P(T \le t_{\epsilon}) \le \alpha' + \epsilon;$
$P\big(T > t_{\epsilon},\ T \notin ((1-\alpha'-\epsilon)n,\ (1-\alpha'+\epsilon)n)\big) \le \epsilon;$   (10.48)
$1 - \alpha' - 2\epsilon \le P\big(T \in ((1-\alpha'-\epsilon)n,\ (1-\alpha'+\epsilon)n)\big) \le 1 - \alpha' + \epsilon.$

Return now to our graph $G(n, \frac{c}{n})$, with $n$ considerably larger than the $n_{\epsilon}$ in (10.48). (We will quantify "considerably larger" a bit later on.) Recall that we started out by choosing arbitrarily some vertex $x$ in the graph $G(n, \frac{c}{n})$, and then applied our algorithm, obtaining $T$, which is the size of the connected component containing $x$. Call this the first step in a "game." If it results in $T \le t_{\epsilon}$, say that a "draw" occurred on the first step. If it results in $(1-\alpha-\epsilon)n < T < (1-\alpha+\epsilon)n$, say that a "win" occurred on the first step. Otherwise, say that a "loss" occurred on the first step. If a win or a loss occurs on this first step, we stop the procedure and say that the game ended in a win or loss, respectively. If a draw occurs, then consider the remaining $n - T$ vertices that are not in the connected component containing $x$, and consider the corresponding edges. This gives a graph of size $n' = n - T$. Note that by the definition of the algorithm, there is no pair of points in this new graph that has already been checked by the algorithm. Therefore, the conditional edge probabilities for this new graph, conditioned on having implemented the algorithm, are as before, namely $\frac{c}{n}$, independently for each edge. This edge probability can be written as $p_{n'} = \frac{c'}{n'}$, where $c' = \frac{n-T}{n}c$. Now $T \le t_{\epsilon}$. Thus, if $n \ge n_{\epsilon}$ is sufficiently large, then $c' \in [c-\delta, c]$ and $n' = n - T \ge n_{\epsilon}$, so the estimates (10.48) (with $n$ replaced by $n'$) will hold for this new graph, which has $n'$ vertices and edge probabilities $p_{n'} = \frac{c'}{n'}$. Choose an arbitrary vertex $x_1$ from this new graph and repeat the above algorithm on the new graph. Let $T_1$ denote the random variable $T$ for this second step. If a win or a loss occurs on the second step of the game, then we stop the game and say that the game ended in a win or a loss, respectively. (Of course, here we define win, loss, and draw in terms of $T_1$, $n'$, and $\alpha'$ instead of $T$, $n$, and $\alpha$. However, the same $t_{\epsilon}$ is used.) If a draw occurs on this second step, then we consider the $n' - T_1 = n - T - T_1$ vertices that are neither in the connected component of $x$ nor of $x_1$. We continue like this for a maximum of $M$ steps, where $M$ is chosen sufficiently large to satisfy $\big(\alpha(c-\delta) + \epsilon\big)^M < \epsilon$. (We work with $\epsilon > 0$ sufficiently small so that $\alpha(c-\delta) + \epsilon < 1$.) The reason for this choice of $M$ will become clear below. If after $M$ steps, a win or a loss has not occurred, then we declare that the game has ended in a draw. Note that the smallest possible graph size that can ever be used in this game is $n - t_{\epsilon}(M-1)$. The smallest modified value of $c$ that can ever be used is $\frac{n - t_{\epsilon}(M-1)}{n}c$. We can now quantify what we meant when we said at the outset of this paragraph that we are choosing $n$ "considerably larger" than $n_{\epsilon}$. We choose $n$ sufficiently large so that $n - t_{\epsilon}(M-1) \ge n_{\epsilon}$ and so that $\frac{n - t_{\epsilon}(M-1)}{n}c \ge c - \delta$. Thus, the estimates in (10.48) are valid for all of the steps of the game.
It is easy to check that $\alpha = \alpha(c)$ is decreasing for $c > 1$. Thus, if the game ends
in a win, then there is a connected component of size between $(1-\alpha(c-\delta)-\epsilon)n$
and $(1-\alpha(c)+\epsilon)n$. What is the probability that the game ends in a win? Let $W$
denote the event that the game ends in a win, let $D$ denote the event that it ends in a
draw, and let $L$ denote the event that it ends in a loss. We have
$$P(W) = 1 - P(L) - P(D). \tag{10.49}$$

The game ends in a draw if there was a draw on $M$ consecutive steps. Since on any
given step the probability of a draw is no greater than $\alpha(c-\delta)+\epsilon$, the probability
of obtaining $M$ consecutive draws is no greater than $\big(\alpha(c-\delta)+\epsilon\big)^M$; so by the choice of $M$, we have
$$P(D) \le \big(\alpha(c-\delta)+\epsilon\big)^M < \epsilon. \tag{10.50}$$
Let $D^c$ denote the complement of $D$; that is, $D^c = W \cup L$. Obviously, we have
$L = L \cap D^c$. Then we have
$$P(L) = P(L \cap D^c) = P(D^c)P(L \mid D^c) \le P(L \mid D^c). \tag{10.51}$$



If one played a game with three possible outcomes on each step (win, loss, or
draw), with respective nonzero probabilities $p'$, $q'$, and $r'$, and the outcomes of all
the steps were independent of one another, and one continued to play step after step
until either a win or a loss occurred, then the probability of a win would be $\frac{p'}{p'+q'}$
and the probability of a loss would be $\frac{q'}{p'+q'}$ (Exercise 10.9). Conditioned on $D^c$,
our game essentially reduces to this game. However, the probabilities of win, loss,
and draw are not exactly fixed, but can vary a little according to (10.48). Thus,
we can conclude that
$$P(L \mid D^c) \le \frac{\epsilon}{1 - \alpha(c-\delta) - 2\epsilon + \epsilon} = \frac{\epsilon}{1 - \alpha(c-\delta) - \epsilon}. \tag{10.52}$$

From (10.49)–(10.52) we obtain
$$P(W) \ge 1 - \epsilon - \frac{\epsilon}{1 - \alpha(c-\delta) - \epsilon}. \tag{10.53}$$

In conclusion, we have demonstrated the following. Consider any $c > 1$ and any
$\delta > 0$ such that $c - \delta > 1$. Then for each sufficiently small $\epsilon > 0$ and sufficiently
large $n$ depending on $\epsilon$, with probability at least $1 - \epsilon - \frac{\epsilon}{1-\alpha(c-\delta)-\epsilon}$ there will exist
a connected component of $G(n, \frac{c}{n})$ of size between $(1-\alpha(c-\delta)-\epsilon)n$ and $(1-\alpha(c)+\epsilon)n$. If the connected component above, which has been shown to exist
with probability close to 1 and which is of size around $(1-\alpha)n$, is in fact with
probability close to 1 the largest connected component, then the above estimates
prove (10.1), since by (10.25) the $\beta$ defined in the statement of the theorem is in
fact $1-\alpha$. Thus, to complete the proof of (10.1) and (10.2), it suffices to prove
that with probability approaching 1 as $n\to\infty$, every other component of $G(n,\frac{c}{n})$
is of size $O(\log n)$, as $n\to\infty$. In fact, we will prove here the weaker result that
with probability approaching 1 as $n\to\infty$, every other component is of size $o(n)$
as $n\to\infty$. In Exercise 10.10, the reader is guided through a proof that every other
component is of size $O(\log n)$.

To prove that every other component is of size $o(n)$ with probability approaching
1 as $n\to\infty$, assume to the contrary. Then for an unbounded sequence of $n$'s,
the following holds. As above, with probability at least $1 - \epsilon - \frac{\epsilon}{1-\alpha(c-\delta)-\epsilon}$, there
will exist a connected component of $G(n,\frac{c}{n})$ of size between $(1-\alpha(c-\delta)-\epsilon)n$
and $(1-\alpha(c)+\epsilon)n$, and by our assumption, for some $\rho > 0$, with probability
at least $\rho$, there will be another connected component of size at least $\rho n$. We may
take $\rho < 1 - \alpha(c-\delta) - \epsilon$. But if this were true, then at the first step of our
algorithm, when we randomly selected a vertex $x$, the probability that it would be
in a connected component of size at least $\rho n$ would be at least
$$\Big(1 - \epsilon - \frac{\epsilon}{1-\alpha(c-\delta)-\epsilon}\Big)\frac{(1-\alpha(c-\delta)-\epsilon)n}{n} + \rho\,\frac{\rho n}{n}.$$
For $\epsilon$ and $\delta$ sufficiently small, this number will be larger than $1 - \alpha(c) + \frac{\rho^2}{2}$, in
which case the algorithm would have to give $P(T \le t_\epsilon) < \alpha(c) - \frac{\rho^2}{2}$. However, for
$\epsilon > 0$ sufficiently small, this contradicts the first line of (10.47). $\square$
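As a quick numerical illustration of the theorem just proved (this sketch is ours, not part of the text; the function names and the choice $n = 2000$ are arbitrary), one can simulate $G(n, \frac{c}{n})$ and compare the observed largest-component fraction with $\beta(c) = 1 - \alpha(c)$, computing $\alpha(c)$ by iterating the fixed-point equation $\alpha = e^{c(\alpha-1)}$ that appears as fact (2) in Exercise 10.10:

```python
# A numerical sanity check, not part of the proof: simulate G(n, c/n) and
# compare the largest-component fraction with beta(c) = 1 - alpha(c), where
# alpha(c) is the root in (0,1) of alpha = exp(c*(alpha - 1)).
import math
import random
from itertools import combinations

def largest_component_fraction(n, c, seed=0):
    rng = random.Random(seed)
    p = c / n
    adj = [[] for _ in range(n)]
    for u, v in combinations(range(n), 2):  # each edge present independently
        if rng.random() < p:
            adj[u].append(v)
            adj[v].append(u)
    seen = [False] * n
    best = 0
    for start in range(n):                  # depth-first search over components
        if seen[start]:
            continue
        seen[start] = True
        stack, size = [start], 0
        while stack:
            x = stack.pop()
            size += 1
            for y in adj[x]:
                if not seen[y]:
                    seen[y] = True
                    stack.append(y)
        best = max(best, size)
    return best / n

def beta(c):
    alpha = 0.5
    for _ in range(200):   # the fixed point in (0,1) is attracting since c*alpha < 1
        alpha = math.exp(c * (alpha - 1))
    return 1.0 - alpha

for c in (1.5, 2.0, 3.0):
    print(c, round(largest_component_fraction(2000, c), 3), round(beta(c), 3))
```

The quadratic-time edge sampling is adequate for this illustrative range of $n$; the two printed columns should agree to within a few percent.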
Exercise 10.1. This exercise refers to Remark 3 after Theorem 10.2. Prove that for
any $\epsilon > 0$ and large $n$, the number of edges of $G_n(\frac{c}{n})$ is equal to $\frac12 cn + O(n^{\frac12+\epsilon})$
with high probability. Show directly that $\beta(c) \le \frac{c}{2}$, for $1 < c < 2$, where $\beta(c)$ is as
in Theorem 10.2.
Exercise 10.2. Let $D_n$ denote the number of disconnected vertices in the Erdős–
Rényi graph $G_n(p_n)$. For this exercise, it will be convenient to represent $D_n$ as a sum
of indicator random variables. Let $D_{n,i}$ be equal to 1 if the vertex $i$ is disconnected
and equal to 0 otherwise. Then $D_n = \sum_{i=1}^n D_{n,i}$.
(a) Calculate $ED_n$.
(b) Calculate $ED_n^2$. (Hint: Write $ED_n^2 = E\big(\sum_{i=1}^n D_{n,i}\big)\big(\sum_{j=1}^n D_{n,j}\big)$.)
Exercise 10.3. In this exercise, you are guided through a proof of the result noted
in Remark 2 after Theorem 10.2, namely that:
if $p_n = \frac{\log n + c_n}{n}$, then as $n\to\infty$, the probability that the Erdős–Rényi graph $G_n(p_n)$
possesses at least one disconnected vertex approaches 0 if $\lim_{n\to\infty} c_n = \infty$, while
for any $M$, the probability that it possesses at least $M$ disconnected vertices
approaches 1 if $\lim_{n\to\infty} c_n = -\infty$.
Let $D_n$ be as in Exercise 10.2, with $p_n = \frac{\log n + c_n}{n}$.
(a) Use Exercise 10.2(a) to show that $\lim_{n\to\infty} ED_n$ equals 0 if $\lim_{n\to\infty} c_n = \infty$
and equals $\infty$ if $\lim_{n\to\infty} c_n = -\infty$. (Hint: Consider $\log ED_n$ and note that by
Taylor's remainder theorem, $\log(1-x) = -x - \frac{1}{(1-x^*)^2}\frac{x^2}{2}$, for $0 < x < 1$,
where $x^* = x^*(x)$ satisfies $0 < x^* < x$.)
(b) Use (a) to show that if $\lim_{n\to\infty} c_n = \infty$, then $\lim_{n\to\infty} P(D_n = 0) = 1$.
(c) Use Exercise 10.2(b) to calculate $ED_n^2$.
(d) Show that if $\lim_{n\to\infty} c_n = -\infty$, then the variance $\sigma^2(D_n)$ satisfies $\sigma^2(D_n) = o\big((ED_n)^2\big)$. (Hint: Recall that $\sigma^2(D_n) = ED_n^2 - (ED_n)^2$.)
(e) Use Chebyshev's inequality with (a) and (d) to conclude that if $\lim_{n\to\infty} c_n = -\infty$, then for any $M$, $\lim_{n\to\infty} P(D_n \ge M) = 1$.
Exercise 10.4. Recall from Chap. 5 that the probability generating function $P_X(s)$
of a nonnegative random variable $X$ taking integral values is defined by
$$P_X(s) = Es^X = \sum_{i=0}^\infty s^i P(X = i).$$
The probability generating function of a random variable $X$ uniquely characterizes
its distribution, because $\frac{P_X^{(i)}(0)}{i!} = P(X = i)$.

(a) Let $X \sim \mathrm{Bin}(n, p)$. Show that $P_X(s) = (ps + 1 - p)^n$.
(b) Let $Z \sim \mathrm{Bin}(n, p)$, and let $Y \sim \mathrm{Bin}(Z, p')$, by which is meant that conditioned
on $Z = m$, the random variable $Y$ is distributed according to $\mathrm{Bin}(m, p')$.
Calculate $P_Y(s)$ by writing
$$P_Y(s) = Es^Y = \sum_{m=0}^n E(s^Y \mid Z = m)P(Z = m),$$
and conclude that $Y \sim \mathrm{Bin}(n, pp')$. Conclude from this that (10.7) and (10.9)
imply (10.6).
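The identity in part (b) is easy to see empirically; the following sketch (ours, with arbitrarily chosen $n$, $p$, $p'$, and sample size) compares the two-stage sampling scheme with direct $\mathrm{Bin}(n, pp')$ sampling:

```python
# Empirical check (our own illustration) that Y ~ Bin(Z, p') with
# Z ~ Bin(n, p) has the same distribution as Bin(n, p*p').
import random
from collections import Counter

rng = random.Random(1)
n, p, pp = 20, 0.5, 0.3
trials = 100_000

def binom(m, q):
    return sum(rng.random() < q for _ in range(m))

two_stage = Counter(binom(binom(n, p), pp) for _ in range(trials))
one_stage = Counter(binom(n, p * pp) for _ in range(trials))
for j in range(6):   # the empirical frequencies should nearly agree
    print(j, two_stage[j] / trials, one_stage[j] / trials)
```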
Exercise 10.5. Let $f(\lambda) = e^{-\lambda\mu_0}(e^{\lambda}\mu + 1 - \mu)$, with $0 < \mu < \mu_0 < 1$. Show that
$\inf_{\lambda \ge 0} f(\lambda)$ is attained at some $\lambda_0 > 0$ and that $f(\lambda_0) = \big(\frac{\mu}{\mu_0}\big)^{\mu_0}\big(\frac{1-\mu}{1-\mu_0}\big)^{1-\mu_0} \in (0,1)$.
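A quick numeric spot-check of the claimed value (this is ours, with arbitrary choices $\mu = 0.3$, $\mu_0 = 0.6$); a grid minimum of $f$ should match the closed form:

```python
# Numeric spot-check of Exercise 10.5 (our own, with arbitrary mu, mu0):
# a grid minimum of f(lambda) over lambda >= 0 should match the claimed form.
import math

mu, mu0 = 0.3, 0.6
f = lambda lam: math.exp(-lam * mu0) * (math.exp(lam) * mu + 1 - mu)
grid_min = min(f(k / 1000) for k in range(20000))   # lambda in [0, 20)
closed = (mu / mu0) ** mu0 * ((1 - mu) / (1 - mu0)) ** (1 - mu0)
print(grid_min, closed)   # both values should agree (about 0.825 here)
```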
Exercise 10.6. If $X \sim \mathrm{Bin}(n, p)$, then $X$ can be represented as $X = \sum_{i=1}^n B_i$,
where $\{B_i\}_{i=1}^n$ are independent and identically distributed random variables distributed according to the Bernoulli distribution with parameter $p$; that is, $P(B_i = 1) = 1 - P(B_i = 0) = p$.
(a) Use the above representation to prove that
$$\text{if } X_i \sim \mathrm{Bin}(n_i, p),\ i = 1,2, \text{ and } n_1 > n_2, \text{ then } P(X_1 \ge k) \ge P(X_2 \ge k), \text{ for all integers } k \ge 0, \tag{10.54}$$
and that
$$\text{if } X_i \sim \mathrm{Bin}(n, p_i),\ i = 1,2, \text{ and } p_1 > p_2, \text{ then } P(X_1 \ge k) \ge P(X_2 \ge k), \text{ for all integers } k \ge 0. \tag{10.55}$$
(Hint: For (10.54), represent $X_1$ using the random variables $\{B_i\}_{i=1}^{n_1}$ and
represent $X_2$ using the first $n_2$ of these very same random variables. For (10.55),
let $\{U_i\}_{i=1}^n$ be independent and identically distributed random variables, distributed according to the uniform distribution on $[0,1]$; that is, $P(a \le U_i \le b) = b - a$, for $0 \le a < b \le 1$. Define random variables $\{B_i^{(1)}\}_{i=1}^n$ and
$\{B_i^{(2)}\}_{i=1}^n$ by the formulas
$$B_i^{(1)} = \begin{cases} 1, & \text{if } U_i \le p_1;\\ 0, & \text{if } U_i > p_1, \end{cases} \qquad B_i^{(2)} = \begin{cases} 1, & \text{if } U_i \le p_2;\\ 0, & \text{if } U_i > p_2. \end{cases}$$
Now represent $X_1$ and $X_2$ through $\{B_i^{(1)}\}_{i=1}^n$ and $\{B_i^{(2)}\}_{i=1}^n$, respectively. This
method is called coupling.)
(b) Prove (10.54) and (10.55) directly from the fact that if $X \sim \mathrm{Bin}(n, p)$, then for
$0 \le k \le n$, one has $P(X \ge k) = \sum_{j=k}^n \binom{n}{j} p^j (1-p)^{n-j}$.
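The coupling in the hint is easy to visualize in code. The sketch below (ours; the parameter choices are arbitrary) draws a single batch of uniforms and thresholds it at both $p_2 < p_1$, so that $X_2 \le X_1$ holds pointwise on every sample, which is exactly what makes (10.55) immediate:

```python
# Illustration of the coupling in the hint to Exercise 10.6 (our own sketch):
# thresholding the same uniforms at p2 < p1 gives X2 <= X1 on every sample.
import random

rng = random.Random(2)
n, p1, p2 = 30, 0.6, 0.4
for _ in range(5):
    u = [rng.random() for _ in range(n)]   # one batch of uniforms, used twice
    x1 = sum(ui <= p1 for ui in u)
    x2 = sum(ui <= p2 for ui in u)
    assert x2 <= x1                        # the coupling forces this pathwise
    print(x1, x2)
```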

Exercise 10.7. If $\{X_t\}_{t=0}^\infty$ is a Galton–Watson branching process of the type
described at the beginning of Sect. 10.5, show that $EX_{t+1} = \mu EX_t$, where $\mu$ is
the mean number of offspring of a particle. (Hint: Use induction and conditional
expectation.)
Exercise 10.8. Prove (10.37) by the method used to prove (10.36).
Exercise 10.9. Prove that if one plays a game with three possible outcomes on each
step (win, loss, or draw), with respective nonzero probabilities $p'$, $q'$, and $r'$, and
the outcomes of all the steps are independent of one another, and one continues to
play step after step until either a win or a loss occurs, then the probability of a win
is $\frac{p'}{p'+q'}$ and the probability of a loss is $\frac{q'}{p'+q'}$.
Exercise 10.10. In the proof of Theorem 10.2, after the algorithm for finding the
connected component of a vertex was implemented a maximum of $M$ times, and a
component with size around $(1-\alpha)n$ was found with probability close to 1, the final
paragraph of the proof of the theorem gave a proof that with probability approaching
1 as $n\to\infty$, all other components are of size $o(n)$ as $n\to\infty$. To prove the stronger
result, as in the statement of Theorem 10.2, that with probability approaching 1 as
$n\to\infty$ all other components are of size $O(\log n)$, consider starting the algorithm
all over again after the component of size around $(1-\alpha)n$ has been discovered.
The number of vertices left is around $n' = \alpha n$ and the edge probability is still $\frac{c}{n}$,
which we can write as $\frac{C}{n'}$ with $C \approx c\alpha$. If $C < 1$, then the method of proof of
Theorem 10.1 shows that with probability approaching 1 as $n\to\infty$ all components
are of size $O(\log n') = O(\log n)$ as $n\to\infty$. To show that $C < 1$, it suffices to
show that $c\alpha < 1$. To prove this, use the following facts: (1) $xe^{-x}$ increases in $[0,1)$
and decreases in $(1,\infty)$, so for $c > 1$, there exists a unique $d \in (0,1)$ such that
$de^{-d} = ce^{-c}$; (2) $\alpha = e^{c(\alpha-1)}$.
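A quick numeric look (ours, not part of the exercise) at the quantity $c\alpha(c)$ that the exercise asks about: computing $\alpha(c)$ by iterating fact (2) shows $c\alpha(c)$ staying below 1 across a range of $c$.

```python
# Numeric illustration for Exercise 10.10 (our own): c * alpha(c) < 1,
# where alpha(c) is obtained by iterating fact (2), alpha = exp(c*(alpha-1)).
import math

def alpha(c):
    a = 0.5
    for _ in range(200):
        a = math.exp(c * (a - 1))
    return a

for c in (1.1, 1.5, 2.0, 3.0, 5.0):
    print(c, c * alpha(c))   # all printed values are below 1
```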

Chapter Notes

The context in which Theorems 10.1 and 10.2 were originally proven by Erdős
and Rényi in 1960 [18] is a little different from the context presented here. Let
$N := \binom{n}{2}$. Define $G(n, M)$, $0 \le M \le N$, to be the random graph with $n$ vertices
and exactly $M$ edges, where the $M$ edges are selected uniformly at random from
the $N$ possible edges. One can consider an evolving random graph $\{G(n,t)\}_{t=0}^N$. By
definition, $G(n,0)$ is the graph on $n$ vertices with no edges. Then sequentially, given
$G(n,t)$, for $0 \le t \le N-1$, one obtains the graph $G(n,t+1)$ by choosing at random
from the complete graph $K_n$ one of the edges that is not in $G(n,t)$ and adjoining
it to $G(n,t)$. Erdős and Rényi looked at evolving graphs of the form $G(n, t_n)$, with
$t_n = [\frac{cn}{2}]$. They showed that if $c < 1$, then with probability approaching 1 as
$n\to\infty$, the largest component of $G(n, t_n)$ is of size $O(\log n)$, while if $c > 1$,
then with probability approaching 1 as $n\to\infty$ there is one component of size
approximately $\beta(c)\,n$, and all other components are of size $O(\log n)$. To see how
this connects up to the version given in this chapter, note that the expected number
of edges in the graph $G_n(\frac{c}{n})$ is $\frac{c}{n}\binom{n}{2} = \frac{c(n-1)}{2}$. A detailed study of the borderline
case, when $t_n \sim \frac{n}{2}$ as $n\to\infty$, was undertaken by Bollobás [8]. Our proofs of
Theorems 10.1 and 10.2 are along the lines of the method sketched briefly in the
book of Alon and Spencer [2]. We are not aware in the literature of a complete
proof of Theorems 10.1 and 10.2 with all the details.
The large deviations bound in Proposition 10.2 is actually tight. That is, in part
(i), where $\mu_0 > \mu$, for any $\epsilon > 0$, one has for sufficiently large $n$, $P(S_n \ge \mu_0 n) \ge e^{-(h(\mu_0;\mu)+\epsilon)n}$. Thus, in particular, $\lim_{n\to\infty}\frac1n\log P(S_n \ge \mu_0 n) = -h(\mu_0;\mu)$.
Similarly, in part (ii), where $\mu_0 < \mu$, $\lim_{n\to\infty}\frac1n\log P(S_n \le \mu_0 n) = -h(\mu_0;\mu)$.
Consider two measures, $\mu$ and $\mu_0$, defined on a finite or countably infinite set $A$.
Then $H(\mu_0;\mu) := \sum_{x\in A}\mu_0(x)\log\frac{\mu_0(x)}{\mu(x)}$ is called the relative entropy of $\mu_0$ with
respect to $\mu$. It plays a fundamental role in the theory of large deviations. In the
case that $A$ is a two-point set, say $A = \{0,1\}$, and $\mu(\{1\}) = 1 - \mu(\{0\}) = \mu$ and
$\mu_0(\{1\}) = 1 - \mu_0(\{0\}) = \mu_0$, one has $H(\mu_0;\mu) = h(\mu_0;\mu)$. For more on large
deviations, see the book by Dembo and Zeitouni [13].
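For concreteness, here is the general definition of the relative entropy in code, checked against its two-point specialization (this sketch and its function names are ours):

```python
# Our own illustration: the relative entropy H(mu0; mu) from the text, and a
# check that on the two-point set {0, 1} it reduces to
# h(mu0; mu) = mu0*log(mu0/mu) + (1-mu0)*log((1-mu0)/(1-mu)).
import math

def relative_entropy(nu0, nu):
    # nu0, nu: dicts assigning probabilities to the points of a countable set
    return sum(q * math.log(q / nu[x]) for x, q in nu0.items() if q > 0)

mu, mu0 = 0.2, 0.7
general = relative_entropy({1: mu0, 0: 1 - mu0}, {1: mu, 0: 1 - mu})
two_point = mu0 * math.log(mu0 / mu) + (1 - mu0) * math.log((1 - mu0) / (1 - mu))
print(general, two_point)   # the two values agree
```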
For some basic results on the Galton–Watson branching process, using prob-
abilistic methods, see the advanced probability textbook of Durrett [16]. Two
standard texts on branching processes are the books of Harris [24] and of Athreya
and Ney [7].
Appendix A
A Quick Primer on Discrete Probability

In this appendix, we develop some basic ideas in discrete probability theory. We
note from the outset that some of the definitions given here are no longer correct in
the setting of continuous probability theory.

Let $\Omega$ be a finite or countably infinite set, and let $2^\Omega$ denote the set of subsets
of $\Omega$. An element $A \in 2^\Omega$ is simply a subset of $\Omega$, but in the language of probability
it is called an event. A probability measure on $\Omega$ is a function $P: 2^\Omega \to [0,1]$
satisfying $P(\emptyset) = 0$, $P(\Omega) = 1$, and which is $\sigma$-additive; that is, for any $1 \le N \le \infty$, one has $P(\cup_{n=1}^N A_n) = \sum_{n=1}^N P(A_n)$, whenever the events $\{A_n\}_{n=1}^N$
are disjoint. From this $\sigma$-additivity, it follows that $P$ is uniquely determined by
$\{P(\{x\})\}_{x\in\Omega}$. Using the $\sigma$-additivity on disjoint events, it is not hard to prove that
$P$ is $\sigma$-sub-additive on arbitrary events; that is, $P(\cup_{n=1}^N A_n) \le \sum_{n=1}^N P(A_n)$, for
arbitrary events $\{A_n\}_{n=1}^N$. See Exercise A.1. The pair $(\Omega, P)$ is called a probability
space.

If $C$ and $D$ are events and $P(C) > 0$, then the conditional probability of $D$
given $C$ is denoted by $P(D \mid C)$ and is defined by
$$P(D \mid C) = \frac{P(C \cap D)}{P(C)}.$$
Note that $P(\cdot \mid C)$ is itself a probability measure on $\Omega$. Two events $C$ and $D$ are
called independent if $P(C \cap D) = P(C)P(D)$. Clearly then, $C$ and $D$ are
independent if either $P(C) = 0$ or $P(D) = 0$. If $P(C), P(D) > 0$, it is easy
to check that independence is equivalent to either of the following two equalities:
$P(D \mid C) = P(D)$ or $P(C \mid D) = P(C)$. Consider a collection $\{C_n\}_{n=1}^N$ of events,
with $1 \le N \le \infty$. This collection of events is said to be independent if for any
finite subset $\{C_{n_j}\}_{j=1}^m$ of the events, one has $P(\cap_{j=1}^m C_{n_j}) = \prod_{j=1}^m P(C_{n_j})$.

Let $(\Omega, P)$ be a probability space. A function $X: \Omega \to \mathbb{R}$ is called a (discrete,
real-valued) random variable. For $B \subset \mathbb{R}$, we write $\{X \in B\}$ to denote the event
$X^{-1}(B) = \{\omega \in \Omega : X(\omega) \in B\}$, the inverse image of $B$. When considering the
probability of the event $\{X \in B\}$ or the event $\{X = x\}$, we write $P(X \in B)$ or
$P(X = x)$, instead of $P(\{X \in B\})$ or $P(\{X = x\})$. The distribution of the random

variable $X$ is the probability measure $\mu_X$ on $\mathbb{R}$ defined by $\mu_X(B) = P(X \in B)$,
for $B \subset \mathbb{R}$. The function $p_X(x) := P(X = x)$ is called the probability function or
the discrete density function for $X$.

The expected value or expectation $EX$ of a random variable $X$ is defined by
$$EX = \sum_{x\in\mathbb{R}} x\,P(X = x) = \sum_{x\in\mathbb{R}} x\,p_X(x), \quad \text{if } \sum_{x\in\mathbb{R}} |x|\,P(X = x) < \infty.$$
Note that the set of $x \in \mathbb{R}$ for which $P(X = x) > 0$ is either finite or countably
infinite; thus, these summations are well defined. We frequently denote $EX$ by $\mu$. If
$P(X \ge 0) = 1$ and the condition above in the definition of $EX$ does not hold, then
we write $EX = \infty$. In the sequel, when we say that the expectation of $X$ "exists,"
we mean that $\sum_{x\in\mathbb{R}} |x|\,P(X = x) < \infty$.

Given a function $\psi: \mathbb{R} \to \mathbb{R}$ and a random variable $X$, we can define a new
random variable $Y = \psi(X)$. One can calculate $EY$ according to the definition of
expectation above or in the following equivalent way:
$$EY = \sum_{x\in\mathbb{R}} \psi(x) P(X = x), \quad \text{if } \sum_{x\in\mathbb{R}} |\psi(x)| P(X = x) < \infty.$$
For $n \in \mathbb{N}$, the $n$th moment of $X$ is defined by
$$EX^n = \sum_{x\in\mathbb{R}} x^n P(X = x), \quad \text{if } \sum_{x\in\mathbb{R}} |x|^n P(X = x) < \infty.$$
If $\mu = EX$ exists, then one defines the variance of $X$, denoted by $\sigma^2$ or $\sigma^2(X)$ or
$\mathrm{Var}(X)$, by
$$\sigma^2 = E(X - \mu)^2 = \sum_{x\in\mathbb{R}} (x - \mu)^2 P(X = x).$$
Of course, it is possible to have $\sigma^2 = \infty$. It is easy to check that
$$\sigma^2(X) = EX^2 - \mu^2. \tag{A.1}$$
Chebyshev's inequality is a fundamental inequality involving the expected value
and the variance.

Proposition A.1 (Chebyshev's Inequality). Let $X$ be a random variable with
expectation $\mu$ and finite variance $\sigma^2$. Then for all $\Delta > 0$,
$$P(|X - \mu| \ge \Delta) \le \frac{\sigma^2}{\Delta^2}.$$

Proof.
$$P(|X - \mu| \ge \Delta) = \sum_{x\in\mathbb{R}:\,|x-\mu|\ge\Delta} P(X = x) \le \sum_{x\in\mathbb{R}:\,|x-\mu|\ge\Delta} \frac{(x-\mu)^2}{\Delta^2} P(X = x) \le \sum_{x\in\mathbb{R}} \frac{(x-\mu)^2}{\Delta^2} P(X = x) = \frac{\sigma^2}{\Delta^2}. \qquad\square$$


Let $\{X_j\}_{j=1}^n$ be a finite collection of random variables on a probability space
$(\Omega, P)$. We call $X = (X_1, \dots, X_n)$ a random vector. The joint probability function
of these random variables, or equivalently, the probability function of the random
vector, is given by
$$p_X(x) = p_X(x_1, \dots, x_n) := P(X_1 = x_1, \dots, X_n = x_n) = P(X = x), \quad x_i \in \mathbb{R},\ i = 1, \dots, n, \text{ where } x = (x_1, \dots, x_n).$$
It follows that $\sum_{x_j\in\mathbb{R},\,j\in[n]-\{i\}} p_X(x) = P(X_i = x_i)$. For any function $H: \mathbb{R}^n \to \mathbb{R}$, we define
$$EH(X) = \sum_{x\in\mathbb{R}^n} H(x)\,p_X(x), \quad \text{if } \sum_{x\in\mathbb{R}^n} |H(x)|\,p_X(x) < \infty.$$
In particular then, if $EX_j$ exists, it can be written as $EX_j = \sum_{x\in\mathbb{R}^n} x_j\,p_X(x)$.
Similarly, if $EX_k$ exists, for all $k$, then we have
$$E\sum_{k=1}^n c_k X_k = \sum_{x\in\mathbb{R}^n} \Big(\sum_{k=1}^n c_k x_k\Big) p_X(x) = \sum_{k=1}^n c_k \Big(\sum_{x\in\mathbb{R}^n} x_k\,p_X(x)\Big).$$
It follows from this that the expectation is linear; that is, if $EX_k$ exists for $k = 1, \dots, n$, then
$$E\sum_{k=1}^n c_k X_k = \sum_{k=1}^n c_k\,EX_k,$$
for any real numbers $\{c_k\}_{k=1}^n$.

Let $\{X_j\}_{j=1}^N$ be a collection of random variables on a probability space $(\Omega, P)$,
where $1 \le N \le \infty$. The random variables are called independent if for every finite
$n \le N$, one has

$$P(X_1 = x_1, X_2 = x_2, \dots, X_n = x_n) = \prod_{j=1}^n P(X_j = x_j), \quad \text{for all } x_j \in \mathbb{R},\ j = 1, 2, \dots, n.$$
Let $\{f_i\}_{i=1}^n$ be real-valued functions with $f_i$ defined at least on the set $\{x \in \mathbb{R} : P(X_i = x) > 0\}$. Assume that $E|f_i(X_i)| < \infty$, for $i = 1, \dots, n$. From the
definition of independence it is easy to show that if $\{X_j\}_{j=1}^n$ are independent, then
$$E\prod_{i=1}^n f_i(X_i) = \prod_{i=1}^n Ef_i(X_i). \tag{A.2}$$
The variance is of course not linear. However the variance of a sum of independent
random variables is equal to the sum of the variances of the random variables:
$$\text{If } \{X_i\}_{i=1}^n \text{ are independent random variables, then } \sigma^2\Big(\sum_{i=1}^n X_i\Big) = \sum_{i=1}^n \sigma^2(X_i). \tag{A.3}$$
It suffices to prove (A.3) for $n = 2$ and then use induction. Let $\mu_i = EX_i$, $i = 1, 2$.
We have
$$\sigma^2(X_1 + X_2) = E\big(X_1 + X_2 - E(X_1 + X_2)\big)^2 = E\big((X_1 - \mu_1) + (X_2 - \mu_2)\big)^2 = E(X_1 - \mu_1)^2 + E(X_2 - \mu_2)^2 + 2E(X_1 - \mu_1)(X_2 - \mu_2) = \sigma^2(X_1) + \sigma^2(X_2),$$
where the last equality follows because (A.2) shows that $E(X_1 - \mu_1)(X_2 - \mu_2) = E(X_1 - \mu_1)E(X_2 - \mu_2) = 0$.

Chebyshev's inequality and (A.3) allow for an exceedingly short proof of
an important result, the weak law of large numbers for sums of independent,
identically distributed (IID) random variables.

Theorem A.1. Let $\{X_n\}_{n=1}^\infty$ be a sequence of independent, identically distributed
random variables and assume that their common variance $\sigma^2$ is finite. Denote their
common expectation by $\mu$. Let $S_n = \sum_{j=1}^n X_j$. Then for any $\epsilon > 0$,
$$\lim_{n\to\infty} P\Big(\Big|\frac{S_n}{n} - \mu\Big| \ge \epsilon\Big) = 0.$$
Proof. We have $ES_n = n\mu$, and since the random variables are independent and
identically distributed, it follows from (A.3) that $\sigma^2(S_n) = n\sigma^2$. Now applying
Chebyshev's inequality to $S_n$ with $\Delta = n\epsilon$ gives

$$P(|S_n - n\mu| \ge n\epsilon) \le \frac{n\sigma^2}{(n\epsilon)^2},$$
which proves the theorem. $\square$
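A tiny simulation (ours, not from the text) makes the statement concrete: the deviation probability for $S_n/n$ shrinks as $n$ grows.

```python
# Our own illustration of the weak law of large numbers: estimate
# P(|S_n/n - mu| >= eps) for fair coin flips (mu = 0.5) at increasing n.
import random

rng = random.Random(3)
eps, reps = 0.05, 2000
for n in (10, 100, 1000):
    bad = sum(abs(sum(rng.random() < 0.5 for _ in range(n)) / n - 0.5) >= eps
              for _ in range(reps))
    print(n, bad / reps)   # these frequencies decrease toward 0
```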


Remark. The weak law of large numbers is a first moment result. It holds even
without the finite variance assumption, but the proof is much more involved.

The above weak law of large numbers is actually a particular case of the
following weak law of large numbers.

Proposition A.2. Let $\{Y_n\}_{n=1}^\infty$ be random variables. Assume that
$$\sigma^2(Y_n) = o\big((EY_n)^2\big), \quad \text{as } n\to\infty.$$
Then for any $\epsilon > 0$,
$$\lim_{n\to\infty} P\Big(\Big|\frac{Y_n}{EY_n} - 1\Big| \ge \epsilon\Big) = 0.$$
Proof. By Chebyshev's inequality, we have
$$P\big(|Y_n - EY_n| \ge \epsilon|EY_n|\big) \le \frac{\sigma^2(Y_n)}{\epsilon^2 (EY_n)^2}. \qquad\square$$


If $X$ and $Y$ are random variables on a probability space $(\Omega, P)$, and if
$P(X = x) > 0$, then the conditional probability function of $Y$ given $X = x$ is
defined by
$$p_{Y|X}(y \mid x) := P(Y = y \mid X = x) = \frac{P(X = x, Y = y)}{P(X = x)}.$$
The conditional expectation of $Y$ given $X = x$ is defined by
$$E(Y \mid X = x) = \sum_{y\in\mathbb{R}} y\,P(Y = y \mid X = x) = \sum_{y\in\mathbb{R}} y\,p_{Y|X}(y \mid x), \quad \text{if } \sum_{y\in\mathbb{R}} |y|\,P(Y = y \mid X = x) < \infty.$$
It is easy to verify that
$$EY = \sum_{x\in\mathbb{R}} E(Y \mid X = x)\,P(X = x),$$
where $E(Y \mid X = x)P(X = x) := 0$, if $P(X = x) = 0$.



A random variable $X$ that takes on only two values, 0 and 1, with $P(X = 1) = p$ and $P(X = 0) = 1 - p$, for some $p \in [0,1]$, is called a Bernoulli
random variable. One writes $X \sim \mathrm{Ber}(p)$. It is trivial to check that $EX = p$ and
$\sigma^2(X) = p(1-p)$.

Let $n \in \mathbb{N}$ and let $p \in [0,1]$. A random variable $X$ satisfying
$$P(X = j) = \binom{n}{j} p^j (1-p)^{n-j}, \quad j = 0, 1, \dots, n,$$
is called a binomial random variable, and one writes $X \sim \mathrm{Bin}(n, p)$. The random
variable $X$ can be thought of as the number of "successes" in $n$ independent trials,
where on each trial there are two possible outcomes, "success" and "failure,"
and the probability of "success" is $p$ on each trial. Letting $\{Z_i\}_{i=1}^n$ be independent,
identically distributed random variables distributed according to $\mathrm{Ber}(p)$, it follows
that $X$ can be realized as $X = \sum_{i=1}^n Z_i$. From the formula for the expected
value and variance of a Bernoulli random variable, and from the linearity of the
expectation and (A.3), the above representation immediately yields $EX = np$ and
$\sigma^2(X) = np(1-p)$.

A random variable $X$ satisfying
$$P(X = n) = e^{-\lambda} \frac{\lambda^n}{n!}, \quad n = 0, 1, \dots,$$
where $\lambda > 0$, is called a Poisson random variable, and one writes $X \sim \mathrm{Pois}(\lambda)$.
One can check easily that $EX = \lambda$ and $\sigma^2(X) = \lambda$.
Proposition A.3 (Poisson Approximation to the Binomial Distribution). For
$n \in \mathbb{N}$ and $p \in [0,1]$, let $X_{n,p} \sim \mathrm{Bin}(n, p)$. For $\lambda > 0$, let $X \sim \mathrm{Pois}(\lambda)$. Then
$$\lim_{n\to\infty,\,p\to0,\,np\to\lambda} P(X_{n,p} = j) = P(X = j), \quad j = 0, 1, \dots. \tag{A.4}$$
Proof. By assumption, we have $p = \frac{\lambda_n}{n}$, where $\lim_{n\to\infty} \lambda_n = \lambda$. We have
$$P(X_{n,p} = j) = \binom{n}{j} p^j (1-p)^{n-j} = \frac{n(n-1)\cdots(n-j+1)}{j!} \Big(\frac{\lambda_n}{n}\Big)^j \Big(1 - \frac{\lambda_n}{n}\Big)^{n-j} = \frac{\lambda_n^j}{j!}\,\frac{n(n-1)\cdots(n-j+1)}{n^j} \Big(1 - \frac{\lambda_n}{n}\Big)^{n-j};$$
thus,
$$\lim_{n\to\infty,\,p\to0,\,np\to\lambda} P(X_{n,p} = j) = e^{-\lambda} \frac{\lambda^j}{j!} = P(X = j). \qquad\square$$
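The convergence in (A.4) is visible numerically; the sketch below (ours, with arbitrary $\lambda$ and $n$) compares the $\mathrm{Bin}(n, \lambda/n)$ and $\mathrm{Pois}(\lambda)$ probability functions term by term.

```python
# Our own numeric look at Proposition A.3: Bin(n, lambda/n) versus
# Pois(lambda), term by term.
import math

lam, n = 2.0, 200
p = lam / n
for j in range(7):
    binom = math.comb(n, j) * p**j * (1 - p)**(n - j)
    poisson = math.exp(-lam) * lam**j / math.factorial(j)
    print(j, round(binom, 5), round(poisson, 5))
```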



Equation (A.4) is an example of weak convergence of random variables or distributions. In general, if $\{X_n\}_{n=1}^\infty$ are random variables with distributions $\{\mu_{X_n}\}_{n=1}^\infty$,
and $X$ is a random variable with distribution $\mu_X$, then we say that $X_n$ converges
weakly to $X$, or $\mu_{X_n}$ converges weakly to $\mu_X$, if $\lim_{n\to\infty} P(X_n \le x) = P(X \le x)$, for all $x \in \mathbb{R}$ for which $P(X = x) = 0$, or equivalently, if
$\lim_{n\to\infty} \mu_{X_n}((-\infty, x]) = \mu_X((-\infty, x])$, for all $x \in \mathbb{R}$ for which $\mu_X(\{x\}) = 0$.
Thus, for example, if $P(X_n = \frac1n) = P(X_n = 1 + \frac1n) = \frac12$, for $n = 1, 2, \dots$,
and $P(X = 0) = P(X = 1) = \frac12$, then $X_n$ converges weakly to $X$ since
$\lim_{n\to\infty} P(X_n \le x) = P(X \le x)$, for all $x \in \mathbb{R} - \{0, 1\}$. See also Exercise A.4.
Exercise A.1. Use the $\sigma$-additivity property of probability measures on disjoint sets
to prove $\sigma$-sub-additivity on arbitrary sets; that is, $P(\cup_{n=1}^N A_n) \le \sum_{n=1}^N P(A_n)$, for
arbitrary events $\{A_n\}_{n=1}^N$, where $1 \le N \le \infty$. (Hint: Rewrite $\cup_{n=1}^N A_n$ as a disjoint
union $\cup_{n=1}^N B_n$, by letting $B_1 = A_1$, $B_2 = A_2 - A_1$, $B_3 = A_3 - A_2 - A_1$, etc.)

Exercise A.2. Prove that $P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2)$, for arbitrary
events $A_1, A_2$. Then prove more generally that for any finite $n$ and arbitrary events
$\{A_k\}_{k=1}^n$, one has
$$P(\cup_{k=1}^n A_k) = \sum_{1\le i\le n} P(A_i) - \sum_{1\le i<j\le n} P(A_i \cap A_j) + \sum_{1\le i<j<k\le n} P(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n-1} P(A_1 \cap A_2 \cdots \cap A_n).$$

This result is known as the principle of inclusion–exclusion.


Exercise A.3. Let $(\Omega, P)$ be a probability space and let $R \ge 2$ be an integer. For
$A \subset \Omega$, recall that the complement $A^c$ of $A$ is defined by $A^c = \Omega - A$. Prove that
if the events $\{A_k\}_{k=1}^R$ are independent, then the complementary events $\{A_k^c\}_{k=1}^R$ are
also independent. (Hint: By the definition of independence, we have
$$P(\cap_{j=1}^\ell B_j) = \prod_{j=1}^\ell P(B_j), \quad \text{for any } \ell \le R \text{ and any sub-collection } \{B_j\}_{j=1}^\ell \text{ of } \{A_k\}_{k=1}^R. \tag{A.5}$$
Using this, we need to prove that $P(\cap_{j=1}^\ell B_j^c) = \prod_{j=1}^\ell P(B_j^c)$, for any sub-collection $\{B_j^c\}_{j=1}^\ell$ of $\{A_k^c\}_{k=1}^R$. Let $p_j = P(B_j)$ and $p = P(\cap_{j=1}^\ell B_j^c)$. Then
we need to prove that $p = \prod_{j=1}^\ell (1 - p_j)$. Write

$$\prod_{j=1}^\ell (1 - p_j) = 1 - \sum_{1\le i\le\ell} p_i + \sum_{1\le i<j\le\ell} p_i p_j - \cdots,$$
and use (A.5) along with the principle of inclusion–exclusion, which appears in
Exercise A.2.)
Exercise A.4. Using (A.4), show that
$$\lim_{n\to\infty,\,p\to0,\,np\to\lambda} P(X_{n,p} \le x) = P(X \le x), \quad \text{for all } x \in \mathbb{R}.$$
Appendix B
Power Series and Generating Functions

We review without proof some basic results concerning power series. For more
details, the reader should consult an advanced calculus or undergraduate analysis
text. We also illustrate the utility of generating functions by analyzing the one that
arises from the Fibonacci sequence.

Let $\{a_n\}_{n=0}^\infty$ be a sequence of real numbers. Define formally the generating
function $F(t)$ of $\{a_n\}_{n=0}^\infty$ by
$$F(t) = \sum_{n=0}^\infty a_n t^n, \tag{B.1}$$
where $t \in \mathbb{R}$. We say "formally" because we have made the definition before
determining for which values of $t$ the power series on the right hand side above
converges. The power series converges trivially for $t = 0$, and it is possible that it
converges only for $t = 0$, for example, if $a_n = n!$.

The power series $\sum_{n=0}^\infty a_n t^n$ converges absolutely if $\sum_{n=0}^\infty |a_n t^n| < \infty$. The
power series is uniformly, absolutely convergent for $|t| \le \rho$ if
$$\lim_{N\to\infty} \sup_{|t|\le\rho} \sum_{n=N}^\infty |a_n t^n| = 0;$$
that is, if the tail of the series $\sum_{n=0}^\infty |a_n t^n|$ converges to 0 uniformly over $|t| \le \rho$.

We state four fundamental results concerning the convergence of power series:

1. If the power series converges for some number $t_0 \ne 0$, then necessarily the power
series converges absolutely and uniformly for $|t| \le \rho$, for all $\rho < |t_0|$.
2. There exists an extended real number $r_0 \in [0,\infty]$ such that the power series
$\sum_{n=0}^\infty a_n t^n$ converges absolutely if $t \in [0, r_0)$ and diverges if $t > r_0$.

The number $r_0$ in (2) is called the radius of convergence of the power series.

3. The radius of convergence is given by the formula
$$r_0 = \frac{1}{\limsup_{n\to\infty} \sqrt[n]{|a_n|}}.$$
4. If the power series is uniformly, absolutely convergent for $|t| \le \rho$, then the
function $F(t)$ in (B.1) is infinitely differentiable for $|t| < \rho$, and its derivatives
are obtained via term by term differentiation in the power series; in particular,
$F'(t) = \sum_{n=0}^\infty n a_n t^{n-1}$.

The generating function often provides an efficient method for obtaining information about the sequence $\{a_n\}_{n=0}^\infty$. Typically, this will occur when the generating
function can be written in a nice closed form and analyzed. This analysis then allows
one to obtain information about the coefficients in the generating function's power
series expansion, and these coefficients are of course $\{a_n\}_{n=0}^\infty$. We illustrate this in
the case of the famous Fibonacci sequence.
Recall that the sequence of Fibonacci numbers is defined recursively by $f_0 = 0$,
$f_1 = 1$ and
$$f_n = f_{n-1} + f_{n-2}, \quad \text{for } n \ge 2. \tag{B.2}$$
The first few Fibonacci numbers are 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144.


We will obtain a closed form for the generating function
$$F(t) = \sum_{n=0}^\infty f_n t^n \tag{B.3}$$
of the Fibonacci numbers. Multiply both sides of (B.2) by $t^n$ and then sum both
sides over $n$, with $n$ running from 2 to $\infty$. This gives us
$$\sum_{n=2}^\infty f_n t^n = \sum_{n=2}^\infty f_{n-1} t^n + \sum_{n=2}^\infty f_{n-2} t^n.$$
Since $f_0 = 0$ and $f_1 = 1$, the left hand side above is equal to $F(t) - t$. Factoring
out $t$ from the first term and $t^2$ from the second term on the right hand side above,
and using the fact that $f_0 = 0$, one sees that the right hand side above is equal to
$tF(t) + t^2F(t)$. Thus, we obtain the equation
$$F(t) - t = tF(t) + t^2F(t),$$
which gives a closed form expression for $F$; namely, $F(t) = \frac{t}{1-t-t^2}$. Up until now
we have ignored the question of convergence. However, the above formula gives us
the answer. The roots of the polynomial $t^2 + t - 1$ are $r^+ := \frac{-1+\sqrt5}{2}$ and $r^- := \frac{-1-\sqrt5}{2}$. Since $|r^+| < |r^-|$, we conclude that the generating function $F(t)$ has radius

of convergence $|r^+| = \frac{\sqrt5-1}{2}$. Thus, the generating function of the Fibonacci sequence
is given by
$$F(t) = \frac{t}{1-t-t^2}, \quad |t| < \frac{\sqrt5-1}{2}. \tag{B.4}$$
We now use the method of partial fractions to represent the function $\frac{t}{1-t-t^2}$ as an
explicit power series. Using the fact that $r^+ r^- = -1$, we write
$$t^2 + t - 1 = (t - r^+)(t - r^-) = -(t r^- + 1)(t r^+ + 1);$$
thus,
$$\frac{t}{1-t-t^2} = \frac{t}{(t r^- + 1)(t r^+ + 1)}. \tag{B.5}$$
For unknown $A$ and $B$, we write
$$\frac{t}{(t r^- + 1)(t r^+ + 1)} = \frac{A}{t r^- + 1} + \frac{B}{t r^+ + 1} = \frac{t(A r^+ + B r^-) + (A + B)}{(t r^- + 1)(t r^+ + 1)}. \tag{B.6}$$
Comparing the left-most and right-most terms in (B.6), we conclude that $A + B = 0$
and $A r^+ + B r^- = 1$. Solving for $A$ and $B$, we obtain $A = \frac{1}{r^+ - r^-} = \frac{1}{\sqrt5}$ and
$B = \frac{1}{r^- - r^+} = -\frac{1}{\sqrt5}$. Thus, from (B.5) and the first equality in (B.6), we arrive at
the partial fraction representation
$$\frac{t}{1-t-t^2} = \frac{1}{\sqrt5}\Big(\frac{1}{1 + t r^-} - \frac{1}{1 + t r^+}\Big). \tag{B.7}$$
Since $|r^-| > |r^+|$, both $\frac{1}{1+tr^-}$ and $\frac{1}{1+tr^+}$ can be written as geometric series if
$|t| < \frac{1}{|r^-|} = \frac{2}{1+\sqrt5} = \frac{\sqrt5-1}{2}$. We have
$$\frac{1}{1 + t r^-} = \sum_{n=0}^\infty (-1)^n (r^-)^n t^n = \sum_{n=0}^\infty \Big(\frac{1+\sqrt5}{2}\Big)^n t^n; \qquad \frac{1}{1 + t r^+} = \sum_{n=0}^\infty (-1)^n (r^+)^n t^n = \sum_{n=0}^\infty \Big(\frac{1-\sqrt5}{2}\Big)^n t^n. \tag{B.8}$$
Thus, from (B.4), (B.7), and (B.8), we obtain
$$F(t) = \sum_{n=0}^\infty \frac{1}{\sqrt5}\Big[\Big(\frac{1+\sqrt5}{2}\Big)^n - \Big(\frac{1-\sqrt5}{2}\Big)^n\Big] t^n. \tag{B.9}$$

Comparing (B.3) with (B.9), we conclude that the $n$th Fibonacci number $f_n$ is
given explicitly by
$$f_n = \frac{1}{\sqrt5}\Big[\Big(\frac{1+\sqrt5}{2}\Big)^n - \Big(\frac{1-\sqrt5}{2}\Big)^n\Big]. \tag{B.10}$$
From the explicit formula in (B.10), the asymptotic behavior of $f_n$ is clear:
$$f_n \sim \frac{1}{\sqrt5}\Big(\frac{1+\sqrt5}{2}\Big)^n, \quad \text{as } n\to\infty.$$
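The closed form (B.10) is easy to check against the recursion (B.2); the sketch below (ours) does so, rounding the floating-point value of (B.10) to the nearest integer.

```python
# Our own check of the closed form (B.10) against the recursion (B.2).
import math

sqrt5 = math.sqrt(5)

def binet(n):
    return round((((1 + sqrt5) / 2) ** n - ((1 - sqrt5) / 2) ** n) / sqrt5)

fib = [0, 1]
for n in range(2, 30):
    fib.append(fib[-1] + fib[-2])
print(all(binet(n) == fib[n] for n in range(30)))   # prints True
```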
Appendix C
A Proof of Stirling’s Formula

Stirling’s formula states that


p
nŠ  nn e n 2 n; as n ! 1: (C.1)

In order to obtain an asymptotic formula for the discrete quantity nŠ, it is extremely
useful to be able to embed this quantity in a function of a continuous variable.
Integrating by parts and then applying induction shows that nŠ D .n C 1/, n 2 N,
where the gamma function .t / is defined by
Z 1
.t / D x t1 e x dx; t > 0:
0

Thus, one proves Stirling’s formula in the following form.


Theorem C.1 (Stirling’s Formula).
p
.t C 1/  t t e t 2 t ; as t ! 1: (C.2)

Proof. In the literature one can find literally dozens of proofs of Stirling’s formula.
We present here an elementary proof that uses Laplace’s asymptotic method [14].
We begin by giving the intuition for the method. We write
Z 1
.t C 1/ D e t .x/
dx; (C.3)
0

where

t .x/ D t log x  x:


Now $\phi_t$ takes on its maximum at $x = t$, and the Taylor expansion of $\phi_t$ about $x = t$
starts out as
$$t \log t - t - \frac{(x-t)^2}{2t} =: \hat\phi_t(x).$$
Replacing $\phi_t$ by $\hat\phi_t$, we calculate that
$$\int_0^\infty e^{\hat\phi_t(x)}\,dx = \int_0^\infty e^{t\log t - t - \frac{(x-t)^2}{2t}}\,dx = t^t e^{-t} \int_0^\infty e^{-\frac{(x-t)^2}{2t}}\,dx.$$
Making the substitution $z = \frac{x-t}{\sqrt t}$ gives
$$\int_0^\infty e^{-\frac{(x-t)^2}{2t}}\,dx = \sqrt t \int_{-\sqrt t}^\infty e^{-\frac12 z^2}\,dz.$$
Since $\int_{-\infty}^\infty e^{-\frac12 z^2}\,dz = \sqrt{2\pi}$, we conclude that
$$\int_0^\infty e^{\hat\phi_t(x)}\,dx \sim t^t e^{-t} \sqrt{2\pi t}, \quad \text{as } t\to\infty.$$

We now turn to the rigorous proof. We can write $\phi_t$ exactly as
$$\phi_t(t+y) = t \log t - t - t g\Big(\frac yt\Big),$$
where
$$g(v) = v - \log(1+v).$$
Substituting this in (C.3) and making the change of variables $x = y + t$, we obtain
$$\Gamma(t+1) = t^t e^{-t} \int_{-t}^\infty e^{-t g(\frac yt)}\,dy.$$
Making the change of variables $y = \sqrt t\,z$, we have
$$\Gamma(t+1) = t^t e^{-t} \sqrt{2\pi t}\;\bar\Lambda(t), \tag{C.4}$$
where
$$\bar\Lambda(t) = \frac{1}{\sqrt{2\pi}} \int_{-\sqrt t}^\infty e^{-t g(\frac{z}{\sqrt t})}\,dz.$$

We will show that
$$\lim_{t\to\infty} \bar\Lambda(t) = 1. \tag{C.5}$$
Now (C.2) follows from (C.4) and (C.5).

Fix $L > 0$ and write
$$\bar\Lambda(t) = \bar\Lambda_L(t) + \frac{1}{\sqrt{2\pi}} T_L^+(t) + \frac{1}{\sqrt{2\pi}} T_L^-(t), \tag{C.6}$$
where
$$\bar\Lambda_L(t) = \frac{1}{\sqrt{2\pi}} \int_{-L}^L e^{-t g(\frac{z}{\sqrt t})}\,dz$$
and
$$T_L^+(t) = \int_L^\infty e^{-t g(\frac{z}{\sqrt t})}\,dz, \qquad T_L^-(t) = \int_{-\sqrt t}^{-L} e^{-t g(\frac{z}{\sqrt t})}\,dz.$$
From Taylor's remainder formula it follows that for any $\epsilon > 0$ and sufficiently small
$v$, one has
$$\frac12(1-\epsilon)v^2 \le g(v) \le \frac12(1+\epsilon)v^2.$$
Thus, $\lim_{t\to\infty} t g(\frac{z}{\sqrt t}) = \frac12 z^2$, uniformly over $z \in [-L, L]$; consequently,
$$\lim_{t\to\infty} \bar\Lambda_L(t) = \frac{1}{\sqrt{2\pi}} \int_{-L}^L e^{-\frac12 z^2}\,dz. \tag{C.7}$$
Since $\big(t g(\frac{z}{\sqrt t})\big)' = \sqrt t\Big(1 - \frac{1}{1 + \frac{z}{\sqrt t}}\Big) = \frac{\sqrt t\,z}{\sqrt t + z}$ is increasing in $z$, we have
$$T_L^+(t) \le \frac{\sqrt t + L}{\sqrt t\,L} \int_L^\infty \Big(t g\Big(\frac{z}{\sqrt t}\Big)\Big)' e^{-t g(\frac{z}{\sqrt t})}\,dz = \frac{\sqrt t + L}{\sqrt t\,L}\,e^{-t g(\frac{L}{\sqrt t})} = \frac{\sqrt t + L}{\sqrt t\,L}\,e^{-t[\frac{L}{\sqrt t} - \log(1 + \frac{L}{\sqrt t})]}.$$
By Taylor's formula, we have $\log(1 + \frac{L}{\sqrt t}) = \frac{L}{\sqrt t} - \frac{L^2}{2t} + O(t^{-\frac32})$ as $t\to\infty$; thus,
$$\limsup_{t\to\infty} T_L^+(t) \le \frac1L e^{-\frac12 L^2}. \tag{C.8}$$

A very similar argument gives
$$\limsup_{t\to\infty} T_L^-(t) \le \frac1L e^{-\frac12 L^2}. \tag{C.9}$$
Now from (C.6)–(C.9), we obtain
$$\frac{1}{\sqrt{2\pi}} \int_{-L}^L e^{-\frac12 z^2}\,dz \le \liminf_{t\to\infty} \bar\Lambda(t) \le \limsup_{t\to\infty} \bar\Lambda(t) \le \frac{1}{\sqrt{2\pi}} \int_{-L}^L e^{-\frac12 z^2}\,dz + \frac{2}{L\sqrt{2\pi}} e^{-\frac12 L^2}.$$
Since $\bar\Lambda(t)$ is independent of $L$, letting $L\to\infty$ above gives (C.5). $\square$
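Numerically, the ratio $n!/(n^n e^{-n}\sqrt{2\pi n})$ tends to 1, as (C.1) asserts; the following lines (ours) display it, working with logarithms to avoid overflow.

```python
# Our own numeric look at (C.1): the ratio n! / (n^n e^{-n} sqrt(2 pi n)),
# computed via log-gamma to avoid overflow.
import math

for n in (1, 5, 10, 50, 100, 1000):
    log_ratio = math.lgamma(n + 1) - (n * math.log(n) - n
                                      + 0.5 * math.log(2 * math.pi * n))
    print(n, math.exp(log_ratio))   # tends to 1 (roughly 1 + 1/(12n))
```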
Appendix D
An Elementary Proof of $\sum_{n=1}^\infty \frac{1}{n^2} = \frac{\pi^2}{6}$

The standard way to prove the identity in the title of this appendix is via Fourier
series. We give a completely elementary proof, following [1]. Consider the double
integral
$$I = \int_0^1\!\!\int_0^1 \frac{1}{1-xy}\,dx\,dy. \tag{D.1}$$
(Actually, the expression on the right hand side of (D.1) is an improper integral,
because the integrand blows up at $(x,y) = (1,1)$. Thus, $\int_0^1\!\int_0^1 \frac{1}{1-xy}\,dx\,dy := \lim_{\epsilon\to0^+} \int_0^{1-\epsilon}\!\int_0^{1-\epsilon} \frac{1}{1-xy}\,dx\,dy$. Since the integrand is nonnegative, there is no
problem applying the standard rules of calculus directly to $\int_0^1\!\int_0^1 \frac{1}{1-xy}\,dx\,dy$.) On
the one hand, expanding the integrand in a geometric series and integrating term by
term gives
$$I = \int_0^1\!\!\int_0^1 \sum_{n=0}^\infty (xy)^n\,dx\,dy = \sum_{n=0}^\infty \int_0^1\!\!\int_0^1 x^n y^n\,dx\,dy = \sum_{n=0}^\infty \Big(\int_0^1 x^n\,dx\Big)\Big(\int_0^1 y^n\,dy\Big) = \sum_{n=0}^\infty \frac{1}{(n+1)^2} = \sum_{n=1}^\infty \frac{1}{n^2}. \tag{D.2}$$
(The interchanging of the order of the integration and the summation is justified by
the fact that all the summands are nonnegative.)

On the other hand, consider the change of variables $u = \frac{y+x}{2}$, $v = \frac{y-x}{2}$. This
transformation rotates the square $[0,1]\times[0,1]$ clockwise by $45^\circ$ and shrinks its sides
by the factor $\sqrt2$. The new domain is $\{(u,v): 0 \le u \le \frac12,\ -u \le v \le u\} \cup \{(u,v): \frac12 \le u \le 1,\ u-1 \le v \le 1-u\}$. The Jacobian $\frac{\partial(x,y)}{\partial(u,v)}$ of the transformation is equal to
2, so the area element $dx\,dy$ gets replaced by $2\,du\,dv$. The function $\frac{1}{1-xy}$ becomes
$\frac{1}{1-u^2+v^2}$. Since the function and the domain are symmetric with respect to the $u$-axis,
we have


$$I = 4\int_0^{\frac12}\Big(\int_0^u \frac{dv}{1-u^2+v^2}\Big)du + 4\int_{\frac12}^1\Big(\int_0^{1-u} \frac{dv}{1-u^2+v^2}\Big)du.$$
Using the integration formula $\int \frac{dx}{x^2+a^2} = \frac1a \arctan\frac xa$, we obtain
$$I = 4\int_0^{\frac12} \frac{1}{\sqrt{1-u^2}} \arctan\frac{u}{\sqrt{1-u^2}}\,du + 4\int_{\frac12}^1 \frac{1}{\sqrt{1-u^2}} \arctan\frac{1-u}{\sqrt{1-u^2}}\,du.$$
Now the derivative of $g(u) := \arctan\frac{u}{\sqrt{1-u^2}}$ is $\frac{1}{\sqrt{1-u^2}}$, and the derivative of
$h(u) := \arctan\frac{1-u}{\sqrt{1-u^2}} = \arctan\sqrt{\frac{1-u}{1+u}}$ is $-\frac12 \frac{1}{\sqrt{1-u^2}}$. Thus, we conclude that
$$I = 4\int_0^{\frac12} g(u)g'(u)\,du - 8\int_{\frac12}^1 h(u)h'(u)\,du = 2g^2(u)\Big|_0^{\frac12} - 4h^2(u)\Big|_{\frac12}^1 = 2\Big(\arctan^2\frac{1}{\sqrt3} - \arctan^2 0\Big) - 4\Big(\arctan^2 0 - \arctan^2\frac{1}{\sqrt3}\Big) = 6\arctan^2\frac{1}{\sqrt3} = 6\Big(\frac\pi6\Big)^2 = \frac{\pi^2}{6}. \tag{D.3}$$
Comparing (D.2) and (D.3) gives
$$\sum_{n=1}^\infty \frac{1}{n^2} = \frac{\pi^2}{6}.$$
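Both sides of the identity are easy to check numerically (this sketch is ours): a partial sum of $\sum 1/n^2$, a midpoint Riemann-sum estimate of the double integral (D.1), and $\pi^2/6$.

```python
# Our own numeric check of the identity: a partial sum of sum 1/n^2, a
# midpoint Riemann sum for the double integral (D.1), and pi^2/6.
import math

partial = sum(1 / n**2 for n in range(1, 100000))
m = 400
h = 1.0 / m
riemann = sum(h * h / (1 - (i + 0.5) * h * (j + 0.5) * h)
              for i in range(m) for j in range(m))
print(partial, riemann, math.pi**2 / 6)   # both estimates approach pi^2/6
```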


References

1. Aigner, M., Ziegler, G.: Proofs from the Book, 4th edn. Springer, Berlin (2010)
2. Alon, N., Spencer, J.: The Probabilistic Method, 3rd edn. Wiley-Interscience Series in Discrete
Mathematics and Optimization. Wiley, Hoboken (2008)
3. Alon, N., Krivelevich, M., Sudakov, B.: Finding a large hidden clique in a random graph.
In: Proceedings of the 9th Annual ACM-SIAM Symposium on Discrete Algorithms (San
Francisco, CA, 1998), pp. 594–598. ACM, New York (1998)
4. Andrews, G.: The Theory of Partitions, reprint of the 1976 original. Cambridge University
Press, Cambridge (1998)
5. Apostol, T.: Introduction to Analytic Number Theory. Undergraduate Texts in Mathematics.
Springer, New York (1976)
6. Arratia, R., Barbour, A.D., Tavaré, S.: Logarithmic Combinatorial Structures: A Probabilistic
Approach. EMS Monographs in Mathematics. European Mathematical Society, Zürich (2003)
7. Athreya, K., Ney, P.: Branching Processes, reprint of the 1963 original [Springer, Berlin].
Dover Publications, Inc., Mineola (2004)
8. Bollobás, B.: The evolution of random graphs. Trans. Am. Math. Soc. 286, 257–274 (1984)
9. Bollobás, B.: Modern Graph Theory. Graduate Texts in Mathematics, vol. 184. Springer, New
York (1998)
10. Bollobás, B.: Random Graphs, 2nd edn. Cambridge Studies in Advanced Mathematics, vol. 73.
Cambridge University Press, Cambridge (2001)
11. Brauer, A.: On a problem of partitions. Am. J. Math. 64, 299–312 (1942)
12. Conlon, D.: A new upper bound for diagonal Ramsey numbers. Ann. Math. 170, 941–960
(2009)
13. Dembo, A., Zeitouni, O.: Large Deviations Techniques and Applications, 2nd edn. Springer,
New York (1998)
14. Diaconis, P., Freedman, D.: An elementary proof of Stirling’s formula. Am. Math. Mon. 93,
123–125 (1986)
15. Doyle, P., Snell, J.L.: Random Walks and Electric Networks. Carus Mathematical Monographs,
vol. 22. Mathematical Association of America, Washington (1984)
16. Durrett, R.: Probability: Theory and Examples, 4th edn. Cambridge Series in Statistical and
Probabilistic Mathematics. Cambridge University Press, Cambridge (2010)
17. Dwass, M.: The number of increases in a random permutation. J. Combin. Theor. Ser. A 15,
192–199 (1973)
18. Erdős, P., Rényi, A.: On the evolution of random graphs. Magyar Tud. Akad. Mat. Kutató Int.
Közl 5, 17–61 (1960)
19. Feller, W.: An Introduction to Probability Theory and Its Applications, 3rd edn, vol. I. Wiley,
New York (1968)


20. Flajolet, P., Sedgewick, R.: Analytic Combinatorics. Cambridge University Press, Cambridge
(2009)
21. Flory, P.J.: Intramolecular reaction between neighboring substituents of vinyl polymers. J. Am.
Chem. Soc. 61, 1518–1521 (1939)
22. Graham, R., Rothschild, B., Spencer, J.: Ramsey Theory, 2nd edn. Wiley-Interscience Series
in Discrete Mathematics and Optimization. Wiley, New York (1990)
23. Hardy, G.H., Ramanujan, S.: Asymptotic formulae in combinatory analysis. Proc. London
Math. Soc. 17, 75–115 (1918)
24. Harris, T.: The Theory of Branching Processes, corrected reprint of the 1963 original [Springer,
Berlin]. Dover Publications, Inc., Mineola (2002)
25. Jameson, G.J.O.: The Prime Number Theorem. London Mathematical Society Student Texts,
vol. 53. Cambridge University Press, Cambridge (2003)
26. Montgomery, H., Vaughan, R.: Multiplicative Number Theory. I. Classical Theory. Cambridge
Studies in Advanced Mathematics, vol. 97. Cambridge University Press, Cambridge (2007)
27. Nathanson, M.: Elementary Methods in Number Theory. Graduate Texts in Mathematics, vol.
195. Springer, New York (2000)
28. Page, E.S.: The distribution of vacancies on a line. J. Roy. Stat. Soc. Ser. B 21, 364–374 (1959)
29. Pinsky, R.: Detecting tampering in a random hypercube. Electron. J. Probab. 18, 1–12 (2013)
30. Pitman, J.: Combinatorial stochastic processes. Lectures from the 32nd Summer School on
Probability Theory held in Saint-Flour, 7–24 July 2002. Lecture Notes in Mathematics, 1875.
Springer, Berlin (2006)
31. Rényi, A.: On a one-dimensional problem concerning random space filling (Hungarian;
English summary). Magyar Tud. Akad. Mat. Kutató Int. Közl. 3, 109–127 (1958)
32. Spitzer, F.: Principles of Random Walk, 2nd edn. Graduate Texts in Mathematics, vol. 34.
Springer, New York (1976)
33. Tenenbaum, G.: Introduction to Analytic and Probabilistic Number Theory. Cambridge Studies
in Advanced Mathematics, vol. 46. Cambridge University Press, Cambridge (1995)
34. Wilf, H.: Generating Functionology, 3rd edn. A K Peters, Ltd., Wellesley (2006)
Index

A
Abel summation, 77
arcsine distribution, 37
average order, 13

B
Bernoulli random variable, 138
binomial random variable, 138
branching process, see Galton–Watson branching process, 117

C
Chebyshev's ψ-function, 70
Chebyshev's θ-function, 68
Chebyshev's inequality, 134
Chebyshev's theorem, 68
Chinese remainder theorem, 19
clique, 89
coloring of a graph, 104
composition of an integer, 5
cycle index, 58
cycle type, 51

D
derangement, 49
Dyck path, 40

E
Erdős–Rényi graph, 89
Euler φ-function, 11
Euler product formula, 19
Ewens sampling formula, 52
expected value, 134
extinction, 117

F
Fibonacci sequence, 142
finite graph, 89

G
Galton–Watson branching process, 117
generating function, 141
giant component, 110

H
Hardy–Ramanujan theorem, 81

I
independent events, 133
independent random variables, 135

L
large deviations, 113

M
Mertens' theorems, 75
Möbius function, 8
Möbius inversion, 10
multiplicative function, 9

P
p-adic, 71
partition of an integer, 1
Poisson approximation to the binomial distribution, 138
Poisson random variable, 138
prime number theorem, 67
probabilistic method, 107
probability generating function, 54
probability space, 133

R
Ramsey number, 105
random variable, 133
relative entropy, 115, 131
restricted partition of an integer, 1

S
sieve method, 19
simple, symmetric random walk, 35
square-free integer, 8
Stirling numbers of the first kind, 54
Stirling's formula, 145
survival, 117

T
tampering detection, 99
total variation distance, 99

V
variance, 134

W
weak convergence, 139
weak law of large numbers, 136, 137
