
World Scientific Publishing Co. Pte. Ltd.

5 Toh Tuck Link, Singapore 596224

USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601

UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE

British Library Cataloguing-in-Publication Data

A catalogue record for this book is available from the British Library.

PROBABILITY AND RANDOM NUMBER: A First Guide to Randomness

All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the publisher.

For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy

is not required from the publisher.

ISBN 978-981-3228-25-2

Printed in Singapore

Preface

Imagine old times when the word probability did not exist. Facing difficult situations that could be described as irregular, unpredictable, random, etc. (in what follows, we call them random), people were helpless. After a long time, they found how to describe randomness, how to analyze it, how to define it, and how to make use of it. What is really amazing is that all of this has been done in fully rigorous mathematics, just like geometry and algebra.

At high school, students calculate probabilities by counting the numbers of permutations, combinations, etc. At university, counting is also the most basic method for studying probability. The only difference is that we count huge numbers at university; e.g., we ask how large 10,000! is. To count huge numbers, calculus (differentiation and integration) is useful. To deal with extremely huge numbers, taking the limit to infinity often makes things simpler, in which case calculus is again useful. In short, at university, counting huge numbers by calculus is the most basic method for studying probability. Why do we count huge numbers? It is because we want to find as many limit theorems as possible. Limit theorems are very useful for solving practical problems; e.g., for analyzing statistical data of 10,000 people, or for studying properties of a certain substance consisting of 6.02 × 10^23 molecules. They have a yet more important mission: to unlock the secrets of randomness, which is the ultimate aim of studying probability. What is randomness? Why can limit theorems unlock its secrets? To answer these questions, we feature random number as one of the two main subjects of this book.^1 Without learning random number, we can do

1 The other one is, of course, probability.


calculations and prove theorems about probability, but to understand the essential relation between probability and randomness, the knowledge of random number is necessary. Another reason why we feature random number is that for proper understanding and implementation of the Monte Carlo method, the knowledge of random number is indispensable. The Monte Carlo method is a numerical method that solves mathematical problems by computer-aided sampling of random variables. Thus, not only in theory but also in practice, learning random number is important.

The prerequisite for this book is first-year university calculus. University mathematics is really difficult. There are three elements of the difficulty. First, the subtle nuance of concepts described by unfamiliar technical terms. For example, a random variable needs a probability space as a setup, and should be accompanied by a distribution, to make sense. In particular, special attention must be paid to terms—such as event—that have mathematical meanings other than their usual ones. Secondly, long proofs and complicated calculations. This book includes many of them. They are unavoidable; for it is not easy to obtain important concepts or theorems. In this book, reasoning by many inequalities, which readers may not be used to, appears here and there. We hope readers will follow the logic patiently. Thirdly, the fact that infinity plays an essential role. Since the latter half of the 19th century, mathematics has developed very much by dealing with infinity directly. However, infinity essentially differs from finiteness^2 in many respects, and our usual intuition does not work at all for it. Therefore mathematical concepts about infinity cannot help being so delicate that we must be very careful in dealing with them. In this book, we discuss the distinction between countable and uncountable sets, and the rigorous definition of limit.

This book presents well-known basic theorems with proofs that are not seen in usual probability textbooks; for we want readers to learn that a good solution is not always unique. In general, breakthroughs in science have been made by unusual solutions. We hope readers will come to know more than one proof for every important theorem.

This is an English translation of my Japanese book Kakuritsu to ransū published by Sugakushobo Co. To that book, Professors Masato Takei and

2 finiteness


Tetsuya Hattori gave me suggestions for better future publication. Indeed, they were very useful in preparing the present English version. Mr. Shin Yokoyama at Sugakushobo encouraged me to translate the book. Professor Nicolas Bouleau carefully read the translated English manuscript, and gave me valuable comments, which helped me improve it. Dr. Pan Suqi and Ms. Tan Rok Ting at World Scientiﬁc Publishing Co. kindly supported me in producing this English version. I am really grateful to all of them.

Osaka, September 2017

Hiroshi Sugita


Notations and symbols

A := B — A is defined by B (B =: A as well).
P ⇒ Q — P implies Q (logical inclusion).
□ — end of proof.
N := {0, 1, 2, ...}, the set of all non-negative integers.
N+ := {1, 2, ...}, the set of all positive integers.
R := the set of all real numbers.
∏_{i=1}^n a_i := a_1 × ··· × a_n.
max[min] A := the maximum[minimum] value of A ⊂ R.
max[min]_{t≥0} u(t) := the maximum[minimum] value of u(t) over all t ≥ 0.
⌊t⌋ := the largest integer not exceeding t ≥ 0 (rounding down).
(n k) := n!/((n − k)! k!) (binomial coefficient).
a ≈ b — a and b are approximately equal to each other.
a ≫ b — a is much greater than b (b ≪ a as well).
a_n ∼ b_n — a_n/b_n → 1 as n → ∞.
∅ := the empty set.
P(Ω) := the set of all subsets of Ω.
1_A(x) := 1 (x ∈ A), 0 (x ∉ A) (the indicator function of A).
#A := the number of elements of A.
A^c := the complement of A.
A × B := {(x, y) | x ∈ A, y ∈ B} (the direct product of A and B).

Table of Greek letters

A, α — alpha
B, β — beta
Γ, γ — gamma
∆, δ — delta
E, ε — epsilon
Z, ζ — zeta
H, η — eta
Θ, θ (ϑ) — theta
I, ι — iota
K, κ — kappa
Λ, λ — lambda
M, µ — mu
N, ν — nu
Ξ, ξ — xi
O, o — omicron
Π, π — pi
P, ρ — rho
Σ, σ (ς) — sigma
T, τ — tau
Υ, υ — upsilon
Φ, φ (ϕ) — phi
X, χ — chi
Ψ, ψ — psi
Ω, ω — omega

Contents

Preface — v
Notations and symbols — viii
Table of Greek letters — viii
1. Mathematics of coin tossing — 1
  1.1 Mathematical model — 1
    1.1.1 Probability space — 4
    1.1.2 Random variable — 6
  1.2 Random number — 10
  1.3 Limit theorem — 12
    1.3.1 Analysis of randomness — 12
    1.3.2 Mathematical statistics — 15
  1.4 Monte Carlo method — 16
  1.5 Infinite coin tosses — 18
    1.5.1 Borel's normal number theorem — 19
    1.5.2 Construction of Brownian motion — 19
2. Random number — 23
  2.1 Recursive function — 24
    2.1.1 Computable function — 25
    2.1.2 Primitive recursive function and partial recursive function — 26
    2.1.3 Kleene's normal form (∗)^3 — 29
    2.1.4 Enumeration theorem — 31
  2.2 Kolmogorov complexity and random number — 33

3 The subsections with (∗) can be skipped.


    2.2.1 Kolmogorov complexity — 34
    2.2.2 Random number — 37
    2.2.3 Application: Distribution of prime numbers (∗) — 38
3. Limit theorem — 42
  3.1 Bernoulli's theorem — 42
  3.2 Law of large numbers — 47
    3.2.1 Sequence of independent random variables — 48
    3.2.2 Chebyshev's inequality — 54
    3.2.3 Cramér–Chernoff's inequality — 57
  3.3 De Moivre–Laplace's theorem — 60
    3.3.1 Binomial distribution — 60
    3.3.2 Heuristic observation — 61
    3.3.3 Taylor's formula and Stirling's formula — 65
    3.3.4 Proof of de Moivre–Laplace's theorem — 75
  3.4 Central limit theorem — 80
  3.5 Mathematical statistics — 86
    3.5.1 Inference — 86
    3.5.2 Test — 88
4. Monte Carlo method — 91
  4.1 Monte Carlo method as gambling — 91
    4.1.1 Purpose — 91
    4.1.2 Exercise I, revisited — 93
  4.2 Pseudorandom generator — 94
    4.2.1 Definition — 94
    4.2.2 Security — 95
  4.3 Monte Carlo integration — 96
    4.3.1 Mean and integral — 96
    4.3.2 Estimation of mean — 97
    4.3.3 Random Weyl sampling — 98
  4.4 From the viewpoint of mathematical statistics — 104
Appendix A — 105
  A.1 Symbols and terms — 105
    A.1.1 Set and function — 105
    A.1.2 Symbols for sum and product — 106
    A.1.3 Inequality symbol '≤' — 108


  A.2 Binary numeral system — 108
    A.2.1 Binary integers — 108
    A.2.2 Binary fractions — 110
  A.3 Limit of sequence and function — 111
    A.3.1 Convergence of sequence — 111
    A.3.2 Continuity of function of one variable — 114
    A.3.3 Continuity of function of several variables — 115
  A.4 Limits of exponential function and logarithm — 115
  A.5 C language program — 116
List of mathematicians — 119
Further reading — 120
Bibliography — 121
Index — 123

Chapter 1

Mathematics of coin tossing

Toss a coin many times, recording 1 if it comes up Heads and 0 if it comes up Tails at each coin toss. Then we get a long sequence consisting of 0 and 1—let us call such a sequence a {0, 1}-sequence—that is random. In this chapter, with such random {0, 1}-sequences as material, we study outlines of

how to describe randomness (Sec. 1.1),

how to deﬁne randomness (Sec. 1.2),

how to analyze randomness (Sec. 1.3.1), and

how to make use of randomness (Sec. 1.3.2, Sec. 1.4).

Readers may think that coin tosses are too simple as a random object, but as a matter of fact, virtually all random objects can be mathematically constructed from them (Sec. 1.5.2). Thus analyzing coin tosses means analyzing all random objects. In this chapter, we present only basic ideas, and do not prove theorems.

1.1 Mathematical model

For example, the concept 'circle' is obtained by abstracting an essence from various round objects in the world. To deal with circles in mathematics, we consider an equation (x − a)^2 + (y − b)^2 = c^2 as a mathematical model. Namely, what we call a circle in mathematics is the set of all solutions of this equation

{ (x, y) | (x − a)^2 + (y − b)^2 = c^2 }.

Similarly, to analyze random objects, since we cannot deal with them directly in mathematics, we consider their mathematical models. For example, when we say 'n coin tosses', it does not mean that we toss a real coin


n times, but it means a mathematical model of it, which is described by mathematical expressions in the same way as 'circle'. Let us consider a mathematical model of '3 coin tosses'. Let X_i ∈ {0, 1} be the outcome (Heads = 1 and Tails = 0) of the i-th coin toss. At high school, students learn that the probability that the consecutive outcomes of 3 coin tosses are Heads, Tails, Heads is

P(X_1 = 1, X_2 = 0, X_3 = 1) = 1/2^3 = 1/8.    (1.1)

Here, however, the mathematical deﬁnitions of P and X i are not clear. After making them clear, we can call them a mathematical model of 3 coin tosses.

[Fig. 1.1: Heads and Tails of a 1 JPY coin]

Example 1.1. Let {0, 1}^3 denote the set of all {0, 1}-sequences of length 3:

{0, 1}^3 := { ω = (ω_1, ω_2, ω_3) | ω_i ∈ {0, 1}, 1 ≤ i ≤ 3 }
         = { (0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1) }.

Let P({0, 1}^3) be the power set^1 of {0, 1}^3, i.e., the set of all subsets of {0, 1}^3. A ∈ P({0, 1}^3) is equivalent to A ⊂ {0, 1}^3. Let #A denote the number of elements of A. Now, define a function P_3 : P({0, 1}^3) → [0, 1] := { x | 0 ≤ x ≤ 1 } by

P_3(A) := #A / #{0, 1}^3 = #A / 2^3,    A ∈ P({0, 1}^3)    (1.2)

(see Definition A.2), and functions ξ_i : {0, 1}^3 → {0, 1}, i = 1, 2, 3, by

ξ_i(ω) := ω_i,    ω = (ω_1, ω_2, ω_3) ∈ {0, 1}^3.

1 P is the letter P in the Fraktur typeface.


Each ξ_i is called a coordinate function. Then, we have

P_3( { ω ∈ {0, 1}^3 | ξ_1(ω) = 1, ξ_2(ω) = 0, ξ_3(ω) = 1 } )
  = P_3( { ω ∈ {0, 1}^3 | ω_1 = 1, ω_2 = 0, ω_3 = 1 } )
  = P_3( { (1, 0, 1) } ) = 1/2^3.    (1.3)

Although (1.3) has nothing to do with the real coin tosses, it is formally the same as (1.1). Readers can easily examine the formal identity not only for the case Heads, Tails, Heads, but also for any other possible outcomes of 3 coin tosses. Thus we can compute every probability concerning 3 coin tosses by using P_3 and {ξ_i}_{i=1}^3. This means that P and {X_i}_{i=1}^3 in (1.1) can be considered as P_3 and {ξ_i}_{i=1}^3, respectively. In other words, by the correspondence

P ←→ P_3,    {X_i}_{i=1}^3 ←→ {ξ_i}_{i=1}^3,

P_3 and {ξ_i}_{i=1}^3 are a mathematical model of 3 coin tosses.
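Because the model of Example 1.1 is a finite set of 8 points, it can be enumerated directly on a computer. The following is a minimal Python sketch of ours (not a program from the book, whose own code in Appendix A.5 is in C): it builds the sample space {0, 1}^3, the uniform measure P_3 of (1.2), and the coordinate functions ξ_i, then evaluates the event of (1.3).

```python
from itertools import product

# Sample space {0,1}^3: all {0,1}-sequences of length 3.
omega_space = list(product([0, 1], repeat=3))

def P3(A):
    """Uniform probability measure P_3(A) = #A / #{0,1}^3 of (1.2)."""
    return len(A) / len(omega_space)

def xi(i, w):
    """Coordinate function xi_i(omega) = omega_i (1-indexed)."""
    return w[i - 1]

# The event {xi_1 = 1, xi_2 = 0, xi_3 = 1} of (1.3):
A = [w for w in omega_space if (xi(1, w), xi(2, w), xi(3, w)) == (1, 0, 1)]
print(A, P3(A))  # [(1, 0, 1)] 0.125
```

Since the space has only 8 points, every probability concerning 3 coin tosses reduces to counting elements, exactly as (1.2) prescribes.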

The equation (x − a)^2 + (y − b)^2 = c^2 is not a unique mathematical model of 'circle'. There are different models of it; e.g., a parametrized representation

x = c cos t + a,    y = c sin t + b,    0 ≤ t ≤ 2π.

You can select suitable mathematical models according to your particular purposes. In the case of coin tosses, it is all the same. We can present another mathematical model of 3 coin tosses.

Example 1.2. (Borel's model of coin tosses) For each x ∈ [0, 1) := { x | 0 ≤ x < 1 }, let d_i(x) ∈ {0, 1} denote the i-th digit of x in its binary expansion (Sec. A.2.2). We write the length of each semi-open interval [a, b) ⊂ [0, 1) as

P( [a, b) ) := b − a.

Here, the function P that returns the lengths of semi-open intervals is called the Lebesgue measure. Then, the length of the set of x ∈ [0, 1) for which d_1(x), d_2(x), d_3(x) are 1, 0, 1, respectively, is

P( { x ∈ [0, 1) | d_1(x) = 1, d_2(x) = 0, d_3(x) = 1 } )
  = P( { x ∈ [0, 1) | 1/2 + 0/2^2 + 1/2^3 ≤ x < 1/2 + 0/2^2 + 1/2^3 + 1/2^3 } )
  = P( [5/8, 6/8) ) = 1/8.

In the number line with binary scale, it is expressed as a segment:

[Number line with binary scale from 0 to 1, marked at 0.001, 0.01, 0.011, 0.1, 0.101, 0.11, 0.111; the segment [0.101, 0.11) is highlighted.]

Under the correspondence

P ←→ P,    {X_i}_{i=1}^3 ←→ {d_i}_{i=1}^3,

P and {d_i}_{i=1}^3 are also a mathematical model of 3 coin tosses.
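Borel's model can be illustrated the same way. The Python sketch below (again ours, not from the book) implements the binary-digit functions d_i and checks that the event {d_1 = 1, d_2 = 0, d_3 = 1} is exactly the dyadic interval [5/8, 6/8), whose length is the probability 1/8.

```python
def d(i, x):
    """The i-th binary digit d_i(x) of x in [0, 1), 1-indexed."""
    return int(x * 2**i) % 2

# {x | d_1(x)=1, d_2(x)=0, d_3(x)=1} is [0.101, 0.110) in binary, i.e. [5/8, 6/8).
a, b = 5 / 8, 6 / 8
for k in range(8):                      # sample points spread across [a, b)
    x = a + (b - a) * k / 8
    assert (d(1, x), d(2, x), d(3, x)) == (1, 0, 1)
print(b - a)  # 0.125, the Lebesgue measure (length) of the interval
```

Note that here the probability is a length rather than a ratio of counts; this is what lets the same model extend to infinitely many coin tosses later (Sec. 1.5).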

Readers may suspect that, in the first place, (1.1) is not correct for real coin tosses. Indeed, rigorously speaking, since Heads and Tails are differently carved, (1.1) is not exact in the real world. What we call 'coin tosses' is an idealized model, which can exist only in our mind—just as we consider the equation (x − a)^2 + (y − b)^2 = c^2 as a mathematical model of a circle, although there is no true circle in the real world.

1.1.1 Probability space

Let us present what we stated in the previous section in a general setup. In what follows, 'probability theory' means the axiomatic system for probability established by [Kolmogorov (1933)]^2 and all its derived theorems. Let us begin with probability distribution and probability space.

Definition 1.1. (Probability distribution) Let Ω be a non-empty finite set, i.e., Ω ≠ ∅^3 and #Ω < ∞. Suppose that for each ω ∈ Ω, there corresponds a real number 0 ≤ p_ω ≤ 1 so that

Σ_{ω∈Ω} p_ω = 1

(Sec. A.1.2). Then, we call the set of all pairs (ω, p_ω)

{ (ω, p_ω) | ω ∈ Ω }

a probability distribution (or simply, a distribution) in Ω.

Definition 1.2. (Probability space) Let Ω be a non-empty finite set and let P(Ω) be the power set of Ω. If a function P : P(Ω) → R satisfies

(i) 0 ≤ P(A) ≤ 1, A ∈ P(Ω),
(ii) P(Ω) = 1, and
(iii) A, B ∈ P(Ω) are disjoint, i.e., A ∩ B = ∅ ⇒ P(A ∪ B) = P(A) + P(B),

then the triplet (Ω, P(Ω), P) is called a probability space.^4 An element of P(Ω) (i.e., a subset of Ω) is called an event; in particular, Ω is called the whole event (or the sample space), and ∅ the empty event. A one-point set {ω} or ω itself is called an elementary event. P is called a probability measure (or simply, a probability) and P(A) the probability of A.

2 See Bibliography at the end of the book.
3 '∅' denotes the empty set.

For a non-empty finite set Ω, to give a distribution in it and to give a probability space are equivalent. In fact, if a distribution { (ω, p_ω) | ω ∈ Ω } is given, by defining a probability P : P(Ω) → R as

P(A) := Σ_{ω∈A} p_ω,    A ∈ P(Ω),

the triplet (Ω, P(Ω), P) becomes a probability space. Conversely, if a probability space (Ω, P(Ω), P) is given, by defining

p_ω := P({ω}),    ω ∈ Ω,

{ (ω, p_ω) | ω ∈ Ω } becomes a distribution in Ω.
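For a small finite Ω, this equivalence can be checked mechanically. The Python sketch below (our illustration, with hypothetical weight values not taken from the book) builds P from a given distribution via P(A) = Σ_{ω∈A} p_ω and verifies the axioms (i)–(iii) of Definition 1.2 on every subset of Ω.

```python
from fractions import Fraction
from itertools import combinations

# A distribution {(w, p_w)} on a small Omega (hypothetical example weights).
dist = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}

def P(A):
    """The induced probability measure: P(A) = sum of p_w over w in A."""
    return sum((dist[w] for w in A), Fraction(0))

omega = set(dist)
events = [set(s) for r in range(len(omega) + 1) for s in combinations(omega, r)]

assert all(0 <= P(A) <= 1 for A in events)                     # axiom (i)
assert P(omega) == 1                                           # axiom (ii)
assert all(P(A | B) == P(A) + P(B)
           for A in events for B in events if not A & B)       # axiom (iii)
print("(Omega, P(Omega), P) satisfies Definition 1.2")
```

Using exact fractions instead of floating point keeps the additivity check (iii) an exact equality, mirroring the exact arithmetic of the definition.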

A triplet (Ω, P(Ω), P) is a probability space provided that the conditions (i)(ii)(iii) of Definition 1.2 are satisfied, no matter whether it is related to a random phenomenon or not. Thus the triplet

({0, 1}^3, P({0, 1}^3), P_3),

whose components have been defined in Example 1.1, is a probability space not because it is related to 3 coin tosses but because it satisfies all the conditions (i)(ii)(iii). Like P_3, in general, a probability measure satisfying

P(A) = #A / #Ω,    A ∈ P(Ω),

is called a uniform probability measure. Equivalently, a distribution satisfying

p_ω = 1/#Ω,    ω ∈ Ω,

4 In mathematics, many kinds of 'spaces' enter the stage, such as linear space, Euclidean space, topological space, Hilbert space, etc. These are sets accompanied by some structures, operations, or functions. In general, they have nothing to do with the 3-dimensional space where we live.


is called a uniform distribution. Setting the uniform distribution means that we assume every element of Ω is equally likely to be chosen.

By the way, in Example 1.2, you may wish to assume [0, 1) to be the whole event and P to be the probability, but since [0, 1) is an inﬁnite set, it is not covered by Deﬁnition 1.2. By extending the deﬁnition of probability space, it is possible to consider an inﬁnite set as a whole event, but to do this, we need Lebesgue’s measure theory, which exceeds the level of this book.

Each assertion of the following proposition is easy to derive from Definition 1.2.

Proposition 1.1. Let (Ω, P(Ω), P) be a probability space. For A, B ∈ P(Ω), we have

(i) P(A^c) = 1 − P(A),^5 in particular, P(∅) = 0,
(ii) A ⊂ B ⇒ P(A) ≤ P(B), and
(iii) P(A ∪ B) = P(A) + P(B) − P(A ∩ B), in particular, P(A ∪ B) ≤ P(A) + P(B).
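Since the coin-toss space of Example 1.1 has only 2^8 = 256 events, each assertion can also be confirmed by brute force. The following Python sketch (ours, not from the book) checks (i)–(iii) over all pairs of events under the uniform measure.

```python
from itertools import combinations, product

omega = frozenset(product([0, 1], repeat=3))   # {0,1}^3, as in Example 1.1

def P(A):
    """Uniform probability measure: P(A) = #A / #Omega."""
    return len(A) / len(omega)

events = [frozenset(s) for r in range(len(omega) + 1)
          for s in combinations(omega, r)]     # all 2^8 = 256 events

for A in events:
    assert P(omega - A) == 1 - P(A)                   # (i): P(A^c) = 1 - P(A)
assert P(frozenset()) == 0                            # (i): P(empty event) = 0

for A in events:
    for B in events:
        if A <= B:
            assert P(A) <= P(B)                       # (ii): monotonicity
        assert P(A | B) == P(A) + P(B) - P(A & B)     # (iii): inclusion-exclusion
print("Proposition 1.1 verified on all", len(events), "events")
```

All probabilities here are multiples of 1/8, so the floating-point comparisons are exact; for a general distribution, exact fractions would be the safer choice.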

1.1.2 Random variable

Definition 1.3. Let (Ω, P(Ω), P) be a probability space. We call a function X : Ω → R a random variable. Let {a_1, ..., a_s} ⊂ R be the set of all possible values that X can take, which is called the range of X, and let p_i be the probability that X = a_i:

P( { ω ∈ Ω | X(ω) = a_i } ) =: p_i,    i = 1, ..., s.    (1.4)

Then, we call the set of all pairs (a_i, p_i)

{ (a_i, p_i) | i = 1, ..., s }    (1.5)

the probability distribution (or simply, the distribution) of X. Since we have

0 ≤ p_i ≤ 1,  i = 1, ..., s,    p_1 + ··· + p_s = 1,

(1.5) is a distribution in the range {a_1, ..., a_s} of X. The left-hand side of (1.4) and the event inside P( ) are often abbreviated as

P(X = a_i)    and    {X = a_i},

respectively.

5 A^c := { ω ∈ Ω | ω ∉ A } is the complement of A.
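To make Definition 1.3 concrete, here is a Python sketch of ours (the random variable 'number of Heads' is our example, not the book's): it computes the distribution {(a_i, p_i)} of X on the 3-coin-toss space of Example 1.1 by collecting the events {X = a_i} and applying the uniform measure P_3.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

omega = list(product([0, 1], repeat=3))   # {0,1}^3 with the uniform measure P_3

def X(w):
    """A random variable on Omega: the number of Heads among the 3 tosses."""
    return sum(w)

# Distribution of X per Definition 1.3: p_i = P_3({w | X(w) = a_i}).
counts = Counter(X(w) for w in omega)
dist_X = {a: Fraction(n, len(omega)) for a, n in sorted(counts.items())}
for a, p in dist_X.items():
    print(a, p)
# prints: 0 1/8, 1 3/8, 2 3/8, 3 1/8 — the binomial distribution B(3, 1/2)
```

The range of X is {0, 1, 2, 3}, and the weights 1/8, 3/8, 3/8, 1/8 sum to 1, so (1.5) is indeed a distribution in the range of X.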