
Chapter 1 Sections 1 and 2

January 20, 2012

The Markov property starting at time k

Let X0, X1, ... be a finite or infinite sequence of random variables taking values in some state space. In Section 1.1 of the text, the Markov property is stated as equation (1.1) of that section:

P(Xn+1 = in+1 | X0 = i0, ..., Xn = in) = p(in, in+1),    (1.1)

for every choice of i0, ..., in+1. Your text writes in as i, and in+1 as j, but that doesn't change the meaning. Notice that it is tacitly assumed that P(X0 = i0, ..., Xn = in) > 0, since otherwise the conditional probability in the equation would not exist. As your text says, this equation expresses the memoryless property of a Markov chain. Also, since the function p on the right side of the equation does not depend on n, your book notes that this equation expresses the fact that the rules governing the Markov chain do not change with time.

Your text refers to p(i, j) as the transition matrix for the Markov chain. We will use the phrase transition function for p(i, j) as well, because we don't always want to think of p(i, j) as a matrix.

If we multiply through by P(X0 = i0, ..., Xn = in) in (1.1), we obtain the multiplied-through form of the Markov property:

P(X0 = i0, ..., Xn = in, Xn+1 = in+1) = P(X0 = i0, ..., Xn = in) p(in, in+1),    (1.2)

for every choice of i0, ..., in+1. This form of the Markov property implies the original form, and it has the added benefit that it is true even if P(X0 = i0, ..., Xn = in) = 0. Please verify this fact. (The same difference holds between the ordinary conditional probability formula and its multiplied-through form.) The multiplied-through form of the Markov property is convenient for establishing many properties of Markov chains.

It is important to realize that a Markov chain doesn't have to start at time n = 0. It could just as well start at any time n = k. (The point about a Markov chain is that the values occur in a definite order. It doesn't matter how we

label the times.) So let's state the equations defining the Markov property for a Markov chain starting at time n = k. First the conditional form:

P(Xn+1 = in+1 | Xk = ik, ..., Xn = in) = p(in, in+1),    (1.3)

for every choice of ik, ..., in+1. As usual in such equations, it is tacitly assumed that P(Xk = ik, ..., Xn = in) > 0, since otherwise the conditional probability in the equation would not exist. Second, the multiplied-through form:

P(Xk = ik, ..., Xn = in, Xn+1 = in+1) = P(Xk = ik, ..., Xn = in) p(in, in+1),    (1.4)

for every choice of ik, ..., in+1.
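The memoryless property in (1.1) can be checked numerically. The sketch below is not from the text: the two-state transition matrix and the particular history are arbitrary choices for illustration. It simulates a chain started at state 0 with probability one and estimates P(X3 = 1 | X0 = 0, X1 = 1, X2 = 0), which by the Markov property should be close to p(0, 1), no matter what the earlier part of the history was.

```python
import random

# A hypothetical two-state chain (states 0 and 1); this transition
# matrix is an arbitrary choice for illustration, not from the text.
p = [[0.7, 0.3],
     [0.4, 0.6]]

def step(i, rng):
    """Take one step from state i using the transition function p."""
    return 0 if rng.random() < p[i][0] else 1

rng = random.Random(42)

# Estimate P(X3 = 1 | X0 = 0, X1 = 1, X2 = 0) by simulation.  Since the
# history ends at X2 = 0, the Markov property says the answer is p(0, 1).
hits = total = 0
for _ in range(100_000):
    x = [0]                      # the chain starts at 0 with probability one
    for _ in range(3):
        x.append(step(x[-1], rng))
    if x[:3] == [0, 1, 0]:       # condition on this particular history
        total += 1
        hits += (x[3] == 1)

print(round(hits / total, 1))    # close to p(0, 1) = 0.3
```

Changing the conditioning history to any other path ending in state 0 should give roughly the same estimate, which is exactly what memorylessness asserts.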

The tail of a Markov chain is a Markov chain

Here is a useful fact.

Lemma 2.1 (Tail of Markov is Markov) Let (X0, X1, ...) be a Markov chain with transition function p(i, j). Then for any k ≥ 0, (Xk, Xk+1, ...) is also a Markov chain with transition function p(i, j).

Proof It is enough to prove the lemma for k = 1 (because we can keep going!). The proof consists of summing both sides of (1.2) over all values of i0. By pure logic,

∪_{i0} {X0 = i0, ..., Xn = in} = {X1 = i1, ..., Xn = in},

and

∪_{i0} {X0 = i0, ..., Xn = in, Xn+1 = in+1} = {X1 = i1, ..., Xn = in, Xn+1 = in+1}.

The events in each union are disjoint, so when we add the probabilities we obtain (1.4) with k = 1, and the proof is complete.

One-step conditionals

On page 29 of Section 1.1, your text states an important fact about transition functions:

p(i, j) = P(Xn+1 = j | Xn = i),    (3.1)

which holds assuming that P(Xn = i) > 0. We will refer to this equation as the one-step conditional probability formula. As usual it is convenient to also consider the multiplied-through form:

p(i, j) P(Xn = i) = P(Xn = i, Xn+1 = j),    (3.2)

and this holds even if P(Xn = i) = 0. A proof is not given in the text, but since it is such an important fact we should check it. First consider the case n = 0. Take n = 0 in (1.2), with i0 = i and i1 = j, and check that the result is (3.2). Thus we have an immediate proof for n = 0. The remaining step is to see that (3.2) is also true for general n. But by Lemma 2.1 we know that (1.4) holds. Take k = n in (1.4), with in = i and in+1 = j, and check that this gives (3.2) in the general case, finishing the proof.

The family of chains using p

Suppose that (X0, X1, ...) is a Markov chain with transition function p such that P(X0 = i) = 1 for some state i. Then we say that the chain starts at i with probability one. It is important to note that we will sometimes be interested in Markov chains that don't start with probability one at a particular point. For example, you might toss a coin to choose the starting point. So often we have P(X0 = j) > 0 for several points j. The values of P(X0 = j) for different values of j determine what we call the initial distribution of the chain.

Everything that matters about a Markov chain is determined by two things: the Markov transition function p, and the initial distribution, that is, the distribution of the starting random variable X0. For a given Markov transition function p, it is a fact that for every initial distribution that can be specified, there is a Markov chain such that (1.2) holds and X0 has the specified initial distribution. We will say that the set of all these Markov chains is the family of Markov chains associated with p. Each choice of initial distribution gives a different chain, but all the chains in the family have the same transition function p, and we will soon see that many properties are the same for all the chains in the family.

The Pi notation

In the special case that the chain starts at a particular point i with probability one, i.e. P(X0 = i) = 1, the text uses a standard notation (see page 42 in Section 1.3, important notation) for the probability function P, and writes the probability function P in that case as Pi. This is convenient for expressing things briefly. It is a fact, which we could prove easily, that when dealing with any Markov chain that has transition function p, not necessarily satisfying P(X0 = i) = 1,

P(A | X0 = i) = Pi(A)    (5.1)

holds for any event A. Your text assumes this and uses both notations freely.

5.1 Path probabilities

In class we showed by induction on n that

P(X0 = i0, ..., Xn = in) = P(X0 = i0) p(i0, i1) ... p(in−1, in).    (5.2)

(Here the n = 0 statement says that P(X0 = i0) = P(X0 = i0), which is definitely true.) The proof works best with the multiplied-through form of the Markov property, equation (1.2). We will refer to equation (5.2) as the basic path probability formula.

Remark If it happens that P(X0 = i0) = 1, then of course P(X0 = i0, ..., Xn = in) = p(i0, i1) ... p(in−1, in). On the other hand, notice that if P(X0 = j) = 1 holds for some j ≠ i0, then we have P(X0 = i0) = 0 (why?) and hence P(X0 = i0, ..., Xn = in) = 0. It is important to note that we will sometimes be interested in Markov chains that don't start with probability one at a particular point. For example, you might toss a coin to choose the starting point. So often we have P(X0 = j) > 0 for several points j. Then one must use the general form (5.2) of the basic path probability formula.

Remark The proof of (5.2) is closely related to a familiar formula about a chain of conditional probabilities:

P(A1 ∩ A2 ∩ ... ∩ An) = P(A1) P(A2 | A1) P(A3 | A1 ∩ A2) ... P(An | A1 ∩ ... ∩ An−1),    (5.3)
provided that P(A1 ∩ ... ∩ An−1) > 0.

The basic path probability formula can be extended. Indeed, by Lemma 2.1, (Xm, Xm+1, ...) is a Markov chain with transition function p. Hence, for the same reason that (5.2) holds, we know that for any m < n,

P(Xm = im, Xm+1 = im+1, ..., Xn = in) = P(Xm = im) p(im, im+1) ... p(in−1, in).    (5.4)

We will refer to (5.4) as the general path probability formula. Everything about a Markov chain follows from the general path probability formula. The only disadvantage in using this formula is that sometimes one must sum a lot of terms.
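As a quick sanity check on the basic path probability formula (5.2), one can enumerate every path of a small chain and verify that the path probabilities sum to one, since the paths partition the sample space. The 3-state transition function and initial distribution below are hypothetical, chosen only for illustration.

```python
import itertools

# Hypothetical 3-state example; p and the initial distribution mu are
# arbitrary choices for illustration.
p = [[0.5, 0.25, 0.25],
     [0.2, 0.6,  0.2 ],
     [0.3, 0.3,  0.4 ]]
mu = [1.0, 0.0, 0.0]      # start at state 0 with probability one

def path_prob(path):
    """Basic path probability formula (5.2):
    P(X0 = i0, ..., Xn = in) = P(X0 = i0) p(i0, i1) ... p(in-1, in)."""
    prob = mu[path[0]]
    for a, b in zip(path, path[1:]):
        prob *= p[a][b]
    return prob

# The events {X0 = i0, ..., X4 = i4} partition the sample space, so
# summing (5.2) over all 3^5 length-5 paths must give 1.
total = sum(path_prob(path)
            for path in itertools.product(range(3), repeat=5))
print(abs(total - 1.0) < 1e-9)  # True
```

This also illustrates the remark about summing a lot of terms: even for 3 states and 5 time points there are already 243 paths.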

Equations for distributions

The law of total probability shows that for any discrete random variables X, Y taking values in the state space (not necessarily part of a Markov chain), and for any point j in the state space:

P(Y = j) = Σ_i P(X = i) P(Y = j | X = i).    (6.1)

Here we use the usual convention: if P(X = i) = 0, we agree to interpret the term P(X = i) P(Y = j | X = i) as zero. That is, we agree that in this situation, 0 · undefined = 0. With this convention, equation (6.1) is always true. In particular, we can apply (6.1) to an arbitrary sequence X = (X0, X1, ...) of random variables taking values in the state space, and conclude that

P(Xn+1 = j) = Σ_i P(Xn = i) P(Xn+1 = j | Xn = i).    (6.2)

From now on, suppose that the Markov property equation (1.1) of Section 1.1 of the text holds, so that X is a Markov chain with transition function p. As a consequence of (3.1) and (6.2) we then have

P(Xn+1 = j) = Σ_i P(Xn = i) p(i, j).    (6.3)

When the points of the state space are 1, 2, ..., k (or more generally when the points are arranged in any definite order), we will often use πn to denote the row vector representing the distribution of Xn. That is,

πn = ( P(Xn = 1)  ...  P(Xn = k) ).

Then we can rewrite (6.3) as a matrix equation:

πn+1 = πn p.    (6.4)

Using (6.4) repeatedly we have for any nonnegative integers m, n that

πm+n = πm p^n,    (6.5)

where p^n is the nth matrix power of p. Writing out (6.5) gives an explicit formula for updating the distribution:

P(Xm+n = j) = Σ_i P(Xm = i) p^n(i, j).    (6.6)

Of course (6.6) makes sense and is true even if the points of the state space are not arranged in any particular order, which sometimes makes it easier to think about.
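The updating equations translate directly into matrix arithmetic. The sketch below uses hypothetical numbers (any stochastic matrix and initial row vector would do) to check that iterating (6.4) five times agrees with a single application of (6.5).

```python
import numpy as np

# Hypothetical numbers for illustration: a 3-state transition matrix p
# and an initial row vector pi0 (the distribution of X0).
p = np.array([[0.5, 0.25, 0.25],
              [0.2, 0.6,  0.2 ],
              [0.3, 0.3,  0.4 ]])
pi0 = np.array([0.2, 0.5, 0.3])

# One step at a time via (6.4): pi_{n+1} = pi_n p ...
pi = pi0.copy()
for _ in range(5):
    pi = pi @ p

# ... and in a single jump via (6.5): pi_5 = pi_0 p^5.
pi_direct = pi0 @ np.linalg.matrix_power(p, 5)

print(np.allclose(pi, pi_direct))  # True
```

Note that πn is a row vector, so it multiplies p on the left; this matches the convention that p(i, j) is the probability of moving from row state i to column state j.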

Multistep conditionals, Chapman-Kolmogorov

It is a fact that for any nonnegative integers m, n ≥ 0, and any points i, j in the state space,

P(Xm = i, Xm+n = j) = P(Xm = i) p^n(i, j),    (7.1)

where p^n is the matrix product of p with itself n times. If we divide both sides by P(Xm = i), we obtain the multistep conditional probability formula:

P(Xm+n = j | Xm = i) = p^n(i, j),    (7.2)

where as usual we say that the equation holds whenever P(Xm = i) > 0. Note that this equation is the same as the equation given at the beginning of Section 1.2 on page 34 in the text. Despite the way the text phrases things, we will not think of this as a definition of p^n. We will stick to thinking of p^n as the matrix power; the equation simply states a property of P(Xm+n = j | Xm = i). Note that when n = 1 we already know (7.1), since when n = 1 it is just the one-step equation (3.2).

If we apply the law of total probability using the multistep conditional probability formula, we obtain (6.6) for updating the distribution, which is equivalent to (6.5). Please check this.

Since we define p^n to be the matrix power of p, one might think that the Chapman-Kolmogorov equation (equation (2.2) in Section 1.2 of the text), as we have derived it, is a trivial consequence of the associativity of matrix multiplication. However, the interesting content of the Chapman-Kolmogorov equation is the probabilistic meaning of the matrix p^n, which is expressed in (7.2) (as well as in (7.3) below). When we speak of the Chapman-Kolmogorov equation to justify some probability conclusion, we really mean both the matrix associativity and the interpretation of the matrix p^n given in (7.2) and (7.3).

Proof of (7.1) We will base our argument on the path probability formula. Let m and n be nonnegative integers, and let i, j be points in the state space. By the general path probability formula,

P(Xm = i, Xm+n = j) = Σ_{i1,...,in−1} P(Xm = i, Xm+1 = i1, ..., Xm+n−1 = in−1, Xm+n = j)
                    = Σ_{i1,...,in−1} P(Xm = i) p(i, i1) p(i1, i2) ... p(in−1, j).    (7.3)

A simple, but very important, fact is that

Σ_{i1,...,in−1} p(i, i1) p(i1, i2) ... p(in−1, j) = p^n(i, j),    (7.4)
where p^n denotes the matrix product of p with itself n times. Equation (7.4) is easy to prove by induction on n, and is useful whenever matrices are used, not just in probability. Substituting from (7.4) into (7.3), we obtain (7.1).

Remark In class we derived the same formula for the case n = 2, when the state space was {1, 2, 3}. The general proof given here is exactly the same idea.
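Equation (7.4) can also be checked numerically on a small example: summing the products p(i, i1) ... p(in−1, j) over all choices of intermediate states should reproduce the (i, j) entry of the matrix power p^n. The 3-state matrix below is a hypothetical choice for illustration.

```python
import itertools
import numpy as np

# Hypothetical 3-state transition matrix, chosen only for illustration.
p = np.array([[0.5, 0.25, 0.25],
              [0.2, 0.6,  0.2 ],
              [0.3, 0.3,  0.4 ]])

n, i, j = 3, 0, 2

# Left side of (7.4): sum p(i, i1) p(i1, i2) ... p(i_{n-1}, j) over all
# choices of the n - 1 intermediate states.
lhs = 0.0
for mids in itertools.product(range(3), repeat=n - 1):
    states = (i,) + mids + (j,)
    term = 1.0
    for a, b in zip(states, states[1:]):
        term *= p[a, b]
    lhs += term

# Right side of (7.4): the (i, j) entry of the matrix power p^n.
rhs = np.linalg.matrix_power(p, n)[i, j]

print(abs(lhs - rhs) < 1e-9)  # True
```

The brute-force sum visits 3^(n−1) intermediate paths, while the matrix power reuses partial sums; this is exactly the computational content of the Chapman-Kolmogorov equation.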

Remark The approach used in the text derives Chapman-Kolmogorov and (7.2) via conditional probability properties. That is a good method too, but it would take quite a bit of extra work for us to check all the omitted details rigorously.
