Asset Pricing (Lecture Notes of Prof. Dong-Hyun Ahn)

Financial Economics: Asset Pricing
Dong-Hyun Ahn
School of Economics
Seoul National University
Fall 2006
Chapter 1
Overview
1.1 The Strength of Continuous Time Models
1.1.1 Why continuous time models?
Example 1: Option Pricing Models - The payoffs of options are nonlinear
(convex) functions of underlying assets. In a discrete time framework, option
prices are not perfectly correlated with their underlying assets. But the
continuous time framework brings the perfect correlation instantaneously.
Somewhat paradoxically, the introduction of more complicated mathematics makes
the pricing of contingent claims more analytically tractable.
Example 2: Equilibrium Asset Pricing Models - CAPM of Sharpe, Lintner
and Mossin received relatively limited application in the broader community
of economic research because of a widespread belief that the mean-variance
criterion is not consistent with the generally accepted von Neumann-Morgenstern
axioms of choice unless either asset prices have Gaussian distributions or
investors have quadratic preferences.
In contrast, the continuous time dynamic model with lognormally distributed
asset prices produces optimal portfolio rules that are identical in form with
those prescribed by the mean-variance model and the CAPM. In this sense,
the continuous-time model is a watershed between the static and dynamic
models of finance.
Example 3: Agency Theory - In agency models, the intertwining of the
objective functions of agents and principals makes the analysis difficult unless
the optimality conditions are replaced by the first-order conditions. In a discrete
time framework, the necessary and sufficient conditions for this replacement
are very difficult to establish, whereas these conditions are well defined in the
continuous time framework.
The continuous-time model has proven to be a versatile and productive tool
in modern finance. Despite its mathematical complexity, the continuous-time
formulation provides just enough additional specificity to produce both more
precise theoretical solutions and more refined empirical hypotheses than can
otherwise be derived from its discrete-time counterpart.
1.1.2 Continuous Time vs. Discrete Time
The choice of model depends on the problem we are dealing with.
Continuous time models are more appropriate in terms of market availability
(trading opportunities).
Discrete time models make sense in terms of decision intervals.
The results of the two kinds of models can be very different, but in general,
the solutions are simpler and stronger in the continuous time formulation.
(biased argument???)
1.2 The Big Picture
Virtually all the existing financial asset pricing models focus on the following
fundamental asset pricing equation (see footnote 1),
$$P_n(t) = E_t\left[ M(t+\tau)\, P_n(t+\tau) \right]$$
which states that the time t price of any asset n, $P_n(t)$, is the conditional expectation
of its future value, $P_n(t+\tau)$, multiplied by a market-wide random variable, $M(t+\tau)$.
$M(t+\tau)$ has been referred to by many different names including a stochastic
discount factor, a pricing operator, an intertemporal marginal rate of substitution
and a Radon-Nikodym derivative, to name a few. Here we simply call it a pricing
kernel.
Simply put, the history of asset pricing models is the history of identifying
this pricing kernel. Broadly speaking, there are three representative approaches to
identifying the pricing kernel.
Footnote 1: The fundamental asset pricing equation is also referred to as the canonical asset pricing
equation or Euler equation.
Equilibrium Approach - This is a general equilibrium approach wherein there
are agents who have specific preference structures along with either stochastic
endowments or stochastic production technologies. Along with the market
clearing condition, the optimal economic behavior of the agents determines
the endogenous specification of the pricing kernels. The key feature of the
approach is that it tries to price all the assets simultaneously, so that internal
consistency across the assets is guaranteed. Some representative studies in
this line of research are as follows:
Capital Asset Pricing Model (CAPM)
Intertemporal Asset Pricing Model (ICAPM)
Consumption Asset Pricing Model (CCAPM)
Production Asset Pricing Model (PAPM)
Arbitrage Approach - In contrast to the equilibrium approach, the arbitrage
approach tries to retrieve the implied pricing kernel embedded in a subset of
existing assets and to use this pricing kernel to value the remaining set of
assets. We refer to the chosen assets used to retrieve the pricing kernel as
basis assets. The strength of this approach is that it is free of the potential
misspecification of a preference system of economic agents. But unlike the
equilibrium approach, it cannot price all the assets simultaneously by
construction. Examples of the arbitrage approach include:
Ross's Arbitrage Pricing Theory (APT)
Black-Scholes Option Pricing Model (OPM)
Heath-Jarrow-Morton Model (HJM)
Recently there has been a surge of new effort to extend the arbitrage pricing scheme by
grafting it onto the equilibrium approach. The basic idea of this new approach
is to identify a range of prices of assets based on a set of admissible pricing
kernels inferred from basis assets, and then tighten this range based on some
exogenous rules such as ruling out good deals (high Sharpe ratios or gain-loss ratios).
Cochrane & Saá-Requejo (Sharpe ratio)
Bernardo & Ledoit (gain-loss ratio)
Pricing Kernel Approach - This approach is an intermediate approach relative
to the two described above, and it directly specifies the pricing kernel. The
required specification of the preference system in the equilibrium approach
can at times limit the specification of the pricing kernel so that departures
from the fundamental equation can be ascribed to the improper specification
of the preference structure. The pricing kernel approach discards this
endogenous derivation of the pricing kernel and instead searches for the pricing
kernel which can fit the fundamental valuation equation. The strength of
the approach is that it can provide more latitude relative to the equilibrium
approach in a modeling sense without losing the internal consistency which
arbitrage models sometimes fail to meet. However, it does not provide as much
economic reasoning because we do not know what the supporting equilibrium
is. On the other hand, if we can find the supporting equilibrium, the
specification of the pricing kernel may be more restrictive than its counterparts in
the general equilibrium setup. The pricing kernel approach is used in:
Constantinides's SAINTS Model
Bansal & Viswanathan's APT
Aït-Sahalia & Lo's OPM
Figure 1.1 illustrates the valuation scheme of the alternative approaches. Each
approach has its own strengths and weaknesses. There is a trade-off between
practical usage and economic reasoning. On one extreme, you have the arbitrage approach
which is very useful in practice, especially in the pricing of derivatives. On the other
extreme, you have the equilibrium approach where you can analyze the economic
behavior of agents in a full-fledged manner. The pricing kernel approach lies between
these two extreme models.
In this class, we will study mainly the equilibrium approach and the arbitrage
approach, and the pricing kernel approach will be discussed when we study term
structure models.
[Figure 1.1: Asset Pricing Approaches. Equilibrium approach: Max $E_t[U(\cdot)]$ s.t. budget constraint and market clearing, which yields the pricing kernel. Arbitrage approach: basis assets, from which the pricing kernel is inferred. Pricing kernel approach: the pricing kernel is specified directly.]
Chapter 2
A Discrete One Period Model
In this chapter, we briefly review a discrete one-period model. A representative one
period equilibrium model is the Capital Asset Pricing Model of Sharpe, Lintner and
Mossin. Because you must have already studied this model rigorously, we skip this
model and discuss the arbitrage pricing approach in this chapter.
The model which we will review is the simplified version of Harrison and Kreps
(1979). The primary purpose of this chapter is pedagogic: getting you
familiar with the definitions of attainable sets, complete markets, arbitrage
transactions and the relationship between equivalent martingale measures and price
functionals.
Therefore, we sketch heuristically the major results of Harrison and Kreps
without watertight proofs here. In the next chapter, we will study an extension of the
model in a multiperiod framework.
2.1 Assumptions
Initial date 0, terminal date T, trading at time 0 only, consumption at time 0
and time T only.
Probability space $(\Omega, \mathcal{F}, P)$. States $\omega$ belong to a finite sample space $\Omega$ with K
elements
$$\Omega = (\omega_1, \ldots, \omega_K)$$
where state $\omega$ occurs with a probability $P(\omega) > 0$.
Only one perishable consumption good.
[Figure 2.1: Consumption & Endowment Process. A timeline from t = 0 to t = T showing the endowment process of trader i, $e^i = \{e^i(0),\ e^i(T, \omega)\}$, and the consumption process of trader i, $c^i = \{c^i(0),\ c^i(T, \omega)\}$.]
A finite number N of securities, where each individual security n has a
random dividend payout $d_n(\omega)$ on the sample space $\Omega$. We arrange a dividend
matrix, D, in such a way that each row represents a state and each column
represents a security:
$$D = \begin{pmatrix} d_1(\omega_1) & \cdots & d_N(\omega_1) \\ \vdots & & \vdots \\ d_1(\omega_K) & \cdots & d_N(\omega_K) \end{pmatrix}.$$
A finite number I of traders.
Exogenously specified endowment process of trader i, $e^i$, and endogenous
consumption process of trader i, $c^i$, $\forall i$. All traders have the same consumption
set $(\mathbb{R} \times X)$, where $X = L^2(\Omega, \mathcal{F}, P)$, the space of $\mathcal{F}$-measurable random
variables that are square integrable. Note that any consumption scheme can
also be interpreted as a contingent claim (see footnote 1). The representation of these processes
over time is illustrated in Figure 2.1.
Preference systems
On the consumption set $(\mathbb{R} \times X)$, traders have complete preference orderings,
$\succeq$, that are continuous, increasing and convex (see footnote 2). We write the set of those $\succeq$'s
as A; thus A represents the class of conceivable agents (see footnote 3).
Footnote 1: In general, a contingent claim is a claim on the future consumption contingent on the
state, so we can also interpret X as the set of contingent claims.
Footnote 2: For these properties of preference structures, you can refer to microeconomic textbooks
such as Varian (1984).
Footnote 3: Note that the class A is quite general and includes non von Neumann-Morgenstern
utility functions.
2.2 Model
2.2.1 Budget Sets
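Given endowment processes of traders $e^i$, $\forall i$, and a price vector of securities,
$p^\top = (p_1\ p_2\ \cdots\ p_N)$, the budget set $B(e^i, p)$ of trader i is the subset of the consumption
set $\mathbb{R} \times X$ such that
$$c \in B(e^i, p) \subset \mathbb{R} \times X$$
iff there is a trading strategy denoted by a vector
$\theta^\top = (\theta_1\ \theta_2\ \cdots\ \theta_N)$ such that
$$c^i(0) = e^i(0) - \theta^\top p \qquad (2.1)$$
$$c^i(T, \omega) = e^i(T, \omega) + \theta^\top d(\omega), \qquad (2.2)$$
where $d(\omega)$ is the vector of the N security payouts in state $\omega$ (the corresponding row of D).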
Example 1 K = 2 states and N = 4 securities whose terminal payouts are
$$D = \begin{pmatrix} 100 & 40 & 60 & 120 \\ 100 & 0 & 40 & 80 \end{pmatrix}$$
and the price vector is
$$p^\top = (50\ \ 4\ \ 22\ \ 44).$$
The endowment process of trader i is
$$e^i = \left( e^i(0),\ e^i(T) \right) = \left( 9,\ \begin{pmatrix} 10 \\ 20 \end{pmatrix} \right).$$
The consumption set $\mathbb{R} \times X$ is $\mathbb{R}^3$, and the consumption process $c^i$ belongs to
his/her budget set iff the system of simultaneous equations
$$\begin{pmatrix} -50 & -4 & -22 & -44 \\ 100 & 40 & 60 & 120 \\ 100 & 0 & 40 & 80 \end{pmatrix} \theta = \begin{pmatrix} c(0) - 9 \\ c(T, \omega_1) - 10 \\ c(T, \omega_2) - 20 \end{pmatrix}$$
has a solution $\theta$; i.e., the nonhomogeneous system of linear equations should be
consistent. One simple way to solve the system of linear equations is Gaussian
elimination, which sets up an augmented matrix and then reduces it to a
row-echelon form that is simple enough that the system of equations can be solved
by inspection. The initial augmented matrix in our case is
$$\left( \begin{array}{cccc|c} -50 & -4 & -22 & -44 & c(0) - 9 \\ 100 & 40 & 60 & 120 & c(T, \omega_1) - 10 \\ 100 & 0 & 40 & 80 & c(T, \omega_2) - 20 \end{array} \right).$$
After a Gaussian elimination, its row-echelon form can be shown to be
$$\left( \begin{array}{cccc|c} 1 & \frac{2}{25} & \frac{11}{25} & \frac{22}{25} & -\frac{c(0) - 9}{50} \\ 0 & 1 & \frac{1}{2} & 1 & \frac{c(T, \omega_1) + 2c(0) - 28}{32} \\ 0 & 0 & 0 & 0 & \frac{5}{2} c(0) + \frac{c(T, \omega_1)}{4} + c(T, \omega_2) - 45 \end{array} \right).$$
Then, for the system to be consistent, the last row implies the equation:
$$\frac{5}{2} c(0) + \frac{c(T, \omega_1)}{4} + c(T, \omega_2) - 45 = 0.$$
Therefore, the system has solutions iff
$$c(0) + \frac{1}{10} c(T, \omega_1) + \frac{2}{5} c(T, \omega_2) = 18.$$
Hence, the budget set $B(e^i, p)$ is a plane in $\mathbb{R}^3$ (see footnote 4). □
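The consistency condition of Example 1 can be checked numerically. The following is a minimal sketch using exact rational arithmetic from the Python standard library; the helper names `is_consistent` and `in_budget_set` are illustrative and not part of the notes. It tests whether the system for $\theta$ has a solution for a given consumption process.

```python
from fractions import Fraction

def is_consistent(A, b):
    """Return True if the linear system A x = b has at least one solution."""
    rows = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    pivot_row = 0
    for col in range(len(A[0])):
        # look for a nonzero entry in this column at or below the current pivot row
        pivot = next((r for r in range(pivot_row, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
        for r in range(pivot_row + 1, len(rows)):
            factor = rows[r][col] / rows[pivot_row][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    # the system is inconsistent iff some row reads 0 = nonzero
    return all(any(x != 0 for x in row[:-1]) or row[-1] == 0 for row in rows)

# Data of Example 1 (rows of D are states, columns are securities)
p = [50, 4, 22, 44]
D = [[100, 40, 60, 120], [100, 0, 40, 80]]
e0, eT = 9, [10, 20]

def in_budget_set(c0, cT):
    """c is in B(e, p) iff [-p'; D] theta = c - e has a solution theta."""
    A = [[-x for x in p]] + D
    b = [c0 - e0] + [c - e for c, e in zip(cT, eT)]
    return is_consistent(A, b)

print(in_budget_set(9, [10, 20]))   # the endowment itself: 9 + 1 + 8 = 18
print(in_budget_set(18, [0, 0]))    # 18 + 0 + 0 = 18
print(in_budget_set(10, [10, 20]))  # 10 + 1 + 8 = 19, violates the restriction
```

The first two processes satisfy $c(0) + \frac{1}{10} c(T, \omega_1) + \frac{2}{5} c(T, \omega_2) = 18$ and the third does not, which is what the elimination in the example predicts.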
2.2.2 Attainable Set
The undesirable feature of budget sets is that a given terminal consumption
can be attained by a fortunate coincidence with the terminal endowment as well
as by trading. That is, because the budget sets are different for individual
investors due to different endowment processes, a budget set summarizes
the characteristics of an individual investor along with those of the whole market.
To avoid this, we introduce a new concept which is free of the terminal
endowment.
Definition 1 A consumption process $c = \{c(0), c(T, \omega)\}$ is attainable at prices p
iff $\exists$ an endowment process $e = \{e(0), e(T)\}$ such that $e(T) = 0$ and $c \in B(e, p)$.
The set of attainable consumption processes is denoted $(\mathbb{R} \times M)$.
In the one period model, the attainability of a consumption process depends only
on the terminal consumption part, c(T). Stated differently, a consumption process
$c = \{c(0), c(T)\}$ is attainable iff $\exists$ a trading strategy $\theta$ such that
$$c(T, \omega) = \theta^\top d(\omega).$$
Note that this definition of attainable sets is closely related to the spanning
concept, in that a given consumption scheme is attainable iff it is spanned by the
given set of securities.
Footnote 4: As we will discuss later, the budget set is described by a single restriction if the market is complete. In
contrast, if the market is not complete, there are $\nu + 1$ restrictions, where $\nu$ is the number of assets
needed for completion: i.e., $\nu = K - \mathrm{rank}(D)$.
Example 2 K = 3 states and N = 4 securities whose terminal payouts are
$$D = \begin{pmatrix} 1 & 2 & 3 & 6 \\ 2 & 0 & 2 & 4 \\ 4 & 1 & 5 & 10 \end{pmatrix}.$$
After a Gaussian elimination, the row-echelon form of the augmented matrix can be
shown to be
$$\left( \begin{array}{cccc|c} 1 & 2 & 3 & 6 & c(T, \omega_1) \\ 0 & 1 & 1 & 2 & \frac{1}{2} c(T, \omega_1) - \frac{1}{4} c(T, \omega_2) \\ 0 & 0 & 0 & 0 & -\frac{1}{2} c(T, \omega_1) - \frac{7}{4} c(T, \omega_2) + c(T, \omega_3) \end{array} \right).$$
Therefore, for the system to be consistent,
$$c(T, \omega_1) + \frac{7}{2} c(T, \omega_2) - 2 c(T, \omega_3) = 0. \qquad (2.3)$$
Hence, a consumption process with the terminal consumption
$$c(T, \omega_1) = 2, \quad c(T, \omega_2) = 4, \quad c(T, \omega_3) = 8 \qquad (2.4)$$
satisfies the above equation, so it is attainable; i.e., $c(T) \in M$. In contrast,
$$c(T, \omega_1) = 2, \quad c(T, \omega_2) = 1, \quad c(T, \omega_3) = 2 \qquad (2.5)$$
is not attainable; $c(T) \notin M$. □
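Attainability is a spanning question, so it can also be checked by comparing ranks: c(T) is attainable iff appending it to D as an extra column does not raise the rank. The sketch below uses exact fractions; `rank` and `is_attainable` are illustrative helper names, not notation from the notes.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix via Gaussian elimination with exact arithmetic."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def is_attainable(D, cT):
    """c(T) lies in the column space of D iff rank([D | c(T)]) == rank(D)."""
    augmented = [row + [c] for row, c in zip(D, cT)]
    return rank(augmented) == rank(D)

# Payout matrix of Example 2 (rows are states)
D = [[1, 2, 3, 6], [2, 0, 2, 4], [4, 1, 5, 10]]

print(rank(D))                      # 2 < K = 3, so the market is incomplete
print(is_attainable(D, [2, 4, 8]))  # terminal consumption (2.4)
print(is_attainable(D, [2, 1, 2]))  # terminal consumption (2.5)
```

The same `rank` helper gives the completeness test used later in the chapter: the market is complete iff the rank of D equals the number of states K.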
From the example, it is easy to see that every consumption process is
attainable iff rank(D) = K. From this, we establish the definition of completeness of a
market.
Definition 2 A market is complete iff every consumption process is attainable, i.e.,
M = X, and equivalently iff rank(D) = K in a finite state economy.
This property of the matrix of terminal payouts has a crucial implication in asset
pricing. In a complete market, the terminal payoffs of all contingent claims are
attainable, so given the price system we can price them. In contrast, in an incomplete
market, we can price only the subset of contingent claims for which the terminal
payoffs are attainable. We will look into this concept in more detail later. Now we
set up the mathematical definition of the attainable set.
Proposition 1 Interpreting the $K \times N$ payout matrix D as a linear operator
$$\mathbb{D} : \mathbb{R}^N \to \mathbb{R}^K$$
defined by $\mathbb{D}(\theta) = D\theta$, we can establish the following characterization of the
attainable set:
$$M = \mathrm{im}(\mathbb{D}).$$
Further, the relationship between the budget set and the attainable set rests on the
following result:
Proposition 2 For any endowment process e, price system p, and consumption
process c, we have $c \in B(e, p)$ iff the net trade $nc = c - e$ is attainable at zero initial
cost, or equivalently, iff $nc \in B(0, p)$.
2.2.3 Arbitrage Strategies
In the previous subsection, we argue that we can price certain sets of securities
depending on the feature of a market: completeness or incompleteness of the market.
But to do so, we require one more prerequisite; the price system of the market should
not allow any arbitrage transaction opportunities.
An arbitrage trading strategy is a good deal. Many practitioners are searching
for any free lunch available in markets. Traders using an arbitrage strategy get a
sure return without any investment. The precise definition is as follows.
Definition 3 An arbitrage trading strategy is a trading strategy that gives a trader
with a zero endowment a nonnegative, nonzero, consumption process.
For convenience we define a new inequality notation for vectors
$$x \geq^{+} y \quad \text{if } x \geq y \text{ and } (x - y)^\top \mathbf{1} > 0.$$
Stated mathematically, an arbitrage trading strategy is a trading strategy $\theta$
such that
$$\left( -\theta^\top p,\ \theta^\top D^\top \right) \geq^{+} 0.$$
The arbitrage transaction is another name for free lunch. Notice that the more familiar
definition of an arbitrage transaction,
$$\left. \begin{array}{l} \text{no net endowment} \\ \text{no net risk} \end{array} \right\} \Rightarrow \text{net return},$$
is a special case of the above definition; i.e.,
$$\left( -\theta^\top p,\ \theta^\top D^\top \right) = \left( 0,\ \epsilon \mathbf{1}_K^\top \right),$$
where $\epsilon > 0$. Therefore, the concept of arbitrage herein is broader.
Example 3 K = 3 states and N = 3 securities with the payouts
$$D = \begin{pmatrix} 6 & 11 & 3 \\ 5 & 11 & 3 \\ 12 & 9 & 3 \end{pmatrix}$$
and
$$p^\top = (8\ \ 10\ \ 3).$$
Consider the following strategy
$$\theta^\top = \left( 1\ \ \tfrac{7}{2}\ \ -\tfrac{87}{6} \right).$$
The initial cost of this strategy is
$$1 \times 8 + \frac{7}{2} \times 10 - \frac{87}{6} \times 3 = -\frac{1}{2}$$
and its terminal payout is 1 in state $\omega_1$, and 0 in states $\omega_2$ and $\omega_3$. This trading
strategy yields one half unit of the consumption good at time 0, and one unit of the
consumption good at time T in state $\omega_1$. Since it does not require any net spending,
it is an arbitrage strategy. □
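The arithmetic of Example 3 is easy to reproduce. The sketch below evaluates the net trade $(-\theta^\top p,\ \theta^\top D^\top)$ and applies the $\geq^{+}$ test of Definition 3; the function names `net_trade` and `is_arbitrage` are illustrative only.

```python
from fractions import Fraction as F

# Data of Example 3 (rows of D are states, columns are securities)
D = [[6, 11, 3], [5, 11, 3], [12, 9, 3]]
p = [8, 10, 3]
theta = [F(1), F(7, 2), F(-87, 6)]

def net_trade(theta, p, D):
    """Return (-theta'p, D theta) as one list: time-0 proceeds, then state payouts."""
    cost = sum(t * q for t, q in zip(theta, p))
    payoff = [sum(t * d for t, d in zip(theta, row)) for row in D]
    return [-cost] + payoff

def is_arbitrage(theta, p, D):
    """Nonnegative in every component and strictly positive in total."""
    nc = net_trade(theta, p, D)
    return all(x >= 0 for x in nc) and sum(nc) > 0

print([str(x) for x in net_trade(theta, p, D)])  # time-0 proceeds, then payouts in states 1..3
print(is_arbitrage(theta, p, D))
print(is_arbitrage([1, 0, 0], p, D))             # buying security 1 alone costs 8 up front
```

Note that this only verifies a given candidate strategy. Deciding whether any arbitrage strategy exists at given prices is the subject of the valuation theorem in Section 2.3.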
2.2.4 Equilibrium
Equilibrium is denoted by $(p, \theta^i)$, $\forall i$, such that for each i the consumption process
generated by the given endowment process $e^i$ and the trading strategy $\theta^i$ optimizes
the preference ordering of trader i over the budget constraint $B(e^i, p)$ and the
market clears,
$$\sum_{i=1}^{I} \theta^i = 0.$$
2.2.5 Arbitrage Transactions and Equilibrium
Even though many traders are searching for arbitrage transaction opportunities,
those efforts, paradoxically, eliminate arbitrage transaction opportunities in the
market. This effort to police the market is a pivotal characteristic of equilibrium. So,
unfortunately, arbitrage strategies do not exist in equilibrium.
Theorem 1 Arbitrage trading strategies do not exist in equilibrium.
Proof The proof is by contradiction. Suppose arbitrage trading strategies exist in
equilibrium. Let $\theta^i_e$ be a trading strategy that generates the equilibrium
consumption of trader i, and let $\theta_{ab}$ be an arbitrage trading strategy at the given equilibrium
prices. Then,
$$\left( -(\theta^i_e + \theta_{ab})^\top p,\ (\theta^i_e + \theta_{ab})^\top D^\top \right) \geq^{+} \left( -(\theta^i_e)^\top p,\ (\theta^i_e)^\top D^\top \right).$$
The increasing property of the preference ordering, then, leads to
$$\left( -(\theta^i_e + \theta_{ab})^\top p,\ (\theta^i_e + \theta_{ab})^\top D^\top \right) \succ \left( -(\theta^i_e)^\top p,\ (\theta^i_e)^\top D^\top \right).$$
This contradicts the fact that at equilibrium prices there is no consumption that a
trader strictly prefers to his/her equilibrium consumption. ◇
2.3 Equivalent Martingale Measure and Price Functional
2.3.1 Equivalent Martingale Measure
Here we add one additional assumption: there is a bond whose terminal payout as
well as initial price is positive and does not depend on states. We define
$r_f = \frac{1}{p_f} - 1$, where $p_f$ is the bond price, and interpret $r_f$ as the interest rate on the bond.
Note that $r_f > -1$ ($r_f \geq 0$ in a real economy). We stack the terminal payoff of the
bond in the first column of D for convenience.
Definition 4 If for a given price system p the system of linear equations
$$p = \frac{D^\top Q}{1 + r_f} \qquad (2.6)$$
has a positive solution Q > 0, then this Q is called an equivalent martingale
measure, equilibrium price measure or risk-neutral probability measure for the price
system p.
An equilibrium price measure is a probability measure on the sample space $\Omega$.
Because the first security is a bond,
$$Q^\top \mathbf{1} = 1.$$
That, together with the requirement Q > 0, establishes Q as a probability measure
on $\Omega$ (see footnote 5).
Footnote 5: We will study a rigorous definition of probability measures in Chapters 4 & 5.
Because Q is a probability measure, equation (2.6) can be rewritten as
$$p_j = \frac{E^Q(d_j)}{1 + r_f}, \qquad (2.7)$$
which means that the current price of an asset is equal to the expectation of its future
payoff discounted by the interest rate. The reason Q is called an equivalent martingale
measure is that under the Q measure, the price of a security normalized by a bond
is a martingale. This concept will be clearer in the following chapters.
Theorem 2 (The 1st Valuation Theorem) An equivalent martingale measure Q
exists iff the given price system does not allow arbitrage transaction opportunities.
Proof
Only if part: Suppose there exists an equivalent martingale measure Q which satisfies equation
(2.6). Then premultiplying both sides of equation (2.6) by $\theta^\top$ yields
$$\theta^\top p = \frac{\theta^\top D^\top Q}{1 + r_f}.$$
The LHS is the initial cost of the portfolio while the RHS is the terminal value of the
portfolio postmultiplied by $Q/(1 + r_f)$, which is strictly positive from Q > 0 and
$r_f > -1$. Then,
$$(D\theta \geq 0) \Rightarrow (\theta^\top p \geq 0),$$
$$(D\theta \geq^{+} 0) \Rightarrow (\theta^\top p > 0).$$
Therefore, if any equivalent martingale measure exists, there is no arbitrage transaction
opportunity.
If part: Suppose there is no arbitrage opportunity. Remember that we define $nc = c - e$.
Note that from Proposition 2, the budget set B(e, p) of c can be rewritten
as B(0, p) of the net trade nc. Define
$$H = \left\{ nc \in \mathbb{R} \times X \ \middle|\ \left( nc(0),\ nc(T)^\top \right) \geq^{+} 0 \right\}.$$
The statement that the price system does not permit arbitrage transaction
opportunities is equivalent to
$$B(0, p) \cap H = \emptyset,$$
that is, the budget set and the set of arbitrage transactions are disjoint. Note
that B(0, p) and H are convex, closed subsets of $\mathbb{R} \times X$, and H is nonempty.
From the separating hyperplane theorem, there exists a nontrivial continuous linear
functional,
$$f : \mathbb{R} \times X \to \mathbb{R}$$
such that
$$f(nc) \begin{cases} = 0 & \text{for } nc \in B(0, p) \\ > 0 & \text{for } nc \in H. \end{cases}$$
Since $(1, 0^\top) \in H$, $f(1, 0) > 0$. Without loss of generality, we normalize f such
that $f(1, 0) = 1$, and write
$$f(nc) = nc(0) + g^\top nc(T).$$
Because f is continuous, so is g. Next, any arbitrary asset j, $(-p_j, d_j)$, is in B(0, p);
therefore,
$$f(-p_j, d_j) = -p_j + g^\top d_j = 0, \quad \forall j \qquad (2.8)$$
or alternatively,
$$p^\top = g^\top D.$$
We rescale g by defining $h = (1 + r_f)\, g$, so that $g = \frac{h}{1 + r_f}$. Since equation (2.8) holds for the bond also, we
can prove that
$$h^\top \mathbf{1} = 1.$$
Next we claim that h > 0. For any $x \in X^{+}$, we have $f(0, x) > 0$ because
$(0, x) \in H$; hence,
$$f(0, x) = 0 + \frac{h^\top x}{1 + r_f} > 0,$$
which results in h > 0. Therefore, if the price system does not allow for arbitrage
opportunities, there exists an equivalent martingale measure, h. ◇
Theorem 3 An equivalent martingale measure Q exists iff the given price system
p is an equilibrium price system for some population A.
Proof
If part: Theorem 1 states that an equilibrium price system for such a population
of traders does not permit arbitrage opportunities, and the existence of Q follows
from Theorem 2.
Only if part: Suppose that Q exists for the given p. We assume that agents have preferences
$\succeq^{*} \in A$ and define the relation $\succeq^{*}$ on $\mathbb{R} \times X$ by
$$nc_j \succeq^{*} nc_k \quad \text{if} \quad nc_j(0) + \frac{nc_j(T)^\top Q}{1 + r_f} \geq nc_k(0) + \frac{nc_k(T)^\top Q}{1 + r_f}.$$
The above preference ordering verbally means that traders prefer net trade j to k if
the present value of j is greater than that of k. The definition of the budget set states
(from equations (2.1) and (2.2))
$$nc(0) = -\theta^\top p, \qquad nc(T) = D\theta,$$
which results in
$$nc(0) + \frac{nc(T)^\top Q}{1 + r_f} = -\theta^\top p + \frac{\theta^\top D^\top Q}{1 + r_f} = 0.$$
The second equality holds by the definition of Q, i.e., equation (2.6). Note that
the net endowment process $(0, 0) \in B(0, p)$, so traders are indifferent between nc
and (0, 0). Then, agents with preferences $\succeq^{*}$ weakly prefer (0, 0) to every affordable net trade
$nc \in \mathbb{R} \times M$. There is no trade in this equilibrium. Thus, p is an equilibrium price
system for an economy populated by agents from the class A. ◇
Example 4 K = 3 states and N = 3 securities with the terminal payouts and
the prices
$$D = \begin{pmatrix} 1 & 3 & 9 \\ 1 & 1 & 5 \\ 1 & 5 & 13 \end{pmatrix}, \qquad p = \begin{pmatrix} 1 \\ 2 \\ 7 \end{pmatrix}.$$
It is easily found that rank(D) = 2, so the market is incomplete, and there
exists an infinite number of positive solutions for Q:
$$Q(\omega_1) = 1/2 - 2Q(\omega_3), \quad Q(\omega_2) = 1/2 + Q(\omega_3), \quad 0 < Q(\omega_3) < 1/4.$$
Because equilibrium price measures exist, the given price system does not permit
arbitrage strategies. Since the market is incomplete, though, not all consumption
processes are attainable. For example, consider two securities i and j whose
terminal payouts are
$$d_i^\top = (14\ \ 12\ \ 16)$$
$$d_j^\top = (2\ \ 1\ \ 5).$$
The terminal payout of security i is attainable (i.e., $(0, d_i) \in \mathbb{R} \times M$) because there are trading
strategies that duplicate it, for example,
$$\theta^\top = (2\ \ -5\ \ 3).$$
Hence, the price of security i is $2 \times 1 - 5 \times 2 + 3 \times 7 = 13$. However, security j cannot be duplicated by the
existing securities, so it is not attainable. But we can still find the price range of
security j which does not permit arbitrage strategies:
$$p_j = d_j^\top Q = 3/2 + 2Q(\omega_3).$$
From $0 < Q(\omega_3) < 1/4$, the price range of security j which does not permit any
arbitrage transaction opportunity is
$$3/2 < p_j < 2. \quad \Box$$
Proposition 3 (The 2nd Valuation Theorem) If an equivalent martingale measure
exists, then it is unique iff the market is complete.
Proof When the system of equations $p = \frac{D^\top Q}{1 + r_f}$ has a solution, this solution
is unique iff rank(D) = K. ◇
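Example 5 K = 3 states and N = 4 securities. The terminal payouts of the
securities are
$$D = \begin{pmatrix} 100 & 40 & 60 & 120 \\ 100 & 0 & 40 & 80 \\ 100 & 20 & 100 & 200 \end{pmatrix}$$
and the price vector is
$$p^\top = (50\ \ 7\ \ 31\ \ 62).$$
Here $r_f = 1$ (from $1 + r_f = \frac{100}{50}$) and the system of equations for Q, $D^\top Q = (1 + r_f)\, p$, is
$$\begin{pmatrix} 100 & 100 & 100 \\ 40 & 0 & 20 \\ 60 & 40 & 100 \\ 120 & 80 & 200 \end{pmatrix} \begin{pmatrix} Q(\omega_1) \\ Q(\omega_2) \\ Q(\omega_3) \end{pmatrix} = \begin{pmatrix} 100 \\ 14 \\ 62 \\ 124 \end{pmatrix}.$$
This system has the solution
$$Q^\top = \left( \tfrac{1}{5}\ \ \tfrac{1}{2}\ \ \tfrac{3}{10} \right)$$
and therefore, the given price system does not permit arbitrage strategies. □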
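The measure in Example 5 can be recovered by solving $D^\top Q = (1 + r_f)\, p$ directly. The sketch below, with an illustrative `solve` helper based on Gauss-Jordan elimination over exact fractions, uses the first three securities to pin down Q and then checks that the fourth security is priced consistently.

```python
from fractions import Fraction

def solve(A, b):
    """Solve a square, nonsingular system A x = b by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        scale = M[col][col]
        M[col] = [x / scale for x in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[-1] for row in M]

# Data of Example 5 (rows of D are states, columns are securities)
D = [[100, 40, 60, 120], [100, 0, 40, 80], [100, 20, 100, 200]]
p = [50, 7, 31, 62]

one_plus_rf = Fraction(D[0][0], p[0])        # bond pays 100 and costs 50
DT = [list(col) for col in zip(*D)]          # transpose: one row per security
Q = solve(DT[:3], [one_plus_rf * pj for pj in p[:3]])

print([str(q) for q in Q])                   # risk-neutral probabilities of the three states
print(sum(q * d for q, d in zip(Q, DT[3])) == one_plus_rf * p[3])  # fourth security check
```

Because the first three securities already span all three states, Q is unique; the fourth security is redundant and its price must agree with Q for the price system to be free of arbitrage.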
2.3.2 The Price Functional
For any price system p that does not permit arbitrage strategies, the price functional
assigns to each attainable consumption process c the initial cost of c, that is, the
initial endowment e(0) such that $c \in B(e, p)$, where $e = \{e(0), 0\}$.
Definition 5 Let p be a price system that does not permit arbitrage strategies. The
price functional $\Pi : \mathbb{R} \times M \to \mathbb{R}$ (or $\pi : M \to \mathbb{R}$) is such that for every attainable c
$$\Pi(c) = c(0) + \pi(c(T))$$
where
$$\pi(c(T)) = \theta^\top p$$
for any trading strategy $\theta$ such that
$$c(T, \omega) = \theta^\top d(\omega).$$
Notice that M is a linear subspace of the consumption space X and that $\pi$ is a linear
functional on M. Since the present value of current consumption is its value, the
difference between $\Pi$ and $\pi$ is obvious.
The relationship between the price functional $\Pi$ and the equilibrium price
measure Q is represented in the following theorem.
Theorem 4 If the price system p does not permit arbitrage transactions, then for
every $c \in \mathbb{R} \times M$ and every equilibrium price measure Q
$$\Pi(c) = c(0) + \frac{E^Q[c(T)]}{1 + r_f} \quad \text{or} \quad \pi(c(T)) = \frac{E^Q[c(T)]}{1 + r_f}.$$
In particular, for any security $1 \leq i \leq N$ which is an elementary security of the
basis assets
$$p_i = \pi(d_i) = \frac{E^Q[d_i]}{1 + r_f}.$$
Proof Let Q be an equilibrium price measure and c an attainable consumption
process. Then
$$D^\top Q = (1 + r_f)\, p$$
and there exists a trading strategy $\theta$ such that $D\theta$ is the terminal consumption part of c,
so that
$$\Pi(c) = c(0) + \theta^\top p = c(0) + \frac{\theta^\top D^\top Q}{1 + r_f} = c(0) + \frac{E^Q[c(T)]}{1 + r_f}.$$
The above theorem makes it clearer that Q is interpreted as a risk-adjusted
probability. ◇
Remark 1 Theorem 4 implies that the budget set B(e, p) can be expressed as:
$$nc(T) \in M \quad \text{and} \quad \Pi(nc) = 0.$$
The above condition says that the net trade should be attainable and its present
value should be zero (absent arbitrage opportunities).
Example 6 Consider a market with K = 4 and N = 3, and a payout matrix
$$D = \begin{pmatrix} 1 & 3 & 9 \\ 1 & 1 & 5 \\ 1 & 5 & 13 \\ 1 & 7 & 17 \end{pmatrix}$$
along with a price vector,
$$p^\top = \left( 1\ \ 4\tfrac{5}{6}\ \ 12\tfrac{2}{3} \right).$$
It is easy to find rank(D) = 2, so the market is incomplete. There are two
restrictions for a vector nc(T) to be attainable,
$$0 = 2nc(T, \omega_1) - nc(T, \omega_2) - nc(T, \omega_3)$$
$$0 = 3nc(T, \omega_1) - 2nc(T, \omega_2) - nc(T, \omega_4).$$
Further, it is easy to verify that the set of equilibrium price measures Q is given by
$$Q^\top = \left( \left( \tfrac{23}{12} - 2Q_3 - 3Q_4 \right)\ \ \left( -\tfrac{11}{12} + Q_3 + 2Q_4 \right)\ \ Q_3\ \ Q_4 \right)$$
where
$$0 < Q_3 < 1$$
$$0 < Q_4 < 1$$
$$0 < \left( \tfrac{23}{12} - 2Q_3 - 3Q_4 \right) < 1$$
$$0 < \left( -\tfrac{11}{12} + Q_3 + 2Q_4 \right) < 1.$$
We obtain the following representation of the price functional (note $1 + r_f = 1$):
$$\Pi(nc) = nc(0) + \pi(nc(T))$$
$$= nc(0) + E^Q[nc(T)]$$
$$= nc(0) + nc(T)^\top Q$$
$$= nc(0) + \left( \tfrac{23}{12} - 2Q_3 - 3Q_4 \right) nc(T, \omega_1) + \left( -\tfrac{11}{12} + Q_3 + 2Q_4 \right) nc(T, \omega_2) + Q_3\, nc(T, \omega_3) + Q_4\, nc(T, \omega_4).$$
Since the given consumption nc is assumed to be attainable, we substitute
the two attainability restrictions above into this equation:
$$\Pi(nc) = nc(0) + \tfrac{23}{12}\, nc(T, \omega_1) - \tfrac{11}{12}\, nc(T, \omega_2)$$
$$\text{or} \quad \pi(nc(T)) = \tfrac{23}{12}\, nc(T, \omega_1) - \tfrac{11}{12}\, nc(T, \omega_2).$$
Note again that even though there are an infinite number of equivalent martingale
measures, they all provide the unique price of any attainable consumption scheme,
and $\pi$ is uniquely defined. Hence, we can summarize the budget set B(e, p) as:
$$0 = 2nc(T, \omega_1) - nc(T, \omega_2) - nc(T, \omega_3)$$
$$0 = 3nc(T, \omega_1) - 2nc(T, \omega_2) - nc(T, \omega_4)$$
$$0 = nc(0) + \tfrac{23}{12}\, nc(T, \omega_1) - \tfrac{11}{12}\, nc(T, \omega_2).$$
You can derive this solution by directly solving the system of equations as in Example
1. This will be reserved as an assignment. □
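The uniqueness of the price on attainable claims in Example 6 can be checked with two different measures from the family. In the sketch below, the portfolio `theta` and the two parameter choices for $(Q_3, Q_4)$ are hypothetical values picked for illustration; they are not taken from the notes.

```python
from fractions import Fraction as F

# Data of Example 6 (rows of D are states, columns are securities)
D = [[1, 3, 9], [1, 1, 5], [1, 5, 13], [1, 7, 17]]
p = [F(1), F(29, 6), F(38, 3)]          # 1, 4 5/6, 12 2/3

def measure(q3, q4):
    """Member of the family of equilibrium price measures, indexed by (Q3, Q4)."""
    return [F(23, 12) - 2 * q3 - 3 * q4, F(-11, 12) + q3 + 2 * q4, q3, q4]

def payoff(theta):
    """Terminal payout D theta of a portfolio."""
    return [sum(t * d for t, d in zip(theta, row)) for row in D]

def expected_value(x, Q):
    """E^Q[x]; no discounting is needed because 1 + r_f = 1 here."""
    return sum(xi * qi for xi, qi in zip(x, Q))

theta = [2, 1, 0]                       # hypothetical portfolio
x = payoff(theta)                       # an attainable payout by construction
cost = sum(t * pj for t, pj in zip(theta, p))

Q_a = measure(F(1, 12), F(1, 2))        # hypothetical admissible choice
Q_b = measure(F(1, 6), F(5, 12))        # another hypothetical admissible choice
closed_form = F(23, 12) * x[0] - F(11, 12) * x[1]

print(x, cost)
print(expected_value(x, Q_a), expected_value(x, Q_b), closed_form)
```

All three numbers on the last line coincide with the initial cost of the portfolio, even though `Q_a` and `Q_b` differ state by state.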
Remark 2 Note that even in an incomplete market, the pricing functional $\pi$ is
uniquely defined. This uniqueness stems from the fact that $\pi$ is defined only on
M, not X. That is, $\pi$ prices uniquely any attainable claim $x \in M$; otherwise,
there would be arbitrage opportunities.
Corollary 1 A price functional $\pi$ exists iff the given price system p is an
equilibrium price system for some population A.
Proof It is the direct result of Theorem 3 and Theorem 4. ◇
2.3.3 Viability of Price System
This subsection is provided to help your understanding of Harrison and Kreps (1979).
They use the particular term "viability" to describe a price system which does
not allow any free lunch.
Definition 6 A price system $(M, \pi)$ is viable if $\exists$ some $\succeq \in A$ and
$nc^{*} = \{nc^{*}(0), nc^{*}(T, \omega)\} \in \mathbb{R} \times M$ such that
$$nc^{*}(0) + \pi(nc^{*}(T)) \leq 0 \quad \text{and} \quad nc^{*} \succeq nc \quad \forall\, nc \in \mathbb{R} \times M$$
such that $nc(0) + \pi(nc(T)) \leq 0$.
Remark 3 This definition says that there is some agent from the class A who,
when choosing a best net trade subject to his or her budget constraint $\Pi(nc) \leq 0$, is
able to find an optimal trade. Put differently, a price system is viable iff the given
price system is an equilibrium price system for some population of traders whose
preferences are characterized as A.
Corollary 2 A price system is viable
iff the market does not permit any arbitrage strategy
iff $\exists$ equivalent martingale measures
iff $\exists$ a price functional $\pi$.
Proof It is a direct result of Theorem 2, Theorem 3 and Theorem 4.
2.3.4 Extension of Price Functional
So far we have studied the pricing of the set of contingent claims which are
attainable. That is, we can price any claim $x \in M$, but we cannot price other
claims $x \in (X \setminus M)$. So, the equivalent characterization of viability has a partial
equilibrium-general equilibrium flavor to it. Imagine an economy where markets
exist for all claims $x \in X$, one portion of that economy being the market where
claims $x \in M$ can be traded at prices $\pi(x)$. Then these prices must be part of
a general equilibrium system of prices for all of X. Formally, let $M \subset X$ and
consider a linear functional
$$\psi : X \to \mathbb{R}$$
such that for every $x \in M$ we have $\psi(x) = \pi(x)$ (which is denoted $\psi|M = \pi$
in Harrison and Kreps (1979; p386)). Then the linear functional $\psi$ is called an
extension of the linear functional $\pi$.
Note that in an incomplete market, the number of extensions $\psi$ of the pricing
functional is infinite, which is clearly distinct from $\pi$, which is uniquely defined.
In a complete market, M = X, so $\psi = \pi$. Hence, you may guess that there might
be some relationship between $\psi$ and Q. This relationship is summarized in the
following theorem.
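Theorem 5 Suppose the market admits no simple free lunches. Then there is a
one-to-one correspondence between equilibrium price measures Q and positive extensions
$\psi : X \to \mathbb{R}$ of the pricing functional $\pi$. This correspondence is given by
1. $\psi(c(T)) = \frac{E^Q[c(T)]}{1 + r_f}$
2. $Q(\omega) = (1 + r_f)\, \psi(\mathbf{1}_{\omega})$
where
$$\mathbf{1}_{\omega}(\omega') = \begin{cases} 1 & \text{if state } \omega' = \omega \\ 0 & \text{otherwise.} \end{cases}$$
Proof
$Q \Rightarrow \psi$ part: Theorem 4 implies that $\psi$ is an extension of $\pi$ from domain M to
X. $\psi$ is positive because Q > 0 and $c(T) \geq^{+} 0$.
$\psi \Rightarrow Q$ part: We claim that $Q(\omega)$ defined by (2) is positive because $\psi$ is
positive. Second, writing $\mathbf{1}$ for the bond payout of one unit in every state,
$$Q^\top \mathbf{1} = (1 + r_f)\, \psi(\mathbf{1}) = (1 + r_f)\, p_f = \frac{1 + r_f}{1 + r_f} = 1,$$
so Q is a probability measure. ◇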
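For your understanding, let's examine the following example.
Example 7 This is a continuation of Example 6. The extension $\psi$ of $\pi$ can be
expressed as
$$\psi(x) = \left( \tfrac{23}{12} - 2Q_3 - 3Q_4 \right) x_1 + \left( -\tfrac{11}{12} + Q_3 + 2Q_4 \right) x_2 + Q_3 x_3 + Q_4 x_4 \qquad (2.9)$$
subject to $\psi|M = \pi$. Let's represent $\psi$ more explicitly. First we choose a Q which
satisfies equation (2.9), for example,
$$Q^\top = (1/4\ \ 1/6\ \ 1/12\ \ 1/2).$$
But we have to impose the condition $\psi|M = \pi$ on the above equation. This can be
done by adding Lagrangian terms:
$$\psi(x) = \tfrac{1}{4} x_1 + \tfrac{1}{6} x_2 + \tfrac{1}{12} x_3 + \tfrac{1}{2} x_4 + \lambda_1 [2x_1 - x_2 - x_3] + \lambda_2 [3x_1 - 2x_2 - x_4]$$
$$= \left( \tfrac{1}{4} + 2\lambda_1 + 3\lambda_2 \right) x_1 + \left( \tfrac{1}{6} - \lambda_1 - 2\lambda_2 \right) x_2 + \left( \tfrac{1}{12} - \lambda_1 \right) x_3 + \left( \tfrac{1}{2} - \lambda_2 \right) x_4$$
where $\lambda_1$ and $\lambda_2$ are any numbers such that
$$\tfrac{1}{4} + 2\lambda_1 + 3\lambda_2 > 0,$$
$$\tfrac{1}{6} - \lambda_1 - 2\lambda_2 > 0,$$
$$\tfrac{1}{12} - \lambda_1 > 0,$$
$$\tfrac{1}{2} - \lambda_2 > 0.$$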
The rst inequality restriction is redundant. Of course, a dierent choice of Q
results in dierent s, but regardless of the choice, the set made by s represents
the same extensions of price functional. 2
2.4 Appendix: Key Mathematics
The common dimension of the row space and column space of a matrix A is
called the rank of A and is denoted by rank(A); the dimension of the nullspace
of A is called the nullity of A and is denoted by nullity(A).
If $V$ and $W$ are vector spaces and $F$ is a function that associates a unique vector in $W$ with each vector in $V$, we say $F$ maps $V$ into $W$, and write $F : V \to W$. Further, if $F$ associates the vector $w$ with the vector $v$, we write $w = F(v)$ and say that $w$ is the image of $v$ under $F$. The vector space $V$ is called the domain of $F$, denoted by do($F$), and the vector space $W$ is called the image space of $F$, denoted by im($F$).
Example 8 The function defined by the formula
\[ F(x, y) = (x - y,\; x + y,\; 5x) \tag{2.10} \]
maps $\mathbb{R}^2$ into $\mathbb{R}^3$. For this function, the image of a vector $v = (x, y)$ in $\mathbb{R}^2$ is the vector $w = (x - y, x + y, 5x)$ in $\mathbb{R}^3$. For example, if $v = (1, 3)$, then $w = F(v) = F(1, 3) = (-2, 4, 5)$. Hence do($F$) $= \mathbb{R}^2$ and im($F$) $= \mathbb{R}^3$.
If $F : V \to W$ is a function from the vector space $V$ into the vector space $W$, then $F$ is called a linear transformation or linear operator if
(a) $F(u + v) = F(u) + F(v)$ for all $u, v \in V$.
(b) $F(ku) = kF(u)$ for all $u \in V$ and all scalars $k$.
If $F : V \to W$ is a linear operator, then the set of vectors in $V$ that $F$ maps into $0$ is called the kernel or null space of $F$ and is denoted by ker($F$). The set of all vectors in $W$ that are images under $F$ of at least one vector in $V$ is called the range of $F$; it is denoted by $R(F)$.
Example 9 Let $F : V \to W$ be the zero transformation. Since $F$ maps every vector in $V$ into $0$, it follows that ker($F$) $= V$ and $R(F) = \{0\}$.
Separating Hyperplane Theorem: Also called the Hahn-Banach Theorem. Let $A$ and $B \subset \mathbb{R}^n$ be convex disjoint sets. If $x$ is in the interior of $A$ and $y$ is in the interior of $B$, then there exists a continuous linear functional $\phi$, not identically zero on $\mathbb{R}^n$, and a constant $c$ such that
\[ \phi(y) < c < \phi(x). \]
Chapter 3
A Discrete Multiperiod Model
In this chapter, we study a discrete multiperiod model. This multiperiod model also assumes a finite number of states and securities, but trading can take place on a finite number of dates.
There are no dramatic changes in the results of the discrete multiperiod model. There are two modifications, though. First, the number of securities needed to complete a market can be reduced: a smaller number of securities than states can make all contingent claims attainable. Second, and more importantly, the intermediate trading opportunities introduce the intermediate arrival of new information, or information resolution.
3.1 Assumptions
Initial date 0, terminal date $T$, trading at times $0, 1, \ldots, T-1$, consumption at times $0, 1, \ldots, T$.
Probability space $(\Omega, \mathcal{F}, P)$. States belong to a finite sample space $\Omega$ with $K$ elements
\[ \Omega = \{\omega_1, \ldots, \omega_K\} \]
where state $\omega$ occurs with probability $P(\omega) > 0$.
only one perishable consumption good.
a finite number $N$ of securities, where each individual security $n$ has a random terminal payout $d_n(T, \omega)$ on the sample space $\Omega$. We arrange a dividend matrix, $D(T)$, in such a way that each row represents a state and each column represents a security:
\[ D(T) = \begin{pmatrix} d_1(\omega_1) & \cdots & d_N(\omega_1) \\ \vdots & & \vdots \\ d_1(\omega_K) & \cdots & d_N(\omega_K) \end{pmatrix}. \]
For simplicity, we assume there are no interim dividend payouts.
[Figure 3.1: Consumption & Endowment Process. Timeline $t = 0, 1, 2, \ldots, T$ showing the endowment process $e_i = (e_i(0), e_i(1), e_i(2), \ldots, e_i(T, \omega))$ and the consumption process $c_i = (c_i(0), c_i(1), c_i(2), \ldots, c_i(T, \omega))$ of trader $i$.]
a finite number $I$ of traders
an exogenously specified endowment process $e_i$ and an endogenous consumption process $c_i$ for each trader $i$. The representation of these processes over time is illustrated in Figure 3.1.
Preference systems
On the consumption set, traders have complete preference orderings, $\succeq$, that are continuous, increasing and convex.
3.2 Partitions and a Sequence of Partitions
In a discrete one-period model, all the information is represented by the field $\mathcal{F}$. However, in a multiperiod model wherein new information arrives over time, information should be represented by a sequence of fields, or a filtration. In a discrete time model with finite states, partitions represent the same information that is represented by the fields. Because we study fields and $\sigma$-fields in more detail in the next chapter, here we will define only partitions and sequences of partitions.
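Definition 7 A set $f = \{f_1, \ldots, f_v\}$ of subsets of a sample space $\Omega$ is called a partition of $\Omega$ iff:
1. $f_i \cap f_j = \emptyset$ for all $i \neq j$.
2. $\bigcup_{i=1}^{v} f_i = \Omega$.
Definition 8 A partition $g = \{g_1, \ldots, g_u\}$ is finer than the partition $f = \{f_1, \ldots, f_v\}$ iff every set of $g$ is a subset of some set of $f$, that is, for every $1 \le j \le u$ there exists $1 \le i \le v$ such that
\[ g_j \subseteq f_i. \]
A partition $f$ is coarser than a partition $g$ iff $g$ is finer than $f$.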
Definition 9 An information structure $f_t$ ($t = 0, \ldots, T$) is a sequence of partitions $f_0, \ldots, f_T$ such that
1. At the initial time, $f_0 = \{\Omega\}$.
2. At the terminal time, $f_T = \{\{\omega_1\}, \{\omega_2\}, \ldots, \{\omega_K\}\}$.
3. For each $0 \le t \le T-1$, the partition $f_{t+1}$ is finer than the partition $f_t$.
The first condition states that you do not have any information except the possible outcomes, while the second condition means that you will have complete information about the actual state. The third condition is the crucial part: it states that as time passes, your information gets better, or the uncertainty is resolved over time.
Example 1 (Two tosses of a fair coin) Let $\Omega = \{HH, HT, TH, TT\}$. The information revelation can be represented by a binomial tree, or equivalently a sequence of partitions (see Figure 3.2). Note that $f_2$ is finer than $f_1$, which is finer than $f_0$, so the partition gets finer as time evolves. $\square$
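Definitions 7 to 9 translate directly into set operations on a finite sample space. The sketch below checks them for the two-coin-toss example; the function names are illustrative choices, not notation from the notes.

```python
def is_partition(blocks, omega):
    """Definition 7: blocks are pairwise disjoint and their union is omega."""
    blocks = [frozenset(b) for b in blocks]
    union = frozenset().union(*blocks)
    # Equal total size and equal union together imply pairwise disjointness.
    return union == frozenset(omega) and sum(len(b) for b in blocks) == len(omega)

def is_finer(g, f):
    """Definition 8: every block of g is contained in some block of f."""
    return all(any(set(gj) <= set(fi) for fi in f) for gj in g)

def is_information_structure(fs, omega):
    """Definition 9: starts at {omega}, ends at singletons, refines over time."""
    starts_trivial = [set(b) for b in fs[0]] == [set(omega)]
    ends_fully_revealed = all(len(b) == 1 for b in fs[-1])
    refines = all(is_finer(fs[t + 1], fs[t]) for t in range(len(fs) - 1))
    return (all(is_partition(f, omega) for f in fs)
            and starts_trivial and ends_fully_revealed and refines)

omega = {"HH", "HT", "TH", "TT"}
f0 = [{"HH", "HT", "TH", "TT"}]
f1 = [{"HH", "HT"}, {"TH", "TT"}]
f2 = [{"HH"}, {"HT"}, {"TH"}, {"TT"}]
```

Here `is_information_structure([f0, f1, f2], omega)` holds, while reversing the order of the partitions would violate condition 3.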
Definition 10 A random variable $x$ is measurable on a partition $f$ iff it is constant on each set of $f$. Further, for a random variable $x$, the coarsest partition on which $x$ is measurable is called the partition generated by $x$ and is denoted by $f_x$.
Example 2 In Example 1, at $t = 1$, $f_1 = \{\{HH, HT\}, \{TH, TT\}\}$. Let
\[ x(HH) = 1, \quad x(HT) = 1, \quad x(TH) = 2, \quad x(TT) = 2 \]
\[ y(HH) = 1, \quad y(HT) = 2, \quad y(TH) = 2, \quad y(TT) = 2 \]
The random variable $x$ is measurable on $f_1$ whereas $y$ is not. Also,
\[ f_x = \{\{HH, HT\}, \{TH, TT\}\} = f_1, \qquad f_y = \{\{HH\}, \{HT, TH, TT\}\} \neq f_1. \quad \square \]
[Figure 3.2: A Sequence of Partitions. Binomial tree for two coin tosses (nodes H, T at $t = 1$ and HH, HT, TH, TT at $t = 2$), with $f_0 = \{\{HH, HT, TH, TT\}\}$, $f_1 = \{\{HH, HT\}, \{TH, TT\}\}$, $f_2 = \{\{HH\}, \{HT\}, \{TH\}, \{TT\}\}$.]
Corollary 3 A random variable $x$ is measurable on a partition $f$ iff $f$ is finer than $f_x$.
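To make Definition 10 and Corollary 3 concrete, the following sketch recomputes Example 2. Random variables are stored as dictionaries from states to values; that representation and the helper names are assumptions made for illustration.

```python
def is_measurable(x, f):
    """Definition 10: x is constant on every block of the partition f."""
    return all(len({x[w] for w in block}) == 1 for block in f)

def generated_partition(x):
    """Coarsest partition on which x is measurable: group states by value."""
    blocks = {}
    for w, value in x.items():
        blocks.setdefault(value, set()).add(w)
    return {frozenset(b) for b in blocks.values()}

def is_finer(g, f):
    """Every block of g is contained in some block of f."""
    return all(any(set(gj) <= set(fi) for fi in f) for gj in g)

f1 = [{"HH", "HT"}, {"TH", "TT"}]
x = {"HH": 1, "HT": 1, "TH": 2, "TT": 2}
y = {"HH": 1, "HT": 2, "TH": 2, "TT": 2}

f_x = generated_partition(x)
f_y = generated_partition(y)
```

For both variables, `is_measurable(v, f1)` agrees with `is_finer(f1, generated_partition(v))`, which is exactly the statement of Corollary 3.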
Definition 11 A sequence of random variables (called a stochastic process) $x = \{x(0), \ldots, x(T)\}$ is adapted to the information structure $f_t$ iff each $x(t)$ is measurable on $f_t$.
A sequence of random variables $y = \{y(0), \ldots, y(T)\}$ is predictable on the information structure $f_t$ iff each $y(t)$ is measurable on $f_{t-1}$ (equivalently, each $y(t+1)$ is measurable on $f_t$).
3.3 Information Structure
In addition to the assumptions made in section 3.1, we need additional assumptions regarding the information resolution.
Homogeneous information: All traders share the identical information, $f_t$.
$e_i(t)$ and $c_i(t)$ are measurable on $f_t$: i.e., $f_t$ is finer than both $f_{e_i(t)}$ and $f_{c_i(t)}$.
The consumption set is a product, $\mathbb{R} \times X$, where $X = \prod_{t=1}^{T} \mathbb{R}^{f_t}$. The dimension of $\mathbb{R}^{f_t}$ is the number of sets in $f_t$. Because the information structure becomes finer over time, $\dim\left(\mathbb{R}^{f_t}\right)$ is a nondecreasing (though not necessarily strictly increasing) function of time $t$.
3.4 Budget Sets
Given the endowment process $e_i$ of each trader $i$ and a price vector of securities, $p^\top = (p_1 \cdots p_N)$, the budget set $B(e_i, p)$ of trader $i$ is the subset of the consumption set $\mathbb{R} \times X$ consisting of the consumption processes $c_i$ for which there is a trading strategy $\theta(t)$ ($1 \le t \le T$) such that, for all $0 \le t \le T$,
\[ c_i(t) + \theta(t+1)^\top p(t) = e_i(t) + \theta(t)^\top p(t). \tag{3.1} \]
The RHS of equation (3.1) is total income at time $t$. There are two sources of income: the endowment and the portfolio that he or she holds. The LHS is total spending: income is either consumed or invested in a portfolio. Note that $\theta(t)$ is a predictable process (i.e., $\theta(t)$ is measurable on $f_{t-1}$) because the portfolio held into time $t$ is chosen at time $t-1$. Further, $\theta(0) = \theta(T+1) = 0$, and $p(T) = D(T)$.
3.5 Equilibrium
An equilibrium is a collection $\{p(t), \theta_i(t-1)\}$ for all $i$ and $t$ such that, for each $i$, the consumption process generated by the given endowment process $e_i$ and the trading strategy $\theta_i$ optimizes the preference ordering of trader $i$ over the budget set $B(e_i, p)$, and the market clears:
\[ \sum_i \theta_i(t) = 0 \quad \forall t. \]
3.6 Completeness
In the one period model in Chapter 2, the market is complete iff the number of independent securities equals the number of states. This is true for any price system. (Remember that in Chapter 2, the attainability of a consumption scheme is not affected by the prices of securities.) In reality, where the number of states is close to infinity, this requires a tremendous number of securities outstanding.
However, in the multiperiod model, the introduction of interim trading significantly reduces the number of securities required to complete the market. This is called dynamic completion, and this property depends on the given price system. The intuition behind this is simple. A multiperiod model is a collection of one period models. In particular, the decision made at time $T-1$ is exactly identical to that of the one-period model. At time $T-2$, your payout matrix is not a dividend matrix, but a matrix of prices instead. Hence, to complete the market at a certain state at time $T-2$, the rank of the price matrix should equal the number of possible states of the next period. Hence, the market is complete iff, at any time and at any subset of the partition, each one period model in the sequence is complete. Let us begin with the definition of completeness.
Definition 12 A multiperiod price system $p(t)$ is complete iff every adapted consumption process is attainable, that is, $M(p) = X$.
The above definition of completeness is similar to the definition made in the single period model, except that prices and consumption are defined over multiple periods. Before we make formal definitions, look at the following examples to capture some basic ideas about dynamic completeness.
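Example 3 The sample space is $\Omega = \{\omega_1, \omega_2, \ldots, \omega_6\}$ and $T = 3$. The sequence of partitions is as follows:
\[ \begin{aligned} f_0 &= \{\{\omega_1, \omega_2, \ldots, \omega_6\}\} \\ f_1 &= \{\{\omega_1, \omega_2\}, \{\omega_3, \omega_4, \omega_5, \omega_6\}\} \\ f_2 &= \{\{\omega_1, \omega_2\}, \{\omega_3, \omega_4\}, \{\omega_5, \omega_6\}\} \\ f_3 &= \{\{\omega_1\}, \{\omega_2\}, \{\omega_3\}, \{\omega_4\}, \{\omega_5\}, \{\omega_6\}\} \end{aligned} \]
There are two securities, and each security has an adapted price process:
\[ p(f_0) = \begin{pmatrix} .729 \\ 2.4 \end{pmatrix} \]
\[ p(f_{11}) = \begin{pmatrix} .81 \\ 1.26 \end{pmatrix} \quad p(f_{12}) = \begin{pmatrix} .81 \\ 3.75 \end{pmatrix} \]
\[ p(f_{21}) = \begin{pmatrix} .9 \\ 1.4 \end{pmatrix} \quad p(f_{22}) = \begin{pmatrix} .9 \\ 4.2 \end{pmatrix} \quad p(f_{23}) = \begin{pmatrix} .909 \\ 4.2 \end{pmatrix} \]
\[ p(f_{31}) = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \quad p(f_{32}) = \begin{pmatrix} 1 \\ 2 \end{pmatrix} \quad p(f_{33}) = \begin{pmatrix} 1 \\ 3 \end{pmatrix} \quad p(f_{34}) = \begin{pmatrix} 1 \\ 6 \end{pmatrix} \quad p(f_{35}) = \begin{pmatrix} 1 \\ 4 \end{pmatrix} \quad p(f_{36}) = \begin{pmatrix} 1 \\ 5 \end{pmatrix} \]
All this information can be represented in an information tree, Figure 3.3. $\square$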
[Figure 3.3: Price System in Example 3. Information tree carrying, at each node, the price vector listed in Example 3.]
Here we define the branching number, $\nu(f_{tj})$, of an information structure.
Definition 13 The branching number $\nu(f_{tj})$ of a sequence of partitions $f_t$ is defined for all $0 \le t \le T-1$ and is the number of sets of the partition $f_{t+1}$ which are subsets of $f_{tj}$. The branching index, $\nu_m$, is the maximum value of the branching numbers over all subsets of the partitions, i.e.,
\[ \nu_m = \max \{ \nu(f_{tj}) \mid 0 \le t \le T-1 \text{ and } 1 \le j \le \text{\# of subsets of } f_t \}. \]
The definition may sound complicated, but $\nu(f_{tj})$ simply denotes the number of branches or nodes emanating from the subset $f_{tj}$ of the partition. It is easier to find the branching numbers by drawing the information tree as in Figure 3.3. In Example 3, the branching numbers are
\[ \nu(f_0) = 2 \]
\[ \nu(f_{11}) = 1 \quad \nu(f_{12}) = 2 \]
\[ \nu(f_{21}) = 2 \quad \nu(f_{22}) = 2 \quad \nu(f_{23}) = 2 \]
Therefore, the branching index is $\nu_m = 2$.
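The branching numbers can also be read off mechanically from the sequence of partitions. A minimal sketch for Example 3 follows, with states $\omega_1, \ldots, \omega_6$ encoded as the integers 1 to 6 (an encoding chosen here for convenience).

```python
# Partitions of Example 3; state omega_k is encoded as the integer k.
f = {
    0: [{1, 2, 3, 4, 5, 6}],
    1: [{1, 2}, {3, 4, 5, 6}],
    2: [{1, 2}, {3, 4}, {5, 6}],
    3: [{1}, {2}, {3}, {4}, {5}, {6}],
}

def branching_numbers(f):
    """Map (t, j) to the number of blocks of f[t+1] contained in the j-th block of f[t]."""
    T = max(f)
    return {
        (t, j): sum(1 for child in f[t + 1] if child <= node)
        for t in range(T)
        for j, node in enumerate(f[t], start=1)
    }

nu = branching_numbers(f)
branching_index = max(nu.values())
```

The key `(t, j)` plays the role of the node $f_{tj}$, with `(0, 1)` standing for $f_0$.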
This branching number is critically important since it plays the role of the number of states, $K$, in the one period model. Remember that in the one period model, the market is complete iff the number of independent securities, rank($D$), is equal to the number of states $K$. In the multiperiod model, at each node or subset of the partition, $f_{tj}$, the market is isomorphic to the one period model. For notational simplicity, we denote
\[ p(t+1, f_{ti}) = \begin{pmatrix} p_1(t+1, f_{t+1,j+1}) & \cdots & p_N(t+1, f_{t+1,j+1}) \\ p_1(t+1, f_{t+1,j+2}) & \cdots & p_N(t+1, f_{t+1,j+2}) \\ \vdots & & \vdots \\ p_1(t+1, f_{t+1,j+\nu(f_{ti})}) & \cdots & p_N(t+1, f_{t+1,j+\nu(f_{ti})}) \end{pmatrix} \]
where
\[ f_{t+1,j+1}, \ldots, f_{t+1,j+\nu(f_{ti})} \subseteq f_{ti}. \]
               One Period     Multi Period
# of states    $K$            $\nu(f_{tj})$
# of assets    rank($D$)      rank($p(t+1, f_{tj})$)
Then we can establish the following theorem.
Theorem 6 A market is dynamically complete iff for each $0 \le t \le T-1$,
\[ \text{rank}(p(t+1, f_{tj})) = \nu(f_{tj}) \]
for every subset $f_{tj}$ of $f_t$.
The above theorem directly yields a corollary about a necessary condition for dynamic completeness.
Corollary 4 A necessary condition for dynamic completeness is that the number of securities is at least as large as the branching index of the information structure, i.e.,
\[ N \ge \nu_m. \]
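Then, let's check if the price system in Example 3 is complete. Since the number of securities is two and the branching index is 2, the price system satisfies the necessary condition. Further, we check the rank of the price matrix at each node of the partitions:
\[ \nu(f_0) = 2, \quad \text{rank}(p(1, f_0)) = \text{rank}\begin{pmatrix} .81 & 1.26 \\ .81 & 3.75 \end{pmatrix} = 2 \]
\[ \nu(f_{11}) = 1, \quad \text{rank}(p(2, f_{11})) = \text{rank}\begin{pmatrix} .90 & 1.4 \end{pmatrix} = 1 \]
\[ \nu(f_{12}) = 2, \quad \text{rank}(p(2, f_{12})) = \text{rank}\begin{pmatrix} .90 & 4.2 \\ .909 & 4.2 \end{pmatrix} = 2 \]
\[ \nu(f_{21}) = 2, \quad \text{rank}(p(3, f_{21})) = \text{rank}\begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} = 2 \]
\[ \nu(f_{22}) = 2, \quad \text{rank}(p(3, f_{22})) = \text{rank}\begin{pmatrix} 1 & 3 \\ 1 & 6 \end{pmatrix} = 2 \]
\[ \nu(f_{23}) = 2, \quad \text{rank}(p(3, f_{23})) = \text{rank}\begin{pmatrix} 1 & 4 \\ 1 & 5 \end{pmatrix} = 2 \]
Because at each node the branching number is equal to the rank of the price matrix, the price system in Example 3 is complete.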
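The node-by-node rank test of Theorem 6 is easy to automate. The sketch below uses exact rational arithmetic so that near-singular price matrices are not misjudged by floating-point rounding; the `rank` helper is a plain Gaussian elimination written for this illustration, and prices are entered as strings so that they convert exactly.

```python
from fractions import Fraction as F

def rank(rows):
    """Rank of a matrix by Gaussian elimination in exact rational arithmetic."""
    m = [[F(v) for v in row] for row in rows]
    r = 0  # number of pivots found so far
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# node -> (branching number, next-period price matrix), from Example 3
nodes = {
    "f_0":  (2, [["0.81", "1.26"], ["0.81", "3.75"]]),
    "f_11": (1, [["0.90", "1.4"]]),
    "f_12": (2, [["0.90", "4.2"], ["0.909", "4.2"]]),
    "f_21": (2, [["1", "1"], ["1", "2"]]),
    "f_22": (2, [["1", "3"], ["1", "6"]]),
    "f_23": (2, [["1", "4"], ["1", "5"]]),
}

dynamically_complete = all(rank(mat) == nu for nu, mat in nodes.values())
```

Replacing the second row at $f_{12}$ by the prices $(.912, 4.256)$ used in Example 5 drops the rank at that node to 1, so the test fails there.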
Example 4 Continuation of Example 3. A consumption scheme is represented in Figure 3.4. How much is needed to attain this consumption scheme? If we think of this consumption process as the payoff process of a security, this amount is the price of the security. What trading strategy is needed to finance this consumption process? To find it, we solve the trading decision at each node, working backward from $T-1$.
[Figure 3.4: Consumption Scheme in Example 3. Information tree with consumption 0 at $f_0$; 2 at $f_{11}$ and 1 at $f_{12}$; 1 at $f_{21}$ and 0 at $f_{22}$ and $f_{23}$; and 1, 2, 0, 0, 0, 0 at $\omega_1, \ldots, \omega_6$.]
t = 2:
1. $f_{21}$: Buy $\theta_1(f_{21})$ of security 1 and $\theta_2(f_{21})$ of security 2. Then, the value of the portfolio should be
\[ \begin{aligned} 1 &= \theta_1(f_{21}) + \theta_2(f_{21}) && \text{at } \omega_1 \\ 2 &= \theta_1(f_{21}) + 2\theta_2(f_{21}) && \text{at } \omega_2 \end{aligned} \]
The solution is $\theta_1(f_{21}) = 0$ and $\theta_2(f_{21}) = 1$. Hence at this node, in addition to \$1 of consumption, we need \$1.4 ($= 1 \times 1.4$) to buy the portfolio, so all told, the amount needed is \$2.4.
2. $f_{22}$: Do not buy any security.
3. $f_{23}$: Do not buy any security.
t = 1:
1. $f_{11}$: The branching number is 1. In the next period ($f_{21}$) you need \$2.4. Hence your portfolio decision is a solution to
\[ .9\,\theta_1(f_{11}) + 1.4\,\theta_2(f_{11}) = 2.4, \]
which has an infinite number of solutions. Two examples are
\[ \theta_1(f_{11}) = 8/3, \quad \theta_2(f_{11}) = 0 \]
\[ \theta_1(f_{11}) = 0, \quad \theta_2(f_{11}) = 12/7 \]
To implement either trade, we need \$2.16 ($= \frac{8}{3} \times .81$ or $\frac{12}{7} \times 1.26$). Of course, if the two costs differed, there would be an arbitrage opportunity. Thus, the total amount for consumption and trading is \$4.16.
2. $f_{12}$: Do not buy any security. The total amount needed for consumption is \$1.
t = 0: A trading strategy has to deliver a payoff of \$4.16 at $f_{11}$ and \$1 at $f_{12}$:
\[ \begin{aligned} 4.16 &= .81\,\theta_1(f_0) + 1.26\,\theta_2(f_0) \\ 1 &= .81\,\theta_1(f_0) + 3.75\,\theta_2(f_0) \end{aligned} \]
The solution is approximately $(7.1099, -1.2691)$. Hence, the amount needed today (or the price of this security) is
\[ 7.1099 \times .729 + (-1.2691 \times 2.4) \approx 2.1373. \quad \square \]
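The backward recursion of Example 4 can be sketched in a few lines. The code below is an illustration under stated assumptions, not a general solver: it handles only nodes with one or two branches and two securities, solves the two-branch case by Cramer's rule, and at the one-branch node $f_{11}$ arbitrarily holds security 1 only (one of the infinitely many solutions). Node labels `w1` to `w6` stand for $\omega_1, \ldots, \omega_6$.

```python
from fractions import Fraction as F

price = {  # node -> (p_1, p_2), from Example 3
    "f0": ("0.729", "2.4"),
    "f11": ("0.81", "1.26"), "f12": ("0.81", "3.75"),
    "f21": ("0.9", "1.4"), "f22": ("0.9", "4.2"), "f23": ("0.909", "4.2"),
    "w1": ("1", "1"), "w2": ("1", "2"), "w3": ("1", "3"),
    "w4": ("1", "6"), "w5": ("1", "4"), "w6": ("1", "5"),
}
price = {k: (F(a), F(b)) for k, (a, b) in price.items()}
children = {"f0": ["f11", "f12"], "f11": ["f21"], "f12": ["f22", "f23"],
            "f21": ["w1", "w2"], "f22": ["w3", "w4"], "f23": ["w5", "w6"]}
consumption = {"f0": 0, "f11": 2, "f12": 1, "f21": 1, "f22": 0, "f23": 0,
               "w1": 1, "w2": 2, "w3": 0, "w4": 0, "w5": 0, "w6": 0}

def required_wealth(node):
    """Return (consumption at node + cost of financing later consumption, portfolio)."""
    kids = children.get(node, [])
    if not kids:
        return F(consumption[node]), None
    targets = [required_wealth(k)[0] for k in kids]
    if len(kids) == 1:
        # One branch: any portfolio with the right value works; hold security 1 only.
        theta = (targets[0] / price[kids[0]][0], F(0))
    else:
        # Two branches, two securities: solve the 2x2 system by Cramer's rule.
        (a, b), (c, d) = price[kids[0]], price[kids[1]]
        det = a * d - b * c
        theta = ((targets[0] * d - b * targets[1]) / det,
                 (a * targets[1] - c * targets[0]) / det)
    cost = theta[0] * price[node][0] + theta[1] * price[node][1]
    return F(consumption[node]) + cost, theta

initial_cost, theta0 = required_wealth("f0")
```

In exact arithmetic the initial cost is 887/415, about 2.1373, and the wealth required at $f_{11}$ is 4.16, matching the hand computation.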
Example 5 In Example 3, let's change the price system at node $f_{23}$ such that
\[ p_1(f_{23}) = .912, \quad p_2(f_{23}) = 4.256. \]
Then, at the node $f_{12}$,
\[ \nu(f_{12}) = 2, \quad \text{rank}(p(2, f_{12})) = \text{rank}\begin{pmatrix} .90 & 4.2 \\ .912 & 4.256 \end{pmatrix} = 1. \]
So, the market is not complete. If you buy $a$ shares of security 1 and $b$ shares of security 2,
\[ \begin{aligned} nc(f_{22}) &= .9a + 4.2b \\ nc(f_{23}) &= .912a + 4.256b = \tfrac{76}{75}\, nc(f_{22}) \end{aligned} \tag{3.2} \]
Equation (3.2) is a restriction for attainability of consumption. $\square$
3.7 Attainable Consumption Process
In this section, we will define attainable consumption processes. From now on we assume that one of the securities is a money market account with interest rate $r_f(t)$ such that
\[ d_f = \prod_{s=1}^{T} (1 + r_f(s)), \qquad p_f(t) = \prod_{s=1}^{t} (1 + r_f(s)). \]
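From the definition of the consumption set, we establish some identities.
Proposition 4 We have the following identities.
\[ \underbrace{\sum_{s=0}^{t-1} (c(s) - e(s))}_{\text{cum. net cons. up to } t-1} + \underbrace{\theta(t)^\top p(t)}_{\text{curr. pf.}} = \underbrace{\sum_{s=1}^{t} \theta(s)^\top (p(s) - p(s-1))}_{\text{cg}} \tag{3.3} \]
\[ \underbrace{\sum_{s=0}^{t} (c(s) - e(s))}_{\text{cum. net cons. up to } t} + \underbrace{\theta(t+1)^\top p(t)}_{\text{new pf.}} = \underbrace{\sum_{s=1}^{t} \theta(s)^\top (p(s) - p(s-1))}_{\text{cg}} \tag{3.4} \]
\[ \underbrace{\sum_{s=0}^{t-1} \left[\frac{c(s) - e(s)}{p_f(s)}\right]}_{\text{cum. net dis. cons. up to } t-1} + \underbrace{\frac{\theta(t)^\top p(t)}{p_f(t)}}_{\text{dis. curr. pf.}} = \underbrace{\sum_{s=1}^{t} \theta(s)^\top \left[\frac{p(s)}{p_f(s)} - \frac{p(s-1)}{p_f(s-1)}\right]}_{\text{dis. cg}} \tag{3.5} \]
\[ \underbrace{\sum_{s=0}^{t} \left[\frac{c(s) - e(s)}{p_f(s)}\right]}_{\text{cum. net dis. cons. up to } t} + \underbrace{\frac{\theta(t+1)^\top p(t)}{p_f(t)}}_{\text{dis. new pf.}} = \underbrace{\sum_{s=1}^{t} \theta(s)^\top \left[\frac{p(s)}{p_f(s)} - \frac{p(s-1)}{p_f(s-1)}\right]}_{\text{dis. cg}} \tag{3.6} \]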
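Proof From equation (3.1), and using $\theta(0) = 0$ in the second line,
\[ \begin{aligned} \sum_{s=0}^{t-1} (c(s) - e(s)) &= \sum_{s=0}^{t-1} \theta(s)^\top p(s) - \sum_{s=0}^{t-1} \theta(s+1)^\top p(s) \\ &= \sum_{s=1}^{t-1} \theta(s)^\top p(s) - \sum_{s=1}^{t} \theta(s)^\top p(s-1) \\ &= \sum_{s=1}^{t} \theta(s)^\top p(s) - \theta(t)^\top p(t) - \sum_{s=1}^{t} \theta(s)^\top p(s-1) \\ &= \sum_{s=1}^{t} \theta(s)^\top (p(s) - p(s-1)) - \theta(t)^\top p(t). \end{aligned} \]
This is the proof of equation (3.3). Equation (3.4) can be derived by adding $c(t) - e(t) = (\theta(t) - \theta(t+1))^\top p(t)$ to equation (3.3). Equations (3.5) and (3.6) can be verified similarly, starting from dividing equation (3.1) by $p_f(t)$. $\square$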
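Identities (3.3) and (3.4) are pure accounting, so they hold along any single price path for any holdings. As a sanity check, the sketch below draws arbitrary integer prices and holdings (made-up data, seeded for reproducibility), backs out net consumption from the budget equation (3.1), and compares both sides.

```python
import random

random.seed(0)
T, N = 4, 3
# Hypothetical integer prices p(0..T) along one path and holdings theta(1..T);
# theta(0) = theta(T+1) = 0 as in section 3.4.
p = [[random.randint(1, 9) for _ in range(N)] for _ in range(T + 1)]
theta = ([[0] * N]
         + [[random.randint(-5, 5) for _ in range(N)] for _ in range(T)]
         + [[0] * N])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Budget equation (3.1): c(s) - e(s) = (theta(s) - theta(s+1))' p(s).
net_c = [dot(theta[s], p[s]) - dot(theta[s + 1], p[s]) for s in range(T + 1)]

def gains(t):
    """Cumulative capital gains: sum over s = 1..t of theta(s)' (p(s) - p(s-1))."""
    return sum(dot(theta[s], [a - b for a, b in zip(p[s], p[s - 1])])
               for s in range(1, t + 1))

def lhs_33(t):
    """Left side of (3.3): net consumption up to t-1 plus the current portfolio."""
    return sum(net_c[:t]) + dot(theta[t], p[t])

def lhs_34(t):
    """Left side of (3.4): net consumption up to t plus the new portfolio."""
    return sum(net_c[:t + 1]) + dot(theta[t + 1], p[t])
```

Both `lhs_33(t)` and `lhs_34(t)` equal `gains(t)` for every `t` from 1 to `T`, whatever seed is used.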
Similar to the definition of the attainable set in the one period model, we set $e(f_{tj}) = 0$ for all $t$ and all $1 \le j \le$ # of subsets of $f_t$. Then the definition of attainable consumption processes in the multiperiod model is as follows:
Corollary 5 A consumption process is attainable iff it satisfies the following conditions ($2 \le t \le T$):
\[ \sum_{s=1}^{t-1} c(s) + \theta(t)^\top p(t) = \theta(1)^\top p(0) + \sum_{s=1}^{t} \theta(s)^\top (p(s) - p(s-1)) \]
\[ \sum_{s=1}^{t-1} \frac{c(s)}{p_f(s)} + \frac{\theta(t)^\top p(t)}{p_f(t)} = \theta(1)^\top p(0) + \sum_{s=1}^{t} \theta(s)^\top \left[\frac{p(s)}{p_f(s)} - \frac{p(s-1)}{p_f(s-1)}\right] \]
Proof It is a direct result of Proposition 4 and the fact that
\[ c(0) = e(0) - \theta(1)^\top p(0). \quad \square \]
In particular, the total cumulative consumption process or discounted consumption process can be represented as
\[ \sum_{s=1}^{T} c(s) = \theta(1)^\top p(0) + \sum_{s=1}^{T} \theta(s)^\top (p(s) - p(s-1)) \]
\[ \sum_{s=1}^{T} \frac{c(s)}{p_f(s)} = \theta(1)^\top p(0) + \sum_{s=1}^{T} \theta(s)^\top \left[\frac{p(s)}{p_f(s)} - \frac{p(s-1)}{p_f(s-1)}\right] \]
which means that the total (discounted) cumulative consumption equals the initial value of the portfolio plus the total (discounted) cumulative capital gains.
3.8 Self-Financing Strategy
A self-financing strategy is one under which a trader consumes exactly his or her endowments and does not consume capital gains at intermediate times. A simple buy-and-hold strategy and a rebalancing strategy are examples.
Definition 14 A self-financing strategy $\theta(t)$ is a trading strategy which satisfies, for $1 \le t \le T-1$,
\[ (\theta(t) - \theta(t+1))^\top p(t) = 0. \]
The above definition means
\[ c(t) = e(t) \quad \forall\, 1 \le t \le T-1. \]
A self-financing strategy can be used to price contingent claims. Any contingent claim which can be priced should be replicated by some self-financing strategy. If the payout of such a security is replicated by a self-financing strategy,
\[ d_r = D(T)\,\theta(T), \]
then the initial price of that security must equal the initial cost of the self-financing strategy,
\[ p_r = \theta(1)^\top p(0). \]
Using the definition of a self-financing strategy and Proposition 4, we can establish the following corollary.
Corollary 6 A self-financing strategy can be characterized in the following equivalent ways:
\[ \theta(t)^\top p(t) = \theta(1)^\top p(0) + \sum_{s=1}^{t} \theta(s)^\top (p(s) - p(s-1)) \]
\[ \frac{\theta(t)^\top p(t)}{p_f(t)} = \theta(1)^\top p(0) + \sum_{s=1}^{t} \theta(s)^\top \left[\frac{p(s)}{p_f(s)} - \frac{p(s-1)}{p_f(s-1)}\right] \]
for $1 \le t \le T$.
Proof Applying the fact that $c(s) - e(s) = 0$ for all intermediate $s$ to Proposition 4 yields the results. $\square$
3.9 Equivalent Martingale Measures
Again note that the multiperiod model can be broken into a collection of one-period models, and equivalent martingale measures are defined for each one-period model in this collection. In that sense, equivalent martingale measures in the multiperiod model are an extension of the same concept in the one-period model:
Equivalent martingale measures are derived from the given price system and constitute artificial probability measures.
An equivalent martingale measure exists iff the given price system does not allow any free lunch, or equivalently, there is a supporting equilibrium for the given price system. Then, the price system is called viable.
An equivalent martingale measure is unique iff the market is dynamically complete.
Again, we assume that there is a money market account whose return at time $t$ is $r_f(t)$, so
\[ r_f(t) = \frac{p_f(t)}{p_f(t-1)} - 1 \]
at every node $f_{ti}$ and $f_{t+1,j}$. The vector of prices at the node $f_{ti}$ is
\[ p(f_{ti})^\top = (p_1(f_{ti}) \;\; p_2(f_{ti}) \;\; \cdots \;\; p_N(f_{ti})). \]
And instead of the dividend matrix $D$ in the one-period model, except in the case of $t + 1 = T$, we have the matrix of prices,
\[ p(t+1, f_{ti}) = \begin{pmatrix} p_1(t+1, f_{t+1,j+1}) & \cdots & p_N(t+1, f_{t+1,j+1}) \\ p_1(t+1, f_{t+1,j+2}) & \cdots & p_N(t+1, f_{t+1,j+2}) \\ \vdots & & \vdots \\ p_1(t+1, f_{t+1,j+\nu(f_{ti})}) & \cdots & p_N(t+1, f_{t+1,j+\nu(f_{ti})}) \end{pmatrix} \]
where
\[ f_{t+1,j+1}, \ldots, f_{t+1,j+\nu(f_{ti})} \subseteq f_{ti}. \]
Definition 15 If for a given price system $p(t)$ the system of linear equations
\[ p(f_{ti}) = \frac{p(t+1, f_{ti})^\top Q(t+1, f_{ti})}{1 + r_f(t+1)} \tag{3.7} \]
has a positive solution $Q(t+1, f_{ti}) > 0$, then this $Q(t+1, f_{ti})$ is called an equivalent martingale measure for the price system $p(f_{ti})$. Then,
\[ Q(\omega) = \prod_{t=1}^{T} Q(t, f_{ti}(\omega)) \]
where $f_{ti}(\omega)$ is the set of the partition at time $t$ of which $\omega$ is an element, and each factor is the one-period probability of reaching that set from the set containing $\omega$ at time $t-1$. This $(Q(\omega_1) \cdots Q(\omega_K))$ is called an equivalent martingale measure for the price system $p(t)$. Notice that $Q(t+1, f_{ti}, f_{t+1,j})$ is a conditional probability,
\[ Q(t+1, f_{ti}, f_{t+1,j}) = Q(t+1, f_{t+1,j} \mid f_{ti}) = \begin{cases} \dfrac{Q(t+1, f_{t+1,j})}{Q(t, f_{ti})} & \text{if } f_{t+1,j} \subseteq f_{ti} \\ 0 & \text{otherwise} \end{cases} \tag{3.8} \]
where $Q(t, f_{ti}) = \sum_{\omega \in f_{ti}} Q(\omega)$.
Example 6 A price process and the dividends of three securities are represented in Figure 3.5. Let's compute equivalent martingale measures for this price system.
when t = 1: $r_f(2) = 0$
1. at $f_{11}$:
\[ p(f_{11}) = \begin{pmatrix} 1 \\ 5 \\ 3 \end{pmatrix} = p(2, f_{11})^\top Q(2, f_{11}) = \begin{pmatrix} 1 & 3 & 2 \\ 1 & 4 & 3 \\ 1 & 8 & 4 \end{pmatrix}^{\!\top} \begin{pmatrix} Q(2, f_{11}, \omega_1) \\ Q(2, f_{11}, \omega_2) \\ Q(2, f_{11}, \omega_3) \end{pmatrix} \]
The solution to the above equation is
\[ Q(2, f_{11})^\top = (1/3 \;\; 1/3 \;\; 1/3). \]
2. at $f_{12}$:
\[ p(f_{12}) = \begin{pmatrix} 1 \\ 7 \\ 4.5 \end{pmatrix} = p(2, f_{12})^\top Q(2, f_{12}) = \begin{pmatrix} 1 & 6 & 4 \\ 1 & 8 & 5 \end{pmatrix}^{\!\top} \begin{pmatrix} Q(2, f_{12}, \omega_4) \\ Q(2, f_{12}, \omega_5) \end{pmatrix} \]
\[ Q(2, f_{12})^\top = (1/2 \;\; 1/2). \]
when t = 0: $r_f(1) = 0$
\[ p(f_0) = \begin{pmatrix} 1 \\ 6 \\ 3.75 \end{pmatrix} = p(1, f_0)^\top Q(1, f_0) = \begin{pmatrix} 1 & 5 & 3 \\ 1 & 7 & 4.5 \end{pmatrix}^{\!\top} \begin{pmatrix} Q(1, f_0, f_{11}) \\ Q(1, f_0, f_{12}) \end{pmatrix} \]
\[ Q(1, f_0)^\top = (1/2 \;\; 1/2). \]
Therefore, the equivalent martingale measure for the given price system is
\[ Q^\top = (1/6 \;\; 1/6 \;\; 1/6 \;\; 1/4 \;\; 1/4). \]
Since the equivalent martingale measure is unique, the market is complete and does not allow for any arbitrage opportunities. Further, writing $f_{2k} = \{\omega_k\}$ for $k = 1, \ldots, 5$,
\[ \begin{aligned} Q(1, f_0, f_{11}) &= Q(1, f_{11})/Q(0, f_0) = (1/6 + 1/6 + 1/6)/1 = 1/2 \\ Q(1, f_0, f_{12}) &= Q(1, f_{12})/Q(0, f_0) = (1/4 + 1/4)/1 = 1/2 \\ Q(2, f_{11}, f_{21}) &= Q(2, f_{21})/Q(1, f_{11}) = (1/6)/(1/2) = 1/3 \\ Q(2, f_{11}, f_{22}) &= Q(2, f_{22})/Q(1, f_{11}) = (1/6)/(1/2) = 1/3 \\ Q(2, f_{11}, f_{23}) &= Q(2, f_{23})/Q(1, f_{11}) = (1/6)/(1/2) = 1/3 \\ Q(2, f_{11}, f_{24}) &= Q(2, f_{11}, f_{25}) = 0 \\ Q(2, f_{12}, f_{21}) &= Q(2, f_{12}, f_{22}) = Q(2, f_{12}, f_{23}) = 0 \\ Q(2, f_{12}, f_{24}) &= Q(2, f_{24})/Q(1, f_{12}) = (1/4)/(1/2) = 1/2 \\ Q(2, f_{12}, f_{25}) &= Q(2, f_{25})/Q(1, f_{12}) = (1/4)/(1/2) = 1/2 \quad \square \end{aligned} \]
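The node-by-node computation in Example 6 amounts to solving the linear system (3.7) at each node and multiplying the one-period probabilities along each path. A minimal sketch follows; the `solve` helper is a plain Gauss-Jordan elimination written for this illustration, and it assumes the system has a unique solution (as it does here, since the market is complete).

```python
from fractions import Fraction as F

def solve(A, b):
    """Solve A q = b exactly; A may have more rows than columns if consistent."""
    m = [[F(v) for v in row] + [F(rhs)] for row, rhs in zip(A, b)]
    n = len(A[0])
    r = 0
    for col in range(n):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            raise ValueError("solution is not unique")
        m[r], m[pivot] = m[pivot], m[r]
        m[r] = [v / m[r][col] for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                m[i] = [a - m[i][col] * c for a, c in zip(m[i], m[r])]
        r += 1
    if any(row[-1] != 0 for row in m[r:]):
        raise ValueError("system is inconsistent")
    return [m[i][-1] for i in range(n)]

def one_period_Q(p_now, P_next, r=0):
    """Solve (3.7): p_now = P_next' Q / (1 + r); rows of P_next are next-period nodes."""
    A = [list(col) for col in zip(*P_next)]  # transpose: one equation per security
    return solve(A, [F(v) * (1 + r) for v in p_now])

Q_f11 = one_period_Q([1, 5, 3], [[1, 3, 2], [1, 4, 3], [1, 8, 4]])
Q_f12 = one_period_Q([1, 7, "4.5"], [[1, 6, 4], [1, 8, 5]])
Q_f0 = one_period_Q([1, 6, "3.75"], [[1, 5, 3], [1, 7, "4.5"]])

# Path probabilities: product of the one-period probabilities along each path.
Q = [Q_f0[0] * q for q in Q_f11] + [Q_f0[1] * q for q in Q_f12]
```

The result reproduces $Q^\top = (1/6, 1/6, 1/6, 1/4, 1/4)$; every entry is positive and the entries sum to one.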
Now that we know how to derive equivalent martingale measures from the given price system, let's examine why they are called so. To see that, we will establish the following definition and a lemma.
[Figure 3.5: Price System in Example 6. Information tree with price vector $(1, 6, 3.75)$ at $f_0$; $(1, 5, 3)$ at $f_{11}$ and $(1, 7, 4.5)$ at $f_{12}$; and terminal payoffs $(1, 3, 2)$, $(1, 4, 3)$, $(1, 8, 4)$ at $\omega_1, \omega_2, \omega_3$ and $(1, 6, 4)$, $(1, 8, 5)$ at $\omega_4, \omega_5$.]
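Definition 16 For a random variable $x$ on a probability space $(\Omega, P)$ and a partition $f$ of $\Omega$, the conditional expectation $E^P(x \mid f)$ is the random variable
\[ E^P(x \mid f)(\omega) = \frac{\sum_{\omega' \in f(\omega)} x(\omega') P(\omega')}{\sum_{\omega' \in f(\omega)} P(\omega')}, \]
where $f(\omega)$ is the set of $f$ that contains $\omega$. Notice that the conditional expectation is measurable on $f$.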
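Example 7 $\Omega = \{\omega_1, \ldots, \omega_5\}$, $P(\omega_k) = 1/5$ for $k = 1, \ldots, 5$. The partition is $f = \{f_1, f_2\}$ with
\[ f_1 = \{\omega_1, \omega_2, \omega_3, \omega_5\}, \quad f_2 = \{\omega_4\}, \]
\[ x(\omega_1) = 1, \quad x(\omega_2) = 1, \quad x(\omega_3) = 2, \quad x(\omega_4) = 3, \quad x(\omega_5) = 2. \]
Then
\[ E^P(x \mid f) = \begin{cases} E^P(x \mid f_1) = \dfrac{(1 + 1 + 2 + 2)/5}{4/5} = 3/2 \\[2mm] E^P(x \mid f_2) = \dfrac{3/5}{1/5} = 3 \end{cases} \quad \square \]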
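Definition 16 is a block-by-block weighted average, which the following sketch computes for Example 7. States $\omega_1, \ldots, \omega_5$ are encoded as the integers 1 to 5; the finer partition `g` used to illustrate the law of iterated expectations (Lemma 1) is an arbitrary choice made for this illustration.

```python
from fractions import Fraction as F

def cond_exp(x, P, f):
    """E^P(x | f) as a dict state -> value, constant on each block of f."""
    result = {}
    for block in f:
        mass = sum(P[w] for w in block)
        mean = sum(x[w] * P[w] for w in block) / mass
        for w in block:
            result[w] = mean
    return result

states = [1, 2, 3, 4, 5]
P = {w: F(1, 5) for w in states}
x = {1: 1, 2: 1, 3: 2, 4: 3, 5: 2}
f = [{1, 2, 3, 5}, {4}]

e = cond_exp(x, P, f)

# Conditioning first on a finer partition g and then on f changes nothing.
g = [{1, 2}, {3, 5}, {4}]
tower = cond_exp(cond_exp(x, P, g), P, f)
```

Here `e` takes the value 3/2 on $f_1$ and 3 on $f_2$, and `tower` coincides with `e`.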
Lemma 1 (The law of iterated expectations) If a partition $f$ is coarser than a partition $g$, then for any random variable $x$
\[ E^P\left[ E^P(x \mid g) \mid f \right] = E^P(x \mid f). \]
With this lemma, the representation of prices by the equivalent martingale measure can be stated as follows.
Theorem 7 If $Q(\omega)$ is an equilibrium price measure, then for an asset $n$ and $0 \le t \le T$
\[ \frac{p_n(t)}{p_f(t)} = \frac{E^Q[d_n \mid f_t]}{p_f(T)} = \frac{E^Q[d_n \mid f_t]}{d_f}. \]
Proof From equations (3.7) and (3.8) in Definition 15, the price of asset $n$ at an element $f_{ti}$ of the partition is
\[ p_n(f_{ti}) = \frac{p_n(t+1, f_{ti})^\top Q(t+1, f_{ti})}{1 + r_f(t+1)} = \frac{p_n(t+1, f_{ti})^\top Q(t+1 \mid f_{ti})}{1 + r_f(t+1)} = \frac{E^Q[p_n(t+1) \mid f_{ti}]}{1 + r_f(t+1)}. \]
Hence, the price of this asset at time $t$ is a random variable,
\[ p_n(t) = \frac{E^Q[p_n(t+1) \mid f_t]}{1 + r_f(t+1)} \tag{3.9} \]
and
\[ p_n(t+1) = \frac{E^Q[p_n(t+2) \mid f_{t+1}]}{1 + r_f(t+2)}. \tag{3.10} \]
Substituting equation (3.10) into equation (3.9) and using the fact that $f_t$ is coarser than $f_{t+1}$ (from the definition of an information structure) together with the law of iterated expectations yields
\[ p_n(t) = \frac{E^Q[p_n(t+2) \mid f_t]}{(1 + r_f(t+1))(1 + r_f(t+2))} = E^Q[p_n(t+2) \mid f_t]\, \frac{p_f(t)}{p_f(t+2)}. \]
Continuing this procedure yields the result. $\square$
Next, define a martingale process.
Definition 17 A sequence of random variables $x(t)$ for $0 \le t \le T$ is a martingale on the sequence of partitions $f_t$ (i.e., adapted to $f_t$) under a probability measure $P$ iff
\[ E^P[x(s) \mid f_t] = x(t) \quad \forall s \ge t. \]
Finally, we establish the following theorem representing the martingale property of scaled asset prices.
Theorem 8 For each security the sequence of its scaled price is a Q-martingale: for $s \ge t$,
\[ \frac{p_n(t)}{p_f(t)} = E^Q\left[ \frac{p_n(s)}{p_f(s)} \,\Big|\, f_t \right]. \]
Proof From Theorem 7 and the law of iterated expectations,
\[ E^Q\left[ \frac{p_n(s)}{p_f(s)} \,\Big|\, f_t \right] = E^Q\left[ \frac{E^Q(d_n \mid f_s)}{p_f(T)} \,\Big|\, f_t \right] = E^Q\left[ \frac{d_n}{p_f(T)} \,\Big|\, f_t \right] = \frac{p_n(t)}{p_f(t)}. \quad \square \]
Corollary 7 In general, the sum of the discounted cumulative consumption process and the preconsumption discounted value process,
\[ \sum_{s=0}^{t-1} \frac{c(s)}{p_f(s)} + \frac{\theta(t)^\top p(t)}{p_f(t)}, \]
is a Q-martingale. If the trading strategy $\theta(t)$ is self-financing, then the discounted value process
\[ \frac{\theta(t)^\top p(t)}{p_f(t)}, \quad 1 \le t \le T, \]
is a Q-martingale.
3.10 The Price Functional
The definition of the price functional is very similar to that in the one-period model.
Definition 18 Let $p(t)$ be a price system that does not permit arbitrage strategies. The price functional $\Pi : \mathbb{R} \times M(p) \to \mathbb{R}$ (or $\pi : M(p) \to \mathbb{R}$) is such that for every $c \in \mathbb{R} \times M(p)$
\[ \Pi(c) = c(0) + \pi(c(t)) \]
for $1 \le t \le T$, where
\[ \pi(c(t)) = \theta(1)^\top p(0) \]
for any trading strategy $\theta(t)$ such that
\[ c(t) = (\theta(t) - \theta(t+1))^\top p(t). \]
Note that $M(p)$ is a linear subspace of the consumption set $X$ and that $\pi$ is a linear functional on $M(p)$. The relationship between the price functional and the equivalent martingale measure is represented in the following theorem.
Theorem 9 If the price system $p(t)$ does not permit arbitrage transactions, then for every $c \in \mathbb{R} \times M(p)$ and every equilibrium price measure $Q$
\[ \Pi(c) = c(0) + \sum_{t=1}^{T} \frac{E^Q(c(t))}{p_f(t)} \quad \text{or} \quad \pi(c(t)) = \sum_{t=1}^{T} \frac{E^Q(c(t))}{p_f(t)}. \]
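Proof By Theorem 7,
\[ \begin{aligned} \frac{c(t)}{p_f(t)} &= \frac{1}{p_f(t)} \left[ (\theta(t) - \theta(t+1))^\top p(t) \right] \\ &= \frac{1}{p_f(t)} \left[ (\theta(t) - \theta(t+1))^\top \frac{E^Q[d(T) \mid f_t]}{p_f(T)}\, p_f(t) \right] \\ &= (\theta(t) - \theta(t+1))^\top \frac{E^Q[d(T) \mid f_t]}{p_f(T)}. \end{aligned} \]
Hence, since $\theta(t) - \theta(t+1)$ is measurable on $f_t$ and $\theta(T+1) = 0$,
\[ \begin{aligned} \sum_{t=1}^{T} E^Q\left[ \frac{c(t)}{p_f(t)} \right] &= \sum_{t=1}^{T} E^Q\left[ (\theta(t) - \theta(t+1))^\top \frac{E^Q[d(T) \mid f_t]}{p_f(T)} \right] \\ &= \frac{1}{p_f(T)} E^Q\left[ \left( \sum_{t=1}^{T} \theta(t) - \sum_{t=1}^{T} \theta(t+1) \right)^{\!\top} d(T) \right] \\ &= \frac{1}{p_f(T)} E^Q\left[ \left( \sum_{t=1}^{T} \theta(t) - \sum_{t=2}^{T+1} \theta(t) \right)^{\!\top} d(T) \right] \\ &= \frac{1}{p_f(T)} E^Q\left[ \left( \sum_{t=1}^{T} \theta(t) - \sum_{t=2}^{T} \theta(t) \right)^{\!\top} d(T) \right] \\ &= \frac{1}{p_f(T)} E^Q\left[ \theta(1)^\top d(T) \right] \\ &= \theta(1)^\top p(0). \quad \square \end{aligned} \]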
Chapter 4
Probability and Measure I
This chapter is a brief and very informal introduction to probability and measure theory. It is meant primarily to acquaint the student with the terminology used in this area. A more detailed and leisurely development of the ideas we present here may be found in P. Billingsley, Probability and Measure (Wiley, 1986).
4.1 Probability Spaces
4.1.1 Denition of Probability Spaces
A probability space is a triple $(\Omega, \mathcal{F}, P)$ where
$\Omega$ is the set of outcomes (or the sample space). An element of $\Omega$ is generally denoted $\omega$.
$\mathcal{F}$ is a collection of subsets of $\Omega$.
$P$ is a probability measure which assigns to each set $A \in \mathcal{F}$ the probability $P(A)$ of the set $A$.
A rough interpretation of this construct may be obtained by viewing the set $\Omega$ as the set of possible outcomes of some experiment. For example, the experiment may involve two tosses of a coin, so $\Omega = \{HH, HT, TH, TT\}$. The set $\mathcal{F}$ of subsets of $\Omega$ consists of the subsets of $\Omega$ to which we can assign probabilities; these sets are called events. In a sense we will make precise shortly, $\mathcal{F}$ represents information; the more information we have about the outcome of the experiment, the larger the number of subsets of $\Omega$ that we will be able to assign probabilities to, and so the larger the number of sets that will be in $\mathcal{F}$. Since $P$ is required to assign a probability to every set in $\mathcal{F}$, $\mathcal{F}$ is also called the set of measurable sets of $\Omega$ (under $\mathcal{F}$), and a set $A \subseteq \Omega$ is called measurable if $A \in \mathcal{F}$.
We place some regularity assumptions on the sets in $\mathcal{F}$ and on the probability measure $P$. Regarding $\mathcal{F}$, we will require the following:
1. $\emptyset \in \mathcal{F}$, $\Omega \in \mathcal{F}$.
2. If $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$.
3. If $A_1, A_2, \ldots \in \mathcal{F}$, then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$.
These restrictions are mostly self-explanatory and intuitive. Condition 1 requires that the empty set and the whole space be measurable, i.e., that $P$ be able to assign a probability to the set of all possible outcomes, and to the empty set. Condition 2 states that if $P$ can assign a probability to $A$, it should also be able to assign a probability to the event "not $A$", i.e., to the complement of $A$. Finally, Condition 3 requires that if we are able to assign a probability to each of a finite or countable family of events, we should also be able to assign a probability to their union. When $\mathcal{F}$ meets all three of these conditions, it is called a sigma-algebra of subsets of $\Omega$. We will also write the term sigma-algebra as $\sigma$-algebra, as is common. The word "field" is often used in place of "algebra"; the word "tribe" is also used sometimes, albeit less frequently.
Regarding the probability $P$, we will require the following:
1. $P(\emptyset) = 0$; $P(\Omega) = 1$.
2. $0 \le P(A) \le 1$ for all $A \in \mathcal{F}$.
3. If $A_1, A_2, \ldots \in \mathcal{F}$, and $A_j \cap A_k = \emptyset$ for all distinct $j$ and $k$, then $P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$.
Condition 1 simply states that the empty set has zero probability, and the set of all possible outcomes has probability one. These are obviously desirable characteristics in a probability measure. Condition 2 requires that all probabilities be non-negative and at most one, again a condition that one intuitively associates with a probability measure. Condition 3 states that the probability of the union of disjoint events should be the sum of their individual probabilities.
4.1.2 The Sigma-Algebra as Information
The following examples and discussion are aimed at clarifying the concepts we have presented above, and especially at reinforcing the interpretation of $\mathcal{F}$ as information about the experiment's outcome.
Example 1 (Coin Tossing) Let $\Omega = \{H, T\}$, and $\mathcal{F} = \{\emptyset, \Omega, \{H\}, \{T\}\}$. Define $P$ by $P(\emptyset) = 0 = 1 - P(\Omega)$, and $P(\{H\}) = 1/2$. Then, $\Omega$ corresponds to the sample space of a coin toss, while the condition that $P(\{H\}) = P(\{T\}) = 1/2$ implies that the coin is fair. $\square$
Example 2 (Two Tosses of a fair coin) Let $\Omega = \{HH, HT, TH, TT\}$. Let $\mathcal{F}$ consist of all the subsets of $\Omega$. It is not hard to verify that $\mathcal{F}$, so defined, is a sigma-algebra. Now define $P$ as follows: if $A \in \mathcal{F}$ has the form $\{HH\}$, $\{HT\}$, $\{TH\}$, or $\{TT\}$, let $P(A) = 1/4$. For all other $A$, define $P(A)$ from these basic probabilities using the three rules that define a probability. (For instance, the set $\{HT, TH\}$ is the union of the disjoint sets $\{HT\}$ and $\{TH\}$, so set $P(\{HT, TH\}) = P(\{HT\}) + P(\{TH\})$.) $\square$
In each of the examples above, the sigma-algebra $\mathcal{F}$ was simply the set of all subsets of $\Omega$ (i.e., every subset of $\Omega$ was assumed to be measurable).[1] This is equivalent to saying that at the end of the experiment, we know exactly which outcome $\omega \in \Omega$ occurred.
In general, however, the $\sigma$-algebra need not consist of all possible subsets of $\Omega$. For instance, if $\Omega$ is any set, and $A$ is any proper subset of $\Omega$, then
\[ \mathcal{F} = \{\emptyset, \Omega, A, A^c\} \]
is a $\sigma$-algebra on $\Omega$. Such a pair $(\Omega, \mathcal{F})$ would correspond to a case of partial information: for each possible outcome $\omega$ of the experiment, we are able to judge if $\omega$ lies in $A$ or not, but nothing more; in particular, this may not be enough information to judge precisely which $\omega \in A$ occurred.
As an example of a problem with partial information, consider the problem of two tosses of a coin again, but where we are only able to observe the outcome of the first toss. Suppose this outcome is H. Then all we know is that one of the two outcomes HT or HH occurred; we cannot, however, say with any degree of certainty which of them was the true outcome. Thus, our set of measurable subsets of $\Omega$ would be
\[ \mathcal{F} = \{\emptyset, \Omega, \{HH, HT\}, \{TH, TT\}\}. \]
On the other hand, if we are able to observe the outcome of both tosses, then the correct $\sigma$-algebra would be $2^\Omega$ as in Example 2. This is the sense in which $\mathcal{F}$ captures information; the finer $\mathcal{F}$ is (i.e., the more subsets of $\Omega$ it contains), the better our information about the outcome of the experiment.
[1] For any set $\Omega$, we will use the term $2^\Omega$ to denote the set of all subsets of $\Omega$ (including the empty set); thus, in such cases as the examples above, we will simply write $\mathcal{F} = 2^\Omega$. It is easily verified that the power set $2^\Omega$ of any set $\Omega$ meets the three conditions required to be a $\sigma$-algebra.
4.1.3 Sigma-Algebras Generated by an Arbitrary Collection
Given any collection $\mathcal{A}$ of subsets of $\Omega$, it is evidently possible to complete
$\mathcal{A}$, and make it into a $\sigma$-algebra, by adding to the sets in $\mathcal{A}$ the
complements and countable unions of all the sets in $\mathcal{A}$, and continuing this procedure
till no new sets are added. The resulting $\sigma$-algebra is called the $\sigma$-algebra generated
by $\mathcal{A}$, and is usually denoted $\sigma(\mathcal{A})$. More abstractly, $\sigma(\mathcal{A})$
is defined to be the smallest $\sigma$-algebra on $\Omega$ that contains all the sets in
$\mathcal{A}$, i.e., as the intersection over all $\sigma$-algebras that contain $\mathcal{A}$.
Formally,
$$\sigma(\mathcal{A}) = \bigcap \{\mathcal{G} \mid \mathcal{G} \text{ is a } \sigma\text{-algebra on } \Omega, \text{ and } \mathcal{A} \subset \mathcal{G}\}.$$
For example, suppose $\Omega = \{1, 2, 3, 4\}$. Let $\mathcal{A} = \{\{1\}, \{2\}\}$. Then
$\sigma(\mathcal{A})$ is given by
$$\{\emptyset, \Omega, \{1\}, \{2\}, \{3, 4\}, \{1, 2\}, \{2, 3, 4\}, \{1, 3, 4\}\}.$$
An important consequence of the definition of $\sigma(\mathcal{A})$ as the smallest $\sigma$-algebra
containing $\mathcal{A}$ is that if we are able to assign probabilities to all the sets in
$\mathcal{A}$, then using the three rules used to define a probability, we will be able to define
probabilities on all the sets in $\sigma(\mathcal{A})$.
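On a finite sample space, the completion procedure described above can be carried out
mechanically. The following is a minimal Python sketch, not part of the original notes: the
function name generate_sigma_algebra is hypothetical, and on a finite set countable unions
reduce to finite unions, so closing under complements and pairwise unions is enough.

```python
from itertools import combinations


def generate_sigma_algebra(omega, collection):
    """Close `collection` under complements and unions within the finite set `omega`."""
    omega = frozenset(omega)
    sets = {frozenset(), omega} | {frozenset(s) for s in collection}
    while True:
        new_sets = set(sets)
        new_sets |= {omega - s for s in sets}                    # add complements
        new_sets |= {a | b for a, b in combinations(sets, 2)}    # add pairwise unions
        if new_sets == sets:                                     # nothing new: done
            return sets
        sets = new_sets


# The example from the text: Omega = {1, 2, 3, 4}, collection {{1}, {2}}.
sigma = generate_sigma_algebra({1, 2, 3, 4}, [{1}, {2}])
for event in sorted(sigma, key=lambda s: (len(s), sorted(s))):
    print(sorted(event))
```

The loop stops as soon as one pass adds no new set, which mirrors the phrase "continuing this
procedure till no new sets are added".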
Example 3 (The Borel Sigma-Algebra) Let $\Omega = \mathbb{R}$. Take $\mathcal{A}$ to be the set of
all open intervals of the form $(a, b)$ in $\mathbb{R}$. Then, the $\sigma$-algebra generated by
$\mathcal{A}$ is called the Borel $\sigma$-algebra on $\mathbb{R}$ and will be denoted $\mathcal{B}$. $\Box$
4.2 Random Variables
4.2.1 Random Variables Defined
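Let a probability space $(\Omega, \mathcal{F}, P)$ be given. A random variable is simply a function
$X$ from $\Omega$ to $\mathbb{R}$ which satisfies the condition that $\forall B \in \mathcal{B}$, the
set $X^{-1}(B)$ is in $\mathcal{F}$, where
$$X^{-1}(B) = \{\omega \mid X(\omega) \in B\}.$$
The condition that $X^{-1}(B) \in \mathcal{F}$ $\forall B \in \mathcal{B}$ is often expressed as the
requirement that $X$ be measurable with respect to $\mathcal{F}$, or simply, that $X$ be
$\mathcal{F}$-measurable.
The distribution of a random variable $X$ is the probability measure $\nu_X$ on
$(\mathbb{R}, \mathcal{B})$ defined by
$$\nu_X(B) = P(X^{-1}(B)), \quad B \in \mathcal{B}.$$
The distribution function of a random variable $X$, denoted $F_X(\cdot)$, is defined by
$$F_X(x) = P(\{\omega \mid X(\omega) \le x\}) = \nu_X((-\infty, x]).$$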
The intuitive content of the measurability restriction in the definition of a random
variable may be grasped from the definition of $\nu_X$. In typical economic modelling,
we assume only that the outcome of a random variable (or its realization) is observed.
Thus, a priori, we are interested in the distribution $\nu_X$ of the outcomes $X(\omega)$.
However, to be able to assign a probability to the event that the outcome lies in a Borel
set $B$, we need to know the probability of the set $\{\omega \mid X(\omega) \in B\}$, i.e.,
of the set $X^{-1}(B)$. But the only subsets of $\Omega$ to which we can assign probabilities
are those in $\mathcal{F}$. Thus, to calculate the distribution of $X$ requires that $X^{-1}(B)$
be an element of $\mathcal{F}$ for each $B \in \mathcal{B}$.
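Example 4 As in Example 1, let $\Omega = \{H, T\}$, $\mathcal{F} = \{\emptyset, \Omega, \{H\}, \{T\}\}$,
and let $P(\{H\}) = P(\{T\}) = 1/2$. Let $X : \Omega \to \mathbb{R}$ be defined by
$$X(\omega) = \begin{cases} 1 & \text{if } \omega = H \\ -1 & \text{if } \omega = T \end{cases}$$
It is a simple matter to verify that $X$ is measurable with respect to $\mathcal{F}$. For, suppose
that $B \in \mathcal{B}$. Then, we have
$$X^{-1}(B) = \begin{cases} \Omega & \text{if } -1, 1 \in B \\ \{H\} & \text{if } -1 \notin B,\ 1 \in B \\ \{T\} & \text{if } -1 \in B,\ 1 \notin B \\ \emptyset & \text{if } -1, 1 \notin B \end{cases}$$
In each case, $X^{-1}(B) \in \mathcal{F}$, so $X$ is measurable with respect to $\mathcal{F}$. The
random variable $X$ defined in this fashion is called a Bernoulli random variable. $\Box$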
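The Bernoulli random variable takes on the value $-1$ with probability 1/2, and
$+1$ with probability 1/2. Therefore, the distribution of this random variable is the
probability measure $\nu_X$ on $\mathcal{B}$ defined for $B \in \mathcal{B}$ by
$$\nu_X(B) = \begin{cases} 1 & \text{if } -1, 1 \in B \\ 0.50 & \text{if } -1 \notin B,\ 1 \in B \\ 0.50 & \text{if } -1 \in B,\ 1 \notin B \\ 0 & \text{if } -1, 1 \notin B \end{cases}$$
Finally, the distribution function $F_X$ of $X$ is given by
$$F_X(x) = \begin{cases} 0 & x < -1 \\ 1/2 & x \in [-1, 1) \\ 1 & x \ge 1 \end{cases}$$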
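The preimage, distribution, and distribution function of this coin-toss variable can be
checked numerically. The sketch below is illustrative only: the helper names preimage,
distribution, and cdf are hypothetical, and a Borel set $B$ is represented simply by a
membership test on real numbers.

```python
from fractions import Fraction

omega = ["H", "T"]
prob = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
X = {"H": 1, "T": -1}


def preimage(in_B):
    """Return X^{-1}(B), where B is described by the membership test `in_B`."""
    return frozenset(w for w in omega if in_B(X[w]))


def distribution(in_B):
    """Probability that X lands in B, computed as P(X^{-1}(B))."""
    return sum((prob[w] for w in preimage(in_B)), Fraction(0))


def cdf(x):
    """Distribution function F_X(x) = P(X <= x)."""
    return distribution(lambda z: z <= x)


print(preimage(lambda z: z > 0))      # the set {H}
print(cdf(-2), cdf(-1), cdf(0), cdf(1))
```

Every preimage returned here is one of the four sets in $\mathcal{F}$, which is exactly the
measurability requirement.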
4.2.2 The Information Contained in a Random Variable
Because the measurability restriction depends on the precise $\sigma$-algebra $\mathcal{F}$ on the
original probability space, the realization of the random variable contains information
about which $\omega$ has occurred. Two points are important in this context:
1. The random variable cannot contain more information than is present in $\mathcal{F}$.
2. It may contain less information than is present in $\mathcal{F}$, possibly none at all.
The first point is easiest to illustrate through some simple examples. First, suppose
$\mathcal{F}$ is totally information-less, i.e., that $\mathcal{F} = \{\emptyset, \Omega\}$. (This is
called the trivial $\sigma$-algebra on $\Omega$.) Then, for any random variable to be
$\mathcal{F}$-measurable, it is necessary that for any $B \in \mathcal{B}$, we have either
$X^{-1}(B) = \emptyset$ or $X^{-1}(B) = \Omega$. This is possible iff $X$ is constant on $\Omega$.
Thus, in this case, observing $X(\omega)$ gives us no information at all regarding which
$\omega$ occurred.
On the other hand, suppose $\mathcal{F}$ has the form $\{\emptyset, \Omega, A, A^c\}$ for some
$A \subset \Omega$. Then, a function $X$ from $\Omega$ to $\mathbb{R}$ can take on up to two distinct
values $a$ and $b$: it will be measurable as long as the value of $X$ is equal to a constant on
$A$, and is equal to a (possibly different) constant on $A^c$.^2 Thus, observing the realization
of $X$ can give us partial information about which $\omega$ occurred: we can surmise whether
$\omega \in A$ or $\omega \in A^c$.
It is not too difficult to see from this discussion that the finer the information
contained in $\mathcal{F}$, the more information may be carried in a realization of a
random variable that is $\mathcal{F}$-measurable.
Although a random variable cannot contain more information than is present
in $\mathcal{F}$, it could contain less. This is an important point. Consider, as an example,
a random variable $X$ on a probability space $(\Omega, \mathcal{F}, P)$ which is constant on
$\Omega$ (say, $X(\omega) = k$ $\forall \omega \in \Omega$). If $B$ is any Borel set, then we have
$$X^{-1}(B) = \begin{cases} \Omega & \text{if } k \in B \\ \emptyset & \text{if } k \notin B. \end{cases}$$
Since $\emptyset$ and $\Omega$ must be in any $\sigma$-algebra $\mathcal{F}$, it follows that a constant
function such as $X$ will be $\mathcal{F}$-measurable for any $\mathcal{F}$. However, regardless of the
information contained in $\mathcal{F}$, observing a realization of $X$ gives us no information at
all concerning which $\omega$ may have occurred.
^2 However, it is easy to show that $X$ cannot take on more than two distinct values on
$\Omega$ and remain measurable.
4.3 The Sigma-Algebra Generated by a Random Variable
The second point of the previous subsection, that a random variable $X$ on a probability
space $(\Omega, \mathcal{F}, P)$ may fail to contain as much information as $\mathcal{F}$, motivates
the next concept we introduce. If we observe the outcome of an $\mathcal{F}$-measurable random
variable $X$, then the amount of information contained in the observation can
be expressed as the smallest $\sigma$-algebra $\mathcal{G}$ with respect to which $X$ is measurable.
We denote this $\sigma$-algebra by $\sigma(X)$, and term it the $\sigma$-algebra generated by $X$.
Formally,
$$\sigma(X) = \bigcap \{\mathcal{G} \mid X \text{ is } \mathcal{G}\text{-measurable}\}.$$
By definition, $\sigma(X)$ contains exactly as much information as is revealed by $X$; since
$X$ cannot contain more information than is present in $\mathcal{F}$, we must have
$\sigma(X) \subset \mathcal{F}$.
The following examples of $\sigma(X)$ all pertain to a situation where $X$ is an
$\mathcal{F}$-measurable random variable on some set $\Omega$.
Example 5 Suppose $X(\omega) = k$ $\forall \omega \in \Omega$. Then, as we have seen, $X$ is measurable
with respect to the trivial $\sigma$-algebra $\{\emptyset, \Omega\}$. Since no $\sigma$-algebra can be
smaller than the trivial $\sigma$-algebra, it follows that $\sigma(X) = \{\emptyset, \Omega\}$. Of course,
$X$ reveals no information at all, and neither, therefore, does $\sigma(X)$. $\Box$
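Example 6 Suppose there is a set $A$ contained in $\Omega$ ($A \ne \emptyset, \Omega$) such that
$$X(\omega) = \begin{cases} k_1 & \omega \in A \\ k_2 & \omega \in A^c \end{cases}$$
where $k_1 \ne k_2$. By observing $X$, we can see if $\omega \in A$ or $\omega \in A^c$. Thus,
$\sigma(X)$ must contain at least the sets $A$ and $A^c$. On the other hand, it is easy to verify
that $X$ is measurable with respect to the $\sigma$-algebra $\{\emptyset, \Omega, A, A^c\}$. It follows
that we must have $\sigma(X) = \{\emptyset, \Omega, A, A^c\}$. $\Box$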
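On a finite sample space, $\sigma(X)$ can be listed explicitly: its members are exactly the
unions of the level sets $\{\omega \mid X(\omega) = v\}$. The following Python sketch
illustrates Examples 5 and 6 under that finite-space assumption; the function name
sigma_generated_by and the sample space $\{1, 2, 3, 4\}$ are hypothetical choices made for
illustration.

```python
from itertools import combinations


def sigma_generated_by(omega, X):
    """Return sigma(X) on a finite sample space as a set of frozensets."""
    # Level sets {w : X(w) = v} are the smallest events that X can distinguish.
    level_sets = {}
    for w in omega:
        level_sets.setdefault(X(w), set()).add(w)
    atoms = [frozenset(a) for a in level_sets.values()]
    # Every member of sigma(X) is a union of some of these level sets.
    events = set()
    for r in range(len(atoms) + 1):
        for combo in combinations(atoms, r):
            events.add(frozenset().union(*combo))
    return events


omega = {1, 2, 3, 4}
constant = sigma_generated_by(omega, lambda w: 7)                      # Example 5
two_valued = sigma_generated_by(omega, lambda w: 1 if w <= 2 else 0)   # Example 6, A = {1, 2}
print(constant)
print(two_valued)
```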
4.4 The Integral of a Random Variable
Throughout this section, we will assume that $X$ is a random variable on some
probability space $(\Omega, \mathcal{F}, P)$.
First, we consider the case where $X$ is a simple random variable, that is, a random
variable that assumes only finitely many distinct values $a_1, \ldots, a_n$. Let
$\Omega_i = \{\omega \mid X(\omega) = a_i\}$. Then, the probability that $X$ assumes the value
$a_i$ is simply $P(\Omega_i)$, and for such random variables, we define the integral of $X$
(with respect to $P$) as
$$\int_\Omega X(\omega)\, dP(\omega) = \sum_{i=1}^{n} a_i P(\Omega_i).$$
In shorthand notation, we will write $\int X\, dP$ for $\int_\Omega X(\omega)\, dP(\omega)$ in the sequel.
The definition of the integral of a random variable gets more technically involved
if $X$ is not a simple random variable. We sketch the construction here, omitting the
details. Suppose that $X$ is a positive random variable, i.e., that $X(\omega) \ge 0$
$\forall \omega \in \Omega$. Define
$$S(X) = \{Y \mid Y \text{ is an } \mathcal{F}\text{-measurable simple random variable, and } Y \le X\}.$$
Then, the integral $\int X\, dP$ of $X$ is defined as
$$\int X\, dP = \sup \left\{ \int Y(\omega)\, dP(\omega) \;\middle|\; Y \in S(X) \right\}.$$
The motivation for this definition is two-fold. First, it is the case that any positive
random variable can be approximated arbitrarily closely from below by simple random
variables, i.e., that given any ($\mathcal{F}$-measurable) positive random variable $X$, there
exists a sequence of ($\mathcal{F}$-measurable) simple random variables $Y_n$ such that
$Y_n \le X$ and $Y_n(\omega) \to X(\omega)$ for each $\omega$, as $n \to +\infty$. Second, although
the approximating sequence need not be unique, it is also true that if $Y_n$ and $Z_n$ are two
different sequences of simple random variables that approximate $X$, then the sequences of
integrals $\int Y_n\, dP$ and $\int Z_n\, dP$ have the identical limit. This is a direct consequence
of the so-called Monotone Convergence Theorem. It is natural, then, to call this limit
$\int X\, dP$. It is left to the reader as an exercise to show that $\int X\, dP$ defined in this way
(that is, as the limit of the integrals of approximating sequences of simple random
variables) is the same as the definition we have provided above.
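Finally, consider the case where $X$ is an arbitrary random variable. Define
$$X^+(\omega) = \begin{cases} X(\omega) & \text{if } X(\omega) \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
$$X^-(\omega) = \begin{cases} 0 & \text{if } X(\omega) \ge 0 \\ -X(\omega) & \text{otherwise} \end{cases}$$
Note that $X^+$ and $X^-$ are then both positive random variables, and that
$X = X^+ - X^-$. We define the integral $\int X\, dP$ of $X$ now by
$$\int X\, dP = \int X^+\, dP - \int X^-\, dP.$$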
The integral has the following very useful properties:
1. If $X \ge 0$, then $\int X\, dP \ge 0$.
2. If $X$ and $Y$ are random variables on $(\Omega, \mathcal{F}, P)$, and if $Z$ is the random
variable $X + Y$ (i.e., if $\forall \omega \in \Omega$, we have $Z(\omega) = X(\omega) + Y(\omega)$), then
$$\int Z\, dP = \int X\, dP + \int Y\, dP.$$
3. If $a \in \mathbb{R}$, and $aX$ is the function that at any $\omega$ takes on the value
$aX(\omega)$, then
$$\int (aX)\, dP = a \int X\, dP.$$
Properties 2 and 3 are often combined into the single statement that the integral is
linear in $X$.
4.5 The Mean of a Random Variable
The mean of a random variable $X$ (defined on a probability space $(\Omega, \mathcal{F}, P)$), or its
expected value, is precisely the integral of $X$ with respect to $P$. Denoting this mean
by either $\mu_X$ or $E(X)$, we have
$$\mu_X = E(X) = \int X\, dP.$$
A crude interpretation of this expression is that the value $X(\omega)$ is viewed as
occurring with probability $dP(\omega)$; summing over all $\omega$ (i.e., taking the integral
over $\Omega$) gives us the expected value of $X$.
An equivalent definition of the mean of $X$ (and, therefore, also the integral of $X$)
can be obtained by considering the distribution $\nu_X$ of $X$. As a rough motivation,
note that the probability that $X$ lies in a small interval $dz$ around a point $z$ is
$\nu_X(dz)$. Thus, the expected value of $X$ can also be obtained by summing over all
possible $z$ (i.e., by integrating over $\mathbb{R}$); indeed, we have
$$\mu_X = \int_{\mathbb{R}} z\, \nu_X(dz).$$
Of course, we could have also used the distribution function $F_X(\cdot)$, rather than
the distribution $\nu_X$, to evaluate the integral of $X$. In this case, we would have
$$\mu_X = \int_{\mathbb{R}} z\, dF_X(z).$$
4.6 The Variance and Covariance of a Random Variable
If $X$ is a random variable on $(\Omega, \mathcal{F}, P)$, then it is easily verified that so is the
function $X^2$. Thus, we can define the expected value $E(X^2)$ of $X^2$. The variance of
$X$, denoted $\sigma_X^2$, is defined as
$$\sigma_X^2 = E(X^2) - [E(X)]^2.$$
The variance of a random variable is always non-negative since we have also
$$E(X^2) - [E(X)]^2 = E[X - E(X)]^2,$$
and, as a non-negative random variable, $[X - E(X)]^2$ must have non-negative expectation.
If $X$ and $Y$ are both random variables on $(\Omega, \mathcal{F}, P)$, then the covariance of $X$
and $Y$, denoted $\sigma_{XY}$, is defined as
$$\sigma_{XY} = E(XY) - E(X)E(Y),$$
where, of course, $XY$ is the random variable whose value at any $\omega$ is $X(\omega)Y(\omega)$.
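Example 7 Let $\Omega = \{\omega_1, \omega_2\}$, $\mathcal{F} = 2^\Omega$, and
$P(\{\omega_1\}) = P(\{\omega_2\}) = 1/2$. Define the random variable $X$ on
$(\Omega, \mathcal{F}, P)$ by
$$X(\omega) = \begin{cases} 1 & \omega = \omega_1 \\ 0 & \omega = \omega_2 \end{cases}$$
Then,
$$E(X) = \int_{\mathbb{R}} z\, \nu_X(dz) = \frac{1}{2} \cdot 1 + \frac{1}{2} \cdot 0 = \frac{1}{2},$$
and
$$\sigma_X^2 = E[X - E(X)]^2 = \frac{1}{2} \left(1 - \frac{1}{2}\right)^2 + \frac{1}{2} \left(0 - \frac{1}{2}\right)^2 = \frac{1}{4}. \quad \Box$$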
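The arithmetic of Example 7 can be reproduced directly from the definitions of the mean and
the variance on a finite sample space. This is only a small sketch; the labels "w1" and "w2"
stand in for $\omega_1$ and $\omega_2$, and the helper name expectation is hypothetical.

```python
from fractions import Fraction

prob = {"w1": Fraction(1, 2), "w2": Fraction(1, 2)}
X = {"w1": 1, "w2": 0}


def expectation(values, prob):
    """Integral of a simple random variable: sum of value times probability."""
    return sum(values[w] * prob[w] for w in prob)


mean = expectation(X, prob)
# Variance computed two ways: E(X^2) - [E(X)]^2 and E[X - E(X)]^2.
variance = expectation({w: X[w] ** 2 for w in X}, prob) - mean ** 2
centered = expectation({w: (X[w] - mean) ** 2 for w in X}, prob)
print(mean, variance, centered)
```

Both variance formulas give the same number, in line with the identity stated in the text.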
4.7 Independent Random Variables
Given two random variables $X$ and $Y$ on a probability space $(\Omega, \mathcal{F}, P)$, we say that
$X$ and $Y$ are independent random variables if it is the case that $\forall B$ and $B'$
in $\mathcal{B}$,
$$P(\{\omega \mid X(\omega) \in B,\ Y(\omega) \in B'\}) = P(\{\omega \mid X(\omega) \in B\})\, P(\{\omega \mid Y(\omega) \in B'\}).$$
In words, this definition says that conditioning on the outcome of $X$ cannot have
any effect on the distribution of $Y$.
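Example 8 Let $\Omega = \{\omega_1, \omega_2, \omega_3\}$, $\mathcal{F} = 2^\Omega$, and
$P(\{\omega_i\}) = \frac{1}{3}$ $\forall i$. Define the random variables $X$ and $Y$ on
$(\Omega, \mathcal{F}, P)$ by
$$X(\omega) = \begin{cases} 1 & \omega = \omega_1 \\ 0 & \omega = \omega_2 \\ -1 & \omega = \omega_3 \end{cases}$$
and
$$Y(\omega) = \begin{cases} 1 & \omega = \omega_1, \omega_3 \\ 0 & \omega = \omega_2 \end{cases}$$
Let $B$ be the Borel set consisting of the singleton point $+1$. Then, the probability
of the event that both $X$ and $Y$ are in $B$ is the probability of $\{\omega_1\}$, so we have
$$P(X \in B, Y \in B) = P(\{\omega_1\}) = \frac{1}{3}.$$
On the other hand, since $X(\omega) \in B$ for $\omega = \omega_1$, and $Y(\omega) \in B$ for
$\omega = \omega_1$ or $\omega = \omega_3$, we have
$$P(X \in B)\, P(Y \in B) = P(\{\omega_1\})\, P(\{\omega_1, \omega_3\}) = \frac{1}{3} \cdot \frac{2}{3} = \frac{2}{9},$$
so $X$ and $Y$ are not independent. $\Box$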
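The failure of independence in Example 8 can be confirmed by brute force. The sketch below
assumes $X(\omega_3) = -1$ (the sign is inferred from the example's own calculation) and uses
the fact that, for random variables taking finitely many values, it is enough to test the
product rule on every pair of values. The names P and independent are hypothetical helpers.

```python
from fractions import Fraction
from itertools import product

prob = {"w1": Fraction(1, 3), "w2": Fraction(1, 3), "w3": Fraction(1, 3)}
X = {"w1": 1, "w2": 0, "w3": -1}   # assumption: X(w3) = -1
Y = {"w1": 1, "w2": 0, "w3": 1}


def P(event):
    """Probability of the set of outcomes w for which event(w) is true."""
    return sum((prob[w] for w in prob if event(w)), Fraction(0))


def independent(X, Y):
    """Check P(X = x, Y = y) == P(X = x) * P(Y = y) for every value pair."""
    for x, y in product(set(X.values()), set(Y.values())):
        joint = P(lambda w: X[w] == x and Y[w] == y)
        if joint != P(lambda w: X[w] == x) * P(lambda w: Y[w] == y):
            return False
    return True


joint_11 = P(lambda w: X[w] == 1 and Y[w] == 1)
product_11 = P(lambda w: X[w] == 1) * P(lambda w: Y[w] == 1)
print(joint_11, product_11, independent(X, Y))
```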
Chapter 5
Probability and Measure II
Building on the previous chapter, this chapter introduces three topics:
1. The notion of absolute continuity of probability measures and the Radon-Nikodym
derivative.
2. The ideas of conditional probability and conditional expectation.
3. The definition of a stochastic process. In particular, the definition of a martingale,
and the definition of a Markov process.
The concepts provided here play an important role in understanding the definition
and properties of a Brownian motion. They also play an important role in understanding
the notion of an equivalent martingale measure, which we briefly discussed earlier.
5.1 The Radon-Nikodym Theorem
The Radon-Nikodym theorem is a discrete-time version of the Girsanov theorem, which
is one of the most powerful mathematical tools in continuous-time mathematics. The
theorem provides a set of conditions under which, given two probability measures $P$
and $Q$, the expected value of a random variable under $P$ can be transformed to an
expected value under $Q$, by reassigning the weights given by $Q$ to different sets.
Specifically, it provides conditions under which there exists a non-negative function
$h$ such that if $X$ is any random variable, we have
$$\int X\, dP = \int (Xh)\, dQ.$$
In a crude sense, the theorem may be viewed as providing conditions under which
the weights $dP$ under $P$ can be replaced with the weights $h\,dQ$, without affecting the
integral. The theorem plays an important role in the use of risk-neutral techniques
in evaluating contingent claims.
The key condition in the Radon-Nikodym theorem is that the probability measure
$P$ be absolutely continuous with respect to $Q$. We begin with a definition of
this concept. An intuitive interpretation of the notion of absolute continuity, and
especially of its importance for the Radon-Nikodym theorem, is given following the
statement of the theorem.
Let $(\Omega, \mathcal{F})$ be a measurable space, and let $P$ and $Q$ be probability measures on
$(\Omega, \mathcal{F})$. We say that the measure $P$ is absolutely continuous with respect to the
measure $Q$ if it is the case that whenever a set $A \in \mathcal{F}$ has $Q$-probability 0 (i.e.,
is a null set under $Q$), it also has $P$-probability 0 (i.e., is a null set under $P$), that is, if
$$Q(A) = 0 \Rightarrow P(A) = 0.$$
We write this as $P \ll Q$. It must be stressed at the outset that it is possible that
we have $P \ll Q$ without also having $Q \ll P$. That is, there could exist sets of
$P$-measure zero which have positive probability under $Q$. The following example
illustrates:
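Example 1 Let $\Omega = \{\omega_1, \omega_2, \omega_3\}$, and let $\mathcal{F} = 2^\Omega$. Let $P$ and
$Q$ be defined by
$$Q(\{\omega_1\}) = \frac{1}{2}, \quad Q(\{\omega_2\}) = \frac{1}{2}, \quad Q(\{\omega_3\}) = 0,$$
and
$$P(\{\omega_1\}) = 1, \quad P(\{\omega_2\}) = 0, \quad P(\{\omega_3\}) = 0.$$
Then, the only non-empty set in $\mathcal{F}$ that has $Q$-measure zero is $\{\omega_3\}$. Since this
set also has $P$-measure zero, we have $P \ll Q$. Note, however, that the set
$\{\omega_2\} \in \mathcal{F}$ has $P$-measure zero without having $Q$-measure zero, so it is not the
case that $Q \ll P$. $\Box$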
If we have probabilities $P$ and $Q$ on a given space $(\Omega, \mathcal{F})$ such that both
$P \ll Q$ and $Q \ll P$ are true, then we say that $P$ is equivalent to $Q$. If $P$ is
equivalent to $Q$, then we have, by definition, for any $A \in \mathcal{F}$,
$$P(A) = 0 \text{ iff } Q(A) = 0,$$
i.e., they have the same events of probability zero. Now, given the definitions of
absolute continuity and equivalence, we state the following theorem:
Theorem 10 (Radon-Nikodym Theorem) Suppose $P$ and $Q$ are probability
measures on a measurable space $(\Omega, \mathcal{F})$ such that $P \ll Q$. Then, there exists a
non-negative function $h : (\Omega, \mathcal{F}) \to (\mathbb{R}, \mathcal{B})$ such that if $X$ is any random
variable on $(\Omega, \mathcal{F})$, the expected value of $X$ under $P$ is the same as the expected
value of $Xh$ under $Q$, i.e., we have
$$\int X(\omega)\, dP(\omega) = \int [X(\omega) h(\omega)]\, dQ(\omega).$$
Remark 4 The power of the theorem comes from the fact that the same function
$h$ works for all random variables $X$ defined on $(\Omega, \mathcal{F})$. The function $h$ is called
the Radon-Nikodym derivative of $P$ with respect to $Q$, and is often expressed simply as
$$h = \frac{dP}{dQ}. \quad \Box$$
The intuitive content of the Radon-Nikodym theorem, as well as of the condition
that $P$ be absolutely continuous with respect to $Q$, may be grasped by considering
the special case where $\Omega$ consists of only a finite set of points, and $\mathcal{F} = 2^\Omega$.
Let $P(\omega)$ and $Q(\omega)$ denote, respectively, the probabilities under $P$ and $Q$ of the
point $\omega$. If $P \ll Q$, then we must have
$$Q(\omega) = 0 \Rightarrow P(\omega) = 0.$$
Now define the function $h$ on $\Omega$ by requiring that, for each $\omega$,
$$h(\omega) Q(\omega) = P(\omega).$$
The only possible difficulty that could arise in this definition of $h$ is that it might
be the case that there is a point $\omega$ such that $P(\omega) > 0$ and $Q(\omega) = 0$; in this case,
the left-hand side would be zero for any choice of $h$, but the right-hand side would
be strictly positive. However, when $P \ll Q$, such a case is impossible: if
$Q(\omega) = 0$, then $P(\omega)$ must also be zero. Therefore, when $P \ll Q$, it is always
possible to define $h$ as described.
Now, observe that if $X$ is any simple random variable on $\Omega$, then $\int X\, dP$, the
expected value of $X$ under the probability $P$, is simply given by
$$\int X\, dP = \sum_{\omega \in \Omega} X(\omega) P(\omega).$$
A similar statement holds for $\int X\, dQ$. Thus, we have for any $X$,
$$\int X\, dP = \sum_{\omega \in \Omega} X(\omega) P(\omega) = \sum_{\omega \in \Omega} X(\omega) h(\omega) Q(\omega) = \int (Xh)\, dQ,$$
which is precisely the condition claimed by the Radon-Nikodym theorem. Note that
we could not have established this result if we had not had $P \ll Q$, since it would
not have been possible to define $h$ as required.
The following example illustrates the Radon-Nikodym theorem using an explicit
calculation.
Example 2 Let $\Omega$, $\mathcal{F}$, $P$ and $Q$ be as in Example 1. Then, for any random variable
$X$, we have
$$\int X\, dP = \sum_{i=1}^{3} X(\omega_i) P(\omega_i) = X(\omega_1) \cdot 1 + X(\omega_2) \cdot 0 + X(\omega_3) \cdot 0 = X(\omega_1).$$
Now define $h$ by $h(\omega_i) Q(\omega_i) = P(\omega_i)$, i.e.,
$$h(\omega_1) = 2, \quad h(\omega_2) = 0, \quad h(\omega_3) = 0.$$
Then, for any random variable $X$, we have
$$\int Xh\, dQ = \sum_{i=1}^{3} X(\omega_i) h(\omega_i) Q(\omega_i) = X(\omega_1) \cdot 2 \cdot \frac{1}{2} + X(\omega_2) \cdot 0 \cdot \frac{1}{2} + X(\omega_3) \cdot 0 \cdot 0 = X(\omega_1) = \int X\, dP.$$
It is also easy to show in this example that since $Q$ is not absolutely continuous
with respect to $P$, it is not possible to define a Radon-Nikodym derivative of $Q$ with
respect to $P$. $\Box$
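The construction of $h$ in the finite case lends itself to a short numerical check. The
following Python sketch is illustrative only: the function names radon_nikodym and expect
and the test values of $X$ are hypothetical, and on a point with $Q(\omega) = 0$ the value
of $h$ is set to 0 by convention, since any value would do there.

```python
from fractions import Fraction

Q = {"w1": Fraction(1, 2), "w2": Fraction(1, 2), "w3": Fraction(0)}
P = {"w1": Fraction(1), "w2": Fraction(0), "w3": Fraction(0)}


def radon_nikodym(P, Q):
    """Return h with h(w) * Q(w) = P(w) for every w; requires P << Q."""
    h = {}
    for w in Q:
        if Q[w] == 0:
            if P[w] != 0:
                raise ValueError("P is not absolutely continuous with respect to Q")
            h[w] = Fraction(0)   # arbitrary choice on a Q-null point
        else:
            h[w] = P[w] / Q[w]
    return h


def expect(X, measure):
    """Expected value of X on a finite space under the given measure."""
    return sum(X[w] * measure[w] for w in measure)


h = radon_nikodym(P, Q)
X = {"w1": 5, "w2": -3, "w3": 7}          # arbitrary illustrative values
lhs = expect(X, P)
rhs = expect({w: X[w] * h[w] for w in X}, Q)
print(h, lhs, rhs)
```

Calling radon_nikodym(Q, P) raises a ValueError, because $Q(\omega_2) > 0$ while
$P(\omega_2) = 0$, which matches the closing remark of Example 2.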
5.2 Conditional Probability, Conditional Expectations
Let a probability space $(\Omega, \mathcal{F}, P)$ be given. Let $A$ and $B$ be in $\mathcal{F}$. Conditional
probability asks the following question: given that the event $B$ has occurred (i.e.,
the true $\omega$ is known to lie in $B$), what is the probability that the event $A$ has
occurred (i.e., what is the probability that $\omega \in A$)?
When the probability of $B$ is zero, $P(A \mid B)$ is not well-defined. When the
probability of $B$ is greater than zero, the answer is given by the well-known formula:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$
Note that the right-hand side can be computed from the information we have: since
$B \in \mathcal{F}$, we know $P(B)$. Since $A \in \mathcal{F}$, and $\mathcal{F}$ is a $\sigma$-algebra, we must
also have $A \cap B \in \mathcal{F}$; therefore, we must also know $P(A \cap B)$.
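More generally, we can ask the following question: given a random variable $X$
on $(\Omega, \mathcal{F})$, and a set $B \in \mathcal{F}$ such that $P(B) > 0$, what is the expected value
of $X$, given that $\omega$ lies in $B$? This is called the conditional expectation of $X$ given $B$,
and is written $E(X \mid B)$.
If $\Omega$ is a finite set, the answer is not too difficult to see. Suppose, for instance,
that $\mathcal{F} = 2^\Omega$. For each $\omega \in B$, we can find the probability of $\omega$ given $B$, and
using these probabilities, we can calculate $E(X \mid B)$. That is, we have
$$P(\omega \mid B) = \frac{P(\omega)}{P(B)}, \quad \omega \in B,$$
and $P(\omega \mid B) = 0$ otherwise. Therefore,
$$E(X \mid B) = \sum_{\omega \in B} X(\omega) P(\omega \mid B).$$
By expanding the term $P(\omega \mid B)$, we equivalently have
$$E(X \mid B) = \sum_{\omega \in B} X(\omega) \left[ \frac{P(\omega)}{P(B)} \right] = \frac{1}{P(B)} \sum_{\omega \in B} X(\omega) P(\omega).$$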
By analogy, the conditional expectation in the general case (where $\Omega$ may not be
finite) is given by
$$E(X \mid B) = \frac{1}{P(B)} \int_B X\, dP.$$
We are now in a position to describe the main subject of this section. Let $\mathcal{G}$ be a
$\sigma$-algebra contained in $\mathcal{F}$. Suppose some $\omega \in \Omega$ has occurred, and while we are
not told the value of $\omega$, we are given the information, for each $G \in \mathcal{G}$, whether
$\omega \in G$ or not. Denote our conditional estimate of the expectation of $X$ given this limited
information by $E(X \mid \mathcal{G})(\omega)$.^1 Of course, given different values of $\omega$, the sets in
$\mathcal{G}$ that $\omega$ could belong to could differ, and so will our conditional expectation
$E(X \mid \mathcal{G})(\omega)$. In other words, the conditional expectation $E(X \mid \mathcal{G})$ is a random
variable, whose precise value depends on which $\omega$ occurs. The question is: what can we say
about this random variable? We motivate the answer using a few simple examples.
^1 "Limited information" is an exaggeration, of course. If $\mathcal{G}$ consists of a large number of
sets, then the information we receive will be considerable.
Example 3 First, we consider the case where $\mathcal{G}$ is the trivial $\sigma$-algebra. Then, no
matter which $\omega$ actually occurs, we will only have the information that $\omega \in \Omega$
(there are no other non-trivial sets in $\mathcal{G}$). Thus, the information present in $\mathcal{G}$ is
valueless, and our estimate of the expectation of $X$ given $\mathcal{G}$ is the same as our
expectation of $X$ without obtaining information from $\mathcal{G}$. That is, we have in this case
$$E(X \mid \mathcal{G}) = E(X),$$
where, of course, $E(X) = \int X\, dP$. Note that this result accords with intuition:
since $\mathcal{G}$ provides no information, our estimate of the conditional expectation of $X$
given $\mathcal{G}$ should not differ from our unconditional expectation of $X$. $\Box$
Example 4 Suppose, now, that we are given a slightly finer partition, specifically
that $\mathcal{G} = \{\emptyset, \Omega, A, A^c\}$ for some subset $A$ of $\Omega$ with $0 < P(A) < 1$.^2 Any
$\omega$ that occurs must belong to either $A$ or $A^c$, so the information we will have is either
that $\omega \in A$ or that $\omega \in A^c$. (This is the best information we can get from
$\mathcal{G}$, since subsets of $A$ or $A^c$ do not belong to $\mathcal{G}$.) Suppose we are given the
information that $\omega \in A$. Then, we can calculate the expectation $E(X \mid A)$ in the manner
described above, i.e.,
$$E(X \mid A) = \frac{1}{P(A)} \int_A X\, dP.$$
Similarly, if we are given the information that $\omega \in A^c$, we can calculate $E(X \mid A^c)$.
Since every $\omega$ must belong to either $A$ or $A^c$, we have
$$E(X \mid \mathcal{G})(\omega) = \begin{cases} E(X \mid A) & \text{if } \omega \in A \\ E(X \mid A^c) & \text{if } \omega \in A^c \end{cases}$$
Thus, $E(X \mid \mathcal{G})$ is a random variable that takes on two values, depending on whether
$\omega \in A$ or $\omega \in A^c$. $\Box$
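When $\Omega$ is finite and $\mathcal{G}$ is generated by a partition such as $\{A, A^c\}$, the
conditional expectation can be tabulated block by block. The Python sketch below is an
illustration under assumptions that are not in the notes: a fair six-sided die with
$X(\omega) = \omega$ and $A$ the set of even faces, with a hypothetical helper named
conditional_expectation; every block of the partition is assumed to have positive probability.

```python
from fractions import Fraction


def conditional_expectation(X, prob, partition):
    """Return E(X | G) as a dict w -> value, where G is generated by `partition`."""
    result = {}
    for block in partition:
        p_block = sum(prob[w] for w in block)                       # P(block), assumed > 0
        value = sum(X[w] * prob[w] for w in block) / p_block        # (1/P(block)) * integral over block
        for w in block:
            result[w] = value                                       # constant on each block
    return result


# Hypothetical illustration: a fair die, X(w) = w, A = even faces.
prob = {w: Fraction(1, 6) for w in range(1, 7)}
X = {w: w for w in range(1, 7)}
A, A_c = {2, 4, 6}, {1, 3, 5}
cond = conditional_expectation(X, prob, [A, A_c])
print(cond)

# Averaging E(X | G) over all outcomes recovers E(X).
print(sum(cond[w] * prob[w] for w in prob), sum(X[w] * prob[w] for w in prob))
```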
Example 5 Finally, we consider the case where $\mathcal{G} = \mathcal{F}$, i.e., $\mathcal{G}$ gives us as
much information as $\mathcal{F}$. Let $X$ be an $\mathcal{F}$-measurable random variable. In this case, we
must have
$$E(X \mid \mathcal{G}) = X,$$
that is, we must have for each $\omega$,
$$E(X \mid \mathcal{G})(\omega) = X(\omega).$$
That is, if $\mathcal{G} = \mathcal{F}$, knowing the information in $\mathcal{G}$ is equivalent to knowing which
$\omega$ itself occurred. The reader is invited to think why this is the case. $\Box$
^2 The inequalities $0 < P(A) < 1$ are meant to ensure that neither $A$ nor $A^c$ is a trivial
set.
The intuitive content of the notion of $E(X \mid \mathcal{G})$ is, hopefully, clear from these examples.
On a formal level, we define the conditional expectation of $X$ given $\mathcal{G}$ to be a
$\mathcal{G}$-measurable random variable which has the property that for any $G \in \mathcal{G}$,
$$\int_G E(X \mid \mathcal{G})(\omega)\, dP(\omega) = \int_G X(\omega)\, dP(\omega).$$
The condition that the conditional expectation be $\mathcal{G}$-measurable is entirely intuitive:
since $E(X \mid \mathcal{G})$ is calculated with only the information in $\mathcal{G}$, observing the value of
$E(X \mid \mathcal{G})$ should not provide us with any more information than is present in $\mathcal{G}$,
and this is the same thing as requiring that $E(X \mid \mathcal{G})$ be $\mathcal{G}$-measurable. The second
condition ensures, among other things, that if $E(X \mid \mathcal{G})(\omega)$ is equal to a constant $k$
for all $\omega$ in some set $G \in \mathcal{G}$ with $P(G) > 0$ (as was the case for $G = A$ in
Example 4), then we must have
$$k = \frac{1}{P(G)} \int_G X\, dP = E(X \mid G).$$
This is also intuitive. If $E(X \mid \mathcal{G})(\omega)$ is constant on $G \in \mathcal{G}$, then it must be
the case that knowing which $\omega \in G$ occurred tells us nothing more than just the knowledge
that some $\omega \in G$ occurred. Therefore, our conditional expectation from knowing
that some $\omega \in G$ occurred should coincide exactly with $E(X \mid \mathcal{G})$.
Three final points:
1. First, suppose $A \in \mathcal{F}$, and the indicator function $1_A$ is defined as:
$$1_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A \\ 0 & \text{if } \omega \notin A \end{cases}$$
Then the function $E(1_A \mid \mathcal{G})$ is called the conditional probability of the set $A$
given the information in $\mathcal{G}$, and is also denoted $P(A \mid \mathcal{G})$.
2. Secondly, we are frequently interested in estimating the value of a (perhaps
unobservable) random variable $X$ given information about the realization of
another random variable $Y$. Since the information content of observing $Y$ is
summarized by $\sigma(Y)$, this conditional estimate can be represented simply as
$E(X \mid \sigma(Y))$. For notational convenience, however, we will adopt the abbreviated
form $E(X \mid Y)$ to denote this expression.
3. The conditional Radon-Nikodym theorem is also valid. For the sub-tribe $\mathcal{G}$
and $Q \ll P$, with $h = dQ/dP$ the Radon-Nikodym derivative of $Q$ with respect to $P$,
$$E^Q(X \mid \mathcal{G}) = \frac{1}{E^P(h \mid \mathcal{G})} E^P(hX \mid \mathcal{G}).$$
This conditional version is somewhat different from its unconditional counterpart
so as to adjust for the effect of the measure change on conditional probabilities.
5.3 Stochastic Processes
5.3.1 Stochastic Processes Defined
Let $\mathbf{T}$ be a subset of the real line. The set $\mathbf{T}$ will denote time. A stochastic
process is simply a time-indexed family of random variables, i.e., a family of
random variables $\{X_t\}_{t \in \mathbf{T}}$. If $\mathbf{T}$ is a continuum (e.g., $\mathbf{T} = [0, T]$ for
some $T > 0$), we say that $\{X_t\}_{t \in \mathbf{T}}$ is a continuous-time stochastic process. If
$\mathbf{T}$ is discrete (e.g., $\mathbf{T} = \{1, 2, 3, \ldots\}$), then $\{X_t\}_{t \in \mathbf{T}}$ is said to be a
discrete-time stochastic process.
In defining a stochastic process, we typically want to capture the feature that
our information about the process, from observing the value of $X_t(\omega)$, increases
(or, at least, does not decrease) with the passage of time. To achieve this, we
will make use of the fact that the larger the $\sigma$-field a random variable is measurable
with respect to, the more information may be contained in an observation
of the random variable. Specifically, we will require that for a larger value of
$t$, $X_t$ be measurable with respect to a larger $\sigma$-field.
More formally, let $(\Omega, \mathcal{F}, P)$ be a probability space, and let
$\{\mathcal{F}_t\}_{t \in \mathbf{T}}$ be an increasing family of sub-$\sigma$-algebras on $(\Omega, \mathcal{F})$,
i.e., a family of $\sigma$-fields with the property that if $t < s$, then
$$\mathcal{F}_t \subset \mathcal{F}_s \subset \mathcal{F}.$$
Such a family of increasing $\sigma$-fields is called a filtration, or a $\sigma$-filtration. We
will denote the filtration by simply $\mathbb{F}$ rather than use the more cumbersome
$\{\mathcal{F}_t\}_{t \in \mathbf{T}}$.^3 The probability space $(\Omega, \mathcal{F}, P)$ together with the filtration
$\mathbb{F}$ will be denoted $(\Omega, \mathcal{F}, \mathbb{F}, P)$.
In this notation, a stochastic process is a family of random variables $\{X_t\}$ on
$(\Omega, \mathcal{F}, \mathbb{F}, P)$ with the property that for each $t$, $X_t$ is an $\mathcal{F}_t$-measurable
random variable.
Remark 5 A minor technical point is worth mentioning here. Recall that
$\sigma(Z)$ represents the information present in observing the random variable $Z$.
In an obvious extension of this notation, we will denote by $\sigma(X_s, s \le t)$ the
information contained in observing the stochastic process $\{X_s\}$ up to and
including time $t$.^4 Since $X_s$ is required to be $\mathcal{F}_s$-measurable for each $s$, and the
family $\{\mathcal{F}_s\}$ is increasing in $s$, we must have
$$\sigma(X_s, s \le t) \subset \mathcal{F}_t.$$
We do not insist on equality between these $\sigma$-algebras. That is, we allow for
the possibility that the information $\mathcal{F}_t$ we have at time $t$ could be more than just
that contained in the process up to $t$. If, however, we wish to consider the case
where $\mathcal{F}_t = \sigma(X_s, s \le t)$ for all $t$, then we will emphasize this by writing
$\mathcal{F}_t^X$ for $\sigma(X_s, s \le t)$, and denoting the resulting $\sigma$-filtration by
$\mathbb{F}^X = \{\mathcal{F}_t^X\}_{t \in \mathbf{T}}$.
5.4 Sample Paths and Induced Probabilities
Let a stochastic process $\{X_t\}$ on a probability space $(\Omega, \mathcal{F}, \mathbb{F}, P)$ be given.
Fix any $\omega \in \Omega$. For this fixed $\omega$, the time-$t$ value of the stochastic process
is given by $X_t(\omega)$. The mapping taking $t$ into the value $X_t(\omega)$ is called the
sample path of the stochastic process $\{X_t\}$ corresponding to $\omega$. In words, the
sample path corresponding to $\omega$ is simply the time-path of the process that
will be observed if $\omega$ occurred.
Thus, under our definition of a stochastic process, each $\omega \in \Omega$ is associated
with a sample path. The probabilities of the different subsets of $\Omega$ then
determine the probabilities of the different sample paths. That is, given a
set $C$ of possible sample paths for the outcome of the stochastic process, the
probability that the observed path will lie in this set is simply the probability
of the set of $\omega$ for which the sample path $X_t(\omega)$ lies in $C$. Put differently,
the probability measure $P$ on $(\Omega, \mathcal{F})$ induces a probability measure on the set
of possible sample paths of the underlying variable.
Some new terminology is important in this context. If, for any fixed $\omega \in \Omega$,
the sample path corresponding to $\omega$ (i.e., the mapping taking $t$ into $X_t(\omega)$) is
a continuous function, then the stochastic process is said to have continuous
sample paths. If for some $\omega \in \Omega$ this function fails to be continuous, then the
stochastic process will be said to have discontinuous sample paths.
^3 The use of the word "increasing" is standard, but somewhat misleading. Strictly speaking,
it means that the family of $\sigma$-fields should be non-decreasing; that is, that our information
does not decrease with time. It is entirely possible that $\mathcal{F}_t = \mathcal{F}_s$ $\forall t$ and $s$,
so that our information may not change with time.
^4 Formally, $\sigma(X_s, s \le t)$ is the smallest $\sigma$-algebra with respect to which all the
random variables $X_s$ ($s \le t$) are measurable.
5.5 The Notion of Almost Surely
If a set $A \subset \Omega$ is such that $P(A) = 0$, then how a stochastic process defined on
$\Omega$ behaves for $\omega \in A$ is really irrelevant, because the event $A$ has a probability
zero of occurring. In the sequel, therefore, when we require properties of
a given stochastic process, we will not insist that the properties hold for all
$\omega \in \Omega$; rather, we will often be content if the set $A \subset \Omega$ on which the property
fails to hold has probability zero. We express this as the condition that the
property hold almost surely, abbreviated a.s. Of course, which subsets of $\Omega$
have zero probability is decided by the underlying probability measure $P$. If
we wish to stress this dependence on $P$, then we will write $P$-almost surely,
and abbreviate this as $P$-a.s.
5.6 Martingales Revisited
In Chapter 3, we defined a martingale process, but here we study it more
rigorously. A martingale is essentially a stochastic process whose current
value is the best predictor of its expected future values. Formally, we say
that a stochastic process $\{X_t\}_{t \in \mathbf{T}}$ defined on $(\Omega, \mathcal{F}, \mathbb{F}, P)$ is a
martingale if it is the case that $t > s$ implies
$$E(X_t \mid \mathcal{F}_s) = X_s, \quad P\text{-a.s.}$$
To interpret this definition in words, recall that $\mathcal{F}_s$ summarizes the information
we have at time $s$; therefore, $E(X_t \mid \mathcal{F}_s)$ is the best estimate of the expectation
of the value of the process at time $t$, given the information we have up to time
$s$. The definition of a martingale says that this best estimate is precisely the
value taken by the process at time $s$.
Note the important point that whether a stochastic process is a martingale or
not depends on the probability measure $P$. In particular, a process which is a
martingale under a given probability measure $P$ may fail to be a martingale
under a different probability measure $Q$. This point is significant in the theory
of risk-neutral valuation of contingent claims, where we replace the actual
probability measure on the set of possible security-price paths with another
measure under which the security prices become a martingale.
Similarly, whether a stochastic process is a martingale or not depends on the
information filtration $\mathbb{F}$, since this filtration decides the form of $E(X_t \mid \mathcal{F}_s)$. In
full terminology, we should actually say that the process $\{X_t\}$ is a martingale
with respect to the filtration $\mathbb{F}$. However, this is excessively pedantic, so
where the filtration $\mathbb{F}$ is understood, we will simply suppress dependence on
it.
A typical example of a martingale is the stochastic process of wealth of a gambler who is facing a fair bet. Suppose, for instance, that the gambler has an initial wealth of $100, and will win or lose $1 depending on whether the outcome of a coin toss is heads or tails. If the coin is unbiased (i.e., if $P(H) = P(T) = \frac{1}{2}$), then the expected change in the gambler's wealth is zero; the conclusion that his wealth process is a martingale follows. However, if the coin is biased (say, we have $P(H) > \frac{1}{2}$), then the expected change in the gambler's wealth is not zero, so the process of wealth fails to be a martingale. This underscores the point that whether a particular process is a martingale or not depends on the underlying probabilities.
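The fair-bet example can be checked numerically. The following is a minimal sketch and not part of the original notes: the function names `expected_next_wealth` and `simulate_mean_wealth`, the 50-toss horizon, the number of simulated paths, and the biased probability 0.6 are all assumptions chosen for illustration.

```python
import random


def expected_next_wealth(wealth, p_heads):
    """One-step conditional expectation E[X_{t+1} | X_t = wealth] for a $1 bet."""
    return p_heads * (wealth + 1) + (1 - p_heads) * (wealth - 1)


def simulate_mean_wealth(p_heads, n_tosses=50, n_paths=10000, seed=0):
    """Monte Carlo estimate of E[X_n] starting from X_0 = 100 (illustrative sizes)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_paths):
        wealth = 100
        for _ in range(n_tosses):
            wealth += 1 if rng.random() < p_heads else -1
        total += wealth
    return total / n_paths


# Fair coin: the conditional expectation equals current wealth (martingale).
print(expected_next_wealth(100, 0.5))
# Biased coin: the conditional expectation exceeds current wealth (not a martingale).
print(expected_next_wealth(100, 0.6))
# Simulated mean wealth after 50 tosses under each coin.
print(simulate_mean_wealth(0.5), simulate_mean_wealth(0.6))
```

Under the fair coin the simulated mean should stay close to the initial $100, while under the assumed bias it should drift by roughly 50 x (2 x 0.6 - 1) = 10 dollars.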
A stochastic process $X_t$, defined on $(\Omega, \mathcal{F}, \mathbb{F}, P)$, is said to be a supermartingale if it is the case that for all $t$ and $s$ with $t > s$, we have
$$E(X_t \mid \mathcal{F}_s) \le X_s.$$
If the inequality is reversed to $\ge$, then the process is called a submartingale. In a rough sense, a supermartingale is the probability-theoretic analog of a decreasing function, since the sample path of a supermartingale decreases on average with time. A submartingale may, likewise, be thought of as the probabilistic equivalent of an increasing function, and a martingale that of a constant function.
Remark 6 Sometimes we encounter the notion of an $L^p$-martingale. $X_t$ is called an $L^p$-martingale iff it is a martingale and $X_t \in L^p$ for each $t$. Now what is $L^p$? Suppose the probability space is $(\Omega, \mathcal{F}, P)$. A set in $\Omega$ of $P$-measure zero is called a $P$-null set. For $p \in [1, \infty)$, $L^p(\Omega, \mathcal{F}, P)$ denotes the vector space of $\mathcal{F}$-measurable functions $\xi : \Omega \to \mathbb{R}$ for which
$$\|\xi\|_p = \left( \int_\Omega |\xi(\omega)|^p \, dP(\omega) \right)^{1/p}$$
is finite. If functions which are equal $P$-a.s. are identified, then $L^p(\Omega, \mathcal{F}, P)$ is a Banach space with norm $\|\cdot\|_p$. In the case $p = 2$, it is also a Hilbert space with inner product given by $(X, Y) = \int_\Omega X(\omega) Y(\omega) \, dP(\omega)$ for $X$ and $Y$ in $L^2(\Omega, \mathcal{F}, P)$.
5.7 Markov Processes
A stochastic process is said to be a Markov process if the future evolution of the system depends only on where the process is currently, and not on how it got there. That is, a stochastic process $X_t$ defined on $(\Omega, \mathcal{F}, \mathbb{F}, P)$ is a Markov process if for any $t_1 < t_2 < \cdots < t_n < t$, and any Borel set $B \in \mathcal{B}$, it is the case that
$$P(X_t \in B \mid X_{t_1}, \ldots, X_{t_n}) = P(X_t \in B \mid X_{t_n}).$$
In words, the definition says that the distribution of $X_t$ conditioned on the values of the process at the earlier times $t_1, \ldots, t_n$ is the same as the distribution from knowing only the value of the process at the latest of these times. This operationalizes the notion that how the process got to its time-$t_n$ value is not relevant.
The wealth process in the gambling problem defined in the previous subsection is also an example of a Markov process. To see this, note that at time $t + 1$, the gambler's wealth increases or decreases by a dollar from its time-$t$ level; thus, the distribution of the gambler's wealth at time $t + 1$ conditioned on the entire history of his wealth levels in all the past periods is the same as the distribution conditioned on only the wealth level in period $t$.
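The Markov property of the gambler's wealth can be verified exactly on a small example by enumerating every sequence of coin tosses. This is an illustrative sketch only: the function name `up_probability_by_history`, the four-toss horizon, and the probability 3/5 of heads are assumptions, not part of the notes.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def up_probability_by_history(p_heads, n_tosses=4, start=100):
    """Exact P(X_n = X_{n-1} + 1 | X_{n-2}, X_{n-1}) by enumerating all toss sequences."""
    numer = defaultdict(Fraction)  # probability of (history, last move is up)
    denom = defaultdict(Fraction)  # probability of the history alone
    for tosses in product((1, -1), repeat=n_tosses):
        prob = Fraction(1)
        path = [start]
        for step in tosses:
            prob *= p_heads if step == 1 else 1 - p_heads
            path.append(path[-1] + step)
        history = (path[-3], path[-2])  # (X_{n-2}, X_{n-1})
        denom[history] += prob
        if tosses[-1] == 1:
            numer[history] += prob
    return {h: numer[h] / denom[h] for h in denom}


probs = up_probability_by_history(Fraction(3, 5))
# Every history (X_{n-2}, X_{n-1}) yields the same conditional probability of an up move.
print(set(probs.values()))
# Same current wealth 101 reached from different past levels 100 and 102.
print(probs[(100, 101)], probs[(102, 101)])
```

Because the conditional probability does not vary with the earlier wealth level, knowing the past beyond the current wealth adds nothing, which is the content of the definition above.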
Chapter 6
Stochastic Calculus
In this chapter, we define Brownian motion, stochastic integration, Ito calculus, the Girsanov theorem and Dynkin's operator, as well as Feynman-Kac solutions. Even though these topics are closely related to each other, we cannot deal with each of them in detail here. Stochastic calculus per se goes far beyond one semester. Hence, we study only a few issues which are really important for our purposes, namely the pricing of contingent claims. Any student who is interested in this can refer to Chung and Williams (1983) and Karatzas and Shreve (1989), among many others.
6.1 Brownian Motions
We fix a probability space $(\Omega, \mathcal{F}, P)$. As we have studied in the previous chapter, a process is a measurable function on $\Omega \times [0, \infty)$ into $\mathbb{R}$. The value of a process $X$ at time $t$ is the random variable written as $X(t)$, $X_t$ or $X(\cdot, t) : \Omega \to \mathbb{R}$.
Definition 19 A standard Brownian motion (SBM) is a process $B$ defined by these properties.
1. $B_0 = 0$ a.s.
2. $(B_s - B_t) \sim N(0, s - t)$ $\forall s > t$.
3. For any times $t_0 < t_1 < \cdots < t_n$, the random variables $B(t_0), B(t_1) - B(t_0), \ldots, B(t_n) - B(t_{n-1})$ are independently distributed.
4. For each $\omega$ in $\Omega$, the sample path $t \mapsto B(\omega, t)$ is continuous.
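Properties 1 to 3 already describe how to sample an SBM at finitely many dates: start at zero and add independent Gaussian increments. The sketch below is an illustration under assumptions (the function name `sample_sbm`, the dates 0.5 and 2.0, and the sample size are hypothetical choices); it builds the draws from the definition and then inspects the moments of the increment.

```python
import math
import random


def sample_sbm(times, rng):
    """Sample (B_{t_1}, ..., B_{t_n}) for increasing times, starting from B_0 = 0."""
    values, level, previous = [], 0.0, 0.0
    for t in times:
        # Property 2: the increment over (previous, t] is N(0, t - previous).
        level += rng.gauss(0.0, math.sqrt(t - previous))
        values.append(level)
        previous = t
    return values


rng = random.Random(1)
draws = [sample_sbm([0.5, 2.0], rng) for _ in range(40000)]
increments = [b2 - b1 for b1, b2 in draws]
mean_inc = sum(increments) / len(increments)
var_inc = sum(x * x for x in increments) / len(increments) - mean_inc ** 2
# Sample covariance proxy between B(0.5) and the later increment B(2.0) - B(0.5).
cov_inc = sum(b1 * (b2 - b1) for b1, b2 in draws) / len(draws)
print(mean_inc, var_inc, cov_inc)  # compare with 0, 2.0 - 0.5 = 1.5, and 0
```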
Another interpretation of Brownian motion is possible (in a more technical sense). Let $\mathcal{M}^2$ be the space of $L^2$-martingales whose time-zero value is zero. The quadratic variation of a martingale $S \in \mathcal{M}^2$ is the unique increasing process denoted $[S]$ such that, for each time $t$,
$$[S]_t = \lim_{n \to \infty} \sum_{i=0}^{2^n - 1} \left[ S(t^n_{i+1}) - S(t^n_i) \right]^2, \tag{6.1}$$
where $t^n_i = i\, 2^{-n} t$ for $0 \le i \le 2^n$. Thus the quadratic variation $[S]_t$ is roughly the limit of sums of squared movements of the process during $[0, t]$, taking the limit as the size of the time intervals over which the movements are measured converges to zero. Now suppose $S \in \mathcal{M}^2$ is continuous. Then $S$ is an SBM iff $[S]_t = t$ $\forall t \ge 0$.
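Relation (6.1) can be explored numerically for an SBM by summing squared Gaussian increments over the dyadic partition. This is a rough sketch under assumptions: the function name `dyadic_quadratic_variation` and the horizon t = 2 are hypothetical, and each call draws a fresh path instead of refining a single path, so it illustrates the size of the sum, not pathwise convergence.

```python
import math
import random


def dyadic_quadratic_variation(t, n, rng):
    """Sum of squared SBM increments over the partition t_i = i * 2**-n * t of [0, t]."""
    dt = t / 2 ** n
    return sum(rng.gauss(0.0, math.sqrt(dt)) ** 2 for _ in range(2 ** n))


rng = random.Random(2)
for n in (4, 8, 12):
    # The sum should concentrate around t = 2.0 as the partition gets finer.
    print(n, dyadic_quadratic_variation(2.0, n, rng))
```

The variance of the sum is 2 t^2 / 2^n, which explains why finer partitions give values closer to t.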
Proposition 5 If $B(t)$ is an SBM, then $B_t^2 - t$ is a martingale.
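A short verification of Proposition 5, added here as a sketch and not taken from the notes, uses only properties 2 and 3 of Definition 19. For $t > s$, write $B_t = B_s + (B_t - B_s)$, where the increment is independent of $\mathcal{F}_s$ with mean zero and variance $t - s$:

```latex
\begin{aligned}
E\!\left[B_t^2 - t \mid \mathcal{F}_s\right]
  &= E\!\left[(B_t - B_s)^2 + 2 B_s (B_t - B_s) + B_s^2 \mid \mathcal{F}_s\right] - t \\
  &= (t - s) + 2 B_s \cdot 0 + B_s^2 - t \\
  &= B_s^2 - s .
\end{aligned}
```

The conditional expectation of the time-$t$ value therefore equals the time-$s$ value, which is the martingale property.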
One thing to note about an SBM is that even though the sample paths are continuous, they are nowhere differentiable. Informally speaking, this means that even as the time intervals over which the movements are measured converge to zero, the variation of the path does not die out (see footnote 1). Because of this nondifferentiability of SBMs, ordinary deterministic integration is not applicable, and we need a special integral called the stochastic integral or Ito integral, named after Ito, who derived it.
6.2 Stochastic Integration
Remember the cumulative capital gain process that we studied in Chapter 3,
$$\sum_{s=0}^{t} \theta(s)\, \Delta p(s).$$
In continuous time, the equivalent expression for this cumulative capital gain process is
$$\int_0^t \theta(s)\, dp(s).$$
Note that $\theta(s)$ and $p(s)$ are stochastic processes. The price process $p(s)$ is
$$p(t) = p(0) + A_t + M_t,$$
where $A_t$ is a (predictable) part with finite variation and $M_t$ is an irregular innovative part without finite variation. Then the cumulative capital gain process can be rewritten as
$$\int_0^t \theta(s)\, dp(s) = \int_0^t \theta(s)\, dA_s + \int_0^t \theta(s)\, dM_s. \tag{6.2}$$
The first term on the RHS is a deterministic integration whereas the second term is a stochastic integration.
Footnote 1: That is, the increment of an SBM over a small time period $\Delta t$ is typically of order $(\Delta t)^{1/2}$.
First, we recall the definition of deterministic integration. Suppose $F : [0, \infty) \to \mathbb{R}$ is a deterministic right-continuous function of time of finite variation (see footnote 2). Further, suppose $g : [0, \infty) \to \mathbb{R}$ is continuous. We then have the following deterministic integration:
Definition 20 (Stieltjes Integral) For any time $t \in [0, \infty)$,
$$\int_0^t g(s)\, dF(s) = \lim_{n \to \infty} \sum_{i=0}^{2^n - 1} g(t^n_{i+1}) \left[ F(t^n_{i+1}) - F(t^n_i) \right]$$
is the Stieltjes integral of $g$ with respect to $F$, where $t^n_i$ is defined as for relation (6.1).
Now, the first term in (6.2) is well-defined by the Stieltjes integral because $A_t$ has finite variation (under the assumption that $A_t$ is right-continuous), so if $A_t(\omega) = \int_0^t a_s(\omega)\, ds$, then
$$\int_0^t \theta(\omega, s)\, dA(s, \omega) = \int_0^t \theta(\omega, s)\, a_s(\omega)\, ds.$$
This integral is defined separately for each fixed time $t$ and each fixed state of the world $\omega$. Therefore, for a given $\omega$, it is tantamount to a deterministic integral, so it is called a random Stieltjes integral.
The stochastic integration is, conceptually, similar to the integration of arbitrary random variables. We first define a simple or elementary stochastic process $\theta_n(t)$, and we show that as this sequence converges to $\theta(s)$, the sequence $\int \theta_n\, dS$ converges to $\int \theta\, dS$.
For simplicity, we first restrict ourselves to a time interval $\mathcal{T} = [0, T]$ of finite length. It can be easily extended to the infinite horizon case. A stochastic process $\theta$ is elementary if there is a partition of $\mathcal{T}$ of the form $[0, t_1], (t_1, t_2], \ldots, (t_k, T]$ such that $\theta_t$ is constant over each set in the partition,
$$\theta(t) = \theta(t_n), \quad t \in (t_{n-1}, t_n].$$
Footnote 2: By finite variation, we mean only that $F$ is of the form $G - H$, where $G$ and $H$ are increasing functions of time.
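Now we formally define the stochastic integral.
Definition 21 For any martingale $S \in \mathcal{M}^2$ and any $\theta$ in $\mathcal{H}^2$, where
$$\mathcal{H}^2 = \left\{ \theta \in L^2 \;\Big|\; E\left[ \int_0^T \theta_t^2\, dt \right] < \infty \right\},$$
there exists a sequence $\theta_n$ of elementary processes in $\mathcal{H}^2$ such that
$$E\left[ \int_0^T \left[ \theta_n(t) - \theta(t) \right]^2 dt \right] \to 0.$$
Then, there is a unique martingale denoted $\int \theta\, dS \in \mathcal{M}^2$ such that, for any such sequence $\theta_n$, the sequence of martingales
$$\int_0^T \theta_n\, dS = \sum_{i=0}^{N-1} \theta_n(t_{i+1}) \left[ S(t_{i+1}) - S(t_i) \right]$$
for $t_N = T$ converges to $\int \theta\, dS$; i.e.,
$$E\left[ \left( \int_0^T \theta(t)\, dS(t) - \int_0^T \theta_n(t)\, dS_t \right)^2 \right] \to 0.$$
Of course, the martingale $S_t$ can in particular be an SBM $B_t$. The following remark summarizes the properties of stochastic integrals.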
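Remark 7 Given $(\Omega, \mathcal{F}, P)$ and an SBM $B_t$,
1. $\int_0^t (a\,x(s) + b\,y(s))\, dB(s) = a \int_0^t x(s)\, dB(s) + b \int_0^t y(s)\, dB(s)$
2. $\int_0^t x(s)\, dB(s) = \int_0^\tau x(s)\, dB(s) + \int_\tau^t x(s)\, dB(s)$ for $0 \le \tau \le t$
3. The process $Y(t) \equiv \int_0^t x(s)\, dB(s)$ is a martingale
4. $E\left[ \int_0^t x(s)\, dB(s) \int_0^t y(s)\, dB(s) \right] = E\left[ \int_0^t x(s)\, y(s)\, ds \right]$
The last property is very important, and it is the direct result of
$$\mathrm{var}(dB(s)) = E[(dB(s))^2] = ds.$$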
From this chapter on, we assume that all the regularity conditions are satisfied, so that the stochastic integrals are well defined.
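Properties 3 and 4 of Remark 7 can be illustrated by Monte Carlo with the integrand $x = y = B$ on $[0, 1]$. The sketch below is illustrative, not the notes' own construction: it assumes the usual Ito convention that the integrand on each subinterval is evaluated using only information available at the left endpoint, and the function name `ito_integral_of_b`, the 50-step grid, and the sample size are hypothetical.

```python
import math
import random


def ito_integral_of_b(n_steps, rng, horizon=1.0):
    """Left-endpoint sum of B(t_i) * [B(t_{i+1}) - B(t_i)], approximating int_0^T B dB."""
    dt = horizon / n_steps
    level, integral = 0.0, 0.0
    for _ in range(n_steps):
        increment = rng.gauss(0.0, math.sqrt(dt))
        integral += level * increment  # integrand uses only information up to t_i
        level += increment
    return integral


rng = random.Random(4)
values = [ito_integral_of_b(50, rng) for _ in range(20000)]
mean_value = sum(values) / len(values)
second_moment = sum(v * v for v in values) / len(values)
print(mean_value)     # property 3 suggests a value near 0
print(second_moment)  # property 4 with x = y = B suggests a value near int_0^1 s ds = 0.5
```

With a finite grid the second moment is slightly below 0.5, since the discrete sum of $t_i\,\Delta t$ underestimates $\int_0^1 s\, ds$.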
6.3 Stochastic Differential Equations
An SBM $B$ in $\mathbb{R}^d$ is given on some probability space $(\Omega, \mathcal{F}, P)$, along with the standard filtration $\mathbb{F}$ of $B$. A stochastic differential equation (SDE) is an expression of the form
$$dX_t = \mu(X_t, t)\, dt + \sigma(X_t, t)\, dB_t, \tag{6.3}$$
where $\mu : \mathbb{R}^N \times [0, \infty) \to \mathbb{R}^N$ and $\sigma : \mathbb{R}^N \times [0, \infty) \to \mathbb{R}^{N \times d}$ are given functions. Breaking up the time interval into small subintervals and summing the increments, we say that $X$ is a solution to equation (6.3) if:
$$X_t = X_0 + \int_0^t \mu(X_s, s)\, ds + \int_0^t \sigma(X_s, s)\, dB(s). \tag{6.4}$$
Note that $\int_0^t \sigma(X_s, s)\, dB_s \equiv \sum_{i=1}^{d} \int_0^t \sigma_i\, dB^i_s$. What conditions on the drift $\mu$ and diffusion $\sigma$ ensure that (6.3) is well defined? Is the resulting Ito process $X$ Markov? We will supply sufficient conditions. First, we extend the definition of the Euclidean norm $\|\cdot\|$ to the vector space of $\mathbb{R}^{N \times d}$ matrices by writing $\|D\| \equiv [\mathrm{tr}(D D^\top)]^{1/2}$, where $\mathrm{tr}(D) = \sum_i D_{ii}$.
Definition 22 (Lipschitz condition and Growth condition) Let $f : \mathbb{R}^N \times [0, \infty) \to \mathbb{R}^{N \times d}$. If there exists a scalar $k$ such that
$$\| f(x, t) - f(y, t) \| \le k\, \| x - y \|$$
$\forall x$ and $y$ in $\mathbb{R}^N$ and $t \ge 0$, we say that $f$ satisfies a Lipschitz condition (in $x$). If there exists a scalar $k$ such that
$$\| f(x, t) \| \le k\, (1 + \| x \|)$$
$\forall x$ in $\mathbb{R}^N$ and $t \ge 0$, we say that $f$ satisfies a growth condition.
Proposition 6 Suppose $\mu$ and $\sigma$ are Borel measurable and satisfy the Lipschitz and growth conditions. Then there exists a unique $\mathbb{R}^N$-valued Ito process $X$ satisfying
$$X_t = X_0 + \int_0^t \mu(X_s, s)\, ds + \int_0^t \sigma(X_s, s)\, dB(s).$$
Furthermore, $X$ is a continuous process and Markov with respect to $\mathbb{F}$. Finally, the Ito integral $\int \sigma(X_t, t)\, dB_t$ is a martingale.
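When the conditions of Proposition 6 hold, the integral equation suggests a simple discretization: replace $ds$ by a small step and $dB$ by a Gaussian increment. The following Euler-type scheme is a sketch under assumptions and is not the method of the notes; the function name `euler_maruyama` and the mean-reverting drift and constant diffusion used in the example are hypothetical choices that satisfy the Lipschitz and growth conditions.

```python
import math
import random


def euler_maruyama(mu, sigma, x0, horizon, n_steps, rng):
    """Approximate X_T for the scalar SDE dX = mu(X, t) dt + sigma(X, t) dB."""
    dt = horizon / n_steps
    x, t = x0, 0.0
    for _ in range(n_steps):
        db = rng.gauss(0.0, math.sqrt(dt))  # SBM increment over one step
        x += mu(x, t) * dt + sigma(x, t) * db
        t += dt
    return x


# Hypothetical example: drift 2(1 - x) and constant diffusion 0.5.
def drift(x, t):
    return 2.0 * (1.0 - x)


def diffusion(x, t):
    return 0.5


rng = random.Random(5)
terminal = [euler_maruyama(drift, diffusion, 0.0, 1.0, 100, rng) for _ in range(5000)]
sample_mean = sum(terminal) / len(terminal)
# For this linear drift, E[X_t] solves m' = 2(1 - m), so E[X_1] = 1 - exp(-2).
print(sample_mean, 1.0 - math.exp(-2.0))
```

The comparison works because the stochastic integral has zero mean, so only the drift contributes to the expected value.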
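6.4 Ito's Lemma
Let $C^m(\mathbb{R}^N)$ denote the set of real-valued functions $f$ on $\mathbb{R}^N$ that are $m$ times continuously differentiable. For example, if $f \in C^2(\mathbb{R}^N)$ then the gradient
$$\nabla f(x) = \left( \frac{\partial f(x)}{\partial x_1}, \ldots, \frac{\partial f(x)}{\partial x_N} \right)$$
and the Hessian
$$\nabla^2 f(x) = \left[ \frac{\partial^2 f(x)}{\partial x_i\, \partial x_j} \right]_{N \times N}$$
exist $\forall x$, and define continuous functions on $\mathbb{R}^N$.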
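Lemma 2 (Ito's Lemma) Suppose $X$ is an $\mathbb{R}^N$-valued Ito process of the form $X_t = x + \int_0^t \mu_s\, ds + \int_0^t \sigma_s\, dB_s$. Then $\forall f \in C^2(\mathbb{R}^N \times [0, \infty))$ and $t \ge 0$,
$$f(X_t, t) = f(x, 0) + \int_0^t \mathcal{D} f(X_s, s)\, ds + \int_0^t \nabla f(X_s, s)\, \sigma_s\, dB_s,$$
where
$$\mathcal{D} f(X_t, t) = f_t(X_t, t) + \nabla f(X_t, t)\, \mu_t + \frac{1}{2}\, \mathrm{tr}\left[ \sigma_t^\top\, \nabla^2 f(X_t, t)\, \sigma_t \right],$$
and $f_t(x, t) = \partial f(x, t) / \partial t$.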
In the above lemma, $\mathcal{D} f(\cdot, t)$ is sometimes called Dynkin's operator. Along with Ito's Lemma one usually reads heuristic justifications of the term $\frac{1}{2} \sigma_t^2\, \nabla^2 f(X_t, t)$ as the limit of the second-order terms of the Taylor series expansion of $f(X_{t+\Delta}, t+\Delta) - f(X_t, t)$ as $\Delta \to 0$, using the fact that an SBM $B$ has quadratic variation $[B]_t = t$ $\forall t \ge 0$. While this is the central idea of most proofs, there are many other details.
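As a worked illustration of the lemma (an added example, not part of the original notes), take $N = d = 1$ and assume a positive process $S$ with $\mu_t = \alpha S_t$ and $\sigma_t = \beta S_t$ for hypothetical constants $\alpha$ and $\beta > 0$. Applying the lemma to $f(x, t) = \ln x$, and glossing over the fact that $\ln$ is only defined for $x > 0$, gives:

```latex
\begin{aligned}
f_t &= 0, \qquad \nabla f(x) = \frac{1}{x}, \qquad \nabla^2 f(x) = -\frac{1}{x^2}, \\
\mathcal{D} f(S_t, t) &= \frac{1}{S_t}\, \alpha S_t
   + \frac{1}{2}\, (\beta S_t)^2 \left( -\frac{1}{S_t^2} \right)
   = \alpha - \tfrac{1}{2} \beta^2, \\
\ln S_t &= \ln S_0 + \left( \alpha - \tfrac{1}{2} \beta^2 \right) t + \beta B_t .
\end{aligned}
```

The correction $-\tfrac{1}{2}\beta^2$ is exactly the second-order term discussed above; an ordinary chain rule would have produced the drift $\alpha$ alone.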
6.5 Girsanov Theorem
Subjectively, I think this theorem is one of the most important results for the valuation of contingent claims. In the previous chapter, we studied the Radon-Nikodym theorem at length, and how important it is for asset pricing. The Radon-Nikodym derivative divided by $1 + r_f$ is the pricing kernel which hundreds of different asset pricing models try to derive.
In continuous time, the Girsanov theorem deals with the construction of a Brownian motion under a change of probability measure. We will only study the application to Ito processes with the time set $[0, T]$. The filtered probability space is $(\Omega, \mathcal{F}_T, \mathbb{F}, P)$, and in particular, the standard filtration $\mathbb{F}$ is defined over $B = \{B_t : 0 \le t \le T\}$. A vector $\eta$ of processes in $L^2$ satisfies Novikov's condition if
$$E\left[ \exp\left( \frac{1}{2} \int_0^T \eta_s^\top \eta_s\, ds \right) \right] < \infty.$$
The following lemma establishes how to change a probability measure.
Lemma 3 If $\eta \in L^2$ satisfies Novikov's condition, then the process $\xi(\eta)$ defined by
$$\xi(\eta)_t = \exp\left( \int_0^t \eta_s^\top\, dB_s - \frac{1}{2} \int_0^t \eta_s^\top \eta_s\, ds \right), \quad t \in [0, T],$$
is a positive martingale and $E[\xi(\eta)_T] = 1$.
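Proof Assume $\eta$ is bounded, and let $X$ be the Ito process
$$X_t = \int_0^t \eta_s^\top\, dB_s - \frac{1}{2} \int_0^t \eta_s^\top \eta_s\, ds, \quad t \in [0, T].$$
Then $\xi(\eta)_t = e^{X_t}$, and by Ito's Lemma,
$$\xi(\eta)_t = 1 + \int_0^t \xi(\eta)_s\, \eta_s^\top\, dB_s. \tag{6.5}$$
Since $\eta$ is bounded, it follows that
$$E\left[ \left( \int_0^T \xi(\eta)_s^2\, \eta_s^\top \eta_s\, ds \right)^{1/2} \right] < \infty.$$
Then $\xi(\eta)\, \eta \in \mathcal{H}^2$, so that the stochastic integral $\int \xi(\eta)_t\, \eta_t^\top\, dB_t$ is a martingale, implying from (6.5) that $\xi(\eta)_t$ is a martingale and that $E[\xi(\eta)_T] = 1$. □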
Under the assumptions of this lemma, the random variable $\xi(\eta)_T$ is positive and has unit expectation. We can therefore define a probability measure $Q^\eta$ on $(\Omega, \mathcal{F})$ by
$$Q^\eta(A) = E^P\left[ \xi(\eta)_T\, 1_A \right], \quad \forall A \in \mathcal{F},$$
meaning $Q^\eta$ has Radon-Nikodym derivative $dQ^\eta / dP = \xi(\eta)_T$. Now we establish the Girsanov Theorem.
Theorem 11 Suppose $\eta$ satisfies Novikov's condition. Then the $\mathbb{R}^N$-valued Ito process $\hat{B}$ defined by
$$\hat{B}_t = B_t - \int_0^t \eta_s\, ds, \quad t \in [0, T],$$
is an SBM in $\mathbb{R}^N$ for the filtered probability space $(\Omega, \mathcal{F}, \mathbb{F}, Q^\eta)$.
The Girsanov theorem basically allows one to change the drift of a given Ito process by changing the probability measure.
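The change of measure can be illustrated in the simplest case of a scalar SBM and a constant kernel. The sketch below is only an illustration under assumptions: the function name `girsanov_check`, the constant value 0.5 for the kernel (called `eta` here), and the sample size are hypothetical. Expectations under the new measure are computed as $P$-expectations weighted by the exponential martingale evaluated at $T$.

```python
import math
import random


def girsanov_check(eta, horizon=1.0, n_paths=50000, seed=6):
    """Estimate E_P[xi_T], E_Q[B_hat_T] and E_Q[B_hat_T^2] for a constant eta."""
    rng = random.Random(seed)
    sum_xi = sum_shifted = sum_shifted_sq = 0.0
    for _ in range(n_paths):
        b_T = rng.gauss(0.0, math.sqrt(horizon))  # B_T sampled under P
        # Density dQ/dP for constant eta: exp(eta * B_T - 0.5 * eta^2 * T).
        xi_T = math.exp(eta * b_T - 0.5 * eta ** 2 * horizon)
        b_hat = b_T - eta * horizon  # B_hat_T = B_T - int_0^T eta ds
        sum_xi += xi_T
        sum_shifted += xi_T * b_hat  # Q-expectation computed as E_P[xi_T * .]
        sum_shifted_sq += xi_T * b_hat ** 2
    return sum_xi / n_paths, sum_shifted / n_paths, sum_shifted_sq / n_paths


mean_xi, q_mean, q_second = girsanov_check(0.5)
print(mean_xi)   # unit expectation of the density suggests a value near 1
print(q_mean)    # a driftless shifted process under Q suggests a value near 0
print(q_second)  # variance T of an SBM at time T suggests a value near 1
```

Under $P$ the shifted process has mean $-0.5$ at time 1 in this example; the weighting by the density is what removes that drift.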
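Proposition 7 Let $X$ be the $\mathbb{R}^N$-valued Ito process on $(\Omega, \mathcal{F}, \mathbb{F}, P)$ satisfying the SDE
$$dX_t = \mu(X_t, t)\, dt + \sigma(X_t, t)\, dB_t. \tag{6.6}$$
Then, there exists a probability measure $Q^\eta$ on $(\Omega, \mathcal{F})$ equivalent to $P$ and an SBM $\hat{B}$ in $\mathbb{R}^N$ for $(\Omega, \mathcal{F}, \mathbb{F}, Q^\eta)$ such that the Ito process $X$ defined by equation (6.6) also obeys the stochastic differential equation
$$dX_t = \left[ \mu(X_t, t) + \sigma(X_t, t)\, \eta_t \right] dt + \sigma(X_t, t)\, d\hat{B}_t. \tag{6.7}$$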
