Active Portfolio Management

Lectures
1 Richard R. Lindsey
Portfolio Choice
Individual:
1. Strictly prefers more to less (strictly increasing utility
function)
2. Risk averse
$w_0$ = initial wealth
$r_f$ = riskless interest rate
$\tilde{r}_j$ = random return on the j-th risky asset
$a_j$ = dollar investment in the j-th asset
$\tilde{w}$ = uncertain end-of-period wealth
Portfolio Choice
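\[ \tilde{w} = \Big(w_0 - \sum_j a_j\Big)(1 + r_f) + \sum_j a_j (1 + \tilde{r}_j) = w_0 (1 + r_f) + \sum_j a_j (\tilde{r}_j - r_f) \]
\[ \max_{\{a_j\}} E\Big[U\Big(w_0 (1 + r_f) + \sum_j a_j (\tilde{r}_j - r_f)\Big)\Big] \]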
Portfolio Choice
F.O.C.: $E[U'(\tilde{w})(\tilde{r}_j - r_f)] = 0 \quad \forall j$
S.O.C.: $E[U''(\tilde{w})(\tilde{r}_j - r_f)^2] < 0 \quad \forall j$
$U'(\cdot) > 0$: more preferred to less
$U''(\cdot) < 0$: concave utility or risk averse
Portfolio Choice
Theorem: An individual who is risk averse and strictly prefers more to less will invest in risky assets iff the expected rate of return on at least one asset exceeds $r_f$.
Consider the case with a single risky asset
F.O.C.: $E[U'(\tilde{w})(\tilde{r} - r_f)] = 0$
Portfolio Choice
Claim:
$a^* > 0$ iff $E[\tilde{r}] - r_f > 0$
$a^* = 0$ iff $E[\tilde{r}] - r_f = 0$
$a^* < 0$ iff $E[\tilde{r}] - r_f < 0$
Consider the no investment case
\[ E[U'(w_0(1 + r_f))(\tilde{r} - r_f)] = U'(w_0(1 + r_f))\,(E[\tilde{r}] - r_f) \]
Portfolio Choice
$U'(\cdot) > 0$, so the sign is entirely determined by $E[\tilde{r}] - r_f$:
$E[\tilde{r}] - r_f > 0$: can increase utility by adding some of the risky asset
$E[\tilde{r}] - r_f < 0$: can increase utility by shorting some of the risky asset
$E[\tilde{r}] - r_f = 0$: utility is maximized
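The claim can be checked numerically. The following is a minimal sketch, assuming negative exponential utility $U(w) = -e^{-\gamma w}$ and a hypothetical two-state risky asset; the function name, the value of `gamma`, and the return figures are illustrative assumptions, not part of the lecture.

```python
import math

def optimal_cara_demand(returns, probs, rf, gamma=2.0, lo=-100.0, hi=100.0):
    """Solve the F.O.C. E[U'(w)(r - rf)] = 0 for U(w) = -exp(-gamma * w).

    For this utility the factor exp(-gamma * w0 * (1 + rf)) is a positive
    constant and drops out of the F.O.C., so initial wealth is not needed.
    """
    excess = [r - rf for r in returns]

    def foc(a):
        # Proportional to E[U'(w)(r - rf)]; strictly decreasing in a.
        return sum(p * x * math.exp(-gamma * a * x) for p, x in zip(probs, excess))

    for _ in range(200):  # bisection on the bracket [lo, hi]
        mid = 0.5 * (lo + hi)
        if foc(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical asset: returns of +30% or -10% with equal probability, E[r] = 10%.
returns, probs = [0.30, -0.10], [0.5, 0.5]
a_pos = optimal_cara_demand(returns, probs, rf=0.05)   # positive risk premium
a_zero = optimal_cara_demand(returns, probs, rf=0.10)  # zero risk premium
a_neg = optimal_cara_demand(returns, probs, rf=0.15)   # negative risk premium
```

With these inputs the optimal position is long, flat, and short respectively, matching the sign of $E[\tilde{r}] - r_f$.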
Portfolio Choice
In the multi-asset case, to hold no risky assets or to short them
\[ E[U'(w_0(1 + r_f))(\tilde{r}_j - r_f)] \le 0 \quad \forall j \]
And again
\[ U'(w_0(1 + r_f))\,(E[\tilde{r}_j] - r_f) \le 0 \quad \forall j \]
Therefore, a risk averse individual with strictly increasing utility avoids any positive investment in risky assets only if none of the investments have a positive risk premium.
\[ a_j \le 0 \;\; \forall j \text{ only if } E[\tilde{r}_j] - r_f \le 0 \;\; \forall j \]
Portfolio Choice
When one or more of the risky assets has a positive risk premium, the investor will have positive holdings in some risky assets:
\[ \exists\, j : a_j > 0 \text{ if } \exists\, j' : E[\tilde{r}_{j'}] - r_f > 0 \]
Note that $j$ and $j'$ are not necessarily the same because with more than one risky asset, a positive risk premium on an asset does not necessarily mean a positive investment (e.g. 2 assets with positive risk premia but one stochastically dominates the other).
Risk Aversion
Consider now the case with one risky asset and one riskless
asset.
For a monotonically increasing, strictly concave (MISC) individual to invest all her wealth in the risky asset:
\[ E[U'(w_0(1 + \tilde{r}))(\tilde{r} - r_f)] \ge 0 \]
Take a 1st order Taylor series expansion of $U'(w_0(1 + \tilde{r}))$ around $U'(w_0(1 + r_f))$.
Risk Aversion
\[ E[U'(w_0(1 + \tilde{r}))(\tilde{r} - r_f)] = U'(w_0(1 + r_f))\,E[\tilde{r} - r_f] + U''(w_0(1 + r_f))\,w_0\,E[(\tilde{r} - r_f)^2] + o(E[(\tilde{r} - r_f)^2]) \]
Note that this is for a small risk.
The minimum risk premium to induce full investment is
\[ E[\tilde{r} - r_f] \ge -\frac{U''(w_0(1 + r_f))}{U'(w_0(1 + r_f))}\,w_0\,E[(\tilde{r} - r_f)^2] = R_A(w_0(1 + r_f))\,w_0\,E[(\tilde{r} - r_f)^2] \]
where $R_A(z) = -U''(z)/U'(z)$.
Risk Aversion
This is known as the Arrow-Pratt measure of absolute risk aversion (the inverse of $R_A$ is the risk tolerance).
For small risks (or small changes in risk) it is a measure of the intensity of an individual's aversion to risk.
It is a measure of curvature (but since von Neumann-Morgenstern utility is unique only up to positive affine transformations, the 2nd derivative alone is not sufficient; dividing by $U'$ removes the dependence on scale).
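The invariance to affine transformations can be illustrated numerically. This is a minimal sketch, assuming log utility as the test case and central finite differences for the derivatives; the function names and the step size `h` are illustrative assumptions.

```python
import math

def absolute_risk_aversion(U, z, h=1e-4):
    """Arrow-Pratt R_A(z) = -U''(z) / U'(z), using central finite differences."""
    first = (U(z + h) - U(z - h)) / (2 * h)
    second = (U(z + h) - 2 * U(z) + U(z - h)) / (h * h)
    return -second / first

def log_utility(z):
    return math.log(z)

def rescaled_log_utility(z):
    # A positive affine transformation of log utility: same preferences.
    return 3.0 * math.log(z) + 7.0

ra_log = absolute_risk_aversion(log_utility, 2.0)            # analytically 1/z
ra_rescaled = absolute_risk_aversion(rescaled_log_utility, 2.0)
risk_tolerance = 1.0 / ra_log                                # analytically z
```

The second derivative of the rescaled utility is three times larger, yet both utilities give the same $R_A$, which is why the ratio rather than $U''$ alone is used.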
Risk Aversion
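$\dfrac{dR_A(z)}{dz} < 0 \;\; \forall z$: decreasing absolute risk aversion
$\dfrac{dR_A(z)}{dz} > 0 \;\; \forall z$: increasing absolute risk aversion
$\dfrac{dR_A(z)}{dz} = 0 \;\; \forall z$: constant absolute risk aversion
Theorem:
$\dfrac{da}{dw_0} > 0 \;\; \forall w_0$ if $\dfrac{dR_A(z)}{dz} < 0 \;\; \forall z$
$\dfrac{da}{dw_0} < 0 \;\; \forall w_0$ if $\dfrac{dR_A(z)}{dz} > 0 \;\; \forall z$
$\dfrac{da}{dw_0} = 0 \;\; \forall w_0$ if $\dfrac{dR_A(z)}{dz} = 0 \;\; \forall z$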
Risk Aversion
Decreasing absolute risk aversion implies that the risky asset
is a normal good (i.e. the dollar demand increases as
wealth increases).
Increasing absolute risk aversion implies that the risky asset
is an inferior good (i.e. the dollar demand decreases as
wealth increases).
Constant absolute risk aversion implies that the dollar
demand is invariant with respect to wealth.
Risk Aversion
Absolute risk aversion is therefore related to the dollar
demand for the risky asset.
But under decreasing absolute risk aversion, an individual
may actually increase, hold constant, or decrease the
proportion of wealth in the risky asset as wealth increases.
This brings us to the Arrow-Pratt measure of relative risk aversion
\[ R_R(z) = z\,R_A(z) \]
Risk Aversion
Theorem:
$\eta > 1$ if $\dfrac{dR_R(z)}{dz} < 0$ (relatively elastic)
$\eta = 1$ if $\dfrac{dR_R(z)}{dz} = 0$
$\eta < 1$ if $\dfrac{dR_R(z)}{dz} > 0$ (relatively inelastic)
where
\[ \eta = \frac{da}{dw_0}\,\frac{w_0}{a} \]
is the wealth elasticity of demand.
Risk Aversion
$\eta < 1$: the proportion of the agent's initial wealth invested in the risky asset decreases as wealth increases
$\eta = 1$: the proportion of the agent's initial wealth invested in the risky asset is constant as wealth increases
$\eta > 1$: the proportion of the agent's initial wealth invested in the risky asset increases as wealth increases
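The middle case can be illustrated with log utility, for which relative risk aversion is constant. This is a minimal sketch, assuming a hypothetical two-state risky asset and a bisection solver for the first-order condition; all names and numbers are illustrative assumptions.

```python
def optimal_dollar_demand(marginal_utility, w0, rf, returns, probs):
    """Solve E[U'(w)(r - rf)] = 0 for the dollar demand a by bisection.

    Assumes excess returns take both signs and U' is positive and decreasing.
    The bracket keeps end-of-period wealth strictly positive in every state.
    """
    base = w0 * (1.0 + rf)
    excess = [r - rf for r in returns]
    lo = -base / max(excess) * (1.0 - 1e-9)
    hi = base / -min(excess) * (1.0 - 1e-9)

    def foc(a):
        return sum(p * marginal_utility(base + a * x) * x
                   for p, x in zip(probs, excess))

    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if foc(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def log_marginal_utility(w):
    return 1.0 / w   # U(w) = ln(w), so relative risk aversion is 1 everywhere

returns, probs, rf = [0.30, -0.10], [0.5, 0.5], 0.05
a_100 = optimal_dollar_demand(log_marginal_utility, 100.0, rf, returns, probs)
a_200 = optimal_dollar_demand(log_marginal_utility, 200.0, rf, returns, probs)
# Arc elasticity of dollar demand with respect to initial wealth.
eta = ((a_200 - a_100) / (200.0 - 100.0)) * (150.0 / (0.5 * (a_100 + a_200)))
```

Doubling wealth doubles the dollar demand here, so the proportion invested is unchanged and the elasticity equals one.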
Linear Risk Tolerance Utility
To get sharper results and closed form solution for securities
holdings, we need to specify the form of the utility
function. Most typically we use a class of utility function
known as linear risk tolerance (LRT) utilities or HARA
utilities (hyperbolic absolute risk aversion). These utility
functions satisfy state independence and time additivity.
Definition: Linear risk tolerance utility. The time-additive and state-independent utility function U( ) satisfies linear risk tolerance if it solves the differential equation:
\[ -\frac{U'(z)}{U''(z)} = \alpha + \beta z \]
where $\alpha$ and $\beta$ are independent of z.
Note: every LRT utility function is identified by 2 parameters: the intercept $\alpha$ and the slope $\beta$.
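This differential equation has three sets of solutions depending on the value of $\beta$:
(A) $\beta \ne 0, 1$: $U(z) \sim \dfrac{(\alpha + \beta z)^{1 - 1/\beta}}{\beta - 1}$ where $\alpha + \beta z > 0$
(B) $\beta = 1$: $U(z) \sim \ln(\alpha + z)$
(C) $\beta = 0$: $U(z) \sim -\alpha \exp(-z/\alpha)$ where $\alpha > 0$
where $\sim$ means that the solutions are unique up to a positive linear transform.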
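As a check on the three solution classes, the risk tolerance $-U'(z)/U''(z)$ can be computed numerically and compared with $\alpha + \beta z$. This is a minimal sketch using central finite differences; the parameter values and function names are illustrative assumptions.

```python
import math

def risk_tolerance(U, z, h=1e-4):
    """Numerical risk tolerance T(z) = -U'(z) / U''(z) by central differences."""
    first = (U(z + h) - U(z - h)) / (2 * h)
    second = (U(z + h) - 2 * U(z) + U(z - h)) / (h * h)
    return -first / second

def power_utility(alpha, beta):      # class (A): beta not in {0, 1}
    return lambda z: (alpha + beta * z) ** (1.0 - 1.0 / beta) / (beta - 1.0)

def log_utility(alpha):              # class (B): beta = 1
    return lambda z: math.log(alpha + z)

def exponential_utility(alpha):      # class (C): beta = 0, alpha > 0
    return lambda z: -alpha * math.exp(-z / alpha)

z = 2.0
t_power = risk_tolerance(power_utility(1.0, 0.5), z)    # alpha + beta z = 2
t_quad = risk_tolerance(power_utility(10.0, -1.0), z)   # quadratic case: 10 - z = 8
t_log = risk_tolerance(log_utility(1.0), z)             # alpha + z = 3
t_exp = risk_tolerance(exponential_utility(4.0), z)     # alpha = 4
```

In each case the numerical risk tolerance is linear in z with intercept $\alpha$ and slope $\beta$.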
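These three classes are:
(A) Generalized Power Utility (when $\alpha \ne 0$)
\[ R_A(z) = \frac{1}{\alpha + \beta z} \]
\[ \frac{dR_A(z)}{dz} = -\frac{\beta}{(\alpha + \beta z)^2} < 0 \quad (\text{for } \beta > 0) \]
\[ R_R(z) = \frac{z}{\alpha + \beta z} \]
\[ \frac{dR_R(z)}{dz} = \frac{\alpha}{(\alpha + \beta z)^2} \]
Which is
$> 0$ iff $\alpha > 0$
$= 0$ iff $\alpha = 0$
$< 0$ iff $\alpha < 0$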
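Recall from Risk Aversion
Theorem:
$\eta > 1$ if $\dfrac{dR_R(z)}{dz} < 0$
$\eta = 1$ if $\dfrac{dR_R(z)}{dz} = 0$
$\eta < 1$ if $\dfrac{dR_R(z)}{dz} > 0$
where $\eta = \dfrac{da}{dw_0}\,\dfrac{w_0}{a}$.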
When $\alpha = 0$ we have power utility which is CPRA or constant proportional (relative) risk aversion. Also known as iso-elastic utility.
The proportion of wealth in the risky asset is invariant to changes in wealth.
When $\beta = -1$ we have quadratic utility.
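(B) Generalized Log Utility (when $\alpha \ne 0$)
\[ R_A(z) = \frac{1}{\alpha + z} \]
\[ \frac{dR_A(z)}{dz} = -\frac{1}{(\alpha + z)^2} < 0 \]
\[ R_R(z) = \frac{z}{\alpha + z} \]
\[ \frac{dR_R(z)}{dz} = \frac{\alpha}{(\alpha + z)^2} \]
Which is
$> 0$ iff $\alpha > 0$
$= 0$ iff $\alpha = 0$
$< 0$ iff $\alpha < 0$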
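Recall from Risk Aversion
Theorem:
$\eta > 1$ if $\dfrac{dR_R(z)}{dz} < 0$
$\eta = 1$ if $\dfrac{dR_R(z)}{dz} = 0$
$\eta < 1$ if $\dfrac{dR_R(z)}{dz} > 0$
where $\eta = \dfrac{da}{dw_0}\,\dfrac{w_0}{a}$.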
When $\alpha = 0$ we have log utility which is CPRA or constant proportional (relative) risk aversion. Also known as iso-elastic utility.
The proportion of wealth in the risky asset is invariant to changes in wealth.
Note when $\alpha = 0$ we have $R_R(z) = 1$.
(C) Negative Exponential Utility
Constant absolute risk aversion (CARA)
Dollar demand for risky assets is unaffected by changes in
wealth (riskless borrowing or lending absorbs all
changes).
\[ R_A(z) = \frac{1}{\alpha} \]
\[ \frac{dR_A(z)}{dz} = 0 \]
Stochastic Dominance
Empirical Observations               Properties of U(z)
Investors prefer more to less        $U'(z) > 0$
Investors are risk averse            $U''(z) < 0$
The risky asset is a normal good     $dR_A(z)/dz < 0$
We now want to relate these three properties of utility
functions to the properties of payoff distributions.
For example, one question we can ask is: Under what
circumstances can we unambiguously say that an
individual will prefer one risky asset to another if all
we know is that he prefers more to less?
We can answer questions like this using stochastic
dominance.
Note that stochastic dominance is:
1. Always a pairwise comparison.
2. Only a partial ordering among risky assets.
3. Much richer than what we will cover here (e.g. you can
develop much of modern portfolio theory just using
stochastic dominance).
Definition: First Order Stochastic Dominance
Let $F(x) = \Pr[X \le x]$, let $F_A(x)$ and $F_B(x)$ be different distributions, and let $a$ be a point with $F(a) = 0$.
If $F_A(x) - F_B(x) \le 0 \;\; \forall x$ and $< 0$ for some $x$,
then $X_A$ FSD $X_B$.
Definition: Second Order Stochastic Dominance
If $\displaystyle\int_a^t \big(F_A(x) - F_B(x)\big)\,dx \le 0 \;\; \forall t$ and $< 0$ for some $t$,
and $E[X_A] = E[X_B]$,
then $X_A$ SSD $X_B$.
Definition: Third Order Stochastic Dominance
If $\displaystyle\int_a^y \!\!\int_a^t \big(F_A(x) - F_B(x)\big)\,dx\,dt \le 0 \;\; \forall y$ and $< 0$ for some $y$,
$E[X_A] = E[X_B]$ and $Var[X_A] \le Var[X_B]$,
then $X_A$ TSD $X_B$.
Theorem: $X_A$ FSD $X_B$ $\Rightarrow$ $X_A$ SSD $X_B$ $\Rightarrow$ $X_A$ TSD $X_B$ (these are progressively weaker tests).
Theorem: $E[U(X_A)] > E[U(X_B)]$ for all U( ) (that are finite for all finite x) such that $U'(x) > 0$ everywhere iff $X_A$ FSD $X_B$ (i.e. prefers more to less).
Theorem: $E[U(X_A)] > E[U(X_B)]$ for all U( ) (that are finite for all finite x) such that $U'(x) > 0$ and $U''(x) < 0$ everywhere iff $X_A$ SSD $X_B$ (i.e. risk averse).
Theorem: $E[U(X_A)] > E[U(X_B)]$ for all U( ) (that are finite for all finite x) such that $U'(x) > 0$, $U''(x) < 0$ and $U'''(x) > 0$ everywhere iff $X_A$ TSD $X_B$.
Theorem: $E[U(X_A)] > E[U(X_B)]$ for all U( ) (that are finite for all finite x) such that $U'(x) > 0$, $U''(x) < 0$ and $R_A'(x) < 0$ everywhere iff $X_A$ TSD $X_B$ (i.e. risky asset is a normal good).
Theorem: The following three statements are equivalent:
1. A FSD B
2. $F_A(x) \le F_B(x)$ for all x
3. $x_A = x_B + \xi$ (in distribution) where $\xi \ge 0$
Theorem: The following three statements are equivalent:
1. A SSD B
2. $E[x_A] = E[x_B]$ and $\displaystyle\int_a^t \big(F_A(x) - F_B(x)\big)\,dx \le 0 \;\; \forall t$ and $< 0$ for some $t$
3. $x_B = x_A + \varepsilon$ (in distribution) where $E[\varepsilon \mid x_A] = 0$
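For discrete payoffs the FSD and SSD conditions can be tested directly from the cumulative distribution functions. This is a minimal sketch, assuming distributions stored as `{outcome: probability}` dictionaries; the three example distributions are hypothetical and the function names are illustrative.

```python
def cdf(dist, x):
    """F(x) = Pr[X <= x] for a discrete distribution {outcome: probability}."""
    return sum(p for v, p in dist.items() if v <= x)

def mean(dist):
    return sum(v * p for v, p in dist.items())

def first_order_dominates(a, b, eps=1e-12):
    """True if F_A(x) - F_B(x) <= 0 for all x and < 0 for some x."""
    points = sorted(set(a) | set(b))
    diffs = [cdf(a, x) - cdf(b, x) for x in points]
    return all(d <= eps for d in diffs) and any(d < -eps for d in diffs)

def second_order_dominates(a, b, eps=1e-12):
    """True if the means are equal and the running integral of F_A - F_B is
    <= 0 for all t and < 0 for some t (the CDFs are step functions, so the
    integral only needs checking at the jump points)."""
    if abs(mean(a) - mean(b)) > eps:
        return False
    points = sorted(set(a) | set(b))
    running, integrals = 0.0, []
    for left, right in zip(points, points[1:]):
        running += (cdf(a, left) - cdf(b, left)) * (right - left)
        integrals.append(running)
    return all(i <= eps for i in integrals) and any(i < -eps for i in integrals)

sure_thing = {2: 1.0}             # hypothetical payoffs
gamble = {0: 0.5, 4: 0.5}         # same mean as sure_thing, more spread
better_gamble = {1: 0.5, 5: 0.5}  # the gamble shifted up by 1
```

Here `better_gamble` first-order dominates `gamble`, while `sure_thing` dominates `gamble` only at second order because their distribution functions cross.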
Let's consider an example
Which investment do we choose?
$X_1 = 1$ with probability 0.25 [the remaining outcomes of $X_1$ and the distribution of $X_2$ are missing from the source]
$E[X_1] = 3.25$, $Var[X_1] = 1.6875$
$E[X_2] = 3.25$, $Var[X_2] = 1.6875$
[Figure: cumulative distribution functions of X1 and X2]
Cannot have FSD because the cumulative distribution
functions cross.
No SSD because both distribution functions are admissible.
Definition: A distribution is admissible or efficient with
respect to a set of distribution functions, S, if it is not
dominated by a member of S.
[Figure: cumulative distribution functions of X1 and X2, and a plot of g(t)]
$X_2$ TSD $X_1$ so we would choose $X_2$.
Note that this choice reflects a preference for skewness.
If you must take a risky gamble, do you prefer to take it
when wealth is high or low?
Riskiness of Distributions
This is a partial ordering of distributions.
Definition: Distribution Y is more risky than distribution X
if:
1. Y=X+Z where E[Z|X]=0 and non-degenerate.
2. Y is obtained from X by the addition of a mean
preserving spread.
3. X is preferred to Y by all risk averters providing
E[X]=E[Y].
4. Var[Y] > Var[X] provided E[X]=E[Y].
Riskiness of Distributions
Theorem: The partial orderings given by 1, 2, and 3 are
equivalent.
Theorem: The partial orderings given by 1, 2, 3, and 4 are
equivalent for normal distributions. (Reason: normals are
stable under addition if variances are finite.)
Bibliography
Huang, Chi-fu, and Robert Litzenberger, Foundations for
Financial Economics, North-Holland.
Levy, Haim, Stochastic Dominance: Investment Decision
Making under Uncertainty, Springer.
Ohlson, James, The Theory of Financial Markets and
Information, North-Holland.
Rothschild, M. and J. E. Stiglitz (1970). Increasing Risk: I.
A Definition. Journal of Economic Theory 2: 225-43.
Optimization: Definitions
Our optimization problems will take the form:
\[ \max_x f(x) \text{ subject to } x \in S \]
where f is a function, x is an n-vector and S is a set of n-vectors. We call f the objective function, x the choice variable or control variable, and S the constraint set or opportunity set.
Definition: The value x* of the variable x solves the problem
\[ \max_x f(x) \text{ subject to } x \in S \]
if
\[ f(x^*) \ge f(x) \quad \forall x \in S \]
In this case, we say that x* is a maximizer of the function f subject to the constraint $x \in S$, and that f(x*) is the maximum (or maximum value) of the function f subject to the constraint.
A minimizer is defined analogously.
[Figure: graph of f with labeled points: $x_1$ is a local maximizer, $x_2$ is a minimizer, $x_3$ is a maximizer, $x_4$ is a ?, $x_5$ is a ?]
Note that we can transform the objective function f with any strictly increasing function g. In other words, the set of solutions to the problem
\[ \max_x f(x) \text{ subject to } x \in S \]
is identical to the set of solutions to the problem:
\[ \max_x g(f(x)) \text{ subject to } x \in S \]
This fact is sometimes useful since it may be easier to work with a transform of the objective function rather than the original function.
Minimization problems are just the maximization of the negative of the objective function:
\[ \min_x f(x) \text{ subject to } x \in S \]
has the same set of solutions as
\[ \max_x -f(x) \text{ subject to } x \in S \]
Note that a continuous function on a compact set (closed
and bounded) attains both a minimum and a maximum on
that set (this is the Extreme Value Theorem). This is a
sufficient condition for a maximum (and a minimum) to
exist.
Interior Optimum: One Variable

Proposition: (FOC) Let f be a differentiable function of a
single variable defined on the interval I. If a point x* in
the interior of I is a local or global maximizer or
minimizer of f then f '(x*) = 0 (i.e. it is stationary).
Proposition: (SOC) Let f be a function of a single variable
with continuous first and second derivatives, defined on
the interval I. Suppose that x* is a stationary point of f in
the interior of I (so that f '(x*) = 0).
1. If $f''(x^*) < 0$ then x* is a local maximizer.
2. If x* is a local maximizer then $f''(x^*) \le 0$.
3. If $f''(x^*) > 0$ then x* is a local minimizer.
4. If x* is a local minimizer then $f''(x^*) \ge 0$.
Note: Items 2 and 4 are necessary conditions; items 1 and 3 are sufficient conditions.
Interior Optimum: Many Variables

Proposition: (FOC) Let f be a differentiable function of n
variables defined on the set S. If the point x in the interior of S is a local or global maximizer or minimizer of f then $f_i'(x) = 0$ for i = 1, ..., n (i.e. it is stationary).
Proposition (SOC) Let f be a function of n variables with
continuous partial derivatives of first and second order,
defined on the set S. Suppose that x* is a stationary point of f in the interior of S (so that $f_i'(x^*) = 0$ for all i).
1. If H(x*) is negative definite then x* is a local maximizer.
2. If x* is a local maximizer then H(x*) is negative semidefinite.
3. If H(x*) is positive definite then x* is a local minimizer.
4. If x* is a local minimizer then H(x*) is positive semidefinite.
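Note: Items 2 and 4 are necessary conditions; items 1 and 3 are sufficient conditions.
Where H is the Hessian matrix
\[ H = \begin{pmatrix} \dfrac{\partial^2 f}{\partial x_1 \partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_n \partial x_n} \end{pmatrix} \]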
An implication of this result is that if x* is a stationary point
of f then
1. if H(x*) is negative definite then x* is a local maximizer
2. if H(x*) is negative semidefinite, but neither negative definite nor
positive semidefinite, then x* is not a local minimizer, but might be a
local maximizer
3. if H(x*) is positive definite then x* is a local minimizer
4. if H(x*) is positive semidefinite, but neither positive definite nor
negative semidefinite, then x* is not a local maximizer, but might be a
local minimizer
5. if H(x*) is neither positive semidefinite nor negative semidefinite then
x* is neither a local maximizer nor a local minimizer.
A stationary point which is neither a maximizer nor a minimizer is called a saddle point (note that not all saddle points look like a saddle; for example, every point (0, y) is a saddle point of the function $f(x, y) = x^3$).
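The five cases above can be sketched for a function of two variables by classifying the Hessian through its eigenvalues. This is a minimal illustration; the function name, tolerance, and returned labels are assumptions made for the example.

```python
import math

def classify_hessian_2x2(fxx, fxy, fyy, tol=1e-12):
    """Classify the symmetric matrix [[fxx, fxy], [fxy, fyy]] by its eigenvalues."""
    centre = 0.5 * (fxx + fyy)
    radius = math.hypot(0.5 * (fxx - fyy), fxy)
    low, high = centre - radius, centre + radius   # the two eigenvalues
    if high < -tol:
        return "negative definite: local maximizer"
    if low > tol:
        return "positive definite: local minimizer"
    if low < -tol and high > tol:
        return "indefinite: neither a local maximizer nor a local minimizer"
    return "semidefinite only: the second-order test is inconclusive"

bowl_down = classify_hessian_2x2(-2.0, 0.0, -2.0)   # f = -x^2 - y^2 at (0, 0)
saddle = classify_hessian_2x2(2.0, 0.0, -2.0)       # f = x^2 - y^2 at (0, 0)
cubic = classify_hessian_2x2(0.0, 0.0, 0.0)         # f = x^3 at any (0, y)
```

The cubic example shows why semidefiniteness alone settles nothing: its Hessian is the zero matrix at every stationary point, yet none of those points is a maximizer or a minimizer.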
Global Optimum: One Variable

Proposition: Let f be a differentiable function defined on
the interval I, and let x be in the interior of I. Then:
1. if f is concave then x is a global maximizer of f in I if and only if x
is a stationary point of f
2. if f is convex then x is a global minimizer of f in I if and only if x
is a stationary point of f .
So if f is twice differentiable:
1. $f''(z) \le 0$ for all $z \in I$ $\Rightarrow$ [x is a global maximizer of f in I if and only if f'(x) = 0]
2. $f''(z) \ge 0$ for all $z \in I$ $\Rightarrow$ [x is a global minimizer of f in I if and only if f'(x) = 0].
Global Optimum: Many Variables

Proposition: Suppose that the function f has continuous
partial derivatives in a convex set S and let x be in the
interior of S. Then:
1. if f is concave then x is a global maximizer of f in S if and only if
it is a stationary point of f .
2. if f is convex then x is a global minimizer of f in S if and only if it
is a stationary point of f .
So if f is twice differentiable:
1. H(z) is negative semidefinite for all $z \in S$ $\Rightarrow$ [x is a global maximizer of f in S if and only if x is a stationary point of f].
2. H(z) is positive semidefinite for all $z \in S$ $\Rightarrow$ [x is a global minimizer of f in S if and only if x is a stationary point of f].
Global Optimum: Many Variables

Note the difference between this and the local optima:
Sufficient conditions for local maximizer: if x* is a
stationary point of f and the Hessian of f is negative
definite at x* then x* is a local maximizer of f.
Sufficient conditions for global maximizer: if x* is a
stationary point of f and the Hessian of f is negative
semidefinite for all values of x then x* is a global
maximizer of f.
Constrained Optimization: Equality

Usually it is not enough to consider solutions which
maximize (or minimize) a particular function (e.g. Diet
Coke can).
Instead, we want to find a solution which is subject to fixed,
outside constraints.
To solve these problems, we can use Lagrange multipliers.

Suppose that Monique and Carl are going swimming in the river, and they see each other in a field bounded by the river. Since it is such a hot day, they want to jump in the river as quickly as possible, but they want to do it together. At what point (P) on the riverbank should they meet?

In mathematical terms, if d(M,P) is the distance between M and P, they must solve the problem:
\[ \min_P f(P) = d(M, P) + d(P, C) \]
subject to the constraint:
\[ g(P) = 0 \]

We can solve this graphically if we recall that ellipses are curves of constant f(P) (i.e. for every point P on an ellipse, the total distance from one focus of the ellipse to P and then to the other focus is the same). So we need to find an ellipse (with C and M as the foci) which is tangent to the riverbank.

Or, mathematically, the normal vector to the ellipse must
point in the same direction as the normal vector to the
river.

Recall that the gradient of a function f (which is written $\nabla f$) is a normal vector to a curve (in two dimensions) or a surface (in higher dimensions). The length of the normal vector doesn't matter; any constant multiple of the gradient is also a normal vector. In our case, we have two functions whose normal vectors are parallel, so:
\[ \nabla f(P) = \lambda \nabla g(P) \]
The unknown multiplier $\lambda$ is necessary because the magnitudes of the two gradients may be different.

Alternatively, we can approach the problem by considering the optimization problem and combining it with the constraint to form a new function called the Lagrangian or Lagrangian function:
\[ \min_{P,\lambda} \mathcal{L}(P, \lambda) = \min_{P,\lambda} \big[f(P) - \lambda g(P)\big] \]
and then we set:
\[ \nabla \mathcal{L}(P, \lambda) = 0 \]

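Proposition: Let f and g be continuously differentiable functions of two variables defined on the set S, let c be a number, and suppose that (x*, y*) is an interior point of S that solves the problem
\[ \max_{x,y} f(x, y) \text{ subject to } g(x, y) = c \]
Suppose also that either
\[ \frac{\partial g(x^*, y^*)}{\partial x} \ne 0 \quad \text{or} \quad \frac{\partial g(x^*, y^*)}{\partial y} \ne 0 \]
Then there is a unique number $\lambda$ such that (x*, y*) is a stationary point of the Lagrangian
\[ \mathcal{L}(x, y) = f(x, y) - \lambda\,(g(x, y) - c) \]
That is, (x*, y*) satisfy the FOC
\[ \frac{\partial \mathcal{L}(x^*, y^*)}{\partial x} = \frac{\partial f(x^*, y^*)}{\partial x} - \lambda \frac{\partial g(x^*, y^*)}{\partial x} = 0 \]
\[ \frac{\partial \mathcal{L}(x^*, y^*)}{\partial y} = \frac{\partial f(x^*, y^*)}{\partial y} - \lambda \frac{\partial g(x^*, y^*)}{\partial y} = 0 \]
and the constraint
\[ g(x^*, y^*) = c \]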


Algorithm for solving a two-variable maximization problem with an equality constraint.
Let f and g be continuously differentiable functions of two variables defined on a set S and let c be a number. If the problem
\[ \max_{x,y} f(x, y) \text{ subject to } g(x, y) = c \]
has a solution, it may be found as follows.
A) Find all the values of (x, y, $\lambda$) in which
1. (x, y) is an interior point of S
2. (x, y, $\lambda$) satisfies the FOC and the constraint.
B) Find all the points (x, y) that satisfy $g_1'(x, y) = 0$, $g_2'(x, y) = 0$, and g(x, y) = c. (For most problems, there are no such values of (x, y). In particular, if g is linear there are no such values of (x, y).)
C) If the set S has any boundary points, find all the points that solve the problem $\max_{x,y} f(x, y)$ subject to the two conditions g(x, y) = c and (x, y) is a boundary point of S.
D) The points (x, y) you have found at which f(x, y) is largest are the maximizers of f.

Example: Consider the problem
\[ \max_{x,y} xy \text{ subject to } x + y = 6 \]
(Note that the objective function xy is defined on the set of all 2-vectors, which has no boundary. The constraint set is therefore not bounded, so the extreme value theorem does not imply that this problem has a solution.)
The Lagrangian is
\[ \mathcal{L}(x, y) = xy - \lambda\,(x + y - 6) \]

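The FOC are
\[ \frac{\partial \mathcal{L}}{\partial x} = y - \lambda = 0 \]
\[ \frac{\partial \mathcal{L}}{\partial y} = x - \lambda = 0 \]
And the constraint
\[ x + y = 6 \]
These equations have a unique solution, (x, y, $\lambda$) = (3, 3, 3). We have $g_1'(x, y) = 1 \ne 0$ and $g_2'(x, y) = 1 \ne 0$ for all (x, y), so we conclude that if the problem has a solution it is (x, y) = (3, 3).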

Example: Consider the problem
\[ \max_{x,y} x^2 y \text{ subject to } 2x^2 + y^2 = 3 \]
(Note that the constraint set is compact and the objective function is continuous, so the extreme value theorem implies that this problem has a solution.)
The Lagrangian is
\[ \mathcal{L}(x, y) = x^2 y - \lambda\,(2x^2 + y^2 - 3) \]

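The FOC are
\[ \frac{\partial \mathcal{L}}{\partial x} = 2xy - 4\lambda x = 2x(y - 2\lambda) = 0 \]
\[ \frac{\partial \mathcal{L}}{\partial y} = x^2 - 2\lambda y = 0 \]
And the constraint
\[ 2x^2 + y^2 - 3 = 0 \]
(Note that the constraint could also be considered the FOC for the Lagrangian with respect to $\lambda$, the Lagrange multiplier.)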

To find the solutions of these three equations, first note that from the first equation we have either x = 0 or $y = 2\lambda$. We can check each possibility in turn.
x = 0: we have $y = 3^{1/2}$ and $\lambda = 0$, or $y = -3^{1/2}$ and $\lambda = 0$.
$y = 2\lambda$: we have $x^2 = y^2$ from the second equation, so either x = 1 or x = -1 from the third equation.
x = 1: either y = 1 and $\lambda = 1/2$, or y = -1 and $\lambda = -1/2$.
x = -1: either y = 1 and $\lambda = 1/2$, or y = -1 and $\lambda = -1/2$.

So, the FOC have six solutions:
1. (x, y, $\lambda$) = (0, $3^{1/2}$, 0), with f(x, y) = 0.
2. (x, y, $\lambda$) = (0, $-3^{1/2}$, 0), with f(x, y) = 0.
3. (x, y, $\lambda$) = (1, 1, 1/2), with f(x, y) = 1.
4. (x, y, $\lambda$) = (1, -1, -1/2), with f(x, y) = -1.
5. (x, y, $\lambda$) = (-1, 1, 1/2), with f(x, y) = 1.
6. (x, y, $\lambda$) = (-1, -1, -1/2), with f(x, y) = -1.
Now, $g_1'(x, y) = 4x$ and $g_2'(x, y) = 2y$, so the only value of (x, y) for which $g_1'(x, y) = 0$ and $g_2'(x, y) = 0$ is (x, y) = (0, 0). At this point the constraint is not satisfied, so the only possible solutions of the problem are the solutions of the first-order conditions.
We conclude that the problem has two solutions, (x, y) = (1, 1) and (x, y) = (-1, 1).
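The six candidate points can be verified mechanically by substituting them into the first-order conditions and comparing the objective values. This is a minimal sketch; the helper names are illustrative and the Lagrangian is taken as $x^2 y - \lambda(2x^2 + y^2 - 3)$.

```python
import math

def f(x, y):
    return x * x * y

def foc_residuals(x, y, lam):
    """Residuals of dL/dx = 0, dL/dy = 0 and the constraint for
    L = x^2 y - lam * (2 x^2 + y^2 - 3)."""
    return (2 * x * y - 4 * lam * x,
            x * x - 2 * lam * y,
            2 * x * x + y * y - 3)

root3 = math.sqrt(3.0)
candidates = [(0.0, root3, 0.0), (0.0, -root3, 0.0),
              (1.0, 1.0, 0.5), (1.0, -1.0, -0.5),
              (-1.0, 1.0, 0.5), (-1.0, -1.0, -0.5)]

# Every candidate should satisfy all three equations up to rounding error.
all_stationary = all(abs(r) < 1e-12
                     for c in candidates for r in foc_residuals(*c))
best_value = max(f(x, y) for x, y, _ in candidates)
maximizers = [(x, y) for x, y, _ in candidates if f(x, y) == best_value]
```

Because the constraint set is compact, the largest objective value among the stationary points identifies the global maximizers.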

2/3/2009
Consider the problem
\[ \max_{x,y} f(x, y) \text{ subject to } g(x, y) = c \]
and suppose we solve the problem for various values of c. Let the solution be (x*(c), y*(c)) with a Lagrange multiplier of $\lambda^*(c)$. Assume that the functions x*, y*, and $\lambda^*$ are differentiable and that $g_1'(x^*(c), y^*(c)) \ne 0$ or $g_2'(x^*(c), y^*(c)) \ne 0$, so that the first-order conditions are satisfied. Let f*(c) = f(x*(c), y*(c)).

Differentiate f*(c) with respect to c:
\[ \frac{df^*(c)}{dc} = \frac{\partial f(x^*(c), y^*(c))}{\partial x}\frac{dx^*(c)}{dc} + \frac{\partial f(x^*(c), y^*(c))}{\partial y}\frac{dy^*(c)}{dc} = \lambda^*(c)\left[\frac{\partial g(x^*(c), y^*(c))}{\partial x}\frac{dx^*(c)}{dc} + \frac{\partial g(x^*(c), y^*(c))}{\partial y}\frac{dy^*(c)}{dc}\right] \]
(using the FOC). Note, however, that g(x*(c), y*(c)) = c for all c, so the derivatives of each side of this equality are the same for all c. That is
\[ \frac{\partial g(x^*(c), y^*(c))}{\partial x}\frac{dx^*(c)}{dc} + \frac{\partial g(x^*(c), y^*(c))}{\partial y}\frac{dy^*(c)}{dc} = 1 \]

Therefore
\[ \frac{df^*(c)}{dc} = \lambda^*(c) \]
Or: the value of the Lagrange multiplier at the solution of the problem is equal to the rate of change in the maximal value of the objective function as the constraint is relaxed.
(Note that this follows directly from our use of the gradient earlier.)
So, in a utility maximization problem, the optimal value of the Lagrange multiplier measures marginal utility of our control variable (or the shadow price of that variable).
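The shadow-price interpretation can be checked on the earlier example of maximizing xy subject to x + y = c, generalized from c = 6. This is a minimal sketch using the closed-form solution implied by the FOC; the function names and step size are illustrative assumptions.

```python
def solve(c):
    """Closed-form solution of max xy subject to x + y = c:
    the FOC give y = lam and x = lam, and the constraint gives lam = c / 2."""
    lam = c / 2.0
    return lam, lam, lam          # (x*, y*, lambda*)

def value(c):
    x, y, _ = solve(c)
    return x * y                  # f*(c) = c^2 / 4

c, h = 6.0, 1e-5
_, _, lam_star = solve(c)
# Numerical derivative of the maximal value with respect to the constraint level.
marginal_value = (value(c + h) - value(c - h)) / (2 * h)
```

At c = 6 both the multiplier and the numerical derivative of f*(c) equal 3, as the result above predicts.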

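Sufficient conditions for a local optimum with two variables.
Consider the problem
\[ \max_{x,y} f(x, y) \text{ subject to } g(x, y) = c \]
Suppose (x*, y*) and $\lambda^*$ satisfy the FOC:
\[ \frac{\partial f(x^*, y^*)}{\partial x} - \lambda^* \frac{\partial g(x^*, y^*)}{\partial x} = 0 \]
\[ \frac{\partial f(x^*, y^*)}{\partial y} - \lambda^* \frac{\partial g(x^*, y^*)}{\partial y} = 0 \]
And the constraint
\[ g(x^*, y^*) = c \]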

Then
If $D(x^*, y^*, \lambda^*) > 0$ then (x*, y*) is a local maximizer of f subject to the constraint g(x, y) = c.
If $D(x^*, y^*, \lambda^*) < 0$ then (x*, y*) is a local minimizer of f subject to the constraint g(x, y) = c.
Where $D(x^*, y^*, \lambda^*)$ is the determinant of the bordered Hessian of the Lagrangian, with all derivatives evaluated at (x*, y*):
\[ D(x^*, y^*, \lambda^*) = \begin{vmatrix} 0 & \dfrac{\partial g}{\partial x} & \dfrac{\partial g}{\partial y} \\ \dfrac{\partial g}{\partial x} & \dfrac{\partial^2 f}{\partial x^2} - \lambda^* \dfrac{\partial^2 g}{\partial x^2} & \dfrac{\partial^2 f}{\partial x \partial y} - \lambda^* \dfrac{\partial^2 g}{\partial x \partial y} \\ \dfrac{\partial g}{\partial y} & \dfrac{\partial^2 f}{\partial y \partial x} - \lambda^* \dfrac{\partial^2 g}{\partial y \partial x} & \dfrac{\partial^2 f}{\partial y^2} - \lambda^* \dfrac{\partial^2 g}{\partial y^2} \end{vmatrix} \]

Example: Consider again the problem
\[ \max_{x,y} x^2 y \text{ subject to } 2x^2 + y^2 = 3 \]
We previously found that there are six solutions to the FOC
1. (x, y, $\lambda$) = (0, $3^{1/2}$, 0), with f(x, y) = 0.
2. (x, y, $\lambda$) = (0, $-3^{1/2}$, 0), with f(x, y) = 0.
3. (x, y, $\lambda$) = (1, 1, 1/2), with f(x, y) = 1.
4. (x, y, $\lambda$) = (1, -1, -1/2), with f(x, y) = -1.
5. (x, y, $\lambda$) = (-1, 1, 1/2), with f(x, y) = 1.
6. (x, y, $\lambda$) = (-1, -1, -1/2), with f(x, y) = -1.

Further, we found that solutions 3 and 5 are global maximizers and solutions 4 and 6 are global minimizers. The two remaining solutions of the FOC, (0, $3^{1/2}$) and (0, $-3^{1/2}$), are neither global maximizers nor global minimizers. Are they local maximizers or local minimizers?

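The determinant of the bordered Hessian of the Lagrangian is
\[ D(x, y, \lambda) = \begin{vmatrix} 0 & 4x & 2y \\ 4x & 2y - 4\lambda & 2x \\ 2y & 2x & -2\lambda \end{vmatrix} \]
The determinant is
\[ -4x(-8x\lambda - 4xy) + 2y\big(8x^2 - 2y(2y - 4\lambda)\big) = 8\big(2\lambda(2x^2 + y^2) + y(4x^2 - y^2)\big) = 8\big(6\lambda + y(4x^2 - y^2)\big) \]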

(since $2x^2 + y^2 = 3$ at each solution, from the constraint). The value of the determinant at the two solutions is
(0, $3^{1/2}$, 0): $-8 \cdot 3^{3/2}$, so (0, $3^{1/2}$) is a local minimizer;
(0, $-3^{1/2}$, 0): $8 \cdot 3^{3/2}$, so (0, $-3^{1/2}$) is a local maximizer.
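The determinant values can be confirmed by evaluating the bordered Hessian directly. This is a minimal sketch for f = x^2 y and g = 2x^2 + y^2; the helper names are illustrative.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def bordered_hessian_det(x, y, lam):
    """D(x, y, lam) for f = x^2 y and g = 2 x^2 + y^2."""
    return det3([[0.0,   4 * x,            2 * y],
                 [4 * x, 2 * y - 4 * lam,  2 * x],
                 [2 * y, 2 * x,           -2 * lam]])

def closed_form(x, y, lam):
    # Simplified expression; valid only on the constraint 2 x^2 + y^2 = 3.
    return 8 * (6 * lam + y * (4 * x * x - y * y))

root3 = math.sqrt(3.0)
d_top = bordered_hessian_det(0.0, root3, 0.0)       # negative: local minimizer
d_bottom = bordered_hessian_det(0.0, -root3, 0.0)   # positive: local maximizer
d_max = bordered_hessian_det(1.0, 1.0, 0.5)         # positive at a global maximizer
d_min = bordered_hessian_det(1.0, -1.0, -0.5)       # negative at a global minimizer
```

The signs at the four points agree with the classification in the text, and the direct determinant matches the simplified expression on the constraint.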

Proposition: Suppose that f and g are continuously differentiable functions defined on an open convex subset S of two-dimensional space and suppose that there exists a number $\lambda^*$ such that (x*, y*) is an interior point of S that is a stationary point of the Lagrangian
\[ \mathcal{L}(x, y) = f(x, y) - \lambda^*\,(g(x, y) - c) \]
Suppose further that g(x*, y*) = c.
Then if
$\mathcal{L}$ is concave (in particular if f is concave and $\lambda^* g$ is convex) then (x*, y*) solves the problem $\max_{x,y} f(x, y)$ subject to g(x, y) = c.
$\mathcal{L}$ is convex (in particular if f is convex and $\lambda^* g$ is concave) then (x*, y*) solves the problem $\min_{x,y} f(x, y)$ subject to g(x, y) = c.
Envelope Theorem
Often we are interested in how the maximal value of a
function depends on its parameters.
Consider the unconstrained maximization problem:
\[ \max_x f(x, a) \]
Assume that for any a the problem has a unique solution; denote this solution x*(a). Denote the maximum value of f, for any given value of a, by M*(a): M*(a) = f(x*(a), a). We call M* the value function.
Envelope Theorem
Taking the derivative of M* using the chain rule
\[ \frac{dM^*(a)}{da} = \frac{\partial f(x^*(a), a)}{\partial x}\frac{dx^*(a)}{da} + \frac{\partial f(x^*(a), a)}{\partial a} \]
The first term is the indirect effect of how changing a affects the optimal choice of x and how that change in x affects the value of f. The second term is the direct effect of how changing a changes f holding x fixed at x*(a). This expression can be simplified by noticing that since x*(a) is the optimal choice for x at each value of a,
\[ \frac{\partial f(x^*(a), a)}{\partial x} = 0 \]
Envelope Theorem
This means
\[ \frac{dM^*(a)}{da} = \frac{\partial f(x^*(a), a)}{\partial a} \]
Or the change in the objective function adjusting optimally is equal to the change in the objective function when one doesn't adjust x.
In other words, the total derivative of f(x*(a), a) with respect to a is equal to the partial derivative of f(x*(a), a) with respect to a, evaluated at the optimal choice of x.
This is known as the Envelope Theorem.
Envelope Theorem
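Note that to compute the effect of changing a on x*(a), we differentiate the FOC
\[ \frac{d}{da}\left[\frac{\partial f(x^*(a), a)}{\partial x}\right] = 0 \]
\[ \frac{\partial^2 f(x^*(a), a)}{\partial x^2}\frac{dx^*(a)}{da} + \frac{\partial^2 f(x^*(a), a)}{\partial x\,\partial a} = 0 \]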
Envelope Theorem
\[ \frac{dx^*(a)}{da} = -\,\frac{\partial^2 f(x^*(a), a) / \partial x\,\partial a}{\partial^2 f(x^*(a), a) / \partial x^2} \]
The sign of the denominator is negative by the SOC, therefore the sign of the expression is determined by the sign of the mixed partial in the numerator.
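Both results can be checked on a simple concave objective. This is a minimal sketch, assuming the hypothetical objective f(x, a) = ax - x^2, for which x*(a) = a/2; the function names and step size are illustrative.

```python
def f(x, a):
    return a * x - x * x          # hypothetical objective, concave in x

def x_star(a):
    return a / 2.0                # solves the FOC: a - 2x = 0

def value(a):
    return f(x_star(a), a)        # M*(a) = a^2 / 4

a, h = 3.0, 1e-5
# Total derivative of the value function, letting x adjust optimally.
total_derivative = (value(a + h) - value(a - h)) / (2 * h)
# Partial derivative with respect to a, holding x fixed at x*(a).
partial_at_optimum = (f(x_star(a), a + h) - f(x_star(a), a - h)) / (2 * h)

f_xa, f_xx = 1.0, -2.0            # second partials of f
dx_da = -f_xa / f_xx              # comparative statics from the differentiated FOC
dx_da_numeric = (x_star(a + h) - x_star(a - h)) / (2 * h)
```

The total and partial derivatives coincide at a/2, and the optimal choice rises with a because the mixed partial is positive.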
Envelope Theorem
Now consider
\[ \max_x f(x, y) \text{ subject to } g(x, y) = 0 \]
where y is a parameter. Then the Lagrangian is
\[ \mathcal{L}(x, y) = f(x, y) - \lambda\,g(x, y) \]
The envelope theorem states
\[ \frac{dM^*(y)}{dy} = \frac{\partial \mathcal{L}(x^*(y), y)}{\partial y} = \frac{\partial f(x^*(y), y)}{\partial y} - \lambda^* \frac{\partial g(x^*(y), y)}{\partial y} \]
Again, we only have to take into account the change in y, not the associated change in x.
Envelope Theorem
Example: Consider a utility maximization problem: $\max_x U(x)$ subject to $p \cdot x = w$, where x is a vector (a bundle of goods), p is the price vector, and w is the consumer's wealth (a real number). Denote the solution of the problem by x*(p, w), and denote the value function by v, so that
\[ v(p, w) = U(x^*(p, w)) \text{ for every } (p, w) \]
The function v is known as the indirect utility function.
Envelope Theorem
By the envelope theorem
\[ \frac{\partial v(p, w)}{\partial p_i} = -\lambda^*(p, w)\,x_i^*(p, w) \]
\[ \frac{\partial v(p, w)}{\partial w} = \lambda^*(p, w) \]
Thus
\[ x_i^*(p, w) = -\,\frac{\partial v(p, w) / \partial p_i}{\partial v(p, w) / \partial w} \]
This result is known as Roy's identity.
Mean-Variance Analysis: Intro

Mean-variance model for asset choice was developed by
Markowitz (1952 Journal of Finance).
Recalling our discussion of stochastic dominance, we can
see that, in general, investors should have MISC
preferences. In other words, they should exhibit a
preference for expected return and aversion to variance.
But for arbitrary distribution functions and utility functions
E[U( )] cannot be expressed as a function of only mean
and variance.

To see this, take a Taylor series expansion around the expected end of period wealth:
\[ U(\tilde{w}) = U(E[\tilde{w}]) + U'(E[\tilde{w}])(\tilde{w} - E[\tilde{w}]) + \frac{1}{2}U''(E[\tilde{w}])(\tilde{w} - E[\tilde{w}])^2 + \sum_{n=3}^{\infty} \frac{1}{n!}\,U^{(n)}(E[\tilde{w}])(\tilde{w} - E[\tilde{w}])^n \]

Taking the expectation:
\[ E[U(\tilde{w})] = U(E[\tilde{w}]) + \frac{1}{2}U''(E[\tilde{w}])\,Var[\tilde{w}] + \sum_{n=3}^{\infty} \frac{1}{n!}\,U^{(n)}(E[\tilde{w}])\,E[(\tilde{w} - E[\tilde{w}])^n] \]
Unless the last term is zero, we need more than the mean and variance.
Note that the last part of the last term is the n-th central moment of $\tilde{w}$.

For arbitrary distributions, the mean-variance model can be motivated by assuming quadratic utility:
\[ E[U(\tilde{w})] = E[\tilde{w}] - \frac{b}{2}E[\tilde{w}^2] = E[\tilde{w}] - \frac{b}{2}\big((E[\tilde{w}])^2 + \sigma^2(\tilde{w})\big) \]
There are no additional terms because the third and higher order derivatives are zero.
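A small numerical comparison makes the point concrete. This is a minimal sketch, assuming two hypothetical wealth distributions with the same mean and variance but different skewness, and the quadratic utility $U(w) = w - (b/2)w^2$ with an illustrative b = 0.1.

```python
import math

def expectation(dist, g):
    """E[g(W)] for a discrete distribution {outcome: probability}."""
    return sum(p * g(w) for w, p in dist.items())

def quadratic_utility(w, b=0.1):
    return w - 0.5 * b * w * w

def mean_and_variance(dist):
    m = expectation(dist, lambda w: w)
    return m, expectation(dist, lambda w: (w - m) ** 2)

symmetric = {1.0: 0.5, 3.0: 0.5}   # hypothetical: mean 2, variance 1
skewed = {1.5: 0.8, 4.0: 0.2}      # hypothetical: mean 2, variance 1, right-skewed

eu_quad = [expectation(d, quadratic_utility) for d in (symmetric, skewed)]
eu_log = [expectation(d, math.log) for d in (symmetric, skewed)]
```

Quadratic utility ranks the two distributions equally because only the first two moments enter, whereas log utility prefers the right-skewed one, so its higher-order terms matter.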

Problems with quadratic utility
Saturation (i.e. utility decreases as wealth increases after a certain
point).
Increasing absolute risk aversion (i.e. risky assets are inferior
goods).

For arbitrary preferences, the mean-variance model can be
motivated by assuming that rates of return on risky assets
are multivariate normal.
The normal is completely characterized by the mean and the
variance (all higher moments can be described as
functions of the first two moments).
Note: the lognormal is also characterized by the mean and
variance, but is not stable under addition.

Problems with normality
Unbounded
Inconsistent with limited liability
Inconsistent with economic theory (no place for negative
consumption)
Experimentally, returns are not normal
Note: multivariate normal is sufficient for mean-variance
analysis, but not necessary.

Although the mean-variance model is not a general model of
asset choice, it holds a central role in finance due to its
tractability and its richness of empirical predictions.
Mean-Variance Analysis: Basics

Assume that we have:
$N \ge 2$ assets
frictionless markets
unlimited short selling
common knowledge about
  expected returns
  the variance-covariance structure
finite variances and unequal expectations
$\Sigma$: the variance-covariance matrix of asset returns
$e = (e_1, \ldots, e_N)'$: the vector of expected returns

If we plot the variance and expected returns for all N
securities

And then consider all possible portfolios of them

We have the feasible set of portfolios in mean-variance
space (which is a parabola).

Definition: A portfolio is a frontier portfolio if it has the
minimum variance among portfolios having the same
expected rate of return.
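\[ E[r_p] = \sum_{i=1}^{N} w_i E[r_i] = w'e, \qquad w'\iota = 1 \]
\[ Var[r_p] = \sum_{i=1}^{N}\sum_{j=1}^{N} w_i w_j \sigma_{ij} = w'\Sigma w \]
where w is the N-vector of portfolio weights and $\iota$ is an N-vector of ones.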

A portfolio p is a frontier portfolio iff $w_p$, the N-vector of portfolio weights of p, is the solution to:
\[ \min_{\{w\}} \frac{1}{2}\,w'\Sigma w \quad \text{s.t. } w'e = E[r_p] \text{ and } w'\iota = 1 \]

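Forming the Lagrangian and solving for the first order
conditions:
$$\mathcal{L} = \frac{1}{2}w'\Sigma w + \lambda\left(E[r_p] - w'e\right) + \gamma\left(1 - w'\iota\right)$$
F.O.C.
$$\frac{\partial\mathcal{L}}{\partial w} = \Sigma w - \lambda e - \gamma\iota = 0$$
$$\frac{\partial\mathcal{L}}{\partial\lambda} = E[r_p] - w'e = 0$$
$$\frac{\partial\mathcal{L}}{\partial\gamma} = 1 - w'\iota = 0$$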

Since $\Sigma$ is positive definite, these first order conditions are
necessary and sufficient for a global optimum.
Solving the 1st FOC for the weights
$$w_p = \lambda(\Sigma^{-1}e) + \gamma(\Sigma^{-1}\iota)$$
Premultiplying by the expected returns and using the 2nd FOC
$$E[r_p] = \lambda(e'\Sigma^{-1}e) + \gamma(e'\Sigma^{-1}\iota)$$

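Or premultiply the portfolio weights by a vector of 1s and
use the 3rd FOC
$$1 = \lambda(\iota'\Sigma^{-1}e) + \gamma(\iota'\Sigma^{-1}\iota)$$
Define
$$A = \iota'\Sigma^{-1}e = e'\Sigma^{-1}\iota$$
$$B = e'\Sigma^{-1}e$$
$$C = \iota'\Sigma^{-1}\iota$$
$$D = BC - A^2$$
$$M = \begin{bmatrix} B & A \\ A & C \end{bmatrix}$$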

Note: A, B, C, and D are just numbers. M contains
sufficient information to prove everything in efficient set
mathematics.
Solving for the Lagrange multipliers
$$\lambda = \frac{C\,E[r_p] - A}{D}$$
$$\gamma = \frac{B - A\,E[r_p]}{D}$$
And substituting into our expression for $w_p$ gives
$$w_p = \frac{C\,E[r_p] - A}{D}(\Sigma^{-1}e) + \frac{B - A\,E[r_p]}{D}(\Sigma^{-1}\iota)$$
$$w_p = \frac{1}{D}\left[C\,\Sigma^{-1}e - A\,\Sigma^{-1}\iota\right]E[r_p] + \frac{1}{D}\left[B\,\Sigma^{-1}\iota - A\,\Sigma^{-1}e\right]$$
$$w_p = h\,E[r_p] + g$$
Any frontier portfolio can be found this way since the
expected return was arbitrary and this equation is a
necessary and sufficient solution.

Note that $g$ is the vector of portfolio weights corresponding
to a frontier portfolio with E[r]=0 and that $g + h$ is the
vector of portfolio weights corresponding to a frontier
portfolio with E[r]=1.
Claim: all frontier portfolios can be generated by forming
portfolios of the two frontier portfolios formed with
weights $g$ and $g + h$.
Note that it therefore follows that all frontier portfolios can
be formed from any two distinct frontier portfolios.
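The construction of g and h can be sketched numerically. The snippet below is only an illustration: it borrows the three-stock expected returns and covariances from the numerical example that appears later in these notes, and the function name `frontier_weights` is made up for this sketch.

```python
import numpy as np

# Three-stock inputs taken from the numerical example later in the notes.
e = np.array([0.100162, 0.164244, 0.182082])            # expected returns
Sigma = np.array([[0.100162, 0.045864, 0.005712],
                  [0.045864, 0.210773, 0.028283],
                  [0.005712, 0.028283, 0.066884]])       # covariance matrix
iota = np.ones(3)

Sinv_e = np.linalg.solve(Sigma, e)       # Sigma^{-1} e
Sinv_i = np.linalg.solve(Sigma, iota)    # Sigma^{-1} iota

A = iota @ Sinv_e
B = e @ Sinv_e
C = iota @ Sinv_i
D = B * C - A**2

g = (B * Sinv_i - A * Sinv_e) / D        # frontier weights when E[r_p] = 0
h = (C * Sinv_e - A * Sinv_i) / D        # change in weights per unit of E[r_p]


def frontier_weights(target):
    """Frontier portfolio weights for a target expected return (hypothetical helper)."""
    return g + h * target


w = frontier_weights(0.15)
print(np.round(w, 4))
```

Because g sums to one and h sums to zero, every portfolio produced this way is fully invested, and its expected return equals the chosen target.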
Mean-Variance Analysis: Frontier

The covariance between the returns of any two frontier
portfolios is
$$Cov(r_p, r_q) = w_p'\Sigma w_q = \frac{C}{D}\left(E[r_p] - \frac{A}{C}\right)\left(E[r_q] - \frac{A}{C}\right) + \frac{1}{C}$$
Or the variance of any frontier portfolio can be found and
then we can write
$$\frac{\sigma^2(r_p)}{1/C} - \frac{\left(E[r_p] - A/C\right)^2}{D/C^2} = 1$$

Which is the equation of a hyperbola in SD-E[r] space with
center (0, A/C) and asymptotes
$$E[r_p] = \frac{A}{C} \pm \sqrt{\frac{D}{C}}\,\sigma(r_p)$$
The minimum variance portfolio is defined as the portfolio
having the minimum variance of all possible portfolios.
Note
$$E[r_{MV}] = \frac{A}{C}$$
$$Var[r_{MV}] = \frac{1}{C}$$
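As a short check of the minimum variance portfolio, the following derivation sketch uses only the covariance formula (with p = q) and the definitions of g and h given earlier; it is a restatement, not additional material from the lecture.

```latex
% Variance along the frontier as a function of the target mean E[r_p]:
\sigma^2(r_p) = \frac{C}{D}\left(E[r_p] - \frac{A}{C}\right)^2 + \frac{1}{C}
% The squared term vanishes at the minimum:
E[r_{MV}] = \frac{A}{C}, \qquad Var[r_{MV}] = \frac{1}{C}
% Substituting E[r_p] = A/C into w_p = g + h E[r_p]:
w_{MV} = g + h\,\frac{A}{C}
       = \frac{1}{D}\left[B\,\Sigma^{-1}\iota - A\,\Sigma^{-1}e\right]
       + \frac{A}{CD}\left[C\,\Sigma^{-1}e - A\,\Sigma^{-1}\iota\right]
       = \frac{BC - A^2}{CD}\,\Sigma^{-1}\iota
       = \frac{\Sigma^{-1}\iota}{C}
```

The minimum variance weights therefore do not depend on the expected returns at all.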

Definition: Frontier portfolios which have expected rates of return
strictly greater than that of the minimum variance portfolio are
called efficient portfolios.
These are portfolios which have the highest return for a given variance.

Let $w_i$, $i = 1, \ldots, m$, be m frontier portfolios and
$\alpha_i$, $i = 1, \ldots, m$, be real numbers such that $\sum_{i=1}^{m}\alpha_i = 1$.
Then
$$\sum_{i=1}^{m}\alpha_i w_i = \sum_{i=1}^{m}\alpha_i\left(g + h\,E[r_i]\right) = g + h\sum_{i=1}^{m}\alpha_i E[r_i]$$
Therefore, any linear combination of frontier portfolios (with weights
summing to one) is on the frontier.


If the $i = 1, \ldots, m$ portfolios are efficient, and $\alpha_i \ge 0$ for all i,
then
$$\sum_{i=1}^{m}\alpha_i E[r_i] \ge \sum_{i=1}^{m}\alpha_i\frac{A}{C} = \frac{A}{C}$$
Any convex combination of efficient portfolios is an
efficient portfolio (i.e. the set of efficient portfolios is a
convex set).

Bibliography
Cornuéjols and Tütüncü, Optimization Methods in Finance,
Cambridge.
Huang and Litzenberger, Foundations for Financial Economics,
North-Holland.
Intriligator, Mathematical Optimization and Economic Theory,
Prentice-Hall.
Marsden and Tromba, Vector Calculus, Freeman.
Varian, Microeconomic Analysis, Norton.
Mean-Variance Analysis: Risk Free Rate
Everything we have done so far did not have a riskless asset.
Now consider N+1 assets with $w_p$ equal to the portfolio
weights on risky assets. $w_p$ is the solution to
$$\min_{w}\ \frac{1}{2}w'\Sigma w \quad \text{s.t. } w'e + (1 - w'\iota)\,r_f = E[r_p]$$

Which has the solution
$$w_p = \Sigma^{-1}(e - r_f\,\iota)\,\frac{E[r_p] - r_f}{B - 2A\,r_f + C\,r_f^2}$$
$$\sigma^2(r_p) = \frac{\left(E[r_p] - r_f\right)^2}{B - 2A\,r_f + C\,r_f^2}$$
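A numerical sketch of this solution follows. The asset data are the three-stock figures used in the later example in these notes; the riskless rate of 5% is purely an assumption for illustration.

```python
import numpy as np

e = np.array([0.100162, 0.164244, 0.182082])
Sigma = np.array([[0.100162, 0.045864, 0.005712],
                  [0.045864, 0.210773, 0.028283],
                  [0.005712, 0.028283, 0.066884]])
iota = np.ones(3)
r_f = 0.05          # hypothetical riskless rate (not from the notes)
target = 0.15       # required expected portfolio return

excess = e - r_f * iota
Sinv_excess = np.linalg.solve(Sigma, excess)     # Sigma^{-1} (e - r_f * iota)
H = excess @ Sinv_excess                         # equals B - 2*A*r_f + C*r_f**2

w_risky = Sinv_excess * (target - r_f) / H       # weights on the risky assets
w_riskless = 1.0 - w_risky.sum()                 # remainder goes to the riskless asset
variance = w_risky @ Sigma @ w_risky

# Cross-check H against the A, B, C constants of the risky-only frontier.
A = iota @ np.linalg.solve(Sigma, e)
B = e @ np.linalg.solve(Sigma, e)
C = iota @ np.linalg.solve(Sigma, iota)
```

The risky weights need not sum to one; whatever is left over (positive or negative) is lent or borrowed at the riskless rate.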
There are three cases.
1. $A/C > r_f$

2. $A/C < r_f$

3. $A/C = r_f$
Note: in case 3, invest everything in the riskless asset and hold an arbitrage
portfolio of risky assets whose weights sum to zero.

We can also write
$$E[r_q] = r_f + \beta_{qp}\left(E[r_p] - r_f\right)$$
which holds independent of the relationship between $r_f$ and A/C,
and
$$r_q = (1 - \beta_{qp})\,r_f + \beta_{qp}\,r_p + \varepsilon_q$$
$$Cov(r_p, \varepsilon_q) = E[\varepsilon_q] = 0$$
for any frontier portfolio p other than the riskless asset.

Mean-Variance Analysis
Let's return to our minimization problem:
$$\min_{w}\ \frac{1}{2}w'\Sigma w \quad \text{s.t. } w'e = E[r_p] \text{ and } w'\iota = 1$$
There are alternative ways to pose this problem; for
example, we could rewrite the constraints as:
$$Aw = b$$
Where
$$A = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ e_1 & e_2 & \cdots & e_N \end{bmatrix}, \qquad b = \begin{bmatrix} 1 \\ E[r_p] \end{bmatrix}$$
Note: If we wanted to include a riskless asset, we could also have N+1 assets
with one of the assets' returns equal to the risk-free rate.
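Forming the Lagrangian
$$\mathcal{L} = \frac{1}{2}w'\Sigma w - \lambda'(b - Aw)$$
With FOC
$$\frac{\partial\mathcal{L}}{\partial w} = \Sigma w + A'\lambda = 0$$
$$\frac{\partial\mathcal{L}}{\partial\lambda} = Aw - b = 0$$
Solving now, from the first FOC
$$w = -\Sigma^{-1}A'\lambda$$
Substituting into the second FOC and solving for the
optimal weights gives
$$w = \Sigma^{-1}A'(A\Sigma^{-1}A')^{-1}b$$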
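Example: Assume that we have three stocks with the
following characteristics (what do you expect?)
$$e = \begin{bmatrix} e_1 \\ e_2 \\ e_3 \end{bmatrix} = \begin{bmatrix} 0.100162 \\ 0.164244 \\ 0.182082 \end{bmatrix}$$
$$\Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} \\ \sigma_{21} & \sigma_{22} & \sigma_{23} \\ \sigma_{31} & \sigma_{32} & \sigma_{33} \end{bmatrix} = \begin{bmatrix} 0.100162 & 0.045864 & 0.005712 \\ 0.045864 & 0.210773 & 0.028283 \\ 0.005712 & 0.028283 & 0.066884 \end{bmatrix}$$
And that we want a 15% return on the portfolio (is this
feasible?). The constraints can be written
$$A = \begin{bmatrix} 1 & 1 & 1 \\ 0.100162 & 0.164244 & 0.182082 \end{bmatrix}, \qquad b = \begin{bmatrix} 1 \\ 0.15 \end{bmatrix}$$
Now we can use the solution to find the optimal weights
$$w = \Sigma^{-1}A'(A\Sigma^{-1}A')^{-1}b = \begin{bmatrix} 0.3830 \\ 0.0397 \\ 0.5773 \end{bmatrix}$$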
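The closed-form solution can be evaluated directly with a linear algebra library. The following is a minimal sketch using NumPy (the choice of library is an assumption; the lecture itself refers to Matlab), and it should reproduce the reported weights up to rounding.

```python
import numpy as np

Sigma = np.array([[0.100162, 0.045864, 0.005712],
                  [0.045864, 0.210773, 0.028283],
                  [0.005712, 0.028283, 0.066884]])
A = np.array([[1.0, 1.0, 1.0],
              [0.100162, 0.164244, 0.182082]])   # budget row and expected-return row
b = np.array([1.0, 0.15])                        # fully invested, 15% target return

Sinv_At = np.linalg.solve(Sigma, A.T)            # Sigma^{-1} A'
w = Sinv_At @ np.linalg.solve(A @ Sinv_At, b)    # Sigma^{-1} A' (A Sigma^{-1} A')^{-1} b

print(np.round(w, 4))
print("constraints satisfied:", np.allclose(A @ w, b))
```

Using `np.linalg.solve` rather than forming explicit inverses is a numerical-stability choice; for a 3x3 problem either approach gives the same answer.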
Do you see any problems or issues associated with the
solution to our portfolio problem?
There may be other constraints which must be imposed:
Diversification constraints
max or min
Short-sale constraints
Borrowing constraints
Leverage constraints
Tracking error constraints
Etc.
For example, the Investment Company Act of 1940
Rule 12-d3 imposes certain investment constraints on
mutual funds:
Mutual funds cannot own more than 5% of other investment
companies (firms which derive more than 15% of revenue from
securities related activity)
If a mutual fund advertises as a diversified fund, it cannot hold
more than 5% of its assets in any company or hold more than
10% of the voting stock for any company for 75% of the fund
This means that we may want to (or need to) place
additional constraints on our optimization. Further, these
constraints may be inequality constraints (for example, a
short-sale constraint would be expressed as $w_i \ge 0$ for
all i).
So, let's revisit optimization, this time with inequality
constraints.
Optimization with Inequalities

Consider a problem of the form
$$\max_{x} f(x) \text{ subject to } g_j(x) \le c_j \text{ for } j = 1, \ldots, m$$
where f and $g_j$ for j = 1, ..., m are functions of n variables,
$x = (x_1, \ldots, x_n)$, and $c_j$ for j = 1, ..., m are constants.
All of the problems we have studied so far can be put into
this form.
For equality constraints, we simply introduce two inequality
constraints for every equality. For example, the problem
$$\max_{x} f(x) \text{ subject to } g(x) = 0$$
can be written as
$$\max_{x} f(x) \text{ subject to } g(x) \le 0 \text{ and } -g(x) \le 0$$
To start thinking about how to solve the general problem,
first consider the case with a single constraint
$$\max_{x} f(x) \text{ subject to } g(x) \le c$$
There are two possible solutions for this problem, one where
the constraint is binding and the other where the
constraint does not bind. In the latter case, where the
constraint is not binding for small changes in the
constraint, we say that the constraint is slack.

As before, we define the Lagrangian by
$$\mathcal{L}(x) = f(x) - \lambda\left(g(x) - c\right)$$
From our previous analysis of problems with equality
constraints and problems with no constraints,
if $g(x^*) = c$ (as in the left-hand panel) and the constraint
satisfies a regularity condition, then $\mathcal{L}'_i(x^*) = 0$ for all i
if $g(x^*) < c$ (as in the right-hand panel), then $f'_i(x^*) = 0$ for
all i.
In the first case (that is, if $g(x^*) = c$) we have $\lambda \ge 0$.
Suppose, to the contrary, that $\lambda < 0$. Then we know that a
small decrease in c raises the maximal value of f. That is,
moving x* inside the constraint raises the value of f,
contradicting the fact that x* is the solution of the
problem.
In the second case, the value of $\lambda$ does not enter the
conditions, so we can choose any value for it. Given the
interpretation of $\lambda$, setting $\lambda = 0$ makes sense. Under this
assumption we have $f'_i(x) = \mathcal{L}'_i(x)$ for all x, so that
$\mathcal{L}'_i(x^*) = 0$ for all i.

Thus in both cases we have $\mathcal{L}'_i(x^*) = 0$ for all i, $\lambda \ge 0$, and
$g(x^*) \le c$. In the first case we have $g(x^*) = c$ and in the
second case $\lambda = 0$.
We can combine the two cases by writing the conditions as
$$\frac{\partial\mathcal{L}(x^*)}{\partial x_j} = 0 \text{ for } j = 1, \ldots, n$$
$$\lambda \ge 0,\ g(x^*) \le c, \text{ and either } \lambda = 0 \text{ or } g(x^*) - c = 0$$
Alternatively, since the product of two numbers is zero if and only if at
least one of them is zero, we can write
$$\frac{\partial\mathcal{L}(x^*)}{\partial x_j} = 0 \text{ for } j = 1, \ldots, n$$
$$\lambda \ge 0,\ g(x^*) \le c, \text{ and } \lambda\left(g(x^*) - c\right) = 0$$
Note that we have not ruled out the possibility that both $\lambda = 0$ and $g(x^*) = c$.
The inequalities $\lambda \ge 0$ and $g(x^*) \le c$ are called
complementary slackness conditions; at most one of these
conditions is slack (i.e. not an equality).
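For a problem with many constraints, we introduce a
multiplier for each constraint and obtain the Kuhn-Tucker
conditions. For the problem
$$\max_{x} f(x) \text{ subject to } g_j(x) \le c_j \text{ for } j = 1, \ldots, m$$
The Kuhn-Tucker conditions are
$$\frac{\partial\mathcal{L}(x^*)}{\partial x_i} = 0 \text{ for } i = 1, \ldots, n$$
$$\lambda_j \ge 0,\ g_j(x^*) \le c_j, \text{ and } \lambda_j\left(g_j(x^*) - c_j\right) = 0 \text{ for } j = 1, \ldots, m$$
Where
$$\mathcal{L}(x) = f(x) - \sum_{j=1}^{m}\lambda_j\left(g_j(x) - c_j\right)$$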
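Example: for the problem
$$\max_{x_1, x_2}\ -(x_1 - 4)^2 - (x_2 - 4)^2 \quad \text{subject to } x_1 + x_2 \le 4 \text{ and } x_1 + 3x_2 \le 9$$
The Lagrangian is
$$\mathcal{L}(x_1, x_2) = -(x_1 - 4)^2 - (x_2 - 4)^2 - \lambda_1(x_1 + x_2 - 4) - \lambda_2(x_1 + 3x_2 - 9)$$
And the Kuhn-Tucker conditions are
$$-2(x_1 - 4) - \lambda_1 - \lambda_2 = 0$$
$$-2(x_2 - 4) - \lambda_1 - 3\lambda_2 = 0$$
$$x_1 + x_2 \le 4,\ \lambda_1 \ge 0, \text{ and } \lambda_1(x_1 + x_2 - 4) = 0$$
$$x_1 + 3x_2 \le 9,\ \lambda_2 \ge 0, \text{ and } \lambda_2(x_1 + 3x_2 - 9) = 0$$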
We have seen that a solution x* of an optimization problem
with equality constraints is a stationary point of the
Lagrangian if the constraints satisfy a regularity condition
($\nabla g(x^*) \ne 0$ in the case of a single constraint g(x) = c). In
an optimization problem with inequality constraints a
related regularity condition guarantees that a solution
satisfies the Kuhn-Tucker conditions. The weakest forms
of this regularity condition are difficult to verify. The next
result gives three alternative strong forms that are much
easier to verify.

Proposition: Let f and $g_j$ for j = 1, ..., m be continuously
differentiable functions of many variables and let $c_j$ for
j = 1, ..., m be constants. Suppose that x* solves the problem
$$\max_{x} f(x) \text{ subject to } g_j(x) \le c_j \text{ for } j = 1, \ldots, m$$
Suppose that
either each $g_j$ is concave
or each $g_j$ is convex and there is some x such that $g_j(x) < c_j$
for j = 1, ..., m
or each $g_j$ is quasi-convex, $\nabla g_j(x^*) \ne (0, \ldots, 0)$ for all j, and there is some x
such that $g_j(x) < c_j$ for j = 1, ..., m.
Then there exists a unique vector $\lambda = (\lambda_1, \ldots, \lambda_m)$ such that
$(x^*, \lambda)$ satisfies the Kuhn-Tucker conditions.
Example of a quasi-convex
function which is not
convex.
Example of a function which
is not quasi-convex.

Recall that a linear function is concave, so the conditions in
the result are satisfied if each constraint function is linear.
Note that the last part of the second and third conditions is
very weak: it requires only that some point strictly satisfy
all the constraints.
One way in which the conditions in the result may be
weakened is sometimes useful: the conditions on the
constraint functions need to be satisfied only by the
binding constraints, those for which $g_j(x^*) = c_j$.

We saw previously that for both an unconstrained
maximization problem and a maximization problem with
an equality constraint the first-order conditions are
sufficient for a global optimum when the objective and
constraint functions satisfy appropriate
concavity/convexity conditions. The same is true for an
optimization problem with inequality constraints.
Precisely, we have the following result.

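Proposition: Let f and $g_j$ for j = 1, ..., m be continuously
differentiable functions of many variables and let $c_j$ for
j = 1, ..., m be constants. Consider the problem
$$\max_{x} f(x) \text{ subject to } g_j(x) \le c_j \text{ for } j = 1, \ldots, m$$
Suppose that
f is concave
and $g_j$ is quasi-convex for j = 1, ..., m.
If there exists $\lambda = (\lambda_1, \ldots, \lambda_m)$ such that $(x^*, \lambda)$ satisfies the
Kuhn-Tucker conditions then x* solves the problem.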
Corollary: The Kuhn-Tucker conditions are both necessary
and sufficient if the objective function is concave and
either
each constraint is linear
or each constraint function is convex and some vector of the
variables satisfies all constraints strictly.
But sometimes the condition that the objective function is
concave is too strong to be useful, for instance, we
generally assume that utility functions are quasi-concave,
in which case, the following result is useful.

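Proposition: Let f and $g_j$ for j = 1, ..., m be continuously
differentiable functions of many variables and let $c_j$ for
j = 1, ..., m be constants. Consider the problem
$$\max_{x} f(x) \text{ subject to } g_j(x) \le c_j \text{ for } j = 1, \ldots, m$$
Suppose that
f is twice differentiable and quasi-concave
and $g_j$ is quasi-convex for j = 1, ..., m.
If there exists $\lambda = (\lambda_1, \ldots, \lambda_m)$ and a value of x* such that
$(x^*, \lambda)$ satisfies the Kuhn-Tucker conditions and $f'_i(x^*) \ne 0$ for
i = 1, ..., n then x* solves the problem.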
Corollary: Suppose that the objective function is twice
differentiable and quasi-concave and every constraint is
linear. If x* solves the problem then there exists a unique
vector $\lambda$ such that $(x^*, \lambda)$ satisfies the Kuhn-Tucker
conditions, and if $(x^*, \lambda)$ satisfies the Kuhn-Tucker
conditions and $f'_i(x^*) \ne 0$ for i = 1, ..., n then x* solves
the problem.

Very Important!
If you have a minimization problem, remember that you can
transform it to a maximization problem by multiplying the
objective function by −1. Thus for a minimization
problem the condition on the objective function in the first
result above is that it be convex, and the condition in the
second result is that it be quasi-convex.

Example: $\max_x\ [-(x - 2)^2]$ subject to $x \ge 1$.
Written in the standard format, this problem is
$\max_x\ [-(x - 2)^2]$ subject to $1 - x \le 0$.
The objective function is concave and the constraint is
linear. Thus the Kuhn-Tucker conditions are both
necessary and sufficient: the set of solutions of the
problem is the same as the set of solutions of the
Kuhn-Tucker conditions.

The Kuhn-Tucker conditions are
$-2(x - 2) + \lambda = 0$
$x - 1 \ge 0$, $\lambda \ge 0$, and $\lambda(1 - x) = 0$.
From the last condition we have either $\lambda = 0$ or x = 1.
x = 1: $2 + \lambda = 0$, or $\lambda = -2$, which violates $\lambda \ge 0$.
$\lambda = 0$: $-2(x - 2) = 0$; the only solution is x = 2.
Thus the Kuhn-Tucker conditions have a unique solution,
$(x, \lambda) = (2, 0)$. Hence the problem has a unique solution
x = 2.

Example: $\max_x\ [-(x - 2)^2]$ subject to $x \ge 3$.
Written in the standard format, this problem is
$\max_x\ [-(x - 2)^2]$ subject to $3 - x \le 0$.
As in the previous example, the objective function is
concave and the constraint function is linear, so that the
set of solutions of the problem is the set of solutions of the
Kuhn-Tucker conditions.

The Kuhn-Tucker conditions are
$-2(x - 2) + \lambda = 0$
$x - 3 \ge 0$, $\lambda \ge 0$, and $\lambda(3 - x) = 0$.
From the last condition we have either $\lambda = 0$ or x = 3.
x = 3: $-2 + \lambda = 0$, or $\lambda = 2$.
$\lambda = 0$: $-2(x - 2) = 0$; since $x \ge 3$ this has no solution compatible with
the other conditions.
Thus the Kuhn-Tucker conditions have a single solution,
$(x, \lambda) = (3, 2)$. Hence the problem has a unique solution,
x = 3.
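The case-by-case reasoning in the two examples above can be checked mechanically. The sketch below is illustrative only; the function name `solve_kt` is made up, and it handles just this one-variable problem with a lower bound on x.

```python
def solve_kt(lower_bound):
    """Solve max -(x - 2)^2 subject to x >= lower_bound by checking both KT cases.

    Stationarity: -2(x - 2) + lam = 0, with lam >= 0, x >= lower_bound,
    and complementary slackness lam * (lower_bound - x) = 0.
    """
    solutions = []

    # Case 1: the constraint binds (x = lower_bound); stationarity pins down lam.
    x = float(lower_bound)
    lam = 2.0 * (x - 2.0)
    if lam >= 0:
        solutions.append((x, lam))

    # Case 2: the multiplier is zero; stationarity gives x = 2.
    x, lam = 2.0, 0.0
    if x >= lower_bound and (x, lam) not in solutions:
        solutions.append((x, lam))

    return solutions


print(solve_kt(1))   # constraint x >= 1
print(solve_kt(3))   # constraint x >= 3
```

With a lower bound of 1 only the slack case survives, and with a lower bound of 3 only the binding case survives, matching the hand calculations.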

These two examples illustrate a procedure for finding
solutions of the Kuhn-Tucker conditions that is useful in
many problems.
1. Look at the complementary slackness conditions, which
imply that either a Lagrange multiplier is zero or a
constraint is binding.
2. Check the implications of each case, using the other
equations.
In these two examples, this procedure is very easy to follow.
The following examples are more complicated.

Example: Consider again the problem
$$\max_{x_1, x_2}\ -(x_1 - 4)^2 - (x_2 - 4)^2 \quad \text{subject to } x_1 + x_2 \le 4 \text{ and } x_1 + 3x_2 \le 9$$
The objective function is concave and the constraints are
both linear, so the solutions of the problem are the
solutions of the Kuhn-Tucker conditions.
We previously found the Kuhn-Tucker conditions,
$$-2(x_1 - 4) - \lambda_1 - \lambda_2 = 0$$
$$-2(x_2 - 4) - \lambda_1 - 3\lambda_2 = 0$$
$$x_1 + x_2 \le 4,\ \lambda_1 \ge 0, \text{ and } \lambda_1(x_1 + x_2 - 4) = 0$$
$$x_1 + 3x_2 \le 9,\ \lambda_2 \ge 0, \text{ and } \lambda_2(x_1 + 3x_2 - 9) = 0$$
What are the solutions of these conditions? Start by looking
at the two conditions $\lambda_1(x_1 + x_2 - 4) = 0$ and
$\lambda_2(x_1 + 3x_2 - 9) = 0$. These two conditions yield the
following four cases.
(1) $x_1 + x_2 = 4$ and $x_1 + 3x_2 = 9$:
In this case we have $x_1 = 3/2$ and $x_2 = 5/2$. Then the first two
equations are
$5 - \lambda_1 - \lambda_2 = 0$
$3 - \lambda_1 - 3\lambda_2 = 0$
which imply that $\lambda_1 = 6$ and $\lambda_2 = -1$, which violates the
condition $\lambda_2 \ge 0$. We can rule out this case.

(2) $x_1 + x_2 = 4$ and $x_1 + 3x_2 < 9$, so that $\lambda_2 = 0$:
Then the first two equations imply $x_1 = x_2 = 2$ and $\lambda_1 = 4$.
All the conditions are satisfied, so
$(x_1, x_2, \lambda_1, \lambda_2) = (2, 2, 4, 0)$ is a solution.

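(3) $x_1 + x_2 < 4$ and $x_1 + 3x_2 = 9$, so that $\lambda_1 = 0$:
Then the first two equations give $x_1 = 4 - \lambda_2/2$ and $x_2 = 4 - 3\lambda_2/2$;
substituting into $x_1 + 3x_2 = 9$ yields $\lambda_2 = 7/5$, so that
$x_1 = 33/10$ and $x_2 = 19/10$, violating $x_1 + x_2 < 4$. We can rule out this case.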

(4) $x_1 + x_2 < 4$ and $x_1 + 3x_2 < 9$, so that $\lambda_1 = \lambda_2 = 0$:
Then the first two equations imply $x_1 = x_2 = 4$, violating
$x_1 + x_2 < 4$. We can rule out this case.
So $(x_1, x_2, \lambda_1, \lambda_2) = (2, 2, 4, 0)$ is the single solution of the
Kuhn-Tucker conditions. Hence the unique solution of the
problem is $(x_1, x_2) = (2, 2)$.
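The four cases can also be enumerated in code. This is a sketch for this specific problem only: substituting the stationarity conditions $x_1 = 4 - (\lambda_1 + \lambda_2)/2$ and $x_2 = 4 - (\lambda_1 + 3\lambda_2)/2$ into a binding constraint gives a linear equation in the multipliers, while a slack constraint simply sets its multiplier to zero. The helper names are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction as F


def kt_candidate(bind1, bind2):
    """Solve one complementary-slackness case of
    max -(x1-4)^2 - (x2-4)^2  s.t.  x1 + x2 <= 4,  x1 + 3*x2 <= 9.

    Binding constraint 1:  l1 + 2*l2 = 4   (from x1 + x2 = 4)
    Binding constraint 2:  2*l1 + 5*l2 = 7 (from x1 + 3*x2 = 9)
    Slack constraint i:    l_i = 0
    """
    row1, rhs1 = ((F(1), F(2)), F(4)) if bind1 else ((F(1), F(0)), F(0))
    row2, rhs2 = ((F(2), F(5)), F(7)) if bind2 else ((F(0), F(1)), F(0))
    det = row1[0] * row2[1] - row1[1] * row2[0]      # Cramer's rule for the 2x2 system
    l1 = (rhs1 * row2[1] - row1[1] * rhs2) / det
    l2 = (row1[0] * rhs2 - rhs1 * row2[0]) / det
    x1 = 4 - (l1 + l2) / 2
    x2 = 4 - (l1 + 3 * l2) / 2
    return x1, x2, l1, l2


def satisfies_kt(x1, x2, l1, l2):
    """Check multiplier signs and feasibility (slackness holds by construction)."""
    return l1 >= 0 and l2 >= 0 and x1 + x2 <= 4 and x1 + 3 * x2 <= 9


cases = [(b1, b2) for b1 in (True, False) for b2 in (True, False)]
candidates = {case: kt_candidate(*case) for case in cases}
solutions = [c for c in candidates.values() if satisfies_kt(*c)]
print(solutions)
```

Only the case in which the first constraint binds and the second is slack passes every check.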

Example: $\max_{x,y}\ xy$ subject to $x + y \le 6$, $x \ge 0$, and $y \ge 0$.
The objective function is twice-differentiable and quasi-concave
and the constraint functions are linear, so the
Kuhn-Tucker conditions are necessary and if $((x^*, y^*), \lambda^*)$
satisfies these conditions and no partial derivative of the
objective function at (x*, y*) is zero then (x*, y*) solves
the problem. Solutions of the Kuhn-Tucker conditions at
which all derivatives of the objective function are zero
may or may not be solutions of the problem (we need to check
the values of the objective function at these solutions).

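The Lagrangian is
$$\mathcal{L}(x, y) = xy - \lambda_1(x + y - 6) + \lambda_2 x + \lambda_3 y$$
The Kuhn-Tucker conditions are
$y - \lambda_1 + \lambda_2 = 0$
$x - \lambda_1 + \lambda_3 = 0$
$\lambda_1 \ge 0$, $x + y \le 6$, $\lambda_1(x + y - 6) = 0$
$\lambda_2 \ge 0$, $x \ge 0$, $\lambda_2 x = 0$
$\lambda_3 \ge 0$, $y \ge 0$, $\lambda_3 y = 0$.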

(1) If x > 0 and y > 0 then $\lambda_2 = \lambda_3 = 0$, so that $\lambda_1 = x = y$ from
the first two conditions. Since $\lambda_1 > 0$, the third condition requires
x + y = 6, hence $x = y = \lambda_1 = 3$. These values satisfy all the conditions.
(2) If x = 0 and y > 0 then $\lambda_3 = 0$ from the last condition and
hence $\lambda_1 = x = 0$ from the second condition. But now from
the first condition $\lambda_2 = -y < 0$, contradicting $\lambda_2 \ge 0$.
(3) If x > 0 and y = 0 then $\lambda_2 = 0$, and a symmetric argument
yields a contradiction.
(4) If x = y = 0 then $\lambda_1 = 0$ from the third set of conditions,
so that $\lambda_2 = \lambda_3 = 0$ from the first and second conditions. These
values satisfy all the conditions.

We conclude that there are two solutions of the Kuhn-Tucker
conditions, $(x, y, \lambda_1, \lambda_2, \lambda_3) = (3, 3, 3, 0, 0)$ and
(0, 0, 0, 0, 0). The value of the objective function at (3, 3)
is greater than the value of the objective function at (0, 0),
so the solution of the problem is (3, 3).
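A small checker makes the final comparison explicit. The sketch below tests each candidate tuple against the Kuhn-Tucker conditions of this example and then compares objective values; the third candidate, (6, 0, 0, 0, 0), is added only as an illustration of a point that fails, and the function name is hypothetical.

```python
def kt_holds(x, y, l1, l2, l3, tol=1e-12):
    """Check the Kuhn-Tucker conditions for max x*y s.t. x + y <= 6, x >= 0, y >= 0."""
    stationarity = abs(y - l1 + l2) <= tol and abs(x - l1 + l3) <= tol
    feasibility = x + y <= 6 + tol and x >= -tol and y >= -tol
    multipliers = min(l1, l2, l3) >= -tol
    slackness = (abs(l1 * (x + y - 6)) <= tol
                 and abs(l2 * x) <= tol
                 and abs(l3 * y) <= tol)
    return stationarity and feasibility and multipliers and slackness


candidates = [(3, 3, 3, 0, 0), (0, 0, 0, 0, 0), (6, 0, 0, 0, 0)]
valid = [c for c in candidates if kt_holds(*c)]

# Both valid points satisfy the conditions, so compare objective values directly.
best = max(valid, key=lambda c: c[0] * c[1])
print(valid, best)
```

The point (0, 0) satisfies the conditions but all partial derivatives of xy vanish there, which is why the comparison of objective values is needed.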
Optimization Summary
Conditions under which FOC are necessary and sufficient:
Unconstrained Maximization Problems
If x* solves $\max_x f(x)$ then $f'_i(x^*) = 0$ for i = 1, ..., n.
If $f'_i(x^*) = 0$ for i = 1, ..., n and if f is concave then x*
solves $\max_x f(x)$.
Equality Constrained Maximization Problems (one constraint)
If x* solves $\max_x f(x)$ subject to g(x) = c, and if
$\nabla g(x^*) \ne (0, \ldots, 0)$, then there exists $\lambda$ such that $\mathcal{L}'_i(x^*) = 0$
for i = 1, ..., n and g(x*) = c.
If there exists $\lambda$ such that $\mathcal{L}'_i(x^*) = 0$ for i = 1, ..., n and
g(x*) = c and if f is concave and g is convex then x*
solves $\max_x f(x)$ subject to g(x) = c.
Inequality Constrained Maximization Problems
If x* solves $\max_x f(x)$ subject to $g_j(x) \le c_j$ for j = 1, ..., m
and if {$g_j$ is concave for j = 1, ..., m} or {$g_j$ is convex for
j = 1, ..., m and there exists x such that $g_j(x) < c_j$ for
j = 1, ..., m} or {$g_j$ is quasi-convex for j = 1, ..., m,
$\nabla g_j(x^*) \ne (0, \ldots, 0)$ for j = 1, ..., m, and there exists x such
that $g_j(x) < c_j$ for j = 1, ..., m} then there exists $(\lambda_1, \ldots, \lambda_m)$
such that $\mathcal{L}'_i(x^*) = 0$ for i = 1, ..., n and $\lambda_j \ge 0$, $g_j(x^*) \le c_j$,
and $\lambda_j(g_j(x^*) - c_j) = 0$ for j = 1, ..., m.
Inequality Constrained Maximization Problems
If there exists $(\lambda_1, \ldots, \lambda_m)$ such that $\mathcal{L}'_i(x^*) = 0$ for i = 1, ..., n
and $\lambda_j \ge 0$, $g_j(x^*) \le c_j$, and $\lambda_j(g_j(x^*) - c_j) = 0$ for j = 1, ..., m
and if $g_j$ is quasi-convex for j = 1, ..., m and either {f is
concave} or {f is quasi-concave and twice differentiable
and $\nabla f(x^*) \ne (0, \ldots, 0)$}, where
$\mathcal{L}(x) = f(x) - \sum_{j=1}^{m}\lambda_j(g_j(x) - c_j)$, then x* solves
$\max_x f(x)$ subject to $g_j(x) \le c_j$ for j = 1, ..., m.

3. $A/C = r_f$
Note: invest everything in the riskless asset and hold an arbitrage portfolio of
risky assets whose weights sum to zero.

Recall the expression for the optimal weights
$$w_p = \Sigma^{-1}(e - r_f\,\iota)\,\frac{E[r_p] - r_f}{B - 2A\,r_f + C\,r_f^2}$$
Substituting $r_f = A/C$ and premultiplying by $\iota'$, we get
$$\iota'w_p = \iota'\Sigma^{-1}\left(e - \frac{A}{C}\iota\right)\frac{E[r_p] - r_f}{B - 2A\,r_f + C\,r_f^2} = \left(A - \frac{A}{C}C\right)\frac{E[r_p] - r_f}{B - 2A\,r_f + C\,r_f^2} = 0$$
M-V Analysis Inequalities

Let's return to our exploration of mean-variance analysis.
When we add inequality constraints to our problem, the
quadratic optimization problem generally does not have a
simple analytical solution. Instead, we must use
numerical methods to solve for the optimal portfolio
weighting.
M-V Analysis Inequalities

State-of-the-art quadratic programming algorithms with inequality
constraints use two kinds of approaches: (1) the active-set method or
projection method, and (2) the interior point method.
Both of these approaches solve a series of sub-problems where there
are only equality constraints. They differ only in how they arrange
the order of those sub-problems. In the active-set method, you
proceed along the boundary of the feasible set defined by the
constraints. In the interior-point method, you proceed within the
feasible set. (You can use Matlab's functions, e.g. quadprog.)
Current implementations of interior methods often outperform active
set methods in terms of speed. On the other hand, active set methods
are more robust and better suited for warm starts, which are
important for solving integer optimization problems (quadprog uses an
active set method).
M-V Analysis Inequalities: Example

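Example: Let's return to our earlier numerical example,
adding the restriction that we cannot short any of the
stocks. In addition, we will also add the constraint that
stock 2 must have a weight of at least 0.10. Our problem
can be written:
$$\min_{w}\ \frac{1}{2}w'\Sigma w \quad \text{s.t. } Aw \le b$$
Where
$$A = \begin{bmatrix} 1 & 1 & 1 \\ 0.100162 & 0.164244 & 0.182082 \\ -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \\ 1 & 0 & 1 \end{bmatrix}$$
And
$$b = \begin{bmatrix} 1 \\ 0.15 \\ 0 \\ 0 \\ 0 \\ 0.90 \end{bmatrix}$$
The first two rows (the budget and target-return constraints) hold with
equality, as in the earlier example; the remaining rows are inequalities.
Notice that to express the constraint $w_2 \ge 0.10$, we used $w_1 + w_3 \le 0.90$. Sometimes
we need to reengineer our constraints to reach a solution.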

The solution is
$$w = \begin{bmatrix} 0.3699 \\ 0.1000 \\ 0.5301 \end{bmatrix}$$
(using quadprog this took 1 iteration)
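For a problem this small, the solution can be verified without a quadratic programming package. The sketch below is a brute-force enumeration of active sets, applying the Kuhn-Tucker conditions directly; it is not the algorithm quadprog uses and would not scale to realistic portfolios. The function name and the split of the constraints into equalities (E, d) and inequalities (G, h) are choices made for this illustration.

```python
import itertools
import numpy as np

Sigma = np.array([[0.100162, 0.045864, 0.005712],
                  [0.045864, 0.210773, 0.028283],
                  [0.005712, 0.028283, 0.066884]])
E = np.array([[1.0, 1.0, 1.0],
              [0.100162, 0.164244, 0.182082]])    # equality rows: budget, target return
d = np.array([1.0, 0.15])
G = np.array([[-1.0, 0.0, 0.0],
              [0.0, -1.0, 0.0],
              [0.0, 0.0, -1.0],
              [1.0, 0.0, 1.0]])                   # inequality rows: G w <= h
h = np.array([0.0, 0.0, 0.0, 0.90])


def solve_qp_by_enumeration(Sigma, E, d, G, h, tol=1e-9):
    """Minimize 0.5 w'Sigma w s.t. E w = d, G w <= h by trying each active set."""
    n, n_eq = Sigma.shape[0], E.shape[0]
    for size in range(n - n_eq + 1):
        for active in itertools.combinations(range(G.shape[0]), size):
            idx = np.array(active, dtype=int)
            C = np.vstack([E, G[idx]])                     # constraints treated as equalities
            rhs = np.concatenate([np.zeros(n), d, h[idx]])
            m = C.shape[0]
            K = np.block([[Sigma, C.T], [C, np.zeros((m, m))]])
            try:
                sol = np.linalg.solve(K, rhs)              # stationarity + binding constraints
            except np.linalg.LinAlgError:
                continue
            w, mu = sol[:n], sol[n + n_eq:]
            # Kuhn-Tucker: all inequalities hold and active multipliers are nonnegative.
            if np.all(G @ w <= h + tol) and np.all(mu >= -tol):
                return w, active
    return None, None


w, active = solve_qp_by_enumeration(Sigma, E, d, G, h)
print(np.round(w, 4), active)
```

The only active inequality at the solution is the last row, the re-engineered form of the minimum weight on stock 2.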

M-V Analysis
Congratulations!
Now you know how to do everything in portfolio analysis;
you just need to set up the appropriate problem.
Let's consider a few alternatives.
M-V Analysis: Diversification Constraint
As discussed last time, there are sometimes regulatory
requirements for diversification. In addition, many portfolios
are required (by their managers/investors) to have minimum
and/or maximum investment limits in certain stocks, industries,
sectors, or asset classes. These types of problems can be
generally expressed:
$$\min_{w}\ \frac{1}{2}w'\Sigma w \quad \text{s.t. } Aw \le b \text{ and } w_l \le w \le w_u$$
Where the vectors $w_l$ and $w_u$ represent lower and upper bounds.
M-V Analysis: Trading Volume
A typical constraint is one on trading volume. This
constraint may be used for a large portfolio where you
want to avoid price impact or for any portfolio where you
want to control the liquidity risk of the portfolio.
$$\min_{w}\ \frac{1}{2}w'\Sigma w \quad \text{s.t. } Aw \le b \text{ and } w \le cx$$
Where x is a vector of ADV (average daily volume) in dollar terms and c is a
constant for the threshold.
(e.g. $500 million portfolio; 10% of ADV (in millions) of stock i:
$w_i \le (0.1/500)\,x_i$) Can you generalize this?
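One possible generalization of the threshold is sketched below: the constant c is the allowed fraction of ADV divided by the portfolio value, so the caps can be recomputed for any portfolio size or participation limit. The function name and the ADV figures are made up for illustration.

```python
def adv_weight_caps(adv, portfolio_value, participation):
    """Upper bounds on portfolio weights implied by a volume limit.

    adv             -- average daily dollar volume per stock
    portfolio_value -- portfolio size, in the same units as adv
    participation   -- largest allowed position as a fraction of ADV
    """
    c = participation / portfolio_value      # the constant c in w <= c * x
    return [c * x for x in adv]


# The notes' example: a $500 million portfolio limited to 10% of ADV.
# The ADV figures (in $ millions) are hypothetical.
caps = adv_weight_caps([50.0, 200.0, 1000.0], 500.0, 0.10)
print(caps)
```

These caps can then be passed to the optimizer as the upper-bound vector in the inequality $w \le cx$.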
M-V Analysis: Beta Exposure
Sometimes it is desirable to match the beta of a benchmark
portfolio:
$$\min_{w}\ \frac{1}{2}w'\Sigma w \quad \text{s.t. } Aw \le b \text{ and } w'\beta = \beta_{\text{benchmark}}$$
Where:
$$\beta = (\beta_1, \ldots, \beta_N)'$$
(note that this will not bound the tracking error or asset specific risk, only the
factor risk)
M-V Analysis: Beta Exposure
Or we can specify a range for the beta exposure:
$$\min_{w}\ \frac{1}{2}w'\Sigma w \quad \text{s.t. } Aw \le b \text{ and } \beta_{\text{lower limit}} \le w'\beta \le \beta_{\text{upper limit}}$$
M-V Analysis: Factor Exposure
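Or sometimes we are matching multiple factors:
$$\min_{w}\ \frac{1}{2}w'\Sigma w \quad \text{s.t. } Aw \le b \text{ and } \beta_{\text{lower limit}} \le B'w \le \beta_{\text{upper limit}}$$
Where:
$$B = \begin{bmatrix} \beta_{11} & \beta_{12} & \cdots & \beta_{1K} \\ \beta_{21} & \beta_{22} & \cdots & \beta_{2K} \\ \vdots & \vdots & & \vdots \\ \beta_{N1} & \beta_{N2} & \cdots & \beta_{NK} \end{bmatrix}$$
(NB: tilting)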
M-V Analysis: Tracking Error
Most professionals with a benchmark use a minimization of
tracking error when weighting stocks in the portfolio.
Two methods:
1. Minimize the tracking error for a given expected excess
return over the benchmark.
2. Maximize the expected excess return over the benchmark
without exceeding a maximum tracking error constraint.
Tracking error is generally defined as the standard deviation
of the portfolio returns minus the benchmark returns:
$$TE = StdDev(r_p - r_{\text{benchmark}}) = \sqrt{Var(r_p - r_b)}$$
Consider the components of the variance
$$Var(r_p - r_b) = Var(r_p) - 2\,Cov(r_p, r_b) + Var(r_b)$$
The last term is beyond our control and the first term is what
we usually minimize.
Define
$$\gamma = \begin{bmatrix} Cov(r_1, r_b) \\ \vdots \\ Cov(r_N, r_b) \end{bmatrix}$$
And our problem becomes
$$\min_{w}\ w'\Sigma w - 2w'\gamma \quad \text{s.t. } Aw \le b \text{ and } w'\mu = \mu_p$$
M-V Analysis: Tracking Error (Factors)
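If we are dealing with multiple factors and want to minimize
tracking error, we note:
$$r_i = \beta_{i1}f_1 + \cdots + \beta_{ij}f_j + \cdots + \beta_{iK}f_K + \varepsilon_i$$
$$Var(r_i) = \beta_i'\,Var(f)\,\beta_i + Var(\varepsilon_i)$$
Where the vector f contains the factors into which we have
decomposed returns and the residual terms for different
securities have covariance of zero.
We can then write the variance-covariance matrix as
$$\Sigma = \begin{bmatrix} \beta_{1,1} & \cdots & \beta_{1,K} \\ \vdots & & \vdots \\ \beta_{N,1} & \cdots & \beta_{N,K} \end{bmatrix} \begin{bmatrix} Var(f_1) & \cdots & Cov(f_1, f_K) \\ \vdots & \ddots & \vdots \\ Cov(f_K, f_1) & \cdots & Var(f_K) \end{bmatrix} \begin{bmatrix} \beta_{1,1} & \cdots & \beta_{N,1} \\ \vdots & & \vdots \\ \beta_{1,K} & \cdots & \beta_{N,K} \end{bmatrix} + \begin{bmatrix} Var(\varepsilon_1) & & 0 \\ & \ddots & \\ 0 & & Var(\varepsilon_N) \end{bmatrix}$$
Or
$$\Sigma = B\,Var(f)\,B' + Var(\varepsilon)$$
B then represents the N by K matrix of factor exposures;
Var(f) is a K by K matrix of factor premium variances and
Var($\varepsilon$) is an N by N diagonal matrix of error variances.
The squared tracking error is then
$$TE^2 = (w_p - w_b)'\,B\,Var(f)\,B'\,(w_p - w_b) + (w_p - w_b)'\,Var(\varepsilon)\,(w_p - w_b)$$
If we add any other relevant constraints, we can solve this
using our quadratic optimizer.
(note: we are now minimizing the tracking error)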
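The decomposition of squared tracking error into a factor part and a security-specific part can be illustrated numerically. All inputs below (four securities, two factors, and the weights) are hypothetical values chosen only to show the calculation.

```python
import numpy as np

# Hypothetical inputs: N = 4 securities, K = 2 factors.
B = np.array([[1.0, 0.2],
              [0.9, -0.1],
              [1.1, 0.5],
              [1.0, 0.0]])                          # N x K factor exposures
var_f = np.array([[0.04, 0.01],
                  [0.01, 0.02]])                    # K x K factor covariance
var_eps = np.diag([0.010, 0.020, 0.015, 0.005])     # N x N diagonal residual variances

w_p = np.array([0.40, 0.10, 0.30, 0.20])            # portfolio weights
w_b = np.array([0.25, 0.25, 0.25, 0.25])            # benchmark weights
active = w_p - w_b                                  # active weights

factor_part = active @ B @ var_f @ B.T @ active     # tracking variance from factor bets
specific_part = active @ var_eps @ active           # tracking variance from residual risk
te_squared = factor_part + specific_part
tracking_error = np.sqrt(te_squared)

# The same number from the full covariance matrix Sigma = B Var(f) B' + Var(eps).
Sigma = B @ var_f @ B.T + var_eps
print(te_squared, active @ Sigma @ active)
```

Splitting the two parts shows how much of the tracking error comes from factor tilts and how much from stock selection.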
M-V Analysis: Tracking Error (Tilting)
When we actually have specific values or weights for our
factor exposure, we can tilt the portfolio to those weights
by applying a constraint
$$B'(w_p - w_b) = d$$
Where B is as defined earlier and d is the vector
representing the tilt. For example, if we have five factors:
market, size, growth, country, and sector and we wanted to
overweight size and growth, we could use
$$d = (0,\ 0.1,\ 0.1,\ 0,\ 0)'$$
M-V Analysis: Tracking Error (Tilting)
The zeros in d make sure that the portfolio's exposures to
market, country and sector are the same as the benchmark's,
and the nonzero values make sure that the exposure
to size and growth will be higher than the benchmark's by
0.1.
With factor tilting, the optimization problem becomes
$$\min_{w_p}\ (w_p - w_b)'\,Var(\varepsilon)\,(w_p - w_b) \quad \text{s.t. } B'(w_p - w_b) = d \text{ and any other constraints}$$
M-V Analysis: Tracking Error (Ghost)
There may be cases in which you do not know what the
underlying securities in the benchmark are or their
weights. In this case, you would minimize the tracking
error with respect to the history of returns of the
benchmark. One possible approach is to minimize
$$TE^2 = \begin{pmatrix} w_p \\ -1 \end{pmatrix}' \begin{pmatrix} B \\ \beta_b' \end{pmatrix} Var(f) \begin{pmatrix} B \\ \beta_b' \end{pmatrix}' \begin{pmatrix} w_p \\ -1 \end{pmatrix} + \begin{pmatrix} w_p \\ -1 \end{pmatrix}' \begin{pmatrix} Var(\varepsilon) & 0 \\ 0 & Var(\varepsilon_b) \end{pmatrix} \begin{pmatrix} w_p \\ -1 \end{pmatrix}$$
Where $\beta_b$ is the benchmark's factor exposure and $\varepsilon_b$ is the
benchmark's error term. Now that we have described the
tracking error, we continue as before.
M-V Analysis: Tracking Error (Risk-Adj)
As indicated earlier, an alternative approach is to have a
maximum tracking error constraint and maximize the
expected return of the portfolio subject to that constraint.
We could write this as
$$\max_{w}\ w'\mu \quad \text{s.t. } Var(r_p - r_b) = \sigma_x^2$$
And any other constraints. Alternatively, if we did not have
a target mean or tracking error, we could use a tracking
error risk aversion parameter A and write
$$\max_{w}\ w'\mu - A\,Var(r_p - r_b)$$
Note that these two formulations are related. The set of
maximum-return portfolios obtained as we vary the
tracking error constraint is identical to the set of optimal
portfolios obtained as we vary the tracking-error risk
aversion parameter. In other words, we can always choose
parameters so the two formulations are equivalent. This
property may be useful for solving the optimization
problem depending on how our optimizer wants the
problem to be set.
M-V Analysis
Get the idea?
Once we know how to solve the portfolio optimization
problem, everything else is just a wrinkle.
M-V Analysis
Get the idea?
That doesn't mean that it's easy; what it means is that we
have to figure out how to pose the problem that we want
to solve in a manner in which we can solve it (with the
help of an optimizer).
But, just for fun, let's see if there is anything else we can learn.
M-V Analysis Utility
Notice that in the numerical example at the beginning of
class, we assumed that we wanted an expected return for
the portfolio of 15% and optimized to achieve that
objective. What makes this right?
Theory would tell us that what we want to do is find the
point on the efficient frontier which maximizes the
investor's utility.
Note that less risk averse investors will have flatter indifference curves.
In practice, we often use a modified approach to mean-variance
analysis in which we construct optimal portfolios
for different risk tolerance parameters ($\lambda$), and by varying
$\lambda$, find the efficient frontier.
In this approach, we trade off risk against return by
maximizing
$$\max_{w} U \approx \max_{w}\left[\mu_p - \frac{1}{2\lambda}\sigma_p^2\right] = \max_{w}\left[w'\mu - \frac{1}{2\lambda}w'\Sigma w\right]$$
for various risk tolerances $\lambda$.

Where
$$\mu_i = E[R_i - c], \qquad \sigma_{ij} = Cov(R_i - c, R_j - c)$$
The unconstrained optimum is found using the FOC
$$\frac{dU}{dw} = \mu - \frac{1}{\lambda}\Sigma w = 0$$
$$w^* = \lambda\,\Sigma^{-1}\mu$$
under the normal regularity conditions.
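Or with equality constraints
$$\max_{w} U \approx \max_{w}\left[w'\mu - \frac{1}{2\lambda}w'\Sigma w\right] \text{ subject to } Aw = b$$
Forming the standard Lagrangian
$$\mathcal{L} = w'\mu - \frac{1}{2\lambda}w'\Sigma w - \gamma'(Aw - b)$$
FOC
$$\frac{\partial\mathcal{L}}{\partial w} = \mu - \frac{1}{\lambda}\Sigma w - A'\gamma = 0$$
$$\frac{\partial\mathcal{L}}{\partial\gamma} = Aw - b = 0$$
$$w^* = \lambda\,\Sigma^{-1}(\mu - A'\gamma)$$
$$Aw = b$$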
M-V Analysis Utility/2-Fund Separation
Solving for the optimal weights
$$w^* = \Sigma^{-1}A'(A\Sigma^{-1}A')^{-1}b + \lambda\,\Sigma^{-1}\left(\mu - A'(A\Sigma^{-1}A')^{-1}A\Sigma^{-1}\mu\right)$$
Notice that the optimal solution is split into a constrained
minimum-variance portfolio and a speculative portfolio.
This is known as two-fund separation. The first term does
not depend either on the expected returns or on the risk
tolerance; it is the constrained minimum-variance
portfolio. The second term depends on the expected
returns and the investor's risk tolerance.
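The split into two funds can be seen numerically. The sketch below reuses the three-stock figures from the earlier example, imposes only a budget constraint (weights sum to one), and assumes a risk tolerance of 0.5; the constraint choice and the risk tolerance are illustrative assumptions, not values from the lecture.

```python
import numpy as np

Sigma = np.array([[0.100162, 0.045864, 0.005712],
                  [0.045864, 0.210773, 0.028283],
                  [0.005712, 0.028283, 0.066884]])
mu = np.array([0.100162, 0.164244, 0.182082])   # expected returns
A = np.ones((1, 3))                             # single constraint: weights sum to one
b = np.array([1.0])
lam = 0.5                                       # hypothetical risk tolerance

Sinv = np.linalg.inv(Sigma)
P = Sinv @ A.T @ np.linalg.inv(A @ Sinv @ A.T)  # Sigma^{-1} A' (A Sigma^{-1} A')^{-1}

w_minvar = P @ b                                        # does not depend on mu or lam
w_spec = lam * (Sinv @ mu - P @ (A @ Sinv @ mu))        # scales linearly with lam
w_star = w_minvar + w_spec

print(np.round(w_minvar, 4), np.round(w_spec, 4), np.round(w_star, 4))
```

The speculative portfolio is self-financing here (its weights sum to zero), so changing the risk tolerance moves the investor along the frontier without violating the budget constraint.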
M-V Analysis Efficiency of Solution
A brief aside:
Note that constrained optimization reduces the efficiency of the
solution. A constrained solution must be less optimal than an
unconstrained solution (assuming that the constraint is
binding). The loss in efficiency can be measured as the
difference between a constrained and unconstrained solution.
But, not every difference between constrained and unconstrained
portfolios is statistically or economically significant. So we
might want to test whether there is a difference. One way to
test for significance is to use the Sharpe ratio (SR).
M-V Analysis Efficiency of Solution

Consider a simple case of running an unconstrained
optimization with k* assets and a constrained optimization
with k assets (k* > k). We can use
$$\frac{(N - k^*)(k^* - k)\left(SR^{*2} - SR^2\right)}{1 + SR^2} \sim F_{k^*,\ N - (k^* + k + 1)}$$
Where the statistic is F-distributed and the Sharpe Ratio is
$$SR = \frac{r - r_f}{\sigma}$$
Asset-Liability Management
Now consider the problem when we also have stochastic
liabilities. In this case, we focus on the difference between
assets and liabilities. This is known as surplus. The
change in surplus depends directly on the returns of the
asset portfolio ($R_p$) as well as the liability returns ($R_l$):
\[ \Delta\text{Surplus} = \text{Assets} \times R_p - \text{Liabilities} \times R_l \]
We will express surplus returns as a change in surplus
relative to assets
\[ \frac{\Delta\text{Surplus}}{\text{Assets}} = R_p - \frac{\text{Liabilities}}{\text{Assets}} R_l = R_p - f R_l \]
where f is the ratio of liabilities to assets. If we set f = 1 and
$R_l = c$, we are back in the world without liabilities (or
where cash is our liability).
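If we want to use the same optimizer, we need to transform
this problem into one of surplus, i.e. we need to express
covariance in terms of surplus risk and expected returns in
terms of the relative return of assets versus liabilities.
\[ \max_w \left( w'\mu_S - \frac{1}{2\lambda} w'\Sigma_S w \right) \quad \text{subject to } Aw = b \]
\[ \Sigma_S =
\begin{bmatrix} 1 & 0 & \cdots & 0 & -f \\ 0 & 1 & \cdots & 0 & -f \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & -f \end{bmatrix}
\begin{bmatrix} \sigma_{11} & \cdots & \sigma_{1k} & \sigma_{1l} \\ \vdots & \ddots & \vdots & \vdots \\ \sigma_{k1} & \cdots & \sigma_{kk} & \sigma_{kl} \\ \sigma_{l1} & \cdots & \sigma_{lk} & \sigma_{ll} \end{bmatrix}
\begin{bmatrix} 1 & 0 & \cdots & 0 & -f \\ 0 & 1 & \cdots & 0 & -f \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & -f \end{bmatrix}' \]
\[ \mu_S = \begin{bmatrix} \mu_1 - f\mu_l \\ \vdots \\ \mu_k - f\mu_l \end{bmatrix} + (1 - f)c \]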

Now our solution is
\[ w^* = \Sigma_S^{-1}A'(A\Sigma_S^{-1}A')^{-1}b + \lambda\Sigma_S^{-1}\left(\mu_S - A'(A\Sigma_S^{-1}A')^{-1}A\Sigma_S^{-1}\mu_S\right) \]
By varying the risk-tolerance parameter, we can trace out
the surplus-efficient frontier.
The unconstrained (asset-only) frontier and the surplus-
efficient frontier coincide if:
Liabilities are cash (or, equivalently, if assets have zero covariance
with liabilities)
All assets have the same covariance with liabilities
There exists a liability-mimicking asset and it lies on the efficient
frontier
The Investment Universe
The choice of the investment universe has a significant
impact on the outcome of portfolio construction. If we
constrain ourselves to NYSE equities, it is likely that our
optimizer will produce a solution skewed toward smaller
cap stocks (why?). If we add Nasdaq equities and foreign
equities, this is likely to change as the variance-covariance
structure changes.
In general, to avoid the accumulation of estimation errors,
we would like to limit our portfolio optimization to groups
of assets with high intragroup and low intergroup
correlations.
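In the two asset case, our unconstrained optimization
produces
\[ w^* = \lambda\Sigma^{-1}\mu \]
\[ \begin{bmatrix} w_1^* \\ w_2^* \end{bmatrix} = \lambda \begin{bmatrix} \Sigma^{-1}_{11} & \Sigma^{-1}_{12} \\ \Sigma^{-1}_{21} & \Sigma^{-1}_{22} \end{bmatrix} \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} \]
\[ \frac{dw_1^*}{d\mu_1} = \lambda\Sigma^{-1}_{11} = \lambda\frac{\sigma_{22}}{\sigma_{11}\sigma_{22} - \sigma_{12}\sigma_{21}} = \lambda\frac{1}{\sigma_{11} - \sigma_{11}\rho^2} = \lambda\frac{1}{\sigma_{11}(1 - \rho^2)} \]
where $\rho$ is the correlation between the two assets.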

As the correlation between the two assets approaches 1, the
portfolio weights will react very sensitively to changes in
means (or expected return estimates). As assets become
more similar, any expected return becomes increasingly
important for the allocation decision. Portfolio
optimization with highly correlated assets will almost
certainly lead to extreme and undiversified results.
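To get a feel for how quickly this sensitivity grows, here is a small sketch that evaluates $\lambda / (\sigma_{11}(1 - \rho^2))$ for a few correlations. The variance of 0.04 and the function name weight_sensitivity are illustrative assumptions.

```python
def weight_sensitivity(sigma11, rho, lam=1.0):
    """dw1*/dmu1 = lam / (sigma11 * (1 - rho**2)) in the two-asset case."""
    return lam / (sigma11 * (1.0 - rho ** 2))

# Hypothetical variance of 0.04 (20% volatility) for asset 1.
for rho in (0.0, 0.5, 0.9, 0.99):
    print(rho, round(weight_sensitivity(0.04, rho), 1))
```

Going from a correlation of 0 to 0.99 multiplies the sensitivity by roughly fifty, which is why small changes in expected returns can swing the weights of near-substitutes so violently.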
In the next homework set, I have you explore a method of reducing this
problem using cluster analysis.
Risk Decomposition
It is often useful to understand the sources of risk and
how those risks are spread through our portfolio. To get at
this, we can decompose risk in the following way.
Consider the standard deviation of portfolio returns
\[ \sigma_p = (w'\Sigma w)^{1/2} = \left( \sum_i w_i^2\sigma_{ii} + \sum_i \sum_{j \ne i} w_i w_j\sigma_{ij} \right)^{1/2} \]
The first question we would like to address is how
portfolio risk changes as we change the holdings of a particular
asset.

Risk Decomposition
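What we need is the marginal contribution to risk (MCTR),
which can be easily calculated
\[ \mathrm{MCTR}_{k \times 1} = \frac{d\sigma_p}{dw} = \frac{\Sigma w}{\sigma_p} \]
where the ith element in the k by 1 vector is
\[ \frac{d\sigma_p}{dw_i} = \frac{w_i\sigma_{ii} + \sum_{j \ne i} w_j\sigma_{ij}}{\sigma_p} = \frac{\sigma_{ip}}{\sigma_p} = \beta_i\sigma_p \]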
Risk Decomposition
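Note that if we add the weighted MCTRs of all securities in
the portfolio, we get the volatility of the portfolio
\[ \sum_i w_i\frac{d\sigma_p}{dw_i} = \sum_i w_i\frac{\sigma_{ip}}{\sigma_p} = \sigma_p \]
as we would expect. If we divide this expression by the
volatility of the portfolio, we get
\[ \sum_i \frac{w_i}{\sigma_p}\frac{d\sigma_p}{dw_i} = \sum_i w_i\frac{\sigma_{ip}}{\sigma_p^2} = \sum_i w_i\beta_i = 1 \]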
Risk Decomposition
Which shows that the percentage contributions to risk
(PCTR), which add up to 100%, are equal to the weighted
betas. This can be written as a vector
\[ \mathrm{PCTR}_{k \times 1} = \frac{W}{\sigma_p}\frac{d\sigma_p}{dw} \]
where W is a k by k diagonal matrix with portfolio weights
on the diagonal. Each element of the vector PCTR is
given by
\[ \mathrm{PCTR}_i = \frac{w_i}{\sigma_p}\frac{d\sigma_p}{dw_i} = w_i\beta_i \]
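The decomposition is only a few lines of linear algebra. The sketch below is a minimal illustration with made-up volatilities, correlations and weights; the function name risk_decomposition is hypothetical.

```python
import numpy as np

def risk_decomposition(w, sigma):
    """Return portfolio volatility, the MCTR vector and the PCTR vector."""
    vol = np.sqrt(w @ sigma @ w)
    mctr = sigma @ w / vol      # d(sigma_p)/dw, element i equals beta_i * sigma_p
    pctr = w * mctr / vol       # element i equals w_i * beta_i
    return vol, mctr, pctr

# Hypothetical inputs: three assets with made-up volatilities and correlations.
vols = np.array([0.18, 0.08, 0.03])
corr = np.array([[1.0, 0.2, 0.3],
                 [0.2, 1.0, 0.0],
                 [0.3, 0.0, 1.0]])
sigma = np.outer(vols, vols) * corr
w = np.array([0.5, 0.2, 0.3])

vol, mctr, pctr = risk_decomposition(w, sigma)
print(vol, mctr, pctr)
```

The weighted MCTRs add up to the portfolio volatility and the PCTRs add up to one, which is a convenient sanity check on any implementation.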
Bibliography
Huang and Litzenberger, Foundations for Financial
Economics, North-Holland.
Intriligator, Mathematical Optimization and Economic
Theory, Prentice-Hall.
Factor Risk Contributions

Last time we looked at risk decomposition of a portfolio.
Today we will assume that we can decompose the
uncertainty in asset returns into common factors.
Stocks are at least partly driven by characteristics like
industry, country, size, etc.
We can write the risk premium of a given stock as a
combination of these factor returns weighted by their
respective factor exposures
\[ r = Xf + u \]
where r is a k by 1 vector of risk premia (asset return minus
cash), X is a k by p matrix of factor exposures, f is a p by 1
vector of factor returns and u is a k by 1 vector of asset-
specific returns which are both uncorrelated with factor
returns and uncorrelated across assets.
The covariance matrix of excess returns can be expressed
\[ E[rr'] = E[(Xf + u)(Xf + u)'] \]
\[ E[rr'] = E[Xff'X'] + E[Xfu'] + E[uf'X'] + E[uu'] \]
\[ \Sigma = X\Sigma_{ff}X' + \Sigma_{uu} \]
where $\Sigma_{ff}$ denotes the p by p covariance matrix of factor
returns and $\Sigma_{uu}$ is a k by k (diagonal) covariance matrix of asset-
specific returns. The cross terms vanish because u is uncorrelated with f.

We can now decompose the portfolio risk into a common
and a specific part
\[ \sigma_p^2 = w'X\Sigma_{ff}X'w + w'\Sigma_{uu}w \]
Using the same logic as last time, we get for the marginal
factor contribution to risk MFCTR (a p by 1 vector)
\[ \mathrm{MFCTR} = \frac{d\sigma_p}{d(X'w)} = \frac{\Sigma_{ff}X'w}{\sigma_p} \]
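A minimal sketch of the factor decomposition follows. The exposures, factor covariances, specific variances and equal weights are all randomly generated or made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
k, p = 5, 2                                   # assets and factors (illustrative sizes)
X = rng.normal(size=(k, p))                   # hypothetical factor exposures
sigma_ff = np.array([[0.04, 0.01],
                     [0.01, 0.02]])           # hypothetical factor covariance
sigma_uu = np.diag(rng.uniform(0.01, 0.03, size=k))   # diagonal specific variances
w = np.full(k, 1.0 / k)                       # equal weights

sigma = X @ sigma_ff @ X.T + sigma_uu         # full covariance matrix
var_common = w @ X @ sigma_ff @ X.T @ w       # common (factor) variance
var_specific = w @ sigma_uu @ w               # asset-specific variance
vol = np.sqrt(var_common + var_specific)

exposure = X.T @ w                            # portfolio factor exposures, p by 1
mfctr = sigma_ff @ exposure / vol             # marginal factor contribution to risk
print(var_common, var_specific, mfctr)
```

Weighting each MFCTR by the portfolio's exposure to that factor recovers the common part of the volatility, var_common / vol, mirroring the asset-level result from the previous lecture.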
Implied View Analysis

So far, we have calculated the optimal portfolio weights
from given return expectations. But often we are working
with previously established portfolios and all we have are
the weights. How can we determine what the expectations
are and whether or not the weights make sense?
This is done using reverse optimization, which maps the
positions into implicit return expectations.

In an unconstrained portfolio optimization, marginal risks
are traded off against marginal returns. A portfolio is
therefore optimal when the relationship between marginal
risks and marginal returns is the same for all assets in the
portfolio.
Since the Sharpe ratio of the portfolio measures the
relationship between incremental risk and return, we can
express the relationship between marginal return and
marginal risk as:
\[ \mu = \frac{\mu_p}{\sigma_p}\,\frac{\Sigma w}{\sigma_p} = \beta\mu_p \]
where the beta measures the sensitivity of an asset to
movements of the portfolio:
\[ \beta = \frac{\Sigma w}{\sigma_p^2} \]
Note that this follows from portfolio mathematics, not from an equilibrium
condition, but if the portfolio were the market portfolio, the implied returns
would be the returns that investors would need to hold the market portfolio.
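Reverse optimization can be checked with a round trip: build an optimal portfolio from known expected returns, then recover those returns from the weights alone. The covariance matrix, expected returns and risk tolerance below are made-up inputs, and implied_excess_returns is a hypothetical helper name.

```python
import numpy as np

def implied_excess_returns(w, sigma, mu_p):
    """Implied returns mu = beta * mu_p, with beta = Sigma w / sigma_p**2."""
    beta = sigma @ w / (w @ sigma @ w)
    return beta * mu_p

# Hypothetical covariance matrix and "true" expected excess returns.
sigma = np.array([[0.0324, 0.0040, 0.0016],
                  [0.0040, 0.0064, 0.0005],
                  [0.0016, 0.0005, 0.0009]])
mu_true = np.array([0.06, 0.03, 0.01])
lam = 0.4

w = lam * np.linalg.solve(sigma, mu_true)      # unconstrained optimum
implied = implied_excess_returns(w, sigma, mu_p=w @ mu_true)
print(implied)
```

For an unconstrained optimal portfolio the implied returns coincide with the inputs; for an arbitrary set of weights they show what the investor would have to believe for the portfolio to be optimal.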

This kind of analysis can be used to show investors whether
their return expectations are consistent with market
realities, i.e., whether they are over or under investing
their risk budget in particular areas and whether they are
investing in a way that is consistent with their views.

Let's consider an example
Expected return 10% (5% excess); Volatility 8.79%; Sharpe
ratio 0.57
Asset          Weight %   Return %   Volatility %
Equity            40         11          18
Absolute Rtn      15         12           8
Private Eqty      15         11           9
Real Estate        5         10          14
US Bonds          25          7           3
Non-US Bonds       0          8           8
Cash               0          5           0

With a correlation matrix
1.0 0.0 0.5 0.5 0.3 0.3 0.0
0.0 1.0 0.0 0.0 0.0 0.0 0.0
0.5 0.0 1.0 0.5 0.3 0.3 0.0
0.5 0.0 0.5 1.0 0.5 0.3 0.0
0.3 0.0 0.3 0.5 1.0 0.8 0.0
0.3 0.0 0.3 0.3 0.8 1.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 1.0

We can compute the marginal contribution to risk using the
equation from last time
\[ \mathrm{MCTR}_i = \beta_i\sigma_p \]
We compute the MCTR for US Bonds as 0.014. What does
this mean? Suppose instead of holding 25%, we invested
26%; then our total portfolio risk would change from
8.7948 to 8.8089
\[ \frac{d\sigma_p}{dw_{US\ Bonds}} \approx \frac{\Delta\sigma_p}{\Delta w_{US\ Bonds}} = 8.8089 - 8.7948 = 0.0141 \]

Or for the complete picture
Biggest increase in risk would come from equities (already
about 80%), smallest increase from Absolute Return (most
diversifying).
Asset PCTR % MCTR Implied Rtn %
Equity 79.1 0.174 9.84
Absolute Rtn 1.9 0.011 0.62
Private Eqty 10.2 0.060 3.39
Real Estate 4.8 0.085 4.80
US Bonds 4.0 0.014 0.80
Non-US Bonds 0.0 0.029 1.66
Cash 0.0 0.000 0.00


Implied excess return for Absolute Return strategies is much
lower than the forecast. This means that the investor is
underspending risk in this area.
For equities, the investor is overspending in the risk
allocation. A large allocation in a relatively
undiversifying asset requires large implied return to make
the portfolio optimal.
In this case, it is apparent that the investor's implied return
for equities is much larger than historical experience.

View Optimization
This approach can be used iteratively where changes are
made to allocations or to forecasts until there is reasonable
correspondence between implied returns and expected
returns.
It can also be used to build a consensus view within a
portfolio team.
Note, however, that these views are for an unconstrained investor.
Correcting for Autocorrelation

Some asset classes appear to have much less risk than one
might commonly believe.
Corporate high yield
Hedge funds
If the risk for an asset class is underestimated, too much
capital will be allocated to that class.
Loss of efficiency in the portfolio.
Broader issue of societal allocations.

Positively autocorrelated returns (high returns tend to be
followed by high returns), show less historical volatility
than an uncorrelated series.
Where does autocorrelation come from?
Infrequent trading in illiquid securities.
Real estate
High yield
Hedge funds
Non-synchronous trading

One of the ways to check and correct for autocorrelation is
known as the Blundell-Ward filter:
\[ r_t^* = \frac{1}{1 - a_1} r_t - \frac{a_1}{1 - a_1} r_{t-1} \]
which creates a new, transformed return series, r*, using the
returns r at times t and t-1. The coefficient $a_1$ is estimated
from an autoregressive first-order (AR(1)) model:
\[ r_t = a_0 + a_1 r_{t-1} + \varepsilon_t \]

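Note that by applying this filter the mean is unchanged:
\[ \bar{r}^* = \frac{1}{1 - a_1}\bar{r} - \frac{a_1}{1 - a_1}\bar{r} = \bar{r} \]
And the variance increases (for positive $a_1$):
\[ \sigma^2(r_t^*) = \frac{1 - a_1^2}{(1 - a_1)^2}\sigma^2(r_t) = \frac{1 + a_1}{1 - a_1}\sigma^2(r_t) \]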
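The filter is easy to try on simulated data. The sketch below is only an illustration: the AR(1) parameters, the sample length and the helper name blundell_ward are assumptions, and the coefficient is estimated by a simple least-squares fit of $r_t$ on $r_{t-1}$.

```python
import numpy as np

def blundell_ward(r):
    """Estimate a1 from an AR(1) fit and return (a1, filtered series r*)."""
    a1, a0 = np.polyfit(r[:-1], r[1:], 1)      # slope a1 and intercept a0
    r_star = (r[1:] - a1 * r[:-1]) / (1.0 - a1)
    return a1, r_star

# Simulate a positively autocorrelated return series (hypothetical parameters).
rng = np.random.default_rng(42)
T, a_true = 5000, 0.5
eps = rng.normal(scale=0.01, size=T)
r = np.empty(T)
r[0] = eps[0]
for t in range(1, T):
    r[t] = 0.002 + a_true * r[t - 1] + eps[t]

a1, r_star = blundell_ward(r)
print(a1, r.mean(), r_star.mean(), r.var(), r_star.var())
```

With $a_1$ near 0.5 the filtered variance comes out roughly three times the raw variance, in line with the factor $(1 + a_1)/(1 - a_1)$, while the mean is essentially unchanged.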

This approach can also be used to arrive at more realistic
beta estimates.
Let's consider an example using four hedge fund indices
(convertible arbitrage, distressed debt, event-driven and
macro) and the MSCI USA index as the market. We could
run three types of regressions
\[ r_{it} = \alpha + \beta_0 r_{mt} + \varepsilon_t \]
\[ r_{it}^* = \alpha + \beta_0^* r_{mt}^* + \varepsilon_t \]
\[ r_{it} = \alpha + \beta_0 r_{mt} + \beta_1 r_{mt-1} + \beta_2 r_{mt-2} + \beta_3 r_{mt-3} + \varepsilon_t \]

Index          a_1           beta_0   beta_0*   beta_0 + beta_1 + beta_2 + beta_3
Convertible    0.55 (7.66)   0.09     0.22      0.25
Distressed     0.52 (6.86)   0.18     0.44      0.49
Event-Driven   0.28 (3.56)   0.29     0.38      0.38
Macro          0.18 (2.10)   0.29     0.37      0.52
The betas from ordinary regressions appear to
underestimate the true market exposure and therefore
overstate the diversifying effects associated with the
hedge funds.
Problems with the Covariance Matrix

The covariance matrix is a fundamental tool for our analysis,
so it is worthwhile spending a bit of time looking at its
properties.
Since this is intended to be a covariance matrix, it must be
true that
\[ w'\Sigma w \ge 0 \]
for all w. In other words, it must be
positive semi-definite. A necessary and sufficient
condition for positive semi-definiteness (for symmetric
matrices) is that all of the eigenvalues of $\Sigma$ are positive or
zero and at least one eigenvalue is greater than zero.

However, we may find that we sometimes have negative
eigenvalues when we have estimated our covariance
matrix.
This can arise for several reasons:
Estimates are generated from time series of different lengths.
The number of observations is less than the number of assets or
risk factors.
Two or more assets are collinear.

Consider the following:
\[ \Sigma = \begin{pmatrix} 1.0 & 0.9 & -0.3 \\ 0.9 & 1.0 & 0.7 \\ -0.3 & 0.7 & 1.0 \end{pmatrix} \]
where the variances have been standardized to 1.0 for
simplicity.
The eigenvalues can be found
\[ (e_1, e_2, e_3) = (2.0, 1.29, -0.3) \]

So this matrix is not positive semi-definite. One of the ways
to fix this is to perform an adjustment to the matrix.
1. Find the smallest eigenvalue (here $e_3$).
2. Create a minimum zero eigenvalue by shifting the
covariance matrix
\[ \Sigma^* = \Sigma - e_3 I \]
where I is an identity matrix.
3. Scale the resulting matrix by $1/(1 - e_3)$ to enforce
variances of 1:
\[ \Sigma^{**} = \frac{1}{1 - e_3}\Sigma^* \]

For our example, the new adjusted matrix is
\[ \Sigma^{**} = \begin{pmatrix} 1.0 & 0.69 & -0.23 \\ 0.69 & 1.0 & 0.54 \\ -0.23 & 0.54 & 1.0 \end{pmatrix} \]
with eigenvalues
\[ (e_1, e_2, e_3) = (1.77, 1.22, 0) \]
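The shift-and-scale adjustment can be sketched in a few lines. The input below assumes a correlation of -0.3 between the first and third assets (an assumption: with that sign the smallest eigenvalue is about -0.3), and shift_and_scale is a hypothetical helper name.

```python
import numpy as np

def shift_and_scale(corr):
    """Shift the smallest eigenvalue to zero, then rescale to unit variances."""
    e_min = np.linalg.eigvalsh(corr).min()
    if e_min >= 0:
        return corr.copy()                        # already positive semi-definite
    shifted = corr - e_min * np.eye(len(corr))    # step 2: Sigma* = Sigma - e_min I
    return shifted / (1.0 - e_min)                # step 3: restore variances of 1

# Assumed example: correlations 0.9, 0.7 and -0.3 cannot hold simultaneously.
corr = np.array([[ 1.0, 0.9, -0.3],
                 [ 0.9, 1.0,  0.7],
                 [-0.3, 0.7,  1.0]])
fixed = shift_and_scale(corr)
print(np.round(np.linalg.eigvalsh(corr), 2))
print(np.round(fixed, 2))
print(np.round(np.linalg.eigvalsh(fixed), 2))
```

The adjustment shrinks every off-diagonal correlation by the same factor, so it repairs the matrix at the cost of pulling all correlations toward zero.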
Significance of the Inverse Covariance

Let's turn to the economics of our unconstrained solution
\[ w^* = \lambda\Sigma^{-1}\mu \]
If we run the regression of asset i against all other k-1 assets
\[ r_i = a + \sum_{j \ne i}\beta_{ij} r_j + \varepsilon_i \]
the explanatory power of this regression is given as $R_i^2$.

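It can then be shown that
\[ \Sigma^{-1} = \begin{bmatrix}
\frac{1}{\sigma_{11}(1 - R_1^2)} & -\frac{\beta_{12}}{\sigma_{11}(1 - R_1^2)} & \cdots & -\frac{\beta_{1k}}{\sigma_{11}(1 - R_1^2)} \\
-\frac{\beta_{21}}{\sigma_{22}(1 - R_2^2)} & \frac{1}{\sigma_{22}(1 - R_2^2)} & \cdots & -\frac{\beta_{2k}}{\sigma_{22}(1 - R_2^2)} \\
\vdots & \vdots & \ddots & \vdots \\
-\frac{\beta_{k1}}{\sigma_{kk}(1 - R_k^2)} & -\frac{\beta_{k2}}{\sigma_{kk}(1 - R_k^2)} & \cdots & \frac{1}{\sigma_{kk}(1 - R_k^2)}
\end{bmatrix} \]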


Which means that the optimal weight for asset i is
\[ w_i^* = \lambda\frac{\mu_i - \sum_{j \ne i}\beta_{ij}\mu_j}{\sigma_{ii}(1 - R_i^2)} \]
The numerator is the excess return after regression hedging
(i.e. the excess return after the reward for implicit
exposure to other assets has been removed). This is
equivalent to the intercept a in the regression.

Since $\sigma_{ii}$ is the total risk associated with asset i, the
denominator of our expression, $\sigma_{ii}(1 - R_i^2)$, is the part of
that risk that cannot be hedged away.
In terms of the regression equation, this is the unexplained
variance or the variance of the error term.

Since the regression attempts to minimize the variance of
the errors this means that the optimization will put
maximum weight into those assets that are similar to the
other assets (as a group) but have a small return
advantage. This property leads to implausible results
when estimation errors are taken into account.
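The regression interpretation of the inverse can be verified numerically. The sketch below rebuilds $\Sigma^{-1}$ row by row from population regressions of each asset on the others; the covariance matrix is made up and inverse_from_regressions is a hypothetical helper name.

```python
import numpy as np

def inverse_from_regressions(sigma):
    """Rebuild Sigma^-1 from regressions of each asset on all other assets."""
    k = len(sigma)
    inv = np.zeros((k, k))
    for i in range(k):
        others = [j for j in range(k) if j != i]
        # Regression betas of asset i on the remaining assets.
        beta = np.linalg.solve(sigma[np.ix_(others, others)], sigma[others, i])
        # Unexplained variance, sigma_ii * (1 - R_i^2).
        resid_var = sigma[i, i] - sigma[i, others] @ beta
        inv[i, i] = 1.0 / resid_var
        inv[i, others] = -beta / resid_var
    return inv

# Hypothetical covariance matrix.
sigma = np.array([[0.040, 0.018, 0.006],
                  [0.018, 0.090, 0.030],
                  [0.006, 0.030, 0.160]])
rebuilt = inverse_from_regressions(sigma)
print(np.round(rebuilt, 3))
```

An asset that is almost fully explained by the others has a tiny unexplained variance, so its row of the inverse (and hence its weight) blows up, which is the mechanism behind the implausible allocations described above.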
Covariance in Good and Bad Times
Often we find that during times of market difficulty,
correlations within an asset class increase. Sometimes this
is stated as, "In times of stress, all correlations go to one."
Is the low correlation in a full-sample covariance matrix just
an artifact of reasonably positive correlation in normal
times and of highly negative correlation in unusual times?
Or is it a diversifying asset?
Investors may not want to bet on average correlation; they
may actually have preferences that vary depending on the
state of the world.

To address these types of issues, we may want to optimize
our portfolio based upon our expectation of the occurrence
of normal and unusual times.
To determine what are unusual times, we will define them
according to their statistical distance from the mean vector
\[ (r_t - \hat{\mu})'\hat{\Sigma}^{-1}(r_t - \hat{\mu}) = d_t'\hat{\Sigma}^{-1}d_t = D_t \]
This statistic is distributed Chi-Squared with k degrees of
freedom. If we define an unusual observation as the outer
10%, we can test each time period.

Notice that the distance is weighted by the inverse of the
covariance matrix. This means that we take into account
asset volatilities (the same deviation from the mean might
be significant for low-volatility series but not for high-
volatility series). Hence, outliers are not necessarily
associated with down markets.
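A minimal sketch of the classification step follows. Note one swapped-in simplification: instead of the chi-squared critical value it flags the outer 10% using the empirical quantile of the distances. The simulated returns and the name unusual_periods are illustrative assumptions.

```python
import numpy as np

def unusual_periods(returns, tail=0.10):
    """Return the distances D_t and a flag for the outer `tail` fraction of periods."""
    d = returns - returns.mean(axis=0)                  # deviations from the mean vector
    sigma_inv = np.linalg.inv(np.cov(returns, rowvar=False))
    dist = np.einsum("ti,ij,tj->t", d, sigma_inv, d)    # D_t = d_t' Sigma^-1 d_t
    # Assumption: empirical quantile used in place of the chi-squared cutoff.
    return dist, dist > np.quantile(dist, 1.0 - tail)

# Hypothetical data: 200 periods of returns on 3 assets.
rng = np.random.default_rng(7)
returns = rng.normal(scale=0.02, size=(200, 3))
dist, flags = unusual_periods(returns)
print(flags.sum(), dist.mean())
```

The flagged and unflagged periods can then be used to estimate the two regime covariance matrices separately.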

We could now build a new covariance matrix weighted by
our subjective (or estimated) probabilities
\[ \Sigma_{new} = p\,\lambda_{normal}\Sigma_{normal} + (1 - p)\,\lambda_{unusual}\Sigma_{unusual} \]
where we have included the relative risk tolerance for each
regime (note that these must be scaled so they sum to the
actual risk tolerance of the investor).
Note that this analysis can be very sensitive to the inclusion of new assets
since that may change which periods are usual and unusual. For that reason,
it may be useful to define unusual times with respect to a core set of assets.
Estimation Error
We should be clear that everything that we have done so far
is predicated on a couple of things:
1. We are using expected returns, in other words,
forecasted returns for our assets.
2. We are using an expected variance-covariance structure,
in other words, a forecasted one for our universe of assets.
3. If the future deviates from our forecasts by a significant
amount, we will not have an optimal portfolio. (This is an
issue of performance measurement)
Estimation Error
As I have said, generally you will want to forecast the mean
in some manner (if we have time we will talk more about
this later in the course). Your forecast could be a simple
forecast (like last periods return or the sample mean) or it
could be more complex (Delphi method; time series
forecast; multi-factor forecast).
Estimation Error
For the variance-covariance structure, one typically uses
simple approaches like the estimated structure based upon
the sample history, a 250-day moving average, or an
exponentially weighted average. You can add complexity
to this by embedding ARCH/GARCH processes or other
generalizations, but remember that if you are not using a
factor decomposition (and thereby reducing the space),
you are now attempting to forecast a large number of
variables for a problem of any size: with n assets there are
\[ \frac{n^2 + n}{2} \]
distinct variances and covariances.
Estimation Error
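To review what I discussed last time, assume that we have
an estimated mean of 10% and an estimated volatility of
20%.
Estimation error for the mean is given by
\[ \frac{\sigma}{\sqrt{T}} \]
And the confidence interval is calculated as
\[ \left[ \hat{\mu} - z_\alpha\frac{\sigma}{\sqrt{T}},\ \hat{\mu} + z_\alpha\frac{\sigma}{\sqrt{T}} \right] \]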
Estimation Error
For the variance, Campbell, Lo and MacKinlay have shown
\[ \mathrm{Var}(\hat{\sigma}^2) = \left(\frac{T}{\Delta t} - 1\right)^{-1} 2\sigma^4 \]
We can see from these expressions that the estimation error
for the mean is affected by the length of the time series T
and the estimation error for the variance is affected both
by the length and by the frequency of sampling ($\Delta t$).
We also see this in the following tables:
Estimation Error
Estimation Period (yrs) Estimation Error % 95% Confidence Interval %
1 20 78
5 9 35
10 6 25
20 4 18
50 3 11
Effect of Sample Period on Estimation Error for Mean Returns
Estimation Error
Effect of Sample Period on Estimation Error (%) for Variance
Estimation Estimation Frequency
Period yrs Daily Weekly Monthly Quarterly
1 0.35 0.79 1.71 3.27
5 0.16 0.35 0.74 1.30
10 0.11 0.25 0.52 0.91
20 0.08 0.18 0.37 0.64
50 0.05 0.11 0.23 0.40
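The entries in both tables can be approximated directly from the two formulas. In the sketch below the function names are hypothetical, and the number of observations per year for each sampling frequency (260, 52, 12, 4) is an assumption, so the daily column may differ slightly in the last digit.

```python
import math

def mean_estimation_error(vol, years):
    """Standard error of the sample mean: sigma / sqrt(T)."""
    return vol / math.sqrt(years)

def variance_estimation_error(vol, years, periods_per_year):
    """Standard error of the sample variance: sigma^2 * sqrt(2 / (T/dt - 1))."""
    n = years * periods_per_year               # number of observations, T / dt
    return vol ** 2 * math.sqrt(2.0 / (n - 1))

vol = 0.20
for years in (1, 5, 10, 20, 50):
    err = mean_estimation_error(vol, years)
    width = 2 * 1.96 * err                     # width of the 95% confidence interval
    var_errs = [variance_estimation_error(vol, years, m) for m in (260, 52, 12, 4)]
    print(years, round(100 * err), round(100 * width),
          [round(100 * e, 2) for e in var_errs])
```

Sampling more frequently leaves the first function untouched but shrinks the second, which is one way to approach the question that follows.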
What is more important: estimation error in the
mean or in the variance?
Currency in the Portfolio
When optimizing a portfolio, one often has to deal with a
block structure. In other words, two or more blocks of
assets (eg. stocks and bonds, equities and currencies,
active managers and passive strategies).
Often the correlation between blocks is ignored or set to
zero and the problem is solved separately, or the problem
is solved in a two-step process where one finds the
optimal allocation for part of the problem and then finds
the optimal allocation for the second part of the
problem.

We will study this problem using currencies.
Optimal currency hedging is the subject of ongoing debate
between plan sponsors, asset managers and consultants.
We will consider asset returns (local return plus currency
return minus domestic cash rate)
\[ a_i = \frac{\Delta(p_i s_i)}{p_i s_i} - c_h \]
And currency returns (local cash rate plus currency return
minus domestic cash rate)
\[ e_i = \frac{\Delta s_i}{s_i} + c_i - c_h \]
The covariance matrix of asset and currency returns is
assumed to follow the block structure
\[ \Sigma = \begin{bmatrix} \Sigma_{aa} & \Sigma_{ae} \\ \Sigma_{ea} & \Sigma_{ee} \end{bmatrix} \]
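Currency hedging takes the form of regression hedging
where we regress asset returns against all currency returns:
\[ a_i = h_i + \beta_{i1}e_1 + \cdots + \beta_{ik}e_k + \varepsilon_i \]
Regression hedging can also be expressed in matrix terms as
\[ \beta = \Sigma_{ae}\Sigma_{ee}^{-1} \]
where $\beta$ is
\[ \beta = \begin{bmatrix} \beta_{11} & \beta_{12} & \cdots & \beta_{1k} \\ \beta_{21} & \beta_{22} & \cdots & \beta_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ \beta_{k1} & \beta_{k2} & \cdots & \beta_{kk} \end{bmatrix} \]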

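We can now define the variance in asset returns that remains
unexplained by currency returns (this is the conditional
variance of asset returns conditioned on currency returns)
\[ \Sigma_{a|e} = \Sigma_{aa} - \beta\Sigma_{ee}\beta', \qquad \beta = \Sigma_{ae}\Sigma_{ee}^{-1} \]
And write the inverse of the covariance matrix of asset and
currency returns as
\[ \Sigma^{-1} = \begin{bmatrix} \Sigma_{a|e}^{-1} & -\Sigma_{a|e}^{-1}\beta \\ -\beta'\Sigma_{a|e}^{-1} & \Sigma_{ee}^{-1} + \beta'\Sigma_{a|e}^{-1}\beta \end{bmatrix} \]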

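Where we use the results for the inverse of a partitioned
matrix
\[ \begin{bmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{bmatrix}^{-1} = \begin{bmatrix} D^{-1} & -D^{-1}P_{12}P_{22}^{-1} \\ -P_{22}^{-1}P_{21}D^{-1} & P_{22}^{-1} + P_{22}^{-1}P_{21}D^{-1}P_{12}P_{22}^{-1} \end{bmatrix} \]
\[ D = P_{11} - P_{12}P_{22}^{-1}P_{21} \]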

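For example, checking the value of D
\[ \begin{aligned}
D &= P_{11} - P_{12}P_{22}^{-1}P_{21} \\
  &= \Sigma_{aa} - \Sigma_{ae}\Sigma_{ee}^{-1}\Sigma_{ea} \\
  &= \Sigma_{aa} - \Sigma_{ae}\Sigma_{ee}^{-1}\Sigma_{ee}\Sigma_{ee}^{-1}\Sigma_{ea} \\
  &= \Sigma_{aa} - (\Sigma_{ae}\Sigma_{ee}^{-1})\Sigma_{ee}(\Sigma_{ee}^{-1}\Sigma_{ea}) \\
  &= \Sigma_{aa} - \beta\Sigma_{ee}\beta' \\
  &= \Sigma_{a|e}
\end{aligned} \]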

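Now, defining
\[ w = \begin{bmatrix} w_a \\ w_e \end{bmatrix}, \qquad \mu = \begin{bmatrix} \mu_a \\ \mu_e \end{bmatrix} \]
And recalling the solution to the unconstrained optimization
\[ w^* = \lambda\Sigma^{-1}\mu \]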

There are three solutions to our problem.
First is the simultaneous optimization or the joint, full-blown
optimization (choosing the optimal asset and currency
positions simultaneously):
\[ w_{sim}^* = \begin{bmatrix} w_{a,sim}^* \\ w_{e,sim}^* \end{bmatrix} = \begin{bmatrix} \lambda\Sigma_{a|e}^{-1}\mu_a - \lambda\Sigma_{a|e}^{-1}\beta\mu_e \\ \lambda\Sigma_{ee}^{-1}\mu_e - \beta' w_{a,sim}^* \end{bmatrix} \]
This assumes that the manager has expertise over all assets and
currencies.

Note that the optimal hedge positions for currency depend
on the optimal asset positions, which are themselves
affected by the presence of currencies in the portfolio.
Also, the hedge positions have a speculative component
driven by non-zero expected returns in currencies as well
as a variance reduction component related to beta.
\[ w_{e,sim}^* = \lambda\Sigma_{ee}^{-1}\mu_e - \beta' w_{a,sim}^* \]

If currencies carry a positive risk premium (the currency
return is, on average, greater than the interest rate
differential), currencies will be included in the optimal
portfolio because the first term, $\lambda\Sigma_{ee}^{-1}\mu_e$, will be positive.
Instead, let's focus on the case (often assumed in practice)
that currencies do not offer a significant risk premium. In
this case, the solution becomes
\[ w_{a,sim}^* = \lambda\Sigma_{a|e}^{-1}\mu_a \]
\[ w_{e,sim}^* = -\beta' w_{a,sim}^* \]

Suppose now that local asset returns are also uncorrelated
with currency returns. In that case, taking on currency
risk does not help to reduce (or hedge) asset risk and
currency risk would always be an add-on to asset risk.
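If local returns are not correlated with currency movements,
the covariance between currency returns and foreign
asset returns in home currency units contains solely the
covariance between currencies
\[ \mathrm{Cov}\left(\frac{\Delta p_i}{p_i} + \frac{\Delta s_i}{s_i}, \frac{\Delta s_j}{s_j}\right) = \mathrm{Cov}\left(\frac{\Delta p_i}{p_i}, \frac{\Delta s_j}{s_j}\right) + \mathrm{Cov}\left(\frac{\Delta s_i}{s_i}, \frac{\Delta s_j}{s_j}\right) = \mathrm{Cov}\left(\frac{\Delta s_i}{s_i}, \frac{\Delta s_j}{s_j}\right) \]
which in matrix terms becomes
\[ \Sigma_{ae} = \Sigma_{ee} \]
or
\[ \beta = \Sigma_{ae}\Sigma_{ee}^{-1} = \Sigma_{ee}\Sigma_{ee}^{-1} = I \]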

So the currency positions will completely hedge out the
currency risk that arises from the unhedged asset positions
(unitary hedging):
\[ w_{a,sim}^* = \lambda\Sigma_{a|e}^{-1}\mu_a \]
\[ w_{e,sim}^* = -w_{a,sim}^* \]

Now, suppose the opposite: that foreign asset returns (in
home country currency) and currency returns are not
correlated. Now we would have $\Sigma_{ae} = 0$ and
$\beta = \Sigma_{ae}\Sigma_{ee}^{-1} = 0$, so our solution would be
\[ w_{a,sim}^* = \lambda\Sigma_{aa}^{-1}\mu_a, \qquad w_{e,sim}^* = 0 \]
since the covariance of asset returns conditioned on
currency returns would be
\[ \Sigma_{a|e} = \Sigma_{aa} - \beta\Sigma_{ee}\beta' = \Sigma_{aa} \]

To summarize:
1. If currencies carry a risk premium, there will always be a
speculative aspect to currency exposure.
2. If currencies do not have a risk premium, we need to look at
currency exposure in terms of its ability to reduce asset risk:
a. Zero correlation between local returns and currency returns means
currencies add risk without return or diversification benefits.
b. Negative correlation between local returns and currency returns
makes currencies a hedge asset that reduces total portfolio risk.
c. Positive correlation between local returns and currency returns
would increase total portfolio risk. In that case, over-hedging
(short position in currency is greater than the long position in the
asset) is optimal.

Now consider the second approach, where we optimize asset
positions in a first step and in a second step choose
optimal currency positions conditional on the already
established asset positions. This is known as partial
optimization and the solution is
\[ w_{par}^* = \begin{bmatrix} w_{a,par}^* \\ w_{e,par}^* \end{bmatrix} = \begin{bmatrix} \lambda\Sigma_{aa}^{-1}\mu_a \\ \lambda\Sigma_{ee}^{-1}\mu_e - \beta' w_{a,par}^* \end{bmatrix} \]
Terms representing the conditional covariance drop out and
there is no feedback of currency positions on asset
positions. Total risk is controlled but currencies are managed
independently.

The final option for constructing portfolios with currencies
is simply separate optimization (also known as currency
overlay)
\[ w_{sep}^* = \begin{bmatrix} w_{a,sep}^* \\ w_{e,sep}^* \end{bmatrix} = \begin{bmatrix} \lambda\Sigma_{aa}^{-1}\mu_a \\ \lambda\Sigma_{ee}^{-1}\mu_e \end{bmatrix} \]
In this case currencies are completely independent and should be measured
against their own benchmark.

I hope, by now, that it is obvious to you that these different
techniques are in decreasing order of efficiency (in other
words, decreasing utility).
Moreover, it should also be obvious that currencies are just a
proxy for any investible asset that you want as part of your
portfolio (hedge funds; foreign equity; private equity; real
estate; etc.). These three techniques can always be used
(and commonly are), but they are always in decreasing
efficiency.
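The ordering of the three techniques can be illustrated numerically. The sketch below uses a randomly generated covariance matrix and made-up expected returns, and it takes the hedge matrix as $\beta = \Sigma_{ae}\Sigma_{ee}^{-1}$ (rows are assets, columns are currencies), which is the convention assumed here.

```python
import numpy as np

rng = np.random.default_rng(1)
ka, ke, lam = 3, 2, 0.5                          # illustrative sizes and risk tolerance
B = rng.normal(size=(ka + ke, ka + ke))
sigma = B @ B.T / 10 + 0.01 * np.eye(ka + ke)    # random positive-definite covariance
mu = rng.uniform(0.0, 0.05, size=ka + ke)        # hypothetical expected excess returns

s_aa, s_ae, s_ee = sigma[:ka, :ka], sigma[:ka, ka:], sigma[ka:, ka:]
mu_a, mu_e = mu[:ka], mu[ka:]
beta = s_ae @ np.linalg.inv(s_ee)                # hedge coefficients, assets by currencies
s_cond = s_aa - beta @ s_ee @ beta.T             # conditional covariance Sigma_{a|e}

def utility(w):
    """Mean-variance utility w'mu - w'Sigma w / (2 lambda)."""
    return w @ mu - w @ sigma @ w / (2 * lam)

# Simultaneous optimization.
w_a_sim = lam * np.linalg.solve(s_cond, mu_a - beta @ mu_e)
w_sim = np.concatenate([w_a_sim, lam * np.linalg.solve(s_ee, mu_e) - beta.T @ w_a_sim])
# Partial optimization: assets first, then currencies given the asset positions.
w_a = lam * np.linalg.solve(s_aa, mu_a)
w_par = np.concatenate([w_a, lam * np.linalg.solve(s_ee, mu_e) - beta.T @ w_a])
# Separate optimization (currency overlay).
w_sep = np.concatenate([w_a, lam * np.linalg.solve(s_ee, mu_e)])

print(utility(w_sim), utility(w_par), utility(w_sep))
```

The simultaneous solution reproduces the full unconstrained optimum, and the utilities come out in the order simultaneous, partial, separate, whatever inputs are used.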
Bibliography
Blundell and Ward, Property Portfolio Allocation: A Multifactor
Model, Land Development Studies, 1987.
Chan and Hussey, Marginal Contribution to the Sharpe Ratio,
Northwater Capital Management Inc., January 2009.
Chow, Jacquier, Kritzman, and Lowry, Optimal Portfolios in
Good Times and Bad, Financial Analysts Journal, 1999.
Scholes and Williams, Estimating Beta from Nonsynchronous
Data, Journal of Financial Economics, 1977.
Stevens, On the Inverse of the Covariance Matrix in Portfolio
Analysis, Journal of Finance, 1998.
Bibliography
Campbell, Lo, and MacKinlay, The Econometrics of
Financial Markets, Princeton University Press, 1997.
Jorion, Mean Variance Analysis of Currency Overlays,
Financial Analysts Journal, 1994.
Risk Revisited
So far we have often relied on an assumption (or
presumption) of normal returns. But we know that asset
returns are not normal and, therefore, the mean and
variance do not fully describe the characteristics of the
joint asset return distribution. Specifically, the risk and
the undesirable outcomes associated with the portfolio
cannot be adequately captured by the variance.
Let's spend a bit of time looking at alternative portfolio risk
measures that are sometimes used in practice.
Risk Revisited
Generally speaking, there are two different types of risk
measures:
1. Dispersion Measures: consider both positive and
negative deviations from the mean, and treat those
deviations as equally risky.
2. Downside Measures: maximize the probability that the
portfolio return is above a certain minimal acceptable
level known as the benchmark or disaster level.
Dispersion: Standard Deviation
Of course, the best known and most used dispersion
measure is (for historical reasons) the foundation of
modern portfolio theory: standard deviation
\[ \sigma_p = (w'\Sigma w)^{1/2} = \left( \sum_i w_i^2\sigma_{ii} + \sum_i \sum_{j \ne i} w_i w_j\sigma_{ij} \right)^{1/2} \]

Dispersion: Mean-Absolute Deviation
The mean-absolute deviation or MAD approach doesn't use
squared deviations, but absolute deviations
\[ MAD(r_p) = E\left[ \left| \sum_i w_i r_i - \sum_i w_i\mu_i \right| \right] \]
where
\[ r_p = \sum_i w_i r_i \]
and $r_i$ is the return on the asset and $\mu_i$ is the expected return
on the asset.
Dispersion: Mean-Absolute Deviation

The computation of optimal portfolios under MAD is
straightforward since the optimization problem is linear
and can be solved with standard linear programming
routines.
Note that it can be shown that if individual asset returns are
multivariate normal
\[ MAD(r_p) = \sqrt{\frac{2}{\pi}}\,\sigma_p \]
Dispersion: Mean-Absolute Moment
The mean-absolute moment ($MAM_q$) of order q is defined by
\[ MAM_q(r_p) = \left( E\left[ \left| \sum_i w_i r_i - \sum_i w_i\mu_i \right|^q \right] \right)^{1/q}, \quad q \ge 1 \]
or
\[ MAM_q(r_p) = \left( E\left[ \left| r_p - E(r_p) \right|^q \right] \right)^{1/q}, \quad q \ge 1 \]
which is a straightforward generalization of the mean-
standard deviation (q=2) and the mean-absolute deviation
(q=1) approaches.

Downside Measures
Now let's turn to downside measures, where the objective is to have a
portfolio return above a certain minimum: a safety-first approach.
While these types of measures may have significant intuitive and
theoretical appeal, they are often computationally more complicated
to use in a portfolio context.
Downside risk measures of individual assets cannot be easily integrated
into portfolio downside risk measures since their computation
requires knowledge of the entire joint distribution of asset returns.
You usually have to resort to computationally intense nonparametric
estimation, simulation, and optimization techniques.
Moreover, the estimation error for downside measures is usually higher
than that for mean-variance approaches since we only use a portion
of the original data, often just the tail of the empirical distribution.
Downside: Roy's Safety First
Published the same year (1952) as Markowitz's paper (the
foundation of Modern Portfolio Theory) was Roy's paper
on safety first (the foundation of downside risk measures).
Under MPT, the investor makes a trade-off between risk and
return where the final portfolio allocation depends on the
investor's utility function. As you know, it can be hard, or
even impossible, to determine the investor's actual utility
function.
Roy argued that an investor, rather than thinking in terms of
utility, first wants to make sure that a certain amount of
the principal is preserved. Thereafter, the investor decides
on a minimal acceptable return that achieves this principal
preservation.
In essence, the investor solves
\[ \min_w \Pr(r_p \le r_0) \quad \text{subject to } w'\mathbf{1} = 1 \]
where Pr is the probability function and $r_p$ is the portfolio
return.
Of course, it would be unlikely that the investor would know
the true probability function, but if we recall that
Tchebycheff's inequality (for a random variable x with mean
$\mu$ and variance $\sigma^2$) states that for any positive real number c
\[ \Pr(|x - \mu| > c) \le \frac{\sigma^2}{c^2} \]
then we can write
\[ \Pr(r_p \le r_0) = \Pr(\mu_p - r_p \ge \mu_p - r_0) \le \frac{\sigma_p^2}{(\mu_p - r_0)^2} \]
Therefore, not knowing the probability function, the investor
solves the approximation
\[ \min_w \frac{\sigma_p}{\mu_p - r_0} \quad \text{subject to } w'\mathbf{1} = 1 \]
Note that if $r_0$ is equal to the risk-free rate, then this optimization problem is
equivalent to maximizing a portfolio's Sharpe ratio.
Downside: Semi-variance
Even in his 1959 book, Markowitz proposed the use of
semi-variance to correct for the fact that variance
penalizes over-performance and under-performance
equally.
Portfolio semi-variance is
\[ \sigma_{p,\min}^2 = E\left[ \left( \min\left( \sum_i w_i r_i - \sum_i w_i\mu_i,\ 0 \right) \right)^2 \right] \]

Downside: Lower Partial Moment
The lower partial moment risk measure is a generalization of
semi-variance. The lower partial moment with power
index q and a target rate of return $r_0$ is given by
\[ \sigma_{r_p, q, r_0} = \left( E\left[ \min(r_p - r_0, 0)^q \right] \right)^{1/q} \]
If we set q=2 and $r_0$ equal to the expected return, we get the
semi-variance (here as its square root, because of the 1/q power).
Note, it can be shown q=1 represents a risk neutral investor, 0 < q < 1 a risk
seeking investor and q>1 a risk-averse investor.
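A sample version of the lower partial moment is straightforward to compute. In the sketch below the shortfall is measured as max(target - r, 0), so the result is non-negative for any q (a sign convention assumed here); the five returns and the function name lower_partial_moment are made up.

```python
import numpy as np

def lower_partial_moment(returns, target, q):
    """Sample analogue of (E[max(target - r, 0)**q])**(1/q)."""
    shortfall = np.maximum(target - np.asarray(returns, dtype=float), 0.0)
    return np.mean(shortfall ** q) ** (1.0 / q)

# Hypothetical return sample with a mean of zero.
r = np.array([-0.04, -0.01, 0.00, 0.02, 0.03])
semi_dev = lower_partial_moment(r, target=r.mean(), q=2)   # root of the semi-variance
mean_shortfall = lower_partial_moment(r, target=0.0, q=1)  # expected shortfall below 0
print(semi_dev, mean_shortfall)
```

Only the two negative observations contribute, which also illustrates why downside measures use less of the data than the variance does.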
Downside: Value at Risk
The best known downside risk measure is probably value at
risk (VaR), originally developed by JP Morgan. VaR is
related to the percentiles of loss distributions, and
measures the predicted maximum loss at a specified
probability level (for example 95%).
VaR can be defined as
\[ \mathrm{VaR}_{1-\varepsilon}(r_p) = \min\{ r \mid \Pr(-r_p \ge r) \le \varepsilon \} \]
Typical values of $(1-\varepsilon)$ are 90%, 95%, and 99%.
Note that there are several equivalent ways to define VaR. The definition above
emphasizes that r is the value such that the probability of a
loss greater than r is less than $\varepsilon$.
An alternative (and equivalent) way to define VaR
\[ \mathrm{VaR}_{1-\varepsilon}(r_p) = \min\{ r \mid \Pr(-r_p \le r) \ge 1 - \varepsilon \} \]
emphasizes that r is the value such that the probability that
the maximum loss is at most r is $(1-\varepsilon)$.
There are many well known problems with VaR:
1. The common assumption of lognormal returns is problematic
when you have long and short positions.
2. It is not sub-additive (in other words, the risk of two
combined portfolios may not be less than the sum of the risks
of each), which means that diversification does not generally
hold.
3. When calculated from generated scenarios, VaR is a non-
smooth and non-convex function with multiple stationary
points making it a difficult function to find a global optimum.
4. It does not take into account the magnitude of losses beyond
the VaR value.
Downside: Conditional Value at Risk
The problems with value at risk led to the development of
desirable properties for a risk measure. Risk measures
which satisfy these properties are known as coherent risk
measures.
A risk measure $\rho$ is called a coherent measure of risk if it
satisfies:
1. Monotonicity: if $X \ge 0$, then $\rho(X) \le 0$.
2. Subadditivity: $\rho(X+Y) \le \rho(X) + \rho(Y)$.
3. Positive Homogeneity: for any positive real number c,
$\rho(cX) = c\rho(X)$.
4. Translational invariance: for any real number c,
$\rho(X+c) \le \rho(X) - c$.
These properties can be interpreted:
1. If there are only positive returns, then the risk should be non-
positive.
2. The risk of a portfolio of two assets should be less than or
equal to the sum of the risks of the individual assets.
3. If the portfolio is increased c times, the risk becomes c times
larger.
4. Cash or another risk-free asset does not contribute to
portfolio risk.
Note that standard deviation is not a coherent measure since it violates the
monotonicity property. Semi-deviation type measures violate the
subadditivity condition. The four properties together are quite restrictive.
Conditional value at risk is a coherent risk measure defined
as:
$$\mathrm{CVaR}_{1-\epsilon}(r_p) = E\left(-r_p \mid -r_p \ge \mathrm{VaR}_{1-\epsilon}(r_p)\right)$$
CVaR measures the expected amount of losses in the tail of
the distribution of possible portfolio losses (beyond the
portfolio VaR).
This is also known as expected shortfall, expected tail loss,
or tail VaR.
Let's consider some of the mathematical properties of
CVaR.
Let w be the vector denoting the number of shares of each
asset and y be a random vector describing the uncertain
outcomes of the economy (or the market variables). The
function f(w,y) (the loss function) represents the loss
associated with the portfolio vector w. (Note that for each
w, the loss function is a one-dimensional random
variable.) Finally, p(y) is the probability associated with
scenario y.
Now, assuming all random variables are discrete, the
probability that the loss function does not exceed a certain
value $\gamma$ is given by the cumulative probability
$$\Psi(w,\gamma) = \sum_{\{y \mid f(w,y) \le \gamma\}} p(y)$$
Using this cumulative probability, we can write
$$\mathrm{VaR}_{1-\epsilon}(w) = \min\{\gamma \mid \Psi(w,\gamma) \ge (1-\epsilon)\}$$
Since CVaR of the losses of portfolio w is the expected
value of the losses conditioned on the losses being in
excess of VaR, we have
$$\mathrm{CVaR}_{1-\epsilon}(w) = E\left(f(w,y) \mid f(w,y) \ge \mathrm{VaR}_{1-\epsilon}(w)\right) = \frac{\sum_{\{y \mid f(w,y) \ge \mathrm{VaR}_{1-\epsilon}(w)\}} p(y)\, f(w,y)}{\sum_{\{y \mid f(w,y) \ge \mathrm{VaR}_{1-\epsilon}(w)\}} p(y)}$$
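As a minimal sketch, the discrete formulas can be evaluated directly for a fixed portfolio. The ten equally likely scenario losses below are made-up numbers, and the function name `var_cvar` is an assumption used only for illustration.

```python
def var_cvar(losses, probs, eps):
    """Discrete VaR and CVaR of a loss distribution at confidence 1 - eps.

    VaR is the smallest loss level whose cumulative probability reaches
    1 - eps; CVaR is the probability-weighted mean of losses >= VaR.
    """
    pairs = sorted(zip(losses, probs))
    cumulative = 0.0
    var = pairs[-1][0]
    for loss, prob in pairs:
        cumulative += prob
        if cumulative >= 1.0 - eps - 1e-12:  # tolerance for rounding
            var = loss
            break
    tail = [(l, p) for l, p in pairs if l >= var]
    tail_prob = sum(p for _, p in tail)
    cvar = sum(l * p for l, p in tail) / tail_prob
    return var, cvar

# Hypothetical scenario losses f(w, y) for one fixed portfolio w.
losses = [-5.0, -2.0, 0.0, 1.0, 3.0, 4.0, 6.0, 8.0, 10.0, 20.0]
probs = [0.1] * 10
var90, cvar90 = var_cvar(losses, probs, eps=0.10)
print(var90, cvar90)
```

Because CVaR averages losses at or beyond VaR, it can never be smaller than VaR for the same confidence level.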
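The continuous equivalents of these formulas are
$$\Psi(w,\gamma) = \int_{f(w,y) \le \gamma} p(y)\, dy$$
$$\mathrm{VaR}_{1-\epsilon}(w) = \min\{\gamma \mid \Psi(w,\gamma) \ge (1-\epsilon)\}$$
$$\mathrm{CVaR}_{1-\epsilon}(w) = E\left(f(w,y) \mid f(w,y) \ge \mathrm{VaR}_{1-\epsilon}(w)\right) = \frac{1}{\epsilon} \int_{f(w,y) \ge \mathrm{VaR}_{1-\epsilon}(w)} f(w,y)\, p(y)\, dy$$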
Moreover, we see that
$$\mathrm{CVaR}_{1-\epsilon}(w) = \frac{1}{\epsilon} \int_{f(w,y) \ge \mathrm{VaR}_{1-\epsilon}(w)} f(w,y)\, p(y)\, dy \;\ge\; \frac{1}{\epsilon} \int_{f(w,y) \ge \mathrm{VaR}_{1-\epsilon}(w)} \mathrm{VaR}_{1-\epsilon}(w)\, p(y)\, dy = \mathrm{VaR}_{1-\epsilon}(w)$$
since
$$\frac{1}{\epsilon} \int_{f(w,y) \ge \mathrm{VaR}_{1-\epsilon}(w)} p(y)\, dy = 1$$
In other words, CVaR is always at least as large as VaR, but
it is a coherent risk measure (and VaR is not). Further,
CVaR is a convex function of w when the loss function is
convex in w, so any local minimum is also a global minimum.
Note, however, that we have a problem: evaluating CVaR this way
requires an analytical expression for VaR. This problem was
solved by Rockafellar and Uryasev (2000).
Their idea is that instead of CVaR we can use the function
$$F_\epsilon(w,\xi) = \xi + \frac{1}{\epsilon} \int_{f(w,y) \ge \xi} \left(f(w,y) - \xi\right) p(y)\, dy$$
Rockafellar and Uryasev prove the following:
1. $F_\epsilon(w,\xi)$ is a convex and continuously differentiable
function in $\xi$.
2. $\mathrm{VaR}_{1-\epsilon}(w)$ is a minimizer of $F_\epsilon(w,\xi)$.
3. The minimum value of $F_\epsilon(w,\xi)$ is $\mathrm{CVaR}_{1-\epsilon}(w)$.
So we can find the optimal value of $\mathrm{CVaR}_{1-\epsilon}(w)$ by
solving the optimization problem
$$\min_{w,\xi} F_\epsilon(w,\xi)$$
If we denote $(w^*, \xi^*)$ as the solution to this optimization
problem, then $F_\epsilon(w^*, \xi^*)$ is the optimal CVaR.
The optimal portfolio is given by $w^*$ and the corresponding
VaR is given by $\xi^*$.
In other words, we can compute the optimal CVaR without first calculating
VaR.
In practice, the probability density function p(y) is not
known or is difficult to estimate. Instead, we might have T
different scenarios $Y = \{y_1, \ldots, y_T\}$ that are sampled from the
probability distribution or that have been obtained from
computer simulations. Evaluating the auxiliary function
$F_\epsilon(w,\xi)$ using the scenarios Y, we obtain
$$F_\epsilon^Y(w,\xi) = \xi + \frac{1}{\epsilon T} \sum_{i=1}^{T} \max\left(f(w,y_i) - \xi,\, 0\right)$$
Therefore the optimization problem
$$\min_w \mathrm{CVaR}_{1-\epsilon}(w)$$
takes the form
$$\min_{w,\xi} \left[\, \xi + \frac{1}{\epsilon T} \sum_{i=1}^{T} \max\left(f(w,y_i) - \xi,\, 0\right) \right]$$
which can also be written
$$\min_{w,\xi,z} \left[\, \xi + \frac{1}{\epsilon T} \sum_{i=1}^{T} z_i \right]$$
subject to
$$z_i \ge 0, \quad i = 1, \ldots, T$$
$$z_i \ge f(w,y_i) - \xi, \quad i = 1, \ldots, T$$
along with any other constraints (like short sales), where $z_i$
is an auxiliary variable for $\max\left(f(w,y_i) - \xi,\, 0\right)$.
Under the assumption that f(w,y) is linear in w, the above
optimization is linear and can be solved using standard
linear programming techniques.
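To make the role of the auxiliary function concrete, the following sketch evaluates its scenario version for a fixed portfolio and minimizes it over the threshold only (no portfolio optimization and no LP solver is involved). The twenty equally likely losses and the helper names are assumptions for illustration.

```python
def aux_f(losses, xi, eps):
    """Scenario auxiliary function: xi + (1 / (eps * T)) * sum(max(f_i - xi, 0))."""
    T = len(losses)
    return xi + sum(max(f - xi, 0.0) for f in losses) / (eps * T)

def minimize_over_xi(losses, eps):
    """Minimize the auxiliary function over xi for a fixed portfolio.

    The function is piecewise linear and convex in xi with kinks at the
    scenario losses, so it suffices to check those points.
    """
    best_xi = min(losses, key=lambda xi: aux_f(losses, xi, eps))
    return best_xi, aux_f(losses, best_xi, eps)

# Hypothetical scenario losses f(w, y_i) for a fixed w, T = 20, eps = 10%.
losses = [float(k) for k in range(1, 21)]
eps = 0.10
best_xi, best_value = minimize_over_xi(losses, eps)

# Average of the worst eps * T = 2 scenario losses, for comparison.
worst_mean = sum(sorted(losses)[-2:]) / 2
print(best_xi, best_value, worst_mean)
```

At the minimizing threshold, the terms `max(f - xi, 0.0)` are exactly the values the auxiliary variables z_i take in the linear program. With discrete scenarios the minimum equals the average of the worst eps * T losses, which can differ slightly from the conditional-expectation formula when probability mass sits exactly at VaR.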
This representation of CVaR can also be used to construct
other portfolio optimization problems. For example, the
mean-CVaR optimization problem
$$\max_w \mu' w$$
subject to
$$\mathrm{CVaR}_{1-\epsilon}(w) \le c_0$$
along with other constraints on w written as
$$w \in C_w$$
results in the following
$$\max_w \mu' w$$
subject to
$$\xi + \frac{1}{\epsilon T} \sum_{i=1}^{T} z_i \le c_0$$
$$z_i \ge 0, \quad i = 1, \ldots, T$$
$$z_i \ge f(w,y_i) - \xi, \quad i = 1, \ldots, T$$
$$w \in C_w$$
Palmquist, Uryasev, and Krokhmal provide us with an
example of the mean-CVaR approach.
They considered two-week returns for all of the stocks in the
S&P 100 from July 1, 1997 to July 8, 1999 for scenario
generation. Optimal portfolios were constructed solving
the mean-CVaR optimization approach for a two-week
horizon at different levels of confidence.
Note risk is the percent of the portfolio allowed to be put at risk.
It can be shown that for a normally distributed loss function,
the mean-variance and mean-CVaR frameworks generate
the same efficient frontier. However, when distributions
are non-normal, these two approaches can be significantly
different.
M-V optimization relies on deviations on both sides of the
mean, while M-CVaR relies only on the part of the
distribution which contributes to high losses.
Bibliography
Artzner, Delbaen, Eber, and Heath, Coherent Measures of Risk,
Mathematical Finance, 1999.
Grootveld and Hallerbach, Variance vs. Downside Risk: Is
There Really That Much Difference?, European Journal of
Operational Research, 1999.
Krokhmal, Palmquist, and Uryasev, Portfolio Optimization with
Conditional Value-At-Risk Objective and Constraints, Journal
of Risk, 2002.
Markowitz, Portfolio Selection, Journal of Finance, 1952.
Rockafellar and Uryasev, Optimization of Conditional Value-At-
Risk, Journal of Risk, 2000.
Roy, Safety-First and the Holding of Assets, Econometrica,
1952.
Uryasev, Conditional Value-At-Risk: Optimization Algorithms
and Applications, Financial Engineering News, 2000.
Asset Allocation
Allocation between asset classes accounts for the major
portion of risk and return in a portfolio
Selection of specific instruments is a decision with smaller
influence on portfolio performance
Asset Allocation should consider all financial aspects
Current and future wealth, income, and financial needs
Financial goals
Taxes and tax advantaged investments
Liquidity (for unexpected needs)
Investors (all types) need customized strategies
Typical Financial Advice for Individuals
Questionnaires to assess investors' risk aversion
E*Trade, Charles Schwab, Fidelity, Financial Engines, etc.
Risk aversion of the investor typically assumed to be CRRA
Choose from standardized portfolios
Conservative (20% stocks)
Dynamic (40% stocks)
Aggressive (60% stocks)
Is this customized?
Recently, so-called life-cycle funds have been popular
Fidelity Freedom 2020
Asset allocation is purely time-dependent
Rule of thumb: percent stock = 100 − age
But these strategies do not depend on wealth, expected
performance, cash flow, etc.
Dynamic Asset Allocation
In real life investors change their asset allocation as time
goes by and new information is available
In theory investors value wealth at the end of the planning
horizon (and along the way) using a specific utility
function and maximize expected utility
Fixed-mix strategies are optimal only under certain
conditions
In general, the optimal investment strategy is dynamic and
reflects real-life behavior
After a stock market correction (with significant losses in
the stock portion of the portfolio) an investor would:
Rebalance back to the original allocation (constant RRA)
Buy more stocks and assume a larger stock allocation than in the
original portfolio (increasing RRA)
Do nothing and keep the new stock allocation or sell stocks to assume
a smaller stock allocation than in the original portfolio (decreasing
RRA)
Samuelson (1969)
Optimal program for investment/consumption in each period
Backward dynamic programming (maximize discounted expected
utility over lifetime)
No bequest
One risky asset (iid) and one riskless
Power utility
Optimal to invest the same proportion of wealth in stocks
in every period, independent of wealth
Merton (1969) extended this to multiple risky assets and a
variety of bequest situations
Conflict between theoreticians and practitioners
Samuelson's and Merton's result is that under their
assumptions about the market and under constant relative
risk aversion, the consumption and investment decisions
are independent of each other; the optimal investment
decision is invariant with respect to the investment
horizon and with respect to wealth.
This is the same as an investment problem where you
maximize the utility of final wealth at the end of the
investment horizon, by allocating and reallocating at each
period along the way.
The result follows directly from the utility function used.
Myopic investment strategy.
Mossin (1968) attempted to isolate the class of utility
functions of terminal wealth which result in myopic policies
for intermediate periods.
Log utility for general asset distributions
Power utility for serially independent asset distributions
If there is a riskless asset, all HARA (linear risk tolerance) utility
functions
Hakansson (1971) showed that for HARA utility no myopic strategy
exists unless restrictions on borrowing and short sales are
completely absent. Examples of such restrictions:
A percent margin requirement
An absolute limit on borrowing
Lending that must be repaid
Therefore, under those restrictions, only power and log
utility functions can lead to myopic policies; furthermore,
if there is serial correlation, only log utility produces
myopic policies
More recently, numerical dynamic portfolio optimization
methods have been developed
Two methods
Stochastic programming
Stochastic dynamic programming (stochastic control)
Stochastic Programming
Efficiently solves the most general models
Transaction costs
Return distributions with serial dependence
Lends itself well to the more general asset liability model (ALM)
Traditionally uses scenario trees to represent possible
future events
Need to keep the tree thin for computational tractability
In later stages a very small number of scenarios are used to represent
the distribution (very thin sub-trees)
Emphasis is on obtaining a good first-stage solution rather than an
entire accurate policy
Stochastic Dynamic Programming
Used when focus is on obtaining optimal policies and
transaction costs are not a primary issue.
Based on Bellman's dynamic programming principle.
An optimal policy has the property that, whatever the initial state and initial
action, the remaining choices constitute an optimal policy with respect to the
subproblem starting at the state that results from the initial action.
Closed form solutions exist for HARA utility functions.
For general monotone increasing and concave utility functions
there are no analytical solutions, but can be solved numerically
when state space is small.
Curse of dimensionality
Dynamic Portfolio Choice
Let's extend the single-period utility maximization problem
to a multi-period setting.
Let:
t = 0, …, T be discrete time periods, with T the investment
horizon
$R_t$ be the random vector of asset returns in time period t
$y_t = (y_1, \ldots, y_N)_t$ be the amount of money invested in the
different asset classes i = 1, …, N at time t
Scalars $W_0$ and $s_t$, t = 0, …, T−1, represent the initial wealth
and possible cash flows (positive and negative) over time
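We can then write (with $e$ a vector of ones):
$$\max\; E\, U(e' y_T)$$
$$\text{s.t.}\quad e' y_0 = W_0 + s_0$$
$$-R_{t-1}' y_{t-1} + e' y_t = s_t, \quad t = 1, \ldots, T$$
$$y_t \ge 0, \quad W_0, s_0, \ldots, s_{T-1} \text{ given}, \quad s_T = 0$$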
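As an aside, note that with time-additive utility we could
also write (with $e$ a vector of ones)
$$\max\; E \sum_{t=1}^{T} \delta^t\, U(e' y_t)$$
$$\text{s.t.}\quad e' y_0 = W_0 + s_0$$
$$-R_{t-1}' y_{t-1} + e' y_t = s_t, \quad t = 1, \ldots, T$$
$$y_t \ge 0, \quad W_0, s_0, \ldots, s_{T-1} \text{ given}, \quad s_T = 0$$
where $\delta$ represents the discount factor.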
Back to our problem, defining $x_t$ (for t = 0, …, T−1) as the vector
of fractions invested in each asset class in each period, we
write
$$x_t = \frac{y_t}{W_t + s_t}$$
where $W_t$ is the wealth available each period before adding
or deducting cash:
$$W_t = R_{t-1}' x_{t-1} \left(W_{t-1} + s_{t-1}\right)$$
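The wealth recursion can be traced with a short sketch for a fixed-mix allocation. The gross returns, weights, and three-period horizon below are hypothetical; only the starting wealth and the annual contribution echo figures used later in the lecture.

```python
def simulate_wealth(w0, cash_flows, returns, weights):
    """Iterate W_{t+1} = (R_t' x_t) * (W_t + s_t) with fixed weights x_t.

    `returns` holds one vector of gross asset returns per period.
    """
    wealth = w0
    path = [wealth]
    for s_t, r_t in zip(cash_flows, returns):
        invested = wealth + s_t                        # wealth after cash flow
        gross = sum(x * r for x, r in zip(weights, r_t))  # portfolio gross return
        wealth = gross * invested
        path.append(wealth)
    return path

# Hypothetical two-asset example: 60% stocks, 40% bonds.
weights = [0.6, 0.4]
returns = [[1.10, 1.02], [0.90, 1.03], [1.20, 1.01]]
cash_flows = [15000.0] * 3
path = simulate_wealth(100000.0, cash_flows, returns, weights)
print(path)
```

Wealth is the only quantity carried from one period to the next, which is what allows the multi-period problem to be written as a recursion in wealth when returns are serially independent.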
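We can then write (with $e$ a vector of ones):
$$\max\; E\, U(W_T)$$
$$\text{s.t.}\quad e' x_t = 1, \quad t = 0, \ldots, T-1$$
$$W_{t+1} = R_t' x_t \left(W_t + s_t\right), \quad t = 0, \ldots, T-1$$
$$x_t \ge 0, \quad W_0, s_0, \ldots, s_{T-1} \text{ given}, \quad s_T = 0$$
Here we can see that for serially independent asset returns,
wealth is a single state connecting one period with the
next.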
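Now we can write the problem as a dynamic programming
recursion (with $e$ a vector of ones)
$$U_t(W_t) = \max_{x_t}\; E\, U_{t+1}\!\left((W_t + s_t)\, R_t' x_t\right)$$
$$\text{s.t.}\quad e' x_t = 1, \quad A x_t = b, \quad l \le x_t \le u$$
where $U_T(W_T) = U(W_T)$,
$W_{t+1} = R_t' x_t (W_t + s_t)$, and $W_0, s_0, \ldots, s_{T-1}$ given, $s_T = 0$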
In practice, we need to resort to Monte Carlo simulation to
estimate the expected utility of the single-period utility
maximizing problem in each period.
Let $R_t^\omega$, $\omega \in S_t$, t = 1, …, T−1, be samples of return
distributions for each period t. We can represent the
problem as:
$$\hat U_t(W_t) = \max_{x_t}\; \frac{1}{|S_t|} \sum_{\omega \in S_t} \hat U_{t+1}\!\left((W_t + s_t)\, R_t^{\omega\prime} x_t\right)$$
$$\text{s.t.}\quad e' x_t = 1, \quad A x_t = b, \quad l \le x_t \le u$$
Now the dynamic optimization problem can be solved using
a backward dynamic programming recursion, conditioning
on wealth.
Starting at T−1, parameterize wealth into K discrete levels
$W_{T-1}^k$, k = 1, …, K, and solve the T−1 problem K times using
sample $S_{T-1}$, obtaining solutions $x_{T-1}^k$.
We then use those solutions to obtain the T−2 solutions and
continue backward. In period 0, the initial wealth is
known and we conduct the final optimization using the
period 1 value function.
In each period in the backward recursion, use a new sample
generated from Monte Carlo.
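The first step of the backward recursion (the T−1 problem solved once per wealth level) might look like the sketch below. It is deliberately simplified and rests on several assumptions that are not in the lecture: one risky asset plus cash, a power (CRRA) utility, a lognormal return sample, and a grid search over the stock fraction standing in for a proper optimizer.

```python
import random

def crra_utility(wealth, gamma=3.0):
    """Power utility with constant relative risk aversion gamma (assumed form)."""
    return wealth ** (1.0 - gamma) / (1.0 - gamma)

def solve_last_stage(wealth_levels, samples, cash_flow=0.0, grid_size=101):
    """For each wealth level, pick the stock fraction x in [0, 1] that
    maximizes the sample average of terminal utility."""
    fractions = [i / (grid_size - 1) for i in range(grid_size)]
    solutions = []
    for w in wealth_levels:
        def avg_utility(x):
            total = 0.0
            for r_stock, r_cash in samples:
                gross = x * r_stock + (1.0 - x) * r_cash
                total += crra_utility((w + cash_flow) * gross)
            return total / len(samples)
        solutions.append(max(fractions, key=avg_utility))
    return solutions

# Hypothetical Monte Carlo sample S_{T-1}: (stock gross return, cash gross return).
rng = random.Random(0)
samples = [(rng.lognormvariate(0.07, 0.16), 1.03) for _ in range(2000)]

wealth_levels = [50_000.0, 100_000.0, 200_000.0, 400_000.0]
solutions = solve_last_stage(wealth_levels, samples)
print(solutions)
```

With power utility and no cash flow the chosen fraction comes out essentially the same at every wealth level, which is the myopic, wealth-independent behaviour described earlier for CRRA investors. A full implementation would interpolate the resulting value function across wealth levels and repeat the step for T−2, T−3, and so on, drawing a new sample each time.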
Practical Utility
Represent utility as a piecewise exponential function with K
pieces, where each piece represents a certain absolute risk aversion $\alpha_i$,
i = 1, …, K.
Let $\hat W_i$, i = 1, …, K, be discrete wealth levels representing
the borders of each piece i, such that below $\hat W_i$ the risk
aversion is $\alpha_i$ and above $\hat W_i$ (until $\hat W_{i+1}$) the risk aversion is $\alpha_{i+1}$,
for all i = 1, …, K.
For each piece i represent utility by an exponential function
$$U_i(W_i) = a_i - b_i e^{-\alpha_i W_i}$$
with a first derivative with respect to wealth
$$\frac{\partial U_i(W_i)}{\partial W_i} = \alpha_i b_i e^{-\alpha_i W_i}$$
The $\alpha_i$ are chosen to represent the desired function of risk
aversion versus wealth.
The coefficients of the exponential functions for each piece i
are found by matching both the function values and the
first derivatives at the intersections $\hat W_i$. In other words, we
fit a spline function.
Thus at each wealth level $\hat W_i$, representing the border
between risk aversion $\alpha_i$ and $\alpha_{i+1}$, we have the following
two equations
$$a_i - b_i e^{-\alpha_i \hat W_i} = a_{i+1} - b_{i+1} e^{-\alpha_{i+1} \hat W_i}$$
$$\alpha_i b_i e^{-\alpha_i \hat W_i} = \alpha_{i+1} b_{i+1} e^{-\alpha_{i+1} \hat W_i}$$
from which we calculate the coefficients (setting $a_1 = 0$ and
$b_1 = 1$)
$$b_{i+1} = b_i \frac{\alpha_i}{\alpha_{i+1}}\, e^{(\alpha_{i+1} - \alpha_i) \hat W_i}$$
$$a_{i+1} = a_i - b_i \left(1 - \frac{\alpha_i}{\alpha_{i+1}}\right) e^{-\alpha_i \hat W_i}$$
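The coefficient recursion can be checked numerically with a short sketch. The three risk-aversion levels and two border wealth levels below are made-up values (K pieces need K−1 interior borders in this sketch), and the function names are assumptions.

```python
import math

def fit_piecewise_exponential(alphas, borders):
    """Compute coefficients a_i, b_i of U_i(W) = a_i - b_i * exp(-alpha_i * W).

    alphas: K absolute risk aversions; borders: K-1 wealth levels where
    consecutive pieces meet. Starts from a_1 = 0 and b_1 = 1.
    """
    a, b = [0.0], [1.0]
    for i, w_hat in enumerate(borders):
        ratio = alphas[i] / alphas[i + 1]
        b_next = b[i] * ratio * math.exp((alphas[i + 1] - alphas[i]) * w_hat)
        a_next = a[i] - b[i] * (1.0 - ratio) * math.exp(-alphas[i] * w_hat)
        a.append(a_next)
        b.append(b_next)
    return a, b

def utility(w, i, alphas, a, b):
    """Utility on piece i (0-based index)."""
    return a[i] - b[i] * math.exp(-alphas[i] * w)

def marginal_utility(w, i, alphas, a, b):
    """First derivative of the utility on piece i."""
    return alphas[i] * b[i] * math.exp(-alphas[i] * w)

# Hypothetical inputs: risk aversion falls from 2.0 to 1.0 as wealth rises.
alphas = [2.0, 1.5, 1.0]
borders = [0.5, 1.0]
a, b = fit_piecewise_exponential(alphas, borders)

for i, w_hat in enumerate(borders):
    print(utility(w_hat, i, alphas, a, b), utility(w_hat, i + 1, alphas, a, b))
```

At each border the two adjacent pieces agree in both level and slope, so the stitched utility function is continuous and continuously differentiable even though its absolute risk aversion jumps between pieces.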
Example 1
Current wealth $100,000
Cash contributions (savings) of $15,000 per year
20 year investment horizon
US Stocks, International Stocks, Corporate Bonds,
Government Bonds, and Cash
        US Stocks  Int Stocks  Corp Bonds  Gvt Bonds  Cash
Mean      10.80      10.37        9.49       7.90     5.61
Std       15.72      16.75        6.57       4.89     0.70
Four utility functions
A: Exponential, absolute risk aversion = 2
B: Increasing relative risk aversion and decreasing absolute risk
aversion
2.0 at W of $0.25M and below, increasing to 3.5 at W of $3.5M and above
C: Decreasing relative risk aversion and decreasing absolute
risk aversion
8.0 at W of $1.0M and below, decreasing to 1.01 at W of $1.5M and above
D: Quadratic (downside)
Quadratic with linear penalty of 1000 for underperforming $1.0M
Recall from Lecture 2
Utility           CEW     Mean    Std     99%     95%
Exponential       1.412   1.564   0.424   0.770   0.943
Increasing RRA    1.440   1.575   0.452   0.771   0.937
Decreasing RRA    1.339   1.498   0.436   0.865   0.998
Quadratic         0.982   1.339   0.347   0.911   1.006
[Figure: Example 1, four panels titled Exponential, Increasing RRA, Quadratic, and Decreasing RRA]

Example 1: portfolio allocation (%) by utility function
                  US Stock  Int Stock  Corp Bonds  Gvmt Bonds  Cash
Exponential         57.4      16.9       25.7         0          0
Increasing RRA      34.0      13.7       52.3         0          0
Decreasing RRA      10.6      10.0       67.2        12.2        0
Quadratic           53.2      16.4       30.4         0          0

[Figures: Example 1 plots for the Exponential, Increasing RRA, Decreasing RRA, and Quadratic utility functions, each including a "1 to go" panel; the Quadratic case also has "10 to go" and "19 to go" panels]
Example 2
Now compare these dynamic strategies with six fixed-mix
strategies.
US stocks only
Cash only
All asset classes equally weighted
Risk averse (conservative)
Medium risk (dynamic)
Risk prone (aggressive)
With the exception of equally weighted asset classes, all
strategies are the solution of the single period Markowitz
optimization.
Strategy            Mean    Std     99%     95%
US stocks           1.825   1.065   0.469   0.660
Cash                0.868   0.019   0.822   0.834
Equally weighted    1.349   0.301   0.799   0.920
Risk Averse         1.098   0.110   0.869   0.930
Medium Risk         1.538   0.407   0.825   0.975
Risk Prone          1.663   0.639   0.677   0.852

Example 2: CEW Improvement
               Exponential  Increasing RRA  Decreasing RRA  Quadratic
US stocks         9.61%         7.17%           96.12%        12.06%
Cash             62.79%        66.04%           56.08%        13.36%
Equally wtd      11.10%        12.30%           14.56%         2.03%
Risk averse      29.93%        32.42%           27.45%         1.03%
Medium risk       0.55%         0.76%            0.62%         1.19%
Risk Prone        1.63%         0.44%           23.72%         4.81%
Bibliography
Hakansson, On Myopic Portfolio Policies, With and Without
Serial Correlation of Yields, Journal of Business, 1971.
Infanger, Dynamic Asset Allocation Strategies Using a
Stochastic Dynamic Programming Approach, in Handbook of
Asset and Liability Management, Volume 1, Zenios and
Ziemba eds., 2006.
Merton, Lifetime Portfolio Selection Under Uncertainty: the
Continuous-time Case, Review of Economics and Statistics,
1969.
Mossin, Optimal Multiperiod Portfolio Policies, Journal of
Business, 1968.
Samuelson, Lifetime Portfolio Selection by Dynamic Stochastic
Programming, Review of Economics and Statistics, 1969.
Characteristic Portfolios
Consider a single period problem with no rebalancing within
the period with the underlying assumptions:
There is a riskless asset
All first and second moments exist
It is not possible to build a fully invested portfolio that has zero
risk
The expected excess return on the fully invested portfolio with
minimum risk is positive.
Define a vector of asset attributes or characteristics (these
could be betas, expected returns, earnings-to-price ratios,
capitalization, membership in an economic sector, etc.)
$$a = (a_1, a_2, \ldots, a_N)'$$
The exposure of portfolio $w_p$ to the attribute $a$ is $a_p = w_p' a$.
4/14/2009
The characteristic portfolio uniquely captures the defining
attribute.
Characteristic portfolio machinery connects attributes and
portfolios, and lets us identify a portfolio's exposure to an
attribute in terms of its covariance with the characteristic
portfolio.
The process works both ways: we can start with a portfolio
and find the attribute that the portfolio expresses most
effectively.
Proposition 1
1. For any non-zero attribute $a$ there is a unique portfolio $w_a$ that
has minimum risk and unit exposure to the attribute.
The weights of the characteristic portfolio are:
$$w_a = \frac{\Sigma^{-1} a}{a' \Sigma^{-1} a}$$
where $\Sigma$ is the covariance matrix of asset returns.
Characteristic portfolios are not necessarily fully
invested; they can have long and short positions, and
may have significant leverage.
2. The variance of the characteristic portfolio $w_a$ is given by:
$$\sigma_a^2 = w_a' \Sigma w_a = \frac{1}{a' \Sigma^{-1} a}$$
3. The beta of all assets with respect to the characteristic
portfolio $w_a$ is equal to $a$:
$$a = \frac{\Sigma w_a}{\sigma_a^2}$$
4. Consider two attributes $a$ and $d$ with characteristic
portfolios $w_a$ and $w_d$. Let $a_d$ and $d_a$ be, respectively, the
exposure of portfolio $w_d$ to characteristic $a$ and the
exposure of portfolio $w_a$ to characteristic $d$. The
covariance of the characteristic portfolios satisfies
$$\sigma_{a,d} = a_d\, \sigma_a^2 = d_a\, \sigma_d^2$$
5. If $\kappa$ is a positive scalar, then the characteristic portfolio
of $\kappa a$ is $w_a / \kappa$. Because characteristic portfolios have
unit exposure to the attribute, if we multiply the attribute
by $\kappa$ we will need to divide the characteristic portfolio
by $\kappa$ to preserve unit exposure.
6. If characteristic $a$ is a weighted combination of
characteristics $d$ and $f$, then the characteristic portfolio
of $a$ is a weighted combination of the characteristic
portfolios of $d$ and $f$; in particular, if
$$a = \kappa_d d + \kappa_f f$$
then
$$w_a = \frac{\kappa_d \sigma_a^2}{\sigma_d^2}\, w_d + \frac{\kappa_f \sigma_a^2}{\sigma_f^2}\, w_f$$
where
$$\frac{1}{\sigma_a^2} = \frac{\kappa_d a_d}{\sigma_d^2} + \frac{\kappa_f a_f}{\sigma_f^2}$$
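Proof
The holdings of the characteristic portfolio can be
determined by solving for the portfolio with minimum risk
given the constraint that the exposure to characteristic $a$
equals 1:
$$\min_w\; w' \Sigma w \quad \text{s.t.} \quad w' a = 1$$
The first order conditions are
$$w' a = 1$$
$$\Sigma w - \theta a = 0$$
where $\theta$ is the Lagrange multiplier (rescaled to absorb the factor of 2
from differentiating $w' \Sigma w$).
The results are
$$\theta = \frac{1}{a' \Sigma^{-1} a}$$
and
$$w_a = \frac{\Sigma^{-1} a}{a' \Sigma^{-1} a}$$
which proves item 1. Item 2 can be verified using $w_a$ and
the definition of portfolio variance. Item 3 can be verified
using the definition of beta with respect to portfolio P as
$$\beta = \frac{\Sigma w_P}{\sigma_P^2}$$
For item 4, note
$$\sigma_{a,d} = \mathrm{Cov}\{w_a, w_d\} = w_a' \Sigma w_d = \sigma_a^2\, a' \Sigma^{-1} \Sigma\, w_d = \sigma_a^2\, a' w_d = a_d\, \sigma_a^2$$
and
$$\sigma_{a,d} = \mathrm{Cov}\{w_a, w_d\} = w_a' \Sigma w_d = \sigma_d^2\, w_a' \Sigma \Sigma^{-1} d = \sigma_d^2\, w_a' d = d_a\, \sigma_d^2$$
Items 5 and 6 are straightforward.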
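Items 1 to 3 of Proposition 1 can be verified numerically with a minimal sketch. The 3×3 covariance matrix and the attribute vector are made-up numbers, and the small Gaussian-elimination helper is only there to keep the example free of external libraries.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(matrix, v):
    return [dot(row, v) for row in matrix]

def characteristic_portfolio(cov, attribute):
    """Weights Sigma^{-1} a / (a' Sigma^{-1} a)."""
    inv_a = solve(cov, attribute)      # Sigma^{-1} a
    scale = dot(attribute, inv_a)      # a' Sigma^{-1} a
    return [h / scale for h in inv_a]

# Hypothetical covariance matrix and attribute vector.
cov = [[0.04, 0.01, 0.00],
       [0.01, 0.09, 0.02],
       [0.00, 0.02, 0.16]]
a = [1.0, 0.5, 2.0]

w_a = characteristic_portfolio(cov, a)
exposure = dot(w_a, a)                              # item 1: unit exposure
variance = dot(w_a, matvec(cov, w_a))               # item 2: portfolio variance
betas = [c / variance for c in matvec(cov, w_a)]    # item 3: asset betas
print(exposure, variance, betas)
```

The computed exposure is one, the variance matches the reciprocal of a'Σ⁻¹a, and the vector of asset betas with respect to the characteristic portfolio reproduces the attribute itself.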
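Example 1:
Suppose $\iota = (1, 1, \ldots, 1)'$ is the attribute. Every
portfolio's exposure to $\iota$, $\iota_P = w_P' \iota$, measures the extent of its
investment; if $\iota_P = 1$ then the portfolio is fully invested.
Portfolio C, the characteristic portfolio for attribute $\iota$, is
the minimum-risk fully invested portfolio:
$$w_C = \frac{\Sigma^{-1} \iota}{\iota' \Sigma^{-1} \iota}$$
$$\sigma_C^2 = w_C' \Sigma w_C = \frac{1}{\iota' \Sigma^{-1} \iota}$$
$$\iota = \frac{\Sigma w_C}{\sigma_C^2}$$
Note every asset has a beta of 1 with respect to this portfolio, and the
covariance of any fully invested portfolio with C is $\sigma_C^2$.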
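Example 2
Suppose beta is the attribute, where beta is defined by some
benchmark portfolio B
$$\beta = \frac{\Sigma w_B}{\sigma_B^2}$$
Then the benchmark is the characteristic portfolio of beta
$$w_B = \frac{\Sigma^{-1} \beta}{\beta' \Sigma^{-1} \beta}$$
$$\sigma_B^2 = w_B' \Sigma w_B = \frac{1}{\beta' \Sigma^{-1} \beta}$$
So the benchmark is the minimum-risk portfolio with a beta
of 1.
Note that the relationship between portfolios C and B is
$$\sigma_{B,C} = \iota_B\, \sigma_C^2 = \beta_C\, \sigma_B^2$$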
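Proposition 2
Let q be the characteristic portfolio of the characteristic $f$
(expected excess returns)
$$w_q = \frac{\Sigma^{-1} f}{f' \Sigma^{-1} f}$$
Then
a. The Sharpe ratio is
$$SR_q = \max\{SR_P \mid P\} = \left(f' \Sigma^{-1} f\right)^{1/2}$$
b.
$$f_q = w_q' f = 1, \qquad \sigma_q^2 = \frac{1}{f' \Sigma^{-1} f}$$
c.
$$f = \frac{\Sigma w_q}{\sigma_q^2} = \left(\frac{\Sigma w_q}{\sigma_q}\right) SR_q$$
d. If $\rho_{Pq}$ is the correlation between portfolios P and q, then
$$SR_P = \rho_{Pq}\, SR_q$$
e. The fraction of q invested in risky assets is given by
$$\iota_q = \frac{f_C\, \sigma_q^2}{\sigma_C^2}$$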
Proof
For any portfolio $w_P$, the Sharpe ratio is $SR_P = f_P / \sigma_P$. For
any positive constant $\kappa$, the portfolio with holdings $\kappa w_P$
will also have a Sharpe ratio equal to $SR_P$. Thus, to find
the maximum Sharpe ratio, we can set the expected excess
return to 1 and minimize risk. We can then minimize $w_P' \Sigma w_P$
subject to the constraint that $w_P' f = 1$. This is just the
problem we solved to get $w_q$, the characteristic portfolio
of $f$.
Items b and c are properties of the characteristic portfolio.
For d, we use c:
$$SR_P = \frac{f_P}{\sigma_P} = \frac{w_P' f}{\sigma_P} = \left(\frac{w_P' \Sigma w_q}{\sigma_P\, \sigma_q}\right) SR_q = \rho_{Pq}\, SR_q$$
And e follows from Proposition 1, item 4.
Proposition 3
Assume $f_C > 0$
1. Portfolio q is net long: $\iota_q > 0$
Let portfolio Q be the characteristic portfolio of $\iota_q f$.
Portfolio Q is fully invested with holdings
$$w_Q = \frac{w_q}{\iota_q}$$
In addition $SR_Q = SR_q$, and for any portfolio P with a
correlation $\rho_{PQ}$ with portfolio Q, we have
$$SR_P = \rho_{PQ}\, SR_Q$$
2.
$$\frac{f_Q}{f_C} = \frac{\sigma_Q^2}{\sigma_C^2}$$
$$f = f_Q \left(\frac{\Sigma w_Q}{\sigma_Q^2}\right) = f_Q\, \beta \text{ wrt } Q$$
Note that this specifies exactly how Portfolio Q explains
expected returns.
3.
$$\beta_Q = \frac{f_B\, \sigma_Q^2}{f_Q\, \sigma_B^2}$$
4. If the benchmark is fully invested, $\iota_B = 1$, then
$$\beta_Q = \frac{\beta_C\, f_B}{f_C}$$
Portfolio A (characteristic portfolio for alpha)
Define alpha as $\alpha = f - \beta f_B$. Let $w_A$ be the characteristic
portfolio for alpha, the minimum risk portfolio with alpha
of 100% (note that this portfolio will have significant
leverage). According to Proposition 1, item 6, we can
express $w_A$ in terms of $w_B$ and $w_q$. From item 4, we see
that the relationship between alpha and beta is
$$\sigma_{B,A} = \alpha_B\, \sigma_A^2 = \beta_A\, \sigma_B^2$$
However, $\alpha_B = 0$ by construction, so portfolios A and B are
uncorrelated and $\beta_A = 0$.
Characteristic Portfolio of Alpha
Consider the characteristic portfolio for alpha where
$$\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_N)'$$
is the vector of forecasted expected residual returns, where
the residual is relative to the benchmark portfolio. Since
the alphas are forecasts of residual return, both the
benchmark and the riskless asset have alphas of zero.
The portfolio weights are
$$w_A = \frac{\Sigma^{-1} \alpha}{\alpha' \Sigma^{-1} \alpha}$$
Portfolio A has an alpha of 1, $w_A' \alpha = 1$, and it has minimum
risk among all portfolios with that property. The variance
of portfolio A is
$$\sigma_A^2 = w_A' \Sigma w_A = \frac{1}{\alpha' \Sigma^{-1} \alpha}$$
In addition, we can define alpha in terms of Portfolio A
$$\alpha = \frac{\Sigma w_A}{\sigma_A^2}$$
Alpha
Looking forward (ex ante), alpha is a forecast of residual return.
Looking backward (ex post), alpha is the average of the realized
residual returns.
The term alpha (just like beta) comes from the use of linear
regression
$$r_P(t) = \alpha_P + \beta_P\, r_B(t) + \epsilon_P(t)$$
The residual returns from this regression are
$$\theta_P(t) = \alpha_P + \epsilon_P(t)$$
Realized alphas are for keeping score; the job of an active manager is to
score, and for that you need to forecast alpha.
Looking into the future, alpha is a forecast of residual return
$$\alpha_n = E[\theta_n]$$
Note that by definition, the benchmark portfolio always has
a residual return of 0. Therefore the alpha of the
benchmark portfolio must also be 0.
Similarly, the residual return for a riskless portfolio is also
0 and its alpha must be 0.
Information Ratio
While $\alpha$ is the primary measure of a portfolio's excess
return, another metric, the information ratio, is often used
by professionals.
The information ratio adjusts the $\alpha$ for the portfolio's
residual risk and is written:
$$IR_P = \frac{\alpha_P}{\omega_P}$$
$\alpha_P$ is predicted alpha; $\omega_P$ is the predicted standard deviation
of the residual.
Typically, we consider the ex-ante information ratio for making decisions and
the ex-post information ratio for performance evaluation.
If $\omega_P$ is 0, we set $IR_P$ equal to 0, and, in general, we define
the information ratio IR as the largest possible value of
$IR_P$ given alphas $\{\alpha_n\}$
$$IR = \max\{IR_P \mid P\}$$
Now, returning to Portfolio A (the characteristic portfolio for
alpha), we note that it has several interesting properties
Proposition 4
1. Portfolio A has zero beta; therefore it typically has long
and short positions
$$\beta_A = w_A' \beta = 0$$
2. Portfolio A has the maximum information ratio
$$IR_A = IR = \left(\alpha' \Sigma^{-1} \alpha\right)^{1/2} \ge IR_P \quad \text{for all } P$$
3. Portfolio A has total and residual risk equal to the inverse of
IR.
$$\omega_A = \sigma_A = \frac{1}{IR}$$
4. Any portfolio P that can be written as
$$w_P = \beta_P w_B + \alpha_P w_A \quad \text{with } \alpha_P > 0$$
has $IR_P = IR$.
5. Recall Portfolio Q (the characteristic portfolio of $\iota_q f$).
This portfolio is a mixture of the benchmark and portfolio
A:
$$w_Q = \beta_Q w_B + \alpha_Q w_A$$
with
$$\beta_Q = \frac{f_B\, \sigma_Q^2}{f_Q\, \sigma_B^2}$$
and
$$\alpha_Q = \frac{\sigma_Q^2}{f_Q\, \omega_A^2}$$
Therefore $IR_Q = IR$. The information ratio of Portfolio Q
equals that of Portfolio A.
6. Total holdings in risky assets for Portfolio A are
$$\iota_A = \frac{\alpha_C\, \omega_A^2}{\sigma_C^2}$$
7. Let $\theta_P$ be the residual return on any portfolio P. The
information ratio of portfolio P is
$$IR_P = IR_Q \cdot \mathrm{Corr}\{\theta_P, \theta_Q\}$$
8. The maximum information ratio is related to portfolio Q's
maximum Sharpe ratio
$$IR = \frac{\alpha_Q}{\omega_Q} = SR \left(\frac{\omega_Q}{\sigma_Q}\right)$$
9. Alpha can be represented as
$$\alpha = IR \left(\frac{\Sigma w_A}{\omega_A}\right) = IR \cdot \mathrm{MCRR}_Q$$
So alpha is directly related to the marginal contribution to
residual risk by the information ratio.
10. The Sharpe ratio of the benchmark is related to the
maximal information ratio and Sharpe ratio
$$SR_B^2 = SR^2 - IR^2$$
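The last relationship can be traced back to the earlier definitions. The following is a sketch of the derivation, assuming the decomposition of expected excess returns into a benchmark part and alpha, together with the facts that the benchmark has zero alpha and that the maximum Sharpe ratio and the maximum information ratio are quadratic forms in the inverse covariance matrix.

```latex
\begin{align*}
f &= \beta f_B + \alpha, \qquad \beta = \frac{\Sigma w_B}{\sigma_B^2}, \qquad w_B'\alpha = \alpha_B = 0 \\
SR^2 &= f'\Sigma^{-1} f
      = f_B^2\, \beta'\Sigma^{-1}\beta + 2 f_B\, \beta'\Sigma^{-1}\alpha + \alpha'\Sigma^{-1}\alpha \\
\beta'\Sigma^{-1}\beta &= \frac{w_B'\Sigma w_B}{\sigma_B^4} = \frac{1}{\sigma_B^2}, \qquad
\beta'\Sigma^{-1}\alpha = \frac{w_B'\alpha}{\sigma_B^2} = 0 \\
SR^2 &= \frac{f_B^2}{\sigma_B^2} + \alpha'\Sigma^{-1}\alpha = SR_B^2 + IR^2
\end{align*}
```

The cross term vanishes precisely because the benchmark has zero alpha, which is why the benchmark and active contributions to the squared Sharpe ratio add.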
Fundamental Law of Active
Management
A portfolio manager applies quantitative analysis to market
data to find and exploit the opportunities for excess return
hidden in market inefficiencies.
Quantitative analysis opens up the possibility of statistical
arbitrage if the methods and models used combine all
available information efficiently.
This is illustrated within the framework of the fundamental
law of active management (Grinold 1989; Grinold & Kahn
1997).
The fundamental law states that the information ratio (IR) is
the product of the information coefficient (IC) and the
square root of breadth (BR)
$$IR = IC \cdot \sqrt{BR}$$
Breadth is defined as the number of independent forecasts of
exceptional return (think of breadth as the number of
independent factors for which you make forecasts).
The information coefficient is the correlation of each
forecast with the actual outcomes (here assumed to be the
same for all forecasts).
This equation says that a higher information ratio can be
achieved by increasing the information coefficient or by
increasing the breadth.
IC can be increased by finding factors that are more
significant than those that are already in the model.
BR can be increased by finding more factors that are
uncorrelated (or relatively uncorrelated) with the existing
factors in the model.
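The trade-off between skill and breadth is easy to see with a few numbers. The sketch below simply evaluates the fundamental law; the IC and breadth values are hypothetical, and the helper names are assumptions.

```python
import math

def information_ratio(ic, breadth):
    """Fundamental law: IR = IC * sqrt(BR)."""
    return ic * math.sqrt(breadth)

def breadth_needed(target_ir, ic):
    """Number of independent forecasts needed to reach a target IR at a given IC."""
    return (target_ir / ic) ** 2

# A skilled forecaster with few independent bets ...
ir_few = information_ratio(ic=0.10, breadth=25)
# ... versus a less skilled forecaster with many independent bets.
ir_many = information_ratio(ic=0.05, breadth=100)

print(ir_few, ir_many, breadth_needed(0.5, 0.05))
```

Halving the information coefficient requires four times the breadth to keep the same information ratio, because breadth enters only through its square root.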
Generally, for quantitative portfolio management, we use a
model something like
$$r_{it} = \alpha_i + \beta_{i1} f_{1t} + \beta_{i2} f_{2t} + \cdots + \beta_{iK} f_{Kt} + \epsilon_{it}$$
The fundamental law basically assesses how well our model
explains the stock-return process, and it expresses the
equation's goodness of fit as the product of the number of
explanatory variables and each variable's average
contribution.
While the fundamental law can be expressed in different ways,
there are certain general facts which always hold:
1. $IR^2$ approximately equals the goodness of fit ($R^2$) of the
forecasting equations.
2. The breadth is the number of explanatory variables in the
forecasting equations.
3. $IC^2$ is the average contribution of each explanatory variable
in increasing $R^2$.
4. When the benchmark is ignored and the risk-free rate is
subtracted from the portfolio returns, IR is essentially the
maximum Sharpe ratio one can achieve and the fundamental
law decomposes the maximum Sharpe ratio into the number
of explanatory variables and their average contribution.
Bibliography
Chincarini and Kim, Quantitative Equity Portfolio
Management, 2006.
Grinold, The Fundamental Law of Active Management,
Journal of Portfolio Management, 1989.
Grinold and Kahn, Active Portfolio Management, 2000.
