Forecasting the Equity Premium and Optimal Portfolios

Master's Thesis
LITH-MAT-EX--2008/04--SE
2008-04-15

Division of Mathematics
Department of Mathematics
Linköpings universitet
SE-581 83 Linköping, Sweden

http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-11795
Keywords: equity premium, Bayesian model averaging, linear prediction, estimation errors, Markowitz optimization
Abstract
The expected equity premium is an important parameter in many financial mod-
els, especially within portfolio optimization. A good forecast of the future equity
premium is therefore of great interest. In this thesis we seek to forecast the equity
premium, use it in portfolio optimization and then give evidence on how sensitive
the results are to estimation errors and how the impact of these can be minimized.
Linear prediction models are commonly used by practitioners to forecast the expected equity premium, with mixed results. Simply choosing the model that performs best in-sample for forecasting does not take model uncertainty into account. Our approach is to still use linear prediction models, but to also take model uncertainty into consideration by applying Bayesian model averaging. The predictions are used in the optimization of a portfolio with risky assets to investigate how sensitive portfolio optimization is to estimation errors in the mean vector and covariance matrix. This is performed by using a Monte Carlo based heuristic called portfolio resampling.
The results show that the predictive ability of linear models is not substantially improved by taking model uncertainty into consideration. This could mean that the main problem with linear models is not model uncertainty, but rather too low predictive ability. However, we find that our approach gives better forecasts than just using the historical average as an estimate. Furthermore, we find some predictive ability in the GDP, the short term spread and the volatility for the five years to come. Portfolio resampling proves to be useful when the input parameters in a portfolio optimization problem suffer from vast uncertainty.
Acknowledgments
First of all we would like to thank risklab GmbH for giving us the opportunity
to write this thesis. It has been a truly rewarding experience. We are grateful
for the many inspirational discussions with Wolfgang Mader, our supervisor at
risklab. He also has provided us with valuable comments and suggestions. We
thank our supervisor at LiTH, Jörgen Blomvall, for his continuous support and
feedback. Finally we would like to acknowledge our opponent Tobias Törnfeldt,
for his helpful comments.
Johan Bjurgert
Marcus Edstrand
Contents
1 Introduction 5
1.1 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2 Problem definition . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.5 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
4 Bayesian Statistics 25
4.1 Basic definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.2 Sufficient statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.3 Choice of prior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.4 Marginalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.5 Bayesian model averaging . . . . . . . . . . . . . . . . . . . . . . . 30
4.6 Using BMA on linear regression models . . . . . . . . . . . . . . . 32
6 Implementation 53
6.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
6.2 Linear prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
6.3 Bayesian model averaging . . . . . . . . . . . . . . . . . . . . . . . 55
6.4 Backtesting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
7 Results 57
7.1 Univariate forecasting . . . . . . . . . . . . . . . . . . . . . . . . . 57
7.2 Multivariate forecasting . . . . . . . . . . . . . . . . . . . . . . . . 60
7.3 Results from the backtest . . . . . . . . . . . . . . . . . . . . . . . 62
11 Conclusions 89
Bibliography 91
A Mathematical Preliminaries 97
A.1 Statistical definitions . . . . . . . . . . . . . . . . . . . . . . . . . . 97
A.2 Statistical distributions . . . . . . . . . . . . . . . . . . . . . . . . 98
B Code 100
B.1 Univariate predictions . . . . . . . . . . . . . . . . . . . . . . . . . 100
B.2 Multivariate predictions . . . . . . . . . . . . . . . . . . . . . . . . 101
B.3 Merge time series . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
B.4 Load data into Matlab from Excel . . . . . . . . . . . . . . . . . . 103
B.5 Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
B.6 Removal of outliers and linear prediction . . . . . . . . . . . . . . . 104
List of Figures
6.1 Flowchart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
6.2 User interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
List of Tables
2.1 Advantages and disadvantages of discussed models . . . . . . . . . 14
The most frequently used symbols and abbreviations are described here.

Symbols

µ̄        Demanded portfolio return
β_i,t    Beta for asset i at time t
β_t      True least squares parameter at time t
µ        Asset return vector
Ω_t      Information set at time t
Σ        Estimated covariance matrix
cov[X]   Covariance of the random variable X
β̂_t      Least squares estimate at time t
Σ̂        Sampled covariance matrix
û_t      Least squares sample residual at time t
λ_m,t    Market m price of risk at time t
C        Covariance matrix
I_n      The identity matrix of size n × n
w        Weights of assets
tr[X]    The trace of the matrix X
var[X]   Variance of the random variable X
D_i,t    Dividend for asset i at time t
E[X]     Expected value of the random variable X
r_f,t    Risk-free rate from time t to t + 1
r_m,t    Return from asset m at time t
u_t      Population residual in the least squares model at time t
Abbreviations

aHEP   Average historical equity premium
BMA    Bayesian model averaging
DJIA   Dow Jones Industrial Average
EEP    Expected equity premium
GDP    Gross domestic product
HEP    Historical equity premium
IEP    Implied equity premium
OLS    Ordinary least squares
REP    Required equity premium
Chapter 1
Introduction
The expected equity risk premium is one of the most important economic variables. A meaningful estimate of the premium is critical for valuing companies and stocks and for planning future investments. However, the only premium that can be observed is the historical premium.
Since the equity premium is shaped by overall market conditions, factors influencing market conditions can be used to explain the equity premium. Although the predictive power usually is low, the factors can also be used for forecasting. Many of the investigations undertaken typically set out to determine a best model, consisting of a set of economic predictors, and then proceed as if the selected model had generated the equity premium. Such an approach ignores the uncertainty in model selection, leading to overconfident inferences that are riskier than one thinks. In our thesis we forecast the equity premium by computing a weighted average of a large number of linear prediction models using Bayesian model averaging (BMA), thereby taking model uncertainty into account.
Having forecasted the equity premium, the key input for asset allocation optimization models, we conclude by highlighting the main pitfalls in the mean variance optimization framework and by presenting portfolio resampling as a way to arrive at suitable allocation decisions when the input parameters are very uncertain.
1.1 Objectives
The objective of this thesis is to build a framework for forecasting the equity
premium and then implement it to produce a functional tool for practical use.
Further, the impact of uncertain input parameters in mean-variance optimization
shall be investigated.
1.3 Limitations
The practical part of this thesis is limited to the use of US time series only.
However, the theoretical framework is valid for all economies.
1.4 Contributions
To the best knowledge of the authors, this is the first attempt to forecast the
equity premium using Bayesian model averaging with the priors specified later in
the thesis.
1.5 Outline
The first part of the thesis is about forecasting the equity premium whereas the
second part discusses the importance of parameter uncertainty in portfolio opti-
mization.
Chapter 2

The Equity Premium
In this chapter we define the concept of the equity premium and present some models that have been used for estimating the premium. At the end of the chapter, a table summing up advantages and disadvantages of the different models is provided. The chapter concludes with a motivation of why we have chosen to work with multi factor models and a summary of criteria for a good model.
• implied equity premium (IEP): the required equity premium that arises
from a pricing model and from assuming that the market price is correct.
The HEP is observable on the financial market and is equal for all investors.1 It
is calculated by
where r_m,t is the return on the stock market and r_f,t−1 is the rate on a riskfree asset from t − 1 to t; P_t is the stock index level.
A widely used measure for r_m,t is the return on a large stock index. For the second asset, r_f,t−1 in (2.1), the return on government securities is usually used.
Some practitioners use the return on short-term treasury bills; some use the re-
turns on long-term government bonds. Yields on bonds instead of returns have
also been used to some extent. Despite the indisputable importance of the equity
premium, a general consensus on exactly which assets should enter expression (2.1)
does not exist. Questions like: “Which stock index should be used?” and “Which
riskfree instrument should be used and which maturity should it have?” remain
unanswered.
The EEP is made up of the market's expectations of future returns over a risk-free asset and is therefore not observable in the financial market. Its magnitude, and the most appropriate way to produce estimates thereof, is an intensively debated topic among economists. The market expectations shaping the premium are based on, at the very least, a non-negative premium and to some extent also on average realizations of the HEP. This would mean that there is a relation between the EEP and the HEP. Some authors (e.g. [9], [21], [37] and [42]) even argue that there is a strict equality between the two, whereas others claim that the EEP is smaller than the HEP (e.g. [45], [6] and [22]). Although investors have different opinions about the correct level of the expected equity premium, many basic financial books recommend using 5-8%.2
The required equity premium (REP) is important in valuation since it is the key
to determining the company’s required return on equity.
If one believes that prices on the financial markets are correct, then the implied eq-
uity premium, (IEP), would be an estimate of the expected equity premium (EEP).
We now turn to presenting the models used to produce estimates of the different concepts.
Assuming that the historical equity premium is equal to the expected equity pre-
mium can be formulated as
where e_m,t is the error term, the unexpected return. The expectation is often computed as the arithmetic average of all available values for the HEP. In equation (2.2), it is assumed that the errors are independent and have a mean of zero. The
model then implies that investors are rational and the random error term corre-
sponds to their mistakes. It is also possible to model more advanced errors. For
example, an autoregressive error term might be motivated since market returns
sometimes exhibit positive autocorrelation. An AR(1) model then implies that
investors need one time step to learn about their mistakes. [64]
The model has the advantages of being intuitive and easy to use. The drawbacks, on the other hand, are not few. Besides the usual problems with time series, such as sample length and outliers, the model suffers from problems during longer periods where the riskfree asset has a higher average return than the equity. Clearly, this is not plausible, since an investor expects a positive excess return in order to invest.
P_i,t = E[D_i,t+1] / (E[r_i,t+1] − E[g_i,t+1])  (2.3)

where E[D_i,t+1] is next year's expected dividend, E[r_i,t+1] the required rate of return and E[g_i,t+1] the company's expected growth rate of dividends from today until infinity.
Assuming that CAPM3 holds, the required rate of returns for stock i can be
written as
over all assets, we can now solve for the expected market risk premium
E[r_m,t+1] = (1 + E[g_m,t+1]) D_m,t / P_m,t + E[g_m,t+1]
           = (1 + E[g_m,t+1]) DivYield_m,t + E[g_m,t+1]  (2.5)

where E[r_m,t+1] is the expected market risk premium, D_m,t is the sum of dividends from all companies, E[g_m,t+1] is the expected growth rate of the dividends from today to infinity4, and DivYield_m,t is the current market price dividend yield. [64]
One criticism of the Gordon dividend growth model is that the result depends heavily on the number used for the expected dividend growth rate; the problem is thereby shifted to forecasting the expected dividend growth rate.
and E[r_i,t|Ω_t−1] and E[r_m,t|Ω_t−1] are expected returns on asset i and the market portfolio conditional on investors' information set Ω_t−1.5
Observing that the ratio E[r_m,t|Ω_t−1]/var[r_m,t|Ω_t−1] is the market price of risk λ_m,t, measuring the compensation an investor must receive for a unit increase in the market return variance [55], yields the following expression for the market portfolio's expected excess returns
By specifying a model for the conditional variance process, the equity premium
can be estimated.
4 E[r_m,t+1] > E[g_m,t+1]
5 Both returns are in excess of the riskless rate of return r_f,t−1 and all returns are measured in one numeraire currency.
2.5 Multi factor models
where the coefficients α and β usually are calculated using ordinary least squares (OLS), X contains the factors and ε is the error.
Goyal and Welch [37] showed that most of the mentioned predictors performed worse out-of-sample than simply assuming that the equity premium had been constant. They also found that the predictors were not stable, that is, their importance changes over time. Campbell and Thompson [16], on the other hand, found that some of the predictors with significant in-sample forecasting power generally have better out-of-sample forecasting power than a forecast based on the historical average.
Table 2.1. Advantages and disadvantages of the discussed models
2.7 What is a good model?
• The estimated premium should be rather smooth over time, because investor preferences presumably do not change much over time
• The model should provide different premium estimates for different time horizons, that is, take investors' “time structure” into account
Chapter 3

Linear Regression Models

First we summarize the mechanics of linear regressions and present some formulas that hold regardless of which statistical assumptions are made. Then we discuss different statistical assumptions about the properties of the model and the robustness of the estimates.
y_t = x_t^T β + u_t.  (3.1)
Theorem 3.1 (Ordinary least squares estimate) The OLS estimate is given by

β̂ = [Σ_{t=1}^T x_t x_t^T]^{-1} [Σ_{t=1}^T x_t y_t]  (3.2)

assuming that the matrix Σ_{t=1}^T x_t x_t^T ∈ R^{k×k} is nonsingular (see [38]).
β̂ = [Σ_{t=1}^T x_t x_t^T]^{-1} [Σ_{t=1}^T x_t y_t].
y = Xβ + u,  (3.3)

where y ≡ [y_1, y_2, . . . , y_n]^T, u ≡ [u_1, u_2, . . . , u_n]^T and X is the matrix with rows x_1^T, x_2^T, . . . , x_n^T.
A perhaps more intuitive way to arrive at equation (3.2) is to project y on the
column space of X.
The vector of the OLS sample residuals, û, can then be written as û = y − Xβ̂.
Consequently the loss function V (β) for the least squares problem can be written
In the same way, the OLS sample residuals are orthogonal to the explanatory variables in X:

û^T X = 0.  (3.5)
3.1 Basic definitions
(Xβ)^T (y − Xβ) = 0  ⇔  β^T (X^T y − X^T X β) = 0.

By choosing the nontrivial solution for β, and by noticing that if X is of full rank, then the matrix X^T X is also of full rank, we can compute the least squares estimator by inverting X^T X.
The OLS sample residual û shall not be confused with the population residual u.
The vector of OLS sample residuals can be written as
The relationship between the two errors can now be found by substituting equation (3.3) into equation (3.7):

û = M_X (Xβ + u) = M_X u.  (3.8)
The difference between the OLS estimate β̂ and the true parameter β is found by
substituting equation (3.3) into (3.6)
R^2 = var[ŷ]/var[y].

If we let X include an intercept, then (3.5) also implies that the fitted residuals have zero mean, (1/n) Σ_{i=1}^n û_i = 0. Now we can decompose the variance of y into the variance of ŷ and û.

Since OLS minimizes the sum of squared fitted errors, which is proportional to var[û], it also maximizes R^2.
Substituting equation (3.3) into equation (3.6) and taking expectations using assumptions 1 and 2 establishes that β̂ is unbiased, with covariance matrix

E[(β̂ − β)(β̂ − β)^T] = E[(X^T X)^{-1} X^T u u^T X (X^T X)^{-1}]  (3.12)
                     = (X^T X)^{-1} X^T E[u u^T] X (X^T X)^{-1}
                     = σ^2 (X^T X)^{-1} X^T X (X^T X)^{-1}
                     = σ^2 (X^T X)^{-1}.

When u is Gaussian, the above calculations imply that β̂ is Gaussian. Hence, the preceding results imply

β̂ ∼ N(β, σ^2 (X^T X)^{-1}).
It can further be shown that under assumptions 1, 2 and 3, β̂ is BLUE2, that is, no unbiased estimator of β is more efficient than the OLS estimator β̂.

1 As treated in [38]
2 BLUE, best linear unbiased estimator; see the Gauss-Markov theorem
point can be justified by arguing that there seldom are outliers, which makes them practically unpredictable, and therefore deleting them should strengthen the predictive power. Sometimes extreme points correspond to extraordinary changes in economies, and depending on the context it might be more or less justified to discard them.

Because outliers do not necessarily produce large residuals, they might be easy to overlook. A good measure for the influence of a data point is its leverage.
Since ŷ = Xβ̂ = X(X^T X)^{-1} X^T y ≡ Hy, the leverage measures how much an observation influences its own predicted value. The diagonal elements h_ii of H contain the leverage measures and are not influenced by y. A rule of thumb [39] for detecting outliers is that h_ii > 2(p + 1)/n signals a high leverage point, where p is the number of columns in the predictor matrix X aside from the intercept and n is the number of observations.
3.4 Testing the regression assumptions
Since ρj is a correlation, |ρj | ≤ 1 for all j. Note also that ρ0 equals unity for all
covariance stationary processes.
For a more detailed analysis the Jarque-Bera test, a goodness-of-fit measure of departure from normality based on skewness and kurtosis, can be employed.

S = ((1/n) Σ_{k=1}^n (x_k − x̄)^3) / ((1/n) Σ_{k=1}^n (x_k − x̄)^2)^{3/2}

K = ((1/n) Σ_{k=1}^n (x_k − x̄)^4) / ((1/n) Σ_{k=1}^n (x_k − x̄)^2)^2
Chapter 4

Bayesian Statistics
Theorem 4.1 (Bayes's theorem) Let p(y, θ) denote the joint probability density function (pdf) for a random observation vector y and a parameter vector θ, also considered random. Then, according to the usual operations with pdfs, we have

p(y, θ) = p(y|θ) p(θ) = p(θ|y) p(y)

and thus

p(θ|y) = p(θ) p(y|θ)/p(y) = p(θ) p(y|θ) / ∫_A p(y|θ) p(θ) dθ  (4.1)
Figure 4.1 highlights the importance of Bayes's theorem and shows how the prior information enters the posterior pdf via the prior pdf, whereas all the sample information enters the posterior pdf via the likelihood function.
Note that an important difference between Bayesian statistics and classical Fisherian statistics is that the parameter vector θ is considered a random variable rather than an unknown constant.
It turns out that many of the common statistical distributions have a similar
form. This leads to the definition of the exponential family.
then it follows immediately from definition 4.2 that Σ_i t(y_i) is sufficient for θ given y.

p(y|µ) = (2πσ^2)^{-N/2} exp[−(1/(2σ^2)) Σ_t (y_t − µ)^2]
       = exp[−(N µ^2 − 2µ Σ_t y_t)/(2σ^2)] · (2πσ^2)^{-N/2} exp[−(1/(2σ^2)) Σ_t y_t^2]

where the first factor is f(t, µ) and the second is g(y); the sufficient statistic t(y) is given by t(y) = Σ_t y_t.
A good rule of thumb for prior selection is that your prior should represent the best knowledge available about the parameters before looking at the data. For example, the number of goals in a football game cannot be less than zero and is less than 1000, which justifies setting your prior equal to zero outside this interval. In the case that one does not have any information, a good idea might be to use an uninformative prior.
Example 4.2
Consider a random sample y = (y_1, . . . , y_n) ∼ N(θ, φ), with mean θ known and variance φ unknown. The Jeffreys prior p_J(φ) for φ is then computed as follows:

L(φ|y) = ln p(y|φ) = ln Π_{i=1}^n (1/√(2πφ)) exp[−(y_i − θ)^2/(2φ)]
       = ln ((1/√(2πφ))^n exp[−(1/(2φ)) Σ_{i=1}^n (y_i − θ)^2])
       = −(1/(2φ)) Σ_{i=1}^n (y_i − θ)^2 − (n/2) ln φ + c

⇒ ∂^2 L/∂φ^2 = −(1/φ^3) Σ_{i=1}^n (y_i − θ)^2 + n/(2φ^2)

⇒ −E[∂^2 L/∂φ^2] = (1/φ^3) E[Σ_{i=1}^n (y_i − θ)^2] − n/(2φ^2) = (1/φ^3)(nφ) − n/(2φ^2) = n/(2φ^2)

⇒ p_J(φ) ∝ |J(φ|y)|^{1/2} ∝ 1/φ
A natural question that arises is what choices of priors generate analytical expres-
sions for the posterior distribution. This question leads to the notion of conjugate
priors.
is in the class Π for all y whenever the prior density is in Π (see [49]).
There is a minor complication with the definition, and a more rigorous definition is presented in [5]. However, the definition states the key principle in a clear enough manner.
Example 4.3
Let x = (x_1, . . . , x_n) have independent Poisson distributions with the same mean λ; then the likelihood function l_x(λ) equals

l_x(λ) = Π_{i=1}^n (λ^{x_i} e^{−λ} / x_i!) = λ^t e^{−nλ} / Π_{i=1}^n x_i! ∝ λ^t e^{−nλ}

where t = Σ_{i=1}^n x_i and, by theorem 4.2, t is sufficient for λ given x.
Conjugate priors are useful in computing posterior densities. Although there are not that many priors that are conjugate, there might be a risk of overuse, since the data might be better described by another distribution that is not conjugate.
4.4 Marginalization
A useful property of conditional probabilities is the possibility to integrate out
undesired variables. According to usual operations of pdf’s we have
∫ p(a, b) db = p(a).
Analogously, for any likelihood function of two or more variables, marginal like-
lihoods with respect to any subset of the variables can be defined. Given the
likelihood ly (θ, M ) the marginal likelihood ly (M ) for model M is
l_y(M) = p(y|M) = ∫ p(y|θ, M) p(θ|M) dθ.
Unfortunately marginal likelihoods are often very difficult to calculate and numer-
ical integration techniques might have to be employed.
Example 4.4
Suppose we are analyzing data and believe that it arises from a set of probability
distributions or models {Mi }ki=1 . For example, the data might consist of a normally
distributed outcome y that we wish to predict future values of. We also have two
other outcomes, x1 and x2 , that covariates with y. Using the two covariates as
p(M_k|D) = p(D|M_k) p(M_k) / Σ_{l=1}^K p(D|M_l) p(M_l),  (4.8)

where

p(D|M_k) = ∫ p(D|θ_k, M_k) p(θ_k|M_k) dθ_k.  (4.9)
By completing the square in the exponent, the sum of squares can be written as

(y − Xβ)^T (y − Xβ) = (β − β̂)^T X^T X (β − β̂) + (y − Xβ̂)^T (y − Xβ̂),

where β̂ = (X^T X)^{-1} X^T y is the OLS estimate. That the equality holds is proved by multiplying out the right-hand side and checking that it equals the left-hand side.

As pointed out in section 3.1, (y − Xβ̂) is the residual vector û, and its sum of squares divided by the number of observations less the number of covariates is known as the residual mean square, denoted by s^2.
s^2 = û^T û/(n − p) = û^T û/v  ⇒  û^T û = v s^2
which is the Jeffreys prior as calculated in example 4.2. For βj the g-prior, as
introduced by Zellner [68], is applied
As shown by Fernandez, Ley and Steel [33] the following three theoretical values
of g lead to consistency, in the sense of asymptotically selecting the correct model.
• g = 1/n
The prior information is roughly equal to the information available from one
data observation
• g = k/n
Here, more information is assigned to the prior as the number of predictors
k grows
• g = k^{1/k}/n
Now, less information is assigned to the prior as the number of predictors grows
To arrive at a posterior probability of the models given data, we also need to specify the prior distribution for each model M_j over M, the space of all K = 2^{p−1} models:

p(M_j) = p_j,  j = 1, . . . , K,  with p_j > 0 for all M_j ∈ M and Σ_{j=1}^K p_j = 1.
Theorem 4.3 (Derivation of the marginal likelihood) Using the above specified priors, the marginalized likelihood function is given by

l_y(M_j) = ∫ p(y|β_j, σ^2, M_j) p(σ^2) p(β_j|σ^2, M_j) dβ_j dσ^2
         = Γ(n/2)/(π^{n/2} (g + 1)^{p/2}) (y^T y − (g/(1 + g)) y^T X_j (X_j^T X_j)^{-1} X_j^T y)^{-n/2}.
Proof :
To integrate the expression we start by completing the square of the exponents. Here,
we do not write out the index on the variables. Mind that Z0 is used instead of writing
out the g-prior.
= (β − B_1)^T (X^T X + Z_0)(β − B_1) − β̂^T X^T X (X^T X + Z_0)^{-1} X^T X β̂
  − β̂^T X^T X (X^T X + Z_0)^{-1} Z_0 β̄ − β̄^T Z_0 (X^T X + Z_0)^{-1} X^T X β̂
  − β̄^T Z_0 (X^T X + Z_0)^{-1} Z_0 β̄ + β̂^T X^T X (X^T X + Z_0)^{-1} (X^T X + Z_0) β̂
  + β̄^T Z_0 (X^T X + Z_0)^{-1} (X^T X + Z_0) β̄ =

= (β − B_1)^T (X^T X + Z_0)(β − B_1) − [β̂^T X^T X (X^T X + Z_0)^{-1} Z_0 β̄
  + β̄^T Z_0 (X^T X + Z_0)^{-1} X^T X β̂ − β̄^T X^T X (X^T X + Z_0)^{-1} Z_0 β̄
  − β̂^T Z_0 (X^T X + Z_0)^{-1} X^T X β̂] =

= (β − B_1)^T (X^T X + Z_0)(β − B_1) − [β̂^T ((X^T X)^{-1} + Z_0^{-1})^{-1} β̄
  + β̄^T ((X^T X)^{-1} + Z_0^{-1})^{-1} β̂ − β̂^T ((X^T X)^{-1} + Z_0^{-1})^{-1} β̂
  − β̄^T ((X^T X)^{-1} + Z_0^{-1})^{-1} β̄] =
The second exponent is the kernel of a multivariate normal density2 and integrating with respect to β yields

S_1^{-n/2} = (v_j s_j^2 + β̂_j^T ((1 + g)(X_j^T X_j)^{-1})^{-1} β̂_j)^{-n/2}
           = (v_j s_j^2 + (1/(1 + g)) β̂_j^T X_j^T X_j β̂_j)^{-n/2}
           = ((y − X_j β̂_j)^T (y − X_j β̂_j) + (1/(1 + g)) β̂_j^T X_j^T X_j β̂_j)^{-n/2}
           = (y^T y − (g/(1 + g)) y^T X_j (X_j^T X_j)^{-1} X_j^T y)^{-n/2}

|Z_0|^{1/2} = |(1/g) X_j^T X_j|^{1/2} = (1/g)^{p/2} |X_j^T X_j|^{1/2}

|A_1|^{-1/2} = 1/|A_1|^{1/2} = 1/((1 + 1/g)^{p/2} |X_j^T X_j|^{1/2})

And finally we arrive at

l_y(M_j) = Γ(n/2)/(π^{n/2} (g + 1)^{p/2}) (y^T y − (g/(1 + g)) y^T X_j (X_j^T X_j)^{-1} X_j^T y)^{-n/2}.
p(M_j|y) = p(y|M_j) p_j / Σ_{k=1}^K p(y|M_k) p_k
φ = var[∆|y] = Σ_{j=1}^K [σ_û^2 X_j (X_j^T X_j)^{-1} X_j^T  (4.16)
Finally, the confidence interval for our BMA estimate of the equity premium is calculated as

I_{1−α}(ξ_k) = ξ_k ± Φ^{-1}(1 − α/2) S_k/√n,  (4.19)

where Φ(x) = p(X ≤ x) when X is N(0, 1). This interval results from the central limit theorem, which states that for a set of n i.i.d. random variables with finite mean µ and variance σ^2, the sample average approaches the normal distribution with mean µ and variance σ^2/n as n increases. This holds irrespective of the shape of the original distribution. It then follows that, for each time step, the 2^18 estimates of the equity premium have a sample mean that is approximately normally distributed.
Chapter 5

The Data Set and Linear Prediction
In this chapter we first describe the used data set and then explain and motivate
the predictors we have chosen to forecast the expected equity premium. We also
check that our statistical assumptions hold and explain how the predictions are
carried out.
1 DJIA is a price-weighted average of 30 significant stocks traded on the New York Stock Exchange and the Nasdaq. In contrast, most stock indices are market capitalization weighted, meaning that larger companies account for larger proportions of the index.
Factors

            1      2      3      4      5      6      7      8      9
Mean     0.00   0.07   0.06   0.02   0.07   0.04   0.09   0.07  -0.02
Std      0.14   0.37   0.15   0.13   0.23   0.03   0.40   0.40   0.11
Median   0.00  -0.01   0.05   0.01   0.10   0.04   0.06   0.01  -0.02
Min     -0.30  -0.38  -0.20  -0.23  -0.61   0.01  -0.71  -0.68  -0.34
Max      0.32   1.73   0.87   0.29   0.64   0.14   1.28   1.65   0.20

           10     11     12     13     14     15     16     17     18
Mean    -0.04   0.00   0.04   0.03   0.07   0.07   0.00   0.04   1.50
Std      0.27   0.04   0.04   0.05   0.03   0.03   0.14   0.01  11.84
Median   0.01  -0.01   0.02   0.03   0.07   0.06   0.00   0.04   0.79
Min     -1.29  -0.10  -0.03  -0.09   0.01   0.00  -0.28   0.01 -52.24
Max      0.53   0.15   0.16   0.11   0.13   0.13   0.42   0.08  48.60
Dividend yield
The main reason for the supposed predictive power of the dividend yield is the
positive relation between expected high dividend yields and high returns. This is
a result from using a discounted cash flow framework under the assumption that
the expected stock return is equal to a constant. For instance Campbell [11] has
shown that the current stock price is equal to the expected present value of future
dividends out to the infinite future. Assuming that the current dividend yields will
remain the same in the future, the positive relation follows. This relationship can
also be observed in the Gordon dividend growth model. In the absence of capital
gains, the dividend yield is also the return on the stock and measures how much
cash flow you are getting for each unit of cash invested.
Price-earnings ratio
The price-earnings ratio, price per share divided by earnings per share, measures how much an investor is willing to pay per unit of earnings. A high price-earnings ratio then suggests that investors think the firm has good growth opportunities, or that its earnings are safe and therefore more valuable [7].
Price-dividend ratio
The price-dividend ratio, price per share divided by annual dividend per share, is the reciprocal of the dividend yield. A low ratio might mean that investors require a high rate of return or that they are not expecting dividend growth in the future
5.3 Factors explaining the equity premium
Inflation
Inflation is defined as the increase in the price of some set of goods and services
in a given economy over a period of time [10]. The inflation is usually measured
through a consumer price index, which measures nominal consumer prices for a
basket of items bought by a typical consumer. The prices are weighted by the
fraction the typical consumer spends on each item. [20]
Many different theories for the role and impact of inflation in an economy have been proposed, but they all share some basic implications. High inflation makes people more interested in investing their savings in assets that are inflation protected, e.g. real estate, instead of holding fixed-income assets such as bonds. By moving away from fixed income and investing in other assets, the hope is that the returns will exceed inflation. As a result, high inflation leads to reduced purchasing power as individuals reduce their money holdings. High inflation is unpredictable and volatile. This creates uncertainty in the business community, reducing investment activity and thus economic growth. If a period of high inflation takes hold, a prolonged period of high unemployment is usually the price of bringing inflation back down to modest levels. This is the main reason for fearing high inflation. [44]
Low inflation still implies that price levels are expected to increase over time, and therefore it is beneficial to spend and borrow in the short run. A period of low inflation can also be the starting point for a higher rate of inflation.
Besides being linked to the general state of the economy, inflation also has great
impact on interest rates. If inflation rises, so will the nominal interest rates,
which in turn influence business conditions. [44]
Federal funds rate
A bank that wishes to finance a client venture but does not have the means to
do so can borrow capital from another bank at the federal funds rate. As a result,
the federal funds rate sets the threshold for how willing banks are to finance new
ventures. As the rate increases, banks become more reluctant to take out these
inter-bank loans. A low rate will on the other hand encourage banks to borrow
money and hence increase the possibilities for businesses to finance new ventures.
Therefore, this rate somewhat controls the US business climate.
Term spread
A yield curve can take on many different shapes and there are several different
theories trying to explain the shape. When talking about the shape of the yield
curve one refers to the slope of the curve. Is it flat, upward sloping, downward
sloping or humped? Upward and downward sloping curves are also referred to as
normal and inverted yield curves. A yield curve constructed from prices in the
bond market can be used to calculate different term spreads, differences in rates
for two different maturities. For this reason the term spread is related to the slope
of the yield curve. Here we have defined the short term spread as the difference in
rates between the maturities one year and three months and the long term spread
as the difference between ten years and one year maturities. Positive short and
long term spreads could imply an upward sloping yield curve, and the opposite
could imply a downward sloping curve. A positive short term spread and a
negative long term spread could imply a humped yield curve.
Yield curves almost always slope upwards, figure 5.2 a. One reason for this is the
expectation of future increases in inflation: investors require a premium for
locking in their money at an interest rate that is not inflation protected. [44]
As mentioned earlier, increases in inflation come with economic growth, which
makes an upward sloping yield curve a sign of good times. The growth itself
can also be partly explained by the lower short term rate, which makes it cheaper
for companies to borrow for expansion. Furthermore, central banks are expected
to fend off the expected rise in inflation with higher rates, decreasing the price
of long-term bonds and thus increasing their yields. A downward sloping yield
curve, figure 5.2 b, occurs when the expectation is that future inflation will be
lower than current inflation, and thus that the economy will slow down in the
future [44]. A low long term bond yield is acceptable since the inflation is low.
In fact, each of the last six recessions in the US has been preceded by an inverted
yield curve [25]. This shape can also develop as the Federal Reserve raises its
nominal federal funds rate.
A flat yield curve, figure 5.2 c, signals uncertainty in the economy and should not
persist for any longer time period. Investors should in theory not have any
incentive to hold long-dated bonds over shorter-dated bonds when there is no yield
premium. Instead they would sell off long-dated bonds, resulting in higher yields in
the long end and an upward sloping yield curve. A humped yield curve, figure 5.2
d, arises when investors expect interest rates to rise over the next several periods
and then decline. It could also signal the beginning of a recession, or just be the
result of a shortage in the supply of long or short-dated bonds. [18]
Credit spread
Yields on corporate bonds are almost always higher than on treasuries with the
same maturity. This is mainly a result of the higher default risk in corporate
bonds, even if other factors have been suggested as well. The corporate spread,
also known as the credit spread, is usually the difference between the yields on a
Baa rated corporate bond and a government bond with the same time to maturity.
Research [47] has shown that only around 20-50 percent of the credit spread can
be accounted for by default risk alone, when the credit spread is calculated with
government bonds as the reference instrument. If one instead uses Aaa rated
corporate bonds as the reference, this share should increase. Above all, the
main reason for using the credit spread as an explaining/forecasting variable is
that it seems to widen in recessions and shrink in expansions over the business
cycle [47]. It can also change as other bad news hit the market.
Our corporate bond series have bonds with a maturity as close as possible to 30
years, and are averages of daily data.
Producer price
The producer price measures the average change over time in selling prices received
by domestic producers of goods and services. It is measured from the perspective
of the seller, in contrast to the consumer price index, which measures from the
purchaser's perspective. The two may differ due to government subsidies, sales
and excise taxes, and distribution costs. [63]
Consumer sentiment
The consumer sentiment index is based on household interviews and gives an
indication of the future business climate, personal finance and spending in the US,
and therefore has implications for the stock, bond and cash markets. [62]
Volatility
Volatility is the standard deviation of the change in value of a financial instrument.
The volatility is here calculated on monthly observations for each year. The basic
idea behind volatility as an explaining variable is that volatility is synonymous
with risk: high volatility should imply a higher demand for risk compensation,
i.e. a higher equity premium.
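The yearly volatility described above, the standard deviation of the monthly observations within each year, can be sketched as follows. The thesis itself works in Matlab; this Python sketch with an illustrative function name and made-up sample data only shows the calculation:

```python
import numpy as np

def yearly_volatility(monthly_changes):
    """Volatility per year: sample standard deviation of the monthly
    observations within that year (ddof=1 for the sample estimate)."""
    return {year: float(np.std(obs, ddof=1)) for year, obs in monthly_changes.items()}

# Illustrative data: a calm year and a turbulent year
data = {
    2006: np.array([0.01, -0.02, 0.03, 0.00, 0.01, -0.01,
                    0.02, 0.00, -0.03, 0.01, 0.02, -0.01]),
    2007: np.array([0.04, -0.05, 0.06, -0.04, 0.05, -0.06,
                    0.03, -0.02, 0.07, -0.05, 0.04, -0.03]),
}
vol = yearly_volatility(data)
# vol[2007] exceeds vol[2006]: higher volatility, higher demanded premium
```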
Earnings-book ratio
The earnings-book ratio relates the earnings per share to the book value per share
and measures a firm's efficiency at generating profits. The ratio is also called
ROE, return on equity. It is likely that a high ROE yields a high equity premium,
because general business conditions have to be good in order to generate a good
ROE.
Step   Outliers (total)
1      19
2      18
3      18
4      14
5      16

Table 5.3. Outliers identified by the leverage measure for univariate predictions
5.4 Testing the assumptions of linear regression

The assumptions that must hold for a linear regression model were presented in
chapter 3.2 and the means for testing these assumptions were given in chapter 3.4.
After having removed outliers, it is warranted to check for violations of the
classical regression assumptions.
The QQ-plots for all factors are presented in figures 5.3 and 5.4. By visual
inspection of each subplot, it is seen that for some factors the points fall close
to the diagonal line, so the error distribution is likely to be Gaussian. Other
factors show signs of kurtosis through an S-shaped form. A Jarque-Bera test at the
significance level 0.05 has been performed to settle the uncertain cases of
departure from the normal distribution. From the results in table 5.4 it is found
that we cannot reject the null hypothesis that the residuals are Gaussian at
significance level 0.05. The critical value represents the upper limit for the
test statistic under the null hypothesis, and the P-value represents the
probability of observing an outcome at least as extreme given that the null
hypothesis is true; put another way, if the P-value is above the significance
level we cannot reject the null hypothesis.
Factor       1     2     3     4     5     6     7     8     9
JB-Value    2.39  1.79  1.35  2.24  1.69  1.27  0.96  1.14  2.00
Crit-Value  4.84  4.88  4.95  4.92  4.95  4.89  4.95  4.93  4.93
P-Value     0.16  0.26  0.39  0.18  0.29  0.41  0.53  0.46  0.22
H0 or H1     H0    H0    H0    H0    H0    H0    H0    H0    H0

Factor      10    11    12    13    14    15    16    17    18
JB-Value    1.62  2.14  0.85  1.77  0.96  0.82  1.72  2.18  1.62
Crit-Value  4.94  4.98  4.93  4.92  4.91  4.90  4.91  4.88  4.94
P-Value     0.30  0.20  0.58  0.26  0.53  0.59  0.28  0.19  0.30
H0 or H1     H0    H0    H0    H0    H0    H0    H0    H0    H0

Table 5.4. Jarque-Bera test of normality at α = 0.05 for univariate residuals for lagged
factors
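The thesis runs the Jarque-Bera test through Matlab's jbtest call. An equivalent check can be sketched with SciPy's jarque_bera; the sample data below is illustrative, not the thesis residuals:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
gaussian = rng.normal(size=500)                 # residuals compatible with H0
heavy_tailed = rng.standard_t(df=3, size=500)   # excess kurtosis should reject H0

jb_g, p_g = stats.jarque_bera(gaussian)
jb_h, p_h = stats.jarque_bera(heavy_tailed)
# Decision rule as in table 5.4: keep H0 (Gaussian residuals)
# whenever the p-value exceeds the significance level 0.05
keep_h0_gaussian = p_g > 0.05
keep_h0_heavy = p_h > 0.05
```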
Factor       1     2     3     4     5     6     7     8     9
DW-Value    1.83  2.10  2.02  1.88  2.10  2.19  2.09  2.09  2.16
P-Value     0.46  0.85  0.97  0.58  0.83  0.67  0.89  0.89  0.64

Factor      10    11    12    13    14    15    16    17    18
DW-Value    2.08  1.97  2.23  1.92  2.23  2.08  2.11  2.02  2.05
P-Value     0.92  0.82  0.57  0.67  0.56  0.95  0.81  0.91  0.98

Table 5.5. Durbin-Watson test of autocorrelation for univariate residuals for lagged
factors
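The Durbin-Watson statistic behind table 5.5 can be computed directly from its definition, DW = Σ(e_t − e_{t−1})² / Σ e_t², with values near 2 indicating no first-order autocorrelation. Matlab's dwtest also reports a p-value, which this illustrative sketch omits:

```python
import numpy as np

def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2); near 2 means no
    first-order autocorrelation, near 0 strong positive autocorrelation."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

rng = np.random.default_rng(1)
white = rng.normal(size=1000)               # uncorrelated residuals
trended = np.cumsum(rng.normal(size=1000))  # random walk: heavy autocorrelation

dw_white = durbin_watson(white)      # close to 2
dw_trended = durbin_watson(trended)  # close to 0
```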
Figure 5.3. QQ-plot of the one step lagged residuals for factors 1-9 versus the
standard normal pdf. (Panels: (a) Dividend yield, (b) Price-earnings ratio,
(c) Book value per share, ..., (g) Fed funds rate, (h) Short term interest rate,
(i) Term spread short.)
Figure 5.4. QQ-plot of the one step lagged residuals for factors 10-18 versus the
standard normal pdf. (Panels: (a) Term spread long, (b) Credit spread,
(c) Producer price, (d) Industrial production, (e) Personal income, (f) Gross
domestic product, ...)
Figure 5.5. One step lagged factors 1-9 versus returns on the equity premium,
outliers marked with a circle. (Panels: (a) Dividend yield, (b) Price-earnings
ratio, (c) Book value per share, ..., (g) Fed funds rate, (h) Short term interest
rate, (i) Term spread short.)
Figure 5.6. One step lagged factors 10-18 versus returns on the equity premium,
outliers marked with a circle. (Panels: (a) Term spread long, (b) Credit spread,
(c) Producer price, (d) Industrial production, (e) Personal income, (f) Gross
domestic product, ...)
5.5 Forecasting by linear regression
The second possibility is to estimate the regression equation using lagged
independent variables. If one wants to take one step ahead in time, one lags the
independent variables one step. This is illustrated in table 5.6, where τ is the
number of time lag steps. By inserting the most recent, unused, observations of
the independent variables into the regression equation, one gets a one step
forecasted value for the dependent variable. In fact, one could insert any of the
unused observations of the independent variables, since it is already assumed
that the regression equation holds over time. However, it is common practice to
use the most recent values, since they probably contain more information about
the future. It is this approach that has been used in this thesis. Plots for the
univariate one step lagged regressions are found in figure 5.5 and figure 5.6.
Y          X_i
y_t      ↔ x_{i,t−τ}
y_{t−1}  ↔ x_{i,t−τ−1}
y_{t−2}  ↔ x_{i,t−τ−2}
  ⋮           ⋮
y_{t−N}  ↔ x_{i,t−τ−N}

Table 5.6. Pairing of the dependent variable Y with the predictor X_i lagged τ steps
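The pairing in table 5.6 amounts to aligning y_t with x_{i,t−τ} before fitting the regression. A minimal Python sketch follows; the thesis uses Matlab, and the synthetic series and function names here are illustrative only:

```python
import numpy as np

def lagged_pairs(y, x, tau):
    """Align the dependent variable y_t with the predictor x_{t-tau},
    following the table 5.6 layout."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if tau == 0:
        return y, x
    return y[tau:], x[:-tau]

def ols_fit(y, x):
    """Univariate OLS y = a + b*x, returning (a, b)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(coef[0]), float(coef[1])

# Synthetic series where y reacts to x with a one-step delay
rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = np.empty(200)
y[0] = 0.0
y[1:] = 2.0 * x[:-1] + 0.1 * rng.normal(size=199)

y_lag, x_lag = lagged_pairs(y, x, tau=1)
a, b = ols_fit(y_lag, x_lag)
# The lagged regression recovers the true delayed coefficient (b close to 2)
```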
When a time series is regressed on lagged versions of other time series,
information is generally lost, resulting in smaller values of R², see table 5.7.
This need not always be the case; sometimes lagged predictors provide a better R².
This can be explained by the fact, observable in table 5.7, that it takes time for
these predictors to have an impact on the dependent variable. For instance, a
higher in-sample R² would have been obtained for factor 15, GDP, if its time
series had been lagged one step. The realized change in GDP does a better job of
forecasting than of explaining that year's equity premium.
Table 5.7. Lagged R2 for univariate regression with the equity premium as dependent
variable
Chapter 6
Implementation
In this chapter it is explained how the theory from the previous chapters is
implemented, and techniques and solutions are highlighted. All code is presented
in appendix B.
6.1 Overview
The theory covered in the previous chapters is implemented using Matlab. To
make the program easy to use, a user interface in Excel is constructed. Figure 6.1
describes the communication between Excel, VBA and Matlab.
The Jarque-Bera test, Durbin-Watson test and the QQ-plots are generated using
the following Matlab calls: jbtest, dwtest and qqplot.
6.3 Bayesian model averaging

Surveys on the equity premium have shown that the large majority of professionals
believe that the premium is confined to 2-13% [65]. Therefore, models yielding a
negative value of the premium, or a value exceeding the historical mean of the
premium by 1.28σ, which corresponds to a 90% confidence interval, are not used
in the Bayesian model averaging and therefore do not influence the final premium
estimate at all. Setting the upper bound to 1.28σ rules out premia larger than
around 30%.
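The screening rule above can be sketched as a simple filter. The function name and the example forecasts below are illustrative, and the historical mean and standard deviation would in practice be estimated from the data set:

```python
import numpy as np

def admissible_models(forecasts, hist_mean, hist_std, z=1.28):
    """Keep only model forecasts in (0, hist_mean + z * hist_std):
    negative premia and premia more than 1.28 sigma above the
    historical mean are excluded from the model averaging."""
    forecasts = np.asarray(forecasts, dtype=float)
    mask = (forecasts > 0.0) & (forecasts <= hist_mean + z * hist_std)
    return forecasts[mask], mask

# Illustrative forecasts from five competing linear models
preds = [0.04, -0.02, 0.08, 0.35, 0.06]
kept, mask = admissible_models(preds, hist_mean=0.06, hist_std=0.16)
# -0.02 (negative) and 0.35 (above 0.06 + 1.28 * 0.16 = 0.2648) are dropped
```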
6.4 Backtesting
Since the HEP sometimes is negative while we do not allow for negative values
of the premium, traditional backtesting would not be a fair benchmark for the
performance of our prediction model. Instead we evaluate how well excess returns
are estimated by allowing for negative values. To further investigate the
predictive ability of our forecasts, an out-of-sample R² statistic is employed.
The statistic is defined as

R²_os = 1 − Σ_{t=1}^{n} (r_t − r̂_t)² / Σ_{t=1}^{n} (r_t − r̄_t)²,    (6.1)

where r̂_t is the fitted value from the predictive regression estimated through
t − 1 and r̄_t is the historical average return, also measured through t − 1. If
the statistic is positive, the predictive regression has a lower mean squared
error than the historical average. Therefore, the statistic can be used to
determine if a model has better predictive performance than applying the
historical average.
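Equation (6.1) can be sketched in Python with the expanding historical average as the benchmark. The numbers below are illustrative, not the thesis data:

```python
import numpy as np

def r2_out_of_sample(realized, fitted):
    """Out-of-sample R^2 as in (6.1): model forecast errors against
    the expanding historical average measured through t - 1."""
    realized = np.asarray(realized, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    hist_avg = np.array([realized[:t].mean() for t in range(1, len(realized))])
    r, f = realized[1:], fitted[1:]
    return float(1.0 - np.sum((r - f) ** 2) / np.sum((r - hist_avg) ** 2))

realized = np.array([0.05, 0.02, 0.08, -0.01, 0.06, 0.04])
fitted = np.array([0.04, 0.03, 0.07, 0.00, 0.05, 0.04])
r2 = r2_out_of_sample(realized, fitted)
# Positive: the model beats the historical-average benchmark here
```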
A measure called hit ratio (HR) can be used as an indication of how good the
forecast is at predicting the sign of the realized premium. It is simply the
number of times the forecast has the right sign divided by the length of the
investigated time period. For an investor this is of interest since the hit ratio
can be used as a buy-sell signal on the underlying asset. In the case of the
equity premium, this is a biased measure since the long-term average of the HEP
is positive.
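The hit ratio itself is a one-line calculation; this sketch uses illustrative numbers:

```python
import numpy as np

def hit_ratio(realized, forecast):
    """Fraction of periods in which the forecast has the same sign
    as the realized premium."""
    realized = np.asarray(realized, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.sign(realized) == np.sign(forecast)))

hr = hit_ratio([0.05, -0.02, 0.08, -0.01, 0.06],
               [0.03,  0.01, 0.05, -0.02, 0.04])
# Four of five signs agree, so hr = 0.8
```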
An interesting question is whether next year's predicted value will be realized
within the coming business cycle, here approximated as five years and called the
forward average. This value is calculated as a benchmark along with a five-year
rolling average, here called the backward average. The results from the backtest
are presented in the results chapter.
Chapter 7
Results
In this chapter we present our forecasts of the equity premium along with the
results from the backtest.
7.1 Univariate forecasting

The figures for the forecasted premia are displayed in table 7.1. Models not
belonging to the set specified in chapter 6 are not taken into consideration. In
table 7.1 the labels Prediction Range and Mean refer to the range of the predicted
values and to the mean of these predicted values. Note that the Mean corresponds
to the prior beliefs. ξk is the estimate of the premium using Bayesian model
averaging. The variance and a confidence interval for this estimate are also
presented.
In table 7.2 the factors constituting the univariate model with the highest
probability over time are presented. The factors are further explained in
chapter 5. Note that the prior assumption about model probabilities is
1/18 ≈ 5.5 percent for each model.
Table 7.2. The univariate model with highest probability over time
Figure 7.2 shows how the likelihood function changes for different g-values for
each one step lagged predictor. Table 7.3 shows results from the backtest. The
R²_os statistic shows that the univariate prediction model has better predictive
performance than applying the historical average for the period 1991 to 1999.
The hit ratio statistic, HR, shows how often the univariate predictions have the
right sign, that is, whether the premium is positive or negative. Note that we
allow for negative premium values when applying the HR statistic.
Pred. step     1     2     3     4     5
R²_{os,uni}   0.21  0.26  0.23  0.05  0.14
HR_{uni}      0.6   0.2   0     0.6   0.2

Table 7.3. Out-of-sample R²_{os,uni} and hit ratios HR_{uni}
Table 7.5. The multivariate model with highest probability over time
In table 7.5 the factors constituting the multivariate models with the highest
probabilities over time are presented. The factors are discussed in chapter 5.
Note that the prior assumption about the model probabilities is
1/2^18 ≈ 0.00038 percent for each model.
Table 7.6 depicts how the predicted values are influenced by the three choices of
g. In the univariate case, the three choices coincide. Table 7.7 shows results
from the backtest. The R²_os statistic shows that the multivariate prediction
model also has better predictive performance than applying the historical average
for the period 1991 to 1999. The hit ratio statistic, HR, shows how often the
multivariate predictions have the right sign, that is, whether the premium is
positive or negative. Once again, we allow for negative premium values when
applying the HR statistic.
Pred. step     1      2     3     4     5
R²_{os,mv}    0.23  -0.10  0.20  0.47  0.60
HR_{mv}       0.6    0.4   0.6   0.8   0.6

Table 7.7. Out-of-sample R²_{os,mv} and hit ratios HR_{mv}
Chapter 8
Discussion of the Forecasting

In chapter 6.3 we specified the value of g to be used in this thesis as the
reciprocal of the number of samples. For the sake of completeness, we have
presented the outcome of the two other values of g in table 7.6. Apparently, the
chosen value of g has the most impact on the 1-year horizon forecast and a
decreasing impact on the other horizons. This can be explained by the rapidly
decreasing forecasting performance of the covariance matrix for time lags above
one, which in turn is supported by table 5.7 showing decreasing R²-values over
time. In figure 7.2 the typical appearance of the likelihood function for the
factors and different g-values can be seen. As explained earlier, increasing the
value of g gives models with good adaptation to data a higher likelihood, while
setting g to zero yields the same likelihood for all models. For large g-values,
only models with a high degree of explanation will have impact in the BMA,
reflecting great confidence in the data. On the other hand, a decrease of g
allows for more uncertainty to be taken into account.
Turning to the model criteria formulated in chapter 2.7, it is found that most of
the criteria are fulfilled. The equity premium over the five-year horizon is
positive, due to our added constraints; however, the confidence interval for the
premium includes zero at times.
The time variation criterion is not fulfilled in the sense that the regression
line does not change considerably as new data points become available. The amount
of data used is a tradeoff between stability and incorporating the latest trend.
The conflict lies in the confidence in the predictors. Using many data samples
improves the precision of the predictors, but the greater the difference between
the time to be predicted and that of the oldest samples, the more doubtful are
the implications of the old samples.
The smoothness of the estimates over time is questionable: our five-year
predictions in the univariate case are rather smooth, whereas the multivariate
forecasts exhibit greater fluctuations. Given the appearance of the realized
equity premium until December 2007, which is strongly volatile, and the fact that
a multivariate model can explain more variance, it is reasonable that a
multivariate model would generate results more similar to the input data, just as
can be observed in the multivariate case, figure 7.3.
The time structure of the equity premium is not taken into consideration because
the one-year yield, serving as the risk-free asset, does not alone account for
the term structure.
Since all predictions suffer from error, it is important to be aware of the
quality of the predictions. Our precision estimate takes the misfit of the models
into account and therefore says something about the uncertainty in our
predictions. However, this precision does not say anything about the relevance of
using old data to forecast future values.
From the R²-values in table 5.7 it can be seen that there is some predictive
ability at hand, even though it is small. Further evidence of predictability is
the deviation of the posterior probabilities from the prior probabilities: if
there were no predictability at hand, why would the prior probability differ from
the posterior probability? The mean in table 7.1 and table 7.4 corresponds to
using the prior belief that all models have the same probability; the BMA
estimate is never equal to this mean.
The univariate predictors with the highest probability in each time step, table
7.2, also enter the models with the highest probability in table 7.5, except for
GDP, which is not a member of the multivariate model for the first time step.
This can be summarized as the factors GDP, short term spread and volatility being
important in the forecast for the next five years.
Having seen evidence of predictive ability, the question is now to what extent
it can be used to produce accurate forecasts.
Backtesting our approach is not trivial, mainly because we cannot observe the
historical expected premium. Nevertheless, backtesting has been performed by
doing a full five-year horizon forecast starting in each year from 1991 to 1995
and then comparing the point forecasts with the realized historical equity
premium for each year. Here, no restrictions are imposed on the forecasts, i.e.
negative excess returns are allowed. The results are presented in figure 7.4 and
figure 7.5, where each plot corresponds to a time step (1, 2, 3, 4 or 5 years).
These plots have also been complemented with the realized excess returns, as well
as the five-year backward and the five-year forward average. In figure 7.4 f and
figure 7.5 f, the arithmetic average of the full five-year horizon forecast is
compared to the five-year forward average.
The univariate backtest shows that the forecast intervals capture at most 2 out
of 5 HEPs, at the one and two-year horizons. Otherwise, the forecasts tend to be
far too low in comparison with the HEP. The largest number of times the HEP
intersects the forecasted intervals is 2, at the two-year horizon, figure 7.4 b.
In general, the univariate forecasts do not seem to be flexible enough to fit the
sometimes vast changes in the HEP and are far too low. The backtest has not
provided us with any evidence of forecasting ability. However, when the forecast
constraint is imposed, the predictive ability from 1991-1995 is superior to using
the historical average. This can be seen from the R²-statistics in table 7.3.
The four and five-year horizon forecasts, figure 7.4 d and e, capture 2 out of 5
forward averages, whereas the one-year horizon captures 3 backward averages. In
figure 7.4 f it can be seen that averaging the forecasts does not give a better
estimate of the forward average. From table 7.3 it can be seen that the hit
ratios for the one and four-year horizons stand out, both scoring 60%. The
results from the univariate backtest have shown that the best forecasts were
received for the one and four-year horizons, though neither has good forecast
quality.
The multivariate backtest shows little sign of forecasting ability for our model.
The number of times the HEP intersects the forecasted interval is at most 3 out
of 5 times. This happens at the three and four-year horizons, figure 7.5 c and d;
these are also the forecasts following the evolvement of the HEP most closely.
The four-year forecast depicts the changes of the HEP best, being right about the
direction 3 out of 4 times, although never getting the actual figures correct.
The two and four-year forecasts capture the forward average best: 2 out of 5
forecasted intervals are realized on average over the next 5 years. From figure
7.5 f, the only conclusion that can be drawn is that averaging our forecast for
each time step does not provide a better estimate of the forward average. The
R²-values in table 7.7 show signs of forecasting ability in comparison with the
historical average at all time steps except the two-year horizon, with the four
and five-year horizon forecasts standing out. The most significant hit ratio is
80%, at the four-year horizon. In conclusion, the backtesting in the multivariate
case has shown that for the test period the best results in all terms have been
received for the four and five-year horizons, in particular the four-year
horizon.
Summing up the results from the univariate and multivariate backtests, it cannot
be said that the quality of the multivariate forecasts outperforms that of the
univariate estimates when looking at the R²-values and hit ratios. However, the
multivariate forecasts as such depict the evolvement of the true excess returns
in a better way. Contrary to what one could believe, the one-year horizon
forecasts are not better than those at the other horizons. In fact, the best
estimates are provided by the 4-year forecasts, both in the univariate and the
multivariate case. Still, we recommend using the one-year horizon forecasts
because they have the smallest time lag and therefore use more recent data.
Furthermore, the result that the forecast power of multi-factor models is better
than that of a forecast based on the historical average is in line with Campbell
and Thompson's findings [16].
Part II
Chapter 9
Portfolio Optimization
In modern portfolio theory it is assumed that expected returns and covariances
are known with certainty. Naturally, this is not the case in practice: the inputs
have to be estimated, and with this follow estimation errors. Errors in the
estimates have great impact on the optimal allocation weights in a portfolio, and
therefore it is of great interest to have as accurate forecasts of the input
parameters as possible, which was dealt with in part I of this thesis. Even with
good estimates of the input parameters, estimation errors will still be present;
they are just smaller. In this chapter we discuss and present the impact of
estimation errors in portfolio optimization.
The model of Markowitz assumes that investors are only concerned with the mean,
the variance and the correlation of the portfolio assets. A portfolio is said to
be "efficient" if there is no other portfolio with the same expected return but
lower risk, or no other portfolio with the same risk but a higher expected
return. [54] An investor who seeks to minimize risk (standard deviation) always
chooses the portfolio with the smallest standard deviation for a given mean, i.e.
he is risk averse. An investor who, for a given standard deviation, wants to
maximize the expected return is said to have the property of nonsatiation. An
investor who is risk averse and nonsatiated at the same time will always choose a
portfolio on the efficient frontier, which is made up of the set of efficient
portfolios. [52] The portfolio on the efficient frontier with the lowest standard
deviation is called the minimum variance portfolio (MVP).
Given the number of assets n in the portfolio, the statistical properties of the
Markowitz problem can be described by the average return µ ∈ R^{n×1}, the
covariance matrix C ∈ R^{n×n} and the asset weights w ∈ R^{n×1}. The mathematical
formulation of the Markowitz problem is now given as

min_w   wᵀCw
s.t.    µᵀw = µ̄
        1ᵀw = 1,    (9.1)

where 1 is a column vector of ones. The first constraint says that the weights
and their corresponding returns have to equal the desired return level. The
second constraint means that the weights have to sum to one. Note that in this
formulation the signs of the weights are not restricted; short selling is
allowed. Following Zagst [66], the solution to problem (9.1) is given in
theorem 9.1.
Following Zagst [66] the solution to problem (9.1) is given in theorem 9.1.
d = bc − a² > 0.    (9.9)

(9.11) ⇔ Cw* = u₁µ + u₂1 ⇔ w* = u₁C⁻¹µ + u₂C⁻¹1,    (9.14)

where d is greater than zero, see (9.9). Using (9.17) and (9.18) yields

u = A⁻¹ (1, µ̄)ᵀ = (1/d) (cµ̄ − a, b − aµ̄)ᵀ.    (9.19)

By inserting (9.19) into (9.14), equation (9.2), the optimal weights, is found:

w* = u₁C⁻¹µ + u₂C⁻¹1 = (1/d) ((cµ̄ − a)C⁻¹µ + (b − aµ̄)C⁻¹1).    (9.20)

The variance along the frontier is minimized where

∂σ²(µ̄)/∂µ̄ = (1/d)(2cµ̄ − 2a) = 0  ⇒  µ_MVP = a/c,    (9.23)

since the second partial derivative is positive by (9.8) and (9.9):

∂²σ²(µ̄)/∂µ̄² = 2c/d > 0.    (9.24)
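The closed-form weights (9.20) can be evaluated directly. This sketch assumes the usual definitions a = 1ᵀC⁻¹µ, b = µᵀC⁻¹µ, c = 1ᵀC⁻¹1 for the scalars in (9.3)-(9.8), which fall outside this excerpt, and the input data below is illustrative:

```python
import numpy as np

def markowitz_weights(mu, C, mu_bar):
    """Closed-form solution (9.20) of problem (9.1), shorting allowed.
    Assumes a = 1'C^-1 mu, b = mu'C^-1 mu, c = 1'C^-1 1, d = bc - a^2."""
    C_inv = np.linalg.inv(C)
    ones = np.ones(len(mu))
    a = ones @ C_inv @ mu
    b = mu @ C_inv @ mu
    c = ones @ C_inv @ ones
    d = b * c - a ** 2
    return ((c * mu_bar - a) * (C_inv @ mu) + (b - a * mu_bar) * (C_inv @ ones)) / d

mu = np.array([0.08, 0.05, 0.03])
C = np.array([[0.04, 0.01, 0.00],
              [0.01, 0.02, 0.00],
              [0.00, 0.00, 0.01]])
w = markowitz_weights(mu, C, mu_bar=0.06)
# Both constraints of (9.1) hold: sum(w) = 1 and mu'w = 0.06
```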
If shorting were not allowed, a constraint for non-negative portfolio weights
would have to be added to problem (9.1). The problem formulation would then be

min_w   wᵀCw
s.t.    µᵀw = µ̄
        1ᵀw = 1
        w ≥ 0.    (9.27)

This optimization problem is quadratic just as problem (9.1), but in contrast it
cannot be reduced to a set of linear equations, due to the added inequality
constraint. Instead, an iterative optimization method has to be used for finding
the optimal weights. The problem is solved by calling quadprog in Matlab, which
solves quadratic optimization problems using active set methods.
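The thesis solves (9.27) with Matlab's quadprog; as a stand-in, the same long-only problem can be sketched with SciPy's SLSQP solver (input data illustrative):

```python
import numpy as np
from scipy.optimize import minimize

def markowitz_long_only(mu, C, mu_bar):
    """Numerical solution of (9.27): minimize w'Cw subject to
    mu'w = mu_bar, sum(w) = 1 and w >= 0."""
    n = len(mu)
    constraints = [
        {"type": "eq", "fun": lambda w: mu @ w - mu_bar},
        {"type": "eq", "fun": lambda w: np.sum(w) - 1.0},
    ]
    result = minimize(lambda w: w @ C @ w, x0=np.full(n, 1.0 / n),
                      method="SLSQP", bounds=[(0.0, None)] * n,
                      constraints=constraints)
    return result.x

mu = np.array([0.08, 0.05, 0.03])
C = np.array([[0.04, 0.01, 0.00],
              [0.01, 0.02, 0.00],
              [0.00, 0.00, 0.01]])
w = markowitz_long_only(mu, C, mu_bar=0.05)
# Weights are non-negative and meet both equality constraints
```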
Solving problem (9.27) for a given data set, where the means and covariances have
been estimated on historical data, generates portfolios that exhibit very
different allocation weights, and some assets tend never to enter the solution.
This is a natural result of solving the optimization problem: the assets with
very attractive features dominate the solution. It is also here the estimation
errors are likely to be large, which means that the impact of estimation errors
on portfolio weights is maximized. [61] This is an undesired property of
portfolio optimization that has been known for a long time [56]. Since the input
parameters are treated as if they were known with certainty, even very small
changes in them will trace out a new efficient frontier. The problem gets even
worse as the number of assets increases, because this increases the probability
of outliers. [61]
The portfolio resampling procedure can be summarized in the following steps:

1. Estimate the mean vector, µ̂, and covariance matrix, Σ̂, from historical data.
2. Draw T random samples from the multivariate distribution N (µ̂, Σ̂) to
estimate µ̂i and Σ̂i .
3. Calculate an efficient frontier from the input parameters from step 2 over the
interval [σM V P,i , σM AX ] which is partitioned into M equally spaced points.
Record the weights w1,i , . . . , wM,i .
4. Repeat step 2 and 3 a total of I times.
5. Calculate the resampled portfolio weights as w̄_M = (1/I) Σ_{i=1}^{I} w_{M,i}
   and evaluate the resampled frontier with the mean vector and covariance matrix
   from step 1.
The number of draws T corresponds to the uncertainty in the inputs. As the number
of draws increases, the dispersion decreases and the estimation error, the
difference between the original estimated input parameters and the sampled input
parameters, becomes smaller. [61] Typically, the value of T is set to the length
of the historical data set [61] and the value of I is set between 100 and 500
[26]. The number of portfolios M can be chosen freely according to how finely
the efficient frontier should be depicted.
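Steps 1-5 can be sketched for a single frontier point, here the minimum variance portfolio, whose weights with shorting allowed are w = C⁻¹1 / (1ᵀC⁻¹1); the input estimates below are illustrative:

```python
import numpy as np

def resampled_mvp_weights(mu_hat, sigma_hat, T=60, I=100, seed=0):
    """Resampling steps for one frontier point, the MVP with shorting
    allowed: redraw T returns (step 2), re-estimate the covariance,
    re-optimize (step 3), repeat I times (step 4) and average (step 5)."""
    rng = np.random.default_rng(seed)
    n = len(mu_hat)
    ones = np.ones(n)
    acc = np.zeros(n)
    for _ in range(I):
        sample = rng.multivariate_normal(mu_hat, sigma_hat, size=T)
        C_i = np.cov(sample, rowvar=False)
        w_i = np.linalg.solve(C_i, ones)   # MVP: w = C^-1 1 / (1'C^-1 1)
        acc += w_i / (ones @ w_i)
    return acc / I

mu_hat = np.array([0.08, 0.05, 0.03])
sigma_hat = np.array([[0.04, 0.01, 0.00],
                      [0.01, 0.02, 0.00],
                      [0.00, 0.00, 0.01]])
w_bar = resampled_mvp_weights(mu_hat, sigma_hat)
# The averaged resampled weights still sum to one
```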
The new resampled frontier will appear below the original one. This follows from
the weights w_{1,i}, ..., w_{M,i} being optimal relative to µ̂_i and Σ̂_i but
inefficient relative to the original estimates µ̂ and Σ̂. Therefore, the
resampled portfolio weights are also inefficient relative to µ̂ and Σ̂. Through
the sampling and re-estimation that occur at each step of the portfolio
resampling process, the effect of estimation error is incorporated in the
determination of the resampled portfolio weights. [26]
With the input parameters from table 9.1, a portfolio resampling has been carried
out, with and without shorting allowed, and always with errors in both the means
and the covariances. In figure 9.1 the resampled efficient frontiers are
depicted. In figures 9.2 and 9.3 the portfolio allocations are found. Finally,
the impact of errors in the means and in the covariances respectively is
displayed in figure 9.4.
9.5 Discussion of portfolio resampling
As discussed earlier, the resampled frontier will plot below the efficient
frontier, just as in figure 9.1 b. However, when shorting is allowed the
resampled frontier coincides with the efficient frontier. Why is that? Estimation
errors should result in an increase in portfolio risk, showing up as an increase
in volatility for each return level. Instead it can only be seen that the
estimation errors result in a shortening of the frontier. The explanation given
by Scherer [61] is that highly positive returns will be offset by highly negative
returns when drawing from the original distribution. The quadratic programming
optimizer will invest heavily in the asset with highly positive returns and short
the asset with highly negative returns, and this will cancel out on average. When
the long-only constraint is added, this is no longer the case and the resampled
frontier plots below the efficient frontier, figure 9.1 b.
In the resampling, estimation errors have been assumed in both the means and the covariances. In figure 9.4 the effect of estimation errors in only the means or only the covariances can be observed. It is found that estimation errors in the means have a much greater impact than estimation errors in the covariances. A good forecast of the mean will therefore improve the resulting allocations a great deal.
The averaging in the portfolio resampling method ensures that the weights still sum to one, which is important. But averaging can sometimes prove misleading. For instance, there is always the risk that the allocation weights for a given portfolio are heavily influenced by a few lucky draws, making an asset look more attractive than is justifiable. Averaging is indeed the main idea behind portfolio resampling, but it is not desirable that the final averaged portfolio weights depend on a few extreme outcomes. This criticism is discussed by Scherer [61]. However, the most important criticism, also presented by Scherer [61], is that all resamplings are derived from the same mean vector and covariance matrix. Because the true distribution is unknown, all resampled portfolios deviate from the true parameters in pretty much the same way, and averaging will not help much in this case. Therefore it is fair to say that all portfolios inherit the same estimation error.
Reported results where resampled portfolios beat Markowitz portfolios out-of-sample can therefore not be ascribed solely to the portfolio resampling method itself being outstanding. Although the resampling heuristic has some major drawbacks, it remains interesting since it is a first step towards addressing estimation errors in portfolio optimization.
Chapter 10
Backtesting Portfolio Performance
In the first part of this thesis we developed a method for forecasting the equity premium that took model uncertainty into account. It was found that our forecast outperformed the use of the historical average but was associated with estimation errors. In the previous chapter we presented portfolio resampling as a method for dealing with these errors. In this chapter we evaluate whether portfolio resampling can be used to improve our forecasting results.
Starting at the end of 1998 and going to the end of 2007, we solve problem (9.27) and rebalance the portfolio at the end of each year. We do not allow for short-selling, since it was previously found that portfolio resampling only has an effect under the long-only constraint. Transaction costs are not taken into account, since our concern is the relative performance of the methods. The return vector µ is forecasted using the arithmetic average of the returns up to time t for each asset i, except for equity US, where we make use of our one-year multivariate forecasted equity premium for time t. The parameter µ̄ is set so that each portfolio has a volatility of $\sqrt{0.02} \approx 14\%$ when rebalanced. The covariance matrix is always estimated on all returns available up to time t. The resulting portfolio value over time is found in figure 10.1 and the corresponding returns in table 10.1. In table 10.2 the exact portfolio values on the end date for ten resampling simulations are presented.
It is found that using our premium forecasts as input yields better performance than just employing the historical average1. Our forecast consistently generates the highest portfolio value. As explained earlier, using accurate inputs in portfolio optimization is very important.
Table 10.1. Portfolio returns in percent over time. PR is the acronym for portfolio
resampling.
1 For the asset equity US, the historical arithmetic average is referred to as aHEP.
10.1 Backtesting setup and results
Table 10.2. Terminal portfolio value. PR is the acronym for portfolio resampling.
In this backtest we find evidence that our multivariate forecast performs better
than the arithmetic average when used as input in a mean-variance asset allocation
problem. Portfolio resampling is also found to provide a good way of arriving at
meaningful asset allocations when the input parameters are very noisy.
Chapter 11
Conclusions
It is found that the forecasting ability of multifactor models is not substantially improved by our approach. Our interpretation thereof is that the largest problem with multifactor models is not model uncertainty, but rather too low predictive ability.
Further, our investigation brings evidence that the GDP, the short term spread and the volatility are useful in forecasting the expected equity premium for the five years to come. Our investigations also show that multivariate models are to some extent better than univariate models, but it cannot be said that either is accurate in predicting the expected equity premium. Nevertheless, it is likely that both provide better forecasts than using the arithmetic average of the historical equity premium.
We have also found that portfolio resampling provides a good way to arrive at
meaningful allocation decisions when the optimization inputs are very noisy.
Our proposal for further work is to investigate whether a Bayesian analysis not involving linear regression, with carefully selected priors calibrated to reflect meaningful economic information, provides better predictions of the expected equity premium than the approach used in this thesis.
Bibliography
[1] Ang A. & Bekaert G., (2003), Stock return predictability: is it there?, Working Paper, Columbia University.
[2] Avramov D., (2002), Stock return predictability and model uncertainty, Jour-
nal of Financial Economics, vol. 64, pp. 423-458.
[3] Baker M. & Wurgler J., (2000), The Equity Share in New Issues and Aggregate
Stock Returns, Journal of Finance, American Finance Association, vol. 55(5),
pp. 2219-2257.
[4] Benning J. F., (2007), Trading Strategies for Capital Markets, McGraw-Hill,
New York.
[5] Bernardo J. M. & Smith A., (1994), Bayesian Theory, John Wiley & Sons
Ltd.
[6] Bostock P., (2004), The Equity Premium, Journal of Portfolio Management
vol. 30(2), pp. 104-111.
[7] Brealey R. A., Myers S. C. & Allen F., (2006), Corporate Finance, McGraw-Hill, New York.
[8] Brealey R. A., Myers S. C. & Allen F., (2000), Corporate Finance, McGraw-Hill, New York.
[9] Brealey R. A., Myers S. C. & Allen F., (1996), Corporate Finance, McGraw-Hill, New York.
[10] Burda M. & Wyplosz C., (1997), Macroeconomics: A European text, Oxford
University Press, New York.
[11] Campbell J. Y., Lo A. & MacKinlay A., (1997), The Econometrics of Financial
Markets, Princeton University Press.
[12] Campbell J. Y. & Shiller R. J., (1988) The dividend-price ratio and expecta-
tions of future dividends and discount factors, Review of Financial Studies,
vol. 1, pp. 195-228.
[13] Campbell J. Y. & Shiller R. J., (1988) Stock prices, earnings, and expected
dividends, Journal of Finance, vol. 43, pp. 661-676.
[14] Campbell J. Y. & Shiller R. J., (1998) Valuation ratios and the long-run stock
market outlook, Journal of Portfolio Management, vol. 24, pp. 11-26.
[15] Campbell, J. Y., (1987), Stock returns and the term structure, Journal of
Financial Economics, vol. 18, pp. 373-399.
[16] Campbell J. & Thompson S., (2005), Predicting the Equity Premium Out of
Sample: Can Anything Beat the Historical Average?, NBER Working Papers
11468, National Bureau of Economic Research.
[17] Casella G. & Berger R. L., (2002), Statistical Inference, 2nd ed. Duxbury
Press.
[18] Choudhry M., (2006), Bonds - A concise guide for investors, Palgrave Macmil-
lan, New York.
[19] Cohen R.B., Polk C. & Vuolteenaho T., (2005), Inflation Illusion in the Stock
Market: The Modigliani-Cohn Hypothesis, Quarterly Journal of Economics,
vol. 120, pp. 639-668.
[20] Dalén J., (2001), The Swedish Consumer Price Index - A Handbook of Meth-
ods, Statistiska Centralbyrån, SCB-Tryck, Örebro.
[21] Damodaran A., (2006), Damodaran on Valuation, John Wiley & Sons, New
York.
[22] Dimson E., Marsh P. & Staunton M., (2006), The Worldwide Equity Pre-
mium: A Smaller Puzzle, SSRN Working Paper No. 891620.
[23] Durbin J. & Watson G.S., (1950), Testing for Serial Correlation in Least
Squares Regression I, Biometrika vol. 37, pp. 409-428.
[24] Escobar L. A. & Meeker W. Q., (2000), The Asymptotic Equivalence of the
Fisher Information Matrices for Type I and Type II Censored Data from
Location-Scale Families., Working Paper.
[25] Estrella A. & Trubin M. R., (2006), The Yield Curve as a Leading Indicator:
Some Practical Issues, Current Issues in Economics and Finance - Federal
Reserve Bank of New York, vol. 12(5).
[26] Fabozzi F. J., Focardi S. M. & Kolm P. N., (2006), Financial Modeling of the
Equity Market, John Wiley & Sons, New Jersey.
[27] Fama E.F., (1981), Stock returns, real activity, inflation and money, American
Economic Review, pp. 545-565.
[28] Fama E. F. & French K. R., (1988), Dividend yields and expected stock
returns, Journal of Financial Economics, vol. 22, pp. 3-25.
[29] Fama E. F. & French K. R., (1989), Business conditions and expected returns
on stocks and bonds, Journal of Financial Economics, vol. 25, pp. 23-49.
[30] Fama E.F. & Schwert G.W., (1977), Asset Returns and Inflation, Journal of
Financial Economics, vol. 5(2), pp. 115-46.
[31] The Federal Reserve, Industrial production and capacity utilization, (2007),
Retrieved February 12, 2008 from
http://www.federalreserve.gov/releases/g17/20071214/
[32] Fernández P., (2006), Equity Premium: Historical, Expected, Required and
Implied, IESE Business School, Madrid.
[33] Fernández C., Ley E. & Steel M., (1998), Benchmark priors for Bayesian Model Averaging, Working Paper.
[34] Franke J., Härdle W.K. & Hafner C.M., (2008), Statistics of Financial Markets
An Introduction, Springer-Verlag, Berlin Heidelberg.
[35] Gill P. E. & Murray W., (1981), Practical Optimization, Academic Press,
London.
[36] Golub G. & Van Loan C., (1996), Matrix Computations, The Johns Hopkins
University Press, Baltimore.
[37] Goyal A. & Welch I., (2006), A Comprehensive Look at the Empirical Per-
formance of Equity Premium Prediction, Review of Financial Studies, forth-
coming.
[38] Hamilton J. D., (1994), Time Series Analysis, Princeton University Press.
[39] Harrell F. E., (2001), Regression Modeling Strategies, Springer-Verlag, New
York.
[40] Hodrick R. J., (1992), Dividend yields and expected stock returns: alternative
procedures for inference and measurement, Review of Financial Studies, vol.
5(3), pp. 257-286.
[41] Hoeting J. A., Madigan D. & Raftery A. E. & Volinsky C. T., (1999), Bayesian
Model Averaging: A Tutorial, Statistical Science 1999, vol. 14(4), pp. 382-417.
[42] Ibbotson Associates, (2006), Stocks, Bonds, Bills and Inflation, Valuation
Edition, 2006 Yearbook.
[43] Keim D. B. & Stambaugh R. F., (1986), Predicting returns in the stock and
bond markets, Journal of Financial Economics, vol. 17(2), pp. 357-390.
[44] Kennedy P. E., (2000), Macroeconomic Essentials - Understanding Economics
in the News, The MIT Press, Cambridge.
[45] Koller T. & Goedhart M. & Wessels D., (2005), Valuation: Measuring and
Managing the Value of Companies, McKinsey & Company, Inc. Wiley.
[46] Kothari S. P. & Shanken J., (1997), Book-to-market, dividend yield, and ex-
pected market returns: a time series analysis, Journal of Financial Economics,
vol. 44, pp. 169-203.
[47] Krainer J., What Determines the Credit Spread?, (2004), FRBSF Economic
Letter, Nr 2004-36.
[48] Lamont O., (1998), Earnings and expected returns, Journal of Finance, vol.
53, pp.1563-1587.
[50] Lettau M. & Ludvigson, (2001), Consumption, aggregate wealth and expected
stock returns, Journal of Finance, vol. 56(3), pp. 815-849.
[51] Lewellen J., (2004), Predicting returns with financial ratios, working paper.
[52] Luenberger D. G., (1998), Investment Science, Oxford University Press, New
York.
[54] Mayer B., (2007), Credit as an Asset Class, Masters Thesis, TU Munich.
[55] Merton R. C., (1980), On Estimating the Expected Return on the Market:
An Exploratory Investigation, Journal of Financial Economics, vol. 8, pp.
323-361.
[56] Michaud R., (1998), Efficient Asset Management: A Practical Guide to Stock
Portfolio Optimization and Asset Allocation, Oxford University Press, New
York.
[58] Pontiff J. & Schall L. D., (1998), Book-to-market ratios as predictors of market
returns, Journal of Financial Economics, vol. 49, pp. 141-160.
[59] Press J. S., (1972), Applied Multivariate Analysis, Holt, Rinehart & Winston
Inc, University of Chicago.
[60] Rozeff M., (1984), Dividend yields are equity risk premiums, Journal of Port-
folio Management, vol. 11, pp. 68-75.
[61] Scherer B., (2004), Portfolio Construction and Risk Budgeting, Risk Books,
Incisive Financial Publishing Ltd.
[64] Vaihekoski M., (2005), Estimating Equity Risk Premium: Case Finland,
Lappeenranta University of Technology, Working paper.
[65] Welch I., (2000), Views of Financial Economists on the Equity Premium and on Professional Controversies, Journal of Business, vol. 73(4), pp. 501-537.
[66] Zagst R., (2004), Lecture Notes - Asset Pricing, TU Munich.
[67] Zagst, R. & Pöschik M., (2007), Inverse Portfolio Optimization under Con-
straints, Working Paper.
[68] Zellner A., (1986), On assessing prior distributions and bayesian regression
analysis with g-prior distributions, in Essays in Honor of Bruno de Finetti,
eds P.K. Goel and A. Zellner, Amsterdam: North-Holland, pp. 233-243.
Appendix A
Mathematical Preliminaries
For covariance stationary processes, the term weakly stationary is often used, (see
[34]).
Definition A.6 (The gamma function) The gamma function can be defined as the definite integral
$$\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\,dt.$$
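A one-line consequence of this definition, worth recalling since the gamma function appears in the densities below: integration by parts gives
$$\Gamma(x+1) = \int_0^\infty t^{x} e^{-t}\,dt = \Big[-t^{x}e^{-t}\Big]_0^\infty + x\int_0^\infty t^{x-1}e^{-t}\,dt = x\,\Gamma(x),$$
and since $\Gamma(1) = \int_0^\infty e^{-t}\,dt = 1$, it follows that $\Gamma(n) = (n-1)!$ for positive integers $n$.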
$$f_Y(y_t) = \frac{1}{\sqrt{2\pi}\,\sigma}\,\exp\!\left(\frac{-(y_t-\mu)^2}{2\sigma^2}\right).$$

$$p_v(x) = \frac{x^{v/2-1}\exp[-x/2]}{\Gamma(v/2)\,2^{v/2}}.$$
A.2 Statistical distributions
$$f(x) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}\,\exp\!\left[-\tfrac{1}{2}(x-\theta)^{\top}\Sigma^{-1}(x-\theta)\right].$$
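As a numerical check of this density, the following sketch evaluates it for the special case of a diagonal covariance matrix, where the determinant and the quadratic form factor per coordinate. The name `mvn_pdf_diag` is introduced here for illustration.

```python
import math

def mvn_pdf_diag(x, theta, sigma2):
    # Multivariate normal density with Sigma = diag(sigma2):
    # |Sigma| is the product of the variances, and the quadratic form
    # (x - theta)' Sigma^{-1} (x - theta) splits into one term per
    # coordinate.
    p = len(x)
    det = math.prod(sigma2)
    quad = sum((xi - ti) ** 2 / s2 for xi, ti, s2 in zip(x, theta, sigma2))
    return math.exp(-0.5 * quad) / ((2 * math.pi) ** (p / 2) * math.sqrt(det))

val = mvn_pdf_diag([0.0, 1.0], [0.0, 0.0], [1.0, 4.0])
```

In this diagonal case the joint density equals the product of the univariate normal densities of the coordinates, which makes the formula easy to verify.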
Code
%prediction horizon
horizon=5;
for k=1:horizon
y_bma=[];
x_bma=[];
res=[];
est=[];
removedModels=[];
usedModels=[];
outliers=0;
for j=1:length(returns(1,:))
[x, y, est_tmp, beta, resVec, outliersTmp]=predictClean(eqp(k+1:end)...
,returns(1:end-k,j),returns(end,j));
y_bma=[y_bma y];
x_bma=[x_bma x];
n=length(x(:,1));
p=length(x(1,:));
g=1/n;
B.2 Multivariate predictions
for i = 1:length(returns(1,:))
VARyhat_data = VARyhat_data +(diag(res(:,i))*x_bma(:,i*2-1:i*2)...
*inv(x_bma(:,i*2-1:i*2)'*x_bma(:,i*2-1:i*2))*x_bma(:,i*2-1:i*2)'...
+y_bma(:,i)*y_bma(:,i)')*prob_model(i)-(y_bma(:,i)*prob_model(i))...
*(y_bma(:,i)*prob_model(i))';
end
STD_step(k) = sqrt(sum(diag(VARyhat_data))/length(diag(VARyhat_data)));
z=norminv([0.05 0.95],0,1);
muci=[muci; weightedAvg+z(1)*STD_step(k)/sqrt(length(res(:,1)))...
weightedAvg weightedAvg+z(2)*STD_step(k)/sqrt(length(res(:,1)))];
end
%input
[dates, returns, differ] = calcFactors_LongDataSet(dates, values);
eqp=returns(:,1); regressor=returns(:,2:end);
numFactor=length(regressor(1,:)); numOfModel=2^numFactor;
for k=1:horizon
for i=1:numOfModel-1
%pick a model
L=length(regressor(:,1));
out = ones(L,1)*comb(i,:); % equivalent to concatenating comb(i,j)*ones(L,1) for j=1:18
output=out.*regressor;
modRegr = output(:,not(all(output(:,1:size(output,2))== 0)));
%predictions
[x, y, est_tmp, beta, resVec, outliersTmp]=predictClean(eqp(k+1:end)...
,modRegr(1:end-k,:),modRegr(end,:));
if (est_tmp>0)&&(est_tmp<(mean(eqp(k+1:end))+1.28*sqrt(var(eqp(k+1:end)))))
tmp(i)=est_tmp;
%calculate likelihood
n=length(x(:,1));
p=length(x(1,:));
g=p^(1/(1+p))/n;
P=x*inv(x'*x)*x';
likelihood(i)=(gamma(n/2)/((2*pi)^(n/2))/((1+g)^(p/2)))...
*(y'*y-(g/(1+g))*y'*P*y)^(-n/2);
else
likelihood(i)=0;
tmp(i)=0;
r(k)=r(k)+1;
end
setsubColumn(k+1,size(res,1),i,resVec,res);
end
%bma
p_model=likelihood./sum(likelihood);
magnitude=p_model'*tmp;
prob_model(:,k)=p_model;
predRng(:,k)=[min(tmp); max(tmp); mean(tmp)];
allMag=[allMag magnitude];
y_bma(k+1:end,k)=y;
%pick a model
L=length(regressor(:,1));
out = ones(L,1)*comb(i,:); % equivalent to concatenating comb(i,j)*ones(L,1) for j=1:18
output=out.*regressor;
modRegr = output(:,not(all(output(:,1:size(output,2))== 0)));
modRegr = [modRegr(1:end-k,:) ones(length(modRegr(1:end-k,:)),1)];
%intercept added
end
STD_step(k) = sqrt(sum(diag(VARyhat_data))/(length(diag(VARyhat_data))));
z=norminv([0.05 0.95],0,1);
muci=[muci; allMag(k)+z(1)*STD_step(k)/sqrt(length(res(:,1)))...
allMag(k) allMag(k)+z(2)*STD_step(k)/sqrt(length(res(:,1)))];
end
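The model-weighting at the heart of the listing above can be sketched compactly: the marginal likelihoods are normalised into posterior model probabilities (a uniform model prior is assumed) and the forecast is their probability-weighted average. The function `bma_combine` is a name introduced here for illustration.

```python
def bma_combine(likelihoods, predictions):
    # Posterior model probabilities under a uniform model prior are the
    # normalised marginal likelihoods; the BMA forecast is the
    # probability-weighted average of the per-model forecasts.
    total = sum(likelihoods)
    probs = [lik / total for lik in likelihoods]
    return probs, sum(p * y for p, y in zip(probs, predictions))

probs, forecast = bma_combine([2.0, 1.0, 1.0], [0.05, 0.02, 0.08])
```

This mirrors the lines `p_model=likelihood./sum(likelihood);` and `magnitude=p_model'*tmp;` in the MATLAB listing: a model with zero likelihood (e.g. one rejected by the outlier screen) simply drops out of the average.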
B.3 Merge time series
for j = 1:nMerged
while (dates(k) < mergedDates(j) && k < length(dates))
k = k+1;
end
if (dates(k) == mergedDates(j) && ~isnan(data{i}(k,2)))
n = n+1;
newDates(n) = mergedDates(j);
end
end
mergedDates = newDates(1:n);
end
for i = 1:length(sheetNames)
dates = datenum('30-Dec-1899') + data{i}(:,1);
k = 1;
for j = 1:n
while (dates(k) < mergedDates(j) && k < length(dates))
k = k+1;
end
if (dates(k) == mergedDates(j))
values(j,i) = data{i}(k,2);
else
error = 1
end
end
end
for i = 1:length(sheetNames)
data{i} = xlsread('runEqPred.xls', char(sheetNames(i)));
end
if interpolate
[dates, values] = mergeInterpolExcelData(sheetNames, data);
else
[dates, values] = mergeExcelData(sheetNames, data);
end
B.5 Permutations
function out = combinations(k);
out = indicator;
%remove outliers
xTmp=[]; outliers=0;
for i=1:length(x(1,:))
    xVec=x(:,i);
    for k=1:3 %nr of iterations for finding outliers
        H_hat=xVec*inv(xVec'*xVec)*xVec';
        Y=H_hat*y;
        index=find(abs(Y-mean(Y))>3*rlstd(Y));
        outliers=outliers+length(index);
        for j=1:length(index)
            if index(j)~= length(y)
                xVec(index(j))= 0.5*xVec(index(j)+1)+0.5*xVec(index(j)-1);
            else
                xVec(index(j))=0.5*xVec(index(j)-1)+0.5*xVec(index(j));
            end
        end
    end
    xTmp = [xTmp xVec];
end
x=xTmp;
%OLS
x=[ones(length(x),1) x]; %adding intercept
beta=x\y; % OLS
est=[1 lastVal]*beta; %predicted value
resVec=(y-x*beta).^2; %residual vector
B.7 setSubColumn
#include "mex.h"
//mexPrintf("%d\n", (int)col[0]*mxGetM(prhs[4])+(int)iStart[0]-1);
for l = 1:10
minMean = abs(min(sampMean));
maxMean = max(sampMean);
z=1;
for k=[minMean:(maxMean-minMean)/(nrPortfolios-1):maxMean]
[wStar(:,z), tmp] = solveQuad(sampMean, sampCov, nrAssets, k);
z=z+1;
end
%4. Repeat step 2-3
allReturn(:,j)=wStar'*histMean';
for q=1:nrPortfolios
    allVol(q,j)=wStar(:,q)'*histCov*wStar(:,q);
end
wStarAll=wStarAll + wStar;
end
minMean = abs(min(histMean));
maxMean = max(histMean);
z=1;
for k=[minMean:(maxMean-minMean)/(nrPortfolios-1):maxMean]
[wStarHist(:,z), tmp] = solveQuad(histMean, histCov, nrAssets, k);
z=z+1;
end
returnHist=wStarHist'*histMean';
for i=1:nrPortfolios
    volHist(i)=wStarHist(:,i)'*histCov*wStarHist(:,i);
end
prices((11-l),:)=values(end-(l-1),stocksNr);
if resampPort
[mvp_val mvp_nr] = min(volHist);
[tmpMin, portNr] = min(abs(volResamp(mvp_nr:end)-volDesired));
weights(l,:)=wStarAll(:, portNr+mvp_nr-1)’;
else
[mvp_val mvp_nr] = min(volHist);
[tmpMin, portNr] = min(abs(volHist(mvp_nr:end)-volDesired));
weights(l,:)=wStarHist(:, portNr+mvp_nr-1)’;
end
end
[V, wealth]=buySell2(weights,prices)
clc;
options = optimset('LargeScale','off');
Copyright
This document is made available on the Internet - or its possible future replacement - for a period of 25 years from the date of publication, barring exceptional circumstances. Access to the document implies permission for anyone to read, download, print single copies for personal use and to use it unchanged for non-commercial research and for teaching. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document require the copyright owner's consent. To guarantee authenticity, security and accessibility, there are solutions of a technical and administrative nature. The copyright owner's moral rights include the right to be mentioned as the author to the extent required by good practice when the document is used as described above, as well as protection against the document being altered or presented in such a form or context as is offensive to the copyright owner's literary or artistic reputation or individuality. For additional information about Linköping University Electronic Press, see the publisher's home page: http://www.ep.liu.se/
© May 12, 2008. Johan Bjurgert & Marcus Edstrand