
Matematiska Institutionen

Department of Mathematics

Master’s Thesis

Forecasting the Equity Premium and


Optimal Portfolios

Johan Bjurgert and Marcus Edstrand

Reg Nr: LITH-MAT-EX--2008/04--SE


Linköping 2008

Matematiska institutionen
Linköpings universitet
581 83 Linköping
Forecasting the Equity Premium and Optimal
Portfolios
Department of Mathematics, Linköpings universitet

Johan Bjurgert and Marcus Edstrand

LITH-MAT-EX--2008/04--SE

Handledare: Dr Jörgen Blomvall


MAI, Linköpings universitet
Dr Wolfgang Mader
risklab GmbH

Examinator: Dr Jörgen Blomvall


MAI, Linköpings universitet

Linköping, 15 April, 2008


URL för elektronisk version: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-11795
Abstract

The expected equity premium is an important parameter in many financial models, especially within portfolio optimization. A good forecast of the future equity premium is therefore of great interest. In this thesis we seek to forecast the equity premium, use it in portfolio optimization and then give evidence on how sensitive the results are to estimation errors and how their impact can be minimized.

Linear prediction models are commonly used by practitioners to forecast the expected equity premium, with mixed results. Simply choosing the model that performs best in-sample for forecasting does not take model uncertainty into account. Our approach is to still use linear prediction models, but to take model uncertainty into consideration by applying Bayesian model averaging. The predictions are used in the optimization of a portfolio with risky assets to investigate how sensitive portfolio optimization is to estimation errors in the mean vector and covariance matrix. This is performed by using a Monte Carlo based heuristic called portfolio resampling.

The results show that the predictive ability of linear models is not substantially improved by taking model uncertainty into consideration. This could mean that the main problem with linear models is not model uncertainty, but rather too low predictive ability. However, we find that our approach gives better forecasts than just using the historical average as an estimate. Furthermore, we find some predictive ability in the GDP, the short term spread and the volatility for the five years to come. Portfolio resampling proves to be useful when the input parameters in a portfolio optimization problem suffer from vast uncertainty.

Keywords: equity premium, Bayesian model averaging, linear prediction, estimation errors, Markowitz optimization

Acknowledgments

First of all we would like to thank risklab GmbH for giving us the opportunity
to write this thesis. It has been a truly rewarding experience. We are grateful
for the many inspirational discussions with Wolfgang Mader, our supervisor at
risklab, who has also provided us with valuable comments and suggestions. We
thank our supervisor at LiTH, Jörgen Blomvall, for his continuous support and
feedback. Finally we would like to acknowledge our opponent, Tobias Törnfeldt,
for his helpful comments.

Johan Bjurgert
Marcus Edstrand

Munich, April 2008

Contents

1 Introduction 5
1.1 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2 Problem definition . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.5 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

I Equity Premium Forecasting using Bayesian Statistics 7


2 The Equity Premium 9
2.1 What is the equity premium? . . . . . . . . . . . . . . . . . . . . . 9
2.2 Historical models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3 Implied models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.4 Conditional models . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.5 Multi factor models . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.6 A short summary of the models . . . . . . . . . . . . . . . . . . . . 14
2.7 What is a good model? . . . . . . . . . . . . . . . . . . . . . . . . 15
2.8 Chosen model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

3 Linear Regression Models 17


3.1 Basic definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.2 The classical regression assumptions . . . . . . . . . . . . . . . . . 21
3.3 Robustness of OLS estimates . . . . . . . . . . . . . . . . . . . . . 22
3.4 Testing the regression assumptions . . . . . . . . . . . . . . . . . . 23

4 Bayesian Statistics 25
4.1 Basic definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.2 Sufficient statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.3 Choice of prior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.4 Marginalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.5 Bayesian model averaging . . . . . . . . . . . . . . . . . . . . . . . 30
4.6 Using BMA on linear regression models . . . . . . . . . . . . . . . 32


5 The Data Set and Linear Prediction 37


5.1 Chosen series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
5.2 The historical equity premium . . . . . . . . . . . . . . . . . . . . 37
5.3 Factors explaining the equity premium . . . . . . . . . . . . . . . . 39
5.4 Testing the assumptions of linear regression . . . . . . . . . . . . . 45
5.5 Forecasting by linear regression . . . . . . . . . . . . . . . . . . . . 51

6 Implementation 53
6.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
6.2 Linear prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
6.3 Bayesian model averaging . . . . . . . . . . . . . . . . . . . . . . . 55
6.4 Backtesting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

7 Results 57
7.1 Univariate forecasting . . . . . . . . . . . . . . . . . . . . . . . . . 57
7.2 Multivariate forecasting . . . . . . . . . . . . . . . . . . . . . . . . 60
7.3 Results from the backtest . . . . . . . . . . . . . . . . . . . . . . . 62

8 Discussion of the Forecasting 65

II Using the Equity Premium in Asset Allocation 69


9 Portfolio Optimization 71
9.1 Solution of the Markowitz problem . . . . . . . . . . . . . . . . . . 71
9.2 Estimation error in Markowitz portfolios . . . . . . . . . . . . . . . 76
9.3 The method of portfolio resampling . . . . . . . . . . . . . . . . . 77
9.4 An example of portfolio resampling . . . . . . . . . . . . . . . . . . 78
9.5 Discussion of portfolio resampling . . . . . . . . . . . . . . . . . . . 79

10 Backtesting Portfolio Performance 85


10.1 Backtesting setup and results . . . . . . . . . . . . . . . . . . . . . 85

11 Conclusions 89

Bibliography 91

A Mathematical Preliminaries 97
A.1 Statistical definitions . . . . . . . . . . . . . . . . . . . . . . . . . . 97
A.2 Statistical distributions . . . . . . . . . . . . . . . . . . . . . . . . 98

B Code 100
B.1 Univariate predictions . . . . . . . . . . . . . . . . . . . . . . . . . 100
B.2 Multivariate predictions . . . . . . . . . . . . . . . . . . . . . . . . 101
B.3 Merge time series . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
B.4 Load data into Matlab from Excel . . . . . . . . . . . . . . . . . . 103
B.5 Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
B.6 Removal of outliers and linear prediction . . . . . . . . . . . . . . . 104

B.7 setSubColumn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104


B.8 Portfolio resampling . . . . . . . . . . . . . . . . . . . . . . . . . . 105
B.9 Quadratic optimization . . . . . . . . . . . . . . . . . . . . . . . . 106
List of Figures
3.1 OLS by means of projection . . . . . . . . . . . . . . . . . . . . . . 18
3.2 The effect of outliers . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.3 Example of a Q-Q plot . . . . . . . . . . . . . . . . . . . . . . . . . 24

4.1 Bayesian revising of probabilities . . . . . . . . . . . . . . . . . . . 26

5.1 The historical equity premium over time . . . . . . . . . . . . . . . 38


5.2 Shapes of the yield curve . . . . . . . . . . . . . . . . . . . . . . . 43
5.3 QQ-Plot of the one step lagged residuals for factors 1-9 . . . . . . 47
5.4 QQ-Plot of the one step lagged residuals for factors 10-18 . . . . . 48
5.5 Lagged factors 1-9 versus returns on the equity premium . . . . . . 49
5.6 Lagged factors 10-18 versus returns on the equity premium . . . . 50

6.1 Flowchart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
6.2 User interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

7.1 The equity premium from the univariate forecasts . . . . . . . . . . 58


7.2 Likelihood function values for different g-values . . . . . . . . . . . 59
7.3 The equity premium from the multivariate forecasts . . . . . . . . 60
7.4 Backtest of univariate models . . . . . . . . . . . . . . . . . . . . . 62
7.5 Backtest of multivariate models . . . . . . . . . . . . . . . . . . . . 63

9.1 Comparison of efficient and resampled frontier . . . . . . . . . . . . 81


9.2 Resampled portfolio allocation when shorting allowed . . . . . . . 82
9.3 Resampled portfolio allocation when no shorting allowed . . . . . . 83
9.4 Comparison of estimation error in mean and covariance . . . . . . 84

10.1 Portfolio value over time using different strategies . . . . . . . . . . 86



List of Tables
2.1 Advantages and disadvantages of discussed models . . . . . . . . . 14

3.1 Critical values for the Durbin-Watson test. . . . . . . . . . . . . . 23

5.1 The data set and sources . . . . . . . . . . . . . . . . . . . . . . . . 38


5.2 Basic statistics for the factors . . . . . . . . . . . . . . . . . . . . . 40
5.3 Outliers identified by the leverage measure . . . . . . . . . . . . . . 45
5.4 Jarque-Bera test of normality . . . . . . . . . . . . . . . . . . . . . 46
5.5 Durbin-Watson test of autocorrelation . . . . . . . . . . . . . . . . 46
5.6 Principle of lagging time series for forecasting . . . . . . . . . . . . 51
5.7 Lagged R2 for univariate regression . . . . . . . . . . . . . . . . . . 52

7.1 Forecasting statistics in percent . . . . . . . . . . . . . . . . . . . . 57


7.2 The univariate model with highest probability over time . . . . . . 58
7.3 Out of sample R2os,uni and hit ratios HRuni . . . . . . . . . . . . . 59
7.4 Forecasting statistics in percent . . . . . . . . . . . . . . . . . . . . 60
7.5 The multivariate model with highest probability over time . . . . . 61
7.6 Forecasts for different g-values . . . . . . . . . . . . . . . . . . . . 61
7.7 Out of sample R2os,mv and hit ratios HRmv . . . . . . . . . . . . . 61

9.1 Input parameters for portfolio resampling . . . . . . . . . . . . . . 78

10.1 Portfolio returns over time . . . . . . . . . . . . . . . . . . . . . . . 86


10.2 Terminal portfolio value . . . . . . . . . . . . . . . . . . . . . . . . 87
Nomenclature

The most frequently used symbols and abbreviations are described here.

Symbols
µ̄ Demanded portfolio return
βi,t Beta for asset i at time t
βt True least squares parameter at time t
µ Asset return vector
Ωt Information set at time t
Σ Estimated covariance matrix
cov[X] Covariance of the random variable X
β̂t Least squares estimate at time t
Σ̂ Sampled covariance matrix
ût Least squares sample residual at time t
λm,t Market m price of risk at time t
C Covariance matrix
In The identity matrix of size n × n
w Weights of assets
tr[X] The trace of the matrix X
var[X] Variance of the random variable X
Di,t Dividend for asset i at time t
E[X] Expected value of the random variable X
rf,t Riskfree rate at time t to t + 1
rm,t Return from asset m at time t
ut Population residual in the least square model at time t

Abbreviations
aHEP Average historical equity premium
BMA Bayesian model averaging
DJIA Dow Jones industrial average
EEP Expected equity premium
GDP Gross domestic product
HEP Historical equity premium
IEP Implied equity premium
OLS Ordinary least squares
REP Required equity premium

Chapter 1

Introduction

The expected equity risk premium is one of the single most important economic
variables. A meaningful estimate of the premium is critical to valuing companies
and stocks and for planning future investments. However, the only premium that
can be observed is the historical premium.

Since the equity premium is shaped by overall market conditions, factors influ-
encing market conditions can be used to explain the equity premium. Although
predictive power is usually low, the factors can also be used for forecasting. Many
investigations typically set out to determine a single best model, consisting of a set
of economic predictors, and then proceed as if the selected model had generated
the equity premium. Such an approach ignores the uncertainty in model selection,
leading to overconfident inferences that are riskier than one thinks. In our thesis
we forecast the equity premium by computing a weighted average of a large number
of linear prediction models, using Bayesian model averaging (BMA) to take model
uncertainty into account.

Having forecast the equity premium, the key input for asset allocation opti-
mization models, we conclude by highlighting the main pitfalls in the mean variance
optimization framework and present portfolio resampling as a way to arrive at
suitable allocation decisions when the input parameters are very uncertain.


1.1 Objectives
The objective of this thesis is to build a framework for forecasting the equity
premium and then implement it to produce a functional tool for practical use.
Further, the impact of uncertain input parameters in mean-variance optimization
shall be investigated.

1.2 Problem definition


By means of BMA and linear prediction, what is the expected equity premium
for the years to come and how is it best used as an input in a mean variance
optimization problem?

1.3 Limitations
The practical part of this thesis is limited to the use of US time series only.
However, the theoretical framework is valid for all economies.

1.4 Contributions
To the best knowledge of the authors, this is the first attempt to forecast the
equity premium using Bayesian model averaging with the priors specified later in
the thesis.

1.5 Outline
The first part of the thesis is about forecasting the equity premium whereas the
second part discusses the importance of parameter uncertainty in portfolio opti-
mization.

In chapter 2 we present the concept of the equity premium, usual assumptions


thereof and associated models. Chapter 3 describes the fundamental ideas of lin-
ear regression and its limitations. In chapter 4 we first present basic concepts of
Bayesian statistics and then use them to combine the properties of linear predic-
tion with Bayesian model averaging. Having defined the forecasting approach, we
turn in chapter 5 to the factors explaining the equity premium. Chapter 6 ad-
dresses the implementation of the theory. Finally, chapter 7 presents our results
and a discussion thereof is found in chapter 8. In chapter 9 we investigate the im-
pact of estimation error on portfolio optimization. In chapter 10 we evaluate the
performance of a portfolio when using the forecasted equity premium and portfo-
lio resampling. With chapter 11 we conclude our thesis and make suggestions for
future work.
Part I

Equity Premium Forecasting


using Bayesian Statistics

Chapter 2

The Equity Premium

In this chapter we define the concept of the equity premium and present some
models that have been used for estimating the premium. At the end of the chap-
ter, a table summing up advantages and disadvantages of the different models is
provided. The chapter concludes with a motivation of why we have chosen to work
with multi factor models and a summary of criteria for a good model.

2.1 What is the equity premium?


As defined by Fernández [32], the equity premium can be split up into four different
concepts. These concepts hold for single stocks as well as for stock indices. In our
thesis the emphasis is on stock indices.

• historical equity premium (HEP): historical return of the stock market


over riskfree asset

• expected equity premium (EEP): expected return of the stock market


over riskfree asset

• required equity premium (REP): incremental return of the market port-


folio over the riskfree rate required by an investor in order to hold the market
portfolio, or the extra return that the overall stock market must provide over
the riskfree asset to compensate for the extra risk

• implied equity premium (IEP): the required equity premium that arises
from a pricing model and from assuming that the market price is correct.

The HEP is observable on the financial market and is equal for all investors.1 It
is calculated by

HEPt = rm,t − rf,t−1 = (Pt /Pt−1 − 1) − rf,t−1    (2.1)
1 This is true as long as they use the same instruments and the same time resolution.


where rm,t is the return on the stock market, rf,t−1 is the rate on a riskfree asset
from t − 1 to t. Pt is the stock index level.

A widely used measure for rm,t is the return on a large stock index. For the
second asset rf,t−1 in (2.1), the return on government securities is usually used.
Some practitioners use the return on short-term treasury bills; some use the re-
turns on long-term government bonds. Yields on bonds instead of returns have
also been used to some extent. Despite the indisputable importance of the equity
premium, a general consensus on exactly which assets should enter expression (2.1)
does not exist. Questions like: “Which stock index should be used?” and “Which
riskfree instrument should be used and which maturity should it have?” remain
unanswered.
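To make expression (2.1) concrete, the short Matlab sketch below computes a yearly HEP series from an index level series and a riskfree rate series, and also its arithmetic average. The numbers are made up for illustration and are not from the data set used later in the thesis.

% Sketch of equation (2.1): yearly HEP from index levels P_t and riskfree
% rates r_{f,t} set at time t for the period t to t+1 (made-up numbers).
P  = [875; 969; 1046; 839; 964; 1212];       % index levels P_0,...,P_5
rf = [0.050; 0.060; 0.071; 0.080; 0.092];    % riskfree rates r_{f,0},...,r_{f,4}
r_m  = P(2:end) ./ P(1:end-1) - 1;           % market returns r_{m,1},...,r_{m,5}
HEP  = r_m - rf;                             % HEP_t = r_{m,t} - r_{f,t-1}
aHEP = mean(HEP);                            % arithmetic average, a common EEP proxy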

The EEP is made up of the market's expectations of future returns over a riskfree
asset and is therefore not observable in the financial market. Its magnitude, and
the most appropriate way to produce estimates thereof, is an intensely debated
topic among economists. The market expectations shaping the premium are based
on, at least, a non-negative premium and to some extent also average realizations
of the HEP. This would mean that there is a relation between the EEP and the
HEP. Some authors (e.g. [9], [21], [37] and [42]) even argue that there is a strict
equality between the two, whereas others claim that the EEP is smaller than the
HEP (e.g. [45], [6] and [22]). Although investors have different opinions on what
is the correct level of the expected equity premium, many basic financial books
recommend using 5-8%.2

The required equity premium (REP) is important in valuation since it is the key
to determining the company’s required return on equity.

If one believes that prices on the financial markets are correct, then the implied eq-
uity premium, (IEP), would be an estimate of the expected equity premium (EEP).

We now turn to presenting models being used to produce estimates of the dif-
ferent concepts.

2.2 Historical models


Probably the most widely used method among practitioners is to take the historical
realized equity premium as a proxy for the expected equity premium [64]. They
thereby implicitly follow the relationship HEP = EEP.

Assuming that the historical equity premium is equal to the expected equity pre-
mium can be formulated as

rm,t = Et−1 [rm,t ] + em,t (2.2)


2 See for instance [8]

where em,t is the error term, the unexpected return. The expectation is often com-
puted as the arithmetic average of all available values for the HEP. In equation
(2.2), it is assumed that the errors are independent and have a mean of zero. The
model then implies that investors are rational and the random error term corre-
sponds to their mistakes. It is also possible to model more advanced errors. For
example, an autoregressive error term might be motivated since market returns
sometimes exhibit positive autocorrelation. An AR(1) model then implies that
investors need one time step to learn about their mistakes. [64]

The model has the advantages of being intuitive and easy to use. The draw-
backs, on the other hand, are not few. Apart from the usual problems with time
series, such as the sample length used, outliers etc., the model suffers from prob-
lems with longer periods where the riskfree asset has a higher average return than
the equity. Clearly, this is not plausible since an investor expects a positive return
in order to invest.

2.3 Implied models


Implied models for the equity premium make use of the assumption EEP = IEP
and are used in much the same way as investors use the Black and Scholes formula
backwards to solve for implied volatility. The advantage of implied models is that
they provide time-varying estimates of the expected market returns, since prices
and expectations change over time. The main drawback is that their validity is
bounded by the validity of the model used. Lately, the inverse Black-Litterman
model has attracted interest, see for instance [67]. Another more widely used
model is the Gordon dividend growth model, which is further discussed in [11].
Under certain assumptions it can be written as

Pi,t = E[Di,t+1 ] / (E[ri,t+1 ] − E[gi,t+1 ])    (2.3)

where E[Di,t+1 ] is next year's expected dividend, E[ri,t+1 ] the required rate of
return and E[gi,t+1 ] the company's expected growth rate of dividends from today
until infinity.

Assuming that CAPM3 holds, the required rate of returns for stock i can be
written as

E[ri,t ] = rf,t + βi,t E[rm,t − rf,t ] (2.4)

By combining the two equations, where dividends are approximated as E[Di,t+1 ] =
[1 + E[gi,t+1 ]]Di,t , under the assumption that E[rf,t+1 ] = rf,t+1 , and by aggregating

3 Capital asset pricing model, see [7]



over all assets, we can now solve for the expected market risk premium

E[rm,t+1 ] = (1 + E[gm,t+1 ]) Dm,t /Pm,t + E[gm,t+1 ]
          = (1 + E[gm,t+1 ]) DivYieldm,t + E[gm,t+1 ]    (2.5)

where E[rm,t+1 ] is the expected market risk premium, Dm,t is the sum of dividends
from all companies, E[gm,t+1 ] is the expected growth rate of the dividends from
today to infinity4 , and DivYieldm,t is the current market price dividend yield. [64]

One criticism of the Gordon dividend growth model is that the result depends
heavily on the number used for the expected dividend growth rate, so that the
problem is shifted to forecasting the expected dividend growth rate.
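As a small numerical illustration of how (2.5) is applied, the Matlab lines below evaluate its right-hand side for an assumed dividend yield and dividend growth rate; both inputs are hypothetical and chosen only to show the mechanics.

% Sketch of equation (2.5) with hypothetical inputs.
divYield = 0.024;                           % current dividend yield DivYield_{m,t}
gDiv     = 0.050;                           % expected dividend growth E[g_{m,t+1}]
E_rm     = (1 + gDiv) * divYield + gDiv;    % right-hand side of equation (2.5)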

2.4 Conditional models


Conditional models refer to models conditioning on the information investors use
to estimate the risk premium, thereby allowing for time-varying estimates. On
the other hand, the information set Ωt used by investors is not observable on the
market and it is not clear how to specify the method investors use to form their
expectations from the data set.

As an example of such a model, the conditional version of the CAPM implies


the following restriction for the excess returns

E[ri,t |Ωt−1 ] = βi,t E[rm,t |Ωt−1 ] (2.6)

where the market beta is


βi,t = cov[ri,t , rm,t |Ωt−1 ] / var[rm,t |Ωt−1 ]    (2.7)

and E[ri,t |Ωt−1 ] and E[rm,t |Ωt−1 ] are expected returns on asset i and the market
portfolio conditional on investors’ information set Ωt−1 5 .

Observing that the ratio E[rm,t |Ωt−1 ]/ var[rm,t |Ωt−1 ] is the market price of risk
λm,t , measuring the compensation an investor must receive for a unit increase
in the market return variance [55], yields the following expression for the market
portfolio’s expected excess returns

E[rm,t |Ωt−1 ] = λm,t (Ωt−1 ) var [rm,t|Ωt−1 ]. (2.8)

By specifying a model for the conditional variance process, the equity premium
can be estimated.
4 E[rm,t+1 ] > E[gm,t+1 ]
5 Both returns are in excess of the riskless rate of return rf,t−1 and all returns are measured in one numeraire currency.

2.5 Multi factor models


Multi factor models make use of correlation between equity returns and returns
from other economic factors. By choosing a set of economic factors and by deter-
mining the coefficients, the equity premium can be estimated as
rm,t = αt + Σj βj,t Xj,t + εt    (2.9)

where the coefficients α and β usually are calculated using the least squares method
(OLS), X contains the factors and ε is the error.
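A minimal Matlab sketch of how the coefficients in (2.9) can be obtained with OLS is given below; the factor matrix and premium series are simulated placeholders, not the series used later in the thesis.

% Sketch of the multi factor model (2.9), estimated with OLS on simulated data.
T = 40; k = 3;
X    = randn(T, k);                                    % factor observations X_{j,t}
r_m  = 0.04 + X*[0.3; -0.2; 0.1] + 0.02*randn(T,1);    % simulated premium series
Z    = [ones(T,1) X];                                  % prepend a column for alpha_t
coef = (Z'*Z) \ (Z'*r_m);                              % [alpha; beta_1; ...; beta_k]
eps_hat = r_m - Z*coef;                                % residuals epsilon_t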

The most prominent candidates of economic factors used as explanatory variables


are the dividend to price ratio and the dividend yield (e.g. [60], [12], [28], [40] and
[51]), the earnings to price ratio (e.g. [13], [14] and [48]), the book to market ratio
(e.g. [46] and [58]), short term interest rates (e.g. [40] and [1]), yield spreads (e.g.
[43], [15] and [29]), and more recently the consumption-wealth ratio (e.g. [50]).
Other candidates are dividend payout ratios, corporate or net issuing ratios and
beta premia (e.g. [37]), the term spread and the default spread (e.g. [2], [15], [29]
and [43]), the inflation rate (e.g. [30], [27] and [19]), value of high and low beta
stocks (e.g. [57]) and aggregate financing activity (e.g. [3]).

Goyal and Welch [37] showed that most of the mentioned predictors performed
worse out-of-sample than just assuming that the equity premium had been con-
stant. They also found that the predictors were not stable, that is their importance
changes over time. Campbell and Thompson [16] on the other hand found that
some of the predictors, with significant forecasting power in-sample, generally have
a better out-of-sample forecast power than a forecast based on the historical av-
erage.

2.6 A short summary of the models

Historical
  Advantages: Intuitive and easy to use
  Disadvantages: Might have problems with longer periods of negative equity premium; doubtful whether the past is an indicator of the future

Implied
  Advantages: Relatively simple to use; provides time varying estimates of the premium
  Disadvantages: The validity of the estimates is bounded by the validity of the model used; assumes market prices are correct

Conditional
  Advantages: Provides time varying estimates of the premium
  Disadvantages: The information used by investors is not visible on the market; models for determining how investors form their expectations from the information are not unambiguous

Multi factor
  Advantages: High model transparency and results that are easy to interpret
  Disadvantages: Doubtful whether the past is an indicator of the future; forecasts are only possible for a short time horizon, due to lagging

Table 2.1. Table highlighting advantages and disadvantages of the discussed models

2.7 What is a good model?


These are model criteria that the authors, inspired by Vaihekoski [64], consider
important for a good estimate of the equity premium:

Economical reasoning criterions


• The premium estimate should be positive for most of the time
• Model inputs should be visible at the financial markets

• The estimated premium should be rather smooth over time because investor
preferences presumably do not change much over time
• The model should provide different premium estimates for different time
horizons, that is, taking investors “time structure” into account

Technical reasoning criterions


• The model should allow for time variation in the premium
• The model should make use of the latest time t observation
• The model should provide a measure of precision for the estimated premium

• It should be possible to use different time resolutions in the data input

2.8 Chosen model


All model categories previously stated are likely to be useful in estimating the
equity premium. In our thesis we have chosen to work with multi factor models
because they are intuitively more straightforward than both implied and condi-
tional models; all model inputs are visible on the market and it is perfectly clear
from the model how different factors add up to the equity premium. Furthermore,
it is easy to add constraints to the model, which enables the use of economic
reasoning as a complement to pure statistical analysis.
Chapter 3

Linear Regression Models

First we summarize the mechanics of linear regressions and present some formu-
las that hold regardless of the statistical assumptions made. Then we discuss
different statistical assumptions about the properties of the model and the ro-
bustness of the estimates.

3.1 Basic definitions


Suppose that a scalar yt is related to a vector xt ∈ Rk×1 and a noise term ut
according to the regression model

yt = x>t β + ut .    (3.1)

Definition 3.1 (Ordinary least squares, OLS) Given an observed sample
(y1 , y2 , . . . , yT ), the ordinary least squares estimate of β (denoted β̂) is the value
that minimizes the residual sum of squares V (β) = Σt=1..T ε2t (β) = Σt=1..T (yt − ŷt )2
= Σt=1..T (yt − x>t β)2 (see [38]).

Theorem 3.1 (Ordinary least squares estimate) The OLS estimate is given
by

β̂ = [Σt=1..T xt x>t ]−1 [Σt=1..T xt yt ]    (3.2)

assuming that the matrix Σt=1..T xt x>t ∈ Rk×k is nonsingular (see [38]).

Proof: The result is found by differentiation,

dV (β)/dβ = −2 Σt=1..T xt (yt − x>t β) = 0,

and the minimizing argument is thus

β̂ = [Σt=1..T xt x>t ]−1 [Σt=1..T xt yt ].

Often, the regression model is written in matrix notation as

y = Xβ + u,    (3.3)

where y ≡ (y1 , . . . , yn )>, u ≡ (u1 , . . . , un )> and X is the n × k matrix whose rows
are x>1 , . . . , x>n .
A perhaps more intuitive way to arrive at equation (3.2) is to project y on the
column space of X.

Figure 3.1. OLS by means of projection

The vector of the OLS sample residuals, û, can then be written as û = y − Xβ̂.
Consequently the least squares problem can be written as minimizing the loss function

V (β) = (y − Xβ)>(y − Xβ).


Since ŷ, the projection of y on the column space of X, is orthogonal to û

û> ŷ = ŷ> û = 0. (3.4)

In the same way, the OLS sample residuals are orthogonal to the explanatory
variables in X

û> X = 0. (3.5)

Now, substituting ŷ = Xβ into (3.4) yields

(Xβ)> (y − Xβ) = 0 ⇔
β > (X> y − X> Xβ) = 0.

Choosing the nontrivial solution for β, and noticing that if X has full column rank
then the matrix X>X is also of full rank, we can compute the least squares
estimator by inverting X>X.

β̂ = (X> X)−1 X> y. (3.6)

The OLS sample residual û shall not be confused with the population residual u.
The vector of OLS sample residuals can be written as

û = y − Xβ̂ = y − X(X> X)−1 X> y = [In − X(X> X)−1 X> ]y = MX y. (3.7)

The relationship between the two errors can now be found by substituting equation
(3.3) into equation (3.7)

û = MX (Xβ + u) = MX u. (3.8)

The difference between the OLS estimate β̂ and the true parameter β is found by
substituting equation (3.3) into (3.6)

β̂ = (X> X)−1 X> [Xβ + u] = β + (X> X)−1 X> u. (3.9)

Definition 3.2 (Coefficient of determination) The coefficient of determina-
tion, R2 , is defined as the fraction of variance that is explained by the model

R2 = var[ŷ] / var[y].

If we let X include an intercept, then (3.5) also implies that the fitted residuals
have a zero mean, (1/n) Σi=1..n ûi = 0. Now we can decompose the variance of y into
the variance of ŷ and û

var[y] = var[ŷ + û] = var[ŷ] + var[û] + 2 cov[ŷ, û].

Rewriting the covariance as

cov[ŷ, û] = E[ŷû] − E[ŷ]E[û]

and using ŷ ⊥ û and E[û] = 0, we can write R2 as

R2 = var[ŷ] / var[y] = 1 − var[û] / var[y].

Since OLS minimizes the sum of squared fitted errors, which is proportional to
var[û], it also maximizes R2 .

By substituting the estimated variances, R2 can be written as

var[ŷ] / var[y] = [ (1/n) Σi=1..n (ŷi − ȳ)2 ] / [ (1/n) Σi=1..n (yi − ȳ)2 ]
               = [ Σi=1..n ŷi2 − nȳ 2 ] / [ Σi=1..n yi2 − nȳ 2 ]
               = [ (Xβ̂)>(Xβ̂) − nȳ 2 ] / [ y>y − nȳ 2 ]
               = [ y>X(X>X)−1 X>y − nȳ 2 ] / [ y>y − nȳ 2 ]

where the identity used is calculated as

Σi=1..n (xi − x̄)2 = Σi=1..n [ xi2 − (2/n) xi Σj xj + (1/n2 )(Σj xj )2 ]
                 = Σi=1..n xi2 − (2/n)(Σi xi )2 + (n/n2 )(Σi xi )2
                 = Σi=1..n xi2 − (1/n)(Σi xi )2
                 = Σi=1..n xi2 − nx̄2 .
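The identities of this section are easy to verify numerically. The Matlab sketch below, run on simulated data, computes the OLS estimate (3.6), checks the orthogonality relations (3.4)-(3.5) and evaluates R2.

% Numerical check of section 3.1 on simulated data.
n = 60; p = 3;
X = [ones(n,1) randn(n,p-1)];               % design matrix with intercept
y = X*[0.02; 0.6; -0.4] + 0.1*randn(n,1);   % simulated response
beta_hat = (X'*X) \ (X'*y);                 % OLS estimate, equation (3.6)
y_hat    = X*beta_hat;                      % projection of y on the columns of X
u_hat    = y - y_hat;                       % OLS sample residuals
orth     = [u_hat'*y_hat, u_hat'*X];        % ~0 by (3.4) and (3.5)
R2       = 1 - var(u_hat)/var(y);           % coefficient of determination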

3.2 The classical regression assumptions


The following assumptions1 are used for later calculations

1. xt is a vector of deterministic variables


2. ut is i.i.d. with mean 0 and variance σ 2
(E[u] = 0 and E[uu> ] = σ 2 In )
3. ut is Gaussian (0, σ 2 )

Substituting equation (3.3) into equation (3.6) and taking expectations using as-
sumptions 1 and 2 establishes that β̂ is unbiased,

β̂ = (X>X)−1 X>[Xβ + u] = β + (X>X)−1 X>u    (3.10)

E[β̂] = β + (X>X)−1 X>E[u] = β    (3.11)

with covariance matrix given by

E[(β̂ − β)(β̂ − β)>] = E[(X>X)−1 X>uu>X(X>X)−1 ]    (3.12)
                   = (X>X)−1 X>E[uu>]X(X>X)−1
                   = σ2 (X>X)−1 X>X(X>X)−1
                   = σ2 (X>X)−1 .

When u is Gaussian, the above calculations imply that β̂ is Gaussian. Hence, the
preceding results imply

β̂ ∼ N (β, σ2 (X>X)−1 ).

It can further be shown that under assumptions 1, 2 and 3, β̂ is BLUE2 , that is, no
unbiased estimator of β is more efficient than the OLS estimator β̂.

1 As treated in [38]
2 BLUE, best linear unbiased estimator see the Gauss-Markov theorem

3.3 Robustness of OLS estimates


The most serious problem with OLS is non-robustness to outliers. A single bad
point can have a strong influence on the solution. To remedy this, one can dis-
card the worst fitting data point and recompute the OLS fit. In figure 3.2, the
black line illustrates the result of discarding an outlier.

Figure 3.2. The effect of outliers

Deleting an extreme point can be justified by arguing that outliers are rare, which
makes them practically unpredictable, and that their deletion therefore strengthens
the predictive power. Sometimes extreme points correspond to extraordinary changes
in economies, and depending on context it might be more or less justified to discard
them.

Because outliers do not necessarily produce large residuals, they may be easy to over-
look. A good measure of the influence of a data point is its leverage.

Definition 3.3 (Leverage) To compute leverage in ordinary least squares, the
hat matrix H is given by H = X(X>X)−1 X>, where X ∈ Rn×p and n ≥ p.

Since ŷ = Xβ̂ = X(X>X)−1 X>y, the leverage measures how an observation in-
fluences its own predicted value. The diagonal elements hii of H contain the leverage
measures and are not influenced by y. A rule of thumb [39] for detecting out-
liers is that hii > 2(p + 1)/n signals a high leverage point, where p is the number of
columns in the predictor matrix X aside from the intercept and n is the number
of observations. [39]
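A small Matlab sketch of definition 3.3 and the rule of thumb, using a simulated design matrix with one planted extreme observation:

% Leverage values h_ii and the 2(p+1)/n rule of thumb (simulated data).
n = 50;
X = [ones(n,1) randn(n,2)];           % intercept plus two predictors
X(7,2) = 8;                           % plant one extreme observation
H = X * ((X'*X) \ X');                % hat matrix H = X(X'X)^(-1)X'
h = diag(H);                          % leverage h_ii of each observation
p = size(X,2) - 1;                    % predictors aside from the intercept
highLeverage = find(h > 2*(p+1)/n);   % observations flagged by the rule of thumb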

3.4 Testing the regression assumptions


Unfortunately assumption 2 can easily be violated for time series data since many
time series exhibit autocorrelation, resulting in the OLS estimates being inefficient,
that is, they have higher variability than they should.

Definition 3.4 (Autocorrelation function) The j th autocorrelation of a co-
variance stationary process3 , denoted ρj , is defined as its j th autocovariance di-
vided by the variance

ρj ≡ γj / γ0 , where γj = E[(Yt − µ)(Yt−j − µ)].    (3.13)

Since ρj is a correlation, |ρj | ≤ 1 for all j. Note also that ρ0 equals unity for all
covariance stationary processes.

A natural estimate of the autocorrelation ρj is provided by the corresponding
sample moments

ρ̂j ≡ γ̂j / γ̂0 , where
γ̂j = (1/T ) Σt=j+1..T (Yt − ȳ)(Yt−j − ȳ), j = 0, 1, 2, . . . , T − 1
ȳ = (1/T ) Σt=1..T Yt .

Definition 3.5 (Durbin-Watson test) The Durbin-Watson test statistic is used
to detect the presence of autocorrelation in the residuals from a regression analysis
and is defined by

DW = Σt=2..T (et − et−1 )2 / Σt=1..T e2t    (3.14)

where et , t = 1, 2, . . . , T are the regression analysis residuals.


The null hypothesis of the statistic is that there is no autocorrelation, that is
ρ = 0, and the alternative hypothesis that there is autocorrelation, ρ ≠ 0. Durbin
and Watson [23] derive lower and upper bounds for the critical values, see table
3.1.

ρ=0 → DW≈ 2 No Correlation


ρ=1 → DW≈ 0 Positive Correlation
ρ = −1 → DW≈ 4 Negative Correlation

Table 3.1. Critical values for the Durbin-Watson test.

3 For a definition of a covariance stationary process, see appendix A.1.
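The statistic (3.14) is a one-liner once the residuals are available. The Matlab sketch below applies it to simulated residuals with positive autocorrelation, for which DW ends up well below 2.

% Durbin-Watson statistic (3.14) on simulated AR(1)-like residuals.
T  = 80;
e  = filter(1, [1 -0.6], randn(T,1));    % residuals with rho around 0.6
DW = sum(diff(e).^2) / sum(e.^2);        % roughly 2(1 - rho), here below 2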



One way to check assumption 3 is to plot the empirical quantiles of the sample
against those of the theoretical distribution. Such a plot, shown in figure 3.3, is
called a Q-Q plot.

Figure 3.3. Example of a Q-Q plot

For a more detailed analysis the Jarque-Bera test, a goodness of fit measure of
departure from normality based on skewness and kurtosis, can be employed.

Definition 3.6 (Jarque-Bera test) The test statistic JB is defined as

JB = (n/6) [ S 2 + (K − 3)2 /4 ]    (3.15)

where n is the number of observations, S is the sample skewness and K is the
sample kurtosis, defined as

S = [ (1/n) Σk=1..n (xk − x̄)3 ] / [ (1/n) Σk=1..n (xk − x̄)2 ]3/2
K = [ (1/n) Σk=1..n (xk − x̄)4 ] / [ (1/n) Σk=1..n (xk − x̄)2 ]2

where x̄ is the sample mean.


Asymptotically JB ∼ χ2 (2), which can be used to test the null hypothesis that
the data come from a normal distribution. The null hypothesis is a joint hypothesis
of the skewness being 0 and the kurtosis being 3, since samples from a nor-
mal distribution have an expected skewness of 0 and an expected kurtosis of 3.
The definition shows that any deviation from these expectations increases the JB
statistic.
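Since the moments in definition 3.6 are simple sample averages, the JB statistic can be computed in a few Matlab lines without any toolbox functions. The sample below is simulated; the 5% critical value of χ2(2) is about 5.99.

% Jarque-Bera statistic (3.15) from sample skewness and kurtosis.
x  = randn(200,1);                        % sample to be tested
n  = numel(x);
xc = x - mean(x);
S  = mean(xc.^3) / mean(xc.^2)^(3/2);     % sample skewness
K  = mean(xc.^4) / mean(xc.^2)^2;         % sample kurtosis
JB = n/6 * (S^2 + (K-3)^2/4);             % compare with the chi2(2) critical value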
Chapter 4

Bayesian Statistics

First, we introduce fundamental concepts of Bayesian statistics and then we pro-


vide tools for calculating posterior densities which are crucial to our forecasting.

4.1 Basic definitions


Definition 4.1 (Prior and posterior) If Mj , j ∈ J, are considered models,
then for any data D,
p(Mj ), j ∈ J, are called the prior probabilities of the Mj , j ∈ J
p(Mj |D), j ∈ J, are called the posterior probabilities of the Mj , j ∈ J
where p denotes probability distribution functions (See [5]).

Definition 4.2 (The likelihood function) Let x = (x1 , . . . , xn ) be a random
sample from a distribution p(x; θ) depending on an unknown parameter θ in the
parameter space A. The function lx (θ) = Πi=1..n p(xi ; θ) is called the likelihood
function.

The likelihood function is then the probability that the values x1 , . . . , xn are in
the random sample. Mind that the probability density is written as p(x; θ). This
is to emphasize that θ is the underlying parameter and will not be written out
explicitly in the sequel. Depending on context we will also refer to the likelihood
function as p(x|θ) instead of lx (θ).

Theorem 4.1 (Bayes's theorem) Let p(y, θ) denote the joint probability den-
sity function (pdf) for a random observation vector y and a parameter vector θ,
also considered random. Then according to usual operations with pdf's, we have

p(y, θ) = p(y|θ)p(θ) = p(θ|y)p(y)

and thus

p(θ|y) = p(θ)p(y|θ) / p(y) = p(θ)p(y|θ) / ∫A p(y|θ)p(θ)dθ    (4.1)


with p(y) ≠ 0. In the discrete case, the theorem is written as

p(θ|y) = p(θ)p(y|θ) / p(y) = p(θ)p(y|θ) / Σi∈A p(y|θi )p(θi ).    (4.2)

The last expression can be written as follows

p(θ|y) ∝ p(θ)p(y|θ)
posterior pdf ∝ prior pdf × likelihood function,    (4.3)

here, p(y), the normalizing constant needed to obtain a proper distribution in θ,
is discarded and ∝ denotes proportionality. The use of the symbol ∝ is explained
in the next section.
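A tiny numerical illustration of (4.2): with two candidate models, made-up prior probabilities and made-up likelihood values for some observed data, the posterior probabilities are obtained by multiplying and normalizing.

% Discrete Bayes's theorem (4.2) for two hypothetical models.
prior      = [0.5, 0.5];                    % p(M1), p(M2)
likelihood = [0.010, 0.002];                % p(y|M1), p(y|M2)
posterior  = prior .* likelihood;           % prior x likelihood
posterior  = posterior / sum(posterior);    % p(M1|y), p(M2|y) = [0.83, 0.17]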

Figure 4.1 highlights the importance of Bayes's theorem and shows how the prior
information enters the posterior pdf via the prior pdf, whereas all the sample in-
formation enters the posterior pdf via the likelihood function.

Figure 4.1. Bayesian revising of probabilities

Note that an important difference between the Bayesian statistics and the classical
Fisher statistics is that the parameter vector θ is considered to be a stochastic
variable rather than an unknown parameter.

4.2 Sufficient statistics


A sufficient statistic can be seen as a summary of the information in data, where
redundant and uninteresting information has been removed.

Definition 4.3 (Sufficient statistic) A statistic t(x) is sufficient for an under-
lying parameter θ precisely if the conditional probability distribution of the data
x, given the statistic t(x), is independent of the parameter θ (see [17]).

In short, the definition states that θ cannot give any further information about x
if t(x) is sufficient for θ, that is, p(x|t, θ) = p(x|t).

Neyman's factorization theorem provides a convenient characterization of a
sufficient statistic.

Theorem 4.2 (Neyman’s factorization theorem) A statistic t is sufficient


for θ given y if and only if there are functions f and g such that
p(y|θ) = f (t, θ)g(y)
where t = t(y). (see [49])

Proof: For a proof see [49]


Here, t(y) is the sufficient statistic and the function f (t, θ) relates the sufficient
statistic to the parameter θ, while g(y) is a θ independent normalization factor
of the pdf.

It turns out that many of the common statistical distributions have a similar
form. This leads to the definition of the exponential family.

Definition 4.4 (The exponential family) A distribution is from the one-parameter
exponential family if it can be put into the form

p(y|θ) = g(y)h(θ) exp[t(y)Ψ(θ)].

Equivalently, if the likelihood of n independent observations y = (y1 , y2 , . . . , yn )
from this distribution is of the form

ly (θ) ∝ h(θ)n exp[Σi t(yi )Ψ(θ)],

then it follows immediately from definition 4.2 that Σi t(yi ) is sufficient for θ given
y.

Example 4.1: Sufficient statistics for a Gaussian

For a sequence of independent Gaussian variables with unknown mean µ

yt = µ + et ∼ N (µ, σ2 ), t = 1, 2, . . . , N

p(y|µ) = Πt=1..N (2πσ2 )−1/2 exp[−(yt − µ)2 /(2σ2 )]
       = exp[−(1/(2σ2 ))(N µ2 − 2µ Σt yt )] × (2πσ2 )−N/2 exp[−(1/(2σ2 )) Σt yt2 ]

where the first factor is f (t, µ) and the second is g(y), so the sufficient statistic
t(y) is given by t(y) = Σt yt .

4.3 Choice of prior


Suppose our model M of a set of data y is parameterized by θ. Our knowledge
about θ before y is measured (given) is quantified by the prior pdf, p(θ). After
measuring y the posterior pdf is available as p(θ|y) ∝ p(y|θ)p(θ). It is clear that
different assumptions on p(θ) lead to different inferences p(θ|y).

A good rule of thumb for prior selection is that your prior should represent the
best knowledge available about the parameters before looking at data. For exam-
ple, the number of scores in a football game cannot be less than zero and is less
than 1000, which justifies setting your prior equal to zero outside this interval.
In the case that one does not have any information, a good idea might be to use
an uninformative prior.

Definition 4.5 (Jeffreys prior) Jeffreys prior pJ (θ) is defined as proportional
to the square root of the determinant of the Fisher information matrix of p(y|θ)

pJ (θ) ∝ |J(θ|y)|1/2    (4.4)

where

J(θ|y)i,j = −Ey [ ∂ 2 ln p(y|θ) / (∂θi ∂θj ) ].    (4.5)

The Fisher information is a way of measuring the amount of information that
an observable random variable y = (y1 , . . . , yn ) carries about a set of unknown
parameters θ = (θ1 , . . . , θn ). The notation J(θ|y) is used to make clear that the
parameter vector θ is associated with the random variable y and should not be
thought of as conditioning. A perhaps more intuitive way1 to write (4.5) is

J(θ|y)i,j = covθ [ ∂ ln p(y|θ)/∂θi , ∂ ln p(y|θ)/∂θj ]    (4.6)
Mind that the Fisher information is only defined under certain regularity condi-
tions, which are further discussed in [24]. One might wonder why Jeffreys made his
prior proportional to the square root of the determinant of the Fisher information
matrix. There is a perfectly good reason for this: consider a transformation of the
unknown parameters θ to ψ(θ); then if K is the matrix Kij = ∂θi /∂ψj ,

J(ψ|y) = KJ(θ|y)K>

and hence the determinant of the information satisfies

|J(ψ|y)| = |J(θ|y)||K|2 .

Because |K| is the Jacobian, and thus does not depend on y, it follows that

pJ (θ) ∝ |J(θ|y)|1/2

provides a scale-invariant prior, which is a highly desirable property for a reference
prior. In Jeffreys' own words, “any arbitrariness in the choice of parameters could
make no difference to the results”.
1 Remember that cov[x, y] = E[(x − µx )(y − µy )].

Example 4.2
Consider a random sample y = (y1 , . . . , yn ) ∼ N (θ, φ), with the mean θ known and
the variance φ unknown. The Jeffreys prior pJ (φ) for φ is then computed as follows

L(φ|y) = ln p(y|φ) = ln Πi=1..n (2πφ)−1/2 exp[−(yi − θ)2 /(2φ)]
       = −(1/(2φ)) Σi=1..n (yi − θ)2 − (n/2) ln φ + c

⇒ ∂ 2 L/∂φ2 = −(1/φ3 ) Σi=1..n (yi − θ)2 + n/(2φ2 )

⇒ −E[∂ 2 L/∂φ2 ] = (1/φ3 ) E[Σi=1..n (yi − θ)2 ] − n/(2φ2 )
                = (1/φ3 )(nφ) − n/(2φ2 ) = n/(2φ2 )

⇒ pJ (φ) ∝ |J(φ|y)|1/2 ∝ 1/φ

A natural question that arises is what choices of priors generate analytical expres-
sions for the posterior distribution. This question leads to the notion of conjugate
priors.

Definition 4.6 (Conjugate prior) Let l be a likelihood function ly (θ). A class
Π of prior distributions is said to form a conjugate family if the posterior density

p(θ|y) ∝ p(θ)ly (θ)

is in the class Π for all y whenever the prior density is in Π (see [49]).

There is a minor complication with the definition and a more rigorous version is
presented in [5]. However, the definition states the key principle in a clear enough
manner.

Example 4.3
Let x = (x1 , . . . , xn ) have independent Poisson distributions with the same mean
λ; then the likelihood function lx (λ) equals

lx (λ) = Πi=1..n (λxi e−λ /xi !) = λt e−nλ / Πi=1..n xi ! ∝ λt e−nλ

where t = Σi=1..n xi and by theorem 4.2 is sufficient for λ given x.

If we let the prior of λ be in the family Π of constant multiples of chi-squared
random variables, p(λ) ∝ λv/2−1 e−S0 λ/2 , then the posterior is also in Π:

p(λ|x) ∝ p(λ)lx (λ) = λt+v/2−1 e−(S0 +2n)λ/2 .

The distribution of p(λ) is explained in appendix A.2.

Conjugate priors are useful in computing posterior densities. Although there are
not that many priors that are conjugate, there might be a risk of overuse since
data might be better described by another distribution that is not conjugate.

4.4 Marginalization
A useful property of conditional probabilities is the possibility to integrate out
undesired variables. According to usual operations with pdf's we have

∫ p(a, b)db = p(a).

Analogously, for any likelihood function of two or more variables, marginal like-
lihoods with respect to any subset of the variables can be defined. Given the
likelihood ly (θ, M ), the marginal likelihood ly (M ) for model M is

ly (M ) = p(y|M ) = ∫ p(y|θ, M )p(θ|M )dθ.

Unfortunately marginal likelihoods are often very difficult to calculate and numer-
ical integration techniques might have to be employed.

4.5 Bayesian model averaging


To explain the powerful idea of Bayesian model averaging (BMA) we start with an
example.

Example 4.4
Suppose we are analyzing data and believe that it arises from a set of probability
distributions or models {Mi }ki=1 . For example, the data might consist of a normally
distributed outcome y that we wish to predict future values of. We also have two
other outcomes, x1 and x2 , that covary with y. Using the two covariates as
predictors of y offers two models, M1 and M2 , as explanations for what values y is
likely to take on in the future. A simple approach to deciding what future value of
y should be used might be to just average the two estimates. But if one of the
models suffers from bad predictive ability, then the average of the two estimates
is not likely to be especially good. Bayesian model averaging solves this issue by
weighting the estimates ŷ1 and ŷ2 by how likely the respective models are:

ŷ = p(M1 |Data)ŷ1 + p(M2 |Data)ŷ2 .

Using theory from the previous chapters it is possible to compute the probability
p(Mi |Data) for each model.

We now treat the averaging more mathematically.

Let ∆ be a quantity of interest; then its posterior distribution given data D is

p(∆|D) = Σk=1..K p(∆|Mk , D)p(Mk |D).    (4.7)

This is a weighted average of the posterior probability where each model Mk is
considered. The posterior probability for model Mk is

p(Mk |D) = p(D|Mk )p(Mk ) / Σl=1..K p(D|Ml )p(Ml ),    (4.8)

where

p(D|Mk ) = ∫ p(D|θk , Mk )p(θk |Mk )dθk    (4.9)

is the marginalized likelihood of the model Mk with parameter vector θk as defined
in section 4.4. All probabilities are implicitly conditional on M, the set of models
being considered. The posterior mean and variance of ∆ are given by

ξ = E[∆|D] = Σk=1..K ∆̂k p(Mk |D)    (4.10)

φ = var[∆|D] = E[∆2 |D] − E[∆|D]2    (4.11)
             = Σk=1..K (var[∆|D, Mk ] + ∆̂2k ) p(Mk |D) − E[∆|D]2

where ∆̂k = E[∆|D, Mk ] (see [41]).
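The Matlab sketch below illustrates equations (4.8) and (4.10) with made-up numbers, assuming the marginal likelihoods p(D|Mk) and the per-model forecasts have already been computed.

% BMA weights (4.8) and the averaged forecast (4.10), toy numbers.
K        = 3;
margLik  = [2.1e-5, 1.4e-5, 0.3e-5];    % p(D|M_k)
priorPr  = ones(1,K)/K;                 % uniform model prior p(M_k)
deltaHat = [0.042, 0.055, 0.031];       % E[Delta|D,M_k] for each model
w  = margLik .* priorPr;
w  = w / sum(w);                        % posterior model probabilities
xi = sum(w .* deltaHat);                % BMA point forecast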

4.6 Using BMA on linear regression models


Here, the key issue is the uncertainty about the choice of regressors, that is, the
model uncertainty. Each model Mj is of the previously discussed form y = Xj βj +
u ∼ N (Xj βj , σ2 In ), where the regressors Xj ∈ Rn×p ∀j, with the intercept
included, correspond to the regressor set, j ∈ J, specified in chapter 5. The
quantity y is the given data and we are interested in the quantity ∆, the regression
line.

p(y|βj , σ2 ) = ly (βj , σ2 ) = (2πσ2 )−n/2 exp[−(1/(2σ2 ))(y − Xj βj )>(y − Xj βj )]

By completing the square in the exponent, the sum of squares can be written as
(y − Xβ)> (y − Xβ) = (β − β̂)> X> X(β − β̂) + (y − Xβ̂)> (y − Xβ̂),

where β̂ = (X>X)−1 X>y is the OLS estimate. That the equality holds is proved
by multiplying out the right-hand side and checking that it equals the left-hand side.

As pointed out in section 3.1, (y − Xβ̂) is the residual vector û and its sum
of squares divided by the number of observations less the number of covariates is
known as the residual mean square, denoted by s2 :

s2 = û>û/(n − p) = û>û/v ⇒ û>û = vs2 .

It is convenient to denote n − p as v, known as the degrees of freedom of the model.

Now we can write the likelihood as

ly (βj , σ2 ) ∝ (σ2 )−pj /2 exp[−(1/(2σ2 ))(βj − β̂j )>(X>j Xj )(βj − β̂j )]
              × (σ2 )−vj /2 exp[−vj s2j /(2σ2 )].
The BMA analysis requires the specification of prior distribution for the parame-
ters βj and σ 2 . For σ 2 we choose an uninformative prior

p(σ 2 ) ∝ 1/σ 2 , (4.12)

which is the Jeffreys prior as calculated in example 4.2. For βj the g-prior, as
introduced by Zellner [68], is applied

p(βj |σ2 , Mj ) ∼ fN (βj |0, σ2 g(X>j Xj )−1 ),    (4.13)

where ∼ fN (w|m, V ) denotes a normal density on w with mean m and covariance


matrix V. The expression σ 2 (X> X)−1 is recognized as the covariance matrix of
the OLS-estimate and the prior covariance matrix is then assumed to be propor-
tional to the sample covariance with a factor g which is used as a design parameter.
An increase of g makes the distribution more flat and therefore gives higher pos-
terior weights to large absolute values of βj .

As shown by Fernandez, Ley and Steel [33] the following three theoretical values
of g lead to consistency, in the sense of asymptotically selecting the correct model.
• g = 1/n
The prior information is roughly equal to the information available from one
data observation

• g = k/n
Here, more information is assigned to the prior as the number of predictors
k grows
• g = k^(1/k) /n
Now, less information is assigned to the prior as the number of predictors
grows
To arrive at a posterior probability of the models given data we also need to specify
the prior distribution for each model Mj over M, the space of all K = 2p−1 models:

p(Mj ) = pj , j = 1, . . . , K
pj > 0 ∀Mj ∈ M
Σj=1..K pj = 1

In our application, we chose pj = 1/K so that we have a uniform distribution
over the model space, since we at this point have no reason to favor one model
over another. Now, the priors chosen have the tractable property of an analytical
expression for the marginal likelihood ly (Mj ).

Theorem 4.3 (Derivation of the marginal likelihood) Using the above spec-
ified priors, the marginalized likelihood function is given by

ly (Mj ) = ∫ p(y|βj , σ2 , Mj )p(σ2 )p(βj |σ2 , Mj )dβj dσ2
        = Γ(n/2) / (π n/2 (g + 1)p/2 ) × (y>y − (g/(1 + g)) y>Xj (X>j Xj )−1 X>j y)−n/2 .

Proof :

ly (Mj , βj , σ 2 ) = p(y|βj , σ 2 , Mj )p(βj |σ 2 , Mj )p(σ 2 ) =


1
= (2πσ 2 )−n/2 exp[− 2 (vj s2j + (βj − β̂j )> (X>
j Xj )(βj − β̂j ))]

1
×(2πσ 2 )−p/2 |Z0 |−1/2 exp[− 2 (βj − β̄j )> Z0 (βj − β̄j ))] × 1/σ 2

To integrate the expression we start by completing the square of the exponents. Here,
we do not write out the index on the variables. Mind that Z0 is used instead of writing
out the g-prior.
34 Bayesian Statistics

(β − β̂)> X> X(β − β̂) + (β − β̄)> Z0 (β − β̄)


= β > (X> X + Z0 )β − β > (X> Xβ̂ + Z0 β̄) − (β̂ > X> X + β̄ > Z0 )β + β̂ > X> Xβ̂ + β̄ > Z0 β̄ =
= β > (X> X + Z0 )β − β > (X> X + Z0 ) (X> X + Z0 )−1 (X> Xβ̂ + Z0 β̄)
| {z }
=B1
− (β̂ > X> X + β̄ > Z0 )(X> X + Z0 )−1 (X> X + Z0 )β + β̂ X> Xβ̂ + β̄ > Z0 β̄ =
>
| {z }
=B>
1

= β > (X> X + Z0 )β − β > (X> X + Z0 )B1 − B> > > >


1 (X X + Z0 )β + B1 (X X + Z0 )B1
−B>1 (X >
X + Z 0 )B 1 + β̂ > >
X Xβ̂ + β̄ >
Z0 β̄ =

= (β − B1 )> (X> X + Z0 )(β − B1 ) − B> > > > >


1 (X X + Z0 )B1 + β̂ X Xβ̂ + β̄ Z0 β̄ =
> > > > > > −1 >
= (β − B1 ) (X X + Z0 )(β − B1 ) − (β̂ X X + β̄ Z0 )(X X + Z0 ) (X Xβ̂ + Z0 β̄)+
+β̂ > X> Xβ̂ + β̄ > Z0 β̄ =

= (β − B1 )> (X> X + Z0 )(β − B1 ) − (β̂ > X> X)(X> X + Z0 )−1 (X> Xβ̂)
− (β̂ > X> X)(X> X + Z0 )−1 Z0 β̄ − (β̄ > Z0 )(X> X + Z0 )−1 (X> Xβ̂)+
−(β̄ > Z0 )(X> X + Z0 )−1 (Z0 β̄) + (β̂ > X> X)(X> X + Z0 )−1 (X> X + Z0 )β̂
+β̄ > Z0 (X> X + Z0 )−1 (X> X + Z0 )β̄ =

= (β − B1 )> (X> X + Z0 )(β − B1 ) − [(β̂ > X> X)(X> X + Z0 )−1 (Z0 β̄)
+ (β̄ > Z0 )(X> X + Z0 )−1 (X> Xβ̂) − (β̄ > X> X)(X> X + Z0 )−1 (Z0 β̄)
− (β̂ > Z0 )(X> X + Z0 )−1 (X> Xβ̂)] =

/X> X(X> X + Z0 )−1 Z0 = ((X> X)−1 + Z−1


0 )
−1
/

= (β −B1 )> (X> X+Z0 )(β −B1 )−[β̂ > ((X> X)−1 +Z−1
0 )
−1
β̄ + β̄ > ((X> X)−1 +Z−1
0 )
−1
β̂
> > −1 −1 −1 > > −1 −1 −1
−β̂ ((X X) + X0 ) β̂ − β̄ ((X X) + Z0 ) β̄] =

= (β − B1 )> (X> X + Z0 )(β − B1 ) + (β̂ − β̄)> ((X> X)−1 + Z−1


0 )
−1
(β̂ − β̄).

Now we can write l_y(M_j, \beta_j, \sigma^2) as

1/\sigma^2 \times (2\pi\sigma^2)^{-(n+p)/2}\, |Z_0|^{1/2} \times \exp\!\left[-\tfrac{1}{2\sigma^2}S_1\right] \times \exp\!\left[-\tfrac{1}{2\sigma^2}(\beta_j - B_1)^\top A_1 (\beta_j - B_1)\right], where

S_1 = v_j s_j^2 + (\hat\beta_j - \bar\beta_j)^\top \left((X_j^\top X_j)^{-1} + Z_0^{-1}\right)^{-1}(\hat\beta_j - \bar\beta_j)
A_1 = Z_0 + X_j^\top X_j

The second exponent is the kernel of a multivariate normal density2 and integrating with
respect to β yields

1/\sigma^2 \times (2\pi\sigma^2)^{-n/2}\, |Z_0|^{1/2} |A_1|^{-1/2} \times \exp\!\left[-\tfrac{1}{2\sigma^2}S_1\right],

which in turn is the kernel of an inverted Wishart density3 .

2 For a definition, see Appendix A


3 For a definition, see Appendix A
We now integrate with respect to σ² resulting in

l_y(M_j) = (2\pi)^{-n/2}\, |Z_0|^{1/2} |A_1|^{-1/2} |S_1|^{-n/2}\, c_0(n_0 = n+2,\, p_0 = 1) \times k,

where k is a proportionality constant canceling in the posterior expression. To obtain
the marginal likelihood we substitute Z_0 with the inverse of the g-prior, \tfrac{1}{g}(X_j^\top X_j),
and recall that the prior mean \bar\beta_j is zero. With σ² integrated out,

|S_1|^{-n/2} = \left(v_j s_j^2 + \hat\beta_j^\top \left((1+g)(X_j^\top X_j)^{-1}\right)^{-1} \hat\beta_j\right)^{-n/2}
            = \left(v_j s_j^2 + \tfrac{1}{1+g}\, \hat\beta_j^\top (X_j^\top X_j)\, \hat\beta_j\right)^{-n/2}
            = \left((y - X_j\hat\beta_j)^\top (y - X_j\hat\beta_j) + \tfrac{1}{1+g}\, \hat\beta_j^\top (X_j^\top X_j)\, \hat\beta_j\right)^{-n/2}
            = \left(y^\top y - \tfrac{g}{1+g}\, y^\top X_j (X_j^\top X_j)^{-1} X_j^\top y\right)^{-n/2}

|Z_0|^{1/2} = \left|\tfrac{1}{g} X_j^\top X_j\right|^{1/2} = (1/g)^{p/2}\, |X_j^\top X_j|^{1/2}

|A_1|^{-1/2} = \frac{1}{|A_1|^{1/2}} = \frac{1}{(1 + 1/g)^{p/2}\, |X_j^\top X_j|^{1/2}}

c_0(n_0 = n+2,\, p_0 = 1) = 2^{n/2}\, \Gamma(n/2)

And finally we arrive at

l_y(M_j) = \frac{\Gamma(n/2)}{\pi^{n/2}(g+1)^{p/2}} \left(y^\top y - \frac{g}{1+g}\, y^\top X_j (X_j^\top X_j)^{-1} X_j^\top y\right)^{-n/2}.
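For a single model this closed-form expression can be evaluated directly. A minimal Matlab sketch (our own illustration with hypothetical names, not the appendix B code; it computes the log of the marginal likelihood for numerical stability and uses the number of regressors in Mj for p):

% Log marginal likelihood of model Mj under the g-prior, following the
% closed-form expression above (variable names are our own).
% y : n-by-1 vector of excess returns, Xj : n-by-pj regressor matrix, g : design parameter
function lml = logMarginalLikelihood(y, Xj, g)
    n    = length(y);
    pj   = size(Xj, 2);          % number of regressors in model Mj, used for p
    bhat = Xj \ y;               % OLS estimate for model Mj
    ess  = y' * (Xj * bhat);     % equals y'*Xj*(Xj'Xj)^(-1)*Xj'*y
    S    = y' * y - g/(1 + g) * ess;
    lml  = gammaln(n/2) - (n/2)*log(pi) - (pj/2)*log(g + 1) - (n/2)*log(S);
end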


36 Bayesian Statistics

Now, applying Bayes' rule yields the posterior model probabilities

p(M_j|y) = \frac{p(y|M_j)\, p_j}{\sum_{k=1}^{K} p(y|M_k)\, p_k}.

Meanwhile, the mean and variance of the predicted values, ∆, are given by

\xi = E[\Delta|y] = \sum_{j=1}^{K} X_j \hat\beta_j\, p(M_j|y)   (4.14)

\phi = var[\Delta|y] = \sum_{j=1}^{K} \left[\sigma_{\hat u}^2\, X_j (X_j^\top X_j)^{-1} X_j^\top + (X_j\hat\beta_j)^2\right] p(M_j|y) - \left[\sum_{j=1}^{K} X_j\hat\beta_j\, p(M_j|y)\right]^2   (4.16)

where the expression var[∆|y, M_k] from equation (4.11) is calculated as

var[\Delta|y, M_k] = var[X_k\beta_k] = E[X_k(\hat\beta - \beta)(\hat\beta - \beta)^\top X_k^\top]   (4.17)
                   = X_k\, E[(\hat\beta - \beta)(\hat\beta - \beta)^\top]\, X_k^\top
                   = \sigma_{\hat u}^2\, X_k (X_k^\top X_k)^{-1} X_k^\top.

The estimation error is calculated as

S_k = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \phi_{ii}}.   (4.18)

Finally, the confidence interval for our BMA estimate of the equity premium is
calculated as

I_{1-\alpha}(\xi_k) = \xi_k \pm \Phi^{-1}\!\left(1 - \frac{\alpha}{2}\right) \frac{S_k}{\sqrt{n}},   (4.19)

where Φ(x) = P(X ≤ x) for X ~ N(0, 1). This interval results from the central
limit theorem, stating that for a set of n i.i.d. random variables with finite mean
µ and variance σ², the sample average approaches the normal distribution with
mean µ and variance σ²/n as n increases. This holds irrespective of the shape of
the original distribution. It then follows that, for each time step, the 2^18 estimates
of the equity premium have a sample mean that is approximately normally distributed.
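To make the averaging step concrete, the following Matlab sketch shows how per-model forecasts could be combined into the BMA estimate, its variance and the interval above. It is a simplified scalar version of (4.14)-(4.19) for a single forecast point, with our own variable names and under the assumption that the per-model forecasts, variances and log marginal likelihoods have already been computed:

% preds : K-by-1 point forecasts Xj*bhat_j for one time step
% vars  : K-by-1 per-model forecast variances, cf. (4.17)
% logml : K-by-1 log marginal likelihoods l_y(Mj)
% n     : number of observations used in the estimation
w     = exp(logml - max(logml));             % uniform prior pj = 1/K cancels
w     = w / sum(w);                          % posterior model probabilities p(Mj|y)
xi    = sum(w .* preds);                     % BMA estimate, cf. (4.14)
phi   = sum(w .* (vars + preds.^2)) - xi^2;  % BMA variance, cf. (4.16)
Sk    = sqrt(phi);                           % scalar analogue of (4.18)
alpha = 0.10;
ci    = xi + [-1 1] * norminv(1 - alpha/2) * Sk / sqrt(n);   % interval (4.19)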
Chapter 5

The Data Set and Linear Prediction

In this chapter we first describe the used data set and then explain and motivate
the predictors we have chosen to forecast the expected equity premium. We also
check that our statistical assumptions hold and explain how the predictions are
carried out.

5.1 Chosen series


The data set consists of information from three different sources: Bloomberg®,
FRED® and ValueLine®, see table 5.1. In total the set consists of 18 different
time series, which can be divided into three different groups: data on a large stock
index, interest rates and macroeconomic factors. The data set has yearly data
from 1959 to 2007 on each series. The time series from ValueLine ends in 2003
and has been prolonged with data from Bloomberg, while data from FRED covers
the whole time span.

5.2 The historical equity premium


The historical yearly realized equity premium can be seen in figure 5.1, where the
premium is calculated as in expression (2.1) with Pt as the index level of Dow
Jones Industrial Average (DJIA)1 and rf,t−1 being the US 1-year treasury bill
rate. It is this historical time series that will be used as dependent variable in the
regression models.

1 DJIA is a price-weighted average of 30 significant stocks traded on the New York Stock

Exchange and the Nasdaq. In contrast, most stock indices are market capitalization weighted,
meaning that larger companies account for larger proportions of the index.
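As an illustration of how such a realized-premium series can be constructed, a minimal Matlab sketch (our own naming; the yearly DJIA levels and 1-year rates are assumed to be already loaded as vectors) could be:

% index : yearly DJIA closing levels, rf : yearly US 1-year rate (decimal form)
indexReturn = index(2:end) ./ index(1:end-1) - 1;   % P_t / P_{t-1} - 1
hep         = indexReturn - rf(1:end-1);            % excess over r_{f,t-1}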


Time series Bloomberg Ticker FRED Id Value Line


Dow Jones Industrial Average (DJIA) INDU Index.Px Last - X
DJIA Dividend Yield .Eqy Dvd Yld 12m - X
DJIA Price-Earnings Ratio .Pe Ratio - X
DJIA Book Value per share .Indx Weighted Book Val - X
DJIA Price-Dividend Ratio .Eqy Dvd Yld 12m - X
DJIA Earnings per share .Indx General Earn - X
Consumer Price Index - CPIAUCNS -
Effective Federal Funds Rate - FEDFUNDS -
3-month Treasury Bill - TB3MS -
1-Year Treasury Rate - GS1 -
10-Year Treasury Rate - GS10 -
Moody’s Aaa Corp Bond Yield - AAA -
Moody’s Baa Corp Bond Yield - BAA -
Producer Price Index - PPIFGS -
Industrial Production Index - INDPRO -
Personal Income - PI -
Gross Domestic Product - GDPA -
Consumer Sentiment - UMCSENT -

Table 5.1. The data set and sources

Figure 5.1. The historical equity premium over time



5.3 Factors explaining the equity premium


From the time series in table 5.1 we have constructed 18 predictors, which should
account for changes in the stock index as well as changes in the general economy.
1. Dividend yield is the dividend yield on the Dow Jones Industrial Average
Index (DJIA).
2. Price-earnings ratio is the price-earnings ratio on DJIA.
3. Book value per share is the book value per share on DJIA.
4. Price-dividend ratio is the price dividend ratio on DJIA.
5. Earnings per share is the earnings per share on DJIA.
6. Inflation is measured by the consumer price index for all urban consumers
and all items.
7. Fed funds rate is the US effective federal funds rate.
8. Short term interest rate is the 3-month US treasury bill secondary market
rate.
9. Term spread short is the US 1-year treasury with constant maturity rate
less the 3-month US treasury bill secondary market rate.
10. Term spread long is the US 10-year treasury with constant maturity rate
less the US 1-year treasury with constant maturity rate.
11. Credit spread is Moody’s Baa corporate bond yield returns less the Aaa
corporate bond yield.
12. Producer price is the US producer price index for finished goods.
13. Industrial production is the US industrial production index.
14. Personal income is the US personal income.
15. GDP is the gross US domestic product.
16. Consumer sentiment is the University of Michigan time series for con-
sumer sentiment.
17. Volatility is the standard deviation of the returns on DJIA.
18. Earnings-book ratio is earnings per share divided by book value per share
for DJIA.
For all 18 factors above we have used the fractional change, defined as

r_{i,t} = \frac{I_t}{I_{t-1}} - 1,   (5.1)

where r_{i,t} is the return on factor i at time t and I_t is the factor level at time t.
The basic statistics for the 18 factors are found in table 5.2.

Factors
1 2 3 4 5 6 7 8 9
Mean 0.00 0.07 0.06 0.02 0.07 0.04 0.09 0.07 -0.02
Std 0.14 0.37 0.15 0.13 0.23 0.03 0.40 0.40 0.11
Median 0.00 -0.01 0.05 0.01 0.10 0.04 0.06 0.01 -0.02
Min -0.30 -0.38 -0.20 -0.23 -0.61 0.01 -0.71 -0.68 -0.34
Max 0.32 1.73 0.87 0.29 0.64 0.14 1.28 1.65 0.20
10 11 12 13 14 15 16 17 18
Mean -0.04 0.00 0.04 0.03 0.07 0.07 0.00 0.04 1.50
Std 0.27 0.04 0.04 0.05 0.03 0.03 0.14 0.01 11.84
Median 0.01 -0.01 0.02 0.03 0.07 0.06 0.00 0.04 0.79
Min -1.29 -0.10 -0.03 -0.09 0.01 0.00 -0.28 0.01 -52.24
Max 0.53 0.15 0.16 0.11 0.13 0.13 0.42 0.08 48.60

Table 5.2. Basic statistics for the factors

Dividend yield
The main reason for the supposed predictive power of the dividend yield is the
positive relation between expected high dividend yields and high returns. This is
a result from using a discounted cash flow framework under the assumption that
the expected stock return is equal to a constant. For instance Campbell [11] has
shown that the current stock price is equal to the expected present value of future
dividends out to the infinite future. Assuming that the current dividend yields will
remain the same in the future, the positive relation follows. This relationship can
also be observed in the Gordon dividend growth model. In the absence of capital
gains, the dividend yield is also the return on the stock and measures how much
cash flow you are getting for each unit of cash invested.

Price-earnings ratio
Price-earnings ratio, price per share divided by earnings per share, measures how
much an investor is willing to pay per unit of earnings. A high Price-earnings ratio
then suggests that investors think the firm has good growth opportunities or that
the earnings are safe and therefore more valuable [7].

Book value per share


Book value per share, the value of equity divided by the number of outstanding shares,
is the raw value of the stock and should be compared with the market value of
the stock. These two figures are rarely the same. Most of the time a stock trades
at a multiple of the book value. A low book value per share in comparison with
the market value per share suggests that the stock is highly valued or perhaps even
overvalued; the reverse also holds.

Price-dividend ratio
The price-dividend ratio, price per share divided by annual dividend per share, is
the reciprocal of the dividend yield. A low ratio might mean that investors require
a high rate of return or that they are not expecting dividend growth in the future

[7]. As a consequence, a low ratio could be a forecast of less profitable times. A


low ratio can also indicate either a fast growing company or a company with poor
outlooks. A high ratio could either point to a mature company with few growth
opportunities or just a mature stable company with temporarily low market value.

Earnings per share


Earnings per share, profit divided by the number of outstanding shares, is more
interesting if you calculate and view the incremental change for a period of time.
A steady rate of increasing earnings per share could suggest good performance and
decreasing earnings per share figures would suggest poor performance.

Inflation
Inflation is defined as the increase in the price of some set of goods and services
in a given economy over a period of time [10]. The inflation is usually measured
through a consumer price index, which measures nominal consumer prices for a
basket of items bought by a typical consumer. The prices are weighted by the
fraction the typical consumer spends on each item. [20]

Many different theories for the role and impact of inflation in an economy have
been proposed, but they all have some basic implications in common. High
inflation makes people more interested in investing their savings in assets that are
inflation protected, e.g. real estate, instead of holding fixed income assets such
as bonds. By moving away from fixed income and investing in other assets the
hope is that the returns will exceed the inflation. As a result, high inflation
leads to reduced purchasing power as individuals reduce money holdings. High
inflation is unpredictable and volatile. This creates uncertainty in the business
community, reducing investment activity and thus economic growth. If a period
of high inflation takes hold, a prolonged period of high unemployment is often the price of
reducing inflation to modest levels again. This is the main reason for fearing high
inflation. [44]

A low inflation usually implies that the price levels are expected to increase over
time and therefore it is beneficial to spend and borrow in the short run. A low
inflation is the starting point for a higher rate of inflation.

Central banks try to contain the rate of inflation to a predetermined interval,


usually 2-3 %, in order to maintain a stable price level and currency value. The
means for doing so are given to the banks through the discount rate: increasing
the rate usually dampens inflation, and vice versa.

Generally, no producer is keen on lowering their prices, just as no employee accepts


a decrease in their nominal salary. This means that a small level of inflation has
to be allowed in order for the pricing system to work efficiently. Inflation levels
above this threshold are considered negative, mainly due to the fact that inflation
creates further inflation expectations. [44]

Besides being linked to the general state of the economy, inflation also has great
impact on interest rates. If the inflation rises, so will the nominal interest rates
which in turn influence the business conditions. [44]

Federal funds rate


The federal funds rate is one of the most important money market instruments. It
is the rate that banks in the US charge each other for lending overnight. Federal
funds are tradable reserves that commercial banks are required to maintain with
the Fed. The Fed does not pay interest on these reserves so banks maintain the
minimum reserve position possible and sell the excess to other banks short of cash
to meet their reserve deposit needs. The federal funds rate therefore is roughly
analogous to the London Interbank Offered Rate (LIBOR). [4]

A bank that wishes to finance a client venture but does not have the means to
do so can borrow capital from another bank at the federal funds rate. As a result,
the federal funds rate sets the threshold for how willing banks are to finance new
ventures. As the rate increases, banks become more reluctant to take out these
inter-bank loans. A low rate will on the other hand encourage banks to borrow
money and hence increase the possibilities for businesses to finance new ventures.
Therefore, this rate somewhat controls the US business climate.

Short term interest rate


The short term interest rate (3-month T-bills) is an important rate which many
use as a proxy for the risk-free rate and hence enters many different valuation
models used by practitioners. As a result, changes in the short term rate influence
the market prices. For instance, an increase in the short term rate makes the
present value of cash flows to the firm take on a smaller value, and a discounted cash
flow model for a firm's stock would as a result imply a lower stock price. Another
simple implication is that an increase also makes it more expensive for firms
to finance themselves in the short run. In general, an increase in the short term
rate tends to slow economic growth and dampen inflation. The short term interest
rate is also linked, in its movements, to the federal funds rate.

Term spread
A yield curve can take on many different shapes and there are several different
theories trying to explain the shape. When talking about the shape of the yield
curve one refers to the slope of the curve. Is it flat, upward sloping, downward
sloping or humped? Upward and downward sloping curves are also referred to as
normal and inverted yield curves. A yield curve constructed from prices in the
bond market can be used to calculate different term spreads, differences in rates
for two different maturities. For this reason the term spread is related to the slope
of the yield curve. Here we have defined the short term spread as the difference in
rates between the maturities one year and three months and the long term spread
as the difference between ten years and one year maturities. Positive short and
long term spreads could imply an upward sloping yield curve, and the opposite
could imply a downward sloping curve. A positive short term spread and a nega-

tive long term spread could correspond to a humped yield curve.

Yield curves almost always slope upwards, figure 5.2 a. One reason for this is
expectation of future increases in inflation and therefore investors require a pre-
mium for locking in their money at an interest rate that is not inflation protected.
[44] As mentioned earlier, increase in inflation comes with economy growth which
makes an upward sloping yield curve a sign of good times. The growth itself
can also be partly explained by the lower short term rate which makes it cheaper
for companies to borrow for expanding. Furthermore, central banks are expected
to fend off the expected rise in inflation with higher rates, decreasing the price
of long-term bonds and thus increasing their yields. A downward sloping yield
curve, figure 5.2 b, occurs when the expectation is that future inflation will be
lower than current inflation, and thus also that the economy will
slow down in the future [44]. A low long term bond yield is acceptable since the
inflation is low. In fact, each of the six last recessions in the US has been preceded
by an inverted yield curve [25]. This shape could also be developed as the Federal
Reserve raises their nominal federal funds rate.

(a) Normal (b) Inverted (c) Flat (d) Humped

Figure 5.2. Shapes of the yield curve

A flat yield curve, figure 5.2 c, signals uncertainty in the economy and should not
be visible for any longer time periods. Investors should in theory not have any
incentive to hold long-dated bonds over shorter-dated bonds when there is no yield
premium. Instead they would sell off long-dated bonds resulting in higher yields in
the long end and an upward sloping yield curve. A humped yield curve, figure 5.2
d, arises when investors expect interest rates to rise over the next several periods
and then decline. It could also signal the beginning of a recession or just be the
result of a shortage in the supply of long or short-dated bonds. [18]

Credit spread
Yields on corporate bonds are almost always higher than on treasuries with the
same maturity. This is mainly a result of the higher default risk in corporate
bonds, even if other factors have been suggested as well. The corporate spread,
also known as the credit spread, is usually the difference between the yields on a
Baa rated corporate bond and a government bond, with the same time to maturity

of course. Research [47] has shown that only around 20-50 percent of the credit
spread can be accounted for by the default risk, when calculating the credit
spread with government bonds as the reference instrument. If one instead uses
Aaa rated corporate bonds, this number hopefully increases. Above all, the
main reason for using credit spread as an explaining/forecasting variable at all is
that the credit spread seems to widen in recessions and to shrink in expansions
during the business cycle [47]. It can also change as other bad news hit the market.
Our corporate bond series have bonds with a maturity as close as possible to 30
years, and are averages of daily data.

Producer price
The producer price measures the average change over time in selling prices received
by domestic producers of goods and services. It is measured from the perspective
of the seller, in contrast with the consumer price index that measures from the
purchaser's perspective. These two may differ due to government subsidies, sales and
excise taxes and distribution costs. [63]

Industrial production and personal income


Industrial production measures the output from the US industrial sector, which
is defined as comprising manufacturing, mining and electric and gas
utilities [31]. Personal income measures the sum of wages and salaries in dollars
for the US.

Gross domestic product


The gross domestic product (GDP) is considered as a good measure of the size of
an economy and how well it is performing. This statistic is defined as the market
value of all goods and services produced within a country in a given time period
and is computed every three months by the Bureau of Economic Analysis. More
specifically, the GDP is the sum of spending divided into four broad categories:
consumption, investment, government purchases and net exports. The change of
the GDP describes how the economy varies and is therefore an indicator of the
business cycle. [53]

Consumer sentiment
The consumer sentiment index is based on household interviews and gives an in-
dication of the future business climate, personal finance and spending in the US
and therefore has implications on stocks, bonds and cash markets.[62]

Volatility
Volatility is the standard deviation of the change in value of a financial instrument.
The volatility is here calculated on monthly observations for each year. The basic
idea behind volatility as an explaining variable is that volatility is synonymous
with risk. High volatility should imply a higher demand for risk compensation, a
higher equity premium.

Earnings-book ratio
The earnings-book ratio relates the earnings per share to the book value per share
and measures a firm’s efficiency at generating profits. The ratio is also called ROE,
return on equity. It is likely that a high ROE yields a high equity premium be-
cause general business conditions have to be good in order to generate a good ROE.

5.4 Testing the assumptions of linear regression


As discussed in chapter 3.3 the estimated coefficients in the OLS solution are very
sensitive to outliers. By applying the leverage measure from definition 3.3 the
outliers in table 5.3 have been found. Elements in ŷ deviating more than three
standard deviations from the mean of ŷ have been removed and replaced by linearly
interpolated values. This has been repeated three times for each factor time
series. In total, an average of one outlier per time series factor per time step has
been removed and interpolated.
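A minimal Matlab sketch of one pass of this screening (our own simplified version with hypothetical names; the actual routine is in appendix B, and interpolating the flagged factor observations is one possible reading of the procedure):

% One screening pass for a univariate factor regression (simplified).
% y : equity premium, x : one lagged factor, both column vectors
X     = [ones(size(x)) x];
yhat  = X * (X \ y);                             % fitted values from the OLS fit
bad   = abs(yhat - mean(yhat)) > 3 * std(yhat);  % three-sigma rule on yhat
t     = (1:length(x))';
x(bad) = interp1(t(~bad), x(~bad), t(bad), 'linear', 'extrap');  % interpolate flagged points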

Step   Total outliers
1 19
2 18
3 18
4 14
5 16

Table 5.3. Outliers identified by the leverage measure for univariate predictions

The assumptions that must hold for a linear regression model were presented in
chapter 3.2 and the means for testing these assumptions were given in chapter 3.4.
After having removed outliers, it is motivated to check for violations against the
classical regression assumptions.

The QQ-plots for all factors are presented in figures 5.3 and 5.4. By visual in-
spection of each subplot, it is seen that for some factors the points fall close to
the diagonal line - the error distribution is likely to be Gaussian. Other factors
show signs of kurtosis, indicated by an S-shaped form. A Jarque-Bera test at the
significance level 0.05 has been performed to rule out the uncertainties of depar-
tures from the normal distribution. From the results in table 5.4 it is found that
we cannot reject the null hypothesis that the residuals are Gaussian at significance
level 0.05. The critical value represents the upper limit for the null hypothesis to
hold, and the P-value represents the probability of observing the same outcome given
that the null hypothesis is true; put another way, if the P-value is above the
significance level we cannot reject the null hypothesis.

Factor
1 2 3 4 5 6 7 8 9
JB-Value 2.39 1.79 1.35 2.24 1.69 1.27 0.96 1.14 2.00
Crit-Value 4.84 4.88 4.95 4.92 4.95 4.89 4.95 4.93 4.93
P-Value 0.16 0.26 0.39 0.18 0.29 0.41 0.53 0.46 0.22
H0 or H1 H0 H0 H0 H0 H0 H0 H0 H0 H0
10 11 12 13 14 15 16 17 18
JB-Value 1.62 2.14 0.85 1.77 0.96 0.82 1.72 2.18 1.62
Crit-Value 4.94 4.98 4.93 4.92 4.91 4.90 4.91 4.88 4.94
P-Value 0.30 0.20 0.58 0.26 0.53 0.59 0.28 0.19 0.30
H0 or H1 H0 H0 H0 H0 H0 H0 H0 H0 H0

Table 5.4. Jarque-Bera test of normality at α = 0.05 for univariate residuals for lagged
factors

To investigate the presence of autocorrelation in the residuals a Durbin-Watson
test is performed. If the Durbin-Watson test statistic is close to 2, it indicates
that there is no autocorrelation in the residuals. As can be seen in table 5.5 all
test statistics group around 2 and it can be assumed that autocorrelation is not
present. It can be concluded from these two tests, and from checking that the
errors indeed have an average of zero, that the classical regression assumptions in
chapter 3.2 are fulfilled for the univariate models. For the multivariate models it
has not been verified that the assumptions hold, due to the large number
of models. Even if the assumptions are not fulfilled, OLS can still be used, but it
is not guaranteed that it is the best linear unbiased estimator.
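As an illustration, the residual diagnostics for a single univariate model could be run along the following lines in Matlab (a sketch with our own variable names, assuming the Statistics Toolbox functions mentioned in chapter 6 are available):

% Residual diagnostics for one univariate lagged regression
X    = [ones(size(x)) x];        % x : lagged factor, y : equity premium
res  = y - X * (X \ y);          % OLS residuals
[hJB, pJB] = jbtest(res, 0.05);  % H0: residuals are Gaussian
[pDW, dw]  = dwtest(res, X);     % H0: no autocorrelation in the residuals
qqplot(res);                     % visual check against normal quantiles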

Factor
1 2 3 4 5 6 7 8 9
DW-Value 1.83 2.10 2.02 1.88 2.10 2.19 2.09 2.09 2.16
P-Value 0.46 0.85 0.97 0.58 0.83 0.67 0.89 0.89 0.64
10 11 12 13 14 15 16 17 18
DW-Value 2.08 1.97 2.23 1.92 2.23 2.08 2.11 2.02 2.05
P-Value 0.92 0.82 0.57 0.67 0.56 0.95 0.81 0.91 0.98

Table 5.5. Durbin-Watson test of autocorrelation for univariate residuals for lagged
factors
Figure 5.3. QQ-plot of the one step lagged residuals for factors 1-9 versus the standard
normal pdf. Panels: (a) Dividend yield, (b) Price-earnings ratio, (c) Book value per share,
(d) Price-dividend ratio, (e) Earnings per share, (f) Inflation, (g) Fed funds rate,
(h) Short term interest rate, (i) Term spread short.

Figure 5.4. QQ-plot of the one step lagged residuals for factors 10-18 versus the standard
normal pdf. Panels: (a) Term spread long, (b) Credit spread, (c) Producer price,
(d) Industrial production, (e) Personal income, (f) Gross domestic product,
(g) Consumer sentiment, (h) Volatility, (i) Earnings-book ratio.

Figure 5.5. One step lagged factors 1-9 versus returns on the equity premium, outliers
marked with a circle. Panels as in figure 5.3.

Figure 5.6. One step lagged factors 10-18 versus returns on the equity premium, outliers
marked with a circle. Panels as in figure 5.4.

5.5 Forecasting by linear regression


When forecasting time series data by using regression there are two different ap-
proaches. The first possibility would be to estimate the regression equation using
all values of the dependent and the independent variables. When one wants to
take a step ahead in time, forecasted values for the independent variables have to
be inserted into the regression equation. In order to do this one must clearly be
able to forecast the independent variables, e.g. by assuming an underlying process,
and one has merely shifted the problem of forecasting the dependent variable to
forecasting the independent variables.

The second possibility is to estimate the regression equation using lagged inde-
pendent variables. If one wants to take one step ahead in time, then one would lag
the independent variables one step. This is illustrated in table 5.6, where τ is the
number of time lag steps. By inserting the most recent, unused, observations of the
independent variables in the regression equation you get a one step forecasted value for
the dependent variable. In fact, one could insert any of the unused observations of
the independent variables since it is already assumed that the regression equation
holds over time. However, economically, it is common practice to use the most
recent values since they probably contain more information about the future2 . It
is this second, lagged-variable approach that has been used in this thesis, as sketched
after table 5.6. Plots for the univariate one step lagged regressions are found in
figure 5.5 and figure 5.6.

Y Xi
yt ↔ xi,t−τ
yt−1 ↔ xi,t−τ −1
yt−2 ↔ xi,t−τ −2
.. ..
. .
yt−N ↔ xi,t−τ −N

Table 5.6. Principle of lagging time series for forecasting

2 This follows from the Efficient market hypothesis
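A minimal Matlab sketch of this lagged univariate forecast (our own illustration with hypothetical variable names, for a lag of τ = 1):

% y : equity premium series, x : one factor series (fractional changes), both T-by-1
tau  = 1;                                  % lag / forecast horizon in years
Y    = y(1+tau:end);                       % dependent variable y_t
X    = [ones(length(Y), 1) x(1:end-tau)];  % regressor x_{t-tau} plus intercept
beta = X \ Y;                              % OLS estimate of the lagged regression
yForecast = [1 x(end)] * beta;             % forecast from the most recent, unused, observation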



When a time series is regressed on other time series that are lagged, information
is generally lost, resulting in smaller values of R2, see table 5.7. This
does not need to be the case; sometimes lagged predictors provide a better R2.
This can be explained by the fact, also observed in table 5.7, that it takes time for some
predictors to have an impact on the dependent variable. For instance, a higher in-sample
R2 would have been obtained for factor 15, GDP, had its time series been lagged
one step. The realized change in GDP does a better job in forecasting
than in explaining that year's equity premium.

Factor Time lag


0 1 2 3 4 5
1 0.440 0.038 0.008 0.000 0.086 0.000
2 0.075 0.000 0.009 0.000 0.033 0.010
3 0.001 0.032 0.108 0.010 0.028 0.010
4 0.416 0.024 0.014 0.000 0.075 0.001
5 0.001 0.000 0.042 0.009 0.008 0.008
6 0.180 0.013 0.006 0.016 0.027 0.001
7 0.001 0.076 0.004 0.022 0.008 0.119
8 0.000 0.045 0.004 0.010 0.004 0.065
9 0.001 0.037 0.004 0.129 0.034 0.128
10 0.003 0.008 0.011 0.008 0.000 0.022
11 0.138 0.087 0.003 0.000 0.127 0.014
12 0.180 0.020 0.012 0.006 0.032 0.019
13 0.159 0.059 0.000 0.001 0.003 0.060
14 0.030 0.096 0.052 0.035 0.058 0.042
15 0.008 0.113 0.084 0.018 0.049 0.030
16 0.305 0.000 0.010 0.001 0.030 0.008
17 0.112 0.025 0.017 0.095 0.062 0.059
18 0.000 0.005 0.117 0.003 0.002 0.002

Table 5.7. Lagged R2 for univariate regression with the equity premium as dependent
variable
Chapter 6

Implementation

In this chapter it is explained how the theory from the previous chapters is imple-
mented, and techniques and solutions are highlighted. All code is presented in
appendix B.

6.1 Overview
The theory covered in the previous chapters is implemented using Matlab. To
make the program easy to use, a user interface in Excel is constructed. Figure 6.1
describes the communication between Excel, VBA and Matlab.

Figure 6.1. Flowchart


Figure 6.2. User interface

6.2 Linear prediction


The predictions are implemented using Matlab's backslash operator, which solves
equation systems of the form Xβ = y. Depending on the properties of X, different
factorizations are made in the call X\y. If X is not square, the call is executed by
first performing a factorization and the least squares estimate of β is calculated.
If X is square, then β = X\y is computed by Gaussian elimination. The backslash
operator never computes explicit inverses.

The Jarque-Bera test, Durbin-Watson test and the QQ-plots are generated us-
ing the following Matlab calls: jbtest, dwtest and qqplot.

In the multivariate prediction, combinations of the 18 factors are selected us-
ing the binary numbers from 1 to 2^18, where the ones symbolize factors included and
the zeros symbolize factors not included in the different models, as sketched below.
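A minimal Matlab sketch of this enumeration (our own illustration, not the appendix B routine):

% Enumerate all 2^18 factor subsets through their binary representation
nFactors = 18;
for m = 1:2^nFactors
    mask = bitget(m, 1:nFactors) == 1;   % which of the 18 factors enter model m
    Xm   = X(:, mask);                   % X holds all 18 lagged factor series
    % ... fit model m on [ones(size(Xm,1),1) Xm] and store its marginal likelihood ...
end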

Surveys on the equity premium have shown that the large majority of profes-
sionals believe that the premium is confined to 2-13% [65]. Therefore, models
yielding a negative value of the premium, or a value exceeding the historical mean
of the premium by 1.28σ, which corresponds to a 90% confidence interval, are not
used in the Bayesian model averaging and therefore do not influence the
final premium estimate at all. Setting the upper bound to 1.28σ rules out premia
larger than around 30%.

6.3 Bayesian model averaging


The Bayesian model averaging is straightforwardly implemented from the theo-
retical expression for the likelihood given in section 4.6, where g is set to be the
reciprocal of the number of samples. As can be seen in table 7.6, the three different
choices of g lead to almost the same results. The difficulty with the implemen-
tation lies in dealing with the large number of models, 2^18 ≈ 262000, in a
time efficient manner. This problem has been solved by implementing a routine
in C, called setSubColumn, that handles memory allocation more efficiently when
working with matrices close to the maximal allowed matrix size in Matlab. The
code is supplied in appendix B.

6.4 Backtesting
Since the HEP sometimes is negative while we do not allow for negative values
of the premium, traditional backtesting would not be a fair benchmark for the
performance of our prediction model. Instead we evaluate how well excess returns
are estimated by allowing for negative values. To further investigate the predictive
ability of our forecasting, an R2-out-of-sample statistic is employed. The statistic
is defined as

R^2_{OS} = 1 - \frac{\sum_{t=1}^{n}(r_t - \hat r_t)^2}{\sum_{t=1}^{n}(r_t - \bar r_t)^2},   (6.1)

where \hat r_t is the fitted value from the predictive regression estimated through t − 1
and \bar r_t is the historical average return, also measured through t − 1. If the statistic
is positive, then the predictive regression has a lower average mean squared error
than the historical average.1 Therefore, the statistic can be used to determine if a
model has better predictive performance than applying the historical average.

A measure called hit ratio (HR) can be used as an indication of how good the
forecast is at predicting the sign of the realized premium. It is simply the number of
times the forecast has the right sign divided by the length of the investigated
time period. For an investor this is of interest since the hit ratio can be used as a
buy-sell signal on the underlying asset. In the case of the equity premium, this is
a biased measure since the long-term average of the HEP is positive.
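The two statistics could be computed along these lines in Matlab (a sketch with our own variable names):

% r : realized excess returns, rhat : model forecasts, rbar : historical averages,
% the latter two estimated through t-1 over the backtest period
R2os = 1 - sum((r - rhat).^2) / sum((r - rbar).^2);   % out-of-sample R^2, eq. (6.1)
HR   = mean(sign(rhat) == sign(r));                   % hit ratio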

An interesting question is whether the next year's predicted value will be realized within
the coming business cycle, here approximated as five years and called the forward
average. This value is calculated as a benchmark along with a five-year rolling av-
erage, here called the backward average. The results from the backtest are presented in
the results section.

1 This statistic is further investigated by Campbell and Thompson [16]


Chapter 7

Results

In this chapter we present our forecasts of the equity premium along with the
results from the backtest.

7.1 Univariate forecasting


In figure 7.1 the historical equity premium is prolonged with the estimated equity
premia for the five years ahead and plotted over time. The models used are univariate
and hence each model consists of only one factor, giving 18 models in total.

The figures for the forecasted premia are displayed in table 7.1. Models not belong-
ing to the set specified in chapter 6 are not taken into consideration. In table 7.1
the labels Prediction Range and Mean refer to the range of the predicted values
and to the mean of these predicted values. Note that the Mean corresponds to the
prior beliefs. ξk is the estimate of the premium using Bayesian model averaging.
The variance and a confidence interval for this estimate are also presented.

Time step   Prediction Range   Mean   ξk   Sk   I0.90
Dec-08 0.00 - 16.0 3.69 4.20 15.27 0.58 - 7.83
Dec-09 0.00 - 14.4 2.36 3.07 15.29 -0.60 - 6.74
Dec-10 0.00 - 14.0 2.54 3.54 15.28 -0.17 - 7.24
Dec-11 0.00 - 15.1 2.94 4.84 15.30 1.08 - 8.59
Dec-12 0.00 - 8.9 3.36 4.05 15.34 0.25 - 7.85

Table 7.1. Forecasting statistics in percent


Figure 7.1. The equity premium from the univariate forecasts

In table 7.2 the factor constituting the univariate model with the highest probability
at each time step is presented. The factors are further explained in chapter 5. Note that
the prior assumption about model probabilities is 1/18 ≈ 5.5 percent for each
model.

Time step   Factor                    Pr(Mi)
1           Gross Domestic Product    6.47
2           Gross Domestic Product    7.38
3           Term Spread Short         8.19
4           Volatility                9.23
5           Term Spread Short         6.96

Table 7.2. The univariate model with highest probability over time

Figure 7.2 shows how the likelihood function changes for different g-values for
each one step lagged predictor. Table 7.3 shows results from the backtest. The
R2_OS statistic shows that the univariate prediction model has better predictive
performance than applying the historical average for the period 1991 to 1999.

Figure 7.2. Likelihood function values for different g-values

The hit ratio statistic, HR, shows how often the univariate predictions have the
right sign, that is, if the premium is positive or negative. Mind that we allow for
negative premium values when applying the HR statistic.

Pred. step    1      2      3      4      5
R2_OS,uni     0.21   0.26   0.23   0.05   0.14
HR_uni        0.6    0.2    0      0.6    0.2

Table 7.3. Out of sample R2_OS,uni and hit ratios HR_uni

7.2 Multivariate forecasting


The corresponding results from the multivariate predictions are presented below in
figure 7.3. As in the univariate case, no negative values are allowed and the upper
limit from chapter 6 is used. In table 7.4 the labels Prediction Range and Mean
refer to the range of the predicted values and to the mean of these predicted values.
Note that the Mean corresponds to the prior beliefs. ξk is the estimate of the
premium using Bayesian model averaging.

Figure 7.3. The equity premium from the multivariate forecasts

Time step   Prediction Range   Mean   ξk   Sk   I0.90
Dec-08 0.00 - 21.4 3.18 7.72 16.6 3.79 - 11.7
Dec-09 0.00 - 21.7 1.48 7.97 16.7 4.01 - 11.9
Dec-10 0.00 - 21.4 5.07 10.4 16.6 6.45 - 14.3
Dec-11 0.00 - 21.7 4.26 10.2 16.7 6.30 - 14.2
Dec-12 0.00 - 16.0 0.58 3.74 17.7 -0.21 - 7.70

Table 7.4. Forecasting statistics in percent



Time Factor Pr(Mi )


step
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1 • • • • • 0.001
2 • • • • • • • • 0.002
3 • • • • • • • 0.0009
4 • • • • • • 0.001
5 • • 0.003

Table 7.5. The multivariate model with highest probability over time

In table 7.5 the factors constituting the multivariate models with the highest proba-
bilities over time are presented. The factors are discussed in chapter 5. Note that
the prior assumption about the model probabilities is 1/2^18 ≈ 0.00038 percent
for each model.

Time horizon   g = 1/n   g = k^(1/(1+k))/n   g = k/n


Dec-08 7.7236 7.7274 7.8047
Dec-09 7.9769 7.9786 7.9509
Dec-10 10.384 10.340 10.568
Dec-11 10.248 10.251 10.344
Dec-12 3.7434 3.7433 3.7688

Table 7.6. Forecasts for different g-values

Table 7.6 depicts how the predicted values are influenced by the three choices of g.
In the univariate case, the three choices coincide. Table 7.7 shows results from the
backtest. The R2_OS statistic shows that also the multivariate prediction model has
better predictive performance than applying the historical average for the period
1991 to 1999. The hit ratio statistic, HR, shows how often the multivariate pre-
dictions have the right sign, that is, if the premium is positive or negative. Once
again, we allow for negative premium values when applying the HR statistic.

Pred. step    1      2       3      4      5
R2_OS,mv      0.23   -0.10   0.20   0.47   0.60
HR_mv         0.6    0.4     0.6    0.8    0.6

Table 7.7. Out of sample R2_OS,mv and hit ratios HR_mv

7.3 Results from the backtest


In figures 7.4 and 7.5 our forecasts are compared with a backward average, a forward
average and the HEP. An average of the forecasts is also compared to a forward
average. The backtest is explained in chapter 6.4 and further discussed in the next
chapter.

Figure 7.4. Backtest of univariate models. Panels: (a)-(e) univariate backtest for the
1 to 5 year horizons, (f) the 1991:1995 forecasts compared with the forward average.

Figure 7.5. Backtest of multivariate models. Panels: (a)-(e) multivariate backtest for the
1 to 5 year horizons, (f) the 1991:1995 forecasts compared with the forward average.


Chapter 8

Discussion of the Forecasting

In chapter 6.3 we specified the value of g to be used in this thesis as the reciprocal
of the number of samples. For the sake of completeness, we have presented the
outcome of the two other values of g in table 7.6. Apparently, the chosen value
of g has the most impact on the 1-year horizon forecast and a decreasing impact on
the other horizons. This can be explained by the rapidly decreasing forecasting
performance of the covariance matrix for time lags above one, which in turn can
be motivated by table 5.7 showing decreasing R2-values over time. In figure 7.2
the general appearance of the likelihood function for the factors and different
g-values can be seen. As explained earlier, increasing the value of
g gives models with good adaptation to data a higher likelihood, while setting g
to zero yields the same likelihood for all models. For large g-values, only models
with a high degree of explanation will have impact in the BMA and you have great
confidence in your data. On the other hand, a decrease of g allows for more un-
certainty to be taken into account.

Turning to the model criteria formulated in chapter 2.7, it is found that most of
the criteria are fulfilled. The equity premium over the five-year horizon is positive,
due to our added constraints; however, the confidence interval for the premium
includes zero at some horizons.

The time variation criterion is not fulfilled in the sense that the regression line
does not change considerably as new data points become available. The amount
of data used is a tradeoff between stability and incorporating the latest trend. The
conflict lies in the confidence in the predictors. Using many data samples improves
the precision of the predictors, but the greater the difference between the time to be
predicted and that of the oldest samples, the more doubtful are the implications
of the old samples.

The smoothness of the estimates over time is questionable; our five-year pre-
dictions in the univariate case are rather smooth whereas the multivariate forecasts
exhibit greater fluctuations. Given the appearance of the realized equity premium
until December 2007, which is strongly volatile, and that a multivariate model can
explain more variance, it is reasonable that a multivariate model would generate
results more similar to the input data, just as can be observed in the multivariate
case, figure 7.3.

The time structure of the equity premium is not taken into consideration be-
cause the one-year yield, serving as the riskfree asset, does not alone account for
the term structure.

Since all predictions suffer from an error it is important to be aware of the qual-
ity of the predictions. Our precision estimate takes the misfit of the models into
account and therefore it says something about the uncertainty in our predictions.
However, this precision does not say anything about the relevancy of using old
data to forecast future values.

From the R2-values in table 5.7 it can be seen that there is some predictive
ability at hand, even though it is small. Further evidence of predictability is the
deviation of the posterior probabilities from the prior probabilities. If there were no
predictability at hand, why would the posterior probability then differ from the
prior probability? The mean in table 7.1 and table 7.4 corresponds to using
the prior belief that all models have the same probability; the BMA estimate is
never equal to the mean.

The univariate predictors with the highest probability in each time step, table
7.2, also enter the models with the highest probability in table 7.5, except for GDP,
which is not a member of the multivariate model for the first time step. This
can be summarized as the factors GDP, term spread short and volatility being
important in the forecast for the next five years.

Having seen evidence of predictive ability, the question is now to what extent
it can be used to produce accurate forecasts.

Backtesting our approach is not trivial, mainly because we cannot access the
historical expected premium. Nevertheless, backtesting has been performed by
doing a full five-year horizon forecast starting in each year between 1991 and 1995
respectively and then comparing the point forecasts with the realized historical
equity premium for each year. Here, no restrictions are imposed on the forecasts,
i.e. negative excess returns are allowed. The results are presented in figure 7.4 and
figure 7.5 where each plot corresponds to a time step (1, 2, 3, 4 or 5 years). These
plots have also been complemented with the realized excess returns, as well as the
five-year backward and the five-year forward average. In figure 7.4 f and figure
7.5 f , the arithmetic average of the full five-year horizon forecast is compared to
the five-year forward average.

The univariate backtest shows that the forecast intervals at most capture 2 out
of 5 HEPs, this at the one and two-year horizons. Otherwise, the forecasts tend
to be far too low in comparison with the HEP. The number of times the HEP
intersects with the forecasted intervals is at most 2, at the two-year
horizon, figure 7.4 b. In general, the univariate forecasts do not seem to be flexible
enough to fit the sometimes vast changes in the HEP and are far too low. The
backtest has not provided us with any evidence of forecasting ability. However,
when the forecast constraint is imposed, the predictive ability from 1991-1995 is
superior to using the historical average. This can be seen from the R2-statistics
in table 7.3. The four and five-year horizon forecasts, figure 7.4 d-e, capture
2 out of 5 forward averages, whereas the one-year horizon captures 3 backward
averages. In figure 7.4 f it can be seen that averaging the forecasts does not give
a better estimate of the forward average. From table 7.3 it can be seen that the
hit-ratios for the one and four-year horizons stand out, both scoring 60 %. The
results from the univariate backtest have shown that the best forecasts were ob-
tained for the one and four-year horizons, neither of which has good forecast quality.

The multivariate backtest shows little sign of forecasting ability for our model.
The number of times the HEP intersects with the forecasted interval is at most 3
out of 5 times. This happens at the three and four-year horizons, figure 7.5 c and d;
these are also the forecasts following the evolution of the HEP most closely.
The four-year forecast depicts the change of the HEP the best, being correct 3 out
of 4 times, however never getting the actual figures correct. The two and four-year
forecasts capture the forward average the best; 2 out of 5 forecasted intervals are
realized on average over the next 5 years. From figure 7.5 f, the only conclusion
that can be drawn is that averaging our forecast for each time step does not pro-
vide a better estimate of the forward average. The R2-values in table 7.7 show
signs of forecasting ability in comparison with the historical average at all time
steps except for the two-year horizon, with the four and five-year horizon forecasts
standing out. The most significant hit-ratio is 80%, at the four-year horizon. In
conclusion, the backtesting in the multivariate case has shown that for the test
period the best results in all terms have been obtained for the four and five-year
horizons, in particular the four-year horizon.

Summing up the results from the univariate and multivariate backtests, it can-
not be said that the quality of the multivariate forecasts outperforms the quality
of the univariate estimates when looking at the R2-values and hit-ratios. However,
the multivariate forecasts as such depict the evolution of the true excess returns
in a better way. Contrary to what one could believe, the one-year horizon fore-
casts do not generate better forecasts than the other horizons. In fact, the best
estimates are provided by the 4-year forecasts, both in the univariate and the mul-
tivariate case. Still, we recommend using the one-year horizon forecasts because they
have the smallest time lag and therefore use more recent data. Furthermore, the
result that the forecast power of multi-factor models is better than that of a forecast
based on the historical average is in line with Campbell and Thompson's findings
[16].
Part II

Using the Equity Premium in Asset Allocation
Chapter 9

Portfolio Optimization

In modern portfolio theory it is assumed that expected returns and covariances are
known with certainty. Naturally, this is not the case in practice - the inputs have
to be estimated and with this follow estimation errors. Errors in the estimates
have great impact on the optimal allocation weights in a portfolio; therefore it is
of great interest to have as accurate forecasts of the input parameters as possible,
which has been dealt with in part I of this thesis. Even if you have good estimates of
the input parameters, estimation errors will still be present; they are just smaller.
In this chapter we discuss and present the impact of estimation errors in portfolio
optimization.

9.1 Solution of the Markowitz problem


The Markowitz problem is the foundation for single-period investment theory and
relates the trade-off between expected rate of return and variance of the rate of
return in a portfolio of risky assets. [52]

The model of Markowitz assumes that investors are only concerned about
the mean, the variance and the correlation of the portfolio assets. A portfolio is
said to be "efficient" if there is no other portfolio with the same expected return
but with a lower risk, or if there is no other portfolio with the same risk, but with
a higher expected return. [54] An investor who seeks to minimize risk (standard
deviation) always chooses the portfolio with the smallest standard deviation for a
given mean, i.e. he is risk averse. An investor who, for a given standard deviation,
wants to maximize the expected return is said to have the property of nonsatiation.
An investor being risk averse and nonsatiated at the same time will always choose a
portfolio on the efficient frontier, which is made up of the set of efficient portfolios.
[52] The portfolio on the efficient frontier with the lowest standard deviation is
called the minimum variance portfolio (MVP).

Given the number of assets n in the portfolio, the statistical properties of
the Markowitz problem can be described by the average return vector µ ∈ R^{n×1}, the
covariance matrix C ∈ R^{n×n} and the asset weight vector w ∈ R^{n×1}. The mathematical
formulation of the Markowitz problem is now given as

min_w   w^\top C w
s.t.    \mu^\top w = \bar\mu
        1^\top w = 1,   (9.1)

where 1 is a column vector of ones. The first constraint says that the weights
and their corresponding returns have to equal the desired return level. The sec-
ond constraint means that the weights have to add up to one. Note that in this
formulation, the signs of the weights are not restricted, short selling is allowed.

Following Zagst [66] the solution to problem (9.1) is given in theorem 9.1.

Theorem 9.1 (Solution of the Markowitz problem) If C is positive definite,
then according to theorem A.1, C is invertible and its inverse is also positive def-
inite. Further, denote
• a = 1^\top C^{-1}\mu
• b = \mu^\top C^{-1}\mu
• c = 1^\top C^{-1} 1
• d = bc - a^2.
The optimal solution of problem (9.1) is given as

w^* = \frac{1}{d}\left((c\bar\mu - a)\, C^{-1}\mu + (b - a\bar\mu)\, C^{-1} 1\right)   (9.2)

with

\sigma^2(\bar\mu) = w^{*\top} C w^* = \frac{c\bar\mu^2 - 2a\bar\mu + b}{d}.   (9.3)

The minimum variance portfolio, denoted w_{MVP}, is given as

w_{MVP} = \frac{1}{c}\, C^{-1} 1   (9.4)

and is located at

(\mu_{MVP}, \sigma_{MVP}) = \left(\frac{a}{c},\ \sqrt{\frac{1}{c}}\right).   (9.5)

Finally, the minimum variance set is given as

\bar\mu = \mu_{MVP} \pm \sqrt{\frac{d}{c}\left(\sigma^2 - \sigma_{MVP}^2\right)},   (9.6)

where the positive case corresponds to the efficient frontier, since it dominates the
negative case. \sigma_{MVP}^2 sets the lower bound for possible values of \sigma^2.

Proof (following [66]): Since C^{-1} is positive definite it holds that

b = \mu^\top C^{-1}\mu > 0   (9.7)

and also that

c = 1^\top C^{-1} 1 > 0.   (9.8)

With the scalar product \langle 1, \mu\rangle \equiv 1^\top C^{-1}\mu (see theorem A.1) and the Cauchy-Schwarz inequality it
follows that

\langle 1, \mu\rangle^2 = (1^\top C^{-1}\mu)^2 = a^2 \leq \langle 1, 1\rangle\langle \mu, \mu\rangle = (1^\top C^{-1} 1)(\mu^\top C^{-1}\mu) = bc

and, for \mu \neq k\cdot 1, that

d = bc - a^2 > 0.   (9.9)

Furthermore, the Lagrangian for problem (9.1) is given as

L(w, u) = \tfrac{1}{2} w^\top C w + u_1(\bar\mu - \mu^\top w) + u_2(1 - 1^\top w),   (9.10)

where the objective function has been multiplied with the factor 1/2 for convenience
only. w^* is optimal if there exists a u = (u_1, u_2)^\top \in R^2 that satisfies the Kuhn-
Tucker conditions

\frac{\partial L}{\partial w_i}(w^*, u) = \sum_{j=1}^{n} c_{i,j} w_j^* - u_1\mu_i - u_2 = 0, \quad \forall i   (9.11)

\frac{\partial L}{\partial u_1}(w^*, u) = \bar\mu - \mu^\top w^* = 0   (9.12)

\frac{\partial L}{\partial u_2}(w^*, u) = 1 - 1^\top w^* = 0.   (9.13)

(9.11) \Leftrightarrow C w^* = u_1\mu + u_2 1 \Leftrightarrow w^* = u_1 C^{-1}\mu + u_2 C^{-1} 1   (9.14)

(9.13) \& (9.14) \Rightarrow 1^\top w^* = u_1\, 1^\top C^{-1}\mu + u_2\, 1^\top C^{-1} 1 = a u_1 + c u_2 = 1   (9.15)

(9.12) \& (9.14) \Rightarrow \mu^\top w^* = u_1\, \mu^\top C^{-1}\mu + u_2\, \mu^\top C^{-1} 1 = b u_1 + a u_2 = \bar\mu   (9.16)

(9.15) \& (9.16) \Leftrightarrow \underbrace{\begin{pmatrix} a & c \\ b & a \end{pmatrix}}_{\equiv A} \underbrace{\begin{pmatrix} u_1 \\ u_2 \end{pmatrix}}_{\equiv u} = \begin{pmatrix} 1 \\ \bar\mu \end{pmatrix}   (9.17)

Calculate the inverse of A as

A^{-1} = \frac{1}{\det(A)}\begin{pmatrix} a & -c \\ -b & a \end{pmatrix} = \frac{1}{a^2 - bc}\begin{pmatrix} a & -c \\ -b & a \end{pmatrix} = \frac{1}{d}\begin{pmatrix} -a & c \\ b & -a \end{pmatrix},   (9.18)

where d is greater than zero, see (9.9). Using (9.17) and (9.18) yields

u = A^{-1}\begin{pmatrix} 1 \\ \bar\mu \end{pmatrix} = \frac{1}{d}\begin{pmatrix} c\bar\mu - a \\ b - a\bar\mu \end{pmatrix}.   (9.19)

By inserting (9.19) into (9.14), equation (9.2), the optimal weights, are found:

w^* = u_1 C^{-1}\mu + u_2 C^{-1} 1 = \frac{1}{d}\left((c\bar\mu - a)\, C^{-1}\mu + (b - a\bar\mu)\, C^{-1} 1\right).   (9.20)

Equation (9.3) follows by

\sigma^2(\bar\mu) = w^{*\top} C w^* \overset{(9.14)}{=} u_1\, \mu^\top w^* + u_2\, 1^\top w^* \overset{(9.12)\&(9.13)}{=} u_1\bar\mu + u_2   (9.21)
\overset{(9.19)}{=} \frac{1}{d}\left((c\bar\mu - a)\bar\mu + (b - a\bar\mu)\right) = \frac{c\bar\mu^2 - 2a\bar\mu + b}{d},   (9.22)

which has its minimum for

\frac{\partial\sigma^2(\bar\mu)}{\partial\bar\mu} = \frac{1}{d}(2c\bar\mu - 2a) = 0 \;\Rightarrow\; \mu_{MVP} = \frac{a}{c},   (9.23)

since the second partial derivative is positive:

\frac{\partial^2\sigma^2(\bar\mu)}{\partial\bar\mu^2} = \frac{2c}{d} \overset{(9.8)\&(9.9)}{>} 0.   (9.24)

(9.23) and (9.3) result in

\sigma_{MVP} = \sqrt{\sigma^2(\mu_{MVP})} = \sqrt{\frac{c(a/c)^2 - 2a(a/c) + b}{d}} = \sqrt{\frac{1}{c}},   (9.25)

where c is positive, see (9.8). Together with (9.23) this gives equation (9.5), the
location of the minimum variance portfolio,

(\mu_{MVP}, \sigma_{MVP}) = \left(\frac{a}{c},\ \sqrt{\frac{1}{c}}\right).

The weights of the minimum variance portfolio, equation (9.4), are found as follows:

w_{MVP} \overset{(9.20)}{=} \frac{1}{d}\left((c\mu_{MVP} - a)\, C^{-1}\mu + (b - a\mu_{MVP})\, C^{-1} 1\right)
        \overset{(9.23)}{=} \frac{1}{d}\left((c\tfrac{a}{c} - a)\, C^{-1}\mu + (b - a\tfrac{a}{c})\, C^{-1} 1\right)
        \overset{(9.9)}{=} \frac{1}{c}\, C^{-1} 1.   (9.26)

Finally, the efficient frontier in equation (9.6) is found by defining \sigma \equiv \sigma(\bar\mu):

\sigma^2 \overset{(9.22)}{=} \frac{c\bar\mu^2 - 2a\bar\mu + b}{d}
\;\Leftrightarrow\;
\frac{d}{c}\sigma^2 = \bar\mu^2 - 2\frac{a}{c}\bar\mu + \frac{b}{c} = \left(\bar\mu - \frac{a}{c}\right)^2 - \frac{a^2}{c^2} + \frac{b}{c}
\overset{(9.9)\&(9.23)}{=} (\bar\mu - \mu_{MVP})^2 + \frac{d}{c}\frac{1}{c}
\overset{(9.25)}{=} (\bar\mu - \mu_{MVP})^2 + \frac{d}{c}\sigma_{MVP}^2

\Leftrightarrow\; (\bar\mu - \mu_{MVP})^2 = \frac{d}{c}\left(\sigma^2 - \sigma_{MVP}^2\right)
\;\Leftrightarrow\; \bar\mu = \mu_{MVP} \pm \sqrt{\frac{d}{c}\left(\sigma^2 - \sigma_{MVP}^2\right)}. \qquad \square
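As a numerical illustration of theorem 9.1, the closed-form weights and frontier quantities can be computed with a few lines of Matlab (a sketch with our own variable names, assuming mu, C and a target return mubar are given):

% Closed-form Markowitz solution for a target return mubar (shorting allowed)
Cinv = inv(C);
e    = ones(length(mu), 1);
a = e' * Cinv * mu;   b = mu' * Cinv * mu;
c = e' * Cinv * e;    d = b*c - a^2;
wOpt   = ((c*mubar - a) * Cinv * mu + (b - a*mubar) * Cinv * e) / d;   % (9.2)
sigma2 = (c*mubar^2 - 2*a*mubar + b) / d;                              % (9.3)
wMVP   = Cinv * e / c;                                                 % (9.4)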

If shorting were not allowed, a constraint for nonnegative portfolio weights would
have to be added to problem (9.1). The problem formulation would then be

min_w   w^\top C w
s.t.    \mu^\top w = \bar\mu
        1^\top w = 1
        w \geq 0.   (9.27)

This optimization problem is quadratic just as problem (9.1), but in contrast it can
not be reduced to a set of linear equations due to the added inequality constraint.
Instead, an iterative optimization method has to be used for finding the optimal
weights. The problem is solved by making the call quadprog in Matlab. The
function solves quadratic optimization problems by using active set methods3 .
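A minimal sketch of such a call (our own illustration; quadprog expects the objective on the form (1/2)w'Hw + f'w):

% Long-only Markowitz portfolio for a target return mubar
n    = length(mu);
H    = 2 * C;                 % quadprog minimizes (1/2)*w'*H*w + f'*w
f    = zeros(n, 1);
Aeq  = [mu'; ones(1, n)];     % mu'*w = mubar and 1'*w = 1
beq  = [mubar; 1];
lb   = zeros(n, 1);           % w >= 0, no short selling
wOpt = quadprog(H, f, [], [], Aeq, beq, lb, []);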

9.2 Estimation error in Markowitz portfolios


The estimated parameters, mean and covariance, used in Markowitz-based port-
folio construction are often based on calculations on just one sample set from the
return history. Input parameters derived from this sample set can only be ex-
pected to equal the parameters of the true distribution if the sample is very large
and the distribution is stationary. If the distribution is non-stationary it could
be advisable to instead use a smaller sample for estimating the parameters. We
now can distinguish between two types of origination for the estimation error -
stationary but too short data set or non-stationary data. [61] In this part of the
thesis we will focus on estimation error originating from stationary but too short
data sets.

Solving problem (9.27) for a given data set, where the means and covariances
have been estimated on historical data, would generate portfolios that exhibit
very different allocation weights. Some assets tend to never enter the solution as
well. This is a natural result from solving the optimization problem - the assets
with very attractive features dominate the solution. It is also here the estimation
errors are likely to be large, which means that the impact of estimation errors
on portfolio weights is maximized. [61] This is an undesired property of portfolio
optimization that has been known for a long time [56]. Since the input parameters
are treated as if they were known with certainty, even very small changes in them
will trace out a new efficient frontier. The problem gets even worse as the number
of assets increases, because this increases the probability of outliers. [61]

3 This is further explained in [35]



9.3 The method of portfolio resampling


Section 9.2 presented the problems with estimation errors in portfolio optimiza-
tion due to treating input parameters as certain. A Monte Carlo approach called
“Portfolio Resampling” has been introduced by Michaud [56] to deal with this. The
basic idea is to allow for uncertainty in the input parameters by sampling from
a distribution with parameters specified by estimates on historical data. Fabozzi
[26] has summarized the procedure and it is described below.

Algorithm 9.1 (Portfolio resampling)

1. Estimate the mean vector, µ̂, and covariance matrix, Σ̂, from historical data.
2. Draw T random samples from the multivariate distribution N(µ̂, Σ̂) and use them to
   estimate µ̂_i and Σ̂_i.
3. Calculate an efficient frontier from the input parameters from step 2 over the
   interval [σ_{MVP,i}, σ_{MAX}], which is partitioned into M equally spaced points.
   Record the weights w_{1,i}, . . . , w_{M,i}.
4. Repeat steps 2 and 3 a total of I times.
5. Calculate the resampled portfolio weights as w̄_M = (1/I) Σ_{i=1}^{I} w_{M,i} and
   evaluate the resampled frontier with the mean vector and covariance matrix from
   step 1.

The number of draws T corresponds to the uncertainty in the inputs: as the number of
draws increases, the dispersion decreases and the estimation error, the difference
between the originally estimated input parameters and the sampled input parameters,
becomes smaller. [61] Typically, the value of T is set to the length of the historical
data set [61] and the value of I is set between 100 and 500 [26]. The number of
portfolios M can be chosen freely according to how finely the efficient frontier
should be depicted.
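A minimal Matlab sketch of Algorithm 9.1 is given below. It is only an illustration under assumed inputs: it grids the frontier over target returns, as the full implementation in Appendix B.8 does, rather than over the volatility interval in step 3, and it uses the solveQuad routine from Appendix B.9.

    % Sketch of portfolio resampling, assuming muHat (1 x n row vector),
    % SigmaHat (n x n) and the integers T, I and M are given.
    n    = length(muHat);
    wBar = zeros(n, M);                          % accumulated frontier weights
    for i = 1:I
        R      = mvnrnd(muHat, SigmaHat, T);     % step 2: draw T return samples
        muI    = mean(R);                        % resampled mean vector
        SigmaI = cov(R);                         % resampled covariance matrix
        targets = linspace(min(muI), max(muI), M);
        for m = 1:M                              % step 3: frontier for draw i
            wBar(:, m) = wBar(:, m) + solveQuad(muI, SigmaI, n, targets(m));
        end                                      % step 4: repeat for all I draws
    end
    wBar = wBar / I;                             % step 5: average the weights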

The new resampled frontier will appear below the original one. This follows from
the weights w1,i , . . . , wM,i being optimal relative to µ̂i and Σ̂i but inefficient rela-
tive to the original estimates µ̂ and Σ̂. Therefore, the resampled portfolio weights
are also inefficient relative to µ̂ and Σ̂. By the sampling and reestimation that
occurs at each step in the portfolio resampling process, the effect of estimation
error is incorporated in the determination of the resampled portfolio weights. [26]

9.4 An example of portfolio resampling


A portfolio consisting of 8 different assets has been constructed. The assets are:
a world commodity index; equity in the emerging markets, the US and Germany; bonds
in the emerging markets, the US and Germany; and finally a real estate index. Their
mean vector and covariance matrix have been estimated on data from 2002-2006 and can
be found in table 9.1.

Bloomberg   Asset    Mean    Covariance
Ticker                       Cmdty   EQEM    EQUS    EQDE    BDEM    BDUS    BDDE    Estate
SPGCCITR    Cmdty     0.57   0.21
NDLEEGF     EQEM      0.08   0.32    0.21
INDU        EQUS     -0.05   0.17    0.18    0.05
DAX         EQDE     -0.08   0.31    0.30    0.64    0.07
JGENGLOG    BDEM     -0.01   0.01    0.00   -0.01    0.01    0.09
JPMTUS      BDUS      0.01  -0.03   -0.03   -0.08    0.01    0.03    0.06
JPMTWG      BDDE      0.01  -0.02   -0.02   -0.05    0.01    0.02    0.01    0.05
G250PGLL    Estate   -0.05   0.12    0.10    0.13    0.01   -0.01    0.00    0.19    0.10

Table 9.1. Input parameters for portfolio resampling (the covariance matrix is given by its lower triangle)

With the input parameters from table 9.1 a portfolio resampling has been carried out,
both with and without shorting allowed and always with errors in both the means and
the covariances. In figure 9.1 the resampled efficient frontiers are depicted, and in
figures 9.2 and 9.3 the portfolio allocations are found. Finally, the impact of errors
in the means and in the covariances, respectively, is displayed in figure 9.4.

9.5 Discussion of portfolio resampling

As discussed earlier the resampled frontier will plot below the efficient frontier,
just as in figure 9.1 b. However, when shorting is allowed the resampled frontier
coincides with the efficient frontier. Why is that? Estimation errors should result
in an increase in portfolio risk, showing up as a higher volatility for each return
level. Instead it can only be seen that the estimation errors result in a shortening
of the frontier. The explanation given by Scherer [61] is that highly positive returns
will be offset by highly negative returns when drawing from the original distribution.
The quadratic programming optimizer will invest heavily in the asset with highly
positive returns and short the asset with highly negative returns, and on average
these positions offset each other. When the long-only constraint is added, this is no
longer the case and the resampled frontier plots below the efficient frontier,
figure 9.1 b.

As a result of the above, the resampled portfolio weights when shorting is allowed
will be largely the same as those in the efficient portfolios; most of the assets
enter the solution in the same way, as depicted in figure 9.2 b. When shorting is no
longer allowed, the allocations in the efficient portfolios are concentrated in only
a few assets, and a small shift in the desired return level can lead to rather
different allocations, e.g. going from portfolio 6 to 7 in figure 9.3. The resampled
portfolios, on the other hand, exhibit a much smoother transition between return
levels and a greater diversification.

In the resampling, estimation errors have been assumed in both the means and the
covariances. In figure 9.4 the effect of estimation errors in only the means or only
the covariances can be observed. It is found that estimation errors in the means have
a much greater impact than estimation errors in the covariances. A good forecast of
the mean will therefore improve the resulting allocations a great deal.

The averaging in the portfolio resampling method ensures that the weights still sum
to one, which is important. But averaging can sometimes be misleading. For instance,
the allocation weights for a given portfolio may be heavily influenced by a few lucky
draws, making an asset look more attractive than is justifiable. Averaging is indeed
the main idea behind portfolio resampling, but it is not desirable that the final
averaged portfolio weights depend on a few extreme outcomes. This criticism is
discussed by Scherer [61]. The most important criticism, however, also presented by
Scherer [61], is that all resamplings are derived from the same mean vector and
covariance matrix. Because the true distribution is unknown, all resampled portfolios
deviate from the true parameters in much the same way, and averaging does not help in
this respect. Therefore it is fair to say that all portfolios inherit the same
estimation error.

It is found by Michaud [56] that resampled portfolios beat Markowitz portfolios
out-of-sample. However, well diversified portfolios tend to beat Markowitz portfolios
out-of-sample in general, so this cannot be ascribed solely to the portfolio
resampling method itself. Although the resampling heuristic has some major drawbacks,
it remains interesting since it is a first step towards addressing estimation errors
in portfolio optimization.
Figure 9.1. Comparison of efficient and resampled frontier: (a) shorting allowed, (b) no shorting allowed.

Figure 9.2. Resampled portfolio allocation when shorting allowed: (a) resampled weights, (b) mean-variance weights.

Figure 9.3. Resampled portfolio allocation when no shorting allowed: (a) resampled weights, (b) mean-variance weights.

Figure 9.4. Comparison of estimation error in mean and covariance: (a) errors in mean, (b) errors in covariance.


Chapter 10

Backtesting Portfolio
Performance

In the first part of this thesis we developed a method for forecasting the equity
premium that took model uncertainty into account. It was found that our forecast
outperformed the use of an historical average but was associated with estimation
errors. In the previous chapter we presented portfolio resampling as a method for
dealing with these errors. In this chapter we will evaluate if portfolio resampling
can be used to improve our forecasting results.

10.1 Backtesting setup and results


We benchmark the performance of a portfolio consisting of all the assets found in
table 9.1, except for equity and bonds from the emerging markets, using our forecasted
equity premium and portfolio resampling. The time series for the two emerging market
assets were too short to be included.

Starting at the end of 1998 and going to the end of 2007 we solve problem (9.27) and
rebalance the portfolio at the end of each year. We do not allow for short-selling,
since it was previously found that portfolio resampling only has an effect under the
long-only constraint. Transaction costs are not taken into account, since our concern
is the relative performance of the methods. The return vector, µ, is forecasted using
the arithmetic average of the returns up to time t for each asset, except for US
equity where we use our one-year multivariate forecast of the equity premium for
time t. The parameter µ̄ is set so that each portfolio has a volatility of
√0.02 ≈ 14% when rebalanced. The covariance matrix is always estimated on all returns
available up to time t. The resulting portfolio value over time is found in figure
10.1 and the corresponding returns in table 10.1. In table 10.2 the exact portfolio
values on the end date for ten resampling simulations are presented.
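For concreteness, a minimal sketch of one rebalancing step under these choices is shown below; the variable names are assumptions made here, and the full backtest loop, including the resampling variant, is listed in Appendix B.8.

    % One hypothetical rebalancing step: trace a long-only frontier and pick
    % the portfolio closest to the target variance 0.02 (about 14% volatility).
    % histMean (row vector), histCov, nrAssets and nrPortfolios are assumed given.
    volDesired = 0.02;
    targets = linspace(min(histMean), max(histMean), nrPortfolios);
    for z = 1:nrPortfolios
        w(:, z)    = solveQuad(histMean, histCov, nrAssets, targets(z));
        portVar(z) = w(:, z)' * histCov * w(:, z);
    end
    [minVar, mvpNr]  = min(portVar);                      % locate the minimum variance portfolio
    [tmpMin, pickNr] = min(abs(portVar(mvpNr:end) - volDesired));
    wChosen = w(:, pickNr + mvpNr - 1);                   % weights held until the next rebalancing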


Figure 10.1. Portfolio value over time using different strategies

It is found that using our premium forecasts as input yields better performance
than just employing the historical average1 . Our forecast consistently generates
the highest portfolio value. As explained earlier, using accurate inputs in portfolio
optimization is very important.

Date EEP EEP&PR aHEP aHEP&PR


Dec-99 33.4 32.8 24.4 27.2
Dec-00 -3.6 -2.6 -2.8 -1.2
Dec-01 -17.1 -18.3 -17.1 -18.9
Dec-02 -16.8 -16.8 -23.3 -19.8
Dec-03 22.0 24.3 19.0 23.0
Dec-04 3.4 7.6 7.0 9.8
Dec-05 18.9 20.2 20.8 21.0
Dec-06 6.9 5.9 6.7 6.3
Dec-07 20.7 17.6 20.3 19.4

Table 10.1. Portfolio returns in percent over time. PR is the acronym for portfolio
resampling.

1 For the asset equity US, the historical arithmetic average is referred to as aHEP.

           EEP      EEP&PR     aHEP     aHEP&PR

           1.716    1.701      1.520    1.731
                    1.765               1.671
                    1.750               1.713
                    1.700               1.717
                    1.785               1.728
                    1.768               1.672
                    1.750               1.755
                    1.790               1.730
                    1.767               1.675
                    1.766               1.736
Average:            1.754               1.713

Table 10.2. Terminal portfolio value. PR is the acronym for portfolio resampling; the
EEP and aHEP strategies involve no random draws and therefore have a single terminal
value, whereas the resampled strategies are shown for ten simulations.

Portfolio resampling seems to improve performance when the input is very uncertain,
such as the aHEP. Resampling increases the terminal portfolio value by almost 20
percentage points on average for the aHEP, but only by about 4 percentage points for
the EEP. As seen in table 10.2, resampling generated a higher terminal value ten out
of ten times for the aHEP, whilst for the EEP resampling sometimes generated a lower
terminal portfolio value. This suggests that resampling indeed is useful when the
input parameters are uncertain, since the portfolio weights are smoothed and more
assets enter the solution, creating a more diversified portfolio. According to
Michaud [56], well diversified portfolios, e.g. obtained by resampling, should
outperform Markowitz portfolios out-of-sample, just as found here. The pure EEP and
aHEP portfolios are both outperformed by their resampled counterparts. The rather
small increase in portfolio value when resampling with the EEP as input, compared to
with the aHEP, points to the EEP containing smaller estimation errors than the aHEP.
This is also supported by the positive R²os,mv found in section 7.2.

In this backtest we find evidence that our multivariate forecast performs better
than the arithmetic average when used as input in a mean-variance asset allocation
problem. Portfolio resampling is also found to provide a good way of arriving at
meaningful asset allocations when the input parameters are very noisy.
Chapter 11

Conclusions

In this thesis we incorporate model uncertainty in the forecasting of the expected
equity premium by creating a large number of linear prediction models on which
we apply Bayesian model averaging. We also investigate the general impact of in-
put estimation errors in mean-variance optimization and evaluate the performance
of a Monte Carlo based heuristic called portfolio resampling.

It is found that the forecasting ability of multifactor models is not substantially
improved by our approach. Our interpretation is that the largest problem with
multifactor models is not model uncertainty, but rather too low predictive ability.

Further, our investigation brings evidence that the GDP, the short term spread
and the volatility are useful in forecasting the expected equity premium for the five
years to come. Our investigations also show that multivariate models are to some
extent better than univariate models, but it can not be said that any of them is
accurate in predicting the expected equity premium. Nevertheless, it is likely that
both provide better forecasts than using the arithmetic average of the historical
equity premium.

We have also found that portfolio resampling provides a good way to arrive at
meaningful allocation decisions when the optimization inputs are very noisy.

Our proposal to further work is to investigate whether a Bayesian analysis, not in-
volving linear regression, with carefully selected priors, calibrated to reflect mean-
ingful economic information, provides better predictions for the expected equity
premium than the approach used in this thesis.

Bibliography

[1] Ang A. & Bekaert G., (2003), Stock return predictability: is it there?, Working
Paper, Columbia University.
[2] Avramov D., (2002), Stock return predictability and model uncertainty, Jour-
nal of Financial Economics, vol. 64, pp. 423-458.
[3] Baker M. & Wurgler J., (2000), The Equity Share in New Issues and Aggregate
Stock Returns, Journal of Finance, American Finance Association, vol. 55(5),
pp. 2219-2257.
[4] Benning J. F., (2007), Trading Strategies for Capital Markets, McGraw-Hill,
New York.
[5] Bernardo J. M. & Smith A., (1994), Bayesian Theory, John Wiley & Sons
Ltd.
[6] Bostock P., (2004), The Equity Premium, Journal of Portfolio Management
vol. 30(2), pp. 104-111.
[7] Brealey R. A., Myers S. C. & Allen F., (2006), Corporate Finance, McGraw-Hill,
New York.
[8] Brealey R. A., Myers S. C. & Allen F., (2000), Corporate Finance, McGraw-Hill,
New York.
[9] Brealey R. A., Myers S. C. & Allen F., (1996), Corporate Finance, McGraw-Hill,
New York.
[10] Burda M. & Wyplosz C., (1997), Macroeconomics: A European text, Oxford
University Press, New York.
[11] Campbell J. Y., Lo A. & MacKinlay A., (1997), The Econometrics of Financial
Markets, Princeton University Press.
[12] Campbell J. Y. & Shiller R. J., (1988) The dividend-price ratio and expecta-
tions of future dividends and discount factors, Review of Financial Studies,
vol. 1, pp. 195-228.
[13] Campbell J. Y. & Shiller R. J., (1988) Stock prices, earnings, and expected
dividends, Journal of Finance, vol. 43, pp. 661-676.


[14] Campbell J. Y. & Shiller R. J., (1998) Valuation ratios and the long-run stock
market outlook, Journal of Portfolio Management, vol. 24, pp. 11-26.

[15] Campbell, J. Y., (1987), Stock returns and the term structure, Journal of
Financial Economics, vol. 18, pp. 373-399.

[16] Campbell J. & Thompson S., (2005), Predicting the Equity Premium Out of
Sample: Can Anything Beat the Historical Average?, NBER Working Papers
11468, National Bureau of Economic Research.

[17] Casella G. & Berger R. L., (2002), Statistical Inference, 2nd ed. Duxbury
Press.

[18] Choudhry M., (2006), Bonds - A concise guide for investors, Palgrave Macmil-
lan, New York.

[19] Cohen R.B., Polk C. & Vuolteenaho T., (2005), Inflation Illusion in the Stock
Market: The Modigliani-Cohn Hypothesis, Quarterly Journal of Economics,
vol. 120, pp. 639-668.

[20] Dalén J., (2001), The Swedish Consumer Price Index - A Handbook of Meth-
ods, Statistiska Centralbyrån, SCB-Tryck, Örebro.

[21] Damodaran A., (2006), Damodaran on Valuation, John Wiley & Sons, New
York.

[22] Dimson E., Marsh P. & Staunton M., (2006), The Worldwide Equity Pre-
mium: A Smaller Puzzle, SSRN Working Paper No. 891620.

[23] Durbin J. & Watson G.S., (1950), Testing for Serial Correlation in Least
Squares Regression I, Biometrika vol. 37, pp. 409-428.

[24] Escobar L. A. & Meeker W. Q., (2000), The Asymptotic Equivalence of the
Fisher Information Matrices for Type I and Type II Censored Data from
Location-Scale Families., Working Paper.

[25] Estrella A. & Trubin M. R., (2006), The Yield Curve as a Leading Indicator:
Some Practical Issues, Current Issues in Economics and Finance - Federal
Reserve Bank of New York, vol. 12(5).

[26] Fabozzi F. J., Focardi S. M. & Kolm P. N., (2006), Financial Modeling of the
Equity Market, John Wiley & Sons, New Jersey.

[27] Fama E.F., (1981), Stock returns, real activity, inflation and money, American
Economic Review, pp. 545-565.

[28] Fama E. F. & French K. R., (1988), Dividend yields and expected stock
returns, Journal of Financial Economics, vol. 22, pp. 3-25.

[29] Fama E. F. & French K. R., (1989), Business conditions and expected returns
on stocks and bonds, Journal of Financial Economics, vol. 25, pp. 23-49.

[30] Fama E.F. & Schwert G.W., (1977), Asset Returns and Inflation, Journal of
Financial Economics, vol. 5(2), pp. 115-46.
[31] The Federal Reserve, Industrial production and capacity utilization, (2007),
Retrieved February 12, 2008 from
http://www.federalreserve.gov/releases/g17/20071214/
[32] Fernández P., (2006), Equity Premium: Historical, Expected, Required and
Implied, IESE Business School, Madrid.
[33] Fernandéz C., Ley E. & Steel M., (1998), Benchmark priors for Bayesian
Model Averaging, Working Paper.
[34] Franke J., Härdle W.K. & Hafner C.M., (2008), Statistics of Financial Markets
An Introduction, Springer-Verlag, Berlin Heidelberg.
[35] Gill P. E. & Murray W., (1981), Practical Optimization, Academic Press,
London.
[36] Golub G. & Van Loan C., (1996), Matrix Computations, The Johns Hopkins
University Press, Baltimore.
[37] Goyal A. & Welch I., (2006), A Comprehensive Look at the Empirical Per-
formance of Equity Premium Prediction, Review of Financial Studies, forth-
coming.
[38] Hamilton J. D., (1994), Time Series Analysis, Princeton University Press.
[39] Harrell F. E., (2001), Regression Modeling Strategies, Springer-Verlag, New
York.
[40] Hodrick R. J., (1992), Dividend yields and expected stock returns: alternative
procedures for inference and measurement, Review of Financial Studies, vol.
5(3), pp. 257-286.
[41] Hoeting J. A., Madigan D. & Raftery A. E. & Volinsky C. T., (1999), Bayesian
Model Averaging: A Tutorial, Statistical Science 1999, vol. 14(4), pp. 382-417.
[42] Ibbotson Associates, (2006), Stocks, Bonds, Bills and Inflation, Valuation
Edition, 2006 Yearbook.
[43] Keim D. B. & Stambaugh R. F., (1986), Predicting returns in the stock and
bond markets, Journal of Financial Economics, vol. 17(2), pp. 357-390.
[44] Kennedy P. E., (2000), Macroeconomic Essentials - Understanding Economics
in the News, The MIT Press, Cambridge.
[45] Koller T. & Goedhart M. & Wessels D., (2005), Valuation: Measuring and
Managing the Value of Companies, McKinsey & Company, Inc. Wiley.
[46] Kothari S. P. & Shanken J., (1997), Book-to-market, dividend yield, and ex-
pected market returns: a time series analysis, Journal of Financial Economics,
vol. 44, pp. 169-203.

[47] Krainer J., What Determines the Credit Spread?, (2004), FRBSF Economic
Letter, Nr 2004-36.

[48] Lamont O., (1998), Earnings and expected returns, Journal of Finance, vol.
53, pp.1563-1587.

[49] Lee P. M., (2004), Bayesian Statistics: An Introduction, Oxford University
Press.

[50] Lettau M. & Ludvigson S., (2001), Consumption, aggregate wealth and expected
stock returns, Journal of Finance, vol. 56(3), pp. 815-849.

[51] Lewellen J., (2004), Predicting returns with financial ratios, working paper.

[52] Luenberger D. G., (1998), Investment Science, Oxford University Press, New
York.

[53] Mankiw G. N., (2002), Macroeconomics, Worth Publishers, New York.

[54] Mayer B., (2007), Credit as an Asset Class, Masters Thesis, TU Munich.

[55] Merton R. C., (1980), On Estimating the Expected Return on the Market:
An Exploratory Investigation, Journal of Financial Economics, vol. 8, pp.
323-361.

[56] Michaud R., (1998), Efficient Asset Management: A Practical Guide to Stock
Portfolio Optimization and Asset Allocation, Oxford University Press, New
York.

[57] Polk C., Thompson S. & Vuolteenaho T., (2005), Cross-sectional forecasts of
the equity premium, Journal of Financial Economics, vol. 81(1), pp. 101-141.

[58] Pontiff J. & Schall L. D., (1998), Book-to-market ratios as predictors of market
returns, Journal of Financial Economics, vol. 49, pp. 141-160.

[59] Press J. S., (1972), Applied Multivariate Analysis, Holt, Rinehart & Winston
Inc, University of Chicago.

[60] Rozeff M., (1984), Dividend yields are equity risk premiums, Journal of Port-
folio Management, vol. 11, pp. 68-75.

[61] Scherer B., (2004), Portfolio Construction and Risk Budgeting, Risk Books,
Incisive Financial Publishing Ltd.

[62] University of Michigan, Surveys of consumers, Retrieved February 9, 2008 from
http://www.sca.isr.umich.edu/

[63] U.S. Department of Labor, Glossary, Retrieved February 5, 2008 from
http://www.bls.gov/bls/glossary.htm#P

[64] Vaihekoski M., (2005), Estimating Equity Risk Premium: Case Finland,
Lappeenranta University of Technology, Working paper.

[65] Welch I., (2000), Views of Financial Economists on the Equity Premium and
on Professional Controversies, Journal of Business, vol. 73(4), pp. 501-537.
[66] Zagst R., (2004), Lecture Notes - Asset Pricing, TU Munich.
[67] Zagst, R. & Pöschik M., (2007), Inverse Portfolio Optimization under Con-
straints, Working Paper.

[68] Zellner A., (1986), On assessing prior distributions and bayesian regression
analysis with g-prior distributions, in Essays in Honor of Bruno de Finetti,
eds P.K. Goel and A. Zellner, Amsterdam: North-Holland, pp. 233-243.
Appendix A

Mathematical Preliminaries

A.1 Statistical definitions


Definition A.1 (Bias) Let θ̂ be a sample estimate of a vector of parameters θ.
For example, θ̂ could be the sample mean x̄. The estimate is then said to be
unbiased if E[θ̂] = θ, (see [38]).

Definition A.2 (Stochastic process) A stochastic process X_t, t ∈ Z, is a family of
random variables, defined on a probability space (Ω, F, P).

At a specific time point t, X_t is a random variable with a specific density function.
Given a specific ω ∈ Ω, {X(ω) = (X_t(ω), t ∈ Z)} is a realization or a path of the
process, (see [34]).

Definition A.3 (Autocovariance function) The autocovariance function of a stochastic
process X_t is defined as

    γ(t, τ) = E[(X_t − µ_t)(X_{t−τ} − µ_{t−τ})], ∀τ ∈ Z.

The autocovariance function is symmetric, that is, γ(t−τ, −τ) = γ(t, τ). In general
γ(t, τ) is dependent on t as well as on τ. Below we define the important concept of
stationarity, which many times will simplify autocovariance functions, (see [34]).

Definition A.4 (Stationarity) A stochastic process X_t is covariance stationary if

    E[X_t] = µ  and  γ(t, τ) = γ(τ), ∀t.

A stochastic process X_t is strictly stationary if for any t_1, . . . , t_n and for
all n, s ∈ Z it holds that the joint distribution

    F_{t_1,...,t_n}(x_1, . . . , x_n) = F_{t_1+s,...,t_n+s}(x_1, . . . , x_n).

For covariance stationary processes, the term weakly stationary is often used,
(see [34]).


Definition A.5 (Trace of a matrix) The trace of a matrix A ∈ R^{n×n} is defined as
the sum of the elements along the diagonal,

    tr(A) = a_{11} + a_{22} + · · · + a_{nn}, (see [59]).

Definition A.6 (The gamma function) The gamma function can be defined as the
definite integral

    \Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t}\, dt,

where x ∈ R and x > 0, (see [59]).

Definition A.7 (Positive definite matrix) A symmetric matrix A ∈ R^{n×n} is called
positive definite if

    x^{\top} A x > 0, ∀x ≠ 0 ∈ R^n, (see [34]).

Theorem A.1 (Properties of positive definite matrices) If A is positive definite it
defines an inner product on R^n as

    ⟨x, y⟩ = x^{\top} A y.

In particular, the standard inner product for R^n is obtained when setting A = I.
Furthermore, A has only positive eigenvalues λ_i, is invertible, and its inverse is
also positive definite.
Proof: (see [36], [59])

A.2 Statistical distributions


Definition A.8 (The normal distribution) The variable Y has a Gaussian, or normal,
distribution with mean µ and variance σ² if

    f_Y(y_t) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ \frac{-(y_t - \mu)^2}{2\sigma^2} \right].

Definition A.9 (The Chi-Squared distribution) The probability density for the
χ²-distribution with v degrees of freedom is given by

    p_v(x) = \frac{x^{v/2-1} \exp[-x/2]}{\Gamma(v/2)\, 2^{v/2}}.

Definition A.10 (The multivariate normal distribution) Let x ∈ R^{p×1} be a random
vector with density function f(x). x is said to follow a multivariate normal
distribution with mean vector θ ∈ R^{p×1} and covariance matrix Σ ∈ R^{p×p} if

    f(x) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left[ -\tfrac{1}{2}(x - \theta)^{\top} \Sigma^{-1} (x - \theta) \right].

If |Σ| = 0 the distribution of x is called degenerate and does not exist.

The inverted Wishart distribution is the multivariate generalization of the univariate
inverted gamma distribution. It is the distribution of the inverse of a random matrix
following the Wishart distribution, and it is the natural conjugate prior for the
covariance matrix in a normal distribution.

Definition A.11 (The inverted Wishart distribution) Let U ∈ R^{p×p} be a random
matrix following the inverted Wishart distribution with positive definite matrix G
and n degrees of freedom. Then for n > 2p, the density of U is given by

    p(U) = \frac{c_0\, |G|^{(n-p-1)/2}}{|U|^{n/2}} \exp\left[ -\tfrac{1}{2} \mathrm{tr}\left[ U^{-1} G \right] \right]

and p(U) = 0 otherwise. The constant c_0 is given by

    c_0^{-1} = 2^{(n-p-1)p/2}\, \pi^{p(p-1)/4} \prod_{j=1}^{p} \Gamma\!\left( \frac{n-p-j}{2} \right).
Appendix B

Code

B.1 Univariate predictions


%input
[dates,values]=loadThesisData_LongDataSet(false);
[dates, returns, differ] = calcFactors_LongDataSet(dates, values);
eqp=returns(1:end,1); %this is the equity premium
returns=returns(1:end,2:end);

muci=[]; predRng=[]; allEst=[]; prob_model=[]; outliersStep=[];

%prediction horizon
horizon=5;

for k=1:horizon

y_bma=[];
x_bma=[];
res=[];
est=[];
removedModels=[];
usedModels=[];
outliers=0;

for j=1:length(returns(1,:))
[x, y, est_tmp, beta, resVec, outliersTmp]=predictClean(eqp(k+1:end)...
,returns(1:end-k,j),returns(end,j));

res = [res resVec];


est = [est est_tmp];

y_bma=[y_bma y];
x_bma=[x_bma x];

n=length(x(:,1));
p=length(x(1,:));
g=1/n;

if (est(j) > 0.0) && est(j)<mean(eqp(k+1:end))+1.28*rlstd(eqp(k+1:end))


P=x*inv(x’*x)*x’;
likelihood(j)=(gamma(n/2)/((2*pi)^(n/2))/((1+g)^(p/2)))...
*(y’*y-(g/(1+g))*y’*P*y)^(-n/2);
usedModels = [usedModels j];
else
likelihood(j)=0;


removedModels = [removedModels j];


est(j)=0;
usedModels = [usedModels j];
end
outliers = outliers + outliersTmp;
end
outliersStep=[outliersStep outliers];
usedModelsBMA = usedModels*2-1;
p_model=likelihood./sum(likelihood);
weightedAvg =p_model*est’;
prob_model=[prob_model p_model’];
predRng = [predRng; 100*min(est) 100*max(est) 100*mean(est)];
allEst = [allEst est’];
VARyhat_data=zeros(length(res(:,1)),length(res(:,1)));

for i = 1:length(returns(1,:))
VARyhat_data = VARyhat_data +(diag(res(:,i))*x_bma(:,i*2-1:i*2)...
*inv(x_bma(:,i*2-1:i*2)’*x_bma(:,i*2-1:i*2))*x_bma(:,i*2-1:i*2)’...
+y_bma(:,i)*y_bma(:,i)’)*prob_model(i)-(y_bma(:,i)*prob_model(i))...
*(y_bma(:,i)*prob_model(i))’;
end
STD_step(k) = sqrt(sum(diag(VARyhat_data))/length(diag(VARyhat_data)));
z=norminv([0.05 0.95],0,1);
muci=[muci; weightedAvg+z(1)*STD_step(k)/sqrt(length(res(:,1)))...
weightedAvg weightedAvg+z(2)*STD_step(k)/sqrt(length(res(:,1)))];

end

B.2 Multivariate predictions


[dates,values]=loadThesisData_LongDataSet(false);

%input
[dates, returns, differ] = calcFactors_LongDataSet(dates, values);
eqp=returns(:,1); regressor=returns(:,2:end);
numFactor=length(regressor(1,:)); numOfModel=2^numFactor;

horizon=5; %prediction horizon


comb=combinations(numFactor);
prob_model=zeros(numOfModel-1,horizon);
likelihood=zeros(numOfModel-1,1); tmp=zeros(numOfModel-1,1);
usedModels=zeros(1,horizon); predRng=zeros(3,horizon);
y_bma=zeros(length(returns),horizon);
res=zeros(length(eqp)-1,numOfModel-1); toto = ones(length(eqp),1);
r=zeros(1,horizon); allMag=[]; muci=[]; VARyhat_data=[];

for k=1:horizon
for i=1:numOfModel-1

%pick a model
L=length(regressor(:,1));
out=[comb(i,1)*ones(L,1) comb(i,2)*ones(L,1) comb(i,3)*ones(L,1)...
comb(i,4)*ones(L,1) comb(i,5)*ones(L,1) comb(i,6)*ones(L,1)...
comb(i,7)*ones(L,1) comb(i,8)*ones(L,1) comb(i,9)*ones(L,1)...
comb(i,10)*ones(L,1) comb(i,11)*ones(L,1) comb(i,12)*ones(L,1)...
comb(i,13)*ones(L,1) comb(i,14)*ones(L,1) comb(i,15)*ones(L,1)...
comb(i,16)*ones(L,1) comb(i,17)*ones(L,1) comb(i,18)*ones(L,1)];
output=out.*regressor;
modRegr = output(:,not(all(output(:,1:size(output,2))== 0)));

%predictions
[x, y, est_tmp, beta, resVec, outliersTmp]=predictClean(eqp(k+1:end)...
,modRegr(1:end-k,:),modRegr(end,:));

if (est_tmp>0)&&(est_tmp<(mean(eqp(k+1:end))+1.28*sqrt(var(eqp(k+1:end)))))
tmp(i)=est_tmp;
%calculate likelihood
n=length(x(:,1));
p=length(x(1,:));
g=p^(1/(1+p))/n;
P=x*inv(x’*x)*x’;
likelihood(i)=(gamma(n/2)/((2*pi)^(n/2))/((1+g)^(p/2)))...
*(y’*y-(g/(1+g))*y’*P*y)^(-n/2);
else
likelihood(i)=0;
tmp(i)=0;
r(k)=r(k)+1;
end
setsubColumn(k+1,size(res,1),i,resVec,res);
end

%bma
p_model=likelihood./sum(likelihood);
magnitude=p_model’*tmp;
prob_model(:,k)=p_model;
predRng(:,k)=[min(tmp); max(tmp); mean(tmp)];
allMag=[allMag magnitude];
y_bma(k+1:end,k)=y;

%Compute variance and confidence interval


%Instead of storing all models, create them again
VARyhat_data=zeros(length(y_bma(k+1:end,k)));
for i=1:numOfModel-1

%pick a model
L=length(regressor(:,1));
out=[comb(i,1)*ones(L,1) comb(i,2)*ones(L,1) comb(i,3)*ones(L,1)...
comb(i,4)*ones(L,1) comb(i,5)*ones(L,1) comb(i,6)*ones(L,1)...
comb(i,7)*ones(L,1) comb(i,8)*ones(L,1) comb(i,9)*ones(L,1)...
comb(i,10)*ones(L,1) comb(i,11)*ones(L,1) comb(i,12)*ones(L,1)...
comb(i,13)*ones(L,1) comb(i,14)*ones(L,1) comb(i,15)*ones(L,1)...
comb(i,16)*ones(L,1) comb(i,17)*ones(L,1) comb(i,18)*ones(L,1)];
output=out.*regressor;
modRegr = output(:,not(all(output(:,1:size(output,2))== 0)));
modRegr = [modRegr(1:end-k,:) ones(length(modRegr(1:end-k,:)),1)];
%intercept added

VARyhat_data = VARyhat_data + (diag(res(k:end,i))*modRegr*inv(modRegr’...


*modRegr)*modRegr’+y_bma(k+1:end,k)*y_bma(k+1:end,k)’)...
*prob_model(i)-(y_bma(k+1:end,k)*prob_model(i))...
*(y_bma(k+1:end,k)*prob_model(i))’;

end

STD_step(k) = sqrt(sum(diag(VARyhat_data))/(length(diag(VARyhat_data))));

z=norminv([0.05 0.95],0,1);
muci=[muci; allMag(k)+z(1)*STD_step(k)/sqrt(length(res(:,1)))...
allMag(k) allMag(k)+z(2)*STD_step(k)/sqrt(length(res(:,1)))];

end

B.3 Merge time series


Developed by Jörgen Blomvall, Linköping Institute of Technology

function [mergedDates, values] = mergeExcelData(sheetNames, data)

mergedDates = datenum('30-Dec-1899') + data{1}(:,1);
mergedDates(find(isnan(data{1}(:,2)))) = [];

for i = 2:length(sheetNames)
    nMerged = length(mergedDates);
    dates = datenum('30-Dec-1899') + data{i}(:,1);
    newDates = zeros(size(mergedDates));
    k = 1;  % note: these initializations are missing in the printed listing
    n = 0;  % and have been restored here so that the function runs
    for j = 1:nMerged
        while (dates(k) < mergedDates(j) && k < length(dates))
            k = k+1;
        end
        if (dates(k) == mergedDates(j) && ~isnan(data{i}(k,2)))
            n = n+1;
            newDates(n) = mergedDates(j);
        end
    end
    mergedDates = newDates(1:n);
end

values = zeros(n, length(sheetNames));

for i = 1:length(sheetNames)
    dates = datenum('30-Dec-1899') + data{i}(:,1);
    k = 1;
    for j = 1:n
        while (dates(k) < mergedDates(j) && k < length(dates))
            k = k+1;
        end
        if (dates(k) == mergedDates(j))
            values(j,i) = data{i}(k,2);
        else
            error = 1
        end
    end
end

B.4 Load data into Matlab from Excel


Developed by Jörgen Blomvall, Linköping Institute of Technology

function [dates, values] = loadThesisData(interpolate)

%[status, sheetNames] = xlsfinfo('test_merge.xls'); % Does not work for all
%                                                   % Matlab versions

sheetNames = {'DJtech' 'WoMat' 'ConsDisc' 'EnergySec' 'ConStap' 'Health'...
    'Util' 'sp1500' 'sp500' 'spEarnYld' 'spMktCap' 'spPERat' 'spDaiNetDiv'...
    'spIndxPxBook' 'spIndxAdjPe' 'spEqDvdYi12m' 'spGenPERat' 'spPrice'...
    'spMovAvg200' 'spVol90d' 'MoodCAA' 'MoodBAA' 'tresBill3m' 'USgenTBill1M'...
    'GovtYield10Y' 'CPI' 'PCECYOY'};

for i = 1:length(sheetNames)
data{i} = xlsread(’runEqPred.xls’, char(sheetNames(i)));
end

if interpolate
[dates, values] = mergeInterpolExcelData(sheetNames, data);
else
[dates, values] = mergeExcelData(sheetNames, data);
end

B.5 Permutations
function out = combinations(k);

total_num = 2^k;
indicator = zeros(total_num,k);
for i = 1:k;
    temp_ones  = ones( total_num/(2^i), 2^(i-1) );
    temp_zeros = zeros( total_num/(2^i), 2^(i-1) );
    x_temp = [temp_ones; temp_zeros];
    indicator(:,i) = reshape(x_temp,total_num,1);
end;

out = indicator;

B.6 Removal of outliers and linear prediction


function [x, y, est, beta, resVec, outliers] = predictClean(y, x, lastVal)

%remove outliers
xTmp = [];
outliers = 0;
for i = 1:length(x(1,:))
    xVec = x(:,i);
    for k = 1:3 %nr of iterations for finding outliers
        H_hat = xVec*inv(xVec'*xVec)*xVec';
        Y = H_hat*y;
        index = find(abs(Y-mean(Y)) > 3*rlstd(Y));
        outliers = outliers + length(index);
        for j = 1:length(index)
            if index(j) ~= length(y)
                xVec(index(j)) = 0.5*xVec(index(j)+1) + 0.5*xVec(index(j)-1);
            else
                xVec(index(j)) = 0.5*xVec(index(j)-1) + 0.5*xVec(index(j));
            end
        end
    end
    xTmp = [xTmp xVec];
end
x = xTmp;

%OLS
x = [ones(length(x),1) x]; %adding intercept
beta = x\y;                % OLS
est = [1 lastVal]*beta;    %predicted value
resVec = (y-x*beta).^2;    %residual vector
B.7 setSubColumn
#include "mex.h"
#include <string.h> /* for memcpy; added so the file compiles stand-alone */

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    double *src; double *dest; double *iStart, *iEnd, *col;

    iStart = mxGetPr(prhs[0]); iEnd = mxGetPr(prhs[1]); col = mxGetPr(prhs[2]);

    src = mxGetPr(prhs[3]); dest = mxGetPr(prhs[4]);

    //mexPrintf("%d\n", (int)col[0]*mxGetM(prhs[4])+(int)iStart[0]-1);

    /* Populate the output: copy elements iStart..iEnd of src into column col of dest */
    memcpy(&(dest[((int)col[0]-1)*mxGetM(prhs[4])+(int)iStart[0]-1]),
           src, (int)(iEnd[0]-iStart[0]+1)*sizeof(double));
}

B.8 Portfolio resampling


% Load Data & Set Parameters
[dates,values]=loadThesisData_Resampling4(false);

volDesired = 0.02; nrAssets=6; T=17; I=200; nrPortfolios=30;


errMean=true; errCov=true;

normPort=false; resampPort=true; stocksNr = [1 2 3 4 5 6];

EQP=[0.1417 0.1148 0.1062 0.4478 0.1024 0.1372 0.0979 0.0635 0.0897 0.1084];
HEP=[0.0616 0.0760 0.0708 0.0326 0.0231 0.0253 0.0398 0.0578 0.0674 0.0450];

for l = 1:10

%1. Estimate Historical Mean & Cov


if normPort
histMean=mean(returns(1:end-(10-l),stocksNr));
histMean(2)=EQP(l);
%histMean(2)=HEP(l);
histCov=cov(returns(1:end-(10-l),stocksNr));
elseif resampPort
histMean=mean(returns(1:end-(10-l),stocksNr));
histMean(2)=EQP(l);
%histMean(2)=HEP(l);
histCov=cov(returns(1:end-(10-l),stocksNr));
end

%2. Sample the Distribution


if resampPort
wStarAll=zeros(nrAssets, nrPortfolios);
for j=1:I
r = mvnrnd(histMean,histCov,T);
sampMean = mean(r);
sampCov = cov(r);

%3. Calculate efficient sampled Frontier


if (errMean) && ~(errCov)
sampMean=sampMean;
sampCov=histCov;
elseif errCov && ~(errMean)
sampMean=histMean;
sampCov=sampCov;
elseif errCov && errMean
sampMean=sampMean;
sampCov=sampCov;
else
sampMean=histMean;
sampCov=histCov;
end

minMean = abs(min(sampMean));
maxMean = max(sampMean);
z=1;
for k=[minMean:(maxMean-minMean)/(nrPortfolios-1):maxMean]
[wStar(:,z), tmp] = solveQuad(sampMean, sampCov, nrAssets, k);
z=z+1;
end
%4. Repeat step 2-3
allReturn(:,j)=wStar’*histMean’;
for q=1:nrPortfolios
allVol(q,j)=wStar(:,q)’*histCov*wStar(:,q);
end
wStarAll=wStarAll + wStar;
end

%5. Calculate Average Weights


wStarAll=wStarAll./I;
returnResamp=wStarAll’*histMean’;
for i=1:nrPortfolios
volResamp(i)=wStarAll(:,i)’*histCov*wStarAll(:,i);
end
end
%6. Original Frontier

minMean = abs(min(histMean));
maxMean = max(histMean);
z=1;
for k=[minMean:(maxMean-minMean)/(nrPortfolios-1):maxMean]
[wStarHist(:,z), tmp] = solveQuad(histMean, histCov, nrAssets, k);
z=z+1;
end
returnHist=wStarHist’*histMean’;
for i=1:nrPortfolios
volHist(i)=wStarHist(:,i)’*histCov*wStarHist(:,i);
end

prices((11-l),:)=values(end-(l-1),stocksNr);
if resampPort
[mvp_val mvp_nr] = min(volHist);
[tmpMin, portNr] = min(abs(volResamp(mvp_nr:end)-volDesired));
weights(l,:)=wStarAll(:, portNr+mvp_nr-1)’;
else
[mvp_val mvp_nr] = min(volHist);
[tmpMin, portNr] = min(abs(volHist(mvp_nr:end)-volDesired));
weights(l,:)=wStarHist(:, portNr+mvp_nr-1)’;
end

end
[V, wealth]=buySell2(weights,prices)

B.9 Quadratic optimization


function [w, fval] = solveQuad(histMean, histCov, nrAssets, muBar)

clc;

H = histCov*2;
f = zeros(nrAssets,1);
A = [];
b = [];
Aeq = [histMean; ones(1, nrAssets)];
beq = [muBar; 1];
lb = zeros(nrAssets,1);
ub = ones(nrAssets,1);

options = optimset('LargeScale','off');

[w, fval] = quadprog(H, f, A, b, Aeq, beq, lb, ub, [], options);


Copyright
The publishers will keep this document online on the Internet - or its possible re-
placement - for a period of 25 years from the date of publication barring exceptional
circumstances. The online availability of the document implies a permanent per-
mission for anyone to read, to download, to print out single copies for your own use
and to use it unchanged for any non-commercial research and educational purpose.
Subsequent transfers of copyright cannot revoke this permission. All other uses of
the document are conditional on the consent of the copyright owner. The publisher
has taken technical and administrative measures to assure authenticity, security
and accessibility. According to intellectual property law the author has the right to
be mentioned when his/her work is accessed as described above and to be protected
against infringement. For additional information about the Linköping University
Electronic Press and its procedures for publication and for assurance of document
integrity, please refer to its WWW home page: http://www.ep.liu.se/


© May 12, 2008. Johan Bjurgert & Marcus Edstrand
