Carstensen (CAU Kiel) Multivariate Time Series Summer Term 2017 1 / 128
Reading List
- F.X. Diebold and R.S. Mariano (1995), Comparing predictive accuracy, Journal of Business and Economic Statistics 13, 253-263.
- R.S. Mariano (2007), Testing forecast accuracy, in: M.P. Clements and D.F. Hendry (eds.), A Companion to Economic Forecasting, Blackwell.
- F.X. Diebold and J.A. Lopez (1995), Forecast evaluation and combination, Federal Reserve Bank of New York Research Paper No. 9525.
- G. Elliott and A. Timmermann (2008), Economic forecasting, Journal of Economic Literature 46, 3-56.
- A. Timmermann (2006), Forecast combinations, in: Handbook of Economic Forecasting, Vol. 1, 99-134.
- A.J. Patton and A. Timmermann (2010), Generalized forecast errors, a change of measure, and forecast optimality conditions, in: T. Bollerslev, J.R. Russell and M.W. Watson (eds.), Volatility and Time Series Econometrics: Essays in Honor of Robert F. Engle, Oxford University Press.
- A.J. Patton and A. Timmermann (2007), Testing forecast optimality under unknown loss, Journal of the American Statistical Association 102(480), 1172-1184.
Example: Forecasting US inflation and unemployment

[Figure: US inflation rate, actual (blue) and 1-step Green Book forecast (red), 1980-2010]
[Figures: further actual-versus-forecast plots for US inflation and unemployment, 1985-2020]
Forecasting and forecast loss

Notation
- h: forecast horizon
- Y_{t+h}: random variable of the information set I_{t+h} that generates the value to be forecast (realization: y_{t+h})
- Y_{t+h|t}: forecast using the information set I_t.
- The forecast is typically based on population parameters that are unknown to the forecaster and have to be estimated.
Loss function
- The real-valued loss function measures how bad a forecast is.
- General formulation:

  L(Y_{t+h|t}, Y_{t+h}, W_t)

  Hence, in general the loss depends on the forecast Y_{t+h|t}, the target variable Y_{t+h}, and (some) data W_t from the information set I_t.
- Many important loss functions simplify to a function of the forecast error alone, L(Y_{t+h} - Y_{t+h|t}).
Desirable properties:
- Existence: the conditional expected loss E[L(Y_{t+h|t}, Y_{t+h}, W_t) | I_t] exists, i.e., the integral is finite. Note that p_Y(y_{t+h} | I_t) is the conditional density of Y_{t+h} given the information set I_t. Existence might be an issue if the conditional distribution is fat-tailed and/or the loss in the tails of the distribution is very large.
- Symmetry: L(y_{t+h} + c, y_{t+h}, w_t) = L(y_{t+h} - c, y_{t+h}, w_t). This might be a strong assumption in many economic applications (government budget forecasts, the ECB's inflation forecast, firms' demand forecasts, banks' asset price forecasts). It is nevertheless standard in many empirical studies.
Optimal forecast
The optimal forecast minimizes the conditional expected loss. Under regularity conditions, it satisfies the first-order condition

E[L′(Y_{t+h|t}, Y_{t+h}, W_t) | I_t] = 0,

where L′ denotes the derivative with respect to the forecast. Under MSE loss, the first-order condition reads E[-2(Y_{t+h} - Y_{t+h|t}) | I_t] = 0, which is solved by

Y_{t+h|t} = E[Y_{t+h} | I_t].

- Result: under MSE loss the optimal predictor of Y_{t+h} is the conditional expectation (given the information set available to the forecaster).
- This is a general result that does not only apply to VAR models (for which it was shown in lecture 1).
Feasible forecast
Specific loss functions

MSE loss:

L_MSE(Y_{t+h|t}, Y_{t+h}, W_t) = L_MSE(Y_{t+h} - Y_{t+h|t}) = (Y_{t+h} - Y_{t+h|t})²
MAE loss:

L_MAE(Y_{t+h|t}, Y_{t+h}, W_t) = L_MAE(Y_{t+h} - Y_{t+h|t}) = |Y_{t+h} - Y_{t+h|t}|

- Large forecast errors are penalized less heavily than under MSE loss.
- Not differentiable at Y_{t+h} = Y_{t+h|t} (no interchangeability of differentiation and integration).
- For all continuous conditional distributions p_Y(y | I_t), the optimal forecast is the conditional median of Y_{t+h}.
Lin-lin and quad-quad loss:

L_{α,p}(e) = [α · 1{e > 0} + (1 - α) · 1{e ≤ 0}] · |e|^p

- This is a very flexible loss function that only depends on the forecast error e = Y_{t+h} - Y_{t+h|t}.
- It is symmetric for α = 0.5.
- Not differentiable at Y_{t+h} = Y_{t+h|t} (no interchangeability of differentiation and integration).
- Nests MAE loss (α = 0.5, p = 1) and MSE loss (α = 0.5, p = 2), each up to the scale factor 1/2.
- p = 1: lin-lin loss (asymmetric piecewise linear loss).
- p = 2: quad-quad loss (asymmetric piecewise quadratic loss).
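Since the slide's formula for this loss family did not survive the conversion, the expression above and the sketch below use a common parameterization consistent with the bullets (weight α on positive errors, 1 - α on negative errors, power p); the function name and example values are illustrative only.

```python
import numpy as np

def loss_alpha_p(e, alpha=0.5, p=1):
    """Asymmetric power loss: weight alpha on positive forecast errors,
    (1 - alpha) on negative ones, applied to |e|^p.
    p = 1 gives lin-lin loss, p = 2 gives quad-quad loss;
    alpha = 0.5 recovers MAE (p = 1) and MSE (p = 2) up to the factor 1/2."""
    e = np.asarray(e, dtype=float)
    weight = np.where(e > 0, alpha, 1.0 - alpha)
    return weight * np.abs(e) ** p

# alpha = 0.4 penalizes overpredictions (e < 0) 1.5 times as heavily:
ratio = loss_alpha_p(-0.5, alpha=0.4) / loss_alpha_p(0.5, alpha=0.4)
```

With α = 0.4 the ratio equals 1.5, matching the calibration used in the German GDP example later in this section.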
[Figures: plots of the asymmetric loss functions against the forecast error e ∈ [-1.5, 1.5]]
Linex loss:
Linex loss depends only on the forecast error e = Y_{t+h} - Y_{t+h|t} and, up to scale, takes the form

L_Linex(e; b) = exp(b·e) - b·e - 1.

For b < 0, negative errors (overpredictions) are penalized exponentially and positive errors approximately linearly; for b > 0 it is the other way around.

[Figure: Linex loss plotted against the forecast error]

Direction-of-change loss:

L_DoC(Y_{t+h|t}, Y_{t+h}, W_t; b) = 0 if sign(Y_{t+h|t}) = sign(Y_{t+h}), and 1 otherwise.
Weighted MSE loss:

L_WMSE(Y_{t+h|t}, Y_{t+h}, W_t) = W_t · (Y_{t+h} - Y_{t+h|t})²,

where the weight W_t ∈ I_t allows some periods (or states) to matter more than others.
- Find (and use) the optimal forecast based on a specific loss function appropriate for the forecasting problem at hand.
- Compare different forecasts or forecasting models under a given loss function in order to identify the best model.
Finding optimal forecasts
- For some loss functions we can state closed-form solutions for the optimal forecasts without knowing the distribution of the target variable.

Example: Suppose we want to forecast an iid Bernoulli random variable x that takes the value 1 with probability p and the value 0 with probability 1 - p.
[Figures: expected loss of forecasts of the Bernoulli variable, plotted against the forecast value over [-0.2, 1.2]]
MAE loss

Proof of Y*_{t+h|t} = Med[Y_{t+h} | I_t]:
The conditional expected loss is

E[|Y_{t+h} - Y*| | I_t] = ∫_{-∞}^{Y*} (Y* - y) f_t(y) dy + ∫_{Y*}^{∞} (y - Y*) f_t(y) dy,

where f_t(y) is the conditional density of Y_{t+h} given I_t. For later use, also define F_t(y) as the conditional cdf of Y_{t+h} given I_t.

Here, differentiating with respect to Y* (Leibniz' rule):

∂/∂Y* ∫_{-∞}^{Y*} (Y* - y) f_t(y) dy = ∫_{-∞}^{Y*} f_t(y) dy + (Y* - Y*) f_t(Y*) · 1 - (Y* - Y*) f_t(Y*) · 0
  = ∫_{-∞}^{Y*} f_t(y) dy = F_t(Y*)

and

∂/∂Y* ∫_{Y*}^{∞} (y - Y*) f_t(y) dy = -∫_{Y*}^{∞} f_t(y) dy + 0 - 0
  = -P_t(Y_{t+h} > Y*) = -[1 - P_t(Y_{t+h} ≤ Y*)] = F_t(Y*) - 1

Setting the derivative of the expected loss to zero,

F_t(Y*) + F_t(Y*) - 1 = 2 F_t(Y*) - 1 = 0  ⟺  F_t(Y*) = 1/2.

By the definition of the median,

Y* = Med_t[Y_{t+h}] = Med[Y_{t+h} | I_t].
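A quick simulation check of this result (not part of the original slides): for a right-skewed distribution, the point forecast that minimizes expected absolute loss is the median, while the mean minimizes squared loss.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.exponential(scale=1.0, size=200_000)   # right-skewed target: mean 1, median ln 2

# Expected loss over a grid of candidate point forecasts
grid = np.linspace(0.0, 3.0, 301)
mae = np.array([np.mean(np.abs(y - f)) for f in grid])
mse = np.array([np.mean((y - f) ** 2) for f in grid])

best_mae = grid[mae.argmin()]   # close to the sample median
best_mse = grid[mse.argmin()]   # close to the sample mean
```

Because the distribution is skewed, the two optima differ.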
Lin-lin loss

Proof of Y*_{t+h|t} = Quantile_α[Y_{t+h} | I_t]:
The proof is similar to the proof shown above for MAE loss. You will arrive at the condition

F_t(Y*_{t+h|t}) = α.
Linex loss

The optimal forecast is

Y*_{t+h|t} = (1/b) · log E[exp(b · Y_{t+h}) | I_t],

irrespective of the distribution of Y_{t+h}.

Under conditional normality, i.e., Y_{t+h} | I_t ~ N(μ_{t+h|t}, σ²_{t+h|t}), the optimal forecast is

Y*_{t+h|t} = μ_{t+h|t} + (b/2) · σ²_{t+h|t},

where μ_{t+h|t} is the conditional mean and σ²_{t+h|t} is the conditional variance.

Recall the moment generating function of a conditionally normal random variable:

m_t(s) = E[exp(s · Y_{t+h}) | I_t] = exp(μ_{t+h|t} s + 0.5 σ²_{t+h|t} s²).

Hence,

Y*_{t+h|t} = (1/b) log E[exp(b · Y_{t+h}) | I_t] = (1/b) log m_t(b)
           = (1/b) [μ_{t+h|t} b + 0.5 σ²_{t+h|t} b²] = μ_{t+h|t} + (b/2) σ²_{t+h|t}.
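The closed form can be verified numerically; the sketch below (not from the slides) uses the scale normalization exp(b·e) - b·e - 1 and illustrative parameter values.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, b = 1.2, 0.8, -1.2147        # illustrative; b < 0 penalizes overprediction
y = rng.normal(mu, sigma, 200_000)

def linex(e, b):
    return np.exp(b * e) - b * e - 1.0

# Expected Linex loss over a grid of candidate forecasts
grid = np.linspace(mu - 2.0, mu + 2.0, 401)
risk = np.array([linex(y - f, b).mean() for f in grid])
numerical_opt = grid[risk.argmin()]

closed_form = mu + 0.5 * b * sigma**2    # mu + (b/2) sigma^2
```

The grid minimizer agrees with the closed form up to simulation and grid error.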
- Under MSE and MAE loss, we only need to find the conditional mean μ_{t+h|t} (under conditional normality, mean and median coincide). Under asymmetric loss, we need to find the conditional variance σ²_{t+h|t}, too.

Example: Now assume that 1-year-ahead German GDP growth follows a conditional normal distribution. A forecasting model (for example, a VAR) yields estimates of μ_{t+h|t} and σ²_{t+h|t}.
Let us think about loss functions. For many people, overprediction (= negative surprise) is more costly than underprediction (= positive surprise).

Suppose the German finance minister has a 50 percent harder time in Parliament when MPs find out he overpredicted GDP (and thereby his budget) by 0.5 percentage points than when he underpredicted GDP by 0.5 percentage points.

Let us thus parameterize lin-lin and Linex loss such that L(-0.5) = 1.5 · L(0.5).

This requires α = 0.4 for lin-lin loss and b = -1.2147 for Linex loss.

[Figure: lin-lin loss with α = 0.4 (blue) and Linex loss with b = -1.2147 (red), plotted against the forecast error e ∈ [-1.5, 1.5]]
Under lin-lin loss, we know that Y*_{t+h|t} is the α-quantile of the conditional distribution. Hence,

α = F_t(Y*_{t+h|t}).

Noting that (Y_{t+h} - μ_{t+h|t}) / σ_{t+h|t} follows a standard normal distribution (given I_t), this simplifies to

α = Φ((Y*_{t+h|t} - μ_{t+h|t}) / σ_{t+h|t}),

so that

Φ⁻¹(α) = (Y*_{t+h|t} - μ_{t+h|t}) / σ_{t+h|t}  ⟺  Y*_{t+h|t} = μ_{t+h|t} + σ_{t+h|t} · Φ⁻¹(α).
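Under conditional normality both asymmetric forecasts have closed forms; a sketch with purely hypothetical numbers for the conditional mean and standard deviation (the slide's own figures did not survive the conversion):

```python
from scipy.stats import norm

mu, sigma = 1.5, 1.0       # hypothetical conditional mean and std. dev. of GDP growth
alpha, b = 0.4, -1.2147    # asymmetry parameters from the calibration above

y_linlin = mu + sigma * norm.ppf(alpha)   # alpha-quantile of the conditional distribution
y_linex = mu + 0.5 * b * sigma**2         # mu + (b/2) sigma^2

# Both rational forecasts are shaded below the conditional mean,
# because overprediction is the costlier mistake here.
```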
Properties of optimal forecasts

MSE loss
Optimal forecast: Y*_{t+h|t} = E[Y_{t+h} | I_t]
Forecast error: e_{t+h|t} ≡ Y_{t+h} - Y*_{t+h|t}
Properties:
- Unbiasedness: E(e_{t+h|t}) = 0
- Unpredictability: Cov(e_{t+h|t}, Z_t) = 0 for all Z_t ∈ I_t
- Increasing variance: Var(e_{t+h|t}) ≤ Var(e_{t+h+k|t}) for all k ≥ 0.
- Note that unpredictability implies that e_{t+h|t} is autocorrelated at most of order h - 1. Consequently, for h = 1 the optimal forecast error is white noise.
Proof of unbiasedness:

E(e_{t+h|t} | I_t) = E(Y_{t+h} - Y*_{t+h|t} | I_t) = E(Y_{t+h} | I_t) - E(Y_{t+h} | I_t) = 0

By the law of iterated expectations (LIE),

E(e_{t+h|t}) = E[E(e_{t+h|t} | I_t)] = E[0] = 0.

Proof of unpredictability:
Recall that the conditional expectation is zero: E(e_{t+h|t} | I_t) = 0.
As Z_t ∈ I_t, this implies E(e_{t+h|t} | Z_t) = 0.
Hence, the covariance is zero: Cov(e_{t+h|t}, Z_t) = 0.
Proof of increasing variance (sketch):
Decompose the long-horizon error into the short-horizon error plus the forecast revision, e_{t+h+k|t} = e_{t+h+k|t+k} + (Y*_{t+h+k|t+k} - Y*_{t+h+k|t}); by unpredictability the two terms are uncorrelated, so Var(e_{t+h+k|t}) ≥ Var(e_{t+h+k|t+k}). By strict stationarity, Var(e_{t+h+k|t+k}) = Var(e_{t+h|t}). Taken together, Var(e_{t+h+k|t}) ≥ Var(e_{t+h|t}).
MAE loss
Optimal forecast: Y*_{t+h|t} = Med[Y_{t+h} | I_t]
Properties:
- Median unbiasedness: Med(e_{t+h|t}) = 0
- Median unpredictability: E[(1{e_{t+h|t} ≤ 0} - 1/2) Z_t] = 0 for all Z_t ∈ I_t
- Increasing imprecision: E(|e_{t+h|t}|) ≤ E(|e_{t+h+k|t}|) for all k ≥ 0.
- This implies that for asymmetric distributions, the optimal forecast under MAE loss differs from the optimal forecast under MSE loss.
Lin-lin loss
Optimal forecast: Y*_{t+h|t} = Quantile_α[Y_{t+h} | I_t]
Properties: see below for general loss functions.

Linex loss
Optimal forecast: Y*_{t+h|t} = (1/b) · log E[exp(b · Y_{t+h}) | I_t]
Properties: see below for general loss functions.
General error-based loss
Optimal forecast: under some regularity conditions, the optimal forecast satisfies

E[∂L(Y_{t+h} - Y_{t+h|t}) / ∂Y_{t+h|t} | I_t] = 0

Generalized forecast error: ψ_{t+h|t} ≡ ∂L(Y_{t+h} - Y_{t+h|t}) / ∂Y_{t+h|t}
Properties:
- Unbiasedness: E(ψ_{t+h|t}) = E(ψ_{t+h|t} | I_t) = 0
- Unpredictability: Cov(ψ_{t+h|t}, Z_t) = 0 for all Z_t ∈ I_t
- Increasing loss: E[L(Y_{t+h} - Y*_{t+h|t})] ≤ E[L(Y_{t+h} - Y*_{t+h|t-k})] for all k ≥ 0.
- Note that unpredictability implies that ψ_{t+h|t} is autocorrelated at most of order h - 1. Consequently, for h = 1 the optimal generalized forecast error is white noise.
- Often the forecast producer (e.g., the Fed) is separate from the forecast consumer (e.g., us).
- In this case, the loss function used to obtain the forecast may be unknown.
- Still, under mild conditions, we can derive some properties an optimal forecast must satisfy. This allows us to test whether the forecast producer did a good job.
- See Patton and Timmermann (2007) for details. (They derive more results than the one we consider in the following.)
- Assumption on the loss function: the loss is a function solely of the forecast error, i.e., L(Y_{t+h|t}, Y_{t+h}, W_t) = L(Y_{t+h} - Y_{t+h|t}).
2. The optimal forecast error e_{t+h|t} is independent of all Z_t ∈ I_t, since

e_{t+h|t} = Y_{t+h} - Y*_{t+h|t} = μ_h + ε_{t+h},

where ε_{t+h} | I_t ~ F_{ε,h}(0, σ²_{ε,h}). Thus,

E(e_{t+h|t} | I_t) = μ_h.

3. The optimal forecast satisfies F_{t+h|t}(Y*_{t+h|t}) = q_h, where q_h ∈ (0, 1) depends only on the forecast horizon and the loss function. If F_{t+h|t} is continuous and strictly increasing, then we obtain:

Y*_{t+h|t} = F⁻¹_{t+h|t}(q_h) = Quantile_{q_h}(Y_{t+h} | I_t).

4. The variable

I_{t+h|t} ≡ 1{Y_{t+h} ≤ Y*_{t+h|t}} = 1{e_{t+h|t} ≤ 0}

is independent of all Z_t ∈ I_t and thus Cov(I_{t+h|t}, Z_t) = 0.
Evaluating the efficiency of a forecast

Notation
- Forecast horizon: h
- Forecast sample: t + h = T1, ..., T2
- Number of h-step forecasts: T = T2 - T1 + 1

MSE loss
Test of unbiasedness
Under the null of unbiasedness and suitable regularity conditions, the mean forecast error ē_h = T⁻¹ Σ_{t+h=T1}^{T2} e_{t+h|t} satisfies

√T · ē_h →d N(0, ω²),

where

ω² = lim_{T→∞} Var(√T · ē_h).
Note that for h > 1, the e_{t+h|t}'s are autocorrelated. Therefore, to estimate ω² you have to use an autocorrelation-robust estimator of the variance such as Newey-West (where you can set the lag window to h - 1 because higher-order autocorrelation is excluded under the null).

Test statistic:

t = ē_h / (ω̂ / √T)
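A hand-rolled version of this t-test with a Bartlett/Newey-West variance; the function name and the simulated MA(1) errors below are illustrative, not from the slides.

```python
import numpy as np

def nw_tstat(e, h):
    """t-statistic for H0: E(e) = 0, with a Newey-West variance using
    the Bartlett kernel and lag window h - 1 (weights 1 - j/h)."""
    e = np.asarray(e, dtype=float)
    T = len(e)
    ebar = e.mean()
    x = e - ebar
    omega2 = np.mean(x * x)                       # gamma_0
    for j in range(1, h):                         # lags 1, ..., h - 1
        omega2 += 2.0 * (1.0 - j / h) * np.mean(x[j:] * x[:-j])
    return ebar / np.sqrt(omega2 / T)

# Optimal 2-step forecast errors are MA(1); simulate mean-zero errors of that form
rng = np.random.default_rng(2)
u = rng.normal(size=1001)
e = u[1:] + 0.5 * u[:-1]
t_unbiased = nw_tstat(e, h=2)        # small: unbiasedness not rejected
t_biased = nw_tstat(e + 1.0, h=2)    # large: the bias is detected
```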
Test of unpredictability
Regress the forecast error on a constant and instruments z_{1,t}, ..., z_{k,t} ∈ I_t:

e_{t+h|t} = β0 + β1 z_{1,t} + ... + βk z_{k,t} + v_{t+h}

Hypotheses: H0: β0 = β1 = ... = βk = 0 versus H1: ¬H0
Note that for h > 1, the e_{t+h|t}'s (and thus the v_{t+h}'s) are autocorrelated. Therefore, you have to use an autocorrelation-robust estimator of the variance such as Newey-West (where you can set the lag window to h - 1).

Test of unpredictability: Mincer-Zarnowitz regression
Regress the realization on the forecast,

y_{t+h} = β0 + β1 y_{t+h|t} + v_{t+h},

and test H0: β0 = 0, β1 = 1.
Note again that for h > 1, the v_{t+h}'s are autocorrelated; use a Newey-West type estimator with lag window h - 1.

Test of unpredictability: augmented Mincer-Zarnowitz regression
Add further instruments Z_t ∈ I_t as regressors to the Mincer-Zarnowitz regression and test that their coefficients are zero (together with β0 = 0, β1 = 1).
Note again that for h > 1, the v_{t+h}'s are autocorrelated; use a Newey-West type estimator with lag window h - 1.
Test of increasing variance
- When forecasts for multiple horizons (e.g., h = 1, 2, 3) are available, we can test the inequality

Var(e_{t+h|t}) ≤ Var(e_{t+h+k|t}).
MAE loss
Test of median unbiasedness
Hypotheses: H0: E[1{e_{t+h|t} ≤ 0} - 1/2] = 0 versus H1: ¬H0
Let us denote

ξ_{t,h} = 1{e_{t+h|t} ≤ 0} - 1/2.

The sample equivalent of the expected value is

ξ̄_h = T⁻¹ Σ_{t+h=T1}^{T2} ξ_{t,h} = T⁻¹ Σ_{t+h=T1}^{T2} 1{e_{t+h|t} ≤ 0} - 1/2 = #(e_{t+h|t} ≤ 0)/T - 1/2.

Test statistic:

t = ξ̄_h / (ω̂/√T) →d N(0, 1)
Test of median unpredictability
Null hypothesis: E[(1{e_{t+h|t} ≤ 0} - 1/2) Z_t] = E[ξ_{t,h} Z_t] = 0 for all Z_t ∈ I_t
Regress ξ_{t,h} on a constant and instruments z_{1,t}, ..., z_{k,t}.
Hypotheses: H0: β0 = β1 = ... = βk = 0 versus H1: ¬H0
Note that for h > 1, the e_{t+h|t}'s (and thus the v_{t+h}'s) are autocorrelated. Therefore, you have to use an autocorrelation-robust estimator of the variance such as Newey-West (where you can set the lag window to h - 1).
General loss and unknown loss
Test of unbiasedness of the generalized forecast error
Test statistic:

t = ψ̄_h / (ω̂/√T) →d N(0, 1)

Test of unpredictability of the generalized forecast error
Hypotheses: H0: β0 = β1 = ... = βk = 0 versus H1: ¬H0
Note that for h > 1, the ψ_{t+h|t}'s (and thus the v_{t+h}'s) are autocorrelated. Therefore, you have to use an autocorrelation-robust estimator of the variance such as Newey-West (with lag window h - 1).

Tests under unknown loss: regress e_{t+h|t} on a constant and instruments; because the unknown loss only restricts the slopes (the constant absorbs μ_h), the hypotheses are
H0: β1 = ... = βk = 0 versus H1: ¬H0
Note that for h > 1, the e_{t+h|t}'s (and thus the v_{t+h}'s) are autocorrelated; use Newey-West with lag window h - 1.

The same test can be run with the indicator I_{t+h|t} as regressand (the constant then absorbs q_h):
H0: β1 = ... = βk = 0 versus H1: ¬H0
Note that for h > 1, the I_{t+h|t}'s (and thus the v_{t+h}'s) are autocorrelated; use Newey-West with lag window h - 1.
Inferring the loss function
Comparing forecast accuracy
Introduction
- You observe two (or more) h-step forecasts of Y_{t+h} over the sample t + h = T1, ..., T2.
- Example: US 4-quarter-ahead inflation forecasts published by the Fed (Green Book) and by Consensus Economics for 1980Q1-2009Q4.
- You know the realized values of Y_{t+h}.
- You ask: which forecast is (systematically) better?
- Dimensions of comparison:
  - Descriptive statistics (such as average losses, Theil's U)
  - Tests of forecast efficiency
  - Tests of the null hypothesis that the accuracy of two different forecasts does not differ systematically.
Theil's U
Theil's U is usually defined in terms of the root of the MSE, but in principle the idea can be applied to any loss function. For two competing forecasts y(1)_{t+h|t} and y(2)_{t+h|t},

U = RMSE(y(1)_{t+h|t}) / RMSE(y(2)_{t+h|t})
  = √[T⁻¹ Σ_{t+h=T1}^{T2} (y_{t+h} - y(1)_{t+h|t})²] / √[T⁻¹ Σ_{t+h=T1}^{T2} (y_{t+h} - y(2)_{t+h|t})²],

so that U < 1 indicates that forecast 1 is more accurate.
Diebold-Mariano test
- Test of equal accuracy of two competing (non-nested) forecasts y_{1,t+h|t} and y_{2,t+h|t} for the target variable y_{t+h}, t + h = T1, ..., T2.
- Denote the forecast errors e_{1,t+h|t} and e_{2,t+h|t}, respectively.
- The accuracy of the forecasts is measured by some loss function L(·). Here, we focus on error-based loss functions L(e_{t+h|t}).
- The loss differential is denoted by

d_{h,t} = L(e_{1,t+h|t}) - L(e_{2,t+h|t}).
- Hypotheses: H0: E(d_{h,t}) = 0 versus H1: E(d_{h,t}) ≠ 0.
- Test statistic:

DM = d̄_h / (ω̂/√T) →d N(0, 1).

- This is an asymptotic result. It cannot be expected to approximate the unknown finite-sample distribution well for small forecast samples T.
- Note that the d_{h,t}'s may be autocorrelated. This holds even under MSE loss and even if h = 1, because we do not know whether the forecasts are optimal. Therefore, you have to use an autocorrelation-robust estimator of the variance such as Newey-West. Set the lag window at least to h - 1.
- Take into account that Newey-West type variance estimators may perform poorly in small samples. Hence, interpret the test results with caution.
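The DM statistic can be sketched in a few lines; the implementation and the simulated forecasts below are illustrative, not from the slides.

```python
import numpy as np

def diebold_mariano(e1, e2, h=1, loss=lambda e: e**2):
    """DM statistic for H0: equal expected loss of two forecast-error series.
    Long-run variance via the Bartlett/Newey-West kernel with lag window h - 1."""
    d = loss(np.asarray(e1, dtype=float)) - loss(np.asarray(e2, dtype=float))
    T = len(d)
    dbar = d.mean()
    x = d - dbar
    omega2 = np.mean(x * x)
    for j in range(1, h):
        omega2 += 2.0 * (1.0 - j / h) * np.mean(x[j:] * x[:-j])
    return dbar / np.sqrt(omega2 / T)    # approx. N(0,1) under H0

rng = np.random.default_rng(4)
y = rng.normal(size=500)        # target with conditional mean zero
e1 = y - 0.0                    # errors of the (optimal) zero forecast
e2 = y - 0.8                    # errors of a biased forecast
dm = diebold_mariano(e1, e2, h=1)   # strongly negative: forecast 1 wins
```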
For a stationary, weakly dependent series y_t, a law of large numbers gives ȳ →p μ, and

√T (ȳ - μ) →d N(0, ω²),

where ω² is the long-run variance.
- The proofs can be found in Hamilton (1994, p. 186ff.). Also see Econometrics II.
The Newey-West estimator of the long-run variance is

ω̂² = Σ_{j=-h}^{h} (1 - |j|/(h + 1)) γ̂_j,

where γ̂_j denotes the j-th sample autocovariance.
- The term

1 - |j|/(h + 1)

is called a kernel with bandwidth or lag window h (because autocovariances of order larger than h are neglected).
- It leads to a downweighting of distant autocovariances. For example, for a bandwidth of h = 3, the weights are

1/4, 2/4, 3/4, 1, 3/4, 2/4, 1/4.

- Other kernels (Parzen kernel, quadratic spectral kernel) have been proposed; see Hamilton (1994, p. 281ff.) and Andrews (1991, Econometrica).
- Andrews (1991) also suggests a plug-in estimator for the bandwidth.
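A two-line check of the Bartlett weights (a sketch, not from the slides):

```python
import numpy as np

def bartlett_weights(h):
    """Kernel weights 1 - |j|/(h + 1) for j = -h, ..., h."""
    j = np.arange(-h, h + 1)
    return 1.0 - np.abs(j) / (h + 1)

w3 = bartlett_weights(3)   # 1/4, 2/4, 3/4, 1, 3/4, 2/4, 1/4, as on the slide
```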
- For small values of T, the modified DM (MDM) test may have better size properties when critical values from the t-distribution with T - 1 degrees of freedom are used.
Forecast methods in practice

Introduction
- The literature on forecast methods is vast and quickly evolving, particularly with respect to large-data methods.
- We cannot even give an overview here. A few survey papers:
  - Stock and Watson (2006), Forecasting with many predictors, in: Elliott et al. (eds.), Handbook of Economic Forecasting 1, 516-550.
  - Stock and Watson (2010), Dynamic factor models, in: Clements and Hendry (eds.), Oxford Handbook of Economic Forecasting.
  - Stock and Watson (2012), Generalized shrinkage methods for forecasting using many predictors, JBES 30(4), 481-493.
  - Carriero, Kapetanios, Marcellino (2010), Forecasting large datasets with Bayesian reduced rank multivariate models, Journal of Applied Econometrics 26(5), 735-761.
Autoregressions

y_t = a1 y_{t-1} + ... + ap y_{t-p} + u_t

- Since AR models are special cases of the VAR model studied in the first half of the semester, all those results apply.
- In particular, due to estimation uncertainty, small lag orders should be preferred.
- In fact, the AR(1) model is typically difficult to beat.
AR-X models
AR models augmented with additional (exogenous) predictors X_t.

VAR models
- Popular benchmark
- However, quickly overparameterized
- Valuable for multi-step forecasts
Multi-step forecasts: direct forecasts
- Recall: forecasting Y_{t+h} often requires knowledge of the conditional mean of Y_{t+h} given I_t.
- To be parsimonious, let us assume that the relevant variables included in I_t are Y_t and k indicator variables X_t.
- Now suppose we have a sample t = 1, ..., T and want to forecast Y_{T+h}.
- Under stationarity and linearity, the sample counterpart to the conditional mean is the direct regression of y_{t+h} on y_t, ..., y_{t-q} (and X_t), with the direct forecast

ŷ_{T+h} = β̂0 + β̂1 y_T + ... + β̂_{q+1} y_{T-q}.

- Estimation is, however, complicated by the fact that the disturbances are typically autocorrelated for h > 1, which reduces estimation efficiency.
- To see the point, consider h = 2.
- Then v_{t+2} is due to unforeseeable shocks that occur in periods t + 1 and t + 2, while v_{t+3} is due to unforeseeable shocks that occur in periods t + 2 and t + 3.
- Hence, the overlapping errors v_{t+2} and v_{t+3} are correlated.
- This is no surprise, since optimal forecast errors should have an MA(h - 1) structure under MSE loss.
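A direct h-step regression can be sketched on simulated AR(1) data (illustrative, not from the slides); for an AR(1) with coefficient a, the direct slope estimates a^h, which the iterated approach would instead obtain by powering the one-step estimate.

```python
import numpy as np

rng = np.random.default_rng(5)
T, h, a = 500, 3, 0.7
y = np.zeros(T)
for t in range(1, T):                   # simulate an AR(1)
    y[t] = a * y[t - 1] + rng.normal()

# Direct forecast regression: y_{t+h} on a constant and y_t
Y = y[h:]
X = np.column_stack([np.ones(T - h), y[:-h]])
beta = np.linalg.lstsq(X, Y, rcond=None)[0]   # slope should be near a**h = 0.343
```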
Multi-step forecasts: iterated forecasts
Alternatively, estimate a one-step (e.g., AR or VAR) model and iterate it forward h times, as in the first part of the course.

- AR-X and VAR models are only useful when the number of variables is small.
- Today: hundreds or even thousands of variables are available.
- It is impossible to include them all in one AR-X or VAR model.
- What to do?
Forecast combinations

Background
Combine several individual forecasts into one pooled forecast. For two unbiased forecasts, consider the weighted average

y_{c,t+h|t} = w · y_{1,t+h|t} + (1 - w) · y_{2,t+h|t}.
Under MSE loss, the aim is to minimize the variance of the combined forecast error (given unbiasedness):

σc²(w) = w² σ1² + (1 - w)² σ2² + 2w(1 - w) σ12.

Minimizing over w yields

w* = (σ2² - σ12) / (σ1² + σ2² - 2σ12),  1 - w* = (σ1² - σ12) / (σ1² + σ2² - 2σ12).
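A numerical check of the optimal weight and the resulting variance (a sketch; the variances and correlation are illustrative):

```python
def optimal_weight(s1, s2, s12):
    """Weight on forecast 1 that minimizes the combined error variance."""
    return (s2**2 - s12) / (s1**2 + s2**2 - 2.0 * s12)

def combined_var(w, s1, s2, s12):
    return w**2 * s1**2 + (1.0 - w) ** 2 * s2**2 + 2.0 * w * (1.0 - w) * s12

s1, s2, rho = 1.0, 1.5, 0.3
s12 = rho * s1 * s2
w_star = optimal_weight(s1, s2, s12)
v_star = combined_var(w_star, s1, s2, s12)   # never exceeds min(s1^2, s2^2)
```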
The MSE of the optimal forecast combination (which is again unbiased) is obtained by substituting w* into the objective function. After some algebra, using σ12 = ρ12 σ1 σ2, this yields

σc²(w*) = σ1² σ2² (1 - ρ12²) / (σ1² + σ2² - 2ρ12 σ1 σ2).

In the following, we show that combination never hurts by comparing σc²(w*) with the smallest individual variance σ1², i.e., we assume that σ1² < σ2². (This is without loss of generality because we can denote either of the two forecasts as forecast 1.)
σc²(w*) - σ1²
  = σ1² σ2² (1 - ρ12²) / (σ1² + σ2² - 2ρ12 σ1 σ2) - σ1² (σ1² + σ2² - 2ρ12 σ1 σ2) / (σ1² + σ2² - 2ρ12 σ1 σ2)
  = [σ1² σ2² - σ1² σ2² ρ12² - σ1⁴ - σ1² σ2² + 2ρ12 σ1³ σ2] / (σ1² + σ2² - 2ρ12 σ1 σ2)
  = -σ1² (ρ12 σ2 - σ1)² / (σ1² + σ2² - 2ρ12 σ1 σ2)
  ≤ 0

Hence σc²(w*) - σ1² ≤ 0, i.e., σc²(w*) ≤ σ1².
The result also shows when combination does not pay off: the difference σc²(w*) - σ1² is zero if

ρ12 = σ1/σ2.

In this case the optimal weights are w* = 1 and 1 - w* = 0, i.e., the combined forecast is just forecast 1.
[Figure: (ρ12 - σ1/σ2)² plotted against σ1/σ2 and ρ12]
[Figure: relative variance reduction (σc²(w*) - σ1²)/σ1² plotted against σ1/σ2 and ρ12]
In reality,
- we face the typical observation that population parameters seem unstable.
- we have to estimate the variances and correlations, which typically induces a lot of noise.
To estimate the weights, we can estimate the variances and the covariance from the sample moments:

σ̂1² = T⁻¹ Σ_t (e_{1,t+1|t} - ē1)²
σ̂2² = T⁻¹ Σ_t (e_{2,t+1|t} - ē2)²
σ̂12 = T⁻¹ Σ_t (e_{1,t+1|t} - ē1)(e_{2,t+1|t} - ē2)

Asymptotically, these estimators converge to the population values. Hence, the estimated weight converges to the optimal weight.
As an alternative method, we can use OLS. To this end, write the combination as a regression equation: subtracting y_{2,t+h|t} from y_{t+h} = w·y_{1,t+h|t} + (1 - w)·y_{2,t+h|t} + e_{c,t+h|t} gives

e_{2,t+h|t} = w · (e_{2,t+h|t} - e_{1,t+h|t}) + e_{c,t+h|t},

so that w can be estimated by regressing e_{2,t+h|t} on the error differential.

(To account for possibly non-zero sample means of the forecast errors, an intercept may be added.)
For large T, the standard error of the regression converges to σc(w*), because ŵ converges to w* and thus the regression residual converges to e_{c,t+h|t}.

The asymptotic variance of ŵ is

Avar(ŵ) = (1/T) · σc²(w*) / [T⁻¹ Σ_{t+h=T1}^{T2} (e_{2,t+h|t} - e_{1,t+h|t})²].

Since

E[T⁻¹ Σ (e_{2,t+h|t} - e_{1,t+h|t})²] = σ1² + σ2² - 2σ12,

we obtain asymptotically

Avar(ŵ) = (1/T) · σc²(w*) / (σ1² + σ2² - 2σ12)
        = (1/T) · σ1² σ2² (1 - ρ12²) / (σ1² + σ2² - 2ρ12 σ1 σ2)²
        = (1/T) · (1 - ρ12²) / (σ1/σ2 + σ2/σ1 - 2ρ12)².
Is this large? For example, with σ1 = σ2 and ρ12 = 0.9, Avar(ŵ) = (1/T)·(1 - 0.81)/(2 - 1.8)² = 4.75/T, so for T = 100 the standard error of ŵ is about 0.22 — for a weight!
This is huge!
(Note: precision becomes much better if the forecast errors are negatively correlated.)
[Figure: precision of the estimated weight ŵ plotted against σ1/σ2 and ρ12]
Many different approaches to combine K different forecasts have been proposed (cf. Timmermann, 2006):
- Equal weights: w_i = 1/K.
- Optimal linear weights: using the weights estimated by OLS (see above).
- Relative performance weights: w_i = MSE_i⁻¹ / Σ_k MSE_k⁻¹.
- Rank-based weights: w_i = Rank_i⁻¹ / Σ_k Rank_k⁻¹.
- Trimming: before weighting, discard the worst 25%, 50% or even 75% of the forecasts.
- Clustering: (1) cluster forecasts into groups of similar forecasts, (2) compute the average forecast of each cluster, (3) combine these cluster forecasts into one forecast using one of the previous weighting schemes.
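The performance- and rank-based schemes above can be sketched as follows (the MSE values are illustrative):

```python
import numpy as np

def performance_weights(mses):
    """Weights proportional to inverse MSE."""
    inv = 1.0 / np.asarray(mses, dtype=float)
    return inv / inv.sum()

def rank_weights(mses):
    """Weights proportional to inverse rank (best forecast gets rank 1)."""
    ranks = np.argsort(np.argsort(mses)) + 1
    inv = 1.0 / ranks
    return inv / inv.sum()

mses = np.array([0.8, 1.0, 2.0])
w_perf = performance_weights(mses)   # best forecast gets the largest weight
w_rank = rank_weights(mses)          # 6/11, 3/11, 2/11
```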
Shrinkage
Consider shrinking the OLS weight toward zero, w̃ = ŵ/(1 + s) with shrinkage parameter s > 0.

If E[ŵ] = w, then E[w̃] = w/(1 + s) < w, hence w̃ is biased.
However, Var(w̃) = Var(ŵ/(1 + s)) = Var(ŵ)/(1 + s)² < Var(ŵ).
Hence, the MSE (= bias² + variance) of the shrinkage estimator can, in principle, be smaller than the MSE of the OLS estimator.
In general, shrinkage pays off if the variance of the OLS estimator is large. As we saw above, this may be a typical situation in forecast combination exercises.
In multiple regressions, shrinkage may also be a good idea if the regressors are highly correlated, such that the moment matrix X′X is nearly singular (and its inverse poorly behaved). Again, this may characterize a set of multiple forecasts.
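The bias-variance trade-off of the shrinkage weight can be checked by simulation (a sketch; the numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
w_true, se, s = 0.6, 0.5, 0.5                # true weight, sampling std. dev., shrinkage

w_hat = w_true + se * rng.normal(size=100_000)   # noisy OLS-type weight estimates
w_shrunk = w_hat / (1.0 + s)                     # shrink toward zero

mse_ols = np.mean((w_hat - w_true) ** 2)         # = variance only (unbiased)
mse_shrunk = np.mean((w_shrunk - w_true) ** 2)   # bias^2 + smaller variance
```

With a large sampling variance, the biased but less noisy shrinkage estimate has the smaller MSE.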