Sebastiano Manzan
Contents
1. Getting Started with R
  1.1 Working with data in R
  1.2 Plotting the data
  1.3 From prices to returns
  1.4 Distribution of the data
  1.5 Creating functions in R
  1.6 Loops in R
4. Volatility Models
  4.1 Moving Average (MA) and Exponential Moving Average (EMA)
  4.2 Auto-Regressive Conditional Heteroskedasticity (ARCH) models
require(tseries)
sp500 <- get.hist.quote("^GSPC", start="1970-01-01",end="2015-08-31", quote="AdjClose",
provider="yahoo", compression="m", quiet=TRUE)
where sp500 is defined as a zoo object and the arguments of the function are (see help(get.hist.quote) for
more details):
1. the instrument, that is, the ticker of the asset ("^GSPC" for the S&P 500 Index)
2. the start and end dates of the sample period
3. the quote to download ("AdjClose" denotes the adjusted closing price)
4. the provider of the data ("yahoo") and the compression, that is, the frequency of the data ("m" for monthly)
An alternative to using get.hist.quote() is the getSymbols() function from the quantmod package, which
has the advantage of allowing you to download data for several symbols simultaneously. For example, if we are
interested in obtaining data for Apple Inc. (ticker: AAPL) and the S&P 500 Index from January 1990 to August
2015 we could run the following lines of code:
require(quantmod)
getSymbols(c("AAPL","^GSPC"), src="yahoo", from='1990-01-02', to='2015-08-31')
Notice that the getSymbols() function does not require you to specify the frequency of the data, but
uses the daily frequency by default. To subsample to lower frequencies, the package provides the functions
to.weekly() and to.monthly() that convert the series (from daily) to weekly or monthly. Notice also that
the getSymbols() function creates as many objects as tickers, and each contains the open, high, low, close,
and adjusted close prices and the volume as zoo objects. The quantmod package provides functions to extract
information from these objects, such as Ad(AAPL), which returns the adjusted closing price of the AAPL object
(similarly, the functions Op(x), Hi(x), Lo(x), Cl(x), Vo(x) extract the open, high, low, and close prices and the volume).
The package also has functions such as LoHi(x), which creates a series with the difference between the highest
and lowest intra-day price. Another useful function is ClCl(x), which calculates the daily percentage change
of the closing price in day t relative to the closing price in day t-1.
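To see what ClCl() computes, the same close-to-close calculation can be done by hand on a toy vector of closing prices (the numbers below are made up for illustration):

```r
# toy closing prices (hypothetical values, for illustration only)
cl <- c(100, 102, 99, 105)
# close-to-close percentage change: (Cl_t - Cl_{t-1}) / Cl_{t-1}
clcl <- cl[-1] / cl[-length(cl)] - 1
round(100 * clcl, 2)  # 2.00 -2.94 6.06
```

Applied to a zoo object downloaded with getSymbols(), ClCl() performs the same operation on the closing-price column.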
The third possibility is to use the fImport package, which has the function yahooSeries() that can download
several tickers for a specified period of time and frequency. An example is provided below:
require(fImport)
data <- yahooSeries(c("AAPL", "^GSPC"), from="2003-01-01",
                    to="2015-08-31", frequency="monthly")
In addition, the package fImport also provides the function fredSeries(), which allows you to download data
from FRED, a database that includes thousands of macroeconomic and financial series for the U.S. and other countries.
Similarly to the ticker of a stock, you need to know the FRED symbol for the variable that you are interested
in downloading, which you can find on the FRED webpage. For example, UNRATE is the civilian unemployment rate for
the US, CPIAUCSL is the Consumer Price Index (CPI) for all urban consumers, and GDPC1 is the real Gross
Domestic Product (GDP) in billions of chained 2009 dollars. We can download the three macroeconomic
variables together as follows:
require(fImport)
options(download.file.method="libcurl")
macrodata = fredSeries(c('UNRATE','CPIAUCSL','GDPC1'), from="1950-01-02", to="2015-08-31")
Next, we might want to look at the data using the head() and tail() commands, which provide the first and
last six observations:
head(sp500)
           AdjClose
1970-01-02     85.0
1970-02-02     89.5
1970-03-02     89.6
1970-04-01     81.5
1970-05-01     76.6
1970-06-01     72.7
tail(macrodata)
GMT
2015-02-01
2015-03-01
2015-04-01
2015-05-01
2015-06-01
2015-07-01
Since these objects are defined as time series, the function also prints the date of each observation. Notice that
in the macro-dataset we have many NAs that represent missing values for the GDP variable. This is due to the
fact that the unemployment rate and the CPI Index are available monthly, whilst Real GDP is only available
at the quarterly frequency. The object can be converted to the quarterly frequency using the to.quarterly()
function in the quantmod package as follows: to.quarterly(as.zooreg(macrodata), OHLC=FALSE).
Plotting the data is a useful way to learn about the behavior of a series over time, and this can be done very
easily in R using the function plot(). For example, a graph of the S&P 500 Index from 1970 to 2015 can
be generated with the command plot(sp500). Since we defined sp500 as a time series object (more
specifically, as a zoo object), the x-axis represents time without any further intervention by the user.
For many economic and financial variables that display exponential growth over time, it is often convenient to
plot the log of the variable rather than its level. This has the additional advantage that differences between the
values at two points in time represent an approximate percentage change of the variable in that period of time.
This can be achieved by plotting the natural logarithm of the variable with the command plot(log(sp500),
xlab="Time", ylab="S&P 500 Index"):
Notice that we added two more arguments to the plot() function to customize the axis labels, instead
of using the default values (compare the two graphs).
where we multiply by 100 to obtain percentage returns. An alternative way to calculate these returns is by
using the diff() function, which takes the first difference of the time series, that is, diff(P)_t = P_t - P_{t-1}. Using
the diff() function we would calculate returns as follows:
simpleR = 100 * diff(sp500) / lag(sp500, -1)
logR    = 100 * diff(log(sp500))
A first step in data analysis is to calculate descriptive statistics that summarize the main features of the
distribution of the data, such as the average/median returns and their dispersion. One way to do this is by
using the summary() function which provides the output shown below:
summary(simpleR)
     Index               AdjClose
 Min.   :1970-02-02   Min.   :-21.76
 1st Qu.:1981-06-16   1st Qu.: -1.85
 Median :1992-11-02   Median :  0.94
 Mean   :1992-10-31   Mean   :  0.68
 3rd Qu.:2004-03-16   3rd Qu.:  3.60
 Max.   :2015-08-03   Max.   : 16.30
summary(logR)
     Index               AdjClose
 Min.   :1970-02-02   Min.   :-24.54
 1st Qu.:1981-06-16   1st Qu.: -1.87
 Median :1992-11-02   Median :  0.93
 Mean   :1992-10-31   Mean   :  0.58
 3rd Qu.:2004-03-16   3rd Qu.:  3.53
 Max.   :2015-08-03   Max.   : 15.10
You may notice that the mean, median, and 1st and 3rd quartiles (25% and 75%) are quite close for the two
return definitions, but the minimum (and the maximum) are quite different: for the simple return the largest
drop is -21.763% and for the logarithmic return it is -24.543%. The reason for this is that the logarithmic return is
an approximation to the simple return that works well when the returns are small but becomes increasingly
unreliable for large (positive or negative) returns.
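A quick numerical check illustrates the quality of the approximation r = 100 * log(1 + R/100), using a few illustrative return values:

```r
# simple returns in percent (illustrative values)
R <- c(1, 5, -20)
# corresponding logarithmic returns in percent
r <- 100 * log(1 + R/100)
round(r, 3)  # 0.995 4.879 -22.314
```

For the 1% return the two definitions almost coincide, while for the -20% return the log return overstates the drop by more than two percentage points.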
An advantage of using logarithmic returns is that it simplifies the calculation of multiperiod returns. This is
due to the fact that the (continuously compounded) return over k periods is given by r_t(k) = log(P_t) - log(P_{t-k}),
which can be expressed as the sum of the one-period logarithmic returns, that is,
    r_t(k) = sum_{j=1}^{k} r_{t-j+1}
Instead, for simple returns the multi-period return would be calculated as R_t(k) = prod_{j=1}^{k} (1 + R_{t-j+1}) - 1. One
reason to prefer logarithmic to simple returns is that it is easier to derive the properties of the sum of random
variables, rather than their product. The disadvantage of using the continuously compounded return is that
when calculating the return of a portfolio the weighted average of log returns of the individual assets is only
an approximation of the log portfolio return. However, at the daily and monthly horizons returns are very
small and thus the approximation error is relatively minor.
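The aggregation property can be verified on a short, made-up price path:

```r
P <- c(100, 103, 101, 108)    # illustrative prices over four periods
r <- diff(log(P))             # one-period log returns
R <- diff(P) / P[-length(P)]  # one-period simple returns
sum(r)                        # the 3-period log return: log(P[4]) - log(P[1])
prod(1 + R) - 1               # the 3-period simple return: P[4]/P[1] - 1 = 0.08
```

The sum of the log returns reproduces log(108/100) exactly, while the simple returns must be compounded by taking the product.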
Descriptive statistics can also be obtained by individual commands that calculate the mean(), sd() (standard
deviation), median(), and empirical quantiles (quantile(x, tau), with tau a value between 0 and 1). The
package fBasics provides additional functions such as skewness() and kurtosis(), which are particularly
relevant in the analysis of financial data. This package also has a function basicStats() that provides a table
of descriptive statistics as follows (see help(basicStats) for details):
require(fBasics)
basicStats(logR)
            AdjClose
nobs         547.000
NAs            0.000
Minimum      -24.543
Maximum       15.104
1. Quartile   -1.872
3. Quartile    3.534
Mean           0.576
Median         0.932
Sum          315.183
SE Mean        0.190
LCL Mean       0.204
UCL Mean       0.948
Variance      19.644
Stdev          4.432
Skewness      -0.714
Kurtosis       2.628
In addition, in the presence of several assets we might be interested in calculating the covariance and
correlation among these assets. Let's define Ret to have two columns representing the return of Apple in
the first column, and the return of the S&P 500 Index in the second column. We can use the functions cov()
and cor() to estimate the covariance and correlation as follows:
cov(Ret, use='complete.obs')
       aapl sp500
aapl  193.8  24.1
sp500  24.1  18.5
where the elements on the diagonal are the variances of the Apple and S&P 500 returns and the off-diagonal
element is the covariance between the two series (the off-diagonal elements are the same because the
covariance between X and Y is the same as the covariance between Y and X). The correlation matrix is
calculated as:
cor(Ret, use='complete.obs')
       aapl sp500
aapl  1.000 0.402
sp500 0.402 1.000
where the diagonal elements are equal to 1 because each represents the correlation of X with X, and the off-diagonal element is the correlation between the two returns.
We can now plot the monthly simple and logarithmic returns as follows:
plot(simpleR)
abline(h=0, col=4)
where the abline() command produces a horizontal line at 0 with a certain color (col=4 is blue); the type of line
(lty) and width (lwd) can be customized in the same way.
A histogram shows the frequency with which a variable takes values in a certain bin/interval. The function hist() serves this purpose and can be used as
follows:
hist(logR, breaks=50, main="S&P 500")
This is a basic histogram in which we set the number of bins to 50 and the title of the graph to S&P 500. We
can also add a nonparametric estimate of the frequency that smooths out the roughness of the histogram and
makes the density estimate continuous (N.B.: the prob=TRUE option makes the y-scale probabilities instead
of frequencies):
hist(logR, breaks=50, main="S&P 500", xlab="Return", ylab="", prob=TRUE)
lines(density(logR,na.rm=TRUE),col=2,lwd=2)
box()
## [1] 0.576
Not surprisingly, the result is the same as the one obtained using the mean() function. More generally, a function
can take several arguments, but it has to return only one object, which could be a list of items. The function
we defined above is quite simple and has several limitations: 1) it does not take into account that the series
might have NAs, and 2) it does not calculate the mean of each column in case there are several. As an exercise,
modify the mymean function to accommodate these issues.
1.6 Loops in R
A loop consists of a set of commands that we want to repeat a pre-specified number of times, storing
the results for further analysis. There are several types of loops, with the for loop probably the most
popular. The syntax in R to implement a for loop is as follows:
for (i in 1:N)
{
  ## write your commands here
}
where i is an index and N is the number of times the loop is repeated. As an example, we can write a
function that contains a loop to calculate the sum of a variable and compare the result to the sum() function
provided in R. This function could be written as follows:
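One possible implementation, consistent with the description below (only the basic + operator and a for loop), is:

```r
# sum the elements of Y using only the + operator and a for loop
mysum <- function(Y)
{
  total <- 0
  for (i in 1:length(Y))
    total <- total + Y[i]
  return(total)
}
mysum(1:100)  # 5050, the same value returned by sum(1:100)
```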
Notice that to define the mysum() function we only use the basic + operator and the for loop. This is just a
simple illustration of how the for loop can be used to produce functions that perform a certain operation
on the data. Let's consider another example of the use of the for loop that demonstrates the validity of the
Central Limit Theorem (CLT). We are going to do this by simulation, which means that we simulate data,
calculate some statistic of interest, and repeat these operations a large number of times. In particular, we want
to demonstrate that, no matter how the data are distributed, the sample mean is normally distributed with
mean equal to the population mean and variance given by sigma^2/N, where sigma^2 is the population variance of the data and
N is the sample size. We assume that the population distribution is N(0, 4) and we want to repeat a large
number of times the following operations:
1. Generate a sample of length N
2. Calculate the sample mean
3. Repeat 1-2 S times
Every statistical package provides functions to simulate data from a certain distribution. The function rnorm(N, mu, sigma)
simulates N observations from the normal distribution with mean mu and standard deviation
sigma, whilst rt(N, df, ncp) generates a sample of length N from the t distribution with df degrees-of-freedom
and non-centrality parameter ncp. The code to perform this simulation is as follows:
S     = 1000
N     = 1000
mu    = 0
sigma = 2
Ybar  = vector('numeric', S)
for (i in 1:S)
{
  Y = rnorm(N, mu, sigma)
  Ybar[i] = mean(Y)
}
c(mean(Ybar), sd(Ybar))
[1] -0.000337  0.062712
The object Ybar contains 1000 elements, each representing the sample mean of a random sample of length 1000
drawn from a certain distribution. We expect that these values are distributed as a normal distribution with
mean equal to 0 (the population mean) and standard deviation 2/sqrt(1000) = 2/31.623 = 0.063. We can assess this by plotting
the histogram of Ybar and overlapping it with the distribution of the sample mean. The graph below shows that
the two distributions seem very close to each other. This is confirmed by the fact that the mean of Ybar and its
standard deviation are both very close to their expected values. To evaluate the normality of the distribution,
we can estimate the skewness and kurtosis of Ybar, which we expect to be close to zero to indicate normality.
These values are -0.042 and -0.199, which can be considered close enough to zero to conclude that Ybar is
normally distributed.
What would happen if we generated samples from a t instead of a normal distribution? For a small number
of degrees-of-freedom the t distribution has fatter tails than the normal, but the CLT is still valid and we
should expect results similar to the previous ones. We can run the same code as above, but replace the line Y
= rnorm(N, mu, sigma) with Y = rt(N, df) with df=4. The plot of the histogram and normal distribution
(with sigma^2 = df/(df-2)) below shows that the empirical distribution of Ybar closely tracks the asymptotic
distribution of the sample mean.
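For completeness, the modified simulation might look as follows (only the line that draws the sample changes; the seed is fixed here so the sketch is reproducible):

```r
set.seed(42)                 # fix the seed so results are reproducible
S  <- 1000
N  <- 1000
df <- 4
Ybar <- vector('numeric', S)
for (i in 1:S)
{
  Y <- rt(N, df)             # draw from the t distribution with df = 4
  Ybar[i] <- mean(Y)
}
# the population variance is df/(df-2) = 2, so sd(Ybar) should be close to sqrt(2/N)
c(mean(Ybar), sd(Ybar), sqrt(df/(df - 2)/N))
```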
The estimate of the slope coefficient is given by beta1 = sigma_{X,Y} / sigma^2_X = rho_{X,Y} * (sigma_Y / sigma_X),
where sigma^2_X and sigma^2_Y represent the sample variances of the two variables, and sigma_{X,Y} and rho_{X,Y} are the sample
covariance and the correlation of Xt and Yt, respectively. The intercept is given by beta0 = Ybar - beta1 * Xbar,
where Xbar and Ybar represent the sample means of Xt and Yt. Let's assume that aaplret represents the monthly excess
return of Apple, which we consider our dependent variable, and that sp500ret is the monthly excess return
of the S&P 500. To calculate the estimate of the slope coefficient beta1 we first need to estimate the covariance
of the stock and the index, and the variance of the index return. The R commands to estimate these quantities
were introduced in the previous Chapter, so that beta1 can be calculated as follows:
cov(aaplret, sp500ret) / var(sp500ret)
[1] 1.301034
The interpretation of the coefficient estimate is that if the S&P 500 changes by 1% then we expect that the
Apple stock price changes by 1.301%. The alternative way to calculate beta1 is using the formula with the
correlation coefficient and the ratio of the standard deviations of the two assets:
cor(aaplret, sp500ret) * sd(aaplret) / sd(sp500ret)
[1] 1.301034
Once the slope parameter is estimated, we can then estimate the intercept as beta0 = Ybar - beta1 * Xbar:
mean(aaplret) - 1.301034 * mean(sp500ret)
[1] 0.7095597
Naturally, R has functions that estimate the LRM automatically and provide a wealth of information. The
function is called lm(), for linear model, and below is the way it is used to estimate the LRM:
fit <- lm(aaplret ~ sp500ret)
fit

Call:
lm(formula = aaplret ~ sp500ret)

Coefficients:
(Intercept)     sp500ret
     0.7096       1.3010

The fit object provides the coefficient estimates and the function that has been used; a richer set of statistics
is provided by the function summary(fit) as shown below:

Call:
lm(formula = aaplret ~ sp500ret)

Residuals:
    Min      1Q  Median      3Q     Max
-79.660  -5.796   0.777   7.725  32.215

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.7096     0.7598   0.934    0.351
sp500ret      1.3010     0.1753   7.420 1.34e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 12.77 on 286 degrees of freedom
Multiple R-squared: 0.1614,    Adjusted R-squared: 0.1585
F-statistic: 55.06 on 1 and 286 DF,  p-value: 1.34e-12
The table provided by the summary() function reports standard errors of the coefficient estimates, the t-statistics
and p-values for the null hypothesis that the coefficient is equal to zero (and stars to indicate
significance), the R2 and its adjusted version, and the F test statistic for the overall significance
of the regression. Interpreting the magnitude of the R2 is always related to what can be expected in the specific
context that is being analyzed. In this case, an R2 of 16% is not a very high number, and there is a lot of variability of
Apple returns that is not explained by a linear relationship with the market. This becomes evident when we
plot the time series of the residuals:
plot(fit$residuals)
abline(h=0, col=2, lty=2)
It is clear from this graph that the residuals are large in absolute magnitude (see the scale of the y-axis) and
that there are some large negative residuals, probably due to company-specific news hitting the market. We can
also evaluate the normality of the residuals graphically using a Quantile-Quantile (QQ) plot, a
scatter plot of the empirical quantiles of the time series against the quantiles of a normal distribution with
mean and variance estimated from the time series.
qqnorm(fit$residuals)         # Quantile-Quantile plot
qqline(fit$residuals, col=2)  # diagonal line
The continuous line in the graph represents the diagonal, and we expect the QQ points to be along or close to the line
if the residuals are normally distributed. Instead, in this case we observe large deviations from the diagonal
on the left tail and some smaller deviations on the right tail of the residuals distribution. The dot in the
bottom-left corner has coordinates approximately equal to -3% and -80%: this means that there is only 0.1%
probability that a normal distribution with the estimated mean and variance takes values smaller than -3%. On
the other hand, we observe that for Apple that same quantile is -80%, which indicates that the residuals of the
model have fatter tails relative to the normal distribution. This is because the monthly returns of Apple have
some extreme events that are not explained by exposure of the stock to market risk (beta1 * Xt). This produces
large unexplained returns that might have occurred because of the release of firm-specific news (rather than
market-wide news) which depressed the stock price in that specific month. In a later Section we discuss the effect
of outliers on OLS coefficient estimates.
strategies indexes) from January 1993 until December 2013 at the monthly frequency. We start the analysis
by considering the HF Index, which represents the first column of the file. Below we show a scatter plot of the
HF Index return against the S&P 500. There seems to be positive correlation between these indexes, although
the most striking feature of the plot is the difference in scale between the x- and y-axis: the HF returns range
between -6% and +6%, while the equity index ranges between -20% and +12%. The standard deviation of the HF index is
2.111% compared to 4.444% for the S&P 500 Index, which shows that hedge funds, in general, provide a hedge
against large movements in markets (the S&P 500 in this case).
plot(sp500, hfindex, main="Credit Suisse Hedge Fund Index", cex.main=0.75)
abline(v=0, col=2)
abline(h=0, col=2)
Before introducing non-linearities, we estimate a linear model in which the HF index return is explained by
the S&P 500 return. The results below show the existence of a statistically significant relationship between the
two returns, with a 0.274 exposure of the HF return to the market return (if the market return changes by
1% then we expect the fund return to change by 0.274%). The R2 of the regression is 0.331, which is not
very high and might indicate that a nonlinear model could be more successful in explaining the time variation of
the hedge fund returns. We can add the fitted linear relationship 0.564 + 0.274 * sp500 to the previous scatter
plot to have a graphical understanding of the LRM.
fitlin <- lm(hfindex ~ sp500)
fitlin

(Intercept)       sp500
      0.564       0.274
Let's consider now the quadratic model in which we add the square of the market return:
sp500sq <- sp500^2
fitquad <- lm(hfindex ~ sp500 + sp500sq)
If the coefficient of the squared term is statistically significant, then we conclude that a nonlinear form is a
better model for the relationship between market and fund returns. In this case, we find that the p-value for
the null hypothesis that the coefficient of the squared market return is equal to zero is 0.039 (or 3.9%), which
is smaller than 0.10 (or 10%), and we thus find evidence of nonlinearity in the relationship between the HF and
the market return. In the plot below we see that the contribution of the quadratic term (dashed line) is more
apparent at the extremes, while for small returns it overlaps with the fitted values of the linear model. In
terms of goodness-of-fit, the adjusted R2 of the quadratic regression is 0.338 compared to 0.329 for the linear
model, a modest increase that nonetheless makes the quadratic model preferable.
The second type of nonlinearity we discussed earlier is to assume the relationship is linear but with different
slopes below and above a certain threshold. In the example below we consider as threshold the median value of
the independent variable, which in this case is the market return. We then create two variables that represent
the market return below the median (sp500down) and above the median (sp500up). These two variables can
then enter the lm() command and the OLS estimation results are reported below.
m         <- median(sp500)
sp500up   <- sp500 * (sp500 >= m)
sp500down <- sp500 * (sp500 < m)
fitupdown <- lm(hfindex ~ sp500up + sp500down)
The slope coefficient (or market exposure) above the median is 0.222 and below is 0.313, which suggests that
the HF Index is more sensitive to downward movements of the market than to upward movements. We
conclude that there is evidence of a difference in the exposure of HF returns to positive/negative
market conditions, although it is not very strong. The adjusted R2 of 0.330 is slightly higher than for the
linear model but lower relative to the quadratic model. The graph below shows the fitted regression line for
this model.
The problem with this graph is that the scale of the y-axis ranges between -40% and 10%, and all of the HF
returns, except for one month, fall in the much smaller range between -10% and +10%. The extreme observation
that skews the graph corresponds to a month in which the market index lost 7.5% and the equity market
neutral index lost over 40%. To find out when the extreme observation occurred, we can use the command
which(hfneutral < -30), which indicates that it represents the 179th observation and corresponds to Nov
2008. What happened in Nov 2008 to create a loss of 40% for an aggregate index of market neutral strategy
hedge funds? In a press release, Credit Suisse discusses how they marked down to zero the assets of
the Kingate Global Fund, a hedge fund based in the British Virgin Islands that acted as a feeder for
the Madoff funds and was completely wiped out. Since the circumstances were so exceptional
and unlikely to happen again to such a large extent, it is probably warranted to simply drop that observation
from the sample when estimating the model parameters.
The most important reason for excluding extreme observations from the sample is that they
bias the coefficient estimates away from their true values. We can use the equity market neutral returns as
an example to evaluate the effect of one extreme event on the parameters. Below, we first estimate a LRM of
the fund returns on the market return and then the same regression, but dropping observation 179 from the
sample.
fit0 <- lm(hfneutral ~ sp500)
lm(hfneutral[-179] ~ sp500[-179])
These results indicate that by dropping the Nov 2008 return the estimate of beta0 increases from 0.352 to 0.56,
while the market exposure of the fund declines from 0.199 to 0.128, thus indicating a smaller exposure to
market returns. However, even after removing the outlier the exposure is statistically significant at 1%, which
suggests that the aggregate index is not market neutral. In terms of goodness-of-fit, the R2 increases from
0.094 to 0.253 as a result of dropping the large error experienced in November 2008.
head(FF, 3)
       Mkt.RF   SMB   HML   RF
192607   2.95 -2.50 -2.67 0.22
192608   2.63 -1.20  4.50 0.25
192609   0.38 -1.33 -0.30 0.23
tail(FF, 3)
       Mkt.RF   SMB   HML RF
201310   4.17 -1.53  1.39  0
201311   3.12  1.31 -0.38  0
201312   2.81 -0.44 -0.17  0
The dataset starts in July 1926 and ends in December 2013, and each column represents one of the factors discussed
above (plus the risk-free rate), with the returns expressed in percentage. The file is imported as a data.frame,
and it is useful to give it a time series characterization, as we did in the previous Chapter, using the command
FF <- zooreg(FF, start=c(1926,7), end=c(2013,12), frequency=12), which defines the dataset as a zoo
object.
Let's look at some descriptive statistics of these factors, in particular the mean and standard deviation. Since we
have a matrix with 4 columns (market returns, SMB, HML, and the risk-free rate), using the mean() function
would calculate just one number that represents the average of all columns. Instead, we want to apply the
mean() function to each column, and there is an easy way to do this using the function apply(). This function
applies a function (e.g., mean or sd) to all the rows or columns of a matrix (a second argument
equal to 1 applies it to rows and 2 to columns). In our case:
apply(FF, 2, mean)
   Mkt.RF       SMB       HML        RF
0.6487429 0.2342095 0.3938190 0.2871429
The results show that the average monthly market return from 1926 to 2013 has been 0.649% (approx. 7.785%
yearly) in excess of the risk-free rate. The monthly average of SMB is 0.234% (approx. 2.808% yearly), which
measures the monthly extra return from investing in a portfolio of small caps relative to investing in large
capitalization stocks. The average of HML provides the extra return from investing in value stocks relative
to growth stocks, which annualized corresponds to about 4.728%.
apply(FF, 2, sd)
   Mkt.RF       SMB       HML        RF
5.4137611 3.2317907 3.5117457 0.2539519
The standard deviation of monthly market returns is 5.414%, which can be annualized by multiplying by
sqrt(12) (5.414 x 3.464, or about 18.75% per year). The
SMB and HML factors also show significant volatility, since their monthly standard deviations are 3.232 and 3.512,
respectively. Another quantity that we need to calculate to better understand the behavior of these risk factors
is their correlation. We can calculate the correlation matrix using the cor() function that was introduced
earlier:
cor(FF)
            Mkt.RF         SMB        HML          RF
Mkt.RF  1.00000000  0.33429226 0.21574891 -0.06622179
SMB     0.33429226  1.00000000 0.11992280 -0.05650984
HML     0.21574891  0.11992280 1.00000000  0.01528333
RF     -0.06622179 -0.05650984 0.01528333  1.00000000
where we find that both SMB and HML are weakly correlated with the market returns (0.334 and 0.216,
respectively) and also with each other (0.12). In this sense, it seems that the factors capture relatively
uncorrelated sources of risk, which is valuable from a diversification standpoint.
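As a rough sketch of this diversification argument, we can compute the volatility of an equally weighted combination of the market and HML factors, using the standard deviations and correlation reported above:

```r
# monthly standard deviations and correlation taken from the output above
sd.mkt <- 5.4137611
sd.hml <- 3.5117457
rho    <- 0.21574891
w      <- 0.5                  # equal weights on the two factors
# standard deviation of the equally weighted combination
sd.port <- sqrt(w^2 * sd.mkt^2 + (1 - w)^2 * sd.hml^2 +
                2 * w * (1 - w) * rho * sd.mkt * sd.hml)
sd.port                        # about 3.53, below the average of the two sds (4.46)
```

Because the correlation is low, the combined volatility is well below the average of the individual volatilities.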
Let's consider an application to mutual fund returns and their relationship to the FF factors. We download
from Yahoo Finance data for the DFA Small Cap Value mutual fund (ticker: DFSVX). The fund invests in
companies with small capitalization that are considered undervalued according to a valuation ratio (e.g., Book
to Market ratio) and holds the investment until these ratios are considered fair. In the long run, this strategy
outperforms the market, although it comes at the risk that it might significantly underperform over
shorter evaluation periods. We estimate a three-factor model with the monthly excess return of DFSVX as the
dependent variable and the regressors represented by MKT, SMB, and HML. The three-factor model can
be estimated using the lm() command. However, when dealing with time series the package dyn provides a
wrapper for the lm() function which synchronizes the variables in case they span different time periods, and
also allows you to use time series commands (like diff() and lag()) in the equation definition. Notice that in the
regression below we are regressing ex_dfsvx on the first three columns of FF (FF[,1:3]) with the dependent
variable starting in March 1993, but the independent variables starting in 1926. In this case dyn$lm() will
automatically adjust the series so that they all span the same time period:
fit <- dyn$lm(ex_dfsvx ~ FF[,1:3])  # regress the excess fund returns on the 3 F-F factors
summary(fit)

Call:
lm(formula = dyn(ex_dfsvx ~ FF[, 1:3]))

Residuals:
    Min      1Q  Median      3Q     Max
-4.3860 -0.6704 -0.0041  0.7024  4.6372

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.001708   0.072859   0.023    0.981
FF[, 1:3]1  1.057903   0.016828  62.865   <2e-16 ***
FF[, 1:3]2  0.806633   0.023013  35.051   <2e-16 ***
FF[, 1:3]3  0.673636   0.024075  27.981   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.129 on 245 degrees of freedom
  (801 observations deleted due to missingness)
Multiple R-squared: 0.9618,    Adjusted R-squared: 0.9613
F-statistic: 2056 on 3 and 245 DF,  p-value: < 2.2e-16
As is clear from the regression results, the fund has exposure not only to the market factor but also to
SMB and HML, with positive coefficients. These results indicate that the fund overweights value and small
stocks (relative to growth and large) and thus benefits, in the long run, from the premium from investing
in those stocks. We expected this result since the fund clearly states its strategy to buy undervalued small
cap stocks. However, even if we had ignored the investment strategy followed by the fund,
the regression results would still have indicated that the manager overweights small cap value stocks in the
portfolio. In this model, positive coefficients for SMB and HML indicate that the fund manager overweights
(relative to the market) small caps and/or value stocks, while a negative value suggests that the manager
gears the investments toward large caps and/or growth stocks. The intercept estimate is typically referred to in
the finance literature as alpha and is interpreted as the risk-adjusted return, with the risk being measured
by β₁MKT_t + β₂SMB_t + β₃HML_t. The example above shows how to use the LRM to analyze the
investing style of a mutual fund or an investment strategy. In conjunction with nonlinear functional forms,
this model can also be useful to investigate the style and the risk factors of hedge fund returns. In this case the
role of nonlinearities is essential given the time variation in the exposures (betas) of this type of fund, as well as
the extensive use of derivatives, which implies nonlinear exposures to the risk factors.
The fund has an exposure of 1.077 to the US equity market, which is highly significant. In addition, there
seems to be significant exposure to SMB, with a coefficient of 0.395 and a t-statistic of 4.459, but not
to HML, which has a t-statistic of 1.196. In addition, the R² of the regression is equal to 0.65, which indicates a
reasonable fit for this type of regression. Based on these results, we would conclude that the fund invests in
US equity with a focus on small cap stocks. However, it turns out that the fund is the Oppenheimer Developing
Markets fund (ticker: ODMAX), which invests exclusively in stocks from emerging markets and does
not hold any US stock. The results above appear inconsistent with the declared investment strategy of the
fund: how is it possible that the exposures to MKT and SMB are large and significant in the regression
above when the fund does not hold any US stock? It is possible that, despite these factors not being directly
relevant to explain the performance of the fund, they indirectly proxy for the effect of an omitted risk factor
that is correlated with MKT and SMB. Given the investment objective of the fund, we could consider including
as an additional risk factor the MSCI Emerging Markets (EM) Index, which seems a more appropriate choice
of benchmark for this fund. In terms of correlation between the EM returns and the FF factors, the results
below indicate that there is a strong positive correlation with the US equity market, as demonstrated by a
correlation of 0.699, and much lower correlations with SMB (positive) and HML (negative).
      EM Mkt.RF   SMB    HML     RF
EM 1.000  0.699 0.290 -0.184 -0.001
It seems thus reasonable to include the EM Index returns as an additional risk factor to explain the performance
of ODMAX. The Table below shows the estimation results of a regression of ODMAX excess monthly returns
on 4 factors, the EM factor in addition to the FF factors.
(Coefficient table for (Intercept), EM, FF[, 1:3]1, FF[, 1:3]2, and FF[, 1:3]3; the key estimates are discussed next.)
Not surprisingly, the estimated exposure to EM is 0.803 and highly significant, whilst the exposure to the
FF factors decline significantly. In particular, adding EM to the regression has the effect of reducing the
coefficient of the MKT from 1.077 to 0.127. This large change (relative to its standard error) in the coefficient
can be attributed to the effect of omitting a relevant variable (i.e., EM) which produces bias in the coefficient
estimates of the FF factors. The estimate from the first regression of 1.077 is biased because it does not represent
(only) the effect of MKT on ODMAX, but also acts as a good proxy for the omitted source of risk of the EM
Index, given the large and positive correlation between MKT and EM. The effect of omitted variables and
the resulting bias in the coefficient estimates is not only an econometric issue, but it has important practical
implications. If we use the LRM for performance attribution, that is, disentangling the systematic component
of the fund return (beta) from the risk-adjusted part (alpha), then omitting some relevant risk factors has the
effect of producing bias in the remaining coefficients and thus changes our conclusion about the contribution
of each component to the performance of the fund.
To further illustrate the effect of omitted variables in producing biased coefficient estimates, we can perform
a simulation study of the problem. The steps of the simulation are as follows:
1. We assume that the dependent variable Y_t (for t = 1, …, T) is generated by the following model:
Y_t = 0.5 X_{1,t} + 0.5 X_{2,t} + ε_t, where X_{1,t} and X_{2,t} are simulated from the (multivariate) normal
distribution with mean 0 and variance 1 for both variables, with their correlation set equal to ρ. The
error term ε_t is also normally distributed with mean 0 and standard deviation 0.5. In the context of this
simulation exercise, the model above for Y_t represents the true model, for which we know the population
values of the parameters (i.e., β₀ = 0 and β₁ = β₂ = 0.5).
2. We then estimate by OLS the following model: Y_t = β₀ + β₁ X_{1,t} + ε_t, where we intentionally omit
X_{2,t} from the regression. Notice that X_{2,t} is both a relevant variable to explain Y_t (since β₂ = 0.5) and
correlated with X_{1,t} if we set ρ ≠ 0.
3. We repeat steps 1-2 S times and store the estimate of β₁.
We can then analyze the properties of the estimate of β₁ by, for example, plotting a histogram of the
S values obtained in the simulation. If omitting X_{2,t} does not introduce bias in the estimate of β₁, then we
would expect the histogram to be centered at the true value of the parameter, 0.5. Instead, the histogram will
be shifted away from the true value of the parameter if the omission introduces estimation bias. The code
below starts by setting the values of the parameters, such as the number of simulations, the length of the time
series, and the parameters of the distributions. Then the for loop iterates S times over steps 1 and 2 described above,
while the bottom part of the program plots the histogram.
require(MASS)  # this package is needed for the function `mvrnorm()` to simulate from the
               # multivariate normal distribution

S     <- 1000                          # set the number of simulations
T     <- 300                           # set the number of periods
mu    <- c(0,0)                        # mean of variables X1 and X2
cor   <- 0.7                           # correlation coefficient between X1 & X2
Sigma <- matrix(c(1,cor,cor,1), 2, 2)  # covariance matrix of X = [X1, X2]
beta  <- c(0.5, 0.5)                   # slope coefficients of X = [X1, X2]
beta1 <- numeric(S)                    # storage for the S estimates of beta1

for (i in 1:S)
{
  X        <- mvrnorm(T, mu, Sigma)    # simulate X = [X1, X2]
  eps      <- rnorm(T, 0, 0.5)         # errors
  Y        <- X %*% beta + eps         # step 1: generate Y from the true model
  beta1[i] <- coef(lm(Y ~ X[,1]))[2]   # step 2: OLS omitting X2; store the slope
}

hist(beta1, main = "", xlab = "OLS estimate of beta1")  # histogram of the S estimates
In the simulation exercise we set the correlation between X_{1,t} and X_{2,t} equal to 0.7 (line cor <- 0.7), and
it is clear from the histogram above that the distribution of the OLS estimate is shifted away from the true
value of 0.5. This illustrates quite well the problem of omitted variable bias: we expect the estimates of β₁
to be close to the true value of 0.5, but we find that these estimates range from 0.75 to 0.95. The bias that
arises from omitting a relevant variable does not disappear by using longer samples: it stems from the fact that we
omitted a relevant variable that is highly correlated with an included variable. If the omitted variable were
relevant but uncorrelated with the included variable, then the histogram of the OLS estimates would look like
the following plot, which is produced with the earlier code by setting cor <- 0.
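The center of the shifted histogram can be checked against the textbook omitted-variable bias formula, plim of the estimate of β₁ = β₁ + β₂·Cov(X₁,X₂)/Var(X₁) = 0.5 + 0.5 × 0.7 = 0.85, which matches the 0.75-0.95 range observed above. A minimal sketch (assuming the MASS package is available, as in the simulation code above) verifies this with a single long sample:

```r
library(MASS)                              # mvrnorm(), as in the simulation above
set.seed(42)
T     <- 100000                            # one long sample instead of S short ones
Sigma <- matrix(c(1, 0.7, 0.7, 1), 2, 2)   # correlation 0.7 between X1 and X2
X     <- mvrnorm(T, c(0, 0), Sigma)
Y     <- 0.5 * X[, 1] + 0.5 * X[, 2] + rnorm(T, 0, 0.5)
b1    <- coef(lm(Y ~ X[, 1]))[2]           # short regression omitting X2
b1                                         # close to 0.5 + 0.5 * 0.7 = 0.85
```

The long sample makes the sampling noise negligible, so the estimate sits essentially at the biased probability limit rather than at the true value 0.5.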
The option plot=TRUE implies that the function provides a graph where the horizontal axis represents the
lag k (starting at 0 and expressed as a fraction of a year) and the vertical axis represents the autocorrelation,
which is a value between -1 and 1. The horizontal dashed lines represent the 95% confidence interval, equal
to ±1.96/√T, for the null hypothesis that the population autocorrelation at lag k is equal to 0. If the
autocorrelation at lag k is within the interval, we do not reject the null that the correlation
coefficient for that lag is equal to zero (at the 5% level). With the option plot=FALSE, the function prints the
estimates of the autocorrelation up to lag.max (equal to 12 in this example):
Autocorrelations of series 'spm.ret', by lag
0.0000 0.0833 0.1667 0.2500 0.3333 0.4167 0.5000 0.5833 0.6667 0.7500
1.000 0.071 -0.002 0.071 0.048 0.022 -0.073 0.037 0.058 -0.006
0.8333 0.9167 1.0000
0.011 0.028 0.078
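The confidence bands can also be computed by hand from the ±1.96/√T formula. The sketch below is illustrative and uses a simulated stand-in for the monthly return series (the actual spm.ret is not re-created here):

```r
set.seed(1)
ret  <- rnorm(288)                    # stand-in for 288 monthly returns
T    <- length(ret)
band <- 1.96 / sqrt(T)                # half-width of the 95% confidence band
rho  <- acf(ret, lag.max = 12, plot = FALSE)$acf[-1]   # sample ACF, lags 1-12
sum(abs(rho) > band)                  # how many lags fall outside the band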
This analysis considers monthly returns for the S&P 500 Index and shows that there is very small and
statistically insignificant serial correlation across months. We can use the ACF to investigate whether there is
evidence of serial correlation at the daily frequency for the same sample period. The average daily percentage
return in this sample period is 0.027% and the daily standard deviation is 1.156% (the total number of days is 6049).
The ACF plot up to 25 lags is reported below:
Since the sample size (T) is quite large at the daily frequency, these bands are tight around zero. However, we
find that some of these autocorrelations are statistically significant (lags 1, 2, 5, 7 and 10) at 5%, although from
an economic standpoint they are too small to provide predictive power for the direction of future returns.
Although the S&P 500 returns do not show any significant correlation at the daily and monthly frequency,
it is common to find for several asset classes that their absolute or square returns show significant and
long-lasting serial correlations. This is a property of financial returns in general, mostly at the daily frequency
and higher. The next plot shows the ACF for the absolute and square daily S&P 500 returns up to lag 100 days:
It is clear from these graphs that the absolute and square returns are significantly positively correlated and that
the auto-correlation decays very slowly. This shows that large (small) absolute returns are likely to be followed
by large (small) absolute returns, that is, the magnitude of returns is correlated rather than their direction.
This is associated with the evidence that returns display volatility clusters that represent periods (that can
last several months) of high volatility followed by periods of low volatility. This suggests that volatility is
persistent (and thus predictable), while returns are unpredictable.
Let's work with an example. We estimate an AR(1) model on the S&P 500 return at the monthly frequency.
I will discuss two (of several) ways to estimate an AR model in R. One way is to use the lm() function in
conjunction with the dyn package, which gives lm() the capability to handle time series data and operations,
such as the lag() operator. The estimation is implemented below:
require(dyn)
fit <- dyn$lm(spm.ret ~ lag(spm.ret, -1))
summary(fit)
Call:
lm(formula = dyn(spm.ret ~ lag(spm.ret, -1)))

Residuals:
    Min      1Q  Median      3Q     Max
-18.450  -2.391   0.538   2.730  10.338

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)        0.5584     0.2573    2.17    0.031 *
lag(spm.ret, -1)   0.0707     0.0592    1.19    0.234
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.31 on 284 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared: 0.00499,  Adjusted R-squared: 0.00149
F-statistic: 1.42 on 1 and 284 DF, p-value: 0.234
where φ₀ = 0.558 is the estimate of the intercept and φ₁ = 0.071 the estimate of the coefficient of the lagged monthly return.
The estimate of φ₁ is quite close to zero, which suggests that monthly returns have little persistence and
that it is difficult to predict next month's return knowing the return for the current month. In addition,
we can test the null hypothesis that φ₁ = 0: the t-statistic is 1.193 with a p-value of 0.234, suggesting
that the coefficient is not significant at 10%. The lack of persistence in financial returns at high frequencies
(intra-daily or daily) but also at lower frequencies (weekly, monthly, quarterly) is well documented and is
one of the stylized facts of returns common across asset classes.
Another function that can be used to estimate AR models is ar(), available in the stats package. The inputs of
the function are the series Y_t, the maximum order/lag of the model, and the estimation method. An additional
argument of the function is aic, which can be TRUE/FALSE. This option refers to the Akaike Information Criterion
(AIC), a method to select the optimal number of lags in the AR model. It is calculated as a penalized
goodness-of-fit measure, where the penalization is a function of the order of the AR model. In the case of the
AR(1), the search is over lags 0 and 1, as you can see by comparing the two regression outputs below:
Call:
ar(x = spm.ret, aic = FALSE, order.max = 1, method = "ols", demean = FALSE,
    intercept = TRUE)

Coefficients:
    1
0.071

Intercept: 0.558 (0.256)

Order selected 1  sigma^2 estimated as  18.4

Call:
ar(x = spm.ret, aic = TRUE, order.max = 1, method = "ols", demean = FALSE,
    intercept = TRUE)

sigma^2 estimated as  18.5
By setting aic=TRUE, the results show that the selected order is 0, since the first lag is (statistically) irrelevant
(i.e., the best model is Y_t = φ₀ + ε_t). The option demean means that by default the ar() function subtracts
the mean from the series before estimating the regression. In this case, instead of estimating the model
Y_t = φ₀ + φ₁Y_{t-1} + ε_t, the function estimates Y_t − Ȳ = φ₁(Y_{t-1} − Ȳ) + ε_t without an intercept. Estimating
the two models provides the same estimate of φ₁ because E(Y_t) = φ₀ + φ₁E(Y_{t-1}); if we denote the
expected value of Y_t by μ, then using the previous equation we can express the intercept as φ₀ = μ(1 − φ₁).
By replacing this value for φ₀ in the AR(1) model we obtain Y_t = μ(1 − φ₁) + φ₁Y_{t-1} + ε_t, which can be
rearranged as Y_t − μ = φ₁(Y_{t-1} − μ) + ε_t. So, estimating the model in deviations from the mean or with an
intercept leads to the same estimate of φ₁.
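The equivalence of the two parameterizations can be verified numerically. In the sketch below (a simulated AR(1) series, not spm.ret), demeaning the dependent variable and its lag by their own sample means and dropping the intercept reproduces exactly the slope from the regression with an intercept:

```r
set.seed(7)
n <- 500
Y <- numeric(n); Y[1] <- 2
for (t in 2:n) Y[t] <- 1 + 0.5 * Y[t - 1] + rnorm(1)  # AR(1) with mean 2
y     <- Y[-1]                                        # Y_t
ylag  <- Y[-n]                                        # Y_{t-1}
b_int <- coef(lm(y ~ ylag))[2]                        # slope, with intercept
yd    <- y - mean(y)
ylagd <- ylag - mean(ylag)
b_dem <- coef(lm(yd ~ ylagd - 1))[1]                  # demeaned, no intercept
c(b_int, b_dem)                                       # identical slopes
```

Note that ar() demeans by the mean of the full series, so its estimate can differ from the lm() one by a small finite-sample amount; the two coincide exactly only when each variable is demeaned by its own regression-sample mean, as above.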
More generally, we do not have to restrict ourselves to explaining Y_t using only Y_{t-1}, but can also use Y_{t-2},
Y_{t-3}, and further lags in the past. The reason why more lags might be needed to model a time series relates
to the stickiness in wages, prices, and expectations, which might delay the effect of shocks on economic and
financial variables. A generalization of the AR(1) model is the AR(p), which includes p lags of the variable:
Y_t = φ₀ + φ₁Y_{t-1} + φ₂Y_{t-2} + ⋯ + φ_pY_{t-p} + ε_t, and which can be estimated with the dyn$lm() or ar()
commands. Since we are using monthly data, we can set p (order.max in the ar() function) equal to 12 and
the estimation results are provided below:
Call:
ar(x = spm.ret, aic = FALSE, order.max = 12, method = "ols",
    demean = FALSE, intercept = TRUE)

Coefficients:
     1      2      3      4      5      6      7      8      9     10     11     12
 0.063 -0.017  0.107  0.037  0.034 -0.091  0.038  0.035 -0.003  0.011  0.015  0.065

sigma^2 estimated as  17.6
The coefficient estimates are quite close to zero, but to evaluate their statistical significance we need to obtain
the standard errors of the estimates and calculate the t-statistics for the null hypothesis that the coefficients
are equal to zero:
coef    <- fit$ar               # coefficient estimates
se_coef <- fit$asy.se.coef$ar   # asymptotic standard errors
t       <- as.numeric(coef / se_coef)
t

 1.7760  0.2541  0.6109  1.1048  0.5652 -1.5191  0.6445  0.5844
where we can see that only the third lag is significant at 10%, although not at 5%. Since most of the lags are not
significant, we prefer to reduce the number of lags and thus the number of parameters that we are estimating.
We can select the optimal number of lags in the AR(p) model by setting the option aic=TRUE:
Call:
ar(x = spm.ret, aic = TRUE, order.max = 12, method = "ols", demean = FALSE,
    intercept = TRUE)

sigma^2 estimated as  18.5
which shows that the model that minimizes the AIC is a model with no lags. This confirms the
evidence from the ACF that past values of the series have little relevance in explaining the dynamics of the
current level of the variable Y_t.
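The AIC-based selection can be mimicked by hand with lm() and AIC(): fit AR(p) models for p = 0, …, 4 on a common sample and pick the order with the lowest AIC. The sketch below uses a simulated white-noise series (so the true order is 0); spm.ret itself is not re-created here:

```r
set.seed(3)
y <- rnorm(300)                          # white noise: the true AR order is 0
n <- length(y)
aics <- sapply(0:4, function(p) {
  yy <- y[5:n]                           # common sample: drop the first 4 obs
  if (p == 0) return(AIC(lm(yy ~ 1)))
  X <- sapply(1:p, function(k) y[(5 - k):(n - k)])  # lag-k regressors
  AIC(lm(yy ~ X))
})
which.min(aics) - 1                      # order with the lowest AIC
```

Keeping the estimation sample identical across orders matters: otherwise the models are fit to different numbers of observations and their AIC values are not comparable.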
A similar analysis can be conducted on daily S&P 500 returns and their absolute or square transformations.
Below we present the results for an AR(5) model estimated on the absolute returns of the S&P 500. It might
be preferable to use dyn$lm() over ar() because it provides the typical regression output, with columns for the
coefficient estimates, the standard errors, the t-statistics, and the p-values. The results are as follows:
(Coefficient table for (Intercept) and lag(abs(spd.ret), (-1):(-5))1 through lag(abs(spd.ret), (-1):(-5))5; the estimates are summarized next.)
The table shows that all lags are significant (even using a 1% significance level). The largest coefficient is 0.193;
considered together with the other coefficients, which are positive and statistically significant, the overall
persistence of absolute daily returns means that the series is quite predictable. Using the ar() function to
select the order by AIC results in the choice of 12 lags.
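Without the dyn package, the same kind of AR(5) regression on absolute returns can be set up with base R's embed(), which builds the matrix of current values and lags. The sketch uses simulated returns as a stand-in for the actual daily series spd.ret:

```r
set.seed(5)
r <- rnorm(1000)                     # stand-in for daily returns
a <- abs(r)                          # absolute returns
Z <- embed(a, 6)                     # col 1: a_t; cols 2-6: lags 1 to 5
fit5 <- lm(Z[, 1] ~ Z[, 2:6])        # AR(5) on absolute returns
length(coef(fit5))                   # intercept + 5 lag coefficients
```

On simulated white noise the lag coefficients are of course insignificant; on actual absolute daily returns they are strongly positive, which is the volatility clustering discussed above.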
Assume that we are at the beginning of 2014 and the GDP data for the fourth quarter of 2013 have been released,
which allows us to calculate the growth rate in quarter 4 of 2013. The aim is to forecast the percentage growth of
GDP in the first quarter of 2014 based on the information available at that time. The first step is to estimate
an AR(p) model on data up to 2013Q4. We can first use the R command ar() to select the order p of the AR(p)
model, and the order selected is 1. We can then estimate an AR(1) model for the log-difference of real GDP:
fit <- dyn$lm(dlGDP ~ lag(dlGDP,-1))
(Intercept)     0.482
lag(dlGDP, -1)  0.386
which has an R² of 0.152. The forecast for 2014Q1 based on the information available in 2013Q4 is obtained
as 0.482 + 0.386*0.648, which is equal to an expected GDP growth of 0.732%. The realized GDP growth in
that quarter happened to be -0.744%, and the forecast error is thus -1.476%, which is quite large relative to the
standard deviation of the time series of 0.939%. In addition, qualitatively, the AR(1) model predicts persistence
of growth rates, whilst in this case the realization was a temporary contraction of output in 2014Q1. R has of
course functions that produce forecasts automatically; for example:
fit <- ar(dlGDP, method="ols", order=1, demean=FALSE, intercept=TRUE)
predict(fit, n.ahead=4)
$pred
Qtr1 Qtr2 Qtr3 Qtr4
2014 0.732 0.764 0.776 0.781
$se
Qtr1 Qtr2 Qtr3 Qtr4
2014 0.855 0.917 0.926 0.927
where the variable dlGDP represents the percentage growth rates of GDP and the function predict() produces
forecasts up to n.ahead periods. The $se output of the predict() function represents the standard error of
the forecast and provides a measure of uncertainty around the forecast.
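The predict() output can be reproduced by iterating the AR(1) equation forward, replacing the unknown future value with its own forecast at each step. Using the rounded estimates reported above (0.482 and 0.386) and the 2013Q4 growth rate of 0.648:

```r
phi0 <- 0.482                        # intercept estimate from the fitted AR(1)
phi1 <- 0.386                        # slope estimate
last <- 0.648                        # realized GDP growth in 2013Q4
f <- numeric(4)
for (h in 1:4) {                     # recursive h-step-ahead forecasts
  f[h] <- phi0 + phi1 * last
  last <- f[h]
}
round(f, 3)                          # converges toward phi0 / (1 - phi1)
```

Up to rounding of the reported coefficients, the first value matches the 0.732 one-quarter-ahead forecast above, and the sequence converges toward the unconditional mean 0.482/(1 − 0.386) ≈ 0.785 as the horizon grows.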
3.5 Seasonality
A seasonal pattern in a time series represents the regular occurrence of higher/lower realizations of the
variable in certain periods of the year. The seasonal pattern is related to the frequency at which the time
series is observed. For daily data it could be by the 7 days of the week, for monthly data by the 12 months
in a year, and for quarterly data by the 4 quarters in a year. For example, electricity consumption spikes
during the summer months while being lower in the rest of the year. Of course there are many other factors
that determine the consumption of electricity which might grow over time, but seasonality captures the
characteristic of systematically higher/lower values at certain times of the year. As an example, let's assume
that we want to investigate if there is seasonality in the S&P 500 returns at the monthly frequency. To capture
the seasonal pattern we use dummy variables that take value 1 in a certain month and zero in all other months.
For example, we define the dummy variable JAN_t to be equal to 1 if month t is January and 0 otherwise,
FEB_t takes value 1 every February and 0 otherwise, and similarly for the remaining months. We can
then include the dummy variables in a regression model, for example,
Y_t = β₀ + β₁X_t + β₂FEB_t + β₃MAR_t + β₄APR_t + β₅MAY_t + ⋯ + β₁₂DEC_t + ε_t
1990(5) 1990(6) 1990(7) 1990(8)
      5       6       7       8
This shows that the month variable provides the month of each observation, from 1 to 12. We can then use a
logical statement to define the monthly dummy variables, such as JAN <- as.numeric(month == 1), which
returns (printed are the first 12 observations of JAN)
[1] 1 0 0 0 0 0 0 0 0 0 0 0
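Rather than typing twelve logical statements, all the dummies can be built in one step. A small sketch, using a stand-in month vector (in the application, month comes from the dates of the return series):

```r
month <- rep(1:12, times = 24)                 # stand-in: 288 months of data
D <- sapply(1:12, function(m) as.numeric(month == m))  # one column per month
colnames(D) <- month.abb                       # "Jan", "Feb", ..., "Dec"
rowSums(D)[1:5]                                # exactly one dummy equals 1 per row
```

Because each observation activates exactly one dummy, the twelve columns sum to the constant vector, which is why the intercept must be dropped to avoid the dummy variable trap.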
To illustrate the use of seasonal dummy variables, I consider a simple example in which I regress the monthly
return of the S&P 500 on the 12 monthly dummy variables (no X_t variable for now). As discussed before,
to avoid the dummy variable trap, I opt for excluding the intercept/constant from the regression and
including all 12 dummy variables. The model is implemented as follows:
fit <- dyn$lm(spm.ret ~ -1 + JAN + FEB + MAR + APR + MAY +
JUN + JUL + AUG + SEP + OCT + NOV + DEC)
summary(fit)
Call:
lm(formula = dyn(spm.ret ~ -1 + JAN + FEB + MAR + APR + MAY +
    JUN + JUL + AUG + SEP + OCT + NOV + DEC))

Residuals:
    Min      1Q  Median      3Q     Max
-19.890  -2.087   0.488   2.887   8.905

Coefficients:
    Estimate
JAN   -0.171
FEB    1.364
MAR    1.642
APR    0.905
MAY   -0.625
JUN    0.688
JUL   -1.133
AUG   -0.480
SEP    1.326
OCT    1.322
NOV    1.859
DEC    0.514
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The results indicate that, in most months, the expected return is not significantly different from zero, except
for March and November. In both cases the coefficient is positive, which indicates that in those months
returns are expected to be higher relative to the other months. To develop a more intuitive understanding
of the role of the seasonal dummies, the graph below shows the fitted or predicted returns from the model
above. In particular, the expected return of the S&P 500 in January is -0.171%, that is, E(R_t | JAN_t = 1) =
-0.171, while in February the expected return is E(R_t | FEB_t = 1) = 1.364%, and so on. These coefficients
are plotted in the graph below and create a regular pattern that is expected to repeat every year:
Returns seem to be positive in the first part of the year, then go into negative territory during the summer
months, only to turn positive again toward the end of the year. However, keep in mind that only March and
November are significant at 10%. We can also add the lag of the S&P 500 return to the model above to see if
the significance of the monthly dummy variables changes and to evaluate if the goodness of fit of the regression
increases:
fit <- dyn$lm(spm.ret ~ -1 + lag(spm.ret, -1) + JAN + FEB + MAR + APR +
MAY + JUN + JUL + AUG + SEP + OCT + NOV + DEC)
summary(fit)
Call:
lm(formula = dyn(spm.ret ~ -1 + lag(spm.ret, -1) + JAN + FEB +
    MAR + APR + MAY + JUN + JUL + AUG + SEP + OCT + NOV + DEC))

Residuals:
    Min      1Q  Median      3Q     Max
-19.321  -2.330   0.367   2.821   9.343

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
lag(spm.ret, -1)   0.0629     0.0604    1.04    0.299
JAN               -0.2472     0.8952   -0.28    0.783
FEB                1.3751     0.8759    1.57    0.118
MAR                1.5563     0.8797    1.77    0.078 .
APR                0.8016     0.8814    0.91    0.364
MAY               -0.6822     0.8775   -0.78    0.438
JUN                0.7273     0.8766    0.83    0.407
JUL               -1.1764     0.8768   -1.34    0.181
AUG               -0.4084     0.8785   -0.46    0.642
SEP                1.3562     0.8763    1.55    0.123
OCT                1.2387     0.8795    1.41    0.160
NOV                1.7756     0.8795    2.02    0.044 *
DEC                0.3987     0.9015    0.44    0.659
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.29 on 273 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared: 0.0703,  Adjusted R-squared: 0.0261
F-statistic: 1.59 on 13 and 273 DF, p-value: 0.0877
The results show that in July and August the expected return of the S&P 500 is negative, by 1.176% and 0.408%,
respectively, while in December it is positive and equal to 0.399%.
Seasonality is a common characteristic of macroeconomic variables. Typically, we analyze these variables
on a seasonally-adjusted basis, which means that the statistical agencies have already removed the seasonal
component from the variable. However, they also provide the variables before the adjustment, and we consider
the Department Stores Retail Trade series (FRED ticker RSDSELDN) at the monthly frequency. The time series graph
for this variable from January 1995 is shown below.
The seasonal pattern is quite clear in the data and it seems to occur toward the end of the year. We can
conjecture that the spike in sales is probably associated with the holiday season in December, but we can
quantitatively test this hypothesis by estimating a linear regression model in which we include monthly
dummy variables as explanatory variables to account for this pattern. The regression model for the sales in $
of department stores, denoted by Y_t, is
Y_t = β₀ + β₁X_t + β₂FEB_t + β₃MAR_t + β₄APR_t + β₅MAY_t + β₆JUN_t + β₇JUL_t + β₈AUG_t +
β₉SEP_t + β₁₀OCT_t + β₁₁NOV_t + β₁₂DEC_t + ε_t
where, in addition to the monthly dummy variables, we have the first-order lag of the variable. The regression
results are shown below:
fit <- dyn$lm(Y ~ lag(Y, -1) + FEB + MAR + APR + MAY +
JUN + JUL + AUG + SEP + OCT + NOV + DEC)
The results indicate that retail sales at department stores are highly persistent, with an AR(1) coefficient of
0.902. To interpret the estimates of the seasonal coefficients, we need to notice that the dummy for the month
of January was left out, so that all other seasonal dummy coefficients should be interpreted as the difference
in sales relative to the first month of the year. The results indicate that all coefficients are positive and thus
retail sales are higher than in January. From the low levels of January, sales seem to increase toward the summer,
remain relatively stable during the summer months, and then increase significantly in November and December.
Below is a graph of the variable and the model fit (red dashed line).
plot(Y, col="gray",lwd=5)
lines(fitted(fit),col=2,lty=2,lwd=2)
The three series have in common the feature of growing over time with no tendency to revert back to the
mean. In fact, the trending behavior of the variables implies that the mean of the series is also increasing
over time rather than being approximately constant as time progresses. For example, if we had estimated
the mean of GDP, CPI, and the S&P 500 in 1985, it would not have been a good predictor of the future value
of the mean, because the mean value of a variable with a trend keeps growing over time. These types of series
are called non-stationary because the mean and the variance of the distribution change over time. On
the other hand, series are defined as stationary when their long-run distribution is constant as time progresses.
For these variables, we can thus conclude that their time series graphs clearly indicate that the variables are
non-stationary.
Very often in economics and finance we prefer to take the natural logarithm since it makes the exponential
growth of some variables approximately linear. The same variables discussed above are shown below in
natural logarithm:
Taking the log of the variables is particularly relevant for the S&P 500 Index which shows a considerable
exponential behavior, at least until the end of the 1990s. Also for GDP the time series plot seems to become
more linear, while for CPI we observe different phases of a linear trend up to the end of the 1960s, the 1970s,
(Intercept)  7.761
trend        0.008
where the estimate of β₁ is 0.008, which indicates that real GDP is expected to grow approximately 0.8% every
quarter. The application of the linear trend model to the three series shown above gives as a result the dashed
trend line shown in the graph below:
The deviations of the series from the fitted trend line are small for GDP, but for the CPI and S&P 500 indices
they are persistent and last for long periods of time (i.e., 10 years or even longer). Such persistent deviations
might be due to the inadequacy of the linear trend model and the need to consider a nonlinear trend. This
can be accommodated by adding a quadratic and a cubic term to the linear trend model, which becomes
Y_t = β₀ + β₁t + β₂t² + β₃t³ + d_t, where t² and t³ represent the square and cube of the trend variable.
The implementation in R requires only the additional step of creating the quadratic and cubic terms, as shown
below:
trend2 <- trend^2
trend3 <- trend^3
fitsq  <- dyn$lm(log(gdp) ~ trend + trend2)
round(summary(fitsq)$coefficients, 4)
fitcb  <- dyn$lm(log(gdp) ~ trend + trend2 + trend3)
round(summary(fitcb)$coefficients, 4)
The linear (dashed line) and cubic (dash-dot line) deterministic trends are shown in the figure below. For
the case of GDP the difference between the two lines is not visually large, although the quadratic and/or
cubic regression coefficients might be statistically significant at conventional levels. In addition, the AIC of
the linear model is 281.471, while for the quadratic and cubic trend models it is -1004.701 and -1016.595,
respectively. Hence, in this case we would select the cubic model, which does slightly better relative to the
quadratic and significantly better relative to the linear trend model. For CPI, the cubic trend seems to capture
the slow increase in the log CPI index at the beginning of the sample, followed by a rapid increase and then
again a slower growth of the index. The period of rapid growth of the CPI index happened in the 1970s, when
the surge in oil prices led to an increase of the rate of inflation in the US and globally. However, it could be
argued that the deterministic trend model might not represent well the behavior of the CPI and S&P 500
indices, since the series depart from the trend (even the cubic one) for long periods of time.
Another way to visualize the goodness of the trend-stationary model is to plot d_t, the residuals or deviations
from the cubic trend, and investigate their time series properties. Below we show the time series graph of the
deviations and their ACF with lags up to 20 (quarters for GDP and months for CPI and S&P 500):
The ACF shows clearly that the deviation of log GDP from the cubic trend is persistent but with rapidly
decaying values. On the contrary, for CPI and the S&P 500 we observe that the serial correlation decays very
slowly, which is typical of non-stationary time series. In other words, we find that the deviations exhibit
non-stationary behavior even after accounting for a deterministic trend.
An alternative model that is often used in asset and option pricing is the random walk with drift model. The
model takes the following form: Y_t = δ + Y_{t-1} + ε_t, where δ is a constant and ε_t is an error term with mean
zero and variance σ². The random walk model assumes that the expected value of Y_t is equal to the previous
value of the series (Y_{t-1}) plus a constant drift δ (which can be positive or negative). In formula, we can write
this as E(Y_t | Y_{t-1}) = δ + Y_{t-1}. The model can also be reformulated by substituting backwards the value of
Y_{t-1}, which, based on the model, is δ + Y_{t-2} + ε_{t-1}, and we obtain Y_t = 2δ + Y_{t-2} + ε_t + ε_{t-1}. We can
then substitute Y_{t-2}, Y_{t-3}, and so on until we reach Y_0, and the model can be written as
Y_t = δ + Y_{t-1} + ε_t
    = 2δ + Y_{t-2} + ε_{t-1} + ε_t
    = 3δ + Y_{t-3} + ε_{t-2} + ε_{t-1} + ε_t
    = ⋯
    = Y_0 + δt + Σ_{j=1}^{t} ε_{t-j+1}
This shows that a random walk with drift can be expressed as the sum of a deterministic trend ($\delta t$) and a term which is the sum of all past errors/shocks to the series. If the drift term is set equal to zero, the model reduces to $Y_t = Y_{t-1} + \varepsilon_t = Y_0 + \sum_{j=1}^{t} \varepsilon_{t-j+1}$, which is called the random walk model (without drift, since $\delta$ is equal to zero). Hence, another way to think of the random walk with drift is as the sum of a deterministic linear trend and a random walk process.
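The backward-substitution identity can be checked numerically. The sketch below (written in Python for illustration, since the identity is language-agnostic; the parameter values are arbitrary) simulates a random walk with drift recursively and verifies that every $Y_t$ equals the deterministic trend $Y_0 + \delta t$ plus the accumulated shocks:

```python
import random

random.seed(42)
delta, Y0, T = 0.1, 0.0, 200
eps = [random.gauss(0, 1) for _ in range(T)]

# recursive simulation: Y_t = delta + Y_{t-1} + eps_t
Y = [Y0]
for t in range(T):
    Y.append(delta + Y[-1] + eps[t])

# closed form from backward substitution: Y_t = Y_0 + delta*t + sum of shocks
for t in range(1, T + 1):
    closed_form = Y0 + delta * t + sum(eps[:t])
    assert abs(Y[t] - closed_form) < 1e-9
```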
The relationship between the trend-stationary and the random walk with drift models becomes clear if we assume that the deviation from the trend $d_t$ follows an AR(1) process, that is, $d_t = \rho d_{t-1} + \varepsilon_t$, where $\rho$ is the coefficient of the first lag and $\varepsilon_t$ is a mean-zero random variable. Similarly to above, we can do backward substitution of the AR term in the trend-stationary model, that is,

$$
\begin{aligned}
Y_t &= \beta_0 + \beta_1 t + d_t \\
    &= \beta_0 + \beta_1 t + \rho d_{t-1} + \varepsilon_t \\
    &= \beta_0 + \beta_1 t + \rho^2 d_{t-2} + \rho\varepsilon_{t-1} + \varepsilon_t \\
    &\;\;\vdots \\
    &= \beta_0 + \beta_1 t + \rho^t d_0 + \sum_{j=1}^{t} \rho^{j-1}\varepsilon_{t-j+1}
\end{aligned}
$$
Comparing this equation with the one obtained above for the random walk with drift model, we find that the random walk with drift is a special case of the trend-stationary model when $\rho = 1$. We have thus related the two models, which differ only in the persistence of the deviation from the trend. If the coefficient $\rho$ is less than 1 the deviation is stationary, and thus the trend-stationary model can be used to de-trend the series and then conduct the analysis on the deviations. However, when $\rho = 1$ the deviation is non-stationary (i.e., a random walk), the approach just described is not valid anymore, and we will discuss later what to do in this case. A more practical way to understand the issue of the (non-)stationarity of the deviation from the trend is to think in terms of the speed at which the series is likely to revert back to the trend line. Series that oscillate often around the trend are stationary, while persistent deviations from the trend (slow reversion) are an indication of non-stationarity. How do we know if a series (e.g., the deviation from the trend) is stationary or not? In the following section we will discuss a test that evaluates this hypothesis and thus provides guidance as to what modeling approach to take.
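The difference in reversion speed can be made concrete with a small simulation (sketched in Python for illustration; the parameter values are arbitrary). For $|\rho| < 1$ the deviations settle around the finite variance $\sigma^2/(1-\rho^2)$, whereas for $\rho = 1$ their dispersion keeps growing with $t$:

```python
import random

random.seed(7)

def terminal_deviation(rho, T):
    """Simulate d_t = rho*d_{t-1} + eps_t for T steps, starting at d_0 = 0."""
    d = 0.0
    for _ in range(T):
        d = rho * d + random.gauss(0, 1)
    return d

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# cross-section of terminal deviations over many simulated paths
N, T = 4000, 200
var_stationary = variance([terminal_deviation(0.5, T) for _ in range(N)])
var_unit_root  = variance([terminal_deviation(1.0, T) for _ in range(N)])

# stationary case: variance close to sigma^2/(1 - rho^2) = 1/(1 - 0.25)
assert abs(var_stationary - 1 / (1 - 0.5 ** 2)) < 0.2
# unit-root case: variance grows with T and dwarfs the stationary bound
assert var_unit_root > 50 * var_stationary
```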
The previous analysis of GDP, CPI, and the S&P 500 index shows that the deviations of GDP from its trend
seem to revert to the mean faster relative to the other two series: this can be seen both in the time series plot
and also from the quickly decaying ACF. The estimate of the trend-stationary model shows that we expect
GDP to grow around 0.8% per quarter (or 3.2% annualized), although GDP alternates periods above trend
(expansions) and periods below trend (recessions). The alternation between expansions and recessions thus
captures the mean-reverting nature of the GDP deviations from the long-run trend and its stationarity. We can
evaluate the ability of the trend-stationary model to capture the features of the business cycle by comparing
the periods of positive and negative deviations with the peak and trough dates of the business cycle decided
by the NBER dating committee. In the graph below we plot the deviation from the cubic trend estimated
earlier together with the gray areas that indicate the period of recessions.
# dates from the NBER business cycle dating committee
xleft = c(1953.25, 1957.5, 1960.25, 1969.75, 1973.75, 1980,
1981.5, 1990.5, 2001, 2007.917) # beginning
xright = c(1954.25, 1958.25, 1961, 1970.75, 1975, 1980.5, 1982.75,
1991, 2001.75, 2009.417) # end
#fitgdp is the lm() object for the cubic trend model
plot(residuals(fitgdp), ylim=c(-0.10,0.10), xlab="", ylab="")
abline(h=0, col=2, lwd=2, lty=2)
rect(xleft, rep(-0.10,10), xright, rep(0.10,10), col="gray90", border=NA)
Overall, there is a tendency for the deviation to decline sharply during recessions (gray areas) and then increase during the recovery and the expansion, which seem to have lasted longer since the mid-1980s.
Earlier we discussed that the distribution of a non-stationary variable changes over time. We can now derive the mean and variance of $Y_t$ when it follows a trend-stationary model and when it follows a random walk with drift. Under the trend-stationary model the dynamics follow $Y_t = \beta_0 + \beta_1 t + d_t$, and we can make the simplifying assumption that $d_t = \rho d_{t-1} + \varepsilon_t$ with $\varepsilon_t$ a mean-zero error term with variance $\sigma^2$. Based on these assumptions, we obtain $E(d_t) = 0$ and $Var(d_t) = \sigma^2/(1-\rho^2)$, so that $E(Y_t) = \beta_0 + \beta_1 t$ and $Var(Y_t) = Var(d_t) = \sigma^2/(1-\rho^2)$. This demonstrates that the mean of a trend-stationary variable is a function of time and not constant, while the variance is constant. On the other hand, for the random walk with drift model we have $E(Y_t) = E\!\left(Y_0 + \delta t + \sum_{j=1}^{t}\varepsilon_{t-j+1}\right) = Y_0 + \delta t$ and $Var(Y_t) = t\sigma^2$.
From these results we see that for the random walk with drift model both the mean and the variance are time
varying, while for the trend-stationary model only the mean varies with time.
The main difference between the trend-stationary and random walk with drift models thus consists in the stationarity properties of the deviations from the deterministic trend. For the trend-stationary model the deviations are stationary, and their dynamics can be investigated using regression models estimated by OLS. However, for the random walk with drift the deviations are non-stationary and cannot be used in regression models because of several statistical issues that will be discussed in the next Section, followed by a discussion of an approach to statistically test whether a series is non-stationary or stationary around a (deterministic) trend.
T     = 25
B     = 1000
mu    = 0.1
sigma = 1
The histogram shows that the empirical distribution of $\hat\rho$ over 1000 simulations ranges between 0.158 and 1.154, with a mean of 0.812 and a median of 0.842. Both the mean and the median are significantly smaller than the true value of 1. This demonstrates the bias of the OLS estimate in small samples when the variable is non-stationary. However, this bias tends to decline for larger sample sizes, as shown in the histogram below, which is produced by the same code as above but with T = 500:
In this case the minimum and maximum estimates are 0.948 and 1.005, with a mean and median of 0.997 and 0.998, respectively. For the larger sample of 500 periods, even though the series is non-stationary, the estimates of $\rho$ are close to the true value of 1 and the bias is negligible.
The second fact that arises when estimating an AR model by OLS on non-stationary variables is that the t statistic does not follow the normal distribution even in large samples. This can be seen clearly in the histogram below of the test statistic for the null hypothesis that $\rho = 1$, with T = 500: the distribution is skewed to the left relative to the standard normal distribution.
The third problem with non-stationary variables occurs when the interest is in the relationship between $X$ and $Y$ and both variables are non-stationary. This can lead to spurious findings of a significant relationship between the two series when they are in fact independent of each other. An intuitive explanation is obtained by considering, e.g., two independent random walks with drift: estimating a linear regression model finds co-movement between the series due to the existence of a trend in both variables that makes them move in the same or opposite direction. The simulation below illustrates this result for two independent processes $X$ and $Y$ with the same drift parameter $\mu$, but independent of each other (i.e., $Y$ is not a function of $X$). The histogram of the t statistics for the significance of $\beta_1$ in $Y_t = \beta_0 + \beta_1 X_t + \varepsilon_t$ is shown in the left plot, while the $R^2$ of the regression is shown on the right. The distribution of the t statistic has a significantly positive mean and would lead to an extremely large number of rejections of the hypothesis that $\beta_1 = 0$, when in fact it is equal to zero. Also, the distribution of the $R^2$ shows that in the vast majority of these 1000 simulations we would find a moderate to large fit measure, which would suggest a significant relationship between the two variables even though there is none.
require(dyn)
set.seed(1234)
T     = 500
B     = 1000
mu    = 0.1
sigma = 1
tstat = rep(NA, B)
R2    = rep(NA, B)

for (b in 1:B)
{
  Y        <- ts(cumsum(rnorm(T, mean=mu, sd=sigma)))
  X        <- ts(cumsum(rnorm(T, mean=mu, sd=sigma)))
  fit      <- dyn$lm(Y ~ X)
  tstat[b] <- summary(fit)$coef[2,3]
  R2[b]    <- summary(fit)$r.square
}
# plotting
par(mfrow=c(1,2))
hist(tstat, breaks=50, freq=FALSE, main="")
box()
hist(R2, breaks=30, freq=FALSE, main="", xlim=c(0,1))
box()
The first question is whether the deviation from the trend is stationary or non-stationary. Typically, only by visually inspecting the series is it possible to determine whether there is a drift in the series, and thus to narrow down the question of the stationarity of the deviation from the deterministic trend. If a drift in the series is not apparent, then it is likely that the series does not have a trend, and the question is whether the series is an AR process or a random walk without drift.
To test for stationarity we follow the usual approach of calculating a test statistic with a known distribution under the null hypothesis, which allows us to decide whether or not we are confident in the stationarity of the series. We do this by estimating the following processes by OLS:

$$Y_t = \rho Y_{t-1} + \varepsilon_t$$
$$Y_t = \alpha + \rho Y_{t-1} + \varepsilon_t$$

In the case of the first Equation we are assuming a mean-zero AR(1) model, which becomes a random walk (without drift) when $\rho = 1$, whilst in the second Equation we are estimating an AR(1) process with a constant, which becomes a random walk with drift when $\rho = 1$. The null hypothesis we are interested in testing is $H_0: \rho = 1$ (non-stationarity) against the alternative $H_1: \rho < 1$ (stationarity). Rejection of the null hypothesis leads to the conclusion that the series is stationary, while failure to reject is interpreted as evidence that the series is non-stationary. These models are typically reformulated by subtracting $Y_{t-1}$ from both sides of the previous Equations, which results in

$$\Delta Y_t = \gamma Y_{t-1} + \varepsilon_t$$
$$\Delta Y_t = \alpha + \gamma Y_{t-1} + \varepsilon_t$$

where $\Delta Y_t = Y_t - Y_{t-1}$ and $\gamma = \rho - 1$. Testing the hypothesis that $\rho = 1$ is thus equivalent to testing $\gamma = 0$.
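The equivalence of the two parameterizations is easy to verify numerically: regressing $Y_t$ on $Y_{t-1}$ and regressing $\Delta Y_t$ on $Y_{t-1}$ produce identical residuals and standard errors, so testing $\rho = 1$ and testing $\gamma = 0$ yield the same statistic. A sketch in Python for illustration (no-constant OLS for the mean-zero case; the simulated data are arbitrary):

```python
import random

random.seed(1)

# simulate a mean-zero AR(1) close to a unit root
T, rho = 300, 0.95
y = [0.0]
for _ in range(T):
    y.append(rho * y[-1] + random.gauss(0, 1))

x  = y[:-1]                            # Y_{t-1}
yt = y[1:]                             # Y_t
dy = [a - b for a, b in zip(yt, x)]    # Delta Y_t

def ols_no_const(xs, ys):
    """Slope, standard error of a no-intercept OLS regression."""
    b = sum(a * c for a, c in zip(xs, ys)) / sum(a * a for a in xs)
    resid = [c - b * a for a, c in zip(xs, ys)]
    s2 = sum(e * e for e in resid) / (len(xs) - 1)
    return b, (s2 / sum(a * a for a in xs)) ** 0.5

rho_hat, se_rho = ols_no_const(x, yt)   # levels regression
gam_hat, se_gam = ols_no_const(x, dy)   # differenced regression

assert abs(gam_hat - (rho_hat - 1)) < 1e-9   # gamma = rho - 1
assert abs(se_rho - se_gam) < 1e-9           # same standard error
# the DF statistic can be computed from either regression
assert abs((rho_hat - 1) / se_rho - gam_hat / se_gam) < 1e-9
```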
The test is referred to as the Dickey-Fuller (DF) test and is given by $DF = \hat\gamma / SE(\hat\gamma)$, where $\hat\gamma$ and $SE(\hat\gamma)$ are the OLS estimate of $\gamma$ and its standard error assuming homoskedasticity. Under the null hypothesis that the series is non-stationary, the DF statistic is not distributed according to the Student t, since it requires running a regression that involves non-stationary variables, which leads to the problems discussed earlier. Instead, it follows a different distribution with critical values that are tabulated and provided below.
The non-standard distribution of the DF test statistic can be investigated via a simulation study in R. The code below performs a simulation in which we generate a random walk time series (without drift), estimate the DF regression equation, and store the t statistic of $Y_{t-1}$, which represents the DF test statistic. We repeat these operations B times and then plot a histogram of the DF statistic together with the Student t distribution with T − 1 degrees of freedom (T represents the length of the time series set in the code).
require(dyn)
set.seed(1234)
T     = 100
B     = 500
mu    = 0.1
sigma = 1
The graph shows clearly that the distribution of the DF statistic does not follow the t distribution: it has a negative mean and median (instead of 0), it is skewed to the right (positive skewness) rather than being symmetric, and its empirical 5% quantile is -2.849 instead of the theoretical value of -1.66. Since we are performing a one-sided test against the alternative hypothesis $H_1: \gamma < 0$, using the one-sided 5% critical value from the t distribution would lead us to reject the null hypothesis of non-stationarity too often relative to the appropriate critical values derived by Dickey and Fuller. For the simulation exercise above, the (asymptotic) critical value for the null of a random walk with drift is -2.86 at the 5% significance level. The percentages of simulations for which we reject the null based on this critical value and on that from the t distribution are
[1] 0.048
[1] 0.376
This shows that using the critical value from the t distribution would lead to rejecting too often (37.6% of the time) relative to the expected level of 5%. Instead, using the correct critical value the null is rejected 4.8% of the time, which is quite close to the 5% significance level.
In practical implementations, it is typically advisable to include lags of $\Delta Y_t$ to control for serial correlation in the changes of the variable. This is called the Augmented Dickey-Fuller (ADF) test and requires estimating the following regression model (for the case with a constant):

$$\Delta Y_t = \alpha + \gamma Y_{t-1} + \sum_{j=1}^{p} \gamma_j \Delta Y_{t-j} + \varepsilon_t$$

which consists of adding $p$ lags of the change of $Y_t$. The ADF test statistic is calculated as before by taking the t statistic of $Y_{t-1}$, that is, $ADF = \hat\gamma / SE(\hat\gamma)$. Another variation of the test also includes a trend variable in the regression model used to calculate the test statistic, that is,

$$\Delta Y_t = \alpha + \gamma Y_{t-1} + \delta_0 t + \sum_{j=1}^{p} \gamma_j \Delta Y_{t-j} + \varepsilon_t$$
The reason for including a deterministic trend in the ADF regression is that we want to be able to discriminate
between a deterministic trend model and a random walk model with drift. If we do not reject the null
hypothesis then we conclude that the time series follows a random walk with drift whilst in case of rejection
we conclude in favor of the deterministic trend model.
Once we calculate the DF or ADF test statistic we need to evaluate its statistical significance using the
appropriate critical values. As discussed earlier, these statistics have a special distribution and critical values
have been tabulated for the case with/without a constant and with/without a trend and for various sample
sizes. Below you can find the critical values obtained for various sample sizes for a model with constant and
with or without trend:
Sample Size   Without Trend         With Trend
              1%        5%          1%        5%
T = 25        -3.75     -3.00       -4.38     -3.60
T = 50        -3.58     -2.93       -4.15     -3.50
T = 100       -3.51     -2.89       -4.04     -3.45
T = 250       -3.46     -2.88       -3.99     -3.43
T = 500       -3.44     -2.87       -3.98     -3.42
T = ∞         -3.43     -2.86       -3.96     -3.41
The non-stationarity test is implemented in the urca package using the function ur.df(). Below is an example of its output:
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################

Test regression trend

Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)

Residuals:
     Min       1Q   Median       3Q      Max
-0.03076 -0.00466  0.00052  0.00494  0.03379

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  9.06e-02   8.45e-02    1.07     0.28
z.lag.1     -1.08e-02   1.09e-02   -0.99     0.32
tt           7.23e-05   8.81e-05    0.82     0.41
z.diff.lag1  3.31e-01   6.42e-02    5.16  5.1e-07 ***
z.diff.lag2  9.85e-02   6.75e-02    1.46     0.15
z.diff.lag3 -4.73e-02   6.69e-02   -0.71     0.48
z.diff.lag4 -4.37e-02   6.34e-02   -0.69     0.49
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In this case the ADF test statistic is -0.99, which should be compared to the 5% critical value of -3.42: we do not reject the null hypothesis that the series is non-stationary and follows a random walk with drift.
4. Volatility Models
A stylized fact across many asset classes is that the standard deviation of returns, often referred to as volatility, varies significantly over time. The graph below shows the time series of the daily returns of the S&P 500 from the beginning of 1990 until the end of 2013. The mean daily return is 0.028% and the standard deviation is 1.142%. This estimate of the standard deviation represents a long-run average across periods of high and low volatility. In the Figure below we can see that returns stay within the two-standard-deviation confidence bands (dashed lines in the graph) for long periods of time. Occasionally, there are sudden bursts of high volatility that last for several months before volatility decreases again. In other words, volatility is time-varying in the sense that it alternates between regimes of low and high volatility. This fact should be accounted for by any model of financial returns, since volatility is a proxy for risk, which is an important input to many financial decisions (from option pricing to risk management). Modeling volatility is thus a particularly relevant task in financial econometrics.
The model for high-frequency (daily and intra-daily) returns that we will work with in this Chapter has the following structure: $R_{t+1} = \mu_{t+1} + \sigma_{t+1}\varepsilon_{t+1}$, where the asset return is decomposed into the following three components:
Expected return: $\mu_{t+1}$ represents the expected return of the asset. At the daily and intra-daily frequency, this component is typically assumed equal to zero. Alternatively, it could be assumed to follow an AR(p) process, $\mu_{t+1} = \phi_0 + \phi_1 R_t + \cdots + \phi_p R_{t-p+1}$, or to be a function of other contemporaneous variables, that is, $\mu_{t+1} = \gamma_0 + \gamma_1 X_t$.
Volatility: $\sigma_{t+1}$ is the standard deviation of the shock conditional on information available at time $t$.
Unexpected shock: $\varepsilon_{t+1}$ represents the shock that occurs in period $t+1$. A simple assumption is that it is normally distributed with mean zero and variance one, but more sophisticated distributional assumptions can be introduced.
The aim of this Chapter is to discuss different models to estimate and forecast $\sigma_{t+1}$, starting with simple techniques such as the Moving Average (MA) and the Exponential Moving Average (EMA), followed by a discussion of more sophisticated time series models such as the Auto-Regressive Conditional Heteroskedasticity (ARCH) model. ARCH models have been extended in several directions, but for the purpose of this Chapter we will consider the two most important generalizations: GARCH (Generalized ARCH) and GJR-GARCH (Glosten, Jagannathan and Runkle ARCH), which includes an asymmetric effect of positive/negative shocks on volatility.
4.1 Moving Average (MA) and Exponential Moving Average (EMA)
The Moving Average (MA) approach estimates the variance in day $t+1$ as the average of the most recent $M$ squared returns, that is, $\sigma^2_{t+1} = \frac{1}{M}\sum_{j=1}^{M} R^2_{t-j+1}$, and the standard deviation is calculated as its square root. The two extreme values of the window are $M = 1$, which implies $\sigma^2_{t+1} = R_t^2$, and $M = t$, which leads to $\sigma^2_{t+1} = \hat\sigma^2$, where $\hat\sigma^2$ represents the unconditional variance estimated on the full sample. Small values of $M$ imply that the volatility estimate is very responsive to the most recent squared returns, whilst for large values of $M$ the estimate responds very little to the latest returns. Another way to look at the role of the window size on the smoothness of the volatility estimate is to interpret it as an average of the last $M$ days, each carrying a weight of $100/M\%$ (i.e., for $M = 25$ the weight is 4%). When $M$ increases, the weight given to each observation in the window becomes smaller, so that each daily squared return (even when extreme) has a smaller impact on the volatility estimate.
The MA approach can be implemented in R using the rollmean() function provided in the package zoo, which requires specifying the window size ($M$ in the notation above). An example for the S&P 500 daily returns is provided below:
require(zoo)
sigma25 <- rollmean(sp500daily^2, 25, align="right")
plot(abs(sp500daily), col="gray", xlab="", ylab="")
lines(sigma25^0.5, col=2, lwd=2)
The effect of increasing the window size from M = 25 (solid line) to M = 100 (dashed line) is shown in the graph below: the longer window smooths out the fluctuations of the MA(25), since each observation is given a smaller weight. This implies that large (negative or positive) returns increase volatility less relative to an MA calculated on a smaller window.
require(zoo)
sigma100 <- rollmean(sp500daily^2, 100, align="right")
plot(sigma25^0.5, col=2, xlab="", ylab="")
lines(sigma100^0.5, lwd=2, col=4, lty=2)
One drawback of the MA approach is that a large daily return increases the volatility estimate significantly when it enters the window, and then causes it to drop when the observation falls out of the window. This is because the MA approach distributes the weight equally across the last $M$ observations, while returns older than $M$ days receive zero weight. An extension of the MA approach is to assign smoothly decreasing weights to older returns instead of the discrete jump of the weight from $1/M$ to 0 at the $(M+1)$-th observation. This approach is called the Exponential Moving Average (EMA) and it is calculated as follows:
$$\sigma^2_{t+1} = \lambda \sum_{j=1}^{\infty} (1-\lambda)^{j-1} R^2_{t-j+1}$$
where $\lambda$ is a smoothing parameter between zero and one. After some algebra, the expression above can be rewritten as follows:

$$\sigma^2_{t+1} = (1-\lambda)\,\sigma^2_t + \lambda R^2_t$$

which shows that the conditional variance estimate in day $t+1$ is given by a weighted average of the previous day's estimate and the squared return in day $t$, with weights equal to $1-\lambda$ and $\lambda$, respectively. A typical value of $\lambda$ is 0.06, which means that day $t$ has a 6% weight in determining that day's volatility, day $t-1$ has weight $(0.94 \times 6)\% = 5.64\%$, day $t-2$ has weight $(0.94^2 \times 6)\% = 5.3016\%$, day $t-k$ has weight $(0.94^k \times 6)\%$, and so on. The method is called an Exponential MA because the weights decay exponentially; it became popular in finance after J.P. Morgan proposed it to model and predict volatility for Value-at-Risk (VaR) calculations.
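The equivalence between the recursive form and the exponentially decaying weights can be verified with a short computation (sketched in Python for illustration; the simulated squared returns are arbitrary):

```python
import random

random.seed(3)
lam = 0.06
T = 1000
r2 = [random.gauss(0, 1) ** 2 for _ in range(T)]   # simulated squared returns

# weights on the most recent days: lam, lam*(1-lam), lam*(1-lam)^2, ...
weights = [lam * (1 - lam) ** k for k in range(3)]
assert abs(weights[0] - 0.06) < 1e-12
assert abs(weights[1] - 0.0564) < 1e-12

# recursive EMA, initialized at the first squared return (a common choice)
sigma2 = r2[0]
for t in range(1, T):
    sigma2 = (1 - lam) * sigma2 + lam * r2[t]

# explicit weighted sum over the sample, plus the residual weight
# (1-lam)^(T-1) left on the initial value
explicit = sum(lam * (1 - lam) ** (j - 1) * r2[T - j] for j in range(1, T))
explicit += (1 - lam) ** (T - 1) * r2[0]
assert abs(sigma2 - explicit) < 1e-9
```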
The typical value for $M$ is 25 (one trading month) and for $\lambda$ it is 0.06. The graph below compares the volatility estimates from the MA(25) and EMA(0.06) methods. In this example we use the package TTR, which provides functions to calculate moving averages, both simple and exponential:
require(TTR)
# SMA() function for Simple Moving Average; n = number of days
ma25 <- SMA(sp500daily^2, n=25)
# EMA() function for Exponential Moving Average; ratio = lambda
ema06 <- EMA(sp500daily^2, ratio=0.06)
plot(ma25^0.5, ylim=c(0, 6), ylab="", xlab="")
lines(ema06^0.5,col=2)
The picture shows the daily volatility estimates over more than 23 years, and at this scale it is difficult to see large differences between the two methods. However, by plotting a sub-period, some differences become evident. Below we plot the two-year period between the beginning of 2008 and the end of 2009. The biggest difference between the two methods is that in periods of rapid change (from small to large returns, or vice versa) the EMA line captures the change in volatility regime more smoothly relative to the simple MA. This is particularly clear in the second half of 2008 and the beginning of 2009, with some marked differences between the two lines.
The parameters of the MA and EMA are usually chosen a priori rather than estimated from the data. There are ways to estimate these parameters, although they are not popular in the financial literature and typically do not provide great benefit for practical purposes. In the following Section we discuss a more general volatility model which generalizes the EMA and is typically estimated from the data.
(hence, GJR-GARCH) which assumes that the squared return has a different effect on volatility depending on its sign. The conditional variance Equation of this model is

$$\sigma^2_{t+1} = \omega + \alpha_1 R^2_t + \gamma_1 R^2_t\, I(R_t < 0) + \beta_1 \sigma^2_t$$

In this specification, when the return is positive its effect on the conditional variance is $\alpha_1$, and when it is negative the effect is $\alpha_1 + \gamma_1$. Testing the hypothesis that $\gamma_1 = 0$ thus provides a test of the asymmetric effect of shocks on volatility. Empirically, for many assets the estimation results show that $\alpha_1$ is close to zero and insignificant, while $\gamma_1$ is positive and significant. The evidence thus indicates that negative shocks lead to more uncertainty and an increase in the volatility of asset returns, while positive shocks do not have a relevant effect.
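The asymmetry is easy to see by computing the impact of a positive and a negative return of the same size on next-day variance. In the sketch below (Python, for illustration; the parameter values are hypothetical, not estimates from the text), a -2% return raises next-day variance by exactly $\gamma_1 R_t^2$ more than a +2% return:

```python
# hypothetical GJR-GARCH parameters (illustrative only)
omega, alpha1, gamma1, beta1 = 0.01, 0.03, 0.10, 0.90

def gjr_next_var(r, sigma2):
    """Conditional variance for day t+1 given the day-t return and variance."""
    indicator = gamma1 * r ** 2 if r < 0 else 0.0
    return omega + alpha1 * r ** 2 + indicator + beta1 * sigma2

s2 = 1.0
up   = gjr_next_var( 2.0, s2)   # positive 2% return
down = gjr_next_var(-2.0, s2)   # negative 2% return of the same size

assert down > up                               # bad news raises volatility more
assert abs((down - up) - gamma1 * 4.0) < 1e-12 # gap equals gamma1 * R_t^2
```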
$$f(R_{t+1}\,|\,\theta) = \frac{1}{\sqrt{2\pi\sigma^2_{t+1}(\theta)}} \exp\left[-\frac{1}{2}\left(\frac{R_{t+1}-\mu_{t+1}(\theta)}{\sigma_{t+1}(\theta)}\right)^2\right]$$

which represents the normal density evaluated at $R_{t+1}$. We wrote the conditional mean and variance as $\mu_{t+1}(\theta)$ and $\sigma^2_{t+1}(\theta)$ to make explicit their dependence on the parameter vector $\theta$ over which the likelihood function will be maximized. Since we have $T$ returns, we are interested in the joint likelihood of the observed returns and, denoting by $p$ the largest lag of $R_{t+1}$ used in the conditional mean and variance, we can define the (conditional) likelihood function $L(\theta) = f(R_{p+1}, \ldots, R_T \,|\, \theta, R_1, \ldots, R_p)$ as
$$L(\theta) = \prod_{t=p}^{T-1} f(R_{t+1}\,|\,\theta) = \prod_{t=p}^{T-1} \frac{1}{\sqrt{2\pi\sigma^2_{t+1}(\theta)}} \exp\left[-\frac{1}{2}\left(\frac{R_{t+1}-\mu_{t+1}(\theta)}{\sigma_{t+1}(\theta)}\right)^2\right]$$
The ML estimate $\hat\theta$ is thus obtained by maximizing the likelihood function $L(\theta)$. It is convenient to log-transform the likelihood function to simplify the task of maximizing it. We denote the log-likelihood by $l(\theta)$ and it is given by

$$l(\theta) = \ln L(\theta) = -\frac{1}{2}\sum_{t=p}^{T-1}\left[\ln(2\pi) + \ln\sigma^2_{t+1}(\theta) + \left(\frac{R_{t+1}-\mu_{t+1}(\theta)}{\sigma_{t+1}(\theta)}\right)^2\right]$$
Since the first term $\ln(2\pi)$ does not depend on any parameter, it can be dropped from the function. The estimate $\hat\theta$ is then obtained by maximizing

$$l(\theta) = -\frac{1}{2}\sum_{t=p}^{T-1}\left[\ln\sigma^2_{t+1}(\theta) + \left(\frac{R_{t+1}-\mu_{t+1}(\theta)}{\sigma_{t+1}(\theta)}\right)^2\right]$$
The maximization of the likelihood or log-likelihood is performed numerically, which means that we use
algorithms to find the maximum of this function. The problem with this approach is that, in some situations,
the likelihood function is not well-behaved since it is characterized by local maxima or it is flat, which means
that it is relatively constant for a large set of parameter values. The numerical search has to be started at some
initial values and, in the difficult cases just mentioned, the choice of these values is extremely important to
achieve the global maximum of the function. The choice of valid starting values for the parameters can be
achieved by a small-scale grid search over the space of possible values of the parameters.
In the case of volatility models the likelihood function is usually well-behaved and achieves a maximum
quite rapidly. In the following Section we discuss some R packages which implement GARCH estimation and
forecasting.
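As a concrete illustration of the objective function being maximized, the sketch below codes the Gaussian negative log-likelihood of a zero-mean GARCH(1,1) in Python; minimizing it numerically would deliver the ML estimates. The initialization of the variance recursion at the sample variance and the restriction $\alpha + \beta < 1$ are common conventions, not the only possible choices:

```python
import math

def garch11_negloglik(params, returns):
    """Negative Gaussian log-likelihood of a zero-mean GARCH(1,1).
    params = (omega, alpha, beta); the variance recursion is seeded
    with the sample variance (one common convention)."""
    omega, alpha, beta = params
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        return float("inf")   # rule out invalid / non-stationary regions
    s2 = sum(r * r for r in returns) / len(returns)
    nll = 0.0
    for r in returns:
        nll += 0.5 * (math.log(2 * math.pi) + math.log(s2) + r * r / s2)
        s2 = omega + alpha * r * r + beta * s2   # update for the next day
    return nll

# example evaluation on a toy return series (arbitrary numbers)
nll = garch11_negloglik((0.05, 0.05, 0.9), [0.5, -1.0, 0.3])
assert math.isfinite(nll)
```

In practice this function would be handed to a numerical optimizer, started from a grid of candidate $(\omega, \alpha, \beta)$ values as discussed above.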
GARCH in R
There are several packages that provide functions to estimate models from the GARCH family. One of the
earliest is the garch() function in the tseries package. Being one of the earliest, it is quite limited in the type
of models it can estimate. Below is an example of the estimation of a GARCH(1,1) model to the daily S&P 500
returns:
require(tseries)
fit <- garch(ts(sp500daily), order=c(1,1), trace=FALSE)
round(summary(fit)$coef, 3)
     Estimate  Std. Error  t value  Pr(>|t|)
a0      0.011       0.001    8.188         0
a1      0.076       0.004   17.127         0
b1      0.915       0.005  180.214         0
Here a0, a1 and b1 represent the parameters $\omega$, $\alpha$, and $\beta$, respectively. The output provides standard errors for the parameter estimates as well as t statistics and p-values for the null hypothesis that the coefficients are equal to zero. The estimate of $\alpha$ is 0.076 and of $\beta$ is 0.915, with their sum equal to 0.991, which is close enough to 1 to suggest that volatility is extremely persistent and borderline non-stationary.
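A quick sanity check on these estimates: for a stationary GARCH(1,1) the implied unconditional variance is $\omega/(1-\alpha-\beta)$. Plugging in the (rounded) estimates above recovers a long-run daily standard deviation of about 1.1%, in line with the 1.142% sample estimate reported at the start of the chapter. A sketch in Python for illustration:

```python
omega, alpha, beta = 0.011, 0.076, 0.915   # rounded estimates from the output above

persistence = alpha + beta                  # 0.991: very high persistence
uncond_var = omega / (1 - persistence)      # implied long-run daily variance
uncond_sd = uncond_var ** 0.5               # about 1.1% per day

assert abs(persistence - 0.991) < 1e-9
assert 1.0 < uncond_sd < 1.2
```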
More flexible functions for GARCH estimation are provided by the package fGarch, which allows flexibility in modeling the conditional mean $\mu_{t+1}$ and the conditional variance $\sigma^2_{t+1}$ with time series models. The function that performs the estimation is called garchFit(). The example below shows the application of the garchFit() function to the daily returns of the S&P 500 index. The first example estimates a GARCH(1,1) without the intercept in the conditional mean (i.e., $\mu_{t+1} = 0$), and the results are thus comparable to the earlier ones for the garch() function; we then add the intercept to the model ($\mu_{t+1} = \mu$), and finally we consider an AR(1)-GARCH(1,1) model. Notice that to specify the AR(1) for the conditional mean we use the function arma(p,q), which is more general than an AR(p) specification.
require(fGarch)
fit <- garchFit(~garch(1,1), data=sp500daily, include.mean=FALSE, trace=FALSE)
round(fit@fit$matcoef, 3)
        Estimate  Std. Error  t value  Pr(>|t|)
omega      0.011       0.002    5.624         0
alpha1     0.076       0.007   11.375         0
beta1      0.915       0.007  125.069         0
While the point estimates are equal to those obtained earlier for the garch function, the standard errors
are different due to differences between analytical and numerical standard errors. In the example below we
consider an intercept in the conditional mean which leads to small changes in the coefficient estimates of the
volatility parameters:
fit <- garchFit(~garch(1,1), data=sp500daily, trace=FALSE)
round(fit@fit$matcoef, 3)
        Estimate  Std. Error  t value  Pr(>|t|)
mu         0.053       0.010    5.321         0
omega      0.012       0.002    5.762         0
alpha1     0.079       0.007   11.382         0
beta1      0.912       0.008  120.749         0
The results show that the mean is estimated at 0.053% and is statistically significant even at the 1% level. Hence, for the daily S&P 500 returns the assumption of a zero expected return is rejected. It might also be interesting to evaluate the need to introduce some dependence in the conditional mean, for example by assuming an AR(1) model. The command arma(1,0) + garch(1,1) in the garchFit() function estimates an AR(1) model with GARCH(1,1) conditional variance:
fit <- garchFit(~ arma(1,0) + garch(1,1), data=sp500daily, trace=FALSE)
round(fit@fit$matcoef, 3)
        Estimate  Std. Error  t value  Pr(>|t|)
mu         0.054       0.010    5.371     0.000
ar1       -0.013       0.013   -0.959     0.337
omega      0.011       0.002    5.760     0.000
alpha1     0.078       0.007   11.375     0.000
beta1      0.912       0.008  121.046     0.000
The estimate of the AR(1) coefficient is -0.013 and it is not statistically significant at the 10% level, which indicates that including dependence in the conditional mean of daily financial returns is unnecessary.
Based on the GARCH model estimation, we can then obtain the conditional variance and the conditional
standard deviation. The conditional standard deviation is extracted by appending @sigma.t to the garchFit
object:
sigma <- fit@sigma.t
# define sigma as a zoo object since it is numeric
sigma <- zoo(sigma, order.by=index(sp500daily))
plot(sigma, type="l", main="Standard deviation", xlab="",ylab="")
The graph shows the significant variation over time of the standard deviation that alternates between periods
of low and high volatility, in addition to sudden increases in volatility due to the occurrence of large returns.
It is also interesting to compare the fitted standard deviation from the GARCH model with the ones obtained
from the MA and EMA methods. As the plot below shows, the three estimates track each other very closely.
The correlation between the MA and GARCH conditional standard deviation is 0.98 and between EMA and
GARCH is 0.988 and, to a certain extent, they can be considered very good substitutes for each other (in
particular at short horizons).
Another quantity that we need to analyze to evaluate the goodness of the GARCH model is the residuals. Appending @residuals to the garchFit estimation object extracts the residuals of the GARCH model, which represent an estimate of $\sigma_{t+1}\varepsilon_{t+1}$. The plot below shows that the residuals maintain most of the characteristics of the raw returns, in particular the clusters of volatility. This is because these residuals have been obtained from the last fitted model, in which $\hat\mu_{t+1} = 0.054 - 0.013 R_t$, and the residuals are thus given by $R_{t+1} - \hat\mu_{t+1}$. The contribution of the intercept is to demean the return series, while the small coefficient on the lagged return leaves returns that are very close to the residuals.
res <- fit@residuals
res <- zoo(res, order.by=index(sp500daily))
par(mfrow=c(1,2))
plot(res, type="l", main="Unstandardized Residuals", xlab="", ylab="", cex.main=0.8)
plot(res/sigma, type="l", main="Standardized Residuals", xlab="", ylab="", cex.main=0.8)
It is clear from the QQ plot that the left tail of the standardized residuals distribution disagrees with normality: there are too many large negative returns (relative to what would be expected under the normal) to conclude that the residuals are normally distributed. We can further investigate this issue by calculating the skewness, equal to -0.255, and the excess kurtosis, equal to 8.747. We can also consider the auto-correlation of the residuals and squared residuals to assess if there is neglected dependence in the conditional mean and variance:
Overall, there is weak evidence of auto-correlation in the standardized residuals and in their squares, so that
the GARCH(1,1) model seems to be well specified to model the daily returns of the S&P 500.
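The residual checks just described can be sketched in a few lines. The snippet below is a minimal, self-contained version run on simulated shocks; with the fitted model, `zres` would instead be `fit@residuals / fit@sigma.t`.

```r
# Self-contained sketch of the residual diagnostics: sample skewness,
# excess kurtosis, and Ljung-Box tests on levels and squares.
set.seed(1)
zres <- rnorm(1000)   # stand-in for the standardized residuals

# sample skewness and excess kurtosis
sk <- mean((zres - mean(zres))^3) / sd(zres)^3
ek <- mean((zres - mean(zres))^4) / sd(zres)^4 - 3

# Ljung-Box tests for neglected dependence in the conditional mean
# (levels) and in the conditional variance (squares)
lb.mean <- Box.test(zres,   lag = 10, type = "Ljung-Box")
lb.var  <- Box.test(zres^2, lag = 10, type = "Ljung-Box")
```

Large p-values in both tests are what "weak evidence of auto-correlation" means in practice: the model has captured the dependence in both the mean and the variance.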
Another package that provides functions to estimate a wide range of GARCH models is the rugarch package. This package requires first specifying the functional form of the conditional mean and variance using the function ugarchspec(), and then proceeding with the estimation using the function ugarchfit(). Below is an example for an AR(1)-GARCH(1,1) model estimated on the daily S&P 500 returns:
require(rugarch)
spec = ugarchspec(variance.model=list(model="sGARCH", garchOrder=c(1,1)),
                  mean.model=list(armaOrder=c(1,0)))
fitgarch = ugarchfit(spec = spec, data = sp500daily)

       Estimate t value
mu        0.053   5.380
ar1      -0.013  -0.959
omega     0.011   5.459
alpha1    0.078  10.763
beta1     0.912 113.059
The estimation results are the same as those obtained with the fGarch package, and more information about the fit can be obtained using the command show(fitgarch), as shown below:
*---------------------------------*
*          GARCH Model Fit        *
*---------------------------------*

Conditional Variance Dynamics
-----------------------------------
GARCH Model  : sGARCH(1,1)
Mean Model   : ARFIMA(1,0,0)
Distribution : norm

Optimal Parameters
------------------------------------
        Estimate  Std. Error    t value Pr(>|t|)
mu      0.053207    0.009889    5.38046  0.00000
ar1    -0.012898    0.013449   -0.95902  0.33755
omega   0.011496    0.002106    5.45930  0.00000
alpha1  0.078496    0.007293   10.76270  0.00000
beta1   0.912106    0.008068  113.05857  0.00000

Robust Standard Errors:
        Estimate  Std. Error  t value Pr(>|t|)
mu      0.053207    0.009120   5.8344 0.000000
ar1    -0.012898    0.012249  -1.0530 0.292362
omega   0.011496    0.003234   3.5553 0.000378
alpha1  0.078496    0.012923   6.0740 0.000000
beta1   0.912106    0.013591  67.1116 0.000000

LogLikelihood : -8456.918

Information Criteria
------------------------------------
Akaike       2.6863
Bayes        2.6917
Shibata      2.6863
Hannan-Quinn 2.6882

Weighted Ljung-Box Test on Standardized Residuals
------------------------------------
                        statistic p-value
Lag[1]                    0.03057 0.86119
Lag[2*(p+q)+(p+q)-1][2]   0.38666 0.98534
Lag[4*(p+q)+(p+q)-1][5]   5.08162 0.09963
d.o.f=1
H0 : No serial correlation

Weighted Ljung-Box Test on Standardized Squared Residuals
------------------------------------
                        statistic  p-value
Lag[1]                       3.89 0.048562
Lag[2*(p+q)+(p+q)-1][5]     11.83 0.003082
Lag[4*(p+q)+(p+q)-1][9]     13.13 0.009965
d.o.f=2

Weighted ARCH LM Tests
------------------------------------
            Statistic Shape Scale P-Value
ARCH Lag[3]    0.0788 0.500 2.000  0.7789
ARCH Lag[5]    0.5472 1.440 1.667  0.8696
ARCH Lag[7]    0.6365 2.315 1.543  0.9644

Nyblom stability test
------------------------------------
Joint Statistic: 2.782
Individual Statistics:
mu     0.07596
ar1    1.59443
omega  0.20362
alpha1 0.23997
beta1  0.17185

Asymptotic Critical Values (10% 5% 1%)
Joint Statistic:      1.28 1.47 1.88
Individual Statistic: 0.35 0.47 0.75

Sign Bias Test
------------------------------------
                   t-value      prob sig
Sign Bias           2.5578 1.056e-02  **
Negative Sign Bias  0.4089 6.826e-01
Positive Sign Bias  3.1726 1.518e-03 ***
Joint Effect       38.0232 2.795e-08 ***
The estimation of the GJR-GARCH model is quite straightforward in this package: it requires specifying the option model='gjrGARCH' in ugarchspec(), in addition to selecting the orders for the conditional mean and variance, as shown below:
require(rugarch)
spec = ugarchspec(variance.model=list(model="gjrGARCH", garchOrder=c(1,1)),
                  mean.model=list(armaOrder=c(1,0)))
fitgjr = ugarchfit(spec = spec, data = sp500daily)

       Estimate t value
mu        0.025   2.546
ar1      -0.003  -0.211
omega     0.015   6.803
alpha1    0.000   0.001
beta1     0.916 111.416
gamma1    0.137  11.003
The results for the S&P 500 confirm the earlier discussion that positive returns have a negligible effect in increasing volatility (α₁ = 0), while negative returns have a very large and significant effect (γ₁ = 0.137). The plot below on the left compares the time series of the volatility estimates for GARCH and GJR, and the plot on the right-hand side shows the difference between the two estimates. This graph shows clearly that the difference also has clusters of volatility, which are due to large negative returns that increase σ_{t+1} significantly more for GJR than for GARCH, as opposed to positive returns that have no effect on volatility for GJR.
The selection of the best performing volatility model can be done using the AIC selection criterion, similarly
to the selection of the optimal order p for AR(p) models. The package rugarch provides the function
infocriteria() that calculates AIC and several other selection criteria. These criteria are different in the
amount of penalization that they involve for adding more parameters (AR(1)-GJR has one parameter more
than AR(1)-GARCH). For all criteria, the best model is the one that provides the smallest value. In this case
the GJR specification clearly outperforms the basic GARCH(1,1) model for all criteria.
ciao <- cbind(infocriteria(fitgarch), infocriteria(fitgjr))
colnames(ciao) <- c("GARCH","GJR")
ciao

                GARCH      GJR
Akaike       2.686323 2.654422
Bayes        2.691679 2.660849
Shibata      2.686322 2.654421
Hannan-Quinn 2.688179 2.656649
The function ugarchforecast() computes the out-of-sample forecasts of a model up to n.ahead periods. The plot below shows the forecasts made on 2014-12-31, when the volatility estimate σ_{t+1} was 0.893 for GARCH and 0.782 for GJR. Both models forecast an increase in volatility, since volatility is mean-reverting in these models (and at the moment the forecast was made volatility was below its long-run level).
garchforecast <- ugarchforecast(fitgarch, n.ahead=250)
gjrforecast   <- ugarchforecast(fitgjr, n.ahead=250)
Finally, we can compare the GARCH and GJR specifications based on the effect of a shock (ε_t) on the conditional variance (σ_t²). The left plot refers to the GARCH(1,1) model and clearly shows that positive and negative shocks (of the same magnitude) increase the conditional variance by the same amount. The news impact curve for the GJR model, however, clearly shows the asymmetric effect of shocks: there is no effect when ε_{t−1} is positive, but a large effect when the shock is negative.
newsgarch <- newsimpact(fitgarch)
newsgjr   <- newsimpact(fitgjr)
par(mfrow=c(1,2))
plot(newsgarch$zx, newsgarch$zy, type="l", xlab=newsgarch$xexpr,
     ylab=newsgarch$yexpr, lwd=2, main="GARCH", cex.main=0.8)
abline(v=0, lty=2)
# analogous plot for the GJR news impact curve
plot(newsgjr$zx, newsgjr$zy, type="l", xlab=newsgjr$xexpr,
     ylab=newsgjr$yexpr, lwd=2, main="GJR", cex.main=0.8)
abline(v=0, lty=2)
Value-at-Risk quantifies the potential portfolio loss that an institution might face if an unlikely adverse event occurred at a certain time horizon. Let's define the profit/loss of a financial institution in day t+1 by R_{t+1} = 100·ln(W_{t+1}/W_t), where W_{t+1} is the portfolio value in day t+1. Then Value-at-Risk (VaR) at the 100(1−α)% level is defined by

P(R_{t+1} ≤ VaR^{1−α}_{t+1}) = α

where the typical values of α are 0.01 and 0.05. In practice, VaR^{1−α}_{t+1} is calculated every day and for a horizon of 10 days (2 trading weeks). If VaR^{1−α}_{t+1} is expressed in percentage return, it can easily be transformed into dollars by multiplying the portfolio value in day t (denoted by W_t) by the expected loss, that is, $VaR^{1−α}_{t+1} = W_t·(exp(VaR^{1−α}_{t+1}/100) − 1). From a statistical point of view, 99% VaR represents the 1% quantile, that is, the value such that there is only a 1% probability that the random variable takes a value smaller than or equal to that value. The graphs below show the Probability Density Function (PDF) and the Cumulative Distribution Function (CDF). Risk calculation is concerned with the left tail of the return distribution, since those are the rare events that have a large and negative effect on the portfolios of financial institutions. This is the reason why the profession has devoted a lot of energy to making sure that the left tail, rather than the complete distribution, is appropriately specified, since a poor model for the left tail implies poor risk estimates (poor in a sense that will become clear in the backtesting section).
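The percent-to-dollar conversion above can be sketched directly. The portfolio value of 10 million and the VaR of -2.629% are illustrative values, not figures prescribed by the text at this point:

```r
# Converting a percentage VaR into a dollar VaR (hypothetical inputs)
Wt        <- 10e6       # portfolio value in day t (assumed)
VaRpct    <- -2.629     # 99% one-day VaR in percent (assumed)
VaRdollar <- Wt * (exp(VaRpct/100) - 1)   # dollar loss implied by the VaR
```

Since the VaR is a negative percentage return, the dollar VaR is a negative dollar amount, here roughly -260,000 dollars.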
As an example, assume that an institution is holding a portfolio that replicates the S&P 500 Index and that it wants to calculate the 99% VaR for this position. If we assume that returns are normally distributed, then to calculate VaR^{0.99}_{t+1} we need to estimate the expected daily return of the portfolio (i.e., μ) and its expected volatility (i.e., σ). Let's assume that we believe the distribution is approximately constant over time, so that we can estimate the mean and standard deviation of the returns over a long period of time. In the illustration below we use the S&P 500 time series from the volatility chapter, which consists of daily returns from 1990 to 2014 (6300 observations).
mu    = mean(sp500daily)
sigma = sd(sp500daily)
var   = mu + qnorm(0.01) * sigma

[1] -2.629
The 99% VaR is -2.629% and represents the maximum loss from holding the S&P 500 that is expected for the following day with 99% probability. If we had used a shorter estimation window of one year (252 observations), the VaR estimate would have been -1.572%. The difference between the two VaR estimates is quite remarkable, since we only changed the size of the estimation window. The standard deviation declines from 1.142% in the full sample to 0.707% in the shorter sample, whilst the mean changes from 0.028% to 0.073%. As discussed in the volatility modeling chapter, it is extremely important to account for time variation in the distribution of financial returns if the interest is to estimate VaR at short horizons (e.g., a few days ahead).
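The window-size sensitivity just described is easy to reproduce. The sketch below uses simulated returns so it runs on its own; with the actual data, `ret` would simply be `sp500daily`.

```r
# Normal VaR from a long window versus the last 252 days (simulated data)
set.seed(42)
ret <- rnorm(6300, mean = 0.03, sd = 1.1)   # stand-in for daily returns

var.full  <- mean(ret) + qnorm(0.01) * sd(ret)   # full-sample estimate
ret.short <- tail(ret, 252)                      # last year only
var.short <- mean(ret.short) + qnorm(0.01) * sd(ret.short)
```

With real data, the two estimates can differ substantially whenever recent volatility departs from its long-run level, which is exactly the point made above.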
Time-varying VaR
So far we have assumed that the mean and standard deviation of the return distribution are constant and represent the long-run distribution of the variable. However, this might not be the best way to predict the distribution of the profit/loss at very short horizons (e.g., 1 to 10 days ahead) if the return volatility changes over time. In particular, in the volatility chapter we discussed the evidence that the volatility of financial returns changes over time and introduced models to account for this behavior. We can model the conditional distribution of the returns in day t+1 by assuming that both μ_{t+1} and σ_{t+1} are time-varying conditional on the information available in day t. Another decision that we need to make to specify the model is the distribution of the errors. We can assume, as above, that the errors are normally distributed, so that the 99% VaR is calculated as:

VaR^{0.99}_{t+1} = μ_{t+1} − 2.33·σ_{t+1}

where the expected return μ_{t+1} can be either constant (i.e., μ_{t+1} = μ) or an AR(1) process (μ_{t+1} = φ₀ + φ₁·R_t), and the conditional variance can be modeled as MA, EMA, or with a GARCH-type model. In the example below, I assume that the conditional mean is constant (and equal to the sample mean) and model the conditional variance of the demeaned returns as an EMA with parameter λ = 0.06:
require(TTR)
mu       <- mean(sp500daily)
sigmaEMA <- EMA((sp500daily-mu)^2, ratio=0.06)^0.5
var      <- mu + qnorm(0.01) * sigmaEMA
The VaR time series inherits the time variation in volatility, which alternates between calm periods of low volatility and risk, and other periods of increased uncertainty and thus the possibility of large losses. For this example, we find that in 2.207% of the 6299 days the return was smaller than the VaR. Since we calculated VaR at 99%, we expected to experience violations in only 1% of the days.
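The violation rate above is just the fraction of days on which the realized return fell below the VaR forecast. A minimal self-contained sketch, using simulated returns and a constant normal VaR in place of the series computed above:

```r
# Fraction of days in which the return violates the VaR forecast
set.seed(7)
ret  <- rnorm(1000)                  # stand-in for demeaned daily returns
VaR  <- rep(qnorm(0.01), 1000)       # constant 99% normal VaR forecast
viol.rate <- mean(ret < VaR)         # compare with the nominal alpha = 1%
```

With a time-varying VaR series, the same `mean(ret < VaR)` comparison applies after aligning each day's return with the VaR forecast made the previous day.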
VaR can be criticized on the grounds that it does not convey information on the potential loss that is expected if an extreme event (likely only 1% of the time or less) does occur. For example, a VaR of -5.52% provides no information on how large the portfolio loss is expected to be if the portfolio return happens to be smaller than the VaR. That is, how large do we expect the loss to be in case VaR is violated? A risk measure that quantifies this potential loss is Expected Shortfall (ES), which is defined as

ES^{1−α}_{t+1} = E(R_{t+1} | R_{t+1} ≤ VaR^{1−α}_{t+1})

that is, the expected portfolio return conditional on being in a day in which the return is smaller than the VaR. This risk measure focuses attention on the left tail of the distribution and is highly dependent on the shape of the distribution in that area, while it neglects all other parts of the distribution.
An analytical formula for ES is available if we assume that returns are normally distributed. In particular, if R_{t+1} = σ_{t+1}·ε_{t+1} with ε_{t+1} ~ N(0, 1), then VaR is calculated as VaR^{1−α}_{t+1} = z_α·σ_{t+1}. The conditioning event is that the return in the following day is smaller than the VaR, and the probability of this event happening is α, e.g., 0.01. We then need to calculate the expected value of R_{t+1} over the interval from minus infinity to VaR^{1−α}_{t+1}, which corresponds to a truncated normal distribution with density function given by

f(R_{t+1} | R_{t+1} ≤ VaR^{1−α}_{t+1}) = φ(R_{t+1}) / Φ(VaR^{1−α}_{t+1}/σ_{t+1})

where φ(·) and Φ(·) represent the PDF and the CDF of the normal distribution (i.e., Φ(VaR^{1−α}_{t+1}/σ_{t+1}) = Φ(z_α) = α). We can thus express ES as

ES^{1−α}_{t+1} = −σ_{t+1} · φ(z_α)/α

where z_α is equal to -2.33 and -1.64 for α equal to 0.01 and 0.05, respectively. If we are calculating VaR at 99%, so that α is equal to 0.01, then ES is equal to

ES^{0.99}_{t+1} = −σ_{t+1} · φ(−2.33)/0.01 = −2.64·σ_{t+1}

where the value 2.64 can be obtained in R by typing the command (2*pi)^(-0.5) * exp(-(2.33^2)/2) / 0.01 or using the function dnorm(-2.33) / 0.01. If α = 0.05 then the constant to calculate ES is -2.08, instead of -1.64 for VaR. Hence, ES leads to more conservative risk estimates, since the expected loss in a day in which VaR is exceeded is always larger than the VaR. We can plot the difference between VaR and ES as a function of α in the following graph:
sigma = 1
alpha = seq(0.001, 0.05, by=0.001)
ES    = - dnorm(qnorm(alpha)) / alpha * sigma
VaR   = qnorm(alpha) * sigma
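As a quick numerical check of the ES multipliers quoted above, using the same rounded quantiles as the text (z = -2.33 for 99% and z = -1.64 for 95%):

```r
# ES multipliers under normality: dnorm(z) / alpha
es99 <- dnorm(-2.33) / 0.01    # 99% ES multiplier, approximately 2.64
es95 <- dnorm(-1.64) / 0.05    # 95% ES multiplier, approximately 2.08
```

Both multipliers exceed the corresponding VaR quantiles in absolute value (2.64 vs 2.33, and 2.08 vs 1.64), which is the sense in which ES is always the more conservative measure.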
√K rule

The Basel Accords require VaR to be calculated at a horizon of 10 days and for a risk level of 99%. In addition, the Accords allow financial institutions to scale up the 1-day VaR to the 10-day horizon by multiplying it by √10. Why √10? Under what conditions is the VaR for the cumulative return over 10 days, denoted by VaR_{t+1:t+10}, equal to √10 · VaR_{t+1}?
In day t we are interested in calculating the risk of holding the portfolio over a horizon of K days, that is, assuming that we can liquidate the portfolio only on the K-th day. Regulators require banks to use K = 10, which corresponds to two trading weeks. To calculate the risk of holding the portfolio over the next K days we need to obtain the distribution of the sum of K daily returns, or cumulative return, denoted by R_{t+1:t+K}, which is given by Σ_{k=1}^{K} R_{t+k} = R_{t+1} + ... + R_{t+K}, where R_{t+k} is the return in day t+k. If we assume that these daily returns are independent and identically distributed (i.i.d.) with mean μ and variance σ², then the expected value of the cumulative return is

E(Σ_{k=1}^{K} R_{t+k}) = Σ_{k=1}^{K} μ = K·μ

and its variance is

Var(Σ_{k=1}^{K} R_{t+k}) = Σ_{k=1}^{K} σ² = K·σ²

so that the standard deviation of the cumulative return is equal to √K·σ. If we maintain the normality assumption that we introduced earlier, then the 99% VaR of R_{t+1:t+K} is given by

VaR^{0.99}_{t+1:t+K} = K·μ − 2.33·√K·σ

In this formula, the mean and standard deviation are estimated on daily returns and are then scaled up to horizon K.
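The scaling formula above can be applied directly to the full-sample daily moments reported earlier (μ = 0.028%, σ = 1.142%):

```r
# 10-day 99% VaR from daily moments under the i.i.d. normal assumption
mu    <- 0.028    # daily mean in percent (from the earlier example)
sigma <- 1.142    # daily standard deviation in percent
K     <- 10
var.K <- K * mu + qnorm(0.01) * sqrt(K) * sigma   # about -8.12 percent
```

Note that the mean scales with K while the standard deviation scales with √K, so at short horizons the volatility term dominates and the √K shortcut (which ignores the mean entirely) is a reasonable approximation.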
This result relies on the assumption that returns are serially independent, which allows us to set all covariances between returns in different days equal to zero. The empirical evidence from the ACF of daily returns indicates that this assumption is likely to be accurate most of the time, although in times of market booms or busts returns could be, temporarily, positively correlated. What would be the effect of positive correlation in returns on VaR? The first-order covariance can be expressed in terms of correlation as ρσ², with ρ the first-order serial correlation. To keep things simple, assume that we are interested in calculating VaR for the two-day return, that is, K = 2. The variance of the cumulative return is Var(R_{t+1} + R_{t+2}), which is equal to Var(R_{t+1}) + Var(R_{t+2}) + 2·Cov(R_{t+1}, R_{t+2}) = σ² + σ² + 2ρσ². This can be rewritten as 2σ²(1 + ρ), which shows that in the presence of positive correlation the cumulative return becomes riskier relative to the independent case, since 2σ²(1 + ρ) > 2σ². The Value-at-Risk for the two-day return is then

VaR^{0.99}_{t+1:t+2} = 2·μ − 2.33·σ·√2·√(1 + ρ)

which is smaller relative to the VaR assuming independence, which is given by 2·μ − 2.33·σ·√2. Hence, neglecting positive correlation in returns leads to underestimating risk and the potential portfolio loss deriving from an extreme (negative) market movement.
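The comparison above is immediate to verify numerically; μ = 0, σ = 1 and ρ = 0.2 are illustrative values:

```r
# Two-day 99% VaR with and without positive serial correlation (mu = 0)
sigma <- 1
rho   <- 0.2
var.indep <- qnorm(0.01) * sigma * sqrt(2)                # i.i.d. case
var.corr  <- qnorm(0.01) * sigma * sqrt(2) * sqrt(1+rho)  # correlated case
```

With ρ = 0.2 the correlated VaR is about 9.5% more negative than the independent one, so ignoring the correlation understates the potential loss by the same proportion.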
the normal distribution and a dynamic method to forecast volatility should provide reasonably accurate VaR estimates. However, we might still want to account for the non-normality of the standardized returns ε_t, and we will consider two possible approaches in this section.
One approach to relaxing the assumption of normally distributed errors is the Cornish-Fisher approximation, which consists of performing a Taylor expansion of the normal distribution around its mean. This has the effect of producing a distribution which is a function of skewness and kurtosis. We skip the mathematical details of the derivation and focus on the VaR calculation when this approximation is adopted. If we assume the mean is equal to zero, the 99% VaR for normally distributed returns is calculated as −2.33·σ_{t+1} or, more generally, as z_α·σ_{t+1} for the 100(1−α)% VaR, where z_α represents the α-quantile of the standard normal distribution. With the Cornish-Fisher (CF) approximation, VaR is calculated in a similar manner, that is, VaR^{1−α}_{t+1} = z^{CF}_α·σ_{t+1}, where the quantile z^{CF}_α is calculated as follows:

z^{CF}_α = z_α + (SK/6)·[z_α² − 1] + (EK/24)·[z_α³ − 3·z_α] + (SK²/36)·[2·z_α⁵ − 5·z_α]

where SK and EK represent the skewness and excess kurtosis, respectively, and for α = 0.01 we have that z_α = −2.33. If the data are normally distributed then SK = EK = 0, so that z^{CF}_α = z_α. However, in case the distribution is asymmetric and/or has fat tails, the effect is that z^{CF}_α ≤ z_α. In practice, we estimate the skewness and the excess kurtosis from the sample and use those values to calculate the quantile for VaR calculations. In the plot below we show the quantile z^{CF}_{0.01} (black line) and its relationship to z_{0.01} (red dashed line) as a function of the skewness and excess kurtosis parameters. The left plot shows the effect of the skewness parameter on the quantile, while holding the excess kurtosis equal to zero. Instead, the plot on the right shows the effect of increasing values of excess kurtosis, while the skewness parameter is kept constant and equal to zero. As expected, z^{CF}_{0.01} is smaller than z_{0.01} = −2.33, and it is interesting to notice that negative skewness increases the (absolute) value of z^{CF}_{0.01} more than positive skewness of the same magnitude. This is due to the fact that negative skewness implies a higher probability of large negative returns compared to large positive returns. The effect on VaR of accounting for asymmetry and fat tails in the data is thus to provide more conservative risk measures.
alpha = 0.01
EK = 0; SK = seq(-1, 1, by=0.05)
z = qnorm(alpha)
zCF = z + (SK/6) * (z^2 - 1) + (EK/24) * (z^3 - 3 * z) +
      (SK^2/36) * (2*z^5 - 5*z)

EK = seq(0, 10, by=0.1); SK = 0
zCF = z + (SK/6) * (z^2 - 1) + (EK/24) * (z^3 - 3 * z) +
      (SK^2/36) * (2*z^5 - 5*z)
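The same quantile can be evaluated at a single pair of moment values. SK = -0.5 and EK = 3 below are illustrative assumptions, not estimates from the text:

```r
# Cornish-Fisher quantile for one assumed skewness/kurtosis pair
alpha <- 0.01
SK <- -0.5; EK <- 3
z   <- qnorm(alpha)
zCF <- z + (SK/6)*(z^2 - 1) + (EK/24)*(z^3 - 3*z) + (SK^2/36)*(2*z^5 - 5*z)
```

With negative skewness and positive excess kurtosis, zCF is more negative than z, so the CF VaR, zCF times the volatility forecast, is more conservative than the normal VaR.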
An alternative approach to allow for non-normality is to make a different distributional assumption for ε_{t+1} that captures the fat-tailedness of the data. A distribution that is often considered is the t distribution with a small number of degrees of freedom. Since the t distribution assigns more probability to events in the tails, it provides more conservative risk estimates relative to the normal distribution. The graphs below show the t distribution for 4, 10, and ∞ degrees of freedom, while the plot on the right zooms in on the shape of the left tail of these distributions. It is clear that the smaller the d.o.f. used, the more likely are extreme events relative to the standard normal distribution (d.o.f. = ∞). So this approach also delivers more conservative risk measures relative to the normal distribution, since it assigns higher probability to extreme events.
plot(dnorm, xlim=c(-4,4), col="black", ylab="", xlab="",yaxt="n")
curve(dt(x,df=4), add=TRUE,col="orange")
curve(dt(x,df=10), add=TRUE,col="purple")
To be able to use the t distribution for risk calculation we need to set the value of the degrees-of-freedom parameter, denoted by d. In the context of a GARCH volatility model this can easily be done by considering d as an additional parameter to be estimated by maximizing the likelihood function, based on the assumption that the ε_{t+1} shocks follow a t_d distribution. A simple alternative approach to estimating the degrees-of-freedom parameter d exploits the fact that the excess kurtosis of a t_d distribution is equal to EK = 6/(d − 4) (for d > 4), which is only a function of the parameter d. Thus, based on the sample excess kurtosis we can then back out an estimate of d. The steps are as follows:

1. estimate by ML the GARCH model assuming that the errors are normally distributed
2. calculate the standardized residuals as ε_t = R_t/σ_t
3. estimate the excess kurtosis of the standardized residuals and obtain d as d = 6/EK + 4 (for d > 4)

For the standardized returns of the S&P 500 the sample excess kurtosis is equal to 2.22, so that the estimate of d is approximately 7, which indicates the need for a fat-tailed distribution. In practice, it would be advisable to estimate the parameter d jointly with the remaining parameters of the volatility model, rather than separately. Still, this simple approach provides a starting point to evaluate the usefulness of fat-tailed distributions in risk measurement.
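The moment-based estimate in step 3 amounts to one line of arithmetic, using the sample excess kurtosis quoted above:

```r
# Back out the t degrees of freedom from the sample excess kurtosis
EK <- 2.22
d  <- 6/EK + 4    # approximately 6.7, rounded to 7 in the text
```

Inverting EK = 6/(d − 4) gives d = 6/EK + 4, so smaller excess kurtosis maps to larger d, i.e., a distribution closer to the normal.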
A major advantage of Historical Simulation (HS) is that it does not require estimating any parameter for the volatility model or the distribution. However, there are also a few difficulties in using HS to calculate risk measures. One issue is the choice of the estimation window size M. Practitioners often use values between M = 250 and 1000, but, similarly to the choice of the smoothing parameter in MA and EMA, this is an ad hoc value that has been validated by experience rather than optimally selected based on a criterion. Another complication is that HS applied to daily returns provides a VaR measure at the one-day horizon which, for regulatory purposes, should then be converted to a 10-day horizon. What is typically done is to apply the √10 rule discussed before, although it does not have much theoretical justification in the context of HS, since we are not actually making any assumption about the return distribution. An alternative would be to calculate VaR as the 1% quantile of the (non-overlapping) cumulative return instead of the daily return. However, this would imply a much smaller sample size, in particular for small M.
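The sample-size cost of the non-overlapping alternative is easy to see. The sketch below uses simulated returns in place of the actual daily series:

```r
# HS VaR from non-overlapping 10-day cumulative returns (simulated data)
set.seed(123)
ret <- rnorm(6300, 0.03, 1.1)   # stand-in for 6300 daily returns
K   <- 10
n   <- floor(length(ret)/K) * K
ret.K <- colSums(matrix(ret[1:n], nrow = K))   # 630 non-overlapping sums
var.K <- quantile(ret.K, probs = 0.01)         # 10-day HS VaR
```

The 6300 daily observations shrink to 630 ten-day returns, so the 1% quantile is estimated from only about six tail observations, which is the small-sample problem mentioned above.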
The implementation in R is quite straightforward and is shown below. The function quantile() calculates the α quantile of a return series, and we can combine it with the function rollapply() from the package zoo to apply it recursively to a rolling window of size M. For example, the command rollapply(sp500daily, 250, quantile, probs=alpha, align="right") calculates the quantile (for probs=alpha and alpha=0.01) of the sp500daily time series, with the first VaR forecast produced for day 251 and then rolling until the end of the sample. The graph below compares VaR calculated with the HS method for estimation windows of 250 and 1000 days. The shorter estimation window makes VaR more sensitive to market events, as opposed to M=1000, which changes very slowly. A characteristic of HS that is particularly evident when M=1000 is that VaR might stay constant for long periods of time, even though volatility might have significantly decreased.
M1 = 250
M2 = 1000
alpha = 0.01
hs1 <- rollapply(sp500daily, M1, quantile, probs=alpha, align="right")
hs2 <- rollapply(sp500daily, M2, quantile, probs=alpha, align="right")
As discussed earlier, regulatory practice allows scaling up the 1-day VaR to 10 days by multiplying it by √10. However, there are several limitations in doing this, and it becomes particularly inaccurate when used for VaR models that assume a GARCH volatility structure. An alternative is to use simulation methods that generate artificial future returns consistent with the risk model. The model makes assumptions about the volatility dynamics and the distribution of the error term. Using simulations, we are able to produce a large number of possible future paths for the returns that are conditional on the current day. In addition, it becomes very easy to obtain the distribution of cumulative returns by summing the daily simulated returns along a path. We will consider two popular approaches that differ in the way the simulated shocks are generated: Monte Carlo Simulation (MC) consists of iterating the volatility model based on shocks that are simulated from a certain distribution (normal, t, or something else), while Filtered Historical Simulation (FHS) assumes that the shocks are equal to the standardized returns and takes random samples of those values. The difference is that FHS does not make a parametric assumption for the ε_{t+1} (similarly to HS), while MC does rely on such an assumption.
estimate the model using the rugarch package. The volatility forecast for the following day is 0.483%, which is significantly lower than the sample standard deviation from January 1990 to January 2007, equal to 0.993%. To use some terminology introduced earlier, the conditional forecast (0.483%) is lower than the unconditional forecast (0.993%).
spec = ugarchspec(variance.model=list(model="sGARCH", garchOrder=c(1,1)),
                  mean.model=list(armaOrder=c(0,0), include.mean=FALSE))
# lowvol represents the January 22, 2007 date
fitgarch = ugarchfit(spec = spec, data = window(sp500daily, end=lowvol))

 omega alpha1  beta1
 0.005  0.054  0.942

                      T+1
2007-01-22 19:00:00 0.483
1. The next step consists of simulating a large number of return paths, say S, that are consistent with the model assumption that R_{t+1} = σ_{t+1}·ε_{t+1}. Since we have already produced the forecast σ_{t+1} = 0.483, to obtain simulated values of R_{t+1} we only need to generate values for the error term ε_{t+1}. This can easily be done in R using the command rnorm(), which returns random values from the standard normal distribution (and rt() does the same for the t distribution). Denote by ε_{s,t+1} the s-th simulated value of the shock (for s = 1, ..., S); then the s-th simulated value of the return is produced by multiplying σ_{t+1} and ε_{s,t+1}, that is, R_{s,t+1} = σ_{t+1}·ε_{s,t+1}.
2. The next step is to use the simulated returns R_{s,t+1} to predict volatility in the next period, denoted by σ_{s,t+2}. Since we have assumed a GARCH specification, the volatility forecast is obtained as (σ_{s,t+2})² = ω + α·(R_{s,t+1})² + β·σ²_{t+1}, and the (simulated) returns at time t+2 are obtained as R_{s,t+2} = σ_{s,t+2}·ε_{s,t+2}, where ε_{s,t+2} represents a new set of simulated values for the shocks in day t+2.
3. Continue the iteration to calculate σ_{s,t+k} and R_{s,t+k} for k = 1, ..., K.
set.seed(9874)
S = 10000   # number of MC simulations
K = 250     # forecast horizon

# create the matrices to store the simulated returns and volatilities
R     = zoo(matrix(sigma*rnorm(S), K, S, byrow=TRUE), order.by=futdates)
Sigma = zoo(matrix(sigma, K, S), order.by=futdates)

# iterate to calculate R and Sigma based on the previous day
for (i in 2:K)
{
  Sigma[i,] = (gcoef['omega'] + gcoef['alpha1'] * R[i-1,]^2 +
               gcoef['beta1'] * Sigma[i-1,]^2)^0.5
  R[i,]     = rnorm(S) * Sigma[i,]
}
What would the forecasts of the return and volatility distributions look like if they were made in a period of high volatility? To illustrate this scenario we consider September 29, 2008 as the forecast base. The GARCH(1,1) forecast of volatility for the following day is 3.165%, and the distributions of simulated returns and volatilities are shown in the graph below. Since the day was in a period of high volatility, the assumption of stationarity of volatility made by the GARCH model implies that volatility will decline in the future. This is clear from the return quantiles converging in the left plot, as well as from the declining average volatility in the right plot.
The cumulative or multi-period return can easily be obtained with the R command cumsum(Ret), where Ret represents the K by S matrix of simulated one-day returns, which the function cumulatively sums over the columns. The outcome is also a K by S matrix, with each column representing a possible path of the cumulative return from 1 to K steps ahead. The plots below show the quantiles at each horizon k, calculated across the S simulated cumulative returns. The quantile at probability 0.01 represents the 99% VaR that financial institutions are required to report for regulatory purposes (at the 10-day horizon). The left plot represents the distribution of expected cumulative returns conditional on being on January 22, 2007, while the right plot is conditional on September 29, 2008. The same scale of the y-axis in both plots highlights the striking difference in the dispersion of the distributions of future cumulative returns. Although we saw above that the volatility of daily returns is expected to increase after January 22, 2007 and decrease following September 29, 2008, the levels of volatility in these two days are so different that, when accumulated over a long period of time, they lead to very different distributions for the cumulative returns.
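The quantile-by-horizon calculation just described can be sketched on a plain matrix of simulated returns (a stand-in for the zoo matrix built earlier):

```r
# Cumulative paths and the 1% quantile at each horizon k
set.seed(1)
K <- 250; S <- 1000
Ret  <- matrix(rnorm(K*S), K, S)                 # simulated one-day returns
cumR <- apply(Ret, 2, cumsum)                    # K x S cumulative paths
VaRk <- apply(cumR, 1, quantile, probs = 0.01)   # 99% VaR at each horizon
```

For a plain matrix, apply(Ret, 2, cumsum) makes the column-wise summation explicit; cumsum() applied directly to a zoo matrix, as in the text, sums each column in the same way.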
MC simulations also make it easy to calculate ES. For each horizon k, ES can be calculated as the average of those simulated returns that are smaller than the VaR. The code below shows these steps for the two dates considered, and the plots compare the two risk measures conditional on being on January 22, 2007 and on September 29, 2008. Similarly to the earlier discussion, ES provides a larger (in absolute value) potential loss relative to VaR, a difference that increases with the horizon k.

VaR    = apply(R, 1, quantile, probs=0.01)   # 1% quantile at each horizon
VaRmat = matrix(VaR, K, S)                   # replicate the VaR across simulations
Rviol  = ifelse(coredata(R) <= VaRmat, coredata(R), NA)  # returns violating VaR
ES     = rowMeans(Rviol, na.rm=TRUE)         # average simulated return beyond VaR
simulated cumulative returns) for the two forecasting dates that we are considering. The expected future
volatility by FHS converges at a slightly lower speed relative to MC during periods of low volatility, while
the opposite is true when the forecasting point occurs during a period of high volatility. This is because FHS
does not restrict the shape of the distribution on the left tail (as MC does given the assumption of normality)
so that large negative standardized returns contribute to determine future returns and volatility. Of course,
this result is specific to the S&P 500 daily returns that we are considering as an illustrative portfolio and it
might be different when analyzing other portfolio returns.
In terms of VaR calculated on cumulative returns, we find that FHS predicts lower risk relative to MC at long horizons, with the difference becoming larger and larger as the horizon K progresses. Hence, while at short horizons the VaR forecasts are quite similar, they become increasingly different at longer horizons.
# standardized residuals
std.resid  = as.numeric(residuals(fitgarch, standardize=TRUE))
std.resid1 = as.numeric(residuals(fitgarch1, standardize=TRUE))

for (i in 2:K)
{
  Sigma.fhs[i,] = (gcoef['omega'] + gcoef['alpha1'] * R.fhs[i-1,]^2 +
                   gcoef['beta1'] * Sigma.fhs[i-1,]^2)^0.5
  # FHS shocks: resample the standardized residuals with replacement
  R.fhs[i,]     = sample(std.resid, S, replace=TRUE) * Sigma.fhs[i,]
}
2
2
2
V ar(Rtp ) = t,p
= w1,t
12 + w2,t
22 + 2w1,t w2,t 1,2 1 2
which is a function of the individual (weighted) variances and the correlation between the two assets, 1,2 .
The portfolio Value-at-Risk is then given by
t,p 2.33t,p
2 2 + w 2 2 + 2w w
= w1,t 1 + w2,t 2 2.33 w1,t
1,t 2,t 1,2 1 2
1
2,t 2
0.99
V aRp,t
=
0.99
V aRp,t
=
2.33
2 2 + w 2 2 + 2w w
w1,t
1,t 2,t 1,2 1 2
1
2,t 2
2 2 + 2.332 w 2 2 + 2 2.332 w w
= 2.332 w1,t
1,t 2,t 1,2 1 2
1
2,t 2
which shows that the portfolio VaR in day t can be expressed in terms of the individual VaRs of the assets and
the correlation between the two asset returns. Since the correlation coefficient ranges between 1, the two
extreme cases of correlation implies the following VaR:
- $\rho_{1,2} = 1$: $VaR^{0.99}_{p,t} = VaR^{0.99}_{1,t} + VaR^{0.99}_{2,t}$; the two assets are perfectly correlated and the total portfolio VaR is the sum of the individual VaRs
- $\rho_{1,2} = -1$: $VaR^{0.99}_{p,t} = \left|VaR^{0.99}_{1,t} - VaR^{0.99}_{2,t}\right|$; the two assets have perfect negative correlation, so the total risk of the portfolio is given by the difference between the two VaRs since the risk in one asset is offset by the other asset, and vice versa.
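As a quick numerical check of the two limit cases above, the sketch below uses made-up weights and volatilities (all values are hypothetical, chosen only for illustration) and verifies that the portfolio VaR collapses to the sum of the individual VaRs when the correlation is 1, and that its magnitude equals the absolute difference of the individual VaRs when the correlation is -1:

```r
# Hypothetical 50/50 portfolio with daily volatilities of 1% and 2%
w1 <- 0.5; w2 <- 0.5
sigma1 <- 0.01; sigma2 <- 0.02

# 99% portfolio VaR as a function of the correlation rho
port.var <- function(rho) {
  -2.33 * sqrt((w1*sigma1)^2 + (w2*sigma2)^2 + 2*w1*w2*rho*sigma1*sigma2)
}

VaR1 <- -2.33 * w1 * sigma1   # individual VaR of asset 1
VaR2 <- -2.33 * w2 * sigma2   # individual VaR of asset 2

port.var(1)    # rho = 1:  equals VaR1 + VaR2
port.var(-1)   # rho = -1: magnitude equals |VaR1 - VaR2| (risks offset)
port.var(0)    # rho = 0:  lies between the two extremes
```

For any intermediate correlation the portfolio VaR lies between these two bounds, which is the diversification effect discussed in the text.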
More generally, both the mean and variance could be varying over time conditional on past information. In
this case $VaR^{0.99}_{p,t}$ can be rewritten as
$$VaR^{0.99}_{p,t} = w_{1,t}\,\mu_{1,t} + w_{2,t}\,\mu_{2,t} - 2.33\sqrt{w^2_{1,t}\,\sigma^2_{1,t} + w^2_{2,t}\,\sigma^2_{2,t} + 2\, w_{1,t}\, w_{2,t}\,\rho_{12,t}\,\sigma_{1,t}\sigma_{2,t}}$$
In the Equation above we added a $t$ subscript also to the correlation coefficient, that is, $\rho_{12,t}$ represents the
correlation between the two assets conditional on the information available up to that day. There is evidence
supporting the fact that correlations between assets might be changing over time in response to market events
or macroeconomic shocks (e.g., a recession). In the following Section we discuss some methods that can be
used to model and predict correlations.
Modeling correlations
A simple approach to modeling correlations consists of using MA and EMA smoothing similarly to the case of
forecasting volatility. However, in this case the object to be smoothed is not the squared return, but the product
of the returns of assets 1 and 2 (N.B.: we are implicitly assuming that the mean of both assets can be set equal
to zero). Denote the return of asset 1 by $R_{1,t}$, of asset 2 by $R_{2,t}$, and by $\sigma_{12,t}$ the covariance between the two
assets in day $t$. We can estimate $\sigma_{12,t}$ by a MA of $M$ days:
$$\sigma_{12,t+1} = \frac{1}{M}\sum_{m=1}^{M} R_{1,t-m+1}\, R_{2,t-m+1}$$
and the correlation is then obtained by dividing the covariance estimate by the standard deviations of the
two assets, that is:

$$\rho_{12,t+1} = \sigma_{12,t+1}\,/\,(\sigma_{1,t+1}\,\sigma_{2,t+1})$$
In case the portfolio is composed of $J$ assets, there are $J(J-1)/2$ asset pairs for which we need to calculate
correlations. An alternative approach is to use EMA smoothing, which can be implemented using the recursive
formula discussed earlier:

$$\sigma_{12,t+1} = (1-\lambda)\,\sigma_{12,t} + \lambda\, R_{1,t} R_{2,t}$$
and the correlation is obtained by dividing the covariance by the forecasts of the standard deviations for the
two assets.
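The EMA recursion can also be implemented directly in base R without relying on a package. The function below is a minimal sketch: the initialization of the recursion to the sample mean of the series is an arbitrary choice, and the simulated returns are purely illustrative.

```r
# EMA smoothing via the recursion s[t+1] = (1 - lambda)*s[t] + lambda*x[t]
ema <- function(x, lambda, init = mean(x)) {
  s <- numeric(length(x))
  s[1] <- init
  for (t in 1:(length(x) - 1))
    s[t + 1] <- (1 - lambda) * s[t] + lambda * x[t]
  s
}

set.seed(1)
R1 <- rnorm(500, sd = 0.01)              # simulated daily returns, asset 1
R2 <- 0.5 * R1 + rnorm(500, sd = 0.01)   # simulated returns, asset 2 (correlated)

cov12  <- ema(R1 * R2, lambda = 0.06)    # EMA of the cross-products (covariance)
sig1   <- sqrt(ema(R1^2, lambda = 0.06)) # EMA volatility, asset 1
sig2   <- sqrt(ema(R2^2, lambda = 0.06)) # EMA volatility, asset 2
corr12 <- cov12 / (sig1 * sig2)          # EMA correlation estimate
```

Because the covariance and the two variances are smoothed with the same weights, the resulting correlation estimate is guaranteed to lie between -1 and 1.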
To illustrate the implementation in R, we assume that the firm holds a portfolio that invests a fraction $w_1$
in a gold ETF (ticker: GLD) and the remaining fraction $1-w_1$ in the S&P 500 ETF (ticker: SPY). The closing
prices are downloaded from Yahoo Finance starting on Jan 02, 2005 and ending on Apr 14, 2015, and the goal is
to forecast portfolio VaR for the following day. We will assume that the expected daily returns for both assets
are equal to zero and forecast volatilities and the correlation between the assets using EMA with $\lambda = 0.06$.
In the code below R represents a matrix with 2587 rows and two columns representing the GLD and SPY daily
returns.
library(TTR)
library(zoo)
# product of the GLD and SPY returns
prod  <- R[,1] * R[,2]
# EMA for the product of returns
cov   <- EMA(prod, ratio=0.06)
# Apply the EMA function to each column of R^2 and make it a zoo object
sigma <- zoo(apply(R^2, 2, EMA, ratio=0.06), order.by=time(R))^0.5
corr  <- cov / (sigma[,1] * sigma[,2])
The time series plot of the EMA correlation shows that the dependence between the gold and S&P 500 returns
oscillates significantly around the long-run correlation of 0.062. In certain periods gold and the S&P 500 have
positive correlation as high as 0.81 and in other periods as low as -0.65. During 2008 the correlation between the
two assets became large and negative since investors fled the equity market toward gold, which was perceived
as a safe haven during turbulent times. Based on these forecasts of volatilities and correlation, portfolio VaR
can be calculated in R as follows:
# weight of asset 1
w1 = 0.5
w2 = 1 - w1 # weight of asset 2
VaR = -2.33 * ( (w1*sigma[,1])^2 + (w2*sigma[,2])^2 +
2*w1*w2*corr*sigma[,1]*sigma[,2] )^0.5
The one-day portfolio Value-at-Risk fluctuates substantially between -0.79% and -8.05%, the latter occurring
during the 2008-09 financial crisis. It is also interesting to compare the portfolio VaR with the risk measure if
the portfolio is fully invested in either asset. The time series graph below shows the VaR for three scenarios:
a portfolio invested 50% in gold and equity, 100% gold, and 100% S&P 500. Portfolio VaR is between the VaRs
for the individual assets when correlation is positive. However, in those periods in which the two assets have
negative correlation (e.g., 2008) the portfolio VaR is higher (i.e., smaller in absolute value) than both individual
VaRs since the two assets are moving in opposite directions and the risk exposures partly offset each other.
VaRGLD = -2.33 * sigma[,1]
VaRSPY = -2.33 * sigma[,2]
Exposure mapping
[1] 0.022
In this case, we find that the 99% VaR is violated more often (0.022) than expected (0.01). We can test the
hypothesis that $\alpha = 0.01$ (or 1%) by comparing the likelihood that the sample has been generated by $\alpha$ as
opposed to its estimate $\hat{\alpha}$, which in this example is equal to 0.022. One way to test this hypothesis is to define
the event of a violation of VaR, that is $R_{t+1} \le VaR_{t+1}$, as a binomial random variable with probability $\alpha$
that the event occurs. Since we have a total of $T$ days, and introducing the assumption that violations are
independent of each other, the joint probability of having $T_1$ violations (out of $T$ days) is
$$L(\alpha; T_1, T) = \alpha^{T_1}\,(1-\alpha)^{T_0}$$
where $T_0 = T - T_1$. The hypothesis $\alpha = 0.01$ can be tested by comparing this likelihood at the
estimated $\hat{\alpha}$ and at the theoretical value of 0.01 (more generally, if VaR is calculated at 95% then $\alpha$ is 0.05).
This type of test is called a likelihood ratio test and can be interpreted as the distance between the likelihood
of obtaining $T_1$ violations in $T$ days at the theoretical value (i.e., using $\alpha = 0.01$) and the likelihood based on
the sample estimate $\hat{\alpha}$. The statistic and distribution of the test for Unconditional Coverage (UC) are
$$UC = -2 \ln\left(\frac{L(0.01;\, T_1, T)}{L(\hat{\alpha};\, T_1, T)}\right) \sim \chi^2_1$$

where $\hat{\alpha} = T_1/T$ and $\chi^2_1$ denotes the chi-square distribution with 1 degree of freedom. The critical values
at 1, 5, and 10% are 6.63, 3.84, and 2.71, respectively, and the null hypothesis $\alpha = 0.01$ is rejected if $UC$ is
larger than the critical value. In practice, the test statistic can be calculated as follows:
$$UC = -2\left[\, T_1 \ln\left(\frac{0.01}{\hat{\alpha}}\right) + T_0 \ln\left(\frac{0.99}{1-\hat{\alpha}}\right)\right]$$
UC = -2 * ( T1 * log(0.01/alphahat)
            + (TT - T1) * log(0.99/(1-alphahat)) )
UC
[1] 68.1
Since 68.1 is larger than 3.84, we reject at the 5% significance level the null hypothesis that $\alpha = 0.01$ and conclude
that the risk model provides inappropriate coverage.
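To see the whole UC calculation end to end, the sketch below simulates a hypothetical violation indicator (Bernoulli draws with a 2.2% violation rate, purely for illustration; the sample size and variable names are assumptions) and computes the UC statistic together with its p-value from the $\chi^2_1$ distribution:

```r
set.seed(42)
TT <- 6268                       # hypothetical number of days
V  <- rbinom(TT, 1, 0.022)       # simulated violation indicator (1 = VaR violated)

T1 <- sum(V)                     # number of violations
alphahat <- T1 / TT              # empirical violation frequency

# likelihood-ratio statistic for Unconditional Coverage against alpha = 0.01
UC <- -2 * (T1 * log(0.01 / alphahat) + (TT - T1) * log(0.99 / (1 - alphahat)))
pval <- 1 - pchisq(UC, df = 1)   # p-value under the chi-square(1) null
```

Because $\hat{\alpha}$ maximizes the likelihood, the statistic is non-negative by construction, and a small p-value signals that the coverage of the risk model is inappropriate.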
While testing for coverage, we introduced the assumption that violations are independent of each other. If
this assumption fails to hold, then a violation in day t has the effect of increasing/decreasing the probability
of experiencing a violation in day t + 1, relative to its unconditional level. A situation in which this might
happen is during financial crises when markets enter a downward spiral which is likely to lead to several
consecutive days of violations and thus to the possibility of underestimating risk. The empirical evaluation of
this assumption requires the calculation of two probabilities: 1) the probability of having a violation in day
$t$ given that a violation occurred in day $t-1$, and 2) the probability of having a violation in day $t$ given
that no violation occurred the previous day. We denote the estimates of these conditional probabilities by
$\hat{\pi}_{1,1}$ and $\hat{\pi}_{0,1}$, respectively. They can be estimated from the data by calculating $T_{1,1}$ and $T_{0,1}$, which represent the
number of days in which a violation was preceded by a violation and by no violation, respectively. In R we can
determine these quantities as follows:
T11 = sum((lag(V,-1)==1) & (V==1))
T01 = sum((lag(V,-1)==0) & (V==1))
where we obtain that $T_{0,1} = 131$ and $T_{1,1} = 7$, with their sum equal to $T_1$. Similarly, we can calculate $T_{1,0}$
and $T_{0,0}$, whose estimates are 131 and 5998, respectively, and which sum to $T_0$. Since we look at violations in
two consecutive days we lose one observation, so that our total sample size is now $T - 1 = 6267$. We can then
calculate the conditional probability of a violation in a day given that the previous day there was no violation
as $\hat{\pi}_{0,1} = T_{0,1}/(T_{0,0} + T_{0,1})$, while the probability of a violation in two consecutive days is
$\hat{\pi}_{1,1} = T_{1,1}/(T_{1,0} + T_{1,1}) = 1 - \hat{\pi}_{1,0}$. The likelihood of the violation sequence under this alternative is
$$L(\hat{\pi}_{0,1}, \hat{\pi}_{1,1};\, T_{0,1}, T_{1,1}, T) = (1-\hat{\pi}_{0,1})^{T_{0,0}}\;\hat{\pi}_{0,1}^{\,T_{0,1}}\;(1-\hat{\pi}_{1,1})^{T_{1,0}}\;\hat{\pi}_{1,1}^{\,T_{1,1}}$$
The test statistic and distribution for the hypothesis of Independence (IND) in this case are

$$IND = -2 \ln\left(\frac{L(\hat{\alpha};\, T_1, T)}{L(\hat{\pi}_{0,1}, \hat{\pi}_{1,1};\, T_{0,1}, T_{1,1}, T)}\right) \sim \chi^2_1$$
and the critical values are the same as for the UC test. The numerator of the IND test is the likelihood in the
denominator of the UC test, and it is based on the empirical estimate $\hat{\alpha}$ rather than the theoretical value $\alpha$.
The value of the $IND$ test statistic is 4.04, which is greater than the critical value at 5%, so that we reject
the null hypothesis that the violations occur independently.
Finally, we might be interested in testing the hypothesis $\pi_{0,1} = \pi_{1,1} = 0.01$, which jointly tests the
independence of the violations as well as the coverage, which should be equal to 1%. This test is referred
to as Conditional Coverage; the test statistic is given by the sum of the previous two test statistics, that
is, $CC = IND + UC$, and it is distributed as a $\chi^2_2$. The critical values at 1, 5, and 10% in this case are 9.21,
5.99, and 4.61, respectively.
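Using the transition counts reported above ($T_{0,0} = 5998$, $T_{0,1} = 131$, $T_{1,0} = 131$, $T_{1,1} = 7$), the three statistics can be computed as follows. This is a sketch of the likelihood-ratio calculations described in the text (variable names are my own); up to rounding it reproduces the values UC = 68.1 and IND ≈ 4.04 reported earlier:

```r
T00 <- 5998; T01 <- 131; T10 <- 131; T11 <- 7
TT  <- T00 + T01 + T10 + T11          # T - 1 = 6267 usable pairs of days
T1  <- T01 + T11                      # days with a violation (today)
T0  <- T00 + T10                      # days without a violation (today)

alphahat <- T1 / TT                   # unconditional violation frequency
pi01 <- T01 / (T00 + T01)             # P(violation | no violation yesterday)
pi11 <- T11 / (T10 + T11)             # P(violation | violation yesterday)

# Unconditional Coverage statistic against alpha = 0.01
UC <- -2 * (T1 * log(0.01 / alphahat) + T0 * log(0.99 / (1 - alphahat)))

# Independence statistic: restricted (single alphahat) vs Markov likelihood
ll0 <- T0 * log(1 - alphahat) + T1 * log(alphahat)
ll1 <- T00 * log(1 - pi01) + T01 * log(pi01) +
       T10 * log(1 - pi11) + T11 * log(pi11)
IND <- -2 * (ll0 - ll1)

# Conditional Coverage combines the two and is chi-square(2) under the null
CC <- UC + IND
```

With these counts CC is far above the 5% critical value of 5.99, so the joint hypothesis of correct coverage and independent violations is rejected.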