
Question 1

1. Outline the concepts of Skewness, Kurtosis in terms of salient features of the financial data and
explain their meaning and the use of Jarque-Bera tests statistic.

Skewness measures the extent to which a distribution is not symmetric around its mean. When the
distribution of data is symmetric and unimodal the mean, median and mode will be the same. If the
distribution is positively skewed then there will be a long right tail and the data will be bunched to
the left. A normally distributed series has zero skewness.

Kurtosis measures the fatness of the tails of the distribution and how peaked it is at the mean. A
normal distribution is defined to have a kurtosis of 3. A distribution with a kurtosis greater than 3
(leptokurtic) will have fatter tails and a higher peak, while one with a kurtosis of less than 3
(platykurtic) will have thinner tails and a lower peak.

The Jarque-Bera test can be used to test a distribution for normality. The null hypothesis is that the
coefficient of skewness and the excess kurtosis (the coefficient of kurtosis minus three) are jointly
zero, that is, that the distribution is normal. The test statistic, JB = (T/6) * (S^2 + (K-3)^2/4), has a
chi-squared distribution with two degrees of freedom under the null.

Financial returns data are usually leptokurtic, often with negative skew.
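These quantities are easy to compute directly. A minimal NumPy sketch (my own illustration, not from the notes; the function name and simulated sample are assumptions) of the skewness and kurtosis coefficients and the JB statistic:

```python
import numpy as np

def jarque_bera(x):
    """Jarque-Bera normality test: JB = (T/6) * (S^2 + (K-3)^2/4)."""
    x = np.asarray(x, dtype=float)
    T = x.size
    m = x.mean()
    s2 = ((x - m) ** 2).mean()               # (biased) variance, as in the JB formula
    S = ((x - m) ** 3).mean() / s2 ** 1.5    # coefficient of skewness
    K = ((x - m) ** 4).mean() / s2 ** 2      # coefficient of kurtosis
    JB = T / 6.0 * (S ** 2 + (K - 3.0) ** 2 / 4.0)
    return S, K, JB

rng = np.random.default_rng(0)
S, K, JB = jarque_bera(rng.standard_normal(10_000))
# For a normal sample: S near 0, K near 3, JB small relative to the chi2(2)
# 5% critical value of 5.99. A skewed sample (e.g. exponential) gives a huge JB.
```

For a normal sample S is near 0 and K near 3, so JB is small relative to 5.99; for a heavily skewed sample JB is enormous and normality is rejected.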

2. The descriptive statistics of the above two series are given below; explain the characteristics of these series.

3. Using the correlogram given below, assess the underlying data generating process (e.g. AR, MA) of
these series and explain whether these series (LCHINA and DCHINA) are stationary.

An autoregressive model is one where the current value of a variable, y, depends upon only the
values that the variable took in previous periods, plus an error term.
AR(2): yt = μ + φ1yt-1 + φ2yt-2 + ut

A moving average model is simply a linear combination of white noise processes

MA(2): yt = μ + θ1ut-1 + θ2ut-2 + ut

The partial autocorrelation function measures the correlation between an observation k periods ago
and the current observation, after controlling for all intermediate lags. At lag 1, the autocorrelation
and partial autocorrelation coefficients are the same as there are no lags to eliminate.

For AR(2), there is partial correlation between yt and yt-1 and yt and yt-2, but no partial correlation
between yt and yt-3 and beyond.

MA(2) can be expressed as an AR(∞). Thus there is a direct connection between yt and all its previous
values, and the partial autocorrelation function for an MA(q) model will decline geometrically.

Autoregressive process:
1. a geometrically decaying acf
2. a number of non-zero points of pacf

Moving average process:

1. a number of non-zero points of acf
2. a geometrically decaying pacf

ARMA process:

1. a geometrically decaying acf
2. a geometrically decaying pacf
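These identification patterns can be checked by simulation. A self-contained NumPy sketch (my own illustration; the coefficient values are arbitrary assumptions) that generates an AR(2) and an MA(2) and compares their sample ACFs:

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelation function up to nlags (lag 0 included)."""
    x = np.asarray(x, float) - np.mean(x)
    denom = np.sum(x ** 2)
    return np.array([np.sum(x[k:] * x[:len(x) - k]) / denom for k in range(nlags + 1)])

rng = np.random.default_rng(1)
T = 20_000
e = rng.standard_normal(T)

# AR(2): yt = 0.5yt-1 + 0.3yt-2 + ut  ->  acf decays geometrically
y_ar = np.zeros(T)
for t in range(2, T):
    y_ar[t] = 0.5 * y_ar[t - 1] + 0.3 * y_ar[t - 2] + e[t]

# MA(2): yt = ut + 0.6ut-1 + 0.3ut-2  ->  acf cuts off after lag 2
y_ma = e.copy()
y_ma[1:] += 0.6 * e[:-1]
y_ma[2:] += 0.3 * e[:-2]

band = 1.96 / np.sqrt(T)   # 95% non-rejection band for a zero autocorrelation
```

The AR(2) ACF is still sizeable at lag 3 and decays geometrically, while the MA(2) ACF falls inside the ±1.96/√T band from lag 3 onwards.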

Process for determination:

Is the ACF geometrically declining, and are there p significant spikes in the PACF? If so, an AR(p) is
suggested; the reverse pattern suggests an MA(q).

95% non-rejection region for an autocorrelation coefficient being different from zero: ±1.96/√T,
where T is the sample size.

The Ljung-Box Q-statistic tests the joint hypothesis that all of the autocorrelation coefficients up to a
certain point are jointly equal to zero.
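The Q-statistic is simple to compute from the sample autocorrelations. A hand-rolled NumPy sketch (my own illustration; the simulated series are assumptions):

```python
import numpy as np

def ljung_box_q(x, m):
    """Ljung-Box Q* = T(T+2) * sum_{k=1..m} rho_k^2 / (T-k); ~ chi2(m) under H0."""
    x = np.asarray(x, float) - np.mean(x)
    T = x.size
    denom = np.sum(x ** 2)
    rho = [np.sum(x[k:] * x[:T - k]) / denom for k in range(1, m + 1)]
    return T * (T + 2) * sum(r ** 2 / (T - k) for k, r in enumerate(rho, start=1))

rng = np.random.default_rng(2)
T = 2000
white_noise = rng.standard_normal(T)
Q_wn = ljung_box_q(white_noise, 10)   # compare with the chi2(10) 5% critical value 18.31

ar1 = np.empty(T)                     # strongly autocorrelated AR(1) series
ar1[0] = rng.standard_normal()
for t in range(1, T):
    ar1[t] = 0.7 * ar1[t - 1] + rng.standard_normal()
Q_ar = ljung_box_q(ar1, 10)           # far above 18.31 -> reject the joint null
```

For white noise the statistic sits well within the chi-squared(10) distribution; for the AR(1) series it is enormous and the joint null of zero autocorrelations is rejected.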

In an AR model whose coefficients imply non-stationarity, previous values of the error term will have
a non-declining effect on the current value of y as time progresses.

4. Outline the unit root testing procedure using Dickey-Fuller (DF) and Augmented Dickey-Fuller (ADF) tests.

P260-261 lag operators and unit roots


A stationary series is one with a constant mean, constant variance and constant autocovariances.

DF can be conducted allowing for an intercept, or an intercept and deterministic trend, or neither.
DF with an intercept and deterministic trend:

yt = φyt-1 + μ + λt + ut

Null hypothesis: φ = 1 (the series contains a unit root)

Alternative: φ < 1 (the series is stationary)

DF is only valid if ut is white noise. If there is autocorrelation in the dependent variable of the
regression that has not been modelled, there will be autocorrelation in ut. If this is the case the test
would be oversized, that is, the proportion of times that the null hypothesis is incorrectly rejected
would be higher than assumed.
The solution is ADF, which adds p lags of the dependent variable to the regression.

Δyt = ψyt-1 + α1Δyt-1 + ... + αpΔyt-p + ut
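The ADF regression is just OLS. A hand-rolled NumPy sketch (my own illustration, not EViews output) that returns the t-ratio on ψ, which must be compared with the DF critical values (about -2.86 at 5% with an intercept) rather than the standard t tables:

```python
import numpy as np

def adf_tstat(y, p):
    """t-ratio on psi in: dyt = mu + psi*yt-1 + a1*dyt-1 + ... + ap*dyt-p + ut."""
    y = np.asarray(y, float)
    dy = np.diff(y)
    T = len(dy)
    rows = [[1.0, y[t]] + [dy[t - i] for i in range(1, p + 1)] for t in range(p, T)]
    X = np.array(rows)
    z = dy[p:]
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ beta
    n, k = X.shape
    s2 = resid @ resid / (n - k)
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se

rng = np.random.default_rng(12)
e1, e2 = rng.standard_normal(1000), rng.standard_normal(1000)
random_walk = np.cumsum(e1)
stationary = np.empty(1000)
stationary[0] = e2[0]
for t in range(1, 1000):
    stationary[t] = 0.5 * stationary[t - 1] + e2[t]

t_rw = adf_tstat(random_walk, 2)   # near zero: cannot reject the unit root null
t_st = adf_tstat(stationary, 2)    # far below -2.86: reject, series is stationary
```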
5. Perform unit root tests using the tables below, clearly outlining the null and alternative hypotheses.
Outline the relevance of lag length in unit root testing and what criteria are used to decide the lag length.

One can use either the frequency of data or an information criterion to choose the number of lags.
Including too few lags will not remove all of the autocorrelation, while including too many will
inflate the coefficient standard errors.
Question 2

1. Outline the Vector Autoregressive (VAR) modelling procedure. What are the advantages and
disadvantages of VAR modelling over univariate models (i.e. AR, MA, ARMA etc.)?

A VAR is a systems regression model that can be considered a kind of hybrid between the univariate
time series model and the simultaneous equations model. VAR models treat every endogenous
variable as a function of the lagged values of all the endogenous variables in the system.

As an example, suppose that industrial production (IP) and money supply (M1) are jointly
determined by a VAR and let a constant be the only exogenous variable. Assuming that the VAR
contains two lagged values of the endogenous variables, it may be written as:

IPt = α10 + β11IPt-1 + β12IPt-2 + γ11M1t-1 + γ12M1t-2 + u1t
M1t = α20 + β21IPt-1 + β22IPt-2 + γ21M1t-1 + γ22M1t-2 + u2t

ut = (u1t, u2t)' is a vector of innovations that may be contemporaneously correlated but are
uncorrelated with their own lagged values and uncorrelated with all of the right-hand-side variables.
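The point that OLS can be run equation by equation is easy to demonstrate. A NumPy sketch (my own illustration; a bivariate VAR(1) with a constant is used instead of the VAR(2) above purely for brevity, and the coefficient values are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 5000
A = np.array([[0.5, 0.1],          # stable VAR(1) coefficient matrix (assumption)
              [0.2, 0.3]])
c = np.array([1.0, 0.5])           # constant term
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = c + A @ y[t - 1] + rng.standard_normal(2) * 0.5

# OLS equation by equation: regress each variable on a constant and lags of both.
X = np.hstack([np.ones((T - 1, 1)), y[:-1]])
coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)   # one column per equation
A_hat = coef[1:].T                                  # recovered coefficient matrix
```

With a reasonably long sample the OLS estimates recover the true coefficient matrix closely, which is exactly the "OLS can be used on each equation independently" advantage listed below.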

Advantages of VAR
1. The researcher does not need to specify which variables are endogenous or exogenous,
because they are all endogenous. This helps when the exogenous variables are not easily
identifiable through economic theory.
2. VARs allow the value of a variable to depend on more than just its own lags or combinations
of white noise terms. Thus, they may be able to capture more features of the data.
3. OLS can be used on each equation independently
4. Forecasts generated by VARs are often better than traditional structural models.

Disadvantages of VAR
1. VARs are a-theoretical. They have no economic theory to guide them. This means it is easier
to mine the data to achieve a spurious relationship between the variables.
2. There is no precise rule for determining lag length
3. There are lots of parameters. This can lead to much larger standard errors and wide
confidence intervals for model coefficients.

2. Explain the procedure for selecting the order of a VAR model. Interpret the following estimated
VAR model.

There are two methods one can use to select the lag length: log likelihood and information criteria.
The log likelihood ratio test will only be valid for models whose errors are normally distributed, which is unlikely in
the case of financial data. So, multivariate versions of the information criteria are used. It is deemed
preferable to look at the multivariate IC rather than the IC of the individual equations, so that the
same lags are used for each equation. The chosen lags are the ones minimising the IC.

Each column in the table corresponds to an equation in the VAR. For each right-hand side variable,
EViews reports the estimated coefficient, its standard error, and the t-statistic.

R squared:
3. Outline briefly Johansen's cointegration methodology and testing procedure for cointegrating
vectors and give at least one example.

A linear combination of I(1) variables will be I(0), in other words stationary, if the variables are
cointegrated. A set of variables is defined as cointegrated if a linear combination of them is stationary.
If there were no cointegration, there would be no long-run relationship binding the series and the
series could wander apart without bound.
Formally, if (X,Y,Z) are each integrated of order 1, and there exist coefficients a,b,c such
that aX + bY + cZ is integrated of order 0, then X, Y, and Z are cointegrated.
In order to use the Johansen test, the VAR equation must be transformed into a vector error
correction model. The Johansen test centres on an examination of a long-run coefficient matrix,
denoted Π. The test for cointegration between the variables is calculated by looking at the rank of
the Π matrix via its eigenvalues. The rank of a matrix is equal to the number of its eigenvalues that
are different from zero. If the variables are not cointegrated, the rank of Π will not be significantly
different from zero.
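The rank-via-eigenvalues idea can be illustrated directly (my own NumPy sketch, not part of the Johansen procedure itself; the example matrix is an arbitrary assumption):

```python
import numpy as np

# The rank of a (diagonalisable) matrix equals the number of its non-zero
# eigenvalues; for a symmetric matrix this always holds.
Pi = np.array([[1.0, 2.0],
               [2.0, 4.0]])        # second row is twice the first -> rank 1
eigvals = np.linalg.eigvalsh(Pi)   # eigenvalues of a symmetric matrix: 0 and 5
rank = int(np.sum(np.abs(eigvals) > 1e-10))
```

Here one eigenvalue is zero and one is not, so the rank is 1: in Johansen terms, one cointegrating vector.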

For both test statistics, the initial Johansen test is a test of the null hypothesis of no cointegration
against the alternative of cointegration. The tests differ in terms of the alternative hypothesis

The maximum eigenvalue test examines whether the (r+1)-th largest eigenvalue is zero, i.e. it tests
the null hypothesis rank(Π) = r against the alternative rank(Π) = r + 1. The first test in the sequence
is therefore a test of rank(Π) = 0 against the alternative rank(Π) = 1.

The trace test is a test of whether the rank of the Π matrix is at most r0. The null hypothesis is that
rank(Π) = r0. The alternative hypothesis is that r0 < rank(Π) ≤ n, where n is the maximum number
of possible cointegrating vectors.

Spot and futures prices may be expected to be cointegrated since they are prices for the same
asset at different points in time, and will hence be affected in very similar ways by a given piece of
news.

The discounted dividend model assumes that the value of a stock held in perpetuity is the
present value of all its future dividend payments. Hence, it may be argued that one would not
expect current prices to move out of line with expected dividends in the long run, thus implying that
share prices and their dividends should be cointegrated.

4. Explain why is the Vector Error Correction Mechanism (VECM) modelling framework better than
the single equation ECM methodology?

The Engle-Granger approach described above suffers from a number of weaknesses. Namely, it is
restricted to a single equation, with one variable designated as the dependent variable,
explained by another variable that is assumed to be weakly exogenous for the parameters of
interest. It also relies on pretesting the time series to find out whether variables are I(0) or I(1).
These weaknesses can be addressed through the use of Johansen's procedure. Its advantages
include that pretesting is not necessary, there can be numerous cointegrating relationships, all
variables are treated as endogenous and tests relating to the long-run parameters are possible. The
resulting model is known as a vector error correction model (VECM), as it adds error correction
features to a multi-factor model known as vector autoregression (VAR).

5. Test for cointegrated vectors in the following tables and comment on your results.

6. Test for Granger causality using the following results.

The Granger (1969) approach to the question of whether x causes y is to see how much of
the current y can be explained by past values of y and then to see whether adding lagged values
of x can improve the explanation. y is said to be Granger-caused by x if x helps in the prediction of y,
or equivalently if the coefficients on the lagged xs are statistically significant. Note that two-way
causation is frequently the case; x Granger causes y and y Granger causes x.
It is important to note that the statement x Granger causes y does not imply that y is the effect or
the result of x. Granger causality measures precedence and information content but does not by
itself indicate causality in the more common use of the term.

In general, it is better to use more rather than fewer lags, since the theory is couched in terms of the
relevance of all past information. You should pick a lag length, l, that corresponds to reasonable
beliefs about the longest time over which one of the variables could help predict the other.
EViews runs bivariate regressions of the form:

for all possible pairs of (x, y) series in the group. The reported F-statistics are the Wald statistics for
the joint hypothesis:

The null hypothesis is that x does not Granger-cause y in the first regression and
that y does not Granger-cause x in the second regression.
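The restricted/unrestricted F-test described above can be replicated by hand (my own NumPy sketch; the data-generating process, in which x drives y but not vice versa, is an arbitrary assumption):

```python
import numpy as np

def rss(X, z):
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    r = z - X @ beta
    return r @ r

def granger_f(cause, effect, l):
    """F-stat for H0: lags of `cause` add nothing to an AR(l) model of `effect`."""
    T = len(effect)
    Y = effect[l:]
    ones = np.ones(T - l)
    elags = np.column_stack([effect[l - i:T - i] for i in range(1, l + 1)])
    clags = np.column_stack([cause[l - i:T - i] for i in range(1, l + 1)])
    X_r = np.column_stack([ones, elags])          # restricted: own lags only
    X_u = np.column_stack([ones, elags, clags])   # unrestricted: add lags of cause
    n, k = X_u.shape
    return ((rss(X_r, Y) - rss(X_u, Y)) / l) / (rss(X_u, Y) / (n - k))

rng = np.random.default_rng(4)
T = 1000
x = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):              # x Granger-causes y through its first lag
    y[t] = 0.3 * y[t - 1] + 0.5 * x[t - 1] + rng.standard_normal()

F_xy = granger_f(x, y, 2)   # large: reject "x does not Granger-cause y"
F_yx = granger_f(y, x, 2)   # small: cannot reject "y does not Granger-cause x"
```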

1. Explain reasons for using ARCH family models compared to other univariate time series models
(i.e. AR, ARMA) and give an example to explain your answer.

All AR, MA and ARMA models are linear: the model is linear in the parameters, so that there is
one parameter multiplied by each variable in the model.
Linear structural models are unable to explain some common features of financial data:
Leptokurtosis: fat tails and excess peakedness of the distribution of asset returns.
Volatility clustering: the tendency for volatility in financial markets to appear in bunches.
Leverage effects: the tendency for volatility to rise more following a large price fall than following a
price rise of the same magnitude.

2. Outline the GARCH methodology and explain the reasons for using GARCH method instead of
ARCH. Explain briefly the main underlying assumptions of the ARCH and GARCH models.

Under ARCH the expected value of the error term, ut, is zero, which allows us to express the
variance of the error term as the expected value of the error term squared at time t given the
previous values of the error term.

The GARCH model allows the conditional variance to be dependent upon previous values of the
conditional variance as well as previous values of the squared error term.

GARCH is more parsimonious than ARCH, and it avoids overfitting. Consequently, the model is less
likely to breach non-negativity constraints (i.e. that the conditional variance on the LHS must be
positive or it would have no meaning.) For example, the GARCH(1,1) model can be expanded out to
a restricted infinite order ARCH model. Thus, the GARCH(1, 1) model containing only three
parameters in the conditional variance equation is a very parsimonious model, that allows an infinite
number of past squared errors to influence the current conditional variance.
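A quick simulation shows why GARCH(1,1) reproduces the stylised facts above (my own NumPy sketch; the parameter values are arbitrary assumptions satisfying α + β < 1):

```python
import numpy as np

rng = np.random.default_rng(5)
T = 20_000
omega, alpha, beta = 0.1, 0.1, 0.8     # alpha + beta < 1 -> covariance stationary
h = np.empty(T)                        # conditional variance
u = np.empty(T)                        # simulated returns / errors
h[0] = omega / (1 - alpha - beta)      # unconditional variance
u[0] = np.sqrt(h[0]) * rng.standard_normal()
for t in range(1, T):
    h[t] = omega + alpha * u[t - 1] ** 2 + beta * h[t - 1]
    u[t] = np.sqrt(h[t]) * rng.standard_normal()

def acf1(x):
    x = x - x.mean()
    return np.sum(x[1:] * x[:-1]) / np.sum(x ** 2)

# Volatility clustering: squared returns are autocorrelated although returns are not,
# and the returns are leptokurtic even with normal innovations.
```

The conditional variance stays strictly positive, the returns themselves are (close to) serially uncorrelated, but the squared returns are clearly autocorrelated: volatility clustering.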

3. Test the following estimated GARCH models and carefully interpret your results. Which model is
your preferred model and why?

RESID(-1)^2 = lagged squared residuals i.e. the ARCH parameter

GARCH(-1) = lagged conditional variance

If the sum of these two coefficients is close to unity, then shocks to the conditional variance will be
highly persistent. The coefficient on RESID(-1)^2 measures the extent to which a volatility shock
today feeds through into next period's volatility, while the sum of the RESID(-1)^2 and GARCH(-1)
coefficients measures the rate at which this effect dies away over time.
4. Outline reasons for using TGARCH and EGARCH models compared to other simple GARCH models.

GARCH may violate non-negativity conditions. GARCH models cannot account for leverage effects,
because the conditional variance is a function of the magnitude of the lagged residuals and not their
signs (in other words, by squaring the lagged residuals the sign is lost). TGARCH accounts for this by
the addition of a term that accounts for asymmetries.

The EGARCH model has several advantages over the pure GARCH model. Because the model is
specified in terms of the log of the conditional variance, sigma squared will always be positive even
if the RHS is negative, since e to the power of any number is positive. So, there is no need to impose
artificial non-negativity constraints on the model parameters. Secondly, asymmetries are allowed
for under EGARCH, since if the relationship between volatility and returns is negative, the
asymmetry parameter γ will be negative.

5. Test the following estimated TGARCH and EGARCH models and carefully interpret your results.

The TGARCH model is a simple extension of GARCH with an additional term added to account for
possible asymmetries.

For a leverage effect, we would expect to see γ > 0.

RESID(-1)^2*(RESID(-1)<0) = γ, the asymmetry term.

The EGARCH model

C(4) = RESID(-1)/SQRT(GARCH(-1)) = γ, the asymmetry term.

When γ is significantly negative, negative shocks imply a higher next-period conditional
variance than positive shocks of the same magnitude.
Explain briefly the concept of non-stationarity in regression and the consequences of ignoring it
when using time series data.

For the purpose of the analysis in this chapter, a stationary series can be defined as one with a constant
mean, constant variance and constant autocovariances for each given lag. Therefore, the discussion
in this chapter relates to the concept of weak stationarity.

Two problems:

The stationarity or otherwise of a series can strongly influence its behaviour and properties. To offer
one illustration, the word shock is usually used to denote a change or an unexpected change in a
variable or perhaps simply the value of the error term during a particular time period. For a stationary
series, shocks to the system will gradually die away. That is, a shock during time t will have a smaller
effect in time t + 1, a smaller effect still in time t + 2, and so on. This can be contrasted with the case
of non-stationary data, where the persistence of shocks will always be infinite, so that for a
non-stationary series, the effect of a shock during time t will not have a smaller effect in time t + 1, and in
time t + 2, etc.

The use of non-stationary data can lead to spurious regressions. If two stationary variables are
generated as independent random series, when one of those variables is regressed on the other, the
t-ratio on the slope coefficient would be expected not to be significantly different from zero, and the
value of R2 would be expected to be very low. This seems obvious, for the variables are not related to
one another. However, if two variables are trending over time, a regression of one on the other could
have a high R2 even if the two are totally unrelated. So, if standard regression techniques are applied
to non-stationary data, the end result could be a regression that looks good under standard measures
(significant coefficient estimates and a high R2), but which is really valueless. Such a model would be
termed a spurious regression.
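This can be verified by simulation (my own NumPy sketch, not from the notes): regressing independent white noises on each other gives an R² near zero, while regressing independent random walks on each other frequently gives a high R².

```python
import numpy as np

def r_squared(x, y):
    """R^2 from an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(6)
T, reps = 2000, 50
r2_wn, r2_rw = [], []
for _ in range(reps):
    e1, e2 = rng.standard_normal(T), rng.standard_normal(T)
    r2_wn.append(r_squared(e1, e2))                      # independent white noises
    r2_rw.append(r_squared(np.cumsum(e1), np.cumsum(e2)))  # independent random walks
# Average R^2 is near zero for the stationary pairs but substantial for the
# trending (random walk) pairs, even though all pairs are unrelated by construction.
```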

Describe the Error Correction Mechanism (ECM) and why this approach may be better than other
single equation models? Give examples.

Pure first-difference models have no long-run solution. A model that combines first differences with
the lagged levels,

Δyt = β1Δxt + β2(yt-1 − γxt-1) + ut

is known as an error correction model or an equilibrium correction model, and yt-1 − γxt-1 is known
as the error correction term. Provided that yt and xt are cointegrated with cointegrating coefficient
γ, then (yt-1 − γxt-1) will be I(0) even though the constituents are I(1). It is thus valid to use OLS and
standard procedures for statistical inference on this equation.

Outline the Engle and Granger two-step estimation method.

Step 1
Make sure that all the individual variables are I(1). Then estimate the cointegrating regression using
OLS. Test these residuals to ensure that they are I(0). If they are I(0), proceed to Step 2; if they are I(1),
estimate a model containing only first differences.
Step 2

Use the step 1 residuals as one variable in the error correction model, e.g. Δyt = β1Δxt + β2(ût-1) + vt
(7.51), where ût-1 = yt-1 − τ̂xt-1. The stationary, linear combination of non-stationary variables is
also known as the cointegrating vector. In this case, the cointegrating vector would be [1 −τ̂].
Additionally, any scalar multiple of the cointegrating vector will also be a cointegrating vector.
So, for example, −10yt-1 + 10τ̂xt-1 will also be stationary. In (7.45) above, the cointegrating vector
would be [1 −τ̂1 −τ̂2 −τ̂3]. It is now valid to perform inferences in the second-stage regression,
i.e. concerning the parameters β1 and β2 (provided that there are no other forms of misspecification,
of course), since all variables in this regression are stationary.
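The two steps can be sketched on simulated cointegrated data (my own NumPy illustration; the true cointegrating coefficient of 2 is an arbitrary assumption):

```python
import numpy as np

rng = np.random.default_rng(7)
T = 2000
x = np.cumsum(rng.standard_normal(T))    # x is I(1): a random walk
y = 2.0 * x + rng.standard_normal(T)     # cointegrated: y - 2x is I(0)

# Step 1: cointegrating regression of y on a constant and x; keep the residuals.
X1 = np.column_stack([np.ones(T), x])
b1, *_ = np.linalg.lstsq(X1, y, rcond=None)
u_hat = y - X1 @ b1                      # should be stationary if cointegrated

# Step 2: ECM in first differences using the lagged step-1 residual.
dy, dx = np.diff(y), np.diff(x)
X2 = np.column_stack([dx, u_hat[:-1]])
b2, *_ = np.linalg.lstsq(X2, dy, rcond=None)
# b2[1] is the error-correction coefficient: negative, pulling y back to equilibrium.
```

The step-1 slope estimate is superconsistent (very close to the true value of 2), and the error-correction coefficient is clearly negative, as the theory requires.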

What are strengths and weaknesses of a single equation ECM methodology?

At most one cointegrating relationship can be examined.

The univariate unit root tests used in the first stage have low statistical power.
The choice of dependent variable in the first stage influences the test results, i.e. we need weak
exogeneity of the right-hand-side variables, as determined by Granger causality.
There is potentially a small-sample bias.
The cointegration test on the residuals does not follow a standard distribution.
The validity of the long-run parameters in the first regression stage, where one obtains the residuals,
cannot be verified, because the distribution of the OLS estimator of the cointegrating vector is highly
complicated and non-normal.
It is not possible to perform any hypothesis tests about the actual cointegrating relationship
estimated at stage 1.

Test for cointegration using the two-step Engle and Granger method and interpret ECM model results.

Outline briefly AR, MA and ARMA models. Highlight their potential use in modelling economic and
finance data and give examples.

An autoregressive model is one where the current value of a variable, y, depends upon only the
values that the variable took in previous periods, plus an error term.
AR(2): yt = μ + φ1yt-1 + φ2yt-2 + ut

A moving average model is simply a linear combination of white noise processes

MA(2): yt = μ + θ1ut-1 + θ2ut-2 + ut
What criteria should one use to select the best model? Outline briefly the relative strengths and
weaknesses of different selection criteria?

Another technique, which removes some of the subjectivity involved in interpreting the acf and pacf,
is to use what are known as information criteria. Information criteria embody two factors: a term
which is a function of the residual sum of squares (RSS), and some penalty for the loss of degrees of
freedom from adding extra parameters. So, adding a new variable or an additional lag to a model will
have two competing effects on the information criteria: the residual sum of squares will fall but the
value of the penalty term will increase. The object is to choose the number of parameters which
minimises the value of the information criteria. So, adding an extra term will reduce the value of the
criteria only if the fall in the residual sum of squares is sufficient to more than outweigh the increased
value of the penalty term.
SBIC is strongly consistent (but inefficient) and AIC is not consistent, but is generally more efficient. In
other words, SBIC will asymptotically deliver the correct model order, while AIC will deliver on average
too large a model, even with an infinite amount of data. On the other hand, the average variation in
selected model orders from different samples within a given population will be greater in the context
of SBIC than AIC. Overall, then, no criterion is definitely superior to others.
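The trade-off can be seen by fitting AR(p) models of increasing order to a simulated AR(2) and computing both criteria (my own NumPy sketch; the criteria are computed as the log residual variance plus the penalty term, and the AR(2) coefficients are arbitrary assumptions):

```python
import numpy as np

def fit_ar(y, p):
    """OLS AR(p) with intercept; returns residual variance and parameter count."""
    T = len(y)
    X = np.column_stack([np.ones(T - p)] + [y[p - i:T - i] for i in range(1, p + 1)])
    z = y[p:]
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ beta
    return resid @ resid / len(z), p + 1

rng = np.random.default_rng(8)
T = 3000
y = np.zeros(T)
e = rng.standard_normal(T)
for t in range(2, T):                       # true process: AR(2)
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + e[t]

aic, sbic = {}, {}
for p in range(0, 6):
    s2, k = fit_ar(y, p)
    n = T - p
    aic[p] = np.log(s2) + 2 * k / n          # lighter penalty
    sbic[p] = np.log(s2) + k * np.log(n) / n  # heavier penalty

best_aic = min(aic, key=aic.get)
best_sbic = min(sbic, key=sbic.get)
```

With a large sample, SBIC's heavier penalty picks the true order, while AIC never picks too small a model but may pick one that is too large.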

Outline the main steps for forecasting a series using these models.

Following tables show results from various univariate models, which model is the best model
capturing the underlying data generating in the data and why?

Explain the estimated results of your selected (best) model using the appropriate statistical tests.

Section A
1. Interpret the following statistics: R-squared, Adjusted R-squared, S.E. of regression, Sum squared
resid, F-statistic, Prob(F-statistic), Mean dependent var and S.D. dependent var.

The R-squared is a measure of the goodness of fit of the model: the closer its value is to unity, the
better the fit. The Adjusted R-squared takes into account the number of degrees of freedom in the
model and penalises the introduction of additional explanatory variables. Here, the latter indicates
that the model explains 87.56% of the variation in the dependent variable.

Sum squared resid simply calculates the residual sum of squares. It is an absolute measure of the
importance of the residual part of the model. The standard error of the regression measures the
average size of the residuals; it is obtained by dividing the residual sum of squares by the number of
degrees of freedom and taking the square root.

The F-statistic is used to test for joint significance of the model. The null hypothesis is that all the
coefficients of the model are jointly zero. The alternative hypothesis is that at least one
coefficient takes a value different from zero. Subsequently, the constrained and unconstrained
models are estimated, as implied by the null and alternative hypotheses. The F statistic is calculated
as a relative difference between the residual sum of squares of these models and it is compared with
the corresponding critical value from an F distribution. Prob(F-statistic) measures marginal
significance of the F statistic.

Mean dependent var calculates the average of the dependent variable, whereas S.D. dependent var
calculates the standard deviation of the dependent variable.
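All of these quantities can be reproduced from a single OLS fit (my own NumPy sketch on simulated data; the coefficients are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(9)
T, k = 200, 3                                 # k regressors including the constant
x1, x2 = rng.standard_normal(T), rng.standard_normal(T)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.standard_normal(T)

X = np.column_stack([np.ones(T), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

rss = resid @ resid                           # Sum squared resid
tss = np.sum((y - y.mean()) ** 2)
r2 = 1 - rss / tss                            # R-squared
r2_adj = 1 - (1 - r2) * (T - 1) / (T - k)     # Adjusted R-squared
se_reg = np.sqrt(rss / (T - k))               # S.E. of regression
F = (r2 / (k - 1)) / ((1 - r2) / (T - k))     # F-statistic for joint significance
mean_dep, sd_dep = y.mean(), y.std(ddof=1)    # Mean / S.D. dependent var
```

Since the true error standard deviation here is 1, the S.E. of regression lands close to 1, and the F-statistic is huge because the regressors genuinely explain y.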

2. t-test

Five per cent critical values = -1.97 and 1.97

3. F-test

4. VIFs and Multicollinearity

Multicollinearity exists when two or more of the predictors in a regression model are moderately or
highly correlated. Unfortunately, when it exists, it can wreak havoc on our analysis and thereby limit
the research conclusions we can draw. As we will soon learn, when multicollinearity exists, any of
the following pitfalls can be exacerbated:
the estimated regression coefficient of any one variable depends on which other predictors
are included in the model
the precision of the estimated regression coefficients decreases as more predictors are
added to the model
the marginal contribution of any one predictor variable in reducing the error sum of squares
depends on which other predictors are already in the model
hypothesis tests for βk = 0 may yield different conclusions depending on which predictors are
in the model

VIF = 1 / (1- R^2)

where R2 is the value of the coefficient of determination resulting from an auxiliary regression that
has been performed of one of the explanatory variables on a constant and the other explanatory
variables.
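The auxiliary-regression definition of the VIF translates directly into code (my own NumPy sketch; x2 is deliberately constructed to be nearly collinear with x1):

```python
import numpy as np

def vif(X, j):
    """VIF for column j of the regressor matrix X (constant not included in X)."""
    T = X.shape[0]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(T), others])        # auxiliary regression design
    beta, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ beta
    r2 = 1 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
    return 1.0 / (1.0 - r2)                          # VIF = 1 / (1 - R^2)

rng = np.random.default_rng(10)
T = 1000
x1 = rng.standard_normal(T)
x2 = 0.95 * x1 + 0.3 * rng.standard_normal(T)   # nearly collinear with x1
x3 = rng.standard_normal(T)                      # unrelated to the others
X = np.column_stack([x1, x2, x3])
```

The VIFs for x1 and x2 come out large (a common rule of thumb flags values above 5 or 10), while the VIF for the unrelated x3 is close to 1.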

Another indication of the presence of multicollinearity is that individually the explanatory variables
are associated with a lack of significance, while the regression model as a whole is seen to be
statistically significant. In answering Question 4, in all cases, the null hypothesis was rejected at the
five per cent level of significance. Hence, there was no sign of multicollinearity.

Yet another indication of multicollinearity is that, upon adding an explanatory variable to a
regression model, the size of the standard error of the estimator of the slope parameter markedly
increases. In this exercise, in the context of the two-variable model, the standard error of the
Ordinary Least Squares estimator of B1 was equal to 0.1976. In contrast, in the context of the
three-variable model, the corresponding standard error was equal to 0.2439. Hence, upon expanding
the model, the value of the standard error has become larger, but not tremendously so.

Finally, the strength of correlation between the explanatory variables may be interpreted as a sign
of multicollinearity. A correlation coefficient takes values between -1 and +1: the closer the value is
to either extreme, the stronger the linear relationship between the respective variables, and the
nearer the value is to zero, the weaker that relationship.

Available remedies: i) removing explanatory variables that are strongly correlated with each other; ii)
acquiring more data; and iii) reconsidering the functional form of the model.

5. Breusch-Pagan-Godfrey

Formulate the variance function that is required for the Breusch-Pagan-Godfrey test for
heteroscedasticity. What are the null and alternative hypotheses for this test? Determine the value
for the test statistic and the critical value. Is there evidence of heteroscedasticity?

The test equation can be formulated as ût^2 = α0 + α1x1t + α2x2t + vt.

If the random disturbance term is homoscedastic, the variance (the expected value of ut^2) will be
constant. If any of the explanatory variables is significant in this test equation, then the random
disturbance term is heteroscedastic. More formally, H0: α1 = α2 = 0 is tested against H1: α1 ≠ 0 or
α2 ≠ 0. We shall assume asymptotic validity of the test statistic, which follows a Chi-square
distribution with 2 degrees of freedom. It is calculated as χ² = T·R² = 216 × 0.135844 = 29.34231.
The critical value is 5.9915. Since the test statistic is greater than the critical value, the null
hypothesis is rejected. Therefore, there is evidence of heteroscedasticity in the model.
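The T·R² form of the LM statistic can be reproduced on simulated heteroscedastic data (my own NumPy sketch; the variance function and the regressors used in the test equation are assumptions):

```python
import numpy as np

rng = np.random.default_rng(11)
T = 1000
x = rng.standard_normal(T)
u = rng.standard_normal(T) * np.sqrt(0.5 + 2.0 * x ** 2)   # variance depends on x
y = 1.0 + 0.5 * x + u

# Original regression of y on a constant and x; keep the residuals.
X = np.column_stack([np.ones(T), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b

# Auxiliary regression of the squared residuals on two variance regressors (x, x^2).
Z = np.column_stack([np.ones(T), x, x ** 2])
g, *_ = np.linalg.lstsq(Z, resid ** 2, rcond=None)
aux_resid = resid ** 2 - Z @ g
r2_aux = 1 - aux_resid @ aux_resid / np.sum((resid ** 2 - np.mean(resid ** 2)) ** 2)

lm_stat = T * r2_aux   # compare with the chi2(2) 5% critical value 5.99
```

Because the error variance genuinely depends on x here, the LM statistic far exceeds 5.99 and the null of homoscedasticity is rejected.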

To remedy the problem of heteroscedasticity, the White or Newey-West estimates of the coefficient
standard errors can be used, since they give rise to consistent estimates. Alternatively, generalised
least squares could be used to obtain efficient estimates.

Ramsey RESET

The RESET test is a general test for misspecification of functional form. In this test, the dependent
variable from the original regression is regressed on powers of the fitted values together with the
original explanatory variables. The fitted values of y are those obtained from the original
regression, and their powers act as proxies for the higher-order and cross-product terms of the x
variables.
The null hypothesis, under an F-test, is that all the coefficients of the fitted values of y are jointly
equal to zero.

What are the three tests for autocorrelation?

The Durbin-Watson test, the Breusch-Godfrey (LM) test, and the Ljung-Box Q test. (The White test,
by contrast, is a test for heteroskedasticity, not autocorrelation.)

What are the effects of autocorrelation?

The coefficient estimates derived using OLS are still unbiased, but they are inefficient, i.e. they are
not BLUE, even at large sample sizes, so that the standard error estimates could be wrong. There
thus exists the possibility that the wrong inferences could be made about whether a variable is or is
not an important determinant of variations in y. In the case of positive serial correlation in the
residuals, the OLS standard error estimates will be biased downwards relative to the true standard
errors. That is, OLS will understate their true variability. This would lead to an increase in the
probability of type I error -- that is, a tendency to reject the null hypothesis sometimes when it is
correct. Furthermore, R2 is likely to be inflated relative to its correct value if autocorrelation is
present but ignored, since residual autocorrelation will lead to an underestimate of the true error
variance (for positive autocorrelation).

White test for heteroskedasticity.

H0: there is homoskedasticity (the disturbances have constant variance).

Consequences of using OLS in the presence of heteroskedasticity.

In this case, OLS estimators will still give unbiased (and also consistent) coefficient estimates, but
they are no longer BLUE -- that is, they no longer have the minimum variance among the class of
unbiased estimators. The reason is that the error variance, σ², plays no part in the proof that the
OLS estimator is consistent and unbiased, but σ² does appear in the formulae for the coefficient
variances. If the errors are heteroscedastic, the formulae presented for the coefficient standard
errors no longer hold.
So, the upshot is that if OLS is still used in the presence of heteroscedasticity, the standard errors
could be wrong and hence any inferences made could be misleading. In general, the OLS standard
errors will be too large for the intercept when the errors are heteroscedastic. The effect of
heteroscedasticity on the slope standard errors will depend on its form.