Learn econometrics for university

Jun 20, 2017

1. Outline the concepts of skewness and kurtosis in terms of salient features of financial data, and

explain their meaning and the use of the Jarque-Bera test statistic.

Skewness measures the extent to which a distribution is not symmetric around its mean. When the

distribution of data is symmetric and unimodal the mean, median and mode will be the same. If the

distribution is positively skewed then there will be a long right tail and the data will be bunched to

the left. A normally distributed series has zero skewness.

Kurtosis measures the fatness of the tails of the distribution and how peaked at the mean it is. A

normal distribution is defined to have a kurtosis of 3. A distribution with a kurtosis greater than 3

will have fatter tails and a higher peak (leptokurtic), and one with a kurtosis of less than 3 will have

thinner tails and a lower peak (platykurtic).

The Jarque-Bera test can be used to test a distribution for normality. The null hypothesis is that the

coefficient of skewness and the coefficient of kurtosis minus three jointly equal zero, that is, that the

distribution is normal. The test statistic has a chi squared distribution with two degrees of freedom.
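The Jarque-Bera statistic can be computed directly from the sample moments. A minimal numpy sketch on simulated data (the seed and sample sizes are illustrative, not from the document):

```python
import numpy as np

def jarque_bera(x):
    """Jarque-Bera normality test statistic.

    JB = T/6 * (S^2 + (K - 3)^2 / 4), where S is the sample skewness and
    K the sample kurtosis; under normality JB ~ chi-squared(2)."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    d = x - x.mean()
    s2 = (d ** 2).mean()                 # biased sample variance
    S = (d ** 3).mean() / s2 ** 1.5      # skewness (normal = 0)
    K = (d ** 4).mean() / s2 ** 2        # kurtosis (normal = 3)
    return T / 6.0 * (S ** 2 + (K - 3.0) ** 2 / 4.0), S, K

rng = np.random.default_rng(0)
jb_norm, S, K = jarque_bera(rng.normal(size=5000))          # near-normal sample
jb_fat, _, _ = jarque_bera(rng.standard_t(df=3, size=5000))  # fat-tailed sample
```

The fat-tailed t(3) sample produces a far larger JB statistic than the Gaussian sample, which is exactly the pattern typically seen with asset returns.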

2. Using the descriptive statistics of the above two series given below, explain the characteristics of

these series.

3. Using the correlogram given below, assess the underlying data generating process (e.g. AR, MA) of

these series and explain whether these series (LCHINA and DCHINA) are stationary.

An autoregressive model is one where the current value of a variable, y, depends upon only the

values that the variable took in previous periods, plus an error term.

AR(2): yt = μ + φ1 yt-1 + φ2 yt-2 + ut

MA(2): yt = μ + θ1 ut-1 + θ2 ut-2 + ut

The partial autocorrelation function measures the correlation between an observation k periods ago

and the current observation, after controlling for all intermediate lags. At lag 1, the autocorrelation

and partial autocorrelation coefficients are the same as there are no lags to eliminate.

For AR(2), there is partial correlation between yt and yt-1 and yt and yt-2, but no partial correlation

between yt and yt-3 and beyond.

An MA(2) can be expressed as an AR(∞). Thus there is a direct connection between yt and all of its

previous values, and so the partial autocorrelation function for an MA(q) model declines geometrically.

An autoregressive process has:

1. a geometrically decaying acf

2. p non-zero points of pacf

A moving-average process has:

1. q non-zero points of acf

2. a geometrically decaying pacf

An ARMA process has:

1. a geometrically decaying acf

2. a geometrically decaying pacf

So if the ACF is geometrically declining and there are p spikes on the PACF, the series follows an AR(p).

The 95% non-rejection region for an autocorrelation coefficient being different from zero is

±1.96 × 1/√T, where T is the sample size.

The Ljung-Box Q-statistic tests the joint hypothesis that all of the autocorrelation coefficients up to a

certain point are jointly equal to zero.
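The Ljung-Box statistic is a simple function of the sample autocorrelations; a numpy sketch with simulated series (white noise versus an AR(1); seed and parameters are illustrative assumptions):

```python
import numpy as np

def ljung_box_q(x, m):
    """Ljung-Box Q-statistic for the first m autocorrelations.

    Q = T(T+2) * sum_{k=1}^{m} rho_k^2 / (T - k); under the null that the
    first m autocorrelations are jointly zero, Q ~ chi-squared(m)."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    d = x - x.mean()
    denom = (d ** 2).sum()
    rho = np.array([(d[k:] * d[:-k]).sum() / denom for k in range(1, m + 1)])
    return T * (T + 2) * np.sum(rho ** 2 / (T - np.arange(1, m + 1)))

rng = np.random.default_rng(1)
q_white = ljung_box_q(rng.normal(size=500), m=10)  # white noise: small Q

ar = np.zeros(500)                                 # persistent AR(1) series
for t in range(1, 500):
    ar[t] = 0.8 * ar[t - 1] + rng.normal()
q_ar = ljung_box_q(ar, m=10)                       # autocorrelated: large Q
```

The autocorrelated series yields a much larger Q than the white-noise series, so the joint null of zero autocorrelations is rejected for the AR(1) data.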

In an AR model whose coefficients imply non-stationarity, previous values of the error term will have

a non-declining effect on the current value of y as time progresses.

4. Outline the unit root testing procedure using Dickey Fuller (DF) and Augmented Dickey Fuller

Tests.


A stationary series is one with a constant mean, constant variance and constant autocovariances.

DF can be conducted allowing for an intercept, or an intercept and deterministic trend, or neither.

DF with an intercept and deterministic trend:

yt = φ yt-1 + μ + λt + ut

The null hypothesis is that φ = 1 (a unit root), against the alternative that φ < 1.

DF is only valid if ut is white noise. If there is autocorrelation in the dependent variable of the

regression that has not been modelled, there will be autocorrelation in ut. If this is the case the test

would be oversized, that is, the proportion of times that the null hypothesis is incorrectly rejected

would be higher than assumed.

The solution is the ADF test, which adds p lags of the dependent variable to the regression:

Δyt = ψ yt-1 + Σ(i=1..p) αi Δyt-i + ut
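The DF test regression can be sketched from first principles with OLS; a minimal numpy illustration (simulated random-walk and stationary AR(1) series with an arbitrary seed; a full ADF implementation would add lagged differences and proper critical values):

```python
import numpy as np

def df_tau(y):
    """Dickey-Fuller tau statistic from the test regression
    dy_t = mu + psi * y_{t-1} + u_t (intercept, no trend).
    Returns the t-ratio on psi, to be compared with DF critical values
    (not the usual t-tables)."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    u = dy - X @ beta
    s2 = u @ u / (len(dy) - 2)                # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)         # OLS covariance matrix
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(2)
tau_rw = df_tau(np.cumsum(rng.normal(size=500)))  # random walk: unit root

ar1 = np.zeros(500)                               # stationary AR(1), phi = 0.5
for t in range(1, 500):
    ar1[t] = 0.5 * ar1[t - 1] + rng.normal()
tau_st = df_tau(ar1)
```

For the stationary series the tau statistic is strongly negative (well beyond the 5% DF critical value of about -2.86), while for the random walk it is not, so the unit-root null is rejected only for the stationary series.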

5. Perform unit root tests using the tables below, clearly outlining null and alternate hypothesis.

Outline the relevance of lag length in unit root testing and what criteria are used to decide the lag

length.

One can use either the frequency of data or an information criterion to choose the number of lags.

Including too few lags will not remove all of the autocorrelation, while including too many will

increase the coefficient standard errors.

Question 2

1. Outline the Vector Autoregressive (VAR) modelling procedure. What are the advantages and

disadvantages of VAR modelling over univariate models (i.e. AR, MA, ARMA etc.)?

A VAR is a systems regression model that can be considered a kind of hybrid between the univariate

time series model and the simultaneous equations model. A VAR treats every endogenous variable

as a function of the lagged values of all the endogenous variables in the system.

As an example, suppose that industrial production (IP) and money supply (M1) are jointly

determined by a VAR and let a constant be the only exogenous variable. Assuming that the VAR

contains two lagged values of the endogenous variables, it may be written as:

IPt = a1 + b11 IPt-1 + b12 IPt-2 + c11 M1t-1 + c12 M1t-2 + u1t

M1t = a2 + b21 IPt-1 + b22 IPt-2 + c21 M1t-1 + c22 M1t-2 + u2t

Here (u1t, u2t) is a vector of innovations that may be contemporaneously correlated but are

uncorrelated with their own lagged values and uncorrelated with all of the right-hand-side variables.
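Because every equation in the VAR shares the same right-hand-side variables, each can be estimated by OLS equation by equation. A numpy sketch for a bivariate VAR(2) on simulated data (the series and lag-1 coefficient matrix are invented stand-ins for IP and M1, not values from the document):

```python
import numpy as np

rng = np.random.default_rng(3)
T, p = 300, 2                               # observations, lags
A1 = np.array([[0.5, 0.1],                  # assumed (stable) lag-1 dynamics
               [0.2, 0.3]])
e = rng.normal(size=(T, 2))
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = A1 @ y[t - 1] + e[t]

# regressor matrix [const, y_{t-1}, y_{t-2}] shared by both equations
X = np.column_stack([np.ones(T - p)] +
                    [y[p - j:T - j] for j in range(1, p + 1)])
Y = y[p:]
B, *_ = np.linalg.lstsq(X, Y, rcond=None)   # one OLS fit per column/equation
```

B has one column per equation and five rows (constant plus two lags of each of the two variables); its estimate of the own lag-1 coefficient of the first variable should land near the true value 0.5.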

Advantages of VAR

1. The researcher does not need to specify which variables are endogenous or exogenous,

because they are all endogenous. This helps when the exogenous variables are not easily

identifiable through economic theory.

2. VARs allow the value of a variable to depend on more than just its own lags or combinations

of white noise terms. Thus, they may be able to capture more features of the data.

3. OLS can be used on each equation independently

4. Forecasts generated by VARs are often better than traditional structural models.

Disadvantages of VAR

1. VARs are a-theoretical. They have no economic theory to guide them. This means it is easier

to mine the data to achieve a spurious relationship between the variables.

2. There is no precise rule for determining lag length

3. There are lots of parameters. This can lead to much larger standard errors and wide

confidence intervals for model coefficients.

2. Explain the procedure for selecting the order of a VAR model. Interpret the following estimated

VAR model.

There are two methods one can use to select the lag length: the likelihood ratio test and information

criteria. The likelihood ratio test is only valid for models whose errors are normally distributed, which

is unlikely in the case of financial data. So, multivariate versions of the information criteria are used. It is deemed

preferable to look at the multivariate IC rather than the IC of the individual equations, so that the

same lags are used for each equation. The chosen lags are the ones minimising the IC.

Each column in the table corresponds to an equation in the VAR. For each right-hand side variable,

EViews reports the estimated coefficient, its standard error, and the t-statistic.

R squared:

3. Outline briefly Johansen's cointegration methodology and the testing procedure for cointegrating

vectors, and give at least one example.

A linear combination of I(1) variables will be I(0), in other words stationary, if the variables are

cointegrated.

A set of variables is defined as cointegrated if a linear combination of them is stationary.

If there were no cointegration, there would be no long-run relationship binding the series and the

series could wander apart without bound.

Formally, if (X,Y,Z) are each integrated of order 1, and there exist coefficients a,b,c such

that aX + bY + cZ is integrated of order 0, then X, Y, and Z are cointegrated.
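The definition can be illustrated by simulation: two I(1) series driven by a common stochastic trend are individually non-stationary, yet the right linear combination is stationary. A numpy sketch (seed, sample size and the cointegrating coefficient 2 are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 1000
z = np.cumsum(rng.normal(size=T))   # common I(1) stochastic trend
x = z + rng.normal(size=T)          # I(1): trend plus stationary noise
y = 2.0 * z + rng.normal(size=T)    # I(1): same trend, scaled

spread = y - 2.0 * x                # cointegrating combination: the trend
                                    # cancels, leaving a stationary series
```

The variance of the spread stays small and stable, while the variance of the levels grows with the sample, which is the hallmark of cointegration.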

In order to use the Johansen test, the VAR equation must be transformed into a vector error

correction model. The Johansen test centres on an examination of a long-run coefficient matrix,

denoted Π. The test for cointegration between the variables is calculated by looking at the rank of

the Π matrix via its eigenvalues. The rank of a matrix is equal to the number of its eigenvalues that

are different from zero. If the variables are not cointegrated, the rank of Π will not be significantly

different from zero:

For both test statistics, the initial Johansen test is a test of the null hypothesis of no cointegration

against the alternative of cointegration. The tests differ in terms of their alternative hypotheses.

The maximum eigenvalue test examines whether the largest remaining eigenvalue is significantly

different from zero: the null hypothesis is rank(Π) = r against the alternative rank(Π) = r + 1. The

first such test is therefore a test of rank(Π) = 0 against the alternative rank(Π) = 1.

The trace test is a test of whether the rank of the matrix Π is r0. The null hypothesis is that

rank(Π) = r0. The alternative hypothesis is that r0 < rank(Π) ≤ n, where n is the maximum number

of possible cointegrating vectors.

Spot prices and future prices may be expected to be cointegrated since they are prices for the same

asset at different points in time, and will hence be affected in very similar ways by given pieces of

information.

The discounted dividend model assumes that the value of a certain stock held in perpetuity is the

present value of all its future dividend payments. Hence, it may be argued that one would not

expect current prices to move out of line with expected dividends in the long run, thus implying that

shares prices and their dividends should be cointegrated.

4. Explain why is the Vector Error Correction Mechanism (VECM) modelling framework better than

the single equation ECM methodology?

The Engle-Granger approach suffers from a number of weaknesses. Namely, it is

restricted to a single equation, with one variable designated as the dependent variable,

explained by another variable that is assumed to be weakly exogenous for the parameters of

interest. It also relies on pretesting the time series to find out whether the variables are I(0) or I(1).

These weaknesses can be addressed through the use of Johansen's procedure. Its advantages

include that pretesting is not necessary, there can be numerous cointegrating relationships, all

variables are treated as endogenous and tests relating to the long-run parameters are possible. The

resulting model is known as a vector error correction model (VECM), as it adds error correction

features to a multivariate model, the vector autoregression (VAR).

5. Test for cointegrated vectors in the following tables and comment on your results.

The Granger (1969) approach to the question of whether x causes y is to see how much of

the current y can be explained by past values of y and then to see whether adding lagged values

of x can improve the explanation. y is said to be Granger-caused by x if x helps in the prediction of y,

or equivalently if the coefficients on the lagged xs are statistically significant. Note that two-way

causation is frequently the case; x Granger causes y and y Granger causes x.

It is important to note that the statement x Granger causes y does not imply that y is the effect or

the result of x. Granger causality measures precedence and information content but does not by

itself indicate causality in the more common use of the term.

In general, it is better to use more rather than fewer lags, since the theory is couched in terms of the

relevance of all past information. You should pick a lag length, l, that corresponds to reasonable

beliefs about the longest time over which one of the variables could help predict the other.

EViews runs bivariate regressions of the form:

yt = α0 + α1 yt-1 + … + αl yt-l + β1 xt-1 + … + βl xt-l + εt

for all possible pairs of (x, y) series in the group. The reported F-statistics are the Wald statistics for

the joint hypothesis β1 = β2 = … = βl = 0.

The null hypothesis is that x does not Granger-cause y in the first regression and

that y does not Granger-cause x in the second regression.
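The F-statistic behind the Granger test compares the restricted regression (lags of y only) with the unrestricted one (lags of y and x). A numpy sketch on simulated data in which x drives y but not the reverse (the series, seed and lag length are illustrative assumptions):

```python
import numpy as np

def granger_f(y, x, l):
    """F-statistic for H0: the l lags of x add nothing to an AR(l) model of y.

    F = ((RSS_r - RSS_u) / l) / (RSS_u / (T - 2l - 1)), comparing the
    restricted (lags of y only) and unrestricted (lags of y and x) models."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    T = len(y) - l
    lags = lambda s: np.column_stack([s[l - j:len(s) - j] for j in range(1, l + 1)])
    Xr = np.column_stack([np.ones(T), lags(y)])       # restricted regressors
    Xu = np.column_stack([Xr, lags(x)])               # add lags of x
    rss = lambda X: np.sum((y[l:] - X @ np.linalg.lstsq(X, y[l:], rcond=None)[0]) ** 2)
    rss_r, rss_u = rss(Xr), rss(Xu)
    return ((rss_r - rss_u) / l) / (rss_u / (T - 2 * l - 1))

rng = np.random.default_rng(5)
x = rng.normal(size=500)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.3 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()

f_xy = granger_f(y, x, l=2)   # x should Granger-cause y: large F
f_yx = granger_f(x, y, l=2)   # y should not Granger-cause x: small F
```

By construction past x helps predict y, so f_xy is large, while f_yx stays near its null distribution.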

Question 3 ARCH, GARCH, EGARCH, TGARCH

1. Explain reasons for using ARCH family models compared to other univariate time series models

(i.e. AR, ARMA) and give an example to explain your answer.

AR, MA and ARMA models are all linear, that is, the model is linear in the parameters, so that there is

one parameter multiplied by each variable in the model.

Linear structural models are unable to explain some common features of financial data:

1. Leptokurtosis: fat tails and excess peakedness of the distribution of asset returns.

2. Volatility clustering: the tendency for volatility in financial markets to appear in bunches.

3. Leverage effects: the tendency for volatility to rise more following a large price fall than following a

price rise of the same magnitude.

2. Outline the GARCH methodology and explain the reasons for using GARCH method instead of

ARCH. Explain briefly the main underlying assumptions of the ARCH and GARCH models.

Under ARCH the expected value of the error term, ut, is zero, which allows us to express the

variance of the error term as the expected value of the error term squared at time t given the

previous values of the error term.

The GARCH model allows the conditional variance to be dependent upon previous values of the

conditional variance as well as previous values of the squared error term.

GARCH is more parsimonious than ARCH, and it avoids overfitting. Consequently, the model is less

likely to breach non-negativity constraints (i.e. the conditional variance on the LHS must be

positive or it would have no meaning). For example, the GARCH(1,1) model can be expanded out to

a restricted infinite order ARCH model. Thus, the GARCH(1, 1) model containing only three

parameters in the conditional variance equation is a very parsimonious model, that allows an infinite

number of past squared errors to influence the current conditional variance.
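The GARCH(1,1) conditional variance recursion is short enough to sketch directly; the parameter values and simulated shocks below are illustrative assumptions, not estimates from the document's tables:

```python
import numpy as np

def garch11_variance(u, omega, alpha, beta):
    """GARCH(1,1) recursion: h_t = omega + alpha*u_{t-1}^2 + beta*h_{t-1}.

    omega > 0 and alpha, beta >= 0 guarantee a positive conditional variance;
    alpha + beta close to 1 means volatility shocks are highly persistent."""
    h = np.empty(len(u))
    h[0] = omega / (1.0 - alpha - beta)   # start at the unconditional variance
    for t in range(1, len(u)):
        h[t] = omega + alpha * u[t - 1] ** 2 + beta * h[t - 1]
    return h

rng = np.random.default_rng(6)
u = rng.normal(size=1000)                        # illustrative shocks
h = garch11_variance(u, omega=0.1, alpha=0.1, beta=0.8)
```

With these parameters the unconditional variance is omega / (1 - alpha - beta) = 1, and every h_t stays strictly positive, which is the non-negativity property discussed above.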

3. Test the following estimated GARCH models and carefully interpret your results. Which model is

your preferred model and why?

GARCH(-1) = lagged conditional variance

If the sum of the coefficients is close to unity, then shocks to the conditional variance will be highly

persistent. The coefficient on RESID^2 measures the extent to which a volatility shock today feeds

through into next period's volatility, while the sum of the coefficients on RESID^2 and GARCH(-1)

measures the rate at which this effect dies away over time.

4. Outline reasons for using TGARCH and EGARCH models compared to other simple GARCH models.

GARCH may violate non-negativity conditions. GARCH models cannot account for leverage effects,

because the conditional variance is a function of the magnitude of the lagged residuals and not their

signs (in other words, by squaring the lagged residuals the sign is lost). TGARCH accounts for this by

the addition of a term that accounts for asymmetries.

The EGARCH model has several advantages over the pure GARCH model. Because the model is

specified in terms of the log of the conditional variance, sigma squared will always be positive even

if the RHS is negative, since e raised to any power is positive; so there is no need to impose artificial

non-negativity constraints on the model parameters. Secondly, asymmetries are allowed for under

EGARCH, since if the relationship between volatility and returns is negative, the asymmetry

parameter γ will be negative.

5. Test the following estimated TGARCH and EGARCH models and carefully interpret your results.

The TGARCH model is a simple extension of GARCH with an additional term added to account for

possible asymmetries.

When γ is significantly greater than zero, negative shocks imply a higher next-period conditional

variance than positive shocks of the same magnitude.
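The asymmetry is easy to see in a TGARCH (GJR) recursion sketch; the parameter values below are illustrative assumptions chosen only to expose the leverage term:

```python
import numpy as np

def tgarch11_variance(u, omega, alpha, gamma, beta):
    """TGARCH/GJR recursion:
    h_t = omega + alpha*u_{t-1}^2 + gamma*u_{t-1}^2*I(u_{t-1} < 0) + beta*h_{t-1}.
    gamma > 0 means a negative shock raises next period's variance by more
    than a positive shock of the same size (the leverage effect)."""
    h = np.empty(len(u))
    h[0] = omega / (1.0 - alpha - 0.5 * gamma - beta)  # unconditional variance
    for t in range(1, len(u)):
        lev = gamma * u[t - 1] ** 2 if u[t - 1] < 0 else 0.0
        h[t] = omega + alpha * u[t - 1] ** 2 + lev + beta * h[t - 1]
    return h

# identical shocks except for sign: the negative one raises h_1 by more
h_neg = tgarch11_variance(np.array([-1.0, 0.0]), 0.1, 0.05, 0.1, 0.8)
h_pos = tgarch11_variance(np.array([1.0, 0.0]), 0.1, 0.05, 0.1, 0.8)
```

Both paths start from the same unconditional variance, but the negative shock feeds through the extra gamma term, so the next-period conditional variance is strictly higher after the fall than after an equal-sized rise.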

Explain briefly the concept of non-stationarity in regression and the consequences of ignoring it

when using time series data.

For the purpose of the analysis in this chapter, a stationary series can be defined as one with a constant

mean, constant variance and constant autocovariances for each given lag. Therefore, the discussion

in this chapter relates to the concept of weak stationarity.

Two problems arise:

The stationarity or otherwise of a series can strongly influence its behaviour and properties. To offer

one illustration, the word shock is usually used to denote a change or an unexpected change in a

variable or perhaps simply the value of the error term during a particular time period. For a stationary

series, shocks to the system will gradually die away. That is, a shock during time t will have a smaller

effect in time t + 1, a smaller effect still in time t + 2, and so on. This can be contrasted with the case

of non-stationary data, where the persistence of shocks will always be infinite, so that for a non-

stationary series, the effect of a shock during time t will not have a smaller effect in time t + 1, and in

time t + 2, etc.

The use of non-stationary data can lead to spurious regressions. If two stationary variables are

generated as independent random series, when one of those variables is regressed on the other, the

t-ratio on the slope coefficient would be expected not to be significantly different from zero, and the

value of R2 would be expected to be very low. This seems obvious, for the variables are not related to

one another. However, if two variables are trending over time, a regression of one on the other could

have a high R2 even if the two are totally unrelated. So, if standard regression techniques are applied

to non-stationary data, the end result could be a regression that looks good under standard measures

(significant coefficient estimates and a high R2), but which is really valueless. Such a model would be

termed a spurious regression.
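The spurious regression phenomenon is easy to reproduce by simulation. A numpy sketch with two independent random walks (seed and sample size are illustrative): in levels the R2 is often sizeable despite the series being unrelated, while the same regression in first differences, where both series are stationary, behaves itself:

```python
import numpy as np

rng = np.random.default_rng(7)
T = 500
x = np.cumsum(rng.normal(size=T))   # two independent random walks
y = np.cumsum(rng.normal(size=T))

def ols_r2(y, x):
    """R-squared from an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones(len(y)), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ b
    return 1 - u @ u / ((y - y.mean()) @ (y - y.mean()))

r2_levels = ols_r2(y, x)                     # often spuriously high
r2_diffs = ols_r2(np.diff(y), np.diff(x))    # near zero, as it should be
```

The differenced regression involves two unrelated stationary series, so its R2 collapses toward zero, whereas the levels regression on trending data can look deceptively good.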

Describe the Error Correction Mechanism (ECM) and why this approach may be better than other

single equation models? Give examples.

This model is known as an error correction model or an equilibrium correction model, and

(yt-1 − γxt-1) is known as the error correction term. Provided that yt and xt are cointegrated with

cointegrating coefficient γ, then (yt-1 − γxt-1) will be I(0) even though its constituents are I(1). It is

thus valid to use OLS and standard procedures for statistical inference on this equation.

Step 1

Make sure that all the individual variables are I(1). Then estimate the cointegrating regression using

OLS. Test these residuals to ensure that they are I(0). If they are I(0), proceed to Step 2; if they are I(1),

estimate a model containing only first differences.

Step 2

Use the step 1 residuals as one variable in the error correction model, e.g.

Δyt = β1 Δxt + β2 ût-1 + vt   (7.51)

where ût-1 = yt-1 − τ xt-1. The stationary linear combination of non-stationary variables is

also known as the cointegrating vector. In this case, the cointegrating vector would be [1 −τ].

Additionally, any linear transformation of a cointegrating vector is also a cointegrating vector:

for example, −10yt-1 + 10τ xt-1 will also be stationary. In (7.45) above, the cointegrating vector

would be [1 −β1 −β2 −β3]. It is now valid to perform inferences in the second-stage regression,

i.e. concerning the parameters β1 and β2 (provided that there are no other forms of misspecification,

of course), since all variables in this regression are stationary.

The Engle-Granger two-step approach has several drawbacks:

1. The univariate unit root tests used in the first stage have low statistical power.

2. The choice of dependent variable in the first stage influences the test results, i.e. we need weak

exogeneity of the regressor, as determined by Granger causality.

3. There is potentially a small-sample bias.

4. The cointegration test on the residuals does not follow a standard distribution.

5. The validity of the long-run parameters in the first-stage regression, where one obtains the

residuals, cannot be verified, because the distribution of the OLS estimator of the cointegrating

vector is highly complicated and non-normal.

6. It is not possible to perform any hypothesis tests about the actual cointegrating relationship

estimated at stage 1.

Test for cointegration using the two-step Engle and Granger method and interpret ECM model results.

Outline briefly AR, MA and ARMA models. Highlight their potential use in modelling economic and

finance data and give examples.

An autoregressive model is one where the current value of a variable, y, depends upon only the

values that the variable took in previous periods, plus an error term.

AR(2): yt = μ + φ1 yt-1 + φ2 yt-2 + ut

MA(2): yt = μ + θ1 ut-1 + θ2 ut-2 + ut

What criteria should one use to select the best model? Outline briefly the relative strengths and

weaknesses of different selection criteria?

Another technique, which removes some of the subjectivity involved in interpreting the acf and pacf,

is to use what are known as information criteria. Information criteria embody two factors: a term

which is a function of the residual sum of squares (RSS), and some penalty for the loss of degrees of

freedom from adding extra parameters. So, adding a new variable or an additional lag to a model will

have two competing effects on the information criteria: the residual sum of squares will fall but the

value of the penalty term will increase. The object is to choose the number of parameters which

minimises the value of the information criteria. So, adding an extra term will reduce the value of the

criteria only if the fall in the residual sum of squares is sufficient to more than outweigh the increased

value of the penalty term.
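The trade-off between fit and the penalty term can be sketched for AR order selection; a numpy illustration on simulated AR(2) data (seed, coefficients and maximum lag are illustrative assumptions):

```python
import numpy as np

def ar_ic(y, pmax):
    """AIC and SBIC for AR(p) fits, p = 1..pmax, on a common sample.

    AIC = ln(RSS/T) + 2k/T,  SBIC = ln(RSS/T) + k*ln(T)/T,  k = p + 1."""
    y = np.asarray(y, float)
    T = len(y) - pmax                       # common effective sample size
    out = []
    for p in range(1, pmax + 1):
        X = np.column_stack([np.ones(T)] +
                            [y[pmax - j:len(y) - j] for j in range(1, p + 1)])
        b, *_ = np.linalg.lstsq(X, y[pmax:], rcond=None)
        rss = np.sum((y[pmax:] - X @ b) ** 2)
        k = p + 1
        out.append((p, np.log(rss / T) + 2 * k / T,
                       np.log(rss / T) + k * np.log(T) / T))
    return out

rng = np.random.default_rng(8)
y = np.zeros(2000)
for t in range(2, 2000):                    # true process is AR(2)
    y[t] = 0.5 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

ic = ar_ic(y, pmax=6)
p_aic = min(ic, key=lambda r: r[1])[0]      # order minimising AIC
p_sbic = min(ic, key=lambda r: r[2])[0]     # order minimising SBIC
```

Because SBIC's penalty grows faster with the number of parameters, the SBIC-chosen order can never exceed the AIC-chosen order on the same sample, consistent with AIC's tendency to select models that are on average too large.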

SBIC is strongly consistent (but inefficient) and AIC is not consistent, but is generally more efficient. In

other words, SBIC will asymptotically deliver the correct model order, while AIC will deliver on average

too large a model, even with an infinite amount of data. On the other hand, the average variation in

selected model orders from different samples within a given population will be greater in the context

of SBIC than AIC. Overall, then, no criterion is definitely superior to others.

Outline the main steps for forecasting a series using these models.

Following tables show results from various univariate models, which model is the best model

capturing the underlying data generating in the data and why?

Explain the estimated results of your selected (best model) model using the appropriate statistical

tests.

Section A

1. Interpret the following statistics: R-squared, Adjusted R-squared, S.E. of regression, Sum squared

resid, F-statistic, Prob(F-statistic), Mean dependent var and S.D. dependent var.

The R-squared is a measure of the goodness of fit of the model: the closer the value is to unity, the

better the fit. The Adjusted R-squared takes into account the number of degrees of freedom in the

model and penalises the introduction of additional explanatory variables. The latter

indicates that the model explains 87.56% of the variation in the dependent variable.

Sum squared resid is the residual sum of squares, an absolute measure of the importance of the

residual part of the model. The standard error of the regression measures the average size of the

residuals; it is obtained by dividing the residual sum of squares by the number of degrees of

freedom and taking the square root.

The F-statistic is used to test the joint significance of the model. The null hypothesis is that all the

slope coefficients of the model are jointly zero. The alternative hypothesis is that at least one

coefficient takes a value different from zero. Subsequently, the constrained and unconstrained

models are estimated, as implied by the null and alternative hypotheses. The F statistic is calculated

as a relative difference between the residual sum of squares of these models and it is compared with

the corresponding critical value from an F distribution. Prob(F-statistic) measures marginal

significance of the F statistic.

Mean dependent var calculates the average of the dependent variable, whereas S.D. dependent var

calculates the standard deviation of the dependent variable.
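All of these summary statistics can be reproduced from first principles. A numpy sketch on simulated data (the regressors and coefficients are invented for illustration, not taken from the document's output):

```python
import numpy as np

rng = np.random.default_rng(9)
T, k = 100, 3                               # observations, regressors incl. constant
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=T)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
u = y - X @ b
rss = u @ u                                  # Sum squared resid
tss = (y - y.mean()) @ (y - y.mean())
r2 = 1 - rss / tss                           # R-squared
adj_r2 = 1 - (1 - r2) * (T - 1) / (T - k)    # Adjusted R-squared
se_reg = np.sqrt(rss / (T - k))              # S.E. of regression
f_stat = (r2 / (k - 1)) / ((1 - r2) / (T - k))  # F-statistic (joint significance)
mean_dep, sd_dep = y.mean(), y.std(ddof=1)   # Mean / S.D. dependent var
```

Note that the adjusted R-squared is always no greater than the plain R-squared, reflecting the degrees-of-freedom penalty described above.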

2. t-test

3. F-test

Multicollinearity exists when two or more of the predictors in a regression model are moderately or

highly correlated. Unfortunately, when it exists, it can wreak havoc on our analysis and thereby limit

the research conclusions we can draw. As we will soon learn, when multicollinearity exists, any of

the following pitfalls can be exacerbated:

1. the estimated regression coefficient of any one variable depends on which other predictors

are included in the model;

2. the precision of the estimated regression coefficients decreases as more predictors are

added to the model;

3. the marginal contribution of any one predictor variable in reducing the error sum of squares

depends on which other predictors are already in the model;

4. hypothesis tests for βk = 0 may yield different conclusions depending on which predictors are

in the model.

A common diagnostic is the variance inflation factor, VIF = 1/(1 − R²), where R² is the value of the

coefficient of determination resulting from an auxiliary regression of one of the explanatory

variables on a constant and the other explanatory variables.
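The auxiliary-regression idea amounts to the variance inflation factor, VIF_j = 1/(1 − R_j²). A numpy sketch on simulated data in which two regressors are nearly collinear (seed and construction are illustrative assumptions):

```python
import numpy as np

def vif(X):
    """Variance inflation factors: VIF_j = 1/(1 - R_j^2), where R_j^2 comes
    from regressing column j on a constant and the remaining columns."""
    X = np.asarray(X, float)
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        Z = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(Z, y, rcond=None)
        u = y - Z @ b
        r2 = 1 - u @ u / ((y - y.mean()) @ (y - y.mean()))
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(10)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)   # nearly collinear with x1
x3 = rng.normal(size=200)              # unrelated regressor
v = vif(np.column_stack([x1, x2, x3]))
```

The two collinear regressors produce very large VIFs, while the unrelated one stays close to 1; a common rule of thumb treats VIFs above about 10 as evidence of problematic multicollinearity.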

Another indication of the presence of multicollinearity is that individually the explanatory variables

are associated with a lack of significance, while the regression model as a whole is seen to be

statistically significant. In answering Question 4, in all cases, the null hypothesis was rejected at the

five per cent level of significance. Hence, there was no sign of multicollinearity.

When multicollinearity is present in a regression model, the size of the standard error of the

estimator of the slope parameter markedly increases. In this exercise, in the context of the

two-variable model, the standard error of the Ordinary Least Squares estimator of B1 was equal

to 0.1976. In contrast, in the context of the three-variable model, the corresponding standard error

was equal to 0.2439. Hence, upon expanding the model, the value of the standard error has become

larger, but not tremendously so.

Finally, the strength of correlation between explanatory variables may be interpreted as a sign of

multicollinearity. Note that a correlation coefficient falls between -1 and +1: the closer the value is

to either extreme, the stronger the linear relationship between the respective variables, and the

nearer it is to zero, the weaker that relationship.

Available remedies: i) removing explanatory variables that are strongly correlated with each other; ii)

acquiring more data; and iii) reconsidering the functional form of the model.

5. Breusch-Pagan-Godfrey

Formulate the variance function that is required for the Breusch-Pagan-Godfrey test for

heteroscedasticity. What are the null and alternative hypotheses for this test? Determine the value

for the test statistic and the critical value. Is there evidence of heteroscedasticity?

If the random disturbance term is homoscedastic, its variance (the expected value of ut²) will be

constant. If any of the explanatory variables in the auxiliary regression is significant, then the

random disturbance term is heteroscedastic. More formally, H0: α1 = α2 = 0 against

H1: α1 ≠ 0 or α2 ≠ 0. We shall assume asymptotic validity of the test statistic, which follows a

chi-square distribution with 2 degrees of freedom. It is calculated as LM = T·R² = 216 × 0.135844 =

29.34231. The critical value is 5.9915. Since the test statistic is greater than the critical value, the

null hypothesis is rejected. Therefore, there is evidence of heteroscedasticity in the model.
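The LM statistic above is T times the R² of an auxiliary regression of the squared residuals on the regressors. A numpy sketch on simulated heteroscedastic data (the data-generating process and seed are illustrative assumptions; only the sample size 216 echoes the worked example):

```python
import numpy as np

def breusch_pagan_lm(y, X):
    """Breusch-Pagan-Godfrey LM statistic: regress the squared OLS residuals
    on the regressors; LM = T * R^2 of that auxiliary regression, which is
    asymptotically chi-squared (df = number of slope regressors) under H0."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    u2 = (y - X @ b) ** 2                     # squared OLS residuals
    a, *_ = np.linalg.lstsq(X, u2, rcond=None)
    e = u2 - X @ a
    r2 = 1 - e @ e / ((u2 - u2.mean()) @ (u2 - u2.mean()))
    return len(y) * r2

rng = np.random.default_rng(11)
T = 216                                       # same T as in the worked example
x = rng.uniform(1, 5, size=T)
X = np.column_stack([np.ones(T), x])
y_het = 2 + 3 * x + x * rng.normal(size=T)    # error variance grows with x
lm = breusch_pagan_lm(y_het, X)
```

Because the error variance grows with x by construction, the LM statistic comfortably exceeds the chi-squared(1) 5% critical value of 3.84, so the homoscedasticity null is rejected, mirroring the conclusion in the worked example.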

If heteroscedasticity is found, heteroscedasticity-robust standard errors can be used, since they give

rise to consistent estimates of the standard errors. Alternatively, generalised least squares could be

used to obtain efficient estimates.

Ramsey RESET

The RESET test is a general test for misspecification of functional form. In this test, the dependent

variable from the original regression is regressed on powers of the fitted values together with the

original explanatory variables. The fitted values of y are those obtained from the original

regression, and their powers capture the different higher-order terms in the x's.

The null hypothesis, under an F-test, is that all the coefficients of the fitted values of y are jointly

equal to zero.

Durbin-Watson

White test

The coefficient estimates derived using OLS are still unbiased, but they are inefficient, i.e. they are

not BLUE, even at large sample sizes, so that the standard error estimates could be wrong. There

thus exists the possibility that the wrong inferences could be made about whether a variable is or is

not an important determinant of variations in y. In the case of positive serial correlation in the

residuals, the OLS standard error estimates will be biased downwards relative to the true standard

errors. That is, OLS will understate their true variability. This would lead to an increase in the

probability of type I error -- that is, a tendency to reject the null hypothesis sometimes when it is

correct. Furthermore, R2 is likely to be inflated relative to its correct value if autocorrelation is

present but ignored, since residual autocorrelation will lead to an underestimate of the true error

variance (for positive autocorrelation).

In this case, OLS estimators will still give unbiased (and also consistent) coefficient estimates, but

they are no longer BLUE -- that is, they no longer have the minimum variance among the class of

unbiased estimators. The reason is that the error variance, σ², plays no part in the proof that the

OLS estimator is consistent and unbiased, but σ² does appear in the formulae for the coefficient

variances. If the errors are heteroscedastic, the formulae presented for the coefficient standard

errors no longer hold.

So, the upshot is that if OLS is still used in the presence of heteroscedasticity, the standard errors

could be wrong and hence any inferences made could be misleading. In general, the OLS standard

errors will be too large for the intercept when the errors are heteroscedastic. The effect of

heteroscedasticity on the slope standard errors will depend on its form.