Sei sulla pagina 1di 38

Unit Roots, Structural Breaks and Cointegration Analysis: A Review of the Available Processes and Procedures and an Application

(Presented at the Macroeconomics and Financial Economics Workshop:

Recent Developments in Theory and Empirical Modeling workshop, held from October 8 to October 9 at Eastern Mediterranean University)

Asst. Prof. Dr. Mete Feridun Department of Banking and Finance Faculty of Business and Economics Eastern Mediterranean University


Testing for stationarity Testing for structural breaks Dealing with structural breaks in cointegration analysis Tests for parameter stability

Time-series Econometrics

Prior to any time-series econometric analysis, it is necessary to investigate the stationarity properties of the variables. A stationary series fluctuates around a constant longrun mean and, this implies that the series has a finite variance which does not depend on time. On the other hand, non-stationary series have no tendency to return to a long-run deterministic path and the variances of the series are time-dependent.

Non-stationary series suffer permanent effects from random shocks and thus the series follow a random walk. The problems caused by non-stationary variables in standard regression analysis have been well documented in the time-series literature. The standard classical estimation methods are based on the assumption that the mean and variances of the stochastic series are finite, constant and time invariant.

Given that economic time-series are typically described as nonstationary processes, the estimates of such variables will lead to spurious regression and their economic interpretation will not be meaningful. A spurious regression occurs when a pair of independent series, but with strong temporal properties, are found apparently to be related according to standard inference in an OLS regression If the unit root tests find that a series contain one unit root, the appropriate route in this case is to transform the data by differencing the variables prior to their inclusion in the regression model, but this incurs a loss of important long-run information. Alternatively, if the variables are cointegrated, that is, if a long-run relationship exists among the set of variables that share similar nonstationarity properties, regression involving the levels of the variables can proceed without generating spurious results.

In this case, a balanced regression leads to meaningful interpretations and evades the spurious regression problem. In binary choice models also, the method of estimation of the are based on the assumption that the means and variances of these variables being tested are constant over time, i.e. stationary. Incorporating non-stationary or unit root variables in estimating the regression equations using these models give misleading inferences.

Unit Root Tests

Traditionally, Augmented DickeyFuller (ADF) and PhillipsPerron (PP) tests are used to assess the order of integration of the variables. Uniform outcomes of both tests are necessary for the final conclusion about the stationarity properties of each series. Usually, all variables are tested with a linear trend and/or intercept or none.

Structural Breaks

A well-known weakness of the ADF and PP unit root tests is their potential confusion of structural breaks in the series as evidence of non-stationarity. In other words, they may fail to reject the unit root hypothesis if the series have a structural break. In other words, for the series that are found to be I(1), there may be a possibility that they are in fact stationary around the structural break(s), I(0), but are erroneously classified as I(1).

Perron (1989) shows that failure to allow for an existing break leads to a bias that reduces the ability to reject a false unit root null hypothesis. To overcome this, the author proposes allowing for a known or exogenous structural break in the Augmented Dickey-Fuller (ADF) tests. Following this development, many authors, including Zivot and Andrews (1992) and Perron (1997), proposed determining the break point endogenously from the data.

Enders (2004) argues that Perron-Vogelsang (1992) unit root tests are more appropriate if the date of the break is uncertain.

Shrestha and Chowdhury (2005) argue that, in the case of a structural break, the testing power of the Perron-Vogelsang unit root test is superior to that of the Zivot-Andrews test.
Applying the unit root tests which allow for the possible presence of the structural break prevents obtaining a test result which is possibly biased towards non-rejection, as suspected by Perron (1989). Also, since this procedure can identify the date of the structural break, it facilitates the analysis of whether a structural break on a certain variable is associated with a particular event such as a change in government policy, a currency crisis, war and so forth.

Multiple Structural Breaks

The Zivot-Andrews and Perron-Vogelsang (1992) unit root tests allow for one structural break, whereas the Clemente-MontanesReyes (1998) unit root test allows for two structural breaks in the mean of the series196. Clemente et al (1998) base their approach on Perron and Vogelsang (1992), allowing for the possibility of having two structural breaks in the mean of the series.

In these tests, the null hypothesis is that the series has a unit root with structural break(s) against the alternative hypothesis that they are stationary with break(s). The advantage of these tests is that they do not require an a priori knowledge of the structural break dates. Ben-David et al (2003) cautions that just as failure to allow one break can cause non-rejection of the unit root null by the Augmented Dickey Fuller test, failure to allow for two breaks, if they exist, can cause non-rejection of the unit root null by the tests which only incorporate one break (Ben-David et al, 2003: 304).

Lumisdaine and Papell (1997) extended the Zivot and Andrews (1992) model to accommodate two structural breaks.

However, this test was criticized for the absence of the breaks under the null hypothesis of unit root as this could result in a tendency for these tests to suggest evidence of stationarity with breaks (see Glynn et al, 2007).

Hence, The Perron-Vogelsang and Clemente-MontanesReyes unit root tests are more preferable. Both of these tests offer two models: (1) an additive outliers (AO) model, which captures a sudden change in the mean of a series; and (2) an innovational outliers (IO) model, which allows for a gradual shift in the mean of the series.

According to Baum (2004), if the estimates of the Perron-Vogelsang and Clemente-Montanes-Reyes unit root tests provide evidence of significant additive or innovational outliers in the time series, the results derived from ADF and PP tests are doubtful, as this is evidence that the model excluding structural breaks is misspecified. Therefore, in applying unit root tests in time series that exhibit structural breaks, only the results from the Clemente-Montanes-Reyes unit root tests should be considered if the two structural breaks indicated by the respective tests are statistically significant (at the 5% level as used by STATA).

On the other hand, if the results of the Perron-Vogelsang and Clemente-MontanesReyes unit root tests show no evidence of two significant breaks in the series, the results from the PerronVogelsang unit root tests are considered. If these tests show no evidence of a structural break, the ADF and PP tests can be considered.

Cointegration Analysis

If it is certain that the underlying series are all I(1), the conventional Johansen cointegration technique can be safely used. However, in the case where the presence of structural breaks introduces uncertainty as to the true order of integration of the variables, the autoregressive distributed lag (ARDL) bounds testing procedure introduced by Pesaran and Pesaran (1997), Pesaran and Shin (1999), and Pesaran et al (2001) should be preferred. The advantage of this methodology is that it yields valid results regardless of whether the underlying variables are I(1) or I(0), or a combination of both.

Conventionally, in the case where the unit root tests reveal that a series contain one unit root, the appropriate method is to transform the data by differencing the variables prior to their inclusion in the regression model to avoid the risk of spurious regression. Nonetheless, this incurs a loss of important long-run information. Alternatively, if the variables are cointegrated, that is, if a long-run relationship exists among the set of variables that share similar nonstationarity properties, regression involving the levels of the variables can proceed without generating spurious results. In this case, a balanced regression leads to meaningful interpretations and evades the spurious regression problem.

Cointegration vectors are of considerable interest since they determine I(0) relations that hold between variables which are individually non-stationary. Essentially, variables are cointegrated when a long-run linear relationship is obtained from a set of variables that share the same non-stationary properties.

Hence, the intuition behind cointegration is that it allows capturing the equilibrium relationships dictated by the economic theory between nonstationary variables within a stationary model.
A search is made for a linear combination of such variables such that the combination is stationary. If such a stationary combination exists, then the variables are said to be cointegrated, meaning even though they are individually not stationary, they are bound by an equilibrium relationship.

In this case, the application of traditional econometric modelling to non-stationary time series data generates meaningful results. An advantage of the cointegration approach is that it provides a direct test of the economic theory and enables utilization of the estimated long-run parameters into the estimation of the shortrun disequilibrium relationships. Although Engle and Grangers (1987) original definition of cointegration refers to variables that are integrated of the same order, Enders (2004) argues that:

It is possible to find equilibrium relationships among groups of variables that are integrated of different orders200. Asteriou and Hall (2007) also explains that in cases where a mix of I(0) and I(1) variables are present in the model, cointegrating relationships might exist.

Similarly, Lutkopohl (2004) explains:

Occasionally it is convenient to consider systems with both I(1) and I(0) variables. Thereby the concept of cointegration is extended by calling any linear combination that is I(0) a cointegration relation, although this terminology is not in the spirit of the original definition because it can happen that a linear combination of I(0) variables is called a cointegration relation

Therefore, even in the presence of a set of variables which contains both I(1) and I(0) variables, cointegration analysis is applicable and the presence of a long-run linear combination denotes the existence of cointegrated variables. Hence, it is possible to find long-run equilibrium relationships among a set of I(0) and I(1) variables if their linear combination reveals a cointegrating relationship. In the multivariate case, it is possible to have series with different orders of integration. In this case, a subset of the higher order series must cointegrate to the order of the lower order series. The long-run relationship among the variables could be achieved if the low frequency or the stochastic trend components of a set of variables offset each other to achieve a stationary linear combination of the variables.

Autoregressive Distributed Lag (ARDL) Bounds Tests

The choice of the ARDL bounds testing procedure as a tool for investigating the existence of a long-run relationship is based on the following considerations: First and the foremost, both dependent and the independent variables can be introduced in the model with lags. Autoregressive refers to lags in the dependent variable. Therefore, the past values of a variable are allowed to determine its present value. On the other hand, distributed lag refers to the lags of the explanatory variables.

This is a highly plausible feature: Conceptually, the dependence of the dependent variable on the independent variables may or may not be instantaneous depending on the theoretical considerations. A change in the economic variables may not necessarily lead to an immediate change in another variable.

The reaction to a change in each variable may be different depending on various factors. Hence, in some cases, they may respond to the economic developments with a lag and there is usually no reason to assume that all regressors should have the same lags.

Hence, ARDL bounds testing approach is appropriate as it allows flexibility in terms of the structure of lags of the regressors in the ARDL model as opposed to the cointegration VAR models where different lags for different variables is not permitted (Pesaran et al, 2001).
It goes without saying that the correct choice of the order of the ARDL model is very important in the long-run analysis. In this respect, the ARDL approach has the advantage that it takes a sufficient number of lags to capture the data generating process in a general-to-specific modelling framework

Furthermore, the lag orders can be selected based on four different selection criteria taking into consideration the results of the diagnostic tests for residual serial correlation, functional form misspecification, non-normality, and heteroscedasticity. Also, as shown by Pesaran et al (2001), the ARDL models yield consistent estimates of the long-run coefficients that are asymptotically normal irrespective of whether the underlying regressors are purely I(0), I(1) or mutually cointegrated. In other words, this procedure allows making inferences in the absence of any a priori information about the order of integration of the series under investigation.

As demonstrated by Pesaran and Shin (1999), the small sample properties of the bounds testing approach are superior to that of the traditional Johansen cointegration approach, which typically requires a large sample size for the results to be valid. Since the ARDL approach draws on the unrestricted error correction model, it is likely to have better statistical properties than the traditional cointegration techniques. In particular, Pesaran and Shin (1999) show that the ARDL approach has better properties in sample sizes up to 150 observations. On the other hand, Narayan and Smyth (2005) provide exact critical values for up to 80 observations

The ARDL approach is particularly applicable in the presence of the disequilibrium nature of the time series data stemming from the presence of possible. Structural breaks as happens with most economic variables. With the ARDL approach, it can be conveniently tested whether the underlying structural breaks have affected the long-run stability of the estimated coefficients. As suggested by Pesaran (1997), the cumulative sum of recursive residuals (CUSUM) and the CUSUM of square (CUSUMSQ) tests proposed by Brown et al (1975) can be applied to the residuals of the estimated error correction models to test parameter constancy

Having established the existence of a long-run relationship based on F-tests, the second step of the ARDL analysis is to estimate the long-run and the associated short-run coefficients. The long-run relationship is regarded as a steady-state equilibrium, whereas the short-run relationship is evaluated by the magnitude of the deviation from the equilibrium. The order of the lags in the ARDL model are selected using the appropriate selection criteria such as Akaike Information Criterion (AIC), Schwartz Bayesian Criterion (SBC), HannanQuinn Criterion (HQC) and R2 ensuring that there is no evidence of residual serial correlation, functional form misspecification, non-normality and heteroscedasticity.

In the presence of structural breaks, the diagnostic tests of the selected models will most likely suggest that the estimated model suffers from the nonnormality problem. In this case, the short-run and long-run coefficients of the estimated models will not be valid. The presence of non-normality problem can be attributed to the presence of outliers over the sample period, which results from non-recurring, exogenous shocks (such as wars, terrorist attacks, oil price shocks, financial crises etc.) rather than the normal evolution of the economic data.

Lets see an example of the diagnostic test results of Turkish macroeconomic data which has severe structural breaks due to currency crises. (see Table 1). In econometric modelling, the presence of extreme residuals, i.e. outliers, may lead to a rejection of the normality assumption as can be seen in Table 1 The outliers can individually or collectively be responsible for the residual non-normality problem. Indeed, this is not surprising in most economic cases given most of the series are characterized by frequent fluctuations.

Pulse Dummy Variables

One way to improve the chances of error normality is to use pulse dummy variables to capture those one-off abnormal observations, i.e. the estimated models should be re-estimated by augmenting the cointegrating equations with pulse dummy variables. Accordingly, separate dummy variables should be introduced for each of the outliers. Following the existing econometrics literature, the operational definition of an outlier is considered as any data point for which the residuals are in excess of 2 standard deviations from the fitted model.

The dummy variables are set equal to zero for all observations except the month in which the observation goes beyond the threshold of two standard errors. In these months, the dummy variable takes on the value of 1.

For example, Figure I plots the residuals of several econometric models with structural breaks and 2 standard errors. The horizontal lines in the figures represent the 2 standard error bands.

In these models the identified outliers correspond to the Turkish currency crises of 1994 and 2000-2001.
In this case, pulse dummy variables may be justifiably used to remove observations corresponding to these one-off events that are highly unlikely to be repeated.

When the results of the models which are reestimated using the pulse dummy variables to account for the presence of the outliers are examined, it can be seen that the use of the intervention dummy variables ensured normality of the probability distribution of the residuals, which permits hypothesis testing on the results of the model. (See Table 2) The results also confirm that the re-estimated models do not suffer from autocorrelation, heteroscedasticity, or model misspecification problems.

Testing for Parameter Stability

The existence of cointegration does not necessary imply that the estimated coefficients are stable. If the coefficients are unstable the results will be unreliable. In order to test for long-run parameter stability, Pesaran and Pesaran (1997) suggest applying the cumulative sum of recursive residuals (CUSUM) and the cumulative sum of recursive residuals of square (CUSUMSQ) tests proposed by Brown et al (1975) to the residuals of the estimated ECMs to test for parameter constancy. The advantage of these tests is that, unlike the alternative Chow test that requires break point(s) to be specified, they can be used without the requirement of a priori knowledge of the exact date of the structural break(s).

Hansen and Johansen (1992) also suggest a parameter constancy test but they require the variables to be I(1). Also, they do not incorporate the short-run dynamics of a model into testing unlike CUSUM and CUSUMSQ tests.
In both CUSUM and CUSUMSQ, the related null hypothesis is that all coefficients are stable. The CUSUM test uses the cumulative sum of recursive residuals based on the first observations and is updated recursively and plotted against break point. The test is more suitable for detecting systematic changes in the regression coefficients. The CUSUMSQ makes use of the squared recursive residuals and follows the same procedure. However, it is more useful in situations where the departure from the constancy of the regression coefficients is haphazard and sudden.

If the plot of the CUSUM and CUSUMSQ stays within the 5 percent critical bounds the null hypothesis that all coefficients are stable cannot be rejected. If however, either of the parallel lines are crossed then the null hypothesis of parameter stability is rejected at the 5 percent significance level. For example in Figure II and III, which plot the CUSUM and CUSUMSQ tests, the plots of the CUSUM and CUSUMSQ statistics are generally confined within the 5 percent critical value bounds, indicating the absence of any instability of the coefficients Thus, the parameters of the model do not suffer from any structural instability.

Thank you !