Nielsen Dissertation

Multivariate Fractional Integration and Cointegration
By Morten Ørregaard Nielsen
A dissertation submitted to The Faculty of Social Sciences in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
in
Economics
University of Aarhus, Denmark
To My Family
Table of Contents
Preface
Summary of the Dissertation
Dansk Resumé (Danish Summary)
Chapter 1
Local Whittle Analysis of Stationary Fractional Cointegration and the Implied-Realized Volatility Relation
Chapter 2
Semiparametric Estimation in Time Series Regression with Long Range Dependence
Chapter 3
Optimal Residual Based Tests for Fractional Cointegration and Exchange Rate Dynamics
Chapter 4
Multivariate Lagrange Multiplier Tests for Fractional Integration
Preface
The present collection of papers constitutes my PhD dissertation. The dissertation was written during my studies at the Department of Economics, University of Aarhus, in the years 1999-2002. In the Spring of 2002, I visited the Department of Economics and the Cowles Foundation at Yale University. I would like to thank Peter C. B. Phillips and the department for their hospitality.

I am very grateful to my dissertation advisors, Bent Jesper Christensen and Niels Haldrup, for providing intellectual guidance and support throughout my studies, and for always being willing to read and comment on several drafts of this dissertation and my other papers.

During my studies in Aarhus, I have worked with Bent Jesper Christensen, Niels Haldrup, and Svend Hylleberg on various research projects. I am grateful for having had this opportunity to collaborate, and I hope several more opportunities will present themselves in the years to come.

Finally, I would like to thank the other PhD students in Aarhus for generating a very pleasant environment. Special thanks go to my good friend Lars Stentoft, with whom I have shared several offices, lots of academic and not-so-academic discussions, and many visits to the Friday bar.

Morten Ørregaard Nielsen
Aarhus, December 2002

The pre-defense of this dissertation took place on 21 February, 2003. I would like to take this opportunity to thank the members of the assessment committee, Søren Johansen (University of Copenhagen), Jörg Breitung (Bonn University), and Svend Hylleberg (University of Aarhus), for their numerous constructive and very helpful comments and suggestions.

Morten Ørregaard Nielsen
Aarhus, March 2003
Summary of the Dissertation

This dissertation is concerned with the properties of statistical inference techniques in time series models of long memory, fractional integration, and cointegration, as well as the application of such models to economic data. The aim is to develop new and improved methods of inference for long memory models, and methods for investigating econometric relationships between such models, in particular to extend existing work on the asymptotic distribution theory for fractionally integrated and cointegrated time series.

Since the seminal work by Granger & Joyeux (1980), Granger (1981), and Hosking (1981), introducing long memory and fractionally integrated models into econometrics, there has been an increasing focus on the development of econometric and statistical inference techniques for such models. Time series exhibiting long memory (or long range dependence) are characterized by a strong dependence between observations that are distant in time, and they provide a flexible modelling framework for both stationary and nonstationary data. Two excellent surveys are Robinson (1994b) and Baillie (1996).

Recent empirical work by a large number of researchers, e.g. Diebold & Rudebusch (1989), Sowell (1992), Baillie (1996), Lobato & Velasco (2000), Andersen, Bollerslev, Diebold & Ebens (2001), and Andersen, Bollerslev, Diebold & Labys (2001), reveals that many important economic time series exhibit long memory and may be modelled using fractional integration techniques. Series where these phenomena are observed include exchange rates, interest rates, production, consumption, unemployment, volatility, and many others. Thus, almost all areas of economics are affected by these observations, and hence the importance of developing appropriate econometric techniques to model such series.

This dissertation consists of four self-contained papers. In chapter 1, I consider local Whittle analysis, see Robinson (1995) and Lobato (1999), of a stationary fractionally cointegrated model. A two step estimator, equivalent to the local Whittle QMLE, is proposed to jointly estimate the integration orders of the regressors, the integration order of the errors, and the cointegration vector. The estimator is semiparametric in the sense that it employs local assumptions on the joint spectral density matrix of the regressors and the errors near the zero frequency. Consequently, the estimator is invariant to short-run dynamics, which do not even have to be specified. I show that, for the entire stationary region of the integration orders, the estimator is asymptotically normal with block diagonal covariance matrix. Thus, the estimates of the integration orders are asymptotically independent of the estimate of the cointegration vector. Furthermore, the present estimator of the cointegrating vector is asymptotically normal for a wider range of integration orders than the narrow band frequency domain least squares estimator of Robinson (1994a), and is superior with respect to asymptotic variance, see also Christensen & Nielsen (2001). An application to financial volatility series is offered, which demonstrates that useful long-run relations may be derived between stationary time series.

Chapter 2 is concerned with semiparametric estimation in time series regression in the presence of long range dependence in both the errors and the stochastic regressors. A central limit theorem is established for a class of semiparametric frequency domain weighted least squares estimates, which includes both narrow band ordinary least squares and narrow band generalized least squares as special cases. The estimates are semiparametric as in chapter 1. This setting differs from earlier work on time series regression with long range dependence, where a fully parametric approach has been employed, e.g. Robinson & Hidalgo (1997). The generalized least squares estimate is infeasible when the degree of long range dependence is unknown and must be estimated in an initial step. In that case, I show that a feasible estimate exists which has the same asymptotic properties as the infeasible estimate.

In chapter 3, I propose a Lagrange Multiplier test of the null hypothesis of cointegration in fractionally cointegrated models. The test statistic utilizes fully modified residuals, see Phillips & Hansen (1990), to cancel the endogeneity and serial correlation biases, and I show that standard asymptotic properties apply under the null and under local alternatives. With i.i.d. Gaussian errors, the asymptotic Gaussian power envelope of all (unbiased) tests is achieved by the one-sided (two-sided) test. In an application to the dynamics among exchange rates for seven major currencies against the US dollar, mixed evidence of the existence of a cointegrating relation is found.

In chapter 4, I introduce a multivariate Lagrange Multiplier test for fractional integration, generalizing Tanaka's (1999) univariate test to multiple time series. With no multivariate tests available for testing the order of fractional integration, researchers interested in multiple time series have been forced to apply univariate tests to each element of the multiple time series. That procedure is not only cumbersome, but also ignores potentially important correlations between the elements of the multiple time series, which could lead to increased power of a multivariate test. A regression variant of Tanaka's (1999) LM test is proposed by Breitung & Hassler (2002), but it is not equivalent to the LM test in the multivariate case. I show that the LM statistic is asymptotically chi-squared distributed and efficient against local alternatives. Thus, asymptotic inference in this framework is much simpler than in the usual (integer) integration models where asymptotic distributions are nonstandard, see e.g. Phillips & Durlauf (1986) and Choi & Ahn (1999). An application to multivariate time series of real interest rates for six countries is offered, demonstrating that more clear-cut evidence can be drawn from multivariate tests compared to conducting several univariate tests.
Dansk Resumé (Danish Summary)

The dissertation deals with the properties of statistical inference methods in time series models with long memory, fractional integration, and cointegration, as well as the application of such models to economic data. The goal is to develop new and improved inference methods for time series models with long memory, and methods for investigating econometric relations between such models, in particular to extend the existing asymptotic distribution theory for fractionally integrated and cointegrated time series.

Since the pioneering work of Granger & Joyeux (1980), Granger (1981), and Hosking (1981), which introduced long memory and fractionally integrated models into the econometric literature, there has been an increasing focus on the development of econometric and statistical inference techniques for these models. Time series that exhibit long memory are characterized by a strong dependence between observations that are far apart in time, and they offer flexible modelling of both stationary and nonstationary data. Two excellent survey articles are Robinson (1994b) and Baillie (1996).

Recent empirical work by many researchers, e.g. Diebold & Rudebusch (1989), Sowell (1992), Baillie (1996), Lobato & Velasco (2000), Andersen, Bollerslev, Diebold & Ebens (2001), and Andersen, Bollerslev, Diebold & Labys (2001), has demonstrated that many important economic time series exhibit long memory and can be modelled by means of fractional integration. Time series in which these phenomena have been observed include exchange rates, interest rates, production, consumption, unemployment, volatility, and many others. Thus, almost all areas are affected by these observations, which underlines the importance of developing appropriate econometric techniques for modelling such time series.

The dissertation contains four self-contained papers. In chapter 1, I carry out local Whittle analysis, see Robinson (1995) and Lobato (1999), of a stationary fractionally cointegrated model. A two-step estimator, which is equivalent to the local Whittle QMLE, is proposed to simultaneously estimate the integration orders of the regressors, the integration order of the error term, and the cointegration vector. The estimator is semiparametric in the sense that it uses local assumptions on the spectral density matrix of the regressors and the error term near the zero frequency. As a consequence, the estimator is invariant to short-run dynamics, which do not even need to be specified. I show that the estimator is asymptotically normally distributed for the entire stationary region of the integration orders, and that the covariance matrix is block diagonal. That is, the estimates of the integration orders and the estimates of the cointegration vector are asymptotically independent. Moreover, the new estimator is asymptotically normally distributed for a larger range of integration orders than the narrow band frequency domain least squares estimator of Robinson (1994a), and it is better with respect to asymptotic variance, see also Christensen & Nielsen (2001). The new model and estimator are applied to financial volatilities, and the application demonstrates that useful long-run relations between stationary time series can be derived.

Chapter 2 concerns semiparametric estimation in time series regression when the error term and the stochastic regressors have long memory. A central limit theorem is derived for a class of semiparametric frequency domain weighted least squares estimates, which includes both narrow band ordinary least squares and narrow band generalized least squares as special cases. The estimates are semiparametric in the same way as in chapter 1. This approach differs from earlier work on time series regression with long memory, where a fully parametric model has typically been assumed, see e.g. Robinson & Hidalgo (1997). The generalized least squares estimate is infeasible if the degree of long memory is unknown and must be estimated in an initial analysis. In that case, I show that a feasible estimate exists which has the same asymptotic properties as the infeasible estimate.

In chapter 3, I introduce a Lagrange Multiplier test for cointegration in fractionally cointegrated models. The test statistic uses fully modified residuals, see Phillips & Hansen (1990), to remove endogeneity and autocorrelation bias, and I show that standard asymptotic properties hold under the null hypothesis and under local alternatives. With i.i.d. Gaussian errors, the one-sided (two-sided) test attains the asymptotic Gaussian power envelope of all (unbiased) tests. In an application to exchange rates for seven major currencies against the US dollar, I find mixed evidence for the existence of cointegration.

In chapter 4, I introduce a multivariate Lagrange Multiplier test for fractional integration, which generalizes Tanaka's (1999) univariate test to multiple time series. In the absence of multivariate tests of the fractional integration order, researchers interested in multiple time series have had to apply univariate tests to each element of the multiple time series. That procedure is not only cumbersome, but also ignores potentially important correlations between the elements of the multiple time series, which could lead to increased power of a multivariate test. A regression variant of Tanaka's (1999) LM test is proposed by Breitung & Hassler (2002), but it is not equivalent to the LM test in the multivariate case. I show that the LM statistic is asymptotically chi-squared distributed and efficient against local alternatives. Thus, asymptotic inference in this model is much simpler than in the usual integer-integrated models, where the asymptotic distributions are nonstandard, see e.g. Phillips & Durlauf (1986) and Choi & Ahn (1999). The LM test is applied to a multiple time series of real interest rates for six countries, and the application demonstrates that sharper conclusions can be drawn from multivariate tests than from conducting a series of univariate tests.
References
Andersen, T. G., Bollerslev, T., Diebold, F. X. & Ebens, H. (2001), The distribution of realized stock return volatility, Journal of Financial Economics 61, 43-76.
Andersen, T. G., Bollerslev, T., Diebold, F. X. & Labys, P. (2001), The distribution of exchange rate volatility, Journal of the American Statistical Association 96, 42-55.
Baillie, R. T. (1996), Long memory processes and fractional integration in econometrics, Journal of Econometrics 73, 5-59.
Breitung, J. & Hassler, U. (2002), Inference on the cointegration rank in fractionally integrated processes, Journal of Econometrics 110, 167-185.
Choi, I. & Ahn, B. C. (1999), Testing the null of stationarity for multiple time series, Journal of Econometrics 88, 41-77.
Christensen, B. J. & Nielsen, M. Ø. (2001), Semiparametric analysis of stationary fractional cointegration and the implied-realized volatility relation, Department of Economics Working Paper 2001-04 (revised 2002), University of Aarhus.
Diebold, F. X. & Rudebusch, G. D. (1989), Long memory and persistence in aggregate output, Journal of Monetary Economics 24, 189-209.
Granger, C. W. J. (1981), Some properties of time series data and their use in econometric model specification, Journal of Econometrics 16, 121-130.
Granger, C. W. J. & Joyeux, R. (1980), An introduction to long memory time series models and fractional differencing, Journal of Time Series Analysis 1, 15-29.
Hosking, J. R. M. (1981), Fractional differencing, Biometrika 68, 165-176.
Lobato, I. N. (1999), A semiparametric two-step estimator in a multivariate long memory model, Journal of Econometrics 90, 129-153.
Lobato, I. N. & Velasco, C. (2000), Long memory in stock-market trading volume, Journal of Business and Economic Statistics 18, 410-427.
Phillips, P. C. B. & Durlauf, S. N. (1986), Multiple time series regression with integrated processes, Review of Economic Studies 53, 473-495.
Phillips, P. C. B. & Hansen, B. E. (1990), Statistical inference in instrumental variables regression with I(1) variables, Review of Economic Studies 57, 99-125.
Robinson, P. M. (1994a), Semiparametric analysis of long-memory time series, Annals of Statistics 22, 515-539.
Robinson, P. M. (1994b), Time series with strong dependence, in C. A. Sims, ed., Advances in Econometrics, Cambridge University Press, Cambridge, pp. 47-95.
Robinson, P. M. (1995), Gaussian semiparametric estimation of long range dependence, Annals of Statistics 23, 1630-1661.
Robinson, P. M. & Hidalgo, F. J. (1997), Time series regression with long-range dependence, Annals of Statistics 25, 77-104.
Sowell, F. B. (1992), Modeling long run behavior with the fractional ARIMA model, Journal of Monetary Economics 29, 277-302.
Tanaka, K. (1999), The nonstationary fractional unit root, Econometric Theory 15, 549-582.
Chapter 1
Morten Ørregaard Nielsen
Abstract
We consider local Whittle analysis of a stationary fractionally cointegrated model. A two step estimator, equivalent to the local Whittle QMLE, is proposed to jointly estimate the integration orders of the regressors, the integration order of the errors, and the cointegration vector. We first show that the univariate local Whittle QMLE of the integration order of the residuals in our model is unaffected by the fact that it is based on residuals, and that it may thus be employed as an initial estimator. The two step estimator is semiparametric in the sense that it employs local assumptions on the joint spectral density matrix of the regressors and the errors near the zero frequency. We show that the estimator is asymptotically normal for the entire stationary region of the integration orders, and thus for a wider range of integration orders than the narrow band frequency domain least squares estimator, and that it is superior to the latter estimator with respect to asymptotic variance. Monte Carlo evidence documenting the finite sample feasibility of our new methodology is presented. In an application to financial volatility series, we examine the unbiasedness hypothesis in the implied-realized volatility relation.
JEL Classification: C22
Keywords: Fractional Cointegration, Fractional Integration, Whittle Likelihood, Long Memory, Realized Volatility, Semiparametric Estimation
I am grateful to Richard Baillie, Richard Blundell, Jörg Breitung, Bent Jesper Christensen, Niels Haldrup, Uwe Hassler, Svend Hylleberg, Søren Johansen, Helmut Lütkepohl, Neil Shephard, Herman van Dijk, and Tim Vogelsang for many helpful comments and discussions that have significantly improved the paper. I would also like to thank participants at the 2002 Econometric Society European Meeting in Venice, the 2002 Econometric Society Winter European Meeting in Budapest, the 2002 (EC)2 Conference in Bologna, and seminar participants at University of Aarhus, Tilburg University, University of British Columbia, Cornell University, Michigan State University, and Nuffield College (Oxford) for comments.
Introduction
In this paper we are concerned with the joint estimation of the integration orders and the cointegrating vector in stationary fractionally cointegrated models. Suppose we observe the $p$-vector $z_t = (y_t, x_t')'$, which is integrated of order $d \in (0, 1/2)$, denoted $z_t \in I(d)$. For a precise statement, $z_t \in I(d)$ if

$$(1 - L)^d z_t = \varepsilon_t, \tag{1}$$

where $\varepsilon_t \in I(0)$ and $(1 - L)^d$ is defined by its binomial expansion

$$(1 - L)^d = \sum_{j=0}^{\infty} \frac{\Gamma(j - d)}{\Gamma(-d)\,\Gamma(j + 1)} L^j, \qquad \Gamma(z) = \int_0^{\infty} t^{z-1} e^{-t}\,dt, \tag{2}$$

in the lag operator $L$ ($L z_t = z_{t-1}$). A process is labelled $I(0)$ if it is covariance stationary and has spectral density that is bounded and bounded away from zero at the origin. A scalar-valued stochastic process generated by (1) has spectral density

$$f(\lambda) \sim g \lambda^{-2d} \quad \text{as } \lambda \to 0^+, \tag{3}$$

where $g$ is a constant and the symbol "$\sim$" means that the ratio of the left- and right-hand sides tends to one in the limit. Such a process is said to possess strong dependence or long range dependence, since the autocorrelations decay at a hyperbolic rate in contrast to the much faster exponential rate in the weak dependence case. The parameter $d$ determines the memory of the process. If $d > -1/2$, $z_t$ is invertible and admits a linear representation, and if $d < 1/2$ it is covariance stationary. If $d = 0$, the spectral density (3) is bounded at the origin, and the process has only weak dependence. Sometimes, $z_t$ is said to have intermediate memory, short memory, and long memory when $d < 0$, $d = 0$, and $d > 0$, respectively.

Suppose further that $z_t = (y_t, x_t')'$ satisfies the regression model

$$y_t = \beta' x_t + e_t, \tag{4}$$
where the error term is integrated of a smaller order $d_e < d$, i.e. $e_t \in I(d_e)$. A much studied special case is the standard $I(1)$-$I(0)$ cointegration model, which arises when $d = 1$ and $d_e = 0$, see e.g. Watson (1994) for a review. When $d$ and/or $d_e$ are not integers, the model is called a fractional cointegration model, following the original idea by Granger (1981). We call the model (4) with $0 \le d_e < d < 1/2$ a stationary fractionally cointegrated model, since it is concerned with the long-run linear co-movement between two or more stationary fractionally integrated processes. The properties of the model in the standard $I(1)$-$I(0)$ cointegration case are well known, see Watson (1994), but the fractional cointegration framework has been examined only recently, see the short review in Robinson & Yajima (2002).

Throughout this paper, we shall be concerned with the stationary case $d \in (0, 1/2)$. This interval is relevant for many applications in finance, for instance stock market trading volume (Lobato & Velasco (2000)), exchange rate volatility (e.g. Andersen, Bollerslev, Diebold & Labys (2001)), stock return volatility (e.g. Andersen, Bollerslev, Diebold & Ebens (2001) and Christensen & Nielsen (2004)), spot prices for crude oil (Robinson & Yajima (2002)), and electricity spot prices (Haldrup & Nielsen (2004)). In particular, it is the relevant region for the volatility processes we study below in our empirical application. Henry & Zaffaroni (2003) provide a survey of empirical applications of fractional integration and long memory in macroeconomics and finance.

Since our model is stationary, a comparison with the standard time series regression model with weakly dependent regressors is natural. It is well known that, in that case, under a wide variety of regularity conditions, the ordinary least squares and generalized least squares estimates of $\beta$ in (4) are asymptotically normal, see e.g. Hannan (1979). The new complication is that, since the regressors and the errors both have long memory, they are potentially correlated even at very long horizons, thus rendering the OLS estimator inconsistent as discussed by Robinson (1994a, 1994b) and Robinson & Hidalgo (1997).

To deal with this issue, Robinson (1994a) proposed a semiparametric narrow band frequency domain least squares (FDLS) estimator that assumes only a multivariate generalization of (3), and essentially performs OLS on a degenerating band of frequencies around the origin. The consistency of the estimator in the stationary case is proved by Robinson (1994a), and Christensen & Nielsen (2004) show that its asymptotic distribution is normal when the collective memory of the regressors and the error term is less than 1/2, i.e. when $d + d_e < 1/2$. In contrast, Robinson & Marinucci (2003) consider several cases where the regressors are fractionally integrated and nonstationary, and show that the limiting distributions for the FDLS estimator are then functionals of fractional Brownian motion, and Chen & Hurvich (2003a) generalize the model to allow deterministic polynomial trends. As an alternative, Robinson & Hidalgo (1997) introduced a parametric class of (full band) weighted least squares estimates (including generalized least squares as a special case), and proved root-n-consistency and asymptotic normality for their estimates, assuming correct specification of the dynamics at any frequency (later relaxed by Hidalgo & Robinson (2002)) and independence between the regressors and the errors.

Many estimators of the memory parameter $d$ and the scale parameter $g$ have been suggested in the literature. A semiparametric approach has been developed by Geweke & Porter-Hudak (1983), Künsch (1987), Robinson (1994a, 1995a, 1995b), Lobato & Robinson (1996), and Lobato (1999), among others. The semiparametric estimators of the memory parameter assume only the model (3) for the spectral density, and use a degenerating part of the periodogram around the origin to estimate the model. This approach has the advantage of being invariant to any short- and medium-term dynamics (as well as mean terms since the zero frequency is usually left out). In particular, a local Whittle quasi maximum likelihood estimator (QMLE) based on the maximization of a local Whittle approximation to the likelihood, see our equation (7), has been developed by Künsch (1987), Robinson (1995a) (who called it a Gaussian semiparametric estimator), and Lobato (1999) to estimate the integration orders of univariate
and multivariate stationary fractionally integrated time series, respectively. Of course, a fully parametric approach is more efficient, using the entire sample, but is inconsistent if the parametric model is specified incorrectly, e.g. if the lag-structure of the short-term dynamics is misspecified.

The methods described above are combined by Marinucci & Robinson (2001) and Christensen & Nielsen (2004), who suggest conducting a fractional cointegration analysis in several steps. First, the integration orders of the raw data are estimated by, e.g., the local Whittle QMLE. Secondly, the narrow band FDLS estimator for the cointegrating vector is calculated, and finally the integration order of the residuals is estimated, assuming that the approach is equally valid for residuals. Hypothesis testing is then conducted on $d_e$ as if $e_t$ were observed, and on $\beta$ as if $d_e$ (which enters in the limiting distribution of the FDLS estimator) were known. Although this is indeed a valid course of action, see Hassler, Marmol & Velasco (2000) and Velasco (2003) for the nonstationary case, and our Theorem 1 below for the stationary case, a joint estimation method for the integration orders and the cointegration vector would be preferable.

We propose a simple joint semiparametric two step estimator of the integration orders and the cointegration vector in (4), which is equivalent to the local Whittle QMLE. The two step estimator is based on consistent initial estimators. We show that such estimators exist, and in particular, our Theorem 1 shows that the local Whittle QMLE of the integration order of the residuals is unaffected by the fact that it is based on residuals. More generally, this shows that in fact the three step procedure employed by Marinucci & Robinson (2001) and Christensen & Nielsen (2004) is valid. That is, inference on $d_e$ may, in their setup, be conducted based on our distributional result in Theorem 1 and is equivalent to disregarding the fact that the estimator is based on residuals.
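The stepwise procedure described above can be sketched in a few lines. This is a minimal illustration for a scalar regressor, not the joint two step estimator proposed in this paper: the function names (`cross_periodogram`, `local_whittle_d`, `fdls_beta`) are hypothetical, the local Whittle objective is the standard univariate one from the literature cited above, and the grid search over $d$ is an assumption made only to keep the example free of dependencies beyond NumPy.

```python
import numpy as np

def cross_periodogram(a, b, m):
    """Cross-periodogram of a and b at Fourier frequencies 2*pi*j/n, j = 1..m.

    The zero frequency is left out, so sample means do not matter.
    """
    n = len(a)
    wa = np.fft.fft(a)[1:m + 1] / np.sqrt(2.0 * np.pi * n)
    wb = np.fft.fft(b)[1:m + 1] / np.sqrt(2.0 * np.pi * n)
    return wa * np.conj(wb)

def local_whittle_d(x, m, grid=None):
    """Univariate local Whittle estimate of d using the first m frequencies.

    Minimizes R(d) = log(mean(lam^(2d) * I)) - 2d * mean(log(lam)) over a grid
    (the grid search is an illustrative assumption, not a requirement).
    """
    n = len(x)
    lam = 2.0 * np.pi * np.arange(1, m + 1) / n
    I = cross_periodogram(x, x, m).real
    if grid is None:
        grid = np.linspace(-0.49, 0.49, 197)
    mean_log_lam = np.mean(np.log(lam))
    R = [np.log(np.mean(lam ** (2.0 * d) * I)) - 2.0 * d * mean_log_lam
         for d in grid]
    return float(grid[int(np.argmin(R))])

def fdls_beta(y, x, m):
    """Narrow band FDLS slope for a scalar regressor: OLS on frequencies 1..m."""
    Ixy = cross_periodogram(x, y, m).real
    Ixx = cross_periodogram(x, x, m).real
    return float(np.sum(Ixy) / np.sum(Ixx))
```

Under these assumptions, the three steps would read `d_hat = local_whittle_d(x, m)`, then `b_hat = fdls_beta(y, x, m)`, and finally `de_hat = local_whittle_d(y - b_hat * x, m)`, with the bandwidth `m` chosen by the user.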
Similarly to the narrow band FDLS estimator of the cointegration vector and the local Whittle QMLE of the integration orders, our two step estimator employs local assumptions on the joint spectral density matrix of the regressors and the errors near the zero frequency. It turns out that the limiting distribution of our estimator has a block diagonal covariance matrix, so that the estimates of the integration orders are asymptotically uncorrelated with the estimates of the cointegration vector. Thus, the marginal limiting distribution of the estimates of the integration orders equals the one derived by Lobato (1999), and in particular, it is unaffected by the fact that it is partly based on residuals.

In contrast to the FDLS estimator, we show that our two step estimator is asymptotically normal for the entire parameter space, i.e. $0 \le d_e < d < 1/2$, thus avoiding the condition $d + d_e < 1/2$ required by the FDLS estimator for asymptotic normality, see Christensen & Nielsen (2004). We also demonstrate that our two step estimator, in addition to being applicable for a wider range of integration orders, has smaller asymptotic variance than the FDLS estimator when the latter is asymptotically normal.

A similar approach to ours is considered by Velasco (2001) for bivariate nonstationary fractionally cointegrated processes, and similar results for the asymptotic distribution are reached using data tapering, following Lobato & Velasco (2000). However, Velasco's (2001) results are limited to a bivariate model, and require tapering and an additional user chosen bandwidth parameter to trim out the very first Fourier frequencies as in Robinson (1995b).

Following the semiparametric approach outlined above, our estimator enjoys the extremely general treatment of the short-term dynamics that has made the log-periodogram and local Whittle estimators popular among practitioners. In particular, the short-term dynamics does not even need to be specified, since only a degenerating band of frequencies around the origin is used. In contrast, for a parametric estimator to be consistent we would have to specify correctly the short-run dynamics of the model, employing e.g. a vector fractional ARIMA specification as in Dueker & Startz (1998). The obvious cost for this robustness is that the efficiency of the semiparametric estimator relative to a correctly specified parametric estimator converges to zero.

In order to demonstrate the feasibility of our methodology in finite samples, we present the results of a small Monte Carlo study. The results show that the performance of the estimator is very good with respect to bias and root mean squared error when no short-run dynamics is present. However, it also shows that the bandwidth parameter should not be too high in the presence of short-run dynamics to avoid biased results.

The stationary fractional cointegration model has many potential applications, especially in finance. Many financial time series, like the volatility of stock returns and exchange rates, have been found to be well described by stationary fractionally integrated processes, see e.g.
the seminal early contributions by Baillie, Bollerslev & Mikkelsen (1996), Breidt, Crato & de Lima (1998), and Harvey (1998) or the more recent studies by Andersen, Bollerslev, Diebold & Ebens (2001), Andersen, Bollerslev, Diebold & Labys (2001), and Christensen & Nielsen (2004). Our model then applies if it is assumed that the long memory is a common feature between two or more such processes, which would seem like a plausible assumption especially if the underlying assets are traded on the same market (exchange rate or stock market).

Finally, we offer an application to the relation between the volatility implied by option prices and the volatility subsequently realized in the stock market, which has previously been analyzed by, e.g., Christensen & Prabhala (1998), Christensen & Nielsen (2004), and Bandi & Perron (2004). The unbiasedness hypothesis in the option market implies a slope coefficient of unity in the implied-realized volatility relation, but the ordinary regression estimate is less than one-half. However, we conduct a stationary fractional cointegration analysis, and find that the volatility series are well described as being stationary fractionally cointegrated with $d$ approximately 0.45 and $d_e$ insignificantly different from zero. When accounting for the possibility of stationary fractional cointegration, the estimated slope coefficient is in one case insignificantly different from unity, thus supporting long-run unbiasedness of implied volatility as a forecaster of realized volatility. However, the evidence when applying our more efficient joint estimation procedure is not as clear-cut as in Christensen & Nielsen (2004) and Bandi &
Chapter 1
Perron (2004), and in particular the tests for long-run unbiasedness actually reject when larger bandwidths are employed. The paper is organized as follows. In the next section we present the model and set up the local Whittle likelihood and the assumptions necessary to prove our main result. We also present our rst theorem which shows the validity of the univariate local Whittle QMLE of d when based on residuals. In section 3 we state our main theorem which gives the asymptotic distribution of the joint semiparametric two step estimator. The asymptotic distribution is compared to that of the local Whittle QMLE of d and that of the narrow band FDLS estimator of . Section 4 presents the results of a Monte Carlo study, illustrating the nite sample behavior of the proposed estimator. Section 5 presents the empirical application to the impliedrealized volatility relation, and section 6 concludes. The proofs of the two theorems are provided in three appendices.
Stationary Fractional Cointegration Model
Let us now generalize the simple model described above. In particular, suppose the spectral density matrix of the $p$-vector $w_t = (x_t', e_t)'$ is

$$f(\lambda) \sim \Lambda^{-1} G \Lambda^{-1} \quad \text{as } \lambda \to 0^{+}, \tag{5}$$

where $\Lambda = \operatorname{diag}(\lambda^{d_1}, \ldots, \lambda^{d_p})$, $d_a \in \Theta = \{x \mid 0 \le x \le \Delta_1,\ 0 < \Delta_1 < 1/2\}$, $a = 1, \ldots, p$, and $G$ is a $p \times p$ real symmetric matrix. Here, $\operatorname{diag}(a_1, a_2, \ldots, a_k)$ denotes the diagonal $k \times k$ matrix with diagonal elements $a_1, a_2, \ldots, a_k$. Later we shall use the more general notation that, for $m_i \times m_i$ matrices $A_i$, $i = 1, \ldots, k$, $\operatorname{diag}(A_1, A_2, \ldots, A_k)$ is the $\sum_{i=1}^{k} m_i \times \sum_{i=1}^{k} m_i$ block-diagonal matrix with diagonal blocks $A_1, A_2, \ldots, A_k$. Furthermore, the symbol "$\sim$" means that the ratio of the left- and right-hand sides tends to one in the limit, element-by-element.

Equation (5) is the natural multivariate extension of (3), including multivariate fractional ARIMA models as a special case, and is also considered in previous work by e.g. Robinson (1995b), Lobato (1999), and Robinson & Yajima (2002). Thus, the elements of the vector $x_t$ can be integrated of different orders, i.e. $x_{at} \in I(d_a)$. This implies, by (4), that $y_t \in I(\max_{1 \le a \le p-1} d_a)$ (with the maximum taken over those $a$ where $\beta_a \ne 0$), such that the conceptual requirement that at least two of the variables in $(y_t, x_t')'$ must be integrated of the same order is automatically satisfied. Notice that $d_p$ now denotes the integration order of the error term, i.e. $e_t \in I(d_p)$.

For simplicity of presentation we assume that only one cointegration vector exists, i.e. that the cointegration rank is unity. A generalization to the case with cointegration rank $r < p$ along the lines of Engle & Granger (1987) should be possible, but is beyond the scope of this paper. Consistent semiparametric procedures for estimating the cointegration rank, $r$, from data have been explored recently by Robinson & Yajima (2002) in the stationary fractional cointegration case also considered here, and by e.g. Chen & Hurvich (2003b) and Nielsen & Shimotsu (2004) for the nonstationary case.
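We collect the parameters of interest in the $(2p-1)$-vector $\theta = (d_1, \ldots, d_p, \beta')'$. The Whittle approximation to the (negative) likelihood is

$$W(\theta, G) = \int_{-\pi}^{\pi} \left[ \log |f(\lambda)| + \operatorname{tr}\left( f^{-1}(\lambda) \operatorname{Re}(I(\lambda)) \right) \right] d\lambda,$$

where $I(\lambda) = (2\pi n)^{-1} \sum_{t=1}^{n} w_t e^{it\lambda} \sum_{t=1}^{n} w_t' e^{-it\lambda}$ is the periodogram matrix of $w_t$ at frequency $\lambda$. Note that $\beta$ enters the likelihood function through the relation $I_{pp}(\lambda) = I_{yy}(\lambda) - \operatorname{Re}\left( \beta' I_{xy}(\lambda) + I_{yx}(\lambda)\beta - \beta' I_{xx}(\lambda)\beta \right)$, where the subscripts $pp$, $yy$, and $xx$ denote the periodograms of $e_t$ (or equivalently, $w_{pt}$), $y_t$, and $x_t$, and the subscript $xy$ denotes the cross-periodogram between $x_t$ and $y_t$.

In the spirit of the semiparametric approach, we prefer the discrete local version of the likelihood

$$W(\theta, G) = \frac{1}{m} \sum_{j=1}^{m} \left[ \log |f(\lambda_j)| + \operatorname{tr}\left( f^{-1}(\lambda_j) \operatorname{Re}(I(\lambda_j)) \right) \right] \tag{6}$$

evaluated at the Fourier frequencies $\lambda_j = 2\pi j/n$, $j = 1, \ldots, m$. We let the bandwidth parameter $m = m(n)$ tend to infinity to gather information, but at a slower rate than $n$ to remain in a neighborhood of $\lambda = 0$. Note that the zero frequency has been left out of the summation in (6) to render the estimation invariant to mean terms. An integral version of (6) could also have been considered, but it would not share this property and it would be computationally more burdensome.

The local Whittle estimator of $(\theta, G)$ is defined as

$$(\hat\theta, \hat G) = \arg\min_{\theta, G} W(\theta, G),$$

over a compact subset of $\Theta^p \times \mathbb{R}^{p^2+p-1}$. We concentrate $G$ out of the likelihood by setting $\hat G(\theta) = m^{-1} \sum_{j=1}^{m} \Lambda_j \operatorname{Re}(I(\lambda_j)) \Lambda_j$ (where $\Lambda_j$ is $\Lambda$ evaluated at $\lambda = \lambda_j$), and write the concentrated likelihood as

$$L(\theta) = \log \left| \hat G(\theta) \right| - \frac{2 \left( \sum_{a=1}^{p} d_a \right)}{m} \sum_{j=1}^{m} \log \lambda_j \tag{7}$$

apart from constants. The local Whittle estimator of the parameter of interest, $\theta$, can then be defined in terms of the concentrated likelihood as

$$\hat\theta = \arg\min_{\theta} L(\theta), \tag{8}$$

where the minimization is carried out over the parameter space, a compact subset of $\Theta^p \times \mathbb{R}^{p-1}$.

We propose the following simple two step estimator (TSE) for the integration orders and the cointegrating vector,

$$\hat\theta^{(2)} = \hat\theta^{(1)} - \left[ \left. \frac{\partial^2 L(\theta)}{\partial\theta\, \partial\theta'} \right|_{\hat\theta^{(1)}} \right]^{-1} \left. \frac{\partial L(\theta)}{\partial\theta} \right|_{\hat\theta^{(1)}}, \tag{9}$$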
where $\hat\theta^{(1)}$ is a consistent initial estimator, e.g. the univariate local Whittle QMLE of Robinson (1995a) and the narrow band FDLS estimator of Robinson (1994a) and Christensen & Nielsen (2004). We could iterate (9) until convergence for higher order gains, but that does not change the first order asymptotics. It is well known that the TSE has the same asymptotic distribution as the QMLE, but we prefer the TSE for its simplicity.

To prove our main results we assume, with obvious implications for $y_t$, the following conditions on $w_t = (x_t', e_t)'$, the bandwidth, and the initial estimates.

Assumption 1 The spectral density matrix of $w_t$ given in (5) with typical element $f_{ab}(\lambda)$, the cross spectral density between $w_{at}$ and $w_{bt}$, satisfies

$$\left| f_{ab}(\lambda) - g_{ab} \lambda^{-d_a-d_b} \right| = O\left( \lambda^{\alpha-d_a-d_b} \right) \quad \text{as } \lambda \to 0^{+}, \quad a, b = 1, \ldots, p,$$

for some $\alpha \in (0, 2]$. The matrix $G$ satisfies $g_{ap} = g_{pa} = 0$ for $a = 1, \ldots, p-1$, and the leading $(p-1) \times (p-1)$ submatrix of $G$ is positive definite.

Assumption 2 $w_t$ is a linear process, $w_t = \mu + \sum_{j=0}^{\infty} A_j \varepsilon_{t-j}$, with square summable coefficient matrices, $\sum_{j=0}^{\infty} \|A_j\|^2 < \infty$. The innovations satisfy, almost surely, $E(\varepsilon_t \mid \mathcal{F}_{t-1}) = 0$, $E(\varepsilon_t \varepsilon_t' \mid \mathcal{F}_{t-1}) = I_p$, and the matrices $\mu_3 = E(\varepsilon_t \otimes \varepsilon_t \varepsilon_t' \mid \mathcal{F}_{t-1})$ and $\mu_4 = E(\varepsilon_t \varepsilon_t' \otimes \varepsilon_t \varepsilon_t' \mid \mathcal{F}_{t-1})$ are nonstochastic, finite, and do not depend on $t$, where $\mathcal{F}_t = \sigma(\{\varepsilon_s, s \le t\})$.

Assumption 3 As $\lambda \to 0^{+}$,

$$\frac{dA_a(\lambda)}{d\lambda} = O\left( \lambda^{-1} \|A_a(\lambda)\| \right), \quad a = 1, \ldots, p,$$

where $A_a(\lambda)$ is the $a$'th row of $A(\lambda) = \sum_{j=0}^{\infty} A_j e^{ij\lambda}$.

Assumption 4 The bandwidth parameter $m = m(n)$ satisfies

$$\frac{1}{m} + \frac{m^{1+2\alpha} (\log m)^2}{n^{2\alpha}} \to 0 \quad \text{as } n \to \infty.$$

Assumption 5 The initial estimates are consistent, and in particular satisfy

$$\hat d_a^{(1)} - d_a = O_p\left( m^{-1/2} \right) \quad \text{for } a = 1, \ldots, p, \tag{10}$$

$$\hat\beta_a^{(1)} - \beta_a = O_p\left( m^{-1/2} \lambda_m^{d_a-d_p} \right) \quad \text{for } a = 1, \ldots, p-1. \tag{11}$$
Our assumptions are a multivariate generalization of those in Robinson (1994a, 1995a), see also Lobato (1997, 1999). Since our assumptions are semiparametric in nature they naturally differ from those employed by e.g. Robinson & Hidalgo (1997) in their parametric setup, and are at least in some respects weaker than standard parametric assumptions. In particular, we avoid the standard assumptions (from time series regression theory with stationary variables) of independence between $x_t$ and $u_t$ and complete specification of $f(\lambda)$.

The first part of Assumption 1 specializes (5) by imposing smoothness conditions on the spectral density matrix of $w_t$ commonly employed in the literature. They are satisfied with $\alpha = 2$ if, for instance, $w_t$ is a vector fractional ARIMA process. The positive definiteness condition on the leading submatrix of $G$ is a no multicollinearity or no cointegration condition within the components of $x_t$, which is typical in single-equation cointegration models or regression models.

The condition that $g_{ap} = g_{pa} = 0$, for $a = 1, \ldots, p-1$, is new compared to previous research from the $I(1)$-$I(0)$ cointegration literature, but it relaxes the standard orthogonality condition from the time series regression literature with stationary variables, which is a more relevant comparison given our stationary setting. The condition ensures that the coherence between the regressors and the error process is zero at the origin, and it can be thought of as a local-to-zero version of the usual orthogonality condition from least squares theory. It is needed, for instance, to show that the estimation of $d_p$ is unaffected by the fact that it is based in part on estimated residuals. The condition is not quite as strong as it may seem at first. In particular, it does not require the regressors $x_t$ and the errors $e_t$ to be uncorrelated at frequencies away from the origin, but allows the regressor and error terms to share the same short- and medium-term dynamics. Thus, it relaxes the independence (or uncorrelatedness) assumption typically employed in the time series regression literature with stationary variables, see e.g. Robinson & Hidalgo (1997).
While the local orthogonality condition is necessary to derive the asymptotic normal distribution theory below, we conjecture that our TSE (9) remains consistent even if some $g_{ap} \ne 0$. The conjecture is based on the rate result of Robinson & Marinucci (2003), which states that the FDLS estimator of $\beta$ is $\lambda_m^{d_e-d}$-consistent for general non-zero $G$, and on the fact that the multivariate TSE of the integration orders of non-cointegrated variables in Lobato (1999) does not require this condition. This conjecture is partially confirmed in simulations in section 4 below. From the simulations it also seems that the estimation of $d_e$ is unaffected by the violation of the local orthogonality condition and the resulting slower rate of convergence for the initial estimate of $\beta$.

Assumptions 2 and 3 follow Robinson (1995a) and Lobato (1999) in imposing a linear structure on $w_t$ with square summable coefficients and martingale difference innovations with finite fourth moments. The assumption of constant conditional variance for the innovations could presumably be relaxed by assuming boundedness of the eighth moment as in Robinson & Henry (1999). Assumption 2 is satisfied, for instance, if $\varepsilon_t$ is an i.i.d. process with finite fourth moments. Under Assumption 2 we can write the spectral density matrix of $w_t$ as

$$f(\lambda) = \frac{1}{2\pi} A(\lambda) A^{*}(\lambda), \tag{12}$$

where the asterisk is complex conjugation combined with transposition.

Assumption 4 restricts the expansion rate of the bandwidth parameter $m = m(n)$. The bandwidth is required to tend to infinity for consistency, but at a slower rate than $n$ to remain in a neighborhood of the origin, where we have some knowledge of the form of the spectral density. When $\alpha$ is high, (5) is a better approximation to (12) as $\lambda \to 0^{+}$, and hence (by the second term of Assumption 4) a higher expansion rate of the bandwidth can be chosen. The weakest constraint is implied by $\alpha = 2$, in which case the condition is $m = o(n^{4/5})$ apart from a logarithmic term.

Finally, Assumption 5 states the required rates of convergence of the initial estimates. The cointegration vector may initially be estimated by the narrow band FDLS estimator, which satisfies (11) at least when $\max_a d_a + d_p < 1/2$, see Christensen & Nielsen (2004). For any $d_a$ in the stationary and invertible range, i.e. also when $\max_a d_a + d_p < 1/2$ is not satisfied, the narrow band frequency domain generalized least squares type estimate of Nielsen (2002) may be employed, which satisfies Assumption 5 for any such $d_a$, but other estimators would also satisfy the assumption. Note that Christensen & Nielsen (2004) and Nielsen (2002) employ assumptions like our Assumptions 1-4 (except the logarithmic term in Assumption 4) to derive their asymptotic distribution theory. In particular, their stationary setup also requires the local orthogonality condition in Assumption 1. Also note that Assumption 5 depends on the bandwidth $m$ to be used in the second stage of the TSE, so that a slow rate of convergence of the initial estimator would imply restrictions on $m$ and therefore limitations on the rate of convergence of the TSE.

For the initial estimates of the integration orders we suggest using the local Whittle QMLE or log-periodogram methods, which obviously satisfy (10) for $a = 1, \ldots, p-1$, see Robinson (1995a, 1995b). When the time series is not observed but instead is a residual, which is the case for $a = p$, the results of Robinson do not apply directly. Hassler et al.
(2000) and Velasco (2003) consider the estimation of $d$ for residuals when the observed variables are nonstationary. They show that, under complicated conditions on the bandwidth parameter, the use of the local Whittle QMLE or the log-periodogram estimator is indeed valid. In particular, their approaches assume both nonstationarity and the condition that $\min_{1 \le a \le p-1} d_a - d_p > 1/2$ and thus do not apply in our setting.

We next show that, under Assumptions 1-4 above and the condition (11) on the estimator of $\beta$, the local Whittle QMLE remains valid in our stationary model even when based on residuals. In particular, we do not need to introduce additional, complicated conditions on the expansion rate of the bandwidth parameter. Thus, suppose $d_p$ is estimated by

$$\hat d_p = \arg\min_{d} R(d), \tag{13}$$

$$R(d) = \log \hat G(d) - \frac{2d}{m} \sum_{j=1}^{m} \log \lambda_j, \qquad \hat G(d) = \frac{1}{m} \sum_{j=1}^{m} \lambda_j^{2d}\, \hat I_{pp}(\lambda_j),$$

where

$$\hat I_{pp}(\lambda_j) = I_{pp}(\lambda_j) + (\beta - \hat\beta)' \operatorname{Re}(I_{xx}(\lambda_j)) (\beta - \hat\beta) + 2 (\beta - \hat\beta)' \operatorname{Re}(I_{xp}(\lambda_j)) \tag{14}$$

is the periodogram of the residual series $\hat e_t = y_t - \hat\beta' x_t = e_t + (\beta - \hat\beta)' x_t$. The subscript $xp$ in (14) denotes the cross-periodogram between $x_t$ and $e_t$ (or equivalently $w_{pt}$). Our first theorem shows that, under our assumptions, the effect of using residuals in place of observed series is negligible.

Theorem 1 Let Assumptions 1-4 be satisfied and suppose $\hat d_p$ is given by (13) based on the residuals $\hat e_t = y_t - \hat\beta' x_t$, where $\hat\beta$ satisfies (11) (in place of $\hat\beta^{(1)}$). Let the true value be denoted by $d_{0p}$ and suppose $d_{0p}$ belongs to the interior of $\Theta$. Then, as $n \to \infty$,

$$\sqrt{m}\, (\hat d_p - d_{0p}) \overset{D}{\to} N(0, 1/4).$$

Proof. See appendix A.

The theorem demonstrates that we may choose to use the local Whittle QMLE as the initial estimator of the integration order of the residuals when the estimator of the cointegration vector satisfies (11). Hence, we have found feasible initial estimates of both the integration orders and the cointegration vector that satisfy Assumption 5. More generally, Theorem 1 in fact shows that the three step procedure employed by Marinucci & Robinson (2001) and Christensen & Nielsen (2004) is valid. That is, inference on $d_p$ may, in their setup, be conducted based on our distributional result in Theorem 1 and is equivalent to disregarding the fact that $\hat d_p$ is based on residuals.
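As a rough illustration of the estimator in (13), the following Python sketch evaluates $R(d)$ on a grid and returns the minimizer. It is a minimal sketch rather than the implementation used in this chapter: the function names `periodogram` and `local_whittle_d`, the use of a grid search, and the grid bounds are assumptions introduced here for illustration. Applied to a residual series $\hat e_t = y_t - \hat\beta' x_t$, it gives the residual-based estimate covered by Theorem 1.

```python
import numpy as np


def periodogram(x, m):
    """Periodogram of x at the first m Fourier frequencies (zero frequency excluded)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    lam = 2.0 * np.pi * np.arange(1, m + 1) / n
    dft = np.fft.fft(x)[1:m + 1]
    return lam, np.abs(dft) ** 2 / (2.0 * np.pi * n)


def local_whittle_d(x, m, d_grid=None):
    """Grid-search minimizer of R(d) = log G(d) - 2 d mean_j(log lambda_j)."""
    if d_grid is None:
        # Hypothetical search interval; the text restricts d to a compact set.
        d_grid = np.linspace(-0.49, 0.49, 981)
    lam, I = periodogram(x, m)
    log_lam = np.log(lam)
    # G(d) = mean_j lambda_j^{2d} I(lambda_j), evaluated for every d on the grid
    weights = np.exp(2.0 * d_grid[:, None] * log_lam[None, :])
    G = np.mean(weights * I[None, :], axis=1)
    R = np.log(G) - 2.0 * d_grid * log_lam.mean()
    return d_grid[np.argmin(R)]


# Usage: white noise stands in for a residual series with d_p = 0.
rng = np.random.default_rng(0)
e_hat = rng.standard_normal(4096)
m = int(4096 ** 0.65)
d_hat = local_whittle_d(e_hat, m)
se = 1.0 / (2.0 * np.sqrt(m))  # standard error implied by the N(0, 1/4) limit
print(d_hat, se)
```

A grid search is used only to keep the sketch free of optimizer dependencies; any bounded scalar minimizer could be substituted.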
Main Result
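We are now ready to state our main result regarding the TSE.

Theorem 2 Let $\theta_0$ denote the true value of the parameter vector $\theta$, and suppose $\theta_0$ belongs to the interior of the parameter space. Under $0 \le d_p < d_a < 1/2$, for $a = 1, \ldots, p-1$, (4), and Assumptions 1-5,

$$\sqrt{m}\, \operatorname{diag}\left( I_p,\ \lambda_m^{d_p} \tilde\Lambda_m^{-1} \right) \left( \hat\theta^{(2)} - \theta_0 \right) \overset{D}{\to} N\left( 0, \Omega^{-1} \right), \tag{15}$$

with

$$\Omega = \begin{pmatrix} E & 0 \\ 0 & F \end{pmatrix}, \tag{16}$$

$$E = 2 \left( I_p + G \odot G^{-1} \right), \tag{17}$$

$$F_{ab} = \frac{2 g_{ab}}{g_{pp} (1 - d_a - d_b + 2 d_p)}, \quad a, b = 1, \ldots, p-1, \tag{18}$$

where $\odot$ denotes the Hadamard product and $\tilde\Lambda_m$ is the leading $(p-1) \times (p-1)$ submatrix of $\Lambda_m = \operatorname{diag}(\lambda_m^{d_1}, \ldots, \lambda_m^{d_p})$.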
Proof. The asymptotic distribution of the TSE is the same as that of the QMLE, which is given by (15) if we can show the following. The score is such that

$$\sqrt{m}\, \operatorname{diag}\left( I_p,\ \lambda_m^{-d_p} \tilde\Lambda_m \right) \frac{\partial L(\theta_0)}{\partial\theta} \overset{D}{\to} N(0, \Omega), \tag{19}$$

and the Hessian satisfies

$$\operatorname{diag}\left( I_p,\ \lambda_m^{-d_p} \tilde\Lambda_m \right) \frac{\partial^2 L(\bar\theta)}{\partial\theta\, \partial\theta'} \operatorname{diag}\left( I_p,\ \lambda_m^{-d_p} \tilde\Lambda_m \right) \overset{p}{\to} \Omega \tag{20}$$

for all $\bar\theta$ such that $\|\bar\theta - \theta_0\| \le \|\hat\theta^{(1)} - \theta_0\|$. Notice that $\Omega$ is positive definite by Assumption 1 and the fact that the Hadamard product of two positive definite matrices is positive definite. We prove (19) in appendix B, where parts of the proof follow Lobato (1999) in applying the martingale difference array approximation technique by Robinson (1995a). (20) is proven in appendix C.

Some comments on our result are in order. Velasco (2001) reaches a result very similar to our Theorem 2 in a nonstationary setup, using tapered periodograms to account for the nonstationarity, following the approach of Lobato & Velasco (2000). However, Velasco's (2001) results are limited to a bivariate model, and require tapering and an additional user chosen bandwidth parameter (say $l$) to trim out the first $l$ Fourier frequencies as in Robinson (1995b).

The asymptotic distribution in (15) is block diagonal, such that the estimates of the integration orders are asymptotically uncorrelated with the estimate of the cointegration vector. In particular, the asymptotic distribution of the estimators of the integration orders is unaffected by the fact that they are based in part on residuals. This is due to the local orthogonality condition in Assumption 1, which ensures that the effect of the estimation of $\beta$ on the estimation of the integration orders is negligible. A discussion of the efficiency gains of the multivariate estimator of the integration orders over the univariate local Whittle QMLEs in Robinson (1995a) can be found in Lobato (1999, p. 136).

Let us have a closer look at the asymptotic distribution of the estimator of the cointegration vector in the simple two variable case. Suppose we observe two time series $y_t$ and $x_t$ both integrated of order $d < 1/2$, and that the error term is known to be integrated of order $d_e < d$. Then the asymptotic (marginal) distribution (15) of $\hat\beta^{(2)}$ in Theorem 2 reduces to

$$\sqrt{m}\, \lambda_m^{d_e-d} \left( \hat\beta^{(2)} - \beta_0 \right) \overset{D}{\to} N\left( 0, \frac{g_e (1 - 2d + 2d_e)}{2 g_x} \right),$$
where $g_x$ and $g_e$ are the elements of $G$, which is a diagonal $2 \times 2$ matrix. Thus, the variance depends on the signal-to-noise ratio $g_x/g_e$.

We compare our estimator of the cointegration vector with the narrow band FDLS estimator given by

$$\hat\beta_{FDLS} = \left[ \frac{1}{m} \sum_{j=1}^{m} \operatorname{Re}(I_{xx}(\lambda_j)) \right]^{-1} \frac{1}{m} \sum_{j=1}^{m} \operatorname{Re}(I_{xy}(\lambda_j)), \tag{21}$$

and asymptotically distributed according to

$$\sqrt{m}\, \lambda_m^{d_e-d} \left( \hat\beta_{FDLS} - \beta_0 \right) \overset{D}{\to} N\left( 0, \frac{g_e (1 - 2d)^2}{2 g_x (1 - 2d - 2d_e)} \right)$$

in the two variable case with $d + d_e < 1/2$, see Christensen & Nielsen (2004). Thus, for the comparison, we restrict ourselves to the case $d + d_e < 1/2$ since otherwise the narrow band FDLS estimator is non-normal. We note immediately that the convergence rates are the same, and in particular, they are very close to $\sqrt{n}$ for relevant parameter values. For instance, when $m = O(n^{0.5})$ and $d - d_e = 0.4$, which are values close to those in the empirical application below, we get that $\hat\beta^{(2)}$ and $\hat\beta_{FDLS}$ are $n^{0.45}$-consistent. The asymptotic relative efficiency of $\hat\beta^{(2)}$ with respect to $\hat\beta_{FDLS}$ is

$$\frac{Var(\hat\beta_{FDLS})}{Var(\hat\beta^{(2)})} = \frac{(1 - 2d)^2}{(1 - 2d)^2 - 4 d_e^2},$$

which equals unity if and only if $d_e = 0$, and exceeds unity otherwise. Thus, our two step estimator is more efficient and applies for a wider range of $(d, d_e)$ than the FDLS estimator.

The unknown parameters appearing in the asymptotic distribution (15) can be replaced by consistent estimates. In particular, the matrix of coherencies at the zero frequency, $G$, can be estimated consistently by $\hat G(\hat\theta^{(2)})$. Its asymptotic distribution could also be derived by application of the delta-method, see Robinson (1995a) and Lobato & Velasco (2000).

Based on Theorem 2 it is straightforward to construct Wald tests of hypotheses that involve both the integration orders and the cointegration vector. For instance, the linear restrictions $H_0 : R\theta = r$ can be tested by

$$W = \left( R\hat\theta - r \right)' \left( R \hat\Omega^{-1} R' \right)^{-1} \left( R\hat\theta - r \right) \overset{D}{\to} \chi^2_q \tag{22}$$
under the null, where $q$ is the number of linearly independent restrictions. Some hypotheses of general interest in this framework are (i) the components of $x_t$ are integrated of the same order, $\theta_1 = \ldots = \theta_{p-1}$, (ii) the errors have no long memory, $\theta_p = 0$, (iii) $x_{kt}$ is not present in the cointegrating relation, $\theta_{p+k} = 0$, or combinations of these. In the empirical application below we apply a combination of (ii) and (iii) since the unbiasedness hypothesis implies that the errors have no long memory and that $x_t$ has a unit coefficient, i.e. $\theta_2 = 0$ and $\theta_3 = 1$ in a bivariate setup.
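To make the Wald statistic in (22) concrete, the sketch below computes a statistic of this form for the joint hypothesis $d_e = 0$ and $\beta = 1$ in the bivariate setup. It is only an illustration: the function name `wald_test`, the argument `cov` (an estimate of the covariance matrix of $\hat\theta$, assumed here to already include the normalization from Theorem 2), and all numerical inputs are hypothetical and are not taken from the tables.

```python
import numpy as np


def wald_test(theta_hat, cov, R, r):
    """Wald statistic for H0: R theta = r, with cov the estimated covariance of theta_hat."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    R = np.atleast_2d(np.asarray(R, dtype=float))
    diff = R @ theta_hat - np.asarray(r, dtype=float)
    W = float(diff @ np.linalg.solve(R @ cov @ R.T, diff))
    q = R.shape[0]  # number of restrictions (rows of R, assumed linearly independent)
    return W, q


# Hypothetical bivariate example with theta = (d, d_e, beta)'
theta_hat = np.array([0.45, 0.10, 0.80])
cov = np.diag([0.05, 0.05, 0.10]) ** 2  # made-up standard errors, squared
R = np.array([[0.0, 1.0, 0.0],          # picks out d_e
              [0.0, 0.0, 1.0]])         # picks out beta
r = np.array([0.0, 1.0])

W, q = wald_test(theta_hat, cov, R, r)
# For q = 2 the chi-square survival function has the closed form exp(-W / 2).
p_value = np.exp(-W / 2.0)
print(W, q, p_value)
```

With these made-up inputs the statistic equals 8, which lies between the 5% and 1% critical values of the $\chi^2_2$ distribution (5.99 and 9.21).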
Finite Sample Performance
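In this section we investigate the finite sample behavior of the TSE. We consider the following three generating mechanisms for $x_t$ and $e_t$,

$$\begin{array}{lllll}
\text{Model A:} & (1-L)^{d} x_t = \varepsilon_{1t}, & (1-L)^{d_e} e_t = \varepsilon_{2t}, & & \rho = \operatorname{corr}(\varepsilon_{1t}, \varepsilon_{2t}) = 0, \\
\text{Model B:} & (1-L)^{d} x_t = u_{1t}, & (1-L)^{d_e} e_t = \varepsilon_{2t}, & u_{1t} = 0.5 u_{1,t-1} + \varepsilon_{1t}, & \rho = 0, \\
\text{Model C:} & (1-L)^{d} x_t = u_{1t}, & (1-L)^{d_e} e_t = \varepsilon_{2t}, & u_{1t} = 0.5 u_{1,t-1} + \varepsilon_{1t}, & \rho = 0.5,
\end{array}$$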
where $\varepsilon_t = (\varepsilon_{1t}, \varepsilon_{2t})'$ is independently and identically normally distributed with mean zero, unit variances, and contemporaneous correlation $\rho$. We then generate $y_t$ from (4) with $\beta = 1$. Models A and B satisfy all the assumptions of the model, whereas Model C violates the assumption of block diagonality of the $G$ matrix. In particular, the models are increasing in complexity, with Model A being very simple with no short-run dynamics. Model B adds short-run dynamics to the regressor and thus disturbs the signal due to the contamination of the low frequencies of $x_t$ from the higher frequencies, which are dominated by the short-run dynamics. In Model C we add the further complication that $g_{ap} \ne 0$, which violates Assumption 1 of our model, and hence our distribution theory no longer applies. As conjectured in section 2 above, the TSE is presumably still consistent but biased.

For each model, we use 10,000 replications for sample sizes $n = 200$ and $n = 500$. The bandwidth parameters chosen for the simulation study are $m = n^{0.5}$, $m = n^{0.6}$, and $m = n^{0.7}$, using the same bandwidth for the initial estimates. The reported results are robust to changes in the bandwidth parameters for the initial estimates.

Tables 1 and 2 about here

Tables 1 and 2 present the Monte Carlo bias and root mean squared error (RMSE) of the TSE for Model A with $(d, d_e) = (0.4, 0)$ and $(d, d_e) = (0.2, 0.1)$, respectively. The former is close to what is expected in many practical situations concerning e.g. volatility series as in the empirical application below, and the latter is a weaker form of cointegration where $d$ and $d_e$ are closer and there is long memory in the error term. Model A is simple and contains no short-run dynamics, and consequently the approximation (5) is close to (12) even for frequencies away from the origin. Hence the bias in the estimates is very low.
For almost all the specifications of the model the bias is negative, and it is uniformly lower than 0.06 in absolute value in Table 1 and 0.04 in Table 2. The RMSE is decreasing in the bandwidth for all three parameters, suggesting that for Model A larger bandwidths are preferable. We also notice that the bias and RMSE of $\hat d_e$ are only slightly higher than those of $\hat d$. Thus, the fact that $\hat d_e$ is based on residuals only increases the RMSE slightly in Model A.

Tables 3 and 4 about here

Tables 3 and 4 present the simulation results for Model B with $(d, d_e) = (0.4, 0)$ and $(d, d_e) = (0.2, 0.1)$, respectively. In Model B, the short-run dynamics in $u_{1t}$ influence the results significantly. That is, for $x_t$, (5) is a poor approximation to (12) when moving only a short distance away from the origin, due to the contamination from higher frequencies (short-run dynamics), and thus we expect biased results when the bandwidth is chosen too large. For $n = 200$ with $m = n^{0.6}$ or $m = n^{0.7}$, the estimates of $d$ suffer from severe bias of up to 0.22 for both configurations of integration orders. However, the bias decreases significantly when considering the large sample, $n = 500$, where the bias is 0.06 when $m = n^{0.6}$ and 0.15 when $m = n^{0.7}$ (for both configurations of integration orders). Furthermore, the RMSE of $\hat d$ reflects the bias resulting from large bandwidth parameters and is no longer decreasing in the bandwidth. Thus, in the presence of short-run dynamics a smaller bandwidth is preferable to avoid biased results. The estimates of $d_e$ and $\beta$ contain no bias even with the introduction of short-run dynamics in the regressor. Unreported simulations confirm that the exact same pattern emerges if the short-run dynamics are instead present in the error term $e_t$, i.e. the estimates of $d_e$ would be biased but the estimates of $d$ and $\beta$ would be virtually unaffected.

Tables 5 and 6 about here

Finally, Tables 5 and 6 present the simulation results for Model C with $(d, d_e) = (0.4, 0)$ and $(d, d_e) = (0.2, 0.1)$, respectively. In Model C we add another complication relative to Model B. That is, the local orthogonality condition between the regressors and the errors is now violated since $\rho = 0.5$, which introduces correlation between the error term and the regressor at all frequencies, and in particular at frequency zero. Thus, the underlying assumptions of our model and the distribution theory in sections 2 and 3 above are no longer satisfied for Model C, and consequently we expect the estimates to be biased but conjecture as in section 2 that the TSE remains consistent even in this case. In both Tables 5 and 6, the bias in the estimate of $d$ is unchanged relative to Model B and the estimate of $d_e$ remains roughly unbiased.
The simulations thus suggest that the results of Theorem 1 are unaffected by the violation of the local orthogonality condition and consequent slower convergence rate of the initial estimate. However, the violation of the local orthogonality condition has introduced a bias in the estimation of $\beta$. For the case $(d, d_e) = (0.4, 0)$ in Table 5, the bias ranges from 0.09 to 0.11 when $n = 200$, but decreases to between 0.07 and 0.10 when the larger sample $n = 500$ is considered. In both cases the bias is increasing in the bandwidth. The RMSE of $\hat\beta$ also reflects the bias and is in fact increasing in the bandwidth. For the case $(d, d_e) = (0.2, 0.1)$, where the cointegrating strength is much weaker, the bias is more pronounced but still increasing in the bandwidth and decreasing in the sample size. This is expected based on the rate result of Robinson & Marinucci (2003) that the FDLS estimator of $\beta$ is $\lambda_m^{d_e-d}$-consistent when $g_{ap}$ is allowed to be non-zero, and hence the rate of convergence is slower in the case $(d, d_e) = (0.2, 0.1)$ compared to the case $(d, d_e) = (0.4, 0)$. Thus, as in the case of short-run dynamics, the long-run coherence between the regressors and errors makes the smaller bandwidth preferable to avoid biases. Unreported simulations show qualitatively very similar results for different (or no) short-run dynamics in Model C.
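For readers who wish to experiment with the designs in this section, the following Python sketch generates one sample from Model A and computes the narrow band FDLS estimate (21) of $\beta$ in the bivariate case. It is a minimal sketch under stated assumptions: fractional integration is implemented by truncating the moving average expansion of $(1-L)^{-d}$ at the start of the sample, which need not match the method used for the reported simulations, and the names `frac_integrate`, `simulate_model_a`, and `fdls` are hypothetical.

```python
import numpy as np


def frac_integrate(eps, d):
    """Apply (1 - L)^{-d} to eps using MA weights psi_k, truncated at the sample start."""
    n = eps.size
    k = np.arange(1, n)
    # psi_0 = 1 and psi_k = psi_{k-1} * (k - 1 + d) / k
    psi = np.concatenate(([1.0], np.cumprod((k - 1 + d) / k)))
    return np.convolve(eps, psi)[:n]


def simulate_model_a(n, d, d_e, beta=1.0, rng=None):
    """Model A: (1-L)^d x = eps1, (1-L)^{d_e} e = eps2, rho = 0, y = beta * x + e."""
    rng = np.random.default_rng() if rng is None else rng
    x = frac_integrate(rng.standard_normal(n), d)
    e = frac_integrate(rng.standard_normal(n), d_e)
    return beta * x + e, x


def fdls(y, x, m):
    """Narrow band FDLS: summed Re(I_xy) over summed I_xx at Fourier frequencies 1..m."""
    wx = np.fft.fft(x)[1:m + 1]
    wy = np.fft.fft(y)[1:m + 1]
    # The common factor 1 / (2 pi n) cancels in the ratio.
    return np.sum(np.real(wx * np.conj(wy))) / np.sum(np.abs(wx) ** 2)


n = 500
y, x = simulate_model_a(n, d=0.4, d_e=0.0, rng=np.random.default_rng(42))
m = int(n ** 0.6)
print(fdls(y, x, m))
```

Models B and C could be obtained from the same building blocks by passing an AR(1) series in place of the first white noise input and by drawing correlated innovations, respectively.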
The Implied-Realized Volatility Relation
We proceed to conduct an actual empirical stationary fractional cointegration analysis. We analyze the relation between the volatility implied by option prices, and the subsequently realized return volatility of the underlying asset, following Christensen & Prabhala (1998), Bandi & Perron (2004), and particularly Christensen & Nielsen (2004). The presence of long memory (or fractional integration) in the volatility of financial assets has received a great deal of attention in recent years, see inter alia Robinson (1991), Ding, Granger & Engle (1993), Baillie et al. (1996), Comte & Renault (1996), Andersen & Bollerslev (1997a, 1997b), Breidt et al. (1998), Harvey (1998), Andersen, Bollerslev, Diebold & Ebens (2001), Andersen, Bollerslev, Diebold & Labys (2001), and Deo & Hurvich (2003).

If option market participants are rational and markets are efficient, the price of a financial option should reflect all publicly available information including information about expected future return volatility of the underlying asset. Given an observation on the price of an option, the implied volatility $\sigma_{IV}$ may be determined by inverting the option pricing formula with respect to $\sigma_{IV}$, and if this is done every period $t$ a time series $\sigma_{IV,t}$ results. Each implied volatility $\sigma_{IV,t}$ may now be considered as the market's forecast of the actually realized return volatility of the underlying asset. Here, realized volatility is simply the sample standard deviation $\sigma_{RV,t}$ of the realized return from $t$ to $t+1$. In practice, we work with the log volatilities, since they are close to Gaussian, see Andersen, Bollerslev, Diebold & Ebens (2001).

Christensen & Prabhala (1998) considered the regression specification

$$y_t = \alpha + \beta x_t + e_t, \tag{23}$$
where $y_t = \ln \sigma_{RV,t}$ and $x_t = \ln \sigma_{IV,t}$ are the log volatilities, and $\alpha$ and $\beta$ are intercept and slope coefficients. The unbiasedness hypothesis for option markets implies a $\beta$-coefficient of unity. A monthly sampling frequency was employed for $x_t$ and $y_t$. The underlying asset was the S&P100 stock market index, and $y_t$ was calculated from daily returns, see Christensen & Prabhala (1998) for the details. Basic OLS regression in (23) produced a $\beta$-estimate that was greater than zero and less than unity (Christensen & Prabhala (1998) also presented results without the logarithmic transform and the difference was negligible).

Inferences from OLS may be erroneous if $x_t$ and $y_t$ are fractionally cointegrated, see Robinson (1994a) and Robinson & Marinucci (2003), which is exactly what would be expected under the unbiasedness hypothesis. Thus, if $x_t = E_t(y_t)$ with $E_t(\cdot)$ denoting conditional expectation as of time $t$, then $\beta$ is unity and $e_t$ is serially uncorrelated. For a detailed description of the implied-realized volatility relation and its implications, see Christensen & Prabhala (1998). If volatility is fractionally integrated, as the empirical literature suggests (Andersen, Bollerslev, Diebold & Ebens (2001), Christensen & Nielsen (2004), and Bandi & Perron (2004) find fractional integration with $d$ around 0.40 to 0.45), whereas the forecasting error $e_t$ in (23) possesses only short memory, then $x_t$ and $y_t$ are fractionally cointegrated. This is in fact what the empirical results in Christensen & Nielsen (2004) and our empirical results below indicate. In
particular, Christensen & Nielsen (2004) considered a fractional cointegration analysis of (23), using first the univariate local Whittle estimator of Robinson (1995a) to estimate the integration orders of the raw data, then the narrow band FDLS estimator to estimate $\beta$, and finally the local Whittle estimator to estimate the integration order of the errors. It was found that accounting for the possibility of stationary fractional cointegration greatly improves the results, and in most cases produces $\beta$-estimates that are insignificantly different from unity.

The data we use are the same as those investigated by Christensen & Nielsen (2004), and are weekly data covering the period January 1, 1988, to December 31, 1995, resulting in $n = 417$ observations. The final data series are based on high-frequency data from the Berkeley Options Data Base (BODB), see the BODB User's Guide for a description. From the high-frequency options data, a 5-minute return series for the underlying S&P500 index is constructed for the period 9:00 AM to 3:00 PM each trading day. This results in a series of 147,022 observations. From this 5-minute return series we form the realized volatility $\sigma_{RV,t}$ over each one-week interval by taking the sample standard deviation of the 5-minute annualized returns in week $t$.

The implied volatilities are backed out from the Monday 10:00 AM quote, for the call of shortest maturity and closest to the money, using the standard option pricing formula corrected for dividends. This results in a weekly implied volatility series $\sigma_{IV,t}$ with different times to maturity since the options expire monthly. We convert this heterogeneous series to another weekly series $\tilde\sigma_{IV,t}$, that may be associated with the series $\sigma_{RV,t}$ of realized volatilities covering homogeneous nonoverlapping weekly intervals, by the formula

$$\tilde\sigma^2_{IV,t_i} = \frac{1}{d_i - d_{i-1}} \left( d_i\, \sigma^2_{IV,t_i} - d_{i-1}\, \sigma^2_{IV,t_{i+1}} \right), \tag{24}$$
where di is the number of days until expiration of IV,ti , starting with IV,t = IV,t for t corresponding to a one-week option and then applying the recursion (24). This is of course an approximation for implied volatilities, as opposed to realized volatilities where it is an identity. However, the approximation is a high-frequency measurement error, and consequently our semiparametric approach should be robust towards it. For the complete details of the construction of the data set and summary statistics, see Christensen & Nielsen (2004). In Table 7 we report the results of our stationary fractional cointegration analysis for bandwidths m = n0.50 = 20, m = n0.55 = 27, and m = n0.60 = 37, respectively. Following the suggestions from the Monte Carlo study in the previous section, the bandwidths are chosen to be quite small, and in particular m is slightly lower than in Christensen & Nielsen (2004). The rst column shows the initial estimates. For the integration orders d and de we choose Robinsons (1995a) univariate local Whittle estimates. To obtain an initial estimate of we use the frequency domain narrow band generalized least squares estimator of Nielsen (2002), since it is suspected that the condition d + de < 1/2 may not be satised in our application. Both sets of initial estimates use the same bandwidth as the TSE. The results are robust to changes in the bandwidth parameters for the initial estimates, see also Christensen & Nielsen (2004). The initial estimates are comparable to those found by Christensen & Nielsen (2004), 19
and in particular the series seem to be stationary ($d < 1/2$) and the errors are close to $I(0)$. The initial estimate of $\beta$ ranges from 0.70 to 0.78, which is well above the 0.3–0.4 that are typical for OLS estimates of $\beta$, see Christensen & Prabhala (1998) and Christensen & Nielsen (2004), but still suggests that implied volatility is a biased forecast of realized volatility.

Table 7 about here

In the next two columns we report the TSE (9) and the standard error of each parameter, respectively. The standard errors are calculated using our new distribution theory as the square root of the diagonal elements of the covariance matrix in Theorem 2. For $d$, the estimates are the same as the initial estimates of approximately 0.45–0.48, which is in line with previous evidence, e.g. Andersen, Bollerslev, Diebold & Ebens (2001), Christensen & Nielsen (2004), and Bandi & Perron (2004). Turning to the estimation of the parameters of primary interest, $d_e$ and $\beta$, we find somewhat different results for bandwidth $m = 20$ compared to $m = 27$ or $m = 37$. The TSE point estimate of $d_e$ is smaller for the small bandwidth and increases with the bandwidth, ranging from 0.15 to 0.17, and is slightly higher than the initial estimates. Similarly, the TSE of $\beta$ is larger for the small bandwidth and decreases (rapidly) with the bandwidth, being estimated at 0.77 for the small bandwidth and 0.68–0.71 for the larger bandwidths. For the small bandwidth, $m = 20$, the estimates of $d_e$ and $\beta$ are insignificantly different from zero and unity, respectively. However, the $\beta$ estimates for $m = 27$ and $m = 37$, the two largest bandwidths considered, are significantly different from unity, and the $d_e$ estimate for the largest bandwidth is significantly different from zero. The results so far are consistent with the notion that realized and implied volatility are well described as stationary but fractionally integrated series, and that they tend to move together in the sense that the errors in (23) have less memory.
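The individual significance statements above can be traced back to the reported point estimates and standard errors. The following sketch is not part of the original analysis: it forms simple asymptotic t-ratios from the two step estimates transcribed from Table 7 and compares them with the two-sided 5% normal critical value of 1.96. The 5% test size and all helper names are assumptions made for illustration.

```python
# Two step estimates and standard errors transcribed from Table 7,
# keyed by bandwidth m: parameter -> (estimate, standard error).
TABLE7_TSE = {
    20: {"d_e": (0.1507, 0.1117), "beta": (0.7717, 0.1313)},
    27: {"d_e": (0.1673, 0.0960), "beta": (0.7098, 0.1115)},
    37: {"d_e": (0.1766, 0.0821), "beta": (0.6810, 0.0905)},
}

# Hypothesized values: weakly dependent errors and a unit coefficient.
NULL_VALUES = {"d_e": 0.0, "beta": 1.0}

Z_CRIT_5PCT = 1.96  # assumed test size: two-sided 5% normal critical value


def t_ratio(estimate, std_error, null_value):
    """Asymptotic t-ratio for H0: parameter = null_value."""
    return (estimate - null_value) / std_error


def rejections(table=TABLE7_TSE, crit=Z_CRIT_5PCT):
    """Map each bandwidth to {parameter: True if H0 is rejected}."""
    result = {}
    for m, row in table.items():
        result[m] = {
            name: abs(t_ratio(est, se, NULL_VALUES[name])) > crit
            for name, (est, se) in row.items()
        }
    return result


if __name__ == "__main__":
    for m, flags in rejections().items():
        print(m, flags)
```

Under these assumptions neither hypothesis is rejected for $m = 20$, only the unit coefficient is rejected for $m = 27$, and both are rejected for $m = 37$, in line with the discussion of Table 7.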
The interesting question is how closely they move together and whether the errors are in fact only weakly dependent. To answer this question, the fourth column shows the Wald test statistic (22) of the joint hypothesis that $d_e = 0$ and $\beta = 1$, which is asymptotically distributed as a $\chi^2$ random variable with 2 degrees of freedom (the 5% and 1% critical values are 5.99 and 9.21, respectively). The test rejects at the 1% level for the two larger bandwidth choices, $m = 27$ and $m = 37$, casting some doubt on the conclusions from the literature that implied and realized volatility can indeed be described by a stationary fractionally cointegrated relation with unit coefficient and only weakly dependent errors. However, the results for the smaller bandwidth, $m = 20$, do suggest that all long memory properties in volatility are common features for implied and realized volatility, pointing towards a unit coefficient and weakly dependent errors.

The remaining columns in Table 7 present the estimates, standard errors, and Wald test statistic when (9) is iterated until convergence. These results are very similar to the results for the TSE, and consequently the Wald statistics offer the same conclusions as for the TSE.

Thus, similarly to Christensen & Nielsen (2004) and Bandi & Perron (2004), we find that the volatility series are well described as stationary fractionally integrated series and we cannot
reject that implied and realized volatility indeed are stationary fractionally cointegrated. That is, the residuals are of lower order of fractional integration than the volatility series themselves, $d_e < d$. In fact, our results offer some support to the even stronger relation that $d_e = 0$. Under long-run unbiasedness, we would expect the series to follow each other closely resulting in a unit $\beta$-coefficient, which is also somewhat supported by our analysis. However, the evidence when applying our more efficient joint estimation procedure is not as clear-cut as in Christensen & Nielsen (2004) and Bandi & Perron (2004), and in particular the tests for long-run unbiasedness actually reject when larger bandwidths are employed.
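For a $\chi^2$ variable with 2 degrees of freedom the survival function has the closed form $\exp(-x/2)$, so the critical values quoted for the Wald test, and approximate p-values for the statistics in Table 7, can be checked in a few lines. The sketch below is illustrative only; the function names are hypothetical and the statistics are the two step values transcribed from Table 7.

```python
import math


def chi2_2df_critical_value(alpha):
    """Upper-alpha critical value of a chi-square(2): solves exp(-x/2) = alpha."""
    return -2.0 * math.log(alpha)


def chi2_2df_p_value(stat):
    """P(X > stat) for X distributed chi-square with 2 degrees of freedom."""
    return math.exp(-stat / 2.0)


# Wald statistics for the two step estimator, transcribed from Table 7,
# keyed by bandwidth m.
WALD_TSE = {20: 4.8452, 27: 9.8175, 37: 17.057}


def decisions(alpha, stats=WALD_TSE):
    """Map each bandwidth to True if the joint hypothesis is rejected."""
    crit = chi2_2df_critical_value(alpha)
    return {m: w > crit for m, w in stats.items()}


if __name__ == "__main__":
    print(round(chi2_2df_critical_value(0.05), 2))
    print(round(chi2_2df_critical_value(0.01), 2))
    for m, w in WALD_TSE.items():
        print(m, w, round(chi2_2df_p_value(w), 4))
```

The closed form is specific to 2 degrees of freedom; for other degrees of freedom a numerical quantile routine would be needed.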
Conclusion
We consider a local Whittle analysis of a stationary fractionally cointegrated model. In particular, we propose a two step estimator, which is equivalent to the local Whittle QMLE, to jointly estimate the integration orders of the regressors, the integration order of the errors, and the cointegration vector. The estimator is semiparametric in the sense that it employs local assumptions on the joint spectral density matrix of the regressors and the errors near the zero frequency. By using a degenerating part of the periodogram near the origin, the approach is invariant to short-run dynamics, which would have to be specified correctly in a parametric procedure.

The two step estimator is based on consistent initial estimators. We show that such estimators exist, and in particular, our Theorem 1 shows that the local Whittle QMLE of the integration order of the residuals is unaffected by the fact that it is based on residuals. More generally, our Theorem 1 in fact shows that the three step procedure employed by Marinucci & Robinson (2001) and Christensen & Nielsen (2004) is valid. That is, inference on $d_e$ may, in their setup, be conducted based on our distributional result in Theorem 1 and is equivalent to disregarding the fact that the estimator is based on residuals.

In our stationary fractionally integrated case, we show that the two step estimator is asymptotically normal with block diagonal covariance matrix for the entire stationary region of the integration orders. Thus, the estimates of the integration orders are asymptotically uncorrelated with the estimate of the cointegration vector. Furthermore, our estimator of the cointegration vector is asymptotically normal for a wider range of integration orders than the narrow band frequency domain least squares estimator of Robinson (1994a), analyzed by Marinucci & Robinson (2001), Robinson & Marinucci (2003), and Christensen & Nielsen (2004), and is superior with respect to asymptotic variance when the latter is normal.
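To make the semiparametric idea concrete, the following sketch implements the univariate local Whittle estimator of Robinson (1995a), which serves as the initial estimator of the integration orders. It minimizes $R(d) = \log\left(m^{-1}\sum_{j=1}^m \lambda_j^{2d} I(\lambda_j)\right) - 2d\, m^{-1}\sum_{j=1}^m \log\lambda_j$ over the first $m$ Fourier frequencies only. It is a minimal illustration, not the joint two step estimator of this chapter: the search interval, the golden-section optimizer, the direct (slow) Fourier transform, and all function names are assumptions made here.

```python
import cmath
import math
import random


def bandwidth(n, exponent):
    """Bandwidth m = integer part of n**exponent."""
    return int(n ** exponent)


def periodogram(x, m):
    """Periodogram at the first m Fourier frequencies lambda_j = 2*pi*j/n."""
    n = len(x)
    freqs, pgram = [], []
    for j in range(1, m + 1):
        lam = 2.0 * math.pi * j / n
        w = sum(xt * cmath.exp(1j * lam * t) for t, xt in enumerate(x, start=1))
        freqs.append(lam)
        pgram.append(abs(w) ** 2 / (2.0 * math.pi * n))
    return freqs, pgram


def local_whittle_objective(d, freqs, pgram):
    """R(d) = log(mean(lambda_j^(2d) I_j)) - 2 d mean(log lambda_j)."""
    m = len(freqs)
    g_hat = sum(lam ** (2.0 * d) * i for lam, i in zip(freqs, pgram)) / m
    return math.log(g_hat) - 2.0 * d * sum(math.log(lam) for lam in freqs) / m


def golden_section_min(f, a, b, tol=1e-6):
    """Minimize a unimodal function f on [a, b] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    x1 = b - inv_phi * (b - a)
    x2 = a + inv_phi * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:
            b, x2, f2 = x2, x1, f1
            x1 = b - inv_phi * (b - a)
            f1 = f(x1)
        else:
            a, x1, f1 = x1, x2, f2
            x2 = a + inv_phi * (b - a)
            f2 = f(x2)
    return 0.5 * (a + b)


def local_whittle(x, m, lower=-0.49, upper=0.99):
    """Local Whittle estimate of d; the search interval is an assumption."""
    freqs, pgram = periodogram(x, m)
    return golden_section_min(
        lambda d: local_whittle_objective(d, freqs, pgram), lower, upper
    )


if __name__ == "__main__":
    random.seed(1)
    noise = [random.gauss(0.0, 1.0) for _ in range(512)]
    print(local_whittle(noise, bandwidth(512, 0.6)))
```

The golden-section search is adequate here because $R(d)$ is convex in $d$ (a log-sum-exp term minus a linear term). In practice a fast Fourier transform would replace the direct sum in `periodogram`.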
To demonstrate the feasibility of our methodology in finite samples we have presented the results of a small Monte Carlo study. The results show that the performance of the estimator is very good with respect to bias and root mean squared error when no short-run dynamics are present. However, they also show that the bandwidth parameter should not be too high in the presence of short-run dynamics to avoid biased results.
We have applied our methodology to financial volatility series. The unbiasedness hypothesis of option markets implies a coefficient of unity in the implied-realized volatility relation, but the ordinary regression estimate is less than one-half. We show that implied and realized volatility are well described as being stationary fractionally cointegrated. When accounting for this, our estimates of this coefficient are about twice as large as before and for the smallest bandwidth even insignificantly different from unity. Furthermore, we are unable to reject the joint hypothesis of unit coefficient and weak dependence of the error process, when considering the specification with the smallest bandwidth parameter. The analysis demonstrates that useful long-run relations can be derived even among stationary series.
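The implied volatility series used in the application is built by converting quotes with different times to maturity into homogeneous weekly volatilities via (24). The sketch below illustrates one reading of that conversion, under the assumption that variances are additive over days, so that the variance over the $d_i$ days until expiration is the day-weighted average of the current week's variance and the variance over the remaining days. The function name, the argument layout (one expiration cycle in chronological order), and the error handling are hypothetical.

```python
import math


def weekly_implied_vol(days_to_expiry, long_horizon_vol):
    """Convert implied volatilities with heterogeneous maturities within one
    expiration cycle into one-week implied volatilities.

    days_to_expiry[k] is the number of days until expiration at weekly date k
    (chronological order, so the values decrease); long_horizon_vol[k] is the
    implied volatility backed out at date k for the period until expiration.
    """
    if len(days_to_expiry) != len(long_horizon_vol):
        raise ValueError("inputs must have the same length")
    weekly = []
    last = len(days_to_expiry) - 1
    for k in range(len(days_to_expiry)):
        d_now = days_to_expiry[k]
        var_now = long_horizon_vol[k] ** 2
        if k == last:
            # The option expiring at the end of the week needs no conversion.
            var_week = var_now
        else:
            d_next = days_to_expiry[k + 1]
            var_next = long_horizon_vol[k + 1] ** 2
            if d_next >= d_now:
                raise ValueError("days to expiry must decrease within a cycle")
            # Assumed additivity: d_now*var_now = (d_now-d_next)*var_week
            #                                     + d_next*var_next
            var_week = (d_now * var_now - d_next * var_next) / (d_now - d_next)
        if var_week < 0.0:
            # The relation is only approximate for implied volatilities,
            # so a negative implied weekly variance is possible in practice.
            raise ValueError("negative weekly variance at position %d" % k)
        weekly.append(math.sqrt(var_week))
    return weekly


if __name__ == "__main__":
    # Weekly variances 0.04, 0.09, 0.01 imply these long-horizon volatilities.
    true_var = [0.04, 0.09, 0.01]
    long_vol = [math.sqrt(sum(true_var[k:]) / (3 - k)) for k in range(3)]
    print(weekly_implied_vol([21, 14, 7], long_vol))
```

When the long-horizon variances are exact averages of weekly variances, as in the usage example, the conversion recovers the weekly values exactly, which mirrors the remark that the relation is an identity for realized volatilities.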
Appendix A: Proof of Theorem 1

First we show that (log n) (dp d0p ) 0. Rewriting equations (A.1)-(A.4), (A.24), (A.25), and (A.30) from the proof of Theorem 3 of Robinson (1997) it suces to show that m2(d0p 1 )1 (log n)2 m
m X p
j=1 m X 21 2
j 2(1 d0p ) |hj | 0 j |hj | 0

p
for 1 < d0p , for some > 0,
(25)
(26)
j=1
m 1 X p (j/q)2(1 d0p ) 1 hj 0, m j=1
m (log n)2 X p |hj | 0, m j=1
(27)
(28)
where $q = \exp\left(m^{-1}\sum_{j=1}^{m}\log j\right)$ and

$$h_j = \frac{\hat I_{pp}(\lambda_j) - I_{pp}(\lambda_j)}{G_{pp}\lambda_j^{-2d_{0p}}} \qquad (29)$$

is a normalized measure of the impact of using the periodogram of residuals instead of the periodogram of observed data. Our assumption that $d_{0p} \ge 0$ allows a slight simplification of the conditions (25)-(28) compared to their counterparts in Robinson (1997) which shortens this proof somewhat. It could easily be relaxed at the expense of a longer proof. Note that, by Assumption 1, (11), (14), and Robinson (1995b, Theorem 2), the random variables $h_j$ satisfy (30) |hj | = Op m1 + m1/2 . m
it is easy to show that (25)-(27) are Op (log m) m1 + m1/2 , m Op (log n)2 (log m) m1 + m1/2 , m , Op (log n)2 m1 + m1/2 m
Using (30) and the fact that m X sup m1 (log m)1 j = O (1) for C (1, ) , 1C j=1
(31)
respectively. We will need (31) throughout all our proofs, and we shall use it automatically and without special reference in what follows. Using the fact that q m/e (e = 2.71...) as n , the left-hand side of (28) is bounded by
m m 1 X 1 X j 2(1 d0p ) |hj | + |hj | , m m m j=1 j=1
where
which is negligible by (25) and (27). Thus, we have shown (log n)-consistency of dp and proceed to prove the asymptotic distribution result. Following Robinson (1995a) we need to show that G (d) G (d) 0,e 1 0, e , (32) sup = op G (d) (log m)6 dN p k = 0, 1, 2, (33) Fk, (d0p ) Fk,e (d0p ) 0, e Re (d0p ) Re (d0p ) p 0, m (34) d d Gk,a (d) = 1 X (log j )k 2d Iaa (j ) , j m
j=1 m j=1 m
1 X 2(dd0p ) j , G (d) = Gpp m Fk,a (d) = Ra (d) d

j=1
m 1 X (log j)k 2d Iaa (j ) , j m m G1,a (d) 2 X log j , G0,a (d) m j=1
= 2
and N is dened in Robinson (1995a, p. 1634). By (4.7) in Robinson (1995a), (32) follows if j m X X j 12 p (log m)6 j 2 hk 0 m
j=1 k=1
e where G0,e (d0p ) = Gpp + op (1) by Robinson (1995a) and thus also G0, (d0p ) = Gpp + op (1) in view of (33) with k = 0. Furthermore, G1, (d0p ) G1, (d0p ) G1,e (d0p ) + G1,e (d0p ), e e R (d ) where G1,e (d0p ) = G0,e (d0p ) e 0p = (Gpp + op (1)) Op m1/2 = op (1) by Robinson (1995a). d Consequently, to complete the proof, we need to show that p m G1, (d0p ) G1,e (d0p ) 0. e The left-hand side is X m Gpp (log j ) hj m j=1 Gpp (log n) X |hj | m j=1 = Op (log n) (log m) m1/2 + m
m
by the same arguments as applied to (27) above. The left-hand side of (34) is e G1,e (d0p ) G1, (d0p ) 2 m e G0,e (d0p ) G0, (d0p ) e G1,e (d0p ) G1, (d0p ) G1, (d0p ) G0, (d0p ) G0,e (d0p ) e e , +2 m 2 m e G0,e (d0p ) G0,e (d0p ) G0, (d0p )
which holds by (26) and (27) above, when changing the order of the summations. The left-hand side of (33) is bounded by m m X Gpp Gpp (log m)k X k (log j) hj |hj | m m j=1 j=1 = Op (log m)k m1 + m1/2 m
as in (27).
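The proof above uses the fact that $q = \exp\left(m^{-1}\sum_{j=1}^{m}\log j\right) = (m!)^{1/m}$ behaves like $m/e$ for large $m$, which follows from Stirling's formula. As a quick numerical illustration (not part of the original argument; the function names are hypothetical), the ratio $q/(m/e)$ can be evaluated with the log-gamma function:

```python
import math


def geometric_mean_index(m):
    """q = exp(m^{-1} * sum_{j=1}^m log j) = (m!)^(1/m), via log-gamma."""
    return math.exp(math.lgamma(m + 1) / m)


def ratio_to_m_over_e(m):
    """Ratio q / (m / e); tends to one as m grows."""
    return geometric_mean_index(m) / (m / math.e)


if __name__ == "__main__":
    for m in (10, 100, 1000, 10000):
        print(m, ratio_to_m_over_e(m))
```

The ratio approaches one from above, at the slow rate $(2\pi m)^{1/(2m)}$ implied by Stirling's formula.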
Appendix B: Limit of the Score

In this appendix and the following one, we ignore the subscript zero on the true values of the parameters $d$, $\beta$, and $G$ to lighten the notation.
Applying the Cramr-Wold device we need to show that L ( ) d 0 D N 0, 0 0 m diag Ip , m p m for any non-null vector . The derivatives with respect to da and a are L (0 ) da L (0 ) a
m 2 X da a j Re g j Iwa (j ) j 1 , m
(35)
a = 1, ..., p,
(36)
2 m
j=1 m X j=1
j p Re (g p j Iwa (j )) ,
a = 1, ..., p 1,
(37)
P where j = log j m1 m log j, g a is the a0 th row of G1 , and Iwa () is the crossj=1 periodogram between wt and wat . In both (36) and (37) we replaced G (0 ) by G since G (0 ) G = Op m1/2 , (38)
see Lobato (1999).
The part of the left-hand side of (35) corresponding to (37) is

p1 X a=1 p1 X a=1 p1 X a=1
2 da d a+p m p 2 da d a+p m p 2 da d a+p m p
m X j=1 m X j=1 m X j=1
j p Re (g p j Iwa (j )) j p Re (g p j (Iwa (j ) A (j ) J (j ) A (j ))) a j p Re (g p j A (j ) J (j ) A (j )) , a

d d
(39)
(40)
where J () is the periodogram of t and Aa () is the a0 th row of A (). By (C.2) of Lobato (1999), which is implied by our assumptions, p1 ! X 1 m 2/3 1/3 m (log m) + log m + 1/4 (39) = Op m n a=1 ! (log m)2/3 log m 1 p + 1/4 0. + = Op 1/6 m m n 25
Write (40) as n 2 m 2 X dp 1 X itj da d a+p m p j Re g p j A (j ) t e Aa (j ) 2n t=1 m a=1

p1 X p1 X j=1 m
a a+p m
By denition of f (), see (12), and using Assumption 1 m X 2dp 1 da d (41) = max O m p j fpa (j ) 1ap1 m j=1 m X da +dp 1 da d , j = max O m p 1ap1 m
j=1
1 X dp j Re (g p j A (j ) A (j )) (41) a m a=1 j=1 n ! ! p1 m X 1X 0 1 X dp da dp p a+p m j Re g j A (j ) t Ip Aa (j ) (42) n t=1 t m a=1 j=1 p1 m n X X dp XX 2 1 da d a+p m p j Re g p j A (j ) t 0 ei(ts)j A (j ) . (43) s a 2n m
d dp a=1 j=1 t=1 s6=t
which is O n2 m1+2 (log m)2 0 by Assumption 4. For equation (42), note that t 0 Ip t P is a martingale dierence sequence with respect to Ft implying that n1 n t 0 Ip = t t=1 Op (n1/2 ). Thus, m X 2dp 1 1 da d (42) = max Op m p j fpa (j ) 1ap1 m n j=1 m X da +dp 1 da d j = max Op m p 1ap1 nm j=1 1/2+ = Op m (log m) . We are left with (43), which we rewrite as
n X t=1
0 t
t1 p1 m X X a+p da dp X dp m j Re A0 (j ) j g p0 ei(ts)j Aa (j ) s . n m s=1 a=1 j=1
The corresponding term for (36), derived by Lobato (1999, p. 141), is given by
n X t=1
0 t
t1 p1 XX
m a X da j j Re A0 (j ) j g a0 ei(ts)j Aa (j ) s . n m s=1 a=1 j=1
Thus, (35) has the same asymptotic distribution as ctn = 1 X (j1 + j2 ) cos (tj ) , n m
j=1 a=1 p1 X dp da d j a+p m p a=1 p X m
Pn
0 t=1 t
Pt1
s=1 cts,n s ,
where we dene
j1 = j j2 =
da a Re A0 (j ) j g a0 Aa (j ) + A0 (j ) g a j A (j ) , a j
Notice that, by construction, kj1 k = O (1) and kj2 k = O max1ap1 (m/j)da dp . P Since ztn = 0 t1 cts,n s is a martingale dierence array with respect to Ft = ({s , s t}), t s=1 we can apply the CLT for martingale dierence arrays if (see Brown (1971) or Hall & Heyde (1980, chp. 3.2))
n X
Re A0 (j ) j g p0 Aa (j ) + A0 (j ) g p j A (j ) . a
t=1 n X t=1
X X 2 2p1 2p1 p E ztn Ft1 a b ab 0,

a=1 b=1
(44) (45)
2 E ztn 1 (|ztn | > ) 0 for all > 0.

n X t=1 n X
A sucient condition for (45) is 4 E ztn 0. t1 t1 XX 0 0 c0 s ts,n t t ctr,n r Ft1 ! (47) (46)
First, to show (44),

n X t=1
2 ztn Ft1
t=1 s=1 r=1 n t1 XX 0 c0 s ts,n cts,n s t=1 s=1 n t1 XXX 0 c0 + s ts,n ctr,n r . t=1 s=1 r6=s
(48)
by(D.10) and (D.11) of Lobato (1999). It is immediate by denition of csn that kcsn k = 1 Pm kj1 + j2 k = O n1 m1/2 (log m) using (31). Dene the functions Ha () = O (n m) j=1 27
The term (48) has mean zero and variance !2 ! n n t1 X u1 X X X u1 X kcsn k2 + kcus,n k2 kcts,n k2 O n
s=1 t=3 u=2 s=1 s=1
(49)
P da d dp Re A0 () g p0 Aa () + A0 () g p A () such that 2j = p1 a+p m p Ha (j ), Ha () = a a=1 dp da 1 dp da O as 0+ , and Ha () is dierentiable with Ha () / = O as 0+ by Assumption 3. Now we can derive an alternative bound as da d m m p X kcsn k = O max Ha (j ) cos (sj ) 1ap1 n m j=1 j da dp m1 X X max m = O (Ha (j ) Ha (j+1 )) cos (sk ) 1ap1 n m j=1 k=1 m da dp X Ha (m ) max m +O cos (sj ) 1ap1 n m
j=1
from summation by parts. Using the Mean Value Theorem it follows that Ha (j )Ha (j+1 ) = Ha ( ) Ha ( ) (j+1 j ) j = 2 j , and the bound is n j X 1 dp da 1 n j cos (sk ) kcsn k = O max 1ap1 n m j=1 k=1 m da d m p dp da X m +O max cos (sj ) 1ap1 n m j=1 log m = O s m
da d m1 m p X
P using also l cos (sj ) = O (n/s), see Zygmund (2002, p. 2). This bound is better when j=1 s > n/m. Thus, we nd that
n X s=1
kcsn k2
n X m (log m)2 X (log m)2 = O + n2 s2 m s=1 s=[n/m]+1 ! (log m)2 = O , n

[n/m]
implying that the rst term of (49) is O(n1 (log m)4 ). The second term of (49) is ! [n/2] n X X kcsn k2 s kcsn k2 , O n
s=1 s=1
following the analysis in Robinson (1995a, pp. 1646-1647). Applying the latter bound we nd that [n/2] [n/2] X X 1 s kcsn k2 = O sm s=1 s=1 log n = O m and (49) = O n1 (log m)4 + m1 (log n) (log m)2 . P P We still need to show that the mean of (47) is asymptotically equal to 2p1 2p1 a b ab . a=1 b=1 Thus, E (47) =
n t1 XX t=1 s=1 n t1 XX t=1 s=1
0 E tr c0 ts,n cts,n s s tr c0 ts,n cts,n
by Assumption 2. Rewrite this expression as

n t1 m m XXX X
t=1 s=1 j=1 j 0 =1
n t1 m XXX t=1 s=1 j=1
n t1 m XXX t=1 s=1 j=1 n t1 m XXX t=1 s=1 j=1
1 tr 0 j1 cos2 ((t s) j ) j1 2 n2 m 1 2 n2 m 1 2 n2 m
1 tr 0 + j2 0 0 1 + j 0 2 cos ((t s) j ) cos (t s) j 0 j1 j 2 n2 m (50)
n t1 m m XXX X n t1 m m XXX X n t1 m m XXX X
2 tr 0 j2 cos2 ((t s) j ) j1
tr 0 j2 cos2 ((t s) j ) j2
(51)
(52)
1 2 n2 m 1 2 n2 m 1 2 n2 m
t=1 s=1 j=1 j 0 6=j
tr 0 j 0 1 cos ((t s) j ) cos (t s) j 0 j1 tr 0 j 0 2 cos ((t s) j ) cos (t s) j 0 j2
(53)
(54)
t=1 s=1 j=1 j 0 6=j
t=1 s=1 j=1 j 0 6=j
2 tr 0 j 0 2 cos ((t s) j ) cos (t s) j 0 . j1 Pp

a=1
(55)
It was shown by Lobato (1999) that (50) is asymptotically equal to 29
Pp
b=1 a b Eab
and
P and, using that t=1 t1 cos ((t s) j ) cos (t s) j 0 = n/2 j j 0 , we can 6= bound for s=1 Pm 1 2da 2dp Pm 2 dp da 0dp da = O (log m) m/n . Sim(54) by max1ap1 O (nm) m j=1 j j 0 6=j j ilarly, (55) is also O (log m)2 m/n . For the covariance term in (52) we notice that Pn tr 1 0 j2 4 2 j1 = j
p p1 XX a=1
b a b+p m
that (53) is asymptotically negligible. We consider the remaining terms in turn. First, n t1 m m X X X X 1 m da dp m da dp cos ((t s) j ) cos (t s) j 0 (54) = max O 1ap1 n2 m j j0 0 t=1 s=1
j=1 j 6=j
d dp da +dp j
and, using the denition of f () and Assumption 1, this is easily shown be o (1). E.g., to a f ( ) g p0 ) = O da db g a = the rst term in the square brackets is tr (fab (j ) g j j j j j O da db using that gap = 0 for a = 1, ..., p 1. This implies that (52) is o (1) since j Pn1 Pnt 2 2 t=1 s=1 cos (sj ) = (n 1) /4. Now let us examine tr(0 j2 ) appearing in (51), j2 ! 0 j2 j2 tr 4 2 ! p1 p1 da +d 2d 2d XX m b p j p 0 0 p p0 a+p b+p Re Aa (j ) g j A (j ) Re A (j ) j g Ab (j ) = tr 4 2 a=1 b=1 p1 p1 ! da +d 2d 2d XX m b p j p 0 p (j ) Re A0 (j ) g p j A (j ) + tr a+p b+p Re Aa (j ) g j A b 4 2 a=1 b=1 p1 p1 ! da +d 2d 2d XX m b p j p 0 0 + tr a+p b+p Re A (j ) j g p Aa (j ) Re A (j ) j g p0 Ab (j ) 4 2 a=1 b=1 p1 p1 ! da +d 2d 2d XX m b p j p 0 0 + tr a+p b+p Re A (j ) j g p Aa (j ) Re Ab (j ) g p j A (j ) . 4 2 a=1
b=1
b=1 0 1 a (j ) Re A0 (j ) j g p0 Ab (j ) tr Re Aa (j ) g j A 4 2 0 0 1 a p Re Aa (j ) g j A (j ) Re Ab (j ) g j A (j ) + tr 4 2 0 0 1 a0 p0 Re A (j ) j g Aa (j ) Re A (j ) j g Ab (j ) + tr 4 2 0 0 1 a0 p + tr Re A (j ) j g Aa (j ) Re Ab (j ) g j A (j ) 4 2
By denition of the spectral density f () in (12), the rst term is asymptotically equal P P P P 2d 2d d d 1 to p1 p1 a+p b+p j p fba (j ) g p j f (j ) j g p0 = p1 p1 a+p b+p j p a b gba gpp , a=1 a=1 b=1 b=1 P P 2d d d 1 the fourth to p1 p1 a+p b+p j p a b gab gpp , and the second and third terms to zero a=1 b=1 using Assumption 1. Hence, (51) is asymptotically equal to
n t1 m p1 p1 XXXXX t=1 s=1 j=1 a=1 b=1
a+p b+p
a 4m
a 4m
d +db 2dp 2dp da db j n2 m
m p1 p1 XXX j=1 a=1 b=1 m p1 p1 XXX j=1 a=1 b=1
a+p b+p
d +db 2dp 2dp da db j 2m n d +db 2dp 2dp da db j 2m n
a+p b+p
a 4m
(gab + gba ) cos2 ((t s) j ) gpp ! n t1 (gab + gba ) X X 2 cos ((t s) j ) gpp t=1 s=1 (gab + gba ) (n 1)2 gpp 4 (56)
P P since n1 nt cos2 (sj ) = (n 1)2 /4. t=1 s=1 We can approximate the Riemann sum appearing in (56) by an integral, viz. Z m m 1d d +2d m a b p 2 X 2dp da db j 2dp da db d = , n 1 da db + 2dp 0
j=1
where the symbol means that the ratio of the left- and right-hand sides tends to one. Using this approximation we get that (56) =
p1 p1 XX a=1 b=1 p1 p1 XX a=1 b=1
a+p b+p a+p b+p
a 2m b nm
d +d 2dp
gab + gba n2 m a b p gpp 4 1 da db + 2dp
1d d +2d
2gab , gpp (1 da db + 2dp )
and we have shown (44). Thus, we need to show (46), t1 t1 n n t1 t1 X X X X X X 4 E ztn = E 0 cts,n t 0 ctr,n r 0 ctp,n t 0 ctq,n q s t p t
t=1 t=1
C +
n X
t=1 n X t=1
s=1
tr
tr
P n 2 2 = for some constant C > 0 by Assumption 2. This expression can be bounded by O n t=1 ctn 1 4 , and we are done. O n (log m) 31
t1 X
s=1
t1 X
s=1
r=1
p=1
0 c0 ts,n cts,n cts,n cts,n t1 X r=1
q=1
c0 ts,n
ctr,n c0 tr,n cts,n
!!
Appendix C: Limit of the Hessian

We prove that ) 2 L( p Eab , da db 2 db d L() p 0, m p da b 2 da +d 2d L() p Fab , m b p a b
(1) for all such that k 0 kk 0 k. First, we will need to strengthen the approximation (38) to G by showing that (0 ) = Op log n . G() G m m 1 X da +dp d +d j Iap (j ) j a p Iap (j ) , gap ( gap (0 ) = ) m j=1
(57) (58) (59)
(60)
The proof for the leading (p 1) (p 1) block is given in Lobato (1999, pp. 145-148). Consider now, for a = 1, ..., p 1,
0 where Iap () is the cross-periodogram between xat and et = yt xt . Noting that Iap () 0 Ixa (), we can rewrite this as Iap () = ( ) 1 X da +dp 1 X da +dp d +d gap ( gap (0 ) = ) j ( )0 Ixa (j ) + j a p Iap (j ) . j m m
m m j=1 j=1 1 X da +dp X j ( b b )Iba (j ) m j=1 b=1 m p1
(61)
The rst term on the right-hand side can be bounded as
1 m
d +d d d using max1jm j a p a p = 1 + op (1) which follows since the exponent is Op m1/2 by Assumption 5. The second term on the right-hand side of (61) is X m a +dp da dp 1 da +dp d = op log n Op max 1 j Iap (j ) m 1jm j m
j=1
= Op m1/2 ,
d +d d d max j a p a p 1jm
X m
j=1
j a
d +dp
p1 X ( b b )Iba (j ) b=1
by Assumption 5 and the above analysis. The (p, p)0 th element of (60) follows in the exact same way by application of the Cauchy-Schwartz Inequality. In view of (60), (57) follows from Lobato (1999). For (58) and (59) it can be shown that ! 2 L( ) 2 L (0 ) p db dp 0, (62) m da b da b ! ) 2 L (0 ) p 2 L( da +db 2dp 0, (63) m a b a b by proceeding component by component with the same methods that we applied to show (60). We show next that 2 da +d 2d L ( 0 ) p Fab . (64) m b p a b The left-hand side of (64) is asymptotically equal to !! m X dp 0p1 da +db 2dp 2 p j Re g j m (65) m Iab (j ) j=1 ! m m X dp X 1 Op1 Ixa j 0 da +d 2d 2 j 0 G1 j Iwa ((66) j Re g p j 0 m b p j ) m m 0 Iax j 0 2Ipa j 0 j=1
j =1
by (38), with 0p1 and Op1 denoting a (p 1)-vector of zeros and a (p 1) (p 1) matrix of zeros, respectively. The rst of these terms is
a (65) = m
d +db 2dp
2 X 2dp j Re (g pp (Iab (j ) Aa (j ) J (j ) A (j ))) b m 2 m

j=1 m X j=1
a +m
d +db 2dp
2dp
Re (g pp Aa (j ) J (j ) A (j )) , b
where the rst term is op (1) by the same arguments as for (39) in appendix B. The second term is n 2 m 2 X 2dp 1 X itj da +d 2d m b p j Re g pp Aa (j ) t e Ab (j ) m 2n t=1
j=1 m X 2dp 1 da +d 2d 2 Re (g pp Aa (j ) A (j )) + op (1) j = m b p b m 2 j=1
(67)
by Assumption 2 and the same arguments as for (41) (42) in appendix B. By denition of f () we get that
a (67) = m
d +db 2dp
2 X 2dp pp j g Re (fab (j )) + op (1) . m

j=1
1 Applying the integral approximation from appendix B and recalling that g pp = gpp and m = 2m/n, this expression is asymptotically equal to d +db 2dp
a m
n pp g m
2dp gab da db d =
2gab . gpp (1 da db + 2dp )
Next, rewrite (66) as

a m
d +db 2dp
applying the same type of analysis as in appendix B. The last expression is seen to be O(da +db ) = m o (1) . To complete the proof, we need to show that
b m
m X 1 d d j 0 G1 j Iwa (j ) Re g pp j 0p Iax j 0 , g p j 0 Iwa j 0 + g pp j 0p Ipa j 0 m 0 j =1 m m X dp 1 X dp +da da +d 2d 1 da j j 0 = Op m b p j m m 0

j=1 j =1
2 X dp j m
j=1
d dp
2 L (
0) p
da b
0,
(68)
which implies (58) in view of (62). The left-hand side of (68) is asymptotically equal to ! m m X X 1 Op1 Ixa j 0 db d 2 j 0 G1 j Iwa (j ) da j Re g a j 0 m p j m m 0 Iax j 0 2Ipa j 0 j=1 j =1 ! ! m X 0p1 db dp 2 j Re g a j da m j m Iab (j ) j=1 by (38). The rst of these terms is asymptotically negligible by the same arguments as for (66), and the second by those for (52). This completes the proof.
References
Andersen, T. G. & Bollerslev, T. (1997a), Heterogeneous information arrivals and return volatility dynamics: Uncovering the long-run in high frequency returns, Journal of Finance 52, 975–1005.
Andersen, T. G. & Bollerslev, T. (1997b), Intraday periodicity and volatility persistence in financial markets, Journal of Empirical Finance 4, 115–158.
Andersen, T. G., Bollerslev, T., Diebold, F. X. & Ebens, H. (2001), The distribution of realized stock return volatility, Journal of Financial Economics 61, 43–76.
Andersen, T. G., Bollerslev, T., Diebold, F. X. & Labys, P. (2001), The distribution of exchange rate volatility, Journal of the American Statistical Association 96, 42–55.
Baillie, R. T., Bollerslev, T. & Mikkelsen, H. O. (1996), Fractionally integrated generalized autoregressive conditional heteroskedasticity, Journal of Econometrics 74, 3–30.
Bandi, F. M. & Perron, B. (2004), Long memory and the relation between implied and realized volatility, Preprint, Université de Montréal.
Breidt, F. J., Crato, N. & de Lima, P. (1998), The detection and estimation of long-memory in stochastic volatility, Journal of Econometrics 83, 325–348.
Brown, B. M. (1971), Martingale central limit theorems, Annals of Mathematical Statistics 42, 59–66.
Chen, W. W. & Hurvich, C. M. (2003a), Estimating fractional cointegration in the presence of polynomial trends, Journal of Econometrics 117, 95–121.
Chen, W. W. & Hurvich, C. M. (2003b), Semiparametric estimation of multivariate fractional cointegration, Journal of the American Statistical Association 98, 629–642.
Christensen, B. J. & Nielsen, M. Ø. (2004), Asymptotic normality of narrow-band least squares in the stationary fractional cointegration model and volatility forecasting, Forthcoming in Journal of Econometrics.
Christensen, B. J. & Prabhala, N. R. (1998), The relation between implied and realized volatility, Journal of Financial Economics 50, 125–150.
Comte, F. & Renault, E. (1996), Long-memory continuous-time models, Journal of Econometrics 73, 101–149.
Deo, R. S. & Hurvich, C. M. (2003), Estimation of long memory in volatility, in P. Doukhan, G. Oppenheim & M. S. Taqqu, eds, Theory and Applications of Long-Range Dependence, Birkhäuser, Boston, pp. 313–324.
Ding, Z., Granger, C. W. J. & Engle, R. F. (1993), A long memory property of stock market returns and a new model, Journal of Empirical Finance 1, 83–106.
Dueker, M. & Startz, R. (1998), Maximum-likelihood estimation of fractional cointegration with an application to U.S. and Canadian bond rates, Review of Economics and Statistics 80, 420–426.
Engle, R. & Granger, C. W. J. (1987), Cointegration and error correction: Representation, estimation and testing, Econometrica 55, 251–276.
Geweke, J. & Porter-Hudak, S. (1983), The estimation and application of long memory time series models, Journal of Time Series Analysis 4, 221–238.
Granger, C. W. J. (1981), Some properties of time series data and their use in econometric model specification, Journal of Econometrics 16, 121–130.
Haldrup, N. & Nielsen, M. Ø. (2004), A regime switching long memory model for electricity prices, Working paper, Cornell University.
Hall, P. & Heyde, C. C. (1980), Martingale Limit Theory and its Application, Academic Press, New York.
Hannan, E. J. (1979), The central limit theorem for time series regression, Stochastic Processes and their Applications 9, 281–289.
Harvey, A. C. (1998), Long memory in stochastic volatility, in J. Knight & S. Satchell, eds, Forecasting Volatility in Financial Markets, Butterworth-Heinemann, Oxford, pp. 307–320.
Hassler, U., Marmol, F. & Velasco, C. (2000), Residual log-periodogram inference for long-run relationships, Forthcoming in Journal of Econometrics.
Henry, M. & Zaffaroni, P. (2003), The long range dependence paradigm for macroeconomics and finance, in P. Doukhan, G. Oppenheim & M. S. Taqqu, eds, Theory and Applications of Long-Range Dependence, Birkhäuser, Boston, pp. 417–438.
Hidalgo, F. J. & Robinson, P. M. (2002), Adapting to unknown disturbance autocorrelation in regression with long memory, Econometrica 70, 1545–1581.
Künsch, H. R. (1987), Statistical aspects of self-similar processes, in Y. Prokhorov & V. V. Sazonov, eds, Proceedings of the First World Congress of the Bernoulli Society, VNU Science Press, Utrecht, pp. 67–74.
Lobato, I. N. (1997), Consistency of the averaged cross-periodogram in long memory series, Journal of Time Series Analysis 18, 137–155.
Lobato, I. N. (1999), A semiparametric two-step estimator in a multivariate long memory model, Journal of Econometrics 90, 129–153.
Lobato, I. N. & Robinson, P. M. (1996), Averaged periodogram estimation of long memory, Journal of Econometrics 73, 303–324.
Lobato, I. N. & Velasco, C. (2000), Long memory in stock-market trading volume, Journal of Business and Economic Statistics 18, 410–427.
Marinucci, D. & Robinson, P. M. (2001), Semiparametric fractional cointegration analysis, Journal of Econometrics 105, 225–247.
Nielsen, M. Ø. (2002), Semiparametric estimation in time series regression with long range dependence, Forthcoming in Journal of Time Series Analysis.
Nielsen, M. Ø. & Shimotsu, K. (2004), Determining the cointegrating rank in nonstationary fractional systems by the exact local Whittle approach, Working paper, Cornell University.
Robinson, P. M. (1991), Testing for strong serial correlation and dynamic conditional heteroskedasticity in multiple regressions, Journal of Econometrics 47, 67–84.
Robinson, P. M. (1994a), Semiparametric analysis of long-memory time series, Annals of Statistics 22, 515–539.
Robinson, P. M. (1994b), Time series with strong dependence, in C. A. Sims, ed., Advances in Econometrics, Cambridge University Press, Cambridge, pp. 47–95.
Robinson, P. M. (1995a), Gaussian semiparametric estimation of long range dependence, Annals of Statistics 23, 1630–1661.
Robinson, P. M. (1995b), Log-periodogram regression of time series with long range dependence, Annals of Statistics 23, 1048–1072.
Robinson, P. M. (1997), Large-sample inference for nonparametric regression with dependent errors, Annals of Statistics 25, 2054–2083.
Robinson, P. M. & Henry, M. (1999), Long and short memory conditional heteroscedasticity in estimating the memory parameter of levels, Econometric Theory 15, 299–336.
Robinson, P. M. & Hidalgo, F. J. (1997), Time series regression with long-range dependence, Annals of Statistics 25, 77–104.
Robinson, P. M. & Marinucci, D. (2003), Semiparametric frequency domain analysis of fractional cointegration, in P. M. Robinson, ed., Time Series With Long Memory, Oxford University Press, Oxford, pp. 334–373.
Robinson, P. M. & Yajima, Y. (2002), Determination of cointegrating rank in fractional systems, Journal of Econometrics 106, 217–241.
Velasco, C. (2001), Gaussian semiparametric estimation of fractional cointegration, Preprint, Universidad Carlos III de Madrid.
Velasco, C. (2003), Gaussian semi-parametric estimation of fractional cointegration, Journal of Time Series Analysis 24, 345–378.
Watson, M. W. (1994), Vector autoregressions and cointegration, in R. F. Engle & D. L. McFadden, eds, Handbook of Econometrics, Vol. IV, North-Holland, Amsterdam, chapter 47, pp. 2843–2915.
Zygmund, A. (2002), Trigonometric Series, third edn, Cambridge University Press, Cambridge.
Table 1: Simulation Results for Model A with d = 0.4 Bias n = 200 d de d 0.5 = 14 0.0138 0.0545 0.0013 0.1962 m=n 0.6 = 24 0.0062 0.0311 0.0007 m=n 0.1336 m = n0.7 = 40 0.0086 0.0191 0.0000 0.0969 n = 500 0.0002 0.1431 m = n0.5 = 22 0.0041 0.0333 0.6 = 41 0.0009 0.0158 0.0002 0.0940 m=n 0.0004 0.0651 m = n0.7 = 77 0.0021 0.0086
and de = 0 RMSE de 0.2834 0.1080 0.1588 0.0861 0.1054 0.0746 0.1719 0.1016 0.0673 0.0622 0.0515 0.0458
Table 2: Simulation Results for Model A with Bias n = 200 d de 0.5 = 14 0.0216 0.0361 0.0018 m=n 0.6 = 24 0.0142 0.0205 m=n 0.0000 0.7 = 40 0.0119 0.0135 0.0006 m=n n = 500 m = n0.5 = 22 0.0125 0.0184 0.0020 m = n0.6 = 41 0.0062 0.0105 0.0018 m = n0.7 = 77 0.0056 0.0065 0.0011
d = 0.2 and de = 0.1 RMSE d de 0.1971 0.2649 0.1866 0.1358 0.1446 0.1331 0.0958 0.0997 0.1061 0.1419 0.0950 0.0659 0.1558 0.0979 0.0661 0.1271 0.0946 0.0729
Table 3: Simulation Results for Model B with d = 0.4 and de Bias RMSE n = 200 d de d de 0.5 = 14 0.0486 0.0600 0.0007 0.2026 0.2807 m=n 0.6 = 24 0.1180 0.0350 0.0010 m=n 0.1785 0.1549 m = n0.7 = 40 0.2153 0.0222 0.0007 0.2369 0.1046 n = 500 0.1454 0.1902 m = n0.5 = 22 0.0157 0.0380 0.0000 0.6 = 41 0.0605 0.0202 0.0001 0.1137 0.1025 m=n 0.1639 0.0671 m = n0.7 = 77 0.1499 0.0119 0.0001
=0 0.0563 0.0454 0.0424 0.0339 0.0270 0.0250
Table 4: Simulation Results for Model B with d = 0.2 Bias n = 200 d de d 0.5 = 14 0.0388 0.0378 0.0004 0.2018 m=n 0.6 = 24 0.1156 0.0180 m=n 0.0012 0.1791 0.7 = 40 0.2171 0.0157 0.0005 0.2383 m=n n = 500 0.0000 0.1412 m = n0.5 = 22 0.0108 0.0220 0.6 = 41 0.0577 0.0130 0.0001 0.1101 m=n 0.7 = 77 0.1493 0.0088 0.0000 0.1629 m=n
and de = 0.1 RMSE de 0.3478 0.1488 0.2951 0.1211 0.1008 0.0681 0.1573 0.1010 0.0667 0.0652 0.0517 0.0431
Table 5: Simulation Results for Model C with d = 0.4 and de Bias RMSE n = 200 d de d de 0.5 = 14 0.0438 0.0593 0.0864 0.2018 0.2850 m=n 0.6 = 24 0.1164 0.0421 0.0973 m=n 0.1784 0.1660 m = n0.7 = 40 0.2163 0.0279 0.1109 0.2384 0.1113 n = 500 0.1435 0.1917 m = n0.5 = 22 0.0159 0.0353 0.0720 0.6 = 41 0.0604 0.0220 0.0832 0.1127 0.1069 m=n 0.1645 0.0712 m = n0.7 = 77 0.1503 0.0143 0.0961
=0 0.1117 0.1125 0.1228 0.0840 0.0898 0.1013
Table 6: Simulation Results for Model C with d = 0.2 and de Bias RMSE e n = 200 d d d de 0.5 = 14 0.0389 0.0479 0.2362 0.2010 0.3412 m=n 0.6 = 24 0.1126 0.0382 0.2516 m=n 0.1771 0.1553 0.7 = 40 0.2161 0.0303 0.2772 0.2375 0.1050 m=n n = 500 0.1407 0.1567 m = n0.5 = 22 0.0105 0.0268 0.2230 0.6 = 41 0.0557 0.0214 0.2377 0.1095 0.1010 m=n 0.7 = 77 0.1486 0.0202 0.2608 0.1627 0.0700 m=n
= 0.1 0.2748 0.2674 0.2883 0.2357 0.2452 0.2657
Table 7: Application to the Implied-Realized Volatility Relation Parameter Initial Two Step Std. Error Wde =0,=1 Converged Std. Error Wde =0,=1 0.50 = 20 m=n d 0.4628 0.4628 0.1117 0.4628 0.1117 0.1476 0.1507 0.1117 4.8452 0.1507 0.1117 4.8464 de 0.7767 0.7717 0.1313 0.7716 0.1313 0.55 = 27 m=n d 0.4807 0.4807 0.0960 0.4807 0.0960 0.1570 0.1673 0.0960 9.8175 0.1677 0.0960 9.8332 de 0.7253 0.7098 0.1115 0.7095 0.1115 0.60 = 37 m=n d 0.4527 0.4527 0.0821 0.4527 0.0821 e 0.1679 0.1766 0.0821 17.057 0.1768 0.0821 17.074 d 0.6968 0.6810 0.0905 0.6808 0.0905 Note: For the Wald tests, one or two asterisks denote signicance at 5% or 1% level, respectively.
Chapter 2
Published in Journal of Time Series Analysis, 2005, vol. 26, pp. 279-304

Abstract

We consider semiparametric estimation in time series regression in the presence of long range dependence in both the errors and the stochastic regressors. A central limit theorem is established for a class of semiparametric frequency domain weighted least squares estimates, which includes both narrow band ordinary least squares and narrow band generalized least squares as special cases. The estimates are semiparametric in the sense that focus is on the neighborhood of the origin, and only periodogram ordinates in a degenerating band around the origin are used. This setting differs from earlier work on time series regression with long range dependence where a fully parametric approach has been employed. The generalized least squares estimate is infeasible when the degree of long range dependence is unknown and must be estimated in an initial step. In that case, we show that a feasible estimate exists, which has the same asymptotic properties as the infeasible estimate. By Monte Carlo simulation, we evaluate the finite-sample performance of the generalized least squares estimate and the feasible estimate.

JEL Classification: C14, C22

Keywords: Fractional integration, generalized least squares, linear regression, long range dependence, semiparametric estimation, Whittle likelihood

I am grateful to Jörg Breitung, Svend Hylleberg, Søren Johansen, Peter Phillips, seminar participants at Yale University, an anonymous referee, and an associate editor for comments and suggestions.
Introduction
In this paper we derive central limit theorems for semiparametric estimates of the coefficient vector $\beta$ in the multiple linear time series regression model

$$y_t = \alpha + \beta' x_t + u_t, \quad t = 1, 2, \ldots, \tag{1}$$

where both the $(p-1)$-vector of stochastic regressors $x_t$ and the scalar errors $u_t$ are allowed to have long range dependence. It is well known that, under a wide variety of regularity conditions, the ordinary least squares and generalized least squares estimates of $\beta$ are asymptotically normal, see e.g. Hannan (1979). However, as discussed by Robinson (1994a, 1994b) and Robinson & Hidalgo (1997), this fails to hold when $x_t$ and $u_t$ have sufficient collective long range dependence. To account for this, Robinson (1994a) suggested a narrow band (semiparametric) frequency domain least squares estimate, where the estimation is conducted over a degenerating band of frequencies near the origin, and proved its consistency for arbitrary short-run dynamics. As an alternative, Robinson & Hidalgo (1997) introduced a parametric class of (full band) weighted least squares estimates (including generalized least squares as a special case), and proved root-$n$-consistency and asymptotic normality for these estimates, assuming correct specification of the dynamics at any frequency.

We consider a semiparametric version of the class of weighted least squares estimates in Robinson & Hidalgo (1997). The advantage of the semiparametric approach is that consistency and asymptotic normality are retained without the need for correct specification of the short-run dynamics. Suppose the spectral density matrix of the $p$-vector $w_t = (x_t', u_t)'$ exists and satisfies

$$f_w(\lambda) \sim \Lambda^{-1} G \Lambda^{-1} \quad \text{as } \lambda \to 0^+, \tag{2}$$

where the symbol "$\sim$" means that the ratio of the left- and right-hand sides tends to one (elementwise), $\Lambda = \mathrm{diag}(\lambda^{d_1}, \ldots, \lambda^{d_p})$, and $G$ is a $p \times p$ real, symmetric, positive definite matrix. Then the process $w_t$ is said to have long range dependence or strong dependence since the autocorrelations decay hyperbolically. The parameters $d_1, \ldots, d_p$ determine the memory of the process, i.e. each component of $w_t$, say $w_{at}$, is associated with one memory parameter, $d_a$.
If $d_a > -1/2$, $w_{at}$ is invertible and admits a linear representation, and if $d_a < 1/2$, $w_{at}$ is covariance stationary. If $d_a = 0$, the spectral density is bounded at the origin and $w_{at}$ has only weak dependence. Sometimes $w_{at}$ is said to have negative, short, or long memory when $d_a < 0$, $d_a = 0$, or $d_a > 0$, respectively. Note that the memory parameter of $u_t = w_{pt}$ is $d_p$ in this notation. Throughout this paper we shall be concerned with the case $0 \le d_a < 1/2$, $a = 1, \ldots, p$, since this is the dominant case in empirical research, see Robinson (1994b) and Beran (1994) for a review of long range dependent processes.

The most well known parametric models satisfying (2) are the fractional Gaussian noise and the fractional ARIMA models, see Mandelbrot & Van Ness (1968), Adenstedt (1974), Granger & Joyeux (1980), and Hosking (1981). The obvious advantage of specifying the spectral density only in a neighborhood of the origin as in (2) is that it allows treating the spectral density away from the origin nonparametrically, assuming only mild regularity conditions. Thus, in applications we need not worry about correct specification of the short-run dynamics of the process, such as the autoregressive and moving average orders in the fractional ARIMA model. Previously, this type of specification, termed semiparametric by Robinson (1994a), has been applied for estimation of the memory parameters by Geweke & Porter-Hudak (1983), Robinson (1994a, 1995a, 1995b), Lobato & Robinson (1996), and Lobato (1997, 1999), among others.

Based on observations $(y_t, x_t)$, $t = 1, \ldots, n$, we consider the class of semiparametric weighted least squares estimates $\hat\beta_{\delta,m}$, where

$$\hat\beta_{\delta,m} = \left( \frac{1}{m} \sum_{j=1}^{m} \lambda_j^{2\delta}\, \mathrm{Re}(I_{xx}(\lambda_j)) \right)^{-1} \frac{1}{m} \sum_{j=1}^{m} \lambda_j^{2\delta}\, \mathrm{Re}(I_{xy}(\lambda_j)), \tag{3}$$

$$I_{ab}(\lambda) = w_a(\lambda)\, w_b^*(\lambda) \quad \text{and} \quad w_a(\lambda) = \frac{1}{\sqrt{2\pi n}} \sum_{t=1}^{n} a_t e^{it\lambda} \tag{4}$$

are the cross-periodogram matrix between $a_t$ and $b_t$ and the discrete Fourier transform of $a_t$, respectively, $\lambda_j = 2\pi j/n$ are the harmonic frequencies, $m = m(n)$ is a bandwidth parameter, and the asterisk denotes complex conjugation combined with transposition.

Our estimates are semiparametric in the sense that they employ only local assumptions (near the zero frequency), such as (2), on the spectral density matrix of $w_t$, except for weak regularity conditions (see below). Thus, we shall need the bandwidth parameter $m = m(n)$ to tend to infinity at a slower rate than $n$, such that we remain in a neighborhood of the origin where the functional form of the spectral density (2) is assumed. This has the advantage that our estimate is invariant to the short-run dynamics of the processes $x_t$ and $u_t$ (it is also location invariant since $\lambda_0 = 0$ is left out of the summations in (3)). In contrast, the estimates in Robinson & Hidalgo (1997) use all available periodogram ordinates (i.e. $m = n$) and replace our weights $\lambda_j^{2\delta}$ by weight functions $\phi(\lambda)$, $-\pi < \lambda \le \pi$. Thus, $\phi(\lambda) = 1$ and $\phi(\lambda) = f_u(\lambda)^{-1}$ correspond to ordinary least squares and generalized least squares, respectively, and correct specification of the dynamics of the model at any frequency is assumed.

In our setting, (3) with $\delta = 0$ (i.e. $\hat\beta_{0,m}$) is termed the narrow band frequency domain least squares (FDLS) estimate (see Robinson (1994a) and Robinson & Marinucci (2003)). Henceforth, we shall term (3) with $\delta = d_p$ (i.e. $\hat\beta_{d_p,m}$) the narrow band frequency domain generalized least squares (FDGLS) estimate. The latter case also corresponds to the local Whittle QMLE of $\beta$. To see this, consider the local frequency domain Whittle QML objective function for (1),

$$W(\beta, G_{pp}) = \frac{1}{m} \sum_{j=1}^{m} \left( \log f_{pp}(\lambda_j) + \frac{I_{pp}(\lambda_j)}{f_{pp}(\lambda_j)} \right). \tag{5}$$
Concentrate $G_{pp}$ out of the likelihood by setting $\hat G_{pp}(\beta) = m^{-1} \sum_{j=1}^{m} \lambda_j^{2d_p} I_{pp}(\lambda_j)$; then the concentrated likelihood is $W_c(\beta) = \log \hat G_{pp}(\beta)$ apart from constant terms. The derivative, using that $I_{pp}(\lambda_j) = I_{yy}(\lambda_j) - \mathrm{Re}(\beta' I_{xy}(\lambda_j) + I_{yx}(\lambda_j)\beta - \beta' I_{xx}(\lambda_j)\beta)$, is

$$\frac{\partial W_c(\beta)}{\partial \beta} = 2 \hat G_{pp}(\beta)^{-1} \frac{1}{m} \sum_{j=1}^{m} \lambda_j^{2d_p}\, \mathrm{Re}(I_{xx}(\lambda_j)\beta - I_{xy}(\lambda_j)),$$

and setting this equal to zero produces (3) with $\delta = d_p$. In the next section, we shall give the conditions necessary to prove central limit theorems of the type

$$\sqrt{m}\, \lambda_m^{d_p} \Lambda_m^{-1} (\hat\beta_{\delta,m} - \beta) \to_d N(0, E^{-1} F E^{-1}), \tag{6}$$

where $\Lambda_m = \mathrm{diag}(\lambda_m^{d_1}, \ldots, \lambda_m^{d_{p-1}})$ and $E$, $F$ will be defined later. As mentioned above, the fully parametric version of this class of estimates has been examined by Robinson & Hidalgo (1997), who derived a parametric version of (6) in the case of long range dependent regressors and errors.

For the case with long range dependent errors and fixed (nonstochastic) regressors, Yajima (1988, 1991) derived central limit theorems for the ordinary least squares and generalized least squares estimates under conditions on the cumulants of all orders, and gave conditions for the ordinary least squares estimate to achieve the efficiency of the generalized least squares estimate. Dahlhaus (1995) considered an efficient weighted least squares estimate, and proved asymptotic normality under Gaussianity of the errors. Robinson (1997) gave a central limit theorem for nonparametric regression with fixed regressors assuming that the errors are linear in martingale differences. For a detailed discussion of the fixed regressor case, see Robinson & Hidalgo (1997) and the references therein.

Our emphasis on stochastic long range dependent regressors reflects recent empirical research. Thus, we also cover the case of cointegration where, if $d_p < \min_{1 \le a \le p-1} d_a$, $y_t$ and $x_t$ are termed (fractionally) cointegrated, see ? for details on this definition of cointegration and its implications. Cointegration is essentially the necessary condition to avoid spurious regression effects when data is trended, i.e. when $d_a$ is high, see Phillips (1986) and Tsay & Chung (2000). Since we impose only the condition $d_a \in [0, 1/2)$, for all $a$, on the memory parameters, our framework provides a unified treatment of cointegration and regression with fractionally integrated regressors and errors.

The paper proceeds as follows. In the next section we present the central limit theorem for (3), and discuss its implications for the FDLS and FDGLS estimates. Section 3 discusses feasible versions of these estimates, and it is shown that the central limit theorem continues to hold for the feasible estimates. Section 4 reports the results of a Monte Carlo investigation of our estimates. The proofs of our theorems appear in sections 5 and 6, and section 7 contains some auxiliary lemmas and propositions.
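As an illustration of how the estimate (3) is computed, the following is a minimal sketch for the special case of a single regressor, written with the Python standard library only. The function names `dft` and `narrow_band_wls` are hypothetical and not part of the paper; the sketch evaluates the discrete Fourier transform (4) directly rather than through an FFT, which is adequate for small bandwidths.

```python
import cmath
import math


def dft(series, lam):
    """w(lam) = (2*pi*n)^(-1/2) * sum_{t=1}^{n} a_t * exp(i*t*lam), as in (4)."""
    n = len(series)
    total = sum(a * cmath.exp(1j * (t + 1) * lam) for t, a in enumerate(series))
    return total / math.sqrt(2.0 * math.pi * n)


def narrow_band_wls(y, x, m, delta=0.0):
    """Scalar-regressor version of (3) with weights lam_j^(2*delta), j = 1..m.

    delta = 0 gives the narrow band FDLS estimate; delta equal to the memory
    parameter of the errors gives the (infeasible) FDGLS estimate.
    """
    n = len(y)
    if len(x) != n:
        raise ValueError("y and x must have the same length")
    if not 1 <= m < n / 2:
        raise ValueError("bandwidth m must satisfy 1 <= m < n/2")
    num = 0.0
    den = 0.0
    for j in range(1, m + 1):  # frequency zero is left out, so the intercept drops out
        lam = 2.0 * math.pi * j / n
        wx = dft(x, lam)
        wy = dft(y, lam)
        weight = lam ** (2.0 * delta)
        den += weight * (wx * wx.conjugate()).real  # Re I_xx(lam_j)
        num += weight * (wx * wy.conjugate()).real  # Re I_xy(lam_j)
    return num / den
```

Because only the harmonic frequencies $\lambda_1, \ldots, \lambda_m$ enter, adding a constant to `y` leaves the result unchanged up to rounding, which mirrors the location invariance noted for (3).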
Asymptotic Distribution of Estimates
We shall need the following assumptions on $w_t$ and the spectral density matrix $f_w(\lambda)$ (with obvious implications for $y_t$).

Assumption 1 The spectral density matrix of $w_t$ in (2) with typical element $f_{ab}(\lambda)$, the cross spectral density between $w_{at}$ and $w_{bt}$, satisfies

$$\left| f_{ab}(\lambda) - G_{ab} \lambda^{-d_a - d_b} \right| = O(\lambda^{\alpha - d_a - d_b}) \quad \text{as } \lambda \to 0^+, \quad a, b = 1, \ldots, p,$$

for some $\alpha \in (0, 2]$ and $0 \le d_a < 1/2$, $a = 1, \ldots, p$. The matrix $G$ satisfies $G_{ap} = G_{pa} = 0$ for $a = 1, \ldots, p-1$, and the leading $(p-1) \times (p-1)$ submatrix of $G$, denoted $G_x$, is positive definite.

Assumption 2 $w_t$ is a linear process, $w_t = \mu + \sum_{j=0}^{\infty} A_j \varepsilon_{t-j}$, where the coefficient matrices are square summable, $\sum_{j=0}^{\infty} \|A_j\|^2 < \infty$. The innovations satisfy, almost surely, $E(\varepsilon_t | F_{t-1}) = 0$, $E(\varepsilon_t \varepsilon_t' | F_{t-1}) = I_p$, and the matrices $\mu_3 = E(\varepsilon_t \otimes \varepsilon_t \varepsilon_t' | F_{t-1})$ and $\mu_4 = E(\varepsilon_t \varepsilon_t' \otimes \varepsilon_t \varepsilon_t' | F_{t-1})$ are nonstochastic, finite, and do not depend on $t$, with $F_t = \sigma(\{\varepsilon_s, s \le t\})$.

Assumption 3 As $\lambda \to 0^+$,

$$\frac{dA_a(\lambda)}{d\lambda} = O(\lambda^{-1} \|A_a(\lambda)\|), \quad a = 1, \ldots, p,$$

where $A_a(\lambda)$ is the $a$th row of $A(\lambda) = \sum_{j=0}^{\infty} A_j e^{ij\lambda}$.

We also need a restriction on the expansion rate of the bandwidth parameter $m$.

Assumption 4 The bandwidth parameter $m = m(n)$ satisfies, as $n \to \infty$,

$$\frac{1}{m} + \frac{m^{1+2\alpha}}{n^{2\alpha}} \to 0.$$

Finally, we need to restrict the weighting parameter $\delta$ depending on the memory parameters as follows.

Assumption 5 The weighting parameter $\delta$ satisfies

$$\max_{1 \le a \le p-1} (2d_a + 2d_p - 1)/4 < \delta \le d_p.$$

Our assumptions are a multivariate generalization of those in Robinson (1994a, 1995a), see also Lobato (1997, 1999). They are in some respects much weaker than those employed by Robinson & Hidalgo (1997) in their parametric setup. In particular, we avoid their assumptions of independence between $x_t$ and $u_t$ and complete specification of $f(\lambda)$.
Assumptions 1 and 3 specialize (2) by imposing smoothness conditions on the spectral density matrix of $w_t$ commonly employed in the literature. They are satisfied with $\alpha = 2$ if, e.g., $w_t$ is a vector fractional Gaussian noise or a vector fractional ARIMA process. The condition that $G_x$ must be positive definite is a no multicollinearity condition for the components of $x_t$. The extra condition that $G_{ap} = G_{pa} = 0$ for $a = 1, \ldots, p-1$ ensures that the coherence at $\lambda = 0$ between the regressors and the error process is of smaller order, and can be thought of as a local version of the usual orthogonality condition from least squares theory. In particular, it relaxes the independence assumption employed by Robinson & Hidalgo (1997).

Assumption 2 is a straightforward multivariate generalization of the corresponding condition in Robinson (1995a), following Lobato (1999), and imposes a linear structure on $w_t$ with square summable coefficients and martingale difference innovations with finite fourth moments. It is satisfied, for instance, if $\varepsilon_t$ forms an i.i.d. process with finite fourth moments. Under Assumption 2 we can write the spectral density matrix of $w_t$ as

$$f(\lambda) = \frac{1}{2\pi} A(\lambda) A^*(\lambda). \tag{7}$$

Assumption 4 restricts the expansion rate of the bandwidth parameter $m = m(n)$. The bandwidth is required to tend to infinity for consistency, but at a slower rate than $n$ to remain in a neighborhood of the origin, where some knowledge of the form of the spectral density is assumed. The maximal rate depends on the adequacy of the approximation (2) to (7), i.e. on the parameter $\alpha$ from Assumption 1, and the weakest constraint is implied by $\alpha = 2$, in which case the condition is $m = o(n^{4/5})$.

Finally, Assumption 5 states the required restrictions on the weighting parameter. Reversing Assumption 5 effectively gives a restriction on the memory parameters for the narrow band FDLS estimate (i.e. $\delta = 0$) to be covered by our theory. Thus, for $\max_{1 \le a \le p-1} d_a + d_p < 1/2$, the narrow band FDLS estimate satisfies Assumption 5.

We now state the following central limit theorem for $\hat\beta_{\delta,m}$, which is proved in section 5.

Theorem 1 Under (1) and Assumptions 1-5, the estimator defined by (3) satisfies

$$\sqrt{m}\, \lambda_m^{d_p} \Lambda_m^{-1} (\hat\beta_{\delta,m} - \beta) \to_d N(0, E^{-1} F E^{-1}) \tag{8}$$

with

$$E_{ab} = \frac{G_{ab}}{1 - d_a - d_b + 2\delta}, \tag{9}$$

$$F_{ab} = \frac{G_{ab} G_{pp}}{2(1 - d_a - d_b - 2d_p + 4\delta)}. \tag{10}$$
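The restrictions in Assumptions 4 and 5 are simple enough to check mechanically before estimating. The sketch below is an illustration only, with hypothetical helper names: it returns the admissible interval for $\delta$ from Assumption 5 and the supremum of the exponents $r$ for which a bandwidth $m = n^r$ satisfies Assumption 4, namely $2\alpha/(1+2\alpha)$.

```python
def delta_bounds(d_x, d_p):
    """Return (lower, upper): Assumption 5 requires lower < delta <= upper.

    d_x is the list of regressor memory parameters d_1, ..., d_{p-1};
    d_p is the memory parameter of the errors.
    """
    lower = max((2.0 * d_a + 2.0 * d_p - 1.0) / 4.0 for d_a in d_x)
    return lower, d_p


def is_admissible(delta, d_x, d_p):
    """True if the weighting parameter delta satisfies Assumption 5."""
    lower, upper = delta_bounds(d_x, d_p)
    return lower < delta <= upper


def max_bandwidth_exponent(alpha):
    """Supremum of r such that m = n**r gives m**(1 + 2*alpha) / n**(2*alpha) -> 0."""
    if not 0.0 < alpha <= 2.0:
        raise ValueError("Assumption 1 requires alpha in (0, 2]")
    return 2.0 * alpha / (1.0 + 2.0 * alpha)
```

For example, with one regressor with $d_1 = 0.4$ and errors with $d_p = 0.2$, `is_admissible(0.0, [0.4], 0.2)` is false because $d_1 + d_p \ge 1/2$, so the FDLS estimate is not covered by the theory, while `is_admissible(0.2, [0.4], 0.2)` is true for the FDGLS choice $\delta = d_p$.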
If the memory parameters of $x_t$ and $u_t$ are all equal, i.e. $d_a = d$, $a = 1, \ldots, p$, inference is particularly simple since the memory parameter does not appear in the convergence rate and $E$, $F$ are scalar multiples of $G_x$. We state this special case as a corollary.

Corollary 1 Under (1), Assumptions 1-5, and $d_a = d \in [0, 1/2)$, $a = 1, \ldots, p$, the estimator defined by (3) satisfies

$$\sqrt{m}\, (\hat\beta_{\delta,m} - \beta) \to_d N\left(0, \frac{(1 - 2d + 2\delta)^2}{2(1 - 4d + 4\delta)} G_{pp} G_x^{-1}\right).$$

Let us focus briefly on the case of scalar $x_t$. Suppose $f_w(\lambda) \sim \mathrm{diag}(G_x \lambda^{-2d_x}, G_u \lambda^{-2d_u})$ as $\lambda \to 0^+$. When $d_x + d_u < 1/2$, the FDLS estimate satisfies

$$\sqrt{m}\, \lambda_m^{d_u - d_x} (\hat\beta_{0,m} - \beta) \to_d N\left(0, \frac{G_u}{G_x} \frac{(1/2 - d_x)^2}{1/2 - d_x - d_u}\right). \tag{11}$$

However, the FDGLS estimate satisfies

$$\sqrt{m}\, \lambda_m^{d_u - d_x} (\hat\beta_{d_u,m} - \beta) \to_d N\left(0, \frac{G_u}{G_x} (1/2 - d_x + d_u)\right) \tag{12}$$

for the entire stationary region of $d_x$ and $d_u$, unlike the FDLS estimate. Furthermore, the asymptotic relative efficiency of $\hat\beta_{d_u,m}$ with respect to $\hat\beta_{0,m}$ (when both are asymptotically normal) is

$$\frac{V(\hat\beta_{0,m})}{V(\hat\beta_{d_u,m})} = \frac{(1/2 - d_x)^2}{(1/2 - d_x)^2 - d_u^2},$$

which equals unity if and only if $d_u = 0$, and exceeds unity otherwise. Hence, as expected, the FDGLS estimate is more efficient and applies for a wider range of $(d_x, d_u)$ than the FDLS estimate.

We end this section by remarking that the location of the spectral pole at the origin is not critical as long as the location is known. If instead the pole was located at $\lambda = \omega \ne 0$, we would assume (2) as $\lambda \to \omega$ and use periodogram ordinates close to $\omega$ in the summations in (3). However, the case with a pole at the origin dominates both theoretical and empirical research, so we shall not consider this extension further.
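The scalar formulas (11) and (12) follow from (9) and (10) by direct substitution, and the short sketch below makes that arithmetic explicit. It is an illustration only: the function names are hypothetical, and $G_x = G_u = 1$ is assumed by default since the constants cancel in the efficiency ratio.

```python
def avar_scalar(d_x, d_u, delta, g_x=1.0, g_u=1.0):
    """Asymptotic variance E^{-1} F E^{-1} from (9)-(10) for one regressor.

    d_x and d_u are the memory parameters of the regressor and the errors,
    delta is the weighting parameter, which must satisfy Assumption 5.
    """
    if not (2.0 * d_x + 2.0 * d_u - 1.0) / 4.0 < delta <= d_u:
        raise ValueError("delta violates Assumption 5")
    e = g_x / (1.0 - 2.0 * d_x + 2.0 * delta)                            # (9)
    f = g_x * g_u / (2.0 * (1.0 - 2.0 * d_x - 2.0 * d_u + 4.0 * delta))  # (10)
    return f / (e * e)


def relative_efficiency(d_x, d_u):
    """Variance of FDLS (delta = 0) relative to FDGLS (delta = d_u)."""
    return avar_scalar(d_x, d_u, 0.0) / avar_scalar(d_x, d_u, d_u)
```

With $d_x = 0.3$ and $d_u = 0.1$ the two variances are $0.4$ and $0.3$, so the ratio is $4/3$, in agreement with $(1/2 - d_x)^2 / ((1/2 - d_x)^2 - d_u^2) = 0.04/0.03$.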
Feasible Estimates
For the FDGLS estimate the correct $\delta$ is usually not known a priori, and hence this estimate is infeasible in practice. However, $\delta$ can obviously be estimated in any given situation by $\hat\delta = \hat d_p$, where $\hat d_p$ is an estimate of $d_p$ based on residuals $\hat u_t$ from (1). These residuals can be obtained by e.g. FDLS, which does not require any knowledge of the memory parameters. Although the FDLS estimate is not asymptotically normal for all $d$, it is consistent, see Robinson (1994a) and Lobato (1997), and is thus useful as a preliminary estimate. We assume the following for $\hat d_p$.

Assumption 6 The estimate of $d_p$ satisfies, as $n \to \infty$,

$$(\log n)(\hat d_p - d_p) \to_p 0.$$

In practice, the estimate can be obtained from residuals $\hat u_t$ as mentioned above. Hassler, Marmol & Velasco (2000) and Velasco (2001) provide some evidence that the log-periodogram and Gaussian semiparametric procedures of Robinson (1995a, 1995b), with carefully chosen bandwidth parameters, satisfy Assumption 6 with $\hat d_p - d_p = O_p(m^{-1/2})$.

Denote the feasible estimate $\hat\beta_{\hat d_p,m}$. The asymptotic distribution is given by the following theorem, which is proved in section 6.

Theorem 2 Under (1) and Assumptions 1-4 and 6 the results of Theorem 1 hold with $\delta$ replaced by $\hat d_p$.

Thus, under the additional Assumption 6, the initial estimation of the memory parameter of the error process does not influence the asymptotic distribution theory for the regression coefficients obtained in the previous section.
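One of the preliminary estimates mentioned above is the Gaussian semiparametric (local Whittle) estimate of Robinson (1995a), which minimizes $R(d) = \log\big(m^{-1} \sum_{j=1}^{m} \lambda_j^{2d} I_j\big) - 2d\, m^{-1} \sum_{j=1}^{m} \log \lambda_j$ over $d$, where $I_j$ is the periodogram of the residuals at $\lambda_j$. The following is a minimal sketch of that step, assuming the periodogram ordinates have already been computed; the function names, the grid search, and the search interval $[0, 0.499]$ are choices made here for illustration and are not taken from the paper.

```python
import math


def local_whittle_objective(d, freqs, pgram):
    """R(d) = log( mean_j lam_j^(2d) * I_j ) - 2 * d * mean_j log(lam_j)."""
    m = len(freqs)
    g_hat = sum(lam ** (2.0 * d) * i_j for lam, i_j in zip(freqs, pgram)) / m
    return math.log(g_hat) - 2.0 * d * sum(math.log(lam) for lam in freqs) / m


def local_whittle_d(freqs, pgram, lo=0.0, hi=0.499, step=0.001):
    """Grid-search minimizer of R(d) over [lo, hi]; a numerical optimizer would also do."""
    n_grid = int(round((hi - lo) / step)) + 1
    grid = [lo + k * step for k in range(n_grid)]
    return min(grid, key=lambda d: local_whittle_objective(d, freqs, pgram))
```

The resulting value can then be used as the weighting parameter in (3) to obtain the feasible estimate.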
We proceed to investigate the finite sample properties of the FDGLS (henceforth GLS) and feasible FDGLS (henceforth FGLS) estimates in a Monte Carlo study with two different generating mechanisms for $x_t$ and $u_t$. In particular, we generated 10,000 replications of $x_t$ and $u_t$ of length $n = 256$, $512$, and $1{,}024$. Both were Gaussian fractional ARIMAs with spectra given by the two models

$$\text{Model A}: \quad f_a(\lambda) = \frac{1}{2\pi} \left| 1 - e^{i\lambda} \right|^{-2d_a}, \quad a = x, u,$$

$$\text{Model B}: \quad f_a(\lambda) = \frac{1}{2\pi} \left| 1 - e^{i\lambda} \right|^{-2d_a} \frac{\left| 1 + 0.4 e^{i\lambda} \right|^2}{\left| 1 - 0.6 e^{i\lambda} \right|^2}, \quad a = x, u,$$

for the grid of values $d_x = 0.0(0.1)0.4$ and $d_u = 0.0(0.1)d_x$, i.e. $d_u \le d_x$ to avoid any spurious regression effects, see Phillips (1986) and Tsay & Chung (2000). These models both satisfy Assumptions 1-3 with $\alpha = 2$. From the linear model (1), we then generated $y_t$ by $\alpha = 0$ and $\beta = 1$; the results are not sensitive to the choice of $\alpha$ and $\beta$. All calculations were performed in Ox version 3.10, see Doornik (2001) and Doornik & Ooms (2001).

In each model we computed $\hat\beta_{d_u,m}$ and $\hat\beta_{\hat d_u,m}$ with bandwidth parameters $m = [n^{0.4}]$ and $m = [n^{0.5}]$, where $[z]$ denotes the integer part of $z$. The first bandwidth is more conservative, and is expected to be more robust under more complicated generating mechanisms such as Model B. The FGLS estimate was computed by first obtaining the residual process $\hat u_t$ from FDLS estimation of $\beta$, and then estimating $d_u$ by the Gaussian semiparametric estimator of Robinson (1995a) using the same bandwidth parameter for the entire estimation procedure.
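A process with the Model A spectrum can be written as $(1 - L)^{-d} \varepsilon_t$ with unit-variance Gaussian white noise $\varepsilon_t$, and its moving average weights satisfy $\psi_0 = 1$, $\psi_k = \psi_{k-1} (k - 1 + d)/k$. The sketch below generates such a series by truncating the moving average after a burn-in period. This is an approximation assumed here for illustration and is not the simulation method of the Ox package used in the study; the function names and the burn-in length are likewise hypothetical.

```python
import random


def frac_ma_weights(d, k_max):
    """psi_0, ..., psi_{k_max} of (1 - L)^(-d): psi_k = psi_{k-1} * (k - 1 + d) / k."""
    psi = [1.0]
    for k in range(1, k_max + 1):
        psi.append(psi[-1] * (k - 1.0 + d) / k)
    return psi


def simulate_model_a(n, d, burn_in=1000, seed=None):
    """Approximate Gaussian ARFIMA(0, d, 0) sample of length n (truncated MA filter)."""
    rng = random.Random(seed)
    total = burn_in + n
    psi = frac_ma_weights(d, total - 1)
    eps = [rng.gauss(0.0, 1.0) for _ in range(total)]
    series = []
    for t in range(burn_in, total):
        # x_t = sum_{k=0}^{t} psi_k * eps_{t-k}, using all available innovations
        series.append(sum(psi[k] * eps[t - k] for k in range(t + 1)))
    return series
```

A regression sample in the spirit of the design above would then take `x = simulate_model_a(n, d_x)`, `u = simulate_model_a(n, d_u)` with different seeds, and `y = [xi + ui for xi, ui in zip(x, u)]`, corresponding to $\alpha = 0$ and $\beta = 1$.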
Tables 1-4 about here
In Tables 1-4 we present the results of the simulation study for Model A. Tables 1 and 2 display the Monte Carlo bias of the GLS and FGLS estimates, respectively. The bias is universally lower than 0.008 in absolute value, and there are no clear trends. Tables 3 and 4 display the ratio (henceforth MSE ratio) of the asymptotic variance of $\hat\beta_{d_u,m}$ (from Theorem 1) to the simulated mean-squared errors of the GLS and FGLS estimates. In both tables the estimates with the higher bandwidth parameter are superior, their MSE ratios being closer to unity and in some cases up to 20% higher than those with the lower bandwidth parameter. Comparing the results of Tables 3 and 4, the mean-squared errors of the FGLS estimates are in most cases approximately 5% higher than those of the GLS estimates, the difference of course being due to the estimation of $d_u$. Furthermore, we note a clear monotonicity in the MSE ratios for both estimates. Thus, the ratios tend to be decreasing when $d_x - d_u$ increases. When $d_x = d_u$, i.e. on the diagonals, the asymptotic theory performs very well with MSE ratios around 0.9 for the GLS estimate and 0.85 for the FGLS estimate. The MSE ratios for the fully parametric estimates in Robinson & Hidalgo (1997) display similar magnitudes and patterns across $d_x$ and $d_u$ ($c$ and $d$, respectively, in their notation).
Tables 5-8 about here
Tables 5-8 present the corresponding simulation results for Model B. Again the bias is negligible, and the pattern of MSE ratios from Tables 3 and 4 is repeated. Naturally, the MSE ratios tend to be lower under this more complicated generating mechanism, but only slightly so. Robinson & Hidalgo (1997) considered only one generating mechanism, Model A, but do conjecture that their MSE ratios could deteriorate if a richer model of $f(\lambda)$ were estimated. Unreported simulations have shown that the highest possible expansion rate for the bandwidth under Assumption 4, $m = n^{0.8}$, generally results in an MSE ratio smaller than 0.6 for the GLS estimate for Model B, and thus appears too high for the sample sizes considered here.

Overall, the asymptotic theory seems to perform well, and the results of the Monte Carlo study are very similar to those obtained by Robinson & Hidalgo (1997) for their fully parametric estimates. However, in contrast to the estimates of Robinson & Hidalgo (1997), ours can be obtained without any prior knowledge of the generating mechanism of $x_t$ and $u_t$. In particular, we do not need to know if $x_t$ and $u_t$ are generated by Model A or Model B in order to calculate our semiparametric estimates. The simulated bias is negligible in all our specifications and the MSE ratio is high when $d_x$ and $d_u$ are not too far apart. However, when $d_x$ is much larger than $d_u$, the asymptotic variance is quite small compared to the Monte Carlo result, and consequently asymptotic confidence intervals tend to be too narrow.
Proof of Theorem 1
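We prove Theorem 1 using the auxiliary results in section 7. The basic technique is the martingale difference approximation method of Robinson (1995a). The left-hand side of (8) is

$$\left( \lambda_m^{-2\delta} \Lambda_m \frac{1}{m} \sum_{j=1}^{m} \lambda_j^{2\delta}\, \mathrm{Re}(I_{xx}(\lambda_j)) \Lambda_m \right)^{-1} \sqrt{m}\, \lambda_m^{d_p - 2\delta} \Lambda_m \frac{1}{m} \sum_{j=1}^{m} \lambda_j^{2\delta}\, \mathrm{Re}(I_{xp}(\lambda_j)).$$

From Proposition 1 of section 7, the first term on the right-hand side satisfies

$$\lambda_m^{-2\delta} \Lambda_m \frac{1}{m} \sum_{j=1}^{m} \lambda_j^{2\delta}\, \mathrm{Re}(I_{xx}(\lambda_j)) \Lambda_m \to_p E,$$

where $E$ is defined in (9). Note that $G_x$ (and thus $E$) is invertible by Assumption 1. For the second term we show that

$$\sqrt{m}\, \lambda_m^{d_p - 2\delta} \Lambda_m \frac{1}{m} \sum_{j=1}^{m} \lambda_j^{2\delta}\, \mathrm{Re}(I_{xp}(\lambda_j)) \to_d N(0, F).$$

By application of the Cramér-Wold device, we need to examine ($\eta$ is a $(p-1)$-vector)

$$\sum_{a=1}^{p-1} \eta_a \sqrt{m}\, \lambda_m^{d_a + d_p - 2\delta} \frac{1}{m} \sum_{j=1}^{m} \lambda_j^{2\delta}\, \mathrm{Re}(I_{ap}(\lambda_j))$$

$$= \sum_{a=1}^{p-1} \eta_a \sqrt{m}\, \lambda_m^{d_a + d_p - 2\delta} \frac{1}{m} \sum_{j=1}^{m} \lambda_j^{2\delta}\, \mathrm{Re}\left( I_{ap}(\lambda_j) - A_a(\lambda_j) J(\lambda_j) A_p^*(\lambda_j) \right) \tag{13}$$

$$+ \sum_{a=1}^{p-1} \eta_a \sqrt{m}\, \lambda_m^{d_a + d_p - 2\delta} \frac{1}{m} \sum_{j=1}^{m} \lambda_j^{2\delta}\, \mathrm{Re}\left( A_a(\lambda_j) \frac{1}{2\pi n} \sum_{t=1}^{n} \varepsilon_t \varepsilon_t' A_p^*(\lambda_j) \right) \tag{14}$$

$$+ \sum_{a=1}^{p-1} \eta_a \sqrt{m}\, \lambda_m^{d_a + d_p - 2\delta} \frac{1}{m} \sum_{j=1}^{m} \lambda_j^{2\delta}\, \mathrm{Re}\left( A_a(\lambda_j) \frac{1}{2\pi n} \sum_{t=1}^{n} \sum_{s \ne t} \varepsilon_t \varepsilon_s' e^{i(t-s)\lambda_j} A_p^*(\lambda_j) \right), \tag{15}$$

where $J(\lambda)$ is the periodogram matrix of the innovations $\varepsilon_t$. Lemma 2 of section 7 proves that (13) is $o_p(1)$, while Lemma 3 in conjunction with Assumptions 1 and 4 proves that (14) is $o_p(1)$ since $\sqrt{m}\, \lambda_m^{d_a + d_p - 2\delta} m^{-1} \sum_{j=1}^{m} \lambda_j^{2\delta}\, \mathrm{Re}(f_{ap}(\lambda_j)) = O\big((m^{1+2\alpha} n^{-2\alpha})^{1/2}\big)$. We are left with (15), which can be written as $\sum_{t=1}^{n} z_{tn}$, where

$$z_{tn} = \varepsilon_t' \sum_{s=1}^{t-1} c_{t-s,n} \varepsilon_s, \qquad c_{tn} = \frac{1}{2\pi n \sqrt{m}} \sum_{j=1}^{m} \theta_j \cos(t\lambda_j),$$

$$\theta_j = \sum_{a=1}^{p-1} \eta_a \lambda_m^{d_a + d_p - 2\delta} \lambda_j^{2\delta}\, \mathrm{Re}\left( A_a'(\lambda_j) \bar A_p(\lambda_j) + A_p'(\lambda_j) \bar A_a(\lambda_j) \right).$$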
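Since $z_{tn}$ is a martingale difference array with respect to the filtration $(F_t)_{t \in \mathbb{Z}}$, $F_t = \sigma(\{\varepsilon_s, s \le t\})$, we can apply the CLT of Brown (1971) and Hall & Heyde (1980, chp. 3.2) if

$$\sum_{t=1}^{n} E\left( z_{tn}^2 \,|\, F_{t-1} \right) - \sum_{a=1}^{p-1} \sum_{b=1}^{p-1} \eta_a \eta_b F_{ab} \to_p 0, \tag{16}$$

$$\sum_{t=1}^{n} E\left( z_{tn}^2\, 1(|z_{tn}| > \kappa) \right) \to 0, \quad \kappa > 0. \tag{17}$$

A sufficient condition for (17) is

$$\sum_{t=1}^{n} E\left( z_{tn}^4 \right) \to 0. \tag{18}$$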
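First we show (16). The first term on the left-hand side is

$$\sum_{t=1}^{n} E\left( \sum_{s=1}^{t-1} \sum_{r=1}^{t-1} \varepsilon_s' c_{t-s,n}' \varepsilon_t \varepsilon_t' c_{t-r,n} \varepsilon_r \,\Big|\, F_{t-1} \right) = \sum_{t=1}^{n} \sum_{s=1}^{t-1} \varepsilon_s' c_{t-s,n}' c_{t-s,n} \varepsilon_s + o_p(1) \tag{19}$$

by Lemma 4. We need to show that the mean of the first term on the right-hand side of (19) is asymptotically equal to $\sum_{a=1}^{p-1} \sum_{b=1}^{p-1} \eta_a \eta_b F_{ab}$. Thus,

$$\sum_{t=1}^{n} \sum_{s=1}^{t-1} E\, \mathrm{tr}\left( c_{t-s,n}' c_{t-s,n} \varepsilon_s \varepsilon_s' \right) = \sum_{t=1}^{n} \sum_{s=1}^{t-1} \mathrm{tr}\left( c_{t-s,n}' c_{t-s,n} \right)$$

$$= \frac{1}{4\pi^2 n^2 m} \sum_{t=1}^{n} \sum_{s=1}^{t-1} \sum_{j=1}^{m} \mathrm{tr}\left( \theta_j' \theta_j \right) \cos^2((t-s)\lambda_j) \tag{20}$$

$$+ \frac{1}{4\pi^2 n^2 m} \sum_{t=1}^{n} \sum_{s=1}^{t-1} \sum_{j=1}^{m} \sum_{k \ne j} \mathrm{tr}\left( \theta_j' \theta_k \right) \cos((t-s)\lambda_j) \cos((t-s)\lambda_k). \tag{21}$$

Notice that since $\|\theta_j\| = O(1)$ by Theorem 2 of Robinson (1995b) we have

$$(21) = O\left( \frac{1}{n^2 m} \sum_{t=1}^{n} \sum_{s=1}^{t-1} \sum_{j=1}^{m} \sum_{k \ne j} \cos((t-s)\lambda_j) \cos((t-s)\lambda_k) \right),$$

and since $\sum_{t=1}^{n} \sum_{s=1}^{t-1} \cos((t-s)\lambda_j) \cos((t-s)\lambda_k) = -n/2$ for $k \ne j$, (21) is $O\big(n^{-2} m^{-1} \sum_{j=1}^{m} \sum_{k \ne j} n\big) = O(m/n)$. Now, $\mathrm{tr}(\theta_j' \theta_j)$ equals $\sum_{a=1}^{p-1} \sum_{b=1}^{p-1} \eta_a \eta_b \lambda_m^{d_a + d_b + 2d_p - 4\delta} \lambda_j^{4\delta}$ times

$$\mathrm{tr}\left( \mathrm{Re}\left( A_p'(\lambda_j) \bar A_a(\lambda_j) + A_a'(\lambda_j) \bar A_p(\lambda_j) \right) \mathrm{Re}\left( A_b'(\lambda_j) \bar A_p(\lambda_j) + A_p'(\lambda_j) \bar A_b(\lambda_j) \right) \right)$$

$$= 4\pi^2 \left( f_{ab}(\lambda_j) f_{pp}(\lambda_j) + f_{ap}(\lambda_j) f_{bp}(\lambda_j) + f_{pb}(\lambda_j) f_{pa}(\lambda_j) + f_{pp}(\lambda_j) f_{ba}(\lambda_j) \right)$$

by definition of $f(\lambda)$, see (7). By Assumption 1 the second and third terms are of smaller order, and since $(x + \bar x) = 2\, \mathrm{Re}(x)$ for any complex number $x$ we can thus rewrite (20) as

$$\frac{2}{n^2 m} \sum_{t=1}^{n} \sum_{s=1}^{t-1} \sum_{j=1}^{m} \sum_{a=1}^{p-1} \sum_{b=1}^{p-1} \eta_a \eta_b \lambda_m^{d_a + d_b + 2d_p - 4\delta} \lambda_j^{4\delta}\, \mathrm{Re}\left( f_{ab}(\lambda_j) f_{pp}(\lambda_j) \right) \cos^2((t-s)\lambda_j). \tag{22}$$

Using Lemma 1 to approximate the sum $\sum_{j=1}^{m}$ by an integral, and since $\sum_{t=1}^{n-1} \sum_{s=1}^{n-t} \cos^2(s\lambda_j) = (n-1)^2/4$, we have that (22) is

$$\sum_{a=1}^{p-1} \sum_{b=1}^{p-1} \eta_a \eta_b \lambda_m^{d_a + d_b + 2d_p - 4\delta} \frac{1}{\pi n m} \left( \sum_{t=1}^{n} \sum_{s=1}^{t-1} \cos^2((t-s)\lambda_j) \right) \int_0^{\lambda_m} \lambda^{4\delta}\, \mathrm{Re}\left( f_{ab}(\lambda) f_{pp}(\lambda) \right) d\lambda\, (1 + o(1))$$

$$\sim \sum_{a=1}^{p-1} \sum_{b=1}^{p-1} \eta_a \eta_b \lambda_m^{d_a + d_b + 2d_p - 4\delta - 1} \frac{1}{2} \int_0^{\lambda_m} \lambda^{4\delta}\, \mathrm{Re}\left( f_{ab}(\lambda) f_{pp}(\lambda) \right) d\lambda \to \sum_{a=1}^{p-1} \sum_{b=1}^{p-1} \eta_a \eta_b F_{ab},$$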
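and we have shown (16). Hence, we have to show (18),

$$\sum_{t=1}^{n} E\left( z_{tn}^4 \right) = \sum_{t=1}^{n} E\left( \sum_{s=1}^{t-1} \sum_{r=1}^{t-1} \sum_{p=1}^{t-1} \sum_{q=1}^{t-1} \varepsilon_s' c_{t-s,n}' \varepsilon_t\, \varepsilon_t' c_{t-r,n} \varepsilon_r\, \varepsilon_p' c_{t-p,n}' \varepsilon_t\, \varepsilon_t' c_{t-q,n} \varepsilon_q \right)$$

$$\le C \left( \sum_{t=1}^{n} \mathrm{tr}\left( \sum_{s=1}^{t-1} c_{t-s,n}' c_{t-s,n} c_{t-s,n}' c_{t-s,n} \right) + \sum_{t=1}^{n} \mathrm{tr}\left( \sum_{s=1}^{t-1} c_{t-s,n}' \sum_{r=1}^{t-1} c_{t-r,n} c_{t-r,n}' c_{t-s,n} \right) \right)$$

for some constant $C > 0$ by Assumption 2. Using the arguments in Lemma 4, this expression can be bounded by $O\big(n (\sum_{t=1}^{n} \|c_{tn}\|^2)^2\big) = O(n^{-1})$, which completes the proof.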
Proof of Theorem 2
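We show that

$$\sqrt{m}\, \lambda_m^{d_p} \Lambda_m^{-1} \left( \hat\beta_{\hat d_p,m} - \hat\beta_{d_p,m} \right) \to_p 0.$$

By definition of $\hat\beta_{\hat d_p,m}$ and $\hat\beta_{d_p,m}$, this amounts to showing that

$$\lambda_m^{-2d_p} \Lambda_m \frac{1}{m} \sum_{j=1}^{m} \left( \lambda_j^{2\hat d_p} - \lambda_j^{2d_p} \right) \mathrm{Re}(I_{xx}(\lambda_j)) \Lambda_m \to_p 0, \tag{23}$$

$$\sqrt{m}\, \lambda_m^{-d_p} \Lambda_m \frac{1}{m} \sum_{j=1}^{m} \left( \lambda_j^{2\hat d_p} - \lambda_j^{2d_p} \right) \mathrm{Re}(I_{xp}(\lambda_j)) \to_p 0. \tag{24}$$

Since $\max_{1 \le j \le m} |\lambda_j^{2(\hat d_p - d_p)} - 1| = O_p(|\hat d_p - d_p| \log n)$, we have

$$\lambda_m^{d_a + d_b - 2d_p} \frac{1}{m} \sum_{j=1}^{m} \left( \lambda_j^{2\hat d_p} - \lambda_j^{2d_p} \right) \mathrm{Re}(I_{ab}(\lambda_j)) = O_p\left( \lambda_m^{d_a + d_b - 2d_p} \max_{1 \le j \le m} \left| \lambda_j^{2(\hat d_p - d_p)} - 1 \right| \frac{1}{m} \sum_{j=1}^{m} \lambda_j^{2d_p} \left| \mathrm{Re}(I_{ab}(\lambda_j)) \right| \right)$$

$$= O_p\left( \lambda_m^{d_a + d_b - 2d_p} |\hat d_p - d_p| (\log n) \frac{1}{m} \sum_{j=1}^{m} \lambda_j^{2d_p - d_a - d_b} \right) = O_p\left( |\hat d_p - d_p| \log n \right)$$

and

$$\sqrt{m}\, \lambda_m^{d_a - d_p} \frac{1}{m} \sum_{j=1}^{m} \left( \lambda_j^{2\hat d_p} - \lambda_j^{2d_p} \right) \mathrm{Re}(I_{ap}(\lambda_j)) = O_p\left( \sqrt{m}\, \lambda_m^{d_a - d_p} |\hat d_p - d_p| (\log n) \frac{1}{m} \sum_{j=1}^{m} \lambda_j^{\alpha + d_p - d_a} \right)$$

$$= O_p\left( \sqrt{m}\, \lambda_m^{\alpha} |\hat d_p - d_p| \log n \right)$$

by Assumption 1. In view of Assumptions 4 and 6, this proves (23) and (24).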
Auxiliary Propositions and Lemmas
Here we provide a series of auxiliary results used to prove our main theorems. First, we provide an extension of the consistency result of Lobato (1997, Theorem 1) for the discretely averaged cross-periodogram, showing that the result is equally valid for our weighted cross-periodogram.
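Proposition 1 Under Assumptions 1, 2, 5, and $m^{-1} + m/n \to 0$,

$$\lambda_m^{d_a + d_b - 2\delta} \frac{1}{m} \sum_{j=1}^{m} \lambda_j^{2\delta}\, \mathrm{Re}(I_{ab}(\lambda_j)) - \frac{G_{ab}}{1 - d_a - d_b + 2\delta} \to_p 0, \quad a, b = 1, \ldots, p. \tag{25}$$

Proof. Decompose the left-hand side of (25) as

$$\lambda_m^{d_a + d_b - 2\delta} \frac{1}{m} \sum_{j=1}^{m} \lambda_j^{2\delta}\, \mathrm{Re}\left( I_{ab}(\lambda_j) - A_a(\lambda_j) J(\lambda_j) A_b^*(\lambda_j) \right) \tag{26}$$

$$+ \lambda_m^{d_a + d_b - 2\delta} \frac{1}{m} \sum_{j=1}^{m} \lambda_j^{2\delta}\, \mathrm{Re}\left( A_a(\lambda_j) \frac{1}{2\pi n} \sum_{t=1}^{n} \varepsilon_t \varepsilon_t' A_b^*(\lambda_j) - f_{ab}(\lambda_j) \right) \tag{27}$$

$$+ \lambda_m^{d_a + d_b - 2\delta} \frac{1}{m} \sum_{j=1}^{m} \lambda_j^{2\delta}\, \mathrm{Re}\left( A_a(\lambda_j) \frac{1}{2\pi n} \sum_{t=1}^{n} \sum_{s \ne t} \varepsilon_t \varepsilon_s' e^{i(t-s)\lambda_j} A_b^*(\lambda_j) \right) \tag{28}$$

$$+ \lambda_m^{d_a + d_b - 2\delta} \frac{1}{m} \sum_{j=1}^{m} \lambda_j^{2\delta}\, \mathrm{Re}(f_{ab}(\lambda_j)) - \frac{G_{ab}}{1 - d_a - d_b + 2\delta}. \tag{29}$$

By Lemmas 2 and 3 and the analysis of (15) in the proof of Theorem 1, (26)-(28) are all $o_p(1)$. Applying Lemma 1 to (29), together with Assumption 1, shows that (29) is $o(1)$, thus completing the proof.

The first lemma is undoubtedly well known and is provided for reference.

Lemma 1 For $m^{-1} + m/n \to 0$ and any $c \in (-1, 1]$,

$$\left| \frac{2\pi}{n} \sum_{j=1}^{m} \lambda_j^c - \int_0^{\lambda_m} \lambda^c\, d\lambda \right| = o\left( \lambda_m^{c+1} \right) \quad \text{as } n \to \infty.$$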
Since
As |j (/j )a | |j | for (j1 , j ) and a 0, the rst term on the right-hand side is X c1 Z j Z m X m c1 j c1 (30) j d = O j (j ) d . O j j j1 j1
j=1 j=1
Proof. For n suciently large, the left-hand side is c1 ! Z j m m X Z j X c1 c c j d. j d = j j j1 j1

j=1 j=1
P (j ) d = 2 2 /n2 it follows that (30) is O n2 m c1 , which is O m1 c+1 m j=1 j j1 if c > 0 and O (log m) nc1 = o c+1 if 1 < c 0. m The remaining lemmas are straightforward extensions (to incorporate our weights) and variants of previous results appearing in Robinson (1995a) and Lobato (1997, 1999). R j 58
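Lemma 2 Under the conditions of Proposition 1, for $a, b = 1, \ldots, p$,

$$\lambda_m^{d_a + d_b - 2\delta} \frac{1}{m} \sum_{j=1}^{m} \lambda_j^{2\delta}\, \mathrm{Re}\left( I_{ab}(\lambda_j) - A_a(\lambda_j) J(\lambda_j) A_b^*(\lambda_j) \right) = o_p(1), \tag{31}$$

and under the conditions of Theorem 1, for $a = 1, \ldots, p-1$,

$$\sqrt{m}\, \lambda_m^{d_a + d_p - 2\delta} \frac{1}{m} \sum_{j=1}^{m} \lambda_j^{2\delta}\, \mathrm{Re}\left( I_{ap}(\lambda_j) - A_a(\lambda_j) J(\lambda_j) A_p^*(\lambda_j) \right) = o_p(1). \tag{32}$$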
Proof. Using summation by parts, we have that

m X j=1
2 Re (Iab (j ) Aa (j ) J (j ) A (j )) j b 2 j 2 j+1 j X
k=1
m1 X j=1
Re (Iab (k ) Aa (k ) J
(k ) A (k )) b
+2 m
m X j=1
m X j=1
Re (Iab (j ) Aa (j ) J (j ) A (j )) b j X
k=1
n1 21 j
m X j=1
Re (Iab (k ) Aa (k ) J (k ) A (k )) b
+2 m
Re (Iab (j ) Aa (j ) J (j ) A (j )) , b
which is m X 1d n1 21 n1da db + 2 n1da db = op nm a db +2 op m m

j j j=1
by (3.3)-(3.4) in Lobato (1997), which apply under the conditions of our Proposition 1. It follows that the left-hand side of (31) is $o_p(1)$.
To prove the second statement, we use summation by parts to show that

m X j=1
m1 X j=1
2 Re Iap (j ) Aa (j ) J (j ) A (j ) j p j
2da dp m X j=1
j+1 j a
d +dp
2da dp
+m
m1 X j=1
2da dp
2d d 1 n1 j a p m X j=1
j X k=1
Re Iap (j ) Aa (j ) J (j ) A (j ) p ka
d +dp
j X k=1
ka
d +dp
Re Iap (k ) Aa (k ) J (k ) A (k ) p
+m
2da dp
j a
d +dp
Re Iap (j ) Aa (j ) J (j ) A (j ) . p
Re Iap (k ) Aa (k ) J (k ) A (k ) p
(33)
(34)
Under the conditions Theorem 1, we can apply eq. (C.2) in Lobato (1999) to conclude that of 2da dp m m1/3 (log m)2/3 + (log m) + m1/2 n1/4 and (33) is (34) is Op Op nda +dp 2
m X j=1
= Op nda +dp 2 (log m)2 1 + m2da dp +1/3 + n1/4 m2da dp +1/2 .
j 2da dp 1 j 1/3 (log j)2/3 + (log j) + j 1/2 n1/4
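Thus, the left-hand side of (32) is $O_p\big(m^{-1/6} (\log m)^{2/3} + m^{-1/2} (\log m) + n^{-1/4} + m^{d_a + d_p - 2\delta - 1/2} (\log m)^2 + m^{-1/6} (\log m)^2 + n^{-1/4} (\log m)^2\big) = o_p(1)$ by Assumptions 4 and 5.

Lemma 3 Under the conditions of Proposition 1, for $a, b = 1, \ldots, p$,

$$\lambda_m^{d_a + d_b - 2\delta} \frac{1}{m} \sum_{j=1}^{m} \lambda_j^{2\delta}\, \mathrm{Re}\left( A_a(\lambda_j) \frac{1}{2\pi n} \sum_{t=1}^{n} \varepsilon_t \varepsilon_t' A_b^*(\lambda_j) - f_{ab}(\lambda_j) \right) = o_p(1).$$

Proof. The proof follows parts of the proof of Lobato (1997, Proposition 3). By definition of $f(\lambda)$, the left-hand side is bounded by

$$\left| \lambda_m^{d_a + d_b - 2\delta} \frac{1}{m} \sum_{j=1}^{m} \lambda_j^{2\delta} \frac{1}{2\pi} A_a(\lambda_j) D A_b^*(\lambda_j) \right|, \tag{35}$$

where $D = n^{-1} \sum_{t=1}^{n} (\varepsilon_t \varepsilon_t' - I_p)$ satisfies $\|D\| = O_p(n^{-1/2})$, since by Assumption 2, $\varepsilon_t \varepsilon_t' - I_p$ is a martingale difference sequence with respect to the filtration $(F_t)_{t \in \mathbb{Z}}$. Then, since $\|A_i(\lambda_j)\| = O(f_{ii}(\lambda_j)^{1/2})$, $i = a, b$, (35) is bounded by

$$\frac{1}{2\pi \sqrt{m}} \lambda_m^{d_a + d_b - 2\delta} \left( \sum_{j=1}^{m} \lambda_j^{4\delta} \|A_a(\lambda_j)\|^2 \|D\|^2 \|A_b(\lambda_j)\|^2 \right)^{1/2}$$

$$= O_p\left( m^{-1/2} \lambda_m^{d_a + d_b - 2\delta} \|D\| \left( \sum_{j=1}^{m} \lambda_j^{4\delta} f_{aa}(\lambda_j) f_{bb}(\lambda_j) \right)^{1/2} \right)$$

$$= O_p\left( m^{-1/2} \lambda_m^{d_a + d_b - 2\delta} \|D\| \left( \sum_{j=1}^{m} \lambda_j^{2\delta} f_{aa}(\lambda_j) \right)^{1/2} \left( \sum_{j=1}^{m} \lambda_j^{2\delta} f_{bb}(\lambda_j) \right)^{1/2} \right),$$

which is $O_p(\lambda_m^{1/2}) = o_p(1)$ as required.

Lemma 4 Under the conditions of Theorem 1,

$$\sum_{t=2}^{n} E\left( \sum_{s=1}^{t-1} \sum_{r=1}^{t-1} \varepsilon_s' c_{t-s,n}' \varepsilon_t \varepsilon_t' c_{t-r,n} \varepsilon_r \,\Big|\, F_{t-1} \right) - \sum_{t=2}^{n} \sum_{s=1}^{t-1} \varepsilon_s' c_{t-s,n}' c_{t-s,n} \varepsilon_s = o_p(1).$$

Proof. We prove convergence in mean-square. The left-hand side is $\sum_{t=2}^{n} \sum_{s=1}^{t-1} \sum_{r \ne s} \varepsilon_s' c_{t-s,n}' c_{t-r,n} \varepsilon_r$, which has mean zero and variance

$$O\left( n \left( \sum_{s=1}^{n} \|c_{sn}\|^2 \right)^2 + \sum_{t=3}^{n} \sum_{u=2}^{t-1} \left( \sum_{s=1}^{u-1} \|c_{u-s,n}\|^2 \sum_{s=1}^{u-1} \|c_{t-s,n}\|^2 \right) \right), \tag{36}$$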
following the analysis in Robinson (1995a, p. 1646) and Lobato (1999, pp. 150-151). By Theorem 2 of Robinson (1995b), kj k = O (m/j)da +dp 2 such that kcsn k is bounded by 1 kcsn k = O kj k n m j=1 m da +dp 21/2 X m j 2da dp = O n j=1 m = O . n Next, dene the functions Ha () = 2 Re A0 () Ap () + A0 () Aa () such that j = a p Pp1 da +dp 2 Ha (j ) and Ha () = O 2da dp as 0+ , and Ha () is dierentiable a=1 a m 61
m X
from summation by parts. Using the Mean Value Theorem it follows that Ha (j )Ha (j+1 ) = Ha ( ) Ha ( ) (j+1 j ) j = 2 j , and the bound is n j da +d 2 m1 X 1 2da dp 1 X m p n j cos (sk ) kcsn k = O max 1ap1 n m j=1 k=1 m da +d 2 2da dp X m p m +O max cos (sj ) 1ap1 n m j=1 1 = O s m P using also l cos (sj ) = O (n/s), see Zygmund (2002, p. 2). This bound is better when j=1 s > n/m. Thus, we nd that [n/m] n n X X m X 1 kcsn k2 = O + 2 2m n s s=1 s=1 s=[n/m]+1 1 = O n , implying that the rst term of (36) is O n1 . The second term of (36) is bounded by ! [n/2] n X X O n kcsn k2 s kcsn k2 ,
s=1 s=1
with Ha () / = O 2da dp 1 as 0+ by Assumption 3. Now we can derive an alternative bound as m da +dp 2 X max m Ha (j ) cos (sj ) kcsn k = O 1ap1 n m j=1 j da +dp 2 m1 X X max m = O (Ha (j ) Ha (j+1 )) cos (sk ) 1ap1 n m j=1 k=1 m da +dp 2 X Ha (m ) max m +O cos (sj ) 1ap1 n m j=1
see Robinson (1995a, pp. 1646-1647). The summand in the last sum is O(sm/n2 + (sm)1 ). Choose the rst bound when s n/m2/3 , then the last sum is [n/m2/3 ] [n/2] X 1 1 X sm + O , =O n2 sm m1/3 2/3 +1 s=1 s=[n/m ] 62
and (36) $= O(n^{-1} + m^{-1/3})$.
References
Adenstedt, R. K. (1974), On large-sample estimation of the mean of a stationary random sequence, Annals of Statistics 2, 1095-1107.
Beran, J. (1994), Statistics for Long-Memory Processes, Chapman-Hall, New York.
Brown, B. M. (1971), Martingale central limit theorems, Annals of Mathematical Statistics 42, 59-66.
Dahlhaus, R. (1995), Efficient location and regression estimation for long range dependent regression models, Annals of Statistics 23, 1029-1047.
Doornik, J. A. (2001), Ox: An Object-Oriented Matrix Language, 4th edn, Timberlake Consultants Press, London.
Doornik, J. A. & Ooms, M. (2001), A package for estimating, forecasting and simulating Arfima models: Arfima package 1.01 for Ox, Working Paper, Nuffield College, Oxford.
Geweke, J. & Porter-Hudak, S. (1983), The estimation and application of long memory time series models, Journal of Time Series Analysis 4, 221-238.
Granger, C. W. J. & Joyeux, R. (1980), An introduction to long memory time series models and fractional differencing, Journal of Time Series Analysis 1, 15-29.
Hall, P. & Heyde, C. C. (1980), Martingale Limit Theory and its Application, Academic Press, New York.
Hannan, E. J. (1979), The central limit theorem for time series regression, Stochastic Processes and their Applications 9, 281-289.
Hassler, U., Marmol, F. & Velasco, C. (2000), Residual log-periodogram inference for long-run relationships, Forthcoming in Journal of Econometrics.
Hosking, J. R. M. (1981), Fractional differencing, Biometrika 68, 165-176.
Lobato, I. N. (1997), Consistency of the averaged cross-periodogram in long memory series, Journal of Time Series Analysis 18, 137-155.
Lobato, I. N. (1999), A semiparametric two-step estimator in a multivariate long memory model, Journal of Econometrics 90, 129-153.
Lobato, I. N. & Robinson, P. M. (1996), Averaged periodogram estimation of long memory, Journal of Econometrics 73, 303-324.
Mandelbrot, B. B. & Van Ness, J. W. (1968), Fractional Brownian motions, fractional noises and applications, SIAM Review 10, 422-437.
Phillips, P. C. B. (1986), Understanding spurious regressions in econometrics, Journal of Econometrics 33, 311-340.
Robinson, P. M. (1994a), Semiparametric analysis of long-memory time series, Annals of Statistics 22, 515-539.
Robinson, P. M. (1994b), Time series with strong dependence, in C. A. Sims, ed., Advances in Econometrics, Cambridge University Press, Cambridge, pp. 47-95.
Robinson, P. M. (1995a), Gaussian semiparametric estimation of long range dependence, Annals of Statistics 23, 1630-1661.
Robinson, P. M. (1995b), Log-periodogram regression of time series with long range dependence, Annals of Statistics 23, 1048-1072.
Robinson, P. M. (1997), Large-sample inference for nonparametric regression with dependent errors, Annals of Statistics 25, 2054-2083.
Robinson, P. M. & Hidalgo, F. J. (1997), Time series regression with long-range dependence, Annals of Statistics 25, 77-104.
Robinson, P. M. & Marinucci, D. (2003), Semiparametric frequency domain analysis of fractional cointegration, in P. M. Robinson, ed., Time Series With Long Memory, Oxford University Press, Oxford, pp. 334-373.
Tsay, W. J. & Chung, C. F. (2000), The spurious regression of fractionally integrated processes, Journal of Econometrics 96, 155-182.
Velasco, C. (2001), Gaussian semiparametric estimation of fractional cointegration, Preprint, Universidad Carlos III de Madrid.
Yajima, Y. (1988), On estimation of a regression model with long-memory stationary errors, Annals of Statistics 16, 791-807.
Yajima, Y. (1991), Asymptotic properties of the LSE in a regression model with long-memory stationary errors, Annals of Statistics 19, 158-177.
Zygmund, A. (2002), Trigonometric Series, third edn, Cambridge University Press, Cambridge.
Table 1: Bias (x100) of GLS estimate for Model A.

n = 256 du \dx 0 0.1 0.2 0.3 0.4 du \dx 0 0.1 0.2 0.3 0.4 du \dx 0 0.1 0.2 0.3 0.4 0 .7733 0 .2024 0 .3978 m = [n0.4 ] = 9 0.1 0.2 0.3 .0060 .1793 .0591 .4049 .1212 .3895 .2961 .0829 .1227 m = [n0.4 ] = 12 0.1 0.2 0.3 .1835 .0131 .0095 .0960 .0028 .1694 .3121 .0062 .1069 m = [n0.4 ] = 16 0.1 0.2 0.3 .0492 .0090 .0245 .0727 .0642 .0984 .1535 .1370 .0975 0.4 .0591 .0629 .0920 .1817 .0347 n = 512 0 .2953 m = [n0.5 ] = 16 0.1 0.2 0.3 .0263 .1773 .1337 .0518 .1541 .0450 .0927 .0666 .0648 m = [n0.5 ] = 22 0.1 0.2 0.3 .0369 .0850 .0244 .0169 .0113 .0200 .0622 .1402 .0828 m = [n0.5 ] = 32 0.1 0.2 0.3 .1215 .0572 .0241 .1049 .0279 .0138 .1881 .0478 .0736 0.4 .0020 .0566 .0435 .1498 .0270 0.4 .0257 .1260 .1765 .0957 .1440 0.4 .0327 .0683 .0704 .0793 .1162
0.4 0 .0542 .0900 .1098 .2058 .0496 .1913 n = 1024 0.4 .0597 .0384 .0691 .0183 .0591 0 .1952
Table 2: Bias (x100) of FGLS estimate for Model A.

n = 256 du \dx 0 0.1 0.2 0.3 0.4 0 .7944 0 .1949 0 .3073 m = [n0.4 ] = 9 0.1 0.2 0.3 .0014 .1751 .0578 .3735 .0490 .3513 .3707 .0799 .1947 m = [n0.4 ] = 12 0.1 0.2 0.3 .1477 .0034 .0076 .0626 .0300 .1943 .3309 .0133 .0595 m = [n0.4 ] = 16 0.1 0.2 0.3 .0163 .0188 .0242 .0170 .0924 .1709 .1925 .1273 .1410 0.4 .0273 .0506 .1320 .2056 .2135 n = 512 0 .3206 m = [n0.5 ] = 16 0.1 0.2 0.3 .0159 .1937 .1089 .1036 .1480 .0222 .1950 .0565 .0201 m = [n0.5 ] = 22 0.1 0.2 0.3 .0205 .0849 .0301 .0064 .0021 .0200 .1177 .1292 .0069 m = [n0.5 ] = 32 0.1 0.2 0.3 .1301 .0563 .0063 .0840 .0204 .0025 .1846 .0452 .0700 0.4 .0080 .0425 .0557 .1935 .0475
du \dx 0 0.1 0.2 0.3 0.4
0.4 0 .0600 .1094 .1075 .2500 .0487 .1509 n = 1024 0.4 .0831 .0285 .0592 .0204 .0024 0 .2009
0.4 .0145 .1476 .1904 .1094 .1144 0.4 .0486 .0695 .0648 .1098 .0464
du \dx 0 0.1 0.2 0.3 0.4
Table 3: MSE ratio of GLS estimate for Model A.

n = 256 0.4 .4021 .6343 .7672 .8581 .8578 n = 512 du \dx 0 0.1 0.2 0.3 0.4 0 .8858 0 .9212 0 .9550 m = [n0.4 ] = 9 0.1 0.2 0.3 .8286 .7605 .6042 .9064 .8340 .7468 .8751 .8227 .8532 m = [n0.4 ] = 12 0.1 0.2 0.3 .8944 .7923 .6594 .9050 .9000 .7855 .9035 .8934 .8877 m = [n0.4 ] = 16 0.1 0.2 0.3 .9079 .8373 .7032 .9494 .9050 .8277 .93425 .9159 .8995 0 .9398 m = [n0.5 ] = 16 0.1 0.2 0.3 .8692 .8281 .7038 .9417 .8997 .8580 .9527 .8950 .9185 m = [n0.5 ] = 22 0.1 0.2 0.3 .9099 .8538 .7457 .9587 .9071 .8954 .9674 .9200 .9286 m = [n0.5 ] = 32 0.1 0.2 0.3 .9541 .9007 .7810 .9770 .9386 .9251 .9710 .9481 .9500 0.4 .4553 .7225 .8200 .9092 .8982
du \dx 0 0.1 0.2 0.3 0.4
0.4 0 .4268 .9459 .6714 .8185 .8900 .8817 n = 1024 0.4 .4752 .7218 .8410 .9016 .8820 0 .9827
0.4 .4942 .7519 .8681 .9258 .9184
du \dx 0 0.1 0.2 0.3 0.4
0.4 .5350 .8089 .9023 .9336 .9117
Table 4: MSE ratio of FGLS estimate for Model A.

n = 256 0.4 .37413 .5829 .7052 .7882 .7608 n = 512 du \dx 0 0.1 0.2 0.3 0.4 0 .8287 0 .8693 0 .9127 m = [n0.4 ] = 9 0.1 0.2 0.3 .7737 .7085 .5597 .8448 .7725 .6939 .8050 .7620 .7793 m = [n0.4 ] = 12 0.1 0.2 0.3 .8395 .7441 .6148 .8448 .8372 .7371 .8479 .8266 .8264 m = [n0.4 ] = 16 0.1 0.2 0.3 .8650 .7866 .6617 .8952 .8585 .7873 .8848 .8574 .8469 0 .8930 m = [n0.5 ] = 16 0.1 0.2 0.3 .8236 .7826 .6639 .8910 .8535 .8089 .9035 .8456 .8537 m = [n0.5 ] = 22 0.1 0.2 0.3 .8742 .8179 .7108 .9278 .8700 .8503 .9212 .8808 .8847 m = [n0.5 ] = 32 0.1 0.2 0.3 .9231 .8760 .7494 .9479 .9090 .8855 .9405 .9173 .9202 0.4 .4314 .6755 .7709 .8428 .8314
du \dx 0 0.1 0.2 0.3 0.4
0.4 0 .3996 .9131 .6239 .7529 .8093 .7972 n = 1024 0.4 .4474 .6749 .7866 .8367 .8203 0 .9561
0.4 .4693 .7104 .8195 .8723 .8683
du \dx 0 0.1 0.2 0.3 0.4
0.4 .5104 .7753 .8678 .8947 .8746
Table 5: Bias (x100) of GLS estimate for Model B.

n = 256 du \dx 0 0.1 0.2 0.3 0.4 0 .2044 0 .0457 0 .0908 m = [n0.4 ] = 9 0.1 0.2 0.3 .1635 .0611 .1900 .1195 .1304 .1619 .4386 .0352 .1338 m = [n0.4 ] = 12 0.1 0.2 0.3 .0221 .1051 .1531 .0220 .0913 .0911 .3577 .2223 .0351 m = [n0.4 ] = 16 0.1 0.2 0.3 .0696 .0998 .0120 .0906 .0527 .0502 .3233 .0296 .0690 0.4 .0728 .0522 .1474 .2467 .0978 n = 512 0.4 .0083 .0530 .1993 .3337 .0613 n = 1024 0.4 .0203 .0251 .1125 .2212 .2007 0 .4680 0 .0678 0 .0491 m = [n0.5 ] = 16 0.1 0.2 0.3 .1765 .1787 .1638 .0037 .0905 .0576 .0469 .2834 .1265 m = [n0.5 ] = 22 0.1 0.2 0.3 .2716 .0721 .0289 .1370 .0143 .0653 .1206 .0003 .1794 m = [n0.5 ] = 32 0.1 0.2 0.3 .0093 .0748 .0161 .1616 .0710 .0687 .2311 .1081 .1561 0.4 .0220 .1679 .2454 .0978 .2504 0.4 .0365 .1247 .0861 .0876 .5301 0.4 .0135 .0174 .0775 .0833 .3223
du \dx 0 0.1 0.2 0.3 0.4
du \dx 0 0.1 0.2 0.3 0.4
Table 6: Bias (x100) of FGLS estimate for Model B.

n = 256 du \dx 0 0.1 0.2 0.3 0.4 0 .2263 0 .2877 0 .1009 m = [n0.4 ] = 9 0.1 0.2 0.3 .0342 .0154 .2082 .1463 .1412 .2009 .4286 .0571 .1530 m = [n0.4 ] = 12 0.1 0.2 0.3 .0495 .1330 .1251 .0223 .0824 .1044 .3494 .1992 .0305 m = [n0.4 ] = 16 0.1 0.2 0.3 .04255 .0984 .0057 .0780 .0156 .0353 .4002 .0451 .0015 0.4 .0748 .0675 .2076 .3159 .2354 n = 512 0.4 .0045 .0610 .2081 .4237 .0911 n = 1024 0.4 .0297 .0172 .1011 .2307 .1773 0 .4661 0 .0552 0 .0297 m = [n0.5 ] = 16 0.1 0.2 0.3 .2301 .1878 .1502 .0124 .0525 .0417 .0460 .3203 .2171 m = [n0.5 ] = 22 0.1 0.2 0.3 .2613 .0942 .0040 .1756 .0484 .0707 .0588 .0229 .1858 0.1 .0260 .1708 m = [n0.5 ] = 32 0.2 0.3 .1000 .0278 .0915 .0481 .2167 .1124 .1570 0.4 .0067 .1743 .2713 .1449 .1875 0.4 .0202 .1187 .0889 .1330 .5193 0.4 .0037 .0172 .0743 .0920 .3276
du \dx 0 0.1 0.2 0.3 0.4
du \dx 0 0.1 0.2 0.3 0.4
Table 7: MSE ratio of GLS estimate for Model B.

n = 256 du \dx 0 0.1 0.2 0.3 0.4 0 .8782 0 .8863 0 .9217 m = [n0.4 ] = 9 0.1 0.2 0.3 .8500 .7665 .6340 .8903 .8317 .7777 .8817 .8305 .8598 m = [n0.4 ] = 12 0.1 0.2 0.3 .8912 .8051 .6805 .9139 .8746 .8092 .9220 .8612 .9272 m = [n0.4 ] = 16 0.1 0.2 0.3 .8971 .8479 .7196 .9292 .8889 .8366 .9143 .8963 .9386 0.4 0 .3966 .9135 .6326 .7646 .8311 .8540 n = 512 0.4 0 .4308 .9531 .6478 .8125 .8929 .8676 n = 1024 0.4 .4695 .7031 .8374 .9052 .9238 0 .9826 m = [n0.5 ] = 16 0.1 0.2 0.3 .8589 .8212 .7026 .9240 .8744 .8403 .9238 .8859 .8970 m = [n0.5 ] = 22 0.1 0.2 0.3 .8966 .8478 .7550 .9336 .9048 .8557 .9464 .9016 .9380 m = [n0.5 ] = 32 0.1 0.2 0.3 .9504 .8963 .8092 .9767 .9378 .8871 .9761 .9410 .9518 0.4 .4690 .7057 .8334 .8553 .8810
du \dx 0 0.1 0.2 0.3 0.4
0.4 .5085 .7446 .8787 .8961 .8992
du \dx 0 0.1 0.2 0.3 0.4
0.4 .5561 .7675 .9026 .9313 .9551
Table 8: MSE ratio of FGLS estimate for Model B.

n = 256 du \dx 0 0.1 0.2 0.3 0.4 0 .8266 0 .8395 0 .8783 m = [n0.4 ] = 9 0.1 0.2 0.3 .7917 .7093 .5863 .8356 .7706 .7201 .8134 .7671 .7843 m = [n0.4 ] = 12 0.1 0.2 0.3 .8340 .7511 .6301 .8625 .8172 .7550 .8586 .8037 .8547 m = [n0.4 ] = 16 0.1 0.2 0.3 .8570 .7953 .6714 .8858 .8346 .7891 .8614 .8441 .8773 0.4 0 .3676 .8857 .5899 .7069 .7501 .7621 n = 512 0.4 0 .4037 .9164 .6062 .7484 .8305 .7897 n = 1024 0.4 .4400 .6570 .7830 .8370 .8542 0 .9547 m = [n0.5 ] = 16 0.1 0.2 0.3 .8287 .7889 .6725 .8944 .8344 .8002 .8864 .8474 .8544 m = [n0.5 ] = 22 0.1 0.2 0.3 .8590 .8151 .7277 .9071 .8626 .8265 .9143 .8622 .9022 m = [n0.5 ] = 32 0.1 0.2 0.3 .9216 .8631 .7750 .9539 .9043 .8531 .9509 .9109 .9219 0.4 .4469 .6737 .7994 .8112 .8265
du \dx 0 0.1 0.2 0.3 0.4
0.4 .4845 .7140 .8404 .8520 .8543
du \dx 0 0.1 0.2 0.3 0.4
0.4 .5352 .7366 .8623 .8994 .9166
Chapter 3
Published in Journal of Business and Economic Statistics, 2004, vol. 22, pp. 331-345
Abstract

We propose a Lagrange Multiplier test of the null hypothesis of cointegration in fractionally cointegrated models. The test statistic utilizes fully modified residuals to cancel the endogeneity and serial correlation biases, and we show that standard asymptotic properties apply under the null and under local alternatives. With i.i.d. Gaussian errors the asymptotic Gaussian power envelope of all (unbiased) tests is achieved by the one-sided (two-sided) test. The finite sample properties are illustrated by a Monte Carlo study. In an application to the dynamics among exchange rates for seven major currencies against the US dollar, mixed evidence of the existence of a cointegrating relation is found.

JEL Classification: C12, C22, C32

Keywords: Cointegration Test, Fully Modified Estimation, Nonstationarity, Optimal Test, Power Envelope
This paper has benefited from comments by Niels Haldrup, Michael Jansson, Peter Phillips, Katsumi Shimotsu, seminar participants at the University of Aarhus and Yale University, and two anonymous referees. In particular, Jörg Breitung and Søren Johansen provided many very helpful and constructive suggestions which improved the paper significantly.
Introduction
In this paper we propose a Lagrange Multiplier (LM) test of the null hypothesis of cointegration in fractionally cointegrated models. In nonstationary and possibly cointegrated models, estimators and test statistics are often found to have nonstandard distributional properties when the null is nested in the autoregressive alternatives typically considered in the literature. In contrast, we show that by embedding the model of interest in a general $I(d)$ framework, the LM test statistic regains the standard distributional properties and uniform optimality properties well known from simpler models.

The analysis of cointegration has been a very active area of research in the econometrics and time series literature in the last 20 years, starting with the seminal contributions by Granger (1981) and Engle & Granger (1987). Most of this work has considered the $I(1)$-$I(0)$ type of cointegration in which linear combinations of two or more $I(1)$ variables are $I(0)$. A process is labelled $I(0)$ if it is covariance stationary and has spectral density that is bounded and bounded away from zero at the origin, and $I(1)$ if the first differenced series is $I(0)$. If $y_t$ and $x_t$ are $I(1)$, and hence in particular nonstationary (unit root) processes, but there exists a process $e_t$ which is $I(0)$ and a fixed $\beta$ such that

$$y_t = \beta' x_t + e_t, \tag{1}$$

then $y_t$ and $x_t$ are said to be cointegrated. Thus, the nonstationary series move together in the sense that a linear combination of them is stationary and a common stochastic trend is shared. Testing for cointegration in this framework amounts to testing stationarity of the unobserved residual process $e_t$ against a unit root alternative, see e.g. Shin (1994), Jansson (2004), and the references therein.

The above notion of cointegration is based on the knife-edge distinction between $I(1)$ and $I(0)$ processes. However, many economic and financial time series exhibit strong persistence without exactly possessing unit roots; for some recent evidence see e.g. Diebold & Rudebusch (1989), Baillie & Bollerslev (1994), Baillie (1996), Lobato & Velasco (2000), and Marinucci & Robinson (2001). This has led to the consideration of the class of fractionally integrated processes, which is more general than $I(1)$ and still admits a criterion for linear co-movement of series. Thus, a process is fractionally integrated of order $d$, denoted $I(d)$, if its $d$th difference is $I(0)$. Here, $d$ may be any real number, i.e. $d = 0$ or $d = 1$ are special cases. For a precise statement, $x_t$ is $I(d)$ if

$$\Delta^{d} x_t = u_t I(t \geq 1) = u_t^{\#}, \tag{2}$$

or equivalently, inverting (2),

$$x_t = \Delta^{-d} u_t^{\#}, \tag{3}$$

defining $u_t^{\#} = u_t I(t \geq 1)$, where $u_t$ is $I(0)$, $I(\cdot)$ denotes the indicator function, and the fractional difference operator $\Delta^{d} = (1-L)^{d}$ is defined by its binomial expansion

$$(1-L)^{d} = \sum_{j=0}^{\infty} \frac{\Gamma(j-d)}{\Gamma(-d)\,\Gamma(j+1)} L^{j}, \qquad \Gamma(z) = \int_{0}^{\infty} t^{z-1} e^{-t}\,dt, \tag{4}$$

in the lag operator $L$ ($L x_t = x_{t-1}$). With the definition (2) or (3), $x_t$ is a type II fractionally integrated process, which is nonstationary for all $d$ but asymptotically stationary for $d < 1/2$, see Marinucci & Robinson (1999).

Following the original idea by Granger (1981), a natural generalization of the cointegration concept is to assume that the raw series are $I(d)$ and that a certain linear combination is $I(d-b)$, with $d \geq b$ positive real numbers. This is denoted $CI(d, b)$. To fix ideas, consider the simple system

$$\Delta^{d-b+\theta}\left(y_{1t} - \beta' y_{2t}\right) = u_{1t}^{\#}, \tag{5}$$

$$\Delta^{d} y_{2t} = u_{2t}^{\#}, \tag{6}$$

where $u_t = (u_{1t}, u_{2t}')'$ is $I(0)$. In this model $y_t$ is $CI(d, b-\theta)$ and the cointegration vector is given by $(1, -\beta')'$. Clearly, this allows the study of co-movement among persistent series much more generally than in the standard unit root based $I(1)$-$I(0)$ cointegration framework.

In the present paper, we assume that $d$ and $b$ are known a priori and satisfy $d \geq b \geq 3/4 + \varepsilon$ for some $\varepsilon > 0$. We wish to test the hypothesis $H_0: \theta = 0$; with $d = b = 1$, this can be seen as an alternative to testing for stationarity of the residuals in (1). If the null hypothesis is changed slightly in this setup, the properties of the process $y_t$ do not change as dramatically as in the standard cointegration model, in which the relation (1) is either perfectly cointegrating, i.e. $CI(1, 1)$, or spurious. A notion of near-cointegration does exist in the unit root based $I(1)$-$I(0)$ cointegration literature, which offers some smoothing of the gap between $CI(1, 1)$ and spurious regression, e.g. Jansson & Haldrup (2002). However, the test statistics in that framework still have nonstandard distributional properties. We show that in our fractional integration framework much more desirable properties are obtained than in (1).
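The binomial expansion in (4) and the truncation in (2)-(3) can be illustrated numerically. The following is a minimal sketch, not code from the original study: the function names `frac_diff_coeffs` and `frac_diff` are hypothetical, and the coefficients are obtained from the recursion $\pi_j = \pi_{j-1}(j-1-d)/j$, $\pi_0 = 1$, which follows from (4) because $\Gamma(z+1) = z\Gamma(z)$.

```python
import numpy as np


def frac_diff_coeffs(d, n):
    """Return pi_0, ..., pi_{n-1}, the coefficients of (1 - L)^d in eq. (4)."""
    pi = np.ones(n)
    for j in range(1, n):
        # Ratio of consecutive terms of (4): pi_j / pi_{j-1} = (j - 1 - d) / j
        pi[j] = pi[j - 1] * (j - 1 - d) / j
    return pi


def frac_diff(x, d):
    """Type II fractional difference of a 1-D series.

    Pre-sample values are treated as zero, mirroring the truncation
    u_t I(t >= 1) in (2); a negative d applies the inverse filter in (3).
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    pi = frac_diff_coeffs(d, n)
    # First n terms of the convolution: sum_{j=0}^{t} pi_j x_{t-j}
    return np.convolve(pi, x)[:n]
```

With `d = 1` the filter reduces to the ordinary first difference (with the first observation kept as is), and applying `frac_diff(., -d)` followed by `frac_diff(., d)` recovers the input up to rounding error, since the truncated filters compose exactly.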
Our test can be considered an extension of the univariate LM tests in Robinson (1991, 1994), Agiakloglou & Newbold (1994), and Tanaka (1999), among others, who considered testing for a unit root in a fractional integration framework, i.e. testing on the parameter $d$ in (2) in the frequency and time domains. They showed that their tests have standard asymptotic distributions and, under Gaussianity, that their tests enjoy optimality properties. Simulations in Tanaka (1999) showed that, in finite samples, the time domain tests are superior to Robinson's (1994) frequency domain LM test with respect to both size and power. Presumably there exist Wald and likelihood ratio versions of our LM test, which have the same asymptotic properties as our test even though their finite sample properties may differ, as shown by Nielsen (2004) in a general univariate model. However, we consider only the time domain LM test for fractional cointegration with the usual computational motivation that the model only needs to be estimated under the null hypothesis. As we shall see below, in the important special case $d = b = 1$ the computation of the LM test statistic does not require any fractional differencing, and indeed all that is needed in this case are the residuals from a fully modified regression, which can be obtained from readily available computer software.

We show that the likelihood theory in the time domain is tractable and that the ML estimator of the cointegrating vector $\beta$, which is required to compute the test statistic, reduces to a version of the fully modified least squares estimator of Phillips & Hansen (1990) and Phillips (1991), see also Kim & Phillips (2001) for a fractional cointegration version. We then show that the LM test can be calculated using the residuals from the fully modified regression and establish the desirable distributional properties and optimality properties of the test. In particular, the test statistic is consistent and asymptotically normal or chi-squared distributed, and under the additional assumption of Gaussianity the test is locally most powerful. Indeed, we show that in the special case with i.i.d. Gaussian errors, the asymptotic Gaussian power envelope of all (unbiased) tests is achieved by the one-sided (two-sided) version of our test, i.e. the one-sided (two-sided) test is asymptotically uniformly most powerful among all (unbiased) tests. In a simulation study we find that the finite sample rejection frequencies are reasonable but well below the asymptotic local power for samples of size $n = 200$, and much closer to the asymptotic local power for $n = 500$.

Our new methodology is applied to the analysis of exchange rate dynamics following Baillie & Bollerslev (1989, 1994).
Previous studies have focused on the estimation of the cointegration vector and the memory parameter of the equilibrium errors, but no formal testing of the hypothesis of fractional cointegration has been done. We concentrate on testing for the presence of (fractional) cointegration with various specifications of $d$ and $b$. Our findings are not decisive, but we do find some evidence of cointegration among a system of exchange rates for seven major currencies against the US Dollar. In particular, we do not reject at the 1% level (against fractional alternatives) that the exchange rates can be described by a standard $I(1)$-$I(0)$ cointegration model when the errors (i.e. $u_{1t}$ and $u_{2t}$ in (5) and (6) above) are allowed to follow autoregressive processes of order one.

The remainder of the paper is laid out as follows. Section 2 sets up the model of fractional cointegration. In section 3 we consider the estimation of the cointegrating vector, derive the LM test statistic, and establish the desirable distributional properties. In section 4 we derive the asymptotic Gaussian power envelopes for the one-sided and two-sided testing problems and show that they coincide with the local asymptotic power functions of the one-sided and two-sided LM tests. Section 5 presents the results of the Monte Carlo study and in section 6 we provide the empirical application to exchange rate dynamics. Section 7 offers some concluding remarks. All proofs are collected in the appendix.
A Model of Fractional Cointegration
Suppose we observe the $K$-vector time series $\{y_t, t = 1, 2, ..., n\}$, which we partition as $y_{1t}$ (scalar) and $y_{2t}$ ($(K-1)$-vector). We consider a triangular model of fractional cointegration in the spirit of the Phillips (1991) triangular system. Thus, let $y_t$ be generated by the fractionally cointegrated system

$$y_{1t} = \beta' y_{2t} + z_t, \qquad t = 1, 2, ..., \tag{7}$$

$$\Delta^{d-b+\theta} z_t = u_{1t}^{\#}, \qquad t = 1, 2, ..., \tag{8}$$

$$\Delta^{d} y_{2t} = u_{2t}^{\#}, \qquad t = 1, 2, ..., \tag{9}$$

where $z_t$ is the (unobserved) deviation from the cointegrating relation and $u_t = (u_{1t}, u_{2t}')'$ is an error component. We allow the error components $u_{1t}$ and $u_{2t}$ to be contemporaneously correlated and possibly weakly dependent, c.f. Assumption 1 below.

The system (7)-(9) generalizes the standard triangular cointegration model. The series share fractionally integrated stochastic trends of orders $I(d)$ and $I(d-b)$, and the linear combination $(1, -\beta')'$ eliminates the most persistent one. Equation (7) can be regarded as an equilibrium relationship between the $I(d)$ components of $y_t$. Under the null, $\theta = 0$, the deviations from equilibrium constitute an $I(d-b)$ process, and when $d = b$ the deviations are only weakly dependent, so this is a case of special interest.

The model could be extended to multidimensional cointegrating relationships as in Jeganathan (1999), where the estimation of the cointegration rank and cointegrating vectors is of interest. However, most empirical studies consider a single cointegrating relation among two or more variables, e.g. Cheung & Lai (1993), Baillie & Bollerslev (1994), Dueker & Startz (1998), Marinucci & Robinson (2001), and Kim & Phillips (2001). Thus, we consider only the case of a single cointegrating relationship in this paper to keep focus on optimal testing of hypotheses on $\theta$. The model is assumed to satisfy the following assumption on the error process.

Assumption 1 We consider four typical specifications for the error component $u_t$. In each case, the innovations $e_t = (e_{1t}, e_{2t}')'$ are i.i.d. $(0, \Sigma)$ with finite fourth moment, and $\Sigma$ is a positive definite matrix which we partition conformably as

$$\Sigma = \begin{bmatrix} \sigma_{11}^{2} & \sigma_{21}' \\ \sigma_{21} & \Sigma_{22} \end{bmatrix}. \tag{10}$$

0. $u_t$ is i.i.d., or equivalently $u_t = e_t$.

1. $u_{1t}$ follows the stationary AR($p$) process

$$g(L) u_{1t} = e_{1t}, \qquad t = 1, 2, ..., \tag{11}$$

and $u_{2t} = e_{2t}$.

2. $u_{1t} = e_{1t}$, and $u_{2t}$ follows the $(K-1)$-dimensional stationary VAR($p$) process

$$G(L) u_{2t} = e_{2t}, \qquad t = 1, 2, .... \tag{12}$$

3. $u_t$ follows the $K$-dimensional block diagonal stationary VAR($p$) process

$$g(L) u_{1t} = e_{1t}, \qquad t = 1, 2, ..., \tag{13}$$

$$G(L) u_{2t} = e_{2t}, \qquad t = 1, 2, .... \tag{14}$$

In cases 1-3, $g(z)$ and $G(z)$ are lag polynomials of order $p$ with coefficients gathered in $\psi_1$ and $\psi_2$, respectively, and $G(1)$ has full rank (no cointegration among the components of $y_{2t}$).

In the following we write $A(z) = \mathrm{diag}(g(z), G(z))$ as shorthand for the lag polynomial in Assumption 1.3. It would be straightforward to extend Assumption 1.3 to $A(L) u_t = e_t$, for a general lag polynomial $A(z)$ of order $p$, where $A(1)$ has full rank. Applying the formulae in Hosking (1980), the results in Lemma 1 and the following theorems could be extended to cover this more general case. However, the structure imposed by Assumption 1.3 seems relevant and its interpretation is natural.

In our model the constants $d$ and $b$ are prespecified. In particular, we assume that $d \geq b \geq 3/4 + \varepsilon$ for some $\varepsilon > 0$ such that the series are nonstationary and cointegration reduces the integration order by more than 3/4. Assuming that $b$ is known a priori is natural as it effectively specifies the null for our LM test and thus, according to the LM principle, there is no need to estimate $b$. If $d$ is not known a priori it could be estimated in a preliminary step as in, e.g., Cheung & Lai (1993), Baillie & Bollerslev (1994), Marinucci & Robinson (2001), and Kim & Phillips (2001), although this may change the limiting distributions below. Efficient procedures have been developed to estimate $d$ in fractionally integrated time series models, e.g. Sowell (1992) (exact ML) and Tanaka (1999) (conditional ML).

Our objective is to test the hypothesis

$$H_0: \theta = 0 \tag{15}$$
against $H_1: \theta > 0$ or $H_2: \theta \neq 0$ in the model (7)-(9). In particular, $d = b = 1$ generates a standard $I(1)$-$I(0)$ cointegrated system under the null, so this is a test of the null of cointegration in the usual sense, but the fractional alternatives against which the test is directed are new. Thus, a test of (15) can be considered an alternative to testing stationarity of the residuals in (1), which has been standard in the literature, see e.g. Shin (1994), Jansson (2004), and the references therein.

Another important case, for $d \geq 1.25$ and some small user-chosen $\varepsilon > 0$, is the one-sided test of (15) with $b = d - 1/2 + \varepsilon$, i.e. $d - b = 1/2 - \varepsilon$, which is a test for the existence of an (asymptotically) stationary cointegrating relation against the alternative that no stationary cointegrating relation exists (though a nonstationary but mean-reverting cointegrating relation with $1/2 \leq d - b < 1$ may still exist). Finally, for $d \geq 1$ and some small user-chosen $\varepsilon > 0$, it is of interest to conduct a one-sided test of (15) with $b = d - 1/4 + \varepsilon$, i.e. $d - b = 1/4 - \varepsilon$, as a border case for square integrability of the spectral density of the equilibrium errors and asymptotic normality of the autocovariances of the equilibrium errors, see e.g. Fox & Taqqu (1986).

Choosing $d = b = 1$ also suggests applying a test of (15) as a valuable diagnostic tool in a standard $I(1)$-$I(0)$ cointegration analysis. In this context, rejecting (15) should be taken either as evidence of a drastically misspecified dynamic structure or as a suggestion to employ an actual fractional cointegration analysis. Thus, the test could be thought of as a general test for misspecification of the model. If, for example, $y_{1t}$ and $y_{2t}$ are related by some complicated nonlinear filter and a linear model is imposed, then it is plausible that long-range dependence could be introduced in the residuals as a result of this misspecification.
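To make the triangular system (7)-(9) concrete, the sketch below simulates a bivariate version under Assumption 1.0 by inverting (8) and (9) with the truncated filter. It is an illustration only: the function names `frac_filter` and `simulate_system`, the Gaussian innovations, and the parameter values in the usage note are assumptions made here, not part of the original Monte Carlo design.

```python
import numpy as np


def frac_filter(x, d):
    """Apply the truncated (type II) filter (1 - L)^d; d < 0 integrates."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    pi = np.ones(n)
    for j in range(1, n):
        pi[j] = pi[j - 1] * (j - 1 - d) / j
    return np.convolve(pi, x)[:n]


def simulate_system(n, d, b, theta, beta, sigma, seed=0):
    """Simulate (y1, y2) from (7)-(9) with K = 2 and u_t = e_t (Assumption 1.0).

    sigma is the 2 x 2 innovation covariance matrix as partitioned in (10).
    Gaussian innovations are an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    u = rng.multivariate_normal(np.zeros(2), sigma, size=n)
    z = frac_filter(u[:, 0], -(d - b + theta))  # invert (8)
    y2 = frac_filter(u[:, 1], -d)               # invert (9)
    y1 = beta * y2 + z                          # equation (7)
    return y1, y2, z, u
```

For instance, `simulate_system(500, 1.0, 1.0, 0.0, 2.0, np.array([[1.0, 0.5], [0.5, 1.0]]))` produces the standard $I(1)$-$I(0)$ case under the null, where `z` equals the first innovation series and `y2` is a random walk; setting `theta > 0` makes the equilibrium error fractionally integrated of order `theta`.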
Testing Fractional Cointegration
The log-likelihood function of the model (7)-(9) under Assumption 1.3 (the most general case) and Gaussianity of the errors is

$$L(\theta, \beta, \psi, \Sigma) = -\frac{n}{2} \ln|\Sigma| - \frac{1}{2} \sum_{t=1}^{n} \begin{pmatrix} g(L)\Delta^{d-b+\theta} z_t \\ G(L)\Delta^{d} y_{2t} \end{pmatrix}' \Sigma^{-1} \begin{pmatrix} g(L)\Delta^{d-b+\theta} z_t \\ G(L)\Delta^{d} y_{2t} \end{pmatrix}, \tag{16}$$

bearing in mind the truncation in our definition of fractionally integrated processes, e.g. $G(L)\Delta^{d} y_{2t} = e_{2t}^{\#}$ by (9) and (14). The log-likelihood in (16) is equal to the sum of the marginal log-likelihood

$$-\frac{n}{2} \ln|\Sigma_{22}| - \frac{1}{2} \sum_{t=1}^{n} \left(G(L)\Delta^{d} y_{2t}\right)' \Sigma_{22}^{-1} \left(G(L)\Delta^{d} y_{2t}\right) \tag{17}$$

and the conditional log-likelihood

$$-\frac{n}{2} \ln \sigma_{1.2}^{2} - \frac{1}{2\sigma_{1.2}^{2}} \sum_{t=1}^{n} \left( g(L)\Delta^{d-b+\theta}\left(y_{1t} - \beta' y_{2t}\right) - \sigma_{21}'\Sigma_{22}^{-1} G(L)\Delta^{d} y_{2t} \right)^{2}, \tag{18}$$

where $\sigma_{1.2}^{2} = \sigma_{11}^{2} - \sigma_{21}'\Sigma_{22}^{-1}\sigma_{21}$ is the variance of $e_{1.2t} = e_{1t} - \sigma_{21}'\Sigma_{22}^{-1} e_{2t}$, which is $e_{1t}$ centered about its mean conditional on $e_{2t}$. The asymptotic results derived later impose only Assumption 1 on the error process. Gaussianity is not necessary for most of our results and is used only to choose a likelihood function and to derive optimality properties.

From the conditional likelihood (18), the MLE of $\beta$ under the null, $\theta = 0$, is recognized to be the NLS estimator in the augmented regression

$$\Delta^{d-b} y_{1t} = \beta' \Delta^{d-b} y_{2t} - \left(g(L) - 1\right)\Delta^{d-b}\left(y_{1t} - \beta' y_{2t}\right) + c' G(L)\Delta^{d} y_{2t} + e_{1.2t}, \tag{19}$$

where $c = \Sigma_{22}^{-1}\sigma_{21}$, see Phillips & Loretan (1991) for a discussion of the equivalent estimator in the standard $I(1)$-$I(0)$ cointegration framework. Presumably, the lagged equilibrium errors in (19) could
be replaced by leads of $\Delta^{d} y_{2t}$, as demonstrated by Saikkonen (1991) in the standard cointegration framework, and the resulting regression could be estimated by OLS.

Under Assumption 1.2 where $g(z) = 1$, i.e. when there is no autoregressive term in the equilibrium errors, the estimation of (19) reduces to OLS on

$$\Delta^{d-b} y_{1t} = \beta' \Delta^{d-b} y_{2t} + \sum_{k=0}^{p} c_k' \Delta^{d} y_{2,t-k} + e_{1.2t}. \tag{20}$$

This simplification is even stronger under Assumption 1.0 where $p = 0$ in (20) and the lagged fractionally differenced $y_{2t}$ disappear. The simplification (20) is especially useful in many applications where cointegration is a result of rational expectations theory, i.e. where deviations from equilibrium at time $t$ should be unpredictable based on information up to time $t-1$, which in our framework implies $d = b$ and $g(z) = 1$. Equivalently, (20) is OLS in the (infeasible) regression

$$\Delta^{d-b} y_{1t}^{+} = \beta' \Delta^{d-b} y_{2t} + e_{1.2t}, \tag{21}$$

where $y_{1t}^{+} = y_{1t} - \sigma_{21}'\Sigma_{22}^{-1} \sum_{k=0}^{p} G_k \Delta^{b} y_{2,t-k}$, with $G_k$ the coefficient on $L^k$ in $G(L)$. This is the fully modified least squares method of Phillips & Hansen (1990) and Phillips (1991), which was developed for fractional cointegration by Kim & Phillips (2001). In contrast to our restrictions on $d$ and $b$, Kim & Phillips (2001) require $2d - b > 1$ in their fully modified method, together with further restrictions on $d$ and, in the likelihood analysis of their model, on $b$. Thus, Kim & Phillips (2001) limit the strength of the cointegrating relation by bounding $b < 2d - 1$ from above, and in particular they exclude the $CI(1, 1)$ case.

We assume at least $b \geq 3/4 + \varepsilon$ for some $\varepsilon > 0$ in our analysis, since our estimation problem under the null has been transformed into a regression between $I(b)$ processes with $I(0)$ errors, (19)-(20). Thus, the necessity of at least $b > 1/2$ becomes clear, since otherwise the estimator of $\beta$ becomes inconsistent as demonstrated by e.g. Marinucci & Robinson (2001, p. 231).

Note that if OLS is applied to (7) directly, which has often been the case in the literature, see e.g. Cheung & Lai (1993) or Baillie & Bollerslev (1994), it introduces a bias unless $\sigma_{21} = 0$ and $g(z) = 1$. Indeed, if $\sigma_{21} = 0$ and $g(z) = 1$ hold, $y_{2t}$ is strictly exogenous and inference on $\theta$ (and estimation of the parameter $\beta$) will depend only on the part of the likelihood attributed to (7). In particular, the MLE of $\beta$ reduces to OLS on (7) and we can apply the univariate methods of Robinson (1994) and Tanaka (1999). This is not the case when $\sigma_{21} \neq 0$ or $g(z) \neq 1$ because of the well known endogeneity and serial correlation biases, see e.g. Phillips (1991).

Returning to the full model, the normalized score statistic is found by differentiating (16) or (18) with respect to $\theta$ and evaluating the resulting expression under the null,

$$S_n = \frac{1}{\sqrt{n}} \left. \frac{\partial L(\theta, \beta, \psi, \Sigma)}{\partial \theta} \right|_{\theta = 0,\, \beta = \hat{\beta},\, \psi = \hat{\psi},\, \Sigma = \hat{\Sigma}} = -\frac{1}{\sqrt{n}\,\hat{\sigma}_{1.2}^{2}} \sum_{t=1}^{n} \left( \ln(\Delta)\, \hat{g}(L)\Delta^{d-b}\left(y_{1t} - \hat{\beta}' y_{2t}\right) \right) \left( \hat{g}(L)\Delta^{d-b}\left(y_{1t} - \hat{\beta}' y_{2t}\right) - \hat{c}' \hat{G}(L)\Delta^{d} y_{2t} \right), \tag{22}$$
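where $\hat g(z)$ and $\hat G(z)$ are evaluated at $\hat\phi_1$ and $\hat\phi_2$, respectively. Using that $\ln(1 - z) = -\sum_{j=1}^{\infty} j^{-1} z^{j}$ and defining the fully modified residuals under $\theta = 0$ as
$$\hat e_{1.2t} = \hat g(L) \Delta^{d-b} (y_{1t} - \hat\beta' y_{2t}) - \hat c' \hat G(L) \Delta^{d} y_{2t} \tag{23}$$
and
$$\hat e_{1t} = \hat g(L) \Delta^{d-b} (y_{1t} - \hat\beta' y_{2t}), \tag{24}$$
the score can be written more compactly as
$$S_n = \frac{1}{\sqrt{n}\, \hat\sigma_{1.2}^{2}} \sum_{t=1}^{n} \sum_{j=1}^{t-1} j^{-1} \hat e_{1t-j} \hat e_{1.2t} = \frac{\sqrt{n}}{\hat\sigma_{1.2}^{2}} \sum_{j=1}^{n-1} j^{-1} \left( \hat C_{11}(j) - \hat c' \hat C_{21}(j) \right) = \sqrt{n} \sum_{j=1}^{n-1} j^{-1} e_1' \hat\Sigma^{-1} \hat C(j) e_1, \tag{25}$$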
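where $\hat C_{ab}(j) = n^{-1} \sum_{t=j+1}^{n} \hat e_{at} \hat e_{bt-j}'$ is the estimated sample autocovariance function, $e_1 = (1, 0')'$ is the selection vector, and we used that $\sigma_{1.2}^{-2} = \sigma^{11}$ and $-\sigma_{1.2}^{-2} \sigma_{21}' \Sigma_{22}^{-1} = \sigma^{12}$, where $\sigma^{ab}$ is the $(a, b)$th block of $\Sigma^{-1}$ for $a, b = 1, 2$. The asymptotic distribution of the score statistic $S_n$ under the null (15) is considered next.

Theorem 3.1 Suppose $d \geq b \geq 3/4 + \epsilon$ for some $\epsilon > 0$ in the model (7)-(9) and let $S_n$ be defined by (25). Under $H_0: \theta = 0$ and Assumption 1.0,
$$S_n \xrightarrow{D} N\left( 0, \frac{\pi^2}{6} \frac{\sigma_{11}^{2}}{\sigma_{1.2}^{2}} \right). \tag{26}$$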
Under H0 : = 0 and Assumption 1.i, Sn is asymptotically Gaussian with mean zero and variance 1 0 2 2 0 1 0 0 0 11 2 vec e1 e1 i1 , ., e1 e1 ip Hi 6 1.2 1 0 Hi0 i 1 Hi Hi vec 1 e1 e0 0 , ., 1 e1 e0 0 1 i1 1 ip
(27)
P 1 0 for i = 1, 2, 3. Here, i is the covariance matrix of (u0 , ..., u0 t tp+1 ) , il = j=l j i,jl , i,k is the k 0 th term in the Wold representation of ut normalized such that i,0 = IK , and 0 Hi = a0 / i , ..., a0 / i , where aj = vec Aj are the coecients in the autoregressive repp 1 resentation A (L) ut = et . 81
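As an illustration of how the score (25) and the Assumption 1.0 variance (26) might be evaluated in practice, the following minimal sketch treats the bivariate case with scalar $y_{2t}$ and residual series that are already available. The function names and the inputs `c` and `s2_12` (standing in for $\hat c$ and $\hat\sigma_{1.2}^{2}$) are hypothetical choices made for this example, not part of the original analysis.

```python
import math


def score_double_sum(e1, e12, s2_12):
    """Score as the double sum over t and j in (25).

    e1    : residuals e_{1t} as in (24)
    e12   : fully modified residuals e_{1.2t} as in (23)
    s2_12 : value used for sigma^2_{1.2} (assumed given)
    """
    n = len(e1)
    total = 0.0
    for t in range(n):                 # 0-based time index
        for j in range(1, t + 1):      # lags j = 1, ..., t-1 in 1-based notation
            total += e1[t - j] * e12[t] / j
    return total / (math.sqrt(n) * s2_12)


def autocov(a, b, j):
    """C_ab(j) = n^{-1} * sum_{t=j+1}^{n} a_t * b_{t-j} for scalar series."""
    n = len(a)
    return sum(a[t] * b[t - j] for t in range(j, n)) / n


def score_autocov(e1, e2, c, s2_12):
    """Score in the autocovariance form of (25), with scalar c."""
    n = len(e1)
    total = sum((autocov(e1, e1, j) - c * autocov(e2, e1, j)) / j
                for j in range(1, n))
    return math.sqrt(n) * total / s2_12


def lm_one_sided_iid(s_n, var_e1, s2_12):
    """Standardize S_n by the Assumption 1.0 variance in (26)."""
    info = (math.pi ** 2 / 6.0) * var_e1 / s2_12
    return s_n / math.sqrt(info)
```

The two score functions evaluate the same quantity in different summation orders, so comparing them on simulated residuals is a simple consistency check before the statistic is used.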
Chapter 3
In the simple bivariate VAR(1) example also considered in the appendix, the variance equations (27) reduce to 0 1 1 0 1 0 2 2 0 0 11 i Hi Hi vec 1 e1 e0 0 , i = 1, 2, 3, 1 i1 2 vec e1 e1 i1 Hi Hi 6 1.2 P where i = E (ut u0 ) can be estimated by n1 n ut u0 and the particular i1 and Hi for this t t=1 t example are given in the appendix. The Fisher information for , which is derived in the next theorem, illustrates the standard nature of our testing problem. Theorem 3.2 Let the assumptions of Theorem 3.1 be satised and assume that {et } is Gaussian. Under Assumption 1.0 the Fisher information for is 2 1 L (, , ) 2 2 11 , (28) = I0 = lim E 0 n n 6 2 1.2 and under Assumption 1.i, i=1,2,3, the Fisher information for is Ii = 1 0 2 2 0 1 0 0 0 11 2 vec e1 e1 i1 , ., e1 e1 ip Hi 6 1.2 1 0 Hi0 i 1 Hi Hi vec 1 e1 e0 0 , ., 1 e1 e0 0 . (29) 1 i1 1 ip
To assess the local power properties of the test, we derive the asymptotic distribution under the sequence of local alternatives $\theta_{1n} = \delta / \sqrt{n}$.

Theorem 3.3 Under the assumptions of Theorem 3.1 and $\theta = \delta / \sqrt{n}$,
$$S_n \xrightarrow{D} N(\delta I_i, I_i) \tag{30}$$
as $n \to \infty$, where $I_i$ is defined in Theorem 3.2.

Consider again briefly the special case where it is known that $\sigma_{21} = 0$. In that case the score (25) and the distributions in Theorem 3.3 coincide with the ones obtained by Tanaka (1999). That is, we can apply the test of Tanaka (1999) to the residuals in (24), and Tanaka's (1999) i.i.d. result is obtained under Assumptions 1.0 and 1.2 and his result for autocorrelated errors is obtained under Assumptions 1.1 and 1.3. Thus, when $\sigma_{21} = 0$, our test has the same functional form and distribution as Tanaka's (1999) test, which is based on more information ($\beta$ known), and therefore our test shares the asymptotic optimality properties of that test when $\sigma_{21} = 0$.

In practice, to construct an approximate size $\alpha$ test of $H_0$ against $H_1: \theta > 0$ under Assumption 1.i, we compute the statistic
$$LM_{i1} = \frac{S_n}{\sqrt{I_i}} \xrightarrow{D} N\left( \delta \sqrt{I_i}, 1 \right) \tag{31}$$
under $\theta = \delta / \sqrt{n}$ as $n \to \infty$, and compare it to the $100(1 - \alpha)\%$ point of the standard normal distribution. To test against the two-sided alternative $H_2: \theta \neq 0$ under Assumption 1.i, at approximate size $\alpha$, we compute
$$LM_{i2} = LM_{i1}^{2} \xrightarrow{D} \chi_{1}^{2}\left( \delta^{2} I_i \right) \tag{32}$$
under $\theta = \delta / \sqrt{n}$ as $n \to \infty$, and compare it to the $100(1 - \alpha)\%$ point of the central $\chi_{1}^{2}$ distribution.

A useful feature of the asymptotic distributions (31) and (32) is that they are free of the parameters $d$ and $b$. Since $d$ and $b$ are assumed known a priori, their effect is neutralized by suitable differencing. This shows that simple asymptotic inference about $\theta$ can be carried out for any choice of $d$ and $b$ satisfying $d \geq b \geq 3/4 + \epsilon$ for some $\epsilon > 0$.

The calculation of the tests may seem to be quite involved as $p$ gets large because of the covariance matrices entering the variance (27). However, for a given parameter value these matrices (and thus the tests) can be calculated simply by finding the coefficients in the Wold representation of $u_t$, directly evaluating the sums over the Wold coefficients, and replacing the covariance matrix of $(u_t', \ldots, u_{t-p+1}')'$ by its sample counterpart. Another possibility is to employ the following numerical approximation to the one-sided test,
$$\widehat{LM}_{i1} = -\sqrt{n}\, \frac{\sum_{t=1}^{n} \frac{\partial \hat e_{1t}}{\partial \theta} \hat e_{1.2t}}{\sqrt{\sum_{t=1}^{n} \left( \frac{\partial \hat e_{1t}}{\partial \theta} \right)^{2} \sum_{t=1}^{n} \hat e_{1.2t}^{2}}}, \tag{33}$$
which follows by noting that, under $H_0$, $\partial \hat e_{1t} / \partial \theta = -\sum_{j=1}^{t-1} j^{-1} \hat e_{1t-j}$ and comparing with (25) and (31).

From Theorem 3.3, we can easily calculate the asymptotic local power functions of the one-sided and two-sided tests (31)-(32). This is stated as a corollary.

Corollary 3.1 Under the assumptions of Theorem 3.1 and $\theta = \delta / \sqrt{n}$,
$$P(LM_{i1} > Z_{1-\alpha}) \to \Phi\left( -Z_{1-\alpha} + |\delta| \sqrt{I_i} \right), \tag{34}$$
$$P\left( LM_{i2} > \chi_{1,1-\alpha}^{2} \right) \to 1 - F_{\lambda_i}\left( \chi_{1,1-\alpha}^{2} \right), \tag{35}$$
where $Z_{1-\alpha}$ and $\chi_{1,1-\alpha}^{2}$ are the $100(1 - \alpha)\%$ points of the standard normal and central $\chi_{1}^{2}$ distributions, respectively, and $\Phi$ and $F_{\lambda_i}$ are the distribution functions of the standard normal distribution and the noncentral $\chi_{1}^{2}$ distribution with noncentrality parameter $\lambda_i = \delta^{2} I_i$.
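The local power functions of Corollary 3.1 are straightforward to evaluate numerically. The sketch below is only an illustration using the Python standard library; the function names are hypothetical, and the Fisher information is passed in as a number (for example the Assumption 1.0 value in (28)). The two-sided power uses the fact that a noncentral $\chi_{1}^{2}$ variable with noncentrality $\lambda$ is distributed as $(Z + \sqrt{\lambda})^{2}$ for standard normal $Z$.

```python
from math import pi, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def fisher_info_iid(var_e1, s2_12):
    """Fisher information under Assumption 1.0: (pi^2 / 6) * var_e1 / s2_12."""
    return (pi ** 2 / 6.0) * var_e1 / s2_12


def power_one_sided(delta, info, alpha=0.05):
    """Asymptotic local power of the one-sided test for delta >= 0."""
    z = _STD_NORMAL.inv_cdf(1.0 - alpha)
    return _STD_NORMAL.cdf(-z + abs(delta) * sqrt(info))


def power_two_sided(delta, info, alpha=0.05):
    """Asymptotic local power of the two-sided test.

    The chi-squared(1) critical value is the square of the standard normal
    quantile at 1 - alpha/2, so the rejection event is |Z + m| > z with
    m = |delta| * sqrt(info).
    """
    z = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    m = abs(delta) * sqrt(info)
    return (1.0 - _STD_NORMAL.cdf(z - m)) + _STD_NORMAL.cdf(-z - m)
```

At $\delta = 0$ both functions return the nominal size, and for $\delta > 0$ the one-sided power lies above the two-sided power, as expected.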
Figure 1 shows asymptotic local power functions for $d = b = 1$ and a variety of first order autoregressive specifications and contemporaneous correlation structures. When the correlation is low (left-hand side panels) only the autoregressive term in the equilibrium error (8) has a significant effect. In fact, if the errors are contemporaneously uncorrelated, i.e. $\sigma_{21} = 0$, the power functions for cases 1.0 and 1.2 coincide and the power functions for cases 1.1 and 1.3 coincide. With highly correlated errors (right-hand side panels) the autocorrelation in $\Delta^{d} y_{2t}$ spills over and has some effect on the power function, though still not as much as the autoregressive term in the equilibrium error. This is well known from standard cointegration analysis. Since the regressors $y_{2t}$ are already heavily trended it makes little difference if the innovations to the stochastic trend are weakly autocorrelated.

Figure 1 about here

It follows from Corollary 3.1 and Theorem 3.2 that the power functions of the tests depend on the covariance matrix $\Sigma$ of the underlying innovations $e_t$, such that the power depends on the extent of the endogeneity of the regressors $y_{2t}$. In particular, under Assumptions 1.0 and 1.1 any correlation between $y_{2t}$ and $z_t$ is exploited by the test to increase power, cf. equation (28) and Figure 1 (compare the starred lines in the left-hand and right-hand side panels). Note that in case 1.2 the power may increase or decrease with correlated errors. Comparing the solid line in the left-hand and right-hand side panels, the power increases when correlation is increased from .6 to .9. However, comparing the starred and solid lines in the upper left-hand side panel shows that in case 1.2 power is decreased with correlation .6 compared to the uncorrelated case. Thus, when correlation is high the first term in $I_2$, which increases power as correlation increases, dominates the second term, which decreases power due to the spill-over of the autocorrelation via the contemporaneous correlation.

In general, the ability of the test to exploit the correlation stands in contrast to the standard $I(1)$-$I(0)$ framework where the power functions and power envelopes do not depend on $\Sigma$, see Jansson & Haldrup (2002) and Jansson (2004). The dependence on $\Sigma$ is due to the fact that $\beta$ can be assumed to be known in the derivation of power functions and power envelopes in our fractional setup. That is not the case in the standard $I(1)$-$I(0)$ framework and thus cointegration tests in that framework are unable to exploit the correlation between $y_{2t}$ and $z_t$ to gain power.

As a first optimality result, it follows immediately from Theorems 3.2 and 3.3 that the two-sided test is locally most powerful (LMP) and we state this as a corollary.

Corollary 3.2 Under the assumptions of Theorem 3.3 and the additional assumption of Gaussianity, the two-sided test statistic (32) is locally most powerful in the sense that the noncentrality parameter is maximal.

Next, we show that much stronger optimality results than the LMP property of Corollary 3.2 can be obtained for the problem of testing (15) when the errors are assumed to be i.i.d. Gaussian.
Asymptotic Gaussian Power Envelopes
In this section, we derive the asymptotic Gaussian power envelopes for the one-sided and two-sided testing problems and proceed to show, following Elliott, Rothenberg & Stock (1996) and Tanaka (1999), that the one-sided test is asymptotically uniformly most powerful (UMP) and, following Nielsen (2004), that the two-sided test is asymptotically uniformly most powerful unbiased (UMPU).

Assume that the data generating process is (7)-(9) with $u_t$ independent, normally distributed, $\beta$ and $\Sigma$ known, and true parameter value $\theta_{0n} = c / \sqrt{n}$ for some fixed $c > 0$. The test of $H_0: \theta = 0$ against the local alternative $H_1: \theta_{1n} = \delta / \sqrt{n}$ for some fixed $\delta > 0$ is a test of a simple null against a simple alternative. The Neyman-Pearson Lemma, e.g. Lehmann (1986, chapter 3), states that the test that rejects the null when
$$M_n = n\, \frac{\sum_{t=1}^{n} \tilde u_{1.2nt}^{2} - \sum_{t=1}^{n} \bar u_{1.2nt}^{2}}{\sum_{t=1}^{n} \tilde u_{1.2nt}^{2}} \tag{36}$$
becomes large is most powerful. Here, $\tilde u_{1.2nt}$ and $\bar u_{1.2nt}$ are the residuals (with $\beta$ and $\Sigma$ known) under $H_0$ and $H_1$, respectively. The next theorem derives the limiting distribution of $M_n$ under local alternatives.
Theorem 4.1 Let $M_n$ denote the test statistic (36) in the model generated by $\theta_{0n} = c / \sqrt{n}$ ($c > 0$ is a fixed scalar). Then, under the sequence of local alternatives $\theta_{1n} = \delta / \sqrt{n}$ ($\delta > 0$ is a fixed scalar), it holds that
$$M_n \xrightarrow{D} M(c, \delta) = 2 \delta \sqrt{I_0}\, Z + \delta (2c - \delta) I_0$$
as $n \to \infty$, where $Z$ is a standard normal variable.

Let the power of $M_n$ be given by $\pi(c, \delta) = P(M(c, \delta) > c_{\delta}(\alpha))$ under $H_{1n}$ when $\theta_{0n}$ is true, where the critical value $c_{\delta}(\alpha)$ is determined by $P(M(0, \delta) > c_{\delta}(\alpha)) = \alpha$. Then the power envelope of all one-sided tests is given by $\Pi(\delta) = \pi(\delta, \delta)$, and a test whose power attains the power envelope for all points $\delta$ is UMP.

To find a test statistic that applies against two-sided alternatives we invoke the principle of unbiasedness, see Lehmann (1986, chapter 4), to construct a most powerful unbiased test. Unbiasedness requires that the power of the test does not fall below the nominal significance level for any point in the alternative. A test whose power attains the power envelope for all points is UMPU. The following theorem derives the asymptotic Gaussian power envelopes of the one-sided and two-sided testing problems, and shows that these envelopes are achieved by our tests.

Theorem 4.2 The one-sided asymptotic Gaussian power envelope for all tests of size $\alpha$ of $H_0: \theta = 0$ against $H_1: \theta_{1n} = \delta / \sqrt{n}$ ($\delta$ a fixed scalar) is given by (34) and the two-sided asymptotic Gaussian power envelope for all unbiased tests of size $\alpha$ is given by (35). Thus, in the i.i.d. Gaussian model, the one-sided LM test (31) is asymptotically uniformly most powerful (UMP) and the two-sided LM test (32) is asymptotically uniformly most powerful among all unbiased tests (UMPU).
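To see why the one-sided envelope coincides with the local power function of the LM test, the limit in Theorem 4.1 can be worked through directly. The following short derivation is a sketch under the notation used here, where $Z_{1-\alpha}$ denotes the $100(1-\alpha)\%$ point of the standard normal distribution and $\Phi$ its distribution function, and $\delta > 0$.

```latex
\begin{align*}
M(0,\delta) &= 2\delta\sqrt{I_0}\,Z - \delta^{2} I_0
  \quad\Longrightarrow\quad
  c_{\delta}(\alpha) = 2\delta\sqrt{I_0}\,Z_{1-\alpha} - \delta^{2} I_0, \\
\pi(\delta,\delta)
  &= P\!\left( 2\delta\sqrt{I_0}\,Z + \delta^{2} I_0
      > 2\delta\sqrt{I_0}\,Z_{1-\alpha} - \delta^{2} I_0 \right) \\
  &= P\!\left( Z > Z_{1-\alpha} - \delta\sqrt{I_0} \right)
   = \Phi\!\left( -Z_{1-\alpha} + \delta\sqrt{I_0} \right).
\end{align*}
```

The last expression has the same form as the one-sided local power of Corollary 3.1 evaluated with the Fisher information $I_0$, which is the content of the one-sided part of Theorem 4.2.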
This result is in stark contrast to the results in the standard $I(1)$-$I(0)$ cointegration literature. Tests that enjoy optimality properties have been derived in that framework by e.g. Shin (1994) and Jansson (2004), whose tests are LMP and point optimal, respectively, i.e. tests that have maximal power against a single prespecified point in the (autoregressive) alternative. However, our criterion is against all (fractional) alternatives, i.e. against all points in the alternative $\theta \neq 0$.
The local power functions and power envelopes derived above are asymptotic results, and in this section we examine by Monte Carlo experiments whether these asymptotic approximations carry over to finite samples. The model we have chosen for the simulation study is a bivariate system with $d = b = 1$, i.e.
$$\Delta^{\theta} (y_{1t} - y_{2t}) = u_{1t}^{\#}, \tag{37}$$
$$\Delta y_{2t} = u_{2t}^{\#}, \tag{38}$$
which is a standard cointegrated model under the null. We consider several specifications for the error process corresponding to each case in Assumption 1 and let $e_t$ be bivariate normal with variances normalized to unity and with contemporaneous correlation 0 or .6. The parameter values for the autoregressive coefficients correspond to those in the upper panels in Figure 1, i.e. $\phi_1 = .2$ and $\phi_2 = .5$. All calculations were made in Ox v3.00 (Doornik (2001)) including the Arma package v1.01 (Doornik & Ooms (2001)).

Throughout, we fix the nominal size (type I error) at .05 and the number of replications at 1,000. We consider the sample sizes $n = 200$ and $n = 500$. The former is typical for macroeconomic time series, and the latter (or even larger) for financial time series. We concentrate on comparing the finite sample performance of the one-sided LM test (reported as LM) with the asymptotic local power, but also report results for the size corrected LM test (reported as LMsc). The properties of the estimator of the cointegrating vector in a similar model were examined by Kim & Phillips (2001), who found that even in samples as small as $n = 100$ the performance of the estimator is very good with respect to bias and variance.

Tables 1-4 show the simulated rejection frequencies of the test statistics (LM and LMsc) for different assumptions on the autocorrelation structure of the errors (as in Figure 1) corresponding to each case outlined in Assumption 1. For comparison, the asymptotic local power, which is equal to the power envelope under Assumption 1.0 by Theorem 4.2, has been calculated from Corollary 3.1 for the same parameter values and is reported under the heading Envelope. The first three columns of each table give the results for contemporaneously uncorrelated errors, whereas in the last three columns the contemporaneous correlation between the errors is .6.

Tables 1-4 about here

First, consider the case where the cointegrating error $u_{1t}$ is i.i.d. and $u_{2t}$ is either i.i.d. (Table 1) or follows an AR(1) (Table 3). In these two cases the finite sample rejection frequencies are quite close to the asymptotic local power, even for the small sample size $n = 200$, and especially with contemporaneously uncorrelated errors. In Table 3 the effect of $G(z)$ spills over via the correlation, but only slightly degrades the size and power compared to the uncorrelated case where there is no spill-over. The insignificance of the specification of $u_{2t}$ is well known from standard cointegration analysis, and is due to the fact that $y_{2t}$ is already highly trended and making the innovations to this $I(d)$ process weakly dependent does not add significantly to this trend.

When $u_{1t}$ is allowed to be autocorrelated as in Tables 2 and 4, where $u_{1t}$ follows an AR(1) process, we know from Corollary 3.1 and Figure 1 that the power of the test degrades and consequently the asymptotic local power functions are much lower than in Tables 1 and 3. The finite sample rejection frequencies reflect this behavior and are well below the asymptotic power for $n = 200$ and also somewhat below the asymptotic power for $n = 500$.

Comparing the middle and right-hand side panels in Tables 1, 2, and 4 shows that the test takes advantage of the correlation between the underlying errors, and the improvement in power when the errors are correlated (right-hand side panels) is evident. The ability of the test to exploit this correlation to increase power even in finite samples is remarkable and contrasts with the inability of conventional cointegration tests to exploit this correlation even asymptotically, see Jansson & Haldrup (2002) and Jansson (2004).
In general, the finite sample power functions for samples of size $n = 200$ are reasonable, but well below the asymptotic local power. For samples of size $n = 500$ they are close to the asymptotic local power functions, especially in the absence of an autoregressive term in the equilibrium errors. Thus, one would expect very good performance of the tests in financial applications where samples are often many times larger. In such cases the power loss resulting from the estimation of a rich autocorrelation structure would also be of less importance. The sample size in our empirical application below is $n = 336$, so for the application we expect the performance of the tests to lie between the two cases considered in the present simulation study.
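For readers who wish to replicate a design of the type (37)-(38), the following sketch shows one way to generate data under Assumption 1.0 (i.i.d. errors). It is not the Ox code used for the reported results; the function names, the truncation of the fractional filter at the start of the sample, and the way correlated errors are drawn are assumptions made for this illustration.

```python
import math
import random


def frac_diff(x, d):
    """Apply (1 - L)^d to x, treating pre-sample values as zero.

    The filter weights follow the recursion w_0 = 1 and
    w_k = w_{k-1} * (k - 1 - d) / k, so d = 1 gives first differences and
    d = -1 gives cumulative sums.
    """
    n = len(x)
    w = [1.0]
    for k in range(1, n):
        w.append(w[-1] * (k - 1 - d) / k)
    return [sum(w[k] * x[t - k] for k in range(t + 1)) for t in range(n)]


def simulate_system(n, theta, rho, seed=None):
    """Draw (y1, y2) with Delta^theta (y1 - y2) = u1 and Delta y2 = u2.

    The errors are i.i.d. Gaussian with unit variances and contemporaneous
    correlation rho.
    """
    rng = random.Random(seed)
    u2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    u1 = [rho * b + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0) for b in u2]
    y2 = frac_diff(u2, -1.0)      # y2 is the cumulated sum of u2
    z = frac_diff(u1, -theta)     # equilibrium error, integrated of order theta
    y1 = [a + b for a, b in zip(y2, z)]
    return y1, y2, u1
```

Setting `theta=0.0` reproduces the null model, where the equilibrium error equals `u1`; a local alternative corresponds to `theta = delta / math.sqrt(n)`.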
Exchange Rate Dynamics
The analysis of exchange rate dynamics and potential (fractional) cointegrating relations between exchange rates for different currencies has attracted much attention recently. Baillie & Bollerslev (1989) find evidence of one cointegrating relation between seven different (log) spot exchange rates using conventional cointegration methods. This is challenged by Diebold, Gardeazabal & Yilmaz (1994) who show that the inclusion of an intercept changes the conclusion for the Baillie & Bollerslev (1989) data set. This finding is further supported in an analysis of a different data set covering a longer span of time in Diebold et al. (1994).

In the article by Baillie & Bollerslev (1994) it is argued that the failure of conventional cointegration tests to find evidence of cointegration in the Baillie & Bollerslev (1989) exchange rate data is due to the presence of fractional cointegration. Thus, they estimate the cointegration vector by OLS following Cheung & Lai (1993) and fit a simple fractionally integrated white noise model to the residuals. It is concluded that the exchange rates can be described by a $CI(1, .11)$ relationship (in our notation). However, their estimate of the integration order of the equilibrium errors (.89) may well be upwards biased since relevant short-run dynamics may have been left out. This is indeed what is concluded by Kim & Phillips (2001), who apply their fractional fully modified estimation procedure to a different data set covering a longer time span but the same exchange rates. They find that the equilibrium errors are best described by an ARFIMA(1,d,0) process with $d = .33$.

All the above studies concentrate on the estimation of the cointegration vector and the memory parameter of the equilibrium errors, but no formal testing of the hypothesis of fractional cointegration is attempted. We take the opposite view and concentrate on testing for the presence of (i) standard $I(1)$-$I(0)$ cointegration against fractional alternatives, (ii) $CI(\hat d, \hat d)$ cointegration, where $\hat d$ is a preliminary estimate of $d$, and (iii) fractional cointegration with equilibrium errors that are integrated of order less than one-quarter, i.e.
that the spectral density of the equilibrium errors is square integrable or equivalently that their autocovariances are asymptotically normally distributed.

We apply our tests to a system of log exchange rates for the currencies of the following seven countries, (West) Germany, United Kingdom, Japan, Canada, France, Italy, and Switzerland against the US Dollar. The same currencies are examined in the studies cited above. However, where Baillie & Bollerslev (1989, 1994) and Diebold et al. (1994) consider daily observations covering 1 March 1980 to 28 January 1985 and Kim & Phillips (2001) consider quarterly observations from 1957 through 1997, our data set is comprised of monthly averages of noon (EST) buying rates and runs from January 1974 through December 2001 for a total of $n = 336$ observations. Thus, our data set, which is extracted from the Federal Reserve Board of Governors G.5 release, covers only the period of the current flexible exchange rate regime, but a much longer span of time than the Baillie & Bollerslev (1989) data set. A long time span has generally been found to be important in detecting long-run relations.

Tables 5-6 about here

Table 5 presents the fractional integration analysis of the data set. The first two rows are the estimates of the fractional integration orders estimated by the conditional ML technique (CMLE) in Tanaka (1999) with lag orders $p = 0$ and $p = 1$. The standard errors reported
in parentheses are calculated as $\sqrt{6} / (\pi \sqrt{n})$ when $p = 0$ and $(\sqrt{n}\, \omega)^{-1}$ when $p = 1$, where $\omega^{2} = \pi^{2}/6 - \frac{1 - a^{2}}{a^{2}} (\ln(1 - a))^{2}$ and $a$ is the estimated AR coefficient, see Tanaka (1999). As a robustness check we also report the Gaussian semiparametric (GSP) estimates of Robinson (1995) (applied to the first-differenced data and adding one to the resulting estimate) with two different bandwidths in the final two rows of Table 5. The standard errors of these estimates are $1 / (2 \sqrt{m})$, see Robinson (1995). The final column gives estimates of a common integration order, computed simply as an average of the estimated integration orders for each exchange rate, which we use in our fractional cointegration analysis.

In Table 6 we report two common misspecification tests for the residuals of the univariate CMLEs in Table 5. The Portmanteau test of autocorrelation up to lag 6 is reported as AR(6) and the test for ARCH up to lag 1 is reported as ARCH(1). When $p = 0$ the AR(6) test rejects at the 1% level for all the time series except CAN, whereas when $p = 1$ the test only rejects at the 5% level for JAP and UK. Likewise, when $p = 0$ the ARCH test rejects at the 1% level for ITA, JAP, and UK, but when $p = 1$ it rejects only for UK at the 1% level. Thus, the misspecification tests suggest that the data are well described when allowance is made for one autoregressive lag, i.e. when $p = 1$.

Returning to the estimates in Table 5, it is clear that the exchange rates can be well described as $I(1)$ processes. The CMLEs are insignificantly different from unity except CAN with $p = 0$, but that estimate may be upwards biased if relevant short-run dynamics are left out of the estimation (although the misspecification tests do not suggest so). Thus, when $p = 1$ the CAN estimate is insignificantly different from unity. The GSP estimates are all insignificantly different from unity except FRA with $m = 67$. Hence, the results of Table 5 support the overwhelming evidence in the previous literature that exchange rates are $I(1)$. E.g.
Baillie & Bollerslev (1989) conduct unit root tests of the $I(1)$ hypothesis against the $I(0)$ alternative and Baillie (1996) provides evidence from fractional models.

Table 7 about here

In Table 7 the results from applying the one-sided LM test (33) to the exchange rate data are presented. The exchange rate for (West) Germany is $y_{1t}$, and the remaining six exchange rates are gathered in $y_{2t}$, which is then a six-dimensional vector. Following the evidence in favor of $p = 1$ in Table 6, we consider only Assumptions 1.2 and 1.3 with $p = 1$. We test three different hypotheses. Based on the evidence in Table 5 and the previous literature, we specify $d = b = 1$ in the first hypothesis, corresponding to the standard $I(1)$-$I(0)$ model as discussed above. Secondly, we use the estimated common integration order $\hat d_c$ for $d$ and $b$ (i.e. we set $d = b = \hat d_c$), using the estimates from Table 5 with $p = 1$ for $\hat d_c$. The third hypothesis, $d = 1$, $b = .76$, is that there exists a cointegrating relation which is integrated of order less than one-quarter (using $\epsilon = .01$). This implies square integrability of the spectral density of the cointegrating errors and asymptotic normality of their autocovariances.

The results we obtain in Table 7 are mixed. Under Assumption 1.2 all the tests reject strongly. However, when allowance is made for an autoregressive specification in the cointegrating relation, i.e. under Assumption 1.3, the test does not reject the third hypothesis, thus supporting a dynamic specification of the cointegrating relation (possibly with fractional integration in the cointegrating relation). In particular, under Assumption 1.3, which we consider the most relevant based on Table 6, the test rejects the first two hypotheses at the 5% level, but none of the hypotheses are rejected at the 1% level. In the cases with an estimated autoregressive term in the equilibrium errors, the estimates of the autoregressive parameter (not reported in the table) are between .84 and .94. Hence, there appears to be persistence in the cointegrating relation, and the results of Table 7 suggest that it could be only short memory.
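The standard error formulas used for Table 5 are simple enough to evaluate directly. The sketch below is only illustrative: the function names are hypothetical, and the AR coefficient passed to `se_cmle_p1` in any usage is a placeholder rather than an estimate taken from the tables. The sample size $n = 336$ and the bandwidth $m = 67$ are the values mentioned in the text.

```python
from math import log, pi, sqrt


def se_cmle_p0(n):
    """Standard error of the CMLE of d with p = 0: sqrt(6) / (pi * sqrt(n))."""
    return sqrt(6.0) / (pi * sqrt(n))


def se_cmle_p1(n, a):
    """Standard error of the CMLE of d with one AR lag.

    Uses omega^2 = pi^2/6 - ((1 - a^2) / a^2) * (ln(1 - a))^2, where a is the
    estimated AR coefficient (nonzero and inside the unit interval).
    """
    if a == 0.0 or abs(a) >= 1.0:
        raise ValueError("a must be nonzero and satisfy |a| < 1")
    omega2 = pi ** 2 / 6.0 - (1.0 - a * a) / (a * a) * log(1.0 - a) ** 2
    return 1.0 / sqrt(n * omega2)


def se_gsp(m):
    """Standard error of the Gaussian semiparametric estimate: 1 / (2 sqrt(m))."""
    return 1.0 / (2.0 * sqrt(m))
```

For example, `se_cmle_p0(336)` and `se_gsp(67)` give the standard errors that apply to every series with $p = 0$ and $m = 67$, respectively, since these formulas do not depend on the data.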
Conclusion
We have proposed and examined a time domain LM test for the null of cointegration in a fractionally cointegrated model with the usual computational motivation. In the important case where the null hypothesis is that of standard $I(1)$-$I(0)$ cointegration, but the test is against fractional alternatives, the calculation of the LM test statistic does not require any fractional differencing and can be based on residuals from readily available computer software.

The likelihood theory in the time domain is tractable and the ML estimation of the cointegration vector reduces to a version of the fully modified least squares estimator. Thus, the LM test statistic utilizes fully modified residuals to cancel the endogeneity and serial correlation biases. The test statistic is shown to have standard distributional properties under the null and under local alternatives, such that inference can be drawn from the normal and chi-squared distributions.

In the special case with i.i.d. Gaussian errors, the asymptotic Gaussian power envelope of all tests is achieved by the one-sided version of our test, and the asymptotic Gaussian power envelope of all unbiased tests is achieved by the two-sided version of our test. Thus, with i.i.d. Gaussian errors, the one-sided (two-sided) version of our test is asymptotically uniformly most powerful among all (unbiased) tests.

The empirical relevance of our test is established by Monte Carlo experiments, which show that finite sample rejection frequencies are reasonable for samples of size $n = 200$ and close to the asymptotic local power for $n = 500$. Finally, we have applied our methodology to the analysis of exchange rate dynamics in a system of exchange rates for seven major currencies against the US Dollar. We have focused on testing for the presence of (fractional) cointegration, rather than the estimation of any particular model, but the evidence is mixed.
Appendix: Proofs
Before we prove the theorems we need a lemma. Define the sample autocovariance and residual autocovariance functions
$$C(j) = \frac{1}{n} \sum_{t=j+1}^{n} e_t e_{t-j}' \quad \text{and} \quad \hat C(j) = \frac{1}{n} \sum_{t=j+1}^{n} \hat e_t \hat e_{t-j}',$$
where the $\hat e_t$ are estimated residuals of a VAR(p) process. We consider the asymptotic distribution of a particular linear combination of the residual autocovariances in each of the four cases outlined in Assumption 1.

Lemma 1 Let $\hat e_t$ be the estimated residuals of the $K$-dimensional VAR(p) process $A(L) u_t = e_t$, where $e_t$ is i.i.d. $(0, \Sigma)$ with finite fourth moments and $A(z)$ has the structural parameterization in Assumption 1.i. Then
$$\sqrt{n} \sum_{j=1}^{n-1} j^{-1} \mathrm{vec}\, C(j) \xrightarrow{D} N(0, \Omega_0)$$
and
$$\sqrt{n} \sum_{j=1}^{n-1} j^{-1} \mathrm{vec}\, \hat C(j) \xrightarrow{D} N(0, \Omega_i)$$
as $n \to \infty$, where
2 , 6 1 0 0 0 2 i = 0 IK , ..., 0 IK Hi Hi0 i 1 Hi Hi i1 IK , ..., 0 IK , i1 ip ip 6 P 1 0 for i = 1, 2, 3. Here, i is the covariance matrix of (u0 , ..., u0 t tp+1 ) , il = j=l j i,jl , 0 th term in the Wold representation of u normalized such that i,k is the k t i,0 = IK , and 0 Hi = a0 / i , ..., a0 / i , where aj = vec Aj are the coecients in the autoregressive repp 1 resentation A (L) ut = et . 0 = Proof. For a xed m > p dene the K 2 m-vectors Cm = vec(C (1) , ..., C (m))0 and Cm = (1) , ..., C (m))0 . Consider rst case 1.0, where ut is i.i.d. and C (j) is observable. It is vec(C well known that in this case D nCm N (0, Im ) and thus m m X X 1 D n j vec C (j) N 0, j 2 .
j=1 j=1
The desired result for case 1.0 now follows by application of Bernstein's Lemma, see e.g. Hall & Heyde (1980, pp. 191-192).
For the remaining three cases we employ a result of Ahn (1988) on the asymptotic distribution of the residual autocovariances of a VAR(p) process under structural parameterization. 0 Consider case 1.2. Dene the matrix H2 = a0 / 2 , ..., a0 / 2 , where aj = vec Aj are the p 1 coecients in the autoregressive representation A (L) ut = et and 2 is the vector of coecients in G (z). In this setup, Ahn (1988) showed that (in our notation) 0 1 0 0 D nCm N 0, Im Gm H2 H2 1 H2 H2 Gm , where IK 1 IK .. . 0 . .. . . . 0 .. . 0 m1 IK . . . . . . mp IK
G0 m
IK
Consequently,
m X 1 D (m) , n j vec C (j) N 0, 2 j=1
where 2
(m) il ,
(m)
is a truncated version of 2 , i.e. with 2 /6 replaced by
which is truncated at m. Again, we can apply Bernsteins Lemma to replace the by truncated sums by their limits. For cases 1.1 and 1.3 the same results hold, except that Hi , il , and i are dierent as indicated by the subscript i. As a simple example consider a bivariate VAR(1) system with g (z) = 1 1 z and G (z) = 1 2 z. The Hi matrices are H1 = (1, 0, 0, 0)0 , H2 = (0, 0, 0, 1)0 , and H3 = (H1 , H2 ) and the covariance equations simplify to i = 1 0 2 0 IK Hi Hi0 1 i Hi Hi (i1 IK ) , i = 1, 2, 3, i1 6
Pm
j=1 j
and il replaced
where 11 = diag (1 , 1), 21 = diag (1, 2 ), 31 = diag (1 , 2 ), i = 1 ln (1 i ), i = 1, 2, i P t and i = E (ut u0 ) can be estimated by n1 n ut u0 . t t=1
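Proof of Theorem 3.1. Suppose first that $\beta$ is known. Using that $\mathrm{vec}(A)' \mathrm{vec}(B) = \mathrm{tr}(A'B)$ and by application of Lemma 1, the score statistic is
$$S_n = \sqrt{n} \sum_{j=1}^{n-1} j^{-1} \mathrm{vec}\left( \Sigma^{-1} e_1 e_1' \right)' \mathrm{vec}\, \hat C(j) \xrightarrow{D} N\left( 0, \mathrm{vec}\left( \Sigma^{-1} e_1 e_1' \right)' \Omega_i\, \mathrm{vec}\left( \Sigma^{-1} e_1 e_1' \right) \right),$$
where the $\Omega_i$ are defined in Lemma 1. The variance equations (26) and (27) follow immediately from Lemma 1, e.g. in case 1.0 with i.i.d. errors the variance is
$$\frac{\pi^{2}}{6} \mathrm{vec}\left( \Sigma^{-1} e_1 e_1' \right)' (\Sigma \otimes \Sigma)\, \mathrm{vec}\left( \Sigma^{-1} e_1 e_1' \right) = \frac{\pi^{2}}{6} \mathrm{vec}\left( \Sigma^{-1} e_1 e_1' \right)' \mathrm{vec}\left( e_1 e_1' \Sigma \right) = \frac{\pi^{2}}{6} \mathrm{tr}\left( e_1 e_1' \Sigma^{-1} e_1 e_1' \Sigma \right) = \frac{\pi^{2}}{6} \frac{\sigma_{11}^{2}}{\sigma_{1.2}^{2}}.$$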
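The chain of equalities for case 1.0 can be checked numerically. The sketch below does this for a bivariate covariance matrix using only the standard library; the helper names and the example matrix are hypothetical. It evaluates the quadratic form with an explicit Kronecker product and returns it together with the ratio of the variance of $e_{1t}$ to $\sigma_{1.2}^{2}$, which the derivation says should be equal.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def kron(a, b):
    """Kronecker product of two matrices."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def vec(a):
    """Stack the columns of a into a single list."""
    return [a[i][j] for j in range(len(a[0])) for i in range(len(a))]


def inv2(s):
    """Inverse of a 2x2 matrix."""
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    return [[s[1][1] / det, -s[0][1] / det],
            [-s[1][0] / det, s[0][0] / det]]


def quadratic_form_and_ratio(sigma):
    """Return (vec(M)' (Sigma kron Sigma) vec(M), var(e1) / sigma^2_{1.2}).

    Here M = Sigma^{-1} e1 e1' with e1 = (1, 0)'.
    """
    e1e1t = [[1.0, 0.0], [0.0, 0.0]]
    m = matmul(inv2(sigma), e1e1t)
    v = vec(m)
    big = kron(sigma, sigma)
    quad = sum(v[r] * big[r][c] * v[c] for r in range(4) for c in range(4))
    s2_12 = sigma[0][0] - sigma[0][1] * sigma[1][0] / sigma[1][1]
    return quad, sigma[0][0] / s2_12
```

Multiplying either returned value by $\pi^{2}/6$ gives the variance in (26) for the chosen covariance matrix.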
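Next, we show that estimating $\beta$ does not influence the result. From e.g. Cheung & Lai (1993), Marinucci & Robinson (2001), and Kim & Phillips (2001) we know that, since $\hat\beta$ is estimated by OLS between $I(b)$ processes with $I(0)$ errors, $\hat\beta - \beta = O_p(n^{1-2b})$ when $1/2 < b \leq 1$ and $\hat\beta - \beta = O_p(n^{-b})$ when $b > 1$.

For simplicity we consider only the case with i.i.d. errors and scalar $\beta$ in the remainder of the proof, i.e. $u_t = e_t$ is a bivariate i.i.d. process. The general case follows similarly. Consider the residual processes
$$\hat z_t = y_{1t} - \hat\beta y_{2t} = z_t + (\beta - \hat\beta) y_{2t},$$
$$\hat u_{1t} = \Delta^{d-b} \hat z_t = u_{1t} + \Delta^{d-b} (\beta - \hat\beta) y_{2t},$$
$$\hat u_{1.2t} = u_{1.2t} + \Delta^{d-b} (\beta - \hat\beta) y_{2t},$$
and define $\hat w_t = \sum_{j=1}^{t-1} j^{-1} \hat u_{1t-j} \hat u_{1.2t}$ and $w_t = \sum_{j=1}^{t-1} j^{-1} u_{1t-j} u_{1.2t}$. Then
$$\frac{1}{\sqrt{n}} \sum_{t=1}^{n} (\hat w_t - w_t) = \frac{1}{\sqrt{n}} \sum_{t=1}^{n} \sum_{j=1}^{t-1} j^{-1} \left[ (\beta - \hat\beta)^{2} (\Delta^{d-b} y_{2t-j}) (\Delta^{d-b} y_{2t}) + u_{1t-j} \Delta^{d-b} (\beta - \hat\beta) y_{2t} + u_{1.2t} \Delta^{d-b} (\beta - \hat\beta) y_{2t-j} \right]$$
$$= O_p\left( \frac{1}{\sqrt{n}} \sum_{j=1}^{n-1} j^{-1} \sum_{t=j+1}^{n} (\hat\beta - \beta)^{2} v_{2t-j} v_{2t} \right) \tag{39}$$
$$+\, O_p\left( \frac{1}{\sqrt{n}} \sum_{j=1}^{n-1} j^{-1} \sum_{t=j+1}^{n} (\hat\beta - \beta) u_{1t-j} v_{2t} \right) \tag{40}$$
$$+\, O_p\left( \frac{1}{\sqrt{n}} \sum_{j=1}^{n-1} j^{-1} \sum_{t=j+1}^{n} (\hat\beta - \beta) u_{1.2t} v_{2t-j} \right), \tag{41}$$
defining $v_{2t} = \Delta^{-b} u_{2t}^{\#} \in I(b)$. To prove that (39)-(41) are negligible, we first show that
$$(\log n)^{-1} n^{-2b} \sum_{j=1}^{n-1} j^{-1} \sum_{t=j+1}^{n} v_{2t-j} v_{2t} = O_p(1). \tag{42}$$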
It is easily seen that
$$a_{nj} = n^{-2b} \sum_{t=j+1}^{n} v_{2t-j} v_{2t} = O_p(1) \tag{43}$$
for $j/n \to 0$, $n \to \infty$ by a slight variation of the results in, e.g., Marinucci & Robinson (2000). We thus rewrite the left-hand side of (42) as
$$(\log n)^{-1} \sum_{j=1}^{n-1} j^{-1} a_{nj} = (\log n)^{-1} \sum_{j=1}^{k_n} j^{-1} a_{nj} + (\log n)^{-1} \sum_{j=k_n+1}^{n-1} j^{-1} a_{nj}. \tag{44}$$
Applying (43) and the fact that $(\log n)^{-1} \sum_{j=1}^{n} j^{-1} = O(1)$, the first term on the right-hand side of (44) is readily seen to be $O_p(1)$ if $k_n / n \to 0$ as $n \to \infty$. The last term of (44) is bounded by
$$(\log n)^{-1} \sum_{j=k_n+1}^{n-1} j^{-1} |a_{nj}| \leq \max_{j \leq n} |a_{nj}|\, (\log n)^{-1} \sum_{j=k_n+1}^{n-1} j^{-1} = \max_{j \leq n} |a_{nj}|\, O\left( \frac{\log(n / k_n)}{\log n} \right),$$
where $\max_{j \leq n} |a_{nj}| \leq n^{-2b} \max_{j \leq n} \sum_{t=j+1}^{n} |v_{2t-j} v_{2t}| \leq n^{-2b} \sum_{t=1}^{n} v_{2t}^{2} = a_{n0} = O_p(1)$ by (43). Hence, we find that (42) is $O_p(1)$ if we can choose $k_n$ such that $k_n / n \to 0$ and $(\log n)^{-1} \log(n / k_n) = O(1)$, which is satisfied if, e.g., $k_n = n / (\log n)$.

Now we return to the evaluation of (39)-(41). When $1/2 < b \leq 1$, it follows from (42) that (39) is
$$O_p\left( n^{3/2 - 4b} \sum_{j=1}^{n-1} j^{-1} \sum_{t=j+1}^{n} v_{2t-j} v_{2t} \right) = O_p\left( (\log n)\, n^{3/2 - 2b} \right).$$
Similarly, when $1/2 < b \leq 1$, (40) and (41) are of orders
$$O_p\left( n^{1/2 - 2b} \sum_{j=1}^{n-1} j^{-1} \sum_{t=j+1}^{n} u_{1t-j} v_{2t} \right) = O_p\left( (\log n)\, n^{1/2 - b} \right)$$
and
$$O_p\left( n^{1/2 - 2b} \sum_{j=1}^{n-1} j^{-1} \sum_{t=j+1}^{n} u_{1.2t} v_{2t-j} \right) = O_p\left( (\log n)\, n^{1/2 - b} \right),$$
respectively. Since $b \geq 3/4 + \epsilon$ for some $\epsilon > 0$ by assumption, all these terms are $o_p(1)$ using that $n^{-\epsilon} (\log n) = o(1)$ for any $\epsilon > 0$. When $b > 1$, (39)-(41) are all $O_p\left( (\log n)\, n^{-1/2} \right)$ by the same arguments. This completes the proof.
Proof of Theorem 3.2. As in the proof of Theorem 3.1 it can be shown that estimating does not aect the result, so we assume that is known. The second derivative of the likelihood (18) is 2 L (, , ) 2 =
n o 1 Xn ln (1 L) ln (1 L) g (L) db+ y1t 0 y2t 2 t=1 1.2 g (L) db+ y1t 0 y2t 0 1 G (L) d y2t 21 22
n t1 tj1 n t1 t1 1 X X X 1 1 1 X X X 1 1 = 2 j k e1tjk e1.2t 2 j k e1tj e1tk . (45) 1.2 t=1 j=1 1.2 t=1 j=1 k=1 k=1
n o2 1 Xn ln (1 L) g (L) db+ y1t 0 y2t 2 t=1 1.2
In the case with i.i.d. errors, $e_t$ is observable and the contribution of the first term to the Fisher information is zero by uncorrelatedness of $e_t$. The contribution of the second term is
n t1 t1 n1 1 X 2 2 1 X X X 1 1 j k e1tj e1tk = 2 j 11 E 2 n 1.2 t=1 j=1 1.2 j=1 k=1
by uncorrelatedness of et , which proves the result for i.i.d. errors. In the remaining cases, we need to take the estimation of the autoregressive parameters into account. Again it can be shown that the rst term of (45) is negligible. Since C (0) = 2 + Op (n1/2 ) and 1.2 = e0 1 e1 , the contribution of the second term is 1 E tr 1 X X X 1 1 0 1 j k e1 e1 C (0) 1 e1tj e1tk n t=1
j=1 k=1 n1 n1 XX j=1 k=1 n t1 t1
= E tr n
j 1 k 1 e1 e0 1 C (j) e1 e0 C (k)0 1 1 1 (j) e1

n1 X k=1
= En
n1 X j=1
j 1 e0 1 C 1
k 1 e0 1 C (k) e1 , 1
0 which is equal to vec 1 e1 e0 i vec 1 e1 e0 as n by Lemma 1 and Theorem 3.1. 1 1 Proof of Theorem 3.3. Let = / n. First suppose is known and dene e1nt = db z = g (L) e and e = G (L) d y . By the Mean Value Theorem we obtain 2t g (L) t 1t 2t X 1 e1nt = e1t + j e1tj + op (n1/2 ) n
j=1 t1
for all t = 1, ..., n. Thus, under = / n, Sn = XX 1 j 1 e1nt (1nt 0 1 e2t ) e 21 22 2 n t=1 j=1 1.2 ! ! tj1 n t1 t1 XX 1 X 1 X 1 e1.2t + j 1 e1tj + k e1tjk k e1tk + op (1) n n 2 n t=1 1.2
j=1 k=1 k=1 j=1 j=1 k=1 n t1
n1 n1 X X X n1 1 0 1 n j e1 C (j) e1 + n j 1 e0 1 C (j) e1 k1 e0 1 C (k) e1 + op (1) 1 1
as in the proofs of Theorems 3.1 and 3.2. The result when is known now follows from Lemma 1 and the above theorems. When is unknown we can apply the same arguments as in the proof of Theorem 3.1, along with elementary inequalities to the components due to e1nt e1t , to show that the result is unaected. Proof of Corollary 3.1. Follows immediately from Theorem 3.3. Proof of Corollary 3.2. Follows immediately from (32) and Theorem 3.2. Proof of Theorem 4.1. By the Mean Value Theorem we obtain u1.2nt = u1t 0 1 u2t 21 22 c X 1 + j u1tj + op (n1/2 ) n
j=1 j=1 t1
t1 c X 1 j u1tj + op (n1/2 ) u1.2nt = u1t 0 1 u2t + 21 22 n
for all t = 1, ..., n. Thus, we note that 1X 2 P u1.2nt 2 1.2 n

t=1 n
(46)
as n . The numerator of Mn is
n X t=1
u2 1.2nt
n X t=1
u2 1.2nt
n t1 X c2 X t=1
j 2 u2 1tj
t1 X j=1
j=1
+2
n X
n t1 X (c )2 X t=1
j 2 u2 1tj
j=1
u1.2t n t=1 2 6
j 1 u1tj + op (1) r 2 2 2 Z + op (1) . 6 11 1.2 (47)
= (2c )
2 11
+ 2
Combining (46) and (47) we get the desired result.
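Proof of Theorem 4.2. Consider the one-sided case with $\delta > 0$ (the reverse case follows similarly). The one-sided power envelope is
\[ \Pi(\delta) = P\left( M(\delta, \delta) > c_\alpha(\delta) \right) = P\left( 2\delta\sqrt{I_0}\, Z + \delta^2 I_0 > c_\alpha(\delta) \right) = P\left( Z > \frac{c_\alpha(\delta) - \delta^2 I_0}{2\delta\sqrt{I_0}} \right), \]
where $c_\alpha(\delta)$ satisfies
\[ \alpha = P\left( M(0, \delta) > c_\alpha(\delta) \right) = P\left( Z > \frac{c_\alpha(\delta) + \delta^2 I_0}{2\delta\sqrt{I_0}} \right), \]
such that $c_\alpha(\delta) = 2\delta\sqrt{I_0}\, Z_{1-\alpha} - \delta^2 I_0$. Then
\[ \Pi(\delta) = P\left( Z > \frac{2\delta\sqrt{I_0}\, Z_{1-\alpha} - 2\delta^2 I_0}{2\delta\sqrt{I_0}} \right) = P\left( Z > Z_{1-\alpha} - \delta\sqrt{I_0} \right). \]

In the two-sided case we note that, since for varying $c$ the family of distributions $M(c, \delta)$ is normal, it satisfies the requirement that it be strictly totally positive of order 3 (STP$_3$, see Lehmann (1986, p. 119)). Hence the power envelope of all unbiased tests of $H_0: \theta = 0$ against $H_1: \theta_{1n} = \delta/\sqrt{n}$ is given by $\Pi_2(\delta) = 1 - P\left( C_{1,\alpha}(\delta) < M(\delta, \delta) < C_{2,\alpha}(\delta) \right)$ (Lehmann (1986, p. 303)), where the constants are determined by
\[ P\left( C_{1,\alpha}(\delta) < M(0, \delta) < C_{2,\alpha}(\delta) \right) = 1 - \alpha, \tag{48} \]
\[ \left. \frac{\partial}{\partial c} P\left( C_{1,\alpha}(\delta) < M(c, \delta) < C_{2,\alpha}(\delta) \right) \right|_{c=0} = 0. \tag{49} \]
Consider first (49), which implies that ($\phi(\cdot)$ is the density function of the standard normal distribution)
\[ \phi\left( \frac{C_{1,\alpha}(\delta) + \delta^2 I_0}{2\delta\sqrt{I_0}} \right) = \phi\left( \frac{C_{2,\alpha}(\delta) + \delta^2 I_0}{2\delta\sqrt{I_0}} \right) \]
with the non-trivial solution $C_{1,\alpha}(\delta) = -C_{2,\alpha}(\delta) - 2\delta^2 I_0$. Now we can find the constants from (48),
\[ 1 - \alpha = P\left( -C_{2,\alpha}(\delta) - 2\delta^2 I_0 < M(0, \delta) < C_{2,\alpha}(\delta) \right) = P\left( -\frac{C_{2,\alpha}(\delta) + \delta^2 I_0}{2\delta\sqrt{I_0}} < Z < \frac{C_{2,\alpha}(\delta) + \delta^2 I_0}{2\delta\sqrt{I_0}} \right), \]
where $Z$ is a standard normal random variable. Thus, $C_{2,\alpha}(\delta)$ solves $\Phi\left( (C_{2,\alpha}(\delta) + \delta^2 I_0)/(2\delta\sqrt{I_0}) \right) = 1 - \alpha/2$, which implies $C_{2,\alpha}(\delta) = 2\delta\sqrt{I_0}\, Z_{1-\alpha/2} - \delta^2 I_0$, where $Z_{1-\alpha/2}$ is the $100(1 - \alpha/2)\%$ point of the standard normal distribution.

The two-sided power envelope is then
\[ \Pi_2(\delta) = 1 - P\left( C_{1,\alpha}(\delta) < M(\delta, \delta) < C_{2,\alpha}(\delta) \right) \]
\[ = 1 - P\left( -2\delta\sqrt{I_0}\, Z_{1-\alpha/2} - \delta^2 I_0 < 2\delta\sqrt{I_0}\, Z + \delta^2 I_0 < 2\delta\sqrt{I_0}\, Z_{1-\alpha/2} - \delta^2 I_0 \right) \]
\[ = 1 - P\left( -Z_{1-\alpha/2} < Z + \delta\sqrt{I_0} < Z_{1-\alpha/2} \right) = 1 - F\left( \chi^2_{1,1-\alpha} \right). \]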
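The closed forms at the end of this proof are easy to evaluate numerically. The sketch below is illustrative only: the function names are hypothetical, and the value $I_0 = \pi^2/6$ used in the usage note is an assumption made here for the uncorrelated i.i.d. design, not something stated in this proof. Under that assumption, and with $\delta = \theta\sqrt{n}$, the one-sided formula agrees with the uncorrelated "Envelope" column of Table 1 to three decimals.

```python
from math import pi, sqrt
from statistics import NormalDist

_NORM = NormalDist()  # standard normal distribution


def one_sided_envelope(delta, info0, alpha=0.05):
    """One-sided envelope P(Z > Z_{1-alpha} - delta * sqrt(I_0))."""
    return 1.0 - _NORM.cdf(_NORM.inv_cdf(1.0 - alpha) - delta * sqrt(info0))


def two_sided_envelope(delta, info0, alpha=0.05):
    """Two-sided envelope 1 - P(-Z_{1-alpha/2} < Z + delta * sqrt(I_0) < Z_{1-alpha/2})."""
    z = _NORM.inv_cdf(1.0 - alpha / 2.0)
    shift = delta * sqrt(info0)
    return 1.0 - (_NORM.cdf(z - shift) - _NORM.cdf(-z - shift))


if __name__ == "__main__":
    info0 = pi ** 2 / 6  # assumed value of I_0 (see lead-in)
    for n in (200, 500):
        delta = 0.05 * sqrt(n)  # local parameter for theta = 0.05
        print(n, round(one_sided_envelope(delta, info0), 3))
```

With these inputs the loop gives values close to 0.230 for n = 200 and 0.416 for n = 500, in line with the first non-null rows of the uncorrelated panel of Table 1.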
References
Agiakloglou, C. & Newbold, P. (1994), Lagrange multiplier tests for fractional difference, Journal of Time Series Analysis 15, 253-262.
Ahn, S. K. (1988), Distribution for residual autocovariances in multivariate autoregressive models with structural parameterization, Biometrika 75, 590-593.
Baillie, R. T. (1996), Long memory processes and fractional integration in econometrics, Journal of Econometrics 73, 5-59.
Baillie, R. T. & Bollerslev, T. (1989), Common stochastic trends in a system of exchange rates, Journal of Finance 44, 167-181.
Baillie, R. T. & Bollerslev, T. (1994), Cointegration, fractional cointegration, and exchange rate dynamics, Journal of Finance 49, 737-745.
Cheung, Y. E. & Lai, K. S. (1993), A fractional cointegration analysis of purchasing power parity, Journal of Business and Economic Statistics 11, 103-122.
Diebold, F. X., Gardeazabal, J. & Yilmaz, K. (1994), On cointegration and exchange rate dynamics, Journal of Finance 49, 727-735.
Diebold, F. X. & Rudebusch, G. D. (1989), Long memory and persistence in aggregate output, Journal of Monetary Economics 24, 189-209.
Doornik, J. A. (2001), Ox: An Object-Oriented Matrix Language, 4th edn, Timberlake Consultants Press, London.
Doornik, J. A. & Ooms, M. (2001), A package for estimating, forecasting and simulating Arfima models: Arfima package 1.01 for Ox, Working Paper, Nuffield College, Oxford.
Dueker, M. & Startz, R. (1998), Maximum-likelihood estimation of fractional cointegration with an application to U.S. and Canadian bond rates, Review of Economics and Statistics 83, 420-426.
Elliott, G., Rothenberg, T. J. & Stock, J. H. (1996), Efficient tests for an autoregressive unit root, Econometrica 64, 813-836.
Engle, R. & Granger, C. W. J. (1987), Cointegration and error correction: Representation, estimation and testing, Econometrica 55, 251-276.
Fox, R. & Taqqu, M. S. (1986), Large-sample properties of parameter estimates for strongly dependent stationary Gaussian series, Annals of Statistics 14, 517-532.
Granger, C. W. J. (1981), Some properties of time series data and their use in econometric model specification, Journal of Econometrics 16, 121-130.
Hall, P. & Heyde, C. C. (1980), Martingale Limit Theory and its Application, Academic Press, New York.
Hosking, J. R. M. (1980), The multivariate portmanteau statistic, Journal of the American Statistical Association 75, 602-608.
Jansson, M. (2004), Point optimal tests of the null hypothesis of cointegration, Forthcoming in Journal of Econometrics.
Jansson, M. & Haldrup, N. (2002), Regression theory for nearly cointegrated time series, Econometric Theory 18, 1309-1335.
Jeganathan, P. (1999), On asymptotic inference in cointegrated time series with fractionally integrated errors, Econometric Theory 15, 583-621.
Kim, C. S. & Phillips, P. C. B. (2001), Fully modified estimation of fractional cointegration models, Preprint, Yale University.
Lehmann, E. L. (1986), Testing Statistical Hypotheses, 2nd edn, Springer, New York.
Lobato, I. N. & Velasco, C. (2000), Long memory in stock-market trading volume, Journal of Business and Economic Statistics 18, 410-427.
Marinucci, D. & Robinson, P. M. (1999), Alternative forms of fractional Brownian motion, Journal of Statistical Planning and Inference 80, 111-122.
Marinucci, D. & Robinson, P. M. (2000), Weak convergence of multivariate fractional processes, Stochastic Processes and their Applications 86, 103-120.
Marinucci, D. & Robinson, P. M. (2001), Semiparametric fractional cointegration analysis, Journal of Econometrics 105, 225-247.
Nielsen, M. Ø. (2004), Efficient likelihood inference in nonstationary univariate models, Econometric Theory 20, 116-146.
Phillips, P. C. B. (1991), Optimal inference in cointegrated systems, Econometrica 59, 283-306.
Phillips, P. C. B. & Hansen, B. E. (1990), Statistical inference in instrumental variables regression with I(1) variables, Review of Economic Studies 57, 99-125.
Phillips, P. C. B. & Loretan, M. (1991), Estimating long-run economic equilibria, Review of Economic Studies 58, 407-436.
Robinson, P. M. (1991), Testing for strong serial correlation and dynamic conditional heteroskedasticity in multiple regressions, Journal of Econometrics 47, 67-84.
Robinson, P. M. (1994), Efficient tests of nonstationary hypotheses, Journal of the American Statistical Association 89, 1420-1437.
Robinson, P. M. (1995), Gaussian semiparametric estimation of long range dependence, Annals of Statistics 23, 1630-1661.
Saikkonen, P. (1991), Asymptotically efficient estimation of cointegration regressions, Econometric Theory 7, 1-21.
Shin, Y. (1994), A residual-based test of the null of cointegration against the alternative of no cointegration, Econometric Theory 10, 91-115.
Sowell, F. B. (1992), Maximum likelihood estimation of stationary univariate fractionally integrated time series models, Journal of Econometrics 53, 165-188.
Tanaka, K. (1999), The nonstationary fractional unit root, Econometric Theory 15, 549-582.
Table 1: Finite Sample Rejection Frequencies Under Assumption 1.0

                        Uncorrelated                  Correlation .6
Sample Size   θ       Envelope   LM      LMsc       Envelope   LM      LMsc
n = 200       0       0.050      0.024   0.050      0.050      0.031   0.050
              0.05    0.230      0.172   0.247      0.305      0.222   0.275
              0.10    0.567      0.468   0.573      0.733      0.585   0.643
              0.15    0.859      0.755   0.828      0.960      0.867   0.886
              0.20    0.976      0.913   0.949      0.998      0.976   0.985
              0.25    0.998      0.986   0.993      1.000      0.996   0.998
n = 500       0       0.050      0.037   0.050      0.050      0.040   0.050
              0.05    0.416      0.360   0.416      0.559      0.507   0.555
              0.10    0.889      0.838   0.879      0.974      0.944   0.957
              0.15    0.996      0.980   0.986      1.000      0.997   0.997
              0.20    1.000      1.000   1.000      1.000      1.000   1.000
              0.25    1.000      1.000   1.000      1.000      1.000   1.000
CMLE
p=0 p=1
Table 5: Estimates of Fractional Integration Orders WG (y1t ) CAN SW FRA ITA JAP 1.0057 1.1211 0.9938 1.0081 1.0033 0.9975
(0.0425) (0.0975) (0.0870) (0.0611) (0.0425) (0.0425) (0.1016) (0.0870) (0.0611) (0.0425) (0.0914) (0.0425) (0.1026) (0.0870) (0.0611) (0.0425) (0.0980) (0.0870) (0.0611)
UK 1.0770
(0.0425) (0.0960) (0.0870) (0.0611)
dc = d 1.0295 0.9963 1.0495 1.0885
0.9625 1.0428 1.0847
1.0588
(0.0744) (0.0870) (0.0611)
0.9465 1.0197 1.0662
0.9920
(0.0870) 1.1338 (0.0611)
1.0023 1.0857 1.1029
0.9959 0.9696 1.1185
1.0163 0.9837 1.0725
GSP
m = 33 m = 67
1.1311 1.0406
1.1141
Standard errors are given in parentheses, see also Tanaka (1999) and Robinson (1995). One asterisk denotes significantly different from unity at the 5% level and two asterisks denote significantly different from unity at the 1% level.
Table 6: Misspecification Tests for Univariate CMLEs

                    WG (y1t)   CAN      SW       FRA      ITA      JAP      UK
p = 0   AR(6)       35.095     6.5008   42.468   33.236   54.811   49.539   30.144
        ARCH(1)     0.0201     2.9443   0.4078   0.2719   29.695   16.123   14.914
p = 1   AR(6)       2.9604     6.5195   6.5442   6.4627   4.2951   11.789   11.389
        ARCH(1)     0.7083     2.2356   0.9340   0.5906   2.3098   2.1289   7.1594

AR(6) is the Portmanteau test up to lag 6 and ARCH(1) is the test for ARCH up to lag 1, which are asymptotically $\chi^2(5 - p)$ and $F(1, 333 - p)$ distributed, respectively. One asterisk denotes significance at the 5% level and two asterisks denote significance at the 1% level.
Table 7: One-sided LM Tests for Fractional Cointegration

                   d = b = 1   d = b = d_c   d = 1, b = 0.76
Assumption 1.2     13.40       13.38         11.91
Assumption 1.3     2.16        2.14          0.20

One asterisk denotes significance at the 5% level and two asterisks denote significance at the 1% level.
[Figure 1 panels: power plotted on a vertical axis from 0 to 1.00; the legends list i.i.d. errors and parameter values 0.2 and 0.5 (upper panels) and 0.4 and 0.8 (lower panels).]

Figure 1: Asymptotic local power functions calculated using Corollary 3.1 with d = b = 1 and first order autoregressive specifications. The variances are normalized to unity, and the correlation is .6 and .9 in the left-hand and right-hand side panels, respectively.
Chapter 4
Forthcoming in Journal of Financial Econometrics, 2005

Abstract

We introduce a multivariate Lagrange Multiplier (LM) test for fractional integration. We derive and analyze the LM statistic and show that it is asymptotically noncentral chi-squared distributed under local alternatives, and that, under Gaussianity, the LM test is asymptotically efficient against local alternatives. It is shown that the regression variant in Breitung & Hassler (2002, Journal of Econometrics 110, 167-185) is not equivalent to the LM test in the multivariate case, although it is in the univariate case. A generalization of the LM test that explicitly allows for different integration orders for each variable is also introduced. The finite sample properties of the LM test are evaluated by Monte Carlo experiments which demonstrate that it is superior to the Breitung & Hassler (2002) test. An application to multivariate time series of real interest rates for six countries is offered, demonstrating that more clear-cut evidence can be drawn from multivariate tests compared to conducting several univariate tests.

JEL Classification: C32
Keywords: Asymptotic Local Power, Efficient Test, Fractional Integration, Lagrange Multiplier Test, Multivariate Fractional Unit Root, Nonstationarity
I am grateful to Jörg Breitung, Uwe Hassler, Søren Johansen, Eric Renault (the editor), an associate editor, and two anonymous referees for many helpful comments and constructive suggestions, to Byung Chul Ahn and Klaus Neusser for providing the data, and to the Danish Social Science Research Council for financial support (SSF grant no 24-02-0181).
1 Introduction
In this paper we introduce multivariate Lagrange Multiplier (LM) tests (or efficient score tests) for fractional integration. Multivariate procedures are important since most applied work concerns multiple time series, either stationary or nonstationary. Parametric tests for fractional integration have been examined previously by Robinson (1991, 1994), Agiakloglou & Newbold (1994), and Tanaka (1999), among others, in a univariate framework, and recently by Breitung & Hassler (2002) and Gil-Alana (2003) in the multivariate case. The objective is to test if an observed K-vector time series $y_t$ is integrated of order $d$, denoted I(d), against the hypothesis that it is $I(d + \theta)$ for $\theta \neq 0$. By differencing the observed time series, this is equivalent to testing if $x_t = (1 - L)^d y_t$ is I(0) against $I(\theta)$. Multivariate nonparametric tests for I(0) against $I(\theta)$ have been considered by Robinson (1995), who considers a test based on the multivariate log-periodogram estimator, and by Lobato & Robinson (1998), who propose a multivariate LM test based on the objective function considered by Lobato (1999), who also considers a Wald statistic.

With no multivariate parametric tests available for testing the order of fractional integration, researchers interested in conducting parametric tests on multiple time series have been forced to apply univariate tests to each element of the multiple time series. That procedure is not only cumbersome, but also ignores potentially important correlations between the elements of the multiple time series, which a multivariate test could exploit to increase power. Hence, the purpose of the present paper is to introduce LM tests that apply to the multivariate case, with the usual computational motivation for the LM principle. The proposed multivariate tests in the present paper in many ways parallel the ones by Choi & Ahn (1999) and Nyblom & Harvey (2000), who propose stationarity tests, i.e. tests of I(0) against I(1), for multiple time series, and our work can thus also be seen as a generalization of their work with the important difference that our test is directed against different (i.e. fractional) alternatives.

The tests proposed in this paper are intended primarily for preliminary data analysis. For instance, when testing the null of stationarity or I(0)-ness (against fractional alternatives), non-rejection would allow standard methods to be employed for conducting, e.g., causality, structural vector autoregression, or impulse response analyses. More generally, the tests may indicate the transformation of the data that would be required in order to make the data suitable for such analyses. For instance, in Andersen, Bollerslev, Diebold & Labys (2003) a fractional difference is taken of the multivariate volatility processes considered there, and the resulting multivariate series are modeled by vector autoregressions. Our tests could then be applied to ensure that the fractional difference is sufficient to render the volatility processes I(0). Another example is the analysis of the Real Interest Parity hypothesis by Kugler & Neusser (1993) using a co-dependence approach, which requires that the data are I(0). Again, our methodology could then be applied to test the latter hypothesis, underlying their entire analysis, and we return to this in section 5 below.

Suppose we observe $\{y_t, t = 1, \ldots, n\}$ generated by
\[ (1 - L)^{d+\theta} y_t = e_t I(t \ge 1), \quad t = 0, \pm 1, \pm 2, \ldots, \tag{1} \]
where $I(\cdot)$ denotes the indicator function and $e_t$ is I(0), i.e. is covariance stationary and has spectral density that is bounded and bounded away from zero at the origin. The process $y_t$ generated by (1) is well defined for all $d$ and $\theta$, and is sometimes called a multivariate type II fractionally integrated process. The effect of the truncation or initial values condition in (1) has been analyzed by Robinson (2004), who investigates the difference between the process defined in (1) and the corresponding process without the truncation (type I). The process in (1) allows a uniform definition, valid for all $d$ and $\theta$, whereas the alternative definition without truncation would be valid only for $d + \theta \in (-1/2, 1/2)$ and partial summation would be needed to generate a process with integration order outside this range. For more details on type I and type II fractionally integrated processes and their difference, see Marinucci & Robinson (1999) and Robinson (2004). Deterministic terms could be added to (1), allowing for a non-zero mean and trend or deterministic seasonal behavior, see section 3.1. In section 3.2 we consider the extension to different values of $d$ and $\theta$ for each component of $y_t$ in (1).

For the moment, we let the errors $e_t$ be independently and identically distributed with mean zero and positive definite covariance matrix $\Sigma$, i.i.d.$(0, \Sigma)$. In section 3.3 we relax this assumption, and let $e_t$ follow a stationary vector autoregressive process of order $p$, VAR(p). We could presumably relax the assumption of constant second moment structure to accommodate conditional heteroskedasticity along the lines of Ling & Li (1997), who analyze a univariate fractionally integrated autoregressive moving average model with conditional heteroskedasticity. Conditional heteroskedasticity is often found in financial data, where our methodology is especially applicable due to the large amount of data available, and hence this would be an important direction for future research.

Furthermore, note that positive definiteness of $\Sigma$ rules out cointegration among the components of $y_t$. However, even though we rule out the possibility of cointegration, which has been popular especially in recent empirical macroeconomics, we are still able to apply our model to test a number of interesting hypotheses such as joint stationarity or I(0)-ness as described above, in which case we need not worry about the possibility of cointegration. It turns out that our test is also implicitly a test of the null of no cointegration (see section 2 and the discussion following Corollary 2 below), and for such tests Lee & Tse (1996) provide evidence that leptokurticity as produced by conditional heteroskedasticity may cause overrejection of the null of no cointegration, and Sin & Ling (2004) show how reduced rank analysis may be modified to accommodate conditional heteroskedasticity. Further analysis in the case of cointegration, reduced rank of $\Sigma$, and/or conditional heteroskedasticity is beyond the scope of this paper, and we leave these important topics for future research. However, in the next section we do briefly consider the properties of our proposed tests in the presence of cointegration.

In the model (1) we assume that $d$ is specified a priori and wish to test the hypothesis
\[ H_0: \theta = 0 \tag{2} \]
against the alternative $H_1: \theta \neq 0$. For instance, the unit root hypothesis and the hypothesis of joint stationarity (or more precisely, weak dependence) of $y_t$ are given by (1) and (2) with $d = 1$ and $d = 0$, respectively. Indeed, the most important motivation for the current study is the test of joint stationarity or joint I(0)-ness, which we also illustrate empirically in section 5 below. In that case we also do not need to worry about the assumption of no cointegration. It is important to note that the assumption that $d$ is known a priori is made without loss of generality. The specification of a particular value of $d$ exactly specifies the null hypothesis since $\theta = 0$ in (2).

Robinson (1994) and Tanaka (1999) consider testing (2) in the univariate model, i.e. (1) with $K = 1$. Robinson (1994) shows that the frequency domain LM test statistic has a chi-squared limiting distribution under the null, and is asymptotically efficient against local alternatives, $\theta = \delta/\sqrt{n}$, under Gaussianity. The frequency domain LM test of Robinson (1994) is extended to the multivariate case by Gil-Alana (2003). Tanaka (1999) shows that the univariate time domain LM test statistic has a normal (one-sided test) or chi-squared (two-sided test) limiting distribution, and that, under Gaussianity, it is asymptotically most powerful against local alternatives among all the invariant tests. A simulation study by Tanaka (1999) also indicates the finite sample superiority of the time domain test over Robinson's (1994) frequency domain test. Breitung & Hassler (2002) suggest a regression variant of Tanaka's (1999) LM test similar to the Dickey-Fuller test, see also Dolado, Gonzalo & Mayoral (2002).
Breitung & Hassler (2002) also suggest a multivariate version, which generalizes to a trace test for the cointegrating rank, along the lines of the Johansen (1988) test, and show that their multivariate test has a limiting chi-squared distribution, where the degrees of freedom depend only on the cointegrating rank under the null. We show that the equivalence of the LM test and the regression based test of Breitung & Hassler (2002) fails to hold in the multivariate case.

We derive the LM test statistic for the hypothesis (2) in the time domain, with the usual computational advantage of estimation under the null. Thus, no multivariate fractionally integrated model needs to be estimated. Of course, in the presence of short-run dynamics a vector autoregressive model needs to be estimated under the null, but that is quite simple and computationally not as demanding as the estimation of a multivariate fractionally integrated model, which would typically require numerical optimization. In fact, the test is based on computationally simple moment matrices, see (4) and (7) below. We establish desirable distributional properties and optimality properties of the test. In particular, the test statistic is asymptotically noncentral chi-squared distributed under local alternatives, where the degrees of freedom equal the number of restrictions tested, and under Gaussianity, it is asymptotically efficient against local alternatives. Furthermore, the LM test is shown to be consistent against fractional cointegration, i.e. it rejects with probability tending to one in the case where the integration order of some linear combination of the observed vector time series is lower than the hypothesized value. Thus, the test could be employed as a test of non-cointegration against the alternative of cointegration. An extension of the LM test statistic that explicitly allows for different integration orders (both different $d$ and different $\theta$) for each variable in the vector time series $y_t$ is also introduced, and its asymptotic properties are examined.

In a simulation study we examine the properties of the LM test in finite samples and compare with the Breitung & Hassler (2002) test. We find that the LM test compares favorably with the Breitung & Hassler (2002) test, and in particular that the LM test has higher finite sample power than the Breitung & Hassler (2002) test in the non-cointegrated model. The frequency domain test by Gil-Alana (2003) is not considered in our simulation study since the evidence in Tanaka (1999) suggests that time domain tests are superior in terms of finite sample properties.

Finally, we present an empirical application of the multivariate LM test and its extensions, which demonstrates their usefulness in practice. We apply our tests to a multivariate time series of real interest rates for six major industrialized countries previously examined by Kugler & Neusser (1993) and Choi & Ahn (1999). Kugler & Neusser (1993) analyze the Real Interest Parity hypothesis by a co-dependence approach, which requires the vector time series in question to be stationary, or more precisely, to be I(0). To test this underlying hypothesis, Kugler & Neusser (1993) apply univariate unit root tests to each element of the multiple time series, which mostly reject the null of a unit root, and Choi & Ahn (1999) apply their multivariate stationarity test (i.e. test of I(0) against I(1)) and find no evidence against the null hypothesis. Our objective is to test if the real interest rates are jointly I(0) against fractional alternatives, and the evidence we obtain from the multivariate tests is more clear-cut than the evidence from applying univariate tests to each element of the multiple time series.

The rest of the paper is laid out as follows. Next, we derive and analyze the multivariate LM test in the basic model with only one integration order, which is common to all the variables. In section 3 we consider generalizations of the basic model allowing deterministic terms, different values of $d$ and $\theta$ for each variable, and short-run dynamics. Section 4 presents the results of the simulation study, and section 5 presents the empirical application. Section 6 offers some concluding remarks. Proofs are collected in the appendix.
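The truncated (type II) fractional difference in (1), and the transformation from $y_t$ to $x_t = (1 - L)^d y_t$ that the tests rely on, can be sketched numerically. The code below is a minimal illustration, not the implementation used in the paper; the function names `frac_diff_weights` and `frac_diff` are hypothetical. It expands $(1 - L)^d$ by the usual binomial recursion and treats all observations before $t = 1$ as zero, which is what the indicator in (1) amounts to.

```python
import numpy as np


def frac_diff_weights(d, n):
    """Coefficients pi_j in (1 - L)^d = sum_j pi_j L^j, for j = 0, ..., n - 1."""
    w = np.empty(n)
    w[0] = 1.0
    for j in range(1, n):
        # Recursion for the binomial expansion: pi_j = pi_{j-1} * (j - 1 - d) / j
        w[j] = w[j - 1] * (j - 1 - d) / j
    return w


def frac_diff(y, d):
    """Apply the truncated filter (1 - L)^d to y of shape (n,) or (n, K).

    Values of y before the first observation are taken to be zero.
    """
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    w = frac_diff_weights(d, n)
    x = np.zeros_like(y)
    for t in range(n):
        # x_t = sum_{j=0}^{t} pi_j * y_{t-j}
        x[t] = w[: t + 1] @ y[t::-1]
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    e = rng.standard_normal((200, 2))   # i.i.d. errors, K = 2
    y = frac_diff(e, -0.4)              # y_t = (1 - L)^{-0.4} e_t, a type II I(0.4) process
    x = frac_diff(y, 0.4)               # differencing by d = 0.4 recovers the errors
    print(np.allclose(x, e))
```

Because the truncated filters are lower-triangular Toeplitz operators, applying orders $d$ and $-d$ in sequence returns the original series exactly, which is a convenient check on any implementation.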
2 Multivariate LM Test
The Gaussian log-likelihood function of the model in (1) is
\[ L(\theta, \Sigma) = -\frac{n}{2} \ln (2\pi |\Sigma|) - \frac{1}{2} \sum_{t=1}^{n} \left[ (1 - L)^{d+\theta} y_t \right]' \Sigma^{-1} \left[ (1 - L)^{d+\theta} y_t \right], \tag{3} \]
and hence the score is, see also Tanaka (1999) and Breitung & Hassler (2002),
\[ \left. \frac{\partial L(\theta, \Sigma)}{\partial \theta} \right|_{\theta=0, \Sigma=\hat\Sigma} = -\sum_{t=1}^{n} \left[ (\ln (1 - L)) x_t \right]' \hat\Sigma^{-1} x_t = \mathrm{tr}\left( \hat\Sigma^{-1} S_{10} \right), \tag{4} \]
where $x_t = (1 - L)^d y_t$, $S_{10} = \sum_{t=2}^{n} x_t x_{t-1}^{*\prime}$, $x_{t-1}^{*} = \sum_{j=1}^{t-1} j^{-1} x_{t-j}$, and $\hat\Sigma = n^{-1} \sum_{t=1}^{n} x_t x_t'$ is a consistent estimate of $\Sigma = E(e_t e_t')$ under the null. When $K = 1$, i.e. when the observed time series is univariate, the score in (4), normalized by $\sqrt{n}$, reduces to Tanaka's (1999) univariate time domain score statistic, $s_n = \sqrt{n} \sum_{j=1}^{n-1} j^{-1} \hat\rho(j)$, where $\hat\rho(j)$ is the $j$th order sample autocorrelation of $x_t$. Our multivariate score (4) is similar to Choi & Ahn's (1999, p. 47) SBDH statistic and Nyblom & Harvey's (2000, p. 179) LBI statistic for testing I(0) against I(1) in multiple time series. The difference is that we introduce the $j^{-1}$ weights in the calculation of $x_{t-1}^{*}$, where Choi & Ahn (1999) and Nyblom & Harvey (2000) use unweighted partial sums.

Breitung & Hassler (2002) consider the test statistic
\[ \Lambda_0(d) = \mathrm{tr}\left( \hat\Sigma^{-1} S_{10} S_{11}^{-1} S_{10}' \right), \tag{5} \]
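where $S_{11} = \sum_{t=2}^{n} x_{t-1}^{*} x_{t-1}^{*\prime}$, and show that $\Lambda_0(d) \to_d \chi^2_{K^2}$ under the null (2). However, since $\mathrm{tr}(AB) \neq \mathrm{tr}(A)\, \mathrm{tr}(B)$ in general, (5) is not equivalent to the multivariate LM test of (2), in contrast to the equivalence that Breitung & Hassler (2002) demonstrate for the univariate test. Instead, (5) is a regression variant along the lines of the Dickey-Fuller test and the fractional Dickey-Fuller test, see Dolado et al. (2002). Indeed, the main aim of Breitung & Hassler (2002) is to construct a fractional trace statistic similar to Johansen (1988), just as the Dickey-Fuller test generalizes to Johansen's (1988) trace statistic. In particular, (5) can be rewritten as a sum of eigenvalues, $\Lambda_0(d) = \sum_{j=1}^{K} \lambda_j$, where $\lambda_j$ turns out to be the test statistic for $\phi_j = 0$ in
\[ (v_j' x_t) = \phi_j' x_{t-1}^{*} + e_t \]
and $v_j$ is the eigenvector corresponding to $\lambda_j$. Thus, $K^2$ restrictions are being tested ($\phi_j = 0$, $j = 1, \ldots, K$) instead of one restriction as in (2), which explains the $K^2$ degrees of freedom in the asymptotic distribution of $\Lambda_0(d)$. Consequently, the test statistic (5) is not the LM test statistic for testing the hypothesis (2).

The multivariate LM test statistic for testing (2) is, e.g. Amemiya (1985, p. 142),
\[ LM = -\left. \frac{\partial L(\eta)}{\partial \eta'} \right|_{\theta=0, \Sigma=\hat\Sigma} \left[ \left. \frac{\partial^2 L(\eta)}{\partial \eta \, \partial \eta'} \right|_{\theta=0, \Sigma=\hat\Sigma} \right]^{-1} \left. \frac{\partial L(\eta)}{\partial \eta} \right|_{\theta=0, \Sigma=\hat\Sigma}, \tag{6} \]
where $\eta = ((\mathrm{vec}\, \Sigma)', \theta')'$. The relevant block of the Hessian matrix in (6) is
\[ \left. \frac{\partial^2 L(\theta, \Sigma)}{\partial \theta^2} \right|_{\theta=0, \Sigma=\hat\Sigma} = -\sum_{t=1}^{n} \left[ (\ln (1 - L)) x_t \right]' \hat\Sigma^{-1} \left[ (\ln (1 - L)) x_t \right] - \frac{1}{2} \sum_{t=1}^{n} \left\{ x_t' \hat\Sigma^{-1} \left[ (\ln (1 - L))^2 x_t \right] + \left[ (\ln (1 - L))^2 x_t \right]' \hat\Sigma^{-1} x_t \right\} = -\mathrm{tr}\left( \hat\Sigma^{-1} M_{11} \right), \]
defining $M_{11} = S_{11} + \frac{1}{2} (S_{20} + S_{20}')$, $S_{20} = \sum_{t=1}^{n} x_{t-2}^{**} x_t'$, and $x_{t-2}^{**} = \sum_{j=1}^{t-2} j^{-1} x_{t-j-1}^{*}$. Thus, we find that
\[ LM = \frac{\mathrm{tr}\left( \hat\Sigma^{-1} S_{10} \right)^2}{\mathrm{tr}\left( \hat\Sigma^{-1} M_{11} \right)}. \tag{7} \]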
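The moment matrices entering (4) and (7) are simple to compute. The sketch below is an illustration under stated assumptions rather than the author's code: the function names are hypothetical, the input `x` is taken to be the already differenced series $x_t = (1 - L)^d y_t$ stored as an (n, K) array, the errors are treated as i.i.d. (no short-run dynamics), and $S_{20}$ is taken without any additional scale factor, so that $-\mathrm{tr}(\hat\Sigma^{-1} M_{11})$ matches the second derivative of the Gaussian log-likelihood.

```python
import numpy as np


def weighted_partial_sums(x):
    """Row t holds sum_{j>=1} x_{t-j} / j, using only observed (non-negative) row indices."""
    n = x.shape[0]
    out = np.zeros_like(x)
    for t in range(1, n):
        j = np.arange(1, t + 1)                       # lags 1, ..., t
        out[t] = (x[t - j] / j[:, None]).sum(axis=0)  # j^{-1}-weighted sum of past values
    return out


def lm_statistic(x):
    """LM = tr(Sigma^{-1} S10)^2 / tr(Sigma^{-1} M11) for an (n, K) array x."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    sigma_inv = np.linalg.inv(x.T @ x / n)   # inverse of the null covariance estimate
    xs = weighted_partial_sums(x)            # x*_{t-1}
    xss = weighted_partial_sums(xs)          # x**_{t-2}: the same filter applied twice
    s10 = x.T @ xs                           # sum_t x_t x*'_{t-1}
    s11 = xs.T @ xs                          # sum_t x*_{t-1} x*'_{t-1}
    s20 = xss.T @ x                          # sum_t x**_{t-2} x_t'
    m11 = s11 + 0.5 * (s20 + s20.T)
    return np.trace(sigma_inv @ s10) ** 2 / np.trace(sigma_inv @ m11)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.standard_normal((300, 3))        # data generated under the null with d = 0
    print(lm_statistic(x))                   # compare with chi-squared(1) critical values
```

Two properties are useful as sanity checks: for K = 1 the trace in the numerator equals $n \sum_j j^{-1} \hat\rho(j)$, and the statistic is unchanged when `x` is replaced by `x @ D.T` for any non-singular `D`.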
In the following theorem we present the limiting distribution of the test statistic under alternatives local to the null, $H_{1n}: \theta = \delta/\sqrt{n}$, where $\delta$ is a fixed scalar.

Theorem 1 Under $\theta = \delta/\sqrt{n}$, the LM test statistic (7) is asymptotically distributed as $\chi^2_1(\delta^2 I)$, where
\[ I = -\lim_{n \to \infty} \frac{1}{n} E_0\left( \frac{\partial^2 L(\theta, \Sigma)}{\partial \theta^2} \right) = \frac{\pi^2}{6} K. \tag{8} \]
Under the additional assumption of Gaussianity, the test is asymptotically efficient against local alternatives. Under the null hypothesis (2), $LM \to_d \chi^2_1$.

Thus, the LM test is chi-squared with one degree of freedom under the null, which is expected since only one restriction is being tested. In contrast, the test (5) has $K^2$ degrees of freedom. More generally, standard statistical results apply in the present fractional model, unlike in the multivariate unit root and stationarity tests nested in autoregressive models, e.g. Phillips & Durlauf (1986), Fountis & Dickey (1989), Choi & Ahn (1999), and Nyblom & Harvey (2000).

Note that Theorem 1 continues to hold if (the negative inverse of) the Fisher information matrix (8) is substituted for the Hessian matrix in the definition of the LM test in (6) or (7). However, in simulation experiments not reported here, it was found that the LM test defined in (6) has superior finite sample properties, especially in the presence of short-run dynamics. In addition, when allowance is made for short-run dynamics, the calculation of the Fisher information matrices, see (17) and (18) below, can be quite complicated. Thus, we maintain the definition of the LM test in terms of the Hessian matrix as in (6).

Next, as in Choi & Ahn (1999), we use the fact that the LM test is invariant to non-singular linear transformations, i.e. transformations of the type $\tilde{x}_t = D x_t$ for $D$ non-singular, to show that the test is consistent against fractional cointegration. Following Breitung & Hassler (2002), we say that $y_t$ is fractionally cointegrated, denoted CI(d, b), if $y_t$ is I(d) and there exist $K \times r$ and $K \times (K - r)$ linearly independent matrices $\beta$ and $\gamma$ of full rank such that
\[ \beta' y_t \sim I(d - b), \quad \gamma' y_t \sim I(d), \]
where it is assumed that the fractional integration order $d$ is given, but $b > 0$ is unknown. That is, the maintained hypothesis is that $y_t$ is I(d), but it is now assumed that there exists some linear combination of $y_t$ which is integrated of a lower order. Thus, we are still under the null in the sense of (2). We also assume that $u_t = (\gamma, (1 - L)^{-b} \beta)' x_t$ is i.i.d.$(0, \Omega)$. The following corollary shows that our multivariate LM test (7) rejects with probability tending to one when $y_t$ is CI(d, b).

Corollary 2 The LM test statistic (7) is $O_p(n)$ when $y_t$ is CI(d, b).

Note that in practical applications with e.g. $d = 1$, rejection of the null can be caused either by cointegration among the variables or by one of the variables not being I(1). Hence, in that case, rejection of the null warrants further investigation to determine the cause of the rejection, e.g. analyzing subsets of the variables. In addition, Lee & Tse (1996) argue that rejection could be caused by leptokurticity as produced by conditional heteroskedasticity, and Sin & Ling (2004) show how reduced rank analysis may be modified to accommodate conditional heteroskedasticity. However, this is not a concern when testing the important hypothesis of joint stationarity or joint I(0)-ness, i.e. with $d = 0$, in which case we do not need to worry about the possible presence of cointegration. More generally, by setting $\beta$ equal to a column of the identity matrix, Corollary 2 actually demonstrates that the LM test in this section is also consistent against the alternatives considered in section 3.2 below, in the sense that if the integration order of just one of the variables differs from $d$, the test statistic will be $O_p(n)$.
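Theorem 1 implies a closed-form asymptotic local power function, which makes the gain from adding series easy to see. The sketch below is illustrative and rests on two stated assumptions: the limiting law is noncentral chi-squared with one degree of freedom and noncentrality $\delta^2 \pi^2 K / 6$, and such a variable can be represented as $(Z + \sqrt{\lambda})^2$ with $Z$ standard normal. The function name `lm_local_power` is hypothetical.

```python
from math import pi, sqrt
from statistics import NormalDist


def lm_local_power(delta, K, alpha=0.05):
    """Asymptotic power of the two-sided LM test under theta = delta / sqrt(n).

    Uses P(chi2_1(lam) > c) = P(|Z + sqrt(lam)| > sqrt(c)) with lam = delta^2 * pi^2 * K / 6.
    """
    norm = NormalDist()
    root_lam = abs(delta) * sqrt(pi ** 2 * K / 6.0)   # square root of the noncentrality
    z = norm.inv_cdf(1.0 - alpha / 2.0)               # square root of the chi2_1 critical value
    return 1.0 - norm.cdf(z - root_lam) + norm.cdf(-z - root_lam)


if __name__ == "__main__":
    for K in (1, 2, 6):
        print(K, round(lm_local_power(1.0, K), 3))
```

Since the noncentrality grows linearly in K while the degrees of freedom stay at one, the asymptotic local power at a given $\delta$ increases with the dimension of the system, which is the formal counterpart of the remark that a multivariate test can be more powerful than separate univariate tests.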
3 Extensions of the Model

3.1 Deterministic Terms
We allow for deterministic terms in the data generating process following Robinson (1994). Suppose we observe the $K$-vector time series $\{y_t^0, t = 1, 2, \ldots, n\}$, generated by the linear model

$$y_t^0 = \mu' z_t + y_t, \qquad (9)$$
where $z_t$ is a $q$-vector of purely deterministic components and $y_t$ is an unobserved $K$-dimensional component generated by (1). Two leading cases for the deterministic terms are $z_t = 1$ and $z_t = (1, t)'$, which yield the models $y_{kt}^0 = \mu_{k0} + y_{kt}$ and $y_{kt}^0 = \mu_{k0} + \mu_{k1} t + y_{kt}$, respectively, but other terms like seasonal dummies or polynomial trends can also be accommodated. As in Definition 2 of Robinson (1994), it is only required that $\sum_{t=1}^n \tilde{z}_t \tilde{z}_t'$ is positive definite for $n$ sufficiently large, where $\tilde{z}_t = (1 - L)^d z_t$. It follows from Robinson (1994) that $\mu$ can be estimated by least squares regression of $(1 - L)^d y_t^0$ on $\tilde{z}_t$, yielding the estimate $\hat{\mu}$. The test statistic is then based on the residuals $\hat{y}_t = y_t^0 - \hat{\mu}' z_t$.

Note that we assume the deterministic terms appear in the generating mechanism of the observed vector time series $y_t^0$, instead of $x_t$ as in Breitung & Hassler (2002). This follows the approach of Robinson (1994), and is more natural for the interpretation of $z_t$ when $d$ is non-integral. Consider the simple case with $z_t = 1$ and $0 < d < 1/2$. In our setup, $y_t^0$ is then an asymptotically stationary long memory process around a non-zero mean vector, $\mu_0$. However, in the setup of Breitung & Hassler (2002), $y_t^0$ would be an asymptotically stationary long memory process around the vector of fractional deterministic trends, $(1 - L)^{-d} I(t \ge 1)\mu_0$.
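The least squares step described above can be sketched for the leading case $z_t = 1$ and a single series. This is a minimal Python illustration, assuming the truncated fractional difference implied by the indicator $I(t \ge 1)$ (pre-sample values are set to zero); the function names are hypothetical and chosen for illustration only.

```python
def frac_diff(series, d):
    """Apply (1 - L)^d to a list of observations, with pre-sample values set to zero."""
    n = len(series)
    coef = [1.0]
    for j in range(1, n):
        # Recursion for the coefficients in the binomial expansion of (1 - L)^d.
        coef.append(coef[-1] * (j - 1 - d) / j)
    return [sum(coef[j] * series[t - j] for j in range(t + 1)) for t in range(n)]


def estimate_mean(y0, d):
    """Least squares regression of (1 - L)^d y0_t on (1 - L)^d z_t with z_t = 1."""
    z_tilde = frac_diff([1.0] * len(y0), d)
    y_tilde = frac_diff(y0, d)
    mu_hat = sum(z * y for z, y in zip(z_tilde, y_tilde)) / sum(z * z for z in z_tilde)
    # Residuals are formed from the observed levels, not the differenced data.
    residuals = [y - mu_hat for y in y0]
    return mu_hat, residuals


mu_hat, residuals = estimate_mean([2.5, 2.7, 2.4, 2.6, 2.5], d=0.3)
print(mu_hat)
```

For a vector series the same regression would be run equation by equation, and a trend term would add a second regressor; both extensions are omitted here to keep the sketch short.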
3.2 Different $\theta$ for Each Variable

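Suppose the generating mechanism (1) is modified to

$$(1 - L)^{d_k + \theta_k} y_{kt} = e_{kt} I(t \ge 1), \qquad k = 1, \ldots, K, \quad t = 0, \pm 1, \pm 2, \ldots, \qquad (10)$$

such that $\theta = (\theta_1, \ldots, \theta_K)'$ is now a $K$-vector. Redefining the log-likelihood accordingly and denoting it $L_K(\theta, \Sigma)$ (subscript $K$ denoting different $\theta$ for each variable), the score is now given by

$$\left.\frac{\partial L_K(\theta, \Sigma)}{\partial \theta}\right|_{\theta = 0, \Sigma = \hat{\Sigma}} = -\sum_{t=1}^{n} \operatorname{diag}\big((\ln(1 - L)) x_t\big) \hat{\Sigma}^{-1} x_t = \sum_{t=1}^{n} J_K' \big(x^*_{t-1} \otimes \hat{\Sigma}^{-1} x_t\big) = J_K' \operatorname{vec}\big(\hat{\Sigma}^{-1} S_{10}'\big) \qquad (11)$$

by use of $\operatorname{vec}(ABC) = (C' \otimes A) \operatorname{vec} B$ and property 1 of Lemma 1. We denote by $\operatorname{diag}(a)$ the diagonal matrix having the vector $a$ on the diagonal, and the matrix $J_K$ is defined in Lemma 1. As in the previous section, the score (11) reduces to the univariate score when $K = 1$. The relevant block of the Hessian matrix in (6) is

$$\left.\frac{\partial^2 L_K(\theta, \Sigma)}{\partial \theta\, \partial \theta'}\right|_{\theta = 0, \Sigma = \hat{\Sigma}} = -\sum_{t=1}^{n} \operatorname{diag}\big((\ln(1 - L)) x_t\big) \hat{\Sigma}^{-1} \operatorname{diag}\big((\ln(1 - L)) x_t\big) - \sum_{t=1}^{n} J_K' \big(I_K \otimes \hat{\Sigma}^{-1} x_t\big) \operatorname{diag}\big((\ln(1 - L))^2 x_t\big)$$

$$= -\sum_{t=1}^{n} \Big[\operatorname{diag}(x^*_{t-1}) \hat{\Sigma}^{-1} \operatorname{diag}(x^*_{t-1}) + \operatorname{diag}(\hat{\Sigma}^{-1} x_t) \operatorname{diag}(x^{**}_{t-2})\Big]$$

$$= -\Big[S_{11} \odot \hat{\Sigma}^{-1} + (\hat{\Sigma}^{-1} S_{20}') \odot I_K\Big],$$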
using property 3 of Lemma 1. Here, $\odot$ denotes the Hadamard product, see the appendix or Magnus & Neudecker (1999). We thus form the LM test statistic

$$LM_K = \operatorname{vec}(\hat{\Sigma}^{-1} S_{10}')' J_K \Big[S_{11} \odot \hat{\Sigma}^{-1} + (\hat{\Sigma}^{-1} S_{20}') \odot I_K\Big]^{-1} J_K' \operatorname{vec}(\hat{\Sigma}^{-1} S_{10}'). \qquad (12)$$

The asymptotic distribution of the test statistic under local alternatives, $H_{1n}: \theta = \delta/\sqrt{n}$, where $\delta$ is now a fixed $K$-vector, is given by the following theorem.

Theorem 3 Under $\theta = \delta/\sqrt{n}$, $\delta$ a fixed $K$-vector, the LM test statistic (12) is asymptotically distributed as $\chi^2_K(\delta' \mathcal{I}_K \delta)$, where

$$\mathcal{I}_K = -\lim_{n\to\infty} E_0\left[\frac{1}{n}\frac{\partial^2 L_K(\theta, \Sigma)}{\partial \theta\, \partial \theta'}\right] = \frac{\pi^2}{6}\, \Sigma \odot \Sigma^{-1}.$$

Under the additional assumption of Gaussianity, the test is asymptotically efficient against local alternatives. Under the null hypothesis (2), $LM_K \to_d \chi^2_K$.

From Theorem 3 it is worth noting once more that, in the more general model considered in this section, the degrees of freedom still equal the number of restrictions tested, $K$.
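To make the construction of the statistic in (12) concrete, here is a minimal Python sketch without short-run dynamics. It rests on assumptions that should be checked against the definitions in section 2: the harmonically weighted lags are taken to be $x^*_{t-1} = \sum_{j=1}^{t-1} j^{-1} x_{t-j}$ and $x^{**}_{t-2} = \sum_{j=1}^{t-2} j^{-1} x^*_{t-j-1}$, the moment matrices are $S_{10} = \sum_t x^*_{t-1} x_t'$, $S_{11} = \sum_t x^*_{t-1} x^{*\prime}_{t-1}$ and $S_{20} = \sum_t x^{**}_{t-2} x_t'$, and $\hat{\Sigma} = n^{-1} \sum_t x_t x_t'$. All function names are hypothetical.

```python
import random


def harmonic_lag(x):
    """Return sum_{j=1}^{t-1} x_{t-j} / j for t = 1..n (a zero vector at t = 1)."""
    K = len(x[0])
    return [[sum(x[t - j][k] / j for j in range(1, t + 1)) for k in range(K)]
            for t in range(len(x))]


def cross_sum(u, v):
    """Return sum_t u_t v_t' as a K x K nested list."""
    K = len(u[0])
    return [[sum(a[i] * b[j] for a, b in zip(u, v)) for j in range(K)] for i in range(K)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (intended for small K only)."""
    K = len(A)
    M = [list(row) + [float(i == j) for j in range(K)] for i, row in enumerate(A)]
    for c in range(K):
        p = max(range(c, K), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(K):
            if r != c:
                M[r] = [vr - M[r][c] * vc for vr, vc in zip(M[r], M[c])]
    return [row[K:] for row in M]


def lm_k(x):
    """LM_K for an n x K list of differenced observations x_t = (1 - L)^d y_t."""
    n, K = len(x), len(x[0])
    x1 = harmonic_lag(x)     # x*_{t-1}
    x2 = harmonic_lag(x1)    # x**_{t-2}
    sigma_inv = inverse([[v / n for v in row] for row in cross_sum(x, x)])
    a = matmul(sigma_inv, cross_sum(x, x1))   # Sigma^{-1} S10'
    b = matmul(sigma_inv, cross_sum(x, x2))   # Sigma^{-1} S20'
    s11 = cross_sum(x1, x1)
    score = [a[k][k] for k in range(K)]       # J_K' vec(.) picks out the diagonal
    info = [[s11[i][j] * sigma_inv[i][j] + (b[i][i] if i == j else 0.0)
             for j in range(K)] for i in range(K)]
    info_inv = inverse(info)
    return sum(score[i] * info_inv[i][j] * score[j] for i in range(K) for j in range(K))


random.seed(0)
x = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]
print(lm_k(x))
```

Under the null the printed value would be compared with a $\chi^2_K$ critical value. One property that follows directly from the form of (12) is that the statistic is unchanged when each series is rescaled by its own non-zero constant, which gives a convenient sanity check for an implementation.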
3.3
Short-run Dynamics
In this section we allow for short-run dynamics following Tanaka (1999) and Breitung & Hassler (2002). In particular, suppose $e_t$ is generated according to the vector autoregressive (VAR) process

$$A(L) e_t = \varepsilon_t, \qquad t = 0, \pm 1, \pm 2, \ldots, \qquad (13)$$

where $\varepsilon_t$ satisfies the assumptions of $e_t$ before. Here, $A(z)$ is a matrix polynomial of order $p$ such that $A(1)$ has full rank and $e_t$ is a stationary VAR($p$) process. The parameters of $A(z)$ are gathered in the $K^2 p$-vector $a = \operatorname{vec}(A_1, \ldots, A_p)$, and we also define $\eta = (\theta', a')'$.

In the case with a different $d$ or a different $\theta$ for each equation, an important caveat applies in our multivariate setup, as pointed out by Comte & Renault (1996) and Lobato (1997) in different contexts: the ordering of the autoregressive polynomial and the differencing operator matters. In our multivariate ARFIMA($p$, $d$, 0) time series model in (10) and (13), it is apparent that, under the null, $y_{kt}$ is integrated of order $d_k$ for all $k = 1, \ldots, K$. However, suppose instead that the model (bivariate for simplicity) were given by

$$\begin{pmatrix} (1 - L)^{d_1} & 0 \\ 0 & (1 - L)^{d_2} \end{pmatrix} \begin{pmatrix} a_{11}(L) & a_{12}(L) \\ a_{21}(L) & a_{22}(L) \end{pmatrix} \begin{pmatrix} y_{1t} \\ y_{2t} \end{pmatrix} = \begin{pmatrix} e_{1t} I(t \ge 1) \\ e_{2t} I(t \ge 1) \end{pmatrix} \qquad (14)$$

under the null. That is, the autoregressive polynomial and the differencing operator have been interchanged compared to our model in (10) and (13). Then we can write $y_{1t}$ as

$$\big(a_{11}(L) a_{22}(L) - a_{12}(L) a_{21}(L)\big) (1 - L)^{d_1 + d_2} y_{1t} = a_{22}(L) (1 - L)^{d_2} e_{1t} I(t \ge 1) - a_{12}(L) (1 - L)^{d_1} e_{2t} I(t \ge 1)$$
and thus y1t is I (d2 ) if d1 < d2 and a12 (1) 6= 0, and y1t is I (d1 ) otherwise. Similarly, y2t is I (d1 ) if d2 < d1 and a21 (1) 6= 0, and y2t is I (d2 ) otherwise. Thus, in (14), the integration orders of y1t and y2t are no longer constant throughout the parameter space as they are in our model, where y1t is I (d1 ) and y2t is I (d2 ) for any d1 , d2 . The model (14) is equivalent to our model only in the univariate setup or when dk = d for some d and all k = 1, ..., K, i.e. when the setup is as in section 2. For the model with short-run dynamics (13), we construct the test statistics based on the prewhitened series, i.e. we use the residuals from the regression xt = A1 xt1 + ... + Ap xtp + t , t = 1, ..., n, 0 P P t2 tj1 , and Xt1 = x0 , ..., x0 and dene = t1 j 1tj , = t1 j 1 t1 tp . The t1 j=1 j=1 P P test statistics (7) and (12) are now dened in terms of = n1 n t0 , S10 = n 0 , t=1 t t=2 t1 t Pn 0 Pn 0 Pn Pn 0 t1 S11 = t=2 t1t1 , S20 = t=2 t2t , Sx1 = t=2 Xt10 , Sxx = t=2 Xt1 Xt1 , and the Hessian matrices " # 2 L (, a, ) tr(1 M11 ) vec(Sx1 )0 = , 0 =0,a=,= vec Sx1 Sxx 1 a " # 0 0 0 2 LK (, a, ) S11 1 + (1 S20 ) IK JK (Sx1 IK ) = . 0 Sxx 1 (Sx1 IK )JK =0,a=,= a Applying the partitioned matrix inverse formula, the test statistics are LM = tr(1 S10 )2 , 1 tr(1 (M11 Sx1 Sxx Sx1 )) (15)
1 0 0 0 1 0 LMK = vec(1 S10 )0 JK S11 1 + (1 S20 ) IK (Sx1 Sxx Sx1 ) 1 JK vec(1 S10 ). (16) The results of Theorems 1 and 3 continue to hold in the present case with autocorrelated errors, though the noncentrality parameters are dierent. Theorem 4 Suppose (13) holds and let the LM test statistics be dened by (15) and (16). The results of Theorems 1 and 3 continue to hold with noncentrality parameters dened by 2 K tr 0 1 , 6 2 1 0 1 1 , = 6 I= 119 (17) (18)
IK
0 P 1 0 0 0 where is the covariance matrix of (e0 , ..., e0 t tp+1 ) , = 1 , ..., p , i = j=i j Bji , and Bi is the coecient on z i in the moving average polynomial B (z) from the Wold representation of et . P As a simple example consider the VAR(1), et = Aet1 + t = Aj tj . In this case, I j=0 2 K/6 tr 1 0 and 2 1 1 0 1 , respectively, and IK reduce to 1 1 1 1 6 P 1 j1 and = E (et e0 ) can be recovered from the relation vec = where 1 = IK + j=2 j A t 1 (IK 2 A A) vec .
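The covariance relation at the end of the VAR(1) example can be checked numerically. The following Python sketch assumes that the covariance matrix of $e_t$ (called `gamma` here, a name chosen for illustration) satisfies the unvectorized form of that relation, namely that it equals $A$ times itself times $A'$ plus the innovation covariance matrix. Simple fixed-point iteration is used in place of the explicit inverse of $I_{K^2} - A \otimes A$; this is a substitute technique that converges when the VAR(1) is stationary.

```python
def var1_covariance(A, sigma, tol=1e-12, max_iter=10_000):
    """Solve gamma = A gamma A' + sigma by fixed-point iteration."""
    K = len(A)
    gamma = [row[:] for row in sigma]
    for _ in range(max_iter):
        # ag = A * gamma
        ag = [[sum(A[i][k] * gamma[k][j] for k in range(K)) for j in range(K)]
              for i in range(K)]
        # new = sigma + ag * A'
        new = [[sigma[i][j] + sum(ag[i][k] * A[j][k] for k in range(K))
                for j in range(K)] for i in range(K)]
        change = max(abs(new[i][j] - gamma[i][j]) for i in range(K) for j in range(K))
        if change < tol:
            return new
        gamma = new
    raise RuntimeError("iteration did not converge; is the VAR(1) stationary?")


# Hypothetical values: a diagonal coefficient matrix and unit-variance innovations.
A = [[0.4, 0.0], [0.0, 0.4]]
sigma = [[1.0, 0.6], [0.6, 1.0]]
print(var1_covariance(A, sigma))
```

When the coefficient matrix is a scalar $a$ times the identity, the solution is the innovation covariance matrix divided by $1 - a^2$, which provides a direct check on the iteration.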
In this section we compare the finite sample properties of the LM test in (7) or (15), and Breitung & Hassler's (2002) $\Lambda_0(d)$ test (henceforth the BH test) in (5), with allowance for short-run dynamics when relevant, see Breitung & Hassler (2002). The asymptotic local power of the LM test can easily be derived from the previous results as

$$P\big(LM > \chi^2_{1,1-\alpha}\big) = 1 - F_{1,\delta^2 \mathcal{I}}\big(\chi^2_{1,1-\alpha}\big), \qquad (19)$$

where $\chi^2_{1,1-\alpha}$ is the $100(1 - \alpha)\%$ point of the central $\chi^2$ distribution with one degree of freedom, and $F_{1,\delta^2 \mathcal{I}}$ is the distribution function of the noncentral $\chi^2$ distribution with one degree of freedom and noncentrality parameter $\delta^2 \mathcal{I}$, with $\mathcal{I}$ defined in Theorems 1 and 4. Setting $\delta = \theta\sqrt{n}$ in (19), we can compare the asymptotic local power with the finite sample rejection frequencies for any fixed values of $\theta$ and $n$.

The models we consider for the simulation study are

$$\text{Model A:} \quad \begin{pmatrix} (1 - L)^{d + \theta} & 0 \\ 0 & (1 - L)^{d + \theta} \end{pmatrix} y_t = \varepsilon_t I(t \ge 1),$$

$$\text{Model B:} \quad (I_2 - AL) \begin{pmatrix} (1 - L)^{d + \theta} & 0 \\ 0 & (1 - L)^{d + \theta} \end{pmatrix} y_t = \varepsilon_t I(t \ge 1), \qquad A = \begin{pmatrix} a & 0 \\ 0 & a \end{pmatrix},$$

$$\text{Model C:} \quad y_{1t} = \beta y_{2t} + u_{1t}, \qquad (I_2 - AL) \begin{pmatrix} (1 - L)^{1 - \theta} & 0 \\ 0 & (1 - L) \end{pmatrix} \begin{pmatrix} u_{1t} \\ y_{2t} \end{pmatrix} = \varepsilon_t I(t \ge 1),$$

where the $\varepsilon_t$ are i.i.d. $N(0, \Sigma)$. Unreported simulations have shown that using fat-tailed error distributions, such as the $t_5$ or Cauchy, does not change the results below by much. The contemporaneous covariance matrix $\Sigma$ is normalized such that the diagonal elements equal unity and the correlation coefficient $\rho$ is 0 or 0.6. Models A and B are non-cointegrated and the alternatives are of the form considered in Theorem 1, i.e. with the same $\theta$ for each variable. We use the values $d = 0$ and $d = 1$ in Models A and B, but the results do not vary much with $d$, as seen below. The simulations have also been run using other values of $d$ and the results are almost identical to those obtained in these two cases. The cointegrated alternatives of
Corollary 2 are considered in Model C, where $y_{1t}$ and $y_{2t}$ are fractionally cointegrated if $\theta > 0$ and non-cointegrated under the null hypothesis, $\theta = 0$. To generate data we used $\beta = 1$. All calculations were made in Ox version 3.20 including the Arma package version 1.01, see Doornik (2001) and Doornik & Ooms (2001). To calculate the BH test we adapted the Gauss code available on Jörg Breitung's internet homepage. Throughout, the nominal size (type I error) of the tests is fixed at 5%, and the number of replications at 10,000.

Tables 1-2 about here

In Tables 1 and 2 the finite sample rejection frequencies of the LM and BH tests for the case with i.i.d. errors are presented, i.e. for Model A, with $d = 0$ and $d = 1$, respectively. Under the heading Limit, we give the asymptotic local power calculated from (19) with $\delta = \theta\sqrt{n}$. Size corrected rejection frequencies have also been computed and are reported as LMsc and BHsc. There are no significant differences between the case with $d = 0$ (Table 1) and the one with $d = 1$ (Table 2). The simulated sizes of both tests are close to the nominal 5% level, but the LM test is the more powerful test for Model A, except against $\theta > 0$ with $n = 100$, in which case the BH test appears slightly more powerful. Furthermore, the finite sample power of the LM test is close to the corresponding asymptotic local power.

Unreported simulations show that the BH test is robust to the case where the $\theta$'s in Model A are allowed to be different, i.e. as in the model of section 3.2. However, the $LM_K$ test is specifically designed for that model and is directed against alternatives where the $\theta$'s are different. Hence, the $LM_K$ test is clearly superior to the BH test in that model.

Tables 3-4 about here

Tables 3 and 4 present the simulation results for Model B with $d = 0$ and $d = 1$, respectively, and $a = 0.4$. As in Model A, there are no significant differences between the two cases $d = 0$ and $d = 1$.
For the small sample size, $n = 100$, the BH test is slightly size distorted, with simulated sizes ranging from 0.0694 to 0.0745 for the different choices of $\rho$ and $d$. When $n = 100$, the BH test has slightly higher power against $\theta < 0$ (opposite the case in Tables 1 and 2), but against $\theta > 0$ the LM test has much higher power than the BH test. When considering the larger sample size, $n = 250$, or the size corrected tests, the LM test is clearly the superior test for Model B. It is worth noting that in all cases, i.e. for both $n = 100$ and $n = 250$, for both values of $d$, and for both values of $\rho$, the BH test has lower power against $\theta = 0.3$ than against $\theta = 0.2$.

Tables 5-6 about here

To evaluate the sensitivity to the particular value of the coefficient matrix (i.e. $a = 0.4$) in the autoregressive specification in Model B, Tables 5 ($d = 0$) and 6 ($d = 1$) present the
simulated sizes of the LM and BH tests for different specifications of the coefficient matrix $A$ in Model B. In particular, the values $a = -0.75, -0.5, -0.25, 0, 0.25, 0.5, 0.75$ and sample sizes $n = 100$, $n = 250$, and $n = 500$ are considered. Notice that the column $a = 0$ corresponds to the case where a VAR(1) is estimated for $e_t$ even though it is really an i.i.d. process. For all specifications the size distortions of both tests are small, and as before the results for $d = 0$ and $d = 1$ are almost identical. For samples of $n = 100$ the simulated size of the LM test ranges from 0.0497 to 0.0775 when $a \le 0.50$. However, when $a = 0.75$ and $n = 100$, the simulated size of the LM test is almost 13% for a nominal 5% test. When larger samples of $n = 250$ and $n = 500$ are considered, the size distortions for $a = 0.75$ are smaller. Overall, Tables 5 and 6 show that the size of the LM test is close to the nominal 5% level.

Table 7 about here

Table 7 shows finite sample rejection frequencies of the LM and BH tests for Model C with $d = 1$ and $a = 0.4$, i.e. when $y_t$ is fractionally cointegrated with short-run dynamics. The column $\theta = 0$ corresponds to $I(1)$ non-cointegrated data, the column $\theta = 1$ to standard bivariate $I(1)$-$I(0)$ cointegration, and $0 < \theta < 1$ corresponds to fractional cointegration with $I(1 - \theta)$ cointegration errors. Thus, the degree of cointegration is determined by the magnitude of $\theta$. In this model, both tests exhibit simulated rejection frequencies very close to the nominal 5% level when $\theta = 0$, i.e. when there is no cointegration. When $\rho = 0$, the finite sample rejection frequencies of the two tests are close. When the errors are contemporaneously correlated, $\rho = 0.6$, both tests have increased rejection frequencies, but the rejection frequencies of the BH test are higher than those of the LM test, which is expected since the BH test is specifically directed towards testing against cointegration.
Overall, the Monte Carlo study shows that the LM test has higher finite sample power than the BH test in the non-cointegrated model, although both tests can be slightly size distorted when the errors exhibit positive autocorrelation. Since the results for the non-cointegrated model in Tables 1-6 were practically identical for $d = 0$ and $d = 1$, we expect that these results carry over to any value of $d$, which is also indicated by unreported simulations for several alternative values of $d$. Moreover, as the present test is not considered a test for cointegration but rather a test of stationarity (or more generally of fractional integration of a given order), we do not put much weight on the results for the cointegrated model. Thus, the LM test is superior in the non-cointegrated model for any value of the integration order, $d$.
Empirical Application
In this section we apply our tests to the data examined previously by Kugler & Neusser (1993) and Choi & Ahn (1999). The data are monthly observations on real interest rates for the USA, Japan, the UK, (West) Germany, France, and Switzerland from January 1980 to October 1991, i.e. 142 observations on six time series. A more detailed description is available in Kugler &
Neusser (1993) or Choi & Ahn (1999). Time series plots of the six real interest rate data series are presented in Figure 1. From the time series plots, the time series data do appear serially correlated. Whether they are in fact fractionally integrated is the question to which we turn next. Figure 1 about here The objective of Kugler & Neusser (1993) was to test the Real Interest Parity hypothesis using a co-dependence approach, which requires the vector time series in question to be stationary, or more precisely, to be I(0). In order to establish stationarity of the data, Kugler & Neusser (1993) conducted a series of univariate unit root tests, which rejected the unit root null hypothesis for most of the series. They found some sensitivity to the choice of lag length for the augmented Dickey-Fuller tests, while the Phillips-Perron tests all rejected the null. Choi & Ahn (1999) reversed the null and alternative hypotheses, and tested the null hypothesis of level-stationarity against the alternative of a unit root, which seems to be a more natural testing strategy in the present case. They applied the multivariate stationarity tests developed in their paper and also the univariate counterparts for comparison. It was found that one of the univariate stationarity tests (their LMI test) rejected the null at the 5% level for France, and that two univariate stationarity tests (their SBDHT and SBDHB tests) rejected the null at the 10% level for the USA. However, none of their multivariate tests rejected the null at the 10% level, thus providing more certain evidence than the univariate tests. We apply our LM and LMK tests and the BH test of Breitung & Hassler (2002) to the real interest rate data to test the hypothesis that d = 0, i.e. that the data are I (0), against fractionally integrated alternatives. 
Thus, we test one of the underlying assumptions of the Kugler & Neusser (1993) analysis, where non-rejection of the hypothesis that $d = 0$ implies that their analysis is applicable. We allow for a non-zero mean by setting $z_t = 1$ as in section 3.1, and report the tests without allowing short-run dynamics ($p = 0$) and allowing VAR($p$) dynamics with $p = 1$ and $p = 4$. Note that in the case of $d = 0$, the treatment of deterministic terms is the same for our tests and for the BH test, see section 3.1. We thus demonstrate a wide variety of the tests proposed in the above sections.

Table 8 about here

In panel (a) of Table 8 we report the results from applying the univariate LM and BH tests to each individual time series. When $p = 0$ both tests reject clearly for all the time series. However, when $p > 0$ the LM test rejects at the 1% level in two of the twelve cases (Germany and Switzerland with $p = 1$), and similarly the BH test rejects at the 5% level in one case (Germany with $p = 1$) and at the 1% level in one case (France with $p = 4$).

The results from applying the multivariate LM, $LM_K$, and BH tests are reported in panel (b) of Table 8. Again, the null is soundly rejected when no short-run dynamics is allowed, i.e.
when $p = 0$, and also when $p = 4$ for the BH test. However, when allowing short-run dynamics with either $p = 1$ or $p = 4$, the LM and $LM_K$ tests do not reject the null. Thus, the empirical results provide strong evidence that the data are indeed $I(0)$ with non-zero means, when allowance is made for short-run dynamics, and hence support the unit-root tests in Kugler & Neusser (1993) and the stationarity tests in Choi & Ahn (1999). Indeed, our results for the multivariate tests, as well as those of the multivariate tests of Choi & Ahn (1999), are less ambiguous than the results of Kugler & Neusser (1993) and offer more clear-cut evidence in favor of the null hypothesis.
Conclusion
We have introduced a multivariate LM test for fractional integration, generalizing the univariate tests developed recently by Robinson (1994) and Tanaka (1999), among others. The test is intended primarily for preliminary data analysis. For instance, when testing the null of stationarity or $I(0)$-ness (against fractional alternatives), non-rejection would allow standard methods to be employed for conducting, e.g., causality, structural VAR, or impulse response analyses. More generally, our multivariate test may indicate the transformation of the data that would be required in order to make the data suitable for said analyses.

We have shown that the regression variant of the LM test derived by Breitung & Hassler (2002) is not equivalent to the LM test in the multivariate case, although they are equivalent in the univariate case. Indeed, in the multivariate case, the two tests have different degrees of freedom in their asymptotic chi-squared distributions.

We have established desirable distributional properties and optimality properties of the LM test. In particular, the test statistic is asymptotically noncentral chi-squared distributed under local alternatives, where the degrees of freedom equal the number of restrictions tested. Under Gaussianity the LM test is asymptotically efficient against local alternatives. An extension of the LM test statistic, explicitly allowing different integration orders for each variable, has also been introduced. Finite sample properties have been evaluated by Monte Carlo experiments, which show that the LM test compares favorably with the Breitung & Hassler (2002) test.

Finally, we have presented an interesting empirical application, demonstrating the practical usefulness of our tests.
We apply our tests to a multivariate time series of real interest rates for six major industrialized countries previously examined by Kugler & Neusser (1993) and Choi & Ahn (1999) to test an underlying assumption of their analysis, namely that the vector time series is $I(0)$. Kugler & Neusser (1993) apply univariate unit root tests to each element of the multiple time series, which mostly reject the null of a unit root, and Choi & Ahn (1999) apply their multivariate stationarity test (i.e. a test of $I(0)$ against $I(1)$) and find no evidence against the null hypothesis. Our objective is to test if the real interest rates are jointly $I(0)$ against fractional alternatives, and the evidence we obtain from the multivariate tests is more
clear-cut than the evidence from applying univariate tests to each element of the multiple time series. The results indicate that, when allowing for short-run dynamics, the real interest rates are jointly I (0) with non-zero means.
Appendix: Proofs
Proof of Theorem 1. Breitung & Hassler (2002) show that, under = 0, 1 vec 1/2 S10 d N (0, IK ) , n
(20)
and by slight modication of the arguments of Breitung & Hassler (2002, p. 180), it follows that (21) n1 S11 p , n1 S20 p 0, n1 M11 p , where = lim n1
n n X t=1
The distribution under the null follows immediately using tr (A0 B) = vec (A)0 vec (B) and consistency of . Consider next the case = / n. Then ! ! n n X X 1 1 , (23) e e0 + tr 1 e e0 tr S10 = tr 1 t1 t t1 t1 + Op n n t=2 t=2 following the arguments of Tanaka (1999, p. 579). Applying (20) and (21) to the secondmoment matrices of et , the desired result follows. By uncorrelatedness of xt under the null, I = lim E0
n
2 . E x x0 = t t 6
(22)
is the Fisher information for under Gaussianity. Hence, the noncentrality parameter is maximal, and the test is ecient against local alternatives. Proof of Corollary 2. Since the LM test is invariant to non-singular linear transformations, we equivalently consider xt = Dxt (corresponding to zt in Breitung & Hassler (2002)), where ! ( 0 )1/2 0 D= 0 0 ( 0 )1 0 such that the (K r)-vector x1t is i.i.d. (0, IKr ) and the r-vector x2t is uncorrelated with x1t . 2 P K k , where the k are eigenvalues of n1/2 S10 = The LM test is proportional to k=1 0, or equivalently 1 0 n X X n1/2 X 0 X = 0, (24) 125
2K 1 2 L (, ) = tr 1 = n 6 2
with capital letters denoting matrices of observations, i.e. X = (1 , ..., xn )0 and X = ( , ..., x )0 . x x1 n By Lemma A.1 of Breitung & Hassler (2002), 1 1 X 0X = n n 0 X1 X0
2
say, where A11 = Op (1), A12 = Op (1), A21 = Op (1), and A22 = Op n1/2 . Thus, it follows from eigenvalue inequality (6) of Ltkepohl (1996, section 5.3.1) that (24) has K r eigenvalues that are Op (1) and r eigenvalues that are Op n1/2 . In the following we need a lemma on some properties of the Hadamard product, which is dened for two m n matrices A = (aij ) and B = (bij ) as A B = (aij bij ) , see e.g. Magnus & Neudecker (1999, Chapter 3.6) for more details. The proof of the lemma is easy and is omitted. Lemma 1 Property 1. There exists a K 2 K matrix JK := (vec E11 , ..., vec EKK ), Eii = ei e0 i where ei is the i0 th unit K-vector, such that for any K K matrix A,
0 JK vec A = a,
X1
X2
A11 A12 A21 A22
where $a$ is the $K$-vector holding the diagonal of $A$. If $A_d := I_K \odot A$ is the diagonal matrix obtained from $A$ then $\operatorname{vec} A_d = J_K a$.

Property 2. Connection with the Kronecker product. For all $K \times K$ matrices $A$ and $B$,

$$J_K' (A \otimes B) J_K = A \odot B,$$

where $J_K$ is defined as in property 1.

Property 3. Let $A$ and $B$ be $K \times K$ matrices such that $A$ is diagonal and $B$ is symmetric. Then $ABA = aa' \odot B$, where $a$ is defined as in property 1.

Proof of Theorem 3. It follows from (20), application of $\operatorname{vec}(ABC) = (C' \otimes A) \operatorname{vec} B$, and property 2 of Lemma 1 that

$$\frac{1}{\sqrt{n}} J_K' \operatorname{vec}\big(\Sigma^{-1} S_{10}'\big) \to_d N\left(0, \frac{\pi^2}{6}\, \Sigma \odot \Sigma^{-1}\right).$$
By (21) and consistency of , the distribution under the null follows. Under = / n the expansion corresponding to (23) is
n n X 1 0 X 1 1 1 et + diag e vec S10 = diag et1 diag e , t1 t1 + Op n n t=2 t=2 (25) and the result follows as above. Proof of Theorem 4. Consider rst = 0. For a xed m > p, dene the K 2 m-vector P 0 Cm = ((vec C (1))0 , ..., (vec C (m))0 )0 , where C (j) = n1 n t=j+1 t tj is the jth residual autocovariance. Hosking (1980) showed that 0 JK
and Bi is the coecient on z i in the moving average polynomial B (z) from the Wold representation of et . Thus, m X 1 n j vec C (j) d N (0, m )
j=1 2 with m = , ..., p )1 (1 , ..., p )0 ) and i = j=1 j ((1 Pm 1 j=i j Bji . It now follows by application of Bernsteins Lemma, see e.g. Hall & Heyde (1980, pp. 191-192), that X n1 1 n j vec C (j) d N (0, ) , j=1
where 1 is the inverse Fisher information for the parameters in A (z), 0 0 B1 IK , Km = . . .. . . . . . 0 0 0 Bm1 Bm2 Bmp
0 nCm d N 0, Im Km 1 Km ,
Pm
0(m)
0(m)
0(m)
0(m)
(m)
where = limm m . The limiting distributions of LM and LMK in (15) and (16), when P = 0, now follow by recalling that n1 S10 = n1 j 1 C (j), and using that n1 Sx1 p j=1 1 S and n xx p along with (21). When = / n, the desired results follow by combining the arguments of the previous theorems, and using expansions like (23) and (25).
References
Agiakloglou, C. & Newbold, P. (1994), Lagrange multiplier tests for fractional difference, Journal of Time Series Analysis 15, 253–262.

Amemiya, T. (1985), Advanced Econometrics, Harvard University Press, Cambridge.

Andersen, T. G., Bollerslev, T., Diebold, F. X. & Labys, P. (2003), Modelling and forecasting realized volatility, Econometrica 71, 579–625.

Breitung, J. & Hassler, U. (2002), Inference on the cointegration rank in fractionally integrated processes, Journal of Econometrics 110, 167–185.

Choi, I. & Ahn, B. C. (1999), Testing the null of stationarity for multiple time series, Journal of Econometrics 88, 41–77.

Comte, F. & Renault, E. (1996), Long-memory continuous-time models, Journal of Econometrics 73, 101–149.

Dolado, J. J., Gonzalo, J. & Mayoral, L. (2002), A fractional Dickey-Fuller test for unit roots, Econometrica 70, 1963–2006.

Doornik, J. A. (2001), Ox: An Object-Oriented Matrix Language, 4th edn, Timberlake Consultants Press, London.

Doornik, J. A. & Ooms, M. (2001), A package for estimating, forecasting and simulating arma models: Arma package 1.01 for Ox, Working Paper, Nuffield College, Oxford.

Fountis, N. G. & Dickey, D. A. (1989), Testing for a unit root nonstationarity in multivariate autoregressive time series, Annals of Statistics 17, 419–428.

Gil-Alana, L. A. (2003), A fractional multivariate long memory model for the US and the Canadian real output, Economics Letters 81, 355–359.

Hall, P. & Heyde, C. C. (1980), Martingale Limit Theory and its Application, Academic Press, New York.

Hosking, J. R. M. (1980), The multivariate portmanteau statistic, Journal of the American Statistical Association 75, 602–608.

Johansen, S. (1988), Statistical analysis of cointegration vectors, Journal of Economic Dynamics and Control 12, 231–254.

Kugler, P. & Neusser, K. (1993), International real interest rate equalization: A multivariate time series approach, Journal of Applied Econometrics 8, 163–174.

Lee, T. & Tse, Y. (1996), Cointegration tests with conditional heteroskedasticity, Journal of Econometrics 73, 401–410.

Ling, S. & Li, W. K. (1997), On fractionally integrated autoregressive moving-average time series models with conditional heteroskedasticity, Journal of the American Statistical Association 92, 1184–1194.

Lobato, I. N. (1997), Consistency of the averaged cross-periodogram in long memory series, Journal of Time Series Analysis 18, 137–155.

Lobato, I. N. (1999), A semiparametric two-step estimator in a multivariate long memory model, Journal of Econometrics 90, 129–153.

Lobato, I. N. & Robinson, P. M. (1998), A nonparametric test for I(0), Review of Economic Studies 65, 475–495.

Lütkepohl, H. (1996), Handbook of Matrices, John Wiley and Sons, New York.

Magnus, J. R. & Neudecker, H. (1999), Matrix Differential Calculus with Applications in Statistics and Econometrics, revised edn, John Wiley and Sons, New York.

Marinucci, D. & Robinson, P. M. (1999), Alternative forms of fractional Brownian motion, Journal of Statistical Planning and Inference 80, 111–122.

Nyblom, J. & Harvey, A. C. (2000), Tests of common stochastic trends, Econometric Theory 16, 176–199.

Phillips, P. C. B. & Durlauf, S. N. (1986), Multiple time series regression with integrated processes, Review of Economic Studies 53, 473–495.

Robinson, P. M. (1991), Testing for strong serial correlation and dynamic conditional heteroskedasticity in multiple regressions, Journal of Econometrics 47, 67–84.

Robinson, P. M. (1994), Efficient tests of nonstationary hypotheses, Journal of the American Statistical Association 89, 1420–1437.

Robinson, P. M. (1995), Log-periodogram regression of time series with long range dependence, Annals of Statistics 23, 1048–1072.

Robinson, P. M. (2004), The distance between rival nonstationary fractional processes, Forthcoming in Journal of Econometrics.

Sin, C. Y. & Ling, S. (2004), Estimation and testing for partially nonstationary vector autoregressive models with GARCH, Working Paper, Hong Kong University of Science and Technology.

Tanaka, K. (1999), The nonstationary fractional unit root, Econometric Theory 15, 549–582.
Table 1: Finite sample rejection frequencies for Model A with d = 0

                        ρ = 0                                    ρ = 0.6
   θ     Limit    LM      BH      LMsc    BHsc     Limit    LM      BH      LMsc    BHsc
n = 100
  -0.3   0.9998  0.9966  0.9771  0.9970  0.9789   0.9998  0.9957  0.9777  0.9959  0.9793
  -0.2   0.9523  0.8900  0.7117  0.8969  0.7195   0.9523  0.8864  0.7148  0.8900  0.7265
  -0.1   0.4420  0.3918  0.1965  0.4076  0.2020   0.4420  0.3899  0.1978  0.3948  0.2094
   0     0.0500  0.0468  0.0476  0.0500  0.0500   0.0500  0.0489  0.0458  0.0500  0.0500
   0.1   0.4420  0.1891  0.2645  0.1949  0.2704   0.4420  0.1891  0.2608  0.1915  0.2706
   0.2   0.9523  0.7128  0.7971  0.7187  0.8029   0.9523  0.7139  0.8054  0.7160  0.8144
   0.3   0.9998  0.9667  0.9848  0.9682  0.9852   0.9998  0.9668  0.9858  0.9676  0.9863
n = 250
  -0.3   1.0000  1.0000  1.0000  1.0000  1.0000   1.0000  1.0000  1.0000  1.0000  1.0000
  -0.2   0.9999  0.9999  0.9965  1.0000  0.9968   0.9999  0.9997  0.9954  0.9996  0.9954
  -0.1   0.8180  0.7879  0.5412  0.7921  0.5478   0.8180  0.7836  0.5344  0.7820  0.5409
   0     0.0500  0.0478  0.0483  0.0500  0.0500   0.0500  0.0505  0.0483  0.0500  0.0500
   0.1   0.8180  0.6276  0.6188  0.6319  0.6250   0.8180  0.6284  0.6174  0.6267  0.6227
   0.2   0.9999  0.9954  0.9959  0.9956  0.9962   0.9999  0.9963  0.9956  0.9963  0.9959
   0.3   1.0000  1.0000  1.0000  1.0000  1.0000   1.0000  1.0000  1.0000  1.0000  1.0000

Note: The table reports simulated rejection frequencies under the null (θ = 0) and alternative (θ ≠ 0) based on 10,000 replications of Model A with d = 0.
Table 2: Finite sample rejection frequencies for Model A with d = 1

                        ρ = 0                                    ρ = 0.6
   θ     Limit    LM      BH      LMsc    BHsc     Limit    LM      BH      LMsc    BHsc
n = 100
  -0.3   0.9998  0.9945  0.9767  0.9950  0.9788   0.9998  0.9966  0.9779  0.9967  0.9775
  -0.2   0.9523  0.8914  0.7161  0.8977  0.7328   0.9523  0.8923  0.7064  0.8947  0.7057
  -0.1   0.4420  0.3864  0.1998  0.4000  0.2156   0.4420  0.3899  0.2038  0.3937  0.2032
   0     0.0500  0.0457  0.0444  0.0500  0.0500   0.0500  0.0489  0.0501  0.0500  0.0500
   0.1   0.4420  0.1855  0.2616  0.1906  0.2726   0.4420  0.1879  0.2609  0.1894  0.2606
   0.2   0.9523  0.7159  0.8056  0.7234  0.8152   0.9523  0.7171  0.8029  0.7191  0.8022
   0.3   0.9998  0.9667  0.9872  0.9677  0.9881   0.9998  0.9666  0.9884  0.9670  0.9884
n = 250
  -0.3   1.0000  1.0000  1.0000  1.0000  1.0000   1.0000  1.0000  1.0000  1.0000  1.0000
  -0.2   0.9999  0.9999  0.9951  0.9999  0.9952   0.9999  0.9998  0.9965  0.9998  0.9965
  -0.1   0.8180  0.7882  0.5324  0.7867  0.5377   0.8180  0.7876  0.5400  0.7832  0.5458
   0     0.0500  0.0504  0.0482  0.0500  0.0500   0.0500  0.0519  0.0477  0.0500  0.0500
   0.1   0.8180  0.6241  0.6166  0.6234  0.6203   0.8180  0.6380  0.6278  0.6339  0.6326
   0.2   0.9999  0.9964  0.9973  0.9964  0.9974   0.9999  0.9973  0.9966  0.9972  0.9968
   0.3   1.0000  1.0000  1.0000  1.0000  1.0000   1.0000  1.0000  1.0000  1.0000  1.0000

Note: The table reports simulated rejection frequencies under the null (θ = 0) and alternative (θ ≠ 0) based on 10,000 replications of Model A with d = 1.
Table 3: Finite sample rejection frequencies for Model B with d = 0 and a = 0.4
                       ρ = 0                                     ρ = 0.6
    θ    Limit   LM      BH      LMsc    BHsc      Limit   LM      BH      LMsc    BHsc
n = 100
 -0.3    0.6044  0.2201  0.3222  0.1847  0.2577    0.6044  0.2195  0.3175  0.1863  0.2591
 -0.2    0.3171  0.0830  0.1510  0.0676  0.1124    0.3171  0.0789  0.1630  0.0631  0.1227
 -0.1    0.1150  0.0394  0.0887  0.0308  0.0588    0.1150  0.0459  0.0859  0.0361  0.0609
  0      0.0500  0.0603  0.0732  0.0500  0.0500    0.0500  0.0597  0.0694  0.0500  0.0500
  0.1    0.1150  0.1517  0.0997  0.1400  0.0719    0.1150  0.1457  0.0963  0.1315  0.0725
  0.2    0.3171  0.3047  0.1196  0.2874  0.0868    0.3171  0.3036  0.1169  0.2838  0.0908
  0.3    0.6044  0.3964  0.1151  0.3795  0.0832    0.6044  0.4019  0.1165  0.3832  0.0899
n = 250
 -0.3    0.9404  0.8180  0.7826  0.8390  0.7542    0.9404  0.8222  0.7888  0.8490  0.7607
 -0.2    0.6500  0.4034  0.3918  0.4378  0.3539    0.6500  0.4079  0.3923  0.4519  0.3617
 -0.1    0.2164  0.1025  0.1253  0.1190  0.1068    0.2164  0.1061  0.1171  0.1317  0.1023
  0      0.0500  0.0428  0.0598  0.0500  0.0500    0.0500  0.0410  0.0597  0.0500  0.0500
  0.1    0.2164  0.1059  0.1224  0.1141  0.1063    0.2164  0.1087  0.1188  0.1194  0.1032
  0.2    0.6500  0.2983  0.2357  0.3087  0.2120    0.6500  0.2961  0.2349  0.3097  0.2144
  0.3    0.9404  0.5081  0.2179  0.5176  0.1954    0.9404  0.5114  0.2176  0.5262  0.1962
Note: The table reports simulated rejection frequencies under the null (θ = 0) and alternative (θ ≠ 0) based on 10,000 replications of Model B with d = 0 and a = 0.4.
Table 4: Finite sample rejection frequencies for Model B with d = 1 and a = 0.4
                       ρ = 0                                     ρ = 0.6
    θ    Limit   LM      BH      LMsc    BHsc      Limit   LM      BH      LMsc    BHsc
n = 100
 -0.3    0.6044  0.2308  0.3211  0.1923  0.2513    0.6044  0.2157  0.3137  0.1766  0.2405
 -0.2    0.3171  0.0892  0.1639  0.0702  0.1220    0.3171  0.0897  0.1638  0.0677  0.1210
 -0.1    0.1150  0.0441  0.0885  0.0331  0.0614    0.1150  0.0376  0.0860  0.0292  0.0579
  0      0.0500  0.0591  0.0745  0.0500  0.0500    0.0500  0.0598  0.0740  0.0500  0.0500
  0.1    0.1150  0.1450  0.0973  0.1304  0.0689    0.1150  0.1577  0.0982  0.1402  0.0711
  0.2    0.3171  0.3130  0.1232  0.2924  0.0882    0.3171  0.3034  0.1155  0.2842  0.0839
  0.3    0.6044  0.3959  0.1143  0.3728  0.0818    0.6044  0.3990  0.1089  0.3765  0.0797
n = 250
 -0.3    0.9404  0.8209  0.7818  0.8520  0.7560    0.9404  0.8130  0.7797  0.8330  0.7611
 -0.2    0.6500  0.4126  0.3943  0.4651  0.3655    0.6500  0.4032  0.3923  0.4346  0.3652
 -0.1    0.2164  0.1087  0.1288  0.1366  0.1146    0.2164  0.1069  0.1241  0.1230  0.1099
  0      0.0500  0.0394  0.0570  0.0500  0.0500    0.0500  0.0425  0.0576  0.0500  0.0500
  0.1    0.2164  0.1081  0.1161  0.1217  0.1051    0.2164  0.1083  0.1217  0.1172  0.1104
  0.2    0.6500  0.3029  0.2367  0.3213  0.2185    0.6500  0.3002  0.2290  0.3093  0.2120
  0.3    0.9404  0.5084  0.2118  0.5287  0.1949    0.9404  0.5009  0.2226  0.5132  0.2032
Note: The table reports simulated rejection frequencies under the null (θ = 0) and alternative (θ ≠ 0) based on 10,000 replications of Model B with d = 1 and a = 0.4.
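The LMsc and BHsc columns equal 0.0500 exactly in the null row, which is what one would see if the critical value were taken from the simulated null distribution rather than from the asymptotic one. The sketch below illustrates that kind of size correction under this assumed reading of the "sc" suffix. The statistics are synthetic placeholders (squared normal draws), not draws from Model B, and the function names are hypothetical.

```python
import random

def empirical_critical_value(null_stats, level=0.05):
    """Cut-off exceeded by a fraction `level` of the statistics simulated under the null."""
    ordered = sorted(null_stats)
    n_reject = round(level * len(ordered))      # number of null draws allowed to reject
    return ordered[len(ordered) - n_reject - 1]

def rejection_frequency(stats, critical_value):
    """Share of simulated statistics strictly above the critical value."""
    return sum(s > critical_value for s in stats) / len(stats)

rng = random.Random(0)
R = 10_000                                      # replications, as in the table notes
# Placeholder statistics: chi-squared(1) under the null, shifted under the alternative.
null_stats = [rng.gauss(0.0, 1.0) ** 2 for _ in range(R)]
alt_stats = [rng.gauss(1.5, 1.0) ** 2 for _ in range(R)]

cv = empirical_critical_value(null_stats)
size_sc = rejection_frequency(null_stats, cv)   # equals the nominal level by construction
power_sc = rejection_frequency(alt_stats, cv)   # size-corrected power
print(size_sc, power_sc)
```

Because the cut-off is an order statistic of the null draws, the corrected size matches the nominal level by construction, so differences between the corrected columns reflect power rather than size distortion.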
Table 5: Simulated size of nominal 5% test for Model B with d = 0

0.5 0.0540 0.0512 0.0503 0.0565 0.0531 0.0560 0.0495 0.0524 0.0437 0.0547 0.0488 0.0625 0.1004 0.0540 0.0470 0.0510 0.0503 0.0502 0.0528 0.0550 0.0497 0.0568 0.0508 0.0669 0.0749 0.0710 0.1291 0.0588 0.0498 0.0518 0.0523 0.0524 0.25 =0 0 0.25 0.5 0.75 0.75 0.5 0.25 0.0517 0.0568 0.0516 0.0506 = 0.6 0 0.0502 0.0595 0.0501 0.0505 0.25 0.0516 0.0639 0.0472 0.0566 0.5 0.0774 0.0708 0.0471 0.0587 0.75 0.1276 0.0591 0.1082 0.0506
0.75
0.0620 0.0591
Test\a n = 100 LM BH n = 250 LM BH n = 500 LM BH
0.0556 0.0487
0.0508 0.0496 0.0517 0.0505 0.0461 0.0426 0.0862 0.0520 0.0521 0.0532 0.0514 0.0441 0.0424 0.0895 0.0512 0.0544 0.0511 0.0554 0.0490 0.0554 0.0497 0.0500 0.0508 0.0496 0.0517 0.0480 0.0597 0.0487
Note: The table reports simulated rejection frequencies under the null (θ = 0) based on 10,000 replications of Model B with d = 0.
Table 6: Simulated size of nominal 5% test for Model B with d = 1

0.5 0.0614 0.0531 0.0541 0.0533 0.0519 0.0483 0.0516 0.0528 0.0462 0.0556 0.0505 0.0561 0.1083 0.0506 0.0537 0.0503 0.0535 0.0529 0.0532 0.0554 0.0560 0.0595 0.0527 0.0656 0.0762 0.0667 0.1334 0.0649 0.0587 0.0561 0.0592 0.0564 0.25 =0 0 0.25 0.5 0.75 0.75 0.5 0.25 0.0518 0.0578 0.0544 0.0493 = 0.6 0 0.0521 0.0604 0.0496 0.0556 0.25 0.0513 0.0669 0.0474 0.0600 0.5 0.0690 0.0713 0.0457 0.0611 0.75 0.1271 0.0618 0.1064 0.0520
0.75
0.0569 0.0544
Test\a n = 100 LM BH n = 250 LM BH n = 500 LM BH
0.0516 0.0509
0.0510 0.0517 0.0528 0.0511 0.0432 0.0403 0.0870 0.0531 0.0494 0.0556 0.0491 0.0444 0.0407 0.0879 0.0543 0.0498 0.0481 0.0554 0.0554 0.0564 0.0464 0.0520 0.0535 0.0493 0.0535 0.0545 0.0589 0.0515
Note: The table reports simulated rejection frequencies under the null (θ = 0) based on 10,000 replications of Model B with d = 1.
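When reading simulated sizes such as these, it helps to know how much of the deviation from 5% is simulation noise. A minimal sketch, using the standard binomial approximation and the 10,000 replications stated in the note; the function names are hypothetical and the two values checked are entries that appear in the table above.

```python
import math

def mc_standard_error(p, replications):
    """Binomial standard error of a simulated rejection frequency."""
    return math.sqrt(p * (1.0 - p) / replications)

def within_sampling_band(observed, nominal=0.05, replications=10_000, z=1.96):
    """True if `observed` lies within z standard errors of the nominal size."""
    half_width = z * mc_standard_error(nominal, replications)
    return abs(observed - nominal) <= half_width

se = mc_standard_error(0.05, 10_000)
print(round(se, 4))                                   # about 0.0022
print(within_sampling_band(0.0531), within_sampling_band(0.0870))
```

With this approximation, simulated sizes between roughly 0.046 and 0.054 are compatible with a true size of 5%, while entries well outside that range indicate genuine size distortion.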
Table 7: Finite sample rejection frequencies for Model C with a = 0.4

                               ρ = 0                                             ρ = 0.6
Test\     0       0.2     0.4     0.6     0.8     1.0       0       0.2     0.4     0.6     0.8     1.0
n = 100
  LM      0.0622  0.0704  0.1921  0.4606  0.7339  0.9000    0.0609  0.0893  0.2718  0.5731  0.8376  0.9535
  BH      0.0691  0.1073  0.2515  0.5294  0.7942  0.9414    0.0711  0.1306  0.3625  0.6574  0.8732  0.9619
n = 250
  LM      0.0429  0.1554  0.5948  0.9256  0.9939  0.9998    0.0387  0.1862  0.6743  0.9589  0.9979  1.0000
  BH      0.0583  0.1901  0.6791  0.9657  0.9992  1.0000    0.0595  0.2845  0.8556  0.9947  0.9998  1.0000
n = 500
  LM      0.0411  0.3409  0.9124  0.9976  1.0000  1.0000    0.0403  0.3873  0.9388  0.9995  1.0000  1.0000
  BH      0.0578  0.3942  0.9697  1.0000  1.0000  1.0000    0.0534  0.5824  0.9979  1.0000  1.0000  1.0000
Note: The table reports simulated rejection frequencies based on 10,000 replications of Model C with = 1 and a = 0.4.
Table 8: Empirical results for the real interest rate data

(a) Univariate tests of d = 0 with non-zero mean
                    p = 0              p = 1              p = 4
               LM(1)    BH(1)     LM(1)    BH(1)     LM(1)    BH(1)
USA            36.48    25.34      1.81     0.19      1.69     1.30
Japan          37.56    26.97      0.02     0.39      0.45     0.36
UK             51.98    31.23      0.85     0.64      0.10     0.32
Germany        26.09    18.07      8.99     4.06      0.43     2.74
France         42.63    31.12      0.92     1.16      1.04     8.93
Switzerland    44.53    28.05     20.17     2.25      0.14     2.05

(b) Multivariate tests of d = 0 with non-zero mean
             p = 0                        p = 1                        p = 4
   LM(1)    BH(36)   LM_K(6)     LM(1)    BH(36)   LM_K(6)     LM(1)    BH(36)   LM_K(6)
  136.47    166.44    145.09      0.18     41.11      3.76      2.23     76.76      6.44

Note: One asterisk denotes significance at the 5% level and two asterisks denote significance at the 1% level. All test statistics are asymptotically χ²-distributed, with the appropriate degrees of freedom reported in parentheses.
Figure 1: Time series plots of real interest rates
[Figure: six panels, one each for USA, Japan, UK, Germany, France, and Switzerland, plotting the real interest rate against time over 1980 to 1990.]
