
In time series data we observe the values of one or more variables

over a period of time (e.g., GDP for several quarters or years).

In cross-section data, values of one or more variables are collected for

several sample units, or entities, at the same point in time.

In panel data the same cross-sectional unit (say a family or a firm or a

state) is surveyed over time. In short, panel data have space as well

as time dimensions.

There are other names for panel data, such as pooled data (pooling of time

series and cross-sectional observations), combination of time series and

cross-section data, micropanel data, longitudinal data (a study over

time of a variable or group of subjects), event history analysis (e.g.,

studying the movement over time of subjects through successive states or

conditions), cohort analysis.

Panel data offer the following advantages:

1. Since panel data relate to individuals, firms, states, countries, etc.,

over time, there is bound to be heterogeneity in these units. The

techniques of panel data estimation can take such heterogeneity

explicitly into account by allowing for subject-specific variables, where

a subject may be an individual, a firm, a state, or a country.

2. By combining time series of cross-section observations, panel data

give more informative data, more variability, less collinearity among

variables, more degrees of freedom and more efficiency.

3. By studying the repeated cross section of observations, panel data are

better suited to study the dynamics of change. Spells of

unemployment, job turnover, and labor mobility are better studied with

panel data.

4. Panel data can better detect and measure effects that simply cannot

be observed in pure cross-section or pure time series data. For

example, the effects of minimum wage laws on employment and

earnings can be better studied if we include successive waves of

minimum wage increases in the federal and/or state minimum wages.

5. Panel data enable us to study more complicated behavioral models.

For example, phenomena such as economies of scale and technological

change can be better handled by panel data than by pure cross-section

or pure time series data.

6. By making data available for several thousand units, panel data can

minimize the bias that might result if we aggregate individuals or firms

into broad aggregates.

If each cross-sectional unit has the same number of time series observations,

then such a panel (data) is called a balanced panel.

If the number of observations differs among panel members, we call such a

panel an unbalanced panel.

In a short panel the number of cross-sectional subjects, N, is greater than the

number of time periods, T.

In a long panel, it is T that is greater than N.

1. Pooled OLS model. We simply pool all the observations and

estimate a grand regression, neglecting the cross-section and time

series nature of our data.

The simplest way is to pool all the observations together and run the OLS regression model.

However, the problem with this approach is that pooled OLS ignores the heterogeneity or

individuality that exists among the different variables.

2. The fixed effects least squares dummy variable (LSDV) model.

Here we pool all the observations, but allow each cross-section

unit (i.e., each variable in our example) to have its own intercept dummy

variable.

3. The fixed effects within-group model. Here also we pool all

the observations, but for each cross-section unit we express each variable as a

deviation from its mean value and then estimate an OLS regression on such

mean-corrected, or de-meaned, values.

4. The random effects model (REM). Unlike the LSDV model, in which

we allow each variable to have its own (fixed) intercept value, we assume

that the intercept values are a random drawing from a much bigger

population of variables.
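To make the first three approaches concrete, here is a minimal sketch in Python (pandas and statsmodels), assuming a hypothetical DataFrame `df` with columns `id` (the cross-section unit), `y` (the dependent variable), and `x` (a regressor); the random effects model is usually estimated with a dedicated panel library rather than plain OLS.

```python
import pandas as pd
import statsmodels.api as sm

# df: one row per (unit, period) with columns id, y, x  (hypothetical names)

# 1. Pooled OLS: one grand regression that ignores the panel structure
pooled = sm.OLS(df["y"], sm.add_constant(df[["x"]])).fit()

# 2. Fixed effects via LSDV: an intercept dummy for each cross-section unit
dummies = pd.get_dummies(df["id"], prefix="id", drop_first=True).astype(float)
X_lsdv = sm.add_constant(pd.concat([df[["x"]], dummies], axis=1))
lsdv = sm.OLS(df["y"], X_lsdv).fit()

# 3. Within-group estimator: de-mean y and x by unit, then run OLS (no constant)
demeaned = df[["y", "x"]] - df.groupby("id")[["y", "x"]].transform("mean")
within = sm.OLS(demeaned["y"], demeaned[["x"]]).fit()

# The slope on x from (2) and (3) should coincide; the pooled slope will differ
# whenever the unit-specific effects are correlated with x.
print(pooled.params["x"], lsdv.params["x"], within.params["x"])
```

The LSDV and within-group regressions give numerically identical slope estimates; pooled OLS differs because it ignores the unit-specific heterogeneity described above.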

Stationarity

A stochastic process is said to be stationary if its mean and variance are

constant over time and the value of the covariance between two time

periods depends only on the distance or gap or lag between the two time

periods and not on the actual time at which the covariance is computed.

Such a stochastic process is known as a weakly stationary, or covariance

stationary, or second-order stationary, or wide-sense, stochastic

process.
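In symbols (a standard statement of the conditions, with $Y_t$ denoting the series), weak stationarity requires:

$$E(Y_t) = \mu \quad \text{(constant mean)}$$
$$\operatorname{Var}(Y_t) = E(Y_t - \mu)^2 = \sigma^2 \quad \text{(constant variance)}$$
$$\gamma_k = E\big[(Y_t - \mu)(Y_{t+k} - \mu)\big] \quad \text{(autocovariance depends only on the lag } k\text{)}$$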

In short, if a time series is stationary, its mean, variance, and autocovariance

(at various lags) remain the same no matter at what point we measure them;

that is, they are time invariant. Such a time series will tend to return to its

mean (called mean reversion) and fluctuations around this mean

(measured by its variance) will have a broadly constant amplitude.

If a time series is not stationary in the sense just defined, it is called a

nonstationary time series (keep in mind we are talking only about weak

stationarity). In other words, a nonstationary time series will have a time

varying mean or a time-varying variance or both.

A purely random, or white noise, process: we call a stochastic process

purely random if it has zero mean, constant variance σ², and is serially

uncorrelated.

RANDOM WALK MODEL -- Besides stationary time series, one often encounters

nonstationary time series, the classic example being the random walk

model (RWM).

It is often said that asset prices, such as stock prices or exchange rates,

follow a random walk; that is, they are nonstationary.

two types of random walks: (1) random walk without drift (i.e., no constant or

intercept term) and

(2) random walk with drift (i.e., a constant term is present).
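In equation form (standard formulations, with $u_t$ a white noise error term):

$$\text{Random walk without drift:} \quad Y_t = Y_{t-1} + u_t$$
$$\text{Random walk with drift:} \quad Y_t = \delta + Y_{t-1} + u_t$$

where $\delta$ is the drift parameter. In both cases the series has a unit root, so shocks have permanent effects and the series is nonstationary.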

UNIT ROOT TEST -- tests whether a time series variable is non-stationary using

an autoregressive model. A well-known test that is valid in large samples is

the augmented Dickey-Fuller (ADF) test. The optimal finite sample tests for a unit root in

autoregressive models were developed by Denis Sargan and

Alok Bhargava. Another test is the Phillips-Perron test. These tests use the existence

of a unit root as the null hypothesis.
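For illustration, a minimal sketch of running the ADF test with statsmodels (the series `y` is a hypothetical pandas Series or NumPy array of the variable being tested):

```python
from statsmodels.tsa.stattools import adfuller

# Null hypothesis: the series has a unit root (i.e., it is nonstationary)
adf_stat, p_value, used_lags, n_obs, crit_values, _ = adfuller(y, autolag="AIC")

print(f"ADF statistic: {adf_stat:.3f}, p-value: {p_value:.3f}")
print("Critical values:", crit_values)
# A p-value below the chosen significance level rejects the unit-root null,
# i.e., the series is treated as (weakly) stationary.
```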

1. Regression analysis based on time series data implicitly assumes that the underlying

time series are stationary. The classical t tests, F tests, etc. are based on this

assumption.

2. In practice most economic time series are nonstationary.

3. A stochastic process is said to be weakly stationary if its mean, variance, and

autocovariances are constant over time (i.e., they are time invariant).

4. At the informal level, weak stationarity can be tested by the correlogram of a time

series, which is a graph of autocorrelation at various lags. For stationary time series,

the correlogram tapers off quickly, whereas for nonstationary time series it dies off

gradually. For a purely random series, the autocorrelations at all lags 1 and greater

are zero.

5. At the formal level, stationarity can be checked by finding out if the time series

contains a unit root. The Dickey-Fuller (DF) and augmented Dickey-Fuller (ADF)

tests can be used for this purpose.

6. An economic time series can be trend stationary (TS) or difference stationary

(DS). A TS time series has a deterministic trend, whereas a DS time series has a

variable, or stochastic, trend. The common practice of including the time or trend

variable in a regression model to detrend the data is justifiable only for TS time

series. The DF and ADF tests can be applied to determine whether a time series is TS

or DS.

7. Regression of one time series variable on one or more time series variables often can

give nonsensical or spurious results. This phenomenon is known as spurious

regression. One way to guard against it is to find out if the time series are

cointegrated.

8. Cointegration means that despite being individually nonstationary, a linear combination

of two or more time series can be stationary. The EG, AEG, and CRDW tests can be used to

find out if two or more time series are cointegrated.

9. Cointegration of two (or more) time series suggests that there is a long-run, or

equilibrium, relationship between them.

10. The error correction mechanism (ECM) developed by Engle and Granger is a means

of reconciling the short-run behavior of an economic variable with its long-run behavior.

11. The field of time series econometrics is evolving. The established results and tests are in

some cases tentative and a lot more work remains.

An important question that needs an answer is why some economic time series are

stationary and some are nonstationary.
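To illustrate points 5 and 8 above, here is a minimal sketch of an Engle-Granger style check with statsmodels (`y1` and `y2` are hypothetical series, say two individually nonstationary asset prices):

```python
from statsmodels.tsa.stattools import adfuller, coint

# Each series should individually fail to reject the unit-root null (point 5)
print("ADF p-values:", adfuller(y1)[1], adfuller(y2)[1])

# Engle-Granger test; the null hypothesis is "no cointegration" (point 8)
t_stat, p_value, crit_values = coint(y1, y2)
print(f"EG t-statistic: {t_stat:.3f}, p-value: {p_value:.3f}")
# A small p-value suggests y1 and y2 share a long-run (equilibrium) relationship.
```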

Forecasting

In econometrics, forecasting is the estimation of the expected value

of a dependent variable for observations that are not part of the

same data set

In most forecasts, the values being predicted are for time periods in

the future, but cross-sectional predictions of values for countries

or people not in the sample are also common

The terms forecasting and prediction are used interchangeably in this chapter.

Some authors limit the use of the word forecast to out-of-sample prediction for a time series.

Econometric forecasting generally uses a single linear equation to

predict or forecast

Our use of such an equation to make a forecast can be summarized

into two steps:

1. Specify and estimate an equation that has as its dependent

variable the item that we wish to forecast.

2. Obtain values of the independent variables for the

observations for which we want a forecast and substitute them into our

forecasting equation.
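A minimal sketch of these two steps with statsmodels (the names `y`, `X`, and `X_new` are hypothetical; `X_new` holds the values of the independent variables for the observations we want to forecast):

```python
import statsmodels.api as sm

# Step 1: specify and estimate the forecasting equation on the sample data
model = sm.OLS(y, sm.add_constant(X)).fit()

# Step 2: substitute the out-of-sample values of the independent variables
X_future = sm.add_constant(X_new, has_constant="add")
pred = model.get_prediction(X_future)
print(pred.predicted_mean)                   # point forecasts
print(pred.conf_int(obs=True, alpha=0.05))   # 95% forecast intervals (see below)
```

If the values in `X_new` are themselves projections rather than known quantities, the result is a conditional forecast in the sense discussed later.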

Matters are rarely this simple in practice, however, and most actual forecasting

involves one or more additional questions, for example:

1. Unknown Xs: It is unrealistic to expect to know the values of the

independent variables outside the sample.

What happens when we don't know the values of the

independent variables for the forecast period?

2. Serial Correlation: If there is serial correlation involved,

the forecasting equation may be estimated with GLS

How should predictions be adjusted when forecasting

equations are estimated with GLS?

3. Confidence Intervals: All the previous forecasts were single

values, but such single values are almost never exactly right, so it

might be more helpful if we forecasted a confidence interval instead.

How can we develop these confidence intervals?

4. Simultaneous Equations Models: many economic and business

equations are part of simultaneous models

How can we use an independent variable to forecast a

dependent variable when we know that a change in value

of the dependent variable will change, in turn, the value of

the independent variable that we used to make the

forecast?

Unconditional forecast: the values of all the independent variables in the

forecast period are known with certainty.

This is rare in practice.

Conditional forecast: actual values of one or more of the

independent variables are not known.

This is the more common type of forecast.

The careful selection of independent variables can sometimes help

avoid the need for conditional forecasting

This opportunity can arise when the dependent variable can be

expressed as a function of leading indicators:

A leading indicator is an independent variable the

movements of which anticipate movements in the dependent

variable

The best known leading indicator, the Index of Leading

Economic Indicators, is produced each month

The techniques we use to test hypotheses can also be adapted to

create forecasting confidence intervals

Given a point forecast, all we need to generate a confidence

interval around that forecast are t_c, the critical t-value (for the

desired level of confidence), and S_F, the estimated standard error of

the forecast:

$$\text{confidence interval} = \hat{Y}_{T+1} \pm S_F \cdot t_c$$

The critical t-value, t_c, can be found in the statistical table (for a two-tailed

test with T - K - 1 degrees of freedom).

Lastly, the standard error of the forecast, SF, for an equation with

just one independent variable, equals the square root of the forecast

error variance:

$$S_F = \sqrt{\, s^2 \left[ 1 + \frac{1}{T} + \frac{(X_{T+1} - \bar{X})^2}{\sum_{t=1}^{T} (X_t - \bar{X})^2} \right] }$$

where:

$s^2$ = the estimated variance of the error term

$T$ = the number of observations in the sample

$X_{T+1}$ = the forecasted value of the single independent variable

$\bar{X}$ = the arithmetic mean of the observed Xs in the sample

ARIMA

ARIMA models use current and past values of the dependent variable to produce often

accurate short-term forecasts of that variable.

Examples of such forecasts are stock market price predictions

created by brokerage analysts (called chartists or

technicians) based entirely on past patterns of movement of

the stock prices

If ARIMA models thus essentially ignore economic theory (by

ignoring traditional explanatory variables), why use them?

The use of ARIMA is appropriate when:

1. little or nothing is known about the dependent variable being forecasted,

2. the independent variables known to be important cannot be forecasted effectively, or

3. all that is needed is a one- or two-period forecast.

An ARIMA model combines two different specifications (called processes) into one equation:

An autoregressive process (AR):

expresses a dependent variable as a function of past values of the

dependent variable

This is similar to the serial correlation error term function and

the dynamic model

A moving average process (MA):

expresses a dependent variable as a function of past values of

the error term

Such a function is a moving average of past error term observations

that can be added to the mean of Y to obtain a moving average of

past values of Y
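In symbols (standard textbook forms; the coefficients $\phi_i$, $\theta_j$, the mean $\mu$, and the white noise error $\epsilon_t$ are generic), an AR(p) process and an MA(q) process can be written as:

$$\text{AR}(p): \quad Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p} + \epsilon_t$$
$$\text{MA}(q): \quad Y_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1} + \dots + \theta_q \epsilon_{t-q}$$

An ARIMA(p, d, q) model combines the two, applied to a series that has been differenced d times.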

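As a usage sketch, fitting an ARIMA model and producing a short-term forecast with statsmodels (the order (1, 1, 1) and the series `y` are hypothetical choices for illustration):

```python
from statsmodels.tsa.arima.model import ARIMA

# ARIMA(1, 1, 1): one AR lag, first differencing, one MA lag (an illustrative choice)
result = ARIMA(y, order=(1, 1, 1)).fit()
print(result.summary())

# ARIMA models are typically used for short horizons: forecast the next two periods
print(result.forecast(steps=2))
```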

An ARCH (AUTOREGRESSIVE CONDITIONALLY HETEROSCEDASTIC) model is a

model for the variance of a time series. ARCH models are used to describe a

changing, possibly volatile variance. Although an ARCH model could possibly

be used to describe a gradually increasing variance over time, most often it

is used in situations in which there may be short periods of increased

variation. (Gradually increasing variance connected to a gradually increasing

mean level might be better handled by transforming the variable.)

ARCH models were created in the context of econometric and finance

problems having to do with the amount that investments or stocks increase

(or decrease) per period, so they are often described as models for that type of variable.

An ARCH model could be used for any series that has periods of increased or

decreased variance.

A GARCH (GENERALIZED AUTOREGRESSIVE CONDITIONALLY

HETEROSCEDASTIC) model uses values of the past squared observations

and past variances to model the variance at time t. As an example, a

GARCH(1,1) is

$$\sigma^2_t = \alpha_0 + \alpha_1 y^2_{t-1} + \beta_1 \sigma^2_{t-1}$$

In the GARCH notation, the first subscript refers to the order of the y² terms

on the right side, and the second subscript refers to the order of the

σ² terms.
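As a usage sketch, assuming the third-party arch package is available and `returns` is a hypothetical series of (percentage) returns:

```python
from arch import arch_model

# GARCH(1,1): today's variance responds to yesterday's squared shock and yesterday's variance
model = arch_model(returns, mean="Constant", vol="GARCH", p=1, q=1)
result = model.fit(disp="off")

print(result.summary())
print(result.conditional_volatility)  # the fitted sigma_t series
```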

A VAR (VECTOR AUTOREGRESSION) model is used for multivariate

time series. The structure is that each variable is a linear function of past

lags of itself and past lags of the other variables.
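A minimal sketch with statsmodels (the DataFrame `data`, holding two or more jointly modeled time series as columns, is hypothetical):

```python
from statsmodels.tsa.api import VAR

# Each variable is regressed on past lags of itself and of the other variables
result = VAR(data).fit(maxlags=2)
print(result.summary())

# Forecast three periods ahead from the most recent observed lags
lag_order = result.k_ar
print(result.forecast(data.values[-lag_order:], steps=3))
```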
