
ECONOMETRICS

Econometrics is the application of mathematics, statistical methods and, more recently, computer science to economic data, and is described as the branch of economics that aims to give empirical content to economic relations. More precisely, it is "the quantitative analysis of actual economic phenomena based on the concurrent development of theory and observation, related by appropriate methods of inference." Econometrics is the intersection of economics, mathematics, and statistics. It adds empirical content to economic theory, allowing theories to be tested and used for forecasting and policy evaluation.
Econometric Theory
Econometric theory uses statistical theory to evaluate and develop econometric methods. Econometricians try to find estimators that have desirable statistical properties, including unbiasedness, efficiency, and consistency. An estimator is unbiased if its expected value is the true value of the parameter; it is consistent if it converges to the true value as the sample size gets larger; and it is efficient if it has a lower standard error than any other unbiased estimator for a given sample size. Ordinary least squares (OLS) is often used for estimation since it provides the BLUE, or "best linear unbiased estimator" (where "best" means the most efficient unbiased estimator), given the Gauss-Markov assumptions. When these assumptions are violated or other statistical properties are desired, other estimation techniques such as maximum likelihood estimation, the generalized method of moments, or generalized least squares are used.
Methodology of Econometrics
1. Statement of Economic Theory or Hypothesis:
At this stage of the analysis, applied econometricians rely heavily on economic theory to formulate the hypothesis; a theory should have a testable prediction. One example is the marginal propensity to consume (MPC) proposed by Keynes, who postulated that the MPC, the rate of change of consumption for a unit (say, a dollar) change in income, is greater than zero but less than one. Other examples are the hypotheses that lower taxes increase growth (or perhaps increase economic inequality), or that introducing a common currency has a positive effect on trade.
2. Specification of the Mathematical Model:
Although Keynes postulated a positive relationship between consumption and income, he did not specify the precise functional form of the relationship between these two variables. The mathematical form of the Keynesian consumption function is expressed as:
Y = β1 + β2X, 0 < β2 < 1 (1.1)
where Y = consumption expenditure and X = income, and where β1 and β2, known as the parameters of the model, are respectively the intercept and slope coefficients. The slope coefficient β2 measures the MPC, and the intercept coefficient β1 measures autonomous consumption. This equation, which states that consumption is linearly related to income, is an example of a mathematical model of the relationship between consumption expenditure and income; in economics it is called the consumption function.
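Once parameter values are chosen, the deterministic consumption function (1.1) can be evaluated directly. A minimal sketch in Python, using purely hypothetical values β1 = 200 and β2 = 0.8 (any MPC strictly between 0 and 1 would do):

```python
# Hypothetical parameter values for illustration; the MPC b2 must satisfy 0 < b2 < 1.
def consumption(income, b1=200.0, b2=0.8):
    """Deterministic Keynesian consumption function: Y = b1 + b2 * X."""
    return b1 + b2 * income

y = consumption(1000.0)  # predicted consumption at an income of 1000
```

At an income of 1000, predicted consumption is 200 + 0.8 × 1000 = 1000, and every unit of extra income raises consumption by exactly 0.8 units.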

3. Specification of the Econometric Model:
Here we assume that the mathematical model is correct, but we need to account for the fact that the relationship may not be so deterministic, because there are other non-quantifiable or unknown factors that affect Y. Family size, the ages of family members, and family religion, for example, are likely to have some influence on consumption, yet none of these variables is included in the model. It is also possible that measurement errors have entered the data. To represent all these factors, a disturbance term is added to the mathematical model to obtain the econometric model:

Yi = β1 + β2Xi + Ui (1.2)

where Ui, known as the disturbance or error term, is a random (stochastic) variable with well-defined probabilistic properties. The disturbance term Ui represents all those factors that affect consumption but are not taken into account explicitly. The econometric consumption function hypothesizes that the dependent variable Y (consumption) is linearly related to the explanatory variable X (income), but that the relationship between the two is not exact; it is subject to individual variation.
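The difference between the deterministic model (1.1) and the econometric model (1.2) can be made concrete by simulation. The sketch below, with hypothetical parameters and income values, generates consumption data in which each observation deviates from the deterministic line by a zero-mean disturbance:

```python
import random

random.seed(0)
b1, b2 = 200.0, 0.8  # hypothetical intercept and MPC
incomes = [500.0, 1000.0, 1500.0, 2000.0, 2500.0]

# Each observation deviates from the deterministic line by u_i, a zero-mean
# random disturbance standing in for omitted factors and measurement error.
consumption = [b1 + b2 * x + random.gauss(0.0, 25.0) for x in incomes]
```

Two simulated households on the same income need not have the same consumption; the disturbance carries the omitted factors.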

4. Obtaining Data:
To estimate the econometric model, that is, to obtain the numerical values of β1 and β2, we need data for the variables above. Data can be obtained from government statistical agencies and other sources, and these days a great deal can also be collected on the Internet, but we need to learn the art of finding appropriate data in the ever-increasing flood of data. Data are classified into time series data, cross-section data, and pooled data.
5. Estimation of the Econometric Model:
Once the data are collected, the next task is to estimate the parameters of the consumption function; the numerical estimates of the parameters give empirical content to the function. Here we quantify β1 and β2, that is, we obtain numerical estimates. The main statistical tool used to obtain them is regression analysis, and estimation is usually carried out by the method of ordinary least squares (OLS).
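For the two-variable model, the OLS estimates have closed-form expressions: β̂2 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² and β̂1 = Ȳ − β̂2X̄. A minimal sketch; the data points are hypothetical and lie exactly on the line Y = 200 + 0.8X, so the estimates recover those values (up to floating-point error):

```python
# OLS for the two-variable model Y = b1 + b2*X, using the closed-form
# formulas: b2_hat = Sxy / Sxx and b1_hat = ybar - b2_hat * xbar.
def ols(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2_hat = sxy / sxx
    b1_hat = ybar - b2_hat * xbar
    return b1_hat, b2_hat

# Hypothetical data lying exactly on Y = 200 + 0.8 * X:
b1_hat, b2_hat = ols([500.0, 1000.0, 1500.0, 2000.0],
                     [600.0, 1000.0, 1400.0, 1800.0])
```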
6. Hypothesis Testing:
According to "positive" economists like Milton Friedman, a theory or hypothesis that is not verifiable by appeal to empirical evidence may not be admissible as part of scientific enquiry. We use hypothesis-testing techniques to verify whether the estimated regression equation and the parameter estimates are reliable.
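One common check is a t test of H0: β2 = 0. Under the classical assumptions, t = β̂2 / se(β̂2), where se(β̂2) = sqrt(σ̂² / Σ(Xi − X̄)²) and σ̂² = RSS/(n − 2). A sketch with hypothetical data (roughly Y = 200 + 0.8X plus small deviations):

```python
import math

def t_stat_slope(x, y):
    """t statistic for H0: beta2 = 0 in the two-variable OLS model."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b1 = ybar - b2 * xbar
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    se_b2 = math.sqrt(rss / (n - 2) / sxx)                       # standard error of b2
    return b2 / se_b2

# Hypothetical data: roughly Y = 200 + 0.8 * X plus small deviations.
x = [500.0, 1000.0, 1500.0, 2000.0, 2500.0, 3000.0]
y = [610.0, 990.0, 1420.0, 1790.0, 2230.0, 2590.0]
t = t_stat_slope(x, y)  # a large |t| argues against H0: beta2 = 0
```

A |t| far above the critical value (roughly 2 at the 5% level for moderate samples) leads us to reject H0 and treat the slope as statistically significant.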
7. Forecasting or Prediction:
If the chosen model does not refute the hypothesis or theory under consideration, we may use it to predict the future value(s) of the dependent, or forecast, variable Y on the basis of known or expected future value(s) of the explanatory, or predictor, variable X.
8. Use of the Model for Control or Policy Purposes:
An estimated model may also be used for control, or policy, purposes. With an appropriate mix of fiscal and monetary policy, the government can manipulate the control variable X to produce a desired level of the target variable Y.
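Steps 7 and 8 use the same estimated equation in opposite directions: forecasting plugs a known future X into Ŷ = β̂1 + β̂2X, while control inverts the equation to find the X that delivers a target Y. A sketch with hypothetical estimates:

```python
# Hypothetical OLS estimates of the consumption function.
b1_hat, b2_hat = 200.0, 0.8

def forecast(income):
    """Step 7: predict consumption at a known or expected future income."""
    return b1_hat + b2_hat * income

def required_income(target_consumption):
    """Step 8: invert the equation to find the income that hits a target."""
    return (target_consumption - b1_hat) / b2_hat

y_next = forecast(3000.0)            # predicted consumption at income 3000
x_needed = required_income(2600.0)   # income needed for consumption of 2600
```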

SIGNIFICANCE OF THE STOCHASTIC DISTURBANCE TERM


The disturbance term is a variable in a statistical and/or mathematical model that is included when the model does not fully represent the actual relationship between the independent variable(s) and the dependent variable. An error term essentially acknowledges that the model will not be completely accurate and will produce differing results in real-world applications. The exact linear relationship in (1.1) can be made stochastic by adding a random disturbance or error term Ui: Yi = β1 + β2Xi + Ui. The inclusion of a (random) disturbance or error term with well-defined probabilistic properties is required in regression analysis for the following important reasons.

1. Vagueness of theory: The theory, if any, determining the behaviour of Y may be, and often is, incomplete. We might know for certain that weekly income X influences weekly consumption expenditure Y, but we might be ignorant or unsure about the other variables affecting Y. Therefore ui may serve as a substitute for all the variables excluded or omitted from the model.
2. Unavailability of data: Even if we know what some of the excluded variables are, and therefore consider a multiple rather than a simple regression, we may not have quantitative information about these variables. So we may be forced to omit them from our model despite their great theoretical relevance in explaining consumption expenditure.
3. Core variables versus peripheral variables: Assume in our consumption-income example that besides income X1, the number of children per family X2, sex X3, religion X4, education X5, and geographical region X6 also affect consumption expenditure. But it is quite possible that the joint influence of all or some of these variables is so small that, as a practical matter and for cost considerations, it does not pay to introduce them into the model explicitly. Their combined effect can be treated as a random variable ui.
4. Intrinsic randomness in human behaviour: Even if we succeed in introducing all the relevant variables into the model, there is bound to be some "intrinsic" randomness in individual Y values that cannot be explained no matter how hard we try. The disturbances, the u's, may very well reflect this intrinsic randomness.
5. Poor proxy variables: Although the classical regression model assumes that the variables Y and X are measured accurately, in practice the data may be plagued by errors of measurement. Consider, for example, Milton Friedman's well-known theory of the consumption function. He regards permanent consumption (Yp) as a function of permanent income (Xp). In practice we use proxy variables, such as current consumption (Y) and current income (X), which are observable. Since the observed variables may not equal actual permanent consumption and income, the disturbance term u in this case may also represent errors of measurement.
6. Principle of parsimony: Usually the behaviour of Y can be explained substantially by two or three explanatory variables, and if the theory is not strong enough to suggest what other variables might be included, ui can represent all of them. However, one should not omit relevant and important variables just to keep the regression model simple.
7. Wrong functional form: Very often we do not know the form of the functional relationship between the regressand and the regressors. For example, is consumption expenditure a linear function of income or a nonlinear one? If the former, Yi = β1 + β2Xi + ui is the proper functional relationship between Y and X; if the latter, Yi = β1 + β2Xi² + ui may be the correct functional form. In two-variable models the functional form can be judged from a scatter diagram, but in multiple regression models it cannot be judged this way.
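A simple way to compare candidate functional forms is to fit each by OLS and compare residual sums of squares. In the sketch below the (hypothetical) data are generated exactly from y = 2 + 8/x, so regressing y on 1/x fits essentially perfectly while the linear form leaves large residuals:

```python
def fit_ssr(x, y):
    """OLS fit of y on x; returns the residual sum of squares."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b1 = ybar - b2 * xbar
    return sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))

# Hypothetical data generated exactly from the reciprocal form y = 2 + 8/x:
x = [1.0, 2.0, 4.0, 5.0, 8.0, 10.0]
y = [2.0 + 8.0 / xi for xi in x]

ssr_linear = fit_ssr(x, y)                          # model (1): y on x
ssr_reciprocal = fit_ssr([1.0 / xi for xi in x], y) # model (2): y on 1/x
```

The reciprocal specification drives the residual sum of squares to (numerically) zero, while the straight line cannot follow the curvature.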
For these reasons, it is important to use the stochastic disturbance term.

SPECIFICATION BIAS

Before any equation can be estimated, it must be completely specified. Specifying an econometric equation consists of three parts: choosing the correct independent variables, the correct functional form, and the correct form of the stochastic error term. A specification error results when any one of these choices is made incorrectly. In practice, specification bias is caused by any of the following three cases.

1. Omitting an important explanatory variable: Whenever you omit an important explanatory variable, the interpretation and use of your estimated equation become suspect. Leaving out a relevant variable, like price from a demand equation, not only prevents you from obtaining an estimate of the coefficient of price but also usually biases the estimated coefficients of the variables that remain in the equation. The bias caused by leaving a variable out of an equation is called omitted variable bias (or, more generally, specification bias). In an equation with more than one independent variable, each coefficient represents the change in the dependent variable Y caused by a one-unit increase in the independent variable Xk, holding constant the other independent variables in the equation. If a variable is omitted, then it is not included as an independent variable and is not held constant for the calculation and interpretation of the coefficients. This omission can cause bias: it can force the expected value of the estimated coefficient away from the true value of the population coefficient.

2. Including an irrelevant variable: If the model includes an irrelevant variable, then it has more independent variables in the estimated equation than in the true one. Adding a variable to an equation where it does not belong does not cause bias, but it does increase the variances of the estimated coefficients of the included variables.
3. Choosing the wrong functional form: If we choose the wrong functional form or make the wrong assumptions about the variables, the estimated regression may not be correct. To illustrate, suppose we choose between two models of the underlying relationship between the rate of change of money wages (Yi) and the unemployment rate (Xi). The first model, (1), is linear both in the parameters and in the variables, Yi = β0 + β1Xi + Ui, whereas the second, (2), is linear in the parameters (hence a linear regression model by our definition) but nonlinear in the variable X: Yi = β0 + β1(1/Xi) + Ui. If (2) is the "correct" or "true" model, fitting (1) to the scatter of points gives wrong predictions: over the middle range of X the straight line overestimates the true mean value of Y, whereas at low and high values of X it underestimates (or, in absolute terms, overestimates) it. The preceding example is an instance of what is called a specification bias or specification error; here the bias consists in choosing the wrong functional form.

CLASSICAL LINEAR REGRESSION MODEL


Regression analysis is the study of the nature of the relationship between a dependent variable and one or more independent variables. The ordinary least squares (OLS) method is widely used for estimating the regression parameters.
ASSUMPTIONS:
The Gaussian, standard, or classical linear regression model (CLRM), which is the cornerstone of most econometric theory, makes ten assumptions, discussed below in the context of the two-variable regression model.
1. Linear regression model. The regression model is linear in the parameters: Yi = β1 + β2Xi + ui. The conditional expectation of Y, E(Y/Xi), is a linear function of the parameters, the β's; it may or may not be linear in the variable X. In this interpretation E(Y/Xi) = β1 + β2Xi² is a linear regression model, but E(Y/Xi) = β1 + β2²Xi is a nonlinear regression model. The term linear regression will always mean a regression that is linear in its parameters: the β's are raised to the first power only.
2. X values are fixed in repeated sampling. The values taken by the regressor X are considered fixed in repeated samples; more technically, X is assumed to be nonstochastic. It is very important to understand the meaning of "fixed values in repeated sampling": it means that regression analysis is conditional regression analysis, that is, conditional on the given values of the regressor X, with Y allowed to vary.

3. Zero mean value of the disturbance ui. Given the value of X, the mean, or expected, value of the random disturbance term ui is zero. Technically, the conditional mean value of ui is zero; symbolically, E(ui/Xi) = 0.
For each given X, the corresponding Y values are distributed around their mean value; some lie above it and some below, and these deviations are the error terms ui. The mean value of the deviations is equal to zero.

4. Homoscedasticity or equal variance of ui. Given the value of X, the variance of ui is the same for all observations; that is, the conditional variances of ui are identical. Symbolically,
var(ui/Xi) = E[ui − E(ui/Xi)]²
= E(ui²/Xi) [since E(ui/Xi) = 0, by assumption 3]
= σ²
The equation states that the variance of ui for each Xi is some positive constant number equal to σ². This is the assumption of homoscedasticity, or equal (homo) spread (scedasticity), that is, equal variance.
[Figure: Demonstration of the homoskedasticity assumption. Around the predicted line, the variance of Y is constant across the values x1, x2, x3, x4 of x.]

In contrast, consider the figure below, where the conditional variance of Y varies with X. This is known as heteroscedasticity: unequal spread, or variance.

[Figure: Demonstration of the heteroskedasticity case. Around the predicted line, the variance of Y differs across the values x1, x2, x3, x4 of x.]
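The homoscedasticity assumption can be illustrated by simulation: draw disturbances whose standard deviation is constant in one case and grows with x in the other, and compare the estimated spreads. A sketch with hypothetical numbers:

```python
import random
import statistics

random.seed(1)
xs = [1.0, 2.0, 3.0, 4.0]
n_draws = 2000

# Homoscedastic disturbances: the same standard deviation at every x.
homo = {x: [random.gauss(0.0, 1.0) for _ in range(n_draws)] for x in xs}
# Heteroscedastic disturbances: the standard deviation grows with x.
hetero = {x: [random.gauss(0.0, x) for _ in range(n_draws)] for x in xs}

homo_sd = [statistics.stdev(homo[x]) for x in xs]      # roughly constant
hetero_sd = [statistics.stdev(hetero[x]) for x in xs]  # increasing in x
```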

5. No autocorrelation between the disturbances. Given any two X values Xi and Xj, the correlation between any two disturbances ui and uj (i ≠ j) is zero. Symbolically,
cov(ui, uj/Xi, Xj) = E{[ui − E(ui)]/Xi}{[uj − E(uj)]/Xj}
= E{(ui/Xi)(uj/Xj)} [since E(ui) = E(uj) = 0]
= E(ui/Xi)E(uj/Xj) [since ui and uj are generated independently]
= 0
This equation postulates that the disturbances ui and uj are uncorrelated. Technically, this is the assumption of no serial correlation, or no autocorrelation.

Positive autocorrelation: a positive u followed by a positive u, or a negative u followed by a negative u.
Negative autocorrelation: a positive u followed by a negative u, or vice versa.

6. Zero covariance between ui and Xi, or E(uiXi) = 0. Formally,
cov(ui, Xi) = E[ui − E(ui)][Xi − E(Xi)]
= E[ui(Xi − E(Xi))] [since E(ui) = 0]
= E(uiXi) − E(Xi)E(ui) [since E(Xi) is nonstochastic]
= E(uiXi) [since E(ui) = 0]
= XiE(ui) = 0 [since Xi is nonstochastic]
This states that the disturbance u and explanatory variable X are uncorrelated. If X and u
(which may represent the influence of all the omitted variables) are correlated, it is not
possible to assess their individual effect on Y.
7. The number of observations n must be greater than the number of parameters to be estimated. Alternatively, the number of observations must be greater than the number of explanatory variables.
8. Variability in X values. The X values in a given sample must not all be the same; technically, var(X) must be a finite positive number. If all the X values are identical, then Xi = X̄, the denominator of the slope formula, Σ(Xi − X̄)², is zero, and it is impossible to estimate β2 and therefore β1.
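Assumption 8 is visible directly in the slope formula: with no variability in X, the denominator Σ(Xi − X̄)² is zero and β̂2 is undefined. A short sketch:

```python
# With identical X values, sum((x - xbar)^2) = 0 and the slope formula's
# denominator vanishes, so beta2 cannot be estimated.
def slope(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    if sxx == 0:
        raise ZeroDivisionError("no variability in X: beta2 is not identified")
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx

try:
    slope([5.0, 5.0, 5.0], [1.0, 2.0, 3.0])  # all X identical
    failed = False
except ZeroDivisionError:
    failed = True
```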
9. The regression model is correctly specified. Alternatively, there is no specification bias or
error in the model used in empirical analysis. We must consider the variables, the functional
form, the linearity of parameters and variables and the probabilistic assumptions made about
X, Y and ui. If we choose the wrong functional form or make the wrong assumptions about
the variables, the estimated regression may not be correct.
10. There is no perfect multicollinearity. That is, there are no perfect linear relationships among the explanatory variables. Perfect collinearity between two independent variables implies that they are really the same variable, that one is a multiple of the other, and/or that a constant has been added to one of the variables. If there is perfect multicollinearity, the OLS estimator becomes indeterminate.

PROPERTIES OF THE OLS ESTIMATORS

The Gauss-Markov Theorem

The Gauss-Markov theorem states that, given the assumptions of the classical linear regression model, the least squares estimators have minimum variance in the class of linear unbiased estimators; in other words, they are BLUE (best linear unbiased estimators).

According to the theorem, the OLS estimator is the minimum-variance linear unbiased estimator provided the model is linear, the expected value of the error term is zero, the errors are homoskedastic and not autocorrelated, and there is no perfect multicollinearity.

The OLS estimator is widely used because it is BLUE: among all linear unbiased estimators, it has the lowest variance. These BLUE properties of the OLS estimator are what the Gauss-Markov theorem refers to. Nonlinear estimators may be superior to the OLS estimator, but since it is often difficult or impossible to find the variance of an unbiased nonlinear estimator, the OLS estimator remains the most widely used. Being linear, it is also easier to use than nonlinear estimators.

1. Linearity: The estimator is linear, that is, a linear function of a random variable such as the dependent variable Y in the regression model. The regression model is linear in the parameters, Yi = β1 + β2Xi + Ui: the conditional expectation of Y, E(Y/Xi), is a linear function of the parameters, the β's; it may or may not be linear in the variable X. In this interpretation E(Y/Xi) = β1 + β2Xi² is a linear regression model. The term linear regression will always mean a regression that is linear in its parameters, that is, the parameters are raised to the first power only.

2. Unbiasedness: The OLS estimator is unbiased; that is, the average or expected value of the estimates equals the true value of the parameter. An estimator is unbiased if the mean of its sampling distribution, which is the expected value of the estimator, equals the true parameter. Thus lack of bias means that E(β̂) = β, where β̂ is the estimator of the true parameter β. Bias is then defined as the difference between the expected value of the estimator and the true parameter: bias = E(β̂) − β. Note that lack of bias does not mean that β̂ = β in any one sample, but that in repeated random sampling we get, on average, the correct estimate. The hope is that the sample actually obtained is close to the mean of the sampling distribution of the estimator.
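Unbiasedness is a statement about repeated sampling, which makes it easy to check by Monte Carlo: hold the X values fixed, redraw the disturbances many times, and average the slope estimates. A sketch with hypothetical true parameters β1 = 200 and β2 = 0.8:

```python
import random

random.seed(42)
b1_true, b2_true = 200.0, 0.8                  # hypothetical true parameters
xs = [500.0, 1000.0, 1500.0, 2000.0, 2500.0]   # X fixed in repeated sampling

def ols_slope(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx

# Redraw the disturbances many times and estimate the slope each time.
estimates = []
for _ in range(5000):
    ys = [b1_true + b2_true * x + random.gauss(0.0, 50.0) for x in xs]
    estimates.append(ols_slope(xs, ys))

avg_b2 = sum(estimates) / len(estimates)  # close to the true value 0.8
```

Any single estimate misses the true β2, but the average across repeated samples sits very close to it.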

3. Efficiency: The best unbiased, or efficient, estimator is the one with the smallest variance among unbiased estimators; it is the unbiased estimator with the most compact, or least spread out, sampling distribution. An efficient estimator yields the smallest confidence intervals and is more likely than any other unbiased estimator to produce estimates close to the true parameter. The OLS estimator has minimum variance in the class of all linear unbiased estimators: its individual estimates cluster more tightly around the mean of its sampling distribution, so it is statistically more likely than others to provide accurate answers.

[Figure: sampling distributions of estimators of β. An unbiased and efficient estimator is centered on the true β with a small spread; a biased estimator is centered on β + bias; a high sampling variance indicates an inefficient estimator.]

4. Consistency: In addition to the BLUE properties, OLS estimators possess an additional property in large samples (size greater than 30). The conditions for consistency are:
 As the sample size increases, the estimator must approach the true parameter more and more closely (this is referred to as asymptotic unbiasedness).
 As the sample size approaches infinity, in the limit the sampling distribution of the estimator must collapse, becoming a vertical line with height (probability) of 1 above the value of the true parameter.
 In the figure, β̂ is a consistent estimator of β because as n increases β̂ approaches β, and as n approaches infinity the sampling distribution of β̂ collapses on β.
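Consistency can likewise be illustrated by simulation: the spread of the sampling distribution of β̂2 shrinks as the sample size grows. A sketch with hypothetical parameters:

```python
import random
import statistics

random.seed(7)

def ols_slope(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx

def slope_spread(n, reps=400):
    """Standard deviation of the OLS slope across repeated samples of size n."""
    est = []
    for _ in range(reps):
        x = [random.uniform(0.0, 100.0) for _ in range(n)]
        y = [200.0 + 0.8 * xi + random.gauss(0.0, 20.0) for xi in x]
        est.append(ols_slope(x, y))
    return statistics.stdev(est)

spread_small = slope_spread(10)    # wide sampling distribution
spread_large = slope_spread(200)   # much tighter around the true slope
```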

[Figure: Demonstration of consistency. As n grows from small to medium to large, the sampling distribution F(β̂) becomes more concentrated around the true β.]

