D-Block, 2nd Floor, Room # 218

Mohammad Ali Jinnah University, Islamabad

Office: 051-4486701 Ext: 212

Cell: 0333-6487274

dr.aqkhan@live.com; abdul.qadeer@jinnah.edu.pk

Section 1, 2 & 3

Lecture # 4

July 3-5, 2012

i) Linearity

ii) Xt is a variable, not fixed

iii) Xt is non-stochastic and fixed in different/repeated samples

iv) Expected value of the disturbance term is zero

v) Homoskedasticity

vi) No autocorrelation

vii) No multicollinearity

viii) Normality of residuals

ix) N≥2

Linearity: OLS (the Ordinary Least Squares method) is applied once the linearity of the data has been established. The assumption of linearity is that the “slope should be constant”, and the slope is constant when the data are linear. Linearity is confirmed through a functional specification test.

Dear Students & Fellows, the term “linear” refers to the fact that the population parameters “a and b” appear linearly in the equation, and not to the fact that Xt (the independent or explanatory variable) appears linearly. Thus, the model Yt = a + bXt^2 + ut is still called a simple linear regression even though the term “X” appears as a quadratic.

Yt = a + XB + ut

Violations of linearity are extremely serious. If you fit a linear model to data which are nonlinearly

related, your predictions will be seriously in error.
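To illustrate linearity in the parameters, here is a minimal Python sketch (not part of the original lecture, which works in EViews). It assumes NumPy is available and uses simulated data with made-up values a = 2 and b = 3. The point is only that Yt = a + bXt^2 + ut can be estimated by OLS once Xt^2 is treated as the regressor.

```python
import numpy as np

# Hypothetical data generated from Yt = a + b*Xt^2 + ut with a = 2, b = 3.
rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 50)
u = rng.normal(0.0, 1.0, size=x.size)
y = 2.0 + 3.0 * x**2 + u

# The model is linear in a and b, so OLS applies with Xt^2 as the regressor.
design = np.column_stack([np.ones_like(x), x**2])
(a_hat, b_hat), *_ = np.linalg.lstsq(design, y, rcond=None)

print(f"a_hat = {a_hat:.3f}, b_hat = {b_hat:.3f}")
```

The estimates should land close to the values used in the simulation, even though X enters the model as a quadratic.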

Xt as variable not fixed: Xt being a variable means that the explanatory variable must take more than one value in the sample. It must not be fixed at a single value.

Xt is not stochastic (not random, it must be deterministic) and fixed in different samples.

Yt = a + bxt + ut

Where Yt is the dependent variable whose behavior the researcher is interested in explaining. Other names for the dependent variable are regressand and left-hand-side variable. The investigator then identifies the variables, denoted by X, that influence the dependent variable. X is generally called the independent variable. Other names are exogenous variable, explanatory variable, regressor and right-hand-side variable. The choice of independent variable may come from economic theory, past experience, other studies or intuitive judgment.

Whereas a is the intercept (constant) and b is the slope of the straight line, also called the regression coefficient, which tells us about the marginal effect.

And Ut is an unobserved random variable, called the error term. Other names are disturbance term and stochastic term.

The term a+bxt is the deterministic part of the model, and Ut is the stochastic part.

The third assumption: Xt is not a stochastic term; it must be deterministic. The second part of the assumption says that Xt is fixed in different samples. Here, different samples means that if you take the data of the X variable for December 31, 2011, whether collected on a daily, monthly, quarterly or yearly basis, the value will be the same.

Expected disturbance term is zero: The fourth assumption of the CLRM is that the expected value of the disturbance term is zero, i.e. E(ut) = 0. Individual error terms may be positive or negative, but on average they cancel out.

Homoskedasticity: The term Homoskedasticity means the same spread around the regression line; the variance of the error term must be constant. If the spread around the regression line varies, then Heteroscedasticity exists in the data.

For Example:

Effects of Heteroscedasticity:

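i) Beta (unbiased and consistent): Heteroscedasticity does not bias beta; it remains unbiased and consistent, although OLS is no longer the most efficient estimator. In other words, the slope of the regression line will on average be the same.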

ii) Standard Error (incorrect, and may affect hypothesis testing): The standard error is the sample counterpart of the population standard deviation. Due to Heteroscedasticity, the estimated standard error may be too large or too small, depending on how the spread of the data varies.

iii) T-Statistics (incorrect, and may affect hypothesis testing): Due to Heteroscedasticity, the standard error will be too large or too small, which ultimately upsets the t-statistic, calculated as: t = β / S.E

If the Standard Error increases, the t-statistic decreases (due to the denominator effect). Hypothesis testing based on this t-statistic will then be incorrect: a variable that was actually significant may appear insignificant.

If the Standard Error decreases, the t-statistic increases (due to the denominator effect). Hypothesis testing based on this t-statistic will then be incorrect: a variable that was actually insignificant may appear significant.

iv) F-Statistics: Due to Heteroscedasticity, the regression line (OLS, CLRM and linear regression line are names for the same thing) does not remain the best fit, so the F-statistic and the decisions based on it will be unreliable.
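The denominator effect described in item iii) can be sketched with a few lines of Python. All numbers here are hypothetical and chosen only for illustration, and the cutoff of 2.0 is a rough rule of thumb for a 5% two-sided test, not a value taken from the lecture.

```python
def t_statistic(beta: float, standard_error: float) -> float:
    """t = beta / S.E, as in the formula above."""
    return beta / standard_error


def verdict(t: float, critical_t: float = 2.0) -> str:
    # critical_t = 2.0 is an assumed rule-of-thumb cutoff.
    return "significant" if abs(t) > critical_t else "insignificant"


# Case 1: the standard error is overstated (0.40 instead of 0.20).
correct_1 = verdict(t_statistic(0.50, 0.20))
distorted_1 = verdict(t_statistic(0.50, 0.40))

# Case 2: the standard error is understated (0.10 instead of 0.20).
correct_2 = verdict(t_statistic(0.30, 0.20))
distorted_2 = verdict(t_statistic(0.30, 0.10))

print("Overstated S.E.: ", correct_1, "->", distorted_1)
print("Understated S.E.:", correct_2, "->", distorted_2)
```

In the first case a significant variable appears insignificant; in the second an insignificant variable appears significant, while beta itself is unchanged.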

Autocorrelation: If today’s prices can be predicted by their lagged (past) prices, then autocorrelation exists. In other words, if a series is predicted by its own lagged series, autocorrelation exists. The CLRM assumes that the error terms are not autocorrelated.
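As a rough illustration of a series being predicted by its own lag, the Python sketch below (assuming NumPy) computes the correlation between a series and its first lag. The simulated "prices" series and the coefficient 0.8 are hypothetical, not data from the lecture.

```python
import numpy as np


def lag1_autocorrelation(series) -> float:
    """Correlation between the series and its own first lag."""
    s = np.asarray(series, dtype=float)
    return float(np.corrcoef(s[1:], s[:-1])[0, 1])


rng = np.random.default_rng(1)
shocks = rng.normal(size=500)  # independent shocks: no autocorrelation expected

# Hypothetical series in which today's value depends on yesterday's value.
prices = np.empty(500)
prices[0] = shocks[0]
for t in range(1, 500):
    prices[t] = 0.8 * prices[t - 1] + shocks[t]

print("lag-1 autocorrelation, dependent series:  ", round(lag1_autocorrelation(prices), 3))
print("lag-1 autocorrelation, independent shocks:", round(lag1_autocorrelation(shocks), 3))
```

A value far from zero for the first series and a value near zero for the second is what the definition above would lead one to expect.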

Multicollinearity is not a problem originating from or related to the specification of the model or the estimation of the specified model. It is a problem originating from the nature of the data: it exists when one or more explanatory (independent) variables affect other explanatory (independent) variables. In practice, one can minimize multicollinearity but cannot eliminate it.
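One simple, informal way to look for this in data is the correlation matrix of the explanatory variables. The Python sketch below (assuming NumPy) uses simulated variables named x2, x3 and x4 after the regressors used later in this lecture; the data themselves are hypothetical, with x3 deliberately built to move almost one-for-one with x2.

```python
import numpy as np

rng = np.random.default_rng(2)
x2 = rng.normal(size=200)
x3 = 0.9 * x2 + 0.1 * rng.normal(size=200)  # nearly a copy of x2 (by construction)
x4 = rng.normal(size=200)                   # unrelated to x2 and x3

# Rows and columns are ordered x2, x3, x4.
corr = np.corrcoef([x2, x3, x4])
print(np.round(corr, 3))
```

A pairwise correlation close to 1 (here between x2 and x3) signals that the two regressors carry nearly the same information.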

Yt = a + bxt + ut

1. What is Heteroscedasticity

2. How to detect

3. Problem associated

4. Its removal

5. What are different methods to deal in the presence of Heteroscedasticity

Detection of Heteroscedasticity:

1. Breusch-Pagan Test

2. Harvey Test

3. Glejser Test

4. Auto-Regressive Conditional Heteroscedasticity Test

5. PARK Test

6. White Test

7. Goldfeld-Quandt Test

ii. Regress these variables: go to the Quick menu, choose Estimate Equation and write the equation with x1 as the dependent variable and x2, x3 and x4 as independent variables: x1 c x2 x3 x4, then click OK. The results will be displayed. Do nothing with them; just generate the error term by writing: genr ut=resid as shown in the picture below.

iii. We apply the first Heteroscedasticity test, the Breusch-Pagan Test. This test uses the square of the error term, so we generate: genr utsq=ut^2 as shown below:

iv. Again go to the Quick menu and estimate an equation. Now utsq becomes the dependent variable. The equation will be: utsq c x2 x3 x4. From the results we pick the value of R-square, which in this case is 0.041560. We use the R-square value to compute the calculated value by the formula:

LM = n*R2

= 39*0.041560

= 1.62084 (this value is called the Calculated Value)

v. For the final decision about Heteroscedasticity, we need the critical or tabulated value. We generate chi-square as: genr chi=@qchisq(0.95,3). Here 0.95 is the confidence level and 3 is the degrees of freedom, because we have 3 independent variables, namely x2, x3 and x4. After generating chi-square, a series named chi will appear. Open it and a series holding a single constant value (approximately 7.815) will appear. This value is called the tabulated value.

Decision Criteria:

o If Calculated value > Tabulated/critical value then Heteroscedasticity (in other words, there is a significant relationship)

o If Calculated value < Tabulated/critical value then Homoskedasticity (in other words, there is an insignificant relationship).

On the basis of the above decision criteria, the calculated value (1.62084) is less than the tabulated chi-square value with 3 degrees of freedom at the 5% level (about 7.815), so we conclude that there is Homoskedasticity, i.e. an insignificant relationship.
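For readers who want to replicate the manual steps outside EViews, the following is a minimal Python sketch assuming NumPy. The helper `ols` and the simulated data are hypothetical stand-ins, not the lecture's dataset; only the final arithmetic (n = 39, R-square = 0.041560) comes from the lecture. The critical value is hard-coded as the standard chi-square table value for the 95% level with 3 degrees of freedom, which is what genr chi=@qchisq(0.95,3) is expected to return.

```python
import numpy as np


def ols(y, regressors):
    """Regress y on a constant plus regressors; return (R-squared, residuals)."""
    design = np.column_stack([np.ones(len(y)), regressors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return r2, resid


CHI2_95_DF3 = 7.815  # tabulated chi-square value, 95% level, 3 degrees of freedom

# Hypothetical data standing in for x1, x2, x3 and x4.
rng = np.random.default_rng(3)
n = 39
X = rng.normal(size=(n, 3))                         # columns play the role of x2, x3, x4
x1 = 1.0 + X @ np.array([0.5, -0.3, 0.8]) + rng.normal(size=n)

_, ut = ols(x1, X)            # x1 c x2 x3 x4, then genr ut=resid
utsq = ut ** 2                # genr utsq=ut^2
r2_aux, _ = ols(utsq, X)      # utsq c x2 x3 x4
lm_sim = n * r2_aux           # LM = n*R2 on the simulated data

# The lecture's own numbers.
lm_handout = 39 * 0.041560
decision = "Heteroscedasticity" if lm_handout > CHI2_95_DF3 else "Homoskedasticity"
print(f"simulated LM = {lm_sim:.4f}")
print(f"handout LM   = {lm_handout:.5f} -> {decision}")
```

With the lecture's numbers the calculated value stays below the tabulated value, which matches the conclusion reached above.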

i. Generate variables

ii. Go to the Quick menu and estimate the equation with x1 c x2 x3 x4

iii. Go to View in the equation window and click on Residual Diagnostics, then Heteroscedasticity Tests, as shown below:

iv. Select the Breusch-Pagan Test and click OK; the results will be displayed as in the picture below. These are the same results as in the previous method. We check Prob. Chi-Square(3): if this probability is insignificant (greater than 0.05), there is Homoskedasticity; if it is significant, Heteroscedasticity exists.
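The probability-based decision can be cross-checked by hand. The Python sketch below uses only the standard library and the closed-form upper-tail probability of a chi-square variable with exactly 3 degrees of freedom; the function name `chi2_sf_df3` is a hypothetical helper, and the formula does not apply to other degrees of freedom. The LM value is the one computed in the manual method (39*0.041560).

```python
import math


def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with 3 d.f."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


lm = 39 * 0.041560            # calculated value from the manual method
p_value = chi2_sf_df3(lm)

# Decision rule: probability above 0.05 means Homoskedasticity.
decision = "Homoskedasticity" if p_value > 0.05 else "Heteroscedasticity"
print(f"LM = {lm:.5f}, p-value = {p_value:.4f} -> {decision}")
```

Comparing the probability with 0.05 and comparing the calculated value with the tabulated value are two ways of stating the same decision.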

Other Tests, i.e. Harvey Test, Glejser Test and White Test:

All steps are the same as in the shortcut method; just change the test type and click OK. If the Chi-square probability is higher than 5% (0.05) then there is Homoskedasticity, otherwise Heteroscedasticity. This is the easiest way of detection.

If you would like to generate the variables manually, the steps for the Glejser Test are given below:

i. Generate Variables

ii. Regress those variables as x1 c x2 x3 x4

iii. Genr ut=resid

iv. For the Glejser Test, do not square the error term

v. Create its absolute value instead: genr absut=abs(ut)

vi. Estimate the equation: absut c x2 x3 x4

vii. Pick the value of R-Square

viii. Apply the formula: LM = n*R2 (called the Calculated Value)

ix. Generate chi-square as: genr chi=@qchisq(0.95,3) (called the Tabulated or critical value)

x. Compare and take decision
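The Glejser steps above can be sketched in Python as follows, assuming NumPy. The helper `ols` and the simulated data are hypothetical (the error spread is made to grow with the first regressor on purpose), and the LM formula is the n*R2 rule used in this lecture. The critical value 7.815 is the standard chi-square table value for the 95% level with 3 degrees of freedom.

```python
import numpy as np


def ols(y, regressors):
    """Regress y on a constant plus regressors; return (R-squared, residuals)."""
    design = np.column_stack([np.ones(len(y)), regressors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return r2, resid


# Hypothetical data: the error spread grows with the first regressor.
rng = np.random.default_rng(4)
n = 200
X = rng.uniform(1.0, 10.0, size=(n, 3))          # stand-ins for x2, x3, x4
x1 = 1.0 + X @ np.array([0.5, 0.2, -0.4]) + X[:, 0] * rng.normal(size=n)

_, ut = ols(x1, X)              # x1 c x2 x3 x4, then genr ut=resid
absut = np.abs(ut)              # genr absut=abs(ut)
r2_aux, _ = ols(absut, X)       # absut c x2 x3 x4
lm_glejser = n * r2_aux         # LM = n*R2 (calculated value)

critical = 7.815                # tabulated value for (0.95, 3)
print(f"Glejser LM = {lm_glejser:.3f}, critical value = {critical}")
print("Heteroscedasticity" if lm_glejser > critical else "Homoskedasticity")
```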

Steps for the Harvey Test:

i. Generate Variables

ii. Regress those variables as x1 c x2 x3 x4

iii. Genr ut=resid

iv. As in the Breusch-Pagan Test, generate the squared error term as: genr utsq=ut^2

v. genr Lutsq=log(utsq)

vi. Go to the Quick menu, estimate equation

vii. Write: Lutsq c x2 x3 x4

viii. Pick the value of R-Square

ix. Apply the formula: LM = n*R2 (called the Calculated Value)

x. Generate chi-square as: genr chi=@qchisq(0.95,3) (called the Tabulated or critical value)

xi. Compare and take decision
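The log-of-squared-residual steps above can be sketched in Python in the same way, assuming NumPy. As before, the helper `ols` and the simulated data are hypothetical, and the LM formula is the n*R2 rule used in this lecture. Note that the log step requires every residual to be non-zero, which holds for continuous simulated data like this but is an assumption in general.

```python
import numpy as np


def ols(y, regressors):
    """Regress y on a constant plus regressors; return (R-squared, residuals)."""
    design = np.column_stack([np.ones(len(y)), regressors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return r2, resid


# Hypothetical data with a constant error spread.
rng = np.random.default_rng(5)
n = 200
X = rng.uniform(1.0, 10.0, size=(n, 3))          # stand-ins for x2, x3, x4
x1 = 1.0 + X @ np.array([0.5, 0.2, -0.4]) + rng.normal(size=n)

_, ut = ols(x1, X)              # x1 c x2 x3 x4, then genr ut=resid
utsq = ut ** 2                  # genr utsq=ut^2
lutsq = np.log(utsq)            # genr Lutsq=log(utsq)
r2_aux, _ = ols(lutsq, X)       # Lutsq c x2 x3 x4
lm_log = n * r2_aux             # LM = n*R2 (calculated value)

critical = 7.815                # tabulated value for (0.95, 3)
print(f"LM = {lm_log:.3f}, critical value = {critical}")
print("Heteroscedasticity" if lm_log > critical else "Homoskedasticity")
```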

“Knowledge is power. Information is power. The secreting or hoarding of knowledge or information may

be an act of tyranny camouflaged as humility.” (Robin Morgan)

Note: Please let me know if you find any mistake. Comments for improvement will be highly appreciated. Thanks

Abdul Qadeer Khan

