
Chapter 4

Statistical Inference
4.1. The distribution of the OLS estimators

For Yi = β1 + β2X2i + β3X3i + ... + βkXki + ui

Under the CLM assumptions, the OLS estimators
β̂1, β̂2, ..., β̂k are normally distributed:

β̂j ~ N(βj, Var(β̂j))

Then we have:

(β̂j − βj) / se(β̂j) ~ t(n−k)
4.2 Testing hypothesis about one of
the regression coefficients
4.2.1. Confidence interval
4.2.2. T-statistic and Student t distribution
4.2.1. Confidence interval
Under the CLM assumptions, we can easily construct a confidence
interval (CI) for the population parameter βj. Confidence intervals are
also called interval estimates because they provide a range of likely
values for the population parameter, and not just a point estimate.

Since (β̂j − βj) / se(β̂j) ~ t(n−k), the CI is:

β̂j − tα/2 se(β̂j) ≤ βj ≤ β̂j + tα/2 se(β̂j)

If the level of confidence is 95%, then tα/2 is the 97.5th
percentile of the t(n−k) distribution.
1 − α is known as the confidence coefficient; and α (0 < α < 1)
is known as the level of significance.
β̂j − tα/2 se(β̂j) is the lower bound and β̂j + tα/2 se(β̂j) is
the upper bound of the interval.
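As a quick numerical sketch, such an interval can be computed with SciPy's t distribution; the estimate, standard error, and degrees of freedom below are made-up illustrative numbers, not values from the text.

```python
from scipy import stats

# Hypothetical OLS output (illustrative only): estimate, standard error, df = n - k
beta_hat, se, df = 0.70, 0.12, 25
alpha = 0.05

t_crit = stats.t.ppf(1 - alpha / 2, df)  # 97.5th percentile of t(25)
lower = beta_hat - t_crit * se           # lower bound of the 95% CI
upper = beta_hat + t_crit * se           # upper bound of the 95% CI
```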


Example: A model that explains the price of a good in terms of the
good’s characteristics is called a hedonic price model. The following
equation is a hedonic price model for housing prices; the
characteristics are square footage (sqrft), number of bedrooms
(bdrms), and number of bathrooms (bthrms). Often price appears in
logarithmic form, as do some of the explanatory variables. Using n = 19
observations on houses that were sold in Waltham, Massachusetts, in
1990, the estimated equation (with standard errors in parentheses
below the coefficient estimates) is:

• Compute the 95% confidence interval for the elasticity of price with
respect to square footage.


The interpretation of this confidence
interval is:
Given the confidence coefficient of 95 percent, in
95 out of 100 cases intervals (like Equation 5.3.9)
will contain the true β2. But, as warned earlier, we
cannot say that the probability is 95 percent that the
specific interval in Eq. (5.3.9) contains the true β2
because this interval is now fixed and no longer
random; therefore β2 either lies in it or it does not:
The probability that the specified fixed interval
includes the true β2 is therefore 1 or 0
Hypothesis Testing: The Confidence-
Interval Approach
• Two-Sided or Two-Tail Test
Suppose we postulate that
H0: βi = 0.5
H1: βi ≠ 0.5

Decision rule: Construct a 100(1 − α)% confidence
interval for βi. If the value of βi under H0 falls
within this confidence interval, do not reject H0; but
if it falls outside this interval, reject H0.
Hypothesis: H0: βj = 0
H1: βj ≠ 0
4.2.2. T-statistic and Student t distribution
• The Student t distribution with m degrees of
freedom is defined to be the distribution of the
ratio of a standard normal random variable,
divided by the square root of an independently
distributed chi-squared random variable with m
degrees of freedom divided by m.
• The t distribution depends on the degrees of
freedom m. The Student t distribution has a bell
shape similar to that of the normal distribution,
but when m is small (< 20) it has more mass in
the tails; that is, it has a fatter bell shape than
the normal.
• To obtain the value t, we must know the degrees of freedom,
n − k, and the level of confidence (95% in this case).
Then the value for t is obtained from the t(n−k) distribution.

• When n − k > 120, the t(n−k) distribution is close enough to normal
to use the 97.5th percentile of a standard normal distribution for
constructing a 95% CI: β̂j ± 1.96·se(β̂j).
• In fact, when n − k > 50, the critical t value is so close to 2 that we
can use a simple rule of thumb for a 95% confidence interval:
β̂j plus or minus two of its standard errors.
• For small degrees of freedom, the exact percentiles should be
obtained from the t table.
• The level of confidence can be chosen by the researcher.
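The convergence behind these rules of thumb can be checked directly with SciPy (a sketch; the df values are arbitrary):

```python
from scipy import stats

# 97.5th percentile of t(df): shrinks toward the normal value 1.96 as df grows
for df in (10, 30, 50, 120, 1000):
    print(df, round(float(stats.t.ppf(0.975, df)), 3))

z975 = stats.norm.ppf(0.975)  # standard normal 97.5th percentile, about 1.96
```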
Hypothesis testing about individual regression coefficients:
the null hypothesis in most applications
• A hypothesis about any individual partial regression
coefficient.
H0: j = 0
H1: j ≠ 0
• H0: βj = 0 means that Xj has no effect on the expected value of Y.
• If the computed |t| value > the critical t value at the chosen level
of significance, we may reject the null hypothesis;
otherwise, we may not reject it.
• Where:

t = (β̂j − 0) / se(β̂j)
Testing against two-Sided alternative
Example : Determinants of college GPA (example 4.3, p129)

• The null hypothesis states that:


– ACT held constant,
– hsGPA has no influence on colGPA
H0: β2 = 0 and H1: β2 ≠ 0
• t-test: t = (0.4534 − 0) / 0.0958 = 4.73
• The critical t value is 2.61 for a two-tail test with the significance level
of 1% (look up tα/2 for 138 df)
• With the significance level of 1%, reject the null hypothesis that
hsGPA has no effect on colGPA.
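The test in this example can be reproduced with SciPy from the numbers quoted above (estimate 0.4534, standard error 0.0958, 138 df):

```python
from scipy import stats

beta_hat, se, df = 0.4534, 0.0958, 138  # hsGPA estimate, its se, and n - k
t_stat = (beta_hat - 0) / se            # about 4.73
t_crit = stats.t.ppf(1 - 0.01 / 2, df)  # two-tail critical value at the 1% level
reject = abs(t_stat) > t_crit           # True: reject H0
```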

Example 2: Determinants of college GPA

A reminder on the language of
classical hypothesis testing
• When H0 is not rejected, say: “We fail to reject
H0 at the x% level”; do not say: “H0 is accepted
at the x% level”.
• Statistical significance vs economic
significance: statistical significance is
determined by the size of the t-statistic,
whereas economic significance is related
to the size and sign of the estimated coefficients.

Testing against one-Sided alternative
Testing Hypotheses on the coefficients

Hypotheses H0        Alternative hypothesis H1      Rejection region

βj = 0               βj ≠ 0 (two tail)              |t0| > t(n−k),α/2
βj = 0               βj > 0 (right tail)            t0 > t(n−k),α
βj = 0               βj < 0 (left tail)             t0 < −t(n−k),α
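The three rejection rules in the table can be wrapped in a small helper (a sketch; the function name and interface are my own):

```python
from scipy import stats

def reject_h0(t0, df, alpha=0.05, tail="two"):
    """Apply the rejection rule for the chosen alternative hypothesis."""
    if tail == "two":    # H1: beta_j != 0
        return abs(t0) > stats.t.ppf(1 - alpha / 2, df)
    if tail == "right":  # H1: beta_j > 0
        return t0 > stats.t.ppf(1 - alpha, df)
    if tail == "left":   # H1: beta_j < 0
        return t0 < -stats.t.ppf(1 - alpha, df)
    raise ValueError("tail must be 'two', 'right', or 'left'")
```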
4.3 Testing of joint hypotheses
• 4.3.1 Testing hypotheses on two and more
coefficients
• 4.3.2 F-statistic and Fisher F distribution
4.3.1 Testing the Overall Significance of
the Sample Regression
For Yi = 1 + 2X2i + 3X3i + ........+ kXki + ui
 To test the hypothesis
H0: 2 =3 =....= k= 0 (all slope coefficients are simultaneously zero)
(this is also a test of significance of R2)
H1: Not all slope coefficients are simultaneously zero
Option 1: t-test
• If the t variable exceeds the critical t value at the designated level of
significance for given df, then you can reject the null hypothesis;
otherwise, you do not reject it
Option 2: F test

F = [R² (n − k)] / [(1 − R²)(k − 1)]    (8.5.7)
(k = total number of parameters to be estimated including intercept)
If F > F critical = Fα,(k−1, n−k), reject H0; otherwise you do not reject it
Example : Testing the Overall Significance of
the Sample Regression

• Determinants of college GPA


F = (0.1764 × 138) / ((1 − 0.1764) × 2) = 14.78
• We have F > F critical = F0.05,(2,138) = 3.062 →
reject H0
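A sketch reproducing this F statistic from the reported R²: with n − k = 138 and k = 3 parameters, n = 141 (n is inferred from the df, so treat it as an assumption). The formula below is algebraically identical to (8.5.7).

```python
from scipy import stats

R2, n, k = 0.1764, 141, 3                  # colGPA example: k - 1 = 2, n - k = 138
F = (R2 / (k - 1)) / ((1 - R2) / (n - k))  # same as R2(n-k) / [(1-R2)(k-1)], ~14.78
F_crit = stats.f.ppf(0.95, k - 1, n - k)   # about 3.06
p_value = stats.f.sf(F, k - 1, n - k)      # far below 0.05
```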

4.3.2 F-statistic and Fisher F distribution
4.4 Testing single restrictions involving multiple coefficients
a. Testing the Equality of Two Regression Coefficients
• Suppose in the multiple regression
Yi = β1 + β2X2i + β3X3i + β4X4i + ui
we want to test the hypotheses
H0: β3 = β4 or (β3 − β4) = 0
H1: β3 ≠ β4 or (β3 − β4) ≠ 0
that is, the two slope coefficients β3 and β4 are equal.

a. Testing the Equality of Two Regression Coefficients
H0: β3 = β4 vs H1: β3 ≠ β4
Option 1: t-test (test directly):

t = (β̂3 − β̂4) / se(β̂3 − β̂4), where se(β̂3 − β̂4) = √[Var(β̂3) + Var(β̂4) − 2Cov(β̂3, β̂4)]

• If |t| exceeds the critical t value at the designated
level of significance for the given df, then you can reject the null
hypothesis; otherwise, you do not reject it

a. Testing the Equality of Two Regression Coefficients

• Review:

Var(β̂2) = σ² Σx3i² / [Σx2i² · Σx3i² − (Σx2i x3i)²] = σ² / [(1 − r²2,3) Σx2i²]

Var(β̂3) = σ² Σx2i² / [Σx2i² · Σx3i² − (Σx2i x3i)²] = σ² / [(1 − r²2,3) Σx3i²]

Cov(β̂2, β̂3) = −r2,3 σ² / [(1 − r²2,3) √(Σx2i²) √(Σx3i²)]

where r2,3 is the sample correlation between X2 and X3.
Example: EViews output
• Model: wage = f(educ,exper, tenure )

Example: EViews output
• Model: wage = f(educ,exper, tenure )

Example: EViews output
• We have se(β̂3 − β̂4) = 0.029635
t = −4.958, |t| > t0.025,522 ≈ 2 →
Reject H0
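The standard error of the difference comes from the review formulas above: se(β̂3 − β̂4) = √[Var(β̂3) + Var(β̂4) − 2Cov(β̂3, β̂4)]. A sketch with made-up variance and covariance numbers (not the ones behind 0.029635):

```python
import math

# Hypothetical values for illustration only
var3, var4, cov34 = 0.0004, 0.0006, 0.0001
se_diff = math.sqrt(var3 + var4 - 2 * cov34)  # se of (b3_hat - b4_hat)

b3_hat, b4_hat = -0.05, 0.10                  # hypothetical coefficient estimates
t_stat = (b3_hat - b4_hat) / se_diff          # compare with the t(n-k) critical value
```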

a. Testing the Equality of Two Regression Coefficients
Option 2: Transform the regression

a. Testing the Equality of Two Regression Coefficients

Option 3: Stata output F-test

. test exper=tenure
( 1) exper - tenure = 0

F( 1, 522) = 24.58
Prob > F = 0.0000
→ We reject the hypothesis that the two effects are
equal.
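A useful cross-check: for a single restriction, the Wald F statistic equals the square of the corresponding t statistic, so Stata's F(1, 522) = 24.58 is consistent with the t = −4.958 computed earlier:

```python
t_stat = -4.958  # t statistic for H0: b3 = b4 from the earlier slide
F_stat = 24.58   # F(1, 522) reported by `test exper=tenure`

# With one linear restriction, F = t^2 (up to rounding in the reported figures)
diff = abs(t_stat ** 2 - F_stat)
```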
b. Restricted Least Squares: Testing Linear Equality
Restrictions

• Now consider the Cobb–Douglas production function:


Yi = β1 · X2i^β2 · X3i^β3 · e^ui    (8.6.1)
where Y = output
X2 = labor input
X3 = capital input
• Written in log form, the equation becomes
ln Yi = ln β1 + β2 lnX2i + β3lnX3i + ui
= β0 + β2lnX2i + β3lnX3i + ui 8.6.2
where β0 = ln β1.

b. Restricted Least Squares: Testing Linear Equality
Restrictions

• Now if there are constant returns to scale (equiproportional


change in output for an equiproportional change in the
inputs), economic theory would suggest that:
β2 + β3 = 1 (8.6.3)
which is an example of a linear equality restriction.
• Is the restriction (8.6.3) valid? There are two approaches:
– The t-Test Approach
– The F-Test Approach

b. Restricted Least Squares: Testing Linear Equality
Restrictions
The t-Test Approach
•The simplest procedure is to estimate Eq. (8.6.2) in the usual
manner
•A test of the hypothesis or restriction can be conducted by the
t test:

t = (β̂2 + β̂3 − 1) / se(β̂2 + β̂3)    (8.6.4)

where se(β̂2 + β̂3) = √[Var(β̂2) + Var(β̂3) + 2Cov(β̂2, β̂3)]

•If the t value computed exceeds the critical t value at the chosen
level of significance, we reject the hypothesis of constant returns
to scale;
•Otherwise we do not reject it.
b. Restricted Least Squares: Testing Linear Equality
Restrictions

The F-Test Approach


•we see that β2 = 1 − β3
•we can write the Cobb–Douglas production function as
lnYi = β0 + (1 − β3) ln X2i + β3 ln X3i + ui
= β0 + ln X2i + β3(ln X3i − ln X2i ) + ui
or (lnYi − lnX2i) = β0 + β3(lnX3i − lnX2i ) + ui (8.6.7)
or ln(Yi/X2i) = β0 + β3ln(X3i/X2i) + ui (8.6.8)
Where (Yi/X2i) = output/labor ratio
(X3i/X2i) = capital/labor ratio
Eq. (8.6.7) or Eq. (8.6.8) is known as restricted least squares
(RLS)
Restricted Least Squares: Testing Linear Equality
Restrictions
• We want to test the hypotheses
H0: β2 + β3 = 1 (the restriction H0 is valid)

F = [(RSSR − RSSUR)/m] / [RSSUR/(n − k)]
RSSUR: RSS of the unrestricted regression (8.6.2)


RSSR : RSS of the restricted regression (8.6.7) or (8.6.8)
m = number of linear restrictions (1 in the present example)
k = number of parameters in the unrestricted regression
n = number of observations
• If the F value computed > the critical F value at the chosen level of
significance, we reject the hypothesis H0
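The restricted-vs-unrestricted F statistic defined by these quantities can be written as a one-line helper (a sketch; the RSS numbers in the usage comment are invented):

```python
def restricted_F(rss_r, rss_ur, m, n, k):
    """F = [(RSS_R - RSS_UR)/m] / [RSS_UR/(n - k)], with (m, n - k) df."""
    return ((rss_r - rss_ur) / m) / (rss_ur / (n - k))

# e.g. hypothetical RSS_R = 120, RSS_UR = 100, m = 1, n = 25, k = 3
F = restricted_F(120.0, 100.0, 1, 25, 3)
```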

EXAMPLE 4.4 The Cobb–Douglas Production Function for the
Mexican Economy,1955–1974 (Table 8.8)
GDP Employment Fixed Capital
Year Millions of 1960 pesos. Thousands of people. Millions of 1960 pesos.
1955 114043 8310 182113
1956 120410 8529 193749
1957 129187 8738 205192
1958 134705 8952 215130
1959 139960 9171 225021
1960 150511 9569 237026
1961 157897 9527 248897
1962 165286 9662 260661
1963 178491 10334 275466
1964 199457 10981 295378
1965 212323 11746 315715
1966 226977 11521 337642
1967 241194 11540 363599
1968 260881 12066 391847
1969 277498 12297 422382
1970 296530 12955 455049
1971 306712 13338 484677
1972 329030 13738 520553
1973 354057 15924 561531
1974 374977 14154 609825
Example

Fk-1,n-k,α = F2,17,0.05 = 3.59

Example

• F = 3.75 < Fm,n−k,α = F1,17,0.05 = 4.45, so we cannot reject H0.


A Cautionary Note (p.269)
• Keep in mind that if the dependent variable in
the restricted and unrestricted models is not
the same, R2(unrestricted) and R2(restricted)
are not directly comparable.

Example
• General F Testing
• Model: wage = f(educ,exper, tenure )
H0: βexper = 0

Example
• Unrestricted model

Example
• Restricted model

Example

General F Testing
•In Exercise 7.19, you were asked to consider the following
demand function for chicken:
lnYt = β1 + β2 lnX2t + β3 lnX3t + β4 lnX4t + β5 lnX5t + ut (8.6.19)
Where Y = per capita consumption of chicken, lb
X2 = real disposable per capita income,$
X3 = real retail price of chicken per lb
X4 = real retail price of pork per lb
X5 = real retail price of beef per lb.

Example

• Suppose that chicken consumption is not affected by the prices of


pork and beef.
H0: β4 = β5 = 0 (8.6.21)
Therefore, the constrained regression becomes
lnYt = β1 + β2 ln X2t + β3 lnX3t + ut (8.6.22)

Example

• F = 1.1224 < F0.05(2,18) = 3.55. Therefore, there is no reason to reject
the null hypothesis: the demand for chicken does not depend on pork
and beef prices.
• In short, we can accept the constrained regression (8.6.22) as
representing the demand function for chicken.
c. Testing for Structural or Parameter Stability of
Regression Models: The Chow Test
• Now we have three possible regressions:
Time period 1970–1981: Yt = λ1 + λ2Xt + u1t (8.7.1)
Time period 1982–1995: Yt = γ1 + γ2Xt + u2t (8.7.2)
Time period 1970–1995: Yt = α1 + α2Xt + ut (8.7.3)
• Regression (8.7.3) assumes there is no difference between the two
time periods. The mechanics of the Chow test are as follows:
1. Estimate regression (8.7.3), obtain RSS3 with df = (n1 + n2 − k)
We call RSS3 the restricted residual sum of squares (RSSR) because it is obtained by
imposing the restrictions that λ1 = γ1 and λ2 = γ2, that is, the subperiod regressions are
not different.
2. Estimate Eq. (8.7.1) and obtain its residual sum of squares, RSS1,
with df = (n1 − k).
3. Estimate Eq. (8.7.2) and obtain its residual sum of squares, RSS2,
with df = (n2 − k).
c. Testing for Structural or Parameter Stability of
Regression Models: The Chow Test
4. The unrestricted residual sum of squares (RSSUR), that is,
RSSUR = RSS1 + RSS2 with df = (n1 + n2 − 2k)
5. F ratio:

F = [(RSSR − RSSUR)/k] / [RSSUR/(n1 + n2 − 2k)]
6. If the computed F value exceeds the critical F value, we reject the
hypothesis of parameter stability and conclude that the regressions
(8.7.1) and (8.7.2) are different

TABLE 4.4 c: Savings and Personal Disposable Income (billions of
dollars), United States, 1970–1995

Observation Savings Income Observation Savings Income


1970 61.00 727.1      1983 167.0 2,522.4
1971 68.67 790.2      1984 235.7 2,810.0
1972 63.62 855.3      1985 206.2 3,002.0
1973 89.65 965.0      1986 196.5 3,187.6
1974 97.64 1,054.2    1987 168.4 3,363.1
1975 104.41 1,159.2   1988 189.1 3,640.8
1976 96.48 1,273.0    1989 187.8 3,894.5
1977 92.57 1,401.4    1990 208.7 4,166.8
1978 112.64 1,580.1   1991 246.4 4,343.7
1979 130.16 1,769.5   1992 272.6 4,613.7
1980 161.84 1,973.3   1993 214.4 4,790.2
1981 199.14 2,200.2   1994 189.4 5,021.7
1982 205.53 2,347.3   1995 249.3 5,320.8
c. Testing for Structural or Parameter Stability of
Regression Models: The Chow Test
• For the data given in Table 4.4 c, the empirical counterparts of the
preceding three regressions are as follows:

c. Testing for Structural or Parameter Stability of
Regression Models: The Chow Test
• RSSUR = RSS1 + RSS2 = (1785.032 + 10,005.22) = 11,790.252
• RSSR = RSS3 = 23,248.30

F = [(23,248.30 − 11,790.252)/2] / [11,790.252/22] = 10.69

• From the F tables, we find that for 2 and 22 df the 1 percent critical F
value is 5.72.
• The Chow test therefore seems to support our earlier hunch that the
savings–income relation has undergone a structural change in
the United States over the period 1970–1995
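Plugging the slide's RSS values into the Chow formula reproduces F = 10.69 (n1 = 12 and n2 = 14 observations, k = 2 parameters per subperiod regression):

```python
from scipy import stats

rss1, rss2, rss_r = 1785.032, 10005.22, 23248.30  # RSS1, RSS2, RSS3 from the slides
k, n1, n2 = 2, 12, 14                             # intercept + slope; 1970-81, 1982-95

rss_ur = rss1 + rss2                                        # 11,790.252
F = ((rss_r - rss_ur) / k) / (rss_ur / (n1 + n2 - 2 * k))   # about 10.69
F_crit = stats.f.ppf(0.99, k, n1 + n2 - 2 * k)              # about 5.72
```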

c. Testing the Functional Form of Regression: Choosing
between Linear and Log–Linear Models
• We can use a test proposed by MacKinnon, White, and Davidson,
which for brevity we call the MWD test, to choose between the two
models
H0: The true model is linear
H1: The true model is Log–Linear
Step I: Estimate the linear model and obtain the fitted Y values; call them Yf.
Step II: Estimate the log–linear model and obtain the fitted lnY values; call them lnf.
Step III: Obtain Z1 = ln(Yf) − lnf.
Step IV: Regress Y on the X’s and Z1 obtained in Step III. Reject H0 if the coefficient of
Z1 is statistically significant by the usual t test.
Step V: Obtain Z2 = (antilog of lnf) − Yf.
Step VI: Regress lnY on the logs of the X’s and Z2. Reject H1 if the coefficient of Z2 is
statistically significant by the usual t test.

EXAMPLE 4.4 The Demand for Roses

• Refer to Exercise 7.16 where we have presented data on the demand


for roses in the Detroit metropolitan area for the period 1971–III to
1975–II.
Linear model: Yt = α1 + α2X2t + α3X3t + ut (8.10.1)
Log–linear model: lnYt = β1 + β2lnX2t + β3lnX3t + ut (8.10.2)
• Where Y is the quantity of roses in dozens
X2 is the average wholesale price of roses ($/dozen),
X3 is the average wholesale price of carnations ($/dozen).
• A priori: α2 and β2 are expected to be negative (why?)
α3 and β3 are expected to be positive
• As we know, the slope coefficients in the log–linear model are
elasticity coefficients.
EXAMPLE 4.4 The Demand for Roses
• Step I, Step II

• Step III: Obtain Z1 = ln(Yf) − lnf.


• Step IV:

EXAMPLE 4.4 The Demand for Roses

• The coefficient of Z1 is not statistically significant (t test); we do not
reject the hypothesis that the true model is linear.
• Step V: Obtain Z2 = (antilog of lnf) − Yf.
• Step VI:

• The coefficient of Z2 is not statistically significant (t test); we also
cannot reject the hypothesis that the true model is log–linear at the
5% level of significance.
• Conclusion: As this example shows, it is quite possible that in a given
situation we cannot reject either of the specifications.
Assignments
• Problems 7.16, 7.17, 7.18, 7.19, 7.20 in p25-240, Gujarati.
• Problems 3.1-3.3 in p. 105-106, Wooldridge.
• Computer exercises C3.1-C3.3 in p. 110-111,
Wooldridge.

The Log-Linear Model

 Consider the following model, known as the exponential
regression model:

Yi = β1 · Xi^β2 · e^ui

 Which may be expressed alternatively as

ln Yi = ln β1 + β2 ln Xi + ui    (6.5.2)

Denote Yi* = ln Yi, Xi* = ln Xi, α = ln β1.
We write Eq. (6.5.2) as:

Yi* = α + β2 Xi* + ui

β2 measures the ELASTICITY of Y with respect to X, that is, the percentage
change in Y for a given (small) percentage change in X.
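A simulation sketch of this elasticity interpretation: generate data from the exponential model and recover β2 by regressing lnY on lnX (the parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(1, 100, 500)
beta1, beta2 = 5.0, 0.8                                  # arbitrary true parameters
Y = beta1 * X**beta2 * np.exp(rng.normal(0, 0.1, 500))   # exponential model

# OLS of lnY on a constant and lnX: the slope estimates the elasticity beta2
A = np.column_stack([np.ones_like(X), np.log(X)])
coef, *_ = np.linalg.lstsq(A, np.log(Y), rcond=None)
elasticity = coef[1]                                     # close to 0.8
```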
Semilog Models: Log–Lin and Lin–Log Models

• EXAMPLE 6.4 The rate of growth of expenditure on services


Consider the data on expenditure on services given in Table
6.3. The regression results over time (t) are as follows:

Over the quarterly period 2003 to 2006, expenditures on
services increased at the (quarterly) rate of 0.705 percent.

Semilog Models: Log–Lin and Lin–Log Models

Linear Trend Model


• Sometimes we estimate the following model:
Yt = β1 + β2t + ut (6.6.9)
• For the expenditure on services data, the results of fitting
the linear trend model (6.6.9) are as follows:

• On average, expenditure on services increased at the


absolute rate of about 30 billion dollars per quarter.

Semilog Models: Log–Lin and Lin–Log Models

The Lin–Log Model


• Suppose we now want to find the absolute change in Y for a
percent change in X. A model that can accomplish this
purpose can be written as:
Yi = β1 + β2 ln Xi + ui (6.6.11)
For descriptive purposes we call such a model a lin–log
model.

Semilog Models: Log–Lin and Lin–Log Models

• Example 6.5: let us revisit our example on food expenditure


in India, Example 3.2. As this figure suggests, food
expenditure increases more slowly as total expenditure
increases. The results of fitting the lin–log model to the data
are as follows:

The slope coefficient of about 257 means that an increase in
total expenditure of 1 percent, on average, leads to
about a 2.57-rupee increase in the expenditure on food of the
55 families included in the sample.
