22 Reg7

22. INFERENCE FOR MULTIPLE REGRESSION

We can interpret most of the Minitab Multiple Regression output as
we did in the simple regression case.

We should keep in mind, though, that we are now estimating
several parameters.

This requires us to modify the degrees of freedom, and to think
about the consequences of performing too many hypothesis tests at
once.

Since we are estimating $k+1$ regression parameters ($\alpha, \beta_1, \ldots, \beta_k$),
we now have $n-(k+1) = n-k-1$ degrees of freedom.

In the housing example, we have $n=15$ and $k=3$, so $df = 15-3-1 = 11$.

Testing For Significance of an Individual Parameter

The Minitab t-statistics (T) can be used for testing that a given
parameter is zero, that is, $H_0 : \beta_i = 0$. (Or to test $H_0 : \alpha = 0$.)

If the model holds and $\beta_i = 0$, then the t-statistic corresponding to $\hat{\beta}_i$,

"T" = $\hat{\beta}_i / s_{\hat{\beta}_i}$ = "Coef" / "SE Coef",

will have a t-distribution with $n-k-1$ degrees of freedom.

The p-value provided by Minitab corresponds to a two-tailed test of
$H_0 : \beta_i = 0$ versus $H_A : \beta_i \neq 0$.

For the housing example, the coefficients of house size and lot size
are statistically significant at the 5% level (p < 0.05), while the
intercept and the coefficient for age are not significant (p > 0.05).
Perhaps age should be deleted from the model, but we will leave it
in for now.
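As a rough check on the T column, the sketch below recomputes each t-statistic as Coef / SE Coef from the rounded values in the housing output and compares its absolute value with the two-tailed 5% critical value 2.201 for 11 degrees of freedom (the Table 6 value used in these notes). The names `coefficients`, `t_statistic` and `T_CRIT` are illustrative and not part of Minitab. Because the inputs are rounded, the results should agree with Minitab only to about two decimals.

```python
# Rounded values copied from the Minitab "Coefficients" table (housing example).
coefficients = {
    "Constant": (-161.0, 191.0),
    "Size": (41.46, 7.51),
    "Age": (-2.36, 8.81),
    "Lot Size": (48.31, 9.01),
}

n, k = 15, 3
df = n - k - 1   # degrees of freedom: n - (k + 1)
T_CRIT = 2.201   # t_{0.025} with df = 11, taken from Table 6 in the notes


def t_statistic(coef, se_coef):
    """t-statistic for H0: parameter = 0, i.e. Coef / SE Coef."""
    return coef / se_coef


results = {}
for term, (coef, se) in coefficients.items():
    t = t_statistic(coef, se)
    significant = abs(t) > T_CRIT
    results[term] = (round(t, 2), significant)
    print(f"{term:9s} T = {t:6.2f}  significant at 5%: {significant}")
```

Comparing |T| with the critical value gives the same decision as checking whether the two-tailed p-value is below 0.05.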
Regression Analysis: Price versus Size, Age, Lot Size

Analysis of Variance

Source      DF      SS      MS  F-Value  P-Value
Regression   3  570744  190248    40.03    0.000
Error       11   52280    4753
Total       14  623024

Model Summary

S        R-sq    R-sq(adj)  R-sq(pred)
68.9399  91.61%  89.32%     87.66%

Coefficients

Term       Coef  SE Coef  T-Value  P-Value
Constant   -161      191    -0.84    0.418
Size      41.46     7.51     5.52    0.000
Age       -2.36     8.81    -0.27    0.794
Lot Size  48.31     9.01     5.36    0.000

Confidence intervals and tests for a general null hypothesis on a parameter

For each regression parameter, Minitab computes t-statistics and
p-values for a null hypothesis of zero. For a general null
hypothesis, you must construct your own t-statistic by hand. To
decide whether it's statistically significant, you need to get the
critical value from Table 6.

Even if $\beta_i$ is not zero, the estimator $\hat{\beta}_i$ is normally distributed,
with $E[\hat{\beta}_i] = \beta_i$. Furthermore, the quantity

$(\hat{\beta}_i - \beta_i) / s_{\hat{\beta}_i}$

has a t-distribution with $n-k-1$ degrees of freedom.

This allows us to get confidence intervals and perform general
hypothesis tests for $\beta_i$.
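To spell out how the t-distribution result leads to an interval, one way to write the step is as follows. Here $t_{\alpha/2}$ denotes the upper $\alpha/2$ critical value of the t-distribution with $n-k-1$ degrees of freedom; this notation is an assumption of the sketch, with $\alpha$ standing for the test level rather than the intercept.

```latex
P\left( -t_{\alpha/2} \le \frac{\hat{\beta}_i - \beta_i}{s_{\hat{\beta}_i}} \le t_{\alpha/2} \right) = 1 - \alpha
\quad\Longrightarrow\quad
P\left( \hat{\beta}_i - t_{\alpha/2}\, s_{\hat{\beta}_i} \le \beta_i \le \hat{\beta}_i + t_{\alpha/2}\, s_{\hat{\beta}_i} \right) = 1 - \alpha
```

So $\hat{\beta}_i \pm t_{\alpha/2}\, s_{\hat{\beta}_i}$ is a $100(1-\alpha)\%$ confidence interval for $\beta_i$, and a two-tailed test of $H_0 : \beta_i = b$ rejects when $|(\hat{\beta}_i - b) / s_{\hat{\beta}_i}| > t_{\alpha/2}$.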
Eg: Suppose in the housing example that, before seeing the
data, we had a hypothesis that every hundred square feet of
house size adds an average of $10,000 to the selling price
($\beta_1 = 10$). To test $H_0 : \beta_1 = 10$ versus $H_A : \beta_1 \neq 10$ at level 0.05, we
form the t-statistic,

$t = (41.46 - 10) / 7.51 = 4.19$.

If $H_0$ were true, such a t-statistic would have a t distribution
with 11 degrees of freedom. Thus, we need to compare our
observed t-statistic to the critical value $t_{0.025} = 2.201$ from Table
6 (with df=11).

Conclusion: Since 4.19 > 2.201, reject the null hypothesis.

Next, let's construct a 95% confidence interval for $\beta_1$. From
the Minitab output, we have $s_{\hat{\beta}_1} = 7.51$.

The confidence interval is

$\hat{\beta}_1 \pm t_{\alpha/2}\, s_{\hat{\beta}_1} = 41.46 \pm 2.201\,(7.51) = (24.9, 58.0)$.

Based on this interval, we can perform any two-tailed
hypothesis test on $\beta_1$ at level 0.05, without actually calculating the
t-statistic.

For example, since the interval does not contain 10, we know
that we can reject the hypothesis
$H_0 : \beta_1 = 10$ in favor of $H_A : \beta_1 \neq 10$, at level 0.05.
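The hand calculation for a general null hypothesis, and the matching confidence interval, can be sketched in a few lines. This is an illustration using the rounded Size coefficient and standard error from the housing output and the Table 6 critical value. The helper names `t_for_null` and `confidence_interval` are hypothetical and not Minitab functions.

```python
def t_for_null(estimate, se, null_value):
    """t-statistic for H0: parameter = null_value."""
    return (estimate - null_value) / se


def confidence_interval(estimate, se, t_crit):
    """Two-sided interval: estimate +/- t_crit * se."""
    half_width = t_crit * se
    return estimate - half_width, estimate + half_width


b1_hat, se_b1 = 41.46, 7.51  # "Coef" and "SE Coef" for Size
T_CRIT = 2.201               # t_{0.025}, df = 11 (Table 6 in the notes)

t_obs = t_for_null(b1_hat, se_b1, null_value=10)
lower, upper = confidence_interval(b1_hat, se_b1, T_CRIT)

# The two decisions always agree for a two-tailed test at the same level.
reject_by_t = abs(t_obs) > T_CRIT
reject_by_ci = not (lower <= 10 <= upper)

print(f"t = {t_obs:.2f}, reject H0: {reject_by_t}")
print(f"95% CI = ({lower:.1f}, {upper:.1f}), 10 outside interval: {reject_by_ci}")
```

Changing `null_value` tests any other hypothesized slope against the same interval, which is why the interval alone settles every two-tailed test at level 0.05.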
Interval Estimators and Predictors

We can estimate the response surface $E(Y \mid x)$ and predict a
future value using $\hat{y}$. We can also obtain confidence intervals
for $E(Y \mid x)$ and prediction intervals for a future $y$ (for given
values of the explanatory variables) from the Minitab output.

For example, if a house has a ground area of 2000 square feet ($x_1 = 20$),
the house is 10 years old ($x_2 = 10$) and the lot size is
10,000 square feet ($x_3 = 10$), then $\hat{y} = 1128.14$, so the predicted
selling price is $1,128,140.

The 95 percent confidence interval for the mean selling price
is ($1,049,370, $1,206,900) and the 95 percent prediction
interval for the price of the house is ($957,176, $1,299,100).

Prediction for Price

Regression Equation

Price = -161 + 41.46 Size - 2.36 Age + 48.31 Lot Size

Variable  Setting
Size           20
Age            10
Lot Size       10

Fit      SE Fit   95% CI              95% PI
1128.14  35.7870  (1049.37, 1206.90)  (957.176, 1299.10)
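The intervals in the prediction output can be approximately reproduced from the reported Fit, SE Fit and S. The sketch below assumes the standard result (not stated in these notes) that the standard error for predicting a single future y is the square root of s^2 plus the squared SE Fit, and it uses the Table 6 critical value 2.201. Variable names are illustrative only.

```python
import math

# Rounded values from the Minitab output for the housing example.
coef = {"Constant": -161.0, "Size": 41.46, "Age": -2.36, "Lot Size": 48.31}
settings = {"Size": 20, "Age": 10, "Lot Size": 10}

# Point prediction from the rounded regression equation.
fit_rounded = coef["Constant"] + sum(coef[name] * x for name, x in settings.items())

# Values reported by Minitab (computed from unrounded coefficients).
fit, se_fit = 1128.14, 35.7870
s = 68.9399     # "S" from the Model Summary
T_CRIT = 2.201  # t_{0.025}, df = 11

# Confidence interval for the mean response E(Y | x).
ci = (fit - T_CRIT * se_fit, fit + T_CRIT * se_fit)

# Prediction interval for one future y: the standard error adds the
# error variance s^2 to the squared standard error of the fit (assumed formula).
se_pred = math.sqrt(s**2 + se_fit**2)
pi = (fit - T_CRIT * se_pred, fit + T_CRIT * se_pred)

print(f"fit from rounded coefficients: {fit_rounded:.2f}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI: ({pi[0]:.2f}, {pi[1]:.2f})")
```

The fit from the rounded equation comes out slightly below Minitab's 1128.14 because the printed coefficients are rounded. The prediction interval is wider than the confidence interval because it accounts for the scatter of an individual house price around the mean as well as the uncertainty in the estimated mean.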
Estimating the Error Variance, $\sigma^2$

The error variance $\sigma^2$ measures the dispersion of the data
points from the true response surface.

We can estimate $\sigma^2$ without bias by $s^2 = SSE/(n-k-1)$.

The value of $s$ (an estimate of $\sigma$) is given by Minitab.

For the housing example, we get $s = 68.94$.

Multiple Coefficient of Determination, $R^2$

Just as in simple regression, we have

$SST = SSR + SSE$,

where

$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$ is the total sum of squares,

$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$ is the regression sum of squares, and

$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ is the residual sum of squares.

The coefficient of multiple determination is $R^2 = SSR/SST$.

The interpretations of these quantities are essentially the same as
in simple linear regression. Thus, $R^2$ is the proportion of the
variation in $y$ that is "explained" by the multiple regression model.
In Minitab, $R^2$ is denoted by R-Sq.

For the housing data, we have $R^2$ = 91.61%.

A value of $R^2$ close to 1 is generally considered to imply
that the model is good. We have to be somewhat careful,
however, since $R^2$ can never go down, and almost always goes up,
whenever we include a new variable.

The "Adjusted $R^2$" and "Prediction $R^2$" avoid this problem,
but in my opinion they are not much more useful than $R^2$
itself for deciding which variables to use in the model.

Regression Analysis: Price versus Size, Age, Lot Size

Analysis of Variance

Source      DF      SS      MS  F-Value  P-Value
Regression   3  570744  190248    40.03    0.000
Error       11   52280    4753
Total       14  623024

Model Summary

S        R-sq    R-sq(adj)  R-sq(pred)
68.9399  91.61%  89.32%     87.66%

Coefficients

Term       Coef  SE Coef  T-Value  P-Value
Constant   -161      191    -0.84    0.418
Size      41.46     7.51     5.52    0.000
Age       -2.36     8.81    -0.27    0.794
Lot Size  48.31     9.01     5.36    0.000
In Minitab, we can read the values of SSR, SSE and SST
from the "SS" column of the "Analysis of Variance" table.

For the housing data, we have SSR = 570744 ("Regression"),
SSE = 52280 ("Error"), and SST = 623024 ("Total").

We can check that 623024 = 570744 + 52280 (SST = SSR + SSE),
and that 0.9161 = 570744 / 623024 ($R^2 = SSR/SST$).

We can also compute $s^2 = SSE/(n-k-1) = 52280/11 = 4753$.

The square root of this is $s = 68.94$ as given earlier.
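These checks are easy to script. The sketch below redoes the arithmetic from the "SS" column and, as an extra, recomputes the F-value and R-sq(adj) using their usual definitions (MS Regression / MS Error, and 1 minus the ratio of SSE/(n-k-1) to SST/(n-1)). Those two formulas are standard but are not stated in these notes, so treat them as assumptions of the example.

```python
import math

n, k = 15, 3
ssr, sse, sst = 570744, 52280, 623024  # "SS" column: Regression, Error, Total

df_error = n - k - 1
df_total = n - 1

r_sq = ssr / sst            # R^2 = SSR / SST
s_sq = sse / df_error       # s^2 = SSE / (n - k - 1)
s = math.sqrt(s_sq)         # estimate of sigma
f_value = (ssr / k) / s_sq  # MS Regression / MS Error (assumed definition)
r_sq_adj = 1 - (sse / df_error) / (sst / df_total)  # usual adjusted R^2

print(f"SSR + SSE = {ssr + sse} (SST = {sst})")
print(f"R-sq = {100 * r_sq:.2f}%, R-sq(adj) = {100 * r_sq_adj:.2f}%")
print(f"s^2 = {s_sq:.0f}, s = {s:.2f}, F = {f_value:.2f}")
```

The recomputed values line up with the Model Summary and Analysis of Variance tables to the precision Minitab prints.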
