This requires us to modify the degrees of freedom, and to think about the consequences of performing too many hypothesis tests at once.

Since we are estimating k+1 regression parameters (\(\alpha, \beta_1, \ldots, \beta_k\)), we now have \(n-(k+1) = n-k-1\) degrees of freedom.

In the housing example, we have n=15 and k=3, so df = 15 - 3 - 1 = 11.

The p-value provided by Minitab corresponds to a two-tailed test of \(H_0: \beta_i = 0\) versus \(H_A: \beta_i \neq 0\).

For the housing example, the coefficients of house size and lot size are statistically significant at the 5% level (p < 0.05), while the intercept and the coefficient for age are not significant (p > 0.05). Perhaps age should be deleted from the model, but we will leave it in for now.
Regression Analysis: Price versus Size, Age, Lot Size

Analysis of Variance

Source      DF      SS      MS  F-Value  P-Value
Regression   3  570744  190248    40.03    0.000
Error       11   52280    4753
Total       14  623024

Model Summary

      S    R-sq  R-sq(adj)  R-sq(pred)
68.9399  91.61%     89.32%      87.66%

Coefficients

Term        Coef  SE Coef  T-Value  P-Value
Constant    -161      191    -0.84    0.418
Size       41.46     7.51     5.52    0.000
Age        -2.36     8.81    -0.27    0.794
Lot Size   48.31     9.01     5.36    0.000

Confidence intervals and tests for a general null hypothesis on a parameter

For each regression parameter, Minitab computes t-statistics and p-values for a null hypothesis of zero. For a general null hypothesis, you must construct your own t-statistic by hand. To decide whether it's statistically significant, you need to get the critical value from Table 6.

Even if \(\beta_i\) is not zero, the estimator \(\hat\beta_i\) is normally distributed, with \(E[\hat\beta_i] = \beta_i\). Furthermore, the quantity

\[ \frac{\hat\beta_i - \beta_i}{s_{\hat\beta_i}} \]

has a t-distribution with \(n-k-1\) degrees of freedom.

This allows us to get confidence intervals and perform general hypothesis tests for \(\beta_i\).
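As a quick check on the coefficient table, the T-Value column can be reproduced by dividing each coefficient by its standard error. The sketch below is only an illustration: the numbers are transcribed from the Minitab output, the critical value 2.201 is the two-tailed 5% value for 11 degrees of freedom from a standard t-table, and the names `COEFS`, `T_CRIT`, `t_value` and `summarize` are made up for this example.

```python
# Coefficient table transcribed from the Minitab output: term -> (Coef, SE Coef).
COEFS = {
    "Constant": (-161.0, 191.0),
    "Size": (41.46, 7.51),
    "Age": (-2.36, 8.81),
    "Lot Size": (48.31, 9.01),
}

# Two-tailed 5% critical value of the t-distribution with 11 degrees of freedom
# (assumed here from a standard t-table).
T_CRIT = 2.201


def t_value(coef, se):
    """t-statistic for H0: beta_i = 0, i.e. the estimate over its standard error."""
    return coef / se


def summarize(coefs=COEFS, t_crit=T_CRIT):
    """Return {term: (t-value rounded to 2 decimals, significant at the 5% level?)}."""
    out = {}
    for term, (coef, se) in coefs.items():
        t = t_value(coef, se)
        out[term] = (round(t, 2), abs(t) > t_crit)
    return out


if __name__ == "__main__":
    for term, (t, significant) in summarize().items():
        print(f"{term:<9} t = {t:6.2f}  significant at 5%: {significant}")
```

Comparing |t| with the critical value gives the same decisions as comparing the Minitab p-values with 0.05.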
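Eg: Suppose in the housing example that, before seeing the data, we had a hypothesis that every hundred square feet of house size adds an average of $10,000 to the selling price (\(\beta_1 = 10\)). To test \(H_0: \beta_1 = 10\) versus \(H_A: \beta_1 \neq 10\) at level 0.05, we form the t-statistic

\[ t = \frac{\hat\beta_1 - 10}{s_{\hat\beta_1}} = \frac{41.46 - 10}{7.51} = 4.19 . \]

Since this exceeds the critical value 2.201 (11 degrees of freedom), we reject \(H_0\) at level 0.05.

Next, let's construct a 95% confidence interval for \(\beta_1\). From the Minitab output, we have \(s_{\hat\beta_1} = 7.51\).

The confidence interval is

\[ \hat\beta_1 \pm t_{\alpha/2}\, s_{\hat\beta_1} = 41.46 \pm 2.201\,(7.51) = (24.9,\ 58.0) . \]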
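The test of \(\beta_1 = 10\) and the interval for \(\beta_1\) can be checked numerically. This is a minimal sketch using only the values quoted from the Minitab output; the function and variable names are invented for the illustration, and the critical value 2.201 is the one used in the interval above.

```python
B1_HAT = 41.46   # estimated coefficient of Size (Minitab "Coef")
SE_B1 = 7.51     # its standard error (Minitab "SE Coef")
T_CRIT = 2.201   # two-tailed 5% critical value, 11 degrees of freedom


def t_statistic(estimate, null_value, se):
    """t-statistic for H0: beta = null_value."""
    return (estimate - null_value) / se


def confidence_interval(estimate, se, t_crit):
    """Interval estimate +/- t_crit * se, returned as (lower, upper)."""
    half_width = t_crit * se
    return estimate - half_width, estimate + half_width


t = t_statistic(B1_HAT, 10.0, SE_B1)
lower, upper = confidence_interval(B1_HAT, SE_B1, T_CRIT)

print(f"t = {t:.2f}, reject H0 at 5%: {abs(t) > T_CRIT}")
print(f"95% CI for beta_1: ({lower:.1f}, {upper:.1f})")
```

The hypothesized value 10 lies outside the interval, which agrees with the two-tailed test rejecting at the 5% level.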
    Fit   SE Fit              95% CI               95% PI
1128.14  35.7870  (1049.37, 1206.90)  (957.176, 1299.10)

The 95 percent confidence interval for the mean selling price is ($1,049,370, $1,206,900) and the 95 percent prediction interval for the price of the house is ($957,176, $1,299,100).
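Both intervals can be rebuilt from the Fit, SE Fit and s values in the output. The sketch below assumes the usual textbook formulas, which are not stated in these notes: the confidence interval uses SE Fit directly, while the prediction interval uses \(\sqrt{s^2 + (\text{SE Fit})^2}\) to account for the extra variability of a single house. The names are illustrative only.

```python
import math

FIT = 1128.14      # fitted value, in thousands of dollars
SE_FIT = 35.7870   # standard error of the fitted mean
S = 68.9399        # residual standard deviation from the Model Summary
T_CRIT = 2.201     # two-tailed 5% critical value, 11 degrees of freedom


def mean_ci(fit, se_fit, t_crit):
    """Confidence interval for the mean response at the given predictor values."""
    half_width = t_crit * se_fit
    return fit - half_width, fit + half_width


def prediction_interval(fit, se_fit, s, t_crit):
    """Prediction interval for one new observation (assumed standard formula)."""
    se_pred = math.sqrt(s ** 2 + se_fit ** 2)
    half_width = t_crit * se_pred
    return fit - half_width, fit + half_width


ci = mean_ci(FIT, SE_FIT, T_CRIT)
pi = prediction_interval(FIT, SE_FIT, S, T_CRIT)

print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI: ({pi[0]:.2f}, {pi[1]:.2f})")
```

The prediction interval is wider because it adds the error variance of an individual sale to the uncertainty in the estimated mean.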
Estimating the Error Variance, \(\sigma^2\)

The error variance \(\sigma^2\) measures the dispersion of the data points from the true response surface.

We can estimate \(\sigma^2\) without bias by \(s^2 = SSE/(n-k-1)\).

The value of s (an estimate of \(\sigma\)) is given by Minitab.

For the housing example, we get s = 68.94.

Multiple Coefficient of Determination, \(R^2\)

Just as in simple regression, we have

\[ SST = SSR + SSE , \]

where \(SST = \sum_{i=1}^{n} (y_i - \bar{y})^2\) is the total sum of squares,
\(SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2\) is the regression sum of squares, and
\(SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2\) is the residual sum of squares.
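The estimate of the error variance and the sum-of-squares decomposition can be verified from the "SS" column of the Minitab Analysis of Variance table. This is a small sketch with values copied from that table; the names are invented for the example, and small differences from Minitab's s arise because the printed sums of squares are rounded.

```python
import math

SSR = 570744.0   # regression sum of squares
SSE = 52280.0    # residual (error) sum of squares
SST = 623024.0   # total sum of squares
N, K = 15, 3     # observations and explanatory variables


def error_variance(sse, n, k):
    """Unbiased estimate s^2 = SSE / (n - k - 1) of the error variance."""
    return sse / (n - k - 1)


s_squared = error_variance(SSE, N, K)
s = math.sqrt(s_squared)

print(f"SSR + SSE = {SSR + SSE:.0f}, SST = {SST:.0f}")
print(f"s^2 = {s_squared:.0f}, s = {s:.2f}")
```

The value of s^2 matches the Error row of the MS column, and its square root matches S in the Model Summary.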
Analysis of Variance

Source      DF      SS      MS  F-Value  P-Value
Regression   3  570744  190248    40.03    0.000
Error       11   52280    4753
Total       14  623024

Model Summary

      S    R-sq  R-sq(adj)  R-sq(pred)
68.9399  91.61%     89.32%      87.66%

Coefficients

Term        Coef  SE Coef  T-Value  P-Value
Constant    -161      191    -0.84    0.418
Size       41.46     7.51     5.52    0.000
Age        -2.36     8.81    -0.27    0.794
Lot Size   48.31     9.01     5.36    0.000

In Minitab, we can read the values of SSR, SSE and SST from the "SS" column of the "Analysis of Variance" table.

In Minitab, \(R^2\) is denoted by R-Sq.

For the housing data, we have \(R^2\) = 91.61%.

A value of \(R^2\) close to 1 is generally considered to imply that the model is good. We have to be somewhat careful, however, since \(R^2\) never goes down, and almost always goes up, whenever we include a new variable.

The "Adjusted \(R^2\)" and "Prediction \(R^2\)" avoid this problem, but in my opinion they are not much more useful than \(R^2\) itself for deciding which variables to use in the model.
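The R-sq and R-sq(adj) figures can be recomputed from the sums of squares. The sketch below assumes the standard definitions, which these notes do not spell out: \(R^2 = SSR/SST\) and adjusted \(R^2 = 1 - \frac{SSE/(n-k-1)}{SST/(n-1)}\). The function names are illustrative only, and R-sq(pred) is omitted because it needs the raw data.

```python
SSR = 570744.0   # regression sum of squares
SSE = 52280.0    # residual sum of squares
SST = 623024.0   # total sum of squares
N, K = 15, 3     # observations and explanatory variables


def r_squared(ssr, sst):
    """Proportion of total variation explained by the regression."""
    return ssr / sst


def adjusted_r_squared(sse, sst, n, k):
    """R^2 penalized for the number of explanatory variables (assumed standard formula)."""
    return 1.0 - (sse / (n - k - 1)) / (sst / (n - 1))


r2 = r_squared(SSR, SST)
r2_adj = adjusted_r_squared(SSE, SST, N, K)

print(f"R-sq      = {100 * r2:.2f}%")
print(f"R-sq(adj) = {100 * r2_adj:.2f}%")
```

Unlike \(R^2\), the adjusted version divides each sum of squares by its degrees of freedom, so it can fall when an added variable contributes little.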