$$E(Y \mid x) = \mu_{Y|x} = \beta_0 + \beta_1 x$$
where the slope and intercept of the line are called regression
coefficients.
Suppose that the mean and variance of ε are 0 and σ², respectively;
then
$$E(Y \mid x) = E(\beta_0 + \beta_1 x + \epsilon) = \beta_0 + \beta_1 x + E(\epsilon) = \beta_0 + \beta_1 x$$
$$V(Y \mid x) = V(\beta_0 + \beta_1 x + \epsilon) = V(\beta_0 + \beta_1 x) + V(\epsilon) = 0 + \sigma^2 = \sigma^2$$
• The true regression model is a line of mean values:
$$\mu_{Y|x} = \beta_0 + \beta_1 x$$
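As a quick numerical check of the mean and variance results above, the sketch below simulates many responses at one fixed x and compares the sample mean and variance with β0 + β1x and σ². The parameter values and the choice of normally distributed errors are assumptions made only for illustration; the derivation itself needs only E(ε) = 0 and V(ε) = σ².

```python
import random
import statistics


def simulate_responses(beta0, beta1, x, sigma, n_draws, seed=0):
    """Draw n_draws values of Y = beta0 + beta1*x + eps at a fixed x.

    eps is drawn from a normal distribution with mean 0 and standard
    deviation sigma (an assumption for this illustration).
    """
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for _ in range(n_draws)]


# Hypothetical parameter values, chosen only for illustration.
beta0, beta1, sigma, x = 2.0, 0.5, 1.5, 4.0
ys = simulate_responses(beta0, beta1, x, sigma, n_draws=100_000)

sample_mean = statistics.fmean(ys)
sample_var = statistics.variance(ys)

print(f"mean of Y at x={x}: {sample_mean:.3f} (theory {beta0 + beta1 * x:.3f})")
print(f"variance of Y at x={x}: {sample_var:.3f} (theory {sigma ** 2:.3f})")
```

The variance does not depend on x, so repeating the run with a different x should shift the mean along the line but leave the spread essentially unchanged.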
Figure 11-3 Deviations of the data from the estimated regression model.
• Using Equation 11-2, the n observations in the sample can be
expressed as
$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad i = 1, 2, \ldots, n$$
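The least squares estimators of β0 and β1, say β̂0 and β̂1, must
satisfy
$$\left.\frac{\partial L}{\partial \beta_0}\right|_{\hat\beta_0,\hat\beta_1} = -2\sum_{i=1}^{n}\left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right) = 0$$
$$\left.\frac{\partial L}{\partial \beta_1}\right|_{\hat\beta_0,\hat\beta_1} = -2\sum_{i=1}^{n}\left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right)x_i = 0$$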
Simplifying these two equations yields
$$n\hat\beta_0 + \hat\beta_1\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$
$$\hat\beta_0\sum_{i=1}^{n} x_i + \hat\beta_1\sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} y_i x_i$$
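The two normal equations form a 2×2 linear system in β̂0 and β̂1. The sketch below solves it with Cramer's rule; the function name `solve_normal_equations` and the small data set are hypothetical, chosen so that the points lie exactly on y = 1 + 2x and the answer is easy to check by hand.

```python
def solve_normal_equations(xs, ys):
    """Solve the least squares normal equations for (beta0_hat, beta1_hat).

    The system is
        n*b0      + sum(x)*b1   = sum(y)
        sum(x)*b0 + sum(x^2)*b1 = sum(x*y)
    and is solved here with Cramer's rule.
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    det = n * sum_x2 - sum_x ** 2
    if det == 0:
        raise ValueError("all x values are equal; the slope is not identifiable")

    beta0_hat = (sum_y * sum_x2 - sum_x * sum_xy) / det
    beta1_hat = (n * sum_xy - sum_x * sum_y) / det
    return beta0_hat, beta1_hat


# Hypothetical data lying exactly on the line y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
b0, b1 = solve_normal_equations(xs, ys)
print(f"beta0_hat = {b0}, beta1_hat = {b1}")

# The normal equations say the residuals sum to zero and are orthogonal to x.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
```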
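The fitted or estimated regression line is therefore
$$\hat y = \hat\beta_0 + \hat\beta_1 x$$
Note that each pair of observations satisfies the relationship
$$y_i = \hat\beta_0 + \hat\beta_1 x_i + e_i, \quad i = 1, 2, \ldots, n$$
where $e_i = y_i - \hat y_i$ is called the residual.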
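The following sums of squares and cross products are used in what follows:
$$S_{xx} = \sum_{i=1}^{n}(x_i - \bar x)^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$$
$$S_{xy} = \sum_{i=1}^{n} y_i(x_i - \bar x) = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$$
$$SS_T = S_{yy} = \sum_{i=1}^{n}(y_i - \bar y)^2 = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$$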
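Each of these quantities has a definitional form (deviations from the mean) and a computational shortcut (raw sums). The sketch below computes both and confirms that they agree; the function names and the five data points are hypothetical, made up for illustration.

```python
def corrected_sums(xs, ys):
    """Return (S_xx, S_xy, SS_T) from the definitional forms."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum(y * (x - x_bar) for x, y in zip(xs, ys))
    ss_t = sum((y - y_bar) ** 2 for y in ys)
    return s_xx, s_xy, ss_t


def corrected_sums_shortcut(xs, ys):
    """Return (S_xx, S_xy, SS_T) from the raw-sum shortcut forms."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    ss_t = sum(y * y for y in ys) - sum_y ** 2 / n
    return s_xx, s_xy, ss_t


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

definitional = corrected_sums(xs, ys)
shortcut = corrected_sums_shortcut(xs, ys)
print("definitional:", definitional)
print("shortcut:    ", shortcut)
```

The shortcut forms need only running totals, which is why worked examples usually start from a list of sums; with large values and little spread they can lose precision to cancellation, so the definitional forms are the safer choice in code.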
Example 11-1
We will fit a simple linear regression model to the oxygen purity data in Table
11–1. The following quantities may be computed:
$$n = 20, \quad \sum_{i=1}^{20} x_i = 23.92, \quad \sum_{i=1}^{20} y_i = 1{,}843.21, \quad \bar x = 1.1960, \quad \bar y = 92.1605$$
$$\sum_{i=1}^{20} y_i^2 = 170{,}044.5321, \quad \sum_{i=1}^{20} x_i^2 = 29.2892, \quad \sum_{i=1}^{20} x_i y_i = 2{,}214.6566$$
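$$S_{xx} = \sum_{i=1}^{20} x_i^2 - \frac{\left(\sum_{i=1}^{20} x_i\right)^2}{20} = 29.2892 - \frac{(23.92)^2}{20} = 0.68088$$
$$S_{xy} = \sum_{i=1}^{20} x_i y_i - \frac{\left(\sum_{i=1}^{20} x_i\right)\left(\sum_{i=1}^{20} y_i\right)}{20} = 2{,}214.6566 - \frac{(23.92)(1{,}843.21)}{20} = 10.17744$$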
Therefore, the least squares estimates of the slope and intercept are
$$\hat\beta_1 = \frac{S_{xy}}{S_{xx}} = \frac{10.17744}{0.68088} = 14.94748$$
and
$$\hat\beta_0 = \bar y - \hat\beta_1 \bar x = 92.1605 - (14.94748)(1.196) = 74.28331$$
The fitted simple linear regression model (with the coefficients reported to
three decimal places) is
$$\hat y = 74.283 + 14.947x$$
This model is plotted in Fig. 11–4, along with the sample data.
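The arithmetic in this example can be reproduced directly from the summary sums quoted above. The sketch below is only a check of those numbers; the variable names are my own, and the raw data of Table 11–1 are not needed.

```python
# Summary sums quoted in the example (oxygen purity data, n = 20).
n = 20
sum_x, sum_y = 23.92, 1843.21
sum_x2, sum_xy = 29.2892, 2214.6566

# Corrected sum of squares and cross products.
s_xx = sum_x2 - sum_x ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Least squares estimates of the slope and intercept.
beta1_hat = s_xy / s_xx
beta0_hat = sum_y / n - beta1_hat * (sum_x / n)

print(f"S_xx = {s_xx:.5f}, S_xy = {s_xy:.5f}")
print(f"beta1_hat = {beta1_hat:.5f}, beta0_hat = {beta0_hat:.5f}")


def predict(x):
    """Fitted mean oxygen purity at hydrocarbon level x."""
    return beta0_hat + beta1_hat * x
```

For instance, `predict(1.196)` returns the sample mean of y, because the fitted line always passes through the point (x̄, ȳ).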
Figure 11-4 Scatter plot of oxygen purity y versus hydrocarbon level x and regression model ŷ = 74.283 + 14.947x.
Computer software programs are widely used in regression modeling. Table
11–2 shows a portion of the output from Minitab for this problem. The
estimates β̂0 and β̂1 are highlighted.
Estimating σ²
The error sum of squares is
$$SS_E = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}\left(y_i - \hat y_i\right)^2$$
An unbiased estimator of σ² is
$$\hat\sigma^2 = \frac{SS_E}{n - 2}$$
where SS_E can be computed from
$$SS_E = SS_T - \hat\beta_1 S_{xy}$$
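Applied to the oxygen purity sums from the example, the shortcut SS_E = SS_T − β̂1 S_xy gives the variance estimate without computing any residuals. The sketch below works only from the quoted sums; the variable names are my own.

```python
# Summary sums quoted in the oxygen purity example (n = 20).
n = 20
sum_x, sum_y = 23.92, 1843.21
sum_x2, sum_y2, sum_xy = 29.2892, 170044.5321, 2214.6566

s_xx = sum_x2 - sum_x ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n
ss_t = sum_y2 - sum_y ** 2 / n          # total sum of squares, S_yy

beta1_hat = s_xy / s_xx
ss_e = ss_t - beta1_hat * s_xy          # error sum of squares
sigma2_hat = ss_e / (n - 2)             # unbiased estimate of sigma^2

print(f"SS_T = {ss_t:.4f}, SS_E = {ss_e:.4f}, sigma2_hat = {sigma2_hat:.4f}")
```

The divisor is n − 2 rather than n because two parameters, β0 and β1, were estimated before the residuals could be formed.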
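Slope Properties:
$$E(\hat\beta_1) = \beta_1, \qquad V(\hat\beta_1) = \frac{\sigma^2}{S_{xx}}$$
Intercept Properties:
$$E(\hat\beta_0) = \beta_0, \qquad V(\hat\beta_0) = \sigma^2\left[\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right]$$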
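Estimated Standard Errors
In simple linear regression the estimated standard error of the slope and
the estimated standard error of the intercept are
$$\operatorname{se}(\hat\beta_1) = \sqrt{\frac{\hat\sigma^2}{S_{xx}}} \qquad \text{and} \qquad \operatorname{se}(\hat\beta_0) = \sqrt{\hat\sigma^2\left[\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\right]}$$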
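To make the two standard error formulas concrete, the sketch below evaluates them for the oxygen purity example, again starting from the quoted summary sums. The variable names are my own.

```python
import math

# Summary sums quoted in the oxygen purity example (n = 20).
n = 20
sum_x, sum_y = 23.92, 1843.21
sum_x2, sum_y2, sum_xy = 29.2892, 170044.5321, 2214.6566

x_bar = sum_x / n
s_xx = sum_x2 - sum_x ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n
ss_t = sum_y2 - sum_y ** 2 / n

beta1_hat = s_xy / s_xx
sigma2_hat = (ss_t - beta1_hat * s_xy) / (n - 2)

# Estimated standard errors of the slope and the intercept.
se_beta1 = math.sqrt(sigma2_hat / s_xx)
se_beta0 = math.sqrt(sigma2_hat * (1 / n + x_bar ** 2 / s_xx))

print(f"se(beta1_hat) = {se_beta1:.3f}")
print(f"se(beta0_hat) = {se_beta0:.3f}")
```

Both standard errors shrink as S_xx grows, so spreading the x values more widely gives more precise estimates for the same σ².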
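11.4.1 Use of t-Tests
To test H0: β0 = β0,0 against H1: β0 ≠ β0,0, the test statistic is
$$T_0 = \frac{\hat\beta_0 - \beta_{0,0}}{\sqrt{\hat\sigma^2\left[\dfrac{1}{n} + \dfrac{\bar x^2}{S_{xx}}\right]}} = \frac{\hat\beta_0 - \beta_{0,0}}{\operatorname{se}(\hat\beta_0)}$$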
We would reject the null hypothesis if
$$|t_0| > t_{\alpha/2,\,n-2}$$
We will test for significance of regression using the model for the oxygen
purity data from Example 11-1. The hypotheses are
$$H_0: \beta_1 = 0$$
$$H_1: \beta_1 \neq 0$$
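and we will use α = 0.01. From Example 11-1 and Table 11-2 we have
$$\hat\beta_1 = 14.947, \quad n = 20, \quad S_{xx} = 0.68088, \quad \hat\sigma^2 = 1.18$$
so the t-statistic becomes
$$t_0 = \frac{\hat\beta_1}{\sqrt{\hat\sigma^2 / S_{xx}}} = \frac{\hat\beta_1}{\operatorname{se}(\hat\beta_1)} = \frac{14.947}{\sqrt{1.18/0.68088}} = 11.35$$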
Practical Interpretation:
Since the reference value of t is $t_{0.005,18} = 2.88$, the value of the test
statistic is very far into the critical region, implying that H0: β1 = 0 should
be rejected.
There is strong evidence that β1 ≠ 0, that is, that oxygen purity is linearly
related to hydrocarbon level.
The P-value for this test is P ≅ 1.23 × 10⁻⁹. This was obtained manually with a
calculator.
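The test statistic and P-value can also be checked numerically. The sketch below recomputes t0 for the slope from the Example 11-1 summary sums, using β̂1 divided by its estimated standard error, and approximates the two-sided P-value by integrating the Student t density with Simpson's rule. The helper `t_upper_tail` and the substitution u = 1/t are my own choices for a standard-library-only illustration, not the method used in the source; a statistics package would normally be used instead.

```python
import math


def t_upper_tail(t0, df, steps=2000):
    """Approximate P(T > t0) for Student's t with df degrees of freedom (t0 > 0).

    The substitution u = 1/t maps [t0, inf) onto (0, 1/t0]; the transformed
    integrand is smooth, so composite Simpson's rule (steps even) works well.
    """
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    c = math.exp(log_c)  # normalizing constant of the t density

    def g(u):
        # t density at 1/u times the Jacobian 1/u^2, simplified algebraically
        return (c * df ** ((df + 1) / 2) * u ** (df - 1)
                * (1 + df * u * u) ** (-(df + 1) / 2))

    upper = 1.0 / t0
    h = upper / steps
    total = g(0.0) + g(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * g(k * h)
    return total * h / 3


# Summary sums quoted in Example 11-1 (n = 20).
n = 20
sum_x, sum_y = 23.92, 1843.21
sum_x2, sum_y2, sum_xy = 29.2892, 170044.5321, 2214.6566

s_xx = sum_x2 - sum_x ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n
ss_t = sum_y2 - sum_y ** 2 / n
beta1_hat = s_xy / s_xx
sigma2_hat = (ss_t - beta1_hat * s_xy) / (n - 2)

t0 = beta1_hat / math.sqrt(sigma2_hat / s_xx)   # test statistic for H0: beta1 = 0
p_value = 2 * t_upper_tail(abs(t0), n - 2)      # two-sided P-value

critical_value = 2.88  # t_{0.005,18}, as quoted in the text
print(f"t0 = {t0:.2f}, reject H0 at alpha = 0.01: {abs(t0) > critical_value}")
print(f"P-value = {p_value:.3g}")
```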