11-1 to 11-4

• Many problems in engineering and science involve exploring the relationships between two or more variables.
• Regression analysis is a statistical technique that is very useful for these types of problems.
• For example, in a chemical process, suppose that the yield of the product is related to the process-operating temperature.
• Regression analysis can be used to build a model to predict yield at a given temperature level.
Table 11-1 Oxygen and Hydrocarbon Levels

Observation  Hydrocarbon  Purity     Observation  Hydrocarbon  Purity
Number       Level x (%)  y (%)      Number       Level x (%)  y (%)
1            0.99         90.01      11           1.19         93.54
2            1.02         89.05      12           1.15         92.52
3            1.15         91.43      13           0.98         90.56
4            1.29         93.74      14           1.01         89.54
5            1.46         96.73      15           1.11         89.85
6            1.36         94.45      16           1.20         90.39
7            0.87         87.59      17           1.26         93.25
8            1.23         91.77      18           1.32         93.41
9            1.55         99.42      19           1.43         94.98
10           1.40         93.65      20           0.95         87.33

Figure 11-1 Scatter diagram of oxygen purity versus hydrocarbon level from Table 11-1.
Based on the scatter diagram, it is probably reasonable to assume that the mean of the random variable Y is related to x by the following straight-line relationship:
$$E(Y \mid x) = \mu_{Y \mid x} = \beta_0 + \beta_1 x$$
where the slope $\beta_1$ and intercept $\beta_0$ of the line are called regression coefficients.
The simple linear regression model is given by
$$Y = \beta_0 + \beta_1 x + \varepsilon$$
where $\varepsilon$ is the random error term.
We think of the regression model as an empirical model.
Suppose that the mean and variance of $\varepsilon$ are 0 and $\sigma^2$, respectively. Then
$$E(Y \mid x) = E(\beta_0 + \beta_1 x + \varepsilon) = \beta_0 + \beta_1 x + E(\varepsilon) = \beta_0 + \beta_1 x$$
The variance of Y given x is
$$V(Y \mid x) = V(\beta_0 + \beta_1 x + \varepsilon) = V(\beta_0 + \beta_1 x) + V(\varepsilon) = 0 + \sigma^2 = \sigma^2$$
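The two results above can be checked numerically. The following is a minimal simulation sketch, not part of the original slides: the parameter values `BETA0`, `BETA1` and `SIGMA` are assumed for illustration, and the errors are drawn from a normal distribution, which is a stronger assumption than the text makes (it only requires mean 0 and variance $\sigma^2$).

```python
import random
import statistics

# Assumed, illustrative parameter values (not taken from the text).
BETA0, BETA1, SIGMA = 74.0, 15.0, 1.0

def simulate_y(x, n_draws, rng):
    """Draw n_draws values of Y = BETA0 + BETA1*x + eps, with eps ~ N(0, SIGMA**2)."""
    return [BETA0 + BETA1 * x + rng.gauss(0.0, SIGMA) for _ in range(n_draws)]

rng = random.Random(0)  # fixed seed so repeated runs give the same draws
for x in (1.0, 1.25, 1.5):
    ys = simulate_y(x, 50_000, rng)
    print(f"x = {x:.2f}: line gives {BETA0 + BETA1 * x:.2f}, "
          f"sample mean {statistics.fmean(ys):.2f}, "
          f"sample variance {statistics.variance(ys):.3f}")
```

At each x the sample mean should sit close to the value on the line, while the sample variance should stay close to `SIGMA**2` regardless of x.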
• The true regression model is a line of mean values:
$$\mu_{Y \mid x} = \beta_0 + \beta_1 x$$
where $\beta_1$ can be interpreted as the change in the mean of Y for a unit change in x.
• Also, the variability of Y at a particular value of x is determined by the error variance, $\sigma^2$.
• This implies there is a distribution of Y-values at each x and that the variance of this distribution is the same at each x.
Figure 11-2 The distribution of Y for a given value of x for the oxygen purity-hydrocarbon data.
• The case of simple linear regression considers a single regressor or predictor x and a dependent or response variable Y.
• The observation Y at each level of x is a random variable, with expected value
$$E(Y \mid x) = \beta_0 + \beta_1 x$$
• We assume that each observation, Y, can be described by the model
$$Y = \beta_0 + \beta_1 x + \varepsilon$$
• Suppose that we have n pairs of observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.
• The method of least squares is used to estimate the parameters $\beta_0$ and $\beta_1$ by minimizing the sum of the squares of the vertical deviations in Figure 11-3.
Figure 11-3 Deviations of the data from the estimated regression model.
• Using Equation 11-2, the n observations in the sample can be expressed as
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, 2, \ldots, n$$
• The sum of the squares of the deviations of the observations from the true regression line is
$$L = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$
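To make the criterion L concrete, the sketch below evaluates it on the Table 11-1 data for a few candidate lines. The function name `sum_of_squares` and the trial values of the intercept and slope are hypothetical, chosen only for illustration.

```python
# Table 11-1 data: hydrocarbon level x (%) and oxygen purity y (%).
X = [0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40,
     1.19, 1.15, 0.98, 1.01, 1.11, 1.20, 1.26, 1.32, 1.43, 0.95]
Y = [90.01, 89.05, 91.43, 93.74, 96.73, 94.45, 87.59, 91.77, 99.42, 93.65,
     93.54, 92.52, 90.56, 89.54, 89.85, 90.39, 93.25, 93.41, 94.98, 87.33]

def sum_of_squares(b0, b1, xs=X, ys=Y):
    """L(b0, b1): sum of squared vertical deviations from the line b0 + b1*x."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))

# Hypothetical trial lines; least squares picks the (b0, b1) with the smallest L.
for b0, b1 in [(70.0, 18.0), (74.0, 15.0), (78.0, 12.0)]:
    print(f"b0 = {b0:5.1f}, b1 = {b1:5.1f} -> L = {sum_of_squares(b0, b1):8.3f}")
```

Least squares treats L as a function of the two candidate coefficients and looks for the pair that makes it as small as possible.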
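The least squares estimators of $\beta_0$ and $\beta_1$, say $\hat{\beta}_0$ and $\hat{\beta}_1$, must satisfy
$$\left.\frac{\partial L}{\partial \beta_0}\right|_{\hat{\beta}_0, \hat{\beta}_1} = -2 \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$$
$$\left.\frac{\partial L}{\partial \beta_1}\right|_{\hat{\beta}_0, \hat{\beta}_1} = -2 \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)\, x_i = 0$$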
Simplifying these two equations yields
$$n \hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$
$$\hat{\beta}_0 \sum_{i=1}^{n} x_i + \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} y_i x_i$$
These equations are called the least squares normal equations. The solution to the normal equations results in the least squares estimators $\hat{\beta}_0$ and $\hat{\beta}_1$.
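The normal equations can be solved by elimination. The steps below are a sketch of that algebra, added here for completeness and using only the symbols already defined:

```latex
\begin{align*}
\hat{\beta}_0
  &= \frac{1}{n}\sum_{i=1}^{n} y_i - \hat{\beta}_1 \frac{1}{n}\sum_{i=1}^{n} x_i
  && \text{(first normal equation divided by } n\text{)} \\
\left( \frac{1}{n}\sum_{i=1}^{n} y_i - \hat{\beta}_1 \frac{1}{n}\sum_{i=1}^{n} x_i \right)
  \sum_{i=1}^{n} x_i + \hat{\beta}_1 \sum_{i=1}^{n} x_i^2
  &= \sum_{i=1}^{n} y_i x_i
  && \text{(substituted into the second)} \\
\hat{\beta}_1 \left[ \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} \right]
  &= \sum_{i=1}^{n} y_i x_i
     - \frac{\left(\sum_{i=1}^{n} y_i\right)\left(\sum_{i=1}^{n} x_i\right)}{n}
  && \text{(terms in } \hat{\beta}_1 \text{ collected)} \\
\hat{\beta}_1
  &= \frac{\displaystyle \sum_{i=1}^{n} y_i x_i
       - \frac{\left(\sum_{i=1}^{n} y_i\right)\left(\sum_{i=1}^{n} x_i\right)}{n}}
          {\displaystyle \sum_{i=1}^{n} x_i^2
       - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}
\end{align*}
```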
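Least Squares Estimates

The least squares estimates of the intercept and slope in the simple linear regression model are
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
$$\hat{\beta}_1 = \frac{\displaystyle \sum_{i=1}^{n} y_i x_i - \frac{\left(\sum_{i=1}^{n} y_i\right)\left(\sum_{i=1}^{n} x_i\right)}{n}}{\displaystyle \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}$$
where $\bar{y} = (1/n) \sum_{i=1}^{n} y_i$ and $\bar{x} = (1/n) \sum_{i=1}^{n} x_i$.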
The fitted or estimated regression line is therefore
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$
Note that each pair of observations satisfies the relationship
$$y_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + e_i, \quad i = 1, 2, \ldots, n$$
where $e_i = y_i - \hat{y}_i$ is called the residual. The residual describes the error in the fit of the model to the ith observation $y_i$.
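Because the estimates satisfy the normal equations, the residuals sum to zero and are orthogonal to the $x_i$. The sketch below checks both properties on the Table 11-1 data; the helper names `fit_line` and `residuals` are hypothetical.

```python
# Table 11-1 data: hydrocarbon level x (%) and oxygen purity y (%).
X = [0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40,
     1.19, 1.15, 0.98, 1.01, 1.11, 1.20, 1.26, 1.32, 1.43, 0.95]
Y = [90.01, 89.05, 91.43, 93.74, 96.73, 94.45, 87.59, 91.77, 99.42, 93.65,
     93.54, 92.52, 90.56, 89.54, 89.85, 90.39, 93.25, 93.41, 94.98, 87.33]

def fit_line(xs, ys):
    """Return (b0, b1) from the closed-form least squares estimates."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1

def residuals(xs, ys, b0, b1):
    """e_i = y_i - y_hat_i for every observation."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

b0, b1 = fit_line(X, Y)
e = residuals(X, Y, b0, b1)
print("sum of residuals:       ", sum(e))                               # ~0 up to rounding
print("sum of x_i * residual_i:", sum(x * ei for x, ei in zip(X, e)))   # ~0 up to rounding
```

Both printed sums are the two normal equations rewritten in terms of $e_i$, so they should vanish up to floating-point rounding.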
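Notation
$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$$
$$S_{xy} = \sum_{i=1}^{n} y_i (x_i - \bar{x}) = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$$
$$SS_T = S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$$
With this notation, the slope estimate can be written as $\hat{\beta}_1 = S_{xy} / S_{xx}$.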
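Each quantity in the notation has a definitional form (sums of deviations from the mean) and a computational shortcut (raw sums). The sketch below evaluates both forms on the Table 11-1 data to show that they agree; the variable names are chosen for illustration.

```python
import math

# Table 11-1 data: hydrocarbon level x (%) and oxygen purity y (%).
X = [0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40,
     1.19, 1.15, 0.98, 1.01, 1.11, 1.20, 1.26, 1.32, 1.43, 0.95]
Y = [90.01, 89.05, 91.43, 93.74, 96.73, 94.45, 87.59, 91.77, 99.42, 93.65,
     93.54, 92.52, 90.56, 89.54, 89.85, 90.39, 93.25, 93.41, 94.98, 87.33]

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n

# Definitional forms: deviations from the sample means.
sxx_def = sum((x - x_bar) ** 2 for x in X)
sxy_def = sum(y * (x - x_bar) for x, y in zip(X, Y))
syy_def = sum((y - y_bar) ** 2 for y in Y)

# Computational shortcuts: raw sums only.
sxx_short = sum(x * x for x in X) - sum(X) ** 2 / n
sxy_short = sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y) / n
syy_short = sum(y * y for y in Y) - sum(Y) ** 2 / n

for name, a, b in [("Sxx", sxx_def, sxx_short),
                   ("Sxy", sxy_def, sxy_short),
                   ("Syy", syy_def, syy_short)]:
    print(f"{name}: definition = {a:.5f}, shortcut = {b:.5f}, "
          f"agree = {math.isclose(a, b, rel_tol=1e-9)}")
```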
Example 11-1
We will fit a simple linear regression model to the oxygen purity data in Table 11-1. The following quantities may be computed:
$$n = 20, \quad \sum_{i=1}^{20} x_i = 23.92, \quad \sum_{i=1}^{20} y_i = 1{,}843.21, \quad \bar{x} = 1.1960, \quad \bar{y} = 92.1605$$
$$\sum_{i=1}^{20} y_i^2 = 170{,}044.5321, \quad \sum_{i=1}^{20} x_i^2 = 29.2892, \quad \sum_{i=1}^{20} x_i y_i = 2{,}214.6566$$
$$S_{xx} = \sum_{i=1}^{20} x_i^2 - \frac{\left(\sum_{i=1}^{20} x_i\right)^2}{20} = 29.2892 - \frac{(23.92)^2}{20} = 0.68088$$
and
$$S_{xy} = \sum_{i=1}^{20} x_i y_i - \frac{\left(\sum_{i=1}^{20} x_i\right)\left(\sum_{i=1}^{20} y_i\right)}{20} = 2{,}214.6566 - \frac{(23.92)(1{,}843.21)}{20} = 10.17744$$
Therefore, the least squares estimates of the slope and intercept are
$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{10.17744}{0.68088} = 14.94748$$
and
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 92.1605 - (14.94748)(1.196) = 74.28331$$
The fitted simple linear regression model (with the coefficients reported to three decimal places) is
$$\hat{y} = 74.283 + 14.947 x$$
This model is plotted in Figure 11-4, along with the sample data.
Figure 11-4 Scatter plot of oxygen purity y versus hydrocarbon level x and regression model $\hat{y} = 74.283 + 14.947x$.
Computer software programs are widely used in regression modeling. Table 11-2 shows a portion of the output from Minitab for this problem. The estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ are highlighted.
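As a small illustration of doing the same fit in software, the sketch below uses plain Python rather than Minitab. The function names `fit_simple_linear_regression` and `predict`, and the new hydrocarbon level `x_new = 1.00`, are assumptions made for this example.

```python
# Table 11-1 data: hydrocarbon level x (%) and oxygen purity y (%).
X = [0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40,
     1.19, 1.15, 0.98, 1.01, 1.11, 1.20, 1.26, 1.32, 1.43, 0.95]
Y = [90.01, 89.05, 91.43, 93.74, 96.73, 94.45, 87.59, 91.77, 99.42, 93.65,
     93.54, 92.52, 90.56, 89.54, 89.85, 90.39, 93.25, 93.41, 94.98, 87.33]

def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) using slope = Sxy / Sxx."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum(y * (x - x_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def predict(x, intercept, slope):
    """Point on the fitted line at hydrocarbon level x."""
    return intercept + slope * x

b0, b1 = fit_simple_linear_regression(X, Y)
print(f"fitted line: y_hat = {b0:.3f} + {b1:.3f} x")

x_new = 1.00  # hypothetical hydrocarbon level (%), chosen for illustration
print(f"predicted purity at x = {x_new:.2f}%: {predict(x_new, b0, b1):.2f}%")
```

The printed coefficients should agree with the hand calculation in the example to three decimal places.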
Estimating σ²

The error sum of squares is
$$SS_E = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
It can be shown that the expected value of the error sum of squares is
$$E(SS_E) = (n - 2)\sigma^2$$
An unbiased estimator of $\sigma^2$ is
$$\hat{\sigma}^2 = \frac{SS_E}{n - 2}$$
where $SS_E$ can be easily computed using
$$SS_E = SS_T - \hat{\beta}_1 S_{xy}$$
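The shortcut formula can be applied directly to the summary quantities from the oxygen purity example. The following is a minimal sketch; the function name `estimate_error_variance` is hypothetical, and the inputs are the rounded values quoted in the example.

```python
def estimate_error_variance(n, sum_y, sum_y2, beta1_hat, s_xy):
    """Return (SS_T, SS_E, sigma2_hat) using SS_E = SS_T - beta1_hat * S_xy."""
    ss_t = sum_y2 - sum_y ** 2 / n      # total sum of squares, S_yy
    ss_e = ss_t - beta1_hat * s_xy      # error sum of squares
    sigma2_hat = ss_e / (n - 2)         # unbiased estimator of sigma^2
    return ss_t, ss_e, sigma2_hat

# Summary quantities quoted in the oxygen purity example.
ss_t, ss_e, sigma2_hat = estimate_error_variance(
    n=20, sum_y=1843.21, sum_y2=170044.5321, beta1_hat=14.94748, s_xy=10.17744)

print(f"SS_T = {ss_t:.4f}")
print(f"SS_E = {ss_e:.4f}")
print(f"sigma^2 estimate = {sigma2_hat:.2f}")
```

The divisor is n - 2 rather than n because two parameters, the intercept and the slope, are estimated from the data.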
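Slope Properties:
$$E(\hat{\beta}_1) = \beta_1, \qquad V(\hat{\beta}_1) = \frac{\sigma^2}{S_{xx}}$$
Intercept Properties:
$$E(\hat{\beta}_0) = \beta_0, \qquad V(\hat{\beta}_0) = \sigma^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right]$$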
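Estimated Standard Errors

In simple linear regression the estimated standard error of the slope and the estimated standard error of the intercept are
$$se(\hat{\beta}_1) = \sqrt{\frac{\hat{\sigma}^2}{S_{xx}}} \qquad \text{and} \qquad se(\hat{\beta}_0) = \sqrt{\hat{\sigma}^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right]}$$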
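A short sketch of the two standard-error formulas, evaluated with the rounded summary values that this document quotes for the oxygen purity data. The function name `standard_errors` is hypothetical.

```python
import math

def standard_errors(sigma2_hat, n, x_bar, s_xx):
    """Return (se_slope, se_intercept) for simple linear regression."""
    se_slope = math.sqrt(sigma2_hat / s_xx)
    se_intercept = math.sqrt(sigma2_hat * (1.0 / n + x_bar ** 2 / s_xx))
    return se_slope, se_intercept

# Rounded values quoted in this document for the oxygen purity data.
se_b1, se_b0 = standard_errors(sigma2_hat=1.18, n=20, x_bar=1.1960, s_xx=0.68088)
print(f"se(slope)     = {se_b1:.3f}")
print(f"se(intercept) = {se_b0:.3f}")
```

Both standard errors shrink as $S_{xx}$ grows, so spreading the x values over a wider range gives more precise estimates.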
11.4.1 Use of t-Tests
Hypothesis Tests about the Slope
Suppose we wish to test
$$H_0: \beta_1 = \beta_{1,0}$$
$$H_1: \beta_1 \neq \beta_{1,0}$$
An appropriate test statistic would be
$$T_0 = \frac{\hat{\beta}_1 - \beta_{1,0}}{\sqrt{\hat{\sigma}^2 / S_{xx}}}$$
The test statistic could also be written as:
$$T_0 = \frac{\hat{\beta}_1 - \beta_{1,0}}{se(\hat{\beta}_1)}$$
We would reject the null hypothesis if
$$|t_0| > t_{\alpha/2,\, n-2}$$
Hypothesis Tests about the Intercept
Suppose we wish to test
$$H_0: \beta_0 = \beta_{0,0}$$
$$H_1: \beta_0 \neq \beta_{0,0}$$
An appropriate test statistic would be
$$T_0 = \frac{\hat{\beta}_0 - \beta_{0,0}}{\sqrt{\hat{\sigma}^2 \left[ \dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}} \right]}} = \frac{\hat{\beta}_0 - \beta_{0,0}}{se(\hat{\beta}_0)}$$
We would reject the null hypothesis if
$$|t_0| > t_{\alpha/2,\, n-2}$$
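Both tests share the same form: an estimate minus its hypothesized value, divided by a standard error, compared against a t critical value. The sketch below applies that form to the intercept of the oxygen purity model. The hypothesized value `beta0_null = 0.0` is an assumption made for illustration, the function names are hypothetical, and the critical value 2.88 is the $t_{0.005,18}$ quoted in this document for α = 0.01.

```python
import math

def t_statistic(estimate, hypothesized, std_error):
    """T0 = (estimate - hypothesized value) / standard error."""
    return (estimate - hypothesized) / std_error

def reject_two_sided(t0, t_critical):
    """Two-sided rule: reject H0 when |t0| exceeds the critical value."""
    return abs(t0) > t_critical

# Rounded values quoted in this document for the oxygen purity data.
sigma2_hat, n, x_bar, s_xx = 1.18, 20, 1.1960, 0.68088
beta0_hat = 74.283
se_b0 = math.sqrt(sigma2_hat * (1.0 / n + x_bar ** 2 / s_xx))

beta0_null = 0.0   # assumed hypothesized intercept, for illustration only
t_crit = 2.88      # t_{0.005, 18}, i.e. alpha = 0.01 with n - 2 = 18

t0 = t_statistic(beta0_hat, beta0_null, se_b0)
print(f"t0 = {t0:.2f}, reject H0: {reject_two_sided(t0, t_crit)}")
```

The same two functions work for the slope by passing the slope estimate, its hypothesized value, and its standard error.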
An important special case of the hypotheses of Equation 11-18 is
$$H_0: \beta_1 = 0$$
$$H_1: \beta_1 \neq 0$$
These hypotheses relate to the significance of regression. Failure to reject $H_0$ is equivalent to concluding that there is no linear relationship between x and Y.
Figure 11-5 The hypothesis $H_0: \beta_1 = 0$ is not rejected.
Figure 11-6 The hypothesis $H_0: \beta_1 = 0$ is rejected.
Example 1
We will test for significance of regression using the model for the oxygen purity data from Example 11-1. The hypotheses are
$$H_0: \beta_1 = 0$$
$$H_1: \beta_1 \neq 0$$
and we will use $\alpha = 0.01$. From Example 11-1 and Table 11-2 we have
$$\hat{\beta}_1 = 14.947, \quad n = 20, \quad S_{xx} = 0.68088, \quad \hat{\sigma}^2 = 1.18$$
so the t-statistic becomes
$$t_0 = \frac{\hat{\beta}_1}{\sqrt{\hat{\sigma}^2 / S_{xx}}} = \frac{\hat{\beta}_1}{se(\hat{\beta}_1)} = \frac{14.947}{\sqrt{1.18 / 0.68088}} = 11.35$$
Practical Interpretation:
Since the reference value of t is $t_{0.005,18} = 2.88$, the value of the test statistic is very far into the critical region, implying that $H_0: \beta_1 = 0$ should be rejected. There is strong evidence to support this claim. The P-value for this test is $P \approx 1.23 \times 10^{-9}$. This was obtained manually with a calculator.
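The test statistic and an approximate P-value can also be reproduced in a few lines of code. The sketch below is not the calculator method mentioned above: it integrates the Student's t density numerically with composite Simpson's rule, which is a swapped-in technique chosen so the example needs only the standard library. The function names, the integration cutoff `upper=200.0` and the number of steps are assumptions.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def upper_tail(t0, df, upper=200.0, steps=20_000):
    """Approximate P(T > t0) by composite Simpson's rule on [t0, upper].

    The cutoff `upper` is an assumed truncation point; the tail beyond it is
    negligible for the degrees of freedom used here. `steps` must be even.
    """
    h = (upper - t0) / steps
    total = t_pdf(t0, df) + t_pdf(upper, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(t0 + k * h, df)
    return total * h / 3

# Values quoted in the example.
beta1_hat, n, s_xx, sigma2_hat = 14.947, 20, 0.68088, 1.18
t_crit = 2.88  # t_{0.005, 18}

t0 = beta1_hat / math.sqrt(sigma2_hat / s_xx)
p_value = 2 * upper_tail(abs(t0), n - 2)  # two-sided test

print(f"t0 = {t0:.2f}")
print(f"reject H0 at alpha = 0.01: {abs(t0) > t_crit}")
print(f"approximate two-sided P-value: {p_value:.2e}")
```

The numerical result should be of the same order of magnitude as the P-value quoted above; small differences are expected from rounding of the inputs.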
