
Slides by John Loucks
St. Edward's University

© 2009 Thomson South-Western. All Rights Reserved 1


Chapter 12
Simple Linear Regression

■ Simple Linear Regression Model
■ Least Squares Method
■ Coefficient of Determination
■ Model Assumptions
■ Testing for Significance
■ Using the Estimated Regression Equation for Estimation and Prediction
■ Residual Analysis: Validating Model Assumptions



Simple Linear Regression

 Managerial decisions often are based on the relationship between two or more variables.
 Regression analysis can be used to develop an equation showing how the variables are related.
 The variable being predicted is called the dependent variable and is denoted by y.
 The variables being used to predict the value of the dependent variable are called the independent variables and are denoted by x.



Simple Linear Regression

 Simple linear regression involves one independent variable and one dependent variable.
 The relationship between the two variables is approximated by a straight line.
 Regression analysis involving two or more independent variables is called multiple regression.



Simple Linear Regression Model

 The equation that describes how y is related to x and an error term is called the regression model.
 The simple linear regression model is:

y = β0 + β1x + ε

where:
  β0 and β1 are called parameters of the model,
  ε is a random variable called the error term.



Simple Linear Regression Equation

■ The simple linear regression equation is:

E(y) = β0 + β1x

• The graph of the regression equation is a straight line.
• β0 is the y intercept of the regression line.
• β1 is the slope of the regression line.
• E(y) is the expected value of y for a given x value.



Simple Linear Regression Equation

■ Positive Linear Relationship

[Graph: E(y) versus x. The regression line rises from intercept β0; the slope β1 is positive.]


Simple Linear Regression Equation

■ Negative Linear Relationship

[Graph: E(y) versus x. The regression line falls from intercept β0; the slope β1 is negative.]


Simple Linear Regression Equation

■ No Relationship

[Graph: E(y) versus x. The regression line is horizontal at intercept β0; the slope β1 is 0.]


Estimated Simple Linear Regression
Equation
■ The estimated simple linear regression equation is:

ŷ = b0 + b1x

• The graph is called the estimated regression line.
• b0 is the y intercept of the line.
• b1 is the slope of the line.
• ŷ is the estimated value of y for a given x value.



Estimation Process

Regression Model:  y = β0 + β1x + ε
Regression Equation:  E(y) = β0 + β1x
Unknown Parameters:  β0, β1

Sample Data:  (x1, y1), (x2, y2), …, (xn, yn)
Sample Statistics:  b0, b1

Estimated Regression Equation:  ŷ = b0 + b1x
(b0 and b1 provide estimates of β0 and β1)



Least Squares Method

■ Least Squares Criterion

min Σ(yi − ŷi)²

where:
  yi = observed value of the dependent variable for the ith observation
  ŷi = estimated value of the dependent variable for the ith observation



Least Squares Method

■ Slope for the Estimated Regression Equation

b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²

where:
  xi = value of independent variable for ith observation
  yi = value of dependent variable for ith observation
  x̄ = mean value for independent variable
  ȳ = mean value for dependent variable



Least Squares Method

■ y-Intercept for the Estimated Regression Equation

b0 = ȳ − b1x̄



Simple Linear Regression

■ Example: Reed Auto Sales


Reed Auto periodically has a special week-long sale.
As part of the advertising campaign Reed runs one or
more television commercials during the weekend
preceding the sale. Data from a sample of 5 previous
sales are shown on the next slide.



Simple Linear Regression

■ Example: Reed Auto Sales

Number of Number of
TV Ads (x) Cars Sold (y)
1 14
3 24
2 18
1 17
3 27
Σx = 10    Σy = 100
x̄ = 2      ȳ = 20



Estimated Regression Equation

■ Slope for the Estimated Regression Equation

b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² = 20/4 = 5

■ y-Intercept for the Estimated Regression Equation

b0 = ȳ − b1x̄ = 20 − 5(2) = 10

■ Estimated Regression Equation

ŷ = 10 + 5x

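The slope and intercept calculations above can be reproduced with a minimal Python sketch (plain-Python arithmetic; variable names are illustrative, not part of the deck's notation):

```python
# Least squares fit for the Reed Auto data.
x = [1, 3, 2, 1, 3]       # TV ads
y = [14, 24, 18, 17, 27]  # cars sold

x_bar = sum(x) / len(x)   # 2
y_bar = sum(y) / len(y)   # 20

# b1 = sum of (xi - x_bar)(yi - y_bar) over sum of (xi - x_bar)^2
num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # 20
den = sum((xi - x_bar) ** 2 for xi in x)                        # 4
b1 = num / den            # 5.0
b0 = y_bar - b1 * x_bar   # 10.0
```

This recovers the estimated regression equation ŷ = 10 + 5x.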


Scatter Diagram and Trend Line

[Scatter diagram of Cars Sold (y) versus TV Ads (x) with the trend line y = 5x + 10]


Coefficient of Determination

■ Relationship Among SST, SSR, SSE

SST = SSR + SSE

Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σ(yi − ŷi)²

where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error



Coefficient of Determination

■ The coefficient of determination is:

r2 = SSR/SST

where:
SSR = sum of squares due to regression
SST = total sum of squares



Coefficient of Determination

r² = SSR/SST = 100/114 = .8772

The regression relationship is very strong; 87.7% of the variability in the number of cars sold can be explained by the linear relationship between the number of TV ads and the number of cars sold.

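The sums-of-squares decomposition for the Reed Auto data can be checked with a short Python sketch (variable names are illustrative):

```python
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
y_bar = sum(y) / len(y)
y_hat = [10 + 5 * xi for xi in x]  # fitted values from yhat = 10 + 5x

sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares: 114
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # error sum of squares: 14
ssr = sst - sse                                        # regression sum of squares: 100

r2 = ssr / sst  # coefficient of determination, about .8772
```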


Sample Correlation Coefficient

rxy = (sign of b1)√(Coefficient of Determination)

rxy = (sign of b1)√r²

where:
  b1 = the slope of the estimated regression equation ŷ = b0 + b1x



Sample Correlation Coefficient

rxy = (sign of b1)√r²

The sign of b1 in the equation ŷ = 10 + 5x is "+".

rxy = +√.8772 = +.9366

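The sign adjustment can be sketched in Python, using `math.copysign` to attach the sign of b1 to √r² (names are illustrative):

```python
import math

b1 = 5          # slope from the estimated equation yhat = 10 + 5x
r2 = 100 / 114  # coefficient of determination

# Sample correlation: the sign of b1 times the square root of r^2.
r_xy = math.copysign(math.sqrt(r2), b1)  # about +.9366
```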


Assumptions About the Error Term ε

1. The error ε is a random variable with mean of zero.

2. The variance of ε, denoted by σ², is the same for all values of the independent variable.

3. The values of ε are independent.

4. The error ε is a normally distributed random variable.



Testing for Significance

To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of β1 is zero.

Two tests are commonly used: the t test and the F test.

Both the t test and the F test require an estimate of σ², the variance of ε in the regression model.



Testing for Significance

■ An Estimate of σ²

The mean square error (MSE) provides the estimate of σ², and the notation s² is also used.

s² = MSE = SSE/(n − 2)

where:
  SSE = Σ(yi − ŷi)² = Σ(yi − b0 − b1xi)²



Testing for Significance

■ An Estimate of σ

• To estimate σ we take the square root of s².
• The resulting s is called the standard error of the estimate.

s = √MSE = √(SSE/(n − 2))

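The MSE and the standard error of the estimate for the Reed Auto data, as a quick Python check (names are illustrative):

```python
import math

x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
n = len(x)

# Residuals from the fitted equation yhat = 10 + 5x.
residuals = [yi - (10 + 5 * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)  # 14
mse = sse / (n - 2)                   # s^2, about 4.667
s = math.sqrt(mse)                    # standard error of the estimate, about 2.16025
```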


Testing for Significance: t Test

■ Hypotheses

H0: β1 = 0
Ha: β1 ≠ 0

■ Test Statistic

t = b1/sb1   where   sb1 = s/√Σ(xi − x̄)²



Testing for Significance: t Test

■ Rejection Rule

Reject H0 if p-value < α
or t < −tα/2 or t > tα/2

where:
  tα/2 is based on a t distribution with n − 2 degrees of freedom



Testing for Significance: t Test

1. Determine the hypotheses.  H0: β1 = 0;  Ha: β1 ≠ 0

2. Specify the level of significance.  α = .05

3. Select the test statistic.  t = b1/sb1

4. State the rejection rule.  Reject H0 if p-value < .05
   or |t| > 3.182 (with 3 degrees of freedom)



Testing for Significance: t Test

5. Compute the value of the test statistic.

t = b1/sb1 = 5/1.08 = 4.63

6. Determine whether to reject H0.

t = 4.541 provides an area of .01 in the upper tail. Hence, the p-value is less than .02. (Also, t = 4.63 > 3.182.) We can reject H0.

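Steps 5 and 6 can be sketched in Python (the standard error s = √(14/3) comes from the MSE slide; names are illustrative):

```python
import math

x = [1, 3, 2, 1, 3]
x_bar = sum(x) / len(x)
b1 = 5
s = math.sqrt(14 / 3)  # standard error of the estimate

# Standard error of b1: s divided by the root of the sum of squared deviations of x.
s_b1 = s / math.sqrt(sum((xi - x_bar) ** 2 for xi in x))  # about 1.08
t = b1 / s_b1                                             # about 4.63
```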


Confidence Interval for β 1

 We can use a 95% confidence interval for β1 to test the hypotheses just used in the t test.
 H0 is rejected if the hypothesized value of β1 is not included in the confidence interval for β1.



Confidence Interval for β 1

■ The form of a confidence interval for β1 is:

b1 ± tα/2 sb1

where b1 is the point estimator, tα/2 sb1 is the margin of error, and tα/2 is the t value providing an area of α/2 in the upper tail of a t distribution with n − 2 degrees of freedom.



Confidence Interval for β 1

■ Rejection Rule

Reject H0 if 0 is not included in the confidence interval for β1.

■ 95% Confidence Interval for β1

b1 ± tα/2 sb1 = 5 ± 3.182(1.08) = 5 ± 3.44, or 1.56 to 8.44

■ Conclusion

0 is not included in the confidence interval. Reject H0.

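The interval computation, sketched in Python (the critical value 3.182 is taken from a t table, as on the slide; names are illustrative):

```python
import math

b1 = 5
s_b1 = math.sqrt(14 / 3) / 2  # standard error of b1, about 1.08
t_crit = 3.182                # t value for alpha/2 = .025 with 3 d.f.

margin = t_crit * s_b1             # about 3.44
lo, hi = b1 - margin, b1 + margin  # about 1.56 to 8.44
```

Since 0 is outside (lo, hi), H0 is rejected, matching the t test.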


Testing for Significance: F Test

■ Hypotheses

H0: β1 = 0
Ha: β1 ≠ 0

■ Test Statistic

F = MSR/MSE



Testing for Significance: F Test

■ Rejection Rule

Reject H0 if p-value < α or F > Fα

where:
  Fα is based on an F distribution with 1 degree of freedom in the numerator and n − 2 degrees of freedom in the denominator



Testing for Significance: F Test

1. Determine the hypotheses.  H0: β1 = 0;  Ha: β1 ≠ 0

2. Specify the level of significance.  α = .05

3. Select the test statistic.  F = MSR/MSE

4. State the rejection rule.  Reject H0 if p-value < .05
   or F > 10.13 (with 1 d.f. in numerator and 3 d.f. in denominator)



Testing for Significance: F Test

5. Compute the value of the test statistic.

F = MSR/MSE = 100/4.667 = 21.43

6. Determine whether to reject H0.

F = 17.44 provides an area of .025 in the upper tail. Thus, the p-value corresponding to F = 21.43 is less than 2(.025) = .05. Hence, we reject H0.

The statistical evidence is sufficient to conclude that we have a significant relationship between the number of TV ads aired and the number of cars sold.
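The F statistic for the Reed Auto data, as a short Python sketch using the sums of squares from the earlier slides (names are illustrative):

```python
sst, sse = 114, 14  # from the coefficient-of-determination slides
ssr = sst - sse     # 100
msr = ssr / 1       # mean square regression: 1 d.f. in the numerator
mse = sse / 3       # mean square error: n - 2 = 3 d.f. in the denominator
F = msr / mse       # about 21.43
```

With one independent variable, F equals the square of the t statistic (4.63² ≈ 21.43), so the two tests agree.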
Some Cautions about the
Interpretation of Significance Tests
 Rejecting H0: β1 = 0 and concluding that the relationship between x and y is significant does not enable us to conclude that a cause-and-effect relationship is present between x and y.

 Just because we are able to reject H0: β1 = 0 and demonstrate statistical significance does not enable us to conclude that there is a linear relationship between x and y.



Using the Estimated Regression Equation
for Estimation and Prediction
■ Confidence Interval Estimate of E(yp)

ŷp ± tα/2 sŷp

■ Prediction Interval Estimate of yp

ŷp ± tα/2 sind

where:
  the confidence coefficient is 1 − α and tα/2 is based on a t distribution with n − 2 degrees of freedom



Point Estimation

If 3 TV ads are run prior to a sale, we expect the mean number of cars sold to be:

ŷ = 10 + 5(3) = 25 cars



Confidence Interval for E(yp)

■ Estimate of the Standard Deviation of ŷp

sŷp = s √(1/n + (xp − x̄)² / Σ(xi − x̄)²)

sŷp = 2.16025 √(1/5 + (3 − 2)² / ((1−2)² + (3−2)² + (2−2)² + (1−2)² + (3−2)²))

sŷp = 2.16025 √(1/5 + 1/4) = 1.4491



Confidence Interval for E(yp)

The 95% confidence interval estimate of the mean number of cars sold when 3 TV ads are run is:

ŷp ± tα/2 sŷp

25 ± 3.1824(1.4491)
25 ± 4.61

20.39 to 29.61 cars

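The confidence interval for E(yp) can be reproduced in Python (the t value 3.1824 is from a t table with 3 d.f., as on the slide; names are illustrative):

```python
import math

x = [1, 3, 2, 1, 3]
n = len(x)
x_bar = sum(x) / n
s = math.sqrt(14 / 3)  # standard error of the estimate
xp = 3                 # number of TV ads
y_hat_p = 10 + 5 * xp  # point estimate: 25

ssx = sum((xi - x_bar) ** 2 for xi in x)
s_yhat = s * math.sqrt(1 / n + (xp - x_bar) ** 2 / ssx)  # about 1.4491

margin = 3.1824 * s_yhat                     # about 4.61
lo, hi = y_hat_p - margin, y_hat_p + margin  # about 20.39 to 29.61
```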


Prediction Interval for yp

■ Estimate of the Standard Deviation of an Individual Value of yp

sind = s √(1 + 1/n + (xp − x̄)² / Σ(xi − x̄)²)

sind = 2.16025 √(1 + 1/5 + 1/4)

sind = 2.16025(1.20416) = 2.6013



Prediction Interval for yp

The 95% prediction interval estimate of the number of cars sold in one particular week when 3 TV ads are run is:

ŷp ± tα/2 sind

25 ± 3.1824(2.6013)
25 ± 8.28

16.72 to 33.28 cars

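The prediction interval, sketched in Python; the only change from the confidence-interval sketch is the extra 1 under the square root, which widens the interval (names are illustrative):

```python
import math

x = [1, 3, 2, 1, 3]
n = len(x)
x_bar = sum(x) / n
s = math.sqrt(14 / 3)  # standard error of the estimate
xp = 3

ssx = sum((xi - x_bar) ** 2 for xi in x)
s_ind = s * math.sqrt(1 + 1 / n + (xp - x_bar) ** 2 / ssx)  # about 2.6013

margin = 3.1824 * s_ind            # about 8.28
lo, hi = 25 - margin, 25 + margin  # about 16.72 to 33.28
```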


Residual Analysis

 If the assumptions about the error term ε appear questionable, the hypothesis tests about the significance of the regression relationship and the interval estimation results may not be valid.

 The residuals provide the best information about ε.

 Residual for Observation i:  yi − ŷi

 Much of the residual analysis is based on an examination of graphical plots.



Residual Plot Against x

■ If the assumption that the variance of ε is the same for all values of x is valid, and the assumed regression model is an adequate representation of the relationship between the variables, then the residual plot should give an overall impression of a horizontal band of points.



Residual Plot Against x

[Residual plot of y − ŷ against x: Good Pattern (a horizontal band of points)]


Residual Plot Against x

[Residual plot of y − ŷ against x: Nonconstant Variance]


Residual Plot Against x

[Residual plot of y − ŷ against x: Model Form Not Adequate]


Residual Plot Against x

■ Residuals

Observation   Predicted Cars Sold   Residual
1             15                    −1
2             25                    −1
3             20                    −2
4             15                     2
5             25                     2

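The predicted values and residuals in this table follow directly from ŷ = 10 + 5x; a minimal Python sketch:

```python
x = [1, 3, 2, 1, 3]       # TV ads
y = [14, 24, 18, 17, 27]  # cars sold

predicted = [10 + 5 * xi for xi in x]                # predicted cars sold
residuals = [yi - p for yi, p in zip(y, predicted)]  # y - yhat for each observation
```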


Residual Plot Against x

[TV Ads Residual Plot: residuals plotted against TV Ads, forming a horizontal band between −2 and 2]


End of Chapter 12

