
Slides by John Loucks
St. Edward's University

© 2009 Thomson South-Western. All Rights Reserved 1


Chapter 12
Simple Linear Regression

■ Simple Linear Regression Model
■ Least Squares Method
■ Coefficient of Determination
■ Model Assumptions
■ Testing for Significance
■ Using the Estimated Regression Equation for Estimation and Prediction
■ Residual Analysis: Validating Model Assumptions



Simple Linear Regression

 Managerial decisions often are based on the relationship between two or more variables.
 Regression analysis can be used to develop an equation showing how the variables are related.
 The variable being predicted is called the dependent variable and is denoted by y.
 The variables being used to predict the value of the dependent variable are called the independent variables and are denoted by x.



Simple Linear Regression

 Simple linear regression involves one independent variable and one dependent variable.
 The relationship between the two variables is approximated by a straight line.
 Regression analysis involving two or more independent variables is called multiple regression.



Simple Linear Regression Model

 The equation that describes how y is related to x and an error term is called the regression model.
 The simple linear regression model is:

y = β0 + β1x + ε

where:
  β0 and β1 are called parameters of the model,
  ε is a random variable called the error term.



Simple Linear Regression Equation

■ The simple linear regression equation is:

E(y) = β0 + β1x

• The graph of the regression equation is a straight line.
• β0 is the y intercept of the regression line.
• β1 is the slope of the regression line.
• E(y) is the expected value of y for a given x value.



Simple Linear Regression Equation

■ Positive Linear Relationship

[Graph: E(y) versus x. The regression line rises from intercept β0; the slope β1 is positive.]


Simple Linear Regression Equation

■ Negative Linear Relationship

[Graph: E(y) versus x. The regression line falls from intercept β0; the slope β1 is negative.]


Simple Linear Regression Equation

■ No Relationship

[Graph: E(y) versus x. The regression line is horizontal at intercept β0; the slope β1 is 0.]


Estimated Simple Linear Regression
Equation
■ The estimated simple linear regression equation is:

ŷ = b0 + b1x

• The graph is called the estimated regression line.
• b0 is the y intercept of the line.
• b1 is the slope of the line.
• ŷ is the estimated value of y for a given x value.



Estimation Process

Regression Model:  y = β0 + β1x + ε
Regression Equation:  E(y) = β0 + β1x
Unknown Parameters:  β0, β1

Sample Data:  (x1, y1), (x2, y2), …, (xn, yn)
Sample Statistics:  b0, b1

Estimated Regression Equation:  ŷ = b0 + b1x
(b0 and b1 provide estimates of β0 and β1)



Least Squares Method

■ Least Squares Criterion

min Σ(yi − ŷi)²

where:
  yi = observed value of the dependent variable for the ith observation
  ŷi = estimated value of the dependent variable for the ith observation



Least Squares Method

■ Slope for the Estimated Regression Equation

b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²

where:
  xi = value of independent variable for ith observation
  yi = value of dependent variable for ith observation
  x̄ = mean value for independent variable
  ȳ = mean value for dependent variable



Least Squares Method

■ y-Intercept for the Estimated Regression Equation

b0 = ȳ − b1x̄



Simple Linear Regression

■ Example: Reed Auto Sales


Reed Auto periodically has a special week-long sale.
As part of the advertising campaign Reed runs one or
more television commercials during the weekend
preceding the sale. Data from a sample of 5 previous
sales are shown on the next slide.



Simple Linear Regression

■ Example: Reed Auto Sales

Number of Number of
TV Ads (x) Cars Sold (y)
1 14
3 24
2 18
1 17
3 27
Σx = 10    Σy = 100
x̄ = 2      ȳ = 20



Estimated Regression Equation

■ Slope for the Estimated Regression Equation

b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² = 20/4 = 5

■ y-Intercept for the Estimated Regression Equation

b0 = ȳ − b1x̄ = 20 − 5(2) = 10

■ Estimated Regression Equation

ŷ = 10 + 5x

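The slope and intercept calculations above can be reproduced with a minimal Python sketch (plain-Python arithmetic; variable names are illustrative, not part of the deck's notation):

```python
# Least squares fit for the Reed Auto data.
x = [1, 3, 2, 1, 3]       # TV ads
y = [14, 24, 18, 17, 27]  # cars sold

x_bar = sum(x) / len(x)   # 2
y_bar = sum(y) / len(y)   # 20

# b1 = sum of (xi - x_bar)(yi - y_bar) over sum of (xi - x_bar)^2
num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # 20
den = sum((xi - x_bar) ** 2 for xi in x)                        # 4
b1 = num / den            # 5.0
b0 = y_bar - b1 * x_bar   # 10.0
```

This recovers the estimated regression equation ŷ = 10 + 5x.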


Scatter Diagram and Trend Line

[Scatter diagram of Cars Sold (y) versus TV Ads (x) with the trend line y = 5x + 10]


Coefficient of Determination

■ Relationship Among SST, SSR, SSE

SST = SSR + SSE

Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σ(yi − ŷi)²

where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error



Coefficient of Determination

■ The coefficient of determination is:

r2 = SSR/SST

where:
SSR = sum of squares due to regression
SST = total sum of squares



Coefficient of Determination

r² = SSR/SST = 100/114 = .8772

The regression relationship is very strong; 87.7% of the variability in the number of cars sold can be explained by the linear relationship between the number of TV ads and the number of cars sold.

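The sums-of-squares decomposition for the Reed Auto data can be checked with a short Python sketch (variable names are illustrative):

```python
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
y_bar = sum(y) / len(y)
y_hat = [10 + 5 * xi for xi in x]  # fitted values from yhat = 10 + 5x

sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares: 114
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # error sum of squares: 14
ssr = sst - sse                                        # regression sum of squares: 100

r2 = ssr / sst  # coefficient of determination, about .8772
```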


Sample Correlation Coefficient

rxy = (sign of b1)√(Coefficient of Determination)

rxy = (sign of b1)√r²

where:
  b1 = the slope of the estimated regression equation ŷ = b0 + b1x



Sample Correlation Coefficient

rxy = (sign of b1)√r²

The sign of b1 in the equation ŷ = 10 + 5x is "+".

rxy = +√.8772 = +.9366

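The sign adjustment can be sketched in Python, using `math.copysign` to attach the sign of b1 to √r² (names are illustrative):

```python
import math

b1 = 5          # slope from the estimated equation yhat = 10 + 5x
r2 = 100 / 114  # coefficient of determination

# Sample correlation: the sign of b1 times the square root of r^2.
r_xy = math.copysign(math.sqrt(r2), b1)  # about +.9366
```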


Assumptions About the Error Term ε

1. The error ε is a random variable with mean of zero.

2. The variance of ε, denoted by σ², is the same for all values of the independent variable.

3. The values of ε are independent.

4. The error ε is a normally distributed random variable.



Testing for Significance

To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of β1 is zero.

Two tests are commonly used: the t test and the F test.

Both the t test and the F test require an estimate of σ², the variance of ε in the regression model.



Testing for Significance

■ An Estimate of σ²

The mean square error (MSE) provides the estimate of σ², and the notation s² is also used.

s² = MSE = SSE/(n − 2)

where:
  SSE = Σ(yi − ŷi)² = Σ(yi − b0 − b1xi)²



Testing for Significance

■ An Estimate of σ

• To estimate σ we take the square root of s².
• The resulting s is called the standard error of the estimate.

s = √MSE = √(SSE/(n − 2))

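The MSE and the standard error of the estimate for the Reed Auto data, as a quick Python check (names are illustrative):

```python
import math

x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
n = len(x)

# Residuals from the fitted equation yhat = 10 + 5x.
residuals = [yi - (10 + 5 * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)  # 14
mse = sse / (n - 2)                   # s^2, about 4.667
s = math.sqrt(mse)                    # standard error of the estimate, about 2.16025
```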


Testing for Significance: t Test

■ Hypotheses

H0: β1 = 0
Ha: β1 ≠ 0

■ Test Statistic

t = b1/sb1   where   sb1 = s/√Σ(xi − x̄)²



Testing for Significance: t Test

■ Rejection Rule

Reject H0 if p-value < α
or t < −tα/2 or t > tα/2

where:
  tα/2 is based on a t distribution with n − 2 degrees of freedom



Testing for Significance: t Test

1. Determine the hypotheses.  H0: β1 = 0;  Ha: β1 ≠ 0

2. Specify the level of significance.  α = .05

3. Select the test statistic.  t = b1/sb1

4. State the rejection rule.  Reject H0 if p-value < .05
   or |t| > 3.182 (with 3 degrees of freedom)



Testing for Significance: t Test

5. Compute the value of the test statistic.

t = b1/sb1 = 5/1.08 = 4.63

6. Determine whether to reject H0.

t = 4.541 provides an area of .01 in the upper tail. Hence, the p-value is less than .02. (Also, t = 4.63 > 3.182.) We can reject H0.

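Steps 5 and 6 can be sketched in Python (the standard error s = √(14/3) comes from the MSE slide; names are illustrative):

```python
import math

x = [1, 3, 2, 1, 3]
x_bar = sum(x) / len(x)
b1 = 5
s = math.sqrt(14 / 3)  # standard error of the estimate

# Standard error of b1: s divided by the root of the sum of squared deviations of x.
s_b1 = s / math.sqrt(sum((xi - x_bar) ** 2 for xi in x))  # about 1.08
t = b1 / s_b1                                             # about 4.63
```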


Confidence Interval for β 1

 We can use a 95% confidence interval for β1 to test the hypotheses just used in the t test.
 H0 is rejected if the hypothesized value of β1 is not included in the confidence interval for β1.



Confidence Interval for β 1

■ The form of a confidence interval for β1 is:

b1 ± tα/2 sb1

where b1 is the point estimator, tα/2 sb1 is the margin of error, and tα/2 is the t value providing an area of α/2 in the upper tail of a t distribution with n − 2 degrees of freedom.



Confidence Interval for β 1

■ Rejection Rule

Reject H0 if 0 is not included in the confidence interval for β1.

■ 95% Confidence Interval for β1

b1 ± tα/2 sb1 = 5 ± 3.182(1.08) = 5 ± 3.44, or 1.56 to 8.44

■ Conclusion

0 is not included in the confidence interval. Reject H0.

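The interval computation, sketched in Python (the critical value 3.182 is taken from a t table, as on the slide; names are illustrative):

```python
import math

b1 = 5
s_b1 = math.sqrt(14 / 3) / 2  # standard error of b1, about 1.08
t_crit = 3.182                # t value for alpha/2 = .025 with 3 d.f.

margin = t_crit * s_b1             # about 3.44
lo, hi = b1 - margin, b1 + margin  # about 1.56 to 8.44
```

Since 0 is outside (lo, hi), H0 is rejected, matching the t test.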


Testing for Significance: F Test

■ Hypotheses

H0: β1 = 0
Ha: β1 ≠ 0

■ Test Statistic

F = MSR/MSE



Testing for Significance: F Test

■ Rejection Rule

Reject H0 if p-value < α or F > Fα

where:
  Fα is based on an F distribution with 1 degree of freedom in the numerator and n − 2 degrees of freedom in the denominator



Testing for Significance: F Test

1. Determine the hypotheses.  H0: β1 = 0;  Ha: β1 ≠ 0

2. Specify the level of significance.  α = .05

3. Select the test statistic.  F = MSR/MSE

4. State the rejection rule.  Reject H0 if p-value < .05
   or F > 10.13 (with 1 d.f. in numerator and 3 d.f. in denominator)



Testing for Significance: F Test

5. Compute the value of the test statistic.

F = MSR/MSE = 100/4.667 = 21.43

6. Determine whether to reject H0.

F = 17.44 provides an area of .025 in the upper tail. Thus, the p-value corresponding to F = 21.43 is less than 2(.025) = .05. Hence, we reject H0.

The statistical evidence is sufficient to conclude that we have a significant relationship between the number of TV ads aired and the number of cars sold.
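The F statistic for the Reed Auto data, as a short Python sketch using the sums of squares from the earlier slides (names are illustrative):

```python
sst, sse = 114, 14  # from the coefficient-of-determination slides
ssr = sst - sse     # 100
msr = ssr / 1       # mean square regression: 1 d.f. in the numerator
mse = sse / 3       # mean square error: n - 2 = 3 d.f. in the denominator
F = msr / mse       # about 21.43
```

With one independent variable, F equals the square of the t statistic (4.63² ≈ 21.43), so the two tests agree.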
Some Cautions about the
Interpretation of Significance Tests
 Rejecting H0: β1 = 0 and concluding that the relationship between x and y is significant does not enable us to conclude that a cause-and-effect relationship is present between x and y.

 Just because we are able to reject H0: β1 = 0 and demonstrate statistical significance does not enable us to conclude that there is a linear relationship between x and y.



Using the Estimated Regression Equation
for Estimation and Prediction
■ Confidence Interval Estimate of E(yp)

ŷp ± tα/2 sŷp

■ Prediction Interval Estimate of yp

ŷp ± tα/2 sind

where:
  the confidence coefficient is 1 − α and tα/2 is based on a t distribution with n − 2 degrees of freedom



Point Estimation

If 3 TV ads are run prior to a sale, we expect the mean number of cars sold to be:

ŷ = 10 + 5(3) = 25 cars



Confidence Interval for E(yp)

■ Estimate of the Standard Deviation of ŷp

sŷp = s √(1/n + (xp − x̄)² / Σ(xi − x̄)²)

sŷp = 2.16025 √(1/5 + (3 − 2)² / ((1−2)² + (3−2)² + (2−2)² + (1−2)² + (3−2)²))

sŷp = 2.16025 √(1/5 + 1/4) = 1.4491



Confidence Interval for E(yp)

The 95% confidence interval estimate of the mean number of cars sold when 3 TV ads are run is:

ŷp ± tα/2 sŷp

25 ± 3.1824(1.4491)
25 ± 4.61

20.39 to 29.61 cars

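The confidence interval for E(yp) can be reproduced in Python (the t value 3.1824 is from a t table with 3 d.f., as on the slide; names are illustrative):

```python
import math

x = [1, 3, 2, 1, 3]
n = len(x)
x_bar = sum(x) / n
s = math.sqrt(14 / 3)  # standard error of the estimate
xp = 3                 # number of TV ads
y_hat_p = 10 + 5 * xp  # point estimate: 25

ssx = sum((xi - x_bar) ** 2 for xi in x)
s_yhat = s * math.sqrt(1 / n + (xp - x_bar) ** 2 / ssx)  # about 1.4491

margin = 3.1824 * s_yhat                     # about 4.61
lo, hi = y_hat_p - margin, y_hat_p + margin  # about 20.39 to 29.61
```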


Prediction Interval for yp

■ Estimate of the Standard Deviation of an Individual Value of yp

sind = s √(1 + 1/n + (xp − x̄)² / Σ(xi − x̄)²)

sind = 2.16025 √(1 + 1/5 + 1/4)

sind = 2.16025(1.20416) = 2.6013



Prediction Interval for yp

The 95% prediction interval estimate of the number of cars sold in one particular week when 3 TV ads are run is:

ŷp ± tα/2 sind

25 ± 3.1824(2.6013)
25 ± 8.28

16.72 to 33.28 cars

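The prediction interval, sketched in Python; the only change from the confidence-interval sketch is the extra 1 under the square root, which widens the interval (names are illustrative):

```python
import math

x = [1, 3, 2, 1, 3]
n = len(x)
x_bar = sum(x) / n
s = math.sqrt(14 / 3)  # standard error of the estimate
xp = 3

ssx = sum((xi - x_bar) ** 2 for xi in x)
s_ind = s * math.sqrt(1 + 1 / n + (xp - x_bar) ** 2 / ssx)  # about 2.6013

margin = 3.1824 * s_ind            # about 8.28
lo, hi = 25 - margin, 25 + margin  # about 16.72 to 33.28
```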


Residual Analysis

 If the assumptions about the error term ε appear questionable, the hypothesis tests about the significance of the regression relationship and the interval estimation results may not be valid.

 The residuals provide the best information about ε.

 Residual for Observation i:  yi − ŷi

 Much of the residual analysis is based on an examination of graphical plots.



Residual Plot Against x

■ If the assumption that the variance of ε is the same for all values of x is valid, and the assumed regression model is an adequate representation of the relationship between the variables, then the residual plot should give an overall impression of a horizontal band of points.



Residual Plot Against x

[Residual plot of y − ŷ against x: Good Pattern (a horizontal band of points)]


Residual Plot Against x

[Residual plot of y − ŷ against x: Nonconstant Variance]


Residual Plot Against x

[Residual plot of y − ŷ against x: Model Form Not Adequate]


Residual Plot Against x

■ Residuals

Observation   Predicted Cars Sold   Residual
1             15                    −1
2             25                    −1
3             20                    −2
4             15                     2
5             25                     2

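The predicted values and residuals in this table follow directly from ŷ = 10 + 5x; a minimal Python sketch:

```python
x = [1, 3, 2, 1, 3]       # TV ads
y = [14, 24, 18, 17, 27]  # cars sold

predicted = [10 + 5 * xi for xi in x]                # predicted cars sold
residuals = [yi - p for yi, p in zip(y, predicted)]  # y - yhat for each observation
```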


Residual Plot Against x

[TV Ads Residual Plot: residuals plotted against TV Ads, forming a horizontal band between −2 and 2]


End of Chapter 12

