Chapter 13
Introduction to Linear Regression
and Correlation Analysis
Chapter Goals
To understand the methods for displaying and
describing relationships among variables
Graphical
Scatterplots
Line plots
3-D plots
Models
Linear regression
Correlations
Frequency tables
YDI 7.1
Response (y)
Explanatory (x)
Height of son
Weight
Example
Curvilinear relationships
[Figure: three scatter plots of y against x showing curvilinear relationships]
Fall 2006 Fundamentals of Business Statistics
Correlation Coefficient
(continued)
Features of ρ and r
Unit free
Range between -1 and 1
The closer to -1, the stronger the negative linear relationship
The closer to 1, the stronger the positive linear relationship
The closer to 0, the weaker the linear relationship
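As a rough illustration of the first two features, the sketch below computes r with the usual sample-correlation formula (an assumption here, since the formula itself does not appear on this slide) for the square-footage and house-price data used in this chapter's regression example. It then recomputes r after changing the units of both variables. The function name `pearson_r` and the unit conversions are illustrative choices, not part of the original slides.

```python
import math

# Data from the chapter's house price example (price in $1000s).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price_1000s = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]


def pearson_r(x, y):
    """Sample correlation coefficient r for paired observations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(square_feet, price_1000s)

# Unit free: converting square feet to square metres and $1000s to dollars
# rescales both variables but should leave r unchanged.
square_metres = [0.092903 * v for v in square_feet]
price_dollars = [1000 * v for v in price_1000s]
r_rescaled = pearson_r(square_metres, price_dollars)

print(round(r, 3), round(r_rescaled, 3))  # prints 0.762 0.762
```

The value 0.762 agrees with the R reported in the SPSS Model Summary for the same data.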
Examples of Approximate
r Values
Tag with appropriate value:
-1, -.6, 0, +.3, 1
Earlier Example
Correlations
                               Exam1     Exam2
Exam1   Pearson Correlation    1         .400**
        Sig. (2-tailed)                  .000
        N                      366       351
Exam2   Pearson Correlation    .400**    1
        Sig. (2-tailed)        .000
        N                      351       356
YDI 7.3
What kind of relationship would you expect in
the following situations:
age (in years) of a car, and its price.
YDI 7.4
Identify the two variables that vary and decide
which should be the independent variable
and which should be the dependent
variable. Sketch a graph that you think best
represents the relationship between the two
variables.
1. The size of a person's vocabulary over his or
her lifetime.
2. The distance from the ceiling to the tip of the
minute hand of a clock hung on the wall.
No Relationship
$y = \beta_0 + \beta_1 x + \varepsilon$
$\beta_1$: population slope coefficient
$x$: independent variable
$\varepsilon$: random error term, or residual
Linear component: $\beta_0 + \beta_1 x$
Random error component: $\varepsilon$
(continued)
[Figure: population regression line showing, at $x_i$, the observed value of y, the predicted value of y, and the random error for this x value; slope = $\beta_1$, intercept = $\beta_0$]
$\hat{y}_i = b_0 + b_1 x_i$
$b_0$: estimate of the regression intercept
$b_1$: estimate of the regression slope
$x$: independent variable
Earlier Example
Residual
A residual is the difference between the
observed response y and the predicted
response $\hat{y}$. Thus, for each pair of
observations $(x_i, y_i)$, the ith residual is
$e_i = y_i - \hat{y}_i = y_i - (b_0 + b_1 x_i)$
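The residual definition can be sketched numerically. The example below is a minimal illustration, assuming the fitted coefficients reported in this chapter's house price output (b0 = 98.24833, b1 = 0.10977) and the ten houses from that example; the variable names are illustrative.

```python
# Coefficients as reported in the chapter's regression output (rounded).
b0 = 98.24833
b1 = 0.10977

# House price example data (price in $1000s).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price_1000s = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

# Predicted response for each observation: y_hat_i = b0 + b1 * x_i
predicted = [b0 + b1 * x for x in square_feet]

# Residual for each observation: e_i = y_i - y_hat_i
residuals = [y - y_hat for y, y_hat in zip(price_1000s, predicted)]

for x, y, y_hat, e in zip(square_feet, price_1000s, predicted, residuals):
    print(f"{x:5d} {y:4d} {y_hat:9.3f} {e:9.3f}")
```

Because the coefficients are rounded, the printed residuals can differ from the chapter's residual output in the third decimal place.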
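b0 and b1 are the values that minimize the sum of the squared residuals:
$\sum (y - \hat{y})^2 = \sum (y - (b_0 + b_1 x))^2$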
Interpretation of the
Slope and the Intercept
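$b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$
algebraic equivalent:
$b_1 = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$
and
$b_0 = \bar{y} - b_1 \bar{x}$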
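A short sketch of both forms of the slope formula, applied to the house price data used in this chapter (price in $1000s against square feet). The variable names are illustrative. The point is only that the definitional and algebraic forms give the same b1, and that b0 follows from the two sample means.

```python
# House price example data (price in $1000s).
x = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
y = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Definitional form: sum of cross-deviations over sum of squared x-deviations.
b1_def = (
    sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    / sum((xi - x_bar) ** 2 for xi in x)
)

# Algebraic equivalent, using raw sums only.
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
b1_alg = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)

# Intercept from the sample means.
b0 = y_bar - b1_def * x_bar

print(round(b0, 3), round(b1_def, 5))  # prints 98.248 0.10977
```

These values agree with the coefficients in the chapter's regression output.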
House Price in $1000s (y)    Square Feet (x)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700
SPSS Output
The regression equation is:
house price = 98.248 + 0.110 (square feet)
Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .762   .581       .528                41.33032

Coefficients
                 Unstandardized Coefficients   Standardized Coefficients
Model 1          B          Std. Error         Beta                        t       Sig.
(Constant)       98.248     58.033                                         1.693   .129
Square Feet      .110       .033               .762                        3.329   .010
Graphical Presentation
House price model: scatter plot and
regression line
[Figure: scatter plot of House Price ($1000s) against Square Feet with the fitted regression line; intercept = 98.248, slope = 0.110]
Interpretation of the
Intercept, b0
house price = 98.248 + 0.110 (square feet)
Interpretation of the
Slope Coefficient, b1
house price = 98.24833 + 0.10977 (square feet)
YDI 7.6
The growth of children from early childhood through adolescence
generally follows a linear pattern. Data on the heights of female
Americans during childhood, from four to nine years old, were
compiled and the least squares regression line was obtained as
$\hat{y} = 32 + 2.4x$, where $\hat{y}$ is the predicted height in inches and x is
age in years.
Interpret the value of the estimated slope $b_1 = 2.4$.
Would interpretation of the value of the estimated y-intercept, $b_0 = 32$,
make sense here?
What would you predict the height to be for a female American at
8 years old?
What would you predict the height to be for a female American at
25 years old? How does the quality of this answer compare to
the previous question?
Coefficient of Determination, $R^2$
$0 \le R^2 \le 1$
Coefficient of Determination, $R^2$
(continued)
Note: In the single independent variable case, the coefficient
of determination is
$R^2 = r^2$
where:
$R^2$ = Coefficient of determination
r = Simple correlation coefficient
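The note can be checked numerically. The sketch below assumes the standard definition of the coefficient of determination as the regression sum of squares divided by the total sum of squares (the definition does not appear in the text of this slide) and uses the chapter's house price data; the variable names are illustrative.

```python
import math

# House price example data (price in $1000s).
x = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
y = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares fit.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Sums of squares: total, residual (error), and regression.
sst = syy
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
ssr = sst - sse

r_squared = ssr / sst               # assumed definition: SSR / SST
r = sxy / math.sqrt(sxx * syy)      # simple correlation coefficient

print(round(r_squared, 3), round(r ** 2, 3))  # prints 0.581 0.581
```

The sums of squares computed here should match the Regression, Residual and Total rows of the SPSS ANOVA table for this example.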
Examples of Approximate $R^2$ Values
[Figure: three scatter plots of y against x]
Examples of Approximate $R^2$ Values
$R^2 = 0$: no linear relationship between x and y
SPSS Output
Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .762   .581       .528                41.33032

ANOVA
Model 1       Sum of Squares   df   Mean Square   F        Sig.
Regression    18934.935        1    18934.935     11.085   .010
Residual      13665.565        8    1708.196
Total         32600.500        9

Coefficients
                 Unstandardized Coefficients   Standardized Coefficients
Model 1          B          Std. Error         Beta                        t       Sig.
(Constant)       98.248     58.033                                         1.693   .129
Square Feet      .110       .033               .762                        3.329   .010
SPSS Output
s = 41.33032 (Std. Error of the Estimate in the Model Summary)
$s_{b_1}$ = 0.03297 (Std. Error for Square Feet in the Coefficients table, shown there rounded as .033)
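Both standard errors can be reproduced from quantities already reported for this example. The sketch below assumes the usual simple-regression formulas, s = sqrt(SSE / (n - 2)) and s_b1 = s / sqrt(sum of squared x-deviations), which do not appear in the text of these slides. The residual sum of squares is taken from the SPSS ANOVA table; the variable names are illustrative.

```python
import math

# Square footage of the ten houses in the chapter's example.
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

# Residual sum of squares as reported in the SPSS ANOVA table.
sse = 13665.565

n = len(square_feet)

# Standard error of the estimate (assumed formula: sqrt(SSE / (n - 2))).
s = math.sqrt(sse / (n - 2))

# Standard error of the slope (assumed formula: s / sqrt(sum (x - x_bar)^2)).
x_bar = sum(square_feet) / n
sxx = sum((xi - x_bar) ** 2 for xi in square_feet)
s_b1 = s / math.sqrt(sxx)

print(f"{s:.2f} {s_b1:.5f}")  # prints 41.33 0.03297
```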
[Figure: scatter plots contrasting small s with large s, and small $s_{b_1}$ with large $s_{b_1}$]
Test statistic
$t = \frac{b_1 - \beta_1}{s_{b_1}}$
d.f. = n - 2
where:
$b_1$ = Sample regression slope coefficient
$\beta_1$ = Hypothesized slope
$s_{b_1}$ = Estimator of the standard error of the slope
House Price in $1000s (y)    Square Feet (x)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700
(continued)
               Coefficients   Standard Error   t Stat    P-value
Intercept      98.24833       58.03348         1.69296   0.12892
Square Feet    0.10977        0.03297          3.32938   0.01039
$b_1$ = 0.10977, $s_{b_1}$ = 0.03297
d.f. = 10 - 2 = 8
α/2 = .025 in each tail
Critical values: $-t_{\alpha/2} = -2.3060$ and $t_{(1-\alpha/2)} = 2.3060$
Reject H0 if t < -2.3060 or t > 2.3060; otherwise do not reject H0
Test statistic: t = 3.329 > 2.3060
Decision:
Reject H0
Conclusion:
There is sufficient evidence
that square footage affects
house price
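The decision can be retraced with simple arithmetic. The sketch below is a minimal illustration, assuming a hypothesized slope of 0 (the usual null hypothesis of no linear relationship, which is not stated in the surviving slide text) and taking the critical value 2.3060 for 8 degrees of freedom directly from the slide rather than computing it.

```python
# Values from the regression output for the Square Feet coefficient.
b1 = 0.10977        # sample slope
s_b1 = 0.03297      # standard error of the slope
n = 10              # number of houses

beta1_null = 0.0    # assumed hypothesized slope under H0
t_crit = 2.3060     # two-tailed critical value for alpha = .05, d.f. = 8 (from the slide)

df = n - 2
t_stat = (b1 - beta1_null) / s_b1

# Two-tailed test: reject H0 when the statistic falls in either tail.
reject_h0 = abs(t_stat) > t_crit

print(f"t = {t_stat:.3f}, d.f. = {df}, reject H0: {reject_h0}")
# prints: t = 3.329, d.f. = 8, reject H0: True
```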
$b_1 \pm t_{(1-\alpha/2)} s_{b_1}$
d.f. = n - 2
               Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept      98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet    0.10977        0.03297          3.32938   0.01039   0.03374     0.18580
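As a worked check of the interval for the slope, the sketch below plugs the reported values into the interval formula. The critical value 2.3060 (95% confidence, d.f. = 8) is the one used in this chapter's t test; the variable names are illustrative.

```python
# Values from the regression output for the Square Feet coefficient.
b1 = 0.10977      # sample slope
s_b1 = 0.03297    # standard error of the slope
t_crit = 2.3060   # t value for 95% confidence with d.f. = 10 - 2 = 8

# Interval: b1 plus or minus t * s_b1
margin = t_crit * s_b1
lower = b1 - margin
upper = b1 + margin

print(f"{lower:.5f} {upper:.5f}")  # prints 0.03374 0.18580
```

Since the interval excludes 0, it is consistent with rejecting H0 in the two-tailed t test at the .05 level.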
Residual Analysis
Purposes
Examine for linearity assumption
Examine for constant variance for all levels of x
Evaluate normal distribution assumption
[Figure: residuals plotted against x, contrasting a not linear pattern with a linear one]
[Figure: residuals plotted against x, contrasting non-constant variance with constant variance]
Residual Output
Predicted House Price    Residuals
251.92316                -6.923162
273.87671                38.12329
284.85348                -5.853484
304.06284                3.937162
218.99284                -19.99284
268.38832                -49.38832
356.20251                48.79749
367.17929                -43.17929
254.6674                 64.33264
284.85348                -29.85348
[Figure: residuals plotted against Square Feet]