
Chapter 11

Correlation and Regression
© The McGraw-Hill Companies, Inc., 2000

Outline

• 11-1 Introduction
• 11-2 Scatter Plots
• 11-3 Correlation
• 11-4 Regression
• 11-5 Coefficient of Determination and Standard Error of Estimate


Objectives

• Draw a scatter plot for a set of ordered pairs.
• Find the correlation coefficient.
• Test the hypothesis H0: ρ = 0.
• Find the equation of the regression line.
• Find the coefficient of determination.
• Find the standard error of estimate.
• Find a prediction interval.


11-2 Scatter Plots

• A scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of the independent variable, x, and the dependent variable, y.


11-2 Scatter Plots - Example

• Construct a scatter plot for the data obtained in a study of age and systolic blood pressure of six randomly selected subjects.
• The data are given on the next slide.


11-2 Scatter Plots - Example

Subject   Age, x   Pressure, y
A         43       128
B         48       120
C         56       135
D         61       143
E         67       141
F         70       152

11-2 Scatter Plots - Example

[Figure: scatter plot of Pressure (vertical axis, 120 to 150) against Age (horizontal axis, 40 to 70), showing a positive relationship.]


11-2 Scatter Plots - Other Examples

[Figure: scatter plot of Final grade (vertical axis, 40 to 90) against Number of absences (horizontal axis, 5 to 15), showing a negative relationship.]

11-2 Scatter Plots - Other Examples

[Figure: scatter plot of y (vertical axis, 0 to 10) against x (horizontal axis, 0 to 70), showing no relationship.]


11-3 Correlation Coefficient

• The correlation coefficient computed from the sample data measures the strength and direction of a linear relationship between two variables.
• Sample correlation coefficient: r.
• Population correlation coefficient: ρ.

11-3 Range of Values for the Correlation Coefficient

• r near -1: strong negative relationship
• r near 0: no linear relationship
• r near +1: strong positive relationship


11-3 Formula for the Correlation Coefficient r

$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} $$

where n is the number of data pairs.

11-3 Correlation Coefficient - Example (Verify)

• Compute the correlation coefficient for the age and blood pressure data.
• $\sum x = 345$, $\sum y = 819$, $\sum xy = 47{,}634$, $\sum x^2 = 20{,}399$, $\sum y^2 = 112{,}443$.
• Substituting in the formula for r gives r = 0.897.
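
The substitution in this example can be checked with a few lines of Python. This is a minimal sketch that applies the computational formula for r to the six age and blood pressure pairs; the function and variable names are illustrative and do not come from the slides.

```python
import math

# Age (x) and systolic blood pressure (y) for subjects A-F.
ages = [43, 48, 56, 61, 67, 70]
pressures = [128, 120, 135, 143, 141, 152]


def correlation_coefficient(x, y):
    """Sample correlation coefficient r from the computational formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


r = correlation_coefficient(ages, pressures)
print(round(r, 3))  # 0.897
```
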

11-3 The Significance of the Correlation Coefficient

• The population correlation coefficient, ρ, is the correlation between all possible pairs of data values (x, y) taken from a population.
• H0: ρ = 0    H1: ρ ≠ 0
• This tests for a significant correlation between the variables in the population.


11-3 Formula for the t Test for the Correlation Coefficient

$$ t = r\sqrt{\frac{n - 2}{1 - r^2}} $$

with d.f. = n - 2.

11-3 Example

• Test the significance of the correlation coefficient for the age and blood pressure data. Use α = 0.05 and r = 0.897.
• Step 1: State the hypotheses. H0: ρ = 0 and H1: ρ ≠ 0.
• Step 2: Find the critical values. Since α = 0.05 and there are 6 - 2 = 4 degrees of freedom, the critical values are t = +2.776 and t = -2.776.
• Step 3: Compute the test value. t = 4.059 (verify).
• Step 4: Make the decision. Reject the null hypothesis, since the test value falls in the critical region (4.059 > 2.776).
• Step 5: Summarize the results. There is a significant relationship between the variables of age and blood pressure.
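
Steps 2 to 4 can be sketched in Python as follows. The critical value 2.776 is copied from the example (it would normally be read from a t table) rather than computed, and the function name `t_statistic` is an illustrative choice.

```python
import math


def t_statistic(r, n):
    """Test value for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


r = 0.897               # rounded sample correlation coefficient from the example
n = 6                   # number of data pairs
critical_value = 2.776  # two-tailed, alpha = 0.05, d.f. = 4 (from the example)

t = t_statistic(r, n)
reject_null = abs(t) > critical_value
print(round(t, 2), reject_null)
```

Using the unrounded r instead of 0.897 shifts the test value slightly (to roughly 4.05), which does not change the decision.
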


11-4 Regression

• The scatter plot for the age and blood pressure data displays a linear pattern.
• We can model this relationship with a straight line.
• This line is called the line of best fit or the regression line.
• The equation of the line is y′ = a + bx.


11-4 Formulas for the Regression Line y′ = a + bx

$$ a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n\sum x^2 - (\sum x)^2} $$

$$ b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} $$

where a is the y′ intercept and b is the slope of the line.

11-4 Example

• Find the equation of the regression line for the age and the blood pressure data.
• Substituting into the formulas gives a = 81.048 and b = 0.964 (verify).
• Hence, y′ = 81.048 + 0.964x.
• Note that a represents the intercept and b the slope of the line.
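
The values of a and b can be verified with a short Python sketch that applies the two formulas to the age and blood pressure data. The function name `regression_line` is illustrative, not part of the slides.

```python
# Age (x) and systolic blood pressure (y) for subjects A-F.
ages = [43, 48, 56, 61, 67, 70]
pressures = [128, 120, 135, 143, 141, 152]


def regression_line(x, y):
    """Return (a, b) for the line y' = a + b*x using the sum formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    denominator = n * sum_x2 - sum_x ** 2  # shared by a and b
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b


a, b = regression_line(ages, pressures)
print(round(a, 3), round(b, 3))  # 81.048 0.964
```
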


11-4 Example

[Figure: scatter plot of Pressure (vertical axis, 120 to 150) against Age (horizontal axis, 40 to 70) with the regression line y′ = 81.048 + 0.964x drawn through the points.]


11-4 Using the Regression Line to Predict

• The regression line can be used to predict a value for the dependent variable (y) for a given value of the independent variable (x).
• Caution: Use x values within the experimental region when predicting y values.

11-4 Example

• Use the equation of the regression line to predict the blood pressure for a person who is 50 years old.
• Since y′ = 81.048 + 0.964x, then y′ = 81.048 + 0.964(50) = 129.248 ≈ 129.
• Note that the value of 50 is within the range of x values.
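
The prediction, together with the caution about staying inside the experimental region, might be sketched as below. Raising an error for out-of-range x values is one possible way to enforce the caution and is an assumption of this sketch; the bounds 43 and 70 are the smallest and largest ages in the data.

```python
def predict_pressure(age, a=81.048, b=0.964, x_min=43, x_max=70):
    """Predict y' = a + b*x, refusing x values outside the observed range."""
    if not x_min <= age <= x_max:
        raise ValueError(
            f"age {age} is outside the experimental region [{x_min}, {x_max}]"
        )
    return a + b * age


predicted = predict_pressure(50)
print(round(predicted, 3))  # 129.248
```

Calling `predict_pressure(90)` raises `ValueError`, because 90 lies outside the ages observed in the study.
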


11-5 Coefficient of Determination and Standard Error of Estimate

• The coefficient of determination, denoted by r², measures the proportion of the variation of the dependent variable that is explained by the regression line and the independent variable.
• r² is the square of the correlation coefficient.
• The coefficient of nondetermination is (1 - r²).
• Example: If r = 0.90, then r² = 0.81.
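
To illustrate what "explained variation" means, the sketch below computes r² for the age and blood pressure data as explained variation divided by total variation, which is the standard definition and equals the square of r for a least-squares line. The variable names are illustrative.

```python
# Age (x) and systolic blood pressure (y) for subjects A-F.
ages = [43, 48, 56, 61, 67, 70]
pressures = [128, 120, 135, 143, 141, 152]

n = len(ages)
sum_x, sum_y = sum(ages), sum(pressures)
sum_xy = sum(x * y for x, y in zip(ages, pressures))
sum_x2 = sum(x ** 2 for x in ages)

# Regression coefficients from the sum formulas.
denominator = n * sum_x2 - sum_x ** 2
a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
b = (n * sum_xy - sum_x * sum_y) / denominator

mean_y = sum_y / n
predicted = [a + b * x for x in ages]

# Variation of the predicted values and of the observed values about the mean.
explained_variation = sum((yp - mean_y) ** 2 for yp in predicted)
total_variation = sum((y - mean_y) ** 2 for y in pressures)

r_squared = explained_variation / total_variation
nondetermination = 1 - r_squared
print(round(r_squared, 3), round(nondetermination, 3))  # 0.804 0.196
```

Squaring the rounded value r = 0.897 gives about 0.805; the small difference comes from rounding r before squaring.
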

11-5 Coefficient of Determination and Standard Error of Estimate

• The standard error of estimate, denoted by $s_{est}$, is the standard deviation of the observed y values about the predicted y′ values.
• The formula is given on the next slide.


11-5 Formula for the Standard Error of Estimate

$$ s_{est} = \sqrt{\frac{\sum (y - y')^2}{n - 2}} $$

or

$$ s_{est} = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n - 2}} $$


11-5 Standard Error of Estimate - Example

• From the regression equation y′ = 55.57 + 8.13x and n = 6, find $s_{est}$.
• Here, a = 55.57, b = 8.13, and n = 6.
• Substituting into the formula gives $s_{est}$ = 6.48 (verify).
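
A sketch of this calculation follows. It assumes the example refers to the copy machine data (age in years and monthly maintenance cost) given with the prediction interval example in this chapter, since that data produces the same regression equation. Both forms of the formula are shown, and the function names are illustrative.

```python
import math

# Assumed data: copy machine age (years) and monthly maintenance cost ($).
machine_ages = [1, 2, 3, 4, 4, 6]
monthly_costs = [62, 78, 70, 90, 93, 103]
a, b = 55.57, 8.13  # rounded coefficients from the example


def s_est_shortcut(x, y, a, b):
    """Shortcut form: sqrt((sum y^2 - a*sum y - b*sum xy) / (n - 2))."""
    n = len(x)
    sum_y = sum(y)
    sum_y2 = sum(yi ** 2 for yi in y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    return math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))


def s_est_residuals(x, y, a, b):
    """Definitional form: sqrt(sum (y - y')^2 / (n - 2))."""
    n = len(x)
    squared_residuals = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(squared_residuals / (n - 2))


print(round(s_est_shortcut(machine_ages, monthly_costs, a, b), 2))   # 6.48
print(round(s_est_residuals(machine_ages, monthly_costs, a, b), 2))  # 6.51
```

The two forms agree exactly only when a and b are unrounded; with the rounded coefficients the shortcut form reproduces the 6.48 of the example, while the residual form gives about 6.51.
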


11-5 Prediction Interval

• A prediction interval is an interval constructed about a predicted y value, y′, for a specified x value.


11-5 Prediction Interval

• For a given α value, we can state with (1 - α)100% confidence that the interval will contain the actual value of y that corresponds to the given value of x.


11-5 Formula for the Prediction Interval about a Value y′

$$ y' - t_{\alpha/2}\, s_{est} \sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n\sum x^2 - (\sum x)^2}} < y < y' + t_{\alpha/2}\, s_{est} \sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n\sum x^2 - (\sum x)^2}} $$

with d.f. = n - 2.


11-5 Prediction Interval - Example

• A researcher collects the data shown on the next slide and determines that there is a significant relationship between the age of a copy machine and its monthly maintenance cost. The regression equation is y′ = 55.57 + 8.13x. Find the 95% prediction interval for the monthly maintenance cost of a machine that is 3 years old.


11-5 Prediction Interval - Example

Machine   Age, x (years)   Monthly cost, y
A         1                $62
B         2                $78
C         3                $70
D         4                $90
E         4                $93
F         6                $103

11-5 Prediction Interval - Example

• Step 1: Find $\sum x$, $\sum x^2$, and $\bar{X}$. $\sum x = 20$, $\sum x^2 = 82$, $\bar{X} = 20/6 \approx 3.3$.
• Step 2: Find y′ for x = 3. y′ = 55.57 + 8.13(3) = 79.96.
• Step 3: Find $s_{est}$. $s_{est}$ = 6.48, as shown in the previous example.

11-5 Prediction Interval - Example

• Step 4: Substitute in the formula and solve. $t_{\alpha/2}$ = 2.776, d.f. = 6 - 2 = 4 for 95%. 60.53 < y < 99.39 (verify).
• Hence, one can be 95% confident that the interval 60.53 < y < 99.39 contains the actual value of y.
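
The four steps can be sketched in Python as follows. The values a = 55.57, b = 8.13, s_est = 6.48, and t = 2.776 are taken from the example as given rather than recomputed, and the function name `prediction_interval` is illustrative.

```python
import math

# Copy machine ages (years) from the example table.
machine_ages = [1, 2, 3, 4, 4, 6]

a, b = 55.57, 8.13   # regression coefficients from the example
s_est = 6.48         # standard error of estimate from the example
t_critical = 2.776   # t for alpha/2 with d.f. = 4, from the example


def prediction_interval(x0, x, a, b, s_est, t_critical):
    """Return (lower, upper) bounds of the prediction interval at x0."""
    n = len(x)
    sum_x = sum(x)
    sum_x2 = sum(xi ** 2 for xi in x)
    mean_x = sum_x / n
    y_pred = a + b * x0
    spread = math.sqrt(
        1 + 1 / n + n * (x0 - mean_x) ** 2 / (n * sum_x2 - sum_x ** 2)
    )
    margin = t_critical * s_est * spread
    return y_pred - margin, y_pred + margin


lower, upper = prediction_interval(3, machine_ages, a, b, s_est, t_critical)
print(round(lower, 2), round(upper, 2))
```

Without intermediate rounding the endpoints land within about 0.1 of 60.53 and 99.39; the small gap presumably reflects rounding of intermediate values in the worked example, such as using 3.3 for the mean of x.
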
