
Chapter 12

Regression and
Correlation Analysis
Chapter 12 - Chapter Outcomes
After studying the material in this chapter, you
should be able to:
Calculate and interpret the simple correlation
between two variables.
Determine whether the correlation is significant.
Calculate and interpret the simple linear
regression coefficients for a set of data.
Understand the basic assumptions behind
regression analysis.
Determine whether a regression model is
significant.



Calculate and interpret confidence intervals
for the regression coefficients.
Recognize regression analysis applications
for purposes of prediction and description.
Recognize some potential problems if
regression analysis is used incorrectly.
Recognize several nonlinear relationships
between two variables.
Scatter Diagrams
A scatter plot is a graph that may be used to represent the relationship between two variables. It is also referred to as a scatter diagram.
Dependent and Independent
Variables
A dependent variable is the variable to be
predicted or explained in a regression
model. This variable is assumed to be
functionally related to the independent
variable.
An independent variable is the variable
related to the dependent variable in a
regression equation. The independent
variable is used in a regression model to
estimate the value of the dependent
variable.
Two Variable Relationships (Figure 11-1)
[Figure: five scatter plots of Y against X: (a) Linear, (b) Linear, (c) Curvilinear, (d) Curvilinear, (e) No Relationship]
Correlation
The correlation coefficient is a quantitative measure of the strength of the linear relationship between two variables. The correlation ranges from +1.0 to -1.0. A correlation of +1.0 or -1.0 indicates a perfect linear relationship (positive or negative, respectively), whereas a correlation of 0 indicates no linear relationship.
SAMPLE CORRELATION COEFFICIENT, or Pearson's Correlation Coefficient
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}}$$
where:
r = Sample correlation coefficient
n = Sample size
x = Value of the independent variable
y = Value of the dependent variable
$\bar{x}$, $\bar{y}$ = Sample means of x and y
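The deviation form above translates directly into code. The sketch below is a minimal Python illustration, assuming the data arrive as two equal-length lists of numbers; the function name `pearson_r` and the sample data are hypothetical and not from the text.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r, deviation-from-the-mean form."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of cross-products of deviations from the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the sums of squared deviations
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Perfectly linear (hypothetical) data give r = +1 or r = -1
print(pearson_r([1, 2, 3, 4], [3, 5, 7, 9]))   # y = 2x + 1  → 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # y = 10 - 2x → -1.0
```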
or the algebraic equivalent:
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
Correlation
(Example 11-1)
(Table 11-1)
Sales (y)   Years (x)   xy       y²          x²
487         3           1,461    237,169     9
445         5           2,225    198,025     25
272         2           544      73,984      4
641         8           5,128    410,881     64
187         2           374      34,969      4
440         6           2,640    193,600     36
346         7           2,422    119,716     49
238         1           238      56,644      1
312         4           1,248    97,344      16
269         2           538      72,361      4
655         9           5,895    429,025     81
563         6           3,378    316,969     36
Totals: Σy = 4,855; Σx = 55; Σxy = 26,091; Σy² = 2,240,687; Σx² = 329
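$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
$$r = \frac{12(26{,}091) - 55(4{,}855)}{\sqrt{\left[12(329) - (55)^2\right]\left[12(2{,}240{,}687) - (4{,}855)^2\right]}} = \frac{46{,}067}{\sqrt{(923)(3{,}317{,}219)}} = 0.8325$$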
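The Example 11-1 arithmetic can be reproduced from the raw rows of Table 11-1. The sketch below recomputes the column totals and then applies the algebraic formula; the variable names are illustrative only.

```python
import math

# Table 11-1: sales (y) and years (x) for the 12 observations
sales = [487, 445, 272, 641, 187, 440, 346, 238, 312, 269, 655, 563]
years = [3, 5, 2, 8, 2, 6, 7, 1, 4, 2, 9, 6]

n = len(sales)
sum_y = sum(sales)                                   # 4,855
sum_x = sum(years)                                   # 55
sum_xy = sum(x * y for x, y in zip(years, sales))    # 26,091
sum_y2 = sum(y * y for y in sales)                   # 2,240,687
sum_x2 = sum(x * x for x in years)                   # 329

num = n * sum_xy - sum_x * sum_y                     # 46,067
den = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = num / den
print(round(r, 4))   # → 0.8325
```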
Simple Linear Regression
Analysis
Simple linear regression analysis
analyzes the linear relationship that
exists between a dependent variable
and a single independent variable.
SIMPLE LINEAR REGRESSION MODEL (POPULATION MODEL)
$$y = \beta_0 + \beta_1 x + \varepsilon$$
where:
y = Value of the dependent variable
x = Value of the independent variable
$\beta_0$ = Population's y-intercept
$\beta_1$ = Slope of the population regression line
$\varepsilon$ = Error term, or residual
The simple linear regression model has four assumptions:
Individual values of the error terms, $\varepsilon_i$, are statistically independent of one another.
The distribution of all possible values of $\varepsilon_i$ is normal.
The distributions of possible $\varepsilon_i$ values have equal variances for all values of x.
The means of the dependent variable, $\mu_{y|x}$, for all specified values of the independent variable can be connected by a straight line called the population regression model.
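To make the population model and its assumptions concrete, the sketch below simulates data from y = β0 + β1x + ε with independent, normally distributed errors of constant variance. The parameter values `BETA_0`, `BETA_1` and `SIGMA` are hypothetical, chosen only for illustration; with many draws per x value, the mean of y at each x should land near the population regression line.

```python
import random
import statistics

# Hypothetical population parameters, chosen only for illustration
BETA_0, BETA_1, SIGMA = 2.0, 0.5, 1.0

def draw_sample(xs, rng):
    """Draw y = BETA_0 + BETA_1*x + eps for each x.

    Each eps is drawn independently from a normal distribution with mean 0
    and the same standard deviation SIGMA for every x, matching the
    independence, normality and equal-variance assumptions.
    """
    return [BETA_0 + BETA_1 * x + rng.gauss(0.0, SIGMA) for x in xs]

rng = random.Random(42)          # fixed seed so runs are repeatable
xs = [1, 2, 3, 4, 5] * 2000      # each x value appears 2,000 times
ys = draw_sample(xs, rng)

# The mean of y at each x should sit close to the population line
for x in sorted(set(xs)):
    mean_y = statistics.mean(y for xi, y in zip(xs, ys) if xi == x)
    print(x, round(mean_y, 2), BETA_0 + BETA_1 * x)
```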
REGRESSION COEFFICIENTS
In the simple regression model, there
are two coefficients: the intercept and
the slope.
The interpretation of the regression slope coefficient is that it gives the average change in the dependent variable for a unit increase in the independent variable. The slope coefficient may be positive or negative, depending on the relationship between the two variables.
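Because the fitted line ŷ = b0 + b1x is linear, a one-unit increase in x changes the prediction by exactly b1, whatever the starting value of x. The sketch below checks this using the fitted line ŷ = 3.75 + 0.75x from the truck repair example in this chapter; `predict` is a hypothetical helper name.

```python
def predict(x, b0=3.75, b1=0.75):
    """Predicted value y-hat = b0 + b1*x (defaults: truck repair example)."""
    return b0 + b1 * x

# Raising x by one unit changes the prediction by b1 at every starting x
for x in (1, 3, 5):
    print(x, predict(x + 1) - predict(x))   # → 0.75 each time
```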
The least squares criterion is used
for determining a regression line
that minimizes the sum of squared
residuals.
A residual is the difference between the actual value of the dependent variable and the value predicted by the regression model:
$$\text{residual} = y - \hat{y}$$
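Residuals and the least squares criterion can be illustrated with the truck repair data from this chapter (ages 5, 3, 3, 1 years; repair expenses 7, 7, 6, 4). The sketch assumes the fitted coefficients b0 = 3.75 and b1 = 0.75 from that example and compares its sum of squared residuals with that of an arbitrary alternative line, ŷ = 3 + x, chosen only for contrast. The helper names `residuals` and `sse` are hypothetical.

```python
def residuals(xs, ys, b0, b1):
    """Residuals y - y_hat for the line y_hat = b0 + b1*x."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

def sse(xs, ys, b0, b1):
    """Sum of squared residuals, the quantity least squares minimizes."""
    return sum(e * e for e in residuals(xs, ys, b0, b1))

ages = [5, 3, 3, 1]       # x: truck age (years)
repair = [7, 7, 6, 4]     # y: annual repair expense (hundreds)

print(residuals(ages, repair, 3.75, 0.75))   # → [-0.5, 1.0, 0.0, -0.5]
print(sse(ages, repair, 3.75, 0.75))         # least squares line → 1.5
print(sse(ages, repair, 3.0, 1.0))           # alternative line   → 2.0
```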

ESTIMATED REGRESSION MODEL (SAMPLE MODEL)
$$\hat{y} = b_0 + b_1 x$$
where:
$\hat{y}$ = Estimated, or predicted, y value
$b_0$ = Unbiased estimate of the regression intercept
$b_1$ = Unbiased estimate of the regression slope
x = Value of the independent variable

LEAST SQUARES EQUATIONS
$$b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$
algebraic equivalent:
$$b_1 = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{\left(\sum x\right)^2}{n}}$$
and
$$b_0 = \bar{y} - b_1 \bar{x}$$
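The deviation form and the algebraic form of b1 give the same coefficients. The sketch below implements each in Python so they can be compared; the function names and the five-point data set are hypothetical. For these data both should return b0 ≈ 2.2 and b1 ≈ 0.6, up to floating-point rounding.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for y-hat = b0 + b1*x using the deviation form."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar          # intercept from the means
    return b0, b1

def least_squares_sums(xs, ys):
    """Same coefficients via the algebraic (running sums) form."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1

xs = [1, 2, 3, 4, 5]     # hypothetical illustration data
ys = [2, 4, 5, 4, 5]
print(least_squares(xs, ys))
print(least_squares_sums(xs, ys))
```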

Simple Linear Regression Analysis
(Annual Truck Repair Expense Example, page 662)
The director of Chapel Hill is interested in the relationship between the age of a garbage truck and its annual repair expense, using data for 4 trucks.
y = repair expense during last year (in hundreds); x = age of truck (years)
y    x    xy    y²    x²
7    5    35    49    25
7    3    21    49    9
6    3    18    36    9
4    1    4     16    1
Totals: Σy = 24; Σx = 12; Σxy = 78; Σy² = 150; Σx² = 44
(Table 11-3)
$$b_1 = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{\left(\sum x\right)^2}{n}} = \frac{78 - \frac{12(24)}{4}}{44 - \frac{(12)^2}{4}} = \frac{6}{8} = 0.75$$
$$b_0 = \bar{y} - b_1 \bar{x} = 6 - 0.75(3) = 3.75$$
The least squares regression line is:
$$\hat{y} = 3.75 + 0.75x$$
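The truck repair calculation can be checked in a few lines of Python: recompute the column totals from the four rows, then apply the algebraic b1 formula and b0 = ȳ - b1x̄. This is a verification sketch only; the variable names are illustrative.

```python
# Truck repair data (y = repair expense in hundreds, x = age in years)
repair = [7, 7, 6, 4]
ages = [5, 3, 3, 1]

n = len(ages)
sum_x, sum_y = sum(ages), sum(repair)                # 12, 24
sum_xy = sum(x * y for x, y in zip(ages, repair))   # 78
sum_x2 = sum(x * x for x in ages)                    # 44

# Algebraic form of the slope, then the intercept from the means
b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)   # 6 / 8
b0 = sum_y / n - b1 * (sum_x / n)                                # 6 - 0.75 * 3
print(b0, b1)   # → 3.75 0.75
```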
Least Squares Regression Properties
The sum of the residuals from the least squares regression line is 0.
The sum of the squared residuals is a minimum.
The simple regression line always passes through the point $(\bar{x}, \bar{y})$, that is, the mean of the x variable and the mean of the y variable.
The least squares coefficients are unbiased estimates of $\beta_0$ and $\beta_1$.
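The first three properties can be checked numerically on the truck repair data, assuming the fitted line ŷ = 3.75 + 0.75x. The perturbation sizes in the loop are arbitrary values chosen only to show that nudging either coefficient increases the sum of squared residuals. Unbiasedness is a property over repeated samples and cannot be checked from a single data set.

```python
ages = [5, 3, 3, 1]       # x: truck age (years)
repair = [7, 7, 6, 4]     # y: annual repair expense (hundreds)
b0, b1 = 3.75, 0.75       # fitted coefficients from the truck example

resid = [y - (b0 + b1 * x) for x, y in zip(ages, repair)]
x_bar = sum(ages) / len(ages)
y_bar = sum(repair) / len(repair)

# Property 1: residuals sum to zero
print(sum(resid))                  # → 0.0
# Property 3: the line passes through (x-bar, y-bar)
print(b0 + b1 * x_bar, y_bar)      # → 6.0 6.0
# Property 2: shifting either coefficient raises the sum of squared residuals
sse_best = sum(e * e for e in resid)
for d0, d1 in [(0.1, 0.0), (0.0, 0.1), (-0.2, 0.05)]:
    sse_alt = sum((y - (b0 + d0 + (b1 + d1) * x)) ** 2 for x, y in zip(ages, repair))
    print(d0, d1, sse_alt > sse_best)   # → True
```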
Standard Error of Estimate: Statisticians use the standard error of estimate to measure the reliability of the estimated regression equation. In other words, the standard error of estimate measures the variability, or scatter, of the observations around the estimated regression line.
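No formula is given here. A common definition, assumed in this sketch, divides the sum of squared residuals (SSE) by n - 2, since two coefficients are estimated, and takes the square root: $s_\varepsilon = \sqrt{SSE/(n-2)}$. The sketch applies it to the truck repair data with the fitted line ŷ = 3.75 + 0.75x; the function name is hypothetical, and the result is in the same units as y (hundreds).

```python
import math

def standard_error_of_estimate(xs, ys, b0, b1):
    """Return sqrt(SSE / (n - 2)) for the fitted line y-hat = b0 + b1*x.

    n - 2 degrees of freedom because two coefficients (b0, b1) were estimated.
    """
    n = len(xs)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (n - 2))

ages = [5, 3, 3, 1]      # x: truck age (years)
repair = [7, 7, 6, 4]    # y: annual repair expense (hundreds)
print(round(standard_error_of_estimate(ages, repair, 3.75, 0.75), 4))   # → 0.866
```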
