Chapter Goals
To understand the methods for displaying and describing the relationship between two variables
Example
The following graph shows the scatter plot of Exam 1 score (x) and Exam 2 score (y) for 354 students in a class. Is there a relationship between x and y?
[Figure: scatter plot of Exam 2 score (y) against Exam 1 score (x)]
Correlation Coefficient
The population correlation coefficient $\rho$ (rho) measures the strength of the association between the variables.
The sample correlation coefficient $r$ is an estimate of $\rho$ and is used to measure the strength of the linear relationship in the sample observations.
Features of $\rho$ and $r$
Unit free
Range between -1 and 1
The closer to -1, the stronger the negative linear relationship
The closer to 1, the stronger the positive linear relationship
The closer to 0, the weaker the linear relationship
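The features listed above can be checked numerically. The following is a minimal sketch, assuming the usual sums-of-deviations formula for $r$ (the slides do not spell the formula out); the function name `pearson_r` and the exam scores are hypothetical and are not the class data from the example.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical exam scores, invented for illustration only
exam1 = [55, 62, 70, 74, 81, 90]
exam2 = [60, 58, 75, 70, 85, 88]

r = pearson_r(exam1, exam2)

# Unit free: rescaling x (for example, percent to proportion) leaves r unchanged
r_rescaled = pearson_r([x / 100 for x in exam1], exam2)

# Reversing the direction of y flips the sign of r
r_negated = pearson_r(exam1, [-y for y in exam2])

print(round(r, 3), round(r_rescaled, 3), round(r_negated, 3))
```

Rescaling a variable does not change $r$, and a perfectly linear data set such as `[1, 2, 3]` against `[2, 4, 6]` gives $r = 1$.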
Earlier Example
Correlations
                               Exam1     Exam2
Exam1   Pearson Correlation    1         .400**
        Sig. (2-tailed)                  .000
        N                      366       351
Exam2   Pearson Correlation    .400**    1
        Sig. (2-tailed)        .000
        N                      351       356
**. Correlation is significant at the 0.01 level
Questions?
What kind of relationship would you expect in the following situations:
Age (in years) of a car, and its price.
Number of calories consumed per day and weight.
Height and IQ of a person.
Exercise
Identify the two variables that vary and decide which should be the independent variable and which should be the dependent variable. Sketch a graph that you think best represents the relationship between the two variables.
1. The size of a person's vocabulary over his or her lifetime.
2. The distance from the ceiling to the tip of the minute hand of a clock hung on the wall.
Dependent variable: the variable we wish to explain.
Independent variable: the variable used to explain the dependent variable.
[Figure: scatter plot showing no relationship between x and y]
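$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, 2, \ldots, n$
where $Y_i$ is the dependent variable, $\beta_0 + \beta_1 X_i$ is the linear component, and $\varepsilon_i$ is the random error component.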
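[Figure: the line $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ plotted with Y against X, marking the observed value of $y_i$ for $x_i$ and the random error $\varepsilon_i$]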
$\hat{Y}_i = b_0 + b_1 X_i$
Residual
A residual is the difference between the observed response $y_i$ and the predicted response $\hat{y}_i$. Thus, for each pair of observations $(x_i, y_i)$, the $i$th residual is
$e_i = y_i - \hat{y}_i = y_i - (b_0 + b_1 x_i)$
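As a small illustration of the residual definition, the sketch below computes each residual for a handful of points. The data and the coefficients `b0 = 2.0` and `b1 = 0.5` are hypothetical values chosen only to keep the arithmetic easy to follow.

```python
def residuals(xs, ys, b0, b1):
    """Return e_i = y_i - (b0 + b1 * x_i) for each observation."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Hypothetical observations and coefficients, for illustration only
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.4, 3.1, 3.4, 4.2]

e = residuals(xs, ys, b0=2.0, b1=0.5)
print([round(v, 2) for v in e])  # [-0.1, 0.1, -0.1, 0.2]
```

A positive residual means the observed value lies above the fitted line, and a negative residual means it lies below.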
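$b_1 = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$
algebraic equivalent:
$b_1 = \dfrac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}}$
and
$b_0 = \bar{y} - b_1 \bar{x}$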
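To see that the two expressions for the slope agree, here is a minimal sketch that evaluates both on the same data and then computes the intercept. The function names and the five data points are hypothetical, picked so the results can be verified by hand.

```python
def fit_line(xs, ys):
    """Least squares b0 and b1 using deviations from the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    return b0, b1

def slope_shortcut(xs, ys):
    """Slope from the algebraically equivalent sums formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    return (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = fit_line(xs, ys)
b1_alt = slope_shortcut(xs, ys)
print(round(b0, 4), round(b1, 4), round(b1_alt, 4))  # 2.2 0.6 0.6
```

For these points $\bar{x} = 3$ and $\bar{y} = 4$, the numerator is 6 and the denominator is 10 under either formula, so $b_1 = 0.6$ and $b_0 = 4 - 0.6(3) = 2.2$.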
SPSS Output
Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .762a   .581       .528                41.33032
a. Predictors: (Constant), Square Feet

Coefficients(a)
Model              Standardized Coefficients Beta   t       Sig.
1   (Constant)                                      1.693   .129
    Square Feet    .762                             3.329   .010
a. Dependent Variable: House Price
Graphical Presentation
House price model: scatter plot and regression line
[Figure: scatter plot of House Price ($1000s) against Square Feet with the fitted regression line]
Slope = 0.110
Intercept = 98.248
b1 measures the estimated change in the average value of Y as a result of a one-unit change in X
Here, b1 = .10977 tells us that the average value of a house increases by .10977($1000) = $109.77, on average, for each additional one square foot of size
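The interpretation of the slope can be reproduced directly from the fitted line. The sketch below uses the intercept and slope reported for the house price model; the 2000 square foot house is a hypothetical input, and `predict_price` is an illustrative name.

```python
B0 = 98.248    # intercept of the house price model, in $1000s
B1 = 0.10977   # slope, in $1000s per square foot

def predict_price(square_feet):
    """Predicted house price in $1000s for a given size."""
    return B0 + B1 * square_feet

# Hypothetical house sizes, one square foot apart
price_2000 = predict_price(2000)
price_2001 = predict_price(2001)

# One extra square foot changes the prediction by b1 (in $1000s)
change_in_dollars = (price_2001 - price_2000) * 1000
print(round(price_2000, 3), round(change_in_dollars, 2))  # 317.788 109.77
```

The difference between the two predictions equals the slope, which is why $b_1$ is read as the estimated change in average price per additional square foot.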
Exercise
The growth of children from early childhood through adolescence generally follows a linear pattern. Data on the heights of female Americans during childhood, from four to nine years old, were compiled and the least squares regression line was obtained as $\hat{y} = 32 + 2.4x$, where $\hat{y}$ is the predicted height in inches and $x$ is age in years.
Interpret the value of the estimated slope $b_1 = 2.4$.
Would interpretation of the value of the estimated y-intercept, $b_0 = 32$, make sense here?
What would you predict the height to be for a female American at 8 years old?
What would you predict the height to be for a female American at 25 years old? How does the quality of this answer compare to the previous question?
Coefficient of Determination, $R^2$
The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable. The coefficient of determination is also called R-squared and is denoted as $R^2$.
$0 \le R^2 \le 1$
Note: In the single independent variable case, the coefficient of determination is
$R^2 = r^2$
where $R^2$ is the coefficient of determination and $r$ is the sample correlation coefficient.
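The identity $R^2 = r^2$ can be checked on a small data set. This is a sketch under an assumption: it uses the standard sums-of-squares form $R^2 = 1 - SSE/SST$, which the slides describe in words but do not write out. The data are hypothetical.

```python
import math

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
sst = sum((y - y_bar) ** 2 for y in ys)   # total variation in y

# Least squares line
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Unexplained variation: sum of squared residuals
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

r_squared = 1 - sse / sst                 # assumed definition of R^2
r = sxy / math.sqrt(sxx * sst)            # sample correlation coefficient

print(round(r_squared, 4), round(r ** 2, 4))  # 0.6 0.6
```

The same relationship shows up in the house price output, where $r = .762$ and $.762^2 \approx .581$, the reported R Square.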
[Figure: scatter plots of y against x illustrating different values of $R^2$, including $R^2 = 0$]
SPSS Output
Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .762a   .581       .528                41.33032
a. Predictors: (Constant), Square Feet

ANOVA(b)
Model             df   F        Sig.
1   Regression    1    11.085   .010a
    Residual      8
    Total         9
a. Predictors: (Constant), Square Feet
b. Dependent Variable: House Price

Coefficients(a)
                   Unstandardized Coefficients   Standardized Coefficients
Model              B         Std. Error          Beta                        t       Sig.
1   (Constant)     98.248    58.033                                          1.693   .129
    Square Feet    .110      .033                .762                        3.329   .010
a. Dependent Variable: House Price
Standard error of the estimate: $s_\varepsilon = 41.33032$
Standard error of the slope: $s_{b_1} = 0.03297$
[Figure panels: small $s_\varepsilon$, large $s_\varepsilon$, small $s_{b_1}$, large $s_{b_1}$]
$t = \dfrac{b_1 - \beta_1}{s_{b_1}}$
where $\beta_1$ = hypothesized slope
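The slope of this model is 0.1098. Does square footage of the house affect its sales price?
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
From the printout: $b_1 = 0.10977$, $s_{b_1} = 0.03297$
Test statistic (t Stat for Square Feet): $t = 3.32938$, P-value $= 0.01039$
Critical values: $\pm t_{\alpha/2} = \pm 2.3060$. Reject $H_0$ if $t < -2.3060$ or $t > 2.3060$; otherwise do not reject $H_0$.
Decision: Reject $H_0$, since $3.329 > 2.3060$
Conclusion: There is sufficient evidence that square footage affects house price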
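The test statistic and the decision can be reproduced from the reported slope and its standard error. This is a minimal sketch that takes the critical value 2.3060 from the slides rather than computing it, since the Python standard library has no t distribution; the variable names are illustrative.

```python
b1 = 0.10977               # estimated slope (Square Feet coefficient)
s_b1 = 0.03297             # standard error of the slope
beta1_hypothesized = 0.0   # slope under H0
t_critical = 2.3060        # two-tailed critical value quoted in the slides

# t = (b1 - beta1) / s_b1
t_stat = (b1 - beta1_hypothesized) / s_b1

# Two-tailed rule: reject H0 when |t| exceeds the critical value
reject_h0 = abs(t_stat) > t_critical

print(round(t_stat, 3), reject_h0)  # 3.329 True
```

The computed value agrees with the t Stat reported for Square Feet, up to rounding of the inputs.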
Confidence interval estimate of the slope:
$b_1 \pm t_{\alpha/2}\, s_{b_1}$, with d.f. = n - 2
Excel Printout for House Prices:
              Coefficients   Standard Error   t Stat    P-value
Intercept     98.24833       58.03348         1.69296   0.12892
Square Feet   0.10977        0.03297          3.32938   0.01039
At 95% level of confidence, the confidence interval for the slope is (0.0337, 0.1858)
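The interval can be recomputed from the printout values. The sketch below assumes the same critical value, 2.3060, that appears in the hypothesis test for this example; the variable names are illustrative.

```python
b1 = 0.10977      # estimated slope, in $1000s per square foot
s_b1 = 0.03297    # standard error of the slope
t_crit = 2.3060   # critical value used in the slides for 95% confidence

# b1 plus or minus t * s_b1
margin = t_crit * s_b1
lower = b1 - margin
upper = b1 + margin

print(round(lower, 4), round(upper, 4))  # 0.0337 0.1858
```

Because both limits are positive, the interval excludes 0, which matches the outcome of the two-tailed test on the slope.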
Since the units of the house price variable are $1000s, we are 95% confident that the average impact on sales price is between $33.70 and $185.80 per square foot of house size
This 95% confidence interval does not include 0. Conclusion: There is a significant relationship between house price and square feet at the .05 level of significance
Residual Analysis
Purposes
Examine for linearity assumption
Examine for constant variance for all levels of x
Evaluate normal distribution assumption
Graphical Analysis of Residuals
Can plot residuals vs. x
Can create histogram of residuals to check for normality
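A residual check for linearity can be sketched without any plotting library by looking at the sign pattern of the residuals in order of x. The example below fits a straight line to hypothetical data that actually follow a curve ($y = x^2$); the data and names are invented for illustration.

```python
def fit_line(xs, ys):
    """Least squares intercept b0 and slope b1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b1 * x_bar, b1

# Hypothetical data that follow a curve, not a straight line
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

b0, b1 = fit_line(xs, ys)
resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Sign of each residual, listed in order of increasing x
signs = "".join("+" if e > 0 else "-" for e in resid)

for x, e in zip(xs, resid):
    print(f"x = {x}: residual = {e:+.2f}")
print("sign pattern:", signs)  # +----+
```

Residuals that are positive at both ends and negative in the middle form a systematic curve rather than a random scatter around zero, which is the kind of pattern a residuals vs. x plot reveals when the linearity assumption fails.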
[Figure: residuals plotted against x for a relationship that is not linear and for one that is linear]
[Figure: residuals plotted against x showing non-constant variance and constant variance]