and Correlation.
Corresponds to Chapter 10 of Tamhane and Dunlop
[Figure: Scatter plot of ozone concentration by temperature. Vertical axis: air$ozone; horizontal axis: air$temperature.]
This graph was created using S-PLUS(R) Software. S-PLUS(R) is a registered trademark of Insightful Corporation.
A Probabilistic Model for Simple Linear Regression
See Figure 10.1 (p. 348); the four assumptions of the simple linear
regression model are listed on the same page.
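Least Squares Line Mathematics (invented by Gauss)
Find the line, i.e., the values of β0 and β1, that minimizes the sum
of the squared deviations:
\[ Q = \sum_{i=1}^{n} \left[ y_i - (\beta_0 + \beta_1 x_i) \right]^2 \]
How? Set both partial derivatives to zero:
\[ \frac{\partial Q}{\partial \beta_0} = 0 \quad \text{and} \quad \frac{\partial Q}{\partial \beta_1} = 0 \]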
Finding Regression Coefficients
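\[ \frac{\partial Q}{\partial \beta_0} = -2 \sum_{i=1}^{n} \left[ y_i - (\beta_0 + \beta_1 x_i) \right] \]
\[ \frac{\partial Q}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i \left[ y_i - (\beta_0 + \beta_1 x_i) \right] \]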
Normal Equations
\[ n \beta_0 + \beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i \]
\[ \beta_0 \sum_{i=1}^{n} x_i + \beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i \]
Solution to Normal Equations
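\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{S_{xy}}{S_{xx}} \]
\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]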
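A minimal Python sketch of the closed-form solution above, using only the standard library. The data values and the function name `least_squares_line` are hypothetical, chosen for illustration; they are not the ozone data.

```python
# Hypothetical data, invented for illustration (not the air data set).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def least_squares_line(x, y):
    """Return (b0, b1) from the closed-form solution of the normal equations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # S_xy and S_xx as defined in the formulas above.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares_line(x, y)

# Left- and right-hand sides of the two normal equations, for checking.
n = len(x)
lhs1 = n * b0 + b1 * sum(x)
rhs1 = sum(y)
lhs2 = b0 * sum(x) + b1 * sum(xi * xi for xi in x)
rhs2 = sum(xi * yi for xi, yi in zip(x, y))
print(b0, b1)
```

Substituting the estimates back into the normal equations (`lhs1` against `rhs1`, `lhs2` against `rhs2`) is a cheap way to check the arithmetic.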
[Figure: plot against air$temperature]
Fitted values of y_i:
\[ \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i, \quad i = 1, 2, \ldots, n \]
Residuals:
\[ e_i = y_i - \hat{y}_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i), \quad i = 1, 2, \ldots, n \]
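As a rough illustration of these definitions, the Python sketch below computes fitted values and residuals for a small hypothetical data set (invented here, not the ozone data). It also checks two consequences of the normal equations: the residuals sum to zero and are orthogonal to the x values.

```python
# Hypothetical data, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares estimates: slope = S_xy / S_xx, intercept = y_bar - slope * x_bar.
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

# Fitted values and residuals, one per observation.
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Both sums are zero up to rounding error.
sum_e = sum(residuals)
sum_xe = sum(xi * ei for xi, ei in zip(x, residuals))
print(fitted)
print(residuals)
```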
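y is n by 1
X is n by 2
β is 2 by 1
ε is n by 1
y = Xβ + ε
For n = 4:
\[
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix}
=
\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ 1 & x_3 \\ 1 & x_4 \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \end{bmatrix}
\]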
Solution of linear equations
In linear algebra:
Find x which solves Ax=b.
In regression analysis:
Find β which solves Xβ=y
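Why can't we do this?
With n > 2 observations, X is n by 2, so the system is overdetermined and in general has no exact solution.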
Least Squares
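\[
\begin{aligned}
Q &= (y - X\beta)'(y - X\beta) \\
  &= y'y - \beta'X'y - y'X\beta + \beta'X'X\beta \\
  &= y'y - 2\beta'X'y + \beta'X'X\beta
\end{aligned}
\]
(β'X'y is a scalar, so it equals its transpose y'Xβ.)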
Least Squares continued
\[ X'X = \begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix} \]
\[ X'y = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix} \]
Least Squares continued
\[ X'Xb = X'y \]
\[ \begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix} b = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix} \]
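\[ X'Xb = X'y \]
\[ b = (X'X)^{-1}X'y \quad \text{(if X has linearly independent columns)} \]
Solution by QR decomposition
X = QR, where Q has orthonormal columns (Q'Q = I) and R is upper triangular
and invertible
\[
\begin{aligned}
b &= (X'X)^{-1}X'y = (R'Q'QR)^{-1}R'Q'y \\
  &= (R'R)^{-1}R'Q'y = R^{-1}Q'y
\end{aligned}
\]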
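The QR route can be sketched in plain Python for the two-column case. The sketch below builds Q and R by classical Gram-Schmidt on the columns of X (a column of ones and the x values) and then solves Rb = Q'y by back-substitution. The data are hypothetical, and Gram-Schmidt is used only because it is short to write; numerical libraries typically use Householder reflections instead.

```python
import math

# Hypothetical data, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


# Columns of X: the intercept column and the predictor.
col1 = [1.0] * n
col2 = x

# Gram-Schmidt: X = QR with Q = [q1 q2] and R = [[r11, r12], [0, r22]].
r11 = math.sqrt(dot(col1, col1))
q1 = [v / r11 for v in col1]
r12 = dot(q1, col2)
v2 = [c - r12 * q for c, q in zip(col2, q1)]  # part of col2 orthogonal to q1
r22 = math.sqrt(dot(v2, v2))
q2 = [v / r22 for v in v2]

# Q'y, then back-substitution for R b = Q'y.
c1 = dot(q1, y)
c2 = dot(q2, y)
b1_qr = c2 / r22
b0_qr = (c1 - r12 * b1_qr) / r11
print(b0_qr, b1_qr)
```

The result should agree, up to rounding, with the slope S_xy / S_xx and intercept ȳ − β̂1 x̄ from the scalar formulas.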
The Hat Matrix
\[ b = (X'X)^{-1}X'y \]
\[ \hat{y} = Xb = X(X'X)^{-1}X'y = Hy \]
H (n by n) is the hat matrix
It takes y to ŷ
H is symmetric and idempotent: H' = H and HH = H
Diagonal elements of the hat matrix are
useful in detecting influential observations.
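A small Python sketch of the hat matrix for simple linear regression, assuming hypothetical data. It forms (X'X)^{-1} with the explicit 2 by 2 inverse, builds H = X(X'X)^{-1}X' entry by entry, and collects the diagonal elements (leverages). The helper name `quad` is made up for this example.

```python
# Hypothetical data, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

# Explicit inverse of the 2 by 2 matrix X'X.
sum_x = sum(x)
sum_x2 = sum(v * v for v in x)
det = n * sum_x2 - sum_x ** 2
xtx_inv = [[sum_x2 / det, -sum_x / det],
           [-sum_x / det, n / det]]

# Rows of the model matrix X: (1, x_i).
rows = [(1.0, xi) for xi in x]


def quad(u, m, v):
    """Compute u' m v for 2-vectors u, v and a 2 by 2 matrix m."""
    return sum(u[a] * m[a][b] * v[b] for a in range(2) for b in range(2))


# H[i][j] = (row i of X) (X'X)^{-1} (row j of X)'.
H = [[quad(ri, xtx_inv, rj) for rj in rows] for ri in rows]

# H takes y to the fitted values.
y_hat = [sum(H[i][j] * y[j] for j in range(n)) for i in range(n)]

# HH, for checking idempotence, and the diagonal (leverages).
HH = [[sum(H[i][k] * H[k][j] for k in range(n)) for j in range(n)]
      for i in range(n)]
leverages = [H[i][i] for i in range(n)]
print(leverages)
```

In this example the largest leverages belong to the observations whose x values lie farthest from the mean of x, which is why the diagonal is used to flag potentially influential points.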
Expected value of b
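\[
\begin{aligned}
E(b) &= E[(X'X)^{-1}X'y] \\
     &= E[(X'X)^{-1}X'(X\beta + \varepsilon)] \\
     &= E[(X'X)^{-1}X'X\beta + (X'X)^{-1}X'\varepsilon] \\
     &= \beta + (X'X)^{-1}X'E(\varepsilon) = \beta
\end{aligned}
\]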
Hence b is an unbiased estimator of β.
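Unbiasedness can be illustrated, though of course not proved, by simulation. The Python sketch below assumes hypothetical true values β0 = 1, β1 = 0.5 and σ = 1, draws normal errors with a fixed seed, refits the line many times, and averages the estimates. All names and values are assumptions made for this example.

```python
import random


def fit(x, y):
    """Least squares estimates (b0, b1) for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b1 * x_bar, b1


random.seed(0)                        # fixed seed so the run is repeatable
beta0, beta1, sigma = 1.0, 0.5, 1.0   # hypothetical true parameters
x = [float(i) for i in range(20)]     # fixed design points
reps = 2000

sum_b0 = 0.0
sum_b1 = 0.0
for _ in range(reps):
    # Generate y = beta0 + beta1 * x + error and refit.
    y = [beta0 + beta1 * xi + random.gauss(0.0, sigma) for xi in x]
    b0, b1 = fit(x, y)
    sum_b0 += b0
    sum_b1 += b1

mean_b0 = sum_b0 / reps
mean_b1 = sum_b1 / reps
print(mean_b0, mean_b1)
```

The averages should land close to the assumed β0 and β1, with the gap shrinking as `reps` grows.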
Covariance of b
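The covariance matrix of y is σ²I
\[ b = (X'X)^{-1}X'y = Ay \quad \text{(where } A = (X'X)^{-1}X' \text{ is k by n; here k = 2)} \]
\[
\begin{aligned}
\mathrm{Cov}(b) &= A \, \mathrm{Var}(y) \, A' = A \, \sigma^2 I \, A' = \sigma^2 AA' \\
                &= \sigma^2 (X'X)^{-1}X'X(X'X)^{-1} \\
                &= \sigma^2 (X'X)^{-1}
\end{aligned}
\]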
Covariance of b
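\[
\sigma^2 \begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix}^{-1}
= \frac{\sigma^2}{n \sum x_i^2 - \left( \sum x_i \right)^2}
\begin{bmatrix} \sum x_i^2 & -\sum x_i \\ -\sum x_i & n \end{bmatrix}
\]
\[ SD(b_0) = \sigma \sqrt{\frac{\sum x_i^2}{n S_{xx}}}; \quad SD(b_1) = \frac{\sigma}{\sqrt{S_{xx}}} \]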
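The link between the matrix form and the two SD formulas rests on the identity n Σx_i² − (Σx_i)² = n S_xx. The Python sketch below checks this numerically for hypothetical x values and an assumed known σ = 2; both are invented for illustration.

```python
import math

# Hypothetical design points and an assumed known error SD.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
sigma = 2.0
n = len(x)

sum_x = sum(x)
sum_x2 = sum(v * v for v in x)
det = n * sum_x2 - sum_x ** 2          # determinant of X'X

# Cov(b) = sigma^2 (X'X)^{-1}, using the explicit 2 by 2 inverse.
cov_b = [[sigma ** 2 * sum_x2 / det, -sigma ** 2 * sum_x / det],
         [-sigma ** 2 * sum_x / det, sigma ** 2 * n / det]]

# Scalar formulas for the standard deviations.
x_bar = sum_x / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
sd_b0 = sigma * math.sqrt(sum_x2 / (n * s_xx))
sd_b1 = sigma / math.sqrt(s_xx)

# Square roots of the diagonal of Cov(b) should match the scalar formulas.
print(math.sqrt(cov_b[0][0]), sd_b0)
print(math.sqrt(cov_b[1][1]), sd_b1)
```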
Estimation of σ2
\[ s^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - 2} = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2} \]
Statistical Inference for β0 and β1
\[ SE(\hat{\beta}_0) = s \sqrt{\frac{\sum x_i^2}{n S_{xx}}} \quad \text{and} \quad SE(\hat{\beta}_1) = \frac{s}{\sqrt{S_{xx}}} \]
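A short Python sketch of these quantities, assuming a small hypothetical data set (not the ozone data): it estimates σ² by s² = SSE / (n − 2) and plugs s into the two standard error formulas.

```python
import math

# Hypothetical data, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Residual sum of squares and the estimate of sigma^2 on n - 2 degrees of freedom.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s2 = sse / (n - 2)
s = math.sqrt(s2)

# Standard errors: the SD formulas with sigma replaced by s.
se_b0 = s * math.sqrt(sum(xi * xi for xi in x) / (n * s_xx))
se_b1 = s / math.sqrt(s_xx)
print(s2, se_b0, se_b1)
```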
Sum of Squares Total (SST):
\[ \sum_{i=1}^{n} (y_i - \bar{y})^2 \]
Sum of Squares for Error (SSE):
\[ \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]
Sum of Squares for Regression (SSR):
\[ \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \]
Geometry of the Sums of Squares
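\[ y_i - \bar{y} = (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i) \]
\[ r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} \]
\[ H_0 : \beta_1 = 0 \quad \text{vs.} \quad H_1 : \beta_1 \neq 0 \]
\[ F = \frac{SSR/1}{SSE/(n-2)} = \frac{MSR}{MSE} = t^2 \]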
This code was created using S-PLUS(R) Software. S-PLUS(R) is a registered trademark of Insightful Corporation.
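The sums of squares, r², and the F statistic can be checked numerically with a short Python sketch. It assumes a small hypothetical data set (not the ozone data) and verifies that SST = SSR + SSE and that F equals the square of the t statistic for the slope.

```python
import math

# Hypothetical data, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

# The three sums of squares.
sst = sum((yi - y_bar) ** 2 for yi in y)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ssr = sum((fi - y_bar) ** 2 for fi in fitted)

# Coefficient of determination, computed both ways.
r2 = ssr / sst
r2_alt = 1.0 - sse / sst

# F statistic for H0: beta1 = 0, and the t statistic for the slope.
msr = ssr / 1
mse = sse / (n - 2)
f_stat = msr / mse
t_stat = b1 / (math.sqrt(mse) / math.sqrt(s_xx))
print(sst, ssr, sse, r2, f_stat, t_stat ** 2)
```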
Regression Diagnostics
[Figure: Residual vs. observation number. Vertical axis: resid(ozone.lm); horizontal axis: observation number.]
Regression Diagnostics
[Figure: Residual vs. fitted value. Vertical axis: resid(ozone.lm); horizontal axis: fitted(ozone.lm).]
Regression Diagnostics
[Figure: Residual vs. x. Vertical axis: resid(ozone.lm); horizontal axis: air$temperature.]
Regression Diagnostics
[Figure: QQ plot of residuals. Vertical axis: resid(ozone.lm).]
Some useful S-Plus commands
my.lm <- lm(y ~ x, data=mydata, na.action=na.omit)
# includes intercept term by default
summary(my.lm)
# gives coefficients, correlation of coefficients, R-square, F-statistic, residual standard error
summary.aov(my.lm)
# gives ANOVA table
resid(my.lm)
# gives residuals
fitted(my.lm)
# gives fitted values
model.matrix(my.lm)
# gives model matrix