Regression
AMS 572 Group #2
Outline
Introduction
Multiple linear regression attempts to model the relationship between two or more explanatory variables and a response variable by fitting a linear equation to observed data. Every value of the independent variable x is associated with a value of the dependent variable y.
History
Adrien-Marie Legendre (18 September 1752 - 10 January 1833) was a French mathematician. He made important contributions to statistics, number theory, abstract algebra and mathematical analysis. He developed the least squares method, which has broad application in linear regression, signal processing, statistics, and curve fitting.
Johann Carl Friedrich Gauss (30 April 1777 - 23 February 1855) was a German mathematician and scientist who contributed significantly to many fields, including number theory, statistics, analysis, differential geometry, and geodesy.
Sir Ronald Aylmer Fisher (17 February 1890 - 29 July 1962) was an English statistician. In addition to the analysis of variance, Fisher invented the technique of maximum likelihood and originated the concepts of sufficiency, ancillarity, Fisher's linear discriminant and Fisher information.
Probabilistic Model

y_i is the observed value of the random variable (r.v.) Y_i, which depends on fixed predictor values x_{i1}, x_{i2}, \ldots, x_{ik}:

Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \epsilon_i,   i = 1, 2, \ldots, n,

where \beta_0, \beta_1, \ldots, \beta_k are the unknown model parameters, n is the number of observations, and the random errors \epsilon_i are i.i.d. N(0, \sigma^2).
The least squares (LS) estimates \hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k minimize

Q = \sum_{i=1}^{n} \big[y_i - (\beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik})\big]^2,

obtained by setting \partial Q / \partial \beta_0 and \partial Q / \partial \beta_j (j = 1, 2, \ldots, k) equal to zero.
Mileage (in 1000 miles)    Groove Depth (in mils)
 0                         394.33
 4                         329.50
 8                         291.00
12                         255.17
16                         229.33
20                         204.83
24                         179.00
28                         163.83
32                         150.33
The fitted quadratic model is \widehat{Depth} = 386.26 - 12.77\,mile + 0.172\,mile^2.
The fitted values are

\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \cdots + \hat{\beta}_k x_{ik}   (i = 1, 2, \ldots, n),

and the residuals are

e_i = y_i - \hat{y}_i   (i = 1, 2, \ldots, n).

The total sum of squares is SST = \sum_{i=1}^{n} (y_i - \bar{y})^2.
The first column of X corresponds to the constant term (we can treat it as a predictor x_{i0} that is identically equal to 1).
Finally, let \beta = (\beta_0, \beta_1, \ldots, \beta_k)^T and \hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k)^T denote the (k+1) \times 1 vectors of unknown parameters and their LS estimates, respectively.
In this notation the least squares criterion becomes

Q = (y - X\beta)^T (y - X\beta),

and the normal equations become

X^T X \beta = X^T y.

Solving this equation with respect to \beta gives

\hat{\beta} = (X^T X)^{-1} X^T y

(if the inverse of the matrix exists).
According to the formula \hat{\beta} = (X^T X)^{-1} X^T y applied to the mileage-groove depth data above, the LS quadratic model is

\widehat{Depth} = 386.26 - 12.77\,mile + 0.172\,mile^2.
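The intermediate matrix arithmetic is not reproduced in this extract; as an illustration, the same least squares computation can be carried out directly in SAS/IML (a minimal sketch, assuming SAS/IML is available and using the mileage data tabulated above):

proc iml;
  /* mileage (in 1000 miles) and groove depth (in mils) from the table above */
  mile  = {0, 4, 8, 12, 16, 20, 24, 28, 32};
  depth = {394.33, 329.50, 291.00, 255.17, 229.33, 204.83, 179.00, 163.83, 150.33};
  /* design matrix for the quadratic model: intercept, mile, mile^2 */
  X = j(nrow(mile), 1, 1) || mile || mile##2;
  /* LS estimates: beta_hat = (X'X)^{-1} X'y */
  beta_hat = inv(X`*X) * X` * depth;
  print beta_hat;
quit;

The printed estimates should agree (up to rounding) with the fitted quadratic model above.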
Statistical Inference for Multiple Regression
H_{0j}: \beta_j = 0 vs. H_{1j}: \beta_j \neq 0.

If we cannot reject H_{0j}, then x_j is not a significant predictor of y.
For simple linear regression, recall that

\hat{\beta}_1 \sim N(\beta_1, \sigma^2 / S_{xx}),   so   \frac{\hat{\beta}_1 - \beta_1}{\sigma / \sqrt{S_{xx}}} \sim N(0, 1),

W = \frac{(n-2) S^2}{\sigma^2} = \frac{SSE}{\sigma^2} \sim \chi^2_{n-2},

t = \frac{(\hat{\beta}_1 - \beta_1)/(\sigma/\sqrt{S_{xx}})}{\sqrt{W/(n-2)}} = \frac{\hat{\beta}_1 - \beta_1}{S/\sqrt{S_{xx}}} \sim t_{n-2}.
The same argument applies in multiple regression:

\hat{\beta}_j \sim N(\beta_j, \sigma^2 V_{jj}),   so   \frac{\hat{\beta}_j - \beta_j}{\sigma \sqrt{V_{jj}}} \sim N(0, 1),

W = \frac{[n-(k+1)] S^2}{\sigma^2} = \frac{SSE}{\sigma^2} \sim \chi^2_{n-(k+1)},

t = \frac{(\hat{\beta}_j - \beta_j)/(\sigma\sqrt{V_{jj}})}{\sqrt{W/[n-(k+1)]}} = \frac{\hat{\beta}_j - \beta_j}{S \sqrt{V_{jj}}} \sim t_{n-(k+1)}.
The vector of least squares estimators is unbiased:

E(\hat{\beta}) = \big(E(\hat{\beta}_0), E(\hat{\beta}_1), \ldots, E(\hat{\beta}_k)\big)^T = (\beta_0, \beta_1, \ldots, \beta_k)^T = \beta.
Since the errors are independent with common variance \sigma^2,

var(Y) = \sigma^2 I,

a diagonal matrix with \sigma^2 in every diagonal position and 0 elsewhere.
Writing \hat{\beta} = C Y with C = (X^T X)^{-1} X^T,

var(\hat{\beta}) = var(CY) = C\,var(Y)\,C^T = (X^T X)^{-1} X^T (\sigma^2 I) \big((X^T X)^{-1} X^T\big)^T = \sigma^2 (X^T X)^{-1}.

Let V_{jj} be the jth diagonal entry of the matrix (X^T X)^{-1}; then

var(\hat{\beta}_j) = \sigma^2 V_{jj}.
and we get \hat{\beta}_j \sim N(\beta_j, \sigma^2 V_{jj}), so that

\frac{\hat{\beta}_j - \beta_j}{\sigma \sqrt{V_{jj}}} \sim N(0, 1).
The error variance is estimated by

S^2 = MSE = \frac{SSE}{n-(k+1)} = \frac{\sum e_i^2}{n-(k+1)},   with n-(k+1) d.f.,

and

W = \frac{[n-(k+1)] S^2}{\sigma^2} = \frac{SSE}{\sigma^2} \sim \chi^2_{n-(k+1)}.
Combining the two results,

t = \frac{(\hat{\beta}_j - \beta_j)/(\sigma\sqrt{V_{jj}})}{\sqrt{[n-(k+1)]S^2/\sigma^2 \,/\, [n-(k+1)]}} = \frac{\hat{\beta}_j - \beta_j}{S \sqrt{V_{jj}}} = \frac{\hat{\beta}_j - \beta_j}{SE(\hat{\beta}_j)} \sim t_{n-(k+1)},

where SE(\hat{\beta}_j) = s \sqrt{V_{jj}}.
Since

P\left(-t_{n-(k+1),\alpha/2} \le \frac{\hat{\beta}_j - \beta_j}{SE(\hat{\beta}_j)} \le t_{n-(k+1),\alpha/2}\right) = 1 - \alpha,

equivalently

P\big(\hat{\beta}_j - t_{n-(k+1),\alpha/2}\,SE(\hat{\beta}_j) \le \beta_j \le \hat{\beta}_j + t_{n-(k+1),\alpha/2}\,SE(\hat{\beta}_j)\big) = 1 - \alpha,

a (1-\alpha)-level confidence interval for \beta_j is

\hat{\beta}_j \pm t_{n-(k+1),\alpha/2}\, SE(\hat{\beta}_j).
To test H_{0j}: \beta_j = \beta_j^0 vs. H_{1j}: \beta_j \neq \beta_j^0 at level \alpha, note that

\alpha = P(\text{Reject } H_{0j} \mid H_{0j} \text{ is true}) = P(|t_j| \ge c),   with   c = t_{n-(k+1),\alpha/2}.

Reject H_{0j} if

|t_j| = \frac{|\hat{\beta}_j - \beta_j^0|}{SE(\hat{\beta}_j)} \ge t_{n-(k+1),\alpha/2}.
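In practice these t statistics and confidence intervals are read directly off the parameter-estimates table of a regression procedure. A brief SAS sketch (here example1 is the cement data set entered later in these slides; any data set with predictors x1-x4 works, and CLB requests the coefficient confidence limits):

proc reg data=example1;
  model y = x1 x2 x3 x4 / clb alpha=0.05;   /* t tests and 95% CIs for each beta_j */
run;

Each row of the output gives \hat{\beta}_j, SE(\hat{\beta}_j), t_j, its p-value, and the confidence limits.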
Estimating the mean response at a given x^* = (1, x_1^*, \ldots, x_k^*)^T:

\mu^* = \beta_0 + \beta_1 x_1^* + \cdots + \beta_k x_k^*,   estimated by   \hat{\mu}^* = (x^*)^T \hat{\beta}.

Var[(x^*)^T \hat{\beta}] = (x^*)^T Var(\hat{\beta})\, x^* = \sigma^2 (x^*)^T (X^T X)^{-1} x^* = \sigma^2 (x^*)^T V x^*.

Replacing \sigma^2 by its estimate s^2 = MSE, which has n-(k+1) d.f., and using the same methods as in simple linear regression, a (1-\alpha)-level CI for \mu^* is given by

\hat{\mu}^* \pm t_{n-(k+1),\alpha/2}\; s \sqrt{(x^*)^T V x^*}.
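One common way to obtain this interval in SAS is to append a new observation that holds the desired x^* values with a missing response; PROC REG then reports the fitted mean and its limits for that row. A minimal sketch (the x^* values 10, 40, 10, 30 are purely hypothetical, and example1 is the cement data entered later in these slides):

data newobs;                       /* hypothetical new predictor values, missing y */
  input x1 x2 x3 x4 y;
  datalines;
10 40 10 30 .
;
run;

data withnew;
  set example1 newobs;
run;

proc reg data=withnew;
  model y = x1 x2 x3 x4 / clm cli; /* CLM: CI for the mean; CLI: prediction interval */
run;

The observation with missing y does not affect the fit but still receives a predicted value and limits.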
F-Test for the \beta_j's

Consider:

H_0: \beta_1 = \cdots = \beta_k = 0   vs.   H_1: \text{at least one } \beta_j \neq 0.

The test statistic is

F = \frac{MSR}{MSE},   and H_0 is rejected at level \alpha if F > f_{k,\, n-(k+1),\, \alpha}.

Recall that MSE (the error mean square) is

MSE = \frac{\sum_{i=1}^{n} e_i^2}{n-(k+1)},

and MSR = SSR/k is the regression mean square, where SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 and SSE = \sum_{i=1}^{n} e_i^2.
ANOVA table:

Source        SS    d.f.        MS                     F
Regression    SSR   k           MSR = SSR/k            MSR/MSE
Error         SSE   n-(k+1)     MSE = SSE/[n-(k+1)]
Total         SST   n-1
Partial F-test. For the full model

Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \epsilon_i   (i = 1, 2, \ldots, n),

test

H_0: \beta_{k-m+1} = \cdots = \beta_k = 0   vs.   H_1: \text{at least one of } \beta_{k-m+1}, \ldots, \beta_k \neq 0.

Fit the reduced model with the last m predictors dropped; then we have:

F = \frac{(SSE_{k-m} - SSE_k)/m}{SSE_k / [n-(k+1)]},   and H_0 is rejected at level \alpha if F > f_{m,\, n-(k+1),\, \alpha}.
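In SAS this partial F-test can be requested with a TEST statement after the full model; a brief sketch (testing, for example, that x3 and x4 contribute nothing beyond x1 and x2 in the cement data; the label "subset" is arbitrary):

proc reg data=example1;
  model y = x1 x2 x3 x4;
  subset: test x3 = 0, x4 = 0;   /* partial F-test for dropping x3 and x4 */
run;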
Regression Diagnostics
5 Regression Diagnostics
5.1 Checking the Model Assumptions
SAS code and output for checking the model assumptions (residual plots).
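The original code is not reproduced in this extract; a minimal sketch of how such diagnostic plots are typically produced (the model y = x1 x2 is only an illustration):

ods graphics on;
proc reg data=example1;
  model y = x1 x2;
  output out=diag p=fitted r=resid student=stdres;  /* fitted values and residuals */
run;
ods graphics off;

proc sgplot data=diag;            /* residuals vs. fitted values */
  scatter x=fitted y=resid;
  refline 0 / axis=y;
run;

proc univariate data=diag normal; /* normal Q-Q plot of the residuals */
  var resid;
  qqplot resid / normal(mu=est sigma=est);
run;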
Standardized residuals:

e_i^* = \frac{e_i}{SE(e_i)} = \frac{e_i}{s \sqrt{1 - h_{ii}}},

where h_{ii} is the ith diagonal entry of the hat matrix H = X (X^T X)^{-1} X^T.
Outliers: observations with unusually large standardized residuals are flagged as potential outliers.
Data transformation. If the model is multiplicative,

y = \beta_0\, x_1^{\beta_1} x_2^{\beta_2},

then after a log transformation it becomes linear:

\log y = \log\beta_0 + \beta_1 \log x_1 + \beta_2 \log x_2,

or

y^* = \beta_0^* + \beta_1^* x_1^* + \beta_2^* x_2^*.
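A short SAS sketch of this transformation (the input data set raw and its variables are hypothetical):

data logged;
  set raw;            /* hypothetical data set with y, x1, x2 > 0 */
  log_y  = log(y);
  log_x1 = log(x1);
  log_x2 = log(x2);
run;

proc reg data=logged;
  model log_y = log_x1 log_x2;   /* linear in the transformed variables */
run;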
Topics in Regression Modeling
Multicollinearity

Multicollinearity occurs when two or more predictors in the model are correlated and provide redundant information about the response.

Examples of multicollinear predictors are height and weight of a person, years of education and income, and assessed value and square footage of a home.

Consequences of high multicollinearity:
a. Increased standard errors of the estimates of the \beta's.
b. Confusing and misleading results.
Detecting Multicollinearity

Easy way: compute correlations between all pairs of predictors. If some r are close to 1 or -1, remove one of the two correlated predictors from the model.

(Illustration: a pairwise correlation table among X1, X2, X3 - a correlation close to 1 means two predictors, such as X1 and X2, are collinear, while a correlation close to 0 means two predictors, such as X2 and X3, are independent.)
Detecting Multicollinearity

Another way: calculate the variance inflation factor (VIF) for each predictor x_j:

VIF_j = \frac{1}{1 - R_j^2},

where R_j^2 is the coefficient of determination of the model that includes all predictors except the jth predictor.

If VIF_j \ge 10, then there is a problem of multicollinearity.
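In SAS the VIFs can be requested directly as a model option; a brief sketch for the cement data used below:

proc reg data=example1;
  model y = x1 x2 x3 x4 / vif;   /* adds a variance inflation column to the parameter estimates */
run;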
Multicollinearity - Example

See Example 11.5 on page 416. The response is the heat of cement on a per-gram basis (y) and the predictors are tricalcium aluminate (x1), tricalcium silicate (x2), tetracalcium alumino ferrite (x3) and dicalcium silicate (x4).
Multicollinearity - Example

Estimated parameters in the first-order model:

\hat{y} = 62.4 + 1.55 x_1 + 0.510 x_2 + 0.102 x_3 - 0.144 x_4.

F = 111.48 with p-value below 0.0001. Individual t statistics (p-values): 2.08 (0.071), 0.70 (0.501), 0.14 (0.896), -0.20 (0.844).

Note that the sign on \hat{\beta}_4 is opposite of what is expected, and such a large F statistic suggests more than just one significant predictor, yet none of the individual t statistics is significant at the 0.05 level - a symptom of multicollinearity.
Multicollinearity - Example

Correlations among the predictors and the response (table).
Multicollinearity - Subset Selection

Algorithms for selecting subsets:

All possible subsets
- Only feasible with a small number of potential predictors (maybe 10 or fewer).
- One or more of the possible numerical criteria can then be used to find the overall best subset.
Multicollinearity - Subset Selection

Forward stepwise regression
- Start with no predictors.
- First include the predictor with the highest correlation with the response.
- In subsequent steps add the predictor with the highest partial correlation with the response, controlling for the variables already in the equation.
- Stop when the numerical criterion signals a maximum (minimum).
- Sometimes eliminate variables when their t value gets too small.

Backward elimination
- Start with all predictors in the equation.
- Remove the predictor with the smallest t value.
- Continue until the numerical criterion signals a maximum (minimum).
Adjusted R-square
- Will decrease if the increase in R-square with increasing p is small.

Mallows' Cp statistic
- Should be about equal to p; look for small values near p.
- Requires an estimate of the overall error variance.

PRESS statistic
- The model associated with the minimum value of PRESS_p is chosen.
- Intuitively easier to grasp than the Cp criterion.
Multicollinearity - Forward Stepwise

First include the predictor with the highest correlation with the response (enter X_i if F_i > F_IN = 4).
Multicollinearity - Forward Stepwise

In subsequent steps add the predictor with the highest partial correlation with the response, controlling for the variables already in the equation (if F_i > F_IN = 4, enter X_i; if F_i < F_OUT = 4, remove X_i).
Multicollinearity - Forward Stepwise

(SAS stepwise output: variables are entered while F_i > F_IN = 4 and removed when F_i < F_OUT = 4.)
Multicollinearity - Forward Stepwise

Summary of the stepwise algorithm.
Multicollinearity - Forward Stepwise

Check the significance of the model and of the individual parameters again. We find that the p-values are all small and each VIF is far less than 10.
Multicollinearity - Best Subsets

Alternatively, we can stop when the numerical criterion signals a maximum (minimum), and sometimes eliminate variables when their t value gets too small.
Multicollinearity - Best Subsets

The largest R-squared value, 0.9824, is associated with the full model.

The best subset which minimizes the Cp criterion includes x1, x2.

The subset which maximizes adjusted R-squared, or equivalently minimizes MSE_p, is x1, x2, x4. The adjusted R-squared increases only from 0.9744 to 0.9763 by the addition of x4 to the model already containing x1 and x2.

Thus the simpler model chosen by the Cp criterion is preferred; its fitted equation is

\hat{y} = 52.5773 + 1.4683 x_1 + 0.6623 x_2.
Polynomial model
Multicollinearity - Polynomial Models

Multicollinearity is a problem in polynomial regression (with terms of second and higher order): x and x^2 tend to be highly correlated.

A special solution in polynomial models is to use z_i = x_i - \bar{x} instead of just x_i. That is, first subtract each predictor value from its mean and then use the deviations (and their powers) in the model.
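A brief SAS sketch of this centering step (PROC STANDARD with MEAN=0 subtracts the sample mean; the data set raw and variable x are hypothetical):

proc standard data=raw mean=0 out=centered;
  var x;                       /* x is replaced by x - xbar in the output data set */
run;

data poly;
  set centered;
  x_sq = x*x;                  /* squared centered term */
run;

proc reg data=poly;
  model y = x x_sq;            /* quadratic model in the centered predictor */
run;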
Dummy Predictor Variables

A model with a single dummy predictor D_i (for example, D_i = 1 for smokers and D_i = 0 for nonsmokers):

y_i = \beta_0 + \beta_1 D_i + u_i.
Adding a quantitative predictor x_i:

y_i = \beta_0 + \beta_1 D_i + \beta_2 x_i + u_i,

so that

E(y_i) = \beta_0 + \beta_2 x_i for a nonsmoker (D_i = 0),
E(y_i) = (\beta_0 + \beta_1) + \beta_2 x_i for a smoker (D_i = 1).
(Figure: two parallel regression lines; the smoker line is shifted vertically by \beta_1 relative to the non-smoker line.)
Adding an interaction term D_i x_i allows the two groups to have different slopes:

y_i = \beta_0 + \beta_1 D_i + \beta_2 x_i + \beta_3 D_i x_i + u_i,

so that

E(y_i) = \beta_0 + \beta_2 x_i for a nonsmoker,
E(y_i) = (\beta_0 + \beta_1) + (\beta_2 + \beta_3) x_i for a smoker.
(Figure: two regression lines with different intercepts and different slopes for smokers and non-smokers.)
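A brief SAS sketch of fitting this model (the data set survey and the variables smoker, x and y are hypothetical):

data smoke;
  set survey;             /* hypothetical data set */
  D  = (smoker = 1);      /* dummy: 1 for smokers, 0 for nonsmokers */
  Dx = D * x;             /* interaction term */
run;

proc reg data=smoke;
  model y = D x Dx;       /* different intercepts and slopes for the two groups */
run;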
Standardized Regression Coefficients

Standardize the variables:

x_{ij}^* = \frac{x_{ij} - \bar{x}_j}{s_{x_j}},   y_i^* = \frac{y_i - \bar{y}}{s_y}   (i = 1, 2, \ldots, n;\; j = 1, 2, \ldots, k),

where s_y and s_{x_j} are the sample standard deviations of y and x_j.
Let

\hat{\beta}_0^* = 0   and   \hat{\beta}_j^* = \hat{\beta}_j \left(\frac{s_{x_j}}{s_y}\right)   (j = 1, 2, \ldots, k).

The magnitudes of the \hat{\beta}_j^* can be directly compared to judge the relative effects of the x_j on y.
Since \hat{\beta}_0^* = 0, the constant can be dropped from the model. Let y^* be the vector of the y_i^* and x^* be the n \times k matrix of the x_{ij}^*. Then

\frac{1}{n-1} x^{*T} x^* = R =
\begin{pmatrix}
1 & r_{x_1 x_2} & \cdots & r_{x_1 x_k} \\
r_{x_2 x_1} & 1 & \cdots & r_{x_2 x_k} \\
\vdots & \vdots & \ddots & \vdots \\
r_{x_k x_1} & r_{x_k x_2} & \cdots & 1
\end{pmatrix},

the correlation matrix of the predictors, and

\frac{1}{n-1} x^{*T} y^* = r = (r_{y x_1}, r_{y x_2}, \ldots, r_{y x_k})^T,

the vector of correlations between the predictors and y.
The standardized LS estimates are therefore

\hat{\beta}^* = (x^{*T} x^*)^{-1} x^{*T} y^* = R^{-1} r.

In the two-predictor example, \hat{\beta}_1 = 0.19244 and \hat{\beta}_2 = 0.3406, and we have

\hat{\beta}_1^* = \hat{\beta}_1 \left(\frac{s_{x_1}}{s_y}\right) = 0.875,   \hat{\beta}_2^* = \hat{\beta}_2 \left(\frac{s_{x_2}}{s_y}\right) = 0.105.

The sample correlations are r_{x_1 x_2} = 0.913, r_{y x_1} = 0.971 and r_{y x_2} = 0.904, so

R = \begin{pmatrix} 1 & 0.913 \\ 0.913 & 1 \end{pmatrix},   r = \begin{pmatrix} 0.971 \\ 0.904 \end{pmatrix}.

Next calculate

R^{-1} = \frac{1}{1 - r_{x_1 x_2}^2} \begin{pmatrix} 1 & -r_{x_1 x_2} \\ -r_{x_1 x_2} & 1 \end{pmatrix} = \begin{pmatrix} 6.009 & -5.486 \\ -5.486 & 6.009 \end{pmatrix}.

Hence

\hat{\beta}^* = R^{-1} r = \begin{pmatrix} 0.875 \\ 0.105 \end{pmatrix},

which agrees with the direct calculation above.
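In SAS, standardized coefficients can be requested with the STB model option instead of computing R^{-1}r by hand; a brief sketch (the two-predictor model is only an illustration):

proc reg data=example1;
  model y = x1 x2 / stb;   /* adds a Standardized Estimate column */
run;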
Variable Selection Methods
Motivating example:

Attacker, 5 years, more than 20 goals per year: Lionel Messi, 10,000,000 EURO/yr.
Defender, 11 years, less than 1 goal per year: Carles Puyol, 5,000,000 EURO/yr.
Stepwise Regression
Partial F-test
Partial Correlation Coefficients
How to do it by SAS?
Drawbacks
Partial F-test

(p-1)-variable model:
Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i,p-1} + \epsilon_i.

p-variable model:
Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i,p-1} + \beta_p x_{i,p} + \epsilon_i.
Test H_{0p}: \beta_p = 0 vs. H_{1p}: \beta_p \neq 0.

Reject H_{0p} in favor of H_{1p} at level \alpha if

F_p = \frac{(SSE_{p-1} - SSE_p)/1}{SSE_p / [n-(p+1)]} > f_{1,\, n-(p+1),\, \alpha}.
Since F_p = t_p^2, we equivalently reject H_{0p} at level \alpha if

|t_p| > t_{n-(p+1),\, \alpha/2}.
Partial correlation coefficient:

r_{y x_p | x_1, \ldots, x_{p-1}}^2 = \frac{SSE_{p-1} - SSE_p}{SSE_{p-1}},

and the test statistic can be written as

F_p = t_p^2 = \frac{r_{y x_p | x_1, \ldots, x_{p-1}}^2 \,[n-(p+1)]}{1 - r_{y x_p | x_1, \ldots, x_{p-1}}^2}.
Example data (cement heat evolution):

No.   X1   X2   X3   X4      Y
 1     7   26    6   60    78.5
 2     1   29   15   52    74.3
 3    11   56    8   20   104.3
 4    11   31    8   47    87.6
 5     7   52    6   33    95.9
 6    11   55    9   22   109.2
 7     3   71   17    6   102.7
 8     1   31   22   44    72.5
 9     2   54   18   22    93.1
10    21   47    4   26   115.9
11     1   40   23   34    83.8
12    11   66    9   12   113.3
13    10   68    8   12   109.4
SAS Code
data example1;
input x1 x2 x3 x4 y;
datalines;
7 26 6 60 78.5
1 29 15 52 74.3
11 56 8 20 104.3
11 31 8 47 87.6
7 52 6 33 95.9
11 55 9 22 109.2
3 71 17 6 102.7
1 31 22 44 72.5
2 54 18 22 93.1
21 47 4 26 115.9
1 40 23 34 83.8
11 66 9 12 113.3
10 68 8 12 109.4
;
Run;
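The PROC REG call behind the stepwise output on the next slides is not shown in this extract; a minimal sketch of such a run (the entry and stay significance levels are assumptions):

proc reg data=example1;
  model y = x1 x2 x3 x4 / selection=stepwise slentry=0.15 slstay=0.15;
run;

Replacing selection=stepwise with selection=forward or selection=backward gives the other algorithms described earlier.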
SAS output (stepwise selection steps and the final fitted model).
Interpretation
Drawbacks

The final model is not guaranteed to be optimal in any specified sense.
Optimality Criteria

r_p^2 criterion:

r_p^2 = \frac{SSR_p}{SST} = 1 - \frac{SSE_p}{SST}.

Adjusted r_p^2 criterion:

r_{adj,p}^2 = 1 - \frac{SSE_p / [n-(p+1)]}{SST / (n-1)} = 1 - \frac{MSE_p}{MST}.
Optimality Criteria

C_p criterion. The standardized mean square error of prediction is

\Gamma_p = \frac{1}{\sigma^2} \sum_{i=1}^{n} E\big[\hat{Y}_{ip} - E(Y_i)\big]^2,

which is estimated by

C_p = \frac{SSE_p}{\hat{\sigma}^2} + 2(p+1) - n,

where \hat{\sigma}^2 is an estimate of the error variance (the MSE from the full model).
How to do it by SAS? (Example 11.9)

proc reg data=example1;
model y = x1 x2 x3 x4 / selection=adjrsq mse cp;
run;
SAS output
Interpretation

The subset which maximizes r_{adj,p}^2 is x1, x2, x4.

However, r_{adj,p}^2 increases only from 0.9744 to 0.9763 by the addition of x4 to the model which already contains x1 and x2.
Chapter Summary and Modern Application
Multiple Regression Model

Y = X\beta + \epsilon. The LS estimates solve the normal equations

\frac{\partial Q}{\partial \beta_0} = -2 \sum_{i=1}^{n} \big[y_i - (\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik})\big] = 0,

\frac{\partial Q}{\partial \beta_j} = -2 \sum_{i=1}^{n} \big[y_i - (\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik})\big]\, x_{ij} = 0,

giving \hat{\beta} = (X^T X)^{-1} X^T Y, with coefficient of determination r^2 = SSR/SST.
Statistical Inference for Multiple Regression

Test statistic for an individual coefficient:

T = \frac{Z}{\sqrt{W/[n-(k+1)]}} = \frac{\hat{\beta}_j - \beta_j}{S\sqrt{v_{jj}}} \sim t_{n-(k+1)}.

Overall F test:

F = \frac{MSR}{MSE} = \frac{r^2\,[n-(k+1)]}{k\,(1-r^2)}.

Regression Diagnostics

Residual analysis; data transformation.
Comparing full and reduced models:

Full model: Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \epsilon_i.
Reduced model: Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{k-m} x_{i,k-m} + \epsilon_i.

Reject H_0 when F_0 > f_{m,\, n-(k+1),\, \alpha}.

Estimating and predicting future observations:

Let x^* = (x_0^*, x_1^*, \ldots, x_k^*)^T and \mu^* = (x^*)^T \beta. Test statistic:

T = \frac{\hat{\mu}^* - \mu^*}{s\sqrt{(x^*)^T V x^*}} \sim t_{n-(k+1)}.

CI for the estimated mean \mu^*:

\hat{\mu}^* \pm t_{n-(k+1),\, \alpha/2}\; s\sqrt{(x^*)^T V x^*}.
Topics in Regression Modeling

Multicollinearity; Polynomial Regression; Dummy Predictor Variables; Logistic Regression Model.

Variable Selection Methods

Stepwise regression and the strategy for building a MLR model:

Partial correlation coefficient: r_{y x_p | x_1, \ldots, x_{p-1}}^2 = \frac{SSE_{p-1} - SSE_p}{SSE_{p-1}}.

Partial F-test: F_p = \frac{r_{y x_p | x_1, \ldots, x_{p-1}}^2\,[n-(p+1)]}{1 - r_{y x_p | x_1, \ldots, x_{p-1}}^2}.
Modern applications: financial markets, biology, housing prices, heredity, chemistry.
Example

Broadly speaking, an asset pricing model can be expressed as:

r_i = a_i + b_{1i} f_1 + b_{2i} f_2 + \cdots + b_{ki} f_k + \epsilon_i,

where r_i, f_k, and k denote the expected return on asset i, the kth risk factor, and the number of risk factors, respectively; \epsilon_i denotes the specific return on asset i.
What are the most important factors? Candidates include GDP, the inflation rate, the interest rate, the rate of return on the market portfolio, the employment rate, and government policies.
Method

Step 1: Find the efficient factors (EM algorithms, maximum likelihood).

Step 2: Fit the model and estimate the factor loadings (multiple linear regression; see the sketch after Step 3).
Step 3: Test the significance of the factor loadings and the coefficient of multiple determination r^2. From the SAS output we can identify the factors that most affect the return, and then build an appropriate multiple-factor model.

We can use the model to predict the future return and make a good choice!
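A minimal SAS sketch of Step 2 under these assumptions (the data set returns and the factor variable names are hypothetical):

proc reg data=returns;
  model r = gdp inflation interest mktret;   /* factor loadings = regression coefficients */
run;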
Questions
Thank you