
Multiple Linear Regression
AMS 572 Group #2
Outline

Jinmiao Fu: Introduction and History
Ning Ma: Establishing and Fitting the Model
Ruoyu Zhou: Multiple Regression Model in Matrix Notation
Dawei Xu and Yuan Shang: Statistical Inference for Multiple Regression
Yu Mu: Regression Diagnostics
Chen Wang and Tianyu Lu: Topics in Regression Modeling
Tian Feng: Variable Selection Methods
Hua Mo: Chapter Summary and Modern Application

Introduction

Multiple linear regression models the relationship between two or more explanatory variables and a response variable by fitting a linear equation to observed data. Every value of the independent variables is associated with a value of the dependent variable y.

Example: the relationship between an adult's health and his/her daily intake of wheat, vegetables and meat.

History

Karl Pearson (1857–1936)

Lawyer, Germanist, eugenicist, mathematician and statistician. Contributions include the correlation coefficient, the method of moments, Pearson's system of continuous curves, chi distance and the p-value, statistical hypothesis testing theory and statistical decision theory, Pearson's chi-square test, and principal component analysis.

Sir Francis Galton FRS (16 February 1822 – 17 January 1911)

Anthropologist and polymath; doctoral student: Karl Pearson. In the late 1860s, Galton conceived the standard deviation. He created the statistical concept of correlation and also discovered the properties of the bivariate normal distribution and its relationship to regression.

Galton invented the use of the regression line (Bulmer 2003, p. 184), and was the first to describe and explain the common phenomenon of regression toward the mean, which he first observed in his experiments on the size of the seeds of successive generations of sweet peas.

The publication by his cousin Charles Darwin of The Origin of Species in 1859 was an event that changed Galton's life. He came to be gripped by the work, especially the first chapter on "Variation under Domestication," concerning the breeding of domestic animals.

Adrien-Marie Legendre (18 September 1752 – 10 January 1833) was a French mathematician. He made important contributions to statistics, number theory, abstract algebra and mathematical analysis. He developed the least squares method, which has broad application in linear regression, signal processing, statistics, and curve fitting.

Johann Carl Friedrich Gauss (30 April 1777 – 23 February 1855) was a German mathematician and scientist who contributed significantly to many fields, including number theory, statistics, analysis, differential geometry and geodesy.

Gauss, who was 23 at the time, heard about the problem and tackled it. After three months of intense work, he predicted a position for Ceres in December 1801 (just about a year after its first sighting), and this turned out to be accurate within a half-degree. In the process, he so streamlined the cumbersome mathematics of 18th-century orbital prediction that his work, published a few years later as Theory of Celestial Movement, remains a cornerstone of astronomical computation.

It introduced the Gaussian gravitational constant, and contained an influential treatment of the method of least squares, a procedure used in all sciences to this day to minimize the impact of measurement error. Gauss was able to prove the method in 1809 under the assumption of normally distributed errors (see Gauss–Markov theorem; see also Gaussian). The method had been described earlier by Adrien-Marie Legendre in 1805, but Gauss claimed that he had been using it since 1795.

Sir Ronald Aylmer Fisher FRS (17 February 1890 – 29 July 1962) was an English statistician, evolutionary biologist, eugenicist and geneticist. He was described by Anders Hald as "a genius who almost single-handedly created the foundations for modern statistical science," and Richard Dawkins described him as the greatest biologist since Darwin.

In addition to analysis of variance, Fisher invented the technique of maximum likelihood and originated the concepts of sufficiency, ancillarity, Fisher's linear discriminator and Fisher information.

Establishing and Fitting the Model

Probabilistic Model

$$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \epsilon_i, \qquad i = 1, 2, \ldots, n,$$

where $y_i$ is the observed value of the random variable (r.v.) $Y_i$, which depends on the fixed predictor values $x_{i1}, x_{i2}, \ldots, x_{ik}$; $\beta_0, \beta_1, \ldots, \beta_k$ are unknown model parameters; $n$ is the number of observations; and the random errors $\epsilon_i$ are i.i.d. $N(0, \sigma^2)$.

Fitting the Model

The least squares (LS) method provides estimates of the unknown model parameters $\beta_0, \beta_1, \ldots, \beta_k$ that minimize

$$Q = \sum_{i=1}^{n} \left[ y_i - (\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik}) \right]^2.$$

Setting the partial derivatives $\partial Q/\partial \beta_j = 0$ (j = 0, 1, ..., k) gives the normal equations, whose solution is the LS estimates.

Tire tread wear vs. mileage (Example 11.1 in the textbook)

The table gives the measurements on the groove of one tire after every 4000 miles. Our goal: to build a model of the relationship between the mileage and the groove depth of the tire.

Mileage (in 1000 miles)   Groove Depth (in mils)
 0                        394.33
 4                        329.50
 8                        291.00
12                        255.17
16                        229.33
20                        204.83
24                        179.00
28                        163.83
32                        150.33

SAS code for fitting the model:

data example;
  input mile depth @@;
  sqmile = mile*mile;
  datalines;
0 394.33 4 329.5 8 291 12 255.17 16 229.33 20 204.83 24 179 28 163.83 32 150.33
;
run;

proc reg data=example;
  model depth = mile sqmile;
run;

The fitted quadratic model is Depth = 386.26 - 12.77 mile + 0.172 sqmile.

Goodness of Fit of the Model

Residuals: $e_i = y_i - \hat{y}_i$ $(i = 1, 2, \ldots, n)$, where the $\hat{y}_i$ are the fitted values

$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \cdots + \hat{\beta}_k x_{ik} \qquad (i = 1, 2, \ldots, n).$$

An overall measure of the goodness of fit:

Error sum of squares (SSE): $\min Q = SSE = \sum_{i=1}^{n} e_i^2$
Total sum of squares (SST): $SST = \sum (y_i - \bar{y})^2$
Regression sum of squares (SSR): $SSR = SST - SSE$

Multiple Regression Model in Matrix Notation

1. Transforming the formulas to matrix notation

Let $y = (y_1, \ldots, y_n)'$ and let $X$ be the $n \times (k+1)$ matrix of predictor values. The first column of $X$ is all 1's and denotes the constant term (we can treat it as a predictor $x_{i0} = 1$).

Finally, let $\beta = (\beta_0, \beta_1, \ldots, \beta_k)'$ and $\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k)'$, where $\beta$ is the $(k+1) \times 1$ vector of unknown parameters and $\hat{\beta}$ the vector of their LS estimates.

The model formula then becomes $y = X\beta + \epsilon$, and the linear normal equations become $(X'X)\hat{\beta} = X'y$. Solving this equation with respect to $\hat{\beta}$, we get

$$\hat{\beta} = (X'X)^{-1}X'y$$

(if the inverse of the matrix exists).

2. Example 11.2 (Tire Wear Data: Quadratic Fit Using Hand Calculations)

We will do Example 11.1 again in this part, using the matrix approach, for the quadratic model to be fitted.

According to the formula $\hat{\beta} = (X'X)^{-1}X'y$, we need to calculate $X'X$ first, then invert it to get $(X'X)^{-1}$. Finally, we calculate the vector of LS estimates $\hat{\beta} = (X'X)^{-1}X'y$ (the intermediate matrices are shown on the original slides).

Therefore, the LS quadratic model is

$$\hat{y} = 386.26 - 12.77x + 0.172x^2,$$

which is the same as we obtained in Example 11.1.
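As a rough sketch (not part of the original slides), the same matrix computation can be reproduced in SAS with PROC IML; the data and the quadratic model are those of Example 11.1, and the variable names are ours.

proc iml;
  mile  = {0, 4, 8, 12, 16, 20, 24, 28, 32};
  depth = {394.33, 329.5, 291, 255.17, 229.33, 204.83, 179, 163.83, 150.33};
  X = j(nrow(mile), 1, 1) || mile || (mile##2);   /* columns: 1, x, x^2 */
  beta = inv(X`*X) * X` * depth;                  /* LS estimates (X'X)^{-1} X'y */
  print beta;
quit;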

Statistical Inference for Multiple Regression

Statistical Inference for Multiple Regression

To determine which predictor variables have statistically significant effects, we test the hypotheses

$$H_{0j}: \beta_j = 0 \quad \text{vs.} \quad H_{1j}: \beta_j \neq 0.$$

If we cannot reject $H_{0j}$, then $x_j$ is not a significant predictor of $y$.

Statistical Inference on the $\beta$'s

Review of statistical inference for simple linear regression:

$$\hat{\beta}_1 \sim N\!\left(\beta_1, \frac{\sigma^2}{S_{xx}}\right), \qquad \frac{\hat{\beta}_1 - \beta_1}{\sigma/\sqrt{S_{xx}}} \sim N(0,1), \qquad \frac{(n-2)S^2}{\sigma^2} = \frac{SSE}{\sigma^2} \sim \chi^2_{n-2},$$

and therefore

$$t = \frac{N(0,1)}{\sqrt{W/(n-2)}} = \frac{(\hat{\beta}_1 - \beta_1)/(\sigma/\sqrt{S_{xx}})}{\sqrt{\dfrac{(n-2)S^2}{\sigma^2}\Big/(n-2)}} = \frac{\hat{\beta}_1 - \beta_1}{S/\sqrt{S_{xx}}} \sim t_{n-2}.$$

Statistical Inference on the $\beta$'s

What about multiple regression? The steps are similar:

$$\hat{\beta}_j \sim N(\beta_j, \sigma^2 V_{jj}), \qquad \frac{\hat{\beta}_j - \beta_j}{\sigma\sqrt{V_{jj}}} \sim N(0,1), \qquad \frac{[n-(k+1)]S^2}{\sigma^2} = \frac{SSE}{\sigma^2} \sim \chi^2_{n-(k+1)},$$

and therefore

$$t = \frac{N(0,1)}{\sqrt{W/[n-(k+1)]}} = \frac{(\hat{\beta}_j - \beta_j)/(\sigma\sqrt{V_{jj}})}{\sqrt{\dfrac{[n-(k+1)]S^2}{\sigma^2}\Big/[n-(k+1)]}} = \frac{\hat{\beta}_j - \beta_j}{S\sqrt{V_{jj}}} \sim t_{n-(k+1)}.$$

Statistical Inference on the $\beta$'s

What is $V_{jj}$? Why is $\hat{\beta}_j \sim N(\beta_j, \sigma^2 V_{jj})$?

1. Mean. Recall from simple linear regression that the least squares estimators of the regression parameters $\beta_0$ and $\beta_1$ are unbiased: $E(\hat{\beta}_0) = \beta_0$, $E(\hat{\beta}_1) = \beta_1$. Here, the vector of least squares estimators is also unbiased:

$$E(\hat{\beta}_0) = \beta_0, \quad E(\hat{\beta}_1) = \beta_1, \quad \ldots, \quad E(\hat{\beta}_k) = \beta_k, \qquad \text{i.e.,} \quad E(\hat{\beta}) = \beta.$$

Statistical Inference on the $\beta$'s

2. Variance. Under the constant variance assumption $V(\epsilon_i) = \sigma^2$,

$$\operatorname{var}(Y) = \begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{pmatrix} = \sigma^2 I.$$

Statistical Inference on the $\beta$'s

Write $\hat{\beta} = (X^T X)^{-1} X^T Y = CY$ with $C = (X^T X)^{-1} X^T$. Then

$$\operatorname{var}(\hat{\beta}) = \operatorname{var}(CY) = C \operatorname{var}(Y) C^T = (X^T X)^{-1} X^T (\sigma^2 I) X (X^T X)^{-1} = \sigma^2 (X^T X)^{-1}.$$

Let $V_{jj}$ be the $j$th diagonal entry of the matrix $(X^T X)^{-1}$; then $\operatorname{var}(\hat{\beta}_j) = \sigma^2 V_{jj}$.

Statistical Inference on the $\beta$'s

Summing up, $E(\hat{\beta}_j) = \beta_j$ and $\operatorname{var}(\hat{\beta}_j) = \sigma^2 V_{jj}$, so we get

$$\hat{\beta}_j \sim N(\beta_j, \sigma^2 V_{jj}), \qquad \frac{\hat{\beta}_j - \beta_j}{\sigma\sqrt{V_{jj}}} \sim N(0,1).$$

Statistical Inference on the $\beta$'s

As in simple linear regression, the unbiased estimator of the unknown error variance $\sigma^2$ is given by

$$S^2 = \frac{SSE}{n-(k+1)} = \frac{\sum e_i^2}{n-(k+1)} = MSE \qquad \text{with } n-(k+1) \text{ d.f.},$$

$$W = \frac{[n-(k+1)]S^2}{\sigma^2} = \frac{SSE}{\sigma^2} \sim \chi^2_{n-(k+1)},$$

and $S^2$ and $\hat{\beta}_j$ are statistically independent.

Statistical Inference on the $\beta$'s

Therefore,

$$\frac{\hat{\beta}_j - \beta_j}{\sigma\sqrt{V_{jj}}} \sim N(0,1), \qquad \frac{[n-(k+1)]S^2}{\sigma^2} \sim \chi^2_{n-(k+1)},$$

and combining the two,

$$t = \frac{(\hat{\beta}_j - \beta_j)/(\sigma\sqrt{V_{jj}})}{\sqrt{\dfrac{[n-(k+1)]S^2}{\sigma^2}\Big/[n-(k+1)]}} = \frac{\hat{\beta}_j - \beta_j}{S\sqrt{V_{jj}}} = \frac{\hat{\beta}_j - \beta_j}{SE(\hat{\beta}_j)} \sim t_{n-(k+1)},$$

where $SE(\hat{\beta}_j) = s\sqrt{v_{jj}}$.

Statistical Inference on the $\beta$'s

Derivation of the confidence interval for $\beta_j$:

$$P\!\left(-t_{n-(k+1),\alpha/2} \le \frac{\hat{\beta}_j - \beta_j}{SE(\hat{\beta}_j)} \le t_{n-(k+1),\alpha/2}\right) = 1-\alpha,$$

$$P\!\left(\hat{\beta}_j - t_{n-(k+1),\alpha/2}\,SE(\hat{\beta}_j) \le \beta_j \le \hat{\beta}_j + t_{n-(k+1),\alpha/2}\,SE(\hat{\beta}_j)\right) = 1-\alpha.$$

The $100(1-\alpha)\%$ confidence interval for $\beta_j$ is

$$\hat{\beta}_j \pm t_{n-(k+1),\alpha/2}\,SE(\hat{\beta}_j).$$

Statistical Inference on the $\beta$'s

An $\alpha$-level test of the hypotheses

$$H_{0j}: \beta_j = \beta_j^0 \quad \text{vs.} \quad H_{1j}: \beta_j \neq \beta_j^0$$

uses the critical value $c = t_{n-(k+1),\alpha/2}$, chosen so that $P(\text{Reject } H_{0j} \mid H_{0j} \text{ is true}) = P(|t_j| > c) = \alpha$. The test rejects $H_{0j}$ if

$$|t_j| = \left|\frac{\hat{\beta}_j - \beta_j^0}{SE(\hat{\beta}_j)}\right| > t_{n-(k+1),\alpha/2}.$$
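As a rough sketch (not on the original slides), the individual t-tests and the 100(1-alpha)% confidence intervals for the beta's can be obtained in SAS with the CLB option of PROC REG; here we reuse the tire data set example built earlier.

proc reg data=example;
  /* Parameter Estimates table gives each t_j and its p-value; CLB adds the CIs */
  model depth = mile sqmile / clb alpha=0.05;
run;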

Prediction of Future Observations

Having fitted a multiple regression model, suppose we wish to predict the future value of $Y^*$ for a specified vector of predictor values $x^* = (x_0^*, x_1^*, \ldots, x_k^*)'$.

One way is to estimate $E(Y^*)$ by a confidence interval (CI).

Prediction of Future Observations

$$\hat{\mu}^* = \hat{\beta}_0 + \hat{\beta}_1 x_1^* + \cdots + \hat{\beta}_k x_k^* = (x^*)^T\hat{\beta} \quad \text{estimates} \quad \mu^* = E(Y^*),$$

$$\operatorname{Var}\!\left[(x^*)^T\hat{\beta}\right] = (x^*)^T \operatorname{Var}(\hat{\beta})\, x^* = \sigma^2 (x^*)^T (X^T X)^{-1} x^* = \sigma^2 (x^*)^T V x^*.$$

Replacing $\sigma^2$ by its estimate $s^2 = MSE$, which has $n-(k+1)$ d.f., and using the same methods as in simple linear regression, a $(1-\alpha)$-level CI for $\mu^*$ is given by

$$\hat{\mu}^* - t_{n-(k+1),\alpha/2}\, s\sqrt{(x^*)^T V x^*} \;\le\; \mu^* \;\le\; \hat{\mu}^* + t_{n-(k+1),\alpha/2}\, s\sqrt{(x^*)^T V x^*}.$$
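As a hedged sketch (ours, not the slides'), SAS can compute these intervals directly: the OUTPUT statement of PROC REG returns the CI for the mean response (LCLM/UCLM) and the prediction interval for an individual future Y* (LCL/UCL). The tire data set example is assumed again.

proc reg data=example;
  model depth = mile sqmile;
  output out=pred p=fitted
         lclm=ci_lower  uclm=ci_upper    /* CI for the mean response   */
         lcl=pi_lower   ucl=pi_upper;    /* prediction interval for Y* */
run;

proc print data=pred; run;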

F-Test for the $\beta_j$'s

Consider

$$H_0: \beta_1 = \cdots = \beta_k = 0 \quad \text{vs.} \quad H_1: \text{at least one } \beta_j \neq 0.$$

Here $H_0$ is the overall null hypothesis, which states that none of the $x$ variables are related to $y$. The alternative states that at least one of them is related.

How to Build an F-Test

The test statistic $F = MSR/MSE$ follows an F-distribution with $k$ and $n-(k+1)$ d.f. The $\alpha$-level test rejects $H_0$ if

$$F = \frac{MSR}{MSE} > f_{k,\,n-(k+1),\,\alpha}.$$

Recall that the error mean square is

$$MSE = \frac{\sum_{i=1}^{n} e_i^2}{n-(k+1)},$$

with $n-(k+1)$ degrees of freedom.

The Relation Between F and $r^2$

F can be written as a function of $r^2$. Using the formulas $SSR = r^2\,SST$ and $SSE = (1-r^2)\,SST$, F can be expressed as

$$F = \frac{r^2\,[n-(k+1)]}{k\,(1-r^2)}.$$

We see that F is an increasing function of $r^2$ and tests its significance.

Analysis of Variance (ANOVA)

The relation between SST, SSR and SSE is

$$SST = SSR + SSE,$$

where

$$SST = \sum_{i=1}^{n}(y_i - \bar{y})^2, \qquad SSR = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2, \qquad SSE = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2.$$

The corresponding degrees of freedom (d.f.) are

$$d.f.(SST) = n-1, \qquad d.f.(SSR) = k, \qquad d.f.(SSE) = n-(k+1).$$

ANOVA Table for Multiple Regression

Source of Variation | Sum of Squares (SS) | Degrees of Freedom (d.f.) | Mean Square (MS)     | F
Regression          | SSR                 | k                         | MSR = SSR/k          | MSR/MSE
Error               | SSE                 | n-(k+1)                   | MSE = SSE/[n-(k+1)]  |
Total               | SST                 | n-1                       |                      |

This table gives us a clear view of the analysis of variance for multiple regression.

Extra Sum of Squares Method for Testing Subsets of Parameters

Previously we considered the full model with k predictors. Now consider the partial model

$$Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{k-m} x_{i,k-m} + \epsilon_i \qquad (i = 1, 2, \ldots, n),$$

in which the last m coefficients are set to zero. We can test these m coefficients to check their significance:

$$H_0: \beta_{k-m+1} = \cdots = \beta_k = 0 \quad \text{vs.} \quad H_1: \text{at least one of } \beta_{k-m+1}, \ldots, \beta_k \neq 0.$$

Building an F-Test Using the Extra Sum of Squares Method

Let $SSR_{k-m}$ and $SSE_{k-m}$ be the regression and error sums of squares for the partial model. Since

$$SST = SSR_{k-m} + SSE_{k-m} = SSR_k + SSE_k$$

is fixed regardless of the particular model,

$$SSE_{k-m} - SSE_k = SSR_k - SSR_{k-m}.$$

Then we have the test: reject $H_0$ if

$$F = \frac{(SSE_{k-m} - SSE_k)/m}{SSE_k/[n-(k+1)]} > f_{m,\,n-(k+1),\,\alpha}.$$
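As a sketch under assumed names (a data set mydata with response y and predictors x1-x4; none of these come from the slides), this partial F-test can be carried out in SAS with the TEST statement of PROC REG, which compares the full model against the model with the listed coefficients set to zero.

proc reg data=mydata;
  model y = x1 x2 x3 x4;
  /* extra-SS (partial) F-test of H0: beta3 = beta4 = 0, with m = 2 */
  subset: test x3 = 0, x4 = 0;
run;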

Remarks on the F-Test

The numerator d.f. is m, the number of coefficients set to zero under $H_0$, while the denominator d.f. is n-(k+1), the error d.f. for the full model. The MSE in the denominator is the normalizing factor, which is an estimate of $\sigma^2$ for the full model.

Links Between ANOVA and the Extra Sum of Squares Method

Setting m = k (so the partial model contains only the intercept), we have

$$SSE_0 = \sum_{i=1}^{n}(y_i - \bar{y})^2 = SST, \qquad SSE_k = SSE.$$

From the above we can derive

$$SSE_0 - SSE_k = SST - SSE = SSR.$$

Hence the F-ratio equals

$$F = \frac{SSR/k}{SSE/[n-(k+1)]} = \frac{MSR}{MSE}$$

with k and n-(k+1) d.f.

Regression Diagnostics

5. Regression Diagnostics
5.1 Checking the Model Assumptions

Plots of the residuals against individual predictor variables: check for linearity.
A plot of the residuals against the fitted values: check for constant variance.
A normal plot of the residuals: check for normality.
A run chart of the residuals: check whether the random errors are autocorrelated.
Plots of the residuals against any omitted predictor variables: check whether any of the omitted predictor variables should be included in the model.

Example: plots of the residuals against individual predictor variables (figure and SAS code shown on the original slides).

Example: plot of the residuals against the fitted values (figure and SAS code shown on the original slides).

Example: normal plot of the residuals (figure and SAS code shown on the original slides).
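The slide's own SAS code is only available as images, so the statements below are a rough reconstruction of how these diagnostic plots could be produced for the tire data; the variable and data set names are ours.

proc reg data=example;
  model depth = mile sqmile;
  output out=diag p=fitted r=resid student=stdres h=leverage;
run;

/* residuals vs. an individual predictor and vs. the fitted values */
proc sgplot data=diag;
  scatter x=mile y=resid;
  refline 0 / axis=y;
run;

proc sgplot data=diag;
  scatter x=fitted y=resid;
  refline 0 / axis=y;
run;

/* normal (Q-Q) plot of the residuals */
proc univariate data=diag;
  var resid;
  qqplot resid / normal(mu=est sigma=est);
run;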

5.2 Checking for Outliers and Influential Observations

Standardized residuals:

$$e_i^* = \frac{e_i}{SE(e_i)} = \frac{e_i}{s\sqrt{1-h_{ii}}}.$$

Large $|e_i^*|$ values indicate outlier observations.

Hat matrix:

$$H = X(X^T X)^{-1} X^T.$$

If the hat matrix diagonal $h_{ii} > \dfrac{2(k+1)}{n}$, then the $i$th observation is influential.

Example: graphical exploration of outliers and a leverage plot (figures shown on the original slides).
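As a small sketch (ours, not the slides'), the rule h_ii > 2(k+1)/n can be checked directly from the diag output data set created above; for the tire data k = 2 and n = 9.

data influential;
  set diag;
  cutoff = 2*(2+1)/9;                      /* 2(k+1)/n with k = 2, n = 9 */
  if leverage > cutoff then influential = 1;
  else influential = 0;
run;

proc print data=influential;
  var mile depth leverage stdres influential;
run;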

5.3 Data Transformation

Transformations of the variables (both y and the x's) are often necessary to satisfy the assumptions of linearity, normality, and constant error variance. Many seemingly nonlinear models can be written in the multiple linear regression form after a suitable transformation. For example, the multiplicative model

$$y = \beta_0\, x_1^{\beta_1} x_2^{\beta_2}$$

becomes, after taking logarithms,

$$\log y = \log\beta_0 + \beta_1 \log x_1 + \beta_2 \log x_2, \qquad \text{or} \qquad y^* = \beta_0^* + \beta_1^* x_1^* + \beta_2^* x_2^*.$$

Topics in Regression Modeling

Multicollinearity

Multicollinearity occurs when two or more predictors in the model are correlated and provide redundant information about the response. Examples of multicollinear predictors are height and weight of a person, years of education and income, and assessed value and square footage of a home.

Consequences of high multicollinearity:
a. Increased standard errors of the estimates of the $\beta$'s.
b. Often confusing and misleading results.

Detecting Multicollinearity

Easy way: compute the correlations between all pairs of predictors. If some of them are close to 1 or -1, remove one of the two correlated predictors from the model. (The slide illustrates this with a schematic correlation matrix for X1, X2, X3 in which the correlation between X1 and X2 equals 1, so X1 and X2 are collinear, while X2 and X3 are independent.)

Detecting Multicollinearity

Another way: calculate the variance inflation factor for each predictor $x_j$,

$$VIF_j = \frac{1}{1 - R_j^2},$$

where $R_j^2$ is the coefficient of determination of the model that regresses $x_j$ on all the other predictors. If $VIF_j > 10$, then there is a multicollinearity problem.
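In SAS, the VIF option of PROC REG prints these factors; a minimal sketch (using the cement data set example1 that appears later in these slides) would be:

proc reg data=example1;
  model y = x1 x2 x3 x4 / vif tol;   /* VIF and tolerance (1/VIF) for each predictor */
run;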

Multicollinearity: Example

See Example 11.5 on page 416. The response is the heat evolved by cement on a per-gram basis (y), and the predictors are tricalcium aluminate (x1), tricalcium silicate (x2), tetracalcium alumino ferrite (x3) and dicalcium silicate (x4).

Multicollinearity: Example

Estimated parameters in the first-order model:

$$\hat{y} = 62.4 + 1.55x_1 + 0.510x_2 + 0.102x_3 - 0.144x_4.$$

F = 111.48 with p-value below 0.0001. The individual t statistics (p-values) are 2.08 (0.071), 0.70 (0.501), 0.14 (0.896) and -0.20 (0.844). Note that the sign on $\hat{\beta}_4$ is the opposite of what is expected, and such a high F value would suggest more than just one significant predictor.

Multicollinearity: Example

The pairwise correlations include $r_{13} = -0.824$ and $r_{24} = -0.973$, and the VIFs are all greater than 10. So there is a multicollinearity problem in this model, and we need an algorithm to help us select the variables that are necessary.

Multicollinearity: Subset Selection

Algorithms for selecting subsets:

All possible subsets
Only feasible with a small number of potential predictors (maybe 10 or fewer); one can then use one or more of the possible numerical criteria to find the overall best subset.

Leaps and bounds method
Identifies the best subsets for each value of p; requires fewer variables than observations; can be quite effective for medium-sized data sets; it is an advantage to have several slightly different models to compare.

Multicollinearity: Subset Selection

Forward stepwise regression
Start with no predictors. First include the predictor with the highest correlation with the response. In subsequent steps, add the predictor with the highest partial correlation with the response, controlling for the variables already in the equation. Stop when the numerical criterion signals a maximum (minimum); sometimes variables are eliminated when their t value gets too small. This is the only feasible method for very large predictor pools, but it optimizes locally at each step, with no guarantee of finding the overall optimum.

Backward elimination
Start with all predictors in the equation, remove the predictor with the smallest t value, and continue until the numerical criterion signals a maximum (minimum). It often produces a different final model than the forward stepwise method.

Multicollinearity: Best Subsets Criteria

Numerical criteria for choosing the best subsets: there is no single generally accepted criterion, and none should be followed too mindlessly. Most common criteria combine a measure of fit with an added penalty for increasing complexity (number of predictors).

Coefficient of determination (ordinary multiple R-square)
Always increases with an increasing number of predictors, so it is not very good for comparing models with different numbers of predictors.

Adjusted R-square
Will decrease if the increase in R-square with increasing p is small.

Multicollinearity: Best Subsets Criteria

Residual mean square (MSEp)
Equivalent to adjusted R-square, except one looks for the minimum. The minimum occurs when an added variable does not decrease the error sum of squares enough to offset the loss of an error degree of freedom.

Mallows' Cp statistic
Should be about equal to p; look for small values near p. Requires an estimate of the overall error variance.

PRESS statistic
The subset associated with the minimum value of PRESSp is chosen. Intuitively easier to grasp than the Cp criterion.

Multicollinearity: Forward Stepwise

First include the predictor with the highest correlation with the response; enter it if its F statistic exceeds $F_{IN} = 4$.

In subsequent steps, add the predictor with the highest partial correlation with the response, controlling for the variables already in the equation: enter $x_i$ if $F_i > F_{IN} = 4$, and remove $x_i$ if $F_i < F_{OUT} = 4$. (The step-by-step output for the cement data is shown as tables on the original slides.)

Multicollinearity: Forward Stepwise

Summarizing the stepwise algorithm, our best model should include only x1 and x2:

$$\hat{y} = 52.5773 + 1.4683x_1 + 0.6623x_2.$$

Multicollinearity: Forward Stepwise

Checking the significance of the model and of the individual parameters again, we find that the p-values are all small and each VIF is far less than 10 (see the SAS sketch below).
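A hedged sketch of how this selection and the follow-up check could be run in SAS (PROC REG uses significance levels SLENTRY/SLSTAY rather than the F_IN/F_OUT thresholds quoted above, so the correspondence is only approximate; example1 is the cement data set defined later in these slides):

proc reg data=example1;
  /* stepwise selection, roughly mirroring F_IN = F_OUT = 4 */
  stepsel: model y = x1 x2 x3 x4 / selection=stepwise slentry=0.05 slstay=0.05;
  /* re-check the chosen model: t-tests, CIs and VIFs */
  chosen:  model y = x1 x2 / clb vif;
run;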

Multicollinearity: Best Subsets

Alternatively, we can stop when a numerical criterion signals a maximum (minimum), and sometimes eliminate variables when their t value gets too small.

Multicollinearity: Best Subsets

The largest R-square value, 0.9824, is associated with the full model. The best subset minimizing the Cp criterion includes x1 and x2. The subset maximizing the adjusted R-square (equivalently, minimizing MSEp) is {x1, x2, x4}, but the adjusted R-square increases only from 0.9744 to 0.9763 by the addition of x4 to the model already containing x1 and x2. Thus the simpler model chosen by the Cp criterion is preferred, with fitted model

$$\hat{y} = 52.5773 + 1.4683x_1 + 0.6623x_2.$$

Polynomial Models

Polynomial models are useful in situations where the analyst knows that curvilinear effects are present in the true response function. With more than one explanatory variable, a (second-degree) polynomial regression model has the form

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2 + \epsilon.$$

Multicollinearity in Polynomial Models

Multicollinearity is a problem in polynomial regression (with terms of second and higher order): x and $x^2$ tend to be highly correlated. A special solution in polynomial models is to use $z_i = x_i - \bar{x}_i$ instead of $x_i$: first subtract each predictor's mean from it, and then use the deviations in the model.

Multicollinearity in Polynomial Models

Example: for x = 2, 3, 4, 5, 6 we have $x^2$ = 4, 9, 16, 25, 36; as x increases, so does $x^2$, and $r_{x,x^2} = 0.98$. With $\bar{x} = 4$, the centered values are z = -2, -1, 0, 1, 2 and $z^2$ = 4, 1, 0, 1, 4. Thus z and $z^2$ are no longer correlated: $r_{z,z^2} = 0$.

We can recover the estimates of the $\beta$'s (the coefficients in terms of x) from the estimates of the $\gamma$'s (the coefficients in terms of z), since substituting $z = x - \bar{x}$ into $y = \gamma_0 + \gamma_1 z + \gamma_2 z^2$ and expanding gives $\beta_2 = \gamma_2$, $\beta_1 = \gamma_1 - 2\gamma_2\bar{x}$ and $\beta_0 = \gamma_0 - \gamma_1\bar{x} + \gamma_2\bar{x}^2$.
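A small sketch of centering in SAS, applied to the tire data from earlier (the mean mileage there is 16; the variable names are ours):

data example_c;
  set example;
  z   = mile - 16;     /* centered mileage */
  zsq = z*z;
run;

proc reg data=example_c;
  model depth = z zsq;   /* same fit as before, but z and zsq are uncorrelated */
run;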

Dummy Predictor Variables

The dummy variable is a simple and useful method of introducing into a regression analysis information contained in variables that are not conventionally measured on a numerical scale, e.g., race, gender, region, etc.

Dummy Predictor Variables

The categories of an ordinal variable can be assigned suitable numerical scores. A nominal variable with $c \ge 2$ categories can be coded using $c-1$ indicator variables, $X_1, \ldots, X_{c-1}$, called dummy variables:

$X_i = 1$ for the $i$th category and 0 otherwise; $X_1 = \cdots = X_{c-1} = 0$ for the $c$th category.

Dummy Predictor Variables

If y is a worker's salary and

$D_i = 1$ if a non-smoker, $D_i = 0$ if a smoker,

we can model this in the following way:

$$y_i = \alpha + \beta D_i + u_i.$$

Dummy Predictor Variables

Equally, we could use the dummy variable in a model with other explanatory variables. In addition to the dummy variable, we could add years of experience (x), to give

$$y_i = \alpha + \beta D_i + \gamma x_i + u_i,$$

$$E(y_i) = (\alpha + \beta) + \gamma x \quad \text{for a non-smoker}, \qquad E(y_i) = \alpha + \gamma x \quad \text{for a smoker}.$$

Dummy Predictor Variables

(Figure on the original slide: salary y versus experience x, with two parallel lines; the non-smoker line lies above the smoker line.)

Dummy Predictor Variables

We can also add the interaction between smoking and experience with respect to their effects on salary:

$$y_i = \alpha + \beta D_i + \gamma x_i + \delta D_i x_i + u_i,$$

$$E(y_i) = (\alpha + \beta) + (\gamma + \delta) x \quad \text{for a non-smoker}, \qquad E(y_i) = \alpha + \gamma x \quad \text{for a smoker}.$$

Dummy Predictor Variables

(Figure on the original slide: salary y versus experience x, with the non-smoker and smoker lines now differing in both intercept and slope.)
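A minimal SAS sketch of the interaction model (assuming a data set salary_data that already contains salary, the smoking dummy d and experience x; all names are hypothetical):

data salary2;
  set salary_data;
  dx = d*x;                 /* smoking-by-experience interaction */
run;

proc reg data=salary2;
  model salary = d x dx;    /* y = alpha + beta*d + gamma*x + delta*d*x + u */
run;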

Standardized Regression Coefficients

We typically want to compare predictors in terms of the magnitudes of their effects on the response variable. We use standardized regression coefficients to judge the effects of predictors with different units.

Standardized Regression Coefficients

They are the LS parameter estimates obtained by running a regression on the standardized variables, defined as follows:

$$y_i^* = \frac{y_i - \bar{y}}{s_y}, \qquad x_{ij}^* = \frac{x_{ij} - \bar{x}_j}{s_{x_j}} \qquad (i = 1, 2, \ldots, n;\; j = 1, 2, \ldots, k),$$

where $s_y$ and $s_{x_j}$ are the sample SDs of $y$ and $x_j$.

Standardized Regression Coefficients

We have $\hat{\beta}_0^* = 0$ and

$$\hat{\beta}_j^* = \hat{\beta}_j \left( \frac{s_{x_j}}{s_y} \right) \qquad (j = 1, 2, \ldots, k).$$

The magnitudes of the $\hat{\beta}_j^*$ can be directly compared to judge the relative effects of the $x_j$ on $y$.

Standardized Regression Coefficients

Since $\hat{\beta}_0^* = 0$, the constant can be dropped from the model. Let $y^*$ be the vector of the $y_i^*$'s and $x^*$ the $n \times k$ matrix of the $x_{ij}^*$'s. Then

$$\frac{1}{n-1}\, x^{*\prime} x^* = R = \begin{pmatrix} 1 & r_{x_1 x_2} & \cdots & r_{x_1 x_k} \\ r_{x_2 x_1} & 1 & \cdots & r_{x_2 x_k} \\ \vdots & \vdots & \ddots & \vdots \\ r_{x_k x_1} & r_{x_k x_2} & \cdots & 1 \end{pmatrix}, \qquad \frac{1}{n-1}\, x^{*\prime} y^* = r = \begin{pmatrix} r_{y x_1} \\ r_{y x_2} \\ \vdots \\ r_{y x_k} \end{pmatrix}.$$

Standardized Regression Coefficients

So we can get

$$\hat{\beta}^* = \begin{pmatrix} \hat{\beta}_1^* \\ \vdots \\ \hat{\beta}_k^* \end{pmatrix} = (x^{*\prime} x^*)^{-1} x^{*\prime} y^* = R^{-1} r.$$

This method of computing the $\hat{\beta}_j^*$'s is numerically more stable than computing the $\hat{\beta}_j$'s directly, because all entries of R and r are between -1 and 1.

Standardized Regression Coefficients

Example (given on page 424): from the calculation we obtain $\hat{\beta}_1 = 0.19244$ and $\hat{\beta}_2 = 0.3406$, and the sample standard deviations of $x_1$, $x_2$ and $y$ are $s_{x_1} = 6.830$, $s_{x_2} = 0.641$, $s_y = 1.501$. Then

$$\hat{\beta}_1^* = \hat{\beta}_1 \left(\frac{s_{x_1}}{s_y}\right) = 0.875, \qquad \hat{\beta}_2^* = \hat{\beta}_2 \left(\frac{s_{x_2}}{s_y}\right) = 0.105.$$

Note that $\hat{\beta}_1^* > \hat{\beta}_2^*$ although $\hat{\beta}_1 < \hat{\beta}_2$. Thus $x_1$ has a larger effect than $x_2$ on $y$.

Standardized Regression Coefficients

We can also use the matrix method to compute the standardized regression coefficients. First we compute the correlation matrix between $x_1$, $x_2$ and $y$, which gives $r_{x_1 x_2} = 0.913$, $r_{y x_1} = 0.971$ and $r_{y x_2} = 0.904$. Then

$$R = \begin{pmatrix} 1 & 0.913 \\ 0.913 & 1 \end{pmatrix}, \qquad r = \begin{pmatrix} 0.971 \\ 0.904 \end{pmatrix}.$$

Next we calculate

$$R^{-1} = \frac{1}{1 - r_{x_1 x_2}^2} \begin{pmatrix} 1 & -r_{x_1 x_2} \\ -r_{x_1 x_2} & 1 \end{pmatrix} = \begin{pmatrix} 6.009 & -5.486 \\ -5.486 & 6.009 \end{pmatrix}.$$

Hence

$$\hat{\beta}^* = R^{-1} r = \begin{pmatrix} 0.875 \\ 0.105 \end{pmatrix},$$

which is the same result as before.
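In practice one rarely computes these by hand; a hedged sketch (assuming a data set mydata holding y, x1 and x2, names ours) uses the STB option of PROC REG, which prints the standardized estimates alongside the ordinary ones:

proc reg data=mydata;
  model y = x1 x2 / stb;   /* STB adds the standardized regression coefficients */
run;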

Variable Selection Methods

How to decide their salaries?

Lionel Messi: 23 years old, attacker, 5 years, more than 20 goals per year, 10,000,000 EURO/yr.
Carles Puyol: 32 years old, defender, 11 years, less than 1 goal per year, 5,000,000 EURO/yr.

How to select variables?

1) Stepwise Regression
2) Best Subset Regression

Stepwise Regression: partial F-test, partial correlation coefficients, how to do it in SAS, and drawbacks.

Partial F-Test

(p-1)-variable model:
$$Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i,p-1} + \epsilon_i$$

p-variable model:
$$Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i,p-1} + \beta_p x_{ip} + \epsilon_i$$

How to do the test?

$$H_{0p}: \beta_p = 0 \quad \text{vs.} \quad H_{1p}: \beta_p \neq 0.$$

We reject $H_{0p}$ in favor of $H_{1p}$ at level $\alpha$ if

$$F_p = \frac{(SSE_{p-1} - SSE_p)/1}{SSE_p/[n-(p+1)]} > f_{1,\,n-(p+1),\,\alpha}.$$

Another way to interpret the test: use the test statistic

$$t_p = \frac{\hat{\beta}_p}{SE(\hat{\beta}_p)}, \qquad t_p^2 = F_p.$$

We reject $H_{0p}$ at level $\alpha$ if $|t_p| > t_{n-(p+1),\,\alpha/2}$.

Partial Correlation Coefficients

$$r^2_{yx_p \mid x_1,\ldots,x_{p-1}} = \frac{SSE_{p-1} - SSE_p}{SSE_{p-1}} = \frac{SSE(x_1,\ldots,x_{p-1}) - SSE(x_1,\ldots,x_p)}{SSE(x_1,\ldots,x_{p-1})}.$$

The test statistic can be written as

$$F_p = t_p^2 = \frac{r^2_{yx_p \mid x_1,\ldots,x_{p-1}}\,[n-(p+1)]}{1 - r^2_{yx_p \mid x_1,\ldots,x_{p-1}}}.$$

Add $x_p$ to the regression equation that already includes $x_1, \ldots, x_{p-1}$ only if $F_p$ is large enough.

How to do it in SAS? (Example 11.9, a continuation of Example 11.5)

The table shows data on the heat evolved in calories during the hardening of cement on a per-gram basis (y), along with the percentages of four ingredients: tricalcium aluminate (x1), tricalcium silicate (x2), tetracalcium alumino ferrite (x3), and dicalcium silicate (x4).

No.  x1  x2  x3  x4      y
 1    7  26   6  60   78.5
 2    1  29  15  52   74.3
 3   11  56   8  20  104.3
 4   11  31   8  47   87.6
 5    7  52   6  33   95.9
 6   11  55   9  22  109.2
 7    3  71  17   6  102.7
 8    1  31  22  44   72.5
 9    2  54  18  22   93.1
10   21  47   4  26  115.9
11    1  40  23  34   83.8
12   11  66   9  12  113.3
13   10  68   8  12  109.4

SAS code:

data example1;
  input x1 x2 x3 x4 y;
  datalines;
7 26 6 60 78.5
1 29 15 52 74.3
11 56 8 20 104.3
11 31 8 47 87.6
7 52 6 33 95.9
11 55 9 22 109.2
3 71 17 6 102.7
1 31 22 44 72.5
2 54 18 22 93.1
21 47 4 26 115.9
1 40 23 34 83.8
11 66 9 12 113.3
10 68 8 12 109.4
;
run;

proc reg data=example1;
  model y = x1 x2 x3 x4 / selection=stepwise;
run;

SAS output (stepwise selection summary, shown as tables on the original slides).

Interpretation

At the first step, x4 is chosen into the equation because it has the largest correlation with y among the four predictors.

At the second step, we choose x1 into the equation because it has the highest partial correlation with y controlling for x4.

At the third step, since $r_{yx_2 \mid x_4, x_1}$ is greater than $r_{yx_3 \mid x_4, x_1}$, x2 is chosen into the equation rather than x3.

Interpretation

At the fourth step, we remove x4 from the model since its partial F-statistic is too small.

From Example 11.5 we know that x4 is highly correlated with x2. Note that in Step 4 the R-square is 0.9787, which is slightly higher than 0.9725, the R-square of Step 2. This indicates that even though x4 is the best single predictor of y, the pair (x1, x2) is a better predictor than the pair (x1, x4).

Drawbacks

The final model is not guaranteed to be optimal in any specified sense. The method yields a single final model, while in practice there are often several equally good models.

Best Subsets Regression

Comparison with the stepwise method, optimality criteria, and how to do it in SAS.

Comparison with Stepwise Regression

In best subsets regression, a subset of variables is chosen that optimizes a well-defined objective criterion. The best subsets regression algorithm permits determination of a specified number of best subsets, from which the choice of the final model can be made by the investigator.

Optimality Criteria

$r_p^2$ criterion:

$$r_p^2 = \frac{SSR_p}{SST} = 1 - \frac{SSE_p}{SST}.$$

Adjusted $r_p^2$ criterion:

$$r_{adj,p}^2 = 1 - \frac{SSE_p/[n-(p+1)]}{SST/(n-1)} = 1 - \frac{MSE_p}{MST}.$$

Optimality Criteria

$C_p$ criterion. The standardized mean square error of prediction is

$$\Gamma_p = \frac{1}{\sigma^2} \sum_{i=1}^{n} E\!\left[\left(\hat{Y}_{ip} - E(Y_i)\right)^2\right].$$

$\Gamma_p$ involves unknown parameters, such as the $\beta_j$'s and $\sigma^2$, so we minimize a sample estimate of $\Gamma_p$, Mallows' $C_p$ statistic:

$$C_p = \frac{SSE_p}{\hat{\sigma}^2} + 2(p+1) - n.$$

Optimality Criteria

In practice, we often use the $C_p$ criterion because of its ease of computation and its ability to judge the predictive power of a model.

How to do it in SAS? (Example 11.9)

proc reg data=example1;
  model y = x1 x2 x3 x4 / selection=adjrsq mse cp;
run;

SAS output (best subsets summary table, shown on the original slides).

Interpretation

The best subset minimizing the $C_p$ criterion is {x1, x2}, which is the same model selected by stepwise regression in the previous example.

The subset maximizing $r_{adj,p}^2$ is {x1, x2, x4}. However, $r_{adj,p}^2$ increases only from 0.9744 to 0.9763 by the addition of x4 to the model that already contains x1 and x2.

Thus, the model chosen by the $C_p$ criterion is preferred.

Chapter Summary and Modern Application

Multiple Regression Model (extension of simple regression):

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \epsilon_i,$$

where $\beta_0, \beta_1, \beta_2, \ldots, \beta_k$ are unknown parameters.

Fitting the MLR model by the least squares method:

$$Q = \sum_{i=1}^{n}\left[y_i - (\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik})\right]^2,$$

$$\frac{\partial Q}{\partial \beta_0} = -2\sum_{i=1}^{n}\left[y_i - (\beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik})\right] = 0,$$

$$\frac{\partial Q}{\partial \beta_j} = -2\sum_{i=1}^{n}\left[y_i - (\beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik})\right]x_{ij} = 0.$$

Goodness of fit of the model: $r^2 = SSR/SST$.

MLR model in matrix notation:

$$Y = X\beta + \epsilon, \qquad \hat{\beta} = (X'X)^{-1}X'Y.$$

Statistical inference on the $\beta$'s:

Hypotheses: $H_{0j}: \beta_j = 0$ vs. $H_{1j}: \beta_j \neq 0$. Test statistic:

$$T = \frac{Z}{\sqrt{W/[n-(k+1)]}} = \frac{\hat{\beta}_j - \beta_j}{S\sqrt{v_{jj}}} \sim t_{n-(k+1)}.$$

Overall test. Hypotheses: $H_0: \beta_1 = \cdots = \beta_k = 0$ vs. $H_a:$ at least one $\beta_j \neq 0$. Test statistic:

$$F = \frac{MSR}{MSE} = \frac{r^2\,[n-(k+1)]}{k\,(1-r^2)}.$$

Regression diagnostics: residual analysis, data transformation.

The general hypothesis test: compare

the full model: $Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \epsilon_i$,
the partial model: $Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{k-m} x_{i,k-m} + \epsilon_i$.

Hypotheses: $H_0: \beta_{k-m+1} = \cdots = \beta_k = 0$ vs. $H_a:$ at least one of these $\beta_j \neq 0$. Test statistic:

$$F_0 = \frac{(SSE_{k-m} - SSE_k)/m}{SSE_k/[n-(k+1)]} \sim f_{m,\,n-(k+1)};$$

reject $H_0$ when $F_0 > f_{m,\,n-(k+1),\,\alpha}$.

Estimating and predicting future observations: let $x^* = (x_0^*, x_1^*, \ldots, x_k^*)'$ and

$$\hat{\mu}^* = \hat{Y}^* = \hat{\beta}_0 + \hat{\beta}_1 x_1^* + \cdots + \hat{\beta}_k x_k^* = (x^*)'\hat{\beta}, \qquad T = \frac{\hat{\mu}^* - \mu^*}{s\sqrt{(x^*)'Vx^*}} \sim t_{n-(k+1)}.$$

CI for the estimated mean $\mu^*$: $\hat{\mu}^* \pm t_{n-(k+1),\,\alpha/2}\, s\sqrt{(x^*)'Vx^*}$.

PI for the estimated $Y^*$: $\hat{Y}^* \pm t_{n-(k+1),\,\alpha/2}\, s\sqrt{1 + (x^*)'Vx^*}$.

Topics in regression modeling: multicollinearity, polynomial regression, dummy predictor variables, logistic regression model.

Variable selection methods:

Partial F-test and partial correlation coefficient:

$$r^2_{yx_p \mid x_1,\ldots,x_{p-1}} = \frac{SSE_{p-1} - SSE_p}{SSE_{p-1}}, \qquad F_p = \frac{r^2_{yx_p \mid x_1,\ldots,x_{p-1}}\,[n-(p+1)]}{1 - r^2_{yx_p \mid x_1,\ldots,x_{p-1}}}.$$

Stepwise regression (the stepwise regression algorithm) and best subsets regression; strategy for building an MLR model.

Application of the MLR Model

Linear regression is widely used in biology, chemistry, finance and the social sciences to describe possible relationships between variables. It ranks as one of the most important tools used in these disciplines.

Typical application areas include financial markets, housing prices, biology (heredity), and chemistry.

Example

Broadly speaking, an asset pricing model can be expressed as

$$r_i = a_i + b_{i1} f_1 + b_{i2} f_2 + \cdots + b_{ik} f_k + \epsilon_i,$$

where $r_i$, $f_k$ and $k$ denote the expected return on asset $i$, the $k$th risk factor and the number of risk factors, respectively, and $\epsilon_i$ denotes the specific return on asset $i$.

The equation can also be expressed in matrix notation, $r = a + Bf + \epsilon$, where $B$ is called the factor loading matrix.

What factors are the most important? Candidates include GDP, the inflation rate, the interest rate, the rate of return on the market portfolio, the employment rate, and government policies.

Method

Step 1: Find the efficient factors (EM algorithm, maximum likelihood).
Step 2: Fit the model and estimate the factor loadings (multiple linear regression).

By running the data through a multiple linear regression in SAS, we can obtain the factor loadings and the coefficient of multiple determination $r^2$. From the SAS output we can identify the factors that most affect the return, build an appropriate multiple-factor model, and then use the model to predict the future return and make a good choice!
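A hedged sketch of such a factor regression in SAS (the data set returns and the factor names gdp, inflation, interest and market are hypothetical, chosen only to mirror the candidate factors listed above):

proc reg data=returns;
  /* stepwise selection keeps only the factors with significant loadings */
  model r = gdp inflation interest market / selection=stepwise;
run;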

Questions?

Thank you!
