
Chapter 6
Variable Screening Methods

Copyright © 2012 Pearson Education, Inc. All rights reserved.

Why variable screening?


Commonly used variable screening methods

Forward Selection
Backward Elimination
Stepwise Regression
All-Possible Regressions
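To make the first two procedures concrete, here is a minimal sketch using scikit-learn's SequentialFeatureSelector (a tooling choice assumed here, not the textbook's software); it scores candidate variables by cross-validated fit rather than by the F-tests of the classical procedures, and the data below are simulated purely for illustration.

```python
# Minimal sketch of forward and backward selection with scikit-learn.
# The selector adds (or drops) one variable at a time, keeping the change
# that most improves the cross-validated score of a linear regression.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Simulated data: 100 observations, 10 candidate predictors, 4 informative.
X, y = make_regression(n_samples=100, n_features=10, n_informative=4,
                       noise=10.0, random_state=0)

lm = LinearRegression()
forward = SequentialFeatureSelector(lm, n_features_to_select=4,
                                    direction="forward").fit(X, y)
backward = SequentialFeatureSelector(lm, n_features_to_select=4,
                                     direction="backward").fit(X, y)

print("forward keeps columns: ", np.where(forward.get_support())[0])
print("backward keeps columns:", np.where(backward.get_support())[0])
```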



Stepwise Regression
This is the most popular of the three sequential procedures (forward, backward, and stepwise). It combines the forward and backward approaches: variables are entered one at a time, and after each entry the variables already in the model are re-checked and removed if they are no longer significant.
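A rough sketch of this idea, assuming a pandas DataFrame X of candidate predictors and a response Series y (hypothetical names), with illustrative entry and removal thresholds rather than the text's defaults:

```python
# Classical p-value-based stepwise selection (sketch, not production code).
import pandas as pd
import statsmodels.api as sm

def stepwise(X, y, alpha_enter=0.15, alpha_remove=0.15):
    """Alternate forward entry and backward removal until nothing changes."""
    selected = []
    while True:
        changed = False
        # Forward step: enter the most significant remaining variable, if any.
        remaining = [c for c in X.columns if c not in selected]
        entry_p = pd.Series(dtype=float)
        for c in remaining:
            fit = sm.OLS(y, sm.add_constant(X[selected + [c]])).fit()
            entry_p[c] = fit.pvalues[c]
        if not entry_p.empty and entry_p.min() < alpha_enter:
            selected.append(entry_p.idxmin())
            changed = True
        # Backward step: drop the least significant entered variable, if warranted.
        if selected:
            fit = sm.OLS(y, sm.add_constant(X[selected])).fit()
            worst = fit.pvalues.drop("const").idxmax()
            if fit.pvalues[worst] > alpha_remove:
                selected.remove(worst)
                changed = True
        if not changed:
            return selected
```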



Some caveats of this procedure:

Stepwise and other variable screening methods do not guarantee that the best model has been found, since:



Figure 6.1 MINITAB stepwise regression results for executive salaries


Figure 6.2 SAS backward stepwise regression for executive salaries


All-Possible Regressions

While the stepwise procedures are objective methods of arriving at a satisfactory model, there are more subjective approaches as well. These methods aim to select the subset of explanatory variables that yields the best model with respect to some criterion; several such criteria are described on the following slides. In the absence of multicollinearity, the various criteria usually lead to the same selected model.
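A minimal sketch of the all-possible-regressions idea, assuming a pandas DataFrame X of candidate predictors and a response y (hypothetical names): every subset is fit by ordinary least squares and summarised by adjusted R2 and Mallows' Cp, where Cp = SSEp / MSEfull - (n - 2p) and p counts the parameters (including the intercept) of the subset model.

```python
# All-possible-regressions screening by adjusted R^2 and Mallows' Cp (sketch).
from itertools import combinations
import statsmodels.api as sm

def all_subsets(X, y):
    n = len(y)
    full = sm.OLS(y, sm.add_constant(X)).fit()
    mse_full = full.mse_resid                    # MSE of the full model
    results = []
    for k in range(1, X.shape[1] + 1):
        for subset in combinations(X.columns, k):
            fit = sm.OLS(y, sm.add_constant(X[list(subset)])).fit()
            p = k + 1                            # parameters incl. intercept
            cp = fit.ssr / mse_full - (n - 2 * p)
            results.append((subset, fit.rsquared_adj, cp))
    # Most promising subsets (largest adjusted R^2) first.
    return sorted(results, key=lambda r: -r[1])
```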



Figure 6.3 MINITAB all-possible-regressions selection results for executive salaries


Figure 6.4 MINITAB plots of all-possible-regressions selection criteria for Example 6.2


Which model to use?

According to Chatterjee and Hadi (2006, Regression Analysis by Example), a regression analysis can serve several different objectives. These are:


A desirable model-building strategy is therefore to select two or three models that describe the data well, e.g. by the adjusted R2, mean square error, or Cp criteria.
If the objective is to predict a new observation, choose from these short-listed models the one(s) that also predict well out of sample; the PRESS statistic is a suitable measure for this (a sketch of its computation follows below).
However, stepwise and best-subsets regressions work well only when the explanatory variables are not collinear. With collinear data the coefficient estimates will not be precise, i.e. their variances will be large. So if the objective is control, one should look for a model that has low collinearity in addition to being good with respect to PRESS and adjusted R2.
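A small sketch of the PRESS computation, again assuming a DataFrame X holding a candidate model's predictors and a response y (hypothetical names): PRESS is the sum of squared leave-one-out prediction errors, which can be obtained from the ordinary residuals and leverages without refitting the model n times.

```python
# PRESS = sum_i ( e_i / (1 - h_ii) )^2, using residuals e_i and leverages h_ii.
import numpy as np
import statsmodels.api as sm

def press(X, y):
    fit = sm.OLS(y, sm.add_constant(X)).fit()
    h = fit.get_influence().hat_matrix_diag      # leverages h_ii
    return np.sum((fit.resid / (1.0 - h)) ** 2)
```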


How to measure collinearity?

In cases of multicollinearity (or simply collinearity) the variances of the estimated regression coefficients are very large, making the coefficients less precise. This is a problem if our interest is in estimating the coefficients precisely, e.g. when the objective of the regression is control. Omitting an important variable from the regression biases the estimates of the included coefficients, and this bias becomes serious when the omitted variable is correlated with the included variables. Thus a good sign of multicollinearity is that including or excluding variables may markedly change the values, or even the signs, of the coefficients. One standard numerical diagnostic, the variance inflation factor, is sketched below.
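A minimal sketch of that diagnostic, assuming a pandas DataFrame X of the candidate predictors (hypothetical name): VIFj = 1 / (1 - Rj^2), where Rj^2 comes from regressing xj on the remaining x's; values well above 10 are a common warning sign.

```python
# Variance inflation factors for each column of X (sketch).
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_table(X):
    Xc = sm.add_constant(X)                      # include an intercept column
    vifs = [variance_inflation_factor(Xc.values, i) for i in range(1, Xc.shape[1])]
    return pd.Series(vifs, index=X.columns, name="VIF")
```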


Is there evidence of multicollinearity in the executive salary data?

A regression with the full set of x variables gives:


Consider the data set reported by Montgomery et al. (2012). The data are from Hald (1952) and are known as the Hald cement data. The response is the heat evolved in calories per gram of cement (y), and there are four x variables: tricalcium aluminate (x1), tricalcium silicate (x2), tetracalcium alumino ferrite (x3), and dicalcium silicate (x4).
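The four ingredients are percentages of the cement mix, so their row sums are nearly constant, which is exactly the kind of near-linear dependence that inflates coefficient variances. A hedged sketch of how one might confirm this, assuming the 13 observations have been placed in a file hald.csv with columns y, x1, x2, x3, x4 (file name and layout are assumptions, not from the text):

```python
# Quick look at the near-linear dependence among the Hald predictors.
import pandas as pd

hald = pd.read_csv("hald.csv")     # hypothetical file with columns y, x1..x4
X = hald[["x1", "x2", "x3", "x4"]]

print(X.sum(axis=1).describe())    # row sums cluster near 100
print(X.corr().round(2))           # note the strong negative x2-x4 correlation
```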
