
HRP 223 - 2008

Topic 9 - Regression

Copyright 1999-2008 Leland Stanford Junior University. All rights reserved.


Warning: This presentation is protected by copyright law and international
treaties. Unauthorized reproduction of this presentation, or any portion of it, may
result in severe civil and criminal penalties and will be prosecuted to maximum
extent possible under the law.
Height and Resting Pulse
The spreadsheet RESTING.xls has height and pulse
measures on 50 people. On average, does pulse go
up or down with height?
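The question above comes down to the sign of the slope in a simple linear regression of pulse on height. A minimal sketch in Python (the course itself works through SAS Enterprise Guide menus), using made-up numbers rather than the RESTING.xls data; the function name `fit_line` is only illustrative:

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Made-up illustrative values, NOT the RESTING.xls measurements.
heights = [60, 64, 68, 72, 76]
pulses = [80, 76, 74, 70, 68]

intercept, slope = fit_line(heights, pulses)
# A negative slope means pulse goes down, on average, as height goes up.
print(f"intercept = {intercept:.2f}, slope = {slope:.3f}")
```

With real data, plot the points first; a slope summarizes a straight-line trend and says nothing about whether a straight line is appropriate.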
Look before you leap!
Root MSE = Estimated standard deviation of the error in
the model (epsilon)
Dependent Mean = Mean of the outcome
CV = (Root MSE / Dependent Mean) * 100
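As a rough illustration of how these three numbers relate, here is a small Python sketch. It assumes you already have the observed outcomes and the model's predicted values; `fit_summary` and its arguments are hypothetical names, and the sample values are made up.

```python
from math import sqrt
from statistics import mean

def fit_summary(observed, predicted, n_predictors):
    """Return (root_mse, dependent_mean, cv) for a fitted linear model."""
    n = len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Error degrees of freedom: records minus predictors minus the intercept.
    root_mse = sqrt(sse / (n - n_predictors - 1))
    dependent_mean = mean(observed)
    cv = 100 * root_mse / dependent_mean
    return root_mse, dependent_mean, cv

# Made-up values for illustration only.
observed = [9, 11, 10, 10]
predicted = [10, 10, 10, 10]
root_mse, dependent_mean, cv = fit_summary(observed, predicted, n_predictors=1)
print(root_mse, dependent_mean, cv)
```

CV expresses the typical error as a percentage of the average outcome, which makes it unit-free.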
In general, r-squared is interpreted as:
.1 small effect, .3 medium effect, .5 large effect
Adjusted R-square = 1 - ( (1 - rsquare) * (n - 1) / (n - m - 1) )
n = subjects, m = variables
It penalizes you for putting extra terms in the model.
R-squared is typically reported if you have a single predictor
variable.
Adjusted R-square is typically reported if you have several
predictors.
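The penalty is easy to see numerically. A minimal Python sketch of the adjustment, with n as the number of subjects and m as the number of predictor variables; the function name and the example values are illustrative, not course output:

```python
def adjusted_r_square(r_square, n, m):
    """Adjusted R-square for n subjects and m predictor variables."""
    if n - m - 1 <= 0:
        raise ValueError("need more subjects than predictors plus one")
    return 1 - (1 - r_square) * (n - 1) / (n - m - 1)

# Same hypothetical R-square, more predictors: the adjusted value drops.
print(adjusted_r_square(0.5, n=11, m=2))
print(adjusted_r_square(0.5, n=11, m=5))
```

With a single predictor and a reasonable sample size the two statistics are nearly identical, which is why plain R-squared is usually reported in that case.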
Oxygen
The next set of data looks at the relationship
between oxygen inhaled and exhaled. You
would hope that there would be close to a
perfect relationship between the two factors.
Add the library to a new flowchart.

Add the SAS data set to the project.
Look at the Data
This is bad news.

At least it is symmetric.
Simple correlation is questionable.
Are the residuals about normal?
Leave yourself a note on how to
interpret the output.
Right click on the flowchart and choose New > Note.
Leave yourself some notes.
Right click on the Note icon > Link Note to > Quadratic
Ice cream!
In this example you will predict ice cream
sales based on factors like price and
temperature.
Start by making a library (or copy and paste
the existing one) in a new flowchart.
The data is in a text file. Import the data.
Load the Data
Add Celsius
Celsius is ( (5/9) * (Fahr-32) )
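The computed column can be checked outside the project. A short Python sketch of the same conversion; the function name is illustrative and the argument mirrors the Fahr column used in the formula:

```python
def fahrenheit_to_celsius(fahr):
    """Convert a Fahrenheit temperature to Celsius."""
    return (5 / 9) * (fahr - 32)

# Freezing and boiling points of water as a sanity check.
for fahr in (32, 212):
    print(fahr, "F =", round(fahrenheit_to_celsius(fahr), 1), "C")
```

A linear rescaling like this changes the units of the regression coefficient for temperature but not the fit of the model.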
Some people say VIF > 10 is a problem, but that cutoff is arbitrary.

If VIF is > 1/(1 - R-squared), where R-squared is from the overall model, then the factors are more related to the other predictors than to the outcome.
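To make the comparison concrete, here is a hedged Python sketch. It assumes the usual definition of a predictor's VIF as 1/(1 - R-squared from regressing that predictor on the other predictors); the function names and every number below are hypothetical.

```python
def vif(r_square_with_other_predictors):
    """VIF for one predictor, from its R-square against the other predictors."""
    return 1 / (1 - r_square_with_other_predictors)

def vif_cutoff(model_r_square):
    """Model-based cutoff: 1 / (1 - R-square of the overall model)."""
    return 1 / (1 - model_r_square)

def exceeds_cutoff(predictor_r_square, model_r_square):
    """True when a predictor's VIF is larger than the model-based cutoff."""
    return vif(predictor_r_square) > vif_cutoff(model_r_square)

# Hypothetical values: a predictor that is 80% explained by the other
# predictors, in a model that explains 60% of the outcome.
print(vif(0.80), vif_cutoff(0.60), exceeds_cutoff(0.80, 0.60))
```

Unlike the fixed "10" rule, this cutoff moves with how well the model explains the outcome.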
Severely Dehydrated Children
A Look
Do univariate descriptive statistics.
Things look reasonable.
Do bivariate correlations.
Age and weight are correlated.
Do univariate modeling.
There is a weak but statistically significant
association.
Build a model with all 3 predictors and check
variance inflation.
A Simpler Model
It explains a fair amount of the variability (45%). How can I check to make
sure the model is working well and is not being driven by outliers?
Outliers

Images from: Statistics I: Introduction to ANOVA, Regression, and Logistic Regression
Course Notes (2005) and Categorical Data Analysis Using Logistic Regression Course Notes
(2005), SAS Press.
First Check Residuals
What is influential?
Freund and Littell, SAS System for Regression, 3rd edition, page 70:
Variance inflation:
vifcheck = 1 / (1 - r2)
Leverage greater than this value:
leverageCheck = 2 * (predictors + 1) / records
Covariance ratio more extreme than:
cov1Check = 1 + 3 * (predictors+1) / records
cov2Check = 1 - 3 * (predictors+1) / records
DFFITS values with absolute value bigger than:
dffitsCheck = 2 * ((predictors + 1)/records) ** .5
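The cutoffs above depend only on the number of predictors, the number of records, and the model R-squared, so they can be computed in one place. A minimal Python sketch; the function name and dictionary keys are my own labels, and the inputs in the example are hypothetical:

```python
from math import sqrt

def influence_cutoffs(predictors, records, r_square):
    """Rule-of-thumb cutoffs for flagging influential observations."""
    p = predictors + 1  # predictors plus the intercept
    return {
        "vif": 1 / (1 - r_square),
        "leverage": 2 * p / records,
        "cov_ratio_upper": 1 + 3 * p / records,
        "cov_ratio_lower": 1 - 3 * p / records,
        "dffits": 2 * sqrt(p / records),
    }

# Hypothetical model: 3 predictors, 100 records, R-square of 0.5.
for name, value in influence_cutoffs(3, 100, 0.5).items():
    print(f"{name}: {value:.3f}")
```

An observation is worth a closer look when its leverage or absolute DFFITS exceeds the cutoff, or when its covariance ratio falls outside the lower and upper bounds.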
Influence Code
