February 2006
Dependent variable is continuous: interval or ratio (numeric). Independent variables are also interval or ratio.
Examples:
Effect of weight on blood pressure
Effect of drug dose on reticulocyte count
[Figure: dependent variable plotted against independent variable]
The dependent variable is a binary (yes/no) outcome. The independent variables are continuous (interval or ratio).
Examples:
Relation of weight and BP to 10-year risk of death
Relation of CD4 count to 1-year risk of AIDS diagnosis
[Table: AIDS / No AIDS outcome after the numeric variable (e.g. CD4 count) is collapsed into categories]
Problems:
Some information is lost when we collapse the numeric data into categories, which leads to a loss of power.
There is no estimate of the magnitude of the relation.
Probability:
p = probability of the event
1 - p = probability of not having the event (also called q)
p varies from 0 to 1
Odds:
Ratio of the probability of the event to the probability of not having the event: Odds = p/(1 - p)
When p = 0.5, odds = 0.5/0.5 = 1 (1:1 odds)
When p = 0.1, odds = 0.1/0.9 = 0.11
The log odds (also called the logit) is simply the natural logarithm of the odds: $\mathrm{logit}(p) = \ln\left(\frac{p}{1-p}\right) = \ln(p) - \ln(1-p)$
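A minimal Python sketch of these definitions (the helper names `odds` and `logit` are our own, not taken from any statistics package). It reproduces the p = 0.5 and p = 0.1 examples and uses the identity ln(p/(1 - p)) = ln(p) - ln(1 - p).

```python
import math

def odds(p: float) -> float:
    """Odds of an event with probability p: p / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return p / (1.0 - p)

def logit(p: float) -> float:
    """Log odds: ln(p / (1 - p)), computed as ln(p) - ln(1 - p)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p) - math.log(1.0 - p)

print(odds(0.5))            # 1.0 (1:1 odds)
print(round(odds(0.1), 2))  # 0.11
print(logit(0.5))           # 0.0: even odds give a logit of zero
```

Probabilities below 0.5 give negative logits and probabilities above 0.5 give positive ones, which is why the logit can take any real value while p stays between 0 and 1.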
In other words, the model says that the log odds of the event happening is the sum of:
a constant (a), and
some other constant (b) times a numeric risk factor (x), for example SBP:
$\ln(\text{odds}) = a + bx$
Given values of the independent variables, the regression equation predicts the log odds of the event.
The statistics program calculates the coefficient b.
The coefficient b shows how much the log odds change for a one-unit change in the independent variable.
Positive b: higher risk with higher values.
Negative b: lower risk with higher values.
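A small sketch of how the sign of b plays out, assuming the model ln(odds) = a + bx. The function inverts the logit to a probability, p = 1/(1 + e^(-(a + bx))). The values of a and b are made up for illustration, and `predicted_probability` is a hypothetical helper name.

```python
import math

def predicted_probability(a: float, b: float, x: float) -> float:
    """Invert logit = a + b*x to get the probability of the event."""
    log_odds = a + b * x
    return 1.0 / (1.0 + math.exp(-log_odds))

a = -2.0  # hypothetical constant, for illustration only
for b in (0.05, -0.05):  # one positive and one negative hypothetical slope
    risks = [predicted_probability(a, b, x) for x in (0, 20, 40)]
    # positive b: risk rises as x increases; negative b: risk falls
    print(b, [round(r, 3) for r in risks])
```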
For the hypothetical example given above, examining the relation of BP to the risk of stroke/death, the model predicts:
$\ln(\text{odds}) = c + b \cdot \mathrm{SBP}$, so
$\text{odds} = e^{c + b \cdot \mathrm{SBP}} = e^{c}\, e^{b \cdot \mathrm{SBP}}$
In other words, $\text{odds} = e^{c}\,(e^{b})^{x} = e^{c}\,e^{bx}$
So $e^{b}$ is the factor indicating the effect of x on the event: each one-unit increase in x multiplies the odds by a factor of $e^{b}$. That is, $e^{b}$ is the odds ratio for a one-unit change in x.
Suppose b = -0.693, so $e^{b}$ = 0.5. A one-unit increase in x will halve the odds. If b = 0, $e^{b}$ = 1, and x has no effect on the odds (OR = 1).
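The multiplicative effect of $e^{b}$ can be checked numerically. This sketch assumes the single-variable model ln(odds) = c + bx with a hypothetical constant c = 0 and the b = -0.693 from the example; `odds_from_model` is an illustrative helper, not part of any package.

```python
import math

def odds_from_model(c: float, b: float, x: float) -> float:
    """Odds implied by ln(odds) = c + b*x, written as e^c * (e^b)^x."""
    return math.exp(c) * math.exp(b) ** x

c = 0.0      # hypothetical constant
b = -0.693   # value from the example above; e^b is about 0.5
for x in range(4):
    print(x, round(odds_from_model(c, b, x), 3))  # odds halve with each unit of x

# The ratio of the odds at x + 1 to the odds at x is e^b, whatever x is
ratio = odds_from_model(c, b, 3) / odds_from_model(c, b, 2)
print(round(ratio, 3), round(math.exp(b), 3))
```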
For the hypothetical example above, Epi Info gives the following report:
Term       Odds Ratio   95% CI            Coefficient   S.E.     Z        P-value
BP         1.0597       1.0220 - 1.0987    0.0579       0.0185   3.1319   0.0017
Constant   *            *                 -7.2014       2.2994   3.1319   0.0017
The odds ratio reported for BP is the calculated $e^{b}$ ($e^{0.0579} \approx 1.06$): the odds ratio for a one-unit (1 mmHg) change in the independent variable.
The confidence interval does not include 1, so the effect is statistically significant
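The odds ratio and its confidence interval in the report can be reproduced, to rounding, from the coefficient and its standard error. This sketch assumes the usual Wald interval, $e^{b \pm 1.96\,SE}$; we have not checked how Epi Info computes its interval internally, and `wald_odds_ratio` is a hypothetical helper name.

```python
import math

def wald_odds_ratio(b: float, se: float, z: float = 1.96):
    """Return e^b and the Wald confidence limits e^(b - z*SE), e^(b + z*SE)."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

# Coefficient and standard error for BP from the Epi Info report above
or_bp, lower, upper = wald_odds_ratio(0.0579, 0.0185)
print(f"OR = {or_bp:.4f}, 95% CI {lower:.4f} to {upper:.4f}")

# Wald statistic b / SE; small differences from the report come from rounding b and SE
print("z =", round(0.0579 / 0.0185, 2))
print("CI excludes 1:", not (lower <= 1.0 <= upper))
```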
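Single variable:
$\mathrm{logit} = c + bx$
$\text{odds} = e^{c}\,(e^{b})^{x}$
Multiple variables:
$\mathrm{logit} = c + b_1x_1 + b_2x_2 + \dots + b_nx_n$
$\text{odds} = e^{c}\,(e^{b_1})^{x_1}\,(e^{b_2})^{x_2} \cdots (e^{b_n})^{x_n}$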
The analysis reports a b coefficient for each independent variable. That coefficient is the effect of the given independent variable adjusted for (holding constant) all the other independent variables in the model.
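To see that the sum form (logit) and the product form (odds) describe the same model, and that a one-unit change in one variable multiplies the odds by its own $e^{b}$ when the other variables are held fixed, here is a sketch with made-up coefficients. `log_odds` and `odds_product_form` are hypothetical helper names.

```python
import math

def log_odds(c: float, coefs: list[float], xs: list[float]) -> float:
    """logit = c + b1*x1 + b2*x2 + ... + bn*xn"""
    return c + sum(b * x for b, x in zip(coefs, xs))

def odds_product_form(c: float, coefs: list[float], xs: list[float]) -> float:
    """odds = e^c * (e^b1)^x1 * (e^b2)^x2 * ... * (e^bn)^xn"""
    result = math.exp(c)
    for b, x in zip(coefs, xs):
        result *= math.exp(b) ** x
    return result

# Hypothetical constant, coefficients and one subject's values
c, coefs, xs = -3.0, [0.02, 0.5], [120.0, 1.0]
print(math.exp(log_odds(c, coefs, xs)), odds_product_form(c, coefs, xs))

# Raise x1 by one unit, keep x2 fixed: the odds are multiplied by e^b1
ratio = odds_product_form(c, coefs, [121.0, 1.0]) / odds_product_form(c, coefs, xs)
print(round(ratio, 4), round(math.exp(coefs[0]), 4))
```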
Prospective cohort study of causes of cardiac disease: Evans County Study, 1965
Independent variables: age, gender, race, social index, SBP, diabetes, smoking, cholesterol, and an obesity index
Dependent variable: risk of dying during a 10-year period
Variable              Range        b coeff   SE       p
Constant                           -6.376    1.634    <0.001
Age                   40-69 y       0.086    0.115    <0.001
Gender                0=m, 1=f      1.500    0.967    0.121
Age x gender                       -0.043    0.017    0.011
Social index                       -0.056    0.040    0.160
(Social index)^2      400-7056      0.0006   0.0003   0.082
SBP                   88-310        0.019    0.002    <0.001
Diabetes              0=n, 1=y      1.123    0.261    <0.001
Smoking               0=n, 1=y      0.317    0.157    0.043
Cholesterol           94-546        0.0031   0.0015   0.041
Quetelet index        2.11-8.76    -1.064    0.432    0.014
(Quetelet index)^2    4.44-76.8     0.112    0.049    0.022
The p value indicates statistical significance.
Age is positively associated with risk of death (b = 0.086, p < 0.001).
Gender has a positive b coefficient, but the p value is 0.12, so we cannot conclude that there is a significant relationship.
Gender (coded 0=m, 1=f): b = 1.500, SE = 0.967, p = 0.121
Gender is coded as 0 for male, 1 for female. $e^{b}$ ($e^{1.5}$ = 4.48) is the odds ratio for a 1-unit change in gender, i.e., the OR for females relative to males.
For any dummy variable (coded 0/1), $e^{b}$ is the adjusted OR for that risk factor, since a 1-unit change = presence vs. absence of the risk factor.
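A short sketch of the gender calculation, using b = 1.500 and SE = 0.967 from the Evans County table. The confidence interval assumes a Wald interval ($e^{b \pm 1.96\,SE}$), which the slides do not state; it is shown only to connect the OR with the non-significant p value of 0.121.

```python
import math

# Gender: coded 0 = male, 1 = female; b and SE from the Evans County table
b_gender, se_gender = 1.500, 0.967

odds_ratio = math.exp(b_gender)                # OR for a 1-unit change: female vs. male
lower = math.exp(b_gender - 1.96 * se_gender)  # assumed Wald lower limit
upper = math.exp(b_gender + 1.96 * se_gender)  # assumed Wald upper limit

print(round(odds_ratio, 2))                    # 4.48
print(round(lower, 2), round(upper, 2))        # a wide interval that includes 1
```

Because the interval includes 1, the large point estimate is still compatible with no effect of gender, matching p = 0.121.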
Social index: b = -0.056, SE = 0.040, p = 0.160
(Social index)^2: range 400-7056, b = 0.0006, SE = 0.0003, p = 0.082
Social index squared is included as well as social index itself. Squared terms allow for curvilinear relationships, just as in ordinary (linear) regression.
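A sketch of what the squared term does, using the social index coefficients b = -0.056 and b = 0.0006 from the Evans County table. With a negative linear term and a positive squared term, the social index part of the log odds is U-shaped, with its lowest point at $-b_{\text{lin}}/(2\,b_{\text{sq}})$. The squared term's range of 400-7056 corresponds to social index values of 20 to 84. This ignores sampling uncertainty (the squared term has p = 0.082), and `social_index_log_odds` is a hypothetical helper.

```python
# Coefficients for social index and (social index)^2 from the Evans County table
b_lin, b_sq = -0.056, 0.0006

def social_index_log_odds(s: float) -> float:
    """Contribution of social index s to the log odds: b_lin*s + b_sq*s^2."""
    return b_lin * s + b_sq * s ** 2

# With b_sq > 0 the curve is U-shaped; its minimum is at s = -b_lin / (2 * b_sq)
s_min = -b_lin / (2 * b_sq)
print(round(s_min, 1))
for s in (20, 40, s_min, 60, 84):
    print(round(s, 1), round(social_index_log_odds(s), 3))
```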
Age x gender: b = -0.043, SE = 0.017, p = 0.011
The age x gender term is included to see whether age has a different effect in males than in females.
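Because the model includes an age x gender term, the female vs. male odds ratio depends on age: under the assumption that the term is coded as age multiplied by gender (0/1), the log OR for females vs. males at a given age is 1.500 + (-0.043)(age). The single figure $e^{1.5}$ then applies only at age 0, outside the 40-69 y range. This sketch uses coefficients from the Evans County table; `female_vs_male_or` is a hypothetical helper.

```python
import math

b_gender = 1.500         # gender (0 = male, 1 = female)
b_age_x_gender = -0.043  # interaction term, ASSUMED to be coded as age * gender

def female_vs_male_or(age: float) -> float:
    """Odds ratio, female vs. male, at a given age under the assumed coding."""
    return math.exp(b_gender + b_age_x_gender * age)

for age in (40, 55, 69):  # the study's age range was 40-69 y
    print(age, round(female_vs_male_or(age), 2))
```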
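With binary (dummy) variables, $e^{b}$ is the odds ratio, and you can compare the strength of the effects by comparing the b coefficients. With numeric variables, b is the effect of a one-unit change, so it depends on the unit of measurement and is not a direct measure of the strength of the effect.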
Example: b is quite small for the effect of BP on mortality, because it is the effect of only a 1 mmHg change in BP. BP is still an important factor in mortality because BP varies over a wide range (88-310 mmHg in this study).
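A sketch of why a small b can still matter, using the SBP coefficient b = 0.019 per mmHg from the Evans County table. The odds ratio for a difference of d mmHg is $(e^{b})^{d} = e^{bd}$; the differences chosen here are illustrative, and `odds_ratio_for_change` is a hypothetical helper.

```python
import math

def odds_ratio_for_change(b: float, delta: float) -> float:
    """OR for a change of `delta` units: (e^b)^delta = e^(b * delta)."""
    return math.exp(b * delta)

b_sbp = 0.019  # SBP coefficient per 1 mmHg, Evans County table

for delta in (1, 10, 20, 50):
    print(delta, "mmHg:", round(odds_ratio_for_change(b_sbp, delta), 3))
```

A 1 mmHg difference barely changes the odds, but a 20 mmHg difference multiplies them by about 1.46.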
In a prospective cohort study, we can use the logistic regression model to predict the probability of the event given the independent variables. We can also derive the relative risk from the predicted probabilities.
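A sketch of deriving predicted risks and a relative risk, assuming the coefficients of the hypothetical BP example reported by Epi Info (c = -7.2014, b = 0.0579) and two SBP values chosen only for illustration. It also shows that when the event is common, the relative risk and the odds ratio can differ considerably. `predicted_risk` is a hypothetical helper.

```python
import math

def predicted_risk(c: float, b: float, x: float) -> float:
    """Predicted probability of the event from logit = c + b*x."""
    return 1.0 / (1.0 + math.exp(-(c + b * x)))

# Coefficients from the hypothetical BP example reported by Epi Info
c, b = -7.2014, 0.0579
p_high = predicted_risk(c, b, 160)  # SBP 160 mmHg (illustrative value)
p_low = predicted_risk(c, b, 120)   # SBP 120 mmHg (illustrative value)

relative_risk = p_high / p_low
odds_ratio = (p_high / (1 - p_high)) / (p_low / (1 - p_low))
print(round(p_high, 3), round(p_low, 3))
print(round(relative_risk, 2), round(odds_ratio, 2))  # OR equals e^(b * 40)
```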
Forward selection: add one variable at a time until no remaining variable makes a significant difference.
Backward selection: start with all the variables and remove one at a time, checking whether each made a significant contribution.
Epi Info has suggestions on how to do this.