The graphs above show that the assumptions of normality and homoscedasticity are
not satisfied. In the residuals vs fitted plot we can see a pattern: the residuals
are tightly clustered at lower fitted values and spread far apart at higher fitted
values. This shows that the variance is not constant; it increases with the
fitted values.
Similarly, the Normal Q-Q plot shows the points deviating from the normal
reference line. Hence the underlying assumptions of the linear model are not
satisfied.
So we try the log-linear model.
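As a rough illustration of why a log transform can help in this situation, the sketch below simulates costs with multiplicative errors and compares the residual spread in the upper and lower halves of the fitted values, before and after taking logs. The data, the coefficients and the helper `spread_ratio` are hypothetical and are not taken from the hospital dataset.

```r
# Simulated data (hypothetical; NOT the hospital dataset from the report)
set.seed(1)
n   <- 2000
age <- runif(n, 20, 80)
# Multiplicative errors: the spread of cost grows with its expected level
cost <- exp(10 + 0.03 * age + rnorm(n, sd = 0.4))

fit_raw <- lm(cost ~ age)        # linear model on the raw scale
fit_log <- lm(log(cost) ~ age)   # log-linear model

# Ratio of residual SD in the upper half of fitted values to the lower half.
# A ratio near 1 is consistent with constant variance.
spread_ratio <- function(fit) {
  f  <- fitted(fit)
  r  <- resid(fit)
  hi <- f > median(f)
  sd(r[hi]) / sd(r[!hi])
}

ratio_raw <- spread_ratio(fit_raw)
ratio_log <- spread_ratio(fit_log)
print(c(raw = ratio_raw, log = ratio_log))
```

A ratio well above 1 on the raw scale and close to 1 on the log scale corresponds to the funnel shape disappearing in the residuals vs fitted plot.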
Log model
> mod_2<-lm(log(TOTAL.COST.TO.HOSPITAL)~AGE)
> summary(mod_2)
Call:
lm(formula = log(TOTAL.COST.TO.HOSPITAL) ~ AGE)
Residuals:
     Min       1Q   Median       3Q      Max
-1.51748 -0.24402 -0.00536  0.25388  1.39912
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.814724   0.043326 272.693  < 2e-16 ***
AGE          0.008565   0.001118   7.662 4.21e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.455 on 246 degrees of freedom
Multiple R-squared:  0.1927,    Adjusted R-squared:  0.1894
F-statistic: 58.7 on 1 and 246 DF,  p-value: 4.212e-13
> plot(mod_2, which=c(1,2))
In the residuals vs fitted plot the residuals now scatter randomly, and the
pattern that was visible in the previous plot is gone. The Normal Q-Q plot also
shows a better fit to normality than the previous plot.
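The slope coefficient (beta 1 = 0.008565) shows that a one-year increase in age
multiplies the total cost to hospital by a factor of exp(0.008565) = 1.0086,
i.e. an increase of about 0.86%. The factor is a ratio, not an amount in Rs.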
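The back-transformation of the AGE slope can be checked with a few lines of R. This is only a sketch of the arithmetic; the variable names are illustrative, and the coefficient is copied from the summary(mod_2) output.

```r
# Slope for AGE, copied from the summary(mod_2) output in the report
b_age <- 0.008565

# Multiplicative change in cost for each additional year of age
factor_per_year <- exp(b_age)

# The same effect expressed as a percentage change
pct_per_year <- (factor_per_year - 1) * 100

# Effect of a 10-year age difference: factors multiply, so exponents add
factor_per_decade <- exp(10 * b_age)

print(round(factor_per_year, 4))   # 1.0086
print(round(pct_per_year, 2))      # 0.86
print(factor_per_decade)
```

Because the response is on the log scale, effects combine multiplicatively: ten years of age correspond to the one-year factor raised to the tenth power, not to ten times the one-year percentage.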
> mod_4<-lm(log(TOTAL.COST.TO.HOSPITAL)~GENDER)
> plot(mod_4, which=c(1,2))
> summary(mod_4)
Call:
lm(formula = log(TOTAL.COST.TO.HOSPITAL) ~ GENDER)
Residuals:
     Min       1Q   Median       3Q      Max
-1.31142 -0.28273 -0.08258  0.26109  1.57082
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.93436    0.05503 216.865  < 2e-16 ***
GENDERM      0.19082    0.06726   2.837  0.00493 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.4983 on 246 degrees of freedom
Multiple R-squared:  0.03168,   Adjusted R-squared:  0.02774
F-statistic: 8.048 on 1 and 246 DF,  p-value: 0.004934
> contrasts(GENDER)
  M
F 0
M 1
> exp(0.19082)
[1] 1.210242
Gender, being a qualitative variable, enters the regression model as a dummy
variable. The contrasts command shows that it is coded as 1 for male and 0 for female;
the dummy variable formed is GENDERM. The model shows that for males the total cost to
hospital is multiplied by a factor of exp(0.19082) = 1.210242 relative to females (about
21% higher), and the p-value (0.00493) shows the effect to be significant.
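The dummy coding described above can be reproduced on a small example. The sketch below uses simulated data (hypothetical values, not the hospital dataset) to show the contrast matrix, and to show that with a single 0/1 dummy the fitted slope is simply the difference in mean log cost between the two groups.

```r
# Simulated data (hypothetical; NOT the hospital dataset from the report)
set.seed(2)
gender   <- factor(rep(c("F", "M"), times = c(90, 110)))
log_cost <- 11.9 + 0.2 * (gender == "M") + rnorm(200, sd = 0.5)

# Treatment contrasts: the first level ("F") is the baseline, coded 0
print(contrasts(gender))

fit <- lm(log_cost ~ gender)

# With a single 0/1 dummy, the slope equals the difference in group means
b_male    <- coef(fit)[["genderM"]]
mean_diff <- mean(log_cost[gender == "M"]) - mean(log_cost[gender == "F"])

# Back-transform to a multiplicative effect on the original cost scale
ratio_m_vs_f <- exp(b_male)
print(c(slope = b_male, mean_diff = mean_diff, ratio = ratio_m_vs_f))
```

Since the means are taken on the log scale, the back-transformed ratio compares geometric means of cost between the groups rather than arithmetic means.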
> contrasts(MARITAL.STATUS)
          UNMARRIED
MARRIED           0
UNMARRIED         1
> summary(mod_6)
Call:
lm(formula = log(TOTAL.COST.TO.HOSPITAL) ~ MARITAL.STATUS)
Residuals:
    Min      1Q  Median      3Q     Max
-1.3608 -0.2360 -0.0334  0.2396  1.4042
Coefficients:
The model shows that for unmarried people the total cost to hospital is lower: the total
cost is multiplied by a factor of 0.6656642, and the p-value shows that the effect is significant.
Only Age is significant. Gender and marital status are insignificant, as seen from
their p-values; however, in 4 and 5 these variables came out as significant. This
shows that, considered individually, gender and marital status each have a
significant effect on the total cost to hospital, whereas in the combined model
their effects are not significant.
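One common reason for this pattern is that the predictors are correlated with each other, so that a variable which looks important on its own adds little once the others are in the model. Whether that is what happens in the hospital data cannot be confirmed from the output shown here. The sketch below illustrates the mechanism on simulated data (hypothetical, not the hospital dataset) in which marital status is driven by age and has no effect of its own on cost.

```r
# Simulated data (hypothetical; NOT the hospital dataset from the report)
set.seed(3)
n   <- 250
age <- round(runif(n, 1, 80))
# Marital status is driven almost entirely by age in this toy example
married  <- factor(ifelse(age + rnorm(n, sd = 5) > 25, "MARRIED", "UNMARRIED"))
# Cost depends on age only; marital status has no effect of its own
log_cost <- 11.8 + 0.009 * age + rnorm(n, sd = 0.45)

fit_single   <- lm(log_cost ~ married)
fit_combined <- lm(log_cost ~ age + married)

# p-value of the marital-status dummy in each model
p_single   <- coef(summary(fit_single))["marriedUNMARRIED", "Pr(>|t|)"]
p_combined <- coef(summary(fit_combined))["marriedUNMARRIED", "Pr(>|t|)"]
print(c(alone = p_single, with_age = p_combined))
```

In most random draws of such data the dummy is clearly significant when fitted alone and no longer significant once age is included, because age already carries the information that marital status was standing in for.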
BP.LOW        -0.0005388  0.0032198  -0.167 0.867311
RR             0.0173013  0.0090719   1.907 0.058343 .
Diabetes1     -0.0931856  0.1643344  -0.567 0.571496
Diabetes2      0.2090071  0.1756235   1.190 0.235820
hypertension1 -0.0623585  0.1217057  -0.512 0.609116
hypertension2 -0.2203463  0.1496889  -1.472 0.143028
hypertension3  0.1137384  0.1999772   0.569 0.570339
other         -0.0703775  0.1239298  -0.568 0.570932
HB             0.0027892  0.0118002   0.236 0.813456
UREA           0.0008210  0.0026521   0.310 0.757307
CREATININE     0.2667857  0.1271125   2.099 0.037444 *
AMBULANCE      0.1048268  0.3199244   0.328 0.743607
TRANSFERRED   -0.2662347  0.2261663  -1.177 0.240923
ELECTIVE       0.0878894  0.3115261   0.282 0.778221
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.3965 on 156 degrees of freedom
  (57 observations deleted due to missingness)
Multiple R-squared:  0.5307,    Adjusted R-squared:  0.4285
F-statistic: 5.19 on 34 and 156 DF,  p-value: 5.174e-13
The significant predictors are highlighted in yellow in the table above.
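Rather than reading the significant predictors off the table by eye, they can be pulled out of the fitted model programmatically. The helper `significant_terms` below is a hypothetical name introduced for illustration, and the data are simulated rather than taken from the hospital dataset.

```r
# Hypothetical helper: names of terms whose p-value is below alpha
significant_terms <- function(fit, alpha = 0.05) {
  tab  <- coef(summary(fit))   # columns: Estimate, Std. Error, t value, Pr(>|t|)
  keep <- tab[, "Pr(>|t|)"] < alpha
  setdiff(rownames(tab)[keep], "(Intercept)")
}

# Simulated data (hypothetical; NOT the hospital dataset from the report)
set.seed(4)
d   <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
d$y <- 1 + 2 * d$x1 + rnorm(100, sd = 0.1)   # only x1 has a real effect

fit <- lm(y ~ x1 + x2, data = d)
sig <- significant_terms(fit)
print(sig)
```

Note that for a factor with several levels the table has one row per dummy, so a name such as Diabetes2 refers to a single level of the factor rather than to the variable as a whole.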
> mod_10<-lm(log(TOTAL.COST.TO.HOSPITAL)~AGE+CAD.DVD+CAD.TVD+other..heart+other.general+
other.tertalogy+RHD+HR.PULSE+CREATININE)
> summary(mod_10)
Call:
lm(formula = log(TOTAL.COST.TO.HOSPITAL) ~ AGE + CAD.DVD + CAD.TVD +
other..heart + other.general + other.tertalogy + RHD + HR.PULSE +
CREATININE)
Residuals:
     Min       1Q   Median       3Q      Max
-1.06605 -0.20151 -0.02119  0.19485  1.26342
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)     10.974447   0.176893  62.040  < 2e-16 ***
AGE              0.006630   0.001672   3.965 0.000101 ***
CAD.DVD          0.401122   0.105391   3.806 0.000186 ***
CAD.TVD          0.388842   0.109755   3.543 0.000490 ***
other..heart     0.221803   0.074259   2.987 0.003162 **
other.general   -1.544496   0.419724  -3.680 0.000298 ***
other.tertalogy  0.288918   0.114124   2.532 0.012103 *
RHD              0.490360   0.100450   4.882 2.11e-06 ***
HR.PULSE         0.005739   0.001594   3.600 0.000399 ***
CREATININE       0.223745   0.064466   3.471 0.000633 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.4061 on 205 degrees of freedom
  (33 observations deleted due to missingness)
Multiple R-squared:  0.4232,    Adjusted R-squared:  0.3979
F-statistic: 16.71 on 9 and 205 DF,  p-value: < 2.2e-16
The fitted model with all the significant predictors has a
multiple R-squared of 0.4232 and an adjusted R-squared of 0.3979,
showing that the model explains 42.32% of the variation in log total cost to hospital.
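The relationship between the two R-squared figures can be verified from the numbers printed in the summary. The sketch below applies the standard adjusted R-squared formula; the function name `adj_r2` is illustrative, and the inputs are read off the summary(mod_10) output.

```r
# Adjusted R-squared from R-squared, sample size n and number of predictors p
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Values read off the summary(mod_10) output in the report:
# 9 predictors and 205 residual degrees of freedom, so n = 205 + 9 + 1 = 215
r2 <- 0.4232
p  <- 9
n  <- 205 + p + 1

adj <- adj_r2(r2, n, p)
print(round(adj, 4))   # 0.3979
```

The adjustment penalises each additional predictor, which is why the adjusted value sits slightly below the multiple R-squared.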