
Assignment 1

Package Pricing at Mission Hospital


Data Mining and Business Analytics (with R)
Himanshu Gupta (P36116), Divyang Sinha (P36112)
---------------------------------------------------------------------------------------------------------------------

CODE:

Question-1
Develop a suitable simple linear regression model to check if there is any relationship between
Total Cost to Hospital and AGE. For the fitted model, interpret the regression coefficient
corresponding to AGE.

Answer-1

CONCEPT:

For the simple linear regression model,


Output/Dependent Variable = Total Cost to Hospital
Input/Independent Variable = Age
We fit this model under several transformations to see which form best describes the
above relationship. The base model is:
Total.Cost.to.Hospital = β0 + β1 Age + ε
Among the transformed models, we choose on the basis of the adjusted R-squared value, i.e. the
greater the adjusted R-squared value, the better the model.
CODE:
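The original code for this step is not reproduced in this document. As a minimal sketch of the comparison described above, the R code below fits a few candidate transformations and ranks them by adjusted R-squared. The data frame `hospital` is simulated purely for illustration, and the four candidate forms are assumptions, not the seven models that were actually fitted.

```r
# Illustrative only: simulated data standing in for the Mission Hospital data set.
set.seed(1)
n <- 200
hospital <- data.frame(AGE = sample(1:80, n, replace = TRUE))
hospital$Total.Cost.to.Hospital <-
  exp(11.8 + 0.0086 * hospital$AGE + rnorm(n, sd = 0.45))

# Candidate model forms (hypothetical examples of "various transformations").
candidates <- list(
  linear    = Total.Cost.to.Hospital ~ AGE,
  quadratic = Total.Cost.to.Hospital ~ AGE + I(AGE^2),
  sqrt_cost = sqrt(Total.Cost.to.Hospital) ~ AGE,
  log_cost  = log(Total.Cost.to.Hospital) ~ AGE
)

# Fit each model and collect its adjusted R-squared.
fits <- lapply(candidates, lm, data = hospital)
adj_r2 <- sapply(fits, function(fit) summary(fit)$adj.r.squared)
print(sort(adj_r2, decreasing = TRUE))
```

Adjusted R-squared values computed on different response scales (cost versus log cost) are not strictly comparable, so a ranking like this is best read as a rough guide.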

RESULTS & INTERPRETATION:

The plot of Total.Cost.to.Hospital against AGE shows that some relationship exists between them,
as Total.Cost.to.Hospital increases with AGE.
We fitted a total of 7 models to check the relationship between Total.Cost.to.Hospital and
AGE, and found the maximum adjusted R-squared value in model 4, which is:
log(Total.Cost.to.Hospital) = β0 + β1 Age + ε
Summary of Model 4:

From the above summary, the fitted model is:
log(Total.Cost.to.Hospital) = 11.8147 + 0.008565 Age + ε
which means that each additional year of age is associated with an increase of about 0.86% in
the total cost to hospital. Furthermore, using the standard error, a 1-year increase in age
corresponds on average to an increase in total cost to hospital of between 0.64% and 1.08%
(i.e. β1 ± 2 Std. Error).
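Because the response is on the log scale, the percentage effect of age follows from exponentiating the coefficient. The short R sketch below works through that arithmetic using the coefficient reported in the summary; the 10-year comparison is an added illustration, not a result from the original analysis.

```r
# Coefficient of AGE as reported in the Model 4 summary.
b_age <- 0.008565

# Exact percentage change in cost per extra year of age in a log-linear model.
pct_per_year <- (exp(b_age) - 1) * 100
print(round(pct_per_year, 2))    # about 0.86

# The effect compounds over larger age gaps, e.g. a 10-year difference.
pct_per_decade <- (exp(10 * b_age) - 1) * 100
print(round(pct_per_decade, 2))  # about 8.94
```

For small coefficients such as this one, 100 times the coefficient is a close approximation to the exact percentage change.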
From the t-value and p-value:
Null Hypothesis H0: There is no relationship between Total.Cost.to.Hospital and AGE (β1 = 0)
Alternate Hypothesis Ha: There exists a relationship between Total.Cost.to.Hospital and AGE (β1 ≠ 0)
Since the t-value is large and the p-value is significant, we can reject the null
hypothesis.
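As a rough sketch of where these test quantities come from, the R code below pulls the t-value and p-value for the slope out of a fitted model. The data are simulated for illustration only, and the 5% significance level is an assumption, since the original does not state the level used.

```r
# Illustrative only: simulated data, not the hospital records.
set.seed(2)
n <- 200
AGE <- sample(1:80, n, replace = TRUE)
log_cost <- 11.8 + 0.0086 * AGE + rnorm(n, sd = 0.45)

fit <- lm(log_cost ~ AGE)

# Coefficient table with columns: Estimate, Std. Error, t value, Pr(>|t|).
coefs <- summary(fit)$coefficients
t_value <- coefs["AGE", "t value"]
p_value <- coefs["AGE", "Pr(>|t|)"]

# The t value is the estimate divided by its standard error.
print(all.equal(t_value, coefs["AGE", "Estimate"] / coefs["AGE", "Std. Error"]))

# In a one-predictor model the F statistic is the square of the slope's t value.
f_value <- unname(summary(fit)$fstatistic["value"])
print(all.equal(f_value, t_value^2))

# Reject H0 (slope = 0) at the assumed 5% level when the p-value is below 0.05.
reject_h0 <- p_value < 0.05
```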
The residual standard error is 0.455 on the log scale. Even if the true values of the coefficients
were known exactly, a prediction of log(Total.Cost.to.Hospital) would still be off by about 0.455
on average, which corresponds to a factor of roughly exp(0.455) ≈ 1.58 in the total cost itself.
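The residual standard error belongs to a model for log cost, so it is a deviation in log units and not a percentage. A brief R sketch of the conversion to the cost scale, using the value reported in the summary:

```r
# Residual standard error reported for Model 4 (on the log-cost scale).
rse_log <- 0.455

# A typical miss of +/- rse_log in log(cost) corresponds to a multiplicative
# factor on the cost itself.
factor_high <- exp(rse_log)   # typical overshoot factor
factor_low  <- exp(-rse_log)  # typical undershoot factor
print(round(c(low = factor_low, high = factor_high), 2))  # about 0.63 and 1.58
```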
Roughly, 18.94% of the variation in total cost to hospital is explained by this model.
The F-statistic is also very large with a significant p-value. The plot below shows the fitted
regression line:

Question-4
Develop a suitable simple linear regression model to check if there is any relationship between
Total Cost to Hospital and GENDER. For the fitted model, interpret the regression
coefficient corresponding to GENDER.
Answer-4

CONCEPT:
For the simple linear regression model,
Output/Dependent Variable = Total Cost to Hospital
Input/Independent Variable = Gender
We fit this model under several transformations to see which form best describes the
above relationship. The base model is:
Total.Cost.to.Hospital = β0 + β1 Gender + ε
Among the transformed models, we choose on the basis of the adjusted R-squared value, i.e. the
greater the adjusted R-squared value, the better the model.

CODE:
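The original code for this step is not reproduced in this document. As a minimal sketch of a regression on a two-level categorical predictor, the R code below uses simulated data; the level labels "Female" and "Male" and all simulated values are assumptions made for illustration.

```r
# Illustrative only: simulated data, not the hospital records.
set.seed(3)
n <- 200
GENDER <- factor(sample(c("Female", "Male"), n, replace = TRUE))
log_cost <- 11.93 + 0.19 * (GENDER == "Male") + rnorm(n, sd = 0.5)

# R dummy-codes the factor: "Female" (the first level) is the baseline and
# "Male" gets a 0/1 indicator column named GENDERMale.
fit <- lm(log_cost ~ GENDER)
print(coef(fit))

# With a single binary predictor, the slope equals the difference in group means.
group_means <- tapply(log_cost, GENDER, mean)
mean_diff <- unname(group_means["Male"] - group_means["Female"])
print(all.equal(unname(coef(fit)["GENDERMale"]), mean_diff))

# Exact percentage difference in cost between the two groups.
pct_diff <- (exp(mean_diff) - 1) * 100
```

The intercept is the mean log cost of the baseline group, so the fitted line simply connects the two group means.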
RESULTS & INTERPRETATION:

The plot of Total Cost to Hospital against Gender shows that some relationship exists between
them, as Total Cost to Hospital is higher for Male.

We fitted a total of 5 models to check the relationship between Total Cost to Hospital and
Gender, and found the maximum adjusted R-squared value in model 4, which is:
log(Total.Cost.to.Hospital) = β0 + β1 GENDER + ε
Summary of Model 4:
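From the above summary, the fitted model is:
log(Total.Cost.to.Hospital) = 11.93436 + 0.19082 Male + ε
which means that the total cost to hospital is about 21% higher for male patients than for
female patients (exp(0.19082) − 1 ≈ 0.21; the coefficient itself, 19.08%, is the log-scale approximation).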
From the t-value and p-value:
Null Hypothesis H0: There is no relationship between Total.Cost.to.Hospital and GENDER (β1 = 0)
Alternate Hypothesis Ha: There exists a relationship between Total.Cost.to.Hospital and GENDER (β1 ≠ 0)
Since the t-value is large and the p-value is significant, we can reject the null
hypothesis.
The residual standard error is 0.498 on the log scale. Even if the true values of the coefficients
were known exactly, a prediction of log(Total.Cost.to.Hospital) would still be off by about 0.498
on average, which corresponds to a factor of roughly exp(0.498) ≈ 1.65 in the total cost itself.
Roughly, 2.77% of the variation in total cost to hospital is explained by this model.
The F-statistic is also very large with a significant p-value. The plot below shows the fitted
regression line:
Question-5
Develop a suitable simple linear regression model to check if there is any relationship between
Total Cost to Hospital and MARITAL STATUS. For the fitted model, interpret the regression
coefficient corresponding to MARITAL STATUS.

Answer-5

CONCEPT:
For the simple linear regression model,
Output/Dependent Variable = Total Cost to Hospital
Input/Independent Variable = Marital Status
We fit this model under several transformations to see which form best describes the
above relationship. The base model is:
Total.Cost.to.Hospital = β0 + β1 Marital.Status + ε
Among the transformed models, we choose on the basis of the adjusted R-squared value, i.e. the
greater the adjusted R-squared value, the better the model.

CODE:
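The original code for this step is not reproduced in this document. The R sketch below, on simulated data, illustrates how the choice of baseline level determines the sign of the coefficient for a two-level factor. The level labels "MARRIED" and "UNMARRIED" and all simulated values are assumptions made for illustration.

```r
# Illustrative only: simulated data, not the hospital records.
set.seed(4)
n <- 200
MARITAL.STATUS <- factor(sample(c("MARRIED", "UNMARRIED"), n, replace = TRUE))
log_cost <- 12.29 - 0.41 * (MARITAL.STATUS == "UNMARRIED") + rnorm(n, sd = 0.46)

# Default baseline is the first level alphabetically ("MARRIED"), so the
# coefficient describes UNMARRIED relative to MARRIED.
fit_married_base <- lm(log_cost ~ MARITAL.STATUS)

# relevel() switches the baseline; the slope changes sign, the fit does not change.
status_unmarried_base <- relevel(MARITAL.STATUS, ref = "UNMARRIED")
fit_unmarried_base <- lm(log_cost ~ status_unmarried_base)

slope_a <- unname(coef(fit_married_base)[2])
slope_b <- unname(coef(fit_unmarried_base)[2])
print(all.equal(slope_a, -slope_b))
print(all.equal(summary(fit_married_base)$adj.r.squared,
                summary(fit_unmarried_base)$adj.r.squared))
```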
RESULTS & INTERPRETATION:

The plot of Total Cost to Hospital against Marital Status shows that some relationship exists
between them, as Total Cost to Hospital is higher for Married.

We fitted a total of 5 models to check the relationship between Total Cost to Hospital and
Marital Status, and found the maximum adjusted R-squared value in model 4, which is:
log(Total.Cost.to.Hospital) = β0 + β1 Marital.Status + ε
Summary of Model 4:
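From the above summary, the fitted model is:
log(Total.Cost.to.Hospital) = 12.29182 - 0.4067 Unmarried + ε
which means that the total cost to hospital is about 33% lower for unmarried patients than for
married patients (exp(−0.4067) − 1 ≈ −0.33; the coefficient itself, about −40.7%, is the log-scale approximation).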
From the t-value and p-value:
Null Hypothesis H0: There is no relationship between Total.Cost.to.Hospital and Marital Status (β1 = 0)
Alternate Hypothesis Ha: There exists a relationship between Total.Cost.to.Hospital and Marital
Status (β1 ≠ 0)
Since the t-value is large and the p-value is significant, we can reject the null
hypothesis.
The residual standard error is 0.464 on the log scale. Even if the true values of the coefficients
were known exactly, a prediction of log(Total.Cost.to.Hospital) would still be off by about 0.464
on average, which corresponds to a factor of roughly exp(0.464) ≈ 1.59 in the total cost itself.
Roughly, 15.66% of the variation in total cost to hospital is explained by this model.
The F-statistic is also very large with a significant p-value. The plot below shows the fitted
regression line:
