time there is a need to maximize the utilization of the available resources.

Today, the healthcare sector faces strong pressures to provide high quality service at an affordable cost. This is obviously very important in the case of privately run hospitals. With the introduction of user charges, even if they are based on the “ability to pay”, even the state-run hospitals are forced to become more efficient. The healthcare industry has large volumes of data at its disposal, and if it can extract the desired information from this voluminous data, it can lead to better utilization of resources.

The hospitals routinely collect large amounts of patient related data in the course of their regular activities. Similarly, governmental agencies also collect large amounts of health and other related data on a regular basis. These data sources can be used to develop mathematical models for predicting various variables such as duration of hospital stay, hospital expenses etc. One of the problems involved in developing mathematical models using these data, which are routinely collected by the hospitals and governmental agencies, is that many of the variables are categorical in nature. Even though the dependent variables that are of interest are quantitative (continuous) in nature, most of the other variables, which could be used as independent or explanatory variables, are categorical in nature and not easily amenable to usual econometric modeling. It is possible to use dummy variables to capture the effect of these categorical variables, but such an approach would require a large number of dummy variables. Under such circumstances, techniques such as data mining could be effectively used for analyzing these large quantities of data to facilitate formulation of appropriate policy measures. Techniques such as classification trees [6], discriminant analysis [4], and clustering or segmentation [5] have been used in other areas of public policy such as adult literacy programmes and the sericulture sector. Similarly, it was shown that previously undiscovered patterns for finding out the associated primary and secondary surgical procedures could be induced using data mining techniques [1]. This paper presents an application of neural prediction models for predicting the duration of hospitalization as well as the hospital related expenses based on hospital as well as demographic data. As mentioned earlier, data mining models have been used for predicting length of stay and the related expenses. But these models are primarily based on the patient data generated within the hospital [3]. These models have used data relating to the patient episodes and nursing diagnoses or data collected from the hospital charge system or the pharmacy and the laboratory information system [7]. On the other hand, combining the hospital related data with the available demographic information in building such predictive models would be helpful in better targeting the governmental and public support to the needy segments of the population. Identifying the profiles of the population and classifying them based on demographic characteristics of each profile could make the targeting of the health services more effective. Needless to say, the hospital and health related variables play as important a role in the profiling exercise as the demographic characteristics.

2. Neural Prediction Models

The neural prediction models employ a back propagation neural network to predict values. The prediction is based on the relationship between the prediction value and the attributes, as discovered by mining a set of training data. The training data contains the independent (explanatory) as well as dependent variables. The relationship developed using the training set of data is tested against another set of data called validation data. The separation of training data from validation data is to ensure that the predicting ability can be validated appropriately and without bias. Usually, the data set is divided randomly into a training set and a validation set. In addition to developing the relationship between the dependent and independent variables through back propagation, the neural prediction models create the profiles of different segments based on similarities in the explanatory variables.

3. Data

The data used in this study were obtained from the National Sample Survey Organization (NSSO). Data are collected by NSSO on a regular basis in India on various aspects as part of the National Sample Survey. This particular data set contains data collected directly from the households covering the entire country. Various aspects covered in the survey include demographic characteristics, health related factors as well as hospital related variables. The demographic characteristics include items such as age, gender,
social group (caste), income, education, marital status, occupation, size of household, number of family nuclei, rural or urban sector etc. The health related factors covered were source of drinking water, agency providing the water for drinking, application of items such as iodine, insecticide, vaccination, environmental factors such as type of drainage, type of latrine, habit of keeping animals inside or near residential area, type of dwelling, consumption of items such as alcohol, ganja, charas, opium, tobacco etc. The hospital related items include duration of stay in the hospital, hospital expenses, type of hospital, type of ward, availing of services such as x-ray, medicines, surgery, services other than drugs, etc. This data is generally used for policy formulation by agencies like the Planning Commission, Ministry of Health etc.

4. Results

As mentioned earlier, the duration of hospital stay and the expenditure are the two variables selected for prediction. Each variable is divided into 8 unequal segments (octiles). The first octile consisted of the top 2 percent of the data, representing the 98-100 percentile. Similarly, the last octile consisted of the bottom 2 percent, representing the 0-2 percentile. The second and the seventh segments each consisted of 8 percent of the data, covering the 90-98 percentile and the 2-10 percentile respectively. The third and the sixth segments each consisted of 15 percent of the data, covering the 75-90 percentile and the 10-25 percentile respectively. The remaining two segments consisted of 25 percent of the data each.

Neural prediction functions were fitted with the two dependent variables using the other variables as the explanatory variables. IBM’s Intelligent Miner was used for fitting the neural prediction functions. Two-thirds of the data were used for training and the remaining one-third was used for verification and validation. The prediction abilities of each of the neural prediction functions were measured by the normalized error. The duration of stay in the hospital was used as one of the explanatory variables while fitting the function for the hospital expenditure. The normalized error for the prediction for duration was 22.64 percent. In other words, the model was able to predict the appropriate octile group with an accuracy level of about 77 percent. Similarly, the normalized error for the expenditure was 10.32 percent, indicating an accuracy level of about 90 percent.

While fitting the mathematical function, the neural prediction models also create the profiles for each of the segments. The profile of the top segment, consisting of the 98-100 percentile with respect to the duration of stay in the hospital, is presented in Figure 1. A similar profile with respect to the hospital expenditure is presented in Figure 2.

Figure 1. Profile of the 98-100 Percentile Segment – Duration of Stay

Figure 2. Profile of the 98-100 Percentile Segment – Hospital Related Expenses

The factors that help in differentiating the profiles of each of the segments of the duration of stay in the hospital are identified and presented in Table 1. Most of the variables show the impact on the duration as expected. For example, the first segment (98-100 percentile) has an average stay of 36 days. This segment is characterized by male patients (63 percent) whereas the last segment (0-2 percentile) is characterized by female patients (58 percent). Similarly, 98 percent of the patients in the first segment live in independent dwelling units whereas only 38 percent of the last segment live in independent houses. Only those variables which differentiate between various segments are presented in the table. Some of the important [...]

[Figure: octile segments plotted against the characteristics Not Married, Other Caste, Single Family Nuclei and Independent]

Interestingly, tap water appears to lead to longer durations as compared to tube well water. This could imply that the tap water needs to be treated properly. At the same time, government as the water supply agency appears to be better than others. Figures 5 and 6 present the trends with respect to selected health related aspects.
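The unequal octile segmentation and the two-thirds/one-third random split described in Section 4 can be illustrated with a minimal Python sketch. The percentile boundaries are those stated in the paper; the rank-based rule for assigning a value to a segment, the function names and the sample data are assumptions made for illustration, and the sketch does not reproduce what IBM's Intelligent Miner does internally.

```python
import random

# Upper percentile bound and label of each segment, bottom to top.
# Boundaries follow Section 4; the rank-based assignment rule is an assumption.
SEGMENTS = [
    (2, "0-2"), (10, "2-10"), (25, "10-25"), (50, "25-50"),
    (75, "50-75"), (90, "75-90"), (98, "90-98"), (100, "98-100"),
]


def assign_octiles(values):
    """Return one segment label per value, based on its percentile rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # indices, smallest first
    labels = [None] * n
    for rank, idx in enumerate(order):
        pct = 100.0 * (rank + 1) / n  # percentile rank in (0, 100]
        for upper, label in SEGMENTS:
            if pct <= upper:
                labels[idx] = label
                break
    return labels


def split_train_validation(records, train_fraction=2 / 3, seed=0):
    """Randomly split records into a training set and a validation set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical lengths of stay (in days) for 100 patients: 1, 2, ..., 100.
stays = list(range(1, 101))
labels = assign_octiles(stays)
train, validation = split_train_validation(stays)
```

With 100 distinct values, the two extreme segments each receive 2 records and the two middle segments 25 each, which matches the segment sizes given in the text.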
Figure 5. Trends of length of stay with respect to selected health related aspects
[Figure: octile segments plotted against the characteristics No Animals, No Alcohol, Tap Water and No Smoking]

Figure 6. Trends of expenditure with respect to selected health related aspects
[Figure: octile segments plotted against the characteristics Government Water Supply, Tap Water and No Animals]

4.3. Hospital Related Items

[...] expected relationships with respect to the hospital related items. As a matter of fact, there are no surprises with respect to the hospital related items. Since these as well as many of the health related aspects appear to impact the duration along the expected lines, it could be justifiably concluded that the impacts of the demographic characteristics as indicated are also reliable.

The length of stay in the hospital had a direct impact on the expenses, for obvious reasons. Similarly, availing free medicines, free surgery and other services other than drugs is highly (negatively) correlated with the hospital expenditure. Needless to say, this relationship is as expected. The patients who incur higher expenses are mainly in the private hospitals or in special wards, whereas those who stayed in the public hospitals and free or general wards incurred lower expenses. In other words, these patients were fully utilizing the support provided by the government. A quick analysis of the impact of demographic characteristics on the duration of stay indicates that providing primary education could have a favorable impact on the duration of stay in the hospital. Similarly, proper housing and proper education with respect to keeping animals in the residential areas could reduce the stay in the hospital. The source and agency for water supply and other related health aspects could help reduce the demand on hospital services in the long term. At the same time, there is a need to target the rural population more effectively as these patients appear to stay longer in the hospital. Figures 7 and 8 present the trends with respect to selected hospital related items.
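The direction of the relationship between ward or hospital type and expenditure can be checked against the segment-level figures reported in Table 2. The sketch below computes a plain Pearson correlation across the eight octile segments. It is only an illustration on eight aggregated points, not the patient-level analysis performed in the study, and the helper name `pearson` is an assumption.

```python
import math

# Values copied from Table 2, ordered from the 98-100 segment down to the 0-2 segment.
free_ward_pct = [1, 7, 14, 28, 47, 90, 97, 98]         # percent of patients in free wards
private_hospital_pct = [92, 78, 61, 48, 37, 7, 4, 3]   # percent of patients in private hospitals
average_exp = [22728, 12001, 6814, 3622, 1809, 906, 492, 207]  # average expenditure, Rs.


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


r_free = pearson(free_ward_pct, average_exp)
r_private = pearson(private_hospital_pct, average_exp)
print(f"free ward vs expenditure:        {r_free:+.2f}")
print(f"private hospital vs expenditure: {r_private:+.2f}")
```

The sign of each coefficient is the point of interest here: the free ward share moves against average expenditure, while the private hospital share moves with it.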
Figure 8. Trends of expenditure with respect to selected hospital related aspects
[Figure: octile segments plotted against the characteristics Public Hospital, Free Ward, 0-5 Days Duration and Medicines Fully Paid]

As mentioned earlier, the profiles of the patients with respect to the hospital expenses are presented in Table 2. The predictive ability of the neural prediction model with respect to the hospital expenditure is much better, with an error rate of only about 10 percent. The average expenditure incurred by the top (98-100 percentile) segment was Rs. 22,728 whereas the expenditure incurred by the bottom 2 percentile was only Rs. 207. As expected, patients in the older age group (more than 40 years) incur higher expenditure as compared to the patients in the younger age group. Similarly, those patients who incur higher hospital related expenses are predominantly from rural areas and involved in agriculture related activity. Needless to say, the income levels have a direct relationship with the hospital expenses. Those with primary or above primary education are concentrated in the groups with lower expenses. Patients incurring higher expenses predominantly belong to social groups other than scheduled castes and scheduled tribes, with few having single-family nuclei. Interestingly, patients who are married appear to be concentrated among the groups incurring higher expenditure. The most important among the health related aspects appear to be the source of water, the agency supplying water, and the habit of keeping animals within the residential premises.

5. Conclusions

Neural prediction functions were fitted to predict the length of stay in the hospital as well as the hospital related expenses. The explanatory variables used are the demographic characteristics such as age, gender, educational background etc., health related aspects such as drinking water source, agency for drinking water supply, consumption of charas, ganja, tobacco etc., and various hospital related items. The normalized error with respect to the length of stay in the hospital was 22.64 percent whereas that of the hospital related expenses was only 10.32 percent. In other words, the predicting ability of the neural prediction models was much higher with respect to the hospital related expenses as compared to the length of stay.

The profiles of each of the octiles indicate a very interesting impact of the demographic and health related aspects on the length of stay as well as the hospital related expenses. A large number of those with short stays in the hospital, as well as those whose hospital related expenses are low, belong to single-family nuclei. At the same time, the group with long length of stay in the hospital is dominated by unmarried persons whereas the one incurring high expenses is dominated by married persons. The source of water as well as the water supply agency has an important bearing on the length of stay as well as the hospital related expenses. The habit of keeping animals within the residential area increases the chances of a longer stay in the hospital as well as higher hospital related expenses.

Social groups belonging to scheduled castes and scheduled tribes dominate the groups with shorter stay in the hospital as well as lower level of expenses. This could be the result of their economic condition. The type of occupation does show a trend with respect to both the dependent variables, but none of the occupations dominates any of the groups. Thus, occupation seems to have only a limited impact on the two dependent variables.

Consumption of various harmful items such as tobacco, charas, ganja, opium, alcohol etc. appears to impact the duration of stay along the expected lines. Similarly, the impact of all the hospital related items on the length of stay as well as the expenses is along the expected lines. These impact directions indirectly validate the relationships between the two dependent variables and the demographic, health related and hospital related variables.

6. Policy implications

Based on the profiles of patients in each of the segments, it is possible to identify directions for long-term policy. Providing drinking water from appropriate source and by the appropriate agency
could reduce the duration as well as the hospital related expenses. Similarly, investment in primary education and creating awareness of better management at residences will go a long way in reducing the hospital related expenses. It is necessary to target the free supply of medicines and other health services at the people living in the rural sector. Interestingly, promoting smaller nuclear families will have a positive impact on the two indicators, namely duration of stay and hospital related expenses. At the same time it is also necessary to target the social groups such as scheduled castes and scheduled tribes for providing free medicines and other related services. The average expenditure in the top 98-100 percentile segment was Rs. 22,728 whereas the average expenditure in the next two segments was Rs. 12,001 and Rs. 6,814 respectively. If some of the policy initiatives mentioned above could shift the patients by two or more segments from the top 98-100 percentile, the total benefit accruing to the society will be immense.

[...] Systems: Case Study of a Veterans’ Administration Spinal Cord Injury Population”, Proceedings of the 36th Hawaii International Conference on System Sciences, 2002.

4. Nagadevara, V. “Composite Quality Index of Silk Cocoons - An Application of Discriminant Analysis”, paper accepted for publication in the Journal of Academy of Business and Economics.

5. Nagadevara, V. and Nayana Tara, “The Influence of the Demographic Characteristics on Adult Education – A Market Segmentation Approach”, Proceedings of the 6th International Conference on Global Business and Economic Development, States and Markets: Forging Partnerships for Sustainable Development, November 7-10, 2001, Bratislava, Slovakia.

6. Nagadevara, V. and Nayana Tara, “Improving the Effectiveness of Post Literacy Programme through Data Mining Techniques”, Towards E-Government Management Challenges, Ed. M. P. Gupta, Tata McGraw-Hill Publishing Company Ltd, 2004, pp 369-378.
Table 1. Segment-wise Profile of Patients Based on the Duration of Stay in the Hospital

                                                  Octile Segment (Percentile); values in percent of patients unless noted
Characteristic            Attribute                 98-100   90-98   75-90   50-75   25-50   10-25    2-10     0-2

Demographic Characteristics
Gender                    Male                          63      62      61      56      52      47      44      42
Education                 Illiterate                    12      21      27      30      42      56      73      88
                          Primary                       42      31      24      18      11       5       2       0
Family Nuclei             One                           56      61      62      66      68      73      75      87
Dwelling                  Independent House             98      95      91      85      75      66      55      38
Marital Status            Not Married                   61      54      43      44      30      18      11       3
Age                       <=20                          12      13      14      19      28      32      39      40
                          20-40                         27      27      30      30      32      33      36      45
                          40-60                         37      36      36      32      26      24      18      12
                          >60                           24      22      17      15      12      10       6       2
Current Income            <=500                         53      61      73      77      82      83      85      92
                          >500                          47      39      27      33      18      17      15       8
Occupation                Casual Lab-Agriculture        17      20      22      22      22      21      15      10
                          Student                       38      28      20      13       7       3       2       0
                          Domestic Work                  8      12      15      23      31      36      36      26
                          Others                        28      30      30      25      20      14       0       0
Sector                    Urban                         50      46      51      49      48      43      35      39
Social Group              Others                        87      83      82      78      74      65      55      39

Health Related Aspects
Drainage                  No                            25      35      37      42      46      53      63      74
                          Open Pucca                    43      34      30      23      17      11       6       3
Insecticide               No                            73      77      78      79      81      83      87      88
Keep Animals              No                            43      47      55      59      61      62      65      71
Latrine                   None                          25      30      37      40      49      61      73      87
                          Others                        28      25      17      14       8       4       2       0
Water Agency              Government                    67      62      63      63      62      64      64      71
                          Others                        31      36      33      32      31      28      28      22
Water Source              Tap                           59      51      53      50      48      40      35      28
                          Tube well                     27      27      27      27      28      33      36      42
Alcohol                   No                            57      64      72      79      87      92      93      95
Charas                    No                            62      69      77      84      91      96      98      99
Ganja                     No                            61      68      77      84      91      96      98      99
Opium                     No                            62      69      77      84      91      96      98      99
Tobacco                   No                            52      55      65      73      80      82      84      87
Smoking                   No                            47      54      62      68      75      80      80      78

Hospital Related Items
Type of Hospital          Public                        73      64      54      47      42      39      34      26
                          Private                       11      18      24      32      37      40      44      51
Type of Ward              Free                          71      62      53      49      46      45      42      37
                          General                       21      25      33      38      42      45      49      56
Medicines                 Subsidized                    55      41      25      15      11       6       4       2
                          Fully Paid                    37      50      63      68      70      72      72      75
Other Than Drugs          Not Availed                   63      65      68      68      69      72      73      80
Vaccination               Yes                           76      81      84      87      88      88      89      93
High Spender              Low                           77      72      68      64      59      54      50      46
Average Duration          Days                          36      24      17      13       8       6       4       3
Normalized Error                                      0.14    0.18    0.19    0.20    0.24    0.27    0.30    0.31
Overall Normalized Error  0.2264
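The overall normalized error reported in Table 1 is numerically consistent with an average of the segment-wise errors weighted by the share of data in each octile (2, 8, 15, 25, 25, 15, 8 and 2 percent). The short sketch below checks this. It is an observation about the published numbers under that weighting assumption, not a statement of how Intelligent Miner computes the figure, and `weighted_mean` is a hypothetical helper.

```python
# Share of the data in each octile segment, from 98-100 down to 0-2 (Section 4).
segment_share = [0.02, 0.08, 0.15, 0.25, 0.25, 0.15, 0.08, 0.02]

# "Normalized Error" row of Table 1 (duration of stay), in the same segment order.
duration_error = [0.14, 0.18, 0.19, 0.20, 0.24, 0.27, 0.30, 0.31]


def weighted_mean(values, weights):
    """Average of values weighted by the given weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


overall_error = weighted_mean(duration_error, segment_share)
print(f"{overall_error:.4f}")  # 0.2264, the overall figure reported in Table 1
```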
Table 2 (continued)

                                                  Octile Segment (Percentile); values in percent of patients unless noted
Characteristic            Attribute                 98-100   90-98   75-90   50-75   25-50   10-25    2-10     0-2

Surgery                   Not Availed                   16      34      52      65      79      83      82      89
                          Fully Paid                    82      58      26      11       2       0       0       0
Type of Hospital          Public                         7      15      21      34      50      87      88      87
                          Private                       92      78      61      48      37       7       4       3
Type of Ward              Free                           1       7      14      28      47      90      97      98
                          General                       34      58      53      45      39       9       2       0
                          Special                       63      30      17      11       6       0       0       0
Average Exp (Rs.)                                    22728   12001    6814    3622    1809     906     492     207
Normalized Error                                      0.11    0.12    0.13    0.13    0.10    0.06    0.05    0.07
Overall Normalized Error  0.1032
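The potential benefit of shifting patients out of the top expenditure segment, discussed under the policy implications, can be worked through with the "Average Exp" row of Table 2. The sketch below assumes that a shifted patient incurs the average expenditure of the destination segment. That simplification and the helper name `saving_from_shift` are assumptions made for illustration only.

```python
# Segments ordered from highest to lowest expenditure, with the
# "Average Exp" row of Table 2 (Rs.).
ORDER = ["98-100", "90-98", "75-90", "50-75", "25-50", "10-25", "2-10", "0-2"]
AVERAGE_EXP = dict(zip(ORDER, [22728, 12001, 6814, 3622, 1809, 906, 492, 207]))


def saving_from_shift(segment, steps):
    """Per-patient saving (Rs.) if a patient moves `steps` segments down."""
    target = ORDER[ORDER.index(segment) + steps]
    return AVERAGE_EXP[segment] - AVERAGE_EXP[target]


one_step = saving_from_shift("98-100", 1)   # 22728 - 12001
two_steps = saving_from_shift("98-100", 2)  # 22728 - 6814
print(one_step, two_steps)
```

Under this assumption, a shift of two segments from the top octile corresponds to a saving of Rs. 15,914 per patient, roughly 70 percent of the top segment's average expenditure.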