
Expert Systems with Applications 38 (2011) 11078–11084

Contents lists available at ScienceDirect

Expert Systems with Applications


journal homepage: www.elsevier.com/locate/eswa

Analysis by data mining in the emergency medicine triage database at a Taiwanese regional hospital
W.T. Lin a, Y.C. Wu b,*, J.S. Zheng b, M.Y. Chen a

a Department of Industrial Engineering and Management, National Chin-Yi University of Technology, Taiwan
b Department of Industrial Engineering, Chung Yuan Christian University, Taiwan

Article info
Keywords: Cluster analysis; Data mining; Emergency medicine; Rough set; Triage

Abstract
Emergency medicine is the front line of the medical services a hospital provides, and it is the department people turn to immediately after an emergency occurs. Statistics from the Department of Health, Executive Yuan, indicate that the number of emergency department visits has been increasing over the years. The US introduced the triage system into emergency medicine in 1960 to help emergency departments allocate patients, so that nurses and doctors can quickly judge each patient's severity and provide appropriate care. This study mines the knowledge contained in the large volume of data of unknown characteristics in the triage database of a Taiwanese regional hospital. It uses cluster analysis and rough set theory as data mining tools, together with the analysis software ROSE2 (Rough Sets Data Explorer) and a rule induction technique, to extract rules from the imprecise, uncertain and vague information in the database. The resulting model simplifies massive data while maintaining classification accuracy. After analyzing and evaluating the knowledge mined from the hospital's past records of emergency medical resource consumption, the study offers suggestions that the hospital can use to further improve medical quality and reduce operating costs.
© 2011 Elsevier Ltd. All rights reserved.

1. Motivation and objectives of the research

1.1. Background and motivation of the research
The emergency department is the front line of a hospital facing urgent patients. It is staffed by doctors, nurses, technicians, social workers, emergency medical technicians, administrative personnel, other employees and volunteers. It operates 24 hours a day and can handle anything from first aid to observation and surgical operations, functioning like "a hospital within a hospital" (Shi, 2008). According to 2007 statistics from the Department of Health, Executive Yuan, shown in Fig. 1, the daily emergency medical services provided by all hospitals in Taiwan grew significantly, from 14,405 person-visits in 1997 to 18,392 person-visits in 2007. Statistics from the US Centers for Disease Control and Prevention likewise showed emergency visits increasing from 94.9 million in 1997 to 175 million in 2001 (McCaig & Burt, 2003). Both figures point to a worldwide trend of continuously increasing emergency department visits, which keeps the emergency environment as hectic as a battlefield.
* Corresponding author. Tel./fax: +886 4 23723808.
E-mail addresses: jason_wu1102@yahoo.com.tw, jason_wu@ms2.url.com.tw (Y.C. Wu).
0957-4174/$ - see front matter © 2011 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2011.02.152

The emergency triage system was established so that truly urgent patients are not delayed among the many visitors to the emergency room. The US introduced triage into emergency medicine in 1960 (Weiner & Edwards, 1964). The US Emergency Nurses Association later published the Standards of Emergency Nursing Practice, which specifically require emergency nurses to triage every patient arriving in the emergency room. The assessment covers both physiological and psychological perspectives and determines the priority of care among patients (Gilboy, Travers, & Wuerz, 1999). Triage is the screening station set up in emergency medicine, and its chief purpose is to place the right person at the right time in the right place to use the right resources (Chan, 2006). This study investigates the current condition of emergency patients. It uses data mining techniques to extract trends and reference data from the implicit and latent data on the hospital's emergency patients, and it analyzes the correlations among triage, patient structure and the consumption of medical resources. It then evaluates the mined results to offer improvement suggestions that the hospital can use to further raise medical quality and reduce operating costs. The results may also serve as a reference for government health agencies when they deliberate on human-resource training and allocation in hospital emergency medicine units, review medical expenses and revise health insurance policies. In addition, the medical patterns and trends obtained by data mining can be stored in the existing medical knowledge base to support information and knowledge management, which is very useful to medical institutions.

Fig. 1. Averaged daily emergency medical services provided by hospitals in Taiwan.

1.2. Research objectives
Taking a regional hospital as an example, this study explores patients' use of resources by analyzing the triage data of emergency patients. It also discovers diagnostic knowledge by applying rough set theory (RST), a knowledge discovery theory from the field of data mining. RST is used to mine the historical triage data of a Taiwanese regional hospital, uncovering the implicit knowledge in the database. The result is a model that simplifies massive data while maintaining classification accuracy, and it serves as a tool for analyzing the original anamnesis data, which are massive, vague and full of uncertainty. The study has the following objectives:
1. To use cluster analysis to group the routine triage and modified triage cases in the triage database, reducing noise in classification, and then to find, by classification, the classification model for routine and modified triage.
2. To analyze the data and employ rough set theory to uncover the implicit knowledge in the database, building a model that simplifies massive data while maintaining classification accuracy.
3. To analyze the triage database, identify the key attributes of triage and summarize the important decision rules.

2. Literature review
This chapter has two parts. The first defines emergency medicine and reviews research on triage conducted in Taiwan from 2000 to 2008, in order to clarify the definition of triage. Many studies, both domestic and abroad, also reveal the problems emergency triage currently faces. The second part describes the development of data mining: its rise, definition, techniques and functions, and its application and research in the medical industry. Understanding these techniques in depth prepares the ground for using data mining as the research tool here.

2.1. Definition of emergency medicine
Because emergency medicine offers the convenience of 24/7 service, people are able to make full use of its

resources. From the perspective of medical management, however, the function of emergency medicine in a hospital differs greatly from what people imagine. In particular, the treatment of complex conditions and the urgency of care differ significantly from the general in-patient treatment process (Huang, 1993). The summary above gives a rough picture of how researchers in Taiwan and abroad define and view emergency medicine. The most important point they share is that emergency medicine broadly refers to the handling of various urgent conditions that threaten life and health. This definition covers the general explanation most scholars give. However, in this era of rising consumer awareness, people strongly demand personal quality of life and physical health. As a result, patients seek emergency care merely because they feel under the weather or have slight pain, congesting the emergency department and increasing the workload of its medical staff. Triage arose to solve this problem and to help medical staff work more efficiently. To give a deeper understanding of triage, its purpose and methods are explained immediately below.

2.2. Data mining
Data mining is a technique that has emerged in recent years alongside developments in artificial intelligence and database technology. It focuses on re-analyzing databases, for example by building models and determining data patterns. Its main aim is to discover valuable information that matters to the database owner but is not yet known to them (Hand, Blunt, Kelly, & Adams, 2000).
Data mining is the process by which computers automatically select important and potentially useful data patterns or knowledge from massive data or large databases. It uses classification, association, sequential analysis, cluster analysis and other statistical methods to find implicit, previously unknown, yet very useful information for business operations in enormous databases. Most enterprises hold millions or tens of millions of historical records, which are difficult to analyze, but data mining tools make it possible to extract useful information from them. Data mining is sometimes called knowledge discovery (KD). Strictly, however, knowledge discovery is the non-trivial process of identifying valid and potentially useful patterns in data, and, as Fig. 2 shows, data mining is one of the important steps within it. The scholars' definitions thus make clear that data mining is one analysis step in the larger knowledge discovery process, although over time the term data mining has gradually replaced knowledge discovery. In summary, the ultimate purpose of data mining is to uncover, from massive data, rules that support decision making.

Fig. 2. Flow chart of knowledge discovery in databases. Data source: organized by this study.

2.2.1. Theoretical techniques of data mining
Data mining tools generally serve two main functions. The first is predicting future trends from built models to give decision makers reliable information. For example, a classification model can identify the financial clients most likely to incur bad debts, so that excessive credit lines can be avoided. The second is revealing unknown patterns hidden in a database. For example, mining customers' online shopping histories can reveal which combinations of merchandise they are likely to purchase, so that decision makers can direct marketing only to certain customers instead of spending heavily on printing and mailing that draws little response.

The theoretical techniques of data mining can be divided into conventional techniques and improved techniques. Conventional techniques are represented by statistical analysis, including descriptive statistics, probability theory, regression analysis and categorical data analysis. Multivariate analysis, an advanced family of statistical methods, is especially relevant, because the subjects of data mining are mostly large data sets with many variables. Within it, factor analysis is used to summarize variables, discriminant analysis to classify and cluster analysis to separate groups. Improved techniques draw on a wide range of artificial intelligence methods, including the popular decision trees, genetic algorithms, neural networks, rule induction, fuzzy logic and rough set theory. The commonly used data mining techniques are briefly presented as follows.
1. Regression analysis: an analytic method used in many statistical tools, especially for economic and business decisions. Its purpose is to model the effects of multiple independent variables on a dependent variable. Its use requires assuming that the populations are mutually independent, that each follows a normal distribution, and that samples are drawn randomly from the population (Chen, 2000).
2. Discriminant analysis: a suitable technique when the dependent variable of a problem is qualitative and the independent variables (predictors) are quantitative. It is generally applied to categorical problems in which the dependent variable has two groups, called two-group discriminant analysis. When there are more than two groups, it is called multiple discriminant analysis (Huang, 2003).
3. Cluster analysis: cluster analysis can first roughly classify data that are very complex and jumbled, or that have too many variables or dimensions. Unlike discriminant analysis, it does not use a classification variable to divide the data. Cluster analysis is rarely used alone, because finding the groups is not the goal in itself. Once the groups are detected, other methods are needed to understand the meaning of the clustering (Huang, 2003).
4. Market basket analysis: also called association rule analysis. It resembles cluster analysis in that both group items together. It differs in that market basket analysis looks for probable combinations of merchandise, which inform purchasing sequences, product display, product bundling and promotion. Its appeal is that it uses association rules to explain the correlation between physical items of merchandise and why they are bought together (Agrawal & Srikant, 1994).
5. Neural networks: an information processing method modeled on biological neural networks. It uses a large number of simple, interconnected artificial neurons to simulate the capabilities of a biological neural network. Neural networks offer memory, learning, noise filtering and error correction as well as high-speed computation, so they can solve many complex problems such as classification and prediction (Yeh, 1999).
6. Decision trees: a method of building classification models. Using induction, it builds a tree-structured model from the given data and performs predictive analysis on discrete or continuous data. Each node of the tree is a test that determines whether a record has a certain attribute value. Every node therefore splits the input data into several categories, forming a tree. Examples are CART (Classification and Regression Trees) and CHAID (Chi-Square Automatic Interaction Detector) (Quinlan, 1993).
7. Genetic algorithms: a search method over the solution space, well suited to optimization problems. It uses natural-selection operators (selection, reproduction, crossover and mutation) and a genetic evolution mechanism to create new individuals. Starting from an initial model, it repeats procedures similar to reproduction across generations until the objective function converges to an optimal solution (Holland, 1975).

2.2.2. Medical related research using data mining techniques
At present, research on medical issues using data mining is relatively prevalent in Taiwan.
The most common studies use today's massive medical and patient data to investigate the causes of particular diseases. Others use classification as a data mining technique to induce patterns of resource consumption from historical data in medical databases, specifically the cataloging of medical fees. Data mining has also been used to reduce patient complaints arising from improper treatment or inefficiency, thereby improving medical quality and reducing the waste of medical resources. In the related literature, Shi (2008) improved emergency triage and physician shift scheduling through data mining analysis. To analyze triage accuracy, he used cluster analysis to group triage modification cases of similar nature, which reduced noise in classification, and then determined a classification model for triage modification levels. Lai (2007) applied three data mining techniques to increase the consistency of triage classification in emergency medicine, and the results indicated that a back-propagation neural network performed best in predicting triage classification. Chen (2008) constructed a management and planning system of triage knowledge, taking a medical center in Taiwan as an example. After defining the key factors that affect triage, he employed principal component analysis, ontology and decision trees to uncover the implicit knowledge in the database.

2.3. Cluster analysis
Many unsupervised clustering methods have been developed, e.g., the K-Means algorithm from conventional multivariate statistics, agglomerative clustering, and the


Fuzzy C-Means (FCM) algorithm, which introduces fuzzy theory into K-Means. The Self-Organizing Map (SOM) from neural networks, Fuzzy Adaptive Resonance Theory (Fuzzy ART) and similar methods also work well for clustering. Nonetheless, several published studies have pointed out that hybrid clustering methods achieve better grouping results (Lin & Huang, 1999). Examples include a framework combining supervised and unsupervised learning, or a combination of two unsupervised learning methods. The Self-Organizing Map, proposed by Kohonen as an unsupervised learning algorithm (Kohonen, 1989, 1997; Kohonen, Raivio, Simula, Venta, & Henriksson, 1990), is a network model based on competitive learning. In the realm of neural networks, SOM is an outstanding data mining tool. It projects high-dimensional input data onto a lower-dimensional topological grid, allowing people to visually inspect the clustering properties of the data. The analysis can also be made more precise by adding topological nodes as the volume of data grows. Kuo, Ho, and Hu (2000) proposed an improved two-stage clustering method that evaluates the conventional two-stage method. It uses SOM to determine initial solutions, which are then fed to K-Means to find the best solution. In the first stage of their method, the unsupervised SOM network finds the initial solution. In the second stage, this initial solution is passed to K-Means, which analyzes the nodes on the map using different distance measures. Their experimental results indicate that this method outperforms conventional direct clustering in both efficiency and speed.

2.4. Rough set theory
Rough set theory (RST), a mathematical method proposed by Pawlak of Poland in 1982, is used to analyze imprecise, vague and uncertain data. All of its information comes from the data themselves, and it requires no model assumptions, so an RST analysis need not satisfy any prior hypothesis (Pawlak, 1991). It can also uncover the information and knowledge behind data, and it therefore often works well even when the data are vague or uncertain (Dimitras, Slowinski, Susmaga, & Zopounidis, 1999). This study uses rough set theory as its data mining tool because of the following features and advantages:
1. It can analyze massive data.
2. It treats the data in every field as symbols. This avoids a problem of conventional statistical analyses, where different magnitudes of data values lead to different analytic results.
3. It can divide the influencing factors into core and non-core factors, which conventional statistical analyses cannot.
4. Compared with other analytical techniques, it can achieve better predictive accuracy on data with fewer attributes.

3. Methods
This study examines the operating mode, the allocation of emergency medical staff, and the process of patient visits and triage at the case hospital. The medical operation process at the case hospital shows that the triage database can supply the variables for our data mining.

Once the data have been preprocessed, the study can proceed to its focus: data mining and analysis. First, rules and models of interest are found with various mining tools and their algorithms. The rules are then translated into concrete managerial decisions and recommendations to achieve the final objective of the research. Methodologically, this study uses two-stage cluster analysis and rough set theory. Fig. 3 shows how the databases correspond to the methods and data mining algorithms used in the analysis step, and Table 1 lists the database fields selected for data mining. After the data sources are classified, the data to be mined are prepared. Preparation involves the selection, cleaning, construction, integration and formatting of data.
1. Data selection: attributes are kept or discarded according to their relevance to the aim of data mining, which serves as the criterion for data selection and analysis. In this study, gender, age, overstay length, insurance status, arrival, expenses, consultation, period, subject, admission and triage were selected from the fields summarized in Table 1 as the patients' basic data.
2. Data cleaning: the second stage of data processing, which puts the data into the format required by the analytic tool. The data come from three databases that must each be sorted and then merged, so Microsoft Excel was used to merge the data and to screen, delete and modify mutually corresponding data types.
3. Data construction: the data in the databases are converted into types that the analytic tools can process. For example, gender is coded as 1 for male and 2 for female.

Fig. 3. Diagram of data mining. Data source: organized by this study.

Table 1
Synopsis of selected database fields for data mining.

Database                             Selected fields
Database of registration query       Anamnesis number, triage, registration date, registration time, discharge date, discharge time, fees, health insurance
Database of physician order screen   Anamnesis number, age, triage, registration date, registration time, overstay date, overstay time
Database of triage screen            Gender, triage, chief complaint, past disease records, vital signs, physician chief diagnosis (incl. codes), subject


4. Data integration: information from multiple forms and data sources is combined to generate new, complete records and converted variables. The databases are also merged in preparation for the subsequent data mining steps.
5. Data formatting: part of the data is reformatted for use by the analytic tools without changing the meaning of the original data. In this study, every data attribute was rearranged so that the data mining model could be constructed easily.

4. Empirical study and data analysis

4.1. Data extraction and sorting
In this study, the patient consultation attributes in the emergency registration database of a hospital in central Taiwan were first screened and rearranged in Excel. These attributes included visit date, consultation, period, age, gender and similar fields. Two-stage clustering was then applied to find the clustering patterns of patient consultations, and analysis of the results yielded findings on the emergency patient database and triage. The emergency department provided 22,990 patient records. By triage level, level 1 accounts for 5.94% of patients, level 2 for 31.94%, level 3 for 61.62% and level 4 for 0.5%. The basic statistics thus show that level 2 and level 3 patients predominate: triage most often falls on level 3, followed by levels 2, 1 and 4. Some patient attributes are numeric and some are text. Text attributes must be converted into numeric codes, while numeric attributes must be segmented to facilitate analysis with the subsequent software. Converting text to numeric type is the easier task because it only requires assigning a value to each text category.
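The integration step (item 4) and the text-to-code conversion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Excel procedure the study actually used. It assumes all three source tables can be joined on the anamnesis number. Table 1 lists that field only for the registration and physician order databases, so using it as the key for the triage screen database is our assumption. The field names and rows are hypothetical, and only the gender coding (1 = male, 2 = female) is taken from the text.

```python
from typing import Dict, List, Set

Record = Dict[str, object]


def integrate(tables: List[List[Record]], key: str = "anamnesis_no") -> List[Record]:
    """Inner-join several tables on a shared key field (hypothetical key name)."""
    merged: Dict[object, Record] = {}
    seen_in: Dict[object, Set[int]] = {}
    for i, table in enumerate(tables):
        for row in table:
            k = row[key]
            merged.setdefault(k, {}).update(row)  # later tables add or overwrite fields
            seen_in.setdefault(k, set()).add(i)
    # Keep only patients that appear in every source table.
    return [rec for k, rec in merged.items() if len(seen_in[k]) == len(tables)]


def encode(records: List[Record], field: str, codes: Dict[str, int]) -> List[Record]:
    """Replace a text-valued field with its numeric code, in place."""
    for rec in records:
        rec[field] = codes[rec[field]]
    return records


# Hypothetical toy rows standing in for the three databases of Table 1.
registration = [{"anamnesis_no": "A001", "triage": 1, "fees": 4500},
                {"anamnesis_no": "A002", "triage": 2, "fees": 1200}]
physician_orders = [{"anamnesis_no": "A001", "age": 72},
                    {"anamnesis_no": "A002", "age": 15},
                    {"anamnesis_no": "A003", "age": 40}]  # no match elsewhere: dropped
triage_screen = [{"anamnesis_no": "A001", "gender": "male"},
                 {"anamnesis_no": "A002", "gender": "female"}]

patients = integrate([registration, physician_orders, triage_screen])
encode(patients, "gender", {"male": 1, "female": 2})  # coding stated in the text
print(patients)
```

An inner join drops patients missing from any source. The text does not say whether the study kept or discarded such records.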
The attributes were then sorted. Those analyzed were age, gender, patient type, insurance status, period, admission, arrival, subject, triage, overstay length and expenses. Their categories are as follows:
- Gender: male and female.
- Patient type: first visit and revisit.
- Insurance status: with health insurance or without (self-paid).
- Period: AM, PM and night.
- Admission: yes and no.
- Arrival: ambulance, referral, on foot, outpatient, 119 and others.
- Subject: internal medicine, surgery, obstetrics and gynecology, pediatrics, dentistry and psychosomatic medicine.
- Triage: levels 1, 2, 3 and 4.
Segmenting the numeric data is more complex. Arbitrarily chosen segmentation boundaries would introduce uncertainty into the analysis. To avoid this, and to make the segmented data better represent the characteristics of each field in the database, cluster analysis was used to group the numeric data. The numeric fields in the basic data are age, medical expenses and overstay length.

4.2. Analysis by clustering technique
The statistical software SPSS Clementine 10.0 was used to group the 22,990 patient records obtained from the emergency department. Because samples in the same cluster share similar characteristics, the patient data were subjected to two-stage cluster analysis. In stage one, training with a SOM network displayed the data visually as six groups; the color gradients of the map in Fig. 4 clearly divide the patient triage data into six groups. With the number of groups ascertained, the analysis proceeded to the second stage, K-Means cluster analysis.
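The two-stage procedure (SOM to find initial groups, then K-Means seeded with them) can be sketched in plain Python as below. This is a simplified sketch, not the SPSS Clementine 10.0 implementation the study used. The SOM here is a one-dimensional chain of nodes rather than a two-dimensional map. The learning-rate schedule, Gaussian neighborhood, iteration counts and the synthetic, pre-scaled data are all assumptions chosen for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_som(data: List[Vector], n_nodes: int, iters: int = 2000,
              lr0: float = 0.5, seed: int = 0) -> List[List[float]]:
    """Stage 1: a 1-D self-organizing map (a chain of n_nodes prototypes)."""
    rng = random.Random(seed)
    weights = [list(v) for v in rng.sample(data, n_nodes)]
    radius0 = max(n_nodes / 2.0, 1.0)
    for t in range(iters):
        frac = t / iters
        lr = lr0 * (1.0 - frac)                    # linearly decaying learning rate
        radius = max(radius0 * (1.0 - frac), 0.5)  # shrinking neighborhood
        x = rng.choice(data)
        bmu = min(range(n_nodes), key=lambda j: sq_dist(weights[j], x))  # best-matching unit
        for j in range(n_nodes):
            h = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))  # Gaussian neighborhood on the chain
            weights[j] = [w + lr * h * (xi - w) for w, xi in zip(weights[j], x)]
    return weights


def kmeans(data: List[Vector], init: List[List[float]],
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Stage 2: Lloyd's K-Means started from the SOM prototypes."""
    centroids = [list(c) for c in init]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(len(centroids)), key=lambda k: sq_dist(centroids[k], x))
                      for x in data]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [x for x, lab in zip(data, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Synthetic, already-scaled 2-D data (e.g. age and expense) with three groups of 50.
rng = random.Random(42)
centers = [(0.1, 0.2), (0.5, 0.5), (0.9, 0.8)]
data = [tuple(c + rng.gauss(0, 0.03) for c in center) for center in centers for _ in range(50)]

seeds = train_som(data, n_nodes=3)
labels, centroids = kmeans(data, seeds)
print(labels[0], labels[50], labels[100])
```

In the study, stage one produced six groups (Fig. 4), and K-Means then yielded the six clusters of Table 2. Here `n_nodes=3` is used only because the toy data have three groups. Running `kmeans` on a single numeric column in the same way is one possible way to obtain cluster-based segments for age, expense and overstay length.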

In this study, the groups obtained by SOM were used as the initial groups for K-Means, which produced the six clusters shown in Table 2:
- Cluster 1: 7513 records, average medical expense NT$1378.93, average age 17.18, average overstay length 1.48 days.
- Cluster 2: 3205 records, NT$3073.39, 70.98 years, 2.10 days.
- Cluster 3: 2433 records, NT$2239.46, 19.39 years, 1.68 days.
- Cluster 4: 3626 records, NT$1669.23, 37.51 years, 1.86 days.
- Cluster 5: 4336 records, NT$1985.62, 64.04 years, 1.63 days.
- Cluster 6: 1877 records, NT$2761.67, 44.55 years, 2.22 days.
The six clusters together account for all 22,990 records. The study then investigated further the clusters with higher resource consumption, such as patients with excessive overstay length or high medical expenses. To understand the characteristics of emergency patients and reveal the potential heavy consumers of emergency resources, Cluster 2 was subjected to RST analysis to find decision rules. This cluster has the highest average medical expense and the oldest patients.

4.3. Application of rough set theory
In this study, the software ROSE2 (Rough Sets Data Explorer), developed by Wilk at Poznan University of Technology, Poland, was used for the empirical RST analysis of the 3205 records in Cluster 2. The software runs in a Windows environment and performs well in discovering data attributes. Of the 3205 records, 2885 (90%) were randomly chosen for rule induction, and the remaining 320 (10%) were held out for rule verification.
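A fixed random seed makes the 90/10 hold-out split repeatable. The sketch below assumes plain random sampling. The text does not say which tool drew the sample or whether it was stratified by triage level, and the seed value and function name are arbitrary.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(records: Sequence[T], test_fraction: float = 0.1,
                  seed: int = 2011) -> Tuple[List[T], List[T]]:
    """Shuffle once with a fixed seed, then cut into training and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)  # truncate: 3205 * 0.1 -> 320
    return shuffled[n_test:], shuffled[:n_test]


cluster2 = list(range(3205))  # stand-ins for the 3205 Cluster 2 records
train, test = holdout_split(cluster2)
print(len(train), len(test))  # prints: 2885 320
```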
The attributes were sorted. The condition attributes analyzed were gender, age, period, subject, overstay length, insurance status, arrival, expenses, consultation and admission, with triage (D) as the decision attribute. Using RST to single out the key attributes from the 3205 records and rule out unnecessary ones can increase the precision of the analytic results and reduce program run time. After integration, the data thus comprised 10 condition attributes and one decision attribute. ROSE2 can find multiple sets of attributes (reducts). The intersection of all discovered sets gives the core attributes, and the attributes that belong to their union but not to the intersection are the non-core attributes. RST's ability to reduce attributes means that smaller sets of attributes can represent the original ones, and the attributes in the decision table of this study were reduced in this way. To generate the decision rules, the LEM2 (Learning from Examples Module, version 2) algorithm (Pawlak, 1982) was adopted. It combines several induction methods to generate decision rules and describes every rule by equivalence classes. It assumes that the rule set is minimal, that is, no rule can be removed without losing the ability to describe the data completely. The RST analysis induced the smallest rule set over the 11 attributes: a total of 326 rules from the 2885 training records, as shown in Table 3. Of these, 133 rules were generated from patients triaged at level 1 and 196 from those at level 2.
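A toy decision table can illustrate the core and non-core attributes described above. The sketch computes reducts by brute force. It uses the textbook definition of a reduct: a minimal subset of condition attributes that preserves the positive region, meaning the objects whose equivalence class has a single decision value. The core is then the intersection of all reducts. This is not necessarily the exact procedure inside ROSE2, and the attribute names and rows are hypothetical. Exhaustive search is exponential in the number of attributes, but with the study's 10 condition attributes it examines at most 2^10 - 1 = 1023 subsets.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence

Row = Dict[str, int]


def positive_region_size(rows: List[Row], attrs: Sequence[str], decision: str) -> int:
    """Count objects whose equivalence class under `attrs` has a single decision value."""
    decisions: Dict[tuple, set] = {}
    sizes: Dict[tuple, int] = {}
    for row in rows:
        key = tuple(row[a] for a in attrs)
        decisions.setdefault(key, set()).add(row[decision])
        sizes[key] = sizes.get(key, 0) + 1
    return sum(sizes[key] for key, ds in decisions.items() if len(ds) == 1)


def reducts(rows: List[Row], conditions: Sequence[str], decision: str) -> List[FrozenSet[str]]:
    """All minimal attribute subsets that keep the full positive region (brute force)."""
    target = positive_region_size(rows, conditions, decision)
    found: List[FrozenSet[str]] = []
    for size in range(1, len(conditions) + 1):
        for subset in combinations(conditions, size):
            candidate = frozenset(subset)
            if any(r <= candidate for r in found):
                continue  # contains a smaller reduct, so it is not minimal
            if positive_region_size(rows, subset, decision) == target:
                found.append(candidate)
    return found


# Hypothetical coded decision table: four condition attributes, decision = triage.
table = [
    {"arrival": 1, "period": 3, "subject": 1, "gender": 1, "triage": 1},
    {"arrival": 1, "period": 1, "subject": 1, "gender": 2, "triage": 1},
    {"arrival": 2, "period": 3, "subject": 1, "gender": 1, "triage": 2},
    {"arrival": 2, "period": 2, "subject": 2, "gender": 2, "triage": 2},
    {"arrival": 1, "period": 2, "subject": 2, "gender": 1, "triage": 2},
]
conds = ["arrival", "period", "subject", "gender"]

reds = reducts(table, conds, "triage")
core = frozenset.intersection(*reds)
non_core = frozenset.union(*reds) - core
print(sorted(map(sorted, reds)), sorted(core), sorted(non_core))
```

In this toy table the reducts are {arrival, period} and {arrival, subject}, so arrival is the core attribute and period and subject are non-core. Gender appears in no reduct and can be dropped without losing any classification ability.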
Row 1 of Table 3 is Rule 1: if (arrival = in ambulance) & (overstay length = 1–2 days) & (period = night) & (medical expense = NT$4000–5000) & (subject = internal), then the triage is level 1; 90 records meet this rule.
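The "90 records" figure for Rule 1 is presumably the rule's support: the number of training records that satisfy every condition and also carry the rule's decision. A minimal sketch of that count, assuming records are dictionaries keyed by attribute name with the readable values used in Rule 1; the three sample records are hypothetical.

```python
# Rule 1 from the text, written as attribute -> required value plus the predicted level.
RULE_1 = {
    "conditions": {
        "arrival": "ambulance",
        "overstay length": "1-2 days",
        "period": "night",
        "medical expense": "4000-5000",
        "subject": "internal",
    },
    "decision": 1,
}

def matches(rule, record):
    """True if the record satisfies every condition of the rule."""
    return all(record.get(attr) == value for attr, value in rule["conditions"].items())

def support(rule, records, decision_attr="triage"):
    """Number of records that match the rule's conditions and have its decision."""
    return sum(1 for r in records
               if matches(rule, r) and r.get(decision_attr) == rule["decision"])

# Hypothetical training records.
base = dict(RULE_1["conditions"])
records = [
    {**base, "triage": 1},                    # matches and agrees: counted
    {**base, "triage": 2},                    # matches but disagrees: not counted
    {**base, "period": "day", "triage": 1},   # fails the period condition
]
print(support(RULE_1, records))  # 1
```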

W.T. Lin et al. / Expert Systems with Applications 38 (2011) 1107811084


Fig. 4. SOM clustering chart.

Table 2 Table of variable means in SOM + K-Means clustering.

Item        Average expense (NT dollars)   Average age (years)   Average overstay length (days)   Quantity
Cluster 1   1378.93                        17.18                 1.48                             7513
Cluster 2   3073.39                        70.98                 2.10                             3205
Cluster 3   2239.46                        19.39                 1.68                             2433
Cluster 4   1669.23                        37.51                 1.86                             3626
Cluster 5   1985.62                        64.04                 1.63                             4336
Cluster 6   2761.67                        44.55                 2.22                             1877

Note: NT dollar = New Taiwan dollar.
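The two-stage clustering behind Table 2, with SOM output used to initialise K-Means, can be sketched in NumPy. This is an illustrative reconstruction, not the authors' code: the 2 x 3 SOM grid (giving six prototypes for six clusters), the learning-rate and neighbourhood schedules, the standardisation step and the synthetic records are all assumptions.

```python
import numpy as np

def train_som(data, grid=(2, 3), epochs=20, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small rectangular SOM and return one prototype vector per grid node."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = grid
    # Grid coordinates of each node, used by the neighbourhood function.
    coords = np.array([(r, c) for r in range(n_rows) for c in range(n_cols)], dtype=float)
    # Initialise prototypes with randomly chosen records.
    weights = data[rng.choice(len(data), n_rows * n_cols, replace=False)].astype(float)
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3     # shrinking neighbourhood radius
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))   # best-matching unit
            grid_dist2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))         # Gaussian neighbourhood
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def kmeans(data, init_centroids, max_iter=100):
    """Lloyd's K-Means started from the given centroids (here: SOM prototypes)."""
    centroids = init_centroids.copy()
    for _ in range(max_iter):
        # Assign each record to its nearest centroid.
        d2 = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster becomes empty.
        new = np.array([data[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
                        for k in range(len(centroids))])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

rng = np.random.default_rng(1)
n = 600
# Hypothetical records: medical expense (NT$), age (years), overstay length (days).
raw = np.column_stack([
    rng.gamma(2.0, 1000.0, n),
    rng.uniform(0.0, 90.0, n),
    rng.gamma(2.0, 0.9, n),
])
z = (raw - raw.mean(axis=0)) / raw.std(axis=0)   # standardise each attribute
prototypes = train_som(z, grid=(2, 3))
labels, _ = kmeans(z, prototypes)
for k in range(len(prototypes)):
    members = raw[labels == k]
    if len(members):
        print(f"Cluster {k + 1}: n={len(members)}, means={members.mean(axis=0).round(2)}")
```

Standardising first matters in this sketch because expenses are measured in thousands of NT dollars while overstay lengths are a few days; without it, expense alone would dominate the distances.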

Table 3 Table of RST decision rules.

Item   Record quantity   Decision rule
1      90                (arrival = 1) & (overstay length = 2) & (period = 3) & (medical expense = 5) & (subject = 1) → [level 1]
2      79                (arrival = 1) & (period = 3) & (admission = 2) & (medical expense = 4) & (subject = 1) → [level 1]
...    ...               ...
326    1                 (arrival = 3) & (overstay length = 1) & (period = 2) & (medical expense = 4) & (subject = 1) & (gender = 2) → [level 1]

Table 4 RST rule testing.

                     Actual value
Predicted value      1        2        True positive rate
1                    120      11       0.916
2                    8        163      0.953
Record quantity      135      185
Accuracy             0.936    0.937
Coverage             0.948    0.941

Total number of tested objects: 320. Total accuracy: 0.937. Total coverage: 0.944.
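The per-level and total figures in Table 4 follow from its four counts. A minimal sketch, assuming (as the worked examples below the table imply) that Table 4's rows are the levels assigned by the rules and its columns the actual levels, so that 135 - 128 = 7 level-1 and 185 - 174 = 11 level-2 test records were left unclassified. Recomputing level-1 accuracy gives 120/128 = 0.9375 rather than the printed 0.936; the other values agree to three decimals.

```python
# Table 4 counts, indexed confusion[predicted_level][actual_level].
confusion = {1: {1: 120, 2: 11},
             2: {1: 8, 2: 163}}
# Test records actually at each level, including any the rules could not classify.
record_quantity = {1: 135, 2: 185}

def level_metrics(level):
    """Accuracy, coverage and true positive rate for one triage level, as defined in the text."""
    classified = sum(row[level] for row in confusion.values())  # actual-level records given any label
    accuracy = confusion[level][level] / classified
    coverage = classified / record_quantity[level]
    true_positive_rate = confusion[level][level] / sum(confusion[level].values())
    return accuracy, coverage, true_positive_rate

correct = sum(confusion[k][k] for k in confusion)
classified_total = sum(sum(row.values()) for row in confusion.values())
total_accuracy = correct / classified_total
total_coverage = classified_total / sum(record_quantity.values())

for level in (1, 2):
    acc, cov, tpr = level_metrics(level)
    print(f"level {level}: accuracy={acc:.3f} coverage={cov:.3f} TPR={tpr:.3f}")
print(f"total accuracy={total_accuracy:.3f} total coverage={total_coverage:.3f}")
```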

In Table 4, the remaining 320 records of triage data were used to verify the induced rules. The measures are defined as follows:
1. Record quantity: the number of test records that actually belong to a given triage level. For example, 135 of the 320 test records are actually of level 1 triage.
2. Accuracy: the proportion of correctly classified records among the records of a given actual triage level that the rules classified. For example, 0.936 = 120/(120 + 8), where 120 records that are actually of level 1 triage were classified as level 1 by the rules, and 8 records that are actually of level 1 triage were classified as level 2; these eight are induction errors.
3. Coverage: the proportion of the records of a given actual triage level that the rules could classify at all, whether correctly or not. For example, 0.948 = (120 + 8)/135.
4. True positive rate: the proportion of the records assigned to a given triage level by the rules that actually belong to that level. For example, 0.916 = 120/(120 + 11), where 11 records that are actually of level 2 triage were classified as level 1.


5. Total number of tested objects: the number of records tested (320).
6. Total accuracy: the proportion of correctly classified records among all test records that the rules could classify. For example, 0.937 = (120 + 163)/(120 + 8 + 11 + 163).
7. Total coverage: the proportion of test records that the rules could classify, including misclassified ones. It equals 1 when every tested record is classified; it falls below 1 when the rules cannot match a record whose attribute values did not appear in the training stage. For example, 0.944 = (120 + 8 + 11 + 163)/(135 + 185).

5. Conclusions

This study employed cluster analysis as a data mining tool to examine the emergency triage database of a local hospital in Taiwan. The implicit knowledge with unknown features in this large database was analyzed by combining SOM and K-Means cluster analysis. Cluster analysis avoids the uncertainty that arbitrary definitions of classification and clustering criteria introduce into the analysis of numeric data, and it effectively segments data into groups with different characteristics. Rough set theory was then used to treat the uncertain, vague and rough data, with every field regarded as a symbol when read; unlike conventional statistical analysis, RST analysis does not produce different analytical results for different sizes of data. It also classified the affecting attributes into core attributes (period, arrival, gender, age, subject and medical expenses) and non-core attributes. Combining the two techniques for data processing and data mining gave good results: on the 320 test records, the induced rules reached a total accuracy of 0.937 and a total coverage of 0.944.
The two-stage cluster analysis showed that patients with longer emergency overstays, higher medical expenses and older average ages form a high-risk group that consumes resources. These patients share certain similarities: most of them are patients of level 1 and level 2 triage. Medical expenses also increased with triage severity; by subject, patients of internal medicine departments consumed the most; and by arrival, most of these patients arrived by ambulance. Finally, rough set theory was used to find the attributes of the decision rules for comparison with the original data. In disease classification, the triage types with high expenses were concentrated in rare diseases or severe casualties. Therefore, to control the medical expenses of emergency patients, the classification of patient diseases is, besides the overstay length in the emergency department, one of the key factors.

Acknowledgment

The authors would like to thank Mr. Wu Tsung-Ling for his assistance in collecting the material and helpful suggestions on an earlier version of this paper.

References
Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. In Proceedings of the 1994 international conference on very large data bases (pp. 487–499).
Chen, S. Y. (2000). Multivariate analysis. Taiwan: Hwa Tai.
Chan, C.-L. (2006). A analysis of the relationship between the triage of Emergency Department, patient structure and medical resource use. Master dissertation, Yuan Ze University.
Chen, W. Y. (2008). Construction of management and planning system for triage knowledge: An example of a Taiwanese medical center. Master dissertation, National Chin-Yi University of Technology.
Dimitras, A. I., Slowinski, R., Susmaga, R., & Zopounidis, C. (1999). Business failure prediction using rough sets. European Journal of Operational Research, 114(2), 263–280.
Gilboy, N., Travers, D. A., & Wuerz, R. (1999). Re-evaluating triage in the new millennium: A comprehensive look at the need for standardization and quality. Journal of Emergency Nursing, 25(6), 468–473.
Huang, H. N. (1993). A survey research on emergency service and patient satisfaction. Master dissertation, National Taiwan University.
Huang, J. Y. (2003). Marketing (2nd ed.). Taiwan: Book Zone.
Hand, D. J., Blunt, G., Kelly, M. G., & Adams, N. M. (2000). Data mining for fun and profit. Statistical Science, 15(2), 111–131.
Holland, J. H. (1975). Adaptation in natural and artificial systems. Ann Arbor: University of Michigan Press.
Kohonen, T. (1989). Self organization and associative memory (3rd ed.). Berlin: Springer-Verlag.
Kohonen, T. (1997). Self-organized maps. New York: Springer-Verlag.
Kohonen, T., Raivio, K., Simula, O., Venta, O., & Henriksson, J. (1990). Combining linear equalization and self-organizing adaptation in dynamic discrete-signal detection. In Proceedings of the international joint conference on neural networks, San Diego (pp. 223–228).
Kuo, R. J., Ho, L. M., & Hu, C. M. (2000). Integration of self-organizing feature map and K-means algorithm for marketing segmentation. Journal of Computers and Operation Research.
Lai, C. H. (2007). Data Mining applied to the predictive model of triage system in Emergency Department: A case of medical center in Taiwan. Master dissertation, National Chin-Yi University of Technology.
Lin, S. F., & Huang, C. A. (1999). Foundations of Neural Networks. Taiwan: Chuan-Hwa Books.
McCaig, L. F., & Burt, C. W. (2003). National hospital ambulatory medical care survey: 2001 emergency department summary. Online Statistical of Centers for Disease Control and Prevention, available.
Pawlak, Z. (1982). Rough sets. International Journal of Computer and Information Sciences, 11, 341–356.
Pawlak, Z. (1991). Rough sets: Theoretical aspects of reasoning about data. London: Kluwer Academic Publishers.
Quinlan, J. R. (1993). C4.5: Programs for machine learning. Berlin: Springer.
Shi, Y. S. (2008). Using data mining techniques to analyze and improve for emergency triage and operation of doctor schedule. Master dissertation, National Chin-Yi University of Technology.
Weiner, E. R., & Edwards, H. R. (1964). Yales studies in ambulatory medical care changing patterns in hospital emergency services. Hospital, 38(1), 55–62.
Yeh, I. C. (1999). Application of artificial neural network model and implementation. Taiwan: Scholars Books.
