Sei sulla pagina 1di 8

Proceedings of National Conference on Emerging Trends: Innovations and Challenges in IT, 19 -20, April 2013

Data Mining in Decision Support System

**
*P Nancy Newton Dr. Sapna Singh
**
M.sc, M.Tech Reader,
Associate Professor, IT Dept, School of Management Studies,
Aurora Scientific & Technological Institute, University of Hyderabad
Hyderabad

expenses, and organizational commitment. In this section


Abstract: we are going to introduce data mining in decision
Decision support focuses on developing systems to help support system (DMDSS). First we are going to
decision-makers solve problems. It provides a selection introduce its role aware architecture and roles using
of data analysis, simulation, visualization and modeling DMDSS. Then we are going to introduce concepts of the
techniques, and software tools such as decision support use and functionalities of DMDSS through some
systems, group decision support and mediation systems, example forms for data mining administrator and
expert systems, databases and data warehouses. Data business user. The introduction of concepts of use and
mining deals with finding patterns in data that are by functionalities is done for classification data mining
user-definition, interesting and valid. It is an method supported by DMDSS. We are going to finish
interdisciplinary area involving databases, machine the introduction of DMDSS by presenting the experience
learning, pattern recognition, statistics, and visualization. of the use of DMDSS in our BI operator.
Independently, data mining and decision support has no Traditionally, statistical and OLAP tools have
systematic attempt to integrate them. Data Mining and been used for an advanced data analysis. It is often
Decision Support: Integration and Collaboration presents assumed that the business planners would know the
a conceptual framework, plus the methods and tools for specific question to ask, or the exact definition of the
integrating the two disciplines and for applying this problem that they want to solve.
technology to business problems in a collaborative Data mining methods also extend the
setting. possibilities of discovering information, trends and
patterns by using richer model representations (e.g.
Our main goal is to make DM is more accessible and
decision rules, trees, tables , ...) than the usual statistical
effective for decision makers to support application-
methods, and are therefore well-suited for making the
oriented DS
results more comprehensible to the non-technically
oriented business users.
Key Points: Data Mining, Decision Support, Operation
It may well be that by the introduction of data mining to
Flow, Uses of Data Mining, Data Preparation, and Data
information systems the knowledge discovery process
Mining Tools.
and decision process will move to a higher quality level.
I. INTRODUCTION
Uses of data mining are:
Market segmentation - Identify the common
DSS must allow users to intuitively, quickly,
characteristics of customers.
and flexibly manipulate data using familiar terms, in
Customer churn - Predict which customers are likely to
order to provide analytical insight. Data mining
leave some x-company and go to a competitor.
automates the detection of relevant patterns in a
Fraud detection - Identify which transactions are most
database, using defined approaches and algorithms to
likely to be fraudulent.
look into current and historical data that can then be
Direct marketing - Identify which prospects should be
analyzed to predict future trends. Decision support
included in a mailing list to obtain the highest response
systems are the core of business IT infrastructures to
rate.
translate a wealth of business information into tangible
Interactive marketing - Predict what each individual
and lucrative results. Collecting, maintaining, and
accessing a Web site is most likely interested in seeing.
analyzing large amounts of data, however, are mammoth
tasks that involve significant technical challenges,

BVIMSR’S Journal of Management Research ISSN: 0976-4739


Proceedings of National Conference on Emerging Trends: Innovations and Challenges in IT, 19 -20, April 2013

Market basket analysis - Understand what products or


services are commonly purchased together; e.g., bread Functional Processing Steps Frequency
and milk. Categories
Trend analysis - Reveal the difference between a typical Process 1.set data information once
customer present and future. Management
in data
source
Decision Support Tools
Are used to analyze the information stored in the 2.set target data
warehouse, typically to identify trends and new business information
opportunities...
3.data mapping on
A decision support system or tool is one specifically
designed to facilitate business end users performing source data and
computer generated analyses of data on their own. There target data
are very few pure decision support tools. 4.data replication
Business intelligence has become the vendors’ preferred mode definition
synonym for decision support.
5.schedule ETL task
As far as I know, cognitive researchers do not agree on
how decisions are made. Therefore, saying that these Data achieve source data Times
tools support making decisions is not a provable Acquisition
statement. Nor, is it, in opinion, an insightful way of Data transform between times
defining these tools. It seems, though, that 99% of the Transformation source data and target
definitions of BI say something about better decisions. data based on mapping
My wish is that these definitions would include a regulation
cognitive model of how decisions are made and an Data Loading load data to target data times
explanation on how the tools fit into the model. Process revise execution according
Management definition to
Figure:Architecture of decision support system the needs
Source: From Secondary Data

Integration data mining with decision support, involving


joint data preprocessing, standards for model exchange,
and meta-learning providing decision support when
choosing best data mining tools for a given problem. The
following tools, that can be used in integrating data
mining and decision support:
 A pre-processing tool, It allows access to various
data sources, enabling simple transformation tasks
using a library of templates.
 A common representation language supporting the
exchange of data mining and decision support
models for different application and visualization
tools. This development is built as an addition to the
currently developing PMML (Predictive Model
Source: Current trends in DM and DS integration Markup Language) standard (see www.dmg.org). Its
advantage is its independence of a selected
Despite this new commercial development, many application, platform and operating system.
integrating issues remain to be solved, and DM and DS  Meta-learning tools for classifier selection and ROC
integration techniques proposed. methodology for model selection
Table 1: CLASSIFICATION OF ETL OPERATION Shared ontology, using the developing Sol-Eu-Net
FLOW: On Line Glossary of Terms SOGOT.

BVIMSR’S Journal of Management Research ISSN: 0976-4739


Proceedings of National Conference on Emerging Trends: Innovations and Challenges in IT, 19 -20, April 2013

 The RAMSYS methodology for solving data mining


and decision support problems, requiring remote
collaboration of project partners .
Business
Table 2: Analytical Information Technologies Requirments
Data Mining Technologies
Functionalities: Association, correlation, clustering,
classification, regression, database knowledge Data
Source Data
discovery Requriments
Data Analysis : Signal and image processing,
Nonlinear systems analysis, time series and spatial
statistics, time and frequency domain analysis Preparation
Complex Data Analysis: Expert systems, Case-
based reasoning, System dynamics
Applications: Econometrics, Management Science
Decision Support Systems
 Automated Analysis and Modeling
o Operations Research
o Data Assimilation, Estimation and
Tracking Transformation,
 Human Computer Interaction
Reduction, Integration,
o Multidimensional OLAP and spreadsheets
Summarization, Measures
o Allocation and consolidation engine,
and Joins
alerts
o Business workflows and data sharing
CRISP-DM process model breaks down the life cycle of
data mining project into the following six phases which
include as:
Business understanding: focuses on understanding the
project objectives form business perspective and
Data Analysis
transforming it into a data mining problem (domain)
definition. At the end of the phase the project plan is Source: Compiled by Author’s
produced.
Data understanding: starts with an initial data Modeling: covers the creation of various data mining
collection and proceeds with activities in order to get models. The phase starts with the selection of data
familiar with data, to discover first insights into the data mining methods, proceeds with the creation of data
and to identify data quality problems. mining models and finishes with the assessment of
Data preparation: covers all activities to construct the models. Some data mining methods have specific
final data set from the initial raw data including selection requirements on the form of data and to step back to data
of data, cleaning of data, the construction of data, the preparation phase is often necessary.
integration of data and the formatting of data.
Information groundwork study: Evaluation: evaluates the data mining models created in
the modeling phase. The aim of model evaluation is to
confirm that the models are of high quality to achieve the
business objectives.

Deployment: covers the activities to organize


knowledge gained through data mining models and
present it in a way users can use it within decision
making.

BVIMSR’S Journal of Management Research ISSN: 0976-4739


Proceedings of National Conference on Emerging Trends: Innovations and Challenges in IT, 19 -20, April 2013

IDM is a Web-based application system intended to knowledge acquired in data mining models. Another
provide organization-wide decision support capability result is the revealing of bad data quality, which is a
for business users. Besides data mining it also supports typical side effect result for the use of data mining. For
some other function categories to enable decision some areas of analysis bad data quality has been detected
support: data inquiry, data interpretation and in the development of DMDSS and measures at the
multidimensional analysis. In the data mining part it sources of data have been taken to improve data quality.
supports the creation of models of the following data Application of Business and Economics
mining methods: association rules, clustering and  Financial Planning
classification. Through the use of various visualization  Risk Analysis
methods it supports the presentation of data mining  Supply Chain Planning
models. On the top-level it consists of the following five  Marketing Plans
agents: user interface agent, IDM coordinator agent, data  Text and Video Mining
mining agent, data-set agent and report/visualization  Handwriting/Speech
agent. The user interface agent provides interface for the
 Recognition
user to interact with IDM to perform analysis. It is
 Image and Pattern Recognition
responsible for receiving user specifications, inputs,
 Long-range Economic Planning
commands and delivering results.
 Homeland Security
The IDM coordinator agent is responsible for
coordinating tasks between the user interface agent and
DATA MINING TOOLS Data mining tools can be
other three of before mentioned agents. Based on the
classified into one of three categories
user specifications, input and commands, it identifies
tasks that need to be done, define the task sequence and
delegates them to corresponding agents. It also
Data mining tools
synthesises and generates the final result.
The data-set agent is responsible for communication
with data sources. It provides interface to data
warehouses, data marts and databases.
Traditional Dashboards Text-mining
The data mining agent is responsible for creating and
manipulating of data mining models. It performs data
cleansing and data preparation, provides necessary
parameters for data mining algorithms and creates data Source: Compiled by Author’s
mining models through executing data mining
algorithms.
The report/visualization agent is responsible for  Traditional Data Mining Tools. Traditional
generation of the final report to the user. It assimilates data mining programs help companies establish
the results from data mining agent, generates a report data patterns and trends by using a number of
based on predefined templates and performs output complex algorithms and techniques.
customization.  Dashboards. Installed in computers to monitor
information in a database, dashboards reflect
Business users have access to a fewer functionalities data changes and updates onscreen — often in
than the data mining administrator. The form for model the form of a chart or table — enabling to see
viewing for a business user (Figure 2) is slightly overview of the company's performance.
different from the form for model viewing for data  Text-mining Tools. Ability to mine data from
mining administrator, but has similar general different kinds of text — from Microsoft Word
characteristics. and Acrobat PDF documents to simple text
Business users can also view rules in both visualization files, for example. These tools scan content and
techniques as the data mining administrator. convert the selected data into a format that is
The data mining administrator executes daily model compatible with the tool's database, thus
creation and provides support for business users. providing users with an easy and convenient
The first results of the use of DMDSS are some changes way of accessing data without the need to open
planned in the marketing approach and a special different applications. Capturing these inputs
customer group focused campaign based on the can provide organizations with a wealth of

BVIMSR’S Journal of Management Research ISSN: 0976-4739


Proceedings of National Conference on Emerging Trends: Innovations and Challenges in IT, 19 -20, April 2013

information that can be mined to discover Enabling managers and "power users" to
trends, concepts, and attitudes. indiscriminately search the data warehouse looking for
relationships or rules can raise serious privacy and
The knowledge miner agents can discover the security concerns, particularly when using Web-based
hidden relationships and dependencies from a vast ocean tools.
of organizational data for the support of arguments in the While analytical and data mining tools have become
decision-making process. quite powerful they may be too complex and
Organizations are using data warehousing to support sophisticated for the average "information consumer."
strategic and mission-critical applications. Data Managers who are comfortable with paper-based reports
deposited into the data warehouse must be transformed may find the transition to data warehouse tools to be
into information and knowledge and appropriately uncomfortable and counterproductive. Key to effective
disseminated to decision makers within the organization data warehouse use are identifying the right tools for the
and to critical partners in various supply chains. Crucial different types of data warehouse users and providing
problems that must be addressed in this area are: the adequate training and support once those tools have been
modes of dissemination of information to the end user; selected. For a manager whose primary concern is
the development, selection, and implementation of monitoring sales levels over time by product and sales
appropriate analytical and data mining tools; the privacy region a simple Excel spreadsheet automatically
and security of data; system performance; and adequate connected to an OLAP cube may be sufficient. A
levels of training and support. manager attempting to identify new marketing strategies
There are a number of commercially available and pricing schemes may require more sophisticated
analytical and data mining tools. Online Analytical tools.
Processing (OLAP) tools support multidimensional Furthermore, the value of the available tools is
views of the data warehouse. dependent upon matching the data characteristics to the
OLAP "cubes" are frequently extracted from the data managerial need. Early data warehouse applications
warehouse and made available to managers for specific assumed that currency was not a required characteristic
decision making situations. Using tools such as for managerial decision making. Hence data warehouses
ORACLE Discoverer, CognosPowerPlay, or Business were often "refreshed" from operational databases on a
Objects or even Excel spreadsheets managers can "slice, weekly or monthly basis. Given the accelerated pace of
dice, drilldown, and roll-up" instance-level data along business, "active" or "flash" data warehouses are
pre-defined dimensions. These can be extremely useful becoming more prevalent.
for identifying and exploring the causes of problem Such data warehouses are updated virtually in parallel
situations. For example, drilling down on sales for a with operational databases. This can lead to integrity and
specific product that has not met its sales goals can help consistency problems because data is in a constant state
a manager identify which customers or regions are of flux. Analytical results can vary literally from one
underperforming with respect to that product. However, moment to another.
they are not very effective for generating solution
alternatives once the problem is identified nor are they MAIN STEPS OF DATA MINING:
effective in "discovering" relationships within the data
that can be used for strategy formulation or
implementation.
Data mining and other "knowledge discovery in
database" (KDD) tools, on the other hand, are
specifically designed to identify relationships and "rules"
within the data warehouse.
Unfortunately the identified relationships and
rules may or may not be useful to management.
Often such tools require users to specify the
type of relationship or rule sought. For example, a data
mining tool could be used to identify "products that are
frequently purchased at the same time" or products
whose purchase is "dependent on other previously
purchased products."

BVIMSR’S Journal of Management Research ISSN: 0976-4739


Proceedings of National Conference on Emerging Trends: Innovations and Challenges in IT, 19 -20, April 2013

SOURCE: From Secondary Data Each of these techniques analyzes data in different ways:

DATA MINING TECHNIQUES AND THEIR  Artificial neural networks are non-linear,
APPLICATION predictive models that learn through training.
The three main areas of data mining are One area where auditors can easily use them is
(a) Classification, when reviewing records to identify fraud and
(b) Clustering and fraud-like actions. Because of their complexity,
(c) Association rule mining. reviewing credit card transactions every month
Classification assigns items to appropriate to check for anomalies.
classes by using the attributes of each item The k-nearest  Decision trees. These decisions generate rules,
neighbour (k-NN) method uses a training set, and a new which then are used to classify data. Decision
item is placed in the set, whose entries appear most trees are the technique for building
among the k-NNs of the target item.K-NN queries are understandable models. Auditors can use them
also used in similarity search, content based image to assess, for example, whether the organization
retrieval – (CBIR). is using an appropriate cost-effective marketing
Clustering methods group items, but unlike strategy that is based on the assigned value of
classification, the groups are not predefined. the customer, such as profit.
Association rule mining – (ARM) considers  The nearest-neighbor method classifies dataset
market basket or shopping-cart data, that is, the items records based on similar data in a historical
purchased on a particular visit to the supermarket. ARM dataset. Auditors can use this approach to
first determines the frequent sets, which have to meet a define a document that is interesting to them
certain support level. and ask the system to search for similar items.
The power of data mining lies in its ability to allow users
to consider data from a variety of perspectives in order to Each of these approaches brings different advantages
discover hidden patterns. There are two main divisions and disadvantages that need to be considered prior to
of classification: supervised learning or training, and their use. Neural networks, which are difficult to
unsupervised learning. implement, require all input and resultant output to be
Supervised training requires training samples to be expressed numerically, thus needing some sort of
labeled with a known category or outcome to be applied interpretation depending on the nature of the data-mining
to the classifier. Unsupervised learning, also known as exercise. The decision tree technique is the most
clustering, refers to methodologies that are designed to commonly used methodology, because it is simple and
find natural groupings or clusters without the benefit of a straightforward to implement. Finally, the nearest-
training set. The goal is to discover hidden or new neighbor method relies more on linking similar items
relationships within the data set. and, therefore, works better for extrapolation rather than
In addition to using a particular data mining tool, internal predictive enquiries.
auditors can choose from a variety of data mining
techniques. A good way to apply advanced data mining techniques is
to have a flexible and interactive data mining tool that is
THE MOST COMMONLY USED DATA MINING fully integrated with a database or data warehouse.
TECHNIQUES: Using a tool that operates outside of the database or data
warehouse is not as efficient. Using such a tool will
involve extra steps to extract, import, and analyze the
data. When a data mining tool is integrated with the data
Techniques warehouse, it simplifies the application and
implementation of mining results. Furthermore, as the
warehouse grows with new decisions and results, the
Nearest- organization can mine best practices continually and
Artificial Neural
Decision Trees Neighbor apply them to future decisions.
Network
Method
Regardless of the technique used, the real value behind
data mining is modeling — the process of building a
Source: Compiled by Author’s model based on user-specified criteria from already

BVIMSR’S Journal of Management Research ISSN: 0976-4739


Proceedings of National Conference on Emerging Trends: Innovations and Challenges in IT, 19 -20, April 2013

captured data. Once a model is built, it can be used in more attention to data quality issues in data warehouses
similar situations where an answer is not known. For is needed.
example, an organization looking to acquire new Instance level data integration has been studied
customers can create a model of its ideal customer that is extensively in the context of heterogeneous databases.
based on existing data captured from people who Reasons for this are varied, including data entry errors or
previously purchased the product. The model then is limitations of existing software (e.g., disallowing
used to query data on prospective customers to see if multiple locations for a single customer may cause the
they match the profile. Modeling also can be used in organization to maintain a customer record for each
audit departments to predict the number of auditors location). While this problem is prevalent in internal
required to undertake an audit plan based on previous information systems, the emergence of inter-
attempts and similar work. organizational and Web-based systems and the trend
toward mergers
Managing the content of a data warehouse is a daunting
task. Modern organizations use a wide variety of CONCLUSION
distributed information systems to conduct their day-to- Data mining in decision support system provides a
day business. These operational systems draw data from selection of data analysis, simulation, visualization and
a variety of databases that operate on different hardware modeling techniques, and software tools. Data mining is
platforms, use different operating systems and DBMSs, a powerful new technology with great potential to help
and have different database structures with varying companies focus on the most important information in
structural, conceptual, and instance level semantics. the data they have collected about the behaviour of their
Research has successfully addressed many of the customers and potential customers. It discovers
hardware, operating system, DBMS, and structural information within the data that queries and reports can't
heterogeneities associated with such systems. However, effectively reveal.
major challenges remain for data warehouse content Data Rich, Information Poor; the amount of raw data
management. These include identifying and accessing stored in corporate databases is exploding. From trillions
the appropriate data sources, coordinating data capture of point-of-sale transactions and credit card purchases to
from them in an appropriate timeframe, assuring pixel-by-pixel images of galaxies, databases are now
adequate data quality, and instance level integration. measured in gigabytes and terabytes. Data mining tools
The extraction, transformation, and loading (ETL) can answer business questions that traditionally were too
functions in a data warehouse are considered the most time consuming to resolve It allows user to manipulate
time-consuming and expensive portion of the data, the first results of the use of Data Mining and
development Decision Support System are some changes planned in
lifecycle (Srivastava and Chen 1999). the marketing approach and a special customer group
Data quality is a major concern for many operational focused campaign based on the knowledge acquired in
systems as well as data warehouses (Wand and Wang data mining models.
1996, Jarke et al. 2000). Validation of accuracy,
timeliness, completeness, and consistency remain major REFERENCES
problems for many organizations even in internal 1. Yang Bao LuJing Zhang :World Academy of
information systems where users are trained and Science, Engineering and Technology 71
managed by the organization. These problems are (2010), Decision Support System Based on
multiplied in information systems that are exposed to Data Warehouse.
customers, vendors, and other partners. The result can be 2. Surajit Chaudhuri (Microsoft Research),
a disaster for a data warehouse that depends on such UmeshwarDayalHewlettPackard(LaboratoriesV
systems for its content. Mechanisms for protecting a data enkatesh),Ganti(MicrosoftResearch):00189162/
warehouse from poor quality data are crucial. At the 01/$17.00 © (2001), IEEEDatabase Technology
same time rejecting data from an operational system due for Decision Support Systems
to quality concerns can exacerbate the data 3. Mr. S. P. Deshpande and Dr. V. M. Thakare;
synchronization problems discussed above, particularly International Journal of Distributed and Parallel
when the organization is using the data warehouse to systems (IJDPS) Vol.1, No.1, September
integrate diverse information systems. Methods for (2010): DATA MINING SYSTEM AND
monitoring and cleansing data during ETL have been APPLICATIONS: A REVIEW
shown to be successful (Berndt et al. 2003) however,

BVIMSR’S Journal of Management Research ISSN: 0976-4739


Proceedings of National Conference on Emerging Trends: Innovations and Challenges in IT, 19 -20, April 2013

4. Michael J. Shaw, Chandrasekar Subramaniam, 6. Auroop R Ganguly, and Amar Gupta;


Gek Woo Tan, Michael E. Welge; Decision ENCYCLOPEDIA OF DATA
Support Systems 31 (2001) 127–137: WAREHOUSING AND MINING: Data
Knowledge management and data mining for Mining Technologies and Decision Support
marketing Systems for Business and Scientific
5. Salvatore T. March, Alan R. Hevner: ntegrated Applications.
Decision Support:A Data Warehousing 7. Rok Rupnik and Matjaž Kukar: Data Mining
Perspective and Decision Support: An Integrative Approach
8. The CRISP-DM model: the new blueprint for
data mining, J Data Warehousing (2000);
5:13—22.
Website:
www.laits.utexas.edu
www.siit.tu.ac

BVIMSR’S Journal of Management Research ISSN: 0976-4739

Potrebbero piacerti anche