Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
Data mining
There are number of data mining Practice Methodologies to guide and carry out a data
mining under taking. CRISP –DM and SEMMA methodology are widely used methodology.
CRISP-DM stands for Cross-Industry Standards Practice for data mining. It has been
promoted as an open, cross –industry standard and has been developed and promoted by a
variety of interests and data mining vendors, including Mercedes-Benz and NCR
Corporation. SEMMA is a proprietary methodology which was developed by SAS Institute.
SEMMA stands for Sample, Explore, Modify, Model and Assess. There are many other
methodologies that have been proposed and promoted.
The CRISP-DM model makes large data mining projects faster, more efficient and less costly
by enabling users to take advantage of an already-proven process. This model helps people to
avoid common mistakes and provides a head start in assessing business issues and
articulating data mining methods.
CRISP-DM model
10 Mar 2010 1
Optimization of Business Intelligence using Data mining in a challenging economy
10 Mar 2010 2
Optimization of Business Intelligence using Data mining in a challenging economy
Business
Understanding
Determine Determining
Assess Produce
data data mining
Situation project plan
objective goals
Study
Inventory of Draft Project
Backgrou
resources Data mining plan
nd
goals
Select
Understand
Initial Tools
Requirement,
Business Data mining and
assumptions
objective success Techniques
and
Understand constraints Criteria
Terminology
Cost and
Benefits
10 Mar 2010 3
Optimization of Business Intelligence using Data mining in a challenging economy
and project plan. Data analyst lists the risks or events that might delay the project or cause it
to fail. Data Analyst lists the corresponding contingency plans, and charts what action will be
taken if these risks or events take place. Data analyst compiles the glossary of data mining
terminology and business terminology. Data analyst construct cost benefit analysis to
compare the cost of the project and its benefits if it is successful.
Data Understanding
Verify
Collect Describe Explore
Data
Initial Data Data data
Quality
10 Mar 2010 4
Optimization of Business Intelligence using Data mining in a challenging economy
Explore data
Data analyst tackles the data mining questions, which can be addressed using
querying, visualization, and reporting to achieve data mining goals. The data analyst creates a
data exploration report that outlines first findings, or an initial hypothesis, and the potential
impact on the remainder of the project. They may also contribute to or refine the data
description and quality reports, and feed into the transformation and other data preparation
steps needed for further analysis.
10 Mar 2010 5
Optimization of Business Intelligence using Data mining in a challenging economy
Data
preparation
Generated
report
Clean Data
The data analyst must either select clean subsets of data or incorporate more
ambitious techniques such as estimating missing data through modeling analyses.
Construct Data
The data analyst should undertake data preparation operations such as developing
entirely new records or producing derived attributes. These derived attributes should only be
added if they ease the model process or facilitate the modeling algorithm.
Integrate Data
Integrating data involves combining information from multiple tables or records to
create new records or values.
10 Mar 2010 6
Optimization of Business Intelligence using Data mining in a challenging economy
Format Data
The data analyst will change the format or design of the data if necessary without
changing the meaning of data.
Modeling
Modeling Revised
Models
assumption paramet
er setting
Model
Description
10 Mar 2010 7
Optimization of Business Intelligence using Data mining in a challenging economy
10 Mar 2010 8
Optimization of Business Intelligence using Data mining in a challenging economy
Evaluation
Determine next
Evaluate results Review Process
step
List of
Assessment of Review of process possible
data mining action
Results
Approved
model Decision
Fig3.5.Evaluation phase in CRISP-DM
Evaluate Results
This step assesses the degree to which the model meets the business objectives and
determines if there is some business reason why this model is deficient. Another option here
is to test the model(s) on real-world applications— if time and budget constraints permit.
Moreover, evaluation also seeks to unveil additional challenges, information, or hints for
future directions. At this stage, the data analyst summarizes the assessment results in terms of
business success criteria, including a final statement about whether the project already meets
the initial business objectives.
Review Process
It is now appropriate to do a more thorough review of the data mining engagement to
determine if there is any important factor or task that has somehow been overlooked. This
review also covers quality assurance issues.
10 Mar 2010 9
Optimization of Business Intelligence using Data mining in a challenging economy
Deployment
Plan
monitoring and Produce final Review
Plan
maintenance report Project
Development
Monitoring Experience
Deployment
and Final Report documentati
plan
maintenance on
plan
Final report
presentation
Plan Deployment
In order to deploy the data mining result(s) into the business, this task takes the
evaluation results and develops a strategy for deployment.
10 Mar 2010 10
Optimization of Business Intelligence using Data mining in a challenging economy
Review Project
The data analyst should assess failures and successes as well as potential areas of
improvement for use in future projects. This step should include a summary of important
experiences during the project and can include interviews with the significant project
participants. It may include pitfalls, misleading approaches, or hints for selecting the best-
suited data mining techniques in similar situations. In ideal projects, experience
documentation also covers any reports written by individual project members during the
project phases and tasks.
10 Mar 2010 11