• Alternative names
– Knowledge discovery (mining) in databases (KDD), knowledge
extraction, data/pattern analysis, data archeology, data dredging,
information harvesting, business intelligence, etc.
Data Mining Functionalities
[Figure: Data mining at the confluence of Database Technology, Statistics, Machine Learning, Visualization, Pattern Recognition, Algorithm, and Other Disciplines]
Classification of Data Mining
[Figure: Data mining classified by type of data source mined, data model drawn, kind of knowledge discovered, mining technique adopted, and application adopted]
1) Classification according to type of data source mined:
– Spatial databases
– Multimedia data
– Time-series data
– Text data
– World Wide Web
2) Classification according to data model drawn:
– Relational databases
– Object-oriented databases (OODB)
– Data warehouses
– Transactional databases
3) Classification according to kind of knowledge discovered (data mining functionalities):
– Characterization
– Association
– Classification
– Clustering
4) Classification according to mining techniques (data analysis approach):
– Machine learning
– Neural networks
– Genetic algorithms
5) Classification according to the application adopted:
– Finance
– Telecommunications
– Stock market
Major issues in data mining
B) Performance issues:
1) Efficiency and scalability of data mining algorithms
2) Parallel, distributed and incremental mining algorithms
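Incremental mining updates previously mined results as new data arrives instead of re-mining the whole database. The sketch below illustrates the idea under simplifying assumptions: transactions arrive in batches, and only the support counts of single items and item pairs are tracked (real incremental algorithms handle itemsets of any size). The class `IncrementalPairCounter` and its methods are hypothetical names chosen for illustration.

```python
from collections import Counter
from itertools import combinations


class IncrementalPairCounter:
    """Hypothetical helper: keeps running support counts for 1- and 2-itemsets."""

    def __init__(self):
        self.n_transactions = 0
        # Maps a frozenset itemset to the number of transactions containing it.
        self.counts = Counter()

    def add_batch(self, transactions):
        """Fold a new batch into the running counts; earlier batches are never rescanned."""
        for t in transactions:
            items = sorted(set(t))
            self.n_transactions += 1
            for item in items:
                self.counts[frozenset([item])] += 1
            for pair in combinations(items, 2):
                self.counts[frozenset(pair)] += 1

    def frequent(self, min_support):
        """Return itemsets whose support (fraction of all transactions seen) >= min_support."""
        if self.n_transactions == 0:
            return {}
        return {
            itemset: count / self.n_transactions
            for itemset, count in self.counts.items()
            if count / self.n_transactions >= min_support
        }


# Usage: two batches arrive one after another; counts accumulate across them.
counter = IncrementalPairCounter()
counter.add_batch([["milk", "bread"], ["milk", "eggs"]])
counter.add_batch([["milk", "bread", "eggs"], ["bread"]])
print(counter.frequent(min_support=0.5))
```

Because each batch only touches its own transactions, the cost of an update is proportional to the batch size rather than to the full history, which is the efficiency argument behind incremental mining.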
Data Mining Task Primitives
• Task-relevant data
– Database or data warehouse name
– Database tables or data warehouse cubes
– Condition for data selection
– Relevant attributes or dimensions
– Data grouping criteria
• Type of knowledge to be mined
– Characterization, discrimination, association, classification, prediction,
clustering, outlier analysis, other data mining tasks
• Background knowledge
• Pattern interestingness measurements
• Visualization/presentation of discovered patterns
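One way to see how the task primitives fit together is to bundle them into a single task specification. The sketch below is a hypothetical Python container (not a standard data mining query language); the field names, the example database name, and the choice of support and confidence as the interestingness measures are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataMiningTask:
    """Hypothetical container for the data mining task primitives."""
    # Task-relevant data
    database: str
    tables: list
    selection_condition: str
    relevant_attributes: list
    group_by: list = field(default_factory=list)
    # Type of knowledge to be mined, e.g. "association"
    knowledge_type: str = "association"
    # Background knowledge, e.g. a concept hierarchy mapping values to parent concepts
    concept_hierarchy: dict = field(default_factory=dict)
    # Interestingness thresholds (support/confidence are typical for association rules)
    min_support: float = 0.05
    min_confidence: float = 0.5
    # How discovered patterns should be presented
    presentation: str = "rules"


def rule_is_interesting(task, transactions, antecedent, consequent):
    """Check the rule antecedent -> consequent against the task's thresholds."""
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    both = sum(1 for t in transactions if (a | c) <= set(t))
    ante = sum(1 for t in transactions if a <= set(t))
    support = both / n if n else 0.0
    confidence = both / ante if ante else 0.0
    return support >= task.min_support and confidence >= task.min_confidence


# Usage with hypothetical names and data.
task = DataMiningTask(
    database="sales_db",
    tables=["transactions"],
    selection_condition="year = 2023",
    relevant_attributes=["item"],
    min_support=0.3,
    min_confidence=0.6,
)
baskets = [["milk", "bread"], ["milk", "bread", "eggs"], ["milk"], ["bread"]]
print(rule_is_interesting(task, baskets, ["milk"], ["bread"]))
```

Keeping the thresholds in the task object rather than hard-coding them mirrors the idea that interestingness measures are a user-specified primitive.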
Guidelines for Successful Data Mining
1. Use a small team with a strong internal integration and a
loose management style.
2. Carry out a small pilot project before a major data mining
project.
3. Identify a clear problem owner responsible for the project.
This could be someone in sales or marketing, which benefits
external integration.
4. Try to realize a positive return on investment within 6 to 12
months.
5. The whole data mining project should have the support of
the top management of the company.
Statistical Data Mining
• Some statistical data mining techniques are as follows:
• Regression: Regression methods are used to predict the value of a
numeric response variable from one or more numeric predictor
variables. The forms of regression include:
– Linear
– Multiple
– Weighted
– Polynomial
– Nonparametric
– Robust
• Generalized Linear Models: Generalized linear models include:
– Logistic Regression
– Poisson Regression
• This generalization allows a categorical response variable (as in logistic
regression) or a count response variable (as in Poisson regression) to be
related to a set of predictor variables in a manner similar to the modelling
of a numeric response variable using linear regression.
• Analysis of Variance: This technique analyzes:
– Experimental data for two or more populations described by a
numeric response variable.
– One or more categorical variables (factors).
• Mixed-effect Models: These models are used for analyzing
grouped data. They describe the relationship between a
response variable and some covariates in data grouped
according to one or more factors.
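• Factor Analysis: Factor analysis is used to determine which observed
variables combine to generate a given underlying (latent) factor.
• Discriminant Analysis: Discriminant analysis is used to predict a
categorical response variable. This method assumes that the independent
variables follow a multivariate normal distribution.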
• Time Series Analysis: Methods for analyzing time-series data
include:
– Auto-regression methods.
– Univariate ARIMA (AutoRegressive Integrated Moving Average)
Modeling.
– Long-memory time-series modeling.
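To make the Regression entry in the list above concrete, here is a minimal sketch of the simplest form, linear regression with a single numeric predictor, fitted by ordinary least squares. It is illustrative only; multiple, weighted, polynomial, nonparametric, and robust regression are not shown, and the function name `fit_simple_linear_regression` is hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares fit of y = b0 + b1 * x for one numeric predictor."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x), both up to the same 1/n factor.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has zero variance")
    b1 = sxy / sxx
    # The fitted line passes through the point of means.
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Usage: points generated from y = 1 + 2x, so the fit should recover b0 = 1, b1 = 2.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
b0, b1 = fit_simple_linear_regression(xs, ys)
print(b0, b1)
```

Multiple regression generalizes the same least-squares idea to several predictors, usually solved with matrix methods rather than the closed form above.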
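For the Generalized Linear Models entry above, logistic regression is the standard example of relating a categorical (here binary) response to a predictor. The sketch below fits P(y = 1 | x) = sigmoid(w0 + w1 * x) by plain batch gradient ascent on the log-likelihood. The learning rate, epoch count, and function names are assumptions for illustration; practical libraries use more robust optimizers.

```python
import math


def sigmoid(z):
    """Logistic link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y=1 | x) = sigmoid(w0 + w1*x) by batch gradient ascent on the log-likelihood."""
    w0, w1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            # Gradient of the log-likelihood is (y - p) times the feature.
            err = y - sigmoid(w0 + w1 * x)
            g0 += err
            g1 += err * x
        w0 += lr * g0 / n
        w1 += lr * g1 / n
    return w0, w1


def predict_proba(w0, w1, x):
    """Estimated probability that the response is 1 at predictor value x."""
    return sigmoid(w0 + w1 * x)


# Usage: a small, deliberately non-separable binary data set.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 1, 0, 1, 1]
w0, w1 = fit_logistic_regression(xs, ys)
print(predict_proba(w0, w1, 1), predict_proba(w0, w1, 6))
```

The data are chosen to be non-separable so that the maximum-likelihood weights are finite; on perfectly separable data, unregularized logistic regression weights keep growing.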
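The Analysis of Variance entry above compares a numeric response across two or more populations. A minimal one-way ANOVA sketch computes the F statistic as the ratio of between-group to within-group mean squares. The function name is hypothetical, and turning F into a p-value (which requires the F distribution) is left out.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric samples."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n <= k:
        raise ValueError("need >= 2 non-empty groups and more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Usage: three groups (one factor with three levels) of a numeric response.
f_stat = one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f_stat)
```

A large F means the group means differ by much more than the scatter within groups would suggest, which is evidence that the categorical factor affects the response.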
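For the auto-regression methods under Time Series Analysis, the simplest model is AR(1): x_t = c + phi * x_(t-1) + e_t. The sketch below estimates c and phi by least squares on lagged pairs and produces multi-step forecasts. It covers only AR(1); the differencing and moving-average parts of ARIMA and long-memory models are not shown, and the function names are hypothetical.

```python
def fit_ar1(series):
    """Least-squares fit of x_t = const + phi * x_(t-1) + e_t."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    prev = series[:-1]  # x_(t-1)
    curr = series[1:]   # x_t
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    sxy = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    if sxx == 0:
        raise ValueError("series is constant; phi is undefined")
    phi = sxy / sxx
    const = mean_curr - phi * mean_prev
    return const, phi


def forecast_ar1(const, phi, last, steps):
    """Iterate the fitted recurrence forward from the last observed value."""
    out = []
    x = last
    for _ in range(steps):
        x = const + phi * x
        out.append(x)
    return out


# Usage: a noise-free series generated by x_t = 2 + 0.5 * x_(t-1), starting at 10.
series = [10.0]
for _ in range(5):
    series.append(2.0 + 0.5 * series[-1])
const, phi = fit_ar1(series)
print(const, phi, forecast_ar1(const, phi, series[-1], 2))
```

With real data the residuals e_t are non-zero, and the estimated phi indicates how strongly each value depends on the previous one; |phi| < 1 corresponds to a stationary process.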
Data Mining - Applications & Trends
• Data mining is widely used in diverse areas. A number of
commercial data mining systems are available today, yet
many challenges remain in this field. This section
discusses the applications and trends of data mining.
• Data Mining Applications
• Financial Data Analysis
• Retail Industry
• Telecommunication Industry
• Biological Data Analysis
• Other Scientific Applications
• Intrusion Detection