Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
Summary
February 1, 2015 Data Mining: Concepts and Techniques 2
Why Data Mining?
Summary
February 1, 2015 Data Mining: Concepts and Techniques 5
What Is Data Mining?
Task-relevant Data
Data Cleaning
Data Integration
Databases
February 1, 2015 Data Mining: Concepts and Techniques 7
KDD Process: Several Key Steps
Learning the application domain
relevant prior knowledge and goals of application
Creating a target data set: data selection
Data cleaning and preprocessing: (may take 60% of effort!)
Data reduction and transformation
Find useful features, dimensionality/variable reduction, invariant
representation
Choosing functions of data mining
summarization, classification, regression, association, clustering
Choosing the mining algorithm(s)
Data mining: search for patterns of interest
Pattern evaluation and knowledge presentation
visualization, transformation, removing redundant patterns, etc.
Use of discovered knowledge
February 1, 2015 Data Mining: Concepts and Techniques 8
Example: A Web Mining Framework
Increasing potential
to support
business decisions End User
Decision
Making
Data Exploration
Statistical Summary, Querying, and Reporting
Pattern Evaluation
Knowl
Data Mining Engine edge-
Base
Database or Data
Warehouse Server
Summary
February 1, 2015 Data Mining: Concepts and Techniques 14
Multi-Dimensional View of Data Mining
Data to be mined
Database data (extended-relational, object-oriented, heterogeneous,
Techniques utilized
Data-intensive, data warehouse (OLAP), machine learning, statistics,
Summary
February 1, 2015 Data Mining: Concepts and Techniques 16
Data Mining: On What Kinds of Data?
Database-oriented data sets and applications
Relational database, data warehouse, transactional database
Advanced data sets and advanced applications
Data streams and sensor data
Time-series data, temporal data, sequence data (incl. bio-sequences)
Structure data, graphs, social networks and multi-linked data
Object-relational databases
Heterogeneous databases and legacy databases
Spatial data and spatiotemporal data
Multimedia database
Text databases
The World-Wide Web
Summary
February 1, 2015 Data Mining: Concepts and Techniques 18
Data Mining Function: (1) Generalization
Summary
February 1, 2015 Data Mining: Concepts and Techniques 24
Data Mining: Confluence of Multiple Disciplines
Applications Visualization
Data Mining
Summary
February 1, 2015 Data Mining: Concepts and Techniques 27
Applications of Data Mining
Web page analysis: from web page classification, clustering to
PageRank & HITS algorithms
Collaborative analysis & recommender systems
Basket data analysis to targeted marketing
Biological and medical data analysis: classification, cluster analysis
(microarray data analysis), biological sequence analysis, biological
network analysis
Data mining and software engineering (e.g., IEEE Computer, Aug.
2009 issue)
From major dedicated data mining systems/tools (e.g., SAS, MS SQL-
Server Analysis Manager, Oracle Data Mining Tools) to invisible data
mining
Summary
February 1, 2015 Data Mining: Concepts and Techniques 29
Major Issues in Data Mining (1)
Mining Methodology
Mining various and new kinds of knowledge
Mining knowledge in multi-dimensional space
Data mining: An interdisciplinary effort
Boosting the power of discovery in a networked environment
Handling noise, uncertainty, and incompleteness of data
Pattern evaluation and pattern- or constraint-guided mining
User Interaction
Interactive mining
Incorporation of background knowledge
Presentation and visualization of data mining results
Summary
February 1, 2015 Data Mining: Concepts and Techniques 32
Summary
Data mining: Discovering interesting patterns and knowledge from
massive amount of data
A natural evolution of database technology, in great demand, with
wide applications
A KDD process includes data cleaning, data integration, data
selection, transformation, data mining, pattern evaluation, and
knowledge presentation
Mining can be performed in a variety of data
Data mining functionalities: characterization, discrimination,
association, classification, clustering, outlier and trend analysis, etc.
Data mining technologies and applications
Major issues in data mining