Conference overview
1.
2.
3.
Demo
4.
Summary
T. Nouri
Data Mining
Knowledge-Based System
[Figure: knowledge-based system architecture. Labels: User, User Interface, Shell, Explication, Inference, Knowledge Base, Knowledge Extraction, Knowledge Engineer, System Engineer, Data/Expert, Data Mining]
What is KDD?
Why is KDD necessary?
The KDD process
KDD operations and methods
Data Mining
Prediction: What? (opaque)
Description: Why? (transparent)
Verification driven
Validating hypothesis
Querying and reporting (spreadsheets, pivot tables)
Multidimensional analysis (dimensional summaries); Online Analytical Processing (OLAP)
Statistical analysis
Discovery driven
Exploratory data analysis
Predictive modeling
Database segmentation
Link analysis
Deviation detection
[Figure: the KDD process. Stages: Original Data, Target Data, Preprocessed Data, Transformed Data, Patterns, Knowledge]
Select sample
Normalize values
Eliminate noisy data
Supply missing values
Transform values
Create derived attributes
Relevant attributes
Select DM method(s)
Select DM task(s)
Refine knowledge
Presentation, visualization
Extract knowledge
Test knowledge
Create/select target database
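Three of the preprocessing steps listed above (eliminate noisy data, supply missing values, normalize values) can be sketched in a few lines. This is only an illustration under stated assumptions: the `ages` column, the plausible range of 0 to 120, mean imputation, and min-max scaling are all hypothetical choices, not taken from the slides.

```python
from statistics import mean

def eliminate_noisy(values, low, high):
    """Drop values outside a plausible range; missing entries (None) are kept."""
    return [v for v in values if v is None or low <= v <= high]

def supply_missing(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def normalize(values):
    """Min-max scale the values to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical 'age' column: one missing entry and one implausible entry.
ages = [20, None, 40, 30, 500]
prepared = normalize(supply_missing(eliminate_noisy(ages, 0, 120)))
print(prepared)
```

The order of the three calls is a design choice: removing the implausible value first keeps it from distorting the mean used for imputation.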
Related fields
AI
Machine learning
Statistics
Visualization
Scalability
Efficient and sufficient sampling
In-memory vs. disk-based processing
High performance computing
Automation
Ease of use
Using prior knowledge
Discriminative knowledge
Distinguish between K classes
Accurate classification (also black box)
Separate spaces
Components of DM methods
Association rules
Sequence mining
Classification (decision trees, etc.)
Clustering
Deviation detection
K-nearest neighbors
If A and B then C
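For a rule R: LS ⇒ RS (left side ⇒ right side):
$\mathrm{supp}(LS \Rightarrow RS) = \dfrac{\#\,\text{transactions verifying } R}{\text{total } \#\text{ of transactions}}$
$\mathrm{conf}(R) = \dfrac{\mathrm{supp}(LS \Rightarrow RS)}{\mathrm{supp}(LS)}$
Ex:
R: Milk ⇒ Cookies
A support(R) of 0.8 means that milk and cookies appear together in 80% of the transactions.
The confidence measures the strength of the relation between the LS and the RS: the fraction of transactions containing the LS that also contain the RS.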
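The support and confidence definitions can be checked on a small example. The sketch below assumes transactions are represented as Python sets; the five transactions are hypothetical and were chosen so that the rule Milk ⇒ Cookies has a support of 0.8, as in the example above.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)

def confidence(transactions, ls, rs):
    """supp(LS => RS) / supp(LS); assumes LS occurs in at least one transaction."""
    return support(transactions, set(ls) | set(rs)) / support(transactions, ls)

# Hypothetical transactions, not taken from the slides.
transactions = [
    {"milk", "cookies"},
    {"milk", "cookies", "bread"},
    {"milk", "cookies"},
    {"milk", "cookies", "eggs"},
    {"milk"},
]

rule_support = support(transactions, {"milk", "cookies"})
rule_confidence = confidence(transactions, {"milk"}, {"cookies"})
print(rule_support, rule_confidence)
```

Here milk appears in every transaction, so supp(LS) is 1.0 and the confidence equals the support; with a rarer left side the two values would differ.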
Example tickets (market baskets): Ticket 2, Ticket 3, Ticket 4
Items appearing on the tickets: Flour, Eggs, Sugar, Chocolate, Milk, Th
Passenger ID   (unlabeled)   Destination
431            102           New York
431            102           London
431            102           Cairo
431            102           Paris
701            38            New York
701            38            London
701            38            Cairo
11             531           New York
11             531           Cairo
301            102           New York
301            102           London
301            102           Paris
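One way to mine such a table for associations is to group the destinations per passenger and count how often each pair of destinations occurs together. The sketch below is an illustration only: it assumes the first column identifies the passenger, ignores the second column, and writes the city as "New York".

```python
from collections import Counter, defaultdict
from itertools import combinations

# (first column, destination) pairs copied from the table above.
rows = [
    (431, "New York"), (431, "London"), (431, "Cairo"), (431, "Paris"),
    (701, "New York"), (701, "London"), (701, "Cairo"),
    (11, "New York"), (11, "Cairo"),
    (301, "New York"), (301, "London"), (301, "Paris"),
]

# Build one "basket" of destinations per passenger (assumed grouping key).
baskets = defaultdict(set)
for passenger, destination in rows:
    baskets[passenger].add(destination)

# Count the passengers that visited both destinations of each pair.
pair_counts = Counter()
for destinations in baskets.values():
    for pair in combinations(sorted(destinations), 2):
        pair_counts[pair] += 1

# Support of a pair = share of passengers whose basket contains it.
pair_support = {pair: n / len(baskets) for pair, n in pair_counts.items()}
for pair, supp in sorted(pair_support.items()):
    print(pair, supp)
```

Pairs with high support, such as London together with New York, are the candidates from which association rules would then be derived.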
Sequence mining
Predictive modeling
[Figure: a model that outputs a High/Low Risk prediction]
Models
Decision trees
Rule induction
Regression models
Neural networks
[Scale beside the list: Easier to Harder]
What is Classification?
Classification is the process of assigning
new objects to predefined categories or
classes
Classification learning
Decision-tree classification
Training data:
Age   CarType   Risk
23    Family    High
17    Sports    High
43    Sports    High
68    Family    Low
32    Truck     Low
20    Family    High
Tree:
Age < 27.5 (numeric split)
  yes: High
  no: CarType ∈ {Sports} (categorical split)
    yes: High
    no: Low
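The tree in this example (a numeric split on Age < 27.5 followed by a categorical split on CarType ∈ {Sports}) can be written as a small function and checked against the six training records. The function name `classify_risk` and the tuple layout are illustrative assumptions.

```python
def classify_risk(age, car_type):
    """Apply the two splits of the example tree and return the risk class."""
    if age < 27.5:                 # numeric split
        return "High"
    if car_type in {"Sports"}:     # categorical split
        return "High"
    return "Low"

# (Age, CarType, Risk) records from the example.
training = [
    (23, "Family", "High"),
    (17, "Sports", "High"),
    (43, "Sports", "High"),
    (68, "Family", "Low"),
    (32, "Truck", "Low"),
    (20, "Family", "High"),
]

predictions = [classify_risk(age, car) for age, car, _ in training]
all_correct = all(p == risk for p, (_, _, risk) in zip(predictions, training))
print(predictions, all_correct)
```

A new object, for example a 50-year-old with a Family car, is assigned to a predefined class in the same way: `classify_risk(50, "Family")` follows the "no" branch of both splits.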
What is clustering?
Given N k-dimensional feature vectors, find a meaningful partition of the N examples into c subsets or groups
Discover the labels automatically
c may be given, or discovered
Much more difficult than classification, since in the latter the groups are given and we seek a compact description
Clustering
Clustering schemes
Distance-based
  Numeric
    Euclidean distance (root of sum of squared differences along each dimension)
    Angle between two vectors
  Categorical
    Number of common features
Partition-based
  Enumerate partitions and score each
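The three distance-based measures named above can be sketched as follows. The function names and the example feature tuples are illustrative assumptions; vectors are plain tuples of equal length.

```python
import math

def euclidean(a, b):
    """Root of the sum of squared differences along each dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def angle(a, b):
    """Angle in radians between two non-zero numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))  # guard against rounding
    return math.acos(cosine)

def common_features(a, b):
    """Number of positions at which two categorical vectors agree."""
    return sum(1 for x, y in zip(a, b) if x == y)

print(euclidean((0, 0), (3, 4)))
print(angle((1, 0), (0, 1)))
print(common_features(("Family", "red", "manual"), ("Family", "blue", "manual")))
```

Note that the first two are distances (smaller means more similar), while the count of common features is a similarity (larger means more similar).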
K-means algorithm
Initial seeds
K-means algorithm
New centers
K-means algorithm
Final centers
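The progression from initial seeds to new centers to final centers can be sketched as a plain-Python loop. This is a minimal version of the standard k-means iteration, not necessarily the exact variant shown on the slides; the six points and the two seeds are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]

def update(points, labels, centers):
    """New center of each cluster = mean of its members."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            new_centers.append(old)  # leave the center of an empty cluster unchanged
            continue
        new_centers.append(tuple(sum(m[d] for m in members) / len(members)
                                 for d in range(len(old))))
    return new_centers

def k_means(points, seeds, max_iter=100):
    """Alternate assignment and update steps until the centers stop moving."""
    centers = [tuple(s) for s in seeds]
    labels = assign(points, centers)
    for _ in range(max_iter):
        new_centers = update(points, labels, centers)
        if new_centers == centers:
            break
        centers = new_centers
        labels = assign(points, centers)
    return centers, labels

# Hypothetical data: two visible groups, both seeds deliberately in the first one.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
final_centers, final_labels = k_means(points, seeds=[(1, 1), (2, 1)])
print(final_centers, final_labels)
```

Even with both seeds placed in the same group, the centers move apart after one update and settle on the two groups; in general the result of k-means depends on the choice of initial seeds.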
Deviation detection
outlier
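One simple way to flag outliers in a numeric attribute is a z-score test: a value is reported when it lies more than a chosen number of standard deviations from the mean. This is only one possible technique and is not necessarily the one intended on the slide; the threshold of 2.0 and the sample values are assumptions.

```python
from statistics import mean, pstdev

def find_outliers(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds `threshold`."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values are identical, so nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical measurements with one value far from the rest.
values = [10, 11, 9, 10, 12, 10, 50]
outliers = find_outliers(values)
print(outliers)
```

Because the outlier itself inflates the mean and the standard deviation, this test can miss deviations in small samples; robust variants based on the median are a common alternative.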
K-nearest neighbors
K-nearest neighbors
[Figure: the neighborhood of a query point, containing 5 points of one class and 3 points of another class]
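The neighborhood vote can be sketched as follows: find the k closest labeled examples and assign the majority class. The class names "A" and "B", the coordinates, and k = 8 are hypothetical, chosen so that the neighborhood holds 5 examples of one class and 3 of the other, as in the figure.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def neighbor_votes(query, examples, k):
    """Count the class labels among the k examples nearest to `query`."""
    nearest = sorted(examples, key=lambda ex: sq_dist(query, ex[0]))[:k]
    return Counter(label for _, label in nearest)

def knn_classify(query, examples, k):
    """Assign the majority class of the k nearest neighbors."""
    return neighbor_votes(query, examples, k).most_common(1)[0][0]

# Hypothetical labeled examples as (point, class) pairs.
examples = [
    ((1, 0), "A"), ((0, 1), "A"), ((-1, 0), "A"), ((0, -1), "A"), ((1, 1), "A"),
    ((-1, 1), "B"), ((1, -1), "B"), ((-1, -1), "B"),
    ((5, 5), "B"), ((6, 5), "B"), ((5, 6), "B"), ((7, 7), "A"),
]

votes = neighbor_votes((0, 0), examples, k=8)
predicted = knn_classify((0, 0), examples, k=8)
print(votes, predicted)
```

With an even k a tie between classes is possible; this sketch then returns whichever tied class `Counter.most_common` lists first, so an odd k or an explicit tie-breaking rule is often preferred.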
Conference overview
1.
2.
3.
Demo
4.
Research Trends
5.
Summary
6.
Conclusions
Questions?