Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
Content
What is WEKA?
The Explorer:
Preprocess data Classification Clustering Association Rules Attribute Selection Data Visualization
7/27/2012
What is WEKA?
Waikato Environment for Knowledge Analysis Its a data mining/machine learning tool developed by Department of Computer Science, University of Waikato, New Zealand. Weka is also a bird found only on the islands of New Zealand.
7/27/2012
7/27/2012
Main Features
49 data preprocessing tools
76 classification/regression algorithms
8 clustering algorithms 3 algorithms for finding association rules 15 attribute/subset evaluators + 10 search
7/27/2012
Main GUI
Three graphical user interfaces
The Explorer (exploratory data
analysis) The Experimenter (experimental environment) The KnowledgeFlow (new process model inspired interface)
7/27/2012
Content
What is WEKA?
The Explorer:
Preprocess data Classification Clustering Association Rules Attribute Selection Data Visualization
7/27/2012
formats: ARFF, CSV, C4.5, binary Data can also be read from a URL or from an SQL database (using JDBC) Pre-processing tools in WEKA are called filters WEKA contains filters for:
Discretization, normalization, resampling, attribute
7/27/2012
11
University of Waikato
7/27/2012
12
University of Waikato
7/27/2012
13
University of Waikato
7/27/2012
14
University of Waikato
7/27/2012
15
University of Waikato
7/27/2012
16
University of Waikato
7/27/2012
17
University of Waikato
7/27/2012
18
University of Waikato
7/27/2012
19
University of Waikato
7/27/2012
20
University of Waikato
7/27/2012
21
University of Waikato
7/27/2012
22
University of Waikato
7/27/2012
23
University of Waikato
7/27/2012
24
University of Waikato
7/27/2012
25
University of Waikato
7/27/2012
26
University of Waikato
7/27/2012
27
University of Waikato
7/27/2012
28
University of Waikato
7/27/2012
29
University of Waikato
7/27/2012
30
University of Waikato
7/27/2012
31
University of Waikato
7/27/2012
32
7/27/2012
33
yes
34
36
University of Waikato
7/27/2012
37
University of Waikato
7/27/2012
38
University of Waikato
7/27/2012
39
University of Waikato
7/27/2012
40
University of Waikato
7/27/2012
41
University of Waikato
7/27/2012
42
University of Waikato
7/27/2012
43
University of Waikato
7/27/2012
44
University of Waikato
7/27/2012
45
University of Waikato
7/27/2012
46
University of Waikato
7/27/2012
47
University of Waikato
7/27/2012
48
University of Waikato
7/27/2012
49
University of Waikato
7/27/2012
50
University of Waikato
7/27/2012
51
University of Waikato
7/27/2012
52
University of Waikato
7/27/2012
53
University of Waikato
7/27/2012
54
University of Waikato
7/27/2012
55
University of Waikato
7/27/2012
56
University of Waikato
7/27/2012
57
University of Waikato
7/27/2012
of attributes:
milk, butter bread, eggs (with confidence 0.9 and
support 2000)
Apriori can compute all rules that have a given
61
7/27/2012
itemset: A set of one or more items k-itemset X = {x1, , xk} (absolute) support, or, support count
30
40
50
of X: Frequency or occurrence of an itemset X (relative) support, s, is the fraction of transactions that contains X (i.e., the probability that a transaction contains X) An itemset X is frequent if Xs support is no less than a minsup threshold
July 27, 2012
Items bought Beer, Nuts, Diaper Beer, Coffee, Diaper Beer, Diaper, Eggs Nuts, Eggs, Milk
Nuts, Coffee, Diaper, Eggs, Milk Customer buys both
10 20 30 40 50
Find all the rules X Y with minimum support and confidence support, s, probability that a transaction contains X Y confidence, c, conditional probability that a transaction having X also contains Y
Let minsup = 50%, minconf = 50% Freq. Pat.: Beer:3, Nuts:3, Diaper:4, Eggs:3, {Beer, Diaper}:3
Association rules: (many more!) Beer Diaper (60%, 100%) Diaper Beer (60%, 75%)
July 27, 2012
63
64
University of Waikato
7/27/2012
65
University of Waikato
7/27/2012
66
University of Waikato
7/27/2012
67
University of Waikato
7/27/2012
68
University of Waikato
7/27/2012
attributes are the most predictive ones Attribute selection methods contain two parts:
A search method: best-first, forward selection, random,
exhaustive, genetic algorithm, ranking An evaluation method: correlation-based, wrapper, information gain, chi-squared,
Very flexible: WEKA allows (almost) arbitrary
69
7/27/2012
70
University of Waikato
7/27/2012
71
University of Waikato
7/27/2012
72
University of Waikato
7/27/2012
73
University of Waikato
7/27/2012
74
University of Waikato
7/27/2012
75
University of Waikato
7/27/2012
76
University of Waikato
7/27/2012
77
University of Waikato
7/27/2012
determine difficulty of the learning problem WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)
To do: rotating 3-d visualizations (Xgobi-style)
78
7/27/2012
79
University of Waikato
7/27/2012
80
University of Waikato
7/27/2012
81
University of Waikato
7/27/2012
82
University of Waikato
7/27/2012
83
University of Waikato
7/27/2012
84
University of Waikato
7/27/2012
85
University of Waikato
7/27/2012
86
University of Waikato
7/27/2012
87
University of Waikato
7/27/2012
88
University of Waikato
7/27/2012
data mining.
WEKA Data Mining Book: Ian H. Witten and Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques (Second Edition) WEKA Wiki:
http://weka.sourceforge.net/wiki/index.php/Main_Page Others:
Jiawei Han and Micheline Kamber, Data Mining: Concepts and