1. PREPROCESS:
Data collected from the field often contains unwanted content that leads to incorrect
analysis. For example, the data may contain null fields or columns that are irrelevant
to the current analysis. The data must therefore be preprocessed to meet the requirements
of the analysis you intend to perform. This is done in the preprocessing module.
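The two clean-up steps named above (dropping irrelevant columns and discarding records with null fields) can be sketched outside WEKA in a few lines of Python. This is only an illustration: the records, the `id` column, and the `preprocess` helper are hypothetical and are not part of the golfgame data or of WEKA itself.

```python
# Hypothetical raw records; None marks a missing (null) field.
raw_rows = [
    {"id": 1, "condition": "Sunny", "temperature": 76, "class": "Yes"},
    {"id": 2, "condition": None, "temperature": 64, "class": "No"},
    {"id": 3, "condition": "Rainy", "temperature": None, "class": "Yes"},
    {"id": 4, "condition": "Cloudy", "temperature": 70, "class": "Yes"},
]


def preprocess(rows, drop_columns):
    """Drop irrelevant columns, then discard rows that still contain nulls."""
    cleaned = []
    for row in rows:
        # Keep only the columns that matter for the analysis.
        kept = {k: v for k, v in row.items() if k not in drop_columns}
        # Keep the row only if every remaining field has a value.
        if all(v is not None for v in kept.values()):
            cleaned.append(kept)
    return cleaned


clean_rows = preprocess(raw_rows, drop_columns={"id"})
for row in clean_rows:
    print(row)
```

Discarding incomplete rows is the simplest policy; replacing missing values with a mean or mode is a common alternative when the dataset is small.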
At the very top of the window, just below the title bar, there is a row of tabs. Only the first tab,
‘Preprocess’, is active at the moment because no dataset is open. The first four buttons at
the top of the Preprocess section enable you to load data into WEKA. Data can be imported from
a file in various formats (ARFF, CSV, C4.5, binary), or read from a URL or from an
SQL database (using JDBC). The easiest and most common way of getting data into
WEKA is to store it as an Attribute-Relation File Format (ARFF) file.
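As a rough illustration of what an ARFF file looks like, the sketch below renders a relation as ARFF text. The attribute names match the golfgame relation used later in this report; the two data rows and the `to_arff` helper are illustrative assumptions, not the actual 14-instance dataset.

```python
def to_arff(relation, attributes, rows):
    """Render rows as ARFF text.

    attributes is a list of (name, spec) pairs, where spec is either the
    string "numeric" or a list of allowed nominal values.
    """
    lines = ["@relation " + relation, ""]
    for name, spec in attributes:
        if spec == "numeric":
            lines.append("@attribute " + name + " numeric")
        else:
            lines.append("@attribute " + name + " {" + ",".join(spec) + "}")
    lines += ["", "@data"]
    for row in rows:
        # '?' is the ARFF marker for a missing value.
        lines.append(",".join("?" if v is None else str(v) for v in row))
    return "\n".join(lines) + "\n"


attributes = [
    ("condition", ["Rainy", "Sunny", "Cloudy"]),
    ("temperature", "numeric"),
    ("class", ["Yes", "No"]),
]
sample_rows = [("Sunny", 76, "Yes"), ("Sunny", 73, "Yes")]  # illustrative rows only

arff_text = to_arff("golfgame", attributes, sample_rows)
print(arff_text)
```

Saving the returned text to a file with the `.arff` extension should give something that can be opened from the Preprocess tab.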
2. CLASSIFICATION (NAÏVE BAYES ALGORITHM):
In the Classify tab, use the Choose button to select a model. Naive Bayes is a
classification algorithm. Traditionally it assumes that the input values are nominal, although
numerical inputs are supported by assuming a distribution.
Naive Bayes uses a simple implementation of Bayes' Theorem in which the prior
probability of each class is calculated from the training data and the input attributes are
assumed to be independent of each other given the class (technically called conditionally
independent, hence naive).
This is an unrealistic assumption because we expect the variables to interact and be dependent,
although this assumption makes the probabilities fast and easy to calculate. Even under this
unrealistic assumption, Naive Bayes has been shown to be a very effective classification
algorithm.
Naive Bayes calculates the posterior probability for each class and makes a prediction for the
class with the highest probability. As such, it supports both binary classification and multi-class
classification problems.
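The posterior calculation can be sketched by hand using the counts, means, and standard deviations that WEKA prints in the classifier model for the golfgame relation in this section. The following assumptions are ours: the priors are recomputed from the weight sums with add-one smoothing (which is consistent with the printed 0.63 and 0.38), temperature is modelled with a plain Gaussian density, and the query instance (Sunny, 70) is hypothetical. WEKA's own numeric estimator may differ in detail, so treat the result as approximate.

```python
import math

# Values copied from the WEKA classifier model for the golfgame relation.
class_weights = {"Yes": 9, "No": 5}  # "weight sum" row
condition_counts = {                 # smoothed counts as printed by WEKA
    "Yes": {"Rainy": 4.0, "Sunny": 5.0, "Cloudy": 3.0},
    "No": {"Rainy": 4.0, "Sunny": 2.0, "Cloudy": 2.0},
}
temperature_params = {               # (mean, std. dev.)
    "Yes": (60.9744, 12.0783),
    "No": (61.8154, 9.0532),
}


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posterior(condition, temperature):
    """Return the normalised posterior probability of each class."""
    total = sum(class_weights.values())
    scores = {}
    for label, weight in class_weights.items():
        # ASSUMPTION: add-one smoothed prior, (9+1)/(14+2) = 0.625 for Yes.
        prior = (weight + 1) / (total + len(class_weights))
        counts = condition_counts[label]
        p_condition = counts[condition] / sum(counts.values())
        p_temperature = gaussian_pdf(temperature, *temperature_params[label])
        # Naive assumption: multiply the per-attribute likelihoods.
        scores[label] = prior * p_condition * p_temperature
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}


post = posterior("Sunny", 70)  # hypothetical query instance
predicted = max(post, key=post.get)
print(post, predicted)
```

The predicted class is simply the label with the largest posterior, which is why the same procedure works for two classes or for many.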
Scheme: weka.classifiers.bayes.NaiveBayes
Relation: golfgame
Instances: 14
Attributes: 3
condition
temperature
class
Test mode: 10-fold cross-validation
=== Classifier model (full training set) ===

Naive Bayes Classifier

                 Class
Attribute          Yes      No
                (0.63)  (0.38)
===============================
condition
  Rainy            4.0     4.0
  Sunny            5.0     2.0
  Cloudy           3.0     2.0
  [total]         12.0     8.0

temperature
  mean         60.9744 61.8154
  std. dev.    12.0783  9.0532
  weight sum         9       5
  precision     3.1538  3.1538

               TP Rate  FP Rate  Precision  Recall  F-Measure  MCC     ROC Area  PRC Area  Class
               0.556    1.000    0.500      0.556   0.526      -0.471  0.244     0.542     Yes
               0.000    0.444    0.000      0.000   0.000      -0.471  0.244     0.290     No
Weighted Avg.  0.357    0.802    0.321      0.357   0.338      -0.471  0.244     0.452

 a b   <-- classified as
 5 4 | a = Yes
 5 0 | b = No
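
The per-class figures in the accuracy table can be recomputed from the confusion matrix alone. The sketch below does this with the standard definitions of TP rate, FP rate, precision, F-measure, and MCC; the helper name `class_metrics` is ours. ROC Area and PRC Area are left out because they need the predicted probabilities, which the matrix does not contain.

```python
import math

labels = ["Yes", "No"]
# Rows are actual classes, columns are predicted classes (from the WEKA output).
confusion = [
    [5, 4],  # actual Yes: 5 classified as Yes, 4 as No
    [5, 0],  # actual No:  5 classified as Yes, 0 as No
]


def class_metrics(matrix, idx):
    """One-vs-rest metrics for the class at position idx."""
    tp = matrix[idx][idx]
    fn = sum(matrix[idx]) - tp
    fp = sum(row[idx] for row in matrix) - tp
    tn = sum(sum(row) for row in matrix) - tp - fn - fp

    tp_rate = tp / (tp + fn) if tp + fn else 0.0   # also called recall
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    if precision + tp_rate:
        f_measure = 2 * precision * tp_rate / (precision + tp_rate)
    else:
        f_measure = 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "tp_rate": tp_rate,
        "fp_rate": fp_rate,
        "precision": precision,
        "f_measure": f_measure,
        "mcc": mcc,
    }


metrics = {label: class_metrics(confusion, i) for i, label in enumerate(labels)}
accuracy = sum(confusion[i][i] for i in range(len(labels))) / sum(map(sum, confusion))

for label, m in metrics.items():
    print(label, {k: round(v, 3) for k, v in m.items()})
print("accuracy", round(accuracy, 3))
```

The overall accuracy of 5/14 equals the weighted-average TP rate in the table, and the negative MCC indicates that on this tiny dataset the cross-validated predictions are worse than chance.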
3. SIMPLE K-MEANS CLUSTERING:
A clustering algorithm finds groups of similar instances in the entire dataset. WEKA supports
several clustering algorithms, such as EM, FilteredClusterer, HierarchicalClusterer, and
SimpleKMeans. You should understand these algorithms well to fully exploit
WEKA's capabilities. As with classification, WEKA allows you to visualize the
detected clusters graphically. To demonstrate clustering, we will use the provided iris
dataset, which contains three classes of 50 instances each. Each class refers to a type of
iris plant.
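To make the idea of grouping similar instances concrete, here is a minimal k-means sketch (Lloyd's algorithm) on purely numeric points with Euclidean distance. It is not WEKA's SimpleKMeans, which also handles nominal attributes; the points, the starting centroids, and the function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, max_iter=100):
    """Alternate assignment and centroid update until assignments stop changing."""
    centroids = [tuple(c) for c in initial_centroids]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, assignment


# Hypothetical 2-D points forming two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.6), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, assignment = kmeans(points, initial_centroids=[points[0], points[1]])
print(centroids)
print(assignment)
```

Even though both starting centroids lie in the same group, the update step pulls one of them toward the second group within a couple of iterations.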
Relation: golfgame
Instances: 14
Attributes: 3
condition
temperature
class
kMeans
======
Number of iterations: 3
Cluster 0: Sunny,76,Yes
Cluster 1: Sunny,73,Yes
Cluster#
==============================================
Clustered Instances
0 7 ( 50%)
1 7 ( 50%)
4. APRIORI ASSOCIATION RULE:
The Apriori algorithm is a machine learning algorithm that finds probable associations
among items and creates association rules. WEKA provides an implementation of the Apriori
algorithm. You can define the minimum support and an acceptable confidence level when
computing these rules. You will apply the Apriori algorithm to the supermarket data provided
in the WEKA installation.
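The roles of minimum support and confidence can be illustrated with a small level-wise Apriori sketch. The transactions, thresholds, and function names below are hypothetical, and the code is a simplified teaching version rather than WEKA's implementation.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Return (premise, consequence, confidence) triples above the threshold."""
    rules = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for premise in combinations(itemset, size):
                premise = frozenset(premise)
                confidence = sup / frequent[premise]
                if confidence >= min_confidence:
                    rules.append((premise, itemset - premise, confidence))
    return rules


# Hypothetical market-basket transactions.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = apriori(baskets, min_support=0.5)
rules = association_rules(frequent, min_confidence=0.9)
for premise, consequence, confidence in rules:
    print(sorted(premise), "==>", sorted(consequence), "conf:", confidence)
```

Raising the minimum support shrinks the set of frequent itemsets, while raising the confidence threshold filters the rules generated from them.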
Relation: golfgame
Instances: 14
Attributes: 4
condition
con
cond
class
Apriori
=======
10. condition=Rainy cond=Normal 2 ==> class=Yes 2 <conf:(1)> lift:(1.56) lev:(0.05) [0] conv:(0.71)
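
The four figures printed for rule 10 can be checked by hand from the counts in the rule. Two assumptions are needed and are ours: that class=Yes occurs in 9 of the 14 instances (as the weight sum in the Naive Bayes output suggests), and that conviction is computed with an add-one adjustment in the denominator, which reproduces the printed 0.71 but has not been confirmed against WEKA's source.

```python
n_instances = 14        # from the run information
premise_count = 2       # condition=Rainy cond=Normal
both_count = 2          # premise and class=Yes together
consequence_count = 9   # ASSUMPTION: class=Yes covers 9 of 14 instances

# Confidence: share of premise instances that also satisfy the consequence.
confidence = both_count / premise_count

# Lift: confidence relative to the baseline frequency of the consequence.
lift = confidence / (consequence_count / n_instances)

# Leverage: observed joint frequency minus the frequency expected
# if premise and consequence were independent.
leverage = (both_count / n_instances
            - (premise_count / n_instances) * (consequence_count / n_instances))

# Conviction. ASSUMPTION: add-one adjustment in the denominator.
conviction = (premise_count * (n_instances - consequence_count) / n_instances
              / (premise_count - both_count + 1))

print(round(confidence, 2), round(lift, 2), round(leverage, 2), round(conviction, 2))
```

Under these assumptions the rounded values agree with conf:(1), lift:(1.56), lev:(0.05), and conv:(0.71) in the rule above.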