What is a classification rule
Evaluating a rule
Algorithms
  PRISM
  Incremental reduced-error pruning
  RIPPER
Handling
  Missing values
  Numeric attributes
Rule-based classifiers versus decision trees
  C4.5rules, PART
Section 5.1 of course book.
Rule: (Condition) → y
where
Condition is a conjunction of attribute tests (A1 = v1) and (A2 = v2) and ... and (An = vn)
y is the class label
Name            Blood Type   Class
human           warm         mammals
python          cold         reptiles
salmon          cold         fishes
whale           warm         mammals
frog            cold         amphibians
komodo          cold         reptiles
bat             warm         mammals
pigeon          warm         birds
cat             warm         mammals
leopard shark   cold         fishes
turtle          cold         reptiles
penguin         warm         birds
porcupine       warm         mammals
eel             cold         fishes
salamander      cold         amphibians
gila monster    cold         reptiles
platypus        warm         mammals
owl             warm         birds
dolphin         warm         mammals
eagle           warm         birds
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
TNM033: Introduction to Data Mining
Motivation
Consider the rule set
Attributes A, B, C, and D can have values 1, 2, and 3
(A = 1) ∧ (B = 1) → Class = Y
(C = 1) ∧ (D = 1) → Class = Y
Otherwise, Class = N
[Figure: decision tree for the rule set above, testing attributes A, B, C, and D (branch values 1, 2, 3) with leaves Y and N.]
Name           Blood Type   Give Birth   Can Fly   Live in Water   Class
hawk           warm         no           yes       no              ?
grizzly bear   warm         yes          no        no              ?

The rule R1 covers a hawk → Class = Bird
The rule R3 covers the grizzly bear → Class = Mammal
Accuracy: fraction of records covered by the rule that belong to the class on the RHS
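To make the definition concrete, the sketch below computes the accuracy of one rule on a handful of records. The records and the helper names `covers` and `rule_accuracy` are hypothetical, invented only for this illustration; returning 0 for a rule that covers nothing is likewise an assumption.

```python
def covers(conditions, record):
    """True if the record satisfies every attribute test of the rule."""
    return all(record.get(attr) == value for attr, value in conditions.items())


def rule_accuracy(conditions, label, data):
    """Fraction of the covered records whose class equals the rule's RHS."""
    covered = [cls for rec, cls in data if covers(conditions, rec)]
    if not covered:
        return 0.0  # assumption: a rule that covers nothing scores 0
    return sum(cls == label for cls in covered) / len(covered)


# Made-up records: (attribute values, class label).
data = [
    ({"A": 1, "B": 1}, "Y"),
    ({"A": 1, "B": 1}, "Y"),
    ({"A": 1, "B": 1}, "N"),
    ({"A": 2, "B": 1}, "N"),
]

# Rule: (A = 1) and (B = 1) -> Y covers three records, two of them in class Y.
acc = rule_accuracy({"A": 1, "B": 1}, "Y", data)
print(acc)
```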
Name            Give Birth   Can Fly   Live in Water   Class
lemur           yes          no        no              ?
turtle          no           no        sometimes       ?
dogfish shark   yes          no        yes             ?

A lemur triggers rule R3 → class = mammal
A turtle triggers both R4 and R5 → voting or ordering rules
A dogfish shark triggers none of the rules → default rule
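The three cases above can be reproduced with a small ordered rule list. This is a minimal sketch, not the course's implementation: trying the rules in the order R1 to R5 is only one possible conflict-resolution scheme, the `Blood Type` values of the three animals and the placeholder default label are assumptions, and the function names are hypothetical.

```python
# Rules R1-R5 as (name, conditions, class), tried in this order (assumed ordering).
RULES = [
    ("R1", {"Give Birth": "no", "Can Fly": "yes"}, "Birds"),
    ("R2", {"Give Birth": "no", "Live in Water": "yes"}, "Fishes"),
    ("R3", {"Give Birth": "yes", "Blood Type": "warm"}, "Mammals"),
    ("R4", {"Give Birth": "no", "Can Fly": "no"}, "Reptiles"),
    ("R5", {"Live in Water": "sometimes"}, "Amphibians"),
]


def matches(conditions, record):
    return all(record.get(attr) == value for attr, value in conditions.items())


def triggered(record):
    """Names of all rules whose conditions the record satisfies."""
    return [name for name, cond, _ in RULES if matches(cond, record)]


def classify(record, default_class="default class"):
    """Ordered rules: the first triggered rule decides; otherwise the default rule."""
    for _, cond, label in RULES:
        if matches(cond, record):
            return label
    return default_class


# Blood Type values are assumptions; the other values follow the table above.
lemur = {"Blood Type": "warm", "Give Birth": "yes", "Can Fly": "no", "Live in Water": "no"}
turtle = {"Blood Type": "cold", "Give Birth": "no", "Can Fly": "no", "Live in Water": "sometimes"}
dogfish = {"Blood Type": "cold", "Give Birth": "yes", "Can Fly": "no", "Live in Water": "yes"}

for name, animal in [("lemur", lemur), ("turtle", turtle), ("dogfish shark", dogfish)]:
    print(name, triggered(animal), classify(animal))
```

With a voting scheme instead of an ordering, the turtle's two triggered rules would each cast a vote and the tie would need its own resolution.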
Indirect Method
Extract rules from other classification models (e.g. decision trees). Example: C4.5rules
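The basic idea of the indirect method can be sketched by turning every root-to-leaf path of a decision tree into one rule. The tree below is a small hand-written example (an assumption for illustration), and `tree_to_rules` is a hypothetical helper; C4.5rules additionally simplifies and prunes the extracted rules, which this sketch omits.

```python
# A tree node is either a class label (leaf) or (attribute, {value: subtree}).
tree = ("Give Birth", {
    "yes": "Mammals",
    "no": ("Can Fly", {"yes": "Birds", "no": "Reptiles"}),
})


def tree_to_rules(node, path=()):
    """Return one (conditions, class) rule per root-to-leaf path."""
    if isinstance(node, str):  # leaf: the path so far becomes the rule's LHS
        return [(list(path), node)]
    attribute, branches = node
    rules = []
    for value, child in branches.items():
        rules.extend(tree_to_rules(child, path + ((attribute, value),)))
    return rules


rules = tree_to_rules(tree)
for conditions, label in rules:
    lhs = " and ".join(f"({a} = {v})" for a, v in conditions)
    print(f"{lhs} -> {label}")
```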
Goal: to create rules that cover many examples of a class C and none (or very few) of other classes
Start with an empty rule and add tests one at a time:
{} → class = C
(a = v) → class = C
Two issues:
1. How to choose the best test? Which attribute to choose?
2. When to stop building a rule?
[Figure: instances of classes a and b plotted in the x-y plane; adding a term narrows the region a rule covers (labels: rule so far, rule after adding new term).]
PRISM Algorithm
For each class C
  Initialize E to the training set
  While E contains instances in class C
    Create a rule R with an empty left-hand side that predicts class C
    Until R is perfect (or there are no more attributes to use) do
      For each attribute A not mentioned in R, and each value v,
        Consider adding the condition A = v to the left-hand side of R
      Select A and v to maximize the accuracy p/t
        (break ties by choosing the condition with the largest p)
      Add A = v to R
    Remove the instances covered by R from E
Available in WEKA
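The PRISM pseudocode can be followed almost line by line in Python. The version below is a minimal sketch for nominal attributes only, not the WEKA implementation: the function names, the toy data, and the handling of ties that remain after comparing p (the first candidate found wins) are assumptions.

```python
def covers(conditions, instance):
    return all(instance[attr] == value for attr, value in conditions.items())


def prism(instances, class_attr):
    """instances: list of dicts. Returns a list of (conditions, class) rules."""
    attributes = [a for a in instances[0] if a != class_attr]
    rules = []
    for cls in sorted({inst[class_attr] for inst in instances}):
        E = list(instances)  # start again from the full training set
        while any(inst[class_attr] == cls for inst in E):
            conditions, covered = {}, E  # rule with an empty left-hand side
            # Grow until the rule is perfect or no attributes are left.
            while (any(i[class_attr] != cls for i in covered)
                   and len(conditions) < len(attributes)):
                best = None  # ((accuracy, p), attribute, value)
                for attr in attributes:
                    if attr in conditions:
                        continue
                    for value in sorted({i[attr] for i in covered}, key=str):
                        subset = [i for i in covered if i[attr] == value]
                        p = sum(i[class_attr] == cls for i in subset)
                        score = (p / len(subset), p)  # accuracy p/t, ties by p
                        if best is None or score > best[0]:
                            best = (score, attr, value)
                _, attr, value = best
                conditions[attr] = value
                covered = [i for i in covered if i[attr] == value]
            rules.append((conditions, cls))
            E = [i for i in E if not covers(conditions, i)]
    return rules


# Made-up training set: class Y only when A = 1 and B = 1.
train = [
    {"A": 1, "B": 1, "Class": "Y"},
    {"A": 1, "B": 2, "Class": "N"},
    {"A": 2, "B": 1, "Class": "N"},
    {"A": 2, "B": 2, "Class": "N"},
]
learned = prism(train, "Class")
for conditions, cls in learned:
    print(conditions, "->", cls)
```

Because every candidate value is drawn from the instances the rule currently covers, t is never zero, and each finished rule covers at least one instance of its class, so the outer loop terminates.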
Produce rules that don't cover negative instances, as quickly as possible
Disadvantage: may produce rules with very small coverage
Special cases or noise? (overfitting)
Rule Evaluation
Other metrics:
These measures take into account the coverage/support count of the rule
t: number of instances covered by the rule
p: number of instances covered by the rule that are positive examples
k: number of classes
f: prior probability of the positive class
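The symbols t, p, k, and f match two widely used rule-evaluation measures, the Laplace estimate (p + 1)/(t + k) and the m-estimate (p + k·f)/(t + k). That these are the metrics this slide listed is an assumption; the formulas are the standard textbook forms written with the symbols defined above, and the function names are hypothetical.

```python
def laplace(p, t, k):
    """Laplace estimate: (p + 1) / (t + k)."""
    return (p + 1) / (t + k)


def m_estimate(p, t, k, f):
    """m-estimate: (p + k * f) / (t + k)."""
    return (p + k * f) / (t + k)


# Two rules with the same accuracy p/t = 1 but very different coverage.
small = laplace(p=2, t=2, k=2)    # 3 / 4
large = laplace(p=50, t=50, k=2)  # 51 / 52
print(small, large)

# With a uniform prior f = 1/k the m-estimate equals the Laplace estimate.
print(m_estimate(p=2, t=2, k=2, f=0.5))
```

Both measures pull the score of a low-coverage rule toward the prior, so a rule that is perfect on two instances ranks below one that is perfect on fifty.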
Numeric attributes are treated just like they are in decision trees
1. Sort instances according to the attribute's value
2. For each possible threshold, a binary less-than/greater-than test is considered
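The two steps can be sketched as follows. Placing candidate thresholds at the midpoints between consecutive distinct values and scoring each test by accuracy p/t (ties broken by p) are assumptions made for this illustration; `best_numeric_test` is a hypothetical name.

```python
def best_numeric_test(values, labels, target):
    """Return ((accuracy, p), operator, threshold) of the best binary test."""
    pairs = sorted(zip(values, labels))           # step 1: sort by attribute value
    distinct = sorted({v for v, _ in pairs})
    best = None
    for low, high in zip(distinct, distinct[1:]):  # step 2: one threshold per gap
        threshold = (low + high) / 2
        for op in ("<=", ">"):
            covered = [lab for v, lab in pairs
                       if (v <= threshold if op == "<=" else v > threshold)]
            p = sum(lab == target for lab in covered)
            score = (p / len(covered), p)
            if best is None or score > best[0]:
                best = (score, op, threshold)
    return best  # None when the attribute has a single distinct value


# Made-up data: class "a" for small values, class "b" for large ones.
result = best_numeric_test([3.0, 1.0, 4.0, 2.0], ["b", "a", "b", "a"], target="a")
print(result)
```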
Rule Pruning
The PRISM algorithm tries to get perfect rules, i.e. rules with accuracy = 1 on the training set.
These rules can be overspecialized → overfitting
Solution: prune the rules
Available in Weka
Prune the rule immediately using reduced error pruning
Measure for pruning: W(R) = (p-n)/(p+n)
p: number of positive examples covered by the rule in the validation set
n: number of negative examples covered by the rule in the validation set
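A small sketch shows how W(R) can drive pruning. The strategy used here, dropping trailing conditions and keeping the shortest prefix with the highest W(R) on the validation set, is one common choice and an assumption, as are the function names, the toy validation set, and the score of -1 for a rule that covers no validation examples.

```python
def pruning_value(p, n):
    """W(R) = (p - n) / (p + n), computed on the validation set."""
    if p + n == 0:
        return -1.0  # assumption: a rule covering nothing gets the worst score
    return (p - n) / (p + n)


def prune_rule(conditions, label, validation):
    """conditions: ordered list of (attribute, value) tests, in the order added."""
    def value_of(conds):
        covered = [cls for rec, cls in validation
                   if all(rec.get(a) == v for a, v in conds)]
        p = sum(cls == label for cls in covered)
        return pruning_value(p, len(covered) - p)

    # Candidate rules: every non-empty prefix of the condition list.
    prefixes = [conditions[:i] for i in range(1, len(conditions) + 1)]
    return max(prefixes, key=value_of)  # on ties, max keeps the shortest prefix


# Made-up validation set: (attribute values, class label).
validation = [
    ({"A": 1, "B": 1}, "Y"),
    ({"A": 1, "B": 2}, "Y"),
    ({"A": 1, "B": 1}, "N"),
    ({"A": 2, "B": 1}, "N"),
]
pruned = prune_rule([("A", 1), ("B", 1)], "Y", validation)
print(pruned)
```

Here the full rule covers one positive and one negative example (W = 0), while the rule with only A = 1 covers two positives and one negative (W = 1/3), so the last condition is dropped.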
Example
Attributes: Give Birth, Lay Eggs, Can Fly, Live in Water, Have Legs; class attribute: Class

Name            Lay Eggs   Have Legs   Class
human           no         yes         mammals
python          yes        no          reptiles
salmon          yes        no          fishes
whale           no         no          mammals
frog            yes        yes         amphibians
komodo          yes        yes         reptiles
bat             no         yes         mammals
pigeon          yes        yes         birds
cat             no         yes         mammals
leopard shark   no         no          fishes
turtle          yes        yes         reptiles
penguin         yes        yes         birds
porcupine       no         yes         mammals
eel             yes        no          fishes
salamander      yes        yes         amphibians
gila monster    yes        yes         reptiles
platypus        yes        yes         mammals
owl             yes        yes         birds
dolphin         no         no          mammals
eagle           yes        yes         birds
C4.5rules:
(Give Birth=No, Can Fly=Yes) → Birds
(Give Birth=No, Live in Water=Yes) → Fishes
(Give Birth=Yes) → Mammals
(Give Birth=No, Can Fly=No, Live in Water=No) → Reptiles
( ) → Amphibians

RIPPER:
(Live in Water=Yes) → Fishes
(Have Legs=No) → Reptiles
(Give Birth=No, Can Fly=No, Live In Water=No) → Reptiles
(Can Fly=Yes, Give Birth=No) → Birds
() → Mammals

[Figure: decision tree with internal nodes Live In Water? (Yes / No / Sometimes) and Can Fly? (Yes / No), and leaves Mammals, Fishes, Amphibians, Birds, Reptiles.]
PART
1. Build a partial decision tree on the current set of instances
2. Create a rule from the decision tree
3. Discard the decision tree
4. Remove the instances covered by the rule
5. Go to step 1
Available in WEKA