
Interesting Patterns

• A data mining system has the potential to generate thousands or even millions of patterns, or rules. This raises three questions:
1. “Are all of the patterns interesting?” “What makes a pattern interesting?”
2. “Can a data mining system generate all of the interesting patterns?”
3. “Can a data mining system generate only interesting patterns?”
What are interesting patterns?
A pattern is interesting if it is:
• easily understood by humans
• valid on new data with some degree of certainty
• potentially useful or novel
A pattern is also interesting if it validates a hypothesis that someone wants to confirm.
• An objective measure for association rules of the form X => Y is rule support: the percentage of transactions in a transaction database that the given rule satisfies.
• This is taken to be the probability P(X ∪ Y), where X ∪ Y indicates that a transaction contains both X and Y, that is, the union of itemsets X and Y.

support(X => Y) = P(X ∪ Y)


• Another objective measure is confidence, which assesses the degree of certainty of the detected association.
• This is taken to be the conditional probability P(Y|X), that is, the probability that a transaction containing X also contains Y.

confidence(X => Y) = P(Y|X)


Support

Every association rule has a support and a confidence.


“The support is the percentage of transactions that demonstrate the rule.”

Example: Database with transactions (customer_# : item_a1, item_a2, …)

1: 1, 3, 5
2: 1, 8, 14, 17, 12
3: 4, 6, 8, 12, 9, 104
4: 2, 1, 8

support {8, 12} = 2 (or 50%, 2 of 4 customers)
support {1, 5} = 1 (or 25%, 1 of 4 customers)
support {1} = 3 (or 75%, 3 of 4 customers)
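
The support counts above can be checked with a short Python sketch. The helper names `support_count` and `support` are illustrative assumptions, not part of the original slides; the transactions are copied from the example.

```python
# Transactions from the four-customer example, keyed by customer number.
transactions = {
    1: {1, 3, 5},
    2: {1, 8, 14, 17, 12},
    3: {4, 6, 8, 12, 9, 104},
    4: {2, 1, 8},
}

def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for basket in transactions.values() if items <= basket)

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return support_count(itemset, transactions) / len(transactions)

for itemset in ({8, 12}, {1, 5}, {1}):
    count = support_count(itemset, transactions)
    share = support(itemset, transactions)
    print(sorted(itemset), count, f"{share:.0%}")
```

The subset test `items <= basket` is what makes an itemset count only when all of its items appear together in one transaction.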
Example: Database with transactions ( customer_# : item_a1, item_a2, … )
1: 3, 5, 8.
2: 2, 6, 8.
3: 1, 4, 7, 10.
4: 3, 8, 10.
5: 2, 5, 8.
6: 1, 5, 6.
7: 4, 5, 6, 8.
8: 2, 3, 4.
9: 1, 5, 7, 8.
10: 3, 8, 9, 10.

What is conf({5} => {8})?

supp({5}) = 5, supp({8}) = 7, supp({5, 8}) = 4
so conf({5} => {8}) = supp({5, 8}) / supp({5}) = 4/5 = 0.8, or 80%
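
As a minimal sketch, the same calculation can be written in Python. The function names `support_count` and `confidence` are assumptions made for illustration; the ten transactions are those of the example.

```python
# The ten transactions of the example, in customer order.
transactions = [
    {3, 5, 8}, {2, 6, 8}, {1, 4, 7, 10}, {3, 8, 10}, {2, 5, 8},
    {1, 5, 6}, {4, 5, 6, 8}, {2, 3, 4}, {1, 5, 7, 8}, {3, 8, 9, 10},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for basket in transactions if items <= basket)

def confidence(x, y, transactions):
    """conf(X => Y) = supp(X and Y together) / supp(X)."""
    supp_x = support_count(x, transactions)
    if supp_x == 0:
        raise ValueError("confidence is undefined when X never occurs")
    return support_count(set(x) | set(y), transactions) / supp_x

print(support_count({5}, transactions))     # 5
print(support_count({8}, transactions))     # 7
print(support_count({5, 8}, transactions))  # 4
print(confidence({5}, {8}, transactions))   # 0.8
```

Note that supp({8}) does not enter the confidence of {5} => {8}; it would be the denominator for the reverse rule {8} => {5}.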
Confidence
Every association rule has a support and a confidence.

An association rule is of the form X => Y.

• X => Y: if someone buys X, they also buy Y.

The confidence is the conditional probability that, given X is present in a transaction, Y will also be present.

Confidence measure, by definition:

confidence(X => Y) = support(X ∪ Y) / support(X)
2) “Can a data mining system generate all of the interesting patterns?”
• This question refers to the completeness of a data mining algorithm, i.e., an algorithm is complete if it mines all interesting patterns.
• It is unrealistic and inefficient for data mining systems to generate all of the possible patterns.
• Instead, user-provided constraints and interestingness measures should be used to focus the search.
• Example: association rule mining.
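
As a rough illustration of how user-provided constraints focus the search, the Python sketch below enumerates only single-item rules {a} => {b} over the ten-transaction example database and keeps those that meet a minimum support count and a minimum confidence. The thresholds (3 transactions, 0.7) and the function name `mine_rules` are assumptions chosen for illustration, and the brute-force enumeration is a simplification rather than how a practical association rule miner organizes its search.

```python
from itertools import permutations

# The ten-transaction example database.
transactions = [
    {3, 5, 8}, {2, 6, 8}, {1, 4, 7, 10}, {3, 8, 10}, {2, 5, 8},
    {1, 5, 6}, {4, 5, 6, 8}, {2, 3, 4}, {1, 5, 7, 8}, {3, 8, 9, 10},
]

def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for basket in transactions if itemset <= basket)

def mine_rules(min_support_count, min_confidence):
    """Return (a, b, support_count, confidence) for each rule {a} => {b}
    that satisfies both user-provided thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        both = count({a, b})
        if both < min_support_count:
            continue  # constraint prunes rules that are too rare
        conf = both / count({a})
        if conf >= min_confidence:
            rules.append((a, b, both, conf))
    return rules

# Hypothetical thresholds: at least 3 transactions and 70% confidence.
rules = mine_rules(min_support_count=3, min_confidence=0.7)
for a, b, supp, conf in rules:
    print(f"{{{a}}} => {{{b}}}: support={supp}/{len(transactions)}, confidence={conf:.2f}")
```

With these thresholds, only {3} => {8} and {5} => {8} survive out of the 90 candidate single-item rules, which shows how constraints narrow the output to a reviewable set.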
3) “Can a data mining system generate only interesting patterns?”
• A data mining algorithm is consistent if it mines only interesting patterns.
• Generating only interesting patterns is an optimization problem in data mining.
• It is highly desirable for data mining systems to generate only interesting patterns, because neither the user nor the system would then have to search through the generated patterns to identify the truly interesting ones.
