
Overview

Fundamentals of association rules in data mining and knowledge discovery

Shichao Zhang 1,2,* and Xindong Wu 3

1 Department of Computer Science, Zhejiang Normal University, Jinhua, Zhejiang, PR China
2 State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu, PR China
3 Department of Computer Science, University of Vermont, Burlington, VT, USA
* Correspondence to: zhangsc@zjnu.cn

Association rule mining is one of the fundamental research topics in data mining and knowledge discovery: it identifies interesting relationships between itemsets in datasets and predicts the associative and correlative behaviors of new data. Rooted in market basket analysis, a great number of techniques have been developed for association rule mining, including frequent pattern discovery, interestingness measures, complex associations, and multiple data source mining. This paper introduces the prevailing, up-to-date association rule mining methods and advocates the mining of complete association rules, including both positive and negative association rules. © 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011, 1: 97–116. DOI: 10.1002/widm.10

INTRODUCTION

Knowledge Discovery in Databases (KDD) was initially named by Gregory Piatetsky-Shapiro in a workshop at the 1989 International Joint Conference on Artificial Intelligence in Detroit, USA, in August 1989. It was defined as the process of finding interesting, interpretable, useful, and novel knowledge. KDD has become a multidisciplinary subject today. The original KDD workshop series became an annual international conference in 1995, and a biannual academic magazine titled 'Association for Computing Machinery's Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD) Explorations' was launched in 1999. The term KDD was defined at two different times. In 1992, Frawley et al.1 defined it as a nontrivial extraction of implicit, previously unknown, and potentially useful information from data. In 1996, Fayyad et al.2 defined KDD as a nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. The 1996 definition has been widely accepted by both academia and industry.

During the development of KDD, association rule mining has played a fundamental role. It was first proposed by Agrawal et al.3 for market basket analysis and has since emerged as a prominent research area in KDD. This is attributed to its simple representation, easy understanding, and potential usefulness in capturing, for example, customer buying behavior.

An association rule is of the form X → Y. It is interesting if its support [supp(X ∪ Y)] and confidence [conf(X → Y)] are equal to or greater than the user-specified minimum support (ms) and minimum confidence (mc) thresholds, respectively. This simple representation is generally referred to as the support–confidence (supp–conf) framework. An association rule such as 'if milk then bread' means that if a customer purchases milk, she/he will likely also buy bread. Such rules are easy to understand. In particular, the well-known story of 'diapers → beer' showed that association rules can be useful, hidden, and difficult to mine, and it gave a big boost to the development of association rule mining. It is therefore not surprising that, in many people's eyes, association rule mining almost meant KDD from 1993 to 2002.

Although an association rule looks really simple, association rule mining is often challenging in real applications. The first challenge is the object to be mined: data sources. Data sources are often very large, multiple, and heterogeneous; moreover, the data may be raw, rough, incomplete, with missing values, and dynamic. The second challenge is the mining method itself, which involves an interestingness measure, an exponential search space, a mining theory, and an evaluation criterion. The last challenge comes from constraints such as accuracy requirements and time and space limits.
As an introduction to association analysis, this article focuses on the basic concepts, typical techniques, and some applications. The article is organized as follows. The section 'Association Rules' introduces the basic concepts and the Apriori algorithm for association rule discovery. The section 'Complete Association Rule Analysis: Mining Both Positive and Negative Rules' describes a complete association analysis that identifies both positive and negative association rules of interest. In the section 'Applications of Association Rules', we list some of the main applications of association rules. Concluding remarks are given in the fifth section.

ASSOCIATION RULES

This section introduces some representative work on association rule mining, including the supp–conf framework, the Apriori algorithm, and some research directions of association rule mining.

The Support–Confidence Framework

Let I = {i1, i2, ..., iN} be a set of literals or items. For example, milk, sugar, and bread for purchase in a store are items. Assume D is a set of transactions over I, called the transaction database, in which a transaction is a set of items, that is, a subset of I. A transaction has an associated unique identifier called the Transaction IDentifier (TID).

A set of items is referred to as an itemset. For simplicity, an itemset {i1, i2, i3} is sometimes written as i1i2i3. The number of items in an itemset is the length (or the size) of the itemset. Itemsets of length k are referred to as k-itemsets.

Each itemset has an associated statistical measure called support, denoted as supp or p. The supp is either the proportion of transactions in the database that contain the itemset or the number of transactions that contain the itemset. Formally, for an itemset X ⊆ I, p(X) is defined as the fraction of transactions in D containing X, or

    p(X) = (1/n) Σ_{i=1}^{n} 1(X ⊆ Di),

where the database D is viewed as a vector of n records (or transactions) Di such that each record is a set of items.

An itemset X in a transaction database D is called a large (or frequent) itemset if its support p(X) is equal to, or greater than, a threshold of minimal support (minsup, ms), which is given by users or experts.

An association rule is an implication X → Y that describes the existence of a relationship between itemsets X and Y, where X, Y ⊂ I and X ∩ Y = ∅ (itemsets X and Y must not intersect). Each association rule has two quality measurements: support and confidence. The confidence, denoted as conf, is the ratio of the number of transactions that include all items in the consequent as well as the antecedent (namely, the support) to the number of transactions that include all items in the antecedent. For X → Y, they are defined as follows:

(1) The supp of X → Y is the supp of XY (i.e., X ∪ Y), where XY means that itemsets X and Y occur at the same time; that is, supp = the frequency of the occurring pattern, or p(XY).
(2) The conf of X → Y is the ratio p(XY)/p(X); that is, conf = the strength of the implication.

Association rules provide information of this type in the form of 'if–then' statements. These rules are computed from the data and, unlike the if–then rules of logic, association rules are probabilistic in nature. In addition to the antecedent (the 'if' part) and the consequent (the 'then' part), an association rule has the supp and conf measurements that express the degree of uncertainty about the rule. In association analysis, the antecedent and consequent are sets of items (called itemsets) that are disjoint (without any items in common).

Association rule mining seeks interesting associations and/or correlation relationships between frequent itemsets in datasets. Association rules show attribute–value conditions that occur frequently together in a given dataset. A typical and widely used example of association rule mining is market basket analysis. The first effort on mining association rules was based on the supp–conf framework as follows.

The supp–conf framework (Agrawal et al.3). Let I be the set of items in database D, X, Y ⊆ I be itemsets, X ∩ Y = ∅, p(X) ≠ 0, and p(Y) ≠ 0. Assume the minimal support (minsup, or ms) and minimal confidence (minconf, or mc) are given by users or experts. Then X → Y is a valid association rule if p(XY) ≥ ms and conf(X → Y) ≥ mc.

Accordingly, association rule mining can be broken down into two subproblems:

(1) Generating all itemsets that have a supp greater than, or equal to, the user-specified ms, that is, identifying all frequent itemsets.
(2) Generating all rules that meet the mc in the following way: for a frequent itemset Z, any X ⊂ Z, and Y = Z − X, if the conf of the rule X → Y is greater than, or equal to, the mc (i.e., p(Z)/p(X) ≥ mc), then it can be extracted as a valid rule.

With the above decomposed subproblems, the supp–conf framework is a simple and easy-to-understand two-step process: the first step searches for frequent itemsets, and the second step generates association rules.
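For readers who prefer code, the two measures can be written down directly; the following minimal Python sketch computes support and confidence from a list of transactions. The transaction data and the helper names (supp, conf) are illustrative only and do not come from the paper.

# A minimal sketch of the supp-conf measures (illustrative only).
# Transactions are sets of items; supp and conf follow the definitions above.

def supp(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def conf(antecedent, consequent, transactions):
    """conf(X -> Y) = p(X u Y) / p(X)."""
    return supp(antecedent | consequent, transactions) / supp(antecedent, transactions)

# Hypothetical example data.
D = [{"milk", "bread"}, {"milk", "sugar"}, {"milk", "bread", "sugar"}, {"bread"}]
print(supp({"milk", "bread"}, D))     # 0.5
print(conf({"milk"}, {"bread"}, D))   # 2/3 for the rule 'milk -> bread'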
The Apriori Algorithm

The complexity of an association rule mining system is heavily dependent upon the identification of frequent itemsets. The most prevailing algorithm to perform this identification is the Apriori algorithm.

Agrawal and Srikant4 observed an interesting downward closure property, called Apriori, among frequent k-itemsets: a k-itemset is frequent only if all of its subitemsets are frequent. Accordingly, the frequent 1-itemsets are found in the first scan of the database; the frequent 1-itemsets are then used to generate candidate frequent 2-itemsets, which are checked against the database to obtain the frequent 2-itemsets. In general, the frequent (k−1)-itemsets are used to generate candidate frequent k-itemsets, which are checked against the database to obtain the frequent k-itemsets. This process iterates until no more frequent k-itemsets can be generated for some k. This is the essence of the Apriori algorithm,4 which is described as follows:

Algorithm 1. Apriori
Input: D: a database; ms: minimum support;
Output: F: a set of frequent itemsets of interest;
(1) let F ← {};
(2) let L1 ← {frequent 1-itemsets}; F ← F ∪ L1;
(3) for (k = 2; Lk−1 ≠ {}; k++) do begin
    //Generate all possible frequent k-itemsets of interest in D.
    (3.1) let Temk ← {{x1, ..., xk−2, xk−1, xk} | {x1, ..., xk−2, xk−1} ∈ Lk−1 ∧ {x1, ..., xk−2, xk} ∈ Lk−1};
    (3.2) for each transaction t in D do begin
        //Check which k-itemsets are included in transaction t.
        let Temt ← the k-itemsets in t that are also contained in Temk;
        for each itemset A in Temt do
            let A.count ← A.count + 1;
        end for
    (3.3) let Lk ← {c | c ∈ Temk ∧ (p(c) = c.count/|D|) ≥ ms};
    (3.4) let F ← F ∪ Lk;
    end (3)
(4) output F;
(5) return.

The Apriori algorithm generates all frequent itemsets in a given database D. The initialization is done in Step (1). Step (2) generates L1 of all frequent 1-itemsets in D in the first pass of D.

Step (3) generates Lk for k ≥ 2 by a loop, where Lk is the set of all frequent k-itemsets of interest in the kth pass of D, and the end condition of the loop is Lk−1 = {}. For each pass of the database in Step (3), say pass k, there are four substeps. Step (3.1) generates Temk of all candidate k-itemsets in D, where each k-itemset in Temk is generated from two frequent itemsets in Lk−1. Each itemset in Temk is counted in D by a loop in Step (3.2). Then Lk is generated in Step (3.3), which is the set of all potentially useful frequent k-itemsets in Temk, where all frequent k-itemsets in Lk meet ms. Finally, Lk is added to F in Step (3.4).

Step (4) outputs the frequent itemsets of potential interest in F. The procedure ends in Step (5).
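The candidate-generate-and-count loop of Algorithm 1 can also be sketched compactly in Python; the function and variable names below are illustrative, and the candidate step is simplified (it joins any two frequent (k−1)-itemsets whose union has size k) rather than an optimized Apriori-gen. The data are those of Table 1 in the following example.

def apriori(transactions, ms):
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    # Step (2): frequent 1-itemsets.
    L = [frozenset([i]) for i in items
         if sum(1 for t in transactions if i in t) / n >= ms]
    F = list(L)
    k = 2
    while L:  # Step (3): loop until no frequent k-itemsets remain.
        # (3.1) candidates: unions of two frequent (k-1)-itemsets of size k.
        cands = {a | b for a in L for b in L if len(a | b) == k}
        # (3.2)-(3.3) count candidates and keep those meeting ms.
        L = [c for c in cands
             if sum(1 for t in transactions if c <= t) / n >= ms]
        F.extend(L)
        k += 1
    return F  # Step (4)

# The four transactions of Table 1 (TIDs 100-400).
D = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
print(sorted("".join(sorted(f)) for f in apriori(D, ms=0.5)))
# ['A', 'AC', 'B', 'BC', 'BCE', 'BE', 'C', 'CE', 'E']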
An Example

Let I = {A, B, C, D, E} and the transaction universe be TID = {100, 200, 300, 400}a, as shown in Table 1.

TABLE 1  A Transaction Database

TID    Items
100    A, C, D
200    B, C, E
300    A, B, C, E
400    B, E

In Table 1, 100, 200, 300, and 400 are the unique identifiers of the four transactions: A = sugar, B = bread, C = coffee, D = milk, and E = cake. Each row in the table can be taken as a transaction. We can identify frequent itemsets (the first step of the supp–conf framework) from these transactions using the Apriori algorithm (see 'How Apriori Works'), and association rules from the frequent itemsets (the second step of the supp–conf framework) as shown in 'Generating Association Rules' below. Let

(1) ms = 50% (to be frequent, an itemset must occur in at least two transactions); and
(2) mc = 60% (to be a high-conf, or valid, rule, at least 60% of the time you find the antecedent of the rule in the transactions, you must also find the consequent of the rule there).

How Apriori Works

To illustrate the use of the Apriori algorithm (the first step of the supp–conf framework), we outline the process of identifying frequent itemsets in the dataset in Table 1.

In the Apriori algorithm, F is assigned an empty set in Step (1). Step (2) scans the database to generate frequent 1-itemsets. The 1-itemsets {A}, {B}, {C}, {D}, and {E} are first generated as candidates in the first pass over the dataset, and A.count = 2, B.count = 3, C.count = 3, D.count = 1, and E.count = 3, where X.count = x means that the frequency of itemset X is x. Because ms = 50% and dbsize = 4, {A}, {B}, {C}, and {E} are frequent itemsets, or L1 = {A, B, C, E}. The frequent 1-itemsets and their frequencies are listed in Table 2. L1 is then added to F and F = {A, B, C, E}.

TABLE 2  Frequent 1-Itemsets and Their Frequencies in Table 1

Itemset    Frequency    ≥ ms
{A}        2            Y
{B}        3            Y
{C}        3            Y
{E}        3            Y

Step (3) is a loop. In the first iteration, candidate frequent 2-itemsets are first generated from the frequent 1-itemsets of L1 in Step (3.1); that is, Tem2 = {AB, AC, AE, BC, BE, CE}. Then the second pass over the dataset examines the candidate 2-itemsets of Tem2 in Steps (3.2) and (3.3). From Table 1, AB.count = 1, AC.count = 2, AE.count = 1, BC.count = 2, BE.count = 3, and CE.count = 2. Therefore, AC, BC, BE, and CE are frequent itemsets, or L2 = {AC, BC, BE, CE}. The frequent 2-itemsets and their frequencies are listed in Table 3. L2 is then added to F and F = {A, B, C, E, AC, BC, BE, CE}.

TABLE 3  Frequent 2-Itemsets and Their Frequencies in Table 1

Itemset    Frequency    ≥ ms
{A,C}      2            Y
{B,C}      2            Y
{B,E}      3            Y
{C,E}      2            Y

Because L2 ≠ {}, Step (3) is repeated. In the second iteration, candidate frequent 3-itemsets are first generated from the frequent 2-itemsets of L2 in Step (3.1); that is, Tem3 = {BCE}b. Then the third pass over the dataset examines the candidate 3-itemsets of Tem3 in Steps (3.2) and (3.3). From Table 1, BCE.count = 2. Therefore, BCE is a frequent itemset, or L3 = {BCE}. The frequent 3-itemsets and their frequencies are listed in Table 4. L3 is then added to F and F = {A, B, C, E, AC, BC, BE, CE, BCE}.

TABLE 4  Frequent 3-Itemsets and Their Frequencies in Table 1

Itemset    Frequency    ≥ ms
{B,C,E}    2            Y

Because L3 ≠ {}, Step (3) is repeated. In the third iteration, candidate frequent 4-itemsets are first generated from the frequent 3-itemsets of L3 in Step (3.1); that is, Tem4 = {}. Because Tem4 = {}, L4 = {} and Step (3) is finished.

In Step (4), the frequent itemsets in F (F = {A, B, C, E, AC, BC, BE, CE, BCE}) are output, and the algorithm ends in Step (5).

Generating Association Rules

To show how to generate association rules from a given database (the second step of the supp–conf framework), we use the frequent itemsets identified above from the dataset in Table 1. To simplify the description, we detail how to generate association rules from the frequent itemset BCE in F, with p(BCE) = 50% = ms.

Because p(BCE)/p(BC) = 2/2 = 100%, which is greater than mc = 60%, BC → E can be extracted as a valid rule. In the same way, because p(BCE)/p(BE) = 2/3 = 66.7%, which is greater than mc, BE → C can be extracted as another valid rule; and because p(BCE)/p(CE) = 2/2 = 100% is greater than mc, CE → B can be extracted as a third valid rule. The association rules with 1-item consequences generated from BCE are listed in Table 5.

TABLE 5  Association Rules with 1-Item Consequences from 3-Itemsets

Rule Number    Rule      Confidence (%)    Support (%)    ≥ mc
Rule 1         BC → E    100               50             Y
Rule 2         BE → C    66.7              50             Y
Rule 3         CE → B    100               50             Y

Also, because p(BCE)/p(B) = 2/3 = 66.7% is greater than mc, B → CE can be extracted as a valid rule. In the same way, because p(BCE)/p(C) = 2/3 = 66.7% is greater than mc, C → BE can be extracted as a valid rule; and because p(BCE)/p(E) = 2/3 = 66.7% is greater than mc, E → BC can be extracted as a valid rule. The association rules with 2-item consequences generated from BCE are listed in Table 6.

TABLE 6  Association Rules with 2-Item Consequences from 3-Itemsets

Rule Number    Rule      Confidence (%)    Support (%)    ≥ mc
Rule 4         B → CE    66.7              50             Y
Rule 5         C → BE    66.7              50             Y
Rule 6         E → BC    66.7              50             Y

For all frequent 2-itemsets in F, we can also generate all association rules, as illustrated in Table 7.

TABLE 7  Association Rules from 2-Itemsets

Rule Number    Rule     Confidence (%)    Support (%)    ≥ mc
Rule 7         A → C    100               50             Y
Rule 8         C → A    66.7              50             Y
Rule 9         B → C    66.7              50             Y
Rule 10        C → B    66.7              50             Y
Rule 11        B → E    100               75             Y
Rule 12        E → B    100               75             Y
Rule 13        C → E    66.7              50             Y
Rule 14        E → C    66.7              50             Y

According to the above analysis, the 14 association rules listed above can be extracted as valid rules for Table 1.
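The rule-generation step just illustrated amounts to enumerating the splits of a frequent itemset and keeping those whose confidence meets mc. A minimal Python sketch of this step, with supports hard-coded from Table 1 and illustrative names, is:

from itertools import combinations

def rules_from_itemset(Z, supp, mc):
    """supp maps frozensets to their support; rules are X -> Z-X with conf >= mc."""
    Z = frozenset(Z)
    rules = []
    for r in range(1, len(Z)):
        for X in map(frozenset, combinations(Z, r)):
            Y = Z - X
            c = supp[Z] / supp[X]
            if c >= mc:
                rules.append((set(X), set(Y), c, supp[Z]))
    return rules

# Supports from Table 1 (as fractions of the four transactions).
supp = {frozenset(s): v for s, v in {
    ("B",): 0.75, ("C",): 0.75, ("E",): 0.75,
    ("B", "C"): 0.5, ("B", "E"): 0.75, ("C", "E"): 0.5,
    ("B", "C", "E"): 0.5}.items()}

for X, Y, c, s in rules_from_itemset({"B", "C", "E"}, supp, mc=0.6):
    print(sorted(X), "->", sorted(Y), f"conf={c:.2f}", f"supp={s:.2f}")
# Reproduces Rules 1-6 of Tables 5 and 6.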
Research Directions in Association Rule Mining

As we have seen previously, association rules are useful in real-world applications and have played a fundamental role in the development of data mining. When a given dataset is very large, however, association rule mining is challenging. This is because the Apriori algorithm used for identifying frequent itemsets involves a search, with little heuristic guidance, over a space with an exponential number of items and possible itemsets, and the algorithm may suffer from large computational overhead when the number of frequent itemsets is very large. For example, suppose there are 1000 items in a given large database and the average number of items in each transaction is six; then there are almost 10^15 possible itemsets to be counted in the database. These observations have led to several research directions, as follows.

Frequent pattern (itemset) mining, such as the FP-growth method5 for frequent pattern (itemset) mining, frequent closed patterns,6 and sampling techniques7 for algorithm scale-up, has attracted much attention in data mining. Well-known mining methods include, for example, data structures for association rule mining,8 hashing techniques,9 partitioning,10,11 sampling,12 anytime mining,13 parallel and distributed mining,9,14–16 and integrating mining with relational database systems.17

Interestingness measures the strength of the relationship between itemsets X and Y. The prevailing measures are as follows:

(1) Interest factor18,19

    I(X → Y) = I(X, Y) = p(XY) / (p(X)p(Y))

It is a nonnegative real number, with value 1 corresponding to statistical independence.

(2) Pearson's correlation coefficient

    ϕ(X → Y) = ϕ(X, Y) = (p(XY) − p(X)p(Y)) / √(p(X)p(Y)(1 − p(X))(1 − p(Y)))

It ranges between −1 and 1. If X and Y are independent, then ϕ(X → Y) = 0.

(3) Cosine similarity

    cosine(X → Y) = cosine(X, Y) = p(XY) / √(p(X)p(Y)) = √(conf(X → Y) · conf(Y → X))

(4) Odds ratio

    α(X → Y) = α(X; Y) = log [p(XY)p(¬X¬Y) / (p(X¬Y)p(¬XY))]

(5) Rule interest20

    R(X → Y) = R(X, Y) = |p(XY) − p(X)p(Y)|

(6) Conviction18

    conviction(X → Y) = conviction(X, Y) = p(X)p(¬Y) / p(X¬Y)

(7) Certainty factor21

    CF(X → Y) = CF(X, Y) = (p(Y|X) − p(Y)) / (1 − p(Y)) = (p(XY) − p(X)p(Y)) / (p(X)(1 − p(Y)))

Note that, if p(Y) > p(Y|X), CF(X, Y) is defined as

    CF(X, Y) = (p(Y|X) − p(Y)) / (−p(Y)) = (p(XY) − p(X)p(Y)) / (−p(X)p(Y))

(8) Laplace measure

    laplace(X → Y) = laplace(X, Y) = (p(XY) + 1) / (p(X) + 2)

Obviously, it is very similar to the confidence.

(9) J measure22

    J(X → Y) = J(X, Y) = p(X) [ p(Y|X) log (p(Y|X)/p(Y)) + (1 − p(Y|X)) log ((1 − p(Y|X))/(1 − p(Y))) ]
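A few of these measures are easy to compute once the joint and marginal supports are known; the small Python helpers below illustrate this (the function names are ours, not standard APIs, and only p(XY), p(X), p(Y) are assumed as inputs).

import math

def interest(pxy, px, py):            # (1) interest factor
    return pxy / (px * py)

def phi(pxy, px, py):                 # (2) Pearson's correlation coefficient
    return (pxy - px * py) / math.sqrt(px * py * (1 - px) * (1 - py))

def rule_interest(pxy, px, py):       # (5) rule interest R(X, Y)
    return abs(pxy - px * py)

def certainty_factor(pxy, px, py):    # (7) certainty factor CF(X, Y)
    if pxy / px >= py:                # p(Y|X) >= p(Y): positive dependence
        return (pxy - px * py) / (px * (1 - py))
    return (pxy - px * py) / (-px * py)

# Example: the itemset BE from Table 1, with p(BE) = p(B) = p(E) = 0.75.
print(interest(0.75, 0.75, 0.75))          # about 1.33 > 1: positive correlation
print(certainty_factor(0.75, 0.75, 0.75))  # 1.0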
Discovery of complex association rules, for example, quantitative association rules,23 causal rules,24 and multilevel and multidimensional association rules,25–27 is another direction.

Quantitative association rule mining is designed for analyzing quantitative data over categorical attributes. An item over a categorical attribute can be expressed as either an interval (a continuous set of attribute values) or a single value, called a quantitative item. For example, 'Salary ∈ [50k, 70k]' is a quantitative item. If X is a quantitative item, X can be valued in a certain interval F, and the supp of X is the sum of the supps of all values in F. A quantitative association rule is a relationship between X and Y of the form X → Y, where X and Y are quantitative items. Quantitative association rule mining is based on the supp–conf framework, as are causal rule mining and multilevel and multidimensional association rule mining.

Automation of Mining Association Rules

Apriori-based mining algorithms are built on the assumption that users can specify the minimum support for their databases. That is, a frequent itemset (or an association rule) is interesting if its supp is larger than or equal to the minimum support. This creates a challenging issue: the performance of these algorithms heavily depends on user-specified thresholds. For example, if the minimum-support value is too big, nothing might be found in a database, whereas a small minimum support might lead to poor mining performance and the generation of many uninteresting association rules. Users are therefore unreasonably required to know details of the database to be mined in order to specify a suitable threshold. Han et al.6 have pointed out that setting the minimum support is quite subtle, which can hinder the widespread application of these algorithms; our own experience of mining transaction databases also tells us that the setting is by no means an easy task. In particular, even when a minimum support is chosen under the supervision of an experienced miner, we cannot examine whether the results (mined with the hunted minimum support) are just what users want. This means that the minimum-support setting is a key issue in automatic association rule mining.

Current techniques for addressing the minimum-support issue are as follows. In proposals for marketing, Piatetsky-Shapiro and Steingold28 proposed to identify only the top 10% or 20% of the prospects with the highest score. Han et al.6,25 designed a strategy to mine top-k frequent patterns for effectiveness and efficiency. In proposals for interesting itemset discovery, Cohen et al.29 developed a family of effective algorithms for finding interesting associations. In proposals for dealing with temporal data, Roddick and Rice30 discussed independent thresholds and context-dependent thresholds for measuring the time-varying interestingness of events. In proposals for exploring new strategies, Hipp and Guntzer31 presented a new mining approach that postpones constraints from mining to evaluation. In proposals for identifying new patterns, Wang et al.32,33 designed a conf-driven mining strategy without minimum support. However, these approaches only attempt to avoid specifying the minimum support.

Different from traditional association rule mining methods, database-independent mining techniques have been developed,34,35 in which users can specify a threshold of supp for a mining task without being required to know anything about the database. This has provided a way of developing automatic association rule mining systems.

Association Analysis for Different Data Sources

Data sources may, for example, be multiple, heterogeneous, incomplete, and dynamic. Well-known mining methods for such settings include local pattern analysis,36,37 selecting relevant databases for multidatabase mining,38 peculiarity discovery,39 local mining for finding sequential patterns,40 bridging local and global analysis for noise cleansing,41 classification from multiple sources,42 distributed data mining,43 and so on.
COMPLETE ASSOCIATION RULE ANALYSIS: MINING BOTH POSITIVE AND NEGATIVE RULES

Traditional association rule mining techniques are designed to find only the strong patterns that have high predictive accuracy or correlation, which we call positive association analysis. Such mining makes full use of frequent itemsets for real data analysis applications. Wu et al.21,44 studied negative association rule mining, which also utilizes the infrequent itemsets in datasets. A negative association rule is an implication of the form X → ¬Y (or ¬X → Y, or ¬X → ¬Y). The rule X → ¬Y enables us to predict that Y is unlikely to occur when X occurs. In other words, negative association rules capture mutually exclusive correlations among items. We refer to negative association rule mining as negative association analysis, and to mining both positive and negative association rules as complete association rule (CAR) analysis.

This section first recalls what a significant negative association rule is in 'Negative Association Rules'. A framework for complete association analysis is given in 'A Framework for Complete Association Analysis'. Finally, some research directions are outlined in 'Research Directions in CAR Analysis'.

Negative Association Rules

Although positive association rules are useful in decision-making, negative association rules also play important roles. For example, there are typically two types of trading behaviors (insider trading and market manipulation) that impair fair and efficient trading in securities stock markets. The objective of a market surveillance team is to ensure a fair and efficient trading environment for all participants through an alert system. Negative association rules assist in determining which alerts can be ignored. Assume that each piece of evidence A, B, C, D can cause an alert of unfair trading X. Given the rules A → ¬X and C → ¬X, the team can conclude that trading is fair when A or C occurs; in other words, alerts caused by A or C can be ignored. This example gives an insight into the importance of negative association rule mining. Furthermore, the development of negative association rule mining will allow companies to identify more business opportunities, by using infrequent itemsets of interest, than approaches that only take frequent itemsets into account. To capture these significant infrequent itemsets, this subsection introduces some basic concepts concerning negative association rule mining.

An infrequent itemset is an itemset that does not meet the user-specified ms, whereas a frequent itemset is an itemset that meets the user-specified ms.

The negation of an itemset X is denoted by ¬X, and the supp of ¬X is p(¬X) = 1 − p(X). In particular, for an itemset i1¬i2i3, its supp is p(i1¬i2i3) = p(i1i3) − p(i1i2i3).

We call a rule of the form X → Y a positive rule, and rules of the other forms negative rules. For convenience, we often use only the form X → ¬Y to represent and describe negative association rules in this article.

Like positive rules, a negative rule X → ¬Y also has a measure of its strength, conf, defined as the ratio p(X¬Y)/p(X).

By extending the definition given in Ref 3, negative association rule discovery seeks rules of the form X → ¬Y with supp and conf greater than, or equal to, the user-specified ms and mc thresholds, respectively, where

(1) X and Y are disjoint itemsets, that is, X ∩ Y = ∅;
(2) p(X) ≥ ms, p(Y) ≥ ms, and p(XY) < ms;
(3) supp(X → ¬Y) = p(X¬Y) ≥ ms;
(4) conf(X → ¬Y) = p(X¬Y)/p(X) ≥ mc.

In this article, a significant negative association rule X → ¬Y means a rule that satisfies the above four conditions. Accordingly, the infrequent itemset XY is statistically regarded as a significant itemset, or significant infrequent itemset. Below is an example of a negative association rule.

Example 1c. Suppose we have a market basket database from a grocery store, consisting of n baskets. Let us focus on the purchases of tea (denoted by t) and coffee (denoted by c).

When p(t) = 0.25 and p(tc) = 0.2, we can apply the supp–conf framework to the potential association rule t → c. The supp for this rule is 0.2, which is fairly high. The conf is the conditional probability that a customer who buys tea also buys coffee, that is, conf(t → c) = p(tc)/p(t) = 0.2/0.25 = 0.8, which is very high. In this case, we would conclude that the rule t → c is a valid one.

Now consider p(c) = 0.6, p(t) = 0.4, p(tc) = 0.05, and mc = 0.52. The conf of t → c is p(tc)/p(t) = 0.05/0.4 = 0.125 < mc = 0.52, and p(tc) = 0.05 is low. This indicates that tc is an infrequent itemset and that t → c cannot be extracted as a rule in the supp–conf framework. However, p(t¬c) = p(t) − p(tc) = 0.4 − 0.05 = 0.35 is high, and the conf of t → ¬c is the ratio p(t¬c)/p(t) = 0.35/0.4 = 0.875 > mc. Therefore, t → ¬c is a valid rule for this database.
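The arithmetic of Example 1 can be reproduced directly from the probabilities it gives; the short sketch below does so (the variable names are illustrative, and only the quantities stated in the example are used).

# Reproducing the second scenario of Example 1 (tea t, coffee c).
p_c, p_t, p_tc = 0.6, 0.4, 0.05
mc = 0.52

p_t_not_c = p_t - p_tc       # p(t¬c) = p(t) - p(tc) = 0.35
conf_pos = p_tc / p_t        # conf(t -> c)  = 0.125 < mc: t -> c is not valid
conf_neg = p_t_not_c / p_t   # conf(t -> ¬c) = 0.875 > mc: t -> ¬c is valid
print(p_t_not_c, conf_pos, conf_neg)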
Mining negative association rules is a difficult task, because there are essential differences between positive and negative association rule mining. We illustrate this with an example. Consider a transaction database TD = {(A, B, D); (B, C, D); (B, D); (B, C, D, E); (A, B, D, F)}, which has five transactions, separated by semicolonsd. Each transaction contains several items, separated by commas. There are at least 818 possible negative association rules generated from the 49 infrequent itemsets in TD when minsupp = 0.4. This illustrates the essential differences between negative association rule mining and positive association rule mining.

Because negative association rules are hidden in infrequent itemsets (with lower frequency), traditional pruning techniques are inefficient for identifying infrequent itemsets of interest.21 This means we must exploit alternative strategies to (1) confront an exponential search space consisting of all possible itemsets, frequent and infrequent, in a database; (2) detect which of the infrequent itemsets can generate negative association rules; (3) perceive which of the negative association rules are really useful to applications; and (4) measure the interestingness of both positive and negative association rules. These problems are very different from those faced in discovering positive association rules, and it is rather difficult to identify negative association rules of interest in databases.

This subsection has not introduced algorithms for identifying infrequent itemsets and negative association rules of interest; they are presented in the next subsection.

A Framework for Complete Association Analysis

As mentioned above, there can be an exponential number of infrequent itemsets in a database, and only some of them are useful for mining association rules of interest. Pruning is therefore critical to efficiently discovering complete associations of interest. In this subsection, we first design a pruning strategy and the mining framework, then a procedure for identifying frequent and infrequent itemsets of interest, and finally the algorithm for generating complete associations of interest.

A Pruning Strategye and a Complete Association Mining Framework

Following the interest factor,18,19 we use the interestingness function R(X, Y) = |p(XY) − p(X)p(Y)|20 and a threshold minimum interestingness (mi). A rule X → Y is of potential interest when R(X, Y) ≥ mi, and XY is then referred to as a potentially interesting itemset. Combining this R(X, Y) mechanism with the CF model,21 we can formally define the condition under which Z is a frequent itemset of potential interest as follows:

    fipi(Z) = (p(Z) ≥ ms) ∧ (∃X, Y ⊂ Z ∧ Z = XY) ∧ fipis(X, Y)

    fipis(X, Y) = (X ∩ Y = ∅) ∧ (g(X, Y, mc, mi) = 1)

where g(X, Y, mc, mi) = s(X, Y) ∨ s(Y, X), and

    s(X, Y) = 1 if R(X, Y) ≥ mi ∧ CF(X, Y) ≥ mc; 0 otherwise.

In contrast, to mine negative association rules, all itemsets that may yield negative association rules in a given database need to be considered. For example, if X → ¬Y can be discovered as a valid rule, then p(X¬Y) ≥ ms must hold. If ms is high, p(X¬Y) ≥ ms would mean that p(XY) < ms, and the itemset XY cannot be generated as a frequent itemset by existing association analysis algorithms. In other words, XY is an infrequent itemset. However, there are too many infrequent itemsets in databases, and we must define some conditions for identifying infrequent itemsets of interest.

If X is a frequent itemset and Y is an infrequent itemset with frequency 1 in a large database, then X → ¬Y certainly looks like a valid negative rule, because p(X) ≥ ms, p(Y) ≈ 0, p(X¬Y) ≈ p(X) ≥ ms, and conf(X → ¬Y) = p(X¬Y)/p(X) ≈ 1 ≥ mc. This could indicate that the rule X → ¬Y is valid, and the number of itemsets of this type in a given database can be very large; for example, rarely purchased products in a supermarket are always infrequent itemsets.

However, in practice, more attention is paid to frequent itemsets, and any patterns mined from databases would mostly involve frequent itemsets only. This means that if X → ¬Y (or ¬X → Y, or ¬X → ¬Y) is a negative rule of interest, X and Y should be frequent itemsets. In other words, no matter whether association rules are positive or negative, we are only interested in relationships among frequent itemsets. To operationalize this insight, we can use the support measure p: if p(X) ≥ ms and p(Y) ≥ ms, the rule X → ¬Y is of potential interest, and XY is referred to as a potentially interesting itemset.
Combining the above insight, the R(X, Y) mechanism, and the CF model, Z is an infrequent itemset of potential interest when:

    iipi(Z) = (∃X, Y ⊂ Z ∧ Z = XY) ∧ (p(XY) < ms) ∧ iipis(X, Y)

    iipis(X, Y) = (p(X) ≥ ms) ∧ (p(Y) ≥ ms) ∧ (X ∩ Y = ∅) ∧ (h(X, Y, ms, mc, mi) = 1)

where h(X, Y, ms, mc, mi) = t(X, ¬Y) ∨ t(¬X, Y) ∨ t(¬X, ¬Y) ∨ t(Y, ¬X) ∨ t(¬Y, X), and

    t(X, Y) = 1 if R(X, Y) ≥ mi ∧ CF(X, Y) ≥ mc ∧ g(X, Y, mc, mi); 0 otherwise.

Note that we can define infrequent itemsets of potential interest for rules of the forms ¬X → Y and ¬X → ¬Y accordingly. This article uses only the form X → ¬Y to represent and describe negative rules, for convenience.

Using the frequent itemset of potential interest (fipi) and infrequent itemset of potential interest (iipi) mechanisms for both positive and negative rule discovery, the search is constrained to seek interesting rules under certain measures, and pruning removes uninteresting branches that cannot lead to an interesting rule satisfying those constraints.

On the basis of the measures fipis and iipis, a framework for identifying complete associations of interest is defined as follows:

(1) Generate the set PL of frequent itemsets and the set NL of infrequent itemsets.
(2) Extract positive rules of the form X → Y from PL, and negative rules of the forms X → ¬Y, ¬X → Y, and ¬X → ¬Y from NL.

Mining complete associations (both positive and negative association rules) of interest can thus be decomposed into the above two subproblems. We carry out these two in 'Searching for Frequent and Infrequent Itemsets of Interest' and 'Identifying Complete Associations of Interest', respectively, and their use is illustrated with a dataset in 'Complete Association Analysis: An Illustration'.
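As a reading aid, the predicates above can be phrased directly in code; the sketch below does so in terms of precomputed supports, with R and CF as defined earlier. All names are ours, the CF helper implements both branches of the definition, and only one branch of h (the X → ¬Y form) is spelled out.

# Sketch of the fipis/iipis-style checks, given supports p(XY), p(X), p(Y).

def R(pxy, px, py):
    return abs(pxy - px * py)

def CF(pxy, px, py):
    if pxy / px >= py:                      # p(Y|X) >= p(Y)
        return (pxy - px * py) / (px * (1 - py))
    return (pxy - px * py) / (-px * py)

def s(pxy, px, py, mc, mi):
    return R(pxy, px, py) >= mi and CF(pxy, px, py) >= mc

def fipis(pxy, px, py, mc, mi):
    # g(X, Y, mc, mi) = s(X, Y) or s(Y, X); X and Y are assumed disjoint.
    return s(pxy, px, py, mc, mi) or s(pxy, py, px, mc, mi)

def iipis_x_noty(px_noty, px, py, ms, mc, mi):
    # The t(X, ¬Y) branch of h(...), i.e., the X -> ¬Y form only.
    return px >= ms and py >= ms and s(px_noty, px, 1 - py, mc, mi)

# Example with the itemset BF of the later illustration: p(BF)=0.3, p(B)=0.6, p(F)=0.5.
print(fipis(0.3, 0.6, 0.5, mc=0.6, mi=0.05))   # False: BF would be pruned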
Searching for Frequent and Infrequent Itemsets of Interest

Many frequent itemsets relate to positive rules that are not of interest, and many infrequent itemsets relate to negative rules that are not of interest. The search space can be significantly reduced if the extracted itemsets are restricted to frequent and infrequent itemsets of potential interest. For this reason, we now construct an efficient algorithm for finding frequent itemsets of potential interest and infrequent itemsets of potential interest in a database.

Algorithm 2. All Itemsets Of Interest
Input: D: a database; ms: minimum support; mc: minimum confidence; mi: minimum interest;
Output: PL: a set of frequent itemsets of interest; NL: a set of infrequent itemsets of interest;
(1) //Apriori Algorithm
    let PL ← {};
(2) let L1 ← {frequent 1-itemsets}; PL ← PL ∪ L1;
(3) for (k = 2; Lk−1 ≠ {}; k++) do begin
    //Generate all possible frequent k-itemsets of interest in D.
    (3.1) let Temk ← {{x1, ..., xk−2, xk−1, xk} | {x1, ..., xk−2, xk−1} ∈ Lk−1 ∧ {x1, ..., xk−2, xk} ∈ Lk−1};
    (3.2) for each transaction t in D do begin
        //Check which k-itemsets are included in transaction t.
        let Temt ← the k-itemsets in t that are also contained in Temk;
        for each itemset A in Temt do
            let A.count ← A.count + 1;
        end for
    (3.3) let Lk ← {c | c ∈ Temk ∧ (p(c) = c.count/|D|) ≥ ms};
    (3.4) let PL ← PL ∪ Lk;
    end (3)
(4) let NL ← {};
    //Generate all possible infrequent itemsets of interest in D.
    for any X and Y in PL with X ∩ Y = ∅ do
        if XY ∉ PL then
            let NL ← NL ∪ {(X, Y)}f
(5) //Prune all uninteresting itemsets in PL
    for each itemset Z in PL do
        if NOT(fipi(Z)) then
            let PL ← PL − {Z};
(6) //Prune all uninteresting itemsets in NL
    for each itemset (X, Y) in NL do
        if NOT(iipi(XY)) then
            let NL ← NL − {(X, Y)};
(7) output PL and NL;
(8) return.

The procedure All Itemsets Of Interest generates all frequent and infrequent itemsets of interest in a given database D, where PL is the set of all frequent itemsets of interest in D, and NL is the set of all infrequent itemsets of interest in D. PL and NL contain only frequent and infrequent itemsets of interest, respectively.

The initialization is done in Step (1). Step (2) generates L1 of all frequent 1-itemsets in database D in the first pass of D. Step (3) generates Lk for k ≥ 2 by a loop, where Lk is the set of all frequent k-itemsets of interest in the kth pass of D, and the end condition of the loop is Lk−1 = {}. For each pass of the database in Step (3), say pass k, there are four substeps. Step (3.1) generates Temk of all candidate k-itemsets in D, where each k-itemset in Temk is generated from two frequent itemsets in Lk−1. Each itemset in Temk is counted in D by a loop in Step (3.2). Then Lk is generated in Step (3.3); Lk is the set of all potentially useful frequent k-itemsets in Temk, where all frequent k-itemsets in Lk meet ms. Finally, Lk is added to PL in Step (3.4).

Step (4) generates NL, the set of all potentially useful infrequent itemsets in D, that is, pairs of frequent itemsets whose union does not meet ms.

Steps (5) and (6) select all frequent and infrequent itemsets of interest. In Step (5), if an itemset Z in PL does not satisfy fipi(Z), then Z is an uninteresting frequent itemset and is removed from PL. After all uninteresting frequent itemsets have been removed from PL, Step (6) removes from NL every pair (X, Y) that does not satisfy iipi(XY), so that all uninteresting infrequent itemsets are removed from NL.

Step (7) outputs the frequent and infrequent itemsets of potential interest in PL and NL. The procedure ends in Step (8).
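Step (4) of Algorithm 2 simply pairs disjoint frequent itemsets whose union is not itself frequent. A compact Python sketch of that step, under the assumption that PL is given as a collection of frozensets, is:

def candidate_infrequent_pairs(PL):
    """NL candidates: disjoint pairs (X, Y) from PL whose union XY is not in PL."""
    members = set(PL)
    items = list(PL)
    NL = []
    for i, X in enumerate(items):
        for Y in items[i + 1:]:
            if not (X & Y) and (X | Y) not in members:
                NL.append((X, Y))
    return NL

PL = {frozenset(s) for s in [("A",), ("B",), ("C",), ("A", "B")]}
print(len(candidate_infrequent_pairs(PL)))   # 3: (A, C), (B, C), (C, AB)

Note that this sketch enumerates each unordered pair once, whereas the worked example below walks through PL itemset by itemset; the resulting candidate pairs are the same.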
Identifying Complete Associations of Interest

Let I be the set of items in a database TD, i = XY ⊂ I be an itemset, X ∩ Y = ∅, p(X) ≠ 0, p(Y) ≠ 0, and let ms, mc, and mi > 0 be given by the user. There are four possible rules between X and Y:

(i) If p(XY) ≥ ms, R(X, Y) ≥ mi, and CF(X, Y) ≥ mc, then X → Y is a positive rule of interest.
(ii) If p(X¬Y) ≥ ms, p(X) ≥ ms, p(Y) ≥ ms, R(X, ¬Y) ≥ mi, and CF(X, ¬Y) ≥ mc, then X → ¬Y is a negative rule of interest.
(iii) If p(¬XY) ≥ ms, p(X) ≥ ms, p(Y) ≥ ms, R(¬X, Y) ≥ mi, and CF(¬X, Y) ≥ mc, then ¬X → Y is a negative rule of interest.
(iv) If p(¬X¬Y) ≥ ms, p(X) ≥ ms, p(Y) ≥ ms, R(¬X, ¬Y) ≥ mi, and CF(¬X, ¬Y) ≥ mc, then ¬X → ¬Y is a negative rule of interest.

Case (i) defines positive association rules of interest, whereas Cases (ii), (iii), and (iv) define negative association rules of interest. Here, p(∗) ≥ ms guarantees that an association rule describes the relationship between two frequent itemsets; the mi requirement makes sure that the association rule is of interest; and CF(∗, ∗) ≥ mc specifies the conf constraint.

Let D be a database, and let ms, mc, and mi be given by the user. Our algorithm for extracting both positive and negative association rules, with the CF model used for conf checking, is designed as follows:

Algorithm 3. Complete Association
Input: D: a database; ms, mc, mi: threshold values;
Output: association rules;
Step (1) call procedure All Itemsets Of Interest;
Step (2) // Generate positive association rules in PL.
    for each frequent itemset Z in PL do
        for each expression XY = Z with X ∩ Y = ∅ do begin
            if fipis(X, Y) then
                if CF(X, Y) ≥ mc then
                    output the rule X → Y with confidence CF(X, Y) and support p(Z);
                if CF(Y, X) ≥ mc then
                    output the rule Y → X with confidence CF(Y, X) and support p(Z);
        end for;
Step (3) // Generate all negative association rules in NL.
    for each itemset (X, Y) in NL do
        if iipis(X, Y) then begin
            if CF(¬X, Y) ≥ mc then
                output the rule ¬X → Y with confidence CF(¬X, Y) and support p(¬XY);
            if CF(Y, ¬X) ≥ mc then
                output the rule Y → ¬X with confidence CF(Y, ¬X) and support p(Y¬X);
            if CF(X, ¬Y) ≥ mc then
                output the rule X → ¬Y with confidence CF(X, ¬Y) and support p(X¬Y);
            if CF(¬Y, X) ≥ mc then
                output the rule ¬Y → X with confidence CF(¬Y, X) and support p(¬YX);
            if CF(¬X, ¬Y) ≥ mc then
                output the rule ¬X → ¬Y with confidence CF(¬X, ¬Y) and support p(¬X¬Y);
            if CF(¬Y, ¬X) ≥ mc then
                output the rule ¬Y → ¬X with confidence CF(¬Y, ¬X) and support p(¬Y¬X);
        end if
Step (4) return.

Complete Association generates not only all positive association rules in PL but also the negative association rules in NL. Step (1) calls procedure All Itemsets Of Interest to generate the sets PL and NL of frequent and infrequent itemsets of interest, respectively, in the database D.

Step (2) generates positive association rules of interest for each expression XY of Z in PL with fipis(X, Y). If CF(X, Y) ≥ mc, X → Y is extracted as a valid rule of interest, with conf CF(X, Y) and support p(XY). If CF(Y, X) ≥ mc, Y → X is extracted as a valid rule of interest, with conf CF(Y, X) and support p(XY).

Step (3) generates negative association rules of interest for each infrequent itemset (X, Y) in NL with iipis(X, Y). If CF(¬X, Y) ≥ mc, ¬X → Y is extracted as a valid rule of interest. If CF(Y, ¬X) ≥ mc, Y → ¬X is extracted as a valid rule of interest. If CF(X, ¬Y) ≥ mc, X → ¬Y is extracted as a valid rule of interest. If CF(¬Y, X) ≥ mc, ¬Y → X is extracted as a valid rule of interest. If CF(¬X, ¬Y) ≥ mc, ¬X → ¬Y is extracted as a valid rule of interest. If CF(¬Y, ¬X) ≥ mc, ¬Y → ¬X is extracted as a valid rule of interest.
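For a single candidate pair (X, Y) with known supports, the checks performed in Steps (2) and (3) reduce to evaluating CF on the appropriate (possibly negated) supports, using p(X¬Y) = p(X) − p(XY), p(¬XY) = p(Y) − p(XY), and p(¬X¬Y) = 1 − p(X) − p(Y) + p(XY). The hedged sketch below evaluates only the forms with X (or ¬X) as antecedent and, for simplicity, only accepts rules with positive dependence between antecedent and consequent; the symmetric checks with Y as antecedent, and the R(·,·) ≥ mi and support conditions, are analogous and are assumed to hold. The helper names are ours.

def CF(pab, pa, pb):
    """Certainty factor of A -> B under positive dependence (p(B|A) >= p(B))."""
    return (pab - pa * pb) / (pa * (1 - pb))

def candidate_rules(px, py, pxy, mc):
    p_nx, p_ny = 1 - px, 1 - py
    joints = {
        ("X", "Y"):   (pxy, px, py),
        ("X", "¬Y"):  (px - pxy, px, p_ny),
        ("¬X", "Y"):  (py - pxy, p_nx, py),
        ("¬X", "¬Y"): (1 - px - py + pxy, p_nx, p_ny),
    }
    return [(a, b) for (a, b), (pab, pa, pb) in joints.items()
            if pab / pa >= pb and CF(pab, pa, pb) >= mc]

# Pair (E, BC) from the illustration below: p(E)=0.3, p(BC)=0.3, p(EBC)=0.
print(candidate_rules(0.3, 0.3, 0.0, mc=0.6))   # [('X', '¬Y')], i.e., E -> ¬BC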
Complete Association Analysis: An Illustration

This subsection illustrates the use of the All Itemsets Of Interest and Complete Association algorithms with the data in Table 8g.

TABLE 8  A Transaction Database TD

Transaction ID    Items
T1                A, B, D
T2                A, B, C, D
T3                B, D
T4                C, D, E
T5                A, E
T6                B, D, F
T7                A, E, F
T8                C, F
T9                B, C, F
T10               A, B, C, D, F

Suppose we have the transaction database TD with the 10 transactions in Table 8, taken from a grocery store. Let A = bread, B = coffee, C = tea, D = sugar, E = beer, F = butter, ms = 0.3, mc = 0.6, and mi = 0.05.

The All Itemsets Of Interest algorithm works as follows. From Steps (1)–(3), PL is generated in the same way as in the Apriori algorithm, where PL = {A, B, C, D, E, F, AB, AD, BC, BD, BF, CD, CF, ABD}. The frequent itemsets and their frequencies are listed in Table 9.

TABLE 9  All Frequent Itemsets and Their Frequencies in TD

Itemset    Number of Transactions    Support        Itemset    Number of Transactions    Support
A          5                         0.5            AD         3                         0.3
B          6                         0.6            BC         3                         0.3
C          5                         0.5            BD         5                         0.5
D          6                         0.6            BF         3                         0.3
E          3                         0.3            CD         3                         0.3
F          5                         0.5            CF         3                         0.3
AB         3                         0.3            ABD        3                         0.3

Step (4) is a loop that generates the potentially useful infrequent itemsets NL. For simplicity, we can carry out this loop as follows:

(1) For itemset A, because the intersection of each of {B, C, D, E, F, BC, BD, BF, CD, CF} in PL with A is empty, {(A, B); (A, C); (A, D); (A, E); (A, F); (A, BC); (A, BD); (A, BF); (A, CD); (A, CF)} is a set of candidate infrequent itemsets. Because AB, AD, and ABD are in PL, {(A, C); (A, E); (A, F); (A, BC); (A, BF); (A, CD); (A, CF)} is the set of potentially useful infrequent itemsets.

(2) For itemset B, because the intersection of each of {C, D, E, F, AD, CD, CF} in PL with B is empty, {(B, C); (B, D); (B, E); (B, F); (B, AD); (B, CF)} is a set of candidate infrequent itemsets. Because BC, BD, BF, and ABD are in PL, {(B, E); (B, CD)} is the set of potentially useful infrequent itemsets.

(3) For itemset C, because the intersection of each of {D, E, F, AB, AD, BD, BF, ABD} in PL with C is empty, {(C, D); (C, E); (C, F); (C, AB); (C, AD); (C, BD); (C, BF); (C, ABD)} is a set of candidate infrequent itemsets. Because CD and CF are in PL, {(C, E); (C, AB); (C, AD); (C, BD); (C, BF); (C, ABD)} is the set of potentially useful infrequent itemsets.

(4) For itemset D, because the intersection of each of {E, F, AB, BC, BF, CF} in PL with D is empty, {(D, E); (D, F); (D, AB); (D, BC); (D, BF); (D, CF)} is a set of candidate infrequent itemsets. Because ABD is in PL, {(D, E); (D, F); (D, BC); (D, BF); (D, CF)} is the set of potentially useful infrequent itemsets.

(5) For itemset E, because the intersection of each of {F, AB, AD, BC, BD, BF, CD, CF, ABD} in PL with E is empty, {(E, F); (E, AB); (E, AD); (E, BC); (E, BD); (E, BF); (E, CD); (E, CF); (E, ABD)} is a set of candidate infrequent itemsets. Because none of them is in PL, all of them are potentially useful infrequent itemsets.

(6) For itemset F, because the intersection of each of {AB, AD, BC, BD, CD, ABD} in PL with F is empty, {(F, AB); (F, AD); (F, BC); (F, BD); (F, CD); (F, ABD)} is a set of candidate infrequent itemsets. Because none of them is in PL, all of them are potentially useful infrequent itemsets.

(7) For itemset AB, because the intersection of each of {CD, CF} in PL with AB is empty, {(AB, CD); (AB, CF)} is a set of candidate infrequent itemsets. Because none of them is in PL, all of them are potentially useful infrequent itemsets.

(8) For itemset AD, because the intersection of each of {BC, BF, CF} in PL with AD is empty, {(AD, BC); (AD, BF); (AD, CF)} is a set of candidate infrequent itemsets. Because none of them is in PL, all of them are potentially useful infrequent itemsets.

(9) For itemset BC, because no itemset in PL can be combined with BC to generate candidate infrequent itemsets, there is no potentially useful infrequent itemset in this case.

(10) For itemset BD, because only CF in PL can be combined with BD to generate a candidate infrequent itemset and the union is not in PL, (BD, CF) is a potentially useful infrequent itemset.

(11) For itemset BF, because only CD in PL can be combined with BF to generate a candidate infrequent itemset and the union is not in PL, (BF, CD) is a potentially useful infrequent itemset.

(12) For itemset CD, because no itemset in PL can be combined with CD to generate candidate infrequent itemsets, there is no potentially useful infrequent itemset in this case.

(13) For itemset CF, because only ABD in PL can be combined with CF to generate a candidate infrequent itemset and the union is not in PL, (CF, ABD) is a potentially useful infrequent itemset.

(14) For itemset ABD, no itemset in PL can be combined with ABD to generate candidate infrequent itemsets.

Therefore, we have

NL = {(A, C, 2); (A, E, 2); (A, F, 2); (A, BC, 2); (A, BF, 1); (A, CD, 2); (A, CF, 1);
      (B, E, 0); (B, CD, 2);
      (C, E, 1); (C, AB, 2); (C, AD, 2); (C, BD, 2); (C, BF, 2); (C, ABD, 2);
      (D, E, 1); (D, F, 2); (D, BC, 2); (D, BF, 2); (D, CF, 1);
      (E, F, 1); (E, AB, 0); (E, AD, 0); (E, BC, 0); (E, BD, 0); (E, BF, 0); (E, CD, 1); (E, CF, 0); (E, ABD, 0);
      (F, AB, 1); (F, AD, 1); (F, BC, 2); (F, BD, 2); (F, CD, 1); (F, ABD, 1);
      (AB, CD, 2); (AB, CF, 1);
      (AD, BC, 2); (AD, BF, 1); (AD, CF, 1);
      (BD, CF, 1);
      (BF, CD, 1);
      (CF, ABD, 1)}

There are 43 pairs of infrequent itemsets, written in the form (X, Y, x) only to simplify the description: X and Y are itemsets and x is the frequency of the itemset XY.
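The supports in Table 9, and the frequencies x attached to the pairs in NL, can be re-derived mechanically from the ten transactions; the sketch below does this (itemset strings such as 'ABD' stand for sets of single-letter items, as in the text).

# Recount supports in TD (Table 8) to verify Table 9 and the NL frequencies.
TD = ["ABD", "ABCD", "BD", "CDE", "AE", "BDF", "AEF", "CF", "BCF", "ABCDF"]

def count(itemset):
    return sum(1 for t in TD if set(itemset) <= set(t))

for s in ["A", "B", "E", "AB", "BD", "ABD"]:
    print(s, count(s), count(s) / len(TD))   # matches Table 9

print(count("BCE"), count("ACD"))            # 0 and 2: the x values of (E, BC) and (A, CD)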
Step (5) is a loop that prunes uninteresting itemsets from PL. We illustrate this step with the frequent 2-itemset BF and the 3-itemset ABD.

(i) Considering BF and the rule B → F, we have p(BF) = 0.3 = ms, and

    R(B, F) = |p(BF) − p(B)p(F)| = |0.3 − 0.6 × 0.5| = 0 < mi

    CF(B, F) = (p(BF) − p(B)p(F)) / (p(B)(1 − p(F))) = (0.3 − 0.6 × 0.5) / (0.6 × (1 − 0.5)) = 0 < mc

This means s(B, F) = 0 and the function fipi is false. Similarly, for F → B we have s(F, B) = 0. Therefore, g(F, B, 0.6, 0.05) = 0, the function fipi is false, and BF is uninteresting and is removed from PL.

(ii) Considering ABD and the rule AB → D, we have p(ABD) = 0.3 = ms, and

    R(AB, D) = |p(ABD) − p(AB)p(D)| = |0.3 − 0.3 × 0.6| = 0.12 > mi

    CF(AB, D) = (p(ABD) − p(AB)p(D)) / (p(AB)(1 − p(D))) = (0.3 − 0.3 × 0.6) / (0.3 × (1 − 0.6)) = 1 > mc

This means s(AB, D) = 1, g(AB, D, 0.6, 0.05) = 1, and the function fipi is true. Therefore, ABD is of interest, because there is at least one pair (AB, D) satisfying all the conditions for a frequent itemset of potential interest, and ABD is not removed from PL.

Step (6) is also a loop, pruning uninteresting itemsets from NL. As above, we illustrate this step with the pairs (A, C) and (E, BC).

(a) Considering (A, C) and the rule A → ¬C, we have p(AC) = 0.2 < ms, p(A) = 0.5 > ms, p(C) = 0.5 > ms, p(A¬C) = 0.3 ≥ ms, and

    R(A, ¬C) = |p(A¬C) − p(A)p(¬C)| = |0.3 − 0.5 × 0.5| = 0.05 = mi

    CF(A, ¬C) = (p(A¬C) − p(A)p(¬C)) / (p(A)(1 − p(¬C))) = (0.3 − 0.5 × 0.5) / (0.5 × (1 − 0.5)) = 0.2 < mc

This means t(A, ¬C) = 0.

For ¬C → A, we have p(AC) = 0.2 < ms, p(A) = 0.5 > ms, p(C) = 0.5 > ms, p(¬CA) = 0.3 ≥ ms, and

    R(¬C, A) = |p(¬CA) − p(¬C)p(A)| = |0.3 − 0.5 × 0.5| = 0.05 = mi

    CF(¬C, A) = (p(¬CA) − p(¬C)p(A)) / (p(¬C)(1 − p(A))) = (0.3 − 0.5 × 0.5) / (0.5 × (1 − 0.5)) = 0.2 < mc

This means t(¬C, A) = 0.

For ¬A → C, we have p(AC) = 0.2 < ms, p(A) = 0.5 > ms, p(C) = 0.5 > ms, p(¬AC) = 0.3 ≥ ms, and

    R(¬A, C) = |p(¬AC) − p(¬A)p(C)| = |0.3 − 0.5 × 0.5| = 0.05 = mi

    CF(¬A, C) = (p(¬AC) − p(¬A)p(C)) / (p(¬A)(1 − p(C))) = (0.3 − 0.5 × 0.5) / (0.5 × (1 − 0.5)) = 0.2 < mc

This means t(¬A, C) = 0.
For C → ¬A, we have p(AC) = 0.2 < ms, p(A) = 0.5 > ms, p(C) = 0.5 > ms, p(C¬A) = 0.3 ≥ ms, and

    R(C, ¬A) = |p(C¬A) − p(C)p(¬A)| = |0.3 − 0.5 × 0.5| = 0.05 = mi

    CF(C, ¬A) = (p(C¬A) − p(C)p(¬A)) / (p(C)(1 − p(¬A))) = (0.3 − 0.5 × 0.5) / (0.5 × (1 − 0.5)) = 0.2 < mc

This means t(C, ¬A) = 0.

For ¬A → ¬C, p(¬A¬C) = 0.2 < ms, and the same holds for ¬C → ¬A.

From the above, h(A, C, 0.6, 0.05) = 0, AC is uninteresting, and (A, C) is removed from NL.

(b) Considering (E, BC) and the rule E → ¬BC, we have p(EBC) = 0 < ms, p(E) = 0.3 = ms, p(BC) = 0.3 = ms, p(E¬BC) = 0.3 ≥ ms, and

    R(E, ¬BC) = |p(E¬BC) − p(E)p(¬BC)| = |0.3 − 0.3 × 0.7| = 0.09 > mi

    CF(E, ¬BC) = (p(E¬BC) − p(E)p(¬BC)) / (p(E)(1 − p(¬BC))) = (0.3 − 0.3 × 0.7) / (0.3 × (1 − 0.7)) = 1 > mc

This means s(E, ¬BC) = 1, h(E, BC, 0.6, 0.05) = 1, and the function iipi is true. Therefore, (E, BC) is of interest, because there is at least one rule E → ¬BC satisfying all the conditions for an infrequent itemset of potential interest, and (E, BC) is not removed from NL.

In Step (7), PL and NL are output. The All Itemsets Of Interest algorithm terminates in Step (8).

The Complete Association algorithm works as follows. Step (1) calls the All Itemsets Of Interest procedure to generate PL and NL. We illustrate the use of the Complete Association algorithm with the same frequent itemset ABD and infrequent itemset (E, BC) as above.

Step (2) is a loop that generates positive association rules from PL. We only demonstrate the loop with the frequent itemset ABD. Because g(AB, D, 0.6, 0.05) = 1 and the function fipi is true for ABD, we can generate some interesting association rules from the frequent itemset ABD.

For AB → D, we have p(ABD) = 0.3 = ms, and

    R(AB, D) = |p(ABD) − p(AB)p(D)| = |0.3 − 0.3 × 0.6| = 0.12 > mi

    CF(AB, D) = (p(ABD) − p(AB)p(D)) / (p(AB)(1 − p(D))) = (0.3 − 0.3 × 0.6) / (0.3 × (1 − 0.6)) = 1 > mc

This means s(AB, D) = 1, g(AB, D, 0.6, 0.05) = 1, and the function fipi is true. Therefore, AB → D is an association rule of interest.

For AD → B, we have p(ABD) = 0.3 = ms, and

    R(AD, B) = |p(ABD) − p(AD)p(B)| = |0.3 − 0.3 × 0.6| = 0.12 > mi

    CF(AD, B) = (p(ABD) − p(AD)p(B)) / (p(AD)(1 − p(B))) = (0.3 − 0.3 × 0.6) / (0.3 × (1 − 0.6)) = 1 > mc

This means s(AD, B) = 1, g(AD, B, 0.6, 0.05) = 1, and the function fipi is true. Therefore, AD → B is an association rule of interest.

For BD → A, we have p(ABD) = 0.3 = ms, and

    R(BD, A) = |p(ABD) − p(BD)p(A)| = |0.3 − 0.5 × 0.5| = 0.05 = mi

    CF(BD, A) = (p(ABD) − p(BD)p(A)) / (p(BD)(1 − p(A))) = (0.3 − 0.5 × 0.5) / (0.5 × (1 − 0.5)) = 0.2 < mc

This means s(BD, A) = 0. Therefore, BD → A is not of interest.

For D → AB, we have p(ABD) = 0.3 = ms, and

    R(D, AB) = |p(ABD) − p(D)p(AB)| = |0.3 − 0.6 × 0.3| = 0.12 > mi

    CF(D, AB) = (p(ABD) − p(D)p(AB)) / (p(D)(1 − p(AB))) = (0.3 − 0.6 × 0.3) / (0.6 × (1 − 0.3)) = 0.2857 < mc

This means s(D, AB) = 0. Therefore, D → AB is not of interest.
For B → AD, we have p(ABD) = 0.3 = ms, and

    R(B, AD) = |p(ABD) − p(B)p(AD)| = |0.3 − 0.6 × 0.3| = 0.12 > mi

    CF(B, AD) = (p(ABD) − p(B)p(AD)) / (p(B)(1 − p(AD))) = (0.3 − 0.6 × 0.3) / (0.6 × (1 − 0.3)) = 0.2857 < mc

This means s(B, AD) = 0. Therefore, B → AD is not of interest.

For A → BD, we have p(ABD) = 0.3 = ms, and

    R(A, BD) = |p(ABD) − p(A)p(BD)| = |0.3 − 0.5 × 0.5| = 0.05 = mi

    CF(A, BD) = (p(ABD) − p(A)p(BD)) / (p(A)(1 − p(BD))) = (0.3 − 0.5 × 0.5) / (0.5 × (1 − 0.5)) = 0.2 < mc

This means s(A, BD) = 0. Therefore, A → BD is not of interest.

From the above, two positive association rules, AB → D with conf CF(AB, D) = 1 and support p(ABD) = 0.3, and AD → B with conf CF(AD, B) = 1 and support p(ABD) = 0.3, are output as valid rules.

Step (3) is also a loop, which generates negative association rules from NL. We only demonstrate the loop with the infrequent itemset (E, BC). Because h(E, BC, 0.6, 0.05) = 1 and the function iipi is true for (E, BC), we can generate some interesting negative association rules from the infrequent itemset (E, BC).

For E → ¬BC, we have p(EBC) = 0 < ms, p(E) = 0.3 = ms, p(BC) = 0.3 = ms, p(E¬BC) = 0.3 ≥ ms, and

    R(E, ¬BC) = |p(E¬BC) − p(E)p(¬BC)| = |0.3 − 0.3 × 0.7| = 0.09 > mi

    CF(E, ¬BC) = (p(E¬BC) − p(E)p(¬BC)) / (p(E)(1 − p(¬BC))) = (0.3 − 0.3 × 0.7) / (0.3 × (1 − 0.7)) = 1 > mc

This means s(E, ¬BC) = 1, h(E, BC, 0.6, 0.05) = 1, and the function iipi is true. Therefore, E → ¬BC is a negative association rule of interest.

For ¬E → BC, we have p(EBC) = 0 < ms, p(E) = 0.3 = ms, p(BC) = 0.3 = ms, p(¬EBC) = 0.3 ≥ ms, and

    R(¬E, BC) = |p(¬EBC) − p(¬E)p(BC)| = |0.3 − 0.7 × 0.3| = 0.09 > mi

    CF(¬E, BC) = (p(¬EBC) − p(¬E)p(BC)) / (p(¬E)(1 − p(BC))) = (0.3 − 0.7 × 0.3) / (0.7 × (1 − 0.3)) = 0.1837 < mc

This means s(¬E, BC) = 0. Therefore, ¬E → BC is not of interest.

For ¬E → ¬BC, we have p(EBC) = 0 < ms, p(E) = 0.3 = ms, p(BC) = 0.3 = ms, p(¬E¬BC) = 0.4 > ms, and

    R(¬E, ¬BC) = |p(¬E¬BC) − p(¬E)p(¬BC)| = |0.4 − 0.7 × 0.7| = 0.09 > mi

    CF(¬E, ¬BC) = (p(¬E¬BC) − p(¬E)p(¬BC)) / (−p(¬E)p(¬BC)) = (0.4 − 0.7 × 0.7) / (−0.7 × 0.7) = 0.1837 < mc

where p(¬BC) > p(¬BC|¬E). This means s(¬E, ¬BC) = 0. Therefore, ¬E → ¬BC is not of interest.

For BC → ¬E, we have p(EBC) = 0 < ms, p(E) = 0.3 = ms, p(BC) = 0.3 = ms, p(BC¬E) = 0.3 ≥ ms, and

    R(BC, ¬E) = |p(BC¬E) − p(BC)p(¬E)| = |0.3 − 0.3 × 0.7| = 0.09 > mi

    CF(BC, ¬E) = (p(BC¬E) − p(BC)p(¬E)) / (p(BC)(1 − p(¬E))) = (0.3 − 0.3 × 0.7) / (0.3 × (1 − 0.7)) = 1 > mc

This means s(BC, ¬E) = 1, h(BC, E, 0.6, 0.05) = 1, and the function iipi is true. Therefore, BC → ¬E is a negative association rule of interest.
Overview wires.wiley.com/widm

p(¬BC E) − p(¬BC) p(E) (a) sup(a,yi ) ≥ minsupp, sup(b,yi ) ≥ minsupp


C F (¬BC, E) = (Mediator Support Condition).
p(¬BC)(1 − p(E))
(b) d(a,yi ) ≥ td , d(b, yi ) ≥ td where d(p, q) is a
0.3 − 0.7 × 0.3 measure of the dependence between p and q
= = 0.1837 < mc
0.7(1 − 0.3) (Dependence Condition).52
This means s(¬BC, E) = 1, h(BC, E, 0.6, As in negative associations, an indirect associ-
0.05) = 1, and the function iipi is true. Therefore, ation between an itempair {a,b} also requires that
¬BC → E is a negative association rule of interest. {a,b} is an infrequent itemset (the Itempair Support
For ¬BC → ¬E, we have p(¬BC¬E) = Condition). The most significant difference between
0 < ms, p(E) = 0.3 = ms, p(BC) = 0.3 = ms, negative associations and indirect associations is that
p(¬BC¬E) = 0.4 > ms, and a mediator is central to the concept of indirect asso-
R(¬BC, ¬E) = | p(¬BC¬E) − p(¬BC) p(¬E)| ciations.
It is assumed that a lattice of frequent itemsets
= |0.4 − 0.7 × 0.7| = 0.09 > mi (FI) has been generated using an existing algorithm
such as Apriori.52 During each pass of candidate gen-
p(¬BC¬E) − p(¬BC) p(¬E) eration, it will find all FIs, yi ∈ I − {a,b}, such that
C F (¬BC, ¬E) = both {a} ∪ yi ∈ FI and {b} ∪ yi ∈ FI. Indirect as-
− p(¬BC) p(¬E)
sociation mining has been implemented using (SAS
0.4 − 0.7 × 0.7 Institute Inc., Cany, North Carolina, USA) and tested
= = 0.1837 < mc
−0.7 × 0.7 on various data sets including Web log, text, retail,
where p(¬E) > p(¬E|¬BC). This means and stock market data to demonstrate its utility, and
s(¬BC, ¬E) = 0. Therefore, ¬BC → ¬E is not can be combined with negative association analysis.
of interest.
From the above, two negative association Applications
rules, E → ¬BC with conf C F (E, BC) = 1 and There are many successful cases in real appli-
support p(E¬BC) = 0.3, BC → ¬E with conf cations, for example, identifying complex spatial
C F (BC, ¬E) = 1 and support p(BC¬E) = 0.3, are relationships,54 detecting adverse drug reactions,55
output as valid rules. discovering XML query patterns for caching,56 ex-
The Complete Association algorithm is ended in ploring the relationship between urban land surface
Step (4). temperature and biophysical/social parameters,57 hy-
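Step (3) applies the same three thresholds to rules whose antecedent or consequent is negated, using p(¬X) = 1 − p(X) and joint supports obtained by inclusion-exclusion, for example p(E¬BC) = p(E) − p(EBC). The sketch below is again only an illustration with invented names: it evaluates the six candidate negative rules of the example with the plain interest and CF formulas. For the two negatively dependent candidates the resulting CF values differ slightly from the figures printed above, but the accept/reject decisions are the same.

```python
# Supports from the running example; thresholds again assumed to be
# ms = 0.3, mc = 0.6, mi = 0.05.
p_E, p_BC, p_EBC = 0.3, 0.3, 0.0
ms, mc, mi = 0.3, 0.6, 0.05

def check(name, p_xy, p_x, p_y):
    """Keep X -> Y when p(XY) >= ms, |p(XY) - p(X)p(Y)| >= mi and CF(X, Y) >= mc."""
    interest = abs(p_xy - p_x * p_y)
    cf = (p_xy - p_x * p_y) / (p_x * (1 - p_y))
    verdict = "valid" if (p_xy >= ms and interest >= mi and cf >= mc) else "rejected"
    print(f"{name:11s} supp={p_xy:.2f} R={interest:.2f} CF={cf:+.4f} {verdict}")

# Joint supports of the negated combinations follow from p(EBC) = 0.
p_E_notBC = p_E - p_EBC                  # p(E¬BC)  = 0.3
p_notE_BC = p_BC - p_EBC                 # p(¬EBC)  = 0.3
p_notE_notBC = 1 - p_E - p_BC + p_EBC    # p(¬E¬BC) = 0.4

check("E -> ¬BC",   p_E_notBC,    p_E,      1 - p_BC)
check("BC -> ¬E",   p_notE_BC,    p_BC,     1 - p_E)
check("¬E -> BC",   p_notE_BC,    1 - p_E,  p_BC)
check("¬BC -> E",   p_E_notBC,    1 - p_BC, p_E)
check("¬E -> ¬BC",  p_notE_notBC, 1 - p_E,  1 - p_BC)
check("¬BC -> ¬E",  p_notE_notBC, 1 - p_BC, 1 - p_E)
# Only E -> ¬BC and BC -> ¬E pass all three thresholds, as in the walkthrough above.
```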
perlink assessment,58 and filtering Web recommenda-
tion lists.59
Research Directions in CAR Analysis
Since its definition in Wu et al.,44 mining positive and negative association rules, referred to as CAR mining in this paper, has become an active research topic. There are three lines of main research efforts in CAR analysis, and we outline them as follows.

Algorithm Scale-Up
Research efforts in this direction include different frameworks,45–48 new pruning strategies,21,49 and confined rule mining.50

Indirect Associations
An itempair {a,b} is indirectly associated via an itemset Y (called a mediator) if the following conditions hold51–53:

(1) sup(a,b) < minsupp (Itempair Support Condition), and
(2) There exists a nonempty itemset Y such that, for all yi ∈ Y:
    (a) sup(a, yi) ≥ minsupp and sup(b, yi) ≥ minsupp (Mediator Support Condition), and
    (b) d(a, yi) ≥ td and d(b, yi) ≥ td, where d(p, q) is a measure of the dependence between p and q (Dependence Condition).52

As in negative associations, an indirect association between an itempair {a,b} also requires that {a,b} is an infrequent itemset (the Itempair Support Condition). The most significant difference between negative associations and indirect associations is that a mediator is central to the concept of indirect associations.

It is assumed that a lattice of frequent itemsets (FI) has been generated using an existing algorithm such as Apriori.52 During each pass of candidate generation, the mining algorithm finds all FIs, yi ∈ I − {a,b}, such that both {a} ∪ yi ∈ FI and {b} ∪ yi ∈ FI. Indirect association mining has been implemented using SAS (SAS Institute Inc., Cary, North Carolina, USA) and tested on various data sets, including Web log, text, retail, and stock market data, to demonstrate its utility, and it can be combined with negative association analysis.
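Under this reading, finding indirect associations amounts to scanning candidate mediators that are frequent together with each item of an infrequent pair and sufficiently dependent on both. The sketch below illustrates that search in Python; it works from a plain support dictionary rather than a frequent-itemset lattice, uses lift as a stand-in for the dependence measure d, and all names (find_mediators, the toy supports and thresholds) are ours rather than from the cited work.

```python
def find_mediators(pair, supports, minsupp, ts, td):
    """Return mediator itemsets Y that indirectly associate the itempair (a, b).

    supports maps frozensets of items to their support; ts is the itempair
    support threshold and td the dependence threshold.
    """
    a, b = pair
    if supports.get(frozenset([a, b]), 0.0) >= ts:
        return []                       # itempair support condition violated

    def dep(x, y):
        # Stand-in dependence measure: lift(x, y) = p(x and y) / (p(x) p(y)).
        return supports.get(x | y, 0.0) / (supports[x] * supports[y])

    mediators = []
    for itemset in supports:
        if a in itemset or b in itemset or not itemset:
            continue
        ay, by = itemset | {a}, itemset | {b}
        if (supports.get(ay, 0.0) >= minsupp and supports.get(by, 0.0) >= minsupp
                and dep(frozenset([a]), itemset) >= td
                and dep(frozenset([b]), itemset) >= td):
            mediators.append(itemset)
    return mediators

# Toy data: 'a' and 'b' rarely co-occur, but both co-occur with mediator {'y'}.
supports = {
    frozenset("a"): 0.4, frozenset("b"): 0.4, frozenset("y"): 0.5,
    frozenset("ab"): 0.05, frozenset("ay"): 0.3, frozenset("by"): 0.3,
}
print(find_mediators(("a", "b"), supports, minsupp=0.25, ts=0.1, td=1.2))
# -> [frozenset({'y'})]
```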
Applications
There are many successful cases in real applications, for example, identifying complex spatial relationships,54 detecting adverse drug reactions,55 discovering XML query patterns for caching,56 exploring the relationship between urban land surface temperature and biophysical/social parameters,57 hyperlink assessment,58 and filtering Web recommendation lists.59

APPLICATIONS OF ASSOCIATION RULES
Association rules have been widely used in real applications. We have mentioned some of them in the previous sections. Here, we outline them from the following perspectives: (1) data mining and machine learning (e.g., associative classification and clustering), (2) search engines (e.g., cube computation and analysis), and (3) other applications.

Applications to Data Mining and Machine Learning
Association rules have been demonstrated to be useful in the areas of data mining and machine learning. One of the important applications is classification. A well-known application of association rules to classification is the Classification Based on Associations algorithm.38 The idea is to first identify the association between a frequent pattern and a class label, and then the discovered association rules are used for predicting unlabelled data. From published reports on associative classification, it can be more accurate than typical classification methods, such as C4.5. Other main reports include emerging patterns-based classifiers in Dong and Li60 and Li et al.,61 classification based on multiple association rules in Li et al.,62 classification based on predictive association rules in Yin and Han,63 and the classifier Refined Classification Based on Top-k rule groups (RCBT) in Cong et al.64
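To make the prediction step of associative classification concrete, the sketch below shows a CBA-style classifier in Python: class association rules, assumed to be already mined and ranked by confidence and then support, are scanned in order, and the first rule whose antecedent is contained in the new transaction assigns the class, with a default class as fallback. The rules, labels, and function names are invented for illustration and do not come from the publications cited above.

```python
from typing import Iterable

# Class association rules as (antecedent, class, confidence, support),
# assumed to be already mined from labelled transactions.
rules = [
    (frozenset({"milk", "bread"}), "family_shopper", 0.91, 0.12),
    (frozenset({"beer"}), "weekend_shopper", 0.78, 0.08),
    (frozenset({"milk"}), "family_shopper", 0.70, 0.20),
]
DEFAULT_CLASS = "occasional_shopper"

def predict(transaction: Iterable[str]) -> str:
    """Assign the class of the first (highest-ranked) rule covering the transaction."""
    items = set(transaction)
    ranked = sorted(rules, key=lambda r: (r[2], r[3]), reverse=True)  # confidence, then support
    for antecedent, label, _conf, _supp in ranked:
        if antecedent <= items:          # rule antecedent contained in the transaction
            return label
    return DEFAULT_CLASS

print(predict({"milk", "bread", "eggs"}))   # family_shopper (matches the first rule)
print(predict({"chips"}))                   # occasional_shopper (falls back to default)
```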
Another important application is clustering. It is mainly applied to high-dimensional data clustering. A well-established application, CLustering In QUEst (CLIQUE), is given in Agrawal et al.,65 which is an Apriori-based dimension-growth subspace clustering algorithm. It integrates density-based and grid-based clustering methods. The Apriori property is used to find clusterable subspaces, and dense units are identified. The algorithm then finds adjacent dense grid units in the selected subspaces using a depth-first search. Clusters are formed by combining these units using a greedy growth scheme. An entropy-based subspace clustering algorithm for mining numerical data, called entropy-based subspace clustering (ENCLUS), was proposed by Cheng et al.66 Beil et al.67 proposed a method for frequent term-based text clustering. Wang et al.68 proposed pCluster, a pattern similarity-based clustering method for microarray data analysis, and demonstrated its effectiveness and efficiency for finding subspace clusters in a high-dimensional space.
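The Apriori property exploited by CLIQUE states that a unit that is dense in a k-dimensional subspace must also be dense in every (k − 1)-dimensional projection. The sketch below illustrates only this level-wise generation of dense grid units with Apriori-style pruning; it omits CLIQUE's subsequent steps of connecting adjacent units into clusters and producing cluster descriptions, and the parameter values, toy data, and function names are ours, not from the cited paper.

```python
from itertools import combinations

def dense_units(points, bins=4, density=0.2):
    """Level-wise generation of dense grid units, pruned with the Apriori property.

    A unit is a dict {dimension: bin index}; it is dense when at least a
    `density` fraction of the points falls into it. Values are assumed in [0, 1).
    """
    n, dims = len(points), len(points[0])
    grid = [[min(int(v * bins), bins - 1) for v in p] for p in points]

    def count(unit):
        return sum(all(row[d] == b for d, b in unit.items()) for row in grid)

    # Dense 1-dimensional units.
    level = [{d: b} for d in range(dims) for b in range(bins)
             if count({d: b}) >= density * n]
    all_dense = list(level)
    k = 2
    while level and k <= dims:
        candidates = []
        for u, v in combinations(level, 2):
            merged = {**u, **v}
            if len(merged) != k or merged in candidates:
                continue
            # Apriori pruning: every (k-1)-dimensional projection must be dense.
            projections = [{d: merged[d] for d in dims_subset}
                           for dims_subset in combinations(merged, k - 1)]
            if all(p in level for p in projections):
                candidates.append(merged)
        level = [c for c in candidates if count(c) >= density * n]
        all_dense.extend(level)
        k += 1
    return all_dense

# Two clusters that only exist in the (x, y) subspace of 3-dimensional data.
pts = [(0.1, 0.1, i / 50) for i in range(25)] + [(0.8, 0.9, i / 50) for i in range(25)]
print([u for u in dense_units(pts, bins=4, density=0.3) if len(u) > 1])
# -> [{0: 0, 1: 0}, {0: 3, 1: 3}]: the two dense 2-D units, one per cluster
```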
Applications to Search Engines
In real applications, data sets are often very large, and it is inefficient to perform a sequential scan on the whole database and examine objects one by one. To efficiently search large and complex databases, Yan et al.69 proposed a discriminative frequent pattern-based approach to index structures and graphs, called gIndex. SeqIndex is one example using a frequent pattern-based approach to index sequences. Taking frequent patterns as features, new strategies to perform structural similarity search were developed in Grafil34 and Partition-based graph Index and Search (PIS).70

Another application is iceberg cube computation. The first algorithm to compute iceberg cubes is bottom-up computation (BUC), proposed by Beyer and Ramakrishnan.71 It is based on the Apriori property. More examples include the cubegrade algorithm,72 the LiveSet-Driven method,73 and the ConSGapMiner technique.74

Also, association rule mining techniques have been successfully applied to discover patterns and knowledge from the Web.75,76 This includes Web usage mining, Web structure mining, and Web content mining. An early application of association rule mining to Web data is the analysis of users' browsing behaviors, called Web usage mining. It includes user grouping, page association, and sequential click-through analysis. Web content mining identifies potentially useful information within Web pages, whereas Web structure mining discovers useful structure linkage among Web pages. Other applications include, for example, discovering XML query patterns for caching,56 hyperlink assessment,58 and filtering Web recommendation lists.59

Applications to Other Subjects
For trustworthy software development, association rule mining techniques have been applied to software bug mining60 and software change history.77–79 For example, Liu et al.80 developed a method to classify the structured traces of program executions using software behavior graphs. It utilizes a frequent graph mining technique. Suspicious buggy regions are identified through the capture of the classification accuracy change, which is measured incrementally during program execution.

Other applications include identifying complex spatial relationships,54 detecting adverse drug reactions,55 and exploring the relationship between urban land surface temperature and biophysical/social parameters.57

CONCLUSIONS
Along with classification and clustering, both of which are mentioned in Applications of Association Rules, association analysis is among the three fundamental techniques in data mining. This paper has reviewed the traditional Apriori algorithm and the mining of both negative and positive association rules in detail, with illustrative examples. We have also provided an account of the research directions and applications. Because of space constraints, we have not covered sequential association analysis and other advanced topics in this paper.

NOTES
a. It is adapted from (Zhang and Zhang 2002).
b. From the definition of Temk, there is only one candidate in the second iteration.
c. Some data are adapted from Refs 18, 21.
d. It is adapted from Ref 21.
e. The techniques are similar to those in Ref 21.
f. For convenience of identifying negative rules, an infrequent itemset XY in NL is written as (X, Y).
g. The data are slightly different from those in Wu et al.21
ACKNOWLEDGEMENTS
This work was supported in part by the Australian Research Council under grant DP0985456,
the Nature Science Foundation (NSF) of China under grant 90718020, the China 973 Pro-
gram under grant 2008CB317108, the Research Program of China Ministry of Personnel for
Overseas-Return High-level Talents, the MOE Project of Key Research Institute of Humanities
and Social Sciences at Universities (07JJD720044), and the Guangxi NSF (Key) grants.
REFERENCES
1. Frawley WJ, Piatetsky-Shapiro G, Matheus CJ. Knowledge discovery in databases: an overview. AI Magazine 1992, 13:57–70.
2. Fayyad U, Piatetsky-Shapiro G, Smyth P. From data mining to knowledge discovery: an overview. Adv Knowledge Discov Data Min 1996, 1–34.
3. Agrawal R, Imielinski T, Swami A. Mining association rules between sets of items in large databases. Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data. 1993, 207–216.
4. Agrawal R, Srikant R. Fast algorithms for mining association rules in large databases. Proceedings of the Twentieth International Conference on Very Large Databases. 1994, 487–499.
5. Han J, et al. Mining frequent patterns without candidate generation. Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (SIGMOD'00), Dallas, TX, May 2000, 1–12.
6. Han J, Wang J, Lu Y, Tzvetkov P. Mining top-K frequent closed patterns without minimum support. In: Proceedings of ICDM. 2002, 211–218.
7. Zhang C, Zhang S, Webb G. Identifying approximate itemsets of interest in large databases. Appl Intell 2003, 18:91–104.
8. Yan X, Zhang C, Zhang S. On data structures for association rule discovery. Appl Artif Intell 2007, 21:57–79.
9. Park J, Chen M, Yu P. An effective hash-based algorithm for mining association rules. Proceedings of ACM SIGMOD International Conference on Management of Data, 1995, 175–186.
10. Savasere A, Omiecinski E, Navathe S. An efficient algorithm for mining association rules in large databases. Proceedings of the 21st International Conference on Very Large Databases. 1995, 432–444.
11. Zhang S, Wu X. Large scale data mining based on data partitioning. Appl Artif Intell 2001, 15:129–139.
12. Toivonen H. Sampling large databases for association rules. Proceedings of the 22nd International Conference on Very Large Databases. 1996, 134–145.
13. Zhang S, Zhang C. Anytime mining for multiuser applications. IEEE Trans Syst Man Cybern A Syst Hum 2002, 32:515–521.
14. Agrawal R, Shafer J. Parallel mining of association rules. IEEE Trans Knowl Data Eng 1996, 8:962–969.
15. Cheung D, Han J, Ng V, Wong C. Maintenance of discovered association rules in large databases: an incremental updating technique. Proceedings of the 12th IEEE International Conference on Data Engineering. 1996, 106–114.
16. Zaki M, et al. New algorithms for fast discovery of association rules. Proceedings of the Third International Conference on Knowledge Discovery and Data Mining (KDD-97), 1997, 283–286.
17. Sarawagi S, Thomas S, Agrawal R. Integrating mining with relational database systems: alternatives and implications. Proceedings of ACM SIGMOD International Conference on Management of Data, 1998, 343–354.
18. Brin S, Motwani R, Silverstein C. Beyond market baskets: generalizing association rules to correlations. Proceedings of the ACM SIGMOD Conference. 1997, 265–276.
19. Silverstein C, Brin S, Motwani R. Beyond market baskets: generalizing association rules to dependence rules. Data Min Knowl Discov 1998, 2:39–68.
20. Piatetsky-Shapiro G. Discovery, analysis, and presentation of strong rules. Knowledge Discovery in Databases, 1991, 229–248.
21. Wu X, Zhang C, Zhang S. Efficient mining of both positive and negative association rules. ACM Trans Inf Syst 2004, 22:381–405.
22. Wang K, Tay W, Liu B. An interestingness-based interval merger for numeric association rules. Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, New York, USA, 1998, 121–127.
23. Srikant R, Agrawal R. Mining generalized association rules. Proceedings of the 21st International Conference on Very Large Databases. 1995, 407–419.
24. Zhang S, Zhang C. Discovering causality in large databases. Appl Artif Intell 2002, 16:333–358.
25. Han J, Pei J, Yin Y, Mao R. Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min Knowl Discov 2004, 8:53–87.
26. Han J, Kamber M. Data Mining: Concepts and Techniques. The Morgan Kaufmann Series in Data Management Systems, 2006.
27. Kamber M, Han J, Chiang J. Metarule-guided mining of multi-dimensional association rules using data cubes. Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, 1997, 207–210.
28. Piatetsky-Shapiro G, Steingold S. Measuring lift quality in database marketing. SIGKDD Explor 2000, 2:76–80.
29. Cohen E, Datar M, Fujiwara S, Gionis A, Indyk P, Motwani R, Ullman JD, Yang C. Finding interesting associations without support pruning. IEEE Trans Knowl Data Eng 2001, 13:64–78.
30. Roddick JF, Rice S. What's interesting about cricket? On thresholds and anticipation in discovered rules. SIGKDD Explor 2001, 3:1–5.
31. Hipp J, Guntzer U. Is pushing constraints deeply into the mining algorithms really what we want? SIGKDD Explor 2002, 4:50–55.
32. Wang K, He Y, Cheung D, Chin F. Mining confident rules without support requirement. In: Proceedings of the 10th ACM International Conference on Information and Knowledge Management. 2001, 89–96.
33. Wang K, He Y, Han J. Pushing support constraints into association rules mining. IEEE Trans Knowl Data Eng 2003, 15:642–658.
34. Yan X, Zhang C, Zhang S. ARMGA: identifying interesting association rules with genetic algorithms. Appl Artif Intell 2005, 19:677–689.
35. Zhang S, Wu X, Zhang C, Lu J. Computing the minimum-support for mining frequent patterns. Knowl Inf Syst 2008, 15:233–257.
36. Zhang S, Wu X, Zhang C. Multi-database mining. IEEE Computational Intelligence Bulletin, June 2003, 2:5–13.
37. Zhang S, Zaki M. Mining multiple data sources: local pattern analysis. Data Min Knowl Discov 2006, 12:121–125.
38. Liu H, Lu H, Yao J. Identifying relevant databases for multi-database mining. Proceedings of PAKDD. 1998, 210–221.
39. Zhong N, Yao Y, Ohshima M. Peculiarity oriented multidatabase mining. IEEE Trans Knowl Data Eng 2003, 15:952–960.
40. Kum H, Chang J, Wang W. Sequential pattern mining in multi-databases via multiple alignment. Data Min Knowl Discov 2006, 12:151–180.
41. Zhu X, Wu X, Chen Q. Bridging local and global data cleansing: identifying class noise in large, distributed datasets. Data Min Knowl Discov 2006, 12:275–308.
42. Ling C, Yang Q. Discovering classification from data of multiple sources. Data Min Knowl Discov 2006, 12:181–201.
43. Zaki M. Parallel and distributed association mining: a survey. IEEE Concurrency. 1999.
44. Wu X, Zhang C, Zhang S. Mining both positive and negative association rules. In: Proceedings of the 19th International Conference on Machine Learning, Sydney, Australia, July 2002, 658–665.
45. Goncalves E, Mendes I, Plastino A. Mining exceptions in databases. AI 2004: Advances in Artificial Intelligence. 17th Australian Joint Conference on Artificial Intelligence. 2004, 1076–1081.
46. Pedreshi D, Ruggieri S, Turini F. Discrimination-aware data mining. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2008, 560–568.
47. Shimada K, Hirasawa K, Hu J. Class association rule mining with chi-squared test using genetic network programming. IEEE International Conference on Systems, Man and Cybernetics (SMC06), 2006, 5338–5344.
48. Zhao L, Zaki MJ, Ramakrishnan N. BLOSOM: a framework for mining arbitrary Boolean expressions. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2006, 827–832.
49. Wan Q, An A. An efficient approach to mining indirect associations. J Intell Inf Syst 2006, 27:135–158.
50. Antonie M, Zaiane O. Mining positive and negative association rules: an approach for confined rules. Proceedings of the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases. 2004, 27–38.
51. Tan P-N, Kumar V, Kuno H. Using SAS for mining indirect associations in data. In: Proceedings of the Western Users of SAS Software Conference. 2001.
52. Tan P, Kumar V, Srivastava J. Indirect association: mining higher order dependencies in data. In: Principles of Data Mining and Knowledge Discovery. Springer, Lyon, France, 2000, 632–637.
53. Tan P, Kumar V, Srivastava J. Selecting the right interestingness measure for association patterns. Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, 2002, 32–41.
54. Munro R, Chawla S, Sun P. Complex spatial relationships. Third IEEE International Conference on Data Mining (ICDM'03). 2003, 227.
55. Jin HW, Chen J, He H, Williams GJ, Kelman C, O'Keefe CM. Mining unexpected temporal associations: applications in detecting adverse drug reactions. IEEE Trans Inf Technol Biomed 2008, 12:488–500.
56. Chen L, Bhowmick SS, Chia LT. Mining positive and negative association rules from XML query patterns for caching. DASFAA-05. 2005, 736–747.
57. Rajasekar U, Weng Q. Application of association rule mining for exploring the relationship between urban land surface temperature and biophysical/social parameters. Photogramm Eng Remote Sensing 2009, 75:385–396.
58. Kazienko P, Pilarczyk M. Hyperlink assessment based on web usage mining. Proceedings of the Seventeenth Conference on Hypertext and Hypermedia. 2006, 85–88.
59. Kazienko P. Filtering of web recommendation lists using positive and negative usage patterns. Knowledge-Based Intelligent Information and Engineering Systems. 2007, 1016–1023.
60. Dong G, Li J. Efficient mining of emerging patterns: discovering trends and differences. Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, 1999, 43–52.
61. Li J, Dong G, Ramamohanarao K. Instance-based classification by emerging patterns. Principles of Data Mining and Knowledge Discovery (PKDD-00), 2000, 191–200.
62. Li J, Ramamohanarao K, Dong G. Combining the strength of pattern frequency and distance for classification. Knowledge Discovery and Data Mining (PAKDD-01), 2001, 455–466.
63. Yin X, Han J. CPAR: classification based on predictive association rules. Proceedings of the Third SIAM International Conference on Data Mining, San Francisco, CA, USA, May 1–3, 2003, Student Paper 5.
64. Cong G, Tan K, Tung A, Xu X. Mining top-k covering rule groups for gene expression data. In: Proceedings of ACM SIGMOD International Conference on Management of Data, 2005, 670–681.
65. Agrawal R, Gehrke J, Gunopulos D, Raghavan P. Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings of ACM SIGMOD International Conference on Management of Data, 1998, 94–105.
66. Cheng CH, Fu AW, Zhang Y. Entropy-based subspace clustering for mining numerical data. In: Proceedings of International Conference on Knowledge Discovery and Data Mining (KDD'99), 1999, 84–93.
67. Beil F, Ester M, Xu X. Frequent term-based text clustering. In: Proceedings of ACM SIGKDD International Conference on Knowledge Discovery in Databases (KDD'02), 2002, 436–442.
68. Wang H, Wang W, Yang J, Yu PS. Clustering by pattern similarity in large data sets. In: Proceedings of ACM SIGMOD International Conference on Management of Data, 2002, 418–427.
69. Yan X, Zhang C, Zhang S. Identifying software component association with genetic algorithm. International Journal of Software Engineering and Knowledge Engineering 2004, 14:441–447.
70. Yan X, Zhang C, Zhang S. On data structures for association rule discovery. Appl Artif Intell 2007, 21:57–79.
71. Beyer K, Ramakrishnan R. Bottom-up computation of sparse and iceberg cubes. In: Proceedings of ACM SIGMOD International Conference on Management of Data, 1999, 359–370.
72. Imielinski T, Khachiyan L, Abdulghani A. Cubegrades: generalizing association rules. Data Min Knowl Discov 2002, 6:219–258.
73. Dong G, Han J, Lam J, Pei J, Wang K, Zou W. Mining constrained gradients in multi-dimensional databases. IEEE Trans Knowl Data Eng 2004, 16:922–938.
74. Ji X, Bailey J, Dong G. Mining minimal distinguishing subsequence patterns with gap constraints. In: Proceedings of International Conference on Data Mining (ICDM'05), 2005, 194–201.
75. Kosala R, Blockeel H. Web mining research: a survey. ACM SIGKDD Explorations 2000, 2:1–15.
76. Srivastava J, Cooley R, Deshpande M, Tan PN. Web usage mining: discovery and applications of usage patterns from web data. ACM SIGKDD Explorations 2000, 1:12–23.
77. Shirabad J, Lethbridge T, Matwin S. Mining the maintenance history of a legacy software system. ICSM-2003. 2003, 95–104.
78. Ying A, Murphy G, Ng R, Chu-Carroll M. Predicting source code changes by mining change history. IEEE Trans Software Eng 2004, 30:574–586.
79. Zhao Q, Bhowmick S. Mining history of changes to web access patterns. PKDD-2004. 2004, 521–523.
80. Liu C, Yan X, Yu H, Han J, Yu P. Mining behavior graphs for "backtrace" of noncrashing bugs. In: Proceedings of the 2005 SIAM International Conference on Data Mining (SDM'05), Newport Beach, 2005, 286–297.