Sei sulla pagina 1di 6

Information and Knowledge Management

ISSN 2224-5758 (Paper) ISSN 2224-896X (Online)


Vol 2, No.6, 2012
Interestingness Measures for Multi
R Vijaya Prakash
1. Dept. of Informatics, Kakatiya University, Warangal, India
2. Dept. Of Computer Science & Engineering, JNT
3. Dept. Of Computer Science & Engineering, Vaagdevi College of Engineering
*E-mail of the corresponding author:
Abstract
Association rule mining is one technique
items. Much work has been done focusing on efficiency, effectiveness
focusing on the quality of rules from single level datasets with many
However, there is a lack of interestingness measures developed for multi
Single level measures do not take into account the hierarchy found in a multi
Support-Confidence approach, which does not
approaches which measure multi-level association rules to help
measures of diversity and peculiarity can be used to identify those rules from
potentially useful.
Keywords: Information Retrieval, Interestingness

1. Introduction
Interestingness measures are necessary to rank
different results, and experts, Lenca
interestingness of discovered association rules is an important and active area within data mining research
Geng & H.J. Hamilton 2006). The primary problem is the selection of interestingness measures for a given
application domain. However, there is no formal agreement on a definition for what makes rules interesting.
Association rule algorithms produce thousands of rules, m
et al 2003). In order to filter the rules, the user generally supplies a minimum threshold for support and
confidence. Support and confidence
basic measures of association rule
derived from single level or flat datasets, which were most
datasets are more common in many domains. With this
discover multi-level and cross-level association
derived from multi-level datasets.
approaches for multi-level and cross
datasets are often a source of numerous rules and in
difficult to determine which ones are interesting
existing interestingness measures for single level association rules can not accurately
of multi-level rules since they do not take into consideration the concept of the
in multi-level datasets.
In this paper, we propose measures
rules by examining the diversity and
discovery phase during post-processing to help users determine the interesting
defined as when comparing two data sets, the one with more diverse ru
used to compare two data sets to determine which data set contains rules that are more interesting.
The paper is organized as follows. Section 2 discusses related work. The theory, background and
assumptions behind our proposed interestingness measures are presented in Section 3. Experiments and results
are presented in Section 4. Lastly, Section 5 concludes the paper.

2. Related Work
For as long as association rule mining has been around,
interesting. Originally this started with using the concepts
Since then, many more measures have been proposed
Lallich et al 2006). The Support-Confidence approach is appealing due to the anti
support. However, the support component will ignore itemsets with a low support even
may generate rules with a high confidence

896X (Online)
20
Interestingness Measures for Multi-Level Association Rules

R Vijaya Prakash
1*
, Dr. A. Govardhan
2
, Prof. SSVN. Sarma
3

Dept. of Informatics, Kakatiya University, Warangal, India
Dept. Of Computer Science & Engineering, JNT University, Hyderabad, India
Dept. Of Computer Science & Engineering, Vaagdevi College of Engineering
mail of the corresponding author: vijprak@hotmail.com

Association rule mining is one technique that is widely used to obtain useful associations
work has been done focusing on efficiency, effectiveness and redundancy. There has also been a
quality of rules from single level datasets with many interestingness measures proposed.
of interestingness measures developed for multi-level and cross-
take into account the hierarchy found in a multi-level dataset. This leaves the
which does not consider for the hierarchy. In this paper we propose two
level association rules to help and evaluate their interestingness. These
peculiarity can be used to identify those rules from multi-
Information Retrieval, Interestingness Measures, Association Rules, Multi-Level Datasets
Interestingness measures are necessary to rank Association rule patterns. Each interestingness
, Lenca et.al (2008) have different opinions of what constitutes a good rule. The
interestingness of discovered association rules is an important and active area within data mining research
. The primary problem is the selection of interestingness measures for a given
application domain. However, there is no formal agreement on a definition for what makes rules interesting.
Association rule algorithms produce thousands of rules, many of which are redundant (K. McGarry 2005, Li. J.
. In order to filter the rules, the user generally supplies a minimum threshold for support and
confidence. Support and confidence(R. Agrawal et al 1993, G. Dong & J. Li 1998, S. Lallich
basic measures of association rule interestingness. All of these measures were proposed for association rules
from single level or flat datasets, which were most commonly transactional datasets.
in many domains. With this increase in usage there is a big demand for techniques
level association rules and also techniques to measure interestingness
level datasets. J. Han & Y. Fu (1995, 1999), C.N. Zeigler et al
level and cross-level frequent itemset discovery have been proposed. However, multi
datasets are often a source of numerous rules and in fact the rules can be so numerous it can be much
difficult to determine which ones are interesting (R. Agrawal et al 1993, G. Dong & J. Li 1998)
for single level association rules can not accurately measure the interestingness
they do not take into consideration the concept of the hierarchical structure that exists
we propose measures particularly for assessing the interestingness of multilevel
and peculiarity among rules. These measures can be
processing to help users determine the interesting rules. Diversity of a data set is
defined as when comparing two data sets, the one with more diverse rules is more interesting. Diversity will be
used to compare two data sets to determine which data set contains rules that are more interesting.
The paper is organized as follows. Section 2 discusses related work. The theory, background and
ind our proposed interestingness measures are presented in Section 3. Experiments and results
are presented in Section 4. Lastly, Section 5 concludes the paper.
For as long as association rule mining has been around, there has been a need to determine which rules are
Originally this started with using the concepts of support and confidence (R. Agrawal
measures have been proposed (G. Dong & J. Li 1998, L. Geng & H.J. Hamilton 2006, S.
Confidence approach is appealing due to the anti monotonicity
component will ignore itemsets with a low support even
confidence (S. Lallich et al 2006). Also, the Support-Confidence approach
www.iiste.org

Level Association Rules

University, Hyderabad, India
Dept. Of Computer Science & Engineering, Vaagdevi College of Engineering

associations rules among sets of
and redundancy. There has also been a
interestingness measures proposed.
-level Association rules.
dataset. This leaves the
In this paper we propose two
their interestingness. These
-level datasets that are
Level Datasets, Itemsets
ssociation rule patterns. Each interestingness measure produces
have different opinions of what constitutes a good rule. The
interestingness of discovered association rules is an important and active area within data mining research (L.
. The primary problem is the selection of interestingness measures for a given
application domain. However, there is no formal agreement on a definition for what makes rules interesting.
(K. McGarry 2005, Li. J.
. In order to filter the rules, the user generally supplies a minimum threshold for support and
1993, G. Dong & J. Li 1998, S. Lallich et al 2006) are
measures were proposed for association rules
commonly transactional datasets. Today multi-level
increase in usage there is a big demand for techniques to
rules and also techniques to measure interestingness of rules
et al (2005) proposed some
been proposed. However, multi-level
fact the rules can be so numerous it can be much more
1993, G. Dong & J. Li 1998). Moreover, the
measure the interestingness
hierarchical structure that exists
particularly for assessing the interestingness of multilevel association
among rules. These measures can be determined at rule
Diversity of a data set is
les is more interesting. Diversity will be
used to compare two data sets to determine which data set contains rules that are more interesting.
The paper is organized as follows. Section 2 discusses related work. The theory, background and
ind our proposed interestingness measures are presented in Section 3. Experiments and results
to determine which rules are
R. Agrawal et al 1993).
G. Dong & J. Li 1998, L. Geng & H.J. Hamilton 2006, S.
monotonicity property of the
component will ignore itemsets with a low support even though these itemsets
Confidence approach does
Information and Knowledge Management
ISSN 2224-5758 (Paper) ISSN 2224-896X (Online)
Vol 2, No.6, 2012
not necessarily ensure that the rules are truly interesting,
frequency of the consequent (S. Lallich
interestingness of a rule are needed.
All of these existing measures fall
subjective based (based on the raw
explanations of the patterns) (L. Geng & H.J. Hamilton 2006)
Hamilton 2006) there are nine criteria
nine criteria are; conciseness, coverage,
actionability or applicability. The first five criteria are considered to be objective, with the
surprisingness being considered to be subjective. The final two criteria are considered to
Despite all the different measures, studies and
definition of what interestingness is in the
2006). More recently several surveys of interestingness measures
2006, S. Lallich et al 2006, P. Lenca
strengths and weaknesses of various
Lenca et al (2007) survey looked at
experimental classes, along with eight
outcomes over how useful, suitable etc., an interestingness measure is. Therefore the
be considered to be subjective.
All of these measures mentioned above are for rules
on a single level but do not have the capacity for comparing
multiple levels simultaneously.
Hilderman R.J. and Hamilton H.J. (2001)
measure should satisfy :
The minimum value principle
The maximum value principle
The skewness principle, which states that the interestingness measure for the most uneven distribution
will decrease when then number of classes of tuples increases.
The permutation invariance principle
order of the class and it is only determined by the distribution of counts.
The transfer principle, which states that interestingness increases when a positive transfer is made from
the count of one tuple to another whose count is greater.
Here in our work we propose to measure the interestingness
peculiarity (also known as distance). These measures
just the data).

3. Interestingness Measures for Multi
3.1. Diversity-Based Interestingness Measures
According to L. Geng & H.J. Hamilton 2006
other, while a set of patterns is diverse if the patterns in the set differ significantly from each other"
H.J. Hamilton 2006). Summaries can be measured using diversity
has been research on using diversity to measure summaries, there is little, if any, research that focuses on
measuring the interestingness of association or classification rules
this study suggests that diversity can be used to measure association rule interestingness.
Diversity generally determined by two factors: 1) the proportional distribution of classes in the population,
and 2) the number of classes. Consider two diffe
rules that are more diverse to be more interesting. Additionally, we can consider a set of rules less interesting if
the rules are less diverse. Another way of thinking about this is by consideri
convey less knowledge to a user.
The equations for variance and the Shannon
These are only two measures based on diversity. There are fourteen other measures bas
available. Most measures are designed to be used with a specific application domain. We chose variance and
Shannon because of their wide-spread use.
:orioncc
ClouJc

896X (Online)
21
not necessarily ensure that the rules are truly interesting, especially when the confidence is equal to the
S. Lallich et al 2006). Based on this argument, other measures for determine the

ll of these existing measures fall into three categories; objective based measures (based
data and the user) and semantic based measures (based
(L. Geng & H.J. Hamilton 2006). In the survey presented in
there are nine criteria listed that can be used to determine if a pattern or rule
nine criteria are; conciseness, coverage, reliability, peculiarity, diversity, novelty, surprisingness,
first five criteria are considered to be objective, with the
to be subjective. The final two criteria are considered to be semantic.
Despite all the different measures, studies and works undertaken, there is no widely agreed upon
definition of what interestingness is in the context of patterns and association rules (L. Geng & H.J. Hamilton
recently several surveys of interestingness measures have been presented (L. Geng & H.J. Hamilton
2006, P. Lenca et al 2007 & K. McGray 2008). In K. McGray (2008)
strengths and weaknesses of various measures from the point of view of the level or extent
survey looked at classifying various interestingness measures into five
along with eight evaluation properties. However, all of these surveys
etc., an interestingness measure is. Therefore the usefulness of a measure can
s mentioned above are for rules derived from single level datasets. They work on items
on a single level but do not have the capacity for comparing different levels or rules containing items from
H.J. (2001) established three primary principles that a good interestingness
minimum value principle, which states that a uniform distribution is the most uninteresting.
maximum value principle, which states the most uneven distribution is the most interesting.
, which states that the interestingness measure for the most uneven distribution
will decrease when then number of classes of tuples increases.
permutation invariance principle, which states that interestingness for diversity is unrelated to the
order of the class and it is only determined by the distribution of counts.
, which states that interestingness increases when a positive transfer is made from
the count of one tuple to another whose count is greater.
Here in our work we propose to measure the interestingness of multi-level rules in terms of diversity and
(also known as distance). These measures were chosen as they are considered to
3. Interestingness Measures for Multi-Level Association Rules
Based Interestingness Measures
L. Geng & H.J. Hamilton 2006, a "pattern is diverse if its elements differ significantly from each
other, while a set of patterns is diverse if the patterns in the set differ significantly from each other"
Summaries can be measured using diversity-based interestingness measures. While there
has been research on using diversity to measure summaries, there is little, if any, research that focuses on
measuring the interestingness of association or classification rules (L. Geng & H.J. Hamilton 2006)
this study suggests that diversity can be used to measure association rule interestingness.
Diversity generally determined by two factors: 1) the proportional distribution of classes in the population,
and 2) the number of classes. Consider two different sets of rules mined from a dataset. We can consider the
rules that are more diverse to be more interesting. Additionally, we can consider a set of rules less interesting if
the rules are less diverse. Another way of thinking about this is by considering that too many similar rules will

The equations for variance and the Shannon, C.E (1948) measure are shown in the equation (1) and (2)
These are only two measures based on diversity. There are fourteen other measures based on diversity that are
available. Most measures are designed to be used with a specific application domain. We chose variance and
spread use.
:orioncc =
(p
i
-q)
2 m
i=1
m-1
(1)
ClouJc Sbonnon = - p

log
2
p
1
m
=1
(2)
www.iiste.org

especially when the confidence is equal to the marginal
, other measures for determine the
into three categories; objective based measures (based on the raw data),
measures (based on the semantic and
In the survey presented in (L. Geng & H.J.
listed that can be used to determine if a pattern or rule is interesting. These
reliability, peculiarity, diversity, novelty, surprisingness, utility and
first five criteria are considered to be objective, with the next two, novelty and
be semantic.
works undertaken, there is no widely agreed upon formal
(L. Geng & H.J. Hamilton
(L. Geng & H.J. Hamilton
K. McGray (2008) survey evaluated the
measures from the point of view of the level or extent of user interaction. P.
classifying various interestingness measures into five formal and five
evaluation properties. However, all of these surveys result in different
usefulness of a measure can
derived from single level datasets. They work on items
different levels or rules containing items from
established three primary principles that a good interestingness
, which states that a uniform distribution is the most uninteresting.
uneven distribution is the most interesting.
, which states that the interestingness measure for the most uneven distribution
, which states that interestingness for diversity is unrelated to the
, which states that interestingness increases when a positive transfer is made from
level rules in terms of diversity and
were chosen as they are considered to be objective (rely on
, a "pattern is diverse if its elements differ significantly from each
other, while a set of patterns is diverse if the patterns in the set differ significantly from each other" (L. Geng &
d interestingness measures. While there
has been research on using diversity to measure summaries, there is little, if any, research that focuses on
(L. Geng & H.J. Hamilton 2006). Therefore,
this study suggests that diversity can be used to measure association rule interestingness.
Diversity generally determined by two factors: 1) the proportional distribution of classes in the population,
rent sets of rules mined from a dataset. We can consider the
rules that are more diverse to be more interesting. Additionally, we can consider a set of rules less interesting if
ng that too many similar rules will
in the equation (1) and (2)
ed on diversity that are
available. Most measures are designed to be used with a specific application domain. We chose variance and
Information and Knowledge Management
ISSN 2224-5758 (Paper) ISSN 2224-896X (Online)
Vol 2, No.6, 2012
In the above equation for variance, p
3.2. Peculiarity Based Interestingness Measures
Peculiarity is an objective measure that determines how
away the rule is, the more peculiar. It is usually
apart rules are from each other. Peculiar rules
and significantly different from the rest of the
interesting as they may be unknown.
neighborhood-based unexpected measure
by the rules that surround it in its neigh
The measure is based on the idea of determining
which forms the basis of the distance between
unexpected confidence (where the confiden
neighborhood) and sparsity (where the number of mined rules in
potential rules for that neighborhood) could be determined,
Dong & J. Li 1998, L. Geng & H.J. Hamilton 2006)
G. Dong & J. Li (1998) measure determines the symmetric difference
datasets where each item was equally weighted. Thus the measure is
are not common between the two rules. In a multi
to the hierarchy. Thus the G. Dong & J. Li (1998)
Here we will present an enhancement as part of our proposed work.
We believe it is possible to take the distance
for multi-level datasets. The original measure is a syntax
P(R
1
, R
2
) = o
1
- |(X
1
U
1
)(X
2
U
The operator denotes the symmetric difference
X)
1
,
2
and
3
are the weighting factors
peculiarity of two rules by a weighted sum
rules antecedents, consequents and the rules
We propose an enhancement to this measure to allow
every item is unique and therefore none share any
1-*-*-*, 1-1-*-*, 1-1-1-* and 1-1-1-
not completely different and should have
hierarchy.
The greater the P(R
1
,R
2
) value is, lower similarity and so the greater
Therefore, the further apart the relation is between two items, the
have,
R
1
: 1 1 1 * 1 * * *
R
2
: 1 1 * * 1 * * *
R
3
: 1 1 1 1 1 * * *
We believe that the following should hold; P(R1,R3) < P(R2,R3) as 1
from each other than 1-1-1-* and 1
must be less than 1. Thus, for the above
Equation 3 by calculating the diversity of the symmetric difference between two rules instead of the cardinality
of the symmetric difference. The cardinality of the symmetric difference
rules in terms of the number of different items in the rules. The diversity of the symmetric difference takes into
consideration the hierarchical difference of the items in the symmetric difference to measure the differ
two rules. We recite Equation 2 in terms of a set of items below, where S is a set containing n ite
P(S) =
Where HRD is The Hierarchical Relationship
average number of levels between the two items and their
HRD is defined as width or horizontal distance, which is defined as
ER(n
1
,

896X (Online)
22
p
i
is the probability for class i, and is the average probability for all classes.
Based Interestingness Measures
Peculiarity is an objective measure that determines how far away one association rule is from others. The further
away the rule is, the more peculiar. It is usually done through the use of a distance measure
apart rules are from each other. Peculiar rules are usually few in number (often generated from outlying
and significantly different from the rest of the rule set. It is also possible that these peculiar rules can
interesting as they may be unknown. G. Dong & J. Li (1998) proposed peculiarity
unexpected measure. In this proposal it is argued that a rules interestingness is influenced
by the rules that surround it in its neighborhood.
The measure is based on the idea of determining and measuring the symmetric difference between two
which forms the basis of the distance between them. From this G. Dong & J. Li (1998)
confidence (where the confidence of a rule R is far from the average confidence of the rules in Rs
and sparsity (where the number of mined rules in a neighborhood is far less than that of all the
rules for that neighborhood) could be determined, measured and used as interestingness measures
L. Geng & H.J. Hamilton 2006).
measure determines the symmetric difference was developed for single level
each item was equally weighted. Thus the measure is actually a count of the number of items that
common between the two rules. In a multi-level dataset, each item cannot be regarded as being equal due
G. Dong & J. Li (1998) measure needs to be enhanced to be useful with these
present an enhancement as part of our proposed work.
We believe it is possible to take the distance measure presented in (G. Dong & J. Li 1998)
datasets. The original measure is a syntax-based distance metric in the following form:

2
)| +o
2
- |(X
1
X
2
)| + o
3
|(
1

2
)| (3)
operator denotes the symmetric difference between two item sets, thus X Y is equivalent to
are the weighting factors to be applied to different parts of the rule. Equation
peculiarity of two rules by a weighted sum of the cardinalities of the symmetric difference between
rules antecedents, consequents and the rules themselves.
ement to this measure to allow it to handle a hierarchy. Under the existing measure,
every item is unique and therefore none share any kind of syntax similarity. However, we argue that
-1 (based on Figure 1) all have a relationship with each other.
not completely different and should have a syntax similarity due to their relation through the
) value is, lower similarity and so the greater the distance between those two rules.
further apart the relation is between two items, the greater the difference and distance. Thus if we



We believe that the following should hold; P(R1,R3) < P(R2,R3) as 1-1-*-* and 1-1-1
* and 1-1-1-1. The difference between any two hierarchically related items
must be less than 1. Thus, for the above rules, 1 > P(R2,R3) > P(R1,R2) > 0. In order to achieve this we modify
Equation 3 by calculating the diversity of the symmetric difference between two rules instead of the cardinality
of the symmetric difference. The cardinality of the symmetric difference measures the difference between two
rules in terms of the number of different items in the rules. The diversity of the symmetric difference takes into
consideration the hierarchical difference of the items in the symmetric difference to measure the differ
two rules. We recite Equation 2 in terms of a set of items below, where S is a set containing n ite
u HR(,])
n
]=i+1
n-1
i=1
n(n-1)
+
[ L(,])
n
]=i+1
n-1
i=1
n(n-1)
(4)
Where HRD is The Hierarchical Relationship Distance between two items is defined as the ratio between the
number of levels between the two items and their common ancestor and the height of the tree.
HRD is defined as width or horizontal distance, which is defined as
( n
2
) =
(NL(n
1
,cu)+NL(n
2
,cu)
2-1ccHcght
(5)
www.iiste.org

, and is the average probability for all classes.
far away one association rule is from others. The further
done through the use of a distance measure to determine how far
are usually few in number (often generated from outlying data)
rule set. It is also possible that these peculiar rules can be
peculiarity measure, which is
proposal it is argued that a rules interestingness is influenced
and measuring the symmetric difference between two rules,
G. Dong & J. Li (1998) proposed that
the average confidence of the rules in Rs
a neighborhood is far less than that of all the
ed as interestingness measures (G.
was developed for single level
lly a count of the number of items that
each item cannot be regarded as being equal due
enhanced to be useful with these datasets.
(G. Dong & J. Li 1998) and enhance it
metric in the following form:
(3)
Y is equivalent to (X Y)(Y
to be applied to different parts of the rule. Equation 3 measures the
of the cardinalities of the symmetric difference between the two
it to handle a hierarchy. Under the existing measure,
kind of syntax similarity. However, we argue that the items
all have a relationship with each other. Thus they are
a syntax similarity due to their relation through the datasets
between those two rules.
greater the difference and distance. Thus if we
1-1 are further removed
1. The difference between any two hierarchically related items\ nodes
rules, 1 > P(R2,R3) > P(R1,R2) > 0. In order to achieve this we modify
Equation 3 by calculating the diversity of the symmetric difference between two rules instead of the cardinality
measures the difference between two
rules in terms of the number of different items in the rules. The diversity of the symmetric difference takes into
consideration the hierarchical difference of the items in the symmetric difference to measure the difference of the
two rules. We recite Equation 2 in terms of a set of items below, where S is a set containing n items:

two items is defined as the ratio between the
common ancestor and the height of the tree. In general
Information and Knowledge Management
ISSN 2224-5758 (Paper) ISSN 2224-896X (Online)
Vol 2, No.6, 2012
LD is the Level Distance which measures the distance between two items in terms of their height. This is also
called as height distance or vertical distance, which is defined as
I(n
1
, n
2
NLD is Number of levels NLD(x,y) = |hierarchy level of x|
N1 and n2 are two items, ca is common ancestor. The , are the user threshold values, where +=1.
experiment we set these values as ==0.5.
Thus the neighbourhood-based distance measure
PH(R
1
, R
2
) = o
1
- P((X
1

4. Experiments
The dataset used for our experiments is a real world
http://www.informatik.uni-freiburg.de/ cziegler/BX/). From this dataset we built a multi
dataset that contains 91,550 user records
frequent itemsets we use the MLT2 L1 algorithm proposed
level having its own minimum support.
and generators using the CLOSE+ algorithm proposed
non-redundant association rules using the
The confidence curve shows that the rules are
threshold) up to close to 1. The distribution of rules in this
as 2,181 rules for 0.95 to 1, to as high as 4,430 rules for
interesting rules is more practical than support, but still leaves
The overall diversity curve shows that the majority
value of between 0.3 to 0.4. The curve however, also
diveristy value below the majority, in the range of
of 0.45 up to 0.7. The rules located above
could be of interest as these rules have a
The antecedent-consequent diversity curve is similar
of rules, but the antecedent-consequent diversity curve
curve peaks at 0.35 to 0.4), with 12,408 rules. The
before peaking again at 0.5 to 0.55, wih 2,564
seems to show that the two diversity approaches
rules with differing antecedents and consequents
them. Fig 1. Interesting Mesures

0
3000
6000
9000
12000
15000
18000
21000
24000
27000
30000
33000
36000
39000
0
0
.
0
5
0
.
1
0
.
1
5
0
.
2
0
.
2
5
N
o
.

o
f

R
u
l
e
s

896X (Online)
23
is the Level Distance which measures the distance between two items in terms of their height. This is also
called as height distance or vertical distance, which is defined as
2
) =
NL(n
1
,n
2
)
(1ccHcght-1)
(6)
NLD is Number of levels NLD(x,y) = |hierarchy level of x| - |hierarchy level y|.
N1 and n2 are two items, ca is common ancestor. The , are the user threshold values, where +=1.
experiment we set these values as ==0.5.
based distance measure between two rules shown in Equation 3 now becomes;
(
1
U
1
)(X
2
U
2
) +o
2
- P(X
1
X
2
) + o
3
- P(
1

dataset used for our experiments is a real world dataset, the BookCrossing dataset (obtained from
freiburg.de/ cziegler/BX/). From this dataset we built a multi
user records and 970 leaf items, with 3 concept / hierarchy levels.
MLT2 L1 algorithm proposed by J. Han & Y. Fu (1995, 1999)
level having its own minimum support. From these frequent itemsets we then derive the f
CLOSE+ algorithm proposed by J. Pei et al. From this we then
redundant association rules using the MinMaxApprox (MMA) rule mining algorithm.
The confidence curve shows that the rules are spread out from 0.5 (which is the minimum confidence
up to close to 1. The distribution of rules in this area is fairly consistant and even, ranging from as low
as 2,181 rules for 0.95 to 1, to as high as 4,430 rules for 0.85 to 0.9. Using confidence to determine the
rules is more practical than support, but still leaves over 2,000 rules in the top bin.
The overall diversity curve shows that the majority of rules (23,665) here have an average overall diversity
0.4. The curve however, also shows that there are some rules which have an overall
diveristy value below the majority, in the range of 0.15 to 0.25 and some that are above the majority, in
of 0.45 up to 0.7. The rules located above the majority are different to the rules that make up the
could be of interest as these rules have a high overall diversity.
consequent diversity curve is similar to that of the overall diversity. It has a similar spread
consequent diversity curve peaks earlier at 0.3 to 0.35 (where as the overall diversity
curve peaks at 0.35 to 0.4), with 12,408 rules. The curve then drops down to a low number of rules at 0.45
before peaking again at 0.5 to 0.55, wih 2,564 rules. The shape of this curve with that of the overall diversity
seems to show that the two diversity approaches are related. Using the antecedent-consequent diversity
rules with differing antecedents and consequents to be discovered when support and confidence will not
Fig 1. Interesting Mesures

0
.
2
5
0
.
3
0
.
3
5
0
.
4
0
.
4
5
0
.
5
0
.
5
5
0
.
6
0
.
6
5
0
.
7
0
.
7
5
0
.
8
0
.
8
5
0
.
9
0
.
9
5 1
Measure Values
Interesting Measures
www.iiste.org

is the Level Distance which measures the distance between two items in terms of their height. This is also
N1 and n2 are two items, ca is common ancestor. The , are the user threshold values, where +=1. In our
now becomes;

2
) (7)
dataset, the BookCrossing dataset (obtained from
freiburg.de/ cziegler/BX/). From this dataset we built a multi-level transactional
leaf items, with 3 concept / hierarchy levels. To discover the
by J. Han & Y. Fu (1995, 1999) with each concept
frequent closed itemsets
. From this we then derive the
MinMaxApprox (MMA) rule mining algorithm.
out from 0.5 (which is the minimum confidence
area is fairly consistant and even, ranging from as low
dence to determine the
over 2,000 rules in the top bin.
of rules (23,665) here have an average overall diversity
shows that there are some rules which have an overall
0.15 to 0.25 and some that are above the majority, in the range
re different to the rules that make up the majority and
to that of the overall diversity. It has a similar spread
peaks earlier at 0.3 to 0.35 (where as the overall diversity
curve then drops down to a low number of rules at 0.45 to 0.5,
rules. The shape of this curve with that of the overall diversity
consequent diversity allows
and confidence will not identify

Support
Conf
Ove_Diver_Dist
A_C_Dist
Dist_Dist
Information and Knowledge Management
ISSN 2224-5758 (Paper) ISSN 2224-896X (Online)
Vol 2, No.6, 2012
5. Conclusion
In this paper we proposed two interestingness
interestingness measures are diversity and peculiarity
within a rule and peculiarity compares items in two rules to see
have shown how diversity and peculiarity distance can be used to identify potentially
cant be identified using basic measurements support and confidence.

References
R. Agrawal, T. Imielinski and A. Swami.
Databases. In ACM SIGMOD International
207216, Washington D.C., USA.
G. Dong and J. Li. (1998), Interestingness of Discovered
Unexpectedness. In Second Pacific
pages 7286, Melbourne, Australia.
L. Geng and H. J. Hamilton. (2006),
Surveys (CSUR), Volume 38, pages 9.
J. Han and Y. Fu (1995). Discovery of Mult
International Conference on Very Large Databases (VLDB95), pages
J. Han and Y. Fu (1999). Mining Multiple
Knowledge and Data Engineering, Volume 11, pages
S. Lallich, O. Teytaud and E. Prudhomme
validation. Quality Measures in Data Mining
P. Lenca, B. Vaillant, B. Meyer and S. Lallich.
theoretical studies. Studies in Computational Intelligence,
K. McGarry. (2005), A Survey of Interestingness Measures
Engineering Review, Volume 20, pages 39
C.-N. Ziegler, S. M. McNee, J. A. Konstan and
Topic Diversification. In 14th International Conference
Japan.
Lenca, P., Meyer, P., Vaillant, B., & Lallich, S. (2008).
rules: User oriented description and multiple criteria decision aid
184, 610-626.
Li, J., & Zhang, Y. (2003). Direct Interesting Rule Generation
Conference on Data Mining (ICDM '03)
Hilderman, R. J., & Hamilton, H. J. (2001).
Kluwer Academic.
Shannon, C. E. (1948). A mathematical theory of communication
379-423.


896X (Online)
24
In this paper we proposed two interestingness measures for Multi Level Association rules
diversity and peculiarity respectively. Diversity is a measure that compares items
rule and peculiarity compares items in two rules to see how different they are.
and peculiarity distance can be used to identify potentially
cant be identified using basic measurements support and confidence.
R. Agrawal, T. Imielinski and A. Swami. (1993), Mining Association Rules between Sets of Items in Large
. In ACM SIGMOD International Conference on Management of Data (SIGMOD93), pages

Interestingness of Discovered Association Rules in terms of Neighbourhood
. In Second Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD98),

(2006), Interestingness Measures for Data Mining: A Survey
(CSUR), Volume 38, pages 9.
Discovery of Multiple-Level Association Rules from Large Databases
Conference on Very Large Databases (VLDB95), pages 420431, Zurich, Switzerland
Mining Multiple-Level Association Rules in Large Databases
Knowledge and Data Engineering, Volume 11, pages 798805.
S. Lallich, O. Teytaud and E. Prudhomme (2006), Association rule interestingness: measure and statistical
Quality Measures in Data Mining, Volume 43, pages 251276.
, B. Vaillant, B. Meyer and S. Lallich. (2007) Association rule interestingness: experimental and
. Studies in Computational Intelligence, Volume 43, pages 5176.
A Survey of Interestingness Measures for Knowledge Discovery
ew, Volume 20, pages 3961.
N. Ziegler, S. M. McNee, J. A. Konstan and G. Lausen (2005), Improving Recommendation Lists Through
. In 14th International Conference on World Wide Web (WWW05
Lenca, P., Meyer, P., Vaillant, B., & Lallich, S. (2008). On selecting interestingness measures
rules: User oriented description and multiple criteria decision aid. European Journal of Operations Research,
Direct Interesting Rule Generation. Proceedings of the Third IEEE International
Conference on Data Mining (ICDM '03), 155-167.
Hilderman, R. J., & Hamilton, H. J. (2001). Knowledge Discovery and Measures of In
A mathematical theory of communication. The Bell System Technical Journal, 27

www.iiste.org

Multi Level Association rules. These proposed
Diversity is a measure that compares items
In our experiments we
interesting rules which
Association Rules between Sets of Items in Large
on Management of Data (SIGMOD93), pages
Association Rules in terms of Neighbourhood-Based
iscovery and Data Mining (PAKDD98),
for Data Mining: A Survey. ACM Computing
Rules from Large Databases. In 21st
ich, Switzerland.
. IEEE Transactions on
rule interestingness: measure and statistical
rule interestingness: experimental and
Discovery. The Knowledge
Improving Recommendation Lists Through
on World Wide Web (WWW05), pages 2232, Chiba,
On selecting interestingness measures for association
Journal of Operations Research,
Proceedings of the Third IEEE International
Knowledge Discovery and Measures of Interest. Boston, MA:
The Bell System Technical Journal, 27,
This academic article was published by The International Institute for Science,
Technology and Education (IISTE). The IISTE is a pioneer in the Open Access
Publishing service based in the U.S. and Europe. The aim of the institute is
Accelerating Global Knowledge Sharing.

More information about the publisher can be found in the IISTEs homepage:
http://www.iiste.org

The IISTE is currently hosting more than 30 peer-reviewed academic journals and
collaborating with academic institutions around the world. Prospective authors of
IISTE journals can find the submission instruction on the following page:
http://www.iiste.org/Journals/
The IISTE editorial team promises to the review and publish all the qualified
submissions in a fast manner. All the journals articles are available online to the
readers all over the world without financial, legal, or technical barriers other than
those inseparable from gaining access to the internet itself. Printed version of the
journals is also available upon request of readers and authors.
IISTE Knowledge Sharing Partners
EBSCO, Index Copernicus, Ulrich's Periodicals Directory, JournalTOCS, PKP Open
Archives Harvester, Bielefeld Academic Search Engine, Elektronische
Zeitschriftenbibliothek EZB, Open J-Gate, OCLC WorldCat, Universe Digtial
Library , NewJour, Google Scholar

Potrebbero piacerti anche