Aasawari Bagewadikar, Priyanka Botny Srinath, Ronald Bayross, Srujana Paluri, Stephen Woolery
Dr. Ahmed Ezzat
Computer Engineering, Santa Clara University
Santa Clara, California, United States
Abstract: The business world requires critical
decision making in various situations and
scenarios. Large amounts of information are
generated from the products and/or services
industry. With information comes data in a raw
format that requires tasks like data mining to
extract meaningful information. Mining
methods like association rule mining help in
uncovering relationships between unrelated
data in a repository. In this paper, we provide
an overview of the basic concepts of association
rule mining and walk through the list of existing
association rule mining techniques. We analyze
these techniques through sample applications.
Index Terms: Associative Rule Mining,
Apriori, FP Growth, Opinion Mining,
Recommender Systems, Data Stream Mining,
Stock and Text categorization
I. INTRODUCTION
Data mining is an analytical method for gathering
useful information from data. It allows the users to
analyze, categorize, and summarize the
relationships among data using various algorithms.
In this paper, we examine the various association
rule mining algorithms with applications that help
in the analysis.
Many modern business organizations accumulate
large amounts of data from their day-to-day
activities. This data is stored in databases, data
warehouses, and other various information storage
schemes. This data is important and it has to be
processed in order to get the useful information out
of it. Data mining techniques such as clustering,
classification, prediction and association rule
mining are generally used to analyze the data to
II. BACKGROUND
Association rules are if/then statements that help to
uncover relationships between seemingly unrelated data in a
database, relational database or other information
repository. Their application is most often seen
with the Market Basket problem, where the items
in a customer's basket will suggest or imply what
else may be included in the basket for a given trip
to the market. Once generated, these rules can be
used to provide guidance on what a user group may
do after a certain sequence of events has
occurred. With this guidance, resources can be
better utilized to guide customers to likely items,
or plan for certain events given known behavior.
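As an illustration of how an if/then rule of this kind can be scored, the following minimal sketch computes the two standard measures, support and confidence, for a single rule. The basket data, the function names support and confidence, and the constant BASKETS are assumptions made for illustration; the definitions used are the usual textbook ones rather than anything specific to this paper.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent: FrozenSet[str],
               consequent: FrozenSet[str],
               transactions: List[Transaction]) -> float:
    """support(antecedent and consequent together) / support(antecedent)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(antecedent | consequent, transactions) / base


# Hypothetical market baskets, invented for this example.
BASKETS: List[Transaction] = [
    frozenset({"diapers", "beer", "milk"}),
    frozenset({"diapers", "beer"}),
    frozenset({"diapers", "bread"}),
    frozenset({"milk", "bread"}),
]

# Rule under test: if {diapers} then {beer}.
rule_support = support(frozenset({"diapers", "beer"}), BASKETS)
rule_confidence = confidence(frozenset({"diapers"}), frozenset({"beer"}), BASKETS)
```

In this toy data the rule holds in two of the four baskets, and in two of the three baskets that contain diapers.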
A common example used to explain associative
rule mining revolves around the relationship
between diapers and beer in grocery store
transactions. In this example, large amounts of
customer purchase data are collected at a grocery
Mining
(SETM)
4. Ck = Apriori_generate(Fk-1);
   a. CBk = Counting_base_generate(Ck, CBk-1)
   b. Support_count(Ck, CBk);
5. Fk = {c ∈ Ck | support(c) ≥ min_support}; }
6. F = union of all Fk;
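The Apriori_generate step is not spelled out in the text above. A common reading is the join-and-prune candidate generation used by Apriori: combine frequent (k-1)-itemsets into k-itemsets, then drop any candidate that has an infrequent (k-1)-subset. The sketch below assumes that reading; the function name apriori_generate and the sample input are illustrative only.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set


def apriori_generate(prev_frequent: Iterable[FrozenSet[str]]) -> Set[FrozenSet[str]]:
    """Build size-k candidates from frequent itemsets that all have size k-1."""
    prev = set(prev_frequent)
    candidates: Set[FrozenSet[str]] = set()
    for a in prev:
        for b in prev:
            union = a | b
            # Join step: keep unions that grow the itemset by exactly one item.
            if len(union) != len(a) + 1:
                continue
            # Prune step: every (k-1)-subset of a candidate must itself be frequent.
            if all(frozenset(sub) in prev for sub in combinations(union, len(a))):
                candidates.add(union)
    return candidates


# Hypothetical frequent 2-itemsets.
F2 = {frozenset("ab"), frozenset("ac"), frozenset("bc"), frozenset("bd")}
C3 = apriori_generate(F2)
```

Here only {a, b, c} survives: {a, b, d} and {b, c, d} are pruned because {a, d} and {c, d} are not in F2.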
6. Fk+1 = Fk+1 ∪ N;
7. end
8. end
9. end
10. if Fk+1 ≠ ∅ then
11. Bottom-Up(Fk+1);
12. end }
G. FP GROWTH ALGORITHM
The FP Growth algorithm is a Frequent Pattern
Growth algorithm, which constructs a Frequent
Pattern tree (FP-tree) from the given data set.
This approach has an advantage over the Apriori
approach in that it does not require repeated passes
over the transaction set, and the cost of
recovering any given itemset from the tree is lower.
To create an FP tree, first the transaction database
is reviewed to find the frequency of each item.
Then a priority is assigned to each item, based on
how frequent it is, with more common items
having a higher priority. With that complete, the
dataset is rearranged according to priority, and
infrequent items are discarded. Once complete, the
final FP-tree can be constructed.
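The construction steps above can be sketched as follows. This is a minimal illustration that builds only the prefix tree (no header table or node links, which a full FP Growth implementation would also need). The names FPNode and build_fp_tree, the min_count parameter, the alphabetical tie-break between equally frequent items, and the sample transactions are all assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class FPNode:
    item: Optional[str]                 # None marks the root
    count: int = 0
    children: Dict[str, "FPNode"] = field(default_factory=dict)


def build_fp_tree(transactions: Iterable[Iterable[str]],
                  min_count: int) -> Tuple[FPNode, Dict[str, int]]:
    tx: List[set] = [set(t) for t in transactions]
    # Step 1: review the database to find the frequency of each item.
    freq = Counter(item for t in tx for item in t)
    # Infrequent items are discarded.
    kept = {item: c for item, c in freq.items() if c >= min_count}
    root = FPNode(item=None)
    for t in tx:
        # Steps 2-3: order each transaction by priority (most frequent first).
        ordered = sorted((i for i in t if i in kept), key=lambda i: (-kept[i], i))
        # Step 4: insert the ordered transaction, sharing common prefixes.
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item=item)
                node.children[item] = child
            child.count += 1
            node = child
    return root, kept


# Hypothetical transaction database.
TRANSACTIONS = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["b"], ["d"]]
root, kept = build_fp_tree(TRANSACTIONS, min_count=2)
```

With this data, item d is dropped, and every remaining transaction starts at the single child b of the root, so the four transactions that contain b share one prefix node.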
IV. ANALYSIS OF ALGORITHMS
H. RECURSIVE ELIMINATION ALGORITHM
The goal of a frequent pattern mining algorithm is to
discover all patterns whose support is greater
than the user-defined threshold.
Benefits: Better efficiency than Apriori in all cases.
Drawbacks: Less efficiency than ECLAT in all cases.
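The goal stated above, finding every pattern whose support clears a user-defined threshold, can be made concrete with a brute-force sketch. This is not the Recursive Elimination algorithm itself; it is a plain enumeration that stops at the first itemset size with no frequent patterns, since no larger itemset can be frequent once a whole level is empty. The function name frequent_itemsets, the use of "at least" (>=) for the threshold test, and the sample data are assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable


def frequent_itemsets(transactions: Iterable[Iterable[str]],
                      min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose support is at least `min_support`."""
    tx = [frozenset(t) for t in transactions]
    items = sorted(set().union(*tx)) if tx else []
    result: Dict[FrozenSet[str], float] = {}
    for k in range(1, len(items) + 1):
        found_any = False
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            sup = sum(1 for t in tx if candidate <= t) / len(tx)
            if sup >= min_support:
                result[candidate] = sup
                found_any = True
        if not found_any:
            # Every superset of an infrequent itemset is infrequent too.
            break
    return result


# Hypothetical transactions.
SAMPLE = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b"}]
patterns = frequent_itemsets(SAMPLE, min_support=0.5)
```

The enumeration is exponential in the number of items, which is the cost that the algorithms compared in this section are designed to avoid.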
Algorithm        Data support
AIS              Less
SETM             Less
Apriori          Limited
Apriori-TID      Often large
Apriori hybrid   Very large
ECLAT
FP-Growth
Table 2. Comparison of Association Rule Mining algorithms [5]
V. APPLICATIONS
Many applications require the identification and
categorization of frequent itemsets, extending
beyond the associations of purchasing diapers and
beer. Here, we examine several applications of
associative rules and rule learning that represent
some common and some more novel contemporary
approaches.
A. Stock Categorization
Algorithm used: FP Growth Algorithm
In "The Influence of Volume and Volatility on
Predicting Shanghai Stock Exchange Trends,"
Pierrot et al. [12] develop a method of evaluating
association rule effectiveness on predicting stock
price behavior. For stock price data, measures of
support are not sufficient to characterize the
performance of associative rules on stock price
behavior, as relatively few trades may severely
skew the results. Confidence measures also may
prove inaccurate, as the seemingly random motion
of stock price may result in a rule having very high
confidence in a certain result, but not having a high
reliability in reaching that result. Instead, the
authors propose two new measures, volatility and
volume, to generate associative rules for
evaluation.
VIII. REFERENCES
[1] M. Agarwal, M. Jailia. An interactive
method for generalized association rule
mining using FP-tree, in Proc of 2nd
Bangalore Annual Compute Conference, 2009
http://dl.acm.org.libproxy.scu.edu/citation.cfm?id=1517303.1517314&coll=DL&dl=ACM&CFID=693745984&CFTOKEN=88430067
[2] Sandvig, B. Mobasher, R. Burke.
Robustness of collaborative recommendation
based on association rule mining, in Proc of
2007 ACM Conference on Recommender
Systems, 2007, pp 105-112
http://dl.acm.org.libproxy.scu.edu/citation.cfm?id=1297231.1297249&coll=DL&dl=ACM&CFID=693745984&CFTOKEN=88430067
[3] W.Y. Kim, J.S. Ryu, K. Kim, U. Kim. A
method for opinion mining of product reviews
using association rules, in Proc of 2nd
International Conference on Interaction
Sciences: Information Technology, Culture
and Human, 2009, pp 270-274
http://dl.acm.org.libproxy.scu.edu/citation.cfm?id=1655925.1655973&coll=DL&dl=ACM&CFID=693745984&CFTOKEN=88430067
[4] T.A. Kumbhare, S.V. Chobe. An
Overview of Association Rule Mining,
International Journal of Computer Science
and Information Technologies, Vol. 5, pp 927-930, 2014
http://www.ijcsit.com/docs/Volume%205/vol5issue01/ijcsit20140501201.pdf
[5] R. Tilili, Y. Slimani. Executing
Association Rule Mining Algorithms under a
Grid Computing Environment, in Proc.
Parallel and Distributed Systems: Testing,
Analysis, and Debugging, 2011, pp 53-61
http://doi.acm.org.libproxy.scu.edu/10.1145/2002962.2002973
http://www.ijarcce.com/upload/2013/may/59-Manisha%20ThoolASSOCIATION%20RULE%20GENERATION%20IN%20STREAMS.pdf
[12] R. Pierrot, L. H. Liu. The Influence of
Volume and Volatility on Predicting Shanghai
Stock Exchange Trends, in Fifth
International Conference on Fuzzy Systems
and Knowledge Discovery, Vol. 1, pp 470-474,
October 2008
http://dx.doi.org.libproxy.scu.edu/10.1109/FSKD.2008.88
[13] Baralis, P. Garza. Associative text
categorization exploiting negated words, in
Proc of 2006 ACM Symposium on Applied
Computing, 2006, pp 530-535
http://doi.acm.org.libproxy.scu.edu/10.1145/1141277.1141402
[14] Garg, Urvashi. Kaur, Manjit. ECLAT
Algorithm for frequent Itemsets Generation,
International Journal of Computer Systems
(ISSN: 2394-1065), Volume 01 Issue 03,
December, 2014
http://www.ijcsonline.com/IJCS/IJCS_2014_0103002.pdf
[15] Sharma, Simple. Khurana, Komal. A
Comparative Analysis of Associative Rules
Mining Algorithms, International Journal of
Scientific and Research Publications, Volume
3, Issue 5, May 2013, ISSN 2250-3153
http://www.ijsrp.org/research-paper0513/ijsrp-p17133.pdf
[16] Kudhati Madhav, Venu. A New Data
Stream Mining Algorithm for Interestingness-Rich
Association Rules, Journal of Computer
Information Systems, Spring 2013
http://iacis.org/jcis/articles/JCIS53-3-2.pdf
[17] S. Vijayarani. P. Sathya. Mining Frequent
Item Sets over Data Streams using Éclat
Algorithm, International Conference on
Research Trends in Computer Technologies
(ICRTCT - 2013)
http://research.ijcaonline.org/icrtct/number4/icrtct1048.pdf
[18] Pooja Agrawal. Suresh Kashyap. Vikas
Chandra Pandey. Suraj Prasad Keshri. A
Review Approach on various form of Apriori
with Association Rule Mining, in
International Journal on Recent and
Innovation Trends in Computing and
Communication, Vol. 1, Issue 5.
http://www.academia.edu/4899172/A_Review_Approach_on_various_form_of_Apriori_with_Association_Rule_Mining
[19] Michael D. Ekstrand. John T. Riedl.
Joseph A. Konstan. Collaborative Filtering
Recommender Systems, in Foundations and
Trends in Human-Computer Interaction, Vol.
4, 2010.
http://files.grouplens.org/papers/FnT%20CF%20Recsys%20Survey.pdf
[20] Badrul Sarwar. George Karypis. Joseph
Konstan. John Riedl. Item-Based
Collaborative Filtering Recommendation
Algorithms, GroupLens Research
Group/Army HPC Research Center,
Department of Computer Science and
Engineering, University of Minnesota,
Minneapolis.
http://files.grouplens.org/papers/www10_sarwar.pdf
[21] Kenneth Lai, N. Cerpa. Support vs
Confidence in Association Rule Algorithms, in
Proceedings of the OPTIMA 2001 Conference of
the ICHIO (The Chilean Operations Research
Society), Curicó, Chile, October 10-12, 2001
http://www.academia.edu/648890/Support_vs_Confidence_in_Association_Rule_Algorithms
[22] Das, W. Ng, Y. Woon. Rapid
Association Rule Mining, in Proc of 10th
International Conference on Information and
Knowledge Management, 2001, pp 474-481
http://doi.acm.org/10.1145/502585.502665