1. Introduction
Minimarket X wants to analyse the shopping habits of its customers to find associations
and correlations among the items in their shopping baskets. Specifically, this market basket
analysis aims to determine which items customers frequently purchase together. The
application is designed to perform multi-dimensional data mining, in which each variable
represents one particular dimension. The dimensions involved are the items purchased and
the time of purchase. The output consists of the rules and the correlations between the items,
which support decision making. The goal of this research is to develop an application that finds
which products customers often purchase together, along with the attributes that influence those
purchases, such as product and time of purchase, using Minimarket X as a case study and
Hybrid-Dimension Association Rules as the criterion.
2. Problem Statement
The aim of this project is to find the relationships between the items in the baskets of a
particular market, Minimarket X, and to determine which items customers frequently
purchase together. We use the Food dataset from IBM SPSS Modeler.
3. Data Mining
3.1. Market Basket Analysis
Market basket analysis is a process that analyses the habits of buyers to find relationships
between the different items in their shopping carts (market baskets). Discovering these
relationships can help the seller develop a sales strategy that takes into account the items
customers frequently purchase together. For example, if a buyer buys flour, how likely are they
to buy sugar in the same transaction [1]?
The pseudo code for forming the joined candidate itemsets is given below.
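One common way to form candidate itemsets is the Apriori join-and-prune step. The sketch below is a minimal Python illustration, not the application's actual code. It assumes itemsets are stored as frozensets, and the function name `generate_candidates` and the sample items are hypothetical.

```python
from itertools import combinations

def generate_candidates(frequent_prev, k):
    """Build candidate k-itemsets from the frequent (k-1)-itemsets."""
    prev = set(frequent_prev)
    candidates = set()
    # Join step: merge pairs of (k-1)-itemsets whose union has exactly k items.
    for a, b in combinations(prev, 2):
        union = a | b
        if len(union) != k:
            continue
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        if all(frozenset(s) in prev for s in combinations(union, k - 1)):
            candidates.add(union)
    return candidates

# Hypothetical frequent 2-itemsets.
frequent_2 = {
    frozenset({"flour", "sugar"}),
    frozenset({"flour", "milk"}),
    frozenset({"sugar", "milk"}),
    frozenset({"flour", "eggs"}),
}
candidates_3 = generate_candidates(frequent_2, 3)
```

Here only {flour, sugar, milk} survives: any 3-itemset containing eggs is pruned because one of its 2-item subsets is not frequent.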
Once the frequent itemsets have been obtained, strong association rules are derived from
them. Rule strength can be assessed with the following measures [3].
a) Support
The rule X ⇒ Y holds with support s if s% of the transactions in D contain X ∪ Y. Rules
whose s is greater than a user-specified threshold are said to have minimum support.
$$\mathrm{Support}(X \Rightarrow Y) = \frac{\text{number of transactions that contain } X \cup Y}{\text{total number of transactions}}$$
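As a minimal sketch of the support computation, assuming each transaction is a Python set of item names (the five transactions here are made up for illustration):

```python
# Hypothetical toy transactions, not taken from the Food dataset.
transactions = [
    {"flour", "sugar", "eggs"},
    {"flour", "sugar"},
    {"flour", "milk"},
    {"sugar", "milk"},
    {"flour", "sugar", "milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

# Support of the rule flour => sugar is the support of {flour, sugar}.
supp_flour_sugar = support({"flour", "sugar"}, transactions)
```

Three of the five transactions contain both flour and sugar, so the rule has a support of 3/5.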
b) Confidence
The rule X ⇒ Y holds with confidence c if c% of the transactions in D that contain X
also contain Y. Rules whose c is greater than a user-specified threshold are said to
have minimum confidence.
$$\mathrm{Confidence}(X \Rightarrow Y) = \frac{\mathrm{Support}(X \cup Y)}{\mathrm{Support}(X)}$$
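Confidence can be sketched as a ratio of two transaction counts, which equals the ratio of the two supports. The helper names and the toy transactions below are assumptions for illustration.

```python
# Hypothetical toy transactions, not taken from the Food dataset.
transactions = [
    {"flour", "sugar", "eggs"},
    {"flour", "sugar"},
    {"flour", "milk"},
    {"sugar", "milk"},
    {"flour", "sugar", "milk"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def confidence(antecedent, consequent, transactions):
    """Share of transactions containing the antecedent that also contain the consequent."""
    ante = count(antecedent, transactions)
    both = count(set(antecedent) | set(consequent), transactions)
    return both / ante if ante else 0.0

conf_flour_sugar = confidence({"flour"}, {"sugar"}, transactions)
```

Flour appears in four transactions and three of those also contain sugar, so the confidence of flour ⇒ sugar is 3/4.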
c) Interestingness
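This measure identifies rare rules: a rule can still score as interesting even when its
individual support count is low.
$$\mathrm{Interestingness}(X \Rightarrow Y) = \frac{\mathrm{Support}(X \cup Y)}{\mathrm{Support}(X)} \times \frac{\mathrm{Support}(X \cup Y)}{\mathrm{Support}(Y)} \times \bigl(1 - \mathrm{Support}(X \cup Y)\bigr)$$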
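The sketch below assumes interestingness takes the three-factor form used in the multi-objective rule mining literature such as [3]: Support(X ∪ Y)/Support(X), times Support(X ∪ Y)/Support(Y), times 1 − Support(X ∪ Y), with all supports expressed as fractions. The function name and the input values are hypothetical.

```python
def interestingness(supp_xy, supp_x, supp_y):
    """Three-factor interestingness; all supports are fractions in (0, 1]."""
    # The last factor discounts itemsets that appear in most transactions.
    return (supp_xy / supp_x) * (supp_xy / supp_y) * (1.0 - supp_xy)

# A common pair versus a rare but tightly linked pair (made-up supports).
common_rule = interestingness(supp_xy=0.60, supp_x=0.80, supp_y=0.80)
rare_rule = interestingness(supp_xy=0.05, supp_x=0.06, supp_y=0.06)
```

With these inputs the rare rule scores higher than the common one, which is the behaviour the measure is meant to capture.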
d) Comprehensibility
The comprehensibility measure is needed to make the discovered rules easy to
understand. It tries to quantify the understandability of a rule.
$$\mathrm{Comprehensibility} = \frac{\log(1 + |Y|)}{\log(1 + |X \cup Y|)}$$
Here |Y| and |X ∪ Y| are the number of attributes involved in the consequent body and
in the total rule, respectively.
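A short sketch of the comprehensibility ratio log(1 + |Y|) / log(1 + |X ∪ Y|), assuming the antecedent and consequent are given as sets of attribute names (the items are hypothetical):

```python
import math

def comprehensibility(antecedent, consequent):
    """log(1 + |Y|) / log(1 + |X u Y|) for the rule X => Y."""
    consequent = set(consequent)
    rule_items = set(antecedent) | consequent
    return math.log(1 + len(consequent)) / math.log(1 + len(rule_items))

short_rule = comprehensibility({"flour"}, {"sugar"})
long_rule = comprehensibility({"flour", "milk", "eggs"}, {"sugar"})
```

The rule with the shorter antecedent gets the higher score, so for a fixed consequent shorter rules count as easier to understand.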
e) Lift
The lift value is a measure of the importance of a rule. The lift value of an association
rule is the ratio of the confidence of the rule to the expected confidence of the rule.
The expected confidence of a rule is defined as the product of the support values of
the rule body and the rule head, divided by the support of the rule body.
$$\mathrm{Lift}(X \Rightarrow Y) = \frac{\mathrm{Support}(X \cup Y)}{\mathrm{Support}(X)\,\mathrm{Support}(Y)}$$
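Lift can be sketched directly from three support values. The function name and the numbers below are assumptions for illustration.

```python
def lift(supp_xy, supp_x, supp_y):
    """Ratio of the observed joint support to the support expected under independence."""
    return supp_xy / (supp_x * supp_y)

# Made-up supports: the first pair co-occurs more often than chance, the second less.
positive = lift(supp_xy=0.30, supp_x=0.40, supp_y=0.50)
negative = lift(supp_xy=0.60, supp_x=0.80, supp_y=0.80)
```

A lift above 1 means the items appear together more often than independence would predict, a lift below 1 means less often, and a lift of exactly 1 indicates independence.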
f) Leverage
Leverage measures the difference between how often X and Y appear together in the data set
and what would be expected if X and Y were statistically independent. The rationale in a
sales setting is to find out how many more units (items X and Y together) are sold
than expected from independent sales.
$$\mathrm{Leverage}(X \Rightarrow Y) = \mathrm{Support}(X \cup Y) - \mathrm{Support}(X)\,\mathrm{Support}(Y)$$
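A minimal sketch of leverage as the joint support minus the product of the individual supports. The helper `extra_units`, which scales leverage by the number of transactions, and all input values are hypothetical.

```python
def leverage(supp_xy, supp_x, supp_y):
    """Observed joint support minus the joint support expected under independence."""
    return supp_xy - supp_x * supp_y

def extra_units(supp_xy, supp_x, supp_y, n_transactions):
    """Leverage scaled to a count: co-purchases beyond what independence predicts."""
    return leverage(supp_xy, supp_x, supp_y) * n_transactions

lev = leverage(supp_xy=0.30, supp_x=0.40, supp_y=0.50)
extra = extra_units(0.30, 0.40, 0.50, n_transactions=1000)
```

With these made-up numbers the pair is bought together in roughly 100 more transactions out of 1000 than independent purchases would produce.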
g) Conviction
Conviction is the ratio of the expected frequency with which X occurs without Y (that is to
say, the frequency with which the rule makes an incorrect prediction) if X and Y were
independent to the observed frequency of incorrect predictions.
$$\mathrm{Conviction}(X \Rightarrow Y) = \frac{1 - \mathrm{Support}(Y)}{1 - \mathrm{Confidence}(X \Rightarrow Y)}$$
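Conviction can be sketched from the consequent support and the rule confidence. Returning infinity when the confidence is 1 is a convention assumed here, since the denominator is zero for a rule that never fails.

```python
import math

def conviction(supp_y, confidence):
    """(1 - Support(Y)) / (1 - Confidence(X => Y))."""
    if confidence >= 1.0:
        # The rule never makes a wrong prediction, so the ratio is unbounded.
        return math.inf
    return (1.0 - supp_y) / (1.0 - confidence)

# Made-up values for illustration.
weak = conviction(supp_y=0.80, confidence=0.75)
strong = conviction(supp_y=0.20, confidence=0.90)
```

A conviction below 1, as in the first case, means the rule is wrong more often than it would be if X and Y were independent. Values above 1 mean it is wrong less often.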
h) Coverage
Coverage, sometimes called antecedent support, measures how often a rule is
applicable in a database.
$$\mathrm{Coverage}(X \Rightarrow Y) = \mathrm{Support}(X)$$
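The sketch below computes coverage on a made-up set of transactions and checks how it relates to the other measures: multiplying coverage by confidence gives back the rule's support. All names and data are hypothetical.

```python
# Hypothetical toy transactions, not taken from the Food dataset.
transactions = [
    {"flour", "sugar", "eggs"},
    {"flour", "sugar"},
    {"flour", "milk"},
    {"sugar", "milk"},
    {"flour", "sugar", "milk"},
]

def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)

x, y = {"flour"}, {"sugar"}
n = len(transactions)

coverage = count(x) / n                 # how often the rule can fire
confidence = count(x | y) / count(x)    # how often it is right when it fires
rule_support = count(x | y) / n         # how often it fires and is right
```

Here flour ⇒ sugar is applicable in four of the five transactions, so its coverage is 4/5.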
4. Basic architecture
a)
b)
c)
d)
6. Results
Support min    Confidence min    Max rule length    Lift filtering
0.05           0.5               4                  1.1

Apriori: Rules
Apriori: Metrics
FPgrowth: Rules
FPgrowth: Metrics
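To illustrate how the four parameter values listed above could be applied to a set of mined rules, here is a minimal sketch. It assumes that "lift filtering" means keeping rules whose lift is at least 1.1 and that rule length counts the items in the antecedent plus the consequent. The rules and their metric values are made up and are not results from this study.

```python
MIN_SUPPORT = 0.05
MIN_CONFIDENCE = 0.5
MAX_RULE_LENGTH = 4   # assumption: items in antecedent plus consequent
MIN_LIFT = 1.1        # assumption: keep rules with lift >= 1.1

def keep_rule(rule):
    """True if the rule passes all four thresholds."""
    length = len(rule["antecedent"]) + len(rule["consequent"])
    return (rule["support"] >= MIN_SUPPORT
            and rule["confidence"] >= MIN_CONFIDENCE
            and length <= MAX_RULE_LENGTH
            and rule["lift"] >= MIN_LIFT)

# Hypothetical rules with made-up metric values, for illustration only.
rules = [
    {"antecedent": {"flour"}, "consequent": {"sugar"},
     "support": 0.12, "confidence": 0.64, "lift": 1.8},
    {"antecedent": {"milk"}, "consequent": {"eggs"},
     "support": 0.03, "confidence": 0.70, "lift": 2.1},   # fails minimum support
    {"antecedent": {"flour"}, "consequent": {"milk"},
     "support": 0.20, "confidence": 0.55, "lift": 1.0},   # fails lift filter
]
kept = [r for r in rules if keep_rule(r)]
```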
7. Comparing Models
Comparison table
Parameters compared: technique, memory utilization, number of scans, time.

Apriori
  Technique: Uses the Apriori property together with the join and prune properties.

FP-Growth
  Technique: Constructs a conditional frequent pattern tree and a conditional pattern base
  from the database for the itemsets that satisfy minimum support.
  Memory utilization: Requires less memory, owing to its compact structure and the absence
  of candidate generation.
  Number of scans: Scans the database only twice.
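The two database scans used by FP-Growth can be sketched as follows: the first scan counts item frequencies, and the second inserts each transaction, with its frequent items sorted by descending count, into a prefix tree. This covers only tree construction, not the mining of conditional trees. The class and function names and the toy data are assumptions.

```python
from collections import Counter

class FPNode:
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_count):
    # Scan 1: count items and keep those that meet the minimum support count.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_count}

    root = FPNode(None)
    # Scan 2: insert each transaction, most frequent items first, sharing prefixes.
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
            node = node.children[item]
            node.count += 1
    return root, frequent

# Hypothetical toy transactions, not taken from the Food dataset.
transactions = [
    {"flour", "sugar", "eggs"},
    {"flour", "sugar"},
    {"flour", "milk"},
    {"sugar", "milk"},
    {"flour", "sugar", "milk"},
]
root, frequent = build_fp_tree(transactions, min_count=2)
```

Because transactions that share a prefix share nodes, the tree is usually much smaller than the raw transaction list, which is the source of the memory saving.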
Based on Metrics
Support* confidence
Support:
Confidence:
8. Conclusion
FP-Growth is therefore a good model for association rule mining in market basket
analysis.
9. References
[1] D. Olson and Y. Shi, Introduction to Business Data Mining. New York: McGraw-Hill, 2007.
[2] J. Han and M. Kamber, Data Mining: Concepts and Techniques. Morgan Kaufmann, 2001.
[3] M. K. Gupta and G. Sikka, "Association Rules Extraction using Multi-objective Feature of
Genetic Algorithm."
[4] International Journal of Advanced Research in Computer Science and Software
Engineering, vol. 3, no. 6, June 2013.