
AN APPROACH TO

ASSOCIATION RULE MINING

PROJECT GROUP
MEMBERS
Sudhir L. Naik
Rameez Maredia
Trupti Pachpande
Priyanka Maskar

PROJECT GUIDE
Prof. Prachi Kshirsagar

TOPICS TO BE
DISCUSSED
Introduction of Project
Proposed system
Objectives of Project
Software and Hardware Requirements
Explanation of the algorithm used, with a solved example
System Architecture
Project Planning and line of action
Conclusion and Future work
References
Introduction of project
What is data mining?
Association rule mining
Market basket analysis: an approach to association rule mining
Purpose
Concepts

DATA MINING
Data mining refers to extracting or “mining” knowledge from large amounts of data.
It is motivated by the wide availability of huge amounts of data and the imminent need to turn such data into useful information and knowledge.
Data mining is an essential step in the knowledge discovery process.
Data Mining: A KDD Process
Data mining is the core of the knowledge discovery process.
[Figure: stages of the KDD process, from bottom to top: Databases; Data Cleaning and Data Integration; Data Warehouse; Selection; Task-relevant Data; Data Mining; Pattern Evaluation]
ASSOCIATION RULE MINING
Association rule mining is one of the techniques of data mining.
It searches for interesting relationships among items in a given data set.
Association rules are of the form IF X THEN Y. For example:
• 70% of people who buy bread also buy jam.
• 50% of people who have high blood pressure and are overweight have high cholesterol.

MARKET BASKET ANALYSIS: An approach to association rule mining
Market basket analysis (one example of association rule mining) is a useful method of discovering customer purchasing patterns by extracting associations or co-occurrences from stores’ transactional databases.
It is an analysis conducted to determine which products customers purchase together.
This is very useful because, once it is known that customers who buy product A are likely to buy product B, the company can market both A and B together.
PURPOSE
• To improve the effectiveness of marketing and sales tactics.
• To determine what products customers purchase together.
• This facilitates impulse buying and helps ensure that customers who would buy a product don’t forget to buy it on account of not having seen it.
• In addition, this has the side effect of improving customer satisfaction: once they’ve found one of the items they want, customers don’t have to look all over the store for something else they want to buy. Their other purchases are already located nearby.
CONCEPTS
Itemset
It is a set of items in a transaction. A k-itemset is a set of k items.
Frequent Itemset
It is an itemset whose support in a transaction database is at least the minimum support specified.

CONCEPTS
Support
It is the percentage of records containing an
item combination compared to total number
of records.
Support( A & B )
How important is the rule: What
percent of baskets have both A & B?

Support
• Support( S ) = (# of baskets containing S) / (# of total baskets)
[Figure: Venn diagram of 4000 total baskets: 1000 contain Beer, 2000 contain Cereal, 40 contain both]
• Support { Beer } = 1000/4000 = 25%
• Support { Cereal } = 2000/4000 = 50%
• Support { Beer, Cereal } = 40/4000 = 1%
• Support: How significant is this itemset?
• In a supermarket, anything over 0.1% might be significant
• Given the # of total baskets, the minimum interesting support determines n for the Apriori algorithm
© Ellis Cohen, 2003-2006
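The support figures from the Beer/Cereal example can be reproduced with a few lines of Python. This is only a minimal sketch: the `support` helper and the `basket_counts` dictionary are illustrative names that do not appear in the slides; only the counts (40, 1000, 2000 and 4000) come from the example.

```python
from fractions import Fraction

# Basket counts taken from the Beer/Cereal example; the names are illustrative.
TOTAL_BASKETS = 4000
basket_counts = {
    frozenset({"Beer"}): 1000,
    frozenset({"Cereal"}): 2000,
    frozenset({"Beer", "Cereal"}): 40,
}


def support(itemset, counts=basket_counts, total=TOTAL_BASKETS):
    """Fraction of all baskets that contain every item in `itemset`."""
    return Fraction(counts[frozenset(itemset)], total)


for items in ({"Beer"}, {"Cereal"}, {"Beer", "Cereal"}):
    print(sorted(items), f"{float(support(items)):.0%}")
```

Using `Fraction` keeps the ratios exact until they are formatted as percentages.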
CONCEPTS
Confidence
Confidence measures how confident we can be, given that a customer has purchased one product, that he will also purchase another product.
Confidence( A → B )
How likely is it that baskets which contain A also contain B? In general, it should be at least 35%.
Confidence
• Confidence( A → B ) = Support( A & B ) / Support( A )
• Confidence( A → B ) = (# of baskets containing A & B) / (# of baskets containing A)
[Figure: Venn diagram of 4000 total baskets: 1000 contain Beer, 2000 contain Cereal, 40 contain both]
• Confidence( Beer → Cereal ) = 40/1000 = 4%
• Confidence( Cereal → Beer ) = 40/2000 = 2%
• Confidence( A → B ): If a basket has A, how likely is it that the basket also will have B (i.e. how confident are we that A predicts B)
• If this is low (say, less than 30%), it is not very interesting, since the two items don't correlate

© Ellis Cohen, 2003-2006
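The two confidence values in the Beer/Cereal example follow directly from the basket counts. The sketch below is illustrative only: the `confidence` function and the constant names are assumptions, and the counts are the ones given in the example.

```python
def confidence(count_a_and_b, count_a):
    """Confidence(A -> B) = (# baskets with A and B) / (# baskets with A)."""
    if count_a == 0:
        raise ValueError("A never occurs, so Confidence(A -> B) is undefined")
    return count_a_and_b / count_a


# Counts from the example: baskets with Beer, with Cereal, and with both.
BEER, CEREAL, BOTH = 1000, 2000, 40

print(f"Confidence(Beer -> Cereal) = {confidence(BOTH, BEER):.0%}")
print(f"Confidence(Cereal -> Beer) = {confidence(BOTH, CEREAL):.0%}")
```

Note that confidence is not symmetric: swapping A and B changes the denominator, which is why the two rules get different values.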


PROPOSED SYSTEM
Problem statement
Our aim is to develop software for a supermarket. The software is essentially an implementation of market basket analysis using the Apriori algorithm.

OBJECTIVES OF
PROJECT
To make more informed decisions about product
placement, pricing, promotion and profitability.
To learn more about customer behaviour.
To find out which products perform similarly to
each other.
To determine which products should be placed
near each other.
To find out which products should be cross-sold.
To find out if there are any successful products
that have no significant related elements.
SOFTWARE AND HARDWARE REQUIREMENTS
• Microsoft Windows (2000 / XP / Vista)
• Intel Pentium III / AMD Athlon processor
• 256 MB RAM
• Oracle for the database
• ASP.NET for coding
• All platforms of Microsoft Windows
• Hard disk with at least 40 GB
• Windows-compatible mouse and keyboard

ALGORITHM USED: APRIORI ALGORITHM
This is a level-wise algorithm developed by Dr. R. Agrawal and Dr. R. Srikant.
First, the set of frequent 1-itemsets is found. It is then used to generate the frequent 2-itemsets, which are in turn used to generate the frequent 3-itemsets, and so on.
The name of the algorithm is based on the fact that the algorithm uses prior knowledge of frequent itemset properties.
Input to the algorithm: all transactions
Output: all item combinations (itemsets) that satisfy the minimum support
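The level-wise idea can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the function name `apriori`, the `min_count` parameter (minimum support expressed as a number of transactions) and the sample baskets are all hypothetical, and the join step is a simplified union of any two frequent itemsets rather than the prefix-based join of the original algorithm.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: support count} for every itemset with count >= min_count."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One scan of the database: count the transactions containing each candidate,
        # then keep only the candidates that reach the minimum support count.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_count}

    # Level 1: frequent 1-itemsets.
    items = {item for t in transactions for item in t}
    level = count({frozenset([item]) for item in items})
    frequent = dict(level)
    k = 2
    while level:
        # Join frequent (k-1)-itemsets into k-item candidates ...
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # ... and prune any candidate that has an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = count(candidates)
        frequent.update(level)
        k += 1
    return frequent


# Hypothetical baskets, for illustration only.
baskets = [
    {"bread", "jam"},
    {"bread", "jam", "milk"},
    {"bread", "milk"},
    {"jam"},
]
result = apriori(baskets, min_count=2)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

The loop stops as soon as a level produces no frequent itemsets, because no larger itemset can then be frequent.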
SOLVED EXAMPLE: there are 9 transactions in this database
TID List of item_IDs
T100 I1,I2,I5
T200 I2,I4
T300 I2,I3
T400 I1,I2,I4
T500 I1,I3
T600 I2,I3
T700 I1,I3
T800 I1,I2,I3,I5
T900 I1,I2,I3

Scan D for the count of each candidate to obtain C1, then compare each candidate's support count with the minimum support count to obtain L1.

C1
Itemset   Sup. count
{I1}      6
{I2}      7
{I3}      6
{I4}      2
{I5}      2

L1
Itemset   Sup. count
{I1}      6
{I2}      7
{I3}      6
{I4}      2
{I5}      2
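The first pass over the nine transactions can be sketched with a `Counter`. The variable names are illustrative, and the minimum support count of 2 is an assumption: the slides do not state it explicitly, but it matches the itemsets that the tables of this example retain.

```python
from collections import Counter

# The nine transactions of the example database D (TID -> items).
D = {
    "T100": ["I1", "I2", "I5"],
    "T200": ["I2", "I4"],
    "T300": ["I2", "I3"],
    "T400": ["I1", "I2", "I4"],
    "T500": ["I1", "I3"],
    "T600": ["I2", "I3"],
    "T700": ["I1", "I3"],
    "T800": ["I1", "I2", "I3", "I5"],
    "T900": ["I1", "I2", "I3"],
}
MIN_COUNT = 2  # assumed minimum support count (not stated in the slides)

# C1: scan D once and count every single item.
c1 = Counter(item for items in D.values() for item in items)

# L1: keep the candidates whose count reaches the minimum support count.
l1 = {item: n for item, n in c1.items() if n >= MIN_COUNT}

for item in sorted(c1):
    status = "kept" if item in l1 else "dropped"
    print(item, c1[item], status)
```

With this threshold every single item survives, so C1 and L1 are identical.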

Scan D for the count of each C2 candidate, then compare each candidate's support count with the minimum support count to obtain L2.

C2 (candidate itemsets)
{I1,I2}, {I1,I3}, {I1,I4}, {I1,I5}, {I2,I3}, {I2,I4}, {I2,I5}, {I3,I4}, {I3,I5}, {I4,I5}

C2 (after scanning D)
Itemset   Sup. count
{I1,I2}   4
{I1,I3}   4
{I1,I4}   1
{I1,I5}   2
{I2,I3}   4
{I2,I4}   2
{I2,I5}   2
{I3,I4}   0
{I3,I5}   1
{I4,I5}   0

L2 (candidates whose support count is at least 2)
Itemset   Sup. count
{I1,I2}   4
{I1,I3}   4
{I1,I5}   2
{I2,I3}   4
{I2,I4}   2
{I2,I5}   2
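The second pass can be sketched in the same way: pair up the frequent single items, count each pair, and filter. The names below are illustrative, and the minimum support count of 2 is again an assumption inferred from which pairs are kept in L2.

```python
from itertools import combinations

# The nine transactions of the example, as sets of item IDs.
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
MIN_COUNT = 2  # assumed minimum support count
l1_items = ["I1", "I2", "I3", "I4", "I5"]  # the frequent 1-itemsets

# C2: every pair of frequent items, with its support count from one scan of D.
c2 = {
    pair: sum(1 for t in transactions if set(pair) <= t)
    for pair in combinations(l1_items, 2)
}

# L2: pairs whose count reaches the minimum support count.
l2 = {pair: n for pair, n in c2.items() if n >= MIN_COUNT}

for pair, n in c2.items():
    print(pair, n, "kept" if pair in l2 else "dropped")
```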
Scan D for the count of each C3 candidate, then compare each candidate's support count with the minimum support count to obtain L3.

C3 (candidate itemsets)
{I1,I2,I3}, {I1,I2,I5}

C3 (after scanning D)
Itemset      Sup. count
{I1,I2,I3}   2
{I1,I2,I5}   2

L3
Itemset      Sup. count
{I1,I2,I3}   2
{I1,I2,I5}   2
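Only two 3-item candidates appear in C3 because candidates are built from L2 and discarded if any of their 2-item subsets is not frequent. The sketch below shows one common way to do this (a prefix-based join followed by a prune step). The function name `join_and_prune` is hypothetical, and the slides do not show how the project itself generates candidates.

```python
from itertools import combinations

# Frequent 2-itemsets (L2) of the example, as sorted tuples.
L2 = [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]

transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]


def join_and_prune(prev_level):
    """Build k-item candidates from frequent (k-1)-itemsets given as sorted tuples."""
    prev = set(prev_level)
    candidates = []
    for a, b in combinations(sorted(prev), 2):
        # Join step: merge two itemsets that agree on all but their last item.
        if a[:-1] == b[:-1]:
            candidate = a + (b[-1],)
            # Prune step: every (k-1)-subset of the candidate must be frequent.
            if all(s in prev for s in combinations(candidate, len(candidate) - 1)):
                candidates.append(candidate)
    return candidates


c3 = join_and_prune(L2)
c3_counts = {c: sum(1 for t in transactions if set(c) <= t) for c in c3}
print(c3_counts)
```

For example, {I1,I3,I5} is never counted against the database, because its subset {I3,I5} is missing from L2.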

The Apriori Algorithm
• Progressively identifies large itemsets of different sizes
• Exploits the property that any subset of a large itemset is also a large itemset
  – Also, any superset of a small itemset is also small
[Figure: lattice of itemsets over the items A, B, C, D, level by level: A, B, C, D; AB, AC, AD, BC, BD, CD; ABC, ABD, ACD, BCD; ABCD]
Large Itemset Property
• If B is not frequent, then none of the supersets of B can be frequent.
• If {ACD} is frequent, then all 2-item subsets of {ACD} ({AC}, {AD}, {CD}) must be frequent.
• If {ACD} is frequent, then all 1-item subsets of {ACD} ({A}, {C}, {D}) must be frequent.
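The property holds because every basket that contains an itemset also contains each of its subsets, so a subset's count can never be lower. The short check below illustrates this on a handful of hypothetical baskets over the items A to D. The baskets, the threshold and the helper names are assumptions made for the illustration.

```python
from itertools import combinations

# Hypothetical baskets over the items A-D, for illustration only.
baskets = [set("ACD"), set("ACD"), set("AC"), set("BD"), set("AD")]
MIN_COUNT = 2  # assumed minimum support count


def count(itemset):
    """Number of baskets that contain every item of `itemset`."""
    return sum(1 for b in baskets if set(itemset) <= b)


def proper_subsets(itemset):
    """All non-empty proper subsets of `itemset`."""
    items = sorted(itemset)
    return [set(c) for k in range(1, len(items)) for c in combinations(items, k)]


acd = set("ACD")
print("count(ACD) =", count(acd))
for s in proper_subsets(acd):
    # Each subset occurs at least as often as {A, C, D} itself.
    print(sorted(s), count(s), count(s) >= count(acd))

# B is below the threshold here, and so is its superset {B, D}.
print("count(B) =", count("B"), "count(BD) =", count("BD"))
```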
SYSTEM ARCHITECTURE

PROJECT PLANNING AND LINE OF ACTION

TIME LINE CHART

CONCLUSIONS AND FUTURE WORK

REFERENCES
• www.google.com
• www.wikipedia.org
• Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques
