PROJECT GROUP MEMBERS
Sudhir L. Naik
Rameez Maredia
Trupti Pachpande
Priyanka Maskar
PROJECT GUIDE
Prof. Prachi Kshirsagar
TOPICS TO BE DISCUSSED
Introduction of Project
Proposed system
Objectives of Project
Software and Hardware Requirements
Explanation of algorithm used, with solved example
System Architecture
Project Planning and line of action
Conclusion and Future work
References
Introduction of project
What is data mining?
Association rule mining
Market basket analysis: an approach to association rule mining
Purpose
Concepts
DATA MINING
Data mining refers to extracting or “mining” knowledge from large amounts of data.
It is motivated by the wide availability of huge amounts of data and the imminent need to turn such data into useful information and knowledge.
Data mining is an essential step in knowledge discovery.
Data Mining: A KDD Process
– Data mining: the core of the knowledge discovery process
[Figure: stages of the KDD process, from bottom to top: Databases; Data Cleaning and Data Integration; Data Warehouse; Selection; Task-relevant Data; Data Mining; Pattern Evaluation]
ASSOCIATION RULE MINING
Association rule mining is one of the techniques of data mining.
It searches for interesting relationships among items in a given data set.
Association rules are of the form IF X THEN Y. For example:
70% of people who buy bread also buy jam.
50% of people who have high blood pressure and are overweight have high cholesterol.
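The percentage attached to a rule such as "IF bread THEN jam" can be read as the share of transactions containing X that also contain Y. This measure is commonly called the rule's confidence, a term the slides themselves do not use. A minimal sketch follows; the function name `rule_strength` and the four baskets are hypothetical and invented purely for illustration.

```python
def rule_strength(transactions, x, y):
    """Fraction of the transactions containing x that also contain y."""
    x, y = frozenset(x), frozenset(y)
    with_x = [t for t in transactions if x <= t]
    if not with_x:
        return 0.0  # the rule never applies, so report zero strength
    return sum(1 for t in with_x if y <= t) / len(with_x)


# Hypothetical baskets, not taken from the slides.
baskets = [
    {"bread", "jam"},
    {"bread", "jam", "milk"},
    {"bread", "butter"},
    {"milk"},
]

# 2 of the 3 baskets that contain bread also contain jam.
print(rule_strength(baskets, {"bread"}, {"jam"}))
```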
MARKET BASKET ANALYSIS: An approach to association rule mining
Market basket analysis (one example of association rule mining) is a useful method of discovering customer purchasing patterns by extracting associations or co-occurrences from stores’ transactional databases.
It is an analysis conducted to determine which products customers purchase together.
This can be very useful: once it is known that customers who buy product A are likely to buy product B, the company can market both A and B
PURPOSE
To improve the effectiveness of marketing and
sales tactics
To determine what products customers purchase
together.
This facilitates impulse buying and helps ensure
that customers who would buy a product don’t
forget to buy it on account of not having seen it.
In addition, this has the side effect of improving
customer satisfaction – once they’ve found one of
the items they want, the customer doesn’t have
to look all over the store for something they want
to buy. Their other purchases are already located
CONCEPTS
Itemset
An itemset is a set of items in a transaction. A k-itemset is a set of k items.
Frequent Itemset
A frequent itemset is an itemset whose support in a transaction database is at least the minimum support specified.
CONCEPTS
Support
It is the percentage of records containing an item combination, relative to the total number of records.
Support(A & B)
How important is the rule: what percent of baskets have both A and B?
Support
• Support(S) = (# of baskets containing S) / (# of total baskets)
[Table: Beer and Cereal basket counts: 40, 1000, 2000, 4000]
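The support formula translates directly into a few lines of code. The sketch below applies it to the nine transactions of the solved example in this deck; the function name `support` and the list name `D` are illustrative choices, not part of the original project.

```python
def support(transactions, itemset):
    """Share of baskets that contain every item of the itemset."""
    itemset = frozenset(itemset)
    containing = sum(1 for t in transactions if itemset <= t)
    return containing / len(transactions)


# The nine transactions T100..T900 from the solved example.
D = [
    {"I1", "I2", "I5"},
    {"I2", "I4"},
    {"I2", "I3"},
    {"I1", "I2", "I4"},
    {"I1", "I3"},
    {"I2", "I3"},
    {"I1", "I3"},
    {"I1", "I2", "I3", "I5"},
    {"I1", "I2", "I3"},
]

# {I1, I2} appears in 4 of the 9 baskets.
print(support(D, {"I1", "I2"}))
```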
OBJECTIVES OF PROJECT
To make more informed decisions about product
placement, pricing, promotion and profitability.
To learn more about customer behaviour.
To find out which products perform similarly to
each other.
To determine which products should be placed
near each other.
To find out which products should be cross-sold.
To find out if there are any successful products
that have no significant related elements.
SOFTWARE AND HARDWARE REQUIREMENTS
Microsoft Windows (2000 / XP / Vista)
256MB RAM
ALGORITHM USED: APRIORI ALGORITHM
This is a level-wise algorithm developed by Dr. R. Agrawal and Dr. R. Srikant.
First, the set of frequent 1-itemsets is found. It is then used to generate frequent 2-itemsets, which are used to generate frequent 3-itemsets, and so on.
The name of the algorithm is based on the fact that the algorithm uses prior knowledge of frequent itemset properties.
Input to algorithm: all transactions
Output: all item combinations that satisfy the minimum support
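The level-wise procedure can be sketched as follows. This is an illustrative implementation under stated assumptions, not the project's own code: the function name `apriori`, the use of an absolute support count (`min_count`) instead of a percentage, and the set-union form of the join step are all choices made for this sketch.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One scan of the database per level.
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    # Level 1: frequent single items.
    items = {frozenset([i]) for t in transactions for i in t}
    level = {c: n for c, n in count(items).items() if n >= min_count}
    frequent = dict(level)

    k = 2
    while level:
        prev = set(level)
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = {c: n for c, n in count(candidates).items() if n >= min_count}
        frequent.update(level)
        k += 1
    return frequent


# The nine transactions of the solved example; min_count=2 is an assumption
# inferred from the example tables (count 2 is kept, count 1 is dropped).
D = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
result = apriori(D, min_count=2)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

The loop stops as soon as a level produces no frequent itemsets, because no larger itemset can then be frequent.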
EXAMPLE SOLVED: there are 9 transactions in this database
TID     List of item_IDs
T100    I1, I2, I5
T200    I2, I4
T300    I2, I3
T400    I1, I2, I4
T500    I1, I3
T600    I2, I3
T700    I1, I3
T800    I1, I2, I3, I5
T900    I1, I2, I3
Scan D for count of each candidate:
C1
Itemset   Sup. count
{I1}      6
{I2}      7
{I3}      6
{I4}      2
{I5}      2
Compare candidate support count with min. support:
L1
Itemset   Sup. count
{I1}      6
{I2}      7
{I3}      6
{I4}      2
{I5}      2
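The first pass over the database can be reproduced with a simple counter. In this sketch the names `D`, `c1`, `l1` and `MIN_COUNT` are illustrative, and the threshold of 2 is an assumption inferred from the example tables, where itemsets with count 2 are kept and those with count 1 are dropped.

```python
from collections import Counter

# Transaction database D from the solved example, keyed by TID.
D = {
    "T100": {"I1", "I2", "I5"},
    "T200": {"I2", "I4"},
    "T300": {"I2", "I3"},
    "T400": {"I1", "I2", "I4"},
    "T500": {"I1", "I3"},
    "T600": {"I2", "I3"},
    "T700": {"I1", "I3"},
    "T800": {"I1", "I2", "I3", "I5"},
    "T900": {"I1", "I2", "I3"},
}
MIN_COUNT = 2  # assumed minimum support count

# C1: scan D once and count every individual item.
c1 = Counter(item for items in D.values() for item in items)

# L1: keep only the candidates that reach the minimum support count.
l1 = {item: n for item, n in c1.items() if n >= MIN_COUNT}

for item in sorted(c1):
    print(item, c1[item])
```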
C2
Itemset
{I1,I2}
{I1,I3}
{I1,I4}
{I1,I5}
{I2,I3}
{I2,I4}
{I2,I5}
{I3,I4}
{I3,I5}
{I4,I5}
Scan D for count of each candidate:
C2
Itemset   Sup. count
{I1,I2}   4
{I1,I3}   4
{I1,I4}   1
{I1,I5}   2
{I2,I3}   4
{I2,I4}   2
{I2,I5}   2
{I3,I4}   0
{I3,I5}   1
{I4,I5}   0
Compare candidate support count with min. support:
L2
Itemset   Sup. count
{I1,I2}   4
{I1,I3}   4
{I1,I5}   2
{I2,I3}   4
{I2,I4}   2
{I2,I5}   2
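The second level can be checked the same way: pair up the frequent single items, count each pair in D, and filter. The variable names and the minimum support count of 2 are assumptions made for this sketch (the threshold is inferred from the tables, which keep count 2 and drop count 1).

```python
from itertools import combinations

# The nine transactions T100..T900 of the solved example.
D = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
MIN_COUNT = 2  # assumed minimum support count
l1_items = ["I1", "I2", "I3", "I4", "I5"]  # all five items are frequent in L1

# C2: every pair of frequent items, with its support count in D.
c2 = {
    pair: sum(1 for t in D if set(pair) <= t)
    for pair in combinations(l1_items, 2)
}

# L2: pairs that reach the minimum support count.
l2 = {pair: n for pair, n in c2.items() if n >= MIN_COUNT}

for pair, n in c2.items():
    print(pair, n, "kept" if pair in l2 else "dropped")
```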
C3
Itemset
{I1,I2,I3}
{I1,I2,I5}
Scan D for count of each candidate:
C3
Itemset      Sup. count
{I1,I2,I3}   2
{I1,I2,I5}   2
Compare candidate support count with min. support:
L3
Itemset      Sup. count
{I1,I2,I3}   2
{I1,I2,I5}   2
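C3 contains only two candidates because the join of L2 with itself is pruned before D is scanned: any 3-itemset with an infrequent 2-item subset is discarded. The sketch below illustrates that join-and-prune step under the assumption that the join is done by set union; the variable names are illustrative.

```python
from itertools import combinations

# L2 from the solved example.
l2 = [frozenset(p) for p in [
    ("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
    ("I2", "I3"), ("I2", "I4"), ("I2", "I5"),
]]
l2_set = set(l2)

# Join: unions of two frequent 2-itemsets that contain exactly 3 items.
joined = {a | b for a, b in combinations(l2, 2) if len(a | b) == 3}

# Prune: keep a candidate only if all of its 2-item subsets are in L2.
c3 = {
    c for c in joined
    if all(frozenset(s) in l2_set for s in combinations(c, 2))
}
pruned = joined - c3

print("C3:", sorted(sorted(c) for c in c3))
print("pruned:", sorted(sorted(c) for c in pruned))
```

For example, {I1,I2,I4} is pruned because its subset {I1,I4} is not in L2.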
The Apriori Algorithm
• Progressively identifies large itemsets of different sizes
• Exploits the property that any subset of a large itemset is also a large itemset
– Also, any superset of a small itemset is also small
[Figure: Large Itemset Property. Itemset lattice over items A, B, C, D with the levels: A B C D; AB AC AD BC BD CD; ABC ABD ACD BCD; ABCD]
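The second half of the property is what lets the algorithm skip work: once an itemset is known to be small, all of its supersets in the lattice can be ruled out without counting them. The following sketch builds the A, B, C, D lattice and lists the supersets of one itemset; the helper name `supersets` and the choice of AB as the small itemset are hypothetical.

```python
from itertools import combinations

ITEMS = "ABCD"

# All non-empty itemsets over A, B, C, D, ordered by size (15 in total).
lattice = [frozenset(c) for k in range(1, 5) for c in combinations(ITEMS, k)]


def supersets(small, lattice):
    """Itemsets in the lattice that strictly contain the given itemset."""
    small = frozenset(small)
    return [s for s in lattice if small < s]


# Suppose AB turned out to be small (hypothetical): these can be skipped.
skipped = supersets("AB", lattice)
print(["".join(sorted(s)) for s in skipped])  # ['ABC', 'ABD', 'ABCD']
```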
PROJECT PLANNING AND LINE OF ACTION
TIME LINE CHART
CONCLUSIONS AND FUTURE WORK
REFERENCES
www.google.com
www.wikipedia.org
Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques.