
11/19/2019 Apriori Algorithm (Python 3.0) - A Data Analyst

A DATA ANALYST
Lifelong Learning From Information


MACHINE LEARNING / 4 COMMENTS

Apriori Algorithm (Python 3.0)


Apriori Algorithm

The Apriori principle says that if an itemset is frequent, then all of its subsets are frequent. This means that if {0,1} is frequent,
then {0} and {1} have to be frequent.

The rule turned around says that if an itemset is infrequent, then its supersets are also infrequent.
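
The principle can be checked directly on a small example. The following is a minimal sketch, assuming the four-transaction toy dataset used later in this post; the helper name `support` is introduced here for illustration only and is not part of the post's code.

```python
from itertools import combinations

# Toy transactions, the same four used later in this post.
transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset (illustrative helper)."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

# Every proper subset of an itemset is at least as frequent as the itemset itself.
itemset = {2, 3, 5}
for size in range(1, len(itemset)):
    for subset in combinations(sorted(itemset), size):
        assert support(subset, transactions) >= support(itemset, transactions)

# Conversely, {4} is infrequent at a 0.5 threshold, so a superset cannot do better.
print(support({4}, transactions))     # 0.25
print(support({3, 4}, transactions))  # 0.25
```

This is why Apriori can discard every superset of an infrequent itemset without counting it.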

https://adataanalyst.com/machine-learning/apriori-algorithm-python-3-0/ 1/13

We first need to find the frequent itemsets, and then we can find association rules.

Pros: Easy to code up

Cons: May be slow on large datasets

Works with: Numeric values, nominal values

Association analysis

Looking for hidden relationships in large datasets is known as association analysis or association rule learning. The problem is, finding
different combinations of items can be a time-consuming task and prohibitively expensive in terms of computing power.

These interesting relationships can take two forms: frequent item sets or association rules. Frequent item sets are a collection of items
that frequently occur together. The second way to view interesting relationships is association rules. Association rules suggest that a
strong relationship exists between two items.

With the frequent item sets and association rules, retailers have a much better understanding of their customers. Another example is
search terms from a search engine.

The support and confidence are ways we can quantify the success of our association analysis.

The support of an itemset is defined as the percentage of the dataset that contains this itemset.

The confidence for a rule P ➞ H is defined as support(P | H) / support(P). Remember, in Python, the | symbol is the set union; the
mathematical symbol is ∪. P | H means all the items in set P or in set H.
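
To make the two definitions concrete, here is a minimal sketch that computes them on a made-up grocery dataset. The transactions and the helper names `support` and `confidence` are assumptions for illustration and do not appear in the original post.

```python
# Hypothetical grocery transactions, invented for this example.
transactions = [
    {'bread', 'milk'},
    {'bread', 'butter'},
    {'bread', 'milk', 'butter'},
    {'milk'},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(P, H, transactions):
    """confidence(P -> H) = support(P | H) / support(P)."""
    P, H = set(P), set(H)
    return support(P | H, transactions) / support(P, transactions)

print(support({'bread'}, transactions))                          # 0.75
print(support({'bread', 'butter'}, transactions))                # 0.5
print(confidence({'butter'}, {'bread'}, transactions))           # 1.0
print(round(confidence({'bread'}, {'butter'}, transactions), 3)) # 0.667
```

Note that confidence is not symmetric: every basket with butter also has bread, but only two of the three baskets with bread have butter.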

General approach to the Apriori algorithm

1. Collect: Any method.
2. Prepare: Any data type will work as we’re storing sets.
3. Analyze: Any method.
4. Train: Use the Apriori algorithm to find frequent itemsets.
5. Test: Doesn’t apply.
6. Use: This will be used to find frequent itemsets and association rules between items.

Finding frequent itemsets

The way to find frequent itemsets is the Apriori algorithm. The Apriori algorithm needs a minimum support level as an input and a data
set. The algorithm will generate a list of all candidate itemsets with one item. The transaction data set will then be scanned to see which
sets meet the minimum support level. Sets that don’t meet the minimum support level will get tossed out. The remaining sets will then be
combined to make itemsets with two elements. Again, the transaction dataset will be scanned and itemsets not meeting the minimum
support level will get tossed. This procedure will be repeated until all sets are tossed out.

Scanning the dataset

For each transaction tran in the dataset:
    For each candidate itemset, can:
        Check to see if can is a subset of tran
        If so, increment the count of can

For each candidate itemset:
    If the support meets the minimum, keep this item

Return list of frequent itemsets

In [1]:

from numpy import *

Create a simple dataset for testing

In [2]:

def loadDataSet():
    return [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]

createC1() builds C1, the list of candidate itemsets of size one. In the Apriori algorithm, we create C1, and then we’ll scan the dataset to see if these
one-item sets meet our minimum support requirements. The itemsets that do meet our minimum requirements become L1. L1 then gets
combined to become C2, and C2 will get filtered to become L2.

Frozensets are sets that are frozen, which means they’re immutable; you can’t change them. You need to use the type frozenset instead of
set because you’ll later use these sets as the key in a dictionary.

You can’t pass a bare integer to set() or frozenset() in Python; the constructor expects an iterable such as a list (try it out). That’s why you
create a list of single-item lists. Finally, you sort the list, map every item in the list to frozenset(), and return this list of frozensets.
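
Both points about frozenset can be tried out in a few lines. This is a small sketch for experimentation; the variable names `counts` and `single` are made up for this example.

```python
# A plain set is mutable and therefore unhashable, so it cannot be a dict key.
counts = {}
try:
    counts[{1, 2}] = 1
except TypeError as err:
    print("set as key failed:", err)

# A frozenset is immutable and hashable, so it works as a key.
counts[frozenset([1, 2])] = 1
counts[frozenset([2, 1])] += 1   # same elements, so it is the same key
print(counts)                    # {frozenset({1, 2}): 2}

# The constructor expects an iterable, not a bare integer.
try:
    frozenset(1)
except TypeError as err:
    print("frozenset(1) failed:", err)

single = frozenset([1])          # wrap the item in a list first
print(single)                    # frozenset({1})
```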

In [11]:

def createC1(dataSet):
    C1 = []
    for transaction in dataSet:
        for item in transaction:
            if [item] not in C1:
                C1.append([item])

    C1.sort()
    return list(map(frozenset, C1))#use frozen set so we
                                   #can use it as a key in a dict

The scanD() function takes three arguments: a dataset D; Ck, a list of candidate sets; and minSupport, which is the minimum support you’re
interested in. This is the function you’ll use to generate L1 from C1. Additionally, this function returns a dictionary with support values.

In [28]:

def scanD(D, Ck, minSupport):
    ssCnt = {} #maps each candidate itemset to the number of transactions containing it
    for tid in D:
        for can in Ck:
            if can.issubset(tid):
                if can not in ssCnt: ssCnt[can] = 1 #first time this candidate is seen
                else: ssCnt[can] += 1
    numItems = float(len(D))
    retList = []
    supportData = {}
    for key in ssCnt:
        support = ssCnt[key]/numItems
        if support >= minSupport:
            retList.insert(0,key)
        supportData[key] = support
    return retList, supportData

In [29]:

dataSet = loadDataSet()
dataSet

Out[29]:

[[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]

In [30]:

C1 = createC1(dataSet)

In [31]:

C1

Out[31]:

[frozenset({1}),
frozenset({2}),
frozenset({3}),
frozenset({4}),
frozenset({5})]

C1 contains each item as a single-element frozenset.

In [32]:

# D is the dataset in set form.

D = list(map(set,dataSet))

In [33]:

D

Out[33]:

[{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]

Now that you have everything in set form, you can remove items that don’t meet our minimum support.

In [34]:

L1,suppDat0 = scanD(D,C1,0.5)
L1

Out[34]:

[frozenset({1}), frozenset({3}), frozenset({2}), frozenset({5})]

These four items make up our L1 list, that is, the list of one-item sets that occur in at least 50% of all transactions. Item 4 didn’t make the
minimum support level, so it’s not a part of L1. That’s OK. By removing it, you’ve removed more work from when you find the list of
two-item sets.
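
The supports behind this result can be checked independently of scanD(). The following is a minimal sketch using collections.Counter; the names `item_counts` and `item_support` are illustrative and not part of the post's code.

```python
from collections import Counter

dataSet = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]

# Count in how many transactions each item appears. set() guards against
# an item being listed twice in one transaction.
item_counts = Counter(item for transaction in dataSet for item in set(transaction))
n = len(dataSet)
item_support = {item: count / n for item, count in item_counts.items()}

for item in sorted(item_support):
    status = "kept" if item_support[item] >= 0.5 else "dropped"
    print(item, item_support[item], status)
# 1 0.5 kept
# 2 0.75 kept
# 3 0.75 kept
# 4 0.25 dropped
# 5 0.75 kept
```

With item 4 gone, only 6 two-item candidates remain to be counted instead of the 10 that five items would produce.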

Pseudo-code for the whole Apriori algorithm

While the number of items in the set is greater than 0:
    Create a list of candidate itemsets of length k
    Scan the dataset to see if each itemset is frequent
    Keep frequent itemsets to create itemsets of length k+1

The main function is apriori(); it calls aprioriGen() to create candidate itemsets: Ck.

The function aprioriGen() will take a list of frequent itemsets, Lk, and the size of the itemsets, k, to produce Ck. For example, it will take
the itemsets {0}, {1}, {2} and so on and produce {0,1}, {0,2}, and {1,2}.

The sets are combined using the set union, which is the | symbol in Python.

In [35]:

def aprioriGen(Lk, k): #creates Ck
    retList = []
    lenLk = len(Lk)
    for i in range(lenLk):
        for j in range(i+1, lenLk):
            #sort before slicing: the iteration order of a set is arbitrary
            L1 = sorted(Lk[i])[:k-2]; L2 = sorted(Lk[j])[:k-2]
            if L1==L2: #if first k-2 elements are equal
                retList.append(Lk[i] | Lk[j]) #set union
    return retList
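
The comparison of the first k-2 elements is the least obvious step, so here is a small sketch of what it buys. It assumes the prefix is taken from the sorted items, and the helper name `join_candidates` is hypothetical; it mirrors the idea of aprioriGen() rather than reproducing it.

```python
def join_candidates(Lk, k):
    """Merge (k-1)-itemsets whose first k-2 sorted items agree (illustrative helper)."""
    candidates = []
    for i in range(len(Lk)):
        for j in range(i + 1, len(Lk)):
            prefix_i = sorted(Lk[i])[:k - 2]
            prefix_j = sorted(Lk[j])[:k - 2]
            if prefix_i == prefix_j:
                candidates.append(Lk[i] | Lk[j])
    return candidates

pairs = [frozenset({0, 1}), frozenset({0, 2}), frozenset({1, 2})]

# Naive approach: union every pair of sets and keep the three-item results.
naive = [a | b for i, a in enumerate(pairs) for b in pairs[i + 1:] if len(a | b) == 3]
print(len(naive))                 # 3, all of them copies of {0, 1, 2}

# Prefix approach: only {0, 1} and {0, 2} share their first item, so one merge.
print(join_candidates(pairs, 3))  # [frozenset({0, 1, 2})]
```

Matching on the shared prefix produces each candidate once, so no separate de-duplication pass is needed.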

In [38]:

def apriori(dataSet, minSupport = 0.5):
    C1 = createC1(dataSet)
    D = list(map(set, dataSet))
    L1, supportData = scanD(D, C1, minSupport)
    L = [L1]
    k = 2
    while (len(L[k-2]) > 0):
        Ck = aprioriGen(L[k-2], k)
        Lk, supK = scanD(D, Ck, minSupport)#scan DB to get Lk
        supportData.update(supK)
        L.append(Lk)
        k += 1
    return L, supportData

In [39]:

L,suppData = apriori(dataSet)

In [40]:

L

Out[40]:

[[frozenset({1}), frozenset({3}), frozenset({2}), frozenset({5})],
 [frozenset({3, 5}), frozenset({1, 3}), frozenset({2, 5}), frozenset({2, 3})],
 [frozenset({2, 3, 5})],
 []]

L contains the lists of frequent itemsets that met a minimum support of 0.5, one list per itemset size (the last list is empty). The variable
suppData is a dictionary with the support values of our itemsets.

In [46]:

L[0]

Out[46]:

[frozenset({1}), frozenset({3}), frozenset({2}), frozenset({5})]

In [47]:

L[1]

Out[47]:

[frozenset({3, 5}), frozenset({1, 3}), frozenset({2, 5}), frozenset({2, 3})]

In [48]:

L[2]

Out[48]:

[frozenset({2, 3, 5})]

In [49]:

L[3]

Out[49]:

[]

In [50]:

aprioriGen(L[0],2)

Out[50]:

[frozenset({1, 3}),
frozenset({1, 2}),
frozenset({1, 5}),
frozenset({2, 3}),
frozenset({3, 5}),
frozenset({2, 5})]

Mining association rules from frequent item sets

To find association rules, we first start with a frequent itemset. We know this set of items is unique, but we want to see if there is anything
else we can get out of these items. One item or one set of items can imply another item.

generateRules() is the main function, which calls the other two.

The generateRules() function takes three inputs: a list of frequent itemsets, a dictionary of support data for those itemsets, and a
minimum confidence threshold. It generates a list of rules with confidence values that we can sort through later.

In [51]:

def generateRules(L, supportData, minConf=0.7): #supportData is a dict coming from scanD
    bigRuleList = []
    for i in range(1, len(L)):#only get the sets with two or more items
        for freqSet in L[i]:
            H1 = [frozenset([item]) for item in freqSet]
            if (i > 1):
                rulesFromConseq(freqSet, H1, supportData, bigRuleList, minConf)
            else:
                calcConf(freqSet, H1, supportData, bigRuleList, minConf)
    return bigRuleList

calcConf() calculates the confidence of each candidate rule and then finds out which rules meet the minimum confidence.

In [53]:

def calcConf(freqSet, H, supportData, brl, minConf=0.7):
    prunedH = [] #create new list to return
    for conseq in H:
        conf = supportData[freqSet]/supportData[freqSet-conseq] #calc confidence
        if conf >= minConf:
            print(freqSet-conseq,'-->',conseq,'conf:',conf)
            brl.append((freqSet-conseq, conseq, conf))
            prunedH.append(conseq)
    return prunedH

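rulesFromConseq() generates further candidate rules from a frequent itemset by merging consequents. It takes a frequent itemset and H, which is
a list of itemsets that could be on the right-hand side of a rule. Note that, as written, it is called only for itemsets of three or more items and
starts from two-item consequents, so rules with a single-item consequent such as {2, 3} ➞ {5} are never evaluated for those itemsets.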

In [54]:

def rulesFromConseq(freqSet, H, supportData, brl, minConf=0.7):
    m = len(H[0])
    if (len(freqSet) > (m + 1)): #try further merging
        Hmp1 = aprioriGen(H, m+1)#create Hm+1 new candidates
        Hmp1 = calcConf(freqSet, Hmp1, supportData, brl, minConf)
        if (len(Hmp1) > 1): #need at least two sets to merge
            rulesFromConseq(freqSet, Hmp1, supportData, brl, minConf)

In [55]:

L,suppData= apriori(dataSet,minSupport=0.5)

In [56]:

rules= generateRules(L,suppData, minConf=0.7)

frozenset({1}) --> frozenset({3}) conf: 1.0
frozenset({5}) --> frozenset({2}) conf: 1.0
frozenset({2}) --> frozenset({5}) conf: 1.0

This gives you three rules: {1} ➞ {3}, {5} ➞ {2}, and {2} ➞ {5}. It’s interesting to see that the rule with 2 and 5 can be flipped around but not
the rule with 1 and 3.
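
The asymmetry follows directly from the confidence formula. Below is a minimal worked sketch; the support values are computed by hand from the four toy transactions, and the dictionary `support` and helper `conf` are illustrative names rather than the post's own.

```python
# Supports of the relevant itemsets in the four-transaction toy dataset.
support = {
    frozenset({1}): 0.5,
    frozenset({3}): 0.75,
    frozenset({1, 3}): 0.5,
    frozenset({2}): 0.75,
    frozenset({5}): 0.75,
    frozenset({2, 5}): 0.75,
}

def conf(antecedent, consequent):
    """confidence(antecedent -> consequent) from the support table above."""
    return support[antecedent | consequent] / support[antecedent]

for a, c in [({1}, {3}), ({3}, {1}), ({2}, {5}), ({5}, {2})]:
    a, c = frozenset(a), frozenset(c)
    value = conf(a, c)
    verdict = "kept" if value >= 0.7 else "dropped"
    print(set(a), "-->", set(c), round(value, 3), verdict)
# {1} --> {3} 1.0 kept
# {3} --> {1} 0.667 dropped
# {2} --> {5} 1.0 kept
# {5} --> {2} 1.0 kept
```

Item 3 appears in three transactions but only two of them contain item 1, so {3} ➞ {1} falls just below the 0.7 threshold.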

Finding similar features in poisonous mushrooms

In [70]:

with open('mushroom.dat') as f: # close the file when done
    mushDatSet = [line.split() for line in f]

In [71]:

L,suppData= apriori(mushDatSet, minSupport=0.3)

Search the frequent itemsets for the poisonous feature 2

In [73]:

for item in L[1]:
    if item.intersection('2'):
        print(item)

frozenset({'93', '2'})
frozenset({'36', '2'})
frozenset({'53', '2'})
frozenset({'23', '2'})
frozenset({'59', '2'})
frozenset({'67', '2'})
frozenset({'86', '2'})
frozenset({'39', '2'})
frozenset({'85', '2'})
frozenset({'76', '2'})
frozenset({'63', '2'})
frozenset({'34', '2'})
frozenset({'28', '2'})
frozenset({'90', '2'})
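
One caveat when adapting this search to other features: intersection() treats a string argument as an iterable of characters. That is harmless for the single-character label '2', but it would misfire for a label such as '23'. The sketch below illustrates the difference on three made-up itemsets; the names `by_intersection` and `by_membership` are hypothetical.

```python
# Three small itemsets with string labels, invented for this example.
itemsets = [frozenset({'93', '2'}), frozenset({'23', '34'}), frozenset({'3', '85'})]

# str is iterable, so intersection('23') compares against the characters
# '2' and '3', not against the single label '23'.
by_intersection = [s for s in itemsets if s.intersection('23')]

# A membership test checks for the label itself.
by_membership = [s for s in itemsets if '23' in s]

print(len(by_intersection))  # 2: matches the sets containing '2' or '3'
print(len(by_membership))    # 1: matches only the set containing '23'
```

For multi-character labels, `'23' in item` or `item.intersection(['23'])` expresses the intended test.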

You can also repeat this for the larger itemsets:

In [79]:

for item in L[6]:
    if item.intersection('2'):
        print(item)

frozenset({'86', '39', '34', '23', '36', '2', '59'})
frozenset({'86', '53', '85', '28', '90', '2', '39'})
frozenset({'93', '86', '34', '23', '90', '59', '2'})
frozenset({'86', '85', '34', '90', '2', '39', '63'})
frozenset({'93', '85', '34', '23', '90', '59', '2'})
frozenset({'93', '86', '39', '23', '59', '2', '36'})
frozenset({'86', '85', '34', '23', '36', '2', '39'})
frozenset({'93', '86', '85', '34', '23', '59', '2'})
frozenset({'93', '86', '34', '23', '59', '2', '39'})
....
....
....
frozenset({'93', '86', '85', '34', '23', '90', '2'})
frozenset({'86', '34', '23', '36', '2', '59', '63'})

In [83]:

rules= generateRules(L,suppData, minConf=0.7)

frozenset({'76'}) --> frozenset({'36'}) conf: 0.7135036496350365
frozenset({'56'}) --> frozenset({'86'}) conf: 1.0
frozenset({'2'}) --> frozenset({'93'}) conf: 0.7490494296577946
.....
.....
.....
frozenset({'23', '85'}) --> frozenset({'86', '39', '34', '59', '2', '36', '63'}) conf: 0.7298578199052134
frozenset({'86', '23'}) --> frozenset({'85', '34', '59', '2', '39', '36', '63'}) conf: 0.7298578199052134
frozenset({'23'}) --> frozenset({'86', '85', '34', '59', '2', '39', '36', '63'}) conf: 0.7298578199052134
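
With this many rules, it helps to rank them. generateRules() returns a list of (antecedent, consequent, confidence) tuples, so a plain sort is enough. The sketch below uses three of the rules printed above, copied by hand; the name `ranked` is introduced for illustration.

```python
# Three (antecedent, consequent, confidence) tuples copied from the output above.
rules = [
    (frozenset({'76'}), frozenset({'36'}), 0.7135036496350365),
    (frozenset({'56'}), frozenset({'86'}), 1.0),
    (frozenset({'2'}), frozenset({'93'}), 0.7490494296577946),
]

# Highest confidence first.
ranked = sorted(rules, key=lambda rule: rule[2], reverse=True)
for antecedent, consequent, conf in ranked:
    print(sorted(antecedent), '-->', sorted(consequent), round(conf, 3))
# ['56'] --> ['86'] 1.0
# ['2'] --> ['93'] 0.749
# ['76'] --> ['36'] 0.714
```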

excerpts from
photo


piush vaish


4 COMMENTS

Nay
September 19, 2017

HI,
I have a question and I hope you can help me.
I am working on an apriory algorithm for a large list of item.
My question is if I can save all the rules generated in the same file?


Roopa T R
February 22, 2018

Hi
In[28] y ssCnt is used, and y ssCnt[can] is assigned for 1


1. Association Rules Example with R – DataMathStat


March 11, 2018
[…] Example in Python: https://adataanalyst.com/machine-learning/apriori-algorithm-python-3-0/ […]


starman
May 21, 2018

hey can you tell me what L1=list([i])[:k-2] Is Doing


2019 © A Data Analyst
