Data Warehousing and Mining - Exam Solutions

DWDM Solutions Nikhil K Pawanikar, UDIT April 2011
Given a set of items I = {I1, I2, ..., Im} and a database of transactions D = {t1, t2, ..., tn}, where each transaction tj ⊆ I:

Itemset: a set {Ii1, Ii2, ..., Iik} ⊆ I.
Support of an itemset: the percentage of transactions that contain that itemset.
Support (s) of an association rule X ⇒ Y: the percentage of transactions that contain X ∪ Y.
Confidence (α) of an association rule X ⇒ Y: the ratio of the number of transactions that contain X ∪ Y to the number that contain X.

Given are:

Table 1
Basket   Items
1        Bread, butter, jam
2        Bread, butter
3        Bread, butter, milk
4        Beer, bread
5        Beer, milk
From the data given above, the support (s) of each item, i.e. the percentage of baskets in which the item occurs, is calculated as follows:

Table 2
Item     Calculation      Support (s)
Bread    4/5*100 = 80     80%
Butter   3/5*100 = 60     60%
Milk     2/5*100 = 40     40%
Beer     2/5*100 = 40     40%
Jam      1/5*100 = 20     20%

Similarly, the support (s) and confidence (α) of the given rules can be calculated. The support of a rule X ⇒ Y is the percentage of baskets that contain X ∪ Y, and the confidence is the ratio of the support of X ∪ Y to the support of X.

Rule                      Support (s)   Calculation   Confidence (α)   Calculation
Butter ⇒ Bread            60%           3/5*100       100%             60/60*100
Jam ⇒ (Butter, Bread)     20%           1/5*100       100%             20/20*100
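The support and confidence calculations above can be sketched in Python as follows. This is a minimal illustration only: the helper names `support` and `confidence` are assumptions made for this example, and the baskets are copied from Table 1 with item names lower-cased.

```python
# Baskets copied from Table 1 (item names lower-cased so they compare equal).
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"beer", "milk"},
]


def support(itemset, transactions):
    """Percentage of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return 100.0 * hits / len(transactions)


def confidence(x, y, transactions):
    """Confidence of X => Y as a percentage: support(X u Y) / support(X)."""
    both = support(set(x) | set(y), transactions)
    return 100.0 * both / support(x, transactions)


print("support(butter)                =", support({"butter"}, baskets))
print("confidence(butter => bread)    =", confidence({"butter"}, {"bread"}, baskets))
print("confidence(jam => butter,bread)=", confidence({"jam"}, {"butter", "bread"}, baskets))
```

Representing each basket as a set keeps the containment test (`itemset <= t`) a direct translation of "the transaction contains X ∪ Y".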
A frequent (large) itemset is an itemset whose support is at or above a threshold s. Using the Apriori algorithm, we scan the given itemsets to generate the large itemsets that satisfy the given support threshold s = 60%.

Scan 1
Candidate               Calculation, support   Large (frequent) itemset
{Egg}                   4/5*100 = 80%          {Egg}
{Milk}                  4/5*100 = 80%          {Milk}
{Chips}                 4/5*100 = 80%          {Chips}
{Butter}                3/5*100 = 60%          {Butter}
{Popcorn}               1/5*100 = 20%          -

Scan 2
Candidate               Calculation, support   Large (frequent) itemset
{Egg, Milk}             3/5*100 = 60%          {Egg, Milk}
{Egg, Chips}            3/5*100 = 60%          {Egg, Chips}
{Egg, Butter}           2/5*100 = 40%          -
{Milk, Chips}           3/5*100 = 60%          {Milk, Chips}
{Milk, Butter}          2/5*100 = 40%          -
{Chips, Butter}         3/5*100 = 60%          {Chips, Butter}

Scan 3
Candidate               Calculation, support   Large (frequent) itemset
{Egg, Milk, Chips}      2/5*100 = 40%          Null
{Egg, Milk, Butter}     1/5*100 = 20%
{Egg, Chips, Butter}    2/5*100 = 40%
{Milk, Chips, Butter}   2/5*100 = 40%

In Scan 1, only those candidates that have a support greater than or equal to the threshold support of 60% make it to the large itemset.

In Scan 2, the Apriori algorithm is applied to the 4 large itemsets from Scan 1. Combining them in pairs generates a total of 3 + 2 + 1 = 6 candidates. Of these 6 candidates, 4 have a support greater than or equal to 60%, hence they make it to the large itemset.

In Scan 3, the 4 frequent items are combined three at a time to generate 4 new candidates. None of these 4 candidates has a support greater than or equal to 60%, hence the frequent itemset is null for Scan 3 and the result of Scan 2 is the final solution.

Thus the largest frequent itemsets with a threshold support of 60% are: {Egg, Milk}, {Egg, Chips}, {Milk, Chips}, {Chips, Butter}
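The candidate-generation step for Scan 3 can be sketched in Python as follows. This is a minimal illustration, assuming the four frequent items and the four frequent 2-itemsets listed above; the function names are made up for this example. It first enumerates every 3-item combination, as the walkthrough does, and then applies Apriori's subset check, under which a candidate is worth counting only if all of its 2-item subsets are themselves frequent.

```python
from itertools import combinations

# Frequent items and frequent 2-itemsets taken from Scans 1 and 2 above.
frequent_items = ["Egg", "Milk", "Chips", "Butter"]
frequent_pairs = {
    frozenset(p)
    for p in [("Egg", "Milk"), ("Egg", "Chips"), ("Milk", "Chips"), ("Chips", "Butter")]
}


def all_candidates(items, k):
    """Every k-item combination of the frequent items (no pruning)."""
    return [frozenset(c) for c in combinations(items, k)]


def prune(candidates, frequent_smaller, k):
    """Keep a candidate only if each of its (k-1)-item subsets is frequent."""
    return [
        c
        for c in candidates
        if all(frozenset(s) in frequent_smaller for s in combinations(c, k - 1))
    ]


c3 = all_candidates(frequent_items, 3)
c3_pruned = prune(c3, frequent_pairs, 3)

print(len(c3), "candidates before pruning")
for c in c3_pruned:
    print("survives pruning:", sorted(c))
```

With this check, candidates that contain {Egg, Butter} or {Milk, Butter} are discarded without a database scan, because a superset of an infrequent itemset cannot be frequent.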
DWDM Solutions Nikhil K Pawanikar, UDIT Oct 2011 Q.NO 5A
Definition (FP-tree).
A frequent-pattern tree (or FP-tree in short) is a tree structure defined below:
1. It consists of one root labeled as null, a set of item-prefix subtrees as the children of the root, and a frequent-item-header table.
2. Each node in the item-prefix subtree consists of three fields: item-name, count, and node-link, where item-name registers which item this node represents, count registers the number of transactions represented by the portion of the path reaching this node, and node-link links to the next node in the FP-tree carrying the same item-name, or null if there is none.
3. Each entry in the frequent-item-header table consists of two fields, (1) item-name and (2) head of node-link (a pointer pointing to the first node in the FP-tree carrying the item-name).

From the given table, the following observation can be made:
A scan of the data derives the list of frequent items (f:4), (c:4), (a:3), (b:3), (m:3), (p:3), where the number after ":" indicates the support and the items are listed in descending order of frequency.

TID   Items bought               Frequent items (ordered)
1     f, a, c, d, g, i, m, p     f, c, a, m, p
2     a, b, c, f, l, m, o        f, c, a, b, m
3     b, f, h, j, o              f, b
4     b, c, k, s, p              c, b, p
5     a, f, c, e, l, p, m, n     f, c, a, m, p
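The first scan can be sketched in Python as follows. This is an illustration under two assumptions that the text does not state explicitly: a minimum support count of 3 (the lowest count in the frequent-item list), and ties between equally frequent items broken in the order the list gives them (f, c, a, b, m, p). The variable names are made up for this example.

```python
from collections import Counter

# Transactions copied from the table above.
transactions = {
    1: ["f", "a", "c", "d", "g", "i", "m", "p"],
    2: ["a", "b", "c", "f", "l", "m", "o"],
    3: ["b", "f", "h", "j", "o"],
    4: ["b", "c", "k", "s", "p"],
    5: ["a", "f", "c", "e", "l", "p", "m", "n"],
}
MIN_COUNT = 3  # assumed minimum support count

# Count how many transactions contain each item.
counts = Counter(item for items in transactions.values() for item in items)
frequent = {item: n for item, n in counts.items() if n >= MIN_COUNT}

# Order of the frequent-item list as given in the text (ties are not re-derived).
F_LIST = ["f", "c", "a", "b", "m", "p"]
rank = {item: i for i, item in enumerate(F_LIST)}

# Drop infrequent items from each transaction and sort the rest by list order.
ordered = {
    tid: sorted((i for i in items if i in frequent), key=rank.__getitem__)
    for tid, items in transactions.items()
}

for tid, items in ordered.items():
    print(tid, ", ".join(items))
```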
A frequent-pattern tree can be created as follows:
1. The root of a tree is created and labeled with null.

2. The scan of the first transaction leads to the construction of the first branch of the tree: (f:1), (c:1), (a:1), (m:1), (p:1). Notice that the frequent items in the transaction are listed according to the order in the list of frequent items.
3. For the second transaction, since its (ordered) frequent item list ⟨f, c, a, b, m⟩ shares a common prefix ⟨f, c, a⟩ with the existing path ⟨f, c, a, m, p⟩, the count of each node along the prefix is incremented by 1, and one new node (b:1) is created and linked as a child of (a:2) and another new node (m:1) is created and linked as the child of (b:1).
4. For the third transaction, since its frequent item list ⟨f, b⟩ shares only the node ⟨f⟩ with the f-prefix subtree, f's count is incremented by 1, and a new node (b:1) is created and linked as a child of (f:3).
5. The scan of the fourth transaction leads to the construction of the second branch of the tree, ⟨(c:1), (b:1), (p:1)⟩.
6. For the last transaction, since its frequent item list ⟨f, c, a, m, p⟩ is identical to the first one, the path is shared with the count of each node along the path incremented by 1.
Alternate Diagram
(Note: for detailed explanation of FP Tree refer to the PPT on FP Tree) Reference : Oskar Kohonen
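The FP-tree construction steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the class name `FPNode`, the function `build_fp_tree`, and the `tails` helper dictionary are assumptions made for this example. The input is the list of ordered frequent items per transaction from the table.

```python
class FPNode:
    """One node of the item-prefix subtree: item-name, count, node-link."""

    def __init__(self, item, parent=None):
        self.item = item        # item-name (None for the root labeled null)
        self.count = 0          # transactions represented by the path to this node
        self.parent = parent
        self.children = {}      # item-name -> child FPNode
        self.node_link = None   # next node in the tree carrying the same item-name


def build_fp_tree(ordered_transactions):
    root = FPNode(None)
    header = {}  # item-name -> head of node-link (first node with that name)
    tails = {}   # item-name -> last node created, so links are appended in order
    for items in ordered_transactions:
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                # No shared prefix from here on: start a new branch.
                child = FPNode(item, parent=node)
                node.children[item] = child
                if item in tails:
                    tails[item].node_link = child
                else:
                    header[item] = child
                tails[item] = child
            child.count += 1  # shared prefix nodes are simply incremented
            node = child
    return root, header


ordered = [
    ["f", "c", "a", "m", "p"],
    ["f", "c", "a", "b", "m"],
    ["f", "b"],
    ["c", "b", "p"],
    ["f", "c", "a", "m", "p"],
]
root, header = build_fp_tree(ordered)


def show(node, depth=0):
    """Print the tree as an indented outline of item:count pairs."""
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)


show(root)
```

Following the node-links from a header entry and summing the counts gives the support of that item, which is one way to check the tree against the frequent-item list.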
DWDM Solutions Nikhil K Pawanikar, UDIT Oct 2009 Q. NO 5 A & Oct 2010 Q. NO 5B
A frequent (large) itemset is an itemset whose support is at or above a threshold s.
Using the Apriori algorithm, we scan the given itemsets to generate the large itemsets that satisfy the given support threshold s = 60%.

Scan 1
Candidate               Calculation, support   Large (frequent) itemset
Bread                   4/5*100 = 80%          Bread
Milk                    4/5*100 = 80%          Milk
Diapers                 4/5*100 = 80%          Diapers
Beer                    3/5*100 = 60%          Beer
Eggs                    1/5*100 = 20%          -
Cola                    2/5*100 = 40%          -

Scan 2
Candidate               Calculation, support   Large (frequent) itemset
Bread, Milk             3/5*100 = 60%          Bread, Milk
Bread, Diapers          3/5*100 = 60%          Bread, Diapers
Bread, Beer             2/5*100 = 40%          -
Milk, Diapers           3/5*100 = 60%          Milk, Diapers
Milk, Beer              2/5*100 = 40%          -
Diapers, Beer           3/5*100 = 60%          Diapers, Beer

Scan 3
Candidate               Calculation, support   Large (frequent) itemset
Bread, Milk, Diapers    2/5*100 = 40%          Null
Bread, Milk, Beer       1/5*100 = 20%
Bread, Diapers, Beer    2/5*100 = 40%
Milk, Diapers, Beer     2/5*100 = 40%

In Scan 1, only those candidates that have a support greater than or equal to the threshold support of 60% make it to the large itemset.

In Scan 2, the Apriori algorithm is applied to the 4 large itemsets from Scan 1. Combining them in pairs generates a total of 3 + 2 + 1 = 6 candidates. Of these 6 candidates, 4 have a support greater than or equal to 60%, hence they make it to the large itemset.

In Scan 3, the 4 frequent items are combined three at a time to generate 4 new candidates. None of these 4 candidates has a support greater than or equal to 60%, hence the frequent itemset is null for Scan 3.

Thus the largest frequent itemsets with a threshold support of 60% are: {Bread, Milk}, {Bread, Diapers}, {Milk, Diapers}, {Diapers, Beer}
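The threshold test applied in Scans 1 and 2 can be sketched in Python as follows. This is an illustration only: the transactions themselves are not reproduced here, so the sketch starts from the occurrence counts (the numerators in the calculations above, out of 5 transactions), and the function name `large_itemsets` is an assumption made for this example.

```python
N_TRANSACTIONS = 5
MIN_SUPPORT = 60.0  # threshold s, in percent

# Occurrence counts taken from the calculation column above.
scan1_counts = {
    ("Bread",): 4, ("Milk",): 4, ("Diapers",): 4,
    ("Beer",): 3, ("Eggs",): 1, ("Cola",): 2,
}
scan2_counts = {
    ("Bread", "Milk"): 3, ("Bread", "Diapers"): 3, ("Bread", "Beer"): 2,
    ("Milk", "Diapers"): 3, ("Milk", "Beer"): 2, ("Diapers", "Beer"): 3,
}


def large_itemsets(counts, n=N_TRANSACTIONS, min_support=MIN_SUPPORT):
    """Map each candidate that meets the threshold to its support in percent."""
    supports = {itemset: 100.0 * c / n for itemset, c in counts.items()}
    return {itemset: s for itemset, s in supports.items() if s >= min_support}


large1 = large_itemsets(scan1_counts)
large2 = large_itemsets(scan2_counts)

for itemset, s in {**large1, **large2}.items():
    print(", ".join(itemset), f"{s:.0f}%")
```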
DWDM Solutions Nikhil K Pawanikar, UDIT April 2008 Q. No 4B
Given a set of items I = {I1, I2, ..., Im} and a database of transactions D = {t1, t2, ..., tn}, where each transaction tj ⊆ I:

Itemset: a set {Ii1, Ii2, ..., Iik} ⊆ I.
Support of an itemset: the percentage of transactions that contain that itemset.
Support (s) of an association rule X ⇒ Y: the percentage of transactions that contain X ∪ Y.
Confidence (α) of an association rule X ⇒ Y: the ratio of the number of transactions that contain X ∪ Y to the number that contain X.
From the given data we can deduce the following:

No   X         F (frequency)   Calculation         Support
1    {1}       2               2/9*100 = 22.22     22.22%
2    {2}       3               3/9*100 = 33.33     33.33%
3    {3}       3               3/9*100 = 33.33     33.33%
4    {5}       3               3/9*100 = 33.33     33.33%
5    {1,3}     2               2/9*100 = 22.22     22.22%
6    {2,3}     2               2/9*100 = 22.22     22.22%
7    {2,5}     3               3/9*100 = 33.33     33.33%
8    {3,5}     2               2/9*100 = 22.22     22.22%
9    {2,3,5}   2               2/9*100 = 22.22     22.22%
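Possible association rules with a single-item antecedent for {2, 3, 5} are:
1. 2 ⇒ 3,5
2. 3 ⇒ 2,5
3. 5 ⇒ 2,3

The support of each rule is the support of {2, 3, 5}, and the confidence is the ratio of that support to the support of the antecedent:

Rule       Support (s)   Calculation        Confidence (α)   Calculation
2 ⇒ 3,5    22.22%        2/9*100 = 22.22    66.67%           22.22/33.33*100 = 66.67
3 ⇒ 2,5    22.22%        2/9*100 = 22.22    66.67%           22.22/33.33*100 = 66.67
5 ⇒ 2,3    22.22%        2/9*100 = 22.22    66.67%           22.22/33.33*100 = 66.67

None of these three rules reaches the minimum confidence of 75%. Using the same frequencies for the rules with a two-item antecedent gives 2,3 ⇒ 5: 22.22/22.22*100 = 100%, 3,5 ⇒ 2: 22.22/22.22*100 = 100%, and 2,5 ⇒ 3: 22.22/33.33*100 = 66.67%.

Hence, the association rules generated from {2, 3, 5} that exceed the minimum confidence of 75% are 2,3 ⇒ 5 and 3,5 ⇒ 2.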
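The confidence calculation for every rule that can be formed from {2, 3, 5} can be sketched in Python as follows. This is a minimal illustration that uses only the frequency column F of the table above; because confidence is a ratio of two frequencies, the number of transactions cancels out. The function name `rules_from` is an assumption made for this example.

```python
from itertools import combinations

# Frequencies F copied from the table of itemsets above.
freq = {
    frozenset({1}): 2, frozenset({2}): 3, frozenset({3}): 3, frozenset({5}): 3,
    frozenset({1, 3}): 2, frozenset({2, 3}): 2, frozenset({2, 5}): 3,
    frozenset({3, 5}): 2, frozenset({2, 3, 5}): 2,
}
MIN_CONFIDENCE = 75.0  # percent


def rules_from(itemset, freq):
    """Yield (antecedent, consequent, confidence %) for every split of itemset."""
    itemset = frozenset(itemset)
    for k in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), k):
            lhs = frozenset(lhs)
            # confidence(X => Y) = F(X u Y) / F(X)
            yield lhs, itemset - lhs, 100.0 * freq[itemset] / freq[lhs]


rules = list(rules_from({2, 3, 5}, freq))
strong = {(lhs, rhs) for lhs, rhs, conf in rules if conf >= MIN_CONFIDENCE}

for lhs, rhs, conf in rules:
    verdict = "keep" if conf >= MIN_CONFIDENCE else "drop"
    print(sorted(lhs), "=>", sorted(rhs), f"{conf:.2f}%", verdict)
```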
