
Experiment: 3

AIM:
To implement Primitive Association Rule Mining.

THEORY:

Association rules are patterns discovered in data that includes the concept of
a transaction, basket or group. A common example of a transaction is the set
of items someone buys during a supermarket trip. Not all datasets have a
transaction-based structure, e.g. a database of names, ages and addresses
includes no obvious transactions and will not yield association rules.
In the supermarket example, however, the transaction identifier groups
items such as bread, milk and soap together for each customer. Association
rules relate the items within each transaction or basket.

Given a transaction-based dataset, association rules specify patterns within
that dataset, e.g. 78% of people who buy milk also buy bread. Association
rules can also apply to item hierarchies, e.g. milk can be considered as part of
the category dairy and a rule may state that 67% of people who buy dairy
also buy soap.

Association rules identify collections of data attributes that are statistically
related in the underlying data. An association rule is of the form X => Y
where X and Y are disjoint conjunctions of attribute-value pairs. The
confidence of the rule is the conditional probability of Y given X, Pr(Y|X), and
the support of the rule is the joint probability of X and Y, Pr(X and Y). Here
probability is taken to be the observed frequency in the data set. The
traditional association rule mining problem can be described as follows.
Given a database of transactions, a minimal confidence threshold and a
minimal support threshold, find all association rules whose confidence and
support meet the corresponding thresholds.

Association rules are typically found via the process of data mining, which
searches the database for patterns. These patterns can lead to concrete
business decisions, e.g. they can affect discounting policies or marketing
campaigns at retail chains.

Support for an item set X in a transactional database D is defined as
count(X) / |D|, where count(X) is the number of transactions in D that
contain X.

For an association rule X => Y, we can calculate

1. Support(X => Y) = support(XY) = support(X union Y).
2. Confidence(X => Y) = support(XY) / support(X).

Support (S) and Confidence (C) can also be related to joint probabilities and
conditional probabilities as follows.

• Support(X => Y) = P(XY).
• Confidence(X => Y) = P(Y|X).

The number of association rules that can be derived from a dataset D
is exponentially large in the number of items. Interesting association rules
are those whose support and confidence are at least minSupp and minConf,
respectively.

Frequent item sets (also called large item sets) are those item sets
whose support is at least minSupp. The apriori property (downward
closure property) says that every subset of a frequent itemset is also
a frequent itemset.

Consider the following dataset D.

TID     Itemsets
T100    1, 3, 4
T200    2, 3, 5
T300    1, 2, 3, 5
T400    2, 5

|D| = 4.
Count(2,3) = 2.
Support(2,3) = 2/4 = 0.5.
Support(3 => 2) = support(2,3) = 0.5.
Confidence(3 => 2) = support(2,3)/support(3) = 0.5/0.75 = 0.67.

SOURCE CODE:

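#include <iomanip>
#include <iostream>
using namespace std;

// One transaction: an identifier and its list of single-character items.
struct s
{
    char n[16]; // transaction name, e.g. "t100" (4 characters plus terminator)
    char *li;   // items bought in this transaction
    int m;      // number of items
};
struct s *ptr;

// Reads an itemset from the user, then prints its support and the
// confidence of the rule {first item} => {remaining items}.
void cal_s(int t)
{
    int ni, apearnce = 0, cnt = 0;
    cout << "\n\nenter the no. of items: ";
    cin >> ni;
    char *c = new char[ni];
    for (int i = 0; i < ni; i++)
        cin >> c[i];

    for (int i = 0; i < t; i++) // cover all transactions
    {
        int ff = 0;
        for (int k = 0; k < ni; k++) // cover each searched item
        {
            for (int j = 0; j < ptr[i].m; j++) // cover items of this transaction
            {
                if (c[k] == ptr[i].li[j])
                {
                    ff += 1;
                    break; // count each searched item once per transaction
                }
            } // end j
        } // end k
        if (ff == ni)
            apearnce++; // transaction contains the whole itemset
        for (int q = 0; q < ptr[i].m; q++)
        {
            if (c[0] == ptr[i].li[q])
            {
                cnt++; // transaction contains the antecedent (first item)
                break;
            }
        }
    } // end i

    // Cast to double: integer division would truncate both ratios to 0.
    cout << fixed << setprecision(2);
    cout << "\n(SUPPORT)= " << apearnce << "/" << t << " % = " << (double)apearnce / t;
    cout << "\n\n(CONFIDENCE) = " << apearnce << "/" << cnt << " % = ";
    if (cnt > 0)
        cout << (double)apearnce / cnt;
    else
        cout << "undefined (first item never occurs)";
    cout << "\n";
    delete[] c;
}

int main()
{
    int n;
    cout << "enter the no. of transactions...";
    cin >> n;
    ptr = new s[n];
    for (int i = 0; i < n; i++)
    {
        cout << "\nEnter the name: ";
        cin >> setw(16) >> ptr[i].n; // setw guards against overlong names
        cout << "enter the no. of items in this transactions...";
        cin >> ptr[i].m;
        ptr[i].li = new char[ptr[i].m];
        for (int j = 0; j < ptr[i].m; j++)
            cin >> ptr[i].li[j];
    }
    cout << "\n\nTRANSACTION TABLE:\n";
    for (int i = 0; i < n; i++)
    {
        cout << endl;
        cout << ptr[i].n << "\t\t";
        for (int j = 0; j < ptr[i].m; j++)
            cout << ptr[i].li[j] << ",";
    }

    // Run one query, then repeat until the user types 'q'.
    char key;
    do
    {
        cal_s(n);
    } while (cin >> key && key != 'q');

    for (int i = 0; i < n; i++)
        delete[] ptr[i].li;
    delete[] ptr;
    return 0;
}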
OUTPUT:

enter the no. of transactions...9


Enter the name: t100
enter the no. of items in this transactions...3
1 2 3
Enter the name: t200
enter the no. of items in this transactions...2
2 4
Enter the name: t300
enter the no. of items in this transactions...2
2 3
Enter the name: t400
enter the no. of items in this transactions...3
1 2 3
Enter the name: t500
enter the no. of items in this transactions...2
1 3
Enter the name: t600
enter the no. of items in this transactions...2
2 3
Enter the name: t700
enter the no. of items in this transactions...2
1 3
Enter the name: t800
enter the no. of items in this transactions...4
1 2 3 5
Enter the name: t900
enter the no. of items in this transactions...3
1 2 3

TRANSACTION TABLE:
t100 1,2,3,
t200 2,4,
t300 2,3,
t400 1,2,3,
t500 1,3,
t600 2,3,
t700 1,3,
t800 1,2,3,5,
t900 1,2,3,

enter the no. of items: 2


1 2
(SUPPORT)= 4/9 % = 0.44
(CONFIDENCE) = 4/6 % = 0.66

Experiment: 4

AIM:
To implement APRIORI Algorithm.

THEORY:

The major steps in association rule mining are:

1. Frequent itemset generation
2. Rule derivation

The APRIORI algorithm uses the downward closure property to prune
unnecessary branches from further consideration. It needs two parameters,
minSupp and minConf. minSupp is used for generating frequent
itemsets and minConf is used for rule derivation.

The APRIORI algorithm:

1. k = 1;
2. Find the frequent itemsets L_k from C_k, the set of all candidate k-itemsets;
3. Form C_{k+1} from L_k;
4. k = k+1;
5. Repeat steps 2-4 until C_k is empty;

Step 2 is called the frequent itemset generation step. Step 3 is called the
candidate itemset generation step. Both steps are described in detail in the
following part of this section.

APRIORI's Frequent/Candidate generation

Frequent itemset generation

Scan D and count each itemset in C_k; if the count is at least
minSupp, add that itemset to L_k.

Candidate itemset generation

For k = 1, C_1 = all itemsets of length 1.

For k > 1, generate C_k from L_{k-1} as follows:

The join step:
C_k is formed by joining L_{k-1} with itself on the first k-2 items.
If both {a_1, ..., a_{k-2}, a_{k-1}} and {a_1, ..., a_{k-2}, a_k} are in
L_{k-1}, then add {a_1, ..., a_{k-2}, a_{k-1}, a_k} to C_k.
The items are always stored in sorted order.

The prune step:
Remove {a_1, ..., a_{k-2}, a_{k-1}, a_k} if it contains a non-frequent (k-1)-subset.

APRIORI's Rule derivation

Rule Derivation

Frequent itemsets are not themselves association rules. One more step is
required to convert these frequent itemsets into rules.

Association rules can be found from every frequent itemset X as follows:

For every proper non-empty subset A of X

1. Let B = X - A.
2. A => B is an association rule if confidence(A => B) ≥ minConf,

where confidence(A => B) = support(AB) / support(A), and
support(A => B) = support(AB).

Example for deriving rules

Suppose X = 234 is a frequent itemset, with minSupp = 50%.

1. Proper non-empty subsets of X are: 23, 24, 34, 2, 3, 4 with supports
= 50%, 50%, 75%, 75%, 75%, and 75%, respectively.
2. The association rules from these subsets are:

23 => 4. confidence = 100%.
24 => 3. confidence = 100%.
34 => 2. confidence = 67%.
2 => 34. confidence = 67%.
3 => 24. confidence = 67%.
4 => 23. confidence = 67%.

All rules have a support = 50%.

In order to derive an association rule A => B, we need to have support(AB)
and support(A). This step is not as time-consuming as frequent itemset
generation. It can also be sped up using parallel processing techniques,
as rules generated from one frequent itemset do not affect the rules
generated from any other frequent itemset.

SOURCE CODE:

#include <iostream>
using namespace std;

// Itemsets are encoded as integers whose decimal digits are the items (1-9),
// e.g. 125 stands for {1, 2, 5}.
const int N_TRANS = 9; // number of transactions read from the user
const int N_ITEMS = 5; // number of single items in the candidate set C1
const int MAX_C2 = 10; // at most 5*4/2 pairs of frequent items
const int MAX_C3 = 10; // at most 10 distinct 3-itemsets over 5 items

// Returns num with its digits rearranged in ascending order (321 -> 123).
int sort(int num)
{
    int arr[10];
    int i = 0;
    while (num)
    {
        arr[i++] = num % 10;
        num /= 10;
    }
    // selection sort on the digits
    for (int j = 0; j < i; j++)
    {
        int p = j;
        for (int k = j + 1; k < i; k++)
        {
            if (arr[k] < arr[p])
                p = k;
        }
        int t = arr[p];
        arr[p] = arr[j];
        arr[j] = t;
    }
    int ret_num = 0;
    for (int j = 0; j < i; j++)
        ret_num = ret_num * 10 + arr[j];
    return ret_num;
}

// Returns 1 if every digit of test_item occurs in main_item, i.e. the
// itemset is contained in the transaction; otherwise returns 0.
int ret_count(int test_item, int main_item)
{
    int count = 0;
    int t[10], m[10];
    int i = 0, j = 0;
    while (test_item)
    {
        t[i++] = test_item % 10;
        test_item /= 10;
    }
    while (main_item)
    {
        m[j++] = main_item % 10;
        main_item /= 10;
    }
    for (int k = 0; k < i; k++)
    {
        for (int l = 0; l < j; l++)
        {
            if (m[l] == t[k])
            {
                count++;
                break;
            }
        }
    }
    return (count == i) ? 1 : 0;
}

// Joins two 2-itemsets. Returns their union as a sorted 3-itemset when
// they share exactly one item, otherwise 0 (no candidate is produced).
int ret_combi(int item1, int item2)
{
    int ret_no = item1;
    int extra = 0; // digits of item2 that are not already in item1
    while (item2)
    {
        int d = item2 % 10;
        if (!ret_count(d, item1))
        {
            ret_no = ret_no * 10 + d;
            extra++;
        }
        item2 /= 10;
    }
    return (extra == 1) ? sort(ret_no) : 0;
}

int main()
{
    int main_data[N_TRANS][2];
    int min_sup, i, j;
    cout << "\n\nEnter Main Transaction Table \n\n";
    for (i = 0; i < N_TRANS; i++)
    {
        cout << "\nTransaction Id : T";
        cin >> main_data[i][0];
        cout << "List of Items: ";
        cin >> main_data[i][1];
    }

    // C1 and its support counts are entered by the user.
    int C[N_ITEMS][2], L[N_ITEMS][2];
    cout << "\n\nEnter Scan D\n\n";
    for (i = 0; i < N_ITEMS; i++)
    {
        cout << "ItemSet:";
        cin >> C[i][0];
        cout << "Sup Count:";
        cin >> C[i][1];
        cout << "\n";
    }

    cout << "\n\nEnter Min Sup Count : ";
    cin >> min_sup;

    // L1: keep only the single items that meet the minimum support count.
    int len_l1 = 0;
    for (i = 0; i < N_ITEMS; i++)
    {
        if (C[i][1] >= min_sup)
        {
            L[len_l1][0] = C[i][0];
            L[len_l1++][1] = C[i][1];
        }
    }

    cout << "\n\n L 1 \n\n";
    for (i = 0; i < len_l1; i++)
    {
        cout << "ItemSet:" << L[i][0];
        cout << " Sup Count:" << L[i][1];
        cout << "\n";
    }

    // C2: every pair of frequent single items (items entered in ascending order).
    int k = 0;
    int C2[MAX_C2][2], L2[MAX_C2][2];
    for (i = 0; i < len_l1; i++)
    {
        for (j = i + 1; j < len_l1; j++)
            C2[k++][0] = L[i][0] * 10 + L[j][0];
    }

    // Count the support of each candidate by scanning the transactions.
    int count = 0;
    for (i = 0; i < k; i++)
    {
        count = 0;
        for (j = 0; j < N_TRANS; j++)
        {
            if (ret_count(C2[i][0], main_data[j][1]))
                ++count;
        }
        C2[i][1] = count;
    }

    // L2: candidates that meet the minimum support count.
    j = 0;
    for (i = 0; i < k; i++)
    {
        if (C2[i][1] >= min_sup)
        {
            L2[j][0] = C2[i][0];
            L2[j++][1] = C2[i][1];
        }
    }

    cout << "\n\nGenerated L2 : \n\n";
    for (i = 0; i < j; i++)
        cout << "\n Itemset : " << L2[i][0] << " Sup Count : " << L2[i][1];

    // C3: join every pair of frequent 2-itemsets, skipping repeated candidates.
    // Candidates are not pruned here; infrequent ones are removed by counting.
    int C3[MAX_C3][2], L3[MAX_C3][2];
    int len_l2 = j;
    k = 0;
    for (i = 0; i < len_l2; i++)
    {
        for (int n = i + 1; n < len_l2; n++)
        {
            int p = ret_combi(L2[i][0], L2[n][0]);
            if (p == 0)
                continue; // the pair does not form a 3-itemset
            int seen = 0;
            for (int q = 0; q < k; q++)
            {
                if (C3[q][0] == p)
                    seen = 1;
            }
            if (!seen)
                C3[k++][0] = p;
        }
    }

    for (i = 0; i < k; i++)
    {
        count = 0;
        for (j = 0; j < N_TRANS; j++)
        {
            if (ret_count(C3[i][0], main_data[j][1]))
                ++count;
        }
        C3[i][1] = count;
    }

    // L3: candidates that meet the minimum support count.
    j = 0;
    for (i = 0; i < k; i++)
    {
        if (C3[i][1] >= min_sup)
        {
            L3[j][0] = C3[i][0];
            L3[j++][1] = C3[i][1];
        }
    }

    cout << "\n\nGenerated L3 : \n\n";
    for (i = 0; i < j; i++)
        cout << "\n Itemset : " << L3[i][0] << " Sup Count : " << L3[i][1];
    cout << "\n";
    return 0;
}

OUTPUT:

Enter Main Transaction Table


Transaction Id : T100
List of Items: 125

Transaction Id : T200
List of Items: 24

Transaction Id : T300
List of Items: 23

Transaction Id : T400
List of Items: 124

Transaction Id : T500
List of Items: 13

Transaction Id : T600
List of Items: 23

Transaction Id : T700
List of Items: 13

Transaction Id : T900
List of Items: 123

Enter Scan D

ItemSet:1
Sup Count:6

ItemSet:2
Sup Count:7

ItemSet:3
Sup Count:6

ItemSet:4
Sup Count:2

ItemSet:5
Sup Count:2

Enter Min Sup Count : 2


L1
ItemSet:1 Sup Count:6
ItemSet:2 Sup Count:7
ItemSet:3 Sup Count:6
ItemSet:4 Sup Count:2
ItemSet:5 Sup Count:2

Generated L2 :
Itemset : 12 Sup Count : 4
Itemset : 13 Sup Count : 4
Itemset : 15 Sup Count : 2
Itemset : 23 Sup Count : 4
Itemset : 24 Sup Count : 2
Itemset : 25 Sup Count : 2

Generated L3 :
Itemset : 123 Sup Count : 2
Itemset : 125 Sup Cou
