Experiment: 3: Aim: Theory

Experiment: 3
AIM:
To implement Primitive Association Rule Mining.
THEORY:
Association rules are patterns discovered in data that include the concept of a transaction, basket or group. A common example of a transaction is the set of items someone buys during a supermarket trip. Not all datasets have a transaction-based structure; e.g. a database of names, ages and addresses includes no obvious transactions and will not yield association rules. However, in the supermarket example, the transaction identifier groups items such as bread, milk and soap together for each customer. Here, association rules relate the items within each transaction or basket.
Given a transaction-based dataset, association rules specify patterns within that dataset, e.g. 78% of people who buy milk also buy bread. Association rules can also apply to item hierarchies, e.g. milk can be considered part of the category dairy, and a rule may state that 67% of people who buy dairy also buy soap.
Association rules identify collections of data attributes that are statistically related in the underlying data. An association rule is of the form X ⇒ Y, where X and Y are disjoint conjunctions of attribute-value pairs. The confidence of the rule is the conditional probability of Y given X, Pr(Y|X), and the support of the rule is the joint probability of X and Y, Pr(X and Y). Here probability is taken to be the observed frequency in the dataset. The traditional association rule mining problem can be described as follows: given a database of transactions, a minimum confidence threshold and a minimum support threshold, find all association rules whose confidence and support are at least the corresponding thresholds.
Association rules are typically found via the process of data mining, which
searches the database for patterns. These patterns can lead to concrete
business decisions, e.g. they can affect discounting policies or marketing
campaigns at retail chains.
Support for an itemset X in a transactional database D is defined as

support(X) = count(X) / |D|,

where count(X) is the number of transactions in D that contain every item of X.
For an association rule X ⇒ Y, we can calculate
1. Support(X ⇒ Y) = support(XY) = support(X ∪ Y).
2. Confidence(X ⇒ Y) = support(XY) / support(X).
Support (S) and Confidence (C) can also be related to joint probabilities and conditional probabilities as follows.
• Support(X ⇒ Y) = P(XY).
• Confidence(X ⇒ Y) = P(Y | X).
The number of association rules that can be derived from a dataset D grows exponentially with the number of distinct items. Interesting association rules are those whose support and confidence are at least minSupp and minConf, respectively.
Frequent itemsets (also called large itemsets) are those itemsets whose support is at least minSupp. The Apriori property (downward closure property) states that every subset of a frequent itemset is also a frequent itemset.
Consider the following dataset D, with |D| = 4.

TID     Items
T100    1 3 4
T200    2 3 5
T300    1 2 3 5
T400    2 5

Count(2,3) = 2 (transactions T200 and T300).
Support(2,3) = 2/4 = 0.5.
Support(3 ⇒ 2) = support(2,3) = 0.5.
Item 3 occurs in T100, T200 and T300, so support(3) = 3/4 = 0.75, and
Confidence(3 ⇒ 2) = support(2,3) / support(3) = 0.5 / 0.75 = 0.67
SOURCE CODE:
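#include <iostream>
#include <string>
#include <vector>
using namespace std;

// One transaction: its name (e.g. t100) and the item codes it contains.
struct Transaction
{
    string name;
    vector<char> items;
};

vector<Transaction> db;   // the transaction table

// True if the transaction contains the given item.
bool contains(const Transaction& tr, char item)
{
    for (size_t j = 0; j < tr.items.size(); j++)
        if (tr.items[j] == item)
            return true;
    return false;
}

// Reads a query itemset {c0, c1, ...} and prints
//   support    = (transactions containing every query item) / |D|
//   confidence = (transactions containing every query item)
//              / (transactions containing c0),
// i.e. the confidence of the rule {c0} => {c1, ...}.
void cal_s()
{
    int ni = 0;
    cout << "\n\nenter the no. of items: ";
    cin >> ni;
    if (ni <= 0 || db.empty())
        return;
    vector<char> c(ni);
    for (int k = 0; k < ni; k++)
        cin >> c[k];

    int appearance = 0;   // transactions containing all query items
    int cnt = 0;          // transactions containing the first query item
    for (size_t i = 0; i < db.size(); i++) {        // cover all transactions
        bool all = true;
        for (int k = 0; k < ni && all; k++)         // cover all search items
            all = contains(db[i], c[k]);
        if (all)
            appearance++;
        if (contains(db[i], c[0]))
            cnt++;
    }
    int t = db.size();
    // Cast to double: integer division would truncate the ratio to 0.
    cout << "\n(SUPPORT)= " << appearance << "/" << t
         << " = " << (double)appearance / t;
    if (cnt > 0)
        cout << "\n\n(CONFIDENCE) = " << appearance << "/" << cnt
             << " = " << (double)appearance / cnt;
    else
        cout << "\n\n(CONFIDENCE) undefined: the first item never occurs";
}

int main()
{
    int n = 0;
    cout << "enter the no. of transactions...";
    cin >> n;
    db.resize(n > 0 ? n : 0);
    for (int i = 0; i < n; i++) {
        int m = 0;
        cout << "\nEnter the name: ";
        cin >> db[i].name;   // a string, so names such as t100 fit
        cout << "enter the no. of items in this transactions...";
        cin >> m;
        db[i].items.resize(m > 0 ? m : 0);
        for (int j = 0; j < m; j++)
            cin >> db[i].items[j];
    }

    cout << "\n\nTRANSACTION TABLE:\n";
    for (int i = 0; i < n; i++) {
        cout << endl << db[i].name << "\t\t";
        for (size_t j = 0; j < db[i].items.size(); j++)
            cout << db[i].items[j] << ",";
    }

    // Answer queries until the user enters q.
    char choice;
    do {
        cal_s();
        cout << "\n\nenter q to quit, any other key to continue: ";
    } while (cin >> choice && choice != 'q');
    return 0;
}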
OUTPUT:
enter the no. of transactions...9

Enter the name: t100
enter the no. of items in this transactions...3
1 2 3
2 4
2 3
1 2 3
1 3
2 3
1 3
1 2 3 5
1 2 3
TRANSACTION TABLE:
t100 1,2,3,
t200 2,4,
t300 2,3,
t400 1,2,3,
t500 1,3,
t600 2,3,
t700 1,3,
t800 1,2,3,5,
t900 1,2,3,
enter the no. of items: 2

1 2
(SUPPORT)= 4/9 % = 0.44
(CONFIDENCE) = 4/6 % = 0.66
Experiment: 4
AIM:
To implement APRIORI Algorithm.
THEORY:
The major steps in association rule mining are:
1. Frequent itemset generation
2. Rule derivation
The APRIORI algorithm uses the downward closure property to prune candidate itemsets that cannot be frequent, so that they never have to be counted. It needs two parameters, minSupp and minConf: minSupp is used for generating frequent itemsets and minConf is used for rule derivation.
The APRIORI algorithm:
1. k = 1;
2. Find the frequent itemsets Lk from Ck, the set of all candidate k-itemsets;
3. Form Ck+1 from Lk;
4. k = k+1;
5. Repeat 2-4 until Ck is empty;
Step 2 is called the frequent itemset generation step. Step 3 is called the candidate itemset generation step. Details of these two steps are given below.
APRIORI's Frequent/Candidate generation
Frequent itemset generation
Scan D and count each itemset in Ck; if its count is at least minSupp, add that itemset to Lk.
Candidate itemset generation
For k = 1, C1 = all itemsets of length 1.
For k > 1, generate Ck from Lk-1 as follows:
The join step:
Ck = (k-2)-way join of Lk-1 with itself: two (k-1)-itemsets are joined when their first k-2 items agree.
If both {a1, …, ak-2, ak-1} and {a1, …, ak-2, ak} are in Lk-1, with ak-1 < ak, then add {a1, …, ak-2, ak-1, ak} to Ck.
The items in every itemset are always stored in sorted order.
The prune step:
Remove {a1, …, ak-2, ak-1, ak} from Ck if it contains a (k-1)-subset that is not frequent (i.e. not in Lk-1).
APRIORI's Rule derivation
Frequent itemsets are not yet association rules. One more step is required to convert these frequent itemsets into rules.
Association rules can be found from every frequent itemset X as follows:
For every non-empty proper subset A of X
1. Let B = X - A.
2. A ⇒ B is an association rule if confidence(A ⇒ B) ≥ minConf,
where confidence(A ⇒ B) = support(AB) / support(A), and support(A ⇒ B) = support(AB).
Example for deriving rules
Suppose X = 234 is a frequent itemset with support 50%, and minSupp = 50%.
1. The proper non-empty subsets of X are 23, 24, 34, 2, 3 and 4, with supports 50%, 50%, 75%, 75%, 75% and 75%, respectively.
2. The association rules from these subsets are:
23 ⇒ 4, confidence = 100%.
24 ⇒ 3, confidence = 100%.
34 ⇒ 2, confidence = 67%.
2 ⇒ 34, confidence = 67%.
3 ⇒ 24, confidence = 67%.
4 ⇒ 23, confidence = 67%.
All rules have support = 50%.
To derive an association rule A ⇒ B, we need support(AB) and support(A). Both are already available from frequent itemset generation: AB = X is frequent, and A is a subset of X, so it is frequent as well and its support was counted. Rule derivation is therefore much less time-consuming than frequent itemset generation. It can also be sped up with parallel processing, since the rules generated from one frequent itemset do not affect the rules generated from any other frequent itemset.
SOURCE CODE:
#include <algorithm>
#include <iostream>
#include <set>
#include <vector>
using namespace std;

// An itemset is a sorted list of item ids. As in the original program, items
// are single digits 1..9, and an itemset such as {1, 2, 5} is typed and
// printed as the number 125.
typedef vector<int> Itemset;

// Split an integer such as 125 into its sorted, distinct digits {1, 2, 5}.
Itemset digits_of(int num)
{
    Itemset items;
    while (num > 0) {
        items.push_back(num % 10);
        num /= 10;
    }
    sort(items.begin(), items.end());
    items.erase(unique(items.begin(), items.end()), items.end());
    return items;
}

// Print an itemset as its digits, e.g. {1, 2, 5} -> 125.
void print_itemset(const Itemset& s)
{
    for (size_t i = 0; i < s.size(); i++)
        cout << s[i];
}

// Support count: number of transactions that contain every item of cand.
int support_count(const Itemset& cand, const vector<Itemset>& db)
{
    int count = 0;
    for (size_t t = 0; t < db.size(); t++)
        if (includes(db[t].begin(), db[t].end(), cand.begin(), cand.end()))
            count++;
    return count;
}

// Candidate generation: join L(k-1) with itself, then prune.
vector<Itemset> apriori_gen(const vector<Itemset>& prev)
{
    set<Itemset> frequent(prev.begin(), prev.end());   // sorted, no duplicates
    vector<Itemset> cands;
    for (set<Itemset>::iterator a = frequent.begin(); a != frequent.end(); ++a) {
        set<Itemset>::iterator b = a;
        for (++b; b != frequent.end(); ++b) {
            // Join step: the first k-2 items must be equal.
            if (!equal(a->begin(), a->end() - 1, b->begin()))
                continue;
            Itemset c = *a;
            c.push_back(b->back());   // stays sorted because *a < *b
            // Prune step: every (k-1)-subset of c must be in L(k-1).
            bool keep = true;
            for (size_t d = 0; d < c.size() && keep; d++) {
                Itemset sub = c;
                sub.erase(sub.begin() + d);
                keep = frequent.count(sub) > 0;
            }
            if (keep)
                cands.push_back(c);
        }
    }
    return cands;
}

int main()
{
    const int NUM_TRANS = 9;   // size of the transaction table
    const int NUM_ITEMS = 5;   // number of 1-itemsets in C1
    vector<Itemset> db;
    int tid = 0, items = 0, min_sup = 0;

    cout << "\n\nEnter Main Transaction Table \n\n";
    for (int i = 0; i < NUM_TRANS; i++) {
        cout << "\nTransaction Id : T";
        cin >> tid;
        cout << "List of Items: ";
        cin >> items;
        db.push_back(digits_of(items));
    }

    // C1 with the support counts from the first scan of D, entered by hand.
    vector<Itemset> C1;
    vector<int> C1_count;
    cout << "\n\nEnter Scan D\n\n";
    for (int i = 0; i < NUM_ITEMS; i++) {
        int item = 0, count = 0;
        cout << "ItemSet:";
        cin >> item;
        cout << "Sup Count:";
        cin >> count;
        cout << "\n";
        C1.push_back(Itemset(1, item));
        C1_count.push_back(count);
    }
    cout << "\n\nEnter Min Sup Count : ";
    cin >> min_sup;

    // L1: only the 1-itemsets whose count reaches min_sup.
    vector<Itemset> L;
    cout << "\n\n L 1 \n\n";
    for (int i = 0; i < NUM_ITEMS; i++)
        if (C1_count[i] >= min_sup) {
            L.push_back(C1[i]);
            cout << "ItemSet:" << C1[i][0] << " Sup Count:" << C1_count[i] << "\n";
        }

    // k = 2, 3, ...: form Ck from L(k-1), scan D to count it, keep Lk.
    for (int k = 2; !L.empty(); k++) {
        vector<Itemset> C = apriori_gen(L);
        L.clear();
        vector<int> counts;
        for (size_t i = 0; i < C.size(); i++) {
            int count = support_count(C[i], db);
            if (count >= min_sup) {
                L.push_back(C[i]);
                counts.push_back(count);
            }
        }
        if (L.empty())
            break;
        cout << "\n\nGenerated L" << k << " : \n\n";
        for (size_t i = 0; i < L.size(); i++) {
            cout << "\n Itemset : ";
            print_itemset(L[i]);
            cout << " Sup Count : " << counts[i];
        }
    }
    cout << "\n";
    return 0;
}
OUTPUT:
Enter Main Transaction Table

Transaction Id : T100
List of Items: 125
List of Items: 24
List of Items: 23
List of Items: 124
List of Items: 13
List of Items: 23
List of Items: 13
List of Items: 123
Enter Scan D
ItemSet:1
Sup Count:6
ItemSet:2
Sup Count:7
ItemSet:3
Sup Count:6
ItemSet:4
Sup Count:2
ItemSet:5
Sup Count:2
Enter Min Sup Count : 2

L1
ItemSet:1 Sup Count:6
Generated L2 :
Itemset : 12 Sup Count : 4
Generated L3 :
Itemset : 125 Sup Cou
