
KDD: Knowledge Discovery in Data

Association-Rule Mining (Data Mining)


Which products do MOST customers buy together in one transaction? (Market basket data analysis)
Examples: Milk, Bread and Eggs; Hotdog, Chips and Pop; Hamburger, Buns and Barbecue Sauce.
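At its simplest, the market-basket question is a counting problem: how often does each combination of items appear in the same transaction? A minimal Python sketch follows. The baskets are hypothetical (built from the example items above, not real data), `top_itemsets` is an illustrative name, and this is plain frequency counting rather than an efficient mining algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions (not from the notes), reusing the example items above.
BASKETS = [
    {"Milk", "Bread", "Eggs"},
    {"Milk", "Bread"},
    {"Hotdog", "Chips", "Pop"},
    {"Milk", "Bread", "Eggs", "Chips"},
    {"Hamburger", "Buns", "Barbecue Sauce"},
]


def top_itemsets(baskets, size, k):
    """Return the k most frequent itemsets of the given size with their counts."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives every itemset one canonical tuple form.
        for itemset in combinations(sorted(basket), size):
            counts[itemset] += 1
    return counts.most_common(k)


if __name__ == "__main__":
    print(top_itemsets(BASKETS, 2, 1))  # [(('Bread', 'Milk'), 3)]
    print(top_itemsets(BASKETS, 3, 1))  # [(('Bread', 'Eggs', 'Milk'), 2)]
```

Counting every combination is exponential in basket size, which is why real miners prune candidates that cannot reach a minimum support.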

Question #1 in the ATB context: Which products do MOST customers hold together?
Answer: (a) Mortgage, Line of Credit and Credit Card (80%); (b) Chequing, Savings and Credit Card (70%).

Question #2 in the ATB context: What kind of customers hold the three products above (Mortgage, Line of Credit and Credit Card) together at ATB?

Attribute Name      Attribute-values (Categorical Data)
Products (Class)    Mortgage, Line of Credit, Credit Card, Chequing Account and Savings Account
Occupation          Self-employed, Walmart Greeter, McDonalds Employee
Sex                 male, female, other
Race                Caucasian, South Asian, Oriental, African
Age                 <30 yr., 30-50 yr., >50 yr.

Customer/Product Profile:

Subject  Income  Occupation          Sex     Race         Age      Products held by customer
S1       >100K   Self Employed       male    Caucasian    >35 yr.  Mortgage, Line of Credit, Savings Account and Credit Card
S2       <30K    Walmart Greeter     female  Oriental     >50 yr.  Savings Account
S3       <20K    McDonalds Employee  male    Oriental     <30 yr.  Chequing Account
S4       >100K   Self Employed       male    Caucasian    >35 yr.  Mortgage, Line of Credit and Credit Card
S5       <20K    McDonalds Employee  female  Oriental     >50 yr.  Mortgage, Line of Credit and Credit Card
S6       >100K   Self Employed       male    Caucasian    >35 yr.  Mortgage, Line of Credit, Savings Account and Credit Card
S7       <30K    Walmart Greeter     male    Oriental     <30 yr.  Chequing Account
S8       >100K   Self Employed       male    Caucasian    >35 yr.  Mortgage, Line of Credit, Chequing Account, Savings Account and Credit Card
S9       >100K   Self Employed       male    Caucasian    >35 yr.  Savings Account
S10      <30K    Walmart Greeter     female  South Asian  >50 yr.  Chequing Account

Carry out association-rule mining to get rules such as:

Income >100K AND Occupation = Self-employed AND Race = Caucasian AND Sex = Male AND Age >35 yr. ⇒ Mortgage, Line of Credit and Credit Card
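A minimal Python sketch of this mining task on the Customer/Product Profile table: every combination of customer attribute values that occurs in the data is tried as an antecedent A, the consequent B is fixed to the Mortgage, Line of Credit and Credit Card bundle, and a rule is kept when it meets a minimum support and confidence. This is an exhaustive search, not an efficient algorithm such as Apriori; the thresholds (40% support, 80% confidence) and the names `mine_rules` and `BUNDLE` are assumptions for illustration.

```python
from itertools import combinations

# Customer/Product Profile table from the notes (S1..S10): (attribute values, products held).
ATTRS = ("Income", "Occupation", "Sex", "Race", "Age")
M, LOC, CC = "Mortgage", "Line of Credit", "Credit Card"
SAV, CHQ = "Savings Account", "Chequing Account"
SE, WG, MCD = "Self Employed", "Walmart Greeter", "McDonalds Employee"
ROWS = [
    ((">100K", SE, "male", "Caucasian", ">35 yr."), {M, LOC, SAV, CC}),       # S1
    (("<30K", WG, "female", "Oriental", ">50 yr."), {SAV}),                  # S2
    (("<20K", MCD, "male", "Oriental", "<30 yr."), {CHQ}),                   # S3
    ((">100K", SE, "male", "Caucasian", ">35 yr."), {M, LOC, CC}),            # S4
    (("<20K", MCD, "female", "Oriental", ">50 yr."), {M, LOC, CC}),           # S5
    ((">100K", SE, "male", "Caucasian", ">35 yr."), {M, LOC, SAV, CC}),       # S6
    (("<30K", WG, "male", "Oriental", "<30 yr."), {CHQ}),                    # S7
    ((">100K", SE, "male", "Caucasian", ">35 yr."), {M, LOC, CHQ, SAV, CC}),  # S8
    ((">100K", SE, "male", "Caucasian", ">35 yr."), {SAV}),                  # S9
    (("<30K", WG, "female", "South Asian", ">50 yr."), {CHQ}),               # S10
]
# Hypothetical name for the consequent B: the Mortgage + Line of Credit + Credit Card bundle.
BUNDLE = frozenset({M, LOC, CC})


def mine_rules(rows, consequent, min_support, min_confidence):
    """Exhaustively search rules A => consequent, where A is a combination of attribute values."""
    n = len(rows)
    rules = []
    for k in range(1, len(ATTRS) + 1):
        for idx in combinations(range(len(ATTRS)), k):
            # Candidate antecedents: value combinations that actually occur in the data.
            candidates = {tuple(values[i] for i in idx) for values, _ in rows}
            for cand in sorted(candidates):
                matched = [prods for values, prods in rows
                           if tuple(values[i] for i in idx) == cand]
                n_a = len(matched)                                         # #A
                n_ab = sum(1 for prods in matched if consequent <= prods)  # #(A u B)
                support, confidence = n_ab / n, n_ab / n_a
                if support >= min_support and confidence >= min_confidence:
                    antecedent = tuple(zip([ATTRS[i] for i in idx], cand))
                    rules.append((antecedent, support, confidence))
    return rules


if __name__ == "__main__":
    # Thresholds of 40% support and 80% confidence are assumptions, not from the notes.
    for antecedent, s, c in mine_rules(ROWS, BUNDLE, 0.4, 0.8)[:5]:
        lhs = " AND ".join(f"{attr} = {value}" for attr, value in antecedent)
        print(f"{lhs} => bundle (support {s:.0%}, confidence {c:.0%})")
```

On this table the rule quoted above passes (4 of the 5 customers matching its conditions hold the bundle), but so does every shorter subset of its conditions except Sex = male on its own, because those conditions all pick out the same five customers. Pruning such redundant rules is a separate step.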

Support: The support of an association rule A ⇒ B is the percentage of transactions that contain both itemsets A and B. It measures how often the rule occurs in the dataset:

$$\mathrm{Support}(A \Rightarrow B) = P(A \cup B) = \frac{\#(A \cup B)}{n}$$

where n is the total number of transactions in the dataset and #(A ∪ B) is the number of transactions that contain both A and B. For the rule above, 4 of the 10 customers match the antecedent and hold all three products: 4/10 = 40%.

Confidence: The confidence of an association rule A ⇒ B is the ratio of the number of transactions that contain both itemsets A and B to the number of transactions that contain itemset A. It measures the strength of the rule:

$$\mathrm{Confidence}(A \Rightarrow B) = P(B \mid A) = \frac{\#(A \cup B)}{\#A}$$

For the rule above: 4/5 = 80%.
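The two formulas can be checked on the Customer/Product Profile table with a short Python sketch. Each customer is encoded as one transaction: a set holding attribute-value items (such as `Income: >100K`) alongside the products held, so A and B are both itemsets and A ∪ B is their set union. The item-naming scheme and the helper names (`customer`, `count`, `support`, `confidence`) are assumptions for illustration.

```python
M, LOC, CC = "Mortgage", "Line of Credit", "Credit Card"
SAV, CHQ = "Savings Account", "Chequing Account"


def customer(income, occupation, sex, race, age, products):
    """Encode one table row as a transaction: attribute-value items plus products held."""
    attrs = {f"Income: {income}", f"Occupation: {occupation}", f"Sex: {sex}",
             f"Race: {race}", f"Age: {age}"}
    return frozenset(attrs | set(products))


# Customer/Product Profile table from the notes, subjects S1..S10.
TRANSACTIONS = [
    customer(">100K", "Self Employed", "male", "Caucasian", ">35 yr.", [M, LOC, SAV, CC]),
    customer("<30K", "Walmart Greeter", "female", "Oriental", ">50 yr.", [SAV]),
    customer("<20K", "McDonalds Employee", "male", "Oriental", "<30 yr.", [CHQ]),
    customer(">100K", "Self Employed", "male", "Caucasian", ">35 yr.", [M, LOC, CC]),
    customer("<20K", "McDonalds Employee", "female", "Oriental", ">50 yr.", [M, LOC, CC]),
    customer(">100K", "Self Employed", "male", "Caucasian", ">35 yr.", [M, LOC, SAV, CC]),
    customer("<30K", "Walmart Greeter", "male", "Oriental", "<30 yr.", [CHQ]),
    customer(">100K", "Self Employed", "male", "Caucasian", ">35 yr.", [M, LOC, CHQ, SAV, CC]),
    customer(">100K", "Self Employed", "male", "Caucasian", ">35 yr.", [SAV]),
    customer("<30K", "Walmart Greeter", "female", "South Asian", ">50 yr.", [CHQ]),
]

# The rule from the notes: A (customer profile) => B (product bundle).
A = frozenset({"Income: >100K", "Occupation: Self Employed", "Race: Caucasian",
               "Sex: male", "Age: >35 yr."})
B = frozenset({M, LOC, CC})


def count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def support(a, b, transactions):
    # Support(A => B) = P(A u B) = #(A u B) / n
    return count(a | b, transactions) / len(transactions)


def confidence(a, b, transactions):
    # Confidence(A => B) = P(B | A) = #(A u B) / #A
    return count(a | b, transactions) / count(a, transactions)


if __name__ == "__main__":
    print(f"support    = {support(A, B, TRANSACTIONS):.0%}")     # support    = 40%
    print(f"confidence = {confidence(A, B, TRANSACTIONS):.0%}")  # confidence = 80%
```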

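Lift: It is a measure of correlation between A and B:

$$\mathrm{Lift}(A, B) = \frac{P(A \cup B)}{P(A)\,P(B)}$$

A lift greater than 1 means A and B occur together more often than they would if they were independent (positive correlation); a lift below 1 indicates negative correlation, and a lift of exactly 1 indicates independence.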
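As a worked example on the Customer/Product Profile table and the rule above: 5 of the 10 customers match the antecedent A (S1, S4, S6, S8, S9), 5 hold the bundle B (S1, S4, S5, S6, S8), and 4 match both, so

$$\mathrm{Lift}(A, B) = \frac{P(A \cup B)}{P(A)\,P(B)} = \frac{4/10}{(5/10)(5/10)} = \frac{0.4}{0.25} = 1.6$$

Because the lift is above 1, in this small sample holding the bundle is positively correlated with the customer profile in A.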

Associative Classifier for Predictive Modelling: very powerful and very fast for predictive classification.
It can be used for cross-selling/upselling, for example:
(a) Reactivation of lapsed customers (by doing association-rule mining on the customers that were reactivated)
(b) Winning new high-value/low-value customers (by providing the products that customers in each category desire)
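How an associative classifier turns mined rules into predictions can be sketched in a few lines of Python. The sketch follows one common first-match design, assumed here rather than taken from the notes: rules are ranked by confidence (ties broken by support), and the first rule whose conditions a customer satisfies supplies the prediction. The two rules were computed from the Customer/Product Profile table, with "holds bundle" meaning the customer holds Mortgage, Line of Credit and Credit Card; `RULES`, `classify` and the class labels are hypothetical names.

```python
# Hypothetical rule set: (antecedent, predicted class, confidence, support).
# Both rules were computed from the Customer/Product Profile table, where
# "holds bundle" means holding Mortgage, Line of Credit and Credit Card.
RULES = [
    ({"Income": ">100K"}, "holds bundle", 0.8, 0.4),             # 4 of 5 such customers
    ({"Occupation": "Walmart Greeter"}, "no bundle", 1.0, 0.3),  # 3 of 3 such customers
]


def classify(customer, rules, default=None):
    """Return the class of the highest-ranked rule the customer satisfies, else `default`."""
    # Rank rules by confidence, breaking ties by support (highest first).
    ranked = sorted(rules, key=lambda rule: (rule[2], rule[3]), reverse=True)
    for antecedent, label, _confidence, _support in ranked:
        if all(customer.get(attr) == value for attr, value in antecedent.items()):
            return label
    return default


if __name__ == "__main__":
    print(classify({"Income": ">100K", "Occupation": "Self Employed"}, RULES))      # holds bundle
    print(classify({"Income": ">100K", "Occupation": "Walmart Greeter"}, RULES))    # no bundle
    print(classify({"Income": "<20K", "Occupation": "McDonalds Employee"}, RULES))  # None
```

Prediction only requires matching a customer against an ordered rule list, which is where the speed of this kind of classifier comes from; the expensive part is mining the rules beforehand.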

Contrast-set Mining
For the contrast set cset = (Income <20K AND Occupation = McDonalds employee AND Race = Oriental AND Sex = Male AND Age >35 yr.):

support(cset, Reactivation = Yes) = P(cset | Reactivation = Yes) = 0.73, and

support(cset, Reactivation = No) = P(cset | Reactivation = No) = 0.23

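A contrast set (cset) is a conjunction of attribute-value pairs. It is considered a deviation when its support differs enough between groups G_i and G_j (here: reactivated vs. not reactivated customers):

$$\max_{i,j} \left| \mathrm{support}(cset, G_i) - \mathrm{support}(cset, G_j) \right| \geq \delta$$

where δ is a user-defined threshold called the minimum support difference.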
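The deviation test translates directly into a small Python function. Because the largest pairwise gap between group supports is simply the largest support minus the smallest, the sketch computes it that way. The function names are illustrative, and the thresholds 0.4 and 0.6 in the usage lines are arbitrary choices, not values from the notes.

```python
def support_difference(supports):
    """max over group pairs i, j of |support(cset, G_i) - support(cset, G_j)|."""
    # The largest pairwise gap is simply the largest support minus the smallest.
    values = list(supports.values())
    return max(values) - min(values)


def is_deviation(supports, delta):
    """True when the contrast set's support difference reaches the threshold delta."""
    return support_difference(supports) >= delta


# Supports of the example contrast set in each group, as given in the notes.
SUPPORTS = {"Reactivation = Yes": 0.73, "Reactivation = No": 0.23}

if __name__ == "__main__":
    print(round(support_difference(SUPPORTS), 2))                    # 0.5
    print(is_deviation(SUPPORTS, 0.4), is_deviation(SUPPORTS, 0.6))  # True False
```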

Subject  Customer Reactivation Successful?  Income  Occupation          Sex     Race         Age
S1       Customer Reactivated               >100K   Self Employed       male    Caucasian    >35 yr.
S2       Could NOT be Reactivated           <30K    Walmart Greeter     female  Oriental     >50 yr.
S3       Customer Reactivated               <20K    McDonalds Employee  male    Oriental     <30 yr.
S4       Customer Reactivated               >100K   Self Employed       male    African      >35 yr.
S5       Could NOT be Reactivated           <20K    McDonalds Employee  female  Oriental     >50 yr.
S6       Customer Reactivated               >100K   Self Employed       female  Caucasian    >35 yr.
S7       Could NOT be Reactivated           <30K    Walmart Greeter     male    Caucasian    <30 yr.
S8       Customer Reactivated               >100K   Self Employed       male    Oriental     >35 yr.
S9       Customer Reactivated               <30K    Walmart Greeter     female  Caucasian    >35 yr.
S10      Could NOT be Reactivated           <30K    Walmart Greeter     female  South Asian  >50 yr.
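Applying the minimum-support-difference test to this reactivation table can be sketched in Python: every conjunction of up to two attribute-value pairs is scored by the difference between its support among reactivated customers and among customers who could not be reactivated, and kept if that difference reaches δ. The exhaustive enumeration, the size limit of two, δ = 0.6 and the function names are assumptions for illustration; the sketch applies only the support-difference test given above, with no statistical significance check.

```python
from itertools import combinations

ATTRS = ("Income", "Occupation", "Sex", "Race", "Age")
SE, WG, MCD = "Self Employed", "Walmart Greeter", "McDonalds Employee"
# Reactivation table from the notes: (reactivated?, attribute values) for S1..S10.
ROWS = [
    (True,  (">100K", SE, "male", "Caucasian", ">35 yr.")),     # S1
    (False, ("<30K", WG, "female", "Oriental", ">50 yr.")),     # S2
    (True,  ("<20K", MCD, "male", "Oriental", "<30 yr.")),      # S3
    (True,  (">100K", SE, "male", "African", ">35 yr.")),       # S4
    (False, ("<20K", MCD, "female", "Oriental", ">50 yr.")),    # S5
    (True,  (">100K", SE, "female", "Caucasian", ">35 yr.")),   # S6
    (False, ("<30K", WG, "male", "Caucasian", "<30 yr.")),      # S7
    (True,  (">100K", SE, "male", "Oriental", ">35 yr.")),      # S8
    (True,  ("<30K", WG, "female", "Caucasian", ">35 yr.")),    # S9
    (False, ("<30K", WG, "female", "South Asian", ">50 yr.")),  # S10
]


def group_support(cset, rows, group):
    """Fraction of customers in `group` whose attributes match every pair in `cset`."""
    members = [vals for label, vals in rows if label == group]
    hits = sum(1 for vals in members
               if all(vals[ATTRS.index(a)] == v for a, v in cset))
    return hits / len(members)


def contrast_sets(rows, delta, max_size=2):
    """Csets of up to max_size attribute-value pairs whose support difference is >= delta."""
    pairs = sorted({(a, vals[i]) for _, vals in rows for i, a in enumerate(ATTRS)})
    found = []
    for size in range(1, max_size + 1):
        for cset in combinations(pairs, size):
            if len({a for a, _ in cset}) < size:
                continue  # skip csets that use the same attribute twice
            diff = abs(group_support(cset, rows, True) - group_support(cset, rows, False))
            if diff >= delta:
                found.append((cset, diff))
    return sorted(found, key=lambda item: -item[1])


if __name__ == "__main__":
    for cset, diff in contrast_sets(ROWS, 0.6)[:5]:
        print(" AND ".join(f"{a} = {v}" for a, v in cset), round(diff, 2))
```

On this table, for example, Age >35 yr. covers 5 of the 6 reactivated customers and none of the 4 who could not be reactivated, a support difference of about 0.83.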

Why are contrast sets important?
(a) Determine the top-10 most significant attributes that affect your class (a preparatory step for data mining).
(b) Handle small datasets with a high number of attributes (e.g., breast cancer/prostate cancer data).
(c) Save time by weeding out unnecessary rules.
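One simple way to find the attributes that most affect the class is to score each attribute by the largest support difference any of its values shows between reactivated and non-reactivated customers, then keep the top k. The Python sketch below does this on the reactivation table; the scoring rule and the names are assumptions for illustration, and k = 3 stands in for top-10 only because this table has five attributes.

```python
ATTRS = ("Income", "Occupation", "Sex", "Race", "Age")
SE, WG, MCD = "Self Employed", "Walmart Greeter", "McDonalds Employee"
# Reactivation table from the notes: (reactivated?, attribute values) for S1..S10.
ROWS = [
    (True,  (">100K", SE, "male", "Caucasian", ">35 yr.")),     # S1
    (False, ("<30K", WG, "female", "Oriental", ">50 yr.")),     # S2
    (True,  ("<20K", MCD, "male", "Oriental", "<30 yr.")),      # S3
    (True,  (">100K", SE, "male", "African", ">35 yr.")),       # S4
    (False, ("<20K", MCD, "female", "Oriental", ">50 yr.")),    # S5
    (True,  (">100K", SE, "female", "Caucasian", ">35 yr.")),   # S6
    (False, ("<30K", WG, "male", "Caucasian", "<30 yr.")),      # S7
    (True,  (">100K", SE, "male", "Oriental", ">35 yr.")),      # S8
    (True,  ("<30K", WG, "female", "Caucasian", ">35 yr.")),    # S9
    (False, ("<30K", WG, "female", "South Asian", ">50 yr.")),  # S10
]


def attribute_scores(rows):
    """Score each attribute by the largest support difference among its values."""
    pos = [vals for reactivated, vals in rows if reactivated]
    neg = [vals for reactivated, vals in rows if not reactivated]
    scores = {}
    for i, attr in enumerate(ATTRS):
        diffs = []
        for value in {vals[i] for _, vals in rows}:
            s_pos = sum(v[i] == value for v in pos) / len(pos)
            s_neg = sum(v[i] == value for v in neg) / len(neg)
            diffs.append(abs(s_pos - s_neg))
        scores[attr] = max(diffs)
    return scores


def top_k_attributes(rows, k):
    """The k highest-scoring attributes; sorted() is stable, so ties keep column order."""
    scores = attribute_scores(rows)
    return sorted(ATTRS, key=lambda attr: -scores[attr])[:k]


if __name__ == "__main__":
    print(top_k_attributes(ROWS, 3))  # ['Age', 'Income', 'Occupation']
```

Here Age scores highest (5/6 of reactivated customers are >35 yr. versus none of the others), Income and Occupation tie at 4/6, and Race scores lowest, so Race would be the first candidate to drop before mining rules.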
