Question 1 (in the ATB context): Which products do most customers hold together?
Answer:
(a) Mortgage, Line of Credit and Credit Card (80%)
(b) Chequing, Savings and Credit Card (70%)

Question 2 (in the ATB context): What kind of customers hold the above three products together at ATB (Mortgage, Line of Credit and Credit Card)?
| Attribute Name | Values |
| --- | --- |
| Products (Class) | |
| Occupation | |
| Sex | male, female, other |
| Race | Caucasian, South Asian, Oriental, African |
| Age | <30 yr., 30-50 yr., >50 yr. |
Customer/Product Profile:

| Subject Number | Products held by customer | Income | Occupation | Sex | Race | Age |
| --- | --- | --- | --- | --- | --- | --- |
| S1 | Mortgage, Line of Credit and Credit Card, Savings Account | >100K | Self Employed | male | Caucasian | >35 yr. |
| S2 | Savings Account | <30K | Walmart Greeter | female | Oriental | >50 yr. |
| S3 | Chequing Account | <20K | McDonalds Employee | male | Oriental | <30 yr. |
| S4 | | >100K | Self Employed | male | Caucasian | >35 yr. |
| S5 | | <20K | McDonalds Employee | female | Oriental | >50 yr. |
| S6 | | >100K | Self Employed | male | Caucasian | >35 yr. |
| S7 | Chequing Account | <30K | Walmart Greeter | male | Oriental | <30 yr. |
| S8 | Mortgage, Line of Credit and Credit Card, Chequing Account, Savings Account | >100K | Self Employed | male | Caucasian | >35 yr. |
| S9 | Savings Account | >100K | Self Employed | male | Caucasian | >35 yr. |
| S10 | Chequing Account | <30K | Walmart Greeter | female | South Asian | >50 yr. |
Support: The support of an association rule (A ⇒ B) is the percentage of transactions that contain both itemsets A and B. It measures how often the rule occurs in the dataset.

\[ \mathrm{Support}(A \Rightarrow B) = P(A \cup B) = \frac{\#(A \cup B)}{n} \]

where n is the total number of transactions in the dataset and #(A ∪ B) is the number of transactions that contain every item of A and every item of B. (Example: 5/10 = 50%.)

Confidence: The confidence of an association rule (A ⇒ B) is the ratio of the number of transactions that contain both itemsets A and B to the number of transactions that contain itemset A. It measures the strength of the rule.

\[ \mathrm{Confidence}(A \Rightarrow B) = P(B \mid A) = \frac{\#(A \cup B)}{\#A} \]

(Example: 4/5 = 80%.)
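The ratio of the joint probability to the product of the individual probabilities (commonly called the lift of the rule):

\[ \frac{P(A \cup B)}{P(A)\,P(B)} \]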
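The measures above can be checked by hand on a small example. The sketch below is only illustrative: the `baskets` list is hypothetical toy data (not ATB data), and the function names `count`, `support`, `confidence` and `lift` are chosen here for the example. The last function computes the ratio of P(A and B) to P(A) P(B), which is commonly called lift.

```python
# Hypothetical toy data: each set holds the products of one customer.
baskets = [
    {"Mortgage", "Line of Credit", "Credit Card"},
    {"Mortgage", "Line of Credit", "Credit Card", "Savings"},
    {"Mortgage", "Credit Card"},
    {"Chequing", "Savings"},
    {"Chequing"},
    {"Mortgage", "Line of Credit", "Credit Card", "Chequing"},
    {"Savings", "Credit Card"},
    {"Mortgage", "Line of Credit"},
    {"Chequing", "Savings", "Credit Card"},
    {"Mortgage", "Line of Credit", "Credit Card"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(a, b, transactions):
    """Share of all transactions that contain both A and B."""
    return count(a | b, transactions) / len(transactions)


def confidence(a, b, transactions):
    """Share of the transactions containing A that also contain B."""
    return count(a | b, transactions) / count(a, transactions)


def lift(a, b, transactions):
    """P(A and B) divided by P(A) * P(B); a value above 1 suggests positive association."""
    n = len(transactions)
    p_a = count(a, transactions) / n
    p_b = count(b, transactions) / n
    return support(a, b, transactions) / (p_a * p_b)


A = {"Mortgage", "Line of Credit"}
B = {"Credit Card"}
print(support(A, B, baskets), confidence(A, B, baskets), lift(A, B, baskets))
```

In this toy data, 4 of the 10 customers hold all three products and 5 hold both Mortgage and Line of Credit, so the rule has support 4/10 and confidence 4/5. The functions assume A occurs at least once; a production version would need to guard against division by zero.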
Associative Classifier for Predictive Modelling: very powerful and very fast for predictive classification.

Can be used for:
- Cross-selling/Upselling
- Reactivation of lapsed customers (by doing association rule mining on the customers that were reactivated)
- Winning new high value/low value customers (provide products that the customers desire in their respective categories)
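To make the idea of an associative classifier concrete, here is a deliberately simplified sketch, not the method used in the source. It assumes rules with a single condition (attribute = value) pointing to a class, keeps those above hypothetical support and confidence thresholds, and classifies a new customer with the highest-confidence matching rule. The training rows, the thresholds and the names `mine_rules` and `classify` are all assumptions made for illustration.

```python
from collections import Counter

BUNDLE = "Mortgage, Line of Credit and Credit Card"

# Hypothetical training rows: (customer attributes, product class).
training = [
    ({"Income": ">100K", "Occupation": "Self Employed"}, BUNDLE),
    ({"Income": ">100K", "Occupation": "Self Employed"}, BUNDLE),
    ({"Income": ">100K", "Occupation": "Self Employed"}, BUNDLE),
    ({"Income": ">100K", "Occupation": "Self Employed"}, "Savings Account"),
    ({"Income": "<30K", "Occupation": "Walmart Greeter"}, "Savings Account"),
    ({"Income": "<30K", "Occupation": "Walmart Greeter"}, "Chequing Account"),
    ({"Income": "<30K", "Occupation": "Walmart Greeter"}, "Chequing Account"),
    ({"Income": "<20K", "Occupation": "McDonalds Employee"}, "Chequing Account"),
    ({"Income": "<20K", "Occupation": "McDonalds Employee"}, "Chequing Account"),
]


def mine_rules(rows, min_support=0.2, min_confidence=0.6):
    """Return single-condition rules as (confidence, support, condition, label)."""
    n = len(rows)
    condition_counts = Counter()
    rule_counts = Counter()
    for attrs, label in rows:
        for condition in attrs.items():
            condition_counts[condition] += 1
            rule_counts[(condition, label)] += 1
    rules = []
    for (condition, label), hits in rule_counts.items():
        sup = hits / n
        conf = hits / condition_counts[condition]
        if sup >= min_support and conf >= min_confidence:
            rules.append((conf, sup, condition, label))
    # Strongest rules first: by confidence, then by support.
    rules.sort(key=lambda r: (r[0], r[1]), reverse=True)
    return rules


def classify(attrs, rules, default):
    """Predict with the first (strongest) rule whose condition matches."""
    for _conf, _sup, (name, value), label in rules:
        if attrs.get(name) == value:
            return label
    return default


rules = mine_rules(training)
majority = Counter(label for _, label in training).most_common(1)[0][0]
print(classify({"Income": ">100K", "Occupation": "Self Employed"}, rules, majority))
```

Published associative classifiers mine multi-condition rules and prune them more carefully; the single-condition version here only shows the mine-rank-match structure.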
Contrast-set Mining
P (Reactivation = Yes | Income <20K AND Occupation= McDonalds employee AND Race = Oriental AND Sex = Male AND Age >35 yr) = 0.73 (support), and
P (Reactivation = No | Income <20K AND Occupation= McDonalds employee AND Race = Oriental AND Sex = Male AND Age >35 yr) = 0.23 (support)
| Subject Number | Reactivation | Income | Occupation | Sex | Race | Age |
| --- | --- | --- | --- | --- | --- | --- |
| S1 | Customer Reactivated | >100K | Self Employed | male | Caucasian | >35 yr. |
| S2 | | <30K | Walmart Greeter | female | Oriental | >50 yr. |
| S3 | Customer Reactivated | <20K | McDonalds Employee | male | Oriental | <30 yr. |
| S4 | Customer Reactivated | >100K | Self Employed | male | African | >35 yr. |
| S5 | | <20K | McDonalds Employee | female | Oriental | >50 yr. |
| S6 | Customer Reactivated | >100K | Self Employed | female | Caucasian | >35 yr. |
| S7 | | <30K | Walmart Greeter | male | Caucasian | <30 yr. |
| S8 | Customer Reactivated | >100K | Self Employed | male | Oriental | >35 yr. |
| S9 | Customer Reactivated | <30K | Walmart Greeter | female | Caucasian | >35 yr. |
| S10 | | <30K | Walmart Greeter | female | South Asian | >50 yr. |
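Conditional probabilities of the kind quoted for contrast-set mining can be estimated directly from a table like the one above. The sketch below transcribes the ten subjects and assumes that a row without the "Customer Reactivated" marker means the customer was not reactivated; that reading, and the helper name `p_reactivated`, are assumptions made for this example.

```python
# Rows transcribed from the reactivation table.
# Assumption: no "Customer Reactivated" marker means not reactivated (False).
FIELDS = ("subject", "reactivated", "income", "occupation", "sex", "race", "age")
RAW = [
    ("S1", True, ">100K", "Self Employed", "male", "Caucasian", ">35 yr."),
    ("S2", False, "<30K", "Walmart Greeter", "female", "Oriental", ">50 yr."),
    ("S3", True, "<20K", "McDonalds Employee", "male", "Oriental", "<30 yr."),
    ("S4", True, ">100K", "Self Employed", "male", "African", ">35 yr."),
    ("S5", False, "<20K", "McDonalds Employee", "female", "Oriental", ">50 yr."),
    ("S6", True, ">100K", "Self Employed", "female", "Caucasian", ">35 yr."),
    ("S7", False, "<30K", "Walmart Greeter", "male", "Caucasian", "<30 yr."),
    ("S8", True, ">100K", "Self Employed", "male", "Oriental", ">35 yr."),
    ("S9", True, "<30K", "Walmart Greeter", "female", "Caucasian", ">35 yr."),
    ("S10", False, "<30K", "Walmart Greeter", "female", "South Asian", ">50 yr."),
]
records = [dict(zip(FIELDS, row)) for row in RAW]


def p_reactivated(conditions, rows):
    """Estimate P(Reactivation = Yes | conditions); None if no row matches."""
    matching = [r for r in rows if all(r[k] == v for k, v in conditions.items())]
    if not matching:
        return None
    return sum(r["reactivated"] for r in matching) / len(matching)


p_self = p_reactivated({"occupation": "Self Employed"}, records)
p_walmart = p_reactivated({"occupation": "Walmart Greeter"}, records)
print(p_self, p_walmart)
```

With only ten rows these estimates are very coarse. The probability of Reactivation = No under the same conditions is one minus the returned value.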
Why are contrast-sets important?
- Determine the top-10 most significant attributes that affect your class (prep step to data mining)
- Small datasets with a high number of attributes (breast cancer/prostate cancer)
- Save time by weeding out the unnecessary rules
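One simple way to shortlist the attributes that most affect the class is to compare how often each attribute value occurs in the two groups and rank by the difference. The sketch below does this for the ten subjects of the reactivation table, again assuming that unmarked rows were not reactivated. It is a plain support-difference heuristic with hypothetical function names (`group_support`, `rank_contrasts`), not a full contrast-set mining algorithm, which would also test the differences for statistical significance.

```python
from collections import Counter

HEADER = ("income", "occupation", "sex", "race", "age")

# Rows from the reactivation table, split by class.
# Assumption: unmarked rows are treated as not reactivated.
REACTIVATED = [
    (">100K", "Self Employed", "male", "Caucasian", ">35 yr."),      # S1
    ("<20K", "McDonalds Employee", "male", "Oriental", "<30 yr."),   # S3
    (">100K", "Self Employed", "male", "African", ">35 yr."),        # S4
    (">100K", "Self Employed", "female", "Caucasian", ">35 yr."),    # S6
    (">100K", "Self Employed", "male", "Oriental", ">35 yr."),       # S8
    ("<30K", "Walmart Greeter", "female", "Caucasian", ">35 yr."),   # S9
]
NOT_REACTIVATED = [
    ("<30K", "Walmart Greeter", "female", "Oriental", ">50 yr."),      # S2
    ("<20K", "McDonalds Employee", "female", "Oriental", ">50 yr."),   # S5
    ("<30K", "Walmart Greeter", "male", "Caucasian", "<30 yr."),       # S7
    ("<30K", "Walmart Greeter", "female", "South Asian", ">50 yr."),   # S10
]


def group_support(group):
    """Share of rows in the group that carry each (attribute, value) pair."""
    counts = Counter()
    for row in group:
        counts.update(zip(HEADER, row))
    return {item: c / len(group) for item, c in counts.items()}


def rank_contrasts(group_a, group_b, top=10):
    """Rank (attribute, value) pairs by absolute support difference between groups."""
    sup_a, sup_b = group_support(group_a), group_support(group_b)
    items = set(sup_a) | set(sup_b)
    scored = [(abs(sup_a.get(i, 0.0) - sup_b.get(i, 0.0)), i) for i in items]
    scored.sort(key=lambda s: (-s[0], s[1]))  # largest difference first
    return scored[:top]


for diff, (attribute, value) in rank_contrasts(REACTIVATED, NOT_REACTIVATED):
    print(f"{attribute} = {value}: {diff:.2f}")
```

On this table the largest gap belongs to Age >35 yr. (5 of 6 reactivated customers against none of the others), followed by Age >50 yr.; with so few rows the ranking is only indicative.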