
BOOKBINDERS

BOOK CLUB

DEVIKA DHIR / ANA LOPEZ / ASTHA KATHIL / YINGMEI (JOAN) ZHANG
CONTENT
1 CASE SETTING
2 DATA
3 PROBLEM STATEMENT
4 THREE MODEL ANALYSIS
5 PROFIT ANALYSIS
6 CONCLUSIONS AND RECOMMENDATIONS
1 CASE SETTING

50K new titles with new editions, published in the United States each year
+$20B in profit for the book publishing industry
10% of the books are sold through mail order
BOOKBINDERS BOOK CLUB (BBBC)
MAIN FOCUS: TO SELL SPECIALTY BOOKS THROUGH DIRECT MARKETING

• Currently has a database of 500,000 readers and sends a mailing once a month
• Plans to implement predictive models to improve the efficacy of its direct mail program
2 DATA

20,000 OVERALL RESPONDENTS
Customers in Pennsylvania, New York and Ohio who received a special brochure for The Art History of Florence along with their regular mail, with a 9.03% response rate for the book purchase.

CASE DATA (PREDICT): 1,600 records, 1,200 NOT PURCHASED and 400 PURCHASED
HOLD OUT DATA (VALIDATE): 2,300 records

* THE CASE SUGGESTS THREE MODELS OF ANALYSIS
DEPENDENT VARIABLE:

CHOICE: Whether the customer purchased The Art History of Florence. 1 corresponds to a purchase and 0 corresponds to a non-purchase.

INDEPENDENT VARIABLES:

GENDER: 0 = Female and 1 = Male.
AMOUNT PURCHASED: Total money spent on BBBC books.
FREQUENCY: Total number of purchases in the chosen period (used as a proxy for frequency).
LAST PURCHASE: Months since last purchase.
FIRST PURCHASE: Months since first purchase.
BOOK CATEGORIES: Number of books purchased in each category.



3 PROBLEM STATEMENT

TO HELP BBBC IMPROVE THE EFFICIENCY OF ITS DIRECT MAIL PROGRAM:

WHICH OF THE THREE MODELS ALLOWS US TO BETTER TARGET THE CUSTOMERS LIKELY TO PURCHASE ‘THE ART HISTORY OF FLORENCE’?
4 THREE MODEL ANALYSIS

THREE MODELS
RFM, REGRESSION, LOGIT
Each model is built on the CASE DATA and checked on the HOLD OUT DATA.
RFM
• Arbitrarily assign scores to each variable (1-12)
• Sum total scores and rank by score
• Build gain chart
• Build graph
RFM analysis is a marketing technique used to determine quantitatively which customers are the best ones by examining how recently a customer has purchased (recency), how often they purchase (frequency), and how much the customer spends (monetary).
• Assign scores to each variable (1-12)
• Sum total scores and rank by score

Organize each dataset into 10 ‘decile’ groups, to then build the gain chart:
CASE DATA: 1,600/10 = 160 records per decile
HOLD OUT DATA: 2,300/10 = 230 records per decile
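The ranking and decile split described above can be sketched in a few lines of Python. This is only an illustration: the function name `assign_deciles` and the three sample customers are hypothetical, and the slides do not say how the individual 1-12 scores were assigned, so the sketch starts from scores that already exist.

```python
def assign_deciles(total_scores, n_groups=10):
    """Rank records by total score (highest first) and split them into equal groups.

    Returns a list where position i holds the decile (1 = best) of record i.
    """
    n = len(total_scores)
    # Record indices ordered from highest to lowest total score.
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    deciles = [0] * n
    for rank, idx in enumerate(order):
        # Integer arithmetic maps rank 0..n-1 onto groups 1..n_groups.
        deciles[idx] = rank * n_groups // n + 1
    return deciles


# Hypothetical R, F and M scores (each on the 1-12 scale) for three customers.
rfm_scores = [(12, 7, 9), (3, 4, 1), (8, 8, 8)]
totals = [r + f + m for r, f, m in rfm_scores]  # sum of the three scores

# With 1,600 ranked records, every decile receives 160 of them.
case_deciles = assign_deciles(list(range(1600)))
print(case_deciles.count(1), case_deciles.count(10))
```

The same call on 2,300 records yields groups of 230, matching the HOLD OUT DATA split.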
• Build gain charts

DECILE COLUMN:
Each dataset organized into 10 ‘decile’ groups of 160 records (CASE DATA, 1,600/10) or 230 records (HOLD OUT DATA, 2,300/10), to then build the gain chart.
MAIL COLUMN:
Number of mailings that have been sent out to each decile.

C. MAIL COLUMN:
Cumulative number of mailings that have been sent out, up to and including each decile.
RESPONSE COLUMN:
Based on the score ranking, the count of ‘CHOICE’ = 1 observations (purchases) in each decile.

C. RESPONSE COLUMN:
Cumulative count of ‘CHOICE’ = 1 observations, up to and including each decile.
PERCENTAGE COLUMN:
The % of people in the decile that responded to the mailing (Percentage = Response/Mail). For example, in decile 1 of the HOLD OUT DATA, 86 people responded out of 230.

C. PERCENTAGE COLUMN:
The % of cumulative responses for all the deciles up to that point (C. Percentage = C. Response/C. Mail).
INDEX COLUMN:
The response % of the decile, divided by the response % for all the deciles.
INDEX (decile 1) = 37% / 9%

C. INDEX COLUMN:
The cumulative response % up to the decile, divided by the response % for all the deciles.
C. INDEX (decile 1) = 37% / 9%
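As a rough illustration of how the gain chart columns fit together, the sketch below computes all of them from a list of 0/1 CHOICE values that has already been sorted by score. The function name `gain_table` and the 20-record sample are hypothetical, not taken from the case data. On the slide's own example, 86 responses out of 230 mailings is about 37%, and 37% / 9% gives an index of roughly 4.1.

```python
def gain_table(choices_ranked, n_groups=10):
    """Build gain chart rows from 0/1 CHOICE values sorted best score first.

    Assumes at least one purchase, so the overall response rate is not zero.
    """
    n = len(choices_ranked)
    overall_rate = sum(choices_ranked) / n  # response % for all the deciles
    rows, c_mail, c_resp = [], 0, 0
    for g in range(n_groups):
        start, end = g * n // n_groups, (g + 1) * n // n_groups
        mail = end - start                     # MAIL
        resp = sum(choices_ranked[start:end])  # RESPONSE
        c_mail += mail                         # C. MAIL
        c_resp += resp                         # C. RESPONSE
        rows.append({
            "decile": g + 1,
            "mail": mail, "c_mail": c_mail,
            "response": resp, "c_response": c_resp,
            "pct": resp / mail,                # PERCENTAGE
            "c_pct": c_resp / c_mail,          # C. PERCENTAGE
            "index": (resp / mail) / overall_rate,
            "c_index": (c_resp / c_mail) / overall_rate,
        })
    return rows


# Hypothetical ranked sample: 20 customers, 4 buyers, most of them near the top.
ranked_choices = [1, 1, 1, 0, 0, 1] + [0] * 14
for row in gain_table(ranked_choices)[:3]:
    print(row["decile"], row["response"], round(row["index"], 2), round(row["c_index"], 2))
```

In the first decile the index and the cumulative index are always equal, which is why the slide shows the same 37% / 9% ratio for both.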
CASE DATA / HOLD OUT DATA
✓ Customers in the first deciles have higher scores.


[Charts: Cumulative Gains Lift Chart RFM Case Data; Cumulative Gains Chart RFM Hold Out Data. Series in each: Cumulative Baseline, Cumulative Actual Response.]
✓ NOT a good model, since it does not perform better than random selection: the response hits fall below the 50% set baseline.
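One way to check a claim like this is to compare the cumulative response curve with a baseline. The sketch below assumes the baseline is the straight line that random mailing would produce (buyers found at the overall response rate); the slides do not define their baseline precisely, so that reading and the function names are assumptions.

```python
from itertools import accumulate


def cumulative_gains(choices_ranked):
    """Return (actual, baseline) cumulative response counts for ranked 0/1 choices."""
    n = len(choices_ranked)
    total = sum(choices_ranked)
    actual = list(accumulate(choices_ranked))
    # Random mailing finds buyers at the overall rate, giving a straight line.
    baseline = [total * (k + 1) / n for k in range(n)]
    return actual, baseline


def beats_baseline(choices_ranked):
    """True if the ranked list never falls below the random-mailing line."""
    actual, baseline = cumulative_gains(choices_ranked)
    # The last point is skipped: both curves always end at the total buyer count.
    return all(a >= b for a, b in zip(actual[:-1], baseline[:-1]))


# Hypothetical rankings: buyers concentrated at the top versus at the bottom.
print(beats_baseline([1, 1, 0, 0]), beats_baseline([0, 0, 1, 1]))
```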
REGRESSION
• Run the model with ‘Choice’ as the dependent variable
• Use all the other variables as independent variables
• Analyze output and coefficients
• Substitute the values of each observation in the REG equation
• Rank ‘Predicted Choice’ in decreasing order
• Build gain chart
• Build graph
• Analyze output and coefficients

✓ GENDER: the coefficient is negative, so the probability of purchase is higher for female (0) customers.
✓ AMOUNT PURCHASED: the coefficient rounds to 0.000, so the variable has practically no impact on the choice.
✓ FREQUENCY: the higher the frequency, the lower the probability of purchase.
✓ LAST PURCHASE: the longer the time since the last purchase, the higher the probability of purchase.
✓ FIRST PURCHASE: not significant.
✓ P_ART: the coefficient is positive (+), meaning the probability of purchase rises with the number of Art books bought, unlike the other book categories.
• Substitute the values of each observation in the REG equation
• Rank ‘Predicted Choice’ in decreasing order

PROBABILITY OF CHOICE =
0.36 - 0.13 * (GENDER) + 0.00 * (AMOUNT PURCHASED)
- 0.009 * (FREQUENCY) + 0.097 * (LAST PURCHASE)
- 0.002 * (FIRST PURCHASE) - 0.126 * (P_CHILD)
- 0.096 * (P_YOUTH) - 0.141 * (P_COOK)
- 0.135 * (P_DIY) + 0.118 * (P_ART)
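A minimal sketch of the substitution and ranking steps, using the rounded coefficients shown on the slide. The dictionary keys and both sample customers are hypothetical; the real analysis would apply the equation to every record in the CASE and HOLD OUT data.

```python
# Rounded coefficients as printed on the slide (key names are illustrative).
REG_INTERCEPT = 0.36
REG_COEFFS = {
    "GENDER": -0.13, "AMOUNT_PURCHASED": 0.00, "FREQUENCY": -0.009,
    "LAST_PURCHASE": 0.097, "FIRST_PURCHASE": -0.002, "P_CHILD": -0.126,
    "P_YOUTH": -0.096, "P_COOK": -0.141, "P_DIY": -0.135, "P_ART": 0.118,
}


def predicted_choice(customer):
    """Substitute one customer's values into the regression equation."""
    return REG_INTERCEPT + sum(REG_COEFFS[k] * customer[k] for k in REG_COEFFS)


# Two made-up customers, for illustration only.
art_fan = {"GENDER": 0, "AMOUNT_PURCHASED": 200, "FREQUENCY": 10, "LAST_PURCHASE": 3,
           "FIRST_PURCHASE": 20, "P_CHILD": 0, "P_YOUTH": 0, "P_COOK": 0,
           "P_DIY": 0, "P_ART": 2}
cook_fan = {"GENDER": 1, "AMOUNT_PURCHASED": 200, "FREQUENCY": 10, "LAST_PURCHASE": 1,
            "FIRST_PURCHASE": 20, "P_CHILD": 1, "P_YOUTH": 0, "P_COOK": 1,
            "P_DIY": 0, "P_ART": 0}

# Rank 'Predicted Choice' in decreasing order.
ranked = sorted([cook_fan, art_fan], key=predicted_choice, reverse=True)
print([round(predicted_choice(c), 3) for c in ranked])
```

Because this is a linear equation, nothing keeps the result between 0 and 1; the second customer above scores slightly below zero. That does not affect the ranking, which is all the gain chart needs.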
• Build gain charts

CASE DATA
✓ Customers in the first deciles have higher scores.
✓ The Percentage column (Response/Mail) represents the % of customers that purchased: in the first decile, 118 out of 160 purchased the book.

HOLD OUT DATA
✓ Customers in the first deciles have higher scores.
✓ The Percentage column (Response/Mail) represents the % of customers that purchased: in the first decile, 86 out of 230 purchased the book.
[Charts: Cumulative Gains Lift Chart Regression Case Data; Cumulative Gains Lift Chart Regression Hold Out Data. Series in each: Cumulative Baseline, Cumulative Actual Response.]
LOGIT
• Run the model with ‘Choice’ as the dependent variable
• Use all the other variables as independent variables
• Analyze output and coefficients
• Substitute the values of each observation in the LOGIT equation
• Rank ‘Predicted Choice’ in decreasing order
• Build gain chart
• Build graph
• Analyze output and coefficients

Similar to the regression output analysis:
✓ GENDER: the coefficient is negative, so the probability of purchase is higher for female (0) customers.
✓ AMOUNT PURCHASED: the coefficient still shows no impact on choice.
✓ FREQUENCY: the higher the frequency, the lower the probability of purchase.
✓ LAST PURCHASE: the longer the time since the last purchase, the higher the probability of purchase.
✓ FIRST PURCHASE: not significant.
✓ P_ART: the coefficient is positive (+), meaning the probability of purchase rises with the number of Art books bought, unlike the other book categories.
• Substitute the values of each observation in the LOGIT equation
• Rank ‘Response Prob’ in decreasing order

SCORE =
0.35 - 0.86 * (GENDER) + 0.00 * (AMOUNT PURCHASED)
- 0.08 * (FREQUENCY) + 0.61 * (LAST PURCHASE)
- 0.01 * (FIRST PURCHASE) - 0.81 * (P_CHILD)
- 0.64 * (P_YOUTH) - 0.92 * (P_COOK)
- 0.91 * (P_DIY) + 0.69 * (P_ART)

P = exp(SCORE) / (1 + exp(SCORE))
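The step from SCORE to a response probability can be sketched as follows, assuming the standard logistic transform that a logit model implies. The coefficients are the rounded values from the slide; the key names and the two sample customers are hypothetical.

```python
import math

# Rounded coefficients as printed on the slide (key names are illustrative).
LOGIT_INTERCEPT = 0.35
LOGIT_COEFFS = {
    "GENDER": -0.86, "AMOUNT_PURCHASED": 0.00, "FREQUENCY": -0.08,
    "LAST_PURCHASE": 0.61, "FIRST_PURCHASE": -0.01, "P_CHILD": -0.81,
    "P_YOUTH": -0.64, "P_COOK": -0.92, "P_DIY": -0.91, "P_ART": 0.69,
}


def logit_score(customer):
    """Substitute one customer's values into the LOGIT equation."""
    return LOGIT_INTERCEPT + sum(LOGIT_COEFFS[k] * customer[k] for k in LOGIT_COEFFS)


def response_prob(customer):
    """Logistic transform of the score; always strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-logit_score(customer)))


# Made-up customers: one with every variable at 0, one with recent art purchases.
baseline_customer = dict.fromkeys(LOGIT_COEFFS, 0)
art_buyer = dict(baseline_customer, P_ART=2, LAST_PURCHASE=3)

print(round(response_prob(baseline_customer), 3), round(response_prob(art_buyer), 3))
```

Unlike the linear regression output, this value is bounded between 0 and 1, so it can be read directly as a purchase probability before ranking.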
• Build gain charts

CASE DATA
✓ Customers in the first deciles have higher scores.
✓ The Percentage column (Response/Mail) represents the % of customers that purchased: in the first decile, 120 out of 160 purchased the book.

HOLD OUT DATA
✓ Customers in the first deciles have higher scores.
✓ The Percentage column (Response/Mail) represents the % of customers that purchased: in the first decile, 86 out of 230 purchased the book.
[Charts: Cumulative Gains Lift Chart Logit Case Data; Cumulative Gains Lift Chart Logit Hold Out Data. Series in each: Cumulative Baseline, Cumulative Actual Response.]
RFM REGRESSION LOGIT
[Charts: CASE DATA and HOLD OUT DATA panels comparing the three models. Series in each: Baseline, Logit Model, RFM Model, Regression Model.]

✓ Both LOGIT and REGRESSION seem to be strong models; which one should we choose?
5 PROFIT ANALYSIS

RFM PROFIT ANALYSIS
✓ Maximum profit is $12,734.78, and it is only attainable by selling to all 10 deciles.
REGRESSION PROFIT ANALYSIS
✓ Maximum profit is $22,478.26, attainable by TARGETING the first 4 deciles.
LOGIT PROFIT ANALYSIS
✓ Maximum profit is $23,143.48, attainable by TARGETING the first 4 deciles.
MAXIMUM PROFIT
✓ Maximum profit is achievable through the LOGIT model, targeting deciles 1 through 4.
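The idea of choosing how many deciles to mail can be sketched as a cumulative profit calculation. The slides do not state the margin per book or the cost per mailing, so every number below is a made-up placeholder, and the function name `best_cutoff` is hypothetical.

```python
def best_cutoff(responses_by_decile, mail_by_decile, margin_per_sale, cost_per_mailing):
    """Return (best_depth, best_profit, cumulative_profits) for mailing deciles 1..depth."""
    cumulative, profits = 0.0, []
    for resp, mail in zip(responses_by_decile, mail_by_decile):
        # Profit added by mailing this decile: margin earned minus mailing cost.
        cumulative += resp * margin_per_sale - mail * cost_per_mailing
        profits.append(cumulative)
    best_depth = max(range(len(profits)), key=lambda i: profits[i]) + 1
    return best_depth, profits[best_depth - 1], profits


# Placeholder inputs (NOT from the case): four groups of 100 mailings each.
depth, profit, curve = best_cutoff(
    responses_by_decile=[50, 30, 10, 2],
    mail_by_decile=[100, 100, 100, 100],
    margin_per_sale=5.0,      # hypothetical margin per book sold
    cost_per_mailing=1.0,     # hypothetical cost per brochure mailed
)
print(depth, profit, curve)
```

Cumulative profit rises while a decile's sales cover its mailing cost and falls once they no longer do, which is how a cutoff such as the first 4 deciles arises.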
6 CONCLUSIONS AND RECOMMENDATIONS

RECOMMENDATIONS
SHORT TERM
✓ Target deciles 1 through 4 for ‘The Art History of Florence’.
✓ For the remaining deciles, send sales promotions that appeal to customers with opposing characteristics.

LONG TERM
✓ Target a lookalike audience to reach new people who are likely to be interested in the products because they are similar to the best existing customers.
✓ Repeat this type of analysis for the other book offerings, which have a different target than the one already analyzed.
QUESTIONS?