
BIDM Assignment No. 1
Predictive Modelling Using Decision Trees
A supermarket offers a new line of organic products. The supermarket's management wants to determine which customers are likely to purchase these products.
The supermarket has a customer loyalty program. As an initial buyer incentive plan, the supermarket provided coupons for the organic products to all of the loyalty program participants and collected data that includes whether these customers purchased any of the organic products.
The ORGANICS data set contains over 22,000 observations.
The Data Mining Objective is to determine whether a customer would purchase organic products or not. The target variable (ORGYN) is a binary variable that indicates whether an individual purchased organic products. Dataset: ORGANICS (uploaded on Claroline). You need to build a Decision Tree model using SAS Enterprise Miner.
Steps to be followed:
1) Create a new folder and upload all the SAS datasets, especially the ORGANICS dataset, into the folder. Create a new library and link it to the folder. The steps to be followed are listed below.
When you open SAS 9.1, several libraries are automatically assigned and can be seen in the Explorer window. Double-click on the Libraries icon in the Explorer window.
To define a new library:
1. Right-click in the Explorer window and select New.
2. In the New Library window, type a name for the new library. For example, type CRSSAMP.
3. Type in the path name or select Browse to choose the folder to be connected with the new library name. For example, the chosen folder might be located at C:\workshop\sas\dmem.
4. If you want this library name to be connected with this folder every time you open SAS, select Enable at startup.
5. Select OK. The new library is now assigned and can be seen in the Explorer window.
6. To view the data sets that are included in the new library, double-click on the icon for Crssamp.
2) Open SAS Enterprise Miner. To start Enterprise Miner, type miner in the command box or select Solutions → Analysis → Enterprise Miner.
3) Create a new project (File → New → Project) and a diagram.
4) Drag the Input Data Source node onto the diagram workspace. Open the Input Data Source node and select ORGANICS as the source data. Select Change in the metadata sample and select use complete data as sample. Change the role of the variable ORGYN to target.
5) Connect Multiplot and Insight nodes to the Input Data Source node. Run the Multiplot node and explore the results.
6) Set the roles for the analysis variables (check that the model role for custid is set to id, while the model role for DOB, EDATE and LCDATE is set to rejected). Also, set the model role for the variables AGEGRP1, AGEGRP2 and NEIGHBORHOOD to rejected.
Set the model role for ORGYN to target (check that the measurement level is binary), while the model role of ORGANICS should be set to rejected.
As noted above, only ORGYN will be used for this analysis and should have a role of Target. (Try using ORGANICS as an input variable, report the results of the decision tree and answer the following question:
Can ORGANICS be used as an input for a model that is used to predict ORGYN?
Why or why not?)
Attach a screenshot (Shot#1) of the input data source showing the model role of all the variables.
7) Connect a Data Partition node and partition the dataset into training – 50% and validation – 30%.
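The random 50/30 split that the Data Partition node performs (the remaining 20% defaults to test) can be sketched outside Enterprise Miner. A minimal Python illustration, with a hypothetical `partition` helper and no stratification by the target:

```python
import random

def partition(rows, train=0.5, valid=0.3, seed=42):
    """Randomly split rows into training/validation/test subsets.
    Fractions are of the whole dataset; the remainder becomes test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)   # reproducible shuffle
    n = len(rows)
    n_train = int(n * train)
    n_valid = int(n * valid)
    return (rows[:n_train],
            rows[n_train:n_train + n_valid],
            rows[n_train + n_valid:])

# Stand-in for the ~22,000 ORGANICS observations (row indices only)
train, valid, test = partition(range(22000))
print(len(train), len(valid), len(test))   # 11000 6600 4400
```

In Enterprise Miner the same idea applies: the model is fit on the training rows while the validation rows are held out for pruning and assessment.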
8) Connect a Decision Tree node. Open the node and in the basic settings select Gini reduction as the splitting criterion. Keep the default stopping rules (no. of observations in a leaf node, observations required for a split search, max branches from a node, max no. of levels). You may try changing the splitting criterion and the stopping rules and see the impact on the results.
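For reference, the Gini reduction criterion selected above can be computed by hand: a split is scored by how much it lowers the size-weighted Gini impurity of the child nodes relative to the parent. A small sketch (the node counts below are made up for illustration, not ORGANICS values):

```python
def gini(counts):
    """Gini impurity of a node: 1 - sum of squared class proportions."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def gini_reduction(parent, children):
    """Parent impurity minus the size-weighted impurity of the children."""
    n = sum(parent)
    weighted = sum(sum(ch) / n * gini(ch) for ch in children)
    return gini(parent) - weighted

# Hypothetical node with 60 buyers / 40 non-buyers, split into two children
parent = [60, 40]
children = [[50, 10], [10, 30]]
print(round(gini(parent), 3))                     # 0.48
print(round(gini_reduction(parent, children), 3)) # 0.163
```

The tree algorithm evaluates candidate splits this way and keeps the one with the largest reduction; a pure node (all one class) has impurity 0.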
9) In the advanced settings, select proportion misclassified as the assessment criterion. Click on Score and select "process or score: training, validation and test", and click on "show details of validation".
10) Run the Decision Tree node and explore the results. Go to View → Tree to view the tree. Go to View → Tree Options to change the number of levels of the tree that you want to view. Go to Plot and Table to explore the plot of misclassification error vs. the number of leaves. What is the number of leaves and the corresponding misclassification error in the final selected/pruned subtree? Go to Score and variable selection to see the variables ranked in order of importance.
Attach screenshots of the tree results (tree, plot and final selected variables): Shot#2, 3, 4.
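The proportion-misclassified criterion used for pruning in step 10 is just the fraction of observations whose predicted class differs from the actual class, evaluated on the validation data for each candidate subtree. A minimal sketch with made-up labels:

```python
def misclassification_rate(actual, predicted):
    """Proportion of observations whose predicted class
    differs from the actual class."""
    assert len(actual) == len(predicted)
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)

# Hypothetical validation labels (1 = bought organics) and tree predictions
actual    = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 1, 0, 0, 0, 0, 1, 0]
print(misclassification_rate(actual, predicted))   # 0.2
```

The final subtree Enterprise Miner reports is the one minimizing this rate on validation data, which is what the error-vs.-leaves plot visualizes.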
11) Connect an Insight node to the Decision Tree node. Open the Insight node, select the entire dataset and the validation dataset. Run the Insight node and explore the results.
Attach a screenshot of the Insight node results: Shot#5.
12) Connect an Assessment node to both the Decision Tree nodes. Run and explore the results (lift charts – cumulative and non-cumulative % response chart, % captured response and lift value). What is the cumulative % response, % captured response and lift value for the decision tree at the 10th percentile and the 20th percentile? Also, what is the non-cumulative % response, % captured response and lift value at the 20th percentile?
                                      10th pct   20th pct
cumulative % response                    79         65
cumulative % captured response           32         52
cumulative lift value                    3.2        2.6
non-cumulative % response                           51
non-cumulative % captured response                  20
non-cumulative lift value                            2
Attach screenshots of the cumulative and non-cumulative % response and % captured response charts: Shot#6, Shot#7, Shot#8, Shot#9.
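The three assessment measures follow from simple definitions once the validation observations are ranked by model score. A hedged sketch (the `lift_stats` helper and the ten scored rows are invented for illustration, not ORGANICS results):

```python
def lift_stats(scored, depth):
    """Cumulative % response, % captured response and lift for the top
    `depth` fraction of observations ranked by model score.
    `scored` is a list of (score, actual_response) pairs."""
    ranked = sorted(scored, key=lambda t: t[0], reverse=True)
    n_top = int(len(ranked) * depth)
    top_resp = sum(r for _, r in ranked[:n_top])
    total_resp = sum(r for _, r in ranked)
    pct_response = 100 * top_resp / n_top       # responders among those targeted
    pct_captured = 100 * top_resp / total_resp  # share of all responders found
    base_rate = total_resp / len(ranked)
    lift = (top_resp / n_top) / base_rate       # response rate vs. baseline
    return pct_response, pct_captured, lift

# Ten hypothetical validation rows: (model score, actual response)
scored = list(zip(range(10, 0, -1), [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]))
resp, captured, lift = lift_stats(scored, 0.2)
print(round(resp), round(captured), round(lift, 1))   # 100 67 3.3
```

Read this way, a cumulative lift of 3.2 at the 10th percentile means the top-scored 10% of customers respond at 3.2 times the overall response rate.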
13) Connect a 3-way decision tree (a decision tree with max no. of branches = 3), run and view the results. Also, connect the Assessment node to the 3-way tree. Based on misclassification error and the lift charts, which model would you select (and why?) if:
a. you have to target the top 50% responders
b. you have to target the top 20% responders