
SAS 3/11/17

Setup: SAS Control Center -> SAS EM (Enterprise Miner) -> Configuration setup.


Used by large banks and consulting companies; tremendous productivity in
minutes. Business users are less tuned to programming, so it is mainly for
business use.
Cmd -> javaws (first test pass) -> Java 1.8 required (32-bit; Java SE
Runtime 8u45). To check for the right version: javaws viewer -> Java ->
View -> x86 for the 32-bit version.
Run the application with the Java runtime.
Covariance and correlation capture the degree of linear association between
two variables. Correlation ranges from -1 to 1; it is the standardized
(unitless) form of covariance.
Agency problems
Covariance is not scale-invariant and hence not comparable across datasets;
scale invariance means being independent of the units of measurement.
Correlation divides covariance by the product of the standard deviations
(which always exist), so it is scale-invariant.
Correlation can also appear between totally unrelated series (spurious
correlation between different headlines).
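A minimal sketch of the point above (the data and variable names are my own illustration, not from the notes): rescaling a variable changes its covariance with another variable but leaves the correlation untouched.

```python
# Covariance changes with units; correlation does not.
import statistics

heights_cm = [160.0, 170.0, 180.0, 175.0, 165.0]
weights    = [55.0, 68.0, 80.0, 74.0, 60.0]

def covariance(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    # standardized covariance: divide by the product of the std deviations
    return covariance(x, y) / (statistics.stdev(x) * statistics.stdev(y))

heights_m = [h / 100 for h in heights_cm]  # same data, different unit

# Covariance shrinks 100x when cm -> m; correlation is unchanged.
print(covariance(heights_cm, weights), covariance(heights_m, weights))
print(correlation(heights_cm, weights), correlation(heights_m, weights))
```

The correlation value is also guaranteed to lie in [-1, 1], which is what makes it comparable across datasets.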
Regression: what and why? It is used here for modelling, i.e. making
predictions with the input (independent) variables.
Create a New Project -> Next -> OK -> enter project name -> Next (x2) ->
Finish; the project is stored in SAS Studio (click on the link in the web
interface).
Create library: New -> Library -> Create New Library (path will be given by sir).
Applied Analytics Using Enterprise Miner (AAEM).
3 steps: create project, library, diagram.
Segment people: lapsing donors donated last year but not in the present, so
we target them. Figure out who will respond (come back) or not with the
help of a label (obs 1 donated, obs 2 didn't, etc.).
How people responded in the past tells us whom to target today (campaigns).
In '98 we model those who donated in '96-'97; donors from '95-'96 who have
stopped are the lapsing donors. Divide the data into training and test
sets, as we need evidence of how the model we make is performing. We build
a model with the training data and then predict on the other set (like an
exam with questions not from the book, but similar); otherwise we risk
overfitting. The held-out set gives evidence of generalization.
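The hold-out idea above can be sketched as follows (a generic random split; the 70/30 ratio and the record placeholders are my own illustrative choices):

```python
# Hold-out split: fit on `train` only, score on `test` to detect overfitting.
import random

random.seed(42)
records = list(range(1000))        # stand-in for 1000 donor records
random.shuffle(records)            # shuffle before splitting

split = int(0.7 * len(records))    # 70% train, 30% test (a common choice)
train, test = records[:split], records[split:]

print(len(train), len(test))       # 700 300
```

The key property is that the two sets are disjoint: the test records play the role of exam questions the model has never seen.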
Oversampling when building a model for defaulters: the response rate of a
campaign is very low in reality, e.g. an actual response rate of 5%, so we
can't build a model on data in a 19:1 ratio. The model would mostly carry
information about the good guys, and I won't be able to predict the
defaulters; the data is non-homogeneous. So take a sample, e.g. from 1
million records overall with 5% responders, build a data set with a 50:50
split, get the probability estimates from the model, and then apply the
adjustment formula to map them back to the true rate.
Models can't be built on unbalanced data.
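The "adjustment formula" the notes refer to can be sketched as the standard prior-correction step (the function name and the 5% / 50% priors are taken from the example above; the exact formula SAS EM applies should be checked against its documentation):

```python
# Map a score from a 50:50 oversampled training set back to the true
# 5% population prior (standard prior-correction sketch, my notation).

def adjust(p_sample, pop_prior=0.05, sample_prior=0.50):
    w1 = pop_prior / sample_prior              # responder reweighting
    w0 = (1 - pop_prior) / (1 - sample_prior)  # non-responder reweighting
    num = p_sample * w1
    return num / (num + (1 - p_sample) * w0)

# A 0.5 score on the balanced sample maps back to the 5% base rate.
print(adjust(0.5))   # 0.05
```

Note the endpoints are preserved: a score of 0 or 1 stays 0 or 1; only the middle of the scale is compressed toward the true prior.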
Separate sampling: responders and non-responders are sampled at arbitrarily
different rates, distorting the data (hence the adjustment afterwards).
Metadata Advisor options: variable names appear in the first column.
An ID variable is one not required in the modelling process; it serves
only as an identifier.
The role and the level of each variable are guessed by the tool from its
name and values respectively. Dummy variable example: Salary = a + b*gender,
with gender coded g=0 for male and g=1 for female.
For region, writing Salary = a + b*(region) with r=0 N, r=1 E, r=2 W, r=3 S
hardcodes an ordering into the data, which comes down to forcing a
relationship. Instead use one indicator per level: Salary =
a + bN*N + bE*E + bW*W + bS*S.
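The indicator coding above can be sketched in a few lines (the region labels follow the notes; the function name is my own):

```python
# One-hot (dummy) coding for a nominal variable like region, instead of
# forcing an ordering r=0..3 onto unordered categories.

REGIONS = ["N", "E", "W", "S"]

def one_hot(region):
    # exactly one indicator is 1, the rest are 0
    return [1 if region == r else 0 for r in REGIONS]

# Salary = a + bN*N + bE*E + bW*W + bS*S uses these indicators:
print(one_hot("E"))   # [0, 1, 0, 0]
```

In an actual regression one level is usually dropped (absorbed into the intercept a) to avoid perfect collinearity among the four indicators.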
The tool is not always able to distinguish nominal variables from interval ones.
Class Levels Count Threshold = 2 (try it out). Variables with loads of
classes that are nominal in nature and aren't required can be rejected.
Above the threshold, a qualitative variable will be treated as continuous
even though it really has, say, 99 classes.
Credit scoring model to identify defaulters: H0 = not default; H1 =
defaulter.
Type I error: non-defaulters labelled as defaulters; Type II: the reverse
(defaulters labelled as non-defaulters).

Confusion matrix (actual, predicted):
True negative (0,0)    Type I / false positive (0,1)
False negative (1,0)   True positive (1,1)

Which error to minimize depends on its cost to the company: e.g. the Type
II cost is the outstanding loan, and the Type I cost is the opportunity
cost through CLTV. Segment the customers, find the CLTV, calculate the
expected loss, and minimize the most important error, hence Type I here.
(Recovery from defaulters: extortionate interest rates or the goons.)
Estimate the probability, then classify into groups based on a threshold.
As the threshold moves, Type I error increases while Type II decreases (or
vice versa).
The cost differentiation is based on the segment's historical data.
Specificity = TN / (TN + FP)
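A small sketch tying the 2x2 matrix to these rates (the actual/predicted vectors are made-up illustrations, not data from the notes):

```python
# Count the 2x2 cells and compute specificity = TN / (TN + FP).
# Type I errors are the false positives; Type II the false negatives.

actual    = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
predicted = [0, 1, 0, 0, 1, 0, 1, 0, 1, 0]

tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # Type I
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # Type II
tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)

specificity = tn / (tn + fp)
print(tn, fp, fn, tp, specificity)
```

Raising the classification threshold shifts cases from the right column to the left: fp (Type I) falls while fn (Type II) rises, which is the trade-off described above.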
Lift is a measure of performance; the lift curve slopes downward, with
depth on the x-axis and % response on the y-axis.
Related measures: % captured response and % response rate.
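The depth-based lift measure can be sketched as follows (scores and responses are made-up; the function name is my own):

```python
# Lift at depth d: sort by model score, take the top d% of cases, and
# compare the response rate in that slice to the overall response rate.

scores    = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responses = [1,   1,   0,   1,   0,   0,   0,   1,   0,   0]

def lift_at_depth(scores, responses, depth):
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    k = max(1, int(depth * len(scores)))        # size of the top slice
    top = [responses[i] for i in order[:k]]
    overall_rate = sum(responses) / len(responses)
    return (sum(top) / k) / overall_rate

# Top 20% holds 2 responders out of 2 -> rate 1.0 vs 0.4 overall.
print(lift_at_depth(scores, responses, 0.2))   # 2.5
```

At depth 100% the slice is the whole file, so lift is 1.0 by construction; that is why the curve slopes downward from the left.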
If the campaign is not communicated well among the target customers, inflow
will be less; insufficient advertisement can lead to less inflow.
The paying capacity of old people may mean they do not avail all our
facilities, which can pose a risk to our profit margin.