Week 3: Augmented Analytics: Smart Predict

Unit 5: Classification
Classification
Classification datasets – Rows

▪ When devising a dataset for a classification or a regression, the first step is to define the row of the dataset
▪ This is constrained by the way the results will be used
▪ The “object of interest” defines this
− a customer seen at a given time
− a transaction
− a machine seen at a given time
▪ Because of this, the object of interest MUST be unique in the dataset
▪ Think about why each row present in the dataset is allowed to be there
▪ Define the relevant filters
− someone who already churned cannot be included in a dataset meant
to predict who might churn!
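The two row rules above (a unique object of interest, and filters that exclude rows that should not be there) can be sketched as a small check. This is only an illustration: the field names `customer_id`, `snapshot_date`, and `already_churned`, and both helper functions, are hypothetical and do not come from Smart Predict.

```python
from collections import Counter

# Hypothetical rows: one row per customer seen at a given time.
rows = [
    {"customer_id": "C1", "snapshot_date": "2019-01-31", "already_churned": False},
    {"customer_id": "C2", "snapshot_date": "2019-01-31", "already_churned": True},
    {"customer_id": "C3", "snapshot_date": "2019-01-31", "already_churned": False},
    {"customer_id": "C1", "snapshot_date": "2019-02-28", "already_churned": False},
]


def duplicate_keys(rows, key_fields):
    """Return the keys that appear more than once (should be empty)."""
    counts = Counter(tuple(row[field] for field in key_fields) for row in rows)
    return [key for key, n in counts.items() if n > 1]


def eligible_rows(rows):
    """Keep only the rows that can still experience the event to predict."""
    return [row for row in rows if not row["already_churned"]]


key = ("customer_id", "snapshot_date")
print(duplicate_keys(rows, key))  # keys seen more than once, if any
print(len(eligible_rows(rows)))   # number of rows left after filtering
```

Here the object of interest is "a customer seen at a given time", so the uniqueness key combines both fields; using `customer_id` alone would flag the customer who appears in two snapshots.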

© 2019 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 2


Classification datasets – Columns

▪ Among the columns, one has a specific role. We call it the target. It is the
value to predict for each object of interest.
▪ In Smart Predict, if the target is
− binomial (2 categories), a classification can be used
− continuous, a regression can be used

We cannot use what is not known to learn, i.e. the target cannot have missing values.
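As a rough illustration of the rule above, the sketch below picks a model type from a target column and rejects missing target values. The function name `choose_model_type` is hypothetical, and treating any numeric column with more than two distinct values as "continuous" is a simplifying assumption, not Smart Predict's actual logic.

```python
def choose_model_type(target_values):
    """Suggest a model type from the values of the target column.

    Raises ValueError if any target value is missing (None), because
    a row with an unknown target cannot be used for learning.
    """
    if any(value is None for value in target_values):
        raise ValueError("the target cannot have missing values")
    distinct = set(target_values)
    if len(distinct) == 2:
        # Two categories: a classification can be used.
        return "classification"
    if all(isinstance(value, (int, float)) for value in distinct):
        # Assumption: a numeric target is treated as continuous.
        return "regression"
    return "unsupported"


print(choose_model_type(["yes", "no", "yes"]))
print(choose_model_type([12.5, 30.0, 7.25]))
```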

Classification and regression: What are they for?

Classification is about producing the probability that an event will happen

Who among my customers is likely to react positively to my outbound marketing campaign?

Regression is about estimating a number

How much would they spend on my e-site if they did?

[Figure: two panels, “Classification” and “Regression: Interpolation”; labels: Known Data, Model Built, New Data]

Classification is more about ranking than clustering

[Figure: classification ranks the customers, from “Customers who are most likely to answer positively” to “Customers who are less likely to answer positively”]

Above this limit: do not contact them – no ROI


This limit (threshold) depends on the operational constraints

Classification and regression – How to assess quality

Predictive power and prediction confidence

▪ The red line is no model (i.e. random picking)
▪ The green curve is the perfect model (i.e. as if the outcome were known in advance)
▪ The yellow and blue curves correspond to the percentage of positive cases detected when ordering by decreasing score/probability

Predictive power represents how close the model is to the perfect model (quality)
Predictive power = (area between Validation and Random curves) / (area between Perfect and Random curves) = C/(A+B+C)
− = 0 ➔ bad quality
− between 0.75 and 0.97 ➔ acceptable quality
− >= 0.98 ➔ certainly dependent variables (too good to be true: an input is probably derived from the target)

Prediction confidence expresses the ability to reproduce the same detection (robustness)
You need a “validation sample” to estimate this KPI: it represents another view of the same population
Prediction confidence = 1 - (area between Validation and Training curves) / (area between Perfect and Random curves) = 1 - B/(A+B+C)
− >= 0.95 ➔ good robustness
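The two ratios above can be reproduced on a toy example. The sketch below is not SAP's implementation: the function names are hypothetical, the perfect curve is taken from the validation sample, and the "area between Validation and Training" is approximated by the difference of the two areas under the curves, which assumes the curves do not cross.

```python
def gain_curve_area(scores, labels):
    """Area under the '% detected target' curve, ordering by decreasing score."""
    ranked = [label for _, label in sorted(zip(scores, labels), key=lambda p: -p[0])]
    total_positives = sum(ranked)
    n = len(ranked)
    area, detected = 0.0, 0.0
    for label in ranked:
        previous = detected
        detected += label / total_positives
        area += (previous + detected) / 2 / n  # trapezoid of width 1/n
    return area


def predictive_kpis(train_scores, train_labels, valid_scores, valid_labels):
    """Return (predictive power, prediction confidence) from the curve areas."""
    random_area = 0.5  # diagonal: random picking
    perfect_area = gain_curve_area(valid_labels, valid_labels)  # score = label
    train_area = gain_curve_area(train_scores, train_labels)
    valid_area = gain_curve_area(valid_scores, valid_labels)
    total = perfect_area - random_area                      # A + B + C
    power = (valid_area - random_area) / total              # C / (A + B + C)
    confidence = 1 - abs(train_area - valid_area) / total   # 1 - B / (A + B + C)
    return power, confidence


# Made-up scores and labels (1 = positive case), for illustration only.
train_scores, train_labels = [0.9, 0.8, 0.7, 0.1], [1, 1, 0, 0]
valid_scores, valid_labels = [0.9, 0.8, 0.7, 0.1], [1, 0, 1, 0]
print(predictive_kpis(train_scores, train_labels, valid_scores, valid_labels))
```

In this toy data the model ranks the training sample perfectly but not the validation sample, so both KPIs come out well below the thresholds quoted above.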
Variable contribution – Predictive models are not black boxes

▪ Shows the relative contribution of each variable in the model. It represents how a variable influences the event that you want to predict
▪ For each variable, you can also pay attention to the influence of the categories (age ranges, product lines, …)
− The way the categories impact the model is shown in the debrief under section “Grouped Category Influence”
− “Grouped Category Statistics” shows each group with its frequency of positive cases and its global frequency

Confusion matrix – Choose the threshold

The confusion matrix is a way to navigate along the “% detected target” curve

I have the budget to contact 30% of the population

75% of the customers who would be interested in my proposition will be in my selection (and 25% will not)
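Reading one point off the "% detected target" curve can be sketched as follows. The function name and the ten scored customers are made up for illustration; the data was chosen so that contacting the top 30% captures three of the four interested customers, mirroring the figures quoted above.

```python
def detected_target_share(scores, labels, contacted_fraction):
    """Share of all positive cases found among the top-scored fraction."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_contacted = round(contacted_fraction * len(ranked))
    found = sum(label for _, label in ranked[:n_contacted])
    return found / sum(labels)


# Hypothetical scored population (1 = would be interested).
scores = [0.95, 0.90, 0.85, 0.60, 0.55, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]

share = detected_target_share(scores, labels, contacted_fraction=0.30)
print(f"{share:.0%} of interested customers are in the selection")
```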

Confusion matrix – Choose the threshold

▪ The confusion matrix is provided to SAC Smart Predict users to select the threshold in full knowledge of the consequences
▪ The “errors” can be expressed differently, depending on what is significant for the business: costs, results, risk level, …

False positive rate = FP/(FP+TN)
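To make the ratio FP/(FP+TN) concrete, the sketch below counts the four cells of a confusion matrix at a chosen threshold and derives that ratio from them. The function names and the data are hypothetical, and predicting "positive" when the score is greater than or equal to the threshold is an assumed convention.

```python
def confusion_matrix(scores, labels, threshold):
    """Count TP, FP, TN, FN when predicting positive for score >= threshold."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            counts["TP"] += 1
        elif predicted_positive and label == 0:
            counts["FP"] += 1
        elif not predicted_positive and label == 0:
            counts["TN"] += 1
        else:
            counts["FN"] += 1
    return counts


def false_positive_rate(counts):
    """FP / (FP + TN): share of actual negatives wrongly selected."""
    return counts["FP"] / (counts["FP"] + counts["TN"])


# Hypothetical scored population (1 = positive case).
scores = [0.95, 0.90, 0.85, 0.60, 0.55, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]

for threshold in (0.85, 0.50):
    cm = confusion_matrix(scores, labels, threshold)
    print(threshold, cm, round(false_positive_rate(cm), 3))
```

Lowering the threshold catches more positive cases but also selects more negatives, which is the trade-off the threshold choice has to settle.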

Profit simulation – Optimize the threshold automatically

▪ Estimate the optimal percentage of the population to contact to get maximum ROI
▪ This implies knowing average costs and gains
▪ ROI = Probability × Gain – Cost
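The ROI formula above can be turned into a small simulation. This is a sketch under stated assumptions, not Smart Predict's profit simulation: the function name is hypothetical, the model score is treated as a calibrated probability, and the gain per positive answer and the cost per contact are assumed to be the same for every customer.

```python
def best_contact_fraction(probabilities, gain, cost):
    """Return (fraction of population to contact, expected profit).

    Customers are contacted in order of decreasing probability; each
    contact adds probability * gain - cost to the expected profit.
    """
    ranked = sorted(probabilities, reverse=True)
    best_fraction, best_profit = 0.0, 0.0
    running_profit = 0.0
    for count, probability in enumerate(ranked, start=1):
        running_profit += probability * gain - cost
        if running_profit > best_profit:
            best_fraction = count / len(ranked)
            best_profit = running_profit
    return best_fraction, best_profit


# Hypothetical probabilities, average gain, and average cost per contact.
probabilities = [0.9, 0.6, 0.3, 0.1, 0.05]
fraction, profit = best_contact_fraction(probabilities, gain=100, cost=20)
print(f"contact the top {fraction:.0%}, expected profit {profit:.0f}")
```

With these numbers, contacting stops paying off once Probability × Gain drops below Cost, that is, below a probability of 0.2.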

Key takeaways

You have learnt:

1) What a classification is and when to use it
2) The steps of a classification workflow in SAC Smart Predict
– Create and train a predictive scenario
– Inspect your dataset to get insights
– When to use a classification
– Use the results of a classification

Thank you.
Contact information:

open@sap.com

www.sap.com/contactsap

© 2019 SAP SE or an SAP affiliate company. All rights reserved.


No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of
SAP SE or an SAP affiliate company.
The information contained herein may be changed without prior notice. Some software products marketed by SAP SE and its
distributors contain proprietary software components of other software vendors. National product specifications may vary.
These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or
warranty of any kind, and SAP or its affiliated companies shall not be liable for errors or omissions with respect to the materials.
The only warranties for SAP or SAP affiliate company products and services are those that are set forth in the express warranty
statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional
warranty.
In particular, SAP SE or its affiliated companies have no obligation to pursue any course of business outlined in this document or
any related presentation, or to develop or release any functionality mentioned therein. This document, or any related presentation,
and SAP SE’s or its affiliated companies’ strategy and possible future developments, products, and/or platforms, directions, and
functionality are all subject to change and may be changed by SAP SE or its affiliated companies at any time for any reason
without notice. The information in this document is not a commitment, promise, or legal obligation to deliver any material, code, or
functionality. All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ
materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, and they
should not be relied upon in making purchasing decisions.
SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered
trademarks of SAP SE (or an SAP affiliate company) in Germany and other countries. All other product and service names
mentioned are the trademarks of their respective companies.
See www.sap.com/copyright for additional trademark information and notices.
