SAP Classification

Week 3: Augmented Analytics: Smart Predict
Unit 5: Classification
Classification
Classification datasets – Rows
▪ When devising a dataset for a classification or a regression, the first step is to define the row of the dataset
▪ This is constrained by the way the results will be used
▪ The “object of interest” defines this
− a customer seen at a given time
− a transaction
− a machine seen at a given time
▪ Because of this, the object of interest MUST be unique in the dataset
▪ Check whether every row present in the dataset is allowed to be there
▪ Define the relevant filters
− someone who already churned cannot be included in a dataset meant
to predict who might churn!
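The two checks above (uniqueness of the object of interest, and filtering out rows that should not be there) can be sketched as follows. This is a minimal illustration in plain Python, not Smart Predict functionality; the field names `customer_id`, `snapshot_date`, and `already_churned` and the sample rows are hypothetical.

```python
from collections import Counter

# Hypothetical rows: the object of interest is "a customer seen at a given time".
rows = [
    {"customer_id": "C1", "snapshot_date": "2019-01-31", "already_churned": False},
    {"customer_id": "C2", "snapshot_date": "2019-01-31", "already_churned": True},
    {"customer_id": "C1", "snapshot_date": "2019-02-28", "already_churned": False},
    {"customer_id": "C3", "snapshot_date": "2019-01-31", "already_churned": False},
]

def duplicated_objects(rows, key_fields):
    """Return the keys that appear more than once for the given key fields."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return [key for key, n in counts.items() if n > 1]

def eligible_rows(rows):
    """Keep only customers who have not already churned."""
    return [row for row in rows if not row["already_churned"]]

# The object of interest must be unique: no (customer, date) pair may repeat.
duplicates = duplicated_objects(rows, ("customer_id", "snapshot_date"))

# Relevant filter: drop customers who already churned before training.
training_rows = eligible_rows(rows)
```

Using `customer_id` alone as the key would report `C1` as duplicated, which shows why the object of interest has to be defined before the dataset is built.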
© 2019 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC 2

Classification datasets – Columns
▪ Among the columns, one has a specific role. We call it the target. It is the
object of interest.
▪ In Smart Predict, if the target is
− binomial (2 categories), a classification can be used
− continuous, a regression can be used
We cannot use what is not known to learn, i.e. the target cannot have missing values.
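As a rough sketch of the rule above, the helper below inspects a target column and suggests a model type. It is an illustration only: the function name `check_target` is hypothetical, and "continuous" is approximated here as "numeric with more than two distinct values", which is a simplifying assumption.

```python
def check_target(values):
    """Suggest a model type from the values of the target column."""
    # The target cannot have missing values: we cannot learn from what is not known.
    if any(v is None for v in values):
        raise ValueError("the target cannot have missing values")
    # Exactly two categories: a classification can be used.
    if len(set(values)) == 2:
        return "classification"
    # Numeric target (assumed continuous): a regression can be used.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "regression"
    return "unsupported"

churn_target = ["yes", "no", "no", "yes"]   # hypothetical binomial target
spend_target = [12.5, 30.0, 7.2, 18.9]      # hypothetical continuous target

churn_model = check_target(churn_target)
spend_model = check_target(spend_target)
```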

Classification and regression: What are they for?
Classification is about producing the probability that an event will happen
Who among my customers is likely to react positively to my outbound marketing campaign?
Regression is about estimating a number
How much would they spend on my e-site if they did?
[Figure: a model is built on known data and applied to new data. Regression: interpolation]

Classification is more about ranking than clustering
[Figure: customers ranked from most likely to less likely to answer positively, with a limit drawn across the ranking]
▪ Beyond this limit: do not contact them – no ROI
▪ This limit (threshold) depends on the operational constraints
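The ranking idea can be illustrated with a short sketch: order the customers by decreasing probability and cut the list where the operational constraint says so. The customer names, the probabilities, and the `capacity` parameter are hypothetical.

```python
# Hypothetical probabilities of a positive answer, as a classification might produce.
scores = {"Ann": 0.82, "Bob": 0.15, "Cara": 0.47, "Dev": 0.91, "Eli": 0.05}

def select_to_contact(scores, capacity):
    """Rank customers by decreasing probability and keep the first `capacity`."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:capacity]

# Operational constraint (assumed): we can only contact two customers.
selected = select_to_contact(scores, capacity=2)
```

The model does not assign customers to groups; it only orders them, and the position of the cut is a business decision.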
Classification and regression – How to assess quality

Predictive power and prediction confidence
▪ Red line is no model (i.e. random picking)
▪ Green curve is the perfect model (i.e. if I knew the outcome in advance)
▪ Yellow and blue curves (training and validation) show the percentage of positive cases detected when ordering by decreasing score/probability
Predictive power represents how close the model is to the perfect model (quality)
Predictive power = (area between Validation and Random curves) / (area between Perfect and Random curves) = C/(A+B+C)
where A, B and C are the areas between the Perfect and Training curves, the Training and Validation curves, and the Validation and Random curves respectively
− = 0 ➔ bad quality
− between 0.75 and 0.97 ➔ acceptable quality
− >= 0.98 ➔ suspiciously high: some variables are almost certainly dependent on the target
Prediction confidence expresses the ability to reproduce the same detection on new data (robustness)
You need a « validation sample » to estimate this KPI: it represents another view of the same population
Prediction confidence = 1 – (area between Validation and Training curves) / (area between Perfect and Random curves) = 1 – B/(A+B+C)
− >= 0.95 ➔ good robustness
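The two ratios can be reproduced on a toy example. This is a sketch of the area formulas as stated above, not Smart Predict's implementation. It makes several assumptions: areas are measured with the trapezoid rule, the area between the Training and Validation curves is approximated by the absolute difference of their areas (exact only if the curves do not cross), the Perfect curve is taken from the validation sample, and both classes are present. The scores and labels are invented.

```python
def gain_curve_area(scores, labels):
    """Area under the '% detected target' curve when ordering by decreasing score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos, n = sum(labels), len(labels)
    area, detected = 0.0, 0
    for i in order:
        prev = detected
        detected += labels[i]
        # Trapezoid for one step of width 1/n on the population axis.
        area += (prev + detected) / (2 * total_pos * n)
    return area

RANDOM_AREA = 0.5  # area under the diagonal (random picking)

def predictive_power(val_scores, val_labels):
    perfect = gain_curve_area(val_labels, val_labels)  # positives ranked first
    model = gain_curve_area(val_scores, val_labels)
    return (model - RANDOM_AREA) / (perfect - RANDOM_AREA)

def prediction_confidence(train_scores, train_labels, val_scores, val_labels):
    perfect = gain_curve_area(val_labels, val_labels)
    gap = abs(gain_curve_area(train_scores, train_labels)
              - gain_curve_area(val_scores, val_labels))
    return 1 - gap / (perfect - RANDOM_AREA)

# Hypothetical samples: 1 = positive case, 0 = negative case.
train_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
train_labels = [1, 1, 1, 0, 0, 0, 0, 0]
val_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
val_labels = [1, 1, 0, 1, 0, 0, 0, 0]

power = predictive_power(val_scores, val_labels)
confidence = prediction_confidence(train_scores, train_labels, val_scores, val_labels)
```

With so few rows the numbers are only illustrative; the point is that both indicators are ratios of areas between the same four curves.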
Variable contribution – Predictive models are not black boxes
▪ Shows the relative contribution of each variable in the model. It represents how a variable influences the event that you want to predict
▪ For each variable, you can also pay attention to the influence of the categories (age ranges, product lines, …)
− The way the categories impact the model is shown in the debrief under section “Grouped Category Influence”
− Grouped Category Statistics shows each group with its frequency of positive cases and its global frequency
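The two frequencies mentioned for each group can be computed by hand on a small example. This sketch only reproduces the descriptive statistics (share of rows in the group, and share of positive cases within the group); it does not reproduce how Smart Predict groups categories or computes their influence. The age ranges and labels are hypothetical.

```python
from collections import defaultdict

def category_statistics(categories, labels):
    """For each category: its global frequency and its frequency of positive cases."""
    counts = defaultdict(lambda: [0, 0])  # category -> [rows, positive rows]
    for category, label in zip(categories, labels):
        counts[category][0] += 1
        counts[category][1] += label
    n = len(labels)
    return {
        category: {"global_frequency": rows / n, "positive_frequency": pos / rows}
        for category, (rows, pos) in counts.items()
    }

# Hypothetical variable "age range" and target (1 = positive case).
age_range = ["18-30", "18-30", "31-50", "31-50", "31-50", "51+", "51+", "51+"]
target = [1, 1, 0, 1, 0, 0, 0, 0]

stats = category_statistics(age_range, target)
```

A category whose positive frequency is far from the overall positive rate is one that pulls the prediction up or down.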

Confusion matrix – Choose the threshold
The confusion matrix is a way to navigate along the « % detected target » curve
Example: I have the budget to contact 30% of the population
➔ 75% of the customers who would be interested in my proposition will be in my selection (and 25% will not)
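Reading one point of the « % detected target » curve can be sketched as below. The ten scores and labels are invented so that the result matches the 30% / 75% example; the function name `detected_share` is hypothetical.

```python
def detected_share(scores, labels, contacted_share):
    """Share of all positive cases captured when contacting the top-scored customers."""
    n_contacted = round(contacted_share * len(scores))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = sum(labels[i] for i in order[:n_contacted])
    return hits / sum(labels)

# Hypothetical population of 10 customers, 4 of whom would be interested.
scores = [0.95, 0.90, 0.85, 0.60, 0.55, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]

# Budget: contact 30% of the population.
share_detected = detected_share(scores, labels, contacted_share=0.30)
```

Here the three best-scored customers contain three of the four interested ones, so 75% of the target is detected and 25% is missed.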

▪ The confusion matrix is provided to SAC Smart Predict users so that they can select the threshold in full knowledge of the consequences
▪ The « errors » can be expressed differently, depending on what is significant for the business: costs, results, risk level, …
− e.g. false positive rate = FP/(FP+TN)
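To make the consequences of a threshold concrete, the sketch below counts the four cells of the confusion matrix for a chosen threshold and derives FP/(FP+TN) from them. The scores, labels, and threshold value are hypothetical, and treating a score at or above the threshold as a predicted positive is an assumed convention.

```python
def confusion_matrix(scores, labels, threshold):
    """Count TP, FP, TN, FN when predicting positive for score >= threshold."""
    cm = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label:
            cm["TP"] += 1
        elif predicted_positive:
            cm["FP"] += 1
        elif label:
            cm["FN"] += 1
        else:
            cm["TN"] += 1
    return cm

def false_positive_rate(cm):
    """FP/(FP+TN): share of actual negatives wrongly selected."""
    return cm["FP"] / (cm["FP"] + cm["TN"])

# Hypothetical scores and actual outcomes (1 = positive case).
scores = [0.95, 0.90, 0.85, 0.60, 0.55, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]

cm = confusion_matrix(scores, labels, threshold=0.55)
fpr = false_positive_rate(cm)
```

Moving the threshold up or down trades false positives against false negatives, which is why the choice depends on what each kind of error costs the business.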

Profit simulation – Optimize the threshold automatically
▪ Estimate the optimal percentage of population to contact to get maximum ROI

▪ This implies knowing average costs and gains
▪ ROI = Probability x Gain – Cost
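A minimal sketch of such a simulation, assuming one average gain per positive answer and one average cost per contact: customers are ranked by probability, the per-contact ROI (Probability x Gain – Cost) is accumulated, and the share of the population with the highest cumulative profit is kept. The probabilities, gain, and cost are hypothetical, and this is not Smart Predict's own algorithm.

```python
def best_share_to_contact(probabilities, gain, cost):
    """Return (share of population to contact, expected profit) maximizing profit."""
    ranked = sorted(probabilities, reverse=True)
    best_profit, best_k, running = 0.0, 0, 0.0
    for k, p in enumerate(ranked, start=1):
        running += p * gain - cost  # expected ROI of contacting this customer
        if running > best_profit:
            best_profit, best_k = running, k
    return best_k / len(ranked), best_profit

# Hypothetical inputs: average gain of 100 per positive answer, cost of 30 per contact.
probabilities = [0.9, 0.7, 0.4, 0.2, 0.1]
share, profit = best_share_to_contact(probabilities, gain=100, cost=30)
```

In this example the fourth and fifth customers have a negative expected ROI, so the optimum is to contact the top three (60% of the population).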

Key takeaways
You have learnt:
1) What a classification is and when to use it
2) The steps of a classification workflow in SAC Smart Predict
– Create and train a predictive scenario
– Inspect our dataset to get insights
– When to use a classification
– Use the results of a classification

Thank you.
Contact information:
open@sap.com
Follow all of SAP
www.sap.com/contactsap
© 2019 SAP SE or an SAP affiliate company. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of
SAP SE or an SAP affiliate company.
The information contained herein may be changed without prior notice. Some software products marketed by SAP SE and its
distributors contain proprietary software components of other software vendors. National product specifications may vary.
These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or
warranty of any kind, and SAP or its affiliated companies shall not be liable for errors or omissions with respect to the materials.
The only warranties for SAP or SAP affiliate company products and services are those that are set forth in the express warranty
statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional
warranty.
In particular, SAP SE or its affiliated companies have no obligation to pursue any course of business outlined in this document or
any related presentation, or to develop or release any functionality mentioned therein. This document, or any related presentation,
and SAP SE’s or its affiliated companies’ strategy and possible future developments, products, and/or platforms, directions, and
functionality are all subject to change and may be changed by SAP SE or its affiliated companies at any time for any reason
without notice. The information in this document is not a commitment, promise, or legal obligation to deliver any material, code, or
functionality. All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ
materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, and they
should not be relied upon in making purchasing decisions.
SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered
trademarks of SAP SE (or an SAP affiliate company) in Germany and other countries. All other product and service names
mentioned are the trademarks of their respective companies.
See www.sap.com/copyright for additional trademark information and notices.
