Unit 5: Classification
Classification
Classification datasets – Rows
▪ Among the columns, one has a specific role. We call it the target. It is the
object of interest.
▪ In Smart Predict, if the target is
− binomial (2 categories), a classification can be used
− continuous, a regression can be used
A model cannot learn from what is not known, i.e. the target cannot have missing values in the data used for training.
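As a minimal sketch of the two rules above (known target only, and model type chosen from the kind of target), the following plain Python example separates usable rows from rows with a missing target. The column name `churned`, the sample rows and both helper functions are hypothetical illustrations, not part of Smart Predict.

```python
# Hypothetical dataset: each dict is one row; "churned" is the assumed target column.
rows = [
    {"customer_id": 1, "age": 34, "churned": "yes"},
    {"customer_id": 2, "age": 51, "churned": "no"},
    {"customer_id": 3, "age": 27, "churned": None},  # target unknown: cannot be learned from
    {"customer_id": 4, "age": 45, "churned": "no"},
]


def split_by_target(rows, target):
    """Separate rows with a known target from rows where it is missing."""
    known = [r for r in rows if r.get(target) is not None]
    unknown = [r for r in rows if r.get(target) is None]
    return known, unknown


def suggest_model_type(known_rows, target):
    """Suggest a model type from the values observed in the target column."""
    values = [r[target] for r in known_rows]
    if len(set(values)) == 2:
        return "classification"  # two categories
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "regression"      # continuous (numeric) target
    return "unsupported"


known, unknown = split_by_target(rows, "churned")
model_type = suggest_model_type(known, "churned")
print(len(known), len(unknown), model_type)
```

Rows that end up in `unknown` are not discarded in practice: they are the kind of data a trained model would later be applied to.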
Figure: a model is built from known data and applied to new data, where the target is unknown (???). Panels: Classification; Regression: Interpolation.
Predictive power measures how close the model is to the perfect model (quality).
Predictive power = (area between Validation and Random curves) / (area between Perfect and Random curves) = C/(A+B+C)
− = 0 ➔ bad quality
− between 0.75 and 0.97 ➔ quality acceptable
− >= 0.98 ➔ certainly dependent variables
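The ratio of areas can be sketched numerically. The example below assumes the curves are cumulative gain curves (share of positive cases captured as the population is sorted by decreasing score) and that the random curve is the diagonal, with area 0.5. The function names are hypothetical, and this is an illustration of the ratio rather than the way Smart Predict computes it.

```python
def gain_curve(scores, labels):
    """Cumulative share of positives captured, rows sorted by decreasing score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(labels)  # assumes at least one positive label
    captured, curve = 0, [0.0]
    for i in order:
        captured += labels[i]
        curve.append(captured / total_pos)
    return curve


def area_under(curve):
    """Trapezoidal area under a curve sampled at evenly spaced x in [0, 1]."""
    n = len(curve) - 1
    return sum((curve[k] + curve[k + 1]) / 2 for k in range(n)) / n


def predictive_power(scores, labels):
    """(validation area - random area) / (perfect area - random area)."""
    validation = area_under(gain_curve(scores, labels))
    # Using the labels themselves as scores yields the perfect ordering.
    perfect = area_under(gain_curve(labels, labels))
    random_area = 0.5  # area under the diagonal
    return (validation - random_area) / (perfect - random_area)


labels = [1, 0, 1, 0]            # 1 = positive case
scores = [0.9, 0.8, 0.7, 0.1]    # hypothetical model scores on a validation sample
print(predictive_power(scores, labels))
```

A model that ranks every positive case ahead of every negative one reaches 1, and a model that ranks no better than chance stays near 0.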
Prediction confidence expresses the model's ability to reproduce the same detection on new data (robustness).
You need a "validation sample" to estimate this KPI: it represents another view of the same population.
Prediction confidence = 1 – (area between Validation and Training curves) / (area between Perfect and Random curves) = 1 – B/(A+B+C)
− >= 0.95 ➔ good robustness
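A small worked example of this formula, as a sketch only: it takes the areas under the training, validation and perfect curves as inputs and assumes the training and validation curves do not cross, so that B is simply the difference of their areas. The function names and the numeric areas are hypothetical.

```python
def prediction_confidence(training_area, validation_area, perfect_area, random_area=0.5):
    """Return 1 - B / (A + B + C), computed from areas under the curves."""
    # B: gap between the training and validation curves (assumes they do not cross).
    b = abs(training_area - validation_area)
    # A + B + C: gap between the perfect and random curves.
    total = perfect_area - random_area
    if total <= 0:
        raise ValueError("perfect_area must be greater than random_area")
    return 1 - b / total


def is_robust(confidence, threshold=0.95):
    """Apply the >= 0.95 robustness threshold."""
    return confidence >= threshold


# Hypothetical areas under the curves.
confidence = prediction_confidence(training_area=0.70, validation_area=0.69, perfect_area=0.75)
print(round(confidence, 2), is_robust(confidence))
```

The closer the validation curve stays to the training curve, the smaller B becomes and the closer the confidence gets to 1.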
© 2019 SAP SE or an SAP affiliate company. All rights reserved. ǀ PUBLIC
Variable contribution – Predictive models are not black boxes
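False positive rate = FP/(FP+TN), where FP is the number of false positives and TN the number of true negatives.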
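To make the ratio concrete, here is a short sketch that counts the four confusion-matrix cells from actual and predicted labels and derives FP/(FP+TN). The label vectors and function names are hypothetical, with 1 standing for the positive category.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp, fp, tn, fn


def false_positive_rate(actual, predicted):
    """FP / (FP + TN): share of actual negatives wrongly predicted as positive."""
    _, fp, tn, _ = confusion_counts(actual, predicted)
    if fp + tn == 0:
        raise ValueError("no actual negatives in the data")
    return fp / (fp + tn)


actual = [1, 0, 0, 1, 0, 0]      # hypothetical known targets
predicted = [1, 1, 0, 0, 0, 0]   # hypothetical model predictions
print(confusion_counts(actual, predicted), false_positive_rate(actual, predicted))
```

In this example one of the four actual negatives is predicted as positive, which gives a rate of 1/4.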