
Business Intelligence and Analytics:

Systems for Decision Support


(10th Edition)

Chapter 6:
Techniques for Predictive
Modeling
Learning Objectives
 Understand the concept and definitions of
artificial neural networks (ANN)
 Learn the different types of ANN
architectures
 Know how learning happens in ANN
 Become familiar with ANN applications
 Understand the sensitivity analysis in ANN
 Understand the concept and structure of
support vector machines (SVM) (Continued…)
6-2 Copyright © 2014 Pearson Education, Inc.
Learning Objectives
 Learn the advantages and disadvantages
of SVM compared to ANN
 Understand the concept and formulation
of k-nearest neighbor algorithm (kNN)
 Learn the process of applying kNN
 Learn the advantages and disadvantages
of kNN compared to ANN and SVM



Opening Vignette…
Predictive Modeling Helps Better
Understand and Manage Complex
Medical Procedures
 Situation
 Problem
 Solution
 Results
 Answer & discuss the case questions.
Questions for the Opening Vignette
1. Why is it important to study medical procedures?
What is the value in predicting outcomes?
2. What factors do you think are the most important in
better understanding and managing healthcare?
3. What would be the impact of predictive modeling
on healthcare and medicine? Can predictive
modeling replace medical or managerial personnel?
4. What were the outcomes of the study? Who can
use these results? How can they be implemented?
5. Search the Internet to locate two additional cases in
managing complex medical procedures.
Opening Vignette – A Process Map for Training and Testing Four Predictive Models
Opening Vignette
The Comparison of Four Models


Neural Network Concepts
 Neural networks (NN): a brain metaphor for
information processing
 Neural computing
 Artificial neural network (ANN)
 Many uses for ANN:
 pattern recognition, forecasting, prediction, and
classification
 Many application areas
 finance, marketing, manufacturing, operations,
information systems, and so on
Biological Neural Networks

 Two interconnected brain cells (neurons), each with dendrites, a soma (cell body), and an axon; the cells communicate across synapses


Processing Information in ANN

 A single neuron (processing element – PE) with inputs and outputs
 Inputs x1, x2, …, xn arrive with connection weights w1, w2, …, wn
 Summation function: S = Σ (i = 1..n) Xi Wi
 Transfer function: Y = f(S), producing the outputs Y1, Y2, …, Yn
Biology Analogy



Application Case 6.1
Neural Networks Are Helping to Save
Lives in the Mining Industry
Questions for Discussion
1. How did neural networks help save lives in the mining
industry?
2. What were the challenges, the proposed solution, and
the obtained results?
Elements of ANN
 Processing element (PE)
 Network architecture
 Hidden layers
 Parallel processing
 Network information processing
 Inputs
 Outputs
 Connection weights
 Summation function
Elements of ANN

 Neural network with one hidden layer: inputs x1, x2, x3 feed the processing elements (PEs) of the input layer; each hidden-layer PE computes a weighted sum (S) and applies a transfer function (f); the output layer produces Y1


Elements of ANN
 Summation function for a single neuron (a) and several neurons (b)
 PE: Processing Element (or neuron)

(a) Single neuron:
    Y = X1 W1 + X2 W2

(b) Multiple neurons:
    Y1 = X1 W11 + X2 W21
    Y2 = X1 W12 + X2 W22
    Y3 = X2 W23
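The per-neuron summation functions in panel (b) amount to a single matrix product; a minimal NumPy sketch of the two-neuron case, with made-up input and weight values:

```python
import numpy as np

# Made-up inputs X1, X2 and weights Wij (from input i to neuron j)
X = np.array([0.5, 1.0])
W = np.array([[0.2, 0.7],   # W11, W12
              [0.4, 0.3]])  # W21, W22

# Yj = X1*W1j + X2*W2j, i.e. the summation function for each neuron j
Y = X @ W
print(Y)  # Y1 = 0.5, Y2 = 0.65
```

Stacking the weights as a matrix is exactly how ANN libraries evaluate a whole layer at once instead of one neuron at a time.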


Elements of ANN
 Transformation (Transfer) Function
 Linear function
 Sigmoid (logistic activation) function [0, 1]
 Tangent hyperbolic function [-1, 1]

 Worked example with inputs X1 = 3, X2 = 1, X3 = 2 and weights W1 = 0.2, W2 = 0.4, W3 = 0.1:
 Summation function: Y = 3(0.2) + 1(0.4) + 2(0.1) = 1.2
 Transfer function: YT = 1/(1 + e^-1.2) = 0.77

 Threshold value?
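The worked example above can be checked in a few lines of Python:

```python
import math

# Inputs and weights from the slide's worked example
X = [3, 1, 2]
W = [0.2, 0.4, 0.1]

# Summation function: Y = X1*W1 + X2*W2 + X3*W3
Y = sum(x * w for x, w in zip(X, W))

# Sigmoid (logistic) transfer function: YT = 1 / (1 + e^-Y)
YT = 1 / (1 + math.exp(-Y))

print(round(Y, 1))   # 1.2
print(round(YT, 2))  # 0.77
```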


Neural Network Architectures
 Architecture of a neural network is driven by
the task it is intended to address
 Classification, regression, clustering, general
optimization, association, ….
 Most popular architecture: Feedforward,
multi-layered perceptron with
backpropagation learning algorithm
 Used for both classification and regression type
problems
 Others – Recurrent, self-organizing feature
maps, Hopfield networks, …
Neural Network Architectures
Feed-Forward Neural Networks

 Feed-forward MLP with one hidden layer: socio-demographic, religious, financial, and other input variables feed the INPUT LAYER; the HIDDEN LAYER feeds the OUTPUT LAYER, which predicts (vs. actual) whether a person voted "yes" or "no" to legalizing gaming


Neural Network Architectures
Recurrent Neural Networks



Other Popular ANN Paradigms
Self-Organizing Maps (SOM)

 First introduced by the Finnish professor Teuvo Kohonen
 Applies to clustering type problems
 In the figure, inputs 1–3 feed a two-dimensional grid of map nodes


Other Popular ANN Paradigms
Hopfield Networks

 First introduced by John Hopfield
 Highly interconnected neurons
 Applies to solving complex computational problems (e.g., optimization problems)


Application Case 6.2
Predictive Modeling is Powering the
Power Generators

Questions for Discussion


1. What are the key environmental concerns in the
electric power industry?
2. What are the main application areas for predictive
modeling in the electric power industry?
3. How was predictive modeling used to address a
variety of problems in the electric power industry?
Development Process of an ANN



An MLP ANN Structure for
the Box-Office Prediction Problem

Input variables:
 MPAA Rating (5) (G, PG, PG13, R, NR)
 Competition (3) (High, Medium, Low)
 Star Value (3) (High, Medium, Low)
 Genre (10) (Sci-Fi, Action, ... )
 Technical Effects (3) (High, Medium, Low)
 Sequel (2) (Yes, No)
 Number of Screens (Positive Integer)

Output classes:
 Class 1 - FLOP (BO < 1M)
 Class 2 (1M < BO < 10M)
 Class 3 (10M < BO < 20M)
 Class 4 (20M < BO < 40M)
 Class 5 (40M < BO < 65M)
 Class 6 (65M < BO < 100M)
 Class 7 (100M < BO < 150M)
 Class 8 (150M < BO < 200M)
 Class 9 - BLOCKBUSTER (BO > 200M)

Network layers: INPUT LAYER (27 PEs)  HIDDEN LAYER I (18 PEs)  HIDDEN LAYER II (16 PEs)  OUTPUT LAYER (9 PEs)


Testing a Trained ANN Model
 Data is split into three parts
 Training (~60%)
 Validation (~20%)
 Testing (~20%)

 k-fold cross validation


 Less bias
 Time consuming

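The 60/20/20 split above can be sketched in a few lines of NumPy; the 100-row dataset here is a made-up toy:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
X = rng.random((n, 3))  # toy feature matrix; values are arbitrary

# Shuffle the row indices, then slice into ~60/20/20 training/validation/testing
idx = rng.permutation(n)
train_idx = idx[: int(0.6 * n)]
val_idx = idx[int(0.6 * n): int(0.8 * n)]
test_idx = idx[int(0.8 * n):]

print(len(train_idx), len(val_idx), len(test_idx))  # 60 20 20
```

Shuffling before slicing matters: without it, any ordering in the data (e.g. by date or class) would leak into the three partitions.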


ANN Learning Process
A Supervised Learning Process

Three-step process:
1. Compute temporary outputs.
2. Compare outputs with desired targets.
3. Adjust the weights and repeat the process until the desired output is achieved, then stop learning.


Backpropagation Learning

 Backpropagation of error for a single neuron
 Inputs x1, …, xn arrive with weights w1, …, wn
 Summation function: S = Σ (i = 1..n) Xi Wi
 Transfer function: Y = f(S)
 Error term a(Zi – Yi), the difference between the desired output Zi and the actual output Yi scaled by the learning parameter a, is fed back to adjust the weights
Backpropagation Learning
 The learning algorithm procedure
1. Initialize weights with random values and set other
network parameters
2. Read in the inputs and the desired outputs
3. Compute the actual output (by working forward
through the layers)
4. Compute the error (difference between the actual
and desired output)
5. Change the weights by working backward through
the hidden layers
6. Repeat steps 2-5 until weights stabilize
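The six-step procedure can be sketched for a single sigmoid neuron; this is a simplification of full multi-layer backpropagation, and the OR training data, learning rate, and epoch count are illustrative choices:

```python
import math
import random

random.seed(0)

# Training data: learn the logical OR function (inputs, desired output)
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

# Step 1: initialize weights (and a bias) with small random values
w = [random.uniform(-0.5, 0.5) for _ in range(2)]
b = random.uniform(-0.5, 0.5)
alpha = 0.5  # learning rate (an arbitrary choice)

for _ in range(2000):                            # Step 6: repeat until weights stabilize
    for x, target in data:                       # Step 2: read inputs and desired outputs
        s = sum(xi * wi for xi, wi in zip(x, w)) + b
        y = 1 / (1 + math.exp(-s))               # Step 3: compute the actual output
        error = target - y                       # Step 4: compute the error
        grad = error * y * (1 - y)               # delta rule for a sigmoid unit
        w = [wi + alpha * grad * xi for wi, xi in zip(w, x)]  # Step 5: adjust weights
        b += alpha * grad

preds = [round(1 / (1 + math.exp(-(sum(xi * wi for xi, wi in zip(x, w)) + b))))
         for x, _ in data]
print(preds)  # learned outputs for the four OR patterns
```

In a real multi-layer network, Step 5 also propagates each hidden neuron's share of the error backward through the layers, which is where the method's name comes from.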
Illuminating The Black Box
Sensitivity Analysis on ANN
 A common criticism for ANN: The lack of
transparency/explainability
 The black-box syndrome!
 Answer: sensitivity analysis
 Conducted on a trained ANN
 The inputs are perturbed while the relative
change on the output is measured/recorded
 Results illustrate the relative importance of
input variables
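The perturb-and-measure idea can be sketched against any trained model; here a hand-coded linear function with made-up coefficients stands in for the trained ANN:

```python
# Stand-in for a trained "black-box" model: x1 matters much more than x2
def model(x1, x2):
    return 5.0 * x1 + 0.5 * x2

base = model(1.0, 1.0)
delta = 0.1  # size of the input perturbation

# Perturb one input at a time and record the change in the output per unit change
sensitivity = {}
for name, (x1, x2) in {"x1": (1.1, 1.0), "x2": (1.0, 1.1)}.items():
    sensitivity[name] = abs(model(x1, x2) - base) / delta

print(sensitivity)  # x1 has ~10x the effect of x2, so it ranks as more important
```

The same loop works unchanged when `model` is a trained network's predict function, which is why sensitivity analysis needs no access to the network's internals.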
Sensitivity Analysis on ANN Models

 Systematically perturbed inputs  trained ANN ("the black-box")  observed change in outputs

 For a good example, see Application Case 6.3
 Sensitivity analysis reveals the most important injury severity factors in traffic accidents
Application Case 6.3
Sensitivity Analysis Reveals Injury Severity
Factors in Traffic Accidents

Questions for Discussion


1. How does sensitivity analysis shed light on the black
box (i.e., neural networks)?
2. Why would someone choose to use a black-box tool like
neural networks over theoretically sound, mostly
transparent statistical tools like logistic regression?
3. In this case, how did NNs and sensitivity analysis help
identify injury-severity factors in traffic accidents?
Support Vector Machines (SVM)
 SVM are among the most popular machine-
learning techniques.
 SVM belong to the family of generalized linear
models… (capable of representing non-linear
relationships in a linear fashion).
 SVM achieve a classification or regression
decision based on the value of the linear
combination of input features.
 Because of their architectural similarities, SVM
are also closely associated with ANN.
Support Vector Machines (SVM)
 Goal of SVM: to generate mathematical
functions that map input variables to desired
outputs for classification or regression type
prediction problems.
 First, SVM uses nonlinear kernel functions to
transform non-linear relationships among the
variables into linearly separable feature spaces.
 Then, the maximum-margin hyperplanes are
constructed to optimally separate different classes
from each other based on the training dataset.
 SVM has solid mathematical foundation!
Support Vector Machines (SVM)
 A hyperplane is a geometric concept used to
describe the separation surface between
different classes of things.
 In SVM, two parallel hyperplanes are constructed on
each side of the separation space with the aim of
maximizing the distance between them.
 A kernel function in SVM uses the kernel trick
(a method for using a linear classifier algorithm
to solve a nonlinear problem)
 The most commonly used kernel function is the radial
basis function (RBF).
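The RBF kernel itself is a one-line similarity measure between two points; a minimal sketch (the gamma value is an arbitrary choice):

```python
import math

def rbf_kernel(u, v, gamma=0.5):
    """Radial basis function kernel: K(u, v) = exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return math.exp(-gamma * sq_dist)

# Identical points have similarity 1; similarity decays smoothly with distance
print(rbf_kernel([1, 2], [1, 2]))   # 1.0
print(rbf_kernel([1, 2], [2, 2]))   # exp(-0.5), about 0.61
print(rbf_kernel([1, 2], [5, 2]))   # exp(-8), nearly 0
```

Replacing dot products with this kernel is the "kernel trick": the classifier stays linear in the transformed feature space while capturing non-linear structure in the original space.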
Support Vector Machines (SVM)

 Many linear classifiers (hyperplanes L1, L2, L3 in the left panel) may separate the data in the X1–X2 space; SVM chooses the maximum-margin hyperplane (right panel), the one that maximizes the margin between the two classes
Application Case 6.4
Managing Student Retention with
Predictive Modeling
Questions for Discussion
1. Why is attrition one of the most important issues in
higher education?
2. How can predictive analytics (ANN, SVM, and so
forth) be used to better manage student retention?
3. What are the main challenges and potential
solutions to the use of analytics in retention
management?
Application Case 6.4
Managing Student Retention with Predictive Modeling



How Does an SVM Work?
 Following a machine-learning process, an SVM
learns from the historic cases.
 The Process of Building SVM
1. Preprocess the data
 Scrub and transform the data.
2. Develop the model.
 Select the kernel type (RBF is often a natural choice).
 Determine the kernel parameters for the selected kernel type.
 If the results are satisfactory, finalize the model; otherwise change
the kernel type and/or kernel parameters to achieve the desired
accuracy level.
3. Extract and deploy the model.
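The "grid-search" over kernel parameters in step 2 can be sketched as nested loops over candidate values; `cv_score` here is a hypothetical stand-in for training an SVM with the given (C, gamma) and returning its cross-validated accuracy:

```python
# Hypothetical stand-in for "train an SVM with (C, gamma) and return
# its cross-validated accuracy"; a real pipeline would call a library here
def cv_score(C, gamma):
    return 1.0 - abs(C - 10) / 100 - abs(gamma - 0.1)

best = None
for C in [0.1, 1, 10, 100]:          # regularization candidates
    for gamma in [0.01, 0.1, 1.0]:   # RBF kernel-width candidates
        score = cv_score(C, gamma)
        if best is None or score > best[0]:
            best = (score, C, gamma)

print(best)  # the (score, C, gamma) combination with the highest score
```

Exhaustive grid search is simple and parallelizes trivially, which is why it remains the default way to pick kernel values despite its brute-force cost.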
The Process of Building an SVM

1. Pre-process the data (input: training data)
    Scrub the data: "Identify and handle missing, incorrect, and noisy" values
    Transform the data: "Numerisize, normalize and standardize the data"
    Pre-processed data

2. Develop the model (experimentation: training/testing)
    Select the kernel type: "Choose from RBF, Sigmoid or Polynomial kernel types"
    Determine the kernel values: "Use v-fold cross validation or employ 'grid-search'"
    Validated SVM model

3. Deploy the model (prediction model)
    Extract the model coefficients
    Code the trained model into the decision support system
    Monitor and maintain the model


SVM Applications
 SVMs are the most widely used kernel-learning algorithms for a wide range of classification and regression problems
 SVMs represent the state of the art by virtue of their excellent generalization performance, superior prediction power, ease of use, and rigorous theoretical foundation
 Most comparative studies show their superiority in both regression and classification type prediction problems.
 SVM versus ANN?
k-Nearest Neighbor Method (k-NN)
 ANNs and SVMs  time-demanding,
computationally intensive iterative derivations
 k-NN is a simple and logical prediction
method that produces very competitive results
 k-NN is a prediction method for classification as
well as regression types (similar to ANN & SVM)
 k-NN is a type of instance-based learning (or
lazy learning) – most of the work takes place at
the time of prediction (not at modeling)
 k : the number of neighbors used
k-Nearest Neighbor Method (k-NN)

 In the figure, the class predicted for a new point (Xi, Yi) differs depending on whether its k = 3 or its k = 5 nearest neighbors are consulted
 The answer depends on the value of k
The Process of k-NN Method

 Historic data is divided into a training set and a validation set
 Parameter setting: choose the distance measure and the value of "k"
 Predicting: classify (or forecast) new cases using the k most similar cases


k-NN Model Parameter
1. Similarity Measure: The Distance Metric

 Numeric versus nominal values?


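For numeric values the usual choice is Euclidean distance; for nominal values a simple mismatch count is a common option. A minimal sketch of both:

```python
import math

def euclidean(p, q):
    """Straight-line distance between two numeric points."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

print(euclidean((0, 0), (3, 4)))  # 5.0

# For nominal attributes: count the attributes on which the two cases differ
def nominal_distance(a, b):
    return sum(0 if ai == bi else 1 for ai, bi in zip(a, b))

print(nominal_distance(("red", "large"), ("red", "small")))  # 1
```

Numeric attributes are usually normalized first so that one large-scale attribute does not dominate the distance.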
k-NN Model Parameter
2. Number of Neighbors (the value of k)
 The best value depends on the data
 Larger values reduce the effect of noise but
also make boundaries between classes less
distinct
 An “optimal” value can be found heuristically
 Cross Validation is often used to
determine the best value for k and the
distance measure
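A minimal k-NN classifier makes the effect of k concrete; the toy training points below are made up so that k = 1 and k = 3 disagree:

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

# One "A" point sits right next to the query, but "B" points surround it
train = [((1.0, 1.0), "A"), ((2.0, 2.0), "B"), ((2.2, 1.8), "B"), ((0.0, 0.0), "A")]
query = (1.2, 1.2)

print(knn_predict(train, query, k=1))  # A (the single nearest neighbor)
print(knn_predict(train, query, k=3))  # B (majority of the three nearest)
```

Note that all the work happens inside `knn_predict` at prediction time; there is no training step, which is what "lazy learning" means.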
Application Case 6.5
Efficient Image Recognition and
Categorization with kNN

Questions for Discussion


1. Why is image recognition/classification a
worthy but difficult problem?
2. How can k-NN be effectively used for image
recognition/classification applications?



End of the Chapter

 Questions, comments



All rights reserved. No part of this publication may be reproduced,
stored in a retrieval system, or transmitted, in any form or by any
means, electronic, mechanical, photocopying, recording, or otherwise,
without the prior written permission of the publisher. Printed in the
United States of America.

