
Business Intelligence and Analytics:

Systems for Decision Support


(10th Edition)

Chapter 6:
Techniques for Predictive
Modeling
Learning Objectives
 Understand the concept and definitions of
artificial neural networks (ANN)
 Learn the different types of ANN
architectures
 Know how learning happens in ANN
 Become familiar with ANN applications
 Understand the sensitivity analysis in ANN
 Understand the concept and structure of
support vector machines (SVM) (Continued…)
6-2 Copyright © 2014 Pearson Education, Inc.
Learning Objectives
 Learn the advantages and disadvantages
of SVM compared to ANN
 Understand the concept and formulation
of k-nearest neighbor algorithm (kNN)
 Learn the process of applying kNN
 Learn the advantages and disadvantages
of kNN compared to ANN and SVM



Opening Vignette…
Predictive Modeling Helps Better
Understand and Manage Complex
Medical Procedures
 Situation
 Problem
 Solution
 Results
 Answer & discuss the case questions.
Questions for the Opening Vignette
1. Why is it important to study medical procedures?
What is the value in predicting outcomes?
2. What factors do you think are the most important in
better understanding and managing healthcare?
3. What would be the impact of predictive modeling
on healthcare and medicine? Can predictive
modeling replace medical or managerial personnel?
4. What were the outcomes of the study? Who can
use these results? How can they be implemented?
5. Search the Internet to locate two additional cases in
managing complex medical procedures.
Opening Vignette – A Process Map for Training and Testing Four Predictive Models
Opening Vignette
The Comparison of Four Models


Neural Network Concepts
 Neural networks (NN): a brain metaphor for
information processing
 Neural computing
 Artificial neural network (ANN)
 Many uses for ANN:
 pattern recognition, forecasting, prediction, and
classification
 Many application areas
 finance, marketing, manufacturing, operations,
information systems, and so on
Biological Neural Networks

 Two interconnected brain cells (neurons), each with dendrites, a soma (cell body), and an axon; the cells communicate across synapses


Processing Information in ANN

 A single neuron (processing element – PE) with inputs and outputs
 Inputs x1, x2, …, xn arrive with connection weights w1, w2, …, wn
 Summation function: S = Σ (i = 1..n) Xi Wi
 Transfer function: Y = f(S), producing the outputs Y1, Y2, …, Yn
Biology Analogy



Application Case 6.1
Neural Networks Are Helping to Save
Lives in the Mining Industry
Questions for Discussion
1. How did neural networks help save lives in the mining
industry?
2. What were the challenges, the proposed solution, and
the obtained results?
Elements of ANN
 Processing element (PE)
 Network architecture
 Hidden layers
 Parallel processing
 Network information processing
 Inputs
 Outputs
 Connection weights
 Summation function
Elements of ANN

 Neural network with one hidden layer: inputs x1, x2, x3 feed the processing elements (PEs) of the input layer; each hidden-layer PE computes a weighted sum (S) and applies a transfer function (f); the output layer produces Y1


Elements of ANN
 Summation function for a single neuron (a) and several neurons (b)
 PE: Processing Element (or neuron)

(a) Single neuron:
    Y = X1 W1 + X2 W2

(b) Multiple neurons:
    Y1 = X1 W11 + X2 W21
    Y2 = X1 W12 + X2 W22
    Y3 = X2 W23
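The per-neuron summation functions in panel (b) amount to a single matrix product; a minimal NumPy sketch of the two-neuron case, with made-up input and weight values:

```python
import numpy as np

# Made-up inputs X1, X2 and weights Wij (from input i to neuron j)
X = np.array([0.5, 1.0])
W = np.array([[0.2, 0.7],   # W11, W12
              [0.4, 0.3]])  # W21, W22

# Yj = X1*W1j + X2*W2j, i.e. the summation function for each neuron j
Y = X @ W
print(Y)  # Y1 = 0.5, Y2 = 0.65
```

Stacking the weights as a matrix is exactly how ANN libraries evaluate a whole layer at once instead of one neuron at a time.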


Elements of ANN
 Transformation (Transfer) Function
 Linear function
 Sigmoid (logistic activation) function [0, 1]
 Tangent hyperbolic function [-1, 1]

 Worked example with inputs X1 = 3, X2 = 1, X3 = 2 and weights W1 = 0.2, W2 = 0.4, W3 = 0.1:
 Summation function: Y = 3(0.2) + 1(0.4) + 2(0.1) = 1.2
 Transfer function: YT = 1/(1 + e^-1.2) = 0.77

 Threshold value?
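The worked example above can be checked in a few lines of Python:

```python
import math

# Inputs and weights from the slide's worked example
X = [3, 1, 2]
W = [0.2, 0.4, 0.1]

# Summation function: Y = X1*W1 + X2*W2 + X3*W3
Y = sum(x * w for x, w in zip(X, W))

# Sigmoid (logistic) transfer function: YT = 1 / (1 + e^-Y)
YT = 1 / (1 + math.exp(-Y))

print(round(Y, 1))   # 1.2
print(round(YT, 2))  # 0.77
```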


Neural Network Architectures
 Architecture of a neural network is driven by
the task it is intended to address
 Classification, regression, clustering, general
optimization, association, ….
 Most popular architecture: Feedforward,
multi-layered perceptron with
backpropagation learning algorithm
 Used for both classification and regression type
problems
 Others – Recurrent, self-organizing feature
maps, Hopfield networks, …
Neural Network Architectures
Feed-Forward Neural Networks

 Feed-forward MLP with one hidden layer: socio-demographic, religious, financial, and other input variables feed the INPUT LAYER; the HIDDEN LAYER feeds the OUTPUT LAYER, which predicts (vs. actual) whether a person voted "yes" or "no" to legalizing gaming


Neural Network Architectures
Recurrent Neural Networks



Other Popular ANN Paradigms
Self-Organizing Maps (SOM)

 First introduced by the Finnish professor Teuvo Kohonen
 Applies to clustering type problems
 In the figure, inputs 1–3 feed a two-dimensional grid of map nodes


Other Popular ANN Paradigms
Hopfield Networks

 First introduced by John Hopfield
 Highly interconnected neurons
 Applies to solving complex computational problems (e.g., optimization problems)


Application Case 6.2
Predictive Modeling is Powering the
Power Generators

Questions for Discussion


1. What are the key environmental concerns in the
electric power industry?
2. What are the main application areas for predictive
modeling in the electric power industry?
3. How was predictive modeling used to address a
variety of problems in the electric power industry?
Development Process of an ANN



An MLP ANN Structure for
the Box-Office Prediction Problem

Input variables:
 MPAA Rating (5) (G, PG, PG13, R, NR)
 Competition (3) (High, Medium, Low)
 Star Value (3) (High, Medium, Low)
 Genre (10) (Sci-Fi, Action, ... )
 Technical Effects (3) (High, Medium, Low)
 Sequel (2) (Yes, No)
 Number of Screens (Positive Integer)

Output classes:
 Class 1 - FLOP (BO < 1M)
 Class 2 (1M < BO < 10M)
 Class 3 (10M < BO < 20M)
 Class 4 (20M < BO < 40M)
 Class 5 (40M < BO < 65M)
 Class 6 (65M < BO < 100M)
 Class 7 (100M < BO < 150M)
 Class 8 (150M < BO < 200M)
 Class 9 - BLOCKBUSTER (BO > 200M)

Network layers: INPUT LAYER (27 PEs)  HIDDEN LAYER I (18 PEs)  HIDDEN LAYER II (16 PEs)  OUTPUT LAYER (9 PEs)


Testing a Trained ANN Model
 Data is split into three parts
 Training (~60%)
 Validation (~20%)
 Testing (~20%)

 k-fold cross validation


 Less bias
 Time consuming

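The 60/20/20 split above can be sketched in a few lines of NumPy; the 100-row dataset here is a made-up toy:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
X = rng.random((n, 3))  # toy feature matrix; values are arbitrary

# Shuffle the row indices, then slice into ~60/20/20 training/validation/testing
idx = rng.permutation(n)
train_idx = idx[: int(0.6 * n)]
val_idx = idx[int(0.6 * n): int(0.8 * n)]
test_idx = idx[int(0.8 * n):]

print(len(train_idx), len(val_idx), len(test_idx))  # 60 20 20
```

Shuffling before slicing matters: without it, any ordering in the data (e.g. by date or class) would leak into the three partitions.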


ANN Learning Process
A Supervised Learning Process

Three-step process:
1. Compute temporary outputs.
2. Compare outputs with desired targets.
3. Adjust the weights and repeat the process until the desired output is achieved, then stop learning.


Backpropagation Learning

 Backpropagation of error for a single neuron
 Inputs x1, …, xn arrive with weights w1, …, wn
 Summation function: S = Σ (i = 1..n) Xi Wi
 Transfer function: Y = f(S)
 Error term a(Zi – Yi), the difference between the desired output Zi and the actual output Yi scaled by the learning parameter a, is fed back to adjust the weights
Backpropagation Learning
 The learning algorithm procedure
1. Initialize weights with random values and set other
network parameters
2. Read in the inputs and the desired outputs
3. Compute the actual output (by working forward
through the layers)
4. Compute the error (difference between the actual
and desired output)
5. Change the weights by working backward through
the hidden layers
6. Repeat steps 2-5 until weights stabilize
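The six-step procedure can be sketched for a single sigmoid neuron; this is a simplification of full multi-layer backpropagation, and the OR training data, learning rate, and epoch count are illustrative choices:

```python
import math
import random

random.seed(0)

# Training data: learn the logical OR function (inputs, desired output)
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

# Step 1: initialize weights (and a bias) with small random values
w = [random.uniform(-0.5, 0.5) for _ in range(2)]
b = random.uniform(-0.5, 0.5)
alpha = 0.5  # learning rate (an arbitrary choice)

for _ in range(2000):                            # Step 6: repeat until weights stabilize
    for x, target in data:                       # Step 2: read inputs and desired outputs
        s = sum(xi * wi for xi, wi in zip(x, w)) + b
        y = 1 / (1 + math.exp(-s))               # Step 3: compute the actual output
        error = target - y                       # Step 4: compute the error
        grad = error * y * (1 - y)               # delta rule for a sigmoid unit
        w = [wi + alpha * grad * xi for wi, xi in zip(w, x)]  # Step 5: adjust weights
        b += alpha * grad

preds = [round(1 / (1 + math.exp(-(sum(xi * wi for xi, wi in zip(x, w)) + b))))
         for x, _ in data]
print(preds)  # learned outputs for the four OR patterns
```

In a real multi-layer network, Step 5 also propagates each hidden neuron's share of the error backward through the layers, which is where the method's name comes from.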
Illuminating The Black Box
Sensitivity Analysis on ANN
 A common criticism for ANN: The lack of
transparency/explainability
 The black-box syndrome!
 Answer: sensitivity analysis
 Conducted on a trained ANN
 The inputs are perturbed while the relative
change on the output is measured/recorded
 Results illustrate the relative importance of
input variables
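The perturb-and-measure idea can be sketched against any trained model; here a hand-coded linear function with made-up coefficients stands in for the trained ANN:

```python
# Stand-in for a trained "black-box" model: x1 matters much more than x2
def model(x1, x2):
    return 5.0 * x1 + 0.5 * x2

base = model(1.0, 1.0)
delta = 0.1  # size of the input perturbation

# Perturb one input at a time and record the change in the output per unit change
sensitivity = {}
for name, (x1, x2) in {"x1": (1.1, 1.0), "x2": (1.0, 1.1)}.items():
    sensitivity[name] = abs(model(x1, x2) - base) / delta

print(sensitivity)  # x1 has ~10x the effect of x2, so it ranks as more important
```

The same loop works unchanged when `model` is a trained network's predict function, which is why sensitivity analysis needs no access to the network's internals.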
Sensitivity Analysis on ANN Models

 Systematically perturbed inputs  trained ANN ("the black-box")  observed change in outputs

 For a good example, see Application Case 6.3
 Sensitivity analysis reveals the most important injury severity factors in traffic accidents
Application Case 6.3
Sensitivity Analysis Reveals Injury Severity
Factors in Traffic Accidents

Questions for Discussion


1. How does sensitivity analysis shed light on the black
box (i.e., neural networks)?
2. Why would someone choose to use a black-box tool like
neural networks over theoretically sound, mostly
transparent statistical tools like logistic regression?
3. In this case, how did NNs and sensitivity analysis help
identify injury-severity factors in traffic accidents?
Support Vector Machines (SVM)
 SVM are among the most popular machine-
learning techniques.
 SVM belong to the family of generalized linear
models… (capable of representing non-linear
relationships in a linear fashion).
 SVM achieve a classification or regression
decision based on the value of the linear
combination of input features.
 Because of their architectural similarities, SVM
are also closely associated with ANN.
Support Vector Machines (SVM)
 Goal of SVM: to generate mathematical
functions that map input variables to desired
outputs for classification or regression type
prediction problems.
 First, SVM uses nonlinear kernel functions to
transform non-linear relationships among the
variables into linearly separable feature spaces.
 Then, the maximum-margin hyperplanes are
constructed to optimally separate different classes
from each other based on the training dataset.
 SVM has solid mathematical foundation!
Support Vector Machines (SVM)
 A hyperplane is a geometric concept used to
describe the separation surface between
different classes of things.
 In SVM, two parallel hyperplanes are constructed on
each side of the separation space with the aim of
maximizing the distance between them.
 A kernel function in SVM uses the kernel trick
(a method for using a linear classifier algorithm
to solve a nonlinear problem)
 The most commonly used kernel function is the radial
basis function (RBF).
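The RBF kernel itself is a one-line similarity measure between two points; a minimal sketch (the gamma value is an arbitrary choice):

```python
import math

def rbf_kernel(u, v, gamma=0.5):
    """Radial basis function kernel: K(u, v) = exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return math.exp(-gamma * sq_dist)

# Identical points have similarity 1; similarity decays smoothly with distance
print(rbf_kernel([1, 2], [1, 2]))   # 1.0
print(rbf_kernel([1, 2], [2, 2]))   # exp(-0.5), about 0.61
print(rbf_kernel([1, 2], [5, 2]))   # exp(-8), nearly 0
```

Replacing dot products with this kernel is the "kernel trick": the classifier stays linear in the transformed feature space while capturing non-linear structure in the original space.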
Support Vector Machines (SVM)

 Many linear classifiers (hyperplanes L1, L2, L3 in the left panel) may separate the data in the X1–X2 space; SVM chooses the maximum-margin hyperplane (right panel), the one that maximizes the margin between the two classes
Application Case 6.4
Managing Student Retention with
Predictive Modeling
Questions for Discussion
1. Why is attrition one of the most important issues in
higher education?
2. How can predictive analytics (ANN, SVM, and so
forth) be used to better manage student retention?
3. What are the main challenges and potential
solutions to the use of analytics in retention
management?
Application Case 6.4
Managing Student Retention with Predictive Modeling



How Does an SVM Work?
 Following a machine-learning process, an SVM
learns from the historic cases.
 The Process of Building SVM
1. Preprocess the data
 Scrub and transform the data.
2. Develop the model.
 Select the kernel type (RBF is often a natural choice).
 Determine the kernel parameters for the selected kernel type.
 If the results are satisfactory, finalize the model; otherwise change
the kernel type and/or kernel parameters to achieve the desired
accuracy level.
3. Extract and deploy the model.
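The "grid-search" over kernel parameters in step 2 can be sketched as nested loops over candidate values; `cv_score` here is a hypothetical stand-in for training an SVM with the given (C, gamma) and returning its cross-validated accuracy:

```python
# Hypothetical stand-in for "train an SVM with (C, gamma) and return
# its cross-validated accuracy"; a real pipeline would call a library here
def cv_score(C, gamma):
    return 1.0 - abs(C - 10) / 100 - abs(gamma - 0.1)

best = None
for C in [0.1, 1, 10, 100]:          # regularization candidates
    for gamma in [0.01, 0.1, 1.0]:   # RBF kernel-width candidates
        score = cv_score(C, gamma)
        if best is None or score > best[0]:
            best = (score, C, gamma)

print(best)  # the (score, C, gamma) combination with the highest score
```

Exhaustive grid search is simple and parallelizes trivially, which is why it remains the default way to pick kernel values despite its brute-force cost.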
The Process of Building an SVM

1. Pre-process the data (input: training data)
    Scrub the data: "Identify and handle missing, incorrect, and noisy" values
    Transform the data: "Numerisize, normalize and standardize the data"
    Pre-processed data

2. Develop the model (experimentation: training/testing)
    Select the kernel type: "Choose from RBF, Sigmoid or Polynomial kernel types"
    Determine the kernel values: "Use v-fold cross validation or employ 'grid-search'"
    Validated SVM model

3. Deploy the model (prediction model)
    Extract the model coefficients
    Code the trained model into the decision support system
    Monitor and maintain the model


SVM Applications
 SVMs are the most widely used kernel-learning algorithms for a wide range of classification and regression problems
 SVMs represent the state of the art by virtue of their excellent generalization performance, superior prediction power, ease of use, and rigorous theoretical foundation
 Most comparative studies show their superiority in both regression and classification type prediction problems.
 SVM versus ANN?
k-Nearest Neighbor Method (k-NN)
 ANNs and SVMs  time-demanding,
computationally intensive iterative derivations
 k-NN is a simple and logical prediction
method that produces very competitive results
 k-NN is a prediction method for classification as
well as regression types (similar to ANN & SVM)
 k-NN is a type of instance-based learning (or
lazy learning) – most of the work takes place at
the time of prediction (not at modeling)
 k : the number of neighbors used
k-Nearest Neighbor Method (k-NN)

 In the figure, the class predicted for a new point (Xi, Yi) differs depending on whether its k = 3 or its k = 5 nearest neighbors are consulted
 The answer depends on the value of k
The Process of k-NN Method

 Historic data is divided into a training set and a validation set
 Parameter setting: choose the distance measure and the value of "k"
 Predicting: classify (or forecast) new cases using the k most similar cases


k-NN Model Parameter
1. Similarity Measure: The Distance Metric

 Numeric versus nominal values?


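For numeric values the usual choice is Euclidean distance; for nominal values a simple mismatch count is a common option. A minimal sketch of both:

```python
import math

def euclidean(p, q):
    """Straight-line distance between two numeric points."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

print(euclidean((0, 0), (3, 4)))  # 5.0

# For nominal attributes: count the attributes on which the two cases differ
def nominal_distance(a, b):
    return sum(0 if ai == bi else 1 for ai, bi in zip(a, b))

print(nominal_distance(("red", "large"), ("red", "small")))  # 1
```

Numeric attributes are usually normalized first so that one large-scale attribute does not dominate the distance.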
k-NN Model Parameter
2. Number of Neighbors (the value of k)
 The best value depends on the data
 Larger values reduce the effect of noise but
also make boundaries between classes less
distinct
 An “optimal” value can be found heuristically
 Cross Validation is often used to
determine the best value for k and the
distance measure
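A minimal k-NN classifier makes the effect of k concrete; the toy training points below are made up so that k = 1 and k = 3 disagree:

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

# One "A" point sits right next to the query, but "B" points surround it
train = [((1.0, 1.0), "A"), ((2.0, 2.0), "B"), ((2.2, 1.8), "B"), ((0.0, 0.0), "A")]
query = (1.2, 1.2)

print(knn_predict(train, query, k=1))  # A (the single nearest neighbor)
print(knn_predict(train, query, k=3))  # B (majority of the three nearest)
```

Note that all the work happens inside `knn_predict` at prediction time; there is no training step, which is what "lazy learning" means.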
Application Case 6.5
Efficient Image Recognition and
Categorization with kNN

Questions for Discussion


1. Why is image recognition/classification a
worthy but difficult problem?
2. How can k-NN be effectively used for image
recognition/classification applications?



End of the Chapter

 Questions, comments



All rights reserved. No part of this publication may be reproduced,
stored in a retrieval system, or transmitted, in any form or by any
means, electronic, mechanical, photocopying, recording, or otherwise,
without the prior written permission of the publisher. Printed in the
United States of America.

