Comp527 15

COMP527:
Data Mining
COMP527: Data Mining
Dr Robert Sanderson
(azaroth@liv.ac.uk)
Dept. of Computer Science
University of Liverpool
2008
Regression, Prediction January 18, 2008 Slide 1

Introduction to the Course           Input Preprocessing
Introduction to Data Mining          Attribute Selection
Introduction to Text Mining          Association Rule Mining
General Data Mining Issues           ARM: A Priori and Data Structures
Data Warehousing                     ARM: Improvements
Classification: Challenges, Basics   ARM: Advanced Techniques
Classification: Rules                Clustering: Challenges, Basics
Classification: Trees                Clustering: Improvements
Classification: Trees 2              Clustering: Advanced Algorithms
Classification: Bayes                Hybrid Approaches
Classification: Neural Networks      Graph Mining, Web Mining
Classification: SVM                  Text Mining: Challenges, Basics
Classification: Evaluation           Text Mining: Text-as-Data
Classification: Evaluation 2         Text Mining: Text-as-Language
Regression, Prediction               Revision for Exam

Today's Topics
Prediction / Regression
Linear Regression
Logistic Regression
Support Vector Regression
Regression Trees

Prediction
Classification tries to determine which class an instance belongs to
by generating a model from instances whose classes are known
and applying it to new instances. The model generated can take
many forms (rules, tree, graph, vectors...). The output is the
class which the new instance is predicted to be part of.
So the class for classification is a nominal attribute.
What if it was numeric, with no enumerated set of values?
Then our problem is one of prediction rather than
classification.

Regression takes data and finds a formula that fits it. As with SVM, the
formula can be the model used for classification: it might
learn a formula for the probability (from 0 to 1) of a particular class
and then return the most likely class.
It can also be used for predicting/estimating values of a
numeric attribute simply by applying the learned formula to
the data.
At the end of the lecture we'll look at regression trees, which
combine decision trees and regression.

For example, instead of determining that the weather will be 'hot',
'warm', 'cool' or 'cold', we may need to be able to say with some
degree of accuracy that it will be 25 degrees or 7.5 degrees,
even if 7.5 never appeared in the temperature attribute for the
training data.
Or we may want the stress on a structure under various conditions, the
number of seconds a boxer might last in the ring, the
number of goals a team would score over a season, or ...
any other numeric value that you might want to try to
predict.

Linear Regression
Express the 'class' as a linear combination of the attributes with
determined weights, e.g.:
$$x = w_0 + w_1 a_1 + w_2 a_2 + \dots + w_n a_n$$
where each $w_j$ is a weight and each $a_j$ is an attribute.
The predicted value for instance i then is found by putting the attribute
values for i into the appropriate a slots.
So we need to learn the weights that minimize the error between actual
value and predicted value across the training set.
(Sounds like Perceptron, right?)

Linear Regression
To determine the weights, we try to minimize the sum of the squared error
across all the training instances:
$$\sum_{i} \Big( x_i - \sum_{j=0}^{k} w_j a_{ij} \Big)^2$$
where $x_i$ is the actual value for instance i and the second
half is the predicted value, obtained by applying the weights to the
k attribute values of instance i (taking $a_{i0} = 1$ so that $w_0$ is
the constant term).
To do so we can use the method described in Dunham, ~pg 85.
(Which I'm not going to try and explain!)
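
As a rough illustration of what "minimizing the sum of the squared error" produces, here is a minimal sketch that hands the problem to NumPy's least-squares solver. This is not the derivation from Dunham; the function names and the toy data are made up for the example, and the column of ones is there so that w0 acts as the constant term.

```python
import numpy as np

def fit_linear_regression(attributes, targets):
    """Return weights [w0, w1, ..., wk] minimising the sum of squared errors."""
    a = np.asarray(attributes, dtype=float)
    x = np.asarray(targets, dtype=float)
    # Prepend a column of ones so that w0 plays the role of the constant term.
    design = np.column_stack([np.ones(len(a)), a])
    weights, *_ = np.linalg.lstsq(design, x, rcond=None)
    return weights

def predict(weights, instance):
    """Apply the learned formula: w0 + w1*a1 + ... + wk*ak."""
    return weights[0] + np.dot(weights[1:], instance)

# Toy training set (an assumption for illustration), generated from
# x = 1 + 2*a1 + 3*a2 with no noise.
train_a = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
train_x = [1, 3, 4, 6, 8]

w = fit_linear_regression(train_a, train_x)
prediction = predict(w, [3, 2])
```

Because the toy data is exactly linear, the recovered weights should match the generating formula; with real data they would only be the best fit in the squared-error sense.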

Linear Regression
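Simple case: Method of Least Squares
$$w = \frac{\sum_i (x_i - \mathrm{avg}(x))(y_i - \mathrm{avg}(y))}{\sum_i (x_i - \mathrm{avg}(x))^2}$$
solves the simple case of $y = b + wx$ (here x is the single attribute
and y is the value to predict).
And then we find b by:
$$b = \mathrm{avg}(y) - w \cdot \mathrm{avg}(x)$$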
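
The two formulas for w and b translate almost directly into code. The following is a small sketch in plain Python; the function name `least_squares` and the sample points are invented for illustration.

```python
def least_squares(xs, ys):
    """Fit y = b + w*x by the method of least squares; returns (w, b)."""
    n = len(xs)
    avg_x = sum(xs) / n
    avg_y = sum(ys) / n
    # Numerator: sum of (x_i - avg(x)) * (y_i - avg(y))
    numerator = sum((x - avg_x) * (y - avg_y) for x, y in zip(xs, ys))
    # Denominator: sum of (x_i - avg(x))^2
    denominator = sum((x - avg_x) ** 2 for x in xs)
    w = numerator / denominator
    b = avg_y - w * avg_x
    return w, b

# Sample points lying exactly on y = 1 + 2x (made up for the example).
w, b = least_squares([1, 2, 3, 4], [3, 5, 7, 9])
```

Note that the denominator is zero if every x value is the same, in which case no slope can be fitted; the sketch does not guard against that.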

Non-Linear Regression
We could apply a function to each attribute instead of just
multiplying by a weight.
For example:
$$x = c + f_1(a_1) + f_2(a_2) + \dots + f_n(a_n)$$
where each $f_j$ is some function (e.g. square, log, square root, modulo 6,
etc.)
Of course determining the appropriate function is a problem!
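
To make the shape of such a model concrete, here is a tiny sketch that evaluates x = c + f1(a1) + ... + fn(an) once the functions have been chosen. The particular functions, the constant and the attribute values are arbitrary picks for illustration; nothing here addresses the hard part, which is choosing them.

```python
import math

def predict_additive(c, functions, attributes):
    """Evaluate x = c + f1(a1) + f2(a2) + ... + fn(an)."""
    return c + sum(f(a) for f, a in zip(functions, attributes))

# One function per attribute: square, square root, modulo 6 (arbitrary choices).
functions = [lambda a: a ** 2, math.sqrt, lambda a: a % 6]

# 1 + 3^2 + sqrt(16) + (8 mod 6)
value = predict_additive(1.0, functions, [3.0, 16.0, 8.0])
```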

Logistic Regression
Instead of fitting the data to a straight line, we can try to fit it to a
logistic curve (a flat S shape).
This curve gives values between 0 and 1, and hence can be used
for probability.
We won't go into how to work out the coefficients, but the result has the
same form as the linear case:
$$x = c + w_1 a_1 + w_2 a_2 + \dots + w_n a_n$$
with this weighted sum then passed through the logistic curve to give a
value between 0 and 1.
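
A short sketch of how the coefficients would be used once they are known: compute the weighted sum, then squash it with the standard logistic function 1 / (1 + e^-z). The coefficient values below are invented, and fitting them is deliberately left out, as in the lecture.

```python
import math

def logistic(z):
    """The logistic (S-shaped) curve; maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(c, weights, attributes):
    """Weighted sum of the attributes, passed through the logistic curve."""
    z = c + sum(w * a for w, a in zip(weights, attributes))
    return logistic(z)

# Made-up coefficients: z = -1 + 2*0 + 0.5*2 = 0, the midpoint of the curve.
p = predict_probability(-1.0, [2.0, 0.5], [0.0, 2.0])
```

For a two-class problem the value can be read as the probability of one class, with a threshold (commonly 0.5) deciding which class to return.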

Support Vector Regression
We looked at the maximum margin hyperplane, which involved
learning a hyperplane to distinguish two classes. Could we learn
a prediction hyperplane in the same way?
That would allow the use of kernel functions for the nonlinear case.
The goal is to find a function whose predictions deviate by at most E
from the training values, while being as flat as possible. This
creates a tube of width 2E around the function. Points that do
not fall strictly within the tube are support vectors.
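
The tube idea can be sketched as a simple membership test: given some already-chosen function and a value for E, list the training points that are not strictly inside the tube. This only illustrates the definition; a real support vector regression learner picks the function and the support vectors together. The names and data below are assumptions for the example.

```python
def points_outside_tube(predict, instances, targets, E):
    """Indices of training points lying on or outside a tube of half-width E."""
    return [
        i
        for i, (a, x) in enumerate(zip(instances, targets))
        if abs(x - predict(a)) >= E
    ]

# A hypothetical fitted function and a handful of training points.
predict = lambda a: 2 * a
instances = [0, 1, 2, 3]
targets = [0.1, 2.0, 5.0, 5.5]

wide = points_outside_tube(predict, instances, targets, E=0.5)
zero = points_outside_tube(predict, instances, targets, E=0)
```

With E = 0 every point is returned, which matches the observation that a zero-width tube turns all instances into support vectors.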

Because we are also trying to flatten the function, bad choices for E can be
problematic.
If E is too big and encloses all the points, then the function will
simply find the mean. If E is 0, then all instances are support
vectors. Too small and there will be too many support vectors,
too large and the function will be too flat to be useful.
We can replace the dot product in the regression equation with a
kernel function to perform nonlinear support vector regression:
$$x = b + \sum_{i} \alpha_i \, \mathbf{a}(i) \cdot \mathbf{a}$$
where the sum runs over the support vectors a(i) and a is the instance
whose value is being predicted.
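
A minimal sketch of the prediction step only, assuming b, the coefficients and the support vectors have already been learned (the learning itself is not shown). Passing a different `kernel` swaps the dot product for a kernel function; the RBF kernel and its `gamma` parameter are one common choice, used here purely as an example.

```python
import math

def dot(u, v):
    """Plain dot product: the linear case."""
    return sum(ui * vi for ui, vi in zip(u, v))

def rbf_kernel(u, v, gamma=0.5):
    """Radial basis function kernel, one possible nonlinear replacement."""
    squared_distance = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return math.exp(-gamma * squared_distance)

def svr_predict(b, alphas, support_vectors, a, kernel=dot):
    """x = b + sum_i alpha_i * K(a(i), a), with K defaulting to the dot product."""
    return b + sum(
        alpha * kernel(sv, a) for alpha, sv in zip(alphas, support_vectors)
    )

# Hypothetical learned values, chosen only to make the arithmetic easy.
linear = svr_predict(1.0, [0.5, -0.25], [[1, 2], [2, 0]], [2, 1])
nonlinear = svr_predict(0.0, [3.0], [[1, 1]], [1, 1], kernel=rbf_kernel)
```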

Regression and Model Trees
The problem with linear regression is that most data sets are not linear.
The problem with nonlinear regression is that it's even more
complicated!
Enter Regression Trees and Model Trees.
Idea: Use a Tree structure (divide and conquer) to split up the instances
such that we can more accurately apply a linear model to only the
instances that reach the end node.
So branches are normal decision tree tests, but instead of a class value
at the node, we have some way to predict or specify the value.

Regression vs Model Trees
Regression Trees: Each leaf node holds the average value of the
instances that reach it.
Model Trees: Each leaf node holds a (linear) regression model to
predict the value of the instances that reach it.
So a regression tree is a constant-value model tree.
Issues to consider:
– Building
– Pruning / Smoothing

Building Trees
We know that we need to construct a tree, with a linear model at
each node and an attribute split at non-leaf nodes.
To split, we need to determine which attribute to split on, and where
to split it. (Remember that all attributes are numeric.)
Witten (p245) proposes Standard Deviation Reduction (SDR), treating
the std dev of the class values as a measure of the error at the
node and choosing the split that maximises the expected reduction in that value.
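
A small sketch of the SDR calculation, using the usual definition: the standard deviation of the class values at the node, minus the size-weighted standard deviations of the subsets produced by the split. Using the population standard deviation (`statistics.pstdev`) is an assumption made here for simplicity, and the example values are invented.

```python
from statistics import pstdev

def sdr(class_values, subsets):
    """Standard deviation reduction achieved by splitting into subsets."""
    total = len(class_values)
    # Each subset's std dev is weighted by the fraction of instances it holds.
    weighted = sum(len(s) / total * pstdev(s) for s in subsets)
    return pstdev(class_values) - weighted

values = [10, 10, 20, 20]
good = sdr(values, [[10, 10], [20, 20]])  # split separates the two groups
poor = sdr(values, [[10], [10, 20, 20]])  # split leaves one mixed subset
```

The split that cleanly separates the two groups removes all of the deviation, so it scores higher and would be the one chosen.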


Smoothing
It turns out that the value predicted at the bottom of the tree is generally
too coarse, probably because it was built against only a small subset of
the data.
We can fine tune the value by building a linear model at each node along
with the regular split and then send the value from the leaf back up the
path to the root of the tree, combining it with the values at each step.
p' = (np + kq) / (n + k)
p' is prediction to be passed up. p is prediction passed to this node.
q is the value predicted at this node. n is the number of instances that
reach the node below. k is a constant.
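
The smoothing formula can be sketched as a loop that carries the leaf's prediction up towards the root. The `path` representation (a list of `(q, n)` pairs, ordered from the leaf's parent up to the root) and the value of k used in the example are assumptions for illustration.

```python
def smooth(p, q, n, k):
    """One smoothing step: p' = (n*p + k*q) / (n + k)."""
    return (n * p + k * q) / (n + k)

def smoothed_prediction(leaf_prediction, path, k):
    """Pass the leaf's prediction up the tree, blending in each node's value.

    path holds (q, n) pairs from the leaf's parent up to the root, where q is
    the value predicted by the model at that node and n is the number of
    instances reaching the node below it.
    """
    p = leaf_prediction
    for q, n in path:
        p = smooth(p, q, n, k)
    return p

# Leaf predicts 10; its parent's model predicts 20, the root's predicts 30.
one_step = smooth(10.0, 20.0, 5, 15)
result = smoothed_prediction(10.0, [(20.0, 5), (30.0, 25)], k=15)
```

The fewer instances reach the node below (small n), the more weight the model higher up the tree receives.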

Pruning
Pruning can also be accomplished using the models built at each
node.
We can estimate the error at each node using the model built there, by
taking the actual error on the training set and multiplying by
(n+v)/(n−v), where n is the number of instances that reach the node and v
is the number of parameters in the linear model for the node.
We do this multiplication to avoid underestimating the error on new
data, as opposed to the data the model was trained against.
If the estimated error is lower at the parent than for its subtree, the
subtree can be dropped and the parent becomes a leaf.
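
A sketch of the error estimate for a single node. Taking the "actual error" to be the mean absolute difference between actual and predicted values is an assumption made here; the (n+v)/(n−v) factor is the one described in the text. The function name and data are invented.

```python
def estimated_error(actual, predicted, v):
    """Error on the node's own instances, inflated by (n + v) / (n - v)."""
    n = len(actual)
    if n <= v:
        raise ValueError("need more instances than model parameters")
    # Assumed error measure: mean absolute difference on the node's instances.
    mean_abs_error = sum(abs(a - p) for a, p in zip(actual, predicted)) / n
    return mean_abs_error * (n + v) / (n - v)

# Six instances reach the node; the node's linear model has two parameters.
actual = [1, 2, 3, 4, 5, 6]
predicted = [1, 2, 4, 4, 6, 7]
error = estimated_error(actual, predicted, v=2)
```

Here the raw error of 0.5 is doubled, reflecting how few instances there are relative to the number of parameters.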

Building Algorithm
MakeTree(instances)
    SD = sd(instances)    // standard deviation of the class values
    root = new Node(instances)
    split(root)
    prune(root)

split(node)
    if len(node) < 4 or sd(node) < 0.05*SD:
        node.type = LEAF
    else:
        node.type = INTERIOR
        foreach attribute a:
            foreach possibleSplitPosition s in a:
                calculateSDR(a, s)
        splitNode(node, maximumSDR)
        split(node.left)
        split(node.right)
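
The pseudocode can be turned into a small runnable sketch. To keep it short this version builds a regression tree (leaves predict the average) rather than a model tree, skips pruning, tries midpoints between consecutive distinct attribute values as split positions, and uses the population standard deviation. All of those choices, and every name below, are assumptions for illustration.

```python
from statistics import mean, pstdev

class Node:
    def __init__(self, rows):
        self.rows = rows          # list of (attribute_tuple, class_value)
        self.is_leaf = True
        self.attribute = None     # index of the attribute tested here
        self.threshold = None
        self.left = None
        self.right = None

def sd(rows):
    """Standard deviation of the class values of the given rows."""
    return pstdev([x for _, x in rows])

def best_split(rows):
    """Return (sdr, attribute, threshold) with the largest SDR, or None."""
    best = None
    for j in range(len(rows[0][0])):
        values = sorted({a[j] for a, _ in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [r for r in rows if r[0][j] <= threshold]
            right = [r for r in rows if r[0][j] > threshold]
            sdr = sd(rows) - (len(left) * sd(left)
                              + len(right) * sd(right)) / len(rows)
            if best is None or sdr > best[0]:
                best = (sdr, j, threshold)
    return best

def split(node, global_sd):
    found = None
    # Same stopping rule as the pseudocode: too few instances or too little spread.
    if len(node.rows) >= 4 and sd(node.rows) >= 0.05 * global_sd:
        found = best_split(node.rows)
    if found is None:
        return                    # node stays a leaf
    _, node.attribute, node.threshold = found
    node.is_leaf = False
    j, t = node.attribute, node.threshold
    node.left = Node([r for r in node.rows if r[0][j] <= t])
    node.right = Node([r for r in node.rows if r[0][j] > t])
    split(node.left, global_sd)
    split(node.right, global_sd)

def make_tree(rows):
    root = Node(rows)
    split(root, sd(rows))
    return root                   # pruning is omitted in this sketch

def predict(node, attributes):
    while not node.is_leaf:
        go_left = attributes[node.attribute] <= node.threshold
        node = node.left if go_left else node.right
    return mean(x for _, x in node.rows)

# Toy data: one attribute, class value jumps from 10 to 20 after a = 4.
rows = [((a,), 10 if a <= 4 else 20) for a in range(1, 9)]
tree = make_tree(rows)
```

On the toy data the root splits once at 4.5 and both children become leaves, because their class values no longer vary.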

Pruning Algorithm
prune(node)
    if node.type == INTERIOR:
        prune(node.left)
        prune(node.right)
        node.model = new linearRegression(node)
        if subTreeError(node) > error(node):
            node.type = LEAF

subTreeError(node)
    if node.type == INTERIOR:
        return (len(left)*subTreeError(left) +
                len(right)*subTreeError(right)) / len(node)
    else:
        return error(node)
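
A runnable sketch of the same bottom-up pruning logic. To stay self-contained it assumes each node already carries its instance count and the estimated error of its own model, rather than fitting a linear regression; the `Node` class and the numbers are invented for the example.

```python
class Node:
    def __init__(self, n, error, left=None, right=None):
        self.n = n            # number of instances reaching this node
        self.error = error    # estimated error of this node's own model
        self.left = left
        self.right = right

    @property
    def is_leaf(self):
        return self.left is None

def subtree_error(node):
    """Instance-weighted average of the errors at the leaves below node."""
    if node.is_leaf:
        return node.error
    return (node.left.n * subtree_error(node.left)
            + node.right.n * subtree_error(node.right)) / node.n

def prune(node):
    """Prune children first, then collapse this node if its model is better."""
    if node.is_leaf:
        return
    prune(node.left)
    prune(node.right)
    if subtree_error(node) > node.error:
        node.left = node.right = None   # the node becomes a leaf

# Subtree error is (6*3.0 + 4*1.0) / 10 = 2.2 in both trees.
pruned = Node(10, 2.0, Node(6, 3.0), Node(4, 1.0))
kept = Node(10, 2.5, Node(6, 3.0), Node(4, 1.0))
prune(pruned)
prune(kept)
```

The first tree collapses because its root model (error 2.0) beats the subtree; the second keeps its children because the root model (error 2.5) is worse.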

Specific Algorithms
Some regression/model trees:
CHAID (Chi-squared Automatic Interaction Detector). 1980.
Can be used for either continuous or nominal classes.
CART (Classification And Regression Tree). 1984.
Entropy or Gini to choose attribute, binary split for selected
attribute.
M5: Quinlan's model tree inducer (of C4.5 fame). 1992.

Further Reading
● Introductory statistical text books, still!
● Witten, 3.7, 4.6, 6.5
● Dunham, 3.2, 4.2
● Han, 6.11
