
Abstract

Chapter five of ESL considers abstractions of the concept of linear regression. Instead of supposing we will be putting together a simple linear model, ESL dives into the concept of basis functions, preprocessing, and the use of smoothing parameters in regression models. The chapter finishes with some functional analysis and the employment of wavelets in statistical analysis.

Chapter 5: Basis Functions and Regularization


In chapters two and three the regression model was discussed as coming in a form like

$$y = \beta_0 + \beta_1 x_1 + \epsilon_i \qquad (1)$$

in the case of a one-dimensional model. This explanation can be generalized by noting that the $X$ term is in fact a basis function (multiplied by $\beta_1$), and that we can specify other models which are analogous to that produced by the RSS functional above, but with additional specifications.
For instance, we may specify a linear basis of the predictor variables $X$ wherein some function is applied to $X$. We'll denote this basis for $X$ as $h_m(X)$:

$$f(X) = \sum_{m=1}^{M} \beta_m h_m(X) \qquad (2)$$
This basis function $h_m(X)$ may be a constant, or it may be the value of $X$ taken to a particular power, or perhaps some other functional transformation. For instance, recall this question from scientific computing:

Note: Add example section type.


To use linear least squares, the function must be linear in its parameters $p$ and $e$. To arrive at the linear form, we begin with the polar-coordinate equation for the orbit:

$$r = \frac{p}{1 - e\cos(\theta)}$$

Inverting this equation produces:


$$\frac{1}{r} = \frac{1 - e\cos(\theta)}{p} = \frac{1}{p} - \frac{e}{p}\cos(\theta)$$
The value multiplying $1/p$ is the constant $1$, while the values multiplying $e/p$ are $-\cos(\theta_i)$. Treating $c_1 = 1/p$ and $c_2 = e/p$ as the unknowns, we effectively define a regression equation with a constant term:

$$\frac{1}{r} = \frac{1}{p} - \frac{e}{p}\cos(\theta)$$
Now, using the data points supplied at the beginning of the problem, the design matrix (for this problem denoted as $A$) is:

$$A = \begin{bmatrix} 1 & -0.669130 \\ 1 & -0.390731 \\ 1 & -0.121869 \\ 1 & \phantom{-}0.309016 \\ 1 & \phantom{-}0.587785 \end{bmatrix}$$

where $b$ is made of the respective observations of $1/r$:

$$b = \begin{bmatrix} 0.370370 \\ 0.5 \\ 0.618046 \\ 0.833333 \\ 0.980392 \end{bmatrix}$$
By solving the least-squares problem $Ac \approx b$ (e.g., via the normal equations), the values of the coefficients for this regression equation are shown to be:

$$c = \begin{bmatrix} 0.6880 \\ 0.4841 \end{bmatrix}$$
Using the coefficients in $c$, which are related to $p$ and $e$ by

$$\frac{1}{p} = c_1, \qquad \frac{e}{p} = c_2,$$

the parameters recovered from the linear least squares estimate of $1/r$ are:

$$p = 1/0.688 = 1.453488$$
$$e = 1.453488 \times 0.4841 = 0.703633$$
Which gives:

$$r = \frac{1.453488}{1 - 0.703633\cos(\theta)}$$

Displayed below are the results of this equation at the given points along with the absolute error:

θ        r̂          |r − r̂|
48°      2.74669     0.04669
67°      2.00462     0.00462
83°      1.58982     0.02018
108°     1.19389     0.00611
126°     1.02823     0.00823
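As a quick check, here is a small R sketch (not part of the original assignment) that recomputes the fitted radii and absolute errors above. The observed radii are assumed here to be the reciprocals of the entries of $b$, so the third error comes out slightly different from the table above, which appears to use a rounded observation.

# Hypothetical verification script: recompute the fitted orbit and its errors.
p <- 1.453488
e <- 0.703633
theta <- c(48, 67, 83, 108, 126) * pi / 180             # angles converted to radians
r.obs <- 1 / c(.370370, .5, .618046, .833333, .980392)  # observed radii, taken as 1/b
r.hat <- p / (1 - e * cos(theta))                       # fitted radii from r = p / (1 - e*cos(theta))
cbind(deg = c(48, 67, 83, 108, 126), r.hat = r.hat, abs.err = abs(r.obs - r.hat))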



This is a remake of the scientific computing assignment regarding regression.

# First, form the design matrix A and the vector of observations b. I provide a
# quick function to calculate the hat matrix, but for this example R's built-in
# functionality will be used to extract the hat matrix from the regression model.
A = as.matrix(cbind(rep(1, 5), c(-.669130, -.390731, -.121869, .309016, .587785)))
b = as.matrix(c(.370370, .5, .618046, .833333, .980392))
hatmake = function(x) {x %*% solve(t(x) %*% x) %*% t(x)}
H = hatmake(A)
c = solve(t(A) %*% A) %*% t(A) %*% b
b.hat = H %*% b
# The model produced using the built-in functionality of R:
reg = lm(b[, 1] ~ A[, 2])
plot(A[, 2], b, col = "red")
abline(reg)
# Use R's functionality to extract the leverages, and perform an SVD on the
# design matrix. With the diagonal matrix produced by the SVD, it is possible
# to get an idea of the level of variance each column of the design matrix
# imparts to our model. Note that in plotting the SVD of the design matrix,
# there are no obvious columns contributing unduly to the variance:
X = model.matrix(reg)
h = hat(X)   # hat() returns the diagonal of the hat matrix (the leverages)
svd.X = svd(X)
plot(svd.X$d, xlab = "", ylab = "Singular Values")
# Cook's distance also is useful as it gives us a heuristic (and analytical)
# way of assessing the residuals in our model. We first plot the leverages,
# then plot Cook's distance against the residuals:
lev = h
plot(A[, 2], lev)
plot(residuals(reg), cooks.distance(reg), xlab = "Residuals", ylab = "Cook's D")
# Next, we can use the influence() function to calculate the change in the
# model that occurs via the omission of any observation. This is given
# by the $coefficients component.
# It can be seen that omitting the fifth observation would tweak the
# intercept term up by ~.5%, and the slope term by 1.2%.
influence(reg)$coefficients

Introduction to the Use of Basis Functions in Regression

In chapter 3 we discussed the creation and modification of linear models. However, a simple linear specification is not always adequate. First, look at this more general specification of a statistical model for some phenomenon $f(X)$:

$$f(X) = \sum_{m=1}^{M} \beta_m h_m(X) \qquad (3)$$

In this case, $h_m(X)$ is some functional applied to the vector $X$, i.e. $h_m(X): \mathbb{R}^p \to \mathbb{R}$ for $m \in \{1, 2, \ldots, M\}$. In fact, (3) is a linear basis expansion in $X$. The range of choices for $h_m(X)$ is very large. For instance, we may simply let $h_m(X)$ be $X_m$, which produces our standard regression model. Alternatively, it could be a quadratic representation of the form:

$$f(X) = \beta_0 + \beta_1 X + \beta_2 X^T X$$

It could also be a function like $\log(X)$ or an indicator function. We may choose the functional to suit our purposes.
This chapter considers the following methods of manipulating the basis functions to influence the underlying model:
1. Piecewise polynomials and splines - polynomial representations of the regression model. For instance, a cubic spline, or a set of piecewise basis functions which form the basis for the model, may be used.
2. Wavelet bases, in which a large number of potential basis functions are formed from a dictionary $\mathcal{D}$.

Introduction to Piecewise Polynomials and Splines

Instead of identifying parameters over an entire region, we can instead divide up the region over which the model is used into segments. The borders of these segments are referred to as knots by the authors of ESL. A very elementary example is dividing the region $[0, 1]$ into $N = 3$ sub-regions. On each of these regions, we define basis functions as follows:
1. $h_1(X) = 1$ where $X \in [0, 1/3)$
2. $h_2(X) = 1$ where $X \in [1/3, 2/3)$
3. $h_3(X) = 1$ where $X \in [2/3, 1]$
When we fit this model, what we get is three different lines which are:
- disconnected at the knots (referred to as $\xi_i$), and
- flat, with a slope of 0.
In other words, we have created three different regression models which contain only constant terms, nothing else.
The above example may be improved upon by adding a slope term. In this case we will get a model which now consists of three disconnected regression lines.
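A brief R sketch of these two piecewise fits, again on simulated data (the variables below are invented for illustration): the first model uses only the three indicator basis functions and yields three flat segments, while the second adds a slope within each region and yields three disconnected lines.

set.seed(2)
x <- runif(200)                                # simulated predictor on [0, 1]
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)    # simulated response
region <- cut(x, breaks = c(0, 1/3, 2/3, 1), include.lowest = TRUE)
fit.const  <- lm(y ~ 0 + region)               # h_m(X) = 1 on each region: three flat lines
fit.linear <- lm(y ~ 0 + region + region:x)    # add a per-region slope: three disconnected lines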
What can be added to this? Constraints on connectedness. Let's break up the region $[0, 1]$ into $N = 2$ parts, and then use first-order (linear) basis functions like those described above:

1. $h_1(X) = 1 + X$ where $X \in [0, 0.5)$ and $0$ otherwise.

2. $h_2(X) = 1 + X$ where $X \in [0.5, 1]$ and $0$ otherwise.

We add the constraint that at the knot $\xi_1 = 0.5$, $\beta_1 h_1(\xi_1) = \beta_2 h_2(\xi_1)$. In a naive sense, our regression model is:

$$f(X) = \beta_1 + \beta_1 X + \beta_2 + \beta_2 X \qquad (4)$$

However, we have to satisfy:

$$\beta_1 + \beta_1 \xi_1 = \beta_2 + \beta_2 \xi_1$$
$$\beta_1 + 0.5\,\beta_1 = \beta_2 + 0.5\,\beta_2$$
$$\beta_2 = \frac{\beta_1 (1 + 0.5)}{1 + 0.5} = \beta_1$$

We ended up with the following regression model!

$$f(X) = \beta_1 + \beta_1 X \qquad (5)$$

The gist is this: whenever we add another constraint on the connection of the basis functions, we reduce the number of coefficients we can solve for. In the book, the authors use the example of a system of cubic polynomials over three regions. If you require only that the functions be connected, you sacrifice a coefficient for each such constraint. For the case of the cubic polynomial, if you require continuity up to the third derivative, you end up with a global cubic polynomial. So, when we place more restrictions, we produce a smoother, better-behaved function, but sacrifice the flexibility of the underlying statistical model.
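To see the coefficient counting concretely, here is a hedged R sketch (simulated data, with an invented knot at $\xi = 0.5$): the unconstrained piecewise-linear fit has four coefficients, while enforcing continuity at the knot through the truncated basis $\{1, X, (X - 0.5)_+\}$ leaves only three.

set.seed(3)
x <- runif(200)
y <- 1 + x + 2 * pmax(x - 0.5, 0) + rnorm(200, sd = 0.2)   # simulated response with a kink at 0.5
region <- x >= 0.5
fit.disconnected <- lm(y ~ region * x)                     # 4 coefficients: two unrelated lines
fit.continuous   <- lm(y ~ x + pmax(x - 0.5, 0))           # 3 coefficients: lines forced to meet at the knot
length(coef(fit.disconnected))                             # 4
length(coef(fit.continuous))                               # 3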
When we speak of an order-$M$ spline, we are discussing a piecewise polynomial of degree $M - 1$ with continuous derivatives up to order $M - 2$. For instance, an order-4 spline is a cubic spline.
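For instance, base R's splines package expresses this directly; as a sketch on simulated data, bs() with degree = 3 builds an order-4 (cubic) spline basis with the requested interior knots, which lm() then fits by least squares:

library(splines)
set.seed(4)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)                # simulated data
# Order-4 (cubic) spline basis with interior knots at 1/3 and 2/3.
fit.cubic <- lm(y ~ bs(x, knots = c(1/3, 2/3), degree = 3))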
There is one other form of constraint which can be brought to bear when putting together a piecewise basis for our regression model. In this case we require that the function be linear beyond the boundary knots (the natural spline constraint). This relates to the oscillations that can occur with high-order polynomials near the boundaries: by requiring that the function become linear at the boundaries, a restriction is placed that prevents these wild changes.
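In R this boundary constraint corresponds to a natural spline basis; as a sketch (simulated data as before), ns() from the splines package is linear beyond the boundary knots, while the unconstrained bs() basis is not:

library(splines)
set.seed(5)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)                # simulated data
fit.bs <- lm(y ~ bs(x, knots = c(1/3, 2/3)))               # ordinary cubic spline basis
fit.ns <- lm(y ~ ns(x, knots = c(1/3, 2/3)))               # natural spline: linear beyond the boundary knots
# Compare the two fits near the boundaries of [0, 1].
grid <- seq(0, 1, length.out = 101)
pred <- cbind(bs = predict(fit.bs, newdata = data.frame(x = grid)),
              ns = predict(fit.ns, newdata = data.frame(x = grid)))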
