
Abstract

Chapter five of ESL considers abstractions of the concept of linear regression. Instead of supposing we will be putting together a simple linear model, ESL dives into the concept of basis functions, preprocessing, and the use of smoothing parameters in regression models. The chapter finishes with some functional analysis and the employment of wavelets in statistical analysis.

Chapter 5: Basis Functions and Regularization


In chapters two and three the regression model was discussed as coming in a form like

$$y = \beta_0 + \beta_1 x_1 + \epsilon_i \qquad (1)$$

in the case of a one-dimensional model. This explanation can be generalized by noting that the $X$ term is in fact a basis function (multiplied by $\beta_1$), and that we can specify other models which are analogous to that produced by the RSS functional above, but with additional specifications.
For instance, we may specify a linear basis of the predictor variables $X$ wherein some function is applied to $X$. We'll denote this basis for $X$ as $h_m(X)$:

$$f(X) = \sum_{m=1}^{M} \beta_m h_m(X) \qquad (2)$$
This basis function $h_m(X)$ may be a constant, or it may be the value of $X$ taken to a particular power, or perhaps some other functional transformation. For instance, recall this question from scientific computing:

Note: Add example section type.


To use linear least squares, the function must be linear in its parameters $p$ and $e$. To arrive at the linear form, we begin with the polar-coordinate equation for the orbit:

$$r = \frac{p}{1 - e\cos(\theta)}$$

Inverting this equation produces:


$$\frac{1}{r} = \frac{1 - e\cos(\theta)}{p} = \frac{1}{p} - \frac{e}{p}\cos(\theta)$$
The value multiplying $1/p$ is the constant $1$, while the values multiplying $e/p$ are $-\cos(\theta_i)$. Treating $c_1 = 1/p$ and $c_2 = e/p$ as the unknowns, we effectively define a regression equation with a constant term:

$$\frac{1}{r} = \frac{1}{p} - \frac{e}{p}\cos(\theta)$$
Now, using the data points supplied at the beginning of the problem, the design matrix (for this problem denoted as $A$) is:

$$A = \begin{bmatrix} 1 & -0.669130 \\ 1 & -0.390731 \\ 1 & -0.121869 \\ 1 & \phantom{-}0.309016 \\ 1 & \phantom{-}0.587785 \end{bmatrix}$$

where $b$ is made of the respective observations of $1/r$:

$$b = \begin{bmatrix} 0.370370 \\ 0.5 \\ 0.618046 \\ 0.833333 \\ 0.980392 \end{bmatrix}$$
By solving the least-squares problem $Ac \approx b$ (e.g., via the normal equations), the values of the coefficients for this regression equation are shown to be:

$$c = \begin{bmatrix} 0.6880 \\ 0.4841 \end{bmatrix}$$
Using the coefficients in $c$, which are related to $p$ and $e$ by

$$\frac{1}{p} = c_1, \qquad \frac{e}{p} = c_2,$$

the parameters recovered from the linear least squares estimate of $1/r$ are:

$$p = 1/0.688 = 1.453488$$
$$e = 1.453488 \times 0.4841 = 0.703633$$
Which gives:

$$r = \frac{1.453488}{1 - 0.703633\cos(\theta)}$$

Displayed below are the results of this equation at the given points along with the absolute error:

θ        r̂          |r − r̂|
48°      2.74669     0.04669
67°      2.00462     0.00462
83°      1.58982     0.02018
108°     1.19389     0.00611
126°     1.02823     0.00823
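As a quick check, here is a small R sketch (not part of the original assignment) that recomputes the fitted radii and absolute errors above. The observed radii are assumed here to be the reciprocals of the entries of $b$, so the third error comes out slightly different from the table above, which appears to use a rounded observation.

# Hypothetical verification script: recompute the fitted orbit and its errors.
p <- 1.453488
e <- 0.703633
theta <- c(48, 67, 83, 108, 126) * pi / 180             # angles converted to radians
r.obs <- 1 / c(.370370, .5, .618046, .833333, .980392)  # observed radii, taken as 1/b
r.hat <- p / (1 - e * cos(theta))                       # fitted radii from r = p / (1 - e*cos(theta))
cbind(deg = c(48, 67, 83, 108, 126), r.hat = r.hat, abs.err = abs(r.obs - r.hat))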



This is a remake of the scientific computing assignment regarding regression.

# First, form the design matrix A and the vector of observations b. I provide a
# quick function to calculate the hat matrix, but for this example R's built-in
# functionality will be used to extract the hat matrix from the regression model.
A = as.matrix(cbind(rep(1, 5), c(-.669130, -.390731, -.121869, .309016, .587785)))
b = as.matrix(c(.370370, .5, .618046, .833333, .980392))
hatmake = function(x) {x %*% solve(t(x) %*% x) %*% t(x)}
H = hatmake(A)
c = solve(t(A) %*% A) %*% t(A) %*% b
b.hat = H %*% b
# The model produced using the built-in functionality of R:
reg = lm(b[, 1] ~ A[, 2])
plot(A[, 2], b, col = "red")
abline(reg)
# Use R's functionality to extract the leverages, and perform an SVD on the
# design matrix. With the diagonal matrix produced by the SVD, it is possible
# to get an idea of the level of variance each column of the design matrix
# imparts to our model. Note that in plotting the SVD of the design matrix,
# there are no obvious columns contributing unduly to the variance:
X = model.matrix(reg)
h = hat(X)   # hat() returns the diagonal of the hat matrix (the leverages)
svd.X = svd(X)
plot(svd.X$d, xlab = "", ylab = "Singular Values")
# Cook's distance also is useful as it gives us a heuristic (and analytical)
# way of assessing the residuals in our model. We first plot the leverages,
# then plot Cook's distance against the residuals:
lev = h
plot(A[, 2], lev)
plot(residuals(reg), cooks.distance(reg), xlab = "Residuals", ylab = "Cook's D")
# Next, we can use the influence() function to calculate the change in the
# model that occurs via the omission of any observation. This is given
# by the $coefficients component.
# It can be seen that omitting the fifth observation would tweak the
# intercept term up by ~.5%, and the slope term by 1.2%.
influence(reg)$coefficients

Introduction to the Use of Basis Functions in Regression

In chapter 3 we discussed the creation and modification of linear models. However, a simple linear specification is not always adequate. First, look at this more general specification of a statistical model for some phenomenon $f(X)$:

$$f(X) = \sum_{m=1}^{M} \beta_m h_m(X) \qquad (3)$$

In this case, $h_m(X)$ is some functional applied to the vector $X$, i.e. $h_m(X): \mathbb{R}^p \to \mathbb{R}$ for $m \in \{1, 2, \ldots, M\}$. In fact, (3) is a linear basis expansion in $X$. The range of choices for $h_m(X)$ is very large. For instance, we may simply let $h_m(X)$ be $X_m$, which produces our standard regression model. Alternatively, it could be a quadratic representation of the form:

$$f(X) = \beta_0 + \beta_1 X + \beta_2 X^T X$$

It could also be a function like $\log(X)$ or an indicator function. We may choose the functional to suit our purposes.
This chapter considers the following methods of manipulating the basis functions to influence the underlying model:
1. Piecewise polynomials and splines - polynomial representations of the regression model. For instance, a cubic spline, or a set of piecewise basis functions which form the basis for the model, may be used.
2. Wavelet bases, in which a large number of potential basis functions are formed from a dictionary $\mathcal{D}$.

Introduction to Piecewise Polynomials and Splines

Instead of identifying parameters over an entire region, we can instead divide up the region over which the model is used into segments. The borders of these segments are referred to as knots by the authors of ESL. A very elementary example is dividing the region $[0, 1]$ into $N = 3$ sub-regions. On each of these regions, we define basis functions as follows:
1. $h_1(X) = 1$ where $X \in [0, 1/3)$
2. $h_2(X) = 1$ where $X \in [1/3, 2/3)$
3. $h_3(X) = 1$ where $X \in [2/3, 1]$
When we fit this model, what we get is three different lines which are:
- disconnected at the knots (referred to as $\xi_i$), and
- flat, with a slope of 0.
In other words, we have created three different regression models which contain only constant terms, nothing else.
The above example may be improved upon by adding a slope term. In this case we will get a model which now consists of three disconnected regression lines.
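A brief R sketch of these two piecewise fits, again on simulated data (the variables below are invented for illustration): the first model uses only the three indicator basis functions and yields three flat segments, while the second adds a slope within each region and yields three disconnected lines.

set.seed(2)
x <- runif(200)                                # simulated predictor on [0, 1]
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)    # simulated response
region <- cut(x, breaks = c(0, 1/3, 2/3, 1), include.lowest = TRUE)
fit.const  <- lm(y ~ 0 + region)               # h_m(X) = 1 on each region: three flat lines
fit.linear <- lm(y ~ 0 + region + region:x)    # add a per-region slope: three disconnected lines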
What can be added to this? Constraints on connectedness. Let's break up the region $[0, 1]$ into $N = 2$ parts, and then use first-order (linear) basis functions like those described above:

1. $h_1(X) = 1 + X$ where $X \in [0, 0.5)$ and $0$ otherwise.

2. $h_2(X) = 1 + X$ where $X \in [0.5, 1]$ and $0$ otherwise.

We add the constraint that at the knot $\xi_1 = 0.5$, $\beta_1 h_1(\xi_1) = \beta_2 h_2(\xi_1)$. In a naive sense, our regression model is:

$$f(X) = \beta_1 + \beta_1 X + \beta_2 + \beta_2 X \qquad (4)$$

However, we have to satisfy:

$$\beta_1 + \beta_1 \xi_1 = \beta_2 + \beta_2 \xi_1$$
$$\beta_1 + 0.5\,\beta_1 = \beta_2 + 0.5\,\beta_2$$
$$\beta_2 = \frac{\beta_1 (1 + 0.5)}{1 + 0.5} = \beta_1$$

We ended up with the following regression model!

$$f(X) = \beta_1 + \beta_1 X \qquad (5)$$

The gist is this: whenever we add another constraint on the connection of the basis functions, we reduce the number of coefficients we can solve for. In the book, the authors use the example of a system of cubic polynomials over three regions. If you require only that the functions be connected, you sacrifice a coefficient for each such constraint. For the case of the cubic polynomial, if you require continuity up to the third derivative, you end up with a global cubic polynomial. So, when we place more restrictions, we produce a smoother, better-behaved function, but sacrifice the flexibility of the underlying statistical model.
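To see the coefficient counting concretely, here is a hedged R sketch (simulated data, with an invented knot at $\xi = 0.5$): the unconstrained piecewise-linear fit has four coefficients, while enforcing continuity at the knot through the truncated basis $\{1, X, (X - 0.5)_+\}$ leaves only three.

set.seed(3)
x <- runif(200)
y <- 1 + x + 2 * pmax(x - 0.5, 0) + rnorm(200, sd = 0.2)   # simulated response with a kink at 0.5
region <- x >= 0.5
fit.disconnected <- lm(y ~ region * x)                     # 4 coefficients: two unrelated lines
fit.continuous   <- lm(y ~ x + pmax(x - 0.5, 0))           # 3 coefficients: lines forced to meet at the knot
length(coef(fit.disconnected))                             # 4
length(coef(fit.continuous))                               # 3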
When we speak of an order-$M$ spline, we are discussing a piecewise polynomial of degree $M - 1$ with continuous derivatives up to order $M - 2$. For instance, an order-4 spline is a cubic spline.
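For instance, base R's splines package expresses this directly; as a sketch on simulated data, bs() with degree = 3 builds an order-4 (cubic) spline basis with the requested interior knots, which lm() then fits by least squares:

library(splines)
set.seed(4)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)                # simulated data
# Order-4 (cubic) spline basis with interior knots at 1/3 and 2/3.
fit.cubic <- lm(y ~ bs(x, knots = c(1/3, 2/3), degree = 3))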
There is one other form of constraint which can be brought to bear when putting together a piecewise basis for our regression model. In this case we require that the function be linear beyond the boundary knots (the natural spline constraint). This relates to the oscillations that can occur with high-order polynomials near the boundaries: by requiring that the function become linear at the boundaries, a restriction is placed that prevents these wild changes.
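In R this boundary constraint corresponds to a natural spline basis; as a sketch (simulated data as before), ns() from the splines package is linear beyond the boundary knots, while the unconstrained bs() basis is not:

library(splines)
set.seed(5)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)                # simulated data
fit.bs <- lm(y ~ bs(x, knots = c(1/3, 2/3)))               # ordinary cubic spline basis
fit.ns <- lm(y ~ ns(x, knots = c(1/3, 2/3)))               # natural spline: linear beyond the boundary knots
# Compare the two fits near the boundaries of [0, 1].
grid <- seq(0, 1, length.out = 101)
pred <- cbind(bs = predict(fit.bs, newdata = data.frame(x = grid)),
              ns = predict(fit.ns, newdata = data.frame(x = grid)))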
