
Computational Statistics & Data Analysis 6 (1988) 177-183
North-Holland

Applications of the jackknife procedure in ridge regression


Hans NYQUIST
Department of Statistics, University of Umeå, S-901 87 Umeå, Sweden

Received 29 December 1986
Revised 15 July 1987

Abstract: Three aspects of the application of the jackknife technique to ridge regression are considered, viz. as a bias estimator, as a variance estimator, and as an indicator of observations' influence on parameter estimates. The ridge parameter is considered non-stochastic. The jackknifed ridge estimator is found to be a ridge estimator with a smaller value of the ridge parameter. Hence it has a smaller bias but a larger variance than the ridge estimator. The variance estimator is expected to be robust against heteroscedastic error variances as well as against outliers. A measure of observations' influence on the estimates of regression parameters is proposed.

Keywords: Bias reduction, Generalized ridge regression, Influential observations, Jackknifing, Linear model, Multicollinearity, Outlier detection, Variance estimation.

1. Introduction

The presence of multicollinearity among regressor variables in linear regression analysis may cause highly unstable least squares estimates of the regression parameters. An alternative estimation technique for non-orthogonal problems is the ridge regression estimator, originally introduced by Hoerl and Kennard [5] and [6] (see Hoerl and Kennard [7] for a bibliographical survey of ridge regression). Thus, we consider the linear regression model $y = X\beta + \varepsilon$, where $y$ is an $n \times 1$ vector of observations on the response variable, $X$ is an $n \times m$ matrix of observations on nonstochastic regressors, $\beta$ is an $m \times 1$ parameter vector, and $\varepsilon$ is an $n \times 1$ vector of independent and identically distributed random errors with mean zero and variance $\sigma^2$. Let $\Gamma$ be the $m \times m$ orthonormal matrix of eigenvectors of $X^TX$. Then, the regressors can be transformed as $Z = X\Gamma$ and the model reformulated as $y = Z\alpha + \varepsilon$ with $\alpha = \Gamma^T\beta$. The generalized ridge estimator (GRE) of $\alpha$ is defined as
$$a(C) = (\Lambda + C)^{-1}Z^Ty,$$


where $\Lambda = Z^TZ = \Gamma^TX^TX\Gamma$ is the $m \times m$ diagonal matrix of eigenvalues of $X^TX$ and $C$ is an $m \times m$ diagonal matrix with non-negative elements. The GRE of $\beta$ is
$$b(C) = (X^TX + \Gamma C\Gamma^T)^{-1}X^Ty.$$

In particular, the least squares estimator is obtained for $C = 0$, the zero matrix, and the ordinary ridge estimator for $C = cI$, where $c > 0$ is a scalar.

The jackknife technique was introduced by Quenouille [11] and [12] as a nonparametric bias-reducing technique and extended by Tukey [13] to a nonparametric variance estimation technique. More recently, Cook [1] has used jackknifing as a tool for exhibiting influential observations in a data set. Our goal in this paper is to apply the jackknife technique to the ridge regression estimator. More specifically, bias reduction, variance estimation, and a technique for tracing influential observations using jackknifing are discussed in Sections 2, 3, and 4, respectively.
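As a concrete illustration of the estimators defined above, the following Python/NumPy sketch computes the GRE on simulated data; the data-generating step, the sample sizes, and the choices of $C$ are assumptions made here for illustration only, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data (not from the paper): n = 30 observations, m = 3 regressors,
# with the second regressor nearly collinear with the first.
n, m = 30, 3
X = rng.normal(size=(n, m))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=n)      # induce multicollinearity
beta = np.array([1.0, 2.0, -1.0])
y = X @ beta + rng.normal(scale=0.5, size=n)

# Eigendecomposition of X'X: Gamma holds the eigenvectors, Lambda the eigenvalues.
eigvals, Gamma = np.linalg.eigh(X.T @ X)
Lambda = np.diag(eigvals)
Z = X @ Gamma                                      # transformed regressors

def gre_alpha(C):
    """Generalized ridge estimator a(C) = (Lambda + C)^{-1} Z'y."""
    return np.linalg.solve(Lambda + C, Z.T @ y)

def gre_beta(C):
    """GRE of beta: b(C) = Gamma a(C) = (X'X + Gamma C Gamma')^{-1} X'y."""
    return Gamma @ gre_alpha(C)

C0 = np.zeros((m, m))      # C = 0 gives ordinary least squares
C1 = 0.1 * np.eye(m)       # C = cI gives the ordinary ridge estimator
print(gre_beta(C0), gre_beta(C1))
```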

2. Jackknifed ridge estimators

2.1. The jackknife procedure

The jackknife procedure is based on sequentially removing observations and recomputing the parameter estimate. Removing row $i$ from $y$ and $Z$ and applying the Binomial Inverse Theorem (Woodbury [14]) yields the recomputed parameter estimate
$$a(C)_{(i)} = (\Lambda - z_i^Tz_i + C)^{-1}(Z^Ty - z_i^Ty_i)$$
$$= \left[(\Lambda + C)^{-1} + (\Lambda + C)^{-1}z_i^Tz_i(\Lambda + C)^{-1}/\{1 - h(C)_i\}\right](Z^Ty - z_i^Ty_i)$$
$$= a(C) - (\Lambda + C)^{-1}z_i^Te(C)_i/\{1 - h(C)_i\},$$
where
$$h(C)_i = z_i(\Lambda + C)^{-1}z_i^T = x_i(X^TX + \Gamma C\Gamma^T)^{-1}x_i^T,$$
and
$$e(C)_i = y_i - z_ia(C) = y_i - x_ib(C)$$
is the $i$th residual, and $x_i$, $y_i$, and $z_i$ denote the $i$th row in $X$, $y$, and $Z$, respectively, $x_i$ and $z_i$ being expressed as row vectors. Pseudovalues are defined as
$$P_i = na(C) - (n - 1)a(C)_{(i)} = a(C) + (n - 1)(\Lambda + C)^{-1}z_i^Te(C)_i/\{1 - h(C)_i\}.$$
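Continuing the illustrative simulation above, the single-deletion formula and the pseudovalues can be sketched and checked against a brute-force refit; the function and variable names below are mine, not the paper's.

```python
def gre_alpha_loo(C, i):
    """a(C)_(i): the GRE recomputed with row i deleted, via the closed form above."""
    a = gre_alpha(C)
    Ainv = np.linalg.inv(Lambda + C)
    h_i = Z[i] @ Ainv @ Z[i]                   # ridge leverage h(C)_i
    e_i = y[i] - Z[i] @ a                      # ridge residual e(C)_i
    return a - Ainv @ Z[i] * e_i / (1.0 - h_i)

C = 0.1 * np.eye(m)
i = 4
# Brute-force check: delete row i from Z and y and recompute directly.
a_direct = np.linalg.solve(Lambda - np.outer(Z[i], Z[i]) + C,
                           Z.T @ y - Z[i] * y[i])
assert np.allclose(gre_alpha_loo(C, i), a_direct)

# Pseudovalues P_i = n a(C) - (n-1) a(C)_(i); their mean is the jackknifed estimator.
P = np.array([n * gre_alpha(C) - (n - 1) * gre_alpha_loo(C, k) for k in range(n)])
a_jack = P.mean(axis=0)                        # Quenouille's bias-corrected estimator
```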


Quenouille's bias-corrected jackknifed estimator of $\alpha$ is then defined by
$$\tilde{a}(C) = \frac{1}{n}\sum_{i=1}^{n}P_i = a(C) + K(C),$$
with
$$K(C) = \frac{n-1}{n}(\Lambda + C)^{-1}\sum_{i=1}^{n}z_i^Te(C)_i/\{1 - h(C)_i\},$$
and $-K(C)$ is the jackknifed estimator of the bias of $a(C)$. The jackknifed ridge estimator of $\beta$ is immediately found as $\tilde{b}(C) = \Gamma\tilde{a}(C)$.

2.2. The weighted jackknife procedure

Hinkley [4] proposed the weighted jackknife procedure. Instead of equal weighting when constructing the pseudo-values, weights are introduced and weighted pseudo-values are defined as
$$Q_i = a(C) + n\{1 - h(C)_i\}\{a(C) - a(C)_{(i)}\},$$
yielding the weighted jackknifed estimator
$$a^W(C) = n^{-1}\sum_{i=1}^{n}Q_i = a(C) + K^W(C),$$
where
$$K^W(C) = (\Lambda + C)^{-1}Ca(C),$$
and $-K^W(C)$ is the weighted jackknife estimator of the bias of $a(C)$. The corresponding estimator of $\beta$ is
$$b^W(C) = \Gamma a^W(C) = b(C) + (X^TX + \Gamma C\Gamma^T)^{-1}\Gamma C\Gamma^Tb(C).$$
Since $\Lambda$ and $C$ are diagonal matrices, further calculations yield
$$a^W(C) = \left((\Lambda + C)^{-1} + (\Lambda + C)^{-1}C(\Lambda + C)^{-1}\right)Z^Ty = (\Lambda + C)^{-2}(\Lambda + 2C)Z^Ty$$
$$= \left(\Lambda + C^2(\Lambda + 2C)^{-1}\right)^{-1}Z^Ty = a\!\left(C^2(\Lambda + 2C)^{-1}\right) = a(C^W)$$
with $C^W = C^2(\Lambda + 2C)^{-1}$. Hence, applying the weighted jackknife technique to $a(C)$ is equivalent to changing the constant $C$ to $C^W$. As $0 \le C^W < C$, with strict inequalities if $C > 0$, the weighted jackknife technique suggests using a smaller value of $C$ in order to reduce bias.


Since
$$\mathrm{Bias}[a(C)] = -(\Lambda + C)^{-1}C\alpha \qquad (1)$$
and
$$V[a(C)] = \sigma^2\Lambda(\Lambda + C)^{-2}, \qquad (2)$$
it is easy to verify that
$$\mathrm{Bias}[a^W(C)] = \mathrm{Bias}[a(C^W)] = (\Lambda + C)^{-1}C\,\mathrm{Bias}[a(C)] \qquad (3)$$
and
$$V[a^W(C)] = V[a(C^W)] = \left(I + (\Lambda + C)^{-1}C\right)^2V[a(C)]. \qquad (4)$$
Hence, we find that $|\mathrm{Bias}[a^W(C)]| < |\mathrm{Bias}[a(C)]|$, the reduction factor being $(\Lambda + C)^{-1}C$. This factor is zero for $C = 0$, the least squares case. Further, $(\Lambda + C)^{-1}C$ tends to the identity matrix when the elements in $C$ tend to infinity. Thus, the bias-reducing effect obtained by applying the weighted jackknife procedure to the GRE decreases when the elements in $C$ increase. The variance is increased by the factor $B = \{I + (\Lambda + C)^{-1}C\}^2$ when applying the weighted jackknife procedure to the GRE. In the least squares case, $C = 0$, $B$ equals the identity matrix. Further, the diagonal elements in $B$ are strictly increasing functions of the elements in $C$, tending to 4 as the elements in $C$ tend to infinity. We therefore have that
$$V[a(C)] \le V[a^W(C)] < 4\,V[a(C)].$$
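A short numerical check of the reparametrisation $a^W(C) = a(C^W)$, under the same illustrative setup as the earlier sketches, might look as follows; the particular diagonal $C$ is arbitrary.

```python
# Weighted jackknifed estimator a^W(C) = a(C) + (Lambda + C)^{-1} C a(C),
# and the equivalent reduced ridge matrix C^W = C^2 (Lambda + 2C)^{-1}.
C = np.diag([0.3, 0.05, 0.8])                   # an arbitrary non-negative diagonal C
a_w = gre_alpha(C) + np.linalg.solve(Lambda + C, C @ gre_alpha(C))
C_w = C @ C @ np.linalg.inv(Lambda + 2 * C)
assert np.allclose(a_w, gre_alpha(C_w))         # a^W(C) = a(C^W)
assert np.all(np.diag(C_w) < np.diag(C))        # C^W < C elementwise (strict for C > 0)
```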

Note also that $V[a^W(C)]$ tends to the zero matrix when the elements in $C$ are increasing. The MSEs of $a(C)$ and $a^W(C)$ can now be calculated using formulae (1) to (4). Comparing the MSEs for the particular parameter $\alpha_j$ we get
$$\mathrm{MSE}[a_j(C)] - \mathrm{MSE}[a_j^W(C)] = \frac{\lambda_jc_j\left(\alpha_j^2\lambda_jc_j + 2\alpha_j^2c_j^2 - 2\sigma^2\lambda_j - 3\sigma^2c_j\right)}{(\lambda_j + c_j)^4},$$
where $c_j$ and $\lambda_j$ denote the $j$th diagonal elements in $C$ and $\Lambda$, respectively. An examination of this difference shows that it is zero for $c_j = 0$ and for
$$c_j = c_j^* = \left\{3\sigma^2 - \alpha_j^2\lambda_j + \left(\alpha_j^4\lambda_j^2 + 9\sigma^4 + 10\sigma^2\alpha_j^2\lambda_j\right)^{1/2}\right\}/(4\alpha_j^2),$$
and tends to zero when $c_j \to \infty$ (note that $0 < 1.5\sigma^2/\alpha_j^2 < c_j^*$). For $0 < c_j < c_j^*$ the difference is negative, favouring the ridge estimator, and for $c_j^* < c_j < \infty$ the difference is positive, favouring the weighted jackknifed ridge estimator.

Differentiating the expressions for the MSE and setting the derivatives equal to zero yields an equation, the solution of which gives the optimal $C$ matrix, i.e. the matrix that minimizes MSE. For the GRE this is known to be $C_{opt} = \mathrm{diag}(\sigma^2/\alpha_1^2, \ldots, \sigma^2/\alpha_m^2)$ (Hoerl and Kennard [5]). Evidently, the optimal $C$ matrix for the weighted jackknifed ridge estimator, $C_{opt}^W$ say, is the solution to
$$(C_{opt}^W)^2(\Lambda + 2C_{opt}^W)^{-1} = C_{opt},$$
or
$$C_{opt}^W = C_{opt} + \left(C_{opt}^2 + \Lambda C_{opt}\right)^{1/2}.$$


Since $a^W(C_{opt}^W) = a(C_{opt})$ we conclude that the GRE and the weighted jackknifed ridge estimator are equivalent with respect to bias, variance, and MSE when the estimators are working at $C_{opt}$ and $C_{opt}^W$, respectively, the $C$-matrices that minimize MSE. In practice $C_{opt}$ is usually unknown and various estimates of it have been proposed. These estimates are often selected so as to minimize some estimate of MSE. Using $-K^W(C)$ as an estimator of bias we find that $K^W(C)K^W(C)^T + \sigma^2\Lambda(\Lambda + C)^{-2}$ (with $\sigma^2$ possibly replaced by some estimator) is an estimator of MSE. This estimator is positive and tends to zero as $C$ tends to infinity. Hence, a decision rule minimizing this estimate of MSE would always delete all explanatory variables!
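In a simulation, where $\alpha$ and $\sigma^2$ are known, the relation between the two optimal ridge matrices can be verified numerically; the following sketch continues the illustrative setup used above and is not part of the paper.

```python
# C_opt = diag(sigma^2 / alpha_j^2) minimizes the MSE of the GRE (Hoerl and Kennard [5]).
# The weighted-jackknife optimum solves C^2 (Lambda + 2C)^{-1} = C_opt, i.e.
# C_opt_w = C_opt + (C_opt^2 + Lambda C_opt)^{1/2}.
sigma2 = 0.25                              # error variance used to simulate y above (scale 0.5)
alpha = Gamma.T @ beta                     # true alpha = Gamma' beta (known only in simulation)
C_opt = np.diag(sigma2 / alpha**2)
C_opt_w = C_opt + np.sqrt(C_opt @ C_opt + Lambda @ C_opt)
a_w_at_opt = gre_alpha(C_opt_w) + np.linalg.solve(Lambda + C_opt_w,
                                                  C_opt_w @ gre_alpha(C_opt_w))
assert np.allclose(a_w_at_opt, gre_alpha(C_opt))   # a^W(C_opt_w) = a(C_opt)
```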

3. Variance estimation

An important feature of the jackknife procedure is the possibility of constructing a distribution-free variance estimator for the parameter estimator. According to the standard definition, the jackknife estimator of $V[a(C)]$ is
$$V_J = \{n(n-1)\}^{-1}\sum_{i=1}^{n}\{P_i - \tilde{a}(C)\}\{P_i - \tilde{a}(C)\}^T$$
$$= \frac{n-1}{n}(\Lambda + C)^{-1}\left[\sum_{i=1}^{n}s(C)_is(C)_i^T - n^{-1}\left(\sum_{i=1}^{n}s(C)_i\right)\left(\sum_{i=1}^{n}s(C)_i\right)^T\right](\Lambda + C)^{-1},$$
where $s(C)_i = z_i^Te(C)_i/\{1 - h(C)_i\}$. Alternatively, using the weighted jackknife estimator we obtain
$$V_W = \{n(n-m)\}^{-1}\sum_{i=1}^{n}\{Q_i - a^W(C)\}\{Q_i - a^W(C)\}^T$$
$$= \frac{n}{n-m}(\Lambda + C)^{-1}\left[Z^TA(C)Z - n^{-1}Ca(C)a(C)^TC\right](\Lambda + C)^{-1},$$
where $A(C)$ is the $n \times n$ diagonal matrix with $e(C)_i^2$ as its diagonal elements. The motivation for using $n(n-m)$ in the denominator in the definition of $V_W$ is that $V_W$ then is an unbiased estimator of $V[a(C)]$ for $C = 0$ in a balanced design, i.e. when $h(C)_i = m/n$. Except for that case $V_W$ is generally biased (Hinkley [4]). Since $V_W$ is distribution-free it is reasonable to believe that it is insensitive to departures from distributional assumptions in the model. In particular we expect it to work well in cases with heteroscedastic error variances and with outlying observations.

Finally, a weighted jackknife estimator of the MSE of $a(C)$ is $K^W(C)K^W(C)^T + V_W$. Also this estimator tends to zero as the diagonal elements in $C$ tend to infinity. Thus, it is not suitable to base a selection rule for $C$ on this estimator of MSE.
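A sketch of $V_W$ in the same illustrative setting; the helper name and the final back-transformation to the $\beta$ scale are my own additions, not prescriptions from the paper.

```python
def weighted_jackknife_variance(C):
    """V_W = n/(n-m) (Lambda+C)^{-1} [Z'A(C)Z - n^{-1} C a a' C] (Lambda+C)^{-1},
    where A(C) = diag(e(C)_i^2)."""
    a = gre_alpha(C)
    e = y - Z @ a                              # ridge residuals e(C)
    Ainv = np.linalg.inv(Lambda + C)
    inner = Z.T @ np.diag(e**2) @ Z - np.outer(C @ a, C @ a) / n
    return n / (n - m) * Ainv @ inner @ Ainv

Vw = weighted_jackknife_variance(0.1 * np.eye(m))
# A corresponding variance estimate for b(C) = Gamma a(C) is Gamma Vw Gamma'.
Vb = Gamma @ Vw @ Gamma.T
```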


4. Measuring influence

Tracing influential observations is an important step in data analysis. The purpose of this step is to isolate so-called outliers in the data set. There are at least two reasons for this in regression analysis. Firstly, outliers may strongly influence parameter estimates and hence deteriorate results. Once detected, these outliers may be de-emphasized or omitted in a second estimation of the model. Secondly, it has recently been recognized that multicollinearity can be the result of a few data points with very large values on two or more variables (Mason and Gunst [10]). If a present multicollinearity is found to be outlier-induced, an outlier-resistant estimator (e.g. Krasker and Welsch [9] and Huber [8]) is probably a better alternative to least squares than biased estimators such as ridge regression.

There are several suggestions for measuring an observation's influence on parameter estimates. One alternative for the linear model, proposed by Cook [1], is to base a measure on changes in parameter estimates when one observation is removed. Implementing this proposal for the GRE we get
$$J_i\{b(C)\} = \{b(C) - b(C)_{(i)}\}^T\hat{V}\{b(C)\}^{-1}\{b(C) - b(C)_{(i)}\}/m,$$

where $\hat{V}\{b(C)\}$ is some estimate of $V\{b(C)\}$. The term influential observation then refers to an observation whose inclusion in the data set substantially changes estimates of the regression parameters. Hence, an influential observation is not necessarily a harmful outlier. If an observation is classified as influential, other considerations are needed to discriminate between harmful outliers and good, informative observations. For an illustration of the application of the jackknife technique for identifying influential observations we refer to an earlier draft of this paper. It should be noted here that two or more outliers in close proximity to one another in design space may remain undetected by single-deletion diagnostics such as $J_i\{b(C)\}$. If it is felt that this is the case, a diagnostic that has been proposed for detecting groups of outliers (e.g. Cook and Weisberg [2] and Gray and Ling [3]) should be applied to the ridge estimator.
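A possible implementation sketch of $J_i\{b(C)\}$, reusing the earlier illustrative functions and taking the weighted jackknife estimate $\Gamma V_W\Gamma^T$ as the required estimate of $V\{b(C)\}$ (one admissible choice, not the only one):

```python
def ridge_influence(C):
    """Cook-type influence J_i{b(C)} based on single-case deletion of the GRE,
    using the weighted jackknife variance estimate for V{b(C)}."""
    b = Gamma @ gre_alpha(C)
    Vb_inv = np.linalg.inv(Gamma @ weighted_jackknife_variance(C) @ Gamma.T)
    J = np.empty(n)
    for i in range(n):
        b_i = Gamma @ gre_alpha_loo(C, i)      # b(C)_(i) from the closed-form update
        d = b - b_i
        J[i] = d @ Vb_inv @ d / m
    return J

J = ridge_influence(0.1 * np.eye(m))
print(np.argsort(J)[-3:])                      # indices of the three most influential cases
```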

Acknowledgements

The author wishes to thank an associate editor for helpful comments which have led to a substantial improvement of this manuscript.

References

[1] R.D. Cook, Detection of influential observations in linear regression, Technometrics 19 (1977) 15-18.
[2] R.D. Cook and S. Weisberg, Residuals and Influence in Regression (Chapman and Hall, New York, 1982).


[3] J.B. Gray and R.F. Ling, K-clustering as a detection tool for influential subsets in regression, Technometrics 26 (1984) 305-330.
[4] D.V. Hinkley, Jackknifing in unbalanced situations, Technometrics 19 (1977) 285-292.
[5] A.E. Hoerl and R.W. Kennard, Ridge regression: biased estimation for non-orthogonal problems, Technometrics 12 (1970) 55-67.
[6] A.E. Hoerl and R.W. Kennard, Ridge regression: applications to nonorthogonal problems, Technometrics 12 (1970) 69-82.
[7] A.E. Hoerl and R.W. Kennard, Ridge regression 1980: advances, algorithms and applications, American Journal of Mathematical and Management Sciences 1 (1981) 5-83.
[8] P.J. Huber, Minimax aspects of bounded-influence regression, Journal of the American Statistical Association 78 (1983) 66-80.
[9] W.S. Krasker and R.E. Welsch, Efficient bounded-influence regression estimation, Journal of the American Statistical Association 77 (1982) 595-604.
[10] R.L. Mason and R.F. Gunst, Outlier-induced collinearities, Technometrics 27 (1985) 401-407.
[11] M. Quenouille, Approximate tests of correlation in time series, Journal of the Royal Statistical Society Ser. B 11 (1949) 68-84.
[12] M. Quenouille, Notes on bias in estimation, Biometrika 43 (1956) 353-360.
[13] J. Tukey, Bias and confidence in not quite large samples (abstract), Annals of Mathematical Statistics 29 (1958) 614.
[14] M. Woodbury, Inverting modified matrices, Memorandum no. 42, Statistical Research Group, Princeton University (1950).
