
Week 13 Lecture: Nonlinear Least Squares Regression (Chapter 13)

Intrinsically Linear Models


Up to this point, we have been using linear regression models (though we did use several nonlinear transformations; e.g., polynomial regression). Some nonlinear models can be transformed to a linear form. For example, the exponential model with multiplicative error,
$$Y_i = \gamma_0 \exp(\gamma_1 X_i)\,\varepsilon_i$$
can be transformed with logarithms to:
$$\ln Y_i = \ln \gamma_0 + \gamma_1 X_i + \ln \varepsilon_i$$

We have seen this model before (a semilog model, in which only the response is log-transformed), and back-transforming its predictions to the original scale can introduce log-normal bias into the regression model. However, this bias can be corrected (see Baskerville 1972, Flewelling and Pienaar 1981, Snowdon 1991).
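As a minimal sketch of the transformation above, the following Python fits the log-transformed model by ordinary least squares and back-transforms the intercept. The function name `fit_loglinear` and the data are hypothetical. The factor `exp(mse / 2)` is the multiplicative bias correction commonly attributed to Baskerville (1972); it is included only as an illustration and should be checked against the cited papers before use.

```python
import math

def fit_loglinear(x, y):
    """OLS fit of ln(Y) = ln(g0) + g1 * X; returns g0, g1 and the log-scale MSE."""
    n = len(x)
    ln_y = [math.log(v) for v in y]           # requires every Y > 0
    x_bar = sum(x) / n
    ly_bar = sum(ln_y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (li - ly_bar) for xi, li in zip(x, ln_y))
    g1 = sxy / sxx                            # slope on the log scale
    ln_g0 = ly_bar - g1 * x_bar               # intercept on the log scale
    sse = sum((li - ln_g0 - g1 * xi) ** 2 for xi, li in zip(x, ln_y))
    mse = sse / (n - 2)
    return math.exp(ln_g0), g1, mse

# Hypothetical, noise-free data generated from Y = 3 * exp(0.2 * X).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0 * math.exp(0.2 * xi) for xi in x]
g0, g1, mse = fit_loglinear(x, y)

# Assumed correction factor for back-transformed predictions (see lead-in).
correction = math.exp(mse / 2.0)
```

With noise-free data the log-scale MSE is essentially zero, so the correction factor is essentially 1; with real data it grows with the residual variance on the log scale.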

True Nonlinear Models


Other nonlinear models are truly nonlinear and cannot be transformed to a linear form. For example, the sigmoid family of nonlinear models has been used extensively in the biological sciences, and some representatives include:

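1.) General Logistic Model.
$$Y_i = \frac{\gamma_0}{1 + \gamma_1 \exp(\gamma_2 X_i)} + \varepsilon_i$$

2.) Additive Error Exponential Model.
$$Y_i = \gamma_0 \exp(\gamma_1 X_i) + \varepsilon_i$$

3.) Chapman-Richards Model.
$$Y_i = \gamma_0 \left(1 - \exp(-\gamma_1 X_i)\right)^{\gamma_2} + \varepsilon_i$$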

Normal Equations
Recall from OLS, we want to minimize:
$$Q = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2$$

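The dependent variables are:

$$Y_i = E[Y_i] + \varepsilon_i, \qquad E[Y_i] = f(X_i, \gamma)$$
$$Y_i = f(X_i, \gamma) + \varepsilon_i$$

where $Y_i$ = vector of dependent variables, $X_i$ = vector of the independent variables, $\gamma$ = vector of the regression parameters, and $\varepsilon_i$ = vector of error terms; i.e.:

$$Y_i = \begin{bmatrix} Y_{i1} \\ Y_{i2} \\ \vdots \\ Y_{iq} \end{bmatrix}, \quad
X_i = \begin{bmatrix} X_{i1} \\ X_{i2} \\ \vdots \\ X_{iq} \end{bmatrix}, \quad
\gamma = \begin{bmatrix} \gamma_0 \\ \gamma_1 \\ \vdots \\ \gamma_{p-1} \end{bmatrix}, \quad
\varepsilon_i = \begin{bmatrix} \varepsilon_{i1} \\ \varepsilon_{i2} \\ \vdots \\ \varepsilon_{iq} \end{bmatrix}$$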

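Now, minimize Q:
$$Q = \sum_{i=1}^{n} \left[Y_i - f(X_i, \gamma)\right]^2$$

$$\frac{\partial Q}{\partial \gamma_k} = -2 \sum_{i=1}^{n} \left[Y_i - f(X_i, \gamma)\right] \left[\frac{\partial f(X_i, \gamma)}{\partial \gamma_k}\right]$$

and after setting these partial derivatives to zero and rearranging, we get:

$$\sum_{i=1}^{n} Y_i \left[\frac{\partial f(X_i, \gamma)}{\partial \gamma_k}\right]_{\gamma = g} - \sum_{i=1}^{n} f(X_i, g) \left[\frac{\partial f(X_i, \gamma)}{\partial \gamma_k}\right]_{\gamma = g} = 0, \qquad k = 0, 1, \ldots, p - 1$$

where g = vector of least squares estimates, $g_k$:

$$g_{p \times 1} = \begin{bmatrix} g_0 \\ g_1 \\ \vdots \\ g_{p-1} \end{bmatrix}$$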

The normal equations are nonlinear in gk. They have no closed form solutions. So, iterative procedures are necessary to solve the equations (e.g., Gauss - Newton, Marquardt, Method of Steepest Descent).
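The iterative idea can be sketched for the two-parameter exponential model Y = g0 exp(g1 X). The Python below is a minimal Gauss-Newton loop, not the procedure used by any particular package: at each pass it linearizes the model around the current estimates, solves the resulting linear normal equations for a step, and updates. The function name, starting values, and data are hypothetical, and the step-halving safeguard is an added assumption to keep SSE from increasing.

```python
import math

def gauss_newton_exponential(x, y, g0, g1, tol=1e-10, max_iter=50):
    """Fit Y = g0 * exp(g1 * X) by Gauss-Newton with step halving."""
    def sse(a, b):
        return sum((yi - a * math.exp(b * xi)) ** 2 for xi, yi in zip(x, y))

    current = sse(g0, g1)
    for _ in range(max_iter):
        # Columns of the derivative matrix D at the current estimates.
        d0 = [math.exp(g1 * xi) for xi in x]             # df/dg0
        d1 = [g0 * xi * math.exp(g1 * xi) for xi in x]   # df/dg1
        r = [yi - g0 * e for yi, e in zip(y, d0)]        # residuals
        # Linearized normal equations: (D'D) delta = D'r, solved as a 2x2 system.
        a11 = sum(u * u for u in d0)
        a12 = sum(u * v for u, v in zip(d0, d1))
        a22 = sum(v * v for v in d1)
        b1 = sum(u * ri for u, ri in zip(d0, r))
        b2 = sum(v * ri for v, ri in zip(d1, r))
        det = a11 * a22 - a12 * a12
        delta0 = (a22 * b1 - a12 * b2) / det
        delta1 = (a11 * b2 - a12 * b1) / det
        # Halve the step while it would increase SSE (added safeguard).
        step = 1.0
        while step > 1e-8 and sse(g0 + step * delta0, g1 + step * delta1) > current:
            step /= 2.0
        g0 += step * delta0
        g1 += step * delta1
        new = sse(g0, g1)
        converged = abs(current - new) <= tol * (new + tol)
        current = new
        if converged:
            break
    return g0, g1, current

# Hypothetical, noise-free data from Y = 50 * exp(-0.1 * X).
x = [float(i) for i in range(10)]
y = [50.0 * math.exp(-0.1 * xi) for xi in x]
g0_hat, g1_hat, sse_hat = gauss_newton_exponential(x, y, 40.0, -0.15)
```

Starting values matter for every such method; a common choice is to take them from a linearized fit or from a plot of the data.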

INFERENCES ABOUT NLS PARAMETERS


In NLS, we calculate MSE by:

$$MSE = \frac{SSE}{n - p} = \frac{e'e}{n - p} = \frac{\sum_{i=1}^{n} \left[Y_i - f(X_i, g)\right]^2}{n - p}$$

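MSE is biased, but the bias is small when n is large. Because the normal equations have no closed form solutions, exact variances, confidence intervals, $R^2$, F-tests, etc. are not available. However, we have a theorem that allows us to calculate asymptotic variances (see Theorem 13.32 on page 528):

Theorem: When the $\varepsilon_i$ are independent $N(0, \sigma^2)$ and n is large, the sampling distribution of g is approximately normal with $E[g] \approx \gamma$ and approximate variance-covariance matrix $\sigma^2 (D'D)^{-1}$, where D is the n x p matrix of partial derivatives $\partial f(X_i, \gamma) / \partial \gamma_k$ evaluated at g. Thus, the estimated variance-covariance matrix is $s^2(g) = MSE\,(D'D)^{-1}$.

So, we can calculate asymptotic t-values:

$$\frac{g_k - \gamma_k}{s(g_k)} \sim t_{n-p}, \qquad k = 0, 1, \ldots, p - 1$$

and the confidence intervals are calculated as before:

$$g_k \pm t_{1 - \alpha/2,\; n - p}\; s(g_k)$$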
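To make the asymptotic calculation concrete, here is a small Python sketch for the two-parameter exponential model Y = g0 exp(g1 X). It computes MSE, the diagonal of MSE (D'D)^-1, the asymptotic standard errors, and confidence intervals, where D holds the partial derivatives of the model evaluated at the estimates. The function name, data, and parameter values are hypothetical (the values merely stand in for converged estimates). When no critical value is supplied, the code falls back on a standard normal quantile as a large-sample stand-in for the t quantile with n - p degrees of freedom; that substitution is an assumption of this sketch.

```python
import math
from statistics import NormalDist

def asymptotic_inference(x, y, g0, g1, level=0.95, crit=None):
    """Asymptotic standard errors and intervals for Y = g0 * exp(g1 * X)."""
    n, p = len(x), 2
    d0 = [math.exp(g1 * xi) for xi in x]             # df/dg0 at the estimates
    d1 = [g0 * xi * math.exp(g1 * xi) for xi in x]   # df/dg1 at the estimates
    sse = sum((yi - g0 * e) ** 2 for yi, e in zip(y, d0))
    mse = sse / (n - p)
    # Elements of D'D and its determinant (2x2 case).
    a11 = sum(u * u for u in d0)
    a12 = sum(u * v for u, v in zip(d0, d1))
    a22 = sum(v * v for v in d1)
    det = a11 * a22 - a12 * a12
    # Diagonal of MSE * (D'D)^-1 gives the estimated variances.
    se0 = math.sqrt(mse * a22 / det)
    se1 = math.sqrt(mse * a11 / det)
    if crit is None:
        # Assumed large-sample substitute for t(1 - alpha/2; n - p).
        crit = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    ci = [(g0 - crit * se0, g0 + crit * se0),
          (g1 - crit * se1, g1 + crit * se1)]
    return {"mse": mse, "se": (se0, se1), "crit": crit, "ci": ci}

# Hypothetical data and hypothetical estimates.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.2, 5.9, 3.8, 2.1, 1.4, 0.8]
out = asymptotic_inference(x, y, g0=10.0, g1=-0.5)
```

For small samples, pass the tabulated t value through `crit` instead of relying on the normal fallback.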

Weighted NLS
When we have non-constant variances, we can use weights to eliminate heteroscedasticity. Recall that in weighted linear regression $b = (X'WX)^{-1}X'WY$. When the variances, $\sigma_i^2$, are not constant, we choose weights, $w_i$, that are inversely proportional to $\sigma_i^2$, so that $\sigma_i^2 = \sigma^2 / w_i$. Though there are different ways to detect heteroscedasticity, we will examine residual plots for our example. We will then find weights that will eliminate it. To determine which weight is the best, we will use Furnival's Index of Fit (Furnival 1961), just as we did with weighted linear regression earlier.
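As a reminder of what the weighted linear estimator does, the sketch below works it out for a straight line, where the matrix formula for b reduces to weighted means and weighted sums of squares. The function name and the data are hypothetical; the weights are taken as 1/X purely for illustration.

```python
def weighted_line_fit(x, y, w):
    """Weighted least squares for Y = b0 + b1 * X with weights w."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean of X
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw   # weighted mean of Y
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data lying exactly on Y = 2 + 3 X, weighted by 1/X.
x = [1.0, 2.0, 4.0, 8.0]
y = [2.0 + 3.0 * xi for xi in x]
w = [1.0 / xi for xi in x]
b0, b1 = weighted_line_fit(x, y, w)
```

Because the data lie exactly on a line, any positive weights return the same coefficients; with noisy data the weights shift influence toward the observations with smaller variance.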

Example
The Chapman-Richards sigmoid growth model will be used in our example exercise. This model has seen extensive use in forestry, especially to model tree growth. The Chapman-Richards growth model quantitatively describes the growth of an organism as the difference between its anabolic (constructive) growth and its catabolic (destructive) growth. This relationship can be expressed by the differential equation:
$$\frac{dY}{dt} = \eta Y^{\theta} - \kappa Y,$$

where:
Y = size of the organism
t = time
anabolic growth = $\eta Y^{\theta}$ (i.e., proportional to the size of the organism, raised to the power $\theta$)
catabolic growth = $\kappa Y$ (i.e., proportional to the size of the organism).

This nonlinear first-order differential equation is a Bernoulli equation of the form:
$$\frac{dy}{dx} + a(x)\,y = f(x)\,y^{n}.$$

After the change of variables $z = Y^{1-\theta}$, the Chapman-Richards model becomes:

$$\frac{dz}{dt} + (1 - \theta)\kappa z = (1 - \theta)\eta.$$

This transformed equation is a linear first-order differential equation, which can be solved by separation of variables to give the solution (3) in the earlier section, True Nonlinear Models. To bring this model into the context of our biological example, we will replace Y with S = size of organism and X with A = age of organism.
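The solution step can be written out as a short worked derivation. The symbols here are labels chosen for this sketch: $\eta$ is the anabolic coefficient, $\theta$ the anabolic power, $\kappa$ the catabolic coefficient, and $z = Y^{1-\theta}$. The derivation also assumes $0 < \theta < 1$ and the initial condition $Y(0) = 0$, neither of which is stated in the lecture.

```latex
\begin{aligned}
\frac{dz}{dt} + (1-\theta)\kappa z &= (1-\theta)\eta \\
z(t) &= \frac{\eta}{\kappa} + C \exp\!\left(-(1-\theta)\kappa t\right) \\
z(0) = 0 \;\Rightarrow\; z(t) &= \frac{\eta}{\kappa}\left(1 - \exp\!\left(-(1-\theta)\kappa t\right)\right) \\
Y(t) = z(t)^{1/(1-\theta)} &= \left(\frac{\eta}{\kappa}\right)^{1/(1-\theta)}
      \left(1 - \exp\!\left(-(1-\theta)\kappa t\right)\right)^{1/(1-\theta)}
\end{aligned}
```

Matching terms with model (3) gives $\gamma_0 = (\eta/\kappa)^{1/(1-\theta)}$, $\gamma_1 = (1-\theta)\kappa$, and $\gamma_2 = 1/(1-\theta)$ under these assumptions.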

Though considered an empirical model, its parameters do lend themselves to biological interpretation. A sigmoid growth form has an asymptote for the maximum size of an organism:
[Figure: sigmoid growth curve of SIZE versus AGE, leveling off at the asymptote.]

This asymptote is represented by $\gamma_0$. The $\gamma_1$ and $\gamma_2$ parameters together define the shape of the curve. The first derivative of S with respect to A gives the growth rate, which is fastest at the inflection point (i.e., the point of greatest slope on the curve):
$$\frac{\partial S}{\partial A} = \gamma_2 \gamma_0 \left(1 - \exp(-\gamma_1 A)\right)^{\gamma_2 - 1} \gamma_1 \exp(-\gamma_1 A)$$

[Figure: GROWTH (first derivative) versus AGE, peaking at the inflection point.]

The second derivative identifies where the growth rate is increasing and decreasing over time:
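$$\frac{\partial^2 S}{\partial A^2} = -\gamma_2 \gamma_0 \left(1 - \exp(-\gamma_1 A)\right)^{\gamma_2 - 1} \gamma_1^2 \exp(-\gamma_1 A) + \left[\gamma_1 \exp(-\gamma_1 A)\right]^2 (\gamma_2 - 1)(\gamma_2)(\gamma_0) \left(1 - \exp(-\gamma_1 A)\right)^{\gamma_2 - 2}$$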

[Figure: RATE OF GROWTH (second derivative) versus AGE, crossing 0 at the inflection point.]
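The derivatives above can be checked numerically. The Python sketch below codes the Chapman-Richards curve and its first two derivatives, and locates the inflection point by setting the second derivative to zero, which gives exp(-g1 A) = 1/g2, i.e. A = ln(g2)/g1 (valid only when g2 > 1). The function names are hypothetical; the parameter values are the unweighted estimates reported in the results table of this lecture, used here only as an example.

```python
import math

def cr_size(age, g0, g1, g2):
    """Chapman-Richards size at a given age."""
    return g0 * (1.0 - math.exp(-g1 * age)) ** g2

def cr_growth(age, g0, g1, g2):
    """First derivative of size with respect to age (growth)."""
    e = math.exp(-g1 * age)
    return g2 * g0 * (1.0 - e) ** (g2 - 1.0) * g1 * e

def cr_growth_rate_change(age, g0, g1, g2):
    """Second derivative of size with respect to age (rate of growth)."""
    e = math.exp(-g1 * age)
    term1 = -g2 * g0 * (1.0 - e) ** (g2 - 1.0) * g1 ** 2 * e
    term2 = (g1 * e) ** 2 * (g2 - 1.0) * g2 * g0 * (1.0 - e) ** (g2 - 2.0)
    return term1 + term2

def cr_inflection_age(g1, g2):
    """Age at which the second derivative is zero; requires g2 > 1."""
    return math.log(g2) / g1

# Unweighted estimates from the results table (b0, b1, b2).
g0, g1, g2 = 83.5978, 0.0262, 1.3992
a_star = cr_inflection_age(g1, g2)
peak_growth = cr_growth(a_star, g0, g1, g2)
```

Evaluating `cr_growth_rate_change` on either side of `a_star` shows the sign change from increasing to decreasing growth.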

For a detailed review of the family of sigmoid growth models as well as the derivation of the generalized formulation of the sigmoid growth model, see Schnute (1981).

For our example, we will use weighted NLS to fit a Chapman-Richards growth model, $HT - 4.5 = b_0 (1 - \exp(-b_1\,AGE))^{b_2}$, to 400 height-age measurements of Douglas-fir trees. This will produce what is commonly known in forestry as height-age curves, which show height development over time for trees of a given species. Heteroscedasticity is common in unweighted regression models of this nature, so we will use weights to eliminate it. For our example, I have provided a scatterplot of the predicted values and the residuals from unweighted NLS to show the heteroscedasticity:

[Figure: Residual Plot For Unweighted Regression. RESIDUALS (axis from -50 to 40) versus Predicted Heights (H - 4.5) (axis from 0 to 100).]

The weights are reciprocals of age raised to various powers; we will use $1/A^{0.5}$, $1/A^{1}$, $1/A^{1.5}$, and $1/A^{2}$. We will use PROC NLIN to fit the models (NOTE: other computer software packages can perform nonlinear regression, such as SPSS, JMP, BMDP, MINITAB, and SYSTAT; to the best of my knowledge, only SAS performs weighted NLS directly, and in the other packages weighted NLS must be done by transformations).
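The transformation route mentioned in the note can be illustrated with a short Python sketch. Minimizing the weighted sum of squares with weight 1/A^power is the same as minimizing an ordinary sum of squares after multiplying both the observed and the predicted heights by the square root of the weight. The function names, data, and parameter values below are hypothetical.

```python
import math

def chapman_richards(age, b0, b1, b2):
    """Predicted HT - 4.5 at a given age."""
    return b0 * (1.0 - math.exp(-b1 * age)) ** b2

def weighted_sse(ages, heights, params, power):
    """Sum of w_i * (observed - predicted)^2 with w_i = 1 / age^power."""
    return sum((a ** -power) * (h - chapman_richards(a, *params)) ** 2
               for a, h in zip(ages, heights))

def transformed_sse(ages, heights, params, power):
    """Ordinary SSE after scaling both sides by sqrt(w_i) = age^(-power / 2)."""
    total = 0.0
    for a, h in zip(ages, heights):
        s = a ** (-power / 2.0)
        total += (s * h - s * chapman_richards(a, *params)) ** 2
    return total

# Hypothetical ages, heights above 4.5 ft, and trial parameter values.
ages = [10.0, 20.0, 40.0, 80.0]
heights = [12.0, 30.0, 55.0, 75.0]
params = (90.0, 0.025, 1.3)
pairs = [(weighted_sse(ages, heights, params, p),
          transformed_sse(ages, heights, params, p))
         for p in (0.5, 1.0, 1.5, 2.0)]
```

Each pair holds the same criterion computed two ways, which is why an unweighted nonlinear routine applied to the scaled variables gives the weighted fit.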

The following tables show the results of our SAS runs:


WEIGHT     P      Z        MSE       FI
NONE       -      0        99.2115   9.9605
1/X^0.5    0.25   0.8314   15.0658   8.9139
1/X^1      0.5    1.6628   2.4658    8.2818
1/X^1.5    0.75   2.4942   0.4549    8.1691
1/X^2      1      3.3256   0.1011    8.8443

n = 400, $\sum \ln AGE = 1330.25$
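The FI column can be reproduced from the other columns. The tabulated numbers are consistent with Z = P (sum of ln AGE) / n and FI = sqrt(MSE) exp(Z), where P is half the power of age in the weight; that relationship is inferred from the table itself rather than stated in the lecture, so treat the Python below as a sketch under that assumption. The function name is hypothetical.

```python
import math

def furnival_index(mse, p, mean_ln_x):
    """Assumed form: sqrt(MSE) * exp(P * mean of ln X)."""
    return math.sqrt(mse) * math.exp(p * mean_ln_x)

n = 400
sum_ln_age = 1330.25
mean_ln_age = sum_ln_age / n

# (weight label, P, MSE) taken from the table; P = 0 for the unweighted fit.
rows = [
    ("NONE", 0.0, 99.2115),
    ("1/X^0.5", 0.25, 15.0658),
    ("1/X^1", 0.5, 2.4658),
    ("1/X^1.5", 0.75, 0.4549),
    ("1/X^2", 1.0, 0.1011),
]

fi = {label: furnival_index(mse, p, mean_ln_age) for label, p, mse in rows}
best = min(fi, key=fi.get)   # the smallest index marks the preferred weight
```

The raw MSE values cannot be compared across weights because each weight rescales the residuals; the index puts them back on a common scale before comparison.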

WEIGHT     b0                  b1                   b2
NONE       83.5978 (6.5370)    0.0262 (0.00500)     1.3992 (0.1593)
1/X^0.5    85.7294 (7.2133)    0.0246 (0.00453)     1.3495 (0.1245)
1/X^1      90.5040 (8.9869)    0.0217 (0.00418)     1.2755 (0.0959)
1/X^1.5    105.0 (15.6143)     0.0163 (0.00402)     1.1584 (0.0729)
1/X^2      226.2 (143.1)       0.00509 (0.00419)    0.9737 (0.0547)

NOTE: asymptotic standard errors are inside the parentheses. The best fit model, the one with the lowest FI, is the fit with weight 1/X^1.5.

BIBLIOGRAPHY
Baskerville, G.L. 1972. Use of logarithmic regression in the estimation of plant biomass. Can. J. For. Res. 2:49-53.
Chapman, D.G. 1961. Statistical problems in population dynamics. In: Proc. Fourth Berkeley Symp. Math. Stat. and Prob. Univ. Calif. Press, Berkeley.
Draper, N.R., and H. Smith. 1998. Applied Regression Analysis, 3rd edition. John Wiley and Sons, Inc., New York.
Flewelling, J.W., and L.V. Pienaar. 1981. Multiplicative regression with lognormal errors. For. Sci. 27:281-289.
Furnival, G.M. 1961. An index for comparing equations used in constructing volume tables. For. Sci. 7:337-341.
Gallant, A.R. 1987. Nonlinear Statistical Models. John Wiley and Sons, New York.
Greene, W.H. 1992. Econometric Analysis, 2nd edition. Macmillan Publishing Company, New York.
Pienaar, L.V., and K.J. Turnbull. 1973. The Chapman-Richards generalization of Von Bertalanffy's growth model for basal area growth and yield in even-aged stands. For. Sci. 19:2-22.
Richards, F.J. 1959. A flexible growth function for empirical use. J. Exp. Bot. 10(29):290-300.
SAS. 1999. SAS/STAT User's Guide, Version 8. SAS Institute, Inc., Cary, North Carolina.
Schnute, J. 1981. A versatile growth model with statistically stable parameters. Can. J. Fish. Aquat. Sci. 38:1128-1140.
Snowdon, P. 1991. A ratio estimator for bias correction in logarithmic regressions. Can. J. For. Res. 21:720-724.
