ANUM 2012 Curve-Fitting

Least Square Regression
CURVE FITTING
Describes techniques to fit curves (curve fitting) to discrete data to obtain intermediate estimates.
There are two general approaches for curve fitting:

• Least Squares regression:
Data exhibit a significant degree of scatter. The strategy is
to derive a single curve that represents the general trend
of the data.
• Interpolation:
Data is very precise. The strategy is to pass a curve or a
series of curves through each of the points.
Introduction
In engineering, two types of applications are
encountered:
– Trend analysis. Predicting values of the dependent
variable; may include extrapolation beyond the data
points or interpolation between them.
– Hypothesis testing. Comparing an existing
mathematical model with measured data.
Mathematical Background
• Arithmetic mean. The sum of the individual data
points ($y_i$) divided by the number of points ($n$).
$$\bar{y} = \frac{\sum y_i}{n}, \quad i = 1, \ldots, n$$
• Standard deviation. The most common measure of
spread for a sample.
$$S_y = \sqrt{\frac{S_t}{n-1}}, \quad S_t = \sum (y_i - \bar{y})^2$$
Mathematical Background (cont’d)
• Variance. Representation of spread by the square of
the standard deviation.
$$S_y^2 = \frac{\sum (y_i - \bar{y})^2}{n-1} \quad \text{or} \quad S_y^2 = \frac{\sum y_i^2 - \left(\sum y_i\right)^2 / n}{n-1}$$
• Coefficient of variation. Quantifies the spread of the
data relative to the mean.
$$\text{c.v.} = \frac{S_y}{\bar{y}} \times 100\%$$
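The four statistics above can be computed directly from their definitions. The following is a minimal sketch in Python, not part of the original slides; the function name `describe` is hypothetical, and the sample values are the seven y values of the straight-line example in these notes.

```python
import math

def describe(y):
    """Return mean, standard deviation, variance and c.v. (%) of a sample."""
    n = len(y)
    mean = sum(y) / n
    s_t = sum((yi - mean) ** 2 for yi in y)  # sum of squares around the mean
    variance = s_t / (n - 1)                 # sample variance (n - 1 denominator)
    std = math.sqrt(variance)                # standard deviation
    cv = std / mean * 100.0                  # coefficient of variation in percent
    return mean, std, variance, cv

# Sample data: y values of the straight-line example in these notes.
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]
mean, std, variance, cv = describe(y)
print(f"mean={mean:.6f} std={std:.4f} variance={variance:.4f} cv={cv:.2f}%")
```

The coefficient of variation is only meaningful when the mean is nonzero; the sketch does not guard against that case.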
Least Squares Regression
Linear Regression
Fitting a straight line to a set of paired
observations: (x1, y1), (x2, y2),…,(xn, yn).
y = a0+ a1 x + e
a1 - slope
a0 - intercept
e - error, or residual, between the model and
the observations
Linear Regression: Residual
Linear Regression: Question
How can a0 and a1 be chosen so that the error is minimized?
Linear Regression: Criteria for a “Best” Fit
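• Minimize the sum of the residual errors:
$$\min \sum_{i=1}^{n} e_i = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)$$
Errors of opposite sign cancel (e.g. $e_1 = -e_2$), so this criterion does not single out one line.
• Minimize the sum of the absolute values of the residuals:
$$\min \sum_{i=1}^{n} |e_i| = \sum_{i=1}^{n} |y_i - a_0 - a_1 x_i|$$
• Minimize the maximum residual (minimax criterion):
$$\min \max_{i=1}^{n} |e_i| = |y_i - a_0 - a_1 x_i|$$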
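The candidate criteria can be compared numerically. The sketch below is an illustration added to these notes, with hypothetical function names `residuals` and `criteria`. It evaluates each criterion for two different lines that both pass through the point of means (x̄, ȳ) of the sample data used in the straight-line example. Both lines give a sum of residuals of zero, so that criterion cannot tell them apart, while the sum of squares differs.

```python
def residuals(x, y, a0, a1):
    """Residuals e_i = y_i - a0 - a1 * x_i of the line y = a0 + a1 * x."""
    return [yi - a0 - a1 * xi for xi, yi in zip(x, y)]

def criteria(x, y, a0, a1):
    """Evaluate the candidate 'best fit' criteria for one line."""
    e = residuals(x, y, a0, a1)
    return {
        "sum_e": sum(e),                        # signed errors can cancel
        "sum_abs_e": sum(abs(ei) for ei in e),  # sum of absolute errors
        "max_abs_e": max(abs(ei) for ei in e),  # minimax quantity
        "sum_sq_e": sum(ei ** 2 for ei in e),   # least-squares quantity
    }

x = [1, 2, 3, 4, 5, 6, 7]
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]
x_mean = sum(x) / len(x)
y_mean = sum(y) / len(y)

# Two arbitrary lines through (x_mean, y_mean): slope 0 and slope 2.
flat = criteria(x, y, y_mean, 0.0)
steep = criteria(x, y, y_mean - 2.0 * x_mean, 2.0)
print(flat)
print(steep)
```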
Linear Regression: Least Squares Fit
$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_{i,\text{measured}} - y_{i,\text{model}})^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$$
$$\min S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$$
Yields a unique line for a given set of data.

Linear Regression: Least Squares Fit
$$\min S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$$
The coefficients a0 and a1 that minimize Sr must satisfy the following conditions:
$$\frac{\partial S_r}{\partial a_0} = 0, \quad \frac{\partial S_r}{\partial a_1} = 0$$
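Linear Regression:
Determination of a0 and a1
$$\frac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_i) = 0$$
$$\frac{\partial S_r}{\partial a_1} = -2 \sum \left[ (y_i - a_0 - a_1 x_i) x_i \right] = 0$$
Expanding the sums:
$$0 = \sum y_i - \sum a_0 - \sum a_1 x_i$$
$$0 = \sum y_i x_i - \sum a_0 x_i - \sum a_1 x_i^2$$
Since $\sum a_0 = n a_0$, this gives 2 equations with 2 unknowns, which can be solved simultaneously:
$$n a_0 + \left(\sum x_i\right) a_1 = \sum y_i$$
$$\left(\sum x_i\right) a_0 + \left(\sum x_i^2\right) a_1 = \sum x_i y_i$$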
Linear Regression:
Determination of a0 and a1
$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}$$
$$a_0 = \bar{y} - a_1 \bar{x}$$
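The closed-form expressions for a1 and a0 translate directly into code. The following is a minimal sketch, assuming plain Python lists as input; the function name `linear_fit` is hypothetical and not from the slides. The sample data are the seven points used in the straight-line example of these notes.

```python
def linear_fit(x, y):
    """Least-squares straight line y = a0 + a1 * x; returns (a0, a1)."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        # All x values are identical, so the slope is undefined.
        raise ValueError("x values must not all be equal")
    a1 = (n * sum_xy - sum_x * sum_y) / denom  # slope
    a0 = sum_y / n - a1 * sum_x / n            # intercept: y_mean - a1 * x_mean
    return a0, a1

x = [1, 2, 3, 4, 5, 6, 7]
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]
a0, a1 = linear_fit(x, y)
print(f"y = {a0:.7f} + {a1:.7f} x")
```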
Error Quantification of Linear Regression
• Total sum of the squares around the mean for
the dependent variable, y, is St:
$$S_t = \sum (y_i - \bar{y})^2$$
• Sum of the squares of residuals around the
regression line is Sr:
$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$$
• St - Sr quantifies the improvement or error
reduction due to describing data in terms of a
straight line rather than as an average value.
$$r^2 = \frac{S_t - S_r}{S_t}$$
r2: coefficient of determination
r: correlation coefficient
• For a perfect fit, Sr = 0 and r = r2 = 1, signifying that the line
explains 100 percent of the variability of the
data.
• For r = r2 = 0, Sr = St and the fit represents no
improvement.
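As a sketch of how St, Sr and r2 might be evaluated for any fitted model, the following Python snippet takes measured values and model predictions. The function name `coefficient_of_determination` is hypothetical; the data and the line coefficients are the ones from the straight-line example in these notes.

```python
def coefficient_of_determination(y, y_model):
    """Return (S_t, S_r, r^2) for measured values y and model values y_model."""
    y_mean = sum(y) / len(y)
    s_t = sum((yi - y_mean) ** 2 for yi in y)               # spread around the mean
    s_r = sum((yi - ym) ** 2 for yi, ym in zip(y, y_model))  # spread around the model
    r2 = (s_t - s_r) / s_t
    return s_t, s_r, r2

x = [1, 2, 3, 4, 5, 6, 7]
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]
a0, a1 = 0.07142857, 0.8392857   # line coefficients from the worked example
y_model = [a0 + a1 * xi for xi in x]

s_t, s_r, r2 = coefficient_of_determination(y, y_model)
print(f"St={s_t:.4f} Sr={s_r:.4f} r2={r2:.3f}")
```

Because the function only needs the model predictions, the same sketch applies unchanged to a polynomial or a linearized nonlinear fit.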
Least Squares Fit of a Straight Line:
Example
Fit a straight line to the x and y values in the following table:

 xi     yi     xi*yi    xi^2
  1     0.5      0.5       1
  2     2.5      5.0       4
  3     2.0      6.0       9
  4     4.0     16.0      16
  5     3.5     17.5      25
  6     6.0     36.0      36
  7     5.5     38.5      49
Sum:
 28    24.0    119.5     140

$$n = 7, \quad \sum x_i = 28, \quad \sum y_i = 24.0, \quad \sum x_i y_i = 119.5, \quad \sum x_i^2 = 140$$
$$\bar{x} = \frac{28}{7} = 4, \quad \bar{y} = \frac{24}{7} = 3.428571$$
Least Squares Fit of a Straight Line: Example
(cont’d)
$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2} = \frac{7(119.5) - 28(24)}{7(140) - 28^2} = 0.8392857$$
$$a_0 = \bar{y} - a_1 \bar{x} = 3.428571 - 0.8392857(4) = 0.07142857$$
$$y = 0.07142857 + 0.8392857 x$$
Least Squares Fit of a Straight Line: Example (Error Analysis)
With $\bar{y} = 24/7 = 3.428571$ and the fitted line $\hat{y} = 0.07142857 + 0.8392857 x$:

 xi     yi     (yi - ybar)^2    ei^2 = (yi - yhat_i)^2
  1     0.5       8.5765           0.1687
  2     2.5       0.8622           0.5625
  3     2.0       2.0408           0.3473
  4     4.0       0.3265           0.3265
  5     3.5       0.0051           0.5896
  6     6.0       6.6122           0.7972
  7     5.5       4.2908           0.1993
Sum:
 28    24.0      22.7143           2.9911

$$S_t = \sum (y_i - \bar{y})^2 = 22.7143$$
$$S_r = \sum e_i^2 = 2.9911$$
$$r^2 = \frac{S_t - S_r}{S_t} = 0.868$$
$$r = \sqrt{r^2} = \sqrt{0.868} = 0.932$$
Least Squares Fit of a Straight Line:
Example (Error Analysis)
• The standard deviation (quantifies the spread around the mean):
$$s_y = \sqrt{\frac{S_t}{n-1}} = \sqrt{\frac{22.7143}{7-1}} = 1.9457$$
• The standard error of the estimate (quantifies the spread around the
regression line):
$$s_{y/x} = \sqrt{\frac{S_r}{n-2}} = \sqrt{\frac{2.9911}{7-2}} = 0.7735$$
Because $s_{y/x} < s_y$, the linear regression model has merit: the data scatter less around the line than around the mean.
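The comparison between the two spreads can be reproduced with a few lines of Python. This is an illustrative sketch only; the function name `spread_statistics` is hypothetical, and the inputs are the St and Sr values from the error-analysis table of the straight-line example.

```python
import math

def spread_statistics(s_t, s_r, n):
    """Return (s_y, s_yx) for a straight-line fit to n data points."""
    s_y = math.sqrt(s_t / (n - 1))   # spread around the mean
    s_yx = math.sqrt(s_r / (n - 2))  # spread around the line; two fitted coefficients
    return s_y, s_yx

s_y, s_yx = spread_statistics(22.7143, 2.9911, 7)
print(f"s_y={s_y:.4f} s_y/x={s_yx:.4f}")
if s_yx < s_y:
    print("The line describes the data better than the mean alone.")
```

The divisor n - 2 reflects the two coefficients (a0 and a1) estimated from the data; at least three points are needed for the standard error to be defined.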
Linearization of Nonlinear Relationships
• Linear regression assumes that the relationship between the dependent and
independent variables is linear.
• However, a few types of nonlinear functions
can be transformed into linear regression
problems:
The exponential equation.
The power equation.
The saturation-growth-rate equation.
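1. The exponential equation.
$$y = a_1 e^{b_1 x}$$
$$\ln y = \ln a_1 + b_1 x$$
This is a straight line y* = a0 + a1 x with y* = ln y, intercept $\ln a_1$ and slope $b_1$.
2. The power equation
$$y = a_2 x^{b_2}$$
$$\log y = \log a_2 + b_2 \log x$$
This is a straight line y* = a0 + a1 x* with y* = log y, x* = log x, intercept $\log a_2$ and slope $b_2$.
3. The saturation-growth-rate equation
$$y = a_3 \frac{x}{b_3 + x}$$
$$\frac{1}{y} = \frac{1}{a_3} + \frac{b_3}{a_3} \left( \frac{1}{x} \right)$$
This is a straight line y* = a0 + a1 x* with y* = 1/y, x* = 1/x, intercept $1/a_3$ and slope $b_3/a_3$.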
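The transformations can be sketched in Python by reusing an ordinary straight-line fit on the transformed variables. All function names below are hypothetical, and the data are synthetic, noise-free values generated from known parameters purely for illustration, so the fits should recover those parameters.

```python
import math

def linear_fit(x, y):
    """Least-squares straight line y = a0 + a1 * x; returns (a0, a1)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    a1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a0 = sum_y / n - a1 * sum_x / n
    return a0, a1

def fit_exponential(x, y):
    """Fit y = a * exp(b * x) by regressing ln(y) on x (requires y > 0)."""
    intercept, slope = linear_fit(x, [math.log(yi) for yi in y])
    return math.exp(intercept), slope       # a = e^intercept, b = slope

def fit_saturation_growth(x, y):
    """Fit y = a * x / (b + x) by regressing 1/y on 1/x (requires x, y != 0)."""
    intercept, slope = linear_fit([1.0 / xi for xi in x],
                                  [1.0 / yi for yi in y])
    a = 1.0 / intercept                     # intercept = 1 / a
    b = slope * a                           # slope = b / a
    return a, b

# Synthetic data (made up for this sketch, not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_exp = [2.0 * math.exp(0.5 * xi) for xi in x]
y_sat = [3.0 * xi / (1.5 + xi) for xi in x]

a_exp, b_exp = fit_exponential(x, y_exp)
a_sat, b_sat = fit_saturation_growth(x, y_sat)
print(a_exp, b_exp, a_sat, b_sat)
```

Note that a fit in transformed coordinates minimizes the squared error of the transformed variable (for example ln y), not of y itself, so with noisy data the result can differ from a direct nonlinear fit.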
Example
Fit the following equation:
$$y = a_2 x^{b_2}$$
to the data in the following table:

 xi     yi     X* = log xi    Y* = log yi
  1     0.5       0.000          -0.301
  2     1.7       0.301           0.230
  3     3.4       0.477           0.531
  4     5.7       0.602           0.756
  5     8.4       0.699           0.924
Sum:
 15    19.7       2.079           2.141

$$\log y = \log(a_2 x^{b_2}) = \log a_2 + b_2 \log x$$
Let $Y^* = \log y$, $X^* = \log x$, $a_0 = \log a_2$, $a_1 = b_2$, so that
$$Y^* = a_0 + a_1 X^*$$
Example
 Xi     Yi     X*i = log(Xi)   Y*i = log(Yi)   X*i Y*i   (X*i)^2
  1     0.5       0.0000          -0.3010       0.0000    0.0000
  2     1.7       0.3010           0.2304       0.0694    0.0906
  3     3.4       0.4771           0.5315       0.2536    0.2276
  4     5.7       0.6021           0.7559       0.4551    0.3625
  5     8.4       0.6990           0.9243       0.6460    0.4886
Sum:
 15    19.7       2.079            2.141        1.424     1.169

$$a_1 = \frac{n \sum X^*_i Y^*_i - \sum X^*_i \sum Y^*_i}{n \sum (X^*_i)^2 - \left(\sum X^*_i\right)^2} = \frac{5(1.424) - (2.079)(2.141)}{5(1.169) - (2.079)^2} = 1.75$$
$$a_0 = \bar{Y}^* - a_1 \bar{X}^* = 0.4282 - 1.75(0.41584) = -0.300$$
Linearization of Nonlinear Functions: Example
$$\log y = -0.300 + 1.75 \log x$$
$$a_2 = 10^{a_0} = 10^{-0.300} = 0.50, \quad b_2 = a_1 = 1.75$$
$$y = 0.50 x^{1.75}$$
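The power-equation example can be checked with a short script. This is a sketch under the assumption that base-10 logarithms are used, as in the table; the function names `linear_fit` and `fit_power` are hypothetical, and the data are the five points of the example.

```python
import math

def linear_fit(x, y):
    """Least-squares straight line y = a0 + a1 * x; returns (a0, a1)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    a1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a0 = sum_y / n - a1 * sum_x / n
    return a0, a1

def fit_power(x, y):
    """Fit y = a2 * x**b2 by regressing log10(y) on log10(x); needs x, y > 0."""
    x_star = [math.log10(xi) for xi in x]
    y_star = [math.log10(yi) for yi in y]
    intercept, slope = linear_fit(x_star, y_star)
    return 10.0 ** intercept, slope   # a2 = 10^intercept, b2 = slope

x = [1, 2, 3, 4, 5]
y = [0.5, 1.7, 3.4, 5.7, 8.4]
a2, b2 = fit_power(x, y)
print(f"y = {a2:.3f} x^{b2:.3f}")
```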
Polynomial Regression
• Some engineering data is poorly represented
by a straight line.
• For these cases a curve is better suited to fit
the data.
• The least squares method can readily be
extended to fit the data to higher order
polynomials.
Polynomial Regression (cont’d)
A parabola is preferable
• A 2nd order polynomial (quadratic) is defined by:
$$y = a_0 + a_1 x + a_2 x^2 + e$$
• The residuals between the model and the data:
$$e_i = y_i - a_0 - a_1 x_i - a_2 x_i^2$$
• The sum of squares of the residuals:
$$S_r = \sum e_i^2 = \sum (y_i - a_0 - a_1 x_i - a_2 x_i^2)^2$$
Setting the partial derivatives of Sr to zero:
$$\frac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_i - a_2 x_i^2) = 0$$
$$\frac{\partial S_r}{\partial a_1} = -2 \sum (y_i - a_0 - a_1 x_i - a_2 x_i^2) x_i = 0$$
$$\frac{\partial S_r}{\partial a_2} = -2 \sum (y_i - a_0 - a_1 x_i - a_2 x_i^2) x_i^2 = 0$$
gives 3 linear equations with 3 unknowns (a0, a1, a2), which can be solved simultaneously:
$$\sum y_i = n a_0 + a_1 \sum x_i + a_2 \sum x_i^2$$
$$\sum x_i y_i = a_0 \sum x_i + a_1 \sum x_i^2 + a_2 \sum x_i^3$$
$$\sum x_i^2 y_i = a_0 \sum x_i^2 + a_1 \sum x_i^3 + a_2 \sum x_i^4$$
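• A system of 3x3 equations needs to be solved to determine
the coefficients of the polynomial.
$$\begin{bmatrix} n & \sum x_i & \sum x_i^2 \\ \sum x_i & \sum x_i^2 & \sum x_i^3 \\ \sum x_i^2 & \sum x_i^3 & \sum x_i^4 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \\ \sum x_i^2 y_i \end{bmatrix}$$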
• The standard error & the coefficient of determination:
$$s_{y/x} = \sqrt{\frac{S_r}{n-3}}, \quad r^2 = \frac{S_t - S_r}{S_t}$$
General:
The mth-order polynomial:
$$y = a_0 + a_1 x + a_2 x^2 + \cdots + a_m x^m + e$$
• A system of (m+1)x(m+1) linear equations must be solved to
determine the coefficients of the mth-order polynomial.
• The standard error:
$$s_{y/x} = \sqrt{\frac{S_r}{n - (m+1)}}$$
• The coefficient of determination:
$$r^2 = \frac{S_t - S_r}{S_t}$$
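One way to carry out the general procedure is to assemble the (m+1)x(m+1) system from sums of powers of x and solve it by Gaussian elimination. The following is a minimal pure-Python sketch; the function names are hypothetical, the choice of Gaussian elimination with partial pivoting is an assumption (the slides do not name a solver), and the data are those of the second-order example in these notes.

```python
def solve_linear_system(a, b):
    """Solve a * z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("system is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):                  # back substitution
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z

def polynomial_fit(x, y, order):
    """Least-squares polynomial of the given order; returns [a0, ..., am]."""
    size = order + 1
    # power_sums[k] = sum of x_i^k; power_sums[0] equals n.
    power_sums = [sum(xi ** k for xi in x) for k in range(2 * order + 1)]
    a = [[power_sums[i + j] for j in range(size)] for i in range(size)]
    b = [sum((xi ** i) * yi for xi, yi in zip(x, y)) for i in range(size)]
    return solve_linear_system(a, b)

x = [0, 1, 2, 3, 4, 5]
y = [2.1, 7.7, 13.6, 27.2, 40.9, 61.1]
a0, a1, a2 = polynomial_fit(x, y, 2)
print(f"y = {a0:.5f} + {a1:.5f} x + {a2:.5f} x^2")
```

For high polynomial orders the normal equations tend to become ill-conditioned, so this direct approach is best regarded as suitable for low orders only.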
Polynomial Regression- Example
Fit a second order polynomial to data:

 xi     yi     xi^2   xi^3   xi^4   xi*yi   xi^2*yi
  0     2.1      0      0      0      0.0      0.0
  1     7.7      1      1      1      7.7      7.7
  2    13.6      4      8     16     27.2     54.4
  3    27.2      9     27     81     81.6    244.8
  4    40.9     16     64    256    163.6    654.4
  5    61.1     25    125    625    305.5   1527.5
Sum:
 15   152.6     55    225    979    585.6   2488.8

$$n = 6, \quad \sum x_i = 15, \quad \sum y_i = 152.6, \quad \sum x_i^2 = 55, \quad \sum x_i^3 = 225, \quad \sum x_i^4 = 979$$
$$\sum x_i y_i = 585.6, \quad \sum x_i^2 y_i = 2488.8$$
$$\bar{x} = \frac{15}{6} = 2.5, \quad \bar{y} = \frac{152.6}{6} = 25.433$$
Polynomial Regression- Example (cont’d)
• The system of simultaneous linear equations:
$$\begin{bmatrix} 6 & 15 & 55 \\ 15 & 55 & 225 \\ 55 & 225 & 979 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} 152.6 \\ 585.6 \\ 2488.8 \end{bmatrix}$$
$$a_0 = 2.47857, \quad a_1 = 2.35929, \quad a_2 = 1.86071$$
$$y = 2.47857 + 2.35929 x + 1.86071 x^2$$
$$S_r = \sum e_i^2 = 3.74657$$
$$S_t = \sum (y_i - \bar{y})^2 = 2513.39$$
Polynomial Regression- Example (cont’d)
 xi     yi     ymodel     ei^2      (yi - ybar)^2
  0     2.1     2.4786    0.14332     544.42889
  1     7.7     6.6986    1.00286     314.45929
  2    13.6    14.640     1.08158     140.01989
  3    27.2    26.303     0.80491       3.12229
  4    40.9    41.687     0.61951     239.22809
  5    61.1    60.793     0.09439    1272.13489
Sum:
 15   152.6               3.74657    2513.39333

• The standard error of estimate:
$$s_{y/x} = \sqrt{\frac{3.74657}{6-3}} = 1.12$$
• The coefficient of determination:
$$r^2 = \frac{2513.39 - 3.74657}{2513.39} = 0.99851, \quad r = \sqrt{r^2} = 0.99925$$
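The error analysis of the quadratic fit can be reproduced as follows. This is an illustrative sketch with hypothetical function names; it takes the rounded coefficients reported in the example as given rather than refitting them, and evaluates the polynomial with Horner's rule (an implementation choice, not something the slides prescribe).

```python
import math

def polynomial_value(coeffs, x):
    """Evaluate a0 + a1*x + ... + am*x^m using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def fit_statistics(x, y, coeffs):
    """Return (S_r, S_t, s_yx, r^2) for a polynomial with coefficients coeffs."""
    n = len(y)
    m = len(coeffs) - 1                       # polynomial order
    y_mean = sum(y) / n
    s_t = sum((yi - y_mean) ** 2 for yi in y)
    s_r = sum((yi - polynomial_value(coeffs, xi)) ** 2
              for xi, yi in zip(x, y))
    s_yx = math.sqrt(s_r / (n - (m + 1)))     # m + 1 fitted coefficients
    r2 = (s_t - s_r) / s_t
    return s_r, s_t, s_yx, r2

x = [0, 1, 2, 3, 4, 5]
y = [2.1, 7.7, 13.6, 27.2, 40.9, 61.1]
coeffs = [2.47857, 2.35929, 1.86071]          # a0, a1, a2 from the example

s_r, s_t, s_yx, r2 = fit_statistics(x, y, coeffs)
print(f"Sr={s_r:.5f} St={s_t:.2f} s_y/x={s_yx:.2f} r2={r2:.5f}")
```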
