Optimized Curve Fitting



Case 1

• Tabulated data (interest tables, steam tables, etc.)

• Estimates are required at intermediate values
from such tables


Case 2

• Experimentation
• Independent (predictor) variable X
• Dependent (response) variable Y
• Data available at discrete points or times
• Estimates are required at points between the
discrete values (as it is impractical or expensive
to actually measure them)



Case 3

• Function substitution
• An implicit (complicated) function or program is known
• Results at all values are possible but time consuming
• Further mathematical operations (integration,
differentiation, finding maximum or minimum points) are
difficult


Case 4

• Hypothesis testing
• Alternative mathematical models are given
• Which one is best for a given situation?


Solution
• Graphically represent the data points
• Develop a mathematical relation (curve fitting)
that describes the relationship between the variables
• Draw the curve for the developed mathematical
relation that best represents the given data
points
• Use the mathematical relation or the curve
  • to estimate intermediate values and to extrapolate
  • for further mathematical operations
  • for hypothesis testing


Problem

• Sketch a line that visually conforms to the data
points
• And obtain the y value for x = 8

i     1      2      3      4      5
x    2.10   6.22   7.17   10.5   13.7
y    2.90   3.83   5.98   5.71   7.74


Curve Fitting

[Figure: four different hand-sketched curves through the same five data points; x from 0 to 16, y from 0 to 10]


Curve Fitting

[Figure: another sketched curve through the same data points; x from 0 to 16, y from −4 to 10]


General observations

• Hand-drawn curves depend on the subjective viewpoint
• For the same data set, there should not be
different curves
• There may be errors in reading values from
the graph


Curve Fitting
• Two methods - depending on error in data

• Interpolation
  • Precise data
  • Force through each data point
[Figure: temperature (deg F) vs time (s); curve passing through every data point]

• Regression
  • X values are accurate
  • Y values are noisy (experimental)
  • Represent trend of the data
    without matching individual points
[Figure: f(x) vs x; straight line through scattered points]


Regression steps
(example model: Y = A*exp(–X/X0))

• Model selection
• Define a merit function measuring closeness of fit
• Compute the values of the parameters of the model
• Interpretation of results & assessment of goodness-of-fit


Right Model selection
• Understand the basic principles of the problem
• The model should represent the data trends
  • Linear model y = a0 + a1 x
  • Polynomial model
  • Non-linear model
    • Exponential law
    • Power law
    • Logarithmic law
    • Gaussian law
  • Multiple variable y = b1x1 + b2x2 + ... + bnxn + c
Describing Merit Function
• Method of least squares
• Outliers & weighting function

Residual: e = y − (a0 + a1x)
Regression model: y = a0 + a1x

[Figure: data points y1 … y5 scattered about the regression line, with residuals e2, e3 shown as vertical distances from the line]


Computing parameter values
Obtain the parameters (a0 & a1) that minimize the sum of
squares of the errors between the data points and the line.

Can be solved explicitly:
• Linear regression y = a0 + a1 x
• Polynomial regression
• Multiple regression y = b1x1 + b2x2 + ... + bnxn + c
• Exponential law
• Power law
• Logarithmic law

Solved iteratively, using the Levenberg-Marquardt algorithm:
• Non-linear regression
• Gaussian law
Goodness-of-fit

Visual inspection
Random distribution of residuals around the fitted line
Coefficient of determination r²
Standard error of parameters
Confidence interval
Prediction interval


Interpretation of results

• Curve fitting provides a correlation between the
variables
• It means that ‘X predicts Y’, not ‘X causes Y’
• Parameter values must also make sense


Linear Regression
Model selection
Assume a linear model: y = a0 + a1 x

Merit function: sum of squares of the residual errors

yi = a0 + a1 xi + ei
ei = yi − a0 − a1 xi

Sr = ∑ ei² = ∑ (yi − a0 − a1 xi)²   (sums over i = 1 … n)

[Figure: data points with residuals e = y − (a0 + a1x) about the regression model y = a0 + a1x]


Linear Regression
• Parameter computation
Find the values of a0 and a1 that minimize Sr.
Minimize Sr by setting its derivatives with respect to a0 and a1 to zero.

• First, a0:

∂Sr/∂a0 = ∂/∂a0 [ ∑ (yi − a0 − a1 xi)² ]
        = ∑ 2 (yi − a0 − a1 xi)(−1)
        = 0

Finally:
n a0 + (∑ xi) a1 = ∑ yi,   i.e.   a0 + x̄ a1 = ȳ
Linear Regression
• Second, a1:

∂Sr/∂a1 = ∂/∂a1 [ ∑ (yi − a0 − a1 xi)² ]
        = ∑ 2 (yi − a0 − a1 xi)(−xi)
        = 0

Finally:
(∑ xi) a0 + (∑ xi²) a1 = ∑ xi yi
Linear Regression

• Equations (the normal equations):

n a0 + (∑ xi) a1 = ∑ yi
(∑ xi) a0 + (∑ xi²) a1 = ∑ xi yi

• Solution:

a0 = [ (1/n) ∑ yi ∑ xi² − (1/n) ∑ xi ∑ xi yi ] / [ ∑ xi² − (1/n)(∑ xi)² ]

a1 = [ ∑ xi yi − (1/n) ∑ xi ∑ yi ] / [ ∑ xi² − (1/n)(∑ xi)² ]


Linear Regression
• Sum of squared values
• Variances & covariance of a0 and a1
[formulas lost in extraction]


Example
i     xi      yi      xi²      xi yi    yi²
1    2.10    2.90     4.41     6.09     8.41
2    6.22    3.83    38.69    23.82    14.67
3    7.17    5.98    51.41    42.88    35.76
4   10.50    5.71   110.25    59.96    32.60
5   13.70    7.74   187.69   106.04    59.91
Sum 39.69   26.16   392.45   238.78   151.35

∑ xi = 39.69     ∑ xi² = 392.45
∑ yi = 26.16     ∑ xi yi = 238.78

a0 = [ (1/n) ∑ yi ∑ xi² − (1/n) ∑ xi ∑ xi yi ] / [ ∑ xi² − (1/n)(∑ xi)² ]

a1 = [ ∑ xi yi − (1/n) ∑ xi ∑ yi ] / [ ∑ xi² − (1/n)(∑ xi)² ]
Example
With n = 5, ∑ xi = 39.69, ∑ xi² = 392.45, ∑ yi = 26.16, ∑ xi yi = 238.78:

a0 = [ (1/5)(26.16)(392.45) − (1/5)(39.69)(238.78) ] / [ 392.45 − (1/5)(39.69)² ] = 2.038

a1 = [ 238.78 − (1/5)(39.69)(26.16) ] / [ 392.45 − (1/5)(39.69)² ] = 0.4023

y = 2.038 + 0.4023x
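As a quick numerical check, the same closed-form solution can be evaluated in a few lines. This is a minimal Python/NumPy sketch (an illustration, not part of the original slides); it reproduces a0 ≈ 2.038 and a1 ≈ 0.4023 and answers the earlier question of estimating y at x = 8:

    import numpy as np

    x = np.array([2.10, 6.22, 7.17, 10.50, 13.70])
    y = np.array([2.90, 3.83, 5.98, 5.71, 7.74])
    n = len(x)

    # closed-form least-squares solution from the normal equations
    a1 = (np.sum(x * y) - np.sum(x) * np.sum(y) / n) / (np.sum(x**2) - np.sum(x)**2 / n)
    a0 = np.mean(y) - a1 * np.mean(x)

    print(a0, a1)          # ~2.038, ~0.4023
    print(a0 + a1 * 8.0)   # estimate at x = 8 -> ~5.26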
Another Approach

[Z] {A} = {Y}

[Z]ᵀ [Z] {A} = [Z]ᵀ {Y}

{A} = ( [Z]ᵀ [Z] )⁻¹ [Z]ᵀ {Y}


Example
i    xi      yi
1   2.10    2.90
2   6.22    3.83
3   7.17    5.98
4  10.50    5.71
5  13.70    7.74

y = a0 + a1 x

2.10 a1 + a0 = 2.90
6.22 a1 + a0 = 3.83
7.17 a1 + a0 = 5.98
10.50 a1 + a0 = 5.71
13.70 a1 + a0 = 7.74

[  2.10  1 ]            [ 2.90 ]
[  6.22  1 ]   [ a1 ]   [ 3.83 ]
[  7.17  1 ] * [ a0 ] = [ 5.98 ]
[ 10.50  1 ]            [ 5.71 ]
[ 13.70  1 ]            [ 7.74 ]
Example
2.10 1 2.90 
6.22 1  3.83 
2.10 6.22 7.17 10.50 13.70   a1  2.10 6.22 7.17 10.50 13.70   
 1 1 1 1 1  * 7.17 1*  = 
a 1 1 1 1 1  * 5.98 
  10.50 
1  0   5.71 
  
13.70 1  7.74 
  

392.45 39.69  a1  238.78 


39.69 *  = 
 5  a0  26.16 

a1  0.012922 - 0.10257  238.78 


a  =  *
 0  - 0.10257 1.014232  26.16 

a1  0.4022 
  = 2.0395 
a0  
y = 2.038 + 0.4023x
21/4/2006 Anuj Jain, Astt Prof, AMD, MNNIT, Allahabad 27
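The matrix route maps directly onto a linear-algebra library. A minimal sketch (again an illustration, not from the slides):

    import numpy as np

    x = np.array([2.10, 6.22, 7.17, 10.50, 13.70])
    y = np.array([2.90, 3.83, 5.98, 5.71, 7.74])

    # design matrix [Z]: one column per parameter, ordered [a1, a0]
    Z = np.column_stack([x, np.ones_like(x)])

    # normal equations: (Z^T Z) {A} = Z^T {Y}
    A = np.linalg.solve(Z.T @ Z, Z.T @ y)
    print(A)   # ~[0.4022, 2.0395]

In practice np.linalg.lstsq(Z, y, rcond=None) is preferred over forming Zᵀ Z explicitly, since the normal equations square the condition number of the problem.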
Goodness-of-fit - I
Visual inspection: linear trend matching

[Figure: the five data points with the fitted line; x from 0 to 16, y from 0 to 10]

y = 2.038 + 0.4023x


Goodness-of-fit - I
Predicted values ŷ and residuals e

xi       yi     ŷ      e = yi − ŷ
2.10    2.90   2.88     0.02
6.22    3.83   4.54    -0.71
7.17    5.98   4.92     1.06
10.50   5.71   6.26    -0.55
13.70   7.74   7.55     0.19

ŷ = 2.038 + 0.4023 x


Goodness-of-fit - I
Visual inspection

[Figure: predicted y vs measured y; the points lie near the 45° line, within deviation bands of +18.5% and −17.5%]

ŷ = 2.038 + 0.4023 x


Goodness-of-fit - II

[Figure: predicted y vs measured y, with a trend line fitted through these points: y = 0.8644x + 0.7097, R² = 0.8644]

ŷ = 2.038 + 0.4023x

This plot can be used to compare alternative mathematical models.


Goodness-of-fit - III
Residual Analysis
• If a fitted equation is representative of the data, then its
residuals should not form a pattern when plotted against
the values of the experimental variables or the fitted values.
• Sometimes a normal probability plot is used to see whether
the residuals form a pattern (the normal distribution is
representative of random variation).
These procedures allow us to investigate outliers, test
assumptions, and assess fits.


Goodness-of-fit - III
Residual Analysis

[Figure: a residual plot exhibiting a clear pattern]

The residual plot shows a pattern, indicating that the fitted
equation is not representative of the data.
Goodness-of-fit - III
Residual plot: e vs ŷ

[Figure: residuals e between −1.0 and 1.5 plotted against ŷ from 0 to 8]

There is no pattern: the residuals are randomly distributed,
indicating that the fitted equation is representative of the data.
Goodness-of-fit - IV
[Figure: the data points, their mean, and the fitted line, with St = SSyy, Sr = SSE and SSR marked]

Maximum possible residual -- total sum of squares of the residuals
between the data points and the mean:

St = Syy = ∑ (yi − ȳ)²

Unexplained residual after linear regression -- sum of squares of the
residuals between the data points and the y predicted by the linear model:

Sr = SSE = ∑ (yi − a0 − a1 xi)²

Sum of squares of residuals due to regression -- the error reduction due to
describing the data by a straight line rather than by an average value:

SSR = St − Sr
Goodness-of-fit - IV
Coefficient of Determination
Fraction of the total variation (residual) in y that is accounted for
by the fitted equation:

r² = (sum of squares of residuals due to regression) / (total sum of squares of residuals)

r² = (St − Sr) / St

For a perfect fit, Sr = 0 ⇒ r² = 1.
For no improvement over the mean, Sr = St ⇒ r² = 0.
The magnitude of r² is a measure of the relative strength of
the linear association between x and y.
Goodness-of-fit – IV
Correlation coefficient, r,
assigns a signed number between −1 and 1 that is a
measure of the strength of the relationship between
the variables.
r = 0: there is no relationship between the variables.
r = 1: there is a perfect positive relationship between the
variables; the dependent variable y can be exactly predicted
from the independent variable x by the equation of a straight line.
r = −1: there is a perfect negative relationship between the
variables; again, y can be exactly predicted from x by the
equation of a straight line.
Goodness-of-fit - IV
In practice, the value of r is never exactly 1 or −1.
Positive r:
as x gets larger, y also increases.
Negative r:
the variables are inversely related;
as x gets larger, y decreases, and
as x decreases, y increases.
Just because r is close to 1 does not mean that the
fit is necessarily good.
To confirm, always inspect a plot of the data
along with the regression line.




Goodness-of-fit - IV
Spread of the dependent variable
around the mean of the dependent variable, ȳ = 5.232

[Figure: the data points with a horizontal line at ȳ = 5.232]

Total sum of squares of the residuals between the data points and the mean:
St = Syy = ∑ (yi − ȳ)²

Standard deviation:  sy = sqrt( St / (n − 1) )

Coefficient of variation:  c.v. = sy / ȳ
Goodness-of-fit - IV
Spread of the dependent variable
around the mean of the dependent variable

i      yi     yi − ȳ   (yi − ȳ)²
1     2.90   -2.33      5.44
2     3.83   -1.40      1.97
3     5.98    0.75      0.56
4     5.71    0.48      0.23
5     7.74    2.51      6.29
Sum  26.16     St =    14.48

ȳ = 5.232
Sample standard deviation:  sy = sqrt( St / (n − 1) ) = 1.90
Coefficient of variation:  c.v. = sy / ȳ = 0.364
Goodness-of-fit - IV
Spread of the dependent variable
around the linear regression

[Figure: the data points with the fitted regression line]

Total sum of squares of the residuals between the measured y and the
y calculated with the linear model:
Sr = SSE = ∑ (yi − a0 − a1 xi)²

Standard error of estimate:  sy/x = sqrt( Sr / (n − 2) )
Goodness-of-fit - IV
Spread of the dependent variable
around the linear regression

yi      ŷ      e = yi − ŷ    e²
2.90   2.88     0.02        0.00
3.83   4.54    -0.71        0.51
5.98   4.92     1.06        1.12
5.71   6.26    -0.55        0.31
7.74   7.55     0.19        0.04
                    Sr =    1.96

Standard error of estimate:  sy/x = sqrt( Sr / (n − 2) ) = 0.81
Goodness-of-fit - IV
St = 14.48
Sr = 1.96

Coefficient of determination:
r² = (St − Sr) / St = 0.864

Correlation coefficient:
r = 0.93
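These goodness-of-fit statistics are easy to reproduce numerically. A minimal Python sketch (illustrative, not from the slides):

    import numpy as np

    x = np.array([2.10, 6.22, 7.17, 10.50, 13.70])
    y = np.array([2.90, 3.83, 5.98, 5.71, 7.74])
    y_hat = 2.038 + 0.4023 * x            # fitted line from the example

    St = np.sum((y - np.mean(y))**2)      # total sum of squares, ~14.48
    Sr = np.sum((y - y_hat)**2)           # residual sum of squares, ~1.96
    r2 = (St - Sr) / St                   # coefficient of determination, ~0.864
    s_yx = np.sqrt(Sr / (len(x) - 2))     # standard error of estimate, ~0.81

    print(St, Sr, r2, s_yx)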


Goodness of fit V
Standard error of estimate:
sy/x = sqrt( Sr / (n − 2) ) = 0.81

Standard errors in a0 and a1:
s(a0) = 0.81492
s(a1) = 0.09198

cv(a0) = 0.81492 / 2.038 = 0.40
cv(a1) = 0.09198 / 0.4023 = 0.23
Goodness of fit VI
If the measurements are normally distributed:
the range ȳ − sy to ȳ + sy will encompass approximately
68% of the measurements;
the range ȳ − 2sy to ȳ + 2sy will encompass approximately
95% of the measurements.
It is therefore possible to define an interval within which a
measurement is likely to fall with a certain confidence (probability).


Goodness of fit VI
Confidence Interval
The range around an estimated parameter within which the
true value of the parameter is expected to lie with a given
probability.
The probability that the true value falls within the bounds
L to U is 1 − α, where α is the significance level:

L = ai − s(ai) t(α/2, ν)        U = ai + s(ai) t(α/2, ν)

(ν = degrees of freedom; ν = n − 2 for a fitted straight line)
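A sketch of this computation with SciPy (illustrative). For the straight-line fit of the example, ν = n − 2 = 3, which reproduces the 95% bounds in the Excel output shown later:

    from scipy import stats

    t = stats.t.ppf(1 - 0.05 / 2, df=5 - 2)   # two-sided 95%, n - 2 dof

    for name, val, se in [("a0", 2.0395, 0.81492), ("a1", 0.4022, 0.09198)]:
        print(name, val - t * se, val + t * se)
    # a0: (-0.554, 4.633)   a1: (0.109, 0.695)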


Regression Plot

[figure not recoverable]


Polynomial Regression
• Minimize the residual between the data points
and the curve -- least-squares regression

Linear:     yi = a0 + a1 xi
Quadratic:  yi = a0 + a1 xi + a2 xi²
Cubic:      yi = a0 + a1 xi + a2 xi² + a3 xi³
General:    yi = a0 + a1 xi + a2 xi² + a3 xi³ + … + am xi^m

Must find the values of a0, a1, a2, …, am
Polynomial Regression
• Residual:

ei = yi − (a0 + a1 xi + a2 xi² + a3 xi³ + … + am xi^m)

Sum of squared residuals:

Sr = ∑ ei² = ∑ [ yi − (a0 + a1 xi + a2 xi² + a3 xi³ + … + am xi^m) ]²

• Minimize by taking derivatives with respect to each coefficient


Polynomial Regression
• Normal Equations

[ n         ∑xi        ∑xi²       …   ∑xi^m      ] [ a0 ]   [ ∑yi      ]
[ ∑xi       ∑xi²       ∑xi³       …   ∑xi^(m+1)  ] [ a1 ]   [ ∑xi yi   ]
[ ∑xi²      ∑xi³       ∑xi⁴       …   ∑xi^(m+2)  ] [ a2 ] = [ ∑xi² yi  ]
[ …         …          …          …   …          ] [ …  ]   [ …        ]
[ ∑xi^m     ∑xi^(m+1)  ∑xi^(m+2)  …   ∑xi^(2m)   ] [ am ]   [ ∑xi^m yi ]


Polynomial Regression
• Solution

[Z] {A} = {Y}
[Z]ᵀ [Z] {A} = [Z]ᵀ {Y}
{A} = ( [Z]ᵀ [Z] )⁻¹ [Z]ᵀ {Y}


Example
x   0     1.0   1.5   2.3   2.5   4.0   5.1   6.0   6.5   7.0   8.1   9.0
y   0.2   0.8   2.5   2.5   3.5   4.3   3.0   5.0   3.5   2.4   1.3   2.0
x   9.3  11.0  11.3  12.1  13.1  14.0  15.5  16.0  17.5  17.8  19.0  20.0
y  -0.3  -1.3  -3.0  -4.0  -4.9  -4.0  -5.2  -3.0  -3.5  -1.6  -1.4  -0.1

Cubic fit -- normal equations:

[ n      ∑xi    ∑xi²   ∑xi³ ] [ a0 ]   [ ∑yi     ]
[ ∑xi    ∑xi²   ∑xi³   ∑xi⁴ ] [ a1 ]   [ ∑xi yi  ]
[ ∑xi²   ∑xi³   ∑xi⁴   ∑xi⁵ ] [ a2 ] = [ ∑xi² yi ]
[ ∑xi³   ∑xi⁴   ∑xi⁵   ∑xi⁶ ] [ a3 ]   [ ∑xi³ yi ]

[ 24        229.6       3060.2       46342.8     ] [ a0 ]   [ -1.30    ]
[ 229.6     3060.2      46342.8      752835.2    ] [ a1 ]   [ -316.9   ]
[ 3060.2    46342.8     752835.2     12780147.7  ] [ a2 ] = [ -6037.2  ]
[ 46342.8   752835.2    12780147.7   223518116.8 ] [ a3 ]   [ -9943.36 ]
Example

[ a0 ]   [ -0.3593 ]
[ a1 ]   [  2.3051 ]
[ a2 ] = [ -0.3532 ]
[ a3 ]   [  0.0121 ]

Regression equation:
y = -0.359 + 2.305x - 0.353x² + 0.012x³

[Figure: the 24 data points and the fitted cubic; x from 0 to 25, f(x) from −6 to 6]
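The same cubic fit in a few lines of Python (an illustration; np.polyfit solves the identical least-squares problem, via a more stable factorization than the explicit normal equations):

    import numpy as np

    x = np.array([0, 1.0, 1.5, 2.3, 2.5, 4.0, 5.1, 6.0, 6.5, 7.0, 8.1, 9.0,
                  9.3, 11.0, 11.3, 12.1, 13.1, 14.0, 15.5, 16.0, 17.5, 17.8, 19.0, 20.0])
    y = np.array([0.2, 0.8, 2.5, 2.5, 3.5, 4.3, 3.0, 5.0, 3.5, 2.4, 1.3, 2.0,
                  -0.3, -1.3, -3.0, -4.0, -4.9, -4.0, -5.2, -3.0, -3.5, -1.6, -1.4, -0.1])

    coeffs = np.polyfit(x, y, deg=3)   # highest power first: [a3, a2, a1, a0]
    print(coeffs[::-1])                # ~[-0.359, 2.305, -0.353, 0.012]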
Exponential function
• If the relationship is an exponential function:
y = a e^(bx)
To make it linear, take the logarithm of both sides:
ln(y) = ln(a) + b x
Now it is a linear relation between ln(y) and x.
Linear regression of ln(y) on x gives b as the slope
and ln(a) as the intercept.
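A minimal sketch of this log-transform fit; the data below are made up purely for illustration (the slides give no numbers for this case):

    import numpy as np

    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = np.array([2.0, 3.3, 5.4, 8.9, 14.8])   # roughly y = 2 e^(0.5x)

    b, ln_a = np.polyfit(x, np.log(y), 1)      # fit ln(y) = ln(a) + b x
    print(np.exp(ln_a), b)                     # ~2.0, ~0.5

As the next slide notes, fitting in the log domain weights small y values heavily; a weighted fit can compensate.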


Exponential function
• The log transformation gives greater weight to small y values.
• It is better to minimize a suitably weighted function;
weighted linear regression then gives ln(a) and b.
[weighting formulas lost in extraction]




Power Function
• If the relationship is a power function:
y = a x^b
To make it linear, take the logarithm of both sides:
ln(y) = ln(a) + b ln(x)
Now it is a linear relation between ln(y) and ln(x):
the slope is b and the intercept is ln(a).
Power Function
x      y     X = ln(x)   Y = ln(y)
1.2    2.1    0.18        0.74
2.8   11.5    1.03        2.44
4.3   28.1    1.46        3.34
5.4   41.9    1.69        3.74
6.8   72.3    1.92        4.28
7.9   91.4    2.07        4.52

[Figure, left: x vs y, a curved trend; right: X = ln(x) vs Y = ln(y), a straight-line trend]
Power Function
Using the X’s and Y’s, not the original x’s and y’s

 n   n  5 5
 n ∑ Xi   ∑ Yi  ∑ X i = ∑ ln (xi ) = 8.34
 i=1    =  i=1 
a
i =1 i =1
n n 2  B  n 
  5 2 5
 ∑ Xi ∑ Xi   ∑ X Y
i i 2
i=1 i=1  i=1  ∑ X i = ∑ ln (xi ) = 14.0
i =1 i =1
5 5
∑ Yi = ∑ ln (yi ) = 19.1
i =1 i =1
 6 8.34  a  19.1  5 5
8.34 14.0   B  = 31.4 ∑ X iYi = ∑ ln (xi ) ln (yi ) = 31.4
    
i =1 i =1

21/4/2006 Anuj Jain, Astt Prof, AMD, MNNIT, Allahabad 60
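Solving this 2×2 system gives A ≈ 0.38 and B ≈ 2.0, i.e. a = e^A ≈ 1.5 and b ≈ 2.0. The same fit in Python (an illustration):

    import numpy as np

    x = np.array([1.2, 2.8, 4.3, 5.4, 6.8, 7.9])
    y = np.array([2.1, 11.5, 28.1, 41.9, 72.3, 91.4])

    b, ln_a = np.polyfit(np.log(x), np.log(y), 1)   # fit Y = A + B X
    print(np.exp(ln_a), b)                          # a ~1.5, b ~2.0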


Power Function
Example – Carbon Adsorption
q = pollutant mass sorbed per carbon mass
c = concentration of pollutant in solution
K = coefficient
n = measure of the energy of the reaction

q = K c^n
log10(q) = log10(K) + n log10(c)


Power Function
Fit on logarithmic axes:
log10(K) = 1.8733, so K = 10^1.8733 = 74.696;  n = 0.2289

[Figure: Y = log(q) vs X = log(c), with the straight line log10(q) = log10(K) + n log10(c)]
Power Function
Fit on arithmetic axes: K = 74.702 and n = 0.2289

[Figure: q vs c from 0 to 600, with the fitted curve q = K c^n]
Nonlinear Relation

Define the residuals dβ for an initial guess of the parameters λ.

Obtain the parameter corrections dλ needed to reduce the residuals dβ to zero.

In concise matrix form, with A the matrix of derivatives of the
model with respect to the parameters:

Aᵀ dβ = (Aᵀ A) dλ
dλ = (Aᵀ A)⁻¹ (Aᵀ dβ)
Nonlinear Relation
Gaussian function, with parameters (A, x0, σ):

y = A exp( −(x − x0)² / (2σ²) )

In matrix form, the same update is iterated:

Aᵀ dβ = (Aᵀ A) dλ
dλ = (Aᵀ A)⁻¹ (Aᵀ dβ)
Nonlinear Relation
Parameters (A, x0, σ):
Initial guess       (0.8, 15, 4)
Converged values    (1.03, 20.14, 4.86)
Actual values       (1, 20, 5)
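In practice this iteration is handled by a library routine. Below is a sketch using SciPy's curve_fit, which applies a Levenberg-Marquardt-type algorithm to unconstrained problems; the data are synthetic, generated around the "actual" parameters above (an illustration, not the original data):

    import numpy as np
    from scipy.optimize import curve_fit

    def gaussian(x, A, x0, sigma):
        return A * np.exp(-(x - x0)**2 / (2 * sigma**2))

    rng = np.random.default_rng(0)
    x = np.linspace(0, 40, 80)
    y = gaussian(x, 1.0, 20.0, 5.0) + rng.normal(0, 0.02, x.size)  # noisy samples

    popt, pcov = curve_fit(gaussian, x, y, p0=(0.8, 15, 4))  # initial guess from the slide
    print(popt)   # converges close to (1, 20, 5)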


Software
• Although the method involves sophisticated
mathematics,
• typical software requires only initializing the
model and parameters and pressing a button;
it then provides the results along with the statistical values.
• No software can pick a model - it can only help
in differentiating between models.
• Better programs allow users to specify their
own function.


EXCEL functions
 ToolsData AnalysisRegression FORM
 Input X range
 Input Y range
 Labels (column heading)
 Constant is zero
 Confidence level
 Output range
 Residuals & standardized residual
 Residual plots
 Line fit plot
 Normal probability plot

21/4/2006 Anuj Jain, Astt Prof, AMD, MNNIT, Allahabad 68


EXCEL functions
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.9297
R Square             0.8644
Adjusted R Square    0.8191
Standard Error       0.8092
Observations         5

ANOVA
             df    SS        MS        F         Significance F
Regression    1   12.5176   12.5176   19.1175    0.0221
Residual      3    1.9643    0.6548
Total         4   14.4819

              Coefficients  Standard Error  t Stat   P-value   Lower 95%  Upper 95%
Intercept     2.0395        0.8149          2.5027   0.0875    -0.5540    4.6329
X Variable 1  0.4022        0.0920          4.3724   0.0221     0.1095    0.6949

RESIDUAL OUTPUT
Observation  Predicted Y  Residuals  Standard Residuals
1            2.8841        0.0159     0.0227
2            4.5411       -0.7111    -1.0147
3            4.9231        1.0569     1.5082
4            6.2624       -0.5524    -0.7883
5            7.5494        0.1906     0.2720

PROBABILITY OUTPUT
Percentile   Y
10           2.90
30           3.83
50           5.71
70           5.98
90           7.74
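The headline numbers in this output can be reproduced with scipy.stats.linregress (a sketch; intercept_stderr requires SciPy >= 1.6):

    import numpy as np
    from scipy import stats

    x = np.array([2.10, 6.22, 7.17, 10.50, 13.70])
    y = np.array([2.90, 3.83, 5.98, 5.71, 7.74])

    res = stats.linregress(x, y)
    print(res.intercept, res.slope)           # ~2.0395, ~0.4022
    print(res.rvalue**2)                      # R Square, ~0.8644
    print(res.intercept_stderr, res.stderr)   # ~0.8149, ~0.0920
    print(res.pvalue)                         # ~0.0221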
EXCEL functions
[Figure: X Variable 1 Residual Plot -- residuals vs X from 0 to 16]

[Figure: X Variable 1 Line Fit Plot -- Y and Predicted Y vs X]

[Figure: Normal Probability Plot -- Y vs sample percentile]


EXCEL functions
• INTERCEPT(Xdata, Ydata) intercept with the y axis of the
best-fit straight line
• SLOPE(Xdata, Ydata) slope of the best-fit straight line
• LINEST(Xdata, Ydata, stats) best-fit straight line
• TREND(Xdata, Ydata, newXdata, const) y values
along the linear trend
• LOGEST(Xdata, Ydata, stats) best-fit exponential
curve
• CORREL(array1, array2) correlation coefficient
• PEARSON(array1, array2) Pearson correlation coefficient
• RSQ(array1, array2) square of the Pearson correlation
coefficient
EXCEL functions
• DEVSQ(array) sum of squares of deviations of the data
points from the sample mean
• STEYX(Xdata, Ydata) standard error of the predicted y
for each x in the regression
• TINV(probability, dof) Student's t-distribution
• CONFIDENCE(alpha, std dev, size) confidence
interval for a population mean
• CHITEST(actual range, expected range) test for
independence
• FTEST(array1, array2) tests whether the variances of the
two arrays are significantly different


Example 1

[figures not recoverable]


Example 2
General model:

Mh / (A_CA · D_C · ρs) = a1 (ms/ma)^a2 (v_Ci² / (g D_C))^a3 (dp/D_C)^a4

S.No.  Correlation                                          SSE
1.     Mh / (A_CA D_C ρs) = 0.0129 (ms/ma)^0.70             0.0003105
2.     Mh / (A_CA D_C ρs) = 0.0014 (v_Ci²/(g D_C))^0.24     0.0011567
3.     Mh / (A_CA D_C ρs) = 0.04 (dp/D_C)^(-0.36)           0.0009327


Example 2

S.No.  Correlation                                                                        SSE
1.     Mh / (A_CA D_C ρs) = 0.00096 (ms/ma)^1.00 (v_Ci²/(g D_C))^0.54                     0.00001030
2.     Mh / (A_CA D_C ρs) = 0.013 (ms/ma)^0.70 (dp/D_C)^0.004                             0.00031046
3.     Mh / (A_CA D_C ρs) = 0.00069 (ms/ma)^1.02 (v_Ci²/(g D_C))^0.54 (dp/D_C)^(-0.055)   0.00000892




Example 2
S.No.  Parameter in Eq. (5.4)  Value of the parameter  Standard error of the parameter  Coefficient of variation of the parameter (%)
1.     a1                      9.5739E-4               0.5185E-4                        5.416
2.     a2                      1.0046                  0.0142                           1.412
3.     a3                      0.5365                  0.0105                           1.958

R² = 0.987


Example 3

[figures not recoverable]


Thanks



Example

• Often it is difficult to determine which model is best
simply by looking at the scatter plot. In these
cases, one should find the regression equations
for the two or three most appropriate models, then
plot the data and graph each of the regression
models in the same viewing window, and decide which
model is the best fit by determining which one
follows more of the data points.
