
Numerical Methods In Physics

S. C. Phatak
Institute of Physics, Bhubaneswar 751 005
1 Differentiation
Often one has to compute the derivative of a function while doing numerical computation.
If the form of the function is simple enough ( say a sine function, exponential etc ) one
knows the derivative analytically. Then the computation of the derivative is the evaluation
of the derivative function. That is straightforward. Many times the functional form is quite
complicated. In that case, the computation of the analytical derivative could be involved. The
expression for the derivative may be long and it may not be convenient to code it. In that
case one resorts to numerical computation of the derivative. In some cases, the function may
not be given in a closed form and is evaluated numerically. In that case computation of the
analytical derivative is not possible. Finally, you may have a function given as a table of the
independent variable and the value of the function. Here also one has to resort to numerical
differentiation.
The technique of numerical differentiation is based on the Taylor series expansion of a
function:

f(x) = f(x_0) + (x - x_0) \left. \frac{df(x)}{dx} \right|_{x=x_0} + \frac{(x - x_0)^2}{2!} \left. \frac{d^2 f(x)}{dx^2} \right|_{x=x_0} + \cdots   (1)
If x - x_0 is small enough, one can neglect the higher order terms and define the approximate
value of the derivative to be

\left. \frac{df(x)}{dx} \right|_{x=x_0} = \frac{f(x) - f(x_0)}{x - x_0}   (2)
In fact, the derivative is defined as the limit of the expression above as x approaches x_0.
So, while computing the derivative on the computer, one has to evaluate the function f at
two values of the argument close enough to each other. But x - x_0 cannot be made
arbitrarily small because of the round-off errors.
Note that the error in the value of the derivative depends on (x - x_0)^2 and the second
derivative of the function at x_0. If one can eliminate the second derivative somehow, one
will be able to compute the derivative more accurately even for a relatively large value of
x - x_0. This can be done if we use the Taylor series for two values of x; x_1 = x_0 + \Delta x and
x_2 = x_0 - \Delta x. I leave it to you to show that the derivative then can be defined as
\left. \frac{df(x)}{dx} \right|_{x=x_0} = \frac{f(x_1) - f(x_2)}{x_1 - x_2}   (3)

and then the error in the derivative is proportional to (x_1 - x_2)^3 \left. \frac{d^3 f(x)}{dx^3} \right|_{x=x_0}.
Problem 1: Using these two definitions of the derivative, write a program to
compute the derivative of known functions ( sine, cosine, exponential, logarithm,
Bessel functions, power series etc ). Show that the second definition is superior.
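As an illustration, here is a minimal Fortran sketch for Problem 1 ( the program layout, the names and the choice of sin(x) as test function are my own, not prescribed above ). It compares the forward difference of Eq. (2) with the central difference of Eq. (3):

program deriv_test
  implicit none
  double precision :: x0, dx, dfwd, dcen, exact
  integer :: k
  x0 = 1.0d0
  exact = cos(x0)                 ! analytic derivative of sin(x)
  dx = 0.1d0
  do k = 1, 8
     dfwd = (sin(x0+dx) - sin(x0)) / dx              ! Eq. (2)
     dcen = (sin(x0+dx) - sin(x0-dx)) / (2.0d0*dx)   ! Eq. (3)
     write(*,'(3es15.6)') dx, abs(dfwd-exact), abs(dcen-exact)
     dx = dx / 10.0d0
  end do
end program deriv_test

As dx decreases, the error of the central difference should fall much faster than that of the forward difference, until round-off errors take over for very small dx.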
One can extend the method of derivative calculation to higher order derivatives. But it
must be noted that the accuracy of the numerical values of the higher derivatives degrades
very fast. This is because every time one computes a derivative there is a loss of accuracy due
to truncation. For higher order derivatives the error accumulates fast and soon one starts
getting into noise.
Problem 2: Calculate the first, second and third derivatives of the exponential
numerically using one of the definitions given above. Show that the accuracy is
least for the third order derivative.
Note: While computing the third order derivative, you need the second order
derivative at two values of the argument; this in turn means the first order
derivative is needed at four ( three if you are careful ) values of the argument
and the function at eight ( or five ) values of the argument.
Many times one has data given as a function of an argument at fixed values. One
then has to calculate the derivative. There are three different methods of computing the
derivatives: methods using forward and backward differences and one using medians. The
median method is the best since the values of the function used are close to the argument
at which one is computing the derivatives. Further, if the values are given at equally spaced
points, one can use the data such that even or odd derivatives are eliminated.
2 Interpolation
We need to use interpolation techniques when the evaluation of a quantity as a function of
one or several variables is extremely time consuming, so we compute the quantity at a discrete
set of points and using those values we want to estimate the quantity at intermediate points.
Or, the quantity may be arising from an experimental measurement. In that case, the
quantity will necessarily be available at discrete points. Generally interpolation, where the
function is to be estimated at a point lying within the points at which the function is known,
is safer and works very well, particularly if the underlying function is smooth. Sometimes
one needs to do extrapolation when the point lies outside the range of the points at which
the function is known. That, in general, is dangerous ( like predicting weather or future ).
At the most, it may work if the point is close to the last or first point at which the function
is known. We shall discuss several methods of interpolation.
In the following we shall assume that the function is of one variable ( x ) and it is known
at N points x_i, i = 1, N and the values are y_i = f(x_i). We shall also assume that the x_i are
in increasing order. The programs we shall be developing can be generalised for increasing or
decreasing order of the x_i. But they must be ordered.
2.1 Polynomial Interpolation
Let us consider the linear interpolation first. Consider that a function y(x) is known at
points x_i, with i = 1 ... n. The corresponding values are y_i. In linear interpolation, the
value of the function at any point x is computed by first determining which two x_i bracket
x, fitting a straight line between x_i and x_{i+1} and computing

y(x) = y_i + \frac{x - x_i}{x_{i+1} - x_i} (y_{i+1} - y_i) = \frac{x - x_{i+1}}{x_i - x_{i+1}} y_i + \frac{x - x_i}{x_{i+1} - x_i} y_{i+1}   (4)
The most general polynomial interpolation formula, which is unique, is

y = f(x) = \sum_{i=1}^{N} \left[ \prod_{j=1, j \neq i}^{N} \frac{x - x_j}{x_i - x_j} \right] y_i   (5)
Note that the interpolation formula above is a polynomial of order N - 1. For N = 2,
we have linear interpolation;

y = f(x) = \frac{x - x_2}{x_1 - x_2} y_1 + \frac{x - x_1}{x_2 - x_1} y_2   (6)
Several points should be noted about these interpolation formulae.
1. The interpolating polynomial function always agrees with the input y_i whenever
x = x_i. This can be readily checked from the formula.
2. The extrapolation using the polynomial formula leads to divergent results and the
divergence becomes worse as N increases. The interpolating function behaves as x^{N-1}
as x \to \infty.
3. Even for interpolation, a large value of N leads to possibly unphysical oscillations in the
interpolating function. It is therefore advisable to use as few points as possible for interpolation.
Generally, three or four point interpolation works well. In no case should one use all the
data points for interpolation. For one thing, one has unphysical oscillations. Secondly,
the computation is long if one has many points. For the N point interpolation formula one
has N-1 terms, evaluation of each requires 2N-1 multiplications, one division and 2N-2
subtractions.
4. One can convince oneself that the interpolating function has discontinuous higher
derivatives as one passes over the points x_i. The function is however continuous.
For example, two point interpolation has a discontinuous first derivative and the second
order derivative is not defined at these values of x.
5. Since the number of interpolation points is generally going to be less than the total
number of points at which the function is known, one needs to find points close to
the value at which the function is to be evaluated. Then we are sure that we are
interpolating the function. Also, one expects that the function should be close to the
tabulated values if the argument is close to the values at which the function is known.
The last point implies that one has to locate the value of i for which x_i < x < x_{i+1}.
The simplest way is to compare x_i and x_{i+1} with x for increasing values of i and search till
the condition above is satisfied. This works but one can do better than that. One can use
the bisection method. One begins the checking at j = N/2. If x_j < x, the required i is larger
than j. Otherwise it is smaller than j. Depending on the condition, choose the next j to be
(N + j)/2 or j/2. Continue the process till the bracketing interval is unity.
Write a subroutine called locate to determine i such that x_i < x < x_{i+1} when an
array x_i; i = 1, N and x is given. Use the bisection method. The output should be i
satisfying the condition above. Arrange to return i = 0 if x < x_1 and i = N + 1
if x > x_N. This will warn you if you are doing extrapolation.
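A possible locate along these lines ( a sketch; the argument order and the names are my own choices ). It assumes the array x(1..n) is in increasing order:

subroutine locate(x, n, xval, i)
  implicit none
  integer, intent(in) :: n
  double precision, intent(in) :: x(n), xval
  integer, intent(out) :: i
  integer :: jlo, jhi, jm
  if (xval < x(1)) then
     i = 0                        ! warn: extrapolation below the table
     return
  else if (xval > x(n)) then
     i = n + 1                    ! warn: extrapolation above the table
     return
  end if
  jlo = 1
  jhi = n
  do while (jhi - jlo > 1)        ! bisection: halve the bracket each pass
     jm = (jlo + jhi) / 2
     if (xval >= x(jm)) then
        jlo = jm
     else
        jhi = jm
     end if
  end do
  i = jlo                         ! now x(i) <= xval <= x(i+1)
end subroutine locate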
It is possible that one needs to interpolate the function several times in a program.
Further, many times one needs to interpolate the function repeatedly at values very close
to each other. In that case it is a waste to call locate every time since i will be the same or
differ by one or two. One can write another subroutine hunt to get the new value of i.
Write a subroutine hunt to determine i such that x_i < x < x_{i+1} when an array
x_i; i = 1, N and x is given. The routine should start the hunt from the i which
is passed to the subroutine from the main routine from the previous locate or
hunt call. The code should check if the input value of i satisfies the condition
above. If it does not, i should be increased or decreased by one and the check
made again. The increase or decrease should be doubled at each step to bracket
x between x_j and x_k. Once it is bracketed, the bisection method should be
employed to satisfy the above condition.
The subroutine locate or hunt will return i such that x is bracketed between x_i and x_{i+1}.
One can then use a three or four point interpolation formula with the interpolation points
x_i nearest to x. For this one needs to write a subroutine named ( say ) interpol which has
{x_i}, {y_i}, N, x and i as input and the interpolated value y of the function as output.
Write the subroutines interpol3 and interpol4 to do three and four point interpolation,
given the {x_i}, {y_i}, N, x and i such that x is bracketed between x_i and
x_{i+1}. For three point interpolation use x_{i-1}, x_i and x_{i+1} as interpolation points.
For four point interpolation use x_{i-2}, x_{i-1}, x_i and x_{i+1} as interpolation points.
For x close to x_1 and x_N, ensure that the interpolating points remain within
the table of {x_i}.
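A sketch of interpol3 built on the Lagrange formula of Eq. (5), using the three points x_{i-1}, x_i, x_{i+1} with the edge handling described in the problem ( names and argument order are mine; n >= 3 is assumed ); interpol4 is the obvious extension:

subroutine interpol3(x, y, n, xval, i, yval)
  implicit none
  integer, intent(in) :: n, i
  double precision, intent(in) :: x(n), y(n), xval
  double precision, intent(out) :: yval
  integer :: j0, j, k
  double precision :: term
  j0 = i - 1                          ! first of the three points
  if (j0 < 1) j0 = 1                  ! stay inside the table at the ends
  if (j0 > n - 2) j0 = n - 2
  yval = 0.0d0
  do j = j0, j0 + 2                   ! Lagrange sum over three points
     term = y(j)
     do k = j0, j0 + 2                ! product over the other two points
        if (k /= j) term = term * (xval - x(k)) / (x(j) - x(k))
     end do
     yval = yval + term
  end do
end subroutine interpol3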
2.1.1 Rational Function Interpolation
The polynomial interpolation works very well if the function being interpolated is smooth.
There are problems if it has nearby poles, in which case the polynomial approximation to
the function is bad. In such situations, rational function approximation works very well. In
this case, one chooses a function of the form

R(x) = \frac{a_0 + a_1 x + \cdots}{1 + b_1 x + \cdots}   (7)
for interpolation. It takes care of poles and to some extent branch points of the function.
Further, the rational function approximation also works when the polynomial approximation
works, since the denominator can always be expanded in a power series and then the method
becomes similar to the polynomial approximation. One may note that Pade approximants,
which seem to be a powerful method of approximating functions, are related to the rational
function approximation. The fitting procedure then requires the determination of the a_i and
b_i. It turns out that one can obtain a recurrence relation for computing the rational
function approximation for m + 1 points in terms of m and m - 1 points in the diagonal
approximation. That is, the numerator and denominator have equal numbers of powers of x if m is
even, with the denominator having one extra power if m is odd. The recurrence relation is
R_{i(i+1)\cdots(i+m)}(x) = R_{(i+1)\cdots(i+m)}(x) + \frac{R_{(i+1)\cdots(i+m)}(x) - R_{i(i+1)\cdots(i+m-1)}(x)}{\frac{x - x_i}{x - x_{i+m}} \left( 1 - \frac{R_{(i+1)\cdots(i+m)}(x) - R_{i(i+1)\cdots(i+m-1)}(x)}{R_{(i+1)\cdots(i+m)}(x) - R_{(i+1)\cdots(i+m-1)}(x)} \right) - 1}   (8)

with the initial conditions that R_{i(i+1)\cdots(i+m)}(x) = y_i for m = 0 and R_{i(i+1)\cdots(i+m)}(x) = 0 for
m = -1.
Writing a subroutine for rational function interpolation is left as an exercise for enthusiasts.
Note that for this case also one would like to use few points ( three or four ) for
determining the rational function. So here also one needs the subroutines locate or hunt.
2.1.2 Cubic Spline Interpolation
Consider the linear interpolation in the interval [x_i, x_{i+1}],

y = A y_i + B y_{i+1}   (9)

where A = \frac{x_{i+1} - x}{x_{i+1} - x_i} and B = \frac{x - x_i}{x_{i+1} - x_i}. The derivative of the interpolating function is
constant in the interval and it changes from interval to interval. Therefore the second
derivative is zero within the interval and is not defined at the boundaries of the intervals. If
the second derivative of the function at the points x_i is known, we can use these to ensure
the continuity of the second derivatives. It turns out that a unique way of doing it is by the
formula

y = A y_i + B y_{i+1} + C y''_i + D y''_{i+1}   (10)

where A and B are as defined above and C = \frac{1}{6}(A^3 - A)(x_{i+1} - x_i)^2 and D = \frac{1}{6}(B^3 - B)(x_{i+1} - x_i)^2.
Note that the dependence on x enters through A, B, C and D and the dependence is cubic in x.
Further, since A = 1(0) and B = 0(1) for x = x_i (x_{i+1}), the first and second derivatives of
the interpolating function are continuous across the x_i. In fact these are

y' = \frac{y_{i+1} - y_i}{x_{i+1} - x_i} - \frac{3A^2 - 1}{6}(x_{i+1} - x_i)\, y''_i + \frac{3B^2 - 1}{6}(x_{i+1} - x_i)\, y''_{i+1}

and y'' = A y''_i + B y''_{i+1}. Thus, the second
derivative is interpolated linearly in the interval [x_i, x_{i+1}].
Most of the time the y''_i are not available. Then one has to estimate them from the
known y_i. One method of going about it is to use the expression for the first derivative of
the interpolating function given above, evaluate it in the intervals [x_i, x_{i+1}] and [x_{i-1}, x_i]
and equate the two. After some algebra one gets
\frac{x_i - x_{i-1}}{6} y''_{i-1} + \frac{x_{i+1} - x_{i-1}}{3} y''_i + \frac{x_{i+1} - x_i}{6} y''_{i+1} = \frac{y_{i+1} - y_i}{x_{i+1} - x_i} - \frac{y_i - y_{i-1}}{x_i - x_{i-1}}   (11)
There are N - 2 such equations for 2 \le i \le N - 1 for N unknowns. One has to supply
the values of y''_1 and y''_N to solve these equations. One method assumes them to be zero.
Another method determines them by assuming some value for the first derivative at the
end points. This set of linear equations can then be solved to obtain the y''_i.
Another method is to compute the second derivatives from the y_i assuming a Taylor series
expansion of the function about x_i.
This method is called cubic spline interpolation. The name spline comes from the curves
used by draftsmen for drawing smooth curves. Notice that the interpolating function one
gets is cubic in x. But it differs from the four point polynomial interpolation, which is also
cubic in x. One can check that by comparing the coefficients of the two polynomial terms.
The difference arises from the condition of continuity of the first and second derivatives at each
x_i.
Developing a program for cubic spline interpolation is left to enthusiastic students.
Again, the cubic spline interpolation program will require subroutines locate and hunt.
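For the enthusiasts, a sketch of the first step: setting up Eq. (11) for i = 2 ... N-1 with the natural choice y''_1 = y''_N = 0 and solving the resulting tridiagonal system by forward elimination and back substitution ( the routine name and layout are my own; n >= 3 is assumed ):

subroutine spline_ypp(x, y, n, ypp)
  implicit none
  integer, intent(in) :: n
  double precision, intent(in) :: x(n), y(n)
  double precision, intent(out) :: ypp(n)
  double precision :: a(n), b(n), c(n), r(n), m
  integer :: i
  ypp(1) = 0.0d0                      ! natural boundary conditions
  ypp(n) = 0.0d0
  do i = 2, n - 1                     ! coefficients of Eq. (11)
     a(i) = (x(i) - x(i-1)) / 6.0d0
     b(i) = (x(i+1) - x(i-1)) / 3.0d0
     c(i) = (x(i+1) - x(i)) / 6.0d0
     r(i) = (y(i+1) - y(i)) / (x(i+1) - x(i)) - (y(i) - y(i-1)) / (x(i) - x(i-1))
  end do
  do i = 3, n - 1                     ! eliminate the sub-diagonal
     m = a(i) / b(i-1)
     b(i) = b(i) - m * c(i-1)
     r(i) = r(i) - m * r(i-1)
  end do
  ypp(n-1) = r(n-1) / b(n-1)          ! back substitution
  do i = n - 2, 2, -1
     ypp(i) = (r(i) - c(i) * ypp(i+1)) / b(i)
  end do
end subroutine spline_ypp

Once the y''_i are available, the interpolated value follows from Eq. (10) after a call to locate or hunt.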
3 Integration
Let us consider evaluation of the integral of a one dimensional function f(x):

I = \int_a^b f(x) dx   (12)

If we know a function g(x) such that f(x) = \frac{dg(x)}{dx} then I = g(b) - g(a). Classically, the
integral is defined as the area under the curve f(x) in the interval [a, b] ( as shown in the
accompanying figure ).
For reasonably well behaved functions ( functions having finite discontinuities in the interval
[a, b] and having no singularities in the interval ) the integral is also defined as a limiting
process

I = \frac{b - a}{N} \sum_{i=1,N} f\left( a + \frac{(i - \frac{1}{2})(b - a)}{N} \right) = \frac{b - a}{N} \left[ \frac{1}{2} f(a) + \sum_{i=1,N-1} f\left( a + \frac{i(b - a)}{N} \right) + \frac{1}{2} f(b) \right]   (13)

in the limit N \to \infty.

[Figure 1: Extended trapezoidal and midpoint rules]
3.1 Formulae With Equally Spaced Points
The formula in the above equation can be written in a compact form by defining h = \frac{b-a}{N}.
What this means is that the interval [a, b] is divided into N equal parts of length h and the integral
in each part is approximated by h times the value of the function at the midpoint ( the
first equation ) or the average of the function at the two ends of the part ( the second equation
). These formulae are often called the extended midpoint rule and the extended trapezoidal rule
respectively. The meaning of these terms is obvious. The formula is exact in the limit
N \to \infty. On a computer, however, we cannot take the limit N \to \infty and we have to
estimate the integral for some reasonably large value of N. This is depicted in Figure 1,
where the trapezoidal and midpoint rules are shown by dashed and dotted lines
respectively.
[Figure 2: Geometric definition of the integral]
In fact, there is an exact expression for the integral in terms of the sum given above
plus correction terms which depend on the higher derivatives of the integrand:

I = \frac{b - a}{N} \left[ \frac{1}{2} f(a) + \sum_{i=1,N-1} f\left( a + \frac{i(b - a)}{N} \right) + \frac{1}{2} f(b) \right] - \sum_i \frac{B_{2i} h^{2i}}{(2i)!} \left( f^{(2i-1)}(b) - f^{(2i-1)}(a) \right)
  = I^t_N - \sum_i \frac{B_{2i} h^{2i}}{(2i)!} \left( f^{(2i-1)}(b) - f^{(2i-1)}(a) \right)   (14)
where the B_{2i} are the Bernoulli numbers. This is the Euler-Maclaurin summation formula
for the integration. So, in principle, if one knows all the derivatives of the function, one has
an analytic expression for the integral. Usually, this is not the case so one has to limit the
summation to a finite number of derivatives. In fact, most of the time we may not know any
derivatives of the integrand at the end points.
Clearly, the correction terms go to zero ( provided all the higher derivatives of the
function are finite ) in the limit of h = \frac{(b-a)}{N} going to zero. We can take advantage of the
formula by computing the first term for two or more values of N and use the result to
eliminate some of the leading terms of the correction series. In particular, if we compute
the integral for N and 2N intervals then it is straightforward to show that

I = \frac{4}{3} I^t_{2N} - \frac{1}{3} I^t_N + O\left(\frac{1}{N^4}\right)   (15)

So, with the two computations we have got a new formula which is expected to be much
more accurate. This is the extended Simpson's rule. We can write it in a more familiar
form by defining h = \frac{b-a}{2N},
I^S_{2N} = h \left[ \frac{1}{3} f(a) + \frac{4}{3} f(a + h) + \frac{2}{3} f(a + 2h) + \cdots + \frac{4}{3} f(b - h) + \frac{1}{3} f(b) \right]   (16)
The trapezoidal and Simpson rules are called closed formulae because the end-points are
included in the formula. The midpoint rule is an open formula. Sometimes an open formula is
useful if the integrand has an integrable singularity at the end-point(s). The closed formula
does not work in such cases because the function is singular at the end point(s).
We can obtain a formula similar ( in spirit ) to the Simpson formula from the midpoint
rule. The trick is to triple the number of points here since by doubling we cannot take
advantage of previous computations. We then have
I = \frac{9}{8} I^m_{3N} - \frac{1}{8} I^m_N + O\left(\frac{1}{N^4}\right)   (17)

where I^m_N is the estimate of the integral using the extended midpoint rule;

I^m_N = \frac{b - a}{N} \sum_{i=1,N} f\left( a + \frac{(i - \frac{1}{2})(b - a)}{N} \right)   (18)
Program 1: Write down a subroutine which will compute an integral \int_a^b dx f(x)
using the trapezoidal rule. The input should be the lower and upper limits (
a and b ) and the function f which is defined as an external function ( to be
defined separately ), and the output should be the estimate of the integral and
an error estimate. The procedure adopted should be that beginning with the
number of intervals equal to 1, the program should double the number of
intervals every time and compute the integral. You can choose the maximum
number of intervals to be 2^N with N some integer ( say between 5 and 10
). The error estimate will then be the difference between the integrals for 2^{N-1}
and 2^N intervals. The other possibility is to keep on increasing the number of
intervals till some desired accuracy \epsilon is reached ( i.e. |(I_{2^{N-1}} - I_{2^N})/I_{2^N}| < \epsilon
).
Program 2: Write the subroutine for computing the integral using Simpson's
rule. Note that you can modify the program developed in the previous
problem to do Simpson's rule. But then N should be greater than 2 ( at
least ). You will then have to use the results for 2^{N-1} and 2^N intervals
to compute the integral using Simpson's rule. You can also have an error
estimate as in the previous problem.
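A combined sketch for Programs 1 and 2 ( the structure and names are my own choices ). The number of intervals is doubled on each pass, reusing the already computed points, and the Simpson estimate is built from two successive trapezoidal estimates through Eq. (15):

subroutine trapsimp(f, a, b, nmax, trap, simp, err)
  implicit none
  double precision, external :: f
  double precision, intent(in) :: a, b
  integer, intent(in) :: nmax            ! use 2**nmax intervals at most
  double precision, intent(out) :: trap, simp, err
  double precision :: h, sum, told, sold
  integer :: it, j, npts
  trap = 0.5d0 * (b - a) * (f(a) + f(b)) ! one-interval estimate
  simp = trap
  npts = 1
  do it = 1, nmax
     told = trap
     sold = simp
     h = (b - a) / npts
     sum = 0.0d0
     do j = 1, npts                      ! only the new midpoints
        sum = sum + f(a + (j - 0.5d0) * h)
     end do
     trap = 0.5d0 * (trap + h * sum)     ! trapezoid with doubled intervals
     simp = (4.0d0 * trap - told) / 3.0d0   ! Eq. (15)
     npts = 2 * npts
  end do
  err = abs(simp - sold)                 ! crude error estimate
end subroutine trapsimp

Here f must be declared as an external double precision function of one argument in the calling program.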
Problem 3: One can use the extrapolation technique to get a better estimate
of the integral using the trapezoidal rule for 2^N intervals with different values
of N. From the Euler-Maclaurin series we know that the correction terms
form a series in powers of 1/N^2. Considering the integral estimated from the
trapezoidal rule as a function of the variable 1/N^2, we can estimate the integral
by extrapolation to 1/N^2 = 0. One can use the polynomial interpolation
formula.
There are a number of other integration formulae ( both open and closed type ) using
intervals of equal spacing. We shall not be considering them as they are more complicated
to program and they do not seem to have much advantage over the trapezoidal, midpoint or
Simpson rules described above.
Given an integral to be done on a computer, one is not sure about the number of points
to be used in the calculation. In that case, one can use a certain number of integration
points and check the accuracy by doubling the number of points. Note that, if one uses the
trapezoidal rule, half of the points at which the function is needed for the second calculation are
already computed in the first step. Using the two results, we have a refined estimate of the
answer ( which is actually the Simpson rule ). One should have the difference between the
second computation and the refined estimate be tolerably small. If it is not, one needs to
double the number of points again and repeat the procedure. One can have an integration
subroutine using this algorithm.
3.2 Gaussian Quadrature
The formulae developed in the previous section have the integration points equally spaced,
or the whole integration interval is divided into subintervals of equal lengths. Gaussian
quadrature formulae, on the other hand, consider that the integration points as well as
the weights can be adjusted to obtain a formula which is exact for polynomials up to a certain
maximum power. Consider that you have N points ( x_i, i going from 1 to N )
and N weights ( w_i ) so that we can write

\int_a^b dx f(x) = \sum_{i=1,N} w_i f(x_i)   (19)
Since we have 2N parameters, we can vary these to make the formula exact for a polynomial
having maximum power 2N - 1. As an example, consider an integral in the interval [-1, 1]
which we want to do with two points x_1 and x_2 and two weights w_1 and w_2. We should be
able to determine these by insisting that \int_{-1}^{1} dx x^n = w_1 x_1^n + w_2 x_2^n for n between 0 and
3. This gives four equations
2 = w_1 + w_2   (20)
0 = w_1 x_1 + w_2 x_2   (21)
\frac{2}{3} = w_1 x_1^2 + w_2 x_2^2   (22)
0 = w_1 x_1^3 + w_2 x_2^3   (23)
The second and fourth equations above are satisfied if x_1 = -x_2 and w_1 = w_2. The
first equation above along with the previous statement implies that w_1 = w_2 = 1. Finally,
the third equation above along with the previous statements gives x_1 = \sqrt{\frac{1}{3}} and x_2 = -x_1.
This is the two point Gaussian quadrature formula. As we shall see later, it is actually the
Gauss-Legendre quadrature formula.
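A small numerical check of this result ( my own test, not part of the notes ): with x = \pm 1/\sqrt{3} and unit weights, the two point formula should reproduce the integrals of 1, x, x^2 and x^3 over [-1, 1] exactly.

program gauss2_check
  implicit none
  double precision :: xg, approx, exact
  integer :: n
  xg = 1.0d0 / sqrt(3.0d0)
  do n = 0, 3
     approx = (-xg)**n + xg**n                    ! w1 = w2 = 1
     exact = (1.0d0 - (-1.0d0)**(n+1)) / (n + 1)  ! integral of x**n over [-1,1]
     write(*,'(i3,2f12.6)') n, approx, exact
  end do
end program gauss2_check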
The derivation above is correct but it is difficult to generalise it to a larger number of
points. Actually, the theory goes under the name of Gaussian quadrature and the general
statement of the quadrature is that one can find the points and weights for an integral of the
form

\int_a^b dx w(x) f(x) = \sum_{i=1,N} w_i f(x_i)   (25)
where w(x) is some weight function and the summation on the right hand side is exact
if f(x) is a polynomial of order (2N-1). The limits of integration a and b could be infinite.
The derivation of the x_i and w_i uses the property that there exists a complete set
of functions P_i(x) which are polynomials in x and which satisfy the orthonormality relation
\int_a^b dx w(x) P_i(x) P_j(x) = \delta_{i,j} C_i where C_i is a function of i. In fact, the P_i(x) form a complete
set in the range [a, b] so that an arbitrary function can be expanded as a linear combination
of the P_i(x). Below I list some of the sets of polynomials which are often used in Gaussian
quadrature.
Name                      a        b        w(x)
Legendre polynomials     -1        1        1
Chebyshev polynomials    -1        1        1/\sqrt{1 - x^2}
Hermite polynomials    -\infty   \infty     e^{-x^2}
Laguerre polynomials      0      \infty     e^{-x}
There are simple subroutines to compute points and weights for all the cases mentioned
above in terms of the polynomials going by these names. Of these, Gauss-Legendre quadrature
is the most often used method so we shall discuss that in more detail. The basic
idea behind Gaussian quadrature is the use of the orthogonality of the set of polynomials being
used. These polynomials have an index ( order ) which starts at zero and increases to
infinity, and the number of zeroes each polynomial has is exactly equal to the index of the
polynomial. Thus the Legendre polynomial of order N has exactly N zeroes between -1
and 1. If we choose the N zeroes of the N-th order polynomial as the integration points, by
construction \int_{-1}^{1} dx P_N(x) = 0 = \sum_{i=1,N} w_i P_N(x_i). We can then adjust the weights so that
\sum_{i=1,N} w_i P_j(x_i) = 0 for 0 < j < N and \sum_{i=1,N} w_i P_0(x_i) = \sum_{i=1,N} w_i = 2. These are N
equations for the N unknowns w_i. It turns out that w_i = \frac{\int_{-1}^{1} dx P^2_{N-1}(x)}{P_{N-1}(x_i) P'_N(x_i)}. So, determination of
the integration points and weights can be done for an arbitrary value of N.
Problem 4: Using the subroutine developed for Legendre polynomials, develop
a program to compute the integration points and weights for N point
Gauss-Legendre quadrature. Use the bisection method to find the zeroes of
the Nth order Legendre polynomial. The Legendre polynomial is symmetric (
antisymmetric ) for even ( odd ) N so effectively you need to get the positive
zeroes only ( for odd N there is a zero at x = 0 ). Having determined the
zeroes ( which are the x_i ), compute P_{N-1} and P'_N at these points. Use the
recurrence relation P'_N(x) = \frac{N(x P_N(x) - P_{N-1}(x))}{x^2 - 1} and w_j = \frac{2}{(1 - x_j^2)[P'_N(x_j)]^2}.
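A sketch for Problem 4 ( it refines the zeroes with Newton's method rather than the bisection suggested in the problem; the starting guess is the standard one, and the names are my own choices ):

subroutine gauleg(n, x, w)
  implicit none
  integer, intent(in) :: n
  double precision, intent(out) :: x(n), w(n)
  double precision, parameter :: pi = 3.141592653589793d0
  double precision :: z, zold, p1, p2, p3, pp
  integer :: i, j
  do i = 1, (n + 1) / 2                       ! positive zeroes only
     z = cos(pi * (i - 0.25d0) / (n + 0.5d0)) ! standard starting guess
     do
        p1 = 1.0d0                            ! P_0
        p2 = 0.0d0
        do j = 1, n                           ! upward recurrence for P_j
           p3 = p2
           p2 = p1
           p1 = ((2*j - 1) * z * p2 - (j - 1) * p3) / j
        end do
        pp = n * (z * p1 - p2) / (z*z - 1.0d0)  ! P'_N, as in the problem text
        zold = z
        z = zold - p1 / pp                    ! Newton step
        if (abs(z - zold) < 1.0d-14) exit
     end do
     x(i) = -z                                ! zeroes come in +- pairs
     x(n + 1 - i) = z
     w(i) = 2.0d0 / ((1.0d0 - z*z) * pp * pp) ! weight formula above
     w(n + 1 - i) = w(i)
  end do
end subroutine gauleg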
Many times one linearly scales the Gauss-Legendre points to the required integration
interval. The Gauss-Legendre points do not include end-points so one may use them when
one has integrable end-point singularities.
In all the discussions above, we have assumed that the integrand is regular, i.e. it is
piece-wise continuous and does not have any singularities in the integration region, and the
limits of integration are finite. If one or both limits of integration are infinite, one can use
a suitable coordinate transformation to cast the integral in a form where the limits are finite.
For known integrable singularities, one can use open ended integration formulae so that one
does not step on the singularities. Another related problem is when one has to compute a
principal value integral when there is a simple pole on the real axis. By definition the principal
value integral is

I_{PV} = P \int_a^b dx f(x) \frac{1}{x - c}   (26)
     = \lim_{\epsilon \to 0} \left[ \int_a^{c-\epsilon} dx f(x) \frac{1}{x - c} + \int_{c+\epsilon}^b dx f(x) \frac{1}{x - c} \right]
One can do this integral by two methods. One is to compute the integral as defined in the
second line. The other is to add and subtract an integral having the same singularity structure,
where that integral is known. One can then arrange things so that the computer evaluates
an integral which does not have the singularity at x = c and subtract a known integral.
For example, P \int_{c-a}^{c+a} dx \frac{1}{x - c} = 0 so we can use this to cancel the singularity at x = c in the
integral to be evaluated.
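A sketch of this subtraction trick for a < c < b ( the regularised integrand and the midpoint-rule driver are my own choices ). It uses the identity P \int_a^b f(x)/(x-c) dx = \int_a^b (f(x) - f(c))/(x-c) dx + f(c) \ln((b-c)/(c-a)), in which the first integral is regular at x = c and the second is known:

function pvint(f, a, b, c, n) result(pv)
  implicit none
  double precision, external :: f
  double precision, intent(in) :: a, b, c
  integer, intent(in) :: n
  double precision :: pv, h, x, g
  integer :: i
  h = (b - a) / n
  pv = 0.0d0
  do i = 1, n                         ! open ( midpoint ) rule
     x = a + (i - 0.5d0) * h
     if (abs(x - c) > 1.0d-12) then
        g = (f(x) - f(c)) / (x - c)   ! regularised integrand
     else
        g = 0.0d0                     ! true limit is f'(c); skipping the point is a crude stand-in
     end if
     pv = pv + g
  end do
  pv = pv * h + f(c) * log((b - c) / (c - a))
end function pvint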
3.3 Multi-dimensional Integration
In principle, one can extend the one-dimensional integration methods to more than one
dimension. This is indeed trivial if the integration boundary in multi-dimensional integrations
is simple ( say a multi-dimensional cube or sphere or some such object ). If the
boundary is complicated, one needs to incorporate that in the integration scheme. That
may be difficult but can still be programmed. The real difficulty comes when the dimensionality
is large. One then needs a large number of integration points to get a reasonable
accuracy for the integral. Typically, if one needs 10 points for 1% accuracy in one dimension,
for N-dimensional integration one would expect 10^N integration points to achieve
similar accuracy. This runs into millions of points for more than six dimensions. At this
stage, it may be useful to use Monte Carlo methods to estimate the integral. Generally, for
Monte Carlo methods with N points, the accuracy goes as \frac{1}{\sqrt{N}}. So with a million points
in N dimensions, we would expect an accuracy of about 0.1%. We shall discuss the Monte
Carlo method when we discuss random numbers.
4 Linear Equations And Matrices
Consider the problem of solving a set of linear equations

a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n = b_1
a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n = b_2
\vdots   (27)
a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n = b_m
If m > n then the set is over-determined and if m < n it is under-determined. For
m = n, there is possibly a unique solution. We can write this equation in a matrix form

\sum_j A_{i,j} X_j = B_i   (28)

or

A X = B   (29)

where A is an m \times n matrix and X and B are n and m dimensional arrays.
In the following we shall be considering m = n so that a unique solution is possible, i.e.
the matrix A is not singular. In that case one can invert the matrix and write the solution
as

X = A^{-1} B   (30)
Generally, the matrix is nonsingular if no two rows ( or columns ) are proportional
to each other. Thus the solution of the simultaneous set of equations is equivalent to finding
the inverse of the matrix. Sometimes, one needs to compute the determinant of a matrix A.
It turns out that one determines the determinant while one is computing the inverse. I do
not want to go into the details of the methods used for inverting the matrix for two reasons.
One is that the theory of matrices is discussed in great detail in mathematical methods
courses and that need not be repeated here. Second is that, over the years, a number of good
subroutines for matrix methods have been developed. Every mathematical library ( NAG,
IMSL, CERN, numerical recipes etc ) has a set of routines specializing in various types
of matrices and it is advisable to use those. The reason is, although it is possible to write
subroutines for matrix methods using mathematical methods, the home-cooked subroutines
are generally not as efficient as the library ones ( this is not the case for interpolation,
integration or differential equation subroutines ). I will therefore talk about a general
method ( the Gauss elimination method ) very briefly and indicate some of the problems which
may arise in matrix inversion or simultaneous equations problems.
4.1 Gauss Elimination
For simplicity, let us consider a set of two linear equations.

a_{11} x_1 + a_{12} x_2 = b_1   (31)
a_{21} x_1 + a_{22} x_2 = b_2   (32)
One method of solving these equations is to eliminate x_1 in the second equation by
multiplying the first equation by a_{21}/a_{11} and subtracting it from the second equation. This
gives

a_{11} x_1 + a_{12} x_2 = b_1   (33)
\left( a_{22} - \frac{a_{21} a_{12}}{a_{11}} \right) x_2 = b_2 - \frac{a_{21} b_1}{a_{11}}   (34)
This is also a matrix equation in a triangular form, and we have already obtained the value
of

x_2 = \frac{b_2 - a_{21} b_1 / a_{11}}{a_{22} - a_{21} a_{12} / a_{11}} = \frac{a_{11} b_2 - a_{21} b_1}{a_{11} a_{22} - a_{21} a_{12}}.

x_1 is obtained by back-substitution of the value of x_2 in
the first equation. This procedure is called Gauss elimination and back substitution. This
procedure can be written as a multiplication of a series of matrices. One can easily extend
it to more than two simultaneous equations. Gauss elimination and back-substitution,
along with a related method, Gauss-Jordan reduction, are the workhorses of matrix methods.
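For concreteness, a bare-bones sketch of Gauss elimination with back substitution for an n x n system ( no pivoting, so it fails if a zero appears on the diagonal; as stressed below, real work should use library routines ):

subroutine gauss_solve(a, b, x, n)
  implicit none
  integer, intent(in) :: n
  double precision, intent(inout) :: a(n,n), b(n)
  double precision, intent(out) :: x(n)
  double precision :: m
  integer :: i, j, k
  do k = 1, n - 1                    ! eliminate column k below row k
     do i = k + 1, n
        m = a(i,k) / a(k,k)
        a(i,k:n) = a(i,k:n) - m * a(k,k:n)
        b(i) = b(i) - m * b(k)
     end do
  end do
  do i = n, 1, -1                    ! back substitution
     x(i) = b(i)
     do j = i + 1, n
        x(i) = x(i) - a(i,j) * x(j)
     end do
     x(i) = x(i) / a(i,i)
  end do
end subroutine gauss_solve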
There are subroutines developed for special matrices such as symmetric matrices, tridiagonal
matrices, complex matrices etc. You are advised to use library subroutines for these.
In the Institute we now have the NAG library available for linux machines which you can use.
You can also use the subroutines given in the numerical recipes book.
5 One Dimensional Differential Equations
Consider a first order differential equation

\frac{dy(x)}{dx} = f(x, y)   (35)

with a boundary condition y(x = x_0) = y_0. Here, f(x, y) is a known function. The simplest
way of solving this equation is to use the Taylor series expansion around x = x_0 and estimate
y(x = x_0 + dx) = y_0 + dx \frac{dy(x)}{dx} = y_0 + dx f(x = x_0, y_0). Here we have ignored the second
and higher derivatives so the approximation is meaningful if dx is small enough. Having
obtained the function at x = x_0 + dx, we can repeat the process and compute the function y at
x = x_0 + 2dx and so on. This method is called Euler's method. The error involved in
the computation is O(dx^2). This itself is not so bad if we can choose dx small enough. But
the error may accumulate at each step of dx and in the end we may have a large error. So, it
may be useful to have a method which is better than O(dx^2).
One can solve a set of simultaneous differential equations by this or a similar procedure.
For higher order equations, one can always rewrite them as a set of first order equations
and solve them. The generalization is straightforward so we will not spend any more time
on that.
In order to obtain higher order accuracy we need to know the higher order derivatives
of the function y(x) at x = x_0. There are several ways of doing it and depending on which
one we choose we get different formulae. Let us discuss those.
1. Runge-Kutta method: Let us consider the second order Runge-Kutta method. Here
what one does is compute the function y at x_0 + dx/2 and use that to get the higher
order derivatives. The algorithm, with step length h = dx, is summarised in the following
equations.

k_1 = h f(x_0, y_0)
k_2 = h f(x_0 + h/2, y_0 + k_1/2)
y_1 = y(x_0 + h) = y_0 + k_2 + O(h^3)   (36)

The equations above mean that one computes the derivative of y at x_0 + dx/2 by
using the estimate of the function at that point and uses that derivative to compute
the function at x_0 + dx. It should be clear that the method is correct up to order dx^2.
There is a fourth order Runge-Kutta algorithm given below:

k_1 = h f(x_0, y_0)
k_2 = h f(x_0 + h/2, y_0 + k_1/2)
k_3 = h f(x_0 + h/2, y_0 + k_2/2)
k_4 = h f(x_0 + h, y_0 + k_3)
y_1 = y(x_0 + h) = y_0 + \frac{1}{6}(k_1 + 2k_2 + 2k_3 + k_4) + O(h^5)   (37)

The fourth order Runge-Kutta method is robust, so it can be applied in a variety of
calculations. It is not very efficient. One test to check if the method is stable or not is to
check if the values k_2 and k_3 are close to each other. If they differ significantly, one
needs to reduce the step length dx.
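A sketch of a single fourth order Runge-Kutta step implementing Eq. (37) ( the names are my own; the driver that repeats the step and monitors k_2 against k_3 is left out ):

subroutine rk4step(f, x0, y0, h, y1)
  implicit none
  double precision, external :: f   ! f(x, y) = dy/dx
  double precision, intent(in) :: x0, y0, h
  double precision, intent(out) :: y1
  double precision :: k1, k2, k3, k4
  k1 = h * f(x0, y0)
  k2 = h * f(x0 + 0.5d0*h, y0 + 0.5d0*k1)
  k3 = h * f(x0 + 0.5d0*h, y0 + 0.5d0*k2)
  k4 = h * f(x0 + h, y0 + k3)
  y1 = y0 + (k1 + 2.0d0*k2 + 2.0d0*k3 + k4) / 6.0d0
end subroutine rk4step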
2. Predictor-corrector methods: These methods are essentially equivalent to an integration
of the function. Consider that the function y = y_0 is known at x = x_0. Then,

y(x) = y_0 + \int_{x_0}^{x} dx' f(x', y(x'))   (38)

The difference between the evaluation of an integral and the solution of this integral equation
is that in the former, the derivative ( f(x, y(x)) ) does not depend on y so one can
do the integral using one of the integration methods. For the solution of the differential
equation one needs to know the function y to do the integral. Suppose we somehow
know y in the interval ( at least at some points in the interval ). We can then use some
integration formula to compute y at x. The catch is, we need the function y at x to
evaluate the integral. The procedure adopted is to first predict the value y(x) and then
use it to evaluate the right hand side to correct it. To illustrate the point, consider
the lowest order case. Let us call x = x_0 + h. Using the Taylor series, we can write
y(x_0 + h) = y_0 + h f(x_0, y_0). The integral on the right hand side can then be evaluated
by using the trapezoidal rule, so y(x_0 + h)|_{new} = y_0 + \frac{h}{2} ( f(x_0, y_0) + f(x_0 + h, y(x_0 + h)) ).
Here the first computation of y(x_0 + h) is the prediction and the second is the correction.
Clearly, the method is O(h^2). For higher order predictor-corrector methods, one assumes
that the function is known at ( some ) n points ( including the first point x_0 )
and uses an open integration formula to predict the function y at x.
3. Extrapolation method: This method is similar to the extrapolation method used
for integrations. In fact we used the extrapolation method to derive Simpson's rule
starting with the trapezoidal rule. We can do the same thing here. The method is as
follows. Consider that we know the function y(x_0) = y_0 at x = x_0. Therefore we also
know f(x_0, y_0). We now want to evaluate the function at x = x_0 + H. We do the
calculation in one and in two steps. Then the two results we have are:

y_1(x_0 + H) = y_0 + H f(x_0, y_0)   (39)

and

y(x_0 + \frac{H}{2}) = y_0 + \frac{H}{2} f(x_0, y_0)
y_2(x_0 + H) = y(x_0 + \frac{H}{2}) + \frac{H}{2} f(x_0 + \frac{H}{2}, y(x_0 + \frac{H}{2}))   (40)

As in the case of integration, the error in the first order method is proportional to
even powers of \Delta x where \Delta x is the step length. So using these two results, we get an
extrapolated result for y(x_0 + H). The error in the result is O(H^4). If we want better
accuracy, we can compute the function using 4, 6, 8, ... steps and use the extrapolation
procedure to get the result for the number of steps going to infinity. Note that
y_n(x_0 + H) can be written as a polynomial in 1/n^2.
The methods we have described above apply to a first order differential equation in one
variable. If one has a higher order differential equation, one can convert it into a number
of first order coupled equations and these can then be solved using the methods described
above. There are usually two types of differential equation problems one has to solve. One
type is the initial value problems which can be solved using the algorithms discussed above.
The other set of problems are the boundary value problems where the boundary conditions
are specified at two points. A typical case of this type is the solution of the Schrodinger equation
for bound states, where the wavefunction has a typical behavior at the origin and at \infty. That
requires more work and we shall discuss it later.
Equations in more than one dimension are much more difficult to solve. The
reason is, in one dimension, the boundary condition is given at one point and one can integrate
the equation from there. In two or more dimensions, the boundary condition is to be given on
a surface having one dimension less than the problem at hand. One needs to develop the
methods discussed above further or use other methods for such problems.
6 Random Numbers And Monte Carlo Methods
A set of random numbers is a very useful tool for doing simulations on computers. With
the increase in computing power, simulation methods using random numbers are being
used extensively. In this section we shall consider the applications of random number
generators in numerical methods. We know that random numbers are generated in many
physical processes, such as
1. the number of counts per unit time in a radiation detector. The counts are due to a
process like cosmic rays which is a random process. Thus the counts measured in unit
time are distributed as a Gaussian with a mean and variance.
2. tossing of a coin gives a sequence of binary random numbers ( zeroes and ones ). One
can use these to construct a sequence of random integers. One can also do this by
throwing a die ( numbers from 1 to 6 ) or using a roulette wheel ( numbers between 0
and 20 or so ). These are uniformly distributed random numbers.
3. noise in an electronic circuit is random. One can use ( say ) the current thus generated to
get a sequence of random numbers.
One can think of many more such processes for obtaining a sequence of random numbers.
A method of using such a random sequence is to keep the random process going and use
the numbers coming out of the process in the calculations. The problem with these physical
methods is that the processes are too slow for use in computers. That is, the rate at which
the computer will consume the sequence of numbers to do calculations is much, much
faster than the rate at which these numbers are being generated. An alternative method
is to store the series of random numbers generated by a physical process as data and use
those in the computation. This is also not feasible since it would require large storage space. In
a typical modern calculation one needs billions of numbers and the space required to store
these numbers is prohibitively large. In order to overcome these difficulties, random number
generators which generate a series of random numbers have been developed.
The term random number generating program is a contradiction because any computation
in a computer is deterministic and not random. However, it is possible to have
algorithms which will produce a series of numbers with successive numbers having no apparent
correlations among themselves. Thus, the sequence generated by the random number
generator would pass the tests which any series of random numbers is expected to pass. The
subject of generation and testing of random numbers is a vast subject by itself and we will
not be going into that. Our aim here is to use such a generator to produce a random number
sequence having certain distribution properties and use that sequence in calculations. So
we will assume that there is a random number generator. Usually, every operating system
provides such a built-in random number generator which can be readily used in computations.
Usually it is a function called ran or rand. Please look into a local fortran manual
to find the name on the system of your choice. In linux-fortran the generator is rand. It
needs to be initialized by calling a subroutine srand. So the initialization is done by calling
a subroutine by a statement
call srand(iseed)
where iseed is an integer used to kick start the generator. To obtain a random number
you need to call a function rand by a statement ( say )
x=rand()
One needs to provide different values of iseed in each calculation so that one gets different
sets of random sequences in each calculation.
Generally, one needs to ensure that the sequence of random numbers is really random
and it passes tests of randomness. One necessary test is that there is no ( or minimal )
correlation between successive numbers in the series. One also needs to worry about possible
periods of the generator, particularly if one is going to use really large sequences.
Commonly used random numbers are the set of numbers distributed uniformly between
zero and one ( the generator mentioned above is one such generator ). That is, the probability
of having a number in any interval dx between zero and one is proportional to dx itself.
Uniform random numbers in any other interval can be obtained from these by a simple scaling
transformation. Other sets of random numbers often used are those having a
normal distribution. That is, the probability of having a number in an interval between x and x + dx
is proportional to e^{-(x-a)^2/2\sigma^2}. This set has mean a and variance \sigma^2. One can generate a
normal sequence of random numbers from a uniform sequence by a suitable transformation.
One can also generate different types of random number sequences from the uniform sequence
by appropriate transformations.
Problem 1: Derive a transformation to convert a sequence of uniform random
numbers between zero and one to a sequence of normal random numbers with
mean zero and variance unity.
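One standard answer to Problem 1 is the Box-Muller transformation, sketched below ( the notes only ask for "a transformation"; naming this one and using the portable random_number intrinsic in place of rand() are my choices ). Two uniform deviates u_1, u_2 give two independent normal deviates:

subroutine normal_pair(g1, g2)
  implicit none
  double precision, intent(out) :: g1, g2
  double precision, parameter :: pi = 3.141592653589793d0
  double precision :: u1, u2, r
  call random_number(u1)
  do while (u1 == 0.0d0)          ! avoid log(0)
     call random_number(u1)
  end do
  call random_number(u2)
  r = sqrt(-2.0d0 * log(u1))      ! radius from an exponential deviate
  g1 = r * cos(2.0d0 * pi * u2)   ! mean zero, variance unity
  g2 = r * sin(2.0d0 * pi * u2)
end subroutine normal_pair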
We shall now come to the applications of random number sequences in different types
of computation.
6.1 Computer Simulations
One place where random numbers are extensively used is the numerical simulation of a
physical process. The basic idea here is to use the computer to produce mock data in a
fashion similar to an experiment. The reasons for doing this are many. For example, doing
an experiment may be difficult or not possible. Or, you may want to study how a system
evolves in time and it may not be possible experimentally. Below I shall list some of the
places where simulations are done.
1. The most well-known example is the quantum chromodynamics calculations on a lattice.
As is known, there are difficulties in doing QCD calculations because the coupling
constant of the theory is large and perturbation theory is not applicable. On the
other hand, it has been shown that these difficulties are overcome if one discretizes
the space-time and attacks the QCD problems on this space.
2. Collisions of two nuclei involve collisions of nucleons from target and projectile. Even
if the dynamics of collisions between two nucleons is known ( which actually is not ),
one is not able to compute the collision process of two nuclei using the usual methods.
One has to do a computer simulation of such a process and obtain the cross sections
etc for such collisions.
3. Molecular dynamics is also similar to the collisions between two nuclei. Here also the
interaction between two atoms is understood but the calculation for a system of atoms
is complex and random number methods come in handy.
4. Simulations are extensively used in equilibrium and non-equilibrium statistical mechanics.
A simple process ( which can be done in a course like ours ) is the random
walk problem.
The examples mentioned above do not complete the list but just illustrate that the
applications of random numbers are many and are growing.
As a concrete example let us apply the random numbers to the simulation of radioactive
decay on a computer. The theory of radioactive decay states that the probability that
a nucleus ( or generally any excited state ) will decay by some physical process ( say by
emitting an electron and neutrino or a photon or an alpha particle etc ) in a unit time is a
constant \lambda. Thus, the change in the number of nuclei in unit time is proportional to \lambda and the
number of nuclei ( \frac{dN}{dt} = -\lambda N ) and we know that the solution of this differential equation
is N(t) = N_0 e^{-\lambda t}, where N_0 is the number of nuclei at time t = 0. So, the number of nuclei
decaying in any time interval dt is \lambda N(t) dt and often an experimentalist measures this
number. But, note that, actually this is the mean number of nuclei decaying in time dt and
the actual number will be fluctuating about this value. We want to simulate this process on
a computer.
Problem 2: Using the random number generator, simulate the law of radioactivity.
Start with a large number of nuclei ( say 10^7 ). Choose a time interval dt such that
\lambda dt is not too large ( say 0.01 ). This means the probability that a nucleus will
decay in this time interval is \lambda dt. Pick a random number and decide that the
nucleus decays if the number is smaller than \lambda dt. Doing this for all the nuclei
you will know how many of them decay and how many survive after the interval dt.
Repeat the process for the surviving nuclei in the next interval dt and so on.
This will generate a table of time ( i dt ) and the number of surviving nuclei.
Plot this and verify that you get an exponential from this synthetic data. We
shall use this data later for data fitting.
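A sketch for Problem 2 ( \lambda dt = 0.01 as suggested; starting with 10^6 nuclei instead of 10^7 to keep the run short, and using the random_number intrinsic in place of rand(), are my choices ):

program decay
  implicit none
  integer :: n, i, step, nsurv
  double precision :: pdecay, u
  n = 1000000                       ! initial number of nuclei
  pdecay = 0.01d0                   ! lambda*dt, decay probability per step
  do step = 1, 500
     nsurv = 0
     do i = 1, n                    ! test every surviving nucleus
        call random_number(u)
        if (u >= pdecay) nsurv = nsurv + 1
     end do
     n = nsurv
     write(*,'(i6,i10)') step, n    ! table of time step vs survivors
  end do
end program decay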
Problem 3: Another physical problem which can easily be simulated is the
problem of random walk. Consider a one-dimensional random walker who travels
a distance \Delta x = 2 ( rand() - 0.5 ) before taking the next step, which is again decided
by the random number. Note that rand() generates a random number between
0 and 1 having uniform distribution. Determine the distance traveled by the
walker after ( say ) 1000 steps. Do this for a large number of starts and comment
on the distribution of the final distance traversed.
One can think of a number of physical processes which can be simulated using random
numbers. Typically, any physical process which has a probability distribution can be
simulated using random numbers.
6.2 Monte Carlo Integrations
Another set of problems where random numbers are used are problems of multi-dimensional
integrations. Consider the problem of finding the volume of an n-dimensional hyper
sphere of radius 1. A one dimensional hyper sphere is a line of length 2, so the volume is 2 (
units ). A two dimensional hyper sphere is a circle with the volume being \pi. The three
dimensional one has volume 4\pi/3. In general one can compute the volume analytically by defining
polar coordinates in n dimensions and doing the angle integrals analytically. ( Derive this
formula ). The method of doing the integration by using a random number generator ( the
method is called the Monte Carlo method ) is as follows. The integral is defined as

\int_a^b f(x) dx = \frac{b - a}{N} \sum_{i=1,N} f(x_i)   (41)
where one has picked N uniformly distributed random numbers {x_i} between a and b.
The formula is written for a one dimensional integral. For an n-dimensional integral, the formula
is

\int_{volume} f(x_1, x_2, \cdots) dx_1 dx_2 \cdots = \frac{V}{N} \sum_{i=1,N} f(x^i_1, x^i_2, \cdots)   (42)

where (x^i_1, x^i_2, \cdots) is a set of n random numbers defining a coordinate in n-dimensional
space lying within the volume of integration, and V is the volume of the region from which
the points are sampled. Generally, the volume of integration
may have a crooked shape and then one can define a volume having a simple shape and
covering the volume of integration. The points (x^i_1, x^i_2, \cdots) are then chosen from the simple
volume and the summation on the right includes only those points which fall in the volume
of integration.
Problem 4: Find the volume of the n-dimensional sphere of radius unity by doing
Monte Carlo integration. It is best to choose the {x_i} between 0 and 1. Then
the volume is 2^n times the Monte Carlo integration. Further, the function
f(x^i_1, x^i_2, \cdots) is unity if \sqrt{\sum_j (x^i_j)^2} < 1 and zero otherwise. Do the calculation
for various values of n. You can turn the problem around and determine the
value of \pi from the value of the integration. Check the accuracy as a function
of the number of points used for integration.
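A sketch for Problem 4 ( the dimension n = 4 and the number of points are my own choices; the random_number intrinsic is used in place of rand() ):

program nsphere
  implicit none
  integer, parameter :: ndim = 4
  integer :: npts, i, j, nin
  double precision :: x(ndim), r2
  npts = 1000000
  nin = 0
  do i = 1, npts
     call random_number(x)          ! ndim uniform deviates in [0,1)
     r2 = 0.0d0
     do j = 1, ndim
        r2 = r2 + x(j)**2
     end do
     if (r2 < 1.0d0) nin = nin + 1  ! point falls inside the sphere
  end do
  write(*,'(a,f10.6)') ' volume = ', 2.0d0**ndim * dble(nin) / dble(npts)
end program nsphere

For n = 4 the exact volume is \pi^2/2 \approx 4.9348, so the estimate also yields a value of \pi as the problem suggests.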
7 Data Fitting
In experimental measurements one usually measures a quantity or quantities as a function
of one or several parameters. For example, one measures the differential and/or total cross
section for scattering of two particles as a function of the initial relative energy between the two
particles, their spin orientations etc. Or one measures the current in a circuit as a function of the
voltage difference and so on. This yields a function ( or functions ) of one or more variables.
Thus the experimental measurement gives a discrete representation of the function ( total
cross section measured at discrete energy values or differential cross section measured at
discrete angles at a given energy etc ). Usually, the experimental measurement is associated
with an error in the measurement in the value of the function ( cross section ) and/or the
independent variable ( energy or angle ). A more detailed analysis tells us that the errors are
of several types. One type is what are called statistical errors. These arise simply because
the measurement process is sometimes statistical in nature. For example, the number
of particles decaying in a second is statistically uncertain because the decay process is
random. The other error is what is called the systematic error. The systematic error
arises because of the inherent difficulties of the measuring instruments themselves. For
example, measuring instruments cannot measure things infinitely accurately; a ruler, say,
can measure lengths accurately up to 1 mm only. There is also what is called the
zero error of the instrument. So, when a measurement is repeated it gives different results,
up to the accuracy of the instrument. The experimentalist usually adds up these errors and
gives the result with error bars.
The data fitting procedure aims to obtain a reasonably good representation of the data
in terms of a function of one or more variables. Actually, the required representation has to
have some theoretical basis ( or prejudice ) behind it. Otherwise, there are a large number
( strictly infinite ) of functions which may be used to represent the data. The additional
problem is, since the experiment is done at a finite number of values of the independent
variable, one does not have a unique function to describe the data. The problem is further
enhanced by the errors in the measurements. One therefore has to choose a function having
a certain number of parameters to describe the data. For example, one may choose a linear
variation of the observed quantity as a function of the independent variable ( the parameters here
are the slope and intercept, or the coefficients of the constant and the term linear in the independent
variable ). Or, from theoretical considerations, if we know that there is a resonance then one may
choose a Lorentzian shape \frac{C}{(E - E_0)^2 + \Gamma^2} to fit the experimental data. In any case, we will
need to choose some function with a number of parameters to represent the data. We then
have to adjust the parameters of the function to obtain the best fit to the data.
For some simple cases, we can know whether a fit is best or not by looking at it. But
that is not always possible and we need to have some quantitative definition of the term
best fit. Qualitatively, one can draw a curve representing the fitting function to satisfy
oneself that one has obtained a best fit. It is easiest for a linear function since one can use
a scale and draw a line so that the data points are closest to the line, with roughly half
lying below the line and the rest above the line and their distance from the line as small as
possible. Further, the term closest should account for the errors present in the data.
That is, one should allow the data points having large error to be farther from the fitting
curve in comparison with the points having small error. Obviously, the simple method
described above, although very effective, is not useful when the fitting function is more
complicated than a linear function. Thus, we are looking for a method of determining the
set of parameters of the fitting function which gives a best fit to the data. Actually, this
involves three things. These are:
1. The values of the parameters giving the best fit.
2. An estimate of an error associated with these parameters.
3. Finally we also need to have some statistical measure to decide on the goodness of
the fit.
Many times, people stop after doing the first or second thing but the third thing is also very
important. It tells us whether the fit obtained is meaningful at all or not. Without the
third, having the values of the parameters and their error estimates may not be of much
value.
One of the standard procedures used in data fitting is the method of least squares. This
method basically minimizes the square of the difference between the data and the fitting
function. We shall describe that below.
7.1 Least Squares ( \chi^2 ) Fitting
Consider the data given by experiment in the form of two arrays, say x_i and y_i, with
the x being the independent variable and the y being the result of the measurement. We can
generalize it to more than one independent variable. We want to fit the data with a function
of the form Y(x; \{\alpha_j\}, j = 1 ... N). Here the function has N parameters \alpha_j which need
to be adjusted to get the best fit. Obviously, the number of data points, M, should be
larger than N to have a meaningful fit. For M smaller than N, one may have more than one
possible set of values of the parameters which will fit the data.
One may decide to minimize the square of the difference between the data and the fitted
function, or the quantity

\sum_{i=1,M} \left( y_i - Y(x_i; \alpha_j) \right)^2   (43)
This is almost like least squares minimization. But the problem with this minimization
is that it gives larger weightage to the data which has a large value of y_i. This need not always
be a good thing to do. What one would want is to have the best representation of the data
which is more accurately determined. As mentioned earlier, the experimental determination of
the y_i is usually associated with an error \Delta y_i. In order to take this into account, one defines a
quantity called \chi^2 as

\chi^2 = \sum_{i=1,M} \left( \frac{y_i - Y(x_i; \alpha_j)}{\Delta y_i} \right)^2   (44)
and minimizes it. This procedure is called least squares or \chi^2 minimization. Since one is
dividing by \Delta y_i^2, the data having large error is deemphasized. Another advantage of this
procedure is that the quantity \chi^2 is dimensionless.
The qualitative meaning of this expression is as follows: obviously, the quantity \chi^2
is smallest when the experimental quantities y_i are closest to Y(x_i; \alpha_j) on the average.
Further, the division by \Delta y_i^2 ensures that when the error \Delta y_i is large, that point is given
smaller weight during minimization.
The deeper meaning of the \chi^2 fitting is as given below. It assumes that there is a theory
behind the experiment which is given by the function Y(x; \alpha_j) and the experimental data
corresponds to this theory. Further, if we assume that the error in the measurement
of each point y_i follows a normal ( Gaussian ) distribution, then the probability that
the data represents the theory is given by

P \propto e^{-\frac{1}{2} \sum_i \left( \frac{y_i - Y(x_i; \alpha_j)}{\Delta y_i} \right)^2}   (45)
Maximization of the probability is equivalent to the minimization of the \chi^2 defined above. One
must realize that this discussion assumes that the errors in each of the y_i are independent of
each other and there is no correlation between them. Further, they also follow a normal (
Gaussian ) distribution. Actually, this need not be true and in that case \chi^2-fitting need not
be the correct thing to do. If the errors are correlated, the formula needs to be modified as
the number of independent data is smaller. The assumption that the errors follow a normal
distribution is not so bad because of the central limit theorem. In fact the statistical error
in the measurement ( \Delta y_i ) does have a normal distribution. The systematic error need
not follow a normal distribution. Experimentalists often add the two and give a single error
estimate and one often treats this sum also as distributed normally.
The \chi^2 minimization gives the best fit values of the parameters \{\alpha_i\}. So, this is only
the first part of the job. One still has to estimate the errors in those values and also have
some statistical measure of the goodness of the fit. For the situation where the function
Y(x, \alpha_i) depends linearly on the parameters \alpha_i, the statistical distribution of \chi^2 is given by the
so called chi-squared distribution for ( M - N ) degrees of freedom. That is, one can compute
the probability that the chi-square should exceed the particular value of \chi^2 obtained in the
fit; this probability is a measure of the goodness of the fit.
The χ²-minimization procedure for polynomial fits is somewhat simpler because it re-
duces to the solution of a set of linear equations. Let us consider a linear fit to the data.
The fitting function is then Y(x) = a + bx and we want to determine the best values of a and b.
Minimization of χ² with respect to the parameters a and b is done by demanding
    ∂/∂a Σ_{i=1,M} [ ( y_i − a − b x_i ) / Δy_i ]² = 0                (46)
and

    ∂/∂b Σ_{i=1,M} [ ( y_i − a − b x_i ) / Δy_i ]² = 0                (47)
A simple calculation shows that these equations reduce to two linear algebraic equations,
    a Σ_{i=1,M} 1/(Δy_i)² + b Σ_{i=1,M} x_i/(Δy_i)² = Σ_{i=1,M} y_i/(Δy_i)²          (48)
and

    a Σ_{i=1,M} x_i/(Δy_i)² + b Σ_{i=1,M} x_i²/(Δy_i)² = Σ_{i=1,M} y_i x_i/(Δy_i)²   (49)
Solution of these equations is a trivial job and one can obtain the best fit values of the
coefficients a and b. Defining S = Σ 1/(Δy_i)², S_x = Σ x_i/(Δy_i)², S_xx = Σ x_i²/(Δy_i)²,
S_y = Σ y_i/(Δy_i)² and S_xy = Σ x_i y_i/(Δy_i)², we get
    a = ( S_xx S_y − S_x S_xy ) / ( S S_xx − S_x² )
    b = ( S S_xy − S_x S_y ) / ( S S_xx − S_x² )                      (50)
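A sketch of this closed-form solution follows (Python with NumPy assumed; the variable names mirror the sums S, S_x, S_xx, S_y and S_xy defined above):

    import numpy as np

    def linear_chi2_fit(x, y, dy):
        # Best-fit a, b for Y(x) = a + b*x from eqs. (48)-(50).
        w = 1.0 / dy**2              # weights 1/(Delta y_i)^2
        S, Sx, Sxx = np.sum(w), np.sum(x * w), np.sum(x * x * w)
        Sy, Sxy = np.sum(y * w), np.sum(x * y * w)
        D = S * Sxx - Sx**2          # common denominator of eq. (50)
        a = (Sxx * Sy - Sx * Sxy) / D
        b = (S * Sxy - Sx * Sy) / D
        return a, b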
The generalization to a polynomial fit is straightforward. Consider fitting a polynomial
of degree n. In this case one has to solve n + 1 linear equations to determine the
coefficients of the polynomial. For solving these one can use standard matrix methods, as
in the sketch below.
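One possible sketch of the polynomial case sets up the weighted design matrix and solves the resulting normal equations with NumPy's linear algebra routines (our construction, offered only as an illustration of the standard matrix methods mentioned above):

    import numpy as np

    def poly_chi2_fit(x, y, dy, n):
        # Chi-squared fit of a degree-n polynomial.
        # Column k of A holds x_i**k / Delta y_i (weighted monomials).
        A = np.vander(x, n + 1, increasing=True) / dy[:, None]
        rhs = y / dy
        # The normal equations (A^T A) c = A^T rhs yield the n+1 coefficients.
        return np.linalg.solve(A.T @ A, A.T @ rhs)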
7.2 Error Estimates
In the previous subsections we have seen how the values of the parameters in the
fitting function can be computed, with the specific example of a linear fit. Now we shall
look into the errors in the values of the parameters determined by minimizing
χ². These errors arise from the fact that the experimental data themselves have errors
associated with them, which give rise to errors in the parameter values. For a normal
(Gaussian) distribution of errors, the errors in the parameters propagate in quadrature.
That is, the error in the parameter α_i due to the errors Δy_j in the measured values y_j is
    σ²(α_i) = Σ_j (Δy_j)² ( ∂α_i / ∂y_j )²                            (51)
For the linear t, the parameters a and b are given explicitly in the previous section.
Using those expressions, we can compute the errors in a and b explicitly.
    σ²(a) = Σ_i (Δy_i)² [ ∂/∂y_i ( S_xx S_y − S_x S_xy ) / ( S S_xx − S_x² ) ]²
          = [ 1 / ( S S_xx − S_x² ) ]² [ (S_xx)² Σ_j 1/(Δy_j)² − S_xx S_x Σ_j x_j/(Δy_j)² ]
          = S_xx / ( S S_xx − S_x² )                                  (52)
and

    σ²(b) = Σ_i (Δy_i)² [ ∂/∂y_i ( S S_xy − S_x S_y ) / ( S S_xx − S_x² ) ]²
          = [ 1 / ( S S_xx − S_x² ) ]² [ S² Σ_j x_j²/(Δy_j)² − (S_x)² Σ_j 1/(Δy_j)² ]
          = S / ( S S_xx − S_x² )                                     (53)
One can use the same methodology to compute the errors in the coefficients when fitting
polynomial functions. The computation for other functional forms is more involved.
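Continuing the linear-fit sketch above, eqs. (52) and (53) translate directly into code (again a sketch, NumPy assumed):

    import numpy as np

    def linear_fit_errors(x, dy):
        # Parameter errors for the linear fit, eqs. (52) and (53).
        w = 1.0 / dy**2
        S, Sx, Sxx = np.sum(w), np.sum(x * w), np.sum(x * x * w)
        D = S * Sxx - Sx**2
        sigma_a = np.sqrt(Sxx / D)   # eq. (52)
        sigma_b = np.sqrt(S / D)     # eq. (53)
        return sigma_a, sigma_b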
One can also estimate the errors in the coefficients by looking at the values of χ² in
the parameter space. If the minimum in χ² is very deep, the error estimates of the
parameters are small. For shallow minima, the errors in the parameters are large.
It is possible that the parameters of the fitting function are not really independent
but are correlated. This correlation will be reflected in the error estimates as well. Such
correlations will show up in plots of χ² as a function of two (or more) parameters of
the fitting function, as in the sketch below.
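A sketch of such an inspection for the two-parameter linear fit (NumPy assumed; the grids of trial values of a and b are to be supplied by the user):

    import numpy as np

    def chi2_surface(x, y, dy, a_values, b_values):
        # Map chi-squared over a grid of (a, b).  Elongated, tilted contours
        # around the minimum indicate correlated parameters.
        surface = np.empty((len(a_values), len(b_values)))
        for i, a in enumerate(a_values):
            for j, b in enumerate(b_values):
                surface[i, j] = np.sum(((y - a - b * x) / dy) ** 2)
        return surface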
7.3 Goodness of Fit
The procedure described in the previous subsections gives the values of the parameters and
the possible errors in the determination of these parameters. But one still wants to know how
reliable these numbers are. For this one needs to have some understanding of the nature
of the data. Here we shall assume that the errors in the data are normal, or Gaussian, in
nature. We have seen that in such a case the probability that the determined parameters
represent the data (or the real theory) is given by the χ² distribution with M − N degrees of
freedom, where M is the number of data points and N is the number of parameters in the
fitting function. This distribution is
    P(χ², ν) = [ 1 / ( 2 Γ(ν/2) ) ] ( χ²/2 )^{ν/2 − 1} e^{−χ²/2}      (54)
where χ² is the computed value of chi-squared (as defined in eq. (44)), ν = M − N is the number of
degrees of freedom and Γ(ν/2) is the usual Gamma function. For a large number of degrees
of freedom, the probability peaks at χ² = ν = M − N. That is, the expected value of χ² per
degree of freedom is close to unity. Also, the peak looks like a Gaussian peak with width
very close to √(2ν). The χ² distributions for different degrees of freedom are shown in the
accompanying figure.
Figure 3: χ² distributions for ν = 1, 2, 5, 10, 50 and 100.
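Equation (54) can also be evaluated directly with the standard library alone; a sketch (the factor 1/2 in front is part of the normalization of the density):

    import math

    def chi2_pdf(chisq, nu):
        # Probability density of eq. (54) for nu degrees of freedom.
        return (0.5 / math.gamma(nu / 2.0)
                * (chisq / 2.0) ** (nu / 2.0 - 1.0)
                * math.exp(-chisq / 2.0))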
Now, if the conditions for the applicability of the χ² distribution to the data are valid,
we can comment on the goodness of the fit. Knowing the number of degrees of freedom
in the data, we can say something about the expected value of χ² after the fit. We expect
that χ² per degree of freedom should be close to 1, with the deviation of χ² from ν
of the order of √(2ν), where ν is the number of degrees of freedom. This statement
can be made more precise by considering the properties of the χ² distribution; for large
ν, the χ² distribution is very close to the normal distribution with ⟨χ²⟩ = ν. If the computed
value of the minimum of χ² satisfies these conditions, then the fit is good and the physics
depicted by the fitting equation represents the data well.
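In practice one often quotes the probability that χ² would exceed the fitted minimum by chance; a sketch assuming SciPy is available (chi2.sf is the survival function, one minus the cumulative distribution of eq. (54)):

    from scipy.stats import chi2

    def goodness_of_fit(chisq_min, M, N):
        # Probability of chi-squared exceeding chisq_min for nu = M - N
        # degrees of freedom.  Values very close to 0 suggest a poor model;
        # values very close to 1 suggest overestimated errors.
        return chi2.sf(chisq_min, M - N)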
A large deviation from these conditions implies that either the equations do not represent
the data very well or the conditions for the applicability of the χ² analysis are not valid. That
is, either the assumption of errors following normal distributions is not valid, or the number
of degrees of freedom is wrongly chosen, or the errors at each data point are wrongly
estimated, or the theory (i.e. the equation used for fitting) is wrong. In particular, it
must be emphasized that getting a very small value of χ² also indicates some problem. The
fitting curve is not expected to go through the data points, giving a small (or
null) value of χ²: there are errors in the measurement and these should be
reflected in the fit. A very small value of χ² will result if the error estimates are on the larger
side (actual errors in the experiment are smaller than the quoted values) or the data have
been fudged.
7.4 General Discussion
In this section we have discussed the general methodology adopted for fitting data
with a function which has some physical basis and some parameters which are to be
adjusted to obtain the best fit. While doing this, we have assumed that the errors in the
experiment are normal and that the data are not correlated. In that case, we have seen that
one can obtain the best fit by minimizing a quantity called χ². We have stated that this
minimization is equivalent to obtaining the largest probability of getting the parameters of
the fitting function. We have also seen how one can compute the errors in the parameters,
which essentially follow from the errors in the experimental measurements themselves. We
have also seen how we can determine the goodness of the fit in terms of the χ² distribution
function, which depends on the number of degrees of freedom in the data.