Hjorth-Jensen Notes2008 03

Chapter 3
Numerical differentiation
3.1 Introduction
Numerical integration and differentiation are some of the most frequently needed methods in
computational physics. Quite often we are confronted with the need of evaluating either $f'$ or an
integral $\int f(x)\,dx$. The aim of this chapter is to introduce some of these methods with a critical
eye on numerical accuracy, following the discussion in the previous chapter.
The next section deals essentially with topics from numerical differentiation. There we present also
the most commonly used formulae for computing first and second derivatives, formulae which in turn find
their most important applications in the numerical solution of ordinary and partial differential equations.
This section also serves to introduce some more advanced C++ programming concepts, such as call by
reference and call by value, reading and writing to a file, and dynamic memory allocation.
3.2 Numerical differentiation
The mathematical definition of the derivative of a function f (x) is
\[
\frac{df(x)}{dx} = \lim_{h\to 0}\frac{f(x+h)-f(x)}{h},
\]
where h is the step size. If we use a Taylor expansion for f (x) we can write
\[
f(x+h) = f(x) + hf'(x) + \frac{h^2 f''(x)}{2} + \dots
\]
We can then set the computed derivative $f_c'(x)$ as
\[
f_c'(x) \approx \frac{f(x+h)-f(x)}{h} \approx f'(x) + \frac{hf''(x)}{2} + \dots
\]
Assume now that we will employ two points to represent the function f by way of a straight line between
x and x + h. Fig. 3.1 illustrates this subdivision.
This means that we can represent the derivative with
\[
f_2'(x) = \frac{f(x+h)-f(x)}{h} + O(h),
\]
where the suffix 2 refers to the fact that we are using two points to define the derivative and the dominating
error goes like O(h). This is the forward derivative formula. Alternatively, we could use the backward
derivative formula
\[
f_2'(x) = \frac{f(x)-f(x-h)}{h} + O(h).
\]
If the second derivative is close to zero, this simple two-point formula can be used to approximate the
derivative. If we however have a function like $f(x) = a + bx^2$, we see that the approximated derivative
becomes
\[
f_2'(x) = 2bx + bh,
\]
while the exact answer is $2bx$. The error $bh$ is negligible only if $h$ is very small and $b$ is not too
large. In principle we could approach the exact answer by choosing smaller and smaller values for $h$.
However, in this case, the subtraction in the numerator, $f(x+h) - f(x)$, can give rise to roundoff errors
and eventually a loss of precision.
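The error term $bh$ can be checked directly. The following is a minimal sketch; the function names forward_difference and quadratic and the values a = 1, b = 3 are illustrative assumptions, not part of the original notes.

```cpp
#include <cassert>
#include <cmath>

// Two-point forward difference: (f(x+h) - f(x)) / h.
double forward_difference(double (*f)(double), double x, double h)
{
  assert(h > 0.0);                  // the formula needs a positive step
  return (f(x + h) - f(x)) / h;
}

// Test function f(x) = a + b x^2 with the arbitrary choice a = 1, b = 3.
double quadratic(double x)
{
  return 1.0 + 3.0 * x * x;
}
```

With x = 2 and h = 0.5 the call forward_difference(quadratic, 2.0, 0.5) returns 13.5, that is the exact derivative 2bx = 12 plus the error bh = 1.5.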
A better approach in the case of a quadratic expression for $f(x)$ is to use a three-point formula, where
we evaluate the derivative on both sides of a chosen point $x_0$ using the above forward and backward
two-point formulae and take the average afterward. We perform again a Taylor expansion around $x_0$,
but now evaluated at $x_0 \pm h$, namely
\[
f(x = x_0 \pm h) = f(x_0) \pm hf' + \frac{h^2 f''}{2} \pm \frac{h^3 f'''}{6} + O(h^4), \tag{3.1}
\]
which we rewrite as
\[
f_{\pm h} = f_0 \pm hf' + \frac{h^2 f''}{2} \pm \frac{h^3 f'''}{6} + O(h^4).
\]
Calculating both $f_{\pm h}$ and subtracting we obtain that
\[
f_3' = \frac{f_h - f_{-h}}{2h} - \frac{h^2 f'''}{6} + O(h^3),
\]
and we see now that the dominating error goes like $h^2$ if we truncate at the second derivative. We call
the term $h^2 f'''/6$ the truncation error. It is the error that arises because at some stage in the
derivation a Taylor series has been truncated. As we will see below, truncation errors and roundoff errors
play an important role in the numerical determination of derivatives.
For our expression with a quadratic function $f(x) = a + bx^2$ we see that the three-point formula
$f_3'$ for the derivative gives the exact answer $2bx$. Thus, if our function has a quadratic behavior in
$x$ in a certain region of space, the three-point formula will result in reliable first derivatives in the
interval $[-h, h]$. Using the relation
\[
f_h - 2f_0 + f_{-h} = h^2 f'' + O(h^4),
\]
we can define the second derivative as
\[
f'' = \frac{f_h - 2f_0 + f_{-h}}{h^2} + O(h^2).
\]
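The two three-point formulae can be sketched in a few lines of C++. The names central_difference, second_difference and quadratic, and the values a = 1, b = 3, are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cmath>

// Three-point first derivative: (f(x+h) - f(x-h)) / (2h).
double central_difference(double (*f)(double), double x, double h)
{
  assert(h > 0.0);
  return (f(x + h) - f(x - h)) / (2.0 * h);
}

// Three-point second derivative: (f(x+h) - 2 f(x) + f(x-h)) / h^2.
double second_difference(double (*f)(double), double x, double h)
{
  assert(h > 0.0);
  return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h);
}

// Test function f(x) = a + b x^2 with the arbitrary choice a = 1, b = 3.
double quadratic(double x)
{
  return 1.0 + 3.0 * x * x;
}
```

For this quadratic both formulae are exact apart from roundoff: at x = 2 with h = 0.5 the first derivative evaluates to 2bx = 12 and the second derivative to 2b = 6.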
Figure 3.1: Demonstration of the subdivision of the x-axis into small steps h (horizontal axis x with the
points x0 − 2h, x0 − h, x0, x0 + h and x0 + 2h; vertical axis f(x)). Each point corresponds to a set of
values x, f(x). The value of x is incremented by the step length h. If we use the points x0 and x0 + h
we can draw a straight line and use the slope at this point to determine an approximation to the first
derivative. See text for further discussion.

We could also define five-point formulae by expanding to two steps on each side of $x_0$. Using a
Taylor expansion around $x_0$ in a region $[-2h, 2h]$ we have
\[
f_{\pm 2h} = f_0 \pm 2hf' + 2h^2 f'' \pm \frac{4h^3 f'''}{3} + O(h^4). \tag{3.2}
\]
Using Eqs. (3.1) and (3.2), multiplying $f_h$ and $f_{-h}$ by a factor of 8 and subtracting,
$(8f_h - f_{2h}) - (8f_{-h} - f_{-2h})$, we arrive at a first derivative given by
\[
f_{5c}' = \frac{f_{-2h} - 8f_{-h} + 8f_h - f_{2h}}{12h} + O(h^4),
\]
with a dominating error of the order of $h^4$ at the price of only two additional function evaluations. This
formula can be useful in case our function is represented by a fourth-order polynomial in $x$ in the region
$[-2h, 2h]$. Note however that these two additional function evaluations imply a more time-consuming
algorithm. Furthermore, the two additional subtractions can lead to a larger risk of loss of numerical
precision when $h$ becomes small. When solving for example a differential equation which involves the
first derivative, one always needs to strike a balance between numerical accuracy and the time needed
to achieve a given result.
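To get a feeling for the gain in accuracy, the five-point formula can be compared with the three-point formula. This is only a sketch; the function names and the two test functions (a quartic and the exponential) are illustrative choices.

```cpp
#include <cassert>
#include <cmath>

// Three-point first derivative, error of order h^2.
double central_difference(double (*f)(double), double x, double h)
{
  assert(h > 0.0);
  return (f(x + h) - f(x - h)) / (2.0 * h);
}

// Five-point first derivative, error of order h^4:
// (f(x-2h) - 8 f(x-h) + 8 f(x+h) - f(x+2h)) / (12 h).
double five_point_difference(double (*f)(double), double x, double h)
{
  assert(h > 0.0);
  return (f(x - 2.0 * h) - 8.0 * f(x - h) + 8.0 * f(x + h) - f(x + 2.0 * h))
         / (12.0 * h);
}

// Test functions: a fourth-order polynomial and the exponential.
double quartic(double x) { return x * x * x * x; }
double exponential(double x) { return std::exp(x); }
```

For the quartic the five-point formula reproduces the derivative 4x^3 up to roundoff, while for exp(x) at x = 1 and h = 0.1 its error is expected to be roughly two to three orders of magnitude smaller than that of the three-point formula.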
It is possible to show that the widely used formulae for the first and second derivatives of a function
can be written as
\[
\frac{f_h - f_{-h}}{2h} = f_0' + \sum_{j=1}^{\infty} \frac{f_0^{(2j+1)}}{(2j+1)!} h^{2j}, \tag{3.3}
\]
and
\[
\frac{f_h - 2f_0 + f_{-h}}{h^2} = f_0'' + 2\sum_{j=1}^{\infty} \frac{f_0^{(2j+2)}}{(2j+2)!} h^{2j}, \tag{3.4}
\]
and we note that in both cases the error goes like $O(h^{2j})$. These expressions will also be used when
we evaluate integrals.
To show this for the first and second derivatives starting with the three points $f_{-h} = f(x_0 - h)$,
$f_0 = f(x_0)$ and $f_h = f(x_0 + h)$, we have that the Taylor expansion around $x = x_0$ gives
\[
a_{-h} f_{-h} + a_0 f_0 + a_h f_h = a_{-h} \sum_{j=0}^{\infty} \frac{f_0^{(j)}}{j!} (-h)^j + a_0 f_0
+ a_h \sum_{j=0}^{\infty} \frac{f_0^{(j)}}{j!} h^j, \tag{3.5}
\]
where $a_{-h}$, $a_0$ and $a_h$ are unknown constants to be chosen so that
$a_{-h} f_{-h} + a_0 f_0 + a_h f_h$ is the best possible approximation for $f_0'$ and $f_0''$. Eq. (3.5)
can be rewritten as
\[
a_{-h} f_{-h} + a_0 f_0 + a_h f_h = [a_{-h} + a_0 + a_h] f_0 + [a_h - a_{-h}] h f_0'
+ [a_{-h} + a_h] \frac{h^2 f_0''}{2}
+ \sum_{j=3}^{\infty} \frac{f_0^{(j)}}{j!} h^j \left[ (-1)^j a_{-h} + a_h \right].
\]
To determine $f_0'$, we require in the last equation that
\[
a_{-h} + a_0 + a_h = 0,
\]
\[
-a_{-h} + a_h = \frac{1}{h},
\]
and
\[
a_{-h} + a_h = 0.
\]
These equations have the solution
\[
a_{-h} = -a_h = -\frac{1}{2h},
\]
and
\[
a_0 = 0,
\]
yielding
\[
\frac{f_h - f_{-h}}{2h} = f_0' + \sum_{j=1}^{\infty} \frac{f_0^{(2j+1)}}{(2j+1)!} h^{2j}.
\]
To determine $f_0''$, we require in the last equation that
\[
a_{-h} + a_0 + a_h = 0,
\]
\[
-a_{-h} + a_h = 0,
\]
and
\[
a_{-h} + a_h = \frac{2}{h^2}.
\]
These equations have the solution
\[
a_{-h} = a_h = \frac{1}{h^2},
\]
and
\[
a_0 = -\frac{2}{h^2},
\]
yielding
\[
\frac{f_h - 2f_0 + f_{-h}}{h^2} = f_0'' + 2\sum_{j=1}^{\infty} \frac{f_0^{(2j+2)}}{(2j+2)!} h^{2j}.
\]
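As a numerical sanity check of the leading terms in the two series, one can divide the error of each formula by $h^2$ for $f(x) = \exp(x)$. According to the expansions the result should approach $f_0'''/3! = \exp(x)/6$ for the first derivative and $2f_0^{(4)}/4! = \exp(x)/12$ for the second. The function names below are hypothetical and serve only this illustration.

```cpp
#include <cassert>
#include <cmath>

// (three-point first derivative - exact) / h^2 for f = exp;
// expected to approach exp(x)/6 as h becomes small.
double first_derivative_error_coefficient(double x, double h)
{
  assert(h > 0.0);
  double approx = (std::exp(x + h) - std::exp(x - h)) / (2.0 * h);
  return (approx - std::exp(x)) / (h * h);
}

// (three-point second derivative - exact) / h^2 for f = exp;
// expected to approach exp(x)/12 as h becomes small.
double second_derivative_error_coefficient(double x, double h)
{
  assert(h > 0.0);
  double approx =
      (std::exp(x + h) - 2.0 * std::exp(x) + std::exp(x - h)) / (h * h);
  return (approx - std::exp(x)) / (h * h);
}
```

The step should not be chosen too small in such a check, since the division by $h^2$ amplifies roundoff errors.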
3.2.1 The second derivative of $e^x$

As an example, let us calculate the second derivatives of exp (x) for various values of x. Furthermore, we
will use this section to introduce three important C++-programming features, namely reading and writing
to a file, call by reference and call by value, and dynamic memory allocation. We are also going to split
the tasks performed by the program into subtasks. We define one function which reads in the input data,
one which calculates the second derivative and a final function which writes the results to file.
Let us look at a simple case first, the use of printf and scanf. If we wish to print a variable defined as
double speed_of_sound; we could for example write printf("speed_of_sound = %lf\n", speed_of_sound);.
In this case we say that we transfer the value of this specific variable to the function printf . The
function printf can however not change the value of this variable (there is no need to do so in this case).
Such a call of a specific function is called call by value. The crucial aspect to keep in mind is that the
value of this specific variable does not change in the called function.
When do we use call by value? And why care at all? We do actually care, because if a called function
has the possibility to change the value of a variable when this is not desired, calling another function with
this variable may lead to totally wrong results. In the worst cases you may even not be able to spot where
the program goes wrong.
We do however use call by value when a called function simply receives the value of the given variable
without changing it.
If we however wish to update the value of say an array in a called function, we refer to this call as
call by reference. What is transferred then is the address of the first element of the array, and the called
function has now access to where that specific variable ’lives’ and can thereafter change its value.
The function scanf is then an example of a function which receives the address of a variable and is
allowed to modify it. After all, when calling scanf we are expecting a new value for a variable. A typical
call could be scanf("%lf", &speed_of_sound);.
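The same mechanism can be tried out without keyboard input by using sscanf, the standard C function which reads from a character string instead of from screen. The helper parse_speed is a hypothetical name introduced only for this sketch.

```cpp
#include <cassert>
#include <cstdio>

// Reads one double from a text buffer. Like scanf, sscanf needs the
// address of the variable it is supposed to fill, here passed on as the
// pointer speed_of_sound. Returns true if exactly one value was read.
bool parse_speed(const char *text, double *speed_of_sound)
{
  assert(text != 0 && speed_of_sound != 0);
  return std::sscanf(text, "%lf", speed_of_sound) == 1;
}
```

A call such as parse_speed("343.2", &v) changes the variable v in the calling function, precisely because its address and not its value is transferred.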
Consider now the following program
//
//  This program module
//  demonstrates memory allocation and data transfer
//  between functions in C++
//
#include <stdio.h>            // Standard ANSI-C++ include files
#include <stdlib.h>

void func(int x, int *y);     // declaration, needed before the call in main()

int main(int argc, char *argv[])
{
  int a;                      // line 1
  int *b;                     // line 2
  a = 10;                     // line 3
  b = new int[10];            // line 4
  for (int i = 0; i < 10; i++) {
    b[i] = i;                 // line 5
  }
  func(a, b);                 // line 6
  delete [] b;                // free the memory reserved in line 4
  return 0;
} // End: function main()

void func(int x, int *y)      // line 7
{
  x += 7;                     // line 8
  *y += 10;                   // line 9
  y[6] += 10;                 // line 10
  return;                     // line 11
} // End: function func()
There are several features to be noted.
– Lines 1,2: Declaration of two variables a and b. The compiler reserves two locations in memory.
The size of the location depends on the type of variable. Two properties are important for these
locations – the address in memory and the content in the location.
– Line 3: The value of a is now 10.
– Line 4: Memory to store 10 integers is reserved. The address to the first location is stored in b. The
address of element number 6 is given by the expression (b + 6).
– Line 5: All 10 elements of b are given values: b[0] = 0, b[1] = 1, ....., b[9] = 9;
– Line 6: The main() function calls the function func() and the program counter transfers to the first
statement in func(). With respect to data the following happens. The content of a (= 10) and the
content of b (a memory address) are copied to a stack (new memory location) associated with the
function func().
– Line 7: The variables x and y are local variables in func(). They have the values x = 10 and y =
address of the first element in b in the main() program.
– Line 8: The local variable x stored in the stack memory is changed to 17. Nothing happens with
the value a in main().
– Line 9: The value of y is an address and the symbol *y stands for the position in memory which
has this address. The value in this location is now increased by 10. This means that the value of
b[0] in the main program is equal to 10. Thus func() has modified a value in main().
– Line 10: This statement has the same effect as line 9 except that it modifies element b[6] in main()
by adding a value of 10 to what was there originally, namely 6.
– Line 11: The program counter returns to main(), the next expression after func(a,b);. All data on
the stack associated with func() are destroyed.
– The value of a is transferred to func() and stored in a new memory location called x. Any
modification of x in func() does not affect in any way the value of a in main(). This is called transfer
of data by value. On the other hand the next argument in func() is an address which is transferred
to func(). This address can be used to modify the corresponding value in main(). In the
programming language C it is expressed as a modification of the value which y points to, namely the
first element of b. This is called transfer of data by reference and is a method to transfer data back to
the calling function, in this case main().
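The effect of lines 8 to 10 can be verified with a small test harness. The function func() below has the same body as in the listing above, while the struct Result and the function run_example() are hypothetical additions which repeat the steps of main() and hand back the values of a, b[0] and b[6] after the call.

```cpp
#include <cassert>

// Same body as func() in the listing above.
void func(int x, int *y)
{
  x += 7;       // changes only the local copy on the stack
  *y += 10;     // changes b[0] in the calling function
  y[6] += 10;   // changes b[6] in the calling function
}

// Illustrative container for the values we want to inspect.
struct Result {
  int a, b0, b6;
};

// Repeats the steps of main() and reports the values after the call.
Result run_example()
{
  int a = 10;
  int *b = new int[10];
  for (int i = 0; i < 10; i++) {
    b[i] = i;
  }
  func(a, b);
  assert(b != 0);
  Result r = { a, b[0], b[6] };
  delete [] b;  // free the dynamically allocated array
  return r;
}
```

After the call a still equals 10, whereas b[0] has become 10 and b[6] has become 16, in agreement with the discussion of lines 8, 9 and 10.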
C++ allows however the programmer to use solely call by reference (note that call by reference is
implemented as pointers). To see the difference between C and C++, consider the following simple
examples. In C we would write
int n; n = 8;
func(&n);    /* &n is a pointer to n */
....
void func(int *i)
{
  *i = 10;   /* n is changed to 10 */
  ....
}
whereas in C++ we would write

int n; n = 8;
func(n);     // just transfer n itself
....
void func(int& i)
{
  i = 10;    // n is changed to 10
  ....
}
Note well that the way we have defined the input to the function, func(int& i) or func(int *i), decides
how we transfer variables to a specific function. The reason why we emphasize the difference between
call by value and call by reference is that it allows the programmer to avoid pitfalls like unwanted changes
of variables. However, many people feel that this reduces the readability of the code. It is more or less
common in C++ to use call by reference, since it gives a much cleaner code. Recall also that behind the
curtain references are usually implemented as pointers. When we transfer large objects such as matrices
and vectors one should always use call by reference. Copying such objects to a called function slows
down the execution considerably. If the called function must not change the value of an object passed by
reference, you should use the const declaration.
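A minimal sketch of the two kinds of reference arguments is given below. It uses std::vector from the C++ standard library as an example of a large object; the function names sum and scale are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// const reference: no copy is made, and the compiler rejects any attempt
// to modify v inside the function.
double sum(const std::vector<double>& v)
{
  double s = 0.0;
  for (std::size_t i = 0; i < v.size(); i++) {
    s += v[i];
  }
  return s;
}

// Non-const reference: the function may change the caller's object.
void scale(std::vector<double>& v, double factor)
{
  assert(factor == factor);   // guard against NaN as a scaling factor
  for (std::size_t i = 0; i < v.size(); i++) {
    v[i] *= factor;
  }
}
```

The declaration alone tells the reader that sum() leaves its argument untouched while scale() is allowed to change it.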
In programming languages like Fortran one uses only call by reference, but you can flag whether
a called function or subroutine is allowed to change the value by declaring for example an
integer value as INTEGER, INTENT(IN) :: i. The local function cannot change the value of i. Declaring a
transferred value as INTEGER, INTENT(OUT) :: i allows the local function to change the variable i.
Initialisations and main program

In every program we have to define the functions employed. The style chosen here is to declare these
functions at the beginning, followed thereafter by the main program and the detailed task performed by
each function. Another possibility is to include these functions and their statements before the main
program, meaning that the main program appears at the very end. I find this programming style less
readable however since I prefer to read a code from top to bottom. A further option, especially in
connection with larger projects, is to include these function definitions in a user-defined header file. The
following program shows also (although it is rather unnecessary in this case due to few tasks) how one
can split different tasks into specialized functions. Such a division is very useful for larger projects and
programs.
In the first version of this program we use a more C-like style for writing and reading to file. At the
end of this section we include also the corresponding C++ and Fortran files.
http://www.fys.uio.no/compphys/cp/programs/FYS3150/chapter03/cpp/program1.cpp

/*
**  Program to compute the second derivative of exp(x).
**  Three calling functions are included
**  in this version. In one function we read in the data from screen,
**  the next function computes the second derivative
**  while the last function prints out data to screen.
*/
#include <cstdio>     // printf and scanf, used in initialise
#include <cmath>      // exp, used in second_derivative
#include <iostream>
using namespace std;

void initialise(double *, double *, int *);
void second_derivative(int, double, double, double *, double *);
void output(double *, double *, double, int);

int main()
{
  // declarations of variables
  int number_of_steps;
  double x, initial_step;
  double *h_step, *computed_derivative;
  // read in input data from screen
  initialise(&initial_step, &x, &number_of_steps);
  // allocate space in memory for the one-dimensional arrays
  // h_step and computed_derivative
  h_step = new double[number_of_steps];
  computed_derivative = new double[number_of_steps];
  // compute the second derivative of exp(x)
  second_derivative(number_of_steps, x, initial_step, h_step,
                    computed_derivative);
  // Then we print the results to file
  output(h_step, computed_derivative, x, number_of_steps);
  // free memory
  delete [] h_step;
  delete [] computed_derivative;
  return 0;
} // end main program
We have defined three additional functions, one which reads in from screen the value of x, the initial step
length h and the number of divisions by 2 of h. This function is called initialise . To calculate the second
derivatives we define the function second_derivative . Finally, we have a function which writes our results
together with a comparison with the exact value to a given file. The results are stored in two arrays, one
which contains the given step length h and another one which contains the computed derivative.
These arrays are defined as pointers through the statement double *h_step, *computed_derivative;.
A call in the main function to the function second_derivative looks then like this
  second_derivative(number_of_steps, x, initial_step, h_step, computed_derivative);
while the called function is declared in the following way
  void second_derivative(int number_of_steps, double x, double initial_step,
                         double *h_step, double *computed_derivative);
indicating that double *h_step, double *computed_derivative; are pointers and that we transfer the address
of the first elements. The other variables int number_of_steps, double x, double initial_step; are
transferred by value and are not changed in the called function.
Another aspect to observe is the possibility of dynamical allocation of memory through the new
operator. In the included program we reserve space in memory for these two arrays in the following way:
h_step = new double[number_of_steps]; and computed_derivative = new double[number_of_steps];. When we
no longer need the space occupied by these arrays, we free memory through the statements
delete [] h_step; and delete [] computed_derivative;.
The function initialise
// Read in from screen the initial step, the number of steps
// and the value of x
void initialise(double *initial_step, double *x, int *number_of_steps)
{
  printf("Read in from screen initial step, x and number of steps\n");
  scanf("%lf %lf %d", initial_step, x, number_of_steps);
  return;
} // end of function initialise
This function receives the addresses of the three variables double *initial_step, double *x, int
*number_of_steps; and returns updated values by reading from screen.
The function second_derivative
// This function computes the second derivative
void second_derivative(int number_of_steps, double x,
                       double initial_step, double *h_step,
                       double *computed_derivative)
{
  int counter;
  double h;
  // start from the initial step size
  h = initial_step;
  // start computing for different step sizes
  for (counter = 0; counter < number_of_steps; counter++)
  {
    // set up arrays with derivatives and step sizes
    h_step[counter] = h;
    computed_derivative[counter] =
      (exp(x+h) - 2.*exp(x) + exp(x-h))/(h*h);
    // halve the step for the next iteration
    h = h*0.5;
  } // end of loop over step sizes
  return;
} // end of function second_derivative
The loop over the number of steps serves to compute the second derivative for different values of h.
In this function the step is halved for every iteration (you could obviously change this to larger or
smaller step variations). The step values and the derivatives are stored in the arrays h_step and
computed_derivative.
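As a hedged alternative to the raw pointers used in second_derivative, the same loop can be sketched with std::vector from the C++ standard library, which releases its memory automatically. This swaps in a different technique from the one used in the program above; the names StepTable and second_derivative_table are illustrative and not part of the original program.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Illustrative container for the step sizes and the computed derivatives.
struct StepTable {
  std::vector<double> h_step;
  std::vector<double> computed_derivative;
};

// Computes the three-point second derivative of exp(x) for
// number_of_steps step sizes, halving the step in every iteration.
StepTable second_derivative_table(int number_of_steps, double x,
                                  double initial_step)
{
  assert(number_of_steps >= 0 && initial_step > 0.0);
  StepTable table;
  double h = initial_step;
  for (int counter = 0; counter < number_of_steps; counter++) {
    table.h_step.push_back(h);
    table.computed_derivative.push_back(
        (std::exp(x + h) - 2.0 * std::exp(x) + std::exp(x - h)) / (h * h));
    h *= 0.5;
  }
  return table;
}
```

Since the vectors are returned by value inside the struct, no call to new or delete is needed in the calling function.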
The output function

This function computes the relative error and writes the results to a chosen file.
The last function here illustrates how to open a file, write and read possible data and then close it.
In this case we have fixed the name of the file. Another possibility is obviously to read the name of this
file together with other input parameters. The way the program is presented here is slightly impractical
since we need to recompile the program if we wish to change the name of the output file.
An alternative is represented by the following C program. This program reads the names of the
input and output files from the command line.

 1  #include <stdio.h>
 2  #include <stdlib.h>
 3  int col;
 4
 5  int main(int argc, char *argv[])
 6  {
 7    FILE *in, *out;
 8    int c;
 9    if (argc < 3) {
10      printf("You have to read in :\n");
11      printf("in_file and out_file \n");
12      exit(1); }
13    in = fopen(argv[1], "r");        // returns pointer to the in_file
14    if (in == NULL) {                // can't find in_file
15      printf("Can't find the input file %s\n", argv[1]);
16      exit(1);
17    }
18    out = fopen(argv[2], "w");       // returns a pointer to the out_file
19    if (out == NULL) {               // can't find out_file
20      printf("Can't find the output file %s\n", argv[2]);
21      exit(1);
22    }
      ... program statements
23    fclose(in);
24    fclose(out);
25    return 0;
    }
This program has several interesting features.

Line     Program comments
5        • main() receives the number of command-line arguments in argc, three in this case.
           argv points to the following: the name of the program, the first and second
           arguments, in this case the names of the input and output files.
7        • C has a data type called FILE. The pointers in and out point to specific files.
           They must be of the type FILE.
10       • The command line has to contain 2 filenames as parameters.
13–17    • The input file has to exist, else fopen returns a NULL pointer. The file is opened
           with read permission only.
18–22    • Same for the output file, but now with write permission only.
23–24    • Both files are closed before the main program ends.
The above represents a standard procedure in C for reading file names. C++ has its own class for
such operations.

/*
**  Program to compute the second derivative of exp(x).
**  In this version we use C++ options for reading and
**  writing files and data. The rest of the code is as in
**  programs/chapter3/program1.cpp
**  Three calling functions are included
**  in this version. In one function we read in the data from screen,
**  the next function computes the second derivative
**  while the last function prints out data to screen.
*/
#include <cstdio>     // printf and scanf, used in initialise
#include <cstdlib>    // exit
#include <iostream>
#include <fstream>
#include <iomanip>
#include <cmath>
using namespace std;

void initialise(double *, double *, int *);
void second_derivative(int, double, double, double *, double *);
void output(double *, double *, double, int);

ofstream ofile;

int main(int argc, char *argv[])
{
  // declarations of variables
  char *outfilename;
  int number_of_steps;
  double x, initial_step;
  double *h_step, *computed_derivative;
  // Read in output file, abort if there are too few command-line arguments
  if (argc <= 1) {
    cout << "Bad Usage: " << argv[0] <<
      " read also output file on same line" << endl;
    exit(1);
  }
  else {
    outfilename = argv[1];
  }
  ofile.open(outfilename);
  // read in input data from screen
  initialise(&initial_step, &x, &number_of_steps);
  // allocate space in memory for the one-dimensional arrays
  // h_step and computed_derivative
  h_step = new double[number_of_steps];
  computed_derivative = new double[number_of_steps];
  // compute the second derivative of exp(x)
  second_derivative(number_of_steps, x, initial_step, h_step,
                    computed_derivative);
  // Then we print the results to file
  output(h_step, computed_derivative, x, number_of_steps);
  // free memory
  delete [] h_step;
  delete [] computed_derivative;
  // close output file
  ofile.close();
  return 0;
} // end main program
The main part of the code now includes an object declaration ofstream ofile, which is part of C++ and
allows the programmer to declare and open files. The file is opened via the statement
ofile.open(outfilename);. We close the file at the end of the main program by writing ofile.close();.
There is a corresponding object for reading input files. In this case we declare prior to the main function,
or possibly in a header file, ifstream ifile and use the corresponding statements ifile.open(infilename);
and ifile.close(); for opening and closing an input file. Note that the file names are held in two character
variables, char *outfilename; and char *infilename;. In order to use these options we need to include a
corresponding library of functions using #include <fstream>.
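The use of ofstream and ifstream can be sketched with two small helper functions. The names write_values and read_values are hypothetical, and the file objects are declared locally here rather than globally as in the program above.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iomanip>
#include <vector>

// Writes one value per line; returns false if the file cannot be opened.
bool write_values(const char *outfilename, const std::vector<double>& values)
{
  assert(outfilename != 0);
  std::ofstream ofile;
  ofile.open(outfilename);
  if (!ofile.is_open()) return false;
  ofile << std::setprecision(17);   // enough digits to recover a double
  for (std::size_t i = 0; i < values.size(); i++) {
    ofile << values[i] << '\n';
  }
  ofile.close();
  return true;
}

// Appends all whitespace-separated numbers found in the file to values;
// returns false if the file cannot be opened.
bool read_values(const char *infilename, std::vector<double>& values)
{
  assert(infilename != 0);
  std::ifstream ifile;
  ifile.open(infilename);
  if (!ifile.is_open()) return false;
  double value;
  while (ifile >> value) {
    values.push_back(value);
  }
  ifile.close();
  return true;
}
```

Checking is_open() after the call to open plays the same role as the test against NULL in the C version with fopen.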
One of the problems with C++ is that formatted output is not as easy to use as the printf and scanf
functions in C. The output function using the C++ style is included below.
// function to write out the final results
void output(double *h_step, double *computed_derivative, double x,
            int number_of_steps)
{
  int i;
  ofile << "RESULTS:" << endl;
  ofile << setiosflags(ios::showpoint | ios::uppercase);
  for (i = 0; i < number_of_steps; i++)
  {
    // log10 of the step size and log10 of the relative error
    ofile << setw(15) << setprecision(8) << log10(h_step[i]);
    ofile << setw(15) << setprecision(8) <<
      log10(fabs(computed_derivative[i] - exp(x))/exp(x)) << endl;
  }
} // end of function output
The function setw(15) reserves an output field of 15 characters for a given variable while setprecision(8)
yields eight significant digits. To use these options you have to use the declaration #include <iomanip>.
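The effect of the two manipulators can be inspected by writing to a string instead of a file. The helper format_field below is a hypothetical name; it uses std::ostringstream from the standard header <sstream> and the default floating-point format, without the showpoint and uppercase flags set in the output function.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Formats one number in a field of the given width and with the given
// precision, and returns the result as a string.
std::string format_field(double value, int width, int precision)
{
  assert(width >= 0 && precision >= 0);
  std::ostringstream os;
  os << std::setw(width) << std::setprecision(precision) << value;
  return os.str();
}
```

A call like format_field(3.14159265358979, 15, 8) gives the nine characters 3.1415927 preceded by six blanks, that is a right-adjusted field of 15 characters.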
Before we discuss the results of our calculations we list here the corresponding Fortran program.
http://www.fys.uio.no/compphys/cp/programs/FYS3150/chapter03/f90/program1.f90

!   Program to compute the second derivative of exp(x).
!   Only one calling function is included.
!   It computes the second derivative and is included in the
!   MODULE functions as a separate method
!   The variable h is the step size. We also fix the total number
!   of divisions by 2 of h. The total number of steps is read from
!   screen
MODULE constants
  ! definition of variables for double precisions and complex variables
  INTEGER, PARAMETER :: dp = KIND(1.0D0)
  INTEGER, PARAMETER :: dpc = KIND((1.0D0,1.0D0))
END MODULE constants

! Here you can include specific functions which can be used by
! many subroutines or functions
MODULE functions
  USE constants
  IMPLICIT NONE
CONTAINS
  SUBROUTINE derivative(number_of_steps, x, initial_step, h_step, &
       computed_derivative)
    INTEGER, INTENT(IN) :: number_of_steps
    INTEGER :: loop
    REAL(DP), DIMENSION(number_of_steps), INTENT(INOUT) :: &
         computed_derivative, h_step
    REAL(DP), INTENT(IN) :: initial_step, x
    REAL(DP) :: h
    ! calculate the step size
    ! initialise the derivative, y and x (in minutes)
    ! and iteration counter
    h = initial_step
    ! start computing for different step sizes
    DO loop = 1, number_of_steps
       ! set up arrays with derivatives and step sizes
       h_step(loop) = h
       computed_derivative(loop) = (EXP(x+h)-2.*EXP(x)+EXP(x-h))/(h*h)
       h = h*0.5
    ENDDO
  END SUBROUTINE derivative
END MODULE functions

PROGRAM second_derivative
  USE functions
  IMPLICIT NONE
  ! declarations of variables
  INTEGER :: number_of_steps, loop
  REAL(DP) :: x, initial_step
  REAL(DP), ALLOCATABLE, DIMENSION(:) :: h_step, computed_derivative
  ! read in input data from screen
  WRITE(*,*) 'Read in initial step, x value and number of steps'
  READ(*,*) initial_step, x, number_of_steps
  ! open file to write results on
  OPEN(UNIT=7, FILE='out.dat')
  ! allocate space in memory for the one-dimensional arrays
  ! h_step and computed_derivative
  ALLOCATE(h_step(number_of_steps), computed_derivative(number_of_steps))
  ! compute the second derivative of exp(x)
  ! initialize the arrays
  h_step = 0.0_dp; computed_derivative = 0.0_dp
  CALL derivative(number_of_steps, x, initial_step, h_step, computed_derivative)
  ! Then we print the results to file
  DO loop = 1, number_of_steps
     WRITE(7,'(E16.10,2X,E16.10)') LOG10(h_step(loop)), &
          LOG10(ABS((computed_derivative(loop)-EXP(x))/EXP(x)))
  ENDDO
  ! free memory
  DEALLOCATE(h_step, computed_derivative)
  ! close the output file
  CLOSE(7)
END PROGRAM second_derivative
The MODULE declaration in Fortran allows one to place functions like the one which calculates second
derivatives in a module. Since this is a general method, one could extend its functionality by simply
transferring the name of the function to differentiate. In our case we use explicitly the exponential
function, but there is nothing which hinders us from defining other functions. Note the usage of the
module constants where we define double and complex variables. If one wishes to switch to another
precision, one just needs to change the declaration in one part of the program only. This prevents possible
errors which arise if one has to change variable declarations in every function and subroutine. Finally,
dynamic memory allocation and deallocation is in Fortran done with the keywords ALLOCATE(array(size))
and DEALLOCATE(array). Although most compilers deallocate and thereby free space in memory
when leaving a function, you should always deallocate an array when it is no longer needed. In case
your arrays are very large, failing to do so may block unnecessarily large fractions of the memory.
Furthermore, you should always initialise arrays. In the example above, we note that Fortran allows us to
simply write h_step = 0.0_dp; computed_derivative = 0.0_dp, which means that all elements of these two
arrays are set to zero. Coding arrays in this manner brings us much closer to the way we deal with
mathematics. In Fortran it is irrelevant whether this is a one-dimensional or multi-dimensional array. In the
next chapter, where we deal with allocation of matrices, we will introduce the numerical library Blitz++
which allows for similar treatments of arrays in C++. By default however, these features are not included
in the ANSI C++ standard.
Results
In Table 3.1 we present the results of a numerical evaluation for various step sizes for the second
derivative of $\exp(x)$ using the approximation $f_0'' = (f_h - 2f_0 + f_{-h})/h^2$. The results are
compared with the exact ones for various x values. Note that as the step is decreased we get closer to
the exact value. However, if it is further decreased, we run into problems of loss of precision. This is
clearly seen for h = 0.0000001. This means that even though we could let the computer run with smaller
and smaller values of the step, there is a limit for how small the step can be made before we lose precision.

x      h = 0.1       h = 0.01      h = 0.001     h = 0.0001    h = 0.0000001   Exact
0.0      1.000834      1.000008      1.000000      1.000000      1.010303        1.000000
1.0      2.720548      2.718304      2.718282      2.718282      2.753353        2.718282
2.0      7.395216      7.389118      7.389057      7.389056      7.283063        7.389056
3.0     20.102280     20.085704     20.085539     20.085537     20.250467       20.085537
4.0     54.643664     54.598605     54.598155     54.598151     54.711789       54.598150
5.0    148.536878    148.414396    148.413172    148.413161    150.635056      148.413159
Table 3.1: Result for numerically calculated second derivatives of $\exp(x)$ as functions of the chosen step
size h. A comparison is made with the exact value.
3.2.2 Error analysis

Let us analyze these results in order to see whether we can find a minimal step length which does not
lead to loss of precision. In Fig. 3.2 we have plotted
$$\epsilon = \log_{10}\left(\left|\frac{f''_{computed} - f''_{exact}}{f''_{exact}}\right|\right),$$
as function of $\log_{10}(h)$. We used an initial step length of h = 0.01 and fixed x = 10. For large values of
h, that is $-4 < \log_{10}(h) < -2$, we see a straight line with a slope close to 2. Close to $\log_{10}(h) \approx -4$
the relative error starts increasing, and a derivative computed with a step size $\log_{10}(h) < -4$ may no
longer be reliable.
Figure 3.2: Log-log plot of the relative error $\epsilon$ (vertical axis) of the second derivative of $e^x$ as function
of decreasing step lengths h (horizontal axis, $\log_{10}(h)$). The second derivative was computed for x = 10 in
the program discussed above. See text for further details.
Can we understand this behavior in terms of the discussion from the previous chapter? In chapter 2
we assumed that the total error could be approximated with one term arising from the loss of numerical
precision and another due to the truncation or approximation made, that is
$$\epsilon_{tot} = \epsilon_{approx} + \epsilon_{ro}.$$
For the computed second derivative, Eq. (3.4), we have
$$f_0'' = \frac{f_h - 2f_0 + f_{-h}}{h^2} - 2\sum_{j=1}^{\infty}\frac{f_0^{(2j+2)}}{(2j+2)!}h^{2j},$$
and the truncation or approximation error goes like
$$\epsilon_{approx} \approx \frac{f_0^{(4)}}{12}h^2.$$
If we were not to worry about loss of precision, we could in principle make h as small as possible.
However, due to the computed expression in the above program example
$$f_0'' = \frac{f_h - 2f_0 + f_{-h}}{h^2} = \frac{(f_h - f_0) + (f_{-h} - f_0)}{h^2},$$
we reach fairly quickly a limit where loss of precision due to the subtraction of two nearly equal
numbers becomes crucial. If $f_{\pm h}$ and $f_0$ are very close, we have $(f_{\pm h} - f_0) \approx \epsilon_M$, where $|\epsilon_M| \le 10^{-7}$
for single and $|\epsilon_M| \le 10^{-15}$ for double precision, respectively.
We have then
$$\left|f_0''\right| = \left|\frac{(f_h - f_0) + (f_{-h} - f_0)}{h^2}\right| \le \frac{2\epsilon_M}{h^2}.$$
Our total error becomes
$$|\epsilon_{tot}| \le \frac{2\epsilon_M}{h^2} + \frac{f_0^{(4)}}{12}h^2. \qquad (3.6)$$
It is then natural to ask which value of h yields the smallest total error. Taking the derivative of $|\epsilon_{tot}|$ with
respect to h and setting it equal to zero results in
$$h = \left(\frac{24\epsilon_M}{f_0^{(4)}}\right)^{1/4}.$$
With double precision and x = 10 we obtain
$$h \approx 10^{-4}.$$
Below this value, it is essentially the loss of numerical precision which takes over. We note also that
the above qualitative argument seems to agree well with the results plotted in Fig. 3.2 and Table 3.1.
The turning point for the relative error at approximately $h \approx 10^{-4}$ most likely reflects the point where
roundoff errors take over. If we had used single precision, we would get $h \approx 10^{-2}$. Due to the subtractive
cancellation in the expression for $f''$ there is a pronounced deterioration in accuracy as h is made smaller
and smaller.
It is instructive in this analysis to rewrite the numerator of the computed derivative,
$$(f_h - f_0) + (f_{-h} - f_0) = (e^{x+h} - e^x) + (e^{x-h} - e^x),$$
as
$$(f_h - f_0) + (f_{-h} - f_0) = e^x(e^h + e^{-h} - 2),$$
since it is the difference $(e^h + e^{-h} - 2)$ which causes the loss of precision. The results, still for x = 10,
are shown in Table 3.2. We note from this table that at $h \approx 10^{-8}$ we have essentially lost all leading
digits.

h          e^h + e^-h             e^h + e^-h - 2
10^-1      2.0100083361116070     1.0008336111607230×10^-2
10^-2      2.0001000008333358     1.0000083333605581×10^-4
10^-3      2.0000010000000836     1.0000000834065048×10^-6
10^-4      2.0000000099999999     1.0000000050247593×10^-8
10^-5      2.0000000001000000     9.9999897251734637×10^-11
10^-6      2.0000000000010001     9.9997787827987850×10^-13
10^-7      2.0000000000000098     9.9920072216264089×10^-15
10^-8      2.0000000000000000     0.0000000000000000×10^0
10^-9      2.0000000000000000     1.1102230246251565×10^-16
10^-10     2.0000000000000000     0.0000000000000000×10^0
Table 3.2: Result for the numerically calculated numerator of the second derivative as function of the step
size h. The calculations have been made with double precision.
From Fig. 3.2 we can read off the slope of the curve and thereby determine empirically how truncation
errors and roundoff errors propagate. We saw that for −4 < log10 (h) < −2, we could extract a slope
close to 2, in agreement with the mathematical expression for the truncation error.
We can repeat this for −10 < log10 (h) < −4 and extract a slope ≈ −2. This agrees again with our
simple expression in Eq. (3.6).
3.3 Exercises and projects
Exercise 3.1: Computing derivatives numerically

We want you to compute the first derivative of
$$f(x) = \tan^{-1}(x)$$
for $x = \sqrt{2}$ with step lengths h. The exact answer is 1/3. We want you to code the derivative using the
following two formulae
$$f'_{2c}(x) = \frac{f(x+h) - f(x)}{h} + O(h), \qquad (3.7)$$
and
$$f'_{3c} = \frac{f_h - f_{-h}}{2h} + O(h^2), \qquad (3.8)$$
with $f_{\pm h} = f(x \pm h)$.
(a) Find mathematical expressions for the total error due to loss of precision and due to the numerical
approximation made. Find the step length which gives the smallest value. Perform the analysis
with both double and single precision.
(b) Make thereafter a program which computes the first derivative using Eqs. (3.7) and (3.8) as function
of various step lengths h and let h → 0. Compare with the exact answer.
Your program should contain the following elements:
– A vector (array) which contains the step lengths. Use dynamic memory allocation.
– Vectors for the computed derivatives of Eqs. (3.7) and (3.8) for both single and double preci-
sion.
– A function which computes the derivative and contains call by value and reference (for C++
users only).
– Add a function which writes the results to file.
(c) Compute thereafter
$$\epsilon = \log_{10}\left(\left|\frac{f'_{computed} - f'_{exact}}{f'_{exact}}\right|\right),$$
as function of $\log_{10}(h)$ for Eqs. (3.7) and (3.8) for both single and double precision. Plot the results
and see if you can determine empirically the behavior of the total error as function of h.