Martin Scholtz

Classical Mechanics
and
Dynamical Systems

With calculations in Mathematica

December 27, 2012


Department of Applied Mathematics


Faculty of Transportation Sciences
Czech Technical University in Prague
Contents

1 Classical mechanics
1.1 Newton’s laws of motion
1.2 Index notation
1.2.1 Einstein’s convention
1.2.2 Differentiation
1.2.3 Examples
1.3 Kinetic energy
1.4 Potential energy
1.5 Conservation of energy
1.6 Conservation of momentum
1.7 Conservation of angular momentum
1.8 Curvilinear coordinates
1.8.1 Polar coordinates
1.8.2 Spherical coordinates

2 Lagrange equations
2.1 Motivation
2.2 Lagrange equations of the second kind
2.2.1 Generalized coordinates
2.2.2 Kinetic energy
2.2.3 Generalized forces
2.2.4 Derivation
2.3 Lagrange equations
2.4 Particle in homogeneous gravitational field
2.5 Harmonic oscillator
2.6 Mathematical pendulum
2.7 Lagrange equations in Mathematica
2.8 Solving the equations of motion of pendulum
2.9 Deriving the Lagrangian in Mathematica
2.10 Planet in gravitational field

3 Hamilton’s equations
3.1 Motivation
3.2 Legendre transformation
3.3 Hamilton’s equations
3.4 Particle in homogeneous gravitational field
3.5 Conservation of energy
3.5.1 Homogeneous functions and Hamiltonian
3.5.2 Conservation of energy
3.6 Phase space
3.7 Harmonic oscillator

4 Variational principle
4.1 Fermat’s principle
4.2 Formulation of variational problem
4.3 Variation of the functional
4.4 Euler-Lagrange equations
4.5 Non-uniqueness of the Lagrangian
4.6 Variational derivation of Hamilton’s equations
4.7 Noether’s theorem: motivation
4.8 Noether’s theorem: proof
4.9 Basic conservation laws

5 Hamilton-Jacobi equation
5.1 Canonical transformations
5.2 Hamilton-Jacobi equation
5.3 Example: harmonic oscillator
5.4 Action-angle variables

6 Electromagnetic field
6.1 Lagrangian and equations of motion
6.2 Hamilton’s equations
6.3 Mathematica
6.4 Homogeneous fields
6.5 Electromagnetic wave
6.6 Electrostatic wave

7 Discrete dynamical systems and fractals
7.1 Complex sequences
7.2 Mandelbrot set

8 Dynamical systems
8.1 Definition
8.2 Example
8.3 Implementation in Mathematica
8.4 Chaotic pendulum
8.5 Critical points of the pendulum
8.6 Stability of critical points
8.6.1 Example
8.7 Classification of critical points
8.7.1 Stable and unstable nodes, saddle points
8.7.2 Centres and foci
8.8 General case
8.9 Examples
8.10 Flow of the vector field
8.11 Lyapunov stability

9 Bifurcations
9.1 Saddle-node bifurcation
9.2 Transcritical bifurcations
9.3 Pitchfork bifurcation
9.4 Example

A Important commands in Mathematica
A.1 D-derivative
A.2 Table

B Some features of Mathematica
B.1 Rules of replacement
B.2 Functions
B.3 Pure functions
B.4 Expressions
B.5 Working with heads

C Shortcuts in Mathematica
C.1 Greek letters

D To do
1
Classical mechanics

Classical mechanics is the most basic part of physics. In fact, physics as
an exact science started with the development of mechanics by Sir Isaac Newton.
Conventionally we distinguish two parts of mechanics: kinematics and dynamics.
In this chapter we introduce the basic notions of dynamics, including the notion
of generalized coordinates and several conventions used throughout the entire
textbook.
The word “kinematics” is derived from the Greek word κινεῖν (kinein) meaning
“to move”. Thus, kinematics studies the motion of bodies and point masses. It
does not, however, ask why the bodies move in a given way; rather, it provides
us with a description of the motion. In kinematics we ask where the bodies are,
and at what velocities and with what accelerations they move. We also classify the
kinds of motion according to the shapes of the trajectories or according to the
velocities. Typical kinematic quantities are position, velocity and acceleration.
In dynamics, on the other hand, we study the causes of motion. The word
“dynamics” has an ancient origin as well: δυναμικός means “powerful”. In this branch
of mechanics we ask what forces act on the bodies and how these forces influence
the motion. This influence depends not only on the forces themselves, but also on
the masses of the bodies. Mass, force and momentum are among the basic
quantities of dynamics.

1.1 Newton’s laws of motion


In this chapter we start with a reformulation of non-relativistic classical, i.e.
Newtonian, dynamics. In Newtonian dynamics, physical bodies or idealized point
particles move and interact according to Newton’s laws of motion. Newton himself
formulated them in his Mathematical Principles of Natural Philosophy as follows:

• Law of inertia
Every body persists in its state of being at rest or of moving uniformly straight
forward, except insofar as it is compelled to change its state by force impressed.
• Law of force
The alteration of motion is ever proportional to the motive force impressed; and
is made in the direction of the right line in which that force is impressed.
• Law of action and reaction
To every action there is always an equal and opposite reaction: or the forces of
two bodies on each other are always equal and are directed in opposite directions.
These laws involve the important notions of force, momentum and mass, and we
assume that the reader is familiar with them. Consider a point particle of mass m.
Choosing some fixed point O (the origin) in space, we can describe the motion of the
particle by the position vector (radius vector) r. The position vector is
time-dependent if the body is moving with respect to the origin O, which is denoted
mathematically by r = r(t). The trajectory of the point particle is the set of all
end-points of the position vector in some time interval (see fig. 1.1). The velocity
is defined as the derivative of r(t) with respect to time:
\[ \mathbf{v}(t) = \frac{d\mathbf{r}}{dt}. \]
The total derivative with respect to time will often be denoted by a dot, so that
the last equation is briefly written as v = ṙ(t) = ṙ. Sometimes it is useful to
parametrize the position vector by a parameter other than time, for example by the
length of the trajectory.
Similarly, the second derivative with respect to time will be denoted by a double
dot. The most important example is the definition of acceleration, which is the
second derivative of the position vector with respect to time:
\[ \mathbf{a} = \frac{d^2\mathbf{r}}{dt^2} = \frac{d\mathbf{v}}{dt} = \ddot{\mathbf{r}} = \dot{\mathbf{v}}. \]
The quantities r, v and a are so-called kinematic quantities. They describe the
motion independently of its causes. According to Aristotle, motion is caused by
forces, but this is wrong. Aristotle’s opinion was so influential that it held back
progress in physics for the next two thousand years. The experimental research of
Galileo Galilei, his discovery of the law of inertia, and finally the grand work of
Isaac Newton laid the foundations of modern physics.
Why is Aristotle’s point of view wrong? Well, we have to clarify what we mean by
the statement that “the motion is caused by the forces”. The law of inertia says that
if there is no force, the body will move uniformly along a straight line. We need
a force to change the motion, not to preserve it. Therefore there is no direct
connection between the force and the velocity, but there must be a relation between
the force and the acceleration. This crucial point was missed by Aristotle.

Fig. 1.1. The point mass is moving along the trajectory. Its position P at time t is
given by the position vector r(t) drawn from the origin O. The velocity is v = ṙ.

The precise form of the relation between force and acceleration is given by
Newton’s second law, the law of force. We expect that the acceleration has the same
direction as the force, and that a larger force will cause a larger acceleration.
Experience teaches us that we need a larger force to change the motion of heavier
bodies, so the acceleration must be inversely proportional to the mass. This simple
consideration leads us directly to the suggestion
\[ \mathbf{a} = \frac{\mathbf{F}}{m}, \]
where m is the mass of the body and F is the force acting on the body. The last
formula is a mathematical expression of Newton’s second law, and experiments
show that it agrees very well with reality, although it fails for high
velocities, strong gravitational fields and microscopic objects.
We can formulate this law in a slightly different form by defining the (linear)
momentum p of the body:
p = m v.
Momentum incorporates both the measure of inertia (mass) and the “state of
motion”, the velocity. The force can then be defined as the rate of change of the
momentum in time, i.e.
\[ \mathbf{F} = \frac{d\mathbf{p}}{dt}. \]
If the mass of the body is constant in time, we have ṗ = mv̇ = ma, which is again
Newton’s law

F = m a.
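
As a quick numerical illustration of the law of force in its momentum form, the following Python sketch integrates dp/dt = F for a constant force and recovers the velocity from v = p/m. The function name, the time step and all numerical values are illustrative assumptions, not taken from the text.

```python
def momentum_after(force, p0, dt, steps):
    """Integrate dp/dt = F for a constant force F with explicit Euler steps."""
    p = list(p0)
    for _ in range(steps):
        # Law of force in momentum form: dp = F dt
        p = [pi + fi * dt for pi, fi in zip(p, force)]
    return p

mass = 2.0                    # kg (illustrative value)
force = (0.0, 0.0, -19.62)    # N, constant force (illustrative value)
v0 = (1.0, 0.0, 0.0)          # m/s, initial velocity
p0 = [mass * v for v in v0]   # p = m v

p1 = momentum_after(force, p0, dt=0.01, steps=100)   # total time 1 s
v1 = [p / mass for p in p1]                          # v = p / m for constant mass
print(v1)
```

For a constant force the Euler steps reproduce the exact result up to rounding: the velocity component along the force changes by F t / m, while the other components are unchanged, as the law of inertia requires.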

1.2 Index notation


In the previous section we defined the position vector, the velocity and the other
quantities geometrically. For example, by the position vector we mean the oriented
line segment connecting the origin with a given point, the velocity was defined as
a vector tangent to the trajectory, etc. This geometrical language is very convenient
and useful and will be developed in more detail throughout the textbook. However,
if we want to describe the position of a body, we have to introduce a coordinate
system in which we can specify the coordinates. In what follows the notion of
coordinates will be crucial. In this section we therefore briefly review the basics
of the so-called index notation.
We suppose that the reader is familiar with the Cartesian coordinate system.
Consider figure 1.2. Again, the point O is the origin of the reference frame and we
want to specify the position of the point P . The position vector r is an oriented
line connecting points O and P . Now, choose three lines called x, y and z, which are
perpendicular to each other and intersect at the origin O. These lines are called axes.
Then to each point P we can assign three real numbers called Cartesian coordinates.
Symbolically we write

r = (x, y, z)

and say that x, y and z are the coordinates (or the components) of the position vector
r. In the index notation we define

x1 = x, x2 = y, x3 = z.

Thus, each point P has three coordinates

xi , i = 1, 2, 3.

If the position vector depends on time, i.e. r = r(t), also its coordinates xi do:

xi = xi (t).

The components of the velocity v = ṙ are then


vi = ẋi ,
and the components of the acceleration are
ai = v̇i = ẍi .
Any vector equation can then be written equivalently in the index form. For
example, Newton’s law of force F = m a is equivalent to the equation
Fi = m ai .
Substituting for ai we obtain the index form of the law of force:
Fi = m ẍi .
Recall that the momentum of the particle was defined as p = m v. In the index
notation we can write
pi = m vi = m ẋi .

1.2.1 Einstein’s convention


Let x and y be arbitrary vector quantities with corresponding components xi and
yi. In other words,
\[ \mathbf{x} = (x_1, x_2, x_3), \qquad \mathbf{y} = (y_1, y_2, y_3). \tag{1.1} \]
The scalar product of these vectors can be defined as the scalar quantity
\[ \mathbf{x} \cdot \mathbf{y} = x_1 y_1 + x_2 y_2 + x_3 y_3. \]
Using the summation symbol Σ, the last equality can be rewritten as
\[ \mathbf{x} \cdot \mathbf{y} = \sum_{i=1}^{3} x_i y_i. \]
This notation means that we successively substitute the values 1, 2, 3 for the index
i and then add all the terms. Notice that under the summation symbol we have the
components of two vector quantities, and the index i appears there exactly twice.
Expressions of this type arise very often in mathematics and physics. Albert Einstein
introduced a convention, named after him, in which we do not write the symbol Σ.
More precisely, if an index appears exactly twice in a term, summation over this
index is automatically assumed. That is, the scalar product can be written simply as
\[ \mathbf{x} \cdot \mathbf{y} = x_i y_i. \]
1.2.2 Differentiation

Let us see another example. Suppose that f is a physical quantity depending on the
position, i.e.
\[ f = f(\mathbf{r}) = f(x, y, z) = f(x_1, x_2, x_3). \]
If the coordinates xi represent the position of a moving body, the functions xi
depend on time, i.e. xi = xi(t). Then the quantity f also depends on time, for we
can write
\[ f(t) = f(\mathbf{x}(t)). \]
The total derivative of f with respect to time reads
\[ \dot{f} = \frac{\partial f}{\partial x_1}\frac{dx_1}{dt} + \frac{\partial f}{\partial x_2}\frac{dx_2}{dt} + \frac{\partial f}{\partial x_3}\frac{dx_3}{dt} = \sum_{i=1}^{3} \frac{\partial f}{\partial x_i}\frac{dx_i}{dt}. \]
We can see that the expression under the sum again has the same structure: the
index i appears there exactly twice. According to Einstein’s convention we therefore
write
\[ \dot{f} = \frac{\partial f}{\partial x_i}\frac{dx_i}{dt}. \]
For convenience we also introduce the notation
\[ \frac{\partial}{\partial x_i} = \partial_i, \]
and together with the notation ẋi = dxi/dt we can write simply
\[ \dot{f} = \dot{x}_i \, \partial_i f. \]

Obviously, index notation is very compact and brief.
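
The chain rule in index form can be checked numerically. The Python sketch below is only an illustration: the scalar field f, the trajectory xi(t) and the evaluation time are arbitrary choices made here, and the partial derivatives are written out by hand. It compares the contraction ẋi ∂i f with a finite-difference derivative of f along the trajectory.

```python
import math

def f(x):
    """Illustrative scalar field f(x1, x2, x3) = x1*x2 + x3**2."""
    return x[0] * x[1] + x[2] ** 2

def grad_f(x):
    """Partial derivatives d_i f of the field above, computed by hand."""
    return (x[1], x[0], 2.0 * x[2])

def position(t):
    """Illustrative trajectory x_i(t) = (cos t, sin t, t)."""
    return (math.cos(t), math.sin(t), t)

def velocity(t):
    """Time derivative of the trajectory above."""
    return (-math.sin(t), math.cos(t), 1.0)

def f_dot(t):
    """Chain rule in index notation: f_dot = xdot_i * d_i f (sum over i)."""
    return sum(xd * df for xd, df in zip(velocity(t), grad_f(position(t))))

t, h = 0.7, 1e-5
finite_difference = (f(position(t + h)) - f(position(t - h))) / (2.0 * h)
print(f_dot(t), finite_difference)   # the two numbers should agree closely
```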

1.2.3 Examples

Example 1: scalar product

Let x and y be vectors with the components

x = (1, 2, 3), y = (−5, 1, 1). (1.2)

Find the scalar product x · y.


Fig. 1.2. Position vector in Cartesian coordinates (the figure shows the origin O,
the coordinate axes and the point P(x, y, z)).

Solution. Both vectors have three components, so that the index i takes values

i = 1, 2, 3.

The scalar product is defined as

x · y = xi yi .

Since the index i repeats twice in the previous expression, according to Einstein’s
summation convention we have

xi yi = x1 y1 + x2 y2 + x3 y3 .

Substituting values (1.2) we find

xi yi = 1 · (−5) + 2 · 1 + 3 · 1 = 0.

Thus, the scalar product of given vectors vanishes:

x · y = 0.

In such a case the vectors x and y are said to be mutually orthogonal.
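
The summation over the repeated index is easy to mirror in code. The following Python sketch is a minimal illustration (the function name `contract` is a choice made here); it evaluates xi yi for the vectors (1.2).

```python
def contract(x, y):
    """Return x_i y_i: the repeated index i is summed over all components."""
    if len(x) != len(y):
        raise ValueError("both vectors must have the same number of components")
    return sum(xi * yi for xi, yi in zip(x, y))

x = (1, 2, 3)
y = (-5, 1, 1)
print(contract(x, y))   # 1*(-5) + 2*1 + 3*1 = 0, so x and y are orthogonal
```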


Example 2: divergence

Let

v = (v1 , v2 , v3 )

be a given vector field in Cartesian coordinates. Write down the expression

∂i vi

explicitly. The expression ∂i vi is called the divergence of the vector field v.
Evaluate the resulting expression for the vector field
\[ \mathbf{v} = (x - y,\; x^2 + y^2,\; xy). \tag{1.3} \]

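Solution. According to Einstein’s convention, the expression ∂i vi is a sum over i:
\[ \partial_i v_i = \partial_1 v_1 + \partial_2 v_2 + \partial_3 v_3 \]
or, equivalently,
\[ \partial_i v_i = \frac{\partial v_1}{\partial x} + \frac{\partial v_2}{\partial y} + \frac{\partial v_3}{\partial z}. \]
Substituting (1.3) we find
\[ \partial_1 v_1 = 1, \qquad \partial_2 v_2 = 2y, \qquad \partial_3 v_3 = 0, \tag{1.4} \]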

so that the divergence is

∂i vi = 1 + 2 y.
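
The result can be cross-checked numerically. The sketch below approximates each ∂i vi by a central difference and sums over i; the helper name `divergence`, the step h and the test point are assumptions made for this illustration.

```python
def v(x, y, z):
    """The vector field (1.3)."""
    return (x - y, x**2 + y**2, x * y)

def divergence(field, point, h=1e-5):
    """Approximate d_i v_i at `point` by central differences, summing over i."""
    total = 0.0
    for i in range(3):
        forward = list(point)
        backward = list(point)
        forward[i] += h
        backward[i] -= h
        total += (field(*forward)[i] - field(*backward)[i]) / (2.0 * h)
    return total

point = (0.3, -1.2, 2.0)        # arbitrary test point
numeric = divergence(v, point)
exact = 1.0 + 2.0 * point[1]    # the result 1 + 2y derived above
print(numeric, exact)
```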

Example 3: The Laplace operator

The Laplace operator or the Laplacian is an operator ∆ defined as

∆f = ∂i ∂i f,

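where f is an arbitrary object (a scalar function or a component of a vector). Find
the expression for the Laplacian in Cartesian coordinates.

Solution. Again, we use Einstein’s convention in a straightforward way:
\[ \Delta f = \partial_i \partial_i f = \partial_1 \partial_1 f + \partial_2 \partial_2 f + \partial_3 \partial_3 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2}. \]
Thus, the Laplacian reads
\[ \Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}. \]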
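
To see the contraction ∂i ∂i at work, one can approximate each second derivative by a second-order difference and sum over i. This is a sketch under stated assumptions: the test function, the step h and the helper name `laplacian` are chosen here for illustration only.

```python
def laplacian(f, point, h=1e-3):
    """Approximate d_i d_i f at `point` by second-order central differences."""
    total = 0.0
    for i in range(3):
        forward = list(point)
        backward = list(point)
        forward[i] += h
        backward[i] -= h
        total += (f(*forward) - 2.0 * f(*point) + f(*backward)) / h**2
    return total

def f(x, y, z):
    """Illustrative function; by hand, its Laplacian is 2*y + 6*z."""
    return x**2 * y + z**3

point = (1.0, 2.0, 3.0)
numeric = laplacian(f, point)
exact = 2.0 * point[1] + 6.0 * point[2]
print(numeric, exact)
```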

Example 4: radius vector

Let

r = (x, y, z)

be a position (radius) vector. Its magnitude is given by
\[ r = \sqrt{\mathbf{r} \cdot \mathbf{r}} = \sqrt{x^2 + y^2 + z^2}. \]
Find the derivatives of the magnitude r with respect to the Cartesian coordinates
and write down the result in the index notation.
Solution. We need to evaluate the quantities ∂i r. We start with ∂1 r, i.e. with the
derivative with respect to the coordinate x. We have
\[ \frac{\partial r}{\partial x} = \partial_x (x^2 + y^2 + z^2)^{1/2} = \frac{1}{2} (x^2 + y^2 + z^2)^{-1/2} \, 2x = \frac{x}{\sqrt{x^2 + y^2 + z^2}}. \]
Thus, we arrive at
\[ \frac{\partial r}{\partial x} = \frac{x}{r}. \]
Similarly, one can show that for the other coordinates the following holds:
\[ \frac{\partial r}{\partial y} = \frac{y}{r}, \qquad \frac{\partial r}{\partial z} = \frac{z}{r}. \]
All these results can be summarized in the index notation as
\[ \partial_i r = \frac{x_i}{r}. \]
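
The same result can also be obtained in one line of index notation. The derivation below is a sketch; it uses the Kronecker delta δij (equal to 1 for i = j and 0 otherwise), a symbol not introduced in the text above, to express ∂i xj = δij.

```latex
\begin{align*}
\partial_i r
  &= \partial_i \left( x_j x_j \right)^{1/2}
   = \frac{1}{2} \left( x_k x_k \right)^{-1/2} \partial_i \left( x_j x_j \right) \\
  &= \frac{1}{2r} \, 2\, x_j \, \partial_i x_j
   = \frac{1}{r} \, x_j \, \delta_{ij}
   = \frac{x_i}{r}.
\end{align*}
```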
Example 5: time dependence

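Now let the radius vector from the previous example be time-dependent in such a
way that
\[ x(t) = b \cos \omega t, \qquad y(t) = b \sin \omega t, \qquad z(t) = 0, \tag{1.5} \]
where b and ω are positive constants. Find the velocity v = ṙ, the acceleration
a = v̇ and their magnitudes.

Solution. Since r = (x, y, z), we can find the velocity by straightforward
differentiation:
\[ \dot{x} = -b \omega \sin \omega t, \qquad \dot{y} = b \omega \cos \omega t, \qquad \dot{z} = 0. \tag{1.6} \]
The velocity is then
\[ \mathbf{v} = (-b \omega \sin \omega t,\; b \omega \cos \omega t,\; 0). \]
Its magnitude is
\[ v = \sqrt{\mathbf{v} \cdot \mathbf{v}} = \sqrt{(b \omega \sin \omega t)^2 + (b \omega \cos \omega t)^2} = \omega b. \]
Differentiating the velocity, we find the acceleration,
\[ \mathbf{a} = (-b \omega^2 \cos \omega t,\; -b \omega^2 \sin \omega t,\; 0), \]
and its magnitude:
\[ a = \sqrt{\mathbf{a} \cdot \mathbf{a}} = \omega^2 b. \]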
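
A short numerical check of this example, given as a sketch with illustrative values of the radius b and the angular frequency ω: the magnitudes of the velocity and of the acceleration should be the same at every time. The comparison with −ω² r in the usage note is an additional observation, not a statement from the text.

```python
import math

def position(t, b, omega):
    return (b * math.cos(omega * t), b * math.sin(omega * t), 0.0)

def velocity(t, b, omega):
    return (-b * omega * math.sin(omega * t), b * omega * math.cos(omega * t), 0.0)

def acceleration(t, b, omega):
    return (-b * omega**2 * math.cos(omega * t),
            -b * omega**2 * math.sin(omega * t),
            0.0)

def magnitude(u):
    """|u| = sqrt(u_i u_i)."""
    return math.sqrt(sum(ui * ui for ui in u))

b, omega = 2.0, 3.0   # illustrative values
for t in (0.0, 0.4, 1.3):
    speed = magnitude(velocity(t, b, omega))
    accel = magnitude(acceleration(t, b, omega))
    print(t, speed, accel)   # speed stays at omega*b, accel at omega**2 * b
```

Comparing the components also shows that the acceleration equals −ω² r, so it always points from the body towards the origin.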

Example 6: The total differential

Prove the relation
\[ \frac{d}{dt} v^2 = 2 \, \mathbf{v} \cdot \dot{\mathbf{v}}, \tag{1.7} \]
preferably in the index notation.

Standard solution. First we prove (1.7) in the usual way. If the components of v
are
\[ \mathbf{v} = (v_1, v_2, v_3), \]
its total derivative with respect to time is
\[ \dot{\mathbf{v}} = (\dot{v}_1, \dot{v}_2, \dot{v}_3). \]
The squared magnitude of v is
\[ v^2 = v_1^2 + v_2^2 + v_3^2. \]
Let us differentiate it with respect to time:
\[ \frac{d}{dt} v^2 = 2 v_1 \dot{v}_1 + 2 v_2 \dot{v}_2 + 2 v_3 \dot{v}_3 = 2 \, \mathbf{v} \cdot \dot{\mathbf{v}}. \]

Solution in index notation. The proof is essentially identical to the previous one,
but more compact:
\[ \frac{d}{dt} v^2 = \frac{d}{dt} (v_i v_i) = 2 v_i \dot{v}_i = 2 \, \mathbf{v} \cdot \dot{\mathbf{v}}. \]

1.3 Kinetic energy


The notion of energy is somewhat subtle and its full understanding relies on
Emmy Noether’s theorems, see sections 4.7 and 4.8 in chapter 4. In mechanics,
however, the situation is quite simple. Roughly speaking, a body has energy if
it can perform work.
A force can cause the displacement of a body, and the quantity known as “work”
is a quantitative characteristic of this process. Suppose that the force F causes the
displacement of the body along a given trajectory from point A to point B. The work
done by this force is defined by
\[ W = \int_A^B \mathbf{F} \cdot d\mathbf{r}. \]

Suppose, in addition, that the body was at rest at point A, while the final velocity
of the body was v. Let us evaluate the work done by the force in terms of the final
velocity. Using Newton’s law in the form
\[ \mathbf{F} = m \dot{\mathbf{v}} \]
we have
\[ W = \int_A^B m \frac{d\mathbf{v}}{dt} \cdot d\mathbf{r}. \]
Notice that the infinitesimal displacement dr is related to the velocity via
\[ d\mathbf{r} = \mathbf{v} \, dt, \]
so that (dv/dt) · dr = v · dv and
\[ W = m \int_A^B \mathbf{v} \cdot d\mathbf{v}. \]
Now we can use relation (1.7), written in the form v · dv = ½ d(v²), to find
\[ W = m \int_A^B \frac{1}{2} \, d(v^2). \]
The last expression integrates to
\[ W = \frac{1}{2} m \left( v_B^2 - v_A^2 \right) = \frac{1}{2} m v^2, \]
where we have used the assumptions
\[ v_A = 0, \qquad v_B = v. \]

Let us analyze the result
\[ W = \frac{1}{2} m v^2 \]
in some detail. First, we can see that the work W done by the force F does not
depend on the trajectory. It does not matter whether the body was moving along
a straight line or along a curved trajectory; the result depends only on the final
velocity v. Moreover, the work does not depend on the character of the motion: the
body could be accelerated uniformly, with constant acceleration, or it could be
accelerated with variable acceleration, but the work depends only on the final
velocity. And, finally, the work does not depend on the force. The force could be
small and act for a long time, or it could be large and act only for a moment, but
the work depends only on the final velocity.
Summa summarum, if the body was at rest at the beginning and had velocity v at
the end, the work needed to accelerate the body is always the same, regardless of
how it was accelerated. This work is called the kinetic energy and is defined by
\[ T = \frac{1}{2} m v^2 = \frac{1}{2} m \dot{x}_i \dot{x}_i. \tag{1.8} \]
The body has kinetic energy if it is moving, and the kinetic energy is equal to the
work which must be done by the force to accelerate the body from rest to velocity v.
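
The independence of the work from the way the body is accelerated can be illustrated numerically. The Python sketch below (one-dimensional motion; the function name, the two force profiles and all numbers are assumptions made for this illustration) accelerates the same body from rest to the same final velocity, once with a constant force and once with a linearly growing force, and sums up F dx in both cases.

```python
def accelerate_from_rest(force_of_time, mass, t_end, steps=1000):
    """Return (work, final velocity) for 1D motion starting from rest.

    The force is sampled at the midpoint of each time step; the work is the sum
    of F * dx, with dx = (average velocity over the step) * dt.
    """
    dt = t_end / steps
    v = 0.0
    work = 0.0
    for k in range(steps):
        force = force_of_time((k + 0.5) * dt)
        v_new = v + force / mass * dt            # F = m dv/dt
        work += force * 0.5 * (v + v_new) * dt   # dW = F dx, dx = v dt
        v = v_new
    return work, v

mass = 2.0   # illustrative value
w_const, v_const = accelerate_from_rest(lambda t: 2.0, mass, 3.0)            # steady push
w_ramp, v_ramp = accelerate_from_rest(lambda t: 4.0 * t / 3.0, mass, 3.0)    # growing push

print(v_const, v_ramp)                        # both final velocities are close to 3
print(w_const, w_ramp, 0.5 * mass * 3.0**2)   # both works are close to m v^2 / 2
```

With this choice of discretization each step contributes exactly m (v_new² − v²)/2, so the sum telescopes in the same way as the integral of d(v²) in the derivation above.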

1.4 Potential energy


The notion of potential energy is a subtle one and it cannot be defined for a general
system, as we will see later in this textbook. Suppose that there is some force acting
on the particle. It can be the gravitational force, the electromagnetic force or the
force which a spring exerts on a point mass attached to one of its endpoints. In the
cases just enumerated, we can give explicit expressions for the force. The magnitude
of the gravitational force is given by Newton’s law of gravitation
\[ F_g = G \, \frac{m_1 m_2}{r^2}, \]
where m1 and m2 are the masses of the bodies, r is their distance and G is the
gravitational constant. The electromagnetic force acting on a point charge q moving
at velocity v in an electromagnetic field characterized by the electric field E and
the magnetic field B is the so-called Lorentz force
\[ \mathbf{F}_{EM} = q \left( \mathbf{E} + \mathbf{v} \times \mathbf{B} \right). \]
When the spring is displaced from its equilibrium position by y, it exerts the force
\[ F = -k y, \]
where k is the constant characterizing the spring. Hence, one way to characterize
the force is to give an explicit expression. Since the force is a vector quantity, we
have to specify its three components Fx, Fy and Fz.
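
Giving an explicit expression for a force translates directly into code. The sketch below evaluates the three components of the Lorentz force and the spring force; the function names and the numerical values are illustrative assumptions.

```python
def cross(a, b):
    """Vector product a x b of two 3-component vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def lorentz_force(q, e_field, velocity, b_field):
    """F_EM = q (E + v x B), component by component."""
    v_cross_b = cross(velocity, b_field)
    return tuple(q * (e + vxb) for e, vxb in zip(e_field, v_cross_b))

def spring_force(k, y):
    """Restoring force F = -k y of a spring displaced by y."""
    return -k * y

# Illustrative values in SI units: unit charge moving along x in a field B along z.
velocity = (1.0, 0.0, 0.0)
force = lorentz_force(q=1.0, e_field=(0.0, 0.0, 0.0),
                      velocity=velocity, b_field=(0.0, 0.0, 1.0))
print(force)                     # (0.0, -1.0, 0.0)
print(spring_force(10.0, 0.5))   # -5.0
```

In this example the electric field vanishes, and the resulting force is perpendicular to the velocity, so F · v = 0.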
In the previous section we introduced the kinetic energy, a quantity characterizing
the state of motion. Recall that it is equal to the work which is necessary to
accelerate a body of mass m from rest to motion at velocity v. An important feature
of kinetic energy is that it does not depend on the process by which the body
acquired its velocity.
Now, suppose that the body is under the influence of a force so that its velocity is
changing. This is connected to a corresponding change of the kinetic energy of the
body. For example, a body released from some altitude undergoes a change from
zero velocity to accelerated motion, called free fall, under the influence of the
gravitational force. In this case it is the gravitational field which performs the work
on the body, and this work is equal to the change of the kinetic energy of the body.
Thus, another way to characterize the force is to specify how the kinetic energy of
the body changes under this force. Potential energy will therefore be defined as the
work performed by the force acting on the body.
Suppose that under the influence of the force, the body was displaced from point
A to point B along the trajectory γ depicted in figure 1.3. The work performed by the
force during this motion is, as usual, defined by
\[ W = \int_{\gamma} \mathbf{F} \cdot d\mathbf{r}, \tag{1.9} \]

where the symbol γ denotes the trajectory along which the body was moving.
However, if we choose any other curve γ′ (figure 1.3), the work associated with this
curve will be, in general, different:
\[ W' = \int_{\gamma'} \mathbf{F} \cdot d\mathbf{r} \neq W. \tag{1.10} \]
In such a case, the notion of potential energy is useless because it depends on the
particular trajectory. Hence, in general, potential energy is a meaningless quantity.
Surprisingly enough, there are many examples of forces for which the work W in fact
does not depend on the choice of the trajectory γ. Such forces are called conservative
or potential forces, and in such cases we can define a useful and meaningful
potential energy.

Fig. 1.3. Under the influence of force F, the body moves from point A to point B.
Potential energy is the work done by the force during this displacement. However,
there are infinitely many trajectories (such as γ and γ′) connecting these two points
and the work is, in general, different for each of them.

Let us find which forces have this property. We demand that the integral (1.9)
does not depend on γ but only on the points A and B, and investigate the
consequences of this assumption. The work performed along any closed loop must
then be equal to zero. Indeed, let γ be an arbitrary closed loop as depicted in figure
1.4. Let us choose two arbitrary points A and B lying on the curve. In this way we
obtain two curves: γ1, which is the part of γ going from A to B, and γ2, which goes
from B back to A along a different trajectory. The integral over γ can be written as
the sum
\[ \oint_{\gamma} \mathbf{F} \cdot d\mathbf{r} = \left( \int_{\gamma_1} + \int_{\gamma_2} \right) \mathbf{F} \cdot d\mathbf{r}. \tag{1.11} \]

We have assumed that for any two points A and B the integral between these two
points does not depend on the trajectory but only on the points themselves. Thus,
we can write
\[ \int_{\gamma_1} \mathbf{F} \cdot d\mathbf{r} = \int_A^B \mathbf{F} \cdot d\mathbf{r}, \]
where we do not specify the trajectory as the integral does not depend on it. A similar
consideration applies to the integral over γ2, but notice that this curve starts at
point B and ends at point A. Hence,
\[ \int_{\gamma_2} \mathbf{F} \cdot d\mathbf{r} = \int_B^A \mathbf{F} \cdot d\mathbf{r} = - \int_A^B \mathbf{F} \cdot d\mathbf{r}. \]
Therefore, the two integrals on the right hand side of (1.11) have the same magnitude
but opposite signs, and so we arrive at
\[ \oint_{\gamma} \mathbf{F} \cdot d\mathbf{r} = 0 \tag{1.12} \]

as claimed. Conversely, we leave it to the reader to show that if (1.12) holds for an
arbitrary closed loop γ, then the integral between any two points necessarily does
not depend on the trajectory. Conservative forces are those for which (1.12) holds.
There is yet another formulation of the fact that the force is conservative. This
last formulation is convenient for practical purposes because it is a differential rather
than an integral criterion for a force to be conservative. If $\gamma$ is an arbitrary closed loop and
the force is conservative, i.e. (1.12) holds, we can use the Stokes theorem to convert
the line integral into a surface integral:

$$ \oint_{\gamma} \mathbf{F}\cdot d\mathbf{r} = \int_{S(\gamma)} (\nabla \times \mathbf{F}) \cdot d\mathbf{S}, \tag{1.13} $$

where $S(\gamma)$ is a surface bounded by the loop $\gamma$. Then, by the conservative character
of $\mathbf{F}$, the Stokes theorem implies

$$ \int_{S(\gamma)} (\nabla \times \mathbf{F}) \cdot d\mathbf{S} = 0 $$

for an arbitrary loop $\gamma$. But since the choice of $\gamma$ is arbitrary, the last equality can hold
for all possible loops only if the integrand vanishes everywhere, i.e.

$$ \nabla \times \mathbf{F} = 0. \tag{1.14} $$

In other words, the curl of a conservative field $\mathbf{F}$ is necessarily zero. Poincaré's lemma
then asserts that any vector field with vanishing curl (on a simply connected domain) is the gradient of some
scalar field $\phi$,

$$ \mathbf{F} = - \nabla\phi, \tag{1.15} $$

so that the components of the force are given by partial derivatives of the function $\phi$:

$$ F_i = - \frac{\partial \phi}{\partial x_i} \equiv - \partial_i \phi. \tag{1.16} $$

The minus sign is conventional.
Fig. 1.4. Let $\gamma$ be any closed trajectory (a loop) and $A$ and $B$ any two of its points, which split the
curve $\gamma$ into a union of curves $\gamma_1$ and $\gamma_2$, $\gamma = \gamma_1 \cup \gamma_2$.

Let us recapitulate. We have discussed the notion of the work performed on the
body by the external force field in which the body is moving. We have argued
that this work in general depends not only on the initial and final positions but also
on the trajectory. Then we defined a special class of forces for which this is not
true and the work is actually path-independent, and called such forces conservative
or potential. We have found four equivalent criteria for the force to be conservative:
• For arbitrary points $A$ and $B$, the integral

$$ W = \int_A^B \mathbf{F}\cdot d\mathbf{r} \tag{1.17} $$

does not depend on the trajectory between points $A$ and $B$.

• For an arbitrary closed curve $\gamma$, the integral $W$ vanishes,

$$ \oint_{\gamma} \mathbf{F}\cdot d\mathbf{r} = 0. \tag{1.18} $$

• The curl of the force field vanishes,

$$ \nabla \times \mathbf{F} = 0. \tag{1.19} $$

• Force field $\mathbf{F}$ is (minus) the gradient of some scalar function,

$$ \mathbf{F} = - \nabla\phi. \tag{1.20} $$

Function $\phi$, if it exists, is called the potential of the vector field $\mathbf{F}$ or simply the potential
energy.

Sometimes, especially in the context of Lagrange's and Hamilton's formalism, the
potential is denoted by $V$ instead of $\phi$. This convention will be followed later in the
textbook.
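The differential criterion (1.19) is easy to test by computer. The following Mathematica sketch is an illustration only: the Kepler-type potential phi, the constant k and the rotational field Fnon are examples chosen here, not taken from the text. It relies on the built-in functions Grad, Curl and Integrate.

```mathematica
(* Assumed example: a Kepler-type potential with an arbitrary constant k *)
phi = -k/Sqrt[x^2 + y^2 + z^2];

(* Force derived from the potential, F = -grad phi *)
Fcons = -Grad[phi, {x, y, z}];

(* The curl of a gradient field should vanish identically *)
curlCons = Simplify[Curl[Fcons, {x, y, z}]]

(* Assumed counter-example: a field circulating around the z-axis *)
Fnon = {-y, x, 0};
curlNon = Curl[Fnon, {x, y, z}]

(* Work of Fnon along the unit circle in the xy-plane, parametrized by s *)
circle = {Cos[s], Sin[s], 0};
loopWork = Integrate[
  (Fnon /. Thread[{x, y, z} -> circle]) . D[circle, s], {s, 0, 2 Pi}]
```

For the first field the curl should come out as the zero vector. For the second field the curl is (0, 0, 2) and the loop integral is 2π rather than zero, so criteria (1.18) and (1.19) fail together, as the equivalence of the four conditions requires.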
To conclude this section we repeat what the potential energy is. This term can
be defined only for conservative forces, i.e. for forces satisfying one of the equivalent
conditions¹ (1.17)–(1.20). Then, by (1.20), there exists a function $\phi$ such that $\mathbf{F} = -\nabla\phi$.
This function is, by definition, called potential energy. Recall that our original
motivation was to characterize the field not by the force but by the work which
the force performs during the motion of the body. This work, for conservative forces, is
directly related to the potential:

$$ W = \int_A^B \mathbf{F}\cdot d\mathbf{r} = - \int_A^B (\nabla\phi)\cdot d\mathbf{r} = - \int_A^B d\phi = \phi_A - \phi_B. \tag{1.21} $$

Thus, the work performed by the force is equal to the difference of the values of the
potential at the initial and at the final point of the trajectory. In the proof we have
used the identity

$$ d\phi = \partial_i \phi \, dx_i = (\nabla\phi)\cdot d\mathbf{r}. $$
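Relation (1.21) can be illustrated by integrating the same force along two different paths. In the Mathematica sketch below, the planar potential phi, the two paths from (0, 0) to (1, 1) and the helper function work are hypothetical choices made for this illustration only.

```mathematica
(* Assumed example potential in the plane *)
phi = x^2 y + y^3;
force = -Grad[phi, {x, y}];

(* Work along a path given as {x(s), y(s)} with parameter s in [0, 1] *)
work[path_] := Integrate[
  (force /. Thread[{x, y} -> path]) . D[path, s], {s, 0, 1}];

w1 = work[{s, s}];      (* straight line from (0,0) to (1,1) *)
w2 = work[{s, s^2}];    (* parabola between the same points *)

(* Difference of the potential, phi_A - phi_B *)
w3 = (phi /. {x -> 0, y -> 0}) - (phi /. {x -> 1, y -> 1});

{w1, w2, w3}
```

All three numbers agree (each equals −2 for this particular potential), in accordance with the path-independence of the work of a conservative force.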

1.5 Conservation of energy


The adjective “conservative”, introduced to name forces which display properties (1.17)–(1.20),
reflects the fact that the energy of a body moving in such a force field is constant.
We define the total mechanical energy of the body in the conservative field $\mathbf{F} = -\nabla\phi$ by

$$ E = T + \phi. \tag{1.22} $$

This quantity is constant in time. Before we prove this statement, notice that by
Newton's second law and the definition of the potential, the acceleration of the body
is

$$ \mathbf{a} = \frac{d\mathbf{v}}{dt} = \frac{1}{m}\,\mathbf{F} = - \frac{1}{m}\,\nabla\phi. $$
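Let us differentiate the total energy with respect to time:

$$ \frac{dE}{dt} = \frac{d}{dt}\left( \frac{1}{2}\, m\, v^2 + \phi \right) = m\, \mathbf{v}\cdot\dot{\mathbf{v}} + \frac{d\phi}{dt} = - \mathbf{v}\cdot\nabla\phi + (\nabla\phi)\cdot\mathbf{v} = 0. \tag{1.23} $$

Here we have used $m\,\dot{\mathbf{v}} = -\nabla\phi$ and the chain rule $d\phi/dt = \partial_i\phi\,\dot{x}_i = (\nabla\phi)\cdot\mathbf{v}$.
Thus, the quantity $E$ is indeed constant in time, i.e. it is conserved.

¹ The equivalence means that if the force satisfies one of these conditions, it automatically satisfies
the remaining three conditions.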
Theorem 1. The mechanical energy of a system in which all forces are potential is
constant.
Later we will see that the conservation of energy is in fact a consequence of a
deeper principle that the laws of motion cannot depend on time, i.e. the laws are
the same at all times. We say that time is homogeneous. The precise meaning of this
statement will be clarified in sections 4.7 and 4.8.

1.6 Conservation of momentum


Consider a system of $N$ particles, each of which has position vector $\mathbf{r}_i$, velocity $\mathbf{v}_i = \dot{\mathbf{r}}_i$
and acceleration $\mathbf{a}_i = \dot{\mathbf{v}}_i = \ddot{\mathbf{r}}_i$, where $i = 1, 2, \dots N$. The mass of the $i$-th particle will be
denoted by $m_i$ and hence the momentum of the $i$-th particle is $\mathbf{p}_i = m_i \mathbf{v}_i$.
These particles interact with each other and, in addition, there can
be an external force acting on each particle, e.g. the gravitational force. The internal force
exerted on the $i$-th particle by the $j$-th particle will be denoted by $\mathbf{F}_{ij}$. In accordance with
the law of action and reaction, internal forces obey the relations

$$ \mathbf{F}_{ij} = - \mathbf{F}_{ji}. \tag{1.24} $$

According to the law of force, the total force exerted on the $i$-th particle is equal to the derivative
of its momentum, i.e.

$$ \frac{d\mathbf{p}_i}{dt} = \mathbf{F}_i + \sum_{j \neq i} \mathbf{F}_{ij}, \tag{1.25} $$

where the total force on the right hand side is a sum of the external force $\mathbf{F}_i$ and the internal
forces exerted by all other particles.
The total momentum of the system is a sum of the momenta of all particles,

$$ \mathbf{P} = \sum_i \mathbf{p}_i, $$

and its time derivative reads

$$ \dot{\mathbf{P}} = \sum_i \dot{\mathbf{p}}_i = \sum_i \mathbf{F}_i + \sum_i \sum_{j \neq i} \mathbf{F}_{ij}. $$
Now we use the law of action and reaction (1.24). In the expression

$$ \sum_i \sum_{j \neq i} \mathbf{F}_{ij} $$

we sum over all (ordered) pairs of particles. For each pair $(i, j)$ contributing
$\mathbf{F}_{ij}$ to the sum, there is a pair $(j, i)$ contributing $\mathbf{F}_{ji} = -\mathbf{F}_{ij}$. Hence,
the total sum of all internal forces is necessarily equal to zero and the time derivative
of the momentum reads

$$ \dot{\mathbf{P}} = \sum_i \mathbf{F}_i. \tag{1.26} $$

In other words, the total momentum changes only because of the external forces, and
the internal interaction does not contribute to the overall change of momentum. If there
are no external forces, the total momentum is constant,

$$ \dot{\mathbf{P}} = 0. \tag{1.27} $$

A system with no external forces is called isolated because it does not
interact with surrounding bodies. Thus, law (1.26) can be reformulated as follows.

Theorem 2. Total momentum of an isolated system of interacting particles is constant
in time (conserved).

Similarly to the case of energy, conservation of momentum is a consequence of a deeper
principle, namely the homogeneity of space. The notion of homogeneity and its relation to
conservation of momentum are discussed in detail in sections 4.7 and 4.8.

1.7 Conservation of angular momentum


For a single particle, as well as for a system of particles, the force impressed will cause
a change of the momentum. However, in the case of a system of $N$ particles, it
makes sense to distinguish two kinds of motion: translation, when the body changes
its position, and rotation.
In introductory courses of elementary physics it is explained that the rotational
effect of the force can be quantified by the so-called torque (or moment of force) with
respect to a fixed origin, defined by

$$ \mathbf{M} = \mathbf{r} \times \mathbf{F}. \tag{1.28} $$

Recall that the magnitude of the cross product is

$$ M = r\, F \sin\alpha, $$

where $\alpha$ is the angle between the two vectors. Because of the presence of the cross
product, the torque vanishes if the force is parallel to the position vector. In such a
case we expect that the force will not cause a rotation. On the contrary, the rotational effect
of the force will be maximal if vectors $\mathbf{r}$ and $\mathbf{F}$ are orthogonal, see figure 1.5.

Fig. 1.5. Rotational effect of the force $\mathbf{F}$ on the disk attached to a fixed point in its centre. Any
force $\mathbf{F}$ can be decomposed into the normal part $\mathbf{F}_n$ and the tangential part $\mathbf{F}_t$. Clearly, the normal
part $\mathbf{F}_n$ does not affect the rotation of the disk and only the tangential part $\mathbf{F}_t$ is responsible for rotation.
The magnitude of the tangential part is given by $F_t = F \sin\alpha$ and hence we define the torque by (1.28).

By the same argument we can arrive at the notion of angular momentum.
While the torque characterizes the rotational effect of the force exerted, angular momentum
characterizes the rotational state of motion. The angular momentum of the $i$-th particle
with respect to a fixed origin is defined as

$$ \mathbf{l}_i = \mathbf{r}_i \times \mathbf{p}_i. \tag{1.29} $$

The total angular momentum is then naturally

$$ \mathbf{L} = \sum_i \mathbf{l}_i. \tag{1.30} $$

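Let us take the time derivative of the total angular momentum:

$$ \dot{\mathbf{L}} = \sum_i \dot{\mathbf{l}}_i = \sum_i \left[ \dot{\mathbf{r}}_i \times \mathbf{p}_i + \mathbf{r}_i \times \dot{\mathbf{p}}_i \right]. $$

Since $\mathbf{p}_i = m_i \dot{\mathbf{r}}_i$, vectors $\dot{\mathbf{r}}_i$ and $\mathbf{p}_i$ are parallel and hence their cross product vanishes.
Thus, we have

$$ \dot{\mathbf{L}} = \sum_i \mathbf{r}_i \times \dot{\mathbf{p}}_i. $$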
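The vanishing of the first term can be checked symbolically for a single particle. The Mathematica sketch below is only an illustration under the assumption of constant mass m; the names rv, pv and lv are introduced here for the position, momentum and angular momentum of the particle.

```mathematica
(* Position of one particle as an arbitrary function of time *)
rv = {x[t], y[t], z[t]};

(* Momentum p = m dr/dt (constant mass m assumed) and angular momentum l = r x p *)
pv = m D[rv, t];
lv = Cross[rv, pv];

(* The term (dr/dt) x p vanishes because the two vectors are parallel *)
firstTerm = Simplify[Cross[D[rv, t], pv]]

(* Hence dl/dt - r x dp/dt should be the zero vector *)
check = Simplify[D[lv, t] - Cross[rv, D[pv, t]]]
```

Both results should be the zero vector, which is the single-particle version of the last formula.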

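Because $\dot{\mathbf{p}}_i$ is the total force acting on the $i$-th particle, we can see that the rate of change
of the angular momentum of the $i$-th particle is given by the torque of the total force acting
on this particle. However, we can proceed further and decompose $\dot{\mathbf{p}}_i$ into an external
force and the sum of internal forces (as in the previous section) to find

$$ \dot{\mathbf{L}} = \sum_i \mathbf{r}_i \times \mathbf{F}_i + \sum_i \sum_{j \neq i} \mathbf{r}_i \times \mathbf{F}_{ij}. $$

The internal contributions again cancel in pairs: by the action-reaction law (1.24), the pair $(i, j)$
contributes $\mathbf{r}_i \times \mathbf{F}_{ij} + \mathbf{r}_j \times \mathbf{F}_{ji} = (\mathbf{r}_i - \mathbf{r}_j) \times \mathbf{F}_{ij}$, which vanishes provided the
internal forces act along the line connecting the two particles. Under this assumption we conclude that the total
change of the angular momentum is

$$ \dot{\mathbf{L}} = \sum_i \mathbf{r}_i \times \mathbf{F}_i = \mathbf{M}, \tag{1.31} $$

where

$$ \mathbf{M} = \sum_i \mathbf{r}_i \times \mathbf{F}_i $$

is the total torque of external forces impressed on the system.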


Hence, internal forces do not contribute to the total change of the rotational state
of the system, i.e. internal forces cannot affect the total angular momentum. The only
reason why the system of particles can change its angular momentum is the presence
of external forces. Again, when no external forces are present and the system is
isolated, the total angular momentum is conserved.

Theorem 3. Total angular momentum of an isolated system of interacting particles
is constant.

1.8 Curvilinear coordinates


In this chapter we have introduced the familiar Cartesian coordinate system. In
Cartesian coordinates we assign a triple of numbers $(x, y, z)$, or $x_i$ where $i = 1, 2, 3$,
to each point of the space. In chapter 2 we will see that sometimes it is useful to
use a different coordinate system which is better adapted to the problem to be solved.
The motivation will be presented in chapter 2, but let us introduce the most common
coordinate systems here.
In geometry, coordinates different from the Cartesian ones are called curvilinear coordinates
because the axes associated with non-Cartesian coordinates are usually curves rather than
lines, see below. In mechanics we often refer to curvilinear coordinates as generalized
coordinates, in the sense that the Cartesian coordinates comprise only a special class of
more general coordinate systems. In this book we use the convention that the Cartesian
coordinates will always be denoted by the symbol $x$ and labelled by the Latin indices

$$ i, j, k, \dots $$

which take values $1, 2, \dots n$, where $n$ is the dimension of the space. For example, $n = 3$
for ordinary three-dimensional space and $n = 2$ for the plane. Later we will meet abstract
spaces with higher dimensions, e.g. the phase space. Hence, for $n = 3$ we have three
coordinates

$$ x = (x_1, x_2, x_3), \quad \text{or, for brevity, } x_i. $$

We do not specify the values of indices $i, j, k, \dots$ if the dimension is clear from the
context. Notice that the symbol $x$ without an index stands for the $n$-tuple of coordinates
$x_i$ where $i = 1, 2, \dots n$, in general. Occasionally, we use the standard notation

$$ x_1 = x, \quad x_2 = y, \quad x_3 = z, $$

if no confusion can arise.


Generalized coordinates will be denoted by the symbol $q$ and labelled by the Latin
indices

$$ a, b, c, \dots $$

which take values $1, 2, \dots n$, where $n$ is the dimension of the space. Again, the symbol $q$
stands for the $n$-tuple of coordinates $q_a$,

$$ q = (q_1, q_2, \dots q_n). $$

Hence, if we write $f = f(q)$, it means that the function $f$ depends on all generalized
coordinates $q_a$. Usually, if generalized coordinates have a direct geometrical meaning,
we use specific symbols for individual coordinates. For example, if $q_1$ has the meaning
of a distance, it will be denoted by $q_1 = r$; if $q_2$ is an angle, it will be denoted by $q_2 = \phi$.
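When dealing with a coordinate transformation from Cartesian coordinates to curvilinear
coordinates, we often need the Jacobi matrix of the transformation. The Jacobi matrix
$\mathbf{J}$ is the matrix of first derivatives of the Cartesian coordinates with respect to the
generalized ones. Elements of the Jacobi matrix are therefore defined by

$$ J_{ia} = \frac{\partial x_i}{\partial q_a} =
\begin{pmatrix}
\dfrac{\partial x_1}{\partial q_1} & \dfrac{\partial x_1}{\partial q_2} & \cdots & \dfrac{\partial x_1}{\partial q_n} \\
\dfrac{\partial x_2}{\partial q_1} & \dfrac{\partial x_2}{\partial q_2} & \cdots & \dfrac{\partial x_2}{\partial q_n} \\
\vdots & & & \vdots \\
\dfrac{\partial x_n}{\partial q_1} & \dfrac{\partial x_n}{\partial q_2} & \cdots & \dfrac{\partial x_n}{\partial q_n}
\end{pmatrix}. $$

Notice that the matrix itself is denoted by the bold symbol $\mathbf{J}$ while the elements of
the matrix are denoted by $J_{ia}$.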
Suppose that we are given the transformation

$$ x_i = x_i(q) $$

from curvilinear coordinates to the Cartesian coordinates. Notice that the last equation
is in fact an abbreviation for $n$ transformation relations. If we invert these
relations we arrive at the inverse coordinate transformation

$$ q_a = q_a(x) $$

with the Jacobi matrix $\bar{\mathbf{J}}$, whose elements are

$$ \bar{J}_{ai} = \frac{\partial q_a}{\partial x_i}. \tag{1.32} $$

Let us take the matrix product of the Jacobi matrix $\mathbf{J}$ and the matrix $\bar{\mathbf{J}}$ of the inverse
transformation. We find

$$ J_{ia}\, \bar{J}_{aj} = \frac{\partial x_i}{\partial q_a} \frac{\partial q_a}{\partial x_j} = \frac{\partial x_i}{\partial x_j} = \delta_{ij}, $$

where we have used the chain rule for partial derivatives. Since $\delta_{ij}$
are the components of the unit matrix, we have

$$ \mathbf{J} \cdot \bar{\mathbf{J}} = \mathbf{I}, $$

where $\mathbf{I}$ is the identity matrix. The last equality shows

$$ \mathbf{J}^{-1} = \bar{\mathbf{J}}, $$

i.e. the Jacobi matrices of the direct and inverse coordinate transformations are mutually
inverse.

1.8.1 Polar coordinates

Polar coordinates are defined in the plane rather than in three-dimensional space,
see figure 1.6. Let $(x, y)$ be the Cartesian coordinates of a given point with the position
vector $\mathbf{r}$. The distance of this point from the origin is denoted by $r$ and is related to
the Cartesian coordinates by

$$ r = \sqrt{x^2 + y^2}. $$

Now we denote the angle between the position vector and the $x$-axis by $\theta$, see figure
1.6. Then the pair $(r, \theta)$ constitutes the polar coordinates of the point under consideration.
Clearly, polar coordinates and Cartesian coordinates are related by the equations

$$ x = r \cos\theta, \qquad y = r \sin\theta. \tag{1.33} $$

The inverse transformation reads

$$ r = \sqrt{x^2 + y^2}, \qquad \theta = \arctan\frac{y}{x}, \tag{1.34} $$

where the formula for $\theta$ holds as written for $x > 0$; in the other quadrants the appropriate
branch of the arctangent must be taken.
In the notation introduced above, the Cartesian coordinates for the plane are

$$ x_1 = x \quad \text{and} \quad x_2 = y, $$

while the generalized coordinates $q_a$ (polar coordinates, in this case) are

$$ q_1 = r, \quad q_2 = \theta. $$

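For transformation (1.33), the Jacobi matrix is

$$ \mathbf{J} =
\begin{pmatrix}
\dfrac{\partial x}{\partial r} & \dfrac{\partial x}{\partial \theta} \\
\dfrac{\partial y}{\partial r} & \dfrac{\partial y}{\partial \theta}
\end{pmatrix}
=
\begin{pmatrix}
\cos\theta & -r \sin\theta \\
\sin\theta & r \cos\theta
\end{pmatrix}. \tag{1.35} $$

Fig. 1.6. Polar coordinates in the plane. Cartesian coordinates of the point are $(x, y)$, polar coordinates
are $(r, \theta)$, where $r$ is the distance of the point from the origin and $\theta$ is the angle between the radius-vector
and the $x$-axis.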
Let us see how this result can be obtained using Mathematica. First we define the function
Jacobi, which accepts the list of the Cartesian coordinates xs, the list of generalized
coordinates qs and the list of transformation rules rules. These rules are assumed to
be of the form

    { x1 -> ..., x2 -> ..., etc. }

where the dots express the Cartesian coordinates in terms of the generalized ones. The function
Jacobi can be defined, for example, as follows:

    In[1]:= (* x_i = x_i(q); element (i, a) of the result is the derivative of x_i with respect to q_a *)
            Jacobi[xs_, qs_, rules_] := D[xs /. rules, #] & /@ qs // Transpose

For our particular example of polar coordinates, this function should be called in the
following way:

    In[2]:= Jacobi[{x, y}, {r, θ}, {x -> r Cos[θ], y -> r Sin[θ]}] // MatrixForm

    Out[2]//MatrixForm=
            ( Cos[θ]   -r Sin[θ] )
            ( Sin[θ]    r Cos[θ] )

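We can see that the result is identical with the previous one. In the rest of this
chapter we will use the function Jacobi freely without explicitly mentioning it. Moreover,
we can call the function Inverse to find

$$ \mathbf{J}^{-1} =
\begin{pmatrix}
\cos\theta & \sin\theta \\
-\dfrac{\sin\theta}{r} & \dfrac{\cos\theta}{r}
\end{pmatrix}. $$

By (1.32), we can deduce the partial derivatives of the generalized coordinates with
respect to the Cartesian ones without actually calculating them:

$$ \frac{\partial r}{\partial x} = \cos\theta, \qquad \frac{\partial r}{\partial y} = \sin\theta, \qquad
\frac{\partial \theta}{\partial x} = - \frac{\sin\theta}{r}, \qquad \frac{\partial \theta}{\partial y} = \frac{\cos\theta}{r}. \tag{1.36} $$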
Now suppose that we want to describe the motion of a particle in polar
coordinates. Since the particle is moving, its Cartesian coordinates will depend on
time, $x_i = x_i(t)$, or explicitly

$$ x = x(t), \quad y = y(t). $$

The Cartesian components of the velocity are, as usual, $v_i = \dot{x}_i$, or explicitly

$$ v_x = \frac{dx}{dt}, \qquad v_y = \frac{dy}{dt}. $$

If the Cartesian coordinates depend on time, so do the polar coordinates, i.e. $q_a = q_a(t)$,
or explicitly

$$ r = r(t), \quad \theta = \theta(t). $$

The Cartesian components of the velocity in terms of polar coordinates read

$$ \dot{x} = \frac{d}{dt}\left( r(t) \cos\theta(t) \right) = \dot{r} \cos\theta - r\, \dot{\theta} \sin\theta, \qquad
\dot{y} = \frac{d}{dt}\left( r(t) \sin\theta(t) \right) = \dot{r} \sin\theta + r\, \dot{\theta} \cos\theta. \tag{1.37} $$

The squared magnitude of the velocity is then

$$ v^2 = \dot{x}_i \dot{x}_i = \dot{x}^2 + \dot{y}^2 = \dot{r}^2 + r^2 \dot{\theta}^2. \tag{1.38} $$
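Result (1.38) can also be reproduced in Mathematica. The following lines are a minimal sketch written for this purpose; the name v2 is introduced here.

```mathematica
(* Cartesian coordinates as functions of time, expressed through r(t) and θ(t) *)
x[t_] = r[t] Cos[θ[t]];
y[t_] = r[t] Sin[θ[t]];

(* Squared magnitude of the velocity *)
v2 = Simplify[x'[t]^2 + y'[t]^2]
```

The mixed terms proportional to r'[t] θ'[t] cancel, leaving r'[t]^2 + r[t]^2 θ'[t]^2.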

1.8.2 Spherical coordinates


Spherical coordinates are analogous to polar coordinates but they are defined in
three-dimensional space. The geometrical meaning of spherical coordinates is depicted in
figure 1.7. Again, $r$ is the distance of the point from the origin and $\theta$ is the angle between
the position vector and the $z$-axis. Next we project the position vector onto the $xy$-plane,
thus obtaining a vector $\mathbf{r}'$. The angle between this vector and the $x$-axis is denoted by $\phi$.
By simple geometry we find the transformation relations

$$ x = r \sin\theta \cos\phi, \qquad y = r \sin\theta \sin\phi, \qquad z = r \cos\theta. \tag{1.39} $$

The corresponding inverse relations read

$$ r = \sqrt{x^2 + y^2 + z^2}, $$
$$ \theta = \arctan\frac{\sqrt{x^2 + y^2}}{z} = \arccos\frac{z}{\sqrt{x^2 + y^2 + z^2}} = \arcsin\frac{\sqrt{x^2 + y^2}}{\sqrt{x^2 + y^2 + z^2}}, \tag{1.40} $$
$$ \phi = \arctan\frac{y}{x}. $$

The Jacobi matrix and its inverse are

$$ \mathbf{J} =
\begin{pmatrix}
\sin\theta \cos\phi & r \cos\theta \cos\phi & -r \sin\theta \sin\phi \\
\sin\theta \sin\phi & r \cos\theta \sin\phi & r \sin\theta \cos\phi \\
\cos\theta & -r \sin\theta & 0
\end{pmatrix}, $$
$$ \mathbf{J}^{-1} =
\begin{pmatrix}
\sin\theta \cos\phi & \sin\theta \sin\phi & \cos\theta \\
\dfrac{\cos\theta \cos\phi}{r} & \dfrac{\cos\theta \sin\phi}{r} & -\dfrac{\sin\theta}{r} \\
-\dfrac{\csc\theta \sin\phi}{r} & \dfrac{\csc\theta \cos\phi}{r} & 0
\end{pmatrix}, \tag{1.41} $$
where

$$ \csc x = \frac{1}{\sin x}. $$

Fig. 1.7. Spherical coordinates $(r, \theta, \phi)$ in three-dimensional space.
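The matrices (1.41) can be obtained with the function Jacobi introduced for polar coordinates. The sketch below repeats its definition so that it is self-contained; the names spherical, J, Jinv, unit and detJ are chosen here for illustration.

```mathematica
(* Jacobi as defined in the text: element (i, a) is the derivative of x_i by q_a *)
Jacobi[xs_, qs_, rules_] := D[xs /. rules, #] & /@ qs // Transpose;

(* Transformation (1.39) written as replacement rules *)
spherical = {x -> r Sin[θ] Cos[φ], y -> r Sin[θ] Sin[φ], z -> r Cos[θ]};

J = Jacobi[{x, y, z}, {r, θ, φ}, spherical];
Jinv = Simplify[Inverse[J]];

(* Consistency checks: the product should be the unit matrix,
   and the determinant of J the familiar factor r^2 Sin[θ] *)
unit = Simplify[J . Jinv]
detJ = Simplify[Det[J]]
```

The determinant vanishes for θ = 0 and θ = π, which is where the inverse matrix in (1.41) becomes singular through the factor csc θ.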
Components of the velocity can be calculated in the same way as in the previous
subsection. However, we can use Mathematica as in the following example:

    In[31]:= (* Cartesian coordinates expressed through r(t), θ(t), φ(t) *)
             x[t_] = r[t] Sin[θ[t]] Cos[φ[t]];
             y[t_] = r[t] Sin[θ[t]] Sin[φ[t]];
             z[t_] = r[t] Cos[θ[t]];
             Print["x' = ", x'[t]]
             Print["y' = ", y'[t]]
             Print["z' = ", z'[t]]
             Print["v^2 = ", Simplify[x'[t]^2 + y'[t]^2 + z'[t]^2]]

    x' = Cos[φ[t]] Sin[θ[t]] r'[t] + Cos[θ[t]] Cos[φ[t]] r[t] θ'[t] - r[t] Sin[θ[t]] Sin[φ[t]] φ'[t]

    y' = Sin[θ[t]] Sin[φ[t]] r'[t] + Cos[θ[t]] r[t] Sin[φ[t]] θ'[t] + Cos[φ[t]] r[t] Sin[θ[t]] φ'[t]

    z' = Cos[θ[t]] r'[t] - r[t] Sin[θ[t]] θ'[t]

    v^2 = r'[t]^2 + r[t]^2 (θ'[t]^2 + Sin[θ[t]]^2 φ'[t]^2)


The last line of the output shows that the squared magnitude of the velocity in spherical
coordinates is

$$ v^2 = \dot{r}^2 + r^2 \left( \dot{\theta}^2 + \sin^2\theta \, \dot{\phi}^2 \right). $$

2 Lagrange equations

2.1 Motivation
The basic equation of classical mechanics is Newton's law of force. If the force $\mathbf{F}$ acts on
a point mass $m$, this point mass undergoes an acceleration $\mathbf{a}$ according to the formula

$$ \mathbf{a} = \frac{\mathbf{F}}{m}. $$

In the previous chapter we have introduced a Cartesian coordinate system, in which
the law of force can be written in the form

$$ F_i = m\, \ddot{x}_i. \tag{2.1} $$

We can see that Newton's law is a differential equation of the second order. Solving this
equation we find three coordinates $x_i$ as functions of time,

$$ x_i = x_i(t). $$

However, equation (2.1) holds only in Cartesian coordinates. Since we are interested
in the motion of bodies in three-dimensional space $E^3$ or two-dimensional space
$E^2$, we can always introduce a Cartesian coordinate system, write down the equations of
motion and in principle we can also solve them. However, the Cartesian system is not
always the most convenient choice and there can be other coordinate systems which
are more appropriate. So, a natural question arises: what are the equations of motion
in an arbitrary coordinate system?
To illustrate why we need non-Cartesian coordinates, let us consider the following
example. The mathematical pendulum is a point mass $m$ attached to a fixed point called
the pivot via a rigid rod of length $r$, see figure 2.1. The Cartesian coordinates of the point mass
are $(x, y)$. The pendulum is subject to the gravitational force $\mathbf{F} = m \mathbf{g}$, where $\mathbf{g} = (0, -g)$ is
the gravitational acceleration. Thus, in order to find the equation of motion we have to
find the Cartesian components of the force $\mathbf{F}$ and insert them into Newton's law (2.1).
There is a problem, however: coordinates $x$ and $y$ are not independent. Since the rod
is assumed to be rigid, it has fixed length and, by the Pythagorean theorem, coordinates
$x$ and $y$ have to satisfy the equation

$$ x^2 + y^2 = r^2, \tag{2.2} $$

where $r$ is the length of the rod. This is not a dynamical equation, because it is not a
differential equation which can be solved for given initial conditions. Rather, it is an
algebraic equation which must be satisfied by any solution of the equations of motion.
Equations of this kind are called constraints and we say that coordinates $x$ and $y$
are constrained.

Fig. 2.1. Mathematical pendulum.

In other words, we have two equations of motion, one for each coordinate, but in
addition we have to satisfy the constraint (2.2). Instead of two equations we have
to solve three. The reason is that the Cartesian coordinates are not well adapted to
the problem at all. If the system is described by two independent coordinates, we
say that it has two degrees of freedom. But the constraint reduces the number of
degrees of freedom to one! This is natural, because the pendulum can move only along
the circle of radius $r$, and a circle is a one-dimensional object. Although we describe the
position of the pendulum by two coordinates, it has only one degree of freedom.
Can we describe the motion of the pendulum in such a way that it will have
manifestly only one degree of freedom? Definitely we can. The position of the pendulum
is uniquely determined by the angle of deflection $\theta$, see again figure 2.1. According
to that figure, the Cartesian coordinates $(x, y)$ are related to the angle $\theta$ by

$$ x = r \sin\theta, \qquad y = r \cos\theta. \tag{2.3} $$

This is similar to the polar coordinates introduced before; the exchange of sin and cos
comes from a different definition of the angle $\theta$. More important is that the quantity $r$ is
constant, it is not a variable. We can easily verify that the constraint (2.2) is satisfied
for any value of $\theta$:

$$ x^2 + y^2 = r^2 \sin^2\theta + r^2 \cos^2\theta = r^2 (\sin^2\theta + \cos^2\theta) = r^2. $$

We can see that if we describe the pendulum by the angle $\theta$, we do not have to care
about the constraint anymore, for it is automatically satisfied. We thus have the
single variable $\theta$, which corresponds to the fact that the pendulum has only one
degree of freedom.
This is certainly progress! In Cartesian coordinates we had two equations of
motion and one constraint. Now we have only one variable and no constraint. What
remains is to find the equation of motion. From figure 2.1 it is obvious that
the force $\mathbf{F}$ acting on the pendulum can be decomposed into two components $\mathbf{F}_t$ and
$\mathbf{F}_n$. Force $\mathbf{F}_n$ is the normal component, parallel to the rod. It causes the tension of
the rod, but since the rod is rigid, this has no effect on the motion of the pendulum.
On the other hand, the component $\mathbf{F}_t$ tangent to the trajectory causes the acceleration.
The magnitude of the tangent force is

$$ F_t = F \sin\theta = m\, g \sin\theta. $$

By Newton's law, this force causes a tangential acceleration $a_t$ of magnitude

$$ a_t = \frac{F_t}{m} = g \sin\theta. $$

The tangential acceleration is related to the angular acceleration $\ddot{\theta}$ by

$$ a_t = r\, \ddot{\theta}. $$

Since the tangential force always points back towards the equilibrium position, i.e. against
increasing $\theta$, we have $r\, \ddot{\theta} = - g \sin\theta$. The equation of motion of the mathematical pendulum
is therefore

$$ r\, \ddot{\theta} + g \sin\theta = 0 $$

or, in a slightly modified form,

$$ \ddot{\theta} + \frac{g}{r} \sin\theta = 0. \tag{2.4} $$
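Equation (2.4) is easily integrated numerically. The Mathematica sketch below is only an illustration: the numerical values of g and of the rod length (called len here), the initial deflection of 0.1 rad, the time interval and the names sol and small are assumptions made for this example. The numerical solution is compared with the small-angle approximation, in which sin θ is replaced by θ.

```mathematica
(* Assumed parameters: g in m/s^2, rod length in m *)
g = 9.81; len = 1.0;

(* Numerical solution of (2.4), released from rest at θ = 0.1 rad *)
sol = NDSolveValue[
   {θ''[t] + (g/len) Sin[θ[t]] == 0, θ[0] == 0.1, θ'[0] == 0},
   θ, {t, 0, 10}];

(* Small-angle approximation: harmonic oscillation with frequency Sqrt[g/len] *)
small[t_] := 0.1 Cos[Sqrt[g/len] t];

(* Compare both solutions graphically *)
Plot[{sol[t], small[t]}, {t, 0, 10}, AxesLabel -> {"t", "θ"}]
```

For such a small initial deflection the two curves are almost indistinguishable; increasing the initial angle makes the difference between the full equation and its linearization visible.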
The point is that the Cartesian coordinates are not always the most convenient.
We have seen that if we describe the pendulum by Cartesian coordinates we have to solve
two equations of motion and one constraint, i.e. three equations. But the pendulum
has only one degree of freedom and its description by two coordinates is redundant.
This redundancy is the reason why we have to impose the constraint. The problem can
be circumvented by an appropriate choice of coordinates. Choosing the angle $\theta$ as a single
coordinate we have eliminated the constraint and we have found a single equation
of motion. In this coordinate system the pendulum has manifestly one degree of freedom
and we do not have to impose the constraint.
The mathematical pendulum is a very simple system and we will analyze its properties
later on. We will see that despite its simplicity it possesses several non-trivial properties
and that its equation of motion cannot even be solved in terms of elementary functions. In physics and in modelling of
realistic situations we often meet systems which are much more complicated. The double
pendulum, for example, consists of two point masses: one is attached to the pivot, while
the second is attached to the first one. Analysis shows that the motion of the
double pendulum is chaotic. But in the case of the double pendulum it is not clear how to
find the equations of motion, and the procedure sketched above is more complicated.
The Lagrange formalism to be introduced in this chapter provides a systematic way
to derive equations of motion in an arbitrary curvilinear coordinate system.

2.2 Lagrange equations of the second kind


We start with the derivation of Lagrange equations of the second kind. Lagrange
equations of the first kind also exist, but they contain the constraints explicitly and we
will not study them in this text. Lagrange equations of the second kind eliminate the
constraints through the choice of an appropriate coordinate system. For simplicity we consider
only one particle of mass $m$. The result is easily generalized to more particles.

2.2.1 Generalized coordinates

In Cartesian coordinates, the law of force has the form

$$ F_i = m\, \ddot{x}_i. \tag{2.5} $$

We want to transform this equation into an arbitrary curvilinear coordinate system.
The new coordinates will be denoted by $q$ and labelled by indices $a = 1, 2, \dots n$, where $n$ is
not necessarily equal to 3. For example, as we have seen, the pendulum is described
by the single coordinate $\theta$. Variables $q_a$ are called generalized coordinates. Cartesian
coordinates are connected to generalized coordinates by relations of the form

$$ x_i = x_i(q), $$

where the symbol $q$ stands for the whole $n$-tuple $(q_1, \dots q_n)$. If this is too abstract for
the reader, equations (2.3) from the previous section can serve as an example of a
coordinate transformation. In the case of the pendulum, the Cartesian coordinates $x_i$ are $x$
and $y$, and the only generalized coordinate is $q_1 = \theta$.
Moreover, we assume that the previous relations can be inverted, i.e. we can express
the generalized coordinates as functions of the Cartesian ones:

$$ q_a = q_a(x). $$

Thus, the generalized coordinates are functions of the Cartesian coordinates and vice
versa. On the other hand, the Cartesian coordinates depend on time (they are solutions
of (2.5)), so the generalized coordinates must depend on time, too:

$$ q_a(t) = q_a(x(t)). $$

The total derivative of $q_a$ with respect to time can be obtained by the chain rule for
derivatives:

$$ \dot{q}_a = \frac{\partial q_a}{\partial x_i}\, \dot{x}_i. $$

This relation immediately implies

$$ \frac{\partial \dot{q}_a}{\partial \dot{x}_i} = \frac{\partial q_a}{\partial x_i}. \tag{2.6} $$
The total derivative of $x_i$ expressed in terms of generalized coordinates reads

$$ \dot{x}_i = \frac{\partial x_i}{\partial q_a}\, \dot{q}_a. \tag{2.7} $$

Notice that since $q_a$ depend on $x_i$, the quantity $\partial q_a / \partial x_i$ depends on $x_i$, too.
Similarly, $x_i$ depends on $q_a$ and therefore $\partial x_i / \partial q_a$ depends on $q_a$ as well.
We know that if $x_i$ is a Cartesian coordinate, then $\dot{x}_i$ is the $i$-th component of the
velocity, i.e. $v_i = \dot{x}_i$. Analogously, derivatives of $q_a$ with respect to time are called
generalized velocities. In the Lagrangian formalism, coordinates and the corresponding
velocities are treated as independent variables. In other words,

$$ \frac{\partial \dot{x}_i}{\partial x_j} = \frac{\partial \dot{q}_a}{\partial q_b} = 0. $$

2.2.2 Kinetic energy


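Kinetic energy expressed in the Cartesian coordinates is

$$ T = \frac{1}{2}\, m\, v^2 = \frac{1}{2}\, m\, \dot{x}_i \dot{x}_i. \tag{2.8} $$

Kinetic energy therefore depends on the Cartesian velocities, but it does not depend
on the Cartesian coordinates themselves:

$$ \frac{\partial T}{\partial x_i} = 0, $$

but

$$ \frac{\partial T}{\partial \dot{x}_i} = m\, \dot{x}_i. \tag{2.9} $$

Expression (2.8) for kinetic energy can be rewritten in terms of generalized coordinates
using (2.7):

$$ T = \frac{1}{2}\, m \left( \frac{\partial x_i}{\partial q_a}\, \dot{q}_a \right) \left( \frac{\partial x_i}{\partial q_b}\, \dot{q}_b \right) = \frac{1}{2}\, m\, \frac{\partial x_i}{\partial q_a} \frac{\partial x_i}{\partial q_b}\, \dot{q}_a \dot{q}_b. $$

Kinetic energy depends on the generalized velocities $\dot{q}_a$, but now it depends also on $q_a$,
because of the partial derivatives (recall the remarks below equation (2.7)),

$$ T = T(q, \dot{q}), \qquad \frac{\partial T}{\partial q_a} \neq 0, \qquad \frac{\partial T}{\partial \dot{q}_a} \neq 0. $$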
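As a concrete instance of the last formula, the kinetic energy in polar coordinates can be assembled from the Jacobi matrix of transformation (1.33). In the Mathematica sketch below the symbols rdot and θdot stand for the generalized velocities, treated as independent variables; these names, as well as xs, qs and qdot, are introduced here for illustration only.

```mathematica
(* Cartesian coordinates in terms of polar ones, and the list of generalized coordinates *)
xs = {r Cos[θ], r Sin[θ]};
qs = {r, θ};

(* Jacobi matrix: element (i, a) is the derivative of x_i by q_a *)
J = D[xs, {qs}];

(* Generalized velocities as independent symbols *)
qdot = {rdot, θdot};

(* T = (1/2) m (J.qdot).(J.qdot), i.e. the double sum over a and b *)
T = Simplify[1/2 m (J . qdot) . (J . qdot)]

(* Unlike in Cartesian coordinates, T now depends on the coordinate r *)
dTdr = D[T, r]
```

The result should be equivalent to m (rdot² + r² θdot²)/2, in agreement with (1.38), and its derivative with respect to r is m r θdot², which is non-zero in general.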
2.2.3 Generalized forces

The last ingredient necessary for the derivation of Lagrange equations is the notion
of generalized forces. Generalized forces are the components of the force $\mathbf{F}$ in the
curvilinear coordinate system. If $F_i$ are the Cartesian components of the force, then the
generalized forces are defined by

$$ Q_a = \frac{\partial x_i}{\partial q_a}\, F_i. \tag{2.10} $$

2.2.4 Derivation

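Now we are prepared to derive the Lagrange equations of the second kind. Newton's
law reads

$$ F_i = m\, \ddot{x}_i. $$

Using relation (2.9), the right hand side can be rewritten as

$$ F_i = \frac{d}{dt} \frac{\partial T}{\partial \dot{x}_i}. $$

Multiply this equation by $\partial x_i / \partial q_a$ to obtain

$$ \frac{\partial x_i}{\partial q_a}\, F_i = \frac{\partial x_i}{\partial q_a}\, \frac{d}{dt} \frac{\partial T}{\partial \dot{x}_i}. $$

On the left hand side we can see the generalized forces $Q_a$ according to relation
(2.10):

$$ Q_a = \frac{\partial x_i}{\partial q_a}\, \frac{d}{dt} \frac{\partial T}{\partial \dot{x}_i}. \tag{2.11} $$

Now we are going to rearrange the right hand side in order to eliminate the Cartesian
coordinates $x_i$.
Using the Leibniz rule¹, the right hand side can be rewritten as

$$ Q_a = \frac{d}{dt} \left( \frac{\partial x_i}{\partial q_a} \frac{\partial T}{\partial \dot{x}_i} \right) - \frac{\partial T}{\partial \dot{x}_i}\, \frac{d}{dt} \frac{\partial x_i}{\partial q_a}. \tag{2.12} $$

¹ The Leibniz rule is the product rule for differentiation. The derivative of the product $f g$ is $(f g)' = f' g + f g'$. We
use this rule in the form $f g' = (f g)' - f' g$.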
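The first term on the right hand side is, using the relation $\partial x_i / \partial q_a = \partial \dot{x}_i / \partial \dot{q}_a$
(which follows from (2.7) in the same way as (2.6) follows from the relation preceding it), equal to

$$ \frac{d}{dt} \left( \frac{\partial x_i}{\partial q_a} \frac{\partial T}{\partial \dot{x}_i} \right) = \frac{d}{dt} \left( \frac{\partial \dot{x}_i}{\partial \dot{q}_a} \frac{\partial T}{\partial \dot{x}_i} \right). $$

Recall that the kinetic energy depends on the Cartesian velocities $\dot{x}_i$ but it does not
depend on the coordinates. Then, by the chain rule, we have

$$ \frac{\partial T}{\partial \dot{q}_a} = \frac{\partial T}{\partial \dot{x}_i} \frac{\partial \dot{x}_i}{\partial \dot{q}_a} + \underbrace{\frac{\partial T}{\partial x_i}}_{0} \frac{\partial x_i}{\partial \dot{q}_a} = \frac{\partial T}{\partial \dot{x}_i} \frac{\partial \dot{x}_i}{\partial \dot{q}_a}. $$

Thus, equation (2.12) acquires the form

$$ Q_a = \frac{d}{dt} \frac{\partial T}{\partial \dot{q}_a} - \frac{\partial T}{\partial \dot{x}_i}\, \frac{d}{dt} \frac{\partial x_i}{\partial q_a}. \tag{2.13} $$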
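Now we want to eliminate the Cartesian coordinates from the second term of
equation (2.13). Consider the following identity:

$$ \frac{d}{dt} \frac{\partial x_i}{\partial q_a} = \frac{\partial}{\partial q_b} \left( \frac{\partial x_i}{\partial q_a} \right) \dot{q}_b + \frac{\partial}{\partial \dot{q}_b} \left( \frac{\partial x_i}{\partial q_a} \right) \ddot{q}_b. $$

The order of partial derivatives can be interchanged (partial derivatives commute):

$$ \frac{d}{dt} \frac{\partial x_i}{\partial q_a} = \frac{\partial}{\partial q_a} \left( \frac{\partial x_i}{\partial q_b}\, \dot{q}_b + \frac{\partial x_i}{\partial \dot{q}_b}\, \ddot{q}_b \right) = \frac{\partial}{\partial q_a} \frac{d x_i}{dt} = \frac{\partial \dot{x}_i}{\partial q_a}. $$

The second term in (2.13) then reads

$$ \frac{\partial T}{\partial \dot{x}_i}\, \frac{d}{dt} \frac{\partial x_i}{\partial q_a} = \frac{\partial T}{\partial \dot{x}_i} \frac{\partial \dot{x}_i}{\partial q_a} = \frac{\partial T}{\partial q_a}. $$

Substituting this equality into (2.13) we arrive at the final form of the equations of
motion:

$$ \frac{d}{dt} \frac{\partial T}{\partial \dot{q}_a} - \frac{\partial T}{\partial q_a} = Q_a. \tag{2.14} $$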

2.3 Lagrange equations


In the previous section we have derived, after some effort, the Lagrange equations of
the second kind (2.14). These equations are completely equivalent to Newton's law
of motion, but they are written in an arbitrary curvilinear coordinate system, while
Newton's law has its simple form in the Cartesian coordinates only. If we want to
derive the equations of motion for a particular system, we have to write down the expression
for kinetic energy $T$, transform it to an appropriate coordinate system, find the
components of generalized forces $Q_a$ and insert them into equations (2.14).
Note that while it is easy to find the expression for $T$ in generalized coordinates,
because it is a scalar function, generalized forces involve the calculation of the sum

$$ Q_a = \frac{\partial x_i}{\partial q_a}\, F_i. $$
There is, however, a special but very important case, when the forces $F_i$ are conservative. We know from elementary physics that the gravitational force or the electrostatic force can be written as the gradient of a scalar function called the potential. By definition, the force with components $F_i$ is called conservative if there exists a potential $V$ such that
\[ F_i = -\frac{\partial V}{\partial x_i} \equiv -\partial_i V, \tag{2.15} \]
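where the minus sign is conventional. What are the components of the generalized forces in such a case? The calculation is straightforward:
\[ Q_a = F_i\,\frac{\partial x_i}{\partial q_a} = -\frac{\partial x_i}{\partial q_a}\,\frac{\partial V}{\partial x_i} = -\frac{\partial V}{\partial q_a}. \]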
Thus, for conservative forces, the components of the generalized forces are simply the negative partial derivatives of the potential with respect to the generalized coordinates. Lagrange equations of the second kind then acquire the form
\[ \frac{d}{dt}\frac{\partial T}{\partial \dot{q}_a} - \frac{\partial T}{\partial q_a} = -\frac{\partial V}{\partial q_a}. \tag{2.16} \]
An important point is that the potential cannot depend on velocities. This is a consequence of the fact that conservative forces do not depend on the motion of the bodies on which they act (see footnote 2); they depend only on the configuration of the system, i.e. on the positions of the individual objects. In other words,
\[ \frac{\partial V}{\partial \dot{q}_a} = 0. \]
(Footnote 2: For example, the gravitational force depends only on the distance between the two objects, not on their velocities. The electromagnetic force, on the other hand, does depend on the velocity: the magnetic force is proportional to the cross product of the velocity and the magnetic field. Nevertheless, the concept of the Lagrangian is valid also for the electromagnetic force; this issue is explained later.)
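Now, rewrite equation (2.16) as
\[ \frac{d}{dt}\frac{\partial T}{\partial \dot{q}_a} - \frac{\partial (T - V)}{\partial q_a} = 0. \]
Since the potential does not depend on $\dot{q}_a$, we can also write
\[ \frac{d}{dt}\frac{\partial (T - V)}{\partial \dot{q}_a} - \frac{\partial (T - V)}{\partial q_a} = 0, \]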
because the term $\partial V/\partial \dot{q}_a$ which we added vanishes anyway. It is therefore useful to introduce a new scalar function, called the Lagrangian, by
\[ L = T - V. \tag{2.17} \]
We arrive at Lagrange equations in the form
\[ \frac{d}{dt}\frac{\partial L}{\partial \dot{q}_a} - \frac{\partial L}{\partial q_a} = 0. \tag{2.18} \]
Notice the terminology used: there exist Lagrange equations of the first kind, but we do not consider them in this text. In the previous section we derived the Lagrange equations of the second kind, which are equivalent to Newton's law of motion but are written in a generalized coordinate system. Equations (2.18) are called simply Lagrange equations. They are not completely equivalent to Newton's law, because we assumed that the forces are conservative. Gravitational and electrostatic forces are typical conservative forces. By contrast, friction and general electromagnetic forces are non-conservative, i.e. there is no potential $V$ from which they can be derived. If the system under consideration involves friction, we cannot find the Lagrangian of this system and we cannot use the Lagrange equations (2.18), but we can still use the Lagrange equations of the second kind (2.14). It is interesting that although the electromagnetic force is not conservative, a Lagrangian for it exists, as we will see later. Friction is not a fundamental force, however: it is the result of a complicated interaction between the molecules forming the surfaces of the bodies in contact. The electromagnetic force, on the other hand, is fundamental; it is one of the four basic forces in Nature. In fact, it is the most important force for us. Fortunately, it can be described in the Lagrange formalism, so that the Lagrange equations (2.18) are sufficient for the description of almost all physically relevant situations.
Lagrange equations have one important advantage compared to the Lagrange equations of the second kind: the equations of motion are derived from the single function $L$, and we do not have to calculate the generalized forces $Q_a$. In the following sections we show a few examples of how the Lagrange formalism works, and then we show how to implement the formalism in Mathematica.

2.4 Particle in homogeneous gravitational field


We start with a very simple example, the homogeneous gravitational field. A gravitational field is never exactly homogeneous, but near the surface of the Earth it is approximately constant. All bodies move with the constant gravitational acceleration $\mathbf{g}$, whose Cartesian components are
\[ \mathbf{g} = (0, -g), \]
where $g \doteq 9.81\ \mathrm{m\,s^{-2}}$. Thus, the gravitational acceleration always points downwards and has magnitude $g$. On the other hand, the Cartesian components of the acceleration are $(\ddot{x}, \ddot{y})$, so the equations of motion are
\[ \ddot{x} = 0, \qquad \ddot{y} = -g. \tag{2.19} \]

Let us see how the same result can be derived in the Lagrange formalism. The kinetic energy of the particle is
\[ T = \frac{1}{2}\, m\, \dot{x}_i \dot{x}_i = \frac{1}{2}\, m \left(\dot{x}^2 + \dot{y}^2\right). \]
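The gravitational force is
\[ \mathbf{F} = m\,\mathbf{g}, \]
so that
\[ F_1 = 0, \qquad F_2 = -m g. \]
In order to find the potential we have to solve the equations
\[ F_1 = -\frac{\partial V}{\partial x}, \qquad F_2 = -\frac{\partial V}{\partial y}. \]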
The first equation merely states that $V$ does not depend on $x$, i.e.
\[ V = V(y). \]
The second equation then reads
\[ -\frac{\partial V}{\partial y} = -m g, \]
which integrates to
\[ V = \int m g \, dy = m g y + \mathrm{const}. \]

The integration constant does not affect the equations of motion (why?), so we can set the constant to zero without loss of generality.
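One way to see why the constant is irrelevant is sketched below. The shifted potential $V'$ and the constant $c$ are notation introduced only for this remark; the argument uses nothing beyond the fact that the Lagrangian enters the equations of motion through its derivatives.

```latex
% Shift the potential by a constant c (notation assumed for this remark only)
V' = V + c
\quad\Longrightarrow\quad
L' = T - V' = L - c .
% The constant c depends neither on q_a nor on \dot{q}_a, hence
\frac{\partial L'}{\partial \dot{q}_a} = \frac{\partial L}{\partial \dot{q}_a},
\qquad
\frac{\partial L'}{\partial q_a} = \frac{\partial L}{\partial q_a},
% and L' and L give identical Lagrange equations:
\frac{d}{dt}\frac{\partial L'}{\partial \dot{q}_a} - \frac{\partial L'}{\partial q_a}
= \frac{d}{dt}\frac{\partial L}{\partial \dot{q}_a} - \frac{\partial L}{\partial q_a} = 0 .
```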
We have found the kinetic energy and the potential, so we can write down the Lagrangian, which is by definition
\[ L = T - V = \frac{1}{2}\, m \left(\dot{x}^2 + \dot{y}^2\right) - m g y. \tag{2.20} \]
Notice that Lagrange equations (2.18) are written in an arbitrary coordinate system. Our motivation was to introduce curvilinear coordinates, but these equations hold in the Cartesian system as well. Now the generalized coordinates are simply
\[ q_1 = x, \qquad q_2 = y, \]
and Lagrange equations read
\[ \frac{d}{dt}\frac{\partial L}{\partial \dot{x}_i} - \frac{\partial L}{\partial x_i} = 0. \]
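For the Lagrangian (2.20) we have
\[ \frac{\partial L}{\partial \dot{x}} = m \dot{x}, \qquad \frac{d}{dt}\frac{\partial L}{\partial \dot{x}} = m \ddot{x}, \]
\[ \frac{\partial L}{\partial \dot{y}} = m \dot{y}, \qquad \frac{d}{dt}\frac{\partial L}{\partial \dot{y}} = m \ddot{y}, \]
\[ \frac{\partial L}{\partial x} = 0, \qquad \frac{\partial L}{\partial y} = -m g. \]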
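Substituting these expressions into Lagrange equations (2.18) we arrive at the equations of motion:
\[ \frac{d}{dt}\frac{\partial L}{\partial \dot{x}} - \frac{\partial L}{\partial x} = 0 \quad\rightarrow\quad m \ddot{x} = 0, \]
\[ \frac{d}{dt}\frac{\partial L}{\partial \dot{y}} - \frac{\partial L}{\partial y} = 0 \quad\rightarrow\quad m \ddot{y} = -m g. \]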
We can see that the Lagrange equations reproduce the familiar equations of motion (2.19). Of course, for motion in a homogeneous gravitational field we can find the equations of motion more easily without the Lagrangian. But before we apply the formalism to more complicated problems, it is useful to see how it works in simple cases where we know the result even without Lagrange equations.

2.5 Harmonic oscillator


The harmonic oscillator is one of the most important models in physics. In mechanics it corresponds to the motion of an idealized spring, see figure 2.2. A point mass $m$ is connected to a fixed point via a massless spring. If the point mass is displaced from the equilibrium position, the spring exerts the force
\[ F = -k q, \]
where $q$ is the displacement. The minus sign reflects the fact that the force always acts in the direction opposite to the displacement. The constant $k$ is called the rigidity (stiffness) of the spring. According to Newton's law of motion, the acceleration is given by
\[ a = \frac{F}{m}. \]
Since the motion is one-dimensional, the only component of the previous equation is
\[ \ddot{q} = -\frac{k}{m}\, q. \]
The constant $k/m$ is usually denoted as
\[ \omega^2 = \frac{k}{m}, \]
so that the equation of motion is
\[ \ddot{q} + \omega^2 q = 0. \tag{2.21} \]

This equation appears in physics very frequently, even in situations not connected with the motion of a spring, e.g. oscillations of electric circuits or vibrations of atoms in a crystal lattice.
Let us find the Lagrangian for the harmonic oscillator. The kinetic energy is straightforward:
\[ T = \frac{1}{2}\, m \dot{q}^2. \]
The potential is defined by the relation
\[ F = -\frac{\partial V}{\partial q}, \]
and the integration yields
\[ V = -\int F \, dq = \int k q \, dq = \frac{1}{2}\, k q^2. \]
This expression is usually written in terms of the parameter $\omega$:
\[ V = \frac{1}{2}\, m \omega^2 q^2. \]
Thus, the Lagrangian is
\[ L = \frac{1}{2}\, m \dot{q}^2 - \frac{1}{2}\, m \omega^2 q^2. \tag{2.22} \]
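Lagrange equations are obtained in the usual way, and we find
\[ \frac{\partial L}{\partial \dot{q}} = m \dot{q}, \qquad \frac{d}{dt}\frac{\partial L}{\partial \dot{q}} = m \ddot{q}, \qquad \frac{\partial L}{\partial q} = -m \omega^2 q. \tag{2.23} \]
Inserting these expressions into (2.18) we find
\[ \frac{d}{dt}\frac{\partial L}{\partial \dot{q}} - \frac{\partial L}{\partial q} = 0 \quad\rightarrow\quad \ddot{q} + \omega^2 q = 0. \tag{2.24} \]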

Again, we arrived at expected equation of motion (2.21).
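Equation (2.21) is simple enough that Mathematica can solve it symbolically. The following is only a minimal sketch: the names eq, general, particular and the initial displacement q0 are chosen here for illustration and do not appear elsewhere in the text.

```mathematica
(* Assumption: \[Omega] is a constant and q[t] is the displacement of the oscillator *)
eq = q''[t] + \[Omega]^2 q[t] == 0;

(* General solution: a linear combination of Cos[\[Omega] t] and Sin[\[Omega] t]
   with two arbitrary constants *)
general = DSolve[eq, q[t], t]

(* Particular solution: released from rest at the (hypothetical) displacement q0 *)
particular = DSolve[{eq, q[0] == q0, q'[0] == 0}, q[t], t]
```

With these initial conditions the sine term drops out, so the particular solution should reduce to q0 Cos[\[Omega] t].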


Fig. 2.2. The equilibrium position of the spring corresponds to q = 0. The restoring force F acting on the mass m is proportional to the displacement, F = −kq.

2.6 Mathematical pendulum


So far we have considered Lagrange equations in Cartesian coordinates. Now we return to the example from the introduction to this chapter, the mathematical pendulum; recall figure 2.1. As we explained, it is more convenient to introduce polar coordinates via the relations
\[ x = r \sin\theta, \qquad y = r \cos\theta. \]
The kinetic energy is again given, in Cartesian coordinates, by
\[ T = \frac{1}{2}\, m\, \dot{x}_i \dot{x}_i = \frac{1}{2}\, m \left(\dot{x}^2 + \dot{y}^2\right). \]
Now we have to rewrite this expression in the polar coordinates $r$ and $\theta$. Since the rod of the pendulum is supposed to be perfectly rigid, the coordinate $r$ remains constant. On the other hand, the coordinate $\theta$ depends on time,
\[ \theta = \theta(t). \]
The derivatives of the Cartesian coordinates $x$ and $y$ with respect to time are therefore given by
\[ \dot{x} = r (\cos\theta)\, \dot{\theta} = r \dot{\theta} \cos\theta, \qquad \dot{y} = -r (\sin\theta)\, \dot{\theta} = -r \dot{\theta} \sin\theta. \tag{2.25} \]

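Next we insert these relations into $T$:
\[ T = \frac{1}{2}\, m \left( r^2 \dot{\theta}^2 \cos^2\theta + r^2 \dot{\theta}^2 \sin^2\theta \right) = \frac{1}{2}\, m r^2 \dot{\theta}^2 \left( \cos^2\theta + \sin^2\theta \right) = \frac{1}{2}\, m r^2 \dot{\theta}^2. \tag{2.26} \]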
What about the potential $V$? In our elementary analysis at the beginning of the chapter, we decomposed the force $F$ into tangent and normal components and realized that the normal component does not affect the motion, while the tangent component causes the angular acceleration. The decomposition of the force was easy here, but sometimes it can be very difficult and one has to find an appropriate way. In the Lagrange formalism, however, the procedure is straightforward (although it can be laborious).
The Cartesian components of the force are
\[ F_1 = 0, \qquad F_2 = m g. \]
Note that we do not include the minus sign, because the $y$-axis is oriented downwards (see figure 2.1). Now we can compute the generalized forces according to relation (2.10). Since we have only one generalized coordinate $\theta$, there is only one generalized force:
\[ Q = F_i\, \frac{\partial x_i}{\partial \theta} = F_1 \frac{\partial x}{\partial \theta} + F_2 \frac{\partial y}{\partial \theta} = -m g r \sin\theta. \]
The potential is then defined by
\[ Q = -\frac{\partial V}{\partial \theta}, \]
which integrates to
\[ V = -\int Q \, d\theta = \int m g r \sin\theta \, d\theta = -m g r \cos\theta. \]
The Lagrangian of the mathematical pendulum is therefore
\[ L = \frac{1}{2}\, m r^2 \dot{\theta}^2 + m g r \cos\theta. \tag{2.27} \]
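Once we have the Lagrangian, the equations of motion follow immediately from Lagrange equations (2.18):
\[ \frac{\partial L}{\partial \dot{\theta}} = m r^2 \dot{\theta}, \qquad \frac{d}{dt}\frac{\partial L}{\partial \dot{\theta}} = m r^2 \ddot{\theta}, \qquad \frac{\partial L}{\partial \theta} = -m g r \sin\theta, \]
\[ \frac{d}{dt}\frac{\partial L}{\partial \dot{\theta}} - \frac{\partial L}{\partial \theta} = 0 \quad\rightarrow\quad \ddot{\theta} + \frac{g}{r} \sin\theta = 0. \]
If we define
\[ \omega_0^2 = \frac{g}{r}, \]
the equation of motion acquires the form
\[ \ddot{\theta} + \omega_0^2 \sin\theta = 0. \tag{2.28} \]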
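For small deflections, equation (2.28) reduces to the harmonic oscillator equation (2.21). The short derivation below is a standard approximation added here as an illustration; the amplitude $\theta_0$ and the release-from-rest initial conditions are assumptions made for this remark.

```latex
% Taylor expansion of the sine for small angles
\sin\theta = \theta - \frac{\theta^3}{6} + \dots \approx \theta
\qquad (|\theta| \ll 1),
% so that (2.28) becomes the harmonic oscillator equation
\ddot{\theta} + \omega_0^2\, \theta = 0 ,
% whose solution for \theta(0) = \theta_0, \dot{\theta}(0) = 0 is
\theta(t) = \theta_0 \cos(\omega_0 t),
\qquad
T_0 = \frac{2\pi}{\omega_0} = 2\pi \sqrt{\frac{r}{g}} .
```

In this approximation the period $T_0$ does not depend on the amplitude; for larger deflections this is no longer true.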

We have seen how the Lagrange formalism can be applied to familiar problems to
obtain the equations of motion and we are ready for studying a new problem where
the equations of motion are unknown. We will illustrate the power of the formalism
on the example of the double pendulum. Before we analyse double pendulum, let us
see how the Lagrange formalism can be implemented in Mathematica.

2.7 Lagrange equations in Mathematica


In this section we present one possible way to derive Lagrange equations using Mathematica. The algorithm to be explained takes the Lagrangian $L$ and the list of generalized coordinates and velocities, and differentiates the Lagrangian in order to obtain the Lagrange equations. As an example we study the motion in a homogeneous gravitational field investigated in section 2.4.
The generalized coordinates are now
\[ q_1 = x, \qquad q_2 = y, \]
and the Lagrangian is
\[ L = \frac{1}{2}\, m \left(\dot{x}^2 + \dot{y}^2\right) - m g y. \]
Let us explicitly denote which variables depend on time:
\[ L = \frac{1}{2}\, m \left(\dot{x}(t)^2 + \dot{y}(t)^2\right) - m g\, y(t). \]
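In order to find the Lagrange equations
\[ \frac{d}{dt}\frac{\partial L}{\partial \dot{q}_a} - \frac{\partial L}{\partial q_a} = 0 \]
we have to evaluate the partial derivatives
\[ \frac{\partial L}{\partial \dot{x}(t)}, \qquad \frac{\partial L}{\partial \dot{y}(t)}, \qquad \frac{\partial L}{\partial x(t)}, \qquad \frac{\partial L}{\partial y(t)}, \]
and then calculate the total derivatives
\[ \frac{d}{dt}\frac{\partial L}{\partial \dot{x}(t)}, \qquad \frac{d}{dt}\frac{\partial L}{\partial \dot{y}(t)}. \]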
The derivative of the Lagrangian with respect to the coordinate x can be found by the command
D[ L, x[t] ]
where L is the Lagrangian written in Mathematica. Similarly, the derivative with respect to the velocity is simply
D[ L, x'[t] ]
An equivalent way to write the last command is
D[ L, D[x[t], t] ]
Since we need to differentiate this expression with respect to time again, we can write
D[ L, D[x[t], t], t ]
Hence, the Lagrange equation for the variable x can be written in the form
D[ L, D[x[t], t], t ] - D[L, x[t]] == 0
An analogous command can be constructed for the second variable y.
In order to make our code universal, we note that we have to perform the operation
D[ L, D[#, t], t] - D[L, #] == 0
for each generalized coordinate #, where # must be taken from the list of generalized coordinates. Hence, suppose that we have a list of generalized coordinates called qs, in our case
qs = { x[t], y[t] }
and the Lagrangian L. Then we can define the function

In[5]:= LagrangeEqs[qs_, L_] := D[L, D[#, t], t] - D[L, #] == 0 & /@ qs

and use it by invoking

In[6]:= LagrangeEqs[{x[t], y[t]}, 1/2 m (x'[t]^2 + y'[t]^2) - m g y[t]]

Out[6]= {m x''[t] == 0, g m + m y''[t] == 0}

As expected, Mathematica returns the correct set of equations of motion.

2.8 Solving the equations of motion of pendulum


In the case of the mathematical pendulum we can write
q = {\[Theta]};
v = {p};
L = 1/2 m r^2 p^2 + m g r Cos[\[Theta]];
where we have denoted p = θ̇. Again, running the rest of our code yields the correct equation of motion
{g m r Sin[\[Theta][t]] + m r^2 \[Theta]''[t] == 0}
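The same equation can also be obtained with the function LagrangeEqs from section 2.7, provided the time dependence of θ is written out explicitly. The sketch below repeats the definition so that it can be run on its own (with explicit parentheses around the pure function, purely for readability); the name Lpend is chosen here for illustration.

```mathematica
(* Definition from section 2.7, repeated so that the block is self-contained *)
LagrangeEqs[qs_, L_] := (D[L, D[#, t], t] - D[L, #] == 0 &) /@ qs

(* Pendulum Lagrangian (2.27); \[Theta][t] is the only generalized coordinate,
   m, g and r are treated as constant symbols *)
Lpend = 1/2 m r^2 \[Theta]'[t]^2 + m g r Cos[\[Theta][t]];

(* One equation of motion, one per generalized coordinate *)
LagrangeEqs[{\[Theta][t]}, Lpend]
```

The result should be the single equation g m r Sin[\[Theta][t]] + m r^2 \[Theta]''[t] == 0 quoted above.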
We can see that the mass can be canceled, and the equation of motion is (2.28):
\[ \ddot{\theta} + \omega_0^2 \sin\theta = 0. \tag{2.29} \]

This equation cannot be solved in closed form in terms of elementary functions, which means that we cannot write down a simple explicit solution. Fortunately, there exist numerical methods which allow us to find an approximate solution. In fact, Mathematica has many built-in methods for constructing numerical solutions of many types of differential equations; they are all encapsulated in the function NDSolve. A numerical solution cannot be obtained, however, unless the values of the constants are specified. Moreover, to find a particular solution we also have to provide the initial conditions. Our task is now to solve equation (2.29) numerically with appropriate initial conditions.
\[Omega]0 = 1; T0 = 2 \[Pi] / \[Omega]0;
eqs = { \[Theta]''[t] + \[Omega]0^2 Sin[\[Theta][t]] == 0,
\[Theta][0] == \[Pi]/4, \[Theta]'[0] == 0};
sol = NDSolve[ eqs, \[Theta][t], {t, 0, 2 T0}]

Here we first set the value of $\omega_0$ to 1 for simplicity. Moreover, we define the “period” $T_0 = 2\pi/\omega_0$, because we know that this relation holds for the harmonic oscillator; for the pendulum it serves only as a convenient time scale. Next we define the list of three equations,
\[ \ddot{\theta} + \omega_0^2 \sin\theta = 0, \qquad \theta(0) = \frac{\pi}{4}, \qquad \dot{\theta}(0) = 0. \]
The first of them is the equation of motion; the other two represent the initial conditions. The equation $\theta(0) = \pi/4$ means that the angle of deflection at time $t = 0$ is equal to $\pi/4$ (what position is the pendulum in?). The initial angular velocity $\dot{\theta}(0)$ has been set to zero. The solution is found by the function NDSolve, as claimed, where we specify
• system of equations to solve: eqs;
• unknown variable: θ[t];
• interval of integration: {t, 0, 2 T0}.
The result of NDSolve is something of the form
{{ \[Theta][t] -> InterpolatingFunction[....][t] }}
We can see that it is a replacement rule. According to this rule, any occurrence of θ[t] will be replaced by an interpolating function. When the function NDSolve constructs the solution, it finds only a finite number of values of the unknown function θ on the desired interval. We, however, want to evaluate the solution at an arbitrary time t, and it can happen that this time differs from all times used in the construction of the solution. For this reason, Mathematica has to "guess" the correct value of θ at that time. By "guessing" we mean interpolation between the nearest times at which the value of θ is known.
However, the details of this procedure are not important for us. What we need to know is that, in order to evaluate the solution at an arbitrary time t, we have to type
\[Theta][t] /. sol /. t -> 1
By itself, the symbol θ[t] has no value in Mathematica, but the rule sol replaces it by the function which is the solution of the equation of motion. The next rule then replaces the argument t by its concrete numerical value.
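An alternative that is sometimes more convenient is to ask NDSolve for the function θ itself rather than for the expression θ[t]; the result can then be used like any ordinary function. The sketch below assumes this variant; the names \[Theta]num and tt are chosen here for illustration.

```mathematica
\[Omega]0 = 1; T0 = 2 \[Pi]/\[Omega]0;
eqs = {\[Theta]''[t] + \[Omega]0^2 Sin[\[Theta][t]] == 0,
   \[Theta][0] == \[Pi]/4, \[Theta]'[0] == 0};

(* Solve for the function \[Theta] itself, not for the expression \[Theta][t] *)
sol = NDSolve[eqs, \[Theta], {t, 0, 2 T0}];

(* Extract the InterpolatingFunction from the first (and only) solution *)
\[Theta]num = \[Theta] /. First[sol];

\[Theta]num[1.0]    (* angle at t = 1 *)
\[Theta]num'[1.0]   (* angular velocity at t = 1 *)

(* Samples of the angle over the whole interval, in steps of T0/4 *)
Table[\[Theta]num[tt], {tt, 0, 2 T0, T0/4}]
```

The interpolating function can be evaluated only inside the interval on which the solution was constructed, here from 0 to 2 T0; outside it Mathematica issues a warning and extrapolates.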
Finally, we can visualise the solution by the command Plot. The complete code for solving the equation of motion and plotting the solution follows; the resulting picture is in figure 2.3.
\[Omega]0 = 1; T0 = 2 \[Pi] / \[Omega]0;
eqs = { \[Theta]''[t] + \[Omega]0^2 Sin[\[Theta][t]] == 0,
\[Theta][0] == \[Pi]/4, \[Theta]'[0] == 0};
sol = NDSolve[ eqs, \[Theta][t], {t, 0, 2 T0}]
Plot[ \[Theta][t] /. sol, {t, 0, 2 T0}]

Fig. 2.3. Numerical solution of the equation of the mathematical pendulum for the initial conditions θ(0) = π/4, θ̇(0) = 0 and the value ω0 = 1.
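The quantity T0 used above is the period of the harmonic oscillator, not of the pendulum. As a hedged aside: for a pendulum released from rest at the angle θ0, a standard result (not derived in this text) expresses the true period through the complete elliptic integral of the first kind. The sketch below evaluates it; the names \[Theta]0 and Texact are chosen here for illustration.

```mathematica
\[Omega]0 = 1; T0 = 2 \[Pi]/\[Omega]0;
\[Theta]0 = \[Pi]/4;  (* amplitude, equal to the initial angle used above *)

(* Period of the pendulum released from rest at the angle \[Theta]0.
   Note that EllipticK takes the parameter m = k^2, here Sin[\[Theta]0/2]^2. *)
Texact = 4/\[Omega]0 EllipticK[Sin[\[Theta]0/2]^2];

(* Ratio to the small-angle period T0; slightly larger than 1 *)
N[Texact/T0]
```

For the amplitude π/4 the ratio exceeds 1 by a few per cent, which is why the numerical solution does not return exactly to its initial value at t = T0.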

2.9 Deriving the Lagrangian in Mathematica


The code we have developed in the previous sections is able to find the equations of motion from an arbitrary Lagrangian, provided that the lists of generalized coordinates and velocities are specified. In the case of the mathematical pendulum in section 2.6 we have seen that sometimes the Lagrangian must be transformed into an appropriate coordinate system. Even this procedure can be automated in Mathematica.
As an example we use the mathematical pendulum again. Since the $y$-axis is oriented downwards (see section 2.6), the potential is $V = -m g y$ and the Lagrangian in Cartesian coordinates reads
\[ L = \frac{1}{2}\, m \left(\dot{x}^2 + \dot{y}^2\right) + m g y. \]
Recall that polar coordinates for the pendulum were introduced by
\[ x = r \sin\theta, \qquad y = r \cos\theta. \tag{2.30} \]
Recall, in addition, that only the coordinate θ depends on time. In Mathematica we type
x[t_] = r Sin[\[Theta][t]];
y[t_] = r Cos[\[Theta][t]];
L = 1/2 m ( x'[t]^2 + y'[t]^2) + m g y[t]
which yields
g m r Cos[\[Theta][t]] +
1/2 m (r^2 Cos[\[Theta][t]]^2 \[Theta]'[t]^2 +
r^2 Sin[\[Theta][t]]^2 \[Theta]'[t]^2)
This is correct but can be simplified:
x[t_] = r Sin[\[Theta][t]];
y[t_] = r Cos[\[Theta][t]];
L = Simplify[ 1/2 m ( x'[t]^2 + y'[t]^2)] + m g y[t]
Now the identity sin²θ + cos²θ = 1 is applied automatically and the result is
g m r Cos[\[Theta][t]] + 1/2 m r^2 \[Theta]'[t]^2
The reader can check that this result is identical to the Lagrangian (2.27).

2.10 Planet in gravitational field


In this section we study a new problem: the motion of a planet in the central gravitational field of a star. First we formulate the problem in physical terms, then we present its solution using Mathematica.
Suppose we have a massive star, e.g. the Sun, which is at rest in a given reference frame. The Sun produces a gravitational field which attracts all bodies towards its center. According to Newton's law of gravitation, a body of arbitrary mass $m$ moves with the acceleration
\[ \mathbf{a} = -G M\, \frac{\mathbf{r}}{r^3}, \tag{2.31} \]
where $G$ is Newton's gravitational constant, $M$ is the mass of the Sun and $\mathbf{r}$ is the position vector of the planet with respect to the Sun, see figure 2.4. We can choose units in such a way that
\[ G M = 4 \pi^2. \]
It is convenient to introduce polar coordinates in the standard way as
\[ x = r \cos\theta, \qquad y = r \sin\theta, \]
where both coordinates $r$ and $\theta$ depend on time (in general).

Fig. 2.4. Position of the planet (mass m, position vector r) with respect to the Sun (mass M).

It is easy to show that the potential of the gravitational force is
\[ V(r) = -4 \pi^2\, \frac{m}{r}, \tag{2.32} \]
where $m$ is the mass of the planet. The Lagrangian of the planet is then
\[ L = \frac{1}{2}\, m \left(\dot{x}^2 + \dot{y}^2\right) - V. \]
Let us find the expression for this Lagrangian in polar coordinates. The corresponding Mathematica code reads
x[t_] = r[t] Cos[\[Theta][t]];
y[t_] = r[t] Sin[\[Theta][t]];
L = Simplify[1/2 m ( x'[t]^2 + y'[t]^2)] + (4 \[Pi]^2 m)/r[t]
and yields
\[ L = \frac{1}{2}\, m \left(\dot{r}^2 + r^2 \dot{\theta}^2\right) + \frac{4 \pi^2 m}{r}. \]
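Given this Lagrangian, the equations of motion of the planet can be obtained with the function LagrangeEqs from section 2.7. The sketch below repeats the definition so that it runs on its own (with explicit parentheses around the pure function); the names Lplanet and planetEqs are chosen here for illustration, and the sign of the last term follows from L = T − V with V given by (2.32).

```mathematica
(* Definition from section 2.7, repeated so that the block is self-contained *)
LagrangeEqs[qs_, L_] := (D[L, D[#, t], t] - D[L, #] == 0 &) /@ qs

(* Lagrangian of the planet in polar coordinates, in units with G M = 4 \[Pi]^2 *)
Lplanet = 1/2 m (r'[t]^2 + r[t]^2 \[Theta]'[t]^2) + (4 \[Pi]^2 m)/r[t];

(* Two equations of motion: the first for r[t], the second for \[Theta][t] *)
planetEqs = LagrangeEqs[{r[t], \[Theta][t]}, Lplanet]
```

The equation for r contains the centrifugal term m r θ̇² and the gravitational term 4π² m / r². The equation for θ is the time derivative of m r² θ̇ set to zero, i.e. it expresses the conservation of angular momentum.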
3
Hamilton’s equations

3.1 Motivation
Let us recapitulate the advantages of Lagrange’s equations compared to the Newton
law of motion:
• Lagrange’s equations hold in arbitrary curvilinear coordinate system;
• the number of Lagrange’s equations is equal to the number of degrees of freedom
while in the Cartesian system we always have three equations for each particle
and possibly some additional constraints;
• the system is described by single scalar function called Lagrangian which simplifies
the transformation to generalized coordinate system.
In the context of classical mechanics, Lagrange's equations are equivalent to Newton's laws. Newton's laws turned out to be only approximately valid and have to be replaced by the theory of relativity and by quantum mechanics. Nevertheless, the formalism of Lagrange's equations can be applied even in those theories.
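A typical Lagrangian for one particle in Cartesian coordinates has the form
\[ L = T - V = \frac{1}{2}\, m\, \dot{x}_i \dot{x}_i - V(x). \tag{3.1} \]
Let us examine the structure of Lagrange's equations
\[ \frac{d}{dt}\frac{\partial L}{\partial \dot{x}_i} - \frac{\partial L}{\partial x_i} = 0 \]
compared to Newton's law of force in the form
\[ \frac{dp_i}{dt} = F_i. \]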
For the Lagrangian under consideration we have
\[ \frac{\partial L}{\partial \dot{x}_i} = m \dot{x}_i = p_i. \]
We can see that, in Cartesian coordinates, the derivative of the Lagrangian with respect to the velocity $\dot{x}_i$ is equal to the ordinary momentum
\[ p_i = m \dot{x}_i. \]
Lagrange's equations can then be written in the form
\[ \frac{dp_i}{dt} = \frac{\partial L}{\partial x_i}. \]
But for the Lagrangian (3.1) we have
\[ \frac{\partial L}{\partial x_i} = \frac{\partial}{\partial x_i}(T - V) = -\frac{\partial V}{\partial x_i} = F_i, \]
since the kinetic energy $T$ does not depend on $x_i$ and the potential is defined by the relation $F_i = -\partial V/\partial x_i$. With this observation we immediately see that Lagrange's equations are equivalent to Newton's law, for they acquire the form
\[ \frac{dp_i}{dt} = F_i. \]
Notice that all of this holds in the Cartesian coordinates only.
What about a general coordinate system $q_a$? We have seen, see equation (2.27), that the Lagrangian of the pendulum in polar coordinates is
\[ L = \frac{1}{2}\, m r^2 \dot{\theta}^2 + m g r \cos\theta, \]
where the single generalized coordinate is $\theta$ and the corresponding generalized velocity is $\dot{\theta}$. Now
\[ \frac{\partial L}{\partial \dot{\theta}} = m r^2 \dot{\theta}. \]
This expression is not equal to the momentum of the pendulum; nevertheless, there is a connection. The velocity of the pendulum is
\[ v = \omega r, \]
where $\omega = \dot{\theta}$ is the instantaneous angular velocity of the pendulum. Since $p = m v$, the actual momentum of the pendulum is
\[ p = m v = m r \dot{\theta}. \]
We can see that the quantities $p$ and $\partial L/\partial \dot{\theta}$ differ by a factor $r$. But this is a consequence of the choice of the coordinates only! Although $p$ and $\partial L/\partial \dot{\theta}$ are different, they are obviously related.
Thus, in the two cases we presented, the derivatives of the Lagrangian with respect to the generalized velocities are related to the momentum of the particle. As we have seen, in the Cartesian system they coincide, but in curvilinear coordinates they do not. Nevertheless, it seems reasonable to define the notion of momenta derived from the Lagrangian.
Let $L$ be an arbitrary Lagrangian depending on generalized coordinates $q_a$ and velocities $\dot{q}_a$, i.e. $L = L(q, \dot{q})$. We define the generalized momentum $p_a$ conjugate to the coordinate $q_a$:
\[ p_a = \frac{\partial L}{\partial \dot{q}_a}. \tag{3.2} \]
Lagrange's equations then acquire the form
\[ \frac{dp_a}{dt} = \frac{\partial L}{\partial q_a}. \]
If we know the actual position and momentum of the particle, we can calculate how the momentum varies in time. But how does the position change? We can find the answer only by solving Lagrange's equations to obtain the functions $q_a = q_a(t)$ and then calculating $\dot{q}_a$. It would be better, however, if we could write equations of the form
\[ \dot{q}_a = \text{something}, \qquad \dot{p}_a = \text{something else}. \tag{3.3} \]
If we describe the state of the system by the coordinates $q_a$ and the momenta $p_a$, Lagrange's equations give only the second part via
\[ \dot{p}_a = \frac{\partial L}{\partial q_a}. \tag{3.4} \]
But the Lagrangian is a function of $q_a$ and $\dot{q}_a$, so we can, in principle, use the definition of the generalized momentum
\[ p_a = \frac{\partial L}{\partial \dot{q}_a} \tag{3.5} \]
and invert it to obtain a relation of the form
\[ \dot{q}_a = \dot{q}_a(q, p). \]
In general, however, we cannot say anything more.


To summarize, Lagrange's equations are second-order differential equations for the unknown functions $q_a$. In the Lagrange formalism the independent variables are the coordinates $q_a$ and the velocities $\dot{q}_a$. We defined a generalized momentum $p_a$ by (3.2). Now we want to rewrite Lagrange's equations in such a way that the new equations have the form (3.3). We have seen that, using the momenta $p_a$, Lagrange's equations have the form (3.4) and constitute only half of the equations we want to find. The difficulty essentially is that the Lagrangian itself is a function of $q_a$ and $\dot{q}_a$, but now we want the independent variables to be $q_a$ and $p_a$. Hamilton's formalism provides a systematic way to obtain the desired equations. Before we present it, a remark on the Legendre transformation must be made.

3.2 Legendre transformation


In this section we formulate the problem in a more general way; in the following section we apply it to Lagrange's formalism. Suppose that a function $f$ of the variables $(x_1, \dots, x_n)$ is given,
\[ f = f(x_1, \dots, x_n) \equiv f(x). \]
Its total differential is
\[ df = \frac{\partial f}{\partial x_1}\, dx_1 + \dots + \frac{\partial f}{\partial x_n}\, dx_n, \]
or, using Einstein's summation convention,
\[ df = \frac{\partial f}{\partial x_i}\, dx_i. \]
It is an important point that the converse is also true. If $f$ is a function of some set of variables, and its total differential is found to be
\[ df = y_i\, dx_i, \]
we can deduce that $f$ is a function of the variables $x_i$ and that the relation
\[ \frac{\partial f}{\partial x_i} = y_i \]
holds.
Let us return to the expression
\[ df = \frac{\partial f}{\partial x_i}\, dx_i. \]
Now denote
\[ y_i = \frac{\partial f}{\partial x_i} \]
so that the differential $df$ acquires the form
\[ df = y_i\, dx_i. \]

Using the Leibniz rule we can write
\[ df = d(x_i y_i) - x_i\, dy_i. \]
This is equivalent to
\[ d(x_i y_i) - df = x_i\, dy_i \]
or, using the linearity of the differential,
\[ d(x_i y_i - f) = x_i\, dy_i. \]
Note that on the left-hand side we have the total differential of some function, which will be denoted by $g$:
\[ dg = x_i\, dy_i, \]
where $g = x_i y_i - f$. And this is what we wanted to achieve. The function $g$ is a function of $y_i$, because its differential contains only the differentials $dy_i$, which means that
\[ g = g(y). \]

Moreover, the relation
\[ \frac{\partial g}{\partial y_i} = x_i \]
holds.
Let us recapitulate the procedure. We started with a function $f = f(x)$ depending on the variables $x_i$. Then we defined new variables $y_i$ by
\[ y_i = \frac{\partial f}{\partial x_i}, \]
which means that the differential $df$ became
\[ df = y_i\, dx_i. \]
Finally we defined a new function
\[ g = x_i y_i - f \]
with the differential
\[ dg = x_i\, dy_i, \]
which means that the new function depends on the new variables $y_i$. The function $g$ is called the Legendre transformation of the function $f$. Thus, the Legendre transformation is a procedure to transform a function $f = f(x)$ into a new function $g = g(y)$, where $y_i = \partial_i f$.
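As a simple illustration of the procedure, consider a quadratic function of a single variable. The constant $a \neq 0$ is an assumption introduced only for this example.

```latex
% Starting function of one variable x (a is an assumed nonzero constant)
f(x) = \tfrac{1}{2}\, a\, x^2 .
% New variable y and its inversion
y = \frac{\partial f}{\partial x} = a x
\quad\Longrightarrow\quad
x = \frac{y}{a} .
% Legendre transformation, expressed purely through y
g(y) = x y - f = \frac{y^2}{a} - \frac{1}{2}\, a \left(\frac{y}{a}\right)^{2} = \frac{y^2}{2 a} .
% Consistency check: the derivative of g returns the old variable
\frac{\partial g}{\partial y} = \frac{y}{a} = x .
```

With $a = m$, $x = \dot{q}$ and $y = p$ this is the step that turns the kinetic energy $\tfrac{1}{2} m \dot{q}^2$ into $p^2/(2m)$.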

3.3 Hamilton’s equations


Now we are in a position to derive Hamilton's equations. Suppose that our system is described by the Lagrangian
\[ L = L(q, \dot{q}, t), \]
where $q_a$ are generalized coordinates and $\dot{q}_a$ are generalized velocities. We also allow the Lagrangian to depend on time explicitly, i.e. in general $\partial_t L \neq 0$. We introduce new variables called generalized momenta by
\[ p_a = \frac{\partial L}{\partial \dot{q}_a}. \]
Thus, generalized momenta are partial derivatives of the function $L$ with respect to one set of variables, namely the velocities. We want to find the Legendre transformation of the Lagrangian in order to obtain a function which depends on the coordinates $q_a$ and the momenta $p_a$.
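The total differential of the Lagrangian reads
\[ dL = \frac{\partial L}{\partial q_a}\, dq_a + \frac{\partial L}{\partial \dot{q}_a}\, d\dot{q}_a + \frac{\partial L}{\partial t}\, dt. \]
Using the definition of the generalized momentum and the Lagrange equations (3.4) we find
\[ dL = \dot{p}_a\, dq_a + p_a\, d\dot{q}_a + \frac{\partial L}{\partial t}\, dt. \]
Rearrange the terms to get
\[ p_a\, d\dot{q}_a - dL = -\dot{p}_a\, dq_a - \frac{\partial L}{\partial t}\, dt. \tag{3.6} \]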
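The left-hand side can be rewritten as
\[ p_a\, d\dot{q}_a - dL = d(p_a \dot{q}_a) - \dot{q}_a\, dp_a - dL = d(p_a \dot{q}_a - L) - \dot{q}_a\, dp_a. \]
Plugging this expression back into (3.6) yields
\[ d(p_a \dot{q}_a - L) = \dot{q}_a\, dp_a - \dot{p}_a\, dq_a - \frac{\partial L}{\partial t}\, dt. \tag{3.7} \]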
On the left-hand side we have again the total differential of some function. This function is called the Hamiltonian and is defined by
\[ H = p_a \dot{q}_a - L. \tag{3.8} \]
The Hamiltonian $H$ is a Legendre transformation of the Lagrangian and depends on $q_a$ and $p_a$. This fact follows from equation (3.7), according to which the total differential of the Hamiltonian is
\[ dH = \dot{q}_a\, dp_a - \dot{p}_a\, dq_a - \frac{\partial L}{\partial t}\, dt. \]
We know that the coefficients of the differentials on the right-hand side are in fact partial derivatives of the function on the left-hand side, i.e. the partial derivatives of the Hamiltonian:
\[ \frac{\partial H}{\partial p_a} = \dot{q}_a, \qquad \frac{\partial H}{\partial q_a} = -\dot{p}_a, \qquad \frac{\partial H}{\partial t} = -\frac{\partial L}{\partial t}. \tag{3.9} \]
Notice that the first two equations have exactly the form (3.3)! These are the new equations of motion, called Hamilton's equations:
\[ \dot{q}_a = \frac{\partial H}{\partial p_a}, \qquad \dot{p}_a = -\frac{\partial H}{\partial q_a}. \tag{3.10} \]
Hamilton's equations possess the advantages of Lagrange's equations, but there are some differences. Let us compare them briefly.
• Both Lagrange and Hamilton equations hold in arbitrary curvilinear coordinate
system;
• equations of motion are derived from single scalar function, L or H;
• Hamilton’s equations are of the first order while the Lagrange equations are of
the second order;
• there is one Lagrange equation for each degree of freedom, so for the system with
n degrees of freedom we have n Lagrange’s equations; on the other hand, there
are two Hamilton’s equations for each degree of freedom, one for the coordinate
and one for the momentum – thus, there are 2n Hamilton’s equations.
Let us illustrate how Hamilton’s equations “work” on familiar examples.

3.4 Particle in homogeneous gravitational field


At this stage, the reader should be very familiar with section 2.4, page 47. The Lagrangian of the particle in a homogeneous gravitational field is
\[ L = \frac{1}{2}\, m\, \dot{x}_i \dot{x}_i - m g y = \frac{1}{2}\, m \left(\dot{x}^2 + \dot{y}^2\right) - m g y. \]
The generalized coordinates in this case are
\[ q = (q_1, q_2) = (x, y), \]
i.e. we use ordinary Cartesian coordinates. The generalized momenta are, by definition (3.2),
\[ p_1 = \frac{\partial L}{\partial \dot{x}} = m \dot{x}, \qquad p_2 = \frac{\partial L}{\partial \dot{y}} = m \dot{y}. \tag{3.11} \]
We can see that the generalized momenta are ordinary momenta, the components of $\mathbf{p} = m \mathbf{v}$ in Cartesian coordinates. The last relations can be inverted to find
\[ \dot{x} = \frac{p_1}{m}, \qquad \dot{y} = \frac{p_2}{m}. \tag{3.12} \]
In order to find the Hamiltonian we have to perform the Legendre transformation of the Lagrangian using relation (3.8):
\[ H = p_a \dot{q}_a - L = p_1 \dot{x} + p_2 \dot{y} - \frac{1}{2}\, m \left(\dot{x}^2 + \dot{y}^2\right) + m g y. \]
This is not yet the final expression, for we have to eliminate the velocities $\dot{q}_a$ and express them as functions of the momenta $p_a$:
\[ H = p_1 \frac{p_1}{m} + p_2 \frac{p_2}{m} - \frac{1}{2}\, m\, \frac{p_1^2}{m^2} - \frac{1}{2}\, m\, \frac{p_2^2}{m^2} + m g y. \]
Collecting similar terms we arrive at the Hamiltonian in the form
\[ H = \frac{p_1^2 + p_2^2}{2 m} + m g y. \tag{3.13} \]
Notice that this is, not accidentally, an expression for the total energy of the particle. Hamilton's equations then follow straightforwardly from (3.10):
\[ \dot{x} = \frac{p_1}{m}, \qquad \dot{y} = \frac{p_2}{m}, \qquad \dot{p}_1 = 0, \qquad \dot{p}_2 = -m g. \tag{3.14} \]
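By analogy with the function LagrangeEqs of section 2.7, Hamilton's equations (3.10) can be generated in Mathematica. The function HamiltonEqs below is a hypothetical helper written for this illustration, not part of the original code; the names qs, ps and H are likewise assumptions.

```mathematica
(* Hypothetical helper: builds q' == dH/dp and p' == -dH/dq for every pair (q, p) *)
HamiltonEqs[qs_, ps_, H_] := Join[
  Thread[D[qs, t] == (D[H, #] & /@ ps)],
  Thread[D[ps, t] == -(D[H, #] & /@ qs)]]

(* Particle in a homogeneous gravitational field, Hamiltonian (3.13) *)
qs = {x[t], y[t]};
ps = {p1[t], p2[t]};
H = (p1[t]^2 + p2[t]^2)/(2 m) + m g y[t];

HamiltonEqs[qs, ps, H]
```

The output should be a list of four first-order equations: x' equal to p1/m, y' equal to p2/m, p1' equal to zero and p2' equal to −m g.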

3.5 Conservation of energy


In mechanics we often analyze systems in which the total energy is conserved. All examples we have seen until now belong to this class of systems. The fact that energy is conserved can be made explicit in the Hamiltonian framework.
First we show that the Hamiltonian $H$ is in fact equal to the total energy $E = T + V$, where $T$ is the kinetic energy and $V$ is the potential. Recall that the kinetic energy of one particle is
\[ T = \frac{1}{2}\, m\, \dot{x}_i \dot{x}_i = \frac{1}{2}\, m\, \frac{\partial x_i}{\partial q_a} \frac{\partial x_i}{\partial q_b}\, \dot{q}_a \dot{q}_b = \frac{1}{2}\, g_{ab}\, \dot{q}_a \dot{q}_b, \]
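where we defined
\[ g_{ab} = m\, \frac{\partial x_i}{\partial q_a} \frac{\partial x_i}{\partial q_b}. \]
Direct differentiation gives
\[ \frac{\partial T}{\partial \dot{q}_a} = g_{ab}\, \dot{q}_b. \]
Since the potential $V$ does not depend on the generalized velocities, we have
\[ H = \dot{q}_a p_a - L = \dot{q}_a \frac{\partial L}{\partial \dot{q}_a} - T + V = \dot{q}_a \frac{\partial T}{\partial \dot{q}_a} - T + V = g_{ab}\, \dot{q}_a \dot{q}_b - T + V \]
and therefore
\[ H = 2 T - T + V = T + V = E. \]
We have proved that the Hamiltonian is equal to the total energy.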
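The statement can be checked in Mathematica on a concrete example. The sketch below uses the pendulum Lagrangian (2.27), for which T = ½ m r² θ̇² and V = −m g r cos θ; the names p\[Theta], H and energy are chosen here for illustration.

```mathematica
(* Pendulum Lagrangian (2.27) with explicit time dependence *)
L = 1/2 m r^2 \[Theta]'[t]^2 + m g r Cos[\[Theta][t]];

(* Generalized momentum conjugate to \[Theta], definition (3.2) *)
p\[Theta] = D[L, \[Theta]'[t]];

(* Legendre transformation (3.8), here still expressed through the velocity *)
H = p\[Theta] \[Theta]'[t] - L;

(* Total energy T + V with V = -m g r Cos[\[Theta]] *)
energy = 1/2 m r^2 \[Theta]'[t]^2 - m g r Cos[\[Theta][t]];

Simplify[H - energy]  (* should vanish *)
```

To obtain the Hamiltonian as a function of the momentum, the velocity θ̇ would still have to be eliminated in favour of p\[Theta], as was done for the particle in section 3.4.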

3.5.1 Homogeneous functions and Hamiltonian

There exists a more general proof of the last statement, and we present it briefly. It relies on Euler's theorem about homogeneous functions. A function $f$ of the variables $x = (x_1, \dots, x_n)$ is said to be homogeneous of degree $N$ if the following holds:
\[ f(\lambda x) = \lambda^N f(x). \]
Differentiating this equation with respect to $\lambda$ we arrive at
\[ \frac{\partial f}{\partial (\lambda x_i)}\, x_i = N \lambda^{N-1} f(x). \]
Setting $\lambda = 1$ gives the Euler theorem:
\[ \frac{\partial f}{\partial x_i}\, x_i = N f(x). \tag{3.15} \]
Let us apply this theorem to the Hamiltonian. The kinetic energy $T$ is a function of the coordinates $q_a$ and the velocities $\dot{q}_a$, since
\[ T(q, \dot{q}) = \frac{1}{2}\, g_{ab}(q)\, \dot{q}_a \dot{q}_b. \]
We can see that the kinetic energy is a homogeneous function of degree 2 in the velocities, for we have
\[ T(q, \lambda \dot{q}) = \frac{1}{2}\, g_{ab}(q)\, (\lambda \dot{q}_a)(\lambda \dot{q}_b) = \lambda^2\, T. \]
Application of (3.15) immediately yields
\[ \frac{\partial T}{\partial \dot{q}_a}\, \dot{q}_a = 2 T, \]
so that the Hamiltonian is
\[ H = \dot{q}_a p_a - L = \dot{q}_a \frac{\partial L}{\partial \dot{q}_a} - T + V = \dot{q}_a \frac{\partial T}{\partial \dot{q}_a} - T + V = T + V = E. \]
This is an alternative proof of the above statement that the Hamiltonian is equal to the total energy.

3.5.2 Conservation of energy


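Now that we know that the Hamiltonian is equal to the energy, we can discuss the
conservation of energy. Relation (3.9) shows that
\[
\frac{\partial H}{\partial t} = - \frac{\partial L}{\partial t} .
\]
Let us calculate the total change of the energy per unit time:
\[
\frac{dH}{dt} = \frac{\partial H}{\partial q_a}\, \dot{q}_a + \frac{\partial H}{\partial p_a}\, \dot{p}_a + \frac{\partial H}{\partial t} .
\]
Using Hamilton’s equations (3.9) we find
\[
\frac{dH}{dt} = - \dot{p}_a \dot{q}_a + \dot{q}_a \dot{p}_a - \frac{\partial L}{\partial t} = - \frac{\partial L}{\partial t} .
\]
Thus, we have derived the relation
\[
\frac{dH}{dt} = - \frac{\partial L}{\partial t} .
\]
Energy is conserved if $\dot{H} = 0$, and this is achieved when
\[
\frac{\partial L}{\partial t} = 0 .
\]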
The question is: can the Lagrangian depend on time explicitly? Notice that

3.6 Phase space


What is the interpretation of Hamilton’s equations? They describe the evolution of
the system in time. Let us explain this important point in some detail.
Generalized coordinates $q_a$ describe the position of all parts of the system. If we
know the values
\[
q = (q_1, \dots, q_n)
\]
we know where all the particles are. We say that the $q_a$ describe the configuration of
the system. Generalized momenta describe the state of motion of the individual particles
(recall that momenta are related to generalized velocities). Together we have 2n
quantities describing the actual state of the system, which can be collected in the
ordered 2n-tuple
\[
z = (q_1, \dots, q_n, p_1, \dots, p_n) \equiv (q, p) .
\]
The variables (q, p) define the state of the system. Hamilton’s equations (3.10) then say
how, for a given state, these quantities change in time. The set of all possible states
of the system is called the phase space. In other words, each state can be identified with
one point of the phase space. Let us illustrate the idea with the example of the harmonic
oscillator.

3.7 Harmonic oscillator


The Lagrangian treatment of the harmonic oscillator can be found in section 2.5, page
49. The Lagrangian of the harmonic oscillator is
\[
L = \frac{1}{2}\, m\, \dot{q}^2 - \frac{1}{2}\, m\, \omega^2 q^2 .
\]
The generalized momentum is then
\[
p = \frac{\partial L}{\partial \dot{q}} = m\, \dot{q}
\]
and the corresponding Hamiltonian is
\[
H = p\, \dot{q} - L = \frac{p^2}{m} - L = \frac{p^2}{2m} + \frac{1}{2}\, m\, \omega^2 q^2 . \tag{3.16}
\]
Then we can easily derive Hamilton’s equations of motion:
\[
\dot{q} = \frac{\partial H}{\partial p} = \frac{p}{m} , \qquad
\dot{p} = - \frac{\partial H}{\partial q} = - m\, \omega^2 q .
\]
Now we set m = ω = 1 in order to simplify the analysis, so that the equations of
motion become
\[
\dot{q} = p , \qquad \dot{p} = - q . \tag{3.17}
\]
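The passage from the Lagrangian to the Hamiltonian (3.16) can be reproduced symbolically. The sketch below is only an illustration; the names `qd`, `lagr`, `velocityRule` and `ham` are our own choices, with `qd` standing for the velocity and `w` for the frequency ω.

```mathematica
(* qd is a stand-in symbol for the generalized velocity dq/dt. *)
lagr = 1/2 m qd^2 - 1/2 m w^2 q^2;

(* Generalized momentum p = dL/d(qd), inverted to express qd through p. *)
velocityRule = First @ Solve[p == D[lagr, qd], qd];

(* Legendre transform H = p qd - L, written in terms of (q, p). *)
ham = Expand[p qd - lagr /. velocityRule]

(* Right-hand sides of Hamilton's equations: {dq/dt, dp/dt}. *)
{D[ham, p], -D[ham, q]}
```

The final list contains the right-hand sides of Hamilton’s equations for general m and ω; setting both to 1 gives the simplified system used in the rest of this section.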

Let us interpret these equations in the spirit of section 3.6. We have two variables
describing the state of the harmonic oscillator, the coordinate q and the momentum
p. Hence, the phase space is a two-dimensional plane with coordinates q and p, see
figure 3.1. In this figure, the actual state of the oscillator is depicted as a point with
coordinates (q, p). The oscillator will then evolve in accordance with Hamilton’s equations
(3.17), which determine the time derivatives of the coordinates. Thus, the oscillator will
move in the phase plane in the direction of the velocity $(\dot{q}, \dot{p})$, which is a vector tangent
to the trajectory of the oscillator in the phase plane. This trajectory is called the phase
trajectory. By Hamilton’s equations (3.17) we have
\[
(\dot{q}, \dot{p}) = (p, -q) .
\]
This vector is depicted as an arrow in figure 3.1.


Since Hamilton’s equations are of the first order, the evolution of the system is
determined uniquely by the initial position in the phase plane. If we draw the velocity
vector at each point of the phase plane, we get a reasonable idea about the behaviour of the
oscillator. Simple Mathematica code which can be used to draw the velocity field of
the harmonic oscillator follows:
VectorPlot[ {p, -q}, {q, -5, 5}, {p, -5, 5},
Frame -> True,
FrameLabel -> {"q", "p"},
BaseStyle -> {FontFamily -> "Times New Roman", FontSize -> 13}
]
The result is plotted in figure 3.2. This figure suggests that the phase trajectories of
the harmonic oscillator are circles centered at the origin of the phase plane. In the case of
the harmonic oscillator we can prove this analytically. Since the Hamiltonian represents
the total energy, which was shown to be conserved, we can write
\[
H = E ,
\]
where E is the total energy of the oscillator, while H is the Hamiltonian (3.16) with
the simplification m = ω = 1 employed in this section for brevity:
\[
H = \frac{p^2}{2} + \frac{q^2}{2} .
\]
Now, the equation H = E can be rearranged slightly so that it acquires the form of the
equation of a circle,
\[
q^2 + p^2 = \left( \sqrt{2E} \right)^2 ,
\]
where the radius of the circle is manifestly $r = \sqrt{2E}$. We can see that the phase
trajectory is determined by the single parameter E, the total energy.

Fig. 3.1. Geometrical interpretation of Hamilton’s equations (3.17) in the phase space (q, p). The state
of the oscillator is represented by the position q and momentum p, which can be regarded as coordinates
in the phase space. The "velocity" is then the vector with coordinates $(\dot{q}, \dot{p})$, where the derivatives are
determined by Hamilton’s equations.
Fig. 3.2. Velocity field of the harmonic oscillator in the phase plane (horizontal axis q, vertical axis p).
At each point of the phase plane (q, p) we calculate the velocity $(\dot{q}, \dot{p})$ using the Hamilton
equations (3.17) and draw the vector representing the velocity.
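The circular phase trajectories can also be obtained by solving equations (3.17) directly. The following is a small sketch; the initial values `q0` and `p0` and the radii used in the plot are assumptions introduced for the illustration only.

```mathematica
(* Solve Hamilton's equations (3.17) for general initial data (q0, p0). *)
sol = First @ DSolve[
    {q'[t] == p[t], p'[t] == -q[t], q[0] == q0, p[0] == p0},
    {q[t], p[t]}, t];

(* Energy along the solution; it should not depend on t. *)
energy = Simplify[(p[t]^2 + q[t]^2)/2 /. sol]

(* Phase trajectories for a few assumed initial positions q0 = 1,...,4, p0 = 0. *)
ParametricPlot[
  Evaluate[Table[{q[t], p[t]} /. sol /. {q0 -> r, p0 -> 0}, {r, 1, 4}]],
  {t, 0, 2 Pi}, AxesLabel -> {"q", "p"}]
```

The expression `energy` reduces to a combination of the initial data alone, so each trajectory stays on a circle whose radius is fixed by the energy, as derived above.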
4
Variational principle

We have seen that both Lagrange’s equations and Hamilton’s equations are essentially
equivalent (at least when the forces involved have a potential) to Newton’s laws
of motion. In this chapter we derive Lagrange’s equations in a completely different
way, using a variational principle. We will see that with this principle it is possible to
derive the equations of motion from scratch, with a minimum of initial assumptions.
This approach is much more powerful, because it works even outside the realm of
classical mechanics. In fact, the fundamental laws of modern physics are routinely
formulated in terms of a variational principle.

4.1 Fermat’s principle


Before we formulate the variational principle, or Hamilton’s principle, in classical
mechanics, we start our discussion with a perhaps more familiar example from optics. It
is well known that light propagates at different speeds in different media. If c
denotes the speed of light in vacuum, then the refractive index of a given medium
is defined as
\[
n = \frac{c}{v} ,
\]
where v is the speed of light in that medium. For example, the refractive index of
water is about n = 1.33, which means that light propagates 1.33 times slower in
water than in vacuum. The refractive index of air is approximately n = 1, i.e.
the speed of light in air is practically the same as in vacuum.
Now, when a light ray propagates from one medium to another, it changes
direction. Suppose that the light ray crosses the interface between two media with
refractive indices $n_1$ and $n_2$, figure 4.1. It is customary to measure the angle of impact
with respect to the line perpendicular to the plane of the interface. For example, the
angle of impact in figure 4.1 is denoted by α. Similarly, the angle of refraction is β.
Can we calculate the angle of refraction provided that the angle of impact is given?
Yes, we can. According to the Snell law, these angles must satisfy the equation
\[
\frac{\sin \alpha}{\sin \beta} = \frac{n_2}{n_1} . \tag{4.1}
\]
The Snell law (4.1) is a phenomenological law which was discovered before the
theory of electromagnetism and of the propagation of electromagnetic waves was formulated.
We can say that the rôle of the Snell law in optics is similar to that of Newton’s law of
motion in mechanics. This analogy goes even further. The three basic laws of geometrical optics are
• In a homogeneous medium, the light propagates along straight lines;
• When the light propagates from one medium to another, the angle of impact and
the angle of refraction are related by the Snell law (4.1); if the light is reflected
on the interface, then the angle of impact is equal to the angle of reflection;
• If the light ray can propagate along some trajectory, then it can propagate also
in the opposite direction along the same trajectory.
With a little imagination one observes a striking analogy between Newton’s laws
of motion and these three laws of optics. The first law tells us that if the refractive
index does not change, light propagates along a straight line, which can
be regarded as an analogy to Newton’s first law: if no forces act, the body
moves uniformly along a straight line. The second law, on the other hand, tells us how
the direction of propagation is influenced by changes of the refractive index. Newton’s
second law tells us how the velocity changes under an external force. Finally, the third law
of optics ensures that if light can propagate along some trajectory, it can also propagate
in the reverse direction. In other words, if Alice can see Bob, then Bob can also see
Alice. Newton’s third law says that if body A exerts a force on body B, then
body B also exerts a force of the same magnitude and opposite direction on body A.
However, what we want to emphasize is that both the Snell law and Newton’s
law of force are empirical laws which are justified by experiment. Is there any deeper
law which could explain all three laws of optics? Can we replace the three laws of optics
by a single law? Yes, we can, and it is called Fermat’s principle.
Let us see how we can arrive at the formulation of the Fermat principle by heuristic
arguments. Suppose that the light ray is propagating in a homogeneous medium in
which, by definition, the refractive index is constant. Then, by the first law of optics,
the ray propagates along a straight line, which is the shortest curve connecting
two given points. Hence, in a homogeneous medium, a light ray which travels
from point A to point B follows the shortest path from A to B. Does this statement
hold in general? Certainly not, as is obvious from figure 4.1: the trajectory of a light
ray passing from one medium to another is not a straight line, and so it is
longer than the shortest possible one. But recall that light propagates at different
speeds in the two media. Maybe it is the time, rather than the length, that is minimal!

Fig. 4.1. The light ray changes direction at the interface between two media with refractive indices
$n_1$ and $n_2$. In this figure we assume $n_1 < n_2$, which means that light is faster in the first medium.
There is a beautiful argument by Richard Feynman. Suppose that you are standing
on the coast and there is a nice girl drowning in the sea. Of course, you want to save
her (this statement does not depend on whether the reader is a girl or a boy). You
must reach the girl in the shortest possible time, not by the shortest distance! These are not
the same, because you run faster than you swim. There are two extreme trajectories
along which you can travel, figure 4.2.
If you run straight towards the girl, trajectory i), you have to swim a long distance,
which takes a long time. If you choose trajectory ii), you spend the shortest possible
time in the water, but you have to run farther before you enter the water. It is obvious
that we have to find the point at which to enter the water so that the overall time
you need to reach the girl is the shortest. This qualitative analysis shows that the
trajectory must look something like the one shown in figure 4.1, which indeed suggests that
the light ray follows the trajectory with minimal time.
Thus, we have arrived at the conjecture that the light ray propagates from point
A to point B along the trajectory which takes the minimal time. This is the Fermat
principle. Let us formulate it in mathematical terms. Suppose that the light ray
starts at point A and ends at point B as in figure 4.1. The time which the ray
needs to travel the distance dr is
\[
dt = \frac{dr}{v} = \frac{1}{c}\, n\, dr .
\]

Fig. 4.2. Two extreme trajectories which can be used to save the drowning girl (point B), starting on the
coast. Trajectory i) is the most natural one, but the time you spend in the water is too long. Trajectory ii)
is better, but now you spend too much time on the coast.
The speed of light c is irrelevant here, because whenever n dr is minimal, so is dt. Hence, we
define the optical path length by
\[
ds = n\, dr .
\]
The total optical path length between points A and B is
\[
S = \int_A^B ds = \int_A^B n\, dr . \tag{4.2}
\]
This notation is slightly awkward because the value of the integral depends not only
on the points A and B but on the whole trajectory. In figure 4.3 we depict two different
trajectories γ and γ′ connecting points A and B. In general, the optical path length S
differs between the two trajectories, and thus instead of writing the integration bounds A and
B we write the trajectory explicitly, e.g.
\[
S[\gamma] = \int_\gamma n\, dr .
\]

Here we explicitly emphasize that the integral is taken along the trajectory γ. Notice
that S depends on the entire trajectory γ. In other words, S can be regarded as a
mapping which assigns a number, the optical path length, to each trajectory γ,
\[
S : \gamma \mapsto S[\gamma] \in \mathbb{R} .
\]
In general, any mapping from an arbitrary set into the real numbers is called a functional.

Fig. 4.3. There are many trajectories connecting points A and B (here γ and γ′) and the optical path
length S is different for each of them.

What is the law of propagation for the light ray? The Fermat principle states
that the light propagates along the trajectory for which the optical length S[γ] is
minimal. All three optical laws formulated at the beginning of this section can be
recovered from this simple statement. We do not show how this can be done in general,
but we show how the Snell law (4.1) can be derived from the Fermat principle.
The situation is sketched in figure 4.4. Suppose again that the light ray starts at point
A in the medium with refractive index $n_1$, crosses the interface between the two media
and finally ends at point B in the medium with refractive index $n_2$. Let a be the
distance of point A from the interface, let b be the distance of point B from the
interface and let x be the coordinate of the place where the ray crosses the interface. If
points A and B are held fixed, it is the coordinate x which is unknown: we want to
find the place where the ray must cross the interface in order that the optical path
length be minimal. The complement of the distance x will be denoted by y. Notice that, for
fixed points A and B, the sum of x and y is constant, say, l (it is the horizontal
distance between points A and B):
\[
x + y = l .
\]
This equation immediately implies
\[
\frac{\partial y}{\partial x} = - 1 . \tag{4.3}
\]

Fig. 4.4. Derivation of the Snell law using the Fermat principle. The ray travels the distance $r_1$ from A to
the interface in the medium $n_1$ and the distance $r_2$ from the interface to B in the medium $n_2$.

The distance from point A to the crossing point is
\[
r_1 = \sqrt{a^2 + x^2}
\]
and the corresponding optical path length is
\[
s_1 = n_1 r_1 = n_1 \sqrt{a^2 + x^2} .
\]
Similarly, the distance from the crossing point to point B and the corresponding optical path
length are
\[
r_2 = \sqrt{b^2 + y^2} , \qquad s_2 = n_2 r_2 = n_2 \sqrt{b^2 + y^2} .
\]
The total optical path length of the light ray is therefore
\[
S = n_1 \sqrt{a^2 + x^2} + n_2 \sqrt{b^2 + y^2} . \tag{4.4}
\]

We want to find the x for which the length S is minimal. This is an easy task of
elementary calculus: we differentiate S with respect to x and set the derivative equal
to zero. Assuming that $n_1$ and $n_2$ are constants and using (4.3) we find
\[
\frac{\partial S}{\partial x} = n_1 \frac{x}{\sqrt{a^2 + x^2}} - n_2 \frac{y}{\sqrt{b^2 + y^2}} = 0 . \tag{4.5}
\]
From figure 4.4 we can see that x and y are related to the angles α and β by
\[
\sin \alpha = \frac{x}{r_1} = \frac{x}{\sqrt{a^2 + x^2}} , \qquad
\sin \beta = \frac{y}{r_2} = \frac{y}{\sqrt{b^2 + y^2}} ,
\]
and therefore equation (4.5) is equivalent to
\[
\frac{\sin \alpha}{\sin \beta} = \frac{n_2}{n_1} ,
\]
which is the Snell law. This completes the proof.
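The derivation can be checked numerically by minimizing the optical path length (4.4) directly. The sketch below uses sample values for the geometry and the refractive indices; these numbers, as well as all the names in the code, are assumptions made only for this illustration.

```mathematica
(* Assumed sample geometry (a, b, l) and refractive indices (n1, n2). *)
{aDist, bDist, lDist} = {1., 2., 3.};
{nOne, nTwo} = {1., 1.33};

(* Optical path length (4.4) with y = l - x. *)
snellPathLength[x_] :=
  nOne Sqrt[aDist^2 + x^2] + nTwo Sqrt[bDist^2 + (lDist - x)^2];

(* Numerically locate the crossing point that minimizes S. *)
xMin = xx /. Last[FindMinimum[snellPathLength[xx], {xx, lDist/2}]];

(* Sines of the angles of impact and refraction at the minimizing point. *)
sinAlpha = xMin/Sqrt[aDist^2 + xMin^2];
sinBeta = (lDist - xMin)/Sqrt[bDist^2 + (lDist - xMin)^2];

(* The two entries should agree up to numerical tolerance. *)
snellCheck = {sinAlpha/sinBeta, nTwo/nOne}
```

If the minimization converges, the two entries of `snellCheck` coincide within numerical accuracy, which is just the Snell law (4.1) for the chosen values.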
Let us recapitulate and conclude this section. First we formulated three basic laws
of geometrical optics and emphasized the analogy between these laws and Newton’s
laws of motion. By a heuristic argument we arrived at the conjecture that the
laws of optics can be replaced by a single law called Fermat’s principle: the light
ray propagates in such a way that the optical path length is minimal. From this
simple law we were able to derive the Snell law of refraction of light.
Although this textbook is not concerned with optics, our aim was to illustrate
the idea of a variational principle on a familiar example. The reason for this digression
lies in the fundamental importance of variational principles in theoretical physics.
It is possible to show that all phenomena which occur in geometrical optics can be
explained on the basis of the Fermat principle. Since the laws of optics are analogous,
at least mathematically, to Newton’s laws, we can hope that it is possible to
formulate Newton’s laws in the framework of a variational principle. This will be
done in the rest of this chapter.

4.2 Formulation of variational problem


Before we apply the variational principle to Newtonian mechanics, we formulate the
problem in precise mathematical terms and solve it. Let us return to the integral (4.2),
which is only a formal expression of the Fermat principle. When we derived the Snell
law from the Fermat principle, we assumed that the refractive index was constant in
the first medium and constant but different in the second medium. This allowed us to
split the integral into the sum of two terms (4.4). In general, however, the refractive
index will be a function of the coordinates. Indeed, the first law of optics tells us that
if the refractive index is constant everywhere, the light rays propagate along
straight lines. Hence, as the first step, we must admit that n is a function of the spatial
coordinates:
\[
S[\gamma] = \int_\gamma n(x, y)\, dr .
\]
For simplicity, we restrict ourselves to the two-dimensional case and so we suppress the
z-coordinate. The line element dr must then be expressed in terms of the Cartesian
coordinates as well; using the Pythagorean theorem we have
\[
dr = \sqrt{dx^2 + dy^2} .
\]
Now suppose that we parametrize the coordinates x and y by some parameter t. Then
\[
dx = \dot{x}\, dt , \qquad dy = \dot{y}\, dt ,
\]
where $\dot{x}$ and $\dot{y}$ are the derivatives of the coordinates with respect to the parameter t. The integral S
acquires its final form
\[
S[\gamma] = \int_\gamma n(x, y) \sqrt{\dot{x}^2 + \dot{y}^2}\; dt . \tag{4.6}
\]
We can see that the optical path length S[γ] is a functional of the form
\[
S[\gamma] = \int_\gamma L(x(t), \dot{x}(t))\, dt ,
\]

where L is a function of the coordinates and their derivatives. Our task is to find the
trajectory γ for which the value S[γ] is minimal. It is a task similar to finding the
minimum of a function, familiar from elementary calculus. Such a problem is solved by
taking the derivative with respect to the variable and setting it to zero. The difference
in our case is that now γ is not a single variable but the entire trajectory, and it
is not obvious how to differentiate S with respect to γ. This concept is known as a
functional derivative or a variation and it can be defined in a very general context.
Here we define it in a more pedestrian way sufficient for our purposes.
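To see concretely that the functional (4.6) assigns a number to every curve, one can evaluate it for two different trajectories with the same end points. The sketch below assumes a refractive index n(x, y) = 1 + y², chosen only for illustration, and curves parametrized by t between 0 and 1; `nIndex` and `pathFunctional` are hypothetical helper names.

```mathematica
(* Assumed refractive index, chosen only for illustration. *)
nIndex[x_, y_] := 1 + y^2;

(* Optical path length (4.6) of a curve {x(t), y(t)} with 0 <= t <= 1. *)
pathFunctional[{x_, y_}, t_] := Module[{integrand},
   integrand = nIndex[x, y] Sqrt[D[x, t]^2 + D[y, t]^2];
   NIntegrate[integrand, {t, 0, 1}]];

(* Two curves joining the points (0, 0) and (1, 0). *)
straight = pathFunctional[{t, 0}, t];
curved = pathFunctional[{t, t (1 - t)}, t];
{straight, curved}
```

For this particular index the straight segment runs entirely through the region where n = 1, so its optical length is 1, while the curved path is both geometrically longer and passes through a region of larger n; its optical length is therefore larger.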

4.3 Variation of the functional


In the previous section we formulated the basic problem of variational calculus in
Cartesian coordinates. We have seen in previous chapters that it is often useful
to introduce generalized coordinates $q_a$ instead of Cartesian coordinates $x_i$. Hence,
we replace x in the integral (4.6) by q:
\[
S[q] = \int_\gamma L(q(t), \dot{q}(t))\, dt . \tag{4.7}
\]
Here we replaced the argument γ of the functional S by the argument q, because q is a
coordinate expression of the trajectory. How can we differentiate S with respect to
γ in order to find the γ for which S[γ] is minimal?

Suppose that $q_a$ is the trajectory which is the solution to our problem, i.e. suppose
that for $q_a$ the functional S[q] acquires its minimal value. Let this trajectory pass through point
A for $t = t_1$ and point B for $t = t_2$, see figure 4.5. Since $q_a$ is the minimal trajectory,
any trajectory $q'_a$ different from $q_a$ must yield a bigger value of S. Notice that we can
choose an arbitrary trajectory $q'_a$, but it must satisfy the boundary conditions
\[
q'_a(t_1) = q_a(t_1) , \qquad q'_a(t_2) = q_a(t_2) , \tag{4.8}
\]
because points A and B are held fixed. Let us write the trajectory $q'_a$ in the form
\[
q'_a(t) = q_a(t) + \varepsilon\, \eta_a(t) , \tag{4.9}
\]
where ε is an arbitrary constant parameter and $\eta_a(t)$ is an arbitrary function of time, subject
to the boundary conditions
\[
\eta_a(t_1) = \eta_a(t_2) = 0 \tag{4.10}
\]
in order to satisfy conditions (4.8).
Since $\varepsilon \eta_a$ is the difference between the trajectories $q'_a$ and $q_a$, it is called a
variation and in physics textbooks it is often denoted by $\delta q_a = \varepsilon \eta_a$. The symbol δ has
formally the same properties as the total differential d. Notice that (4.9) implies
\[
\dot{q}'_a = \dot{q}_a + \varepsilon\, \dot{\eta}_a ,
\]
so we can write $\delta \dot{q}_a = \varepsilon \dot{\eta}_a$. In other words, the variation δ commutes with differentiation
with respect to the parameter t.
As we emphasized repeatedly, the functional S depends on the trajectory and for $q_a$
it acquires its minimal value, while for $q'_a \neq q_a$ we have
\[
S[q'] = S[q + \varepsilon \eta] .
\]
Notice that now we have parametrized the family of trajectories $q'_a$ by the single parameter ε.
Since we want to find the minimum of S[q′] (which is S[q]), we need to differentiate
S[q′] somehow. While we do not know how to differentiate S with respect to the entire
trajectory, differentiation with respect to ε is a well-defined operation. Hence, we
define the variation or functional derivative of S by
\[
\delta S = \left. \frac{d}{d\varepsilon} \right|_{\varepsilon = 0} S[q + \varepsilon \eta] . \tag{4.11}
\]
The notation $d/d\varepsilon|_{\varepsilon=0}$ means that first we differentiate the function with respect to ε
and then set ε = 0. The reason why we substitute zero for ε will be clear soon. Now,
the correct $q_a$ is a solution to the equation
\[
\delta S = 0 .
\]

Fig. 4.5. Two trajectories $q_a$ and $q'_a$, plotted against t, starting at point A (at $t_1$) and ending at
point B (at $t_2$); they differ by εη. Only for the trajectory $q_a$ is the functional S minimized.
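Definition (4.11) can be tried out on a simple case. The sketch below assumes the Lagrangian L = q̇²/2 (a free particle of unit mass), end points q(0) = 0 and q(1) = 1, and the particular variation η(t) = sin(πt), which vanishes at both ends; all of these choices are ours and serve only as an illustration.

```mathematica
(* First variation (4.11) of S = Integrate[q'(t)^2/2, {t, 0, 1}] around a trial *)
(* path, using the assumed variation eta(t) = Sin[Pi t] with eta(0) = eta(1) = 0. *)
firstVariation[path_] := D[
    Integrate[1/2 D[path + eps Sin[Pi t], t]^2, {t, 0, 1}],
    eps] /. eps -> 0;

(* Both trial paths join (t, q) = (0, 0) and (1, 1).              *)
(* The straight line is extremal; the parabola is not.            *)
(* Evaluates to {0, -4/Pi}.                                       *)
{firstVariation[t], firstVariation[t^2]}
```

The variation vanishes for the straight line and is nonzero for the parabola, so only the former is a candidate for the minimizing trajectory. A single choice of η cannot prove minimality, of course; it can only rule a trajectory out.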

4.4 Euler-Lagrange equations


Having defined the variation (4.11), we can now easily solve our variational problem.
Let us state it again. We want to find the trajectory $q_a(t)$ for which the integral (called
the action)
\[
S[q] = \int_{t_1}^{t_2} L(q(t), \dot{q}(t))\, dt \tag{4.12}
\]
is minimal. In order to find this trajectory we replace $q_a$ by the varied trajectory
\[
q_a \mapsto q_a + \varepsilon\, \eta_a
\]
and solve the equation δS = 0, where the variation δ is defined by (4.11). Since we suppose
that the initial and final points A and B are fixed, we are interested only in trajectories
for which
\[
\eta_a(t_1) = \eta_a(t_2) = 0 .
\]

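Let us now find the variation δS explicitly. We have
\[
\begin{aligned}
\delta S &= \left. \frac{d}{d\varepsilon} \right|_{\varepsilon=0} S[q + \varepsilon \eta]
          = \left. \frac{d}{d\varepsilon} \right|_{\varepsilon=0} \int_{t_1}^{t_2} L(q(t) + \varepsilon \eta(t),\, \dot{q} + \varepsilon \dot{\eta})\, dt \\
         &= \int_{t_1}^{t_2} \left( \frac{\partial L}{\partial q_a}\, \eta_a + \frac{\partial L}{\partial \dot{q}_a}\, \dot{\eta}_a \right) dt .
\end{aligned} \tag{4.13}
\]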

Note that function L in the first line is evaluated on varied trajectory qa0 = qa + εηa .
Then we differentiate L using the chain rule with respect to its first and then with
respect to its second argument. After differentiation we put ε = 0 so that function
L is evaluated on the original trajectory qa after the differentiation. Hence, after the
differentiation we do not have varied trajectory, only the original one.
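The next step is to remove the derivative of the variation $\eta_a$ with respect to the parameter t.
Using integration by parts we find
\[
\int_{t_1}^{t_2} \frac{\partial L}{\partial \dot{q}_a} \frac{d\eta_a}{dt}\, dt
= \left[ \frac{\partial L}{\partial \dot{q}_a}\, \eta_a \right]_{t_1}^{t_2}
- \int_{t_1}^{t_2} \eta_a\, \frac{d}{dt} \frac{\partial L}{\partial \dot{q}_a}\, dt . \tag{4.14}
\]
Now we impose the boundary conditions (4.10) that $\eta_a$ must vanish at the boundary points
A and B, which implies that the "boundary" term in square brackets is equal to zero!
Thus, after integration by parts, the variation of the action becomes
\[
\delta S = \int_{t_1}^{t_2} \left( \frac{\partial L}{\partial q_a} - \frac{d}{dt} \frac{\partial L}{\partial \dot{q}_a} \right) \eta_a\, dt . \tag{4.15}
\]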

Our variational principle tells us that this variation must be equal to zero. Recall
that during the variation we kept the boundary points A and B fixed. However, equation
(4.15) must hold for arbitrary points A and B, because we did not say anything
specific about these points. We can choose these points arbitrarily and then find δS,
and this variation δS must vanish. Moreover, the variation $\eta_a$ was chosen to be arbitrary
as well. Then, δS can vanish for all $\eta_a$ and for all points A and B only if the expression
in the brackets in (4.15) is zero everywhere. In other words, the variational principle implies
that the following equations must hold:
\[
\frac{\partial L}{\partial q_a} - \frac{d}{dt} \frac{\partial L}{\partial \dot{q}_a} = 0 . \tag{4.16}
\]
These equations are known as the Euler-Lagrange equations of variational calculus.
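As a brief illustration, the left-hand side of the Euler-Lagrange equation can be evaluated symbolically for a given Lagrangian. The sketch below uses the harmonic oscillator Lagrangian as an assumed example, with `w` standing for the frequency; the variable names are our own.

```mathematica
(* Assumed example: harmonic oscillator Lagrangian with mass m and frequency w. *)
oscLagr = 1/2 m q'[t]^2 - 1/2 m w^2 q[t]^2;

(* Left-hand side of the Euler-Lagrange equation: dL/dq - d/dt (dL/dq'). *)
eulerLagrangeLHS = D[oscLagr, q[t]] - D[D[oscLagr, q'[t]], t]

(* Solve the resulting equation of motion for q(t). *)
DSolve[eulerLagrangeLHS == 0, q[t], t]
```

Here `D` treats q[t] and q'[t] as independent variables, exactly as the partial derivatives in the Euler-Lagrange equations require; the total time derivative is then taken with respect to t.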

We can see that the Euler-Lagrange equations are nothing else than Lagrange’s
equations (2.18), if we identify the Lagrangian L with the function L above. This is
a surprising result: an actual physical system evolves in time in such a way as to
minimize the action (4.12)!
From the other point of view, recall that Lagrange’s equations (2.18) were
derived as an equivalent formulation of Newton’s laws of motion in an arbitrary coordinate
system. Thus, at the beginning, we had Newton’s law, which is a physical law.
In this chapter, on the other hand, we have not assumed anything about the physics:
we merely formulated the rule, the variational principle, that the action must be minimal.
Then we performed some calculations and showed that this principle is equivalent to
the Euler-Lagrange equations (4.16). Hence, we have derived the same form of the
law of motion without using any physics.
Of course, this strong statement is somewhat weakened if we realize that the variational
principle does not tell us what the form of the function L is. In order to guess the
form of L we have to impose some physical restrictions. First, consider a free particle,
i.e. a particle moving in free space where no forces are present. If we describe the particle
in Cartesian coordinates, the Lagrangian L can depend on the coordinates $x_i$ and
the velocities $\dot{x}_i$. However, all points of the space are equivalent and none is preferred.
If there are no forces, the particle must behave in the same way independently of its
position. Hence, the Lagrangian cannot depend directly on the coordinates; it can depend
only on the velocities. This is a consequence of the homogeneity of space.
The next restriction comes from the isotropy of space. While homogeneity implies that
all points are equivalent, isotropy implies that for a given point, all directions in
space are equivalent. We can rotate the system containing the particle under analysis
and the particle will behave in the same way. Thus, the Lagrangian cannot depend
on the direction of the velocity $v_i = \dot{x}_i$ and can depend only on its magnitude, $v^2 = \dot{x}_i \dot{x}_i$.
Thus, we have determined the Lagrangian of the free particle up to a multiplicative
constant and we can write it in the form
\[
L = \alpha\, v^2 , \tag{4.17}
\]

where α is a multiplicative constant. This constant cannot be specified further, because
it must be a constant characteristic of the particle and its value depends on
the convention we use. We can argue that our Lagrangian is proportional to the kinetic
energy and therefore it is plausible to set α = m/2, but it is not necessary. We
emphasize that it is more or less only a convention that we write the constant α in this
form. The reason is that Newton’s law was discovered first and the
variational principle was discovered later. From now on we assume α = m/2, denote
the kinetic energy by
\[
T = \alpha\, v^2 = \frac{1}{2}\, m\, v^2
\]
and investigate what happens in the presence of forces.
We can see the heuristic power of the variational principle: the equations of motion are
provided by the Euler-Lagrange equations, which always have the same form regardless
of the system we describe and independently of the coordinates used. In order
to find the equations of motion we only have to specify the Lagrangian. Usually we do
not have too many possibilities for what the Lagrangian can look like. We have seen in
the case of the free particle that essentially the only form of admissible Lagrangian
is (4.17). The reason is that the Lagrangian is a scalar, so we must construct a scalar
quantity from the quantities describing our system, like velocity and coordinates. Usually
there are only a few possibilities.
The situation is similar even in the presence of forces. If the force is potential and thus
described by a single scalar V such that $F_i = -\partial_i V$, it is natural to set
\[
L = \alpha\, v^2 - V ,
\]
where the minus sign is again customary and is related to the fact that the force is
minus the gradient. This choice is convenient but not strictly necessary.
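As a quick check of this choice (a worked step added here, assuming α = m/2 and Cartesian coordinates $x_i$), inserting $L = \frac{1}{2} m \dot{x}_i \dot{x}_i - V(x)$ into the Euler-Lagrange equations gives back Newton’s law of motion:

```latex
\[
\frac{\partial L}{\partial x_i} = -\,\partial_i V = F_i , \qquad
\frac{\partial L}{\partial \dot{x}_i} = m\, \dot{x}_i
\quad \Longrightarrow \quad
\frac{\partial L}{\partial x_i} - \frac{d}{dt} \frac{\partial L}{\partial \dot{x}_i}
= F_i - m\, \ddot{x}_i = 0 .
\]
```

For V = 0 this reduces to $\ddot{x}_i = 0$, i.e. uniform motion along a straight line, as expected for the free particle.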
Electromagnetic forces, on the other hand, are not potential. Thus, the construction
of the Lagrangian as in chapter 2 is impossible: we can define the generalized forces
$Q_a$ but they are not the gradient of any scalar. In fact, the electromagnetic field is described
by one scalar potential φ and one vector potential $A_i$. These potentials in general
depend on time and position. At this point it is not important what the vector potential is;
we just want to illustrate that even in this case the Lagrangian can be constructed.
Indeed, the particle moving in the electromagnetic field is described again by the
position $x_i$ and by the velocity $v_i$, while the field itself is described by the potentials φ
and $A_i$. Can we combine these quantities to form a scalar Lagrangian? Yes, and the
construction is fairly unique. Since φ is itself a scalar, we can simply add it to the
Lagrangian of the free particle (or, more precisely, subtract it from the Lagrangian), so
that the first part of the Lagrangian will be
\[
L = T - \beta\, \phi .
\]
Here, β is again a constant to be specified later. Now we can form three scalar functions
from the quantities $x_i$, $v_i$ and $A_i$:
\[
x_i v_i , \qquad x_i A_i , \qquad A_i v_i .
\]
The first combination does not contain field quantities and we can exclude it immediately,
for it cannot describe the interaction of the particle with a field. The second
combination looks better, but recall that the space itself is homogeneous. This homogeneity
is broken by the presence of the electromagnetic field, but still the
Lagrangian should not depend on the coordinates directly, only through the potentials φ and
$A_i$. Hence, the only plausible combination is $A_i v_i$ and we can write
\[
L = T - \beta\, \phi + \gamma\, A \cdot v .
\]
Now, the constants β and γ obviously determine the strength of the interaction between the
field and the particle. We know from experience that the electromagnetic force is
proportional to the charge e of the particle and thus we can write the Lagrangian in
the final form
\[
L = T - e\, \phi + e\, A \cdot v . \tag{4.18}
\]
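One can verify symbolically that the Lagrangian (4.18) reproduces the Lorentz force. The sketch below is an illustration under stated assumptions: it uses units and sign conventions in which the fields are given by E = −∇φ − ∂A/∂t and B = ∇ × A (these definitions are not introduced in the text above), and the names `phi`, `Ax`, `Ay`, `Az` denote arbitrary, unspecified potentials.

```mathematica
(* Generic potentials as functions of placeholder arguments (X, Y, Z, T). *)
coords = {X, Y, Z};
phi0 = phi[X, Y, Z, T];
A0 = {Ax[X, Y, Z, T], Ay[X, Y, Z, T], Az[X, Y, Z, T]};

(* Assumed field definitions: E = -grad(phi) - dA/dt, B = curl(A). *)
E0 = -D[phi0, {coords}] - D[A0, T];
B0 = Curl[A0, coords];

(* Evaluate everything on the trajectory of the particle. *)
onPath = {X -> x[t], Y -> y[t], Z -> z[t], T -> t};
r = {x[t], y[t], z[t]};
v = D[r, t];

(* Lagrangian (4.18) and the left-hand sides of the Euler-Lagrange equations. *)
lagr = m/2 v . v - e (phi0 /. onPath) + e (A0 /. onPath) . v;
el = Table[D[lagr, r[[i]]] - D[D[lagr, v[[i]]], t], {i, 3}];

(* Lorentz force e (E + v x B) on the trajectory. *)
lorentz = e ((E0 /. onPath) + Cross[v, B0 /. onPath]);

(* Difference between the Euler-Lagrange expression and (force - m a). *)
elDiff = Simplify[el - (lorentz - m D[v, t])]
```

If `elDiff` vanishes identically, the Euler-Lagrange equations of (4.18) are equivalent to m a = e (E + v × B) within the assumed conventions.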

We can see that our construction is not "bullet-proof", but it is very natural and,
moreover, it yields the correct equations of motion. This heuristic approach is even more
powerful in relativistic theories, where the action must be a scalar¹ with respect to the
so-called Lorentz group, which is a strong restriction. Notice that in classical mechanics
we know what the correct equations of motion are: Lagrange’s equations must reduce
to Newton’s law. However, when we are developing a new theory, we do not know
what the correct equations are. In such a position we usually assume that the variational
principle is correct and guess the form of the action or the Lagrangian. In this
way, people constructed the modern quantum field theories of the electromagnetic, weak and
strong interactions. Hence, the variational principle is a much more fundamental principle
than it seems from our discussion.

4.5 Non-uniqueness of the Lagrangian


Using the variational formulation it is easy to see that the Lagrangian is not unique,
i.e. there are many different Lagrangians which yield the same equations of motion.
To see this, consider an arbitrary function F = F(t) of time and define
\[
f(t) = \frac{dF}{dt} .
\]
Let us modify the action by adding a new term to it:
\[
S' = S + \int_{t_1}^{t_2} f(t)\, dt .
\]
The second term can be integrated,
\[
S' = S + \int_{t_1}^{t_2} \frac{dF}{dt}\, dt = S + \left[ F(t) \right]_{t_1}^{t_2} = S + F(t_2) - F(t_1) .
\]

¹ In classical mechanics it does not matter whether we construct the action or directly the Lagrangian,
because they differ only by integration over time. In relativistic theories, time is not invariant and
transforms as a component of a (four-)vector. Hence, it is the action which must be a scalar, not the Lagrangian.

Thus, the new action S′ differs from S only by boundary terms, namely the values of F at
the boundaries of the trajectory. These are fixed under the variation and so we have
\[
\delta S = \delta S' .
\]
That means that the variational principle δS = 0 gives the same equations of motion as
the principle δS′ = 0. By the definition of the action, we have
\[
S' = \int_{t_1}^{t_2} \left( L + f(t) \right) dt ,
\]
which can be written as
\[
S' = \int_{t_1}^{t_2} L'\, dt ,
\]
where
\[
L' = L + f(t) = L + \frac{dF}{dt} . \tag{4.19}
\]
In other words, if we change the Lagrangian by adding a function f which is the total
derivative of some other function F with respect to time, we do not change the
equations of motion. Hence, the Lagrangian is not unique. This is an important
observation which will be exploited in connection with canonical transformations in
chapter 5.
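The statement that a total time derivative only shifts the action by boundary terms can be checked numerically. The following is a minimal sketch, not part of the original text: the Lagrangian (a unit-mass harmonic oscillator), the function F(t) = sin 3t and the two trial trajectories are all illustrative assumptions. For both trajectories the difference between the two actions should equal F(t2) − F(t1), independently of the path.

```python
import math

def action(lagrangian, q, qdot, t1, t2, n=20000):
    """Approximate S = integral of L(q, qdot, t) dt by the trapezoidal rule."""
    h = (t2 - t1) / n
    total = 0.0
    for i in range(n + 1):
        t = t1 + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * lagrangian(q(t), qdot(t), t)
    return total * h

# Original Lagrangian: unit-mass harmonic oscillator (illustrative choice).
def L(q, v, t):
    return 0.5 * v * v - 0.5 * q * q

# Hypothetical function F(t) and its derivative f(t) = dF/dt.
def F(t):
    return math.sin(3.0 * t)

def f(t):
    return 3.0 * math.cos(3.0 * t)

# Modified Lagrangian L' = L + dF/dt, as in (4.19).
def L_prime(q, v, t):
    return L(q, v, t) + f(t)

t1, t2 = 0.0, 1.0

# Two different trial trajectories with the same endpoints q(0) = 0, q(1) = 1.
paths = {
    "straight": (lambda t: t, lambda t: 1.0),
    "wiggly": (lambda t: t + 0.3 * math.sin(math.pi * t),
               lambda t: 1.0 + 0.3 * math.pi * math.cos(math.pi * t)),
}

boundary_term = F(t2) - F(t1)
differences = {}
for name, (q, qdot) in paths.items():
    s_old = action(L, q, qdot, t1, t2)
    s_new = action(L_prime, q, qdot, t1, t2)
    differences[name] = s_new - s_old
    print(name, differences[name], boundary_term)
```

Since the shift is the same for every trajectory with the given endpoints, it cannot change which trajectory makes the action stationary.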

4.6 Variational derivation of Hamilton’s equations


We have shown that the variational principle reproduces Lagrange’s equations. Can we
reproduce Hamilton’s equations as well? Let us start with the action (4.7) and express
the Lagrangian in terms of the Hamiltonian using the Legendre transform (3.8):

$$S = \int_{t_1}^{t_2} \left( p_a \dot q_a - H \right) dt. \tag{4.20}$$

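Recall that the Hamiltonian is a function of coordinates and momenta, $H = H(q, p)$.
Let us vary the action, remembering that the δ-symbol behaves like the differential,

$$\delta S = \int_{t_1}^{t_2} \left( p_a\,\delta\dot q_a + \dot q_a\,\delta p_a - \frac{\partial H}{\partial q_a}\,\delta q_a - \frac{\partial H}{\partial p_a}\,\delta p_a \right) dt.$$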

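Now we have three variations $\delta\dot q_a$, $\delta p_a$ and $\delta q_a$. However, they are not independent
because $\delta\dot q_a$ is the time derivative of $\delta q_a$. We can get rid of this term by
integrating by parts,

$$\int_{t_1}^{t_2} p_a\,\delta\dot q_a\,dt = \big[p_a\,\delta q_a\big]_{t_1}^{t_2} - \int_{t_1}^{t_2} \dot p_a\,\delta q_a\,dt,$$

where the first term on the right-hand side vanishes by the boundary conditions $\delta q_a = 0$
for $t = t_1$ and $t = t_2$. Then the variation of the action becomes

$$\delta S = \int_{t_1}^{t_2} \left( -\dot p_a\,\delta q_a + \dot q_a\,\delta p_a - \frac{\partial H}{\partial q_a}\,\delta q_a - \frac{\partial H}{\partial p_a}\,\delta p_a \right) dt.$$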

The variation will be zero for an arbitrary choice of $t_1$ and $t_2$ if the integrand vanishes.
Comparing the coefficients of the independent variations $\delta q_a$ and $\delta p_a$ we recover
Hamilton’s equations

$$\dot q_a = \frac{\partial H}{\partial p_a}, \qquad \dot p_a = -\frac{\partial H}{\partial q_a}. \tag{4.21}$$
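To see equations (4.21) at work, one can integrate them numerically for a concrete Hamiltonian. The sketch below is an illustration under stated assumptions, not a method from the text: it assumes the harmonic oscillator in simplified units, H = p²/2 + q²/2, obtains the partial derivatives of H by central finite differences, and uses a standard fourth-order Runge-Kutta step. The helper names `rhs` and `rk4_step` are hypothetical.

```python
import math

def hamiltonian(q, p):
    """Harmonic oscillator in simplified units (assumed example)."""
    return 0.5 * p * p + 0.5 * q * q

def rhs(q, p, eps=1e-6):
    """Right-hand sides of Hamilton's equations (4.21) via central differences of H."""
    dH_dq = (hamiltonian(q + eps, p) - hamiltonian(q - eps, p)) / (2.0 * eps)
    dH_dp = (hamiltonian(q, p + eps) - hamiltonian(q, p - eps)) / (2.0 * eps)
    return dH_dp, -dH_dq  # (qdot, pdot)

def rk4_step(q, p, h):
    """One classical Runge-Kutta step for the first-order system (qdot, pdot)."""
    k1q, k1p = rhs(q, p)
    k2q, k2p = rhs(q + 0.5 * h * k1q, p + 0.5 * h * k1p)
    k3q, k3p = rhs(q + 0.5 * h * k2q, p + 0.5 * h * k2p)
    k4q, k4p = rhs(q + h * k3q, p + h * k3p)
    q_new = q + h * (k1q + 2.0 * k2q + 2.0 * k3q + k4q) / 6.0
    p_new = p + h * (k1p + 2.0 * k2p + 2.0 * k3p + k4p) / 6.0
    return q_new, p_new

# Initial condition q(0) = 1, p(0) = 0; the exact solution is q = cos t, p = -sin t.
q, p = 1.0, 0.0
h, steps = 0.001, 2000
for _ in range(steps):
    q, p = rk4_step(q, p, h)
t_final = h * steps

print(q, math.cos(t_final))
print(p, -math.sin(t_final))
```

Only the function `hamiltonian` needs to be changed to explore another system with one degree of freedom.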

4.7 Noether’s theorem: motivation


In general, during the evolution of a mechanical system, the quantities characterizing the
system change. Namely, coordinates and velocities (or momenta in the Hamiltonian
formulation) are solutions to the equations of motion and hence they are genuine
(non-trivial) functions of time. However, there are other quantities which are functions of
$q_a$ and $p_a$ but which remain constant during the real evolution. The most familiar example
is energy. We have seen that the Hamiltonian represents the total energy of the system and if
it does not depend on time explicitly, it does not depend on time at all. For example,
the Hamiltonian of the harmonic oscillator is

$$H = \frac{p^2}{2m} + \frac{1}{2}\,m\,\omega^2 q^2$$

where both p and q are functions of time. Nevertheless, for any solution of Hamilton’s
equations, the particular combination of coordinates and momenta given by H is
a constant. In this case we say that the energy is conserved.
Other examples of conserved quantities are momentum and angular momentum.
The total momentum and the total angular momentum of an isolated system are constant
in time.
From the mathematical point of view, the existence of conserved quantities is not
surprising; it is a direct consequence of the properties of differential equations. For
a system with n degrees of freedom we have n Lagrange’s equations of the second
order or 2n Hamilton’s equations of the first order. The solution of a second-order equation
contains two arbitrary constants, so the solution of the complete set of Lagrange’s
equations contains 2n constants. Similarly, the solution of a first-order equation contains one
integration constant, so the solution of the complete set of Hamilton’s equations again
contains 2n constants.
We have arrived at the conclusion that, regardless of the formalism, the solution
of the equations of motion depends on the choice of 2n arbitrary constants $C_1, \dots, C_{2n}$.
Hence, the solution (q, p) of the equations of motion can be written in the form

$$
\begin{aligned}
q_1 &= q_1(t, C_1, \dots, C_{2n}), & p_1 &= p_1(t, C_1, \dots, C_{2n}),\\
&\;\;\vdots & &\;\;\vdots\\
q_n &= q_n(t, C_1, \dots, C_{2n}), & p_n &= p_n(t, C_1, \dots, C_{2n}).
\end{aligned}
$$

This is a system of 2n equations for the constants $C_m$ which can be inverted to obtain

$$
\begin{aligned}
C_1 &= C_1(q_1, \dots, q_n, p_1, \dots, p_n, t),\\
&\;\;\vdots\\
C_{2n} &= C_{2n}(q_1, \dots, q_n, p_1, \dots, p_n, t).
\end{aligned}
$$

In other words, for any solution of Hamilton’s equations there must exist at least
2n functions $C_m$ of coordinates and momenta which are in fact constant and hence
conserved. In this sense the existence of conserved quantities is a purely mathematical
consequence of the fact that solutions of differential equations contain integration
constants. Of course, any combination of the constants $C_m$ is again a constant and
therefore the set of conserved quantities is not unique.
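The inversion of the solution for the integration constants can be made concrete for the harmonic oscillator (n = 1, so there are 2n = 2 constants). The sketch below is an illustration with assumed numerical values: the solution is written as q = A sin(ωt + φ), p = mωA cos(ωt + φ), and the two functions `C1` and `C2` (hypothetical names) recover the amplitude A and the phase φ from q, p and t.

```python
import math

m, omega = 2.0, 1.5   # assumed mass and angular frequency
A, phi = 0.7, 0.4     # the two integration constants of the solution

def solution(t):
    """General solution of the harmonic oscillator for the chosen constants."""
    q = A * math.sin(omega * t + phi)
    p = m * omega * A * math.cos(omega * t + phi)
    return q, p

def C1(q, p, t):
    """Amplitude recovered from the phase-space point (no explicit time dependence)."""
    return math.sqrt(q * q + (p / (m * omega)) ** 2)

def C2(q, p, t):
    """Initial phase recovered from (q, p, t); it depends on time explicitly."""
    phase = math.atan2(m * omega * q, p) - omega * t
    # Wrap the result into the interval (-pi, pi].
    return math.atan2(math.sin(phase), math.cos(phase))

samples = []
for t in (0.0, 0.5, 1.7, 4.2, 9.9):
    q, p = solution(t)
    samples.append((C1(q, p, t), C2(q, p, t)))
    print(t, samples[-1])
```

Both functions return the same values at every sampled time, although q and p themselves change. Note that `C2` contains t explicitly, in agreement with the general form $C_m(q, p, t)$ above.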
There is, however, a much deeper physical interpretation of the existence of conserved
quantities. Some of these conserved quantities reflect properties of space
and time and so they are intimately related to symmetries of Nature. This relation is
the content of the celebrated Noether’s theorem, one of the most fundamental and striking
achievements of modern theoretical physics. The most important consequences
of Noether’s theorem can be found in relativistic quantum field theories, but it has
implications even in the context of classical mechanics. In the following we state and
prove Noether’s theorem, then we show that the conservation of energy, momentum
and angular momentum is a consequence of this theorem. The reader will notice
that the theorem is genuinely based on the variational principle to which this chapter
is devoted.

4.8 Noether’s theorem: proof


When we derived Lagrange’s equations from the action, the idea was to find such
trajectory qa that the action S acquires its extremal value. The variation of the action
was introduced with the help of varied trajectories, recall figure 4.5. The variation
of the trajectory was arbitrary with the only constraint that it must vanish at the
boundary points A and B. Using this constraint we were able to derive equations of
motion, i.e. the Lagrange equations.
Now we proceed differently. We claimed in the previous section that to each sym-
metry of the system there is a conserved quantity. What do we mean by the symmetry
of the system? The simplest example of the symmetry is the invariance with respect
to temporal translation. Isolated systems must be invariant under translations in
time. In other words, if we perform some experiment at time t1 and then the same
experiment at later time t2 > t1 , both experiments must give the same results, if all
conditions remain unchanged.
For example, suppose we study the collision of two particles with initial velocities
$v_1$ and $v_2$ and masses $m_1$ and $m_2$. In addition, we suppose that these particles form
an isolated system, not affected by the laboratory. After the collision we measure
the velocities and find that the new velocities of the particles are $v'_1$ and $v'_2$. The point is
that if the initial velocities and masses do not change, the resulting velocities after the
collision do not depend on the time when the experiment was performed. It does not
matter whether we study the collision on Monday or on Friday, the result must be
the same, independent of time.
[Figure omitted: the trajectory from A to B on the interval (t1, t2) and the shifted trajectory from A′ to B′ on the interval (t1′, t2′).]
Fig. 4.6. Translation of the system in time.

More generally, imagine that $q_a = q_a(t)$ is the real trajectory (i.e. it is a solution
of the equations of motion) which passes point A at time $t_1$ and point B at time $t_2$,
see figure 4.6. If we perform the same experiment at a later time, we can imagine it
as "shifting" the trajectory to the right (in the time direction), so that the new trajectory
starts at point $A'$ at the shifted time $t'_1$ and ends at point $B'$ at time $t'_2$. We say that,
mathematically, we translated the system in time. If all other conditions remain the
same, then the shape of the trajectory cannot change; the particle must move along
the same trajectory but at a later time.
We say that time is homogeneous, i.e. all instants of time are physically equivalent.
Hence, the result of any experiment cannot depend explicitly on the time at which it was
performed: an isolated system must be invariant under translation in time.
Notice that this conclusion does not apply to non-isolated systems. For example,
suppose that we measure the intensity of the sunlight at 8.00 am and at 11.00 pm.
Then the results will be, of course, different! We cannot say that the intensity of the
sunlight is always the same. However, this is related to the fact that Earth is not an
isolated system if one studies the sunlight, because for our measurement it is crucial
that there is an energy coming from Sun to Earth. The conditions which can affect
the experiment are not the same in the morning and before midnight. Hence, the
assumption that the system is isolated is important. In fact, the existence of the Sun and
the rotation of the Earth break the homogeneity of time.
We will not always emphasize it, but in connection with conservation laws, we
will always assume that the system is isolated.
Homogeneity of time is the simplest of the symmetries to be discussed. The next
one is the homogeneity of space. This principle states that the result of an experiment
cannot depend on the place where we perform it. Again, we must add the assumption
that all external conditions are the same. But if this assumption is satisfied, it
does not matter where we perform the experiment. The physics must be invariant
with respect to translation in space; this transformation is shown in figure 4.7.

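[Figure omitted: plot of q against t on the interval (t1, t2), showing the trajectory from A to B starting at q(t1) and the translated trajectory from A′ to B′ starting at q′(t1).]
Fig. 4.7. Translation of the system in space.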

The last of the most important symmetries is the isotropy of space. Isotropy
means that at a given point of space, all directions are equivalent and the result
of any experiment cannot change if we rotate the system by an arbitrary angle.
If the system is invariant with respect to some transformation (translation in time,
translation in space or rotation), the action of this system does not change under this
transformation. Noether’s theorem then implies that each of these symmetries is responsible
for the conservation of some quantity. Homogeneity of time implies the conservation
of energy, homogeneity of space implies the conservation of momentum and
isotropy of space implies the conservation of angular momentum.
Notice that in the previous examples we varied either the trajectory or the time. In
the case of spatial translation, figure 4.7, we did not transform the time, only the
trajectory. However, the boundary points were not fixed because the endpoints of the
trajectory are transformed as well. Thus, in general, the boundary conditions

$$\delta q_a(t_1) = \delta q_a(t_2) = 0$$

must be relaxed. In the case of time translation we did not change the values of the
coordinates $q_a$, but we shifted the trajectory in time and thus we must consider not only
variations of the coordinates $q_a$, but also the variation of time $\delta t$.
All transformations considered above are special cases of the general transformation

$$q(t) \mapsto q'(t) = q(t) + \delta q(t), \qquad t \mapsto t' = t + \delta t(t).$$

Here we explicitly emphasized that the variations $\delta t$ and $\delta q_a$ can depend on time. The
variation $\delta q$ is called the isochronous variation because it is the difference of the varied
coordinate $q'(t)$ and the original coordinate $q(t)$ at the same time. Besides $\delta q_a$ we also
introduce the non-isochronous variation or total variation $\Delta q_a$, defined by

$$\Delta q_a(t) = q'_a(t') - q_a(t).$$

Using the Taylor expansion to first order in the variations we can write

$$\Delta q_a = q'_a(t + \delta t) - q_a(t) = q'_a(t) + \dot q_a\,\delta t - q_a(t) = \delta q_a + \dot q_a\,\delta t. \tag{4.22}$$

Now we are prepared to prove the Noether theorem.


Theorem 4 (Emmy Noether’s theorem). Let S be the action of the system defined by

$$S = \int_{t_1}^{t_2} L(q, \dot q, t)\,dt. \tag{4.23}$$

Let

$$q(t) \mapsto q'(t) = q(t) + \delta q(t), \qquad t \mapsto t' = t + \delta t(t) \tag{4.24}$$

be a transformation of the coordinates $q_a$ and time $t$ which leaves S invariant. Then the
quantity

$$Q = p_a\,\Delta q_a - E\,\delta t \tag{4.25}$$

is constant during the evolution of the system whenever $q_a$ is a solution of the equations
of motion, where $p_a$ are the generalized momenta and E is the generalized energy of the
system, defined by

$$p_a = \frac{\partial L}{\partial \dot q_a}, \qquad E = p_a \dot q_a - L. \tag{4.26}$$
Proof. By assumption, the action (4.23) is invariant under the transformation (4.24).
The action associated with the varied trajectory $q'_a$ and the varied time $t'$ is

$$S' = \int_{t_1+\delta t_1}^{t_2+\delta t_2} L(q'(t), \dot q'(t), t)\,dt \tag{4.27}$$

where we use the notation $\delta t_1 = \delta t(t_1)$ and $\delta t_2 = \delta t(t_2)$ for brevity. Notice that the
time translation $\delta t$ affects only the integration bounds, not the integrand. The total
variation of the action is then $\Delta S = S' - S$, which is zero by the assumption of invariance
of the action:

$$\Delta S = S' - S = 0. \tag{4.28}$$

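Using the additivity of the integral we can rewrite the varied action as

$$
\begin{aligned}
S' = \int_{t_1+\delta t_1}^{t_2+\delta t_2}
&= \int_{t_1+\delta t_1}^{t_2} + \int_{t_2}^{t_2+\delta t_2}
 = -\int_{t_2}^{t_1+\delta t_1} + \int_{t_2}^{t_2+\delta t_2}
 = -\int_{t_2}^{t_1} - \int_{t_1}^{t_1+\delta t_1} + \int_{t_2}^{t_2+\delta t_2} \\
&= \int_{t_1}^{t_2} L(q', \dot q', t)\,dt - \int_{t_1}^{t_1+\delta t_1} L(q', \dot q', t)\,dt + \int_{t_2}^{t_2+\delta t_2} L(q', \dot q', t)\,dt
\end{aligned}
$$

where we have omitted the integrand in the intermediate steps. Hence, the total variation
of the action reads

$$\Delta S = \underbrace{\int_{t_1}^{t_2} \big[ L(q', \dot q', t) - L(q, \dot q, t) \big]\,dt}_{\Delta S_1}
 + \underbrace{\int_{t_2}^{t_2+\delta t_2} L(q', \dot q', t)\,dt - \int_{t_1}^{t_1+\delta t_1} L(q', \dot q', t)\,dt}_{\Delta S_2}. \tag{4.29}$$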

Now we are in position to expand these integrals in variations δqa and δt assuming
they are infinitesimal and hence neglecting higher order terms. This is ”legal” because
in the definition of the variation it was assumed that after variation, all quantities
will be evaluated at δqa = δt = 0, so only the first order terms enter the result.
First we express the variation denoted by $\Delta S_1$ in the equation above. The
Lagrangian is evaluated on different trajectories but at the same time and so the
expression under the integral is the isochronous variation of the Lagrangian:

$$\Delta S_1 = \int_{t_1}^{t_2} \delta L\,dt = \int_{t_1}^{t_2} \left( \frac{\partial L}{\partial q_a}\,\delta q_a + \frac{\partial L}{\partial \dot q_a}\,\delta\dot q_a \right) dt.$$

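The second term can be integrated by parts to find

$$\Delta S_1 = \left[ \frac{\partial L}{\partial \dot q_a}\,\delta q_a \right]_{t_1}^{t_2} + \int_{t_1}^{t_2} \left( \frac{\partial L}{\partial q_a} - \frac{d}{dt}\frac{\partial L}{\partial \dot q_a} \right) \delta q_a\,dt.$$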

We arrived at the same expression when we derived Lagrange’s equations from the
variational principle but now the interpretation is different. There we assumed that
boundary points of the trajectory are fixed and so we assumed δqa (t1 ) = δqa (t2 ) = 0.
By this assumption, the first term in square brackets vanished and hence we deduced
that in order to satisfy δS = 0, the Lagrange equations must hold. But now the
boundary conditions are not fixed because we consider the transformation of the
system. However, we assume that the equations of motion are satisfied and therefore
the second term vanishes! Consequently, the only contribution from $\Delta S_1$ to the total
variation is merely

$$\Delta S_1 = \left[ \frac{\partial L}{\partial \dot q_a}\,\delta q_a \right]_{t_1}^{t_2}.$$

Next we evaluate the variation $\Delta S_2$ in the expression (4.29). Recall that we are
expanding all quantities up to the first order in the variations $\delta q_a$ and $\delta t$. Thus, for example,
the first integral in $\Delta S_2$ is

$$\int_{t_2}^{t_2+\delta t_2} L(q', \dot q', t)\,dt = \int_{t_2}^{t_2+\delta t_2} L(q, \dot q, t)\,dt + O\!\left(\delta q^2\right).$$

Now we can expand the Lagrangian into a series in t as

$$L(t) = L(t_2) + \dot L(t_2)\,(t - t_2) + O\!\left((t - t_2)^2\right).$$

Since the integral is taken over the interval $(t_2, t_2 + \delta t_2)$, the inequality

$$t - t_2 < \delta t_2$$

holds and therefore we can write

$$L(t) = L(t_2) + \dot L(t_2)\,(t - t_2) + O\!\left((\delta t_2)^2\right).$$



Then the integral reads

$$\int_{t_2}^{t_2+\delta t_2} L(t)\,dt = \big[L(t_2)\,t\big]_{t_2}^{t_2+\delta t_2} + O\!\left((\delta t_2)^2\right) = L(t_2)\,\delta t_2 + O\!\left((\delta t_2)^2\right).$$

Neglecting the quadratic terms we arrive at

$$\int_{t_2}^{t_2+\delta t_2} L(q', \dot q', t)\,dt = L(q(t_2), \dot q(t_2), t_2)\,\delta t_2.$$

By the same reasoning we can derive

$$\int_{t_1}^{t_1+\delta t_1} L(q', \dot q', t)\,dt = L(q(t_1), \dot q(t_1), t_1)\,\delta t_1.$$

We can conclude that the total variation $\Delta S_2$ is equal to

$$\Delta S_2 = L(q(t_2), \dot q(t_2), t_2)\,\delta t_2 - L(q(t_1), \dot q(t_1), t_1)\,\delta t_1 = \big[L\,\delta t\big]_{t_1}^{t_2}.$$

Summa summarum, the total variation of the action reads

$$\Delta S = \left[ \frac{\partial L}{\partial \dot q_a}\,\delta q_a + L\,\delta t \right]_{t_1}^{t_2}. \tag{4.30}$$

Using the definition of the generalized momentum (3.2) and the relation between the isochronous
and the total variation (4.22) we find

$$\Delta S = \big[ p_a\,\Delta q_a - (p_a \dot q_a - L)\,\delta t \big]_{t_1}^{t_2}. \tag{4.31}$$

The coefficient of the variation $\delta t$ is in fact equal to the Hamiltonian (3.8). The
reason why we do not denote it by H is that the Hamiltonian should be expressed
as a function of $q_a$ and $p_a$, which is not our case. But we know that the Hamiltonian is
equal to the total energy and hence we define the generalized energy by

$$E = p_a \dot q_a - L$$

so that the total variation of the action becomes

$$\Delta S = \big[ p_a\,\Delta q_a - E\,\delta t \big]_{t_1}^{t_2}. \tag{4.32}$$


Finally, let us denote the expression in square brackets by Q:

$$Q = p_a\,\Delta q_a - E\,\delta t. \tag{4.33}$$

The total variation is then

$$\Delta S = \big[Q\big]_{t_1}^{t_2} = Q(t_2) - Q(t_1).$$

Now, by (4.28) we have $\Delta S = 0$ and hence

$$Q(t_1) = Q(t_2). \tag{4.34}$$

Since the times $t_1$ and $t_2$ can be chosen arbitrarily, the value of Q at an arbitrary time $t_1$
is equal to the value of Q at an arbitrary time $t_2$. In other words, Q acquires the same value
at each time and hence Q is a conserved quantity,

$$Q = \text{constant}.$$
Nevertheless, Q is not our final expression for the conserved quantity, because it
contains the variations $\Delta q_a$ and $\delta t$ and hence it depends on the particular transformation.
We have to clarify the nature of the variations further. If we say that the system is
invariant under, for example, translations, we actually mean that it is invariant under
an arbitrary translation. The translation in, say, the x-direction can be understood as a
continuous transformation parametrized by a parameter a,

$$x \mapsto x + a.$$

For a = 0 we have the identity transformation $x \mapsto x$. Since a is a continuous
parameter, the transformation $x \mapsto x + a$ is also continuous in the variable a.
This concludes the proof of Noether’s theorem. □

4.9 Basic conservation laws


In the previous section we proved Noether’s theorem for a general transformation
of the system generated by the infinitesimal variations $\Delta q_a$ and $\delta t$. We proved that
for such a general transformation, the quantity (4.25) given by

$$Q = p_a\,\Delta q_a - E\,\delta t$$

is conserved. In this section we investigate the implications of Noether’s theorem
regarding the basic symmetries of space and time discussed above: homogeneity of
space and time and the isotropy of space.
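A small numerical experiment can illustrate what formula (4.25) predicts in the two simplest cases. The sketch below rests on assumptions that are not in the text: two particles on a line coupled by a spring, with potential ½ k (x1 − x2)², and a velocity Verlet integrator. This system is invariant under the spatial translation $\Delta q_a = a$, $\delta t = 0$, for which Q = a (p1 + p2), and under the time translation $\Delta q_a = 0$, $\delta t = \varepsilon$, for which Q = −ε E. Hence both the total momentum and the energy should stay constant along the computed trajectory.

```python
m1, m2, k = 1.0, 2.0, 3.0  # assumed masses and spring constant

def forces(x1, x2):
    """Forces derived from V = k (x1 - x2)^2 / 2; they are equal and opposite."""
    f1 = -k * (x1 - x2)
    return f1, -f1

def charges(x1, x2, p1, p2):
    """Total momentum P and energy E, the candidates for conserved quantities."""
    energy = p1 * p1 / (2.0 * m1) + p2 * p2 / (2.0 * m2) + 0.5 * k * (x1 - x2) ** 2
    return p1 + p2, energy

def evolve(x1, x2, p1, p2, h, steps):
    """Velocity Verlet integration; returns the history of (P, E)."""
    history = [charges(x1, x2, p1, p2)]
    f1, f2 = forces(x1, x2)
    for _ in range(steps):
        p1 += 0.5 * h * f1
        p2 += 0.5 * h * f2
        x1 += h * p1 / m1
        x2 += h * p2 / m2
        f1, f2 = forces(x1, x2)
        p1 += 0.5 * h * f1
        p2 += 0.5 * h * f2
        history.append(charges(x1, x2, p1, p2))
    return history

history = evolve(0.0, 1.0, 0.5, -0.2, 1e-3, 5000)
P0, E0 = history[0]
max_dP = max(abs(P - P0) for P, _ in history)
max_dE = max(abs(E - E0) for _, E in history)
print("largest change of total momentum:", max_dP)
print("largest change of energy:", max_dE)
```

The momentum is conserved up to rounding errors, while the energy is conserved up to the small discretization error of the integrator, which shrinks when the step h is reduced.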
5 Hamilton-Jacobi equation

In previous chapters we found two alternative formulations of Newton’s laws of motion,
namely the Lagrange and the Hamiltonian formulation. Lagrange’s equations
are formulated in arbitrary coordinate system. Their main advantage is that by an
appropriate choice of the coordinates we can eliminate the constraints which compli-
cate the analysis. Similarly to Newton’s law, Lagrange’s equations are second order
equations. Hamilton’s equations are also coordinate-independent but, in addition,
they have the form of first order differential equations. In general, first order equa-
tions are easier to solve. In the case of Hamilton’s equations, this advantage is only
formal because in order to solve the system of Hamilton’s equations we usually have
to convert them back to second-order equations. Main advantage of Hamilton’s equa-
tions is that we can interpret the motion of particles as the motion in the phase space.
We have seen that the conservation of energy allows us to find the phase trajectories
even without solving the equations of motion.
In this chapter we start with the analysis of such coordinate transformations
which leave the form of Hamilton’s equations invariant, the so-called canonical
transformations. Then we study the possibility of finding such transformations which
simplify Hamilton’s equations so that they can be solved easily. We will see that
this is indeed possible if we solve the Hamilton-Jacobi equation. In many situations,
the Hamilton-Jacobi equation can be solved exactly and the solution of Hamilton’s
equations simplifies significantly. The analysis of the Hamilton-Jacobi equation will lead us to a
new, third formulation of classical mechanics. Finally we introduce action-angle
variables which will be useful in the analysis of more complicated systems with periodic
behaviour.

5.1 Canonical transformations


In Hamilton’s formalism we treat coordinates $q_a$ and momenta $p_a$ as independent
variables. Let us investigate such transformations which do not change the form of
Hamilton’s equations

$$\dot q_a = \frac{\partial H}{\partial p_a}, \qquad \dot p_a = -\frac{\partial H}{\partial q_a}. \tag{5.1}$$

Hence, we are interested in transformations

$$Q_a = Q_a(q, p), \qquad P_a = P_a(q, p), \tag{5.2}$$

preserving equations (5.1), i.e. such that the new equations of motion will be

$$\dot Q_a = \frac{\partial H'}{\partial P_a}, \qquad \dot P_a = -\frac{\partial H'}{\partial Q_a}. \tag{5.3}$$
In chapter 4 we have seen that the Lagrangian is not determined uniquely, so that we
can add arbitrary function which is a total time-derivative to a Lagrangian without
affecting the equations of motion, recall equation (4.19).
Suppose that we have the Lagrangian $L = L(q, \dot q)$ and the corresponding Hamiltonian

$$H = \dot q_a p_a - L.$$

Then we perform the transformation (5.2) to new coordinates $Q_a$ and new momenta $P_a$
and obtain a new Lagrangian $L' = L'(Q, \dot Q)$ with the associated Hamiltonian

$$H' = \dot Q_a P_a - L'.$$

We require that both Lagrangians yield the same equations of motion. Then, by
(4.19), the two Lagrangians can differ only by a total derivative of some function F with
respect to time,

$$L = L' + \frac{dF}{dt},$$

where the sign of F is a matter of convention. In terms of the Hamiltonians this means

$$\dot q_a p_a - H = \dot Q_a P_a - H' + \frac{dF}{dt}. \tag{5.4}$$
In general, the function F depends on the old and new coordinates and momenta and possibly
on time,

$$F = F(q_1, \dots, q_n, p_1, \dots, p_n, Q_1, \dots, Q_n, P_1, \dots, P_n, t) \equiv F(q, p, Q, P, t),$$

i.e. it is a function of 4n + 1 variables. But these variables are not all independent
as they are constrained by the 2n equations (5.2). Hence, F is a function of 2n + 1
independent variables and we can decide which variables will be independent.
Transformations (5.2) are called canonical and the function F is called the generating function of
the canonical transformation (5.2).
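Let us choose a generating function $F_1$ which is a function of the old and transformed
coordinates (and possibly of time),

$$F_1 = F_1(q, Q, t). \tag{5.5}$$

Its total derivative with respect to time is

$$\frac{dF_1}{dt} = \frac{\partial F_1}{\partial q_a}\,\dot q_a + \frac{\partial F_1}{\partial Q_a}\,\dot Q_a + \frac{\partial F_1}{\partial t}. \tag{5.6}$$

Substituting this expression into (5.4) and comparing the coefficients of the independent
derivatives $\dot q_a$ and $\dot Q_a$, respectively, we find

$$p_a = \frac{\partial F_1}{\partial q_a}, \qquad P_a = -\frac{\partial F_1}{\partial Q_a}, \qquad H' = H + \frac{\partial F_1}{\partial t}. \tag{5.7}$$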

Hence, we can define an arbitrary function $F_1$ of $q_a$ and $Q_a$ and, using relations (5.7),
we can find the transformation which the function $F_1$ generates. The equation

$$p_a = \frac{\partial}{\partial q_a} F_1(q, Q, t)$$

can be used to find the defining relation for $Q_a$, i.e. we can solve this equation to find

$$Q_a = Q_a(q, p, t).$$

This result can be substituted into the equation

$$P_a = -\frac{\partial}{\partial Q_a} F_1(q, Q, t)$$

which can then be solved to find the relation

$$P_a = P_a(q, p, t).$$
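As a quick illustration of this procedure (an example chosen here, not taken from the text), consider the hypothetical generating function $F_1 = q_a Q_a$. Relations (5.7) give the transformation directly, and one can check by hand that Hamilton's equations keep their form:

```latex
\begin{align*}
F_1 &= q_a Q_a, \\
p_a &= \frac{\partial F_1}{\partial q_a} = Q_a, \qquad
P_a = -\frac{\partial F_1}{\partial Q_a} = -q_a, \qquad
H' = H + \frac{\partial F_1}{\partial t} = H, \\
\dot Q_a &= \dot p_a = -\frac{\partial H}{\partial q_a} = \frac{\partial H'}{\partial P_a}, \qquad
\dot P_a = -\dot q_a = -\frac{\partial H}{\partial p_a} = -\frac{\partial H'}{\partial Q_a}.
\end{align*}
```

In the last line we used $q_a = -P_a$ and $p_a = Q_a$, so that $\partial/\partial P_a = -\partial/\partial q_a$ and $\partial/\partial Q_a = \partial/\partial p_a$. Up to a sign, this transformation simply exchanges coordinates and momenta, which shows that in the Hamiltonian formalism the distinction between them is a matter of convention.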
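Sometimes it is useful to define a generating function which depends on the variables $q_a$
and $P_a$. This can be achieved using the familiar Legendre transformation. Let us write
the differential of $F_1$ with the help of equations (5.7):

$$
\begin{aligned}
dF_1 &= \frac{\partial F_1}{\partial q_a}\,dq_a + \frac{\partial F_1}{\partial Q_a}\,dQ_a + \frac{\partial F_1}{\partial t}\,dt \\
&= p_a\,dq_a - P_a\,dQ_a + \frac{\partial F_1}{\partial t}\,dt \\
&= p_a\,dq_a - d(Q_a P_a) + Q_a\,dP_a + \frac{\partial F_1}{\partial t}\,dt.
\end{aligned}
$$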
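Let us define the function

$$F_2 = F_1 + Q_a P_a. \tag{5.8}$$

Its differential reads

$$dF_2 = p_a\,dq_a + Q_a\,dP_a + \frac{\partial F_1}{\partial t}\,dt \tag{5.9}$$

which means that $F_2$ is a function of $q_a$ and $P_a$,

$$F_2 = F_2(q, P, t), \tag{5.10}$$

and, in addition, the transformation generated by the function $F_2$ is

$$Q_a = \frac{\partial F_2}{\partial P_a}, \qquad p_a = \frac{\partial F_2}{\partial q_a}, \qquad H' = H + \frac{\partial F_2}{\partial t}. \tag{5.11}$$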
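Two simple choices of $F_2$ may help to see how relations (5.11) are used. Both are illustrative examples added here, with $f_a(q)$ denoting arbitrary (hypothetical) invertible functions of the old coordinates:

```latex
\begin{align*}
F_2 &= q_a P_a: &
Q_a &= \frac{\partial F_2}{\partial P_a} = q_a, &
p_a &= \frac{\partial F_2}{\partial q_a} = P_a, \\
F_2 &= f_a(q)\,P_a: &
Q_a &= \frac{\partial F_2}{\partial P_a} = f_a(q), &
p_a &= \frac{\partial F_2}{\partial q_a} = \frac{\partial f_b}{\partial q_a}\,P_b .
\end{align*}
```

The first choice generates the identity transformation. The second one shows that any change of coordinates $Q_a = f_a(q)$ is canonical, provided the momenta are transformed accordingly.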

5.2 Hamilton-Jacobi equation


Canonical transformations preserve the equations of motion. Let us find such a canonical
transformation that Hamilton’s equations simplify as much as possible so that
we can solve them explicitly. We introduce a generating function of type (5.10) but we
will denote it by S:

$$S = S(q, P, t).$$

From (5.11) we have

$$Q_a = \frac{\partial S}{\partial P_a}, \qquad p_a = \frac{\partial S}{\partial q_a}, \qquad H' = H + \frac{\partial S}{\partial t}. \tag{5.12}$$

In order to simplify Hamilton’s equations, let us put $H' = 0$, so that S satisfies the
equation

$$H + \frac{\partial S}{\partial t} = 0. \tag{5.13}$$

Hamilton’s equations (5.3) with $H' = 0$ then imply

$$\dot Q_a = 0, \qquad \dot P_a = 0. \tag{5.14}$$

In other words, the transformed coordinates and momenta are constant. Equations (5.14)
can be solved trivially,

$$Q_a = \alpha_a, \qquad P_a = \beta_a, \tag{5.15}$$

where the integration constants $\alpha_a$ and $\beta_a$ are the constant values of the transformed
coordinates and momenta. Then the generating function can be written as

$$S = S(q, \beta, t) \tag{5.16}$$

and equations (5.12) acquire the form

$$\alpha_a = \frac{\partial S}{\partial \beta_a}, \qquad p_a = \frac{\partial S}{\partial q_a}, \qquad H + \frac{\partial S}{\partial t} = 0. \tag{5.17}$$
Thus, if we want to find the canonical transformation which simplifies Hamilton’s
equations, we first solve the equations

$$p_a = \frac{\partial S}{\partial q_a}, \qquad H(q, p, t) + \frac{\partial S}{\partial t} = 0.$$

Notice that the first equation is merely a definition of $p_a$ so the only equation which
must in fact be solved is

$$H\!\left(q, \frac{\partial S}{\partial q}, t\right) + \frac{\partial S}{\partial t} = 0. \tag{5.18}$$

This equation for the generating function S is known as the Hamilton-Jacobi equation.
The Hamilton-Jacobi equation contains n + 1 first derivatives of S (with respect to
$q_1, \dots, q_n$ and t) and therefore its complete solution contains n + 1 constants. One of
them is additive, for obviously any function $S' = S + c$, where c is constant, is also a
solution to (5.18). This constant can be set to zero without loss of generality because
the Hamilton-Jacobi equation contains only derivatives of S. Hence, the solution will
contain n constants:

$$S = S(q_1, \dots, q_n, c_1, \dots, c_n, t). \tag{5.19}$$

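This result should be compared to equation (5.16) where $\beta_a$ are constant momenta.
Our aim was to arrive at Hamilton’s equations in the form (5.14), so in order to
identify the constants $c_a$ with the momenta $\beta_a$ we have to show that the coordinates derived
from the generating function (5.19) via (5.12) are indeed constant. We have

$$Q_a = \frac{\partial S}{\partial c_a}$$

and using Hamilton’s equations and the Hamilton-Jacobi equation we find

$$
\begin{aligned}
\dot Q_a = \frac{d}{dt}\left(\frac{\partial S}{\partial c_a}\right)
&= \frac{\partial}{\partial q_b}\left(\frac{\partial S}{\partial c_a}\right)\dot q_b + \frac{\partial}{\partial t}\left(\frac{\partial S}{\partial c_a}\right)
 = \frac{\partial}{\partial c_a}\left(\frac{\partial S}{\partial q_b}\right)\dot q_b + \frac{\partial}{\partial c_a}\left(\frac{\partial S}{\partial t}\right) \\
&= \frac{\partial p_b}{\partial c_a}\,\frac{\partial H}{\partial p_b} - \frac{\partial H}{\partial c_a}.
\end{aligned}
$$

Now we use the fact that the Hamiltonian H depends on the constants $c_a$ only through
the generating function S,

$$\frac{\partial}{\partial c_a}\, H\Big(q, \underbrace{\frac{\partial S(q, c, t)}{\partial q}}_{p}, t\Big) = \frac{\partial H}{\partial p_b}\,\frac{\partial p_b}{\partial c_a},$$

so that the expression for $\dot Q_a$ actually reduces to zero:

$$\dot Q_a = 0.$$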

Hence, we have proved that a function S which is a solution to the Hamilton-Jacobi
equation generates a canonical transformation after which the coordinates $Q_a$ are constant,
and we denote them by $\alpha_a = Q_a$ as we did above. Then we can identify the unknown
constants $c_a$ in the function S with the constant momenta, $P_a = c_a = \beta_a$. Hamilton’s equations
in the transformed coordinates thus read

$$\dot Q_a = 0, \qquad \dot P_a = 0,$$

as desired.

5.3 Example: harmonic oscillator


The procedure explained in the previous section may seem somewhat abstract
and it could be useful to see how it works on our favorite example, the harmonic
oscillator. Let us, for simplicity, take the Hamiltonian in the form

$$H(q, p) = \frac{p^2}{2} + \frac{q^2}{2}.$$

In order to formulate the Hamilton-Jacobi equation, we replace the momentum p by
the derivative of the generating function S,

$$p = \frac{\partial S}{\partial q},$$

in accordance with (5.12). The Hamilton-Jacobi equation (5.18) then reads

$$\frac{\partial S}{\partial t} + H\!\left(q, \frac{\partial S}{\partial q}\right) = 0.$$
Let us put

$$S = A(t) + W(q)$$

where A is a function of time t only and W is time-independent. Then the Hamilton-Jacobi
equation acquires the form

$$\frac{\partial A}{\partial t} = -H\!\left(q, \frac{\partial W}{\partial q}\right).$$

We know that the Hamiltonian H is constant and equal to the total energy E, so that

$$\frac{\partial A}{\partial t} = -E,$$

which integrates to $A = -Et$, and the generating function can be written in the form

$$S(q, E, t) = -E\,t + W(q).$$

The energy E is the first integration constant; in the notation of the previous section we
write $\beta = E$. The Hamilton-Jacobi equation is now

$$H\!\left(q, \frac{\partial W}{\partial q}\right) = E.$$
Using the particular form of the Hamiltonian we arrive at the equation

$$\frac{1}{2}\big(W'(q)\big)^2 + \frac{1}{2}\,q^2 = E,$$

and after rearrangement,

$$dW = \sqrt{2E - q^2}\;dq.$$

This is an elementary integral and can be evaluated easily but with some work (or
using Mathematica). The result is

$$W(q, E) = E \arcsin\frac{q}{\sqrt{2E}} + \frac{1}{2}\,q\,\sqrt{2E - q^2},$$

where the additive integration constant has been set to zero (see footnote 1).
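The closed form of W can be checked without redoing the integral: its derivative with respect to q must satisfy the reduced Hamilton-Jacobi equation ½ W′(q)² + ½ q² = E. The following sketch does this numerically with a central finite difference. The value of E and the sample points are arbitrary assumptions, restricted to |q| < √(2E) where the formula is defined.

```python
import math

def W(q, E):
    """Hamilton's characteristic function quoted above, valid for |q| < sqrt(2E)."""
    return E * math.asin(q / math.sqrt(2.0 * E)) + 0.5 * q * math.sqrt(2.0 * E - q * q)

def dW_dq(q, E, eps=1e-6):
    """Central finite-difference approximation of the derivative of W with respect to q."""
    return (W(q + eps, E) - W(q - eps, E)) / (2.0 * eps)

E = 1.3  # assumed energy; the turning points are at q = +-sqrt(2.6)
residuals = []
for q in (-1.2, -0.5, 0.0, 0.4, 1.0, 1.5):
    lhs = 0.5 * dW_dq(q, E) ** 2 + 0.5 * q * q
    residuals.append(abs(lhs - E))
    print(q, lhs)
```

Every printed value should agree with E up to the accuracy of the finite difference.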
Since we have identified the integration constant E with the constant momentum $P = \beta$,
we can use relation (5.17),

$$\alpha = \frac{\partial S}{\partial \beta},$$

to obtain the constant transformed coordinate $Q = \alpha$. By differentiating $S = -Et + W$
we find

$$\alpha = -t + \frac{\partial W}{\partial E} = -t + \arcsin\frac{q}{\sqrt{2E}}.$$

We have proved in the previous section that α must be constant, so we can use the last
equation to express q as a function of time t:

$$q = \sqrt{2E}\,\sin(\alpha + t),$$

which is the usual solution to the equation of the harmonic oscillator.
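The constancy of the transformed coordinate α can be verified directly. In the minimal sketch below (the values of E and α are arbitrary assumptions), we evaluate −t + arcsin(q/√(2E)) along the solution q(t) = √(2E) sin(α + t). Because arcsin returns the principal value, the check is restricted to times with |α + t| < π/2.

```python
import math

E, alpha = 0.8, 0.3  # assumed energy and constant transformed coordinate

def q_of_t(t):
    """Solution of the harmonic oscillator obtained from the Hamilton-Jacobi method."""
    return math.sqrt(2.0 * E) * math.sin(alpha + t)

def alpha_from(q, t):
    """Transformed coordinate alpha = -t + dW/dE evaluated at the point (q, t)."""
    return -t + math.asin(q / math.sqrt(2.0 * E))

# Times chosen so that alpha + t stays on the principal branch of arcsin.
times = [0.0, 0.2, 0.5, 0.9, 1.2]
recovered = [alpha_from(q_of_t(t), t) for t in times]
print(recovered)
```

All recovered values coincide with the chosen α, while q itself changes from sample to sample.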


Footnote 1: In order to perform the integration, use the substitution $q = \sqrt{2E}\,\sin x$ to obtain
$\int 2E \cos^2 x\,dx$. Then use the trigonometric formula $\cos^2 x = (1 + \cos 2x)/2$ and perform the
trivial integration. Finally, return to the variable q by inverting the relation for x and use the formula
$\sin 2x = 2 \sin x \cos x$ where $\sin x = q/\sqrt{2E}$ and $\cos x = \sqrt{1 - \sin^2 x}$.

5.4 Action-angle variables


Let us continue with our analysis of the harmonic oscillator. An important class of
systems is described by so-called integrable Hamiltonians, a term to be defined later.
Before we discuss the integrability of the system, we need to introduce a new set of
canonically conjugated variables known as action-angle variables.
In the previous section we have seen that if the Hamiltonian is time-independent,
the action S can be written in the form

$$S = -E\,t + W(q, E).$$

The function W is called Hamilton’s characteristic function and it depends on the
coordinate q and the total energy E. Now we are going to use this function as a generating
function for a canonical transformation.
We know that the harmonic oscillator moves in a periodic way and its phase trajectories
are ellipses (or circles when we use simplified units, as we do in this chapter)
in the phase space. In other words, its phase trajectories are always closed curves.
Hence, it makes sense to define a new momentum called the action variable by

$$J = \oint p\,dq \tag{5.20}$$

where the integral is taken along the orbit of the oscillator, i.e. along the circle. We
said that J will be treated as a momentum, which means that we identify the transformed
momentum β with the action variable J. Recall that the Hamiltonian is equal to the total
energy,

$$H(q, p) = E,$$

which can be inverted to find

$$p = p(q, E).$$

Hence, the integrand of (5.20) depends on q and E. But since we integrate over the
variable q, the integral does not depend on q anymore and we have

$$J = J(E) \quad\text{or}\quad E = E(J).$$

Consequently, we can write Hamilton’s characteristic function as a function of q
and J:

$$W = W(q, J).$$
According to (5.11), the coordinate Q conjugated to the momentum P is the derivative of
the generating function with respect to the momentum. In our case W is the generating
function, J plays the rôle of the momentum and the conjugated coordinate will be called
the angle variable and defined by

$$w = \frac{\partial W}{\partial J}. \tag{5.21}$$

Because the generating function W does not depend on time explicitly, by (5.11) we
have $H' = H$ and since canonical transformations preserve the form of Hamilton’s
equations, the equation of motion in terms of action-angle variables is simply

$$\dot w = \frac{\partial H}{\partial J}. \tag{5.22}$$

In the case of the harmonic oscillator we have

$$H = \frac{p^2}{2} + \frac{q^2}{2} = E$$

so that $p = \sqrt{2E - q^2}$.
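With this expression for p, the action variable (5.20) can be evaluated numerically. The sketch below is an illustration under the stated conventions, not a calculation from the text: the orbit is the circle of radius √(2E), parametrized (as an assumption) by q = √(2E) sin θ, p = √(2E) cos θ, and the closed integral is approximated by a Riemann sum over one period. With the normalization of (5.20), which carries no factor 1/2π, the result is the area of the circle, J = 2πE, so that E(J) = J/2π and, by (5.22), the angle variable grows at the constant rate 1/2π.

```python
import math

def action_variable(E, n=2000):
    """J = closed integral of p dq over one period for H = p^2/2 + q^2/2 = E."""
    r = math.sqrt(2.0 * E)       # radius of the circular phase trajectory
    h = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        theta = i * h
        p = r * math.cos(theta)           # momentum on the orbit
        dq_dtheta = r * math.cos(theta)   # derivative of q = r sin(theta)
        total += p * dq_dtheta * h
    return total

energies = [0.5, 1.0, 2.0]
J_values = [action_variable(E) for E in energies]
for E, J in zip(energies, J_values):
    print(E, J, 2.0 * math.pi * E)

# Finite-difference estimate of dE/dJ, i.e. of the rate of change of the angle variable.
w_dot = (energies[2] - energies[0]) / (J_values[2] - J_values[0])
print(w_dot, 1.0 / (2.0 * math.pi))
```

Since the angle variable grows at the rate 1/2π and the period of the oscillator in these units is 2π, w increases by exactly one during each period.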
6 Electromagnetic field

Lagrange’s equations and Hamilton’s equations have been derived from Newton’s
law under assumption that the force which acts on the particle is conservative, i.e.
it can be written as a gradient of the potential,

F = −∇V.

In this case we can define the Lagrangian L = T − V as the difference of kinetic and
potential energy and consequently we can introduce the Hamiltonian. On the other
hand, when we derived Lagrange’s equations from the variational principle, we just
assumed that the system can be described by some Lagrangian L without assuming
the conservative nature of the forces explicitly. We have only argued that if the force
is conservative and thus has a potential V , then the natural choice is L = T −V . This
approach, however, does not exclude the possibility that the system can be described
by some function L even if the force is not conservative.
Particle in electromagnetic field is the most important practical example of such
system. In physics we often study the motion of charged particles in external elec-
tromagnetic fields but we do not care how these fields emerged. Hence, we do not
study the dynamics of the fields, we merely assume that these fields are given and
investigate the motion of particles in regions where electromagnetic fields are present.
In the past people thought that electricity and magnetism are two different phe-
nomena while today we know that they are just two different aspects of single entity
called electromagnetic field. Electric part of the field is described by vector field E
(sometimes called electric field strength or electric intensity) and magnetic part of
the field is described by vector field B (sometimes called magnetic induction). Fully
unified view of these fields as parts of electromagnetic field is possible only in the
framework of special theory of relativity. Let us elucidate the meaning of fields E
and B.
Recall that in classical Newtonian theory of gravitation, the sources of gravitational
force are masses: gravitational force between two point masses m and m′ is
proportional to the product m m′ and is given by

$$ F = G \frac{m m'}{r^2} $$

where r is the distance between the points and G is the gravitational constant.
The numerical value of constant G in standard SI units is

$$ G = 6.674 \times 10^{-11}\ \mathrm{m^3\,kg^{-1}\,s^{-2}}. $$

Similarly, the sources of electromagnetic interaction are charges, i.e. charged par-
ticles. Charge is usually denoted by symbol q or e and it can be either positive or
negative. Particles with vanishing charge are called neutral. It is a remarkable fact
that for two point charges at rest, the electric force of their interaction is given by
the Coulomb law which is formally identical to Newton’s law of gravitation. Two
point charges q and q′ at mutual distance r act on each other by electric force of
magnitude

$$ F = k \frac{q q'}{r^2} \tag{6.1} $$
where k is the constant characterizing the strength of electromagnetic interaction and
plays the rôle similar to that of gravitational constant G in Newton’s law. Numerical
value of constant k depends on the system of units we use. In standard SI units we
write k in the form
$$ k = \frac{1}{4\pi\varepsilon_0} $$
where ε0 is called permittivity of the vacuum and its value is

$$ \varepsilon_0 = 8.854 \times 10^{-12}\ \mathrm{F\,m^{-1}} $$

so that the constant k is

$$ k = 8.99 \times 10^{9}\ \mathrm{F^{-1}\,m}. $$

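The constants k and G carry different units and so cannot be compared directly, but
the ratio of the two forces for a given pair of particles is a pure number. For two
electrons the electric force exceeds the gravitational one by a factor of order 10^42,
so the electric force is much, much stronger than gravitational force.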
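To make the comparison of both forces concrete, we can evaluate their ratio for a particular pair of particles; the distance r cancels and the result is a pure number. The following minimal sketch does so for two electrons. The electron charge and mass are rounded reference values which do not appear in the text above, and the local names k, g, qe and me are chosen only for this illustration.

```mathematica
(* Ratio of the Coulomb force to the gravitational force for two electrons. *)
(* Both forces fall off as 1/r^2, so the distance r cancels in the ratio.   *)
Module[{k = 8.99*^9,      (* Coulomb constant, SI units           *)
        g = 6.674*^-11,   (* gravitational constant, SI units     *)
        qe = 1.602*^-19,  (* electron charge in C (assumed value) *)
        me = 9.109*^-31}, (* electron mass in kg (assumed value)  *)
  k qe^2/(g me^2)
]
```

With these rounded inputs the ratio comes out of order 10^42.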
However, simple Coulomb’s law (6.1) holds only for charges at rest. When the
charges start to move in an arbitrary way, new effects emerge. First, electromagnetic
field propagates at finite speed c equal to the speed of light,

$$ c = 299\,792\,458\ \mathrm{m\,s^{-1}}. $$

Notice that in SI units, this value is not approximate but exact. It is related to
constant ε0 by
$$ c = \frac{1}{\sqrt{\varepsilon_0 \mu_0}} $$
where µ0 is called permeability of vacuum and its value is, by definition,

$$ \mu_0 = 4\pi \times 10^{-7}\ \mathrm{m\,kg\,s^{-2}\,A^{-2}}. $$
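As a quick numerical check of the relation between c, ε0 and µ0, one can insert the values quoted above. This is only a sketch using the rounded value of ε0; the local names eps0 and mu0 are illustrative.

```mathematica
(* Speed of light from the vacuum permittivity and permeability (SI units). *)
Module[{eps0 = 8.854*^-12, mu0 = 4 Pi 10^-7},
  1/Sqrt[eps0 mu0]
]
```

The result should be close to 3 × 10^8 m/s, in agreement with the exact value of c up to the rounding of ε0.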

When we say that the speed of propagation of electromagnetic field is finite and
equal to c, we mean that if one charge changes its position, the other charges do not
feel this change immediately but only after time
$$ \Delta t = \frac{r}{c} $$
where r is the distance from the charge which changed the position. From this fact
it is immediately obvious that r in the Coulomb law (6.1) is a problematic quantity
because we must take into account that the charge at actual distance r cannot have
immediate effect on some other charge.
Next problem is that moving charge produces not only electric but also magnetic
field. Time-dependent electric field is a source of magnetic field and vice versa. This
is what we mean by dynamics of electromagnetic field: the field can propagate over
empty spacetime (without charges) at the speed of light. Hence, the notion of the
force is not appropriate for description of dynamics of electromagnetic interaction
and the notion of the field must be introduced.
But, as we claimed, we will not discuss the dynamics of electromagnetic field which
is given by celebrated Maxwell’s equations. We simply assume that the electromag-
netic field is given and investigate the motion of charged particles in this field. Once
again, electromagnetic field is described by electric field E and magnetic field B.
Consider particle with charge q which is moving in the region where only electric
field is present, i.e. B = 0. Then the electric field acts on the particle by force given
by

F = q E. (6.2)

In other words, electric force is proportional to electric field E and the charge of
particle q, which is an experimental fact. Once we discover this fact, relation (6.2) is
a definition of electric vector E. Vector E is such vector that electric force exerted
on a point charge q is given by (6.2).
Similarly, consider particle moving in the region where only magnetic field is
present. Once again we find (experimentally) that the force acting on charge q is
proportional to the charge. But, in addition we find that the direction of magnetic
force is always orthogonal to the velocity v of the charge. It was discovered that
magnetic force is given by

F = qv×B (6.3)

where operation × is standard vector product1 (or cross product). Again, relation
(6.3) is a definition of magnetic vector B.
When both electric and magnetic fields are present, the force exerted on the
particle is given by the so-called Lorentz force

F = q (E + v × B) . (6.4)

We emphasize that relation (6.4) is an experimental fact, similarly as the Newton law
of force is, and we do not derive it from some more basic principle. It is fascinating
that relation (6.4) can be derived from more basic principles but this is completely
beyond the scope of this textbook2 . In the theory of electromagnetism it is shown
that instead of electric field E and magnetic field B we can introduce one scalar
function φ and one vector function A; it is a consequence of Maxwell’s equations. In
this textbook we proceed differently and assume that this can be done. From this
assumption we will be able to derive correct equations of motion of charged particle
in arbitrary electromagnetic field.

1 Recall that the cross product a × b of vectors a and b is a vector orthogonal both to a and b and its
magnitude is |a × b| = a b sin θ where θ is the angle between both vectors.
2 Particular form of the Lorentz force can be obtained from the first principles by considering the Poincaré
group of isometries of the Minkowski spacetime. The electromagnetic field appears to be a massless
representation of the Poincaré group with spin 1 which yields the set of Maxwell equations. The Lorentz force
can be then derived using the principle of local gauge invariance.

6.1 Lagrangian and equations of motion

In accordance with the last paragraph of previous section, we assume that electromagnetic
field can be described, in some sense, by one scalar field φ called scalar
potential and by one vector field A called vector potential so that the Lagrangian of
particle in electromagnetic field is
$$ L = \frac{1}{2} m v^2 - e\phi + e\,\mathbf{v}\cdot\mathbf{A} = \frac{1}{2} m \dot{x}_i \dot{x}_i - e\phi + e\,\dot{x}_i A_i, \tag{6.5} $$
where e is a constant measuring the strength of the interaction between the particle
and the electromagnetic field; this constant is called charge of the particle. We assume
that the Lagrangian (6.5) represents correct description of particle moving in given
electromagnetic field. This assumption is justified a posteriori by accordance of the
theory with the experiment.
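Equations of motion can be derived from usual Lagrange’s equations (2.18)

$$ \frac{d}{dt}\frac{\partial L}{\partial \dot{x}_i} - \frac{\partial L}{\partial x_i} = 0. $$
Partial derivatives read
$$ \frac{\partial L}{\partial \dot{x}_i} = m \dot{x}_i + e A_i, \qquad \frac{\partial L}{\partial x_i} = -e\,\partial_i \phi + e\,\dot{x}_j\,\partial_i A_j. $$
Note that total derivative of Ai with respect to time is
$$ \frac{dA_i}{dt} = \frac{\partial A_i}{\partial x_j}\frac{dx_j}{dt} + \frac{\partial A_i}{\partial t} \equiv \dot{x}_j\,\partial_j A_i + \partial_t A_i $$

and hence
$$ \frac{d}{dt}\frac{\partial L}{\partial \dot{x}_i} = m \ddot{x}_i + e\,\dot{x}_j\,\partial_j A_i + e\,\partial_t A_i. $$
Collecting these auxiliary expressions we find that the Lagrange equations of motion
are

$$ m \ddot{x}_i = -e\,\partial_i \phi - e\,\partial_t A_i + e\,\dot{x}_j \left(\partial_i A_j - \partial_j A_i\right). \tag{6.6} $$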

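Now, since ∂i ẋj = 0, the last term on the right hand side can be rewritten as

$$ \dot{x}_j \left(\partial_i A_j - \partial_j A_i\right) = \partial_i (A_j \dot{x}_j) - \dot{x}_j\,\partial_j A_i = \left[\nabla(\mathbf{A}\cdot\mathbf{v}) - (\mathbf{v}\cdot\nabla)\mathbf{A}\right]_i. $$

It is straightforward to prove the identity (with v treated as independent of the coordinates)

$$ \mathbf{v} \times (\nabla \times \mathbf{A}) = \nabla(\mathbf{A}\cdot\mathbf{v}) - (\mathbf{v}\cdot\nabla)\mathbf{A} $$

so that relation (6.6) can be written in the vector form as

$$ m \frac{d\mathbf{v}}{dt} = -e\,\nabla\phi - e\,\frac{\partial \mathbf{A}}{\partial t} + e\,\mathbf{v} \times (\nabla \times \mathbf{A}). \tag{6.7} $$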
This is the equation of motion of charged particle. However, we can see that the
acceleration is not given directly by potentials but by their derivatives (that is the
reason why they are called potentials). Hence, we can introduce vectors
$$ \mathbf{E} = -\nabla\phi - \frac{\partial \mathbf{A}}{\partial t}, \qquad \mathbf{B} = \nabla \times \mathbf{A}, \tag{6.8} $$
in which case we can write equation (6.7) in the form
$$ m \frac{d\mathbf{v}}{dt} = e \left(\mathbf{E} + \mathbf{v} \times \mathbf{B}\right) \tag{6.9} $$
which is the law for the Lorentz force (6.4). For the sake of completeness we list the
Cartesian components of equation (6.9):
$$ \begin{aligned}
m \frac{dv_x}{dt} &= e E_x + e v_y B_z - e v_z B_y, \\
m \frac{dv_y}{dt} &= e E_y + e v_z B_x - e v_x B_z, \\
m \frac{dv_z}{dt} &= e E_z + e v_x B_y - e v_y B_x.
\end{aligned} \tag{6.10} $$
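The derivation of the Lorentz force from the Lagrangian (6.5) can also be checked symbolically. The following sketch assumes a Mathematica version in which Grad and Curl are built-in functions accepting an explicit list of coordinates (version 9 or later). The names phi, a1, a2, a3 stand for arbitrary unspecified potentials and, like all other local names, are introduced only for this check; symbols x, y, z, t, m and e are assumed to have no values assigned.

```mathematica
Module[{phi, a1, a2, a3, X, Y, Z, T, coords, phiF, aF, eF, bF,
        onPath, r, v, lag, lagrange, lorentz},
  coords = {X, Y, Z};
  (* generic scalar and vector potential as functions of position and time *)
  phiF = phi[X, Y, Z, T];
  aF = {a1[X, Y, Z, T], a2[X, Y, Z, T], a3[X, Y, Z, T]};
  (* fields defined by (6.8) *)
  eF = -Grad[phiF, coords] - D[aF, T];
  bF = Curl[aF, coords];
  (* rules evaluating the fields along the trajectory of the particle *)
  onPath = {X -> x[t], Y -> y[t], Z -> z[t], T -> t};
  r = {x[t], y[t], z[t]};
  v = D[r, t];
  (* Lagrangian (6.5) *)
  lag = (m/2 v.v - e phiF + e v.aF) /. onPath;
  (* left hand sides of Lagrange's equations *)
  lagrange = Table[D[D[lag, v[[i]]], t] - D[lag, r[[i]]], {i, 3}];
  (* m dv/dt - e (E + v x B), cf. (6.9) *)
  lorentz = m D[v, t] - e ((eF + Cross[v, bF]) /. onPath);
  Simplify[lagrange - lorentz]
]
```

If both expressions agree, the last command returns a list of three zeros, which confirms that Lagrange's equations for (6.5) coincide with (6.9) for arbitrary potentials.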

6.2 Hamilton’s equations


Having derived the Lagrange equations of motion of charged particle in an external
electromagnetic field, we now turn to the Hamiltonian description of the same
problem. Proceeding in a standard way we introduce a generalized momentum by
$$ p_i = \frac{\partial L}{\partial \dot{x}_i} = m \dot{x}_i + e A_i. \tag{6.11} $$
In order to find the Hamiltonian we invert this relation to find
$$ \dot{x}_i = \frac{p_i}{m} - \frac{e}{m} A_i. \tag{6.12} $$
Notice that although we are working in the Cartesian coordinates, generalized momentum

$$ \mathbf{p} = m \mathbf{v} + e \mathbf{A} $$

is different from linear momentum mv. The Hamiltonian is then given by the Legendre
transformation of the Lagrangian,

$$ H = \dot{x}_i p_i - L, \tag{6.13} $$

where we must, however, express the velocities ẋi in terms of generalized momenta
(6.11). After simple rearrangements we find
$$ H = \frac{1}{2m} \left(\mathbf{p} - e \mathbf{A}\right)^2 + e \phi. \tag{6.14} $$
Let us now differentiate the Hamiltonian with respect to coordinates and mo-
menta,
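$$ \frac{\partial H}{\partial x_i} = -\frac{e}{m} \left(p_j - e A_j\right) \partial_i A_j + e\,\partial_i \phi, \qquad \frac{\partial H}{\partial p_i} = \frac{1}{m} \left(p_i - e A_i\right), $$
from which the Hamilton equations follow:
$$ \begin{aligned}
\dot{x}_i &= \frac{1}{m} \left(p_i - e A_i\right), \\
\dot{p}_i &= \frac{e}{m} \left(p_j - e A_j\right) \partial_i A_j - e\,\partial_i \phi.
\end{aligned} \tag{6.15} $$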
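The "simple rearrangements" leading to the Hamiltonian (6.14) can be delegated to Mathematica. In the sketch below the potentials are represented by plain symbols (their values at a fixed point), which is sufficient because the Legendre transformation involves only velocities and momenta. All local names are chosen for this illustration; m and e are assumed to have no values assigned.

```mathematica
Module[{vx, vy, vz, px, py, pz, a1, a2, a3, phi, vel, mom, pot, lag, velRule, ham},
  vel = {vx, vy, vz};
  mom = {px, py, pz};
  pot = {a1, a2, a3};
  (* Lagrangian (6.5) at a fixed point of space *)
  lag = m/2 vel.vel - e phi + e vel.pot;
  (* invert the definition (6.11) of generalized momenta for the velocities *)
  velRule = First[Solve[Thread[mom == D[lag, {vel}]], vel]];
  (* Legendre transformation (6.13) *)
  ham = (mom.vel - lag) /. velRule;
  (* difference from the right hand side of (6.14) *)
  Simplify[ham - ((mom - e pot).(mom - e pot)/(2 m) + e phi)]
]
```

The result 0 means that the Legendre transformation of (6.5) reproduces the Hamiltonian (6.14).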

6.3 Hamilton equations in Mathematica


Hamilton’s equations (6.15) can be easily implemented in Mathematica. Although
following code may look a bit complicated, it is in fact very straightforward. We
implement function

HamiltonEM[φ, A]

where φ and A are functions of Cartesian coordinates x, y, z representing the scalar
and vector potential. This function consequently produces the list of six Hamilton’s
equations for the particle in electromagnetic field.

HamiltonEM[Φ_, A_ /; ListQ[A] && Length[A] == 3] :=
 Module[{xs, ps, eqs1, eqs2, dependencies, DA, DΦ},
  xs = {x, y, z};
  ps = {p1, p2, p3};
  (* rules turning coordinates and momenta into functions of time *)
  dependencies = {x -> x[t], y -> y[t], z -> z[t],
    p1 -> p1[t], p2 -> p2[t], p3 -> p3[t]};
  (* first three equations of (6.15) *)
  eqs1 = Equal @@@ Transpose[
     {D[xs /. dependencies, t] - (1/m) ((ps - e A) /. dependencies), {0, 0, 0}}];
  DΦ = D[Φ, #] & /@ xs;  (* DΦ[[i]] is the derivative of Φ with respect to xs[[i]] *)
  DA = D[A, #] & /@ xs;  (* DA[[i, j]] is the derivative of A[[j]] with respect to xs[[i]] *)
  (* last three equations of (6.15) *)
  eqs2 = Equal @@@ Transpose[
     {D[ps /. dependencies, t] -
       (((e/m) (DA.ps - e DA.A) - e DΦ) /. dependencies), {0, 0, 0}}];
  Flatten[{eqs1, eqs2}]
 ]

6.4 Homogeneous fields


As a first example we consider motion of charged particle in the homogeneous mag-
netic field B without the presence of electric field, i.e.

E = 0, B = constant.

Let us reformulate these conditions in terms of potentials A and φ.


Since magnetic and electric fields are assumed to be constant (or even vanishing),
potentials obviously do not depend on time, so that

E = −∇φ, B = ∇ × A.

Next, electric field vanishes and so, by last equations, potential φ is constant which
can be set to zero without the loss of generality. Remaining equation B = ∇ × A in
the component form reads

Bx = ∂y Az − ∂z Ay ,
By = ∂z Ax − ∂x Az ,
Bz = ∂x Ay − ∂y Ax .

It is possible to find the solution for arbitrary direction of magnetic field, but for
convenience we choose a coordinate system in which B has direction of z−axis,

B = (0, 0, B).

This orientation of Cartesian coordinate system can always be achieved by appropriate
rotation. With this choice we have
0 = ∂y Az − ∂z Ay ,
0 = ∂z Ax − ∂x Az ,
B = ∂x Ay − ∂y Ax .
Since the field is homogeneous, we may look for potentials which do not depend on z, so that all partial derivatives ∂z vanish:
0 = ∂y Az ,
0 = −∂x Az ,
B = ∂x Ay − ∂y Ax .
First two equations tell that Az does not depend on x and y and hence is a constant.
However, this constant does not enter expression for B in the third equation and
thus we can set Az = 0. Equation for B can be solved, for example, by setting
Ax = 0, Ay = Bx.
Summa summarum, potentials φ and A representing homogeneous magnetic field
parallel to the z−axis can be written in the form
φ = 0, A = (0, Bx, 0). (6.16)
Reader can check that ∇ × A = (0, 0, B). Of course, the choice of the potentials is
not unique and we have chosen the simplest possibility.
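Both statements can be verified directly. The sketch below assumes a Mathematica version with the built-in Curl accepting a list of coordinates (version 9 or later) and that symbols B, x, y, z have no values assigned. The second potential, often called the symmetric gauge, is not used in this text and serves only as an example of a different choice leading to the same field.

```mathematica
(* potential (6.16) *)
Curl[{0, B x, 0}, {x, y, z}]

(* a different potential describing the same homogeneous field *)
Curl[{-B y/2, B x/2, 0}, {x, y, z}]
```

Both commands return {0, 0, B}.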
In the following code we generate the set of Hamilton’s equations by invoking
function HamiltonEM defined above and setting initial conditions to
x(0) = 1, y(0) = 0, z(0) = 0, p1(0) = 0, p2(0) = 2, p3(0) = 0.
Numerical values of constants are set to
m = B = e = 1.

eqs = HamiltonEM[0, {0, B x, 0}];
vals = {m -> 1, B -> 1, e -> 1};
initConds = {x[0] == 1, y[0] == 0, z[0] == 0, p1[0] == 0, p2[0] == 2, p3[0] == 0};
tmax = 20;
sol = NDSolve[Join[eqs, initConds] /. vals,
   {x[t], y[t], z[t], p1[t], p2[t], p3[t]}, {t, 0, tmax}];

Now we plot the solution. All plotting options can be ignored, they serve just to
improve the quality of the plot.

g1 = ParametricPlot3D[{x[t], y[t], z[t]} /. sol, {t, 0, tmax},
   AxesOrigin -> {0, 0, 0}, Boxed -> False, PlotRange -> {{-1, 3.5}, {-1, 2}, {-1, 1}},
   Ticks -> {Range[-1, 3, 1], Range[-1, 2, 1], {-1, 1}},
   BaseStyle -> {FontFamily -> "Times New Roman", FontSize -> 15},
   ViewPoint -> {1, 1, 1}];
g2 = Graphics3D[{Text[Style["x", 15], {3.5, 0.2, 0}],
    Text[Style["y", 15], {-0.2, 2, 0}],
    Text[Style["z", 15], {-0.05, 0.1, 1}]
   }];
Show[g1, g2]

The code above produces following figure:

[Figure: trajectory of the particle plotted in the x, y, z axes.]

We can see that the trajectory of the particle is a circle of radius 1 centered at
position (2, 0, 0). This is a familiar property of magnetic field: the field does not
perform the work on a particle, only changes direction of its motion. Since magnetic
force is always orthogonal to velocity, resulting trajectory is a circle.
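The radius and the centre of the circle can be cross-checked without solving the equations. The sketch below uses the initial conditions given above together with the standard formulas for motion in homogeneous magnetic field, namely radius m|v|/(|e|B) and centre r0 + m v × B/(eB²); these formulas are not derived in this text and are assumed here. All local names are illustrative.

```mathematica
Module[{mass = 1, charge = 1, b = 1, r0 = {1, 0, 0}, p0 = {0, 2, 0}, a0, v0},
  a0 = {0, b r0[[1]], 0};       (* potential (6.16) at the initial position *)
  v0 = (p0 - charge a0)/mass;   (* initial velocity from (6.12)             *)
  {v0,                                               (* {0, 1, 0} *)
   mass Norm[v0]/(Abs[charge] b),                    (* radius: 1 *)
   r0 + mass Cross[v0, {0, 0, b}]/(charge b^2)}      (* centre: {2, 0, 0} *)
]
```

Notice that the initial velocity (0, 1, 0) differs from the initial generalized momentum (0, 2, 0), in accordance with (6.11).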
Now suppose that we add an initial velocity in the z−direction, e.g. we set

p3(0) = 0.1.

That means that initial velocity is not orthogonal to magnetic field B anymore, but
the vz -component of the velocity does not affect magnetic force. Hence, in addition to
circular motion, the charge will move uniformly in z−direction. Resulting trajectory
of the particle is called helix (in order to obtain this figure in Mathematica, do not
forget to adjust the range on z−axis).

[Figure: helical trajectory of the particle plotted in the x, y, z axes.]

Let us consider another example. Suppose that in addition to magnetic field, there
is homogeneous electric field

E = (0, 0, E)

in the direction of axis z. This field is time-independent again and thus the equation
for scalar potential reads

E = −∇φ

or, in components,
$$ \frac{\partial\phi}{\partial x} = 0, \qquad \frac{\partial\phi}{\partial y} = 0, \qquad \frac{\partial\phi}{\partial z} = -E, $$
from which we find

$$ \phi = -E z. $$
Corresponding code:

eqs = HamiltonEM[-E0 z, {0, B x, 0}];
vals = {m -> 1, B -> 1, e -> 1, E0 -> 0.01};
initConds = {x[0] == 1, y[0] == 0, z[0] == 0, p1[0] == 0, p2[0] == 2, p3[0] == 0};
tmax = 100;
sol = NDSolve[Join[eqs, initConds] /. vals,
   {x[t], y[t], z[t], p1[t], p2[t], p3[t]}, {t, 0, tmax}];

In this case, the motion of the particle consists of uniform circular motion in the
plane z = constant and uniformly accelerated motion in the direction of z−axis.

[Figure: trajectory of the particle plotted in the x, y, z axes.]

6.5 Electromagnetic wave


In this section we consider harmonic electromagnetic plane wave propagating in the
direction of x−axis. Electric field is assumed to have a form
$$ \mathbf{E}(t, x) = (0, 0, E_0 \cos(t - x)), $$
i.e. it has only z−component. E0 is the amplitude of the electric field. Electric field
is related to potentials via
$$ \mathbf{E} = -\nabla\phi - \frac{\partial \mathbf{A}}{\partial t}. $$
Let us set φ = 0:
$$ \mathbf{E} = -\frac{\partial \mathbf{A}}{\partial t}. $$
This equation can be integrated to find the vector potential in the form
$$ \mathbf{A} = -\int \mathbf{E}\, dt = (0, 0, -E_0 \sin(t - x)). $$

Corresponding magnetic field is then

$$ \mathbf{B} = (0, -E_0 \cos(t - x), 0). $$

We can see that magnetic field has direction of y−axis and hence it is orthogonal
to electric field, which is a general property of electromagnetic waves. Derivation
performed above can be done with Mathematica using following commands:

Needs["VectorAnalysis`"]
El[t_, x_] = {0, 0, E0 Cos[t - x]};
A = -Integrate[El[t, x], t];
B = Curl[A /. x -> Xx] /. Xx -> x

Out[14]= {0, -E0 Cos[t - x], 0}

New potential A can be used to derive Hamilton’s equations in a usual manner,

eqs = HamiltonEM[0, A];
vals = {m -> 1, E0 -> 1, e -> 1};
initConds = {x[0] == 1, y[0] == 0, z[0] == 0, p1[0] == 0, p2[0] == 0, p3[0] == 0};
tmax = 100;
sol = NDSolve[Join[eqs, initConds] /. vals,
   {x[t], y[t], z[t], p1[t], p2[t], p3[t]}, {t, 0, tmax}];

and plotted by

g1 = ParametricPlot3D[{x[t], y[t], z[t]} /. sol, {t, 0, 80},
   AxesOrigin -> {0, 0, 0}, Boxed -> False, PlotRange -> Full,
   BaseStyle -> {FontFamily -> "Times New Roman", FontSize -> 15},
   ViewPoint -> {1, 1, 1}];
g2 = Graphics3D[{Text[Style["x", 15], {-5, 0.5, 0}],
    Text[Style["y", 15], {0, 1.5, 0}],
    Text[Style["z", 15], {0, 0, -2.2}]
   }];
g = Show[g1, g2]

which yields the following result.

[Figure: trajectory of the particle in the field of the electromagnetic wave, plotted in the x, y, z axes.]

6.6 Electrostatic wave


Relations (6.8) hold in general. It can be shown directly from Maxwell’s equations
that electric and magnetic fields can always be written in the form (6.8). However,
there are situations too complicated to be analyzed in this way. For example, electro-
magnetic field in the plasma is a complicated consequence of interaction of external
electromagnetic fields and fields produced by the particles comprising plasma. In
such situations we usually cannot find electromagnetic fields as exact solution to
Maxwell’s equations and some simplifications are necessary. One can imagine external
homogeneous magnetic fields penetrating to plasma and, in addition, an electrostatic
wave propagating in the plasma. We have seen that electric wave described by
time-dependent vector potential is always accompanied by magnetic field given by
the curl of this potential. Thus, any electric wave must be accompanied by magnetic
wave, as we have seen in the previous section.
On the other hand, in plasma it is possible for electric wave to propagate through
the medium without generating accompanying magnetic wave which is a consequence
of complicated interactions mentioned above. In this case we can proceed in the
following way. We assume the presence of homogeneous magnetic field B and assume
the presence of electrostatic wave. For example,

E = (0, 0, E0 cos(t − x)), B = (B, 0, 0). (6.17)

These fields cannot be described by the same vector potential and hence the equations
of motion cannot be derived from any potential. Nevertheless, with this prescription,
we can write down usual Newtonian equation of motion
$$ m \frac{d\mathbf{v}}{dt} = e \left(\mathbf{E} + \mathbf{v} \times \mathbf{B}\right) $$
and solve it numerically. Appropriate Mathematica code reads

El = {0, 0, Cos[t - x[t]]};
B = {1, 0, 0};
r[t_] = {x[t], y[t], z[t]};
eqs = Flatten[{Equal @@@ Transpose[{r''[t], El + Cross[r'[t], B]}],
    x[0] == 0, y[0] == 0, z[0] == 0,
    x'[0] == 0, y'[0] == 0, z'[0] == 0}]
sol = NDSolve[eqs, r[t], {t, 0, 100}]
ParametricPlot[{y[t], z[t]} /. sol, {t, 0, 100}]

Here we have set initial velocity to zero. The trajectory is found to be the spiral.

[Figure: trajectory of the particle in the (y, z) plane; both axes range from −40 to 40.]
7
Discrete dynamical systems and fractals

This chapter is a digression from the main line but, first, discrete dynamical systems
provide a simple model of more complicated continuous dynamical systems which we
will study later and, second, we will plot nice pictures called fractals and get some
insight into complicated nature of chaotic systems.

7.1 Complex sequences


We start our discussion with one of the most famous examples of fractals, the Man-
delbrot set, which is very easy to plot using Mathematica. Let us choose arbitrary
point z0 ∈ C in the complex plane and let us define a sequence of complex numbers
by recurrent relation

$$ z_{n+1} = f(z_n) + z_0 \tag{7.1} $$

where $f(z) = z^2$. Thus, starting from a given z0 , members of this sequence read

$$ \begin{aligned}
z_1 &= f(z_0) + z_0 = z_0^2 + z_0, \\
z_2 &= f(z_1) + z_0 = z_0^4 + 2 z_0^3 + z_0^2 + z_0, \\
&\ \,\cdots
\end{aligned} $$

We can use Mathematica to generate members of this sequence using the following
command

NestList[#^2 + z0 &, z0, 5] // Expand                    (7.2)

which generates the members z0 through z5:



$$ \begin{aligned}
z_0 &= z_0, \\
z_1 &= z_0^2 + z_0, \\
z_2 &= z_0^4 + 2z_0^3 + z_0^2 + z_0, \\
z_3 &= z_0^8 + 4z_0^7 + 6z_0^6 + 6z_0^5 + 5z_0^4 + 2z_0^3 + z_0^2 + z_0, \\
z_4 &= z_0^{16} + 8z_0^{15} + 28z_0^{14} + 60z_0^{13} + 94z_0^{12} + 116z_0^{11} + 114z_0^{10} + 94z_0^9 + 69z_0^8 \\
&\quad + 44z_0^7 + 26z_0^6 + 14z_0^5 + 5z_0^4 + 2z_0^3 + z_0^2 + z_0, \\
z_5 &= z_0^{32} + 16z_0^{31} + 120z_0^{30} + 568z_0^{29} + 1932z_0^{28} + 5096z_0^{27} + 10948z_0^{26} \\
&\quad + 19788z_0^{25} + 30782z_0^{24} + 41944z_0^{23} + 50788z_0^{22} + 55308z_0^{21} + 54746z_0^{20} \\
&\quad + 49700z_0^{19} + 41658z_0^{18} + 32398z_0^{17} + 23461z_0^{16} + 15864z_0^{15} + 10068z_0^{14} \\
&\quad + 6036z_0^{13} + 3434z_0^{12} + 1860z_0^{11} + 958z_0^{10} + 470z_0^9 + 221z_0^8 + 100z_0^7 \\
&\quad + 42z_0^6 + 14z_0^5 + 5z_0^4 + 2z_0^3 + z_0^2 + z_0.
\end{aligned} $$

Obviously, the complexity of each term zn grows very quickly with increasing n.
It is instructive to see the behaviour of the sequence graphically. Hence, we choose
some particular z0 and plot few terms zn of the sequence starting from z0 . Let us
define following functions:
seq[z0_, n_] := NestList[ #^2 + z0 &, z0, n] // Expand
list[z0_, n_] := {Re[#], Im[#]} & /@ seq[z0, n]
First definition defines function which generates the list consisting of the starting point z0
and the next n members of the sequence zn . For example, command seq[I, 10] generates
the list of eleven members of the sequence starting at point z0 = i:

{i, −1 + i, −i, −1 + i, −i, −1 + i, −i, −1 + i, −i, −1 + i, −i}.

However, we cannot plot complex numbers directly and so we must convert each
complex number z = x + iy into a pair of coordinates (x, y). This is accomplished by
function list. We define a pure function
{Re[#], Im[#]}&
which splits the argument into its real and imaginary parts. Then we apply this pure
function to all elements of the list seq[z0,n]. Using the previous example, command
list[I, 10] produces

{{0, 1}, {−1, 1}, {0, −1}, {−1, 1}, {0, −1},
{−1, 1}, {0, −1}, {−1, 1}, {0, −1}, {−1, 1}, {0, −1}}.

Notice that this sequence is periodic: except from the starting point i, the sequence
is jumping from −1 + i to −i and back, infinitely.
The list produced by list can be already plotted by ListLinePlot. Let us plot the
list list[I,10] by
ListLinePlot[ {list[I, 10]},
PlotRange -> Full, AxesOrigin -> {0, 0}, AspectRatio -> 1,
PlotMarkers -> Automatic,
PlotStyle -> { {Blue} },
BaseStyle -> {FontFamily -> "Times New Roman", FontSize -> 13}
]
Expected result is plotted in figure 7.1.

Fig. 7.1. Points of the sequence zn starting from point i.

Now, let us choose a different starting point close to original point i, e.g. z0 =
0.8i, and construct first ten members of the sequence again. We can compare both
trajectories using the following code:
ListLinePlot[ {list[I, 10], list[0.8 I, 10]},
PlotRange -> Full, AxesOrigin -> {0, 0}, AspectRatio -> 1,
PlotMarkers -> Automatic,
 PlotStyle -> { {Blue}, {Orange}},
 BaseStyle -> {FontFamily -> "Times New Roman", FontSize -> 13}
]
We can see in figure 7.2 that the behaviour of the sequence changed significantly.
It is not periodic anymore but, in addition, it exhibits unpredictable behaviour. We
could guess that if we choose starting point z0 = 0.9i we obtain sequence "somewhere
between" sequences starting from i and 0.8i. Reader is invited to plot the result for
z0 = 0.9i; here we just present the list of points produced by list[0.9I,10]:
{{0., 0.9}, {−0.81, 0.9}, {−0.1539, −0.558}, {−0.287679, 1.07175}, {−1.06589, 0.283359}, {1.05584, 0.295938},
{1.02721, 1.52493}, {−1.27023, 4.03285}, {−14.6504, −9.34529}, {127.3, 274.725}, {−59268.4, 69945.6}}

Obviously, this sequence is not bounded and it escapes to infinity very quickly.
What conclusion can be drawn from examples above? What we did actually see
is the most characteristic property of chaotic systems: sensitivity to initial condi-
tions. Particular choice of the starting point z0 corresponds to imposing the initial
condition. We have seen three sequences starting from points close to each other,
i, 0.8i and 0.9i. In non-chaotic systems, if we change initial positions slightly, also
the solution will change only slightly. In chaotic systems, the behaviour can differ
drastically even for very similar initial conditions. In our examples, the sequence
starting from i was periodic, the one starting from 0.8i was unpredictable and the
one starting from 0.9i was diverging and tending to infinity.
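The three behaviours can be compared at a glance by looking at the largest modulus reached by each sequence. In the following sketch, orbit is a hypothetical helper introduced only here; it differs from seq above in that it works with approximate numbers, which keeps the computation fast.

```mathematica
(* orbit: the starting point z0 followed by n iterations of z -> z^2 + z0 *)
orbit[z0_, n_] := NestList[#^2 + z0 &, N[z0], n];

(* largest modulus among z0, ..., z10 for the three starting points *)
Max[Abs[orbit[#, 10]]] & /@ {I, 0.8 I, 0.9 I}
```

The first two values are of order one, while the third is larger by several orders of magnitude, in accordance with the list of points printed above.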

7.2 Mandelbrot set


In the case of Hamiltonian systems we were able to visualize possible behaviour of the
system by the method of phase portraits. Phase trajectories of harmonic oscillator
were circles, phase trajectories of pendulum were more complicated and we revealed
the existence of two types of periodic motions (open and closed curves) separated
by separatrix. For chaotic systems it is usually impossible to plot a phase portrait
because trajectories are very complicated and irregular. For illustration, figure 7.3
certainly is not very useful.
However, in order to visualize extreme sensitivity to initial conditions, it is not
important to see all kinds of trajectories. Qualitative behaviour of trajectories is
more interesting. Each sequence can either stay in a bounded region or escape to
infinity. We cannot inspect asymptotic behaviour of particular sequence but we can
choose a fixed radius R and investigate whether the sequence stays inside the region
bounded by circle of radius R or whether it escapes the circle after some number
of steps. In such a way we can assign a number to each point of the plane. Let us
describe the algorithm more precisely.

Fig. 7.2. Comparison of two sequences with close starting points i and 0.8i.

Fig. 7.3. Ten sequences starting from initial points of the form z0 = x + 0.8I, x ∈ (−0.5, 0.5).

Parameters of the algorithm are radius R > 0 and maximum number of steps
nmax . We choose a point z0 = x0 + iy0 ∈ C and construct the sequence zn starting
from this point. If |zn | > R then the algorithm stops and returns value n. If |zn | < R,
we compute zn+1 and repeat the procedure. If n > nmax , the algorithm stops and
returns value nmax . In this way we assign an integer to each point z0 of the complex
plane or, equivalently, to each point (x0 , y0 ) of usual Euclidean plane.
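For readers used to procedural languages, the steps above can be sketched with an explicit While cycle. Function escapeTime and the names of its arguments are hypothetical and introduced only for this illustration. The function returns the number of iterations performed before the sequence leaves the circle, or nmax if it never does.

```mathematica
(* r and nmax play the role of the radius R and of the maximum number of steps *)
escapeTime[z0_, r_: 100, nmax_: 50] :=
 Module[{z = N[z0], n = 0},
  (* iterate z -> z^2 + z0 while the point stays inside the circle *)
  While[Abs[z] < r && n < nmax,
   z = z^2 + z0;
   n++];
  n]

escapeTime[0]          (* the sequence stays at zero: returns 50 *)
escapeTime[-1.5 + I]   (* the sequence escapes quickly: returns 4 *)
```

This sketch counts iterations, so its values are lower by one than the length of the list of all members z0, z1, ... generated up to the same point.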
Let us see how this algorithm can be implemented in Mathematica. In usual pro-
cedural languages we would use some kind of cycle like for or while. In Mathematica,
these cycles can be still implemented but functional methods are more satisfactory;
in this case we use function NestWhileList. Function Mandelbrot implementing the
algorithm described above follows:
Mandelbrot[x_, y_,
OptionsPattern[{MaxRadius -> 100, MaxSteps -> 50}]] :=
Module[ {c, R, n},
c = x + I y;
R = OptionValue[MaxRadius];
n = OptionValue[MaxSteps];
Length[NestWhileList[ N[#^2 + c] &, c, (Abs[#] < R) &, 1, n]]
]
The head of the function tells Mathematica that the function has two obligatory parameters
x and y; these are the coordinates of initial point (x0 , y0 ) in the plane.
Moreover, function accepts optional arguments specifying the behaviour of the func-
tion. In our case, optional parameters are maximum radius R with default value 100
and the maximum number of steps with default value 50. If we call function without
specifying optional parameters, e.g.
Mandelbrot[ 1, 3 ],
default values are used. If we want to change these values, we call the function in
the form, e.g.
Mandelbrot[1, 3, MaxRadius -> 20, MaxSteps -> 1000]
Reader should be familiar with this notation as it is used in many predefined functions
in Mathematica.
Then we define three local variables c, R and n. Variable c represents the initial
point because we set

c = x + i y.

Variables R and n are set to the values of parameters MaxRadius and MaxSteps and
we introduce them only to increase the readability of the code. The core of function
Mandelbrot is in the last command. Function
NestWhileList[ N[#^2 + c] &, c, (Abs[#] < R) &, 1, n]
applies the pure function #^2 + c &, which is our function f(z) = z² + z0 , to the initial
value c repeatedly. Calling of function N is included in order to obtain just numerical
value of the result instead of exact value which would take a long time and occupy
a lot of memory (after all, reader is invited to remove the calling of this function to
see the difference). Function NestWhileList stops when the condition specified again
as a pure function is violated. In our case, the condition |zn | < R is typed as a pure
function (Abs[#] < R)&. Next parameter of NestWhileList specifies how many recent
results of nested call should be inserted to the test. Here we want to test only the last
result and hence set this parameter to 1. The last parameter n specifies maximum
number of the calls.
The result of NestWhileList is the sequence of numbers zn which stops if |zn | > R
or if n > nmax . The point is that this command returns the list of all members of
given sequence so taking its length we find how long this sequence is. This number
is then a result of function Mandelbrot.
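The rôle of the last two arguments of NestWhileList is perhaps easiest to see on a toy example unrelated to the Mandelbrot set; repeated squaring is chosen here only because the numbers grow quickly.

```mathematica
(* square repeatedly while the last result is below 1000, at most 10 applications *)
NestWhileList[#^2 &, 2, (# < 1000) &, 1, 10]   (* {2, 4, 16, 256, 65536} *)

(* the same, but at most 2 applications of the function are allowed *)
NestWhileList[#^2 &, 2, (# < 1000) &, 1, 2]    (* {2, 4, 16} *)
Length[%]                                       (* 3 *)
```

In the first call the list ends with the first element violating the test, in the second call it ends because the maximum number of applications has been reached. In both cases the list contains the starting value, so its length exceeds the number of applications by one.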
Finally we can visualize function Mandelbrot using
DensityPlot[
Mandelbrot[x, y], {x, -1.5, 0.5}, {y, -1.3, 1.3},
PlotPoints -> 100]
Function DensityPlot serves to visualize functions of two variables not by plotting a
three-dimensional graph but by assigning a color to each point (x, y) depending on
the value of the function to be plotted. The result is shown in figure 7.4 and is known
as the Mandelbrot set.
The meaning of regions with different colors can be understood easily. For example,
if we choose zero to be the initial point, z0 = 0, then all members of the sequence
must be zero, for we have $z_n = z_{n-1}^2 + 0 = 0$. In other words, the sequence stays at
point zero for all n and therefore function Mandelbrot will stop only after maximum
number of steps have been reached. Indeed, typing
Mandelbrot[0, 0]
yields the result 51. That means that after 50 steps the sequence was still in the
circle of radius R. We can see that the neighbourhood of zero is plotted in white
color in figure 7.4. Hence, white regions correspond to high values of the function
Mandelbrot. Blue color, on the other hand, represents regions where the values of the
function are small and so the sequence escapes the circle of radius R very soon. For
example, at point (−1.5, 1) the value of
Mandelbrot[-1.5, 1]
is equal to 5 which means that the sequence escapes the circle after 5 steps.

Fig. 7.4. The Mandelbrot set.

It is natural to expect that small numbers close to zero yield bounded sequence
while numbers distant from zero yield rapidly diverging sequences. An unexpected
feature of this construction is the existence of boundary between blue and white
region which exhibits highly non-trivial structure. This boundary is obviously irreg-
ular but when we zoom into the boundary, we find kind of self-similarity: at each
scale we observe similar shape of the boundary. In figure 7.5 we plot the boundary
of Mandelbrot sets for different zooms.
This complicated structure of Mandelbrot’s set corresponds to the behaviour ob-
served in the previous section. Two different but close points give rise to sequences
with very different behaviour: one sequence remains bounded while the other one
escapes to infinity. Thus, in this sense, Mandelbrot set visualizes extreme sensitivity
of the sequence zn to the choice of the initial point.
Fig. 7.5. Mandelbrot set on different scales.
8
Dynamical systems

We explained in previous chapters that both Lagrange’s equations and Hamilton’s
equations are equivalent to original Newton’s law of force F = ma if the force F
can be written as a gradient of the potential, i.e. if the force F is conservative. On
the other hand, we have seen that electromagnetic field is not conservative but the
motion of charged particle can still be described by the Lagrangian and consequently
by the Hamiltonian.
Hamilton’s equations
$$\dot{q}_a = \frac{\partial H}{\partial p_a}, \qquad \dot{p}_a = -\frac{\partial H}{\partial q_a}$$
are first-order ordinary differential equations and we have seen that such a system of
equations can be given a geometrical interpretation in the phase space. In fact, using
the conservation of energy we were able to plot the phase trajectories even without
actually solving the equations of motion. In this chapter we study a more general
system of equations in which the right hand side is not derived from a Hamiltonian
but is a general function. We will see that any second-order equation of motion
can be written as a system of first-order equations. We will not be restricted to
conservative systems, yet the geometrical interpretation of the phase trajectories
will be preserved. Dynamical systems provide an appropriate framework for studying
all kinds of physical systems including those with friction or time-dependent
external forces.

8.1 Definition
A dynamical system is a set of n first-order ordinary differential equations of the form
$$\begin{aligned}
\dot{x}_1(t) &= f_1(x_1(t), x_2(t), \dots, x_n(t), t),\\
\dot{x}_2(t) &= f_2(x_1(t), x_2(t), \dots, x_n(t), t),\\
&\;\;\vdots\\
\dot{x}_n(t) &= f_n(x_1(t), x_2(t), \dots, x_n(t), t),
\end{aligned} \tag{8.1}$$
where xa = xa (t) are unknown functions of time, a = 1, 2, . . . n, and fa are arbitrary
differentiable functions of the variables xa and possibly of time t. If the functions fa do not
depend on time explicitly, the dynamical system is called autonomous, otherwise it is
called non-autonomous. Using the index notation, dynamical system (8.1) can be
written briefly in the form
ẋa = fa (x, t) (8.2)
where x stands for the n-tuple of variables xa . An autonomous system is then
ẋa = fa (x).
In this notation we suppress the dependence of xa on time because this dependence
is assumed implicitly.
Motivated by Hamilton’s formalism, we intend to interpret the solution xa = xa (t)
as the motion in the phase space. Phase space is an abstract space¹
$$M = \mathbb{R}^n[x_1, x_2, \dots, x_n]$$
with coordinates xa . An arbitrary point x ∈ M represents the state of the physical system
described by equations (8.1). The solution of a dynamical system is not unique unless we
specify the initial conditions, i.e. the values of coordinates xa at some given initial time
t0 ,
$$x_{10} = x_1(t_0), \quad \dots \quad x_{n0} = x_n(t_0).$$
Usually we set t0 = 0. The n-tuple of initial coordinates xa0 will be denoted simply
by x0 ∈ M .
Suppose we choose a point x0 ∈ M at time t0 = 0 as in figure 8.1. A mathematical
theorem (the existence and uniqueness theorem for ordinary differential equations)
guarantees that there exists a unique solution x = x(t) satisfying (8.1) such
that x(0) = x0 . The solution x = x(t) is also called the phase trajectory. Equations
(8.1) essentially state that the vector f (x(t)) evaluated at an arbitrary point of the
trajectory is tangent to the trajectory, see figure 8.1. Hence, we can interpret the vector
field f (x) as a velocity. Although it can be very difficult or even impossible to solve
the equations of motion (8.1), the velocity gives us a good idea about the behaviour
of the system.
¹ Our definition is a simplification. In differential geometry, the phase space is defined as the cotangent bundle
of the configuration manifold endowed with the canonical symplectic form ω = dqa ∧ dpa .
[Figure: a phase trajectory x(t) in the (x1 , x2 ) plane starting at x0 = (x10 , x20 ), with the tangent vector f (x0 ) = ẋ(0) drawn at x0 and f (x) = ẋ drawn at a later point x.]
Fig. 8.1. Two-dimensional dynamical system of the form ẋa = fa . Initial position is at x0 = x(0).
The “velocity” vector at x0 is f (x0 ) and determines the trajectory of the system in the infinitesimal
neighbourhood of the initial point.

8.2 Example
Let us look at an illustrative example. We are already familiar with the equation of
the harmonic oscillator

θ̈ + θ = 0.

This is a second-order equation but we can bring it into the first-order form by
setting

x1 = θ, x2 = θ̇.

Then we have

ẋ1 = θ̇ = x2

and

ẋ2 = θ̈ = −θ = −x1 .
Hence, instead of the single second-order equation θ̈ + θ = 0 we now have two equations
of first order
$$\dot{x}_1 = x_2, \qquad \dot{x}_2 = -x_1. \tag{8.3}$$

Clearly, this is a dynamical system (8.1) if we set f1 = x2 and f2 = −x1 . Thus, the
velocity field is

f (x1 , x2 ) = (x2 , −x1 ) . (8.4)

This vector field can be visualised in Mathematica by function VectorPlot:

f[x_, y_] = {y, -x};
VectorPlot[f[x, y], {x, -2, 2}, {y, -2, 2}]

[Figure: the vector field (8.4) plotted on the square −2 ≤ x, y ≤ 2.]

This picture agrees with our previous analysis when we used the conservation of
energy to show that the phase trajectories of the harmonic oscillator are circles (or ellipses
when using SI units). Another possibility is to use function StreamPlot with the same
arguments which yields

[Figure: stream lines of the vector field (8.4) on the square −2 ≤ x, y ≤ 2.]

8.3 Implementation in Mathematica


In this section we show how to implement a dynamical system in Mathematica in a
convenient way.

DynSys[f_, IC_, tmax_] := Module[{vars, lhs, rhs, eqs, inConds},
  vars = Table[x[a][t], {a, 1, Length[IC]}];
  lhs = D[vars, t];
  rhs = f[Sequence @@ vars];
  eqs = Equal @@@ Transpose[{lhs, rhs}];
  inConds = Equal @@@ Transpose[{vars /. t -> 0, IC}];
  NDSolve[Join[eqs, inConds], vars, {t, 0, tmax}]
]

This code deserves a brief explanation. Arguments of the function DynSys are
• pure function f – this is a vector function representing the right hand side of
dynamical system (8.1);
• initial conditions IC – list of the initial values of variables xa at time t = 0;
• tmax – upper bound of the interval t ∈ (0, tmax ).
Hence, function DynSys can be called, e.g. with the arguments

sol = DynSys[{#2, -#1} &, {1, 1}, 10]

In this example, pure function f is


{#2, -#1} &
which is equivalent to

f (x1 , x2 ) = (x2 , −x1 ).

Clearly, this corresponds to harmonic oscillator (8.4). Initial conditions IC are set to

x1 (0) = 1, x2 (0) = 1

and we want to find the solution in time interval (0, 10).


Now suppose that we called function DynSys with the arguments above and let
us explain how this function works. Thus, we assume that the arguments are
f = {#2, -#1}&
IC = {1, 1}
tmax = 10.
The first command
vars = Table[ x[a][t], {a, 1, Length[IC]}];
creates a list of variables xa (t) in the form
vars = { x[1][t], x[2][t] }.
The left hand side of equations ẋa (t) = fa (x(t)) is generated simply by calling
lhs = D[vars, t]
which yields
lhs = { x[1]'[t], x[2]'[t] }.

Now we form the right hand side of equations. Recall that vars is the list of
variables. We want to evaluate functions fa at point xa , i.e. we need the expression

fa (x1 , . . . xn ).

However, we cannot write simply f[vars] because this would mean


f[{x[1][t], x[2][t]}]
while what we need is
f[ x[1][t], x[2][t] ].
Hence, we must turn the list vars into the sequence of arguments by replacing its
head. Command
rhs = f[Sequence @@ vars]
leads to correct application of function f to arguments xa :
f[ Sequence @@ vars ]= f[ x[1][t], x[2][t] ]
= { x[2][t], -x[1][t]}.
Having defined the left hand side and the right hand side of dynamical system
separately, we join them in a usual way,
eqs = Equal @@@ Transpose[ {lhs, rhs} ]
Next we define the initial conditions with values specified in argument IC={1,1}. We
need to produce the list
{ x[1][0] == 1, x[2][0] == 1 }
The left hand side of initial conditions consists of the elements of vars evaluated at
time 0,
vars /. t->0,
the right hand side consists of the elements of IC. We can join them together by
inConds = Equal @@@ Transpose[{vars /. t -> 0, IC}];
Finally we solve the list of equations of motion and initial conditions by NDSolve:
NDSolve[Join[eqs, inConds], vars, {t, 0, tmax}].
Now, the reader should be familiar with functionality of function DynSys. We use
this function in the following examples.
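As a quick sanity check of the function just described, one can compare its numerical output with a solution known in closed form. The sketch below repeats the definition of DynSys so that it runs on its own. The helper names exact and maxErr are illustrative and not part of the original text. For the harmonic oscillator (8.3) with x1(0) = x2(0) = 1 the exact solution is x1 = cos t + sin t, x2 = cos t − sin t.

```mathematica
(* DynSys as defined in this section *)
DynSys[f_, IC_, tmax_] := Module[{vars, lhs, rhs, eqs, inConds},
  vars = Table[x[a][t], {a, 1, Length[IC]}];
  lhs = D[vars, t];
  rhs = f[Sequence @@ vars];
  eqs = Equal @@@ Transpose[{lhs, rhs}];
  inConds = Equal @@@ Transpose[{vars /. t -> 0, IC}];
  NDSolve[Join[eqs, inConds], vars, {t, 0, tmax}]
]

sol = DynSys[{#2, -#1} &, {1, 1}, 10];

(* exact solution of (8.3) with x1(0) = x2(0) = 1 (illustrative helper) *)
exact[s_] := {Cos[s] + Sin[s], Cos[s] - Sin[s]};

(* largest deviation between numerical and exact solution at sample times *)
maxErr = Max[Table[
   Norm[({x[1][t], x[2][t]} /. First[sol] /. t -> s) - exact[s]],
   {s, 0, 10, 1/2}]]
```

The value of maxErr should be small compared with the amplitude of the oscillation; its exact size depends on the default accuracy settings of NDSolve.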
To conclude this section we show how to use function DynSys to solve the motion
of the harmonic oscillator.

sol = DynSys[{#2, -#1} &, {1, 1}, 10]
ParametricPlot[{x[1][t], x[2][t]} /. sol, {t, 0, 10}]

{{x[1][t] -> InterpolatingFunction[{{0., 10.}}, <>][t],
  x[2][t] -> InterpolatingFunction[{{0., 10.}}, <>][t]}}

[Figure: the resulting phase trajectory, a closed curve around the origin of the (x1 , x2 ) plane.]

8.4 Chaotic pendulum


In the previous chapters we introduced the mathematical pendulum as a simple
example of physical system which has only one degree of freedom (angle of deflection
θ) and is described by the Lagrange equation

θ̈ + sin θ = 0.

This equation is non-linear because of the presence of the sine. We have seen that
it cannot be solved in terms of elementary functions but we were able
to find the numerical solution. Moreover, using the Hamiltonian formalism we were
able to plot the phase trajectories without actually solving the equation of motion.
We can generalize the model of the mathematical pendulum in several ways. First, any
realistic system is dissipative, i.e. there are resisting forces acting against the motion
of the pendulum. As an approximation, the resisting force is proportional to the velocity and
has the opposite direction. In the case of the pendulum, the velocity is proportional to θ̇ and
hence the equation of the pendulum with resisting force has the form

θ̈ + b θ̇ + sin θ = 0,

where b is a constant characterizing the strength of the resisting force. For example,
it can be related to the viscosity of the medium in which the pendulum moves.
A pendulum with resisting force is called a damped pendulum.
Next we can assume that in addition to the restoring gravitational force there is an
external force acting on the pendulum. Such a force is called a driving force. In the
presence of a driving force, even if the pendulum is initially at rest at its equilibrium
position, the driving force will set the pendulum in motion.
The resulting motion of the pendulum will be a "mixture" of two motions: periodic
motion due to self-oscillations of the pendulum, and motion due to the driving force.
The driven pendulum with friction is described by the
equation

θ̈ + b θ̇ + sin θ = F0 sin Ωt (8.5)

where we assume that driving force is harmonic with angular frequency Ω and ampli-
tude F0 . In the subsequent analysis we will show that this kind of pendulum exhibits
chaotic behaviour and hence we also call it chaotic pendulum.
We start with rewriting equation (8.5) in the form of dynamical system. This is
straightforward since we can define

x1 = θ, x2 ≡ p = θ̇, x3 = φ = Ωt.

We will freely pass from notation (θ, p, φ) to equivalent notation (x1 , x2 , x3 ) according
to the context. By definition, variable φ satisfies equation

φ̇ = Ω,

while variable p (which is clearly related to the momentum of the pendulum) was
defined by

θ̇ = p

which can be consequently regarded as an equation for θ. The only true dynamical
equation is an equation for p which follows from (8.5):

ṗ = F0 sin φ − b p − sin θ.

Thus, variables (x1 , x2 , x3 ) = (θ, p, φ) are determined by dynamical system

θ̇ = p,
ṗ = F0 sin φ − b p − sin θ, (8.6)
φ̇ = Ω,

supplemented with initial conditions, i.e. values of xa at time t = 0.
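The right hand side of (8.6) can also be packaged as a function of the parameters, which avoids substituting parameter values by replacement rules. The following is only a sketch. The name pendulumRHS is hypothetical and does not appear in the original text. It returns a function of (θ, p, φ) that can be passed as the first argument of a solver such as DynSys from the previous section.

```mathematica
(* hypothetical helper: right hand side of (8.6) for given b, F0 and Ω *)
pendulumRHS[b_, f0_, Ω_] :=
  Function[{θ, p, φ}, {p, f0 Sin[φ] - b p - Sin[θ], Ω}]

(* velocity field at θ = π/2, p = 0, φ = 0 for b = 1/10, F0 = 1, Ω = 2 *)
pendulumRHS[1/10, 1, 2][Pi/2, 0, 0]
(* {0, -1, 2} *)
```

At this point the pendulum is momentarily at rest (first component zero), gravity pulls it back (second component −1), and the phase of the driving force advances at rate Ω = 2.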


Dynamical system (8.6) can be solved in Mathematica using function DynSys
defined in the previous section. We choose initial conditions
$$\theta(0) = \frac{\pi}{4}, \qquad p(0) = 0, \qquad \phi(0) = 0$$
and investigate how the values of b, Ω and F0 affect the behaviour of the pendulum. First
we set

b = 0, F0 = 0, Ω = 0.

So, we reduce the driven pendulum to the case of the ordinary mathematical pendulum
without friction and driving force (damping coefficient b = 0 and the amplitude of
the force is F0 = 0).

vals = {b -> 0, f0 -> 0, Ω -> 0};
tmax = 10;
sol = DynSys[{#2, f0 Sin[#3] - b #2 - Sin[#1], Ω} & /. vals, {Pi/4, 0, 0}, tmax]

{{x[1][t] -> InterpolatingFunction[{{0., 10.}}, <>][t],
  x[2][t] -> InterpolatingFunction[{{0., 10.}}, <>][t],
  x[3][t] -> InterpolatingFunction[{{0., 10.}}, <>][t]}}

Here we have chosen tmax = 10 but the reader should adjust this parameter in order
to reproduce all figures below. Now we can plot the phase trajectory in a usual way.
ParametricPlot[{x[1][t], x[2][t]} /. sol, {t, 0, tmax}, PlotRange -> Full]

[Figure: closed phase trajectory of the undamped, undriven pendulum in the (θ, p) plane.]

Let us add the friction now and set

b = 0.1,

see figure 8.2 for the result. We can see that the phase trajectory is a spiral which,
in the limit tmax → ∞, ends at the origin of the phase plane. This means that the
oscillations are damped until the pendulum stops. Slightly more ”fancy” picture can
be obtained by

g1 = ParametricPlot[{x[1][t], x[2][t]} /. sol, {t, 0, tmax}, PlotRange -> Full,
   AxesLabel -> {"θ", "p"}, Ticks -> None,
   BaseStyle -> {FontName -> "Times New Roman", FontSize -> 15}
   ];
g2 = Plot[x[1][t] /. sol, {t, 0, tmax},
   AxesLabel -> {"t", "θ"}, Ticks -> None,
   BaseStyle -> {FontName -> "Times New Roman", FontSize -> 15}];
GraphicsRow[{g1, g2}]

which results in figure 8.3.


[Figure: spiral phase trajectory in the (θ, p) plane.]
Fig. 8.2. Parameters b = 0.1, F0 = 0. Non-zero friction leads to damped oscillations of the pendulum.

[Figure: left panel p against θ, right panel θ against t.]
Fig. 8.3. Phase trajectory of damped pendulum together with the time dependence of deflection
θ = θ(t).

Let us see how the driving force affects the motion. For this purpose we set

b = 0, F0 = 1, Ω = 2,

and choose the initial conditions to

θ(0) = p(0) = φ(0) = 0.

In other words, the pendulum is initially at its equilibrium position and, hence,
without driving force it would stay at rest. However, the presence of driving force
leads to solution plotted in figure 8.4.

[Figure: left panel p against θ, right panel θ against t.]
Fig. 8.4. Motion of the pendulum without friction under the external driving force with amplitude
F0 = 1 and angular frequency Ω = 2. Initial position of the pendulum is θ(0) = p(0) = 0.

In the following example we choose

b = 1, F0 = 1, Ω = 1,

see figure 8.5. An interesting feature of this solution is the presence of a short transient
stage during which the phase trajectory follows an outgoing spiral but then settles on
a closed periodic orbit. Such an orbit is called a limit cycle.

[Figure: left panel p against θ, right panel θ against t.]
Fig. 8.5. Motion of the pendulum with friction (b = 1) under the external driving force with amplitude
F0 = 1 and angular frequency Ω = 1. Initial position of the pendulum is θ(0) = p(0) = 0.

The reader is invited to experiment with values of parameters b, F0 and Ω and with
initial values θ(0), p(0) and φ(0). We can see that the resulting motion of the pendulum
is a consequence of a complicated and delicate interplay between three effects:
• self-oscillations of the pendulum;
• resisting force (friction);
• external driving force.
While for some combinations of parameters the motion is perfectly understandable
(like in the absence of the friction and the driving force or in the absence of driving
force but in the presence of friction), for general values the motion can be unpredictable,
chaotic.
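One way to probe this unpredictability numerically is to integrate system (8.6) twice from almost identical initial deflections and watch how the difference between the two solutions evolves. The sketch below calls NDSolve directly so that it is self-contained. The parameter values b = 0.5, F0 = 1.2, Ω = 2/3 are an assumption (they are often quoted in the literature as giving chaotic motion), and the names solve, s1, s2, d and tEnd are illustrative only.

```mathematica
tEnd = 100;

(* solve system (8.6) for initial deflection θ0, with p(0) = φ(0) = 0;
   the parameter values are assumed, not taken from the text *)
solve[θ0_] := With[{b = 0.5, f0 = 1.2, Ω = 2/3},
  First@NDSolve[{
     θ'[t] == p[t],
     p'[t] == f0 Sin[φ[t]] - b p[t] - Sin[θ[t]],
     φ'[t] == Ω,
     θ[0] == θ0, p[0] == 0, φ[0] == 0},
    {θ, p, φ}, {t, 0, tEnd}]]

s1 = solve[0.2];
s2 = solve[0.2 + 10^-6];

(* difference in deflection between the two runs *)
d[s_?NumericQ] := Abs[(θ /. s1)[s] - (θ /. s2)[s]];

LogPlot[d[s], {s, 0, tEnd}, AxesLabel -> {"t", "|Δθ|"}]
```

If the motion is chaotic for the chosen parameters, the difference grows roughly exponentially until it saturates; for regular motion (for instance F0 = 0) it stays small or decays. The reader should vary the parameters to see both regimes.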

8.5 Critical points of the pendulum


Let us return to equation of pure mathematical pendulum
θ̈ + sin θ = 0,
or, in the form of dynamical system,
θ̇ = p, ṗ = − sin θ. (8.7)
What are the possible equilibrium positions of the pendulum? Clearly, if we set
θ(0) = 0, p(0) = 0,
the pendulum will not move. These conditions correspond to the situation when the
pendulum is hanging freely at the equilibrium position with zero initial velocity. In
this case the derivatives of θ and p take values

θ̇(0) = p(0) = 0, ṗ(0) = − sin θ(0) = − sin 0 = 0.

In other words, the derivatives of all variables xa , where x = (θ, p), vanish and
therefore the pendulum does not move.
However, there is another possibility. If we set

θ(0) = π, p(0) = 0,

then the derivatives will vanish as well:

θ̇(0) = p(0) = 0, ṗ(0) = − sin θ(0) = − sin π = 0.

This corresponds to the situation when the pendulum is in the upper position. If we were
able to arrange initial conditions in such a way that the angle of deflection θ is exactly
equal to π and the velocity is zero, we would obtain an equilibrium configuration in
which the pendulum does not move.
Points with these properties are called critical points or fixed points. In general,
critical point xC of dynamical system

ẋa = fa (x)

is a point for which

ẋa = fa (xC ) = 0.

If we choose the initial point x(0) = xC , the system will remain at this initial
position forever; it will not move. For the mathematical pendulum we have two critical
points,

xC1 = (0, 0) and xC2 = (π, 0).

They are sketched in figure 8.6.
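The defining condition fa (xC ) = 0 is easy to check in Mathematica. The sketch below first evaluates the right hand side of (8.7) at the two critical points and then asks Solve for all critical points with −π < θ ≤ π. The name rhs is illustrative, and the restriction of θ to one period is an assumption made here so that each physical configuration is listed once.

```mathematica
(* right hand side of system (8.7) *)
rhs[θ_, p_] := {p, -Sin[θ]}

(* both critical points give a vanishing velocity field *)
rhs @@@ {{0, 0}, {Pi, 0}}
(* {{0, 0}, {0, 0}} *)

(* search for all critical points within one period of θ *)
Solve[{p == 0, Sin[θ] == 0, -Pi < θ <= Pi}, {θ, p}, Reals]
```

The Solve call should return the two points θ = 0 and θ = π with p = 0, in agreement with xC1 and xC2 above.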


On the other hand, we feel that the two critical points of the mathematical pendulum
have a different character. The first critical point xC1 is stable in the sense that a small
perturbation results in periodic oscillations near this critical point. Critical point
xC2 is unstable in the sense that an arbitrarily small perturbation will cause the pendulum
to fall, which results in oscillations around critical point xC1 .
This observation is based on our physical intuition but can we predict stability
or instability of critical points directly from equations? By definition, critical points
represent equilibrium configurations of the system. Can we predict the behaviour of
the system near the critical point?

[Figure: the pendulum at its two critical points, xC1 = (0, 0) with θ = 0, p = 0 (hanging down) and xC2 = (π, 0) with θ = π, p = 0 (upright).]
Fig. 8.6. Two critical points of mathematical pendulum.

The idea is that small perturbations of stable critical points will produce small
deviations from equilibrium position, but small perturbations of unstable critical
points will result in a motion far from the critical point. We will use a basic fact
from mathematical analysis that, under certain assumptions, function f (x) can be
expanded into the Taylor series around arbitrary point xC in the following way
$$f(x_C + \delta) = f(x_C) + \delta\, f'(x_C) + \frac{1}{2}\,\delta^2 f''(x_C) + O(\delta^3) \tag{8.8}$$
where f ′(x) denotes the value of the derivative of f at point x, f ″(x) is the second
derivative at x, etc.
First we analyse critical point xC1 = (0, 0). Let us denote critical values of θ = x1
and p = x2 by

θC = 0, pC = 0.

Now suppose that the angle θ is only a small perturbation of θC , i.e.

θ = θC + δ where |δ| ≪ 1.

Since θC = 0 is a constant, we have

θ̇ = δ̇.

Next we simplify equations of motion (8.7) under this assumption. Let us expand
sin θ around critical point θC = 0:
$$\sin\theta = \sin(\theta_C + \delta) = \sin\theta_C + \delta \left.\frac{d}{d\theta}\sin\theta\right|_{\theta=\theta_C} = \sin\theta_C + \delta\cos\theta_C = \delta$$

where we have neglected higher powers of δ which is assumed to be small. With this
assumption, equations of pendulum (8.7) simplify to

δ̇ = p, ṗ = −δ.

These are, in fact, well-known equations for harmonic oscillator and we can easily
plot the solution which we already know is a circle. We can plot it by
sol = DynSys[ {#2, - #1} &, {0.1, 0}, 10];
ParametricPlot[ {x[1][t], x[2][t]} /. sol, {t, 0, 10}]
where we have chosen small initial deflection θ(0) = 0.1, in accordance with the
assumption. Since the solution is a circle, we observe that phase trajectories near
the first critical point remain in the vicinity of this critical point; an indicator of
stability.
Let us now investigate the second critical point located at

θC = π, pC = 0.

As in the previous case, we assume that deviations from critical value θC are small
and write

θ = θC + δ = π + δ, |δ| ≪ 1.

Now we expand sin θ into the Taylor series about θ = θC as follows:
$$\sin\theta = \sin(\theta_C + \delta) = \sin\theta_C + \delta \left.\frac{d}{d\theta}\sin\theta\right|_{\theta=\pi} = \sin\pi + \delta\cos\pi = -\delta.$$

Equations of motion (8.7) simplify to

δ̇ = p, ṗ = δ.

In order to compare solutions near both critical points we use the following code:

tmax = 2 Pi;
Needs["PlotLegends`"]
sol1 = DynSys[{#2, -#1} &, {0.1, 0}, tmax];
sol2 = DynSys[{#2, #1} &, {0.1, 0}, tmax];
ParametricPlot[{{x[1][t], x[2][t]} /. sol1, {x[1][t], x[2][t]} /. sol2},
 {t, 0, tmax}, PlotRange -> {{-0.5, 2}, {-0.5, 2}},
 AxesLabel -> {δ, p}, PlotStyle -> {Red, Blue}, BaseStyle -> {FontSize -> 15},
 PlotLegend -> {"Critical point θC = 0", "Critical point θC = π"}, LegendPosition -> {-0.5, 1}
]

Both trajectories near critical points are plotted in figure 8.7. We can see that the
trajectory corresponding to the first critical point θC = 0 is a circle and thus remains in the
vicinity of the critical point. The second trajectory, corresponding to critical point
θC = π, is an open curve (a branch of a hyperbola, which looks almost like a straight
line in the plot) that escapes to infinity. Hence, we can see
that the second critical point is unstable in the following sense. If we move the
pendulum to θ = π and set the initial velocity to zero, the pendulum remains at this
equilibrium position. However, an arbitrarily small perturbation (in our case δ = 0.1)
will cause the pendulum to escape from the equilibrium position quickly. In our case,
the trajectory escapes to infinity, but this is an artefact of the linearization: we have
assumed that the perturbation δ is small but as soon as the pendulum is far enough
from the critical point, this assumption is not valid anymore.
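The linearized system near θC = π, δ̇ = p, ṗ = δ, with δ(0) = 0.1 and p(0) = 0 can also be solved in closed form, which makes the shape of the escaping trajectory explicit. The sketch below verifies a candidate solution rather than deriving it; the names δlin and plin are illustrative only.

```mathematica
(* candidate solution of δ' = p, p' = δ with δ(0) = 1/10, p(0) = 0 *)
δlin[t_] := Cosh[t]/10
plin[t_] := Sinh[t]/10

(* both differential equations are satisfied identically *)
Simplify[{δlin'[t] - plin[t], plin'[t] - δlin[t]}]
(* {0, 0} *)

(* the initial conditions hold *)
{δlin[0], plin[0]}
(* {1/10, 0} *)

(* the trajectory lies on the hyperbola δ^2 - p^2 = 1/100 *)
Simplify[δlin[t]^2 - plin[t]^2]
(* 1/100 *)
```

Since cosh t and sinh t both grow like e^t / 2, the trajectory approaches the line p = δ for large t, which is why it looks nearly straight in the plot.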

8.6 Stability of critical points


Having illustrated the main idea about the stability and instability on the example
of mathematical pendulum, we can proceed to a general theory. For simplicity we
restrict ourselves to autonomous planar dynamical systems, i.e. dynamical systems
with only two variables x1 = x and x2 = y which can be visualised in the plane.
Hence, a planar dynamical system is a set of two first-order equations of the form

ẋ = fx (x, y), ẏ = fy (x, y). (8.9)

Critical point of system (8.9) is such a point (xC , yC ) for which

fx (xC , yC ) = fy (xC , yC ) = 0. (8.10)

Then the equations of motion (8.9) reduce to

ẋ(xC , yC ) = 0, ẏ(xC , yC ) = 0,
which means that critical points represent the equilibrium configurations of the system.

[Figure: the two phase trajectories in the (δ, p) plane, with legend "Critical point θC = 0" and "Critical point θC = π".]
Fig. 8.7. Phase trajectories near two critical points θC = 0 and θC = π. In both cases the actual
deflection is θ = θC + δ but with different θC . We can see that the red trajectory is a circle about the
origin while the blue trajectory diverges to infinity rapidly.
Now we want to investigate the stability or instability of critical points. That
means we want to find out how the phase trajectories behave in the vicinity of
critical points. In the case of the pendulum we have seen that an appropriate way
how to proceed is to linearize the system of equations near the critical point.
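Let us assume that (xC , yC ) is a critical point of system (8.9). In the neighbourhood
of the critical point we can write
$$x = x_C + \delta, \quad |\delta| \ll 1, \qquad y = y_C + \varepsilon, \quad |\varepsilon| \ll 1. \tag{8.11}$$
Since xC and yC are constants, for the time derivatives of x and y we have
ẋ = δ̇, ẏ = ε̇.
Function fx (x, y) can be then expanded into the Taylor series, keeping only terms of first order in δ and ε:
$$\begin{aligned}
f_x(x, y) = f_x(x_C + \delta, y_C + \varepsilon) &= f_x(x_C, y_C) + \delta \left.\frac{\partial f_x}{\partial x}\right|_{(x_C, y_C)} + \varepsilon \left.\frac{\partial f_x}{\partial y}\right|_{(x_C, y_C)}\\
&= a\,\delta + b\,\varepsilon,
\end{aligned} \tag{8.12}$$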
where we have used definition (8.10) in the last step and denoted partial derivatives
of fx by
$$a = \left.\frac{\partial f_x}{\partial x}\right|_{(x_C, y_C)}, \qquad b = \left.\frac{\partial f_x}{\partial y}\right|_{(x_C, y_C)}. \tag{8.13}$$
Vertical line with the subscript indicates that partial derivatives must be evaluated
at the critical point. Similarly, for fy we find
fy (x, y) = fy (xC + δ, yC + ε) = c δ + d ε
where
$$c = \left.\frac{\partial f_y}{\partial x}\right|_{(x_C, y_C)}, \qquad d = \left.\frac{\partial f_y}{\partial y}\right|_{(x_C, y_C)}. \tag{8.14}$$
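Thus, near the critical point, planar dynamical system (8.9) can be replaced by
simpler equations
$$\dot{\delta} = a\,\delta + b\,\varepsilon, \qquad \dot{\varepsilon} = c\,\delta + d\,\varepsilon. \tag{8.15}$$
Coefficients a, b, c and d are not functions but constants given by (8.13) and (8.14).
It is useful to write equations (8.15) in the matrix form. Let us define
$$x = \begin{pmatrix} \delta \\ \varepsilon \end{pmatrix}, \qquad J = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$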
Then two equations (8.15) are equivalent to single matrix equation
ẋ = J · x (8.16)
where the dot denotes standard matrix multiplication.
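The matrix J of partial derivatives (8.13) and (8.14) can be computed symbolically. The sketch below applies D with a list of variables to the right hand side of the pendulum system (8.7) and evaluates the result at both critical points. The name jac is illustrative and not used in the original text.

```mathematica
(* matrix of partial derivatives of {fx, fy} = {p, -Sin[θ]} with respect to {θ, p} *)
jac = D[{p, -Sin[θ]}, {{θ, p}}]
(* {{0, 1}, {-Cos[θ], 0}} *)

(* linearization at the first critical point (0, 0) *)
jac /. {θ -> 0, p -> 0}
(* {{0, 1}, {-1, 0}} *)

(* linearization at the second critical point (π, 0) *)
jac /. {θ -> Pi, p -> 0}
(* {{0, 1}, {1, 0}} *)
```

The two matrices reproduce the linearized systems δ̇ = p, ṗ = −δ and δ̇ = p, ṗ = δ found in the previous section.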

8.6.1 Example

Let us consider non-linear dynamical system

ẋ = x(1 + y), ẏ = y(1 − x)

with

fx = x(1 + y), fy = y(1 − x).

First we find the critical points, i.e. we set

xC (1 + yC ) = 0, yC (1 − xC ) = 0.

There are two solutions to these equations,

(xC , yC ) = (0, 0) and (xC , yC ) = (1, −1).

We analyse these points separately. The emphasis is on finding the critical points
and deriving linearized equations of motion. The solution is merely stated because
we will analyse all cases in detail later.
a) Critical point (0, 0). In this case we write

x = xC + δ = δ, y = yC + ε = ε.

Now we have

fx = x(1 + y) = δ(1 + ε) ≈ δ, fy = y(1 − x) = ε(1 − δ) ≈ ε,

where we have neglected higher order terms εδ because of linearization. Hence, in
the neighbourhood of the first critical point, the equations of motion are

δ̇ = δ, ε̇ = ε.

These equations can be solved trivially to find
$$\delta = C_1 e^{t}, \qquad \varepsilon = C_2 e^{t},$$
where C1 and C2 are integration constants. We will discuss this later, but for now it
is obvious that the phase trajectory escapes to infinity because
$$\lim_{t \to \infty} e^{t} = \infty.$$
Hence, this critical point is unstable.


b) Critical point (1, −1). In this case we write
x = xC + δ = 1 + δ, y = yC + ε = −1 + ε,
so that
fx = x(1 + y) = (1 + δ)(1 − 1 + ε) = ε, fy = y(1 − x) = (−1 + ε)(1 − 1 − δ) = δ,
where we have neglected products εδ again. Now the linearized equations of motion
are
δ̇ = ε, ε̇ = δ
which have the solution
δ = C1 cosh t + C2 sinh t, ε = C1 sinh t + C2 cosh t.
The reader is invited to check that δ² − ε² = C1² − C2² is constant along each solution,
so that the trajectories (δ, ε) are in general hyperbolas escaping to infinity,
and hence the second critical point is unstable again.
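The steps of this example can be reproduced in a few lines. The sketch below finds the critical points with Solve, builds the matrix of partial derivatives with D, and evaluates it at each critical point. The names fxy, crit and jac are illustrative only.

```mathematica
(* right hand side of the example system *)
fxy = {x (1 + y), y (1 - x)};

(* critical points: both components vanish *)
crit = Solve[fxy == {0, 0}, {x, y}]

(* matrix of partial derivatives and its values at the critical points *)
jac = D[fxy, {{x, y}}]
(* {{1 + y, x}, {-y, 1 - x}} *)

jac /. {x -> 0, y -> 0}
(* {{1, 0}, {0, 1}} *)

jac /. {x -> 1, y -> -1}
(* {{0, 1}, {1, 0}} *)
```

The first matrix corresponds to δ̇ = δ, ε̇ = ε and the second to δ̇ = ε, ε̇ = δ, in agreement with the linearized equations derived by hand.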

8.7 Classification of critical points


In the previous section we defined the critical points and sketched how these points can
be divided into stable and unstable points. We have seen that the mathematical pendulum
has two critical points, one stable and one unstable. In the next example we have
seen a system with two unstable critical points. The classification of critical points,
however, is more subtle and we discuss all possibilities in this section.
Let us first recapitulate our goal. We study planar dynamical system described
by equations
ẋ = fx (x, y), ẏ = fy (x, y).
We assume that we have found critical point of this system, i.e. point (xC , yC ) such
that
fx (xC , yC ) = fy (xC , yC ) = 0,
and study the behaviour of the system near this critical point. We linearize the
equations in the neighbourhood of the critical point so that we obtain equations²

ẋ = a x + b y, ẏ = c x + d y.

This system can be written also in the matrix form

ẋ = J · x

where
$$J = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$
Now we discuss several forms of matrix J and classify the critical points. Finally we
will show how the analysis can be done for general matrix J .

² In the notation of previous section, our functions x and y are in fact perturbations δ and ε. In this
section, however, we use x and y as they are more natural.

8.7.1 Stable and unstable nodes, saddle points


Consider linear planar system of the form

ẋ = λ1 x, ẏ = λ2 y (8.17)

which corresponds to matrix
$$J = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}. \tag{8.18}$$

System (8.17) can be easily solved. Equations for x and y are independent; we say
that these equations are decoupled which means that equation for ẋ does not contain
y and vice versa.
Let us solve equation

ẋ = λ1 x

first. In usual mathematical notation, this equation reads
$$\frac{dx}{dt} = \lambda_1 x,$$
which is a separable differential equation. We can rewrite it as
$$\frac{dx}{x} = \lambda_1\, dt.$$
This form of the equation is called separated because the left hand side of the equation
contains only x and the right hand side contains only time t. We can integrate the
equation,
$$\int \frac{dx}{x} = \int \lambda_1\, dt,$$
to obtain

log x = λ1 t + C

where C is an integration constant. It is customary that if the logarithm appears in
the solution, we write the constant as a logarithm as well³:

log x = λ1 t + log K.

Exponentiating the last equation we arrive at
$$x = K e^{\lambda_1 t}.$$
By the same procedure we solve the equation for y to get
$$y = L e^{\lambda_2 t}$$
where L is an integration constant again. Notice that, according to the solution, we
have

x(0) = K and y(0) = L.

Hence, K and L are values of x and y at time t = 0, respectively. Therefore, we can
write the solution of (8.17) in the form
$$x(t) = x_0\, e^{\lambda_1 t}, \qquad y(t) = y_0\, e^{\lambda_2 t}. \tag{8.19}$$

Clearly, the only critical point of system (8.17) is (0, 0). Having derived solution
of this system, we can analyze its behaviour near the critical point. Useful function
to visualise properties of the system near critical point is StreamPlot which takes the
vector field and plots trajectories. In the following example we choose λ1 = λ2 = 1.

³ Notice that an arbitrary real number C is the logarithm of some positive real number, i.e. we can write C = log K
for some K > 0.

vals = {λ1 -> 1, λ2 -> 1};
StreamPlot[{λ1 x, λ2 y} /. vals, {x, -10, 10}, {y, -10, 10}]

[Figure: stream plot of system (8.17) for λ1 = λ2 = 1 on the square −10 ≤ x, y ≤ 10.]

In this figure we can see trajectories (8.19) for initial points (x0 , y0 ) chosen by
Mathematica. Notice that we have inserted the right hand side of (8.17) as an argument
of function StreamPlot. We can see that the trajectories are straight lines emanating
from the origin (critical point) and tending to infinity exponentially.
What about other choices of λ1,2 ? It is clear that function e^{λt} is increasing for
λ > 0 and decreasing for λ < 0. We can conclude that the qualitative behaviour of the
system depends on the signs of λ1,2 . The four possible sign combinations are shown in
figure 8.8, which was created in Mathematica. We distinguish three cases.
• λ1 > 0 and λ2 > 0
In this case the critical point is called unstable node. Trajectories are emanating
from the origin and they are repelled to infinity.
• λ1 > 0, λ2 < 0 or λ1 < 0, λ2 > 0
Critical point is called saddle point. Trajectories are repelled from the y-axis and
attracted to the x-axis (for λ1 > 0, λ2 < 0) or repelled from the x-axis and attracted
to the y-axis (for λ1 < 0, λ2 > 0).
• λ1 < 0 and λ2 < 0
Critical point is called stable node. Trajectories are attracted to the origin.
In addition to this classification, critical points with distinct values λ1 ≠ λ2 are called
regular while critical points with the same values λ1 = λ2 are called degenerate.
Clearly, the saddle points cannot be degenerate.
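The three cases above translate directly into a small helper function. This is only a sketch: the name classifyNode is hypothetical, and the case of a vanishing λ, which the classification above does not cover, is reported separately.

```mathematica
(* hypothetical helper: classify the critical point of (8.17) by the signs of λ1, λ2 *)
classifyNode[λ1_?NumericQ, λ2_?NumericQ] := Which[
  λ1 > 0 && λ2 > 0, "unstable node",
  λ1 < 0 && λ2 < 0, "stable node",
  λ1 λ2 < 0, "saddle point",
  True, "not covered (zero eigenvalue)"]

(* the four sign combinations of figure 8.8 *)
classifyNode @@@ {{1, 1}, {1, -1}, {-1, 1}, {-1, -1}}
(* {"unstable node", "saddle point", "saddle point", "stable node"} *)
```

The same sign combinations can be fed to StreamPlot[{λ1 x, λ2 y} /. vals, ...] to reproduce the qualitative pictures of the individual cases.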

[Figure: four stream plots on the square −10 ≤ x, y ≤ 10 with panels a) λ1 > 0, λ2 > 0; b) λ1 > 0, λ2 < 0; c) λ1 < 0, λ2 > 0; d) λ1 < 0, λ2 < 0.]
Fig. 8.8. Different behaviour of planar system (8.17) for different choices of λ1,2 . Critical points are
a) unstable node, b,c) saddle point, d) stable node.

Recall that planar dynamical system (8.17) can be represented by the matrix
(8.18),
$$J = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}.$$
From elementary linear algebra we know that with matrix J we can associate a set
of eigenvalues λ defined by equation

J · e = λe

where e is called an eigenvector. It is easy to show that the eigenvalues of matrix
(8.18) are λ1 and λ2 and corresponding eigenvectors are
$$e_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad e_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$
In other words, vectors e1 and e2 satisfy equations

J · e1 = λ1 e1 , J · e2 = λ2 e2 .

We can see that trajectories starting on lines determined by vectors ei , i = 1, 2,
always remain in these lines. If the trajectory is being repelled from the critical point
along direction e, the line determined by vector e is called unstable manifold. If the
trajectory is attracted to the critical point along the vector e, the line determined by e
is called stable manifold. For matrix (8.18), vectors e1 and e2 are always eigenvectors.
We can see that e1 lies on the x-axis and e2 lies on the y-axis. Hence, the axes are
stable or unstable manifolds of system (8.17), depending on the sign of λ1,2 .
The classification introduced above can be reformulated in the following way. Let
 
ab
J=
cd

be a matrix of general linear dynamical system

ẋ = a x + b y, ẏ = c x + d y.

If matrix J has two real eigenvalues λ1 and λ2, then the critical point is a stable/unstable
node or a saddle point, depending on the signs of these eigenvalues.
We illustrate this classification with an example. Consider the dynamical system

ẋ = 2 x + y, ẏ = x, (8.20)
with the matrix

\[ J = \begin{pmatrix} 2 & 1 \\ 1 & 0 \end{pmatrix}. \]

This matrix is not of the form (8.18) but we can apply the second criterion. Eigen-
values and eigenvectors can be found in Mathematica using

In[58]:= J = {{2, 1}, {1, 0}};
         Eigensystem[J]

Out[59]= {{1 + Sqrt[2], 1 - Sqrt[2]}, {{1 + Sqrt[2], 1}, {1 - Sqrt[2], 1}}}

which shows that the eigenvalues are

\[ \lambda_1 = 1 + \sqrt{2}, \qquad \lambda_2 = 1 - \sqrt{2}, \]

and the corresponding eigenvectors are

\[ e_1 = \begin{pmatrix} 1 + \sqrt{2} \\ 1 \end{pmatrix}, \qquad e_2 = \begin{pmatrix} 1 - \sqrt{2} \\ 1 \end{pmatrix}. \]

Since λ1 > 0 and λ2 < 0, vector e1 defines the unstable manifold and e2 defines the stable
manifold. Since the eigenvalues have different signs, the critical point is a saddle
point, and since they are distinct it is singular. Phase trajectories together with stable
and unstable manifolds can be plotted by

In[54]:= g1 = StreamPlot[J.{x, y}, {x, -10, 10}, {y, -10, 10}];
         g2 = Graphics[{Thick, Blue, Line[{-10 {1 + Sqrt[2], 1}, 10 {1 + Sqrt[2], 1}}]}];
         g3 = Graphics[{Thick, Red, Line[{-10 {1 - Sqrt[2], 1}, 10 {1 - Sqrt[2], 1}}]}];
         Show[g1, g2, g3]

The result is plotted in figure 8.9.

8.7.2 Centres and foci


Next special case we consider is the dynamical system of the form

Fig. 8.9. Phase portrait for dynamical system (8.20). Blue line represents unstable manifold, red line
represents stable manifold.

ẋ = α x + β y, ẏ = −β x + α y. (8.21)

The matrix of this system is

\[ J = \begin{pmatrix} \alpha & \beta \\ -\beta & \alpha \end{pmatrix}. \tag{8.22} \]

System (8.21) is a little trickier to solve. Let us switch to the polar coordinate system
by the usual transformation

x = r cos θ, y = r sin θ,

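where r = r(t) and θ = θ(t). The inverse transformation reads

\[ r = \sqrt{x^2 + y^2}, \qquad \theta = \arctan\frac{y}{x}. \]

These relations can be used to find

\[ \frac{\partial r}{\partial x} = \frac{x}{r}, \qquad \frac{\partial r}{\partial y} = \frac{y}{r}, \qquad
   \frac{\partial \theta}{\partial x} = -\frac{y}{r^2}, \qquad \frac{\partial \theta}{\partial y} = \frac{x}{r^2}. \]

Now we use (8.21) to derive the corresponding equations for r and θ:

\[ \dot r = \frac{\partial r}{\partial x}\,\dot x + \frac{\partial r}{\partial y}\,\dot y = \alpha\, r, \qquad
   \dot\theta = \frac{\partial \theta}{\partial x}\,\dot x + \frac{\partial \theta}{\partial y}\,\dot y = -\beta. \]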

We can see that dynamical system (8.21) in polar coordinates decouples to two
independent equations for coordinates r and θ,

ṙ = α r, θ̇ = − β. (8.23)
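
The intermediate algebra behind (8.23) can be written out as follows. This is only a sketch of the omitted step, using nothing beyond system (8.21), the partial derivatives of r and θ given above, and x^2 + y^2 = r^2:

```latex
\begin{aligned}
\dot r &= \frac{x}{r}\,(\alpha x + \beta y) + \frac{y}{r}\,(-\beta x + \alpha y)
        = \frac{\alpha\,(x^2 + y^2)}{r} = \alpha\, r, \\
\dot\theta &= -\frac{y}{r^2}\,(\alpha x + \beta y) + \frac{x}{r^2}\,(-\beta x + \alpha y)
        = -\frac{\beta\,(x^2 + y^2)}{r^2} = -\beta .
\end{aligned}
```

The mixed terms proportional to x y cancel in both lines, which is why the two equations decouple.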

First we solve the equation for r. Let us write it in the form

\[ \frac{dr}{r} = \alpha\, dt \]

which integrates to

\[ \log r = \alpha t + \log C \]

where the integration constant has been written as a logarithm (see footnote on page
162). Exponentiating the last equation we arrive at

\[ r = C e^{\alpha t}. \]

Obviously, at time t = 0 we have r(0) = C and so we write the solution in the form

\[ r = r_0 e^{\alpha t}. \]

Next we solve the equation for θ. This is trivial since we have

\[ d\theta = -\beta\, dt \]

which integrates to

\[ \theta = \theta_0 - \beta t \]

where the integration constant has been denoted by θ0 and represents the value of θ
at t = 0. Summa summarum, the solution of system (8.23) acquires the form

\[ r = r_0 e^{\alpha t}, \qquad \theta = \theta_0 - \beta t. \tag{8.24} \]

Hence, the solution of the original system (8.21) in Cartesian coordinates reads

\[ x = r_0 e^{\alpha t} \cos(\theta_0 - \beta t), \qquad y = r_0 e^{\alpha t} \sin(\theta_0 - \beta t). \tag{8.25} \]

Suppose that α = 0 so that

\[ x = r_0 \cos(\theta_0 - \beta t), \qquad y = r_0 \sin(\theta_0 - \beta t). \]

Clearly, this represents motion at constant angular velocity −β and constant radius r0
and therefore the phase trajectories are circles of radius r0. If α ≠ 0, the radius of
the "circle" will be

\[ r_0 e^{\alpha t} \]

and hence the trajectory will be a spiral. If α > 0, the radius will increase exponentially
and the spiral will tend to infinity. If, on the other hand, α < 0, the radius will
decrease exponentially and the phase trajectories will spiral towards the origin. All
cases are plotted in figure 8.10 by Mathematica commands

In[25]:= J = {{α, β}, {-β, α}};
         g1 = StreamPlot[J.{x, y} /. {α -> 0, β -> 1}, {x, -5, 5}, {y, -5, 5},
            PlotLabel -> "α = 0, β > 0", BaseStyle -> {FontSize -> 10}];
         g2 = StreamPlot[J.{x, y} /. {α -> 0, β -> -1}, {x, -5, 5}, {y, -5, 5},
            PlotLabel -> "α = 0, β < 0", BaseStyle -> {FontSize -> 10}];
         g3 = StreamPlot[J.{x, y} /. {α -> 1, β -> 1}, {x, -5, 5}, {y, -5, 5},
            PlotLabel -> "α > 0, β > 0", BaseStyle -> {FontSize -> 10}];
         g4 = StreamPlot[J.{x, y} /. {α -> -1, β -> 1}, {x, -5, 5}, {y, -5, 5},
            PlotLabel -> "α < 0, β > 0", BaseStyle -> {FontSize -> 10}];
         g = GraphicsGrid[{{g1, g2}, {g3, g4}}]

and can be classified as follows:

• α = 0
  Critical point is called centre. Trajectories are circles centred at the origin.
• α > 0
  Critical point is called unstable focus, trajectories are spirals escaping to infinity.
• α < 0
  Critical point is called stable focus, trajectories are spirals tending to the origin.
Parameter β has the meaning of angular velocity. If it is zero, spirals become straight
lines and dynamical system reduces to previous case (8.17). If it is non-zero, its sign
determines the sense of rotation: trajectories orbit the origin in a clockwise sense for
β > 0 and in a counter-clockwise sense for β < 0.
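
The closed-form solution can be checked by direct substitution. The following is a minimal sketch, assuming the polar solution r = r0 e^{αt}, θ = θ0 − β t obtained by integrating (8.23); the names xs and ys are illustrative and do not appear in the text.

```mathematica
(* Sketch: substitute the polar solution r = r0 Exp[α t], θ = θ0 - β t
   (from r' = α r, θ' = -β) back into system (8.21).
   The names xs and ys are illustrative only. *)
xs[t_] := r0 Exp[α t] Cos[θ0 - β t];
ys[t_] := r0 Exp[α t] Sin[θ0 - β t];

(* Residuals of x' = α x + β y and y' = -β x + α y *)
Simplify[{xs'[t] - (α xs[t] + β ys[t]),
          ys'[t] - (-β xs[t] + α ys[t])}]
```

Both residuals should simplify to zero. The minus sign in θ0 − β t is what makes trajectories rotate clockwise for β > 0, in agreement with the sense of rotation stated above.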
Let us now analyse critical points of system (8.21) in terms of eigenvalues of
matrix (8.22)

\[ J = \begin{pmatrix} \alpha & \beta \\ -\beta & \alpha \end{pmatrix}. \]

We can use Mathematica to find the eigenvalues and eigenvectors of matrix (8.22)
by

In[44]:= Eigensystem[{{α, β}, {-β, α}}]

Out[44]= {{α - I β, α + I β}, {{I, 1}, {-I, 1}}}

which shows that this matrix has two eigenvalues

λ1 = α − i β and λ2 = α + i β

with eigenvectors

\[ e_1 = \begin{pmatrix} i \\ 1 \end{pmatrix}, \qquad e_2 = \begin{pmatrix} -i \\ 1 \end{pmatrix}. \]

In other words, eigenvalues and eigenvectors of matrix J satisfy relations

J · e1 = λ1 e1 , J · e2 = λ2 e2 .

The first observation is that the eigenvectors are complex and hence there are
neither stable nor unstable manifolds, i.e. there is no real direction which is mapped
to the same direction. The only exception is when β = 0 since in this case dynamical
system (8.21) reduces to (8.17) and the eigenvectors become real.
[Figure 8.10: four phase portraits on axes from −4 to 4. Panels: a) α = 0, β > 0;
b) α = 0, β < 0; c) α > 0, β > 0; d) α < 0, β > 0.]

Fig. 8.10. Classification of critical points for the system (8.21): a, b) centre, c) unstable focus, d)
stable focus.

Second, eigenvalues λ1,2 are mutually complex conjugate (as are the eigenvectors),

\[ \lambda_1 = \bar{\lambda}_2 \]

where the bar denotes complex conjugation. Hence, even if the dynamical system
is not of the form (8.21), we can conclude that if the matrix J has two complex
conjugate eigenvalues

\[ \alpha \pm i \beta, \]

the critical point is a stable/unstable focus or a centre, depending on the values of α
and β as classified above.
Example. Consider dynamical system

ẋ = 2 x + 4 y, ẏ = −3 x + 2y.

This system is not of the form (8.21) but we can apply the criterion based on the
analysis of eigenvalues. In Mathematica we type

In[68]:= J = {{2, 4}, {-3, 2}};
         Eigensystem[J] // Expand

Out[69]= {{2 + 2 I Sqrt[3], 2 - 2 I Sqrt[3]}, {{-((2 I)/Sqrt[3]), 1}, {(2 I)/Sqrt[3], 1}}}

where we have used Expand in order to simplify the expression for eigenvectors (try
this code without Expand). We have found two eigenvalues

\[ \lambda_{1,2} = 2 \pm 2 i \sqrt{3} = \alpha \pm i \beta, \]

which are mutually complex conjugate. In this case, parameters α and β are

\[ \alpha = 2, \qquad \beta = 2 \sqrt{3}. \]

Parameter α is positive and so the critical point is an unstable focus. Trajectories of
the dynamical system considered:

[Figure: phase portrait of this system on axes from −4 to 4.]

Another example is the system

ẋ = x + 2 y, ẏ = −2 x − y.

Eigenvalues are found by

In[148]:= J = {{1, 2}, {-2, -1}};
          Eigensystem[J] // Expand

Out[149]= {{I Sqrt[3], -I Sqrt[3]}, {{-(1/2) - (I Sqrt[3])/2, 1}, {-(1/2) + (I Sqrt[3])/2, 1}}}

Hence, now the eigenvalues are

\[ \lambda_{1,2} = \pm i \sqrt{3} = \alpha \pm i \beta \]

which means that

\[ \alpha = 0, \qquad \beta = \sqrt{3}. \]

Since α = 0, the critical point is a centre rather than a focus. Trajectories of this
dynamical system are the following:

[Figure: phase portrait of this system on axes from −4 to 4.]

8.8 General case


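In the previous two sections we studied two special cases of planar linear dynamical
systems given by matrices

\[ J = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} \quad\text{and}\quad
   J = \begin{pmatrix} \alpha & \beta \\ -\beta & \alpha \end{pmatrix}. \]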

However, we have seen that the analysis can be performed using the eigenvalues of
these matrices. Now we consider general linear planar dynamical system

ẋ = α x + β y, ẏ = γ x + δ y. (8.26)

Let us find the eigenvalues and eigenvectors of the matrix of this general system,

\[ J = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}. \]

Recall that the determinant of matrix J is

\[ D = \det J = \alpha \delta - \beta \gamma. \]

The trace of the matrix is defined as a sum of its diagonal elements, i.e.

T = Tr J = α + δ.

Eigenvalues λ are defined by equation

J · e = λe

where e is an eigenvector. The last equation can be rewritten in the form

(J − λ I) · e = 0

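where I is the 2 × 2 unit matrix so that

\[ J - \lambda I = \begin{pmatrix} \alpha - \lambda & \beta \\ \gamma & \delta - \lambda \end{pmatrix}. \]

This equation is a homogeneous system of linear equations which has non-trivial
solutions only if the determinant of the system is zero:

\[ \det (J - \lambda I) = 0. \]

This determinant reads

\[ (\alpha - \lambda)(\delta - \lambda) - \beta \gamma = 0. \]

Expanding the brackets we arrive at

\[ \lambda^2 - (\alpha + \delta)\lambda + \alpha \delta - \beta \gamma = 0, \]

or, equivalently

\[ \lambda^2 - T \lambda + D = 0. \]

This is a quadratic equation for λ and its solutions are

\[ \lambda_{1,2} = \frac{T \pm \sqrt{T^2 - 4D}}{2}. \tag{8.27} \]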
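
Formula (8.27) can be cross-checked symbolically. The following is a minimal sketch; the names m, tr and det are illustrative (det is used rather than D because D is a built-in Mathematica symbol for differentiation).

```mathematica
(* Sketch: symbolic check of the characteristic equation and of (8.27).
   The names m, tr, det are illustrative, not from the text. *)
m = {{α, β}, {γ, δ}};
tr = Tr[m];      (* trace T = α + δ *)
det = Det[m];    (* determinant D = α δ - β γ *)

(* Characteristic polynomial minus (λ^2 - T λ + D); should simplify to 0 *)
Simplify[CharacteristicPolynomial[m, λ] - (λ^2 - tr λ + det)]

(* Both roots from (8.27) inserted into λ^2 - T λ + D; should simplify to {0, 0} *)
Simplify[λ^2 - tr λ + det /.
  λ -> {(tr + Sqrt[tr^2 - 4 det])/2, (tr - Sqrt[tr^2 - 4 det])/2}]
```

The sign of the discriminant T^2 − 4D decides whether the eigenvalues are real or form a complex conjugate pair, which is the distinction used in the classification that follows.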
Now we can summarize the classification of critical points as follows.

• λ1,2 ∈ R (real eigenvalues)
  – λ1 ≠ λ2 – singular node
  – λ1 = λ2 – degenerate node
  – λ1 > 0, λ2 > 0 – unstable node
  – λ1 λ2 < 0 (eigenvalues of opposite signs) – saddle point
  – λ1 < 0, λ2 < 0 – stable node
• λ1,2 = α ± i β, λ1 = λ̄2 (complex conjugate eigenvalues)
  – α = 0 – centre
  – α > 0 – unstable focus
  – α < 0 – stable focus
Moreover, if the real parts of eigenvalues λ1,2 are non-zero, the critical point is called
hyperbolic, otherwise it is called non-hyperbolic.
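
The classification above can be turned into a small helper. This is only a sketch under stated assumptions: the function name classify is hypothetical, it expects a numeric real 2 × 2 matrix, and it does not distinguish singular from degenerate nodes.

```mathematica
(* Hypothetical helper (not from the text): classify the critical point of a
   planar linear system from the eigenvalues of its numeric 2x2 matrix. *)
classify[m_?MatrixQ] := Module[{l1, l2},
  (* numerical eigenvalues; Chop removes spurious tiny real or imaginary parts *)
  {l1, l2} = Chop[N[Eigenvalues[m]]];
  Which[
   Im[l1] != 0,                       (* complex conjugate pair α ± i β *)
    Which[Re[l1] == 0, "centre",
          Re[l1] > 0, "unstable focus",
          True, "stable focus"],
   l1 l2 < 0, "saddle point",         (* real eigenvalues of opposite signs *)
   l1 > 0 && l2 > 0, "unstable node",
   l1 < 0 && l2 < 0, "stable node",
   True, "non-hyperbolic (zero eigenvalue)"]]

(* Matrices of system (8.20), of the two examples in section 8.7.2 and of Example 1 below *)
classify /@ {{{2, 1}, {1, 0}}, {{2, 4}, {-3, 2}}, {{1, 2}, {-2, -1}}, {{2, 1}, {1, 2}}}
```

The text classifies these four systems as a saddle point, an unstable focus, a centre and an unstable node respectively, so the helper can be compared against those results.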

8.9 Examples
Example 1

Consider linear dynamical system

ẋ = 2 x + y, ẏ = x + 2 y.

There is only one critical point at the origin,

xC = 0, yC = 0.

Since the system is linear, we do not have to linearize it and can write the matrix of
the linearized system immediately:

\[ J = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}. \]

Its eigenvalues are calculated from (8.27):

λ1 = 1, λ2 = 3.

Eigenvectors can be found easily by hand. Recall that eigenvectors are solutions to
the equation

\[ J \cdot e_i = \lambda_i e_i, \qquad i = 1, 2. \]

(Notice that we suppress the Einstein summation convention.) For i = 1 we obtain
a homogeneous system of linear equations

\[ \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \cdot \begin{pmatrix} a \\ b \end{pmatrix} = 0 \]

where e1 = (a, b) is the unknown eigenvector. Since the rows (or columns) of the matrix
above are linearly dependent (see footnote 4), this system has infinitely many non-trivial
solutions satisfying condition a = −b. Hence, all eigenvectors corresponding to eigenvalue
λ1 = 1 have the form

\[ \begin{pmatrix} a \\ -a \end{pmatrix}. \]

We choose the eigenvector to be

\[ e_1 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}. \]

By similar consideration we find that the eigenvector associated with eigenvalue
λ2 = 3 is

\[ e_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}. \]

To summarize, we have found the eigensystem of matrix J:

\[ \lambda_1 = 1, \quad e_1 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}, \qquad
   \lambda_2 = 3, \quad e_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}. \tag{8.28} \]

Now we can classify the critical point (0, 0). Since the eigenvalues are real and non-
zero, critical point is hyperbolic. They are both positive and hence the critical point
is unstable node. Finally, eigenvectors are real and so the system has two unstable
manifolds given by e1 and e2 . Implementation in Mathematica is shown in figure
8.11.
4 This is a consequence of (8.27), because this equation has been derived under the
assumption det(J − λ I) = 0.
Dynamical system
    x' = 2 x + y,  y' = x + 2 y
with the matrix J = {{2, 1}, {1, 2}}

In[4]:= (* critical points *)
        Solve[{2 x + y == 0, x + 2 y == 0}, {x, y}]

Out[4]= {{x -> 0, y -> 0}}

Origin (0, 0) is the only critical point. Eigenvalues and eigenvectors are found by

In[22]:= J = {{2, 1}, {1, 2}};
         Eigensystem[J]

Out[23]= {{3, 1}, {{1, 1}, {-1, 1}}}

Eigenvalues are real and positive: hyperbolic point, unstable node

In[24]:= g1 = StreamPlot[J.{x, y}, {x, -5, 5}, {y, -5, 5}]; (* phase trajectories *)
         (* unstable manifold along (1, 1), eigenvalue 3 *)
         g2 = Graphics[{Blue, Thick, Line[{-10 {1, 1}, 10 {1, 1}}]}];
         (* unstable manifold along (-1, 1), eigenvalue 1 *)
         g3 = Graphics[{Red, Thick, Line[{-10 {-1, 1}, 10 {-1, 1}}]}];
         Show[g1, g2, g3]

Out[27]= [phase portrait on axes from -4 to 4 with both unstable manifolds drawn]

Fig. 8.11. Example 1.



Example 2
Linear dynamical system has the form

ẋ = − 2 x, ẏ = − 4 x − 2 y.

The matrix of this system is clearly

\[ J = \begin{pmatrix} -2 & 0 \\ -4 & -2 \end{pmatrix}. \]

Eigenvalues are

\[ \lambda_1 = \lambda_2 = -2 \]

and so it is a degenerate node. Since both eigenvalues are negative, it is a stable
degenerate node. Implementation in Mathematica is shown in figure 8.12.

Example 3. Volterra-Lotka equations


Volterra-Lotka equations belong to the class of predator-prey models which describe
the interaction between two populations. On its own, the population of prey tends to
grow and the population of predators tends to die out. It is due to their mutual
interaction that the population of predators can also grow and the population of prey
can decline; in other words, predators eat prey.
Let x = x(t) be the number of prey, say, rabbits, and let y = y(t) be the number
of predators, say, foxes. We can construct a plausible model of the interaction between
foxes and rabbits by the following simple considerations. Suppose that y = 0, i.e. there
are only rabbits present. As a first approximation we can assume that the population
of rabbits will grow because of "interaction" between rabbits, and the higher the
number of rabbits, the higher the rate of growth. Hence, we can postulate that an
isolated population of rabbits will be governed by the equation

\[ \dot x = \alpha x. \]

Roughly speaking, constant α can be interpreted as the probability (per unit time) of
the birth of a new rabbit when there are no foxes. This equation has the solution

\[ x = x_0 e^{\alpha t} \]

which means that an isolated population of rabbits will grow exponentially.


Dynamical system
    x' = -2 x,
    y' = -4 x - 2 y
with the matrix J = {{-2, 0}, {-4, -2}}

In[28]:= (* critical points *)
         Solve[{-2 x == 0, -4 x - 2 y == 0}, {x, y}]

Out[28]= {{x -> 0, y -> 0}}

Origin (0, 0) is the only critical point. Eigenvalues and eigenvectors are found by

In[29]:= J = {{-2, 0}, {-4, -2}};
         Eigensystem[J]

Out[30]= {{-2, -2}, {{0, 1}, {0, 0}}}

Eigenvalues are real and repeated: hyperbolic point, stable degenerate node.
Stable manifold is given by e1 = (0, 1)

In[37]:= g1 = StreamPlot[J.{x, y}, {x, -5, 5}, {y, -5, 5}]; (* phase trajectories *)
         (* stable manifold *)
         g2 = Graphics[{Red, Thick, Line[{-10 {0, 1}, 10 {0, 1}}]}];
         Show[g1, g2]

Out[39]= [phase portrait on axes from -4 to 4 with the stable manifold drawn]

Fig. 8.12. Example 2.


Similar consideration applies to an isolated population of foxes. If γ is the probability
of death of a fox, an isolated population of foxes will be governed by the equation

\[ \dot y = -\gamma y \]

which has the solution

\[ y = y_0 e^{-\gamma t}, \]

i.e. the population of foxes will die out exponentially.


Now we add an interaction to our equations. The number of rabbits eaten by foxes
is proportional to number of rabbits and to number of foxes. Conversely, the number
of new-born foxes is proportional to number of foxes and to number of rabbits. If we
introduce constants β and δ for both processes, equations for interacting populations
of rabbits and foxes read

ẋ = α x − β x y, ẏ = −γ y + δ x y. (8.29)

These are Volterra-Lotka equations. Obviously, they are non-linear and the non-
linearity represents the interaction between two populations. All constants are as-
sumed to be positive.
Critical points can be found by

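In[4]:= cp = Solve[{α x - β x y == 0, -γ y + δ x y == 0}, {x, y}]

Out[4]= {{x -> γ/δ, y -> α/β}, {x -> 0, y -> 0}}

Hence, the critical points are

\[ x_{C1} = \left( \frac{\gamma}{\delta}, \frac{\alpha}{\beta} \right), \qquad x_{C2} = (0, 0). \]

In order to linearize equations (8.29) we introduce the Jacobi matrix J

\[ J = \begin{pmatrix}
   \dfrac{\partial \dot x}{\partial x} & \dfrac{\partial \dot x}{\partial y} \\[2mm]
   \dfrac{\partial \dot y}{\partial x} & \dfrac{\partial \dot y}{\partial y}
   \end{pmatrix}. \]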
The Jacobi matrix can be found in Mathematica by

In[16]:= f[x_, y_] = {α x - β x y, -γ y + δ x y};
         J = Transpose[D[f[x, y], #] & /@ {x, y}]

Out[17]= {{α - y β, -x β}, {y δ, -γ + x δ}}

which shows

\[ J = \begin{pmatrix} \alpha - y \beta & -x \beta \\ y \delta & x \delta - \gamma \end{pmatrix}. \]

Next we evaluate the Jacobian at both critical points:

In[24]:= J1 = J /. cp[[1]]
         J2 = J /. cp[[2]]

Out[24]= {{0, -((β γ)/δ)}, {(α δ)/β, 0}}

Out[25]= {{α, 0}, {0, -γ}}

i.e. we have

\[ J_1 = \begin{pmatrix} 0 & -\dfrac{\beta \gamma}{\delta} \\[2mm] \dfrac{\alpha \delta}{\beta} & 0 \end{pmatrix}
   \quad \text{at critical point } x_{C1}, \qquad
   J_2 = \begin{pmatrix} \alpha & 0 \\ 0 & -\gamma \end{pmatrix}
   \quad \text{at critical point } x_{C2}. \]

Finally we find eigenvalues and eigenvectors by

In[27]:= Eigensystem[J1]
         Eigensystem[J2]

Out[27]= {{-I Sqrt[α] Sqrt[γ], I Sqrt[α] Sqrt[γ]},
          {{-((I β Sqrt[γ])/(Sqrt[α] δ)), 1}, {(I β Sqrt[γ])/(Sqrt[α] δ), 1}}}

Out[28]= {{α, -γ}, {{1, 0}, {0, 1}}}

8.10 Flow of the vector field


In this section we introduce some useful notions related to the concept of dynamical
system. We consider general autonomous dynamical system (8.1)

\[ \dot x_a = f_a(x), \qquad a = 1, 2, \dots, n. \tag{8.30} \]

We know that the solution exists and is unique if we prescribe initial conditions

\[ x_a(0) = x_{a0} \tag{8.31} \]

where x_{a0} are constants with the meaning of initial values of coordinates x_a. The
solution of the dynamical system is then a set of functions x_a as functions of time,

\[ x_a(t) = x_a(t, x_0), \tag{8.32} \]

where we have explicitly emphasized that a particular solution depends on initial values

\[ x_0 = (x_{10}, x_{20}, \dots, x_{n0}). \]

Hence, in the following, by symbol x(t, x0) we mean the set of functions

\[ x(t, x_0) = (x_1(t), x_2(t), \dots, x_n(t)) \]

such that

\[ x(0, x_0) = x_0 \quad\text{and}\quad \frac{d}{dt}\, x(t, x_0) = f(x(t, x_0)). \tag{8.33} \]

In other words, x(t, x0) is a solution of dynamical system (8.30) with initial conditions
(8.31).
It is useful to introduce slightly more formal notation for x(t, x0). We defined
the phase space M as an abstract space with coordinates x_a. For an n-dimensional
dynamical system, the phase space is

\[ M = \mathbb{R}^n = \underbrace{\mathbb{R} \times \mathbb{R} \times \cdots \times \mathbb{R}}_{n}. \]

The flow of dynamical system (8.30) is a mapping

\[ \Phi : \mathbb{R} \times M \to M \]

defined by

\[ \Phi_s(x_0) = x(s, x_0). \]

Geometrically, the flow Φs is a mapping which maps an arbitrary point x0 to point
x(s, x0), i.e. shifts point x0 along the phase trajectory by parametric distance s.
Hence, the flow satisfies relations

\[ \Phi_0(x_0) = x_0, \qquad \Phi_{s+t} = \Phi_s \circ \Phi_t, \qquad (\Phi_s)^{-1} = \Phi_{-s}. \]

Obviously,

\[ \left. \frac{d\Phi_s(x_0)}{ds} \right|_{s=0} = \left. \frac{d}{ds}\, x(s, x_0) \right|_{s=0} = f(x_0). \]
Thus, we can also say that the flow Φs shifts point x0 along the vector field fa .
Let us illustrate it on the example of familiar planar dynamical system
ẋ = y, ẏ = −x
so that we have
f1 (x, y) = y, f2 (x, y) = −x.
Vector field fa can be plotted by

In[63]:= VectorPlot[{y, -x}, {x, -10, 10}, {y, -10, 10}]

Out[63]= [plot of the vector field on axes from -10 to 10]


This dynamical system for initial conditions

x(0) = x0 , y(0) = y0 ,

can be solved explicitly by

In[66]:= sol = DSolve[{x'[t] == y[t], y'[t] == -x[t], x[0] == x0, y[0] == y0}, {x[t], y[t]}, t]

Out[66]= {{x[t] -> x0 Cos[t] + y0 Sin[t], y[t] -> y0 Cos[t] - x0 Sin[t]}}

which shows, in the notation introduced above,

\[ x(t, x_0, y_0) = x_0 \cos t + y_0 \sin t, \qquad y(t, x_0, y_0) = -x_0 \sin t + y_0 \cos t. \]

Thus, the flow Φs maps point (x0, y0) to the point which lies on the solution with initial
conditions (x0, y0) at time s:

\[ \Phi_s(x_0, y_0) = (x_0 \cos s + y_0 \sin s,\; -x_0 \sin s + y_0 \cos s). \]

Hence, Φs(x0, y0) is the position of the system at time s for initial conditions (x0, y0).
In figure 8.13 we plot the flow for initial conditions

x0 = 1, y0 = 8.
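
The defining relations of the flow can be checked on this example. The following is a minimal sketch, assuming the explicit solution returned by DSolve above; the function name flow is illustrative and not part of the text.

```mathematica
(* Sketch: the flow of x' = y, y' = -x written as a function of the point it acts on.
   The name flow is illustrative only. *)
flow[s_][{x_, y_}] := {x Cos[s] + y Sin[s], -x Sin[s] + y Cos[s]};

(* Φ_0 is the identity map *)
flow[0][{x0, y0}]

(* Group property Φ_(s+t) = Φ_s ∘ Φ_t: the difference should simplify to {0, 0} *)
Simplify[flow[s][flow[t][{x0, y0}]] - flow[s + t][{x0, y0}]]

(* Derivative at s = 0 reproduces the vector field (f1, f2) = (y0, -x0) *)
D[flow[s][{x0, y0}], s] /. s -> 0
```

Writing the flow as flow[s][point] mirrors the notation Φs(x0): the parameter s selects a map of the phase space, and that map is then applied to a point.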

We have seen that the curve Φs(x0) for a given x0 is a solution of the dynamical
system with initial condition x(0) = x0. This curve is called the orbit of point x0 and is
denoted by

\[ \Lambda(x_0) = \{ \Phi_s(x_0) \mid -\infty < s < \infty \}. \tag{8.34} \]

Similarly, we define the positive semi-orbit and negative semi-orbit by

\[ \Lambda^{+}(x_0) = \{ \Phi_s(x_0) \mid s > 0 \}, \qquad
   \Lambda^{-}(x_0) = \{ \Phi_s(x_0) \mid s < 0 \}. \tag{8.35} \]
In[187]:= IC = {x0 -> 1, y0 -> 8};
          g1 = VectorPlot[{y, -x}, {x, -10, 10}, {y, -10, 10}, VectorStyle -> Orange];
          g2 = ParametricPlot[{x[t], y[t]} /. sol /. IC, {t, 0, 5}, PlotStyle -> Black];
          g3 = Graphics[{Black, Text[Style["x0 = Φ0(x0)", {Large}], {0.2, 9}]}];
          g4 = Graphics[{Black, Text[Style["Φ5(x0)", {Large}], {-6, 4.2}]}];
          Show[g1, g2, g3, g4]

Out[192]= [vector field on axes from -10 to 10 with the trajectory from x0 = Φ0(x0) to Φ5(x0)]

Fig. 8.13. Illustration of the flow.

8.11 Lyapunov stability


Recall that we have defined the critical point or fixed point xC of dynamical system
(8.30) as such point xC for which

\[ f_a(x_C) = 0 \]

and hence ẋa(xC) = 0. A system with initial conditions x0 = xC is in equilibrium in
the sense that it remains in the critical point at all times, i.e.

\[ \Lambda(x_C) = \{ x_C \}. \]

In other words, critical point xC satisfies relation

\[ \Phi_s(x_C) = x_C \quad \text{for all } s \in \mathbb{R}. \]


We have classified critical points according to behaviour of the orbits (phase tra-
jectories) in the vicinity of the critical point. If the orbit remained in the vicinity of
critical point, we have said that the critical point is stable. If the orbit was attracted
to critical point, it was called stable node or stable focus, depending on the character
of the system. If the orbit was circular, critical point was called centre. Finally, if
the orbit escaped from the critical point to infinity, we called the critical point the
unstable node or unstable focus. However, this analysis was performed for linearized
dynamical system. Now we can formulate the stability for general non-linear system
in terms of the flow.
Let ‖·‖ be the standard norm defined on the phase space M, i.e. for any x ∈ M its
norm is

\[ \|x\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}. \]

In general, the norm is a measure of distance of point x from the origin. In some
situations, it is useful to introduce a different notion of the norm, for example the
so-called p-norm (p is a positive integer) defined by

\[ \|x\|_p = \sqrt[p]{|x_1|^p + |x_2|^p + \cdots + |x_n|^p}. \]

In the following we will use the standard norm ‖·‖ = ‖·‖2 which is the standard Euclidean
distance, as follows from the Pythagorean theorem. In general, the norm must satisfy
three relations.
• Positive definiteness

\[ \|x\| \ge 0 \quad\text{and}\quad \|x\| = 0 \text{ only for } x = 0. \]

• Linearity

\[ \|\alpha x\| = |\alpha|\, \|x\| \]

for arbitrary real α ∈ R.
• Triangle inequality

\[ \|x + y\| \le \|x\| + \|y\|. \]

In some contexts the first condition is relaxed, i.e. we admit there are vectors
x ≠ 0 for which ‖x‖ = 0. In this case, operation ‖·‖ is called a semi-norm. In this
textbook we consider only positive definite norms satisfying the first property. Notice
that positive definiteness implies that whenever

\[ \|x - y\| = 0, \]

vectors x and y are equal, x = y.


Solution Φs(x0) is called Lyapunov stable if for any ε > 0 there exists δ > 0 such
that

\[ \|x_0 - y_0\| < \delta \;\Rightarrow\; \|\Phi_s(x_0) - \Phi_s(y_0)\| < \varepsilon \quad \text{for all } s \ge 0. \]

If solution Φs(x0) is not Lyapunov stable, it is called unstable. Solution Φs(x0) is
called asymptotically stable if it is stable and, in addition, there exists δ > 0 such
that

\[ \|x_0 - y_0\| < \delta \;\Rightarrow\; \lim_{s \to \infty} \|\Phi_s(x_0) - \Phi_s(y_0)\| = 0. \]
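
As a minimal worked illustration (not taken from the text), consider the one-dimensional system ẋ = −x, whose flow is Φs(x0) = x0 e^{−s}, and restrict attention to forward times s ≥ 0. The distance between two solutions is then

```latex
\begin{aligned}
|\Phi_s(x_0) - \Phi_s(y_0)| &= |x_0 - y_0|\, e^{-s} \le |x_0 - y_0| \qquad (s \ge 0), \\
\lim_{s \to \infty} |\Phi_s(x_0) - \Phi_s(y_0)| &= 0 .
\end{aligned}
```

Choosing δ = ε therefore satisfies the first condition, so every solution of this system is Lyapunov stable for forward times, and the limit shows that it is also asymptotically stable.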
9
Bifurcations

In the previous chapter we defined the concept of dynamical system and introduced
several notions related to dynamical systems. Among others, we have investigated the
stability of critical points. This discussion was connected with the behaviour of the
phase trajectories (or orbits) in the neighbourhood of the critical point. In this chapter
we analyse dynamical systems from another point of view. Instead of investigating
the orbits (but using the classification introduced in the previous chapter) we investigate
the influence of the parameters of the system. We will observe that there are values of
parameters for which the system can exhibit different behaviour. Which behaviour
occurs depends on the circumstances, e.g. on the history of the system. Points at
which the system must "decide" which behaviour to choose are called bifurcation
points. These issues will be clarified and illustrated below. Bifurcation theory is a
large subject and in this chapter we merely sketch the main ideas without going into
depth.

9.1 Saddle-node bifurcation


The existence and properties of critical points can depend on the parameters of
the dynamical system. Consider the one-dimensional dynamical system

\[ \dot x = \mu + x^2 \tag{9.1} \]

where µ is a real parameter. If µ > 0, there are no real critical points. For µ = 0, the
only critical point is xC = 0, and for µ < 0 there are two critical points at xC = √(−µ)
and xC = −√(−µ). Let us examine the character of critical points briefly.
For µ = 0 and critical point xC = 0, the linearized version of system (9.1) reads

\[ \dot x = 0 \]

which shows that xC is a non-hyperbolic critical point (the eigenvalue of the Jacobi
matrix has vanishing real part).
For µ < 0, the critical points are xC = ±√(−µ). We expand function

\[ f(x) = \mu + x^2 \]

into the Taylor series in x about point xC and, using f(xC) = 0 and f'(xC) = 2 xC,
find to first order

\[ f(x) \approx f(x_C) + (x - x_C)\, f'(x_C) = \pm 2 \sqrt{-\mu}\, \left( x \mp \sqrt{-\mu} \right). \]

Hence, system (9.1) linearized in the neighbourhood of point √(−µ) reads

\[ \dot x = 2 \sqrt{-\mu}\, \left( x - \sqrt{-\mu} \right) \]

which shows that critical point √(−µ) is an unstable node, since the coefficient of x is
positive. In the neighbourhood of critical point −√(−µ) we have

\[ \dot x = -2 \sqrt{-\mu}\, \left( x + \sqrt{-\mu} \right) \]

and so this critical point is a stable node. We can plot critical points corresponding
to different values of µ by the code presented in figure 9.1.
Saddle-node bifurcations occur when critical points do not exist for some values of
the parameter, then a single critical point appears at some value of the parameter
and splits into two critical points for other values of the parameter.
In our case, there are no critical points for µ > 0 but a critical point appears
at µ = 0. This is a bifurcation point. Finally, for µ < 0 there are two critical points,
one of them being stable, the other one being unstable.
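
The analysis of system (9.1) can be repeated in a few lines. This is a minimal sketch; the names f, cps and slopes are illustrative, and the sample value µ = −1 is an arbitrary choice on the side where two critical points exist.

```mathematica
(* Sketch for system (9.1); f, cps and slopes are illustrative names. *)
f[x_] := μ + x^2;

cps = Solve[f[x] == 0, x]      (* two roots, real only for μ <= 0 *)
slopes = f'[x] /. cps          (* derivative of f at each root *)

(* Sample value μ = -1: a positive slope marks the unstable point,
   a negative slope marks the stable one *)
{x /. cps, slopes} /. μ -> -1
```

In one dimension the sign of f'(xC) plays the role of the eigenvalue of the Jacobi matrix, so this is the same stability criterion as in the previous chapter.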

9.2 Transcritical bifurcations


Now consider the dynamical system

\[ \dot x = \mu x - x^2 = x\,(\mu - x). \tag{9.2} \]

Regardless of the value of µ, there is always one critical point at xC = 0 and one
critical point at xC = µ. Hence, unlike the case of saddle-node bifurcations, the
number of critical points does not change. However, we will show that the character
of these critical points changes at the bifurcation point.
The first critical point is xC = 0. After linearization of system (9.2) we find

\[ \dot x = \mu x. \]
In[61]:= Plot[{Sqrt[-μ], -Sqrt[-μ]}, {μ, -2, 0.5},
          PlotStyle -> {{Dashed, Thick}, {Thick}}, AspectRatio -> 1, AxesLabel -> {"μ", "xC"},
          BaseStyle -> {FontSize -> 15},
          Epilog -> {Disk[{0, 0}, 0.03],
            Text["unstable node", {-1, 1.3}],
            Text["stable node", {-1, -1.3}],
            Text["bifurcation point", {-0.6, 0.1}]
          }
         ]

Out[61]= [bifurcation diagram: xC against μ from -2 to 0.5, dashed upper branch labelled
          "unstable node", solid lower branch labelled "stable node", bifurcation point at the origin]

Fig. 9.1. Saddle-node bifurcation. Diagram for one-dimensional system (9.1).

Obviously, for µ > 0, this critical point is unstable while for µ < 0 it is stable. On the
other hand, after linearization of system (9.2) about the second critical point xC = µ
we have

\[ \dot x = \mu^2 - \mu x. \tag{9.3} \]

This equation is an inhomogeneous linear equation with constant coefficients and can
be solved by elementary methods. First we write down the corresponding homogeneous
equation

\[ \dot x = -\mu x \]

which integrates to

\[ x_H = C e^{-\mu t} \]

where subscript H stands for "homogeneous". Next we need to find any particular
solution of the original inhomogeneous equation. This is trivial, however, for obviously
the choice x = µ is a solution to equation (9.3). Since the general solution of an
inhomogeneous linear equation is the sum of a particular solution and the general
solution of the homogeneous equation, the general solution to equation (9.3) is

\[ x = \mu + C e^{-\mu t}. \]

Constant µ does not affect the character of the critical point (prove!) and only the
exponential term matters. We can see that for µ > 0 the critical point is stable while
for µ < 0 it is unstable.
To summarize, we have found two critical points

\[ x_C = 0 \quad\text{and}\quad x_C = \mu \]

which change their character at µ = 0. The character of both critical points is depicted in
figure 9.2.
Transcritical bifurcations occur when there are two critical points for all values of
the parameter. However, at the bifurcation point (in our case µ = 0), these critical points
interchange their character and the point which was stable becomes unstable and
vice versa.
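
Both steps of this argument can be verified by substitution. The following is a minimal sketch; the names xlin and c are illustrative, with c standing for the integration constant C.

```mathematica
(* Sketch: check the general solution x = μ + C Exp[-μ t] of the linearized
   equation x' = μ^2 - μ x. The names xlin and c are illustrative only. *)
xlin[t_] := μ + c Exp[-μ t];

(* Residual of x' = μ^2 - μ x; should simplify to 0 *)
Simplify[xlin'[t] - (μ^2 - μ xlin[t])]

(* Slopes of f(x) = μ x - x^2 at the critical points x = 0 and x = μ *)
D[μ x - x^2, x] /. {{x -> 0}, {x -> μ}}
```

The two slopes have opposite signs for every µ ≠ 0, which is the exchange of stability described above.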

9.3 Pitchfork bifurcation


Next we examine the system

\[ \dot x = \mu x - x^3. \tag{9.4} \]

Notice that this system is invariant under reflection x ↦ −x, for under this
transformation we have

\[ x \mapsto -x, \qquad \dot x \mapsto -\dot x, \qquad x^3 \mapsto -x^3, \]

and hence

\[ \dot x = \mu x - x^3 \;\mapsto\; -\dot x = -\mu x + x^3 \;\Longleftrightarrow\; \dot x = \mu x - x^3. \]

Thus, equation (9.4) does not change its form under the reflection, i.e. the reflection
is a symmetry of equation (9.4). Pitchfork bifurcations often occur in systems
possessing some kind of symmetry.

In[66]:= xC1[μ_] = Piecewise[{{0, μ < 0}, {μ, μ > 0}}];
         xC2[μ_] = Piecewise[{{μ, μ < 0}, {0, μ > 0}}];

In[83]:= Plot[{xC1[μ], xC2[μ]}, {μ, -2, 2},
          PlotStyle -> {{Blue, Thick}, {Dashed, Red, Thick}}, AspectRatio -> 1, AxesLabel -> {"μ", "xC"},
          Axes -> {False, True}, BaseStyle -> {FontSize -> 15},
          Epilog -> {Disk[{0, 0}, 0.03],
            Text["unstable", {-1, 0.1}],
            Text["stable", {1, 0.1}],
            Text[Style["μ", FontSize -> 15], {1.9, -0.1}]
          }
         ]

Out[83]= [bifurcation diagram: xC against μ from -2 to 2, with labels "unstable" and "stable"]

Fig. 9.2. Transcritical bifurcation diagram for system (9.2).

There is always a critical point xC = 0 regardless of the value of µ. Linearization
of system (9.4) yields

\[ \dot x = \mu x \]

and so this critical point is stable for µ < 0 and unstable for µ > 0.
For µ > 0 there are two other critical points xC = ±√µ. By linearization we find

\[ \dot x = -2 \mu \left( x \mp \sqrt{\mu} \right) \]

which shows that (ignoring the constant term as in the previous section) both critical
points ±√µ are stable. Indeed, µ > 0 and hence the factor standing by x is always
−2µ < 0. All possibilities are plotted in figure 9.3.

In[99]:= xC1[μ_ /; μ <= 0] = 0;
         xC2[μ_ /; μ > 0] = Sqrt[μ];
         xC3[μ_ /; μ > 0] = -Sqrt[μ];
         xC4[μ_ /; μ > 0] = 0;

In[104]:= Plot[{xC1[μ], xC2[μ], xC3[μ], xC4[μ]}, {μ, -2, 2},
           PlotStyle -> {{Blue}, {Blue}, {Blue}, {Red, Dashed}},
           AspectRatio -> 1, AxesLabel -> {"μ", "xC"},
           Axes -> {False, True}, BaseStyle -> {FontSize -> 15},
           Epilog -> {Disk[{0, 0}, 0.03],
             Text["stable", {-1, 0.1}],
             Text["unstable", {1, 0.1}],
             Text[Style["μ", FontSize -> 15], {1.9, -0.1}]
           }
          ]

Out[104]= [bifurcation diagram: xC against μ from -2 to 2, with labels "stable" and "unstable"]

Fig. 9.3. Supercritical pitchfork bifurcation diagram for system (9.4).
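
The stability statements for system (9.4) can be reproduced directly. This is a minimal sketch with illustrative names g, h and cps; the last two lines apply the same steps to the subcritical system ẋ = µ x + x^3 as a starting point for comparison.

```mathematica
(* Sketch for system (9.4); g, h and cps are illustrative names. *)
g[x_] := μ x - x^3;

cps = Solve[g[x] == 0, x]     (* x = 0 and x = ±Sqrt[μ] *)

(* Slope of g at each critical point: μ at x = 0 and -2 μ at x = ±Sqrt[μ] *)
g'[x] /. cps

(* The same steps for the subcritical system x' = μ x + x^3,
   whose non-zero critical points are real only for μ < 0 *)
h[x_] := μ x + x^3;
h'[x] /. Solve[h[x] == 0, x]
```

A negative slope means a stable critical point and a positive slope an unstable one, so the signs of µ and −2µ on each side of µ = 0 give the branches of the diagram.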


Bifurcations of the type discussed are called supercritical pitchfork bifurcations.
Dynamical system

\[ \dot x = \mu x + x^3 \]

is a typical system showing the so-called subcritical pitchfork bifurcation. Show by
standard analysis that the bifurcation diagram for this system is correctly depicted in
figure 9.4.

9.4 Example
Now let us see a non-trivial example of a pitchfork bifurcation. Let the system be

\[ \dot x = \mu x + y + \sin x, \qquad \dot y = x - y. \tag{9.5} \]

Our task is to determine the bifurcation point and the type of bifurcation. We will use
Mathematica to solve particular steps.
First we find critical points by setting ẋ = 0 and ẏ = 0. The second equation
immediately gives y = x and hence the equation for x reads

\[ \mu x + x + \sin x = 0. \tag{9.6} \]

Clearly, a general solution cannot be found analytically but we can see that for
arbitrary µ there is always a solution

\[ x_C = y_C = 0. \]

Let us determine the character of this critical point. The Jacobi matrix of system (9.5)
evaluated at the origin is

\[ J = \begin{pmatrix} \mu + 1 & 1 \\ 1 & -1 \end{pmatrix} \tag{9.7} \]
and its eigenvalues can be found by Mathematica :

In[30]:= J = {{μ + 1, 1}, {1, -1}};
         sys = Eigenvalues[J]

Out[31]= {1/2 (μ - Sqrt[8 + 4 μ + μ^2]), 1/2 (μ + Sqrt[8 + 4 μ + μ^2])}
2 2
In[11]:= xC1[μ_ /; μ <= 0] = 0;
         xC2[μ_ /; μ < 0] = Sqrt[-μ];
         xC3[μ_ /; μ < 0] = -Sqrt[-μ];
         xC4[μ_ /; μ > 0] = 0;

In[17]:= Plot[{xC1[μ], xC2[μ], xC3[μ], xC4[μ]}, {μ, -2, 2},
          PlotStyle -> {{Blue}, {Blue}, {Blue}, {Red, Dashed}},
          AspectRatio -> 1, AxesLabel -> {"μ", "xC"},
          Axes -> {False, True}, BaseStyle -> {FontSize -> 15},
          Epilog -> {{PointSize[Large], Point[{0, 0}]},
            Text["stable", {-1, 0.1}],
            Text["unstable", {1, 0.1}],
            Text[Style["μ", FontSize -> 15], {1.9, -0.1}]
          }
         ]

Out[17]= [bifurcation diagram: xC against μ from -2 to 2, with labels "stable" and "unstable"]

Fig. 9.4. Subcritical pitchfork bifurcation diagram for system ẋ = µ x + x^3.


9.4 Example 197

Although it is trivial to investigate behaviour of eigenvalues as functions of parameter


µ, it is even easier to use Mathematica to plot dependence of λ1 and λ2 on µ.

In[14]:=
λ1[μ_] = sys[[1]]
λ2[μ_] = sys[[2]]
Plot[{λ1[μ], λ2[μ]}, {μ, -10, 10}, PlotStyle -> {Blue, Red}]

Out[14]= 1/2 (μ - Sqrt[8 + 4 μ + μ^2])

Out[15]= 1/2 (μ + Sqrt[8 + 4 μ + μ^2])

Out[16]= [Plot: λ1 (blue) and λ2 (red) as functions of µ for −10 ≤ µ ≤ 10.]

Hence, for all values of µ we have λ1 < 0, while λ2 changes sign at µ = −2. That
means that for µ < −2, when both eigenvalues are negative, the critical point is a
stable node. For µ > −2, the critical point is a saddle point because the eigenvalues have
different signs.
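The sign change can also be checked without a plot. The sketch below relies on the fact that the product of the eigenvalues equals the determinant of the Jacobi matrix (9.7). The names prod, below and above are chosen here for illustration, and \[Mu] is the full name of the symbol µ.

```mathematica
(* Jacobi matrix (9.7) at the origin and its eigenvalues *)
J = {{\[Mu] + 1, 1}, {1, -1}};
sys = Eigenvalues[J];

(* The product of the eigenvalues equals Det[J] = -(\[Mu] + 2), so the
   eigenvalues have opposite signs exactly when \[Mu] > -2 *)
prod = Simplify[Times @@ sys]

(* Numerical spot checks on both sides of \[Mu] = -2 *)
below = Sort[N[sys /. \[Mu] -> -3]]   (* both negative: stable node *)
above = Sort[N[sys /. \[Mu] -> -1]]   (* opposite signs: saddle point *)
```

For µ = −3 the eigenvalues are approximately −2.618 and −0.382. For µ = −1 they are approximately −1.618 and 0.618.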
Clearly, the point µ = −2 is a candidate for a bifurcation point. Since we cannot
solve equation (9.6) exactly, we restrict our attention to a neighbourhood of the potential
bifurcation point µ = −2. By (9.6), critical points are roots of the function

$$R_\mu(x) = (\mu + 1)\,x + \sin x.$$
In figure 9.5 we plot this function for three values of µ. We can see that critical
points different from the origin appear only for µ > −2. Approximate location of
these critical points can be found by expanding function sin x in (9.6) up to the third
order,
$$\sin x \approx x - \frac{1}{3!}\,x^3,$$

so that this equation simplifies to

$$x\,(\mu + 2) - \frac{1}{6}\,x^3 = 0.$$

One solution is, of course, x = 0; the other two are

$$x = \pm\sqrt{6\,(\mu + 2)}. \tag{9.8}$$

[Figure: plot of Rµ(x) against x for −1 ≤ x ≤ 1, with curves for µ = −2.1, µ = −2 and µ = −1.9.]

Fig. 9.5. Plot of function $R_\mu(x) = (\mu + 1)\,x + \sin x$. Its roots are critical points of system (9.5). For
µ ≤ −2, the origin x = 0 is the only critical point; for µ > −2 there are two additional critical points symmetric
about the origin.
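A plot of this kind can be produced along the following lines. This is a sketch only: the function name R, the variable fig and the colours are our choices, and the function plotted is the left-hand side of equation (9.6).

```mathematica
(* Left-hand side of equation (9.6) as a function of \[Mu] and x *)
R[\[Mu]_, x_] = (\[Mu] + 1) x + Sin[x];

(* Curves for \[Mu] = -2.1, -2 and -1.9 *)
fig = Plot[{R[-2.1, x], R[-2, x], R[-1.9, x]}, {x, -1, 1},
   PlotStyle -> {Blue, Black, Red}, AxesLabel -> {"x", "R"}]
```

Only the curve for µ = −1.9 crosses the horizontal axis away from the origin.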

Now we can determine the character of the bifurcation point even without analysing
the new critical points. Recall that the origin is a critical point, stable for µ < −2
and unstable for µ > −2. New critical points emerge at the bifurcation point and exist
for µ > −2. Hence, the bifurcation diagram is similar to that in figure 9.3. We can
deduce that the bifurcation is supercritical and the two new critical points are stable.
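The stability of the new critical points can also be checked directly. The following is a sketch that uses only the approximation (9.8), so it holds in a small neighbourhood of µ = −2. Here $x_C$ denotes either of the two new critical points (with $y_C = x_C$), and the Jacobi matrix is evaluated there rather than at the origin.

```latex
\begin{align*}
J(x_C) &= \begin{pmatrix} \mu + \cos x_C & 1 \\ 1 & -1 \end{pmatrix},
&
\cos x_C &\approx 1 - \tfrac{1}{2}\,x_C^2 \approx 1 - 3(\mu + 2), \\
\det J(x_C) &= -(\mu + \cos x_C) - 1 \approx 2(\mu + 2) > 0,
&
\operatorname{tr} J(x_C) &= \mu + \cos x_C - 1 \approx -2(\mu + 3) < 0 .
\end{align*}
```

A positive determinant together with a negative trace means that both eigenvalues have negative real parts, so the new critical points are indeed stable for µ slightly above −2.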
In Mathematica we can easily find precise locations of the critical points numerically
using function FindRoot. This function needs a starting point, and we choose this
starting point to be the approximate solution (9.8). Full Mathematica code for plotting
the correct bifurcation diagram in the neighbourhood of the bifurcation point µ = −2 is
shown in figure 9.6.

In[218]:=
(* Positive critical point found numerically; the starting value is the approximation (9.8) *)
cp[μ_ /; μ > -2] := FindRoot[μ x + x + Sin[x] == 0, {x, Sqrt[6 (μ + 2)]}][[1, 2]];

In[229]:=
xC1[μ_ /; μ <= -2] = 0;
xC2[μ_ /; μ > -2] = 0;
xC3[μ_ /; μ > -2] = cp[μ];
xC4[μ_ /; μ > -2] = -cp[μ];

In[262]:=
(* Blue solid lines are stable branches, the red dashed line is the unstable one *)
g = Plot[{xC1[μ], xC2[μ], xC3[μ], xC4[μ]},
  {μ, -3, -1}, PlotStyle -> {{Blue}, {Red, Dashed}, {Blue}, {Blue}},
  AspectRatio -> 1, AxesLabel -> {"μ", "xC"},
  Axes -> {False, True}, BaseStyle -> {FontSize -> 15},
  Epilog -> {Disk[{0, 0}, 0.03],
    Text["stable", {-2.5, 0.2}],
    Text["stable", {-1.5, 2.5}],
    Text["unstable", {-1.2, 0.2}],
    Text[Style["μ", FontSize -> 15], {1.9, -0.1}],
    {PointSize[Large], Point[{-2, 0}]},
    Text[Style["μ=-2", FontSize -> 15], {-1.8, 0.2}]
    }
  ]

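Out[262]= [Plot: critical points xC versus µ for −3 ≤ µ ≤ −1; the origin is labelled "stable" for µ < −2 and "unstable" for µ > −2, the upper new branch is labelled "stable", and the point µ = −2 is marked.]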

Fig. 9.6. Supercritical bifurcation point for dynamical system (9.5).
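As a quick check of how good the approximation (9.8) is, we can compare it with the numerical root at a single value of µ. This is a minimal sketch: the value µ = −1.9 and the names mu0, approx and exact are chosen here only for illustration.

```mathematica
(* Sample parameter value slightly above the bifurcation point *)
mu0 = -1.9;

(* Approximate positive critical point from formula (9.8) *)
approx = Sqrt[6 (mu0 + 2)];

(* Numerical root of equation (9.6), started from the approximation *)
exact = x /. FindRoot[mu0 x + x + Sin[x] == 0, {x, approx}];

{approx, exact}
```

The two numbers should agree to within a few percent. The agreement is expected to deteriorate as µ moves away from −2, because the expansion of sin x was truncated at the third order.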


A
Important commands in Mathematica

A.1 D-derivative
Derivatives in Mathematica can be computed in several ways. A command of the form
D[f, x]
differentiates function f with respect to variable x. If we need the n-th order derivative
of f, we use
D[f, {x, n}]
Similarly, mixed partial derivatives with respect to several variables can be calculated
by
D[ f, x, y ]
which is equivalent to

$$\frac{\partial^2 f}{\partial x\,\partial y}.$$
For example, the commands
D[ Sin[x^2], x]
D[ x^3, {x, 2} ]
D[ y x^2 + x y^2, x, y]
are equivalent to the mathematical expressions

$$\frac{d}{dx}\,\sin x^2, \qquad \frac{d^2}{dx^2}\,x^3, \qquad \frac{\partial^2}{\partial x\,\partial y}\left(y x^2 + x y^2\right)$$

and produce the following output

2 x Cos[x^2]
6 x
2 x + 2 y
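As a further illustration, command D can be used to build the Jacobi matrix of system (9.5) instead of computing it by hand. This is only a sketch: the names f, g, jac and jac0 are ours, \[Mu] is the full name of the symbol µ, and the replacement operator /. is explained in appendix B.1.

```mathematica
(* Right-hand sides of system (9.5) *)
f[x_, y_] = \[Mu] x + y + Sin[x];
g[x_, y_] = x - y;

(* Jacobi matrix: each row contains the partial derivatives of one right-hand side *)
jac = {{D[f[x, y], x], D[f[x, y], y]},
       {D[g[x, y], x], D[g[x, y], y]}};

(* Evaluate at the critical point x = y = 0 *)
jac0 = jac /. {x -> 0, y -> 0}
```

The result should reproduce matrix (9.7).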

A.2 Table
Command Table[...] creates one-dimensional or multi-dimensional lists of elements.
A one-dimensional list can be created by
Table[ expr, {i, imin, imax} ]
where expr is some expression depending on variable i. Command Table successively
substitutes the values of i into expression expr and produces a list of expressions. For
example, command
squares = Table[ i^2, {i, 1, 5} ]
produces a list
{1, 4, 9, 16, 25}
which is now stored in variable squares. In order to access individual elements of the
list, use the double-square-brackets [[ and ]]. For example, third element of the list
squares can be accessed via
squares[[ 3 ]]
which returns
9.
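A two-dimensional list is obtained by adding a second iterator. The following short sketch (the name grid and the expression i + 10 j are chosen only for illustration) shows that the first iterator labels the rows and the second one labels the elements within each row.

```mathematica
(* i runs over rows, j runs over the elements of each row *)
grid = Table[i + 10 j, {i, 1, 2}, {j, 1, 3}]
(* {{11, 21, 31}, {12, 22, 32}} *)

(* Element in the second row and the third column *)
grid[[2, 3]]
(* 32 *)
```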
B
Some features of Mathematica

B.1 Rules of replacement


One of the most powerful tools in Mathematica is rule-based replacement. We
start with a simple example. Suppose we have the trivial expression
y
and we want to replace symbol y by some more complicated expression, say $y = x^2$.
Let us write
y /. y-> x^2
In the previous code, symbol /. means that we are going to use some rules of replacement.
The rule itself is
y -> x^2
and says that any occurrence of symbol y will be replaced by expression $x^2$. This can
be useful when the expression is more complicated. Example:
x + y^2 - 1/y /.y->x^2
will replace all occurrences of symbol y in expression $x + y^2 - 1/y$ by $x^2$, so that the
result is
-(1/x^2) + x + x^4
We can define a list of rules as well. Imagine we want to replace simultaneously
x and y in some expression, for example, we want to replace x by x − 1 and y by
$1 - y^2$ in expression $x^2 + y^2$:
x^2 + y^2 /. { x-> x-1, y -> 1-y^2 }
which yields

(-1 + x)^2 + (1 - y^2)^2.


Sometimes it is useful to define the rules separately in order to increase the readability
of the code. The previous example is equivalent to the following:
rules = { x-> x-1, y -> 1-y^2 };
x^2 + y^2 /. rules
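The word "simultaneously" matters. The following sketch contrasts a list of rules, which is applied in one pass, with two replacements applied one after another. The names swapped and chained are chosen here for illustration.

```mathematica
(* Both rules are applied in a single pass, so x and y are exchanged *)
swapped = {x, y} /. {x -> y, y -> x}
(* {y, x} *)

(* Here the first replacement gives {y, y} and the second turns it into {x, x} *)
chained = {x, y} /. x -> y /. y -> x
(* {x, x} *)
```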

B.2 Functions
In Mathematica you can define functions of any type and there are many features to
be covered. Here we discuss only what is necessary for the purposes of our textbook.
A function of one or more variables is defined according to the scheme
funcName[ var1_, var2_, ... ] = expr
where funcName is the name of the new function (the underscore cannot be part of a
name in Mathematica). In square brackets you have to enumerate all variables which
the function depends on. Notice the underscore after the name of each variable.
Assignment is performed via the traditional symbol =. Finally, on the right-hand side
there is an expression for the function.
For example, you can define function $f(x) = 3x\,e^{-x^2}$ as
f[ x_ ] = 3 x Exp[-x^2];
Now you can evaluate it at some point, say 10, by
f[10]
which yields

$$\frac{30}{e^{100}}.$$

If you need the numerical value, type
f[10] //N
to find the result $1.11602 \times 10^{-42}$.
Let us see an example of function of more variables.
f[ x_, y_, z_ ] = x^2 + y^2 + z^2
To evaluate this function at some point, say (1, 2, 3), type
f[1, 2, 3]
to get number 14.
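The listings in chapter 9 also use two constructions not described above: the delayed assignment := and a condition introduced by /; inside the argument pattern. The following minimal sketch illustrates the difference. The names a, h1, h2 and p are chosen here only for illustration.

```mathematica
Clear[a, h1, h2, p];
a = 2;
h1[x_] = a x;     (* right-hand side is evaluated now, so h1[x] is stored as 2 x *)
h2[x_] := a x;    (* right-hand side is evaluated anew at every call *)
a = 5;
{h1[1], h2[1]}    (* {2, 5} *)

(* A condition after /; restricts the arguments for which the definition applies *)
p[x_ /; x > 0] := Sqrt[x];
{p[4], p[-4]}     (* {2, p[-4]} *)
```

An expression such as p[-4] is returned unevaluated. This is why Plot draws a function defined with a condition only on the interval where the condition holds.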

B.3 Pure functions


Pure functions are very useful constructions in Mathematica. In mathematics there
is a difference between f and f (x), although these symbols are (in some contexts)
used as equivalent. Symbol f is a function of, say, one variable x, which means that
it maps real numbers to real numbers, mathematically

$$f : \mathbb{R} \to \mathbb{R}.$$

On the other hand, symbol f (x) is a value of function f at point x. More precisely,
f is a set of ordered pairs (x, y) such that there is only one y for each x. If a pair
(x, y) is an element of f , i.e. (x, y) ∈ f , we write usually

y = f (x).

Thus, f is a set of ordered pairs of real numbers, while f (x) is the single real number
meaning the value of f at point x.
Let us turn back to Mathematica. When you write, for example,
f[x_] = 1 + x^2
you tell Mathematica that the value of function f at point x is $f(x) = 1 + x^2$. But
the name of the argument is irrelevant, for if you write
f[q_] = 1 + q^2
you define exactly the same function! The name of argument is only formal. The
alternative is to use the pure function.
Consider following definition:
f = Function[ 1 + #^2 ]
Here we do not use the names of arguments. The sharp symbol # stands for the argument
of the function regardless of its name. You can verify that function f defined in this way
behaves as function f[x] or f[q] defined above. Similarly, you can define a function of
more variables by
f = Function[ #1^2 + #2^2 ]
where symbols #1 and #2 stand for the first and the second argument, respectively.
Calling
f[x,y]

now yields

$$x^2 + y^2,$$

calling
f[1, 3]
yields number 10.
Pure function can be defined without using command Function by symbol &.
Following three lines are equivalent:
f[x_] = 1 + x^2
f = Function[ 1+#^2 ]
f = (1 + #^2)&
Notation with symbol & is particularly useful if we need to use the function only
at one place but we do not need it later. Then it is unnecessary to define function
separately. For example, suppose that you are given a list
list = { 1, 2, 3, 4, 5 };
and you want to apply function $f(x) = 1 + x^2$ to each element of list. We can use
operator /@:
(1+#^2)& /@ list
Here we defined a pure function (1 + #^2)& which, as we have seen, is an abstract
way of defining function $1 + x^2$. Operator /@ now substitutes each element of list into
this pure function and produces a list
{2, 5, 10, 17, 26}.
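Command Function also accepts explicitly named parameters, which can be more readable than # when the body is long. The sketch below shows this form together with Map, which is the full name of operator /@. The names g1 and g2 are chosen here for illustration.

```mathematica
g1 = Function[x, 1 + x^2];          (* one named parameter *)
g2 = Function[{x, y}, x^2 + y^2];   (* two named parameters *)

{g1[3], g2[1, 3]}
(* {10, 10} *)

(* Map[g1, list] is the same as g1 /@ list *)
Map[g1, {1, 2, 3, 4, 5}]
(* {2, 5, 10, 17, 26} *)
```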

B.4 Expressions
Anything you type in Mathematica is called an expression, and expressions can be divided
into two groups, atomic and composed. Atomic expressions are the simplest
elements, e.g. numbers and symbols. Each expression has the so-called head which can
be found using function Head. For example, try the following code:
Head[2]
Head[4.5]
Head[2 + 3 I ]
Mathematica returns “values”

Integer, Real, Complex


which means that 2 was recognized as an integer, 4.5 as a real, and 2+3i as a complex
number. Mathematica's power rests in its ability to work with symbolic expressions.
If an atomic expression is not identified as a number, its head is Symbol. Verify this
fact for:
Head[x]
Head[Sin]
Head[f]
etc.
Atomic expressions we have seen above can be combined into composed expressions.
For example, symbols x and y are atomic expressions, but their sum x + y is
a composed expression. The head of expression x+y is Plus (check!). If you want to
access individual parts of a composed expression, you can use function Part. For example,
Part[ x+y, 2 ]
returns the second part of composed expression x + y which is y. We can also list all
parts of the composed expression by Level:
Level[ x+y-z+b, 1 ]
yields {b, x, y, -z}. Try the following:
Head[x + y]
Head[x y]
Head[ x^y ]
You can see that Mathematica returns Plus, Times, Power. If, however, you type
Head[x-y]
Mathematica returns Plus again (did you expect “minus”?). The reason is obvious,
for if we type
Level[ x-y, 1 ]
Mathematica returns {x, -y}. Thus, Mathematica treats expression x − y as a sum
of x and −y. Typing
Head[-y]
Level[-y,1]
yields

Times
{-1, y}
Therefore, −y is a product of −1 and y. Reader is invited to experiment with several
expressions in order to get feeling for the structure of Mathematica.
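A convenient tool for such experiments is function FullForm, which prints the whole internal structure of an expression at once. A short sketch:

```mathematica
(* Subtraction is stored as a sum with the second term multiplied by -1 *)
FullForm[x - y]
(* Plus[x, Times[-1, y]] *)

(* Division is stored as a product with a negative power *)
FullForm[x/y]
(* Times[x, Power[y, -1]] *)

(* Part number 0 of an expression is its head *)
Part[x + y, 0]
(* Plus *)
```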

B.5 Working with heads


Heads of arbitrary expressions can be replaced without changing the structure of
expression. For example, expressions
x + y + z
x y z
{x, y, z}
all have the same structure and differ only by the head. We can verify that by functions
Head and Level. These functions reveal that the head of the first expression is Plus,
the head of the second one is Times and the head of the third expression is List.
Nevertheless, calling the function Level shows that the structure of all expressions is
{x, y, z}.
Therefore, by changing the head, we can easily convert those expressions between
themselves. The head of the expression can be changed by function Apply as in the
following example:
Apply[ Plus, {x, y, z} ]
turns the list {x,y,z} into expression x+y+z. The same operation can be written in
an abbreviated form as
Plus @@ {x, y, z}
The head Plus is applied by operator @@ to the list on the right hand side.
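The same operator works with any head. The following sketch shows a few more conversions. The name total is chosen here for illustration.

```mathematica
(* Turn a list into a product *)
Times @@ {x, y, z}
(* x y z *)

(* Turn a sum back into the list of its terms *)
List @@ (x + y + z)
(* {x, y, z} *)

(* Sum of the elements of a list produced by Table *)
total = Plus @@ Table[i^2, {i, 1, 5}]
(* 55 *)
```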
C
Shortcuts in Mathematica

C.1 Greek letters


Greek letters can be typed in several ways. The most convenient is to use the following
table:
α  ESC a ESC    ι  ESC i ESC    σ  ESC s ESC
β  ESC b ESC    κ  ESC k ESC    τ  ESC t ESC
γ  ESC g ESC    λ  ESC l ESC    φ  ESC f ESC
δ  ESC d ESC    µ  ESC m ESC    χ  ESC c ESC
ε  ESC e ESC    ν  ESC n ESC    ψ  ESC y ESC
ζ  ESC z ESC    ξ  ESC x ESC    ω  ESC w ESC
η  ESC h ESC    π  ESC p ESC
θ  ESC q ESC    ρ  ESC r ESC
For example, to type α just press Escape key, then type a and press Escape again.
Mathematica will automatically display symbol α. Another way is to use table

α  \[Alpha]      ι  \[Iota]      σ  \[Sigma]
β  \[Beta]       κ  \[Kappa]     τ  \[Tau]
γ  \[Gamma]      λ  \[Lambda]    φ  \[Phi]
δ  \[Delta]      µ  \[Mu]        χ  \[Chi]
ε  \[Epsilon]    ν  \[Nu]        ψ  \[Psi]
ζ  \[Zeta]       ξ  \[Xi]        ω  \[Omega]
η  \[Eta]        π  \[Pi]
θ  \[Theta]      ρ  \[Rho]
D
To do

• Rotation matrices
• Full analysis of chaotic pendulum
• Matrix eigenvalues
• Volterra-Lotka equations
• Pictures on Lyapunov stability
• More coordinate systems