
Classical Mechanics and Dynamical Systems

Faculty of Transportation Sciences
Czech Technical University in Prague

Contents

1 Classical mechanics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.1 Newton’s laws of motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.2 Index notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

1.2.1 Einstein’s convention . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

1.2.2 Differentiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

1.2.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

1.3 Kinetic energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

1.4 Potential energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

1.5 Conservation of energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

1.6 Conservation of momentum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

1.7 Conservation of angular momentum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

1.8 Curvilinear coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

1.8.1 Polar coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

1.8.2 Spherical coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

2 Lagrange equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

2.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

2.2 Lagrange equations of the second kind . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

2.2.1 Generalized coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

2.2.2 Kinetic energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

2.2.3 Generalized forces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

2.2.4 Derivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

2.3 Lagrange equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

2.4 Particle in homogeneous gravitational field . . . . . . . . . . . . . . . . . . . . . . . . 47

2.5 Harmonic oscillator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

2.6 Mathematical pendulum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51


2.8 Solving the equations of motion of pendulum . . . . . . . . . . . . . . . . . . . . . . 55

2.9 Deriving the Lagrangian in Mathematica . . . . . . . . . . . . . . . . . . . . . . . . . . 57

2.10 Planet in gravitational field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

3 Hamilton’s equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

3.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

3.2 Legendre transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

3.3 Hamilton’s equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

3.4 Particle in homogeneous gravitational field . . . . . . . . . . . . . . . . . . . . . . . . 68

3.5 Conservation of energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

3.5.1 Homogeneous functions and Hamiltonian . . . . . . . . . . . . . . . . . . . . . 70

3.5.2 Conservation of energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

3.6 Phase space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

3.7 Harmonic oscillator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

4 Variational principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

4.1 Fermat’s principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

4.2 Formulation of variational problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

4.3 Variation of the functional . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

4.4 Euler-Lagrange equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

4.5 Non-uniqueness of the Lagrangian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

4.6 Variational derivation of Hamilton’s equations . . . . . . . . . . . . . . . . . . . . . 91

4.7 Noether’s theorem: motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

4.8 Noether’s theorem: proof . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

4.9 Basic conservation laws . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

5.1 Canonical transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

5.2 Hamilton-Jacobi equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106

5.3 Example: harmonic oscillator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

5.4 Action-angle variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

6.1 Lagrangian and equations of motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

6.2 Hamilton’s equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

6.3 Mathematica . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

6.4 Homogeneous fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

6.5 Electromagnetic wave . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124


7.1 Complex sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

7.2 Mandelbrot set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

8.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139

8.2 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141

8.3 Implementation in Mathematica . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

8.4 Chaotic pendulum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146

8.5 Critical points of the pendulum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152

8.6 Stability of critical points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156

8.6.1 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

8.7 Classification of critical points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160

8.7.1 Stable and unstable nodes, saddle points . . . . . . . . . . . . . . . . . . . . . 161

8.7.2 Centres and foci . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166

8.8 General case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174

8.9 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176

8.10 Flow of the vector field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183

8.11 Lyapunov stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186

9 Bifurcations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189

9.1 Saddle-node bifurcation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189

9.2 Transcritical bifurcations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190

9.3 Pitchfork bifurcation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192

9.4 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195

A.1 D-derivative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201

A.2 Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202

B.1 Rules of replacement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203

B.2 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204

B.3 Pure functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205

B.4 Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206

B.5 Working with heads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208


C.1 Greek letters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209

D To do . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211

1 Classical mechanics

Classical mechanics is the most basic part of physics. In fact, physics as an exact science started with the development of mechanics by Sir Isaac Newton. Conventionally we distinguish two parts of mechanics: kinematics and dynamics. In this chapter we introduce basic notions of dynamics, including the notion of generalized coordinates and several conventions used throughout the entire textbook.

The word “kinematics” is derived from the Greek word κινει̃ν (kinein) meaning

“to move”. Thus, kinematics studies the motion of bodies and point masses. It

does not, however, ask why the bodies move in a given way, but rather it provides

us with the description of the motion. In kinematics we ask where the bodies are,

at what velocities and with what accelerations they move. We also classify the kinds

of motion according to the shapes of the trajectories or according to the velocities.

Typical kinematic quantities are position, velocity and acceleration.

In dynamics, on the other hand, we study the causes of the motion. The word “dynamics” has an ancient origin as well: δυναµικóς means “powerful”. In this branch of mechanics we ask what forces act on the bodies and what influence the forces have on the motion. This influence depends not only on the forces themselves, but also on the masses of the bodies. Mass, force and momentum belong to the basic quantities in dynamics.

In this chapter we start with a reformulation of non-relativistic classical, i.e. Newtonian, dynamics. In Newtonian dynamics, physical bodies or idealized point particles move and interact according to Newton's laws of motion. Newton himself formulated them in his Mathematical Principles of Natural Philosophy as follows:


• Law of inertia

Every body persists in its state of being at rest or of moving uniformly straight

forward, except insofar as it is compelled to change its state by force impressed.

• Law of force

The alteration of motion is ever proportional to the motive force impressed; and

is made in the direction of the right line in which that force is impressed.

• Law of action and reaction

To every action there is always an equal and opposite reaction: or the forces of

two bodies on each other are always equal and are directed in opposite directions.

These laws involve the important notions of force, momentum and mass, and we assume that the reader is familiar with them. Consider a point particle of mass m. Choosing some fixed point O (origin) in the space, we can describe the motion of the particle by the position vector (radius vector) r. The position vector is time-dependent if the body is moving with respect to the origin O, which is denoted mathematically by r = r(t).

The trajectory of the point particle is the set of all end-points of the position vector in some time interval (see fig. 1.1). Velocity is defined as the derivative of r(t) with respect to time:

v(t) = dr/dt.

The total derivative with respect to time will often be denoted by a “dot”, so that the last equation is briefly written as v = ṙ(t) = ṙ. Sometimes it is useful to parametrize the position vector by a parameter other than time, for example by the length of the trajectory.

Similarly, the second derivative with respect to time will be denoted by a “double-dot”. The most important example is the definition of acceleration, which is the second derivative of the position vector with respect to time:

a = d²r/dt² = dv/dt = r̈ = v̇.

Quantities r, v and a are so-called kinematic quantities. They describe the motion independently of the causes and reasons of motion. According to Aristotle, the motion is caused by forces, but this is wrong. Aristotle's opinion was so influential that it stopped the progress in physics for the next two thousand years. The experimental research of Galileo Galilei, his discovery of the law of inertia, and finally the grand work of Isaac Newton founded the basis of modern physics.

Why is Aristotle’s point of view wrong? Well, we have to clarify what we mean by

the statement that “the motion is caused by the forces”. The law of inertia says that


Fig. 1.1. The point mass moves along the trajectory. Its position at time t is given by the position vector r(t); the velocity v = ṙ is tangent to the trajectory.

if there is no force, the body will move uniformly along a straight line. We need

the force to change the motion, not to preserve it. Therefore there is no connection

between the force and velocity, but there must be a relation between the force and

acceleration. This crucial point was missed by Aristotle.

The precise form of the relation between force and acceleration is given by Newton's second law, the law of force. We expect that the acceleration has the same direction as the force, and that a bigger force causes a bigger acceleration. Experience teaches us that we need a bigger force to change the motion of heavier bodies, so the acceleration must be inversely proportional to the mass. This simple consideration directly leads us to the suggestion

a = F/m,

where m is the mass of the body and F is the force acting on the body. The last formula is a mathematical expression of Newton's second law, and experiments show that it is in very good accordance with reality, although it fails for high velocities, strong gravitational fields and for microscopic objects.

We can formulate this law in a slightly different form by defining the (linear) momentum p of the body:

p = m v.

Momentum incorporates both the measure of the inertia (mass) and the “state of motion”, the velocity. The force can then be defined as the change of the momentum in time, i.e.


F = dp/dt.

If the mass of the body is constant in time, we have ṗ = mv̇ = ma, which is again

Newton’s law

F = m a.

1.2 Index notation

In the previous section we defined the position vector, velocity and the other quantities geometrically. For example, by the position vector we mean an oriented line segment connecting the origin with a given point, the velocity was defined as a vector tangent to the trajectory, etc. This geometrical language is very convenient and useful and will be developed in more detail throughout the textbook. However, if we want to describe the position of a body, we have to introduce a coordinate system in which we can specify the coordinates. In what follows the notion of coordinates will be crucial. In this section we therefore briefly review the basics of the so-called index notation.

We suppose that the reader is familiar with the Cartesian coordinate system. Consider figure 1.2. Again, the point O is the origin of the reference frame and we want to specify the position of the point P. The position vector r is an oriented line segment connecting points O and P. Now, choose three lines called x, y and z, which are perpendicular to each other and intersect at the origin O. These lines are called axes. Then to each point P we can assign three real numbers called Cartesian coordinates.

Symbolically we write

r = (x, y, z)

and say that x, y and z are the coordinates (or the components) of the position vector

r. In the index notation we define

x1 = x, x2 = y, x3 = z,

so that all three coordinates can be written uniformly as xi, i = 1, 2, 3.

If the position vector depends on time, i.e. r = r(t), also its coordinates xi do:

xi = xi (t).


The components of the velocity are then vi = ẋi,

and the components of the acceleration are

ai = v̇i = ẍi .

Any vector equation can then be written equivalently in the index form. For example, Newton's law of force F = m a is equivalent to the equation

Fi = m ai .

Substituting for ai we obtain the index form of the law of force:

Fi = m ẍi .

Recall that the momentum of the particle was defined as p = m v. In the index

notation we can write

pi = m vi = m ẋi .

1.2.1 Einstein's convention

Let x and y be arbitrary vector quantities with corresponding components xi and yi. In other words,

x = (x1, x2, x3),   y = (y1, y2, y3). (1.1)

The scalar product of these vectors can be defined as the scalar quantity

x · y = x1y1 + x2y2 + x3y3.

Using the summation symbol Σ, the last equality can be rewritten as

x · y = Σ_{i=1}^{3} xi yi.

This notation means that we subsequently substitute values 1, 2, 3 for the variable

i and then add all terms. Notice that under the summation symbol we have two

vector quantities and the index i appears there exactly twice. In fact, expressions of

this type arise very often in mathematics and physics. Albert Einstein introduced a

convention named after him, in which we do not write the symbol Σ. More precisely,

if some index appears in some term exactly twice, the sum over this index is

automatically assumed. That is, the scalar product can be written simply as

x · y = xi yi.


1.2.2 Differentiation

Let us see another example. Suppose that f is a physical quantity depending on the position, i.e. f = f(x), and let the coordinates in turn depend on time, i.e. xi = xi(t). Then also the quantity f depends on time, for we can write

f(t) = f(x(t)).

Its total time derivative is given by the chain rule:

ḟ = (∂f/∂x1)(dx1/dt) + (∂f/∂x2)(dx2/dt) + (∂f/∂x3)(dx3/dt) = Σ_{i=1}^{3} (∂f/∂xi)(dxi/dt).

We can see that the expression under the sum has again the same structure: index i

appears there exactly twice. According to Einstein’s convention we therefore write

ḟ = (∂f/∂xi)(dxi/dt).

For convenience we also introduce the notation

∂/∂xi ≡ ∂i,

and together with the notation ẋi = dxi/dt we can write simply

ḟ = ẋi ∂i f.

1.2.3 Examples


Fig. 1.2. Position vector in Cartesian coordinates.

Example 1: scalar product

Compute the scalar product of the vectors x = (1, 2, 3) and y = (−5, 1, 1).

Solution. Both vectors have three components, so that the index i takes the values i = 1, 2, 3. In the index notation, the scalar product reads

x · y = xi yi.

Since the index i repeats twice in the previous expression, according to Einstein’s

summation convention we have

xi yi = x1 y1 + x2 y2 + x3 y3 .

For the given vectors,

xi yi = 1 · (−5) + 2 · 1 + 3 · 1 = 0,

so that x · y = 0.
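The same check can be performed directly in Mathematica, whose built-in Dot operator (.) implements exactly this sum:

{1, 2, 3} . {-5, 1, 1}   (* gives 0 *)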


Example 2: divergence

Let

v = (v1, v2, v3)

be a vector field. Its divergence is, in the index notation, ∂i vi. Find the explicit expression for the divergence of the vector field

v = (x − y, x² + y², xy). (1.3)

Solution. Writing out the repeated index, we have

∂i vi = ∂1v1 + ∂2v2 + ∂3v3,

or, equivalently,

∂i vi = ∂v1/∂x + ∂v2/∂y + ∂v3/∂z.

Substituting (1.3) we find

∂1v1 = 1,   ∂2v2 = 2y,   ∂3v3 = 0, (1.4)

so that

∂i vi = 1 + 2y.
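The result can be cross-checked in Mathematica, which in recent versions provides the built-in function Div:

Div[{x - y, x^2 + y^2, x y}, {x, y, z}]   (* gives 1 + 2 y *)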

Example 3: Laplacian

The Laplace operator (Laplacian) is defined by

∆f = ∂i ∂i f,

where f is an arbitrary object (a scalar function or a component of a vector). Find the expression for the Laplacian in the Cartesian coordinates.


Solution. Writing out the repeated index, we find

∆f = ∂i∂i f = ∂1∂1 f + ∂2∂2 f + ∂3∂3 f = ∂²f/∂x² + ∂²f/∂y² + ∂²f/∂z².

Thus, the Laplacian reads

∆ = ∂²/∂x² + ∂²/∂y² + ∂²/∂z².
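Again, recent versions of Mathematica provide a built-in Laplacian; applied for instance to the function x² + y² + z² it gives

Laplacian[x^2 + y^2 + z^2, {x, y, z}]   (* gives 6 *)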

Example 4: derivatives of the radius

Let

r = (x, y, z)

be the radius vector with magnitude

r = √(r · r) = √(x² + y² + z²).

Find the derivatives of r with respect to the Cartesian coordinates and write down the result in the index notation.

Solution. We need to evaluate quantities ∂i r. We start with ∂1 r, i.e. with the deriva-

tive with respect to coordinate x. We have

∂r/∂x = ∂x (x² + y² + z²)^{1/2} = ½ (x² + y² + z²)^{−1/2} · 2x = x/√(x² + y² + z²).

Thus, we arrived at

∂r/∂x = x/r.

Similarly, one can show that for the other coordinates the following holds:

∂r/∂y = y/r,   ∂r/∂z = z/r.

All these results can be summarized in the index notation as

∂i r = xi/r.


Example 5: circular motion

Now let the radius vector from the previous example be time-dependent in such a way that

x(t) = b cos ωt,
y(t) = b sin ωt, (1.5)
z(t) = 0.

Find the magnitudes of the velocity and of the acceleration.

Solution. Since r = (x, y, z), we can find the velocity by straightforward differenti-

ation:

ẋ = − b ω sin ωt,

ẏ = b ω cos ωt, (1.6)

ż = 0.

Its magnitude is

v = √(v · v) = √((bω sin ωt)² + (bω cos ωt)²) = ωb.

Differentiating once more we obtain the acceleration, whose magnitude is

a = √(a · a) = ω²b.
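Both magnitudes can be verified in Mathematica by straightforward differentiation; the assumptions b > 0 and ω > 0 are needed for the square roots to simplify:

x[t_] = b Cos[ω t]; y[t_] = b Sin[ω t];
Simplify[Sqrt[x'[t]^2 + y'[t]^2], b > 0 && ω > 0]    (* b ω *)
Simplify[Sqrt[x''[t]^2 + y''[t]^2], b > 0 && ω > 0]  (* b ω^2 *)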

Example 6: a useful identity

Prove the identity

d(v²)/dt = 2 v · v̇, (1.7)

preferably in the index notation.


Solution. The components of v are

v = (v1, v2, v3),

and the magnitude of v satisfies v² = v · v = v1v1 + v2v2 + v3v3. Differentiating with respect to time we obtain

d(v²)/dt = 2v1v̇1 + 2v2v̇2 + 2v3v̇3 = 2 v · v̇.

Solution in index notation. The proof is essentially identical to the previous one,

but more compact:

d(v²)/dt = d(vi vi)/dt = 2 vi v̇i = 2 v · v̇.
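The identity can also be checked symbolically in Mathematica:

vel = {v1[t], v2[t], v3[t]};
Simplify[D[vel . vel, t] == 2 vel . D[vel, t]]   (* True *)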

1.3 Kinetic energy

The notion of energy is somewhat subtle and its full understanding relies on the so-called Emmy Noether theorems, see sections 4.7 and 4.8 in chapter 4. In mechanics, however, the situation is quite simple. Roughly speaking, a body has energy if it can perform work.

A force can cause the displacement of a body, and the quantity known as “work” is a quantitative characteristic of this process. Suppose that the force F causes the displacement of the body along a given trajectory from point A to point B. The work done by this force is defined by

W = ∫_A^B F · dr.

Suppose, in addition, that the body was at rest at point A, while the final velocity

of the body was v. Let us evaluate the work done by the force in terms of the final

velocity. Using Newton’s law in the form


F = m v̇

we have

W = m ∫_A^B (dv/dt) · dr.

Along the trajectory, the displacement is dr = v dt,

so that

W = m ∫_A^B v · dv.

Since v · dv = ½ d(v²), this becomes

W = m ∫_A^B d(v²/2).

W = ½ m (vB² − vA²) = ½ m v²,

where we have used the assumptions

vA = 0, vB = v.

Let us discuss the result

W = ½ m v²

in some detail. First, we can see that the work W done by the force F does not

depend on the trajectory. It does not matter whether the body was moving along

the line or along the curved trajectory, the result depends only on the final velocity

v. Moreover, the work does not depend on the character of motion: the body could

be accelerated uniformly with constant acceleration, or it could be accelerated with


variable acceleration, but the work depends only on the final velocity. And, finally,

the work does not depend on the force. The force could be small and act for a long

time, or it could be big and act only for a moment, but the work depends only on

the final velocity.

Summa summarum, if the body was at rest at the beginning but it had velocity v at the end, the work needed to accelerate the body is always the same, regardless of how it was accelerated. This work is called kinetic energy and is defined by

T = ½ m v² = ½ m ẋi ẋi. (1.8)

The body has kinetic energy if it is moving, and the kinetic energy is equal to the work which must be done by the force to accelerate the body from rest to velocity v.

1.4 Potential energy

The notion of potential energy is a subtle one and it cannot be defined for a general system, as we will see later in this textbook. Suppose that there is some force acting on the particle. It can be the gravitational force, the electromagnetic force or the force which a spring exerts on a point mass attached to one of its endpoints. In the cases just enumerated, we can give an explicit expression for the force. The gravitational force is given by Newton's gravitational law

Fg = G m1 m2 / r²,

where m1 and m2 are the masses of the bodies, r is their distance and G is the gravitational constant. The electromagnetic force acting on a point charge q moving at velocity v in an electromagnetic field characterized by electric field E and magnetic field B is the so-called Lorentz force

F_EM = q (E + v × B).

When the spring is displaced from its equilibrium position by y, it exerts the force

F = −ky,

where k is the constant characterizing the spring. Hence, one way to characterize the force is to give an explicit expression. Since the force is a vector quantity, we have to specify three components Fx, Fy and Fz.


Kinetic energy, introduced in the previous section, is a quantity characterizing the state of motion. Recall that it is equal to the work which is necessary to accelerate the body of mass m from the state of rest to the state of motion at velocity v. An important feature of kinetic energy is that it does not depend on the process by which the body acquired its velocity.

Now, suppose that the body is under the influence of a force so that its velocity is being changed. This is connected to a corresponding change of the kinetic energy of the body. For example, a body released from some altitude undergoes the change from zero velocity to accelerated motion called free fall under the influence of the gravitational force. In this case it is the gravitational field which performs the work on the body, and this work is equal to the change of the kinetic energy of the body. Thus, another way to characterize the force is to specify how the kinetic energy of the body changes under this force. Potential energy will therefore be defined as the work performed by the force acting on the body.

Suppose that under the influence of the force, the body was displaced from point A to point B along the trajectory γ depicted in figure 1.3. The work performed by the force during this motion is, as usual, defined by

W = ∫_γ F · dr, (1.9)

where the symbol γ denotes the trajectory along which the body was moving. However, if we choose any other curve γ′, figure 1.3, the work associated with this curve will be, in general, different:

W′ = ∫_{γ′} F · dr ≠ W. (1.10)

In such a case, the notion of potential energy is useless because it depends on the particular trajectory. Hence, in general, potential energy is a meaningless quantity. Surprisingly enough, there are many examples of forces for which the work W in fact does not depend on the choice of the trajectory γ. Such forces are called conservative or potential forces, and in such cases we can define a useful and meaningful potential energy.

Let us find which forces have this property. We demand that the integral (1.9) does not depend on γ but only on the points A and B, and investigate the consequences of this assumption. Then, however, the work performed along any closed loop must be equal to zero. Indeed, let γ be an arbitrary closed loop as depicted in figure 1.4. Let us choose two arbitrary points A and B lying on the curve. In this way we obtain two curves γ1 and γ2, both connecting the points A and B, but along different trajectories.

Fig. 1.3. Under the influence of the force F, the body moves from point A to point B. Potential energy is the work done by the force during this displacement. However, there are infinitely many trajectories connecting these two points and the work is, in general, different for each of them.

The integral over γ can be written as a sum:

∮_γ F · dr = ∫_{γ1} F · dr + ∫_{γ2} F · dr. (1.11)

We have made the assumption that for any two points A and B the integral between these two points does not depend on the trajectory but only on the points themselves. Thus, we can write

∫_{γ1} F · dr = ∫_A^B F · dr,

where we do not specify the trajectory, as the integral does not depend on it. A similar consideration applies to the integral over γ2, but notice that this curve starts at point B and ends at point A. Hence,

∫_{γ2} F · dr = ∫_B^A F · dr = − ∫_A^B F · dr.

Therefore, both integrals on the right hand side of (1.11) have the same value but opposite signs, and so we arrive at


∮_γ F · dr = 0, (1.12)

as claimed. Conversely, we leave it to the reader to show that if (1.12) holds for an arbitrary closed loop γ, then necessarily the integral between any two points does not depend on the trajectory. Conservative forces are those for which (1.12) holds.

There is yet another formulation of the fact that the force is conservative. This last formulation is convenient for practical purposes because it is a differential rather than an integral criterion for a force to be conservative. If γ is an arbitrary closed loop and the force is conservative, i.e. (1.12) holds, we can use Stokes' theorem to convert the line integral into a surface integral:

∮_γ F · dr = ∫_{S(γ)} (∇ × F) · dS, (1.13)

where S(γ) is the surface surrounded by the loop γ. Then, by the conservative character of F, Stokes' theorem implies

∫_{S(γ)} (∇ × F) · dS = 0

for an arbitrary loop γ. But since the choice of γ is arbitrary, the last equality can hold

for all possible loops only if the integrand vanishes everywhere, i.e.

∇ × F = 0. (1.14)

In other words, the curl of a conservative field F is necessarily zero. Poincaré's lemma then asserts that any vector field with vanishing curl is the gradient of some scalar field φ,

F = − ∇φ, (1.15)

so that the components of the force are given by the partial derivatives of the function φ:

Fi = −∂φ/∂xi ≡ −∂i φ. (1.16)

The minus sign is conventional.

Let us recapitulate. We have discussed the notion of the work performed on the body by the force of the external force field in which the body is moving. We have argued


Fig. 1.4. Let γ be any closed trajectory (a loop), γ = γ1 ∪ γ2, and A and B any two of its points which split the curve γ into the union of curves γ1 and γ2.

that this work in general depends not only on the initial and final positions but also

on the trajectory. Then we defined a special class of the forces for which this is not

true and the work is actually path-independent and called such forces conservative

or potential. We have found four equivalent criteria for the force to be conservative:

• For arbitrary points A and B, the integral

W = ∫_A^B F · dr (1.17)

does not depend on the trajectory connecting A and B.

• For an arbitrary closed curve γ the integral W vanishes,

∮_γ F · dr = 0. (1.18)

• The curl of the force vanishes,

∇ × F = 0. (1.19)

• There exists a scalar function φ such that

F = − ∇φ. (1.20)

The function φ, if it exists, is called the potential of the vector field F, or simply the potential energy.
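As an illustration, consider an assumed example force (not taken from the discussion above): the linear restoring force F = −k(x, y, z). Mathematica confirms that its curl vanishes and that it is indeed the gradient of the potential φ = k(x² + y² + z²)/2:

f = {-k x, -k y, -k z};                (* assumed example force *)
Curl[f, {x, y, z}]                     (* {0, 0, 0}: conservative *)
φ = k (x^2 + y^2 + z^2)/2;
Simplify[f == -Grad[φ, {x, y, z}]]     (* True *)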


Often the potential is denoted by V instead of φ. This convention will be followed later in the textbook.

To conclude this section we repeat what the potential energy is. This term can be defined only for conservative forces, i.e. for forces satisfying one of the equivalent conditions¹ (1.17)–(1.20). Then, by (1.20), there exists a function φ such that F = −∇φ. This function is, by definition, called the potential energy. Recall that our original motivation was to characterize the field not by the force but by the work which the force performs during the motion of the body. This work, for conservative forces, is directly related to the potential:

W = ∫_A^B F · dr = − ∫_A^B (∇φ) · dr = − ∫_A^B dφ = φA − φB. (1.21)

Thus, the work performed by the force is equal to the difference of the values of the potential at the initial and at the final point of the trajectory. In the proof we have used the identity dφ = (∇φ) · dr.

1.5 Conservation of energy

The adjective “conservative”, introduced to name forces which display properties (1.17)–(1.20), reflects the fact that the energy of a moving body is constant in such a force field. We define the total mechanical energy of the body in the conservative field F = −∇φ by

E = T + φ. (1.22)

This quantity is constant in time. Before we prove this statement, notice that by Newton's second law and the definition of the potential, the acceleration of the body is

a = dv/dt = (1/m) F = − (1/m) ∇φ.

Let us differentiate the total energy with respect to time:

¹ The equivalence means that if the force satisfies one of these conditions, it automatically satisfies the remaining three conditions.


dE/dt = d/dt (½ m v² + φ) = m v · v̇ + φ̇ = − v · ∇φ + (∇φ) · v = 0. (1.23)

Thus, the quantity E is indeed constant in time, i.e. it is conserved.

Theorem 1. The mechanical energy of a system in which the forces are potential is constant.
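The conservation can also be illustrated numerically. The following sketch integrates the motion of a unit-mass particle in the assumed potential φ = x²/2 (so that F = −x) and evaluates E = ½v² + φ at several times:

sol = NDSolve[{x''[t] == -x[t], x[0] == 1, x'[0] == 0}, x, {t, 0, 10}];
energy[t_] = (1/2) x'[t]^2 + (1/2) x[t]^2 /. First[sol];
energy /@ {0., 2.5, 5., 10.}   (* all values ≈ 0.5 *)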

Later we will see that the conservation of energy is in fact a consequence of a deeper principle: the laws of motion cannot depend on time, i.e. the laws are the same at all times. We say that time is homogeneous. The precise meaning of this statement will be clarified in sections 4.7 and 4.8.

1.6 Conservation of momentum

Consider a system of N particles, each of which has position vector ri, velocity vi = ṙi and acceleration ai = v̇i = r̈i, where i = 1, 2, . . . , N. The mass of the i-th particle will be denoted by mi, and hence the momentum of the i-th particle is pi = mi vi.

These particles interact with each other and, in addition, there can be an external force acting on each particle, e.g. the gravitational force. The internal force exerted by the i-th particle on the j-th particle will be denoted by Fij. In accordance with the law of action and reaction, the internal forces obey the relations

Fij = −Fji. (1.24)

According to the law of force, the total force exerted on the i-th particle is equal to the derivative of its momentum, i.e.

dpi/dt = Fi + Σ_{j≠i} Fij, (1.25)

where the total force on the right hand side is a sum of the external force and internal

forces exerted by all other particles.

The total momentum of the system is the sum of the momenta of all particles,

P = Σi pi,

and its time derivative is

Ṗ = Σi ṗi = Σi Fi + Σi Σ_{j≠i} Fij.


Now we use the law of action and reaction (1.24). In the expression

Σi Σ_{j≠i} Fij

we sum over all (ordered) pairs of particles. For each pair (i, j) contributing Fij to the sum, there is a pair (j, i) contributing Fji = −Fij. Hence, the total sum of all internal forces is necessarily equal to zero and the time derivative of the momentum reads

Ṗ = Σi Fi. (1.26)

In other words, the total momentum changes only because of the external forces and

internal interaction does not contribute to the overall change of momentum. If there

are no external forces, the total momentum is constant,

Ṗ = 0. (1.27)

Law (1.26) states that the total change of the momentum is equal to the external force impressed on the system, and the total momentum is constant if there are no external forces. A system with no external forces is called isolated because of the lack of its interaction with surrounding bodies. Thus, law (1.26) can be reformulated as follows.

Theorem 2. The total momentum of an isolated system is constant in time (conserved).

Later we will see that the conservation of momentum is a consequence of the homogeneity of the space. The notions of homogeneity and isotropy of space and their relation to the conservation laws are discussed in detail in sections 4.7 and 4.8.

1.7 Conservation of angular momentum

For a single particle, as well as for a system of particles, the force impressed will cause a change of the momentum. However, in the case of a system of N particles, it makes sense to distinguish two kinds of motion: translation, when the body changes its position, and rotation.

In the introductory courses of elementary physics it is explained that the rotational effect of a force can be quantified by the so-called torque (or moment of force) with respect to a fixed origin, defined by


M = r × F. (1.28)

The magnitude of the torque is

M = r F sin α,

where α is the angle between both vectors. Because of the presence of the cross product, the torque vanishes if the force is parallel to the position vector. In such a case we expect that the force will not cause a rotation. On the contrary, the rotational effect of the force will be maximal if the vectors r and F are orthogonal, see figure 1.5.

Fig. 1.5. Rotational effect of the force F on a disk attached to a fixed point in its centre. Any force F can be decomposed into the normal part Fn and the tangential part Ft. Clearly, the normal part Fn does not affect the rotation of the disk and only the tangential part Ft is responsible for rotation. The magnitude of the tangential part is given by Ft = F sin α and hence we define the torque by (1.28).

While the torque characterizes the rotational effect of the force exerted, angular momentum characterizes the rotational state of motion. The angular momentum of the i-th particle with respect to a fixed origin is defined as

li = ri × pi. (1.29)


The total angular momentum of the system is the sum over all particles,

L = Σi li. (1.30)

Its time derivative is

L̇ = Σi l̇i = Σi [ṙi × pi + ri × ṗi].

Since pi = mi ṙi, the vectors ṙi and pi are parallel and hence their cross product vanishes.

X

L̇ = r i × ṗi .

i

Because ṗi is the total force acting on the i-th particle, we can see that the rate of change of the angular momentum of the i-th particle is given by the torque of the total force acting on this particle. However, we can proceed further and decompose ṗi into the external force and the sum of internal forces (as in the previous section) to find

L̇ = Σi ri × Fi + Σi Σ_{j≠i} ri × Fij.

Repeating the argument based on the action-reaction law (and assuming that the internal forces act along the lines joining the particles, so that ri × Fij + rj × Fji = (ri − rj) × Fij = 0), we conclude that the total change of the angular momentum is

L̇ = Σi ri × Fi = M, (1.31)

where

M = Σi ri × Fi

is the total torque of the external forces.

Hence, the internal forces do not contribute to the total change of the rotational state of the system, i.e. internal forces cannot affect the total angular momentum. The only reason why the system of particles can change its angular momentum is the presence of external forces. Again, when no external forces are present and the system is isolated, the total angular momentum is conserved.

Theorem 3. The total angular momentum of an isolated system is constant.


1.8 Curvilinear coordinates

In this chapter we have introduced the familiar Cartesian coordinate system. In the Cartesian coordinates we assign a triple of numbers (x, y, z), or xi where i = 1, 2, 3, to each point of the space. In chapter 2 we will see that sometimes it is useful to use a different coordinate system which is better adapted to the problem to be solved. The motivation will be presented in chapter 2, but let us introduce the most common coordinate systems here.

In geometry, coordinates different from the Cartesian ones are called curvilinear coordinates, because the axes associated with non-Cartesian coordinates are usually curves rather than lines, see below. In mechanics we often refer to curvilinear coordinates as generalized coordinates, in the sense that the Cartesian coordinates comprise only a special class of more general coordinate systems. In this book we use the convention that the Cartesian coordinates will always be denoted by the symbol x and labelled by the Latin indices

i, j, k, . . .

The indices run from 1 to n, where n is the dimension of the space: n = 3 for ordinary three-dimensional space, n = 2 for the plane. Later we will meet abstract spaces with higher dimensions, e.g. the phase space. Hence, for n = 3 we have three coordinates x1, x2, x3.

We do not specify the values of the indices i, j, k, . . . if the dimension is clear from the context. Notice that the symbol x without an index stands, in general, for the n-tuple of coordinates xi, where i = 1, 2, . . . , n. Occasionally, we use the standard notation

x1 = x, x2 = y, x3 = z,

Generalized coordinates will be denoted by the symbol q and labelled by the Latin indices

a, b, c, . . .

Again, the symbol q without an index stands for the n-tuple of coordinates qa,

q = (q1, q2, . . . qn).


Usually, if the generalized coordinates qa have a direct geometrical meaning, we use specific symbols for individual coordinates. For example, if q1 has the meaning of a distance, it will be denoted by q1 = r; if q2 is an angle, it will be denoted by q2 = φ.

When dealing with a coordinate transformation from Cartesian coordinates to curvilinear coordinates, we often need the Jacobi matrix of the transformation. The Jacobi matrix J is the matrix of first derivatives of the "new" coordinates with respect to the "old" ones. The elements of the Jacobi matrix are therefore defined by Jia = ∂xi/∂qa, i.e.

    ( ∂x1/∂q1   ∂x1/∂q2   ···   ∂x1/∂qn )
J = ( ∂x2/∂q1   ∂x2/∂q2   ···   ∂x2/∂qn )
    (    ⋮                              )
    ( ∂xn/∂q1   ∂xn/∂q2   ···   ∂xn/∂qn )

Notice that the matrix itself is denoted by the bold symbol J while the elements of

the matrix are denoted by Jia .

Suppose that we are given the transformation

xi = xi(q)

from curvilinear coordinates to the Cartesian coordinates. Notice that the last equation is in fact an abbreviation for n transformation relations. If we invert these relations we arrive at the inverse coordinate transformation

qa = qa(x),

whose Jacobi matrix J̄ has the elements

J̄ai = ∂qa/∂xi. (1.32)

Let us take the matrix product of the Jacobi matrix J and the matrix J̄ of the inverse transformation. We find

Jia J̄aj = (∂xi/∂qa)(∂qa/∂xj) = ∂xi/∂xj = δij,

where we have used the chain rule for partial derivatives in the last step. Since δij are the components of the unit matrix, we have

J · J̄ = I,

so that

J̄ = J⁻¹,

i.e. the Jacobi matrices of the direct and inverse coordinate transformations are mutually inverse.

1.8.1 Polar coordinates

Polar coordinates are defined in the plane rather than in three-dimensional space, see figure 1.6. Let (x, y) be the Cartesian coordinates of a given point with position vector r. The distance of this point from the origin is denoted by r and is related to the Cartesian coordinates by

r = √(x² + y²).

Now we denote the angle between the position vector and the x-axis by θ, see figure 1.6. Then the pair (r, θ) constitutes the polar coordinates of the point under consideration. Clearly, polar coordinates and Cartesian coordinates are related by the equations

x = r cos θ,

(1.33)

y = r sin θ.

The corresponding inverse relations read

r = √(x² + y²),
θ = arctan(y/x). (1.34)

In the notation introduced above, the Cartesian coordinates for the plane are x1 = x and x2 = y, while the generalized coordinates are

q1 = r, q2 = θ.


Fig. 1.6. Polar coordinates in the plane. Cartesian coordinates of the point are (x, y), polar coordinates are (r, θ), where r is the distance of the point from the origin and θ is the angle between the radius-vector and the x-axis.

The Jacobi matrix of the transformation (1.33) is

    ( ∂x/∂r   ∂x/∂θ )   ( cos θ   −r sin θ )
J = ( ∂y/∂r   ∂y/∂θ ) = ( sin θ    r cos θ ). (1.35)

Let us see how this result can be obtained using Mathematica. First we define function

Jacobi which accepts the list of the Cartesian coordinates xs, the list of generalized

coordinates qs and the list of transformation rules rules. These rules are assumed to

be of the form

{ x1 -> ..., x2 -> ..., etc.}

where the dots express the Cartesian coordinates in terms of generalized ones. Func-

tion Jacobi can be defined, for example, as follows:

In[1]:= (* x_i = x_i(q); J_ia = D[x_i, q_a] *)
        Jacobi[xs_, qs_, rules_] := Outer[D, xs /. rules, qs]

For our particular example of polar coordinates, this function should be called in the following way:

In[2]:= Jacobi[{x, y}, {r, θ}, {x -> r Cos[θ], y -> r Sin[θ]}] // MatrixForm
Out[2]//MatrixForm=
( Cos[θ]   -r Sin[θ] )
( Sin[θ]    r Cos[θ] )

We can see that the result is identical with the previous one. In the rest of this

chapter we will use function Jacobi freely without explicitly mentioning it. Moreover,

we can call the function Inverse to find

      (  cos θ        sin θ    )
J⁻¹ = ( −(sin θ)/r   (cos θ)/r ).

By (1.32), we can deduce the partial derivatives of generalized coordinates with

respect to the Cartesian ones without actually calculating them:

∂r/∂x = cos θ,   ∂r/∂y = sin θ,
∂θ/∂x = −(sin θ)/r,   ∂θ/∂y = (cos θ)/r. (1.36)

Now suppose that we want to describe the motion of a particle in the polar

coordinates. Since the particle is moving, its Cartesian coordinates will depend on

time, xi = xi (t), or explicitly

x = x(t), y = y(t).

The Cartesian components of the velocity are

vx = dx/dt, vy = dy/dt.

If the Cartesian coordinates depend on time, so do the polar coordinates, i.e. qa =

qa (t), or explicitly

r = r(t), θ = θ(t).


Differentiating the transformation relations (1.33) with respect to time we find

ẋ = d/dt (r(t) cos θ(t)) = ṙ cos θ − r θ̇ sin θ,
ẏ = d/dt (r(t) sin θ(t)) = ṙ sin θ + r θ̇ cos θ. (1.37)

The magnitude of the velocity is then

v² = ẋ² + ẏ² = ṙ² + r² θ̇². (1.38)
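The same computation can be delegated to Mathematica, in the spirit of the spherical-coordinate example shown in the next subsection:

x[t_] = r[t] Cos[θ[t]];
y[t_] = r[t] Sin[θ[t]];
Simplify[x'[t]^2 + y'[t]^2]   (* r'[t]^2 + r[t]^2 θ'[t]^2 *)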

1.8.2 Spherical coordinates

Spherical coordinates are analogous to polar coordinates but they are defined in three-dimensional space. The geometrical meaning of spherical coordinates is depicted in figure 1.7. Again, r is the distance of the point from the origin and θ is the angle between the position vector and the z-axis. Next we project the position vector onto the xy-plane, obtaining a vector r′. The angle between this vector and the x-axis is denoted by φ. By simple geometry we find the transformation relations

x = r sin θ cos φ,

y = r sin θ sin φ, (1.39)

z = r cos θ.

The corresponding inverse relations read

r = √(x² + y² + z²),
θ = arctan(√(x² + y²)/z) = arccos(z/√(x² + y² + z²)) = arcsin(√(x² + y²)/√(x² + y² + z²)), (1.40)
φ = arctan(y/x).

The Jacobi matrix and its inverse are

    ( sin θ cos φ   r cos θ cos φ   −r sin θ sin φ )
J = ( sin θ sin φ   r cos θ sin φ    r sin θ cos φ ),
    ( cos θ        −r sin θ           0            )

      ( sin θ cos φ         sin θ sin φ         cos θ      )
J⁻¹ = ( (cos θ cos φ)/r     (cos θ sin φ)/r    −(sin θ)/r  ), (1.41)
      ( −(csc θ sin φ)/r    (cos φ csc θ)/r     0          )


Fig. 1.7. Spherical coordinates: the point (x, y, z), the angles θ and φ, and the projection r′ of the position vector onto the xy-plane.

where

csc x = 1/sin x.

Components of the velocity can be calculated in the same way as in the previous

subsection. However, we can use Mathematica as in the following example:

In[31]:= x[t_] = r[t] Sin[θ[t]] Cos[φ[t]];
         y[t_] = r[t] Sin[θ[t]] Sin[φ[t]];
         z[t_] = r[t] Cos[θ[t]];
         Print["x' = ", x'[t]]
         Print["y' = ", y'[t]]
         Print["z' = ", z'[t]]
         Print["v^2 = ", Simplify[x'[t]^2 + y'[t]^2 + z'[t]^2]]

x' = Cos[φ[t]] Sin[θ[t]] r'[t] + Cos[θ[t]] Cos[φ[t]] r[t] θ'[t] - r[t] Sin[θ[t]] Sin[φ[t]] φ'[t]
y' = Sin[θ[t]] Sin[φ[t]] r'[t] + Cos[θ[t]] r[t] Sin[φ[t]] θ'[t] + Cos[φ[t]] r[t] Sin[θ[t]] φ'[t]
z' = Cos[θ[t]] r'[t] - r[t] Sin[θ[t]] θ'[t]
v^2 = r'[t]^2 + r[t]^2 θ'[t]^2 + r[t]^2 Sin[θ[t]]^2 φ'[t]^2


The last line of the output shows that the magnitude of the velocity in spherical coordinates is

v² = ṙ² + r² θ̇² + r² sin²θ φ̇².

2 Lagrange equations

2.1 Motivation

The basic equation of classical mechanics is Newton's law of force. If the force F acts on a point mass m, this point mass undergoes an acceleration a according to the formula

a = F/m.

In the previous chapter we have introduced a Cartesian coordinate system, in which

the law of force can be written in the form

Fi = m ẍi . (2.1)

We can see that Newton’s law is a differential equation of second order. Solving this

equation we find three coordinates xi as functions of time

xi = xi (t).

However, equation (2.1) holds only in Cartesian coordinates. Since we are interested in the motion of bodies in three-dimensional space E³ or two-dimensional space E², we can always introduce a Cartesian coordinate system, write down the equations of motion and, in principle, solve them. However, the Cartesian system is not always the most convenient choice and there can be other coordinate systems which are more appropriate. So, a natural question arises: what are the equations of motion in an arbitrary coordinate system?

To illustrate why we need non-Cartesian coordinates, let us consider the following example. A mathematical pendulum is a point mass m attached to a fixed point called the pivot via a rigid rod of length r, see figure 2.1. The Cartesian coordinates of the point mass


are (x, y). The pendulum is subject to the gravitational force F = mg, where g = (0, −g) is the gravitational acceleration. Thus, in order to find the equation of motion we have to find the Cartesian components of the force F and insert them into Newton's law (2.1).

There is a problem, however: the coordinates x and y are not independent. Since the rod is assumed to be rigid, it has a fixed length and, by the Pythagorean theorem, the coordinates x and y have to satisfy the equation

x² + y² = r², (2.2)

where r is the length of the rod. This is not a dynamical equation, because it is not a

differential equation which can be solved for given initial conditions. Rather it is an

algebraic equation which must be satisfied for any solution of equations of motion.

Equations of this kind are called constraints and we say that coordinates x and y

are constrained.

Fig. 2.1. Mathematical pendulum.

In other words, we have two equations of motion, one for each coordinate, but in

addition we have to satisfy the constraint (2.2). Instead of two equations we have


to solve three. The reason is that the Cartesian coordinates are not well adapted to the problem at all. If the system is described by two independent coordinates, we say that it has two degrees of freedom. But the constraint reduces the number of degrees of freedom to one! This is natural, because the pendulum can move only along the circle of radius r, and the circle is a one-dimensional object. Although we describe the position of the pendulum by two coordinates, it has only one degree of freedom.

Can we describe the motion of the pendulum in such a way that it will have manifestly only one degree of freedom? Definitely we can. The position of the pendulum is uniquely determined by the angle of deflection θ, see again figure 2.1. According to that figure, the Cartesian coordinates (x, y) are related to the angle θ by

x = r sin θ,

(2.3)

y = r cos θ.

This is similar to the polar coordinates introduced before; the exchange of sin and cos comes from a different definition of the angle θ. More important is that the quantity r is now a constant, not a variable. We can easily verify that the constraint (2.2) is satisfied for any value of θ:

x² + y² = r² sin²θ + r² cos²θ = r².

We can see that if we describe the pendulum by angle θ, we do not have to care

about the constraint anymore, for it is automatically satisfied. We thus have the

single variable θ which corresponds to the fact that the pendulum has only one

degree of freedom.

This is certainly progress! In Cartesian coordinates we had two equations of motion and one constraint. Now we have only one variable and no constraint. What remains is to find the equation of motion. From figure 2.1 it is obvious that the force F acting on the pendulum can be decomposed into two components Ft and Fn. The force Fn is the normal component parallel to the rod. It causes the tension of the rod, but since the rod is rigid, this has no effect on the motion of the pendulum. On the other hand, the component Ft tangent to the trajectory causes the acceleration. The magnitude of the tangent force is

Ft = F sin θ = m g sin θ.

The acceleration caused by it is therefore

at = Ft/m = g sin θ.


At the same time, the tangential acceleration is

at = r θ̈,

where θ̈ is the angular acceleration. The equation of motion of the mathematical pendulum is therefore

r θ̈ + g sin θ = 0,

or in a slightly modified form

θ̈ + (g/r) sin θ = 0. (2.4)
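Although equation (2.4) cannot be solved in terms of elementary functions, it is easy to integrate numerically. A minimal sketch in Mathematica, with the assumed values g/r = 9.81 s⁻² and an initial deflection of 0.5 rad:

sol = NDSolve[{θ''[t] + 9.81 Sin[θ[t]] == 0, θ[0] == 0.5, θ'[0] == 0},
              θ, {t, 0, 10}];
Plot[Evaluate[θ[t] /. sol], {t, 0, 10}]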

The point is that the Cartesian coordinates are not always the most convenient. We have seen that if we describe the pendulum by Cartesian coordinates we have to solve two equations of motion and one constraint, i.e. three equations. But the pendulum has only one degree of freedom and its description by two coordinates is redundant. This redundancy is the reason why we have to impose the constraint. The problem can be circumvented by an appropriate choice of coordinates. Choosing the angle θ as a single coordinate we have eliminated the constraint and we have found the single equation of motion. So we have one variable θ and one equation of motion. In this coordinate system the system has one degree of freedom manifestly and we do not have to impose the constraint.

The mathematical pendulum is a very simple system and we will analyze its properties later on. We will see that despite its simplicity it possesses several non-trivial properties and its equation of motion cannot even be solved in terms of elementary functions. In physics and in the modelling of realistic situations we often meet systems which are much more complicated. The double pendulum, for example, consists of two point masses: one is attached to the pivot, while the second point mass is attached to the first one. Analysis shows that the motion of the double pendulum is chaotic. But in the case of the double pendulum it is not clear how to find the equations of motion, and the procedure sketched above becomes more complicated. The Lagrange formalism to be introduced in this chapter provides a systematic way to derive the equations of motion in an arbitrary curvilinear coordinate system.

2.2 Lagrange equations of the second kind

We start with the derivation of the Lagrange equations of the second kind. Lagrange equations of the first kind also exist, but they contain the constraints explicitly; we will not study them in this text. Lagrange equations of the second kind eliminate the constraints by choosing an appropriate coordinate system. For simplicity we consider only one particle of mass m. The result can be easily generalized to more particles.


Our starting point is Newton's law of force in Cartesian coordinates,

Fi = m ẍi. (2.5)

2.2.1 Generalized coordinates

New coordinates will be denoted by q and labeled by indices a = 1, 2, . . . , n, where n is not necessarily equal to 3. For example, as we have seen, the pendulum is described by the single coordinate θ. The variables qa are called generalized coordinates. Cartesian coordinates are connected to generalized coordinates by relations of the form

xi = xi (q),

where the symbol q stands for the whole n-tuple (q1, . . . , qn). If this is too abstract for

the reader, equations (2.3) from the previous section can serve as an example of a

coordinate transformation. In the case of the pendulum, the Cartesian coordinates xi are x and y, and the only generalized coordinate is q1 = θ.

Moreover, we assume that previous relations can be inverted, i.e. we can express

generalized coordinates as functions of the Cartesian ones:

qa = qa (x).

Thus, the generalized coordinates are functions of the Cartesian coordinates and vice

versa. On the other hand, Cartesian coordinates depend on time (they are solutions

of (2.5)), so the generalized coordinates must depend on time, too:

qa (t) = qa (x(t)).

The total derivative of qa with respect to time can be obtained by the chain rule for

derivatives:

q̇a = (∂qa/∂xi) ẋi.

This relation immediately implies

∂q̇a/∂ẋi = ∂qa/∂xi. (2.6)

The total derivative of xi expressed in terms of generalized coordinates reads

ẋi = (∂xi/∂qa) q̇a. (2.7)

Notice that since qa depends on xi, also the quantity ∂qa/∂xi depends on xi. Similarly, xi depends on qa and therefore ∂xi/∂qa depends on qa as well.

We know that if xi is a Cartesian coordinate, then ẋi is the i-th component of the velocity, i.e. vi = ẋi. Analogously, the derivatives of qa with respect to time are called generalized velocities. In the Lagrangian formalism, coordinates and corresponding velocities are treated as independent variables. In other words,

∂ẋi/∂xj = ∂q̇a/∂qb = 0.

2.2.2 Kinetic energy

The kinetic energy expressed in the Cartesian coordinates is

T = ½ m v² = ½ m ẋi ẋi. (2.8)

Kinetic energy therefore depends on the Cartesian velocities, but it does not depend

on the Cartesian coordinates themselves:

∂T/∂xi = 0,

but

∂T/∂ẋi = m ẋi. (2.9)

Expression (2.8) for kinetic energy can be rewritten in terms of generalized coor-

dinates using (2.7):

T = ½ m (∂xi/∂qa) q̇a (∂xi/∂qb) q̇b = ½ m (∂xi/∂qa)(∂xi/∂qb) q̇a q̇b.

Kinetic energy depends on generalized velocities q̇a , but now it depends also on qa ,

because of partial derivatives (recall the remarks below equation (2.7)),

T = T(q, q̇),   ∂T/∂qa ≠ 0,   ∂T/∂q̇a ≠ 0.


2.2.3 Generalized forces

The last ingredient necessary for the derivation of the Lagrange equations is the notion of generalized forces. Generalized forces are the components of the force F in the curvilinear coordinate system. If Fi are the Cartesian components of the force, then the generalized forces are defined by

Qa = Fi (∂xi/∂qa). (2.10)
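To see definition (2.10) at work, the following sketch computes the generalized forces in polar coordinates for an assumed linear restoring force F = −k(x, y); the result Qr = −kr, Qθ = 0 reflects the fact that such a force is central:

rules = {x -> r Cos[θ], y -> r Sin[θ]};
f = {-k x, -k y} /. rules;                 (* assumed example force *)
jac = Outer[D, {x, y} /. rules, {r, θ}];   (* J_ia = ∂x_i/∂q_a *)
Simplify[f . jac]                          (* {-k r, 0} *)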

2.2.4 Derivation

Now we are prepared to derive the Lagrange equations of the second kind. Newton's law reads

Fi = m ẍi.

By (2.9), the right hand side is m ẍi = d/dt (m ẋi) = d/dt (∂T/∂ẋi), so that

Fi = d/dt (∂T/∂ẋi).

Multiply this equation by ∂xi /∂qa to obtain

Fi (∂xi/∂qa) = (∂xi/∂qa) d/dt (∂T/∂ẋi).

On the left hand side we can see the generalized forces Qa according to relation

(2.10):

Qa = (∂xi/∂qa) d/dt (∂T/∂ẋi). (2.11)

Now we are going to rearrange the right hand side in order to eliminate the Cartesian

coordinates xi .

Using the Leibniz rule1 , the right hand side can be rewritten as

Qa = d/dt [(∂xi/∂qa)(∂T/∂ẋi)] − (∂T/∂ẋi) d/dt (∂xi/∂qa). (2.12)

1

Leibniz rule is a product rule for differentiation. Derivative of the product f g is (f g)0 = f 0 g + f g 0 . We

use this rule in the form f g 0 = (f g)0 − f 0 g.


The first term on the right hand side is, using (2.6), equal to

\frac{d}{dt}\left(\frac{\partial x_i}{\partial q_a}\frac{\partial T}{\partial \dot{x}_i}\right) = \frac{d}{dt}\left(\frac{\partial \dot{x}_i}{\partial \dot{q}_a}\frac{\partial T}{\partial \dot{x}_i}\right) .

Recall that the kinetic energy depends on the Cartesian velocities ẋi but it does not depend on the coordinates. Then, by the chain rule, we have

\frac{\partial T}{\partial \dot{q}_a} = \frac{\partial T}{\partial \dot{x}_i}\frac{\partial \dot{x}_i}{\partial \dot{q}_a} + \underbrace{\frac{\partial T}{\partial x_i}}_{0}\frac{\partial x_i}{\partial \dot{q}_a} = \frac{\partial T}{\partial \dot{x}_i}\frac{\partial \dot{x}_i}{\partial \dot{q}_a} .

Hence, equation (2.12) becomes

Q_a = \frac{d}{dt}\frac{\partial T}{\partial \dot{q}_a} - \frac{\partial T}{\partial \dot{x}_i}\, \frac{d}{dt}\frac{\partial x_i}{\partial q_a} .    (2.13)

Now we want to eliminate the Cartesian coordinates from the second term of equation (2.13). Consider the following identity:

\frac{d}{dt}\frac{\partial x_i}{\partial q_a} = \frac{\partial}{\partial q_b}\frac{\partial x_i}{\partial q_a}\,\dot{q}_b + \frac{\partial}{\partial \dot{q}_b}\frac{\partial x_i}{\partial q_a}\,\ddot{q}_b .

The second term vanishes, because ∂xi/∂qa does not depend on the velocities. Interchanging the order of the partial derivatives in the first term, we find

\frac{d}{dt}\frac{\partial x_i}{\partial q_a} = \frac{\partial}{\partial q_a}\frac{\partial x_i}{\partial q_b}\,\dot{q}_b = \frac{\partial}{\partial q_a}\frac{d x_i}{dt} = \frac{\partial \dot{x}_i}{\partial q_a} .

Therefore, the second term of (2.13) is

\frac{\partial T}{\partial \dot{x}_i}\, \frac{d}{dt}\frac{\partial x_i}{\partial q_a} = \frac{\partial T}{\partial \dot{x}_i}\frac{\partial \dot{x}_i}{\partial q_a} = \frac{\partial T}{\partial q_a} .

Substituting this equality into (2.13), we arrive at the final form of the equations of motion:

\frac{d}{dt}\frac{\partial T}{\partial \dot{q}_a} - \frac{\partial T}{\partial q_a} = Q_a .    (2.14)

2.3 Lagrange equations

In the previous section we have derived, after some effort, the Lagrange equations of the second kind (2.14). These equations are completely equivalent to Newton's law of motion, but they are written in an arbitrary curvilinear coordinate system, while Newton's law has its simple form in the Cartesian coordinates only. If we want to derive the equations of motion for a particular system, we have to write down the expression for the kinetic energy T, transform it to the appropriate coordinate system, find the components of the generalized forces Qa, and insert them into equations (2.14).

Note that while it is easy to find the expression for T in generalized coordinates, because it is a scalar function, generalized forces involve the calculation of the sum

Q_a = F_i\, \frac{\partial x_i}{\partial q_a} .

There is, however, a special but very important case, when the forces Fi are conservative. We know from elementary physics that the gravitational force or the electrostatic force can be written as a gradient of a scalar function called the potential. By definition, the force with components Fi is called conservative if there exists a potential V such that

F_i = -\frac{\partial V}{\partial x_i} \equiv -\partial_i V,    (2.15)

where the minus sign is conventional. What are the components of the generalized forces in such a case? The calculation is straightforward:

Q_a = F_i\,\frac{\partial x_i}{\partial q_a} = -\frac{\partial V}{\partial x_i}\frac{\partial x_i}{\partial q_a} = -\frac{\partial V}{\partial q_a} .

Thus, for conservative forces, the components of the generalized forces are simply partial derivatives of the potential with respect to the generalized coordinates. Lagrange equations of the second kind then acquire the form

\frac{d}{dt}\frac{\partial T}{\partial \dot{q}_a} - \frac{\partial T}{\partial q_a} = -\frac{\partial V}{\partial q_a} .    (2.16)

An important point is that the potential cannot depend on velocities. It is a consequence of the fact that conservative forces do not depend on the motion of the bodies on which they act² – they depend only on the configuration of the system, i.e. on the positions of individual objects. In other words,

² For example, the gravitational force depends only on the distance of the two objects, but it does not depend on the velocity of the bodies. On the other hand, the electromagnetic force does depend on the velocity – the magnetic force is a cross product of the velocity and the magnetic field. Nevertheless, the concept of the Lagrangian is valid also for the electromagnetic force; this issue is explained later.


\frac{\partial V}{\partial \dot{q}_a} = 0.

Now, rewrite equation (2.16) as

\frac{d}{dt}\frac{\partial T}{\partial \dot{q}_a} - \frac{\partial (T-V)}{\partial q_a} = 0.

Since the potential does not depend on q̇a, we can also write

\frac{d}{dt}\frac{\partial (T-V)}{\partial \dot{q}_a} - \frac{\partial (T-V)}{\partial q_a} = 0,

because the term ∂V/∂q̇a which we added vanishes anyway. Obviously, it is useful to introduce a new scalar function called the Lagrangian by

L = T - V.    (2.17)

In terms of the Lagrangian, the equations of motion read

\frac{d}{dt}\frac{\partial L}{\partial \dot{q}_a} - \frac{\partial L}{\partial q_a} = 0.    (2.18)

Notice the terminology used: there exist Lagrange equations of the first kind, but we do not consider them in this text. In the previous section we derived the Lagrange equations of the second kind, which are equivalent to Newton's law of motion but written in a generalized coordinate system. Equations (2.18) are called simply Lagrange equations. They are not completely equivalent to Newton's law, because we assumed that the forces are conservative. Gravitational and electrostatic forces are typical conservative forces. By contrast, friction and general electromagnetic forces are non-conservative, i.e. there is no potential V from which they can be derived. If the system under consideration involves friction, we cannot find the Lagrangian of this system and we cannot use Lagrange equations, but we still can use the Lagrange equations of the second kind. It is interesting that although the electromagnetic force is not conservative, the Lagrangian exists, as we will see later. Friction is not a fundamental force, however: it is a result of a complicated interaction between the molecules forming the surfaces of bodies in contact. The electromagnetic force, on the other hand, is fundamental; it is one of the four basic forces in Nature. In fact, it is the most important force for us. Fortunately, it can be described in the Lagrange formalism, so that the Lagrange equations (2.18) are sufficient for the description of almost all physically relevant situations.

Lagrange equations (2.18) are also more convenient than the Lagrange equations of the second kind: the equations of motion are derived from the single function L and we do not have to calculate the generalized forces Qa. In the following sections we show a few examples of how the Lagrange formalism works; then we show how to implement the new formalism in Mathematica.

2.4 Particle in homogeneous gravitational field

We start with a very simple example: motion in a homogeneous gravitational field. The gravitational field is never exactly homogeneous, but near the surface of the Earth it is approximately constant: all bodies move with constant gravitational acceleration g. Its Cartesian components are

\mathbf{g} = (0, -g),

and it has magnitude g. On the other hand, the Cartesian components of the acceleration

are (ẍ, ÿ), so the equations of motion are

ẍ = 0, ÿ = −g. (2.19)

Let us see how the same result can be derived in the Lagrange formalism. Kinetic energy of the particle is

T = \frac{1}{2}\, m\, \dot{x}_i\dot{x}_i = \frac{1}{2}\, m\left(\dot{x}^2 + \dot{y}^2\right).

The gravitational force is

\mathbf{F} = m\,\mathbf{g},

so that

F_1 = 0, \qquad F_2 = -m g.

The potential V is defined by

F_1 = -\frac{\partial V}{\partial x}, \qquad F_2 = -\frac{\partial V}{\partial y}.

The first equation merely states that V does not depend on x, i.e.

V = V(y).

The second equation reads

-\frac{\partial V}{\partial y} = -m g,

which integrates to

V = \int m g \, dy = m g\, y + \text{const}.

The integration constant does not affect the equations of motion (why?), so we can set it to zero without loss of generality.

We have found the kinetic energy and the potential, so we can write down the Lagrangian, which is by definition

L = T - V = \frac{1}{2}\, m\left(\dot{x}^2 + \dot{y}^2\right) - m g\, y.    (2.20)

Notice that Lagrange equations (2.18) are written in an arbitrary coordinate system. Our motivation was to introduce curvilinear coordinates, but these equations hold in the Cartesian system as well. Now the generalized coordinates are simply

q_1 = x, \qquad q_2 = y,

and the Lagrange equations read

\frac{d}{dt}\frac{\partial L}{\partial \dot{x}_i} - \frac{\partial L}{\partial x_i} = 0.

For the Lagrangian (2.20) we have

\frac{\partial L}{\partial \dot{x}} = m\dot{x}, \qquad \frac{d}{dt}\frac{\partial L}{\partial \dot{x}} = m\ddot{x},

\frac{\partial L}{\partial \dot{y}} = m\dot{y}, \qquad \frac{d}{dt}\frac{\partial L}{\partial \dot{y}} = m\ddot{y},

\frac{\partial L}{\partial x} = 0, \qquad \frac{\partial L}{\partial y} = -m g.


Substituting these expressions into Lagrange equations (2.18), we arrive at the equations of motion:

\frac{d}{dt}\frac{\partial L}{\partial \dot{x}} - \frac{\partial L}{\partial x} = 0 \quad\rightarrow\quad m\ddot{x} = 0,

\frac{d}{dt}\frac{\partial L}{\partial \dot{y}} - \frac{\partial L}{\partial y} = 0 \quad\rightarrow\quad m\ddot{y} = -m g.

We can see that the Lagrange equations are the familiar equations of motion (2.19). Of course, for the motion in a homogeneous gravitational field we can find the equations of motion more easily than through the Lagrangian. But before we apply the formalism to more complicated problems, it is useful to see how it works in simple cases where we know the result even without Lagrange equations.
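As a quick consistency check (a sketch added here, not part of the original derivation), Mathematica can integrate these equations of motion symbolically:

DSolve[{x''[t] == 0, y''[t] == -g}, {x[t], y[t]}, t]

The output is the expected general solution: x[t] linear in t, and y[t] containing the term -g t^2/2, with four integration constants C[1], ..., C[4] fixed by initial conditions.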

2.5 Harmonic oscillator

The harmonic oscillator is one of the most important models in physics. In mechanics it corresponds to the motion of an idealized spring, see figure 2.2. The point mass m is connected to a fixed point via a massless spring. If the point mass is displaced from the equilibrium position, the spring exerts the force

F = −k q,

where q is the displacement. The minus sign is due to the fact that the force always acts in the direction opposite to the displacement. The constant k is called the rigidity of the spring. According to Newton's law of motion, the acceleration is given by

a = \frac{F}{m} .

Since the motion is one-dimensional, the only component of the previous equation is

\ddot{q} = -\frac{k}{m}\, q.

The constant k/m is usually denoted as

\omega^2 = \frac{k}{m},

so that the equation of motion is

\ddot{q} + \omega^2 q = 0.    (2.21)

This equation appears in physics very frequently, even when it is not connected with the motion of a spring, e.g. in oscillations of electric circuits or vibrations of atoms in a crystal lattice.

Let us find the Lagrangian for the harmonic oscillator. Kinetic energy is straightforward:

T = \frac{1}{2}\, m\, \dot{q}^2 .

The potential is defined by the relation

F = -\frac{\partial V}{\partial q}

and the integration yields

V = -\int F \, dq = \int k q \, dq = \frac{1}{2}\, k q^2 .

This expression is usually written in terms of the parameter ω:

V = \frac{1}{2}\, m \omega^2 q^2 .

Thus, the Lagrangian is

L = \frac{1}{2}\, m \dot{q}^2 - \frac{1}{2}\, m \omega^2 q^2 .    (2.22)

Lagrange equations are obtained in the usual way; we find

\frac{\partial L}{\partial \dot{q}} = m\dot{q}, \qquad \frac{d}{dt}\frac{\partial L}{\partial \dot{q}} = m\ddot{q}, \qquad \frac{\partial L}{\partial q} = -m\omega^2 q,    (2.23)

so that

\frac{d}{dt}\frac{\partial L}{\partial \dot{q}} - \frac{\partial L}{\partial q} = 0 \quad\rightarrow\quad \ddot{q} + \omega^2 q = 0.    (2.24)
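Again, a one-line symbolic check in Mathematica (added here as a sketch):

DSolve[q''[t] + \[Omega]^2 q[t] == 0, q[t], t]

which returns the familiar general solution q[t] -> C[1] Cos[\[Omega] t] + C[2] Sin[\[Omega] t].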

Fig. 2.2. Equilibrium position of the spring corresponds to q = 0. The restoring force F is proportional to the displacement, F = −kq.

2.6 Mathematical pendulum

So far we considered Lagrange equations in the Cartesian coordinates. Now we return to the example from the introduction to this chapter, the mathematical pendulum; recall figure 2.1. As we explained, it is more convenient to introduce polar coordinates via the relations

x = r\sin\theta, \qquad y = r\cos\theta.

Kinetic energy in the Cartesian coordinates is

T = \frac{1}{2}\, m\, \dot{x}_i\dot{x}_i = \frac{1}{2}\, m\left(\dot{x}^2 + \dot{y}^2\right).

Now we have to rewrite this expression in the polar coordinates r and θ. Since the rod

of the pendulum is supposed to be perfectly rigid, the coordinate r remains constant.

On the other hand, coordinate θ depends on time,

θ = θ(t).

Derivatives of the Cartesian coordinates x and y with respect to time are therefore given by

\dot{x} = r\,\dot{\theta}\cos\theta, \qquad \dot{y} = -r\,\dot{\theta}\sin\theta.    (2.25)


Substituting (2.25) into the kinetic energy, we obtain

T = \frac{1}{2}\, m\left(r^2\dot{\theta}^2\cos^2\theta + r^2\dot{\theta}^2\sin^2\theta\right) = \frac{1}{2}\, m r^2\dot{\theta}^2\left(\cos^2\theta + \sin^2\theta\right) = \frac{1}{2}\, m r^2\dot{\theta}^2 .    (2.26)

What about the potential V? In our elementary analysis from the beginning of the chapter, we decomposed the force F into tangent and normal components and realized that the normal component does not affect the motion, while the tangent component causes the angular acceleration. The decomposition of the force was easy in this case, but sometimes it can be very difficult and one has to find an appropriate way. In the Lagrange formalism, however, the procedure is straightforward (although it can be complicated).

Cartesian components of the force are

F_1 = 0, \qquad F_2 = m g.

Note that we do not include the minus sign, because the y−axis is oriented downwards (see figure 2.1). Now we can compute the generalized force according to relation (2.10). Since we have only one generalized coordinate θ, there is only one generalized force:

Q = F_i\,\frac{\partial x_i}{\partial \theta} = F_1\frac{\partial x}{\partial \theta} + F_2\frac{\partial y}{\partial \theta} = -m g r\sin\theta.

The potential is then defined by

Q = -\frac{\partial V}{\partial \theta},

which integrates to

V = -\int Q \, d\theta = -m g r\cos\theta.

The Lagrangian of the pendulum is therefore

L = \frac{1}{2}\, m r^2\dot{\theta}^2 + m g r\cos\theta.    (2.27)

Once we have the Lagrangian, the equations of motion follow immediately from Lagrange equations (2.18):

\frac{\partial L}{\partial \dot{\theta}} = m r^2\dot{\theta}, \qquad \frac{d}{dt}\frac{\partial L}{\partial \dot{\theta}} = m r^2\ddot{\theta}, \qquad \frac{\partial L}{\partial \theta} = -m g r\sin\theta,

so that

\frac{d}{dt}\frac{\partial L}{\partial \dot{\theta}} - \frac{\partial L}{\partial \theta} = 0 \quad\rightarrow\quad \ddot{\theta} + \frac{g}{r}\sin\theta = 0.    (2.28)

If we define

\omega_0^2 = \frac{g}{r},

the equation of motion acquires the form

\ddot{\theta} + \omega_0^2\sin\theta = 0.    (2.29)

2.7 Lagrange equations in Mathematica

We have seen how the Lagrange formalism can be applied to familiar problems to obtain the equations of motion, and we are ready to study a new problem where the equations of motion are unknown. We will illustrate the power of the formalism on the example of the double pendulum. Before we analyse the double pendulum, let us see how the Lagrange formalism can be implemented in Mathematica.

In this section we present one possible way to derive Lagrange equations using Mathematica. The algorithm to be explained takes the Lagrangian L and the lists of generalized coordinates and velocities, and differentiates the Lagrangian in order to obtain the Lagrange equations. As an example we study the motion in the homogeneous gravitational field investigated in section 2.4.

Generalized coordinates are now

q_1 = x, \qquad q_2 = y,

and the Lagrangian is

L = \frac{1}{2}\, m\left(\dot{x}^2 + \dot{y}^2\right) - m g\, y.

Let us explicitly denote which variables depend on time:

L = \frac{1}{2}\, m\left(\dot{x}(t)^2 + \dot{y}(t)^2\right) - m g\, y(t).

In order to find the Lagrange equations

\frac{d}{dt}\frac{\partial L}{\partial \dot{q}_a} - \frac{\partial L}{\partial q_a} = 0

we have to evaluate the partial derivatives

\frac{\partial L}{\partial \dot{x}(t)}, \qquad \frac{\partial L}{\partial \dot{y}(t)}, \qquad \frac{\partial L}{\partial x(t)}, \qquad \frac{\partial L}{\partial y(t)},

and then to calculate the total derivatives

\frac{d}{dt}\frac{\partial L}{\partial \dot{x}(t)}, \qquad \frac{d}{dt}\frac{\partial L}{\partial \dot{y}(t)}.

Derivative of the Lagrangian with respect to coordinate x can be found by the com-

mand

D[ L, x[t] ]

where L is the Lagrangian written in Mathematica. Similarly, derivative with respect

to velocity is simply

D[ L, x’[t] ].

An equivalent way to perform the last command is

D[ L, D[x[t], t] ].

Since we need to differentiate this expression with respect to time again, we can write

D[ L, D[x[t], t], t ].

Hence, the Lagrange equation for variable x can be written in the form

D[ L, D[x[t], t], t ] - D[L, x[t]] == 0.

An analogous command can be constructed for the second variable y.

In order to make our code universal, we realize that we have to perform the operation

D[ L, D[#, t], t] - D[L, #] == 0

for each generalized coordinate #, where # must be taken from the list of generalized

coordinates. Hence, suppose that we have a list of generalized coordinates called qs,

in our case

qs = { x[t], y[t] }

and the Lagrangian L. Then we can define the function (one possible definition consistent with the operation above):

In[5]:= LagrangeEqs[qs_, L_] := (D[L, D[#, t], t] - D[L, #] == 0) & /@ qs

Applying it to our example,

In[6]:= LagrangeEqs[{x[t], y[t]}, 1/2 m (x'[t]^2 + y'[t]^2) - m g y[t]]

Out[6]= {m x''[t] == 0, g m + m y''[t] == 0}

In the case of the mathematical pendulum we can write

q = {\[Theta]};
v = {p};
L = 1/2 m r^2 p^2 + m g r Cos[\[Theta]];

where we have denoted p = θ̇. Again, running the rest of our code yields the correct equation of motion

{g m r Sin[\[Theta][t]] + m r^2 \[Theta]''[t] == 0}.

We can see that the mass can be canceled and the equation of motion is exactly (2.28).

2.8 Solving the equations of motion of pendulum

This equation cannot be solved in a closed form, which means that we cannot write down its explicit solution. Fortunately, there exist numerical methods which allow us to find an approximate solution. In fact, Mathematica has built-in methods for constructing numerical solutions of many types of differential equations; they are all encapsulated in the NDSolve function. But a numerical solution cannot be obtained if the values of the constants are not specified. Moreover, to find a particular solution we have to provide also the initial conditions. Our task is now to solve equation (2.29) with appropriate initial conditions numerically.

\[Omega]0 = 1; T0 = 2 \[Pi] / \[Omega]0;

eqs = { \[Theta]’’[t] + \[Omega]0^2 Sin[\[Theta][t]] == 0,

\[Theta][0] == \[Pi]/4, \[Theta]’[0] == 0};

sol = NDSolve[ eqs, \[Theta][t], {t, 0, 2 T0}]


Here we first set the value of ω0 to 1 for simplicity. Moreover, we define the "period" T0 = 2π/ω0, because we know that for the harmonic oscillator such a relation holds. Next we define the list of three equations,

\ddot{\theta} + \omega_0^2\sin\theta = 0, \qquad \theta(0) = \frac{\pi}{4}, \qquad \dot{\theta}(0) = 0.

The first of them is the equation of motion; the other two represent the initial conditions. Equation θ(0) = π/4 means that the angle of deflection at time t = 0 is equal to π/4 (what position of the pendulum is this?). The velocity θ̇(0) has been set to zero. The solution is found by the function NDSolve, as claimed, where we specify

• the system of equations to solve – eqs;
• the unknown variable – θ[t];
• the interval of the independent variable – {t, 0, 2 T0}.

The result of NDSolve is something of the form

{{ \[Theta][t] -> InterpolatingFunction[....][t] }}

We can see that it is a replacement rule. According to this rule, any occurrence of θ[t] will be replaced by the interpolating function. When the function NDSolve constructs the solution, it finds only a finite number of values of the unknown function θ on the desired interval. Then, however, we want to evaluate the solution at an arbitrary time t, and it can happen that this time is different from any time used in the construction of the solution. For this reason, Mathematica has to "guess" the correct value of θ at that time. By "guessing" we mean the interpolation between the two closest times at which the value of θ is known.

For us, however, it is not important how this procedure works internally. What we need to know is that in order to evaluate the solution at an arbitrary time, say t = 1, we have to type

\[Theta][t] /. sol /. t -> 1

The symbol θ[t] has no meaning to Mathematica, but the rule sol will replace the symbol by the function which is the solution of the equation of motion. Then we can replace the argument t by its concrete numerical value using the next rule.

Finally, we can visualise the solution by the command Plot. The complete code for solving the equations of motion and plotting the solution follows; the resulting picture is in figure 2.3.


eqs = { \[Theta]’’[t] + \[Omega]0^2 Sin[\[Theta][t]] ==

0, \[Theta][0] == \[Pi]/4, \[Theta]’[0] == 0};

sol = NDSolve[ eqs, \[Theta][t], {t, 0, 2 T0}]

Plot[ \[Theta][t] /. sol, {t, 0, 2 T0}]

Fig. 2.3. Numerical solution of the equation of the mathematical pendulum for initial conditions θ(0) = π/4, θ̇(0) = 0 and value ω0 = 1.

2.9 Deriving the Lagrangian in Mathematica

The code we have developed in the previous sections is able to find the equations of motion from an arbitrary Lagrangian, provided that the lists of generalized coordinates and velocities are specified. In the case of the mathematical pendulum in section 2.6 we have seen that sometimes the Lagrangian must be transformed into an appropriate coordinate system. Even this procedure can be automated in Mathematica.

As an example we use the mathematical pendulum again. The Lagrangian in Cartesian coordinates reads

L = \frac{1}{2}\, m\left(\dot{x}^2 + \dot{y}^2\right) + m g\, y,

where the plus sign reflects the downward orientation of the y−axis (cf. section 2.6).

Recall that polar coordinates for the pendulum were introduced by

x = r\sin\theta, \qquad y = r\cos\theta.    (2.30)

To transform the Lagrangian, we type

x[t_] = r Sin[\[Theta][t]];
y[t_] = r Cos[\[Theta][t]];
L = 1/2 m ( x'[t]^2 + y'[t]^2) + m g y[t]

which yields

g m r Cos[\[Theta][t]] +

1/2 m (r^2 Cos[\[Theta][t]]^2 \[Theta]’[t]^2 +

r^2 Sin[\[Theta][t]]^2 \[Theta]’[t]^2)

This is correct but can be simplified:

x[t_] = r Sin[\[Theta][t]];
y[t_] = r Cos[\[Theta][t]];
L = Simplify[ 1/2 m ( x'[t]^2 + y'[t]^2)] + m g y[t]

Now the identity sin²θ + cos²θ = 1 is applied automatically and the result is

g m r Cos[\[Theta][t]] + 1/2 m r^2 \[Theta]'[t]^2.

The reader can check that this result is identical with the Lagrangian (2.27).
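With the LagrangeEqs definition from section 2.7 still loaded, the equation of motion now follows in a single line (a sketch using the variables above):

LagrangeEqs[{\[Theta][t]}, L]

which again returns {g m r Sin[\[Theta][t]] + m r^2 \[Theta]''[t] == 0}.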

2.10 Planet in gravitational field

In this section we study a new problem: the motion of a planet in the gravitational field of a star. First we formulate the problem in physical terms, then we present its solution using Mathematica.

Suppose we have a massive star, e.g. the Sun, which is at rest in a given reference frame. The Sun produces a gravitational field which attracts all bodies to its center. According to Newton's gravitational law, a body of arbitrary mass m moves with the acceleration

\mathbf{a} = -M G\, \frac{\mathbf{r}}{r^3},    (2.31)

where G is Newton's gravitational constant, M is the mass of the Sun and r is the position vector of the planet with respect to the Sun, see figure 2.4. We can choose units in such a way that

G M = 4\pi^2.

The position of the planet will be described by polar coordinates

x = r\cos\theta, \qquad y = r\sin\theta.

Fig. 2.4. Position of the planet with respect to the Sun. The planet has mass m, the Sun has mass M, and r is the position vector.

In these units, the potential corresponding to the acceleration (2.31) is

V(r) = -4\pi^2\, \frac{m}{r},    (2.32)

where m is the mass of the planet. The Lagrangian of the planet is then

L = \frac{1}{2}\, m\left(\dot{x}^2 + \dot{y}^2\right) - V.

Let us find the expression for this Lagrangian in the polar coordinates. The corresponding Mathematica code reads

x[t_] = r[t] Sin[\[Theta][t]];
y[t_] = r[t] Cos[\[Theta][t]];
L = Simplify[1/2 m ( x'[t]^2 + y'[t]^2)] - (4 \[Pi]^2 m)/r[t]

and yields

L = \frac{1}{2}\, m\left(\dot{r}^2 + r^2\dot{\theta}^2\right) - \frac{4\pi^2 m}{r}.
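The equations of motion then follow from the LagrangeEqs function of section 2.7; a sketch, assuming LagrangeEqs and L as defined above:

LagrangeEqs[{r[t], \[Theta][t]}, L]

This yields the radial equation m r''[t] - m r[t] \[Theta]'[t]^2 - (4 \[Pi]^2 m)/r[t]^2 == 0, together with the angular equation expressing the conservation of m r[t]^2 \[Theta]'[t].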

3 Hamilton's equations

3.1 Motivation

Let us recapitulate the advantages of Lagrange's equations compared to Newton's law of motion:

• Lagrange's equations hold in an arbitrary curvilinear coordinate system;
• the number of Lagrange's equations is equal to the number of degrees of freedom, while in the Cartesian system we always have three equations for each particle and possibly some additional constraints;
• the system is described by a single scalar function called the Lagrangian, which simplifies the transformation to a generalized coordinate system.

In the context of classical mechanics, Lagrange's equations are equivalent to Newton's laws, but Newton's laws turned out to be incorrect and had to be replaced by the theory of relativity and quantum mechanics. Nevertheless, the formalism of Lagrange's equations can be applied even in those theories.

A typical Lagrangian for one particle in Cartesian coordinates has the form

L = T - V = \frac{1}{2}\, m\, \dot{x}_i\dot{x}_i - V(x).    (3.1)

Let us examine the structure of Lagrange's equations

\frac{d}{dt}\frac{\partial L}{\partial \dot{x}_i} - \frac{\partial L}{\partial x_i} = 0

compared to Newton's law of force in the form

\frac{dp_i}{dt} = F_i .


Differentiating the Lagrangian (3.1) with respect to the velocity, we find

\frac{\partial L}{\partial \dot{x}_i} = m\,\dot{x}_i = p_i .

We can see that, in Cartesian coordinates, the derivative of the Lagrangian with respect to the velocity ẋi is equal to the ordinary momentum

p_i = m\,\dot{x}_i .

Lagrange's equations can therefore be written in the form

\frac{dp_i}{dt} = \frac{\partial L}{\partial x_i} .

But for the Lagrangian (3.1) we have

\frac{\partial L}{\partial x_i} = \frac{\partial}{\partial x_i}(T - V) = -\frac{\partial V}{\partial x_i} = F_i ,

since the kinetic energy T does not depend on xi and the potential is defined by the relation Fi = −∂V/∂xi. With this observation we immediately see that Lagrange's equations are equivalent to Newton's law, for they acquire the form

\frac{dp_i}{dt} = F_i .

Notice that all of this holds in the Cartesian coordinates only.

What about a general coordinate system qa? We have seen, see equation (2.27), that the Lagrangian of the pendulum in polar coordinates is

L = \frac{1}{2}\, m r^2\dot{\theta}^2 + m g r\cos\theta,

where the single generalized coordinate is θ and the corresponding generalized velocity is θ̇. Now

\frac{\partial L}{\partial \dot{\theta}} = m r^2\dot{\theta} .

This expression is not equal to the momentum of the pendulum; nevertheless, there is a connection. The velocity of the pendulum is

v = \omega r = r\dot{\theta},

where ω = θ̇ is the angular velocity, so the momentum of the pendulum is

p = m v = m r\dot{\theta} .

We can see that the quantities p and ∂L/∂θ̇ differ by a factor of r. But this is a consequence of the choice of the coordinates only! Although p and ∂L/∂θ̇ are different, they are obviously related.

Thus, in the two cases we presented, derivatives of the Lagrangian with respect to generalized velocities are related to the momentum of the particle. As we have seen, in the Cartesian system they coincide, but in curvilinear coordinates they do not. Nevertheless, it seems reasonable to define a notion of momenta derived from the Lagrangian.

Let L be an arbitrary Lagrangian depending on generalized coordinates qa and velocities q̇a, i.e. L = L(q, q̇). We define the generalized momentum pa conjugate to the coordinate qa by

p_a = \frac{\partial L}{\partial \dot{q}_a} .    (3.2)

Lagrange’s equations then acquire form

dpi ∂L

= .

dt ∂qa

If we know the actual position and momentum of the particle, we can calculate how the momentum varies in time. But how does the position change? We can find the answer only by solving Lagrange's equations to obtain the functions qa = qa(t) and then calculating q̇a. It would be better, however, if we could write equations of the form

\dot{q}_a = \text{something}, \qquad \dot{p}_a = \text{something else}.    (3.3)

Lagrange's equations give only the second part via

\dot{p}_a = \frac{\partial L}{\partial q_a} .    (3.4)

But the Lagrangian is a function of qa and q̇a, so we can, in principle, use the definition of the generalized momentum

p_a = \frac{\partial L}{\partial \dot{q}_a}    (3.5)

and invert it to obtain a relation of the form

\dot{q}_a = \dot{q}_a(q, p).

Summa summarum, Lagrange's equations are second order differential equations for the unknown functions qa. In the Lagrange formalism the independent variables are the coordinates qa and the velocities q̇a. We defined a generalized momentum pa by (3.2). Now we want to rewrite Lagrange's equations in such a way that the new equations will have the form (3.3). We have seen that using the momenta pa, Lagrange's equations have the form (3.4) and constitute only half of the equations we want to find. The difficulty essentially is that the Lagrangian itself is a function of qa and q̇a, but now we want the independent variables to be qa and pa. Hamilton's formalism provides a systematic way to obtain the desired equations. Before we present it, a remark on the Legendre transformation must be made.

3.2 Legendre transformation

In this section we formulate the problem in a more general way; in the following section we apply it to Lagrange's formalism. Suppose that a function f of variables (x1, . . . , xn) is given,

f = f(x_1, \dots, x_n) \equiv f(x).

Its total differential is

df = \frac{\partial f}{\partial x_1}\,dx_1 + \dots + \frac{\partial f}{\partial x_n}\,dx_n ,

or, using Einstein’s summation convention,

∂f

df = dxi .

∂xi

It is an important point that the converse is also true. If f is a function of some set of variables and its total differential is found to be

df = y_i\,dx_i ,

then

\frac{\partial f}{\partial x_i} = y_i

holds.

Now let us return to the expression

df = \frac{\partial f}{\partial x_i}\,dx_i

and denote

y_i = \frac{\partial f}{\partial x_i} ,

so that the differential df acquires the form

df = y_i\,dx_i .

Using the Leibniz rule, y_i dx_i = d(x_i y_i) − x_i dy_i, we can write

df = d(x_i y_i) - x_i\,dy_i .

This is equivalent to

d(x_i y_i) - df = x_i\,dy_i ,

or

d(x_i y_i - f) = x_i\,dy_i .

Note that on the left hand side we have the total differential of some function, which will be denoted by g = x_i y_i − f, so that

dg = x_i\,dy_i .

Function g is a function of yi, because its differential contains only the differentials dyi, which means that

g = g(y).

Moreover, the relation

\frac{\partial g}{\partial y_i} = x_i

holds.

Let us recapitulate the procedure. We started with a function f = f(x) depending on variables xi. Then we defined new variables yi by

y_i = \frac{\partial f}{\partial x_i} ,

which means that the differential df became

df = y_i\,dx_i .

Finally, we introduced the new function

g = x_i y_i - f

with differential

dg = x_i\,dy_i ,

which means that the new function depends on the new variables yi. Function g is called the Legendre transformation of function f. Thus, the Legendre transformation is a procedure for transforming a function f = f(x) into a new function g = g(y), where yi = ∂i f.
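A one-dimensional example (added for illustration): take f(x) = ½ a x². Then

y = \frac{\partial f}{\partial x} = a x, \qquad x = \frac{y}{a}, \qquad g = x y - f = \frac{y^2}{a} - \frac{y^2}{2a} = \frac{y^2}{2a},

and indeed ∂g/∂y = y/a = x. This is exactly the pattern by which the kinetic term ½mq̇² will turn into p²/(2m) in the Hamiltonian below.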

3.3 Hamilton's equations

Now we are in a position to derive Hamilton's equations. Suppose that our system is described by the Lagrangian

L = L(q, \dot{q}, t),

where qa are generalized coordinates and q̇a are generalized velocities. We also allow the Lagrangian to depend on time explicitly, i.e. ∂L/∂t ≠ 0. We introduce new variables called generalized momenta by

p_a = \frac{\partial L}{\partial \dot{q}_a} .

Thus, generalized momenta are partial derivatives of the function L with respect to one set of variables – the velocities. We want to find the Legendre transformation of the Lagrangian with respect to the velocities q̇a, i.e. to pass to the new variables pa.

The total differential of the Lagrangian reads

dL = \frac{\partial L}{\partial q_a}\,dq_a + \frac{\partial L}{\partial \dot{q}_a}\,d\dot{q}_a + \frac{\partial L}{\partial t}\,dt.

Using the definition of the generalized momentum and the Lagrange equations (3.4), we find

dL = \dot{p}_a\,dq_a + p_a\,d\dot{q}_a + \frac{\partial L}{\partial t}\,dt.

Rearrange the terms to get

p_a\,d\dot{q}_a - dL = -\dot{p}_a\,dq_a - \frac{\partial L}{\partial t}\,dt.    (3.6)

The left hand side can be rewritten as a total differential:

d\left(p_a\dot{q}_a - L\right) = \dot{q}_a\,dp_a - \dot{p}_a\,dq_a - \frac{\partial L}{\partial t}\,dt.    (3.7)

On the left hand side we have again the total differential of some function. This function is called the Hamiltonian and is defined by

H = p_a\dot{q}_a - L.    (3.8)

The Hamiltonian is a function of the coordinates qa and the momenta pa. This fact follows from equation (3.7), according to which the total differential of the Hamiltonian is

dH = \dot{q}_a\,dp_a - \dot{p}_a\,dq_a - \frac{\partial L}{\partial t}\,dt.

We know that the coefficients standing by the differentials on the right hand side are in fact the partial derivatives of the function on the left hand side, i.e. the partial derivatives of the Hamiltonian:

\frac{\partial H}{\partial p_a} = \dot{q}_a, \qquad \frac{\partial H}{\partial q_a} = -\dot{p}_a, \qquad \frac{\partial H}{\partial t} = -\frac{\partial L}{\partial t}.    (3.9)


Notice that the first two equations have exactly the form (3.3)! These are the new equations of motion, called Hamilton's equations:

\dot{q}_a = \frac{\partial H}{\partial p_a}, \qquad \dot{p}_a = -\frac{\partial H}{\partial q_a}.    (3.10)
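In the spirit of the LagrangeEqs function of section 2.7, equations (3.10) can also be generated automatically; a minimal sketch (the helper HamiltonEqs and its interface are our assumption, not from the text):

HamiltonEqs[H_, q_, p_] := {D[q, t] == D[H, p], D[p, t] == -D[H, q]}
HamiltonEqs[p[t]^2/(2 m) + 1/2 m \[Omega]^2 q[t]^2, q[t], p[t]]

This returns {q'[t] == p[t]/m, p'[t] == -m \[Omega]^2 q[t]}, i.e. Hamilton's equations for the harmonic oscillator treated below.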

Hamilton’s equation possess the advantages of Lagrange’s equations but there are

some differences. Let us compare them briefly.

• Both Lagrange and Hamilton equations hold in arbitrary curvilinear coordinate

system;

• equations of motion are derived from single scalar function, L or H;

• Hamilton’s equations are of the first order while the Lagrange equations are of

the second order;

• there is one Lagrange equation for each degree of freedom, so for the system with

n degrees of freedom we have n Lagrange’s equations; on the other hand, there

are two Hamilton’s equations for each degree of freedom, one for the coordinate

and one for the momentum – thus, there are 2n Hamilton’s equations.

Let us illustrate how Hamilton’s equations “work” on familiar examples.

At this stage, the reader should be very familiar with section 2.4, page 47. The Lagrangian of the particle in the homogeneous gravitational field is

L = \frac{1}{2}\, m\,\dot{x}_i\dot{x}_i - m g\, y = \frac{1}{2}\, m\left(\dot{x}^2 + \dot{y}^2\right) - m g\, y.

The generalized coordinates in this case are

q_1 = x, \qquad q_2 = y,

i.e. we use the ordinary Cartesian coordinates. The generalized momenta are, by definition (3.2),

p_1 = \frac{\partial L}{\partial \dot{x}} = m\dot{x}, \qquad p_2 = \frac{\partial L}{\partial \dot{y}} = m\dot{y}.    (3.11)


We can see that the generalized momenta are the ordinary momenta, the components of p = m v in Cartesian coordinates. The last relations can be inverted to find

\dot{x} = \frac{p_1}{m}, \qquad \dot{y} = \frac{p_2}{m}.    (3.12)

In order to find the Hamiltonian, we have to perform the Legendre transformation of the Lagrangian using relation (3.8):

H = p_a\dot{q}_a - L = p_1\dot{x} + p_2\dot{y} - \frac{1}{2}\, m\left(\dot{x}^2 + \dot{y}^2\right) + m g\, y.

This is not yet the correct expression, for we have to eliminate the velocities q̇a and express them as functions of the momenta pa:

H = p_1\frac{p_1}{m} + p_2\frac{p_2}{m} - \frac{1}{2}\, m\frac{p_1^2}{m^2} - \frac{1}{2}\, m\frac{p_2^2}{m^2} + m g\, y.

Collecting similar terms, we arrive at the Hamiltonian in the form

H = \frac{p_1^2 + p_2^2}{2m} + m g\, y.    (3.13)

Notice that this is, not accidentally, an expression for the total energy of the particle. Hamilton's equations then follow straightforwardly from (3.10):

\dot{x} = \frac{p_1}{m}, \qquad \dot{y} = \frac{p_2}{m}, \qquad \dot{p}_1 = 0, \qquad \dot{p}_2 = -m g.    (3.14)
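These first order equations can again be handed to Mathematica directly; a minimal sketch (the function names p1, p2 are ours):

DSolve[{x'[t] == p1[t]/m, y'[t] == p2[t]/m, p1'[t] == 0, p2'[t] == -m g},
  {x, y, p1, p2}, t]

The momenta come out as p1[t] constant and p2[t] linear in t, reproducing the solution of section 2.4.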

3.5 Conservation of energy

In mechanics we often analyze systems where the total energy is conserved. All examples we have seen until now belong to this class of systems. The fact that energy is conserved can be made explicit in the Hamiltonian framework.

First we show that the Hamiltonian H is in fact equal to the total energy E = T + V, where T is the kinetic energy and V is the potential. Recall that the kinetic energy of one particle is

T = \frac{1}{2}\, m\,\dot{x}_i\dot{x}_i = \frac{1}{2}\, m\,\frac{\partial x_i}{\partial q_a}\frac{\partial x_i}{\partial q_b}\,\dot{q}_a\dot{q}_b = \frac{1}{2}\, g_{ab}\,\dot{q}_a\dot{q}_b ,


where we defined

g_{ab} = m\,\frac{\partial x_i}{\partial q_a}\frac{\partial x_i}{\partial q_b} .

Direct differentiation gives

\frac{\partial T}{\partial \dot{q}_a} = g_{ab}\,\dot{q}_b .

Since the potential V does not depend on the generalized velocities, we have

H = \dot{q}_a p_a - L = \dot{q}_a\frac{\partial L}{\partial \dot{q}_a} - T + V = \dot{q}_a\frac{\partial T}{\partial \dot{q}_a} - T + V = g_{ab}\,\dot{q}_a\dot{q}_b - T + V,

and therefore

H = 2T - T + V = T + V = E.

There exists a more general proof of the last statement and we present it briefly. It relies on Euler's theorem about homogeneous functions. A function f of variables x = (x1, . . . , xn) is said to be homogeneous of degree N if the following holds:

f(\lambda x) = \lambda^N f(x).

Differentiating this relation with respect to λ yields

\frac{\partial f}{\partial(\lambda x_i)}\, x_i = N\,\lambda^{N-1} f(x).

Setting λ = 1 gives the Euler theorem:

x_i\,\frac{\partial f}{\partial x_i} = N\, f(x).    (3.15)

Let us apply this theorem to the Hamiltonian. Kinetic energy T is a function of the coordinates qa and the velocities q̇a, since

T(q, \dot{q}) = \frac{1}{2}\, g_{ab}(q)\,\dot{q}_a\dot{q}_b .


We can see that the kinetic energy is a homogeneous function of degree 2 in the velocities, for we have

T(q, \lambda\dot{q}) = \frac{1}{2}\, g_{ab}(q)\,(\lambda\dot{q}_a)(\lambda\dot{q}_b) = \lambda^2\, T.

Application of (3.15) immediately yields

\dot{q}_a\,\frac{\partial T}{\partial \dot{q}_a} = 2T,

so that the Hamiltonian is

H = \dot{q}_a p_a - L = \dot{q}_a\frac{\partial L}{\partial \dot{q}_a} - T + V = \dot{q}_a\frac{\partial T}{\partial \dot{q}_a} - T + V = T + V = E.

This is an alternative proof of the above statement that the Hamiltonian is equal to the total energy.

Since we now know that the Hamiltonian is equal to the energy, we can discuss the conservation of energy. Relation (3.9) shows that

\frac{\partial H}{\partial t} = -\frac{\partial L}{\partial t} .

Let us calculate the overall change of the energy per unit time:

\frac{dH}{dt} = \frac{\partial H}{\partial q_a}\,\dot{q}_a + \frac{\partial H}{\partial p_a}\,\dot{p}_a + \frac{\partial H}{\partial t} .

Using Hamilton's equations (3.9) we find

\frac{dH}{dt} = -\dot{p}_a\dot{q}_a + \dot{q}_a\dot{p}_a - \frac{\partial L}{\partial t} = -\frac{\partial L}{\partial t} .

Thus, we derived the relation

\frac{dH}{dt} = -\frac{\partial L}{\partial t} .
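As a concrete check (added here; it uses the oscillator Hamiltonian (3.16) derived in section 3.7), for H = p²/2m + ½mω²q² Hamilton's equations give

\frac{dH}{dt} = \frac{p\dot{p}}{m} + m\omega^2 q\,\dot{q} = \frac{p}{m}\left(-m\omega^2 q\right) + m\omega^2 q\,\frac{p}{m} = 0,

so the energy of the oscillator is conserved, as expected for a Lagrangian with no explicit time dependence.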

Energy is conserved if Ḣ = 0, and this is achieved when

\frac{\partial L}{\partial t} = 0.

The question is: can the Lagrangian depend on time explicitly?


3.6 Phase space

What is the interpretation of Hamilton's equations? They describe the evolution of the system in time. Let us explain this important point in some detail.

Generalized coordinates qa describe the position of all parts of the system. If we know the values

q = (q_1, \dots, q_n),

we know where all the particles are. We say that qa describe the configuration of the system. Generalized momenta describe the state of motion (recall that momenta are related to generalized velocities) of the individual particles. Together we have 2n quantities describing the actual state of the system, which can be encapsulated in the ordered 2n−tuple

(q, p) = (q_1, \dots, q_n, p_1, \dots, p_n).

Variables (q, p) define the state of the system. Hamilton's equations (3.10) then say how, for a given state, these quantities change in time. The set of all possible states of the system is called the phase space. In other words, each state can be identified with one point of the phase space. Let us illustrate the idea on the example of the harmonic oscillator.

3.7 Harmonic oscillator

The Lagrangian treatment of the harmonic oscillator can be found in section 2.5, page 49. The Lagrangian of the harmonic oscillator is

L = \frac{1}{2}\, m\dot{q}^2 - \frac{1}{2}\, m\omega^2 q^2 .

The generalized momentum is then

p = \frac{\partial L}{\partial \dot{q}} = m\dot{q}

and the corresponding Hamiltonian is

H = p\dot{q} - L = \frac{p^2}{m} - L = \frac{p^2}{2m} + \frac{1}{2}\, m\omega^2 q^2 .    (3.16)


Hamilton's equations read

\dot{q} = \frac{\partial H}{\partial p} = \frac{p}{m}, \qquad \dot{p} = -\frac{\partial H}{\partial q} = -m\omega^2 q.

Now we set m = ω = 1 in order to simplify the analysis, so that the equations of

motion become

q̇ = p, ṗ = − q. (3.17)

Let us interpret these equations in the spirit of section 3.6. We have two variables describing the state of the harmonic oscillator, the coordinate q and the momentum p. Hence, the phase space is a two-dimensional plane with coordinates q and p, see figure 3.1. In this figure, the actual state of the oscillator is depicted as a point with coordinates (q, p). The oscillator will then evolve in accordance with Hamilton's equations (3.17), which determine the derivatives of the coordinates. Thus, the oscillator will move in the phase plane in the direction of the velocity (q̇, ṗ), which is a vector tangent to the trajectory of the oscillator in the phase plane. This trajectory is called the phase trajectory. By Hamilton's equations (3.17) we have

(\dot{q}, \dot{p}) = (p, -q).

Since Hamilton’s equations are of the first order, the evolution of the system is

given uniquely by the initial position in the phase plane. If we draw a velocity at

each point of the phase plane, we get a reasonable idea about the behaviour of the

oscillator. Simple Mathematicacode which can be used to draw the velocity field of

harmonic oscillator follows:

VectorPlot[ {p, -q}, {q, -5, 5}, {p, -5, 5},

Frame -> True,

FrameLabel -> {"q", "p"},

BaseStyle -> {FontFamily -> "Times New Roman", FontSize -> 13}

]

The result is plotted in figure 3.2. This figure suggests that the phase trajectories of the harmonic oscillator are circles centered at the origin of the phase plane. In the case of the harmonic oscillator we can prove this analytically. Since the Hamiltonian represents the total energy, which was proved to be conserved, we can write

H = E,


Fig. 3.1. Geometrical interpretation of Hamilton's equations (3.17) in the phase space (q, p). The state of the oscillator is represented by the position q and momentum p, which can be regarded as coordinates in the phase space. The "velocity" is then the vector with coordinates (q̇, ṗ), where the derivatives are determined by Hamilton's equations.

where E is the total energy of the oscillator, while H is the Hamiltonian (3.16) with the simplification m = ω = 1 employed in this section for brevity:

H = \frac{p^2}{2} + \frac{q^2}{2} .

Now, the equation H = E can be rearranged slightly so that it acquires the form of the equation of a circle,

q^2 + p^2 = \left(\sqrt{2E}\right)^2,

where the radius of the circle is manifestly r = \sqrt{2E}. We can see that the phase trajectory is determined by the single parameter E, the total energy.
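Numerically integrating Hamilton's equations confirms this picture; a short sketch (the initial state q(0) = 2, p(0) = 0 is our illustrative choice):

sol = NDSolve[{q'[t] == p[t], p'[t] == -q[t], q[0] == 2, p[0] == 0},
   {q, p}, {t, 0, 2 Pi}];
ParametricPlot[Evaluate[{q[t], p[t]} /. sol], {t, 0, 2 Pi}]

The plot is a circle of radius 2, in agreement with r = √(2E) for E = 2.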

Fig. 3.2. Velocity field of the harmonic oscillator. At each point of the phase plane (q, p) we calculate the velocity (q̇, ṗ) using the Hamilton equations (3.17) and draw the vector representing the velocity.

4 Variational principle

We have seen that both Lagrange's equations and Hamilton's equations are essentially equivalent (at least when the forces involved have a potential) to Newton's law of motion. In this chapter we derive Lagrange's equations in a completely different way, using the variational principle. We will see that with this principle it is possible to derive the equations of motion from scratch, with a minimum of initial assumptions. This approach is much more powerful, because it works even outside the realm of classical mechanics. In fact, all laws of modern physics can be formulated in terms of a variational principle.

4.1 Fermat's principle

Before we formulate the variational principle, or Hamilton's principle, in classical mechanics, we start our discussion with a perhaps more familiar example from optics. It is well known that light in different media propagates at different speeds. If c denotes the speed of light in the vacuum, then the refractive index of a given medium is defined as

n = \frac{c}{v},

where v is the speed of light in that medium. For example, the refractive index of water is about n = 1.33, which means that light propagates 1.33 times slower in water than in the vacuum. The refractive index of the air is approximately n = 1, i.e. the speed of light in the air is practically the same as in the vacuum.

Now, when a light ray propagates from one medium to another, it changes its direction. Suppose that the light ray crosses the interface between two media with refractive indices n1 and n2, figure 4.1. It is customary to measure the angle of impact with respect to the line perpendicular to the plane of the interface. For example, the angle of impact in figure 4.1 is denoted by α. Similarly, the angle of refraction is β.

Can we calculate the angle of refraction provided that the angle of impact is given? Yes, we can. According to the Snell law, these angles must satisfy the equation

\frac{\sin\alpha}{\sin\beta} = \frac{n_2}{n_1}.    (4.1)

The Snell law (4.1) is a phenomenological law which was discovered before the theory of electromagnetism and the propagation of electromagnetic waves were found. We can say that the rôle of the Snell law is similar to that of Newton's law of motion. This analogy goes even further. Three basic laws of geometrical optics are

• In a homogeneous medium, light propagates along straight lines;
• When light propagates from one medium to another, the angle of impact and the angle of refraction are related by the Snell law (4.1); if the light is reflected on the interface, then the angle of impact is equal to the angle of reflection;
• If a light ray can propagate along some trajectory, then it can propagate also in the opposite direction along the same trajectory.

With a little imagination one observes a striking analogy between Newton's laws of motion and these three laws of optics. The first law tells us that if no changes of refractive index occur, the light propagates along a straight line, which can be regarded as an analogy to Newton's law of motion: if no forces act, the body moves uniformly along a straight line. The second law, on the other hand, tells us how the direction of propagation is influenced by changes of the refractive index; Newton's law tells us how the velocity is changed under an external force. Finally, the third law of optics ensures that if light can propagate along some trajectory, it can propagate in the reverse direction. In other words, if Alice can see Bob, then also Bob can see Alice. Newton's third law says that if body A exerts a force on body B, then also body B exerts a force of the same magnitude and opposite direction on body A.

However, what we want to emphasize is that both the Snell law and the Newton law of force are empirical laws which are justified by experiment. Is there any deeper law which could explain all three laws of optics? Can we replace the three laws of optics by a single law? Yes, we can, and it is called Fermat's law.

Let us see how we can arrive at the formulation of the Fermat principle by heuristic arguments. Suppose that the light ray is propagating in a homogeneous medium in which, by definition, the refractive index is constant. Then, by the first law of optics, the ray propagates along the straight line, which is the shortest curve connecting two given points. Hence, in a homogeneous medium, the light ray which travels

4.1 Fermat’s principle 79

Fig. 4.1. The light ray changes its direction on the interface between two media with refractive indices n1 and n2. In this figure we assume n1 < n2, which means that the light is faster in the first medium.

from point A to point B follows the shortest path from A to B. Does this statement hold in general? Certainly not, as is obvious from figure 4.1: the trajectory of a light ray passing from one medium to another is not a straight line, and so its trajectory is longer than the shortest possible one. But recall that the light propagates at different speeds in the two media. Maybe it is the time, rather than the length, that is minimal!

There is a beautiful argument by Richard Feynman. Suppose that you are standing at the coast and there is a nice girl drowning in the sea. Of course, you want to save her (this statement does not depend on whether the reader is a girl or a boy). You must reach the girl in the shortest possible time, not by the shortest distance! It is not the same, because you run faster than you swim. There are two extreme trajectories along which you can travel, figure 4.2.

along which you can travel, figure 4.2.

If you run straightly to that girl, trajectory i), you have to swim a long distance

which takes a longer time. If you choose trajectory ii), you spend the shortest possible

time in the water, but you have to run longer while your enter the water. It is obvious

that we have to find a point where to enter the water, so that the overall time which

you need to reach the girl is the shortest. This qualitative analysis shows that the

trajectory must be something like it is shown in figure 4.1 which indeed suggests that

the light ray is following the trajectory with minimal time.

Thus, we have arrived at the conjecture that the light ray propagates from point

A to point B along trajectory which takes the minimal time. This is the Fermat

principle. Let us formulate it in mathematical terms. Suppose that the light ray

starts in the point A and ends in the point B as in figure 4.1. Time which the ray

needs to travel along distance dr is


Fig. 4.2. Two extreme trajectories which can be used to save the drowning girl. Trajectory i) is the most natural one, but the time you spend in the water is too long. Trajectory ii) is better, but now you spend too much time on the coast.

dt = \frac{dr}{v} = \frac{1}{c}\, n\, dr.

The speed of light c is irrelevant here, because whenever ∫ n dr is minimal, so is ∫ dt. Hence, we define the optical path length by

ds = n\, dr.

The total optical path length between points A and B is

S = \int_A^B ds = \int_A^B n\, dr.    (4.2)

This notation is slightly awkward, because the value of the integral depends not only on the points A and B but on the whole trajectory. In figure 4.3 we depict two different trajectories γ and γ′ connecting points A and B. Obviously, the optical path length S is different for the two trajectories, and thus instead of writing the integration bounds A and B we write the trajectory explicitly, e.g.

S[\gamma] = \int_\gamma n\, dr.

Here we explicitly emphasize that the integral is taken along the trajectory γ. Notice that S depends on the entire trajectory γ. In other words, S can be regarded as a mapping which assigns a number, the optical path length, to each trajectory γ,

S : \gamma \mapsto \mathbb{R}.

In general, any mapping from an arbitrary set into the real numbers is called a functional.

Fig. 4.3. There are many trajectories connecting points A and B, and the optical path length S is different for each of them.

What is the law of propagation for the light ray? The Fermat principle states that the light propagates along the trajectory for which the optical path length S[γ] is minimal. All three optical laws formulated at the beginning of this section can be recovered from this simple statement. We do not show how it can be done in general, but we show how the Snell law (4.1) can be derived from the Fermat principle.

The situation is sketched in figure 4.4. Suppose again that the light ray starts at point A in the medium with refractive index n1, crosses the interface between the two media and finally ends at point B in the medium with refractive index n2. Let a be the distance of point A from the interface, let b be the distance of point B from the interface, and let x be the coordinate of the place where the ray crosses the interface. If points A and B are held fixed, it is the coordinate x which is unknown: we want to find the place where the ray must cross the interface in order that the optical path length be minimal. The complement of the distance x will be denoted by y. Notice that, for fixed points A and B, the sum of x and y is constant, say l (it is the horizontal distance between points A and B):

x + y = l,

which implies

\frac{\partial y}{\partial x} = -1.    (4.3)


Fig. 4.4. Derivation of the Snell law using the Fermat principle.

The distance from point A to the crossing point and the corresponding optical path length are

r_1 = \sqrt{a^2 + x^2}, \qquad s_1 = n_1 r_1 = n_1\sqrt{a^2 + x^2}.

Similarly, the distance from the crossing point to point B and the corresponding optical path length are

r_2 = \sqrt{b^2 + y^2}, \qquad s_2 = n_2 r_2 = n_2\sqrt{b^2 + y^2}.

The total optical path length is therefore

S = n_1\sqrt{a^2 + x^2} + n_2\sqrt{b^2 + y^2}.    (4.4)

We want to find such x that the length S is minimal. This is an easy task of elementary calculus: we differentiate S with respect to x and set the derivative equal to zero. Assuming that n1 and n2 are constants and using (4.3), we find

\frac{\partial S}{\partial x} = n_1\frac{x}{\sqrt{a^2 + x^2}} - n_2\frac{y}{\sqrt{b^2 + y^2}} = 0.    (4.5)

From figure 4.4 we can see that x and y are related to the angles α and β by

\sin\alpha = \frac{x}{r_1} = \frac{x}{\sqrt{a^2 + x^2}}, \qquad \sin\beta = \frac{y}{r_2} = \frac{y}{\sqrt{b^2 + y^2}},


so that equation (4.5) becomes

\frac{\sin\alpha}{\sin\beta} = \frac{n_2}{n_1},

which is the Snell law. This completes the proof.
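The same minimization can be carried out numerically; a small sketch with illustrative values (n1 = 1, n2 = 1.5, a = b = 1, l = 2 are our choices, not from the text):

NMinimize[{Sqrt[1 + x^2] + 1.5 Sqrt[1 + (2 - x)^2], 0 <= x <= 2}, x]

The minimizing x satisfies sin α / sin β = n2/n1 to numerical precision, in agreement with (4.1).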

Let us recapitulate and conclude this section. First we formulated three basic laws of geometrical optics and emphasized the analogy between these laws and Newton's laws of motion. By some argumentation we arrived at the conjecture that the laws of optics can be replaced by a single law called Fermat's principle: the light ray propagates in such a way that the optical path length is minimal. From this simple law we were able to derive the Snell law of refraction of light.

Although this textbook is not concerned with optics, our aim was to illustrate the idea of a variational principle on a familiar example. The reason for this digression is the fundamental importance of variational principles in theoretical physics. It is possible to show that all phenomena which occur in geometrical optics can be explained on the basis of the Fermat principle. Since the laws of optics are analogous, at least mathematically, to the Newton laws, we can hope that it is possible to formulate Newton's laws in the framework of the variational principle. This will be done in the rest of this chapter.

4.2 Formulation of variational problem

Before we apply the variational principle to Newtonian mechanics, we formulate the problem in precise mathematical terms and solve it. Let us return to integral (4.2), which is only a formal expression of the Fermat principle. When we derived the Snell law from the Fermat principle, we assumed that the refractive index was constant in the first medium and constant but different in the second medium. This allowed us to split the integral into the sum of two terms (4.4). In general, however, the refractive index will be a function of the coordinates. Indeed, the first law of optics tells us that if the refractive index is constant everywhere, the light rays propagate along straight lines. Hence, as the first step, we must admit that n is a function of the spatial coordinates:

S[\gamma] = \int_\gamma n(x, y)\, dr.

Here we assume that the ray propagates in the xy-plane, so that nothing depends on the z−coordinate. The line element dr must then be expressed in terms of the Cartesian coordinates as well; using the Pythagorean theorem we have

dr = \sqrt{dx^2 + dy^2}.

Parametrizing the trajectory by a parameter t, so that

dx = \dot{x}\, dt, \qquad dy = \dot{y}\, dt,

the optical path length acquires its final form

S[\gamma] = \int_\gamma n(x, y)\sqrt{\dot{x}^2 + \dot{y}^2}\; dt.    (4.6)

We can see that the optical path length S[γ] is a functional of the form

S[\gamma] = \int_\gamma L(x(t), \dot{x}(t))\; dt,

where L is a function of the coordinates and their derivatives. Our task is to find the trajectory γ for which the value S[γ] is minimal. It is a task similar to finding the minimum of a function, familiar from elementary calculus. Such a problem is solved by taking the derivative with respect to the variable and setting it to zero. The difference in our case is that γ is not a single variable but the entire trajectory, and it is not obvious how to differentiate S with respect to γ. This concept is known as the functional derivative or the variation, and it can be defined in a very general context. Here we define it in a more pedestrian way, sufficient for our purposes.

4.3 Variation of the functional

In the previous section we formulated the basic problem of variational calculus in the Cartesian coordinates. We have seen in previous chapters that it is often useful to introduce generalized coordinates qa instead of the Cartesian coordinates xi. Hence, we replace x in integral (4.6) by q:

S[q] = \int_\gamma L(q(t), \dot{q}(t))\; dt,    (4.7)

where q(t) is the coordinate expression of the trajectory. How can we differentiate S with respect to γ in order to find the γ for which S[γ] is minimal?


Suppose that qa is the trajectory which is the solution to our problem, i.e. suppose that for qa the functional S[q] acquires its minimal value. Let this trajectory pass through point A at t = t1 and point B at t = t2, see figure 4.5. Since qa is the minimal trajectory, any trajectory q′a different from qa must yield a bigger value of S. Notice that we can choose an arbitrary trajectory q′a, but it must satisfy the boundary conditions

q'_a(t_1) = q_a(t_1), \qquad q'_a(t_2) = q_a(t_2),    (4.8)

because points A and B are held fixed. Let us write the trajectory q′a in the form

q'_a(t) = q_a(t) + \varepsilon\,\eta_a(t),    (4.9)

where ε is an arbitrary constant parameter and ηa(t) is an arbitrary function of time, subject to the boundary conditions

\eta_a(t_1) = \eta_a(t_2) = 0    (4.10)

in order to satisfy conditions (4.8).

Since the function ηa is the difference between the trajectories qa and q′a, it is called a variation, and in physical textbooks it is often denoted by δqa = εηa. The symbol δ has formally the same properties as the total differential d. Notice that (4.9) implies

\dot{q}'_a = \dot{q}_a + \varepsilon\,\dot{\eta}_a,

so we can write δq̇a = εη̇a. In other words, the variation δ commutes with differentiation with respect to the parameter t.

As we emphasized repeatedly, the functional S depends on the trajectory; for qa it acquires its minimal value, while for q′a ≠ qa we have

S[q'] = S[q + \varepsilon\eta].

Notice that now we have parametrized the family of trajectories q′a by the single parameter ε. Since we want to find the minimum of S[q′] (which is S[q]), we need to differentiate S[q′] somehow. While we do not know how to differentiate S with respect to the entire trajectory, differentiation with respect to ε is a well-defined operation. Hence, we define the variation or functional derivative of S by

\delta S = \left.\frac{d}{d\varepsilon}\right|_{\varepsilon=0} S[q + \varepsilon\eta].    (4.11)

dε ε=0

The notation d/dε|ε=0 means that first we differentiate the function with respect to ε and then set ε = 0. The reason why we substitute zero for ε will become clear soon. Now, the correct qa is a solution of the equation

\delta S = 0.


Fig. 4.5. Two trajectories qa and q′a starting at point A and ending at point B. Only for the trajectory qa is the functional S minimized.

4.4 Euler-Lagrange equations

Having defined the variation (4.11), we can now easily solve our variational problem. Let us state it again. We want to find such a trajectory qa(t) that the integral (called the action)

S[q] = \int_{t_1}^{t_2} L(q(t), \dot{q}(t))\; dt    (4.12)

is minimal. To this end we vary the trajectory,

q_a \mapsto q_a + \varepsilon\,\eta_a,

and solve the equation δS = 0, where the variation δ is defined by (4.11). Since we suppose that the initial and final points A and B are fixed, we are interested only in trajectories for which

\eta_a(t_1) = \eta_a(t_2) = 0.


The variation of the action (4.12) is

\delta S = \left.\frac{d}{d\varepsilon}\right|_{\varepsilon=0} S[q + \varepsilon\eta] = \left.\frac{d}{d\varepsilon}\right|_{\varepsilon=0} \int_{t_1}^{t_2} L(q(t) + \varepsilon\eta(t),\; \dot{q} + \varepsilon\dot{\eta})\; dt = \int_{t_1}^{t_2}\left(\frac{\partial L}{\partial q_a}\,\eta_a + \frac{\partial L}{\partial \dot{q}_a}\,\dot{\eta}_a\right) dt.    (4.13)

Note that the function L in the first line is evaluated on the varied trajectory q′a = qa + εηa. Then we differentiate L using the chain rule, with respect to its first and then with respect to its second argument. After the differentiation we put ε = 0, so that the function L is evaluated on the original trajectory qa. Hence, after the differentiation we no longer have the varied trajectory, only the original one.

The next step is to remove the derivative of the variation ηa with respect to the parameter t. Using integration by parts we find

\int_{t_1}^{t_2} \frac{\partial L}{\partial \dot{q}_a}\,\frac{d\eta_a}{dt}\; dt = \left[\frac{\partial L}{\partial \dot{q}_a}\,\eta_a\right]_{t_1}^{t_2} - \int_{t_1}^{t_2} \frac{d}{dt}\frac{\partial L}{\partial \dot{q}_a}\,\eta_a\; dt.    (4.14)

Now we impose the boundary conditions (4.10), that ηa must vanish at the boundary points A and B, which implies that the "boundary" term in the square brackets is equal to zero! Thus, after integration by parts, the variation of the action becomes

\delta S = \int_{t_1}^{t_2} \left[\frac{\partial L}{\partial q_a} - \frac{d}{dt}\frac{\partial L}{\partial \dot{q}_a}\right] \eta_a\; dt.    (4.15)

Our variational principle tells us that this variation must be equal to zero. Recall that during the variation we kept the boundary points A and B fixed. However, equation (4.15) must hold for arbitrary points A and B, because we did not say anything specific about these points. We can choose these points arbitrarily, then find δS, and this variation δS must vanish. Moreover, the variation ηa was chosen to be arbitrary as well. Then, δS can vanish for all ηa and for all points A and B only if the expression in the square brackets is zero everywhere. In other words, the variational principle implies that the following equations must hold:

\frac{\partial L}{\partial q_a} - \frac{d}{dt}\frac{\partial L}{\partial \dot{q}_a} = 0.    (4.16)

These equations are known as the Euler-Lagrange equations of variational calculus.
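Mathematica can produce these equations directly from a given L via the standard VariationalMethods add-on package; a quick sketch for the oscillator Lagrangian (2.22):

Needs["VariationalMethods`"]
EulerEquations[1/2 m q'[t]^2 - 1/2 m \[Omega]^2 q[t]^2, q[t], t]

which returns the equation of motion equivalent to (2.24).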


We can see that the Euler-Lagrange equations are nothing else than Lagrange's equations (2.18), if we identify the Lagrangian with the function L above. This is a surprising result: an actual physical system evolves in time in such a way as to minimize the action (4.12)!

From the other point of view, recall that Lagrange's equations (2.18) have been derived as an equivalent formulation of Newton's laws of motion in an arbitrary coordinate system. Thus, at the beginning, we had the Newton law, which is a physical law. In this chapter, on the other hand, we have not assumed anything about the physics: we merely formulated the rule, the variational principle, that the action must be minimal. Then we performed some calculations and showed that this principle is equivalent to the Euler-Lagrange equations (4.16). Hence, we have derived the same form of the law of motion without using any physics.

Of course, this strong statement is somewhat weakened if we realize that the variational principle does not tell us what the form of the function L is. In order to guess the form of L we have to impose some physical restrictions. First, consider a free particle, i.e. a particle moving in free space where no forces are present. If we describe the particle in Cartesian coordinates, the Lagrangian L can depend on the coordinates xi and the velocities ẋi. However, all points of the space are equivalent and none is preferred. If there are no forces, the particle must behave in the same way independently of its position. Hence, the Lagrangian cannot depend directly on the coordinates; it can depend only on the velocities. This is a consequence of the homogeneity of space.

The next restriction comes from the isotropy of space. While homogeneity implies that all points are equivalent, isotropy implies that, for a given point, all directions in the space are equivalent. We can rotate the system containing our particle under analysis and the particle will behave in the same way. Thus, the Lagrangian cannot depend on the direction of the velocity vi = ẋi and can depend only on its magnitude, v² = ẋi ẋi.

Thus, we have determined the Lagrangian of the free particle up to a multiplicative constant and we can write it in the form

$$L = \alpha\, v^2, \qquad (4.17)$$

where α is a constant characteristic of the particle whose value depends on the convention we use. We can argue that our Lagrangian is proportional to the kinetic energy and that it is therefore plausible to set α = m/2, but this is not necessary. We emphasize that it is more or less only a convention that we write the constant α in this form; the reason is that Newton's law was discovered first and the variational principle only later. From now on we assume α = m/2 and denote the kinetic energy by


$$T = \alpha\, v^2 = \frac{1}{2}\, m\, v^2$$

and investigate what happens in the presence of forces.

We can see the heuristic power of the variational principle: the equations of motion are provided by the Euler-Lagrange equations, which always have the same form regardless of the system we describe and independently of the coordinates used. In order to find the equations of motion we only have to specify the Lagrangian. Usually there are not many possibilities for what the Lagrangian can look like: we have seen in the case of the free particle that essentially the only admissible form of the Lagrangian is (4.17). The reason is that the Lagrangian is a scalar, so we must construct a scalar quantity from the quantities describing our system, like velocities and coordinates, and usually there are only a few such possibilities.

The situation is similar even in the presence of forces. If the force is potential, and thus described by a single scalar V such that F_i = −∂_i V, it is natural to set

$$L = \alpha\, v^2 - V,$$

where the minus sign is customary again and is related to the fact that the force is minus the gradient of the potential. This choice is convenient but by no means necessary.

Electromagnetic forces, on the other hand, are not potential. Thus, the construction of the Lagrangian as in chapter 2 is impossible: we can define the generalized forces Q_a, but they are not a gradient of any scalar. In fact, the electromagnetic field is described by one scalar potential φ and one vector potential A_i. These potentials in general depend on time and position. At this point it is not important what the vector potential is; we just want to illustrate that even in this case the Lagrangian can be constructed.

Indeed, a particle moving in the electromagnetic field is described again by the position x_i and by the velocity v_i, while the field itself is described by the potentials φ and A_i. Can we combine these quantities to form a scalar Lagrangian? Yes, and the construction is fairly unique. Since φ is itself a scalar, we can simply add it to the Lagrangian of the free particle (or, more precisely, subtract it from the Lagrangian), so that the first part of the Lagrangian will be

$$L = T - \beta\,\phi.$$

Here, β is again a constant to be specified later. Now we can form three scalar functions from the quantities x_i, v_i and A_i:

$$x_i v_i, \qquad x_i A_i, \qquad A_i v_i.$$

The first combination does not contain field quantities and we can exclude it immediately, for it cannot describe the interaction of the particle with the field. The second combination looks better, but recall that the space itself is homogeneous. This homogeneity is broken by the presence of the electromagnetic field, but the Lagrangian still should not depend on the coordinates directly, only through the potentials φ and A_i. Hence, the only plausible combination is A_i v_i and we can write

$$L = T - \beta\,\phi + \gamma\,\boldsymbol{A}\cdot\boldsymbol{v}.$$

Now, the constants β and γ obviously determine the strength of the interaction between the field and the particle. We know from experience that the electromagnetic force is proportional to the charge e of the particle, and thus we can write the Lagrangian in the final form

$$L = T - e\,\phi + e\,\boldsymbol{A}\cdot\boldsymbol{v}. \qquad (4.18)$$

We can see that our construction is not "bullet-proof", but it is very natural and, moreover, it yields the correct equations of motion. This heuristic approach is even more powerful in relativistic theories, where the action must be a scalar¹ with respect to the so-called Lorentz group, which is a strong restriction. Notice that in classical mechanics we know what the correct equations of motion are: Lagrange's equations must reduce to Newton's law. However, when we are developing a new theory, we do not know what the correct equations are. In such a position we usually assume that the variational principle is correct and guess the form of the action or the Lagrangian. In this way people constructed the modern quantum field theories of the electromagnetic, weak and strong interactions. Hence, the variational principle is a much more fundamental principle than our discussion may suggest.

¹ In classical mechanics it does not matter whether we construct the action or directly the Lagrangian, because they differ only by an integration over time. In relativistic theories, time is not invariant and transforms as a component of a (four-)vector. Hence, it is the action which must be a scalar, not the Lagrangian.

Using the variational formulation it is easy to see that the Lagrangian is not unique, i.e. there are many different Lagrangians which yield the same equations of motion. To see this, consider an arbitrary function F = F(t) of time and define

$$f(t) = \frac{dF}{dt}.$$

Let us modify the action by adding a new term to it:

$$S' = S + \int_{t_1}^{t_2} f(t)\, dt.$$

Since f is the total derivative of F, the new term integrates immediately:

$$S' = S + \int_{t_1}^{t_2} \frac{dF}{dt}\, dt = S + \left[F(t)\right]_{t_1}^{t_2} = S + F(t_2) - F(t_1).$$

Thus, the new action S′ differs from S only by boundary terms – the values of F at the boundaries of the trajectory. These are fixed under the variation and so we have

$$\delta S = \delta S'.$$

That means that the variational principle δS = 0 gives the same equations of motion as the principle δS′ = 0. By the definition of the action we have

$$S' = \int_{t_1}^{t_2} \left(L + f(t)\right) dt \equiv \int_{t_1}^{t_2} L'\, dt,$$

where

$$L' = L + f(t) = L + \frac{dF}{dt}. \qquad (4.19)$$

In other words, if we change the Lagrangian by adding a function f which is a total time derivative of some other function F, we do not change the equations of motion. Hence, the Lagrangian is not unique. This is an important observation which will be exploited in connection with canonical transformations in chapter 5.
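This non-uniqueness is easy to verify mechanically. A minimal sketch in Mathematica (the function F = x(t)² g(t) below is an arbitrary illustrative choice; the argument above used F = F(t), but the same cancellation occurs for F depending on the coordinates):

Needs["VariationalMethods`"]
lag1 = 1/2 m x'[t]^2 - V[x[t]];
lag2 = lag1 + D[x[t]^2 g[t], t];   (* add a total time derivative dF/dt *)
eq1 = EulerEquations[lag1, x[t], t];
eq2 = EulerEquations[lag2, x[t], t];
Simplify[(eq1 /. Equal -> Subtract) - (eq2 /. Equal -> Subtract)]   (* -> 0 *)

Both Lagrangians produce identical equations of motion, exactly as (4.19) predicts.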

We have shown that the variational principle reproduces Lagrange's equations. Can we reproduce Hamilton's equations as well? Let us start with the action (4.7) and express the Lagrangian in terms of the Hamiltonian using the Legendre transform (3.8):

$$S = \int_{t_1}^{t_2} \left(p_a\,\dot q_a - H\right) dt. \qquad (4.20)$$

Recall that the Hamiltonian is a function of coordinates and momenta, H = H(q, p). Let us vary the action, remembering that the δ-symbol behaves like a differential:

$$\delta S = \int_{t_1}^{t_2} \left(p_a\,\delta\dot q_a + \dot q_a\,\delta p_a - \frac{\partial H}{\partial q_a}\,\delta q_a - \frac{\partial H}{\partial p_a}\,\delta p_a\right) dt.$$

Now we have three variations, δq̇_a, δp_a and δq_a. However, they are not all independent, because δq̇_a is the time derivative of the variation δq_a. We can get rid of this term by integrating by parts,

$$\int_{t_1}^{t_2} p_a\,\delta\dot q_a\, dt = \left[p_a\,\delta q_a\right]_{t_1}^{t_2} - \int_{t_1}^{t_2} \dot p_a\,\delta q_a\, dt,$$

where the first term on the right-hand side vanishes by the boundary conditions δq_a = 0 for t = t1 and t = t2. Then the variation of the action becomes

$$\delta S = \int_{t_1}^{t_2} \left(-\dot p_a\,\delta q_a + \dot q_a\,\delta p_a - \frac{\partial H}{\partial q_a}\,\delta q_a - \frac{\partial H}{\partial p_a}\,\delta p_a\right) dt.$$

The variation will be zero for an arbitrary choice of t1 and t2 only if the integrand vanishes. Comparing the coefficients standing by the independent variations δq_a and δp_a, we recover Hamilton's equations

$$\dot q_a = \frac{\partial H}{\partial p_a}, \qquad \dot p_a = -\frac{\partial H}{\partial q_a}. \qquad (4.21)$$

In general, during the evolution of a mechanical system, the quantities characterizing the system change. Namely, coordinates and velocities (or momenta in the Hamiltonian formulation) are solutions to the equations of motion and hence they are genuine (non-trivial) functions of time. However, there are other quantities which are functions of q_a and p_a but which remain constant during the real evolution. The most familiar example is energy. We have seen that the Hamiltonian represents the total energy of the system, and if it does not depend on time explicitly, it does not depend on time at all. For example, the Hamiltonian of the harmonic oscillator is

$$H = \frac{p^2}{2m} + \frac{1}{2}\,m\,\omega^2 q^2,$$

where both p and q are functions of time. Nevertheless, for any solution of Hamilton's equations, the particular combination of coordinates and momenta given by H is a constant. In this case we say that the energy is conserved.
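This conservation is easy to check numerically. A minimal sketch in Mathematica (units with m = ω = 1 are assumed here for brevity):

sol = First[NDSolve[{q'[t] == p[t], p'[t] == -q[t], q[0] == 1, p[0] == 0},
    {q, p}, {t, 0, 10}]];
energy[t_] = (p[t]^2 + q[t]^2)/2 /. sol;
{energy[0], energy[5], energy[10]}   (* all three values ≈ 0.5 *)

Although p and q oscillate, the particular combination (p² + q²)/2 stays constant along the solution.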

Other examples of conserved quantities are momentum and angular momentum.

Total momentum and total angular momentum of an isolated system are constant

in time.

From the mathematical point of view, the existence of conserved quantities is not surprising: it is a direct consequence of the properties of differential equations. For a system with n degrees of freedom we have n Lagrange's equations of the second order or 2n Hamilton's equations of the first order. The solution of a second-order equation contains two arbitrary constants, so the solution of the complete set of Lagrange's equations contains 2n constants. Similarly, the solution of a first-order equation contains one integration constant, so the solution of the complete set of Hamilton's equations again contains 2n constants.

We have arrived at the conclusion that, regardless of the formalism, the solution of the equations of motion depends on the choice of 2n arbitrary constants C_1, ..., C_{2n}. Hence, the solution (q, p) of the equations of motion can be written in the form

$$q_1 = q_1(t, C_1, \ldots, C_{2n}), \qquad p_1 = p_1(t, C_1, \ldots, C_{2n}),$$
$$\vdots$$
$$q_n = q_n(t, C_1, \ldots, C_{2n}), \qquad p_n = p_n(t, C_1, \ldots, C_{2n}).$$

This is a system of 2n equations for the constants C_m which can be inverted to obtain

$$C_1 = C_1(q_1, \ldots, q_n, p_1, \ldots, p_n, t),$$
$$\vdots$$
$$C_{2n} = C_{2n}(q_1, \ldots, q_n, p_1, \ldots, p_n, t).$$

In other words, for any solution of Hamilton's equations there must exist 2n functions C_m of coordinates and momenta which are in fact constant and hence conserved. In this sense the existence of conserved quantities is a purely mathematical consequence of the fact that solutions of differential equations contain integration constants. Moreover, any function of the constants C_m is itself constant, and therefore the set of conserved quantities is not unique.

There is, however, a much deeper physical interpretation of the existence of conserved quantities. Some of them reflect properties of space and time and so they are intimately related to symmetries of Nature. This relation is the content of the celebrated Noether's theorem, one of the most fundamental and striking achievements of modern theoretical physics. The most important consequences of Noether's theorem can be found in relativistic quantum field theories, but it has implications even in the context of classical mechanics. In the following we derive and prove Noether's theorem; then we show that the conservation of energy, momentum and angular momentum is a consequence of this theorem. The reader will notice that the theorem is genuinely based on the variational principle to which this chapter is devoted.

When we derived Lagrange's equations from the action, the idea was to find the trajectory q_a for which the action S acquires its extremal value. The variation of the action was introduced with the help of varied trajectories, recall figure 4.5. The variation of the trajectory was arbitrary, with the only constraint that it must vanish at the boundary points A and B. Using this constraint we were able to derive the equations of motion, i.e. the Lagrange equations.

Now we proceed differently. We claimed in the previous section that to each symmetry of the system there corresponds a conserved quantity. What do we mean by a symmetry of the system? The simplest example of a symmetry is the invariance with respect to translations in time. Isolated systems must be invariant under such translations: if we perform some experiment at time t1 and then the same experiment at a later time t2 > t1, both experiments must give the same results, provided all conditions remain unchanged.

For example, suppose we study the collision of two particles with initial velocities v1 and v2 and masses m1 and m2. In addition, we suppose that these particles form an isolated system, not affected by the laboratory. After the collision we measure the velocities and find that the new velocities of the particles are v′1 and v′2. The point is that if the initial velocities and masses do not change, the resulting velocities after the collision do not depend on the time at which the experiment was performed. It does not matter whether we study the collision on Monday or on Friday: the result must be the same, independent of time.

Fig. 4.6. Translation of the system in time (the trajectory from A at t1 to B at t2 is shifted to run from A′ at t′1 to B′ at t′2).

More generally, imagine that q_a = q_a(t) is the real trajectory (i.e. a solution of the equations of motion) which passes through point A at time t1 and point B at time t2, see figure 4.6. If we perform the same experiment at a later time, we can imagine it as "shifting" the trajectory to the right (in the time direction), so that the new trajectory starts at point A′ at the shifted time t′1 and ends at point B′ at time t′2. We say that, mathematically, we have translated the system in time. If all other conditions remain the same, then the shape of the trajectory cannot change: the particle must move along the same trajectory, only at a later time.

We say that time is homogeneous, i.e. all instants of time are physically equivalent. Hence, the result of any experiment cannot depend explicitly on the time at which it was performed: an isolated system must be invariant under translations in time.

Notice that this conclusion does not apply to non-isolated systems. For example, suppose that we measure the intensity of the sunlight at 8.00 am and at 11.00 pm. Then the results will, of course, be different! We cannot say that the intensity of the sunlight is always the same. However, this is related to the fact that the Earth is not an isolated system if one studies the sunlight, because for our measurement it is crucial that energy is coming from the Sun to the Earth. The conditions which can affect the experiment are not the same in the morning and before midnight. Hence, the assumption that the system is isolated is important. In fact, the existence of the Sun and the rotation of the Earth break the homogeneity of time.

We will not always emphasize it, but in connection with conservation laws, we

will always assume that the system is isolated.

Homogeneity of time is not the only important symmetry; another one is the homogeneity of space. This principle states that the result of an experiment cannot depend on the place where we perform it. Again, we must add the assumption that all external conditions are the same. But if this assumption is satisfied, it does not matter where we perform the experiment. The physics must be invariant with respect to translations in space; this transformation is shown in figure 4.7.

Fig. 4.7. Translation of the system in space (the trajectory q(t) from A to B is shifted to q′(t) running from A′ to B′, while the times t1, t2 are unchanged).

The last of the most important symmetries is the isotropy of space. Isotropy means that at a given point of space all directions are equivalent, and the result of any experiment cannot change if we rotate the system by an arbitrary angle.

If the system is invariant with respect to some transformation (translation in time or space, or rotation), the action of this system does not change under this transformation. Noether's theorem then implies that each of these symmetries is responsible for the conservation of some quantity: homogeneity of time implies the conservation of energy, homogeneity of space implies the conservation of momentum, and isotropy of space implies the conservation of angular momentum.

Notice that in the previous examples we varied either the trajectory or the time. In the case of the spatial translation, figure 4.7, we did not transform the time, only the trajectory. However, the boundary points were not fixed, because the endpoints of the trajectory are transformed as well. Thus, in general, the boundary conditions must be relaxed. In the case of the time translation we did not change the values of the coordinates q_a, but we shifted the trajectory in time, and thus we must consider not only variations of the coordinates q_a but also a variation of the time, δt.

All transformations considered above are special cases of the general transformation

$$t \mapsto t' = t + \delta t(t), \qquad q_a(t) \mapsto q_a'(t) = q_a(t) + \delta q_a(t).$$

Here we explicitly emphasized that the variations δt and δq_a can depend on time. The variation δq_a is called the isochronous variation because it is the difference of the varied coordinate q′_a(t) and the original coordinate q_a(t) at the same time. Besides δq_a we also introduce the non-isochronous variation, or total variation, ∆q_a defined by

$$\Delta q_a = q_a'(t + \delta t) - q_a(t) = q_a'(t) + \dot q_a\,\delta t - q_a(t) = \delta q_a + \dot q_a\,\delta t. \qquad (4.22)$$

Theorem 4 (Emmy Noether's theorem). Let S be the action of the system defined by

$$S = \int_{t_1}^{t_2} L(q, \dot q, t)\, dt. \qquad (4.23)$$

Let the action be invariant under the transformation generated by the variations δt and ∆q_a,

$$\Delta S = 0. \qquad (4.24)$$

Then the quantity

$$Q = p_a\,\Delta q_a - E\,\delta t \qquad (4.25)$$

is constant during the evolution of the system whenever q_a is a solution of the equations of motion, where p_a are the generalized momenta and E is the generalized energy of the system, defined by

$$p_a = \frac{\partial L}{\partial \dot q_a}, \qquad E = p_a \dot q_a - L. \qquad (4.26)$$


Proof. The action associated with the varied trajectory q′_a and the varied time is

$$S' = \int_{t_1+\delta t_1}^{t_2+\delta t_2} L(q'(t), \dot q'(t), t)\, dt, \qquad (4.27)$$

where we use the notation δt1 = δt(t1) and δt2 = δt(t2) for brevity. Notice that the time translation affects only the integration bounds, not the integrand. The total variation of the action is then ∆S = S′ − S, which is zero by the assumption of the invariance of the action:

$$\Delta S = S' - S = 0. \qquad (4.28)$$

Splitting the integration range, we can write

$$S' = \int_{t_1+\delta t_1}^{t_2+\delta t_2} = \int_{t_1+\delta t_1}^{t_2} + \int_{t_2}^{t_2+\delta t_2} = \int_{t_1}^{t_2} - \int_{t_1}^{t_1+\delta t_1} + \int_{t_2}^{t_2+\delta t_2},$$

i.e.

$$S' = \int_{t_1}^{t_2} L(q', \dot q', t)\, dt - \int_{t_1}^{t_1+\delta t_1} L(q', \dot q', t)\, dt + \int_{t_2}^{t_2+\delta t_2} L(q', \dot q', t)\, dt,$$

where we have omitted the integrand in the intermediate steps. Hence, the total variation of the action reads

$$\Delta S = \underbrace{\int_{t_1}^{t_2}\left[L(q',\dot q',t) - L(q,\dot q,t)\right] dt}_{\Delta S_1} \;+\; \underbrace{\int_{t_2}^{t_2+\delta t_2} L(q',\dot q',t)\, dt - \int_{t_1}^{t_1+\delta t_1} L(q',\dot q',t)\, dt}_{\Delta S_2}. \qquad (4.29)$$

Now we are in a position to expand these integrals in the variations δq_a and δt, assuming they are infinitesimal and hence neglecting higher-order terms. This is "legal" because in the definition of the variation it was assumed that after the variation all quantities will be evaluated at δq_a = δt = 0, so only the first-order terms enter the result.

First we express the variation denoted by ∆S1 in the equation above. The Lagrangian is evaluated on different trajectories but at the same time, and so the expression under the integral is the isochronous variation of the Lagrangian:


$$\Delta S_1 = \int_{t_1}^{t_2} \delta L\, dt = \int_{t_1}^{t_2} \left(\frac{\partial L}{\partial q_a}\,\delta q_a + \frac{\partial L}{\partial \dot q_a}\,\delta\dot q_a\right) dt.$$

Integrating the second term by parts, we obtain

$$\Delta S_1 = \left[\frac{\partial L}{\partial \dot q_a}\,\delta q_a\right]_{t_1}^{t_2} + \int_{t_1}^{t_2}\left(\frac{\partial L}{\partial q_a} - \frac{d}{dt}\frac{\partial L}{\partial \dot q_a}\right)\delta q_a\, dt.$$

We arrived at the same expression when we derived Lagrange's equations from the variational principle, but now the interpretation is different. There we assumed that the boundary points of the trajectory are fixed, and so we imposed δq_a(t1) = δq_a(t2) = 0. By this assumption the first term, in square brackets, vanished, and hence we deduced that in order to satisfy δS = 0 the Lagrange equations must hold. Now the boundary conditions are not fixed, because we consider a transformation of the whole system. However, we assume that the equations of motion are satisfied, and therefore it is the second term which vanishes! Consequently, the only contribution from ∆S1 to the total variation is

$$\Delta S_1 = \left[\frac{\partial L}{\partial \dot q_a}\,\delta q_a\right]_{t_1}^{t_2}.$$

Next we evaluate the variation ∆S2 in the expression (4.29). Recall that we are expanding all quantities up to the first order in the variations δq_a and δt. Thus, for example, the first integral in ∆S2 is

$$\int_{t_2}^{t_2+\delta t_2} L(q', \dot q', t)\, dt.$$

Since the integral is taken over the interval (t2, t2 + δt2), whose width is the infinitesimal δt2, the inequality t − t2 < δt2 holds throughout the integration range, and to first order we may replace the integrand by its value at t = t2 and the varied trajectory q′ by the original one q:

$$\int_{t_2}^{t_2+\delta t_2} L(q', \dot q', t)\, dt \approx L(q(t_2), \dot q(t_2), t_2)\,\delta t_2.$$

Similarly,

$$\int_{t_1}^{t_1+\delta t_1} L(q', \dot q', t)\, dt \approx L(q(t_1), \dot q(t_1), t_1)\,\delta t_1,$$

so that ∆S2 = [L δt] evaluated between t1 and t2. Adding both contributions, the total variation of the action becomes

$$\Delta S = \left[\frac{\partial L}{\partial \dot q_a}\,\delta q_a + L\,\delta t\right]_{t_1}^{t_2}. \qquad (4.30)$$

Using the definition of the generalized momentum (3.2) and the relation (4.22) between the isochronous and total variations, δq_a = ∆q_a − q̇_a δt, we find

$$\Delta S = \left[p_a\,\Delta q_a - \left(p_a\,\dot q_a - L\right)\delta t\right]_{t_1}^{t_2}.$$

The coefficient standing by the variation δt is in fact equal to the Hamiltonian (3.8). The reason why we do not denote it by H is that the Hamiltonian should be expressed as a function of q_a and p_a, which is not the case here. But we know that the Hamiltonian is equal to the total energy, and hence we define the generalized energy by

$$E = p_a\,\dot q_a - L$$

and introduce the quantity

$$Q = p_a\,\Delta q_a - E\,\delta t. \qquad (4.33)$$

The total variation is then

$$\Delta S = \left[Q\right]_{t_1}^{t_2} = Q(t_2) - Q(t_1).$$

Now, by (4.28) we have ∆S = 0 and hence

$$Q(t_1) = Q(t_2). \qquad (4.34)$$

Since the times t1 and t2 can be chosen arbitrarily, the value of Q at an arbitrary time t1 is equal to its value at an arbitrary time t2. In other words, Q acquires the same value at all times and hence is a conserved quantity,

$$Q = \text{constant}.$$

Nevertheless, Q is not yet our final expression for the conserved quantity, because it contains the variations ∆q_a and δt and hence depends on the particular transformation. We have to clarify the nature of the variations further. If we say that the system is invariant under, for example, translations, we actually mean that it is invariant under an arbitrary translation. The translation in, say, the x-direction can be understood as a continuous transformation parametrized by a parameter a,

$$x \mapsto x + a.$$

For a = 0 we have the identity transformation x ↦ x. Since a is a continuous parameter, the transformation x ↦ x + a is also continuous in the variable a.

This concludes the proof of Noether's theorem. ⊓⊔

4.9 Basic conservation laws

In the previous section we proved Noether's theorem for a general transformation of the system generated by the infinitesimal variations ∆q_a and δt. We proved that for such a general transformation the quantity (4.25),

$$Q = p_a\,\Delta q_a - E\,\delta t,$$

is conserved. In this section we investigate the implications of Noether's theorem for the basic symmetries of space and time discussed above: the homogeneity of space and time and the isotropy of space.
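As a first short illustration of how the theorem is used, consider a pure translation in time: δt = ε with ε an arbitrary constant, while the trajectory itself is merely shifted, so that ∆q_a = 0. Noether's quantity (4.25) then reduces to

$$Q = p_a\cdot 0 - E\,\varepsilon = -E\,\varepsilon,$$

and since ε is an arbitrary constant, the conservation of Q is precisely the conservation of the generalized energy E. Similarly, for a spatial translation of a particle, δt = 0 and ∆x_i = ε_i, the conserved quantity is Q = p_i ε_i, i.e. the momentum.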

5

Hamilton-Jacobi equation

In the previous chapters we met, besides the Newtonian mechanics, two other formulations of classical mechanics, namely the Lagrange and the Hamiltonian formulation. Lagrange's equations are formulated in an arbitrary coordinate system. Their main advantage is that by an appropriate choice of the coordinates we can eliminate the constraints which complicate the analysis. Similarly to Newton's law, Lagrange's equations are second-order equations. Hamilton's equations are also coordinate-independent but, in addition, they have the form of first-order differential equations. In general, first-order equations are easier to solve. In the case of Hamilton's equations this advantage is only formal, because in order to solve the system of Hamilton's equations we usually have to convert them back to second-order equations. The main advantage of Hamilton's equations is that we can interpret the motion of particles as motion in the phase space. We have seen that the conservation of energy allows us to find the phase trajectories even without solving the equations of motion.

In this chapter we start with the analysis of coordinate transformations which leave the form of Hamilton's equations invariant, the so-called canonical transformations. Then we study the possibility of finding such transformations which simplify Hamilton's equations so that they can be solved easily. We will see that this is indeed possible if we solve the Hamilton-Jacobi equation. In many situations the Hamilton-Jacobi equation can be solved exactly and the solution of Hamilton's equations then simplifies significantly. The analysis of the Hamilton-Jacobi equation will lead us to a new, third formulation of classical mechanics. Finally, we introduce the action-angle variables, which will be useful in the analysis of more complicated systems with periodic behaviour.

5.1 Canonical transformations

In Hamilton's formalism we treat the coordinates q_a and the momenta p_a as independent variables. Let us investigate transformations which do not change the form of Hamilton's equations

$$\dot q_a = \frac{\partial H}{\partial p_a}, \qquad \dot p_a = -\frac{\partial H}{\partial q_a}. \qquad (5.1)$$

Hence, we are interested in transformations

$$Q_a = Q_a(q, p, t), \qquad P_a = P_a(q, p, t), \qquad (5.2)$$

preserving equations (5.1), i.e. such that the new equations of motion will be

$$\dot Q_a = \frac{\partial H'}{\partial P_a}, \qquad \dot P_a = -\frac{\partial H'}{\partial Q_a}. \qquad (5.3)$$

In chapter 4 we have seen that the Lagrangian is not determined uniquely: we can add an arbitrary function which is a total time derivative to the Lagrangian without affecting the equations of motion, recall equation (4.19).

Suppose that we have the Lagrangian L = L(q, q̇) and the corresponding Hamiltonian

$$H = \dot q_a\, p_a - L.$$

Performing the transformation (5.2) we obtain a new Lagrangian L′ = L′(Q, Q̇) with the associated Hamiltonian

$$H' = \dot Q_a\, P_a - L'.$$

We require that both Lagrangians yield the same equations of motion. Then, by (4.19), the two Lagrangians can differ only by the total time derivative of some function F,

$$L' = L + \frac{dF}{dt}.$$

In terms of the Hamiltonians this means

$$\dot q_a\, p_a - H = \dot Q_a\, P_a - H' + \frac{dF}{dt}. \qquad (5.4)$$

In general, the function F depends on the old coordinates, the new coordinates and possibly on time,

$$F = F(q, p, Q, P, t), \qquad (5.5)$$

i.e. it is a function of 4n + 1 variables. But these coordinates are not all independent, as they are constrained by the 2n equations (5.2). Hence, F is a function of 2n + 1 independent variables and we can decide which variables will be treated as independent. Transformations (5.2) are called canonical and the function F is called the generating function of the canonical transformation (5.2).

Let us choose a generating function F1 which is a function of the old and the transformed coordinates (and possibly of time), F1 = F1(q, Q, t). Its total time derivative is

$$\frac{dF_1}{dt} = \frac{\partial F_1}{\partial q_a}\,\dot q_a + \frac{\partial F_1}{\partial Q_a}\,\dot Q_a + \frac{\partial F_1}{\partial t}. \qquad (5.6)$$

Substituting this expression into (5.4) and comparing the coefficients standing by the independent derivatives q̇_a and Q̇_a, respectively, we find

$$p_a = \frac{\partial F_1}{\partial q_a}, \qquad P_a = -\frac{\partial F_1}{\partial Q_a}, \qquad H' = H + \frac{\partial F_1}{\partial t}. \qquad (5.7)$$

Hence, we can choose an arbitrary function F1 of q_a and Q_a and, using relations (5.7), find the transformation which the function F1 generates. The equation

$$p_a = \frac{\partial}{\partial q_a} F_1(q, Q, t)$$

can be used to find the defining relation for Q_a, i.e. we can solve this equation to find

$$Q_a = Q_a(q, p, t).$$

Similarly, the second relation of (5.7),

$$P_a = -\frac{\partial}{\partial Q_a} F_1(q, Q, t),$$

can then be solved to find the relation

$$P_a = P_a(q, p, t).$$

Sometimes it is more convenient to use a generating function which depends on the old coordinates q_a and the new momenta P_a. This can be achieved using the familiar Legendre transformation. Let us write the differential of F1 with the help of equations (5.7):

$$dF_1 = \frac{\partial F_1}{\partial q_a}\,dq_a + \frac{\partial F_1}{\partial Q_a}\,dQ_a + \frac{\partial F_1}{\partial t}\,dt = p_a\,dq_a - P_a\,dQ_a + \frac{\partial F_1}{\partial t}\,dt = p_a\,dq_a - d(Q_a P_a) + Q_a\,dP_a + \frac{\partial F_1}{\partial t}\,dt.$$

Let us define the function

$$F_2 = F_1 + Q_a P_a. \qquad (5.8)$$

Then

$$dF_2 = p_a\,dq_a + Q_a\,dP_a + \frac{\partial F_1}{\partial t}\,dt, \qquad (5.9)$$

which means that F2 is a function of q_a and P_a,

$$F_2 = F_2(q, P, t). \qquad (5.10)$$

In full analogy with (5.7) we now obtain

$$Q_a = \frac{\partial F_2}{\partial P_a}, \qquad p_a = \frac{\partial F_2}{\partial q_a}, \qquad H' = H + \frac{\partial F_2}{\partial t}. \qquad (5.11)$$

5.2 Hamilton-Jacobi equation

Canonical transformations preserve the form of the equations of motion. Let us find a canonical transformation after which Hamilton's equations simplify as much as possible, so that we can solve them explicitly. We introduce a generating function of the type (5.10), but we will denote it by S:

$$S = S(q, P, t).$$

Relations (5.11) then read

$$Q_a = \frac{\partial S}{\partial P_a}, \qquad p_a = \frac{\partial S}{\partial q_a}, \qquad H' = H + \frac{\partial S}{\partial t}. \qquad (5.12)$$

The simplest possible Hamiltonian is H′ = 0, and so we require that S satisfies the equation

$$H + \frac{\partial S}{\partial t} = 0. \qquad (5.13)$$

Hamilton's equations (5.3) with H′ = 0 then imply

$$\dot Q_a = 0, \qquad \dot P_a = 0. \qquad (5.14)$$

In other words, the transformed coordinates and momenta are constant. Equations (5.14) can be solved trivially,

$$Q_a = \alpha_a, \qquad P_a = \beta_a, \qquad (5.15)$$

where α_a and β_a are integration constants equal to the constant values of the coordinates and momenta. The generating function can then be written as

$$S = S(q, \beta, t) \qquad (5.16)$$

and relations (5.12) acquire the form

$$\alpha_a = \frac{\partial S}{\partial \beta_a}, \qquad p_a = \frac{\partial S}{\partial q_a}, \qquad H + \frac{\partial S}{\partial t} = 0. \qquad (5.17)$$

Thus, if we want to find the canonical transformation which simplifies Hamilton's equations, we first solve the equations

$$p_a = \frac{\partial S}{\partial q_a}, \qquad H(q, p, t) + \frac{\partial S}{\partial t} = 0.$$

Notice that the first equation is merely the definition of p_a, so the only equation which must in fact be solved is

$$H\!\left(q, \frac{\partial S}{\partial q}, t\right) + \frac{\partial S}{\partial t} = 0. \qquad (5.18)$$

This equation for the generating function S is known as the Hamilton-Jacobi equation. The Hamilton-Jacobi equation is a single first-order partial differential equation in the n + 1 variables q1, ..., qn and t, and therefore its complete solution S contains n + 1 arbitrary constants. One of them is additive, for obviously any function S′ = S + c, where c is a constant, is also a solution of (5.18). This constant can be set to zero without loss of generality, because the Hamilton-Jacobi equation contains only derivatives of S. Hence, the solution will contain n essential constants:

$$S = S(q, c_1, \ldots, c_n, t). \qquad (5.19)$$

This result should be compared to equation (5.16), where β_a are the constant momenta.

Our aim was to arrive at Hamilton's equations in the form (5.14), so in order to identify the constants c_a with the momenta β_a we have to show that the coordinates derived from the generating function (5.19) via (5.12) are indeed constant. We have

$$Q_a = \frac{\partial S}{\partial c_a},$$

and using Hamilton's equations and the Hamilton-Jacobi equation we find

$$\dot Q_a = \frac{d}{dt}\frac{\partial S}{\partial c_a} = \frac{\partial}{\partial q_b}\frac{\partial S}{\partial c_a}\,\dot q_b + \frac{\partial}{\partial t}\frac{\partial S}{\partial c_a} = \frac{\partial}{\partial c_a}\frac{\partial S}{\partial q_b}\,\dot q_b + \frac{\partial}{\partial c_a}\frac{\partial S}{\partial t} = \frac{\partial p_b}{\partial c_a}\,\frac{\partial H}{\partial p_b} - \frac{\partial H}{\partial c_a}.$$

Now we use the fact that the Hamiltonian H depends on the constants c_a only through the generating function S,

$$\frac{\partial H}{\partial c_a} = \frac{\partial}{\partial c_a}\, H\!\left(q, \underbrace{\frac{\partial S(q, c, t)}{\partial q}}_{p},\, t\right) = \frac{\partial H}{\partial p_b}\,\frac{\partial p_b}{\partial c_a},$$

so that indeed

$$\dot Q_a = 0.$$

To summarize: the solution S of the Hamilton-Jacobi equation generates a canonical transformation after which the coordinates Q_a are constant, and we denote them by α_a = Q_a as we did above. Then we can identify the unknown constants c_a in the function S with the constant momenta P_a = c_a = β_a. Hamilton's equations in the transformed coordinates thus read

$$\dot Q_a = 0, \qquad \dot P_a = 0,$$

as desired.

5.3 Example: harmonic oscillator

The procedure explained in the previous section may seem somewhat abstract, and it could be useful to see how it works on our favourite example of the harmonic oscillator. Let us take, for simplicity, the Hamiltonian in the form

$$H(q, p) = \frac{p^2}{2} + \frac{q^2}{2}.$$

In order to formulate the Hamilton-Jacobi equation, we replace the momentum p by the derivative of the generating function S,

$$p = \frac{\partial S}{\partial q},$$

in accordance with (5.12). The Hamilton-Jacobi equation (5.18) then reads

$$\frac{\partial S}{\partial t} + H\!\left(q, \frac{\partial S}{\partial q}\right) = 0.$$

Let us put

$$S = A(t) + W(q),$$

i.e. we separate the time dependence from the coordinate dependence. Then the Hamilton-Jacobi equation acquires the form

$$\frac{\partial A}{\partial t} = -H\!\left(q, \frac{\partial W}{\partial q}\right).$$

We know that the Hamiltonian is constant and equal to the total energy E, so that

$$\frac{\partial A}{\partial t} = -E,$$

which integrates to A = −E t, and the generating function can be written in the form

$$S(q, E, t) = -E\,t + W(q).$$

The energy E plays the rôle of the constant momentum β, so we write β = E. The Hamilton-Jacobi equation is now

$$H\!\left(q, \frac{\partial W}{\partial q}\right) = E.$$


For our Hamiltonian this reads

$$\frac{1}{2}\left(W'(q)\right)^2 + \frac{1}{2}\,q^2 = E,$$

and after rearrangement,

$$dW = \sqrt{2E - q^2}\;dq.$$

This is an elementary integral and can be evaluated with some work (or using Mathematica). The result is

$$W(q, E) = E\,\arcsin\frac{q}{\sqrt{2E}} + \frac{1}{2}\,q\,\sqrt{2E - q^2},$$

where the additive integration constant has been set to zero.¹

Since we have identified the integration constant E with the constant momentum P = β, we can use relation (5.17),

$$\alpha = \frac{\partial S}{\partial \beta},$$

to obtain the constant transformed coordinate Q = α. By differentiating S = −Et + W we find

$$\alpha = -t + \frac{\partial W}{\partial E} = -t + \arcsin\frac{q}{\sqrt{2E}}.$$

Since we have proved in the previous section that α must be constant, we can use the last equation to express q as a function of the time t:

$$q = \sqrt{2E}\,\sin(\alpha + t).$$
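The integration and the final check can also be done in Mathematica; a minimal sketch (we write En instead of E, because E is a built-in constant in Mathematica):

W[q_, En_] = Integrate[Sqrt[2 En - s^2], {s, 0, q},
   Assumptions -> 0 < q < Sqrt[2 En]];
(* verify the Hamilton-Jacobi equation W'(q)^2/2 + q^2/2 == E *)
Simplify[D[W[q, En], q]^2/2 + q^2/2 == En, 0 < q < Sqrt[2 En]]   (* True *)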

¹ In order to perform the integration, use the substitution q = √(2E) sin x to obtain ∫ 2E cos² x dx. Then use the trigonometric formula cos² x = (1 + cos 2x)/2 and perform the trivial integration. Finally, return to the variable q by inverting the relation for x and use the formula sin 2x = 2 sin x cos x, where sin x = q/√(2E) and cos x = √(1 − sin² x).

5.4 Action-angle variables

Let us continue with our analysis of the harmonic oscillator. An important class of systems is described by so-called integrable Hamiltonians, a term to be defined later. Before we discuss the integrability of the system, we need to introduce a new set of canonically conjugate variables known as the action-angle variables.

In the previous section we have seen that if the Hamiltonian is time-independent, the action S can be written in the form

$$S = -E\,t + W(q, E),$$

where the function W depends on the coordinate q and the total energy E. Now we are going to use this function as a generating function for a canonical transformation.

We know that the harmonic oscillator moves in a periodic way and that its phase trajectories in the phase space are circles (in the simplified units we use in this chapter; in general units they are ellipses). In other words, its phase trajectories are always closed curves. Hence, it makes sense to define a new momentum called the action variable by

$$J = \oint p\, dq, \qquad (5.20)$$

where the integral is taken along the orbit of the oscillator, i.e. along the circle. We said that J will be treated as a momentum, which means that we identify the transformed momentum β with the action variable J. Recall that the Hamiltonian is equal to the total energy,

$$H(q, p) = E,$$

which can be solved for the momentum to give

$$p = p(q, E).$$

Hence, the integrand of (5.20) depends on q and E. But since we integrate over the variable q, the integral does not depend on q anymore and we have

$$J = J(E) \qquad\text{or}\qquad E = E(J).$$

Expressing the energy as E = E(J), the function W can therefore be regarded as a function of the coordinate q and the action variable J:

$$W = W(q, J).$$

Recall that, by (5.11), the transformed coordinate is obtained as the derivative of the generating function with respect to the momentum. In our case W is the generating function and J plays the rôle of the momentum; the conjugate coordinate will be called the angle variable and is defined by

$$w = \frac{\partial W}{\partial J}. \qquad (5.21)$$

Because the generating function W does not depend on time explicitly, by (5.11) we have H′ = H, and since canonical transformations preserve the form of Hamilton's equations, the equation of motion in terms of the action-angle variables is simply

$$\dot w = \frac{\partial H}{\partial J}. \qquad (5.22)$$

In the case of the harmonic oscillator we have

$$H = \frac{p^2}{2} + \frac{q^2}{2} = E,$$

so that $p = \sqrt{2E - q^2}$.
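For the oscillator the computation can be finished explicitly. The phase trajectory p² + q² = 2E is a circle of radius √(2E), and the loop integral (5.20) is just the area it encloses:

$$J = \oint p\, dq = \pi\left(\sqrt{2E}\right)^2 = 2\pi E, \qquad\text{hence}\qquad E(J) = \frac{J}{2\pi}.$$

Equation (5.22) then gives ẇ = ∂E/∂J = 1/2π: the angle variable grows linearly in time and increases by exactly one over each period 2π of the oscillator.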

6

Electromagnetic field

Lagrange's equations and Hamilton's equations have been derived from Newton's law under the assumption that the force which acts on the particle is conservative, i.e. it can be written as a gradient of the potential,

$$\boldsymbol{F} = -\nabla V.$$

Only for such forces can we define the potential energy and, consequently, introduce the Hamiltonian. On the other hand, when we derived Lagrange's equations from the variational principle, we just assumed that the system can be described by some Lagrangian L, without assuming the conservative nature of the forces explicitly. We only argued that if the force is conservative, and thus has a potential V, then the natural choice is L = T − V. This approach, however, does not exclude the possibility that the system can be described by some function L even if the force is not conservative.

A particle in an electromagnetic field is the most important practical example of such a system. In physics we often study the motion of charged particles in external electromagnetic fields without caring how these fields emerged. Hence, we do not study the dynamics of the fields; we merely assume that the fields are given and investigate the motion of particles in the regions where the electromagnetic fields are present.

In the past people thought that electricity and magnetism are two different phenomena, while today we know that they are just two different aspects of a single entity called the electromagnetic field. The electric part of the field is described by the vector field E (sometimes called the electric field strength or electric intensity) and the magnetic part of the field is described by the vector field B (sometimes called the magnetic induction). A fully unified view of these fields as parts of the electromagnetic field is possible only in the framework of the special theory of relativity. Let us elucidate the meaning of the fields E and B.

The sources of the gravitational force are masses: the gravitational force between two point masses m and m′ is proportional to the product m m′ and is given by

$$F = G\,\frac{m\,m'}{r^2},$$

where r is the distance between the points and G is the gravitational constant. The numerical value of the constant G in standard SI units is

$$G \approx 6.67 \times 10^{-11}\ \mathrm{N\,m^2\,kg^{-2}}.$$

Similarly, the sources of the electromagnetic interaction are charges, i.e. charged particles. Charge is usually denoted by the symbol q or e and it can be either positive or negative. Particles with vanishing charge are called neutral. It is a remarkable fact that for two point charges at rest, the electric force of their interaction is given by the Coulomb law, which is formally identical to Newton's law of gravitation. Two point charges q and q′ at mutual distance r act on each other by an electric force of magnitude

$$F = k\,\frac{q\,q'}{r^2}, \qquad (6.1)$$

where k is the constant characterizing the strength of the electromagnetic interaction; it plays a rôle similar to that of the gravitational constant G in Newton's law. The numerical value of the constant k depends on the system of units we use. In standard SI units we write k in the form

$$k = \frac{1}{4\pi\varepsilon_0},$$

where ε0 is called the permittivity of the vacuum and its value is ε0 ≈ 8.85 × 10⁻¹² F m⁻¹, so that

$$k \approx 8.99 \times 10^{9}\ \mathrm{F^{-1}\,m}.$$

Comparing this value to the value of the gravitational constant G we can see that the electric force is much, much stronger than the gravitational force.


However, the simple Coulomb law (6.1) holds only for charges at rest. When the charges start to move in an arbitrary way, new effects emerge. First, the electromagnetic field propagates at a finite speed c equal to the speed of light,

$$c = 299\,792\,458\ \mathrm{m\,s^{-1}}.$$

Notice that in SI units this value is not approximate but exact. It is related to the constant ε0 by

$$c = \frac{1}{\sqrt{\varepsilon_0\,\mu_0}},$$

where µ0 is called the permeability of the vacuum and its value is, by definition,

$$\mu_0 = 4\pi\times 10^{-7}\ \mathrm{H\,m^{-1}}.$$

When we say that the speed of propagation of the electromagnetic field is finite and equal to c, we mean that if one charge changes its position, the other charges do not feel this change immediately but only after the time

$$\Delta t = \frac{r}{c},$$

where r is the distance from the charge which changed its position. From this fact it is immediately obvious that r in the Coulomb law (6.1) is a problematic quantity, because we must take into account that the charge at the actual distance r cannot have an immediate effect on another charge.

The next problem is that a moving charge produces not only an electric but also a magnetic field. A time-dependent electric field is a source of the magnetic field and vice versa. This is what we mean by the dynamics of the electromagnetic field: the field can propagate through empty space (without charges) at the speed of light. Hence, the notion of force is not appropriate for the description of the dynamics of the electromagnetic interaction and the notion of the field must be introduced.

But, as we claimed, we will not discuss the dynamics of the electromagnetic field, which is governed by the celebrated Maxwell equations. We simply assume that the electromagnetic field is given and investigate the motion of charged particles in this field. Once again, the electromagnetic field is described by the electric field E and the magnetic field B. Consider a particle with charge q which is moving in a region where only the electric field is present, i.e. B = 0. Then the electric field acts on the particle by the force

$$\boldsymbol{F} = q\,\boldsymbol{E}. \qquad (6.2)$$


In other words, the electric force is proportional to the electric field E and to the charge q of the particle, which is an experimental fact. Once we discover this fact, relation (6.2) becomes the definition of the electric vector E: E is the vector such that the electric force exerted on a point charge q is given by (6.2).

Similarly, consider a particle moving in a region where only the magnetic field is present. Once again we find (experimentally) that the force acting on the charge q is proportional to the charge. In addition, however, we find that the direction of the magnetic force is always orthogonal to the velocity v of the charge. It was discovered that the magnetic force is given by

$$\boldsymbol{F} = q\,\boldsymbol{v}\times\boldsymbol{B}, \qquad (6.3)$$

where the operation × is the standard vector product¹ (or cross product). Again, relation (6.3) is the definition of the magnetic vector B.

When both electric and magnetic fields are present, the force exerted on the particle is given by the so-called Lorentz force

$$\boldsymbol{F} = q\,(\boldsymbol{E} + \boldsymbol{v}\times\boldsymbol{B}). \qquad (6.4)$$

We emphasize that relation (6.4) is an experimental fact, similarly to Newton's law of force, and we do not derive it here from any more basic principle. It is fascinating that relation (6.4) can be derived from more basic principles, but this is completely beyond the scope of this textbook². In the theory of electromagnetism it is shown that instead of the electric field E and the magnetic field B we can introduce one scalar function φ and one vector function A; this is a consequence of Maxwell's equations. In this textbook we proceed differently and simply assume that this can be done. From this assumption we will be able to derive the correct equations of motion of a charged particle in an arbitrary electromagnetic field.

¹ Recall that the cross product a × b of vectors a and b is a vector orthogonal to both a and b whose magnitude is |a × b| = a b sin θ, where θ is the angle between the two vectors.

² The particular form of the Lorentz force can be obtained from first principles by considering the Poincaré group of isometries of the Minkowski spacetime. The electromagnetic field appears to be a massless representation of the Poincaré group with spin 1, which yields the set of Maxwell equations. The Lorentz force can then be derived using the principle of local gauge invariance.

6.1 Lagrangian and equations of motion

In accordance with the last paragraph of the previous section, we assume that the electromagnetic field can be described, in some sense, by one scalar field φ called the scalar potential and by one vector field A called the vector potential, so that the Lagrangian of a particle in the electromagnetic field is

$$L = \frac{1}{2}\,m\,v^2 - e\,\phi + e\,\boldsymbol{v}\cdot\boldsymbol{A} = \frac{1}{2}\,m\,\dot x_i \dot x_i - e\,\phi + e\,\dot x_i A_i, \qquad (6.5)$$

where e is a constant measuring the strength of the interaction between the particle and the electromagnetic field; this constant is called the charge of the particle. We assume that the Lagrangian (6.5) represents the correct description of a particle moving in a given electromagnetic field. This assumption is justified a posteriori by the accordance of the theory with experiment.

The equations of motion can be derived from the usual Lagrange equations (2.18),

$$\frac{d}{dt}\frac{\partial L}{\partial \dot x_i} - \frac{\partial L}{\partial x_i} = 0.$$

The partial derivatives read

$$\frac{\partial L}{\partial \dot x_i} = m\,\dot x_i + e\,A_i, \qquad \frac{\partial L}{\partial x_i} = -e\,\partial_i\phi + e\,\dot x_j\,\partial_i A_j.$$

Note that the total derivative of A_i with respect to time is

$$\frac{dA_i}{dt} = \frac{\partial A_i}{\partial x_j}\frac{dx_j}{dt} + \frac{\partial A_i}{\partial t} \equiv \dot x_j\,\partial_j A_i + \partial_t A_i,$$

and hence

$$\frac{d}{dt}\frac{\partial L}{\partial \dot x_i} = m\,\ddot x_i + e\,\dot x_j\,\partial_j A_i + e\,\partial_t A_i.$$

Collecting these auxiliary expressions we find that the Lagrange equations of motion are

$$m\,\ddot x_i = -e\,\partial_i\phi - e\,\partial_t A_i + e\,\dot x_j\left(\partial_i A_j - \partial_j A_i\right). \qquad (6.6)$$

Now, since ∂_i ẋ_j = 0, the last term on the right-hand side can be rewritten using the identity

$$\boldsymbol{v}\times(\nabla\times\boldsymbol{A}) = \nabla(\boldsymbol{A}\cdot\boldsymbol{v}) - (\boldsymbol{v}\cdot\nabla)\,\boldsymbol{A},$$

so that the equation of motion acquires the vector form

$$m\,\frac{d\boldsymbol{v}}{dt} = -e\,\nabla\phi - e\,\frac{\partial\boldsymbol{A}}{\partial t} + e\,\boldsymbol{v}\times(\nabla\times\boldsymbol{A}). \qquad (6.7)$$

This is the equation of motion of a charged particle. We can see that the acceleration is not given directly by the potentials but by their derivatives (that is the reason why they are called potentials). Hence, we can introduce the vectors

$$\boldsymbol{E} = -\nabla\phi - \frac{\partial\boldsymbol{A}}{\partial t}, \qquad \boldsymbol{B} = \nabla\times\boldsymbol{A}, \qquad (6.8)$$

in which case we can write equation (6.7) in the form

$$m\,\frac{d\boldsymbol{v}}{dt} = e\,(\boldsymbol{E} + \boldsymbol{v}\times\boldsymbol{B}), \qquad (6.9)$$

which is the law for the Lorentz force (6.4). For the sake of completeness we list the Cartesian components of equation (6.9):

$$m\,\frac{dv_x}{dt} = e\,E_x + e\,v_y B_z - e\,v_z B_y,$$
$$m\,\frac{dv_y}{dt} = e\,E_y + e\,v_z B_x - e\,v_x B_z, \qquad (6.10)$$
$$m\,\frac{dv_z}{dt} = e\,E_z + e\,v_x B_y - e\,v_y B_x.$$
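As a consistency check, the passage from the Lagrangian (6.5) to the equations of motion can be reproduced in Mathematica. The sketch below uses the standard add-on package VariationalMethods` and specializes, for readability, to the potentials φ = 0, A = (0, Bx, 0) of a homogeneous magnetic field (the same potentials appear in (6.16) below):

Needs["VariationalMethods`"]
r = {x[t], y[t], z[t]};
A = {0, B x[t], 0};                        (* homogeneous field B along the z-axis *)
lag = m/2 D[r, t].D[r, t] + e D[r, t].A;   (* Lagrangian (6.5) with φ = 0 *)
EulerEquations[lag, r, t]
(* returns equations equivalent to m x'' = e B y', m y'' = -e B x', m z'' = 0,
   i.e. (6.10) with E = 0 and B = (0, 0, B) *)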

Having derived the Lagrange equations of motion of a charged particle in an external electromagnetic field, we now turn to the Hamiltonian description of the same problem. Proceeding in the standard way we introduce the generalized momentum by

$$p_i = \frac{\partial L}{\partial \dot x_i} = m\,\dot x_i + e\,A_i. \qquad (6.11)$$

In order to find the Hamiltonian we invert this relation to find

$$\dot x_i = \frac{p_i}{m} - \frac{e}{m}\,A_i. \qquad (6.12)$$

Notice that although we are working in Cartesian coordinates, the generalized momentum

$$\boldsymbol{p} = m\,\boldsymbol{v} + e\,\boldsymbol{A}$$

is different from the linear momentum mv. The Hamiltonian is then given by the Legendre transformation of the Lagrangian,

$$H = \dot x_i\, p_i - L, \qquad (6.13)$$

where we must, however, express the velocities ẋ_i in terms of the generalized momenta using (6.12). After simple rearrangements we find

$$H = \frac{1}{2m}\left(\boldsymbol{p} - e\,\boldsymbol{A}\right)^2 + e\,\phi. \qquad (6.14)$$

Let us now differentiate the Hamiltonian with respect to the coordinates and momenta,

$$\frac{\partial H}{\partial x_i} = -\frac{e}{m}\left(p_j - e\,A_j\right)\partial_i A_j + e\,\partial_i\phi, \qquad \frac{\partial H}{\partial p_i} = \frac{1}{m}\left(p_i - e\,A_i\right),$$

from which the Hamilton equations follow:

$$\dot x_i = \frac{1}{m}\left(p_i - e\,A_i\right), \qquad \dot p_i = \frac{e}{m}\left(p_j - e\,A_j\right)\partial_i A_j - e\,\partial_i\phi. \qquad (6.15)$$

Hamilton's equations (6.15) can easily be implemented in Mathematica. Although the following code may look a bit complicated, it is in fact very straightforward. We implement a function

HamiltonEM[φ, A]

which takes as its arguments the scalar potential and the vector potential. This function then produces the list of six Hamilton's equations for the particle in the electromagnetic field.


HamiltonEM[Φ_, A_] :=
 Module[{xs, ps, eqs1, eqs2, dependencies, DA, DΦ},
  xs = {x, y, z};
  ps = {p1, p2, p3};
  dependencies = {x -> x[t], y -> y[t], z -> z[t],
    p1 -> p1[t], p2 -> p2[t], p3 -> p3[t]};
  (* first line of (6.15): x'[t] - (p - e A)/m == 0 *)
  eqs1 = Equal @@@ Transpose[
     {D[xs /. dependencies, t] -
       1/m ((ps - e A) /. dependencies), {0, 0, 0}}];
  DΦ = D[Φ, #] & /@ xs;   (* gradient of the scalar potential *)
  DA = D[A, #] & /@ xs;   (* matrix of the derivatives ∂i Aj *)
  (* second line of (6.15): p'[t] - (e/m)(DA.p - e DA.A) + e ∇Φ == 0 *)
  eqs2 = Equal @@@ Transpose[
     {D[ps /. dependencies, t] -
       ((e/m (DA.ps - e DA.A) - e DΦ) /. dependencies), {0, 0, 0}}];
  Flatten[{eqs1, eqs2}]]
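Before solving the equations numerically we store them in a symbol eqs; for the potentials (6.16) used below, the call presumably reads

eqs = HamiltonEM[0, {0, B x, 0}];   (* a sketch: the explicit invocation is assumed *)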

6.4 Homogeneous fields

As a first example we consider the motion of a charged particle in a homogeneous magnetic field B without the presence of an electric field, i.e.

$$\boldsymbol{E} = 0, \qquad \boldsymbol{B} = \text{constant}.$$

Since the magnetic and electric fields are assumed to be constant (or even vanishing), the potentials obviously do not depend on time, so that

$$\boldsymbol{E} = -\nabla\phi, \qquad \boldsymbol{B} = \nabla\times\boldsymbol{A}.$$

The electric field vanishes and so, by the first of these equations, the potential φ is constant and can be set to zero without loss of generality. The remaining equation B = ∇ × A in component form reads

$$B_x = \partial_y A_z - \partial_z A_y, \qquad B_y = \partial_z A_x - \partial_x A_z, \qquad B_z = \partial_x A_y - \partial_y A_x.$$

It is possible to find the solution for an arbitrary direction of the magnetic field, but for convenience we choose a coordinate system in which B points along the z-axis,

$$\boldsymbol{B} = (0, 0, B).$$

This can always be achieved by an appropriate rotation. With this choice we have

$$0 = \partial_y A_z - \partial_z A_y, \qquad 0 = \partial_z A_x - \partial_x A_z, \qquad B = \partial_x A_y - \partial_y A_x.$$

Since the field is homogeneous, we can look for a potential which does not depend on z, so that all the partial derivatives ∂_z drop out:

$$0 = \partial_y A_z, \qquad 0 = -\partial_x A_z, \qquad B = \partial_x A_y - \partial_y A_x.$$

The first two equations tell us that A_z does not depend on x and y and hence is a constant. However, this constant does not enter the expression for B in the third equation, and thus we can set A_z = 0. The equation for B can be solved, for example, by setting

$$A_x = 0, \qquad A_y = B\,x.$$

Summa summarum, the potentials φ and A representing the homogeneous magnetic field parallel to the z-axis can be written in the form

$$\phi = 0, \qquad \boldsymbol{A} = (0, B\,x, 0). \qquad (6.16)$$

The reader can check that ∇ × A = (0, 0, B). Of course, the choice of the potentials is not unique and we have chosen the simplest possibility.
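This check takes one line in Mathematica; a sketch using the same VectorAnalysis` package as in section 6.5 below:

Needs["VectorAnalysis`"]
SetCoordinates[Cartesian[x, y, z]];
Curl[{0, B x, 0}]   (* -> {0, 0, B} *)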

In the following code we generate the set of Hamilton's equations by invoking the function HamiltonEM defined above and set the initial conditions to

$$x_0 = 1, \quad y_0 = 0, \quad z_0 = 0, \quad p_{1,0} = 0, \quad p_{2,0} = 2, \quad p_{3,0} = 0.$$

The numerical values of the constants are set to m = B = e = 1.

In[2]:=
vals = {m -> 1, B -> 1, e -> 1};
initConds = {x[0] == 1, y[0] == 0, z[0] == 0,
   p1[0] == 0, p2[0] == 2, p3[0] == 0};
tmax = 20;
sol = NDSolve[Join[eqs, initConds] /. vals,
   {x[t], y[t], z[t], p1[t], p2[t], p3[t]}, {t, 0, tmax}];


Now we plot the solution. All the plotting options can be ignored; they serve only to improve the quality of the plot.

In[114]:=
g1 = ParametricPlot3D[{x[t], y[t], z[t]} /. First[sol], {t, 0, tmax},
   AxesOrigin -> {0, 0, 0}, Boxed -> False,
   PlotRange -> {{-1, 3.5}, {-1, 2}, {-1, 1}},
   Ticks -> {Range[-1, 3, 1], Range[-1, 2, 1], {-1, 1}},
   BaseStyle -> {FontFamily -> "Times New Roman", FontSize -> 15},
   ViewPoint -> {1, 1, 1}];
g2 = Graphics3D[{Text[Style["x", 15], {3.5, 0.2, 0}],
    Text[Style["y", 15], {-0.2, 2, 0}],
    Text[Style["z", 15], {-0.05, 0.1, 1}]}];
Show[g1, g2]

(figure: circular trajectory of the particle in the x-y plane)

We can see that the trajectory of the particle is a circle of radius 1 centered at the position (2, 0, 0). This is a familiar property of the magnetic field: the field does not perform work on the particle, it only changes the direction of its motion. Since the magnetic force is always orthogonal to the velocity, the resulting trajectory is a circle.
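The radius can also be predicted analytically from the constants and initial conditions above: by (6.12) the initial velocity is v_y = (p_{2,0} − e B x_0)/m = 2 − 1 = 1, and a charge moving with speed v⊥ perpendicular to a homogeneous field B circles with the Larmor radius

$$r = \frac{m\,v_\perp}{e\,B} = 1,$$

in agreement with the plot.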

Now suppose that we add an initial velocity in the z-direction, e.g. we set p_{3,0} = 0.1. That means that the initial velocity is no longer orthogonal to the magnetic field B, but the v_z-component of the velocity does not affect the magnetic force. Hence, in addition to the circular motion, the charge will move uniformly in the z-direction. The resulting trajectory of the particle is called a helix (in order to obtain this figure in Mathematica, do not forget to adjust the range on the z-axis).

(figure: helical trajectory of the particle)

Let us consider another example. Suppose that in addition to the magnetic field there is a homogeneous electric field

$$\boldsymbol{E} = (0, 0, E)$$

in the direction of the z-axis. This field is again time-independent, and thus the equation for the scalar potential reads

$$\boldsymbol{E} = -\nabla\phi$$

or, in components,

$$\frac{\partial\phi}{\partial x} = 0, \qquad \frac{\partial\phi}{\partial y} = 0, \qquad \frac{\partial\phi}{\partial z} = -E,$$

from which we find

$$\phi = -E\,z.$$

The corresponding code (the amplitude E is denoted E0, since E is a reserved symbol in Mathematica):

In[248]:=
vals = {m -> 1, B -> 1, e -> 1, E0 -> 0.01};
initConds = {x[0] == 1, y[0] == 0, z[0] == 0,
   p1[0] == 0, p2[0] == 2, p3[0] == 0};
tmax = 100;
(* the equations are regenerated with the new scalar potential;
   the concrete call shown here is one possible choice *)
eqs = HamiltonEM[-E0 z, {0, B x, 0}];
sol = NDSolve[Join[eqs, initConds] /. vals,
   {x[t], y[t], z[t], p1[t], p2[t], p3[t]}, {t, 0, tmax}];

In this case the motion of the particle consists of a uniform circular motion in the plane z = constant and a uniformly accelerated motion in the direction of the z-axis.

(figure: trajectory of the particle, a helix with increasing pitch along the z-axis)

6.5 Electromagnetic wave

In this section we consider a harmonic electromagnetic plane wave propagating in the direction of the x-axis. The electric field is assumed to have the form

$$\boldsymbol{E}(t, x) = \left(0, 0, E_0 \cos(t - x)\right),$$


i.e. it has only a z-component; E0 is the amplitude of the electric field. The electric field is related to the potentials via

$$\boldsymbol{E} = -\nabla\phi - \frac{\partial\boldsymbol{A}}{\partial t}.$$

Let us set φ = 0, so that

$$\boldsymbol{E} = -\frac{\partial\boldsymbol{A}}{\partial t}.$$

This equation can be integrated to find the vector potential in the form

$$\boldsymbol{A} = -\int \boldsymbol{E}\, dt = \left(0, 0, -E_0 \sin(t - x)\right).$$

The magnetic field is then

$$\boldsymbol{B} = \nabla\times\boldsymbol{A} = \left(0, -E_0\cos(t - x), 0\right).$$

We can see that the magnetic field points along the y-axis and hence is orthogonal to the electric field, which is a general property of electromagnetic waves. The derivation performed above can be done in Mathematica using the following commands:

In[11]:=
Needs["VectorAnalysis`"]
El[t_, x_] = {0, 0, E0 Cos[t - x]};
A = -Integrate[El[t, x], t];
B = Curl[A /. x -> Xx] /. Xx -> x

Out[14]=
{0, -E0 Cos[t - x], 0}

The Hamilton equations are then generated and solved numerically:

eqs = HamiltonEM[0, A];

In[211]:=
vals = {m -> 1, E0 -> 1, e -> 1};
initConds = {x[0] == 1, y[0] == 0, z[0] == 0,
   p1[0] == 0, p2[0] == 0, p3[0] == 0};
tmax = 100;
sol = NDSolve[Join[eqs, initConds] /. vals,
   {x[t], y[t], z[t], p1[t], p2[t], p3[t]}, {t, 0, tmax}];


and plotted by

In[216]:=
g1 = ParametricPlot3D[{x[t], y[t], z[t]} /. First[sol], {t, 0, tmax},
   BaseStyle -> {FontFamily -> "Times New Roman", FontSize -> 15},
   ViewPoint -> {1, 1, 1}];
g2 = Graphics3D[{Text[Style["x", 15], {-5, 0.5, 0}],
    Text[Style["y", 15], {0, 1.5, 0}],
    Text[Style["z", 15], {0, 0, -2.2}]}];
g = Show[g1, g2]

(figure: trajectory of the charged particle in the electromagnetic wave)

6.6 Electrostatic wave

Relations (6.8) hold in general: it can be shown directly from Maxwell's equations that the electric and magnetic fields can always be written in the form (6.8). However, there are situations too complicated to be analyzed in this way. For example, the electromagnetic field in a plasma is a complicated result of the interaction of external electromagnetic fields with the fields produced by the particles comprising the plasma. In such situations we usually cannot find the electromagnetic fields as exact solutions of Maxwell's equations and some simplifications are necessary. One can imagine an external homogeneous magnetic field penetrating the plasma and, in addition, an electrostatic wave propagating in the plasma. We have seen that an electric wave described by a time-dependent vector potential is always accompanied by a magnetic field given by the curl of this potential. Thus, any electric wave must be accompanied by a magnetic wave, as we saw in the previous section.

On the other hand, in a plasma it is possible for an electric wave to propagate through the medium without generating an accompanying magnetic wave, which is a consequence of the complicated interactions mentioned above. In this case we can proceed in the following way: we assume the presence of a homogeneous magnetic field B and, in addition, of an electrostatic wave E. For example, we can take B along the x-axis together with an electrostatic wave of the same form as in the previous section.

These fields cannot be described by the same vector potential, and hence the equations of motion cannot be derived from any potential. Nevertheless, with this prescription we can write down the usual Newtonian equation of motion

$$m\,\frac{d\boldsymbol{v}}{dt} = e\,(\boldsymbol{E} + \boldsymbol{v}\times\boldsymbol{B})$$

and solve it numerically. Appropriate Mathematica code reads (with m = e = 1; the concrete form of the wave El below is one possible choice, consistent with the previous section):

In[23]:=
El = {0, 0, Cos[t - x[t]]};   (* electrostatic wave evaluated at the particle *)
B = {1, 0, 0};
r[t_] = {x[t], y[t], z[t]};
eqs = Flatten[{Equal @@@ Transpose[{r''[t], El + Cross[r'[t], B]}],
    x[0] == 0, y[0] == 0, z[0] == 0,
    x'[0] == 0, y'[0] == 0, z'[0] == 0}];
sol = NDSolve[eqs, r[t], {t, 0, 100}]
ParametricPlot[{y[t], z[t]} /. sol, {t, 0, 100}]


Here we have set the initial velocity to zero. The trajectory is found to be a spiral.

(figure: spiral trajectory of the particle in the y–z plane)

7

Discrete dynamical systems and fractals

This chapter is a digression from the main line of exposition but, first, discrete dynamical systems provide a simple model of the more complicated continuous dynamical systems which we will study later and, second, we will plot nice pictures called fractals and get some insight into the complicated nature of chaotic systems.

7.1 Complex sequences

We start our discussion with one of the most famous examples of fractals, the Mandelbrot set, which is very easy to plot using Mathematica. Let us choose an arbitrary point z0 ∈ C in the complex plane and define a sequence of complex numbers by the recurrence relation

$$z_{n+1} = f(z_n) + z_0,$$

where f(z) = z². Thus, starting from a given z0, the members of this sequence read

$$z_1 = f(z_0) + z_0 = z_0^2 + z_0,$$
$$z_2 = f(z_1) + z_0 = z_0^4 + 2z_0^3 + z_0^2 + z_0,$$
$$\cdots$$

We can use Mathematica to generate the members of this sequence, for instance with the command NestList[#^2 + z0 &, z0, 5] // Expand (the same construction is used in the function seq defined below), which produces


$$z_0 = z_0,$$
$$z_1 = z_0^2 + z_0,$$
$$z_2 = z_0^4 + 2z_0^3 + z_0^2 + z_0,$$
$$z_3 = z_0^8 + 4z_0^7 + 6z_0^6 + 6z_0^5 + 5z_0^4 + 2z_0^3 + z_0^2 + z_0,$$
$$z_4 = z_0^{16} + 8z_0^{15} + 28z_0^{14} + 60z_0^{13} + 94z_0^{12} + 116z_0^{11} + 114z_0^{10} + 94z_0^{9} + 69z_0^{8} + 44z_0^{7} + 26z_0^{6} + 14z_0^{5} + 5z_0^{4} + 2z_0^{3} + z_0^{2} + z_0,$$
$$z_5 = z_0^{32} + 16z_0^{31} + 120z_0^{30} + 568z_0^{29} + 1932z_0^{28} + 5096z_0^{27} + 10948z_0^{26} + 19788z_0^{25} + 30782z_0^{24} + 41944z_0^{23} + 50788z_0^{22} + 55308z_0^{21} + 54746z_0^{20} + 49700z_0^{19} + 41658z_0^{18} + 32398z_0^{17} + 23461z_0^{16} + 15864z_0^{15} + 10068z_0^{14} + 6036z_0^{13} + 3434z_0^{12} + 1860z_0^{11} + 958z_0^{10} + 470z_0^{9} + 221z_0^{8} + 100z_0^{7} + 42z_0^{6} + 14z_0^{5} + 5z_0^{4} + 2z_0^{3} + z_0^{2} + z_0.$$

Obviously, the complexity of each term z_n grows very quickly with increasing n. It is instructive to see the behaviour of the sequence graphically. Hence, we choose some particular z0 and plot a few terms z_n of the sequence starting from z0. Let us define the following functions:

seq[z0_, n_] := NestList[ #^2 + z0 &, z0, n] // Expand

list[z0_, n_] := {Re[#], Im[#]} & /@ seq[z0, n]

The first definition introduces a function which generates the list of the first n members of the sequence z_n. For example, the command seq[I, 10] generates the list of ten members of the sequence starting at the point z0 = i:

{i, −1 + i, −i, −1 + i, −i, −1 + i, −i, −1 + i, −i, −1 + i, −i}.

However, we cannot plot complex numbers directly and so we must convert each complex number z = x + iy into the pair of coordinates (x, y). This is accomplished by the function list. We define a pure function

{Re[#], Im[#]}&

which splits its argument into the real and imaginary parts, and then we apply this pure function to all elements of the list seq[z0,n]. Using the previous example, the command list[I, 10] produces

{{0, 1}, {−1, 1}, {0, −1}, {−1, 1}, {0, −1}, {−1, 1}, {0, −1}, {−1, 1}, {0, −1}, {−1, 1}, {0, −1}}.


Notice that this sequence is periodic: apart from the starting point i, the sequence jumps from −1 + i to −i and back, infinitely.

The list produced by list can now be plotted by ListLinePlot. Let us plot the list list[I,10] by

ListLinePlot[ {list[I, 10]},

PlotRange -> Full, AxesOrigin -> {0, 0}, AspectRatio -> 1,

PlotMarkers -> Automatic,

PlotStyle -> { {Blue} },

BaseStyle -> {FontFamily -> "Times New Roman", FontSize -> 13}

]

The expected result is plotted in figure 7.1.

Now let us choose a different starting point close to the original point i, e.g. z0 = 0.8i, and construct the first ten members of the sequence again. We can compare both trajectories using the following code:

ListLinePlot[ {list[I, 10], list[0.8 I, 10]},
  PlotRange -> Full, AxesOrigin -> {0, 0}, AspectRatio -> 1,
  PlotMarkers -> Automatic,
  BaseStyle -> {FontFamily -> "Times New Roman", FontSize -> 13}
]

We can see in figure 7.2 that the behaviour of the sequence has changed significantly: it is no longer periodic and, moreover, it exhibits unpredictable behaviour. We could guess that if we choose the starting point z0 = 0.9i we obtain a sequence "somewhere between" the sequences starting from i and 0.8i. The reader is invited to plot the result for z0 = 0.9i; here we just present the list of points produced by list[0.9I,10]:

{{0., 0.9}, {−0.81, 0.9}, {−0.1539, −0.558}, {−0.287679, 1.07175}, {−1.06589, 0.283359}, {1.05584, 0.295938},

{1.02721, 1.52493}, {−1.27023, 4.03285}, {−14.6504, −9.34529}, {127.3, 274.725}, {−59268.4, 69945.6}}

Obviously, this sequence is not bounded and it escapes to infinity very quickly.

What conclusion can be drawn from the examples above? What we have actually seen is the most characteristic property of chaotic systems: sensitivity to initial conditions. A particular choice of the starting point z0 corresponds to imposing the initial condition. We have seen three sequences starting from points close to each other: i, 0.9i and 0.8i. In non-chaotic systems, if we change the initial positions slightly, the solution also changes only slightly. In chaotic systems, the behaviour can differ drastically even for very similar initial conditions. In our examples, the first sequence was periodic, the second was unpredictable and the third one was diverging to infinity.

In the case of Hamiltonian systems we were able to visualize the possible behaviour of the system by the method of phase portraits. Phase trajectories of the harmonic oscillator were circles; phase trajectories of the pendulum were more complicated and we revealed the existence of two types of periodic motion (open and closed curves) separated by the separatrix. For chaotic systems it is usually impossible to plot a phase portrait because the trajectories are very complicated and irregular; figure 7.3, for illustration, is certainly not very useful.

However, in order to visualize the extreme sensitivity to initial conditions, it is not important to see all kinds of trajectories; the qualitative behaviour of the trajectories is more interesting. Each sequence can either stay in a bounded region or escape to infinity. We cannot inspect the asymptotic behaviour of a particular sequence in finitely many steps, but we can choose a fixed radius R and investigate whether the sequence stays inside the region bounded by the circle of radius R or whether it escapes the circle after some number of steps. In such a way we can assign a number to each point of the plane. Let us describe the algorithm more precisely.


Fig. 7.2. Comparison of two sequences with close starting points i and 0.8i.

Fig. 7.3. Ten sequences starting from initial points of the form z0 = x + 0.8I, x ∈ (−0.5, 0.5).


Parameters of the algorithm are the radius R > 0 and the maximum number of steps nmax. We choose a point z0 = x0 + iy0 ∈ C and construct the sequence zn starting from this point. If |zn| > R, the algorithm stops and returns the value n. If |zn| < R, we compute zn+1 and repeat the procedure. If n > nmax, the algorithm stops and returns the value nmax. In this way we assign an integer to each point z0 of the complex plane or, equivalently, to each point (x0, y0) of the usual Euclidean plane.

Let us see how this algorithm can be implemented in Mathematica. In usual procedural languages we would use some kind of cycle like for or while. In Mathematica, these cycles can still be used, but functional methods are more satisfactory; in this case we use function NestWhileList. Function Mandelbrot implementing the algorithm described above follows:

Mandelbrot[x_, y_,

OptionsPattern[{MaxRadius -> 100, MaxSteps -> 50}]] :=

Module[ {c, R, n},

c = x + I y;

R = OptionValue[MaxRadius];

n = OptionValue[MaxSteps];

Length[NestWhileList[ N[#^2 + c] &, c, (Abs[#] < R) &, 1, n]]

]

The head of the function tells Mathematica that the function has two obligatory parameters x and y – these are the coordinates of the initial point (x0, y0) in the plane. Moreover, the function accepts optional arguments specifying its behaviour. In our case, the optional parameters are the maximum radius R with default value 100 and the maximum number of steps with default value 50. If we call the function without specifying optional parameters, e.g.

Mandelbrot[ 1, 3 ],

default values are used. If we want to change these values, we call the function in

the form, e.g.

Mandelbrot[1, 3, MaxRadius -> 20, MaxSteps -> 1000]

Reader should be familiar with this notation as it is used in many predefined functions

in Mathematica.
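The option mechanism itself can be illustrated on a tiny standalone function; the function greet and its option Greeting below are our own illustrative names, not part of the Mandelbrot code:

(* a minimal sketch of OptionsPattern/OptionValue with a default value *)
greet[name_, OptionsPattern[{Greeting -> "Hello"}]] :=
  StringJoin[OptionValue[Greeting], ", ", name]

greet["world"]                     (* -> "Hello, world" *)
greet["world", Greeting -> "Hi"]   (* -> "Hi, world" *)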

Then we define three local variables c, R and n. Variable c represents the initial

point because we set

c = x + i y.


Variables R and n are set to the values of parameters MaxRadius and MaxSteps and

we introduce them only to increase the readability of the code. The core of function

Mandelbrot is in the last command. Function

NestWhileList[ N[#^2 + c] &, c, (Abs[#] < R) &, 1, n]

applies the pure function #^2 + c &, which is our iteration z ↦ z² + z0, to the initial value c repeatedly. The call of function N is included in order to obtain just a numerical value of the result instead of the exact value, which would take a long time to compute and occupy a lot of memory (the reader is invited to remove this call to see the difference). Function NestWhileList stops when the condition, specified again as a pure function, is violated. In our case, the condition |zn| < R is typed as the pure function (Abs[#] < R)&. The next parameter of NestWhileList specifies how many recent results of the nested call should be inserted into the test. Here we want to test only the last result and hence set this parameter to 1. The last parameter n specifies the maximum number of calls.

The result of NestWhileList is the sequence of numbers zn which stops if |zn| > R or if n > nmax. The point is that this command returns the list of all members of the generated sequence, so by taking its length we find how long the sequence is. This number is then the result of function Mandelbrot.

Finally we can visualize function Mandelbrot using

DensityPlot[

Mandelbrot[x, y], {x, -1.5, 0.5}, {y, -1.3, 1.3},

PlotPoints -> 100]

Function DensityPlot serves to visualize functions of two variables not by plotting a

three-dimensional graph but by assigning a color to each point (x, y) depending on

the value of the function to be plotted. The result is shown in figure 7.4 and is known

as the Mandelbrot set.

The meaning of regions with different colors can be understood easily. For example, if we choose zero to be the initial point, z0 = 0, then all members of the sequence must be zero, for we have z_n = z_{n−1}^2 + 0 = 0. In other words, the sequence stays at

point zero for all n and therefore function Mandelbrot will stop only after maximum

number of steps have been reached. Indeed, typing

Mandelbrot[0, 0]

yields the result 51. That means that after 50 steps the sequence was still in the

circle of radius R. We can see that the neighbourhood of zero is plotted in white

color in figure 7.4. Hence, white regions correspond to high values of the function

Mandelbrot. Blue color, on the other hand, represents regions where the values of the


function are small and so the sequence escapes the circle of radius R very soon. For

example, at point (−1.5, 1) the value of

Mandelbrot[-1.5, 1]

is equal to 5 which means that the sequence escapes the circle after 5 steps.

It is natural to expect that small numbers close to zero yield bounded sequences while numbers distant from zero yield rapidly diverging sequences. An unexpected feature of this construction is the existence of a boundary between the blue and white regions which exhibits a highly non-trivial structure. This boundary is obviously irregular, but when we zoom into it, we find a kind of self-similarity: at each scale we observe a similar shape of the boundary. In figure 7.5 we plot the boundary of the Mandelbrot set at different zooms.
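Since the whole construction is encoded in the function Mandelbrot, one can zoom into the boundary simply by shrinking the plot range. The window below is an arbitrary choice of ours; more iteration steps are needed to resolve the fine structure:

DensityPlot[ Mandelbrot[x, y, MaxSteps -> 200],
  {x, -0.2, 0.1}, {y, 0.7, 1.0}, PlotPoints -> 200]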

This complicated structure of the Mandelbrot set corresponds to the behaviour observed in the previous section. Two different but close points give rise to sequences with very different behaviour: one sequence remains bounded while the other one escapes to infinity. Thus, in this sense, the Mandelbrot set visualizes the extreme sensitivity of the sequence zn to the choice of the initial point.

Fig. 7.5. Mandelbrot set on different scales.

8

Dynamical systems

In the previous chapters we derived the Lagrange equations and have seen that these equations are equivalent to the original Newton's law of force F = ma if the force F can be written as a gradient of the potential, i.e. if the force F is conservative. On the other hand, we have seen that the electromagnetic field is not conservative but the motion of a charged particle can still be described by the Lagrangian and consequently by the Hamiltonian.

Hamilton’s equations

q̇a = ∂H/∂pa,  ṗa = −∂H/∂qa

are first-order ordinary differential equations and we have seen that such a system of equations can be given a geometrical interpretation in the phase space. In fact, using the conservation of energy we were able to plot the phase trajectories even without actually solving the equations of motion. In this chapter we study a more general system of equations in which the right hand side is not derived from a Hamiltonian but is a general function. We will see that any second-order equation of motion can be written as a system of first-order equations; we will no longer be restricted to conservative systems, while the geometrical interpretation of the phase trajectories will be preserved. Dynamical systems provide an appropriate framework for studying all kinds of physical systems including those with friction or time-dependent external forces.

8.1 Definition

Dynamical system is a set of n first-order ordinary differential equations of the form

ẋ1(t) = f1(x1(t), x2(t), . . . , xn(t), t),
ẋ2(t) = f2(x1(t), x2(t), . . . , xn(t), t),
⋮
ẋn(t) = fn(x1(t), x2(t), . . . , xn(t), t),   (8.1)

where xa = xa(t) are unknown functions of time, a = 1, 2, . . . , n, and fa are arbitrary differentiable functions of the variables xa and possibly of time t. If the functions fa do not depend on time explicitly, the dynamical system is called autonomous, otherwise it is called non-autonomous. Using the index notation, dynamical system (8.1) can be written briefly in the form

ẋa = fa(x, t)   (8.2)

where x stands for the n-tuple of variables xa. An autonomous system is then

ẋa = fa(x).

In this notation we suppress the dependence of xa on time because this dependence is assumed implicitly.

Motivated by Hamilton’s formalism, we intend to interpret the solution xa = xa (t)

as the motion in the phase space. Phase space is an abstract space1

M = Rn [x1 , x2 , . . . xn ]

with coordinates xa . Arbitrary point x ∈ M represents the state of physical system

described by equations (8.1). Solution of dynamical system is not unique unless we

specify the initial conditions, i.e. values of coordinates xa at some given initial time

t0 ,

x10 = x1 (t0 ), ... xn0 = xn (t0 ).

Usually we set t0 = 0. The n−tuple of initial coordinates xa0 will be denoted simply

by x0 ∈ M .

Suppose we choose a point x0 ∈ M at time t0 = 0 as in figure 8.1. A mathematical

theorem guarantees that there exists unique solution x = x(t) satisfying (8.1) such

that x(0) = x0 . The solution x = x(t) is also called the phase trajectory. Equations

(8.1) essentially state that vector f (x(t)) evaluated at arbitrary point of the trajec-

tory is in fact tangent to the trajectory, see figure 8.1. Hence, we can interpret vector

field f (x) as a velocity. Although it can be very difficult or even impossible to solve

the equations of motion (8.1), the velocity gives us a good idea about the behaviour

of the system.

1
Our definition is a simplification. In differential geometry, the phase space is defined as the cotangent bundle of the configuration manifold endowed with the canonical symplectic form ω = dqa ∧ dpa.



Fig. 8.1. Two-dimensional dynamical system of the form ẋa = fa . Initial position is at x0 = x(0).

The “velocity” vector at x0 is f (x0 ) and determines the trajectory of the system in the infinitesimal

neighbourhood of the initial point.

8.2 Example

Let us see an illustrative example. We are already familiar with the equation of the harmonic oscillator

θ̈ + θ = 0.

This is a second order equation but we can bring it into the first-order form by setting

x1 = θ, x2 = θ̇.

Then we have

ẋ1 = θ̇ = x2

and

ẋ2 = θ̈ = −θ = −x1 .


Hence, instead of a single second-order equation θ̈ + θ = 0, we now have two equations of first order

ẋ1 = x2 ,

(8.3)

ẋ2 = −x1 .

Clearly, this is a dynamical system (8.1) if we set f1 = x2 and f2 = −x1 . Thus, the

velocity field can be plotted by

f[x_, y_] = {y, -x};
VectorPlot[f[x, y], {x, -2, 2}, {y, -2, 2}]

The resulting figure shows the velocity field of the harmonic oscillator circulating around the origin.

This picture agrees with our previous analysis when we used the conservation of energy to show that the phase trajectories of the harmonic oscillator are circles (or ellipses when using SI units). Another possibility is to use function StreamPlot with the same arguments, which yields the stream lines of the velocity field, i.e. the phase trajectories themselves.

8.3 Implementation in Mathematica

In this section we show how to implement a dynamical system in Mathematica in a convenient way. We define function DynSys as follows:

DynSys[f_, IC_, tmax_] := Module[{vars, lhs, rhs, eqs, inConds},
  vars = Table[x[a][t], {a, 1, Length[IC]}];
  lhs = D[vars, t];
  rhs = f[Sequence @@ vars];
  eqs = Equal @@@ Transpose[{lhs, rhs}];
  inConds = Equal @@@ Transpose[{vars /. t -> 0, IC}];
  NDSolve[Join[eqs, inConds], vars, {t, 0, tmax}]
]

This code deserves a brief explanation. Arguments of the function DynSys are

• pure function f – this is a vector function representing the right hand side of dynamical system (8.1);
• IC – the list of initial values xa(0);
• tmax – upper bound of the interval t ∈ (0, tmax).

Hence, function DynSys can be called, e.g., with the arguments

DynSys[{#2, -#1} &, {1, 1}, 10]

where the pure function

{#2, -#1} &

is equivalent to f1 = x2, f2 = −x1. Clearly, this corresponds to harmonic oscillator (8.4). Initial conditions IC are set to

x1(0) = 1, x2(0) = 1

Now suppose that we called function DynSys with the arguments above and let

us explain how this function works. Thus, we assume that the arguments are

f = {#2, -#1}&

IC = {1, 1}

tmax = 10.

The first command

vars = Table[ x[a][t], {a, 1, Length[IC]}];

creates a list of variables xa (t) in the form

vars = { x[1][t], x[2][t] }.

The left hand side of equations ẋa (t) = fa (t) is generated simply by calling

lhs = D[vars, t]

which yields

lhs = { x[1]’[t], x[2]’[t] }.


Now we form the right hand side of the equations. Recall that vars is the list of variables. We want to evaluate the functions fa at point xa, i.e. we need the expression fa(x1, . . . , xn). However, if we applied f directly to the list vars, we would obtain

f[{x[1][t], x[2][t]}]

while what we need is

f[ x[1][t], x[2][t] ].

Hence, we must turn the list vars into the sequence of arguments by replacing its

head. Command

rhs = f[Sequence @@ vars].

leads to correct application of function f to arguments xa :

f[ Sequence @@ vars ]= f[ x[1][t], x[2][t] ]

= { x[2][t], -x[1][t]}.

Having defined the left hand side and the right hand side of dynamical system

separately, we join them in a usual way,

eqs = Equal @@@ Transpose[ {lhs, rhs} ]
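To see what this construction does, it may help to run it on the explicit lists of our example (a toy check, not part of DynSys itself):

Equal @@@ Transpose[{{x'[t], y'[t]}, {y[t], -x[t]}}]
(* -> {x'[t] == y[t], y'[t] == -x[t]} *)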

Next we define the initial conditions with values specified in argument IC={1,1}. We

need to produce the list

{ x[1][0] == 1, x[2][0] == 1 }

The left hand side of initial conditions consists of the elements of vars evaluated at

time 0,

vars /. t->0,

the right hand side consists of the elements of IC. We can join them together by

inConds = Equal @@@ Transpose[{vars /. t -> 0, IC}];

Finally we solve the list of equations of motion and initial conditions by NDSolve:

NDSolve[Join[eqs, inConds], vars, {t, 0, tmax}].

Now the reader should be familiar with the functionality of function DynSys; we use this function in the following examples.

To finalize this section we show how to use function DynSys to solve the motion of the harmonic oscillator.


With the arguments discussed above, a minimal session reads

sol = DynSys[{#2, -#1} &, {1, 1}, 10];
Plot[Evaluate[{x[1][t], x[2][t]} /. sol], {t, 0, 10}]

and produces the expected harmonic oscillations of x1(t) and x2(t).

8.4 Chaotic pendulum

In the previous chapters we introduced the mathematical pendulum as a simple example of a physical system which has only one degree of freedom (the angle of deflection θ) and is described by the Lagrange equation

θ̈ + sin θ = 0.

This equation is non-linear because of the presence of the sine. We have seen that

this equation cannot be solved in terms of elementary functions but we were able

to find the numerical solution. Moreover, using the Hamiltonian formalism we were

able to plot the phase trajectories without actually solving the equation of motion.

We can generalize the model of the mathematical pendulum in several ways. First, any realistic system is dissipative, i.e. there is a resisting force which acts against the motion and has the opposite direction. In the case of the pendulum, the resisting force is proportional to the angular velocity θ̇ and hence the equation of the pendulum with a resisting force has the form

θ̈ + b θ̇ + sin θ = 0,

where b is the constant characterizing the strength of resisting force. For example,

it can be related to the viscosity of the medium in which the pendulum moves.

Pendulum with resisting force is called damped pendulum.

Next we can assume that in addition to restoring gravitational force there is an

external force acting on the pendulum. Such force is called driving force. In the

presence of driving force, even if the initial velocity of the pendulum is zero (and the

pendulum is at equilibrium position), driving force will make the pendulum to move.

Resulting motion of the pendulum will be a ”mixture” of two motions: periodic

motion due to self-oscillations of the pendulum, and motion due to driving force.

The pendulum with both the driving force and the friction, the driven pendulum, is described by the equation

θ̈ + b θ̇ + sin θ = F0 sin Ωt,   (8.5)

where we assume that the driving force is harmonic with angular frequency Ω and amplitude F0. In the subsequent analysis we will show that this kind of pendulum exhibits chaotic behaviour and hence we also call it the chaotic pendulum.

We start with rewriting equation (8.5) in the form of dynamical system. This is

straightforward since we can define

x1 = θ, x2 ≡ p = θ̇, x3 = φ = Ωt.

We will freely pass from notation (θ, p, φ) to equivalent notation (x1 , x2 , x3 ) according

to the context. By definition, variable φ satisfies equation

φ̇ = Ω,

while variable p (which is clearly related to the momentum of the pendulum) was

defined by

θ̇ = p

which can be consequently regarded as an equation for θ. The only true dynamical

equation is an equation for p which follows from (8.5):


ṗ = F0 sin φ − b p − sin θ.

Altogether, equation (8.5) is equivalent to the dynamical system

θ̇ = p,
ṗ = F0 sin φ − b p − sin θ,   (8.6)
φ̇ = Ω.

Dynamical system (8.6) can be solved in Mathematica using function DynSys defined in the previous section. We choose initial conditions

θ(0) = π/4,  p(0) = 0,  φ(0) = 0

and investigate how the values of b, Ω and F0 affect the behaviour of the pendulum. First we set

b = 0,  F0 = 0,  Ω = 0,

i.e. we consider the pendulum without friction and driving force (the damping coefficient is b = 0 and the amplitude of the force is F0 = 0).

vals = {b -> 0, f0 -> 0, W -> 0};
tmax = 10;
sol = DynSys[{#2, f0 Sin[#3] - b #2 - Sin[#1], W} & /. vals, {Pi/4, 0, 0}, tmax]

The output is a list of replacement rules whose last entry reads

x[3][t] -> InterpolatingFunction[{{0., 10.}}, <>][t]

Here we have chosen tmax = 10 but the reader should adjust this parameter in order to reproduce all figures below. Now we can plot the phase trajectory in the usual way,

ParametricPlot[{x[1][t], x[2][t]} /. sol, {t, 0, tmax}]

obtaining a closed curve: without friction and driving force the oscillations are periodic.

Next, let us switch on the friction by setting

b = 0.1,

re-running the solution above; see figure 8.2 for the result. We can see that the phase trajectory is a spiral which, in the limit tmax → ∞, ends at the origin of the phase plane. This means that the oscillations are damped until the pendulum stops. A slightly more "fancy" picture can be obtained by

g1 = ParametricPlot[{x[1][t], x[2][t]} /. sol, {t, 0, tmax},
   AxesLabel -> {"θ", "p"}, Ticks -> None,
   BaseStyle -> {FontFamily -> "Times New Roman", FontSize -> 15}];
g2 = Plot[x[1][t] /. sol, {t, 0, tmax},
   AxesLabel -> {"t", "θ"}, Ticks -> None,
   BaseStyle -> {FontFamily -> "Times New Roman", FontSize -> 15}];
GraphicsRow[{g1, g2}]


Fig. 8.2. Parameters b = 0.1, F0 = 0. Non-zero friction leads to damped oscillations of the pendulum.


Fig. 8.3. Phase trajectory of damped pendulum together with the time dependence of deflection

θ = θ(t).


Let us see how the driving force affects the motion. For this purpose we set

b = 0,  F0 = 1,  Ω = 2,

with initial conditions θ(0) = p(0) = φ(0) = 0. In other words, the pendulum is initially at its equilibrium position and, hence, without the driving force it would stay at rest. However, the presence of the driving force leads to the solution plotted in figure 8.4.


Fig. 8.4. Motion of the pendulum without friction under the external driving force with amplitude

F0 = 1 and angular frequency Ω = 2. Initial position of the pendulum is θ(0) = p(0) = 0.

Finally, we switch on both the friction and the driving force by setting

b = 1,  F0 = 1,  Ω = 1,

see figure 8.5. An interesting feature of this solution is the presence of a short transient stage during which the phase trajectory follows an outgoing spiral but then settles on a circular periodic orbit. Such behaviour is called a limit cycle.
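A run reproducing the qualitative behaviour of figure 8.5 might look as follows; the value tmax = 100 is an arbitrary choice of ours, long enough for the transient to settle on the limit cycle:

sol = DynSys[{#2, 1.0 Sin[#3] - 1.0 #2 - Sin[#1], 1.0} &, {0, 0, 0}, 100];
ParametricPlot[{x[1][t], x[2][t]} /. sol, {t, 0, 100}, AxesLabel -> {"θ", "p"}]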

The reader is invited to experiment with the values of parameters b, F0 and Ω and with the initial values θ(0), p(0) and φ(0). We can see that the resulting motion of the pendulum is a consequence of a complicated and delicate interplay between three effects:



Fig. 8.5. Motion of the pendulum with friction (b = 1) under the external driving force with amplitude

F0 = 1 and angular frequency Ω = 1. Initial position of the pendulum is θ(0) = p(0) = 0.

• free oscillations of the pendulum;
• resisting force (friction);

• external driving force.

While for some combinations of parameters the motion is perfectly understandable

(like in the absence of the friction and the driving force or in the absence of driving

force but in the presence of friction), for general values the motion is unpredictable,

chaotic.

8.5 Critical points of the pendulum

Let us return to the equation of the pure mathematical pendulum

θ̈ + sin θ = 0,

or, in the form of a dynamical system,

θ̇ = p,  ṗ = − sin θ.   (8.7)

What are the possible equilibrium positions of the pendulum? Clearly, if we set

θ(0) = 0,  p(0) = 0,

the pendulum will not move. These conditions correspond to the situation when the pendulum is hanging freely at the equilibrium position with zero initial velocity. In this case the derivatives of θ and p take values

θ̇ = p = 0,  ṗ = − sin θ = − sin 0 = 0.

In other words, the derivatives of all variables xa, where x = (θ, p), vanish and therefore the pendulum does not move.

However, there is another possibility. If we set

θ(0) = π,  p(0) = 0,

the derivatives vanish again, since sin π = 0. This corresponds to the pendulum pointing upwards. Such a configuration can hardly be realized in practice, but if we were able to arrange the initial conditions in such a way that the angle of deflection θ is exactly equal to π and the velocity is zero, we would obtain an equilibrium configuration in which the pendulum does not move.

Points with these properties are called critical points or fixed points. In general, a critical point xC of dynamical system

ẋa = fa(x)

is a point at which the right hand side vanishes,

fa(xC) = 0.

If we choose the initial point as x(0) = xC, the system will remain at this initial position forever; it will not move. For the mathematical pendulum we have two critical points,

xC1 = (0, 0),  xC2 = (π, 0).
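For the pendulum, the critical points can also be found directly in Mathematica; restricting θ to one period is our own choice, made to obtain a finite list:

Reduce[{p == 0, Sin[θ] == 0, 0 <= θ < 2 Pi}, {θ, p}]
(* -> θ == 0 or θ == π, with p == 0, i.e. the two critical points xC1 and xC2 *)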

On the other hand, we feel that the two critical points of the mathematical pendulum have a different character. The first critical point xC1 is stable in the sense that a small perturbation results in periodic oscillations near this critical point. Critical point xC2 is unstable in the sense that an arbitrarily small perturbation will cause the pendulum to fall and results in oscillations around critical point xC1!

This observation is based on our physical intuition but can we predict stability

or instability of critical points directly from equations? By definition, critical points

represent equilibrium configurations of the system. Can we predict the behaviour of

the system near the critical point?


Fig. 8.6. Two critical points of mathematical pendulum.

The idea is that small perturbations of stable critical points will produce small

deviations from equilibrium position, but small perturbations of unstable critical

points will result in a motion far from the critical point. We will use a basic fact

from mathematical analysis that, under certain assumptions, function f (x) can be

expanded into the Taylor series around arbitrary point xC in the following way

f(xC + δ) = f(xC) + δ f′(xC) + (1/2) δ² f″(xC) + O(δ³),   (8.8)

where f′(x) denotes the value of the derivative of f at point x, f″(x) is the second derivative at x, etc.

First we analyse critical point xC1 = (0, 0). Let us denote the critical values of θ = x1 and p = x2 by

θC = 0,  pC = 0,

and assume that the deviation of θ from the critical value is small,

θ = θC + δ, where |δ| ≪ 1.

Since θC is a constant, we have

θ̇ = δ̇.

Next we simplify the equations of motion (8.7) under this assumption. Let us expand sin θ around the critical point θC = 0:

sin θ = sin(θC + δ) ≈ sin θC + δ cos θC = δ,

where we have neglected higher powers of δ, which is assumed to be small. With this assumption, the equations of the pendulum (8.7) simplify to

δ̇ = p,  ṗ = −δ.

These are, in fact, well-known equations for harmonic oscillator and we can easily

plot the solution which we already know is a circle. We can plot it by

sol = DynSys[ {#2, - #1} &, {0.1, 0}, 10];

ParametricPlot[ {x[1][t], x[2][t]} /. sol, {t, 0, 10}]

where we have chosen small initial deflection θ(0) = 0.1, in accordance with the

assumption. Since the solution is a circle, we observe that phase trajectories near

the first critical point remain in the vicinity of this critical point; an indicator of

stability.

Let us now investigate the second critical point located at

θC = π,  pC = 0.

As in the previous case, we assume that deviations from the critical value θC are small and write

θ = θC + δ = π + δ,  |δ| ≪ 1.

Expanding the sine around θC = π now gives

sin θ = sin(θC + δ) ≈ sin π + δ cos π = −δ,

and the linearized equations of motion read

δ̇ = p,  ṗ = δ.

In order to compare the solutions near both critical points we use the following code:

tmax = 2 Pi;
Needs["PlotLegends`"]
sol1 = DynSys[{#2, -#1} &, {0.1, 0}, tmax];
sol2 = DynSys[{#2, #1} &, {0.1, 0}, tmax];
ParametricPlot[{{x[1][t], x[2][t]} /. sol1, {x[1][t], x[2][t]} /. sol2},
  {t, 0, tmax}, PlotRange -> {{-0.5, 2}, {-0.5, 2}},
  AxesLabel -> {δ, p}, PlotStyle -> {Red, Blue}, BaseStyle -> {FontSize -> 15},
  PlotLegend -> {"Critical point θC = 0", "Critical point θC = π"},
  LegendPosition -> {-0.5, 1}]

Both trajectories near the critical points are plotted in figure 8.7. We can see that the trajectory corresponding to the first critical point θC = 0 is a circle and thus remains in the vicinity of the critical point. The second trajectory, corresponding to the critical point θC = π, on the other hand, is a line which escapes to infinity. Hence, we can see that the second critical point is unstable in the following sense. If we move the pendulum to θ = π and set the initial velocity to zero, the pendulum remains at this equilibrium position. However, an arbitrarily small perturbation (in our case δ = 0.1) will cause the pendulum to escape from the equilibrium position quickly. In our case the trajectory escapes to infinity, but this is an artefact of the linearization: we have assumed that the perturbation δ is small, but as soon as the pendulum is far enough from the critical point, this assumption is not valid anymore.

8.6 Stability of critical points

Having illustrated the main idea of stability and instability on the example of the mathematical pendulum, we can proceed to a general theory. For simplicity we restrict ourselves to autonomous planar dynamical systems, i.e. dynamical systems with only two variables x1 = x and x2 = y which can be visualised in the plane. Hence, a planar dynamical system is a set of two first-order equations of the form

ẋ = fx(x, y),  ẏ = fy(x, y).   (8.9)

A critical point (xC, yC) of this system is a point at which

ẋ(xC, yC) = 0,  ẏ(xC, yC) = 0,   (8.10)


Fig. 8.7. Phase trajectories near two critical points θC = 0 and θC = π. In both cases the actual

deflection is θ = θC + δ but with different θC . We can see that the red trajectory is a circle about the

origin while the blue trajectory diverges to infinity rapidly.

which means that critical points represent the equilibrium configurations of the sys-

tem.

Now we want to investigate the stability or instability of critical points. That

means we want to find out how the phase trajectories behave in the vicinity of

critical points. In the case of the pendulum we have seen that an appropriate way

how to proceed is to linearize the system of equations near the critical point.

Let us assume that (xC, yC) is a critical point of system (8.9). In the neighbourhood of the critical point we can write

x = xC + δ, |δ| ≪ 1,
y = yC + ε, |ε| ≪ 1.   (8.11)

Since xC and yC are constants, for the time derivatives of x and y we have

ẋ = δ̇, ẏ = ε̇.

Function fx(x, y) can then be expanded into the Taylor series:

fx(x, y) = fx(xC + δ, yC + ε) = fx(xC, yC) + δ (∂fx/∂x)|(xC,yC) + ε (∂fx/∂y)|(xC,yC) = a δ + b ε,   (8.12)

where we have used definition (8.10) in the last step and denoted the partial derivatives of fx by

a = (∂fx/∂x)|(xC,yC),  b = (∂fx/∂y)|(xC,yC).   (8.13)

Vertical line with the subscript indicates that partial derivatives must be evaluated

at the critical point. Similarly, for fy we find

fy(x, y) = fy(xC + δ, yC + ε) = c δ + d ε,

where

c = (∂fy/∂x)|(xC,yC),  d = (∂fy/∂y)|(xC,yC).   (8.14)

Thus, near the critical point, planar dynamical system (8.9) can be replaced by

simpler equations

δ̇ = a δ + b ε,

(8.15)

ε̇ = c δ + d ε.

Coefficients a, b, c and d are not functions but constants given by (8.13) and (8.14).

It is useful to write equations (8.15) in the matrix form. Let us define

x = (δ, ε),  J = ((a, b), (c, d)).

Then two equations (8.15) are equivalent to single matrix equation

ẋ = J · x (8.16)

where the dot denotes standard matrix multiplication.
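In practice, the constants a, b, c, d are simply the entries of the Jacobi matrix of (fx, fy) evaluated at the critical point. A small sketch for the pendulum (8.7) at the critical point (0, 0); the names f and Jlin are our own:

f[θ_, p_] = {p, -Sin[θ]};
Jlin = D[f[θ, p], {{θ, p}}] /. {θ -> 0, p -> 0}
(* -> {{0, 1}, {-1, 0}}, reproducing the linearized system δ' = p, p' = -δ *)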


8.6.1 Example

Consider the dynamical system

ẋ = x (1 + y),  ẏ = y (1 − x),

whose critical points satisfy

xC (1 + yC) = 0,  yC (1 − xC) = 0.

These equations have two solutions, the critical points (0, 0) and (1, −1). We analyse these points separately. The emphasis is on finding the critical points and deriving the linearized equations of motion; the solution is merely stated because we will analyse all cases in detail later.

a) Critical point (0, 0). In this case we write

x = xC + δ = δ,  y = yC + ε = ε.

Now we have

fx = x(1 + y) = δ (1 + ε) ≈ δ,  fy = y(1 − x) = ε (1 − δ) ≈ ε,

where we have neglected the products of perturbations. Hence, in the neighbourhood of the first critical point, the equations of motion are

δ̇ = δ,  ε̇ = ε,

with solutions

δ = C1 e^t,  ε = C2 e^t,

where C1 and C2 are integration constants. We will discuss this later, but for now it is obvious that the phase trajectory escapes to infinity because

lim_{t→∞} e^t = ∞,

and hence the first critical point is unstable.


b) Critical point (1, −1). In this case we write

x = xC + δ = 1 + δ, y = yC + ε = −1 + ε,

so that

fx = x(1 + y) = (1 + δ)(1 − 1 + ε) = ε, fy = y(1 − x) = (−1 + ε)(1 − 1 − δ) = δ,

where we have neglected products εδ again. Now the linearized equations of motion

are

δ̇ = ε, ε̇ = δ

which solve to

δ = C1 cosh t + C2 sinh t, ε = C1 sinh t + C2 cosh t.

The reader is invited to check that solutions (δ, ε) are hyperbolas escaping to infinity

and hence the second critical point is unstable again.
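The hyperbolic character of these solutions can be verified symbolically: the combination δ² − ε² is conserved along the linearized flow. A quick check, using our own symbols C1 and C2:

δsol[t_] := C1 Cosh[t] + C2 Sinh[t];
εsol[t_] := C1 Sinh[t] + C2 Cosh[t];
Simplify[δsol[t]^2 - εsol[t]^2]
(* -> C1^2 - C2^2, a constant, so the trajectories lie on hyperbolas δ² − ε² = const *)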

8.7 Classification of critical points

In the previous sections we defined the critical points and sketched how these points can be divided into stable and unstable points. We have seen that the mathematical pendulum has two critical points, one stable, the other not. In the last example we have seen a system with two unstable critical points. The classification of critical points, however, is more subtle and we discuss all possibilities in this section.

Let us first recapitulate our goal. We study planar dynamical system described

by equations

ẋ = fx (x, y), ẏ = fy (x, y).

We assume that we have found critical point of this system, i.e. point (xC , yC ) such

that

fx (xC , yC ) = fy (xC , yC ) = 0,

and study the behaviour of the system near this critical point. We linearize the

equations in the neighbourhood of critical point so that we obtain equations2

2

In the notation of previous section, our functions x and y are in fact perturbations δ and ε. In this

section, however, we use x and y as they are more natural.


ẋ = a x + b y,  ẏ = c x + d y,

or, in the matrix form,

ẋ = J · x,

where

J = ((a, b), (c, d)).

Now we discuss several forms of matrix J and classify the critical points. Finally we

will show how the analysis can be done for general matrix J .

Consider a linear planar system of the form

ẋ = λ1 x,  ẏ = λ2 y   (8.17)

with the matrix

J = ((λ1, 0), (0, λ2)).   (8.18)

System (8.17) can be easily solved. Equations for x and y are independent; we say

that these equations are decoupled which means that equation for ẋ does not contain

y and vice versa.

Let us solve the equation

ẋ = λ1 x,

i.e.

dx/dt = λ1 x,

which is a separable differential equation. We can rewrite it as

dx/x = λ1 dt.

This form of the equation is called separated because the left hand side contains only x and the right hand side contains only time t. We can integrate the equation,


∫ dx/x = ∫ λ1 dt,

to obtain

log x = λ1 t + C,

where C is an integration constant. In order to exponentiate the solution, we write the constant as a logarithm as well³:

log x = λ1 t + log K.

Exponentiating, we arrive at

x = K e^{λ1 t}.

In the same way, the solution of the second equation is

y = L e^{λ2 t}.

At time t = 0 we have x(0) = K and y(0) = L, so, denoting the initial values by x0 and y0, we can write the solution of (8.17) in the form

x = x0 e^{λ1 t},  y = y0 e^{λ2 t}.   (8.19)
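As a consistency check, DSolve reproduces (8.19) directly:

DSolve[{x'[t] == λ1 x[t], y'[t] == λ2 y[t], x[0] == x0, y[0] == y0}, {x[t], y[t]}, t]
(* -> {{x[t] -> x0 E^(λ1 t), y[t] -> y0 E^(λ2 t)}} *)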

Clearly, the only critical point of system (8.17) is (0, 0). Having derived solution

of this system, we can analyze its behaviour near the critical point. Useful function

to visualise properties of the system near critical point is StreamPlot which takes the

vector field and plots trajectories. In the following example we choose λ1 = λ2 = 1.

3
Notice that an arbitrary real number C is the logarithm of some positive real number, i.e. we can write C = log K for some K > 0.


vals = {λ1 -> 1, λ2 -> 1};
StreamPlot[{λ1 x, λ2 y} /. vals, {x, -10, 10}, {y, -10, 10}]

In this figure we can see trajectories (8.19) for initial points (x0 , y0 ) chosen by Math-

ematica. Notice that we have inserted the right hand side of (8.17) as an argument

of function StreamPlot. We can see that the trajectories are straight lines emanating

from the origin (critical point) and tending to infinity exponentially.

What about other choices of λ1,2? It is clear that the function e^{λt} is increasing for λ > 0 and decreasing for λ < 0. We can conclude that the qualitative behaviour of the system depends on the signs of λ1,2; four possibilities are shown in figure 8.8, created by commands analogous to the one above. We distinguish three cases.

• λ1 > 0 and λ2 > 0

In this case the critical point is called unstable node. Trajectories are emanating

from the origin and they are repelled to infinity.

• λ1 > 0, λ2 < 0 or λ1 < 0, λ2 > 0
The critical point is called a saddle point. Trajectories are repelled from the y-axis and attracted to the x-axis (for λ1 > 0, λ2 < 0), or repelled from the x-axis and attracted to the y-axis (for λ1 < 0, λ2 > 0).

• λ1 < 0 and λ2 < 0

Critical point is called stable node. Trajectories are attracted to the origin.


In addition to this classification, critical points with distinct values λ1 ≠ λ2 are called singular while critical points with the same values λ1 = λ2 are called degenerate. Clearly, the saddle points cannot be degenerate.


Fig. 8.8. Different behaviour of planar system (8.17) for different choices of λ1,2 . Critical points are

a) unstable node, b,c) saddle point, d) stable node.


Recall that planar dynamical system (8.17) can be represented by the matrix (8.18),

J = ((λ1, 0), (0, λ2)).

From elementary linear algebra we know that with matrix J we can associate a set of eigenvalues λ and eigenvectors e defined by the equation

J · e = λ e.

The eigenvalues of matrix (8.18) are λ1 and λ2 and the corresponding eigenvectors are

e1 = (1, 0),  e2 = (0, 1),

so that J · e1 = λ1 e1 and J · e2 = λ2 e2. The eigenvectors define two lines in the phase space, and trajectories which start on these lines always remain in these lines. If the trajectory is being repelled from the critical point

along direction e, the line determined by vector e is called unstable manifold. If the

trajectory is attracted to the critical point along the vector e, the line determined by e

is called stable manifold. For matrix (8.18), vectors e1 and e2 are always eigenvectors.

We can see that e1 lies on the x−axis and e2 lies on the y−axis. Hence, the axes are

stable or unstable manifolds of system (8.17), depending on the sign of λ1,2 .

The classification introduced above can be reformulated in the following way. Let

J = ((a, b), (c, d))

be the matrix of the linear planar system

ẋ = a x + b y,  ẏ = c x + d y.

If matrix J has two real eigenvalues λ1 and λ2, then the critical point is a stable/unstable node or a saddle point, depending on the signs of these eigenvalues.

We illustrate this classification on an example. Consider the dynamical system

ẋ = 2 x + y,  ẏ = x,   (8.20)

with the matrix

J = ((2, 1), (1, 0)).

This matrix is not of the form (8.18) but we can apply the second criterion. Eigenvalues and eigenvectors can be found in Mathematica using

Eigensystem[J]

which gives

λ1 = 1 + √2,  λ2 = 1 − √2,

with eigenvectors

e1 = (1 + √2, 1),  e2 = (1 − √2, 1).

Since λ1 > 0 and λ2 < 0, vector e1 defines the unstable manifold and e2 defines the stable manifold. Since both eigenvalues have different signs, the critical point is a saddle point and it is singular. Phase trajectories together with stable and unstable manifolds

can be plotted by

g1 = StreamPlot[{2 x + y, x}, {x, -10, 10}, {y, -10, 10}];
g2 = Graphics[{Blue, Thick, Line[{-10 {1 + Sqrt[2], 1}, 10 {1 + Sqrt[2], 1}}]}];
g3 = Graphics[{Red, Thick, Line[{-10 {1 - Sqrt[2], 1}, 10 {1 - Sqrt[2], 1}}]}];
Show[g1, g2, g3]

The next special case we consider is the dynamical system of the form


Fig. 8.9. Phase portrait for dynamical system (8.20). Blue line represents unstable manifold, red line

represents stable manifold.

ẋ = α x + β y,  ẏ = −β x + α y,   (8.21)

with the matrix

J = ((α, β), (−β, α)).   (8.22)

System (8.21) is a little trickier to solve. Let us switch to the polar coordinate system by the usual transformation

x = r cos θ,  y = r sin θ,

with the inverse

r = √(x² + y²),  θ = arctan(y/x).

These relations can be used to find

∂r/∂x = x/r,  ∂r/∂y = y/r,
∂θ/∂x = −y/r²,  ∂θ/∂y = x/r²,

so that

ṙ = (∂r/∂x) ẋ + (∂r/∂y) ẏ = α r,
θ̇ = (∂θ/∂x) ẋ + (∂θ/∂y) ẏ = −β.

We can see that dynamical system (8.21) in polar coordinates decouples into two independent equations for the coordinates r and θ,

ṙ = α r,  θ̇ = −β.   (8.23)

The first equation can be separated,

dr/r = α dt,

which integrates to

log r = α t + log C,

where the integration constant has been written as a logarithm (see the footnote on page 162). Exponentiating the last equation we arrive at

r = C e^{α t}.

Obviously, at time t = 0 we have r(0) = C and so we write the solution in the form

r = r0 e^{α t}.

Similarly, the second equation separates to

dθ = −β dt,

which integrates to

θ = θ0 − β t,


where the integration constant has been denoted by θ0 and represents the value of θ at t = 0. Summa summarum, the solution of system (8.23) acquires the form

r = r0 e^{α t},  θ = θ0 − β t.   (8.24)

For α = 0, this represents motion at constant angular velocity β and constant radius r0, and therefore the phase trajectories are circles of radius r0. If α ≠ 0, the radius of the "circle" will be

r0 e^{α t}

and hence the trajectory will be a spiral. If α > 0, the radius increases exponentially and the spiral tends to infinity. If, on the other hand, α < 0, the radius decreases exponentially and the phase trajectories spiral towards the origin. All cases are plotted in figure 8.10 by the Mathematica commands

J = {{α, β}, {-β, α}};
g1 = StreamPlot[J.{x, y} /. {α -> 0, β -> 1}, {x, -5, 5}, {y, -5, 5},
   PlotLabel -> "α = 0, β > 0", BaseStyle -> {FontSize -> 10}];
g2 = StreamPlot[J.{x, y} /. {α -> 0, β -> -1}, {x, -5, 5}, {y, -5, 5},
   PlotLabel -> "α = 0, β < 0", BaseStyle -> {FontSize -> 10}];
g3 = StreamPlot[J.{x, y} /. {α -> 1, β -> 1}, {x, -5, 5}, {y, -5, 5},
   PlotLabel -> "α > 0, β > 0", BaseStyle -> {FontSize -> 10}];
g4 = StreamPlot[J.{x, y} /. {α -> -1, β -> 1}, {x, -5, 5}, {y, -5, 5},
   PlotLabel -> "α < 0, β > 0", BaseStyle -> {FontSize -> 10}];
g = GraphicsGrid[{{g1, g2}, {g3, g4}}]

• α=0

Critical point is called centre. Trajectories are circles centred at the origin.

• α>0

Critical point is called unstable focus, trajectories are spirals escaping to infinity.


• α<0

Critical point is called stable focus, trajectories are spirals tending to the origin.

Parameter β has the meaning of angular velocity. If it is zero, spirals become straight

lines and dynamical system reduces to previous case (8.17). If it is non-zero, its sign

determines the sense of rotation: trajectories orbit the origin in a clockwise sense for

β > 0 and in a counter-clockwise sense for β < 0.

Let us now analyse the critical points of system (8.21) in terms of the eigenvalues of matrix (8.22),

J = ((α, β), (−β, α)).

We can use Mathematica to find the eigenvalues and eigenvectors of matrix (8.22)

by

Eigensystem[J]

which yields the eigenvalues

λ1 = α − i β and λ2 = α + i β

with eigenvectors

e1 = (i, 1),  e2 = (−i, 1),

so that J · e1 = λ1 e1 and J · e2 = λ2 e2.

The first observation is that the eigenvectors are complex and hence there are neither stable nor unstable manifolds, i.e. there is no real direction which is mapped to the same direction. The only exception is β = 0, since in this case dynamical system (8.21) reduces to (8.17) and the eigenvectors become real. Second, the eigenvalues λ1,2 are mutually complex conjugated (as well as the eigenvectors),


Fig. 8.10. Classification of critical points for the system (8.21): a, b) centre, c) unstable focus, d)

stable focus.


λ1 = λ̄2,

where the bar denotes the complex conjugation. Hence, even if the dynamical system is not of the form (8.21), we can conclude that if the matrix J has two complex conjugated eigenvalues

α ± i β,

the critical point is a centre or a focus, with the role of α and β as classified above.

Example. Consider dynamical system

ẋ = 2 x + 4 y, ẏ = −3 x + 2y.

This system is not of the form (8.21) but we can apply the criterion based on the analysis of eigenvalues. In Mathematica we type

J = {{2, 4}, {-3, 2}};
Eigensystem[J] // Expand

where we have used Expand in order to simplify the expression for the eigenvectors (try this code without Expand). We have found two eigenvalues

λ1,2 = 2 ± 2 i √3 = α ± i β,

which are mutually complex conjugated. In this case, the parameters are

α = 2,  β = 2 √3.

Since α > 0, the critical point is an unstable focus, which is confirmed by the phase portrait of the dynamical system considered: the trajectories spiral away from the origin.


As another example, consider the dynamical system

ẋ = x + 2 y,  ẏ = −2 x − y.

Typing

J = {{1, 2}, {-2, -1}};
Eigensystem[J] // Expand

we find the eigenvalues

λ1,2 = ± i √3 = α ± i β,

so that

α = 0,  β = √3.

Since α = 0, the critical point is a centre rather than a focus, and the stream plot of this dynamical system shows closed trajectories encircling the origin.


8.8 General case

In the previous two sections we studied two special cases of planar linear dynamical systems given by the matrices

J = ((λ1, 0), (0, λ2)) and J = ((α, β), (−β, α)).

However, we have seen that the analysis can be performed using the eigenvalues of these matrices. Now we consider a general linear planar dynamical system

ẋ = α x + β y,  ẏ = γ x + δ y,   (8.26)

with the matrix J = ((α, β), (γ, δ)). Let us find the eigenvalues and eigenvectors of this general matrix. Recall that the determinant of matrix J is

D = det J = α δ − β γ.

The trace of the matrix is defined as the sum of its diagonal elements, i.e.

T = Tr J = α + δ.

The equation for the eigenvalues,

J · e = λ e,

can be rewritten as

(J − λ I) · e = 0,

where I is the identity matrix and

J − λ I = ((α − λ, β), (γ, δ − λ)).

This homogeneous system has non-trivial solutions only if the determinant of the system is zero:

det (J − λ I) = 0,

i.e.

(α − λ)(δ − λ) − β γ = 0.

Expanding the product we obtain the quadratic equation

λ² − (α + δ) λ + α δ − β γ = 0,

or, equivalently,

λ² − T λ + D = 0.

This equation has solutions

λ1,2 = (T ± √(T² − 4 D)) / 2.   (8.27)
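Formula (8.27) can be checked in Mathematica; note that T² − 4D = (α − δ)² + 4βγ:

J = {{α, β}, {γ, δ}};
Simplify[Eigenvalues[J]]
(* -> {(α + δ - Sqrt[(α - δ)^2 + 4 β γ])/2, (α + δ + Sqrt[(α - δ)^2 + 4 β γ])/2} *)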

Now we can summarize the classification of critical points as follows.

• λ1,2 real
  – λ1 ≠ λ2 – singular node
  – λ1 = λ2 – degenerate node
  – λ1 > 0, λ2 > 0 – unstable node
  – λ1 λ2 < 0 – saddle point
  – λ1 < 0, λ2 < 0 – stable node
• λ1,2 = α ± i β, λ1 = λ̄2 (complex conjugated eigenvalues)
  – α = 0 – centre
  – α > 0 – unstable focus
  – α < 0 – stable focus

Moreover, if the real parts of the eigenvalues λ1,2 are non-zero, the critical point is called hyperbolic, otherwise it is called non-hyperbolic.
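The classification can be packaged into a small function; classify is our own name and the marginal cases are not treated carefully here:

classify[M_?MatrixQ] := Module[{λ1, λ2},
  {λ1, λ2} = Chop[Eigenvalues[N[M]]];
  Which[
    Im[λ1] != 0 && Re[λ1] == 0, "centre",
    Im[λ1] != 0 && Re[λ1] > 0, "unstable focus",
    Im[λ1] != 0, "stable focus",
    λ1 > 0 && λ2 > 0, "unstable node",
    λ1 < 0 && λ2 < 0, "stable node",
    λ1 λ2 < 0, "saddle point",
    True, "marginal case"]]

classify[{{2, 1}, {1, 0}}]     (* -> "saddle point" *)
classify[{{1, 2}, {-2, -1}}]   (* -> "centre" *)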

8.9 Examples

Example 1

Consider the linear dynamical system

ẋ = 2 x + y,  ẏ = x + 2 y.

Its only critical point is

xC = 0,  yC = 0.

Since the system is linear, we do not have to linearize it and can write the matrix of the linearized system immediately:

J = ((2, 1), (1, 2)).

Its eigenvalues are

λ1 = 1,  λ2 = 3.

Eigenvectors can be found easily by hand. Recall that the eigenvectors are solutions to the equation

J · ei = λi ei,  i = 1, 2.


For λ1 = 1, this equation reduces to the homogeneous system of linear equations

((1, 1), (1, 1)) · (a, b) = 0,

where e1 = (a, b) is the unknown eigenvector. Since the rows (or columns) of the matrix above are linearly dependent⁴, this system has infinitely many non-trivial solutions satisfying the condition a = −b. Hence, all eigenvectors corresponding to eigenvalue λ1 = 1 have the form

(a, −a).

Choosing a = 1 we obtain

e1 = (1, −1).

Similarly, the eigenvector corresponding to eigenvalue λ2 = 3 is

e2 = (1, 1).

To summarize,

λ1 = 1,  e1 = (1, −1),
λ2 = 3,  e2 = (1, 1).   (8.28)

Now we can classify the critical point (0, 0). Since the eigenvalues are real and non-zero, the critical point is hyperbolic. They are both positive and hence the critical point is an unstable node. Finally, the eigenvectors are real and so the system has two unstable manifolds given by e1 and e2. Implementation in Mathematica is shown in figure 8.11.

4

This is a consequence of (8.27), because this equation has been derived under the assumption det(J −

λ I) = 0.


Dynamical system

x' = 2 x + y, y' = x + 2 y

with the matrix J = ((2, 1), (1, 2)).

J = {{2, 1}, {1, 2}};
(* critical points *)
Solve[{2 x + y == 0, x + 2 y == 0}, {x, y}]
  {{x -> 0, y -> 0}}

Origin (0, 0) is the only critical point. Eigenvalues and eigenvectors are found by

Eigensystem[J]
  {{3, 1}, {{1, 1}, {-1, 1}}}

hyperbolic point, unstable node

g1 = StreamPlot[{2 x + y, x + 2 y}, {x, -5, 5}, {y, -5, 5}];
(* unstable manifold e2 = (1, 1) *)
g2 = Graphics[{Blue, Thick, Line[{-10 {1, 1}, 10 {1, 1}}]}];
(* unstable manifold e1 = (-1, 1) *)
g3 = Graphics[{Red, Thick, Line[{-10 {-1, 1}, 10 {-1, 1}}]}];
Show[g1, g2, g3]


Example 2

The linear dynamical system has the form

ẋ = −2 x,  ẏ = −4 x − 2 y,

with the matrix

J = ((−2, 0), (−4, −2)).

Its eigenvalues are

λ1 = λ2 = −2,

so the critical point (0, 0) is a stable degenerate node. Implementation in Mathematica is shown in figure 8.12.

Volterra-Lotka equations belong to the class of predator-prey models which describe the interaction between two populations. The population of prey has a tendency to grow and the population of predators tends to die out. It is due to their mutual interaction that the population of predators can also grow and the population of prey can die out; in other words, predators eat prey.

Let x = x(t) be the number of prey, say, rabbits, and let y = y(t) be the number of predators, say, foxes. We can construct a plausible model of the interaction between foxes and rabbits by the following simple considerations. Suppose that y = 0, i.e. there are only rabbits present. As a first approximation we can assume that the population of rabbits will grow: the number of rabbits x will increase because of the "interaction" between rabbits, and the higher the number of rabbits, the higher the rate of growth. Hence, we can postulate that an isolated population of rabbits will be governed by the equation

ẋ = α x,

where the constant α > 0 expresses how often a rabbit gives birth to a new rabbit when there are no foxes. This equation has solution

x = x0 e^{α t}

Dynamical system

x' = -2 x, y' = -4 x - 2 y

with the matrix J = ((−2, 0), (−4, −2)).

J = {{-2, 0}, {-4, -2}};
(* critical points *)
Solve[{-2 x == 0, -4 x - 2 y == 0}, {x, y}]
  {{x -> 0, y -> 0}}

Origin (0, 0) is the only critical point. Eigenvalues and eigenvectors are found by

Eigensystem[J]
  {{-2, -2}, {{0, 1}, {0, 0}}}

hyperbolic point, stable degenerate node. Stable manifold is given by e1 = (0, 1).

(* stable manifold; g1 is the stream plot of the vector field, as in figure 8.11 *)
g2 = Graphics[{Red, Thick, Line[{-10 {0, 1}, 10 {0, 1}}]}];
Show[g1, g2]


Similarly, if there are no rabbits, the population of foxes will die out. Denoting by γ > 0 the rate of death of the foxes, an isolated population of foxes will be governed by the equation

ẏ = −γ y

with solution

y = y0 e^{−γ t},

Now we add an interaction to our equations. The number of rabbits eaten by foxes is proportional to the number of rabbits and to the number of foxes. Conversely, the number of new-born foxes is proportional to the number of foxes and to the number of rabbits. If we introduce constants β and δ for these two processes, the equations for the interacting populations of rabbits and foxes read

ẋ = α x − β x y,  ẏ = −γ y + δ x y.   (8.29)

These are Volterra-Lotka equations. Obviously, they are non-linear and the non-

linearity represents the interaction between two populations. All constants are as-

sumed to be positive.
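Before analysing the critical points, we can integrate (8.29) numerically with function DynSys from section 8.3; the values of the constants below are an arbitrary illustrative choice of ours:

(* α = 1, β = 0.5, γ = 1, δ = 0.3; prey x(0) = 4, predators y(0) = 2 *)
sol = DynSys[{#1 - 0.5 #1 #2, -#2 + 0.3 #1 #2} &, {4, 2}, 20];
ParametricPlot[{x[1][t], x[2][t]} /. sol, {t, 0, 20}]
(* the orbit is a closed curve encircling the critical point (γ/δ, α/β) *)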

Critical points can be found by

cp = Solve[{α x - β x y == 0, -γ y + δ x y == 0}, {x, y}]
  {{x -> γ/δ, y -> α/β}, {x -> 0, y -> 0}}

i.e. the critical points are

xC1 = (γ/δ, α/β),  xC2 = (0, 0).

In order to linearize equations (8.29) we introduce the Jacobi matrix

J = ((∂ẋ/∂x, ∂ẋ/∂y), (∂ẏ/∂x, ∂ẏ/∂y)).

The Jacobi matrix can be found in Mathematica by

182 8 Dynamical systems

f@x_ , y_ D = 8 Α x - Β x y , - Γ y + ∆ x y <;

In[16]:=

Out[17]=

which shows

α − yβ −xβ

J= .

yδ xδ − γ

Next we evaluate the Jacobian at both critical points:

J1 = J /. cp[[1]]
J2 = J /. cp[[2]]

i.e. we have

J1 = ((0, −β γ/δ), (α δ/β, 0)) at critical point xC1,
J2 = ((α, 0), (0, −γ)) at critical point xC2.

Finally we find the eigenvalues and eigenvectors by

Eigensystem[J1]
Eigensystem[J2]

For J1 we obtain purely imaginary eigenvalues ±i √(α γ) (with complex eigenvectors), so the critical point xC1 is a centre: the populations near xC1 oscillate periodically around the equilibrium. For J2 the eigenvalues are α > 0 and −γ < 0; they have opposite signs, so the critical point xC2 is a saddle point and hence unstable.


In this section we introduce some useful notions related to the concept of a dynamical system. We consider a general autonomous dynamical system (8.1),

ẋa = fa(x).   (8.30)

We know that the solution exists and is unique if we prescribe initial conditions

xa(0) = xa0,   (8.31)

where xa0 are constants with the meaning of initial values of the coordinates xa. The solution of the dynamical system is then a set of functions xa of time,

xa = xa(t, x0),   (8.32)

where we have explicitly emphasized that a particular solution depends on the initial values x0, such that

x(0, x0) = x0 and (d/dt) x(t, x0) = f(x(t, x0)).   (8.33)

In other words, x(t, x0) is a solution of dynamical system (8.30) with initial conditions (8.31).

It is useful to introduce a slightly more formal notation for x(t, x0). We defined the phase space M as an abstract space with coordinates xa. For an n-dimensional dynamical system, the phase space is

M = Rⁿ = R × R × ⋯ × R (n factors).


The flow of the dynamical system is a mapping

Φ : R × M → M

defined by

Φs(x0) = x(s, x0).

Geometrically, the flow Φs is a mapping which maps arbitrary point x0 to point

x(s, x0 ), i.e. shifts point x0 along the phase trajectory by parametric distance s.

Hence, the flow satisfies relations

Φ0 (x0 ) = x0 , Φs+t = Φs ◦ Φt , (Φs )−1 = Φ−s .

Obviously,

(d/ds) Φs(x0)|_{s=0} = (d/ds) x(s, x0)|_{s=0} = fa(x0).

Thus, we can also say that the flow Φs shifts the point x0 along the vector field fa.

Let us illustrate it on the example of familiar planar dynamical system

ẋ = y, ẏ = −x

so that we have

f1 (x, y) = y, f2 (x, y) = −x.

Vector field fa can be plotted by

StreamPlot[{y, -x}, {x, -10, 10}, {y, -10, 10}]


The system can also be solved exactly for general initial conditions

x(0) = x0,  y(0) = y0,

using DSolve:

sol = DSolve[{x'[t] == y[t], y'[t] == -x[t], x[0] == x0, y[0] == y0}, {x[t], y[t]}, t]

which yields

x(t) = x0 cos t + y0 sin t,  y(t) = y0 cos t − x0 sin t.

Thus, the flow Φs maps the point (x0, y0) to the point which lies on the solution with initial conditions (x0, y0) at time s:

Φs(x0, y0) = (x0 cos s + y0 sin s, y0 cos s − x0 sin s).

Hence, Φs(x0, y0) is the position of the system at time s for initial conditions (x0, y0).
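As a consistency check, the group property Φs ∘ Φt = Φs+t can be verified directly for this explicit flow; flow is our own symbol for Φ:

flow[s_][{a_, b_}] := {a Cos[s] + b Sin[s], b Cos[s] - a Sin[s]};
Simplify[flow[s][flow[t][{x0, y0}]] - flow[s + t][{x0, y0}]]
(* -> {0, 0} *)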

In figure 8.13 we plot the flow for initial conditions

x0 = 1, y0 = 8.

We have seen that the curve Φs(x0) for a given x0 is a solution of the dynamical system with initial condition x(0) = x0. This curve is called the orbit of the point x0 and is denoted by

Λ(x0) = {Φs(x0) | s ∈ R}.

Similarly, the positive and negative semi-orbits of x0 are defined by

Λ+(x0) = {Φs(x0) | s > 0},  Λ−(x0) = {Φs(x0) | s < 0}.   (8.35)


IC = {x0 -> 1, y0 -> 8};
g2 = ParametricPlot[{x[t], y[t]} /. sol /. IC, {t, 0, 5}, PlotStyle -> Black];
g3 = Graphics[{Black, Text[Style["x0 = Φ0(x0)", Large], {0.2, 9}]}];
g4 = Graphics[{Black, Text[Style["Φ5(x0)", Large], {-6, 4.2}]}];
Show[g1, g2, g3, g4]

Here g1 is the stream plot of the vector field shown above; the resulting figure 8.13 shows the initial point x0 = Φ0(x0) and its image Φ5(x0) on the same orbit.

8.11 Lyapunov stability

Recall that we have defined the critical point or fixed point xC of dynamical system (8.30) as a point xC for which

fa(xC) = 0.

The critical point is an equilibrium of the system in the sense that the system remains in the critical point at all times, i.e.

Λ(xC) = {xC}.


We have classified critical points according to the behaviour of the orbits (phase trajectories) in the vicinity of the critical point. If the orbit remained in the vicinity of the critical point, we said that the critical point is stable. If the orbit was attracted to the critical point, it was called a stable node or stable focus, depending on the character of the system. If the orbit was circular, the critical point was called a centre. Finally, if the orbit escaped from the critical point to infinity, we called the critical point an unstable node or unstable focus. However, this analysis was performed for the linearized dynamical system. Now we can formulate the stability of a general non-linear system in terms of the flow.

Let ‖·‖ be the standard norm defined on the phase space M, i.e. for any x ∈ M its norm is

‖x‖ = √(x1² + x2² + ⋯ + xn²).

In general, the norm is a measure of the distance of the point x from the origin. In some situations it is useful to introduce a different notion of the norm, for example the so-called p-norm (p is a positive integer) defined by

‖x‖p = (|x1|^p + |x2|^p + ⋯ + |xn|^p)^{1/p}.

For p = 2 we recover the standard Euclidean distance, as follows from the Pythagorean theorem. In general, the norm must satisfy three relations.

three relations.

• Positive definiteness

• Linearity

kα xk = |α| kxk

• Triangle inequality

kx + yk ≤ kxk + kyk.

In some contexts the first condition is relaxed, i.e. we admit that there are vectors x ≠ 0 for which ‖x‖ = 0. In this case, the operation ‖·‖ is called a semi-norm. In this textbook we consider only positive definite norms satisfying the first property. Notice

that positive definiteness implies that whenever

‖x − y‖ = 0,

the points coincide, x = y.

Solution Φs(x0) is called Lyapunov stable if for any ε > 0 there exists δ > 0 such that

‖y0 − x0‖ < δ  implies  ‖Φs(y0) − Φs(x0)‖ < ε for all s > 0.

Solution Φs(x0) is called asymptotically stable if it is stable and, in addition, there exists δ > 0 such that

‖y0 − x0‖ < δ  implies  lim_{s→∞} ‖Φs(y0) − Φs(x0)‖ = 0.

s→∞

9

Bifurcations

In the previous chapter we defined the concept of dynamical system and introduced

several notions related to dynamical systems. Among others, we have investigated the

stability of critical points. This discussion was connected with the behaviour of the

phase trajectories (or orbits) n the neighbourhood of the critical point. In this section

we analyse dynamical systems from another point of view. Instead of investigating

the orbits (but using classification introduced in previous chapter) we investigate the

influence of the parameters of the system. We will observe that there are values of

parameters for which the system can exhibit different behaviour. Which behaviour

occurs depends on the circumstances, e.g. on the history of the system. Points at

which the system must ”decide” which behaviour to choose are called bifurcation

points. These issues will be clarified and illustrated below. Bifurcation theory is a

large subject and in this chapter we merely sketch the main ideas without going into

depth.

The existence and properties of critical points can depend on the parameters of the dynamical system. Consider the one-dimensional dynamical system

ẋ = µ + x²   (9.1)

where µ is a real parameter. If µ > 0, there are no real critical points. For µ = 0, the only critical point is xC = 0, and for µ < 0 there are two critical points at xC = √(−µ) and xC = −√(−µ). Let us examine the character of the critical points briefly.
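The µ-dependence of the critical points can be obtained directly:

Solve[µ + x^2 == 0, x]
(* -> {{x -> -Sqrt[-µ]}, {x -> Sqrt[-µ]}}: real solutions exist only for µ <= 0 *)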

For µ = 0 and critical point xC = 0, the linearized version of system (9.1) reads

ẋ = 0,

which shows that xC is a non-hyperbolic critical point (the eigenvalue of the Jacobi matrix has vanishing real part).

For µ < 0, the critical points are xC = ±√(−µ). We expand the function

f(x) = µ + x²

around these points:

f(x) ≈ f(xC) + (x − xC) f′(xC) = ±2 √(−µ) (x ∓ √(−µ)).

Hence, system (9.1) linearized in the neighbourhood of the point √(−µ) reads

ẋ = 2 √(−µ) x

(where x now denotes the deviation from the critical point), which shows that the critical point √(−µ) is an unstable node. In the neighbourhood of the critical point −√(−µ) we have

ẋ = −2 √(−µ) x

and so this critical point is a stable node. We can plot the critical points corresponding to different values of µ by the code presented in figure 9.1.

Saddle-node bifurcations occur when critical points do not exist for some values of the parameter, then a critical point suddenly appears at a particular value of the parameter, and this single critical point splits into two critical points for the remaining values of the parameter. In our case, there are no critical points for µ > 0 but a critical point appears at µ = 0. This is a bifurcation point. Finally, for µ < 0 there are two critical points, one of them stable, the other one unstable.
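The same conclusions can be obtained directly in Mathematica; a minimal sketch:

Solve[µ + x^2 == 0, x]
(* the two branches ±Sqrt[-µ], real only for µ <= 0 *)
D[µ + x^2, x] /. x -> Sqrt[-µ]
(* 2 Sqrt[-µ], positive for µ < 0: the branch Sqrt[-µ] is unstable *)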

9.2 Transcritical bifurcations

Now consider the dynamical system

ẋ = µ x − x2 = x (µ − x). (9.2)

Regardless of the value of µ, there is always one critical point at xC = 0 and one critical point at xC = µ. Hence, unlike the case of saddle-node bifurcations, the number of critical points does not change. However, we will show that the character of these critical points changes at the bifurcation point.

The first critical point is xC = 0. After linearization of system (9.2) about this point we find

ẋ = µ x.


In[61]:= Plot[{Sqrt[-µ], -Sqrt[-µ]}, {µ, -2, 0.5},
   (* the two branches of critical points of (9.1); plot range reconstructed from the axes of the original figure *)
   PlotStyle -> {{Dashed, Thick}, {Thick}}, AspectRatio -> 1, AxesLabel -> {"µ", "xC"},
   BaseStyle -> {FontSize -> 15},
   Epilog -> {Disk[{0, 0}, 0.03],
     Text["unstable node", {-1, 1.3}],
     Text["stable node", {-1, -1.3}],
     Text["bifurcation point", {-0.6, 0.1}]}]

Out[61] (figure 9.1): bifurcation diagram of the saddle-node bifurcation. The dashed branch √(−µ) consists of unstable nodes, the solid branch −√(−µ) of stable nodes; both emanate from the bifurcation point at the origin.

Obviously, for µ > 0 this critical point is unstable, while for µ < 0 it is stable. Now take the second critical point xC = µ. After linearization of system (9.2) in its neighbourhood we have

ẋ = µ² − µ x.  (9.3)

This is an inhomogeneous linear equation with constant coefficients and can be solved by elementary methods. First we write down the corresponding homogeneous equation

ẋ = −µ x,

which integrates to


xH = C e^(−µt),

where the subscript H stands for “homogeneous”. Next we need to find any particular solution of the original inhomogeneous equation. This is trivial here, for obviously the constant x = µ solves equation (9.3). By a standard theorem, the general solution of equation (9.3) is the sum of the two,

x = µ + C e^(−µt).

The constant µ does not affect the character of the critical point (prove!) and only the exponential term matters. We can see that for µ > 0 the critical point is stable while for µ < 0 it is unstable.
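The whole calculation can be delegated to DSolve; a quick sketch:

DSolve[x'[t] == µ^2 - µ x[t], x[t], t]
(* general solution µ + C[1] E^(-µ t), as derived above *)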

To summarize, we have found two critical points,

xC = 0 and xC = µ,

the first being stable for µ < 0 and unstable for µ > 0, the second behaving in the opposite way. The resulting bifurcation diagram is shown in figure 9.2.

Transcritical bifurcations occur when there are two critical points for all values of the parameter. However, at the bifurcation point (in our case µ = 0), these critical points interchange their character: the point which was stable becomes unstable and vice versa.

9.3 Pitchfork bifurcation

Next we examine the system

ẋ = µ x − x³.  (9.4)

Notice that this system is invariant under the reflection x ↦ −x, for under this transformation the velocity transforms as ẋ ↦ −ẋ, and hence

ẋ = µ x − x³  ↦  −ẋ = −µ x + x³,  i.e.  ẋ = µ x − x³.

Thus, equation (9.4) does not change its form under the reflection, i.e. the reflection is a symmetry of equation (9.4). Pitchfork bifurcations often occur in systems possessing some kind of symmetry.
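This symmetry can be verified mechanically; a minimal sketch:

f[x_] = µ x - x^3;
Simplify[f[-x] == -f[x]]
(* True: the right-hand side is an odd function of x *)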


In[83]:= Plot[{µ, 0}, {µ, -2, 2},
   (* the two branches xC = µ and xC = 0 of critical points of (9.2); first argument reconstructed *)
   PlotStyle -> {{Blue, Thick}, {Dashed, Red, Thick}}, AspectRatio -> 1, AxesLabel -> {"µ", "xC"},
   Axes -> {False, True}, BaseStyle -> {FontSize -> 15},
   Epilog -> {Disk[{0, 0}, 0.03],
     Text["unstable", {-1, 0.1}],
     Text["stable", {1, 0.1}],
     Text[Style["µ", FontSize -> 15], {1.9, -0.1}]}]

Out[83] (figure 9.2): bifurcation diagram of the transcritical bifurcation. The branches xC = 0 and xC = µ cross at the bifurcation point µ = 0, where they exchange stability.

The critical point xC = 0 exists for every value of µ. Linearization of system (9.4) yields

ẋ = µ x,

and so this critical point is stable for µ < 0 and unstable for µ > 0.

For µ > 0 there are two other critical points xC = ±√µ. By linearization we find

ẋ = −2µ (x ∓ √µ),


which shows that (ignoring the constant term as in the previous section) both critical points ±√µ are stable. Indeed, µ > 0 and hence the factor multiplying x is always −2µ < 0. All possibilities are plotted in figure 9.3.

In[99]:= xC1[µ_ /; µ <= 0] = 0;
   xC2[µ_ /; µ > 0] = Sqrt[µ];
   xC3[µ_ /; µ > 0] = -Sqrt[µ];
   xC4[µ_ /; µ > 0] = 0;

In[104]:= Plot[{xC1[µ], xC2[µ], xC3[µ], xC4[µ]}, {µ, -2, 2},
   (* each xCi is defined only on its branch, so Plot draws only the existing pieces *)
   PlotStyle -> {{Blue}, {Blue}, {Blue}, {Red, Dashed}},
   AspectRatio -> 1, AxesLabel -> {"µ", "xC"},
   Axes -> {False, True}, BaseStyle -> {FontSize -> 15},
   Epilog -> {Disk[{0, 0}, 0.03],
     Text["stable", {-1, 0.1}],
     Text["unstable", {1, 0.1}],
     Text[Style["µ", FontSize -> 15], {1.9, -0.1}]}]

Out[104] (figure 9.3): bifurcation diagram of the supercritical pitchfork bifurcation of system (9.4): the branch xC = 0 is stable for µ ≤ 0 and unstable (dashed) for µ > 0, and the two stable branches ±√µ exist for µ > 0.


The dynamical system

ẋ = µ x + x³

exhibits the so-called subcritical pitchfork bifurcation: the reader can verify by the same standard analysis that the bifurcation diagram for this system is correctly depicted in figure 9.4. Here the critical point xC = 0 is again stable for µ < 0 and unstable for µ > 0, while the two symmetric critical points ±√(−µ) exist for µ < 0 and are unstable.

9.4 Example

Now let us see a non-trivial example of a pitchfork bifurcation. Let the system be

ẋ = µ x + y + sin x,  ẏ = x − y.  (9.5)

Our task is to determine the bifurcation point and the type of bifurcation. We will use Mathematica to carry out particular steps.

First we find the critical points by setting ẋ = 0 and ẏ = 0. The second equation immediately gives y = x and hence the equation for x reads

µ x + x + sin x = 0.  (9.6)

Clearly, a general solution cannot be found analytically but we can see that for

arbitrary µ there is always a solution

xC = yC = 0.

Let us determine the character of this critical point. The Jacobi matrix of system (9.5) evaluated at the origin is

J = ( µ + 1    1 )
    (   1     −1 )    (9.7)

In[30]:= J = {{µ + 1, 1}, {1, -1}};   (* the Jacobi matrix (9.7) *)

In[31]:= sys = Eigenvalues[J]

Out[31]= {1/2 (µ - Sqrt[8 + 4 µ + µ^2]), 1/2 (µ + Sqrt[8 + 4 µ + µ^2])}


In[11]:= xC1[µ_ /; µ <= 0] = 0;
   xC2[µ_ /; µ < 0] = Sqrt[-µ];
   xC3[µ_ /; µ < 0] = -Sqrt[-µ];
   xC4[µ_ /; µ > 0] = 0;

In[17]:= Plot[{xC1[µ], xC2[µ], xC3[µ], xC4[µ]}, {µ, -2, 2},
   PlotStyle -> {{Blue}, {Blue}, {Blue}, {Red, Dashed}},
   AspectRatio -> 1, AxesLabel -> {"µ", "xC"},
   Axes -> {False, True}, BaseStyle -> {FontSize -> 15},
   Epilog -> {{PointSize[Large], Point[{0, 0}]},
     Text["stable", {-1, 0.1}],
     Text["unstable", {1, 0.1}],
     Text[Style["µ", FontSize -> 15], {1.9, -0.1}]}]

Out[17] (figure 9.4): bifurcation diagram of the subcritical pitchfork bifurcation of the system ẋ = µx + x³: the branch xC = 0 is stable for µ ≤ 0 and unstable (dashed) for µ > 0, and the two branches ±√(−µ) exist for µ < 0.


Although the signs of the eigenvalues could be analysed by hand as functions of µ, it is even easier to use Mathematica to plot the dependence of λ1 and λ2 on µ.

In[14]:= λ1[µ_] = sys[[1]]
In[15]:= λ2[µ_] = sys[[2]]
In[16]:= Plot[{λ1[µ], λ2[µ]}, {µ, -10, 10}, PlotStyle -> {Blue, Red}]

Out[14]= 1/2 (µ - Sqrt[8 + 4 µ + µ^2])

Out[15]= 1/2 (µ + Sqrt[8 + 4 µ + µ^2])

Out[16]: plot of λ1 (blue) and λ2 (red) as functions of µ on the interval (−10, 10); λ1 stays negative everywhere, while λ2 crosses zero at µ = −2.

Hence, for all values of µ we have λ1 < 0, while λ2 changes sign at µ = −2. That means that for µ < −2, when both eigenvalues are negative, the critical point is a stable node. For µ > −2, the critical point is a saddle point because the eigenvalues have different signs.
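The sign change of λ2 can also be confirmed symbolically; a small sketch:

Reduce[(µ + Sqrt[8 + 4 µ + µ^2])/2 > 0, µ, Reals]
(* µ > -2: the second eigenvalue is positive exactly for µ > -2 *)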

Clearly, the point µ = −2 is a candidate for being a bifurcation point. Since we cannot solve equation (9.6) exactly, we restrict our attention to the neighbourhood of the potential bifurcation point µ = −2. The critical points are the roots of the function

Rµ(x) = (µ + 1) x + sin x.

In figure 9.5 we plot this function for three values of µ. We can see that critical points different from the origin appear only for µ > −2. The approximate location of these critical points can be found by expanding the function sin x in (9.6) up to the third order,

sin x ≈ x − (1/3!) x³,


so that equation (9.6) becomes

x (µ + 2) − (1/6) x³ = 0.

One solution is, of course, x = 0; the other two are

x = ±√(6(µ + 2)).  (9.8)
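The expansion can be checked directly in Mathematica; a small sketch:

Series[(µ + 1) x + Sin[x], {x, 0, 3}] // Normal
(* (2 + µ) x - x^3/6, in agreement with the equation above *)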

Fig. 9.5. Plot of the function Rµ(x) = (µ + 1) x + sin x for µ = −2.1, µ = −2 and µ = −1.9. Its roots are the critical points of system (9.5). For µ ≤ −2, the origin x = 0 is the only critical point; for µ > −2 there are two further critical points symmetric about the origin.

Now we can determine the character of the bifurcation point even without analysing the new critical points. Recall that the origin is a critical point, stable for µ < −2 and unstable for µ > −2. The new critical points emerge at the bifurcation point and exist for µ > −2. Hence, the bifurcation diagram is similar to that in figure 9.3. We can deduce that the bifurcation is supercritical and the two new critical points are stable.

In Mathematica we can easily find the precise locations of the critical points numerically using the function FindRoot. This function needs a starting point, and we choose this starting point to be the approximate solution (9.8). Full Mathematica code for plotting the correct bifurcation diagram in the neighbourhood of the bifurcation point µ = −2 is shown in figure 9.6.
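For a single value of the parameter, say µ = −1.9, the idea looks as follows (a sketch, with the starting point taken from (9.8)):

FindRoot[-0.9 x + Sin[x] == 0, {x, Sqrt[0.6]}]
(* {x -> 0.787...}, close to the starting estimate Sqrt[0.6] ≈ 0.775 *)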


In[218]:= cp[µ_] := x /. FindRoot[(µ + 1) x + Sin[x] == 0, {x, Sqrt[6 (µ + 2)]}];
   (* reconstructed definition: numerical root of (9.6), started from the approximation (9.8) *)

In[229]:= xC1[µ_ /; µ <= -2] = 0;
   xC2[µ_ /; µ > -2] = 0;
   xC3[µ_ /; µ > -2] := cp[µ];
   xC4[µ_ /; µ > -2] := -cp[µ];

In[262]:= Plot[{xC1[µ], xC2[µ], xC3[µ], xC4[µ]},
   {µ, -3, -1}, PlotStyle -> {{Blue}, {Red, Dashed}, {Blue}, {Blue}},
   AspectRatio -> 1, AxesLabel -> {"µ", "xC"},
   Axes -> {False, True}, BaseStyle -> {FontSize -> 15},
   Epilog -> {Disk[{0, 0}, 0.03],
     Text["stable", {-2.5, 0.2}],
     Text["stable", {-1.5, 2.5}],
     Text["unstable", {-1.2, 0.2}],
     Text[Style["µ", FontSize -> 15], {1.9, -0.1}],
     {PointSize[Large], Point[{-2, 0}]},
     Text[Style["µ=-2", FontSize -> 15], {-1.8, 0.2}]}]

Out[262] (figure 9.6): bifurcation diagram of system (9.5) in the neighbourhood of the bifurcation point µ = −2: a supercritical pitchfork with the stable branch xC = 0 for µ ≤ −2, the unstable (dashed) branch xC = 0 for µ > −2, and two stable branches located numerically by FindRoot.

A

Important commands in Mathematica

A.1 D-derivative

Derivatives in Mathematica can be computed in several ways. Command of the form

D[f, x]

differentiates function f with respect to variable x. If we need the n-th order derivative

of f , we use

D[f, {x, n}]

Similarly, mixed partial derivatives with respect to several variables can be calculated by

D[ f, x, y ]

which is the equivalent of

∂²f / (∂x ∂y)

For example, commands

D[ Sin[x^2], x]

D[ x^3, {x, 2} ]

D[ y x^2 + x y^2, x, y]

are the equivalents of the mathematical expressions

d/dx sin x²,   d²/dx² x³,   ∂²/(∂x ∂y) (y x² + x y²)

and produce the following output


2 x Cos[x^2]

6 x

2 x + 2 y

A.2 Table

The command Table[...] creates one-dimensional or higher-dimensional lists of elements. A one-dimensional list can be created by

Table[ expr, {i, imin, imax} ]

where expr is some expression depending on the variable i. The command Table successively substitutes the values of i into the expression expr and produces a list of expressions. For example, the command

squares = Table[ i^2, {i, 1, 5} ]

produces a list

{1, 4, 9, 16, 25}

which is now stored in variable squares. In order to access individual elements of the

list, use the double square brackets [[ and ]]. For example, the third element of the list

squares can be accessed via

squares[[ 3 ]]

which returns

9.
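Higher-dimensional lists (e.g. matrices) are created by adding further iterators; a small sketch:

m = Table[ i + j, {i, 1, 3}, {j, 1, 3} ]
(* {{2, 3, 4}, {3, 4, 5}, {4, 5, 6}} *)
m[[2, 3]]
(* 5, the element in the second row and third column *)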

B

Some features of Mathematica

B.1 Rule-based replacement

One of the most powerful tools in Mathematica is the rule-based replacement. We start with a simple example. Suppose we have the trivial expression

y

and we want to replace symbol y by some more complicated expression, say y = x2 .

Let us write

y /. y-> x^2

In the previous code, symbol /. means that we are going to use some rules of replace-

ment. The rule itself is

y -> x^2

and says that any occurrence of the symbol y will be replaced by the expression x². This can

be useful when the expression is more complicated. Example:

x + y^2 - 1/y /.y->x^2

will replace all occurences of symbol y in expression x + y 2 − 1/y by x2 , so that the

result is

-(1/x^2) + x + x^4

We can define the list of rules as well. Imagine we want to replace simultaneously

x and y in some expression, for example, we want to replace x by x − 1 and y by

1 − y 2 in expression x2 + y 2 :

x^2 + y^2 /. { x-> x-1, y -> 1-y^2 }

which yields

(-1 + x)^2 + (1 - y^2)^2

Sometimes it is useful to define the rules separately in order to increase the readability of the code. The previous example is equivalent to the following:

rules = { x-> x-1, y -> 1-y^2 };

x^2 + y^2 /. rules

B.2 Functions

In Mathematica you can define functions of any type and there are many features to

be covered. Here we discuss only what is necessary for the purposes of our textbook.

A function of one or more variables is defined according to the scheme

func_name [ var1_, var2_, ... ] = expr

where func_name is the name of the new function. In square brackets you have to enumerate all variables which the function depends on. Notice the underscore symbol after the name of each variable. Assignment is performed via the traditional symbol =.

Finally, on the right hand side there is an expression for the function.

For example, you can define the function f = 3x e^(−x²) as

f[ x_ ] = 3 x Exp[-x^2];

Now you can evaluate it at some point, say 10, by

f[10]

which yields

30/E^100.

If you need numerical value, type

f[10] //N

to find the result 1.11602 × 10^(−42).

Let us see an example of function of more variables.

f[ x_, y_, z_ ] = x^2 + y^2 + z^2

To evaluate this function at some point, say (1, 2, 3), type

f[1, 2, 3]

to get number 14.

B.3 Pure functions

Pure functions are very useful constructions in Mathematica. In mathematics there is a difference between f and f(x), although these symbols are (in some contexts) used as equivalent. The symbol f is a function of, say, one variable x, which means that it maps a real number to a real number, mathematically

f : ℝ → ℝ.

On the other hand, symbol f (x) is a value of function f at point x. More precisely,

f is a set of ordered pairs (x, y) such that there is only one y for each x. If a pair

(x, y) is an element of f, i.e. (x, y) ∈ f, we usually write

y = f (x).

Thus, f is a set of ordered pairs of real numbers, while f (x) is the single real number

meaning the value of f at point x.

Let us turn back to Mathematica. When you write, for example,

f[x_] = 1 + x^2

you tell Mathematica that the value of function f at point x is f (x) = 1 + x2 . But

the name of the argument is irrelevant, for if you write

f[q_] = 1 + q^2

you define exactly the same function! The name of argument is only formal. The

alternative is to use the pure function.

Consider the following definition:

f = Function[ 1 + #^2 ]

Here we do not use the names of arguments. The hash symbol # stands for the argument of the function regardless of its name. You can verify that the function f defined in this way

behaves as function f[x] or f[q] defined above. Similarly, you can define function of

more variables by

f = Function[ #1^2 + #2^2 ]

where symbols #1 and #2 stand for the first and the second argument, respectively.

Calling

f[x,y]


now yields

x² + y²,

calling

f[1, 3]

yields number 10.

A pure function can be defined without using the command Function, by the symbol &. The following three lines are equivalent:

f[x_] = 1 + x^2

f = Function[ 1+#^2 ]

f = (1 + #^2)&

The notation with the symbol & is particularly useful if we need to use the function at one place only and do not need it later. Then it is unnecessary to define the function separately. For example, suppose that you are given a list

list = { 1, 2, 3, 4, 5 };

and you want to apply function f (x) = 1 + x2 to each element of list. We can use

operator /@:

(1+#^2)& /@ list

Here we defined a pure function (1 + #^2)& which, as we have seen, is an abstract way of defining the function 1 + x². The operator /@ now substitutes each element of list into

this pure function and produces a list

{2, 5, 10, 17, 26}.

B.4 Expressions

Anything you type in Mathematica is called an expression, and expressions can be divided into two groups, atomic and composed. Atomic expressions are the simplest elements, e.g. numbers or functions. Each expression has the so-called head, which can

be found using function Head. For example, try the following code:

Head[2]

Head[4.5]

Head[2 + 3 I ]

Mathematica returns the “values”

Integer
Real
Complex

which means that 2 was recognized as an integer, 4.5 as a real, and 2 + 3i as a complex

number. Mathematica’s power rests in its ability to work with symbolic expressions.

If the atomic expression is not identified as a number, its head is Symbol. Verify this

fact for:

Head[x]

Head[Sin]

Head[f]

etc.

Atomic expressions we have seen above can be combined into composed expres-

sions. For example, symbols x and y are atomic expressions, but their sum x + y is

a composed expression. The head of the expression x+y is Plus (check!). If you want to access particular parts of a composed expression, you can use the function Part. For example,

Part[ x+y, 2 ]

returns the second part of composed expression x + y which is y. We can also list all

parts of the composed expression by Level:

Level[ x+y-z+b, 1 ]

yields {b, x, y, -z}. Try the following:

Head[x + y]

Head[x y]

Head[ x^y ]

You can see that Mathematica returns Plus, Times, Power. If, however, you type

Head[x-y]

Mathematica returns Plus again (did you expect “minus”?). The reason is obvious,

for if we type

Level[ x-y, 1 ]

Mathematica returns {x, -y}. Thus, Mathematica treats expression x − y as a sum

of x and −y. Typing

Head[-y]

Level[-y,1]

yields


Times

{-1, y}

Therefore, −y is a product of −1 and y. The reader is invited to experiment with several expressions in order to get a feeling for the structure of Mathematica expressions.

The head of an arbitrary expression can be replaced without changing the structure of the expression. For example, the expressions

x + y + z

x y z

{x, y, z}

all have the same structure and differ only by the head. We can verify that by the functions

Head and Level. These functions reveal that the head of the first expression is Plus,

the head of the second one is Times and the head of the third expression is List.

Nevertheless, calling the function Level shows that the structure of all expressions is

{x, y, z}.

Therefore, by changing the head, we can easily convert these expressions into one another. The head of the expression can be changed by the function Apply, as in the

following example:

Apply[ Plus, {x, y, z} ]

turns the list {x,y,z} into expression x+y+z. The same operation can be written in

an abbreviated form as

Plus @@ {x, y, z}

The head Plus is applied by operator @@ to the list on the right hand side.

C

Shortcuts in Mathematica

Greek letters can be typed in several ways. The most convenient is to use the following

table:

α ESC a ESC ι ESC i ESC σ ESC s ESC

β ESC b ESC κ ESC k ESC τ ESC t ESC

γ ESC g ESC λ ESC l ESC φ ESC f ESC

δ ESC d ESC µ ESC m ESC χ ESC c ESC

ε ESC e ESC ν ESC n ESC ψ ESC y ESC

ζ ESC z ESC ξ ESC x ESC ω ESC w ESC

η ESC h ESC π ESC p ESC

θ ESC q ESC ρ ESC r ESC

For example, to type α just press the Escape key, then type a and press Escape again. Mathematica will automatically display the symbol α. Another way is to use the following table of full names:

α \[Alpha] ι \[Iota] σ \[Sigma]
β \[Beta] κ \[Kappa] τ \[Tau]

γ \[Gamma] λ \[Lambda] φ \[Phi]

δ \[Delta] µ \[Mu] χ \[Chi]

ε \[Epsilon] ν \[Nu] ψ \[Psi]

ζ \[Zeta] ξ \[Xi] ω \[Omega]

η \[Eta] π \[Pi]

θ \[Theta] ρ \[Rho]

D

To do

• Rotation matrices

• Full analysis of chaotic pendulum

• Matrix eigenvalues

• Volterra-Lotka equations

• Pictures on Lyapunov stability

• More coordinate systems
