The principle of least action originates in the idea that Nature has a purpose and
thus should follow a minimum or critical path. This basic principle, with its variants
and generalizations, applies to optics, mechanics, electromagnetism, relativity and
quantum mechanics, and provides a guide to understanding the beauty of physics.
This text provides an accessible introduction to the action principle across these
various fields of physics and examines its history and fundamental role in science.
It includes explanations from historical sources, discussions of classic papers, and
original worked examples.
Different sections require different levels of mathematical sophistication. How-
ever, the main story line is accessible not only to researchers and students in physics
and the history of physics, but also to those with a more modest mathematical
background.
ALBERTO ROJO is an Associate Professor at Oakland University. He is a Fulbright
Specialist in Physics Education and he was awarded the Jack Williams
Endowed Chair in Science and Humanities from the University of Eastern New
Mexico. His research focuses primarily on theoretical condensed matter, and he
has previously published popular science books.
ALBERTO ROJO
Oakland University, Michigan
ANTHONY BLOCH
University of Michigan
University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi – 110025, India
79 Anson Road, #06–04/06, Singapore 079906
www.cambridge.org
Information on this title: www.cambridge.org/9780521869027
DOI: 10.1017/9781139021029
© Alberto Rojo and Anthony Bloch 2018
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2018
Printed in the United Kingdom by Clays, St Ives plc
A catalogue record for this publication is available from the British Library.
Library of Congress Cataloging-in-Publication Data
Names: Rojo, Alberto G., author. | Bloch, Anthony, author.
Title: The principle of least action : history and physics / Alberto Rojo
(Oakland University, Michigan), Anthony Bloch (University of Michigan).
Description: Cambridge, United Kingdom ; New York, NY :
Cambridge University Press, 2018.
| Includes bibliographical references and index.
Identifiers: LCCN 2017023575| ISBN 9780521869027 (hardback ; alk. paper) |
ISBN 0521869021 (hardback ; alk. paper)
Subjects: LCSH: Least action. | Variational principles. | Mechanics. |
Lagrange equations. | Hamilton-Jacobi equations.
Classification: LCC QA871 .R65 2017 | DDC 530.1–dc23
LC record available at https://lccn.loc.gov/2017023575
ISBN 978-0-521-86902-7 Hardback
Cambridge University Press has no responsibility for the persistence or accuracy
of URLs for external or third-party internet websites referred to in this publication
and does not guarantee that any content on such websites is, or will remain,
accurate or appropriate.
Contents
1 Introduction
We thank Pablo Amster for useful input, Michael V. Berry for very valuable com-
ments on a preliminary version of our manuscript, Danilo Capecchi for answering
our questions on the history of virtual work and d’Alembert’s principle, Olivier
Darrigol for comments on McCullagh’s theory of elasticity, David Garfinkle for
his suggestions on the relativity chapter, Ursula Goldenbaum for her remarks on
Maupertuis’s controversy, William Kentridge for his wonderful cover illustration,
Mario Mariscotti for his feedback, Guillermo Martínez for useful comments and
for his input on the cover, Jeffrey K. McDonough for helpful comments on Leib-
niz, Luis Navarro Veguillas for remarks on the adiabatic principle in quantum
physics, Bernardino Orio de Miguel for sending us his translation from Latin of
the Bernoulli-Leibniz correspondence, Peter Pesic for useful discussions, Roshdi
Rashed for comments on Islamic science, Jeffrey Rauch for discussions on Huy-
gens’ principle, Ignacio Silva for references on Aristotle, and Alejandro Uribe for
his comments on optics and saddle paths.
1
Introduction
The idea of writing a book on the principle of least action came to us after many
conversations over coffee, while we pondered ways of communicating to students
the ideas of mechanics with an historical flavor. We chose the principle of least
action because we think that its importance and aesthetic value as a unifying idea
in physics is not sufficiently emphasized in regular courses. To the general public,
even to those interested in science at a popular level, the beautiful notion that the
fundamental laws of physics can be expressed as the minimum (or an extremum) of
something often seems foreign. Nature loves extremes. Soap films seek to minimize
their surface area, and adopt a spherical shape; a large piece of matter tends to
maximize the gravitational attraction between its parts, and as a result the planets
are also spherical; light rays refracting in a glass window bend and follow the path
of least time; the orbits of the planets are those that minimize something called
the “action;” and the path that a relativistic particle chooses to follow between two
events in space-time is the one that maximizes the time measured by a clock on the
particle.
Our initial intention was to write a popular book, but the project morphed into
a more technical presentation. Nevertheless we have tried to keep sophisticated
mathematics to a minimum: nothing more than freshman calculus is needed for
most of the book, and a good part of the book requires only high school algebra.
Some familiarity with differential equations would be useful in certain sections.
While the different sections have various levels of difficulty, the book does not need
to be read in a linear fashion. It is quite feasible to browse through this book, as
most of the chapters and many of the sections are relatively self-contained. Sections
and subsections that are a bit more technical and that can easily be omitted on a
first reading include 1.1, 2.5, 3.2.2, 3.2.5, 4.3, 5.7, 6.2 to 6.6, 7.7 and 8.8. These
are marked in the text with an asterisk.
The gold standards on the topic of our book are The Variational Principles
of Mechanics by Cornelius Lanczos and Variational Principles in Dynamics and
Quantum Theory by Wolfgang Yourgrau and Stanley Mandelstam. Our book can
be regarded as a supplement to these two masterpieces, with expositions that follow
the historical development of minimum principles, some elementary examples, an
invitation to read the primary sources, and to appreciate science, in the words of
Isidor Isaac Rabi, as a “human endeavor in its historic context, . . . as an intellectual
pursuit rather than as a body of tricks.”
The metaphysical roots of the least action principle are in Aristotle’s statement
from De caelo and Politics: “Nature does nothing in vain.” If there is a purpose in
Nature, she should follow a minimum path. At least that is the notion pursued by
Hero of Alexandria in the first century AD to deduce the law of reflection: light
follows the path that minimizes the travel time. Later, in 1657, Pierre de Fermat
extended this idea to the refraction of light rays. “There is nothing as probable or
apparent,” says Fermat, “as the assumption that Nature always acts by the easiest
means, which is to say either along the shortest lines when time is not a considera-
tion, or in any case by the shortest time.” The Arabic astronomer, Ibn al-Haytham,
also uses the principle of “the simplest way” to explain refraction. Galileo, in pos-
tulating the uniform acceleration of freely falling bodies, in 1638, also echoes
Aristotle: “we have been led by the hand to the investigation of naturally acceler-
ated motion by consideration of the custom and procedure of Nature herself in all
her other works, in the performance of which she habitually employs the first, sim-
plest, and easiest means.” In 1746, Pierre Louis Moreau de Maupertuis postulated
the principle of least action. His proposal, based on metaphysical and religious
views, reflected his adherence to notions of simplicity that had guided Fermat and
Galileo: “Nature, in the production of its effects,” he wrote, “does so always by the
simplest means.” More specifically: “in Nature, the amount of action (la quantité
d’action) necessary for change is the smallest possible. Action is the product of the
mass of a body times its velocity times the distance it moves.” His formulation was
vague, but, in the hands of Leonhard Euler, it later became a well-formulated principle.
Gottfried Leibniz used similar (but not identical) ideas to study refraction of
light. Leibniz’s idea is of a “most determined” path and this reflects “God’s inten-
tions to create the best of all possible worlds.” “This principle of Nature,” he says
in his Tentamen Anagogicum, “is purely architectonic,” and then he adds: “Assume
the case that Nature were obliged in general to construct a triangle and that, for this
purpose, only the perimeter or the sum were given and nothing else; then Nature
would construct an equilateral triangle.”
The formulation of mechanics in terms of minimum principles originates in the
optical mechanical analogy first used by John Bernoulli to solve the “brachis-
tochrone problem:” what path between two fixed points in a vertical plane does
a particle follow in order to minimize the time taken? Bernoulli maps the prob-
lem to that of a light ray refracting in a medium of varying index of refraction,
where light follows the path of least time. The mapping between mechanics and
optics becomes an isomorphism with Maupertuis’s formulation. The minimization
of action for a particle and the minimization of time for a light ray become the
same mathematical problem provided the index of refraction is identified with the
momentum of the particle: the paths are isomorphic. The principle of least action
then may be viewed as an alternative and equivalent formulation to Newton’s laws
of motion.
In the Age of Enlightenment, Newton’s ideas were extended to incorporate con-
straints in mechanical systems. The key figures are James Bernoulli, Jean le Rond
d’Alembert, and Joseph-Louis Lagrange. The central concept for these develop-
ments is the principle of virtual work, which establishes the conditions of static
equilibrium and its extension to dynamics. The work of Lagrange, starting in 1760,
is of supreme importance. For a constrained system with r degrees of freedom
(for example, a particle constrained to move on the surface of a sphere has r = 2
since at a given position it can move in only two directions), he is able to express
the dynamics in terms of a single function L (the Lagrangian) through r equa-
tions identical in structure. Lagrange’s equations can be derived from a minimum
principle, giving rise to an expanded version of the principle of least action: Mau-
pertuis’s minimum principle gives the path between two points in space for a fixed
value of the energy, while Lagrange’s integral gives the path that takes a given
time t between two fixed points in space. Lagrange’s ideas were extended, start-
ing in the 1820s, by William Rowan Hamilton (and also by Carl Jacobi). Hamilton
and Jacobi put the optical mechanical analogy in a broader conceptual frame: the
end points of paths that emanate from a given origin at t = 0 (each path being a
minimum of the Lagrangian action) create, at a later time t, a “wave-front” that
propagates. This wave-front is a surface that intersects the particle trajectories (just
like a wave-front for light is perpendicular to the light rays) but does not include
interference or diffraction effects peculiar to waves. However, it invites a “natu-
ral” question: if light rays are the small wavelength limit of wave optics, what is
the wave theory of particles whose small wavelength limit gives the particle tra-
jectories? Hamilton did not have an experimental reason to entertain the question
in the mid-nineteenth century, but the answer came in the 1920s with Louis de
Broglie’s and Erwin Schrödinger’s quantum theory of wave mechanics. In 1923
de Broglie wrote: “Dynamics must undergo the same evolution that optics has
undergone when undulations took the place of purely geometrical optics,” and in
1926 Schrödinger considered the “general correspondence which exists between
the Hamilton-Jacobi differential equation and the ‘allied’ wave equation.” In 1942
Richard Feynman established an even deeper connection between least action and
quantum physics: a quantum particle, in propagating between two fixed points in
space and time, does not follow a single path but all possible paths “at the same
time.” The contribution of each path to the total propagation is the (complex)
exponential of Hamilton’s action.
The fact that many fundamental laws of physics can be expressed in terms of
the least action principle (with the appropriate action) led Max Planck to say that,
“Among the more or less general laws which manifest the achievements of physical
science in the course of the last centuries, the principle of least action is probably
the one which, as regards form and content, may claim to come nearest to that
final ideal goal of theoretical research.” And Arthur Eddington, in 1920, wrote:
“the law of gravitation, the laws of mechanics, and the laws of electromagnetic
fields have all been summed up in a principle of least action. . . . Action is one
of the two terms in pre-relativity physics which survive unmodified in a descrip-
tion of the absolute world. The only other survival is entropy.” Although Einstein
didn’t follow a least action approach in his theories of relativity (special and gen-
eral), Max Planck, in 1907, in the first relativity paper not written by Einstein,
formulated the dynamics of the special theory in terms of the least action principle.
One of the most interesting applications of the least action principle is the deriva-
tion, by David Hilbert, of the field equations of general relativity. Hilbert knew,
from Einstein, that the relativistic theory of gravitation had to involve the curva-
ture of a four-dimensional space-time. Einstein had struggled for eight years and
he had eventually arrived at the solution by analyzing the properties of the field
equations themselves. Hilbert followed the approach of the least action principle,
guessed the “most natural” Lagrangian and, in 1915, derived the field equations
before Einstein.
Our purpose in writing this book is to tell the above stories with some math-
ematical rigor while staying as close as possible to the sources. Chapter 2 visits
some ancient incarnations of minimum principles before moving on to Galileo’s
curve of swiftest descent and Fermat’s precalculus ideas. We also include New-
ton’s calculation of the solid of least resistance, which anticipates the calculus of
variations used in the principle of least action. In chapter 3 we take an excursion
to Newton’s Principia, even though this work is not directly related to variational
principles. We do so for two reasons: the monumental importance of this work on
mechanics, and the fact that Newton’s ideas are crucial in the development of the
principle of least action. Chapter 4 tells the story of the optical mechanical analogy
and the true beginnings of variational principles. In chapter 5 we visit the principle
of virtual work and Lagrange’s equations. Here we point out that the principle of
least action fails to give the dynamics of nonholonomic systems, where the con-
straints are expressed in terms of the possible motions rather than in terms of the
possible configurations. Chapter 5 and the ones that follow require familiarity with
calculus. In writing chapter 6, we decided to follow Hamilton’s crucial papers as
closely as possible, making some sections of this chapter perhaps less accessible
These verses refer to the legend of Queen Dido who fled her home because her
brother, Pygmalion, had killed her husband and was plotting to steal all her money.
She ended up on the north coast of Africa, where she was given permission to rule
over whatever area of land she was able to enclose using the hide of only one bull.
She cut the hide into thin strips, tying them together to form the longest loop she
could make, in order to enclose the largest possible kingdom. Queen Dido seems to
have discovered how to use this loop to maximize the area of her kingdom: using
straight coastline as her side border, she enclosed the largest area of land possible
by placing the loop in the shape of a semi-circle.
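As a rough numerical illustration of Dido's choice, the sketch below compares the semicircle with the best rectangle that a loop of the same length can enclose against a straight coast. The function names and the unit rope length are our own choices for illustration; they are not taken from any historical source.

```python
import math

def semicircle_area(rope_length):
    """Area enclosed by a rope bent into a semicircle against a straight coast."""
    radius = rope_length / math.pi  # the arc length pi * r equals the rope length
    return 0.5 * math.pi * radius ** 2

def best_rectangle_area(rope_length):
    """Largest rectangle whose fourth side is the coast (rope forms three sides)."""
    depth = rope_length / 4.0            # maximizes depth * (L - 2 * depth)
    width = rope_length - 2.0 * depth
    return depth * width

rope = 1.0  # hypothetical rope length, arbitrary units
print("semicircle:", semicircle_area(rope))
print("rectangle :", best_rectangle_area(rope))
```

For a rope of length L the semicircle encloses L²/(2π), while the best rectangle encloses only L²/8.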
Queen Dido’s story is now the emblem of the so-called isoperimetric problem:
for a fixed perimeter, determine the shape of the closed, planar curve that encloses
the maximum area. The answer is the circle. Aristotle, in De caelo, while dis-
cussing the motion of the heavens, displays some knowledge or intuition of this
result (Aristotle, 350 BC/1922, Book II):
Again, if the motion of the heavens is the measure of all movement . . . and the minimum
movement is the swiftest, then, clearly, the movement of the heavens must be the swiftest
of all movements. Now of the lines which return upon themselves the line which bounds
the circle is the shortest; and that movement is the swiftest which follows the shortest line.
2.1 Queen Dido and the Isoperimetric Problem
However, a common assumption in ancient times was that the area of a figure is
determined entirely by its perimeter (Gandz, 1940). For example, Thucydides, the
great ancient historian, estimated the size of Sicily from its circumnavigation time,
which is proportional to the perimeter (Thucydides, 431 BC, Book VI):
For the voyage round Sicily in a merchantman is not far short of eight days; and yet, large
as the island is, there are only two miles of sea to prevent its being mainland.
The confusion persisted even up to the times of Galileo, who expresses the
problem in Sagredo’s voice (Galilei, 1638/1974, p. 61):
people who lack knowledge of geometry . . . make the error when speaking of surfaces; for
in determining the size of different cities, they often imagine that everything is known when
the lengths [quantità] of the city boundaries are given, not knowing that one boundary
might be equal to another, while the area contained by one be much greater than that in the
other.
Figure 2.1 Adapted from Zenodorus. The equilateral polygon encloses a larger
area than any irregular polygon with the same perimeter and the same number of
sides.
Figure 2.2 From Zenodorus. Among the triangles on a fixed base AB and fixed
perimeter AB + a + b, the isosceles triangle ADB has the largest area. Triangle
ADB, whose legs have length (a + b)/2, has the same perimeter as the starting
(scalene) triangle ACB. Prolong AD to F so that AD = DF. The dashed line
DE, parallel to AB, is a “line of symmetry:” the segment DB is the reflection
of DF on the “mirror” DE, and all points below the line DE are closer to B
than to F. Now consider the triangle ACF and use the “triangular inequality”
(in any triangle the sum of any two sides is greater than the remaining side):
since AF = a + b, CF + a > a + b and the segment CF > b. Point C is therefore below DE; the
height of the triangle ACB is therefore smaller than the height of ADB. Since
both triangles have the same base, ADB has larger area.
C to D. Zenodorus shows that this reshaping increases the area. The specifics of the
proof (see Figure 2.2) draw from the repertoire of “conjuring tricks” of the Greek
geometers – the choreography of auxiliary lines and symmetric angles that reveal
sometimes unexpected and paradoxical relations. The process can be repeated for
all triangles of consecutive vertices of the polygon, allowing one to conclude that
the polygon enclosing the largest area is equilateral.
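The first step of Zenodorus's argument can be checked numerically. The sketch below is our own illustration, not Zenodorus's method: it uses Heron's formula, and the base length, the total leg length, and the scan step are arbitrary choices. It scans triangles with the same base and the same perimeter and reports which one has the largest area.

```python
import math

def triangle_area(base, side_a, side_b):
    """Heron's formula for a triangle with the three given side lengths."""
    s = 0.5 * (base + side_a + side_b)
    return math.sqrt(s * (s - base) * (s - side_a) * (s - side_b))

base = 4.0        # fixed base AB (hypothetical value)
legs_total = 6.0  # a + b is held fixed, so the perimeter is fixed

# Scan side a from 2.1 to 3.9; side b follows from the fixed perimeter.
candidates = [2.0 + 0.1 * k for k in range(1, 20)]
areas = {round(a, 1): triangle_area(base, a, legs_total - a) for a in candidates}
best_a = max(areas, key=areas.get)

print("largest area at a =", best_a, "b =", legs_total - best_a)
```

In this scan the maximum falls at a = b, the isosceles triangle, in agreement with the geometric proof.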
The second step is to show that the maximum polygon is also equiangular. This
Zenodorus proves by considering two consecutive triangles from the polygon (see
Figure 2.3). Zenodorus proves that, given two non-similar isosceles triangles, if we
construct, on the same bases, two similar triangles with the same total perimeter as
the first two triangles, then the sum of the areas of the similar triangles is greater
Figure 2.3 Adapted from Zenodorus. The area of a polygon can be increased by
making it equiangular.
Figure 2.4 Adapted from Zenodorus. The sum of the areas of two isosceles triangles
with different bases $AC = 2b_1$ and $CE = 2b_2$, but otherwise equal sides $a$,
is smaller than the sum of the areas of two similar triangles with the same bases
and equal total perimeter.
than the sum of the areas of the non-similar triangles. According to the account
by Heath (1921), Zenodorus’s proof is restricted. But we can show that a slight
modification makes it valid in general.
Let us start with the non-similar isosceles triangles ABC and CDE (see Figure 2.4).
Their bases are $AC = 2b_1$ and $CE = 2b_2$ respectively, their heights are
$h_1$ and $h_2$, and all the legs are of length $a$. Following Zenodorus’s logic, construct
the triangle $A\bar{B}C$, which is the “mirror” image of triangle ABC, the mirror being
along the line of the common bases.
Now construct the two similar triangles A B̄ C and C D̄ E with new heights h 1
and h 2 , keeping the total perimeter the same:
B D = 2a. (2.1)
Since the original triangles are not similar, the line $\bar{B}D$ joining their vertices is
shorter than $2a$:¹
$\bar{B}D < 2a$. (2.2)
1 This is due to the triangular inequality; the side $\bar{B}D$ is smaller than the sum of BC and CD.
Prehistory of Variational Principles
Figure 2.5 The area of a regular polygon of n sides (n = 5 in this case) is one half
the perimeter times the apothem h (the height of each of the n identical triangles
making the polygon).
The largest regular polygon of a given perimeter is the one with an infinite num-
ber of sides: the circle. And the area of that circle, as shown by Archimedes (see
Figure 2.7), is one half the area of the rectangle having the perimeter and radius of
the circle as its length and width respectively.
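The approach to the circle can be made concrete with a short computation. The sketch below is our own illustration with an arbitrary unit perimeter. It uses the rule that the area of a regular polygon is one half the perimeter times the apothem, and compares the result with Archimedes' value for the circle, one half the perimeter times the radius.

```python
import math

def regular_polygon_area(n_sides, perimeter):
    """Area of a regular n-gon: one half the perimeter times the apothem."""
    side = perimeter / n_sides
    apothem = side / (2.0 * math.tan(math.pi / n_sides))
    return 0.5 * perimeter * apothem

perimeter = 1.0  # hypothetical value, arbitrary units
# Circle of the same perimeter: radius = perimeter / (2 pi), area = perimeter * radius / 2.
circle_area = perimeter ** 2 / (4.0 * math.pi)

side_counts = (3, 4, 5, 6, 12, 100, 1000)
polygon_areas = [regular_polygon_area(n, perimeter) for n in side_counts]
for n, area in zip(side_counts, polygon_areas):
    print(n, area, circle_area - area)
```

For the side counts tried here the area grows with the number of sides and stays below the area of the circle.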
Zenodorus’s solution, as well as a very elegant proof from Steiner (1842), are
vulnerable to a subtle but important flaw: the shape of maximum area for a given
perimeter is assumed to exist (without proof). The fact that from a given n-gon we
can construct a new one enclosing a larger area does not guarantee that the area-maximizing
n-gon exists. As pointed out later by Weierstrass (1927), we could be
finding an upper bound and not the actual solution. Consider, for example (Blan-
chard and Brüning, 1982), the problem of finding the shortest curve C joining
points A and B with the restriction that C is perpendicular to the straight line AB
at both A and B (see Figure 2.8). If we call a the length of the straight line AB, we
can construct a curve connecting A and B that satisfies the constraint, consisting
of two arcs of a circle of radius $\epsilon$ plus a straight line. For each of these curves we
can find a smaller $\epsilon$ that shortens the length, but the limiting straight line of length
a is never reached.
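A minimal sketch of this example follows, under an assumption of ours about the construction: each arc is a quarter circle of radius eps, which makes the curve leave A and arrive at B perpendicular to AB, and the straight piece has length a - 2 eps. The total length is then a + (π - 2) eps, which can be made as close to a as we like but never equal to it.

```python
import math

def curve_length(a, eps):
    """Length of two quarter-circle arcs of radius eps joined by a straight segment."""
    if not 0.0 < eps <= a / 2.0:
        raise ValueError("eps must satisfy 0 < eps <= a / 2")
    arcs = 2.0 * (0.5 * math.pi * eps)  # two quarter circles
    straight = a - 2.0 * eps            # segment parallel to AB
    return arcs + straight

a = 1.0  # hypothetical distance between A and B
radii = [0.5, 0.1, 0.01, 0.001]
lengths = [curve_length(a, eps) for eps in radii]
for eps, length in zip(radii, lengths):
    print(eps, length, length - a)
```

Every admissible curve in this family is longer than a, so a is a lower bound on the length that no curve in the family attains.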
It turns out that for n-gons, the proof that one of maximum area exists is rel-
atively simple (Blåsjö, 2005; Courant and Robbins, 1996), since the area and the
perimeter are continuous functions of the 2n coordinates of its vertices, which can
be restricted to a “compact” set of points in a 2n-dimensional space. For example,
each point can be thought of as being inside a square. Weierstrass showed that a
continuous function (the area in our case) on a closed and bounded set is itself
bounded and attains its bounds. Zenodorus gave a beautiful proof but missed the
delicate proof of existence, and so Weierstrass usually gets the credit.
Figure 2.7 Archimedes’ proof that the area of a circle is half that of the rectangle
having the perimeter and radius of the circle as its length and width respec-
tively. A regular polygon with an infinite number of sides (of which we show
an approximation, with 15 sides) is a circle.
what moves with constant velocity follows a straight line. An example is an arrow which
we see shot from the bow. For, because of the forward moving force the moving body
strives to follow the shortest path since it cannot afford the time for a slower motion, that is
a longer path. The moving force does not allow such a delay. Thus the body tries to follow
the shortest path because of its speed, but between the same endpoints the shortest of all
lines is the straight line.³
For Hero, light propagates at a finite velocity, an assumption that goes back as
early as 490 BC, attributed to Empedocles of Agrigentum (in Sicily), two millennia
before the finite speed of light was verified by the Danish astronomer, Ole Rømer,
in 1676 (Hildebrandt and Tromba, 1985). The laws of reflection were known by the
Greeks before Hero. Euclid, in his Optics, states that light propagates in straight
lines and that the angle of incidence equals the angle of reflection. But Hero is
the first to derive the law from a minimization principle: he seeks the shortest path
between two points, subject to the condition of touching an intermediate point on
a plane. Hero uses the metaphysical principle of economy (Eastwood, 1970) to
derive a physical law, an approach at the core of the history of the principle of
least action. According to Damianus of Larissa (fourth century AD), author of On
the Hypotheses in Optics, Hero applies a principle that Aristotle mentions in many
places of his work: Nature does nothing in vain.⁴ If “Nature did not wish to lead
our sight in vain, she would incline it so as to make equal angles” (Damianus, 1897,
p. 21).
Hero considers the trajectory of a ray that connects points d and g and reflects
on the plane eh (see Figure 2.9). The model of vision for the Greeks was inspired
by a popular analogy between the sun and the eye: the light rays are “visual rays”
that originate in the eye, and the sensation of sight is produced when those linear
tentacles touch the object. Today we know that this model is wrong, but the geome-
try of these rays is maintained if we invert the direction of propagation, and Hero’s
treatment is valid. With an ingenuity that echoes Zenodorus’s proof in Figure 2.2,
Hero draws a line perpendicular to the plane eh through g and considers a sym-
metric point z such that ze = eg. For any point of reflection b, we have zb = bg
3 Translation from Pedersen (1993).
4 For example, in Politics, Aristotle says “Nature, as we often say, makes nothing in vain” (Aristotle, 350 BC,
Book I, Part II), and in De caelo we read “But God and Nature create nothing that has not its use” (Aristotle,
350 BC/1922, Book I, 4).
and the angles ∠zbe = ∠ebg. These equalities reduce the problem to finding the
shortest path between the initial point d and the reflected, auxiliary point z. The
answer is evident: the straight line dz that intersects the reflecting plane at a. Since
the angles ∠had and ∠zae are equal, and since, by the symmetrical construction
∠zae = ∠eag one concludes with Hero that
∠had = ∠eag; (2.9)
in other words, the angle of incidence is the same as the angle of reflection.
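Hero's construction can be checked with a few lines of code. In the sketch below the mirror lies along the line y = 0 and the coordinates chosen for d and g are arbitrary; these, like the function names, are our own assumptions for illustration. The reflection point obtained from the mirror image z is compared with a brute-force search for the shortest broken path.

```python
import math

def path_length(x, d, g):
    """Length of the broken path d -> (x, 0) -> g, with the mirror along y = 0."""
    return math.hypot(x - d[0], d[1]) + math.hypot(g[0] - x, g[1])

def reflection_point(d, g):
    """Hero's construction: where the straight line from d to the image of g meets y = 0."""
    z = (g[0], -g[1])                 # mirror image of g
    t = d[1] / (d[1] - z[1])          # fraction of the segment d -> z at which y = 0
    return d[0] + t * (z[0] - d[0])

d = (0.0, 3.0)  # hypothetical source point
g = (4.0, 1.0)  # hypothetical end point
x_a = reflection_point(d, g)

# Brute-force search over candidate reflection points on the mirror.
grid = [i * 0.001 for i in range(0, 4001)]
x_best = min(grid, key=lambda x: path_length(x, d, g))

# Angles between each ray and the mirror.
angle_in = math.atan2(d[1], x_a - d[0])
angle_out = math.atan2(g[1], g[0] - x_a)
print(x_a, x_best, math.degrees(angle_in), math.degrees(angle_out))
```

The brute-force minimum agrees with the constructed point to within the grid spacing, and the two rays make equal angles with the mirror there.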
Figure 2.10 The descent of a particle from A to C is faster through the arc of the
circle than through the straight line.
we have been led by the hand to the investigation of naturally accelerated motion by con-
sideration of the custom and procedure of Nature itself in all her other works, in the
performance of which she habitually employs the first, simplest, and easiest means . . . .
Thus when I consider that a stone, falling from rest at some height, successively acquires
new increments of speed, why should I not believe that those additions are made by the
simplest and most evident rule? (Galilei, 1638/1974, pp. 153–154).
Figure 2.11 The descent time of a particle falling from rest on an inclined plane
of length $D$ and height $H$ is proportional to $D/\sqrt{H}$.
$t_{\text{total}} = \sqrt{\frac{2}{g}} \times \frac{D}{\sqrt{H}}.$ (2.11)
Figure 2.12 Galileo’s law of chords. Left: Thales’ theorem: If A, B, and P are
points on a circle, and AB its diameter, the angle APB (equal to α + β) is a
right angle. Center: By similarity of the triangles ABP and PBQ, PB/BQ =
AB/PB: $(PB)^2/BQ$ is constant for a circle, and equal to the diameter AB. This
geometrical relation is the same one that appears in the time of fall on an inclined
plane: equation (2.11). Right: Galileo’s law of chords: For all points on the arc of
the circle, $D/\sqrt{H}$ is constant, and the descent time from rest is the same for all
inclined planes.
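The law of chords is easy to verify numerically. The sketch below is our own illustration: it assumes that B is the lowest point of a vertical circle, takes an arbitrary diameter, and parametrizes the point P by the angle phi that it subtends at the centre, measured from B. It evaluates the inclined-plane descent time, proportional to D/√H, for several chords PB.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)

def descent_time(length, height, g=G):
    """Time to slide from rest down a plane of the given length and height."""
    return math.sqrt(2.0 / g) * length / math.sqrt(height)

def chord_to_lowest_point(diameter, phi):
    """Length D and vertical drop H of the chord from P to the lowest point B."""
    length = diameter * math.sin(phi / 2.0)
    height = 0.5 * diameter * (1.0 - math.cos(phi))
    return length, height

diameter = 2.0  # hypothetical diameter in metres
angles_deg = (20, 60, 100, 140, 180)
times = [descent_time(*chord_to_lowest_point(diameter, math.radians(deg)))
         for deg in angles_deg]
for deg, t in zip(angles_deg, times):
    print(deg, t)
```

All the chords give the same time, equal to the time of free fall along the vertical diameter, sqrt(2 × diameter / g).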
Figure 2.13 Adapted from Galileo’s Two New Sciences. Point B is on the arc of
circle CBD. A particle starting at rest from D falls faster through the broken path
DB + BC than through the direct, shorter path DC.
both paths,⁸ the average velocity is larger for path BC than for path FC. Also,⁹
FC > CB, and from equation (2.13) the time BC is smaller than the time FC: the
broken path is faster.
Galileo then considers the fall through a polygonal path of five sides and con-
jectures that adding more sides, and in this way approaching an arc of a circle,
8 Galileo shows this by extending the path CB to CA. Particles starting at rest from D and A reach B with the
same velocity (they fall from the same height). The velocity at C from the paths DBC and AC is therefore
the same. And since the planes AC and DC fall from the same height, the final velocities at C are
independent of the paths taken.
9 Galileo shows this inequality with a detailed argument in the [Third] Lemma of Proposition 35. Galileo
didn’t number the lemmas, but we follow Drake’s ordering.
makes the descent time get shorter. He is probably motivated by the isoperimetric
problem. In previous passages, in the First Day of Two New Sciences, he discusses
Zenodorus’s proof; Sagredo says that he has “seen the proof of this with partic-
ular satisfaction” (Galilei, 1638/1974, p. 62). However, Galileo does not give a
proof that a polygonal path of more sides gives a faster path on an arc of a cir-
cle. “It appears that one can deduce,” he says, “that the swiftest movement is . . .
along the circular arc.” Galileo adds more planes, as shown in Figure 2.14, and
uses the argument of the law of chords to break plane DC into the faster path
DE + EC. But the particle is not starting from rest at D, so the law of chords does
not apply (Erlichson, 1998). A full proof of this problem requires techniques that
were beyond Galileo’s mathematical horizon but that were developed within a few
decades.
It is not the angles of incidence and refraction that are proportional; the sines
of the angles, measured with respect to the normal to the plane of incidence, are
proportional. Descartes proves that a law of this sort results from a mechanical
model that treats light as particles that change their velocity as they go from one
medium to another. His analogy is the inflection of the motion of a tennis ball
upon entering water (see Figure 2.15). In a particle model, the refraction problem
is treated as analogous to the bouncing (or reflection) of a ball on a surface: the wall
inverts the direction of the velocity perpendicular to the wall, and the component
parallel to the plane remains the same. Descartes assumes that the same happens for
refraction: as the tennis ball penetrates the surface, the component of the velocity
parallel to the plane of the water does not change. Assume (see Figure 2.15) that
the tennis ball changes its velocity from $v_{\text{air}}$ to $v_{\text{water}}$ as it penetrates the surface
BE. Since BI and AB are of equal length (Descartes chooses to put these points on
a circle), the travel times $t_{\text{air}}$ and $t_{\text{water}}$ for the two segments are different and must
satisfy:
$$v_{\text{water}} \times t_{\text{water}} = BI = v_{\text{air}} \times t_{\text{air}} = AB. \tag{2.14}$$
If the projection of the velocities parallel to the water surface, $v_{\text{parallel}}$, is constant,
we have
Descartes obtains the correct law in the sense that the ratio of the sines is con-
stant. However, the experiments showed that light rays bend towards the normal
(AH>AF) as they penetrate the surface of water or glass (and not away from
the normal, as in Figure 2.15). In order to obtain agreement with experiments,
Descartes has to assume that light, like sound, propagates faster in water (or in
any dense medium, like glass) than in air. A radically different approach was fol-
lowed by Pierre de Fermat, who objected to Descartes’ argument that light should
propagate faster in denser media. The difference in the approach was not only in
the models for speed of propagation in denser media. In contrast with Descartes’
mechanistic approach, Fermat’s solution has the spirit of Hero’s treatment, with
Aristotle’s idea that Nature does nothing in vain as its ultimate justification.
In 1657, seven years after Descartes’ death, Fermat received the treatise La
lumière by Marin Cureau de la Chambre, which contains some bold statements
about minimum paths. After discussing the law of reflection, he says: “We see
therefore by all this reasoning that the equality of angles in reflection is made
along the shortest lines, and that this is not something particular to Light, since
nature observes the same order in all the movements that she causes” (De la
Chambre, 1662). Shortly afterward, he qualifies his statement: “if nature makes
its movements by the shortest lines, it would be necessary that they be made in
refraction as well.” Fermat replies in agreement: “The principle of physics is that
Nature performs her movements by the most simple paths” (Fermat, 1657/1894). In
a subsequent letter, from 1662, he adds:
There is nothing as probable or apparent as the assumption that Nature always acts by
the easiest means, which is to say either along the shortest lines when time is not a
consideration, or in any case by the shortest time.
As pointed out by Rashed (1970), the Arab astronomer Ibn al-Haytham (Latinized
as Alhacen) used the principle of ‘the simplest way’ to explain
refraction. Ibn al-Haytham thought of light as tiny hard spheres that move in
straight lines and propagate in different media according to their density. The
denser the medium, the greater the resistance to penetration by light (Mark Smith,
2009). Interestingly, de la Chambre knew the Book of Optics of Ibn al-Haytham in
the Risner edition or in Witelo’s version (Rashed, 2016).
Fermat is able to derive the law of refraction by minimizing the travel time for
the light ray from its initial point to its final point. And he does so using his own
“precalculus” mathematical invention, the method of maxima and minima.
2.4 Bending of Light Rays and Fermat’s Minimum Principle 21
Note, in passing, that Fermat is solving the “isoperimetric problem” for a rectangle
of half perimeter AC. Call AC = b, and AE = a, with a the quantity to be found.
The product of the two segments is $a(b - a) = ab - a^2$. Now he changes a by
a small amount e so that the first segment is a + e and the second is b − a − e.
The new product is $ab - a^2 + eb - 2ae - e^2$. At this point Fermat introduces
a concept (adaequare) that he took from Diophantus of Alexandria, and whose
interpretation was the subject of considerable debate (Breger, 1994; Giusti, 2009).
Fermat equates approximately, or sets adequal, the original product to the new one:
$$eb - 2ae - e^2 \sim 0, \tag{2.18}$$
then divides by e
$$b - 2a - e \sim 0. \tag{2.19}$$
Finally,¹⁰ he sets e = 0 to obtain
$$a = \frac{b}{2}, \quad \text{or } AE = EC. \tag{2.20}$$
(The rectangle of largest area with a given perimeter is a square.) “It is impossible
to give a more general method,” he says.
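The steps of equations (2.18) to (2.20) can be traced mechanically. The following is a minimal sketch, not Fermat's own procedure: it expands (a + e)(b − a − e) as a polynomial in e with exact fractions, subtracts the original product, divides by e, and sets e = 0. The function names (`poly_mul`, `adequated_slope`, `fermat_maximum`) and the value b = 10 are illustrative assumptions.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply two polynomials in e given as coefficient lists (lowest power first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def adequated_slope(a, b):
    """Constant term of [f(a+e) - f(a)]/e for f(a) = a(b - a)."""
    a, b = Fraction(a), Fraction(b)
    shifted = poly_mul([a, Fraction(1)], [b - a, Fraction(-1)])  # (a + e)(b - a - e)
    shifted[0] -= a * (b - a)      # subtract the original product
    assert shifted[0] == 0         # no term of order e^0 survives, as in (2.18)
    quotient = shifted[1:]         # divide by e: coefficients of b - 2a - e, as in (2.19)
    return quotient[0]             # set e = 0

def fermat_maximum(b):
    """Solve b - 2a = 0, using the fact that the slope is linear in a."""
    s0, s1 = adequated_slope(0, b), adequated_slope(1, b)
    return -s0 / (s1 - s0)

b = 10                             # illustrative value of AC
a_star = fermat_maximum(b)
print(a_star, adequated_slope(a_star, b))
```

With these assumptions the root found is a = b/2, the square of equation (2.20).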
Fermat’s method of finding the maximum of a function f(a) (equal to a(b − a)
in his introductory example) consists of finding the slope of the tangent to the function and
identifying the point a at which that slope is equal to zero (see Figure 2.16).
Fermat’s method of tangents is very close to the infinitesimal calculus later devel-
oped by Newton and Leibniz. Newton, in his correspondence, acknowledges that
he obtained a hint for his method (of fluxions) “from Fermat’s way of drawing tan-
gents,” and that he just made it more general (Sabra, 1981, p. 144). In contemporary
mathematical language, Fermat’s method is to expand the ratio
$$\frac{f(a+e) - f(a)}{e} \tag{2.21}$$
in powers of e and to take the constant term: he is finding f′(a), and then setting
it equal to zero to locate the extremum of the function. Fermat never spoke of
e as an infinitesimal, although it was a concept he knew from Galileo (Galilei,
1638/1974, p. 54). As told by Alexander (2014), in 1632 (at the time when Galileo
was being tried for defending heliocentrism), Jesuit clergymen banned the use of
infinitesimals. This might be the reason why Fermat, a staunch Catholic, does not
mention the term explicitly (Bascelli et al., 2014).
Figure 2.16 Fermat’s method of tangents. Left: The tangent to a curve, or the
slope of the curve, is given by [f(a + e) − f(a)]/e when e is set to zero. At
a maximum M, the slope is zero. Right: The slope of a curve is zero for
“stationary” points: maxima like A or C, minima like B, or points like D which
are neither maxima nor minima.
10 In a parenthetical comment, Fermat adds that he divides by e “or by the highest common factor of e.”
Fermat applies his method to several examples, including functions that include
square roots. As an example, let us apply Fermat’s method of tangents to the
function $f(a) = \sqrt{a}$. We first need $\sqrt{a+e} - \sqrt{a}$. Multiplying and dividing by
$\sqrt{a+e} + \sqrt{a}$, we obtain
$$\sqrt{a+e} - \sqrt{a} = \left(\sqrt{a+e} - \sqrt{a}\right) \times \frac{\sqrt{a+e} + \sqrt{a}}{\sqrt{a+e} + \sqrt{a}} = \frac{e}{\sqrt{a+e} + \sqrt{a}}. \tag{2.22}$$
Dividing by e and setting e = 0, we obtain $1/(2\sqrt{a})$ for the tangent, or the slope of
$\sqrt{a}$.
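A short numerical sketch of this example follows. It compares the raw quotient with the rationalized form of equation (2.22), in which e can be set to zero without dividing by zero. The function names and the sample value a = 4 are illustrative assumptions.

```python
import math

def raw_quotient(a, e):
    """[sqrt(a + e) - sqrt(a)] / e, undefined at e = 0."""
    return (math.sqrt(a + e) - math.sqrt(a)) / e

def rationalized_quotient(a, e):
    """The same quotient rewritten as 1 / (sqrt(a + e) + sqrt(a)), following (2.22)."""
    return 1.0 / (math.sqrt(a + e) + math.sqrt(a))

a = 4.0                                   # illustrative point
for e in (1e-1, 1e-3, 1e-6):
    print(e, raw_quotient(a, e), rationalized_quotient(a, e))

# Setting e = 0 is only possible in the rationalized form.
print(rationalized_quotient(a, 0.0), 1 / (2 * math.sqrt(a)))
```

The two forms agree for small e, and only the second survives the final step of setting e = 0, which is the point of the multiplication by the conjugate.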
In his letter of 1662, Fermat tells Cureau de la Chambre that his derivation of
the law of refraction is long and tedious and “involves four lines by their square
roots.” In his Analyse pour les réfractions, Fermat (1657/1894b) indicates where
these square roots come from. The figure he uses is very similar to the one used by
Descartes in his tennis ball analogy. He puts the initial and final points, C and I,
on a circle (see Figure 2.17). Just as in Descartes’ derivation, the minimum path
goes through the center D of the circle, and the unknown is the relation between
the segments DH and FD. For simplicity, let us take the radius of the circle
CD = DI = 1. And let us use Fermat’s notation: DH = a, DF = b. If we call $v_1$
and $v_2$ the velocities in the upper and lower media, the time for the minimum path is
$$T_{\min} = \frac{CD}{v_1} + \frac{DI}{v_2} \equiv \frac{1}{v_1} + \frac{1}{v_2}. \tag{2.23}$$
Figure 2.17 From Fermat’s Analyse pour les réfractions (left). On the right, we
indicate the angles of incidence ($\theta_1$) and refraction ($\theta_2$).
Fermat writes his equation in terms of what he calls the “resistances” in each
medium, which will be just the inverse velocities $1/v_1$ and $1/v_2$. His resistances are,
in the contemporary notation, proportional to the indices of refraction of each medium. In order to
apply his method of maxima and minima, Fermat considers the nearby path COI,
where OD = e, the quantity later to be set to zero. The time $T_e$ for this path is
$$T_e = \frac{CO}{v_1} + \frac{OI}{v_2} = \frac{\sqrt{1 - 2eb + e^2}}{v_1} + \frac{\sqrt{1 + 2ea + e^2}}{v_2}. \tag{2.24}$$
In order to apply his method of tangents, we need to equate $T_{\min} = T_e$, divide
by e, and set e equal to zero. Fermat does not provide the details of his “tedious”
calculation, but they are relatively easy (although somewhat lengthy) to carry out.
A reconstruction of the steps followed by Fermat, squaring twice in order to elim-
inate the square roots, was carried out by Sabra (1981) and Andersen (1983). We
offer a simpler derivation, in the spirit of the method of tangents, that Fermat could
well have followed.
The difference in travel times is
$$T_e - T_{\min} = \frac{\sqrt{1 - 2eb + e^2} - 1}{v_1} + \frac{\sqrt{1 + 2ea + e^2} - 1}{v_2}. \tag{2.25}$$
This is equivalent to
$$T_e - T_{\min} = \frac{1}{v_1}\,\frac{-2eb + e^2}{\sqrt{1 - 2eb + e^2} + 1} + \frac{1}{v_2}\,\frac{2ea + e^2}{\sqrt{1 + 2ea + e^2} + 1}. \tag{2.26}$$
Dividing by e and setting e = 0, we obtain
$$\frac{T_e - T_{\min}}{e} \sim -\frac{b}{v_1} + \frac{a}{v_2} = 0. \tag{2.27}$$
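Equation (2.27) can be checked numerically. The sketch below evaluates the travel time of equation (2.24) for small displacements e and confirms that the quotient (T_e − T_min)/e is close to zero only when b/v_1 = a/v_2. The speeds and the value of b are assumed for illustration, and the function names are hypothetical.

```python
import math

def travel_time(e, a, b, v1, v2):
    """T_e of (2.24): path C-O-I on the unit circle, with OD = e."""
    return (math.sqrt(1 - 2 * e * b + e * e) / v1
            + math.sqrt(1 + 2 * e * a + e * e) / v2)

def adequated_rate(a, b, v1, v2, e=1e-7):
    """(T_e - T_min)/e for a small e, approximating the left side of (2.27)."""
    return (travel_time(e, a, b, v1, v2) - travel_time(0.0, a, b, v1, v2)) / e

v1, v2 = 1.0, 0.75        # illustrative speeds (assumed values)
b = 0.6                   # DF
a = b * v2 / v1           # DH chosen so that b/v1 = a/v2

print(adequated_rate(a, b, v1, v2))       # close to zero
print(adequated_rate(0.6, b, v1, v2))     # not zero: this a violates b/v1 = a/v2

# The path through D is also faster than finitely displaced paths.
t_min = travel_time(0.0, a, b, v1, v2)
print(all(travel_time(k / 100, a, b, v1, v2) >= t_min for k in range(-50, 51)))
```

Since DF = b and DH = a are the sines of the angles of incidence and refraction on the unit circle, the condition b/v_1 = a/v_2 is the sine law in this notation.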
I do not pretend, nor have I ever pretended to be in the inner confidence of Nature. She has
obscure and hidden ways which I have never undertaken to penetrate. I would only have
offered her a little geometrical aid on the subject of refraction, should she have been in
need of it. But since you assure me, Sir, that she can manage her affairs without it and that
she is content to follow the way that has been prescribed to her by M. Descartes, I willingly
hand over to you my alleged conquest of physics; and I am satisfied that you allow me to
keep my geometrical problem – pure and in abstracto, by means of which one can find the
path of a thing moving through two different media and seeking to complete its movement
as soon as it can.
ABC by shifting the refraction point to the left, from B to K. So he concludes that
the path that satisfies Snell’s law minimizes the travel time between A and C. His
proof is indeed simple. Consider the shift to the right first. Huygens draws the two
right triangles BHF and BGF, with their common hypotenuse being the shift BF.
The distance traveled by the ray in medium 1 increases by HF, while in medium 2
it decreases by FC−(BG + GC). If we call, as before, v1 the velocity in the upper
medium, and v2 the velocity in the lower medium, the change in time between AFC
and ABC is
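$$T_{AFC} - T_{ABC} = \frac{HF}{v_1} - \frac{BG}{v_2} + \frac{FC - CG}{v_2} = FB\left\{\frac{\sin\theta_1}{v_1} - \frac{\sin\theta_2}{v_2}\right\} + \frac{FC - CG}{v_2}, \tag{2.29}$$
where we used the geometrical construction of Figure 2.18: HF/FB = $\sin\theta_1$ and
BG/FB = $\sin\theta_2$. The same analysis applies to the path AKC:
$$T_{AKC} - T_{ABC} = KB\left\{\frac{\sin\theta_2}{v_2} - \frac{\sin\theta_1}{v_1}\right\} + \frac{AK - AL}{v_1}. \tag{2.30}$$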
If the path obeys Snell’s law, the expression in the curly brackets in equations (2.29)
and (2.30) is zero. Huygens notes that FC>CG, since FC is the hypotenuse of the
right triangle FGC and, for the same reason, AK>AL. The difference in times is
always larger than zero for arbitrary displacements to the left and to the right of B:
the path that satisfies Snell’s law is a minimum.
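Huygens' conclusion concerns finite displacements, not only infinitesimal ones, and this can be illustrated numerically. The sketch below is not Huygens' geometric construction: it locates the Snell point by bisection on the mismatch sin θ1/v1 − sin θ2/v2 and then compares travel times for shifts to the left and to the right. The coordinates (A at height h1 above the interface, C at depth h2 and horizontal distance d) and the speeds are assumed values.

```python
import math

def travel_time(x, h1, h2, d, v1, v2):
    """Time from A = (0, h1) to C = (d, -h2), refracting at (x, 0)."""
    return math.hypot(x, h1) / v1 + math.hypot(d - x, h2) / v2

def snell_mismatch(x, h1, h2, d, v1, v2):
    """sin(theta1)/v1 - sin(theta2)/v2 for the path refracted at x."""
    return x / math.hypot(x, h1) / v1 - (d - x) / math.hypot(d - x, h2) / v2

def snell_point(h1, h2, d, v1, v2, tol=1e-12):
    """Bisection: the mismatch is negative at x = 0 and positive at x = d."""
    lo, hi = 0.0, d
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if snell_mismatch(mid, h1, h2, d, v1, v2) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

params = dict(h1=1.0, h2=1.5, d=2.0, v1=1.0, v2=0.7)   # illustrative geometry
xb = snell_point(**params)
t_b = travel_time(xb, **params)

# Finite shifts of the refraction point, to the left and to the right of B.
shifts = [s / 10 for s in range(-30, 31) if s != 0]
print(all(travel_time(xb + s, **params) > t_b for s in shifts))
```

Every shifted path in the scan takes longer than the path through the Snell point, in line with the inequalities FC > CG and AK > AL.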
To connect Huygens’ approach to Fermat’s derivation of the method of tangents,
we repeat Huygens’ analysis starting from a path that is not a minimum. In order
and dividing by e and setting e = 0, we obtain Snell’s law (note that, for e = 0,
FC=CG). One nice feature about Huygens’ treatment is that he is not considering
infinitesimal displacements from the reference path. Instead, he shows that any
path that deviates from the one obeying Snell’s law takes a longer time. This is
true because, as we move the refraction point along the interface between the two
media, the time taken has only one minimum; the “time” function is convex (see
Figure 2.19).
In his Lectures on Physics, Richard Feynman (1963) presents a nice synthesis of
Fermat’s and Huygens’ geometrical argument. We reproduce Feynman’s version
in Figure 2.19.
Figure 2.19 Adapted from Feynman’s Lectures on Physics. The refracted ray
ABC is displaced to AFC. If BF is much smaller than AB and BC, then
AB ≈ AD and GC ≈ FC. The path length changes by DF − BG, while the
change in time is $DF/v_1 - BG/v_2 = BF(\sin\theta_1/v_1 - \sin\theta_2/v_2)$. Here $\theta_1$ is the
angle of the ray AB with the normal $N_1$, and $\theta_2$ is the angle of the refracted ray
BC with the normal $N_2$. For a stationary point the change in time is zero, and we
obtain Snell’s law, $\sin\theta_1/v_1 = \sin\theta_2/v_2$.
2.5 Newton and the Solid of Least Resistance* 27
The demonstration of these curious Theorems being omitted by the author, the analysis
thereof, communicated by a friend, is added at the end of this volume.
It is well known that in the Principia Newton determines the form of the solid of least
resistance, thus affording the first example of a class of problems which we now solve by
means of the Calculus of Variations.
Newton proves that the resistance of a sphere is half the resistance of a cylinder
of the same diameter, moving in the direction of its axis. In the first statement of
the scholium, he finds the angle of a truncated cone (a frustum of a cone) “which
should meet with less resistance than any other frustum constructed with the same
base and altitude.” In the second statement, he remarks that the resistance of an
ellipsoid or an oval of revolution is lowered if the front is replaced by a frustum
forming an angle of 135°. He proposes that this property “may be of use in the
building of ships.” And in the third somewhat puzzling statement, he gives – in
geometric form – the differential equation of the solid of least resistance. To the
best of our knowledge, this is the first occurrence of a curve defined by means of
a differential equation. In the following paragraphs, we will visit Proposition 34,
with particular focus on Newton’s variational calculation which anticipates Euler’s
treatment of variational calculus of 1744.
Motivated by a mundane application that amounts to finding the optimal shape
of a bullet, Newton presents what is “probably the most sophisticated calculation
in the Principia” (Chandrasekhar, 1995, p. 567).
Figure 2.20 Copied from Newton’s Principia, Proposition 34, Book II (θ = ∠LBD = ∠ECB).
a cylinder, swimming in a vortex, offered more resistance to its suction, and was drawn in
with greater difficulty than an equally bulky body, of any form whatever.
Aside from the fact that one should not demand scientific rigor in a work of
fiction, Newton’s treatment applies to regimes of high velocities, where the resis-
tance is independent of the viscosity of the fluid. That is unlikely to be the case for
cylinders in water with vortices.
12 Recent scholarship revisits Archimedes’ proof and questions Mach’s arguments (Palmieri, 2008).
language: the angle is such that CQ = QS, where Q is the midpoint of the height
and S is the vertex of the cone. More specifically, the angle is such that
$$OS = \frac{OD}{2} + \sqrt{\left(\frac{OD}{2}\right)^2 + CO^2}. \tag{2.33}$$
The proof is simple using elementary calculus. We adapt here Newton’s proof –
he uses calculus in this case – as presented in Whiteside (1974). Since all the points
of the cone form the same angle θ with the axis of symmetry (tan θ = OC/OS),
the resistance of the frustum is the sum of two terms: the resistance of a disc of
radius FD (proportional to the area of the disc), and the resistance of the conical
surface (proportional to the product of the area of the annulus bounded by the
circles of radii FD and OC with $\sin^2\theta$):
$$R = C\pi\left[\left(OC^2 - DF^2\right)\sin^2\theta + DF^2\right] \tag{2.34}$$
$$= C\pi\left[OC^2\sin^2\theta + DF^2\cos^2\theta\right], \tag{2.35}$$
and, since OC − DF = OD tan θ, we have
$$R = C\pi\left[OC^2\sin^2\theta + (OC - OD\tan\theta)^2\cos^2\theta\right] \tag{2.36}$$
$$= C\pi\left[OC^2 + OD\left(OD\sin^2\theta - 2\,OC\sin\theta\cos\theta\right)\right]. \tag{2.37}$$
The angle of the cone of minimum resistance is determined by the condition
dR/dθ = 0, which we write, after very little algebra, as
$$\cot^2\theta - \frac{OD}{OC}\cot\theta - 1 = 0, \tag{2.38}$$
and
$$\cot\theta = \frac{OS}{OC} = \frac{OD}{2\,OC} + \sqrt{\left(\frac{OD}{2\,OC}\right)^2 + 1}, \tag{2.39}$$
which is Newton’s statement as expressed in equation (2.33).
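The result can be cross-checked by brute force. The sketch below evaluates R/(Cπ) from equation (2.37) on a grid of angles and compares the minimizing angle with the closed form of equation (2.39). The grid is restricted to tan θ ≤ OC/OD so that the front disc radius DF = OC − OD tan θ stays non-negative. Function names and the sample dimensions are illustrative assumptions.

```python
import math

def resistance(theta, oc, od):
    """R/(C*pi) from (2.37) for a frustum of base radius OC and height OD."""
    s, c = math.sin(theta), math.cos(theta)
    return oc ** 2 + od * (od * s * s - 2 * oc * s * c)

def optimal_angle(oc, od):
    """Angle from (2.39): cot(theta) = OD/(2 OC) + sqrt((OD/(2 OC))^2 + 1)."""
    k = od / (2 * oc)
    return math.atan(1.0 / (k + math.sqrt(k * k + 1)))

def brute_force_angle(oc, od, steps=20000):
    """Scan the angles allowed by DF >= 0, that is tan(theta) <= OC/OD."""
    theta_max = math.atan2(oc, od)
    grid = [theta_max * i / steps for i in range(1, steps + 1)]
    return min(grid, key=lambda t: resistance(t, oc, od))

oc, od = 1.0, 1.0                                   # illustrative dimensions
print(math.degrees(optimal_angle(oc, od)))
print(math.degrees(brute_force_angle(oc, od)))
print(math.degrees(optimal_angle(1.0, 1e-9)))       # very short frustum
```

For a very short frustum the closed form approaches 45°, the limiting case cot θ = 1 of equation (2.39).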
Note that, for an infinitesimal frustum, for which OD → 0, equation (2.39) gives
cot θ = 1: the optimal angle is θ = π/4, that is, 45° (or 135°, as indicated by Newton, if the
angle is measured with respect to the positive x axis). In order to verify that this is
a minimum (and not a maximum), we take OD ≪ OC in equation (2.37):
$$R \simeq C\left[OC^2 - OD \times OC\,\sin 2\theta\right], \tag{2.40}$$
which is minimized for θ = π/4. Newton notes that the angle θ is acute; cot θ > 1,
as can be seen from equation (2.39), and he states (see Figure 2.24):
Newton does not offer a proof of this statement. We will prove it below. But
before that we mention Newton’s remarkable statement about a general solid of
least resistance. Given the previous statement, the optimal solid has to end with a
slope of 45° and to have a circular “nose.” Call BG the radius of the nose, as in
Figure 2.24, and write the slope at M in terms of BG as GB/BR. Newton states
that at every point N of the curve the following relation holds
$$MN \times \frac{BR \times GB^2}{GR^4} = \frac{1}{4}. \tag{2.41}$$
Newton does not prove this statement in the Principia, but a proof appears
in an Appendix to Motte’s translation and in unpublished papers including the
manuscripts of the Portsmouth Collection. His calculation is a notable anticipation
of Euler’s treatment of variational calculus. Note that equation (2.41), written in a
geometrical language, is a relation between the slope and the height of the curve. With
$$MN = y, \tag{2.42a}$$
$$\frac{GB}{BR} = -\frac{dy}{dx} \equiv -y', \tag{2.42b}$$
$$\frac{GB}{GR} = -\frac{dy}{\sqrt{dx^2 + dy^2}} = -\frac{y'}{\sqrt{1 + y'^2}}, \tag{2.42c}$$
equation (2.41) becomes
$$\frac{y\,y'^3}{\left(1 + y'^2\right)^2} = -\frac{GB}{4}, \tag{2.43}$$
with GB a constant, the radius of the circular front, which in turn equals the value
of y when the slope is y′ = −1.
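Equation (2.43) can be explored with a parametric form of the curve. The closed form used below is not given in the text; it is a sketch obtained by solving (2.43) for y in terms of q = −y′ and integrating dx = dy/y′, with the origin of x placed (by assumption) at the edge of the nose, where q = 1. The function names are hypothetical.

```python
import math

def newton_profile(q, gb=1.0):
    """Point (x, y) of the curve where the slope is y' = -q, for 0 < q <= 1."""
    y = gb / 4 * (1 + q * q) ** 2 / q ** 3          # from (2.43) with y' = -q
    x = gb / 4 * (7 / 4 - 3 / (4 * q ** 4) - 1 / q ** 2 - math.log(q))
    return x, y

def ode_residual(q, gb=1.0, dq=1e-6):
    """Estimate y' by finite differences and return y y'^3/(1 + y'^2)^2 + GB/4."""
    x0, y0 = newton_profile(q - dq, gb)
    x1, y1 = newton_profile(q + dq, gb)
    slope = (y1 - y0) / (x1 - x0)
    _, y = newton_profile(q, gb)
    return y * slope ** 3 / (1 + slope ** 2) ** 2 + gb / 4

print(newton_profile(1.0))                 # edge of the nose: x = 0, y = GB
for q in (0.9, 0.7, 0.5, 0.3):
    x, y = newton_profile(q)
    print(q, x, y, ode_residual(q))        # residual close to zero
```

As q decreases from 1, the points move rearward (x becomes negative) and the radius y grows, so the profile starts at 45° at the nose and becomes progressively flatter.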
For the proof of the statement that the solid of least resistance always has an
angle θ < 45°, consider the curve FB of Figure 2.25. The (infinitesimal) segment
ab of the curve generates a conical surface upon revolution about the axis
OP. If the segment is broken into two segments ac and cb, where ac is at 45°
with the horizontal, the resistance decreases (the optimal infinitesimal frustum has
θ = 45°). From equation (2.40), the change ΔR is the difference between the
resistances of the solids of revolution generated by the trapezia macn and mabn:
ΔR = C × am × mn(sin 2θ0 − 1), where θ0 is the angle of ab with the horizontal.
Now consider the surfaces S1 and S2 obtained respectively by revolving the broken
lines a′ac and a′c′c. Surface S2 is obtained from S1 by “lifting” the segment
ac to the line FG (at 45° with the axis OP). This lifting process decreases the
resistance by (one half) the difference of the resistance of the annuli obtained by
revolving the segments ad and a′d′. (Recall that the resistance of a conical surface
that forms an angle of 45° with the axis of revolution is half the resistance
of the annulus projected by the surface on the plane perpendicular to the direction
of propagation.) Now imagine dividing the whole curve FB into infinitesimal
segments and sequentially performing this lifting process from left to right from
F, where the curved surface intersects the cone FG (the surface having a slope θ
equal to or larger than 45° at that point). As a result, we obtain the truncated cone
FGB with lower resistance, as stated by Newton. This proof is inspired by (but
different from) the one presented by Whiteside (1974, p. 663) in a paragraph that
concludes, “While no record of Newton’s own demonstration of this inequality has
survived, we have no reason to think that it could have been greatly different in its
structure.”
Now we turn to Newton’s anticipatory treatment of the calculus of variations.
Figure 2.26 Newton’s variational method for the curve DnNgGP of the solid of
least resistance.
and, since b/a = dy/dx, we get that equation (2.52) is equivalent to equations
(2.43) and (2.41).
In the proof in Newton’s letter (probably to David Gregory) reproduced in
Cajori’s edition (1729/1934), Newton takes $b_1 = b_2$ and the relation that is constant
along the curve becomes
$$y_1\,\frac{a_1}{nN^4} = y_2\,\frac{a_2}{NG^4}. \tag{2.54}$$
It is a simple exercise – proceeding in the same manner as we did for (2.52) – to
show that equation (2.54) implies equation (2.41). Newton’s differential equation
can be derived with the rules of variational calculus, later developed by Euler and
Lagrange. We present such a derivation in Appendix A.
3
An Excursion to Newton’s Principia
The Principia is one of the greatest books of all time. In it Newton formulates a
“System of the World,” incorporating his own new mathematical ideas as well as
displaying a masterful command of geometry. “Newton was the greatest genius
that ever existed,” the mathematician Joseph-Louis Lagrange is alleged to have
said, adding, with a grain of humor, “and the most fortunate, for we cannot find
more than once a system of the world to establish.”
We include a chapter on Newton’s laws of motion and the Principia even though
this work is not directly related to variational principles. We do so for two reasons:
the monumental importance of his work on mechanics and the fact that his ideas
are crucial in the development of the principle of least action. Moreover, Newton
himself was a proponent of the Aristotelian simplicity and economy (Lyssy, 2015).
In his first rule for the study of Natural Philosophy we read: “As the philosophers
say: Nature does nothing in vain, and more causes are in vain when fewer suffice.
For nature is simple and does not indulge in the luxury of superfluous causes”
(Cohen and Whitman, 1999, p. 794).
LAW I
Every body perseveres in its state of rest, or of uniform motion in a right line, unless it is
compelled to change that state by forces impressed thereon.
LAW II
The alteration of motion is ever proportional to the motive force impressed; and is made
in the direction of the right line in which that force is impressed.
3.2 Geometrical Derivation of Kepler’s Laws of Planetary Motion 39
LAW III
To every action there is always an opposed and equal reaction: or the mutual actions of
two bodies upon each other are always equal, and directed to contrary parts.
In many derivations of the Principia, Newton uses his second law following
a method of impulses. The body moves on straight lines that are broken by the
actions of impulses acting at regular intervals of time. The orbits are at first polygons,
much in the style of Zenodorus’s proof that we presented in Chapter 2. And
these polygons become continuous curves when the time between impulsive forces
shrinks to zero. Newton takes this procedure from Descartes’ tennis ball analogy
of the Dioptrique (Cohen and Whitman, 1999) where the motion of the tennis ball
is broken by an impulsive force at the interface of air and water (see Figure 2.15).
It is interesting that Newton, although he was the creator of differential calculus,
uses geometrical methods in all his derivations in the Principia. In the rest of the
chapter we take a brief excursion through these derivations.
Figure 3.1 Equal areas in equal times. A particle moving at constant velocity v
on a straight line travels a distance vΔt at each interval Δt. If one considers a
fixed point S at a distance d from the line, the area swept out with respect to that
point during Δt is constant and given by dvΔt/2.
displace the vertex of a triangle in a direction parallel to the opposite side, the area
does not change. And the same applies to a parallelogram when the sides are dis-
placed parallel to each other. This perhaps curious result was shown by Euclid in
his Elements, in Proposition 35 of Book 1. In his commentary to Euclid’s work,
Pappus of Alexandria (ca. 300 – ca. 350) refers to Proposition 35 as one of the
“paradoxical theorems of mathematics, since the uninstructed might well regard it
as impossible that the area of the parallelograms should remain the same while the
length of the sides other than the base and the side opposite to it may increase
indefinitely” (Heath, 1926, p. 329). This simple theorem of Euclidean geometry allows
one to prove that a particle moving in a straight line sweeps out equal areas in equal
times. And Newton uses the same theorem to prove that equal areas are swept out
in equal times in the presence of a force that points to a fixed point.
If a force acts on a particle, by the second law, its motion is no longer going
to be of constant velocity. If the force is in the direction of motion, its velocity
will change, but the particle will keep moving in the same direction. If the force
makes an angle with the direction of motion, the particle will change its direction.
Let us consider the case where the force is always directed to the fixed position
of point S (Newton labeled the point “S” because he was thinking of the sun).
Newton approximates the motion as a sequence of straight segments, as though
the force were acting in the form of periodic pulses. At the end of the calculation,
he takes the limit of the time between pulses to zero, and both the force and the
trajectory become continuous. The idea of impulsive forces was used by Robert
Hooke (Nauenberg, 1994), although the priority over Newton is debated (Erlich-
son, 1997). The idea is to compose the velocities: after each pulse the velocity is
the sum of two (vector!) components, the velocity the particle had just before the
pulse (which we will call v) and the change imparted by the impulsive force (which
we will call Δv).
At a time Δt after the pulse at a point B (see Figure 3.3), the displacement of
the particle will be given by the composition of two displacements: vΔt in the
direction of the velocity before the pulse, and ΔvΔt in the direction of the impulsive
force at B. Since the displacement cC (see Figure 3.3) is in the direction of
SB, the triangles SBc and SBC have the same area: regardless of the changes
Δv, provided all these changes are directed to a fixed point, equal areas are swept
out in the same time. Simultaneously with Newton, Robert Hooke was using a
similar kind of polygonal diagram to study accelerated motion (see Figure 3.2),
and stated that, for elliptical orbits, the force should vary as the inverse square of
the distance. Later Hooke accused Newton of plagiarizing the inverse square law.
Newton responded that Hooke had only hinted at the idea and had offered no proof,
and he almost completely omitted any reference to him in the
Principia.
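The polygonal construction lends itself to a direct numerical sketch. In the code below a particle flies freely for a time dt and then receives a velocity change directed toward the fixed point S, taken as the origin. The inverse-square size of each impulse, the initial conditions and the function names are assumptions made for illustration; the equal-area property does not depend on how the size of the impulses is chosen.

```python
def triangle_area(s, p, q):
    """Unsigned area of the triangle with vertices s, p, q."""
    return abs((p[0] - s[0]) * (q[1] - s[1]) - (p[1] - s[1]) * (q[0] - s[0])) / 2

def polygonal_orbit(pos, vel, steps, dt, strength):
    """Free flight for dt, then an impulse directed toward S = (0, 0)."""
    points = [pos]
    for _ in range(steps):
        pos = (pos[0] + vel[0] * dt, pos[1] + vel[1] * dt)   # straight segment v*dt
        r3 = (pos[0] ** 2 + pos[1] ** 2) ** 1.5
        # Velocity change along the line to S; its inverse-square size is an assumption.
        vel = (vel[0] - strength * dt * pos[0] / r3,
               vel[1] - strength * dt * pos[1] / r3)
        points.append(pos)
    return points

S = (0.0, 0.0)
pts = polygonal_orbit(pos=(1.0, 0.0), vel=(0.0, 1.1), steps=200, dt=0.05, strength=1.0)
areas = [triangle_area(S, pts[i], pts[i + 1]) for i in range(len(pts) - 1)]
print(min(areas), max(areas))      # equal up to rounding error
```

All the triangles with vertex S have the same area even though dt is finite, which is the content of the polygonal argument before any limit is taken.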
Figure 3.2 A page from Robert Hooke’s manuscript, dated September 1685,
showing a graphical evaluation of orbital motion under a force which varies linearly
with distance.
3.2.2 Proposition 6: The Force Law and the Geometry of the Orbit*
Figure 3.3 Equal areas in equal times. Top: Since the triangle SBC is obtained
from the triangle SBc by displacing the vertex c parallel to their common base
SB, their areas are the same. For “central forces” – forces that are directed to a
fixed point (S in the figure) – equal areas are swept out in equal times. Bottom:
Reproduced from Proposition 1 of the Principia. Using his second law and a
simple extension of the argument of Figure 3.1, Newton derives Kepler’s second
law of planetary motion.

After proving Kepler’s second law, Newton derives a geometric relation for the
orbit of an object subject to a central force. Figure 3.4 shows Newton’s diagram of
the orbit. When the particle is at point P, its velocity is along the tangent at P (the
line ZY). As we discussed in the previous section, the position of the particle an
instant later is Q, which results from the composition of two displacements: PR,
the displacement of the particle without the force at P; and RQ, the displacement
due to the force at P. Since the force points always in the direction of S, QR is
in the direction of PS. Since the second law states that the alteration of motion is
proportional to the force, the segment QR, for a fixed time (or, which is equivalent,
for fixed R), is proportional to the force. Then Newton states that, if we fix the
force, the displacement QR is proportional to the square of the time. The segment
QR is the deviation of the body from the tangential path PR due to the action of
the force. Newton knew the results of Galileo’s experiments with inclined planes:
for a constant force, the distance traveled is proportional to the square of the time.
“Newton is assuming that, as the point Q shrinks back to the point P, the force
can be treated as if it were constant” (Brackenridge, 1996), and the point describes
a parabolic motion. Thus the segment QR is proportional both to the square of the
time and to the magnitude of the force at point P:
And here Newton uses Kepler’s second law (his Proposition 1): t is proportional
to the area of the triangle SRP, which is approximately SP × QT/2, with QT the
segment drawn from Q perpendicular to SP:
$$\text{Force} \propto \frac{1}{(SP)^2}\,\frac{(QR)}{(QT)^2}. \tag{3.2}$$
The geometry of the orbit is encoded in the relation of the infinitesimal quantities
$(QT)^2/QR$ with the distance to the center of force SP. Different orbits and different
positions of the center S with respect to the orbit give different laws of force.
One of the breakthroughs in Newton’s Principia is the proof that, if an orbit is elliptical
and the force is directed to the focus of the ellipse, the force should decrease
as the inverse square of the distance. Newton proves this by showing that (when
SP points to the focus) $(QT)^2/QR$ is a constant (equal to $2b^2/a$, the latus rectum
of the ellipse, with a the semi-major axis and b the semi-minor axis). But before
that he considers some simpler cases, which we now discuss.
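The geometric quantity (QT)²/QR can be evaluated numerically for a given orbit. The sketch below builds R on the tangent at P so that QR is parallel to SP, takes QT as the distance from Q to the line SP, and evaluates the ratio on an ellipse with the center of force at a focus. The polar parametrization of the ellipse, the step sizes and the function names are assumptions of this sketch.

```python
import math

def qt2_over_qr(orbit, phi, dphi):
    """(QT)^2/QR for P = orbit(phi), Q = orbit(phi + dphi), force center S at the origin."""
    px, py = orbit(phi)
    qx, qy = orbit(phi + dphi)
    h = 1e-6                                   # step for the tangent direction at P
    ax, ay = orbit(phi - h)
    bx, by = orbit(phi + h)
    tx, ty = bx - ax, by - ay
    sp = math.hypot(px, py)
    ux, uy = px / sp, py / sp                  # unit vector along SP
    # Solve Q - P = mu * t + lam * u, so that R = P + mu * t and QR is parallel to SP.
    dx, dy = qx - px, qy - py
    lam = (tx * dy - ty * dx) / (tx * uy - ty * ux)
    qr = abs(lam)
    qt = abs(qx * uy - qy * ux)                # distance from Q to the line SP
    return qt * qt / qr

def kepler_ellipse(a, b):
    """Ellipse with semi-axes a >= b and one focus at the origin."""
    ecc = math.sqrt(1 - (b / a) ** 2)
    p = b * b / a
    def orbit(phi):
        r = p / (1 + ecc * math.cos(phi))
        return r * math.cos(phi), r * math.sin(phi)
    return orbit

a, b = 2.0, 1.5                                # illustrative semi-axes
orbit = kepler_ellipse(a, b)
for phi in (0.3, 1.2, 2.5):
    print(phi, qt2_over_qr(orbit, phi, 1e-3), 2 * b * b / a)
```

For small dphi the ratio is nearly the same at every point of the ellipse, so by equation (3.2) the force varies as 1/(SP)².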
Figure 3.5 Left: Circular orbit with the center of force S at the center of the circle.
Right: Center of force S at a point on the circle (treated in Proposition 7 of the
Principia).
the circle? Newton considers this problem in Proposition 7. Let us consider the case
where the center of force is at a point on the circle. This is a peculiar orbit where
the object passes periodically through the very center of the force. The solution for
this problem is simple because of the similarities of the triangles CPy, QxT and
Pvx:
$$\frac{QT}{Qx} = \frac{Py}{CP} \equiv \frac{SP}{2a}, \qquad \frac{QR}{Pv} = \frac{2a}{SP}. \tag{3.5}$$
For the law of force, we need the ratio $(QT)^2/QR$:
$$\frac{(QT)^2}{QR} = \left(\frac{SP}{2a}\right)^3 \frac{(Qx)^2}{Pv}, \tag{3.6}$$
but, since Qx ≃ Qv, the ratio $(Qx)^2/Pv = 2a$, the ratio we obtained in equation
(3.4) for the center of force at the center of the circle, and
$$\frac{(QT)^2}{QR} = \frac{(SP)^3}{(2a)^2}. \tag{3.7}$$
The force is
$$F \propto \frac{1}{(SP)^2}\,\frac{QR}{(QT)^2} \sim \frac{1}{(SP)^5}. \tag{3.8}$$
With a force that decreases as the inverse fifth power of the distance, we can obtain
a circular orbit that passes through the center of force. But we stress that Newton
is deriving a law of force from the geometry of the orbit, and not the converse.
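The inverse fifth power can be recovered numerically from the geometry alone. The sketch below places S at the origin on a circle of radius rho, evaluates the combination QR/((SP)² (QT)²) of equation (3.8) at several points P, and fits the slope of its logarithm against log SP. The parametrization by the angle psi at the center of the circle, the step size and the function names are assumptions of this sketch.

```python
import math

def force_proxy(psi, rho=1.0, dpsi=1e-4):
    """Return SP and QR/(SP^2 QT^2) on a circle of radius rho through S = (0, 0)."""
    cx = rho                                             # center C = (rho, 0)
    px, py = cx + rho * math.cos(psi), rho * math.sin(psi)
    qx, qy = cx + rho * math.cos(psi + dpsi), rho * math.sin(psi + dpsi)
    tx, ty = -math.sin(psi), math.cos(psi)               # tangent direction at P
    sp = math.hypot(px, py)
    ux, uy = px / sp, py / sp                            # unit vector along SP
    dx, dy = qx - px, qy - py
    qr = abs((tx * dy - ty * dx) / (tx * uy - ty * ux))  # R on the tangent, QR parallel to SP
    qt = abs(qx * uy - qy * ux)                          # distance from Q to the line SP
    return sp, qr / (sp ** 2 * qt ** 2)

def fitted_exponent(angles):
    """Least-squares slope of log(force proxy) against log(SP)."""
    xs, ys = zip(*[(math.log(sp), math.log(f)) for sp, f in map(force_proxy, angles)])
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / sum((x - mx) ** 2 for x in xs)

print(fitted_exponent([0.2 * k for k in range(1, 11)]))   # psi from 0.2 to 2.0
```

The fitted slope is close to −5. As in the text, the law of force is read off from the orbit and not the other way around.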
In the following propositions, Newton considers elliptical orbits and uses his
masterful command of the properties of conics to relate the ratio (QT )2 /Q R to the
geometry of the orbit.
3.2.4 Proposition 10: Elliptical Orbit with the Center of Force at the Center of
the Ellipse
In his treatment of elliptical orbits, Newton uses several properties of conics. Some
of them he derives, and some of them he mentions without proof. In order to make
this presentation self-contained, we present simple proofs of all the properties used
by Newton.
Figure 3.6 An ellipse is a rescaled circle. Upon rescaling (contracting the vertical
direction in this case), all the areas are changed by the same factor. Since the
circumscribed (shaded) parallelogram (right) is a contracted square, it has constant
area (its ratio with the area of the ellipse is always 4/π), irrespective of the
choice of conjugate axes.
46 An Excursion to Newton’s Principia
Figure 3.7 The intersecting chord theorem: $(Pv \times vG)/Qv^2 = (PC)^2/(DC)^2$.
This is a property that Newton cites without proof. In Proposition 10, Problem
5, he invokes this property, stating “by the property of conic sections.” According
to the recent translation by I. B. Cohen and Anne Whitman (1999), authors of
books on conic sections in the eighteenth and nineteenth centuries supplied a proof
of this theorem to help the readers of the Principia. And, as common readers of
the Principia, at this point of his derivation we relate to the words Galileo puts
in Simplicio’s mouth while demonstrating one of his geometrical proofs: “You
proceed too grandly in your demonstrations; it seems to me that you always assume
that all of Euclid’s propositions are as familiar and ready at hand to me as his
very first axioms” (Galilei, 1638/1974, p. 220). Here we offer a proof using the
rescaling idea. First we prove this for a circle, where the property can be expressed
as a ratio of areas. Since the areas are changed by the same factor upon rescaling,
the property remains valid for the ellipse.
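The intersecting chord theorem can also be checked numerically. The sketch below assumes the parametrization $x = a\cos t$, $y = b\sin t$, in which the semi-diameter conjugate to $CP$ is obtained by advancing the parameter by a quarter turn; the function name and the parameter `s` (the position of $v$ along the diameter, with $|s| < 1$) are illustrative, not taken from the text.

```python
import math

def chord_ratio(a, b, t0, s):
    """Return ((Pv * vG) / Qv^2, PC^2 / DC^2) for the ellipse x = a cos t, y = b sin t.

    s in (-1, 1) places the point v on the diameter PG, with the center C at the origin.
    """
    P = (a * math.cos(t0), b * math.sin(t0))       # point on the ellipse
    G = (-P[0], -P[1])                             # opposite end of the diameter
    D = (-a * math.sin(t0), b * math.cos(t0))      # end of the conjugate semi-diameter
    lam = math.sqrt(1.0 - s * s)
    v = (s * P[0], s * P[1])
    Q = (v[0] + lam * D[0], v[1] + lam * D[1])     # Qv is parallel to CD
    # Q must lie on the ellipse for the construction to make sense
    assert abs((Q[0] / a) ** 2 + (Q[1] / b) ** 2 - 1.0) < 1e-12
    Pv = math.hypot(P[0] - v[0], P[1] - v[1])
    vG = math.hypot(v[0] - G[0], v[1] - G[1])
    Qv = math.hypot(Q[0] - v[0], Q[1] - v[1])
    PC = math.hypot(P[0], P[1])
    DC = math.hypot(D[0], D[1])
    return Pv * vG / Qv**2, PC**2 / DC**2
```

Calling `chord_ratio(3.0, 1.0, 0.7, 0.4)` returns two numbers that agree to rounding error, for any admissible choice of the point $P$ and of the position of $v$.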
[Figure 3.8: two panels showing the parallelograms of areas $A_1$, $A_2$, and $A_3$, with labeled points D, R, P, v, C, and G.]
Consider the areas $A_1$, $A_2$, and $A_3$ of Figure 3.8. For the circle, using $PC = CD$
(equal to the radius of the circle) and the Pythagorean theorem, we have that the
ratio of the areas is
$$\frac{A_1 A_2}{A_3^2} = \frac{[(Gv)(CD)][(Pv)(CD)]}{(PC)^2 (RP)^2} \tag{3.11}$$
$$= \frac{[(PC + Cv)(PC)][(PC - Cv)(PC)]}{(PC)^2 (RP)^2} \tag{3.12}$$
$$= \frac{(PC)^2 - (Cv)^2}{(RP)^2} \tag{3.13}$$
$$= 1 \tag{3.14}$$
$$= \frac{A_1 A_2}{A_3^2}, \tag{3.15}$$
where, in the last line, we used the fact that all the areas are rescaled by the same
factor. Since the sides of the parallelograms of areas $A_1$, $A_2$, and $A_3$ form the same
angle $\alpha$ (the angle between the semi-major axes), their areas are the product of their
sides times $\cos\alpha$. The factor $\cos\alpha$ cancels in computing $A_1 A_2/A_3^2$, and we obtain,
for the ellipse:
$$\frac{(Gv)(Pv)(CD)^2}{(CP)^2 (RP)^2} = 1. \tag{3.16}$$
The theorem of equation (3.16) is valid regardless of the quantities $Pv$ and $RP$
being infinitesimally small. Since Newton is interested in infinitesimal values of
$Pv$ and $RP$, one can replace $Gv$ by $GP = 2CP$, and then the theorem adopts a
simpler form (note that $Qv = RP$):
$$\frac{(Pv)(CD)^2}{(PC)(Qv)^2} = \text{Constant} \ (= 1/2). \tag{3.17}$$
Equipped with these two properties of the ellipse, Newton derives the law of
force for the center S at the center of the ellipse (see Figure 3.9).
We are interested in the ratio $QT^2/QR \equiv QT^2/Pv$. By similarity of the
triangles $QTv$ and $PFC$, we have
$$\frac{QT}{Qv} = \frac{PF}{PC} = \frac{A_0}{(PC) \times (CD)} \tag{3.18}$$
Figure 3.9 From Proposition 10 of the Principia. Elliptical orbit with center of
force at the center of the ellipse.
the force law for an elliptical orbit with the center of force at the center of the
ellipse is proportional to the first power of the distance.
And now for the climax: the proof that an elliptical orbit with the center of force
at the focus implies that the force decreases as the inverse square of the distance.
Figure 3.10 The reflective property of the ellipse. Left: The dotted lines touch
confocal ellipses and have a total length greater than $PS + PH$. Right: If $EC$ is
parallel to the tangent at $P$, $PE$ is constant.
definition of the ellipse as the shape traced by a point so that the sum of its distances
from two foci, let us call it $L_0$, is constant. For a given ellipse, draw a set of confocal
ellipses (sharing the two foci) but with larger sum $L > L_0$ of the distances to the
foci. The tangent to the first ellipse will touch these confocal ellipses. If we move
a point along the tangent, the smallest total distance to the foci corresponds to $P$ (the
ellipse with the smallest $L$). And, according to Hero, the minimum distance occurs
when the angle of incidence equals the angle of reflection. Now, for the second
property, draw $HI$ parallel to the tangent at $P$ through the focus $H$ (see Figure
3.10). Due to the reflective property, $PI = PH$ (the triangle $IPH$ is isosceles).
Also, since $EC$ is parallel to $HI$, $ES = EI$ and the total length is $L_0 = 2PI +
2EI = 2PE$, so $PE$ is constant (equal to the ellipse’s semi-major axis).
$$\frac{Px}{Pv} = \frac{PE}{PC}, \tag{3.22}$$
$$\frac{QT}{Qx} = \frac{PF}{PE}. \tag{3.23}$$
1 In the letter’s signature we read “Gröningen June 9/19.” The duplicated dates correspond to the Julian and
Gregorian calendars which produced a dephasing of 10 or 11 days until 1701. Leibniz and Bernoulli discuss
calendars and the division of the day in many places of their correspondence. In letter 120, of January 1701,
Bernoulli jokes that he received Leibniz’s letter the day after it was written, “a notable day with a night of
eleven days” (Orio, 2009, p. 497).
52 The Optical-Mechanical Analogy, Part I
his response to Leibniz, John Bernoulli mentions two solutions. The first is similar
to Leibniz’s. The second uses a clever map from the swiftest path to the problem
of finding the trajectory of a light ray propagating in a medium of continuously
varying index of refraction. This mapping is a hallmark of the optical-mechanical
analogy, which had a profound influence on later formulations of mechanics.
In the nineteenth century, the analogy was used by William Rowan Hamilton, and
in the twentieth century it played a fundamental role in the formulation of wave
mechanics by Louis de Broglie and Erwin Schrödinger.
Bernoulli published his challenge in the December issue of the Acta Erudito-
rum and announced that he would suppress his own solution until Easter 1697.
The May 1697 issue of the Acta Eruditorum contained an introductory historical
paper by Leibniz on the brachistochrone. Leibniz omits his own solution because
it corresponded, he said, with the other solutions. The five solutions submitted by
John, Jacob Bernoulli, the Marquis de l’Hospital, Ehrenfried Walther von Tschirn-
haus, and Isaac Newton were published. Newton had not revealed his name; John
Bernoulli recognized the author, “from the claw of the lion.” In this section we
visit Huygens’ elegant proof that the cycloid corresponds to the isochronous pen-
dulum. We also discuss Leibniz’s solution of the brachistochrone, included in the
Beilage (a supplement or appendix) to his letter of June 16, 1696, as well as John
Bernoulli’s optical-mechanical solution.
Figure 4.2 Path independence of the final velocity. According to Galileo’s second
postulate, equation (2.12), particles that start at rest at P and at A, and travel
respectively along the inclined planes PQ and AQ, reach Q with the same velocity.
This means that the velocity at R is the same for the particle following the
path PQR or the inclined plane AR. But the velocity at R is the one acquired by
a particle falling from rest along the plane BR, and the velocity at S is the same
for the path PQRS and for the inclined plane BS.
accelerated motion, equation 2.12, and by discretizing the curve into a polygon
(see Figure 4.2).
Since the segment $GR$ is very small ($GR \ll h$), the velocity is approximately
constant along the segment, and given by $\sqrt{2gh}$. The travel time from $G$ to $R$ is
$$\text{Time}_{GR} \simeq \frac{GR}{\sqrt{2gh}}. \tag{4.1}$$
Using the similarity of the triangles $G\Sigma R$ and $GAQ$, we have
$$\frac{GR}{\Sigma R} = \frac{GA}{QA} = \frac{\sqrt{\ell \times QA}}{QA}, \tag{4.2}$$
where we have also used Galileo’s law of chords of Figure 2.12: $GA$ is the
geometric mean between $QA$ and the diameter $\ell$ of the circle. Huygens’ ingenious
step is to map the points of the cycloid to points of a circle of radius
$OS = (\ell - d)/2$ (see Figure 4.3). Consider the segment $ST$, whose vertical
projection $Sx$ is equal to $\Sigma R$. Using the similarity of the triangles $SxT$ and $SFO$,
and the fact that $SF = \sqrt{h \times AF} \equiv \sqrt{h \times QA}$, we have:
$$\frac{ST}{\Sigma R} = \frac{OS}{SF} = \frac{OS}{\sqrt{QA \times h}}. \tag{4.3}$$
Combining equations (4.2) and (4.3),
$$GR = \frac{ST}{OS} \times \sqrt{\ell \times h} \simeq \angle SOT \times \sqrt{\ell \times h}. \tag{4.4}$$
Substituting equation (4.4) in the expression for the time of passage of equation
(4.1), the factor $\sqrt{h}$ cancels out and we obtain:
$$\text{Time}_{RG} \simeq \angle SOT \times \sqrt{\frac{\ell}{2g}}. \tag{4.5}$$
independent of the amplitude. Notice that $T$ corresponds to the period for small
oscillations of a pendulum of length $2\ell$ (equal to the radius of curvature of the
cycloid at the lowest point),
Figure 4.3 Diagram from Huygens’ original work on the isochronous clock.
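Huygens' isochronism can be checked with a direct numerical integration. The sketch below is not Huygens' geometric proof: it assumes the parametrization $x = a(\varphi + \sin\varphi)$, $y = a(1 - \cos\varphi)$ with $a$ the radius of the generating circle (so that the diameter is $2a$), and adds up $ds/v$ with a midpoint rule. The function name, the value of `g`, and the number of steps are illustrative.

```python
import math

def descent_time(phi0, a=1.0, g=9.81, steps=200_000):
    """Time to slide from rest at parameter phi0 (0 < phi0 <= pi) to the lowest point
    of the cycloid x = a (phi + sin phi), y = a (1 - cos phi).

    Midpoint-rule integral of ds / v; the integrand has an integrable singularity
    at the starting point, so the result is only accurate to a fraction of a percent.
    """
    y0 = a * (1.0 - math.cos(phi0))
    dphi = phi0 / steps
    total = 0.0
    for i in range(steps):
        phi = (i + 0.5) * dphi
        y = a * (1.0 - math.cos(phi))
        ds = 2.0 * a * math.cos(phi / 2.0) * dphi   # arc-length element of the cycloid
        v = math.sqrt(2.0 * g * (y0 - y))           # speed from energy conservation
        total += ds / v
    return total
```

For starting points as different as `phi0 = 0.2` and `phi0 = 2.5` the result stays close to $\pi\sqrt{a/g}$, that is, the descent time does not depend on the amplitude.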
4.1 Bernoulli’s Challenge and the Brachistochrone 55
inverse velocities in the two media. The main difference is that $n$ and $r$ are varying
quantities as we move along the vertical line $AM$ and the fact that Leibniz
is implicitly considering variations over all possible curves. The analysis can be
repeated for further vertical segments as shown in Figure 4.5, where Leibniz shows
both the curve of fastest descent on the right and a parabola representing the time
for vertical fall on the left (we change the notation for subindices from Leibniz’s
$nB$ to the contemporary $B_n$). If we call $T_n$ the time for vertical fall from $B_n$ to
$B_{n+1}$, the ratio $T_n/B_n B_{n+1}$ is the inverse of the average velocity of the particle
traveling from $B_n$ to $B_{n+1}$. If the intervals $T_n$ and $B_n B_{n+1}$ are very small we have:
$$\frac{T_n}{B_n B_{n+1}} \approx \frac{1}{\sqrt{2g\,AB_n}} = \frac{1}{\sqrt{2gy}}. \tag{4.10}$$
$$\frac{1}{\sqrt{2gy}}\,\frac{D_n C_{n+1}}{C_n C_{n+1}} = \frac{1}{\sqrt{2gy}}\,\frac{dx}{\sqrt{dx^2 + dy^2}} = \text{Constant} \equiv k. \tag{4.11}$$
with $a = 1/(4gk^2)$. Leibniz expresses his solution in the form of equation (4.12)
without noticing that it corresponds to a cycloid.
2 We exchange x with y so that x is the horizontal coordinate. Leibniz calls y the horizontal coordinate.
Equation (4.14) is equivalent to equation (4.12), with a the radius of the circle that
generates the cycloid.
Bernoulli tells Leibniz that he found the solution in two ways. He refers to the
first method as a discovery of an “admirable coincidence” between the curvature
of a light ray that propagates in a non-uniform medium and the brachistochrone.
He cites the letter to De La Chambre where Fermat establishes that a light ray
refracts towards the perpendicular, traveling, from the point of view of time,
the shortest path. He also mentions Snell’s law (equation 2.28) and proposes an
extension to a medium where the velocity varies continuously (see Figure 4.6).
Bernoulli is extending Snell’s law to all points of the path, implicitly using the fact
that “any portion of a path of quickest descent must itself be a path of quickest
descent” (McDonough, 2009).
Figure 4.6 John Bernoulli divides space into horizontal regions on which the
velocity is constant. The path is a sequence of small straight segments. The curve
of minimum time for a particle traveling from A to B on a vertical plane is the
same as the light ray from A to B, provided the particle and ray velocities are
proportional.
Bernoulli discretizes the problem into thin horizontal layers. We call $d$ the thickness
of each layer (see Figure 4.7). According to Bernoulli, the particle’s velocity
in each layer is constant. The interfaces between regions are at heights $y_n = nd$
(here $n = 1, 2, \cdots$) and the particle’s velocity in the $n$-th region is approximately
$v_n = \sqrt{2g \times nd}$. Since one is looking for a path of least time, Bernoulli maps the
problem to that of a light ray that moves at a velocity $v_n$ in the $n$-th region and uses
Snell’s law at the interface. We call $\theta_n^i$ and $\theta_n^r$ the angles of incidence and refraction
of the ray at the $n$-th interface (see Figure 4.7). According to Snell’s law, we have
$$\frac{\sin\theta_n^i}{v_n} = \frac{\sin\theta_n^r}{v_{n+1}}. \tag{4.15}$$
Since the interfaces are parallel, the refracted angle at the $n$-th interface is equal to
the incidence angle at the next interface:
$$\theta_n^r = \theta_{n+1}^i, \tag{4.16}$$
which implies that the ratio of the sine of the angle of incidence to the velocity is
the same constant at every interface:
$$\frac{\sin\theta}{\sqrt{y}} = C, \tag{4.17}$$
with $C$ a constant. From our discussion of Leibniz’s solution, it is clear that
equation (4.17) represents a cycloid: since
$$\sin\theta = \frac{dx}{\sqrt{dx^2 + dy^2}}, \tag{4.18}$$
equation (4.17) is equivalent to Leibniz’s solution of equation (4.11).
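Bernoulli's layered construction is easy to reproduce numerically. The sketch below assumes that the constant in equation (4.17) is written as $C = 1/\sqrt{2a}$, so that $a$ is the radius of the generating circle, and it compares the horizontal distance accumulated through the layers with the cycloid $x = a(\varphi - \sin\varphi)$, $y = a(1 - \cos\varphi)$. The function names and the number of layers are illustrative.

```python
import math

def layered_ray_x(y_end, a=1.0, layers=20_000):
    """Horizontal distance travelled while descending to depth y_end (< 2a) through
    thin horizontal layers, applying sin(theta_n) = sqrt(n d / (2 a)) in the n-th layer."""
    d = y_end / layers
    x = 0.0
    for n in range(1, layers + 1):
        sin_theta = math.sqrt(n * d / (2.0 * a))             # Snell invariant, eq. (4.17)
        x += d * sin_theta / math.sqrt(1.0 - sin_theta**2)   # d * tan(theta_n)
    return x

def cycloid_x(y, a=1.0):
    """Horizontal coordinate of the cycloid generated by a circle of radius a, at depth y."""
    phi = math.acos(1.0 - y / a)
    return a * (phi - math.sin(phi))
```

With thin enough layers, `layered_ray_x(1.0)` and `cycloid_x(1.0)` agree to better than one part in a thousand: the broken line of refracted segments converges to the cycloid.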
In the last part of the letter, by way of an appendix, Bernoulli mentions his solu-
tion to another problem in which he anticipates ideas that William Rowan Hamilton
would study in the nineteenth century. He discusses a curve, which he calls “syn-
chronous,” generated by points B (see Figure 4.8) of cycloids of a common origin
A, so that the travel times from A to B are equal. The synchronous curve cuts
4.2 Maupertuis, Least Action, and Metaphysical Mechanics 59
all the cycloids perpendicularly and corresponds, says Bernoulli, to the “wave”
that Huygens (1690/1945) had discussed in his Treatise on Light. Using the results
of Huygens’ isochronous pendulum, he indicates how to construct such a curve.
From equation (4.5), the time to fall from $A$ to $B$ on a cycloid is proportional
to the product $\angle(GOL) \times \sqrt{GK}$ (see Figure 4.8). Since the time is also
proportional to $\sqrt{AP}$, Bernoulli indicates that the synchronous curve can be obtained
by intersecting the cycloid with a horizontal line through $L$, with the arc $GL$
proportional to the geometric mean of $GK$ with $AP$. This elegant construction is
possible due to the coincidental fact that the brachistochrone and the isochronous
pendulum correspond to the same curve. Bernoulli regards this coincidence as a
motive for metaphysical speculation: Nature, which operates according to the
simplest means, has chosen uniform acceleration for falling bodies, and one curve
to fulfill both functions. If the velocity, instead of increasing as the square root of
the height, were proportional to the height (a force proportional to the
height), then the isochronous curve is a straight line. On the other hand, Snell’s
law would give $\sin\theta = Cy$ (the equation of a circle) and the brachistochrone,
rather than a cycloid, is then a circle. Given its historical importance as well as the
counter-intuitive property of being faster than a straight line, the brachistochrone
has received much prominence outside standard mathematical venues. A
brachistochrone monument was unveiled at the University of Groningen in 1996 and, in the
Academy building, in Groningen, the brachistochrone is depicted in a stained-glass
window (Sussmann and Willems, 1997).
application: the principle of least action. His proposal, based on metaphysical and
religious views (Jourdain, 1912), reflected his adherence to notions of simplicity
that had previously guided Fermat and Galileo: “Nature, in the production of its
effects,” he wrote, “does so always by the simplest means” (Maupertuis, 1744).
More specifically: “in Nature, the quantity of action (la quantité d’action) necessary
for change is the smallest possible. Action is the product of the mass of a body
times its velocity times the distance it moves” (Maupertuis, 1746).
In his 1744 article, Maupertuis derives the law of refraction using a minimization
process. His procedure uses calculus, replicating Fermat’s method. However, in his
calculation, rather than minimizing time, he minimizes the action $V \times AR + W \times RB$
(see Figure 4.9), where $V$ and $W$ are the velocities of light in the different
media. Maupertuis minimizes the action treating the length of the segment $CR$ as
a variable, the procedure being identical to Leibniz’s calculation of equation (4.9).
His result is Snell’s law in Descartes’s version: the ratio of the sines of the angles
is equal to the reciprocal of the ratio of the velocities:
$$\frac{\sin\angle ARE}{\sin\angle FRB} = \frac{W}{V}. \tag{4.19}$$
Even though Maupertuis gets the wrong result for light, his expression, from a
corpuscular point of view, is correct. In contemporary language, as we discussed
in Section 2.4, conservation of momentum in the direction parallel to the interface
gives Maupertuis’s (and Descartes’s) law:
$$V \sin\angle ARE = W \sin\angle FRB. \tag{4.20}$$
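Maupertuis's minimization can be reproduced numerically. The sketch below assumes a simple geometry that is not specified in the text: the source lies a height `a` above the interface, the target a depth `b` below it, and `L` is their horizontal separation. The crossing point is found with a ternary search, which here stands in for the calculus used by Maupertuis.

```python
import math

def maupertuis_refraction(V, W, a=1.0, b=1.0, L=2.0):
    """Minimize the action V * AR + W * RB over the crossing point R.

    Returns (sine of the angle with the normal in the upper medium,
             sine of the angle with the normal in the lower medium).
    a, b, L are hypothetical geometric parameters chosen for illustration.
    """
    def action(x):
        return V * math.hypot(x, a) + W * math.hypot(L - x, b)

    lo, hi = 0.0, L
    for _ in range(200):                      # ternary search on a convex function
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if action(m1) < action(m2):
            hi = m2
        else:
            lo = m1
    x = 0.5 * (lo + hi)
    return x / math.hypot(x, a), (L - x) / math.hypot(L - x, b)
```

For `V = 1.0` and `W = 1.5` the two returned sines satisfy $V \sin\theta_1 = W \sin\theta_2$ to within the accuracy of the search, so the path bends towards the normal in the medium where the velocity is larger, as in equation (4.20).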
Maupertuis’s novelty is to show (albeit with the wrong premise) that, for the
restricted case of a single interface, the bending of a particle path can be obtained
from a minimum principle. In a subsequent paper entitled “Derivation of the laws of
light bends towards the perpendicular as it enters a denser (and, in this view, faster)
medium, taking a longer time than a straight line. So he proposes that light
“radiating from a point reaches an illuminated point by the easiest path” (Leibniz, 1682),
or “most determined” path (Leibniz, 1696/1952), where by easiest he means the
one that minimizes the total “resistance” opposed by the media. For a refracting ray
in an air-water interface like the one shown in Figure 4.9, the easiest path
corresponds to the minimum of $m \times AR + n \times RB$, where $m$ and $n$ are the resistances of
the upper (air) and lower (water) media respectively. Leibniz argues that the
resistance is higher in a denser medium and gets the correct result. His result agrees
in spirit with Fermat in that the physical path is obtained through a minimization
process and not by a local, mechanically efficient method, as favored by Descartes.
However, in order to agree with Descartes, Leibniz has to assume that light travels
faster in more resistant media, because “greater resistance prevents the diffusion
of light rays,” in a manner similar to a “river that flows in a narrow bed and thus
acquires a larger velocity” (Dugas, 1955, p. 260). In a deeper sense, Leibniz
maintains that a principle like the most determined path is reflecting “God’s intentions
to create the best of all possible worlds” (McDonough, 2008). “This principle of
nature,” he says in his Tentamen Anagogicum, “is purely architectonic,” and then he
adds: “Assume the case that nature were obliged in general to construct a triangle
and that for this purpose only the perimeter or the sum were given, and nothing
else; then nature would construct an equilateral triangle” (Leibniz, 1696/1952).
The biggest supporter of Maupertuis’s ideas was Leonhard Euler, who condemned
König’s accusations and applied Maupertuis’s naive formulation to curvilinear
paths to show that planetary orbits can be obtained by requiring the action to be
a minimum.
Figure 4.10 From Euler’s A Method for Finding Curved Lines having some
Properties of Maximum and Minimum (Euler, 1744). In order to determine the curve
$y = y(x)$, with $A \le x \le Z$, which minimizes (or maximizes) the definite integral
$\int_A^Z F(x, y, y')\,dx$, Euler divides the interval $AZ$ into many small subintervals,
each of width $\Delta x$. He then replaces the integral by a sum $\sum_i F(x_i, y_i, y_i')\,\Delta x$.
In each term of this sum, he approximates the derivative $y_i'$ by the slope of the
straight line between initial and final points of the subinterval. He then takes the
variation on a single point ($N$ in the figure), changing $y$ from $n$ to $\nu$, and asks that
the variation in the integral (the sum in the discretized version) is zero.
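Euler's single-point variation can be turned into a small numerical experiment. The sketch below is a loose illustration rather than Euler's procedure: it discretizes the travel time as a sum of $n(y_j)\sqrt{dx^2 + (y_j - y_{j-1})^2}$, then repeatedly moves one interior ordinate at a time to the value that minimizes the sum, using a ternary search. The function name, the search window `span`, and the number of sweeps are assumptions made for the example, and the search presumes that the local sum has a single minimum inside the window.

```python
import math

def relax_path(ys, dx, n, sweeps=200, span=5.0):
    """Minimize sum_j n(y_j) * sqrt(dx^2 + (y_j - y_{j-1})^2) with fixed end points,
    varying a single interior ordinate at a time."""
    ys = list(ys)
    for _ in range(sweeps):
        for j in range(1, len(ys) - 1):
            def local_time(yj):
                # only the two segments that touch point j depend on y_j
                return (n(yj) * math.hypot(dx, yj - ys[j - 1])
                        + n(ys[j + 1]) * math.hypot(dx, ys[j + 1] - yj))
            lo, hi = ys[j] - span, ys[j] + span
            for _ in range(60):               # ternary search for the best ordinate
                m1 = lo + (hi - lo) / 3.0
                m2 = hi - (hi - lo) / 3.0
                if local_time(m1) < local_time(m2):
                    hi = m2
                else:
                    lo = m1
            ys[j] = 0.5 * (lo + hi)
    return ys
```

As a sanity check, with a uniform medium (`n = lambda y: 1.0`) any wiggly initial guess relaxes to the straight line joining the end points, which is the expected path of least time.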
$N = x_{j+1}$, etc. With the same numbering convention, $Ll = y_{j-1}$, $Mm = y_j$,
$Nn = y_{j+1}$, etc. Following the logic that we used in previous discretizations, we
assume with Euler that the particle velocity $v(y)$ (a function of the vertical height)
is constant along each straight segment of the curve. The travel time from $m$ to $n$
will be $mn/v(y_j)$; from $n$ to $o$, it will be $no/v(y_{j+1})$, etc. Using the Pythagorean
theorem, we have:
$$mn = \sqrt{(dx)^2 + (y_j - y_{j-1})^2}, \tag{4.21a}$$
$$no = \sqrt{(dx)^2 + (y_{j+1} - y_j)^2}, \tag{4.21b}$$
and similarly for all the other segments of the curve. The total travel time $T$ is now
a sum over travel times on the individual segments:
$$T = \cdots + n(y_j)\sqrt{(dx)^2 + (y_j - y_{j-1})^2} + n(y_{j+1})\sqrt{(dx)^2 + (y_{j+1} - y_j)^2} + \cdots, \tag{4.22}$$
where for simplicity we called $n(y) = 1/v(y)$ (the index of refraction in the
optical-mechanical analogy). Euler designated $p_j$ the slope, or the derivative of the curve:
$$p_j = \frac{y_j - y_{j-1}}{dx}, \tag{4.23}$$
At this point, notice that the second term in equation (4.26), in the limit where $dx$
is infinitesimal, becomes
$$\frac{1}{dx}\left\{ n(y_j)\frac{p_j}{\sqrt{1 + p_j^2}} - n(y_{j+1})\frac{p_{j+1}}{\sqrt{1 + p_{j+1}^2}} \right\} \approx -\frac{d}{dx}\left[ n(y)\frac{d}{dp}\sqrt{1 + p^2} \right]. \tag{4.27}$$
Finally, introducing partial derivatives, we obtain the following differential
equation for the fastest path of a particle moving with an inverse velocity $n(y)$:
$$\frac{\partial}{\partial y}\left[ n(y)\sqrt{1 + p^2} \right] - \frac{d}{dx}\frac{\partial}{\partial p}\left[ n(y)\sqrt{1 + p^2} \right] = 0. \tag{4.28}$$
Following Euler, we can repeat the same discretization procedure for a general
integral of the form
$$\int Z(y(x), p(x))\,dx. \tag{4.29}$$
In addition to the constant $E$, there is another constant associated with equal areas
covered in equal times (the angular momentum in contemporary language). In polar
coordinates, the components of the velocity are $v_r$ in the radial direction and $v_\theta$ in
the tangential direction. Constant areas in equal times imply that
$$m r v_\theta \equiv m r^2 \frac{d\theta}{dt} = L = \text{Constant}, \tag{4.36a}$$
$$v_r = \frac{dr}{dt} = \frac{dr}{d\theta}\frac{d\theta}{dt} = \frac{dr}{d\theta}\frac{L}{m r^2}. \tag{4.36b}$$
Substituting equations (4.36) in equation (4.35), and noticing that $v^2 = v_r^2 + v_\theta^2$,
we obtain a differential equation for the trajectory $r = r(\theta)$ that does not involve
the time variable:
$$E = \frac{L^2}{2mr^4}\left(\frac{dr}{d\theta}\right)^2 + \frac{L^2}{2mr^2} - \frac{GMm}{r}, \tag{4.37}$$
or, in Euler’s notation
$$\frac{dr}{d\theta} = \frac{r}{\sqrt{C}}\sqrt{r^2\left(A + V(r)\right) - C}, \tag{4.38}$$
with $V(r) = GMm/r$, $C = L^2/2m$ and $A = E$. Euler shows that this differential
equation results from a minimization of Maupertuis’s action,
$$\int mv\,d\ell. \tag{4.39}$$
Euler calls $x$ the radial coordinate. The function to be found, $y(x)$, is our $\theta(r)$
(the polar angle of the orbit as a function of the radius), and Euler’s $p = d\theta/dr$.
This means that the function $Z$ is given by
$$Z(y(x), p(x)) = \sqrt{2m}\,\sqrt{A + V(x)}\,\sqrt{1 + x^2 p^2}. \tag{4.43}$$
4.3 Euler and the Method of Maxima and Minima* 67
Since $Z$ from equation (4.43) is not a function of $y$, $dZ/dy = 0$ and Euler’s
equation (4.30) reduces to
$$\frac{dZ}{dp} = \text{Constant} \;\rightarrow\; \frac{p x^2}{\sqrt{1 + x^2 p^2}}\sqrt{A + V(x)} = \sqrt{C}, \tag{4.44}$$
where we wrote the constant as $\sqrt{C}$ so as to stay close to Euler’s notation. Equation
(4.44), after simple algebra, gives
$$\frac{1}{p} = \frac{dx}{d\theta} = \frac{x}{\sqrt{C}}\sqrt{x^2\left(A + V(x)\right) - C}, \tag{4.45}$$
which is the same result as that of equation (4.38), obtained from the direct method.
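As a consistency check, one can verify numerically that a Kepler ellipse satisfies equation (4.38). The sketch below assumes the standard polar form $r = p/(1 + e\cos\theta)$ with semi-latus rectum $p = L^2/(GMm^2)$ and energy $E = -GMm(1 - e^2)/(2p)$, which are textbook results not derived in this section; the numerical values of `e`, `L`, `m`, and `GMm` are arbitrary.

```python
import math

def orbit_residual(theta, e=0.5, L=1.2, m=1.0, GMm=1.0):
    """Residual of the square of equation (4.38) for r = p / (1 + e cos(theta))."""
    p = L**2 / (m * GMm)                     # semi-latus rectum (assumed Kepler result)
    A = -GMm * (1.0 - e**2) / (2.0 * p)      # total energy E of the bound orbit
    C = L**2 / (2.0 * m)
    r = p / (1.0 + e * math.cos(theta))
    dr_dtheta = p * e * math.sin(theta) / (1.0 + e * math.cos(theta)) ** 2
    V = GMm / r                              # Euler's sign convention for V
    rhs_squared = (r**2 / C) * (r**2 * (A + V) - C)
    return dr_dtheta**2 - rhs_squared
```

The residual vanishes to rounding error at every angle, so the ellipse with the center of force at the focus is indeed a solution of the equation obtained from Maupertuis's action. The comparison is made between squares so that the sign of $dr/d\theta$ on the returning half of the orbit does not matter.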
Euler stresses that this calculation is valid as long as there is no resistance to
the motion.4 In other words, the motion involves a constant of motion E, the total
energy, and the velocity v of the particle is a simple function of the coordinates
determined by (4.35), the vis viva equation. This restriction to conservative motion
was not mentioned by Maupertuis. Euler (1751) later published a paper entitled
“Dissertation on the least action principle, with an examination of the objections
made by Professor König,” where he yields all the honor of the discovery of the
principle of least action to Maupertuis. He cites Aristotle’s notion that Nature does
nothing in vain. The preference for “least” is metaphysical; a maximum would be
evidence of “imperfection of the Creator’s wisdom” (Dugas, 1955, p. 275). However,
what Euler showed in his calculation is that orbits in a central potential are
extrema of the Maupertuis action, and not necessarily minima. It is clear that the
paths cannot be maxima. One could always increase the value of the action by
adding “wiggles” to the path over a region small enough in size that the potential
V is constant. This process increases the kinetic energy of the path (the integral of
the potential energy remains constant), and the action increases. As we will dis-
cuss in more detail in Section 6.8, the physical paths are either minima or saddle
“points” of the action. It is also interesting that Euler does not claim to have proven
or discovered a wider principle. On the contrary, he remarks that he has “not discov-
ered this beautiful property a priori but (using logical terms) a posteriori, deducing
after many trials the formula which must become a minimum” (Euler, 1751). Mach
had good things to say about Euler’s modesty and accomplishments: “Euler mag-
nanimously left the principle its name, Maupertuis the glory of the invention, and
converted it into something new and really serviceable” (Mach, 1960, p. 550).
4 Here the term “resistance” refers to friction, or non-energy-conserving forces, and not to resistance in the
Leibnizian sense.
This means that, if we know the trajectory of a particle with energy E connecting
two points of a medium in which the potential is V (x), that identical trajectory will
be the one followed by a light ray connecting those points in a medium in which
the index of refraction is5
$$n(x) \propto \sqrt{m\left[E - V(x)\right]}. \tag{4.51}$$
Conversely, the trajectory of a light ray in a medium in which the index of refraction
is n(x) will be identical to that followed by a particle of zero energy moving in a
potential
$$mV(x) \propto -\frac{n^2(x)}{2}. \tag{4.52}$$
As we already mentioned, the optical mechanical analogy was later used by
William Rowan Hamilton, and played a crucial role in the formulation of quan-
tum mechanics in the twentieth century. However, and remarkably, the idea “is
nowhere to be found again before the time of Hamilton in the nineteenth century”
(Carathéodory, 1937). From a pedagogical point of view, it is interesting that we
can obtain the trajectories of light rays in media with varying indexes of refrac-
tion by solving Newton’s equation of motion for a particle (Evans and Rosenquist,
1986).
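The last remark can be illustrated with a few lines of code. The sketch below is an assumption-laden example, not the method of Evans and Rosenquist: it takes a unit-mass particle of zero energy in the potential $V(y) = -n^2(y)/2$, integrates Newton's equation with a velocity Verlet scheme, and uses an index that varies only with $y$. The integration parameter plays the role of a time for the particle, not for the light; only the shape of the trajectory is of interest.

```python
import math

def trace_ray(n, dn_dy, theta0, steps=2000, dt=1e-3):
    """Trajectory of a unit-mass, zero-energy particle in V(y) = -n(y)^2 / 2,
    starting at the origin at an angle theta0 from the y axis.

    n and dn_dy are user-supplied functions for the index and its derivative.
    Returns the final position and velocity (x, y, vx, vy).
    """
    x, y = 0.0, 0.0
    vx, vy = n(0.0) * math.sin(theta0), n(0.0) * math.cos(theta0)
    for _ in range(steps):
        vy += 0.5 * dt * n(y) * dn_dy(y)     # force = -dV/dy = n(y) n'(y)
        x += dt * vx                          # no force along x, so vx is constant
        y += dt * vy
        vy += 0.5 * dt * n(y) * dn_dy(y)
    return x, y, vx, vy
```

For the linear profile `n = lambda y: 1.0 + 0.5 * y` the speed of the particle tracks $n(y)$ along the path, and the product $n(y)\sin\theta$ keeps its initial value: this is Snell's law in its continuous form, obtained here from the conservation of the horizontal momentum.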
Figure 4.12 A light ray refracting in a region in which the index of refraction
varies radially.
$$\sin\theta_n^r = \frac{cd}{ad} = \frac{r_{n-1}\,\Delta\theta}{ad},$$
and
$$\sin\theta_{n-1}^i = \frac{ab}{ad} = \frac{r_n\,\Delta\theta}{ad} \equiv r_n \frac{\sin\theta_n^r}{r_{n-1}}.$$
We can now omit the index “i”, since in the above equation $\theta$ is the angle that
the light ray forms with the line connecting the “center of force” with the point of
refraction. The idea is now to take the limit of $\Delta r$ very small so the trajectory of
the light ray becomes a continuous line. Also, we can write the velocity in terms of
the index of refraction $v(r) = c/n(r)$ and rewrite (4.54) as
In some optics textbooks this equation is called the formula of Bouguer (Born and
Wolf, 1999, p. 123), and is identical to the conservation of angular momentum6 if
we follow the optical-mechanical identification $mv(r) \leftrightarrow n(r)$.
with A a constant. From simple geometrical considerations we can see that this
equation corresponds to a hypocycloid, the curve described by a point on a circle
rolling inside of a larger circle.
From Figure 4.13 we see that
and
r cos θ = R sin α. (4.59)
Figure 4.14 John Bernoulli’s proof of Snell’s law using the mechanical equilib-
rium of a tense string. Reproduced from Bernoulli (1742).
and others, Snell’s law is the result of the minimization of the time $t$ from $A$ to
$B$, given by $ct = n_1 \ell_1 + n_2 \ell_2$, where $\ell_1 = AE$ and $\ell_2 = EB$. The analogy is
immediate; the quantity to be minimized for Bernoulli’s string is given by
$$U = T_1 \ell_1 + T_2 \ell_2. \tag{4.62}$$
The quantity $U$ is the potential energy of the system. If we use the optical-mechanical
analogy, the tension $T_i$ of the string can be identified with the momentum $mv_i$
of the particle, which, in turn, is given by $mv_i = \sqrt{2m(E - V_i)}$, where $V_i$ is the
corresponding potential energy for a particle in each region.
Table 4.1 Corresponding quantities in the analogy used in the principle of least
action between mechanics, geometric optics and the equilibrium of a
non-stretchable string.
Figure 4.15 Frictionless pulleys that can slide in horizontal lines, with a string
passing through them a sufficient number of times, give the trajectory of the
particle if we identify $T_i$ with $mv_i$ at each segment. Since the string can only pass
through each pulley an integer number of times, the ratios of the velocities are
approximated by the ratios of the number of times the rope passes through each
segment (see Mach, 1960, p. 473).
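$$v_P = \frac{x_i - x_P}{dt}, \tag{4.63}$$
where points $x_i$ and $x_P$ should be thought of as very close to each other. The effect
of $F$ is to change the particle’s velocity from $v_P$ to $v_Q$, given by
$$v_Q = \frac{x_Q - x_i}{dt}. \tag{4.64}$$
[Figure 4.16: (a) path in the $(x, t)$ plane from $P = (x_P, 0)$ through $(x_i, dt)$ to $Q$, with the force $F$ acting at $x_i$; (b) the same points joined by springs of constant $k = m/(dt)^2$, with a force $-F$ at $x_i$.]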
Notice that, since the force is downward, the slope decreases: downward force
means that, at xi , the potential is increasing as a function of x.
The path (in space-time $(x, t)$) is a solution of Newton’s second law, according to
which the rate of change of the velocity times the particle mass is the force acting
on it:
$$F = m\frac{v_Q - v_P}{dt}. \tag{4.65}$$
Substituting the above expressions for the velocity:
$$F = \frac{m}{(dt)^2}\left(x_Q - x_i\right) - \frac{m}{(dt)^2}\left(x_i - x_P\right). \tag{4.66}$$
At this point we take a step towards abstraction, and forget the space-time picture
for a moment. Equation (4.66) can be thought of as describing the force of a system
of two springs of identical “spring constant” $k = m/(dt)^2$, the first spring
connecting point $(x_i, dt)$ with $(x_P, 0)$, the second connecting $(x_Q, 2dt)$ with $(x_i, dt)$, as
sketched in Figure 4.16(b). In order for the system to be in equilibrium, or, in other
words, for the intermediate coordinate to have the value $x_i$ (the other two are fixed),
there has to be a force of precisely magnitude $F$ but of opposite sign.
The path given by Newton’s law is determined by the equilibrium condition of a
mechanical model of two springs in the presence of a potential of opposite sign
to that of $V(x)$. The equilibrium configuration is the one that minimizes the
potential energy of the entire system, springs plus “external” potential $-V(x)$. Since the
potential energy for a spring of spring constant $k$ connecting two points separated
by a distance $\delta$ is $k\delta^2/2$, the total potential energy of the system (that we will call
$\tilde{S}$) is given by
$$\tilde{S} = \frac{m}{2}\left(\frac{x_i - x_P}{dt}\right)^2 + \frac{m}{2}\left(\frac{x_Q - x_i}{dt}\right)^2 - V(x_i). \tag{4.67}$$
Just as in Euler’s treatment, the equilibrium condition of equation (4.66) is
obtained from $d\tilde{S}/dx_i = 0$, noting that $F = -dV/dx$.
Coming back to the original world line picture, the optimum path in space-time
is the one that minimizes the difference between kinetic and potential energy. This
is Hamilton’s principle, which in this case we obtained using a mechanical analogy
similar to the principle of least action in the sense that there is a correspondence
between the kinetic energy and the potential energy of the fictitious springs. In
other words, the stretchable string is in equilibrium due to two types of forces in
4.5 The String Analogy and the Principle of Least Action 77
space-time: the external force due to (minus) the real external potential, and the
elastic force of fictitious springs playing the role of the kinetic energy.
For a longer path with $N$ straight segments, each of them traversed by the particle
in a time $dt$, the velocity at the $i$-th segment will be $v_i = (x_{i+1} - x_i)/dt$ and the
equivalent potential energy will be given by
$$\tilde{S} = \left(\frac{mv_1^2}{2} - V(x_1)\right) + \left(\frac{mv_2^2}{2} - V(x_2)\right) + \cdots + \frac{mv_N^2}{2}. \tag{4.68}$$
In the continuum limit we will have to minimize the quantity $S[x(t)]$:
$$S[x(t)] = \int_{x(0)}^{x(t)} dt \left(\frac{mv^2}{2} - V(x)\right), \tag{4.69}$$
which amounts to finding the path $x(t)$ that minimizes the integral $S$.
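The spring picture lends itself to a direct numerical test. The sketch below assumes a uniform gravitational potential $V(x) = mgx$, which is not an example worked in the text, and minimizes the discretized sum one coordinate at a time: for this potential, setting the derivative with respect to $x_i$ to zero gives $x_i = (x_{i-1} + x_{i+1})/2 + g\,(dt)^2/2$. The function names and the number of sweeps are illustrative, and the constant term $V$ at the fixed end points is left out of the sum.

```python
def discrete_action(xs, dt, m, V):
    """Sum over segments of m v_i^2 / 2, minus V at the interior points
    (the fixed end points only add a constant)."""
    total = 0.0
    for i in range(len(xs) - 1):
        v = (xs[i + 1] - xs[i]) / dt
        total += 0.5 * m * v * v
        if i > 0:
            total -= V(xs[i])
    return total

def relax_free_fall(xs, dt, g, sweeps=2000):
    """Coordinate-by-coordinate minimization for V(x) = m g x with fixed end points.

    Each update places x_i at the exact minimum of the sum with respect to x_i,
    which is the equilibrium position of the two springs attached to it.
    """
    xs = list(xs)
    for _ in range(sweeps):
        for i in range(1, len(xs) - 1):
            xs[i] = 0.5 * (xs[i - 1] + xs[i + 1]) + 0.5 * g * dt * dt
    return xs
```

Starting from a flat path with both end points at $x = 0$ and a total time of $N\,dt$, the relaxed points fall on the parabola $x(t) = \tfrac{1}{2} g\, t\,(N\,dt - t)$, the trajectory of a body thrown upward that returns to its starting height.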
We can also derive Hamilton’s principle using a slightly more sophisticated
approach while still keeping it elementary. In reference to Figure 6.11(a), Mau-
pertuis’s and Euler’s principle of least action tells us the path a particle of fixed
energy E will choose in going from A to B. Call V1 and V2 the potential ener-
√
gies in the upper and lower parts of the line C D, and v1 = 2m(E − V1 ) and
√
v2 = 2m(E − V2 ) the corresponding velocities. Now consider paths with differ-
ent energies and ask for which of those paths the particle will satisfy Newton’s laws
and spend a fixed amount of time t going from A to B. Following the logic of the
principle of least action, we want to find a function of the paths that will give the
desired one upon minimization. For the special case under consideration, the path
consists of two straight segments, and the function has to be such that, of all paths
that take a time t in going from A to B, the particle chooses the one that satisfies
the “Snell’s law for particles”: mv1 sin θ1 = mv2 sin θ2 .
Call a and b the perpendicular distances of A and B to the interface C D, L the
horizontal distance between A and B, and x the distance CE. Maupertuis’s action
$A = m v_1 \ell_1 + m v_2 \ell_2$ can be thought of as a function of x and the energy:
\[
A(x, E) = m v_1(E) \sqrt{x^2 + a^2} + m v_2(E) \sqrt{(L - x)^2 + b^2}. \tag{4.70}
\]
80 D’Alembert, Lagrange, and the Statics-Dynamics Analogy
Figure 5.1 Adapted from Galileo’s “On Mechanics.” Virtual displacements for an
inclined plane. According to Galileo, the system is at equilibrium if the ratio of
their vertical virtual velocities is reciprocal to the ratio of the weights.
vertical velocities (if they were displaced in the same amount of time) is 1/2. In
general, says Galileo, two weights will be in equilibrium on the inclined plane of
Figure 5.1 when the ratio of the forces E and F is equal to the ratio of the lines
C B and AC.
Interestingly, although Galileo is usually portrayed as rejecting Aristotle’s
dynamics, the principle of virtual displacements, the exclusive basis of Galileo’s
science of motion – at least according to historian Pierre Duhem – is of Aristotelian
heritage (Duhem, 1905, p. 260).
The term “virtual” appears for the first time in a letter of John Bernoulli to Pierre
Varignon, dated January 26, 1717. Varignon was able to solve numerous problems
of statics using the parallelogram rule for the decomposition of forces. Bernoulli
used a different approach. He didn’t see a clear way of introducing the reactive
forces in statics and created his “rule of energies” (later called the “principle of
virtual velocities” by Lagrange). For Bernoulli, a virtual velocity is a tendency, or
a propensity, to move, that the acting forces have on the system at equilibrium, and
his proposition is: “In any equilibrium of any forces in any way they are applied and
following any directions, either they interact with each other indirectly or directly,
the sum of the positive energies will be equal to the sum of the negative energies
taken positively” (Varignon, 1735, p. 176). What Bernoulli calls “energies” is what
we call today virtual work. In Figure 5.2 we reproduce Bernoulli’s definition of
virtual velocity. A force F, represented by the line FP, is acting on a point P of
the system in equilibrium. He now calls P p the (infinitesimal) movement of the
point when it is displaced from equilibrium. Since the point has moved from P to
p, according to Bernoulli the direction of the force at p will be along the line f p,
which in general will not be colinear with FP. He calls the segment C p the virtual
velocity of the force at P, and the energy is given by F ×C p (where “×” means the
usual product). The energy can be positive or negative depending on the direction
of the force at P, which can be from P to F or from F to P. In contemporary
notation, we denote the point P by the coordinate x and use F(x) for the force F.
Figure 5.2 Bernoulli’s definition of virtual velocity (figure not reproduced; labeled points F, f, P, C).
For the displacement P p we use δx and F(x + δx) for the force F at p. In this
notation C p = δx · F(x + δx)/|F(x + δx)|. If we now use the fact that the segment
C p is infinitesimal, keeping the terms to lowest order in δx, we have:
In this notation, for a system of N points that can undergo N virtual displacements
Bernoulli’s statement reads
\[
\sum_{i=1}^{N} \mathbf{F}_i \cdot \delta\mathbf{x}_i = 0. \tag{5.4}
\]
The quantities δxi are not arbitrary displacements of the system but those dis-
placements allowed by the constraints. It is crucial that each δxi is infinitesimal
in order for equation (5.4) to express a condition of equilibrium. When the dis-
placements are infinitesimal, we can divide equation (5.4) by an infinitesimal of
time “dt” (a differential of a “virtual time” t of sorts since the system is at rest)
and call Bernoulli’s statement the principle of virtual velocities. Varignon praised
Bernoulli’s idea but considered the rule of energies as a corollary to his parallel-
ogram rule. He was not able to prove that statement in general but showed the
equivalence between the two methods in many specific examples, one of which
we show in Figure 5.3. Three equal weights of magnitude W hang from pulleys
on a horizontal table, and the strings from which they hang are knotted at O. The
forces $\mathbf{W}_A$, $\mathbf{W}_B$, and $\mathbf{W}_C$ acting on O are of equal magnitude but point in different
directions. We want to find the angle between the strings when the system is at
equilibrium. The principle of virtual work for this example can be expressed as
\[
(\mathbf{W}_A + \mathbf{W}_B + \mathbf{W}_C) \cdot \delta\mathbf{x} = 0, \tag{5.5}
\]
Figure 5.3 Virtual displacements and equilibrium for three equal weights con-
nected by strings and knotted at a point O. Modified from Mach (1960).
Figure 5.4 Impressed, unchanged, and lost motions (or velocities) for a rigid bar
that is free to rotate around C.
Figure 5.5 The center of oscillation X of a bar with two masses is such that,
if an impulsive force (or impressed velocity) g is applied at X , the compound
system will behave as a simple pendulum with a mass at X . The motions (arcs)
AM = X T = BG are all proportional to g. a1 and a2 are the lost motions.
horizontal, and the force of gravity acts impulsively.1 Since the masses acquire
momenta proportional to m 1 g and m 2 g respectively, their impressed velocities will
be equal, and proportional to g.
If the masses, rather than being constrained to be along the same line, were
free to rotate independently around C, a short time after the impulses they would
describe arcs of the same length, proportional to g. Mass 1 would be at position
M, and mass 2 would be at G. Since all of the calculation is based on proportion-
alities, we take the arcs AM = BG = g. In order for the constraint to be satisfied,
two extra motions, a1 and a2 , are required (see Figure 5.5). In their real position
after the impulses, the positions of the masses form the same angle θx with the
horizontal:
\[
\frac{g + a_1}{\ell_1} = \frac{g - a_2}{\ell_2} = \theta_x. \tag{5.7}
\]
The quantities a1 and a2 are the lost motions of this problem, and originate in
internal, impulsive forces, F1 and F2 , that act perpendicularly to the bar. Since
these forces act in the same infinitesimal time interval dt as the impulsive velocity
g dt, we have, using Newton’s law:2 a1 = F1 /m 1 , and a2 = F2 /m 2 . Bernoulli
assumes that the lost motions acting on the system at rest will not alter the static
equilibrium; the forces $F_1$ and $F_2$ obey the law of the lever, $F_1 \ell_1 = F_2 \ell_2$, or, equivalently:
\[
m_1 a_1 \ell_1 = m_2 a_2 \ell_2. \tag{5.8}
\]
1 Bernoulli considers the bar at an angle with the horizontal. The algebra is slightly simpler when the bar is
horizontal and is enough to capture the essence of the method.
2 Again, we are using equalities where we mean proportionality. Strictly speaking, if we call $a_1$ a
displacement, it is a quantity of “second order.” First, the impressed velocity $dv_1$ comes from a force $F_1$
acting in an infinitesimal time dt: $dv_1 = F_1\, dt/m_1$. Second, the displacement $a_1$ is the distance traveled in a
small time that we can call δt: $a_1 = F_1\, \delta t\, dt/m_1$. The factors dt, δt are omitted, and do not enter into the
final calculation, provided one is talking about infinitesimal displacements.
5.3 D’Alembert’s Principle 85
idea of causes motrices (motive causes) was for him “obscure and metaphysical”
(d’Alembert, 1743, p. xvi). His program was to reduce all dynamics to kinematics
and to describe motion from geometry only, without physical concepts such as
force, which are drawn from experience. He didn’t object to the notion of mass,
which is also physical, but more tolerable than force, since it can be conceived of
as the number of particles in a body. For him, changes of motion of particles origi-
nate in collisions with other particles, which in turn collided previously with other
particles. In this sense, the “cause” of the change in motion is a consequence of a
previous consequence. The exception is gravity (and other forces like magnetism
that were not conceptualized at the time). He even tried to develop an impact the-
ory of gravitation and failed (Hankins, 1970, p. 167). If one treats the motion under
gravity as collisions with an invisible stream of particles, the resulting “force”
will depend on the velocity of the attracted body, and that is not observed. He
insisted that the motion under gravity could be described without forces, and that
the causes of motion are “known only through the effects, and we are completely
ignorant of their real nature” (d’Alembert, 1743, p. x). D’Alembert’s discomfort
with the notion of force was founded as follows. In order for the relation f = ma
(or f = dp/dt) to be a physical law, mass, acceleration, and force have to be
defined independently. If we exclude the law of gravitation, and define force as the
rate of change of momentum, Newton’s second law says that the rate of change
of momentum is equal to the rate of change of momentum, which is a tautology. Since “forces”
of constraint, for example, are not gravitational, d’Alembert formulates his theory
avoiding the notion of force. It is interesting that Einstein’s theory of gravitation,
developed centuries later, is a theory based on geometry, without forces.
In his Treatise, d’Alembert postulates two laws and two theorems as the foun-
dation of dynamics. The first law is the law of inertia: a body at rest will remain
at rest unless an external influence acts on it. He explicitly states that he takes this
law from Newton. The second law states that a body, once it is put in motion, will
“persevere uniformly in a straight line” unless an external influence acts on it. The first
theorem is the parallelogram rule for the composition of velocities: if two “forces”
(he uses the term puissance and not force) act to change the velocity of a body at A
so that one will make it move uniformly from A to B and the other from A to C, the
body will change its velocity along the diagonal AD of the parallelogram formed
by AB and AC. The second theorem, which he calls the “law of equilibrium,”
and refers to impenetrable bodies, is as follows: “If two bodies whose velocities
are in inverse ratio of the masses, such that one cannot move without shifting the
other, there is equilibrium between these two bodies.” In his article “Equilibre”
in the monumental Encyclopédie he admits that he is using the term equilibrium
that comes from statics (from the Latin aequus and libra or ‘equal balance’) to
mean something from dynamics: if two impenetrable bodies of equal momenta
(mv) collide and their motions are destroyed by the collision, “after the instant
of the collision these two bodies have lost their tendency to move” (d’Alembert,
1755). Today we generally choose Newton’s laws of motion over d’Alembert’s.
We accept mechanics as an experimental science and not a branch of geometry and
think of forces as acting continuously and not by impacts. D’Alembert remains a
major name in the history of mechanics, not because of his laws, but because of his
principle, stated in Part II of the Treatise.
D’Alembert formulates his principle (we reproduce the original statement
in Appendix B) in terms of “motions” and not of forces. Consider the motions
a, b, c, · · · imparted on a system of masses A, B, C, · · · , as shown in Figure 5.6.
By motions d’Alembert means the momentum mv. In fact, since in general the
particles were already moving, we should think of a, b, c, · · · as changes in the
momenta of particles A, B, C, · · · . Due to the constraints of the system, part of
those motions will be lost and part will remain unchanged. D’Alembert uses the
parallelogram rule to decompose the impressed motion into the unchanged motions
ā, b̄, c̄ · · · and the lost motions α, β, γ · · · . The principle states that, if we think of
the system at rest and act upon it with the lost motions only, the system will remain
at rest: the lost motions cancel each other out. In general, these motions will not
be colinear, and we have to use the law of the lever to establish that cancellation.
With this simple prescription, one should be able to find the motions ā, b̄, c̄ · · · of
the system.
The first example presented by d’Alembert is the calculation of the center of
oscillation, where the calculation is essentially the same as the one by Bernoulli,
in the version we presented in Section 5.2. In solving the examples, d’Alembert
applies his principle using a geometrical approach that resembles that in the
Principia and which looks complicated for the modern reader (Fraser, 1985).
Figure 5.6 Impressed, unchanged, and lost motions (or velocities) for a system
with constraints. D’Alembert’s principle states that, if the system at rest is acted
upon by the lost motions only, it will remain at rest.
D’Alembert’s great insight is that the system is in static equilibrium under the
action of forces of constraint (or the lost motions in his language). The condition
of static equilibrium, according to the principle of virtual work, is
Figure 5.7 Virtual (δx) versus real (dx) displacements for a particle constrained
to move on a circle, the radius of which, r (t), varies according to an externally
imposed function of t. The dashed line shows a possible path of the particle.
other. On the other hand, the real displacements dx, being tangent to a real path
x(t) are not independent of each other at different times. A third difference arises
regarding the work done by the forces of constraint. Equation (5.11) can be read
as saying that the forces of constraint do no work under virtual displacements. In
order to illustrate this last point, consider the example of Figure 5.7: the motion of
a particle constrained to move on a circle whose radius r (t) is some given function
of t and there is no external force. At a given time, the virtual displacement δx is
in the direction tangent to the circle. Since the force of constraint, Fc , is along the
radial direction we have: Fc · δx = 0. On the other hand, the real velocity v in gen-
eral can have a radial component and so will the real displacement dx = vdt. This
means that the force of constraint can in fact do work under real displacements, but
not under virtual displacements.
The connection between static equilibrium and dynamics appears also in Euler’s
work, where he sets out to clarify what is really meant by the “action” to be
minimized (Euler, 1748a,b). Euler considers the integral of the force over the
displacement:
\[
\Phi = \int \mathbf{F} \cdot d\mathbf{x}, \tag{5.13}
\]
which he calls the “action of the forces” (Euler, 1748b) or “effort of the
forces” (Euler, 1752). Later Lagrange will call this quantity the “potential.” In equilibrium, this quantity is a minimum. Euler states that it is “natural to maintain that
the principle of equilibrium should also hold for the motion of bodies acted upon
by similar forces.” Since, he says, “the intent of Nature is to economize the total
effort as much as possible,” then, “if dt denotes the element of time, the integral
$\int \Phi\, dt$ must be a minimum. Thus if in the state of equilibrium, the quantity $\Phi$ is
a minimum, the same laws of nature seem to require that for motion, the integral
$\int \Phi\, dt$ should also be a minimum” (Euler, 1752).
He then uses the vis viva theorem,6 $\tfrac{1}{2} M v^2 = C - \Phi$, to obtain that $2Ct - \int M v^2\, dt$ should be a minimum. The
term Ct “does not enter into the consideration of the maximum or minimum”7 and
$\int M v^2\, dt = \int M v\, ds$, Maupertuis’s action, should be a minimum.
6 He does not write the factor 1/2 in the kinetic energy, but that does not affect the result.
7 Evidently Euler is referring here to an extreme, since the minimum of $\Phi$ is the maximum of $-\Phi$.
A delicate point that emerges in the evaluation of equation (5.16) is the choice
of independent variable for the paths one wishes to compare. Choosing time as
the independent variable is problematic: since we are comparing paths of the same
energy, the elapsed times will be different for different paths, and time will have to
be varied. Since Lagrange does not vary time, his derivation was re-analyzed criti-
cally in the nineteenth century by several researchers including Rodrigues (1816),
Mayer (1877), and notably by Jacobi (1884). The main point is that, for paths of
the same energy, if one uses time as the variable, the changes dδx and δdx are not
equal. One can use the “operators” “dδ” or “δd” interchangeably when comparing
paths that take the same time in going from the initial to the final point, which is
not the case for $\int v\, ds$. As we anticipated in Section 4.5, the path connecting two
points at a given time is the one that minimizes the quantity
\[
\int dt\, (T - V), \tag{5.17}
\]
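\[
\delta \int v\, ds = \int_0^1 d\lambda \left[ \frac{-\nabla V}{M v} \cdot \boldsymbol{\delta}\, |\mathbf{x}'| + v\, \frac{\mathbf{x}' \cdot \boldsymbol{\delta}'}{|\mathbf{x}'|} \right] = 0. \tag{5.24}
\]
Since the vector $\mathbf{x}'/|\mathbf{x}'|$ is of unit length and along the tangent to the path, we can
write:
\[
\delta \int v\, ds = \int_0^1 d\lambda \left[ \frac{\mathbf{F}}{M v} \cdot \boldsymbol{\delta}\, |\mathbf{x}'| + \mathbf{v} \cdot \boldsymbol{\delta}' \right], \tag{5.25}
\]
where $\mathbf{F} = -\nabla V$ is the external force acting on the particle, originating from the
potential V. Now we can write the identity
\[
\frac{d}{d\lambda} (\mathbf{v} \cdot \boldsymbol{\delta}) = \mathbf{v}' \cdot \boldsymbol{\delta} + \mathbf{v} \cdot \boldsymbol{\delta}', \tag{5.26}
\]
and use the fact that $\boldsymbol{\delta}$ vanishes at the extremes11 of the integral to write
\[
\delta \int v\, ds = \int_0^1 d\lambda \left[ \frac{|\mathbf{x}'|}{M v}\, \mathbf{F} - \frac{d\mathbf{v}}{d\lambda} \right] \cdot \boldsymbol{\delta}, \tag{5.27}
\]
which, given that the variations $\boldsymbol{\delta}$ are arbitrary, implies
\[
\left[ \frac{|\mathbf{x}'|}{M v}\, \mathbf{F} - \frac{d\mathbf{v}}{d\lambda} \right] \cdot \boldsymbol{\delta} = 0. \tag{5.28}
\]
If we now use $\mathbf{x}' = d\mathbf{x}/d\lambda$, $|\mathbf{x}'| = ds/d\lambda$, and $ds/v = dt$, we can write equation
(5.28) as
\[
\left[ \mathbf{F} - M \frac{d\mathbf{v}}{dt} \right] \cdot \delta\mathbf{x} = 0, \tag{5.29}
\]
where we went back to the notation $\boldsymbol{\delta} = \delta\mathbf{x}$. Equation (5.29) is, in contemporary
notation, the result Lagrange obtained in his 1760/1761 paper. If the variations of
the different components of $\boldsymbol{\delta}$ along the x, y, z axes are independent, we obtain
Newton’s law: $\mathbf{F} = M\, d\mathbf{v}/dt$. On the other hand, if the particle is subject to geometrical constraints (for example, if it is restricted to move on a certain surface),
then the components of the possible variations of $\delta\mathbf{x}$ are not independent of each
other, and equation (5.29) becomes d’Alembert’s principle. Lagrange appears to
have noticed that his derivation corresponded to d’Alembert’s principle (or the
principle of virtual velocities), and therefore the principle of least action would
become a result derivable from a more fundamental principle.
9 The function $\boldsymbol{\delta}(\lambda)$ is to be regarded as of infinitesimal magnitude. We can also write it as $\boldsymbol{\delta}(\lambda) = \epsilon\, \boldsymbol{\eta}(\lambda)$, with
$\epsilon$ infinitesimally small.
10 Recall that both $\boldsymbol{\delta}$ and $\boldsymbol{\delta}'$ are infinitesimal if one thinks of both as proportional to some $\epsilon$.
11 Notice that, had we chosen time to parameterize the paths $\mathbf{x}(t)$ and $\mathbf{x}_1(t) = \mathbf{x}(t) + \boldsymbol{\delta}(t)$, this last step would
not be licit. The total elapsed times $T$ and $T_1$ are different for the different paths, and $\boldsymbol{\delta}(T) \neq 0$.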
In a prize essay on the libration of the moon, Lagrange (1764) formulates the
dynamic equations of motion using a “new principle of mechanics:” the principle
of virtual velocities. From then on, Lagrange shifts to the principle of virtual veloc-
ities, probably because of his rejection of the teleological speculations associated
with least action (Fraser, 1983).
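\begin{align}
\delta x &= A\, \delta\xi + B\, \delta\psi + C\, \delta\varphi \tag{5.30a} \\
\delta y &= A'\, \delta\xi + B'\, \delta\psi + C'\, \delta\varphi \tag{5.30b} \\
\delta z &= A''\, \delta\xi + B''\, \delta\psi + C''\, \delta\varphi, \tag{5.30c}
\end{align}
and the same expressions for dx, dy, dz in terms of dξ, dψ,
dϕ. Lagrange does not use the language of partial derivatives, but we identify
A = ∂x/∂ξ, B = ∂x/∂ψ etc. Using equations (5.30) in the expression of the
acceleration we obtain:
\begin{align}
d^2x\, \delta x + d^2y\, \delta y + d^2z\, \delta z = {} & (A\, d^2x + A'\, d^2y + A''\, d^2z)\, \delta\xi \\
& + (B\, d^2x + B'\, d^2y + B''\, d^2z)\, \delta\psi \\
& + (C\, d^2x + C'\, d^2y + C''\, d^2z)\, \delta\varphi. \tag{5.31}
\end{align}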
The variation of I is
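\begin{align}
\delta I &= \int_{t_a}^{t_b} dt \left[ \frac{\partial f}{\partial x}\, \delta x(t) + \frac{\partial f}{\partial \dot{x}}\, \delta \dot{x}(t) \right] \tag{5.41} \\
&\equiv \int_{t_a}^{t_b} dt \left[ \frac{\partial f}{\partial x}\, \delta x(t) + \frac{d}{dt} \left( \frac{\partial f}{\partial \dot{x}}\, \delta x(t) \right) - \left( \frac{d}{dt} \frac{\partial f}{\partial \dot{x}} \right) \delta x(t) \right]. \tag{5.42}
\end{align}
The middle term in equation (5.42) vanishes upon integration if the variation δx
is zero at the extremes of the integral. Since the variations are arbitrary, we obtain
$\frac{\partial f}{\partial x} - \frac{d}{dt} \frac{\partial f}{\partial \dot{x}} = 0$. In other words, the structure of equation (5.39) can be regarded
as resulting not from a dynamical principle, but from the minimization of $\int L\, dt$,
with L an arbitrary function of $q_i$, $\dot{q}_i$ and time.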
William Rowan Hamilton, whose work was inspired by that of Lagrange, had
words of high praise for the Mécanique analytique: “[Lagrange showed] that the
most varied consequences respecting motions of systems of bodies may be derived
from one radical formula; the beauty of the method so suiting the dignity of the
results, as to make of his great work a kind of scientific poem” (Hamilton, 1834a).
13 Lagrange considers f to be a function of more variables, y, z and higher order derivatives ẍ, ÿ, etc. but in
order to illustrate that the implied structure is that of equation (5.39) it is enough to consider the integral of
equation (5.40).
5.4.2 Symmetries
Symmetries play a key role in both classical and quantum physics. In the con-
text of Lagrangian mechanics, a symmetry is some mathematical operation or
transformation that leaves the Lagrangian or the equations of motion unchanged.
For example, right after deriving his scientific poem, Lagrange remarks that if
one adds to L a function that is a total derivative, dA(q)/dt, equations (5.39)
remain unaltered. Consider for simplicity of notation the case of only one variable:
$dA(q)/dt = A'(q)\, \dot{q}$, and:
\[
L' = L + A'(q)\, \dot{q}. \tag{5.43}
\]
The added term is innocuous since:
\begin{align}
\frac{\partial}{\partial q} \left[ A'(q)\, \dot{q} \right] &= A''(q)\, \dot{q}, \tag{5.44a} \\
\frac{d}{dt} \frac{\partial}{\partial \dot{q}} \left[ A'(q)\, \dot{q} \right] &= \frac{d}{dt} A'(q) = A''(q)\, \dot{q}, \tag{5.44b}
\end{align}
implying that both $L$ and $L'$ give rise to the same equations of motion, just as
adding a constant to a function f(x) does not alter its derivative, and therefore
leaves the position of the minimum of f unchanged. This freedom of choice of L
was later baptized as “gauge symmetry” (Weyl, 1919) and plays a crucial role in
modern physics.
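The statement that a total derivative added to L is innocuous can also be checked at the level of the action: along any path, the two actions differ only by the boundary term A(q(t_b)) − A(q(t_a)), which is the same for all paths with the same endpoints. The Python sketch below illustrates this under assumptions that are not in the text: a harmonic oscillator Lagrangian with M = K = 1, the illustrative choice A(q) = q³, a midpoint-rule approximation of the integral, and two arbitrary paths from q = 0 to q = 1.

```python
import math

def action(path, dt, lagrangian):
    """Midpoint-rule approximation of the integral of L(q, qdot) dt along a sampled path."""
    S = 0.0
    for q0, q1 in zip(path[:-1], path[1:]):
        q_mid = 0.5 * (q0 + q1)
        qdot = (q1 - q0) / dt
        S += lagrangian(q_mid, qdot) * dt
    return S

def L(q, qdot):
    """Harmonic oscillator Lagrangian with M = K = 1 (assumed for illustration)."""
    return 0.5 * qdot ** 2 - 0.5 * q ** 2

def A(q):
    """Illustrative choice of the function whose total derivative is added."""
    return q ** 3

def A_prime(q):
    return 3.0 * q ** 2

def L_gauged(q, qdot):
    """L' = L + A'(q) qdot, as in equation (5.43)."""
    return L(q, qdot) + A_prime(q) * qdot

N = 2000
dt = 1.0 / N
ts = [i * dt for i in range(N + 1)]
path_a = [t for t in ts]                                  # straight line from 0 to 1
path_b = [t + 0.3 * math.sin(math.pi * t) for t in ts]    # different path, same endpoints

diff_a = action(path_a, dt, L_gauged) - action(path_a, dt, L)
diff_b = action(path_b, dt, L_gauged) - action(path_b, dt, L)
boundary = A(path_a[-1]) - A(path_a[0])

print(diff_a, diff_b, boundary)
```

Because the difference between the two actions is the same constant for every path, the path that makes one action stationary also makes the other stationary.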
Another simple mathematical evidence of a symmetry is a cyclic variable. This
means that the Lagrangian is independent of that variable. For example, the vari-
able could be a direction in space, say the x-axis. The Lagrange equation for that
variable becomes:
\[
\frac{d}{dt} \frac{\partial L}{\partial \dot{x}} = 0, \tag{5.45}
\]
and hence the quantity $p_x = \partial L/\partial \dot{x}$ (the particle’s momentum in the x direction) is
conserved along the motion.
In a famous paper, the mathematician Emmy Noether (1918) generalized the
concept underlying the cyclic variables and proved that symmetry implies con-
servation. In our simple example, the cyclic nature of the variable means that
∂ L/∂ x = 0, which is equivalent to L(x + d x) = L: we have invariance with
respect to translation in the x direction, or invariance with respect to the group
of translations in that direction. (This is a group in the sense that one can compose
two translations and obtain another translation or take the inverse of a translation by
\[
\delta L = \frac{\partial L}{\partial \mathbf{q}} \cdot \mathbf{f} + \frac{\partial L}{\partial \dot{\mathbf{q}}} \cdot \dot{\mathbf{f}} = 0. \tag{5.47}
\]
Equation (5.47) expresses the symmetry property of the function L under the
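change generated by the function f, and so far does not say anything about conservation. The magic of Noether’s theorem comes from noticing that, if we evaluate
equation (5.47) along a path q(t) that obeys the equations of motion $\frac{\partial L}{\partial \mathbf{q}} = \frac{d}{dt} \frac{\partial L}{\partial \dot{\mathbf{q}}}$,
the variation δL becomes a total derivative:
\[
\delta L = \frac{d}{dt} \left( \frac{\partial L}{\partial \dot{\mathbf{q}}} \cdot \mathbf{f} \right) = 0, \tag{5.48}
\]
and
\[
\frac{\partial L}{\partial \dot{\mathbf{q}}} \cdot \mathbf{f} = \text{Constant of motion}. \tag{5.49}
\]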
The constant of motion is sometimes called the “conserved charge” correspond-
ing to the symmetry f. For example, consider the Lagrangian of a particle of mass
M moving in a central potential:
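\[
L = \frac{1}{2} M \dot{\mathbf{q}}^2 - V(q), \tag{5.50}
\]
with $q = |\mathbf{q}|$ the distance to the origin. Consider the following infinitesimal variation in coordinates: $\delta\mathbf{q} = \epsilon\, \hat{\mathbf{n}} \times \mathbf{q}$, corresponding to $\mathbf{f} = \hat{\mathbf{n}} \times \mathbf{q}$, with $\hat{\mathbf{n}}$ a fixed
unit vector in space. The variation is an infinitesimal rotation around an axis in
the direction of $\hat{\mathbf{n}}$. Using $\partial L/\partial \mathbf{q} = -V'(q)\, \mathbf{q}/q$ and $\partial L/\partial \dot{\mathbf{q}} = M \dot{\mathbf{q}}$, the variation
becomes:
\[
\delta L = -V'(q)\, \frac{\mathbf{q} \cdot \mathbf{f}}{q} + M \dot{\mathbf{q}} \cdot \dot{\mathbf{f}}. \tag{5.51}
\]
14 For compactness, we use the dot product notation: $\frac{\partial L}{\partial \mathbf{q}} \cdot \mathbf{f} = \frac{\partial L}{\partial q_1} f_1 + \frac{\partial L}{\partial q_2} f_2 + \cdots + \frac{\partial L}{\partial q_N} f_N$, etc.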
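\[
\delta L = \frac{\partial L}{\partial \mathbf{q}} \cdot \dot{\mathbf{q}} + \frac{\partial L}{\partial \dot{\mathbf{q}}} \cdot \ddot{\mathbf{q}} = \frac{d}{dt} L \neq 0. \tag{5.55}
\]
The variation is not zero, but is equal to a total derivative, with A = L, and
equation (5.54) applies. Since the change of coordinates is $\delta\mathbf{q} = \epsilon\, \dot{\mathbf{q}}$, we have $\mathbf{f} = \dot{\mathbf{q}}$
and the constant of motion is:
\[
\frac{\partial L}{\partial \dot{\mathbf{q}}} \cdot \dot{\mathbf{q}} - L = E, \tag{5.56}
\]
with E the energy of the system. For the cases where $L = \tfrac{1}{2} M \dot{\mathbf{q}}^2 - V(q)$, we have
the vis viva theorem: E = T + V is constant along all paths that obey the equations
of motion.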
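Clearly the Lagrangian (written $\mathcal{L}$ here to distinguish it from the angular momentum L) is not invariant under this change: $\mathcal{L}$ is not invariant
under translations. However, along an orbit of angular momentum L we can use
$1/r^2 = M \dot{\theta}/L$, and $\delta\mathcal{L}$ becomes
\[
\delta\mathcal{L} = -\frac{M k}{L}\, \dot{\theta} \sin\theta = \frac{d}{dt} \left( \frac{M k}{L} \cos\theta \right) \equiv \frac{dA}{dt}. \tag{5.62}
\]
Since $\delta\mathcal{L}$ is a total derivative along the orbit, we can speak of the translation in the
y direction generated by $\mathbf{f} = \hat{\jmath}$ as a dynamical symmetry of the Kepler problem.
The constant of motion, from equation (5.54), is
\begin{align}
C &= \frac{\partial \mathcal{L}}{\partial \dot{\mathbf{x}}} \cdot \hat{\jmath} - \frac{M k}{L} \cos\theta \tag{5.63} \\
&= M \dot{y} - \frac{M k}{L}\, \hat{\mathbf{u}}_r \cdot \hat{\imath}. \tag{5.64}
\end{align}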
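With simple manipulations we can verify that the constant C of equation (5.64)
is proportional to the x component of the LRL vector. Since the angular momentum
is in the z direction, we have:
\[
\dot{\mathbf{x}} \times \mathbf{L} = L \left( \dot{y}\, \hat{\imath} - \dot{x}\, \hat{\jmath} \right), \tag{5.65}
\]
and
\[
L \dot{y} = (\dot{\mathbf{x}} \times \mathbf{L}) \cdot \hat{\imath}. \tag{5.66}
\]
Multiplying equation (5.64) by L we obtain:
\[
\left( \mathbf{p} \times \mathbf{L} - M k\, \hat{\mathbf{u}}_r \right) \cdot \hat{\imath} = \text{Const}. \tag{5.67}
\]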
Since the problem has rotational symmetry, we can choose any direction for f
and obtain that all the components of the LRL vector are constant. This dynamical
symmetry is unique to the Kepler problem. There are only two central poten-
tials that give closed orbits for all energies: the Kepler problem and the harmonic
potential where the force is proportional to the distance. And both have dynamical
symmetries. A vector is constant for the Kepler problem, and a more complicated
mathematical object, a tensor, is constant for the harmonic potential.16
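The constancy of the LRL vector along a Kepler orbit can be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the derivation: it takes the attractive potential V = −k/r with M = k = 1, works in the plane of the orbit, integrates Newton’s equations with a standard fourth-order Runge–Kutta step, and evaluates the vector p × L − Mk û_r at the start and at the end of the integration. The initial condition, step size, and function names are arbitrary choices.

```python
import math

M, k = 1.0, 1.0   # assumed units; potential V(r) = -k / r

def accel(x, y):
    """Acceleration from the attractive central force -k r_hat / r^2."""
    r3 = (x * x + y * y) ** 1.5
    return -k * x / (M * r3), -k * y / (M * r3)

def deriv(state):
    x, y, vx, vy = state
    ax, ay = accel(x, y)
    return (vx, vy, ax, ay)

def shifted(state, d, h):
    return tuple(s + h * di for s, di in zip(state, d))

def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = deriv(state)
    k2 = deriv(shifted(state, k1, dt / 2))
    k3 = deriv(shifted(state, k2, dt / 2))
    k4 = deriv(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def lrl_vector(state):
    """In-plane components of p x L - M k r_hat, with L along z."""
    x, y, vx, vy = state
    r = math.hypot(x, y)
    Lz = M * (x * vy - y * vx)
    return (M * vy * Lz - M * k * x / r, -M * vx * Lz - M * k * y / r)

state = (1.0, 0.0, 0.0, 1.2)      # bound (elliptical) orbit
A_start = lrl_vector(state)

dt = 1e-3
for _ in range(10000):            # integrate up to t = 10
    state = rk4_step(state, dt)
A_end = lrl_vector(state)

print("LRL vector at t = 0: ", A_start)
print("LRL vector at t = 10:", A_end)
```

Within the accuracy of the integrator, both components of the vector are unchanged along the orbit, while the position and velocity themselves change substantially.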
\[
L = \frac{M}{2} \dot{X}^2 - \frac{K}{2} X^2 + \sum_{n=-\infty}^{\infty} \left[ \frac{m}{2} \dot{u}_n^2 - \frac{k}{2} (u_n - u_{n+1})^2 \right], \tag{5.70}
\]
where u n represents the (vertical) displacement of the n-th mass with respect to
the equilibrium position. We are choosing the u n ’s to be vertical so that the string
oscillations are transverse, but we could have also chosen the oscillator coupled
to longitudinal oscillations of a medium. In the Lagrangian of equation (5.70) the
string and the oscillator are decoupled. Lamb chooses a coupling in the form of a
holonomic constraint:
X = u0, (5.71)
which amounts to “gluing” the oscillator to one of the points of the string (the one
at x = 0), and decreasing by one the number of degrees of freedom of the system.
The Lagrangian becomes:17
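\[
L = \sum_{n=-\infty}^{\infty} \left[ \frac{m}{2} \dot{u}_n^2 - \frac{k}{2} (u_n - u_{n+1})^2 + \delta_{n,0} \left( \frac{(M - m)}{2} \dot{u}_n^2 - \frac{K}{2} u_n^2 \right) \right]. \tag{5.72}
\]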
Now take the limit, a → 0, where the chain becomes a continuous string. The
discrete difference becomes a second derivative: $u_{n+1} + u_{n-1} - 2u_n = a^2\, \partial^2 u/\partial x^2$,
and we obtain the following parameters for the string: ρ = m/a (the mass per unit
length) and T = ka (the tension). Also, the Kronecker delta becomes the Dirac
delta: $\delta_{n,0} = a\, \delta(x)$, and Lagrange equations become:
\[
(\rho + \delta(x) M)\, \frac{\partial^2 u}{\partial t^2} = T\, \frac{\partial^2 u}{\partial x^2} - K\, \delta(x)\, u(x). \tag{5.74}
\]
When x ≠ 0, equation (5.74) becomes the free wave equation:
\[
\rho\, \frac{\partial^2 u}{\partial t^2} = T\, \frac{\partial^2 u}{\partial x^2}, \tag{5.75}
\]
17 In equation (5.72) the function $\delta_{i,j}$ is the “Kronecker delta,” which is equal to one when i = j and zero
when i ≠ j.
5.5 Lagrange versus d’Alembert: Dissipative and Nonholonomic Systems 103
\[
M \ddot{X} = -K X - \frac{2T}{c}\, \dot{X}. \tag{5.78}
\]
equations for each of the unconstrained variables. On the other hand, for nonholo-
nomic systems, the values of the unconstrained variables are not a function of the
constrained variables and the limitations are on the (virtual) displacements them-
selves and not on the possible configurations. For example, following the logic of
the derivation of Lagrange equations, consider a holonomic transformation of variables from x, y, z to ξ, ψ: x = f(ξ, ψ), y = g(ξ, ψ), z = h(ξ, ψ). The variations
of the constrained variables are of the form δx = (∂f/∂ξ)δξ + (∂f/∂ψ)δψ and
similarly for y and z. The fact that we can write a variation ∂(δx)/∂(δξ) as (∂f/∂ξ)
for some f allowed Lagrange to write equation (5.32) and from that equality derive
the scientific poem. For nonholonomic systems, the function f does not exist, and
the crucial step of equation (5.32) cannot be taken. Let us see this fundamental
difference in a simple example.
Consider a particle in three dimensions in the absence of external forces.
D’Alembert’s principle reads:
\[
m \ddot{x}\, \delta x + m \ddot{y}\, \delta y + m \ddot{z}\, \delta z = 0. \tag{5.79}
\]
Now consider the following “velocity constraint”:
δz = yδx, (5.80a)
ż = y ẋ. (5.80b)
This is a nonholonomic constraint in that a function z = f (x, y) does not exist
that would give us ∂ f /∂ x = y and ∂ f /∂ y = 0: there is no Lagrangian for this
system. Substituting equation (5.80a) in d’Alembert’s principle, equation (5.79),
we obtain:
(ẍ + z̈ y) δx + ÿδy = 0. (5.81)
5.6 Gauss’s Principle of Least Constraint 105
ẍ + z̈ y = 0, (5.82a)
ÿ = 0. (5.82b)
\[
x_b = x(t) + \dot{x}(t)\, dt + \frac{1}{2} \frac{F_1}{m_1}\, dt^2 \tag{5.85a}
\]
\[
x_c = x(t) + \dot{x}(t)\, dt + \frac{1}{2} a_1\, dt^2. \tag{5.85b}
\]
In equation (5.85), the coordinate xb does not satisfy the constraint, whereas xc
does. Since the factor dt 2 /2 is common to both xb and xc , we obtain
\[
(bc)^2 = (x_b - x_c)^2 \propto \left( \frac{F_1}{m_1} - a_1 \right)^2. \tag{5.86}
\]
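In this notation, Gauss tells us that the values of $a_1, a_2, \cdots$ are the ones that
minimize the constraint given by:
\[
m_1 \left( \frac{\mathbf{F}_1}{m_1} - \mathbf{a}_1 \right)^2 + m_2 \left( \frac{\mathbf{F}_2}{m_2} - \mathbf{a}_2 \right)^2 + \cdots + m_N \left( \frac{\mathbf{F}_N}{m_N} - \mathbf{a}_N \right)^2. \tag{5.87}
\]
If we compute variations of equation (5.87) with respect to the variables $\mathbf{a}_i$ and
equate to zero we obtain:
\[
(\mathbf{F}_1 - m_1 \mathbf{a}_1) \cdot \delta\mathbf{a}_1 + (\mathbf{F}_2 - m_2 \mathbf{a}_2) \cdot \delta\mathbf{a}_2 + \cdots + (\mathbf{F}_N - m_N \mathbf{a}_N) \cdot \delta\mathbf{a}_N = 0. \tag{5.88}
\]
Now we notice, from equations (5.85), that $\delta a_1 = \frac{2}{dt^2}\, \delta x_c \equiv \frac{2}{dt^2}\, \delta x_1$: for each particle, the variation of the acceleration $a_i$ is proportional to the virtual displacement
of the same particle. In other words, equation (5.88) is equivalent to d’Alembert’s
principle of equation (5.12).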
As an example consider a particle in two dimensions constrained to move on a
parabola:
\[
y = \frac{1}{2} x^2. \tag{5.89}
\]
We assume we have the potential force −mg in the downward y direction.
Gauss’s principle tells us that we need to minimize (with respect to the
accelerations) the expression
(m ẍ)2 + (−mg − m ÿ)2 , (5.90)
incorporating the constraint, which in our case is y = x 2 /2, giving ÿ = ẋ 2 + x ẍ.
The only independent acceleration is ẍ. Gauss’s principle tells us that the mini-
mization of equation (5.90) becomes
\[
\frac{\partial}{\partial \ddot{x}} \left[ \ddot{x}^2 + (g + \dot{x}^2 + x \ddot{x})^2 \right] = 0. \tag{5.91}
\]
The result is the following equation of motion
ẍ(1 + x 2 ) + x ẋ 2 + gx = 0. (5.92)
Equation (5.92) can also be derived from the constrained Lagrangian, which
is in turn obtained by substituting the constraint (5.89) into the Lagrangian
$L = \tfrac{1}{2} m (\dot{x}^2 + \dot{y}^2) - m g y$:
\[
L = \frac{1}{2} m \dot{x}^2 \left( 1 + x^2 \right) - \frac{1}{2} m g x^2. \tag{5.93}
\]
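The parabola example lends itself to a direct numerical check of Gauss’s principle. In the following Python sketch, which is an illustration and not the authors’ procedure, we pick an arbitrary state (x, ẋ), minimize the expression inside the brackets of equation (5.91) over the single free acceleration ẍ with a simple ternary search, and compare the minimizer with the acceleration predicted by equation (5.92). The value g = 9.8, the sample state, the search interval, and the function names are assumptions made for the example.

```python
def gauss_constraint(xdd, x, xd, g=9.8):
    """Gauss's constraint (up to the factor m^2) for the particle on y = x^2 / 2."""
    ydd = xd * xd + x * xdd          # second time derivative of the constraint
    return xdd ** 2 + (g + ydd) ** 2

def minimize_scalar(f, lo=-1e3, hi=1e3, iters=200):
    """Ternary search for the minimum; adequate here because f is a convex parabola in xdd."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

def xdd_from_eq_5_92(x, xd, g=9.8):
    """Acceleration obtained by solving equation (5.92) for xdd."""
    return -x * (xd * xd + g) / (1 + x * x)

x0, xd0 = 0.7, -1.3                  # arbitrary sample state
xdd_min = minimize_scalar(lambda a: gauss_constraint(a, x0, xd0))
xdd_eq = xdd_from_eq_5_92(x0, xd0)

print("acceleration from minimizing the constraint:", xdd_min)
print("acceleration from equation (5.92):          ", xdd_eq)
```

The two accelerations agree to within the tolerance of the search, which reflects the fact that the constraint is a quadratic function of ẍ with a single, true minimum.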
Gauss’s principle is attractive since it deals with a true minimum and not an
extreme of a function. As pointed out by Gauss himself, the principle is not new
and is equivalent to d’Alembert’s principle. That, says Gauss, does not make it
less interesting, since “it is always interesting and instructive to regard the laws
of nature from a new and advantageous point of view, so as to solve this or that
problem more simply.” An interesting interpretation of Gauss’s principle was given
by Heinrich Hertz (1894/1956). In the absence of external forces, the kinetic energy
is constant (for time-independent constraints), so the speed v of the particle is constant and the length s of the
path is proportional to the elapsed time, s = vt. The acceleration is therefore given by
\[
\mathbf{a} = \frac{d^2 \mathbf{x}}{dt^2} \propto \frac{d^2 \mathbf{x}}{ds^2} = \frac{d\hat{\mathbf{t}}}{ds} = \boldsymbol{\kappa}, \tag{5.94}
\]
with t̂ = dx/ds the tangent to the curve, and κ the curvature. In the absence of
forces, the curve followed by a constrained particle (or system of particles) is the
one with minimum possible curvature. For a particle on a sphere, the path connect-
ing two points is a great circle of radius equal to the radius R of the sphere, and the
curvature κ = 1/R, the smallest possible.
5.7 Least Action with a Twist: the Elasticity of the Ether and Maxwell’s
Equations*
The power of the Lagrangian approach “consists in its allowing us to ignore or
leave out of the account altogether the details of the mechanism . . . in the phe-
nomena under discussion” (Larmor, 1893). One could follow a heuristic route,
postulate a Lagrangian with a plausible potential and kinetic energies (for exam-
ple, that respects the symmetry of the system) and obtain the dynamics through
equation (5.39). The first people to exploit this “blackboxing” property of the prin-
ciple of least action (Darrigol, 2014, p. 93) were George Green (1838) and James
McCullagh (1846). McCullagh studied the propagation of light in anisotropic crys-
tals. When light enters an anisotropic crystal, the ray splits into two rays (and, in
some cases, an infinite number of rays, as we discuss in Section 6.2). This phe-
nomenon – the “birefringence” – is a clear indication that light is a transverse
wave. McCullagh’s idea is to treat light as the vibration of an elastic solid, just as
in the case we discussed in Section 5.5.1. In a crystal, this solid is anisotropic, and
in a vacuum this solid (the ether) is isotropic. In order to eliminate the longitudi-
nal vibrations, he proposes that the ether is incompressible. For example, consider
two atoms of a solid at positions x and x + d x, whose displacements from their
equilibrium positions in the x direction are u x (x) and u x (x + d x) respectively. If
108 D’Alembert, Lagrange, and the Statics-Dynamics Analogy
[Figure 5.10: a distorted square of side dy with center C, showing the displacements u(x, y) and u(x + dx, y).]
Figure 5.10 The mean rotation of the distorted square is the “angular momentum”
of the elastic displacements u with respect to the center C.
$$T_1 = \frac{dx}{2}\left(u_x(x, y) + u_x(x + dx, y) - u_x(x + dx, y + dy) - u_x(x, y + dy)\right) \simeq -dx\,\frac{\partial u_x}{\partial y}. \tag{5.95}$$
Similarly, the torque of the y components is $T_2 = +dy\,\frac{\partial u_y}{\partial x}$. Adding the two terms (for a square of side $dx = dy$), we get that the mean rotation of the square is proportional to $\frac{\partial u_x}{\partial y} - \frac{\partial u_y}{\partial x}$, which is (up to sign) the z component of $\nabla\times\mathbf{u}$. For a cube, the mean rotation
will be precisely a vector proportional to ∇ ×u. From the point of view of elasticity
theory, a term (∇ × u)2 in the potential energy implies that there is some “spring”
that opposes the rotation of an element of the ether with respect to the vacuum!
Despite this feature, McCullagh’s theory gives the correct result and anticipates
Maxwell’s equations.
The Lagrangian proposed by McCullagh is:²⁰
$$L(t) = \int dv\left[\frac{1}{2}\rho\,\dot{\mathbf{u}}^2 - \frac{1}{2}h\,(\nabla\times\mathbf{u})^2\right]. \tag{5.96}$$
The first term in equation (5.96) is the kinetic energy, with ρ the density of the ether.
In the second term, h is a rotational elasticity, a measure of the ether’s resistance to
twisting. This Lagrangian has an infinite number of variables, u(x, t), one for each
value of the coordinate x. Applying Lagrange’s equations (see Appendix C for the
details of the derivation), we obtain:
$$\rho\,\ddot{\mathbf{u}} = -h\,\nabla\times(\nabla\times\mathbf{u}). \tag{5.97}$$
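To see what equation (5.97) implies for waves, one can insert a plane wave $\mathbf{u} = \mathbf{u}_0 e^{i(\mathbf{k}\cdot\mathbf{x} - \omega t)}$, for which $\nabla\times(\nabla\times\mathbf{u}) = -\mathbf{k}\times(\mathbf{k}\times\mathbf{u})$. The sketch below (hypothetical helper names, arbitrary sample values) evaluates the resulting $\omega^2$: transverse polarizations propagate with $\omega^2 = hk^2/\rho$, while a longitudinal displacement feels no restoring force.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def curl_curl_amplitude(k, u0):
    """Amplitude of curl(curl u) for a plane wave: -k x (k x u0)."""
    return tuple(-component for component in cross(k, cross(k, u0)))


def omega_squared(k, u0, rho, h):
    """omega^2 from rho * omega^2 * u0 = h * curl_curl_amplitude(k, u0).

    Evaluated as a Rayleigh quotient, so it is meaningful when u0 is purely
    transverse or purely longitudinal with respect to k.
    """
    restoring = curl_curl_amplitude(k, u0)
    numerator = sum(r * u for r, u in zip(restoring, u0))
    denominator = sum(u * u for u in u0)
    return h * numerator / (rho * denominator)
```

With these assumptions the phase velocity of the transverse wave is $\sqrt{h/\rho}$, independent of the direction of $\mathbf{k}$, as expected for an isotropic medium.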
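Vindication for McCullagh came from FitzGerald (1879), who noted that equation (5.97) is identical in structure to Maxwell’s equations in free space. FitzGerald pointed out the following correspondence:²¹
$$\mathbf{B} \propto \dot{\mathbf{u}}, \tag{5.98a}$$
$$\mathbf{E} \propto \nabla\times\mathbf{u}. \tag{5.98b}$$
Under it, the time derivative of (5.98b) gives $\partial\mathbf{E}/\partial t \propto \nabla\times\mathbf{B}$, and equation (5.97) gives $\partial\mathbf{B}/\partial t \propto -\nabla\times\mathbf{E}$, which have the structure of the two curl equations of Maxwell in free space.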
The correspondence does not imply mechanical properties of the ether. Rather,
this is a mechanical analogy insinuating that the ether (at least in McCullagh’s
time) should be a unique fluid, different from ordinary matter. Later, William
Thomson (1890) (Lord Kelvin) attempted to justify McCullagh’s ether with a
system of microscopic liquid gyrostats, but the model becomes “desperately complicated” (Sommerfeld, 1950, p. 111). In a letter to astronomer John Herschel,
McCullagh wrote: “One thing only I am persuaded of, is that the constitution of
the ether, if it ever would be discovered, will be found to be quite different from
any thing that we are in the habit of conceiving, though at the same time very
simple and very beautiful.” For us it is most important that FitzGerald initiated the
practice of formulating electromagnetism in terms of a Lagrangian of the fields (see
Darrigol, 2014, p. 94). Hermann Helmholtz (1892), an advocate of the principle of
least action, used the Lagrangian formalism to derive Maxwell’s equations by anal-
ogy. Hendrik Lorentz (1892, 1903) derived his “Lorentz force” (F = eE + ev × B)
using d’Alembert’s principle applied to the change in kinetic energy of the mag-
netic field. Karl Schwarzschild (1903), using the Lorentz force as an ansatz, wrote
the Lagrangian of the electromagnetic field in the presence of charges of density ρ
that move at velocity v:
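$$L = \int dv\left[\frac{1}{2}\left(\mathbf{B}^2 - \mathbf{E}^2\right) + \rho\left(\phi - \frac{1}{c}\,\mathbf{v}\cdot\mathbf{A}\right)\right]. \tag{5.100}$$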
In equation (5.100) the functions φ and A are potentials from which the elec-
tric and magnetic fields are derived: E = −∇φ − ∂A/∂t and B = ∇ × A. The
Lagrangian of equation (5.100) is a function of the potentials and of the coor-
dinates and velocities of the particles. If, following Schwarzschild, we minimize
with respect to the potentials, we obtain Maxwell’s equations. If we minimize with
respect to coordinates of one of the particles, we obtain the Lorentz force acting on
that particle. In a vacuum ρ = 0, and the Lagrangian is the same as McCullagh’s.
Max Planck (1909/1915) devotes the seventh of his Eight Lectures on Theoret-
ical Physics (delivered in 1909 at Columbia University) to the principle of least
action. He uses McCullagh’s Lagrangian to show that the “significance of the prin-
ciple of least action may be extended beyond ordinary mechanics and therefore it
can be utilized as the foundation of general dynamics, since it governs all known
reversible processes.” More recently, Richard Feynman remarks on the rotational
ether: “It is interesting that the correct equations for the behavior of light were
worked out by McCullagh in 1839. But people said to him, ‘Yes, but there is no
real material whose mechanical properties could possibly satisfy those equations,
and since light is an oscillation that must vibrate in something, we cannot believe in
this abstract equation business.’ If people had been more open-minded, they might
have believed in the right equations for the behavior of light much earlier than they
did” (Feynman, 2013).
6
The Optical-Mechanical Analogy, Part II: The
Hamilton-Jacobi Equation
that gives a smaller value of V than the others? “In general” such a curve exists
and corresponds to a ray in optics. As we will see, the curve is an extremum in the
calculus of variations, which includes curves that are “stationary” or “critical,” and
not necessarily giving an absolute minimum.
Hamilton’s central idea is the following: Regard the minimum (or stationary)
value of the integral as a function of the six coordinates $(x, y, z)$ and $(x', y', z')$ of
$A$ and $A'$. So, in the usual calculus of variations the coordinates of A and B are
fixed parameters, and the variables on which V depends are the curves connecting
6.1 Hamilton’s “Theory of Systems of Rays” 113
(Hamilton uses the notation (cos α, cos β, cos γ ) for unit vectors in terms of the
cosines with the coordinate axes.) Now consider a point Q within the mirror
infinitesimally close to the point of incidence P (see Figure 6.1): Q = P + δx.
Since δx is in the tangent plane of the mirror δx · n̂ = 0, the law of reflection
implies:
Figure 6.1 The law of reflection for a parabolic mirror of focal length f. In general, $F(x, y) = C$, with F given by equation (6.6), gives the family of parabolas $y = (x^2 + f^2 - C^2)/(2(f + C))$.
Hamilton asks how to find a mirror defined by the equation F(x, y, z) = C that
will reflect a system of rays of any given congruence into a focus at a point $A'$. He
solves the problem formally by proving that equation (6.3) is an exact differential.
Before showing Hamilton’s proof, let us look at an example that illustrates the
spirit of Hamilton’s approach. Consider for simplicity a two-dimensional problem.
Take the incident rays as parallel and in the y direction, $\hat{\mathbf{u}} = \hat{\jmath}$. We want the rays to converge to a focus on the y axis: $A' = f\hat{\jmath}$. Calling $P = (x, y)$ the point of reflection (see Figure 6.1), we have
$$\hat{\mathbf{u}}' = \frac{-\hat{\imath}\,x + (f - y)\,\hat{\jmath}}{\sqrt{x^2 + (f - y)^2}}. \tag{6.4}$$
Equation (6.3) becomes
$$\frac{x}{\sqrt{x^2 + (f - y)^2}}\,dx - \left(\frac{f - y}{\sqrt{x^2 + (f - y)^2}} + 1\right)dy = 0. \tag{6.5}$$
Equation (6.5), by simple inspection, is in fact an exact differential, $dF = 0$, with
$$F(x, y) = \sqrt{x^2 + (f - y)^2} - y, \tag{6.6}$$
and $F(x, y) = C$ describes a family of parabolas of focal length $(f + C)/2$.
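The parabolic-mirror example can be checked numerically. The sketch below solves $F(x, y) = C$ for the mirror profile, reflects a vertical ray with the ordinary law of reflection, and finds where the reflected ray meets the y axis. It assumes the incoming light travels toward $-y$ (the opposite sense of the unit vector used in the text, which does not affect the reflection law); the function names are illustrative.

```python
import math


def F(x, y, f):
    """The function of equation (6.6)."""
    return math.hypot(x, f - y) - y


def mirror_height(x, f, C):
    """Mirror profile y(x) obtained by solving F(x, y) = C for y."""
    return (x**2 + f**2 - C**2) / (2.0 * (f + C))


def reflected_direction(x, f, C):
    """Direction of a downward vertical ray after reflection at (x, y(x))."""
    slope = x / (f + C)                  # dy/dx of the mirror profile
    norm = math.hypot(slope, 1.0)
    nx, ny = -slope / norm, 1.0 / norm   # unit normal to the mirror
    dx, dy = 0.0, -1.0                   # assumed propagation direction
    d_dot_n = dx * nx + dy * ny
    return dx - 2.0 * d_dot_n * nx, dy - 2.0 * d_dot_n * ny


def axis_crossing(x, f, C):
    """Height at which the reflected ray crosses the y axis (needs x != 0)."""
    rx, ry = reflected_direction(x, f, C)
    travel = -x / rx
    return mirror_height(x, f, C) + travel * ry
```

For any admissible C and any $x \neq 0$, the crossing height should come out equal to f, the focus $A'$.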
Hamilton’s motivation for introducing the characteristic function was probably his noticing that, in general, equation (6.3) is an exact differential on a plane of reflection (Darrigol, 2012). Hamilton first notes that $\hat{\mathbf{u}}'\cdot\delta\mathbf{x}$ is an exact differential, since it simply represents $\delta\rho'$, with $\rho'$ the distance $A'P$ from the point of incidence to the focus. Since $\delta\mathbf{x}$ is on the tangent to the mirror, $\hat{\mathbf{u}}\cdot\delta\mathbf{x} = -\delta\rho'$ is also an exact differential, but only on the surface of the mirror. So, in principle, while $\hat{\mathbf{u}}'\cdot\delta\mathbf{x}$ is an exact differential outside of the mirror, $\hat{\mathbf{u}}\cdot\delta\mathbf{x}$ is only so on the surface of the mirror; in other words, the component $\mathbf{u}_\parallel$ of $\hat{\mathbf{u}}$ on the tangent plane is the gradient of some function G. This implies that $\nabla\times\mathbf{u}_\parallel = 0$, and (since the curl of a vector that lives on a surface is perpendicular to that surface, or $\nabla\times\mathbf{u}_\parallel \propto \hat{\mathbf{n}}$), $\hat{\mathbf{u}}\cdot\delta\mathbf{x} = dG$ is equivalent to
$$\hat{\mathbf{n}}\cdot\nabla\times\hat{\mathbf{u}} = 0. \tag{6.7}$$
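Hamilton then notes that the variation of $\hat{\mathbf{u}}$ along the direction of the incident ray is zero since $\hat{\mathbf{u}}$ is precisely along the ray; formally: $(\hat{\mathbf{u}}\cdot\nabla)\hat{\mathbf{u}} = 0$. Then he remarks that, since $\hat{\mathbf{u}}^2 = 1$, $\nabla\hat{\mathbf{u}}^2 = 0$. This implies¹ that $\hat{\mathbf{u}}\times(\nabla\times\hat{\mathbf{u}}) = 0$, and therefore $\nabla\times\hat{\mathbf{u}}$ is parallel to $\hat{\mathbf{u}}$. But, according to equation (6.7), $\nabla\times\hat{\mathbf{u}}$ is on the tangent plane, and therefore $\nabla\times\hat{\mathbf{u}} = 0$ on the mirror. Furthermore, the condition $(\hat{\mathbf{u}}\cdot\nabla)\hat{\mathbf{u}} = 0$ implies $(\hat{\mathbf{u}}\cdot\nabla)(\nabla\times\hat{\mathbf{u}}) = 0$, implying that $\nabla\times\hat{\mathbf{u}}$ remains the same along the ray (as pointed out by Darrigol (2012), Hamilton overlooked this step in
¹ Using the identity $\hat{\mathbf{u}}\times(\nabla\times\hat{\mathbf{u}}) = \frac{1}{2}\nabla\hat{\mathbf{u}}^2 - (\hat{\mathbf{u}}\cdot\nabla)\hat{\mathbf{u}}$.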
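$$V(\mathbf{x}, \mathbf{x}') = \rho + \rho' \equiv \overline{P\mathbf{x}} + \overline{\mathbf{x}'P}. \tag{6.8}$$
Now consider the variation of V upon changing one of the extremes by $\delta\mathbf{x}$. The new value of V is
$$V(\mathbf{x} + \delta\mathbf{x}, \mathbf{x}') = \overline{QR} + \overline{\mathbf{x}'Q}. \tag{6.9}$$
$$\overline{QR} + \overline{\mathbf{x}'Q} = \overline{PR} + \rho', \tag{6.10}$$
and
$$V(\mathbf{x} + \delta\mathbf{x}, \mathbf{x}') - V(\mathbf{x}, \mathbf{x}') \equiv \nabla V\cdot\delta\mathbf{x} = \overline{PR} - \rho = \hat{\mathbf{u}}\cdot\delta\mathbf{x}. \tag{6.11}$$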
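For a given luminous point source at $\mathbf{x}'$, the gradient $\nabla V(\mathbf{x}', \mathbf{x})$ gives the “vector of normal slowness,” in the direction of the ray. For this case of reflection and propagation in vacuum, we showed that this vector is of unit length, which implies the following two differential equations:
$$\left(\frac{\partial V}{\partial x}\right)^2 + \left(\frac{\partial V}{\partial y}\right)^2 + \left(\frac{\partial V}{\partial z}\right)^2 = 1 \tag{6.13a}$$
$$\left(\frac{\partial V}{\partial x'}\right)^2 + \left(\frac{\partial V}{\partial y'}\right)^2 + \left(\frac{\partial V}{\partial z'}\right)^2 = 1. \tag{6.13b}$$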
The above equations also apply for multiple reflections. As a simple example,
Hamilton solves V for the case of a flat mirror (Hamilton, 1833). Consider the
source to be at a point $A = (x', y', z')$, an “eye” to be at $B = (x, y, z)$, and the mirror to be located at $z = 0$. The characteristic function is given by the length L of the path that connects A and B and reflects on the mirror. We know that L, according to the theorem of Hero of Alexandria discussed in Section 2.2, is given by the length of the straight line connecting A with the reflected point $\tilde{B} = (x, y, -z)$, and
$$V(\mathbf{x}', \mathbf{x}) = \sqrt{(x - x')^2 + (y - y')^2 + (z + z')^2}. \tag{6.14}$$
Another interesting case is a set of three mirrors at right angles to one another, coincident with the planes $x = 0$, $y = 0$ and $z = 0$. Taking into account the three reflections that take place for the ray to travel from A to B, we have
$$V(\mathbf{x}', \mathbf{x}) = \sqrt{(x + x')^2 + (y + y')^2 + (z + z')^2}. \tag{6.15}$$
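The flat-mirror result (6.14) lends itself to a quick numerical check. The sketch below, with illustrative function names and assuming that both the source and the eye lie on the side $z > 0$ of the mirror, compares (6.14) with the explicit length of the reflected path and estimates $|\nabla V|$ by central differences, which should equal 1 as required by equation (6.13a).

```python
import math


def characteristic_function(source, eye):
    """Equation (6.14): V for a flat mirror in the plane z = 0."""
    xs, ys, zs = source
    x, y, z = eye
    return math.sqrt((x - xs)**2 + (y - ys)**2 + (z + zs)**2)


def reflected_path_length(source, eye, mirror_point):
    """Length of the broken path source -> mirror_point -> eye."""
    hit = (mirror_point[0], mirror_point[1], 0.0)
    return math.dist(source, hit) + math.dist(hit, eye)


def hero_point(source, eye):
    """Point of the mirror on the straight line from source to the image of eye."""
    xs, ys, zs = source
    x, y, z = eye
    fraction = zs / (zs + z)
    return (xs + fraction * (x - xs), ys + fraction * (y - ys))


def gradient_norm(source, eye, step=1e-6):
    """Central-difference estimate of |grad V| with respect to the eye position."""
    total = 0.0
    for axis in range(3):
        forward, backward = list(eye), list(eye)
        forward[axis] += step
        backward[axis] -= step
        slope = (characteristic_function(source, forward)
                 - characteristic_function(source, backward)) / (2.0 * step)
        total += slope**2
    return math.sqrt(total)
```

Moving the reflection point away from `hero_point` can only lengthen the path, which is the content of Hero's theorem in this setting.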
In the case of refraction, the path of least time is clearly not the shortest path,
since light moves at different speeds in different media. The quantity to be mini-
mized is the “optical path,” the product of the length with the index of refraction.
The extension of Malus’s theorem to a refracting surface is immediate. A simple
derivation is the following. Consider a set of rays that form a congruence. For
example, in reference to Figure 6.3 the rays A P and B Q are perpendicular to a
surface C. The rays propagate in a medium of index of refraction n and refract on a surface S into a medium of index $n'$. Consider two points $A'$ and $B'$ on the refracted rays so that the optical paths of both rays are the same:
$$n\,\overline{AP} + n'\,\overline{PA'} = n\,\overline{BQ} + n'\,\overline{QB'}. \tag{6.16}$$
If the segments are adjacent, the optical paths of the segments $AQA'$ and $APA'$ are equal, since Snell’s law is obeyed at P and, to first order, the change in optical path for an infinitesimal displacement along the refracting surface is zero:
$$n\,\overline{AQ} + n'\,\overline{QA'} = n\,\overline{BQ} + n'\,\overline{QB'}, \tag{6.17}$$
Figure 6.3 Malus’s theorem for refraction: a normal congruence will remain
normal after a refraction.
where n(x) is the (position dependent) index of refraction, ds is the arc length
of the path, and the integral is evaluated on the trajectory that minimizes V . The
function $V(\mathbf{x})/c$, with c the velocity of light, is the time spent by the light ray in going from $\mathbf{x}'$ to $\mathbf{x}$. The function V is then Fermat’s integral evaluated at the path that extremizes the optical length.
In analogy with the case of reflection, consider the variation of V upon an infinitesimal change in $\mathbf{x}$ (see Figure 6.4). If the optical length between $\mathbf{x}'$ and $\mathbf{x}$ is V, the new path $\mathbf{x}'R$ has optical length $V + dV$. Since the new path is minimal, the change in optical path is
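with $\hat{\mathbf{u}}$ the unit vector normal to the surfaces of constant V. Extending the variation with respect to $\mathbf{x}'$, as in the case of reflection, we obtain the following differential equations,
$$\left(\frac{\partial V}{\partial x}\right)^2 + \left(\frac{\partial V}{\partial y}\right)^2 + \left(\frac{\partial V}{\partial z}\right)^2 = n(\mathbf{x})^2 \tag{6.20a}$$
$$\left(\frac{\partial V}{\partial x'}\right)^2 + \left(\frac{\partial V}{\partial y'}\right)^2 + \left(\frac{\partial V}{\partial z'}\right)^2 = n(\mathbf{x}')^2, \tag{6.20b}$$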
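$$W(\hat{\mathbf{u}}, \mathbf{x}') = \hat{\mathbf{u}}\cdot\mathbf{x} - V \tag{6.21a}$$
$$dW = \mathbf{x}\cdot d\hat{\mathbf{u}} + d\mathbf{x}'\cdot\hat{\mathbf{u}}'. \tag{6.21b}$$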
Notice that there are two mixed characteristics, the second one depending on the
final coordinates and the initial direction of the light ray. The angle characteristic
depends on the initial and final angles of the ray:
Although Hamilton’s papers in optics were formal, and his interest was mainly
in analysis rather than practical applications, in his “Third Supplement to an Essay
on the Theory of Systems of Rays” (Hamilton, 1837), he describes a remark-
able prediction for refraction in birefringent crystals: the phenomenon of conical
refraction.
Following Sarton (1932), the simplest way of explaining this is to borrow the
words of a contemporary account in the Dublin University Magazine of January,
1842, as quoted in a footnote in Robert Percival Graves’ biography of Hamilton
(Graves, 1882, pp. 623–624):
The law of the reflection of light at ordinary mirrors appears to have been known to
EUCLID; that of ordinary refraction at a surface of water, glass, or other uncrystallized
medium, was discovered at a much later age by SNELLIUS; HUYGENS discovered, and
MALUS confirmed, the law of extraordinary refraction produced by uniaxal crystals, such
as Iceland spar; and finally the law of the extraordinary double refraction at the faces of
biaxial crystals, such as topaz or arragonite, was found in our own time by FRESNEL. But
even in these cases of extraordinary or crystalline refraction, no more than two refracted
rays had ever been observed or even suspected to exist, if we except a theory of CAUCHY,
that there might possibly be a third ray, though probably imperceptible to our senses. Pro-
fessor HAMILTON, however, in investigating by his general method the consequences of
the law of FRESNEL, was led to conclude that there ought to be in certain cases, which
he assigned, not merely two, nor three, nor any finite number, but an infinite number, or a
cone of refracted rays within a biaxial crystal, corresponding to and resulting from a single
incident ray; and that in certain other cases, a single ray within such a crystal should give
rise to an infinite number of emergent rays, arranged in a certain other cone. He was led,
therefore, to anticipate from theory two new laws of light, to which he gave the names of
Internal and External Conical Refraction.
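$$\mathbf{f} = -K\,\delta\mathbf{x} \tag{6.23}$$
where K is the diagonal matrix $K = \mathrm{diag}(a^2, b^2, c^2)$. Since we are dealing with a transverse wave, the direction of propagation of the wave $\hat{\mathbf{u}}$ is perpendicular to $\delta\mathbf{x}$. In turn, the restoring force can be decomposed as $\mathbf{f} = \mathbf{f}_\parallel + \mathbf{f}_\perp$ in components parallel and perpendicular to $\hat{\mathbf{u}}$. The idea followed by Hamilton is that the component $\mathbf{f}_\parallel$ is not important, since that motion has no effect on the eye; all that matters is the transverse displacement. The reasoning is that, in order for the motion to be harmonic, the direction of the wave $\hat{\mathbf{u}}$ has to be such that the component $\mathbf{f}_\perp$ is along the direction of the displacement $\delta\mathbf{x}$. In that way one can write an equation of the form of a harmonic oscillator,
$$\mathbf{f}_\perp = -\omega^2\,\delta\mathbf{x}, \tag{6.24}$$
with $\omega$ (proportional to) the velocity of the wave. Using $\mathbf{f}_\perp = \mathbf{f} - (\mathbf{f}\cdot\hat{\mathbf{u}})\hat{\mathbf{u}}$ and equation (6.23), we obtain:
$$K\,\delta\mathbf{x} - (K\,\delta\mathbf{x}\cdot\hat{\mathbf{u}})\,\hat{\mathbf{u}} = \omega^2\,\delta\mathbf{x}, \tag{6.25}$$
and
$$\delta\mathbf{x} = (K\,\delta\mathbf{x}\cdot\hat{\mathbf{u}})\,(K - \omega^2)^{-1}\hat{\mathbf{u}}. \tag{6.26}$$
Multiplying both sides by K and then taking the dot product with $\hat{\mathbf{u}}$ we get
$$\hat{\mathbf{u}}\cdot K(K - \omega^2)^{-1}\hat{\mathbf{u}} = 1, \tag{6.27}$$
or, which is equivalent:
$$\hat{\mathbf{u}}\cdot(K - \omega^2)^{-1}\hat{\mathbf{u}} = 0. \tag{6.28}$$
Since K is a diagonal matrix, another way of writing equation (6.28) is:
$$\frac{u_x^2}{\omega^2 - a^2} + \frac{u_y^2}{\omega^2 - b^2} + \frac{u_z^2}{\omega^2 - c^2} = 0, \tag{6.29}$$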
which is the equation derived by Fresnel, and the starting point of Hamilton’s dis-
cussion. Equation (6.29) gives the values of the normal (or phase) velocity ω(û)
of propagation of a plane wave in the direction of the unit vector û. Since the
system is anisotropic, ω is in general different from the velocity of a ray. More
precisely, consider that at t = 0 a system of rays is emitted from the origin x = 0
in all directions. The rays are straight lines because the system is homogeneous,
but their velocities depend on direction because the system is anisotropic. At time
t, the tips of the rays generate a wave surface V (0, x) whose normal at x is û, not
necessarily in the direction of the ray, and the normal velocity of the front at that
point is given by $\omega(\hat{\mathbf{u}})$. It is relatively easy to realize that, as noticed by Fresnel, for a given direction there are two possible normal velocities that correspond in turn to the two orthogonal polarizations of the wave. For example, if the propagation is along the xy plane, multiplying equation (6.29) by the product of denominators and setting $u_z = 0$, $u_x = \cos\theta$, $u_y = \sin\theta$ we obtain
$$(\omega^2 - c^2)\left(\omega^2 - b^2\cos^2\theta - a^2\sin^2\theta\right) = 0, \tag{6.30}$$
which corresponds to a circle of radius c and an oval that intersects the axes at
$x = b$ and $y = a$. The same analysis applies to $u_x = 0$ and $u_y = 0$. If, for example, $a > b > c$, in the xz plane the oval will intersect the circle in four points. We indicate those points in Figure 6.5 as $I_1$, $I_2$, $I_3$, and $I_4$, in directions forming an angle $\alpha_0$ with the axes. The angle $\alpha_0$ given by
$$\cos^2\alpha_0 = \frac{a^2 - b^2}{a^2 - c^2}, \tag{6.31a}$$
$$\sin^2\alpha_0 = \frac{b^2 - c^2}{a^2 - c^2}, \tag{6.31b}$$
corresponds to a direction of a single normal velocity and is called the “opti-
cal axis.” Hamilton noticed that a function of the form given by equation (6.29)
gives rise to a surface with four cusps. Remarkably, Hamilton calculates V from
ω(û) using a purely analytical method and rederives Fresnel’s equation for the
wave surface. It turns out that the wave surface has a similar structure to that
given by equation (6.29) and therefore has cusps as well. Hamilton’s origi-
nal contribution was to realize that those cusps are the vertices of cones that
give rise to infinitely many possible directions of the refracted rays for some
special directions of the incident ray. We visit his derivation in the following
section.
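Multiplying equation (6.29) by the product of its denominators gives a quadratic in $\omega^2$, so the two normal velocities for any direction can be computed directly. The following sketch (function names are illustrative; the direction is assumed to be a unit vector) does this and also evaluates the optical-axis angle of equations (6.31), at which the two velocities coincide.

```python
import math


def normal_speeds(u, a, b, c):
    """Two normal velocities solving equation (6.29) for the unit vector u."""
    ux2, uy2, uz2 = (component**2 for component in u)
    a2, b2, c2 = a * a, b * b, c * c
    # Quadratic W^2 - linear*W + constant = 0 in W = omega^2 (|u| = 1 assumed).
    linear = ux2 * (b2 + c2) + uy2 * (a2 + c2) + uz2 * (a2 + b2)
    constant = ux2 * b2 * c2 + uy2 * a2 * c2 + uz2 * a2 * b2
    # Clamp tiny negative round-off at the degenerate (optical-axis) direction.
    root = math.sqrt(max(linear**2 - 4.0 * constant, 0.0))
    return math.sqrt((linear - root) / 2.0), math.sqrt((linear + root) / 2.0)


def optical_axis_angle(a, b, c):
    """Angle alpha_0 of equations (6.31), measured from the x axis in the xz plane."""
    return math.atan2(math.sqrt(b * b - c * c), math.sqrt(a * a - b * b))
```

With the values $a = 3$, $b = 2$, $c = 1$ used in Figure 6.5, a direction in the xy plane returns the circle value c together with the oval value of equation (6.30), and the direction at angle $\alpha_0$ in the xz plane returns the single velocity b twice.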
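[Figure 6.5: three panels, $u_z = 0$ (x and y axes), $u_y = 0$ (x and z axes, with the points $I_1$ to $I_4$ and the angle $\alpha_0$ marked), and $u_x = 0$ (y and z axes).]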
Figure 6.5 Polar graph of the normal velocity ω(û) given by equation (6.29) with
û projected on the three planes for a = 3, b = 2 and c = 1. The thick line is the
optical axis which makes an angle α0 [given by equations (6.31)] with the x axis.
[Figure 6.6: the points A, B, P and Q, and the unit vector û.]
Figure 6.6 As the point P describes the wave surface, the point Q describes the
surface of normal slowness.
According to the duality argument, the normal to this surface will be proportional
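to $\mathbf{x}$ (the ray direction). Following the notation by Darrigol (2012) we define $M = (1 - s^2K)^{-1}$:
$$\nabla_{\mathbf{s}}G = 2M\mathbf{s} + 2\left(\mathbf{s}\cdot M^2K\mathbf{s}\right)\mathbf{s} = \lambda\mathbf{x}, \tag{6.33}$$
$$\lambda = 2\left(\mathbf{s}\cdot M^2K\mathbf{s}\right)s^2 \equiv 2\,\mathbf{s}\cdot M^2\left(Ks^2 - 1 + 1\right)\mathbf{s} = 2\,\mathbf{s}\cdot M^2\left(-M^{-1} + 1\right)\mathbf{s} = 2\,\mathbf{s}\cdot M^2\mathbf{s}. \tag{6.34}$$
$$\mathbf{x} \equiv \frac{1}{\mathbf{s}\cdot M^2\mathbf{s}}\left[M\mathbf{s} + \left(\mathbf{s}\cdot KM^2\mathbf{s}\right)M^{-1}M\mathbf{s}\right] = \frac{1 + \mathbf{s}\cdot KM^2\mathbf{s}}{\mathbf{s}\cdot M^2\mathbf{s}}\,M\mathbf{s} - \frac{\left(\mathbf{s}\cdot KM^2\mathbf{s}\right)s^2}{\mathbf{s}\cdot M^2\mathbf{s}}\,KM\mathbf{s}. \tag{6.37}$$
Using equation (6.35) and noticing that after squaring equation (6.36) we have $r^2 = \mathbf{x}^2 = \left(1 + \mathbf{s}\cdot KM^2\mathbf{s}\right)/\left(\mathbf{s}\cdot M^2\mathbf{s}\right)$, equation (6.37) becomes
$$\mathbf{x} = (r^2 - K)M\mathbf{s}. \tag{6.38}$$
$$\mathbf{x}\cdot(r^2 - K)^{-1}\mathbf{x} = 1, \tag{6.39}$$
which is equivalent to
$$\mathbf{x}\cdot(1 - r^2K^{-1})^{-1}\mathbf{x} = 0, \tag{6.40}$$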
meaning that the wave surface V can be obtained from the equation of normal
slowness by replacing K by K −1 : an “inversion,” or rescaling of the axes.
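The passage from the surface of normal slowness to the wave surface can be tested numerically. The sketch below takes the slowness vector to be $\mathbf{s} = \hat{\mathbf{u}}/\omega$ with $\omega$ a root of Fresnel's equation (6.29), which is an assumption about the definition of $\mathbf{s}$ not shown in this passage; it then builds $\mathbf{x}$ from equations (6.33) and (6.34) and returns the residuals of $\mathbf{x}\cdot\mathbf{s} = 1$ and of equation (6.39). All names are illustrative, and the direction must be generic (away from the principal planes, where M is singular).

```python
import math


def squared_normal_speeds(u, K):
    """Both roots omega^2 of equation (6.29) for a unit vector u, K = (a^2, b^2, c^2)."""
    ux2, uy2, uz2 = (component**2 for component in u)
    a2, b2, c2 = K
    linear = ux2 * (b2 + c2) + uy2 * (a2 + c2) + uz2 * (a2 + b2)
    constant = ux2 * b2 * c2 + uy2 * a2 * c2 + uz2 * a2 * b2
    root = math.sqrt(max(linear**2 - 4.0 * constant, 0.0))
    return (linear - root) / 2.0, (linear + root) / 2.0


def ray_vector(s, K):
    """x = [M s + (s.K M^2 s) s] / (s.M^2 s), from equations (6.33)-(6.34)."""
    s2 = sum(si**2 for si in s)
    M = [1.0 / (1.0 - s2 * Ki) for Ki in K]      # diagonal of (1 - s^2 K)^-1
    s_M2_s = sum((Mi * si)**2 for Mi, si in zip(M, s))
    s_KM2_s = sum(Ki * (Mi * si)**2 for Ki, Mi, si in zip(K, M, s))
    return [(Mi * si + s_KM2_s * si) / s_M2_s for Mi, si in zip(M, s)]


def duality_residuals(u, K):
    """For each root, return (x.s - 1, x.(r^2 - K)^-1 x - 1)."""
    residuals = []
    for W in squared_normal_speeds(u, K):
        s = [ui / math.sqrt(W) for ui in u]
        x = ray_vector(s, K)
        r2 = sum(xi**2 for xi in x)
        x_dot_s = sum(xi * si for xi, si in zip(x, s))
        wave_surface = sum(xi**2 / (r2 - Ki) for xi, Ki in zip(x, K))
        residuals.append((x_dot_s - 1.0, wave_surface - 1.0))
    return residuals
```

Both residuals should vanish to rounding error for each of the two polarizations, which is what equations (6.38) and (6.39) assert.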
Figure 6.7 Fresnel wave surface of equation (6.42) for $a^2 = 0.12$, $b^2 = 0.4$, $c^2 = 0.75$, showing four “cusps.”
6.2 Conical Refraction* 125
On top of observing the emerging cone, Lloyd made a remarkable discovery: all
the rays of the cone were polarized in different planes. After analyzing the cone
of emergent rays through a tourmaline polarizer, he writes: “I was surprised to observe that one radius only of the circular section vanished in a given position of the axis of the tourmaline, and that the ray which disappeared ranged through 360° as the tourmaline plate turned through 180° . . . the angle between the planes of polarization of any two of the rays of the cone is half the angle contained by the planes passing through the rays themselves and its axis” (Lloyd, 1833). After
observing this result, Lloyd himself is able to reconcile his finding with Fresnel’s
theory. In modern terminology, the polarization of the ray is on the plane formed
by the wave vector and the Poynting vector. In Hamilton’s words: “the vibrations at
the circle of contact on Fresnel’s wave are in the chords of that circle drawn from
the extremity of the normal ω of single velocity” (point B on Figure 6.8). As a
result, and as explained by Hamilton, as one completes one turn along the circle of
the transmitted rays, the polarization completes half a turn (as shown in the insert
of Figure 6.8). Or, if one selects a point on the circle by passing the outgoing beam
through a polarizer, as the polarizer rotates by 180°, the point completes a whole
turn.
[Figure 6.8: conical refraction, with the incident beam and the points N, M, P, B, A, L and C.]
The observation of conical refraction not only provided powerful evidence con-
firming that light is a transverse wave, but the π-phase change of polarization
after a full turn around the circle might also be considered the first realization
of a geometrical phase, or ‘Berry phase’ (Berry, 1984). In addition, Hamilton’s
cone is an anticipation of conical intersections that appear in chemistry and
condensed matter physics, where they are referred to as “Dirac cones” (Berry,
2015).
where U is what Hamilton calls the force function, and, in modern notation, cor-
responds to minus the potential energy. Next he considers variations of 2T , the
“living force” of the system (in modern nomenclature, twice the kinetic energy),
with
$$T = \frac{1}{2}m\,\dot{\mathbf{x}}^2(t). \tag{6.55}$$
[Figure: a path x(t), a neighboring path x′(t), and the variation δx(t).]
Hamilton then uses the “celebrated law of living force” (the modern conservation
of mechanical energy). For both paths we have
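$$\frac{1}{2}m\,\dot{\mathbf{x}}^2(t) = U\{\mathbf{x}(t)\} + E \tag{6.56}$$
$$\frac{1}{2}m\,\dot{\mathbf{x}}'^2(t) = U\{\mathbf{x}'(t)\} + E', \tag{6.57}$$
with $E$ and $E'$ constants of motion (the total energy). The difference between the above equations, to lowest order in $\delta\mathbf{x}$, is
$$\delta T = m\,\dot{\mathbf{x}}(t)\cdot\delta\dot{\mathbf{x}}(t) = \delta U + \delta E = m\,\ddot{\mathbf{x}}(t)\cdot\delta\mathbf{x}(t) + \delta E, \tag{6.58}$$
where, in equation (6.58), following Hamilton, we have used equation (6.54), and $\delta E = E' - E$.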
Using the identity
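$$m\frac{d}{dt}\left[\dot{\mathbf{x}}(t)\cdot\delta\mathbf{x}(t)\right] = m\,\dot{\mathbf{x}}(t)\cdot\delta\dot{\mathbf{x}}(t) + m\,\ddot{\mathbf{x}}(t)\cdot\delta\mathbf{x}(t) \tag{6.59}$$
we can rewrite equation (6.58) as
$$2(\delta T) = m\frac{d}{dt}\left[\dot{\mathbf{x}}(t)\cdot\delta\mathbf{x}(t)\right] + \delta E. \tag{6.60}$$
Integrating equation (6.60) between times $t = 0$ and $t$, Hamilton obtains
$$\int_0^t 2(\delta T)\,dt = \delta V = m\,\dot{\mathbf{x}}(t)\cdot\delta\mathbf{x}(t) - m\,\dot{\mathbf{x}}(0)\cdot\delta\mathbf{x}(0) + t\,\delta E, \tag{6.61}$$
with
$$V = \int_0^t 2T\,dt. \tag{6.62}$$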
where he is using the law of living force T = U + E (recall that Hamilton calls U
what we today call minus the potential energy). From equations (6.67) and (6.60),
the variation of S is
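$$\delta S = \delta V - t\,\delta E - E\,\delta t \tag{6.69}$$
$$= m\,\dot{\mathbf{x}}(t)\cdot\delta\mathbf{x}(t) - m\,\dot{\mathbf{x}}(0)\cdot\delta\mathbf{x}(0) - E\,\delta t, \tag{6.70}$$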
which means δS/δt = −E, δS/δx(t) = m ẋ(t) and δS/δx(0) = −m ẋ(0). He
finishes the article with equations for S, which follow directly from the living force
equation:
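$$\frac{\delta S}{\delta t} + \frac{1}{2m}\left\{\left(\frac{\delta S}{\delta x}\right)^2 + \left(\frac{\delta S}{\delta y}\right)^2 + \left(\frac{\delta S}{\delta z}\right)^2\right\} = U(x, y, z), \tag{6.71}$$
and
$$\frac{\delta S}{\delta t} + \frac{1}{2m}\left\{\left(\frac{\delta S}{\delta a}\right)^2 + \left(\frac{\delta S}{\delta b}\right)^2 + \left(\frac{\delta S}{\delta c}\right)^2\right\} = U(a, b, c). \tag{6.72}$$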
which is a recasting of the Maupertuis integral, evaluated for a path that connects
two points in a time t. Since energy is conserved (and the integral is evaluated on a
path that is a solution of the equations of motion), we have
$$V = 2\int_0^t (E - U)\,dt = 2Et - 2\int_0^t U\,dt, \tag{6.74}$$
where E is the total energy and U is the potential energy $U = -m\mu/r$, where $\mu$ is the mass of the Sun and m the mass of the planet or orbiting comet.² Also, Hamilton calls $\frac{1}{2}h$ the areal velocity, with
$$h = r^2\dot{\theta}, \tag{6.75}$$
a constant of motion proportional to the angular momentum of the particle. The total energy is therefore
$$E = \frac{1}{2}m\dot{r}^2 + \frac{1}{2}m\frac{h^2}{r^2} - \frac{m\mu}{r}. \tag{6.76}$$
Hamilton uses the parameters a (the semi-major axis of the ellipse of the orbit) and $p = b^2/a$ (with b the semi-minor axis of the ellipse). Also, the total energy is simply related to the semi-major axis by $E = -m\mu/2a$, and the areal velocity satisfies $p = h^2/\mu$.³
² Hamilton uses $\mu = m + M$, but we will simplify the calculation slightly by taking $M \gg m$.
³ These relations are sometimes called the vis viva equations and can be deduced using simple expressions for
the angular momentum and the energy at the turning points (call them 1 and 2) of the orbit:
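[Figure 6.10: an ellipse with semi-axes a and b, center C, focus F at a distance ae from C, the points P, B and A, and the eccentric anomaly v.]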
Figure 6.10 The eccentric anomaly v. As the point P orbits around F (a focus of the ellipse) it sweeps out equal areas in equal times. The area of the sector FPA is $A_{FPA} = \frac{1}{2}ht \equiv \frac{1}{2}b\sqrt{\mu/a}\;t$. Since the ellipse as shown in the figure is a circle of radius a that has been uniformly compressed in the vertical direction by a scaling factor $b/a$, the areas of the sectors FPA and FQA are proportional: $A_{FPA} = (b/a)A_{FQA}$. And since $A_{FQA} = va^2/2 - \mathrm{Triangle}\{CFQ\} = \frac{1}{2}va^2 - \frac{1}{2}a^2e\sin v$, we obtain $t = \sqrt{a^3/\mu}\,(v - e\sin v)$.
$v_1r_1 = v_2r_2 = h$, and $E/m = v_1^2/2 - \mu/r_1 = v_2^2/2 - \mu/r_2$. With simple manipulations of these equations, one gets: $h^2 = 2\mu r_1r_2/(r_1 + r_2)$, $E/m = -\mu/(r_1 + r_2)$.
6.4 An Example from Hamilton 133
Also, he uses the eccentric anomaly v, in terms of which the radii $r_0$ and $r$ can be written as⁴
$$r = a(1 - e\cos v), \qquad r_0 = a(1 - e\cos v_0). \tag{6.80}$$
From equation (6.80) we have
$$\cos v = \frac{a - r}{ae}, \tag{6.81}$$
or, which is equivalent,
$$ae\sin v = \sqrt{2ar - r^2 - pa}. \tag{6.82}$$
Notice that, expressed in terms of v, the integral of the potential energy is immediate: from equation (6.80) we have $dr = ae\sin v\,dv$ and, since equation (6.82) is just the denominator that appears in equation (6.78), we have
$$-2\int_0^t U\,dt = 2m\sqrt{\mu a}\int_{v_0}^{v}dv = 2m\sqrt{\mu a}\,(v - v_0). \tag{6.83}$$
Kepler’s law of equal areas in equal times gives the elapsed time t in terms of the initial and final eccentric anomalies (see Figure 6.10 for a derivation of the equation below):
$$t = \sqrt{\frac{a^3}{\mu}}\,(v - v_0 - e\sin v + e\sin v_0), \tag{6.84}$$
and from this equation Hamilton obtains an interesting expression for the characteristic function in terms of the eccentric anomalies:
$$V = 2Et + 2m\sqrt{\mu a}\,(v - v_0) = m\sqrt{\mu a}\,(v - v_0 + e\sin v - e\sin v_0). \tag{6.85}$$
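Equation (6.85) can be cross-checked by integrating $2T = 2(E - U)$ over the orbit numerically. The sketch below uses the eccentric anomaly as the integration variable, with $dt/dv = \sqrt{a^3/\mu}\,(1 - e\cos v)$ taken from equation (6.84), and Simpson's rule. Function names, the default mass, and the sample orbital parameters in any usage are assumptions made for illustration.

```python
import math


def simpson(func, lower, upper, intervals=2000):
    """Composite Simpson rule; `intervals` must be even."""
    width = (upper - lower) / intervals
    total = func(lower) + func(upper)
    for index in range(1, intervals):
        total += (4 if index % 2 else 2) * func(lower + index * width)
    return total * width / 3.0


def elapsed_time(v0, v, a, e, mu):
    """Equation (6.84)."""
    return math.sqrt(a**3 / mu) * (v - v0 - e * math.sin(v) + e * math.sin(v0))


def characteristic_closed_form(v0, v, a, e, mu, m=1.0):
    """Last expression of equation (6.85)."""
    return m * math.sqrt(mu * a) * (v - v0 + e * math.sin(v) - e * math.sin(v0))


def characteristic_by_quadrature(v0, v, a, e, mu, m=1.0):
    """V = integral of 2T dt, with T = E - U and U = -m mu / r."""
    energy = -m * mu / (2.0 * a)

    def integrand(w):
        r = a * (1.0 - e * math.cos(w))
        kinetic = energy + m * mu / r
        dt_dv = math.sqrt(a**3 / mu) * (1.0 - e * math.cos(w))
        return 2.0 * kinetic * dt_dv

    return simpson(integrand, v0, v)
```

The first form in (6.85), $2Et + 2m\sqrt{\mu a}\,(v - v_0)$, can be evaluated with `elapsed_time` and compared against both functions above.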
The expression for V does not yet have the desired form, since we want a char-
acteristic function that depends on the initial and final coordinates and not on the
parameters of the conic (which, in principle, we don’t know). Hamilton takes the
limit a → ∞ (and e → 1), which corresponds to parabolic motion. In this limit,
the angles v and v0 are small, and V , from equation (6.85), has the form
$$V = 2m\sqrt{\mu a}\,(v - v_0). \tag{6.86}$$
He then writes V in terms of r , r0 and the chord τ joining the initial and final
points. Since the coordinates of the initial and final points are (a cos v, b sin v) and
(a cos v0 , b sin v0 ), the chord is
⁴ These relations are easily derived from the equation of the ellipse $\left(\frac{x}{a}\right)^2 + \left(\frac{y}{b}\right)^2 = 1$, and (see Figure 6.10) $x = a\cos v$. Since $r^2 = (x - ae)^2 + y^2$, equation (6.80) follows.
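one central aspect of Lagrange’s equations: its invariance under change of coordinates. Hamilton considers a set of 3n rectangular coordinates $\mathbf{x}_i = (x_i, y_i, z_i)$ to be functions of another set of 3n coordinates, or “marks of position,” $(\eta_1, \eta_2, \cdots, \eta_{3n})$.
$$\mathbf{x} = \mathbf{x}[\eta] \tag{6.93}$$
$$\dot{\mathbf{x}} = \frac{\partial\mathbf{x}}{\partial\eta}\cdot\dot{\eta} \tag{6.94}$$
$$\frac{\partial\dot{\mathbf{x}}}{\partial\dot{\eta}} = \frac{\partial\mathbf{x}}{\partial\eta}. \tag{6.95}$$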
Notice that the components of x and of ẋ are both independent variables, and equa-
tion (6.95) – the “law of cancellation of dots” – follows immediately from equation
(6.94). Using the chain rule we have
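$$\frac{\partial U}{\partial\eta} = \frac{\partial U}{\partial\mathbf{x}}\cdot\frac{\partial\mathbf{x}}{\partial\eta} \tag{6.96}$$
$$= m\,\ddot{\mathbf{x}}\cdot\frac{\partial\mathbf{x}}{\partial\eta} \tag{6.97}$$
$$= \frac{d}{dt}\left[m\,\dot{\mathbf{x}}\cdot\frac{\partial\mathbf{x}}{\partial\eta}\right] - m\,\dot{\mathbf{x}}\cdot\frac{d}{dt}\frac{\partial\mathbf{x}}{\partial\eta}. \tag{6.98}$$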
Hamilton relates the term in square brackets in equation (6.98) to derivatives of the
kinetic energy:
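$$\frac{\partial T}{\partial\dot{\eta}} = m\,\dot{\mathbf{x}}\cdot\frac{\partial\dot{\mathbf{x}}}{\partial\dot{\eta}} \equiv m\,\dot{\mathbf{x}}\cdot\frac{\partial\mathbf{x}}{\partial\eta}, \tag{6.99}$$
where we used the law of cancellation of dots of equation (6.95). Also
$$\frac{\partial T}{\partial\eta} = m\,\dot{\mathbf{x}}\cdot\frac{\partial\dot{\mathbf{x}}}{\partial\eta} \equiv m\,\dot{\mathbf{x}}\cdot\frac{d}{dt}\frac{\partial\mathbf{x}}{\partial\eta}, \tag{6.100}$$
where we used the fact that the derivatives $d/dt$ and $\partial/\partial\eta$ commute.⁵ Substituting equations (6.100) and (6.99) in equation (6.98), Hamilton obtains
$$\frac{\partial U}{\partial\eta} = \frac{d}{dt}\frac{\partial T}{\partial\dot{\eta}} - \frac{\partial T}{\partial\eta}. \tag{6.101}$$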
If one adds the assumption that the potential energy is independent of the velocity, we obtain the “covariance” (see page 95) of Lagrange’s equations (now written in terms of each of the variables $\eta_i$):
$$\frac{d}{dt}\frac{\partial L}{\partial\dot{\eta}_i} = \frac{\partial L}{\partial\eta_i}, \tag{6.102}$$
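⁵ Since $\frac{\partial\dot{\eta}_k}{\partial\eta_j} = 0$ we have $\frac{\partial}{\partial\eta_j}\left(\frac{d}{dt}x_i\right) = \frac{\partial}{\partial\eta_j}\sum_k\dot{\eta}_k\frac{\partial x_i}{\partial\eta_k} = \sum_k\dot{\eta}_k\frac{\partial}{\partial\eta_k}\frac{\partial}{\partial\eta_j}x_i = \frac{d}{dt}\frac{\partial}{\partial\eta_j}x_i$.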
with L = T + U the Lagrangian. Hamilton does not use L in showing the invari-
ance; he uses the form of equation (6.101). Each coordinate ηi has its own Lagrange
equation, and each equation has the same form, or the same appearance, regard-
less of the transformation from one set of independent variables to another. This
invariance can seem mysterious. What is its origin? The answer is the principle of
least action. The Lagrange equations can be obtained from the stationarity of the
action at some trajectory, the important point being that the condition of stationar-
ity is the same in both coordinate systems. A very simple example illustrates this
point. Consider a function $f(x)$ with a minimum at $x_0$. Now change coordinates to $Q = x^2$. As a function of Q, the function f becomes $F(Q) = f(\sqrt{Q})$, with a different functional form, and a minimum at $Q_0 = x_0^2$. But obviously the condition of minimization has the same appearance for both coordinates: $df/dx = 0$ and $dF/dQ = 0$. We can extend this analysis to functions f of many variables, as long
as f is a scalar (it does not have components). The Lagrangian function L behaves
like a scalar under a coordinate transformation; it changes its functional depen-
dence on the coordinates, but its numerical value at a given point remains the same.
Even though the root of this property is mathematically simple, “the invariance
of the Lagrangian equations with respect to arbitrary point-transformations gives
these equations a unique position in the development of mathematical thought.
These equations stand out as the first example of that ‘principle of invariance’
which was one of the leading ideas of the 19th century mathematics, and which
has become of dominant importance in contemporary physics” (Lanczos, 1962,
p. 117).
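The one-variable example above (a function f(x) re-expressed through Q = x²) can be checked numerically. The sketch below is only an illustration: the particular function f and the finite-difference helper `derivative` are our own assumptions, not taken from the text.

```python
import math

def f(x):
    """Illustrative function with a minimum at x0 = 1.5 (an assumed example)."""
    return (x - 1.5) ** 2 + 0.3

def F(Q):
    """The same scalar expressed in the new coordinate Q = x**2 (for x > 0)."""
    return f(math.sqrt(Q))

def derivative(func, u, h=1e-6):
    """Central finite-difference estimate of d(func)/du."""
    return (func(u + h) - func(u - h)) / (2 * h)

x0 = 1.5
Q0 = x0 ** 2

# The functional forms differ, but the value at the same point agrees,
# and the stationarity condition has the same appearance in both coordinates.
print("f(x0), F(Q0):", f(x0), F(Q0))
print("df/dx at x0, dF/dQ at Q0:", derivative(f, x0), derivative(F, Q0))
```

Both derivatives come out as zero to within rounding, even though F(Q) and f(x) are different functions of their arguments.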
After showing the invariance of L, Hamilton uses a property of the kinetic energy
used by Lagrange in his Mécanique analytique:6
\[
2T = \dot{\eta} \cdot \frac{\partial T}{\partial \dot{\eta}}, \tag{6.103}
\]
where T is taken to be a function of both variables η and η̇: T = T (η, η̇). The
differential of T is
\[
dT = \frac{\partial T}{\partial \eta} \cdot d\eta + \frac{\partial T}{\partial \dot{\eta}} \cdot d\dot{\eta}. \tag{6.104}
\]
Also, from equation (6.103)
\[
d(2T) = \dot{\eta} \cdot d\,\frac{\partial T}{\partial \dot{\eta}} + \frac{\partial T}{\partial \dot{\eta}} \cdot d\dot{\eta}, \tag{6.105}
\]
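6 The proof of equation (6.103) is simpler in components. Starting with the equation for T in cartesian
coordinates, $T = \sum_j m_j \dot{x}_j^2/2$, we have $\partial T/\partial \dot{\eta}_i = \sum_j m_j \dot{x}_j (\partial \dot{x}_j/\partial \dot{\eta}_i) = \sum_j m_j \dot{x}_j (\partial x_j/\partial \eta_i)$,
by the law of cancellation of dots. Now notice that
$\sum_i (\partial x_j/\partial \eta_i)\dot{\eta}_i = dx_j/dt \equiv \dot{x}_j$, which implies
$\dot{\eta} \cdot \partial T/\partial \dot{\eta} = \sum_j m_j \dot{x}_j^2 = 2T$.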
6.5 Hamilton’s “Second Essay on a General Method in Dynamics”* 137
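The Lagrangian, evaluated over this path $\eta(t')$, treating $\eta_i$, $\eta_f$ and $t$ as parameters, is
\[
L(t') = \frac{1}{2}\left(\frac{\eta_f - \eta_i}{t} + \frac{1}{2} g t - g t'\right)^2 - g\left(\eta_i + \frac{\eta_f - \eta_i}{t}\, t' - \frac{1}{2} g (t' - t)\, t'\right),
\]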
from which we obtain
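\begin{align}
S(\eta_i, \eta_f, t) &= \int_0^t L(t')\, dt' \tag{6.137} \\
&= \frac{1}{2}\frac{(\eta_f - \eta_i)^2}{t} - \frac{\eta_f + \eta_i}{2}\, g t - \frac{1}{24} g^2 t^3. \tag{6.138}
\end{align}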
It is straightforward to show that the function S(ηi , η f , t) satisfies both differential
equations
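\begin{align}
\frac{\partial S}{\partial t} + \frac{1}{2}\left(\frac{\partial S}{\partial \eta_i}\right)^2 + g\eta_i &= 0, \tag{6.139} \\
\frac{\partial S}{\partial t} + \frac{1}{2}\left(\frac{\partial S}{\partial \eta_f}\right)^2 + g\eta_f &= 0. \tag{6.140}
\end{align}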
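The claim that S satisfies both equations (6.139) and (6.140) can be checked numerically. The following is a minimal sketch, assuming unit mass as in the text and using central finite differences; the function names and the sample values are our own choices.

```python
def action(eta_i, eta_f, t, g=9.8):
    """Hamilton's principal function of equation (6.138), unit mass."""
    d = eta_f - eta_i
    return 0.5 * d * d / t - 0.5 * (eta_f + eta_i) * g * t - g * g * t ** 3 / 24.0

def hj_residuals(eta_i, eta_f, t, g=9.8, h=1e-5):
    """Left-hand sides of (6.139) and (6.140), by central finite differences."""
    dS_dt = (action(eta_i, eta_f, t + h, g) - action(eta_i, eta_f, t - h, g)) / (2 * h)
    dS_di = (action(eta_i + h, eta_f, t, g) - action(eta_i - h, eta_f, t, g)) / (2 * h)
    dS_df = (action(eta_i, eta_f + h, t, g) - action(eta_i, eta_f - h, t, g)) / (2 * h)
    res_i = dS_dt + 0.5 * dS_di ** 2 + g * eta_i
    res_f = dS_dt + 0.5 * dS_df ** 2 + g * eta_f
    return res_i, res_f

# Both residuals should vanish up to finite-difference error.
print(hj_residuals(1.0, 3.0, 0.7))
```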
Another approach to obtaining S is to integrate the differential equation (only
one of them, the Hamilton-Jacobi equation in terms of the final coordinates, for
example). This is not the procedure followed by Hamilton in his essay (for more
elaboration on this point, see Nakane and Fraser (2002)). For the present problem,
since the potential is time independent, we can separate variables and write
\[
S(\eta_i, \eta_f, t) = V(\eta_i, \eta_f, E) - Et, \tag{6.141}
\]
where E is a constant, which will have to be expressed in terms of the initial and
final conditions through equation (6.64): t = ∂ V /∂ E. This separation leads us to
the equation for the characteristic function V :
\[
\frac{1}{2}\left(\frac{\partial V}{\partial \eta_f}\right)^2 + g\eta_f = E, \tag{6.142}
\]
from which, after integrating, we get
\[
V = -\frac{2\sqrt{2}}{3g}\,(E - g\eta_f)^{3/2} + C, \tag{6.143}
\]
where C is an integration constant (constant with respect to η f ) that we find by
imposing the boundary condition of vanishing V for η f = ηi , V (ηi , ηi , E) = 0:
\[
V = \frac{2\sqrt{2}}{3g}\left[(E - g\eta_i)^{3/2} - (E - g\eta_f)^{3/2}\right]. \tag{6.144}
\]
\[
\sqrt{2(E - g\eta_{i,f})} = \frac{\eta_f - \eta_i}{t} \pm \frac{gt}{2}, \tag{6.146}
\]
which, substituted in equation (6.144) gives
\[
V = \frac{(\eta_f - \eta_i)^2}{t} + \frac{g^2 t^3}{12}. \tag{6.147}
\]
Squaring equation (6.146) we have
\[
E = \frac{(\eta_f - \eta_i)^2}{2t^2} + \frac{g^2 t^2}{8} + \frac{g}{2}(\eta_i + \eta_f), \tag{6.148}
\]
and from these expressions for V and E, forming S = V − Et, we obtain equation (6.138).
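The chain from the characteristic function to the principal function can be verified with a few lines of arithmetic. The sketch below assumes unit mass and a path with no turning point (the particle is still rising on arrival, so both square roots in (6.146) are positive); E is obtained by squaring (6.146) with the upper sign. Function names and sample values are ours.

```python
import math

def energy(eta_i, eta_f, t, g):
    """E from squaring equation (6.146) with the upper sign (initial point)."""
    v_initial = (eta_f - eta_i) / t + 0.5 * g * t
    return g * eta_i + 0.5 * v_initial ** 2

def characteristic(eta_i, eta_f, E, g):
    """Characteristic function V of equation (6.144)."""
    pref = 2.0 * math.sqrt(2.0) / (3.0 * g)
    return pref * ((E - g * eta_i) ** 1.5 - (E - g * eta_f) ** 1.5)

def principal(eta_i, eta_f, t, g):
    """Principal function S of equation (6.138)."""
    d = eta_f - eta_i
    return 0.5 * d * d / t - 0.5 * (eta_f + eta_i) * g * t - g * g * t ** 3 / 24.0

eta_i, eta_f, t, g = 1.0, 3.0, 0.3, 9.8   # (eta_f - eta_i)/t > g*t/2: no turning point
E = energy(eta_i, eta_f, t, g)
V = characteristic(eta_i, eta_f, E, g)

print("V from (6.144):", V)
print("V from (6.147):", (eta_f - eta_i) ** 2 / t + g * g * t ** 3 / 12.0)
print("V - E t       :", V - E * t)
print("S from (6.138):", principal(eta_i, eta_f, t, g))
```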
6.7 Applications and Examples 143
In Huygens’ construction, the light rays are perpendicular to the wave-front but,
as discussed by Hamilton in his 1833 essay, this can be extended to the so-called
“extraordinary refraction,” where the family of light rays are not perpendicular to
the wave-front. An important observation made by Hamilton in his essay is the
equivalence between Fermat’s least action principle and Huygens’ construction,
which we illustrate for the simple case of refraction into an ordinary medium in
Figure 6.12. Since the Hamilton-Jacobi equation is derived from Fermat’s princi-
ple, the propagation of the “wave-front” S(x, t) of particles can be obtained from
Huygens’ construction.
Malus and Laplace extended the treatment outlined in Figure 6.12 to show the
equivalence of Huygens’ principle and Fermat’s principle of least time (Darrigol,
2012). Laplace considers a medium for which the velocity of propagation depends
on direction and is given by the following function: $v^2 = \alpha^2 + \beta^2 \cos^2\theta$. For a
medium like this, the wave-fronts from a given point q0 are ellipses centered at q0
and the rays – straight lines that radiate from q0 – are clearly not perpendicular to
the ellipses, a situation we will consider in more detail later in this chapter.
The great contribution of Hamilton’s series of articles, summarized in his 1833
essay, is to formalize Huygens’ principle and extend it from light rays to mechan-
ical trajectories. Whereas Laplace’s formulation of the principle of least action
leads to second-order differential equations for the trajectories, Hamilton, following
the logic of Huygens’ principle, formulates the problem in terms of a first-order
differential equation for the wave-fronts.
with λ a variable that parameterizes the ray given by curve x(λ): the ray starts
at x(0) and ends at x(1). ẋ = dx/dλ, and L, the Lagrangian of the problem, is
given by:
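\[
L(\dot{\mathbf{x}}, \mathbf{x}) = \sqrt{\dot{x}^2 + \dot{y}^2 + \dot{z}^2}\; n(\mathbf{x}) \equiv \sqrt{\dot{\mathbf{x}}^2}\; n(\mathbf{x}). \tag{6.151}
\]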
\[
n\,\frac{d\hat{\mathbf{t}}}{ds} + \hat{\mathbf{t}}\left(\hat{\mathbf{t}} \cdot \nabla n\right) = \nabla n, \tag{6.154}
\]
which we can write as:
\[
\frac{d\hat{\mathbf{t}}}{ds} = \hat{\mathbf{t}} \times \left(\frac{\nabla n}{n} \times \hat{\mathbf{t}}\right), \tag{6.155}
\]
the equation of motion of the ray.
We will return to equation (6.155) in Section 7.4.1 when we calculate the
bending of a light ray due to relativistic effects.
H = T ( p) + V (x), (6.157)
with
\[
p = \frac{\partial L}{\partial \dot{x}} = \dot{x}, \tag{6.158}
\]
giving
\[
H = \frac{1}{2} p^2 + \frac{1}{2}\omega^2 x^2, \tag{6.159}
\]
As an exercise let us derive the Hamiltonian from the explicit form of the
action S(q, t), for a particle that starts at x = 0 at t = 0, and ends at x = q
at time t. Let us call τ the intermediate times of the path. The general solu-
tions (the paths that minimize the action) for the motion of a harmonic oscillator
are
\[
x(\tau) = x_0 \cos\omega\tau + \frac{\dot{x}_0}{\omega}\sin\omega\tau. \tag{6.160}
\]
For the initial and final conditions x(τ = 0) = 0 and x(τ = t) = q, the
solution is:
\[
x(\tau) = q\,\frac{\sin\omega\tau}{\sin\omega t}. \tag{6.161}
\]
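The exercise can also be carried out numerically. The sketch below assumes unit mass, as in equations (6.158) and (6.159), and a final time shorter than half a period. It integrates the Lagrangian along the path (6.161) to obtain S(q, t), then compares the Hamiltonian read off from the action (minus the time derivative of S) with equation (6.159) evaluated at p = ∂S/∂q. Function names and step sizes are our own choices.

```python
import math

def action(q, t, omega=1.0, n=2000):
    """Integrate L = x'^2/2 - omega^2 x^2/2 along the path (6.161), midpoint rule."""
    h = t / n
    total = 0.0
    for k in range(n):
        tau = (k + 0.5) * h
        x = q * math.sin(omega * tau) / math.sin(omega * t)
        v = q * omega * math.cos(omega * tau) / math.sin(omega * t)
        total += (0.5 * v * v - 0.5 * omega ** 2 * x * x) * h
    return total

def hamiltonian_two_ways(q, t, omega=1.0, eps=1e-4):
    """Return (-dS/dt, p^2/2 + omega^2 q^2/2) with p = dS/dq, by finite differences."""
    p = (action(q + eps, t, omega) - action(q - eps, t, omega)) / (2 * eps)
    h_from_action = -(action(q, t + eps, omega) - action(q, t - eps, omega)) / (2 * eps)
    h_direct = 0.5 * p * p + 0.5 * omega ** 2 * q * q
    return h_from_action, h_direct

print(hamiltonian_two_ways(1.0, 1.0))
```

The two numbers agree to within the finite-difference error, which is the content of the Hamilton-Jacobi equation for this problem.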
Figure 6.13 The paths for a particle in a magnetic field are not perpendicular to
the wave-fronts of constant action S(q, t).
t = 0, and ends at (x, y) = (q1 , q2 ) at time t. The physical orbits of this problem
are circles of radius r0 given by (see Figure 6.13)
\begin{align}
r_0 &= \frac{|\mathbf{q}(t)|}{2\sin(\omega t/2)} \tag{6.169} \\
&= \frac{\sqrt{q_1^2 + q_2^2}}{2\sin(\omega t/2)} \tag{6.170}
\end{align}
which rotate at a frequency $\omega = B/m$. The velocity of the particle is constant
along the trajectory:
|ẋ(t)| = |ẋ(τ )| = ωr0 , (6.171)
where we use τ for the intermediate time of the trajectory connecting the origin
and the final point q(t). The term (x ẏ − y ẋ) for the intermediate time τ can also
be evaluated geometrically by noting that
|x(τ )| = 2r0 sin ωτ/2,
and that the angle between x(τ ) and ẋ(τ ) is ωτ/2:
\[
\dot{x}(\tau)\, y(\tau) - \dot{y}(\tau)\, x(\tau) = 2\omega r_0^2 \sin^2\frac{\omega\tau}{2}. \tag{6.172}
\]
Substituting in the Lagrangian (evaluated at the minimum path) we get
\[
L(\tau) = \frac{1}{2}\omega^2 r_0^2 \cos\omega\tau, \tag{6.173}
\]
Figure 6.14 The physical path $OQP$, where $P$ is beyond the focus $F$ of mirror
$M$, is not a minimum path for a light ray. Consider another, unphysical, nearby
path, $OQ'FP$. Since this path also goes through the focus, it has the same length
as $OQP$. But the segment $Q'P$ is clearly shorter than the broken path $Q'FP$,
and the (unphysical) path $OQ'P$ has a length shorter than that of $OQP$.
8 As a counterexample, consider a one-dimensional infinite potential well, where the particle is free for values
of x in the interval (0, a). There is an infinite number of physical paths that connect the space-time point
6.8 When the Principle of Least Action Loses its “Least” 153
where x0 (t) is the path that minimizes the action S, α a small parameter and φ(t) a
function that satisfies φ(0) = φ(T ) = 0. The second variation of the action is
\[
\delta^2 S = \frac{\alpha^2 m}{2} \int_0^T \left(\dot{\phi}^2 - \omega_0^2 \phi^2\right) dt. \tag{6.194}
\]
Now set, just as we did for the particle on a sphere,9
\[
\phi(t) = C \sin\frac{\pi t}{T}, \tag{6.195}
\]
which satisfies $\phi(0) = \phi(T) = 0$, and $\omega_0 = 2\pi/T_0$.
Substitution of φ into the expression for the second variation gives
\[
\delta^2 S = \frac{\alpha^2 m \omega_0^2 C^2 T}{4}\left[\left(\frac{T_0}{2T}\right)^2 - 1\right]. \tag{6.196}
\]
If T < T0 /2 (before the focus), δ 2 S > 0. But beyond the kinetic focus, for
T > T0 /2, we obtain a negative second variation. We can clearly choose other
variations, with a lot of wiggles, so that the second variation is positive. Hence we
have a saddle beyond the kinetic focus.
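The sign change of the second variation, and the existence of "wiggly" variations that keep it positive, can be illustrated numerically. The sketch below evaluates the integral of equation (6.194) for a single Fourier mode sin(nπt/T) of the kind listed in footnote 9. The parameter `n_mode` and the sample values of T and T0 are our own assumptions for illustration.

```python
import math

def second_variation(T, T0, n_mode=1, m=1.0, alpha=1.0, C=1.0, steps=4000):
    """(alpha^2 m / 2) * integral_0^T (phi'^2 - omega0^2 phi^2) dt
    for phi(t) = C sin(n_mode * pi * t / T), evaluated by the midpoint rule."""
    omega0 = 2.0 * math.pi / T0
    h = T / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        arg = n_mode * math.pi * t / T
        phi = C * math.sin(arg)
        phi_dot = C * (n_mode * math.pi / T) * math.cos(arg)
        total += (phi_dot ** 2 - omega0 ** 2 * phi ** 2) * h
    return 0.5 * alpha ** 2 * m * total

T0 = 1.0
print("before the focus (T = 0.4 T0), n = 1:", second_variation(0.4, T0))
print("beyond the focus (T = 0.6 T0), n = 1:", second_variation(0.6, T0))
print("beyond the focus (T = 0.6 T0), n = 5:", second_variation(0.6, T0, n_mode=5))
```

The first and third values are positive and the second is negative, which is the saddle behaviour described above.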
$P_0 = (x = 0, t = 0)$ with a point $P_f = (x_f, t)$. These correspond to the particle bouncing from the wall $n$
times before reaching $P_f$. For no bounce, the direct path has action $S_0 = m x^2/t_f$; for one bounce the action
is $S_1 = m(2a - x)^2/t_f$; for two bounces $S_2 = m(2a + x)^2/t_f$ and so on.
9 The general form of the variation satisfying the initial and final conditions is $\phi(t) = \sum_{n=1}^{\infty} a_n \sin(\pi n t/T)$, as
considered in Gray and Taylor (2007). However, if our purpose is to show that, beyond the kinetic focus,
there exists a path for which the action is smaller (to second order in α) this simpler function suffices.
Figure 6.17 Elliptical orbits of the same energy and the same focus F, passing
through a common point P. Since the action depends only on the semi-major axis
(equation (6.199)), (P, T) is a kinetic focus of (P, 0).
From equation (6.84) we have $T = 2\pi a^{3/2}/\mu^{1/2}$, and since $E = -m\mu/(2a)$, the
action S for an elliptical orbit that starts and ends at the same spatial point after one
revolution is
\[
S = 3\pi m \sqrt{\mu a}. \tag{6.199}
\]
S is a function of the semi-major axis only and therefore (P, T ) is a kinetic focus.
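That the action over one revolution depends only on the semi-major axis can be checked by direct integration. The sketch below assumes the standard Kepler relations, which are not spelled out in this passage: the orbit r = a(1 − e cos u) in terms of the eccentric anomaly u, the vis-viva relation for the speed, and Kepler's equation for the time. The function name and sample values are ours.

```python
import math

def kepler_action(a, e, mu=1.0, m=1.0, steps=2000):
    """Integrate L = m v^2/2 + m mu/r over one revolution of an ellipse with
    semi-major axis a and eccentricity e, using the eccentric anomaly u."""
    n = math.sqrt(mu / a ** 3)                 # mean motion, 2*pi / period
    h = 2.0 * math.pi / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        r = a * (1.0 - e * math.cos(u))
        v2 = mu * (2.0 / r - 1.0 / a)          # vis-viva relation (assumed)
        lagrangian = 0.5 * m * v2 + m * mu / r
        dt_du = (1.0 - e * math.cos(u)) / n    # from Kepler's equation (assumed)
        total += lagrangian * dt_du * h
    return total

a = 2.0
for e in (0.0, 0.3, 0.7):
    print(e, kepler_action(a, e), 3.0 * math.pi * math.sqrt(1.0 * a))
```

The integral is the same for every eccentricity and matches equation (6.199).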
If one considers a converging wave that has passed through a focus and has then become
divergent, a simple calculation shows that the vibration of that wave has advanced half a
period compared to what it should be according to the distance traveled and the speed of
light.
Since its first description, the Gouy phase has been observed under a wide variety
of circumstances, but the origin of this anomaly continues to be a matter of some
debate (see, for example, Visser (2010) and references therein). In this section,
we present a simple exposition that connects the origin of the Gouy phase to the
minimum versus saddle point problem for the action before and after the focus.
Interestingly, it turns out that Gouy himself had remarked on this point in his 1891
paper. We will consider the simpler case of a focal line (two-dimensional wave
propagation) rather than a focal point, a situation also considered by Gouy, where
the wave advances a quarter of a period rather than half a period.
Consider the superposition of coherent sources of light of wavelength λ located
on an arc of a circle of radius R, as shown in Figure 6.18. For this example, we
will assume that the angle $\alpha_0$ of the arc is small ($\alpha_0 \ll \pi$), but that the length of
the arc is much larger than the wavelength ($R\alpha_0 \gg \lambda$). From each point of the arc
emanates a cylindrical wave whose amplitude decreases as $\sin(k\rho)/\sqrt{\rho}$, where ρ
is the distance to the source, and $k = 2\pi/\lambda$. At the end of the section, we argue that
the extension from an arc to a curved surface is immediate. For simplicity, we will
consider the superposition of all these cylindrical waves evaluated along an axis x
that starts from the center of symmetry of the arc (point O on Figure 6.18) and goes
through the focus R. Along the x axis, the amplitude of the wave originating at O is
$\sin(kx)/\sqrt{x}$. This will be our reference wave. At the focus, the center of the circle,
all the waves have the same phase and add constructively, giving rise to a total
amplitude proportional to $\alpha_0 \sin kR/\sqrt{R}$. We are interested in points along the x
axis at distances d away from the focus that are much larger than the wavelength of
light ($kd \gg 1$), and much smaller than the radius of the arc ($d/R \ll 1$). The second
of these conditions implies that, for points close to the focus, the denominator of
the amplitude can be replaced by $\sqrt{R}$. With this approximation (which was also
considered by Gouy), the amplitude of the wave at a point x is proportional to the
following superposition of waves originating from points within the arc:
\[
I(x) \propto \frac{1}{\sqrt{R}} \int_{-\alpha_0}^{\alpha_0} d\theta\, \sin k\rho_x(\theta) \tag{6.200}
\]
where the distance ρx (θ) is given by
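\begin{align}
\rho_x(\theta) &= \sqrt{R^2 + d^2 \pm 2dR\cos\theta} \notag \\
&\simeq \sqrt{R^2 \pm 2dR\cos\theta} \notag \\
&\simeq R \pm d\cos\theta, \tag{6.201}
\end{align}
and $x = R \pm d$. The plus (minus) sign corresponds to a point after (before) the
focus. Since the angle of the arc is small we can replace $\cos\theta \simeq 1 - \theta^2/2$, and
obtain
\[
\rho_x(\theta) =
\begin{cases}
x + \dfrac{d\theta^2}{2} & \text{(before the focus)} \\[1ex]
x - \dfrac{d\theta^2}{2} & \text{(after the focus).}
\end{cases} \tag{6.202}
\]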
The amplitude at a point x before (after) the focus is the result of a superposition
of waves that have longer (shorter) path lengths than the “reference” wave sin kx.
This difference, rooted in the different nature of the stationary points before and
after the focus, is behind the dephasing of the wave as it passes through the focus
(see Figure 6.18). Before the focus, the amplitude results from the superposition
of phases whose optical paths are larger than the reference optical path x. The
converse happens after the focus. The precise value of the phase shift requires
evaluating the integral for I (x):
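\begin{align}
I(x) &\propto \frac{1}{\sqrt{R}} \int_{-\alpha_0}^{\alpha_0} d\theta\, \sin\left(kx \pm \frac{kd\theta^2}{2}\right) \tag{6.203} \\
&= \frac{1}{\sqrt{R}} \sqrt{\frac{2}{kd}} \int_{-\alpha_0\sqrt{kd/2}}^{\alpha_0\sqrt{kd/2}} dz\, \sin(kx \pm z^2) \tag{6.204} \\
&\simeq \frac{1}{\sqrt{R}} \sqrt{\frac{2}{kd}} \int_{-\infty}^{\infty} dz\, \sin(kx \pm z^2), \tag{6.205}
\end{align}
where the upper (lower) sign now refers to a point before (after) the focus, as in
equation (6.202), and where we used the condition $kd \gg 1$ to boldly extend the limits of integration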
to plus and minus infinity. Expanding the sine function, we obtain
\[
I(x) \propto \frac{1}{\sqrt{R}} \sqrt{\frac{2}{kd}} \left[\sin kx \int_{-\infty}^{\infty} dz\, \cos z^2 \pm \cos kx \int_{-\infty}^{\infty} dz\, \sin z^2\right]. \tag{6.206}
\]
Figure 6.18 Schematic rendition of the origin of the Gouy phase shift. Coherent
sources originating on the arc C interfere constructively at the focus R (thick line
wave). At a distance d away from the focus, the superposition is between waves
of different path lengths. At a point a before the focus, waves (represented by
the solid line) originating at an arbitrary point P have longer path lengths than
the wave originating at O (dashed line, or our reference wave). Since Pa > Oa,
the superposition gives a resultant wave shifted to the left of the reference wave.
The converse happens after the focus, where Pb < Ob, and the resulting wave is
shifted to the right. (The figure is out of scale with respect to the analysis in the
text, where $R \gg d \gg \lambda$ is assumed.)
This expression was obtained by Gouy in his 1891 paper (Gouy (1891), p. 192).
The integrals in equation (6.206) are the so-called Fresnel integrals, which, for
enigmatic reasons, are equal:
\[
\int_{-\infty}^{\infty} dz\, \sin z^2 = \int_{-\infty}^{\infty} dz\, \cos z^2 = \sqrt{\frac{\pi}{2}}, \tag{6.207}
\]
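and the equality of these two integrals (and not their precise value) is what gives
rise to the shift in π/2.
Using the trigonometric identity $\sin kx \pm \cos kx = \sqrt{2}\sin(kx \pm \pi/4)$, we obtain
our final result:
\[
I(x) \propto \frac{1}{\sqrt{R}} \sqrt{\frac{2\pi}{kd}}
\begin{cases}
\sin\left(kx + \frac{\pi}{4}\right) & \text{(before the focus)} \\[1ex]
\sin\left(kx - \frac{\pi}{4}\right) & \text{(after the focus).}
\end{cases} \tag{6.208}
\]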
For this one-dimensional arc we obtain a phase difference of π/2 for the waves
before and after the focus. Gouy (1891) points out that everything happens as
though the waves, in the vicinity of the focal line, propagate at a greater veloc-
ity, gaining an advance of λ/2 over a usual plane wave that displaces on the same
line. Of course, this is an advance that refers to the phase velocity; the group velocity
is never higher than the speed of light in vacuum. Otherwise, we could send
superluminal messages across the focus! Notice also the large factor $\sqrt{kd}$ in the
denominator of the resulting amplitude (also obtained by Gouy), meaning that, due
to cancellations, the amplitude is much smaller than the amplitude at the focus, as
expected. If we consider an ellipsoidal surface rather than a circle, with two foci,
we will see two phase changes, each of them of π/2. For a single focus and a
two-dimensional surface, the total phase change is π.
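The π/2 shift for the focal line rests only on the two Fresnel integrals of equation (6.207) being equal. The sketch below is a rough numerical illustration: it truncates the integrals at a finite cutoff Z (our assumption, standing in for the infinite limits of (6.205)), so the values approach √(π/2) only to within roughly 1/Z. The phases before and after the focus are then read off from equation (6.206).

```python
import math

def fresnel_pair(Z=50.0, steps=100000):
    """Midpoint-rule estimates of the integrals of cos(z^2) and sin(z^2)
    over [-Z, Z]; both approach sqrt(pi/2) as the cutoff Z grows."""
    h = Z / steps
    c = 0.0
    s = 0.0
    for k in range(steps):
        z = (k + 0.5) * h
        c += math.cos(z * z)
        s += math.sin(z * z)
    return 2.0 * c * h, 2.0 * s * h   # both integrands are even in z

def phase_shift(sign, c, s):
    """Phase delta in  c*sin(kx) + sign*s*cos(kx) = A*sin(kx + delta)."""
    return math.atan2(sign * s, c)

c, s = fresnel_pair()
before = phase_shift(+1, c, s)   # upper sign of (6.206): before the focus
after = phase_shift(-1, c, s)    # lower sign of (6.206): after the focus

print("cos and sin integrals:", c, s, " target:", math.sqrt(math.pi / 2))
print("phase difference:", before - after, " target:", math.pi / 2)
```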
6.8.6 Caustics
In addition to the focusing phenomena, where a family of reflecting or refracting
rays (or paths) converges to a single point, there is a more general focusing mecha-
nism where the rays concentrate on surfaces or lines rather than points. These focal
surfaces are called “caustics” (from the Greek word meaning capable of burning),
because they are the places where light is most intense. Just as in the focal point,
the action is a minimum between the source and the caustic, and a saddle after
the caustic is crossed. Caustics are observed in everyday phenomena like rainbows
(Boyer, 1987), mirages (Young, 2012), and light reflected on the surface of water
(Berry, 2015).
As an illustration, consider rays propagating in two dimensions, giving rise to a
caustic line. Rays originate at a point M = x1 , reflect at a point P of a mirror, and
reach an “eye” A = x2 , as in Figure 6.1. If we call ρ1 = M P and ρ2 = P A , the
action, or optical path, is S = ρ1 + ρ2 . If the mirror is generated by a vector x(s),
with s the arc length, we have:
S = |x1 − x(s)| + |x2 − x(s)| ≡ ρ1 + ρ2 . (6.209)
In order to find the point P where the reflection takes place, we compute the first
variation of the action by shifting infinitesimally the reflection point along the
mirror:
where we used d t̂/ds = κ n̂, with κ the curvature of the mirror at P, and n̂ the
normal to the mirror, as in Figure 6.11. Using ûi = (xi − x(s)) / |xi − x(s)| we
obtain:
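\[
\frac{d\hat{\mathbf{u}}_i}{ds} = \frac{1}{\rho_i}\left[-\hat{\mathbf{t}} + (\hat{\mathbf{u}}_i \cdot \hat{\mathbf{t}})\,\hat{\mathbf{u}}_i\right]. \tag{6.213}
\]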
Substituting expression (6.213) in equation (6.212), and using û1 · t̂ = −û2 · t̂ =
sin θ we obtain:
\[
\frac{d^2 S}{ds^2} = \cos\theta\left[\cos\theta\left(\frac{1}{\rho_1} + \frac{1}{\rho_2}\right) - 2\kappa\right]. \tag{6.214}
\]
Let us analyze equation (6.214) for a fixed point of reflection, taking ρ1 and θ
constant, and varying ρ2 , that is, walking away from the reflection point along
the direction of the reflected ray. For very small $\rho_2$, the $1/\rho_2$ term dominates
and $d^2S/ds^2 > 0$. As we increase $\rho_2$, we could reach a critical point where
$d^2S/ds^2 = 0$. If we keep increasing $\rho_2$, then $d^2S/ds^2 < 0$, and the action is a
saddle. The caustic for a mirror in two dimensions is therefore described by the
following equation:
\[
\cos\theta\left(\frac{1}{\rho_1} + \frac{1}{\rho_2}\right) - 2\kappa = 0. \tag{6.215}
\]
It is clear that different positions of the originating light rays with respect to the
mirror will give rise to different caustics.
As a particular example, let us consider parallel rays incident on a circle of radius
R: ρ1 → ∞ and κ = 1/R (see Figure 6.19). The caustic generated is the curve
usually seen at the bottom of a cup of coffee. From equation (6.215), the equation
for the caustic of the cup of coffee is (renaming ρ2 = ρ)
\[
\rho(\theta) = \frac{R}{2}\cos\theta. \tag{6.216}
\]
For rays incident vertically, the coordinate of the reflection point is $\mathbf{R}(\theta) = R(\sin\theta, -\cos\theta)$
(see Figure 6.19). Since the reflected ray forms an angle $2\theta$ with
the incident ray, the vector $\boldsymbol{\rho}(\theta)$ connecting the reflection point with the point in
the caustic is $\boldsymbol{\rho}(\theta) = \frac{R}{2}\cos\theta\,(-\sin 2\theta, \cos 2\theta)$. The parametric equations of the
caustic, which we plot in Figure 6.19, are then:
\begin{align}
x &= \frac{R}{2}\,(2\sin\theta - \cos\theta \sin 2\theta), \tag{6.217a} \\
y &= \frac{R}{2}\,(\cos\theta \cos 2\theta - 2\cos\theta). \tag{6.217b}
\end{align}
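The parametric form of the coffee-cup caustic can be checked in two ways: by rebuilding each point as the reflection point plus the vector of length ρ(θ) along the reflected ray, and by verifying that the curve is tangent to the reflected ray at that point, which is what makes it an envelope. The sketch below does both. The function names and the finite-difference tangency test are our own additions.

```python
import math

def caustic_point(theta, R=1.0):
    """Point of the caustic from equations (6.217a) and (6.217b)."""
    x = 0.5 * R * (2.0 * math.sin(theta) - math.cos(theta) * math.sin(2.0 * theta))
    y = 0.5 * R * (math.cos(theta) * math.cos(2.0 * theta) - 2.0 * math.cos(theta))
    return x, y

def caustic_from_vectors(theta, R=1.0):
    """Same point built as the reflection point plus rho(theta) along the reflected ray."""
    rho = 0.5 * R * math.cos(theta)                       # equation (6.216)
    px, py = R * math.sin(theta), -R * math.cos(theta)    # reflection point
    return px - rho * math.sin(2.0 * theta), py + rho * math.cos(2.0 * theta)

def tangency_residual(theta, R=1.0, h=1e-6):
    """Cross product of the caustic's tangent with the reflected-ray direction;
    it vanishes when the ray is tangent to the caustic."""
    x1, y1 = caustic_point(theta + h, R)
    x0, y0 = caustic_point(theta - h, R)
    tx, ty = (x1 - x0) / (2.0 * h), (y1 - y0) / (2.0 * h)
    dx, dy = -math.sin(2.0 * theta), math.cos(2.0 * theta)
    return tx * dy - ty * dx

print(caustic_point(0.0))          # cusp of the caustic, halfway between centre and rim
print(tangency_residual(0.4))
```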
As a second exactly solvable example, we move away from light rays and
consider motion in one dimension for a particle in a potential of the form:
V (x) = λ|x|. (6.218)
This example was considered by Gray and Taylor (2007); see also Gray (2009).
The motion corresponds to a constant force, like constant gravity, that reverses
direction at x = 0.
For a particle starting at x = 0 at t = 0 and initial velocity ẋ(0) = v0 , the orbit
is a parabola curving downwards:
\[
x(t) = v_0 t - \frac{1}{2}\lambda t^2, \qquad t < t_0 = \frac{2v_0}{\lambda}. \tag{6.219}
\]
At t = t0 , when the particle comes back to the origin, the force reverses direction,
and the orbit becomes an upward curving parabola:
\[
x(t) = v_0 (t_0 - t) + \frac{1}{2}\lambda (t - t_0)^2, \qquad (t_0 < t < 2t_0). \tag{6.220}
\]
The motion is periodic, with half period $t_0 = 2v_0/\lambda$ and amplitude $\lambda t_0^2/8$. For a
given orbit, there is a critical point at $t_C = 4t_0/3$ (see Figure 6.20): for $t < t_C$ the
action is a minimum, and for $t > t_C$ the action is a saddle. The caustic is described
by the curve,
\[
x_C(t) = -\frac{1}{16}\lambda t^2. \tag{6.221}
\]
Figure 6.20 Caustic for a particle moving in the potential $V(x) = \lambda|x|$, for initial
conditions $x(0) = 0$, $\dot{x}(0) = \lambda t_0/2$.
In Appendix F we derive the caustic for one-dimensional motion treating the
second variation as a quantum mechanical problem.
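The caustic of equation (6.221) can be recovered from the orbits (6.219) and (6.220) by an envelope argument: at a fixed time t, neighbouring orbits with slightly different initial velocity cross where ∂x/∂v0 = 0. This is our own way of locating the kinetic focus, used here only as a cross-check and assuming unit mass; on the second branch it gives v0 = 3λt/8, that is, t = 4t0/3.

```python
def position(t, v0, lam=1.0):
    """Orbit of equations (6.219) and (6.220) for 0 <= t <= 2*t0 (unit mass)."""
    t0 = 2.0 * v0 / lam
    if t < t0:
        return v0 * t - 0.5 * lam * t * t
    return v0 * (t0 - t) + 0.5 * lam * (t - t0) ** 2

def envelope_point(t, lam=1.0):
    """Initial velocity and position at which orbits arriving at time t
    touch the envelope (d x / d v0 = 0 on the second branch)."""
    v0 = 3.0 * lam * t / 8.0
    return v0, position(t, v0, lam)

def dx_dv0(t, v0, lam=1.0, h=1e-6):
    """Finite-difference sensitivity of the position to the initial velocity."""
    return (position(t, v0 + h, lam) - position(t, v0 - h, lam)) / (2.0 * h)

t = 2.0
v0, x_env = envelope_point(t)
print("t / t0         :", t / (2.0 * v0))       # 4/3
print("envelope point :", x_env)
print("-lambda t^2/16 :", -t * t / 16.0)
print("dx/dv0         :", dx_dv0(t, v0))
```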
7
Relativity and Least Action
1. The laws of physics take the same form for “all reference frames for which the
equations of mechanics hold good” (inertial frames).
2. Light always propagates in empty space with a velocity c which is independent
of the state of motion of the emitting body.
7.1 Simultaneity and the Relativity of Time 163
It is remarkable that the equations that Einstein derives existed before his work.
In 1895, the Dutch physicist Hendrik A. Lorentz, in order to explain some experiments
by Michelson and Morley, wrote a set of equations (identical to Einstein’s)
in which time appeared as a mathematical variable that depended on velocity and
position. Lorentz distinguished between a true time (the one measured by a clock
at rest) and local time (the one dependent on the location of an event). The crucial
point is that Lorentz considered the local time a mere mathematical fiction used
to simplify an equation. Einstein accepts that fiction as real – a “suspension of
disbelief” of sorts – and incorporates it into his relativistic universe.
In the next few sections, we discuss Einstein’s special relativity paper with a
focus on its bearing on the principle of least action. In the second part of the chap-
ter, we visit the general theory, where the principle of least action is realized as
a “principle of maximal (or extremal) aging:” the paths followed by particles in
curved space-time are geodesics which extremize the time on the wristwatch of a
traveler following these paths.
Figure 7.1 Viewed from the rest frame K , the distance traveled by the ray (thick
line) is ct = 2ct0 − vt , and t = 2ct0 /(c + v).
mined. Next he looks for a linear relation between each of the spatial coordinates
ξ, η, ζ of the moving frame and x, y, z, t of the stationary frame. For points with
y = z = 0, let us write the linear relation for the ξ coordinate as ξ = b1 x + b2 t.
Since the origin of coordinates of the moving frame (ξ = 0) has coordinates
$x = vt$ in the stationary frame, we have $b_2/b_1 = -v$, or $\xi = b(v)(x - vt)$,
with $b(v)$ a constant to be determined. Now, since the coordinate of a light ray
is $\xi = c\tau$ in the moving frame and $x = ct$ in the stationary frame, we have
$b(v)(c - v)t = c\,a(v)(t - vt/c)$, implying $a(v) = b(v)$: $\xi(x, 0, 0, t) = a(v)(x - vt)$.
In order to obtain the relations between the coordinates perpendicular to the
motion in the moving frame and x, y, z, t, Einstein considers a light ray moving in
the η direction in the moving frame: η = cτ . Viewed from the rest frame, the ray
follows a tilted path, as shown in Figure 7.2.
We write the linear relation between η and the coordinates in the rest frame as
η = d1 x + d2 y + d3 z + d4 t. Consider the following three events (see Figure 7.2):
Figure 7.2 A ray moving vertically in the moving frame follows a tilted path as
viewed from the rest frame.
“1”: the ray is emitted vertically (in the moving frame) at $x = y = z = t = 0$; “2”:
the ray reaches a point with coordinates $y = \sqrt{c^2 - v^2}\,t$, $x = vt$, at time $t$; “3”: the
ray is back at the origin ($x = 2vt$) at time $2t$. Using equation (7.3) we now have
\[
\frac{1}{2}\left(\tau(0, 0, 0, 0) + \tau(2vt, 0, 0, 2t)\right) = \tau\left(vt, \sqrt{c^2 - v^2}\,t, 0, t\right). \tag{7.5}
\]
2
Since we already know the dependence of τ on x and t for y = 0, using
τ (x, y, 0, t) = a(v)(t − vx/c2 ) + α2 y, we obtain α2 = 0: the moving clocks,
viewed from the stationary frame, remain synchronized in the y direction. Repeating
the same argument for a ray in the ζ direction, we obtain that τ is independent
of the coordinates perpendicular to the direction of motion.
In order to obtain a relation between y and the vertical coordinate η, we write:
\begin{align}
\eta = c\tau &= c\,a(v)\left(t - \frac{v}{c^2}\,x\right) = c\,a(v)\left(1 - \frac{v^2}{c^2}\right)t \tag{7.6} \\
&= a(v)\sqrt{1 - \frac{v^2}{c^2}}\; y, \tag{7.7}
\end{align}
where in equation (7.6) we used $x = vt$ and in equation (7.7)
we used (see Figure 7.2) $y = \sqrt{c^2 - v^2}\,t$. An identical argument gives $\zeta = a(v)\sqrt{1 - v^2/c^2}\,z$.
Einstein switches notation from the factor $a(v)$ to $\phi(v) = a(v)\sqrt{1 - v^2/c^2}$ and
writes the following relation for the transformed coordinates:
For the second step, he considers a rod of length l (measured in the moving system)
and oriented in the direction η, perpendicular to the motion. From equation (7.8c),
the length of the rod measured in the stationary system is y = l/φ(v). For “reasons
of symmetry,” the length of the rod in the stationary system has to be the same for
the frame k moving in the +x or −x direction: l/φ(v) = l/φ(−v), from which
it follows that φ(v) = φ(−v), and, together with equation (7.9), Einstein obtains
φ(v) = 1. His final expression for the transformation of coordinates is:
τ = γ (t − vx/c2 ), (7.10a)
ξ = γ (x − vt), (7.10b)
η = y, (7.10c)
ζ = z. (7.10d)
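Two properties of the transformation (7.10) are easy to confirm numerically: a light ray x = ct in the stationary frame satisfies ξ = cτ in the moving frame, and the combination x² + y² + z² − c²t² is unchanged. The sketch below assumes the usual Lorentz factor γ = 1/√(1 − v²/c²) and units with c = 1 by default; the function names are ours.

```python
import math

def lorentz(t, x, y, z, v, c=1.0):
    """Equations (7.10a)-(7.10d): coordinates (tau, xi, eta, zeta) in the moving frame."""
    gamma = 1.0 / math.sqrt(1.0 - v * v / (c * c))
    tau = gamma * (t - v * x / (c * c))
    xi = gamma * (x - v * t)
    return tau, xi, y, z

def interval(t, x, y, z, c=1.0):
    """x^2 + y^2 + z^2 - c^2 t^2, which vanishes along a light ray."""
    return x * x + y * y + z * z - (c * t) ** 2

# A light ray along x: the event (t, x) = (2, 2) with c = 1.
tau, xi, eta, zeta = lorentz(2.0, 2.0, 0.0, 0.0, v=0.6)
print("tau, xi on the light ray:", tau, xi)

# A generic event: the interval is the same in both frames.
event = (1.0, 0.3, 0.4, -0.2)
print(interval(*event), interval(*lorentz(*event, v=0.6)))
```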
Einstein says, “we have thus shown that . . . the electrodynamic foundation of
Lorentz’s theory . . . agrees with the principle of relativity” (Einstein, 1952, p. 60).
Einstein uses these transformed fields to derive the relativistic version of
Newton’s second law. He considers the motion of a charged particle in an elec-
tromagnetic field. He wants to find out how the particle would accelerate under
the action of an electromagnetic force. Can we still apply F = m ẍ? The answer is
no. For example, under the action of a constant force, according to Newton’s law a
particle will increase its velocity indefinitely, whereas the transformation equations
(7.10) set a limit to the velocity: at v = c, time, viewed from the stationary sys-
tem, will stand still, and “all moving objects shrivel up into plane figures, . . . the
velocity of light plays the part, physically, of an infinitely great velocity” (Einstein,
1952, p. 48).
Consider a particle accelerated under the action of an electromagnetic force.
Einstein starts with the assumption that Newton’s second law is valid when the
particle is instantaneously at rest and is acted upon by an electric field E: m ẍ = eE,
with e the charge of the electron. Next he assumes, without loss of generality, that at
a certain instant t the (accelerated) particle is moving at velocity v in the x direction
as viewed from the stationary frame K . Since the particle is instantaneously at rest
in another frame k (moving at a velocity v with respect to K ), he writes:
\begin{align}
m\,\frac{d^2 x'}{dt'^2} &= eE'_x, \tag{7.12a} \\
m\,\frac{d^2 y'}{dt'^2} &= eE'_y, \tag{7.12b} \\
m\,\frac{d^2 z'}{dt'^2} &= eE'_z, \tag{7.12c}
\end{align}
where the primes refer to the coordinates and fields measured by an observer in
k. Since the motion of k with respect to K is in the x direction, we have $x' = \gamma(x - vt)$,
$y' = y$, $z' = z$, $t' = \gamma(t - vx/c^2)$, and the primed fields given by
equations (7.11).
Since $dx' = \gamma(dx - v\,dt)$ and $dt' = \gamma(dt - v\,dx/c^2)$, we have:
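\[
\frac{dx'}{dt'} = \frac{\dot{x} - v}{1 - v\dot{x}/c^2}, \tag{7.13}
\]
and
\begin{align}
\frac{d^2 x'}{dt'^2} &= \frac{1}{dt'/dt}\,\frac{d}{dt}\left(\frac{dx'}{dt'}\right) \notag \\
&= \frac{1}{\gamma(1 - v\dot{x}/c^2)}\;\frac{\ddot{x}(1 - v\dot{x}/c^2) + (\dot{x} - v)\,v\ddot{x}/c^2}{(1 - v\dot{x}/c^2)^2}. \tag{7.14}
\end{align}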
Now, since we are considering an instant for which, in the unprimed frame, the
particle is moving at velocity $\dot{x} = v$, we have $(1 - v\dot{x}/c^2) = 1/\gamma^2$, and Einstein
writes
\[
m\,\frac{d^2 x'}{dt'^2} = \gamma^3 m\ddot{x}, \qquad m\,\frac{d^2 y'}{dt'^2} = \gamma^2 m\ddot{y}, \qquad m\,\frac{d^2 z'}{dt'^2} = \gamma^2 m\ddot{z}. \tag{7.15}
\]
Using equations (7.15), together with the transformed fields of equation (G.8c)
Einstein obtains:
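\begin{align}
\gamma^3 m\ddot{x} &= eE_x, \tag{7.16a} \\
\gamma m\ddot{y} &= eE_y - e\,\frac{v}{c}\,B_z, \tag{7.16b} \\
\gamma m\ddot{z} &= eE_z + e\,\frac{v}{c}\,B_y, \tag{7.16c}
\end{align}
where we are using the notation $\ddot{x} = d^2x/dt^2$ etc. Einstein remarks that, if one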
insists on using Newton’s expression, where force equals mass times acceleration,
then special relativity gives a “longitudinal mass” (the mass in the direction of the
instantaneous velocity) of magnitude $\gamma^3 m$ and a “transverse mass” equal to $\gamma m$ for
the acceleration transverse to the velocity. And he remarks that “with a different
definition of force and acceleration we should naturally obtain other values for the
masses” (Einstein, 1952, p. 68). In a footnote – most probably due to Sommerfeld
– to the 1905 article included in the collection The Principle of Relativity (Einstein,
1952), we read “The definition of force here given is not advantageous, as was first
shown by Planck.” In fact, in the first paper on relativity by someone other than
Einstein, Max Planck (1907) wrote the relativistic version of Newton’s second law
where the force is proportional to the rate of change of the momentum mγ ẋ and
derived the relativistic Lagrangian.
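Following Planck, we decompose the acceleration into its component parallel to
the velocity, given by $(\ddot{\mathbf{x}} \cdot \hat{\mathbf{v}})\hat{\mathbf{v}}$, and the transverse component, given by $\ddot{\mathbf{x}} - (\ddot{\mathbf{x}} \cdot \hat{\mathbf{v}})\hat{\mathbf{v}}$,
with $\hat{\mathbf{v}} = \dot{\mathbf{x}}/v$ a unit vector in the direction of the velocity. Using these definitions,
the left-hand side of equations (7.16) can be written (omitting the factor m) in
vector form:
\begin{align}
\gamma^3 (\ddot{\mathbf{x}} \cdot \hat{\mathbf{v}})\hat{\mathbf{v}} + \gamma\left[\ddot{\mathbf{x}} - (\ddot{\mathbf{x}} \cdot \hat{\mathbf{v}})\hat{\mathbf{v}}\right] &= \gamma\ddot{\mathbf{x}} + \frac{1}{c^2}\gamma^3 (\ddot{\mathbf{x}} \cdot \dot{\mathbf{x}})\dot{\mathbf{x}} \tag{7.17} \\
&= \frac{d}{dt}\,\gamma\dot{\mathbf{x}}, \tag{7.18}
\end{align}
where we used $\gamma^3 - \gamma = \gamma^3 v^2/c^2$, and $\gamma = 1/\sqrt{1 - \dot{\mathbf{x}}^2/c^2}$.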
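The step from equation (7.17) to equation (7.18) can be checked numerically by differentiating γẋ along a short stretch of uniformly accelerated velocity. The sketch below uses plain Python lists as three-vectors and units with c = 1 by default; the helper names and the sample velocity and acceleration are our own choices.

```python
import math

def gamma(v, c=1.0):
    """Lorentz factor for a velocity three-vector v."""
    speed2 = sum(vi * vi for vi in v)
    return 1.0 / math.sqrt(1.0 - speed2 / (c * c))

def rhs_7_17(v, a, c=1.0):
    """gamma*a + gamma^3 (a.v) v / c^2, the right-hand side of equation (7.17)."""
    g = gamma(v, c)
    a_dot_v = sum(ai * vi for ai, vi in zip(a, v))
    return [g * ai + g ** 3 * a_dot_v * vi / (c * c) for ai, vi in zip(a, v)]

def d_gamma_v_dt(v, a, c=1.0, h=1e-6):
    """Central finite difference of gamma*v along v(t) = v + a*t at t = 0."""
    plus = [vi + ai * h for vi, ai in zip(v, a)]
    minus = [vi - ai * h for vi, ai in zip(v, a)]
    gp, gm = gamma(plus, c), gamma(minus, c)
    return [(gp * p - gm * q) / (2.0 * h) for p, q in zip(plus, minus)]

v = [0.3, 0.4, 0.1]
a = [0.2, -0.1, 0.5]
print(rhs_7_17(v, a))
print(d_gamma_v_dt(v, a))

# Acceleration parallel to the velocity picks up gamma^3 ("longitudinal mass"),
# acceleration perpendicular to it picks up gamma ("transverse mass").
print(rhs_7_17([0.6, 0.0, 0.0], [1.0, 0.0, 0.0]), gamma([0.6, 0.0, 0.0]) ** 3)
print(rhs_7_17([0.6, 0.0, 0.0], [0.0, 1.0, 0.0]), gamma([0.6, 0.0, 0.0]))
```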
7.2 The Relativistic “F = ma” 169
This structure of $(\mathbf{p}, E/c)$ implies that the four-vector transforms as $(d\mathbf{x}, c\,dt)$ upon
change in coordinates. More precisely, if $(p_x, p_y, p_z, E/c)$ is the four-momentum
in frame K, the components of the vector $(p'_x, p'_y, p'_z, E'/c)$ viewed from a moving
frame in the x direction are
\begin{align}
\frac{E'}{c} &= \gamma\left(\frac{E}{c} - \beta p_x\right) \tag{7.29a} \\
p'_x &= \gamma\left(p_x - \beta\,\frac{E}{c}\right) \tag{7.29b} \\
p'_y &= p_y \tag{7.29c} \\
p'_z &= p_z, \tag{7.29d}
\end{align}
with β = v/c. Just as in the case of the invariants of equation (7.24), we have the
invariant “length” of the energy-momentum vector
\[
\mathbf{p}^2 - \frac{E^2}{c^2} = \text{Invariant} = -(mc)^2. \tag{7.30}
\]
The invariant, the “proper length” of the four momentum, is its value in a frame
when the particle is at rest (p = 0), which leads us to perhaps the most famous
equation in the world:
\[
E_0 = mc^2, \tag{7.31}
\]
with E 0 the energy of the particle at rest. For a particle with velocity v, or
momentum p the energy is
\[
E = \frac{mc^2}{\sqrt{1 - v^2/c^2}} = \sqrt{(cp)^2 + (mc^2)^2}. \tag{7.32}
\]
The mass of the particle emerges as a relativistic invariant, a “scalar” of the theory.
It is common in many textbooks to interpret equation (7.19), F = d(mγ ẋ)/dt, in
terms of a mass that increases with the velocity. It is more consistent with relativity
(Taylor and Wheeler, 1999, pp. 246–252) to refer to m as an invariant (Okun, 1989,
2009) and speak of a momentum that is not related linearly with the velocity.
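The invariance expressed by equation (7.30) can be checked with the transformation (7.29) in a few lines. The sketch below restricts the motion to the x direction and takes E = γmc² and p = γmv, as in equation (7.32) and the momentum mγẋ quoted above; the function names and numerical values are our own.

```python
import math

def four_momentum(m, v, c=1.0):
    """(E/c, p_x) for a particle of mass m moving along x with velocity v."""
    g = 1.0 / math.sqrt(1.0 - v * v / (c * c))
    return g * m * c, g * m * v

def boost(e_over_c, px, beta):
    """Equations (7.29a) and (7.29b) for a frame moving along x with beta = v/c."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    return g * (e_over_c - beta * px), g * (px - beta * e_over_c)

def invariant(e_over_c, px):
    """p^2 - E^2/c^2, equal to -(m c)^2 in every frame by equation (7.30)."""
    return px * px - e_over_c * e_over_c

m = 2.0
e_over_c, px = four_momentum(m, 0.5)
print("lab frame :", invariant(e_over_c, px))
print("boosted   :", invariant(*boost(e_over_c, px, 0.8)))

# Boosting to the particle's own rest frame leaves p = 0 and E/c = m c.
print("rest frame:", boost(e_over_c, px, 0.5))
```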
\begin{align}
A'_y &= A_y, \tag{7.34c} \\
A'_z &= A_z, \tag{7.34d}
\end{align}
and h is, reasons Planck, a relativistic invariant. “It is evident that because of this
theorem the significance of the principle of least action is extended in a new direc-
tion” (Planck, 1907). We will return to this discussion in Chapter 8. For Planck,
the principle of least action, “by its form and comprehensiveness, may be said
to have approached most closely to the ideal aim of theoretical inquiry” (Planck,
1909/1915, p. 69) and attains its full clarity in the context of relativity where the
principle “contains all four world coordinates in fully symmetrical order” (Planck,
1910).
the Special Theory of Relativity which establishes that information cannot be prop-
agated faster than the speed of light. In addition, in Newtonian mechanics, gravity
is a peculiar force in the following sense. On the one hand, Newton’s second law
states that
Force = Mass × Acceleration, (7.46)
where this “Mass” is the inertial mass $m_I$ of the body, which expresses the extent to
which it resists a change in motion. On the other hand, the gravitational attraction
on the same body due to a second body of mass M gives rise to a force
Gm G M
, (7.47)
r2
where m G is the gravitational mass of the body. The acceleration of this body
will be
mG G M
Acceleration = . (7.48)
mI r2
Now, from Galileo’s experiments, and to current experimental accuracy
mG = m I .
This means that, given some initial conditions, the motion of a body under the
action of gravitational forces is independent of its nature. This is called the “weak
equivalence principle”: the dynamics of the particle is specified by a single world
line. This suggests that we can locally eliminate gravity by transforming to a mov-
ing frame that is in free fall with the particle. For example, consider that you are in
an elevator in free fall. Since all the particles feel the same acceleration, the effect
of gravity has disappeared. A change of frame of reference has eliminated grav-
ity. Gravity must therefore be a fictitious force that arises in non-inertial frames of
reference.
We will examine the principle of equivalence for an elevator in free (vertical)
fall assuming that the gravitational field is constant. Later we will see that this is a
limited approximation: if it were valid, bodies could be accelerated indefinitely to
infinite momenta. Consider two clocks A and B in the elevator separated by a distance
$h$ in the vertical direction. The two clocks have velocities $v_A$ and $v_B$ as they pass
successively very close to a clock C at rest. We apply special relativity to relate
the time between clicks of the different clocks. As shown in the previous chapter,
for an observer at rest with clock C, clock A runs slow: the time $\Delta t_A$ between its
clicks is longer than the time $\Delta t_C$ between the clicks of her own clock,
$$\Delta t_A = \frac{\Delta t_C}{\sqrt{1 - v_A^2/c^2}} \simeq \Delta t_C\left(1 + \frac{1}{2}\frac{v_A^2}{c^2}\right), \tag{7.49}$$
7.4 The Principle of Equivalence 175
and
$$\Delta t_B = \frac{\Delta t_C}{\sqrt{1 - v_B^2/c^2}} \simeq \Delta t_C\left(1 + \frac{1}{2}\frac{v_B^2}{c^2}\right), \tag{7.50}$$
rigid rod, those times would be different upon arrival at the position of a distant
observer. For the distant observer, the velocity at x, says Einstein, will be given by
the relation
$$c(x) = c_0\left(1 + \frac{\Phi(x)}{c_0^2}\right), \tag{7.55}$$
with $\Phi(x)$ the gravitational potential and $c_0$ the velocity of light in vacuum. (For a
spherically symmetric star, $\Phi = -GM/r$ where $r$ is the distance to the center of
the star.)
$$\frac{\nabla n}{n} \approx \nabla n = -\frac{GM}{c^2 r^3}\left(\hat{\imath}\,x + \hat{\jmath}\,y\right). \tag{7.58}$$
Substituting equation (7.58) in equation (7.56) we obtain:
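$$\int_0^\infty ds\,\frac{d\hat{t}}{ds} \equiv \hat{t}_\infty - \hat{\imath} = -\hat{\jmath}\int_0^\infty ds\,y\,\frac{GM}{c^2 r^3}, \tag{7.59}$$
which coincides with Einstein’s integral for the deflection. Since the deflection is
small, we can use $ds = dx$, $y = R$ and $r = \sqrt{R^2 + x^2}$. Also, $\hat{t}_\infty - \hat{\imath} \approx -\theta\,\hat{\jmath}$, and we obtain
Einstein’s expression for the total deflection, $\alpha = 2\theta$:
$$\alpha = 2\,\frac{GMR}{c^2}\int_0^\infty \frac{dx}{\left(R^2 + x^2\right)^{3/2}} = \frac{2GM}{c^2 R}. \tag{7.60}$$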
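The integral in equation (7.60) can be checked numerically. The sketch below is only an illustration under stated assumptions: the function name, the midpoint rule, and the substitution $x = R\tan u$ (used to map the infinite range onto a finite one) are choices made here, not part of the original derivation.

```python
import math


def deflection_integral(R, n=20_000):
    """Midpoint-rule estimate of the integral of dx / (R^2 + x^2)^(3/2) over [0, inf).

    Assumption of this sketch: the substitution x = R*tan(u) maps the range
    onto u in [0, pi/2), with Jacobian dx = R / cos(u)^2 du.
    """
    h = (math.pi / 2.0) / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        x = R * math.tan(u)
        jacobian = R / math.cos(u) ** 2
        total += jacobian / (R * R + x * x) ** 1.5
    return total * h


if __name__ == "__main__":
    R = 2.0  # arbitrary impact parameter, arbitrary units
    # The analytic value of the integral is 1/R^2, so this product should be close to 1.
    print(R * R * deflection_integral(R))
```

With the integral equal to $1/R^2$, the prefactor $2GMR/c^2$ gives $\alpha = 2GM/(c^2R)$, as in equation (7.60).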
Figure 7.4 Newtonian bending of a particle of mass $m$. Using the constancy of the
Laplace-Runge-Lenz vector $\mathbf{A}$ along the trajectory, we obtain for the deflection
angle $\tan\theta = GMm^2/(R\,p_0 p_\infty)$.
$$p(r) \approx 1 + \frac{GM}{c^2 r}, \tag{7.62}$$
which coincides with equation (7.57).
7.4.2 Bending of Light Rays, Newtonian Calculation
As an exercise, let us use Newtonian mechanics to compute the bending of a light
ray by first solving the bending for a particle and then taking the particle velocity
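to be very large. In Section 5.4.2, we showed that the Laplace-Runge-Lenz vector
$\mathbf{R}$ (we will call it $\mathbf{A}$ in the present section) given by equation (5.57) is a constant
of motion. For an unbound orbit that corresponds to the path of a light ray, we can
evaluate $\mathbf{A}$ at two points: at the grazing point, where the particle’s linear momentum
is $\mathbf{p}_0$ and the magnitude of the angular momentum is $\ell = p_0 R$ (see Figure 7.4), and
at infinity, where the momentum $\mathbf{p}_\infty$ and the radial unit vector $\hat{\mathbf{u}}_{r,\infty}$ are collinear.
For the choice of axes of Figure 7.4, $\mathbf{A}$ points in the $y$ direction, and since it is a
constant of motion, we have:
$$\mathbf{A} = \boldsymbol{\ell}\times\mathbf{p}_0 + GMm^2\,\hat{\mathbf{u}}_{r,0} = \boldsymbol{\ell}\times\mathbf{p}_\infty + GMm^2\,\hat{\mathbf{u}}_{r,\infty}, \tag{7.63}$$
with
$$\boldsymbol{\ell} = -p_0 R\,\hat{k}, \tag{7.64a}$$
$$\mathbf{p}_0 = p_0\,\hat{\imath}, \tag{7.64b}$$
$$\hat{\mathbf{u}}_{r,0} = \hat{\jmath}, \tag{7.64c}$$
$$\mathbf{p}_\infty = p_\infty\,\hat{\mathbf{u}}_{r,\infty} \equiv p_\infty\left(\cos\theta\,\hat{\imath} - \sin\theta\,\hat{\jmath}\right). \tag{7.64d}$$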
Since the $x$ component of $\mathbf{A}$ is zero for all points of the path, we use equations
(7.64) and evaluate $A_x$ at a point “at infinity,”
$$\left(\boldsymbol{\ell}\times\mathbf{p}_\infty + GMm^2\,\hat{\mathbf{u}}_{r,\infty}\right)_x = GMm^2\cos\theta - R\,p_0 p_\infty\sin\theta = 0, \tag{7.65}$$
and obtain:
$$\tan\theta = \frac{GM}{R\,v_0 v_\infty}. \tag{7.66}$$
Equation (7.66) coincides with Soldner’s expression.3 For a very fast particle, the
velocities at infinity and at closest approach are approximately equal, $v_0 \sim v_\infty \sim c$,
and the angle $\theta$ is small ($\tan\theta \sim \theta$):
$$\alpha = 2\theta = \frac{2GM}{Rc^2}, \tag{7.67}$$
in agreement with Einstein’s 1911 result. Later, in the culmination of several
attempts, Einstein (1916a) would present his general theory of relativity and predict
that the bending is twice as big as his equivalence calculation predicted.
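To attach numbers to these formulas, the following sketch evaluates the deflection of a ray grazing the Sun. The solar parameters are standard reference values supplied here as assumptions (they do not appear in the text), and the function names are hypothetical.

```python
import math

GM_SUN = 1.32712440018e20  # m^3/s^2, gravitational parameter of the Sun (assumed value)
R_SUN = 6.957e8            # m, nominal solar radius, taken as the grazing distance R
C = 299_792_458.0          # m/s


def deflection_newtonian(gm, r):
    """alpha = 2GM/(R c^2), equation (7.67), in radians."""
    return 2.0 * gm / (r * C**2)


def to_arcsec(angle_rad):
    """Convert radians to arcseconds."""
    return math.degrees(angle_rad) * 3600.0


if __name__ == "__main__":
    alpha = deflection_newtonian(GM_SUN, R_SUN)
    print("equivalence-principle / Newtonian value:", to_arcsec(alpha), "arcsec")
    # The general theory predicts twice this angle, as stated in the text.
    print("general relativity:", to_arcsec(2.0 * alpha), "arcsec")
```

Under these assumed inputs the first value is a little under 0.9 arcseconds and the second is twice that.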
with the indices α and β running from 1 to 4, and the convention that whenever
we see repeated indices we sum over those indices. When there is no gravitational
field, the tensor $g_{\alpha\beta}$ is diagonal. For example, in cartesian coordinates $g_{xx} = g_{yy} = g_{zz} = -1$,
and $g_{tt} = 1$. Once the metric tensor is known, the (four-dimensional)
path followed by a particle between two space-time points, the “geodesic,” is the
one for which the length $s$ (or the proper time $\tau = s/c$), given by
$$s \equiv \int ds = \int\sqrt{g_{\alpha\beta}\,dx^\alpha\,dx^\beta}, \tag{7.75}$$
For that circular orbit, the (proper) period is $\tau_0 = 2\pi/\omega_\phi$, the proper time for a particle
to come back to the same point in the circle. From equation (7.80b), we have
$$\omega_\phi = \frac{\ell}{r_0^2}. \tag{7.86}$$
On the other hand, for those same values of $\bar{E}$ and $\ell$ the radial component will
also describe an oscillatory motion as a function of $\tau$. Since Mercury’s orbit has a
small eccentricity, we can compute the oscillation for small variations with respect
to the circular orbit. In this regime, the potential can be approximated by a parabola
(see Figure 7.5), and the radial motion will be harmonic in $\tau$, with a frequency
$\omega_R^2 \equiv V''(r_0)$ given by:
$$V''(r_0) = -2\frac{GM}{r_0^3} + 3\frac{\ell^2}{r_0^4} - 12\frac{\ell^2 GM}{c^2 r_0^5} \tag{7.87}$$
$$= \frac{\ell^2}{r_0^4} - 6\frac{\ell^2 GM}{c^2 r_0^5} \tag{7.88}$$
$$= \omega_\phi^2 - 6\omega_\phi^2\frac{GM}{c^2 r_0}. \tag{7.89}$$
7.6 Weak Gravity around a Static, Spherical Star 185
Always within the weak gravity approximation, we obtain that the radial and
angular frequencies differ in the relativistic case:
$$\omega_R \approx \omega_\phi - 3\omega_\phi\frac{GM}{c^2 r_0}. \tag{7.90}$$
In other words, whereas in the Keplerian problem the orbits are closed ($\omega_R = \omega_\phi$),
in the relativistic case there is a small angular change $\delta\phi$ (the precession) after a
period $\tau_R$ in the radial coordinate. From equation (7.5), using $\omega_R\tau_R = 2\pi$ and
$\omega_\phi\tau_R = 2\pi + \delta\phi$, we obtain
$$\delta\phi = \frac{6\pi GM}{c^2 r_0}. \tag{7.91}$$
This is one of the most famous results of the General Theory of Relativity.
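A rough numerical estimate for Mercury follows from equation (7.91). The sketch below is an illustration under assumptions: the orbital radius is taken to be Mercury's semi-major axis (the circular-orbit approximation of this section), and the solar and orbital parameters are standard reference values that do not appear in the text.

```python
import math

GM_SUN = 1.32712440018e20   # m^3/s^2 (assumed reference value)
C = 299_792_458.0           # m/s
R0_MERCURY = 5.7909e10      # m, semi-major axis used as the circular radius r0
PERIOD_DAYS = 87.969        # orbital period of Mercury in days


def precession_per_orbit(gm, r0):
    """delta_phi = 6 pi G M / (c^2 r0), equation (7.91), in radians per orbit."""
    return 6.0 * math.pi * gm / (C**2 * r0)


def arcsec_per_century(gm, r0, period_days):
    """Accumulate the per-orbit precession over one Julian century (36525 days)."""
    orbits = 36525.0 / period_days
    return math.degrees(precession_per_orbit(gm, r0) * orbits) * 3600.0


if __name__ == "__main__":
    print(arcsec_per_century(GM_SUN, R0_MERCURY, PERIOD_DAYS), "arcsec per century")
```

With these inputs the circular-orbit estimate comes out at roughly 41 arcseconds per century. The commonly quoted figure of about 43 arcseconds includes the additional factor $1/(1 - e^2)$ for an orbit of eccentricity $e$, which the small-eccentricity treatment above neglects.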
(ds)2 = 0, (7.94)
and obtains, in the same spirit as his principle of equivalence calculation, a new velocity
of light
$$c(r) = c\,\frac{1 - GM/rc^2}{1 + GM/rc^2} \simeq c\left(1 - \frac{2GM}{rc^2}\right), \tag{7.95}$$
and a corresponding index of refraction
$$n(r) = 1 + \frac{2GM}{rc^2}, \tag{7.96}$$
186 Relativity and Least Action
whose deviation from unity is twice the amount he obtained in his principle of
equivalence calculation (equation 7.57). His calculation of the bending is exactly
the one in the principle of equivalence, giving, for the bending angle the celebrated:
$$\alpha = \frac{4GM}{Rc^2}. \tag{7.97}$$
that in the limit of weak gravity would reduce to the classical equations of Newtonian
gravity. The relation, or relations, between metric and curvature were known
to the geometers, whose work Einstein studied patiently. The idea of curvature is intuitive
for a two-dimensional surface, like the surface of a sphere. But in higher
dimensions, like the four dimensions of space-time, the notion is far from obvious.
The most natural generalization to curvature in higher dimensions is the so-called
“Riemann tensor,” named after Bernhard Riemann, one of the creators of the idea
of n-dimensional curvature. The Riemann tensor is of order four, that is, it involves
four indices, rather than the two indices of equation (7.98). Qualitatively, the idea of
Riemann’s tensor being of order four is the following: one way of computing cur-
vature is to “transport” a vector parallel to itself along a closed curve. If, after this
journey, the vector returns to its original orientation, independent of the curve cho-
sen, space is flat. If it returns rotated with respect to its original orientation, space
is curved. Now imagine transporting a vector following a path made of a small
(infinitesimal) “parallelogram.”4 The parallelogram has an outgoing direction û,
4 This parallelogram is made of four geodesics.
7.7 Hilbert’s Least Action Principle for General Relativity* 187
Since the Lagrangian is a scalar, the most natural quantity that depends on the
metric and its second-order derivatives is the Ricci scalar, and Hilbert wrote:
$$S_{\text{curvature}} = \frac{c^4}{16\pi G}\int R\,\sqrt{-g}\,d^4x \tag{7.100}$$
where $R$ is the Ricci scalar and the integral is taken over space-time ($d^4x = dx_1\,dx_2\,dx_3\,dx_4$).
The term $\sqrt{-g}$, where $g$ is the determinant of the metric, is
included so that the action is an invariant: the Ricci scalar is an invariant, but $d^4x$
by itself depends on the choice of coordinates. The volume element $\sqrt{-g}\,d^4x$ is
invariant under a change of coordinates. With this guess for the action, Hilbert computed
the variations with respect to the metric at each point of space and time (the
components of the metric are the fields of the problem):
$$\frac{\delta}{\delta g_{\alpha\beta}}\,S_{\text{curvature}} = 0, \tag{7.101}$$
and obtained, for the first time, Einstein’s equations. Later, citing Hilbert, Einstein
(1916b) followed a slightly different approach and rederived his equations using
the least action principle.
8
The Road to Quantum Mechanics
According to the assumption considered here, the energy of a light wave emitted from a
point source is not spread continuously over ever larger volumes, but consists of a finite
number of energy quanta that are spatially localized at points of space, move without
dividing, and are absorbed or generated only as a whole.
Up to that time no one had ever produced anything like it in the realm of spectroscopy,
agreement between theory and experiment to five significant figures.
Bohr’s approach of 1913 was extended very soon after by Sommerfeld, who
also found striking agreement with experiments. Sommerfeld’s beautiful extension
incorporates the theory of relativity, and accounts for experimental discrepancies
(close doublets, not predicted by Bohr) that became evident when the lines of
hydrogen’s spectrum were observed with a powerful spectroscope. The developments
of the quantum theory (the old and the “new,” post 1925) proceed to a great extent
by analogy with classical motion, seeking the minimal modifications of
Newtonian and Hamiltonian mechanics that will agree with observations. In the
next section, we visit the results obtained by Bohr and Sommerfeld in their main
papers, and concentrate on their use of classical concepts derived from the principle
of least action.
and was one of “the blind spots” of his theory (Perez, 2009). Given the form of
Balmer’s formula, the energy of each stationary state will be given by
$$E_n = -\frac{hcR}{n^2}, \tag{8.12}$$
with $n$ an integer. Using equation (8.8), an electron with this energy in a circular
orbit will have a frequency $\bar{\omega}$ given by
$$\bar{\omega} = \left(\frac{8}{m}\right)^{1/2}\frac{1}{e^2}\,\frac{(hcR)^{3/2}}{n^3}. \tag{8.13}$$
In contrast to the initial treatment, the frequency ω of the emitted radiation is not
in principle related to the frequency ω̄ of the electron orbiting around the nucleus.
This is a clear deviation from ordinary electrodynamics, where we expect the fre-
quency of radiation to be that of the emitting sources. Now, says Bohr, consider the
transition between two successive states, i.e. m = n+1 of very low frequency (very
large n). For low frequencies one expects to recover the classical result: the emitted
radiation should have the same frequency as the electron’s orbital frequency. Using
Balmer’s formula, we have
$$\omega = 2\pi cR\left[\frac{1}{n^2} - \frac{1}{(n+1)^2}\right] \simeq \frac{4\pi cR}{n^3}. \tag{8.14}$$
If we require the correspondence principle ($\bar{\omega} = \omega$), we obtain from equations
(8.13) and (8.14)
$$R = \frac{2\pi^2 m e^4}{c h^3}, \tag{8.15}$$
which agrees with the experimentally observed values, and with the values obtained
in the initial treatment for E, but with a fundamentally different interpretation.
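The correspondence argument can be checked with numbers. The sketch below evaluates equation (8.15) in Gaussian (cgs) units and compares the classical orbital frequency of a circular Kepler orbit of energy $-hcR/n^2$ with the frequency emitted in the transition $n+1 \to n$. The constants are standard reference values supplied here as assumptions, and the function names are hypothetical.

```python
import math

# Gaussian (cgs) values, assumed for this illustration.
M_E = 9.1093837015e-28   # electron mass in g
E_CH = 4.80320471e-10    # electron charge in statcoulomb
H = 6.62607015e-27       # Planck's constant in erg s
C = 2.99792458e10        # speed of light in cm/s


def rydberg():
    """R = 2 pi^2 m e^4 / (c h^3), equation (8.15), in cm^-1."""
    return 2.0 * math.pi**2 * M_E * E_CH**4 / (C * H**3)


def orbital_frequency(n):
    """Angular frequency of a circular orbit of energy -hcR/n^2 in the potential -e^2/r."""
    binding = H * C * rydberg() / n**2
    return math.sqrt(8.0 / M_E) * binding**1.5 / E_CH**2


def emitted_frequency(n):
    """Angular frequency of the line emitted in the transition n+1 -> n, equation (8.14)."""
    return 2.0 * math.pi * C * rydberg() * (1.0 / n**2 - 1.0 / (n + 1) ** 2)


if __name__ == "__main__":
    print("R =", rydberg(), "cm^-1")
    for n in (1, 10, 100, 1000):
        # The ratio approaches 1 as n grows, which is the correspondence limit.
        print(n, emitted_frequency(n) / orbital_frequency(n))
```

For small $n$ the two frequencies differ appreciably; they agree only in the limit of large quantum numbers, which is the regime used in the argument above.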
In the same paper, Bohr offers an alternative, “very simple interpretation” of his
results: in a circular orbit of frequency $\omega$ the kinetic energy $T = m\omega^2 r^2/2$ and the
angular momentum $\ell = m\omega r^2$ are related by
$$\frac{\ell}{2} = \frac{T}{\omega}. \tag{8.16}$$
Combining $T = -E$ from equation (8.7) with $E = -n\hbar\omega/2$ from equation
(8.9) we have
$$\ell = n\hbar. \tag{8.17}$$
This relation is what is popularly known as Bohr’s quantization condition, a
relation that, he mentions, had also been developed by John William Nicholson
(1912), a mathematical physicist he had met in Cambridge. In his study of the
solar corona, Nicholson had initially set the ratio of energy to frequency equal
8.2 Bohr’s “Trilogy” of 1913 and Sommerfeld’s Generalization 195
In any molecular system consisting of positive nuclei and electrons in which the nuclei
are at rest relative to each other and the electrons move in circular orbits, the angular
momentum of every electron round the center of its orbit will in the permanent state of the
system be equal to h/2π, where h is Planck’s constant.
The extension to more general cases beyond the case of circular motions was
later developed by Arnold Sommerfeld. We discuss Sommerfeld’s treatment in the
following section.
with k and n integers. In order to apply these equations, one needs expressions
for pr and pθ . Sommerfeld (1916) obtains these expressions using the Hamilton-
Jacobi equation (Section 6.5), with S = S(x) + Et. Since the potential is central,
we can safely assume that the motion is in a plane, and S depends only on r
and θ:
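$$\frac{1}{2m}\left[\frac{1}{r^2}\left(\frac{\partial S(r,\theta)}{\partial\theta}\right)^2 + \left(\frac{\partial S(r,\theta)}{\partial r}\right)^2\right] = E + \frac{e^2}{r}. \tag{8.23}$$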
The above equation can be solved using the method of separation of variables
S(r, θ) = S1 (r ) + S2 (θ), from which we see that:
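$$\frac{\partial S_2}{\partial\theta} = p_\theta = L, \tag{8.24}$$
with $L$ a constant. Sommerfeld’s quantum condition on $p_\theta$ gives:
$$\int_0^{2\pi} p_\theta\,d\theta = kh \;\;\rightarrow\;\; L = k\hbar, \tag{8.25}$$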
with k an integer. Notice that a graph of pθ versus θ is a horizontal line and not
a closed curve. Following Sommerfeld, we restricted the integral over θ to values
between 0 and 2π, since θ and θ + 2π correspond to the same angular position.
Applying the quantum condition on pr is more complicated due to the r
dependence of the momentum:
$$\oint p_r\,dr = 2\sqrt{2m}\int_{r_1}^{r_2} dr\,\sqrt{E + \frac{e^2}{r} - \frac{L^2}{2mr^2}}, \tag{8.26}$$
where $r_1$ and $r_2$ are the turning points of the (classical) trajectory:
$$r_1 + r_2 = \frac{e^2}{|E|}, \tag{8.27a}$$
$$r_1 r_2 = \frac{L^2}{2m|E|}, \tag{8.27b}$$
which are obtained by setting pr (r1 ) = pr (r2 ) = 0. Sommerfeld solves this inte-
gral using the rather sophisticated technique of contour integration, extending the
integrand to the complex plane. We present the calculation because we find it illus-
trative and elegant, although with current software the integral can be evaluated
with a simple click.
Notice that equation (8.26) can be written as
$$I = \oint p_r\,dr = 2\sqrt{2m|E|}\int_{r_1}^{r_2}\frac{dr}{r}\,\sqrt{(r - r_1)(r_2 - r)}, \tag{8.28}$$
so that the integrand, when extended to the complex plane, has a branch cut in the
real axis between the points $r_1$ and $r_2$. The integral can be evaluated by computing
the residues at singularities in the complex plane. The integrand has two singularities:
a simple pole at $z = 0$, with residue $\mathrm{Res}(0) = \sqrt{-r_1 r_2} = i\sqrt{r_1 r_2}$, and a residue
at infinity, since there is a term of the integrand that decreases as $\sim 1/z$ for very
large $z$:
$$\frac{1}{z}\sqrt{(z - r_1)(r_2 - z)} \simeq -\frac{i}{z}\,\frac{r_1 + r_2}{2} + \cdots, \tag{8.29}$$
where “$\cdots$” is shorthand for terms (not necessarily small) that vanish after integrating
over a very large circle centered at $z = 0$ in the complex plane. The above
term therefore gives rise to a residue $\mathrm{Res}(\infty) = -i(r_1 + r_2)/2$ upon integration
over a large circle. The result of the contour integration is:
$$\begin{aligned}
I &= \sqrt{2m|E|}\;2\pi i\,\left[\mathrm{Res}(\infty) + \mathrm{Res}(0)\right] \\
  &= \sqrt{2m|E|}\;\pi\left(r_1 + r_2 - 2\sqrt{r_1 r_2}\right) \\
  &= \pi\sqrt{\frac{2me^4}{|E|}} - 2\pi L \\
  &= nh.
\end{aligned} \tag{8.30}$$
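The closed form produced by the contour integration can be cross-checked by direct quadrature. The sketch below is an illustration under assumptions: the function names are hypothetical, and the substitution $r = \tfrac{1}{2}(r_1 + r_2) - \tfrac{1}{2}(r_2 - r_1)\cos t$ is a choice made here to remove the square-root behaviour at the turning points. It compares the numerical value of $\int_{r_1}^{r_2}(dr/r)\sqrt{(r - r_1)(r_2 - r)}$ with $(\pi/2)(r_1 + r_2 - 2\sqrt{r_1 r_2})$, which is what equation (8.30) implies once the prefactor $2\sqrt{2m|E|}$ is divided out.

```python
import math


def radial_integral(r1, r2, n=2000):
    """Midpoint-rule estimate of the integral of sqrt((r - r1)(r2 - r)) / r over [r1, r2]."""
    center = 0.5 * (r1 + r2)
    half_width = 0.5 * (r2 - r1)
    h = math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        r = center - half_width * math.cos(t)
        root = math.sqrt(max((r - r1) * (r2 - r), 0.0))
        # dr = half_width * sin(t) dt under the substitution described above
        total += root / r * half_width * math.sin(t)
    return total * h


def contour_result(r1, r2):
    """Value implied by the residue calculation: (pi/2) (r1 + r2 - 2 sqrt(r1 r2))."""
    return 0.5 * math.pi * (r1 + r2 - 2.0 * math.sqrt(r1 * r2))


if __name__ == "__main__":
    for r1, r2 in ((1.0, 4.0), (0.5, 3.0)):
        print(radial_integral(r1, r2), contour_result(r1, r2))
```

The two columns agree to many digits for any pair of positive turning points.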
with k an integer.
Expanding the squares, equation (8.32) above can be cast in the same form as
that in equation (8.28):
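$$\frac{p_r^2}{2m} \equiv \frac{1}{2m}\left(\frac{\partial S_1}{\partial r}\right)^2 = \bar{E} + \frac{\bar{e}^2}{r} - \frac{\bar{L}^2}{2mr^2}, \tag{8.34}$$
with
$$\bar{E} = E\left(1 + \frac{E}{2mc^2}\right), \tag{8.35a}$$
$$\bar{e}^2 = e^2\left(1 + \frac{E}{mc^2}\right), \tag{8.35b}$$
$$\bar{L} = \bar{k}\hbar \equiv \hbar\sqrt{k^2 - \alpha^2}, \tag{8.35c}$$
and $\alpha = e^2/\hbar c$ the so-called “fine structure constant”2 ($\alpha \simeq 1/137$).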
1 In comparing with equation (7.45), notice that Sommerfeld adds a constant mc2 to E. This is just a shift so
that the zero of energy corresponds to the particle at rest.
2 The fine structure constant gave rise to numerological speculations and inspired Guido Beck, Hans Bethe,
and Wolfgang Riezler to write a parody relating α = 1/137 to the value of the absolute zero temperature T0 ,
expressed in degrees Kelvin: T0 = −(2/α − 1). The spoof was accepted in good faith by the editor of Die
8.3 Adiabatic Invariants 199
The contour integration proceeds in the same way as in the previous section, and
we obtain Ē from equation (8.31) substituting e → ē, k → k̄:
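$$|\bar{E}| = \frac{2\pi^2 m\bar{e}^4}{h^2}\,\frac{1}{\left(n + \sqrt{k^2 - \alpha^2}\right)^2}, \tag{8.36}$$
or, with a little algebra in the above equation we obtain:
$$E = mc^2\left\{\left[1 + \frac{\alpha^2}{\left(n + \sqrt{k^2 - \alpha^2}\right)^2}\right]^{-1/2} - 1\right\}. \tag{8.37}$$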
The two quantum numbers n and k give rise to orbits which before had the
same energy but now have very slightly different energies. For example, the orbits
corresponding to (k = 2, n = 1) and (k = 1, n = 2) have the same energy
according to equation (8.31), but not according to equation (8.37). Sommerfeld’s
formula later showed very good agreement with experiment, and this anticipation
received high praise from Planck, who in his Nobel lecture of 1920 said:
“that magic formula arose before which both the hydrogen and the helium spectrum had
to reveal the riddle of their fine structure, to such an extent that the finest present-day mea-
surements, those of F. Paschen, could be explained generally through it - an achievement
fully comparable with that of the famous discovery of the planet Neptune whose existence
and orbit was calculated by Leverrier before the human eye had seen it.”
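The size of the splitting in the example above can be made concrete. The sketch below evaluates equation (8.37) for the two orbits $(k = 2, n = 1)$ and $(k = 1, n = 2)$ and compares them with the non-relativistic value, written here as $-mc^2\alpha^2/[2(n + k)^2]$. The numerical values of $\alpha$ and $mc^2$ are standard reference values supplied as assumptions, and the function names are hypothetical.

```python
import math

ALPHA = 1.0 / 137.035999   # fine structure constant (assumed reference value)
MC2_EV = 510_998.95        # electron rest energy in eV (assumed reference value)


def sommerfeld_energy(n, k):
    """Equation (8.37): relativistic energy for radial number n and angular number k, in eV."""
    denom = n + math.sqrt(k * k - ALPHA**2)
    return MC2_EV * ((1.0 + ALPHA**2 / denom**2) ** -0.5 - 1.0)


def nonrelativistic_energy(n, k):
    """Non-relativistic limit, which depends only on the sum n + k, in eV."""
    return -MC2_EV * ALPHA**2 / (2.0 * (n + k) ** 2)


if __name__ == "__main__":
    e_a = sommerfeld_energy(1, 2)
    e_b = sommerfeld_energy(2, 1)
    print("non-relativistic:", nonrelativistic_energy(1, 2), "eV")
    print("(k=2, n=1):", e_a, "eV")
    print("(k=1, n=2):", e_b, "eV")
    print("splitting:", e_a - e_b, "eV")
```

Both levels sit near $-1.51$ eV, and the difference between them is of order $10^{-5}$ eV, which gives a sense of the resolution needed to observe the fine structure.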
with Einstein, the adiabatic hypothesis (Navarro and Pérez, 2006; Pié i Valls and
Pérez, 2016): the adiabatic invariants of mechanical systems are the quantities to
be quantized.
In order to understand the essentials of an adiabatic invariant, let us consider a
harmonic oscillator of frequency $\omega$. Let us think (rather than of a pendulum) of
a mass $m$ attached to a spring with constant $K$, and $\omega = \sqrt{K/m}$. Given some
initial conditions, the displacement x(t) with respect to the equilibrium position
will execute a harmonic motion. For example, if we start the oscillator at zero
velocity and a displacement x0 , we will have
where we have used the fact that the average of $\cos^2\phi$ is $\langle\cos^2\phi\rangle = 1/2$.
Another way of writing the above equation is
$$\frac{E'}{\omega'} = \frac{E}{\omega} = \text{Constant}. \tag{8.41}$$
The ratio E/ω is the adiabatic invariant of the harmonic oscillator. In order to
visualize the fact that this is an invariant on average, with some small fluctuations,
in Figure 8.3 we show the results of iterating equation (8.39) for three cases using
φn as random numbers.
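The invariance of $E/\omega$ can also be seen in a direct simulation. The sketch below does not iterate equation (8.39); as a swapped-in alternative, it integrates the equation of motion $\ddot{x} = -\omega(t)^2 x$ with a slowly ramped frequency and compares $E/\omega$ before and after. The linear ramp, the leapfrog integrator, the parameter values and the function names are all assumptions of this illustration.

```python
def ramp_frequency(t, t_total, w_start, w_end):
    """Frequency that grows linearly (and slowly) from w_start to w_end."""
    return w_start + (w_end - w_start) * t / t_total


def adiabatic_ratio(w_start=1.0, w_end=2.0, t_total=1000.0, dt=0.005, x0=1.0, mass=1.0):
    """Return (E/omega at the start, E/omega at the end) for a slowly stiffened oscillator."""
    n_steps = int(round(t_total / dt))
    x, v = x0, 0.0
    e_start = 0.5 * mass * v * v + 0.5 * mass * w_start**2 * x * x
    w = w_start
    for step in range(n_steps):
        t = step * dt
        # Leapfrog (kick-drift-kick) step with the frequency updated along the way.
        w = ramp_frequency(t, t_total, w_start, w_end)
        v -= 0.5 * dt * w * w * x
        x += dt * v
        w = ramp_frequency(t + dt, t_total, w_start, w_end)
        v -= 0.5 * dt * w * w * x
    e_end = 0.5 * mass * v * v + 0.5 * mass * w**2 * x * x
    return e_start / w_start, e_end / w_end


if __name__ == "__main__":
    before, after = adiabatic_ratio()
    print("E/omega before:", before, "after:", after)
```

Although the energy roughly doubles when the frequency doubles, the ratio $E/\omega$ changes only slightly, and the residual change shrinks as the ramp is made slower.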
For a harmonic oscillator, the orbits of constant energy (in the position-momentum
plane $(x, p)$) are the ellipses of equation (8.18), whose area $2\pi E/\omega$ is
the adiabatic invariant. See Figure 8.4. Ehrenfest extended the argument to multi-
periodic orbits and put the adiabatic principle on a firmer basis. He showed that for
periodic orbits, the adiabatic invariant is the average kinetic energy during a period:
“the average kinetic energy increases in the same proportion as the frequency under
adiabatic influencing.” Notice the equivalence between the adiabatic principle and
Sommerfeld’s condition:
Figure 8.3 Energy versus frequency for a harmonic oscillator for which the fre-
quency is increased very slowly, starting from A: frequency ω/2 and energy E,
B: frequency ω and energy E, and C: frequency ω and energy E/2. Notice that
there are some small fluctuations, but on average the energy increases in such a
way that E/ω retains its initial value. The dashed lines show the corresponding
straight lines E = Constant × ω.
8.4 De Broglie’s Matter Waves 203
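[Figure 8.4: orbit of constant energy $E$ in the $(x, p)$ plane, an ellipse of area $A$ with semi-axes $\sqrt{2E/m}/\omega$ along $x$ and $\sqrt{2mE}$ along $p$.]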
$$\frac{2\bar{T}}{\nu} = 2\int_0^\tau T\,dt = \int_0^\tau mv^2\,dt = \oint mv\,ds, \tag{8.42}$$
with τ = 1/ν the period of the orbit. We show the details of the derivation in
Appendix I. The concept of invariance (relativistic invariance of the action and adi-
abatic invariance) brings some unity to the fragmentary models of the old quantum
theory. The adiabatic principle, in most historical accounts, has less significance
than the correspondence principle, which is regarded as a precursor of the full
theory (Pié i Valls and Pérez, 2016).
are identical with the paths which are dynamically possible” (de Broglie, 1924a).
In other words, he is addressing a question that is latent in Hamilton’s work: what
is the wave theory whose limit corresponds to the Newtonian paths? Unknown (as
far as we know) to de Broglie, is a striking remark in Ehrenfest’s notebooks that
raises this issue (in 1904!): “The Hamilton-Jacobi equation of Lagrangian mechan-
ics corresponds to diffractionless optics. What is the super-Lagrangian mechanics
whose Hamilton-Jacobi equation is adequate for describing the diffracted wave?”
(Klein, 1970, p. 161).
Starting from the equivalence between Fermat’s principle and Maupertuis’s least
action principle,
$$\underbrace{\delta\int\frac{ds}{\lambda}}_{\text{Fermat}} = \underbrace{\delta\int p\,ds}_{\text{Maupertuis}} = 0, \tag{8.47}$$
$$\phi(x, t) = e^{ik(x - ct)}.$$
$$\phi(x, t) = \phi(x)\,e^{-ikct}.$$
4 The word “eikonal” is the German form of the Greek word εἰκών, meaning likeness.
8.5 Schrödinger’s Wave Mechanics 207
“get rid of the energy parameter” and that acquires the form of (8.59) for stationary
states:
$$-\frac{\hbar^2}{2m}\frac{\partial^2\psi(x, t)}{\partial x^2} + V(x)\,\psi(x, t) = \pm i\hbar\,\frac{\partial\psi(x, t)}{\partial t}. \tag{8.60}$$
is introduced. The above relation implies directly that the action of the operator p̂
acting on a function of q̂ is to compute its derivative with respect to q̂:
$$\hat{p} = -i\hbar\,\frac{\partial}{\partial\hat{q}}. \tag{8.62}$$
These operators act, as matrices, on a “space” of states, which Dirac denotes
$|\psi\rangle$ (the “ket”) and $\langle\phi|$ (the “bra”), with the inner product denoted by $\langle\phi|\psi\rangle$. In a
representation where the position operators are diagonal, one has
In his paper Dirac relates the inner product $\langle q|Q\rangle$ with the so-called contact
transformations of the Lagrangian theory. The notion of contact transformation is
closely related to Huygens’ principle for wave propagation and, at the same time, to
Hamilton’s view of the propagation of the surfaces of constant action discussed in
From the above equation Dirac makes the following operator identifications:
$$\hat{p} = \frac{\partial U(\hat{q}, \hat{Q})}{\partial\hat{q}}, \qquad \hat{P} = -\frac{\partial U(\hat{q}, \hat{Q})}{\partial\hat{Q}}, \tag{8.71}$$
which are of the form of the classical equations (8.66). Therefore, Dirac concludes,
the function U (q, Q) defined in equation (8.69) is “the analogue” of the classical
action S(Q, q). Notice that Dirac is stating an analogy and not an equality.
$$S = \int_{-\infty}^{\infty} dt\,\left\{\frac{m\,\dot{x}^2(t)}{2} - V[x(t)] + k^2\,\dot{x}(t)\,\dot{x}(t + T_0)\right\}, \tag{8.72}$$
which corresponds to a particle in a potential $V(x)$, and interacting with itself in
a distant mirror: $T_0/2$ is the (constant) time it takes for light to reach the mirror.
Since the quantity in curly brackets is not local in time, there is no good definition
of momentum. The equations of motion derived from $\delta S/\delta x(t) = 0$ are
$$m\,\ddot{x}(t) = -V'[x(t)] - k^2\left[\ddot{x}(t + T_0) + \ddot{x}(t - T_0)\right]. \tag{8.73}$$
The force acting on the particle at time t depends on the motion of the particle
at times other than t, and the equations of motion cannot be described directly in
Hamiltonian form. “The least action principle does not imply a Hamiltonian form
if the action is a function of anything more than positions and velocities at the
same moment” (Feynman, 1965). Feynman confronts the need to quantize systems
which in general have no Hamiltonian form, but whose (classical) dynamics can be
described in terms of a least action principle. In his Nobel prize acceptance speech,
he narrates the story eloquently:
When I was struggling with this problem, I went to a beer party in the Nassau Tavern in
Princeton. There was a gentleman, newly arrived from Europe (Herbert Jehle) who came
and sat next to me. Europeans are much more serious than we are in America because
they think that a good place to discuss intellectual matters is a beer party. So, he sat by
me and asked, “what are you doing” and so on, and I said, “I’m drinking beer.” Then I
realized that he wanted to know what work I was doing and I told him I was struggling
with this problem, and I simply turned to him and said, “listen, do you know any way of
doing quantum mechanics, starting with action—where the action integral comes into the
quantum mechanics?” “No,” he said, “but Dirac has a paper in which the Lagrangian, at
least, comes into quantum mechanics. I will show it to you tomorrow.”
Feynman goes on to relate his bafflement with Dirac’s statement about the
quantity $\langle q|Q\rangle$ as being “analogous” to the exponential of the action. Since $\langle q|Q\rangle$
propagates the wave function $\psi$ from point $q$ (or $x$) to a different point $Q$ (or $x'$)
at a later time, consider the propagation from point $(x, t)$ to point $(x', t + \epsilon)$ an
infinitesimal time later (Feynman, 1942/2005):
$$\psi(x, t + \epsilon) = A\int_{-\infty}^{\infty} dx'\,\psi(x', t)\,e^{\frac{i}{\hbar}\int_{x'}^{x} dt\,L}, \tag{8.74}$$
where Feynman is taking Dirac’s statement at face value, replacing the “analogous”
by an equality through a proportionality constant A. Since the integral of L is
between two infinitesimally close times one can write:
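$$\int_{x'}^{x} dt\,L \simeq \epsilon\,L_{\text{average}}. \tag{8.75}$$
Replacing the average velocity between the two very close points $x$ and $x'$ by
$(x' - x)/\epsilon \equiv \eta/\epsilon$ one obtains
$$\int_{x'}^{x} dt\,L \simeq \epsilon\left[\frac{m}{2}\frac{(x - x')^2}{\epsilon^2} - V\!\left(\frac{x + x'}{2}\right)\right] \simeq \frac{m}{2}\frac{\eta^2}{\epsilon} - \epsilon\,V(x). \tag{8.76}$$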
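Substituting the above expression in (8.74), and expanding to lowest order in $\epsilon$ and
$\eta$ one obtains:
$$\begin{aligned}
\psi(x, t + \epsilon) &\simeq A\int_{-\infty}^{\infty} d\eta\,\psi(x + \eta, t)\,e^{\frac{i}{\hbar}\left[\frac{m\eta^2}{2\epsilon} - \epsilon V(x)\right]} \\
&\simeq A\left[1 - \frac{i\epsilon}{\hbar}V(x)\right]\int_{-\infty}^{\infty} d\eta\,\psi(x + \eta, t)\,e^{\frac{i m\eta^2}{2\hbar\epsilon}} \\
&\simeq A\left[1 - \frac{i\epsilon}{\hbar}V(x)\right]\left[\psi(x, t)\int_{-\infty}^{\infty} d\eta\,e^{\frac{i m\eta^2}{2\hbar\epsilon}} + \frac{1}{2}\frac{\partial^2\psi(x, t)}{\partial x^2}\int_{-\infty}^{\infty} d\eta\,\eta^2\,e^{\frac{i m\eta^2}{2\hbar\epsilon}}\right] \\
&= A\left[1 - \frac{i\epsilon}{\hbar}V(x)\right]\sqrt{\frac{2\pi i\hbar\epsilon}{m}}\left[\psi(x, t) + \frac{1}{2}\frac{\partial^2\psi(x, t)}{\partial x^2}\,\frac{i\hbar\epsilon}{m}\right].
\end{aligned} \tag{8.77}$$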
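The step from equation (8.77) to a differential equation can be sketched as follows. This is a hedged reconstruction of the standard argument rather than a quotation of the text: it assumes that the constant $A$ is fixed by requiring the two sides to agree at order $\epsilon^0$, and that the left-hand side is expanded to first order in $\epsilon$.

```latex
% Assumption: A = \sqrt{m / (2\pi i\hbar\epsilon)}, so that the \epsilon \to 0 limit of (8.77) is an identity.
\begin{align*}
\psi(x,t) + \epsilon\,\frac{\partial\psi(x,t)}{\partial t} + O(\epsilon^2)
  &= \left[1 - \frac{i\epsilon}{\hbar}V(x)\right]
     \left[\psi(x,t) + \frac{i\hbar\epsilon}{2m}\frac{\partial^2\psi(x,t)}{\partial x^2}\right], \\
% Collect the terms of first order in \epsilon and multiply by i\hbar:
i\hbar\,\frac{\partial\psi(x,t)}{\partial t}
  &= -\frac{\hbar^2}{2m}\frac{\partial^2\psi(x,t)}{\partial x^2} + V(x)\,\psi(x,t).
\end{align*}
```

Under these assumptions the first-order terms reproduce the time-dependent equation (8.60) with the upper sign.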
8.7 Feynman’s Thesis and Path Integrals 213
“corresponds to” $\langle x_{t_b}|x_{t_a}\rangle$ in the quantum theory. Feynman shows that this correspondence
is strict provided a) a proportionality constant is introduced and b) the
time interval $t_b - t_a$ is infinitesimal. For a finite interval, Feynman follows Dirac,
and divides the interval $t_b - t_a$ into a large number $N + 1$ of small time intervals of
length $\epsilon$:
$$t_1 = t_a + \epsilon,\quad t_2 = t_a + 2\epsilon,\quad \cdots,\quad t_N = t_b - \epsilon,$$
and uses, following Dirac, the composition law (or completeness relation)
$$\int dx_i\,|x_i\rangle\langle x_i| = 1, \tag{8.79}$$
to obtain
$$\langle x_{t_b}|x_{t_a}\rangle = \int dx_1\,dx_2\cdots dx_N\,\langle x_{t_b}|x_N\rangle\langle x_N|x_{N-1}\rangle\langle x_{N-1}|\cdots|x_1\rangle\langle x_1|x_{t_a}\rangle. \tag{8.80}$$
Since each intermediate propagator $\langle x_{i+1}|x_i\rangle$ involves times that are infinitesimally
close, one can replace them by the expression
$$\langle x_{i+1}|x_i\rangle = \sqrt{\frac{m}{2\pi i\hbar\epsilon}}\,\exp\left\{\frac{i\epsilon}{\hbar}\left[\frac{m}{2}\left(\frac{x_{i+1} - x_i}{\epsilon}\right)^2 - V(x_i)\right]\right\}.$$
where
$$S[b, a] = \int_{t_a}^{t_b} L(\dot{x}, x)\,dt \tag{8.82}$$
is the line integral of the Lagrangian taken over the trajectory connecting $x_a$ with
$x_b$ and passing through the points $x_i$ with straight sections in between.
The kernel, or propagator, $\langle x_{t_b}|x_{t_a}\rangle$ is then a sum over all paths, each of them with
equal weights.
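The composition of many short-time kernels can be tried out numerically, with one important change that should be stated plainly: the sketch below works in imaginary time ($t \to -i\tau$), where the oscillating factor $e^{iS/\hbar}$ becomes a decaying weight and the sums converge on a modest grid. This Euclidean variant is a swapped-in technique, not the real-time construction described above. Further assumptions of this illustration: units with $\hbar = m = \omega = 1$, a harmonic potential $V = x^2/2$, a kernel that averages $V$ over the two endpoints of each step, and hypothetical function and parameter names.

```python
import math


def ground_state_energy(n_grid=81, x_max=5.0, eps=0.1, n_steps=200):
    """Estimate the oscillator ground-state energy by composing short imaginary-time kernels."""
    dx = 2.0 * x_max / (n_grid - 1)
    xs = [-x_max + i * dx for i in range(n_grid)]

    def potential(x):
        return 0.5 * x * x

    # Short-time kernel on the grid; the factor dx stands for the integral over the intermediate point.
    prefactor = math.sqrt(1.0 / (2.0 * math.pi * eps))
    kernel = [
        [
            prefactor
            * math.exp(-((xa - xb) ** 2) / (2.0 * eps) - 0.5 * eps * (potential(xa) + potential(xb)))
            * dx
            for xb in xs
        ]
        for xa in xs
    ]

    # Repeated composition: each application of the kernel adds one time slice.
    psi = [1.0 / math.sqrt(n_grid)] * n_grid
    growth = 1.0
    for _ in range(n_steps):
        new_psi = [sum(k * p for k, p in zip(row, psi)) for row in kernel]
        growth = math.sqrt(sum(v * v for v in new_psi))
        psi = [v / growth for v in new_psi]

    # For long imaginary times each slice multiplies the state by exp(-eps * E0).
    return -math.log(growth) / eps


if __name__ == "__main__":
    print("estimated ground-state energy:", ground_state_energy())
```

In these units the exact ground-state energy is $1/2$, and the estimate approaches it as the time step `eps` is reduced.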
Figure 8.5 Huygens’ construction, showing the two wave-fronts generated, only
one of which should be kept.
8.8 Huygens’ Principle in Optics and Quantum Mechanics* 215
radiation from all the new sources together, “makes no sense at all. Light does not
emit light; only accelerating charges emit light. Later we will see that it actually
does give the right answer for the wrong reasons.” Finally, the incarnation of Huy-
gens’ principle in quantum mechanics gave rise to some confusing remarks in the
literature. In the classic “Variational Principles in Dynamics and Quantum Theory,”
by W. Yourgrau and S. Mandelstam, we read: “this [quantum mechanics] is the only
discipline of physics which is susceptible to a consistent treatment by Huygens’
concept. The wave equations of classical physics are differential equations of the
second order with respect to time. Huygens’ principle – which determines the wave
function at any time once it is known throughout space at an earlier time – is applicable
without modification exclusively to first-order differential equations” (Yourgrau
and Mandelstam, 1968, p. 136).
To complete our tour of the principle of least action, here we revisit these
questions with the purpose of clarifying the nature of Huygens’ principle at an
introductory level. We show, through explicit calculations, that: a) Huygens’ prin-
ciple can be in fact written in terms of a first-order propagator, provided we treat
the two “components” of the wave-front at a given time (the function and its time
derivative) as the source; b) The future wave-front can be written as an envelope
of spherical waves emerging from each point of the (two component) source wave
function; c) The cancellation of the backwards wave results explicitly from the pre-
cise relation that exists for a wave-front between the wave and its time-derivative;
and d) The secondary wave has effect only where it touches the envelope. We
present the calculations for scalar waves of constant velocity and later argue that,
for an inhomogeneous medium, although the mathematics gets more complicated,
the basic principle can still be applied.
as
$$\frac{\partial}{\partial t}\Psi(x, t) = \begin{pmatrix} 0 & 1 \\ c^2\nabla^2 & 0 \end{pmatrix}\Psi(x, t) \equiv \hat{U}\,\Psi(x, t), \tag{8.86}$$
with the following solution:
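$$u_t(x, 0) = g(x) = \frac{1}{(\sqrt{2\pi})^3}\int d^3k\,\bar{g}(k)\,e^{ik\cdot x}, \tag{8.95}$$
the functions at time $t$ are given by
$$u(x, t) = \frac{1}{(\sqrt{2\pi})^3}\int d^3k\,e^{ik\cdot x}\left[\bar{f}(k)\cos ckt + \frac{\bar{g}(k)}{ck}\sin ckt\right] \tag{8.96}$$
$$u_t(x, t) = \frac{1}{(\sqrt{2\pi})^3}\int d^3k\,e^{ik\cdot x}\left[-ck\,\bar{f}(k)\sin ckt + \bar{g}(k)\cos ckt\right]. \tag{8.97}$$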
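For a single Fourier mode, equations (8.96) and (8.97) amount to multiplying the pair $(\bar{f}, \bar{g})$ by a $2\times 2$ matrix. The sketch below builds that matrix and checks two properties one expects of a first-order propagator: evolving for $t_1$ and then $t_2$ equals evolving for $t_1 + t_2$, and the time derivative of the matrix is the mode-space generator with $c^2\nabla^2 \to -c^2k^2$. The function names are hypothetical and the example is an illustration, not part of the original derivation.

```python
import math


def mode_propagator(k, t, c=1.0):
    """2x2 matrix taking (f_k, g_k) at time 0 to (u_k, u_t,k) at time t, from (8.96)-(8.97)."""
    w = c * k
    return [
        [math.cos(w * t), math.sin(w * t) / w],
        [-w * math.sin(w * t), math.cos(w * t)],
    ]


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][m] * b[m][j] for m in range(2)) for j in range(2)] for i in range(2)]


def evolve(k, t, f_k, g_k, c=1.0):
    """Apply the mode propagator to the two-component initial data."""
    m = mode_propagator(k, t, c)
    return (m[0][0] * f_k + m[0][1] * g_k, m[1][0] * f_k + m[1][1] * g_k)


if __name__ == "__main__":
    k = 2.0
    composed = matmul(mode_propagator(k, 0.3), mode_propagator(k, 0.5))
    direct = mode_propagator(k, 0.8)
    print(composed)
    print(direct)
```

The two printed matrices coincide up to rounding, which is the composition property that a second-order equation acquires once the function and its time derivative are propagated together.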
In other words, the propagation of a second-order equation can be written in
terms of a first-order propagator of a two-component wave function, with two ini-
tial conditions u and u t . Having established the first-order nature of the evolution
operator for the two-state wave-function, in the next subsection we compute the
spatial representation of the evolution operator, and write it in the form:
$$\Psi(x, t) = \int d^3x'\,\hat{G}(x - x', t)\,\Psi(x', 0), \tag{8.98}$$
Figure 8.6 Huygens’ construction: at each point $x'$, the wave-front $u(x', 0)$ and
its time derivative $u_t(x', 0)$ emit spherical wavelets. As a result, the wave-front
$u(x, t)$ is the average of $u(x', 0)$ and $u_t(x', 0)$ on a spherical surface of radius $ct$
centered at $x$. The small arrows are meant to represent the time derivative of the
wave-front.
$$\begin{aligned}
&= \frac{1}{(2\pi)^2\,c\,|x - x'|}\int_0^\infty dk\,\left[\cos k(|x - x'| - ct) + \cos k(|x - x'| + ct)\right] \\
&= \frac{1}{4\pi c}\,\frac{\delta(|x - x'| - ct)}{|x - x'|}.
\end{aligned} \tag{8.101}$$
Notice that the second term, given by $\delta(|x - x'| + ct)$, vanishes for positive $t$
(we are propagating in one direction of time only). The propagator (for $t > 0$) is
therefore given by
$$\hat{G}(x - x', t) = \frac{1}{4\pi c\,|x - x'|}\begin{pmatrix} \delta_t(|x - x'| - ct) & \delta(|x - x'| - ct) \\ \delta_{tt}(|x - x'| - ct) & \delta_t(|x - x'| - ct) \end{pmatrix}. \tag{8.102}$$
So, in a generalized way, Huygens’ principle works, and each point gener-
ates “two-component spherical pulses,” one from the function and one from the
time-derivative. Substituting (8.102) in (8.98), we obtain
$$u(x, t) = \frac{\partial}{\partial t}\left[t\int_S\frac{dS'}{4\pi}\,u(x', 0)\right] + t\int_S\frac{dS'}{4\pi}\,u_t(x', 0), \tag{8.103}$$
where the surface integrals above correspond to the integrals on a spherical surface
of radius ct centered on x. The wave function at (x, t) is therefore a superposition
of spherical wavelets that originate at points from earlier times. We represent this
average pictorially in Figure 8.6. Equation (8.103) appears in specialized books
(Baker and Copson, 1950) and, interestingly, goes back to Poisson (1818) (who
does not cite Huygens).
The choice of $u_t = -cA\,\delta'(z)$ originates in the fact that we expect this condition
to be of a propagating front of the form $u(x, t) = A\,\delta(z - ct)$. Our purpose in this
section is to identify the cancellation of the backwards wave from the contribution
of the spherical wavelets.
Let’s evaluate the averages that appear in Eq. (8.103). Since the problem has
translational invariance in the xy plane, let us compute, without loss of generality,
the wave function at a point (0, 0, z), along the z axis. For the integral in the first
term in (8.103) we have
\[
\begin{aligned}
t \int_S \frac{dS}{4\pi}\, u(x', 0) &= t\, \frac{A}{4\pi}\, 2\pi \int_0^\pi d\theta\, \sin\theta\; \delta(z - ct\cos\theta) \\
&= \frac{A}{2c} \left[ 1 - \Theta(|z| - ct) \right],
\end{aligned} \tag{8.105}
\]
with Θ(x) the Heaviside function. From equation (8.105) we see that the contribution
to the wave-front u at time t originating from the wave-front at t = 0 (the first
term in Eq. (8.103)) is given by:
\[
\begin{aligned}
\frac{\partial}{\partial t} \left[ t \int_S \frac{dS}{4\pi}\, u(x', 0) \right] &= \frac{A}{2}\, \delta(|z| - ct) \\
&\equiv \frac{A}{2} \left[ \delta(z - ct) + \delta(z + ct) \right].
\end{aligned} \tag{8.106}
\]
In other words, the wavelets from the front itself give rise to two sheets, one above
and one below the front, each of them with half the amplitude.
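Now consider the contribution from the second term in equation (8.103):
\[
\begin{aligned}
t \int_S \frac{dS}{4\pi}\, u_t(x', 0) &= -ct\, \frac{A}{2} \int_0^\pi d\theta\, \sin\theta\; \delta'(z - ct\cos\theta) \\
&= -\frac{A}{2} \frac{\partial}{\partial z} \left[ 1 - \Theta(|z| - ct) \right] \\
&\equiv \frac{A}{2} \left[ \delta(z - ct) - \delta(z + ct) \right].
\end{aligned} \tag{8.107}
\]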
The second set of wavelets gives rise to two sheets as well, but of opposite sign.
As a result, adding the two contributions, the backwards wave cancels and the
wave-front propagates in the “forward” direction.
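The cancellation can be checked numerically with a smoothed front. The sketch below is only an illustration under stated assumptions: the delta function is replaced by a narrow Gaussian g of width SIGMA, the amplitude A and the speed C are set to 1, and the function names are hypothetical. It evaluates the two spherical averages of Eq. (8.103) for initial data u(x′, 0) = g(z′) and u_t(x′, 0) = −C g′(z′).

```python
import math

C = 1.0      # wave speed (arbitrary units, an assumption of this sketch)
SIGMA = 0.1  # width of the Gaussian standing in for the delta function


def g(s):
    """Narrow normalized Gaussian, a smooth stand-in for delta(s)."""
    return math.exp(-s * s / (2.0 * SIGMA ** 2)) / (SIGMA * math.sqrt(2.0 * math.pi))


def dg(s):
    """Derivative of g, a smooth stand-in for delta'(s)."""
    return -s / SIGMA ** 2 * g(s)


def sphere_mean(f, z, t, n=4000):
    """Average of f(z') over the sphere of radius C*t centered at (0, 0, z).

    For data depending on z' only, the average reduces to a midpoint-rule
    integral over mu = cos(theta), which is uniform in [-1, 1].
    """
    h = 2.0 / n
    total = sum(f(z + C * t * (-1.0 + (k + 0.5) * h)) for k in range(n))
    return 0.5 * h * total


def front_term(z, t, dt=1e-4):
    """First term of (8.103): d/dt [ t * mean of u(x', 0) ], central difference."""
    f = lambda tt: tt * sphere_mean(g, z, tt)
    return (f(t + dt) - f(t - dt)) / (2.0 * dt)


def derivative_term(z, t):
    """Second term of (8.103): t * mean of u_t(x', 0) = -C * g'(z')."""
    return t * sphere_mean(lambda s: -C * dg(s), z, t)


def u(z, t):
    """Sum of both wavelet contributions at the point (0, 0, z)."""
    return front_term(z, t) + derivative_term(z, t)
```

Evaluating u(1.0, 1.0) and u(-1.0, 1.0) should show the full pulse at z = ct and essentially nothing at z = −ct, while front_term alone gives half a pulse at each location.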
The calculation of this section shows as well that, even though Huygens’ con-
struction calls for a superposition of wavelets, the secondary wave has effect only at
the point (0, 0, z), where the source from (0, 0, 0) touches the envelope. So, due to
the structure of the wave equation, the evolution of the wave-front in fact proceeds
as though light “were emitting light.”
Appendix A
Newton’s Solid of Least Resistance, Using Calculus
Figure A.1 Meridian of the solid of least resistance, from equations (A.12).
(Plot with axes x and y; labeled points O, D, C1, C2, C3.)
1 We changed the sign of the range of q simply in order to plot the curve with positive values of x.
Appendix B
Original Statement of d’Alembert’s Principle
The original statement, as translated from the original French, and given by Fraser
(1985) is:
General Principle:
Given a system of bodies arranged mutually in any manner whatever; let us suppose that a
particular motion is impressed on each of the bodies, that it cannot follow because of the
action of the others, to find that motion that each body should take.
Solution.
Let A, B, C, etc. be the bodies composing the system, and let us suppose that the
motions a, b, c, etc. be impressed on them, and which be forced because of the mutual
action of the bodies to be changed into the motions ā, b̄, c̄ etc. It is clear that the motion a
impressed on the body A can be regarded as composed of the motion ā that it takes, and
of another motion α; similarly, the motions b, c, etc. can be regarded as composed of the
motions b̄, β, c̄, κ; etc.; from which it follows that the motions of the bodies A, B, C, etc.
would have been the same, if instead of giving the impulses a, b, c, one had given
simultaneously the double impulses ā, α, b̄, β, c̄, κ etc. Now by supposition the bodies A, B, C,
etc. took among themselves the motions ā, b̄, c̄, etc. Therefore the motions α, β, κ, etc.
must be such that they do not disturb the motions ā, b̄, c̄, etc., that is, that if the bodies had
received only the motions α, β, κ etc. these motions would have destroyed each other and
the system would remain at rest.
From this results the following principle for finding the motion of several bodies which
act on one another. Decompose the motions a, b, c, etc. impressed on each body into two
others ā, α, b̄, β, c̄, κ etc. which are such that if the motions ā, b̄, c̄, etc. were impressed
alone on the bodies they would retain these motions without interfering with each other;
and that if the motions α, β, κ were impressed alone, the system would remain at rest; it is
clear that ā, b̄, c̄ will be the motions that the bodies will take by virtue of their action.
Appendix C
Equations of Motion of McCullagh’s Ether
Appendix D
Characteristic Function for a Parabolic Keplerian Orbit
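In this appendix, we prove equations (6.92). Call (x, y) and (x′, y′) the coordinates
of r and r′ with respect to the focus.
\[ r^2 = x^2 + y^2 \tag{D.1} \]
\[ r_0^2 = x'^2 + y'^2 \tag{D.2} \]
\[ \tau^2 = (x - x')^2 + (y - y')^2 \tag{D.3} \]
\[ \phantom{\tau^2} \equiv r^2 + r_0^2 - 2 r r_0 \cos\theta \tag{D.4} \]
Call
\[
w = \frac{V}{2\sqrt{\mu m}} = \sqrt{r + r_0 + \tau} - \sqrt{r + r_0 - \tau}.
\]
Then
\[
\frac{\partial w}{\partial x} = \frac{1}{2} \left[ \left( \frac{1}{\sqrt{r + r_0 + \tau}} - \frac{1}{\sqrt{r + r_0 - \tau}} \right) \frac{\partial r}{\partial x}
+ \left( \frac{1}{\sqrt{r + r_0 + \tau}} + \frac{1}{\sqrt{r + r_0 - \tau}} \right) \frac{\partial \tau}{\partial x} \right] \tag{D.5}
\]
\[
= \frac{1}{2} \left[ \left( \frac{1}{\sqrt{r + r_0 + \tau}} - \frac{1}{\sqrt{r + r_0 - \tau}} \right) \frac{x}{r}
+ \left( \frac{1}{\sqrt{r + r_0 + \tau}} + \frac{1}{\sqrt{r + r_0 - \tau}} \right) \frac{x - x'}{\tau} \right]. \tag{D.6}
\]
Using (r + r0 + τ)(r + r0 − τ) = 2rr0(1 + cos θ), the square is
\[
\begin{aligned}
\left( \frac{\partial w}{\partial x} \right)^2 = \frac{1}{8 r r_0 (1 + \cos\theta)} \Bigg\{
& 2 \left[ (r + r_0) - \sqrt{2 r r_0 (1 + \cos\theta)} \right] \frac{x^2}{r^2} \\
& + 2 \left[ (r + r_0) + \sqrt{2 r r_0 (1 + \cos\theta)} \right] \frac{(x - x')^2}{\tau^2} \\
& - 2 \times 2 \times \tau\, \frac{x (x - x')}{r \tau} \Bigg\}.
\end{aligned} \tag{D.7}
\]
Adding the corresponding expression for y,
\[
\begin{aligned}
\left( \frac{\partial w}{\partial x} \right)^2 + \left( \frac{\partial w}{\partial y} \right)^2
= \frac{1}{4 r r_0 (1 + \cos\theta)} \Bigg\{
& \left[ (r + r_0) - \sqrt{2 r r_0 (1 + \cos\theta)} \right]
+ \left[ (r + r_0) + \sqrt{2 r r_0 (1 + \cos\theta)} \right] \\
& - 2\, \frac{x (x - x') + y (y - y')}{r} \Bigg\}
\end{aligned} \tag{D.8}
\]
\[
= \frac{1}{2 r r_0 (1 + \cos\theta)} \left\{ r + r_0 - r + r_0 \cos\theta \right\} \tag{D.9}
\]
\[
= \frac{1}{2r}, \tag{D.10}
\]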
from which equations (6.92) follow immediately (note the symmetry between r
and r0 ).
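As a sanity check of (D.10), the gradient of w can be evaluated by finite differences and compared with 1/(2r). This is a minimal sketch, not part of the proof: the function names are hypothetical, the focus is placed at the origin, and (xp, yp) stands for the primed point (x′, y′).

```python
import math


def w(x, y, xp, yp):
    """w = sqrt(r + r0 + tau) - sqrt(r + r0 - tau), focus at the origin."""
    r = math.hypot(x, y)
    r0 = math.hypot(xp, yp)
    tau = math.hypot(x - xp, y - yp)
    return math.sqrt(r + r0 + tau) - math.sqrt(r + r0 - tau)


def grad_w_squared(x, y, xp, yp, h=1e-6):
    """(dw/dx)^2 + (dw/dy)^2 by central differences in the unprimed point."""
    wx = (w(x + h, y, xp, yp) - w(x - h, y, xp, yp)) / (2.0 * h)
    wy = (w(x, y + h, xp, yp) - w(x, y - h, xp, yp)) / (2.0 * h)
    return wx * wx + wy * wy


def expected(x, y):
    """Right-hand side of (D.10): 1 / (2 r)."""
    return 1.0 / (2.0 * math.hypot(x, y))
```

For generic points, grad_w_squared(x, y, xp, yp) should agree with expected(x, y) whatever the primed point is, which is the content of (D.10).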
Appendix E
Saddle Paths for Reflections on a Mirror
The following exercise illustrates a case where the action is not a minimum.
Consider (see Figure E.1) a light ray that travels in the xy plane from P to Q reflecting
on a mirrored surface whose equation is
\[ y = \frac{1}{2} \alpha x^2. \tag{E.1} \]
The optimal path will consist of two straight segments PR and QR. We choose
points P and Q symmetrical with respect to the y axis. We want to find the
trajectory PRQ that minimizes the time T(x) given by
\[
cT(x) = \sqrt{(x + a)^2 + \left( b - \tfrac{1}{2} \alpha x^2 \right)^2}
+ \sqrt{(x - a)^2 + \left( b - \tfrac{1}{2} \alpha x^2 \right)^2}. \tag{E.2}
\]
We are interested, for concreteness, in the reflection at x = 0. Consider x close
to zero and expand T(x) to obtain:
\[
\begin{aligned}
cT(x) &\simeq \sqrt{(x + a)^2 + b^2 - b\alpha x^2} + \sqrt{(x - a)^2 + b^2 - b\alpha x^2} \\
&= \sqrt{a^2 + b^2 + 2ax + (1 - b\alpha) x^2} + \sqrt{a^2 + b^2 - 2ax + (1 - b\alpha) x^2} \\
&\simeq 2\sqrt{a^2 + b^2} - \frac{b}{\sqrt{a^2 + b^2}} \left( \alpha - \frac{b}{a^2 + b^2} \right) x^2 \\
&= 2\sqrt{a^2 + b^2} - \frac{b}{\sqrt{a^2 + b^2}} \left( \alpha - \alpha_e \right) x^2,
\end{aligned} \tag{E.3}
\]
where αe = b/(a² + b²) corresponds to the curvature of an ellipse E whose foci
are at P and Q and of major and minor semi-axes given by √(a² + b²) and b
respectively. If the curvature α of the mirror is larger than that of the ellipse E (αe), the
law of reflection is followed at x = 0, but the action is a maximum with respect to
small variations of the point R around x = 0. If α = αe, the action is critical, as
expected if P and Q are focal points: the law of reflection is followed for all points
in the mirror. We stress that the action, although being a maximum with respect to
variations of the “reflection” point, cannot be an overall maximum, since the path
PR or QR can always be made longer by adding wiggles to the straight line. The
path length is therefore a minimum or a saddle.
Figure E.1 The action (length) for a light ray reflecting from a curved surface
could be a saddle.
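The sign change at α = αe can be seen numerically from the exact path length (E.2). The following is a rough sketch with hypothetical function names; it assumes P = (−a, b) and Q = (a, b), and estimates the second derivative of cT at x = 0 by a central difference.

```python
import math


def path_length(x, a, b, alpha):
    """cT(x) of (E.2): length P -> R -> Q with R = (x, alpha * x**2 / 2)."""
    y = 0.5 * alpha * x * x
    return math.hypot(x + a, b - y) + math.hypot(x - a, b - y)


def ellipse_curvature(a, b):
    """alpha_e = b / (a^2 + b^2), the curvature read off from (E.3)."""
    return b / (a * a + b * b)


def second_derivative_at_zero(a, b, alpha, h=1e-3):
    """Central-difference estimate of d^2(cT)/dx^2 at x = 0."""
    f = lambda x: path_length(x, a, b, alpha)
    return (f(h) - 2.0 * f(0.0) + f(-h)) / (h * h)


def predicted_second_derivative(a, b, alpha):
    """Twice the x^2 coefficient in (E.3)."""
    return -2.0 * b / math.hypot(a, b) * (alpha - ellipse_curvature(a, b))
```

With a = b = 1 the threshold is αe = 0.5: a mirror with α = 1.0 gives a negative second derivative (local maximum in x), and α = 0.2 gives a positive one (local minimum).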
Appendix F
Kinetic Caustics from Quantum Motion
in One Dimension
In this Appendix we analyze the interesting mapping between the second variation
of the action of a classical path in a one-dimensional potential V(x), and
the quantum mechanical problem of a particle moving in a potential −V″(x) =
−d²V(x)/dx² (Hussein, Pereira, Stojanoff, and Takai, 1980). Let us take the mass
of the particle as unity. The action S, given by
\[ S = \int_0^t dt \left[ \frac{1}{2} \dot{x}^2 - V(x) \right], \tag{F.1} \]
has a vanishing variation, δS = 0, for a path x0(t). If we consider a small variation
around the path x0:
\[ x(t) = x_0(t) + \phi(t), \tag{F.2} \]
the second-order variation of the action is
\[ S_2 = \int_0^t dt \left[ \frac{1}{2} \dot{\phi}^2 - V''(x_0(t))\, \phi^2(t) \right]. \tag{F.4} \]
Integrating by parts the first term, and noting that φ(0) = φ(t) = 0, we obtain:
\[ S_2 = \int_0^T dt\; \phi(t) \left[ -\frac{1}{2} \frac{d^2}{dt^2} + \tilde{V}(t) \right] \phi(t), \tag{F.5} \]
where, for simplicity of notation, we have called T the final time in order to
distinguish it from the intermediate, integrated time t, and Ṽ(t) = −V″(x0(t)). Equation
(F.5) is precisely the expectation value of the energy of a “quantum” particle, if we
think of the variable t as x and take ℏ²/m = 1. The potential in the x (or t) direction
depends on the classical path x0(t) minimizing the action.
The analysis of the possible signs of S2 is reduced now to studying the spectrum
of eigenvalues (the “energies” En) of the problem whose stationary Schrödinger
equation is given by:
\[ \left[ -\frac{1}{2} \frac{d^2}{dt^2} + \tilde{V}(t) \right] \phi_n(t) = E_n \phi_n(t). \tag{F.6} \]
Since the set of eigenfunctions φn(t) is orthonormal, we can write any arbitrary
variation φ(t) as a linear combination of the form
\[ \phi(t) = \sum_{n=0}^{\infty} a_n \phi_n(t), \tag{F.7} \]
so that S2 is the sum of the energies En weighted by the squared coefficients an².
The condition for a caustic or for a kinetic focus (S2 = 0) is then that the
corresponding quantum problem has an eigenvalue En = 0. Let us apply this treatment
to the problem considered in section 6.8.6, where V(x) = λ|x|, giving
\[ \tilde{V}(t) = -V''(x_0(t)) = -\lambda\, \delta(x_0(t)), \tag{F.9} \]
where x0(t) is the classical orbit given by equations (6.219) and (6.220). Since in
the interval 0 < t < 2t0 the particle has a value x0(t) = 0 for t = t0, we write
\[ \delta(x_0(t)) = \frac{\delta(t - t_0)}{|v_0|}. \tag{F.10} \]
Using v0 = λt0/2 the equivalent Schrödinger equation becomes
\[ \left[ -\frac{1}{2} \frac{d^2}{dt^2} - \frac{2}{t_0}\, \delta(t - t_0) \right] \phi_n(t) = E_n \phi_n(t), \tag{F.11} \]
which corresponds to an otherwise free particle with an attractive delta function
potential of intensity 2/t0 located at t = t0. (In addition, in order to impose the
boundary conditions φn(0) = φn(T) = 0, we add infinite potential walls at t = 0
and at t = T.)
First note that if t0 > T the delta function is outside the well, and the quantum
particle is free inside the well. The eigenfunctions are those of the potential well, a
classical textbook quantum problem:
\[ \phi_n(t) \propto \sin\frac{\pi n t}{T}, \tag{F.12} \]
with positive eigenvalues given by:
\[ E_n = \frac{1}{2} \frac{\pi^2 n^2}{T^2}, \qquad (n = 1, 2, \cdots). \tag{F.13} \]
In other words, for times T shorter than t0 we have S2 > 0, and the action is a
minimum. For T > t0, the attractive delta function falls inside the potential well,
and there is the possibility of a zero (or negative) energy. If we integrate equation
(F.11) between t0− and t0+ we obtain:
\[ \phi_n'(t_0^-) - \phi_n'(t_0^+) = \frac{4}{t_0}\, \phi_n(t_0), \tag{F.14} \]
so the delta function imposes a discontinuity in the derivative. We concentrate now
on the possibility of having a state function φ0(t) with zero energy. If we look at
the Schrödinger equation for times t ≠ t0, for E0 = 0 we have simply:
\[ \phi_0''(t) = 0, \tag{F.15} \]
whose nontrivial solution is
\[ \phi_0(t) = a + bt. \tag{F.16} \]
The function φ0 will be of the form (see Figure F.1):
\[
\phi_0(t) =
\begin{cases}
a t & \text{if } 0 < t < t_0; \\[4pt]
a t_0\, \dfrac{T - t}{T - t_0} & \text{if } t_0 < t < T.
\end{cases} \tag{F.17}
\]
In order for φ0 of equation (F.17) to be an allowed solution, it has to satisfy the
discontinuity of the derivative of equation (F.14):
\[ a + a\, \frac{t_0}{T - t_0} = 4a, \tag{F.18} \]
giving
\[ T = \frac{4}{3}\, t_0. \tag{F.19} \]
In other words, for times shorter than 4t0/3 the action is a minimum. For
T = 4t0/3, the second variation is zero, and that point corresponds to the caustic of
this problem (see Figure 6.20).
Figure F.1 Quantum analogue for the second variation of the action for a potential
V(x) = λ|x|. (Sketch of the zero-energy function φ0(t) on 0 < t < T and of the
potential Ṽ(t) = −(2/t0) δ(t − t0).)
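The value T = 4t0/3 can also be recovered by "shooting" the zero-energy solution of (F.11) on a grid. The sketch below is an illustration only: the function name, the grid spacing h, and the lattice representation of the delta function (weight 1/h on a single site) are assumptions of the example, not part of the text.

```python
def first_zero_of_zero_energy_mode(t0, h=1e-3, t_max_factor=3.0):
    """Integrate -(1/2) phi'' - (2/t0) delta(t - t0) phi = 0 from phi(0) = 0.

    Returns the first time T > 0 at which phi vanishes again, i.e. the final
    time for which the boundary conditions phi(0) = phi(T) = 0 allow E = 0.
    """
    i0 = round(t0 / h)          # grid site carrying the delta function
    strength = 2.0 / t0         # delta-function weight appearing in (F.11)
    phi_prev, phi = 0.0, h      # phi(0) = 0 with unit initial slope
    i = 1
    while i * h < t_max_factor * t0:
        v = -strength / h if i == i0 else 0.0   # lattice delta potential
        # Discretized E = 0 equation: phi'' = 2 * V~ * phi
        phi_next = 2.0 * phi - phi_prev + 2.0 * h * h * v * phi
        if phi_next <= 0.0:
            # Linear interpolation for the zero between steps i and i + 1
            return h * (i + phi / (phi - phi_next))
        phi_prev, phi = phi, phi_next
        i += 1
    return None
```

Calling first_zero_of_zero_energy_mode(3.0) should return a value close to 4.0, in agreement with (F.19); before the delta function the solution is the straight line of (F.17), and after it the slope changes from a to −3a.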
Appendix G
Einstein’s Proof of the Covariance
of Maxwell’s Equations
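\[
\frac{1}{c}\, \gamma \left( \frac{\partial}{\partial t'} - v \frac{\partial}{\partial x'} \right) E_z
= \gamma \left( \frac{\partial}{\partial x'} - \frac{v}{c^2} \frac{\partial}{\partial t'} \right) B_y - \frac{\partial B_x}{\partial y'}. \tag{G.3c}
\]
Einstein now uses the vanishing divergence of the electric field, expressed in the
primed coordinates:
\[
\frac{\partial E_x}{\partial x} + \frac{\partial E_y}{\partial y} + \frac{\partial E_z}{\partial z}
= \gamma \left( \frac{\partial}{\partial x'} - \frac{v}{c^2} \frac{\partial}{\partial t'} \right) E_x
+ \frac{\partial E_y}{\partial y'} + \frac{\partial E_z}{\partial z'} = 0, \tag{G.4}
\]
which, substituted in equation (G.3a) gives
\[
\frac{1}{c} \frac{\partial E_x}{\partial t'}
= \frac{\partial}{\partial y'} \left[ \gamma \left( B_z - \frac{v}{c} E_y \right) \right]
- \frac{\partial}{\partial z'} \left[ \gamma \left( B_y + \frac{v}{c} E_z \right) \right]. \tag{G.5}
\]
On the other hand, equations (G.3b) and (G.3c) can be rewritten as
\[
\frac{1}{c} \frac{\partial}{\partial t'} \left[ \gamma \left( E_y - \frac{v}{c} B_z \right) \right]
= \frac{\partial B_x}{\partial z'} - \frac{\partial}{\partial x'} \left[ \gamma \left( B_z - \frac{v}{c} E_y \right) \right], \tag{G.6a}
\]
\[
\frac{1}{c} \frac{\partial}{\partial t'} \left[ \gamma \left( E_z + \frac{v}{c} B_y \right) \right]
= \frac{\partial}{\partial x'} \left[ \gamma \left( B_y + \frac{v}{c} E_z \right) \right] - \frac{\partial B_x}{\partial y'}. \tag{G.6b}
\]
Einstein includes the prefactor ψ(v) because equations (G.5) and (G.6) are
linear in the fields, and therefore they determine the primed fields up to a
multiplicative constant. In order to find ψ(v), he proceeds in the same fashion as in
the transformation of fields. Applying the transformation followed by the inverse
transformation, an operation that brings the fields to the original values, he obtains
ψ(v)ψ(−v) = 1. “From reasons of symmetry” (Einstein, 1952, p. 53) ψ(v) =
ψ(−v), which implies therefore ψ(v) = 1 and we obtain equations (G.8c).
Applying the same logic we obtain:
\[ \frac{1}{c} \frac{\partial \mathbf{B}'}{\partial t'} = -\nabla' \times \mathbf{E}'. \]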
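The "transformation followed by the inverse transformation" step is easy to try out numerically. The sketch below assumes that, with ψ(v) = 1, the primed fields are the combinations that appear inside the brackets of (G.5) and (G.6); the function name and the use of β = v/c as the argument are choices made for this example.

```python
import math


def boost_fields(E, B, beta):
    """Fields seen from a frame moving with velocity v = beta * c along x.

    Assumes psi(v) = 1 and the combinations read off from (G.5) and (G.6):
    the x components are unchanged, the transverse components mix.
    """
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    E_prime = (Ex, gamma * (Ey - beta * Bz), gamma * (Ez + beta * By))
    B_prime = (Bx, gamma * (By + beta * Ez), gamma * (Bz - beta * Ey))
    return E_prime, B_prime


def dot(u, v):
    """Euclidean dot product of two 3-tuples."""
    return sum(ui * vi for ui, vi in zip(u, v))
```

Applying boost_fields with β and then with −β returns the original fields, which is the numerical counterpart of ψ(v)ψ(−v) = 1; the quantities E·B and E² − B² are also left unchanged by the transformation.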
Appendix H
Relativistic Four-Vector Potential
Equations (G.8c) for the transformation of the electromagnetic fields are different
in structure from the transformation of the coordinates x′ = γ(x − vt), y′ = y,
z′ = z, t′ = γ(t − vx/c²): the fields change in the transverse direction but remain
unchanged in the direction of motion. Now, we can write the six components of the
fields E and B in terms of two potentials, a scalar φ(x, t) and a vector A(x, t):
\[ \mathbf{E} = -\nabla\phi - \frac{1}{c} \frac{\partial \mathbf{A}}{\partial t}, \tag{H.1a} \]
\[ \mathbf{B} = \nabla \times \mathbf{A}. \tag{H.1b} \]
Since φ and A have the same units let us write the transformation of coordinates
in terms of x, y, z and t̄, with t̄ = ct having units of length:
\[ x' = \gamma (x - \beta \bar{t}), \qquad y' = y, \qquad z' = z, \qquad \bar{t}' = \gamma (\bar{t} - \beta x), \tag{H.2} \]
where β = v/c. Using the chain rule for the derivatives we have:
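\[
\frac{\partial}{\partial \bar{t}} = \frac{\partial \bar{t}'}{\partial \bar{t}} \frac{\partial}{\partial \bar{t}'} + \frac{\partial x'}{\partial \bar{t}} \frac{\partial}{\partial x'}
= \gamma \left( \frac{\partial}{\partial \bar{t}'} - \beta \frac{\partial}{\partial x'} \right), \tag{H.3a}
\]
\[
\frac{\partial}{\partial x} = \frac{\partial x'}{\partial x} \frac{\partial}{\partial x'} + \frac{\partial \bar{t}'}{\partial x} \frac{\partial}{\partial \bar{t}'}
= \gamma \left( \frac{\partial}{\partial x'} - \beta \frac{\partial}{\partial \bar{t}'} \right), \tag{H.3b}
\]
\[ \frac{\partial}{\partial y} = \frac{\partial}{\partial y'}, \tag{H.3c} \]
\[ \frac{\partial}{\partial z} = \frac{\partial}{\partial z'}. \tag{H.3d} \]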
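Let us consider the transformation laws for the fields obtained by Einstein,
equations (G.8c), and express them in terms of the potentials of equations (H.1b). For
the electric field in the x direction we have:
\[
\begin{aligned}
E'_x = E_x &= -\frac{\partial \phi}{\partial x} - \frac{\partial A_x}{\partial \bar{t}} \\
&= -\gamma \left( \frac{\partial}{\partial x'} - \beta \frac{\partial}{\partial \bar{t}'} \right) \phi
- \gamma \left( \frac{\partial}{\partial \bar{t}'} - \beta \frac{\partial}{\partial x'} \right) A_x \\
&= -\frac{\partial}{\partial x'} \underbrace{\gamma (\phi - \beta A_x)}_{\phi'}
- \frac{\partial}{\partial \bar{t}'} \underbrace{\gamma (A_x - \beta \phi)}_{A'_x},
\end{aligned} \tag{H.4}
\]
\[ E'_x = -\frac{\partial \phi'}{\partial x'} - \frac{\partial A'_x}{\partial \bar{t}'}, \tag{H.5} \]
with φ′ = γ(φ − βA_x) and A′_x = γ(A_x − βφ): the transformation properties of
(φ, A_x) are the same as those of (ct, x). For the field in the y component, we have:
\[ E'_y = \gamma \left( E_y - \beta B_z \right) \tag{H.6} \]
\[
= -\gamma \left( \frac{\partial \phi}{\partial y} + \frac{\partial A_y}{\partial \bar{t}} \right)
- \gamma \beta \left( \frac{\partial A_y}{\partial x} - \frac{\partial A_x}{\partial y} \right) \tag{H.7}
\]
\[
= -\frac{\partial}{\partial y'} \underbrace{\gamma (\phi - \beta A_x)}_{\phi'}
- \underbrace{\gamma \left( \frac{\partial}{\partial \bar{t}} + \beta \frac{\partial}{\partial x} \right)}_{\partial / \partial \bar{t}'} A_y \tag{H.8}
\]
\[ = -\frac{\partial \phi'}{\partial y'} - \frac{\partial A'_y}{\partial \bar{t}'}, \tag{H.9} \]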
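The same calculation can be done for all the components of the electric and
magnetic fields. We conclude that, if in one frame the fields are given in terms of the
potentials as:
\[ \mathbf{E} = -\nabla\phi - \frac{1}{c} \frac{\partial \mathbf{A}}{\partial t}, \tag{H.14a} \]
\[ \mathbf{B} = \nabla \times \mathbf{A}, \tag{H.14b} \]
in a primed frame moving at velocity v with respect to the first one, the fields are
given by
\[ \mathbf{E}' = -\nabla' \phi' - \frac{1}{c} \frac{\partial \mathbf{A}'}{\partial t'}, \tag{H.15a} \]
\[ \mathbf{B}' = \nabla' \times \mathbf{A}', \tag{H.15b} \]
with
\[ \phi' = \gamma (\phi - \beta A_x), \tag{H.16a} \]
\[ A'_x = \gamma (A_x - \beta \phi), \tag{H.16b} \]
\[ A'_y = A_y, \tag{H.16c} \]
\[ A'_z = A_z. \tag{H.16d} \]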
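Equations (H.16) have the same form as the transformation of (ct, x, y, z), so the combination φ² − |A|² should not depend on the frame. A minimal sketch of this check follows; the function names are hypothetical and β = v/c is passed directly.

```python
import math


def boost_potentials(phi, A, beta):
    """Apply (H.16a)-(H.16d) for a frame moving with v = beta * c along x."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    Ax, Ay, Az = A
    phi_prime = gamma * (phi - beta * Ax)
    A_prime = (gamma * (Ax - beta * phi), Ay, Az)
    return phi_prime, A_prime


def invariant(phi, A):
    """phi^2 - |A|^2, the analogue of (ct)^2 - x^2 - y^2 - z^2."""
    return phi * phi - sum(a * a for a in A)
```

For example, boost_potentials(2.0, (1.0, 0.5, -0.3), 0.6) changes φ and A_x but leaves invariant(φ, A) unchanged, and boosting back with −0.6 recovers the starting values.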
Appendix I
Ehrenfest’s Proof of the Adiabatic Theorem
(Figure: the orbits x1(t) and x2(t), with the points x1(0) = x1(tA), x2(0) = x2(tB)
and x2(tA), and the separations δ(0) and δ(tA), marked.)
are solutions of the equations of motion for time-independent, fixed values of the
respective parameters. The respective orbits have periods t A and t B :
x1 (t A ) = x1 (0), (I.5a)
x2 (t B ) = x2 (0), (I.5b)
x2 (t) = x1 (t) + δ(t). (I.5c)
or which is equivalent,
δ ωT = 0, (I.16)
where T is the average kinetic energy during a period t P .
References
Berry, M. V., and Jeffrey, M. R. 2007. Conical Diffraction: Hamilton’s Diabolical Point
at the Heart of Crystal Optics. In Progress in Optics 50, edited by E. Wolf: 13–50.
Elsevier B. V.
Berry, M.V. 2015. Nature’s Optics and Our Understanding of Light. Contemp. Phys. 56:
2–16.
Blanchard, P. and Brüning, E. 1982. Variational Methods in Mathematical Physics: A
Unified Approach. Springer-Verlag.
Blåsjö, V. 2005. The Isoperimetric Problem. American Mathematical Monthly 112, June-
July: 526–566.
Bloch, A. 2003. Nonholonomic Mechanics and Control. With J. Baillieul, P. Crouch., J.
Marsden., and D. Zenkov. 2nd edn., 2015. Springer.
Bohr, N. 1913a. On the Constitution of Atoms and Molecules: Part I. Phil. Mag. 26 (6):
1–25.
Bohr, N. 1913b. On the Constitution of Atoms and Molecules: Part II, Systems Containing
Only a Single Nucleus. Phil. Mag. 26 (6): 476–502.
Bohr, N. 1913c. On the Constitution of Atoms and Molecules: Part III, Systems Containing
Several Nuclei. Phil. Mag. 26 (6): 857–875.
Bohr, N. 1922. The Theory of Spectra and Atomic Constitution: Three Essays. Cambridge
University Press.
Boltzmann, L. 1872. Further Studies on the Thermal Equilibrium of Gas Molecules.
Reprinted in Brush, S. G. 2003. The Kinetic Theory of Gases, an Anthology of
Classic Papers with Historical Commentary. Imperial College Press: 262–349. Orig-
inally published under the title Weitere Studien über das Wärmegleichgewicht unter
Gasmolekülen. Sitzungberichte Akad. Wiss. Vienna, part II, 66: 275–370.
Born, M. 1926. The Problems of Atomic Dynamics: 75. Dover Publications.
Born, M., and Wolf, E. 1999. Principles of Optics, 7th ed., Cambridge University
Press.
Boyer, C. B. 1987. The Rainbow: From Myth to Mathematics. Princeton University Press.
Brackenridge, J. B. 1996. The Key to Newton’s Dynamics: The Kepler Problem and the
Principia. University of California Press.
Breger, H. 1994. The Mysteries of Adaequare: A Vindication of Fermat. Arch. Hist. Exact
Sci. 46 (3): 193–219.
Briggs, J. S., and Rost, J. M. 2001. On the Derivation of the Time-Dependent Equation of
Schrödinger. Foundations of Physics 31: 693–712.
Brillouin, Marcel. 1919. Actions mécaniques à hérédité discontinue par propagation; essai
de théorie dynamique de l’atome à quanta. Comptes rendus 168: 1318–1320.
de Broglie, L. 1923a. Ondes et quanta. Comptes rendus 177: 507–510.
de Broglie, L. 1923b. Quanta de lumière, diffraction et interférences. Comptes rendus 177:
548–550.
de Broglie, L. 1923c. Les quanta, la théorie cinétique des gaz et le principe de Fermat.
Comptes rendus 177: 630–632.
de Broglie, L. 1924a. A Tentative Theory of Light Quanta. Philosophical Magazine 47:
446–458.
de Broglie, L. 1924b. Ph. D. Thesis. Université de Paris.
Brunet, P. 1938. Etude historique sur le principe de la moindre action. Hermann & Cie,
Éditeurs.
Bruns, H. 1895. The Eikonal. Translated by D. H. Delphenich. S. Hirzel.
Butterfield, J. 1995. On Hamilton-Jacobi Theory as a Classical Root of Quantum Theory.
In Quo Vadis Quantum Mechanics? Edited by Elitzur, A. C., Dolev, S. and Kolenda,
N., 239–273. Springer.
Byers, N. 1978. E. Noether’s Discovery of the Deep Connection Between Symmetries and
Conservation Laws. Arxiv. hep-th 980744.
Cajori, F. 1729/1934. Sir Isaac Newton’s Mathematical Principles of Natural Philosophy
and his System of the World (Andrew Motte’s translation of the Principia, of 1729,
revised.) University of California Press.
Capecchi, D. 2012. History of Virtual Work Laws. A History of Mechanics Prospective.
Springer Verlag.
Carathéodory, C. 1937. The Beginning of Research in the Calculus of Variations. Osiris,
3: 224–240.
Cauchy, A. 1830. Oeuvres complètes d’Augustin Cauchy. Ser. 1, vol. 9, p. 410.
Cauchy, A. 1843. Mémoire sur les dilatations, les condensations et les rotations produites
par un changement de forme dans un système de points matériels. Comptes rendus,
Vol. 16, p. 12. Reprinted in Oeuvres Complètes d’Augustin Cauchy. Vol. 7 (1892),
235–246.
Chandrasekhar, S. 1995. Newton’s Principia for the Common Reader. Oxford University
Press.
Cohen, I. B. 1974. Isaac Newton, The Calculus of Variations, and the Design of Ships. In
For Dirk Struik, Scientific, Historical and Political Essays in Honor of Dirk J. Struik.
Edited by 000, 169–187. Boston Studies in the Philosophy of Science Vol. 15. Reidel
Publishing Company.
Cohen, I. B., and Whitman, A. 1999. A Guide to Newton’s Principia. In Isaac Newton, The
Principia, p. 46. University of California Press.
Cohen, M., and Drabkin, I. E. 1965. A Source Book in Greek Science. Harvard University
Press, 271–272.
Courant, R., and Robbins, H. 1996. What is Mathematics? An Elementary Approach to
Ideas and Methods. Revised by Ian Stewart from the original 1941 edition. Oxford
University Press.
D’Alembert, J. L. R. 1743. Traité de dynamique.
D’Alembert, J. L. R. 1755. Equilibre. In Encyclopédie ou Dictionnaire raisonné des sci-
ences, des arts et des métiers. Available online at http://encyclopedie.uchicago.edu/.
D’Alembert, J. L. R. 1758. Traité de dynamique (Second edition). First edition 1743.
Damianus, 1897. Schrift über Optik: Mit Auszügen aus Geminos, Edited by R. Schöne,
Berlin, Reichsdruckerei.
Darrigol, O. 2010. James MacCullagh’s ether: An optical route to Maxwell’s equations?
European Physical Journal H, 35: 133–172.
Darrigol, O. 2012. A History of Optics from Greek Antiquity to the Nineteenth Century.
Oxford University Press.
Darrigol, O. 2014. Physics and Necessity: Rationalist Pursuits from the Cartesian Past to
the Quantum Present. Oxford University Press.
De la Chambre, M. C. 1662. La Lumière, 313–314. Available in Google Books.
Descartes, R. 1637. Discours de la méthode pour bien conduire sa raison, et chercher la
vérité dans les sciences plus la dioptrique, les météores et la géométrie qui sont des
essais de cette méthode. Leyde: Maire. p. 73.
Dirac, P. A. M. 1933. The Lagrangian in Quantum Mechanics. Phys. Zeits. Sowjetunion,
3 (1): 64–72.
Drago, A. 1993. The Principle of Virtual Works as a Source of Two Traditions in 18th
Century Mechanics. In Bevilacqua F. (ed.), 1992. History of Physics in Europe in the
19th and 20th Centuries, Como, Italy, 1992, F. Bevilacqua ed., Società Italiana di
Fisica, Bologna, 69–80, Bologna.
Drake, S. 1978. Galileo at Work: His Scientific Biography. University of Chicago Press.
Euler, L. 1751. Dissertation sur le principe de la moindre action, avec l’examen des objec-
tions de M. le Professeur Koenig faites contre ce principe. Berlin. Bilingual edition
available online at http://eulerarchive.maa.org/docs/originals/E186a.pdf
Euler, L. 1752. Harmonie entre les principes generaux de repos et de mouvement de M. de
Maupertuis. Mémoires de l’Académie des sciences de Berlin, 7: 169–198. Available
at http://eulerarchive.maa.org/.
Evans, J. and Rosenquist, M. 1986. “F = ma” optics. Am. J. Phys. 54: 876–882.
Fermat, P. 1657/1894. Œuvres. Vol. 1. Translated by Paul Tannery Correspondance. Paris.
Available in Google books.
Fermat, P. 1657/1894b. Œuvres. Vol. 3. Translated by Paul Tannery. Paris. Available in
Google books. 149–151.
Feynman, R. 1942/2005. Feynman’s Thesis–A New Approach to Quantum Theory. Edited
by Laurie M Brown. World Scientific. Available online at https://cds.cern.ch/record/
101498/files/Thesis-1942-Feynman.pdf
Feynman, R. 1965. “The Development of the Space-Time View of Quantum Electrody-
namics.” Nobel Lecture. http://www.nobelprize.org/nobel_prizes/physics/laureates/
1965/feynman-lecture.html.
Feynman, R. 1963. The Feynman Lectures on Physics. Vol. I, Addison Wesley. Section
26–3.
Feynman, R. 2013. The Feynman Lectures on Physics. Vol. II, The Millennium Edition.
http://www.feynmanlectures.caltech.edu/II_01.html.
FitzGerald, G. F. 1879. On the electromagnetic theory of the reflection and refraction of
light. Proceedings of the Royal Society. Reprinted in FitzGerald, G. F. 1902. The
Scientific Papers of the Late G. F. FitzGerald, ed. J. Larmor. Dublin, 41–44.
Fraser, C. 1983. J. L. Lagrange’s Early Contributions to the Principles and Methods of
Mechanics. Archive for History of Exact Sciences, 28: 197–241.
Fraser, C. 1985. D’Alembert’s Principle: The Original Formulation and Application in Jean
d’Alembert’s Traité de Dynamique (1743), parts 1 and 2. Centaurus 28: 31–61.
Galilei, G. 1600/1960. On Motion and On Mechanics. Translated with Introduction and
notes by I. E. Drabkin and Stillman Drake. University of Wisconsin Press.
Galilei, G. 1638/1974. Discourses and Mathematical Demonstrations Two New Sciences.
Translated by Stillman Drake. University of Wisconsin Press.
Gamow, R. I. 1966. Thirty Years That Shook Physics: The Story of Quantum Theory. Dover
Publications.
Gandz, S. 1940. Studies in Babylonian Mathematics III: Isoperimetric Problems and the
Origin of the Quadratic Equations. Isis, 32: 103–115.
Gauss, C. F. 1829. Über ein neues allgemeines Grundgesetz der Mechanik. Crelles Journal,
4: 232–235.
Giusti, E. 2009. Les méthodes des maxima et minima de Fermat. Ann. Fac. Sci. Toulouse
Math. 18 (6): Fascicule Special, 59–85.
Goldenbaum, U. 2016. Ein gefälschter Leibnizbrief?: Plaidoyer für seine Authentizität.
Wehrhahn Verlag, Hannover.
Goldenbaum, U. 2017. Private communication.
Goldstine, H. H. 1980. A History of the Calculus of Variations from the 17th through the
19th Century. Springer Verlag.
Gould, S. H. 1985. Newton, Euler, and Poe in the Calculus of Variations. In Differential
Geometry, Calculus of Variations, and their Applications. Edited by Rassias, G. M.
and Rassias T. M., 267–282. Marcel Dekker, Inc., 1985.
Gouy, L. G. 1890. Sur une propriété nouvelle des ondes lumineuses. Comptes rendus
hebdomadaires des séances de l’Académie des sciences, 110: 1251–1253.
Gouy, L. G. 1891. Sur la propagation anomale des ondes. Annales de chimie et de physique
24 6e série, 145–213.
Graves, R. P. 1882. Life of Sir William Rowan Hamilton, Andrews Professor of Astronomy
in the University of Dublin, and Royal Astronomer of Ireland, Including Selections
from his Poems, Correspondence, and Miscellaneous Writings. Hodges, Figgis, Vol. I.
Gray, J. 1993. Möbius’s Geometrical Mechanics. In Möbius and His Band. Edited by
Fauvel, J. R. Flood, R. and Wilson, R., 79–103 Oxford University Press.
Gray, C. G. and Taylor, E. F. 2007. When Action Is Not Least. Am. J. Phys. 75: 434–458.
Gray, C. G. 2009. Principle of Least Action. Scholarpedia, 4 (12): 8291. Available at
http://www.scholarpedia.org/article/Principle_of_least_action.
Green, G. 1838. On the Laws of Reflection and Refraction of Light at the Common Surface
of Two Non-Crystallized Media (read 11 Dec 1837). Transactions of the Cambridge
Philosophical Society. Also in The Mathematical Papers of the Late George Green.
MacMillan and Co. 243–269.
Hall, A. R. and Hall, M. B. 1888/1962. A Catalog of the Portsmouth Collection of Books
and Papers Written by or Belonging to Sir Isaac Newton. Cambridge University
Press.
Hamilton, W. R. 1823. On a General Method of Expressing the Paths of Light, and of the
Planets, by the Coefficients of a Characteristic Function. Dublin University Review
and Quarterly Magazine, 1: 795–826.
Hamilton, W. R. 1833. On Some Results of the View of a Characteristic Function in Optics.
Report of the Third Meeting of the British Association for the Advancement of Science
held at Cambridge in 1833 John Murray, 360–370.
Hamilton, W. R. 1834. On a General Method in Dynamics. Philosophical Transactions of
the Royal Society, part II, 247–308.
Hamilton, W. R. 1834b. On the Application to Dynamics of a General Mathematical
Method Previously Applied to Optics. British Association Report, 513–518.
Hamilton, W. R. 1835. Second Essay on a General Method in Dynamics. Philosophical
Transactions of the Royal Society, I: 95–144.
Hamilton, W. R. 1837. Third Supplement to an Essay on the Theory of Systems of Rays.
Transactions of the Royal Irish Academy, 17: 1–144.
Hanc, J. and Taylor, E. F. 2004. From Conservation of Energy to the Principle of Least
Action: A Story Line. Am. J. Phys., 72: 514–521.
Hanc, J., Taylor, E. F. and Tuleja, S. 2004. Deriving Lagrange’s Equations Using
Elementary Calculus. Am. J. Phys., 72: 510–513.
Hanc, J., Taylor, E. F. and Tuleja, S. 2005. Variational Mechanics in One and Two
Dimensions. Am. J. Phys., 73: 603–610.
Hanc, J., Tuleja, S. and Hancova, M. 2004. Simple Derivation of Newtonian Mechanics
from the Principle of Least Action. Am. J. Phys., 71: 386–391.
Hankins, T. L. 1967. The Reception of Newton’s Second Law of Motion in the 18th
Century. Archives internationales d’histoire des sciences, 20: 55–56.
Hankins, T. L. 1970. Jean d’Alembert, Science and the Enlightenment. Clarendon Press.
Hankins, T. L. 1980. Sir William Rowan Hamilton. The Johns Hopkins University Press.
Hardy, G. H. 1967. A Mathematician’s Apology. Cambridge University Press.
Hawking, S. and Ellis, G.F.R. 1973. The Large Scale Structure of Space-Time. Cambridge
University Press. pp. 365–368.
Heath, T. 1921. A History of Greek Mathematics. Vol. 2. Oxford University Press, pp.
206–213.
Heath, T. 1926. The Thirteen Books of Euclid’s Elements. Books 1–2. Cambridge
University Press.
Heilbron, J. 2013. The Path to the Quantum Atom. Nature, 498: 27–30.
Helmholtz, H., von. 1887. Zur Geschichte des Princips der kleinsten Action.
Helmholtz, H., von. 1892. Das Princip der kleinsten Wirkung in der Electrodynamik.
Annalen der Physik, 283: 1–26.
Heronis Alexandrini. 1976. Opera Qvae Supersvnt Omnia. Vol. 2. Teubner, 396. Available
at http://gallica.bnf.fr/ark:/12148/bpt6k25187r
Hertz, H. R. 1894/1956. Gesammelte Werke, Vol. 3 Die Prinzipien der Mechanik in neuem
Zusammenhange dargestellt, Barth. English Translation: Dover.
Herzberger, M. 1936. On the Characteristic Function of Hamilton, the Eiconal of Bruns,
and Their Use in Optics. J. Opt. Soc. Am., 26: 177–178.
Hildebrandt S. and Tromba A. 1985. Mathematics and Optimal Form. Scientific American
Books.
Holm, D. D. 2008. Geometric Mechanics. Imperial College Press.
Hussein, M. S., Pereira, J. G., Stojanoff, V., and Takai, H. 1980. The Sufficient Condition
for an Extremum in the Classical Action Integral as an Eigenvalue Problem. American
Journal of Physics, 48: 767–770.
Huygens, C. 1673. Horologium Oscillatorium; sive, de Motu Pendulorum ad Horologia
Aptato Demonstrationes Geometricae.
Huygens, C. 1690/1945. Treatise on Light. In Which Are Explained the Causes of That
Which Occurs in Reflexion, & in Refraction. And Particularly in the Strange Refrac-
tion of Iceland Crystal. Rendered into English by Silvanus P. Thompson. University
of Chicago Press, 42–44.
Jacobi, C. G. 1837. Über die Reduction der Integration der partiellen Differentialgleichun-
gen erster Ordnung zwischen Irgend einer Zahl Variabeln auf die Integration eines
einzigen Systemes gewohnlicher Differentialgleichungen. Journal für die Reine und
Angewandte Mathematik, 17: 97–162. In Werke, 4: 57–127.
Jacobi, C. G. 1837. Zur Theorie der Variationensrechnung und der Differential-Gleichungen.
J. f. Math. XVII: 68–82. An English translation is given in Todhunter
(1861/2005): p. 243.
Jacobi, C. G. 1884. Vorlesungen über Dynamik. For the English version see Balagan-
gadharan, K. (translator), 2009. Jacobi’s Lectures on Dynamics. Hindustan Book
Agency.
Jammer, M. 1999. Concepts of Force. Dover.
Jourdain, P. E. B. 1912. Maupertuis and the Principle of Least Action. The Monist, 22:
414–459.
Jouguet, E. 1908. Lectures de Mécanique. La Mécanique Enseignée par les Auteurs
Originaux. Vol. 1. Gauthier-Villars.
Klein, F. 1918. Über die Differentialgesetze für die Erhaltung von Impuls und Energie
in die Einsteinschen Gravitationstheorie. Nachr. d. Konig. Gesellsch. d. Wiss. zu
Gottingen Math-phys. Klasse.
Klein, M. J. 1970. Paul Ehrenfest, The Making of a Theoretical Physicist. Vol. 1. North
Holland.
Knobloch, E. 2012. Leibniz and the Brachistochrone. Documenta Mathematica. Extra
Volume ISMP, 15–18.
König, S. 1751. De universali principio aequilibrii et motus. Nova acta eruditorum,
162–176. Available at http://gallica.bnf.fr/.
Kragh, H. 1982. Erwin Schrödinger and the Wave Equation: The Crucial Phase. Centaurus,
26: 154–197.
Kragh, H. 1985. The Fine Structure of Hydrogen and the Gross Structure of the Physics
Community, 1916–26. Historical Studies in the Physical Sciences, 15: No. 2, 67–125.
Kuhn, T. S. and Heilbron, J. L. 1969. The Genesis of the Bohr Atom. Historical Studies
in the Physical Sciences, 1: 211–290.
Kuhn, T. S. 1978. Black-body Theory and the Quantum Discontinuity: 1894–1912. Oxford
University Press.
Lagrange, J. L. 1760/1761. Application de la méthode exposé dans le mémoire précédente
à la solution des problèmes de dynamique differents. Miscellanea Taurinensia,
196–298.
Lagrange, J. L. 1764. Recherches sur la libration de la Lune. Œuvres de Lagrange, Vol. 6,
5–61. Available at http://gallica.bnf.fr/.
Lagrange, J. L. 1768. Mécanique Analytique, Seconde Partie. The equations first appeared
in Miscell. Tourin, 11.
Lagrange, J. L. 1811/1995. Analytical Mechanics. Translated by A. Boissonnade and V. N.
Vagliente from the Mécanique analytique. New edn. 1811. Springer.
Lamb, H. 1900. On a Peculiarity of the Wave-System due to the Free Vibrations of a
Nucleus in an Extended Medium. Proc. London Math. Soc., 53: 208–211.
Lanczos, C. 1962. The Variational Principles of Mechanics. Second edn. University of
Toronto Press, first edn. 1949.
Landsman, N. P. 2007. Between Classical and Quantum. In Handbook of the Philosophy of
Science, Vol. 2: Philosophy of Physics, Edited by John Earman & Jeremy Butterfield,
417–554. North Holland.
Laplace, P. S. 1799. Traité de mécanique céleste. Vol. 1, First Part, Book 2. 165 ff.
Larmor, J. 1893. A Dynamical Theory of the Electric and Luminiferous Medium.
Proceedings of the Royal Society of London, 14: 438–461.
Leibniz, G. W. 1682. Unicum Opticae, Catoptricae & Dioptricae Principium. Acta
eruditorum. June. Reprinted in Acta Eruditorum. Vol. 1. Johnson Reprint
Corporation. English translation by Jeffrey K. McDonough. Available at
http://philosophyfaculty.ucsd.edu/faculty/rutherford/Leibniz/unitary-principle.htm.
Leibniz, G. W. 1962. Mathematische Schriften, Vol. 3/1. Edited by C. I. Gerhardt. Reprint.
Georg Olms Verlagsbuchhandlung, Hildesheim.
Leibniz, G. W. 1696/1952. Tentamen Anagogicum. In Philosophical Papers and Letters.
Translated by Loemker, L. E., 777–788. University of Chicago Press.
Lenz, W. 1924. Über den Bewegungsverlauf und die Quantenzustände der gestörten
Keplerbewegung. Zeitschrift für Physik, 24: 197–207.
Levi, M. 2002. Lectures on Geometrical Methods in Mechanics. In Classical and Celestial
Mechanics. Edited by H. Cabral and F. Diacu, 239–280, Princeton University Press.
Levi, M. 2012. The Mathematical Mechanic: Using Physical Reasoning to Solve Problems.
Princeton University Press.
Lewis, A. 1998. The Geometry of the Gibbs-Appell Equations and Gauss’ Principle of
Least Constraint. Reports on Math. Phys., 38: 11–28.
Liberzon, D. 2012. Calculus of Variations and Optimal Control Theory. Princeton
University Press.
Lloyd, H. 1833. On the Phenomena Presented by Light in its Passage along the Axes of
Biaxial Crystals. Trans. R. Irish Acad., 17: 145–158. Reprinted in Lloyd, H., 1877.
Miscellaneous Papers Connected with Physical Science. Longman Green: 1–18.
Lohne, J. 1959. Thomas Harriott (1560–1621), The Tycho Brahe of Optics. Centaurus,
6 (2): 113–121.
Lorentz, H. A. 1892. La théorie électromagnétique de Maxwell et son application aux corps
mouvants. E.J. Brill.
Lorentz, H. A. 1895. Michelson’s Interference Experiment. Reprinted in Einstein (1952),
1–7.
Lorentz, H. A. 1903. Contributions to the Theory of Electrons, Proc. Roy. Acad.
Amsterdam. 608: 132–154.
Lyssy, A. 2015. L’Économie de la nature—Maupertuis et Euler sur le Principe de Moindre
Action, Philosophiques, 42: 31–51.
Lyusternik, L. A. 1964. Shortest Paths, Variational Problems. MacMillan.
Mach, E. 1960. The Science of Mechanics: Account of its Development. Translated by
Thomas J. McCormack. Sixth edn. Open Court Publishing Company.
Mariotte, E. 1673. Traité de la percussion ou chocq des corps, dans lequel les
principales règles du mouvement contraires à celles que Mr. Des Cartes, & quelques autres
modernes ont voulu establir, sont demonstrées par leurs veritables causes.
Mark Smith, A. 1982. Ptolemy’s Search for a Law of Refraction: A Case-Study in the
Classical Methodology of ‘Saving the Appearances’ and Its Limitations. Archive for
History of Exact Sciences, 26: No. 3, 221–240.
Mark Smith, A. 2009. Alhacen on Refraction: A Critical Edition, with English
Translation and Commentary, of Book 7 of Alhacen’s De Aspectibus. Transactions of the
American Philosophical Society, 100 (3): 213–331.
Marsden, J. E. and T. S. Ratiu. 1999. Introduction to Mechanics and Symmetry. Springer-
Verlag, Texts in Applied Mathematics, 17; First Edition 1994, Second Edition, 1999.
de Maupertuis, P. L. M. 1744. Accord de différentes loix de la nature, qui avoient jusqu’ici
paru incompatibles. Mémoires de l’Académie Royale des Sciences (Paris), 417–426.
Reprinted in Oeuvres, 4: 1–23. Reprografischer Nachdruck der Ausg. (1768).
de Maupertuis, P. L. M. 1746. Les Loix du mouvement et du repos déduites d’un principe
métaphysique. Histoire de l’Académie Royale des Sciences et des Belles Lettres,
267–294.
Mayer, A. 1877. Geschichte des Princips der kleinsten Action. Leipzig. Available in
Google books.
MacCullagh, J. 1846. An Essay towards a Dynamical Theory of Crystalline Reflexion and
Refraction (read 9 Dec. 1839). The Transactions of the Royal Irish Academy, 21:
17–50.
McDonough, J. K. 2008. Leibniz’s two realms revisited. Noûs, 42 (4): 673–696.
McDonough, J. K. 2009. Leibniz on Natural Teleology and the Laws of Optics. Philosophy
and Phenomenological Research, 78 (3): 505–544.
Mehra, J. and Rechenberg, H. 1982. The Historical Development of Quantum Theory. Part
I, Springer-Verlag, 58.
Möbius, A. F. 1837. Lehrbuch der Statik. Part 2, 217–313.
Moore, T. A. 2004. Getting the Most Action Out of Least Action: A Proposal. Am. J. Phys.
72: 522–527.
Motte, A. 1729. Mathematical Principles of Natural Philosophy by Sir Isaac Newton,
translated into English. Vol. 2, Appendix pp. i–vii.
Nakane, M. and Fraser, C. G. 2002. The Early History of Hamilton-Jacobi Dynamics
1834–1837. Centaurus, 44: 61–227.
Nadderd, L., Davidovic, M. and Davidovic, D. 2014. A direct derivation of the relativistic
Lagrangian for a system of particles using d’Alembert’s principle. Am. J. Phys., 82:
1083–1086.
Nauenberg, M. 1994. Hooke, Orbital Motion, and Newton’s Principia. American Journal
of Physics 62 (4): 331–350.
Neimark, J. I. and N. A. Fufaev. 1972. Dynamics of Nonholonomic Systems. Translations
of Mathematical Monographs, AMS, 33.
Navarro, L. and Pérez, E. 2006. Paul Ehrenfest: The Genesis of the Adiabatic Hypothesis,
1911–1914. Arch. Hist. Exact Sci. 60: 209–267.
Neumann, J. G. 1888. Leipzig Berichte XL, Vierkandt Monatshefte für Math. u. Phys. III.
Newton, I. 1687. Philosophiae Naturalis Principia Mathematica. Londini: Jussu Societatis
Regiae ac Typis Josephi Streater.
Newton, I. 1718. Opticks: A Treatise of the Reflections, Refractions, Inflections & Colours
of Light. Second Edition. Available in Google books.
Noether, E. 1918. Invariante Variationsprobleme, Nachr. d. König. Gesellsch. d. Wiss. zu
Göttingen, Math-phys. Klasse, 235–257.
Nicholson, J. W. 1912. The Constitution of the Solar Corona. II. Month. Not. Roy. Astr.
Soc., 72: 677–692.
O’Hara, J. 1979. Analysis versus Geometry: William Rowan Hamilton, James
MacCullagh and the Elucidation of the Fresnel Wave Surface in the Theory of
Double Refraction. Available at
https://halshs.archives-ouvertes.fr/halshs-00004274v3/file/15_OHara.tif.pdf.
O’Hara, J. 1982. The Prediction and Discovery of Conical Refraction by William Rowan
Hamilton and Humphrey Lloyd (1832–1833). Proc. Roy. Ir. Acad. 82: 231–257.
Okun, L.B. 1989. The Concept of Mass. Physics Today, 42: 31–36.
Okun, L.B. 2009. Mass versus Relativistic and Rest Masses. Am. J. Phys. 77: 430–431.
Olver, P. 2015. Introduction to the Calculus of Variations. Notes.
Orio de Miguel, B. (Translator). 2009. Correspondencia G. W. Leibniz–Johann Bernoulli.
Cartas 1–275, 20 de diciembre de 1693–11 de noviembre de 1716. Registro de la
Propiedad Intelectual de la Comunidad de Madrid.
Pais, A. 1982. Subtle is the Lord: The Science and the Life of Albert Einstein. Oxford
University Press, 154.
Pais, A. 1991. Niels Bohr’s Times. Oxford University Press, 154.
Palmieri, P. 2008. The Empirical Basis of Equilibrium: Mach, Vailati, and the Lever. Stud.
Hist. Phil. Sci. 39: 42–53.
Pappus of Alexandria, 1888. Pappi Alexandrini Collectionis: quae supersunt. Vol. 2, 1189–
1211.
Pars, L. A. 1965. A Treatise on Analytical Dynamics. Heinemann.
Pedersen, O. 1993. Early Physics and Astronomy: A Historical Introduction. Cambridge
University Press, 115.
Pérez, E. 2009. Ehrenfest’s Adiabatic Theory and the Old Quantum Theory, 1916–1918.
Arch. Hist. Exact Sci. 63: 81–125.
Pié i Valls, B. and Pérez, E. 2016. The Historical Role of the Adiabatic Principle in Bohr’s
Quantum Theory. Ann. Phys., 528: 530–534.
Planck, M. 1906a. Vorlesungen über die Theorie der Wärmestrahlung, J. A. Barth, 155–
156. For the translation of the second edition, see M. Planck. 1914. The Theory of
Radiation. Blakiston’s Son & Co., 160–166.
Planck, M. 1906b. Das Prinzip der Relativität und die Grundgleichungen der Mechanik.
Verhandlungen der Deutschen Physikalischen Gesellschaft, 8: 136–141.
Planck, M. 1907. Zur Dynamik bewegter Systeme. Sitzungsberichte der Preussischen
Akademie der Wissenschaften, (January-June), 542–570.
Planck, M. 1910. Die Stellung der neueren Physik zur mechanischen Naturanschauung. In
Planck M. 1944. Wege zur physikalischen Erkenntnis. Reden und Vorträge. S. Hirzel.
25–41.
Planck, M. 1915. Eight Lectures on Theoretical Physics. Columbia University Press.
Planck, M. 1915/1993. The Principle of Least Action. In A Survey of Physical Theory.
Dover.
Poe, E. A. 1975. The Complete Tales and Poems. Vintage Books.
Poisson, M. 1818. Mémoire sur l’intégration de quelques équations linéaires aux
différences partielles, et particulièrement de l’équation générale du mouvement des
fluides élastiques. Mémoires de l’Académie royale des sciences, 3: 121–176.
Pound, R. V. and Rebka Jr., G. A. 1959. Gravitational Red-Shift in Nuclear Resonance.
Physical Review Letters, 3: 439–441.
Rashed, R. 1970. Optique géométrique et doctrine optique chez Ibn al-Haytham. Archive
for History of Exact Sciences, 6: 271–298.
Rashed, R. 1990. A Pioneer in Anaclastics: Ibn Sahl on Burning Mirrors and Lenses. Isis,
81: 464–491.
Rashed, R. 2016. Private communication.
Rodrigues, O. 1816. De la manière d’employer le principe de la moindre action, pour
obtenir les équations du mouvement, rapportées aux variables indépendantes.
Correspondance sur l’École Royale Polytechnique. Vol. III, 159–162.
Rojo, A. G. 2005. Hamilton’s Principle: Why Is the Integrated Difference of the Kinetic
and Potential Energy Minimized? Am. J. Phys., 73: 831–836.
Runge, C. 1919. Vektoranalysis. Vol. 1.
Sabra, A. I. 1981. Theories of Light from Descartes to Newton. Cambridge University
Press.
Sarton, G. 1932. Discovery of Conical Refraction by William Rowan Hamilton and
Humphrey Lloyd (1833). Isis, 17: 154–170.
Schopenhauer, A. 1969. The World as Will and Representation. Vol. 1. Dover Publications.
Schrödinger, E. 1926a. Quantisation as a Problem of Proper Values (Part I). Annalen der
Physik, 79: 361–376.
Schrödinger, E. 1926b. Quantisation as a Problem of Proper Values (Part II). Annalen der
Physik, 79: 489–527.
Schrödinger, E. 1926c. Quantisation as a Problem of Proper Values (Part III). Annalen der
Physik, 80: 437–490.
Schrödinger, E. 1926d. Quantisation as a Problem of Proper Values (Part IV). Annalen der
Physik, 81: 109–139.
Schrödinger, E. 1928. Collected Papers on Wave Mechanics. Blackie & Son Limited,
London.
Schwartz, M. 1972. Principles of Electrodynamics. Dover Publications.
Schwarzschild, K. 1903. Zur Elektrodynamik. 1. Zwei Formen des Prinzips der kleinsten
Wirkung in der Elektronentheorie. Königliche Gesellschaft der Wissenschaften und
der Georg August Universität zu Göttingen. 126–131. Available at
http://gdz.sub.uni-goettingen.de/.
Schwarzschild, K. 1916. On the Gravitational Field of a Point-Mass, According to
Einstein’s Theory. Sitzungsberichte der Königlich Preußischen Akademie der
Wissenschaften, 49: 189–196.
Serway, Raymond A. and Jewett, John W. 2013. Physics for Scientists and Engineers.
Ninth edn. Cengage Learning, 1071–1072.
Sklar, L. 2013. Philosophy and the Foundations of Mechanics. Cambridge University
Press.
Soldner, H. J. von. 1801. On the Deflection of a Light Ray from its Motion along a Straight
Line through the Attraction of a Celestial Body Which Passes Nearby. Translated
from Soldner, H. J., Astronomisches Jahrbuch für das Jahr 1804, 161–172, in Jaki, S.
L. 1978. Johann Georg von Soldner and the Gravitational Bending of Light, with an
English Translation of His Essay on It Published in 1801. Foundations of Physics, 8:
Nos. 11/12, 927–950.
Sommerfeld, A. and Runge, I. 1911. Anwendung der Vektorrechnung auf die Grundlagen
der geometrischen Optik. Annalen der Physik, 4th ser. 35: 289–293.
Sommerfeld, A. 1911a. Das Plancksche Wirkungsquantum und seine allgemeine
Bedeutung für die Molekularphysik. Physikalische Zeitschrift, 12: 1057–1069.
Sommerfeld, A. 1911b. Application de la théorie de l’élément d’action aux phénomènes
moléculaires non périodiques. In La théorie du rayonnement et les quanta. Rapports
et discussions de la réunion tenue à Bruxelles, du 30 octobre au 3 novembre 1911,
sous les auspices de M. E. Solvay. Pub. par MM. P. Langevin et M. de Broglie. Ulan
Press, 2012, 313–392.
Sommerfeld, A. 1916. Zur Theorie des Zeeman-Effekts der Wasserstofflinien, mit einem
Anhang über den Stark-Effect. Phys. Zs., 17: 491–507.
Sommerfeld, A. 1950. Mechanics of Deformable Bodies: Lectures on Theoretical Physics.
Vol. 2. Academic Press.
Sommerfeld, A. 1952. Mechanics. Lectures on Theoretical Physics. Vol. 1. Academic
Press.
Smith, G. E. 2006. The vis viva dispute: A controversy at the dawn of dynamics. Phys.
Today, October, 31–36.
Steiner, J. 1842. Sur le maximum et le minimum des figures dans le plan, sur la sphère et
dans l’espace en général. Second mémoire. J. Reine Angew. Math. 24: 189–250.
Stokes, G. G. 1862. Report on double refraction. Report of the British Association for the
Advancement of Science. Reprinted Mathematical and Physical Papers, by the Late
George Gabriel Stokes. Vol. 4. Cambridge University Press (1904). 127–202.
Stöltzner, M. 2003. The Principle of Least Action as the Logical Empiricist’s Shibboleth.
Studies in History and Philosophy of Modern Physics, 34: 285–318.
Stuewer, R. 1970. Non-Einsteinian Interpretations of the Photoelectric Effect. Minnesota
Studies in the Philosophy of Science. Edited by R. Stuewer. Vol. 5, 246–263.
Synge, J. L. 1937. Geometrical Optics: An Introduction to Hamilton’s Method. Cambridge
University Press, vii.
Synge, J. L. 1945. The Life and Early Work of Sir William Rowan Hamilton. In a collection
of papers in memory of Sir William Rowan Hamilton, Scripta Mathematica Studies,
13–24.
Sussmann, H. J. and Willems, J. C. 1997. 300 Years of Optimal Control: From the
Brachistochrone to the Maximum Principle. IEEE Control Systems Magazine, 17:
32–44.
Taylor, E. F. and Wheeler, J. A. 1999. Spacetime Physics: Introduction to Special
Relativity. Second edn. W. H. Freeman.
Taylor, E. F. and Wheeler, J. A. 2000. Exploring Black Holes. Addison Wesley, 5.
Terrall, M. 2002. The Man Who Flattened the Earth: Maupertuis and the Sciences in the
Enlightenment. University of Chicago Press.
Thomson, William (Lord Kelvin). 1890. On a Mechanism for the Constitution of Ether.
Proceedings Royal Society of Edinburgh, 17: 122–132.
Thucydides, 431 BC. The History of the Peloponnesian War. Available at
http://classics.mit.edu/Thucydides/pelopwar.html
Tisserand, F. 1899. Traité de mécanique céleste, 95–97.
Todhunter, I. 1861/2005. A History of the Progress of the Calculus of Variations During the
Nineteenth Century. Cambridge University Press.
Truesdell, C. 1960. Rational Mechanics of Flexible or Elastic Bodies. 1638–1788 –
Introduction to Work of Euler. Springer.
Truesdell, C. 1960b. A Program toward Rediscovering the Rational Mechanics of the Age
of Reason. Archive for History of Exact Sciences, 1: 3–36.
Truesdell, C. 1968. Whence the Law of Moment of Momentum? Chapter V in Truesdell C.,
1968. Essays in the History of Mechanics. Springer-Verlag.
Varignon, P. 1735. Nouvelle Mecanique ou Statique. Vol. 2.
Vierkandt, A. 1892. Über gleitende und rollende Bewegung. Monatshefte der Math. und
Phys. 3: 31–54.
Virgil. 19 BC. The Aeneid. Book I. Translated by Robert Fitzgerald. New York: Random
House, 1981.
Visser, T. D. and Wolf, E. 2010. The origin of the Gouy phase anomaly and its
generalization to astigmatic wavefields. Optics Communications, 283: 3371–3375.
Voltaire, 1753. Diatribe du docteur Akakia, médecin du Pape: Decret de l’inquisition.
Rome. Available in Google books.
Vollgraff, J.A. 1915. Christiaan Huygens (1629–1695) et Jean le Rond d’Alembert (1715–
1783). Janus, 20: 269–313.
Weierstrass, K. 1927. Mathematische Werke. Vol. 7. Mayer & Müller.
Weyl, H. 1919. Eine neue Erweiterung der Relativitätstheorie. Ann. der Physik, 59: 101–
133.
Whiteside, D. T. 1974. The Mathematical Papers of Isaac Newton. Vol. 6. Cambridge
University Press.
Whittaker, E. T. 1988. A Treatise on the Analytical Dynamics of Particles and Rigid
Bodies. Fourth Edition, Cambridge University Press; first edn. 1904, fourth edn., 1937,
reprinted by Dover 1944 and Cambridge University Press 1988.
Young, A. T. 2012. An introduction to mirages. Available at
http://mintaka.sdsu.edu/GF/mirages/mirintro.html.
Yourgrau, W. and Mandelstam, S. 1968. Variational Principles in Dynamics and Quantum
Theory. W. B. Saunders and Co., first published 1955.
Yurkina, M. I. 1985. Sur l’histoire de la notion du potentiel. Journal of Geodesy, 59 (2):
150–166.
Index