
Ten principles of Quantum Mechanics

Steven S. Gubser

A suggestion: Before reading the whole 46 pages, try reading just the bold-faced
statements of each principle at the beginning of each section.

Contents
1 Wave-particle duality
2 Einstein relation for the energy
3 De Broglie relation for the momentum
4 Heisenberg uncertainty relation
5 Probability and the wave-function
6 Fermions and bosons
7 Negative frequency and anti-matter
8 Spin
9 Entropy
10 Thermal occupation numbers

Preface
“Your theory is crazy, but it’s not crazy enough to be true.” — Niels Bohr [1]

Quantum mechanics is unlike classical mechanics not only in substance but in style.
Whereas in classical mechanics, one is used to everything following more or less logically
from F⃗ = ma⃗ together with some knowledge of force laws, in quantum mechanics there are
a number of inter-related principles which do not seem to arise from a single underlying
[1] All quotations from Niels Bohr were copied verbatim from http://en.wikipedia.org/wiki/Niels_Bohr. I did not check their correctness or authenticity.

Figure 1: Left: A classical cat (actually drawn in a Baroque style). Right: A classical cat
at finite temperature. From http://www.stevenorton.com/shop/page1.html.

law. Whereas F⃗ = ma⃗ is a starting point with clear mathematical content, the principles
of quantum mechanics have a semi-quantitative, semi-empirical quality, and they do not
immediately seem to lend themselves to a unique mathematical implementation. And yet,
quantum mechanics has grown to be a precise mathematical framework which is highly
predictive and, as far as we can tell, correct. I aim to set forth in ten principles the main
ideas of quantum mechanics that inform that mathematical framework and which have some
relevance to the early history of the subject (early meaning up to about 1932, when the
positron was discovered).[2] Two of these principles have to do more properly with quantum
statistical mechanics, where in addition to quantum uncertainty there is thermal randomness,
something like in figures 1 and 2.
I do not aim to formulate quantum mechanics precisely. The canonical presentation of
non-relativistic quantum mechanics captures all but principles 7, 9, and 10 pretty well. But
a full implementation of principles 7 and 8 requires relativistic quantum theory, in which principle 5 is also implemented in a more subtle fashion than in non-relativistic quantum mechanics.
Principles 9 and 10 are the ones having to do with finite temperature, so they are also
outside the purview of the simplest treatment of non-relativistic quantum mechanics. But
the fundamental physical constant describing quantum phenomena, Planck’s constant, h = 6.626 × 10⁻³⁴ J·s, was discovered in a finite temperature context. So, from a historical
perspective at least, quantum mechanics and quantum statistical mechanics are inseparable.
[2] If I went further forward in history, say up to 1980, other fundamental ideas would come in: second quantization, gauge symmetry, and renormalization. The first two of these ideas are tucked into the current discussion in small ways, mostly having to do with the description of photons.

Figure 2: Left: A quantum cat (actually drawn in a Cubist style). Right: A quantum cat
at finite temperature. From http://www.stevenorton.com/shop/page1.html.

This adds to the difficulty of understanding either subject, because it seems necessary to
understand everything before you understand anything. Pity then the pioneers of quantum
mechanics, who not only had to start from the wrong end of the subject (blackbody radiation)
and work their way to the easy bits (Schrodinger’s equation), but also had to figure out along
the way which principles of classical mechanics should be retained and which abandoned.

1 Wave-particle duality
“A triviality is a statement whose opposite is false. However, a great truth is a statement
whose opposite may well be another great truth.” — Niels Bohr

Particles like electrons and photons behave like waves in the sense of exhibiting
interference phenomena. But they behave like particles in the sense that they
are indivisible: for example, if an electron absorbs or emits a photon, it absorbs
or emits it entirely, and the change in energy of the electron equals the energy
of the photon.

A clean demonstration of the wave nature of light is the two-slit experiment, see figure 3.
Suppose a coherent green light source is incident on the two slits. For our purposes, coherent

Figure 3: The setup of the double slit experiment. Upper left: Rays from the two slits travel
different distances to get to the screen. Lower left: A coherent source incident on the screen
from the left produces interfering wave-fronts from the two slits. Upper right: A single slit
produces a broad peak as the central part of its diffraction pattern. Lower right: The two-slit
interference pattern.

means that the light incident on the two slits is a plane wave such that a particular wave
crest (let’s say crest A in the lower-left drawing in figure 3) enters through each slit at the
same moment. The resulting interference phenomenon observed on the screen has regularly
spaced maxima which are understood to be points of constructive interference between light
coming through slit 1 and light coming through slit 2. Between these maxima one finds
minima corresponding to destructive interference. Constructive interference occurs when
the wave from slit 1 is in phase with the wave from slit 2. That will happen if the wave from
slit 2 has to travel the same distance as the wave from slit 1 in order to get to the screen,
or if it has to travel one wavelength further, or two wavelengths, or n times the wavelength
where n is an integer. Using a small angle approximation, which is justified in the limit
where ℓ is much greater than d and s, it’s straightforward to show that the angles θ at which the maxima occur are given by

sin θ ≈ n λ/d ≈ s/ℓ .    (1)
The minima occur at angles such that sin θ ≈ (n + 1/2)λ/d because then the two waves are
180◦ out of phase, so they destructively interfere.
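As a quick numerical illustration of (1) (the apparatus parameters below are made-up values of my choosing, not from the text: green light with λ = 5500 Å, slit spacing d = 0.1 mm, screen distance ℓ = 1 m), the maxima land at regularly spaced, easily visible positions on the screen:

    # Double-slit maxima from eq. (1): sin(theta) ≈ n*lam/d ≈ s/ell.
    # All parameters are illustrative assumptions, not from the text.
    lam = 5500e-10   # wavelength of green light, meters (5500 Å)
    d   = 1.0e-4     # slit spacing, meters
    ell = 1.0        # distance from slits to screen, meters

    for n in range(4):
        sin_theta = n * lam / d      # condition for the n-th maximum
        s = ell * sin_theta          # position on the screen (small angles)
        print(f"n = {n}: s = {1000 * s:.2f} mm")

The fringes come out a few millimeters apart, which is why the effect is easy to see with visible light and macroscopic slits.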
So far, I’ve summarized the double slit experiment at the level found in any introductory
physics text, for example chapter 22 of Knight. Now for the punchlines:

1. Electrons do the same thing. A double-slit apparatus capable of handling electrons was
not around in the early 20th century, but the Davisson-Germer experiment in 1927 is
generally deemed an adequate stand-in. Electrons with momentum p ≈ 14 keV/c
were reflected off a nickel crystal, and the reflected wave was observed to exhibit an
interference pattern. The details are more complicated than figure 4 suggests partly
because nickel has a face-centered cubic crystal structure, not just a square lattice as
shown. (Face-centered cubic means a cubic lattice with an extra atom at the center
of the face of each side of each cube.) Simplifying a little, one type of scattering of
the electrons off the nickel crystal is like reflection of a wave of electrons from a line of
atoms with spacing a = 3.5 Å. Then you expect maxima for angles θ such that

2a cos θ = nλ . (2)

The reason is that if the electron can hit the atom one below the surface, it travels
2a cos θ further than if it hit the atom right above it. The angle θ is arbitrary, but an
approximation is still being used: ℓ ≫ a.
Maxima were observed corresponding to the result (2), where λ = h/p ≈ 0.14 Å. The

Figure 4: A cartoon of the Davisson-Germer experiment, where electrons are reflected off
a nickel crystal. Top: The electrons reflect through an angle 2θ, with angle of incidence
equalling angle of reflection as measured from the normal to a crystal plane. Bottom: Elec-
trons reflecting off atoms in different crystal planes travel different distances, so they can
interfere.

Davisson-Germer result gives compelling support to de Broglie’s hypothesis, discussed
at greater length in section 3.
Maybe this means that we should understand electrons to be a wave as well, and give
up on all this matter-as-particles nonsense. But there’s more...

2. Electrons in a double-slit experiment exhibit the standard interference pattern even


when they go through the apparatus one at a time. If you run the experiment for
a very short time, only one electron gets through, and you see only one dot on the
photographic plate behind the two slits. The top panel of figure 5 shows a situation
where only a few electrons have gotten through the apparatus. This seems like com-
pelling evidence that electrons are indeed particles. But if you wait a long time so
that many such dots accumulate, they will form the usual interference pattern, as the lower panels of figure 5 show. The explanation is that a single electron acts as a wave when
passing through the two slits, and this wave interferes with itself, but when the wave
is incident on the screen, the electron is “forced to choose” where to be, and it does so
probabilistically, with the probability proportional to the intensity of the wave. The
same thing happens with photons. If there are many photons (or electrons), then the
intensity of the wave can be understood as the energy delivered per unit area per unit
time. Loosely, this intensity is proportional to the brightness of the lines of constructive
interference in the original double-slit experiment.

2 Einstein relation for the energy


“You know, what Mr. Einstein said is not so stupid....” — Wolfgang Pauli [3]

A wave with frequency ω is composed of particles with energy ℏω, where ℏ = h/2π. A particle with energy E can be described as a wave oscillating with frequency ω = E/ℏ.

Frequency and wavelength for light are related by

λω = 2πλν = 2πc . (3)

Visible light has a typical wavelength λ = 5500 Å. (More precisely, this wavelength corre-
sponds to green light, which is in the middle of the visible spectrum.) Using (3) one finds
[3] All quotations from Pauli were copied verbatim from http://www.msu.edu/∼lewiska8/finalwebisp213h/pauli_quotes.htm. I did not check their correctness or authenticity.

Figure 5: Results from a double slit experiment using electrons. The number of electrons is: 10 (a), 200 (b), 6000 (c), 40000 (d), 140000 (e). From A. Tonomura et al., Am. J. Phys. 57 (1989) 117.
ω = 3.4 × 10¹⁵ Hz, or equivalently, ν = 5.5 × 10¹⁴ Hz. For some reason, frequencies of light are usually quoted in terms of ν rather than angular frequency ω (and to make matters worse, people often use f for ordinary frequency rather than ν), but ω is generally preferred in quantum mechanics because there are fewer 2π’s. FM radio waves have typical frequency ν = 100 MHz, corresponding to λ = 3 m.
Using Einstein’s relation,
E = ℏω ,    (4)

one finds E = 2.3 eV for a green photon, and E = 4.1 × 10⁻⁷ eV for an FM radio photon.
Calculations like this are facilitated by remembering the value of �c in weird units:

ℏc = 1973 eV · Å .    (5)

Note that the photon energy has no clear relation to E = γmc²: photons are massless and
travel at the speed of light, so γ = ∞ and m = 0 while E is finite.
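These numbers are quick to reproduce from (4) and (5) alone; here is the arithmetic as a short script (my own check, assuming nothing beyond the constants quoted above):

    # Photon energy E = ħω = 2πħc/λ, using ħc = 1973 eV·Å from eq. (5).
    from math import pi

    hbar_c = 1973.0   # eV·Å

    for name, lam in [("green light", 5500.0), ("FM radio", 3.0e10)]:  # λ in Å
        E = 2 * pi * hbar_c / lam
        print(f"{name}: E = {E:.2e} eV")   # ≈ 2.3 eV and ≈ 4.1e-7 eV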
What prompted Einstein’s study of (4) is the photo-electric effect. Let’s start with a
simplified version of this experiment, where we have a source of photons whose frequency is
tunable and whose overall brightness is also tunable. We shine it on a metal, say gold, and
watch for electrons coming out. (That last bit is pretty sketchy: we have to assume that
we’ve learned how to observe single electrons and measure their energy.) The main features
of the photo-electric effect are:

1. The light has to have a certain minimum frequency ω₀ before any electrons are emitted. The corresponding energy is W = ℏω₀, called the work function. Every metal has a
characteristic work function, and for typical metals W ranges from 4 to 5 eV. The work
function is lower for alkali metals like sodium, potassium, and cesium, on the order of
2.2 eV,[4] which means that visible light has a high enough frequency to produce photo-
electrons: see figure 6. Einstein’s understanding of this minimum frequency is that the
electrons are bound inside the metal with binding energy W , and to kick an electron
loose, the photon has to deposit an energy ℏω > W into it.

2. When ω > ω₀, the electrons that are kicked loose (called photo-electrons) have a
[4] This makes sense because the outermost electron in their shell structure is more loosely held than in other metals. The same property makes alkali metals basic when dissolved in water (hence the name): they are electron donors, like Lewis bases.

Figure 6: The photo-electric effect for potassium. Blue and green light produce photo-electrons, and red light doesn’t. From http://hyperphysics.phy-astr.gsu.edu/hbase/mod1.html.

maximum kinetic energy

K.E._max = (1/2) m v²_max = ℏω − W .    (6)

(It’s assumed here that the electrons are non-relativistic, otherwise we would have to use K.E._max = (γ − 1)mc².) Einstein’s explanation of this is that all the photon’s energy goes into the electron that it hits: note that this is part of my earlier statement of wave-particle duality in section 1. But it could happen that the electron scatters off of other electrons in the metal before escaping, and if it does it can lose energy. Hence its kinetic energy is often less than the bound (6). (A numeric illustration of (6) appears after this list.)

3. When ω > ω₀, the number of photo-electrons emitted per unit time is proportional to
the intensity of the light source. Einstein’s understanding of this is that intensity is
a measure of the number of photons that hit the metal per unit time. Each one has
some chance of producing a photo-electron, and there are enough electrons in the metal
and few enough photons hitting it per unit time that the individual photon-electron
collision events are independent.
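Here is a small numeric illustration of (6), treating the rough work functions quoted above as if they were exact (that, and the helper name ke_max, are my assumptions):

    # Photo-electron maximum kinetic energy, eq. (6): K.E._max = ħω − W.
    from math import pi

    hbar_c = 1973.0   # eV·Å, from eq. (5)

    def ke_max(lam, W):
        # Return ħω − W in eV for wavelength lam in Å, or None below threshold.
        E_photon = 2 * pi * hbar_c / lam
        return E_photon - W if E_photon > W else None

    print(ke_max(5500.0, 2.2))   # green light on potassium: ≈ 0.05 eV
    print(ke_max(5500.0, 5.0))   # green light on gold: None, no photo-electrons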

It’s worth noting that energy conservation is a cornerstone of Einstein’s theory of the photo-
electric effect, which is what he won the Nobel Prize for. Excluding the rest energy of the
electron (E = mc²), its initial energy is −W, and after it absorbs the photon it is ℏω − W.

3 De Broglie relation for the momentum


A wave with wavelength λ is composed of particles with momentum p = ℏk, where
k = 2π/λ is the wave-number. A particle with momentum p can be described as
a wave with wavelength λ = h/p.

The Davisson-Germer experiment was the experimental confirmation of de Broglie’s hy-


pothesis. What’s notable is that de Broglie came first, in 1925. He had two big hints. The
first was Einstein’s relation (4) for the photo-electric effect (1905), which implies de Broglie’s
relation for photons:
p = E/c = ℏω/c = ℏk = h/λ    for photons.    (7)
In more detail: E = pc is a standard relation of special relativity (for photons), and ω = ck is a standard relation for waves that have a k-independent phase velocity, ω/k = c, and together they imply the equivalence of E = ℏω and p = ℏk.

The second big hint was Bohr’s model of the hydrogen atom (1913) and the Bohr-
Sommerfeld quantization condition (c.a. 1915, reformulated by Einstein in 1917). Before
reviewing Bohr’s model from de Broglie’s perspective, we need to remind ourselves about
the fundamental experimental observations that made Bohr’s model compelling and yet con-
fusing. First, atoms have a very small nucleus (about (1.2 fm)A^(1/3) in radius, where A is the mass number, compared to a typical size 1 Å of the atom as a whole). This was discov-
ered by Rutherford in 1911 by bombarding gold with non-relativistic alpha particles (helium
atoms stripped of their electrons) and observing that they scatter off the nucleus in just the
way they should if the alpha particle and the nucleus are described as point particles with
a repulsive electrostatic interaction, V = 2Ze²/4πε₀r, where Z is the number of protons in
the nucleus and the 2 accounts for the two protons in an alpha particle.
Thinking that both protons and electrons are effectively point particles leads to the “solar
system” model of the hydrogen atom: the electron orbits the proton in the way that the
earth orbits the sun. Even the force law is the same up to an overall factor as Newtonian
gravity:
V(r) = − (e²/4πε₀)(1/r)    so    F⃗ = − (e²/4πε₀)(r⃗/r³) ,    (8)
where the minus sign is because the electron has charge −e while the proton has charge e. But
what about orbits where the electron falls radially inward toward the proton? Wouldn’t there
be a collision of the two, and wouldn’t the kinetic energy get arbitrarily large as V → −∞?
No sign of such high-energy collisions is observed in a vessel of hydrogen. Worse yet, it’s
understood from the classical theory of radiation that an electron in a circular orbit radiates
energy. There is a formula for this, Larmor’s formula (1895), which has many quantitative
confirmations. Using it on the hydrogen atom should lead to the conclusion that the electron
spirals inward, radiating photons with a wide spectrum of energies. This should continue
without limit, and the total power going into radiated photons should increase with time,
without limit. This is very far from observations![5] A vessel of hydrogen at room temperature
scarcely radiates at all. Most of the atoms are in their ground state, meaning the state of
minimum possible energy.
Radiation from hydrogen atoms does occur when the hydrogen is raised to a high tem-
perature (without any oxygen present, please). The radiation has a number of characteristic
[5] It might seem odd that the same objection is not raised to the apparent stability of the solar system. Shouldn’t the earth radiate gravitons as it goes around the sun and similarly spiral inward? In fact it does, but very gradually. Einstein’s theory of General Relativity predicts precisely how fast this happens, and for systems where a pulsar (a spinning neutron star) orbits around another compact massive object, the in-spiralling effect was observed by J. Taylor and collaborators, resulting in the 1993 Nobel Prize.

wavelengths (all quoted in Angstroms):

λ_{m→n}                  m = 2    m = 3    m = 4    m = 5
Lyman series, n = 1       1216     1026      972      950
Balmer series, n = 2               6565     4863     4342        (9)
Paschen series, n = 3                      18760    12820

Remarkably, the corresponding frequencies, ωm→n = 2πc/λm→n , can be organized into a


single formula when numbered as indicated in (9):

ω_{m→n} = (E_m − E_n)/ℏ    for m > n ≥ 1,    (10)

where
E_n = − R/n²    and    R = 13.6 eV .    (11)
Already in (11) we see another justification for the idea that photons are emitted as particles,
all at once, with some definite energy. The energies En are interpreted as the possible energies
of the hydrogen atom, and transition between levels occurs via photon emission. See figure 7.
The n = 1 energy level is thought to be the ground state, from which no further radiation is
possible.
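It is worth checking that (10) and (11) really do reproduce the table (9). The few lines below (my own check, pure arithmetic using hc = 2πℏc ≈ 12398 eV·Å) do it:

    # Hydrogen wavelengths from eqs. (10)-(11): λ = 2πħc/(E_m − E_n).
    from math import pi

    R  = 13.6              # eV, from eq. (11)
    hc = 2 * pi * 1973.0   # 2πħc ≈ 12398 eV·Å

    def E(n):
        return -R / n**2   # eq. (11)

    for n, name in [(1, "Lyman"), (2, "Balmer"), (3, "Paschen")]:
        lams = [round(hc / (E(m) - E(n))) for m in range(n + 1, 6)]
        print(f"{name} (n = {n}): {lams} Å")

The output matches (9) to within a few Angstroms: 1215, 1025, 972, 950 for the Lyman series, and so on.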
Now we can start coming to the punchline. Consider circular orbits of the electron around
the proton, as shown in figure 7. Let’s assume that the electron moves non-relativistically.
Then its velocity follows from using F = ma in the y direction of figure 9 and noting that
a_y = −v²/r:

F_y = − (e²/4πε₀)(1/r²) = m a_y = − mv²/r = − p²/mr .    (12)
We can solve for p:

p = mv = √(me²/4πε₀r) .    (13)
De Broglie tells us that the electron is a wave with wavelength λ = h/p. This suggests the
criterion
2πr = nλ , (14)

meaning that when the electron goes once around its orbit, it returns in phase with itself
because it traveled an integer number of wavelengths. A picture of what would happen if
this weren’t so is shown in figure 10.

Figure 7: A cartoon of the Bohr model of the hydrogen atom, where a photon of def-
inite energy is emitted during a transition from one energy level to a lower one. From
http://encyclopedia.thefreedictionary.com/Bohr-Sommerfeld+quantization.

Using (13) and the de Broglie relation to find λ, we find that (14) becomes

2πr = nh √(4πε₀r/me²) .    (15)

This can be solved to find the radius:

r_n = n² a_B    where    a_B = 4πε₀ℏ²/me² ≈ 0.529 Å .    (16)

Since (13) now tells us the momentum, it’s easy to evaluate the energy of the nth level:

E_n = K.E. + P.E. = p²/2m + V(r) = p²/2m − e²/4πε₀r = − e²/8πε₀r_n ,    (17)

where in the last step we found (by calculation) that the kinetic energy of the circular orbit
is −1/2 times the potential energy, and we plugged in r = rn . Using (16) once more, one
finds
E_n = − (m/2n²)(e²/4πε₀ℏ)² .    (18)
Plugging in numbers, one recovers precisely (11). Bohr won the Nobel Prize for his theory

Figure 8: The spectrum of hydrogen, showing Lyman, Paschen, and Balmer series as well as energy levels in eV. From http://www.daviddarling.info/images/hydrogen_spectrum.gif.

Figure 9: Free body diagram of an electron moving around a proton.

Figure 10: An electron in a circular orbit represented as a wave. This is disallowed as a
quantum orbit because the circumference isn’t an integer multiple of the wavelength. So
the wave doesn’t constructively interfere with itself after going once around the orbit. Note
that the notation for the radius of the orbit here is a instead of my preferred letter for
radius, r. From http://www2.kutl.kyushu-u.ac.jp/seminar/MicroWorld2_E/2Part1_E/2P11_E/deBroglie_wave_E.htm.

of atomic spectra in 1922.
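It is also easy to confirm (16) and (18) numerically; the SI constants below are standard rounded values (this check is mine, not part of the original notes):

    # Bohr radius and ground-state energy, eqs. (16) and (18), in SI units.
    from math import pi

    hbar = 1.055e-34   # J·s
    m_e  = 9.109e-31   # kg
    e    = 1.602e-19   # C
    eps0 = 8.854e-12   # F/m

    a_B = 4 * pi * eps0 * hbar**2 / (m_e * e**2)              # eq. (16)
    E_1 = -(m_e / 2) * (e**2 / (4 * pi * eps0 * hbar))**2     # eq. (18), n = 1

    print(f"a_B = {a_B * 1e10:.4f} Å")   # ≈ 0.529 Å
    print(f"E_1 = {E_1 / e:.2f} eV")     # ≈ −13.6 eV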

4 Heisenberg uncertainty relation


“If quantum mechanics hasn’t profoundly shocked you, you haven’t understood it yet.” —
Niels Bohr

∆x ∆p ≥ ℏ/2, where ∆x is the uncertainty in the position of a particle or system of particles and ∆p is the uncertainty in its momentum. Also, ∆E ∆t ≥ ℏ/2 where ∆E is the uncertainty in the energy of a particle or system of particles and ∆t is a finite time interval over which its properties are measured or altered.

The uncertainty principle is shocking because it says that the whole structure of Newto-
nian mechanics is based on a false premise. That premise is that a particle’s state at a given
time can be specified by giving its position and velocity. The discussion so far seems to rely
on this premise: in particular, the calculation (12) was squarely based on Newtonian mechanics, although we quickly transitioned to a wave picture in (14). And yet we got the right answer
in (18).
According to the wave picture, the electron is spread out over its orbit, which is a standing
wave around the proton. Either it is in all places at once along this orbit, or it hasn’t

“decided” where to be along its orbit. In this sense (if it makes sense), we should regard the
uncertainty in its position to be comparable to the radius of its orbit: ∆x ≈ r. Likewise,
one can’t be sure which way the momentum points, because it points in different directions
at different points along the orbit. But the magnitude of the momentum is fixed according
to (13). So ∆p ≈ p, and the uncertainty relation tells us


pr ≳ ℏ/2 .    (19)

We can check this against (14), which can be re-expressed as

pr = nℏ ,    (20)

where n > 0 is an integer. So we see that the allowed quantum orbits according to the Bohr
model satisfy the Heisenberg inequality with about a factor of 2n to spare.[6]
We can arrive at a more precise manifestation of the uncertainty principle by considering
a wave-packet in a theory with a dispersion relation ω = ω(k). To keep things simple, let’s
restrict to motion in one dimension:
φ(t, x) = ∫_{−∞}^{∞} (dk/2π) φ̂(k) e^{ikx−iω(k)t} .    (21)

We saw wave-packets of this type when we considered a chain of coupled pendula. More
precisely, when we considered the continuum limit of such a system we found a dispersion
relation

ω(k) = √(c_s²k² + ω₀²)    (22)

for some constants c_s² and ω₀ expressible in terms of the properties of the pendula and their coupling. And we remarked that if we set c_s = c and ω₀ = mc²/ℏ, and use Einstein’s relation E = ℏω and de Broglie’s relation p = ℏk, we recover one of the main relations of special
relativity:

E = √(p²c² + m²c⁴)    ↔    ω = √(c²k² + m²c⁴/ℏ²) .    (23)
[6] Actually, there are four subtleties that a fully quantum mechanical treatment would correct. First, we have only estimated ∆x and ∆p in a reasonable way, and the inequality ∆x ∆p ≥ ℏ/2 is precise only when we give a precise definition of ∆x and ∆p, as in the remainder of this section. Second, the real orbitals are smeared out in radius, so r doesn’t have a definite value. Third, the momentum is uncertain even in its magnitude, so p doesn’t have a definite value. And fourth, there are multiple components of position and momentum, and each separately satisfies the uncertainty principle: for example, ∆x ∆p_x ≥ ℏ/2. Even after all the subtleties are properly accounted for, the uncertainty principle survives.

Although excitations of coupled pendula have nothing obvious to do with electrons or pho-
tons, the proposal is to use wave-packets of precisely the form (21) to describe relativistic free
particles. For the photon we would set m = 0. For reasons to be explained in section 7, we
want to allow φ to be a complex variable when it describes an electron, but we’ll take the real
part when it describes a photon. This simple construction is almost the whole story about
free particles in quantum mechanics: the only missing ingredients are spin and anti-matter.
It seems sensible to define the average position at time t as
⟨x(t)⟩ ≡ ∫_{−∞}^{∞} dx |φ(t, x)|² x / ∫_{−∞}^{∞} dx |φ(t, x)|² ,    (24)

and the average momentum to be


⟨p⟩ ≡ ℏ⟨k⟩    where    ⟨k⟩ ≡ ∫_{−∞}^{∞} (dk/2π) |φ̂(k)|² k / ∫_{−∞}^{∞} (dk/2π) |φ̂(k)|² .    (25)

The average momentum doesn’t change with time because φ̂(k) doesn’t change. Physically,
this indicates that no forces act on the particle/wave: we’re describing free propagation. To
formulate precisely what we mean by ∆x, recall the definition of the standard deviation of
a collection of numbers {x_n}_{n=1}^N :


∆x ≡ √(⟨x²⟩ − ⟨x⟩²)    (26)

where

⟨x⟩ = (1/N) Σ_{i=1}^N x_i = (Σ_{i=1}^N x_i)/(Σ_{i=1}^N 1) ,      ⟨x²⟩ = (1/N) Σ_{i=1}^N (x_i)² = (Σ_{i=1}^N (x_i)²)/(Σ_{i=1}^N 1) .    (27)
The identity Σ_{i=1}^N 1 = N is trivial, but it makes the expression for ⟨x⟩ look as similar as possible to (24).[7] Extending this analogy, we can define
⟨x(t)²⟩ ≡ ∫_{−∞}^{∞} dx |φ(t, x)|² x² / ∫_{−∞}^{∞} dx |φ(t, x)|²        ⟨k²⟩ ≡ ∫_{−∞}^{∞} (dk/2π) |φ̂(k)|² k² / ∫_{−∞}^{∞} (dk/2π) |φ̂(k)|² .    (28)

[7] If (26) seems unfamiliar, consider the deviation ∆x_i ≡ x_i − ⟨x⟩ of each point x_i from the average, ⟨x⟩. An equivalent definition to (26) is

∆x = √((1/N) Σ_{i=1}^N (∆x_i)²) .

∆x is sometimes called the “root-mean-square” deviation, or rms deviation, of the data from the average, because you first square the deviations, then average them, then take the square root.


Figure 11: The wave-packet (31) with k̄ = 4 and σ_k = 1/2.

Then we can define

∆x(t) ≡ √(⟨x(t)²⟩ − ⟨x(t)⟩²)        ∆p = ℏ∆k ≡ ℏ√(⟨k²⟩ − ⟨k⟩²) .    (29)

It must be admitted that we now have a lot of definitions. One good thing is that we
know exactly what we mean by average position and uncertainty for any wave-form. Another
good thing is that there is a deep theorem of Fourier analysis which says (for any t)

∆x ∆k ≥ 1/2 ,    (30)

which evidently is the same as the uncertainty principle when we use p = ℏk. The inequality
(30) is saturated for Gaussians, e.g.

φ̂(k) = e^{−(k−k̄)²/4σ_k²}        φ(0, x) = (√π/σ_x) e^{ik̄x − x²/4σ_x²} ,    (31)

where σ_x ≡ 1/(2σ_k). This wave-packet is precisely the one discussed starting on page VIII.13
of the lecture notes; also see figure 11. It is readily demonstrated that this Gaussian has

⟨k⟩ = k̄        ∆k = σ_k
⟨x(0)⟩ = 0      ∆x(0) = σ_x .    (32)

As we saw in our earlier wave-packet calculations, the width of the x-space Gaussian increases both for t > 0 and t < 0, while of course φ̂(k) doesn’t change (because no forces act on the wave).
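The definitions (24)-(29) are easy to implement numerically. The sketch below (mine, using numpy, with the integrals done as discrete sums on a grid of my choosing) checks that the Gaussian (31) saturates the bound (30) at t = 0:

    # Check ∆x·∆k = 1/2 for the Gaussian wave-packet (31) at t = 0 (ħ = 1).
    import numpy as np

    kbar, sigma_k = 4.0, 0.5
    sigma_x = 1 / (2 * sigma_k)
    x = np.linspace(-20, 20, 4001)
    phi = np.exp(1j * kbar * x - x**2 / (4 * sigma_x**2))   # eq. (31)

    # Averages per eqs. (24) and (28); the grid spacing cancels in the ratios.
    pdf = np.abs(phi)**2
    pdf /= pdf.sum()
    mean_x  = (pdf * x).sum()
    mean_x2 = (pdf * x**2).sum()
    dx = np.sqrt(mean_x2 - mean_x**2)

    print(dx, dx * sigma_k)   # ∆x = σ_x = 1, so ∆x·∆k = 0.5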

5 Probability and the wave-function
“What we observe is not nature itself, but nature exposed to our method of questioning.” — Werner Heisenberg [8]

The probability of observing a particle at a given position is proportional to the


square of the amplitude of the corresponding wave at that position.

We already started using this in (24)-(32). Now let’s formalize it a little with the notion
of a probability distribution function (PDF), which is simply a function ℘(x) with two
properties:

℘(x) ≥ 0        ∫_{−∞}^{∞} dx ℘(x) = 1 .    (33)

The probability to find a particle in a given region, say x ∈ (a, b), is


℘(x ∈ (a, b)) = ∫_a^b dx ℘(x) .    (34)

The second condition in (33) says that the probability of finding the particle somewhere—
anywhere—is 1, which means it’s a sure thing. The average of any function f (x) is

⟨f⟩ ≡ ∫_{−∞}^{∞} dx ℘(x) f(x) .    (35)

Evidently, given any wave-function φ(x) which is square integrable,


℘(x) = |φ(x)|² / ∫_{−∞}^{∞} dx̃ |φ(x̃)|²    (36)

is a PDF. We would define a different probability function ℘(t, x) ∝ |φ(t, x)|² at each time t. Time and space play a very different role in this discussion: it’s not that ℘(t, x) is a probability distribution function over t and x simultaneously; rather, for any fixed t, ℘(t, x) is a PDF over x.[9]
[8] All quotes from Werner Heisenberg were copied verbatim from http://www.brainyquote.com/quotes/authors/w/werner_heisenberg.html. I did not check their correctness or authenticity.
[9] This difference between t and x appears to present some puzzles when we pass to relativistic quantum theory: if we mix t and x with a Lorentz boost, then the PDF in the boosted frame seems to be making statements in the unboosted frame about the relative probability of a particle being here now versus there later. Anti-matter also complicates the story. A full understanding requires a more powerful discussion using Hilbert spaces.

Using an inverse Fourier transform
φ̂(k) = ∫_{−∞}^{∞} dx φ(x) e^{−ikx}    (37)

to get φ̂(k) starting from φ(x), we can form a PDF in k-space:


℘(k) = |φ̂(k)|² / ∫_{−∞}^{∞} (dk̃/2π) |φ̂(k̃)|² .    (38)

I generally insist on a factor 1/2π on the k-space integration measure, as you see in the
denominator of (38). This means that
∫_{−∞}^{∞} (dk/2π) ℘(k) = 1 ,    (39)

expressing the fact that the probability that the particle has some momentum—any momentum—
is 1.
Two additional characteristic features of quantum mechanics are:

1. Quantum states are usually “combined” by superposing them. This means that if one state has wave-function φ₁(t, x) and the other has wave-function φ₂(t, x), the superposition has wave-function φ₁(t, x) + φ₂(t, x). This is very different from adding the probability distribution functions, which are proportional to |φ₁(t, x)|² and |φ₂(t, x)|². (But see sections 9 and 10 for important exceptions to this rule!)

2. The overall phase of the wave-function doesn’t enter into probabilities, but relative
phases between two superposed components do. For this reason, the overall phase is
said to be unobservable.

Here’s a good example: let’s consider the superposition

φ(t, x) = φ₁(t, x) + φ₂(t, x)    where    φ₁(t, x) = e^{ikx−iωt}  and  φ₂(t, x) = e^{−ikx−iωt} ,    (40)

and ω = √(c²k² + m²c⁴/ℏ²). The wave-function φ(t, x) expresses the idea that an electron
either has momentum ℏk or −ℏk. It’s in a superposed state of the two possibilities. Recalling that e^{iθ} + e^{−iθ} = 2 cos θ, we can rewrite (40) more simply as a single standing wave:

φ(t, x) = 2e^{−iωt} cos kx .    (41)

Now the PDF is
℘(t, x) ∝ |φ(t, x)|² = 4 cos² kx .    (42)

We notice right away that there are maxima (at x = nπ/k for integer n) and minima
(at x = (n + 1/2)π/k). This is in contrast with a purely right-moving travelling wave,
φ(t, x) = e^{ikx−iωt}, for which ℘(t, x) ∝ |φ(t, x)|² = 1, meaning that the particle is equally likely to be found anywhere, with no maxima or minima.
Consider altering the wave-function (40) by multiplying by an overall phase:

φ(t, x) = e^{iθ} ( φ₁(t, x) + φ₂(t, x) ) .    (43)

Evidently, ℘(t, x) is unchanged from (42) because |φ(t, x)|² is unchanged. This is even true if the phase θ depends on t and x.[10] Now suppose we alter the wave-function by inserting a
relative phase: to keep things simple, just consider

φ(t, x) = φ₁(t, x) − φ₂(t, x) = 2e^{−iωt} sin kx .    (44)

Now the PDF is different, being proportional to sin² kx rather than cos² kx. The maxima
and minima are in different places.
There’s a pathology in our discussion of traveling waves like φ₁(t, x) = e^{−iωt+ikx}, which have |φ₁(t, x)|² = 1 everywhere. The trouble is, you can’t have ℘(x) = 1 over the whole real line, because the integral of this function is infinite. You can’t have ℘(x) = ε for any ε > 0, no matter how small, for the same reason. Nor can you have ℘(x) = 4 cos² kx, because this function also has an infinite integral. To solve this problem, you have to consider
wave-packets. Instead of (40), I should have superposed two wave-packets, like this:

φ(t, x) = φ₁(t, x) + φ₂(t, x)    where

φ₁(t, x) = ∫_{−∞}^{∞} (dk/2π) e^{−(k−k̄)²/4σ_k²} e^{ikx−iω(k)t}    (45)

φ₂(t, x) = ∫_{−∞}^{∞} (dk/2π) e^{−(k+k̄)²/4σ_k²} e^{ikx−iω(k)t} .

φ1 is a right-moving wave-packet, and φ2 is a left-moving wave-packet. The superposition of


[10] There is something odd about this last statement: it seems to say, for example, that e^{−i∆ωt} e^{−iωt+ikx} is indistinguishable from e^{−iωt+ikx}, at least based on probabilities. But there is a difference ∆ω in frequency between the two wave-functions, so there should be a difference ∆E = ℏ∆ω in energy. A tentative conclusion from this is that the total energy is hard to define, and maybe this shouldn’t bother us. But the same thing goes through for momentum, and it’s strange not to know what p = 0 means. This problem has to do with how you couple the electron to an electromagnetic field.


Figure 12: The probability distribution function for colliding wave-packets with k̄ = ±4, σ_k = 1/2, m = 4, and ℏ = c = 1. The smooth red lines in the plots at t = −2 and t = 0 show the PDF one would get using (47).

the two does not describe two electrons; instead, it describes just one whose time-evolving PDF is ℘(t, x) ∝ |φ(t, x)|². Either wave-packet by itself makes a smooth, node-free contribution to ℘(t, x), but when we square the sum of the two, an interference pattern emerges! See figure 12. The interference pattern owes its existence to cross-terms in the expression for the probability distribution function:

℘ ∝ |φ|² = |φ₁ + φ₂|² = |φ₁|² + |φ₂|² + φ₁*φ₂ + φ₁φ₂* .    (46)

If we were to omit these terms, we would obtain a different PDF:

℘_wrong ∝ |φ₁|² + |φ₂|² ∝ ℘₁ + ℘₂ .    (47)

All expressions in (46) and (47) are functions of t and x. In figure 12 we show the difference
between (46) and (47).
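Here is a minimal numerical sketch of this comparison (mine; the physical parameters follow the caption of figure 12, while the grid and timestep choices are arbitrary assumptions):

    # Colliding wave-packets, eqs. (45)-(47): k̄ = ±4, σ_k = 1/2, m = 4, ħ = c = 1.
    import numpy as np

    kbar, sigma_k, m, t = 4.0, 0.5, 4.0, 0.0
    k = np.linspace(-12.0, 12.0, 2048)
    x = np.linspace(-10.0, 10.0, 1001)
    omega = np.sqrt(k**2 + m**2)                  # eq. (23) with ħ = c = 1

    def packet(k0):
        # eq. (45): Gaussian superposition of plane waves centered on k0
        profile = np.exp(-(k - k0)**2 / (4 * sigma_k**2))
        waves = np.exp(1j * np.outer(x, k) - 1j * omega * t)
        return waves @ profile * (k[1] - k[0]) / (2 * np.pi)

    phi1, phi2 = packet(kbar), packet(-kbar)
    P       = np.abs(phi1 + phi2)**2              # eq. (46), with cross-terms
    P_wrong = np.abs(phi1)**2 + np.abs(phi2)**2   # eq. (47), no interference

    print(P.max() / P_wrong.max())   # ≈ 2 at t = 0: constructive cross-terms

Stepping t away from zero and plotting P against x reproduces the behavior shown in the panels of figure 12.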

Interlude
The principles laid out so far provide most of the conceptual underpinning of introductory
non-relativistic quantum mechanics, i.e. PHY 208 and part of PHY 305. We could describe
them as the “five basic principles” of quantum mechanics. They are not enough, however, to
understand the broad sweep of the subject. The remaining sections set forth three “advanced
principles” of quantum mechanics and two principles of quantum statistical mechanics. All
these notions were understood, in some form, by about 1932. Together—in an appropriate
mathematical formulation—they underlie huge swathes of atomic physics, condensed matter
physics, high energy physics, and chemistry. We really need these additional principles: for
example, to understand the structure of the periodic table, the Pauli exclusion principle and
spin are indispensable.
At the same time, the five additional principles to be described below take us further and
further from classical intuitions, and they seem to me even more arbitrary and eclectic than
the first five. In the end, the whole of quantum mechanics, including spin, anti-matter, and
the relation to statistical mechanics, finds a beautiful unifying mathematical framework in
quantum field theory. Only gravity remains apart, requiring some new idea (string theory?)
to bring it fully within the purview of quantum phenomena.

6 Fermions and bosons


“This is the worst set of notes I’ve ever seen!” — Wolfgang Pauli, in reference to notes
based on his lectures.

Particles are either fermions or bosons. Identical fermions cannot occupy the
same quantum state (Pauli exclusion). Identical bosons can occupy the same
state. If two identical particles are in two different quantum states, there is no
way of telling which one is in which state.

Consider energy eigenstates, like the levels of the hydrogen atom. Although hydrogen has
only one electron, helium has two, and if we neglect the interactions between the two, each can
occupy energy levels which are similar in structure to hydrogen’s. The exclusion principle
says that the two electrons can’t occupy the same state. But there are two subtleties:
1. We forgot about spin! An electron can be spin up or spin down in any given energy
eigenstate. So two electrons (but no more) can occupy a given energy eigenstate, and
when they do, one has to be spin up and the other spin down.

2. There can be several states with the same energy: these are called “degenerate” en-
ergy levels. Electron orbitals happen to have some interesting degeneracies. Solving
Schrodinger’s equation shows that the n = 1 energy level of Bohr’s model is unique
(apart from the spin degeneracy mentioned above), but the n = 2 level is four-fold
degenerate (again without accounting for spin degeneracy), and the n = 3 level is
nine-fold degenerate, and so on.

The first two “periods” of the periodic table can now be explained. Each of Bohr’s energy
levels is referred to as a “shell,” because a heuristic picture is that each successive level is a
little further out from the nucleus. Electrons fill the lower shells before starting to fill the
upper shells. An atom with its outermost shell filled is a noble gas: for instance, neon has 10
electrons, of which two occupy the n = 1 shell and 8 = 4 × 2 occupy the n = 2 shell. Adding
one more electron (and correspondingly increasing the charge of the nucleus to maintain
overall charge neutrality), one gets sodium, where in addition to filling the n = 1 and n = 2
levels, there’s one electron in the n = 3 level. It seems intuitive that this additional electron
is easier to scoop out of sodium than any of neon’s electrons are, because E3 in Bohr’s model
is less negative than E2 . And indeed, the chemistry of sodium is all about losing that last,
lonely electron.
Working out the number of electrons that give filled shells correctly predicts the atomic
number of the first two noble gases: see figure 13.

Z = 2                        helium

Z = 2 × (1 + 4) = 10         neon        (48)

Z = 2 × (1 + 4 + 9) = 28     WRONG! Argon has 18

We should ask what went wrong with argon. The answer is that when you include the
interactions among electrons (which in practice can only be done approximately), the energy
eigenstates turn out not to be as degenerate as in the Bohr model. And it turns out that
energy eigenstates with less angular momentum have lower energy, to an extent that ten of
the level 3 states with high angular momentum are more energetic than two of the level 4
states with no angular momentum (to be more precise: 3d states are more energetic than
4s).[11]
[11] Zero angular momentum may seem odd from the perspective of the Bohr model: the only way to achieve this is to have the electron moving radially, which seems to mean it’s going to hit the nucleus. In a fully correct quantum treatment, the uncertainty principle “spreads out” the electron enough so that it (mostly) avoids the nucleus. In particular, the ground state of hydrogen (1s) has no angular momentum (ignoring spin), whereas in the Bohr model one incorrectly predicts that it does have angular momentum.

Figure 13: Periodic table of the elements. From http://www.bpc.edu/mathscience/chemistry/history_of_the_periodic_table.html.

Figure 14: Two ways of distributing three identical bosons among two quantum states.
Because the occupation numbers of each state are the same, there is no distinction between
the two ways of distributing the bosons among them.

In contrast to fermions, two identical bosons are allowed to be in the same quantum state.
The notion of identical particles is very precise in quantum mechanics. Ignoring interactions
between particles, what it means is that to describe a state of several identical particles,
you shouldn’t inquire which quantum state each one is in; rather you should enumerate all
the states and ask how many particles are in each state. The difference is perhaps best
illustrated with an example of three identical bosons, let’s call them A, B, and C, with two
quantum states available to each, call them states 1 and 2. For example, these states could
be orbitals of the bosons in some central potential similar to the hydrogen atom. Then the
punchline is that there’s no difference between the state where A and B are in state 1 while
C is in state 2, and the state where A and C are in state 1 while B is in state 2: see figure 14.
In fact, it was misleading to give separate designations to the bosons in the first place. The
only meaningful statement is that state 1 is occupied by two bosons, and state 2 is occupied
by one boson. If you had N bosons to distribute among the two states, all could be in state
1, or N − 1, or whatever. If instead we considered fermions, the allowed occupation numbers
of each quantum state are 0 and 1 (ignoring spin degeneracies).
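The counting rule can be made concrete in a few lines of code (my own illustration, not from the text): enumerate occupation numbers rather than labeled particles.

    # States of N identical particles in 2 quantum states = occupation numbers
    # (n1, n2) with n1 + n2 = N. No labels on the particles anywhere.
    def states(N, fermions=False):
        occ = [(n1, N - n1) for n1 in range(N + 1)]
        if fermions:
            # Pauli exclusion: at most one fermion per state (ignoring spin)
            occ = [pair for pair in occ if max(pair) <= 1]
        return occ

    print(states(3))                  # bosons: [(0, 3), (1, 2), (2, 1), (3, 0)]
    print(states(2, fermions=True))   # fermions: [(1, 1)] only
    print(states(3, fermions=True))   # fermions: [], excluded by Pauli

For three bosons this gives 4 distinct states, not the 2³ = 8 assignments one would count if the particles carried labels like A, B, and C.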

7 Negative frequency and anti-matter


Particles with charge (like electrons) have complex wave functions, and so there
is a difference between time dependence e^{−iωt} and e^{iωt}. When ω > 0, the former describes an electron of energy ℏω > 0. The latter solution necessitates the
existence of anti-electrons, or positrons. Particles without charge (like photons)
have real wave functions, and there is no distinction between a photon and its
anti-particle: that is, there are only photons, and an anti-photon is a photon.

Figure 15: Cartoon depiction of the Dirac sea. Two electrons occupy each negative energy level because one is spin up and one is spin down. From http://www.phys.ualberta.ca/∼gingrich/phys512/latex2html/node63.html.

At a classical level, we might inquire whether the standard relation

E² = p²c² + m²c⁴    (49)

of special relativity (often called the “mass-shell relation”) could be solved to give not only
E = √(p²c² + m²c⁴), but also E = −√(p²c² + m²c⁴). At first sight, the second solution seems
unphysical, because how can a particle have negative energy? Dirac suggested that such
solutions imply the existence of positrons. More precisely, he suggested that all solutions
e^{−iωt+ikx} to the Klein-Gordon equation with ω² = c²k² + m²c⁴/ℏ² are allowed, but when
describing electrons, the solutions with ω < 0 correspond to states that are already occupied, in the sense that an electron is already present in each negative-energy state in the state that we experience as the vacuum. This hypothesis is described as the Dirac sea. The
exclusion principle forbids any additional electrons from being in negative energy states.
That is why the electrons of our experience (as contrasted with the ones in the sea) have
positive energy. But we could also make a positive-energy state by removing one of the
negative-energy electrons from the sea. If the electron we remove has energy E < 0 and
momentum p⃗, then the state we end up with has energy −E > 0 and momentum −p⃗,
relative to the original vacuum state where all negative energy states are occupied.
The notion of the Dirac sea seems fanciful, and it’s not clear to me that it’s required in

the mathematical implementation. But positrons are not fanciful! They were discovered in
1932, a year after Dirac predicted they should exist.
People objected to the Dirac sea on grounds that it implies that there is an infinite
negative charge density in the vacuum. Later it was realized that the Fermi sea of electrons
that exists in metals and semi-conductors provides a concrete realization of the Dirac sea,
except that instead of populating all energy levels with E < 0, the Fermi sea has electrons
in all states with 0 < E < E_f.[12] The Fermi energy E_f is by definition the energy of the
highest state that’s occupied. The electrons of the Fermi sea are real objects, understood
approximately as the electrons donated from the outermost shell of each atom in the metal
or semi-conductor. If you add one more electron to the system with E > Ef , it’s analogous
to creating an electron with energy E − Ef > 0 above the Dirac sea. If you take away
one electron from the system, it has to have energy E < Ef (because only such electrons
exist in the ground state of the system before you disturb it), and it’s analogous to creating
a positron with energy Ef − E > 0 in Dirac’s theory. These two types of excitations are
commonly referred to as electrons and holes, or as particles and holes.
Simplifying a little, if we start with pure silicon and “dope” it by replacing a few silicon
atoms with phosphorus, which has one additional electron, then the additional electrons
act as particle excitations on top of the Fermi sea. But if instead of phosphorus we used
aluminum, which has one fewer electron than silicon, then the absences of a few electrons act as hole excitations of the Fermi sea.[13] Using phosphorus gives an n-type semi-conductor,
so named because the extra electrons are negatively charged, and using aluminum gives a
p-type semi-conductor. The holes in a p-type semi-conductor carry electrical current, so they
are quite tangible even though their existence is more properly an absence. If you like, holes
are only a matter of careful bookkeeping for the electrons in the Fermi sea.
Now let’s pass to a wave equation description and ask how things simplify when we
consider matter only, and not anti-matter. Recall that in our treatment of wave-phenomena
in section VIII of the lecture notes, we recovered the mass-shell relation E 2 = p2 c2 + m2 c4
from a wave-perspective by finding the normal modes of the Klein-Gordon equation:
( −∂²/∂t² + c² ∂²/∂x⃗² − m²c⁴/ℏ² ) φ = 0 .    (50)
[12] Often the allowed electron energies have a “band structure,” meaning, for example, that states with E_{1,i} < E < E_{1,f} are allowed, and then states with E_{2,i} < E < E_{2,f} are allowed, but states in between these two “bands” are disallowed by the underlying crystal structure.
[13] For reasons I don’t understand, boron seems to be a more common dopant than aluminum. It has the same effect because it has the same number of valence electrons, namely three, as opposed to four in silicon and five in phosphorus.


In writing φ = e^{−iωt+ik⃗·x⃗}, it was a choice to make ω positive, and then we got (22) and
then (using Einstein and de Broglie’s relations) (23). There are lots of situations where
we can be fairly sure we’re dealing with electrons, not positrons: in particular, in non-
relativistic quantum mechanics, if you start with electrons, you’re not going to accidentally
make positrons, because the kinetic energies are much less than the rest mass of an electron-
positron pair: K.E. ≪ 2mc² is essentially the same statement as v ≪ c.
We would like to inquire whether there is a more explicit way of stipulating in the
differential equation that we want only the positive frequency solutions. To see that there
is, consider the analogies

E ↔ ℏω ↔ iℏ ∂/∂t
p ↔ ℏk ↔ −iℏ ∂/∂x .    (51)
The first step in each line of (51) is just Einstein or de Broglie. The second is based on
noting that

iℏ (∂/∂t) e^{−iωt+ikx} = ℏω e^{−iωt+ikx}    and    −iℏ (∂/∂x) e^{−iωt+ikx} = ℏk e^{−iωt+ikx} .    (52)

These equations remind me of eigenvalue equations, M v⃗ = λv⃗, where M is, for example, iℏ ∂/∂t, v⃗ is e^{−iωt+ikx}, and λ = ℏω = E. Briefly, energy is an eigenvalue of the differential operator iℏ ∂/∂t, and momentum is an eigenvalue of −iℏ ∂/∂x. Let’s be bold and consider the identifications

E = iℏ ∂/∂t        p = −iℏ ∂/∂x .    (53)
If we use these identifications in the mass-shell relation (49), we recover almost the Klein-
Gordon equation:
−ℏ² ∂²/∂t² + ℏ²c² ∂²/∂x⃗² − m²c⁴ = 0 .    (54)
This last equation doesn’t quite mean anything, because it is just a differential operator that
isn’t acting on a function. But if we formally multiply both sides on the right by φ(t, x⃗), we
get the Klein-Gordon equation.
What we’d like to do, then, is to translate


E= p2 c2 + m2 c4 (55)

into a differential equation. But it doesn’t work, at least not straightforwardly: you wind up
with ∂²/∂x² under a square root, and that’s hard to make any mathematical sense of even

when you do allow yourself to multiply in some wave function. (Actually, the Dirac equation
is a subtle way of taking this square root using matrices, but it turns out that it doesn’t help
you escape the E < 0 solutions—hence Dirac’s prediction of the positron.) What we can do
is expand the square root, making the non-relativistic assumption p ≪ mc:

E = mc² √(1 + p²/m²c²) ≈ mc² + p²/2m .    (56)

Now let’s use the identifications (53) to get

iℏ ∂/∂t = −(ℏ²/2m) ∂²/∂x² + mc² .    (57)

As before, this equation isn’t quite sensible yet, but if we continue at a formal level by
multiplying from the right by the wave function, now denoted ψ to accord with Schrodinger’s
preferred notation, we get
iℏ (∂/∂t) ψ(t, x) = ( −(ℏ²/2m) ∂²/∂x² + mc² ) ψ(t, x) .    (58)

This is almost Schrodinger’s equation. Schrodinger replaced mc² by the potential energy V(x), which is allowed to depend non-trivially on x—so we are no longer describing free particles! Although this may not be how Schrodinger came to his equation,[14] it seems
reasonable to think of it as a slight modification of the non-relativistic limit of the Klein-
Gordon equation. Admittedly, we are not used to incorporating mc² into the potential energy V(r) of an electron; but including or excluding it just corresponds to including or excluding an overall factor e^{−imc²t/ℏ} in the wave-function. This has no effect on the probability distribution functions: it is an example of an unobservable phase (see footnote 10).
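To see how good the approximation (56) actually is for an electron, one can compare it with the exact (55) at a few momenta (plain arithmetic, my own check; mc² ≈ 511 keV is the standard value):

    # Exact relativistic energy (55) vs. the expansion (56): E ≈ mc² + p²/2m.
    from math import sqrt

    mc2 = 511.0e3   # electron rest energy, eV

    for pc in [1.0e3, 10.0e3, 100.0e3]:       # p·c in eV
        exact  = sqrt(pc**2 + mc2**2)         # eq. (55)
        approx = mc2 + pc**2 / (2 * mc2)      # eq. (56)
        print(f"pc = {pc:9.0f} eV: relative error = {(approx - exact) / exact:.1e}")

The error grows like (p/mc)⁴, which is why non-relativistic quantum mechanics works so well for atomic physics, where p/mc is of order 1/137.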

8 Spin
Most particles have intrinsic angular momentum, the “spin.” Fermions, such as
electrons, have half integer spin, meaning that the spin along a specified axis can only be ℏ times a half-integer, like ±ℏ/2 (which are the only values allowed for an electron). Bosons, such as photons, have integer spin, meaning ℏ times
[14] I am told that he had previously made an intensive study of the Fokker-Planck equation, which is approximately Schrodinger’s equation without the i on the left-hand side and has to do with diffusion. So it seems at least plausible that he came at the formulation of his equation from this completely different perspective.

an integer. Photons have an additional special property: their spin is always
aligned or anti-aligned with their momentum, and its magnitude is always ℏ.

Let’s start with an apparent digression, namely a calculation of the angular momentum
of the circular orbits in the Bohr model. Assume that the orbit is counter-clockwise in the
x-y plane. Then the only non-zero component of L⃗ = r⃗ × p⃗ is

L_z = pr = (h/λ) r = nℏ .    (59)

In the second equality we used de Broglie’s relation, p = h/λ. In the third equality we used
the criterion (14) that the orbital should correspond to a standing wave with an integer
number n wavelengths around the circumference. It’s worth noting that we didn’t need to
assume anything about the potential V (r) except that it was a central force potential, so
that angular momentum is conserved. Thus it should be a fairly general conclusion that Lz ,
� has to be some integer multiple of �.
or any other component of L,
As claimed above, it turns out that electrons by themselves have spin 1/2, meaning that
any component of their intrinsic angular momentum, say S_z, takes values ±ℏ/2. I don’t
know of any semi-classical calculation that leads to this conclusion. You would think there
might be: an electron could be visualized as a uniform ball of negative charge that literally
spins around some axis. This does not seem to work. Instead, the mathematics underlying
spin 1/2 is the group representation theory of rotations in three dimensions—a somewhat
abstract algebraic construction! But there is a reasonably concrete demonstration that a
given component Sz has two possible values. It is the Stern-Gerlach experiment, pictured in
figure 16. Electrons are shot through a magnetic field and then observed on a screen.[15] The
magnetic field couples to the spin like this:

H = p²/2m + (e/m) S⃗·B⃗(x, y, z) = p²/2m + (e/m) S_z B(x, y, z) ,    (60)

where in the last expression I assumed B⃗ = B ẑ. The electron is like a little bar magnet whose south-to-north axis points opposite its spin. The factor of e/m in (60) is called the
gyromagnetic ratio. Understanding why this particular factor must appear—as opposed to
e/2m or some other multiple—is subtle, but it can be understood given a sufficiently precise
understanding of relativistic quantum mechanics.[16]
[15] Actually, Stern and Gerlach used silver, not free electrons. This is easier, and the physics is nearly the same because all but one of the electrons in silver are paired into zero-angular-momentum states.
[16] Actually, there are quantum field theoretic corrections to the gyromagnetic ratio which have been calculated and experimentally verified to about 12 decimal places.

Figure 16: The Stern-Gerlach experiment. Electrons are passed through a magnetic field and then observed when they hit a screen. Only two trajectories are observed, corresponding to S_z = ±ℏ/2. If the magnets are rotated 90° around the y axis, and nothing else is changed, then it is still true that only two trajectories are observed, but now they correspond to S_x = ±ℏ/2.

We can understand classical trajectories of electrons in the Stern-Gerlach apparatus by
using one of Hamilton’s equations:

ṗ_y = − ∂H/∂y = − (e/m) S_z ∂B/∂y .    (61)

Evidently, the last expression is the force on the electron due to a gradient in the magnetic
field. This force can be either up or down according to whether S_z = ℏ/2 or S_z = −ℏ/2.
The two dots observed in the experiment support the existence of only two possible choices
for S_z, but unless we believe in the factor of e/m and have a very precise understanding of
our apparatus, it’s hard to claim that Stern-Gerlach determines the particular values Sz is
allowed to take.
There is something odd about our description of spin so far: we’ve claimed that any
component of the spin, like S_z, can take only two values. Naively, that seems to mean that S⃗ can take one of eight values, namely

(ℏ/2)(1, 1, 1) ,   (ℏ/2)(1, 1, −1) ,   (ℏ/2)(1, −1, 1) ,   · · · ,   (ℏ/2)(−1, −1, −1) .    (62)

But this is wrong! There are only two spin states for the electron, not eight. One “basis”
for these states is the state |↑⟩ with S_z = ℏ/2 and the state |↓⟩ with S_z = −ℏ/2. The notation |state⟩ is an abstract notation to indicate a quantum state with some property.
Representation theory leads to funny relations like this:

|S_x = ℏ/2⟩ = |↑⟩ + |↓⟩ .    (63)

Translating (63) into words: spin in the positive x direction is a specific superposition of spin
in the positive and negative z directions. Other superpositions of |↑⟩ and |↓⟩ correspond to spin in other directions: for example, |↑⟩ + i |↓⟩ points in the positive y direction.
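The two-state algebra behind (63) is easy to check with the Pauli matrices, using S_i = (ℏ/2)σ_i (the numpy check below is mine, with ℏ set to 1):

    # Check that |↑⟩ + |↓⟩ has S_x = +ħ/2, and |↑⟩ + i|↓⟩ has S_y = +ħ/2 (ħ = 1).
    import numpy as np

    Sx = 0.5 * np.array([[0, 1], [1, 0]])       # S_x = (ħ/2) σ_x
    Sy = 0.5 * np.array([[0, -1j], [1j, 0]])    # S_y = (ħ/2) σ_y

    up   = np.array([1, 0])   # |↑⟩, S_z = +ħ/2
    down = np.array([0, 1])   # |↓⟩, S_z = −ħ/2

    psi = up + down           # eq. (63), unnormalized
    chi = up + 1j * down      # the state claimed to point along +y

    print(Sx @ psi)           # = 0.5 * psi: eigenvalue +ħ/2
    print(Sy @ chi)           # = 0.5 * chi: eigenvalue +ħ/2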

9 Entropy
The entropy of a system is S = k_B log W, where W is the number of quantum states accessible to it, or consistent with its macroscopic properties, assuming that all these states are equally probable.

The dependence S ∝ log W arises because entropy is an extensive variable. Here’s an
example. Say you have two systems, A and B, each completely isolated from and independent
of the other, and also isolated from the rest of the universe. Say there are WA states accessible
to system A and WB states accessible to system B. Then there are WA WB states accessible
to A and B together. By “together” I don’t meant that we put the systems in contact; I
only mean that to specify the state of both A and B we have to specify each one’s state
separately, and there are WA WB ways to do this. Formally, we could write

W_{A∪B} = W_A W_B .    (64)

Now, an extensive variable is one like energy, where the energy of two isolated systems
considered together is the sum of the energies of each system separately: we could write

E_{A∪B} = E_A + E_B .    (65)

It’s clear from (64) that W is not an extensive variable, but because

log W_{A∪B} = log W_A + log W_B ,    (66)

we see that log W is extensive.[17] Historically, entropy was measured in units of J/K, and to remain consistent with that we factor in the Boltzmann constant k_B = 1.381 × 10⁻²³ J/K
in the definition of S. The specific value of kB is not important to the current discussion,
but heuristically, it is a conversion factor between any temperature T and a characteristic
thermal energy kB T at that temperature. But S is in some sense more basic than either
energy or temperature: it arises from purely combinatorial notions, like saying that system
A chooses among WA states as system B chooses among WB states. Because entropy is so
basic, sometimes people omit the kB from the definition or choose units of temperature such
that kB = 1 (e.g. measuring energy and temperature both in Joules, or in eV).
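As a quick illustration of kB as a conversion factor, the characteristic thermal energy at room temperature comes out to a fraction of an eV:

```python
# kB T at room temperature, converted to eV (1 eV = 1.602e-19 J).
kB = 1.381e-23      # J/K
T  = 300            # K, room temperature
print(kB * T / 1.602e-19)   # about 0.026 eV
```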
The other reason for defining S = kB log W is that W quantifies the disorder of a system.
But it does so only to the extent that we don’t know which of the W states the system is
in. If we knew that a system was in one particular state out of its W possible states, call it
state 1, then the system would be perfectly “ordered,” and its entropy would be zero. The
right way to look at this is that perfect knowledge of the system means that there’s only one
17 By log I mean the natural log. If I wanted to talk about the base 10 logarithm I would write log10 .
This is a common convention among theoretical physicists and mathematicians. Astronomers and engineers
prefer (I think) for log to mean the base 10 log, and then ln means natural log.

Figure 17: The function ℘ log ℘ as a function of probability ℘. [Plot not reproduced: the curve vanishes at ℘ = 0 and ℘ = 1, dipping to its minimum −1/e ≈ −0.37 at ℘ = 1/e.]

state accessible to it, namely the one we know it’s in. So indeed S = kB log W = 0 because
W = 1. If we know that it must be in one of two states but are completely unsure which
one, then W = 2 and S = kB log 2. But what about a case where the system is 95% likely to
be in state 1 and 5% likely to be in state 2? This goes beyond the definition S = kB log W ,
because to apply this definition we stipulated that all the accessible states should be equally
probable. It turns out one can argue that in a system which has probability ℘i to be in any
of several states i, the entropy is

S = −kB Σi ℘i log ℘i .    (67)

You can easily check that if there are W states, and ℘1 = ℘2 = · · · = ℘W = 1/W , then (67)
reproduces S = kB log W . If ℘1 = 1 and all other ℘i vanish, then S = 0 because the function
℘ log ℘ vanishes as ℘ → 0 and as ℘ → 1: see figure 17.
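Both limiting cases of (67) are easy to verify numerically; here is a minimal sketch in units where kB = 1:

```python
import numpy as np

def entropy(p):
    """Entropy (67) with kB = 1; p_i = 0 terms drop out since p log p -> 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

W = 4
print(entropy([1/W]*W), np.log(W))  # equal probabilities reproduce log W
print(entropy([1.0, 0.0]))          # perfect knowledge: S = 0
print(entropy([0.95, 0.05]))        # the 95%/5% case: about 0.199
```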
Let’s try to apply the thinking above to a spin-1/2 particle. If we know nothing about
the system, then S = kB log 2, because the system could be either in the state |↑⟩ with
Sz = ℏ/2 or in the state |↓⟩ with Sz = −ℏ/2. But wait—it could also be in the superposed
state |↑⟩ + |↓⟩, which I claimed was the state where Sx = ℏ/2. The most general superposed
state of the spin-1/2 particle is

|ψ⟩ = Z1 |↑⟩ + Z2 |↓⟩    (68)

for arbitrary complex numbers Z1 and Z2.18 This seems to show that there are infinitely

18 Actually, only one of Z1 and Z2 should really be regarded as independent, because the overall normalization of a wave-function is not important. Alternatively, one often says that the wave-function has to have unit norm, and its overall phase is not important.

many states. Should we say S = ∞ for a single spin? The standard answer is No: instead
we should count a new state as distinct only if it cannot be represented as a superposition
of states counted previously. So W = 2 for a spin-1/2 particle, and for more complicated
systems, W is the dimension of the vector space of all possible wave-functions for the system.
This could still be infinite: it is, for example, for the hydrogen atom, where there are an
unlimited number of energy eigenstates. But if we stipulate that the hydrogen atom has
energy E < −1 eV, then only finitely many of the energy eigenstates are accessible to it, so
W becomes finite.
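As a concrete instance of this counting, take En = −13.6 eV/n² and demand E < −1 eV, which allows n = 1, 2, 3. The degeneracy bookkeeping below (n² orbital states per level, doubled by spin) is my addition, not something spelled out above:

```python
# Count hydrogen energy eigenstates with E_n = -13.6 eV / n^2 below -1 eV.
# Each level n contributes n^2 orbital states, times 2 spin states.
W = sum(2 * n**2 for n in range(1, 100) if -13.6 / n**2 < -1)
print(W)   # 2*(1 + 4 + 9) = 28, so W is finite
```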
The relationship between quantum mechanics and entropy actually is quite twisty, as
exemplified by the entropy of a pair of electrons, of which one is spin up and the other is
spin down, like the electrons in the ground state of a helium atom. This is a unique state, so
S = 0. But what if we considered the electrons, call them A and B, separately? There’s no
difference between them, so each one has to be equally likely to be in a spin up state or a spin
down state. So we should say WA = 2 and WB = 2, right? But together, the electrons are in
a unique state, so WA∪B = 1 ≠ WA WB. That violates our basic premise, (64). Maybe it’s not
so bad because the electrons interact with one another. What seems really uncomfortable is
that we could in principle prepare two spins in an up-down state and then, without disturbing
their spins, separate them a long way apart so that they can’t interact. Then (keeping track
only of spin) should we still say WA = WB = 2 and WA∪B = 1? Experimentally, if you make
such states over and over, and then measure only spin A, you will find that it’s up in half
the cases and down in the other half. So ℘^A_↑ = ℘^B_↓ = 1/2. If you measure both spins, you
always find one up and one down, only it’s impossible to predict which one will be up and
which down. Maybe then we should say WA∪B = 2 for the whole system, since there are
only two possible outcomes when you measure both spins? But this runs afoul of our notions
of identical particles, where we said that the only thing there is to know is how many spins
are up and how many down. From the identical particle perspective the spin state of the
electrons is unique: one up, one down.19 This is the Einstein-Podolsky-Rosen paradox.20
The consensus view of practitioners of quantum mechanics and quantum statistical me-
chanics is that indeed WA = WB = 2 while WA∪B = 1. The spins aren’t really independent
19 In fact, it is misleading even to give the electrons different labels when they are identical. But we could
imagine replaying the discussion where one electron is replaced by a proton, which also has spin 1/2. The
story about spin states still goes through with the proton as particle A and the electron as particle B.
20 More precisely, EPR were bothered by the claim that if you measure spin A and find spin up, then
no matter how shortly afterward you measure spin B, you must find it to be spin down. How does spin B
“know” the outcome of the measurement of spin A? No signal, not even a light wave, could “tell” spin B how
the spin A experiment came out if we measured B sufficiently shortly afterward. So quantum measurement
seems acausal in some sense.

because we stipulated from the start that one is spin up and one is spin down. They are
said to be correlated, or entangled. “Independent” systems, to which (64) applies, are by
definition systems where no correlations exist—that is, systems whose states can be chosen
without reference to one another or to the totality of A ∪ B.
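The measurement statistics described above can be mimicked with a short simulation. I take the joint state to be the antisymmetric one-up-one-down combination, which is a specific choice on my part; the discussion above only stipulates that one spin is up and one is down:

```python
import numpy as np
rng = np.random.default_rng(0)

# Basis order for the two spins A and B: |uu>, |ud>, |du>, |dd>.
# Joint state: the antisymmetric "one up, one down" combination (my choice).
psi = np.array([0.0, 1.0, -1.0, 0.0]) / np.sqrt(2)
probs = np.abs(psi)**2               # joint outcome probabilities

samples = rng.choice(4, size=100_000, p=probs)
print(np.isin(samples, [0, 1]).mean())   # ~0.5: spin A alone looks random
print(probs)                             # [0, 0.5, 0.5, 0]: always one up, one down
```

Spin A by itself behaves like a fair coin (WA = 2 in that sense), yet the outcomes ↑↑ and ↓↓ never occur: the correlation is perfect.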

10 Thermal occupation numbers


If a system has quantum states with definite energies En, then raising it to finite
temperature T corresponds to a probability proportional to e^(−En/kB T) that the
system will be in the quantum state with energy En, where kB = 1.381 × 10^(−23) J/K
is Boltzmann’s constant.

There are two thorny issues here. First, the probability that we’re talking about here
is different from the one that comes from probability distribution functions associated with
superpositions of different states. And second, we seem to have pulled the functional form
e^(−En/kB T) out of a hat.
Considering a spin-1/2 particle in a constant magnetic field helps clarify both issues.
The Hamiltonian is the same as in (60), but now we additionally assume that B is constant.
Also we imagine somehow “trapping” the particle spatially and keeping it always in the
same trapped state: only its spin is allowed to fluctuate. So the only important part of the
Hamiltonian, for the purposes of the example I want to describe, is

H = (eB/m)(Sz + ℏ/2) .    (69)

Really I should have written H = (eB/m) Sz, but adding the constant eBℏ/2m corresponds to
an unobservable overall phase in the evolution of the wave-function. The energy eigenstates
are the state with Sz = ℏ/2 (spin up, |↑⟩) and the state with Sz = −ℏ/2 (spin down, |↓⟩).
The corresponding energies are

E↓ = E0 = 0 ,    E↑ = E1 = eBℏ/m .    (70)
So spin down is the ground state, and our adjustment of the energy was contrived so that
its energy E0 vanishes. Intuitively, it seems reasonable that at zero temperature, the system
stays always in its ground state, and at non-zero temperature, the spin can “fluctuate” in
some sense. If the temperature is low, then usually the spin is down, but sometimes it is up.

If the temperature is sufficiently high, then the spin scarcely cares about the magnetic field,
so it will be up as often as down.
Here is how we would naturally attempt to implement such states with a superposition:

|ψ⟩ = |↓⟩ + e^(−E1/kB T) |↑⟩ .    (71)

This seems like a good idea because when we form the corresponding probability distribution
function by squaring the coefficients, we get

℘↓ = 1/Z ,    ℘↑ = e^(−E1/kB T)/Z ,    (72)

where Z = 1 + e^(−E1/kB T) is a normalization factor chosen so that ℘↓ + ℘↑ = 1. By design,
℘s ∝ e^(−Es/kB T) where Es is the energy for spin s. We’re pleased to observe that as T → 0,
℘↓ → 1 while ℘↑ → 0, so S → 0; and as T → ∞, ℘↓ and ℘↑ both approach 1/2, so
S → kB log 2, which is the largest entropy that a single spin can have.
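Here is a small sketch of the probabilities (72) and the entropy (67) as functions of temperature, in units where kB = E1 = 1 (a choice of units made only for convenience):

```python
import numpy as np

def spin_stats(T):
    """Probabilities (72) and entropy (67) for one spin; units kB = E1 = 1."""
    Z = 1 + np.exp(-1/T)
    p = np.array([1/Z, np.exp(-1/T)/Z])   # (p_down, p_up)
    S = -np.sum(p * np.log(p))
    return p, S

for T in (0.1, 1.0, 100.0):
    p, S = spin_stats(T)
    print(T, p, S)   # S -> 0 as T -> 0; S -> log 2 = 0.693... as T -> infinity
```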
The problem is that if we take T big, then according to (71),

|ψ⟩ ≈ |↓⟩ + |↑⟩ ,    (73)

which corresponds to a spin that points in the +x direction. That’s not what we wanted!
The problem is that when you form a superposition, you “know too much” about the state
for it to truly describe thermal fluctuations.21 In (73), for example, in the limit T → ∞, we
“know” that the spin is in the Sx = ℏ/2 state, which is unique, so S = 0. We followed the
rules laid out in section 5 in forming the probability distribution function ℘, but the trouble
is that the superposition (71) contains extra information that we do not want, namely a
definite direction for the spin for any choice of T. So we have to give something up, and
the thing we give up is the superposition. Thermal randomness is different from quantum
randomness: it has to do only with probabilities, not superpositions.
I still haven’t explained why I chose the function e^(−En/kB T) for the probabilities. To justify
this, imagine that we have a large number N of spins, each in the same B field, and coupled
very weakly to each other so that they can exchange energy but don’t appreciably affect
21 It is this inherent incompleteness of knowledge about a thermal state that I tried to depict on the right
hand sides of figures 1 and 2. But whereas lack of knowledge of a classical system is continuous in character,
as exemplified by a thermal uncertainty in the position or the momentum, lack of knowledge of a quantum
system has itself a discrete character, because it (usually) amounts to assigning probabilities to a discrete
number of possible quantum states. That’s what I was thinking of when I made the quantum cat pixelated
instead of blurred.

each other’s energy levels. Thermal equilibrium can be reached by dumping a certain total
amount Etot of energy into the system and then waiting for it to equilibrate among the spins.
We assume that all ways of sharing this energy among the spins are equally probable. How
many are there? Well, Etot clearly has to be some multiple of E1 , say Etot = M E1 for some
integer M , and 0 ≤ M ≤ N because the smallest possible amount of energy is 0 (all spins
down) and the largest is N E1 (all spins up). For some intermediate M , the number of ways
of distributing the energy is
W = (N choose M) = N!/[M!(N−M)!] ≈ N^N e^(−N) / [M^M e^(−M) (N−M)^(N−M) e^(−(N−M))]
  = e^(N log N − M log M − (N−M) log(N−M)) ,    (74)

where in the second equality we used Stirling’s approximation, N! ≈ N^N/e^N.22 I’m cheating
a little by ignoring the possibility of superposed states: each spin is either up or down in this
discussion. Spins of this type are referred to as Ising spins, and they are not fully quantum
mechanical because superpositions are disallowed. The full quantum treatment is a little
trickier, but the final result is essentially the same.
Based on (74) we can calculate the entropy of the N spins at energy Etot = M E1 :
Stot = kB log W ≈ kB [N log N − M log M − (N−M) log(N−M)] .    (75)

From this we extract the average entropy S per spin,


S/kB ≡ Stot/(N kB) = −(M/N) log(M/N) − (1 − M/N) log(1 − M/N)
                   = −(E/E1) log(E/E1) − (1 − E/E1) log(1 − E/E1)    (76)

in terms of the average energy per spin,

E ≡ Etot/N = (M/N) E1 .    (77)

Recall now the first law of thermodynamics:

dE = T dS . (78)
22 A more accurate formula is N! ≈ √(2πN) N^N e^(−N) [1 + 1/(12N) + O(N^(−2))], but for present purposes the rather crude estimate I quoted in the main text is sufficient.

(Usually you say dE = Q + W with Q = T dS and W = −P dV , but in this spin system,
there is no notion of volume or pressure, so T dS is all there can be on the right-hand side
of (78).) We can manipulate (78) into a form that allows us to compute the temperature:
1/T = dS/dE = (kB/E1) log(E1/E − 1) .    (79)

We can now solve for the energy:

E = E1 e^(−E1/kB T) / [1 + e^(−E1/kB T)] .    (80)

On the other hand, we can calculate the expectation value for the energy from the probability
distribution function (72) derived by assuming ℘s ∝ e^(−Es/kB T):

⟨E⟩ = E↓ ℘↓ + E↑ ℘↑ = E1 e^(−E1/kB T)/Z = E1 e^(−E1/kB T) / [1 + e^(−E1/kB T)] .    (81)

Comparing (80) with (81), we see that they match! Furthermore, there is only one way to
simultaneously have ℘↓ + ℘↑ = 1 and make the average ⟨E⟩ match with E as computed in
(80). So we conclude that ℘s ∝ e^(−Es/kB T) is the only possible choice.
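For the skeptical, the whole chain of reasoning can be rechecked numerically: count states with the exact binomial in (74) (no Stirling approximation), extract 1/T = dS/dE by finite differences, and compare the resulting E with the formula (80). Units here are kB = E1 = 1:

```python
import numpy as np
from math import lgamma

N = 10_000
M = np.arange(1, N)                       # number of up spins
# Exact log W = log binomial(N, M), via log-Gamma to avoid huge factorials.
logW = np.array([lgamma(N+1) - lgamma(m+1) - lgamma(N-m+1) for m in M])
E = M / N                                 # energy per spin, in units of E1
invT = np.gradient(logW / N, E)           # 1/T = dS/dE per spin, kB = 1

E_canonical = np.exp(-invT) / (1 + np.exp(-invT))   # formula (80)
print(np.max(np.abs(E - E_canonical)))    # small, and shrinks as N grows
```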
That was quite a derivation, and things only get harder when there are more than two
states. So let me just assume that ℘n ∝ e^(−En/kB T) is true in general and pass to an example
of considerable historical as well as modern interest: blackbody radiation. Planck knew that
an evacuated cavity heated to a very high temperature glows with a light whose intensity
depends on frequency and temperature in a very specific way (see figure 18):

I(ω, T) = (ℏω³/4π³c²) · 1/(e^(ℏω/kB T) − 1) .    (82)

It was well understood that (82) is equivalent to an energy density

u(ω, T) = (4π/c) I(ω, T) .    (83)

It’s difficult for us to understand all the prefactors just now, but the dependence 1/(e^(ℏω/kB T) −
1) is something we can work out, as follows. Photons with frequency ω have energy ℏω. They
are bosons, so any number of them can be in a given state, and that number is the only
information we can have about photons of a particular frequency. That means that there’s
one state, call it |0⟩, with no photons, another, call it |1⟩, with one photon, yet another, call

Figure 18: The spectrum of blackbody radiation quantified in terms of intensity per wave-
length. From http://en.wikipedia.org/wiki/Planck's_law_of_black_body_radiation.

it |2⟩, with two photons, and so on. Photons do not interact with one another, so the energy
of the state |n⟩ must be

En = nℏω .    (84)

Now we construct a thermal probability distribution function based on these energy eigenstates:23

℘0 = 1/Z ,   ℘1 = e^(−ℏω/kB T)/Z ,   ℘2 = e^(−2ℏω/kB T)/Z ,   ... ,   ℘n = e^(−nℏω/kB T)/Z , ... ,    (85)

where the normalization factor that ensures Σ_{n=0}^∞ ℘n = 1 is

Z = 1 + e^(−ℏω/kB T) + e^(−2ℏω/kB T) + e^(−3ℏω/kB T) + ... + e^(−nℏω/kB T) + ...
  = Σ_{n=0}^∞ e^(−nℏω/kB T) = 1/(1 − e^(−ℏω/kB T)) ,    (86)

where in the last step we noticed that the infinite sum is a geometric series. Just as in the
calculation (81) for a spin system, we can now compute the average energy ⟨Eω⟩ contributed
by photons in a definite state of frequency ω:

⟨Eω⟩ = E0 ℘0 + E1 ℘1 + E2 ℘2 + ... = (1/Z) Σ_{n=0}^∞ nℏω e^(−nℏω/kB T) .    (87)

That last series may seem hard to sum, because it’s not geometric. It’s a little easier to look
at if we define x = ℏω/kB T: then

⟨Eω⟩ = (ℏω/Z) Σ_{n=0}^∞ n e^(−nx)   and   Z = Σ_{n=0}^∞ e^(−nx) = 1/(1 − e^(−x)) .    (88)

The trick is to notice that

dZ/dx = −Σ_{n=0}^∞ n e^(−nx) = −e^(−x)/(1 − e^(−x))² .    (89)
23 The same cheat is in place here that I mentioned earlier: I am assigning probabilities to states with a
definite number of photons without telling you why the system should choose to be in one of these states
instead of a superposition of states with different particle numbers. In fact, superpositions are allowed, but
a fully correct treatment in quantum statistical mechanics will reproduce the same result for u(ω, T ) as the
current one.

Using this, we arrive finally at

⟨Eω⟩ = ℏω e^(−x)/(1 − e^(−x)) = ℏω/(e^x − 1) = ℏω/(e^(ℏω/kB T) − 1) .    (90)

This still differs from the energy density (83) by a factor ω²/π²c³. This factor has to do
with the number of different states of frequency ω: the wave-number k⃗ must have |k⃗| = ω/c
because the photon is massless, but k⃗ can point in different directions, and different directions
correspond to different states for a photon. Suffice it to say here that a sufficiently careful
treatment of photon modes, together with the ideas summarized in (90), allows one to derive
precisely the formulas (82)-(83) which fit the data.
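The series manipulations leading to (90) are easy to confirm by brute force, summing (87) directly in the variable x = ℏω/kB T:

```python
import numpy as np

# Compare the direct sum of the series (87) with the closed form (90),
# both expressed in units of hbar*omega, with x = hbar*omega/(kB*T).
n = np.arange(0, 2000)                 # enough terms for the x values below
for x in (0.1, 1.0, 5.0):
    Z = np.sum(np.exp(-n*x))
    E_series = np.sum(n * np.exp(-n*x)) / Z   # <E_omega>/(hbar omega), from (87)
    E_closed = 1 / (np.exp(x) - 1)            # (90)
    print(x, E_series, E_closed)              # the two columns agree
```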
Three final notes:

1. Planck somehow figured all this out without previous knowledge of Einstein’s relation
E = ℏω, or of photons, or of bosons, or of identical particles, or of quantized energy
levels. He did know statistical mechanics quite well, and he invented quantized energy
levels with spacing ℏω as well as establishing ℏ as a fundamental constant of Nature.
Must have been a clever guy! He won the Nobel Prize in 1918 for his theory of
blackbody radiation.

2. The term “blackbody” is appropriate because the radiation spectrum (82), shown in
figure 18, is much less featured than the spectrum from, say, the hydrogen atom,
where instead of a single hump there are many intense lines. The latter spectrum can
reasonably be described as “colorful,” because each line has a specific color. Perhaps
a more conventional justification for the term “blackbody” is that the more perfectly
a surface absorbs light, the better it is at emitting it at sufficiently high temperatures,
and the more closely its emitted light follows the blackbody form. Perfectly absorbing
light means something is black, hence the term.

3. The cosmic microwave background radiation conforms spectacularly well to a blackbody
curve with T ≈ 2.73 K. See figure 19. This is understood as arising from a
transition when the universe was about 400,000 years old, in which photons decoupled
from electrons. This happens at quite a high temperature (about 3000 K), and the
2.73 K observed today results from the cooling of the photons due to the expansion of
the universe over the intervening 14 billion years or so. The surviving leaders of the
COBE team, John Mather and George Smoot, won the Nobel Prize in 2006.

Figure 19: Results from the COBE satellite, showing a perfect fit to the blackbody formula.
From http://en.wikipedia.org/wiki/COBE.

“Quantum mechanics is certainly imposing. But an inner voice tells me that it is not yet the
real thing. The theory says a lot, but does not really bring us any closer to the secret of the
‘old one.’ I, at any rate, am convinced that He is not playing at dice.” — Albert Einstein

“It is very difficult to make an accurate prediction, especially about the future.” — Niels
Bohr

