
1 ETHER WIND, SPECTRAL LINES, AND MICHELSON INTERFEROMETERS
The Michelson interferometer is named after Albert Abraham Michelson, who designed and built
it in 1881 to detect the ether wind caused by the Earth’s orbital motion. Michelson’s attempt
failed; his interferometer, sensitive enough to detect stamping feet 100 meters away,1 could not
detect the Earth’s orbital motion. So important and difficult to explain was this result that
Michelson and Edward Morley repeated the experiment with a larger and more sensitive
interferometer in 1887. This second attempt, which is today called the Michelson-Morley
experiment, also yielded a negative result: The Earth’s motion could not be detected. The
Michelson-Morley experiment is one of the most important negative findings of 19th-century
science; it encouraged physics to discard the idea of a luminiferous ether and prepared the way
for Einstein’s relativity theories at the beginning of the 20th century.
The idea of a luminiferous ether—a plenum pervading both (transparent) matter and empty
space—had been widely accepted ever since Young and Fresnel established around 1820 that
light behaved like a transverse vibration or wavefield as it propagated past obstacles. There were
recognized difficulties with the concept; for example, the ether provided no detectable resistance
to the motion of material bodies yet was elastic enough to transmit light vibrations without
measurable energy loss. In the 1820s and ’30s, Poisson, Cauchy, and Green, famous
mathematical scientists, derived equations of motion for transverse waves in an elastic medium,
but when these equations were applied to the already known behavior of light, the results were at
best mixed.2 In 1867 James Clerk Maxwell modified the formulas describing the interdependent
behavior of electric and magnetic fields to make them a self-consistent set of equations; he
believed himself to be constructing a mechanical analogy for the ether. After showing that the
new set of equations predicted transverse electromagnetic waves traveling at the speed of light,
Maxwell not only asserted that light was a propagating electromagnetic disturbance, but he also
used his discovery to connect electric and magnetic properties to the behavior of the luminiferous
ether. It was not until 1888 that Hertz demonstrated experimentally that propagating
electromagnetic disturbances actually exist; and the optical community itself did not
acknowledge until 1896, with the discovery of the Lorentz-Zeeman effect, that light had to be

1
A. Michelson, “The Relative Motion of the Earth and the Luminiferous Ether,” American Journal of Science 22,
Series 2 (1881), p. 120–129.
2
E. Whittaker, A History of the Theories of Aether and Electricity, Vol. I, The Classical Theories (Thomas Nelson &
Sons, Ltd., New York, 1951), pp. 129–142.


such a propagating electromagnetic wavefield.3 So the ether concept was not only alive and well
at the time of Michelson’s experiments, but it could also be said, with the growing acceptance of
Maxwell’s equations to describe the behavior of the luminiferous ether, that it had never been
healthier.

1.1 The First Michelson Interferometer


Figure 1.1(a) is a drawing of the instrument Michelson described in his 1881 paper, and Fig.
1.1(b) shows how the interferometer works. Incident light enters from the left, as shown by the
dark solid arrow, and hits a glass plate whose back is a partly reflecting, partly transmitting
surface. Ideally, half the incident light is transmitted through to mirror C and half is reflected up
to mirror D. Mirrors C and D then return the light to the beam splitter, as shown by the dashed
arrows. At the beam splitter, the light is again half transmitted and half reflected to send two
equal-intensity beams into the observer’s telescope. The light that is first transmitted and then
reflected at the beam splitter is called beam TR, and the light that is first reflected and then
transmitted at the beam splitter is called beam RT. These beams are drawn as two side-by-side
dotted arrows, but in reality they should be thought of as lying one on top of the other, filling the
same volume of space as they travel from the beam splitter to the telescope.
Michelson, thinking then in terms of 19th-century optical theory, would have regarded light as
transverse and elastic vibrations in the ether. The ether’s plane of vibration might be horizontal,
as shown in Fig. 1.2(a), or vertical, as shown in Fig. 1.2(b). It was assumed, in fact, that the ether
could undergo transverse vibrations in any plane at all—horizontal, vertical, or something in
between, as shown in Fig. 1.2(c)—although not all at the same time. At any given point in the
light beam, there could be only one plane of vibration, with different colors of light characterized
by different wavelengths of vibration. If a “snapshot” of a light beam could be taken, the plane of
vibration could well be changing along its length, as shown in Fig. 1.3(a). At some slightly later
time, the snapshot would show the same configuration advanced in the direction of propagation,
as shown in Fig. 1.3(b). White light, then as now, was taken to be a composite beam consisting of
many different wavelengths simultaneously traveling in the same direction. Different colors of
light correspond to disturbances of different wavelengths. Combining or adding together many
different-colored disturbances produces a total transverse vibration having no particular or unique
wavelength and with the plane of vibration free to change in an irregular fashion along the length
of the beam, as shown in Fig. 1.3(c). The situation depicted in Figs. 1.3(a)–1.3(c) is actually very
close to the physical models used today to explain the behavior of light; all we need to do is
accept Maxwell’s equations—but not Maxwell’s ether—and say that the sinusoidal curves in

3
D. Goldstein, Polarized Light, 2nd ed. (Marcel Dekker, Inc., New York, 2003), p. 298.

FIGURE 1.1(a). The first Michelson interferometer.
Figs. 1.3(a)–1.3(c) describe the changing length and orientations of the tip of the wavefield’s
oscillating electric or magnetic field vectors.4
Suppose length a in Fig. 1.1(b) is adjusted until the distance from mirror C to the beam splitter
is exactly the same as the distance from mirror D to the beam splitter. When monochromatic
light (that is, light having a unique wavelength) enters the interferometer as shown in Figs.
1.4(a) and 1.4(b), then the beams reflected from C and D recombine when leaving the
interferometer in such a way that their planes of vibration, as well as their state of oscillation,
exactly match. Since the planes of vibration match, we can disregard the planes’ orientation and
just add together the two beams’ sinusoidal curves. Figure 1.5(a) shows that if the RT and TR
beams line up exactly, as they must when the distances from mirrors C and D to the beam
splitter are equal, then the summed oscillation is a maximum because the two wavefields are in
phase. If the distances from mirrors C and D to the beam splitter are unequal, then beams RT and
TR shift with respect to each other, as shown in Figs. 1.5(b)–1.5(e). The two beams can be out of
phase by any fraction of a wavelength, depending on the amount of inequality in the two
distances.



4
See, for example, the discussion in Secs. 4.2 through 4.4 of Chapter 4. Figures 1.2(a) and 1.2(b) can be profitably
compared to Figs. 4.5 and 4.6 in Chapter 4.


FIGURE 1.1(b). Schematic of the interferometer. Labels: incident light; beam splitter with partially reflective surface; compensator plate; mirror C; mirror D; arm length a; observing telescope; beam RT (first reflected, then transmitted at the beam splitter); beam TR (first transmitted, then reflected at the beam splitter).

FIGURE 1.2(a). Labels: cut in wavefield; plane perpendicular to direction of propagation.

FIGURE 1.2(b). Labels: vibrations of transverse wavefield; cut in wavefield; direction of propagation; plane perpendicular to direction of propagation.

FIGURE 1.2(c). Labels: propagation direction for transverse wavefield; three different planes of vibration.

FIGURE 1.3(a). Label: vibration wavelength.

FIGURE 1.3(b). Label: vibration wavelength.

FIGURE 1.3(c). Label: white light, no unique wavelength.

The closer this fraction is to one-half, the smaller the summed oscillation; and if they are out of
phase by exactly a half-wavelength, then their sum is zero and the combined beam disappears.
When one beam is shifted against the other by exactly one wavelength, and the planes of
vibration still match, then once again the monochromatic RT and TR beams are in phase and
produce a bright combined oscillation.5 There seems to be a real possibility that a
monochromatic beam cannot be used to confirm that mirrors C and D are the same distance from
the beam splitter because the recombined exit beam may look the same as it does when no shift at
all exists if one wavefield is shifted against the other by one, two, etc., wavelengths.
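The dependence of the summed oscillation on the shift between beams RT and TR, as pictured in Figs. 1.5(a)–1.5(e), can be checked numerically. The following is a minimal sketch, assuming two sinusoids of equal amplitude; the function name `summed_amplitude` and the unit amplitude are illustrative choices, not taken from the text.

```python
import math

def summed_amplitude(shift_in_wavelengths, amplitude=1.0):
    """Peak amplitude of the sum of two equal sinusoids offset by a fraction of a wavelength."""
    # sin(x) + sin(x + phi) = 2 cos(phi / 2) sin(x + phi / 2), with phi = 2 * pi * shift
    phase = 2.0 * math.pi * shift_in_wavelengths
    return abs(2.0 * amplitude * math.cos(phase / 2.0))

# Shifts corresponding to Figs. 1.5(a) through 1.5(e)
for shift in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"shift = {shift:4.2f} wavelengths -> summed amplitude = {summed_amplitude(shift):.3f}")
```

A shift of zero or of one full wavelength gives the same maximum, which is why a single monochromatic beam cannot, by brightness alone, distinguish equal arms from arms that differ by a whole number of wavelengths.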
Suppose two monochromatic beams with two different wavelengths are sent through the
interferometer at the same time. If the distances from mirrors C and D to the beam splitter are
equal, then both the monochromatic beams, even though they have different wavelengths, must
be in phase when leaving the interferometer, producing a maximally bright oscillation in the
recombined exit beam. When the distances to the beam splitter are not exactly equal, however,
one of the monochromatic beams may end up shifted against itself by one, two, etc., wavelengths,
but there is no reason for the other beam to be shifted against itself the same way. When three
monochromatic beams are sent through the interferometer while the distances to the beam splitter
are not equal, matching all three wavetrains becomes even more unlikely. Hence, if we pass
white light containing innumerable distinct monochromatic wavetrains through the instrument,
then the RT and TR beams will recombine to produce a maximally bright output beam if and only
if the distances from mirrors C and D to the beam splitter are equal.
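The white-light argument above can be illustrated with a short numerical sketch. It assumes the standard normalized two-beam intensity (1 + cos φ)/2 for equal beams and represents white light by equally weighted wavelengths between 400 and 700 nm; both the sampling and the name `relative_brightness` are assumptions made for illustration only.

```python
import math

def relative_brightness(path_difference_nm, wavelengths_nm):
    """Mean two-beam intensity (1.0 means every component is in phase) over a set of wavelengths."""
    total = 0.0
    for lam in wavelengths_nm:
        phase = 2.0 * math.pi * path_difference_nm / lam
        total += 0.5 * (1.0 + math.cos(phase))  # normalized intensity of two equal beams
    return total / len(wavelengths_nm)

single_line = [600.0]                                 # one monochromatic beam
white_light = [400.0 + 5.0 * i for i in range(61)]    # 400 nm to 700 nm in 5 nm steps

for d in (0.0, 600.0, 1200.0, 6000.0):
    print(f"path difference {d:7.1f} nm: "
          f"monochromatic = {relative_brightness(d, single_line):.3f}, "
          f"white = {relative_brightness(d, white_light):.3f}")
```

The monochromatic column returns to full brightness at every whole number of wavelengths, while the white-light column reaches full brightness only at zero path difference.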
To make the white-light beam work as intended, the interferometer needs a glass compensator
plate between mirror C and the beam splitter [see Fig. 1.1(b)]. The compensator plate must be the
same thickness and orientation—and made from the same type of glass—as the glass in front of
the beam splitter’s partially reflecting surface. Figure 1.6(a) shows how light waves reflect from
mirrors C and D; the wavelength does not change while reflecting. In Fig. 1.6(b), however, light
waves inside the glass are somewhat shorter than they are outside the glass; the wavelength of the
light with respect to the glass thickness is greatly exaggerated to show this effect.
Therefore, a given distance traveled inside the glass corresponds to more wavelengths of a
monochromatic beam than the same distance in empty space. Moreover, different colors or
wavelengths of light shrink by different amounts, and this effect was a familiar one to 19th-
century optical scientists. If the compensator plate is not present, then the RT beam in Fig. 1.1(b)
passes through the glass in the beam splitter three times, whereas the TR beam passes through the
beam-splitter glass only once. The RT beam thus contains more wavelengths than the TR beam
even though the distances between the mirrors and the beam splitter are equal. With the
compensator plate present, however, both the TR and RT beams pass through three glass
thicknesses.

5
In fact, we now know that a strictly monochromatic beam of light must have matching planes of vibration when
shifted against itself by exactly one, two, etc., wavelengths.

FIGURE 1.4(a). Figure 1.4(a) shows a segment of radiation entering the interferometer and Fig. 1.4(b)
shows what that segment becomes when it leaves the interferometer if the distance it travels up and back
each interferometer arm is the same. Label: before passing through the interferometer.

FIGURE 1.4(b). Labels: after leaving the interferometer; beam RT; beam TR.

FIGURE 1.5(a). Beam TR, beam RT, and their total: in phase.

FIGURE 1.5(b). Beam TR, beam RT, and their total: out of phase by a quarter wavelength.

FIGURE 1.5(c). Beam TR, beam RT, and their total: out of phase by a half wavelength.

FIGURE 1.5(d). Beam TR, beam RT, and their total: out of phase by three-quarters wavelength.

FIGURE 1.5(e). Beam TR, beam RT, and their total: in phase.

FIGURE 1.6(a). Labels: incident wavefield; reflected wavefield.

FIGURE 1.6(b). Labels: incident wavefield; reflected wavefield; transmitted wavefield; glass substrate; beamsplitting film.

Now each monochromatic component has its own unique number of wavelengths in each arm
of the interferometer; thus, the blue-light component in one arm has the same number of
wavelengths as the blue-light component in the other arm, the red-light component in one arm
has the same number of wavelengths as the red-light component in the other arm, and the same
can be said about all the other colors in the white-light beam.
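The role of the compensator plate can be made concrete by counting wavelengths along each beam. The sketch below is illustrative only: it assumes a Cauchy-type dispersion formula with made-up coefficients for a crown glass, equal air paths in both arms, and a hypothetical glass path length per pass; none of these numbers come from the text.

```python
def refractive_index(wavelength_um, A=1.5046, B=0.0042):
    """Cauchy-type dispersion n = A + B / wavelength^2 (A and B are illustrative assumptions)."""
    return A + B / wavelength_um ** 2

def wavelengths_along_path(wavelength_um, air_path_um, glass_path_um, glass_passes):
    """Number of wavelengths fitting along a path with the given air and glass lengths."""
    n = refractive_index(wavelength_um)
    # Inside the glass the wavelength shrinks to wavelength / n.
    return (air_path_um + n * glass_path_um * glass_passes) / wavelength_um

def rt_minus_tr(wavelength_um, compensated, air_path_um=2.0e6, glass_path_um=5.0e3):
    """Extra wavelengths in beam RT relative to beam TR for equal air paths (hypothetical lengths)."""
    rt = wavelengths_along_path(wavelength_um, air_path_um, glass_path_um, 3)
    tr_passes = 3 if compensated else 1   # the compensator adds two glass passes to beam TR
    tr = wavelengths_along_path(wavelength_um, air_path_um, glass_path_um, tr_passes)
    return rt - tr

for name, lam in (("blue", 0.45), ("red", 0.65)):
    print(f"{name}: without compensator = {rt_minus_tr(lam, False):.1f}, "
          f"with compensator = {rt_minus_tr(lam, True):.1f}")
```

Without the compensator the mismatch is different for each color, so no single mirror position can bring all colors into phase; with it, the mismatch vanishes for every color at once.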
Michelson wanted to do more than just make the distances traveled by light going back and
forth between the C, D mirrors and the beam splitter equal; he also wanted to see how the
distances traveled by the light beams changed when he rotated the interferometer on its stand [see
Fig. 1.1(a)]. Up to now, we have assumed that mirrors C and D are exactly perpendicular to the
line of sight between their centers and the beam splitter, but nothing stops us from tilting one of
them a very slight amount, as shown in Fig. 1.7. The degree of tilt is, of course, greatly
exaggerated to show what is happening. When the tilt is imposed after the distances of mirrors C
and D to the beam splitter have been made equal, the center line of the tilted mirror remains at the
same distance from the beam splitter as it was before the tilt occurred. If the tilt is so small that
the slight change in direction of the beam can be disregarded, then that part of the beam reflecting
off the mirror’s center line still recombines with light from the other mirror in such a way as to
produce the maximally bright oscillation already discussed above. The off-center parts of the
recombined beam are, of course, dimmer because the off-center parts of the tilted mirror no
longer match up properly to the untilted mirror.6 An observer looking through the telescope
shown in Figs. 1.1(a) and 1.1(b) sees a bright central band, called a “fringe,” corresponding to the
central strip lying along the center line of the tilted mirror, with dark and less bright bands or
fringes on either side. If the distance that the light travels between the tilted mirror and the beam
splitter changes slightly, we expect the central fringe to shift as one side or another of the tilted
mirror—instead of its center line—becomes equal to the distance traveled by the light in the other
arm of the interferometer. It is exactly this sort of fringe shift that Michelson hoped to see when
he rotated the interferometer on its stand, changing the direction in space of the light going up
and back the arms of the interferometer.
One last point we need to make is that many beam splitters of the type shown in Fig. 1.1(b)
reflect differently from the glass side and the nonglass side of the partially reflecting surface,
reversing the direction of vibration in the TR beam reflecting off the nonglass side and not
reversing it in the RT beam reflecting off the glass side.7
Figure 1.5(c) shows that reversing the direction of vibration is the same as changing the phase
of the beam by one half-wavelength or 180°, so the phenomenon is often referred to as a 180°
phase shift on reflection. Michelson used this sort of phase-shifting beam splitter, so the RT and
TR beams in his interferometer did not match up the way they are shown in Fig. 1.4(b) when the
distances of mirrors C and D from the beam splitter are equal but instead match up as shown in



6
See Secs. 5.20 and 5.21 in Chapter 5 for a more detailed discussion of how to analyze a tilted mirror.
7
F. Jenkins and H. White, Fundamentals of Optics, 3rd ed. (McGraw-Hill Book Company, New York, 1957), p.
251.


FIGURE 1.7. Labels: centerline of tilted mirror; line of sight to beam splitter; angle of tilt. Note: The angle of tilt is greatly exaggerated in this diagram.

Fig. 1.8. Now the central fringe coming from the center line of the tilted mirror is dark because
all the monochromatic components of the two beams cancel out rather than add together. When
Michelson sent white light through his interferometer, he thus saw a central dark fringe with
parallel multicolored fringes on either side. The colored fringes come from the off-center strips of
the tilted mirror where one or another monochromatic wavetrain is shifted against itself by
exactly one, two, etc., wavelengths, increasing the amplitude of its oscillation with respect to the
wavetrains of other colors inside the recombined beam. In this setup, the central dark fringe is
unique, making it easy for Michelson to see how its position changes as the interferometer is
rotated.
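The effect of the 180° phase shift on the white-light fringes can be sketched numerically. The example assumes the normalized two-beam intensity (1 − cos φ)/2 that results when one of two equal beams is reversed; the three wavelengths and the function name are illustrative assumptions rather than values from the text.

```python
import math

def brightness_with_reversal(path_difference_nm, wavelength_nm):
    """Normalized two-beam intensity when one beam's vibration is reversed (180 degree shift)."""
    phase = 2.0 * math.pi * path_difference_nm / wavelength_nm
    return 0.5 * (1.0 - math.cos(phase))

colors_nm = {"blue": 450.0, "green": 550.0, "red": 650.0}  # illustrative wavelengths

# Zero path difference, then a half wavelength of each color in turn
for d in (0.0, 225.0, 275.0, 325.0):
    row = ", ".join(f"{name} = {brightness_with_reversal(d, lam):.2f}"
                    for name, lam in colors_nm.items())
    print(f"path difference {d:5.1f} nm: {row}")
```

At zero path difference every color cancels, giving the unique dark central fringe; away from the center each color peaks at a different path difference, which is the origin of the colored side fringes.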

1.2 Historical Reasoning Behind the Ether-Wind Experiment


Physical theory has changed a great deal since 1881, but it is still relatively easy to understand
the reasoning behind Michelson’s experiment. As soon as light is taken to be a wavefield in a
medium at rest, such as waves on the surface of water, and the Earth’s motion through space is
regarded as carrying the interferometer through the medium, everything falls into place.
The first point worth mentioning is that the velocity at the equator due to the Earth’s daily
rotation is 0.46 km/sec, much less than the Earth’s orbital velocity around the sun of 29.7
km/sec. Consequently, the rotational velocity of Michelson’s laboratory, well north of the
equator, was only about 1% of the orbital velocity, and Michelson did not have to pay any
attention to it. The interferometer in Fig. 1.1(a) can be rotated on its stand, so at noon and
midnight, Michelson could always arrange for one arm to be aligned with the Earth’s orbital
velocity. Figures 1.9(a) and 1.9(b) show light traveling along the arms of a Michelson
interferometer when the interferometer is viewed as moving with a velocity v through a stationary
medium—that is, a luminiferous ether—and one of the arms is aligned with v. To keep life
simple, we have dropped the compensator plate from the two diagrams. Figure 1.9(a) shows light
traveling out and back along the arm aligned with v, with the interferometer rotated so that this is
the arm holding mirror C in Fig. 1.1(b). Figure 1.9(b) shows light traveling out and back along
the arm holding mirror D in Fig. 1.1(b). The positions of mirrors C and D are adjusted so that
each one is the same distance a from the beam splitter.
Figure 1.9(a) shows the beam splitter at three different positions as a single crest of the light’s
wavefield moves through the interferometer: when the wavecrest first enters the arm of the
interferometer, when the wavecrest reflects off mirror C, and when the wavecrest returns to the
beam splitter for the second time. Mirror C is shown at the same three times—when the
wavecrest enters the arm, when it reflects off C, and when it returns to the beam splitter. The
velocity of the wavecrest with respect to the ether is c, and time $t_1$ elapses as the wavecrest goes
from the beam splitter to mirror C. Hence, the wavecrest covers a distance $a + vt_1$ in the
stationary ether while traveling at velocity c, with

\[ a + v t_1 = c t_1 . \tag{1.1a} \]

FIGURE 1.8. Labels: beam RT; beam TR.

FIGURE 1.9(a). Labels: direction of Earth’s motion; incident light; positions of the beam splitter; positions of mirror C; arm length a; displacements $vt_1$ and $vt_2$; to telescope.

FIGURE 1.9(b). Labels: direction of Earth’s motion; mirror D; positions of the beam splitter; arm length a; incident light; displacements $vt_3$; to telescope.

Time $t_2$ elapses while the wavecrest returns from mirror C to the beam splitter, and similar
reasoning shows that

\[ a - v t_2 = c t_2 . \tag{1.1b} \]

Solving for $t_1$ and $t_2$ in Eqs. (1.1a) and (1.1b) gives

\[ t_1 = \frac{a}{c - v} \]

and

\[ t_2 = \frac{a}{c + v} . \]

The wavecrest spends time

\[ t_1 + t_2 = \frac{a}{c - v} + \frac{a}{c + v} = \frac{2ac}{c^2 - v^2} \]

going out to mirror C and back to the beam splitter, and it does so while traveling at velocity c, so
it covers a total distance

\[ c\,(t_1 + t_2) = \frac{2ac^2}{c^2 - v^2} . \tag{1.1c} \]
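Equations (1.1a) through (1.1c) are easy to verify numerically. The following is a minimal sketch; the arm length of 1 m is a hypothetical value chosen for illustration, and the orbital speed is taken as 10⁻⁴ of c, the rough figure used in this section.

```python
def parallel_arm(a, v, c):
    """Out-and-back travel along the arm aligned with v, following Eqs. (1.1a)-(1.1c)."""
    t1 = a / (c - v)           # outbound leg: a + v*t1 = c*t1
    t2 = a / (c + v)           # return leg:   a - v*t2 = c*t2
    distance = c * (t1 + t2)   # total distance covered in the stationary ether
    return t1, t2, distance

c = 299_792.458    # speed of light in km/s
v = 1.0e-4 * c     # orbital speed taken as 10^-4 of c
a = 1.0e-3         # hypothetical arm length of 1 m, expressed in km

t1, t2, distance = parallel_arm(a, v, c)
closed_form = 2.0 * a * c**2 / (c**2 - v**2)   # right-hand side of Eq. (1.1c)
print(f"t1 = {t1:.6e} s, t2 = {t2:.6e} s")
print(f"distance = {distance:.12e} km, closed form = {closed_form:.12e} km")
```

The outbound leg takes slightly longer than the return leg, and the total distance slightly exceeds 2a, as Eq. (1.1c) requires.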

Figure 1.9(a) also shows the wavecrest traveling at an angle, instead of straight down, after it
reflects off the beam splitter when leaving the interferometer’s arm. This allows it to head toward
where the observing telescope will be by the time the wavecrest reaches it; there is thus no
danger of the telescope missing the wavecrest because it has moved out of position. Figures
1.10(a) and 1.10(b) show why this happens. Figure 1.10(a) shows a single wavecrest reflecting
off a 45° stationary mirror. The large dots indicate where the “corner” of the reflecting wavecrest
is now and has been in the past as it reflects from the stationary mirror. The reflected wavecrest
travels upward at 90° from its original direction, as expected. Figure 1.10(b) shows what happens
when the same type of wavecrest reflects off a moving 45° mirror. The four thin solid lines show
the positions of the mirror at four equally spaced instants in time, and the large dots again show
where the corner of the reflecting wavecrest is at these times. Connecting these dots with a thick
dashed line, we see that the wavecrest feels an effective stationary mirror that is slanted at an
angle somewhat greater than 45°. This means the reflected wavecrest does not travel straight up
as in Fig. 1.10(a) but instead moves a little off to the right.


Figure 1.9(b) shows how the wavecrest travels up and back the interferometer arm
perpendicular to velocity v. In time $t_3$, the wavecrest travels a distance $\sqrt{a^2 + v^2 t_3^2}$ from the beam
splitter to mirror D; and, because it does this at velocity c, we must have

\[ c t_3 = \sqrt{a^2 + v^2 t_3^2} \]

or

\[ t_3 = \frac{a}{\sqrt{c^2 - v^2}} . \]

Figure 1.9(b) shows that the total distance traveled from the beam splitter to mirror D and
back again must be

\[ 2 c t_3 = \frac{2ac}{\sqrt{c^2 - v^2}} . \tag{1.2} \]
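The perpendicular-arm result can be checked the same way. This sketch works in units where c = 1 and the arm length is 1, which is a convenience assumed here rather than anything in the text; it confirms that the computed $t_3$ satisfies the right-triangle condition and that the round trip is slightly longer than 2a.

```python
import math

def perpendicular_arm(a, v, c):
    """One-way time and round-trip distance for the arm perpendicular to v, as in Eq. (1.2)."""
    t3 = a / math.sqrt(c**2 - v**2)
    return t3, 2.0 * c * t3

c = 1.0       # units chosen so that c = 1 (assumption for illustration)
v = 1.0e-4    # orbital speed as a fraction of c
a = 1.0       # arm length in the same units

t3, round_trip = perpendicular_arm(a, v, c)
hypotenuse = math.sqrt(a**2 + (v * t3)**2)   # slanted path from beam splitter to mirror D
print(f"c * t3 = {c * t3:.15f}, hypotenuse = {hypotenuse:.15f}")
print(f"round trip = {round_trip:.15f} (compare with 2a = {2.0 * a})")
```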

Even though the two interferometer arms are both of length a, if the interferometer is moving
then a single wavecrest splitting at the beam splitter does not travel the same distance in each arm
before recombining at the beam splitter. The difference $\Delta s$ between the distances traveled out and
back in each arm is, according to Eqs. (1.2) and (1.1c),

\[ \Delta s = c\,(t_1 + t_2) - 2 c t_3 = \frac{2ac}{\sqrt{c^2 - v^2}} \left[ \frac{c}{\sqrt{c^2 - v^2}} - 1 \right] = \frac{2a}{\sqrt{1 - v^2/c^2}} \left[ \frac{1}{\sqrt{1 - v^2/c^2}} - 1 \right] . \]

The Earth’s orbital velocity is about $10^{-4}$ of the speed of light c, so we can make the
approximation

\[ \left( 1 - \frac{v^2}{c^2} \right)^{-1/2} \cong 1 + \frac{v^2}{2c^2} . \]

This gives

\[ \Delta s \cong 2a \left( 1 + \frac{v^2}{2c^2} \right) \left( 1 + \frac{v^2}{2c^2} - 1 \right) = \frac{a v^2}{c^2} + O(v^4/c^4) . \]
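How good the approximation is can be checked directly. Because the neglected terms are near the limit of ordinary floating-point precision, the sketch below uses Python's `decimal` module with 50 digits; the unit arm length and the function names are illustrative assumptions.

```python
from decimal import Decimal, getcontext

getcontext().prec = 50   # enough digits to resolve terms of order v^4/c^4

def delta_s_exact(a, beta):
    """Exact path difference, with beta = v/c."""
    factor = 1 / (1 - beta**2).sqrt()     # 1 / sqrt(1 - v^2/c^2)
    return 2 * a * factor * (factor - 1)

def delta_s_approx(a, beta):
    """Leading-order path difference a * v^2 / c^2."""
    return a * beta**2

a = Decimal(1)            # arm length in arbitrary units (assumption)
beta = Decimal("1e-4")    # v/c for the Earth's orbital motion

exact = delta_s_exact(a, beta)
approx = delta_s_approx(a, beta)
relative_error = (exact - approx) / approx
print("exact          =", exact)
print("approximate    =", approx)
print("relative error =", relative_error)
```

The relative error comes out on the order of $v^2/c^2$, which corresponds to the neglected $O(v^4/c^4)$ term in the path difference.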

FIGURE 1.10(a). An incident wavecrest enters from the right and is reflected up from a stationary
surface. The dots show where the corner of the wavecrest is at equally spaced time intervals while it is
reflecting off the surface. Labels: incident wavecrest moving to the left; reflected wavecrest moving up; reflecting surface.

FIGURE 1.10(b). The same wavecrest is shown here at four instants of time, each instant
separated from the next by a time interval of Δt, as it enters from the right and reflects off a flat
surface traveling from left to right across the page. The dots show where the corner of the wavecrest
is at these four instants of time, and the thick dashed line shows the effective slant of the surface
experienced by the wavecrest as it reflects. Labels: same incident wavecrest at four equally spaced instants of time; direction of travel of incident wavecrest; direction of travel of reflected wavecrest; reflecting surface at four equally spaced instants of time.

Since $v^2/c^2 \cong 10^{-8}$ and $v^4/c^4 \cong 10^{-16}$, it makes sense to neglect the $v^4/c^4$ terms and write

\[ \Delta s \cong \frac{a v^2}{c^2} \approx 10^{-8}\, a . \tag{1.3a} \]

It is perhaps of interest to point out that Michelson, by mistakenly assuming that the light
traveling up and back the arm perpendicular to the orbital velocity covered a distance 2a instead
of $2ac/\sqrt{c^2 - v^2}$, ended up with

\[ \Delta s \cong \frac{2 a v^2}{c^2} \approx 2 \times 10^{-8}\, a \tag{1.3b} \]

in his 1881 paper. This incorrect formula did not affect Michelson’s overall analysis because, as
he explained in the paper, the data was good enough to rule out an effect ten times smaller than
what he expected to see.
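The factor of two between Eqs. (1.3a) and (1.3b) follows directly from the two assumptions about the perpendicular arm. The sketch below compares them, with the arm length set to 1 in arbitrary units and v/c set to 10⁻⁴; the function names are hypothetical labels for the two cases.

```python
import math

def delta_s_correct(a, beta):
    """Path difference with the perpendicular arm treated as in Eq. (1.2); beta = v/c."""
    parallel = 2.0 * a / (1.0 - beta**2)
    perpendicular = 2.0 * a / math.sqrt(1.0 - beta**2)
    return parallel - perpendicular

def delta_s_1881(a, beta):
    """Path difference if the perpendicular round trip is taken to be exactly 2a."""
    return 2.0 * a / (1.0 - beta**2) - 2.0 * a

a = 1.0        # arm length in arbitrary units (assumption)
beta = 1.0e-4  # v/c for the Earth's orbital motion

ratio = delta_s_1881(a, beta) / delta_s_correct(a, beta)
print(f"correct: {delta_s_correct(a, beta):.3e}")
print(f"1881:    {delta_s_1881(a, beta):.3e}")
print(f"ratio:   {ratio:.4f}")
```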
As pointed out in Sec. 1.1, when white light passed through the interferometer with one of the
end mirrors slightly tilted, Michelson saw a central dark band or fringe from the centerline of the
tilted mirror because the centerline is the same distance from the beam splitter as the untilted
mirror. Remembering that Michelson used a beam splitter that reversed the direction of vibration
in one of the recombining beams, we know that at the center of the dark fringe each
monochromatic wavetrain in the white-light beam cancels itself out. At the first colored band or
fringe on either side of the centerline, the wavetrains go from cancelling themselves out to
reinforcing themselves, becoming bright at those positions on the tilted mirror where the length
traveled out and back the tilted mirror arm is a half-wavelength longer than at the center of the
dark band [see, for example, the transition from Fig. 1.5(c) to Fig. 1.5(e)]. Hence, for each
monochromatic wavetrain, the transition from dark to bright is halfway complete where the
length traveled out and back the tilted-mirror arm is a quarter wavelength different from what it is
at the center of the dark band. Considering the joint actions of all the monochromatic wavetrains
in the white-light beam, Michelson then knew that going from the center to the edge of the dark
fringe corresponded to shifting from a position on the tilted mirror where the length out and back
in both interferometer arms was equal to a position where the length out and back the tilted
mirror arm was different by one quarter of the average wavelength $\lambda_{av}$ of the white-light beam.
Thus the fringe widths inside the telescope’s field of view gave him an extremely fine-grained
scale for measuring the difference in distance between the two arms. For greater accuracy, a
monochromatic beam could be sent through the interferometer and the tilted mirror adjusted until
the fringes matched up with the scale marks of the telescope’s eyepiece.
If the interferometer is rotated so that the arm originally parallel to v is now perpendicular to
v, then the distance out and back one arm is shorter by $\Delta s$ and the distance out and back in the
other arm is longer by $\Delta s$, so there is, according to Eq. (1.3a), a shift of

\[ 2\Delta s \cong \frac{2 a v^2}{c^2} \approx 2 \times 10^{-8}\, a \tag{1.4} \]

of the wavefield from one arm when compared to the wavefield from the other arm. If $2\Delta s$ equals
$\lambda_{av}/4$, the dark fringe shifts until its center is located at the previous position of one of its edges;
if $2\Delta s$ is larger, then the dark fringe shifts more; and if $2\Delta s$ is smaller, then the dark fringe shifts
less. For the value of a he chose, Michelson expected the fringe to shift by approximately one-
tenth its width. To within experimental error, he did not see the dark fringe shift at all. Michelson
concluded that

the hypothesis of the stationary ether is thus shown to be incorrect, and the necessary conclusion follows that
the hypothesis is erroneous.8
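The size of the expected fringe shift can be estimated from Eq. (1.4). In the sketch below, the arm length of 1.2 m and the average wavelength of 570 nm are illustrative assumptions, since this section does not state the values Michelson used; the full width of the dark fringe is taken to be twice the center-to-edge distance, which is also an assumption.

```python
def fringe_shift_fractions(a_m, wavelength_av_m, beta_squared=1.0e-8):
    """Expected shift 2*delta_s as a fraction of the dark fringe's half width and full width."""
    shift = 2.0 * a_m * beta_squared          # Eq. (1.4) with v^2/c^2 = 10^-8
    center_to_edge = wavelength_av_m / 4.0    # path change from fringe center to edge
    full_width = 2.0 * center_to_edge         # assumed: edge to edge is twice center to edge
    return shift / center_to_edge, shift / full_width

a_m = 1.2                  # hypothetical arm length in meters
wavelength_av_m = 570e-9   # hypothetical average wavelength of white light

of_half_width, of_full_width = fringe_shift_fractions(a_m, wavelength_av_m)
print(f"shift / (center-to-edge) = {of_half_width:.3f}")
print(f"shift / (full width)     = {of_full_width:.3f}")
```

Under these assumed values the shift is a modest fraction of the fringe width, the same order as the one-tenth quoted above.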

The existence of the ether was accepted by a lot of scientists, so this experiment was by no
means the last word in the matter; indeed, it inaugurated 50 years of ever more painstaking
attempts to detect an ether wind using larger and more sensitive Michelson interferometers.
Michelson himself took the first step down this road when, in 1887, he collaborated with Edward
Morley to repeat his experiment; Fig. 1.11 shows the optical diagram of the interferometer they
constructed. They concluded that the velocity v of the interferometer with respect to the ether was
probably less than a sixth of the Earth’s orbital velocity, an upper limit suggested by
experimental error.9 Michelson and Morley regarded this as another negative result. Many
scientists, including Michelson, at first interpreted these experiments as showing that the Earth
dragged along a layer of ether near its surface, making it hard to say just how fast the
interferometer might be moving with respect to the ether in the laboratory. Interferometers were
set up on tops of mountains and sent up in high-altitude balloons, hoping to get outside the ether
layer dragged along by the Earth, but no one came up with any results convincingly larger than
experimental error. According to Einstein’s special theory of relativity, published in 1905, there
is no reason to expect “ether drift” at all, because the speed of light is the same in all inertial
frames of reference. After 1905, attempts to detect ether drift were basically attempts to disprove
relativity theory, and scientists who pursued them were regarded by their peers as ever more
eccentric. Perhaps the last serious attempt to detect an ether wind using a Michelson
interferometer took place on top of Mount Wilson, where Dayton Miller ran an extremely large
and sensitive Michelson experiment in the 1920s. When publishing the results in the early 1930s,
he claimed to detect ether-wind velocities on the order of 10 km/sec,10,11 but the data remained

8
Michelson, “The Relative Motion of the Earth.”
9
A. Michelson and E. Morley, “On the Relative Motion of the Earth and the Luminiferous Ether,” American Journal
of Science 34, Series 3 (1887), 333–345.
10
D. Miller, “The Ether-Drift Experiment and the Determination of the Absolute Motion of the Earth,” Reviews of
Modern Physics 5, no. 2 (July 1933), 203–242.


controversial. After his death, the results were attributed to slight but systematic temperature
changes in the instrument during the measurements.12

1.3 Monochromatic Light and Spectral Lines

The wavelength λ of a monochromatic light wave and the frequency f in cycles per unit time of
that same monochromatic light wave are connected by

\[ \lambda f = c , \tag{1.5} \]

where c is the velocity of light. By the second half of the 19th century, it was known that the light
emitted by free atoms, such as from the atoms inside a hot dilute gas, is often emitted at specific
frequencies called spectral lines. Equation (1.5) then requires the light from a spectral line to
have a precise wavelength λ = c/f. Michelson used these spectral lines to generate the
monochromatic light sent through his interferometer. When, for example, a spectroscope was
used to separate out the cadmium red line and send it through the interferometer, he would see a
regular pattern of red fringes; when the mercury green line was sent through, he would see
regular green fringes; and so on. Many of these lines are in reality clumped groups of spectral
lines, all having nearly the same wavelength; they masquerade as a single bright line when
observed by low-resolution spectroscopes and spectrometers.
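
As a small numerical illustration of Eq. (1.5), the following sketch converts a spectral line's wavelength into its frequency and back again. The function names and the sample wavelength (roughly that of the mercury green line) are choices made for this example, not values taken from the text.

```python
# Illustrative sketch of Eq. (1.5): wavelength * frequency = c.
C = 299_792_458.0  # speed of light in vacuum, m/s


def frequency_from_wavelength(wavelength_m: float) -> float:
    """Frequency (Hz) of monochromatic light with the given wavelength (m)."""
    return C / wavelength_m


def wavelength_from_frequency(frequency_hz: float) -> float:
    """Wavelength (m) of monochromatic light with the given frequency (Hz)."""
    return C / frequency_hz


if __name__ == "__main__":
    green = 546.1e-9  # assumed sample wavelength in meters
    f = frequency_from_wavelength(green)
    print(f"frequency: {f:.4e} Hz")
    print(f"round trip: {wavelength_from_frequency(f) * 1e9:.1f} nm")
```

Because a spectral line fixes f, it also fixes λ, which is what makes such a line usable as a length standard.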

1.4 Applying the Michelson Interferometer to Spectral Lines
After the first ether-wind experiments, Michelson demonstrated that his interferometer could also
be used both as an extremely accurate, practical ruler for measuring fundamental lengths and as
an extremely high-resolution spectrometer. To understand Michelson’s approach, we must keep
in mind that the only “optical detectors” available back then were cameras (whose images had to
be chemically developed in darkrooms) and the human eye.
When the interferometer is used as a ruler or spectrometer, one of the arms is modified so that
its mirror is easily moved, as shown in Fig. 1.12. This moving mirror and the fixed mirror on the
other arm are still slightly tilted with respect to each other; that is, when extended indefinitely,
the planes of the mirror surfaces do not meet at exactly 90°. In this discussion, we refer to the
moving mirror as being tilted and the fixed mirror as being untilted. To keep things consistent
with the discussion in Sec. 1.1, the beam splitter is assumed to be the same type used in the 1881



11
D. Miller, “The Ether-Drift Experiment and the Determination of the Absolute Motion of the Earth,” Nature
(February 3, 1934), 162–164.
12
R. Shankland, S. McCuskey, F. Leone, and G. Kuerti, “New Analysis of the Interferometer Observations of
Dayton C. Miller,” Reviews of Modern Physics, no. 2 (April 1955), 167–178.


Applying the Michelson Interferometer to Spectral Lines · 1.4

FIGURE 1.11.


ether-wind experiment. Hence, when a white-light beam is sent through the instrument, an
observer notes a central dark fringe if the center of the tilted moving mirror is the same distance
from the beam splitter as the center of the fixed mirror. This equidistant position of the moving
mirror is today often called the position of zero-path difference (ZPD) because the light’s path up
and back each arm of the interferometer is the same when there is no tilt present.
The position and tilt of the moving mirror can be adjusted until the central dark fringe is
centered on rulings marked in the telescope’s eyepiece. When the white-light beam is replaced by
a monochromatic beam from a spectral line, the observer sees a sequence of light and dark bands
forming a regular pattern of fringes having the same color as the spectral line. The marked
position of the central dark fringe in the center of the eyepiece is now occupied by a dark null of
the monochromatic fringe pattern. This null corresponds to the centerline strip of the tilted
mirror’s surface being the same distance from the beam splitter as the untilted mirror’s surface.
The two bright fringes on either side of the marked null separate that null from the two
neighboring nulls, with the neighboring nulls corresponding to two strips of the tilted mirror’s
surface that are a half-wavelength closer to, and a half-wavelength further away from, the beam
splitter. A half-wavelength difference in distance from the beam splitter creates, of course, a full
wavelength’s difference in the distance traveled up and back the interferometer’s arm, which is
why we see another null. Depending on the configuration of the telescope, the amount of tilt in
the tilted mirror, and the wavelength of the monochromatic beam, there will be some number of
additional fringes alternating bright and dark across the field of view, with the nulls
corresponding to strips of the tilted mirror’s surface that are one half-wavelength closer to and
further away from the beam splitter, two halves or one full wavelength closer to and further away
from the beam splitter, three halves closer to and further away from the beam splitter, and so on.
The observer can slowly move the tilted mirror out along its arm, watching as the fringe
pattern moves across the telescope’s field of view. The movement occurs, of course, because the
strips of the moving mirror’s tilted surface that are 1/2, 1, 3/2, etc., wavelengths closer to or
further away from the beam splitter are now no longer where they used to be. The marked null
shifts and, after the mirror moves half a wavelength from its original position, the null that used
to be immediately to one side shifts into the marked location. The fringe pattern looks the same
as just before the mirror began moving, but the observer knows there has been a half-wavelength
shift in the position of the moving mirror because the fringes have been carefully watched as their
positions changed. As the mirror moves, old fringes move out of sight on one side of the field of
view while new fringes replace them on the other side of the field of view. The observer checks
that the tilt of the moving mirror does not change by making sure that there is always the same
number of bright-null repetitions in the fringe pattern. Since the position of the moving mirror is
always known to within a small fraction of a wavelength, the interferometer has now become an
extremely accurate way to measure distance.
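
The fringe-counting procedure just described amounts to simple arithmetic: each null that passes the marked position corresponds to half a wavelength of mirror travel. The sketch below shows that bookkeeping; the function name and the sample numbers are assumptions made for illustration.

```python
def mirror_displacement(nulls_counted: float, wavelength_m: float) -> float:
    """Mirror travel (m) after the given number of nulls has crossed the mark.

    Each null crossing the marked position means the mirror has moved by
    half a wavelength, so the travel is nulls_counted * wavelength / 2.
    Fractions of a fringe can be passed in as non-integer counts.
    """
    return nulls_counted * wavelength_m / 2.0


if __name__ == "__main__":
    wavelength = 600e-9  # assumed sample wavelength in meters
    for count in (1, 2000, 2000.25):
        d = mirror_displacement(count, wavelength)
        print(f"{count:>8} nulls -> {d * 1e6:.4f} micrometers")
```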

FIGURE 1.12. [Schematic: source radiance containing spectral lines, beam splitter, compensator plate, fixed mirror, moving mirror displaced by p, and output to telescope.]

Michelson did not hesitate to measure distances with his interferometer. In 1892 he
established that the standard meter bar in Paris corresponded, to an accuracy of one part in two
million, to 1,553,163.5 wavelengths of monochromatic light from the red cadmium spectral line.
At Yerkes Observatory in Wisconsin, he measured the extremely small tidal distortions of the
planet Earth due to the moon’s gravity, helping to establish that the Earth has an iron core, and
published the results in 1919. There is, however, a fundamental difficulty limiting his ability to
use the interferometer as a ruler: As the moving mirror gets further and further away from its
equidistant or ZPD position, the pattern of fringes starts to fade and eventually disappears. This
phenomenon is caused by the beam from the spectral line not being exactly monochromatic—
either because what looks like a single spectral line is in reality a group of two or more lines
having almost the same wavelength, or because the line itself has a finite spectral “width,”
simultaneously emitting light at a very large number of wavelengths all very close to each other
in value.
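
The meter measurement quoted above can be turned around to give the wavelength of the red cadmium line that it implies. The short sketch below does this arithmetic; the variable and function names are invented for the example, and only the two quoted numbers come from the text.

```python
WAVELENGTHS_PER_METER = 1_553_163.5  # count quoted in the text for the red cadmium line
RELATIVE_ACCURACY = 1.0 / 2_000_000  # "one part in two million"


def implied_wavelength(wavelengths_per_meter: float) -> float:
    """Wavelength (m) implied by the number of wavelengths that fit in one meter."""
    return 1.0 / wavelengths_per_meter


if __name__ == "__main__":
    lam = implied_wavelength(WAVELENGTHS_PER_METER)
    print(f"implied wavelength: {lam * 1e9:.4f} nm")
    print(f"uncertainty at the quoted accuracy: {lam * RELATIVE_ACCURACY * 1e9:.6f} nm")
    # The mirror moves half a wavelength per fringe, so one meter of travel
    # corresponds to twice as many fringes as wavelengths.
    print(f"fringes per meter of mirror travel: {2 * WAVELENGTHS_PER_METER:.0f}")
```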
To see why the fade-out occurs for a closely spaced group of spectral lines, we first analyze
what happens when the light from a pair of equal-intensity, closely spaced spectral lines,
sometimes called a spectral doublet, is sent through the interferometer. Inside the interferometer,
the doublet behaves like two monochromatic beams—each having a slightly different
wavelength—simultaneously passing through the instrument. After using white light to put the
moving, tilted mirror at its ZPD position, we begin sending the doublet beam through the
interferometer. Each monochromatic beam produces a fringe pattern. To the human eye, the
fringe patterns have the same color and their nulls seem to be at exactly the same places in the
telescope’s field of view. Because the wavelengths of the beams are nearly identical, the two
fringe patterns lie almost exactly on top of each other, reinforcing each other the same way the
dashed and solid oscillations lie on top of each other to create a thicker line at the left-hand edge
of Fig. 1.13. When, for example, there is a null in one beam’s fringe pattern because that strip of
the tilted mirror’s surface is an integer number of half-wavelengths closer to or further away from
the beam splitter, the null from the other beam’s fringe pattern falls in almost exactly the same
place because it has almost exactly the same wavelength. As we shift the moving mirror further
away from ZPD and watch the fringes move, we know that when each new fringe forms at the
leading edge of the field of view, it shows that the edge of the tilted moving mirror is an ever
larger number of half-wavelengths further from the beam splitter. Sooner or later, however, the
same thing happens to the two beams’ fringe patterns that happens in Fig. 1.13 as we look away
from its left-hand edge—the oscillations get out of phase. Just as the dashed and solid lines in
Fig. 1.13 no longer match up exactly because they have slightly different repetition lengths, so do
the two fringe patterns of the two beams match up less well because they have slightly different
wavelengths. There always comes a point—perhaps when the next null is forming at 10,000 or
50,000 or more half-wavelengths from the ZPD position of the moving mirror—where the
monochromatic beam with the slightly shorter wavelength λ1 is ready to form a null somewhat
before the beam with the slightly longer wavelength λ2. The nulls and brights from one
monochromatic fringe pattern shift enough with respect to the other that we begin to notice a
change: the pattern begins to fade. Eventually, the two fringe patterns are completely out of


phase, with the brights and nulls of one pattern lying on, respectively, the nulls and brights of the
other. If the two beams are of equal intensity, then the fringe pattern fades away completely.
Suppose the λ1 set of fringes first becomes exactly out of phase with the λ2 set of fringes when
the moving mirror has traveled a distance of approximately N/2 wavelengths of the λ2 beam from
its equidistant or ZPD location. At this point, N satisfies the approximate equation

$$ \frac{1}{2} N \lambda_2 \cong \frac{1}{2}\left( N + \frac{1}{2} \right) \lambda_1 , \tag{1.6a} $$

which can also be written as

$$ \frac{\lambda_2 - \lambda_1}{\lambda_1} \cong \frac{1}{2N} . \tag{1.6b} $$

This gives the formula for the fractional spread

$$ \frac{\lambda_2 - \lambda_1}{\lambda_1} $$

between the doublet’s wavelengths in terms of N. If N is too large for convenient counting and
only several digits of accuracy are needed, we can directly measure the distance p in Fig. 1.12 at
which the fringe pattern disappears. Recognizing that both sides of Eq. (1.6a) are formulas for p
at the fade-out point, we can approximate either side of Eq. (1.6a) by N λav/2, where λav is the
approximate wavelength of the doublet, and write

$$ \frac{N \lambda_{av}}{2} \cong p . \tag{1.6c} $$

Solving for N gives the formula

$$ N \cong \frac{2p}{\lambda_{av}} \tag{1.6d} $$

to estimate N in terms of the known values of p and λav. This approximate value of N can then
be put into Eq. (1.6b) to find the fractional spread in the doublet. Hence, we see that the fade-out
is both a “bug” and a “feature” of the interferometer—although it sets a limit on the distances that
can be measured, it also specifies the exact separation of spectral lines too close to be resolved by
other types of spectrometers. This exercise also establishes the basic idea behind Michelson-
based spectroscopy: examining the behavior of the interference signal to measure the beam’s
spectral shape.
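
The recipe in Eqs. (1.6b) and (1.6d) can be sketched in a few lines of code. The function names are hypothetical, and the sample values of p and λav are assumed for illustration (they are roughly what one would expect for the sodium D doublet), not measurements reported in the text.

```python
def fringe_count_at_fadeout(p_m: float, lam_av_m: float) -> float:
    """Eq. (1.6d): N is approximately 2p / lambda_av."""
    return 2.0 * p_m / lam_av_m


def fractional_spread(p_m: float, lam_av_m: float) -> float:
    """Eq. (1.6b) with N from Eq. (1.6d): (lambda2 - lambda1)/lambda1 ~ 1/(2N)."""
    return 1.0 / (2.0 * fringe_count_at_fadeout(p_m, lam_av_m))


if __name__ == "__main__":
    p = 0.145e-3      # assumed fade-out distance of the moving mirror, meters
    lam_av = 589.3e-9  # assumed average wavelength of the doublet, meters
    n = fringe_count_at_fadeout(p, lam_av)
    spread = fractional_spread(p, lam_av)
    print(f"N is about {n:.0f}")
    print(f"fractional spread is about {spread:.3e}")
    print(f"wavelength separation is about {spread * lam_av * 1e9:.3f} nm")
```

Combining the two equations shows that the fractional spread is simply λav/(4p), so a longer fade-out distance means a more closely spaced doublet.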


FIGURE 1.13. The solid oscillation represents the fringe pattern of one spectral line in the doublet and
the dashed oscillation represents the fringe pattern of the other spectral line in the doublet. The
wavelengths of both spectral lines are almost the same, so their fringe patterns slowly change from being
in-phase, to being out-of-phase, and then back to being in-phase.

[Plot for Fig. 1.13; regions labeled from left to right: strong fringes, weak fringes, no fringes, weak fringes, strong fringes.]

Now that we understand why the fringe pattern of a doublet fades, it is easy to see why the
same sort of thing happens with any size group—or multiplet—of closely spaced spectral lines.
Each line of intrinsically greater or lesser intensity generates a fringe pattern of intrinsically
greater or lesser intensity connected to its wavelength. Near ZPD, all the fringe patterns are in
phase, but as the moving mirror shifts away from ZPD, the fringe patterns, since each is produced
by a slightly different wavelength, go out of phase, causing the fringes to fade. Figure 1.14 even
suggests a quick way of understanding something about why a single, finite-width spectral line
also produces fading fringe patterns; approximating it as a closely spaced multiplet, we might
expect its fringes to behave the same way any other multiplet’s would. We should, however, be
careful about carrying this sort of reasoning too far. Figure 1.13 suggests that if, after reaching
the fade-out point, we keep moving the tilted mirror away from its ZPD position, then the
doublet’s fringe pattern starts to reappear, eventually becoming as strong as it was near ZPD. The
same sort of phenomenon should also occur for any multiplet consisting of a finite number of
exact wavelengths; if we go far enough from ZPD, then there should be a region where the fringe
patterns are all back in phase. In reality, when moving away from ZPD, there are indeed regions
where a multiplet’s fringe pattern first fades then grows stronger, but the finite width of each
spectral line inside the multiplet stops the fringes from ever regaining their full ZPD strength.
The fringes always, eventually, fade away completely. To explain this behavior, it is enough to
examine how and why the fringe pattern of a single, finite-width spectral line fades away. This is
done in the next three sections, where we show how a fringe pattern is connected to the Fourier
transform of the spectral intensity.
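
The fade-out and reappearance of an idealized doublet's fringes can be checked numerically. The sketch below is only a model under stated assumptions: each line is taken to be perfectly monochromatic and to contribute an intensity 1 − cos(2π·2p/λ) at mirror offset p (a null at ZPD), and the fringe strength is measured as a contrast ratio over about one fringe. The function names and the two sample wavelengths are assumptions made for illustration.

```python
import math


def doublet_intensity(p: float, lam1: float, lam2: float) -> float:
    """Summed intensity of two equal-strength, exactly monochromatic lines.

    Assumed model: each line alone gives 1 - cos(2*pi*(2p)/lambda),
    which has a null at p = 0 (ZPD).
    """
    return (1.0 - math.cos(4.0 * math.pi * p / lam1)) + (
        1.0 - math.cos(4.0 * math.pi * p / lam2)
    )


def fringe_contrast(p: float, lam1: float, lam2: float, samples: int = 400) -> float:
    """(max - min)/(max + min) of the intensity over about one fringe starting at p."""
    window = 0.25 * (lam1 + lam2)  # half the average wavelength of mirror travel
    values = [
        doublet_intensity(p + window * k / samples, lam1, lam2)
        for k in range(samples + 1)
    ]
    hi, lo = max(values), min(values)
    return (hi - lo) / (hi + lo)


def fadeout_position(lam1: float, lam2: float) -> float:
    """Mirror offset at which the two fringe patterns are first exactly out of phase."""
    return lam1 * lam2 / (4.0 * (lam2 - lam1))


if __name__ == "__main__":
    lam1, lam2 = 589.0e-9, 589.6e-9  # assumed sample doublet, meters
    p_fade = fadeout_position(lam1, lam2)
    for label, p in (("ZPD", 0.0), ("fade-out", p_fade), ("twice fade-out", 2 * p_fade)):
        print(f"{label:>15}: contrast = {fringe_contrast(p, lam1, lam2):.4f}")
```

In this idealized model the contrast returns to full strength at twice the fade-out distance; a finite line width, which the model leaves out, is what prevents that full recovery in practice.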


1.5 Interference Equation for the Ideal Michelson Interferometer


When using a Michelson interferometer for Fourier-transform spectroscopy, the end mirrors in
each arm are aligned to be perpendicular to the line of sight between their centers and the center
of the beam splitter. In effect, we remove the tilt from the moving mirror so that its central fringe
fills the detector’s field of view in Fig. 1.15. The light beam passing through the interferometer
should be collimated, shown schematically in Fig. 1.15, by putting the point source of the beam
at the focus of a thin lens. The beam leaving the interferometer is concentrated onto a detector by
another thin lens. The dashed line shows the ZPD position of the moving mirror in Figs. 1.15 and
1.16. The moving mirror is a distance p from ZPD in these two figures, with p taken to be
positive when the mirror is further away from the beam splitter than its ZPD position and
negative when it is closer to the beam splitter than its ZPD position. The moving mirror should
remain perpendicular to the line of sight between it and the beam splitter as p changes, and the
detector records the changing intensity I of the collimated beam leaving the interferometer.
Even though Michelson did not usually set up his interferometers this way, optical theory was
advanced enough then for him to predict how I depends on p. The first step is to set up an x, y, z
Cartesian coordinate system such as the one shown in Fig. 1.16, with the collimated exit beam
traveling down the z axis. There are dimensionless unit vectors x̂ , ŷ , ẑ pointing in the direction
of the positive x, y, z coordinate axes. Still treating a light beam as a transverse wavefield of the
type shown in Figs. 1.2(a)–1.2(c) and 1.3(a)–1.3(c), we assume that beam TR in Fig. 1.16 is
monochromatic light and write its transverse disturbance as

$$ \vec{A}_f = \hat{x}\, U_f \cos\left( \frac{2\pi z}{\lambda_f} - 2\pi f t + \delta_U \right) + \hat{y}\, V_f \cos\left( \frac{2\pi z}{\lambda_f} - 2\pi f t + \delta_V \right) . \tag{1.7a} $$

Here, t is the time coordinate, f is the frequency of the monochromatic disturbance, and λf is the
wavelength corresponding to frequency f. The period of the disturbance is, of course, 1/f, and Eq.
(1.5) reminds us that the wavelength λf is connected to the frequency f by

$$ \lambda_f f = c , $$

where again c is the speed of light. Vector $\vec{A}_f$ has no ẑ component, allowing it to represent a
transverse disturbance in the “ether” of the type shown in Figs. 1.2(a)–1.2(c) and 1.3(a)–1.3(c).
The x̂ and ŷ components of $\vec{A}_f$ are the real-valued expressions

$$ U_f \cos\left( \frac{2\pi z}{\lambda_f} - 2\pi f t + \delta_U \right) $$

FIGURE 1.14. [Two plots of spectral intensity versus frequency f; one is labeled “Spectral Multiplet.”]

FIGURE 1.15. [Schematic: source at focus, beam splitter at 45 deg., compensator plate, fixed mirror and moving mirror each at 90 deg., moving-mirror displacement p, and detector.]

and

$$ V_f \cos\left( \frac{2\pi z}{\lambda_f} - 2\pi f t + \delta_V \right) $$

respectively. These components must both oscillate at the same frequency f because the light
beam is monochromatic, but they can have different constant phase shifts δU and δV. This allows
$\vec{A}_f$ to point in different directions in the x, y plane when we move along the beam, as suggested
by the changing orientations of the arrows in beams RT and TR of Fig. 1.16. The Uf and Vf
amplitudes of the x and y oscillations do not have to be equal. To simplify the notation, and
because the concept will be routinely used in the rest of the book, we define

$$ \sigma_f = \frac{1}{\lambda_f} \tag{1.7b} $$

to be the wavenumber of the monochromatic disturbance. Now Eqs. (1.7a) and (1.5) can be
written as

$$ \vec{A}_f = \hat{x}\, U_f \cos\left( 2\pi\sigma_f z - 2\pi f t + \delta_U \right) + \hat{y}\, V_f \cos\left( 2\pi\sigma_f z - 2\pi f t + \delta_V \right) \tag{1.7c} $$

with

$$ \sigma_f = f/c . \tag{1.7d} $$

This is the same monochromatic disturbance as before; all that changes is the notation used to
specify how its phase changes with z.
The power transported by a physical wavefield of any type is usually proportional to its
squared amplitude;13,14 and in optics it is now, as it was in Michelson’s time, customary to set the
time average of the squared amplitude equal to the intensity of the transverse wavefield.15 Visible
light has a wavelength on the order of $5 \times 10^{-7}$ meters, so by Eq. (1.5) its frequency is about

$$ f \cong \frac{c}{5 \times 10^{-7}\ \text{meters}} \cong 6 \times 10^{14}\ \text{Hz} \tag{1.8a} $$

given that $c \cong 3 \times 10^{8}$ m/sec. Hence one cycle of the transverse wavefield has a period of about

13
H. Lamb, Hydrodynamics (6th edition), Dover Publications, New York, 1945 copy of the 6th edition first
published in 1879, p. 370.
14
P. Morse and K. Ingard, Theoretical Acoustics, McGraw-Hill, Inc., New York, 1968, p. 250.
15
G. Stokes, Mathematical and Physical Papers, Vol. III, Cambridge at the University Press, 1901, pp. 233-258.

FIGURE 1.16. [Schematic: moving mirror, fixed mirror, beam splitter, and compensator plate; exit beams RT and TR travel along the z axis, with the x and y axes and unit vectors marked; the path difference is labeled χ = 2p.]

$$ \frac{1}{6 \times 10^{14}\ \text{Hz}} \cong 2 \times 10^{-15}\ \text{sec} . \tag{1.8b} $$

The response time of the unaided human eye is perhaps as short as $10^{-2}$ s, and $2 \times 10^{-15}$ s is
shorter than that by a factor of about $10^{13}$. The response of the fastest optical detectors available
today is on the order of $10^{-9}$ s, which is still an incredibly long time compared to $2 \times 10^{-15}$ s.
Therefore, we might as well take the time over which the squared amplitude is averaged to be
infinitely long, because compared to the wavefield’s period, that’s what it effectively is.
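
The orders of magnitude in this paragraph are easy to reproduce. The sketch below uses the same rounded values as the text (a wavelength of 5 × 10⁻⁷ m and c ≅ 3 × 10⁸ m/sec); the function names are assumptions made for the example.

```python
C = 3.0e8  # rounded speed of light used in the text, m/s


def optical_period(wavelength_m: float, c: float = C) -> float:
    """Period of one cycle, 1/f, with f = c / wavelength from Eq. (1.5)."""
    return wavelength_m / c


def cycles_during(response_time_s: float, wavelength_m: float, c: float = C) -> float:
    """Number of wavefield cycles that elapse during a detector's response time."""
    return response_time_s / optical_period(wavelength_m, c)


if __name__ == "__main__":
    lam = 5e-7  # meters, the representative visible wavelength used in the text
    print(f"period: {optical_period(lam):.2e} s")
    print(f"cycles during an eye response of 1e-2 s: {cycles_during(1e-2, lam):.1e}")
    print(f"cycles during a detector response of 1e-9 s: {cycles_during(1e-9, lam):.1e}")
```

Even the fast detector averages over several hundred thousand cycles, which is why treating the averaging time as infinite is a reasonable approximation.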
Following the notation of the time, the time average of a function g(t) is taken to be

$$ j\big( g(t) \big) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} g(t)\, dt . \tag{1.9a} $$

For any two functions g(t) and h(t), we then have

$$ j\big( g(t) + h(t) \big) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} [ g(t) + h(t) ]\, dt = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} g(t)\, dt + \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} h(t)\, dt $$

or

$$ j\big( g(t) + h(t) \big) = j\big( g(t) \big) + j\big( h(t) \big) . \tag{1.9b} $$

Multiplying g(t) by a constant K and then averaging, we get

$$ j\big( K \cdot g(t) \big) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} [ K g(t) ]\, dt = K \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} g(t)\, dt $$

or

$$ j\big( K \cdot g(t) \big) = K \cdot j\big( g(t) \big) . \tag{1.9c} $$

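The squared amplitude of the monochromatic wavefield in Eq. (1.7c) is

$$ \vec{A}_f \cdot \vec{A}_f = U_f^2 \cos^2\left( 2\pi\sigma_f z - 2\pi f t + \delta_U \right) + V_f^2 \cos^2\left( 2\pi\sigma_f z - 2\pi f t + \delta_V \right) . $$

Time averaging both sides to get the intensity gives

$$ j\big( \vec{A}_f \cdot \vec{A}_f \big) = j\Big( U_f^2 \cos^2\left( 2\pi\sigma_f z - 2\pi f t + \delta_U \right) + V_f^2 \cos^2\left( 2\pi\sigma_f z - 2\pi f t + \delta_V \right) \Big) , \tag{1.10a} $$

which becomes, applying Eqs. (1.9b) and (1.9c),

$$ j\big( \vec{A}_f \cdot \vec{A}_f \big) = U_f^2\, j\Big( \cos^2\left( 2\pi\sigma_f z - 2\pi f t + \delta_U \right) \Big) + V_f^2\, j\Big( \cos^2\left( 2\pi\sigma_f z - 2\pi f t + \delta_V \right) \Big) . \tag{1.10b} $$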

The average of the squared cosine is 1/2 over one of its cycles.16 As the averaging time gets
longer, it contains ever more cycles of the squared cosine, as well as—almost certainly—some
fraction of a cycle. The contribution of the squared cosine over a fractional cycle has practically
no influence compared to the squared cosine’s average value of 1/2 over a large number of
complete cycles. In the limit as T → ∞, it follows that

$$ j\big( \cos^2(at + b) \big) = 1/2 \tag{1.10c} $$

for all real values of a and b with a ≠ 0. Hence, the formula for the intensity of the monochromatic beam in
Eq. (1.10b) now reduces to

$$ j\big( \vec{A}_f \cdot \vec{A}_f \big) = \frac{1}{2}\left( U_f^2 + V_f^2 \right) . \tag{1.10d} $$

Although the squared cosine is always positive, the cosine itself is negative as often as it is
positive and averages to zero over one cycle. As the averaging time increases, it includes an ever
larger number of cycles as well as (probably) some leftover fraction of a cycle. Again, the
influence of the zero from the large number of complete cycles outweighs the contribution of
whatever fractional cycle may be present, and in the limit as T → ∞

$$ j\big( \cos(at + b) \big) = 0 \tag{1.11} $$

for all real values of a and b with a ≠ 0.
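
The two limits in Eqs. (1.10c) and (1.11) can be checked numerically by evaluating the average in Eq. (1.9a) for larger and larger finite T. The sketch below does this with a simple midpoint rule; the helper name `time_average`, the step count, and the sample values of a and b are assumptions made for illustration.

```python
import math


def time_average(g, T: float, steps: int = 200_000) -> float:
    """Midpoint-rule estimate of (1/2T) * integral of g(t) from -T to T."""
    dt = 2.0 * T / steps
    total = 0.0
    for k in range(steps):
        t = -T + (k + 0.5) * dt
        total += g(t)
    return total * dt / (2.0 * T)


if __name__ == "__main__":
    a, b = 2.0 * math.pi * 3.0, 0.7  # arbitrary nonzero a and arbitrary b
    for T in (1.3, 10.3, 100.3):     # deliberately not whole numbers of cycles
        avg_sq = time_average(lambda t: math.cos(a * t + b) ** 2, T)
        avg = time_average(lambda t: math.cos(a * t + b), T)
        print(f"T = {T:6.1f}: <cos^2> = {avg_sq:.5f}, <cos> = {avg:+.5f}")
```

The leftover fraction of a cycle contributes an error that shrinks in proportion to 1/T, which is the numerical counterpart of the argument made in the text.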
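The wavefield of a beam of light containing two monochromatic wavetrains of frequencies f1
and f2 can be written as

$$ \vec{A} = \vec{A}_{f_1} + \vec{A}_{f_2} , \tag{1.12a} $$

where

$$ \vec{A}_{f_1} = \hat{x}\, U_{f_1} \cos\left( 2\pi\sigma_{f_1} z - 2\pi f_1 t + \delta_U^{(1)} \right) + \hat{y}\, V_{f_1} \cos\left( 2\pi\sigma_{f_1} z - 2\pi f_1 t + \delta_V^{(1)} \right) \tag{1.12b} $$

and

$$ \vec{A}_{f_2} = \hat{x}\, U_{f_2} \cos\left( 2\pi\sigma_{f_2} z - 2\pi f_2 t + \delta_U^{(2)} \right) + \hat{y}\, V_{f_2} \cos\left( 2\pi\sigma_{f_2} z - 2\pi f_2 t + \delta_V^{(2)} \right) . \tag{1.12c} $$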

16
D. Griffiths, Introduction to Electrodynamics, 2nd ed. (Prentice Hall, Englewood Cliffs, NJ, 1989), p. 359.


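The beam’s intensity is the time average of its squared amplitude, which is

$$ j\big( \vec{A} \cdot \vec{A} \big) = j\Big( ( \vec{A}_{f_1} + \vec{A}_{f_2} ) \cdot ( \vec{A}_{f_1} + \vec{A}_{f_2} ) \Big) = j\Big( \vec{A}_{f_1} \cdot \vec{A}_{f_1} + \vec{A}_{f_2} \cdot \vec{A}_{f_2} + 2 \vec{A}_{f_1} \cdot \vec{A}_{f_2} \Big) . $$

Equations (1.9b) and (1.9c) can be applied to get

$$ j\big( \vec{A} \cdot \vec{A} \big) = j\big( \vec{A}_{f_1} \cdot \vec{A}_{f_1} \big) + j\big( \vec{A}_{f_2} \cdot \vec{A}_{f_2} \big) + 2\, j\big( \vec{A}_{f_1} \cdot \vec{A}_{f_2} \big) . \tag{1.12d} $$

Substituting Eqs. (1.12b) and (1.12c) into the cross term in Eq. (1.12d) gives

$$ \begin{aligned} j\big( \vec{A}_{f_1} \cdot \vec{A}_{f_2} \big) = j\Big( & U_{f_1} U_{f_2} \cos\left( 2\pi\sigma_{f_1} z - 2\pi f_1 t + \delta_U^{(1)} \right) \cos\left( 2\pi\sigma_{f_2} z - 2\pi f_2 t + \delta_U^{(2)} \right) \\ & + V_{f_1} V_{f_2} \cos\left( 2\pi\sigma_{f_1} z - 2\pi f_1 t + \delta_V^{(1)} \right) \cos\left( 2\pi\sigma_{f_2} z - 2\pi f_2 t + \delta_V^{(2)} \right) \Big) . \end{aligned} $$

Again, Eqs. (1.9b) and (1.9c) are applied to get

$$ \begin{aligned} j\big( \vec{A}_{f_1} \cdot \vec{A}_{f_2} \big) = {} & U_{f_1} U_{f_2}\, j\Big( \cos\left( 2\pi\sigma_{f_1} z - 2\pi f_1 t + \delta_U^{(1)} \right) \cos\left( 2\pi\sigma_{f_2} z - 2\pi f_2 t + \delta_U^{(2)} \right) \Big) \\ & + V_{f_1} V_{f_2}\, j\Big( \cos\left( 2\pi\sigma_{f_1} z - 2\pi f_1 t + \delta_V^{(1)} \right) \cos\left( 2\pi\sigma_{f_2} z - 2\pi f_2 t + \delta_V^{(2)} \right) \Big) . \end{aligned} \tag{1.12e} $$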

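There is a trigonometric identity

$$ (\cos\alpha)(\cos\beta) = \frac{1}{2} \cos(\alpha + \beta) + \frac{1}{2} \cos(\alpha - \beta) , \tag{1.12f} $$

which shows that

$$ \begin{aligned} & \cos\left( 2\pi\sigma_{f_1} z - 2\pi f_1 t + \delta_U^{(1)} \right) \cos\left( 2\pi\sigma_{f_2} z - 2\pi f_2 t + \delta_U^{(2)} \right) \\ & \quad = \frac{1}{2} \cos\left( 2\pi z (\sigma_{f_1} + \sigma_{f_2}) - 2\pi t (f_1 + f_2) + \delta_U^{(1)} + \delta_U^{(2)} \right) \\ & \qquad + \frac{1}{2} \cos\left( 2\pi z (\sigma_{f_1} - \sigma_{f_2}) - 2\pi t (f_1 - f_2) + \delta_U^{(1)} - \delta_U^{(2)} \right) . \end{aligned} \tag{1.12g} $$

Taking the time average of both sides and applying Eqs. (1.9b) and (1.9c), we see that

$$ \begin{aligned} & j\Big( \cos\left( 2\pi\sigma_{f_1} z - 2\pi f_1 t + \delta_U^{(1)} \right) \cos\left( 2\pi\sigma_{f_2} z - 2\pi f_2 t + \delta_U^{(2)} \right) \Big) \\ & \quad = \frac{1}{2}\, j\Big( \cos\left( 2\pi z (\sigma_{f_1} + \sigma_{f_2}) - 2\pi t (f_1 + f_2) + \delta_U^{(1)} + \delta_U^{(2)} \right) \Big) \\ & \qquad + \frac{1}{2}\, j\Big( \cos\left( 2\pi z (\sigma_{f_1} - \sigma_{f_2}) - 2\pi t (f_1 - f_2) + \delta_U^{(1)} - \delta_U^{(2)} \right) \Big) . \end{aligned} $$

Equation (1.11) requires both terms on the right-hand side to be zero, which gives

$$ j\Big( \cos\left( 2\pi\sigma_{f_1} z - 2\pi f_1 t + \delta_U^{(1)} \right) \cos\left( 2\pi\sigma_{f_2} z - 2\pi f_2 t + \delta_U^{(2)} \right) \Big) = 0 . \tag{1.12h} $$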

Replacing $\delta_U^{(1,2)}$ by $\delta_V^{(1,2)}$ in the algebra used to reach this result does not change the
conclusion, which means that

$$ j\Big( \cos\left( 2\pi\sigma_{f_1} z - 2\pi f_1 t + \delta_V^{(1)} \right) \cos\left( 2\pi\sigma_{f_2} z - 2\pi f_2 t + \delta_V^{(2)} \right) \Big) = 0 \tag{1.12i} $$

also. Substituting these two formulas into Eq. (1.12e) leads to

$$ j\big( \vec{A}_{f_1} \cdot \vec{A}_{f_2} \big) = 0 \tag{1.12j} $$

for any two frequencies f1 and f2 such that f1 ≠ f2. Hence, Eq. (1.12d) can be written as

$$ j\big( \vec{A} \cdot \vec{A} \big) = j\big( \vec{A}_{f_1} \cdot \vec{A}_{f_1} \big) + j\big( \vec{A}_{f_2} \cdot \vec{A}_{f_2} \big) . \tag{1.12k} $$

Comparing the formula in (1.12k) for the intensity of a beam containing two monochromatic
wavefields to the left-hand side of the formula in (1.10d) for the intensity of a single
monochromatic wavefield, we note that the intensity of the beam with two monochromatic
wavefields is the sum of the intensities of each monochromatic wavefield.
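
The additivity of intensities for two distinct frequencies can also be checked numerically. The sketch below time-averages the squared sum of two cosine wavetrains (one transverse component only, evaluated at a fixed z so that the z-dependent phase is absorbed into the constant phase) and compares the result with the sum of the single-wavetrain intensities U²/2 from Eq. (1.10d). The low frequencies of 5 and 7 cycles per unit time stand in for optical frequencies, and all names and sample values are assumptions made for illustration.

```python
import math


def time_average(g, T: float, steps: int = 200_000) -> float:
    """Midpoint-rule estimate of (1/2T) * integral of g(t) from -T to T."""
    dt = 2.0 * T / steps
    total = 0.0
    for k in range(steps):
        t = -T + (k + 0.5) * dt
        total += g(t)
    return total * dt / (2.0 * T)


def beam_intensity(wavetrains, T: float = 100.3) -> float:
    """Time average of the squared sum of wavetrains given as (U, f, delta) tuples."""
    def squared_amplitude(t: float) -> float:
        field = sum(U * math.cos(-2.0 * math.pi * f * t + d) for U, f, d in wavetrains)
        return field * field
    return time_average(squared_amplitude, T)


if __name__ == "__main__":
    first = (1.0, 5.0, 0.3)   # amplitude, frequency, phase (all assumed)
    second = (0.5, 7.0, 1.1)
    total = beam_intensity([first, second])
    separate = beam_intensity([first]) + beam_intensity([second])
    print(f"intensity of the combined beam:   {total:.4f}")
    print(f"sum of the separate intensities: {separate:.4f}")
    print(f"prediction U1^2/2 + U2^2/2:      {0.5 * 1.0**2 + 0.5 * 0.5**2:.4f}")
```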
The wavefield of a beam of light containing three monochromatic wavetrains of frequencies
f1, f2, and f3 can be written as

$$ \vec{A} = \vec{A}_{f_1} + \vec{A}_{f_2} + \vec{A}_{f_3} \tag{1.13a} $$

with $\vec{A}_{f_1}$, $\vec{A}_{f_2}$ specified by formulas (1.12b) and (1.12c) respectively and $\vec{A}_{f_3}$ specified by

$$ \vec{A}_{f_3} = \hat{x}\, U_{f_3} \cos\left( 2\pi\sigma_{f_3} z - 2\pi f_3 t + \delta_U^{(3)} \right) + \hat{y}\, V_{f_3} \cos\left( 2\pi\sigma_{f_3} z - 2\pi f_3 t + \delta_V^{(3)} \right) . \tag{1.13b} $$

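Following the same analysis as before, we note that the intensity of this three-frequency light
beam is

$$ \begin{aligned} j\big( \vec{A} \cdot \vec{A} \big) &= j\Big( ( \vec{A}_{f_1} + \vec{A}_{f_2} + \vec{A}_{f_3} ) \cdot ( \vec{A}_{f_1} + \vec{A}_{f_2} + \vec{A}_{f_3} ) \Big) \\ &= j\Big( \vec{A}_{f_1} \cdot \vec{A}_{f_1} + \vec{A}_{f_2} \cdot \vec{A}_{f_2} + \vec{A}_{f_3} \cdot \vec{A}_{f_3} + 2 \vec{A}_{f_1} \cdot \vec{A}_{f_2} + 2 \vec{A}_{f_1} \cdot \vec{A}_{f_3} + 2 \vec{A}_{f_2} \cdot \vec{A}_{f_3} \Big) \\ &= j\big( \vec{A}_{f_1} \cdot \vec{A}_{f_1} \big) + j\big( \vec{A}_{f_2} \cdot \vec{A}_{f_2} \big) + j\big( \vec{A}_{f_3} \cdot \vec{A}_{f_3} \big) \\ &\quad + 2\, j\big( \vec{A}_{f_1} \cdot \vec{A}_{f_2} \big) + 2\, j\big( \vec{A}_{f_1} \cdot \vec{A}_{f_3} \big) + 2\, j\big( \vec{A}_{f_2} \cdot \vec{A}_{f_3} \big) . \end{aligned} $$

Equation (1.12j) shows that

$$ j\big( \vec{A}_{f_1} \cdot \vec{A}_{f_2} \big) = 0 $$

for any two distinct frequencies f1 and f2. The only thing different about $j( \vec{A}_{f_1} \cdot \vec{A}_{f_3} )$ and
$j( \vec{A}_{f_2} \cdot \vec{A}_{f_3} )$ is the subscripts assigned to the distinct frequencies, so the same algebra showing
that $j( \vec{A}_{f_1} \cdot \vec{A}_{f_2} )$ is zero also shows that

$$ j\big( \vec{A}_{f_1} \cdot \vec{A}_{f_3} \big) = j\big( \vec{A}_{f_2} \cdot \vec{A}_{f_3} \big) = 0 . $$

Hence, the three-frequency formula for $j( \vec{A} \cdot \vec{A} )$ reduces to

$$ j\big( \vec{A} \cdot \vec{A} \big) = j\big( \vec{A}_{f_1} \cdot \vec{A}_{f_1} \big) + j\big( \vec{A}_{f_2} \cdot \vec{A}_{f_2} \big) + j\big( \vec{A}_{f_3} \cdot \vec{A}_{f_3} \big) . \tag{1.13c} $$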

Here again, the intensity of the beam equals the sum of the intensities of its monochromatic
wavetrains.
This same argument can obviously be generalized to a beam consisting of N monochromatic
wavetrains. Since N may be left unspecified and can be made as large as we please, this is the
same as extending it to a beam of white light. The white-light wavefield can be written as

$$ \vec{A} = \sum_{i=1}^{N} \vec{A}_{f_i} , \tag{1.14a} $$

where

$$ \vec{A}_{f_i} = \hat{x}\, U_{f_i} \cos\left( 2\pi\sigma_{f_i} z - 2\pi f_i t + \delta_U^{(i)} \right) + \hat{y}\, V_{f_i} \cos\left( 2\pi\sigma_{f_i} z - 2\pi f_i t + \delta_V^{(i)} \right) \tag{1.14b} $$

with $f_i \neq f_j$ whenever i ≠ j. The intensity of this beam is

$$ j\big( \vec{A} \cdot \vec{A} \big) = j\left( \left( \sum_{i=1}^{N} \vec{A}_{f_i} \right) \cdot \left( \sum_{j=1}^{N} \vec{A}_{f_j} \right) \right) = j\left( \sum_{i=1}^{N} \sum_{j=1}^{N} \vec{A}_{f_i} \cdot \vec{A}_{f_j} \right) , $$

or, applying Eq. (1.9b),

$$ j\big( \vec{A} \cdot \vec{A} \big) = \sum_{i=1}^{N} \sum_{j=1}^{N} j\big( \vec{A}_{f_i} \cdot \vec{A}_{f_j} \big) . \tag{1.14c} $$

Equation (1.12j) requires

$$ j\big( \vec{A}_{f_i} \cdot \vec{A}_{f_j} \big) = 0 \tag{1.14d} $$

whenever i ≠ j, so Eq. (1.14c) reduces to

$$ j\big( \vec{A} \cdot \vec{A} \big) = j\big( \vec{A}_{f_1} \cdot \vec{A}_{f_1} \big) + j\big( \vec{A}_{f_2} \cdot \vec{A}_{f_2} \big) + \cdots + j\big( \vec{A}_{f_N} \cdot \vec{A}_{f_N} \big) = \sum_{i=1}^{N} j\big( \vec{A}_{f_i} \cdot \vec{A}_{f_i} \big) \tag{1.14e} $$

because all the i ≠ j terms disappear. Equation (1.14e) shows that the intensity of any beam, even
a white-light beam, is the sum of the intensities of its monochromatic wavetrains. This is
sometimes called the principle of independent superposition,17 and can be written as

$$ I = I_{f_1} + I_{f_2} + \cdots + I_{f_N} = \sum_{i=1}^{N} I_{f_i} , \tag{1.14f} $$

where

$$ I = j\big( \vec{A} \cdot \vec{A} \big) \tag{1.14g} $$

is the total intensity of the beam and

$$ I_{f_i} = j\big( \vec{A}_{f_i} \cdot \vec{A}_{f_i} \big) \tag{1.14h} $$

is the intensity of the beam’s monochromatic wavetrain of frequency fi.


Returning now to Fig. 1.16, we suppose that Eqs. (1.14f)–(1.14h) refer to beam TR and
consider how to write the disturbance for beam RT. In an ideal Michelson interferometer, the
only difference between beam RT and beam TR is that the wavefields in beam RT lag behind the
wavefields in beam TR by a distance χ = 2p that is usually called the optical-path difference.
Using the notation specified in Eq. (1.14b), we see that for every monochromatic wavetrain

17
J. Chamberlain, The Principles of Interferometric Spectroscopy (John Wiley & Sons, New York, 1979), p. 98.


$$ \vec{A}^{(TR)}_{f_i} = \hat{x}\, U_{f_i} \cos\left( 2\pi\sigma_{f_i} z - 2\pi f_i t + \delta_U^{(i)} \right) + \hat{y}\, V_{f_i} \cos\left( 2\pi\sigma_{f_i} z - 2\pi f_i t + \delta_V^{(i)} \right) \tag{1.15a} $$

in beam TR, there must be, according to Fig. 1.16, a corresponding monochromatic wavetrain

$$ \vec{A}^{(RT)}_{f_i} = \hat{x}\, U_{f_i} \cos\left( 2\pi\sigma_{f_i} (z + \chi) - 2\pi f_i t + \delta_U^{(i)} \right) + \hat{y}\, V_{f_i} \cos\left( 2\pi\sigma_{f_i} (z + \chi) - 2\pi f_i t + \delta_V^{(i)} \right) \tag{1.15b} $$

in beam RT. The total disturbance for the combined beams’ $f_i$th wavetrain is then

$$ \vec{A}^{(RT)}_{f_i} + \vec{A}^{(TR)}_{f_i} $$

in Fig. 1.16. We also note, however, that the beam splitter in Fig. 1.16 is evidently not the same
sort of beam splitter as the one used by Michelson because it does not reverse the direction of the
oscillation of the TR beam the way that the beam splitter in Fig. 1.8 did. For this sort of beam
splitter, the total disturbance of the combined beam’s $f_i$th wavetrain should be

$$ \vec{A}^{(RT)}_{f_i} - \vec{A}^{(TR)}_{f_i} $$

according to the discussion at the end of Sec. 1.1. To accommodate both possibilities, we write
the $f_i$th wavetrain of the combined beam as

$$ \vec{A}^{(cb)}_{f_i} = \vec{A}^{(RT)}_{f_i} + W \vec{A}^{(TR)}_{f_i} , \tag{1.15c} $$

where parameter W is −1 for Michelson-type beam splitters and 1 for non-Michelson beam
splitters. The superscript (cb) indicates that the disturbance $\vec{A}^{(cb)}_{f_i}$ is the $f_i$th wavetrain of two
beams combined in a balanced way; that is, each beam has undergone one transmission and one
reflection at the beam splitter. The intensity of the combined $f_i$th wavetrain is

$$ \begin{aligned} I^{(cb)}_{f_i} &= j\big( \vec{A}^{(cb)}_{f_i} \cdot \vec{A}^{(cb)}_{f_i} \big) = j\Big( \big( \vec{A}^{(RT)}_{f_i} + W \vec{A}^{(TR)}_{f_i} \big) \cdot \big( \vec{A}^{(RT)}_{f_i} + W \vec{A}^{(TR)}_{f_i} \big) \Big) \\ &= j\Big( \vec{A}^{(RT)}_{f_i} \cdot \vec{A}^{(RT)}_{f_i} + W^2 \vec{A}^{(TR)}_{f_i} \cdot \vec{A}^{(TR)}_{f_i} + 2W \vec{A}^{(RT)}_{f_i} \cdot \vec{A}^{(TR)}_{f_i} \Big) . \end{aligned} $$

Applying Eqs. (1.9b) and (1.9c) gives

$$ I^{(cb)}_{f_i} = j\big( \vec{A}^{(RT)}_{f_i} \cdot \vec{A}^{(RT)}_{f_i} \big) + j\big( \vec{A}^{(TR)}_{f_i} \cdot \vec{A}^{(TR)}_{f_i} \big) + 2W\, j\big( \vec{A}^{(RT)}_{f_i} \cdot \vec{A}^{(TR)}_{f_i} \big) , \tag{1.15d} $$

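where we have recognized that $W^2 = 1$ because W = ±1. Since both disturbances have the same fi
frequency, Eq. (1.12j) cannot be used to say that $j( \vec{A}^{(RT)}_{f_i} \cdot \vec{A}^{(TR)}_{f_i} )$ is zero. Substituting from
(1.15a) and (1.15b) gives

$$ \begin{aligned} j\big( \vec{A}^{(RT)}_{f_i} \cdot \vec{A}^{(TR)}_{f_i} \big) = j\Big( & U_{f_i}^2 \cos\left( 2\pi\sigma_{f_i} (z + \chi) - 2\pi f_i t + \delta_U^{(i)} \right) \cos\left( 2\pi\sigma_{f_i} z - 2\pi f_i t + \delta_U^{(i)} \right) \\ & + V_{f_i}^2 \cos\left( 2\pi\sigma_{f_i} (z + \chi) - 2\pi f_i t + \delta_V^{(i)} \right) \cos\left( 2\pi\sigma_{f_i} z - 2\pi f_i t + \delta_V^{(i)} \right) \Big) , \end{aligned} $$

or

$$ \begin{aligned} j\big( \vec{A}^{(RT)}_{f_i} \cdot \vec{A}^{(TR)}_{f_i} \big) = {} & U_{f_i}^2\, j\Big( \cos\left( 2\pi\sigma_{f_i} z + 2\pi\sigma_{f_i} \chi - 2\pi f_i t + \delta_U^{(i)} \right) \cos\left( 2\pi\sigma_{f_i} z - 2\pi f_i t + \delta_U^{(i)} \right) \Big) \\ & + V_{f_i}^2\, j\Big( \cos\left( 2\pi\sigma_{f_i} z + 2\pi\sigma_{f_i} \chi - 2\pi f_i t + \delta_V^{(i)} \right) \cos\left( 2\pi\sigma_{f_i} z - 2\pi f_i t + \delta_V^{(i)} \right) \Big) . \end{aligned} \tag{1.15e} $$

Formula (1.12f) shows that

$$ \begin{aligned} & j\Big( \cos\left( 2\pi\sigma_{f_i} z + 2\pi\sigma_{f_i} \chi - 2\pi f_i t + \delta_U^{(i)} \right) \cos\left( 2\pi\sigma_{f_i} z - 2\pi f_i t + \delta_U^{(i)} \right) \Big) \\ & \quad = j\left( \frac{1}{2} \cos\left( 4\pi\sigma_{f_i} z + 2\pi\sigma_{f_i} \chi - 4\pi f_i t + 2\delta_U^{(i)} \right) + \frac{1}{2} \cos\left( 2\pi\sigma_{f_i} \chi \right) \right) . \end{aligned} $$

Applying (1.9b) and (1.9c), we get that

$$ \begin{aligned} & j\Big( \cos\left( 2\pi\sigma_{f_i} z + 2\pi\sigma_{f_i} \chi - 2\pi f_i t + \delta_U^{(i)} \right) \cos\left( 2\pi\sigma_{f_i} z - 2\pi f_i t + \delta_U^{(i)} \right) \Big) \\ & \quad = \frac{1}{2}\, j\Big( \cos\left( 4\pi\sigma_{f_i} z + 2\pi\sigma_{f_i} \chi - 4\pi f_i t + 2\delta_U^{(i)} \right) \Big) + \frac{1}{2}\, j\Big( \cos\left( 2\pi\sigma_{f_i} \chi \right) \Big) . \end{aligned} \tag{1.15f} $$

The time average of any time-independent quantity equals that quantity; that is,

$$ j( K ) = K \tag{1.15g} $$

for any constant K. Equation (1.11) shows that

$$ j\Big( \cos\left( 4\pi\sigma_{f_i} z + 2\pi\sigma_{f_i} \chi - 4\pi f_i t + 2\delta_U^{(i)} \right) \Big) = 0 . $$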


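These two results can be substituted into (1.15f) to get

$$\left\langle \cos\!\left[ 2\pi\sigma_{f_i} z - 2\pi\sigma_{f_i}\chi - 2\pi f_i t + \delta_U^{(i)} \right] \cos\!\left[ 2\pi\sigma_{f_i} z - 2\pi f_i t + \delta_U^{(i)} \right] \right\rangle = \tfrac{1}{2} \cos\!\left( 2\pi\sigma_{f_i}\chi \right) . \tag{1.15h}$$

Replacing $\delta_U^{(i)}$ by $\delta_V^{(i)}$ does not change the algebra used to derive (1.15h). It follows that

$$\left\langle \cos\!\left[ 2\pi\sigma_{f_i} z - 2\pi\sigma_{f_i}\chi - 2\pi f_i t + \delta_V^{(i)} \right] \cos\!\left[ 2\pi\sigma_{f_i} z - 2\pi f_i t + \delta_V^{(i)} \right] \right\rangle = \tfrac{1}{2} \cos\!\left( 2\pi\sigma_{f_i}\chi \right) . \tag{1.15i}$$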
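The time averages in (1.15h) and (1.15i) can be checked numerically. The following is a minimal sketch, not part of the original derivation: it averages the product of the two cosines over one full period $1/f_i$ and compares the result with $\tfrac{1}{2}\cos(2\pi\sigma_{f_i}\chi)$. The function name, the numerical values, and the sign convention of the phase arguments are all assumptions chosen for illustration; the final result does not depend on the sign of the $\chi$ term because the cosine is even.

```python
import math

def time_average_product(sigma, chi, freq, z=0.0, delta=0.3, n_steps=10000):
    """Average cos(a)*cos(b) over one period 1/freq with a uniform Riemann sum.

    a and b are the two phase arguments appearing in (1.15h); the values of
    z and delta are arbitrary and should not affect the answer.
    """
    period = 1.0 / freq
    total = 0.0
    for k in range(n_steps):
        t = period * k / n_steps
        a = 2 * math.pi * sigma * (z - chi) - 2 * math.pi * freq * t + delta
        b = 2 * math.pi * sigma * z - 2 * math.pi * freq * t + delta
        total += math.cos(a) * math.cos(b)
    return total / n_steps

# Illustrative values in arbitrary units (assumed, not from the text).
sigma, freq, chi = 2.0, 5.0, 0.1
lhs = time_average_product(sigma, chi, freq)
rhs = 0.5 * math.cos(2 * math.pi * sigma * chi)
print(lhs, rhs)
```

Changing `z` or `delta` leaves the average unchanged, which is the numerical counterpart of the statement that the rapidly oscillating term averages to zero.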

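Substituting (1.15h) and (1.15i) into (1.15e) now gives

$$\left\langle \vec{A}_{f_i}^{(RT)} \cdot \vec{A}_{f_i}^{(TR)} \right\rangle = \tfrac{1}{2} \left( U_{f_i}^2 + V_{f_i}^2 \right) \cos\!\left( 2\pi\sigma_{f_i}\chi \right) , \tag{1.15j}$$

and this result can be put into (1.15d) to get

$$I_{f_i}^{(cb)} = \left\langle \vec{A}_{f_i}^{(RT)} \cdot \vec{A}_{f_i}^{(RT)} \right\rangle + \left\langle \vec{A}_{f_i}^{(TR)} \cdot \vec{A}_{f_i}^{(TR)} \right\rangle + W \left( U_{f_i}^2 + V_{f_i}^2 \right) \cos\!\left( 2\pi\sigma_{f_i}\chi \right) . \tag{1.15k}$$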

For an ideal Michelson interferometer, the intensity of the $f_i$th monochromatic wavetrain in the RT beam and the intensity of the $f_i$th monochromatic wavetrain in the TR beam must be identical because they arise in a symmetric way from the $f_i$th wavetrain of the white-light beam entering the instrument. We can imagine taking out the moving mirror from its interferometer arm so that only the TR beam is reflected back to the beam splitter. This means that only the $\vec{A}_{f_i}^{(TR)}$ monochromatic disturbance leaves the interferometer in the proper direction, and its intensity is, of course, $\left\langle \vec{A}_{f_i}^{(TR)} \cdot \vec{A}_{f_i}^{(TR)} \right\rangle$. Taking out the fixed mirror in the other arm and replacing the moving mirror in the first arm ensures that only the RT beam reflects back to the beam splitter. Now $\left\langle \vec{A}_{f_i}^{(RT)} \cdot \vec{A}_{f_i}^{(RT)} \right\rangle$ is the intensity of the monochromatic disturbance leaving the interferometer in the proper direction. Since we have just said that these two intensities must be equal, it follows that

$$\left\langle \vec{A}_{f_i}^{(RT)} \cdot \vec{A}_{f_i}^{(RT)} \right\rangle = \left\langle \vec{A}_{f_i}^{(TR)} \cdot \vec{A}_{f_i}^{(TR)} \right\rangle . \tag{1.16a}$$


Equation (1.10d) holds true for any monochromatic wavetrain $\vec{A}_f$ of frequency $f$, so it must apply to wavetrain $\vec{A}_{f_i}^{(TR)}$ of frequency $f_i$. Hence, Eq. (1.15a) must mean that

$$\left\langle \vec{A}_{f_i}^{(TR)} \cdot \vec{A}_{f_i}^{(TR)} \right\rangle = \tfrac{1}{2} \left( U_{f_i}^2 + V_{f_i}^2 \right) . \tag{1.16b}$$

Equation (1.10d) also applies to wavetrain $\vec{A}_{f_i}^{(RT)}$ of frequency $f_i$ in Eq. (1.15b), which similarly leads to

$$\left\langle \vec{A}_{f_i}^{(RT)} \cdot \vec{A}_{f_i}^{(RT)} \right\rangle = \tfrac{1}{2} \left( U_{f_i}^2 + V_{f_i}^2 \right) . \tag{1.16c}$$

The right-hand sides of (1.16b) and (1.16c) are the same, which makes sense since the left-hand
sides of (1.16b) and (1.16c) must satisfy Eq. (1.16a).
Again taking out the moving mirror, we note that then, in an ideal interferometer, one quarter
of the entering beam’s power ends up leaving the interferometer as beam TR traveling along the z
axis in Fig. 1.16. Hence, if $I_{f_i}^{(0)}$ is the intensity of the $f_i$th monochromatic wavetrain entering this interferometer, we must have

$$\left\langle \vec{A}_{f_i}^{(TR)} \cdot \vec{A}_{f_i}^{(TR)} \right\rangle = \tfrac{1}{4} I_{f_i}^{(0)} . \tag{1.17a}$$

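Consulting Eq. (1.16a), we see that this means

$$\left\langle \vec{A}_{f_i}^{(RT)} \cdot \vec{A}_{f_i}^{(RT)} \right\rangle = \tfrac{1}{4} I_{f_i}^{(0)} \tag{1.17b}$$

and, of course, Eqs. (1.16b) and (1.16c) then reveal that

$$I_{f_i}^{(0)} = 2 \left( U_{f_i}^2 + V_{f_i}^2 \right) . \tag{1.17c}$$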

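Substituting Eqs. (1.17a)–(1.17c) into (1.15k) then leads to

$$I_{f_i}^{(cb)} = \tfrac{1}{2} I_{f_i}^{(0)} + \tfrac{W}{2} I_{f_i}^{(0)} \cos\!\left( 2\pi\sigma_{f_i}\chi \right)$$

or

$$I_{f_i}^{(cb)} = \tfrac{1}{2} I_{f_i}^{(0)} \left[ 1 + W \cos\!\left( 2\pi\sigma_{f_i}\chi \right) \right] . \tag{1.17d}$$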


Equation (1.17d) is the basic equation for the intensity of a monochromatic wavetrain leaving
an ideal Michelson interferometer when the intensity of the corresponding wavetrain entering the
interferometer is $I_{f_i}^{(0)}$ and the moving mirror is displaced from its ZPD position by a distance
$p = \chi/2$, as shown in Fig. 1.16. We note that for those values of $\chi = 2p$ where
$W \cos(2\pi\sigma_{f_i}\chi) = 1$, the intensity of the $f_i$th monochromatic wavetrain leaving the interferometer is
the same as the intensity of the $f_i$th monochromatic wavetrain entering the interferometer. This
corresponds to constructive interference of the $f_i$th monochromatic component of the RT and TR
beams. Suppose the beam entering the interferometer consists of just this one monochromatic
component. Glancing back at Fig. 1.1(b), we see that the power of the beam entering an ideal
Michelson interferometer can leave by either the combined RT and TR dotted beams or by the
two combined dash-dot beams traveling in the opposite direction to the incident beam. The dotted
beams are often called the balanced output of the interferometer, because each one has undergone
one transmission and one reflection at the beam splitter; similarly, the dash-dot beams are called
the unbalanced output, because one beam has undergone two reflections and the other beam has
undergone two transmissions. Conservation of energy requires that the power in all the
monochromatic beams leaving the ideal interferometer must equal the power in the one
monochromatic beam entering the interferometer. Hence, when constructive interference of the
balanced RT and TR beams makes their combined intensity equal to that of the beam entering the
interferometer, we know that destructive interference of the two unbalanced beams must make
their combined intensity equal to zero. Consequently, at each $\chi = 2p$ value where
$W \cos(2\pi\sigma_{f_i}\chi) = 1$, not only is the intensity of the balanced monochromatic beams the same as
that of the monochromatic beam entering the interferometer, but also the intensity of the
unbalanced monochromatic beams is zero. On the other hand, for moving-mirror positions where
$\chi = 2p$ has a value such that $W \cos(2\pi\sigma_{f_i}\chi) = -1$, the intensity of the combined monochromatic
RT and TR beams in Fig. 1.1(b) is zero according to Eq. (1.17d). At these moving-mirror
locations, the balanced output undergoes destructive interference. Conservation of energy then
requires the unbalanced output to undergo constructive interference and have the same intensity
as the monochromatic beam entering the interferometer.
This analysis can be generalized to any mirror position and value of $\chi = 2p$. If $I_{f_i}^{(cu)}$ is the
intensity of the unbalanced monochromatic wavetrain and, as before, $I_{f_i}^{(0)}$ and $I_{f_i}^{(cb)}$ are the
intensities of the incident monochromatic wavetrain and balanced monochromatic wavetrain
respectively, then conservation of energy forces us to write

$$I_{f_i}^{(0)} = I_{f_i}^{(cb)} + I_{f_i}^{(cu)} . \tag{1.18a}$$

Substituting from Eq. (1.17d), we get


$$I_{f_i}^{(0)} = \tfrac{1}{2} I_{f_i}^{(0)} \left[ 1 + W \cos\!\left( 2\pi\sigma_{f_i}\chi \right) \right] + I_{f_i}^{(cu)} ,$$

which can be solved for $I_{f_i}^{(cu)}$ to get

$$I_{f_i}^{(cu)} = \tfrac{1}{2} I_{f_i}^{(0)} \left[ 1 - W \cos\!\left( 2\pi\sigma_{f_i}\chi \right) \right] . \tag{1.18b}$$

This specifies the intensity of the $f_i$th monochromatic wavetrain in the unbalanced output of an
ideal Michelson interferometer.
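Equations (1.17d) and (1.18b) are simple enough to evaluate directly. The sketch below is illustrative only; the function names and the sample numbers are assumptions. It shows that the balanced and unbalanced outputs always add up to the incident intensity, as Eq. (1.18a) requires, and that with $W = -1$ the balanced output vanishes whenever $\chi$ is a whole number of wavelengths.

```python
import math

def balanced_intensity(i0, sigma, chi, w=-1):
    """Eq. (1.17d): balanced output for one monochromatic wavetrain."""
    return 0.5 * i0 * (1.0 + w * math.cos(2.0 * math.pi * sigma * chi))

def unbalanced_intensity(i0, sigma, chi, w=-1):
    """Eq. (1.18b): unbalanced output for one monochromatic wavetrain."""
    return 0.5 * i0 * (1.0 - w * math.cos(2.0 * math.pi * sigma * chi))

# Illustrative values: wavenumber in cm^-1, path difference chi in cm.
i0, sigma = 1.0, 1000.0
for chi in (0.0, 0.00025, 0.0005, 0.001):
    cb = balanced_intensity(i0, sigma, chi)
    cu = unbalanced_intensity(i0, sigma, chi)
    print(chi, cb, cu, cb + cu)
```

With `w=1` the roles of the two outputs are exchanged, which is the only difference between the two beam-splitter types in these formulas.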
The dashed lines in Fig. 1.17 show the positions of the moving mirror at which

$$\chi = \ldots, \frac{n}{\sigma_{f_i}}, \frac{n+1}{\sigma_{f_i}}, \frac{n+2}{\sigma_{f_i}}, \ldots .$$

These are the positions where $I_{f_i}^{(cb)} = 0$ in Eq. (1.17d) when $W = -1$ for an interferometer using a
Michelson-type beam splitter. Substituting from Eq. (1.7b), this can also be written as

$$\chi = \ldots, n\lambda_{f_i}, (n+1)\lambda_{f_i}, (n+2)\lambda_{f_i}, \ldots ,$$

where $\lambda_{f_i}$ is the wavelength of the $f_i$th monochromatic wavetrain. For beam splitters where
$W = 1$, of course, these dashed lines represent the moving-mirror positions at which $I_{f_i}^{(cb)} = I_{f_i}^{(0)}$. If
the moving mirror is slightly tilted, so that its surface crosses more than one dashed line, and the
beam entering the interferometer contains only the $f_i$th monochromatic wavetrain, then the
combined RT and TR beams leaving the interferometer have light and dark strips as the surface
of the tilted mirror crosses through those planes in space where an untilted mirror would produce
an all-bright or an all-dark balanced output. This connects Eq. (1.17d) to the bright and null
fringe patterns from a spectral line discussed in Sec. 1.4.
When a beam of white light (that is, a beam having many different frequencies) passes through the interferometer, the principle of independent superposition in Eq. (1.14f) requires the
intensity of the interferometer’s balanced output to be the sum of the intensities of each
monochromatic wavetrain,

$$I^{(cb)} = \sum_{i=1}^{N} I_{f_i}^{(cb)} ,$$

which becomes, substituting from Eq. (1.17d),

$$I^{(cb)} = \frac{1}{2} \sum_{i=1}^{N} I_{f_i}^{(0)} \left[ 1 + W \cos\!\left( 2\pi\sigma_{f_i}\chi \right) \right] . \tag{1.19a}$$
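The discrete sum in Eq. (1.19a) translates directly into a short routine. This is a sketch under assumed inputs: the list of (intensity, wavenumber) pairs and the function name are hypothetical and chosen only to illustrate how the monochromatic contributions add.

```python
import math

def balanced_output(lines, chi, w=-1):
    """Eq. (1.19a): sum the balanced intensities of independent wavetrains.

    lines is an iterable of (i0, sigma) pairs, one per monochromatic
    wavetrain; chi is the optical path difference in units of 1/sigma.
    """
    return 0.5 * sum(
        i0 * (1.0 + w * math.cos(2.0 * math.pi * sigma * chi))
        for i0, sigma in lines
    )

# Three hypothetical spectral components (intensity, wavenumber in cm^-1).
lines = [(1.0, 1000.0), (0.5, 1010.0), (0.25, 1500.0)]
print(balanced_output(lines, 0.0, w=1))    # all components in phase at ZPD
print(balanced_output(lines, 0.0, w=-1))   # Michelson-type splitter at ZPD
```

At zero path difference every cosine equals one, so the output is either the full incident intensity ($W = 1$) or zero ($W = -1$), regardless of how many components are present.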

FIGURE 1.17. Moving-mirror positions (dashed lines) for the $n$th through $(n+3)$rd crossings, at which $\chi = n\lambda_{f_i}$, $(n+1)\lambda_{f_i}$, $(n+2)\lambda_{f_i}$, and $(n+3)\lambda_{f_i}$. The distance between adjacent dashed lines is $\lambda_{f_i}/2$.

When describing natural sources of light, we often replace sums of discrete quantities with
integrals over continuous functions, and this transformation was perhaps even more characteristic
of late 19th-century science than it is of today’s physics. So it would be an automatic process for
Michelson and his contemporaries to define a spectral intensity function $I^{(0)}(f)$ to describe the
radiation entering the instrument. When using this sort of mathematical formalism, we say that
$I^{(0)}(f)\,df$ is the optical intensity of all the radiation having frequency values between $f$ and $f + df$
entering the interferometer. The intensity of the balanced output is then

$$I^{(cb)} = \frac{1}{2} \int_0^\infty I^{(0)}(f) \left[ 1 + W \cos\!\left( 2\pi\sigma_f \chi \right) \right] df . \tag{1.19b}$$

The physical meaning of Eq. (1.19b) is exactly the same as Eq. (1.19a); we have just replaced
$I_{f_i}^{(0)}$ by $I^{(0)}(f)\,df$ and changed the sum to an integral. We have also relied on variable $f$ itself
instead of index $i$ to label the different frequencies. To make this last tactic work, we just assume
that $I^{(0)}(f)$ is zero for those frequencies $f$ that are not part of the original sum over $i$; this also
lets us specify the integral to be over all possible frequencies $f$ between 0 and $\infty$. The
wavenumber $\sigma_f$ can be eliminated by substituting from the formula for $f$ in (1.7d) to get

$$I^{(cb)} = \frac{1}{2} \int_0^\infty I^{(0)}(f) \left[ 1 + W \cos\!\left( \frac{2\pi f}{c} \chi \right) \right] df . \tag{1.19c}$$

The only problem with this equation is the unreasonably high numbers required to represent $f$
at optical frequencies; when going from one extreme to the other across the visible spectrum, for
example, frequency $f$ changes from approximately $4 \times 10^{14}$ Hz to $7.5 \times 10^{14}$ Hz. Consequently,
today’s Fourier spectroscopists often use Eq. (1.7d) to eliminate $f$ rather than $\sigma$ from Eq. (1.19b).
To do this, we differentiate both sides of (1.7d) to get

$$df = c\,d\sigma \quad \text{or} \quad d\sigma = \frac{1}{c}\,df$$

and define

$$S(\sigma) = c\,I^{(0)}(c\sigma) \tag{1.19d}$$

so that

$$S(\sigma)\,d\sigma = c\,I^{(0)}(c\sigma) \cdot \frac{1}{c}\,df$$

simplifies to

$$S(\sigma)\,d\sigma = I^{(0)}(c\sigma)\,df . \tag{1.19e}$$


Now Eq. (1.7d) can be applied to (1.19c) to get

$$I^{(cb)} = \frac{1}{2} \int_0^\infty S(\sigma) \left[ 1 + W \cos\!\left( 2\pi\sigma\chi \right) \right] d\sigma . \tag{1.19f}$$
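The change of variables from frequency to wavenumber can be checked with a small numerical experiment. The sketch below assumes a Gaussian spectral intensity, which is an illustrative choice and not taken from the text, and confirms that $S(\sigma) = c\,I^{(0)}(c\sigma)$ carries the same integrated intensity as $I^{(0)}(f)$, as Eq. (1.19e) implies. All names and numerical values are assumptions.

```python
import math

C = 2.99792458e10  # speed of light in cm/s, so that sigma = f / C is in cm^-1

def i0(f, f0=5.0e14, width=1.0e12):
    """Assumed Gaussian spectral intensity I^(0)(f) centered on f0 (Hz)."""
    return math.exp(-((f - f0) / width) ** 2)

def s(sigma):
    """Eq. (1.19d): S(sigma) = c * I^(0)(c * sigma)."""
    return C * i0(C * sigma)

def midpoint_integral(func, lo, hi, n=20000):
    """Plain midpoint-rule quadrature on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(func(lo + (k + 0.5) * h) for k in range(n))

f_lo, f_hi = 4.9e14, 5.1e14
total_f = midpoint_integral(i0, f_lo, f_hi)              # integral of I^(0)(f) df
total_sigma = midpoint_integral(s, f_lo / C, f_hi / C)   # integral of S(sigma) d sigma
print(total_f, total_sigma)
```

The wavenumber integral runs over values near 16,000 to 17,000 cm^-1 instead of frequencies near 5e14 Hz, which is the practical convenience the text describes.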

To get the white-light intensity formulas for the unbalanced output, we can apply to the
unbalanced monochromatic formula the same analysis used on the balanced monochromatic
formula. Comparing the unbalanced formula (1.18b) to the balanced formula (1.17d), we see that
changing the sign of W is all that needs to be done to go from the balanced formula to the
unbalanced formula. Hence, when we apply to the unbalanced formula the same algebra used on
the balanced formula, we know that all the way through the derivation—and, of course, in the
final results—the only difference would be that $W$ is replaced by $-W$. Consequently, we can write
down at once the unbalanced white-light formulas corresponding to (1.19b), (1.19c), and (1.19f)
as

$$I^{(cu)} = \frac{1}{2} \int_0^\infty I^{(0)}(f) \left[ 1 - W \cos\!\left( 2\pi\sigma_f \chi \right) \right] df , \tag{1.20a}$$

$$I^{(cu)} = \frac{1}{2} \int_0^\infty I^{(0)}(f) \left[ 1 - W \cos\!\left( \frac{2\pi f}{c} \chi \right) \right] df , \tag{1.20b}$$

and

$$I^{(cu)} = \frac{1}{2} \int_0^\infty S(\sigma) \left[ 1 - W \cos\!\left( 2\pi\sigma\chi \right) \right] d\sigma \tag{1.20c}$$

respectively. Formulas (1.19b), (1.19c), and (1.19f) contain all the basic information needed to
understand how Fourier-transform spectroscopy works, and they were derived here using only those
facts that Michelson knew over 100 years ago about the nature of light. Unfortunately, they apply
only to an ideal interferometer; not surprisingly, the 19th-century approach used to derive them is
difficult to adapt to the study of both the random and nonrandom errors present in even the most
accurate of today’s Michelson interferometers. For this reason, in Chapter 4 we return to basic
principles and rederive the formula for $I^{(cb)}$ starting from the modern form of Maxwell’s
equations, this time being careful to include all the nonideal terms needed for the error analysis.
Formula (1.19f) is, however, already good enough (if we borrow several mathematical results
from Chapter 2) to explain why the fringes from even the thinnest of spectral lines discussed in
Sec. 1.4 must eventually fade away as $\chi = 2p$ increases.


1.6 Fringe Patterns of Finite-Width Spectral Lines


Finite-width spectral lines, such as the one in the top graph of Fig. 1.18, can be represented by a
spectral intensity function $I^{(0)}(f)$. We can also follow the standard practice of Fourier
spectroscopists and represent the finite-width spectral line by the $S(\sigma)$ function defined in Eq.
(1.19d) and plotted in the bottom graph of Fig. 1.18. If the intensity of a spectral line is described
by a narrow $I^{(0)}(f)$ function such as the one in the top graph of Fig. 1.18, which is significantly
different from zero only between two very closely spaced frequencies $f_1$ and $f_2$, then the
corresponding $S(\sigma)$ curve is significantly different from zero only between the two closely spaced
wavenumbers $\sigma_1 = f_1/c$ and $\sigma_2 = f_2/c$, as shown in the bottom graph of Fig. 1.18.
The right-hand side of Eq. (1.19f) can be split up into the sum of a constant term and a term
that changes as the location coordinate $p = \chi/2$ of the moving mirror changes,

$$I^{(cb)} = \frac{1}{2} \int_0^\infty S(\sigma)\,d\sigma + \frac{W}{2} \int_0^\infty S(\sigma) \cos\!\left( 2\pi\sigma\chi \right) d\sigma . \tag{1.21a}$$

Since $\sigma \ge 0$ in the integrals over $d\sigma$, nothing stops us from replacing $S(\sigma)$ by $S(|\sigma|)$ in the
second term to get

$$\int_0^\infty S(\sigma) \cos\!\left( 2\pi\sigma\chi \right) d\sigma = \int_0^\infty S(|\sigma|) \cos\!\left( 2\pi\sigma\chi \right) d\sigma . \tag{1.21b}$$

Anticipating some of the Fourier material in Chapter 2, we note that, according to Eq. (2.11a)
in Chapter 2, function $S(|\sigma|)$ is even because

$$S(|-\sigma|) = S(|\sigma|) ,$$

and, of course, it is real because it represents a real physical quantity: the intensity of the
spectral line. Turning next to Eq. (2.34g) in Chapter 2, we see that because $S(|\sigma|)$ is a real and
even function, the cosine integral on the right-hand side of Eq. (1.21b) is one half of the Fourier
transform of $S$ [if we specify that parameter $\sigma$ in (1.21b) corresponds to variable $t$ in (2.34g) and
that parameter $\chi$ in (1.21b) corresponds to variable $f$ in (2.34g)]. Anticipating the material in
Chapter 2 one last time, we consult Eq. (2.35k) and note that if the $n$th derivative of $S$ has a
well-defined Fourier transform, then for large values of its argument the Fourier transform of $S$
approaches zero as the inverse $n$th power of the absolute value of its argument. Since $S$ describes a
spectral line (that is, a natural phenomenon), we expect it to have derivatives of all orders and
also expect those derivatives to have Fourier transforms. The argument of the Fourier transform
of $S$ is $\chi$, and we already know that the right-hand side of (1.21b) is half the Fourier transform of
$S$, so we can now conclude that


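$$\int_0^\infty S(\sigma) \cos\!\left( 2\pi\sigma\chi \right) d\sigma = \int_0^\infty S(|\sigma|) \cos\!\left( 2\pi\sigma\chi \right) d\sigma \cong O\!\left( \chi^{-n} \right) \tag{1.21c}$$

for positive values of $n$ as $\chi \to \infty$. Applying this to Eq. (1.21a) shows that

$$I^{(cb)} = \frac{1}{2} \int_0^\infty S(\sigma)\,d\sigma + O\!\left( \chi^{-n} \right) \tag{1.21d}$$

for large values of $\chi$. Hence, as the moving mirror gets further and further from its ZPD location,
increasing the value of $\chi = 2p$, the value of $I^{(cb)}$ eventually stops changing and approaches the
constant value

$$\lim_{\chi \to \infty} I^{(cb)} = \frac{1}{2} \int_0^\infty S(\sigma)\,d\sigma . \tag{1.21e}$$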

This happens for all types of intensity curves, not just those associated with spectral lines. If S
does represent a spectral line such as the one in Fig. 1.18, the brights and nulls associated with
the dashed lines in Fig. 1.17 eventually fade away. Consequently, no matter how the moving
mirror is tilted, no fringes can be seen. If the Michelson interferometer is being used as a ruler,
the fringe counting must stop. When the spectral line is a closely spaced multiplet, each line in
the group has a finite spectral width, ensuring that—no matter how the lines interact with each
other to form bright and dim regions in the overall fringe pattern—eventually any and all fringe
traces must disappear. Every spectral line found in nature produces light having some finite
spectral width, no matter how small, so this sort of fade-out is a universal phenomenon.
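The fade-out can be illustrated numerically. The sketch below assumes a Gaussian line shape $S(\sigma)$ of width 1 cm$^{-1}$ centered at 1000 cm$^{-1}$; the shape, the numbers, and the function names are illustrative assumptions rather than anything specified in the text. To separate the slow fading from the rapid fringe oscillation, the sketch also evaluates the companion sine integral and reports the magnitude of the pair, a diagnostic introduced here only for convenience.

```python
import math

SIGMA0, WIDTH = 1000.0, 1.0  # assumed line center and width, in cm^-1

def line_shape(sigma):
    """Assumed Gaussian spectral line S(sigma)."""
    return math.exp(-((sigma - SIGMA0) / WIDTH) ** 2)

def midpoint_integral(func, lo, hi, n=4000):
    h = (hi - lo) / n
    return h * sum(func(lo + (k + 0.5) * h) for k in range(n))

def fringe_envelope(chi):
    """Magnitude of the oscillating term in Eq. (1.21a) at path difference chi (cm)."""
    lo, hi = SIGMA0 - 8.0 * WIDTH, SIGMA0 + 8.0 * WIDTH
    cos_part = midpoint_integral(
        lambda s: line_shape(s) * math.cos(2.0 * math.pi * s * chi), lo, hi)
    sin_part = midpoint_integral(
        lambda s: line_shape(s) * math.sin(2.0 * math.pi * s * chi), lo, hi)
    return math.hypot(cos_part, sin_part)

for chi in (0.0, 0.25, 0.5, 1.0):
    print(chi, fringe_envelope(chi))
```

For this assumed line the modulation has dropped by more than three orders of magnitude once $\chi$ reaches 1 cm, so $I^{(cb)}$ is effectively the constant of Eq. (1.21e). A narrower line would sustain fringes out to proportionally larger $\chi$.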

1.7 Fourier-Transform Spectrometers


In Michelson’s time there was no easy way to measure the intensity of the exit beam leaving the
interferometer, so it was not practical to measure the change in $I^{(cb)}$ as a function of $\chi = 2p$ in
order to determine the $\chi$-dependent curve,

$$\int_0^\infty S(\sigma) \cos\!\left( 2\pi\sigma\chi \right) d\sigma ,$$

coming from the second term on the right-hand side of Eq. (1.21a). In the previous section we
found that this curve is half the Fourier transform of $S$. This means that if the curve could be
measured, then the Fourier transform could be reversed to get the shape of the $S$ spectrum
entering the interferometer. In the 1950s, both optical detectors to measure $I^{(cb)}$ and digital
computers to reverse the Fourier transform became widely available. Spectroscopists began to
design and build spectrometers based on measuring $I^{(cb)}$ as a function of $\chi$ and then reversing the
Fourier transform to find $S$. Today, these sorts of instruments are usually called Fourier-transform
spectrometers.

FIGURE 1.18. Top graph: spectral intensity $I^{(0)}(f)$ plotted against frequency $f$, significantly different from zero only between $f_1$ and $f_2$. Bottom graph: $S(\sigma) = c\,I^{(0)}(c\sigma)$ plotted against wavenumber $\sigma$, significantly different from zero only between $\sigma_1 = f_1/c$ and $\sigma_2 = f_2/c$.
Equation (1.21a) is an idealized form of the fundamental equation of Fourier-transform
spectroscopy. It describes the intensity of the beam leaving an interferometer whenever we

1) Divide the beam into equal-amplitude secondary beams, and


2) Recombine the two secondary beams after the wavefield of one is shifted a distance $\chi$
with respect to the wavefield of the other.

Although this is exactly what happens inside a standard Michelson interferometer, Figs. 1.19(a)–
1.19(d) show that there are many other combinations of beam splitters and mirrors that divide and
recombine beams in this way.18
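The idea of measuring the $\chi$-dependent term of Eq. (1.21a) and then reversing the transform can be sketched in a few lines. This is a toy illustration under stated assumptions, not a description of how real instruments process data: the spectrum is an assumed Gaussian line, the "measurement" is simulated by direct quadrature, and the reversal uses a plain cosine sum rather than an FFT. For a spectrum confined to positive wavenumbers, the reversal used here is $S(\sigma) = 4\int_0^\infty M(\chi)\cos(2\pi\sigma\chi)\,d\chi$, where $M(\chi)$ is the cosine integral; the factor of 4 follows from the cosine integral being half the Fourier transform of the even function $S(|\sigma|)$.

```python
import math

def line_shape(sigma, sigma0=10.0, width=1.0):
    """Assumed Gaussian spectrum S(sigma) in arbitrary wavenumber units."""
    return math.exp(-((sigma - sigma0) / width) ** 2)

def midpoint_integral(func, lo, hi, n):
    h = (hi - lo) / n
    return h * sum(func(lo + (k + 0.5) * h) for k in range(n))

def fringe_term(chi):
    """Simulated measurement: integral of S(sigma) cos(2 pi sigma chi) d sigma."""
    return midpoint_integral(
        lambda s: line_shape(s) * math.cos(2.0 * math.pi * s * chi), 0.0, 20.0, 400)

# Tabulate the simulated curve at the midpoints of a chi grid on [0, 3].
N_CHI, CHI_MAX = 600, 3.0
H_CHI = CHI_MAX / N_CHI
CHI_GRID = [(k + 0.5) * H_CHI for k in range(N_CHI)]
FRINGE = [fringe_term(chi) for chi in CHI_GRID]

def recovered_spectrum(sigma):
    """Reverse the cosine transform with a midpoint-rule cosine sum."""
    return 4.0 * H_CHI * sum(
        m * math.cos(2.0 * math.pi * sigma * chi)
        for m, chi in zip(FRINGE, CHI_GRID))

for sigma in (9.0, 10.0, 11.0):
    print(sigma, recovered_spectrum(sigma), line_shape(sigma))
```

The recovered values track the assumed line shape closely because the simulated curve has faded to a negligible level well before the end of the $\chi$ grid; truncating the grid earlier would blur the recovered line.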
Figure 1.19(a) shows the first and perhaps most obvious modification. Michelson put the arms
of his interferometer at right angles to maximize the fringe shift due to the ether wind thought to
exist by 19th-century scientists. If all that is desired, however, is to divide and recombine beams,
then the two arms can be at any (reasonable) angle with respect to each other, as shown in Fig.
1.19(a). The setup in Fig. 1.19(a) may in fact have some advantages over the standard Michelson
interferometer; arranging for near-normal reflections off the beam splitter usually modifies the
polarization of the wavefields less than large-angle reflections (see Sec. 4.4 of Chapter 4 for an
explanation of polarization).
Figure 1.19(b) shows that the end mirrors can be replaced by retroreflectors like corner cubes
or cat’s-eyes. For best results, both arms should have the same type of retroreflector.
The discussion following Eq. (1.17d) above explains the difference between the balanced and
unbalanced optical outputs leaving the standard Michelson interferometer. In Figs. 1.19(a) and
1.19(b), the unbalanced output cannot be detected because it goes back out along the entrance
beam, making it impossible to separate the two. The interferometer in Fig. 1.19(c), however,
shows that there are ways to keep the entrance beam separate from the unbalanced output, giving
us access to both the balanced and unbalanced optical signals. According to Eqs. (1.19f) and
(1.20c), if $I^{(cb)}$ is the intensity of the balanced output and $I^{(cu)}$ is the intensity of the unbalanced
output, then

$$I^{(cb)} - I^{(cu)} = W \int_0^\infty S(\sigma) \cos\!\left( 2\pi\sigma\chi \right) d\sigma \tag{1.22a}$$

and

$$I^{(cb)} + I^{(cu)} = \int_0^\infty S(\sigma)\,d\sigma . \tag{1.22b}$$

18 To keep things simple, compensation plates and other secondary optical components have been omitted.

Equation (1.22a) shows that subtracting the output of the detectors measuring the balanced and
unbalanced signals eliminates the constant term and doubles the size of the signal component
containing the Fourier transform. Adding the detectors’ outputs in Eq. (1.22b) eliminates the
Fourier transform, producing the integrated spectral intensity of the entrance beam. This
integrated source intensity should, of course, remain constant during a spectral measurement
because Fourier-transform spectrometers are vulnerable to source fluctuations. Astronomers often
design their Fourier-transform spectrometers so that both the balanced and unbalanced outputs
are available. When they investigate the spectra of weak and fluctuating sources (such as
twinkling stars), these instruments allow them both to double the signal from—and to check the
constancy of—the radiances being measured. If the source fluctuates, formula (1.22b) can be
used to measure the fluctuation. Sometimes this allows the astronomer to rescale the Fourier
signal in (1.22a) to correct the spectral measurement.
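The rescaling idea can be sketched as follows. This is a hypothetical helper, not a procedure given in the text: it assumes the source fluctuation acts as a single gain factor multiplying both outputs at each mirror position, so that the sum in Eq. (1.22b) tracks the gain and can be divided out of the difference in Eq. (1.22a). The function names and the reference level are assumptions.

```python
def sum_and_difference(i_cb, i_cu):
    """Return the detector sum (Eq. 1.22b) and difference (Eq. 1.22a)."""
    return i_cb + i_cu, i_cb - i_cu

def rescaled_fringe_signal(i_cb, i_cu, reference_total):
    """Scale the difference signal to a chosen reference source level.

    reference_total is the integrated source intensity the measurement is
    normalized to; it is an assumed input, for example the sum recorded at
    the start of the scan.
    """
    total, difference = sum_and_difference(i_cb, i_cu)
    return difference * reference_total / total

# Simulated readings: true integrated intensity 2.0, true cosine term 0.3, W = -1.
total_true, fringe_true, w = 2.0, 0.3, -1
for gain in (1.0, 0.8, 1.2):  # hypothetical source fluctuation
    i_cb = gain * 0.5 * (total_true + w * fringe_true)
    i_cu = gain * 0.5 * (total_true - w * fringe_true)
    print(gain, rescaled_fringe_signal(i_cb, i_cu, total_true))
```

Every gain value returns the same corrected signal, $W$ times the cosine term, which is what makes the two-output arrangement useful for fluctuating sources.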
In a standard Michelson interferometer such as the one shown in Fig. 1.1(b), and in the setups
shown in Figs. 1.19(a)–1.19(c), the wavefield of one recombining beam is displaced a distance $\chi$
with respect to the wavefield of the other whenever the moving mirror or corner cube is displaced
from ZPD by a distance $\chi/2$. In Fig. 1.19(d), however, the corner cube only has to move a
distance $\chi/4$ to displace one wavefield by $\chi$ with respect to the other. Equation (5.67) in Chapter 5
shows that larger values of $\chi$ lead to more detailed spectral measurements in standard Michelson
interferometers, and the same holds true for the nonstandard interferometers discussed here. In
particular, a setup such as the one shown in Fig. 1.19(d) lets us achieve larger $\chi$ values with
smaller displacements of the corner cube. The moving corner cube is also, strictly speaking, no
longer the retroreflector; plane mirrors in both arms are used to reverse the beam directions.
During the 1950s, it was established that Fourier-transform spectrometers had two basic
advantages, often called the Jacquinot advantage and the Fellgett advantage, over contemporary
types of prism-based and grating-based spectrometers.19 These advantages revealed that under
many circumstances spectra measured by Fourier-transform spectrometers had a better
signal-to-noise ratio than equivalent prism-based or grating-based instruments. With the popularization of
the fast Fourier transform (FFT) algorithms in the 1960s, Fourier-transform spectrometers soon
established themselves as usually the first and best choice for measuring infrared spectra
(electromagnetic radiation having wavelengths between 1 and 100 μm). The growing availability
of personal and desktop computers in the late 1970s and 1980s made Fourier-transform systems
more compact, powerful, and user-friendly. Over the past two decades, there has been a tendency
to use standard Michelson configurations, such as those in Figs. 1.1(b) or 1.19(a), when
designing the optics of Fourier-transform spectrometers. Standard Michelsons are well suited to
the laser-based servo controls often used to maintain the alignment of the fixed and moving
mirrors.

19 J. Chamberlain, The Principles of Interferometric Spectroscopy, p. 16.

FIGURE 1.19(a). Labeled components: entrance beam, beam splitter, fixed mirror, moving mirror displaced by $p = \chi/2$, and output to the balanced signal detector.

FIGURE 1.19(b). Labeled components: entrance beam, beam splitter, fixed corner cube, moving corner cube displaced by $p = \chi/2$, and output to the balanced signal detector.

FIGURE 1.19(c). Labeled components: entrance beam, beam splitter, fixed corner cube, moving reflector displaced by $p = \chi/2$, and outputs to the unbalanced signal detector and the balanced signal detector.

FIGURE 1.19(d). Labeled components: entrance beam, beam splitter, fixed mirror, moving corner cube displaced by $p = \chi/4$, and output to the balanced signal detector.

1.8 Laser-Based Control Systems


Today’s Fourier-transform spectrometers often rely on laser-based servo systems to maintain
alignment and control the motion of the moving mirror. The average wavelength of the measured
spectra determines the standards of alignment and control required for good spectral


measurement. Systems designed to measure infrared spectra typically have lasers that work in the
visible. Not only do modest standards of alignment and control in the visible correspond to
extremely accurate standards of alignment and control in the infrared—because visible
wavelengths are much shorter than infrared wavelengths—but the infrared detectors responsible
for the spectral measurements are also easily shielded from stray laser light. The laser servo
systems follow many different designs. Figures 1.20(a) and 1.20(b) show a typical setup that may
not be exactly like any system now in use but that does present the basic ideas behind them.
In Fig. 1.20(a), a single laser beam is separated into beams A, B, and C by laser-beam
splitters. Separating one beam into three ensures that all three beams have the same wavelength.
The three beams enter the interferometer parallel to, and at the edges of, the entrance beam.
Figure 1.20(b) shows the path of beams A and B through the instrument; beam C is not shown
because it is out of the plane of the page, but it is assumed to follow a path similar to beams A
and B. The solid lines representing the laser beams are always parallel to the dotted lines showing
the path of the entrance beam through the interferometer; and the laser beams interact with the
interferometer’s beam splitter, fixed mirror, and moving mirror exactly the same way the
entrance beam does. Because all three laser beams are monochromatic wavetrains of wavelength
λ, the same reasoning used to produce Fig. 1.17 shows that we can draw a sequence of dashed
lines perpendicular to the laser beams to represent the moving-mirror positions where the laser
beams would form fringes. Just like in Fig. 1.17, each dashed line is separated from its two
nearest neighbors by λ/2. Taking the dashed lines to represent nulls, we note that if the moving
mirror has a slight tilt, as shown in Fig. 1.20(b), then the laser detector for beam B will see a near
null in the beam B fringe while the laser detector for beam A will see a near bright in the beam A
fringe. If the moving mirror is aligned in the plane of Fig. 1.20(b) but has a small out-of-plane
tilt, then the laser detector for beam C is sure to see a different fringe brightness than the laser
detectors for beams A and B. The three laser detectors send their signals to a servomechanism
that readjusts the mirror tilt until all three detectors see the same fringe intensity, keeping the
interferometer aligned while the moving mirror changes position. Often these servomechanisms
readjust the tilt of the fixed mirror instead of directly correcting the moving mirror’s tilt. It is not
difficult to design systems of this sort that can detect changes of λ/100 in the position of the
moving-mirror’s surface. The A, B, and C laser detectors can also be used to count fringes as the
moving mirror changes position, keeping a record of where the moving mirror is and how fast it
is moving. This information is almost always used to sample the interferometer’s output signal at
equally spaced positions of the moving mirror, and it is often sent to a servomechanism
responsible for producing steady motion in the moving mirror.
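The tilt-sensing idea can be sketched with the monochromatic formula (1.17d) applied to each laser beam. The model below is an illustrative assumption, not a description of any actual servo: two beams strike the mirror at equal distances on either side of its center, a small tilt shifts the local mirror position by the beam offset times the tangent of the tilt angle, and the difference of the two detector readings serves as a hypothetical error signal. The He-Ne wavelength and all other numbers are assumed values.

```python
import math

def laser_fringe_intensity(i0, wavelength, p, w=-1):
    """Eq. (1.17d) for a laser beam, with chi = 2p and sigma = 1/wavelength."""
    chi = 2.0 * p
    return 0.5 * i0 * (1.0 + w * math.cos(2.0 * math.pi * chi / wavelength))

def tilt_error_signal(i0, wavelength, p_center, tilt_rad, beam_offset, w=-1):
    """Detector A minus detector B for beams at +/- beam_offset from mirror center."""
    shift = beam_offset * math.tan(tilt_rad)
    i_a = laser_fringe_intensity(i0, wavelength, p_center + shift, w)
    i_b = laser_fringe_intensity(i0, wavelength, p_center - shift, w)
    return i_a - i_b

wavelength = 632.8e-9   # assumed He-Ne laser wavelength, meters
beam_offset = 0.01      # assumed distance of each beam from mirror center, meters
p_center = wavelength / 8.0

# Tilt chosen so beam A sits at p = wavelength/4 (bright) and beam B at p = 0 (null).
tilt = math.atan((wavelength / 8.0) / beam_offset)
print(tilt_error_signal(1.0, wavelength, p_center, 0.0, beam_offset))   # aligned
print(tilt_error_signal(1.0, wavelength, p_center, tilt, beam_offset))  # tilted
```

A tilt of only a few microradians is enough in this model to drive one detector to a bright fringe and the other to a null, which is why visible-wavelength lasers give such fine alignment control for infrared instruments.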

___________

Chapters 2 and 3 spell out the mathematical ideas needed to analyze the performance of
Fourier-transform spectrometers, and they also establish the notation used to describe these ideas
in subsequent chapters. Readers who are already familiar with Fourier theory and random


functions can skip ahead to Chapter 4, returning to Chapters 2 and 3 as needed to refresh their
understanding. Chapter 4 starts with Maxwell’s equations, working with them to derive the
nonideal versions of Eqs. (1.19f) and (1.20c) needed to understand both the nonrandom and
random sources of error in Fourier-transform spectrometers. We always assume a standard
Michelson configuration, such as the ones shown in Fig. 1.1(b) or 1.19(a), controlled by laser-
based metrology and alignment systems similar to the ones shown in Figs. 1.20(a) and 1.20(b).
These are arguably the most common type of Fourier-transform spectrometer in use today. Most
of the basic ideas applied here to these standard Michelson systems are also relevant to other
types of Fourier-transform spectrometers; anyone who reads and understands the analysis
presented in Chapters 4 through 8 will be able to modify the equations presented there so that
they apply to nonstandard Michelson configurations. One possible exception to this rule is the
class of Michelsons, such as the one shown in Fig. 1.19(b), that use nonstandard retroreflectors to return
the split entrance beam to the beam splitter. These sorts of systems, which are outside the scope
of this book, are spared many forms of the “tilt” misalignment possible in a standard Michelson,
which is an advantage, but on the other hand exhibit shear types of misalignments, which
standard Michelsons do not have. The equations governing shear misalignment turn out to be
similar to those for tilt misalignment, but it does not necessarily make sense to analyze them as a
source of random error, the way tilt is analyzed in Chapter 7.

FIGURE 1.20(a). Labeled components: laser, laser beam splitters producing beams A, B, and C, the entrance beam, and the interferometer beam splitter.

FIGURE 1.20(b). Labeled components: laser, laser beam splitters, beams A, B, and C, entrance beam, interferometer beam splitter, fixed mirror, moving mirror with the laser fringe positions marked, and outputs to laser detector A, laser detector B, and the infrared detector.

2
FOURIER THEORY
Many single-chapter introductions to Fourier theory follow a top-down approach, defining what a
Fourier transform is and then listing the mathematical consequences. Here, on the other hand, we
begin with more of a bottom-up approach, seeking not only to present the mathematical
formalism of Fourier transforms but also to give an intuitive feel for how they work and what
they mean. Once the basic idea is established, we need to know which data sequences and
functions have well-defined Fourier transforms. This topic is often scanted because Fourier
theory is notorious for providing no simple mathematical answers to this simple mathematical
question. Indeed, engineers, scientists, and applied mathematicians have a long tradition of using
Fourier transforms in mathematically improper—yet extremely useful—ways that usually give
the correct answer. To show why these techniques work, and also when they cannot be trusted,
there is a brief sketch of generalized function theory. This is followed by a discussion of the
Fourier series and the discrete Fourier transform, including an exact description of how they are
connected to the integral Fourier transform. The discrete Fourier transform is particularly
important because, almost without exception, the only type of Fourier transform calculated on
today’s computers is the discrete Fourier transform; without it, the Michelson interferometer
would be a much more limited instrument. The chapter then concludes with a brief discussion of
how Fourier transforms are applied to two-dimensional and three-dimensional functions.

2.1 Basic Concept of a Fourier Transform


The idea of a Fourier transform develops naturally from a simple idea for comparing the shape of
two sequences of measurements. A sequence of measurements is really just a list of numbers, so
when we compare sequences of measurements we compare the shapes of number lists graphed in
the order of their measurement. We can suppose without any loss of generality that two lists, $u_k$
and $v_k$, have the same number of members with $k = 1, 2, \ldots, N$. Figures 2.1(a) and 2.1(b) show
two lists $u_k$ and $v_k$ graphed against their index value $k$. Defining $\bar{u}$ and $\bar{v}$ to be the mean values
of $u_k$ and $v_k$,

$$\bar{u} = \frac{1}{N}\sum_{k=1}^{N} u_k \tag{2.1a}$$

and

$$\bar{v} = \frac{1}{N}\sum_{k=1}^{N} v_k , \tag{2.1b}$$

FIGURE 2.1(a). [Plot of list $u_k$ against increasing index $k$.]

FIGURE 2.1(b). [Plot of list $v_k$ against increasing index $k$.]

we form the sum S of the products of the differences from the mean,

$$S = \sum_{k=1}^{N} (u_k - \bar{u})(v_k - \bar{v}) . \tag{2.2}$$

If the graphs of $u_k$ and $v_k$ have similar shapes, so that $u_k - \bar{u} \sim v_k - \bar{v}$ for most values of $k$,
then $(u_k - \bar{u})$ and $(v_k - \bar{v})$ are very likely to have the same sign for most values of $k$. This means
few terms in the sum are negative and $S$ ends up being a large positive number. If $u_k$ and $v_k$ have
little similarity in shape, then $(u_k - \bar{u})$ and $(v_k - \bar{v})$ are as likely to have opposite signs as the
same sign and the terms in the sum are just as likely to be positive as they are to be negative.
When this happens, S is a sum of terms that tend to cancel out, and the magnitude of S is likely to
be small.
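
The behavior of the sum in Eq. (2.2) can be tried out on short lists. The sketch below is only an illustration: the function name `similarity_sum` and the sample lists are hypothetical and chosen by hand, one list following the same trend as `u` and one jumping about without regard to it.

```python
def similarity_sum(u, v):
    """Return S = sum_k (u_k - mean(u)) * (v_k - mean(v)), as in Eq. (2.2)."""
    if len(u) != len(v):
        raise ValueError("lists must have the same number of members")
    n = len(u)
    u_mean = sum(u) / n          # Eq. (2.1a)
    v_mean = sum(v) / n          # Eq. (2.1b)
    return sum((uk - u_mean) * (vk - v_mean) for uk, vk in zip(u, v))


# Hypothetical sample lists, chosen only for illustration.
u = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
v_similar = [2.0, 6.5, 4.0, 9.0, 8.5, 12.0]    # roughly the same shape as u
v_unrelated = [3.0, 1.0, 1.0, 3.0, 3.0, 1.0]   # no consistent relation to u

s_similar = similarity_sum(u, v_similar)
s_unrelated = similarity_sum(u, v_unrelated)
print("similar shapes:  ", s_similar)
print("unrelated shapes:", s_unrelated)
```

For the similar pair every term in the sum is positive, whereas for the unrelated pair positive and negative terms largely cancel, which is the cancellation described above.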
The same basic idea can be applied to continuous functions u(t) and v(t). To create a formal
correspondence between functions and lists, we define an interval $\Delta t$ in $t$ and match $u_k$ and $v_k$ to
$u(t)$ and $v(t)$ with the equations

$$u(k\,\Delta t) = \frac{u_k}{\Delta t}$$

and

$$v(k\,\Delta t) = v_k .$$

Because $u$ and $v$ are continuous functions of time, we can assume that they vary in an
unsurprising manner between the isolated points at $\Delta t, 2\Delta t, \ldots, N\Delta t$ at which they have been
specified. Traditionally, the argument of functions u and v is called t and assumed to be time, but
it is worth remembering that t can stand for any relevant physical parameter, such as length,
voltage, current, etc. Now we can approximate Eq. (2.2) as

$$S \approx \int_{\Delta t}^{N\Delta t} [u(t) - \bar{u}]\,[v(t) - \bar{v}]\,dt , \tag{2.3a}$$

where now

$$\bar{u} = \frac{1}{N\Delta t}\int_{\Delta t}^{N\Delta t} u(t)\,dt \tag{2.3b}$$

and

$$\bar{v} = \frac{1}{N\Delta t}\int_{\Delta t}^{N\Delta t} v(t)\,dt . \tag{2.3c}$$

Equations (2.3b) and (2.3c) just ensure that $\bar{u}$ and $\bar{v}$ are now the average values of $u(t)$ and
$v(t)$ respectively. We note that the value of $\bar{u}$ has been redefined from what it was in Eq. (2.1a)
above,

$$\bar{u}_{\text{new}} = \bar{u}_{\text{old}} / \Delta t ,$$

whereas $\bar{v}$ has basically the same value as in Eq. (2.1b); the only change is to replace the sum
by the equivalent integral. At this point, the finite value of $\Delta t$ is just a distraction, because it is the
shapes of the continuous functions $u(t)$ and $v(t)$ that are being compared. Taking the limit as
$\Delta t \to 0$ and $N \to \infty$ in such a way that

$$\lim_{\substack{\Delta t \to 0 \\ N \to \infty}} N\Delta t = T_{\max} = \text{constant} , \tag{2.4a}$$

we get

$$S = \int_0^{T_{\max}} [u(t) - \bar{u}]\,[v(t) - \bar{v}]\,dt , \tag{2.4b}$$

where

$$\bar{u} = \frac{1}{T_{\max}}\int_0^{T_{\max}} u(t)\,dt \tag{2.4c}$$

and

$$\bar{v} = \frac{1}{T_{\max}}\int_0^{T_{\max}} v(t)\,dt . \tag{2.4d}$$

We still expect S to be large when functions u and v have similar shapes and S to be small when
they have dissimilar shapes.
Equation (2.4b) can be written as

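$$
\begin{aligned}
S &= \int_0^{T_{\max}} [u(t) - \bar{u}]\,v(t)\,dt - \bar{v}\int_0^{T_{\max}} [u(t) - \bar{u}]\,dt \\
  &= \int_0^{T_{\max}} [u(t) - \bar{u}]\,v(t)\,dt - \bar{v}\cdot\left[\int_0^{T_{\max}} u(t)\,dt - \bar{u}\cdot T_{\max}\right] \\
  &= \int_0^{T_{\max}} u(t)\,v(t)\,dt - \bar{u}\int_0^{T_{\max}} v(t)\,dt - \bar{v}\cdot\left[\int_0^{T_{\max}} u(t)\,dt - \bar{u}\cdot T_{\max}\right] \\
  &= \int_0^{T_{\max}} u(t)\,v(t)\,dt - \bar{u}\cdot\bar{v}\cdot T_{\max} ,
\end{aligned}
\tag{2.5}
$$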

where in the last step (2.4c) ensures that the term in the square brackets [ ] is zero and (2.4d) is
used to replace the integral over $v$ by $\bar{v}\,T_{\max}$. To get to Fourier theory from Eq. (2.5), we suppose
$v(t)$ to be an oscillatory function like $\sin(2\pi ft)$ or $\cos(2\pi ft)$ with $f > 0$. This makes function $u$
the data; that is, the value of our measurement at time $t$ is $u(t)$. Equation (2.4d) then reveals,
depending on whether we choose $v$ to be a sine curve or a cosine curve, that

$$\bar{v}\,T_{\max} = \int_0^{T_{\max}} \sin(2\pi ft)\,dt = \frac{1}{2\pi f}\cdot\left[1 - \cos(2\pi f T_{\max})\right] \tag{2.6a}$$

or

$$\bar{v}\,T_{\max} = \int_0^{T_{\max}} \cos(2\pi ft)\,dt = \frac{1}{2\pi f}\cdot\sin(2\pi f T_{\max}) . \tag{2.6b}$$

When $v$ is a sine curve, $\bar{v}\,T_{\max}$ oscillates between $1/(\pi f)$ and $0$ as $T_{\max}$ increases; and when $v$
is a cosine curve, $\bar{v}\,T_{\max}$ oscillates between $-1/(2\pi f)$ and $1/(2\pi f)$ as $T_{\max}$ increases. Keeping in
mind that $u(t)$ represents a function measured in a laboratory, if we want to compare the shape of
$u$ to either $\sin(2\pi ft)$ or $\cos(2\pi ft)$, common sense requires $T_{\max}$, the range of $t$ over which data is
gathered, to be much greater than $1/f$, the period of the sine or cosine curve to which we want to
compare the data. Unless u entirely lacks a resemblance to the sine or cosine so that

$$\int_0^{T_{\max}} u(t)\,v(t)\,dt \approx 0$$

no matter how large $u$ or $T_{\max}$ become, we expect

$$\int_0^{T_{\max}} u(t)\,v(t)\,dt$$

to be large when the $u$ measurements are large, and small when the $u$ measurements are small;
the integral's magnitude should also increase as $T_{\max}$ increases. So when $u$ represents a
typical set of data that is not completely unlike $v$ in shape, then

$$\int_0^{T_{\max}} u(t)\,v(t)\,dt = O(\bar{u}\,T_{\max})$$

or

$$\frac{1}{\bar{u}}\int_0^{T_{\max}} u(t)\,v(t)\,dt = O(T_{\max}) .$$

Equations (2.6a) and (2.6b) show that $\bar{v}\,T_{\max}$ must remain somewhere between the two values
$1/(\pi f)$ and $-1/(2\pi f)$ no matter how large $T_{\max}$ gets, which means

$$\bar{v}\,T_{\max} = O(f^{-1}) .$$

Having already concluded that $T_{\max}$ has been chosen much larger than $1/f$, we expect

$$\frac{1}{\bar{u}}\int_0^{T_{\max}} u(t)\,v(t)\,dt = O(T_{\max}) \gg O(f^{-1}) = \bar{v}\,T_{\max} ,$$

which, of course, reduces to

$$\frac{1}{\bar{u}}\int_0^{T_{\max}} u(t)\,v(t)\,dt \gg \bar{v}\,T_{\max} .$$

Therefore, Eq. (2.5) can be approximated as

$$S = \bar{u}\cdot\left[\frac{1}{\bar{u}}\int_0^{T_{\max}} u(t)\,v(t)\,dt - \bar{v}\cdot T_{\max}\right] \approx \bar{u}\cdot\frac{1}{\bar{u}}\int_0^{T_{\max}} u(t)\,v(t)\,dt = \int_0^{T_{\max}} u(t)\,v(t)\,dt . \tag{2.7}$$

The integral in (2.7) can be regarded as assigning the number S to the similarity in shape of u and
v, when v is a sine or cosine curve of frequency ƒ. Remembering where S came from, we realize
that this number is large when u and v have similar shapes and small when u and v have
dissimilar shapes.
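
As a rough numerical illustration of Eq. (2.7), the integral can be evaluated for a made-up data function that contains a cosine of known frequency. Everything named below (`shape_match`, the 3-cycle test signal, the choice of $T_{\max} = 10$) is an assumption made for the sketch, not something taken from the text; the integral is approximated with a simple trapezoid rule.

```python
import math


def trapezoid(values, step):
    """Trapezoid-rule sum for equally spaced samples."""
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


def shape_match(u, f, t_max, n=20000, kind="cos"):
    """Approximate the integral of u(t)*v(t) from 0 to t_max, as in Eq. (2.7),
    with v a sine or cosine curve of frequency f."""
    step = t_max / n
    wave = math.cos if kind == "cos" else math.sin
    samples = [u(k * step) * wave(2.0 * math.pi * f * k * step) for k in range(n + 1)]
    return trapezoid(samples, step)


def u(t):
    """Hypothetical 'measurement': a constant offset plus a cosine at f = 3."""
    return 2.0 + math.cos(2.0 * math.pi * 3.0 * t)


t_max = 10.0                           # much longer than the period 1/f
s_match = shape_match(u, 3.0, t_max)   # v has the frequency present in u
s_off = shape_match(u, 5.0, t_max)     # v has a frequency absent from u
print("f = 3:", s_match)
print("f = 5:", s_off)
```

The value at the matching frequency grows in proportion to $T_{\max}$, while the value at the non-matching frequency stays near zero, in line with the order-of-magnitude argument above.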

2.2 Fourier Sine and Cosine Transforms


To make the ideas of the previous section mathematically rigorous, we define the Fourier sine
transform of function u to be
$$\mathcal{S}^{(ft)}[u(t)] = 2\int_0^{\infty} u(t)\sin(2\pi ft)\,dt \tag{2.8a}$$

and the Fourier cosine transform of $u$ to be

$$\mathcal{C}^{(ft)}[u(t)] = 2\int_0^{\infty} u(t)\cos(2\pi ft)\,dt . \tag{2.8b}$$

The notation $\mathcal{S}^{(ft)}[u(t)]$ and $\mathcal{C}^{(ft)}[u(t)]$ shows that the function $u(t)$ is being multiplied by,
respectively, the sine or cosine function having (as indicated by the superscript) an argument $ft$
multiplied by $2\pi$. The order of the $ft$ product in the superscript does not matter because it does
not matter in the arguments of the sine and cosine, so

$$\mathcal{S}^{(ft)}[u(t)] = \mathcal{S}^{(tf)}[u(t)] \quad\text{and}\quad \mathcal{C}^{(ft)}[u(t)] = \mathcal{C}^{(tf)}[u(t)] .$$

In particular we know, because $t$ is repeated in both $u(t)$ and the superscript of $\mathcal{S}$ and $\mathcal{C}$, that $t$ is
the dummy variable of integration whereas $f$, which is only contained in the superscript, is an
independent parameter. This means the transforms $\mathcal{S}^{(ft)}[u(t)]$ and $\mathcal{C}^{(ft)}[u(t)]$ are themselves
functions of the parameter $f$,

$$U_S(f) = 2\int_0^{\infty} u(t)\sin(2\pi ft)\,dt \tag{2.8c}$$

and

$$U_C(f) = 2\int_0^{\infty} u(t)\cos(2\pi ft)\,dt . \tag{2.8d}$$

The “capital U” names of functions $U_S$ and $U_C$ show that they are mathematically associated
with the original function $u(t)$, created from $u(t)$ by the integrals in (2.8c) and (2.8d).
Although the upper limit of integration is now $\infty$ in Eqs. (2.8a) and (2.8b), this should not be
interpreted as taking the limit as $T_{\max} \to \infty$ in Eq. (2.7). The upper limit is put at $\infty$ just to
eliminate $T_{\max}$ as an explicit parameter, and the idea behind the presence of $T_{\max}$ (that $u(t)$
represents the result of a measurement) is kept alive by placing restrictions on the type of
function $u$ can be. In particular, we expect $u(t)$, in some sense, to diminish or get small as $t$ gets
large, because it is impossible to measure data for all the times $t$ out to $\infty$. It turns out that when
the right sorts of restrictions are placed on $u$, the Fourier sine and cosine transforms can be
inverted to recover the original functions,

$$u(t) = 2\int_0^{\infty} U_S(f)\sin(2\pi ft)\,df \tag{2.8e}$$

and

$$u(t) = 2\int_0^{\infty} U_C(f)\cos(2\pi ft)\,df \tag{2.8f}$$

for $t > 0$.
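
The transform pair (2.8d) and (2.8f) can be checked numerically for a simple case. The sketch below assumes the test function $u(t) = e^{-t}$, for which the cosine transform works out to $U_C(f) = 2/(1 + 4\pi^2 f^2)$; the helper name `half_line_transform`, the truncation limits, and the step counts are all choices made for illustration. Because the forward and inverse integrals have the same form, one routine serves for both.

```python
import math


def half_line_transform(func, x, upper, n, kind):
    """Trapezoid estimate of 2 * integral_0^upper func(s) * kind(2*pi*x*s) ds.

    With kind=math.cos this approximates Eq. (2.8b)/(2.8d) when func is u(t),
    and Eq. (2.8f) when func is U_C(f); the infinite upper limit is replaced
    by the finite cutoff `upper`.
    """
    step = upper / n
    total = 0.0
    for k in range(n + 1):
        s = k * step
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * func(s) * kind(2.0 * math.pi * x * s)
    return 2.0 * step * total


def u(t):
    """Assumed test function: absolutely integrable, continuous, bounded."""
    return math.exp(-t)


def u_cos_exact(f):
    """Closed-form cosine transform of exp(-t) under the convention of Eq. (2.8d)."""
    return 2.0 / (1.0 + (2.0 * math.pi * f) ** 2)


f_test = 0.5
u_cos_numeric = half_line_transform(u, f_test, upper=40.0, n=40000, kind=math.cos)

t_test = 1.0
u_recovered = half_line_transform(u_cos_exact, t_test, upper=200.0, n=200000, kind=math.cos)

print("U_C(0.5): numeric", u_cos_numeric, "exact", u_cos_exact(f_test))
print("u(1.0):   recovered", u_recovered, "exact", u(t_test))
```

The recovered value agrees with $e^{-1}$ only to within the error caused by cutting the frequency integral off at a finite upper limit, so a loose tolerance is appropriate when comparing the two.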
If we adopt the strictest definition of what is meant by the integral of a function between $0$ and
$\infty$, then Eqs. (2.8a)–(2.8f) are true when function $u(t)$ satisfies the following four requirements:

(I) It is absolutely integrable.
(II) It is continuous except for a finite number of jump discontinuities.
(III) It is bounded on any finite interval $0 < a < t < b < \infty$.
(IV) It has finite variation on any finite interval $0 < a < t < b < \infty$.

We now show why function u(t) naturally satisfies all these restrictions when it represents a
(possibly idealized) measurement controlled or described by a continuous parameter t.
No matter what the argument t of function u represents—time, voltage, energy, etc.—function
u(t) can only be measured over a finite range of t. Although there may be no reason to think u is
zero or negligible when measured outside this range, we obviously cannot “make up” values for
what it might be. If we extrapolate to get the unmeasured t values, the extrapolation should not
dominate the information contained in u. In general, the measurement should be carried out in
such a way that the unmeasured or extrapolated values are of negligible importance compared to
the measured values. Mathematically we might say that there exists a positive, finite value of $t$,
which we call $T_{\max}$, such that the important measured values of $u$ are all at $t \le T_{\max}$. One way of
expressing this constraint is to require

$$\int_0^{T_{\max}} |u(t)|\,dt \approx \int_0^{\infty} |u(t)|\,dt . \tag{2.9a}$$

Since the left-hand integral ought to be finite, when (2.9a) is true, it follows that

$$\int_0^{\infty} |u(t)|\,dt < \infty . \tag{2.9b}$$

Functions u that satisfy (2.9b) are said to be absolutely integrable; clearly, all functions
representing possible measurements share this quality, satisfying requirement (I) above.
Understanding requirement (II) requires some discussion of what it means to call an
experimental measurement continuous. To assign, with negligible experimental error, a definite
value of t to a measurement u, some minimum and finite change in t must occur between adjacent
measurements. In practice, continuous measurements are constructed by connecting sequences of
adjacent but separate points. We then assume that if u were measured between these already
known points, it would equal (to within experimental error) the values selected by connecting the
points. Thus, the continuity of u is a requirement that the measurement captures all the relevant
detail. In this sense, asserting that u is continuous is a type of idealization—just another way of
saying that the measurement is accurate and representative. This takes care of the first part of
requirement (II), but there is a second part permitting u to have a finite number of jump
discontinuities. Figure 2.2 shows a jump discontinuity in u(t). Jump discontinuities represent
another type of idealization—what can occur when, for example, instruments are turned on or off
during a measurement. Because it is unrealistic to have this happen an infinite number of times
over a finite range of t, it makes sense to say that all functions u representing measurements are
continuous over any finite range of t except for a finite number of jump discontinuities.
Consequently, we can expect all functions representing measurements to satisfy requirement (II).
Standard proofs that the Fourier transform of the Fourier transform returns the original
function u usually end up showing as their final step that

$$2\int_0^{\infty} U_S(f)\sin(2\pi ft)\,df = \lim_{\varepsilon \to 0}\frac{1}{2}\left[u(t + \varepsilon) + u(t - \varepsilon)\right] \tag{2.9c}$$

and

$$2\int_0^{\infty} U_C(f)\cos(2\pi ft)\,df = \lim_{\varepsilon \to 0}\frac{1}{2}\left[u(t + \varepsilon) + u(t - \varepsilon)\right] . \tag{2.9d}$$

When $u$ is continuous, this immediately reduces to the desired result, but when the integrals are
evaluated at a jump discontinuity, such as at $t = t_0$ in Fig. 2.2, the limits on the right-hand side of
(2.9c) and (2.9d) give $u$ a value at the jump discontinuity that is probably different from the
original value of $u$ at the jump discontinuity. To keep this from happening, we define the value of
$u$ to be, for all values $t = t_{\text{jump}}$ marking the location of a jump discontinuity,

$$u(t_{\text{jump}}) = \lim_{\varepsilon \to 0}\frac{1}{2}\left[u(t_{\text{jump}} + \varepsilon) + u(t_{\text{jump}} - \varepsilon)\right] . \tag{2.9e}$$
 70 2

Modifying u this way cannot change the value of any integral whose integrand is the product of u
with another smooth function. The sine and cosine are smooth functions, so using (2.9e) to
modify the value of u at jump discontinuities does not change the values of the sine or cosine
transforms.
Measurements must be done with physically realizable equipment, which necessarily
produces finite values of $u$. This means there always exists a finite real number $B < \infty$ such that

$$|u(t)| < B \tag{2.9f}$$

over any finite interval $0 < a < t < b < \infty$ when function $u$ represents a measurement. Functions
obeying this inequality are called bounded functions, so functions representing measurements
always satisfy requirement (III).

FIGURE 2.2. [Plot of $u(t)$ against $t$ showing a jump discontinuity at $t = t_0$.]

Requirement (IV) is a little bit more complicated to explain. Any function u(t) can be written
as the difference of two other functions $u_1(t)$ and $u_2(t)$, as shown in Figs. 2.3(a) and 2.3(b),

$$u(t) = u_1(t) - u_2(t) . \tag{2.9g}$$

FIGURE 2.3(a). [Plot of $u(t)$ over the interval from $a$ to $b$, with turning points marked $t_1$, $t_2$, $t_3$.]

FIGURE 2.3(b). [Plot of $u_1(t)$ and $u_2(t)$ over the interval from $a$ to $b$, with the same points $t_1$, $t_2$, $t_3$ marked.]

In Fig. 2.3(a), function $u$ is drawn with a continuous line where it is increasing and with a dashed
line where it is decreasing. In Fig. 2.3(b), we see that functions $u_1$ and $u_2$ are constructed so that
every time $u$ increases, $u_1$ also increases while $u_2$ remains the same, and every time $u$ decreases,
$u_2$ increases while $u_1$ remains the same. Consequently, for any function $u$ and time values $b \ge a$,
the differences $u_1(b) - u_1(a)$ and $u_2(b) - u_2(a)$ are non-negative and can only increase, which
means that their sum

$$V_{ab}(u) = u_1(b) - u_1(a) + u_2(b) - u_2(a) \tag{2.9h}$$

is also non-negative. Functions $u_1$ and $u_2$ have been constructed so that every time $u$ goes up and
down, the differences $u_1(b) - u_1(a)$ and $u_2(b) - u_2(a)$ increase, making the size of $V_{ab}(u)$ a
record of how many times $u$ oscillates in the interval $a < t < b$. We define $V_{ab}(u)$ to be the
variation of $u$ over the interval $a < t < b$, and if

$$V_{ab}(u) < \infty , \tag{2.9i}$$

we say that $u$ has finite variation over the interval $a < t < b$. Requirement (IV), that $u$ have finite
variation in any interval $0 < a < t < b < \infty$, means that $u$ can only oscillate a finite number of
times in that interval. The function $\sin((t - 1)^{-1})$, for example, does not have finite variation over
any interval containing $t = 1$. If we attempted to measure a quantity that had infinite variation
inside a finite interval, we would be blocked by the realization, already discussed above in
connection with requirement (II), that adjacent measurements must be separated by some
minimum value of t. If the measurement were repeated over and over, it would seem as if u were
changing unpredictably in the region of infinite variation, leading us to wonder whether our
measurement reflected the same physical reality. Therefore, our measurements cannot have
infinite variation, and so any function u(t) representing a realistic measurement must also satisfy
requirement (IV).
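
The construction of $u_1$ and $u_2$ in Eqs. (2.9g) and (2.9h) can be mimicked on sampled data. The sketch below is an illustration under stated assumptions: the function names are hypothetical, $u_1$ is started at the first sample and $u_2$ at zero (any starting values with $u_1 - u_2 = u$ would do), and the behavior of $\sin((t-1)^{-1})$ near $t = 1$ is probed by sampling uniformly in the phase $s = 1/(t-1)$, which leaves the variation unchanged because $s$ is a monotonic function of $t$ for $t > 1$.

```python
import math


def monotone_parts(samples):
    """Build non-decreasing lists u1, u2 with samples[k] == u1[k] - u2[k], Eq. (2.9g)."""
    u1 = [samples[0]]
    u2 = [0.0]
    for prev, cur in zip(samples, samples[1:]):
        step = cur - prev
        u1.append(u1[-1] + max(step, 0.0))    # u1 rises only when u rises
        u2.append(u2[-1] + max(-step, 0.0))   # u2 rises only when u falls
    return u1, u2


def variation(samples):
    """Sampled version of V_ab(u) in Eq. (2.9h)."""
    u1, u2 = monotone_parts(samples)
    return (u1[-1] - u1[0]) + (u2[-1] - u2[0])


def variation_near_singularity(delta, points_per_radian=200):
    """Variation of sin(1/(t-1)) over 1+delta <= t <= 2, sampled in s = 1/(t-1)."""
    s_max = 1.0 / delta
    n = int((s_max - 1.0) * points_per_radian)
    samples = [math.sin(1.0 + (s_max - 1.0) * k / n) for k in range(n + 1)]
    return variation(samples)


# One full cycle of a sine curve goes up 1, down 2, and up 1 again.
smooth = variation([math.sin(2.0 * math.pi * k / 1000) for k in range(1001)])
print("one sine cycle:", smooth)

wild = [variation_near_singularity(delta) for delta in (1e-1, 1e-2, 1e-3)]
for delta, value in zip((1e-1, 1e-2, 1e-3), wild):
    print("delta =", delta, "variation =", value)
```

As the left end of the interval is moved closer to $t = 1$, the computed variation keeps growing roughly in proportion to $1/\delta$, which is the numerical signature of infinite variation on any interval containing $t = 1$.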
We see that requirements (I) through (IV) are always satisfied by functions representing
physically realizable measurements. It should be emphasized that requirements (I) through (IV)
are sufficient to ensure that Eqs. (2.8a)–(2.8f) hold true, but not necessary. It is easy to show that
there exist functions that do not meet requirements (I) through (IV) yet still satisfy Eqs. (2.8a)–
(2.8f). Consider, for example,

$$
g(t) =
\begin{cases}
\pi & \text{for } 0 \le t < 1/(2\pi) \\
\pi/2 & \text{for } t = 1/(2\pi) \\
0 & \text{for } t > 1/(2\pi)
\end{cases}
\tag{2.10a}
$$

This test function clearly satisfies (I) through (IV) and so must have a Fourier cosine transform,

$$G_C(f) = 2\pi\int_0^{(2\pi)^{-1}} \cos(2\pi ft)\,dt = \frac{\sin(f)}{f} \tag{2.10b}$$

such that we return to the original function $g$ by taking the cosine transform of the $G_C$ transform,

$$g(t) = 2\int_0^{\infty} G_C(f)\cos(2\pi ft)\,df = 2\int_0^{\infty} \frac{\sin(f)}{f}\cos(2\pi ft)\,df . \tag{2.10c}$$

We could, however, just as easily have started with the function

$$h(t) = \frac{\sin(t)}{t}$$

and taken its cosine transform to get

$$H_C(f) = 2\int_0^{\infty} \frac{\sin(t)}{t}\cos(2\pi ft)\,dt . \tag{2.10d}$$

The integral in (2.10d) is clearly the same as the first integral in (2.10c) with the variables ƒ and t
interchanged. Therefore,

$$
H_C(f) = g(f) =
\begin{cases}
\pi & \text{for } 0 \le f < 1/(2\pi) \\
\pi/2 & \text{for } f = 1/(2\pi) \\
0 & \text{for } f > 1/(2\pi)
\end{cases}
$$

Hence we know that $h(t)$ satisfies Eqs. (2.8b), (2.8d), and (2.8f) (it is both cosine transformable
and its cosine transform returns the original function when cosine transformed) exactly because
$g(t)$ in (2.10a) satisfies Eqs. (2.8b), (2.8d), and (2.8f). Yet $h(t)$, unlike $g(t)$, does not satisfy
requirements (I) through (IV); in particular, it violates requirement (I) because it is not
absolutely integrable. To see that this is true, note that

$$\int_0^{\infty}\left|\frac{\sin(t)}{t}\right| dt = \sum_{j=1}^{\infty}\int_{(j-1)\pi}^{j\pi}\frac{|\sin(t)|}{t}\,dt \ge \sum_{j=1}^{\infty}\frac{1}{j\pi}\int_{(j-1)\pi}^{j\pi}|\sin(t)|\,dt = \frac{2}{\pi}\sum_{j=1}^{\infty}\frac{1}{j} \to \infty ,$$

where the last step uses a well-known property of the harmonic series,

$$\sum_{j=1}^{\infty}\frac{1}{j} ,$$

that it grows large without limit. This simple example also shows that just because a function g(t)
satisfies requirements (I) through (IV), so that the transform of the transform returns the original
function g(t), it does not necessarily follow that the transform itself satisfies requirements (I) through
(IV).
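
The failure of $h(t) = \sin(t)/t$ to be absolutely integrable can be watched numerically. The following sketch, with hypothetical helper names and a midpoint rule chosen for convenience, integrates $|\sin(t)/t|$ out to $J\pi$ for increasing $J$ and compares the result with the harmonic-series lower bound $(2/\pi)\sum_{j=1}^{J} 1/j$ used in the inequality above.

```python
import math


def abs_sinc_integral(num_half_periods, points_per_half_period=50):
    """Midpoint-rule estimate of the integral of |sin(t)/t| from 0 to num_half_periods*pi."""
    step = math.pi / points_per_half_period
    total = 0.0
    for k in range(num_half_periods * points_per_half_period):
        t = (k + 0.5) * step           # midpoints never land on t = 0
        total += abs(math.sin(t)) / t
    return step * total


def harmonic_lower_bound(num_half_periods):
    """(2/pi) * sum_{j=1}^{J} 1/j, the lower bound appearing in the text."""
    return (2.0 / math.pi) * sum(1.0 / j for j in range(1, num_half_periods + 1))


integrals = {}
bounds = {}
for j_max in (10, 100, 1000):
    integrals[j_max] = abs_sinc_integral(j_max)
    bounds[j_max] = harmonic_lower_bound(j_max)
    print(j_max, integrals[j_max], bounds[j_max])
```

Both columns keep increasing, slowly but without leveling off, as the upper limit is extended; this is the logarithmic divergence implied by the harmonic series.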
Here is another example to show that, even though the transform of a function may exist, if
requirements (I) through (IV) are violated, then the transform of the transform does not
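necessarily return the original function. We consider another test function,

$$z(t) = t^{-1} , \tag{2.10e}$$

which is clearly not absolutely integrable because

$$\int_0^{\infty}\frac{dt}{t} = \lim_{\substack{A \to \infty \\ \varepsilon \to 0}}\int_{\varepsilon}^{A}\frac{dt}{t} = \lim_{\substack{A \to \infty \\ \varepsilon \to 0}}\left[\ln(A/\varepsilon)\right] = \infty ,$$

violating requirement (I). The sine transform of $z$ is

$$Z_S(f) = 2\int_0^{\infty}\frac{\sin(2\pi ft)}{t}\,dt .$$

Any handbook of definite integrals shows that

$$
Z_S(f) =
\begin{cases}
0 & \text{for } f = 0 \\
\pi & \text{for } f > 0
\end{cases}
. \tag{2.10f}
$$

Therefore, the sine transform $Z_S$ of $z(t) = t^{-1}$ exists, yet the sine transform of the sine transform
does not return $z$:

$$2\pi\int_0^{\infty}\sin(2\pi ft)\,df = \lim_{F \to \infty} 2\pi\int_0^{F}\sin(2\pi ft)\,df = \lim_{F \to \infty}\frac{1}{t}\left[1 - \cos(2\pi Ft)\right] \ne \frac{1}{t} . \tag{2.10g}$$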

Clearly, if a function violates requirements (I) through (IV) yet has a well-defined sine or
cosine transform, the sine transform of the sine transform and the cosine transform of the cosine
transform must be checked explicitly to confirm that the original function is returned. The only
exception is when the transform itself satisfies (I) through (IV) even though the original test
function does not. Because we could just as easily have started with the transform itself instead of
the original test function, we can conclude that the transform of the transform of the original
function must return the original function. In general, repeatedly applying the sine or cosine
transform just takes us back and forth between the same two functions, and the transformations
are mathematically justified whenever at least one of those functions satisfies requirements (I)
through (IV).
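
The non-convergence in Eq. (2.10g) is easy to see numerically. In the sketch below, the function name `partial_inverse`, the sample point $t = 0.25$, and the cutoff values $F$ are arbitrary choices made for illustration; the integral up to a finite cutoff is estimated with a midpoint rule and compared with the closed form $(1/t)[1 - \cos(2\pi F t)]$.

```python
import math


def partial_inverse(t, f_cut, n=20000):
    """Midpoint estimate of 2*pi * integral_0^f_cut sin(2*pi*f*t) df, cf. Eq. (2.10g)."""
    step = f_cut / n
    total = sum(math.sin(2.0 * math.pi * (k + 0.5) * step * t) for k in range(n))
    return 2.0 * math.pi * step * total


def closed_form(t, f_cut):
    """(1/t) * [1 - cos(2*pi*F*t)], the expression inside the limit in Eq. (2.10g)."""
    return (1.0 - math.cos(2.0 * math.pi * f_cut * t)) / t


t = 0.25                      # the value being sought is z(t) = 1/t = 4
cutoffs = (10.0, 11.0, 12.0, 13.0)
values = [partial_inverse(t, f_cut) for f_cut in cutoffs]
for f_cut, value in zip(cutoffs, values):
    print("F =", f_cut, "numeric =", value, "closed form =", closed_form(t, f_cut))
```

Rather than settling down to $1/t$, the truncated integral keeps swinging between $0$ and $2/t$ as the cutoff $F$ grows, so the limit in (2.10g) does not exist in the ordinary sense.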

2.3 Even, Odd, and Mixed Functions


Fourier transform theory can be extended to include functions that are evaluated for negative as
well as positive values of their arguments. To assist our analysis of these extended transforms, we
decide to classify $u$ as an even, odd, or mixed function. An even function $u$ satisfies the constraint

$$u(-t) = u(t) \tag{2.11a}$$

for all values of $t$, negative as well as positive; an odd function satisfies the constraint

$$u(-t) = -u(t) \tag{2.11b}$$

for all values of $t$, negative as well as positive; and a mixed function is partly even and partly odd
in the sense that it is the sum of an even function and an odd function, neither of which is
identically zero. Any function $u(t)$, whether even, odd, or mixed, can be written as the sum of
two functions, $u_e$ and $u_o$, with $u_e$ being an even function obeying (2.11a) and $u_o$ being an odd
function obeying (2.11b),

$$u(t) = u_e(t) + u_o(t) , \tag{2.11c}$$

where

$$u_e(t) = \frac{1}{2}\left[u(t) + u(-t)\right] \tag{2.11d}$$

and

$$u_o(t) = \frac{1}{2}\left[u(t) - u(-t)\right] . \tag{2.11e}$$

Clearly,

$$u_e(-t) = \frac{1}{2}\left[u(-t) + u(t)\right] = \frac{1}{2}\left[u(t) + u(-t)\right] = u_e(t)$$

and

$$u_o(-t) = \frac{1}{2}\left[u(-t) - u(t)\right] = -\frac{1}{2}\left[u(t) - u(-t)\right] = -u_o(t) .$$

If $u$ starts off as an even function, then $u = u_e$, and $u_o$ is identically zero; if $u$ starts off as an odd
function, then $u = u_o$, and $u_e$ is identically zero; and if $u$ starts off as a mixed function, then
neither $u_e$ nor $u_o$ is identically zero. If $u$ is identically zero, it can be regarded as either even or
odd, according to the classifier’s convenience.
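
The decomposition in Eqs. (2.11c) through (2.11e) translates directly into code. The sketch below uses hypothetical helper names and takes $u(t) = e^{t}$ as an assumed example of a mixed function, whose even and odd parts are the familiar $\cosh(t)$ and $\sinh(t)$.

```python
import math


def even_part(u):
    """Return the function u_e(t) = [u(t) + u(-t)] / 2 of Eq. (2.11d)."""
    return lambda t: 0.5 * (u(t) + u(-t))


def odd_part(u):
    """Return the function u_o(t) = [u(t) - u(-t)] / 2 of Eq. (2.11e)."""
    return lambda t: 0.5 * (u(t) - u(-t))


u = math.exp                  # assumed mixed test function
u_e = even_part(u)            # should coincide with cosh
u_o = odd_part(u)             # should coincide with sinh

for t in (-1.3, 0.0, 0.7):
    print(t, u_e(t), math.cosh(t), u_o(t), math.sinh(t))
```

Applying `odd_part` to an even function such as `math.cos` returns a function that is zero everywhere, matching the statement that $u_o$ is identically zero when $u$ is even.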
FIGURE 2.4(a). [Plot of an even function $u(t)$.]

FIGURE 2.4(b). [Plot of an odd function $u(t)$.]

FIGURE 2.4(c). [Plot of a mixed function $u(t)$ together with its even part $u_e(t)$ and odd part $u_o(t)$ for $t$ between $-2$ and $2$.]

Figures 2.4(a) and 2.4(b) graph examples of even and odd functions respectively, and Fig.
2.4(c) shows a mixed function that is split up into its even and odd parts. We note that $\cos(2\pi ft)$
is an even function of both $f$ and $t$ and $\sin(2\pi ft)$ is an odd function of both $f$ and $t$. One point
worth remembering is that the behavior of even and odd functions is severely constrained near
$t = 0$. For any odd function at $t = 0$, we have

$$u(0) = u(-0) = -u(0)$$

from Eq. (2.11b). Since the only number equal to its own negative value is zero, all odd functions
$u(t)$ that have a well-defined value at $t = 0$ must be zero at $t = 0$,

$$u\big|_{t=0} = 0 \quad\text{if } u(0) \text{ exists and } u \text{ is odd.} \tag{2.12a}$$

Because $u(-t) = u(t)$ for even functions, when $t$ is near zero the value of $u$ (if $u$ is continuous) is
almost constant. Therefore, when $t$ is exactly zero the derivative of any even function $u(t)$, if it is
well defined, must be zero,

$$\frac{du}{dt}\bigg|_{t=0} = 0 \quad\text{if the derivative at zero exists and } u \text{ is even.} \tag{2.12b}$$

In fact, using the definition of the derivative

$$\frac{du}{dt} = \lim_{\varepsilon \to 0}\left[\frac{u(t + \varepsilon) - u(t)}{\varepsilon}\right] = \lim_{\varepsilon \to 0}\left[\frac{u(t) - u(t - \varepsilon)}{\varepsilon}\right] ,$$

when $u$ is even we see that

$$\frac{du}{dt}\bigg|_{t=-t_o} = \lim_{\varepsilon \to 0}\left[\frac{u(-t_o + \varepsilon) - u(-t_o)}{\varepsilon}\right] = \lim_{\varepsilon \to 0}\left[\frac{u(t_o - \varepsilon) - u(t_o)}{\varepsilon}\right] = -\frac{du}{dt}\bigg|_{t=t_o} .$$

This shows that when $u$ is even, the derivative of $u$ is odd, and so from (2.12a), which states that
odd functions are zero when their argument is zero, we know that (2.12b) must be true. Similarly,
for any odd function $u$,

$$\frac{du}{dt}\bigg|_{t=-t_o} = \lim_{\varepsilon \to 0}\left[\frac{u(-t_o + \varepsilon) - u(-t_o)}{\varepsilon}\right] = \lim_{\varepsilon \to 0}\left[\frac{u(t_o) - u(t_o - \varepsilon)}{\varepsilon}\right] = \frac{du}{dt}\bigg|_{t=t_o} ,$$

showing that when $u$ is odd, its derivative is even. The second derivative $d^2u/dt^2$ of an even
function $u$ is the first derivative of $du/dt$, which is odd, and so $d^2u/dt^2$ must be even; similarly, the
third derivative $d^3u/dt^3$ is the first derivative of $d^2u/dt^2$, which is even, and so must be odd.
Examining in this fashion ever higher derivatives of the even function u, we conclude that

$$
\frac{d^n u}{dt^n} =
\begin{cases}
\text{odd function} & \text{for } n = 1, 3, 5, \ldots \\
\text{even function} & \text{for } n = 2, 4, \ldots
\end{cases}
\quad\text{when } u \text{ is even.} \tag{2.12c}
$$

The same reasoning applied to the derivatives of an odd function $u$ shows that

$$
\frac{d^n u}{dt^n} =
\begin{cases}
\text{even function} & \text{for } n = 1, 3, 5, \ldots \\
\text{odd function} & \text{for } n = 2, 4, 6, \ldots
\end{cases}
\quad\text{when } u \text{ is odd.} \tag{2.12d}
$$

Equation (2.12c) states that the odd-numbered derivatives of an even function are odd while the
even-numbered derivatives of an even function are even, and Eq. (2.12d) states that the
odd-numbered derivatives of an odd function are even while the even-numbered derivatives of an odd
function are odd. Therefore, an immediate consequence of (2.12a), (2.12c), and (2.12d) is that the
odd-numbered derivatives of an even function (if they exist and are well defined) are zero at
$t = 0$ and the even-numbered derivatives of an odd function (if they exist and are well defined)
are zero at $t = 0$.
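
A short worked example may make the pattern in (2.12c) concrete. The Gaussian $u(t) = e^{-t^2}$ is chosen here purely as an assumed illustration of an even function; its first three derivatives alternate in parity, and the odd-numbered ones vanish at $t = 0$:

```latex
\begin{align*}
u(t) &= e^{-t^2} && \text{(even)} \\
\frac{du}{dt} &= -2t\,e^{-t^2} && \text{(odd, zero at } t = 0\text{)} \\
\frac{d^2u}{dt^2} &= (4t^2 - 2)\,e^{-t^2} && \text{(even, equal to } -2 \text{ at } t = 0\text{)} \\
\frac{d^3u}{dt^3} &= (12t - 8t^3)\,e^{-t^2} && \text{(odd, zero at } t = 0\text{)}
\end{align*}
```

Multiplying the same Gaussian by $t$ gives an odd function, for which the roles are exchanged as in (2.12d).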

2.4 Extended Sine and Cosine Transforms


We can now extend the sine and cosine transforms to include functions u(t) evaluated for
negative as well as positive values of t while generalizing requirements (I) through (IV)
previously applied to $u$ for $t > 0$ in Sec. 2.2. The extended requirements are

(V) Function $u(t)$ must satisfy

$$\int_{-\infty}^{\infty} |u(t)|\,dt < \infty . \tag{2.13a}$$

(VI) Function $u(t)$ must be continuous except for a finite number of jump discontinuities
over any finite interval $-\infty < a < t < b < \infty$.
(VII) There must exist a finite positive number $B$ such that

$$|u(t)| < B . \tag{2.13b}$$

(VIII) The non-negative variation $V_{ab}(u)$ of function $u(t)$ as defined in Eqs. (2.9g) and (2.9h)
is finite over any finite interval $-\infty < a < t < b < \infty$,

$$V_{ab}(u) < \infty . \tag{2.13c}$$


We also define the value of u at all its jump discontinuities to be given by Eq. (2.9e). These new
requirements are clearly just the old set of requirements extended to cover negative as well as
positive values of t.
The extended Fourier sine transform of u is

$$\mathcal{S}_E^{(ft)}[u(t)] = \int_{-\infty}^{\infty} u(t)\sin(2\pi ft)\,dt , \tag{2.14a}$$

and the extended Fourier cosine transform of $u$ is

$$\mathcal{C}_E^{(ft)}[u(t)] = \int_{-\infty}^{\infty} u(t)\cos(2\pi ft)\,dt . \tag{2.14b}$$

Just as in Eqs. (2.8a) and (2.8b), which define the standard sine and cosine transforms, the order of
the $ft$ product in the superscript does not matter:

$$\mathcal{S}_E^{(ft)}[u(t)] = \mathcal{S}_E^{(tf)}[u(t)]$$

and

$$\mathcal{C}_E^{(ft)}[u(t)] = \mathcal{C}_E^{(tf)}[u(t)] .$$

We can write $u$ as the sum of even and odd functions, $u(t) = u_e(t) + u_o(t)$, as described in Eq.
(2.11c), and substitute this sum into the definitions of the extended sine and cosine transforms in
(2.14a) and (2.14b) to get

$$\mathcal{S}_E^{(ft)}[u(t)] = \int_{-\infty}^{\infty} u_e(t)\sin(2\pi ft)\,dt + \int_{-\infty}^{\infty} u_o(t)\sin(2\pi ft)\,dt \tag{2.15a}$$

and

$$\mathcal{C}_E^{(ft)}[u(t)] = \int_{-\infty}^{\infty} u_e(t)\cos(2\pi ft)\,dt + \int_{-\infty}^{\infty} u_o(t)\cos(2\pi ft)\,dt . \tag{2.15b}$$

We note that the product of an even function $u_e$ and the sine, as well as the product of an odd
function $u_o$ and the cosine, must be an odd function,

$$u_e(-t)\sin(2\pi f\cdot(-t)) = u_e(t)\left[-\sin(2\pi ft)\right] = -\left[u_e(t)\sin(2\pi ft)\right] , \tag{2.16a}$$

and

$$u_o(-t)\cos(2\pi f\cdot(-t)) = \left[-u_o(t)\right]\cos(2\pi ft) = -\left[u_o(t)\cos(2\pi ft)\right] . \tag{2.16b}$$

The integral between $-\infty$ and $+\infty$ of any odd function $\varphi_o(t)$ can be thought of as the limit of
the sum of a large number of small terms,

$$\int_{-\infty}^{\infty}\varphi_o(t)\,dt = \cdots + \varphi_o(-2dt)\cdot dt + \varphi_o(-dt)\cdot dt + \varphi_o(0)\cdot dt + \varphi_o(dt)\cdot dt + \varphi_o(2dt)\cdot dt + \cdots .$$

Because $\varphi_o$ is odd, $\varphi_o(0)$ is zero; $\varphi_o(-dt)\cdot dt = -\varphi_o(dt)\cdot dt$ and cancels $\varphi_o(dt)\cdot dt$;
$\varphi_o(-2dt)\cdot dt = -\varphi_o(2dt)\cdot dt$ and cancels $\varphi_o(2dt)\cdot dt$; and so on. Therefore,[20]

$$\int_{-\infty}^{\infty}\varphi_o(t)\,dt = 0 , \tag{2.17}$$

and Eqs. (2.15a) and (2.15b) can be written as

$$\mathcal{S}_E^{(ft)}[u(t)] = \int_{-\infty}^{\infty} u_o(t)\sin(2\pi ft)\,dt \tag{2.18a}$$

and

$$\mathcal{C}_E^{(ft)}[u(t)] = \int_{-\infty}^{\infty} u_e(t)\cos(2\pi ft)\,dt . \tag{2.18b}$$

The integral between $-\infty$ and $+\infty$ of any even function $\varphi_e(t)$ can be thought of as

$$\int_{-\infty}^{\infty}\varphi_e(t)\,dt = \cdots + \varphi_e(-2dt)\cdot dt + \varphi_e(-dt)\cdot dt + \varphi_e(0)\cdot dt + \varphi_e(dt)\cdot dt + \varphi_e(2dt)\cdot dt + \cdots .$$

Because $\varphi_e$ is even, $\varphi_e(-dt) = \varphi_e(dt)$, $\varphi_e(-2dt) = \varphi_e(2dt)$, and so on. Therefore, the integral over
negative $t$ has the same value as the integral over positive $t$ and we can write

$$\int_{-\infty}^{\infty}\varphi_e(t)\,dt = 2\int_0^{\infty}\varphi_e(t)\,dt . \tag{2.19}$$

[20] Strictly speaking, we are here treating the integral between $-\infty$ and $+\infty$ as a Cauchy principal value, a concept
introduced in Sec. 2.10 below.

The product of $u_o$ and the sine is an even function,

$$u_o(-t)\sin(2\pi f\cdot(-t)) = \left[-u_o(t)\right]\cdot\left[-\sin(2\pi ft)\right] = \left[u_o(t)\sin(2\pi ft)\right] , \tag{2.20}$$

and the product of $u_e$ and the cosine, both of them even functions, is another even function.
Consequently, the extended sine and cosine transforms in Eqs. (2.18a) and (2.18b) are, according
to (2.19), (2.8a), and (2.8b),

$$\mathcal{S}_E^{(ft)}[u(t)] = \int_{-\infty}^{\infty} u_o(t)\sin(2\pi ft)\,dt = 2\int_0^{\infty} u_o(t)\sin(2\pi ft)\,dt = \mathcal{S}^{(ft)}[u_o(t)] \tag{2.21a}$$

and

$$\mathcal{C}_E^{(ft)}[u(t)] = \int_{-\infty}^{\infty} u_e(t)\cos(2\pi ft)\,dt = 2\int_0^{\infty} u_e(t)\cos(2\pi ft)\,dt = \mathcal{C}^{(ft)}[u_e(t)] . \tag{2.21b}$$

Equation (2.21a) shows that the extended sine transform of a function $u(t)$ is the unextended sine
transform of $u_o$, the odd component of $u$; and Eq. (2.21b) shows that the extended cosine
transform of $u(t)$ is the unextended cosine transform of $u_e$, the even component of $u$. Because the
result will be needed later, we also show that the extended sine transform defined in Eq. (2.14a)
is an odd function of $f$,

$$\mathcal{S}_E^{(-ft)}[u(t)] = \int_{-\infty}^{\infty} u(t)\sin(-2\pi ft)\,dt = -\int_{-\infty}^{\infty} u(t)\sin(2\pi ft)\,dt = -\mathcal{S}_E^{(ft)}[u(t)] ; \tag{2.22a}$$

and a similar manipulation shows that the extended cosine transform defined in (2.14b) is an even
function of $f$,

$$\mathcal{C}_E^{(-ft)}[u(t)] = \int_{-\infty}^{\infty} u(t)\cos(-2\pi ft)\,dt = \int_{-\infty}^{\infty} u(t)\cos(2\pi ft)\,dt = \mathcal{C}_E^{(ft)}[u(t)] . \tag{2.22b}$$
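
Equations (2.21a), (2.21b), (2.22a), and (2.22b) can be spot-checked numerically. The sketch below assumes a shifted Gaussian $u(t) = e^{-(t-1)^2}$ as a convenient mixed test function and replaces the infinite limits by a finite cutoff where the function is negligible; the function names, the cutoff, and the grid size are illustrative choices only.

```python
import math

T_CUT = 12.0     # assumed cutoff standing in for infinity
N_HALF = 6000    # trapezoid intervals on [0, T_CUT]


def integrate(func, lower, upper, n):
    """Trapezoid-rule estimate of the integral of func from lower to upper."""
    step = (upper - lower) / n
    total = 0.5 * (func(lower) + func(upper))
    for k in range(1, n):
        total += func(lower + k * step)
    return step * total


def u(t):
    """Assumed mixed test function: a Gaussian centered at t = 1."""
    return math.exp(-(t - 1.0) ** 2)


def u_even(t):
    return 0.5 * (u(t) + u(-t))    # Eq. (2.11d)


def u_odd(t):
    return 0.5 * (u(t) - u(-t))    # Eq. (2.11e)


def extended_sine(f):
    """Eq. (2.14a) with the limits cut off at -T_CUT and +T_CUT."""
    return integrate(lambda t: u(t) * math.sin(2.0 * math.pi * f * t), -T_CUT, T_CUT, 2 * N_HALF)


def extended_cosine(f):
    """Eq. (2.14b) with the limits cut off at -T_CUT and +T_CUT."""
    return integrate(lambda t: u(t) * math.cos(2.0 * math.pi * f * t), -T_CUT, T_CUT, 2 * N_HALF)


def sine_of_odd_part(f):
    """Unextended sine transform, Eq. (2.8a), applied to u_o."""
    return 2.0 * integrate(lambda t: u_odd(t) * math.sin(2.0 * math.pi * f * t), 0.0, T_CUT, N_HALF)


def cosine_of_even_part(f):
    """Unextended cosine transform, Eq. (2.8b), applied to u_e."""
    return 2.0 * integrate(lambda t: u_even(t) * math.cos(2.0 * math.pi * f * t), 0.0, T_CUT, N_HALF)


f = 0.25
print("sine:  ", extended_sine(f), sine_of_odd_part(f), extended_sine(-f))
print("cosine:", extended_cosine(f), cosine_of_even_part(f), extended_cosine(-f))
```

Within rounding error the first two numbers on each line agree, and changing the sign of $f$ flips the sign of the sine transform while leaving the cosine transform unchanged.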

We now examine what happens when the extended sine and cosine transforms are applied
twice to the same function. We define

$$U_{SE}(f) = \mathcal{S}_E^{(ft)}[u(t)] = \mathcal{S}^{(ft)}[u_o(t)] \tag{2.23a}$$

and

$$U_{CE}(f) = \mathcal{C}_E^{(ft)}[u(t)] = \mathcal{C}^{(ft)}[u_e(t)] , \tag{2.23b}$$

where the second step in Eqs. (2.23a) and (2.23b) comes from (2.21a) and (2.21b). Taking the
extended Fourier sine and cosine transforms of $U_{SE}$ and $U_{CE}$ respectively, we get

$$\mathcal{S}_E^{(tf)}[U_{SE}(f)] = \mathcal{S}_E^{(ft)}[U_{SE}(f)] = \int_{-\infty}^{\infty} U_{SE}(f)\sin(2\pi ft)\,df \tag{2.24a}$$

and

$$\mathcal{C}_E^{(tf)}[U_{CE}(f)] = \mathcal{C}_E^{(ft)}[U_{CE}(f)] = \int_{-\infty}^{\infty} U_{CE}(f)\cos(2\pi ft)\,df . \tag{2.24b}$$
5

The second step in (2.24a) and (2.24b) is there just to emphasize that we are allowed to change
the order of the ft product in the superscripts.
Equation (2.22a) shows that the extended sine transform $U_{SE}$ is an odd function of $f$, so its
product with the sine is an even function of $f$; and Eq. (2.22b) shows that the extended cosine
transform $U_{CE}$ is an even function of $f$, so its product with the cosine is also an even function of
$f$. Hence, according to (2.19), Eqs. (2.24a) and (2.24b) become

$$\mathcal{S}_E^{(tf)}[U_{SE}(f)] = 2\int_0^{\infty} U_{SE}(f)\sin(2\pi ft)\,df \tag{2.25a}$$

and

$$\mathcal{C}_E^{(tf)}[U_{CE}(f)] = 2\int_0^{\infty} U_{CE}(f)\cos(2\pi ft)\,df . \tag{2.25b}$$

But Eq. (2.23a) shows that $U_{SE}$ is also the unextended sine transform of $u_o$, so from (2.25a) we
see that

$$\mathcal{S}_E^{(tf)}[U_{SE}(f)]$$

equals the unextended sine transform of the unextended sine transform of $u_o$, the odd component
of function $u$. According to Eqs. (2.8a), (2.8c), and (2.8e), the unextended sine transform of the
unextended sine transform returns the original function for positive values of $t$. This means that
the extended sine transform of the extended sine transform,

$$\mathcal{S}_E^{(tf)}[U_{SE}(f)] ,$$

which we have just seen to be equal to the unextended sine transform of the unextended sine
transform, must return $u_o$ for positive values of $t$. Consequently, for positive values of $t$, Eq.
(2.25a) becomes

$$\mathcal{S}_E^{(tf)}[U_{SE}(f)] = 2\int_0^{\infty} U_{SE}(f)\sin(2\pi ft)\,df = u_o(t) . \tag{2.26a}$$

Function $u_o$ is, however, defined for all values of $t$ according to the rule for odd functions
$u_o(-t) = -u_o(t)$, and the integral

$$2\int_0^{\infty} U_{SE}(f)\sin(2\pi ft)\,df$$

is also an odd function of $t$ when we allow $t$ to be both positive and negative,

$$2\int_0^{\infty} U_{SE}(f)\sin(2\pi f(-t))\,df = -2\int_0^{\infty} U_{SE}(f)\sin(2\pi ft)\,df .$$

Consequently, the integral exists and is well defined for negative $t$ whenever the integral exists
and is well defined for positive $t$. We conclude that Eq. (2.26a) holds true for negative as well as
positive $t$. Hence, using Eq. (2.23a) to substitute for $U_{SE}$ in Eq. (2.26a), we can write

$$\mathcal{S}_E^{(tf)}\left[\mathcal{S}_E^{(ft')}[u(t')]\right] = u_o(t) . \tag{2.26b}$$

This shows that taking the extended sine transform of the extended sine transform returns the odd component $u_o$ of function $u$ for all values of $t$, both positive and negative. Switching now to the extended cosine transform $U_{\mathcal{C}E}$, we see that Eq. (2.23b) shows the extended cosine transform $U_{\mathcal{C}E}$ is also the unextended cosine transform of $u_e$, the even component of function $u$. From the right-hand side of Eq. (2.25b), we then know that

$$
\mathcal{C}_E^{(tf)}(U_{\mathcal{C}E}(f))
$$

is equal to the unextended cosine transform of the unextended cosine transform of $u_e$. Equations (2.8b), (2.8d), and (2.8f) show that the unextended cosine transform of the unextended cosine transform returns the original function for positive values of $t$. Consequently, the extended cosine transform of the extended cosine transform,

$$
\mathcal{C}_E^{(tf)}(U_{\mathcal{C}E}(f)),
$$

which we have just seen to be equal to the unextended cosine transform of the unextended cosine transform of $u_e$, must also equal $u_e$ for positive values of $t$. This means that Eq. (2.25b) becomes (for positive values of $t$)

$$
\mathcal{C}_E^{(tf)}(U_{\mathcal{C}E}(f)) = 2\int_0^{\infty} U_{\mathcal{C}E}(f)\cos(2\pi ft)\,df = u_e(t) . \tag{2.26c}
$$

But $u_e(t)$ is defined for negative as well as positive values of $t$ according to the rule $u_e(-t) = u_e(t)$ for even functions of $t$, and the integral

$$
2\int_0^{\infty} U_{\mathcal{C}E}(f)\cos(2\pi ft)\,df
$$

is also an even function of $t$ when $t$ is allowed to be both positive and negative:

$$
2\int_0^{\infty} U_{\mathcal{C}E}(f)\cos(2\pi f(-t))\,df = 2\int_0^{\infty} U_{\mathcal{C}E}(f)\cos(2\pi ft)\,df .
$$

Consequently, the integral exists and is well defined for negative $t$ if it exists and is well defined for positive $t$. We conclude that Eq. (2.26c) is valid for both negative and positive $t$ and that, substituting Eq. (2.23b) into Eq. (2.26c),

$$
\mathcal{C}_E^{(tf)}\left(\mathcal{C}_E^{(ft')}(u(t'))\right) = u_e(t) . \tag{2.26d}
$$

This shows that taking the extended cosine transform of the extended cosine transform returns $u_e$, the even component of function $u$, for all values of $t$ both positive and negative. Equations (2.11d) and (2.11e), the original definitions of the even and odd components of a function $u$, show that Eqs. (2.26b) and (2.26d) can be written as

$$
\mathcal{S}_E^{(tf)}\left(\mathcal{S}_E^{(ft')}(u(t'))\right) = \frac{1}{2}\left[u(t) - u(-t)\right] \tag{2.26e}
$$

and

$$
\mathcal{C}_E^{(tf)}\left(\mathcal{C}_E^{(ft')}(u(t'))\right) = \frac{1}{2}\left[u(t) + u(-t)\right] . \tag{2.26f}
$$

Adding together the extended sine transform of the extended sine transform and the extended cosine transform of the extended cosine transform then gives

$$
\begin{aligned}
\mathcal{S}_E^{(tf)}\left(\mathcal{S}_E^{(ft')}(u(t'))\right) + \mathcal{C}_E^{(tf)}\left(\mathcal{C}_E^{(ft')}(u(t'))\right)
&= \frac{1}{2}\left[u(t) - u(-t)\right] + \frac{1}{2}\left[u(t) + u(-t)\right] \\
&= u(t) .
\end{aligned} \tag{2.26g}
$$

We conclude that for any function $u(t)$, the sum of the extended sine transform of the extended sine transform and the extended cosine transform of the extended cosine transform returns the original function.
One obvious way to proceed from this point is to define the Hartley transform

$$
\begin{aligned}
\mathcal{H}^{(ft)}(u(t)) &= \int_{-\infty}^{\infty} u(t)\left[\cos(2\pi ft) + \sin(2\pi ft)\right] dt \\
&= \int_{-\infty}^{\infty} u(t)\cos(2\pi ft)\,dt + \int_{-\infty}^{\infty} u(t)\sin(2\pi ft)\,dt \\
&= \mathcal{C}_E^{(tf)}(u(t)) + \mathcal{S}_E^{(tf)}(u(t)) \\
&= U_{\mathcal{C}E}(f) + U_{\mathcal{S}E}(f) ,
\end{aligned} \tag{2.26h}
$$

where in the next-to-last step we use definitions (2.14a) and (2.14b) of the extended sine and cosine transforms and in the last step Eqs. (2.23a) and (2.23b) are used to write the extended sine and cosine transforms as functions of $f$. The order of the $ft$ product in the superscript is not important because, just as in the sine and cosine transforms, we have

$$
\mathcal{H}^{(ft)}(u(t)) = \mathcal{H}^{(tf)}(u(t)) .
$$

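Working with this definition, we see that the Hartley transform of the Hartley transform gives

$$
\begin{aligned}
\mathcal{H}^{(tf)}\left(\mathcal{H}^{(ft')}(u(t'))\right) &= \mathcal{H}^{(tf)}\left(U_{\mathcal{C}E}(f) + U_{\mathcal{S}E}(f)\right) \\
&= \int_{-\infty}^{\infty} \left[U_{\mathcal{C}E}(f) + U_{\mathcal{S}E}(f)\right]\left[\cos(2\pi ft) + \sin(2\pi ft)\right] df .
\end{aligned} \tag{2.26i}
$$

According to Eqs. (2.22a) and (2.22b), the extended sine transform $U_{\mathcal{S}E}$ is an odd function of $f$ and the extended cosine transform $U_{\mathcal{C}E}$ is an even function of $f$. Using the same reasoning as in Eqs. (2.16a) and (2.16b) above,

$$
U_{\mathcal{C}E}(-f)\sin\left(2\pi t\cdot(-f)\right) = U_{\mathcal{C}E}(f)\cdot\left[-\sin(2\pi ft)\right] = -\left[U_{\mathcal{C}E}(f)\sin(2\pi ft)\right]
$$

and

$$
U_{\mathcal{S}E}(-f)\cos\left(2\pi t\cdot(-f)\right) = \left[-U_{\mathcal{S}E}(f)\right]\cos(2\pi ft) = -\left[U_{\mathcal{S}E}(f)\cos(2\pi ft)\right] .
$$

We see that $U_{\mathcal{C}E}(f)\sin(2\pi ft)$ and $U_{\mathcal{S}E}(f)\cos(2\pi ft)$ are both odd functions of $f$, and Eq. (2.17) states that the integral between $-\infty$ and $+\infty$ of any odd function is zero. Therefore,

$$
\int_{-\infty}^{\infty} U_{\mathcal{C}E}(f)\sin(2\pi ft)\,df = \int_{-\infty}^{\infty} U_{\mathcal{S}E}(f)\cos(2\pi ft)\,df = 0 .
$$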

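Now the Hartley transform of the Hartley transform in Eq. (2.26i) can be simplified to

$$
\begin{aligned}
\mathcal{H}^{(tf)}\left(\mathcal{H}^{(ft')}(u(t'))\right)
&= \int_{-\infty}^{\infty} \left[U_{\mathcal{C}E}(f) + U_{\mathcal{S}E}(f)\right]\left[\cos(2\pi ft) + \sin(2\pi ft)\right] df \\
&= \int_{-\infty}^{\infty} U_{\mathcal{C}E}(f)\cos(2\pi ft)\,df + \int_{-\infty}^{\infty} U_{\mathcal{C}E}(f)\sin(2\pi ft)\,df \\
&\quad + \int_{-\infty}^{\infty} U_{\mathcal{S}E}(f)\cos(2\pi ft)\,df + \int_{-\infty}^{\infty} U_{\mathcal{S}E}(f)\sin(2\pi ft)\,df \\
&= \int_{-\infty}^{\infty} U_{\mathcal{C}E}(f)\cos(2\pi ft)\,df + \int_{-\infty}^{\infty} U_{\mathcal{S}E}(f)\sin(2\pi ft)\,df \\
&= \mathcal{C}_E^{(tf)}(U_{\mathcal{C}E}(f)) + \mathcal{S}_E^{(tf)}(U_{\mathcal{S}E}(f)) .
\end{aligned}
$$

Because $U_{\mathcal{C}E}$ and $U_{\mathcal{S}E}$ are respectively the extended cosine and sine transforms of $u$ [see Eqs. (2.23a) and (2.23b)], we have

$$
\mathcal{H}^{(tf)}\left(\mathcal{H}^{(ft')}(u(t'))\right) = \mathcal{C}_E^{(tf)}\left(\mathcal{C}_E^{(ft')}(u(t'))\right) + \mathcal{S}_E^{(tf)}\left(\mathcal{S}_E^{(ft')}(u(t'))\right) ,
$$

which becomes, substituting from (2.26g),

$$
\mathcal{H}^{(tf)}\left(\mathcal{H}^{(ft')}(u(t'))\right) = u(t) . \tag{2.26j}
$$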

We see that the Hartley transform of the Hartley transform returns the original function for both
positive and negative values of t. The Hartley transform was never very popular and is only rarely
encountered today. What is done instead, as we shall see in the next section, is to combine the
extended sine and cosine transforms into a single Fourier transform based on a complex
exponential.
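
As a rough numerical illustration of Eq. (2.26j), the sketch below uses the discrete analogue of the Hartley transform, built from NumPy's FFT. The function name `dht`, the random test signal, and the division by N are assumptions of this sketch and are not defined in the text; in the discrete setting, applying the transform twice returns N times the original samples, so the result is rescaled before comparison.

```python
import numpy as np

def dht(x):
    """Discrete Hartley transform: sum_n x[n] * (cos(2*pi*n*k/N) + sin(2*pi*n*k/N))."""
    X = np.fft.fft(x)        # X[k] = sum_n x[n] * (cos(...) - i*sin(...))
    return X.real - X.imag   # real part is the cosine sum; minus the imaginary part is the sine sum

rng = np.random.default_rng(0)
u = rng.standard_normal(64)  # arbitrary real test signal, neither even nor odd
N = u.size

u_back = dht(dht(u)) / N     # Hartley transform of the Hartley transform, rescaled by 1/N
max_err = np.max(np.abs(u_back - u))
```

Here `max_err` should sit at floating-point round-off level, which mirrors the statement that the Hartley transform of the Hartley transform returns the original function.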

2.5 Forward and Inverse Fourier Transforms


The Fourier transform is based on the well-known identity

$$
e^{i\varphi} = \cos(\varphi) + i\sin(\varphi) , \tag{2.27}
$$

where $i = \sqrt{-1}$.
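For any real function $u(t)$ satisfying requirements (V) through (VIII) in Sec. 2.4, we can add the extended cosine transform to $i$ times the extended sine transform to get

$$
\mathcal{C}_E^{(ft)}(u(t)) + i\cdot\mathcal{S}_E^{(ft)}(u(t)) = \int_{-\infty}^{\infty} u(t)\left[\cos(2\pi ft) + i\sin(2\pi ft)\right] dt = \int_{-\infty}^{\infty} e^{2\pi ift}u(t)\,dt . \tag{2.28a}
$$

From Eqs. (2.23a) and (2.23b), we have

$$
\mathcal{C}_E^{(ft)}(u(t)) = U_{\mathcal{C}E}(f) \quad\text{and}\quad \mathcal{S}_E^{(ft)}(u(t)) = U_{\mathcal{S}E}(f) ,
$$

which means (2.28a) can be written as

$$
\int_{-\infty}^{\infty} e^{2\pi ift}u(t)\,dt = U_{\mathcal{C}E}(f) + iU_{\mathcal{S}E}(f) . \tag{2.28b}
$$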

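Taking the extended sine transform of both sides of (2.28b) gives

$$
\begin{aligned}
\int_{-\infty}^{\infty} df\,\sin(2\pi ft)\int_{-\infty}^{\infty} dt'\,e^{2\pi ift'}u(t')
&= \int_{-\infty}^{\infty} U_{\mathcal{C}E}(f)\sin(2\pi ft)\,df + i\int_{-\infty}^{\infty} U_{\mathcal{S}E}(f)\sin(2\pi ft)\,df \\
&= i\int_{-\infty}^{\infty} U_{\mathcal{S}E}(f)\sin(2\pi ft)\,df
\end{aligned} \tag{2.28c}
$$

because $U_{\mathcal{C}E}(f)\sin(2\pi ft)$ is an odd function of $f$ and integrates to zero [see discussion after Eq. (2.26i) above]. Taking the extended cosine transform of both sides of Eq. (2.28b) gives

$$
\begin{aligned}
\int_{-\infty}^{\infty} df\,\cos(2\pi ft)\int_{-\infty}^{\infty} dt'\,e^{2\pi ift'}u(t')
&= \int_{-\infty}^{\infty} U_{\mathcal{C}E}(f)\cos(2\pi ft)\,df + i\int_{-\infty}^{\infty} U_{\mathcal{S}E}(f)\cos(2\pi ft)\,df \\
&= \int_{-\infty}^{\infty} U_{\mathcal{C}E}(f)\cos(2\pi ft)\,df
\end{aligned} \tag{2.28d}
$$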

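because $U_{\mathcal{S}E}(f)\cos(2\pi ft)$ is an odd function of $f$ and integrates to zero. Substitution of Eqs. (2.24a) and (2.24b) into (2.28c) and (2.28d) gives

$$
\int_{-\infty}^{\infty} df\,\sin(2\pi ft)\int_{-\infty}^{\infty} dt'\,e^{2\pi ift'}u(t') = i\cdot\mathcal{S}_E^{(tf)}(U_{\mathcal{S}E}(f)) \tag{2.28e}
$$

and

$$
\int_{-\infty}^{\infty} df\,\cos(2\pi ft)\int_{-\infty}^{\infty} dt'\,e^{2\pi ift'}u(t') = \mathcal{C}_E^{(tf)}(U_{\mathcal{C}E}(f)) . \tag{2.28f}
$$

Since $\mathcal{C}_E^{(ft)}(u(t)) = U_{\mathcal{C}E}(f)$ and $\mathcal{S}_E^{(ft)}(u(t)) = U_{\mathcal{S}E}(f)$ [see Eqs. (2.23a) and (2.23b)], Eqs. (2.28e) and (2.28f) can be written as

$$
\int_{-\infty}^{\infty} df\,\sin(2\pi ft)\int_{-\infty}^{\infty} dt'\,e^{2\pi ift'}u(t') = i\cdot\mathcal{S}_E^{(tf)}\left(\mathcal{S}_E^{(ft')}(u(t'))\right) \tag{2.28g}
$$

and

$$
\int_{-\infty}^{\infty} df\,\cos(2\pi ft)\int_{-\infty}^{\infty} dt'\,e^{2\pi ift'}u(t') = \mathcal{C}_E^{(tf)}\left(\mathcal{C}_E^{(ft')}(u(t'))\right) . \tag{2.28h}
$$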

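We now multiply both sides of (2.28g) by $(-i)$ and sum the resulting equation with Eq. (2.28h) to get

$$
\begin{aligned}
\int_{-\infty}^{\infty} df\,\cos(2\pi ft)\int_{-\infty}^{\infty} dt'\,e^{2\pi ift'}u(t') &- i\int_{-\infty}^{\infty} df\,\sin(2\pi ft)\int_{-\infty}^{\infty} dt'\,e^{2\pi ift'}u(t') \\
&= \mathcal{C}_E^{(tf)}\left(\mathcal{C}_E^{(ft')}(u(t'))\right) + \mathcal{S}_E^{(tf)}\left(\mathcal{S}_E^{(ft')}(u(t'))\right)
\end{aligned}
$$

or, using the identity $e^{-i\varphi} = \cos(\varphi) - i\sin(\varphi)$,

$$
\int_{-\infty}^{\infty} df\,e^{-2\pi ift}\int_{-\infty}^{\infty} dt'\,e^{2\pi ift'}u(t') = \mathcal{C}_E^{(tf)}\left(\mathcal{C}_E^{(ft')}(u(t'))\right) + \mathcal{S}_E^{(tf)}\left(\mathcal{S}_E^{(ft')}(u(t'))\right) . \tag{2.28i}
$$

Equation (2.26g) simplifies this to

$$
\int_{-\infty}^{\infty} df\,e^{-2\pi ift}\int_{-\infty}^{\infty} dt'\,e^{2\pi ift'}u(t') = u(t) . \tag{2.28j}
$$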

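If, in Eq. (2.28a), we start out by adding the extended cosine transform to $(-i)$ times the extended sine transform, then instead of Eqs. (2.28g) and (2.28h), we get [just replace $i$ by $(-i)$ everywhere]

$$
\int_{-\infty}^{\infty} df\,\sin(2\pi ft)\int_{-\infty}^{\infty} dt'\,e^{-2\pi ift'}u(t') = -i\cdot\mathcal{S}_E^{(tf)}\left(\mathcal{S}_E^{(ft')}(u(t'))\right)
$$

and

$$
\int_{-\infty}^{\infty} df\,\cos(2\pi ft)\int_{-\infty}^{\infty} dt'\,e^{-2\pi ift'}u(t') = \mathcal{C}_E^{(tf)}\left(\mathcal{C}_E^{(ft')}(u(t'))\right) .
$$

Now we must multiply the top equation by $i$ before summing it with the bottom equation to get

$$
\begin{aligned}
\int_{-\infty}^{\infty} df\,\cos(2\pi ft)\int_{-\infty}^{\infty} dt'\,e^{-2\pi ift'}u(t') &+ i\int_{-\infty}^{\infty} df\,\sin(2\pi ft)\int_{-\infty}^{\infty} dt'\,e^{-2\pi ift'}u(t') \\
&= \mathcal{C}_E^{(tf)}\left(\mathcal{C}_E^{(ft')}(u(t'))\right) + \mathcal{S}_E^{(tf)}\left(\mathcal{S}_E^{(ft')}(u(t'))\right)
\end{aligned}
$$

or

$$
\int_{-\infty}^{\infty} df\,e^{2\pi ift}\int_{-\infty}^{\infty} dt'\,e^{-2\pi ift'}u(t') = u(t) . \tag{2.28k}
$$

Clearly, Eqs. (2.28j) and (2.28k) are basically the same identity, which can be written as

$$
\int_{-\infty}^{\infty} df\,e^{\pm 2\pi ift}\int_{-\infty}^{\infty} dt'\,e^{\mp 2\pi ift'}u(t') = u(t) . \tag{2.28$\ell$}
$$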

As long as the exponent of $e$ changes sign in the two integrals over $f$ and $t$, we get back the original function. Looking at how Eqs. (2.28j) and (2.28k) are derived, we see that if the sign of the exponent does not change, we get

$$
\mathcal{C}_E^{(tf)}\left(\mathcal{C}_E^{(ft')}(u(t'))\right) - \mathcal{S}_E^{(tf)}\left(\mathcal{S}_E^{(ft')}(u(t'))\right)
$$

instead of

$$
\mathcal{C}_E^{(tf)}\left(\mathcal{C}_E^{(ft')}(u(t'))\right) + \mathcal{S}_E^{(tf)}\left(\mathcal{S}_E^{(ft')}(u(t'))\right) .
$$

Equations (2.26e) and (2.26f) then show that

$$
\mathcal{C}_E^{(tf)}\left(\mathcal{C}_E^{(ft')}(u(t'))\right) - \mathcal{S}_E^{(tf)}\left(\mathcal{S}_E^{(ft')}(u(t'))\right) = u(-t) ,
$$

which gives

$$
\int_{-\infty}^{\infty} df\,e^{\pm 2\pi ift}\int_{-\infty}^{\infty} dt'\,e^{\pm 2\pi ift'}u(t') = u(-t) . \tag{2.28m}
$$

This interesting result shows that when $u$ is even so that $u(-t) = u(t)$, we still get back the original function, and when $u$ is odd so that $u(-t) = -u(t)$, we just have to multiply by $(-1)$ to retrieve $u$. Even when $u$ is mixed, no information is lost; reversing the sign of the argument still gets us back to the original function. Replacing $t$ by $-t$ in (2.28m) takes us back to the original formula (2.28$\ell$).
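
The difference between opposite-sign and same-sign exponents can be seen in a small discrete experiment. The sketch below is only an analogue of (2.28$\ell$) and (2.28m), assuming NumPy's FFT as a stand-in for the continuous integrals; the variable names and the 1/N rescaling are choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.standard_normal(32)      # real, mixed signal (neither even nor odd)
N = u.size

# Opposite signs in the two exponents: forward followed by inverse returns u.
opposite = np.fft.ifft(np.fft.fft(u))

# Same sign in both exponents: the forward transform applied twice (rescaled by 1/N)
# returns the argument-reversed signal, the discrete counterpart of u(-t).
same = np.fft.fft(np.fft.fft(u)) / N
u_reversed = np.roll(u[::-1], 1)  # samples u[(-n) mod N]
```

Comparing `opposite` with `u` and `same` with `u_reversed` shows that no information is lost in either case; only the sign of the argument differs.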
Up to this point, we have taken $u$ to be real, but if Eq. (2.28$\ell$) holds true when $u$ is a real function of a real argument, it must also hold true when $u$ is a complex function of a real argument. To show why this is so, we break complex functions $u(t)$ of a real argument $t$ into real and imaginary parts,

$$
u(t) = u_r(t) + iu_i(t) ,
$$

where $u_r$ and $u_i$ are both real functions of $t$. Substituting this complex-valued $u(t)$ into the left-hand side of (2.28$\ell$) gives

$$
\begin{aligned}
\int_{-\infty}^{\infty} df\,e^{\pm 2\pi ift}&\int_{-\infty}^{\infty} dt'\,e^{\mp 2\pi ift'}\left[u_r(t') + iu_i(t')\right] \\
&= \int_{-\infty}^{\infty} df\,e^{\pm 2\pi ift}\int_{-\infty}^{\infty} dt'\,e^{\mp 2\pi ift'}u_r(t') + i\int_{-\infty}^{\infty} df\,e^{\pm 2\pi ift}\int_{-\infty}^{\infty} dt'\,e^{\mp 2\pi ift'}u_i(t') .
\end{aligned}
$$

Since (2.28$\ell$) holds for real functions $u_r$ and $u_i$, this last expression must be equal to the original complex function $u$,

$$
u_r(t) + iu_i(t) = u(t) ,
$$

showing that Eq. (2.28$\ell$) is true for complex functions of $t$ as well as strictly real functions of $t$. Similar reasoning shows that (2.28m) also holds true for complex functions of real variables. Indeed, we can even apply this analysis to the unextended sine and cosine transforms to show that the unextended sine transform of the unextended sine transform and the unextended cosine transform of the unextended cosine transform return the original function (for positive values of the argument) when the original function is complex.
We now define the Fourier transform of a complex function $u$ with real argument $t$ to be

$$
\mathcal{F}^{(-ift)}(u(t)) = \int_{-\infty}^{\infty} u(t)e^{-2\pi ift}\,dt . \tag{2.29a}
$$

The notation for $\mathcal{F}$ introduced in (2.29a) explicitly shows that $t$, being repeated inside both upper and lower parentheses, is the dummy variable of integration; and that $\mathcal{F}$ produces a function of $f$ because $f$ is only listed in the upper parentheses. We call (2.29a) the forward Fourier transform and, when convenient, follow the custom of writing it with the upper-case letter of the transformed function,

$$
U(f) = \int_{-\infty}^{\infty} u(t)e^{-2\pi ift}\,dt . \tag{2.29b}
$$

If (2.29a) is the forward transform, then the inverse Fourier transform is

$$
\mathcal{F}^{(itf)}(U(f)) = \int_{-\infty}^{\infty} U(f)e^{2\pi ift}\,df . \tag{2.29c}
$$

In both the forward and inverse transform the order of the $tf$ product in the superscript is irrelevant, just as it is for the sine, cosine, and Hartley transforms,

$$
\mathcal{F}^{(\pm itf)}(u(t)) = \mathcal{F}^{(\pm ift)}(u(t)) \quad\text{and}\quad \mathcal{F}^{(\pm itf)}(U(f)) = \mathcal{F}^{(\pm ift)}(U(f)) .
$$

What is important is the sign inside the superscript, since it determines whether the forward or inverse transform is being performed. Equation (2.28$\ell$) shows, of course, that

$$
u(t) = \mathcal{F}^{(itf)}(U(f)) = \int_{-\infty}^{\infty} U(f)e^{2\pi ift}\,df = \mathcal{F}^{(itf)}\left(\mathcal{F}^{(-ift')}(u(t'))\right) . \tag{2.29d}
$$

It is entirely a matter of convention which Fourier transform is called the forward transform and which is called the inverse transform; all that matters is for (2.28$\ell$) to be satisfied. Some authors change the sign of the exponent $(-2\pi ift)$, defining the forward Fourier transform to be $\mathcal{F}^{(ift)}$,

$$
\mathcal{F}^{(ift)}(u(t)) = \int_{-\infty}^{\infty} u(t)e^{2\pi ift}\,dt ,
$$

and the inverse Fourier transform to be $\mathcal{F}^{(-ift)}$,

$$
\mathcal{F}^{(-itf)}(U(f)) = \int_{-\infty}^{\infty} U(f)e^{-2\pi ift}\,df .
$$

Clearly, this convention also satisfies (2.28$\ell$), with the inverse Fourier transform of the forward Fourier transform still returning the original function.
In physics and related disciplines, the frequency variable is often changed to $\omega = 2\pi f$, so that (2.28$\ell$) becomes

$$
\frac{1}{2\pi}\int_{-\infty}^{\infty} d\omega\,e^{\pm i\omega t}\int_{-\infty}^{\infty} dt'\,e^{\mp i\omega t'}u(t') = u(t) . \tag{2.30a}
$$

Authors using the frequency variable $\omega$ allocate the factor of $1/(2\pi)$ in different ways when defining the forward and inverse Fourier transforms in terms of $\omega$, with all reasonable possibilities chosen at one time or another:

$$
\begin{aligned}
&\text{Forward Fourier transform of } u(t) \text{ is } \int_{-\infty}^{\infty} u(t)e^{\mp i\omega t}\,dt = U(\omega) , \\
&\text{Inverse Fourier transform of } U(\omega) \text{ is } \frac{1}{2\pi}\int_{-\infty}^{\infty} U(\omega)e^{\pm i\omega t}\,d\omega ,
\end{aligned} \tag{2.30b}
$$

$$
\begin{aligned}
&\text{Forward Fourier transform of } u(t) \text{ is } \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} u(t)e^{\mp i\omega t}\,dt = U(\omega) , \\
&\text{Inverse Fourier transform of } U(\omega) \text{ is } \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} U(\omega)e^{\pm i\omega t}\,d\omega ,
\end{aligned} \tag{2.30c}
$$

$$
\begin{aligned}
&\text{Forward Fourier transform of } u(t) \text{ is } \frac{1}{2\pi}\int_{-\infty}^{\infty} u(t)e^{\mp i\omega t}\,dt = U(\omega) , \\
&\text{Inverse Fourier transform of } U(\omega) \text{ is } \int_{-\infty}^{\infty} U(\omega)e^{\pm i\omega t}\,d\omega .
\end{aligned} \tag{2.30d}
$$

In each of the three pairs of definitions listed above, the plus and minus signs are synchronized; so if the top (bottom) sign is chosen for the first member of the pair then the top (bottom) sign must also be chosen for the second member of the pair. This gives a total of six different ways of defining the forward and inverse Fourier transforms, and all six satisfy Eq. (2.30a).
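
A discrete counterpart of this bookkeeping appears in NumPy's FFT routines, where the factor 1/N plays the role of 1/(2π) and is placed by the `norm` argument on the inverse (`"backward"`), split evenly (`"ortho"`), or placed on the forward transform (`"forward"`). The sketch below, which assumes NumPy 1.20 or later for the `"forward"` option, simply checks that each matched pair returns the original samples; the test signal is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
u = rng.standard_normal(16)

round_trip_ok = {}
for norm in ("backward", "ortho", "forward"):
    U = np.fft.fft(u, norm=norm)         # forward transform under this convention
    u_back = np.fft.ifft(U, norm=norm)   # matching inverse under the same convention
    round_trip_ok[norm] = bool(np.allclose(u_back, u))

# The conventions differ only by a constant factor on the forward transform.
ortho_scaled = np.fft.fft(u) / np.sqrt(u.size)
```

The practical point is the same as in the text: any allocation works as long as the forward and inverse definitions are taken from the same pair.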
The unextended sine and cosine transforms (usually called just the sine and cosine transforms) can also be defined in many different ways. Equations (2.8a), (2.8c), (2.8e), and (2.8b), (2.8d), (2.8f) can be combined to write

$$
4\int_0^{\infty} df\,\sin(2\pi ft)\int_0^{\infty} dt'\,u(t')\sin(2\pi ft') = u(t) \quad\text{for } t > 0 \tag{2.31a}
$$

and

$$
4\int_0^{\infty} df\,\cos(2\pi ft)\int_0^{\infty} dt'\,u(t')\cos(2\pi ft') = u(t) \quad\text{for } t > 0 . \tag{2.31b}
$$

Changing the frequency variable to $\omega = 2\pi f$, so that $df = d\omega/(2\pi)$, gives

$$
\frac{2}{\pi}\int_0^{\infty} d\omega\,\sin(\omega t)\int_0^{\infty} dt'\,u(t')\sin(\omega t') = u(t) \quad\text{for } t > 0 \tag{2.31c}
$$

and

$$
\frac{2}{\pi}\int_0^{\infty} d\omega\,\cos(\omega t)\int_0^{\infty} dt'\,u(t')\cos(\omega t') = u(t) \quad\text{for } t > 0 . \tag{2.31d}
$$

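Just like the factor of $1/(2\pi)$ in Eq. (2.30a), the factor of $2/\pi$ in (2.31c) and (2.31d) can be allocated three different ways when defining the forward and inverse sine and cosine transforms:

$$
\begin{aligned}
&\text{Forward sine transform of } u(t) \text{ for } t > 0 \text{ is } \int_0^{\infty} u(t)\sin(\omega t)\,dt = U_{\mathcal{S}}(\omega) , \\
&\text{Forward cosine transform of } u(t) \text{ for } t > 0 \text{ is } \int_0^{\infty} u(t)\cos(\omega t)\,dt = U_{\mathcal{C}}(\omega) , \\
&\text{Inverse sine transform of } U_{\mathcal{S}}(\omega) \text{ is } \frac{2}{\pi}\int_0^{\infty} U_{\mathcal{S}}(\omega)\sin(\omega t)\,d\omega = u(t) \text{ for } t > 0 , \\
&\text{Inverse cosine transform of } U_{\mathcal{C}}(\omega) \text{ is } \frac{2}{\pi}\int_0^{\infty} U_{\mathcal{C}}(\omega)\cos(\omega t)\,d\omega = u(t) \text{ for } t > 0 ,
\end{aligned} \tag{2.31e}
$$

$$
\begin{aligned}
&\text{Forward sine transform of } u(t) \text{ for } t > 0 \text{ is } \sqrt{\frac{2}{\pi}}\int_0^{\infty} u(t)\sin(\omega t)\,dt = U_{\mathcal{S}}(\omega) , \\
&\text{Forward cosine transform of } u(t) \text{ for } t > 0 \text{ is } \sqrt{\frac{2}{\pi}}\int_0^{\infty} u(t)\cos(\omega t)\,dt = U_{\mathcal{C}}(\omega) , \\
&\text{Inverse sine transform of } U_{\mathcal{S}}(\omega) \text{ is } \sqrt{\frac{2}{\pi}}\int_0^{\infty} U_{\mathcal{S}}(\omega)\sin(\omega t)\,d\omega = u(t) \text{ for } t > 0 , \\
&\text{Inverse cosine transform of } U_{\mathcal{C}}(\omega) \text{ is } \sqrt{\frac{2}{\pi}}\int_0^{\infty} U_{\mathcal{C}}(\omega)\cos(\omega t)\,d\omega = u(t) \text{ for } t > 0 ,
\end{aligned} \tag{2.31f}
$$

$$
\begin{aligned}
&\text{Forward sine transform of } u(t) \text{ for } t > 0 \text{ is } \frac{2}{\pi}\int_0^{\infty} u(t)\sin(\omega t)\,dt = U_{\mathcal{S}}(\omega) , \\
&\text{Forward cosine transform of } u(t) \text{ for } t > 0 \text{ is } \frac{2}{\pi}\int_0^{\infty} u(t)\cos(\omega t)\,dt = U_{\mathcal{C}}(\omega) , \\
&\text{Inverse sine transform of } U_{\mathcal{S}}(\omega) \text{ is } \int_0^{\infty} U_{\mathcal{S}}(\omega)\sin(\omega t)\,d\omega = u(t) \text{ for } t > 0 , \\
&\text{Inverse cosine transform of } U_{\mathcal{C}}(\omega) \text{ is } \int_0^{\infty} U_{\mathcal{C}}(\omega)\cos(\omega t)\,d\omega = u(t) \text{ for } t > 0 .
\end{aligned} \tag{2.31g}
$$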

The reader should expect to encounter all three classes of definitions given in (2.31e)–(2.31g). The symmetric definitions in (2.31f) are the most popular, probably because they remove the distinction between the forward and inverse transform, letting us say that the sine transform of the sine transform and the cosine transform of the cosine transform return the original function for $t > 0$.
In today's optical-engineering textbooks, and in user manuals for the fast Fourier transform, there is a tendency to choose Eqs. (2.29a)–(2.29d) as the definitions of the forward and inverse Fourier transform, and that is the convention followed here. It is perhaps somewhat unconventional not to use the frequency variable $\omega = 2\pi f$ when defining the sine and cosine transforms, but using $f$ rather than $\omega$ brings their definitions into conformity with the definitions chosen for the forward and inverse Fourier transforms.


2.6 Fourier Transform as a Linear Operation


The forward and inverse Fourier transforms are linear operations. If $\alpha$, $\beta$ are any two complex constants and $u(t)$, $v(t)$ are two complex-valued functions of a real variable $t$, then the definition of a linear operator $L$ is that

$$
L(\alpha\cdot u(t) + \beta\cdot v(t)) = \alpha\cdot L(u(t)) + \beta\cdot L(v(t)) . \tag{2.32a}
$$

Examples of linear operators are multiplication by a specified function $g(t)$,

$$
L_1(u(t)) = g(t)\cdot u(t) ,
$$

differentiation with respect to $t$,

$$
L_2(u(t)) = \frac{du(t)}{dt} ,
$$

and integration over the interval $t_1 \le t \le t_2$,

$$
L_3(u(t)) = \int_{t_1}^{t_2} u(t)\,dt .
$$

We see that for these three examples

$$
L_1(\alpha u(t) + \beta v(t)) = \alpha g(t)u(t) + \beta g(t)v(t) = \alpha L_1(u(t)) + \beta L_1(v(t)) ,
$$

$$
L_2(\alpha u(t) + \beta v(t)) = \alpha\frac{du(t)}{dt} + \beta\frac{dv(t)}{dt} = \alpha L_2(u(t)) + \beta L_2(v(t)) ,
$$

and

$$
L_3(\alpha u(t) + \beta v(t)) = \alpha\int_{t_1}^{t_2} u(t)\,dt + \beta\int_{t_1}^{t_2} v(t)\,dt = \alpha L_3(u(t)) + \beta L_3(v(t)) .
$$

Combinations of linear operators are always linear; for example, the operator $Z$ defined by

$$
Z(u(t)) = L_3(L_1(u(t)))
$$

must be linear because

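$$
\begin{aligned}
Z(\alpha u(t) + \beta v(t)) &= L_3(L_1(\alpha u(t) + \beta v(t))) = L_3(\alpha L_1(u(t)) + \beta L_1(v(t))) \\
&= \alpha L_3(L_1(u(t))) + \beta L_3(L_1(v(t))) \\
&= \alpha Z(u(t)) + \beta Z(v(t)) .
\end{aligned} \tag{2.32b}
$$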

We note that the forward Fourier transform

$$
\mathcal{F}^{(-ift)}(u(t)) = \int_{-\infty}^{\infty} u(t)e^{-2\pi ift}\,dt
$$

as defined in Eq. (2.29a) is, in fact, just $L_3(L_1(u(t)))$ with $g(t) = e^{-2\pi ift}$ in the $L_1$ multiplication and $t_1 = -\infty$, $t_2 = \infty$ in the $L_3$ integration. Similarly, the inverse Fourier transform is, interchanging the roles of the $f$ and $t$ variables in Eq. (2.29c),

$$
\mathcal{F}^{(ift)}(U(t)) = \int_{-\infty}^{\infty} U(t)e^{2\pi ift}\,dt ,
$$

showing it to be $L_3(L_1(U(t)))$ with $g(t) = e^{2\pi ift}$ in the $L_1$ multiplication and $t_1 = -\infty$, $t_2 = \infty$ in the $L_3$ integration. Equation (2.32b) thus shows that both the forward and inverse Fourier transforms are linear. The unextended and extended sine transforms in Eqs. (2.8a) and (2.14a),

$$
\mathcal{S}^{(ft)}(u(t)) = 2\int_0^{\infty} u(t)\sin(2\pi ft)\,dt \quad\text{and}\quad \mathcal{S}_E^{(ft)}(u(t)) = \int_{-\infty}^{\infty} u(t)\sin(2\pi ft)\,dt ,
$$

are also both $L_3(L_1(u(t)))$: the unextended sine transform has $g(t) = 2\sin(2\pi ft)$ in the $L_1$ multiplication and $t_1 = 0$, $t_2 = \infty$ in the $L_3$ integration; and the extended sine transform has $g(t) = \sin(2\pi ft)$ in the $L_1$ multiplication and $t_1 = -\infty$, $t_2 = \infty$ in the $L_3$ integration. The unextended and extended cosine transforms in Eqs. (2.8b) and (2.14b),

$$
\mathcal{C}^{(ft)}(u(t)) = 2\int_0^{\infty} u(t)\cos(2\pi ft)\,dt \quad\text{and}\quad \mathcal{C}_E^{(ft)}(u(t)) = \int_{-\infty}^{\infty} u(t)\cos(2\pi ft)\,dt ,
$$

are, of course, identical to the unextended and extended sine transforms in being $L_3(L_1(u(t)))$; the only change is that the sines change to cosines in the $L_1$ multiplications. From Eq. (2.32b), all four transforms (the extended sine transform, the unextended sine transform, the extended cosine transform, and the unextended cosine transform) are linear operations. We see that the only other transform discussed so far, the Hartley transform

$$
\mathcal{H}^{(ft)}(u(t)) = \int_{-\infty}^{\infty} u(t)\left[\cos(2\pi ft) + \sin(2\pi ft)\right] dt
$$

in Eq. (2.26h), must also be linear because it is $L_3(L_1(u(t)))$ with $g(t) = \cos(2\pi ft) + \sin(2\pi ft)$ in the $L_1$ multiplication and has $t_1 = -\infty$, $t_2 = \infty$ in the $L_3$ integration.
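
The linearity property (2.32a) is easy to spot-check numerically. The sketch below uses NumPy's discrete FFT as a stand-in for the forward transform; the signals `u`, `v` and the complex constants `alpha`, `beta` are arbitrary values chosen for this illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
u = rng.standard_normal(128) + 1j * rng.standard_normal(128)
v = rng.standard_normal(128) + 1j * rng.standard_normal(128)
alpha, beta = 2.0 - 1.5j, -0.5 + 3.0j   # arbitrary complex constants

lhs = np.fft.fft(alpha * u + beta * v)               # transform of the combination
rhs = alpha * np.fft.fft(u) + beta * np.fft.fft(v)   # combination of the transforms
linearity_error = np.max(np.abs(lhs - rhs))
```

The two sides agree to round-off, as expected for an operation that is a multiplication followed by a summation.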

2.7 Mathematical Symmetries of the Fourier Transform


There are a large number of symmetry relations that hold for any function $u(t)$ and its Fourier transform

$$
U(f) = \mathcal{F}^{(-ift)}(u(t)) = \int_{-\infty}^{\infty} u(t)e^{-2\pi ift}\,dt . \tag{2.33a}
$$

We have already seen that the inverse Fourier transform of $U(f)$ returns the original function,

$$
\int_{-\infty}^{\infty} U(f)e^{2\pi ift}\,df = \mathcal{F}^{(itf)}(U(f)) = u(t) . \tag{2.33b}
$$

Replacing $t$ by $-t$ changes this to

$$
u(-t) = \mathcal{F}^{(-itf)}(U(f)) .
$$

Interchanging the roles of variables $f$ and $t$, we get

$$
u(-f) = \mathcal{F}^{(-ift)}(U(t)) , \tag{2.33c}
$$

which shows that $u(-f)$ is the forward Fourier transform of $U(t)$. We expect, then, that $U(t)$ is the inverse Fourier transform of $u(-f)$. To show this is true, we interchange the roles of variables $f$ and $t$ in (2.33a) and then make $f' = -f$ the new variable of integration to get

$$
\begin{aligned}
U(t) = \mathcal{F}^{(-itf)}(u(f)) &= \int_{-\infty}^{\infty} u(f)e^{-2\pi ift}\,df = -\int_{\infty}^{-\infty} u(-f')e^{2\pi if't}\,df' = \int_{-\infty}^{\infty} u(-f)e^{2\pi ift}\,df \\
&= \mathcal{F}^{(itf)}(u(-f)) .
\end{aligned} \tag{2.33d}
$$

Not only does this show that $U(t)$ is the inverse Fourier transform of $u(-f)$ but also, by comparing the two expressions involving the $\mathcal{F}$ operator, we see that changing the sign of the integration variable $f$ does not change the value of the Fourier operation $\mathcal{F}$. It does, however, change its name: the first $\mathcal{F}$ operation in (2.33d) is the forward Fourier transform of $u(f)$ and the second $\mathcal{F}$ operation in (2.33d) is the inverse Fourier transform of $u(-f)$. Taking the complex conjugate of all three expressions in Eq. (2.33b) gives

$$
u(t)^* = \int_{-\infty}^{\infty} U(f)^* e^{-2\pi ift}\,df = \mathcal{F}^{(-itf)}(U(f)^*) ,
$$

which shows that we get the complex conjugate of operator $\mathcal{F}$ by taking the complex conjugates of the quantities inside both parentheses. Starting with the original Fourier transform relationship between $U$ and $u$,

$$
U(f) = \mathcal{F}^{(-ift)}(u(t)) \tag{2.33e}
$$

and

$$
u(t) = \mathcal{F}^{(itf)}(U(f)) , \tag{2.33f}
$$

we take the complex conjugates of both sides of (2.33e),

$$
U(f)^* = \mathcal{F}^{(ift)}(u(t)^*) ,
$$

and then change the sign of $f$ to get

$$
U(-f)^* = \mathcal{F}^{(-ift)}(u(t)^*) . \tag{2.33g}
$$

This shows that $U(-f)^*$ is the forward Fourier transform of $u(t)^*$. Since $U(-f)^*$ is the forward Fourier transform of $u(t)^*$, we expect the inverse Fourier transform of $U(-f)^*$ to be $u(t)^*$. To show this is true, we just change the sign of the integration variable in Eq. (2.33f),

$$
u(t) = \mathcal{F}^{(-itf)}(U(-f)) ,
$$

and then take the complex conjugate to get

$$
u(t)^* = \mathcal{F}^{(itf)}(U(-f)^*) . \tag{2.33h}
$$

Hence, $u(t)^*$ is indeed the inverse Fourier transform of $U(-f)^*$.


When $u(t)$ is a strictly real function, as it is for much of the Fourier-transform work done in this book, $u$ equals its complex conjugate so that

$$
\mathcal{F}^{(-ift)}(u(t)) = \mathcal{F}^{(-ift)}(u(t)^*) ,
$$

and Eq. (2.33g) becomes

$$
U(-f)^* = \mathcal{F}^{(-ift)}(u(t)) .
$$

But $\mathcal{F}^{(-ift)}(u(t))$ is just $U(f)$, the forward Fourier transform of $u$, so

$$
U(-f)^* = U(f)
$$

or, taking the complex conjugate of both sides,

$$
U(-f) = U(f)^* . \tag{2.34a}
$$

Functions $U(f)$ that obey Eq. (2.34a) are called Hermitian. If $u(t)$ is purely imaginary, so that $u(t)^* = -u(t)$, then Eq. (2.33g) becomes

$$
U(-f)^* = \mathcal{F}^{(-ift)}(-u(t))
$$

or

$$
\mathcal{F}^{(-ift)}(u(t)) = -U(-f)^* , \tag{2.34b}
$$

where the linearity of $\mathcal{F}$ is used to take $(-1)$ outside the transform and shift it over to the other side of the equation. Since $\mathcal{F}^{(-ift)}(u(t))$ is just $U(f)$, Eq. (2.34b) shows that

$$
U(f) = -U(-f)^*
$$

or

$$
U(-f) = -U(f)^* \tag{2.34c}
$$

when $u$ is purely imaginary. Functions $U(f)$ that obey Eq. (2.34c) are called anti-Hermitian. A special and very important case occurs when $u$ is both real and even. Then, since $U$ is the forward Fourier transform of $u$ with $U(f) = \mathcal{F}^{(-ift)}(u(t))$, we take the complex conjugate of both sides to get

$$
U(f)^* = \mathcal{F}^{(ift)}(u(t)^*) .
$$

Because $u$ is real this becomes, changing the sign of the variable of integration,

$$
U(f)^* = \mathcal{F}^{(ift)}(u(t)) = \mathcal{F}^{(-ift)}(u(-t)) .
$$

Because $u$ is even, this simplifies to

$$
U(f)^* = \mathcal{F}^{(-ift)}(u(t)) = U(f)
$$

so that

$$
U(f)^* = U(f) . \tag{2.34d}
$$

Hence, $U$ equals its own complex conjugate, which shows it must be real. Because $u$ is real, we already know that $U$ is Hermitian and (2.34a) must hold true; now that $U$ is known to be real, Eq. (2.34a) can be written as

$$
U(-f) = U(f) . \tag{2.34e}
$$

This shows that $U$ must be real and even when $u$ is real and even. Taking the real part of Eq. (2.33a) now gives, since both $U$ and $u$ are known to be real,

$$
U(f) = \operatorname{Re}\left(\int_{-\infty}^{\infty} u(t)e^{-2\pi ift}\,dt\right) = \int_{-\infty}^{\infty} u(t)\operatorname{Re}\left(e^{-2\pi ift}\right) dt ,
$$

which becomes, applying Eq. (2.27),

$$
U(f) = \int_{-\infty}^{\infty} u(t)\cos(2\pi ft)\,dt . \tag{2.34f}
$$

Because $u(t)$ is also even, we know that the product $u(t)\cos(2\pi ft)$ is even with respect to $t$, which means that (2.34f) can be written as [see formula (2.19) above]

$$
U(f) = 2\int_0^{\infty} u(t)\cos(2\pi ft)\,dt . \tag{2.34g}
$$

The right-hand side is the unextended cosine transform of $u$, showing that when $u(t)$ is real and even, its Fourier transform equals its cosine transform. According to Eq. (2.8f), it follows that $u$ must then be the cosine transform of $U$,

$$
u(t) = 2\int_0^{\infty} U(f)\cos(2\pi ft)\,df . \tag{2.34h}
$$
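
The Hermitian, anti-Hermitian, and real-even symmetries in (2.34a), (2.34c), and (2.34e) have direct discrete counterparts. The following sketch checks them with NumPy's FFT; the helper `flip`, which evaluates an array at the negated index modulo N, is a name introduced only for this illustration.

```python
import numpy as np

def flip(X):
    """Return X evaluated at the negated index, X[(-k) mod N]."""
    return np.roll(X[::-1], 1)

rng = np.random.default_rng(4)
u = rng.standard_normal(64)                     # strictly real signal
U = np.fft.fft(u)
hermitian = bool(np.allclose(flip(U), np.conj(U)))          # U(-f) = U(f)*

v = 1j * u                                      # purely imaginary signal
V = np.fft.fft(v)
anti_hermitian = bool(np.allclose(flip(V), -np.conj(V)))    # U(-f) = -U(f)*

u_even = u + flip(u)                            # real and even: u_even[-n] = u_even[n]
U_even = np.fft.fft(u_even)
real_and_even = bool(
    np.allclose(U_even.imag, 0.0, atol=1e-9)    # transform is real
    and np.allclose(flip(U_even), U_even)       # transform is even
)
```

All three flags should come out true, which is the discrete version of the statement that a real, even function has a real, even transform.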

2.8 Basic Fourier Identities


There are a number of simple Fourier identities that are true for the transforms of any function $u$. One very simple identity, surprisingly easy to overlook, is that when $U(f)$ is the forward or inverse Fourier transform of $u(t)$, the value of $U$ at the origin is the total integral of $u$:

$$
U(f)\Big|_{f=0} = \left[\int_{-\infty}^{\infty} u(t)e^{\mp 2\pi ift}\,dt\right]_{f=0}
$$

or

$$
U(0) = \int_{-\infty}^{\infty} u(t)\,dt . \tag{2.35a}
$$

Similarly, $u(0)$ is the total integral of $U(f)$:

$$
u(t)\Big|_{t=0} = \left[\int_{-\infty}^{\infty} U(f)e^{\pm 2\pi ift}\,df\right]_{t=0}
$$

or

$$
u(0) = \int_{-\infty}^{\infty} U(f)\,df . \tag{2.35b}
$$
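
In the discrete setting the same two identities become statements about sums. The sketch below is an analogue only, assuming NumPy's default FFT convention in which the factor 1/N sits on the inverse transform; the random signal is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
u = rng.standard_normal(50)
U = np.fft.fft(u)

# Analogue of (2.35a): the zero-frequency value equals the sum of the samples.
dc_matches_sum = bool(np.isclose(U[0], u.sum()))

# Analogue of (2.35b): the sample at the origin equals the sum of the transform,
# divided by N because NumPy puts the 1/N factor on the inverse transform.
origin_matches_sum = bool(np.isclose(u[0], U.sum() / u.size))
```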

When $U(f)$ is the forward Fourier transform of $u(t)$, the $n$th derivative of $U$ is

$$
\frac{d^nU}{df^n} = \frac{\partial^n}{\partial f^n}\int_{-\infty}^{\infty} u(t)e^{-2\pi ift}\,dt = (-2\pi i)^n\int_{-\infty}^{\infty}\left[t^nu(t)\right]e^{-2\pi ift}\,dt ; \tag{2.35c}
$$

and, because Eqs. (2.29a) and (2.29d) require $u$ to be the inverse transform of $U$ when $U$ is the forward transform of $u$, the $n$th derivative of $u$ is

$$
\frac{d^nu}{dt^n} = \frac{\partial^n}{\partial t^n}\int_{-\infty}^{\infty} U(f)e^{2\pi ift}\,df = \int_{-\infty}^{\infty}\left[(2\pi i)^nf^nU(f)\right]e^{2\pi ift}\,df . \tag{2.35d}
$$

Therefore, when both $u$ and $d^nu/dt^n$ satisfy requirements (V) through (VIII) in Sec. 2.4 and $U(f)$ is the forward Fourier transform of $u(t)$, Eq. (2.35d) shows that $[(2\pi i)^nf^nU(f)]$ must be the forward Fourier transform of $d^nu/dt^n$ because $d^nu/dt^n$ is the inverse Fourier transform of $[(2\pi i)^nf^nU(f)]$. Equation (2.35c) similarly shows that when $u(t)$ and $[t^nu(t)]$ satisfy requirements (V) through (VIII) in Sec. 2.4 and $U(f)$ is the forward Fourier transform of $u(t)$, the forward Fourier transform of $[t^nu(t)]$ is

$$
\frac{1}{(-2\pi i)^n}\frac{d^nU}{df^n} .
$$

We introduce the notation "$\leftrightarrow$" to show this sort of Fourier-transform relationship between functions, adopting the convention that the function on the right is always the forward Fourier transform of the function on the left and the function on the left is always the inverse Fourier transform of the function on the right. The results of the above analysis can then be written as

$$
\frac{d^nu}{dt^n} \leftrightarrow (2\pi i)^nf^nU(f) \tag{2.35e}
$$

and

$$
t^nu(t) \leftrightarrow \frac{1}{(-2\pi i)^n}\frac{d^nU}{df^n} . \tag{2.35f}
$$
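
A quick numerical check of (2.35e) and (2.35f) for n = 1 can be made with the Gaussian $u(t) = e^{-\pi t^2}$, whose forward transform under the convention of (2.29a) is $U(f) = e^{-\pi f^2}$. The sketch below approximates the transform integral by a Riemann sum; the grid, the test frequencies, and the helper name `forward_ft` are assumptions of this illustration.

```python
import numpy as np

t = np.linspace(-8.0, 8.0, 3201)                  # time grid; the Gaussian is negligible at the ends
dt = t[1] - t[0]
f = np.array([-1.5, -0.5, 0.0, 0.25, 1.0, 2.0])   # a few test frequencies

def forward_ft(samples):
    """Riemann-sum approximation of the integral of samples(t) * exp(-2*pi*i*f*t) dt."""
    kernel = np.exp(-2j * np.pi * np.outer(f, t))
    return (kernel @ samples) * dt

u = np.exp(-np.pi * t**2)                         # u(t)
du = -2.0 * np.pi * t * u                         # du/dt, computed analytically

U = forward_ft(u)                                 # should approximate exp(-pi f^2)
lhs_derivative = forward_ft(du)                   # transform of du/dt
rhs_derivative = (2j * np.pi * f) * U             # (2 pi i f) U(f), Eq. (2.35e) with n = 1

dU = -2.0 * np.pi * f * np.exp(-np.pi * f**2)     # dU/df, computed analytically
lhs_moment = forward_ft(t * u)                    # transform of t u(t)
rhs_moment = dU / (-2j * np.pi)                   # Eq. (2.35f) with n = 1
```

Because the Gaussian is smooth and decays rapidly, the Riemann sum is very accurate here; for slowly decaying functions a finer or wider grid would be needed.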

For the integral of any complex function $c(t)$, the inequality

$$
\left|\int_a^b c(t)\,dt\right| \le \int_a^b \left|c(t)\right| dt \tag{2.35g}
$$

must hold true for any two real values of $a$ and $b$ where $a \le b$. When $u(t)$ is real, so is its $n$th derivative, and we can write

$$
\left|\int_{-\infty}^{\infty}\frac{d^nu}{dt^n}e^{-2\pi ift}\,dt\right| \le \int_{-\infty}^{\infty}\left|\frac{d^nu}{dt^n}e^{-2\pi ift}\right| dt = \int_{-\infty}^{\infty}\left|\frac{d^nu}{dt^n}\right|\cdot\left|e^{-2\pi ift}\right| dt ,
$$

which reduces to, since $\left|e^{-2\pi ift}\right| = 1$,

$$
\left|\int_{-\infty}^{\infty}\frac{d^nu}{dt^n}e^{-2\pi ift}\,dt\right| \le \int_{-\infty}^{\infty}\left|\frac{d^nu}{dt^n}\right| dt . \tag{2.35h}
$$

Because we are supposing the Fourier transform of $d^nu/dt^n$ to exist, the existence requirement in Eq. (2.13a) shows that

$$
\int_{-\infty}^{\infty}\left|\frac{d^nu}{dt^n}\right| dt
$$

is finite. Hence, inequality (2.35h) requires

$$
\left|\int_{-\infty}^{\infty}\frac{d^nu}{dt^n}e^{-2\pi ift}\,dt\right|
$$

also to be finite, which means that we can assume that it is less than or equal to some finite real and non-negative number $B$ for all values of $f$:

$$
\left|\int_{-\infty}^{\infty}\frac{d^nu}{dt^n}e^{-2\pi ift}\,dt\right| \le B . \tag{2.35i}
$$

Formula (2.35e) states that

$$
\int_{-\infty}^{\infty}\frac{d^nu}{dt^n}e^{-2\pi ift}\,dt = (2\pi)^ni^nf^nU(f) , \tag{2.35j}
$$

where

$$
U(f) = \int_{-\infty}^{\infty} u(t)e^{-2\pi ift}\,dt
$$

is, of course, the Fourier transform of $u(t)$. Taking the magnitude of the complex values of both sides of (2.35j) and remembering that $\left|i^n\right| = 1$ shows that

$$
\left|\int_{-\infty}^{\infty}\frac{d^nu}{dt^n}e^{-2\pi ift}\,dt\right| = (2\pi)^n\left|f\right|^n\left|U(f)\right| ,
$$

which becomes, applying inequality (2.35i),

$$
B \ge (2\pi)^n\left|f\right|^n\left|U(f)\right|
$$

or

$$
\left|U(f)\right| \le \frac{B}{(2\pi)^n}\left|f\right|^{-n} . \tag{2.35k}
$$

Hence, when the Fourier transform of the $n$th derivative of $u(t)$ exists, we know that the magnitude $\left|U(f)\right|$ of the Fourier transform of $u$ decreases at least as fast as $\left|f\right|^{-n}$ for large values of $f$.
We next examine a set of identities often called the Fourier shift theorem. When $U(f)$ is the forward Fourier transform of $u(t)$,

$$
U(f) = \int_{-\infty}^{\infty} u(t)e^{-2\pi ift}\,dt ,
$$

and $u(t)$ is shifted to the right by an amount $a$,

$$
u(t) \rightarrow u(t - a) ,
$$

then the forward Fourier transform of $u(t - a)$ is, changing the variable of integration to $t' = t - a$,

$$
\begin{aligned}
\int_{-\infty}^{\infty} u(t - a)e^{-2\pi ift}\,dt &= \int_{-\infty}^{\infty} u(t')e^{-2\pi if(t' + a)}\,dt' \\
&= e^{-2\pi ifa}\int_{-\infty}^{\infty} u(t')e^{-2\pi ift'}\,dt' = e^{-2\pi ifa}U(f) .
\end{aligned}
$$

Hence the forward Fourier transform of $u(t - a)$ is $e^{-2\pi ifa}U(f)$ when the forward Fourier transform of $u(t)$ is $U(f)$, which we can write as

$$
\text{If } u(t) \leftrightarrow U(f) \text{ then } u(t - a) \leftrightarrow e^{-2\pi ifa}U(f) . \tag{2.36a}
$$

In terms of the Fourier $\mathcal{F}$ operator, we have

$$
\mathcal{F}^{(-ift)}(u(t - a)) = e^{-2\pi ifa}\mathcal{F}^{(-ift)}(u(t)) . \tag{2.36b}
$$

Working with the inverse Fourier transform of $U(f - f_0)$ and changing the variable of integration to $f' = f - f_0$, we see that

$$
\int_{-\infty}^{\infty} U(f - f_0)e^{2\pi ift}\,df = e^{2\pi if_0t}\int_{-\infty}^{\infty} U(f')e^{2\pi if't}\,df' = e^{2\pi if_0t}u(t) \tag{2.36c}
$$

or

$$
e^{2\pi if_0t}u(t) \leftrightarrow U(f - f_0) . \tag{2.36d}
$$

The $\mathcal{F}$ operator lets us write this result as

$$
\mathcal{F}^{(itf)}(U(f - f_0)) = e^{2\pi if_0t}\mathcal{F}^{(itf)}(U(f)) \tag{2.36e}
$$

or

$$
\mathcal{F}^{(-ift)}\left(e^{2\pi if_0t}u(t)\right) = U(f - f_0) = \mathcal{F}^{(-i(f - f_0)t)}(u(t)) . \tag{2.36f}
$$

Equations (2.36d)–(2.36f) show that multiplying $u(t)$ by $e^{2\pi if_0t}$ shifts $U(f)$, the forward Fourier transform of $u(t)$, to the right by a frequency $f_0$. By interchanging the roles of $t$ and $f$ (and replacing $U$ by $u$ and $f_0$ by $a$) in (2.36e) and comparing the result to (2.36b), we see the two equations can be combined into one formula:

$$
\mathcal{F}^{(\pm ift)}(u(t - a)) = e^{\pm 2\pi ifa}\mathcal{F}^{(\pm ift)}(u(t)) . \tag{2.36g}
$$

This last result can also be written as, defining a new constant $b = -a$,

$$
\int_{-\infty}^{\infty} u(t + b)e^{\pm 2\pi ift}\,dt = e^{\mp 2\pi ifb}\int_{-\infty}^{\infty} u(t)e^{\pm 2\pi ift}\,dt \tag{2.36h}
$$

or

$$
\mathcal{F}^{(\pm ift)}(u(t + b)) = e^{\mp 2\pi ifb}\mathcal{F}^{(\pm ift)}(u(t)) . \tag{2.36i}
$$
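
Both halves of the shift theorem can be spot-checked numerically. The sketch below again uses the Gaussian $e^{-\pi t^2}$ and a Riemann-sum approximation of the forward transform; the shift values `a` and `f0`, the grids, and the helper names are arbitrary choices made for this illustration.

```python
import numpy as np

t = np.linspace(-12.0, 12.0, 4801)     # time grid wide enough for the shifted Gaussian
dt = t[1] - t[0]
f = np.linspace(-2.0, 2.0, 9)          # test frequencies
a = 0.7                                # time shift (arbitrary)
f0 = 0.4                               # frequency shift (arbitrary)

def u(x):
    """Test function: a unit Gaussian."""
    return np.exp(-np.pi * x**2)

def forward_ft(samples, freqs):
    """Riemann-sum approximation of the forward transform at the given frequencies."""
    kernel = np.exp(-2j * np.pi * np.outer(freqs, t))
    return (kernel @ samples) * dt

U = forward_ft(u(t), f)

# Time shift, Eq. (2.36a): u(t - a) <-> exp(-2 pi i f a) U(f)
time_shift_lhs = forward_ft(u(t - a), f)
time_shift_rhs = np.exp(-2j * np.pi * f * a) * U

# Frequency shift, Eq. (2.36d): exp(2 pi i f0 t) u(t) <-> U(f - f0)
freq_shift_lhs = forward_ft(np.exp(2j * np.pi * f0 * t) * u(t), f)
freq_shift_rhs = forward_ft(u(t), f - f0)
```

A shift in time changes only the phase of the transform, so `np.abs(time_shift_lhs)` matches `np.abs(U)`.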

The next set of identities is sometimes called the Fourier scaling theorem. If $U(f)$ is the forward Fourier transform of $u(t)$ and the argument of $u$ is scaled by the real constant $a$,

\[
u(t) \to u(at),
\]

then the forward Fourier transform of $u(at)$ is, letting $t' = at$,

\[
\int_{-\infty}^{\infty} u(at)\,e^{-2\pi i f t}\,dt
= \frac{1}{|a|}\int_{-\infty}^{\infty} u(t')\,e^{-2\pi i \left(\frac{f t'}{a}\right)}\,dt'
= \frac{1}{|a|}\,U\!\left(\frac{f}{a}\right).
\]

This can be written as

\[
u(at) \leftrightarrow \frac{1}{|a|}\,U\!\left(\frac{f}{a}\right) \tag{2.37a}
\]

or

\[
F^{(-ift)}\bigl(u(at)\bigr) = \frac{1}{|a|}\,F^{(-i(f/a)t)}\bigl(u(t)\bigr). \tag{2.37b}
\]

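We also have, scaling the frequency by a positive constant $a$ and letting $f' = af$, that

\[
\int_{-\infty}^{\infty} U(af)\,e^{2\pi i f t}\,df
= \frac{1}{a}\int_{-\infty}^{\infty} U(f')\,e^{2\pi i \left(\frac{f' t}{a}\right)}\,df'
= \frac{1}{a}\,u\!\left(\frac{t}{a}\right).
\]

This can be written as

\[
\frac{1}{a}\,u\!\left(\frac{t}{a}\right) \leftrightarrow U(af) \quad\text{for } a > 0 \tag{2.37c}
\]

or

\[
F^{(itf)}\bigl(U(af)\bigr) = \frac{1}{a}\,F^{(i(t/a)f)}\bigl(U(f)\bigr) \quad\text{for } a > 0. \tag{2.37d}
\]

Equation (2.37b) and (after interchanging the roles of $f$ and $t$) Eq. (2.37d) can be combined into the single formula,

\[
F^{(\pm ift)}\bigl(u(at)\bigr) = \frac{1}{a}\,F^{(\pm i(f/a)t)}\bigl(u(t)\bigr) \quad\text{for } a > 0. \tag{2.37e}
\]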

Because $u(t)$ must satisfy requirements (V) through (VIII) in Sec. 2.4 for these results to be true, and in particular it must satisfy requirement (V) that it be absolutely integrable, there may well be only a finite region of $t$ over which $u(t)$ is significantly different from zero. When $0 < a < 1$ so that the range of $t$ over which $u$ is significantly different from zero expands, formula (2.37a) shows that the region of $f$ over which $U(f)$ is significantly different from zero shrinks; and, of course, when $a > 1$, just the opposite occurs. For $0 < a < 1$, function $u(at)$ more closely resembles $\sin(2\pi f t)$ and $\cos(2\pi f t)$ for smaller values of $f$, explaining why the region of $f$ for which $U$ is significantly different from zero shrinks; and when $a > 1$, function $u(at)$ more closely resembles $\sin(2\pi f t)$ and $\cos(2\pi f t)$ for larger values of $f$, explaining why the region of $f$ for which $U$ is significantly different from zero expands. We also note that if $f = 1/(2\pi)$, so that $\sin(2\pi f t) = \sin(t)$ and $\cos(2\pi f t) = \cos(t)$, then the sine and cosine can change significantly in value only when $t$ changes by at least

\[
\Delta t_{\min} \approx O(1).
\]

Suppose $t$ must also change by at least $\Delta t_{\min} \approx O(1)$ for a significant change in $u(t)$ to occur, which means that $\sin(2\pi f t) = \sin(t)$ and $\cos(2\pi f t) = \cos(t)$ vary about as fast with respect to $t$ as $u$ does; that is, $\sin(t)$ and $\cos(t)$ "resemble" $u$ somewhat. Recalling the heuristic reasoning used in Sec. 2.1 to introduce and justify the sine and cosine integrals, we now expect $U(f)$ to be significantly different from zero when $f \approx 1/(2\pi)$. Suppose next that $t$ changes by less than $\Delta t_{\min} \approx O(1)$ so that $u$ does not change significantly in value, remaining almost constant. Now when $f$ becomes significantly larger than $1/(2\pi)$, functions $\sin(2\pi f t)$ and $\cos(2\pi f t)$ oscillate ever more rapidly so that they change significantly in value for changes in $t$ that are ever smaller than $\Delta t_{\min}$. For these larger values of $f$, the sine and cosine do not much resemble $u(t)$, forcing the Fourier transform $U(f)$ to be negligible or zero for $f > O(1/(2\pi))$.

We can modify the original function $u$ by creating a new function $u_\varepsilon(t) = u(t/\varepsilon)$ for $\varepsilon > 0$. Now $t$ must change by at least an $O(\varepsilon)$ amount for $u_\varepsilon$ to change significantly; and when $t$ changes by less than $O(\varepsilon)$, function $u_\varepsilon$ does not change significantly in value. We know from (2.37a) with $a = 1/\varepsilon$ that the forward Fourier transform of $u_\varepsilon$ is $U_\varepsilon(f) = \varepsilon\,U(\varepsilon f)$. Hence, when $f$ is larger than $O(1/(2\pi\varepsilon))$, it must be true that $U_\varepsilon(f)$ is negligible or zero, since this is the same as having $f > O(1/(2\pi))$ in $U(f)$. Because $2\pi$ is often regarded as an $O(1)$ quantity, this result can also be interpreted as showing that $U_\varepsilon(f)$ must be negligible or zero for $f > O(1/\varepsilon)$. Since the original Fourier transform pair

\[
u(t) \leftrightarrow U(f)
\]

is left unspecified, $u_\varepsilon$ in fact represents any function $v(t)$ where $t$ must change by at least an $O(\varepsilon)$ amount for a significant change in $v$ to occur. Consequently, we can conclude that if $t$ must change by at least an $O(\varepsilon)$ amount for $v(t)$ to change significantly, then the forward Fourier transform of $v(t)$ must be negligible or zero for $f > O(1/\varepsilon)$. The arguments leading to this conclusion work just as well when we consider the inverse Fourier transform in Eqs. (2.37c) and (2.37e). Therefore, this more general result is also true: if $v(t)$ is a function such that $t$ must change by at least an $O(\varepsilon)$ amount for a significant change in $v$ to occur, then the forward or inverse Fourier transform,

\[
V(f) = \int_{-\infty}^{\infty} v(t)\,e^{\pm 2\pi i f t}\,dt,
\]

is negligible or zero for $f > O(1/\varepsilon)$.
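A concrete pair may help fix the orders of magnitude in this argument. The worked example below is an illustration only: it assumes the standard Gaussian transform pair $e^{-\pi t^2} \leftrightarrow e^{-\pi f^2}$, which is not derived in this section, and it writes the positive scale factor as $\varepsilon$, a label chosen here.

```latex
u(t) = e^{-\pi t^{2}} \;\leftrightarrow\; U(f) = e^{-\pi f^{2}},
\qquad
u_{\varepsilon}(t) = u(t/\varepsilon) = e^{-\pi t^{2}/\varepsilon^{2}}
\;\leftrightarrow\;
U_{\varepsilon}(f) = \varepsilon\,U(\varepsilon f) = \varepsilon\,e^{-\pi \varepsilon^{2} f^{2}} .

% Ratio of the transform to its peak value at f = 0:
\frac{U_{\varepsilon}(1/\varepsilon)}{U_{\varepsilon}(0)} = e^{-\pi} \approx 0.043,
\qquad
\frac{U_{\varepsilon}(2/\varepsilon)}{U_{\varepsilon}(0)} = e^{-4\pi} \approx 3.5\times 10^{-6} .
```

Here $u_\varepsilon$ changes appreciably only when $t$ changes by an amount of order $\varepsilon$, and its transform has already fallen to a few percent of its peak at $f = 1/\varepsilon$, consistent with the heuristic conclusion that the transform is negligible for $f$ larger than $O(1/\varepsilon)$.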


2.9 Fourier Convolution Theorem


It is hard to overstate the importance of the Fourier convolution theorem; it plays a fundamental
role in linear signal theory and structures the thinking of many different engineering
disciplines—signal processing, electrical engineering, image analysis, and servomechanism
design, to name but a few.
We define the convolution of two functions $u(t)$ and $v(t)$ to be

\[
u(t) * v(t) = \int_{-\infty}^{\infty} u(t')\,v(t-t')\,dt'. \tag{2.38a}
\]

Here, $u$ and $v$ may be complex functions but their argument $t$ is assumed to be real. The convolution is commutative and associative. It is commutative because making the substitution $t'' = t - t'$ gives

\[
u(t) * v(t) = \int_{-\infty}^{\infty} u(t')\,v(t-t')\,dt'
= -\int_{\infty}^{-\infty} u(t-t'')\,v(t'')\,dt''
= \int_{-\infty}^{\infty} v(t'')\,u(t-t'')\,dt'',
\]

showing that

\[
u(t) * v(t) = v(t) * u(t). \tag{2.38b}
\]

The convolution is associative because for three complex functions $u(t)$, $v(t)$, and $h(t)$ with real argument $t$ we can write, changing the variable of integration to $t''' = t'' - t'$,

\[
\begin{aligned}
\bigl[u(t) * v(t)\bigr] * h(t)
&= \int_{-\infty}^{\infty} dt''\,h(t-t'')\int_{-\infty}^{\infty} dt'\,u(t')\,v(t''-t')
 = \int_{-\infty}^{\infty} dt'\,u(t')\int_{-\infty}^{\infty} dt''\,h(t-t'')\,v(t''-t') \\
&= \int_{-\infty}^{\infty} dt'\,u(t')\int_{-\infty}^{\infty} dt'''\,v(t''')\,h\bigl((t-t')-t'''\bigr)
 = u(t) * \bigl[v(t) * h(t)\bigr].
\end{aligned}
\]

Hence,

\[
\bigl[u(t) * v(t)\bigr] * h(t) = u(t) * \bigl[v(t) * h(t)\bigr]. \tag{2.38c}
\]

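The convolution is a linear operation, because for any two complex constants $\alpha$ and $\beta$,

\[
\begin{aligned}
h(t) * \bigl[\alpha\,u(t) + \beta\,v(t)\bigr]
&= \int_{-\infty}^{\infty} h(t')\,\bigl[\alpha\,u(t-t') + \beta\,v(t-t')\bigr]\,dt' \\
&= \alpha\int_{-\infty}^{\infty} h(t')\,u(t-t')\,dt' + \beta\int_{-\infty}^{\infty} h(t')\,v(t-t')\,dt',
\end{aligned}
\]

showing that

\[
h(t) * \bigl(\alpha\,u(t) + \beta\,v(t)\bigr) = \alpha\,\bigl[h(t) * u(t)\bigr] + \beta\,\bigl[h(t) * v(t)\bigr]. \tag{2.38d}
\]

Because the convolution is commutative, the equation can also be written as

\[
\bigl[\alpha\,u(t) + \beta\,v(t)\bigr] * h(t) = \alpha\,\bigl[u(t) * h(t)\bigr] + \beta\,\bigl[v(t) * h(t)\bigr]. \tag{2.38e}
\]

This shows that the convolution is linear on both the left-hand and right-hand sides of the $*$.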
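The convolution of two even functions or two odd functions is an even function. If $u(t)$ and $v(t)$ are both even or both odd, then we have, using $t'' = -t'$,

\[
\begin{aligned}
u(-t) * v(-t) &= \int_{-\infty}^{\infty} u(t')\,v(-t-t')\,dt'
= -\int_{\infty}^{-\infty} u(-t'')\,v(-t+t'')\,dt'' \\
&= \int_{-\infty}^{\infty} u(t'')\,v(t-t'')\,dt'' = u(t) * v(t).
\end{aligned} \tag{2.38f}
\]

When $u$ is even and $v$ is odd, or $u$ is odd and $v$ is even, then we have

\[
\begin{aligned}
u(-t) * v(-t) &= \int_{-\infty}^{\infty} u(t')\,v(-t-t')\,dt'
= -\int_{\infty}^{-\infty} u(-t'')\,v(-t+t'')\,dt'' \\
&= -\int_{-\infty}^{\infty} u(t'')\,v(t-t'')\,dt'' = -u(t) * v(t).
\end{aligned} \tag{2.38g}
\]

Here $u(-t) * v(-t)$ denotes the convolution evaluated at $-t$. Hence, the convolution of an even and an odd function is always odd.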
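The commutativity and parity properties of the convolution can be spot-checked numerically. The sketch below is an illustration under stated assumptions rather than anything from the original text: the functions `even_fn` and `odd_fn`, the helper `convolve`, the integration window, and the sample point `t` are all arbitrary choices, and the integral is approximated by a midpoint sum.

```python
import math


def convolve(u, v, t, t_max=10.0, n=2000):
    """Midpoint-rule estimate of (u * v)(t), the integral of u(t') v(t - t') dt'."""
    dt = 2.0 * t_max / n
    total = 0.0
    for k in range(n):
        tp = -t_max + (k + 0.5) * dt
        total += u(tp) * v(t - tp)
    return total * dt


def even_fn(t):
    return math.exp(-t * t)        # assumed even test function


def odd_fn(t):
    return t * math.exp(-t * t)    # assumed odd test function


t = 0.8
# Commutativity: u * v and v * u agree at t.
print(convolve(even_fn, odd_fn, t) - convolve(odd_fn, even_fn, t))
# Even * even and odd * odd are even: values at t and -t agree.
print(convolve(even_fn, even_fn, t) - convolve(even_fn, even_fn, -t))
print(convolve(odd_fn, odd_fn, t) - convolve(odd_fn, odd_fn, -t))
# Even * odd is odd: values at t and -t sum to zero.
print(convolve(even_fn, odd_fn, t) + convolve(even_fn, odd_fn, -t))
```

Each printed number should be zero to within rounding and quadrature error.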


If $u$ and $v$ have more than one argument so that they are written $u(y, x_1, x_2, \ldots)$ and $v(y, x_1', x_2', \ldots)$, then we adopt the convention that the convolution

\[
u(y, x_1, x_2, \ldots) * v(y, x_1', x_2', \ldots)
\]

is over variable $y$ rather than variables $x_1, x_1', x_2, x_2', \ldots$,

\[
u(y, x_1, x_2, \ldots) * v(y, x_1', x_2', \ldots) = \int_{-\infty}^{\infty} u(y', x_1, x_2, \ldots)\,v(y-y', x_1', x_2', \ldots)\,dy',
\]

because $y$ is the only argument repeated on both sides of the $*$.


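To derive the Fourier convolution theorem, we take the forward or inverse transform of $u(t) * v(t)$ to get

\[
\begin{aligned}
F^{(\pm ift)}\bigl(u(t) * v(t)\bigr)
&= \int_{-\infty}^{\infty} e^{\pm 2\pi i f t}\,\bigl[u(t) * v(t)\bigr]\,dt
 = \int_{-\infty}^{\infty} dt\,e^{\pm 2\pi i f t}\int_{-\infty}^{\infty} dt'\,u(t')\,v(t-t') \\
&= \int_{-\infty}^{\infty} dt'\,u(t')\int_{-\infty}^{\infty} dt\,e^{\pm 2\pi i f t}\,v(t-t').
\end{aligned}
\]

Changing the variable of integration in the inner integral to $t'' = t - t'$ gives

\[
\begin{aligned}
F^{(\pm ift)}\bigl(u(t) * v(t)\bigr)
&= \int_{-\infty}^{\infty} dt'\,u(t')\,e^{\pm 2\pi i f t'}\int_{-\infty}^{\infty} dt''\,e^{\pm 2\pi i f t''}\,v(t'') \\
&= \left[\int_{-\infty}^{\infty} dt'\,u(t')\,e^{\pm 2\pi i f t'}\right]\cdot\left[\int_{-\infty}^{\infty} dt''\,e^{\pm 2\pi i f t''}\,v(t'')\right]
\end{aligned}
\]

or

\[
F^{(\pm ift)}\bigl(u(t) * v(t)\bigr) = F^{(\pm ift)}\bigl(u(t)\bigr)\cdot F^{(\pm ift)}\bigl(v(t)\bigr). \tag{2.39a}
\]

If $U(f)$ and $V(f)$ are the forward Fourier transforms of $u(t)$ and $v(t)$ respectively, we can choose the minus sign of (2.39a) to get

\[
\int_{-\infty}^{\infty} e^{-2\pi i f t}\,\bigl[u(t) * v(t)\bigr]\,dt = U(f)\cdot V(f), \tag{2.39b}
\]

which shows that

\[
u(t) * v(t) \leftrightarrow U(f)\cdot V(f). \tag{2.39c}
\]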

Equation (2.28ℓ) can be written as, for any function $g(t)$ after interchanging the roles of $t$ and $t'$,

\[
F^{(\pm it'f)}\Bigl(F^{(\mp ift)}\bigl(g(t)\bigr)\Bigr) = g(t'). \tag{2.39d}
\]

We replace $F^{(\pm)}$ by $F^{(\mp)}$ on the right-hand side of Eq. (2.39a), which is just a change in the order in which the two possible signs of the exponent are listed, and then take $F^{(\pm it'f)}$ of both sides to get that, applying (2.39d) with $g(t) = u(t) * v(t)$,

\[
u(t') * v(t') = F^{(\pm it'f)}\Bigl(F^{(\mp ift)}\bigl(u(t)\bigr)\cdot F^{(\mp ift)}\bigl(v(t)\bigr)\Bigr). \tag{2.39e}
\]

Because $u(t)$ and $v(t)$ represent arbitrary, Fourier-transformable functions of $t$, $F^{(\mp ift)}(u(t))$ and $F^{(\mp ift)}(v(t))$ must be arbitrary, Fourier-transformable functions of $f$, which we can call $U^{(\mp)}$ and $V^{(\mp)}$ respectively,

\[
U^{(\mp)}(f) = F^{(\mp ift)}\bigl(u(t)\bigr) \tag{2.39f}
\]

and

\[
V^{(\mp)}(f) = F^{(\mp ift)}\bigl(v(t)\bigr). \tag{2.39g}
\]

Applying this notation to (2.39d), first with $g(t) = u(t)$ and then with $g(t) = v(t)$, we see that

\[
F^{(\pm it'f)}\bigl(U^{(\mp)}(f)\bigr) = u(t') \tag{2.39h}
\]

and

\[
F^{(\pm it'f)}\bigl(V^{(\mp)}(f)\bigr) = v(t'). \tag{2.39i}
\]

Hence Eq. (2.39e) can be written as

\[
F^{(\pm it'f')}\bigl(U^{(\mp)}(f')\bigr) * F^{(\pm it'f'')}\bigl(V^{(\mp)}(f'')\bigr) = F^{(\pm it'f)}\bigl(U^{(\mp)}(f)\cdot V^{(\mp)}(f)\bigr),
\]

where the convolution is over $t'$ because it is the only argument repeated on both sides of the $*$. Since $U^{(\mp)}$ and $V^{(\mp)}$ are arbitrary, transformable functions, we can replace them by the arbitrary transformable functions $u$ and $v$ to get, after interchanging the roles of $f$ and $t'$,

\[
F^{(\pm ift')}\bigl(u(t')\cdot v(t')\bigr) = F^{(\pm ift'')}\bigl(u(t'')\bigr) * F^{(\pm ift''')}\bigl(v(t''')\bigr).
\]

This can be simplified by dropping a prime from each of the $t$'s:

\[
F^{(\pm ift)}\bigl(u(t)\cdot v(t)\bigr) = F^{(\pm ift')}\bigl(u(t')\bigr) * F^{(\pm ift'')}\bigl(v(t'')\bigr). \tag{2.39j}
\]

If $U(f)$ and $V(f)$ are the forward Fourier transforms of $u(t)$ and $v(t)$ respectively, we can choose the minus sign of (2.39j) to get

\[
\int_{-\infty}^{\infty} e^{-2\pi i f t}\,u(t)\cdot v(t)\,dt = U(f) * V(f) \tag{2.39k}
\]

or

\[
u(t)\cdot v(t) \leftrightarrow U(f) * V(f). \tag{2.39ℓ}
\]

Equation (2.39b) shows that the forward Fourier transform of the convolution of two functions
is the product of the forward Fourier transform of each function, and (2.39k) shows that the
forward Fourier transform of the product of two functions is the convolution of the forward
Fourier transform of each function. Equations (2.39a) and (2.39j) show that everything we just
said about the forward Fourier transform still holds true when we take the reverse Fourier
transform of the product of two functions or of the convolution of two functions.
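The statement that the forward transform of a convolution equals the product of the forward transforms can be tested directly on a pair of Gaussians. This is a rough numerical sketch, not the author's method: the signals `u` and `v`, the helper names `integrate`, `transform`, and `convolve`, the window $[-8, 8]$, and the frequency `f` are assumptions made for the illustration.

```python
import cmath
import math


def integrate(g, lo=-8.0, hi=8.0, n=400):
    """Midpoint-rule estimate of the integral of g over [lo, hi]."""
    dx = (hi - lo) / n
    return sum(g(lo + (k + 0.5) * dx) for k in range(n)) * dx


def transform(u, f):
    """Forward transform: integral of u(t) exp(-2 pi i f t) dt."""
    return integrate(lambda t: u(t) * cmath.exp(-2j * math.pi * f * t))


def convolve(u, v):
    """Return the function t -> (u * v)(t)."""
    return lambda t: integrate(lambda tp: u(tp) * v(t - tp))


def u(t):
    return math.exp(-math.pi * t * t)        # assumed "true signal"


def v(t):
    return math.exp(-2.0 * math.pi * t * t)  # assumed narrower blurring function


f = 0.5
lhs = transform(convolve(u, v), f)           # transform of the convolution
rhs = transform(u, f) * transform(v, f)      # product of the transforms
print(abs(lhs - rhs))                        # should be close to zero
```

The nested midpoint sums cost on the order of $400^2$ function evaluations, which is acceptable for a check at a single frequency but would be a poor way to compute a whole spectrum.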
When using the Fourier convolution theorem, we usually regard one of the two convolved
functions as representing the undisturbed signal—that is, the true set of values for what is to be
measured—and the other—usually much more narrow—function as specifying the blurring or
smearing effect of an imperfect measurement. The blurring or smearing function has different
names in different engineering disciplines; optical engineers often call it the instrument-response
or instrument line-shape function. In Fig. 2.5(a), function u is taken to be the true signal, and in
Fig. 2.5(b) function v is the instrument-response or instrument line-shape function. The
convolution

\[
u(t) * v(t) = \int_{-\infty}^{\infty} u(t')\,v(t-t')\,dt' = u_{\text{blur}}(t)
\]

defines the new function $u_{\text{blur}}(t)$ as shown in Figs. 2.5(c)–2.5(e). The function $v$ is flipped left to right and slid along the $t'$ axis in Fig. 2.5(c) by changing the value of $t$. Figure 2.5(d) is a close-up of $v$ at a specific value of $t$, with the shaded region being the area under the product $u(t')\,v(t-t')$. Since $u(t')\,v(t-t')$ is zero where $v(t-t')$ is zero, the area of the shaded region can be found by integrating $u(t')\,v(t-t')$ over $t'$ between $-\infty$ and $+\infty$. This is, of course, just the convolution of $u$ and $v$ for this particular value of $t$, which means the area of the shaded region must be $u_{\text{blur}}(t)$ for this value of $t$. Figure 2.5(e) represents the complete $u_{\text{blur}}(t)$ function for all values of $t$; clearly $u_{\text{blur}}$ has less detail than the original signal $u$.
The $v(t)$ function in Fig. 2.5(b) is an unusual type of instrument response because it is not an even function of $t$. Figure 2.5(f) shows a typical even instrument response $v_e(t)$. When the instrument-response function is $v_e$, the blurred signal is

\[
u_{e,\text{blur}}(t) = u(t) * v_e(t). \tag{2.40a}
\]

The instrument-response function is even, so $v_e(-t) = v_e(t)$ and we can write

\[
u_{e,\text{blur}}(t) = \int_{-\infty}^{\infty} u(t')\,v_e(t-t')\,dt' = \int_{-\infty}^{\infty} u(t')\,v_e(t'-t)\,dt' \tag{2.40b}
\]

with the last integral in (2.40b) making it perhaps more obvious that $u_{e,\text{blur}}$ is a localized and weighted average of $u$ centered on $t$. Instrument-response or line-shape functions are usually designed to be even because an even instrument-response function does not shift the center point of isolated peaks in the true data $u$.
As described in the first chapter, when using Michelson interferometers, we do not much care about the exact shape of the optical intensity signal $u$ but are instead interested in the shape of its transform,

\[
U(f) = F^{(-ift)}\bigl(u(t)\bigr). \tag{2.40c}
\]

In many types of interferometers, $u$ is a signal of time $t$, which means $U$ can be analyzed as a function of $f$, the signal frequency. The electrical circuits transmitting and recording the signal $u$ can never do a perfect job (they always blur and smooth the original signal to some extent), so what we end up with is not $u(t)$ and $U(f)$ but rather $u_{e,\text{blur}}(t)$ and the associated Fourier transform

\[
U_{e,\text{blur}}(f) = F^{(-ift)}\bigl(u_{e,\text{blur}}(t)\bigr). \tag{2.40d}
\]

The relationship between $U_{e,\text{blur}}$ and $U$ must be understood to design the electrical circuits properly. Here is an important example of how to use the Fourier convolution theorem. Substitution of (2.40a) into (2.40d) gives

\[
U_{e,\text{blur}}(f) = F^{(-ift)}\bigl(u(t) * v_e(t)\bigr).
\]

Using the Fourier convolution theorem as presented in Eq. (2.39a), this is rewritten as

\[
U_{e,\text{blur}}(f) = F^{(-ift)}\bigl(u(t)\bigr)\cdot F^{(-ift)}\bigl(v_e(t)\bigr)
\]

or

\[
U_{e,\text{blur}}(f) = U(f)\cdot V_e(f), \tag{2.40e}
\]

where $U(f)$ comes from (2.40c) and we define

\[
V_e(f) = F^{(-ift)}\bigl(v_e(t)\bigr).
\]

[FIGURE 2.5. (a) $u(t)$. (b) $v(t)$. (c) $u(t')$ plotted against $t'$. (d) $v(t-t')$ positioned at a particular $t$ value, together with the product $u(t')\,v(t-t')$, plotted against $t'$. (e) $u_{\text{blur}}(t)$ plotted against $t$. (f) $v_e(t)$ plotted against $t$.]

Equation (2.40e) is a very reassuring result, stating that as long as $V_e(f)$ is known and not zero, we can recover the Fourier transform of the true signal $U(f)$ from $U_{e,\text{blur}}(f)$ by calculating

\[
U(f) = \frac{U_{e,\text{blur}}(f)}{V_e(f)}. \tag{2.40f}
\]

To design the circuits of a Michelson interferometer, we find the frequencies ƒ for which U(ƒ)
must be known and arrange for Ve to be as constant as possible—and definitely not zero—over
these frequencies. It turns out that preserving certain signal frequencies while neglecting others is
a standard problem in electrical circuit design, and it is usually easy to arrange for this to occur.
There is, in fact, a whole branch of electrical engineering called filter theory that describes
exactly how to design circuits where Ve is zero or very small at some frequencies while being
large and quasi-constant at others.
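To see what "as constant as possible, and definitely not zero" can look like in practice, consider a Gaussian instrument response. This is a hypothetical example, not a response function taken from the text: the width parameter $\sigma$ and the unit-area normalization are assumptions made here, and the Gaussian transform pair is a standard result quoted without derivation.

```latex
v_e(t) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-t^{2}/(2\sigma^{2})}
\;\leftrightarrow\;
V_e(f) = e^{-2\pi^{2}\sigma^{2} f^{2}},
\qquad
U(f) = \frac{U_{e,\mathrm{blur}}(f)}{V_e(f)} = e^{\,2\pi^{2}\sigma^{2} f^{2}}\;U_{e,\mathrm{blur}}(f).

% V_e is never zero, and it stays within 5% of V_e(0) = 1 when
2\pi^{2}\sigma^{2} f^{2} \le -\ln(0.95) \approx 0.0513
\quad\Longleftrightarrow\quad
|f| \lesssim \frac{0.051}{\sigma}.
```

Under these assumptions, keeping $V_e$ nearly constant over the frequencies of interest amounts to making the response width $\sigma$ small compared with the reciprocal of the highest frequency at which $U(f)$ must be known.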

2.10 Fourier Transforms and Divergent Integrals


Fourier-transform theory has a history of treating with extreme kindness engineers and scientists
who blindly use its formalism without worrying about whether their manipulations make
mathematical sense. The rule of thumb seems to be that if the final result is mathematically
sound—such as a finite integral or the transform of an obviously transformable function—it
almost never matters whether intermediate steps involve the transforms of functions that
obviously cannot be transformed or even, strictly speaking, are not true functions at all. Any
reasonably comprehensive table of Fourier transforms contains functions that not only violate
requirements (V) through (VIII) in Sec. 2.4 but also have transform integrals that, according to
the standard definition of integration, either diverge or have no well-defined value. This book
shows that these puzzling entries are the modest but ubiquitous legacy of mathematicians who
have extended the meaning of what is meant by an integral and what is meant by a function in
Fourier-transform theory. Their work has not only benefited many scientists and engineers who
no longer have to apologize for the way they solve Fourier-transform problems but has also
helped their students who no longer need to accept without good explanations divergent integrals
and the transforms of poorly defined functions.
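The standard definition of an improper integral

\[
\int_{-\infty}^{\infty} u(t)\,dt
\]

for the function $u(t)$ is that

\[
\int_{-\infty}^{\infty} u(t)\,dt = \lim_{\substack{T_1\to\infty \\ T_2\to\infty}} \int_{-T_1}^{T_2} u(t)\,dt.
\]

If there is any singular point $t_s$ where $\lim_{t\to t_s} u(t) = \pm\infty$, the definition becomes

\[
\int_{-\infty}^{\infty} u(t)\,dt = \lim_{\substack{T_1\to\infty,\;T_2\to\infty \\ \varepsilon_1\to 0,\;\varepsilon_2\to 0}} \left[\int_{-T_1}^{t_s-\varepsilon_1} u(t)\,dt + \int_{t_s+\varepsilon_2}^{T_2} u(t)\,dt\right]. \tag{2.41a}
\]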

In this definition, the limits as $T_1\to\infty$, $T_2\to\infty$, $\varepsilon_1\to 0$, and $\varepsilon_2\to 0$ occur independently; no matter how $T_1$, $T_2$, $\varepsilon_1$, and $\varepsilon_2$ approach their limits, the same answer is expected if the integral exists. We now decide, in the interest of expanding Fourier-transform theory, to change this standard definition of improper integral by connecting $\varepsilon_1$ to $\varepsilon_2$ and $T_1$ to $T_2$ as we take the limit,

\[
\int_{-\infty}^{\infty} u(t)\,dt = \lim_{\substack{T\to\infty \\ \varepsilon\to 0}} \left[\int_{-T}^{t_s-\varepsilon} u(t)\,dt + \int_{t_s+\varepsilon}^{T} u(t)\,dt\right]. \tag{2.41b}
\]

The limiting process in definition (2.41b) is said to give the Cauchy principal value of the integral, sometimes written as

\[
\mathrm{PV}\!\int_{-\infty}^{\infty} u(t)\,dt \quad\text{or}\quad -\!\!\!\!\!\!\int_{-\infty}^{\infty} u(t)\,dt.
\]

If $u(t)$ has multiple singular points, the definition is expanded in the obvious way. For example, with two singular points at $t_{s1}$ and $t_{s2}$ with $t_{s1} < t_{s2}$, we have

\[
\mathrm{PV}\!\int_{-\infty}^{\infty} u(t)\,dt = \lim_{\substack{T\to\infty \\ \varepsilon_1\to 0 \\ \varepsilon_2\to 0}} \left[\int_{-T}^{t_{s1}-\varepsilon_1} u(t)\,dt + \int_{t_{s1}+\varepsilon_1}^{t_{s2}-\varepsilon_2} u(t)\,dt + \int_{t_{s2}+\varepsilon_2}^{T} u(t)\,dt\right] \tag{2.41c}
\]

and so on for three, four, etc., interior points of singularity in $u(t)$. If an improper integral converges to a finite value in the standard sense of (2.41a), then its Cauchy principal value also converges to the same answer, but many improper integrals that do not converge in the sense of (2.41a) nevertheless have well-defined Cauchy principal values. For this reason, it is customary in Fourier-transform theory to interpret all improper integrals, such as the forward and inverse Fourier transforms, as Cauchy principal values, and that is what we shall do from now on. There will be no special notation used to distinguish Cauchy principal values from ordinary improper integrals.
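The difference between independent limits and tied limits is easy to see for the integrand $1/t$, whose truncated integral has a closed form. The sketch below is only an illustration: the function name `truncated_integral` and the particular sequences chosen for the cutoffs are assumptions, and the singular point is taken to be $t_s = 0$.

```python
import math


def truncated_integral(T1, T2, eps1, eps2):
    """Integral of dt/t over [-T1, -eps1] plus [eps2, T2], evaluated in closed form."""
    return -math.log(T1 / eps1) + math.log(T2 / eps2)


for n in (2, 4, 8):
    T, eps = 10.0 ** n, 10.0 ** (-n)
    symmetric = truncated_integral(T, T, eps, eps)        # tied limits (principal value)
    lopsided = truncated_integral(T, 2.0 * T, eps, eps)   # upper cutoff grows twice as fast
    print(n, symmetric, lopsided)
```

With tied cutoffs the result is zero at every stage, while letting the upper cutoff run at twice the lower one gives $\ln 2$ at every stage, so the value obtained depends on how the limits are approached unless they are connected.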

To show the relevance of the Cauchy principal value, we calculate the Fourier transform of $1/t$, an example already considered above in connection with the sine transform [see discussion following Eq. (2.10e)]. Using the identity $e^{i\theta} = \cos(\theta) + i\sin(\theta)$, we have

\[
F^{(-ift)}(t^{-1}) = \int_{-\infty}^{\infty} e^{-2\pi i f t}\,t^{-1}\,dt = \int_{-\infty}^{\infty} \cos(2\pi f t)\,t^{-1}\,dt - i\int_{-\infty}^{\infty} \sin(2\pi f t)\,t^{-1}\,dt. \tag{2.42a}
\]

There is no problem evaluating the imaginary part of this transform. Because $[t^{-1}\sin(2\pi f t)]$ is an even function of $t$, we can apply formulas (2.19) and (2.10f) to get

\[
-i\int_{-\infty}^{\infty} \sin(2\pi f t)\,t^{-1}\,dt = -2i\int_{0}^{\infty} \sin(2\pi f t)\,t^{-1}\,dt = -i\pi \quad\text{for } f > 0.
\]

When $f < 0$, we have

\[
-i\int_{-\infty}^{\infty} \sin(2\pi f t)\,t^{-1}\,dt = i\int_{-\infty}^{\infty} \sin(2\pi |f| t)\,t^{-1}\,dt = i\pi,
\]

allowing us to write

\[
-i\int_{-\infty}^{\infty} \sin(2\pi f t)\,t^{-1}\,dt = -i\pi\,\mathrm{sgn}(f), \tag{2.42b}
\]

where we define

\[
\mathrm{sgn}(f) = \begin{cases} 1 & \text{for } f > 0 \\ 0 & \text{for } f = 0 \\ -1 & \text{for } f < 0 \end{cases}. \tag{2.42c}
\]

The specification that $\mathrm{sgn}(0) = 0$ makes $\mathrm{sgn}(f)$ a proper odd function, equal to zero at $f = 0$, even though it has a jump discontinuity there. It also, of course, makes sense considering that (2.42b) is the integral of the zero function when $f = 0$. Evaluation of the real part of the transform in (2.42a) shows the usefulness of interpreting improper integrals as Cauchy principal values. When $f = 0$, the real part of the left-hand side of (2.42a) becomes, using the standard interpretation of an improper integral in (2.41a),

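\[
\begin{aligned}
\int_{-\infty}^{\infty} \frac{dt}{t}
&= \lim_{\substack{T_1\to\infty,\;T_2\to\infty \\ \varepsilon_1\to 0,\;\varepsilon_2\to 0}} \left[\int_{-T_1}^{-\varepsilon_1} \frac{dt}{t} + \int_{\varepsilon_2}^{T_2} \frac{dt}{t}\right]
 = \lim_{\substack{T_1\to\infty,\;T_2\to\infty \\ \varepsilon_1\to 0,\;\varepsilon_2\to 0}} \left[-\int_{\varepsilon_1}^{T_1} \frac{dt}{t} + \ln\!\left(\frac{T_2}{\varepsilon_2}\right)\right] \\
&= \lim_{\substack{T_1\to\infty,\;T_2\to\infty \\ \varepsilon_1\to 0,\;\varepsilon_2\to 0}} \left[-\ln\!\left(\frac{T_1}{\varepsilon_1}\right) + \ln\!\left(\frac{T_2}{\varepsilon_2}\right)\right] \\
&= \lim_{\substack{T_1\to\infty,\;T_2\to\infty \\ \varepsilon_1\to 0,\;\varepsilon_2\to 0}} \left[\ln\!\left(\frac{\varepsilon_1}{\varepsilon_2}\right) + \ln\!\left(\frac{T_2}{T_1}\right)\right].
\end{aligned} \tag{2.43a}
\]

The expression $\ln(\varepsilon_1/\varepsilon_2)$ can be made anything we want depending on the limiting ratio chosen for $\varepsilon_1/\varepsilon_2$ as $\varepsilon_1\to 0$ and $\varepsilon_2\to 0$; the same is true of $\ln(T_2/T_1)$ as $T_1\to\infty$ and $T_2\to\infty$. Therefore, under the standard interpretation of an improper integral, the limit in (2.43a) does not exist. Comparison of (2.41a) to (2.41b) shows that (2.43a) can be converted to a Cauchy principal value by setting $\varepsilon_1 = \varepsilon_2 = \varepsilon$, $T_1 = T_2 = T$, and taking the limit as $T\to\infty$, $\varepsilon\to 0$. This leads to

\[
\lim_{\substack{T\to\infty \\ \varepsilon\to 0}} \left[\ln\!\left(\frac{\varepsilon}{\varepsilon}\right) + \ln\!\left(\frac{T}{T}\right)\right] = 0,
\]

allowing us to give a well-defined value to the expression $\int_{-\infty}^{\infty} \frac{dt}{t}$.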
5
In general, the Cauchy principal value of the integral of any odd function is always zero,

\[
\int_{-\infty}^{\infty} u(t)\,dt = 0 \quad\text{for any function } u \text{ such that } u(-t) = -u(t), \tag{2.43b}
\]

because when taking the limit we are always simultaneously adding $u(t)\,dt$ increments to the integral at values of $t$ and $-t$ with the balanced addition of increments always cancelling out. Hence, interpreted as a Cauchy principal value,

\[
\int_{-\infty}^{\infty} \cos(2\pi f t)\,t^{-1}\,dt = 0 \tag{2.43c}
\]

because $[t^{-1}\cos(2\pi f t)]$ is an odd function of $t$. Therefore we can now assign a well-defined meaning to the forward Fourier transform of $1/t$ in (2.42a) using (2.43c) and (2.42b):

\[
F^{(-ift)}(t^{-1}) = -i\pi\,\mathrm{sgn}(f). \tag{2.43d}
\]
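The sine integral behind this result, which equals $\pi\,\mathrm{sgn}(f)$, can be approximated by truncating it at a large symmetric cutoff. The sketch below is a rough numerical check rather than a derivation: the helper name `sine_part`, the cutoff `T`, and the number of midpoint samples are arbitrary assumptions, and the truncated integral is expected to approach the limit only slowly, with an error of order $1/(fT)$.

```python
import math


def sine_part(f, T=100.0, n=200000):
    """Midpoint estimate of the integral of sin(2 pi f t)/t over [-T, T]."""
    dt = 2.0 * T / n
    total = 0.0
    for k in range(n):
        t = -T + (k + 0.5) * dt  # with n even, no midpoint lands on t = 0
        total += math.sin(2.0 * math.pi * f * t) / t
    return total * dt


for f in (1.0, -1.0, 0.0):
    # The imaginary part of the forward transform of 1/t is minus this integral.
    print(f, -sine_part(f))
```

For $f = 1$ the printed value should be near $-\pi$, for $f = -1$ near $+\pi$, and for $f = 0$ exactly zero, matching the three cases of the sign function.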

For this answer to be a true extension to Fourier-transform theory, however, $1/t$ must satisfy Eq. (2.28ℓ); that is, the inverse transform

\[
F^{(itf)}\bigl(-i\pi\,\mathrm{sgn}(f)\bigr)
\]

has to give back the original function $1/t$.

Direct evaluation of the inverse transform gives

\[
\begin{aligned}
F^{(itf)}\bigl(-i\pi\,\mathrm{sgn}(f)\bigr)
&= -i\pi\int_{-\infty}^{\infty} e^{2\pi i f t}\,\mathrm{sgn}(f)\,df \\
&= -i\pi\int_{-\infty}^{\infty} \cos(2\pi f t)\,\mathrm{sgn}(f)\,df + \pi\int_{-\infty}^{\infty} \sin(2\pi f t)\,\mathrm{sgn}(f)\,df.
\end{aligned} \tag{2.43e}
\]

The cosine integral is again the integral of an odd function so its Cauchy principal value is zero, but it is still not clear what value to assign the integral of $[\sin(2\pi f t)\,\mathrm{sgn}(f)]$. As the integral of an even function, we might try applying formula (2.19) to get

\[
\pi\int_{-\infty}^{\infty} \sin(2\pi f t)\,\mathrm{sgn}(f)\,df \overset{?}{=} 2\pi\int_{0}^{\infty} \sin(2\pi f t)\,\mathrm{sgn}(f)\,df = 2\pi\int_{0}^{\infty} \sin(2\pi f t)\,df, \tag{2.43f}
\]

but then we have the same difficulty already encountered when trying to evaluate the sine transform

\[
2\pi\int_{0}^{\infty} \sin(2\pi f t)\,df
\]

in Eq. (2.10g). To evaluate the inverse transform of $-i\pi\,\mathrm{sgn}(f)$, we need to create a new class of mathematical entities, called generalized functions, together with a set of rules for how they behave inside integrals. This extension to Fourier-transform theory is often called distribution theory, with the generalized functions called distributions.

2.11 Generalized Functions


Generalized functions are based on the well-established mathematical concept of a functional. A functional is a rule for assigning a complex number to each member of a set of test functions, where each test function $\phi$ has only one number assigned to it and the same number may end up assigned to different test functions. The Fourier transform of a function $\phi(t)$ at a specific frequency $f = f_0$ is a functional because it assigns the number $\Phi(f_0) = F^{(-if_0t)}\bigl(\phi(t)\bigr)$ to the test function $\phi$. In general, we can use any complex function $u(t)$ having a real argument $t$ as a weighting function inside an integral to create a functional. This functional, called $\int u$, is defined to be

\[
\Bigl(\int u\Bigr)\{\phi\} = \int_{-\infty}^{\infty} dt\,u(t)\,\phi(t) = \text{complex number}. \tag{2.44}
\]

According to this definition the functional $\int u$ is linear, like the Fourier transform, because

\[
\begin{aligned}
\Bigl(\int u\Bigr)\{\alpha\phi_1 + \beta\phi_2\}
&= \int_{-\infty}^{\infty} u(t)\,\bigl[\alpha\,\phi_1(t) + \beta\,\phi_2(t)\bigr]\,dt
 = \alpha\int_{-\infty}^{\infty} u(t)\,\phi_1(t)\,dt + \beta\int_{-\infty}^{\infty} u(t)\,\phi_2(t)\,dt \\
&= \alpha\Bigl(\int u\Bigr)\{\phi_1\} + \beta\Bigl(\int u\Bigr)\{\phi_2\}
\end{aligned} \tag{2.45}
\]

for any two complex constants $\alpha$, $\beta$ and test functions $\phi_1$, $\phi_2$.
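In programming terms, a functional built from a weighting function is a higher-order function: it takes a test function and returns a number. The sketch below illustrates this reading and its linearity; it is not from the original text, and the weighting function, the two test functions, the complex constants, and the helper name `make_functional` are all assumptions chosen for the example.

```python
import math


def make_functional(u, t_max=10.0, n=4000):
    """Return the functional phi -> integral of u(t) phi(t) dt (midpoint rule on [-t_max, t_max])."""
    dt = 2.0 * t_max / n
    grid = [-t_max + (k + 0.5) * dt for k in range(n)]

    def functional(phi):
        return sum(u(t) * phi(t) for t in grid) * dt

    return functional


# Functional generated by an assumed weighting function u(t) = 1 / (1 + t^2).
int_u = make_functional(lambda t: 1.0 / (1.0 + t * t))


def phi1(t):
    return math.exp(-t * t)          # assumed test function


def phi2(t):
    return t * t * math.exp(-t * t)  # assumed test function


alpha, beta = 2.0 - 1.0j, 0.5j       # arbitrary complex constants

lhs = int_u(lambda t: alpha * phi1(t) + beta * phi2(t))
rhs = alpha * int_u(phi1) + beta * int_u(phi2)
print(abs(lhs - rhs))                # linearity: should be close to zero
```

The object `int_u` never exposes the weighting function itself, only the numbers it assigns to test functions, which is the point of view that generalized functions rely on.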


From the notation $\int u$, it is clear that all functions $u$, as long as the integral in Eq. (2.44) exists, have associated with them the functional $\int u$ defined for the test functions $\phi$. There are also functionals that behave in every way like the functionals $\int u$, but for which no corresponding true function $u$ can be defined. We can, however, associate with these functionals a new class of mathematical objects, called generalized functions, which can be shown to have many of the properties of true functions. For this reason, it is customary to use function notation when referring to generalized functions. If an already-understood functional has no true function $u(t)$ associated with it, we can use the properties of this already-understood functional to define a generalized function called $u_G(t)$, with the subscript $G$ reminding us that $u_G$ is a generalized function. By analogy with the true function $u(t)$ associated with the functional $\int u$, the generalized function and its behavior inside integrals is defined in terms of the already-known functional, which we call $\int u_G$, using the definition

\[
\int_{-\infty}^{\infty} u_G(t)\,\phi(t)\,dt = \Bigl(\int u_G\Bigr)\{\phi\} \tag{2.46}
\]

for any test function $\phi$. Since we already know what complex number the functional $\int u_G$ gives for any test function $\phi$, Eq. (2.46) is not a definition of $\int u_G$ but rather a definition of what it means to put $[u_G(t)\cdot\phi(t)]$ inside an integral. Clearly, the generalized function itself is well defined only when its product with a test function is integrated over $t$. Because the functional $\int u_G$ behaves in every way like the functionals $\int u$ based on the Cauchy-principal-value integration of true functions, we have established a new type of integration using the product of generalized functions $u_G(t)$ with test functions $\phi(t)$. Hence, we have not only generalized what is meant by a function but have also extended again what is meant by integration.
To handle algebraic expressions involving both generalized functions and true functions, we must define what it means to say two generalized functions $u_G(t)$ and $v_G(t)$ are equal. We say that when

\[
\int_{-\infty}^{\infty} u_G(t)\,\phi(t)\,dt = \int_{-\infty}^{\infty} v_G(t)\,\phi(t)\,dt \tag{2.47a}
\]

for all appropriate test functions $\phi$, then

\[
u_G(t) = v_G(t). \tag{2.47b}
\]

We also define a generalized function $u_G(t)$, which we know only from its associated functional $\int u_G$ using definition (2.46), to be equal to a true function $v(t)$ when

\[
\Bigl(\int u_G\Bigr)\{\phi\} = \Bigl(\int v\Bigr)\{\phi\} \tag{2.48a}
\]

for all appropriate test functions $\phi$. Another way of stating this is that whenever

\[
\int_{-\infty}^{\infty} u_G(t)\,\phi(t)\,dt = \int_{-\infty}^{\infty} v(t)\,\phi(t)\,dt \tag{2.48b}
\]

for all the test functions $\phi$, we say that

\[
u_G(t) = v(t). \tag{2.48c}
\]

Two generalized functions $u_G(t)$ and $v_G(t)$ are defined to be equal over an interval $a < t < b$ when

\[
\Bigl(\int u_G\Bigr)\{\phi_{ab}\} = \Bigl(\int v_G\Bigr)\{\phi_{ab}\} \tag{2.48d}
\]

or

\[
\int_{-\infty}^{\infty} u_G(t)\,\phi_{ab}(t)\,dt = \int_{-\infty}^{\infty} v_G(t)\,\phi_{ab}(t)\,dt \tag{2.48e}
\]

for all test functions $\phi_{ab}(t)$ that are identically zero for all $t < a$ and for all $t > b$. The key point here is that we are explicitly allowing $\phi_{ab}(t)$ to be nonzero only inside the interval $a < t < b$. We also say that a true function $v(t)$ equals a generalized function $u_G(t)$ in the interval $a < t < b$,

\[
u_G(t) = v(t) \quad\text{for } a < t < b, \tag{2.48f}
\]

whenever

\[
\int_{-\infty}^{\infty} u_G(t)\,\phi_{ab}(t)\,dt = \int_{-\infty}^{\infty} v(t)\,\phi_{ab}(t)\,dt \tag{2.48g}
\]

for all the $\phi_{ab}(t)$ test functions. In Eqs. (2.48d)–(2.48g), we allow for half-infinite intervals by permitting constant $b$ to be $\infty$ with constant $a$ finite and constant $a$ to be $-\infty$ with constant $b$ finite.
The definitions of equality between two generalized functions or between a generalized function and a true function can be, depending on the set of test functions $\phi$ chosen, either very much looser than the standard idea of equality or very much the same. Suppose, by way of analogy, we define two true functions $u_1(t)$ and $u_2(t)$ to be "equal" when

\[
\int_{-\infty}^{\infty} u_1(t)\,\phi(t)\,dt = \int_{-\infty}^{\infty} u_2(t)\,\phi(t)\,dt \tag{2.49}
\]

for all test functions $\phi$. If the only allowed test function is $\phi(t) = 0$, then any two functions $u_1(t)$ and $u_2(t)$ are "equal." If, on the other hand, the allowed test functions are $\phi(t) = e^{\pm 2\pi i f t}$ for all real values of $f$, we are saying that $u_1(t)$ and $u_2(t)$ are "equal" when their Fourier transforms $F^{(\pm ift)}\bigl(u_1(t)\bigr)$ and $F^{(\pm ift)}\bigl(u_2(t)\bigr)$ are the same. From the Fourier inversion formulas, it then follows that $u_1(t)$ must be identical to $u_2(t)$, except possibly at jump discontinuities and isolated points, for all reasonably well-behaved functions $u_1(t)$ and $u_2(t)$. In general, we expect the set of test functions to be diverse enough that serious thought and some mathematical ingenuity are required to find two functions $u_1(t)$ and $u_2(t)$ that satisfy Eq. (2.49) yet are not basically the same function. Of course, the integrals used in Eq. (2.49), and all the other integrals involving only true functions in Eqs. (2.44) through (2.48g) for that matter, must be known to exist. Often the finiteness of these integrals and the general smoothness of the test functions are enforced by the requirement that

\[
\lim_{|t|\to\infty}\bigl[\,|t|^{N}\phi(t)\bigr] = 0 \quad\text{for } N = 0, 1, 2, \ldots, \tag{2.50a}
\]

with the $M$th derivative, $\phi^{(M)}(t) = d^{M}\phi/dt^{M}$, satisfying

\[
\lim_{|t|\to\infty}\bigl[\,|t|^{N}\phi^{(M)}(t)\bigr] = 0 \quad\text{for } N = 0, 1, 2, \ldots \text{ and } M = 1, 2, \ldots. \tag{2.50b}
\]

A function such as $e^{-at^{2}}$ for $a > 0$ satisfies (2.50a) and (2.50b), and in general all functions representing physically realistic measurements can be taken to satisfy these two requirements. It turns out, however, that the most useful and popular generalized function used in Fourier theory can handle a wider variety of test functions, requiring only that the test functions $\phi$ be continuous at $t = 0$ (see Sec. 2.14 below).
Continuing to develop what is meant by the $=$ sign applied to generalized functions, we say that the product of a true function $w(t)$ and a generalized function $u_G(t)$ is another generalized function $v_G(t)$,

\[
v_G(t) = w(t)\cdot u_G(t), \tag{2.51a}
\]

which is defined to mean that

\[
\int_{-\infty}^{\infty} v_G(t)\,\phi(t)\,dt = \int_{-\infty}^{\infty} w(t)\,u_G(t)\,\phi(t)\,dt
\]

for all test functions $\phi(t)$. A linear combination of true functions and generalized functions specified by

\[
w_G(t) = u_1(t)\,v_{G1}(t) + u_2(t)\,v_{G2}(t) + \cdots \tag{2.51b}
\]

is defined to mean that

\[
\int_{-\infty}^{\infty} w_G(t)\,\phi(t)\,dt = \int_{-\infty}^{\infty} u_1(t)\,v_{G1}(t)\,\phi(t)\,dt + \int_{-\infty}^{\infty} u_2(t)\,v_{G2}(t)\,\phi(t)\,dt + \cdots
\]

for all test functions $\phi(t)$. In general, there is no difficulty assigning a meaning to equations such as

\[
\begin{aligned}
&u_1(t)\,v_{G1}(t) + u_2(t)\,v_{G2}(t) + \cdots + u_N(t)\,v_{GN}(t) \\
&\qquad = U_1(t)\,V_{G1}(t) + U_2(t)\,V_{G2}(t) + \cdots + U_M(t)\,V_{GM}(t)
\end{aligned} \tag{2.51c}
\]

for true functions $u_1(t), u_2(t), \ldots, u_N(t), U_1(t), U_2(t), \ldots, U_M(t)$ and generalized functions $v_{G1}(t), v_{G2}(t), \ldots, v_{GN}(t), V_{G1}(t), V_{G2}(t), \ldots, V_{GM}(t)$. As long as both sides of the equation are just linear combinations of generalized functions and true functions, we interpret their equality to mean that

\[
\begin{aligned}
&\int_{-\infty}^{\infty} u_1(t)\,v_{G1}(t)\,\phi(t)\,dt + \int_{-\infty}^{\infty} u_2(t)\,v_{G2}(t)\,\phi(t)\,dt + \cdots + \int_{-\infty}^{\infty} u_N(t)\,v_{GN}(t)\,\phi(t)\,dt \\
&\qquad = \int_{-\infty}^{\infty} U_1(t)\,V_{G1}(t)\,\phi(t)\,dt + \int_{-\infty}^{\infty} U_2(t)\,V_{G2}(t)\,\phi(t)\,dt + \cdots + \int_{-\infty}^{\infty} U_M(t)\,V_{GM}(t)\,\phi(t)\,dt
\end{aligned}
\]

for all test functions $\phi(t)$. Even the simplest nonlinear expressions, however, such as

\[
v_G(t) \overset{?}{=} \bigl[u_G(t)\bigr]^{2},
\]

cannot be resolved by putting both sides inside an integral, because the right-hand side of

\[
\int_{-\infty}^{\infty} v_G(t)\,\phi(t)\,dt \overset{?}{=} \int_{-\infty}^{\infty} \bigl[u_G(t)\bigr]^{2}\,\phi(t)\,dt
\]

is still undefined. We know that the left-hand side is the same as applying the already-understood functional $\int u_G$ to $\phi$,

\[
\int_{-\infty}^{\infty} u_G(t)\,\phi(t)\,dt = \Bigl(\int u_G\Bigr)\{\phi\},
\]

but no definition has been given to

\[
\int_{-\infty}^{\infty} \bigl[u_G(t)\bigr]^{2}\,\phi(t)\,dt
\]

in terms of the functional $\int u_G$. It turns out that, in general, nonlinear expressions involving generalized functions cannot be given useful interpretations. Hence, generalized functions must be treated with caution unless they are used inside linear combinations of the type shown in (2.51b) and (2.51c).
Although generalized functions do have limitations, there are many things that can be done
with them. We can give meaning to $u_G(t-a)$ for any real constant $a$ by defining that

$$\int_{-\infty}^{\infty} u_G(t-a)\,\phi(t)\,dt = \int_{-\infty}^{\infty} u_G(t)\,\phi(t+a)\,dt \tag{2.52a}$$

for all test functions $\phi$. This definition is, of course, consistent with what happens when the
formal substitution $t' = t - a$ is made inside the original integral,


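$$\int_{-\infty}^{\infty} u_G(t-a)\,\phi(t)\,dt = \int_{-\infty}^{\infty} u_G(t')\,\phi(t'+a)\,dt' = \int_{-\infty}^{\infty} u_G(t)\,\phi(t+a)\,dt ,$$

treating $u_G(t-a)$ like a true function $u(t-a)$. We can give meaning to $u_G(at)$ for any real
constant $a$ by defining that

$$\int_{-\infty}^{\infty} u_G(at)\,\phi(t)\,dt = \frac{1}{|a|}\int_{-\infty}^{\infty} u_G(t)\,\phi(t/a)\,dt \tag{2.52b}$$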

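for all test functions $\phi$. This definition is consistent with what happens when we make the formal
substitution $t' = at$ in the integral

$$\int_{-\infty}^{\infty} u_G(at)\,\phi(t)\,dt$$

and treat $u_G(at)$ like a true function,

$$\int_{-\infty}^{\infty} u_G(at)\,\phi(t)\,dt =
\left\{\begin{array}{ll}
\dfrac{1}{a}\displaystyle\int_{-\infty}^{\infty} u_G(t')\,\phi(t'/a)\,dt' & \text{for } a > 0 \\[2ex]
\dfrac{1}{a}\displaystyle\int_{\infty}^{-\infty} u_G(t')\,\phi(t'/a)\,dt' & \text{for } a < 0
\end{array}\right\}
= \frac{1}{|a|}\int_{-\infty}^{\infty} u_G(t)\,\phi(t/a)\,dt .$$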

When the argument of $u_G$ is a linear combination $at + c$ for real constants $a$ and $c$, we
define

$$\int_{-\infty}^{\infty} u_G(at+c)\,\phi(t)\,dt = \frac{1}{|a|}\int_{-\infty}^{\infty} u_G(t)\,\phi\big((t-c)/a\big)\,dt \tag{2.52c}$$

and, combining the arguments used to explain definitions (2.52a) and (2.52b), we see that
transforming the variable of integration to $t' = at + c$ gives

$$\int_{-\infty}^{\infty} u_G(at+c)\,\phi(t)\,dt = \frac{1}{|a|}\int_{-\infty}^{\infty} u_G(t')\,\phi\big((t'-c)/a\big)\,dt' ,$$

justifying definition (2.52c). In general, any variable transformation that is permitted for the
argument of a true function we also permit for the argument of a generalized function unless it
results in an inappropriate test function.
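
As a quick sanity check on definition (2.52c), the following sketch replaces $u_G$ with an ordinary, rapidly decaying true function and compares the two sides of (2.52c) numerically for a few choices of $a$ and $c$. The function names (`integrate`, `u`, `phi`, `lhs`, `rhs`), the particular choices of $u$ and $\phi$, and the integration window are assumptions made only for this illustration; the check says nothing about generalized functions that are not equivalent to true functions.

```python
import math

def integrate(func, lo, hi, steps=20000):
    """Midpoint-rule approximation to the integral of func over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (k + 0.5) * h) for k in range(steps))

def u(t):
    # Hypothetical stand-in for u_G: an ordinary, rapidly decaying true function.
    return math.exp(-t * t)

def phi(t):
    # Hypothetical smooth test function.
    return 1.0 / (1.0 + t * t)

def lhs(a, c):
    # Integral of u(a*t + c) * phi(t) dt, the left-hand side of (2.52c).
    return integrate(lambda t: u(a * t + c) * phi(t), -40.0, 40.0)

def rhs(a, c):
    # (1/|a|) * integral of u(t) * phi((t - c)/a) dt, the right-hand side of (2.52c).
    return integrate(lambda t: u(t) * phi((t - c) / a), -40.0, 40.0) / abs(a)

for a, c in [(2.0, 0.5), (-3.0, 1.0), (0.5, -2.0)]:
    print(f"a={a:5.1f} c={c:5.1f}  lhs={lhs(a, c):.8f}  rhs={rhs(a, c):.8f}")
```

Negative values of $a$ are included on purpose, since they exercise the $1/|a|$ factor that comes from reversing the limits of integration.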
We define a generalized function $u_G(t)$ to be even if

$$\int_{-\infty}^{\infty} u_G(t)\,\phi_o(t)\,dt = 0 \tag{2.52d}$$

for all odd test functions $\phi_o$, and we define $u_G(t)$ to be odd if

$$\int_{-\infty}^{\infty} u_G(t)\,\phi_e(t)\,dt = 0 \tag{2.52e}$$

for all even test functions $\phi_e$. This gives $u_G(t)$ the same behavior it would have if it were an even
or odd true function multiplied by $\phi_e$ or $\phi_o$ and integrated over all $t$. Putting a subscript $e$ on the
generalized function $u_{Ge}(t)$ to show that it obeys the above definition for an even generalized
function, we note that, as described in Eq. (2.11c) above, any test function $\phi(t)$ can be written as
the sum of an even function $\phi_e(t)$ and an odd function $\phi_o(t)$. Hence, for any test function $\phi$ and
an even generalized function $u_{Ge}(t)$, we can write, using definition (2.52d),

$$\begin{aligned}
\int_{-\infty}^{\infty} u_{Ge}(t)\,\phi(t)\,dt &= \int_{-\infty}^{\infty} u_{Ge}(t)\,[\phi_e(t) + \phi_o(t)]\,dt \\
&= \int_{-\infty}^{\infty} u_{Ge}(t)\,\phi_e(t)\,dt + \int_{-\infty}^{\infty} u_{Ge}(t)\,\phi_o(t)\,dt \\
&= \int_{-\infty}^{\infty} u_{Ge}(t)\,\phi_e(t)\,dt .
\end{aligned}$$

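Definition (2.52b) gives, again using that $\phi(t) = \phi_e(t) + \phi_o(t)$,

$$\begin{aligned}
\int_{-\infty}^{\infty} u_{Ge}(-t)\,\phi(t)\,dt &= \int_{-\infty}^{\infty} u_{Ge}(t)\,\phi(-t)\,dt = \int_{-\infty}^{\infty} u_{Ge}(t)\,[\phi_e(-t) + \phi_o(-t)]\,dt \\
&= \int_{-\infty}^{\infty} u_{Ge}(t)\,\phi_e(-t)\,dt + \int_{-\infty}^{\infty} u_{Ge}(t)\,\phi_o(-t)\,dt \\
&= \int_{-\infty}^{\infty} u_{Ge}(t)\,\phi_e(t)\,dt - \int_{-\infty}^{\infty} u_{Ge}(t)\,\phi_o(t)\,dt \\
&= \int_{-\infty}^{\infty} u_{Ge}(t)\,\phi_e(t)\,dt ,
\end{aligned}$$

where in the last two steps we use $\phi_o(-t) = -\phi_o(t)$, $\phi_e(-t) = \phi_e(t)$, and definition (2.52d). We see
that both

$$\int_{-\infty}^{\infty} u_{Ge}(t)\,\phi(t)\,dt \quad\text{and}\quad \int_{-\infty}^{\infty} u_{Ge}(-t)\,\phi(t)\,dt$$

are equal to

$$\int_{-\infty}^{\infty} u_{Ge}(t)\,\phi_e(t)\,dt$$

for any test function $\phi$, so by definition (2.47a) for the equality of two generalized functions, it
follows that

$$u_{Ge}(-t) = u_{Ge}(t) \tag{2.52f}$$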

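for any even generalized function $u_{Ge}(t)$. If $u_{Go}(t)$ is any odd generalized function, we can use
$\phi(t) = \phi_e(t) + \phi_o(t)$ and definition (2.52e) to get

$$\int_{-\infty}^{\infty} u_{Go}(t)\,\phi(t)\,dt = \int_{-\infty}^{\infty} u_{Go}(t)\,[\phi_e(t) + \phi_o(t)]\,dt = \int_{-\infty}^{\infty} u_{Go}(t)\,\phi_o(t)\,dt$$

and definition (2.52b) to get

$$\begin{aligned}
\int_{-\infty}^{\infty} u_{Go}(-t)\,\phi(t)\,dt &= \int_{-\infty}^{\infty} u_{Go}(t)\,\phi(-t)\,dt = \int_{-\infty}^{\infty} u_{Go}(t)\,\phi_e(-t)\,dt + \int_{-\infty}^{\infty} u_{Go}(t)\,\phi_o(-t)\,dt \\
&= \int_{-\infty}^{\infty} u_{Go}(t)\,\phi_e(t)\,dt + \int_{-\infty}^{\infty} [-u_{Go}(t)]\,\phi_o(t)\,dt \\
&= -\int_{-\infty}^{\infty} [u_{Go}(t)\,\phi_o(t)]\,dt
\end{aligned}$$

or

$$\int_{-\infty}^{\infty} [-u_{Go}(-t)]\,\phi(t)\,dt = \int_{-\infty}^{\infty} u_{Go}(t)\,\phi_o(t)\,dt .$$

Clearly, $\int_{-\infty}^{\infty} u_{Go}(t)\,\phi(t)\,dt$ and $\int_{-\infty}^{\infty} [-u_{Go}(-t)]\,\phi(t)\,dt$ are equal to each other because they are both
equal to $\int_{-\infty}^{\infty} u_{Go}(t)\,\phi_o(t)\,dt$ for any test function $\phi$, so by definition (2.47a) we conclude that

$$u_{Go}(t) = -u_{Go}(-t)$$

or

$$u_{Go}(-t) = -u_{Go}(t) . \tag{2.52g}$$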

We define the derivative of a generalized function $u_G(t)$ to be another generalized function

$$u_G'(t) = u_G^{(1)}(t) .$$

The generalized function $u_G(t)$ is defined in terms of the already-known functional $\int u_G$, but
what functional $\int u_G'$ defines the generalized function $u_G'(t)$? We specify this new functional $\int u_G'$
with the definition

$$\int u_G'\{\phi\} = -\int u_G\{\phi'\}$$

or

$$\int u_G'\{\phi\} = -\int_{-\infty}^{\infty} u_G(t)\,\phi'(t)\,dt = -\int_{-\infty}^{\infty} u_G(t)\left(\frac{d\phi}{dt}\right)dt \tag{2.53a}$$

for any test function $\phi$. Therefore, the new generalized function $u_G'(t)$ satisfies the equation

$$\int_{-\infty}^{\infty} u_G'(t)\,\phi(t)\,dt = -\int_{-\infty}^{\infty} u_G(t)\left(\frac{d\phi}{dt}\right)dt \tag{2.53b}$$

for any test function $\phi$. We note that this definition is consistent with a formal integration by
parts, treating $u_G'(t)$ like a true function $u'(t)$ to get

$$\int_{-\infty}^{\infty} u_G'(t)\,\phi(t)\,dt = \big[u_G(t)\,\phi(t)\big]_{-\infty}^{\infty} - \int_{-\infty}^{\infty} u_G(t)\left(\frac{d\phi}{dt}\right)dt = -\int_{-\infty}^{\infty} u_G(t)\left(\frac{d\phi}{dt}\right)dt ,$$

with the term in square brackets [ ] zero for all test functions $\phi$. We can make this first term zero
either by requiring $\phi$ to approach zero as $t \to \pm\infty$ or by having $u_G(t)$ equal a true function in the
sense of (2.48g) with the true function becoming zero as $t \to \pm\infty$. The integral involving
$\phi'(t) = d\phi/dt$ must also, of course, have a well-defined meaning for all the test functions $\phi$.
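
The consistency argument behind (2.53b) can be checked numerically when $u_G$ is replaced by an ordinary differentiable function. The sketch below assumes a Gaussian for $u$ and a shifted Gaussian for $\phi$ (both hypothetical choices, as are all function names) and compares $\int u'\phi\,dt$ with $-\int u\,\phi'\,dt$; because both functions vanish at $\pm\infty$, the boundary term of the integration by parts drops out.

```python
import math

def integrate(func, lo, hi, steps=20000):
    """Midpoint-rule approximation to the integral of func over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (k + 0.5) * h) for k in range(steps))

def u(t):
    # Hypothetical true function standing in for u_G.
    return math.exp(-t * t)

def du(t):
    # Ordinary derivative of u.
    return -2.0 * t * math.exp(-t * t)

def phi(t):
    # Hypothetical test function; shifted so the integrals are not trivially zero.
    return math.exp(-(t - 1.0) ** 2)

def dphi(t):
    # Ordinary derivative of phi.
    return -2.0 * (t - 1.0) * math.exp(-(t - 1.0) ** 2)

left = integrate(lambda t: du(t) * phi(t), -20.0, 20.0)    # integral of u' * phi
right = -integrate(lambda t: u(t) * dphi(t), -20.0, 20.0)  # minus integral of u * phi'
print(left, right)
```

The two printed numbers should agree to within the accuracy of the quadrature, which is the behavior definition (2.53b) extends to generalized functions that have no ordinary derivative.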
The convolution of two generalized functions $u_G(t)$ and $v_G(t)$ is defined to be another
generalized function

$$w_G(t) = u_G(t) * v_G(t) . \tag{2.54a}$$

From Eqs. (2.47a) and (2.47b), we know that (2.54a) must mean that

$$\int_{-\infty}^{\infty} w_G(t)\,\phi(t)\,dt = \int_{-\infty}^{\infty} [u_G(t) * v_G(t)]\,\phi(t)\,dt \tag{2.54b}$$

for all test functions $\phi$. We now give meaning to both sides of (2.54b) by defining that, for all
test functions $\phi$,

$$\int_{-\infty}^{\infty} w_G(t)\,\phi(t)\,dt = \int_{-\infty}^{\infty} [u_G(t) * v_G(t)]\,\phi(t)\,dt = \int_{-\infty}^{\infty} dt'\,u_G(t') \int_{-\infty}^{\infty} dt''\,v_G(t'')\,\phi(t'+t'') . \tag{2.54c}$$

Note that the right-hand side of (2.54c) is as well defined as our previous definitions, since

$$\phi_v(t') = \int_{-\infty}^{\infty} v_G(t'')\,\phi(t'+t'')\,dt''$$

is just another complex number depending on the real parameter $t'$, which can be treated as
another true test function $\phi_v(t')$ inside the double integral of (2.54c),

$$\int_{-\infty}^{\infty} dt'\,u_G(t') \int_{-\infty}^{\infty} dt''\,v_G(t'')\,\phi(t'+t'') = \int_{-\infty}^{\infty} u_G(t')\,\phi_v(t')\,dt' .$$

As long as $\phi(t'+t'')$ and $\phi_v(t')$ are both test functions whenever $\phi$ is a test function,
definition (2.54c) should present no difficulties. To justify this definition, we note that formally
treating $u_G(t)$ and $v_G(t)$ as true functions gives

$$\begin{aligned}
\int_{-\infty}^{\infty} [u_G(t'') * v_G(t'')]\,\phi(t'')\,dt'' &= \int_{-\infty}^{\infty} dt''\,\phi(t'') \int_{-\infty}^{\infty} dt'\,u_G(t')\,v_G(t''-t') \\
&= \int_{-\infty}^{\infty} dt'\,u_G(t') \int_{-\infty}^{\infty} dt''\,\phi(t'')\,v_G(t''-t') ,
\end{aligned}$$

where the last step interchanges the order of integration. We now use (2.52a) to write

$$\int_{-\infty}^{\infty} \phi(t'')\,v_G(t''-t')\,dt'' = \int_{-\infty}^{\infty} v_G(t'')\,\phi(t''+t')\,dt'' ,$$

which leads to

$$\int_{-\infty}^{\infty} [u_G(t'') * v_G(t'')]\,\phi(t'')\,dt'' = \int_{-\infty}^{\infty} dt'\,u_G(t') \int_{-\infty}^{\infty} dt''\,v_G(t'')\,\phi(t''+t') ,$$

justifying the definition given in (2.54c). Note that the order of integration inside the double
integral of (2.54c) can be freely interchanged,

$$\int_{-\infty}^{\infty} dt'\,u_G(t') \int_{-\infty}^{\infty} dt''\,v_G(t'')\,\phi(t'+t'') = \int_{-\infty}^{\infty} dt''\,v_G(t'') \int_{-\infty}^{\infty} dt'\,u_G(t')\,\phi(t'+t'') ,$$

showing that $u_G(t) * v_G(t) = v_G(t) * u_G(t)$ for generalized functions as well as true functions.


Because the convolution itself is defined as an integral, there is no problem giving a meaning to
the convolution of a true function with a generalized function as long as the true function is an
acceptable test function. For a generalized function $u_G(t)$ and test function $\phi(t)$, we have

$$u_G(t) * \phi(t) = \int_{-\infty}^{\infty} u_G(t')\,\phi(t-t')\,dt' = \int_{-\infty}^{\infty} u_G(t')\,\phi\big(-(t'-t)\big)\,dt' = \int_{-\infty}^{\infty} u_G(t-t')\,\phi(t')\,dt' , \tag{2.55a}$$

where definition (2.52c) with $a = -1$ and $c = t$ is used in the last step of (2.55a). It clearly makes
sense to say that

$$\int_{-\infty}^{\infty} u_G(t-t')\,\phi(t')\,dt' = \phi(t) * u_G(t) ,$$

which means that

$$u_G(t) * \phi(t) = \phi(t) * u_G(t) \tag{2.55b}$$

for the convolution of a generalized function with any test function $\phi$.

2.12 Generalized Limits


Given a sequence of true functions $u_1(t), u_2(t), \ldots, u_n(t), \ldots$, we can form a corresponding
sequence of integrals with the test functions $\phi$,

$$\int_{-\infty}^{\infty} u_1(t)\,\phi(t)\,dt,\ \int_{-\infty}^{\infty} u_2(t)\,\phi(t)\,dt,\ \ldots,\ \int_{-\infty}^{\infty} u_n(t)\,\phi(t)\,dt,\ \ldots .$$

We define Glim, the generalized limit of the sequence of true functions $u_n(t)$, by taking the
standard limit of the sequence of integrals,

$$\lim_{n\to\infty} \int_{-\infty}^{\infty} u_n(t)\,\phi(t)\,dt ,$$

and requiring that the generalized limit of the sequence of true functions $u_n(t)$, written as

$$\mathop{\mathrm{Glim}}_{n\to\infty} u_n(t) ,$$

satisfy the equation

$$\lim_{n\to\infty} \int_{-\infty}^{\infty} u_n(t)\,\phi(t)\,dt = \int_{-\infty}^{\infty} \Big[\mathop{\mathrm{Glim}}_{n\to\infty} u_n(t)\Big]\,\phi(t)\,dt \tag{2.56a}$$

for any test function $\phi$. In effect, the generalized limit Glim is what we get when we insist on
moving the standard limit inside the integral. Almost always, of course, it turns out that the
generalized limit is the same as the standard limit,

$$\mathop{\mathrm{Glim}}_{n\to\infty} u_n(t) = \lim_{n\to\infty} u_n(t) ,$$

so that

$$\lim_{n\to\infty} \int_{-\infty}^{\infty} u_n(t)\,\phi(t)\,dt = \int_{-\infty}^{\infty} \Big[\lim_{n\to\infty} u_n(t)\Big]\,\phi(t)\,dt , \tag{2.56b}$$

but this is not always the case. If we define the $\Pi$ function (see Fig. 2.6) by

$$\Pi(t,T) = \begin{cases} 1 & \text{for } |t| < T \\ 1/2 & \text{for } |t| = T \\ 0 & \text{for } |t| > T , \end{cases} \tag{2.56c}$$

we can construct a sequence of true functions by

$$u_n(t) = \frac{1}{n}\,\Pi\!\left(\frac{t}{n},1\right) . \tag{2.56d}$$

Function $\Pi(t/n,1)$ is 1 only when $-n < t < n$, so when

$$\phi(t) = 1$$

is an acceptable test function, it is always true that

$$\int_{-\infty}^{\infty} u_n(t)\,dt = \frac{1}{n}\int_{-\infty}^{\infty} \Pi\!\left(\frac{t}{n},1\right) dt = 2 ,$$

which makes

$$\lim_{n\to\infty} \int_{-\infty}^{\infty} u_n(t)\,dt = 2 . \tag{2.56e}$$

On the other hand,

$$\lim_{n\to\infty} u_n(t) = \lim_{n\to\infty}\left[\frac{1}{n}\,\Pi\!\left(\frac{t}{n},1\right)\right] = 0 ,$$

which gives

$$\int_{-\infty}^{\infty} \Big[\lim_{n\to\infty} u_n(t)\Big]\,dt = 0 . \tag{2.56f}$$
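
The contrast between (2.56e) and (2.56f) is easy to reproduce numerically. The sketch below is only an illustration (the names `box`, `u_n`, and `integral_of_u_n` are hypothetical): it implements $\Pi(t,T)$ from (2.56c) and $u_n(t)$ from (2.56d), then shows that the area under $u_n$ stays at 2 while the value of $u_n$ at any fixed $t$ shrinks like $1/n$.

```python
def box(t, half_width):
    """Pi(t, T) of (2.56c): 1 inside |t| < T, 1/2 at |t| = T, 0 outside."""
    if abs(t) < half_width:
        return 1.0
    if abs(t) == half_width:
        return 0.5
    return 0.0

def u_n(t, n):
    """Member n of the sequence in (2.56d)."""
    return box(t / n, 1.0) / n

def integral_of_u_n(n, steps=40000):
    """Midpoint-rule integral of u_n over [-2n, 2n], which contains its support."""
    lo, hi = -2.0 * n, 2.0 * n
    h = (hi - lo) / steps
    return h * sum(u_n(lo + (k + 0.5) * h, n) for k in range(steps))

for n in (1, 10, 100, 1000):
    # The area stays fixed while the pointwise value at t = 0 tends to zero.
    print(n, integral_of_u_n(n), u_n(0.0, n))
```

The box simply becomes wider and flatter, so taking the limit before integrating discards an area that the integral itself never loses.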

FIGURE 2.6. The function $\Pi(t,T)$, with the horizontal axis marked at $t = -T$ and $t = T$.

The disagreement of (2.56e) and (2.56f) shows that there can be a very important difference
between the generalized limit and the standard limit, because Eq. (2.56b) does not always hold
true. We cannot avoid this problem by ruling out constant test functions such as $\phi(t) = 1$.
Consider, for example,

$$\phi(t) = \frac{1}{1+t^2}$$

and construct a sequence of true functions

$$u_n(t) = t\,\sin(t/n) .$$

We find that$^{21}$

$$\int_{-\infty}^{\infty} \frac{t\,\sin(t/n)}{1+t^2}\,dt = \pi e^{-1/n} , \tag{2.57a}$$

which gives

$$\lim_{n\to\infty} \int_{-\infty}^{\infty} \frac{t\,\sin(t/n)}{1+t^2}\,dt = \pi . \tag{2.57b}$$

This is not the same as

$$\int_{-\infty}^{\infty} \frac{\big\{\lim_{n\to\infty} t\,\sin(t/n)\big\}}{1+t^2}\,dt = 0 . \tag{2.57c}$$

Once again, we have found a sequence of true functions $u_n(t)$ that does not satisfy (2.56b). This
second example can, in fact, be seen to fail (2.56b) for much the same reason as the first. Since an
even function is being integrated, we can write that [see Eq. (2.19)]

$$\lim_{n\to\infty} \int_{-\infty}^{\infty} \frac{t\,\sin(t/n)}{1+t^2}\,dt = 2\lim_{n\to\infty} \int_{0}^{\infty} \frac{t\,\sin(t/n)}{1+t^2}\,dt . \tag{2.57d}$$

21 I. S. Gradshteyn and I. M. Ryzhik, Table of Integrals, Series, and Products, edited by Alan Jeffrey, 5th ed.
(Academic Press, New York, 1994), p. 445, formula 4 in Sec. 3.723 with $a = 1/n$ and $\beta = 1$.

Consider what happens to the first, positive hump of the sine as $n$ increases in the integral on the
right-hand side of Eq. (2.57d). The values of $t$ for which $\sin(t/n)$ is significantly different from
zero, say from $n\cdot(\pi/4)$ to $n\cdot(3\pi/4)$, comprise an interval $\Delta t = n\cdot(\pi/2)$ with a width that
increases linearly with $n$, just like the interval $2n$ in (2.56d) over which $\Pi(t/n,1)$ equals one. The
center of this hump is at $t = n\cdot(\pi/2)$, so as $n$ increases, the hump's center appears at ever larger
values of $t$. Hence, we can make the approximation that for large $n$

$$\frac{t}{1+t^2} \cong t^{-1} \cong \frac{2}{n\pi} .$$

This means the characteristic size of

$$\frac{t\,\sin(t/n)}{1+t^2}$$

at the hump decreases as $1/n$, while the hump's width, $\Delta t = n\cdot(\pi/2)$, increases as $n$. The product
of the size and width therefore tends to a constant as $n$ gets large, preventing the integral from
shrinking as $n \to \infty$. This is the same phenomenon that caused our first example $n^{-1}\,\Pi(t/n,1)$ to
fail Eq. (2.56b). Up to this point, we have, of course, only discussed the contribution of the first

2.13 Fourier Transforms of Generalized Functions


For every generalized function $u_G(t)$, there is at least one sequence of true functions
$u_1(t), u_2(t), \ldots, u_n(t), \ldots$ such that

$$\mathop{\mathrm{Glim}}_{n\to\infty} u_n(t) = u_G(t) . \tag{2.58a}$$

This formula should be interpreted in the sense of (2.47b) and (2.56a); that is, it means

$$\int_{-\infty}^{\infty} \Big[\mathop{\mathrm{Glim}}_{n\to\infty} u_n(t)\Big]\,\phi(t)\,dt = \lim_{n\to\infty} \int_{-\infty}^{\infty} u_n(t)\,\phi(t)\,dt = \int_{-\infty}^{\infty} u_G(t)\,\phi(t)\,dt \tag{2.58b}$$

for all test functions $\phi$. We use the sequence of true functions whose generalized limit is the
generalized function to define the Fourier transform of the generalized function. If a sequence of
true functions $w_1(t), w_2(t), \ldots, w_n(t), \ldots$ can be forward Fourier transformed to give another
sequence of true functions $W_1(f), W_2(f), \ldots, W_n(f), \ldots$ such that

$$W_n(f) = \int_{-\infty}^{\infty} w_n(t)\,e^{-2\pi i f t}\,dt \tag{2.59a}$$

and

$$w_n(t) = \int_{-\infty}^{\infty} W_n(f)\,e^{2\pi i f t}\,df \tag{2.59b}$$

for all values of $n$, we then define the forward Fourier transform of the generalized function

$$w_G(t) = \mathop{\mathrm{Glim}}_{n\to\infty} w_n(t) \tag{2.59c}$$

to be

$$\mathcal{F}^{(-ift)}\big(w_G(t)\big) = \mathop{\mathrm{Glim}}_{n\to\infty} W_n(f) . \tag{2.59d}$$

We expect the sequence of true functions $W_1(f), W_2(f), \ldots, W_n(f), \ldots$ also to give a generalized
function when we take the generalized limit of the sequence,

$$W_G(f) = \mathop{\mathrm{Glim}}_{n\to\infty} W_n(f) , \tag{2.59e}$$

and we define the inverse Fourier transform of this generalized function to be $w_G(t)$,

$$\mathcal{F}^{(itf)}\big(W_G(f)\big) = \mathop{\mathrm{Glim}}_{n\to\infty} w_n(t) = w_G(t) . \tag{2.59f}$$

The double-arrow notation $\Leftrightarrow$ introduced in the discussion after Eq. (2.35d) can be used to
restate this definition more concisely. We define that whenever

$$w_1(t), w_2(t), \ldots \to w_G(t)$$

is true, and whenever

$$W_1(f), W_2(f), \ldots \to W_G(f)$$

is true, and whenever

$$w_1(t) \Leftrightarrow W_1(f),\ w_2(t) \Leftrightarrow W_2(f),\ \ldots,\ w_n(t) \Leftrightarrow W_n(f),\ \ldots$$

is true for all $n$, it must also be true that

$$w_G(t) \Leftrightarrow W_G(f) \tag{2.59g}$$

for the generalized functions given by the generalized limits of sequences

$$w_1(t), w_2(t), \ldots \quad\text{and}\quad W_1(f), W_2(f), \ldots .$$

Now at last we can attach a meaning to the Fourier transform pair that could not be completed
in Eqs. (2.43d)–(2.43f). The explicit development that follows is perhaps somewhat long, but
worth doing to show how to construct the Fourier transforms of some of the functions violating
one or more of requirements (V) through (VIII) in Sec. 2.4. We create the sequence

$$\mathrm{sgn}(f)\,\Pi(f,1),\ \mathrm{sgn}(f)\,\Pi(f,2),\ \ldots,\ \mathrm{sgn}(f)\,\Pi(f,n),\ \ldots$$

and define the generalized sgn function by

$$\text{“}\mathrm{sgn}(f)\text{”} = \mathop{\mathrm{Glim}}_{n\to\infty} \big[\mathrm{sgn}(f)\,\Pi(f,n)\big] , \tag{2.60a}$$

where quotes “ ” are used to indicate that the “$\mathrm{sgn}(f)$” is a generalized function instead of the
true function $\mathrm{sgn}(f)$ defined in Eq. (2.42c) above. The reason for this choice of sequence is
straightforward: function $[\mathrm{sgn}(f)\,\Pi(f,n)]$ satisfies requirements (V) through (VIII) in Sec. 2.4
for every finite value of $n$ and so has a well-defined Fourier transform; as $n$ increases, function
$[\mathrm{sgn}(f)\,\Pi(f,n)]$ resembles ever more closely the $\mathrm{sgn}(f)$ function to which we want to give a
Fourier transform. We note that for any test function $\phi$

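$$\begin{aligned}
\int_{-\infty}^{\infty} \phi(f)\,\text{“}\mathrm{sgn}(f)\text{”}\,df &= \int_{-\infty}^{\infty} \phi(f)\,\Big\{\mathop{\mathrm{Glim}}_{n\to\infty} \big[\mathrm{sgn}(f)\,\Pi(f,n)\big]\Big\}\,df \\
&= \lim_{n\to\infty} \int_{-\infty}^{\infty} \phi(f)\,\mathrm{sgn}(f)\,\Pi(f,n)\,df \\
&= \lim_{n\to\infty} \int_{-n}^{n} \phi(f)\,\mathrm{sgn}(f)\,df \\
&= \int_{-\infty}^{\infty} \phi(f)\,\mathrm{sgn}(f)\,df ,
\end{aligned}$$

so

$$\text{“}\mathrm{sgn}(f)\text{”} = \mathrm{sgn}(f) \tag{2.60b}$$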

in the sense of Eq. (2.48c). This equivalence can be used to justify dropping the distinction
between “$\mathrm{sgn}(f)$” and $\mathrm{sgn}(f)$. Applied mathematicians who work with generalized functions
often drop the distinction between a generalized function and the true function to which it is
equivalent, and the double-quote notation introduced here is not standard usage. There is,
however, no harm in keeping track of the distinction between the two types of functions, and the
double quotes acknowledge the close relationship of the two functions while reminding us that
they are not the same.
The inverse Fourier transform of $[-i\pi\,\mathrm{sgn}(f)\,\Pi(f,n)]$ is, using the identity
$e^{i\theta} = \cos\theta + i\sin\theta$,

$$\mathcal{F}^{(itf)}\big(-i\pi\,\mathrm{sgn}(f)\,\Pi(f,n)\big) = -i\pi \int_{-\infty}^{\infty} e^{2\pi i f t}\,\mathrm{sgn}(f)\,\Pi(f,n)\,df = 2\pi \int_{0}^{n} \sin(2\pi f t)\,df .$$

In the last step, we use that

$$[\cos(2\pi f t)\,\mathrm{sgn}(f)\,\Pi(f,n)] ,$$

which is an odd function in $f$, has an integral that is zero according to Eq. (2.17); and the integral
between $(-n)$ and $n$ of $[\sin(2\pi f t)\,\mathrm{sgn}(f)]$, which is an even function in $f$, is twice the value of its
integral from zero to $n$ according to Eq. (2.19). Making the substitution $f' = 2\pi t f$ gives

$$\mathcal{F}^{(itf)}\big(-i\pi\,\mathrm{sgn}(f)\,\Pi(f,n)\big) = -\frac{1}{t}\,\cos f'\,\Big|_{0}^{2\pi n t} .$$

This shows that the inverse Fourier transform of $[-i\pi\,\mathrm{sgn}(f)\,\Pi(f,n)]$ is

$$\mathcal{F}^{(itf)}\big(-i\pi\,\mathrm{sgn}(f)\,\Pi(f,n)\big) = t^{-1}\,[1 - \cos(2\pi n t)] .$$

Now we calculate the forward Fourier transform of $(1/t)[1 - \cos(2\pi n t)]$. We get

$$\begin{aligned}
\mathcal{F}^{(-ift)}\big(t^{-1}[1 - \cos(2\pi n t)]\big) &= \int_{-\infty}^{\infty} e^{-2\pi i f t}\,t^{-1}\,[1 - \cos(2\pi n t)]\,dt \\
&= \int_{-\infty}^{\infty} e^{-2\pi i f t}\,\frac{dt}{t} - \int_{-\infty}^{\infty} e^{-2\pi i f t}\,\frac{1}{t}\cos(2\pi n t)\,dt \\
&= -i\pi\,\mathrm{sgn}(f) + i\int_{-\infty}^{\infty} \frac{1}{t}\cos(2\pi n t)\,\sin(2\pi f t)\,dt .
\end{aligned}$$

In the last step, Eq. (2.43d) is used to evaluate the integral of $[e^{-2\pi i f t}\,t^{-1}]$; we also substitute
$e^{i\theta} = \cos\theta + i\sin\theta$ into the integral of $[e^{-2\pi i f t}\,t^{-1}\cos(2\pi n t)]$, discovering that the Cauchy principal
value of the integral of $[t^{-1}\cos(2\pi f t)\cos(2\pi n t)]$, which is an odd function in $t$, is zero [see Eq.
(2.17)]. The remaining integral over the even function

$$[t^{-1}\sin(2\pi f t)\cos(2\pi n t)]$$

can be simplified by applying Eq. (2.19) and then consulting a table of definite integrals,$^{22}$

$$\begin{aligned}
\int_{-\infty}^{\infty} \frac{1}{t}\cos(2\pi n t)\,\sin(2\pi f t)\,dt &= 2\,\mathrm{sgn}(f) \int_{0}^{\infty} \frac{1}{t}\cos(2\pi n t)\,\sin(2\pi |f| t)\,dt \\
&= \pi\,\mathrm{sgn}(f)\,\Pi(2\pi n, 2\pi |f|) = \pi\,\mathrm{sgn}(f)\,\Pi(n, |f|) .
\end{aligned}$$

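We conclude that the forward Fourier transform of $(1/t)[1 - \cos(2\pi n t)]$ is

$$\begin{aligned}
\mathcal{F}^{(-ift)}\big(t^{-1}[1 - \cos(2\pi n t)]\big) &= \mathrm{sgn}(f)\,\big[-i\pi + i\pi\,\Pi(n,|f|)\big] = -i\pi\,\mathrm{sgn}(f)\,\big[1 - \Pi(n,|f|)\big] \\
&= -i\pi\,\mathrm{sgn}(f)\,\Pi(f,n) .
\end{aligned}$$

Hence, $(1/t)[1 - \cos(2\pi n t)]$ and $[-i\pi\,\mathrm{sgn}(f)\,\Pi(f,n)]$ are a Fourier-transform pair,

$$\frac{1}{t}\,[1 - \cos(2\pi n t)] \Leftrightarrow -i\pi\,\mathrm{sgn}(f)\,\Pi(f,n) .$$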
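
One direction of this pair can be spot-checked numerically, because the inverse transform of $-i\pi\,\mathrm{sgn}(f)\,\Pi(f,n)$ is an ordinary integral over the finite interval $-n < f < n$. The sketch below (function names and sample values of $t$ and $n$ are assumptions for illustration) evaluates that integral with a midpoint rule and compares it with $t^{-1}[1 - \cos(2\pi n t)]$.

```python
import cmath
import math

def inverse_transform(t, n, steps=20000):
    """Midpoint-rule value of the integral over -n < f < n of
    -i*pi*sgn(f)*exp(2*pi*i*f*t) df."""
    lo, hi = -float(n), float(n)
    h = (hi - lo) / steps
    total = 0j
    for k in range(steps):
        f = lo + (k + 0.5) * h          # midpoints never land exactly on f = 0
        sgn = 1.0 if f > 0 else -1.0
        total += -1j * math.pi * sgn * cmath.exp(2j * math.pi * f * t)
    return total * h

def closed_form(t, n):
    """The claimed inverse transform, (1/t) * [1 - cos(2*pi*n*t)]."""
    return (1.0 - math.cos(2.0 * math.pi * n * t)) / t

for t in (0.37, -1.2):
    value = inverse_transform(t, 3)
    print(t, value.real, value.imag, closed_form(t, 3))
```

The imaginary part should vanish to rounding error, reflecting the odd symmetry of $\cos(2\pi f t)\,\mathrm{sgn}(f)$ invoked in the text.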

22 I. S. Gradshteyn and I. M. Ryzhik, Table of Integrals, Series, and Products, p. 453, formula 2 in Sec. 3.741 with
$a = 2\pi|f|$ and $b = 2\pi n$.

This confirms that there are two sequences

$$\frac{1}{t}\,[1 - \cos(2\pi t)],\ \frac{1}{t}\,[1 - \cos(4\pi t)],\ \ldots,\ \frac{1}{t}\,[1 - \cos(2\pi n t)],\ \ldots \tag{2.60c}$$

and

$$-i\pi\,\mathrm{sgn}(f)\,\Pi(f,1),\ -i\pi\,\mathrm{sgn}(f)\,\Pi(f,2),\ \ldots,\ -i\pi\,\mathrm{sgn}(f)\,\Pi(f,n),\ \ldots$$

such that each member of the lower sequence is the forward Fourier transform of the
corresponding member of the upper sequence and each member of the upper sequence is the
inverse Fourier transform of the corresponding member of the lower sequence. We know from
(2.60a) and (2.60b) that the generalized function given by the generalized limit of the lower
sequence is

$$\begin{aligned}
\mathop{\mathrm{Glim}}_{n\to\infty} \big[-i\pi\,\mathrm{sgn}(f)\,\Pi(f,n)\big] &= -i\pi \mathop{\mathrm{Glim}}_{n\to\infty} \big[\mathrm{sgn}(f)\,\Pi(f,n)\big] = -i\pi\,\text{“}\mathrm{sgn}(f)\text{”} \\
&= -i\pi\,\mathrm{sgn}(f) ,
\end{aligned} \tag{2.60d}$$

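but what is the generalized function given by the generalized limit of the upper sequence? We
have for any test function $\phi$

$$\begin{aligned}
\int_{-\infty}^{\infty} \phi(t)\,\Big\{\mathop{\mathrm{Glim}}_{n\to\infty} \frac{1}{t}\,[1 - \cos(2\pi n t)]\Big\}\,dt &= \lim_{n\to\infty} \int_{-\infty}^{\infty} \phi(t)\,\frac{1}{t}\,[1 - \cos(2\pi n t)]\,dt \\
&= \lim_{n\to\infty} \left\{ \int_{-\infty}^{\infty} \phi(t)\,\frac{dt}{t} - \int_{-\infty}^{\infty} \phi(t)\cos(2\pi n t)\,\frac{dt}{t} \right\} \\
&= \int_{-\infty}^{\infty} \phi(t)\,\frac{dt}{t} - \lim_{n\to\infty} \int_{-\infty}^{\infty} \phi(t)\cos(2\pi n t)\,\frac{1}{t}\,dt .
\end{aligned} \tag{2.60e}$$

Working with the limit of the integral containing $\cos(2\pi n t)$, we write

$$\begin{aligned}
\lim_{n\to\infty} \int_{-\infty}^{\infty} \phi(t)\cos(2\pi n t)\,\frac{1}{t}\,dt &= \lim_{n\to\infty} \int_{-\infty}^{-\varepsilon} \phi(t)\cos(2\pi n t)\,\frac{1}{t}\,dt \\
&\quad + \lim_{n\to\infty} \int_{-\varepsilon}^{\varepsilon} \phi(t)\cos(2\pi n t)\,\frac{1}{t}\,dt \\
&\quad + \lim_{n\to\infty} \int_{\varepsilon}^{\infty} \phi(t)\cos(2\pi n t)\,\frac{1}{t}\,dt ,
\end{aligned} \tag{2.60f}$$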

where $\varepsilon$ is a small positive number. By making all the test functions $\phi(t)$ have finite variation as
in requirement (VIII) in Sec. 2.4, we recognize the first and third integrals on the right-hand side
of (2.60f) become zero as $n \to \infty$, because eventually the cosine oscillates both positive and
negative over each infinitesimal interval while $\phi(t)/t$ barely changes at all; the integrals can be
made as small as desired by picking a large enough value of $n$. For future use, we note that for
any continuous, finite-variation test function $\phi$,

$$\lim_{n\to\infty} \int_{-\infty}^{\infty} \phi(t)\sin(nt)\,dt = \lim_{n\to\infty} \int_{-\infty}^{\infty} \phi(t)\cos(nt)\,dt = \lim_{n\to\infty} \int_{-\infty}^{\infty} \phi(t)\,e^{\pm i n t}\,dt = 0 ,$$

so that

$$\mathop{\mathrm{Glim}}_{n\to\infty} \sin(nt) = \mathop{\mathrm{Glim}}_{n\to\infty} \cos(nt) = \mathop{\mathrm{Glim}}_{n\to\infty} e^{\pm i n t} = 0 . \tag{2.60g}$$
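
The vanishing limits in (2.60g) can be seen numerically with a concrete test function. The sketch below assumes the Gaussian $\phi(t) = e^{-t^2}$ (a hypothetical choice, as are the function names), for which the cosine integral is known in closed form, $\sqrt{\pi}\,e^{-n^2/4}$; the printed values drop rapidly toward zero as $n$ grows even though $\cos(nt)$ itself has no standard limit.

```python
import math

def integrate(func, lo, hi, steps=24000):
    """Midpoint-rule approximation to the integral of func over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (k + 0.5) * h) for k in range(steps))

def phi(t):
    # Hypothetical continuous, finite-variation test function.
    return math.exp(-t * t)

def cosine_integral(n):
    """Integral of phi(t) * cos(n t) over the real line (truncated to |t| < 12)."""
    return integrate(lambda t: phi(t) * math.cos(n * t), -12.0, 12.0)

for n in (1, 2, 4, 8):
    # Second column is the closed form sqrt(pi) * exp(-n^2 / 4) for this phi.
    print(n, cosine_integral(n), math.sqrt(math.pi) * math.exp(-n * n / 4.0))
```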

The middle integral in Eq. (2.60f) can be written as

$$\int_{-\varepsilon}^{\varepsilon} \phi(t)\cos(2\pi n t)\,\frac{dt}{t} = \phi(0) \int_{-\infty}^{\infty} \Pi(t,\varepsilon)\,\frac{1}{t}\cos(2\pi n t)\,dt ,$$

where we have chosen $\varepsilon$ small enough that $\phi(t)$ barely changes over the integral, letting us
replace it by $\phi(0)$. Now the middle integral on the right-hand side of (2.60f) can be recognized as
the Cauchy principal value of the integral of $(1/t)\,\Pi(t,\varepsilon)\cos(2\pi n t)$, which is an odd function of $t$
and must be zero according to Eq. (2.17). Hence, (2.60f) becomes

$$\lim_{n\to\infty} \int_{-\infty}^{\infty} \phi(t)\cos(2\pi n t)\,\frac{1}{t}\,dt = 0 ,$$

which shows that (2.60e) simplifies to

$$\int_{-\infty}^{\infty} \phi(t)\,\Big\{\mathop{\mathrm{Glim}}_{n\to\infty} \frac{1}{t}\,[1 - \cos(2\pi n t)]\Big\}\,dt = \int_{-\infty}^{\infty} \phi(t)\,\frac{dt}{t} \tag{2.60h}$$

for any test function $\phi$. Since (2.60h) denotes equality in the sense of Eq. (2.48c), we can define
the generalized function “$t^{-1}$” to be

$$\text{“}t^{-1}\text{”} = \mathop{\mathrm{Glim}}_{n\to\infty} \big\{ t^{-1}\,[1 - \cos(2\pi n t)] \big\} \tag{2.60i}$$

and then note that Eq. (2.60h) now states that

$$\text{“}t^{-1}\text{”} = t^{-1} . \tag{2.60j}$$

Equations (2.60d) and (2.60j) show that $[-i\pi\,\text{“}\mathrm{sgn}(f)\text{”}]$ and “$t^{-1}$” are the generalized limits of the
two sequences in (2.60c). Because all the sequence members are Fourier transform pairs, we
know, according to (2.59g), that $[-i\pi\,\text{“}\mathrm{sgn}(f)\text{”}]$ and “$t^{-1}$” are a Fourier transform pair even
though $[-i\pi\,\mathrm{sgn}(f)]$ and $t^{-1}$ do not satisfy requirements (V) through (VIII) in Sec. 2.4 and, as
shown in Eqs. (2.43a) and (2.43f), their transforms cannot be evaluated as standard integrals. In
this sense, we can write that

$$\mathcal{F}^{(-ift)}\big(t^{-1}\big) = -i\pi\,\mathrm{sgn}(f) \tag{2.60k}$$

and

$$\mathcal{F}^{(itf)}\big(-i\pi\,\mathrm{sgn}(f)\big) = t^{-1} . \tag{2.60$\ell$}$$

This can also be written as, reversing the sign of $f$ in (2.60k), the sign of $t$ in (2.60$\ell$), and using
Eq. (2.42c) to get that $\mathrm{sgn}(-f) = -\mathrm{sgn}(f)$,

$$\mathcal{F}^{(ift)}\big(t^{-1}\big) = i\pi\,\mathrm{sgn}(f) \tag{2.60m}$$

and

$$\mathcal{F}^{(-itf)}\big(i\pi\,\mathrm{sgn}(f)\big) = t^{-1} . \tag{2.60n}$$

It is important to remember that Eqs. (2.60k) and (2.60m) are true only when integrals between
$-\infty$ and $+\infty$ are interpreted as Cauchy principal values and (2.60$\ell$) and (2.60n) are true only
when equality is defined as in Eq. (2.48c) using generalized function theory. Strictly speaking, it
might be better to say that the Cauchy principal value of

$$\int_{-\infty}^{\infty} e^{\pm 2\pi i f t}\,\frac{dt}{t} \quad\text{is}\quad \pm i\pi\,\mathrm{sgn}(f)$$

and that

$$\int_{-\infty}^{\infty} e^{\pm 2\pi i f t}\,\big[-i\pi\,\text{“}\mathrm{sgn}(f)\text{”}\big]\,df = \pm\,\text{“}t^{-1}\text{”} .$$

This is the reason that

$$\int_{-\infty}^{\infty} e^{\pm 2\pi i f t}\,\frac{dt}{t} = \pm i\pi\,\mathrm{sgn}(f) \tag{2.61a}$$

is usually not listed in standard tables of improper integrals without notation showing that it is a
Cauchy principal value, and the equality

$$\int_{-\infty}^{\infty} e^{\pm 2\pi i f t}\,\mathrm{sgn}(f)\,df = \pm\frac{i}{\pi t} \tag{2.61b}$$

is usually not listed in these tables under any circumstances. It is also true, however, that (2.61a)
and (2.61b) are constantly used either explicitly or implicitly in Fourier-transform theory; and
lists of Fourier-transform pairs often contain (2.61a) and (2.61b). Unfortunately, it is standard
practice in the Fourier-transform tables that do list these integrals to omit any explanation that
they are only true when interpreted as the Fourier transforms of generalized functions. In general,
when using tables of Fourier transforms, all those transforms that do not exist as standard
integrals or Cauchy principal values should be interpreted as the transforms of generalized
functions and used only in the context of generalized function theory.

2.14 The Delta Function


The most popular and useful generalized function is the Dirac delta function, a name usually
shortened to just the delta function. In a sense, Secs. 2.11–2.13 describing generalized
function theory are there just so we can give a mathematically exact description of the delta
function. The delta function is often inexactly described in elementary textbooks as that function
$\delta(t)$ such that

$$\delta(t) = \begin{cases} \infty & \text{for } t = 0 \\ 0 & \text{for } t \neq 0 \end{cases} \tag{2.62a}$$

with

$$\int_{a}^{b} \delta(t)\,f(t)\,dt = \begin{cases} f(0) & \text{for } a < 0 < b \\ 0 & \text{for } a < b < 0 \text{ or } 0 < a < b . \end{cases} \tag{2.62b}$$

More sophisticated textbooks may define it as a standard limit, for example,

$$\delta(t) = \lim_{n\to\infty} \big[n\,\Pi(t, n^{-1})\big] \tag{2.63a}$$

or

$$\delta(t) = \lim_{n\to\infty} \left( \sqrt{\frac{n}{\pi}}\; e^{-n t^2} \right) . \tag{2.63b}$$

There are, in fact, two different but equivalent mathematically exact ways to define the delta
function. The first way is to create a well-defined functional $\int\delta$ that, when operating on a
complex-valued test function $\phi(t)$ with a real argument $t$, produces as its complex number $\phi(0)$,
the value of $\phi$ at $t$ equal to zero,

$$\int \delta\{\phi\} = \phi(0) . \tag{2.64a}$$

This makes $\delta(t)$ the generalized function associated with functional $\int\delta$, with $\delta(t)$ having the
property that

$$\int_{-\infty}^{\infty} \delta(t)\,\phi(t)\,dt = \phi(0) \tag{2.64b}$$

for all test functions $\phi$. The second way to define $\delta(t)$ is to say it is the generalized limit of a
sequence such as the ones specified in (2.63a) and (2.63b),

$$\delta(t) = \mathop{\mathrm{Glim}}_{n\to\infty} \big[n\,\Pi(t, n^{-1})\big] \tag{2.65a}$$

or

$$\delta(t) = \mathop{\mathrm{Glim}}_{n\to\infty} \left( \sqrt{\frac{n}{\pi}}\; e^{-n t^2} \right) . \tag{2.65b}$$

Although the delta function is a generalized function in every sense of the term, we follow
standard notation and do not add the G subscript—or add the quotes “ ”—used to label other
generalized functions in this chapter.
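
The generalized limit in (2.65b) can be probed numerically by integrating members of the Gaussian sequence against a fixed test function and watching the result approach $\phi(0)$, as (2.64b) requires. The sketch below is only illustrative; the test function $\phi(t) = \cos t + t$ and the names `gaussian_member` and `smeared_value` are assumptions made for this example. For this $\phi$, the exact value for finite $n$ is $e^{-1/(4n)}$, which tends to $\phi(0) = 1$.

```python
import math

def gaussian_member(t, n):
    """Member n of the sequence sqrt(n/pi) * exp(-n t^2) in (2.65b)."""
    return math.sqrt(n / math.pi) * math.exp(-n * t * t)

def phi(t):
    # Hypothetical test function, continuous at the origin with phi(0) = 1.
    return math.cos(t) + t

def smeared_value(n, steps=100000):
    """Midpoint-rule integral of gaussian_member(t, n) * phi(t) over |t| < 10."""
    lo, hi = -10.0, 10.0
    h = (hi - lo) / steps
    return h * sum(gaussian_member(lo + (k + 0.5) * h, n) * phi(lo + (k + 0.5) * h)
                   for k in range(steps))

for n in (1, 100, 10000):
    print(n, smeared_value(n), phi(0.0))
```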
Defining $\delta(t)$ with a functional, as in (2.64a), shows that this generalized function can be
used on an extremely large set of test functions: any true function that is continuous at the origin
is an acceptable and appropriate test function. The subset of test functions $\phi_{ab}$ used in Eqs.
(2.48d)–(2.48g) has $a < b$ with $\phi_{ab}(t)$ automatically set to zero when $t$ does not lie inside the
interval $a < t < b$. These functions can be used in (2.64b) to show that

$$\int_{-\infty}^{\infty} \delta(t)\,\phi_{ab}(t)\,dt = \phi_{ab}(0) = 0$$

when $a < b < 0$ or $0 < a < b$. Therefore, we have

$$\delta(t) = 0 \quad\text{for } t \neq 0 \tag{2.65c}$$

in the sense of definition (2.48f); that is, we know that

$$\int_{-\infty}^{\infty} \delta(t)\,\phi_{ab}(t)\,dt = \int_{-\infty}^{\infty} 0\cdot\phi_{ab}(t)\,dt = 0$$

for all test functions $\phi_{ab}$ where the interval $a < t < b$ does not include $t = 0$. This is a
mathematically exact way of stating the lower level of Eq. (2.62a). If $\delta(t)$ is defined using
generalized limits, as in Eqs. (2.65a) and (2.65b), then we must show why Eq. (2.64b) is true. The
sequence in (2.65b), for example, leads to

$$\begin{aligned}
\int_{-\infty}^{\infty} \phi(t)\,\Big[\mathop{\mathrm{Glim}}_{n\to\infty} \sqrt{\frac{n}{\pi}}\; e^{-n t^2}\Big]\,dt &= \lim_{n\to\infty} \int_{-\infty}^{\infty} \sqrt{\frac{n}{\pi}}\; e^{-n t^2}\,\phi(t)\,dt = \lim_{n\to\infty} \phi(0) \int_{-\infty}^{\infty} \sqrt{\frac{n}{\pi}}\; e^{-n t^2}\,dt \\
&= \phi(0) \lim_{n\to\infty} \int_{-\infty}^{\infty} \sqrt{\frac{n}{\pi}}\; e^{-n t^2}\,dt \\
&= \phi(0)
\end{aligned} \tag{2.66}$$

for any test function $\phi$. As $n$ gets large in (2.66), only the value of $\phi$ at $t = 0$ can contribute
significantly to the integral. Replacing $\phi(t)$ by $\phi(0)$ quickly reduces the whole expression to
$\phi(0)$, showing that the generalized limit of the sequence in (2.65b) is indeed the delta function.
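Some commonly used sequences that have the delta function as their generalized limits are

$$\delta(t) = \mathop{\mathrm{Glim}}_{n\to\infty} \frac{n/\pi}{1 + n^2 t^2} , \tag{2.67a}$$

$$\delta(t) = \mathop{\mathrm{Glim}}_{n\to\infty} \frac{\sin^2(nt)}{n\pi t^2} , \tag{2.67b}$$

$$\delta(t) = \mathop{\mathrm{Glim}}_{n\to\infty} \frac{\sin(2\pi n t)}{\pi t} , \tag{2.67c}$$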

and so on. Perhaps the most interesting of these sequences is (2.67c). We know from (2.65c) that
one important property of the delta function is

$$\int_{-\infty}^{\infty} \delta(t)\,\phi_{ab}(t)\,dt = 0$$

whenever the interval $a < t < b$ does not include $t = 0$. The reason that

$$\int_{-\infty}^{\infty} \Big[\mathop{\mathrm{Glim}}_{n\to\infty} \frac{\sin(2\pi n t)}{\pi t}\Big]\,\phi_{ab}(t)\,dt = \lim_{n\to\infty} \int_{-\infty}^{\infty} \Big[\frac{\sin(2\pi n t)}{\pi t}\Big]\,\phi_{ab}(t)\,dt = 0$$

when the interval $a < t < b$ does not include $t = 0$ is that for extremely large $n$ values the sine
oscillates rapidly between $+1$ and $-1$ while $\phi_{ab}(t)/t$ stays essentially constant for $t \neq 0$, averaging
the integrand to zero. Hence,

$$\mathop{\mathrm{Glim}}_{n\to\infty} \frac{\sin(2\pi n t)}{\pi t} = \delta(t) = 0 \quad\text{for } t \neq 0$$

for the same reason that

$$\mathop{\mathrm{Glim}}_{n\to\infty} e^{\pm i n t} = 0$$

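in Eq. (2.60g). To understand the behavior near $t = 0$, we construct function $\phi_{a0b}(t)$ in which the
interval $a < t < b$ does include $t = 0$. Now we can write, transforming the variable of integration
to $t' = 2\pi n t$,

$$\begin{aligned}
\int_{-\infty}^{\infty} \Big[\mathop{\mathrm{Glim}}_{n\to\infty} \frac{\sin(2\pi n t)}{\pi t}\Big]\,\phi_{a0b}(t)\,dt &= \lim_{n\to\infty} \frac{1}{\pi} \int_{-\infty}^{\infty} \Big[\frac{\sin(t')}{t'}\Big]\,\phi_{a0b}\!\left(\frac{t'}{2\pi n}\right) dt' \\
&= \phi_{a0b}(0) \lim_{n\to\infty} \frac{1}{\pi} \int_{-\infty}^{\infty} \Big[\frac{\sin(t')}{t'}\Big]\,dt' = \phi_{a0b}(0) = \int_{-\infty}^{\infty} \delta(t)\,\phi_{a0b}(t)\,dt ,
\end{aligned}$$

where in the second-to-last step we use (see any handbook of definite integrals)

$$\int_{-\infty}^{\infty} \frac{\sin(t')}{t'}\,dt' = \pi .$$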

Any arbitrary test function can be written as a function $\phi_{a0b}(t)$ whose interval of nonzero values
includes $t = 0$ plus other test functions whose intervals of nonzero values do not include $t = 0$;
that is, we can always write $\phi(t) = \phi_{a0b}(t) + [\text{other functions zero at the origin}]$. When this $\phi(t)$ is
multiplied by $\mathop{\mathrm{Glim}}_{n\to\infty} \sin(2\pi n t)/(\pi t)$ and integrated over $t$ between $-\infty$ and $+\infty$, we realize that the
value of the integral is $\phi_{a0b}(0) = \phi(0)$ because the other functions that are zero at the origin give
zero contribution to the integral as $n \to \infty$. Consequently,

$$\int_{-\infty}^{\infty} \Big[\mathop{\mathrm{Glim}}_{n\to\infty} \frac{\sin(2\pi n t)}{\pi t}\Big]\,\phi(t)\,dt = \phi(0) = \int_{-\infty}^{\infty} \delta(t)\,\phi(t)\,dt ,$$

indicating that the generalized limit of the sequence

$$\frac{\sin(2\pi n t)}{\pi t}$$

equals the delta function in the only sense that two generalized functions can ever be equal: the
integral of the left-hand side with any test function $\phi$ is always the same as the integral of the
right-hand side with any test function $\phi$ [see discussion after Eq. (2.47b)]. Figures 2.7(a)–2.7(c)
and 2.8(a)–2.8(c) plot the behavior of the $\sqrt{n/\pi}\cdot e^{-n t^2}$ and $(\pi t)^{-1}\sin(2\pi n t)$ sequences, showing the
two different ways these sequences change into delta functions.
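
A numerical sketch of the oscillating sequence (2.67c) complements the plots. It assumes a narrow Gaussian test function $\phi(t) = e^{-25 t^2}$, chosen (hypothetically, along with the function names) so that the approach to $\phi(0) = 1$ is visible over the first few values of $n$; for this particular $\phi$ the integral has the closed form $\mathrm{erf}(\pi n/5)$.

```python
import math

def sinc_member(t, n):
    """Member n of the sequence sin(2 pi n t) / (pi t) in (2.67c)."""
    if t == 0.0:
        return 2.0 * n  # limiting value at the origin
    return math.sin(2.0 * math.pi * n * t) / (math.pi * t)

def phi(t):
    # Hypothetical narrow test function with phi(0) = 1.
    return math.exp(-25.0 * t * t)

def sinc_smeared(n, steps=60000):
    """Midpoint-rule integral of sinc_member(t, n) * phi(t) over |t| < 3."""
    lo, hi = -3.0, 3.0
    h = (hi - lo) / steps
    return h * sum(sinc_member(lo + (k + 0.5) * h, n) * phi(lo + (k + 0.5) * h)
                   for k in range(steps))

for n in (1, 2, 5):
    # Second column is the closed form erf(pi * n / 5) for this phi.
    print(n, sinc_smeared(n), math.erf(math.pi * n / 5.0))
```

Unlike the Gaussian sequence, the members here never become small away from the origin; the convergence comes entirely from cancellation between neighboring oscillations.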
We note that for any odd test function $\phi_o(t)$

$$\int_{-\infty}^{\infty} \delta(t)\,\phi_o(t)\,dt = \phi_o(0) = 0$$

because, according to Eq. (2.12a), odd functions are zero at the origin. Therefore, from the
definitions of even and odd generalized functions in Eqs. (2.52d) and (2.52e), we conclude that
the delta function is an even generalized function because its integral with all odd test functions is
always zero. This means we can write [see Eq. (2.52f)]

$$\delta(-t) = \delta(t) . \tag{2.68a}$$

From the behavior of generalized functions specified in Eq. (2.52a), we have

$$\int_{-\infty}^{\infty} \delta(t - t_0)\,\phi(t)\,dt = \int_{-\infty}^{\infty} \delta(t)\,\phi(t + t_0)\,dt = \phi(t_0)$$

and, because the delta function equals the zero function for $t \neq 0$, this result can be written as

$$\int_{a}^{b} \delta(t - t_0)\,\phi(t)\,dt = \begin{cases} 0 & \text{for } a < b < t_0 \text{ or } t_0 < a < b \\ \phi(t_0) & \text{for } a < t_0 < b . \end{cases} \tag{2.68b}$$

FIGURES 2.7(a)–2.7(c). Plots against $t$ showing how $\sqrt{n/\pi}\; e^{-n t^2}$ changes into a delta function of $t$ as $n$ increases.

FIGURES 2.8(a)–2.8(c). Plots against $t$ showing how $(\pi t)^{-1}\sin(2\pi n t)$ changes into a delta function of $t$ as $n$ increases.

From Eq. (2.52b), we have

$$\int_{-\infty}^{\infty} \delta(c\cdot t)\,\phi(t)\,dt = \frac{1}{|c|} \int_{-\infty}^{\infty} \delta(t)\,\phi(t/c)\,dt = \frac{1}{|c|}\,\phi(0) ,$$

from which we conclude that

$$\delta(ct) = \frac{1}{|c|}\,\delta(t) \tag{2.68c}$$
because

$$\int_{-\infty}^{\infty} \delta(c\cdot t)\,\phi(t)\,dt = \frac{1}{|c|}\,\phi(0) = \int_{-\infty}^{\infty} \Big[\frac{1}{|c|}\,\delta(t)\Big]\,\phi(t)\,dt$$

for all test functions $\phi$. We note that this last rule, Eq. (2.68c), can also be used to show that the
delta function is even, since (2.68a) is just a special case of (2.68c) with $c = -1$.
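Equation (2.52c) shows that there is no difficulty handling a general linear transformation of
the delta function's argument, because for any two real constants $a$ and $c$, we have

$$\int_{-\infty}^{\infty} \delta(a\cdot t - c)\,\phi(t)\,dt = \frac{1}{|a|} \int_{-\infty}^{\infty} \delta(t)\,\phi\big((t + c)/a\big)\,dt = \frac{1}{|a|}\,\phi\!\left(\frac{c}{a}\right) = \int_{-\infty}^{\infty} \Big[\frac{1}{|a|}\,\delta\!\left(t - \frac{c}{a}\right)\Big]\,\phi(t)\,dt$$

for all test functions $\phi$. Consequently,

$$\delta(a\cdot t - c) = \frac{1}{|a|}\,\delta\!\left(t - \frac{c}{a}\right) . \tag{2.68d}$$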

This is the same answer we would get from factoring a out of the delta function argument and
then using (2.68c) to rescale the delta function.
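
Rule (2.68d) can be illustrated numerically by standing in for the delta function with a late member of the Gaussian sequence in (2.65b). In the sketch below, the value $n = 10^4$, the test function, and the names `delta_n` and `scaled_delta_integral` are all assumptions chosen for illustration; the integral of $\delta_n(a t - c)\,\phi(t)$ should then be close to $\phi(c/a)/|a|$.

```python
import math

def delta_n(x, n=10000.0):
    """Gaussian approximant sqrt(n/pi) * exp(-n x^2) from the sequence (2.65b)."""
    return math.sqrt(n / math.pi) * math.exp(-n * x * x)

def phi(t):
    # Hypothetical smooth test function.
    return 1.0 / (1.0 + t * t)

def scaled_delta_integral(a, c, steps=200000):
    """Midpoint-rule integral of delta_n(a*t - c) * phi(t) over |t| < 10."""
    lo, hi = -10.0, 10.0
    h = (hi - lo) / steps
    return h * sum(delta_n(a * (lo + (k + 0.5) * h) - c) * phi(lo + (k + 0.5) * h)
                   for k in range(steps))

for a, c in [(-2.0, 3.0), (4.0, 1.0)]:
    # Compare with the prediction phi(c/a) / |a| from (2.68d).
    print(a, c, scaled_delta_integral(a, c), phi(c / a) / abs(a))
```

A negative $a$ is included to show that the prefactor depends on $|a|$ while the location of the spike, $t = c/a$, keeps its sign.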
When the delta function is multiplied by a true function v(t), we have

5 5 5

³  (t  t )v(t ) (t )dt ³  (t  t ) v(t ) (t ) dt v(t ) (t ) ³  (t  t )v(t ) (t )dt


5
0
5
0 0 0
5
0 0

for any test function  , from which we conclude that

v(t ) A  (t  t0 ) v(t0 ) A  (t  t0 ) . (2.68e)

A useful generalization of (2.68d) is, for continuous true functions u(t),

1
  u (t )  ¦  (t  tk ) , (2.68f)
all k u 3(tk )

where u3(t ) du dt and t1 , t2 ,… are the values of t for which u (t ) 0 . This formula only makes
sense, of course, when u3(tk ) > 0 for t1 , t2 ,… . Perhaps the easiest way to see that (2.68f) must be


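true is to note that the delta function equals the zero function whenever its argument is not zero.
Therefore,

\[
\int_{-\infty}^{\infty} \delta\bigl(u(t)\bigr)\,\phi(t)\,dt
= \sum_{\text{all } k}\left[\int_{t_k - \varepsilon}^{t_k + \varepsilon} \delta\bigl(u(t)\bigr)\,\phi(t)\,dt\right] \tag{2.68g}
\]

with $\varepsilon > 0$ taken to be small enough that each interval $t_k - \varepsilon < t < t_k + \varepsilon$ only includes one of the
$t_k$ values for which u is zero. Nothing stops us from making $\varepsilon$ as small as we please, as long as
it does not become zero, and eventually each integral on the right-hand side of (2.68g) can be
written as

\[
\int_{t_k - \varepsilon}^{t_k + \varepsilon} \delta\bigl(u(t)\bigr)\,\phi(t)\,dt
= \int_{t_k - \varepsilon}^{t_k + \varepsilon} \delta\bigl((t - t_k)\,u'(t_k)\bigr)\,\phi(t)\,dt ,
\]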

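where we expand u as

\[
u(t) \cong u(t_k) + (t - t_k)\,u'(t_k) = (t - t_k)\,u'(t_k)
\]

since $u(t_k) = 0$. Next, we use (2.68d) to write

\[
\delta\bigl((t - t_k)\,u'(t_k)\bigr) = \frac{1}{|u'(t_k)|}\,\delta(t - t_k),
\]

so that

\[
\int_{t_k - \varepsilon}^{t_k + \varepsilon} \delta\bigl(u(t)\bigr)\,\phi(t)\,dt
= \int_{t_k - \varepsilon}^{t_k + \varepsilon} \left[\frac{1}{|u'(t_k)|}\,\delta(t - t_k)\right]\phi(t)\,dt
= \int_{-\infty}^{\infty} \left[\frac{1}{|u'(t_k)|}\,\delta(t - t_k)\right]\phi(t)\,dt .
\]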

Substitution of this result back into (2.68g) gives

\[
\int_{-\infty}^{\infty} \delta\bigl(u(t)\bigr)\,\phi(t)\,dt
= \sum_{\text{all } k}\int_{-\infty}^{\infty} \left[\frac{1}{|u'(t_k)|}\,\delta(t - t_k)\right]\phi(t)\,dt
= \int_{-\infty}^{\infty} \left[\sum_{\text{all } k}\frac{1}{|u'(t_k)|}\,\delta(t - t_k)\right]\phi(t)\,dt
\]

for all test functions $\phi$. This justifies Eq. (2.68f) according to the definition for the equality of
generalized functions [see Eqs. (2.47a) and (2.47b)].
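
As a numerical illustration of (2.68f), the sketch below uses $u(t) = t^2 - 1$, which has zeros at $t = \pm 1$ with $|u'(\pm 1)| = 2$. The Gaussian stand-in for the delta function, the choice of `u` and `phi`, and the helper names are assumptions made for this example only.

```python
import math

def gaussian_delta(x, n):
    """Gaussian delta sequence sqrt(n/pi) * exp(-n * x**2); unit area for every n."""
    return math.sqrt(n / math.pi) * math.exp(-n * x * x)

def midpoint_integral(func, a, b, steps):
    """Composite midpoint rule for the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

def u(t):
    """Illustrative argument function with simple zeros at t = -1 and t = +1."""
    return t * t - 1.0

def u_prime(t):
    return 2.0 * t

def phi(t):
    """Illustrative test function, deliberately not symmetric about t = 0."""
    return (2.0 + t) * math.exp(-0.25 * t * t)

def composed_delta_integral(n=1.0e4):
    """Approximate the integral of delta(u(t)) * phi(t) over all t."""
    return midpoint_integral(lambda t: gaussian_delta(u(t), n) * phi(t),
                             -3.0, 3.0, 300_000)

def sum_over_zeros():
    """Right-hand side implied by (2.68f): sum of phi(t_k) / |u'(t_k)|."""
    return sum(phi(tk) / abs(u_prime(tk)) for tk in (-1.0, 1.0))

if __name__ == "__main__":
    print(composed_delta_integral(), sum_over_zeros())
```

The two printed values agree more closely as `n` grows, since the linear expansion of u near each zero becomes a better approximation over the shrinking width of the Gaussian.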


2.15 Derivatives of the Delta Function


We have already remarked that the set of test functions for $\delta(t)$ contains all functions that are
continuous at the origin. Changing the argument of the delta function changes the set of
appropriate test functions. In Eq. (2.68b), for example, the test functions must be continuous at
$t = t_0$; in (2.68d) they must be continuous at $t = c/a$; and in (2.68f) they must be continuous at
all $t = t_k$. When Eq. (2.53b) is used to define the derivative of a delta function, $\delta'(t)$, we have

\[
\int_{-\infty}^{\infty} \delta'(t)\,\phi(t)\,dt = -\int_{-\infty}^{\infty} \delta(t)\,\phi'(t)\,dt = -\phi'(0), \tag{2.69a}
\]

which shows that now the first derivative of all the test functions must be continuous at the
origin. If we start out with a test function $\phi_{ab}(t)$ that must be identically zero for all $t < a$ and for
all $t > b$, then Eq. (2.69a) becomes

\[
\int_{-\infty}^{\infty} \delta'(t)\,\phi_{ab}(t)\,dt = -\int_{-\infty}^{\infty} \delta(t)\,\phi_{ab}'(t)\,dt = -\phi_{ab}'(0) = 0
\]

whenever the interval $a < t < b$ does not contain the origin. Hence, we can write

\[
\int_{-\infty}^{\infty} \delta'(t)\,\phi_{ab}(t)\,dt = 0 = \int_{-\infty}^{\infty} 0\cdot\phi_{ab}(t)\,dt
\]

for $a < b < 0$ or $0 < a < b$, showing that $\delta'(t)$ equals the zero function in the sense of Eq. (2.48f)
for $t \neq 0$. Equation (2.52a) can be used in conjunction with (2.53b) to evaluate $\delta'(t)$ when it is
shifted from the origin by an amount $t_0$,

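\[
\int_{-\infty}^{\infty} \delta'(t - t_0)\,\phi(t)\,dt
= \int_{-\infty}^{\infty} \delta'(t)\,\phi(t + t_0)\,dt
= -\int_{-\infty}^{\infty} \delta(t)\,\phi'(t + t_0)\,dt
= -\phi'(t_0), \tag{2.69b}
\]

where now we require the first derivative of the test functions to be continuous at $t = t_0$. This
result can be applied to test functions $\phi_{ab}(t)$ to get

\[
\int_{-\infty}^{\infty} \delta'(t - t_0)\,\phi_{ab}(t)\,dt
= \int_{-\infty}^{\infty} \delta'(t)\,\phi_{ab}(t + t_0)\,dt
= -\int_{-\infty}^{\infty} \delta(t)\,\phi_{ab}'(t + t_0)\,dt
= -\phi_{ab}'(t_0) = 0
\]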

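whenever the interval $a < t < b$ does not contain $t = t_0$. Therefore,

\[
\int_{-\infty}^{\infty} \delta'(t - t_0)\,\phi_{ab}(t)\,dt = 0 = \int_{-\infty}^{\infty} 0\cdot\phi_{ab}(t)\,dt
\]

whenever $a < b < t_0$ or $t_0 < a < b$, showing that $\delta'(t - t_0)$ equals the zero function [in the sense of
Eq. (2.48f)] for $t \neq t_0$. Equations (2.52a) and (2.53b) can be applied any number of times to get
$\delta^{(n)}$, the nth derivative of the delta function, shifted away from the origin by an amount $t_0$. We
have

\[
\int_{-\infty}^{\infty} \delta^{(n)}(t - t_0)\,\phi(t)\,dt
= -\int_{-\infty}^{\infty} \delta^{(n-1)}(t)\,\phi^{(1)}(t + t_0)\,dt
= \int_{-\infty}^{\infty} \delta^{(n-2)}(t)\,\phi^{(2)}(t + t_0)\,dt = \cdots ,
\]

which eventually becomes

\[
\int_{-\infty}^{\infty} \delta^{(n)}(t - t_0)\,\phi(t)\,dt = (-1)^n\,\phi^{(n)}(t_0) = (-1)^n \left.\frac{d^n\phi}{dt^n}\right|_{t = t_0}. \tag{2.69c}
\]

Again, this latest result can be applied to test functions $\phi_{ab}(t)$ to get

\[
\int_{-\infty}^{\infty} \delta^{(n)}(t - t_0)\,\phi_{ab}(t)\,dt = (-1)^n\,\phi_{ab}^{(n)}(t_0) = 0
\]

whenever the interval $a < t < b$ does not contain $t = t_0$. Because

\[
\int_{-\infty}^{\infty} \delta^{(n)}(t - t_0)\,\phi_{ab}(t)\,dt = 0 = \int_{-\infty}^{\infty} 0\cdot\phi_{ab}(t)\,dt
\]

whenever $t = t_0$ lies outside this interval, we end up with [using the definition of equality in
(2.48f)]

\[
\delta^{(n)}(t - t_0) = 0 \quad \text{for } t \neq t_0 . \tag{2.69d}
\]

The test functions integrated with $\delta^{(n)}(t - t_0)$ must, of course, have their nth derivatives
continuous at $t = t_0$.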
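
The sifting rule for derivatives of the delta function, $\int \delta^{(n)}(t - t_0)\,\phi(t)\,dt = (-1)^n \phi^{(n)}(t_0)$, can be illustrated numerically for n = 1 and n = 2 by differentiating a narrow Gaussian analytically. This is a sketch under assumptions: the Gaussian delta sequence, the test function $\sin t$, the point $t_0 = 0.3$, and all function names are choices made here for illustration.

```python
import math

def gaussian_delta_derivative(t, n, order):
    """order-th derivative (0, 1, or 2) in t of sqrt(n/pi) * exp(-n * t**2)."""
    g = math.sqrt(n / math.pi) * math.exp(-n * t * t)
    if order == 0:
        return g
    if order == 1:
        return -2.0 * n * t * g
    if order == 2:
        return (4.0 * n * n * t * t - 2.0 * n) * g
    raise ValueError("only orders 0, 1, and 2 are implemented in this sketch")

def midpoint_integral(func, a, b, steps):
    """Composite midpoint rule for the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

def shifted_derivative_integral(order, t0, n=1.0e4):
    """Approximate the integral of delta^(order)(t - t0) * sin(t) over all t."""
    return midpoint_integral(
        lambda t: gaussian_delta_derivative(t - t0, n, order) * math.sin(t),
        t0 - 0.5, t0 + 0.5, 100_000)

def expected_value(order, t0):
    """(-1)**order times the order-th derivative of sin at t0."""
    derivatives = (math.sin(t0), math.cos(t0), -math.sin(t0))
    return (-1.0) ** order * derivatives[order]

if __name__ == "__main__":
    t0 = 0.3
    for order in (0, 1, 2):
        print(order, shifted_derivative_integral(order, t0), expected_value(order, t0))
```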




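We define the function $\Theta(t)$ to be

\[
\Theta(t) = \begin{cases} 1 & \text{for } t > 0 \\ 1/2 & \text{for } t = 0 \\ 0 & \text{for } t < 0 \end{cases} . \tag{2.70a}
\]

Function $\Theta$ is often called the Heaviside step function. If we take

\[
\Theta^{(1)}(t) = \frac{d}{dt}\,\Theta(t) \tag{2.70b}
\]

to be the first derivative of the $\Theta$ function, then $\Theta^{(1)}(t) = 0$ for all $t \neq 0$. To evaluate $\Theta^{(1)}(t)$ at
the origin, we decide to turn $\Theta(t)$ and $\Theta^{(1)}(t)$ into generalized functions that we call "$\Theta(t)$" and
"$\Theta^{(1)}(t)$" respectively. We define

\[
\int_{-\infty}^{\infty} "\Theta(t)"\,\phi(t)\,dt = \int_{-\infty}^{\infty} \Theta(t)\,\phi(t)\,dt
\]

for all test functions $\phi$, which means that, according to Eqs. (2.48b) and (2.48c),

\[
"\Theta(t)" = \Theta(t) . \tag{2.70c}
\]

Having established the generalized function "$\Theta(t)$", we know from Eq. (2.53b) that the
generalized function "$\Theta^{(1)}(t)$" must satisfy

\[
\int_{-\infty}^{\infty} "\Theta^{(1)}(t)"\,\phi(t)\,dt = -\int_{-\infty}^{\infty} "\Theta(t)"\,\phi'(t)\,dt . \tag{2.70d}
\]

A formal integration by parts of the left-hand side gives

\[
\int_{-\infty}^{\infty} "\Theta^{(1)}(t)"\,\phi(t)\,dt = "\Theta(t)"\cdot\phi(t)\Big|_{-\infty}^{\infty} - \int_{-\infty}^{\infty} "\Theta(t)"\,\phi'(t)\,dt .
\]

This becomes, using (2.70c) to remove the double quotes,

\[
\begin{aligned}
\int_{-\infty}^{\infty} "\Theta^{(1)}(t)"\,\phi(t)\,dt
&= \Bigl[\lim_{t\to\infty}\phi(t)\Bigr] - \int_{0}^{\infty} \Theta(t)\,\phi'(t)\,dt \\
&= \Bigl[\lim_{t\to\infty}\phi(t)\Bigr] + \phi(0) - \Bigl[\lim_{t\to\infty}\phi(t)\Bigr] \\
&= \phi(0) = \int_{-\infty}^{\infty} \delta(t)\,\phi(t)\,dt .
\end{aligned}
\]

Hence, for all test functions $\phi$ continuous at the origin (note that they do not have to approach
zero at $\infty$), we have

\[
\int_{-\infty}^{\infty} "\Theta^{(1)}(t)"\,\phi(t)\,dt = \int_{-\infty}^{\infty} \delta(t)\,\phi(t)\,dt ,
\]

so

\[
"\Theta^{(1)}(t)" = \frac{d}{dt}\,"\Theta(t)" = \delta(t) \tag{2.70e}
\]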
in the sense of Eq. (2.47b). There is nothing unique about the Heaviside step function. We can
also show, using the generalized function "sgn(t )" introduced in Eqs. (2.60a) and (2.60b) above,
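that for any test function $\phi$

\[
\int_{-\infty}^{\infty} \frac{1}{2}\,"\mathrm{sgn}^{(1)}(t)"\,\phi(t)\,dt = \int_{-\infty}^{\infty} \delta(t)\,\phi(t)\,dt , \tag{2.70f}
\]

where "$\mathrm{sgn}^{(1)}(t)$" is the first derivative of "$\mathrm{sgn}(t)$". To show this is true, we do a formal
integration by parts,

\[
\int_{-\infty}^{\infty} \frac{1}{2}\,"\mathrm{sgn}^{(1)}(t)"\,\phi(t)\,dt
= \frac{1}{2}\,"\mathrm{sgn}(t)"\cdot\phi(t)\Big|_{-\infty}^{\infty} - \frac{1}{2}\int_{-\infty}^{\infty} "\mathrm{sgn}(t)"\,\phi'(t)\,dt .
\]

This becomes, using Eqs. (2.60b) and (2.42c),

\[
\begin{aligned}
\int_{-\infty}^{\infty} \frac{1}{2}\,"\mathrm{sgn}^{(1)}(t)"\,\phi(t)\,dt
&= \frac{1}{2}\Bigl[\lim_{t\to\infty}\phi(t)\Bigr] + \frac{1}{2}\Bigl[\lim_{t\to-\infty}\phi(t)\Bigr]
 + \frac{1}{2}\int_{-\infty}^{0}\phi'(t)\,dt - \frac{1}{2}\int_{0}^{\infty}\phi'(t)\,dt \\
&= \frac{1}{2}\Bigl[\lim_{t\to\infty}\phi(t)\Bigr] + \frac{1}{2}\Bigl[\lim_{t\to-\infty}\phi(t)\Bigr]
 - \frac{1}{2}\Bigl[\lim_{t\to-\infty}\phi(t)\Bigr] + \frac{1}{2}\phi(0) + \frac{1}{2}\phi(0)
 - \frac{1}{2}\Bigl[\lim_{t\to\infty}\phi(t)\Bigr] \\
&= \phi(0) = \int_{-\infty}^{\infty} \delta(t)\,\phi(t)\,dt .
\end{aligned}
\]

This shows Eq. (2.70f) is true. Again, we get a formula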


\[
\frac{1}{2}\,"\mathrm{sgn}^{(1)}(t)" = \delta(t) \tag{2.70g}
\]

in the sense of Eq. (2.47b), where the only major restriction on the test functions is that they be
continuous at the origin.

2.16 Fourier Transform of the Delta Function


To find the Fourier transform of the delta function, we construct two sequences of functions
having the relationship specified in (2.59a)–(2.59g) above. It is easiest to start with the delta-
function sequence in Eq. (2.67c). Any standard table of Fourier transforms gives23

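\[
\int_{-\infty}^{\infty} \frac{\sin(2\pi n t)}{\pi t}\,e^{-2\pi i f t}\,dt
= \mathcal{F}^{(-ift)}\!\left(\frac{\sin(2\pi n t)}{\pi t}\right) = \Pi(f, n)
\]

and

\[
\int_{-\infty}^{\infty} e^{2\pi i f t}\,\Pi(f, n)\,df = \mathcal{F}^{(ift)}\bigl(\Pi(f, n)\bigr) = \frac{\sin(2\pi n t)}{\pi t},
\]

so that

\[
\frac{\sin(2\pi n t)}{\pi t} \leftrightarrow \Pi(f, n) . \tag{2.71a}
\]

Here $\Pi(f, n)$ is the rectangle function that equals 1 for $|f| < n$ and 0 for $|f| > n$.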

Although Eq. (2.71a) holds true for all real n, it is here used only for integer values of n. We
know from (2.67c) that the generalized limit as $n \to \infty$ of the left-hand side of (2.71a) is $\delta(t)$,
but what is the corresponding generalized limit of the right-hand side? We have

\[
\int_{-\infty}^{\infty} \phi(f)\Bigl[\operatorname*{G\,lim}_{n\to\infty} \Pi(f, n)\Bigr]\,df
= \lim_{n\to\infty}\int_{-\infty}^{\infty} \Pi(f, n)\,\phi(f)\,df
= \lim_{n\to\infty}\int_{-n}^{n} \phi(f)\,df
= \int_{-\infty}^{\infty} 1\cdot\phi(f)\,df
\]

for any test function $\phi$. This shows that

\[
\operatorname*{G\,lim}_{n\to\infty} \Pi(f, n) = 1 ,
\]

which is no surprise. Therefore, taking the generalized limit as $n \to \infty$ of both sides of (2.71a)

23
Jack D. Gaskill, Linear Systems, Fourier Transforms, and Optics (John Wiley & Sons, New York, 1978), p. 201,
with the sinc, rect function pair corresponding to formula (2.71a) above.


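gives

\[
\delta(t) \leftrightarrow 1 , \tag{2.71b}
\]

or, restating this result,

\[
\int_{-\infty}^{\infty} \delta(t)\,e^{-2\pi i f t}\,dt = 1 \tag{2.71c}
\]

and

\[
\int_{-\infty}^{\infty} e^{2\pi i f t}\,df = \delta(t) . \tag{2.71d}
\]

Equation (2.71c) is just what we expect from Eq. (2.64b), since

\[
e^{-2\pi i f\cdot 0} = 1 ;
\]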

but Eq. (2.71d) is true only in the sense of Eq. (2.47b), and it is only safe to substitute freely from
(2.71d) when the substitution takes place inside an integral.
Because the sine is an odd function of its argument, we have according to Eq. (2.17), and
assuming the integral is a Cauchy principal value, that

\[
\int_{-\infty}^{\infty} \sin(2\pi f t)\,df = 0 .
\]

Therefore, Eq. (2.71d) becomes, using Eq. (2.19) and that the cosine is even,

\[
\int_{-\infty}^{\infty} \bigl[\cos(2\pi f t) + i\sin(2\pi f t)\bigr]\,df = 2\int_{0}^{\infty} \cos(2\pi f t)\,df = \delta(t) .
\]

Since the integral over the sine always disappears, we can also write

\[
\int_{-\infty}^{\infty} \bigl[\cos(2\pi f t) \pm i\sin(2\pi f t)\bigr]\,df = \int_{-\infty}^{\infty} e^{\pm 2\pi i f t}\,df = \delta(t) .
\]

Hence, two additional formulas for the delta function are

\[
2\int_{0}^{\infty} \cos(2\pi f t)\,df = \delta(t) \tag{2.71e}
\]


and

\[
\int_{-\infty}^{\infty} e^{\pm 2\pi i f t}\,df = \delta(t) . \tag{2.71f}
\]

As was the case for Eq. (2.71d), these formulas are meant to be used inside integrals.
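
To see what "used inside integrals" means in practice for (2.71e), the sketch below truncates the frequency integral at a finite upper limit F, for which $2\int_0^F \cos(2\pi f t)\,df = \sin(2\pi F t)/(\pi t)$, and integrates the result against a test function. The cutoff F, the Gaussian test function, the integration window, and the function names are assumptions introduced for this illustration.

```python
import math

def truncated_cosine_integral(t, F):
    """Closed form of 2 * integral_0^F cos(2*pi*f*t) df = sin(2*pi*F*t) / (pi*t)."""
    if t == 0.0:
        return 2.0 * F  # limiting value as t -> 0
    return math.sin(2.0 * math.pi * F * t) / (math.pi * t)

def midpoint_integral(func, a, b, steps):
    """Composite midpoint rule for the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

def phi(t):
    """Illustrative test function with phi(0) = 1 (an assumption)."""
    return math.exp(-t * t)

def smeared_value(F):
    """Integral of the truncated kernel against phi; tends to phi(0) as F grows."""
    return midpoint_integral(lambda t: truncated_cosine_integral(t, F) * phi(t),
                             -8.0, 8.0, 160_000)

if __name__ == "__main__":
    for F in (0.25, 1.0, 5.0):
        print(F, smeared_value(F), phi(0.0))
```

For small F the printed integral falls short of $\phi(0)$; as F increases it approaches $\phi(0)$, which is the sense in which the untruncated integral acts like $\delta(t)$.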

2.17 Fourier Convolution Theorem with Generalized Functions


Now that we have defined what is meant by the Fourier transform of a generalized function, it is
surprisingly easy to show that the Fourier convolution theorem holds for the product of a
generalized function and a true function.
We start with two sequences of true functions, one of them labeled with a superscript minus
sign for reasons that will shortly become apparent, called

\[
v_1(t), v_2(t), \ldots, v_n(t), \ldots \quad\text{and}\quad V_1^{(-)}(f), V_2^{(-)}(f), \ldots, V_n^{(-)}(f), \ldots .
\]

If these two sequences obey the relationship

\[
\begin{aligned}
v_1(t) &\leftrightarrow V_1^{(-)}(f) \\
v_2(t) &\leftrightarrow V_2^{(-)}(f) \\
&\;\;\vdots \\
v_n(t) &\leftrightarrow V_n^{(-)}(f) \\
&\;\;\vdots
\end{aligned}
\]

we know from Eq. (2.59g) that the generalized functions $v_G(t)$ and $V_G^{(-)}(f)$ specified by

\[
v_G(t) = \operatorname*{G\,lim}_{n\to\infty} v_n(t) \tag{2.72a}
\]

and

\[
V_G^{(-)}(f) = \operatorname*{G\,lim}_{n\to\infty} V_n^{(-)}(f) \tag{2.72b}
\]

form a Fourier transform pair,

\[
v_G(t) \leftrightarrow V_G^{(-)}(f) . \tag{2.72c}
\]

We also suppose that there exists a third sequence of true functions labeled with a superscript
plus sign,

\[
V_1^{(+)}(t), V_2^{(+)}(t), \ldots, V_n^{(+)}(t), \ldots ,
\]

such that

\[
\begin{aligned}
V_1^{(+)}(t) &\leftrightarrow v_1(f) \\
V_2^{(+)}(t) &\leftrightarrow v_2(f) \\
&\;\;\vdots \\
V_n^{(+)}(t) &\leftrightarrow v_n(f) \\
&\;\;\vdots
\end{aligned}
\]

If this third sequence has a generalized function as its generalized limit,

\[
V_G^{(+)}(t) = \operatorname*{G\,lim}_{n\to\infty} V_n^{(+)}(t) , \tag{2.72d}
\]

then the generalized functions $V_G^{(+)}(t)$ and $v_G(f)$ are also a Fourier transform pair,

\[
V_G^{(+)}(t) \leftrightarrow v_G(f) . \tag{2.72e}
\]

Definitions (2.72b) and (2.72d) taken together show that

\[
V_G^{(\pm)}(f) = \operatorname*{G\,lim}_{n\to\infty} V_n^{(\pm)}(f) , \tag{2.72f}
\]

where we have replaced t by f in (2.72d); and Eqs. (2.72c) and (2.72e) taken together give

\[
V_G^{(\pm)}(f) = \mathcal{F}^{(\pm ift)}\bigl(v_G(t)\bigr) , \tag{2.72g}
\]

where we have interchanged the roles of t and f in Eq. (2.72e).


From the Fourier convolution theorem for true functions [see Eq. (2.39j)], it follows that for
any true function u(t)

\[
\mathcal{F}^{(\pm ift)}\bigl(u(t)\cdot v_n(t)\bigr) = \mathcal{F}^{(\pm ift')}\bigl(u(t')\bigr) \ast \mathcal{F}^{(\pm ift'')}\bigl(v_n(t'')\bigr)
\]

or

\[
\int_{-\infty}^{\infty} e^{\pm 2\pi i f t}\,u(t)\,v_n(t)\,dt = \int_{-\infty}^{\infty} U^{(\pm)}(f')\,V_n^{(\pm)}(f - f')\,df' ,
\]

where

\[
U^{(\pm)}(f) = \int_{-\infty}^{\infty} e^{\pm 2\pi i f t}\,u(t)\,dt \quad\text{and}\quad V_n^{(\pm)}(f) = \int_{-\infty}^{\infty} e^{\pm 2\pi i f t}\,v_n(t)\,dt .
\]

The integral formula for $V_n^{(\pm)}(f)$ just restates the definitions given to $V_n^{(-)}$ and $V_n^{(+)}$
above. Taking the limit of both sides as $n \to \infty$ gives


\[
\lim_{n\to\infty}\int_{-\infty}^{\infty} e^{\pm 2\pi i f t}\,u(t)\,v_n(t)\,dt
= \lim_{n\to\infty}\int_{-\infty}^{\infty} U^{(\pm)}(f')\,V_n^{(\pm)}(f - f')\,df'
\]

or, moving the limiting process inside the integral so that it becomes a generalized limit [see
discussion after Eq. (2.56a)],

\[
\int_{-\infty}^{\infty} e^{\pm 2\pi i f t}\,u(t)\,\operatorname*{G\,lim}_{n\to\infty} v_n(t)\,dt
= \int_{-\infty}^{\infty} U^{(\pm)}(f')\,\operatorname*{G\,lim}_{n\to\infty} V_n^{(\pm)}(f - f')\,df' .
\]

From the definitions of $v_G(t)$ and $V_G^{(\pm)}(f)$ [see Eqs. (2.72a) and (2.72f)], we get

\[
\int_{-\infty}^{\infty} e^{\pm 2\pi i f t}\,u(t)\,v_G(t)\,dt = \int_{-\infty}^{\infty} U^{(\pm)}(f')\,V_G^{(\pm)}(f - f')\,df' ,
\]

which becomes

\[
\int_{-\infty}^{\infty} e^{\pm 2\pi i f t}\,u(t)\,v_G(t)\,dt = U^{(\pm)}(f) \ast V_G^{(\pm)}(f) \tag{2.72h}
\]

or, substituting from Eq. (2.72g),

\[
\mathcal{F}^{(\pm ift)}\bigl(u(t)\cdot v_G(t)\bigr) = \mathcal{F}^{(\pm ift')}\bigl(u(t')\bigr) \ast \mathcal{F}^{(\pm ift'')}\bigl(v_G(t'')\bigr) . \tag{2.72i}
\]

Consulting Eq. (2.55b) above, we note that convolution with a generalized function is
commutative, just like the convolution of two standard functions, so Eqs. (2.72h) and (2.72i) can
also be written as

\[
\int_{-\infty}^{\infty} e^{\pm 2\pi i f t}\,u(t)\,v_G(t)\,dt = V_G^{(\pm)}(f) \ast U^{(\pm)}(f) \tag{2.72j}
\]

and

\[
\mathcal{F}^{(\pm ift)}\bigl(u(t)\cdot v_G(t)\bigr) = \mathcal{F}^{(\pm ift'')}\bigl(v_G(t'')\bigr) \ast \mathcal{F}^{(\pm ift')}\bigl(u(t')\bigr) . \tag{2.72k}
\]

This establishes the generalized-function counterpart to Eq. (2.39j) whenever $e^{\pm 2\pi i f t}u(t)$ and
$U^{(\pm)}(f)$ qualify as acceptable test functions. Since almost all well-behaved, continuous functions
are acceptable test functions when used with linear combinations of delta functions or the
derivatives of delta functions, Eqs. (2.72h) and (2.72i) are valid whenever $v_G(t)$ is a linear
combination of delta functions or the derivatives of delta functions.


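Establishing the Fourier convolution theorem in the other direction is even easier. We just
write, making the variable substitution $t'' = t - t'$ and remembering that the convolutions are
commutative,

\[
\begin{aligned}
\int_{-\infty}^{\infty} e^{\pm 2\pi i f t}\,[u(t) \ast v_G(t)]\,dt
&= \int_{-\infty}^{\infty} dt\, e^{\pm 2\pi i f t}\int_{-\infty}^{\infty} dt'\, u(t - t')\,\operatorname*{G\,lim}_{n\to\infty} v_n(t') \\
&= \int_{-\infty}^{\infty} dt'\,\operatorname*{G\,lim}_{n\to\infty} v_n(t')\int_{-\infty}^{\infty} dt\, u(t - t')\,e^{\pm 2\pi i f t} \\
&= \lim_{n\to\infty}\int_{-\infty}^{\infty} dt'\, v_n(t')\int_{-\infty}^{\infty} dt\, u(t - t')\,e^{\pm 2\pi i f t} \\
&= \lim_{n\to\infty}\int_{-\infty}^{\infty} dt'\, v_n(t')\,e^{\pm 2\pi i f t'}\int_{-\infty}^{\infty} dt''\, u(t'')\,e^{\pm 2\pi i f t''} \\
&= \Bigl[\int_{-\infty}^{\infty} e^{\pm 2\pi i f t'}\,\operatorname*{G\,lim}_{n\to\infty} v_n(t')\,dt'\Bigr]\cdot\Bigl[\int_{-\infty}^{\infty} u(t'')\,e^{\pm 2\pi i f t''}\,dt''\Bigr] .
\end{aligned}
\]

We conclude that

\[
\mathcal{F}^{(\pm ift)}\bigl(u(t) \ast v_G(t)\bigr) = \mathcal{F}^{(\pm ift')}\bigl(u(t')\bigr)\cdot\mathcal{F}^{(\pm ift'')}\bigl(v_G(t'')\bigr) , \tag{2.72l}
\]

showing that Eq. (2.39a) holds true for the convolution of a true function and a generalized
function as well as for the convolution of two true functions.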

2.18 The Shah Function


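The shah function, often written as III, can be defined as the generalized limit

\[
\mathrm{III}(t, T) = \frac{1}{T}\cdot\operatorname*{G\,lim}_{n\to\infty}
\frac{\sin\!\left(2\pi\dfrac{t}{T}\left(n + \dfrac{1}{2}\right)\right)}{\sin\!\left(\pi\dfrac{t}{T}\right)} . \tag{2.73}
\]

For any test function $\phi(t)$, we have

\[
\int_{-\infty}^{\infty} \phi(t)\,\operatorname*{G\,lim}_{n\to\infty}\left[\frac{\sin\bigl(2\pi t T^{-1}[n + (1/2)]\bigr)}{\sin\bigl(\pi t T^{-1}\bigr)}\right]dt
= \lim_{n\to\infty}\int_{-\infty}^{\infty} \phi(t)\left\{\frac{\sin\bigl(2\pi t T^{-1}[n + (1/2)]\bigr)}{\sin\bigl(\pi t T^{-1}\bigr)}\right\}dt . \tag{2.74a}
\]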


As n gets large in (2.74a), the term in braces { } oscillates ever more rapidly between +1 and −1,
causing the more slowly varying function $\phi$ to make only a negligible contribution to the
integral. The only place this might not hold true is at the isolated t values

\[
t = 0, \pm T, \pm 2T, \ldots . \tag{2.74b}
\]

It is easy to see why these isolated values are different. Suppose t differs from one of these
isolated values by only a small amount $\Delta t$ so that

\[
t = \Delta t \pm mT \quad\text{for } m = 0, 1, 2, \ldots . \tag{2.74c}
\]

Then the term in braces becomes

\[
\frac{\sin\bigl(2\pi(\Delta t \pm mT)T^{-1}[n + (1/2)]\bigr)}{\sin\bigl(\pi(\Delta t \pm mT)T^{-1}\bigr)}
= \frac{\sin\bigl(2\pi\Delta t\,T^{-1}[n + (1/2)] \pm 2\pi nm \pm \pi m\bigr)}{\sin\bigl(\pi\Delta t\,T^{-1} \pm \pi m\bigr)}
= \frac{\sin\bigl(2\pi\Delta t\,T^{-1}[n + (1/2)]\bigr)}{\sin\bigl(\pi\Delta t\,T^{-1}\bigr)} .
\]

To explain the last step, we note that the sine does not change when a ±nm number of 2π's is
added to its argument, and adding a ±m number of π's to the sine's argument either leaves the
sine unchanged (if m is even) or multiplies it by −1 (if m is odd). Since the sine values in both the
numerator and denominator have the same number of π's added to their arguments, we do not
care if m is odd because the factor of −1 cancels, leaving the sine ratio unchanged. As $\Delta t$ is taken
to be ever smaller in magnitude for a fixed value of n, there comes a time when the arguments of
both sines are small in magnitude, allowing each sine to be approximated by its argument. We
then have

\[
\frac{\sin\bigl(2\pi(\Delta t \pm mT)T^{-1}[n + (1/2)]\bigr)}{\sin\bigl(\pi(\Delta t \pm mT)T^{-1}\bigr)}
= \frac{\sin\bigl(2\pi\Delta t\,T^{-1}[n + (1/2)]\bigr)}{\sin\bigl(\pi\Delta t\,T^{-1}\bigr)}
\cong \frac{2\pi\Delta t\,T^{-1}[n + (1/2)]}{\pi\Delta t\,T^{-1}}
= 2\,[n + (1/2)] .
\]

Consequently, the peak values of the term in braces get ever larger at the isolated points in
(2.74b) as n increases, as shown in Figs. 2.9(a)–2.9(c). We see that the triangular peaks at the
isolated points in (2.74b) have widths equal to $T/(n + (1/2))$. As n gets ever larger, the term in
braces oscillates so rapidly between +1 and −1 compared to the test function $\phi$ that there is no
contribution made to the integral on the right-hand side of (2.74a) except at the isolated t values
shown in Figs. 2.9(a)–2.9(c). At these t values, we have

\[
\begin{aligned}
\lim_{n\to\infty}\int_{-\infty}^{\infty} \phi(t)\left\{\frac{\sin\bigl(2\pi t T^{-1}[n + (1/2)]\bigr)}{\sin\bigl(\pi t T^{-1}\bigr)}\right\}dt
&= \cdots + \phi(-T)\,(\text{area of triangular peak}) \\
&\quad + \phi(0)\,(\text{area of triangular peak}) \\
&\quad + \phi(T)\,(\text{area of triangular peak}) + \cdots \\
&= \frac{1}{2}\cdot\frac{T}{n + (1/2)}\cdot 2\,[n + (1/2)]\,\bigl(\cdots + \phi(-T) + \phi(0) + \phi(T) + \cdots\bigr) ,
\end{aligned}
\]

which simplifies to

\[
\lim_{n\to\infty}\int_{-\infty}^{\infty} \phi(t)\left\{\frac{\sin\bigl(2\pi t T^{-1}[n + (1/2)]\bigr)}{\sin\bigl(\pi t T^{-1}\bigr)}\right\}dt
= T\sum_{k=-\infty}^{\infty} \phi(kT) . \tag{2.75a}
\]

But $T\sum_{k=-\infty}^{\infty}\phi(kT)$ can be thought of as what we get when evaluating the integral

\[
\int_{-\infty}^{\infty} \phi(t)\left[T\sum_{k=-\infty}^{\infty} \delta(t - kT)\right]dt
= T\sum_{k=-\infty}^{\infty}\int_{-\infty}^{\infty} \delta(t - kT)\,\phi(t)\,dt
= T\sum_{k=-\infty}^{\infty} \phi(kT) .
\]

This lets us write (2.75a) as

\[
\lim_{n\to\infty}\int_{-\infty}^{\infty} \phi(t)\left\{\frac{\sin\bigl(2\pi t T^{-1}[n + (1/2)]\bigr)}{\sin\bigl(\pi t T^{-1}\bigr)}\right\}dt
= \int_{-\infty}^{\infty} \phi(t)\left[T\sum_{k=-\infty}^{\infty} \delta(t - kT)\right]dt \tag{2.75b}
\]

or, using (2.56a) to take the limit inside the integral as a generalized limit,

\[
\int_{-\infty}^{\infty} \phi(t)\,\operatorname*{G\,lim}_{n\to\infty}\left\{\frac{\sin\bigl(2\pi t T^{-1}[n + (1/2)]\bigr)}{\sin\bigl(\pi t T^{-1}\bigr)}\right\}dt
= \int_{-\infty}^{\infty} \phi(t)\left[T\sum_{k=-\infty}^{\infty} \delta(t - kT)\right]dt .
\]

Since this last result is true for any test function $\phi$, we conclude that

\[
\operatorname*{G\,lim}_{n\to\infty}\left\{\frac{\sin\bigl(2\pi t T^{-1}[n + (1/2)]\bigr)}{\sin\bigl(\pi t T^{-1}\bigr)}\right\}
= T\sum_{k=-\infty}^{\infty} \delta(t - kT) \tag{2.75c}
\]

in the sense of Eq. (2.47b). Comparison of this result to the definition of the shah function in Eq.
(2.73) above shows that

\[
\mathrm{III}(t, T) = \sum_{k=-\infty}^{\infty} \delta(t - kT) . \tag{2.75d}
\]

We note that variable t can be replaced by f in Eq. (2.75c) to get

\[
\operatorname*{G\,lim}_{n\to\infty}\left\{\frac{\sin\bigl(2\pi f T^{-1}[n + (1/2)]\bigr)}{\sin\bigl(\pi f T^{-1}\bigr)}\right\}
= T\sum_{k=-\infty}^{\infty} \delta(f - kT) .
\]

Parameter T is arbitrary throughout this derivation, so nothing stops us from replacing it by $T^{-1}$
everywhere to get

\[
\operatorname*{G\,lim}_{n\to\infty}\left\{\frac{\sin\bigl(2\pi f T[n + (1/2)]\bigr)}{\sin(\pi f T)}\right\}
= \frac{1}{T}\sum_{k=-\infty}^{\infty} \delta\!\left(f - \frac{k}{T}\right) . \tag{2.75e}
\]

This is another useful version of the formula in Eq. (2.75d).
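
The limit in (2.75a) can be illustrated numerically by integrating the sine ratio against a smooth test function at a finite n and comparing with $T\sum_k \phi(kT)$. The following is a sketch under assumptions: the Gaussian test function, the values of T and n, the finite integration window, and the function names are all choices made for this example.

```python
import math

def sine_ratio(t, T, n):
    """sin(2*pi*(t/T)*(n + 1/2)) / sin(pi*t/T), with its limit 2n + 1 at t = m*T."""
    denominator = math.sin(math.pi * t / T)
    if abs(denominator) < 1e-12:
        return 2.0 * n + 1.0
    return math.sin(2.0 * math.pi * (t / T) * (n + 0.5)) / denominator

def phi(t):
    """Illustrative rapidly decaying test function (an assumption)."""
    return math.exp(-t * t)

def kernel_integral(T, n, half_width=6.0, steps=120_000):
    """Midpoint-rule integral of phi(t) * sine_ratio(t) over a finite window."""
    h = 2.0 * half_width / steps
    return h * sum(phi(t) * sine_ratio(t, T, n)
                   for t in (-half_width + (i + 0.5) * h for i in range(steps)))

def sampled_sum(T, kmax=20):
    """T * sum of phi(k*T) over |k| <= kmax, the right-hand side of (2.75a)."""
    return T * sum(phi(k * T) for k in range(-kmax, kmax + 1))

if __name__ == "__main__":
    T = 0.8
    for n in (0, 1, 8):
        print(n, kernel_integral(T, n), sampled_sum(T))
```

For a test function this smooth, the integral settles onto the sampled sum after only a few values of n; a test function with more fine structure would need larger n.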

2.19 Fourier Transform of the Shah Function


To get the Fourier transform of the shah function, we construct the sequence of true functions
$G_1(t, T), G_2(t, T), \ldots, G_n(t, T), \ldots$ such that

\[
G_n(t, T) = \sum_{k=-n}^{n} g_n(t - kT) , \tag{2.76a}
\]

where

\[
g_n(t) = \frac{\sin\bigl(2\pi(n + 1)t\bigr)}{\pi t} . \tag{2.76b}
\]

From Eq. (2.67c), we have

\[
\operatorname*{G\,lim}_{n\to\infty} g_{n-1}(t) = \operatorname*{G\,lim}_{n\to\infty}\frac{\sin(2\pi n t)}{\pi t} = \delta(t) .
\]

[FIGURES 2.9(a), 2.9(b), and 2.9(c).]

The formula for the t interval between the arrows is $T/(n + 1/2)$ in all three plots. Figures 2.9(a), 2.9(b),
and 2.9(c) show how the base width of the central lobe becomes ever narrower as n increases.

Since adding one to n does not make any difference in the limit, we end up with

\[
\operatorname*{G\,lim}_{n\to\infty} g_n(t) = \delta(t) ; \tag{2.76c}
\]

and from (2.71a) we get, again adding one to n,

\[
\frac{\sin\bigl(2\pi(n + 1)t\bigr)}{\pi t} \leftrightarrow \Pi(f, n + 1) \quad\text{for } n = 1, 2, \ldots . \tag{2.76d}
\]

To find the generalized function that is the forward Fourier transform of the generalized limit of
$G_n$ as $n \to \infty$, we must evaluate the forward Fourier transform of $G_n$ for finite n,

\[
\begin{aligned}
\mathcal{F}^{(-ift)}\bigl(G_n(t)\bigr) &= \int_{-\infty}^{\infty} e^{-2\pi i f t}\,G_n(t)\,dt
= \sum_{k=-n}^{n}\int_{-\infty}^{\infty} e^{-2\pi i f t}\,g_n(t - kT)\,dt \\
&= \sum_{k=-n}^{n} e^{-2\pi i f kT}\int_{-\infty}^{\infty} e^{-2\pi i f t'}\,g_n(t')\,dt' ,
\end{aligned}
\]

where in the last step the variable of integration has been changed to $t' = t - kT$. The Fourier
transform inside the sum can be done using (2.76b) and (2.76d) to get

\[
\mathcal{F}^{(-ift)}\bigl(G_n(t)\bigr) = \Pi(f, n + 1)\cdot\sum_{k=-n}^{n} e^{-2\pi i f kT} . \tag{2.77a}
\]

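The sum $\sum_{k=-n}^{n} e^{-2\pi i f kT}$ is just a disguised form of geometric series. We can write

\[
\sum_{k=-n}^{n} e^{-2\pi i f kT} = \sum_{k=-n}^{n} w^{k} , \tag{2.77b}
\]

where

\[
w = e^{-2\pi i f T}
\]

and define

\[
S_n = \sum_{k=-n}^{n} w^{k} = \sum_{k=-n}^{n} e^{-2\pi i f kT} .
\]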


Using the standard approach for calculating the sum of a geometric series, we note that
multiplying every term in the sum by w increases each power of w in the sum by one. This is the
same as adding $w^{n+1}$ and subtracting $w^{-n}$ from the original sum, giving

\[
wS_n = \sum_{k=-n+1}^{n+1} w^{k} = S_n + w^{n+1} - w^{-n}
\]

or

\[
S_n = \frac{w^{n+1} - w^{-n}}{w - 1} .
\]

Hence, (2.77b) becomes

\[
\sum_{k=-n}^{n} e^{-2\pi i f kT}
= \frac{e^{-2\pi i f T(n+1)} - e^{-2\pi i f T(-n)}}{e^{-2\pi i f T} - 1}
= \frac{e^{-2\pi i f T[n + (1/2)]} - e^{2\pi i f T[n + (1/2)]}}{e^{-\pi i f T} - e^{\pi i f T}}
= \frac{\sin\bigl(2\pi f T[n + (1/2)]\bigr)}{\sin(\pi f T)} , \tag{2.77c}
\]

which means Eq. (2.77a) can be written as

\[
\mathcal{F}^{(-ift)}\bigl(G_n(t)\bigr) = \frac{\sin\bigl(2\pi f T[n + (1/2)]\bigr)}{\sin(\pi f T)}\,\Pi(f, n + 1) . \tag{2.77d}
\]
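
The closed form in (2.77c) is easy to confirm numerically by summing the exponentials directly. In the sketch below the function names and the sample values of f, T, and n are illustrative assumptions; any f and T whose product is not an integer avoid a zero denominator.

```python
import cmath
import math

def exponential_sum(f, T, n):
    """Direct evaluation of the sum over k = -n..n of exp(-2*pi*i*f*k*T)."""
    return sum(cmath.exp(-2j * math.pi * f * k * T) for k in range(-n, n + 1))

def sine_ratio(f, T, n):
    """Closed form sin(2*pi*f*T*(n + 1/2)) / sin(pi*f*T) from (2.77c)."""
    return math.sin(2.0 * math.pi * f * T * (n + 0.5)) / math.sin(math.pi * f * T)

if __name__ == "__main__":
    f, T, n = 0.37, 1.3, 7  # arbitrary values with f*T not an integer
    direct = exponential_sum(f, T, n)
    # The imaginary part of the direct sum cancels because the sum is symmetric in k.
    print(direct.real, direct.imag, sine_ratio(f, T, n))
```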

The inverse Fourier transform of the forward Fourier transform returns the original function [see
Eqs. (2.29b) and (2.29d)], so this last result lets us write

\[
G_n(t) \leftrightarrow \frac{\sin\bigl(2\pi f T[n + (1/2)]\bigr)}{\sin(\pi f T)}\,\Pi(f, n + 1) . \tag{2.77e}
\]

From the definition of the Fourier transform of a generalized function [see (2.59g)], we know that
taking the generalized limit of both sides of (2.77e) gives a Fourier transform relationship
between two generalized functions—all that needs to be done now is to find out what these
generalized functions are.
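To find the generalized function that is the generalized limit of $G_n$ as $n \to \infty$, we write for
any test function $\phi$, using Eq. (2.76a), that

\[
\begin{aligned}
\int_{-\infty}^{\infty} \phi(t)\Bigl[\operatorname*{G\,lim}_{n\to\infty} G_n(t)\Bigr]dt
&= \lim_{n\to\infty}\int_{-\infty}^{\infty} \phi(t)\,G_n(t)\,dt
= \lim_{n\to\infty}\int_{-\infty}^{\infty} \phi(t)\left[\sum_{k=-n}^{n} g_n(t - kT)\right]dt \\
&= \lim_{n\to\infty}\sum_{k=-n}^{n}\int_{-\infty}^{\infty} \phi(t)\,g_n(t - kT)\,dt .
\end{aligned} \tag{2.77f}
\]

Equation (2.76c) states that the generalized limit of $g_n$ is the delta function, so

\[
\lim_{n\to\infty}\int_{-\infty}^{\infty} \phi(t)\,g_n(t - kT)\,dt
= \int_{-\infty}^{\infty} \phi(t)\,\operatorname*{G\,lim}_{n\to\infty} g_n(t - kT)\,dt
= \int_{-\infty}^{\infty} \phi(t)\,\delta(t - kT)\,dt = \phi(kT) ,
\]

which means that

\[
\lim_{n\to\infty}\sum_{k=-n}^{n}\int_{-\infty}^{\infty} \phi(t)\,g_n(t - kT)\,dt = \sum_{k=-\infty}^{\infty} \phi(kT) .
\]

Hence, Eq. (2.77f) can be written as

\[
\int_{-\infty}^{\infty} \phi(t)\Bigl[\operatorname*{G\,lim}_{n\to\infty} G_n(t)\Bigr]dt = \sum_{k=-\infty}^{\infty} \phi(kT) . \tag{2.77g}
\]

But, just as in the discussion following Eq. (2.75a) above, we can regard

\[
\sum_{k=-\infty}^{\infty} \phi(kT)
\]

as the result of integrating the shah generalized function

\[
\mathrm{III}(t, T) = \sum_{k=-\infty}^{\infty} \delta(t - kT)
\]

with any test function $\phi$, since

\[
\int_{-\infty}^{\infty} \mathrm{III}(t, T)\,\phi(t)\,dt
= \int_{-\infty}^{\infty}\left[\sum_{k=-\infty}^{\infty} \delta(t - kT)\right]\phi(t)\,dt
= \sum_{k=-\infty}^{\infty} \phi(kT) .
\]

Therefore, (2.77g) can be written as

\[
\int_{-\infty}^{\infty} \phi(t)\Bigl[\operatorname*{G\,lim}_{n\to\infty} G_n(t)\Bigr]dt
= \int_{-\infty}^{\infty}\left[\sum_{k=-\infty}^{\infty} \delta(t - kT)\right]\phi(t)\,dt \tag{2.77h}
\]

for any test function $\phi$, showing that

\[
\operatorname*{G\,lim}_{n\to\infty} G_n(t) = \sum_{k=-\infty}^{\infty} \delta(t - kT) = \mathrm{III}(t, T) \tag{2.77i}
\]

in the sense of Eq. (2.47b).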


The generalized function that is the generalized limit of the right-hand side of (2.77e) is
multiplied by an arbitrary test function $\phi(f)$ and integrated over all f to get

\[
\begin{aligned}
\int_{-\infty}^{\infty} \phi(f)\left\{\operatorname*{G\,lim}_{n\to\infty}\left[\frac{\sin\bigl(2\pi f T[n + (1/2)]\bigr)}{\sin(\pi f T)}\,\Pi(f, n + 1)\right]\right\}df
&= \lim_{n\to\infty}\int_{-(n+1)}^{n+1} \phi(f)\left[\frac{\sin\bigl(2\pi f T[n + (1/2)]\bigr)}{\sin(\pi f T)}\right]df \\
&= \lim_{n\to\infty}\int_{-\infty}^{\infty} \phi(f)\left[\frac{\sin\bigl(2\pi f T[n + (1/2)]\bigr)}{\sin(\pi f T)}\right]df ,
\end{aligned} \tag{2.78a}
\]

where in the last step we recognize that the behavior of the sine ratio inside the square brackets
[ ] is not affected by the endpoints for the region of integration as $n \to \infty$. Equations (2.56a) and
(2.75e) show that

\[
\lim_{n\to\infty}\int_{-\infty}^{\infty} \phi(f)\left\{\frac{\sin\bigl(2\pi f T[n + (1/2)]\bigr)}{\sin(\pi f T)}\right\}df
= \int_{-\infty}^{\infty} \phi(f)\left[T^{-1}\sum_{k=-\infty}^{\infty} \delta(f - kT^{-1})\right]df ,
\]

which means that (2.78a) simplifies to

\[
\int_{-\infty}^{\infty} \phi(f)\left\{\operatorname*{G\,lim}_{n\to\infty}\left[\frac{\sin\bigl(2\pi f T[n + (1/2)]\bigr)}{\sin(\pi f T)}\,\Pi(f, n + 1)\right]\right\}df
= \int_{-\infty}^{\infty} \phi(f)\left[T^{-1}\sum_{k=-\infty}^{\infty} \delta(f - kT^{-1})\right]df
\]

for any test function $\phi(f)$. Therefore,

\[
\operatorname*{G\,lim}_{n\to\infty}\left[\frac{\sin\bigl(2\pi f T[n + (1/2)]\bigr)}{\sin(\pi f T)}\,\Pi(f, n + 1)\right]
= \frac{1}{T}\sum_{k=-\infty}^{\infty} \delta\!\left(f - \frac{k}{T}\right) \tag{2.78b}
\]

in the sense of Eq. (2.47b). Since the right-hand side of (2.78b) is, according to (2.75d),
proportional to the shah function, we end up with

\[
\frac{1}{T}\sum_{k=-\infty}^{\infty} \delta\!\left(f - \frac{k}{T}\right) = \frac{1}{T}\,\mathrm{III}(f, T^{-1}) . \tag{2.78c}
\]

Equations (2.78b) and (2.77i) let us take the generalized limits as $n \to \infty$ of both sides of (2.77e) to
get

\[
\sum_{k=-\infty}^{\infty} \delta(t - kT) \leftrightarrow \frac{1}{T}\sum_{k=-\infty}^{\infty} \delta\!\left(f - \frac{k}{T}\right) . \tag{2.78d}
\]

According to Eq. (2.75d), this can also be written as

\[
\mathrm{III}(t, T) \leftrightarrow \frac{1}{T}\,\mathrm{III}(f, T^{-1}) . \tag{2.78e}
\]

These last two results can be modified to show explicitly that both the forward and the inverse
Fourier transform of the shah function produce another shah function. We first write (2.78d) as
the forward and inverse Fourier transforms,

\[
\int_{-\infty}^{\infty} e^{-2\pi i f t}\left[\sum_{k=-\infty}^{\infty} \delta(t - kT)\right]dt
= \frac{1}{T}\sum_{j=-\infty}^{\infty} \delta\!\left(f - \frac{j}{T}\right) \tag{2.79a}
\]

and

\[
\int_{-\infty}^{\infty} e^{2\pi i f t}\left[\frac{1}{T}\sum_{j=-\infty}^{\infty} \delta\!\left(f - \frac{j}{T}\right)\right]df
= \sum_{k=-\infty}^{\infty} \delta(t - kT) . \tag{2.79b}
\]


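The discussion following Eq. (2.52c) above shows that linear transformations of the variables of
integration are allowed when using generalized functions, so we can change to $t' = -t$ in Eqs.
(2.79a) and (2.79b) to get

\[
\int_{-\infty}^{\infty} e^{2\pi i f t'}\left[\sum_{k=-\infty}^{\infty} \delta(-t' - kT)\right]dt'
= \frac{1}{T}\sum_{j=-\infty}^{\infty} \delta\!\left(f - \frac{j}{T}\right)
\]

and

\[
\int_{-\infty}^{\infty} e^{-2\pi i f t'}\left[\frac{1}{T}\sum_{j=-\infty}^{\infty} \delta\!\left(f - \frac{j}{T}\right)\right]df
= \sum_{k=-\infty}^{\infty} \delta(-t' - kT) .
\]

The sum over index k goes over all positive and negative integers, so we can change the sum's
index to $k' = -k$ and use that the delta function is even [see Eq. (2.68a)] to get

\[
\int_{-\infty}^{\infty} e^{2\pi i f t'}\left[\sum_{k'=-\infty}^{\infty} \delta(t' - k'T)\right]dt'
= \frac{1}{T}\sum_{j=-\infty}^{\infty} \delta\!\left(f - \frac{j}{T}\right)
\]

and

\[
\int_{-\infty}^{\infty} e^{-2\pi i f t'}\left[\frac{1}{T}\sum_{j=-\infty}^{\infty} \delta\!\left(f - \frac{j}{T}\right)\right]df
= \sum_{k'=-\infty}^{\infty} \delta(t' - k'T) .
\]

Dropping the primes and combining these results with Eqs. (2.79a) and (2.79b) produces the
more general formulas

\[
\int_{-\infty}^{\infty} e^{\pm 2\pi i f t}\left[\sum_{k=-\infty}^{\infty} \delta(t - kT)\right]dt
= \frac{1}{T}\sum_{j=-\infty}^{\infty} \delta\!\left(f - \frac{j}{T}\right) \tag{2.79c}
\]

and

\[
\int_{-\infty}^{\infty} e^{\pm 2\pi i f t}\left[\frac{1}{T}\sum_{j=-\infty}^{\infty} \delta\!\left(f - \frac{j}{T}\right)\right]df
= \sum_{k=-\infty}^{\infty} \delta(t - kT) . \tag{2.79d}
\]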

In fact, we can easily show that Eqs. (2.79c) and (2.79d) are really the same formula. First, we
interchange the j, k indices and the f, t variables in Eq. (2.79c) so that it becomes

\[
\int_{-\infty}^{\infty} e^{\pm 2\pi i f t}\left[\sum_{j=-\infty}^{\infty} \delta(f - jT)\right]df
= \frac{1}{T}\sum_{k=-\infty}^{\infty} \delta\!\left(t - \frac{k}{T}\right) .
\]

Parameter T is arbitrary, so, just like in the analysis following Eq. (2.75d) above, it can be
replaced everywhere by $T^{-1}$ to get

\[
\int_{-\infty}^{\infty} e^{\pm 2\pi i f t}\left[\sum_{j=-\infty}^{\infty} \delta\!\left(f - \frac{j}{T}\right)\right]df
= T\sum_{k=-\infty}^{\infty} \delta(t - kT) .
\]

After dividing through by T, we see that this last result is the same as Eq. (2.79d), showing that
Eqs. (2.79c) and (2.79d) are really the same formula.
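
One way to exercise (2.79a) numerically is to multiply the shah function by a true function u(t) before transforming. Combining (2.79a) with the convolution result (2.72h) then suggests $\sum_k u(kT)\,e^{-2\pi i f kT} = T^{-1}\sum_j U(f - j/T)$, where U is the forward transform of u. The sketch below checks this for a Gaussian; the relation as written here, the choice $u(t) = e^{-\pi t^2}$ (whose forward transform is $e^{-\pi f^2}$), the truncation limits, and the function names are assumptions of this illustration rather than formulas quoted from the text.

```python
import cmath
import math

def u(t):
    """Illustrative true function: a Gaussian."""
    return math.exp(-math.pi * t * t)

def U(f):
    """Forward transform of u with the exp(-2*pi*i*f*t) kernel (also a Gaussian)."""
    return math.exp(-math.pi * f * f)

def sampled_transform(f, T, kmax=60):
    """Transform of u(t) * III(t, T): sum of u(k*T) * exp(-2*pi*i*f*k*T)."""
    return sum(u(k * T) * cmath.exp(-2j * math.pi * f * k * T)
               for k in range(-kmax, kmax + 1))

def replicated_spectrum(f, T, jmax=60):
    """(1/T) * sum of U(f - j/T): U convolved with the transform of the shah function."""
    return sum(U(f - j / T) for j in range(-jmax, jmax + 1)) / T

if __name__ == "__main__":
    T = 0.7
    for f in (0.0, 0.3, 1.1):
        print(f, sampled_transform(f, T), replicated_spectrum(f, T))
```

Setting f = 0 reduces the check to the equality of $\sum_k u(kT)$ and $T^{-1}\sum_j U(j/T)$.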

2.20 Fourier Series


Integral Fourier transforms are connected in a direct and straightforward way to both the Fourier
series and the discrete Fourier transform. This section shows the connection to the Fourier series
and the next section shows the connection to the discrete Fourier transform.24
We begin with an arbitrary, nonpathological function u(t) that has a well-defined Fourier
integral transform. Function u can be complex-valued but its argument t must be real, and U(f) is
the forward Fourier transform of u(t), so

\[
U(f) = \mathcal{F}^{(-ift)}\bigl(u(t)\bigr) = \int_{-\infty}^{\infty} u(t)\,e^{-2\pi i f t}\,dt \tag{2.80a}
\]

and

\[
u(t) \leftrightarrow U(f) . \tag{2.80b}
\]

From u(t), we create a new function $u^{[\infty]}(t, T)$ that repeats forever along the t axis at intervals of
T,

\[
u^{[\infty]}(t, T) = \sum_{k=-\infty}^{\infty} u(t - kT) . \tag{2.81a}
\]

Although perhaps redundant, it turns out that listing T as one of the arguments of $u^{[\infty]}$ is a
convenient way to keep track of the connection between u and $u^{[\infty]}$. Function $u^{[\infty]}$ is called a
periodic function of period T because, for any finite positive or negative integer m,

\[
u^{[\infty]}(t + mT, T) = u^{[\infty]}(t, T) . \tag{2.81b}
\]

Figures 2.10(a) and 2.10(b) show the plots for both u and $u^{[\infty]}$ as functions of t. Since function u
is left unspecified, $u^{[\infty]}$ can be thought of as representing an arbitrary periodic function. We can
24
The analysis in Secs. 2.20 and 2.21 is adapted from A. Papoulis, Signal Analysis (McGraw-Hill Book Company,
New York, 1977), pp. 76–81.


also define a function $u^{[N]}(t, T)$ by the formula

\[
u^{[N]}(t, T) = \sum_{k=-N}^{N} u(t - kT) . \tag{2.81c}
\]

Clearly,

\[
\lim_{N\to\infty} u^{[N]}(t, T) = u^{[\infty]}(t, T) . \tag{2.81d}
\]

We assume that $u^{[N]}$ is well behaved with respect to the test functions $\phi$, so that

\[
\lim_{N\to\infty}\int_{-\infty}^{\infty} \phi(t)\,u^{[N]}(t, T)\,dt = \int_{-\infty}^{\infty} \phi(t)\,u^{[\infty]}(t, T)\,dt . \tag{2.81e}
\]

_____________________________________________________________________________

[FIGURE 2.10(a): plot of $u(t)$. FIGURE 2.10(b): plot of $u^{[\infty]}(t, T)$, with the interval T marked.]

Figure 2.10(a) is a plot of $u(t)$. The solid curve in Fig. 2.10(b), shifted upward from its true position, is
$u^{[\infty]}(t, T)$ and the dashed curves represent $u(t)$ displaced by multiples of T.


From (2.81e) and the definition of the generalized limit [see Eq. (2.56a)], we then know that

\[
\lim_{N\to\infty}\int_{-\infty}^{\infty} \phi(t)\,u^{[N]}(t, T)\,dt
= \int_{-\infty}^{\infty} \phi(t)\Bigl[\operatorname*{G\,lim}_{N\to\infty} u^{[N]}(t, T)\Bigr]dt
= \int_{-\infty}^{\infty} \phi(t)\,u^{[\infty]}(t, T)\,dt ,
\]

from which it follows that

\[
\operatorname*{G\,lim}_{N\to\infty} u^{[N]}(t, T) = u^{[\infty]}(t, T) \tag{2.81f}
\]

in the sense of Eq. (2.48c).

Following the pattern of the definitions in (2.81a) and (2.81c), we define

\[
\delta^{[N]}(t, T) = \sum_{k=-N}^{N} \delta(t - kT) \tag{2.82a}
\]

and

\[
\delta^{[\infty]}(t, T) = \sum_{k=-\infty}^{\infty} \delta(t - kT) . \tag{2.82b}
\]

Function $\delta^{[\infty]}(t, T)$ is clearly just another way of writing the shah function $\mathrm{III}(t, T)$. [The shah
function is defined in Eq. (2.73) and shown equal to $\sum_{k=-\infty}^{\infty}\delta(t - kT)$ in Eq. (2.75d).] The
convolution of the generalized function

\[
\delta^{[N]}(t, T) = \sum_{k=-N}^{N} \delta(t - kT)
\]

with the true function u(t) is

\[
u(t) \ast \delta^{[N]}(t, T) = \int_{-\infty}^{\infty} u(t')\,\delta^{[N]}(t - t', T)\,dt'
= \sum_{k=-N}^{N}\int_{-\infty}^{\infty} u(t')\,\delta\bigl(t' - (t - kT)\bigr)\,dt'
= \sum_{k=-N}^{N} u(t - kT) ,
\]

where the next-to-last step uses $\delta(-x) = \delta(x)$ as shown in Eq. (2.68a). The definition of $u^{[N]}$ in
(2.81c) then gives

\[
u^{[N]}(t, T) = u(t) \ast \delta^{[N]}(t, T) . \tag{2.82c}
\]


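Taking the integral Fourier transform of both sides, using the Fourier convolution theorem [see
Eq. (2.72l)], and remembering that U(f) is the forward Fourier transform of u(t), we get

\[
\begin{aligned}
\mathcal{F}^{(-ift)}\bigl(u^{[N]}(t, T)\bigr)
&= \mathcal{F}^{(-ift)}\bigl(u(t)\bigr)\cdot\mathcal{F}^{(-ift)}\bigl(\delta^{[N]}(t, T)\bigr) \\
&= U(f)\cdot\sum_{k=-N}^{N}\int_{-\infty}^{\infty} e^{-2\pi i f t}\,\delta(t - kT)\,dt \\
&= U(f)\sum_{k=-N}^{N} e^{-2\pi i k f T} \\
&= U(f)\,\frac{\sin\bigl(2\pi f T(N + 1/2)\bigr)}{\sin(\pi f T)} ,
\end{aligned} \tag{2.83a}
\]

where in the last step we substitute from Eq. (2.77c) above. Having now found that

\[
\mathcal{F}^{(-ift)}\bigl(u^{[N]}(t, T)\bigr) = U(f)\,\frac{\sin\bigl(2\pi f T(N + 1/2)\bigr)}{\sin(\pi f T)} ,
\]

we take the inverse Fourier transform of both sides to get

\[
u^{[N]}(t, T) = \int_{-\infty}^{\infty} e^{2\pi i f t}\,U(f)\,\frac{\sin\bigl(2\pi f T(N + 1/2)\bigr)}{\sin(\pi f T)}\,df . \tag{2.83b}
\]

Taking the limit of both sides as $N \to \infty$, we get, using (2.81d), that

\[
u^{[\infty]}(t, T) = \lim_{N\to\infty}\int_{-\infty}^{\infty} e^{2\pi i f t}\,U(f)\,\frac{\sin\bigl(2\pi f T(N + 1/2)\bigr)}{\sin(\pi f T)}\,df . \tag{2.83c}
\]

Equations (2.56a) and (2.75e) can now be used to write

\[
\begin{aligned}
u^{[\infty]}(t, T)
&= \int_{-\infty}^{\infty} e^{2\pi i f t}\,U(f)\,\operatorname*{G\,lim}_{N\to\infty}\left[\frac{\sin\bigl(2\pi f T(N + 1/2)\bigr)}{\sin(\pi f T)}\right]df \\
&= \int_{-\infty}^{\infty} e^{2\pi i f t}\,U(f)\,\frac{1}{T}\left[\sum_{k=-\infty}^{\infty} \delta\!\left(f - \frac{k}{T}\right)\right]df
\end{aligned}
\]

or

\[
u^{[\infty]}(t, T) = \sum_{k=-\infty}^{\infty}\bigl[T^{-1}\,U(k/T)\bigr]\,e^{2\pi i\frac{kt}{T}} . \tag{2.83d}
\]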


Equation (2.83d) specifies the Fourier series for an arbitrary periodic function u[ 5 ] , showing that
u[ 5 ] can be written as the infinite sum of complex exponentials multiplied by the complex
constants [T 1U (k T )] . To get these complex constants directly from u[ 5 ] , we note that for any
real number * and integer m,

1 ­° ½°
5 m *  ( N 1)T m
1 §m· 1 2& i t 2& i t
U ¨ ¸ ³ u (t )e T
dt lim ® ³ u (t )e T
dt ¾
T © T ¹ T 5 N 75 T
¯° *  NT ¿°
1 ­°
*  ( N 1)T m *  ( N  2)T m
2 & i t 2 & i t
lim ®
N 75 T
°̄ *
³
 NT
u (t )e T
dt  ³
*  ( N 1)T
u (t )e T
dt  "
* m * T m
2 & i t 2& i t
 ³ u (t )e
* T
T
dt  ³* u(t )e T
dt
*  2T m
2 & i t
*  ( N 1)T m
2& i t ½°

* T
³ u (t )e T
dt  "  ³
*  NT
u (t )e T
dt ¾ .
°¿
This can be simplified to

*  ( k 1)T m
1 §m· 1 N
U ¨ ¸ lim ¦
2& i t

T © T ¹ N 75 T k  N ³
*  kT
e T
u (t )dt . (2.83e)

For each value of $k$, we change the variable of integration to $t' = t - kT$ so that

$$\int_{\tau+kT}^{\tau+(k+1)T} e^{-2\pi i \frac{m}{T} t}\,u(t)\,dt = \int_{\tau}^{\tau+T} e^{-2\pi i \frac{m}{T} t'}\,e^{-2\pi i m k}\,u(t'+kT)\,dt' = \int_{\tau}^{\tau+T} e^{-2\pi i \frac{m}{T} t'}\,u(t'+kT)\,dt' ,$$

where we use that $e^{-2\pi i m k} = 1$. Substituting this into (2.83e) gives

$$\frac{1}{T}\,U\!\left(\frac{m}{T}\right) = \lim_{N\to\infty}\frac{1}{T}\sum_{k=-N}^{N}\int_{\tau}^{\tau+T} e^{-2\pi i \frac{m}{T} t'}\,u(t'+kT)\,dt' = \lim_{N\to\infty}\frac{1}{T}\int_{\tau}^{\tau+T} e^{-2\pi i \frac{m}{T} t'}\left[\sum_{k'=-N}^{N} u(t'-k'T)\right]dt' ,$$

where in the last step we have replaced index $k$ by index $k' = -k$. Now, taking the limit inside the integral to get the generalized limit [see Eq. (2.56a) above], we rely on (2.81f) to get

$$\frac{1}{T}\,U\!\left(\frac{m}{T}\right) = \frac{1}{T}\int_{\tau}^{\tau+T} e^{-2\pi i \frac{m}{T} t'}\,\operatorname*{G\,lim}_{N\to\infty}\left[\sum_{k'=-N}^{N} u(t'-k'T)\right]dt' = \frac{1}{T}\int_{\tau}^{\tau+T} e^{-2\pi i \frac{m}{T} t'}\,u^{[\infty]}(t',T)\,dt' . \tag{2.83f}$$


Equations (2.83d) and (2.83f) let us put the Fourier series into its standard form. For any periodic function

$$v(t) = u^{[\infty]}(t,T) = \sum_{k=-\infty}^{\infty} u(t-kT)$$

of period $T$, we have found that

$$v(t) = \sum_{k=-\infty}^{\infty} A_k\, e^{2\pi i k \frac{t}{T}} , \tag{2.84a}$$

where

$$A_k = \frac{1}{T}\int_{\tau}^{\tau+T} e^{-2\pi i \frac{k}{T} t}\, v(t)\,dt \tag{2.84b}$$

for any finite value of $\tau$. Because we did not require $u(t)$ to be real in (2.80a), Eqs. (2.83d), (2.83f), (2.84a), and (2.84b) still hold true for complex periodic functions with real arguments $t$. It is customary, but of course not mandatory, to choose $\tau = 0$ or $\tau = -T/2$ in (2.84b).
Using $v(t) = u^{[\infty]}(t,T)$, we know from Eqs. (2.83d), (2.83f), (2.84a), and (2.84b) that the $A_k$ coefficients can be specified in terms of the forward Fourier transform $U(f)$ of $u(t)$,

$$A_k = \frac{1}{T}\,U\!\left(\frac{k}{T}\right) . \tag{2.85a}$$

When $u$ is real, which means that $v(t) = u^{[\infty]}(t,T)$ is also real, we know from entry 7 of Table 2.1 (located at the end of this chapter) that $U(f)$ must be Hermitian so that

$$U(-f) = U(f)^{*} .$$

Hence, when $v(t)$ is real in (2.84a), it then follows from (2.85a) that

$$A_{-k} = A_k^{*} \tag{2.85b}$$

in (2.84b). This procedure can be extended to all the entries in Table 2.1, giving us the entries in Table 2.2 (also located at the end of this chapter). To go through another example, if $u$ is imaginary and odd, we know from entry 3 of Table 2.1 that $U$ is real and odd, so

$$U(-f) = -U(f) \quad\text{and}\quad \operatorname{Im}[U(f)] = 0 .$$


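Equation (2.85a) then shows that

$$A_{-k} = -A_k \quad\text{and}\quad \operatorname{Im}(A_k) = 0 . \tag{2.85c}$$

We can show that $v(t) = u^{[\infty]}(t,T)$ is imaginary and odd when $u$ is imaginary and odd (let $k' = -k$),

$$\begin{aligned}
v(-t) = u^{[\infty]}(-t,T) &= \sum_{k=-\infty}^{\infty} u(-t-kT) = \sum_{k'=-\infty}^{\infty} u(-t+k'T) = -\sum_{k'=-\infty}^{\infty} u(t-k'T) \\
&= -u^{[\infty]}(t,T) = -v(t) ,
\end{aligned}$$

and

$$\operatorname{Re}[v(t)] = \sum_{k=-\infty}^{\infty} \operatorname{Re}[u(t-kT)] = 0 .$$

This shows that we end up with (2.85c) associated with $v(t)$ being imaginary and odd, as stated in entry 3 of Table 2.2.
A final point worth mentioning about Fourier series is that the $A_k$ coefficients are often reshuffled so that the series can be written as a sum of sines and cosines. Using $e^{i\phi} = \cos\phi + i\sin\phi$, Equation (2.84a) can be rewritten as

$$\begin{aligned}
v(t) &= A_0 + \sum_{k=1}^{\infty}\left[A_{-|k|}\, e^{-2\pi i |k| \frac{t}{T}} + A_{|k|}\, e^{2\pi i |k| \frac{t}{T}}\right] \\
&= A_0 + \sum_{k=1}^{\infty}\left[A_{-|k|} + A_{|k|}\right]\cos\!\left(\frac{2\pi |k| t}{T}\right) + \sum_{k=1}^{\infty} i\left[A_{|k|} - A_{-|k|}\right]\sin\!\left(\frac{2\pi |k| t}{T}\right) .
\end{aligned} \tag{2.86a}$$

From Eq. (2.84b), we get

$$A_0 = \frac{1}{T}\int_{\tau}^{\tau+T} v(t)\,dt , \tag{2.86b}$$

$$A_{-|k|} + A_{|k|} = \frac{1}{T}\int_{\tau}^{\tau+T} v(t)\left[e^{2\pi i |k| \frac{t}{T}} + e^{-2\pi i |k| \frac{t}{T}}\right]dt = \frac{2}{T}\int_{\tau}^{\tau+T} v(t)\cos\!\left(\frac{2\pi |k| t}{T}\right)dt , \tag{2.86c}$$

and

$$i\left[A_{|k|} - A_{-|k|}\right] = \frac{i}{T}\int_{\tau}^{\tau+T} v(t)\left[e^{-2\pi i |k| \frac{t}{T}} - e^{2\pi i |k| \frac{t}{T}}\right]dt = \frac{2}{T}\int_{\tau}^{\tau+T} v(t)\sin\!\left(\frac{2\pi |k| t}{T}\right)dt . \tag{2.86d}$$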


Putting these results together, we can write

$$v(t) = \frac{c_0}{2} + \sum_{k=1}^{\infty} c_k \cos\!\left(\frac{2\pi k t}{T}\right) + \sum_{k=1}^{\infty} s_k \sin\!\left(\frac{2\pi k t}{T}\right) , \tag{2.87a}$$

where

$$c_k = \frac{2}{T}\int_{\tau}^{\tau+T} v(t)\cos\!\left(\frac{2\pi k t}{T}\right)dt \quad\text{for } k = 0, 1, 2, \ldots \tag{2.87b}$$

and

$$s_k = \frac{2}{T}\int_{\tau}^{\tau+T} v(t)\sin\!\left(\frac{2\pi k t}{T}\right)dt \quad\text{for } k = 1, 2, 3, \ldots . \tag{2.87c}$$

The absolute value signs are dropped from index $k$ because it is defined positive in (2.87a), and $A_0$ is replaced by $c_0/2$ so that the formula for $c_0$ can be folded into the general formula for $c_k$ in (2.87b). Although it is still not mandatory, parameter $\tau$ is usually given the value $0$ or $-T/2$. Nowhere has $v$ been required to be real, so Eqs. (2.87a)–(2.87c), just like Eqs. (2.84a) and (2.84b), still hold true when $v$ is a complex-valued periodic function of (real) period $T$. Indeed, if $v$ is a complex-valued function of a real argument $t$, both its real part

$$v_R(t) = \operatorname{Re}[v(t)]$$

and its imaginary part

$$v_I(t) = \operatorname{Im}[v(t)]$$

are real-valued periodic functions of period $T$. This means that when, for any integer $m$, we have

$$v(t \pm mT) = v(t) \tag{2.88a}$$

for a complex-valued function $v$ of a real argument, then

$$v_R(t \pm mT) = v_R(t) \tag{2.88b}$$

and

$$v_I(t \pm mT) = v_I(t) . \tag{2.88c}$$

Since sines and cosines of real arguments are strictly real, we can now take the real and imaginary parts of (2.87a)–(2.87c) to get

$$v_R(t) = \frac{\operatorname{Re}(c_0)}{2} + \sum_{k=1}^{\infty} \left[\operatorname{Re}(c_k)\right]\cos\!\left(\frac{2\pi k t}{T}\right) + \sum_{k=1}^{\infty} \left[\operatorname{Re}(s_k)\right]\sin\!\left(\frac{2\pi k t}{T}\right) , \tag{2.89a}$$

with

$$\operatorname{Re}(c_k) = \frac{2}{T}\int_{\tau}^{\tau+T} v_R(t)\cos\!\left(\frac{2\pi k t}{T}\right)dt \quad\text{for } k = 0, 1, 2, \ldots \tag{2.89b}$$

and

$$\operatorname{Re}(s_k) = \frac{2}{T}\int_{\tau}^{\tau+T} v_R(t)\sin\!\left(\frac{2\pi k t}{T}\right)dt \quad\text{for } k = 1, 2, 3, \ldots , \tag{2.89c}$$

as well as

$$v_I(t) = \frac{\operatorname{Im}(c_0)}{2} + \sum_{k=1}^{\infty} \left[\operatorname{Im}(c_k)\right]\cos\!\left(\frac{2\pi k t}{T}\right) + \sum_{k=1}^{\infty} \left[\operatorname{Im}(s_k)\right]\sin\!\left(\frac{2\pi k t}{T}\right) , \tag{2.90a}$$

with

$$\operatorname{Im}(c_k) = \frac{2}{T}\int_{\tau}^{\tau+T} v_I(t)\cos\!\left(\frac{2\pi k t}{T}\right)dt \quad\text{for } k = 0, 1, 2, \ldots \tag{2.90b}$$

and

$$\operatorname{Im}(s_k) = \frac{2}{T}\int_{\tau}^{\tau+T} v_I(t)\sin\!\left(\frac{2\pi k t}{T}\right)dt \quad\text{for } k = 1, 2, 3, \ldots . \tag{2.90c}$$
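As a rough numerical check of the coefficient formulas (2.87b) and (2.87c), the sketch below approximates the integrals with a midpoint rule and rebuilds the function from the truncated form of (2.87a). The test function, the period `T = 3.0`, the sample count, and the helper names `fourier_coefficients` and `series_value` are all illustrative assumptions, not part of the text above.

```python
import math

def fourier_coefficients(v, T, k_max, tau=0.0, samples=2048):
    """Approximate c_k (2.87b) and s_k (2.87c) by a midpoint rule on [tau, tau + T]."""
    dt = T / samples
    ts = [tau + (j + 0.5) * dt for j in range(samples)]
    c = [(2.0 / T) * dt * sum(v(t) * math.cos(2 * math.pi * k * t / T) for t in ts)
         for k in range(k_max + 1)]
    # s[0] is a placeholder so that s[k] lines up with index k; (2.87c) starts at k = 1.
    s = [0.0] + [(2.0 / T) * dt * sum(v(t) * math.sin(2 * math.pi * k * t / T) for t in ts)
                 for k in range(1, k_max + 1)]
    return c, s

def series_value(c, s, T, t):
    """Evaluate the truncated form of (2.87a) at time t."""
    total = c[0] / 2.0
    for k in range(1, len(c)):
        angle = 2 * math.pi * k * t / T
        total += c[k] * math.cos(angle) + s[k] * math.sin(angle)
    return total

T = 3.0  # hypothetical period

def v(t):
    # Test function with known coefficients: c_0 = 2, c_1 = 2, s_2 = 3, all others zero.
    return 1.0 + 2.0 * math.cos(2 * math.pi * t / T) + 3.0 * math.sin(4 * math.pi * t / T)

# Two different starting points tau should give the same coefficients.
c, s = fourier_coefficients(v, T, k_max=3, tau=-T / 2)
c_shifted, s_shifted = fourier_coefficients(v, T, k_max=3, tau=0.37)
```

Running the integration with two different values of `tau` illustrates the remark that the starting point of the integration interval is arbitrary as long as the interval spans one full period.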

2.21 Discrete Fourier Transform


The first step in going from the integral Fourier transform to the discrete Fourier transform is to repeat the procedure used in Sec. 2.20 to get the Fourier series. We pick a nonpathological function $u(t)$ having a forward Fourier transform

$$U(f) = \int_{-\infty}^{\infty} u(t)\,e^{-2\pi i f t}\,dt \tag{2.91a}$$

and, following the same procedure used in Eq. (2.81a) above, create a periodic function of period $T$:

$$u^{[\infty]}(t,T) = \sum_{k=-\infty}^{\infty} u(t-kT) . \tag{2.91b}$$

As was shown in Sec. 2.20, we can now write the associated Fourier series as [see Eq. (2.83d)]

$$u^{[\infty]}(t,T) = \frac{1}{T}\sum_{k=-\infty}^{\infty} U\!\left(\frac{k}{T}\right) e^{2\pi i \frac{kt}{T}} , \tag{2.91c}$$

where, as specified in (2.91a), $U$ is the forward Fourier transform of $u$.

Next we divide the period $T$ of $u^{[\infty]}$ into $N$ equal lengths, $\Delta t = T/N$, and evaluate (2.91c) only for $t = m\Delta t$ with $m = 0, 1, 2, \ldots, N-1$,

$$u^{[\infty]}(m\Delta t, T) = \frac{1}{T}\sum_{k=-\infty}^{\infty} U\!\left(\frac{k}{T}\right) e^{2\pi i \frac{km}{N}} , \tag{2.92a}$$

where we have used

$$N\,\Delta t = T \tag{2.92b}$$

to simplify the exponent of (2.92a). The infinite sum in (2.92a) can be split in two by making the substitution $k = n + rN$ with $n = 0, 1, 2, \ldots, N-1$ and $r = 0, \pm 1, \pm 2, \ldots$. This gives

$$u^{[\infty]}(m\Delta t, T) = \frac{1}{T}\sum_{r=-\infty}^{\infty}\sum_{n=0}^{N-1} U\!\left(\frac{n+rN}{T}\right) e^{2\pi i \frac{nm}{N}}\, e^{2\pi i r m} .$$

Since $e^{2\pi i r m} = 1$ and $T = N\,\Delta t$, this becomes, making the index substitution $r' = -r$,

$$u^{[\infty]}(m\Delta t, T) = \frac{1}{T}\sum_{n=0}^{N-1} e^{2\pi i \frac{nm}{N}} \sum_{r'=-\infty}^{\infty} U\!\left(\frac{n}{T} - \frac{r'}{\Delta t}\right)$$

or

$$u^{[\infty]}(m\Delta t, T) = \frac{1}{T}\sum_{n=0}^{N-1} e^{2\pi i \frac{nm}{N}}\, U^{[\infty]}\!\left(\frac{n}{T}, \frac{1}{\Delta t}\right) , \tag{2.93a}$$

where we follow the pattern of Eqs. (2.81a) and (2.91b) and define

$$U^{[\infty]}(f, F) = \sum_{r=-\infty}^{\infty} U(f - rF) \tag{2.93b}$$

for any two frequencies $f$ and $F$.


Equation (2.93a) is a somewhat disguised version of the discrete Fourier transform (DFT). Figures 2.11(a) and 2.11(b) show the relationship of the two periodic functions $u^{[\infty]}$ and $U^{[\infty]}$, graphed with solid lines, to the two original functions $u$ and $U$ graphed with dashed lines. [In graphs such as these, $u(t)$ typically stands for data and is usually real, making it easy to represent with a two-dimensional plot; but its transform $U(f)$ is often complex, so it makes more sense to plot $|U(f)|$ if we just want to show where $U(f)$ is different from zero.] When function $u^{[\infty]}$ has period $T$ and is uniformly sampled at intervals of $\Delta t$, then function $U^{[\infty]}$ has period

$$F = \frac{1}{\Delta t} \tag{2.93c}$$

and is uniformly sampled at intervals of

$$\Delta f = \frac{1}{T} . \tag{2.93d}$$

Note, of course, we could also say that $u^{[\infty]}$ has period $1/\Delta f$ and is uniformly sampled at intervals of $1/F$ when $U^{[\infty]}$ has period $F$ and is sampled at intervals of $\Delta f$. When both $\Delta f$ and $\Delta t$ are known, we have from (2.92b) and (2.93d) that

$$\Delta f \cdot \Delta t = \frac{1}{N} . \tag{2.93e}$$

Figures 2.12(a) and 2.12(b) show that if $T$ and $F$ are large and functions $u(t)$ and $U(f)$ die away relatively quickly when $|t|$ and $|f|$ are large (which means that $u$ and $U$ are localized near the $t$ and $f$ origins), then the corresponding periodic functions $u^{[\infty]}(t,T)$ and $U^{[\infty]}(f,F)$ can be used to approximate the non-negligible regions of $u$ and $U$. Almost always when the DFT is used, its users have in mind a situation such as that shown in Figs. 2.12(a) and 2.12(b), with $u^{[\infty]}$ and $U^{[\infty]}$ being good approximations of $u$ and $U$ for small to moderately large values of $t$ and $f$.
To complete the DFT transform pair, we define

$$w_N = e^{\frac{2\pi i}{N}} \tag{2.94a}$$

and write (2.93a) as

$$u^{[\infty]}(m\Delta t, T) = \frac{1}{T}\sum_{n=0}^{N-1} w_N^{nm}\, U^{[\infty]}\!\left(\frac{n}{T}, \frac{1}{\Delta t}\right) . \tag{2.94b}$$

Multiplying both sides by $w_N^{-mk}$ and summing over $m$ gives

$$\sum_{m=0}^{N-1} u^{[\infty]}(m\Delta t, T)\, w_N^{-mk} = \frac{1}{T}\sum_{n=0}^{N-1}\left\{ U^{[\infty]}\!\left(\frac{n}{T}, \frac{1}{\Delta t}\right) \cdot \left[\sum_{m=0}^{N-1} w_N^{m\cdot(n-k)}\right]\right\} . \tag{2.94c}$$

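FIGURE 2.11(a). [Plot of $u^{[\infty]}(t,T)$ versus $t$, with period $T = 1/\Delta f$ and sample spacing $\Delta t = 1/F$.]

FIGURE 2.11(b). [Plot of $U^{[\infty]}(f,F)$ versus $f$, with period $F = 1/\Delta t$ and sample spacing $\Delta f = 1/T$.]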

The sum over $m$ on the right-hand side is the sum of a geometric series,

$$V_{n,k}^{[N]} = \sum_{m=0}^{N-1} w_N^{m(n-k)} . \tag{2.94d}$$

This can be solved using the standard procedure for geometric sums [see the analysis following Eq. (2.77b) above], multiplying every term in the sum by $w_N^{n-k}$ to get

$$V_{n,k}^{[N]} \cdot w_N^{n-k} = V_{n,k}^{[N]} + w_N^{N\cdot(n-k)} - 1 . \tag{2.94e}$$

Solving for $V_{n,k}^{[N]}$ gives

$$V_{n,k}^{[N]} = \frac{1 - w_N^{N\cdot(n-k)}}{1 - w_N^{n-k}} = \frac{1 - e^{2\pi i (n-k)}}{1 - e^{2\pi i \left(\frac{n-k}{N}\right)}} , \tag{2.94f}$$

where in the last step definition (2.94a) is used to eliminate $w_N$. Index $n$ goes from zero to $N-1$ for each value of $k$ [see Eqs. (2.94b) and (2.94c)]. Deciding also to restrict $k$ to one of the integers $k = 0, 1, 2, \ldots, N-1$, we see that the denominator in (2.94f) can be zero only when $n = k$. This looks like it could be a problem, but when $n = k$, we can return to the original formula in (2.94d), noting that for $n = k$ the sum $V_{n,k}^{[N]}$ is equal to $N$. When $n \neq k$, the right-hand side of (2.94f) shows that $V_{n,k}^{[N]}$ is zero because $e^{2\pi i (n-k)} = 1$. We conclude that

$$V_{n,k}^{[N]} = \begin{cases} N & \text{for } n = k \\ 0 & \text{for } n \neq k \end{cases} \;=\; N\,\delta_{k,n} , \tag{2.94g}$$

where $\delta_{k,n}$ is the Kronecker delta,

$$\delta_{k,n} = \begin{cases} 1 & \text{for } n = k \\ 0 & \text{for } n \neq k \end{cases} . \tag{2.94h}$$
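The orthogonality result (2.94g) is easy to spot-check numerically. The following is a minimal sketch that evaluates the sum in (2.94d) directly for every pair of indices; the choice `N = 8` and the helper name `geometric_sum` are assumptions made for illustration.

```python
import cmath

def geometric_sum(n, k, N):
    """V_{n,k}^{[N]} of (2.94d): sum over m of w_N^{m(n-k)}, with w_N = exp(2*pi*i/N)."""
    w = cmath.exp(2j * cmath.pi / N)
    return sum(w ** (m * (n - k)) for m in range(N))

N = 8  # hypothetical number of samples
# Table of V for all n, k in 0..N-1; (2.94g) predicts N on the diagonal and 0 elsewhere.
V = [[geometric_sum(n, k, N) for k in range(N)] for n in range(N)]
```

Up to floating-point rounding, the table `V` has the value `N` on its diagonal and zero everywhere else, which is the statement $V_{n,k}^{[N]} = N\,\delta_{k,n}$.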

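Substitution of (2.94d) into (2.94c) gives

$$\sum_{m=0}^{N-1} u^{[\infty]}(m\Delta t, T)\, w_N^{-mk} = \frac{1}{T}\sum_{n=0}^{N-1}\left\{ U^{[\infty]}\!\left(\frac{n}{T}, \frac{1}{\Delta t}\right) \cdot V_{n,k}^{[N]} \right\} .$$

Substituting from (2.94g), we get

$$\frac{N}{T}\, U^{[\infty]}\!\left(\frac{k}{T}, \frac{1}{\Delta t}\right) = \sum_{m=0}^{N-1} u^{[\infty]}(m\Delta t, T)\, w_N^{-mk} . \tag{2.94i}$$

This equation is the other half of the DFT [the first half is specified by Eqs. (2.94a) and (2.94b)]. Using Eqs. (2.94a) and (2.92b) to replace $w_N$ by $e^{(2\pi i)/N}$ and $N/T$ by $1/\Delta t$, we write (2.94b) and (2.94i) as

$$u^{[\infty]}(m\Delta t, T) = \frac{1}{T}\sum_{n=0}^{N-1} e^{2\pi i \left(\frac{mn}{N}\right)}\, U^{[\infty]}\!\left(\frac{n}{T}, \frac{1}{\Delta t}\right) \tag{2.95a}$$

and

$$U^{[\infty]}\!\left(\frac{n}{T}, \frac{1}{\Delta t}\right) = \Delta t \sum_{m=0}^{N-1} u^{[\infty]}(m\Delta t, T)\, e^{-2\pi i \left(\frac{mn}{N}\right)} , \tag{2.95b}$$

where index $k$ has been replaced by $n$ in (2.94i). This can also be written as, using Eqs. (2.93c) and (2.93d),

$$u^{[\infty]}(m\Delta t, T) = \Delta f \sum_{n=0}^{N-1} e^{2\pi i \left(\frac{mn}{N}\right)}\, U^{[\infty]}(n\Delta f, F) \tag{2.95c}$$

and

$$U^{[\infty]}(n\Delta f, F) = \Delta t \sum_{m=0}^{N-1} u^{[\infty]}(m\Delta t, T)\, e^{-2\pi i \left(\frac{mn}{N}\right)} . \tag{2.95d}$$

FIGURE 2.12(a). [Plot of $u^{[\infty]}(t,T)$ versus $t$, with period $T = 1/\Delta f$ and sample spacing $\Delta t = 1/F$, marking the region over which $u^{[\infty]} \approx u$.]

FIGURE 2.12(b). [Plot of $U^{[\infty]}(f,F)$ versus $f$, with period $F = 1/\Delta t$ and sample spacing $\Delta f = 1/T$, marking the region over which $U^{[\infty]} \approx U$.]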
m 0

The inverse and forward DFTs shown in (2.95c) and (2.95d) are often written as

$$u_m = \sum_{n=0}^{N-1} U_n\, e^{2\pi i \left(\frac{mn}{N}\right)} \tag{2.96a}$$

and

$$U_n = \frac{1}{N}\sum_{m=0}^{N-1} u_m\, e^{-2\pi i \left(\frac{mn}{N}\right)} . \tag{2.96b}$$

To get Eq. (2.96a) from (2.95c), we define

$$u_m = u^{[\infty]}(m\Delta t, T) \tag{2.96c}$$

and

$$U_n = \Delta f \cdot U^{[\infty]}(n\Delta f, F) , \tag{2.96d}$$

and to get Eq. (2.96b), both sides of (2.95d) are multiplied by $\Delta f$, using (2.93e) to replace $\Delta f \cdot \Delta t$ by $1/N$. We can also define

$$U_n = U^{[\infty]}(n\Delta f, F) \tag{2.97a}$$

and

$$u_m = \Delta t \cdot u^{[\infty]}(m\Delta t, T) \tag{2.97b}$$

to transform Eqs. (2.95c) and (2.95d) into

$$u_m = \frac{1}{N}\sum_{n=0}^{N-1} U_n\, e^{2\pi i \left(\frac{mn}{N}\right)} \tag{2.97c}$$

and

$$U_n = \sum_{m=0}^{N-1} u_m\, e^{-2\pi i \left(\frac{mn}{N}\right)} , \tag{2.97d}$$

where now we have multiplied both sides of (2.95c) by $\Delta t$ before replacing $\Delta f \cdot \Delta t$ by $1/N$.

Figures 2.13(a) and 2.13(b) show how the $u^{[\infty]}$ and $U^{[\infty]}$ continuous functions are sampled to create the DFT formulas in the previous paragraph. The values of the original functions $u$ and $U$ are ignored for negative values of $t$ and $f$; instead, we sample $u^{[\infty]}$ and $U^{[\infty]}$ out to $t = T$ and $f = F$, picking up the original $u$ and $U$ values at negative $t$ and $f$ where they repeat near $t = T$ and $f = F$. Many times DFT plots show $u_m$ and $U_n$ with $n$ and $m$ running from $0$ to $N-1$. When this is done, it is with the understanding that the large index values greater than $N/2$ represent $u$ and $U$ for negative $t$ and $f$ values respectively.
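The transform pair (2.97c) and (2.97d) can be written out directly as two short loops. The sketch below is a plain $O(N^2)$ evaluation of those sums (not an FFT), together with a hypothetical helper `signed_index` that applies the convention just described, in which indices above $N/2$ stand for negative frequencies. The test signal, a cosine with three cycles per period, and `N = 16` are assumptions chosen for illustration.

```python
import cmath
import math

def forward_dft(u):
    """U_n = sum_m u_m exp(-2*pi*i*m*n/N), the sum in (2.97d)."""
    N = len(u)
    return [sum(u[m] * cmath.exp(-2j * cmath.pi * m * n / N) for m in range(N))
            for n in range(N)]

def inverse_dft(U):
    """u_m = (1/N) sum_n U_n exp(+2*pi*i*m*n/N), the sum in (2.97c)."""
    N = len(U)
    return [sum(U[n] * cmath.exp(2j * cmath.pi * m * n / N) for n in range(N)) / N
            for m in range(N)]

def signed_index(n, N):
    """Map a DFT index 0..N-1 to the signed index it represents (indices above N/2 wrap negative)."""
    return n if n <= N // 2 else n - N

N = 16  # hypothetical number of samples
u = [math.cos(2 * math.pi * 3 * m / N) for m in range(N)]  # three cycles per period

U = forward_dft(u)
roundtrip = inverse_dft(U)

# Signed indices at which the spectrum is non-negligible.
peaks = sorted(signed_index(n, N) for n in range(N) if abs(U[n]) > 1e-9)
```

For this real cosine the only non-negligible values are at indices 3 and 13, and index 13 is interpreted as the negative frequency index $-3$, so `peaks` comes out as `[-3, 3]`. Multiplying a signed index by $\Delta f = 1/T$ gives the corresponding frequency.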

2.22 Aliasing as an Error


The DFT is important because there is an algorithm, called the fast Fourier transform (FFT), that allows computers to calculate the sums in Eqs. (2.96a), (2.96b), (2.97c), and (2.97d) rapidly when $N$ is a multiple of 2. The FFT performs best when $N = 2^j$ for $j$ a positive integer. In fact, when faced with calculating an integral Fourier transform

$$U(f) = \int_{-\infty}^{\infty} u(t)\,e^{-2\pi i f t}\,dt$$

over a range of $f$ values for an arbitrary function $u(t)$, it is standard practice to convert the integral to a DFT and do the job on a computer with an FFT. As we saw in the previous section, the DFT deals directly with $u^{[\infty]}$ and $U^{[\infty]}$ rather than $u$ and $U$. Thus, successfully using the DFT to calculate the integral transform requires that $u^{[\infty]}$ and $U^{[\infty]}$ consist of well-separated, repetitive regions of $u$ and $U$, as shown in Figs. 2.12(a) and 2.12(b), instead of overlapping regions of $u$ and $U$, as shown in Figs. 2.11(a) and 2.11(b). Ensuring that $u^{[\infty]}$ consists of nonoverlapping regions of $u$ tends to occur naturally; the shape of $u$ is already known so there is no real difficulty in picking $T$ large enough to prevent significant amounts of overlap in $u^{[\infty]}$. The shape of $U$, however, is not known in advance, so care must be taken to avoid significant amounts of overlap in $U^{[\infty]}$.
FIGURE 2.13(a). [Plot of $u^{[\infty]}(t,T)$ versus $t$, with period $T = 1/\Delta f$ and sample spacing $\Delta t = 1/F$, marking the region over which $u^{[\infty]} \approx u$.]

FIGURE 2.13(b). [Plot of $U^{[\infty]}(f,F)$ versus $f$, with period $F = 1/\Delta t$ and sample spacing $\Delta f = 1/T$, marking the region over which $U^{[\infty]} \approx U$.]

Consider what happens when the DFT is used to analyze a real signal $u(t)$ having the spectrum $U(f)$ and we know that $U(f)$ is zero for all $f > f_{\max}$ and nonzero for $0 \le f \le f_{\max}$. Because $u$ is real, we know from entry 7 in Table 2.1 that $U(-f) = U(f)^{*}$, ensuring that $U(f)$ is also nonzero for negative frequency values $0 \ge f \ge -f_{\max}$; that is, for every positive $f$ at which $U$ is nonzero there must be a $-f$ at which $U$ is nonzero, and because $U$ is zero for $f > f_{\max}$ it follows that $U$ is zero for all $f < -f_{\max}$. Hence $U$ can be represented schematically by the solid triangle centered on the origin of Fig. 2.14. To construct $U^{[\infty]}$, we write

$$U^{[\infty]}(f, F) = \sum_{k=-\infty}^{\infty} U(f - kF) , \tag{2.98a}$$

where the smallest we can make $F$ and still avoid overlap is, as shown by the dotted triangles in Fig. 2.14,

$$F = 2 f_{\max} . \tag{2.98b}$$

From Eq. (2.93c), we see that in Fig. 2.14

$$F = \frac{1}{\Delta t} ,$$

where $\Delta t$ is the interval in $t$ between adjacent samples of $u(t)$. If $\Delta t$ is made smaller, then $F$ increases, moving the regions of nonzero $U$ further apart in Fig. 2.14; and if $\Delta t$ is made larger, then $F$ decreases, forcing the regions of nonzero $U$ to overlap in Fig. 2.14. Making $\Delta t$ smaller is wasteful, in that more effort than is needed goes into sampling $u(t)$, and making $\Delta t$ larger damages the integrity of the $U$ calculations for large values of $f$ near $f_{\max}$. Clearly, the frequency value $F/2$ plays an important role in DFT analysis, because optimum performance requires $f_{\max} = F/2$. For this reason frequency $F/2$ is given a special name: the Nyquist frequency $f_{\mathrm{Nyq}} = F/2$. From (2.93c), we see that

$$f_{\mathrm{Nyq}} = \frac{1}{2\Delta t} . \tag{2.99a}$$

A realistic system, of course, is designed with some built-in margin for error. The requirement then becomes that $\Delta t$ be small enough to separate unexpectedly high frequencies when the highest expected frequency is $f_{\max}$. To provide this margin, we take

$$f_{\mathrm{Nyq}} = \frac{1}{2\Delta t} \ge f_{\max} \tag{2.99b}$$

or

$$\Delta t \le \frac{1}{2 f_{\max}} . \tag{2.99c}$$
Now the region between $f_{\max}$ and $f_{\mathrm{Nyq}}$ is available for analysis of unexpectedly high frequencies.
Suppose $U(f)$ is negligible everywhere except at two frequencies, the positive frequency $f_0$ and the corresponding negative frequency $(-f_0)$. Since $U(f)$ is the transform of a real signal, entry 7 of Table 2.1 requires $U(-f) = U(f)^{*}$, forcing the existence of a non-negligible transform value at $(-f_0)$ when there is a non-negligible transform value at $f_0$. The two frequencies are represented by wide, solid-sided arrows in Fig. 2.15. The arrows represent isolated, narrow regions where $U$ is very large, so we can think of them as proportional to delta functions and write $U(f)$ as

$$U(f) = A \cdot \delta(f - f_0) + B \cdot \delta(f + f_0) .$$

Variables $A$ and $B$ are arbitrary complex constants. We have just seen that Table 2.1 requires $U(-f) = U(f)^{*}$. Because the delta functions are real, the equation $U(-f) = U(f)^{*}$ can be written as

$$A \cdot \delta(-f - f_0) + B \cdot \delta(-f + f_0) = A^{*} \cdot \delta(f - f_0) + B^{*} \cdot \delta(f + f_0)$$

or, since the delta functions are also even [see Eq. (2.68a)],

$$A \cdot \delta(f + f_0) + B \cdot \delta(f - f_0) = A^{*} \cdot \delta(f - f_0) + B^{*} \cdot \delta(f + f_0) .$$

This can only be true if $A = B^{*}$ (which is, of course, the same thing as having $B = A^{*}$). Therefore, we have the freedom to choose only one arbitrary complex constant, say $A$, and after making that choice function $U(f)$ becomes

$$U(f) = A \cdot \delta(f - f_0) + A^{*} \cdot \delta(f + f_0) . \tag{2.100a}$$

FIGURE 2.14. [Schematic of $U^{[\infty]}(f,F)$ versus $f$, with the frequency axis marked at $-F$, $-f_{\max}$, $f_{\max}$, and $F$; the solid triangle centered on the origin is $U(f)$.]

It is not difficult to figure out what happens when the DFT is used to calculate this double-delta frequency spectrum. If the double-delta $U(f)$ is used to construct $U^{[\infty]}(f,F)$ according to formula (2.98a), we get multiple isolated regions where $U^{[\infty]}$ is very large, as shown by the wide dashed arrows in Fig. 2.15. The curved single arrows show which wide dashed arrows come from the wide, solid-sided arrow at $f_0$ and which wide dashed arrows come from the wide, solid-sided arrow at $(-f_0)$. For example, the wide dashed arrow closest to $f_0$ comes from the wide, solid-sided arrow at $(-f_0)$, and the wide dashed arrow closest to $(-f_0)$ comes from the wide, solid-sided arrow at $f_0$. The two wide, solid-sided arrows at $f_0$ and $-f_0$ lie a distance $a$ inside the positions of the positive and negative Nyquist frequencies $f_{\mathrm{Nyq}}$ and $-f_{\mathrm{Nyq}}$, and the two wide dashed arrows that are closest to $f_0$ and $-f_0$ lie a distance $a$ outside the positive and negative Nyquist frequencies $f_{\mathrm{Nyq}}$ and $-f_{\mathrm{Nyq}}$. We see that the original double-delta $U(f)$ transform can be written as [from Eq. (2.100a)]

$$U(f) = A \cdot \delta(f - f_{\mathrm{Nyq}} + a) + A^{*} \cdot \delta(f + f_{\mathrm{Nyq}} - a) , \tag{2.100b}$$

and we can pair up the two wide dashed arrows closest to $f_0$ and $-f_0$ to create the transform

$$U^{[1]}(f) = A^{*} \cdot \delta(f - f_{\mathrm{Nyq}} - a) + A \cdot \delta(f + f_{\mathrm{Nyq}} + a) . \tag{2.100c}$$

Because the delta function $\delta(f - f_{\mathrm{Nyq}} + a) = \delta(f - f_0)$ has the coefficient $A$ in (2.100b), the curved single arrow going from $(-f_0)$ to $f_{\mathrm{Nyq}} + a$ shows that the delta function $\delta(f - f_{\mathrm{Nyq}} - a)$ at $f_{\mathrm{Nyq}} + a$ must have the coefficient $A^{*}$ in Eq. (2.100c); similarly, the curved single arrow going from $f_0$ to $-f_{\mathrm{Nyq}} - a$ shows that the delta function $\delta(f + f_{\mathrm{Nyq}} + a)$ at $-f_{\mathrm{Nyq}} - a$ must have the coefficient $A$ in Eq. (2.100c). Nothing stops us from continuing out from the origin, pairing the wide dashed arrows at $f = 3f_{\mathrm{Nyq}} - a$ and $f = -3f_{\mathrm{Nyq}} + a$ to get

$$U^{[2]}(f) = A \cdot \delta(f - 3f_{\mathrm{Nyq}} + a) + A^{*} \cdot \delta(f + 3f_{\mathrm{Nyq}} - a) \tag{2.100d}$$

and pairing the wide dashed arrows at $f = 3f_{\mathrm{Nyq}} + a$ and $f = -3f_{\mathrm{Nyq}} - a$ to get

$$U^{[3]}(f) = A^{*} \cdot \delta(f - 3f_{\mathrm{Nyq}} - a) + A \cdot \delta(f + 3f_{\mathrm{Nyq}} + a) . \tag{2.100e}$$

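FIGURE 2.15. [Schematic of the double-delta spectrum and its aliases, marking frequencies $-f_0$ and $f_0$, the Nyquist frequencies $-f_{\mathrm{Nyq}}$ and $f_{\mathrm{Nyq}}$, the distances $a$ on either side of each Nyquist frequency, and the period $F = 2 f_{\mathrm{Nyq}}$.]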

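Each time, the curved single arrows in Fig. 2.15 are consulted to find the coefficients of the delta functions. This can obviously be continued out to indefinitely large values of $f$, creating the paired transforms $U^{[4]}, U^{[5]}, \ldots$, etc. The general formula for $U^{[k]}$ turns out to be

$$U^{[k]}(f) = \begin{cases}
A\,\delta(f - f_{\mathrm{Nyq}} - k f_{\mathrm{Nyq}} + a) + A^{*}\,\delta(f + f_{\mathrm{Nyq}} + k f_{\mathrm{Nyq}} - a) & \text{for } k \text{ even} \\[4pt]
A^{*}\,\delta(f - f_{\mathrm{Nyq}} - (k-1) f_{\mathrm{Nyq}} - a) + A\,\delta(f + f_{\mathrm{Nyq}} + (k-1) f_{\mathrm{Nyq}} + a) & \text{for } k \text{ odd}
\end{cases} . \tag{2.100f}$$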


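We started out with the double-delta $U(f)$ being the forward Fourier transform of $u(t)$, which means that $u(t)$ is the inverse Fourier transform of the double-delta $U(f)$,

$$u(t) = \int_{-\infty}^{\infty} U(f)\,e^{2\pi i f t}\,df .$$

We now show that $u(t)$, the inverse transform of the double-delta $U(f)$, and $u^{[1]}(t), u^{[2]}(t), \ldots$, the inverse transforms of $U^{[1]}, U^{[2]}, \ldots$, all have the same values at $t = m\Delta t$ for $m = 0, \pm 1, \pm 2, \ldots$,

$$u(m\Delta t) = u^{[1]}(m\Delta t) = u^{[2]}(m\Delta t) = \cdots = u^{[k]}(m\Delta t) = \cdots . \tag{2.100g}$$

We begin by taking the inverse Fourier transform of the double-delta $U(f)$ function specified in (2.100b),

$$\begin{aligned}
u(t) &= \int_{-\infty}^{\infty} \left[A \cdot \delta(f - f_{\mathrm{Nyq}} + a) + A^{*} \cdot \delta(f + f_{\mathrm{Nyq}} - a)\right] e^{2\pi i f t}\,df \\
&= A\,e^{2\pi i t (f_{\mathrm{Nyq}} - a)} + A^{*} e^{2\pi i t (a - f_{\mathrm{Nyq}})} = 2\operatorname{Re}\!\left[A\,e^{2\pi i t (f_{\mathrm{Nyq}} - a)}\right] .
\end{aligned} \tag{2.101a}$$

Similarly, we can take the inverse Fourier transform of $U^{[k]}(f)$ in (2.100f) to get

$$u^{[k]}(t) = \begin{cases}
2\operatorname{Re}\!\left[A\,e^{2\pi i t (f_{\mathrm{Nyq}} + k f_{\mathrm{Nyq}} - a)}\right] & \text{for } k \text{ even} \\[4pt]
2\operatorname{Re}\!\left[A\,e^{-2\pi i t (f_{\mathrm{Nyq}} + (k-1) f_{\mathrm{Nyq}} + a)}\right] & \text{for } k \text{ odd}
\end{cases} . \tag{2.101b}$$

Substituting $t = m\Delta t$ from (2.100g) and $f_{\mathrm{Nyq}} = 1/(2\Delta t)$ from (2.99a) into Eq. (2.101a) gives

$$\begin{aligned}
u(m\Delta t) &= 2\operatorname{Re}\!\left[A\,e^{2\pi i m \Delta t \left((2\Delta t)^{-1} - a\right)}\right] = 2\operatorname{Re}\!\left[A\,e^{i\pi m} e^{-2\pi i m a \Delta t}\right] \\
&= 2\operatorname{Re}\!\left[(-1)^m A\,e^{-2\pi i m a \Delta t}\right] .
\end{aligned} \tag{2.101c}$$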

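Making the same substitutions into Eq. (2.101b) gives

$$u^{[k]}(m\Delta t) = \begin{cases}
2\operatorname{Re}\!\left[A\,e^{2\pi i m \Delta t \left((2\Delta t)^{-1} + k(2\Delta t)^{-1} - a\right)}\right] = 2\operatorname{Re}\!\left[A\,e^{i\pi m} e^{i\pi m k} e^{-2\pi i m a \Delta t}\right] & \text{for } k \text{ even} \\[4pt]
2\operatorname{Re}\!\left[A\,e^{-2\pi i m \Delta t \left((2\Delta t)^{-1} + (k-1)(2\Delta t)^{-1} + a\right)}\right] = 2\operatorname{Re}\!\left[A\,e^{-i\pi m} e^{-i\pi m (k-1)} e^{-2\pi i m a \Delta t}\right] & \text{for } k \text{ odd}
\end{cases} . \tag{2.101d}$$

But $e^{\pm i\pi m k} = (-1)^{mk} = 1$ when $k$ is even and $e^{\pm i\pi m (k-1)} = (-1)^{m(k-1)} = 1$ when $k$ is odd, so this last result can be written as

$$u^{[k]}(m\Delta t) = \begin{cases}
2\operatorname{Re}\!\left[A(-1)^m e^{-2\pi i m a \Delta t}\right] & \text{for } k \text{ even} \\[4pt]
2\operatorname{Re}\!\left[A(-1)^m e^{-2\pi i m a \Delta t}\right] & \text{for } k \text{ odd}
\end{cases} . \tag{2.101e}$$

Comparing this with (2.101c), we conclude that $u(m\Delta t) = u^{[k]}(m\Delta t)$ for all values of $m$ and $k$, showing that (2.100g) must be true. Because the $u^{[k]}$ functions have exactly the same values as the function $u$ at $t = m\Delta t$ for $m = 0, \pm 1, \pm 2, \ldots$, the $u^{[k]}$ functions are called aliases of function $u$. Figure 2.16 graphs an example of $u(t)$ and $u^{[1]}(t)$ to show how $u$ and its alias $u^{[1]}$ can have identical values at all the sample positions on the $t$ axis.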
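The equality of $u$ and its first alias at the sample points can be checked with a few lines of code. This sketch evaluates $u(t)$ from (2.101a) and $u^{[1]}(t)$ from the $k = 1$ case of (2.101b) at $t = m\Delta t$ and at one point between samples. The sampling interval, the complex amplitude `A`, and the choice $a = 0.2 f_{\mathrm{Nyq}}$ (so that the two tones sit at 0.8 and 1.2 times the Nyquist frequency) are assumptions made for illustration.

```python
import cmath

def tone(A, f, t):
    """2 Re[A exp(2*pi*i*f*t)]: a real sinusoid with complex amplitude A at frequency f."""
    return 2.0 * (A * cmath.exp(2j * cmath.pi * f * t)).real

dt = 0.5                    # hypothetical sampling interval
f_nyq = 1.0 / (2.0 * dt)    # Nyquist frequency from (2.99a)
a = 0.2 * f_nyq             # distance of the tone from the Nyquist frequency
A = 0.3 + 0.4j              # arbitrary complex amplitude

# u(t) from (2.101a) has frequency f_nyq - a; its alias u^[1](t) from (2.101b)
# with k = 1 has frequency -(f_nyq + a).
u_samples = [tone(A, f_nyq - a, m * dt) for m in range(-10, 11)]
alias_samples = [tone(A, -(f_nyq + a), m * dt) for m in range(-10, 11)]

# Largest disagreement at the sample points (should be rounding error only).
max_sample_gap = max(abs(x - y) for x, y in zip(u_samples, alias_samples))

# Disagreement halfway between two samples, where the functions are free to differ.
between_gap = abs(tone(A, f_nyq - a, 0.5 * dt) - tone(A, -(f_nyq + a), 0.5 * dt))
```

The two sinusoids agree at every sample to within rounding error but differ clearly between samples, which is why the sampled data alone cannot tell them apart.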
The term “alias” is an interesting one; it suggests that there is no real way to distinguish these functions if all we know are the values of the sample points at $t = m\Delta t$. Yet in Figs. 2.14 and 2.15, there is really no question as to which is the correct region of $U^{[\infty]}$; spectral values whose frequencies do not lie between $+f_{\mathrm{Nyq}}$ and $-f_{\mathrm{Nyq}}$ can clearly be disregarded. Consider, however, that before $u(t)$ is analyzed there is no guarantee as to what the correct value of $f_{\max}$ is. Figure 2.17, for example, shows a pattern for $U^{[\infty]}$ that seems to have well-separated regions for $U$ and all its aliases when in fact there is a high-frequency triangle that is hidden by aliasing. The unwary analyst might conclude that $U$ has the shape shown in Fig. 2.18(a) when its true shape is the one shown in Fig. 2.18(b). There is really no way to be sure of the true shape of $U$ when all that is known is the DFT of the sampled signal $u(t)$. The basic problem, which is that the DFT is the sampled version of $U^{[\infty]}$ instead of $U$, does not disappear when $F = 1/\Delta t$ is made larger by decreasing the sampling interval $\Delta t$; there is always the possibility that the true $U$ curve is broad enough to overlap. Returning to Fig. 2.16, we see that no matter how small $\Delta t$ is made, the information thrown away from between the samples inevitably allows high frequencies to masquerade as low frequencies. There is no foolproof method for both sampling the data and avoiding this possibility.
Fortunately, there are usually ways of avoiding this logical dead end. As is pointed out in Sec. 2.2 above [see discussion after Eq. (2.9b)], in practice all measurements are sampled and, before representing them by continuous functions, we must know that the samples capture all the relevant detail. In other words, there must be some way of knowing, based on past experience or
knowledge of how the data is gathered, that the sampling is rapid enough to represent faithfully
all the important high-frequency details. In terms of the notation used to discuss Fig. 2.14, we
must eventually be prepared to say that, for some specific ƒmax, no higher frequencies are present
to create aliasing—that is, we must know that if more closely spaced sampling is done all that
would be found is a smooth, quasi-linear variation between the current samples. Many times the
electronic instruments used to make the measurements cannot sense high-frequency data, so even
if high-frequency components exist, they cannot be recorded. Other times, all that can be done is
to look at the data samples and decide whether it is reasonable to suspect the presence of unseen
high-frequency components. The data in Fig. 2.19(a), for example, almost certainly do not
contain significant amounts of unseen high frequencies, whereas unseen high frequencies could
well be present in Fig. 2.19(b). There may be cases where all that can be done is to shorten ¨t and
see whether previously aliased frequency components suddenly appear. The question of whether
aliasing is present is analogous to the question of whether experimental error is present. Just as it
is always logically possible that data contain significant amounts of undetected error, so it is

FIGURE 2.16.

1.1
1

0.5

y
i
0
Y
i

0.5

1
1.1
5 4 3 2 1 0 1 2 3 4 5
4.5 x
i t 4.5

The solid line represents a sinusoidal oscillation at a frequency that is 0.8 times the Nyquist
frequency, and the dashed line represents a sinusoidal oscillation that is 1.2 times the
Nyquist frequency. When the curves are sampled at the rate represented by the black dots—
which in this case is the Nyquist frequency—there is no way to tell them apart in the sampled
data.

- 196
- 196- -
Aliasing as an Error · 2.22

always logically possible that significant amounts of aliasing are being overlooked. Just as we
often expect insignificant amounts of error to occur no matter what precautions are taken, so we
often expect insignificant amounts of aliasing to occur in the calculated DFT. What is needed is
the presence of good engineering and scientific judgment; there must always be someone willing
to pick a value for ƒmax, allowing us to specify the sampling interval t 4 1 (2 f max ) that prevents
significant aliasing in the DFT.

2.23 Aliasing as a Tool


The previous section presented the bad aspects of aliasing, treating it as a form of data corruption. There are, however, occasions when aliasing is more of a feature than a bug. Many times, a real function $u(t)$ is known to have a Fourier transform

$$U(f) = \int_{-\infty}^{\infty} u(t)\,e^{-2\pi i f t}\,dt ,$$

which is zero for all positive frequencies $f$ that do not lie between the two positive numbers $f_{\min}$ and $f_{\max}$; that is, $U(f)$ is zero when $0 < f < f_{\min}$ and $f > f_{\max}$. Because $u(t)$ is real, $U(f)$ must be Hermitian (see entry 7 of Table 2.1), which means

$$U(-f) = U(f)^{*} .$$

This shows that $U(f)$ must also be strictly zero for negative frequencies $f$ where $-f_{\min} < f < 0$ and $f < -f_{\max}$. The $U(f)$ transform is schematically represented in Fig. 2.20 with the two blocks showing that $U$ is zero unless $f$ lies between $(-f_{\max}, -f_{\min})$ or $(f_{\min}, f_{\max})$.
The situation shown in Fig. 2.20 describes the signal produced by Michelson interferometers. At the beginning of this chapter, we mentioned that interferometers produce interferograms that must then be Fourier transformed to produce the desired spectral measurement. As explained later in Chapter 4 (see Sec. 4.10), interferometers use optical filters to block out undesired electromagnetic frequencies, which means there always exist values of $f_{\min}$ and $f_{\max}$ such that the transform $U(f)$ of the interferogram signal $u(t)$ is zero unless $f$ lies between $(-f_{\max}, -f_{\min})$ or $(f_{\min}, f_{\max})$. Suppose we sample the interferogram signal with a sampling interval $\Delta t$ such that the Nyquist frequency $f_{\mathrm{Nyq}} = (2\Delta t)^{-1}$ is slightly larger than $f_{\max}$. Repeating the reasoning used to get Fig. 2.15 above, we see that

$$U^{[\infty]}(f, F) = \sum_{k=-\infty}^{\infty} U(f - kF)$$

now has the form shown in Fig. 2.21. Again, the solid blocks show the original $U(f)$, the dashed blocks show the aliases created by turning $U(f)$ into $U^{[\infty]}(f,F)$, and the curved arrows drawn show exactly how the aliased blocks are created from the original blocks. No solid blocks overlap with the dashed blocks, so aliasing is not a problem.

FIGURE 2.17. [Schematic of $U^{[\infty]}(f,F)$ versus $f$, with the frequency axis marked at $-F = -2f_{\mathrm{Nyq}}$, $-f_{\mathrm{Nyq}}$, $f_{\mathrm{Nyq}}$, and $F = 2f_{\mathrm{Nyq}}$.]

FIGURE 2.18(a). [Schematic of $U(f)$.]

FIGURE 2.18(b). [Schematic of $U(f)$.]

The $U^{[\infty]}(f,F)$ data in Fig. 2.17 contain hidden aliasing that can lead spectral analysts to assume that Fig. 2.18(a) rather than Fig. 2.18(b) depicts the true frequency spectrum.

FIGURE 2.19(a). This data is relatively smooth, suggesting that it does not contain high-frequency components.

FIGURE 2.19(b). This curve varies rapidly in three locations, suggesting the presence of high-frequency components in the data.
Now consider what happens when we force aliasing to occur by choosing $\Delta t$ to be twice its original size, which halves $F$ and creates the $U^{[\infty]}$ plot shown in Fig. 2.22. As in Fig. 2.21, none of the solid blocks overlap with the dashed blocks. Because the dashed blocks come from turning $U$ into $U^{[\infty]}$, the spectral shapes represented by the solid and dashed blocks are all identical. This means that the aliasing does not cause spectral information to be lost; either the solid blocks or the dashed blocks can be used to recover the true shape of $U(f)$. The electronic equipment used to sample $u(t)$ only needs to sample half as often as before, which usually makes it less expensive to build, and as a bonus the rate at which data flows from the interferometer ends up being cut in half. This last point is often a significant consideration when the interferometer is on a satellite and all the data has to be communicated to the ground. The scheme shown in Fig. 2.22 is called undersampling. There is nothing special about undersampling by a factor of 2; if the distance between $f_{\min}$ and $f_{\max}$ is small enough, and $f_{\min}$ is far enough from $f = 0$, we can undersample by much higher factors. Figure 2.23 shows a scheme that undersamples by a factor of 5, with 4 aliases rather than one.
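Whether a given undersampling choice is safe comes down to whether any shifted copy $U(f - kF)$ lands on top of the original blocks. The sketch below tests this by brute force, comparing intervals for the positive band $(f_{\min}, f_{\max})$, the negative band $(-f_{\max}, -f_{\min})$, and their shifted copies. The band edges (4 and 5, in arbitrary units), the trial values of $F$, and the helper names are hypothetical, chosen only to illustrate the idea.

```python
def intervals_overlap(a, b):
    """True if the open intervals a = (lo, hi) and b = (lo, hi) share any points."""
    return max(a[0], b[0]) < min(a[1], b[1])

def aliases_overlap(f_min, f_max, F):
    """True if any copy U(f - kF), k != 0, overlaps the original two bands of U(f)."""
    originals = [(f_min, f_max), (-f_max, -f_min)]
    # Shifts larger than 2*f_max move every copy completely past the original bands.
    k_limit = int(2 * f_max / F) + 1
    for k in range(1, k_limit + 1):
        for shift in (k * F, -k * F):
            for lo, hi in originals:
                copy = (lo + shift, hi + shift)
                if any(intervals_overlap(copy, band) for band in originals):
                    return True
    return False

f_min, f_max = 4.0, 5.0  # hypothetical band edges

# F = 10.5 puts the Nyquist frequency (5.25) slightly above f_max, as in Fig. 2.21.
ok_full_rate = not aliases_overlap(f_min, f_max, 10.5)
# Halving F (doubling the sampling interval) still leaves the blocks separated.
ok_half_rate = not aliases_overlap(f_min, f_max, 5.25)
# Not every reduced rate works: with F = 2.1 an alias lands on the original band.
ok_bad_rate = not aliases_overlap(f_min, f_max, 2.1)
# A carefully chosen lower rate can work again.
ok_quarter_rate = not aliases_overlap(f_min, f_max, 2.5)
```

The example suggests that undersampling factors cannot be chosen freely: each candidate $F$ has to be checked against the actual values of $f_{\min}$ and $f_{\max}$.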

2.24 Sampling Theorem


We define a band-limited function $u(t)$ to be a function for which there exists a positive frequency $f_{\max}$ such that the forward Fourier transform of $u(t)$,

$$U(f) = \int_{-\infty}^{\infty} u(t)\,e^{-2\pi i f t}\,dt ,$$

is strictly zero when $f < -f_{\max}$ or $f > f_{\max}$. The previous section indicated that the interferogram of a Michelson interferometer is a special case of a band-limited function; not only is its transform zero for $|f| > f_{\max}$, but there is also a positive frequency $f_{\min}$ such that its transform is zero for $|f| < f_{\min}$ (see Fig. 2.20). It can be shown that whenever a continuous function $u(t)$ is also band limited, then its samples $u(m\Delta t)$ (with $m = 0, \pm 1, \pm 2, \ldots$) can be used to reconstruct the complete function, including the values of $u$ between the samples, as long as we choose

$$\Delta t \le \frac{1}{2 f_{\max}} \tag{2.102}$$

to prevent aliasing.
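A numerical illustration of this claim is sketched below. It assumes the standard sinc-interpolation sum, $u(t) \approx \sum_m u(m\Delta t)\,\mathrm{sinc}\big((t - m\Delta t)/\Delta t\big)$ with $\mathrm{sinc}(x) = \sin(\pi x)/(\pi x)$, as the reconstruction formula; that formula is the usual statement of the sampling theorem and is not quoted from the text above. The test function $\mathrm{sinc}^2(t)$, whose transform is a triangle that vanishes for $|f| \ge 1$, the sampling interval, and the truncation of the sum to 801 samples are likewise illustrative assumptions.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi x)/(pi x), with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, dt, t):
    """Sinc-interpolate the value at time t from (m, u(m*dt)) pairs."""
    return sum(u_m * sinc((t - m * dt) / dt) for m, u_m in samples)

def u(t):
    # Band-limited test function: the transform of sinc(t)^2 vanishes for |f| >= 1.
    return sinc(t) ** 2

f_max = 1.0
dt = 0.4  # satisfies dt <= 1/(2*f_max) = 0.5, as required by (2.102)

# The infinite sum is truncated to |m| <= 400; the neglected terms are tiny
# because sinc(t)^2 dies away quickly.
samples = [(m, u(m * dt)) for m in range(-400, 401)]

t_between = 0.13  # a time that is not a sample point
error = abs(reconstruct(samples, dt, t_between) - u(t_between))
```

With these choices the reconstructed value between samples matches the true function to well within $10^{-4}$; the small residual comes from truncating the sum rather than from aliasing.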
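FIGURE 2.20. [Schematic of $U(f)$ versus $f$, with the frequency axis marked at $-f_{\max}$, $-f_{\min}$, $f_{\min}$, and $f_{\max}$.]

FIGURE 2.21. [Schematic of $U^{[\infty]}(f,F)$ versus $f$, with the frequency axis marked at $-F$, $-f_{\mathrm{Nyq}}$, $-f_{\max}$, $-f_{\min}$, $f_{\min}$, $f_{\max}$, $f_{\mathrm{Nyq}}$, and $F$.] Frequency $F$ is twice the Nyquist frequency $f_{\mathrm{Nyq}}$ in Fig. 2.21.

We start by forming the mathematical construct

$$v(t) = \sum_{m=-\infty}^{\infty} u(m\Delta t)\,\delta(t - m\Delta t) . \tag{2.103}$$

Clearly, the $u(m\Delta t)$ sample values of function $u$ are the only data used to set up function $v(t)$. Because $u(t)\,\delta(t - t_0) = u(t_0)\,\delta(t - t_0)$ for any continuous function $u$ [see Eq. (2.68e) above], this can be written as

$$v(t) = \sum_{m=-\infty}^{\infty} u(t)\,\delta(t - m\Delta t)$$

or

$$v(t) = u(t) \cdot \left[\sum_{m=-\infty}^{\infty} \delta(t - m\Delta t)\right] .$$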

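Note that here t in the function u has returned to being a continuous, not a sampled, variable.
Taking the Fourier transform of both sides gives, using the Fourier convolution theorem [see Eq. (2.72i)],

\[ V(f) = U(f) * \left[ \frac{1}{\Delta t} \sum_{k=-\infty}^{\infty} \delta\!\left( f - \frac{k}{\Delta t} \right) \right] , \tag{2.104a} \]

where

\[ V(f) = \int_{-\infty}^{\infty} v(t)\, e^{-2\pi i f t}\, dt , \tag{2.104b} \]

\[ U(f) = \int_{-\infty}^{\infty} u(t)\, e^{-2\pi i f t}\, dt , \tag{2.104c} \]

and

\[ \int_{-\infty}^{\infty} \left[ \sum_{k=-\infty}^{\infty} \delta(t - k\Delta t) \right] e^{-2\pi i f t}\, dt
   = \frac{1}{\Delta t} \sum_{k=-\infty}^{\infty} \delta\!\left( f - \frac{k}{\Delta t} \right) \tag{2.104d} \]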

from formula (2.78d). Note that here both ƒ and t are continuous, not sampled, variables. We can
now use the linearity of the convolution [see discussion after Eq. (2.38c)] and the definition of
the convolution in Eq. (2.38a) to write (2.104a) as

\[
\begin{aligned}
\Delta t \cdot V(f) &= \sum_{k=-\infty}^{\infty} U(f) * \delta\!\left( f - \frac{k}{\Delta t} \right)
  = \sum_{k=-\infty}^{\infty} \int_{-\infty}^{\infty} U(f')\, \delta\!\left( f - \frac{k}{\Delta t} - f' \right) df' \\
 &= \sum_{k=-\infty}^{\infty} U\!\left( f - \frac{k}{\Delta t} \right) = U^{[\infty]}\!\left( f, \frac{1}{\Delta t} \right) ,
\end{aligned}
\tag{2.105a}
\]

where $U^{[\infty]}$ is as defined in Eq. (2.93b) above. Inequality (2.102) ensures that the separate
regions of U that combine to create $U^{[\infty]}$ do not overlap, giving us the graph of $U^{[\infty]}$ shown in
Fig. 2.24. Hence, we can use the $\Pi$ function defined in Eq. (2.56c) to select just the region of
nonzero $U^{[\infty]}$ between $-(2\Delta t)^{-1}$ and $(2\Delta t)^{-1}$, recreating the original U(ƒ) transform.
Multiplication of (2.105a) by $\Pi\!\left( f, (2\Delta t)^{-1} \right)$ then gives

\[ U(f) = \Pi\!\left( f, \frac{1}{2\Delta t} \right) \cdot U^{[\infty]}\!\left( f, \frac{1}{\Delta t} \right)
        = \Delta t \cdot V(f) \cdot \Pi\!\left( f, \frac{1}{2\Delta t} \right) . \tag{2.105b} \]

[FIGURE 2.22. Plot of U^[∞](f, F) against f; the frequency axis is marked at ±f_min, ±f_max, ±f_Nyq, and ±F.]

[FIGURE 2.23. Plot of U^[∞](f, F) against f; the frequency axis is marked at ±f_min, ±f_max, ±f_Nyq, and ±F.]

In both Figs. 2.22 and 2.23, frequency F is twice the Nyquist frequency f_Nyq.


Having recovered the original U(ƒ), an inverse Fourier transform of U(ƒ) gives back the original
unsampled u(t). Using the Fourier convolution theorem again to take the inverse Fourier
transform of both sides of (2.105b), we get [applying Eq. (2.39j) after interchanging the roles of ƒ
and t]

\[
\begin{aligned}
u(t) &= \Delta t \int_{-\infty}^{\infty} V(f) \cdot \Pi\!\left( f, \frac{1}{2\Delta t} \right) e^{2\pi i f t}\, df \\
     &= \Delta t \left[ \int_{-\infty}^{\infty} V(f)\, e^{2\pi i f t}\, df \right]
        * \left[ \int_{-\infty}^{\infty} \Pi\!\left( f', \frac{1}{2\Delta t} \right) e^{2\pi i f' t}\, df' \right] ,
\end{aligned}
\tag{2.106a}
\]

where the convolution between the two expressions inside square brackets [ ] is over the variable
t. From (2.104b), function V(ƒ) is the forward Fourier transform of v(t), making v(t) equal to the
inverse Fourier transform of V(ƒ) in (2.106a), with v(t) defined as

\[ v(t) = \sum_{m=-\infty}^{\infty} u(m\Delta t)\, \delta(t - m\Delta t) \]

in Eq. (2.103). From Eq. (2.71a) above, the inverse Fourier transform of $\Pi$ is

\[ F^{(ift)}\!\left( \Pi\!\left( f, \frac{1}{2\Delta t} \right) \right)
   = \int_{-\infty}^{\infty} e^{2\pi i f t}\, \Pi\!\left( f, \frac{1}{2\Delta t} \right) df
   = \frac{1}{\pi t} \sin\!\left( \frac{\pi t}{\Delta t} \right) . \]

Equation (2.106a) can now be written as

\[ u(t) = \Delta t \left[ \sum_{m=-\infty}^{\infty} u(m\Delta t)\, \delta(t - m\Delta t) \right]
          * \left[ \frac{1}{\pi t} \sin\!\left( \frac{\pi t}{\Delta t} \right) \right] . \tag{2.106b} \]

Again, the linearity of the convolution can be used to simplify (2.106b),

\[ u(t) = \Delta t \sum_{m=-\infty}^{\infty} \left\{ u(m\Delta t) \left[ \delta(t - m\Delta t)
          * \frac{1}{\pi t} \sin\!\left( \frac{\pi t}{\Delta t} \right) \right] \right\} \]

or, using that $\delta(t - t_0) * u(t) = u(t - t_0)$ for any continuous function u,

\[ u(t) = \sum_{m=-\infty}^{\infty} \left\{ u(m\Delta t) \left[ \frac{1}{\pi \left( (t - m\Delta t)/\Delta t \right)}
          \sin\!\left( \frac{\pi (t - m\Delta t)}{\Delta t} \right) \right] \right\} . \tag{2.106c} \]

[FIGURE 2.24. Plot of U^[∞](f, 1/Δt) against f. The copy of U(f) centered on f = 0 runs from −f_max to f_max and lies between −1/(2Δt) and 1/(2Δt); the frequency axis is also marked at ±(1/Δt − f_max).]

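This formula gives us u(t) everywhere in terms of the samples $u(m\Delta t)$ and the function

\[ \frac{1}{\pi (t/\Delta t)} \sin\!\left( \frac{\pi t}{\Delta t} \right) . \]

We now define the function

\[ \mathrm{sinc}(x) = \frac{\sin(x)}{x} \tag{2.106d} \]

and write (2.106c) as

\[ u(t) = \sum_{m=-\infty}^{\infty} u(m\Delta t)\, \mathrm{sinc}\!\left( \frac{\pi (t - m\Delta t)}{\Delta t} \right) . \tag{2.106e} \]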
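
The reconstruction formula in Eq. (2.106e) can be tried out numerically. The Python sketch below is an illustration under stated assumptions: the test signal u(t) = [sin(πt)/(πt)]², whose transform vanishes for |f| > 1, and the truncation of the infinite sum to |m| ≤ 2000 are choices made here, not part of the text.

```python
import math

def sinc(x):
    """sinc(x) = sin(x)/x as in Eq. (2.106d), with the limiting value 1 at x = 0."""
    return 1.0 if x == 0.0 else math.sin(x) / x

def u(t):
    """Hypothetical band-limited test signal; its transform vanishes for |f| > 1."""
    return sinc(math.pi * t) ** 2

def reconstruct(t, dt, m_max):
    """Eq. (2.106e) with the infinite sum truncated to |m| <= m_max."""
    return sum(u(m * dt) * sinc(math.pi * (t - m * dt) / dt)
               for m in range(-m_max, m_max + 1))

dt = 0.25          # satisfies inequality (2.102): dt <= 1/(2 * f_max) = 0.5
t_between = 0.1    # a time that is not one of the sample points
approx = reconstruct(t_between, dt, m_max=2000)
exact = u(t_between)
print(approx, exact)
```

The two printed numbers should agree closely; the small remaining difference comes from truncating the sum, since only a finite number of samples is used.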


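Many authors use a different definition of the sinc function, which we call here $\mathrm{sinc}_{alt}$, with

\[ \mathrm{sinc}_{alt}(x) = \frac{\sin(\pi x)}{\pi x} . \]

In terms of $\mathrm{sinc}_{alt}$, Eq. (2.106e) becomes

\[ u(t) = \sum_{m=-\infty}^{\infty} u(m\Delta t)\, \mathrm{sinc}_{alt}\!\left( \frac{t - m\Delta t}{\Delta t} \right) . \]

For the rest of this book, the symbol sinc will refer to $\sin(x)/x$ instead of $\sin(\pi x)/(\pi x)$. We also
note that the Fourier transform pair in (2.71a) can be written in terms of sinc(x) as

\[ \int_{-\infty}^{\infty} e^{-2\pi i f t} \left[ 2F\, \mathrm{sinc}(2\pi F t) \right] dt = \Pi(f, F) \]

and

\[ \int_{-\infty}^{\infty} e^{2\pi i f t}\, \Pi(f, F)\, df = 2F\, \mathrm{sinc}(2\pi F t) . \]

Replacing ƒ by −ƒ in the top integral and t by −t in the bottom integral gives

\[ \int_{-\infty}^{\infty} e^{2\pi i f t} \left[ 2F\, \mathrm{sinc}(2\pi F t) \right] dt = \Pi(-f, F) = \Pi(f, F) \]

and

\[ \int_{-\infty}^{\infty} e^{-2\pi i f t}\, \Pi(f, F)\, df = 2F\, \mathrm{sinc}(-2\pi F t) = 2F\, \mathrm{sinc}(2\pi F t) , \]

where we have used that $\Pi(f, F)$ and $\mathrm{sinc}(2\pi F t)$ are even functions of their arguments:

\[ \mathrm{sinc}(-x) = \mathrm{sinc}(x) \tag{2.107a} \]

and

\[ \Pi(-f, F) = \Pi(f, F) . \tag{2.107b} \]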

This means we can write this Fourier relationship using the more general formulas

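\[ F^{(\pm ift)}\!\left( 2F\, \mathrm{sinc}(2\pi F t) \right)
   = \int_{-\infty}^{\infty} e^{\pm 2\pi i f t} \left[ 2F\, \mathrm{sinc}(2\pi F t) \right] dt = \Pi(f, F) \tag{2.108a} \]

and

\[ F^{(\pm ift)}\!\left( \Pi(f, F) \right) = F^{(\pm itf)}\!\left( \Pi(f, F) \right)
   = \int_{-\infty}^{\infty} e^{\pm 2\pi i f t}\, \Pi(f, F)\, df = 2F\, \mathrm{sinc}(2\pi F t) . \tag{2.108b} \]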
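
The two sinc conventions and the transform pair in Eq. (2.108b) can be checked with a few lines of Python. This is a sketch under an assumption stated here: Π(f, F) is taken to be the unit-height rectangle equal to 1 for |f| < F and 0 elsewhere, which is how the text uses it; Eq. (2.56c) itself is not reproduced in this section. The helper name rect_transform and the values of F and t are hypothetical.

```python
import math

def sinc(x):
    """This book's convention, Eq. (2.106d): sin(x)/x."""
    return 1.0 if x == 0.0 else math.sin(x) / x

def sinc_alt(x):
    """The alternative convention: sin(pi*x)/(pi*x)."""
    return sinc(math.pi * x)

def rect_transform(t, F, n=20000):
    """Midpoint-rule estimate of the integral of exp(2*pi*i*f*t) over -F < f < F.

    Assumes Pi(f, F) is 1 for |f| < F and 0 elsewhere. Only the cosine part
    is summed because the sine part is odd in f and integrates to zero.
    """
    df = 2.0 * F / n
    total = 0.0
    for j in range(n):
        f = -F + (j + 0.5) * df
        total += math.cos(2.0 * math.pi * f * t) * df
    return total

F, t = 1.5, 0.4   # hypothetical values
numeric = rect_transform(t, F)
closed_form = 2.0 * F * sinc(2.0 * math.pi * F * t)
print(numeric, closed_form)
```

One practical note: NumPy's numpy.sinc follows the sinc_alt convention, so code written against this book's formulas would need to call it as numpy.sinc(x / numpy.pi).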

2.25 Fourier Transforms in Two and Three Dimensions


The integral Fourier transform extends easily and naturally to two- and three-dimensional
functions. We can, for example, define the integral Fourier transform of any two-dimensional
function u(x,y) to be

\[ U(\xi, \eta) = \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\; e^{-2\pi i (x\xi + y\eta)}\, u(x, y) . \tag{2.109a} \]

The inverse Fourier transform of U returns the original function,

\[ u(x, y) = \int_{-\infty}^{\infty} d\xi \int_{-\infty}^{\infty} d\eta\; e^{2\pi i (x\xi + y\eta)}\, U(\xi, \eta) . \tag{2.109b} \]

In three dimensions we can write, for the function u(x, y, z), that

\[ U(\xi, \eta, \zeta) = \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy \int_{-\infty}^{\infty} dz\;
   e^{-2\pi i (x\xi + y\eta + z\zeta)}\, u(x, y, z) \tag{2.109c} \]

and

\[ u(x, y, z) = \int_{-\infty}^{\infty} d\xi \int_{-\infty}^{\infty} d\eta \int_{-\infty}^{\infty} d\zeta\;
   e^{2\pi i (x\xi + y\eta + z\zeta)}\, U(\xi, \eta, \zeta) . \tag{2.109d} \]

This pattern of forward and inverse transforms can be extended indefinitely to functions u and U
with ever larger numbers of arguments, but for the purposes of this book there is no need to go
beyond the two- and three-dimensional transforms given in Eqs. (2.109a)–(2.109d). As a matter
of notation, we often use the standard Cartesian x̂ and ŷ unit vectors pointing along the x and y
axes of a Cartesian coordinate system to define vectors

\[ \vec{\rho} = x\hat{x} + y\hat{y} \quad \text{and} \quad \vec{q} = \xi\hat{x} + \eta\hat{y} . \]

We introduce the symbol $u(\vec{\rho})$ as a shorthand for u(x,y) and the symbol $U(\vec{q})$ as a shorthand for
$U(\xi, \eta)$. Now Eqs. (2.109a) and (2.109b) can be written as

\[ U(\vec{q}) = \iint_{-\infty}^{\infty} d^2\rho\; e^{-2\pi i \vec{\rho} \cdot \vec{q}}\, u(\vec{\rho}) \tag{2.110a} \]

and

\[ u(\vec{\rho}) = \iint_{-\infty}^{\infty} d^2q\; e^{2\pi i \vec{\rho} \cdot \vec{q}}\, U(\vec{q}) . \tag{2.110b} \]

We can also define vectors for the three-dimensional case,

\[ \vec{r} = x\hat{x} + y\hat{y} + z\hat{z} \quad \text{and} \quad \vec{s} = \xi\hat{x} + \eta\hat{y} + \zeta\hat{z} , \]

and then write Eqs. (2.109c) and (2.109d) as

\[ U(\vec{s}) = \iiint_{-\infty}^{\infty} d^3r\; e^{-2\pi i \vec{r} \cdot \vec{s}}\, u(\vec{r}) \tag{2.110c} \]

and

\[ u(\vec{r}) = \iiint_{-\infty}^{\infty} d^3s\; e^{2\pi i \vec{r} \cdot \vec{s}}\, U(\vec{s}) . \tag{2.110d} \]
5
Vector notation is sometimes used to group families of associated forward and inverse Fourier
transforms into a single equation. We might, for example, write the six scalar equations

\[ U_x(\vec{s}) = \iiint_{-\infty}^{\infty} d^3r\; e^{-2\pi i \vec{r} \cdot \vec{s}}\, u_x(\vec{r}) , \qquad
   u_x(\vec{r}) = \iiint_{-\infty}^{\infty} d^3s\; e^{2\pi i \vec{r} \cdot \vec{s}}\, U_x(\vec{s}) , \]

\[ U_y(\vec{s}) = \iiint_{-\infty}^{\infty} d^3r\; e^{-2\pi i \vec{r} \cdot \vec{s}}\, u_y(\vec{r}) , \qquad
   u_y(\vec{r}) = \iiint_{-\infty}^{\infty} d^3s\; e^{2\pi i \vec{r} \cdot \vec{s}}\, U_y(\vec{s}) , \]

and

\[ U_z(\vec{s}) = \iiint_{-\infty}^{\infty} d^3r\; e^{-2\pi i \vec{r} \cdot \vec{s}}\, u_z(\vec{r}) , \qquad
   u_z(\vec{r}) = \iiint_{-\infty}^{\infty} d^3s\; e^{2\pi i \vec{r} \cdot \vec{s}}\, U_z(\vec{s}) \]

as the pair of vector equations

\[ \vec{U}(\vec{s}) = \iiint_{-\infty}^{\infty} d^3r\; e^{-2\pi i \vec{r} \cdot \vec{s}}\, \vec{u}(\vec{r}) \tag{2.110e} \]

and

\[ \vec{u}(\vec{r}) = \iiint_{-\infty}^{\infty} d^3s\; e^{2\pi i \vec{r} \cdot \vec{s}}\, \vec{U}(\vec{s}) , \tag{2.110f} \]

where

\[ \vec{u}(\vec{r}) = \hat{x} u_x(\vec{r}) + \hat{y} u_y(\vec{r}) + \hat{z} u_z(\vec{r}) \quad \text{and} \quad
   \vec{U}(\vec{s}) = \hat{x} U_x(\vec{s}) + \hat{y} U_y(\vec{s}) + \hat{z} U_z(\vec{s}) . \]

We call $\vec{U}(\vec{s})$ the vector Fourier transform of $\vec{u}(\vec{r})$ and $\vec{u}(\vec{r})$ the vector inverse Fourier
transform of $\vec{U}(\vec{s})$. Just as in the one-dimensional case, it makes no difference which Fourier
transform is labeled the forward transform and which is labeled the inverse transform as long as
there is a change in sign of the exponent of e. Following the pattern of Eq. (2.28ℓ), we can also
write

\[ \iint_{-\infty}^{\infty} d^2q\; e^{\pm 2\pi i \vec{\rho} \cdot \vec{q}} \iint_{-\infty}^{\infty} d^2\rho'\;
   e^{\mp 2\pi i \vec{\rho}\,' \cdot \vec{q}}\, u(\vec{\rho}\,') = u(\vec{\rho}) \tag{2.110g} \]

and

\[ \iiint_{-\infty}^{\infty} d^3s\; e^{\pm 2\pi i \vec{r} \cdot \vec{s}} \iiint_{-\infty}^{\infty} d^3r'\;
   e^{\mp 2\pi i \vec{r}\,' \cdot \vec{s}}\, v(\vec{r}\,') = v(\vec{r}) \tag{2.110h} \]

for two-dimensional and three-dimensional scalar functions $u(\vec{\rho})$ and $v(\vec{r})$. For three-
dimensional vector functions, this becomes

\[ \iiint_{-\infty}^{\infty} d^3s\; e^{\pm 2\pi i \vec{r} \cdot \vec{s}} \iiint_{-\infty}^{\infty} d^3r'\;
   e^{\mp 2\pi i \vec{r}\,' \cdot \vec{s}}\, \vec{v}(\vec{r}\,') = \vec{v}(\vec{r}) . \tag{2.110i} \]

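Many one-dimensional Fourier identities have two-dimensional and three-dimensional
counterparts. For example, the Fourier shift theorem [see Eq. (2.36h) above] in two dimensions
becomes, for a two-dimensional vector constant $\vec{a} = \hat{x} a_x + \hat{y} a_y$,

\[
\begin{aligned}
\iint_{-\infty}^{\infty} d^2\rho\; e^{\pm 2\pi i \vec{\rho} \cdot \vec{q}}\, u(\vec{\rho} + \vec{a})
 &= \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\; e^{\pm 2\pi i (x\xi + y\eta)}\, u(x + a_x, y + a_y) \\
 &= e^{\mp 2\pi i (\xi a_x + \eta a_y)} \int_{-\infty}^{\infty} dx' \int_{-\infty}^{\infty} dy'\;
    e^{\pm 2\pi i (x'\xi + y'\eta)}\, u(x', y') ,
\end{aligned}
\]

where in the last step we define $x' = x + a_x$ and $y' = y + a_y$. We now see that (dropping the
primes inside the double integral)

\[ \iint_{-\infty}^{\infty} d^2\rho\; e^{\pm 2\pi i \vec{\rho} \cdot \vec{q}}\, u(\vec{\rho} + \vec{a})
   = e^{\mp 2\pi i \vec{a} \cdot \vec{q}} \iint_{-\infty}^{\infty} d^2\rho\; e^{\pm 2\pi i \vec{\rho} \cdot \vec{q}}\, u(\vec{\rho}) . \tag{2.110j} \]

This shows the forward or inverse two-dimensional Fourier transform of $u(\vec{\rho} + \vec{a})$ to be $e^{\mp 2\pi i \vec{a} \cdot \vec{q}}$
multiplied by the forward or inverse two-dimensional Fourier transform of $u(\vec{\rho})$. Similarly in
three dimensions, we have, for a three-dimensional constant vector $\vec{b} = \hat{x} b_x + \hat{y} b_y + \hat{z} b_z$, that

\[
\begin{aligned}
\iiint_{-\infty}^{\infty} d^3r\; e^{\pm 2\pi i \vec{r} \cdot \vec{s}}\, v(\vec{r} + \vec{b})
 &= \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy \int_{-\infty}^{\infty} dz\;
    e^{\pm 2\pi i (x\xi + y\eta + z\zeta)}\, v(x + b_x, y + b_y, z + b_z) \\
 &= e^{\mp 2\pi i (b_x \xi + b_y \eta + b_z \zeta)} \int_{-\infty}^{\infty} dx' \int_{-\infty}^{\infty} dy' \int_{-\infty}^{\infty} dz'\;
    e^{\pm 2\pi i (x'\xi + y'\eta + z'\zeta)}\, v(x', y', z') ,
\end{aligned}
\]

where $x' = x + b_x$, $y' = y + b_y$, and $z' = z + b_z$. This time we find that the forward or inverse three-
dimensional Fourier transform of $v(\vec{r} + \vec{b})$ is $e^{\mp 2\pi i \vec{s} \cdot \vec{b}}$ multiplied by the forward or inverse three-
dimensional Fourier transform of $v(\vec{r})$,

\[ \iiint_{-\infty}^{\infty} d^3r\; e^{\pm 2\pi i \vec{r} \cdot \vec{s}}\, v(\vec{r} + \vec{b})
   = e^{\mp 2\pi i \vec{s} \cdot \vec{b}} \iiint_{-\infty}^{\infty} d^3r\; e^{\pm 2\pi i \vec{r} \cdot \vec{s}}\, v(\vec{r}) . \tag{2.110k} \]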
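
The two-dimensional shift theorem has an exact discrete counterpart that is easy to test. The Python sketch below is an illustration, not the book's method: it swaps the integral transform for a small discrete Fourier transform on a periodic n × n grid, and the names dft2, shifted, and worst, along with the test array and the shift (a_x, a_y) = (2, 1), are assumptions made here. It checks that shifting the array multiplies its transform by a phase factor of the opposite sign, for both choices of sign.

```python
import cmath

def dft2(u, sign):
    """Discrete analog of the transform with kernel exp(sign * 2*pi*i*(x*xi + y*eta))."""
    n = len(u)
    return [[sum(u[x][y] * cmath.exp(sign * 2j * cmath.pi * (k * x + l * y) / n)
                 for x in range(n) for y in range(n))
             for l in range(n)]
            for k in range(n)]

n = 6
# Arbitrary test data on a periodic n-by-n grid.
u = [[(3 * x + 5 * y * y) % 7 + 0.5 * x for y in range(n)] for x in range(n)]

ax, ay = 2, 1  # hypothetical shift vector a
shifted = [[u[(x + ax) % n][(y + ay) % n] for y in range(n)] for x in range(n)]

worst = 0.0
for sign in (+1, -1):
    U = dft2(u, sign)
    W = dft2(shifted, sign)
    for k in range(n):
        for l in range(n):
            # Phase factor with the sign opposite to the transform kernel.
            phase = cmath.exp(-sign * 2j * cmath.pi * (k * ax + l * ay) / n)
            worst = max(worst, abs(W[k][l] - phase * U[k][l]))
print(worst)
```

The printed value should be at the level of floating-point rounding, which is the discrete version of the statement in Eq. (2.110j).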

There is also a two-dimensional and three-dimensional version of the one-dimensional Fourier
scaling theorem discussed in Sec. 2.8 above [see Eq. (2.37a)]. In two dimensions when we have

\[ V^{(\pm)}(\vec{q}) = \iint_{-\infty}^{\infty} d^2\rho\; e^{\pm 2\pi i \vec{\rho} \cdot \vec{q}}\, v(\vec{\rho}) \tag{2.110$\ell$} \]

and $v(\vec{\rho})$ is replaced by $v(\alpha\vec{\rho})$, where α is a real scalar, then we can substitute $\vec{\rho}\,' = \alpha\vec{\rho}$ to get

\[ \iint_{-\infty}^{\infty} d^2\rho\; e^{\pm 2\pi i \vec{\rho} \cdot \vec{q}}\, v(\alpha\vec{\rho})
   = \frac{1}{\alpha^2} \iint_{-\infty}^{\infty} d^2\rho'\; e^{\pm 2\pi i (\vec{\rho}\,'/\alpha) \cdot \vec{q}}\, v(\vec{\rho}\,')
   = \frac{1}{\alpha^2}\, V^{(\pm)}(\vec{q}/\alpha) . \tag{2.110m} \]

Suppose there is a function of $\vec{\rho}$ called $u(\vec{\rho})$ such that $\vec{\rho}$ has to change by a vector distance $\Delta\vec{\rho}$
whose magnitude $|\Delta\vec{\rho}|$ must be at least β for there to be a significant change in the value of
$u(\vec{\rho})$. Using the same reasoning as was applied to the one-dimensional Fourier scaling theorem
[see the analysis following Eq. (2.37e)], we can show that $U^{(\pm)}(\vec{q})$, the two-dimensional forward
or inverse Fourier transform of u, must be negligible or zero for all vectors $\vec{q}$ whose magnitude
$|\vec{q}|$ exceeds 1/β. The Fourier scaling theorem in three dimensions starts with

\[ V^{(\pm)}(\vec{s}) = \iiint_{-\infty}^{\infty} d^3r\; e^{\pm 2\pi i \vec{r} \cdot \vec{s}}\, v(\vec{r}) , \tag{2.110n} \]

from which we discover, replacing $\vec{r}$ by $\vec{r}\,' = \beta\vec{r}$, that

\[ \iiint_{-\infty}^{\infty} d^3r\; e^{\pm 2\pi i \vec{r} \cdot \vec{s}}\, v(\beta\vec{r})
   = \frac{1}{|\beta|^3} \iiint_{-\infty}^{\infty} d^3r'\; e^{\pm 2\pi i (\vec{r}\,'/\beta) \cdot \vec{s}}\, v(\vec{r}\,')
   = \frac{1}{|\beta|^3}\, V^{(\pm)}(\vec{s}/\beta) . \tag{2.110o} \]

Again we can conclude that if there is a function $u(\vec{r})$ such that $|\Delta\vec{r}|$ must be at least β for there
to be a significant change in u, then $U^{(\pm)}(\vec{s})$, the three-dimensional forward or inverse Fourier
transform of u, must be negligible or zero for all vector arguments $\vec{s}$ whose magnitude $|\vec{s}|$
exceeds 1/β.
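The two-dimensional convolution of scalar functions u(x,y) and v(x,y) is written using the
symbol $\ast\ast$ and defined to be

\[ u(x, y) \ast\ast\, v(x, y) = \int_{-\infty}^{\infty} dx' \int_{-\infty}^{\infty} dy'\; u(x', y')\, v(x - x', y - y') , \tag{2.111a} \]

or

\[ u(\vec{\rho}) \ast\ast\, v(\vec{\rho}) = \iint_{-\infty}^{\infty} d^2\rho'\; u(\vec{\rho}\,')\, v(\vec{\rho} - \vec{\rho}\,') \tag{2.111b} \]

using the more concise vector notation. The vector notation may make the connection between
the one- and two-dimensional convolutions in Eqs. (2.38a) and (2.111b) easier to see. The two-
dimensional convolution, like the one-dimensional convolution, is both commutative and
associative. Using the same type of reasoning as in the analysis in Sec. 2.9, we have for the two-
dimensional functions $u(\vec{\rho})$, $v(\vec{\rho})$, and $h(\vec{\rho})$ that

\[
\begin{aligned}
u(\vec{\rho}) \ast\ast\, v(\vec{\rho})
 &= \iint_{-\infty}^{\infty} d^2\rho'\; u(\vec{\rho}\,')\, v(\vec{\rho} - \vec{\rho}\,')
  = (-1)^2 \iint_{-\infty}^{\infty} d^2\rho''\; u(\vec{\rho} - \vec{\rho}\,'')\, v(\vec{\rho}\,'') \\
 &= \iint_{-\infty}^{\infty} d^2\rho''\; v(\vec{\rho}\,'')\, u(\vec{\rho} - \vec{\rho}\,'')
  = v(\vec{\rho}) \ast\ast\, u(\vec{\rho})
\end{aligned}
\tag{2.111c}
\]

and

\[
\begin{aligned}
\left[ u(\vec{\rho}) \ast\ast\, v(\vec{\rho}) \right] \ast\ast\, h(\vec{\rho})
 &= \iint_{-\infty}^{\infty} d^2\rho''\; h(\vec{\rho} - \vec{\rho}\,'') \iint_{-\infty}^{\infty} d^2\rho'\; u(\vec{\rho}\,')\, v(\vec{\rho}\,'' - \vec{\rho}\,') \\
 &= \iint_{-\infty}^{\infty} d^2\rho'\; u(\vec{\rho}\,') \iint_{-\infty}^{\infty} d^2\rho''\; h(\vec{\rho} - \vec{\rho}\,'')\, v(\vec{\rho}\,'' - \vec{\rho}\,') \\
 &= \iint_{-\infty}^{\infty} d^2\rho'\; u(\vec{\rho}\,') \iint_{-\infty}^{\infty} d^2\rho'''\; v(\vec{\rho}\,''')\, h\!\left( (\vec{\rho} - \vec{\rho}\,') - \vec{\rho}\,''' \right) \\
 &= u(\vec{\rho}) \ast\ast \left[ v(\vec{\rho}) \ast\ast\, h(\vec{\rho}) \right] ,
\end{aligned}
\tag{2.111d}
\]

where to show that the two-dimensional convolution is commutative we make the variable
substitution $\vec{\rho}\,'' = \vec{\rho} - \vec{\rho}\,'$ in (2.111c); and to show it is associative, we make the variable
substitution $\vec{\rho}\,''' = \vec{\rho}\,'' - \vec{\rho}\,'$ in (2.111d). The two-dimensional convolution is also linear. For any
two complex constants α and β, we have

\[
\begin{aligned}
u(\vec{\rho}) \ast\ast \left[ \alpha v(\vec{\rho}) + \beta h(\vec{\rho}) \right]
 &= \iint_{-\infty}^{\infty} d^2\rho'\; u(\vec{\rho}\,') \left[ \alpha v(\vec{\rho} - \vec{\rho}\,') + \beta h(\vec{\rho} - \vec{\rho}\,') \right] \\
 &= \alpha \iint_{-\infty}^{\infty} d^2\rho'\; u(\vec{\rho}\,')\, v(\vec{\rho} - \vec{\rho}\,')
  + \beta \iint_{-\infty}^{\infty} d^2\rho'\; u(\vec{\rho}\,')\, h(\vec{\rho} - \vec{\rho}\,') \\
 &= \alpha \left[ u(\vec{\rho}) \ast\ast\, v(\vec{\rho}) \right] + \beta \left[ u(\vec{\rho}) \ast\ast\, h(\vec{\rho}) \right] ,
\end{aligned}
\tag{2.111e}
\]

and because the two-dimensional convolution is commutative it follows that

\[ \left[ \alpha v(\vec{\rho}) + \beta h(\vec{\rho}) \right] \ast\ast\, u(\vec{\rho})
   = \alpha \left[ v(\vec{\rho}) \ast\ast\, u(\vec{\rho}) \right] + \beta \left[ h(\vec{\rho}) \ast\ast\, u(\vec{\rho}) \right] . \tag{2.111f} \]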

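It is easy to show that the Fourier convolution theorem holds true in two dimensions. We start
with

\[
\begin{aligned}
\int_{-\infty}^{\infty} dx & \int_{-\infty}^{\infty} dy\; e^{\pm 2\pi i (x\xi + y\eta)} \left[ u(x, y) \ast\ast\, v(x, y) \right] \\
 &= \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\; e^{\pm 2\pi i (x\xi + y\eta)}
    \int_{-\infty}^{\infty} dx' \int_{-\infty}^{\infty} dy'\; u(x', y')\, v(x - x', y - y') \\
 &= \int_{-\infty}^{\infty} dx' \int_{-\infty}^{\infty} dy'\; u(x', y')
    \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\; e^{\pm 2\pi i (x\xi + y\eta)}\, v(x - x', y - y') .
\end{aligned}
\]

Now we replace the x, y integration variables by $x'' = x - x'$ and $y'' = y - y'$, with $dx'' = dx$ and
$dy'' = dy$, so that

\[
\begin{aligned}
\int_{-\infty}^{\infty} dx & \int_{-\infty}^{\infty} dy\; e^{\pm 2\pi i (x\xi + y\eta)} \left[ u(x, y) \ast\ast\, v(x, y) \right] \\
 &= \int_{-\infty}^{\infty} dx' \int_{-\infty}^{\infty} dy'\; u(x', y')\, e^{\pm 2\pi i (x'\xi + y'\eta)}
    \int_{-\infty}^{\infty} dx'' \int_{-\infty}^{\infty} dy''\; e^{\pm 2\pi i (x''\xi + y''\eta)}\, v(x'', y'')
\end{aligned}
\]

or

\[ \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\; e^{\pm 2\pi i (x\xi + y\eta)} \left[ u(x, y) \ast\ast\, v(x, y) \right]
   = U^{(\pm)}(\xi, \eta) \cdot V^{(\pm)}(\xi, \eta) , \tag{2.112a} \]

where $U^{(\pm)}$ is the two-dimensional forward or inverse Fourier transform of u,

\[ U^{(\pm)}(\xi, \eta) = \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\; e^{\pm 2\pi i (x\xi + y\eta)}\, u(x, y) , \tag{2.112b} \]

and $V^{(\pm)}$ is the two-dimensional forward or inverse Fourier transform of v,

\[ V^{(\pm)}(\xi, \eta) = \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\; e^{\pm 2\pi i (x\xi + y\eta)}\, v(x, y) . \tag{2.112c} \]

This gives the first half of the two-dimensional Fourier convolution theorem. To get the
second half, we reverse the transform in (2.112a). If the plus sign is used in (2.112a), take the
forward two-dimensional Fourier transform of both sides, and if the minus sign is used take the
inverse two-dimensional Fourier transform of both sides. This leads to

\[ \int_{-\infty}^{\infty} d\xi \int_{-\infty}^{\infty} d\eta\; e^{\mp 2\pi i (x\xi + y\eta)}
   \left[ U^{(\pm)}(\xi, \eta) \cdot V^{(\pm)}(\xi, \eta) \right] = u(x, y) \ast\ast\, v(x, y) , \tag{2.113a} \]

where, reversing the transforms in Eqs. (2.112b) and (2.112c),

\[ u(x, y) = \int_{-\infty}^{\infty} d\xi \int_{-\infty}^{\infty} d\eta\; e^{\mp 2\pi i (x\xi + y\eta)}\, U^{(\pm)}(\xi, \eta) \tag{2.113b} \]

and

\[ v(x, y) = \int_{-\infty}^{\infty} d\xi \int_{-\infty}^{\infty} d\eta\; e^{\mp 2\pi i (x\xi + y\eta)}\, V^{(\pm)}(\xi, \eta) . \tag{2.113c} \]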


The first half of the two-dimensional Fourier convolution theorem, Eqs. (2.112a)–(2.112c),
shows that the forward or inverse two-dimensional Fourier transform of the two-dimensional
convolution of two functions u and v is the product of the forward or inverse two-dimensional
Fourier transforms of u and v. Because no restrictions are placed on the nature of u and v, other
than that they are transformable, there are also no restrictions on the nature of their $U^{(\pm)}$ and $V^{(\pm)}$
transforms. This means we can think of $U^{(\pm)}$ and $V^{(\pm)}$ as arbitrary transformable functions. The
(±) superscripts on U and V in Eqs. (2.113a)–(2.113c) then just tell us that, according to Eqs.
(2.112b) and (2.112c),

\[ U^{(\pm)}(\xi, \eta) = \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\; e^{\pm 2\pi i (x\xi + y\eta)}\, u(x, y) \]

and

\[ V^{(\pm)}(\xi, \eta) = \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\; e^{\pm 2\pi i (x\xi + y\eta)}\, v(x, y) . \]

We already know this, however, from looking at Eqs. (2.113b) and (2.113c); just take the
opposite-sign Fourier transform of both sides. Hence, we can drop the (±) superscripts on U and
V in Eqs. (2.113a)–(2.113c) as long as (∓) superscripts are added to u and v to distinguish
between the two choices of sign in (2.113b) and (2.113c). Now Eqs. (2.113a)–(2.113c) become

\[ \int_{-\infty}^{\infty} d\xi \int_{-\infty}^{\infty} d\eta\; e^{\mp 2\pi i (x\xi + y\eta)}
   \left[ U(\xi, \eta) \cdot V(\xi, \eta) \right] = u^{(\mp)}(x, y) \ast\ast\, v^{(\mp)}(x, y) , \tag{2.114a} \]

where

\[ u^{(\mp)}(x, y) = \int_{-\infty}^{\infty} d\xi \int_{-\infty}^{\infty} d\eta\; e^{\mp 2\pi i (x\xi + y\eta)}\, U(\xi, \eta) \tag{2.114b} \]

and

\[ v^{(\mp)}(x, y) = \int_{-\infty}^{\infty} d\xi \int_{-\infty}^{\infty} d\eta\; e^{\mp 2\pi i (x\xi + y\eta)}\, V(\xi, \eta) . \tag{2.114c} \]

The letters used to label the functions and variables are, of course, arbitrary, so nothing stops us
from interchanging the letters u and U, v and V, x and ξ, y and η, and the vertical order of the ±
signs to get

\[ \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\; e^{\pm 2\pi i (x\xi + y\eta)}
   \left[ u(x, y) \cdot v(x, y) \right] = U^{(\pm)}(\xi, \eta) \ast\ast\, V^{(\pm)}(\xi, \eta) , \tag{2.115a} \]

where

\[ U^{(\pm)}(\xi, \eta) = \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\; e^{\pm 2\pi i (x\xi + y\eta)}\, u(x, y) \tag{2.115b} \]

and

\[ V^{(\pm)}(\xi, \eta) = \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\; e^{\pm 2\pi i (x\xi + y\eta)}\, v(x, y) . \tag{2.115c} \]

Equations (2.115a)–(2.115c) are the other half of the two-dimensional Fourier convolution
theorem; they show that the forward or inverse two-dimensional Fourier transform of the
product of two functions u and v is the two-dimensional convolution of the forward or inverse
two-dimensional Fourier transforms of u and v.
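
The first half of the two-dimensional convolution theorem also has a discrete counterpart that can be verified directly. The Python sketch below is an illustration under assumptions made here: the integrals are replaced by a discrete Fourier transform and a circular convolution on a small periodic n × n grid, and the names dft2 and circular_convolve2 and the test arrays are hypothetical. It checks that the transform of the convolution equals the product of the transforms.

```python
import cmath

def dft2(u):
    """Discrete analog of the forward transform, kernel exp(-2*pi*i*(x*xi + y*eta))."""
    n = len(u)
    return [[sum(u[x][y] * cmath.exp(-2j * cmath.pi * (k * x + l * y) / n)
                 for x in range(n) for y in range(n))
             for l in range(n)]
            for k in range(n)]

def circular_convolve2(u, v):
    """Discrete analog of the two-dimensional convolution on a periodic grid."""
    n = len(u)
    return [[sum(u[xp][yp] * v[(x - xp) % n][(y - yp) % n]
                 for xp in range(n) for yp in range(n))
             for y in range(n)]
            for x in range(n)]

n = 4
# Arbitrary test arrays.
u = [[1.0 + x + 2 * y for y in range(n)] for x in range(n)]
v = [[(x * y) % 3 - 0.5 for y in range(n)] for x in range(n)]

lhs = dft2(circular_convolve2(u, v))   # transform of the convolution
U, V = dft2(u), dft2(v)                # separate transforms
worst = max(abs(lhs[k][l] - U[k][l] * V[k][l])
            for k in range(n) for l in range(n))
print(worst)
```

The brute-force sums cost on the order of n⁴ operations, which is acceptable only for a tiny grid like this one; the point is the identity, not efficiency.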
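The three-dimensional convolution is written using the symbol $\ast\!\ast\!\ast$ and defined to be

\[ u(x, y, z) \ast\!\ast\!\ast\, v(x, y, z) = \int_{-\infty}^{\infty} dx' \int_{-\infty}^{\infty} dy' \int_{-\infty}^{\infty} dz'\;
   u(x', y', z')\, v(x - x', y - y', z - z') \tag{2.116a} \]

or

\[ u(\vec{r}) \ast\!\ast\!\ast\, v(\vec{r}) = \iiint_{-\infty}^{\infty} d^3r'\; u(\vec{r}\,')\, v(\vec{r} - \vec{r}\,') \tag{2.116b} \]

using three-dimensional vector notation. The three-dimensional convolution has the same
commutative, associative, and linearity properties as the two-dimensional convolution, as can be
seen by returning to Eqs. (2.111c)–(2.111f), mentally adding an extra $\ast$, an extra integral sign,
and replacing all the superscript 2’s by superscript 3’s:

\[ u(\vec{r}) \ast\!\ast\!\ast\, v(\vec{r}) = v(\vec{r}) \ast\!\ast\!\ast\, u(\vec{r}) , \tag{2.117a} \]

\[ \left[ u(\vec{r}) \ast\!\ast\!\ast\, v(\vec{r}) \right] \ast\!\ast\!\ast\, h(\vec{r})
   = u(\vec{r}) \ast\!\ast\!\ast \left[ v(\vec{r}) \ast\!\ast\!\ast\, h(\vec{r}) \right] , \tag{2.117b} \]

\[ u(\vec{r}) \ast\!\ast\!\ast \left[ \alpha v(\vec{r}) + \beta h(\vec{r}) \right]
   = \alpha \left[ u(\vec{r}) \ast\!\ast\!\ast\, v(\vec{r}) \right] + \beta \left[ u(\vec{r}) \ast\!\ast\!\ast\, h(\vec{r}) \right] , \tag{2.117c} \]

and

\[ \left[ \alpha v(\vec{r}) + \beta h(\vec{r}) \right] \ast\!\ast\!\ast\, u(\vec{r})
   = \alpha \left[ v(\vec{r}) \ast\!\ast\!\ast\, u(\vec{r}) \right] + \beta \left[ h(\vec{r}) \ast\!\ast\!\ast\, u(\vec{r}) \right] . \tag{2.117d} \]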


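Looking carefully at the variable manipulations used to derive Eqs. (2.112a)–(2.112c), the first
half of the two-dimensional Fourier convolution theorem, we see that working with an extra
product zζ in the exponent of e and an extra integration over dz does not affect the end result.
We can therefore say that

\[ \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy \int_{-\infty}^{\infty} dz\; e^{\pm 2\pi i (x\xi + y\eta + z\zeta)}
   \left[ u(x, y, z) \ast\!\ast\!\ast\, v(x, y, z) \right] = U^{(\pm)}(\xi, \eta, \zeta) \cdot V^{(\pm)}(\xi, \eta, \zeta) , \tag{2.118a} \]

where

\[ U^{(\pm)}(\xi, \eta, \zeta) = \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy \int_{-\infty}^{\infty} dz\;
   e^{\pm 2\pi i (x\xi + y\eta + z\zeta)}\, u(x, y, z) \tag{2.118b} \]

and

\[ V^{(\pm)}(\xi, \eta, \zeta) = \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy \int_{-\infty}^{\infty} dz\;
   e^{\pm 2\pi i (x\xi + y\eta + z\zeta)}\, v(x, y, z) . \tag{2.118c} \]

The argument about relabeling the functions and variables used to go from (2.112a)–(2.112c) to
(2.115a)–(2.115c) works equally well here, giving us at once the other half of the three-
dimensional Fourier convolution theorem,

\[ \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy \int_{-\infty}^{\infty} dz\; e^{\pm 2\pi i (x\xi + y\eta + z\zeta)}
   \left[ u(x, y, z) \cdot v(x, y, z) \right] = U^{(\pm)}(\xi, \eta, \zeta) \ast\!\ast\!\ast\, V^{(\pm)}(\xi, \eta, \zeta) , \tag{2.119a} \]

where

\[ U^{(\pm)}(\xi, \eta, \zeta) = \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy \int_{-\infty}^{\infty} dz\;
   e^{\pm 2\pi i (x\xi + y\eta + z\zeta)}\, u(x, y, z) \tag{2.119b} \]

and

\[ V^{(\pm)}(\xi, \eta, \zeta) = \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy \int_{-\infty}^{\infty} dz\;
   e^{\pm 2\pi i (x\xi + y\eta + z\zeta)}\, v(x, y, z) . \tag{2.119c} \]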

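One last matter of notation worth mentioning is that we can create two-dimensional and three-
dimensional delta functions from the products of the already-discussed one-dimensional delta
function:

\[ \delta(\vec{\rho}) = \delta(x) \cdot \delta(y) \tag{2.120a} \]

and

\[ \delta(\vec{r}) = \delta(x) \cdot \delta(y) \cdot \delta(z) . \tag{2.120b} \]

For any two-dimensional continuous function u(x,y), we have

\[
\begin{aligned}
\int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\; u(x, y)\, \delta(x - x_o)\, \delta(y - y_o)
 &= \int_{-\infty}^{\infty} dx\; \delta(x - x_o) \int_{-\infty}^{\infty} dy\; u(x, y)\, \delta(y - y_o) \\
 &= \int_{-\infty}^{\infty} dx\; \delta(x - x_o)\, u(x, y_o) = u(x_o, y_o) ;
\end{aligned}
\tag{2.121a}
\]

and similarly for any continuous three-dimensional function v(x, y, z), we have

\[
\begin{aligned}
\int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy & \int_{-\infty}^{\infty} dz\; v(x, y, z)\, \delta(x - x_o)\, \delta(y - y_o)\, \delta(z - z_o) \\
 &= \int_{-\infty}^{\infty} dx\; \delta(x - x_o) \int_{-\infty}^{\infty} dy\; v(x, y, z_o)\, \delta(y - y_o) \\
 &= \int_{-\infty}^{\infty} dx\; \delta(x - x_o)\, v(x, y_o, z_o) = v(x_o, y_o, z_o) .
\end{aligned}
\tag{2.121b}
\]

These equations can be written in vector notation as

\[ \iint_{-\infty}^{\infty} d^2\rho\; u(\vec{\rho})\, \delta(\vec{\rho} - \vec{\rho}_o) = u(\vec{\rho}_o) \tag{2.121c} \]

and

\[ \iiint_{-\infty}^{\infty} d^3r\; v(\vec{r})\, \delta(\vec{r} - \vec{r}_o) = v(\vec{r}_o) . \tag{2.121d} \]

Combining Eq. (2.71f) for the one-dimensional delta function with Eqs. (2.120a) and (2.120b),
we see that in two dimensions

\[ \int_{-\infty}^{\infty} d\xi\; e^{\pm 2\pi i x\xi} \int_{-\infty}^{\infty} d\eta\; e^{\pm 2\pi i y\eta}
   = \iint_{-\infty}^{\infty} d^2q\; e^{\pm 2\pi i \vec{\rho} \cdot \vec{q}} = \delta(\vec{\rho}) = \delta(x) \cdot \delta(y) \tag{2.122a} \]

using the vector notation $\vec{q} = \xi\hat{x} + \eta\hat{y}$; and in three dimensions

\[ \int_{-\infty}^{\infty} d\xi\; e^{\pm 2\pi i x\xi} \int_{-\infty}^{\infty} d\eta\; e^{\pm 2\pi i y\eta} \int_{-\infty}^{\infty} d\zeta\; e^{\pm 2\pi i z\zeta}
   = \iiint_{-\infty}^{\infty} d^3s\; e^{\pm 2\pi i \vec{r} \cdot \vec{s}} = \delta(\vec{r}) = \delta(x) \cdot \delta(y) \cdot \delta(z) \tag{2.122b} \]

using the vector notation $\vec{s} = \xi\hat{x} + \eta\hat{y} + \zeta\hat{z}$.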

__________

This chapter provides both an intuitive understanding and a rigorous explanation of how
Fourier transforms work. Sine and cosine transforms are introduced as a way to measure how
much functions resemble sine and cosine curves, and these transforms are then combined to
create the standard complex Fourier transform. We describe convolutions and how they produce
new functions by blurring old ones. The Fourier convolution theorem—whose importance is
difficult to overstate—directly connects the convolution to Fourier-transform theory. Generalized
limits are explained to show in what sense some of the more puzzling functions found in lists of
Fourier transforms belong there, and a brief outline of generalized functions is presented to show
how delta functions can be described without making them sound like obvious nonsense.
Computers use discrete Fourier transforms to handle Fourier calculations, and we explain how
the discrete Fourier transform can be used to approximate the integral Fourier transform. The
discrete Fourier transform produces aliasing; we show when aliasing is desirable, when it is not
desirable, and when it can be neglected. All the major concepts explained in this chapter—the
linearity of the Fourier transform, the linearity of the convolution, the Fourier convolution
theorem, the idea of even and odd functions, and the delta function—have important roles to play
in the pages that follow.

Table 2.1

       U(f) = F^(−ift)(u(t))                  u(t) = F^(ift)(U(f))

(1)    [real, even]                           [real, even]
       Im(U(f)) = 0, U(−f) = U(f)             Im(u(t)) = 0, u(−t) = u(t)

(2)    [imag., even]                          [imag., even]
       Re(U(f)) = 0, U(−f) = U(f)             Re(u(t)) = 0, u(−t) = u(t)

(3)    [real, odd]                            [imag., odd]
       Im(U(f)) = 0, U(−f) = −U(f)            Re(u(t)) = 0, u(−t) = −u(t)

(4)    [imag., odd]                           [real, odd]
       Re(U(f)) = 0, U(−f) = −U(f)            Im(u(t)) = 0, u(−t) = −u(t)

(5)    [complex, even]                        [complex, even]
       Re(U(f)) ≠ 0 for some f                Re(u(t)) ≠ 0 for some t
       Im(U(f)) ≠ 0 for some f                Im(u(t)) ≠ 0 for some t
       U(−f) = U(f)                           u(−t) = u(t)

(6)    [complex, odd]                         [complex, odd]
       Re(U(f)) ≠ 0 for some f                Re(u(t)) ≠ 0 for some t
       Im(U(f)) ≠ 0 for some f                Im(u(t)) ≠ 0 for some t
       U(−f) = −U(f)                          u(−t) = −u(t)

(7)    [Hermitian]                            [real]
       U(−f) = U(f)*                          Im(u(t)) = 0

(8)    [real]                                 [Hermitian]
       Im(U(f)) = 0                           u(−t) = u(t)*

(9)    [anti-Hermitian]                       [imag.]
       U(−f) = −U(f)*                         Re(u(t)) = 0

(10)   [imag.]                                [anti-Hermitian]
       Re(U(f)) = 0                           u(−t) = −u(t)*

(11)   [complex, no symmetry]                 [complex, no symmetry]

Table 2.2

       A_k = (1/T) ∫_0^T e^(−2πik(t/T)) v(t) dt         v(t) = Σ_{k=−∞}^{∞} A_k e^(2πik(t/T))

(1)    [real, even]                           [real, even]
       Im(A_k) = 0, A_−k = A_k                Im(v(t)) = 0, v(−t) = v(t)

(2)    [imag., even]                          [imag., even]
       Re(A_k) = 0, A_−k = A_k                Re(v(t)) = 0, v(−t) = v(t)

(3)    [real, odd]                            [imag., odd]
       Im(A_k) = 0, A_−k = −A_k               Re(v(t)) = 0, v(−t) = −v(t)

(4)    [imag., odd]                           [real, odd]
       Re(A_k) = 0, A_−k = −A_k               Im(v(t)) = 0, v(−t) = −v(t)

(5)    [complex, even]                        [complex, even]
       Re(A_k) ≠ 0 for some k                 Re(v(t)) ≠ 0 for some t
       Im(A_k) ≠ 0 for some k                 Im(v(t)) ≠ 0 for some t
       A_−k = A_k                             v(−t) = v(t)

(6)    [complex, odd]                         [complex, odd]
       Re(A_k) ≠ 0 for some k                 Re(v(t)) ≠ 0 for some t
       Im(A_k) ≠ 0 for some k                 Im(v(t)) ≠ 0 for some t
       A_−k = −A_k                            v(−t) = −v(t)

(7)    [Hermitian]                            [real]
       A_−k = A_k*                            Im(v(t)) = 0

(8)    [real]                                 [Hermitian]
       Im(A_k) = 0                            v(−t) = v(t)*

(9)    [anti-Hermitian]                       [imag.]
       A_−k = −A_k*                           Re(v(t)) = 0

(10)   [imag.]                                [anti-Hermitian]
       Re(A_k) = 0                            v(−t) = −v(t)*

(11)   [complex, no symmetry]                 [complex, no symmetry]

3
RANDOM VARIABLES, RANDOM
FUNCTIONS, AND POWER SPECTRA
Engineers and scientists are taught many statistical concepts in school, but all too often this is
done in an informal manner that does a good job of explaining how to eliminate random errors
and noise from real experimental data and a poor job of explaining how to analyze random errors
and noise in physical models. Understanding the correct way to represent random errors and
noise requires formal knowledge of the statistical concepts used to describe random signals;
otherwise, basic equations can be misunderstood and misused. For this reason, we here take a
more formal approach to the subject. Starting off with an explanation of the basics—random
functions, independent and dependent random variables, the expectation operator E , stationarity
and ergodicity—that do not require the Fourier theory discussed in the previous chapter, we then
move on to topics that do, such as autocorrelation functions, white noise, the noise-power
spectrum, and the Wiener-Khinchin theorem. The techniques explained in this chapter are used a
few times in the next chapter during the derivation of the Michelson interference equations and
then over and over again in Chapters 6, 7, and 8 to analyze the random errors and noise found in
Michelson systems.

3.1 Random and Nonrandom Variables


Random variables can be thought of as uncontrolled variables and nonrandom variables can be
thought of as controlled variables. When, for example, a computer program is being written, the
programmer controls the values of nonrandom program variables using inputs or lines of code,
but the programmer has no desire to control the program’s random variables—a pseudo-random
number generator gives them values instead. In a similar spirit, a statistician constructing a set of
model equations always ends up controlling the nonrandom variables—either directly by saying
this variable can be measured like this and that variable can be measured like that, or indirectly,
by saying these variables must solve that set of equations. Even when a statistician plots a
function against its argument, the graph is constructed by specifying the argument’s values and
then calculating the function according to its definition, which puts both the nonrandom argument
and the nonrandom value of the function under the statistician’s control. The statistician always,
on the other hand, treats random variables in a model as if they cannot be controlled. They must
be handled as if coins will be flipped, dice rolled, or needles spun on dials to determine their
values after the model is written down. All the statistician can know is the probability this
random variable takes on that value and the probability that random variable takes on this value;
that is, he knows what the chances are that the coins, dice, or needles return one set of numbers
rather than another. Most scientists and engineers do not pay much attention to the difference
between controlled and uncontrolled variables (perhaps because most of their “controlled”
variables are usually a little “uncontrolled” in the sense that they come from imperfectly accurate
measurements), but it is very convenient when analyzing a statistical model to keep careful track
of this distinction. To help us remember which variables are random and which are not, we put a
wavy line or tilde over the random variables while writing the nonrandom variables in the usual
way. As an example of how this looks, we note that $u$, $a_0$, and $z'$ are all nonrandom variables
whereas $\tilde{u}$, $\tilde{a}_0$, and $\tilde{z}'$ are all random.

3.2 Random and Nonrandom Functions


When the argument of a function is a random variable, the value of the function is also random.
If, for example, $\tilde{x}$ is a random variable and f is a function, then

\[ \tilde{y} = f(\tilde{x}) \tag{3.1a} \]

is another random variable. To give an example of how this works, we create a nonrandom time
variable t and a random angular frequency $\tilde{\omega}$, multiply them together and take the sine of their
product to get

\[ \tilde{y} = \sin(\tilde{\omega} t) . \tag{3.1b} \]

The value of $\tilde{y}$ is clearly uncontrolled; for each unpredictable value of $\tilde{\omega}$ at time t, there is a
corresponding unpredictable number $\tilde{y}$ that is given by $\sin(\tilde{\omega} t)$. This example also shows that
when a function has several arguments, its value becomes random when only one of the
arguments is random. In Eq. (3.1b) the sine of $\tilde{\omega} t$, regarded as a function of both $\tilde{\omega}$ and t, is
random even though only one of its arguments, $\tilde{\omega}$, is random.
Many times when a function has multiple arguments, the controlled argument or arguments
are more interesting than the uncontrolled argument or arguments that make the function random.
One way to handle this situation is to list only the nonrandom arguments and say that what we
have is a random function with nonrandom arguments. To show what is going on, we put a wavy
line over the function name, indicating that even though all the listed arguments are nonrandom,
the function itself is random. If, for example, we are only interested in the nonrandom time t, we
could define

\[ \tilde{R}(t) = \sin(\tilde{\omega} t) \tag{3.2a} \]

to be a random function of the nonrandom variable t. Now whenever there is a list of time values
t1, t2, …, there is a corresponding list of random variables

\[
\begin{aligned}
\tilde{u}_1 &= \tilde{R}(t_1) = \sin(\tilde{\omega} t_1) , \\
\tilde{u}_2 &= \tilde{R}(t_2) = \sin(\tilde{\omega} t_2) , \\
 &\;\;\vdots
\end{aligned}
\tag{3.2b}
\]

Although Eq. (3.2b) implicitly assumes a list of distinct and separate t values, this reasoning still
holds up when t is explicitly made a continuous variable. Nothing, for example, stops us from
saying that for each value of t between −∞ and +∞, there corresponds a different random variable

\[ \tilde{u}_t = \tilde{R}(t) = \sin(\tilde{\omega} t) . \tag{3.2c} \]
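
The idea of a random function of a nonrandom argument can be made concrete with a short Python sketch. Everything specific in it is an assumption made for illustration: the text does not say how the random angular frequency is distributed, so a uniform distribution between 0.5 and 2.0 is used here, and the names make_random_function, realizations, and table are hypothetical. Each call draws one value of the random frequency and returns the corresponding ordinary function of t.

```python
import math
import random

def make_random_function(rng):
    """Draw one realization of the random function R(t) = sin(omega * t)."""
    omega = rng.uniform(0.5, 2.0)  # hypothetical distribution for the random frequency

    def R(t):
        return math.sin(omega * t)

    return R

rng = random.Random(0)             # seeded so the run is repeatable
realizations = [make_random_function(rng) for _ in range(3)]

# The time values are controlled (nonrandom); the function values are not.
times = [0.0, 0.5, 1.0]
table = [[R(t) for t in times] for R in realizations]
for row in table:
    print(row)
```

Each printed row is one realization evaluated at the same controlled times, so reading down a column gives three draws of the random variable associated with that time.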

The idea of a random function of nonrandom arguments becomes more attractive when there is
no realistic possibility of analyzing the effect of multiple random arguments on a single
nonrandom function. We might, for example, know exactly how N random parameters r1 , r2 , …,
rN interact to cause an error e in an electrical signal s at time t. This lets us write the error as a
nonrandom function
e(t , r1 , r2 ,… , rN ) .

Rather than investigating how r1 , r2 , …, rN are behaving, it usually makes more sense to say that
there is a random noise
n (t ) = e(t , r1 , r2 ," , rN ) (3.3a)

contaminating electrical signal s. Now we can put the error into our model as a random function ñ
that depends on a nonrandom parameter t instead of as a nonrandom function e that depends on t
and N random parameters r1 , r2 , …, rN . Sometimes the signal s in our model depends on more
than one nonrandom parameter, such as the x, y coordinates of an image point at time t. If the
corresponding error e in the signal s depends on x, y, and t as well as the random parameters r1 ,
r2 , …, rN , then we can say there is a random noise

$$ \tilde{n}(x, y, t) = e(x, y, t, \tilde{r}_1, \tilde{r}_2, \ldots, \tilde{r}_N) \tag{3.3b} $$

contaminating signal s(x, y, t). Note that we can think in terms of a signal noise ñ(t) or ñ(x,y,t)
even when we are not sure what random arguments r1 , r2 , …, rN make the nonrandom function e
behave randomly. This is, of course, why the idea of a random function is so useful. In this book,
we use the term “random function” to refer to what statisticians often prefer to call a random or
stochastic process.
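
As a rough illustration of a random function of a nonrandom argument, the Python sketch below draws one value of the random argument and returns the corresponding realization of sin(ωt), in the spirit of Eq. (3.2a). The uniform range chosen for ω, the seed, and the names `make_realization`, `omega`, and `times` are assumptions made only for this example; the text does not specify how ω is distributed.

```python
import math
import random

def make_realization(rng):
    """Draw one value of the random argument omega and return it
    together with the realization R(t) = sin(omega * t)."""
    omega = rng.uniform(0.5, 2.0)  # assumed distribution, for illustration only

    def realization(t):
        return math.sin(omega * t)

    return omega, realization

rng = random.Random(0)           # fixed seed so the run is repeatable
times = [0.0, 0.5, 1.0, 1.5]     # nonrandom arguments t1, t2, ...

for _ in range(3):
    omega, R = make_realization(rng)
    # Each pass gives a different list of values u_k = R(t_k) for the same t_k.
    print(round(omega, 3), [round(R(t), 3) for t in times])
```

Each pass through the loop corresponds to one outcome of the random argument; the list of time values is the same every time, but the function values differ.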

3 · Random Variables, Random Functions, and Power Spectra

3.3 Probability Density Distributions: Mean, Variance, Standard Deviation

With every random variable $\tilde{r}$, we associate a nonrandom probability density distribution $p_{\tilde{r}}(x)$
such that $p_{\tilde{r}}(x)\,dx$ is the probability that the random variable $\tilde{r}$ takes on a value between x and
x + dx. The nonrandom argument x of $p_{\tilde{r}}$ is a dummy variable, and nothing stops us from calling
it r instead; in fact, that is the convention. The usual way to introduce a probability density
distribution for a random variable $\tilde{r}$ is to say that $p_{\tilde{r}}(r)\,dr$ is the probability that $\tilde{r}$ takes on a
value between r and r + dr. The dummy argument of a probability density distribution p must be
nonrandom, and the subscript of the probability density distribution p must be random; the
subscript, after all, labels p to show which random variable is being described. Since $\tilde{r}$ must
always take on some sort of value between −∞ and +∞, the sum of all the probabilities $p_{\tilde{r}}(r)\,dr$
between −∞ and +∞ must always be one. Consequently, for any probability density distribution
$p_{\tilde{r}}(r)$, we have

$$ \int_{-\infty}^{\infty} p_{\tilde{r}}(r)\, dr = 1 . \tag{3.4} $$

For Eq. (3.4) to make sense, the probability density distribution $p_{\tilde{r}}(r)$ must be defined for all r
between −∞ and +∞ with the understanding that

$$ p_{\tilde{r}}(r) = 0 $$

for those values of r to which the random variable $\tilde{r}$ can never be equal.
The predicted average or mean value of $\tilde{r}$ can be written as

$$ \mu_{\tilde{r}} = \int_{-\infty}^{\infty} p_{\tilde{r}}(r)\, r\, dr . \tag{3.5a} $$

Note that $\mu_{\tilde{r}}$, just like $p_{\tilde{r}}$, is nonrandom even though it has a random subscript. The predicted
variance of $\tilde{r}$, which is defined to be the predicted average or mean squared difference between
$\tilde{r}$ and $\mu_{\tilde{r}}$, is another nonrandom quantity

$$ v_{\tilde{r}} = \int_{-\infty}^{\infty} p_{\tilde{r}}(r)\, (r - \mu_{\tilde{r}})^2\, dr . \tag{3.5b} $$

Many people prefer to characterize a random number $\tilde{r}$ by its standard deviation $\sigma_{\tilde{r}}$ instead of its
variance $v_{\tilde{r}}$. The standard deviation of a random number $\tilde{r}$ is defined to be the square root of the
variance,

$$ \sigma_{\tilde{r}} = \sqrt{v_{\tilde{r}}} . \tag{3.5c} $$

Of course $\sigma_{\tilde{r}}$, like $v_{\tilde{r}}$, is a nonrandom quantity. In general, the probability density distribution $p_{\tilde{r}}$
lets us find the predicted average or mean value of any nonrandom function f of the random
variable $\tilde{r}$ by calculating the nonrandom quantity

$$ \text{predicted mean value of } f = \int_{-\infty}^{\infty} p_{\tilde{r}}(r)\, f(r)\, dr . \tag{3.5d} $$

When f (r ) = r , this equation reduces to formula (3.5a) for µr ; and when f (r ) = (r − µr ) 2 , this
equation reduces to formula (3.5b) for vr .
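
The integrals in Eqs. (3.4) and (3.5a) through (3.5d) can be checked numerically for a simple density. The sketch below is only an illustration: it assumes a uniform density on the interval from 0 to 1 and uses a midpoint-rule sum, and the names `expect` and `p_uniform` are hypothetical helpers, not part of the text.

```python
import math

def expect(p, f, lo, hi, n=100_000):
    """Midpoint-rule estimate of the integral of p(r) * f(r) over [lo, hi],
    i.e. the predicted mean value of f in the sense of Eq. (3.5d)."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        r = lo + (i + 0.5) * h
        total += p(r) * f(r)
    return total * h

def p_uniform(r):
    # Assumed example density: equal to 1 on [0, 1] and 0 elsewhere.
    return 1.0 if 0.0 <= r <= 1.0 else 0.0

total = expect(p_uniform, lambda r: 1.0, 0.0, 1.0)             # Eq. (3.4)
mean = expect(p_uniform, lambda r: r, 0.0, 1.0)                # f(r) = r
var = expect(p_uniform, lambda r: (r - mean) ** 2, 0.0, 1.0)   # f(r) = (r - mean)^2
std = math.sqrt(var)                                           # Eq. (3.5c)
print(total, mean, var, std)
```

For this density the analytic values are a mean of 1/2 and a variance of 1/12, which the numerical sums should reproduce closely.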
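Many random variables found in nature appear to obey a Gaussian, or “normal,” probability
distribution:

$$ p_{\tilde{r}}(r) = \frac{1}{\sigma_{\tilde{r}}\sqrt{2\pi}}\, e^{-\frac{(r-\mu_{\tilde{r}})^2}{2\sigma_{\tilde{r}}^2}} . \tag{3.6a} $$

This can in part be explained as a consequence of the central limit theorem,25 which is described
in Sec. 3.11 below. It is easy to show that parameter $\mu_{\tilde{r}}$ in Eq. (3.6a) is the mean of the Gaussian
distribution. Consulting formula (3.5a) above, we see that the mean of the distribution in (3.6a)
must be

$$ \int_{-\infty}^{\infty} \frac{r}{\sigma_{\tilde{r}}\sqrt{2\pi}}\, e^{-\frac{(r-\mu_{\tilde{r}})^2}{2\sigma_{\tilde{r}}^2}}\, dr = \frac{1}{\sigma_{\tilde{r}}\sqrt{2\pi}} \int_{-\infty}^{\infty} (r' + \mu_{\tilde{r}})\, e^{-\frac{(r')^2}{2\sigma_{\tilde{r}}^2}}\, dr' , \tag{3.6b} $$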

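where on the right-hand side the variable of integration is changed to $r' = r - \mu_{\tilde{r}}$. This becomes,
consulting Eq. (7A.3d) in Appendix 7A of Chapter 7,

$$ \begin{aligned}
\frac{1}{\sigma_{\tilde{r}}\sqrt{2\pi}} \int_{-\infty}^{\infty} (r' + \mu_{\tilde{r}})\, e^{-\frac{(r')^2}{2\sigma_{\tilde{r}}^2}}\, dr'
&= \frac{1}{\sigma_{\tilde{r}}\sqrt{2\pi}} \int_{-\infty}^{\infty} r'\, e^{-\frac{(r')^2}{2\sigma_{\tilde{r}}^2}}\, dr'
 + \frac{\mu_{\tilde{r}}}{\sigma_{\tilde{r}}\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{(r')^2}{2\sigma_{\tilde{r}}^2}}\, dr' \\
&= \frac{1}{\sigma_{\tilde{r}}\sqrt{2\pi}} \int_{-\infty}^{\infty} r'\, e^{-\frac{(r')^2}{2\sigma_{\tilde{r}}^2}}\, dr' + \mu_{\tilde{r}} \cdot 1 .
\end{aligned} \tag{3.6c} $$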

25
Athanasios Papoulis, Probability, Random Variables, and Stochastic Processes, 3rd ed. (McGraw-Hill, Inc., New
York, 1991), p. 214.


If we replace r′ by −r′ in

$$ g(r') = r'\, e^{-\frac{(r')^2}{2\sigma_{\tilde{r}}^2}} , $$

it is the same as multiplying g by −1, which makes g an odd function [see Eq. (2.11b) in Chapter
2]. Hence, according to Eq. (2.17) in Chapter 2,

$$ \int_{-\infty}^{\infty} r'\, e^{-\frac{(r')^2}{2\sigma_{\tilde{r}}^2}}\, dr' = 0 $$

because it is the integral of an odd function between −∞ and +∞. Therefore, Eq. (3.6c) simplifies
to

$$ \frac{1}{\sigma_{\tilde{r}}\sqrt{2\pi}} \int_{-\infty}^{\infty} (r' + \mu_{\tilde{r}})\, e^{-\frac{(r')^2}{2\sigma_{\tilde{r}}^2}}\, dr' = \mu_{\tilde{r}} , \tag{3.6d} $$

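which can be substituted back into (3.6b) to get

$$ \int_{-\infty}^{\infty} \frac{r}{\sigma_{\tilde{r}}\sqrt{2\pi}}\, e^{-\frac{(r-\mu_{\tilde{r}})^2}{2\sigma_{\tilde{r}}^2}}\, dr = \mu_{\tilde{r}} . \tag{3.6e} $$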

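This shows that, as claimed above, parameter $\mu_{\tilde{r}}$ is the mean of the probability distribution
specified in Eq. (3.6a). It is just as easy to show that $\sigma_{\tilde{r}}$ is the standard deviation of the
distribution in (3.6a). From (3.5b) we know that the variance of this distribution is

$$ \int_{-\infty}^{\infty} \frac{(r-\mu_{\tilde{r}})^2}{\sigma_{\tilde{r}}\sqrt{2\pi}}\, e^{-\frac{(r-\mu_{\tilde{r}})^2}{2\sigma_{\tilde{r}}^2}}\, dr = \int_{-\infty}^{\infty} \frac{(r')^2}{\sigma_{\tilde{r}}\sqrt{2\pi}}\, e^{-\frac{(r')^2}{2\sigma_{\tilde{r}}^2}}\, dr' $$

when the variable of integration is changed to $r' = r - \mu_{\tilde{r}}$. According to Eq. (7A.3b) in Appendix
7A of Chapter 7, we can write

$$ \int_{-\infty}^{\infty} \frac{(r')^2}{\sigma_{\tilde{r}}\sqrt{2\pi}}\, e^{-\frac{(r')^2}{2\sigma_{\tilde{r}}^2}}\, dr' = \sigma_{\tilde{r}}^2 . \tag{3.6f} $$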

Consequently, $\sigma_{\tilde{r}}^2$ is the variance of this probability density distribution. The square root of the
variance is the standard deviation according to (3.5c). Hence, it is, as claimed, easy to see that $\sigma_{\tilde{r}}$
is the standard deviation of the probability density distribution in Eq. (3.6a).
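
The three Gaussian results (unit area, mean equal to the parameter μ, variance equal to σ squared) can also be confirmed numerically. The following sketch is an illustration under stated assumptions: the parameter values 1.5 and 0.7 are arbitrary, the infinite range is truncated at ten standard deviations on each side, and `gaussian` and `integrate` are hypothetical helper names.

```python
import math

def gaussian(r, mu, sigma):
    """The density of Eq. (3.6a)."""
    return math.exp(-((r - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def integrate(g, lo, hi, n=20_000):
    """Midpoint-rule estimate of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * h) for i in range(n)) * h

mu, sigma = 1.5, 0.7                                  # arbitrary example parameters
lo, hi = mu - 10.0 * sigma, mu + 10.0 * sigma         # truncated stand-in for (-inf, +inf)

norm = integrate(lambda r: gaussian(r, mu, sigma), lo, hi)                   # Eq. (3.4)
mean = integrate(lambda r: r * gaussian(r, mu, sigma), lo, hi)               # Eq. (3.6e)
var = integrate(lambda r: (r - mean) ** 2 * gaussian(r, mu, sigma), lo, hi)  # Eq. (3.6f)
print(norm, mean, var, math.sqrt(var))
```

The printed values should be close to 1, 1.5, 0.49, and 0.7 respectively.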


When $\tilde{r}$ can only take on the values $r_1, r_2, \ldots, r_N$, then $p_{\tilde{r}}$ can be written as a sum of delta
functions. If, for example, $p_1$ is the probability that $\tilde{r}$ is $r_1$, $p_2$ is the probability that $\tilde{r}$ is $r_2$, …,
$p_N$ is the probability that $\tilde{r}$ is $r_N$, then

$$ p_{\tilde{r}}(r) = \sum_{k=1}^{N} p_k \cdot \delta(r - r_k) . \tag{3.7a} $$

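The integral for the predicted mean value of $\tilde{r}$ in Eq. (3.5a) now reduces to

$$ \mu_{\tilde{r}} = \int_{-\infty}^{\infty} \Big[ \sum_{k=1}^{N} p_k \cdot \delta(r - r_k) \Big]\, r\, dr = \sum_{k=1}^{N} p_k \int_{-\infty}^{\infty} \delta(r - r_k)\, r\, dr = \sum_{k=1}^{N} p_k r_k \tag{3.7b} $$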

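as we expect. Similarly, according to Eq. (3.5b), the predicted variance of $\tilde{r}$ becomes

$$ \begin{aligned}
v_{\tilde{r}} &= \int_{-\infty}^{\infty} \Big[ \sum_{k=1}^{N} p_k \cdot \delta(r - r_k) \Big] (r - \mu_{\tilde{r}})^2\, dr = \sum_{k=1}^{N} p_k \int_{-\infty}^{\infty} \delta(r - r_k)\, (r - \mu_{\tilde{r}})^2\, dr \\
&= \sum_{k=1}^{N} p_k (r_k - \mu_{\tilde{r}})^2 ;
\end{aligned} \tag{3.7c} $$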

and, according to Eq. (3.5d), the predicted mean value of $f(\tilde{r})$ becomes

$$ \int_{-\infty}^{\infty} \Big[ \sum_{k=1}^{N} p_k \cdot \delta(r - r_k) \Big] f(r)\, dr = \sum_{k=1}^{N} p_k \int_{-\infty}^{\infty} f(r)\, \delta(r - r_k)\, dr = \sum_{k=1}^{N} p_k f(r_k) . \tag{3.7d} $$

Again, the integral formulas reduce to the correct probability-weighted sums. Looking at the
limiting case where N = 1 and $p_1 = 1$, we get

$$ p_{\tilde{r}}(r) = \delta(r - r_1) $$

so that

$$ \mu_{\tilde{r}} = \int_{-\infty}^{\infty} \delta(r - r_1)\, r\, dr = r_1 \tag{3.7e} $$

and the variance about $\mu_{\tilde{r}} = r_1$ is

$$ v_{\tilde{r}} = \int_{-\infty}^{\infty} (r - r_1)^2\, \delta(r - r_1)\, dr = (r_1 - r_1)^2 = 0 . \tag{3.7f} $$

Results (3.7e) and (3.7f) show that the value of r is now completely controlled; it must be equal
to r1 and no longer needs to be treated like a random variable. Hence, the limiting case where
N = 1 and p1 = 1 can be regarded as changing a random variable into a nonrandom variable.
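
The probability-weighted sums in Eqs. (3.7b) and (3.7c), and the limiting case of Eqs. (3.7e) and (3.7f), are easy to evaluate exactly. The sketch below assumes a fair six-sided die as the discrete random variable; that choice, and the helper name `mean_and_variance`, are illustrative assumptions only.

```python
from fractions import Fraction

def mean_and_variance(values, probs):
    """Probability-weighted sums of Eqs. (3.7b) and (3.7c)."""
    mu = sum(p * r for p, r in zip(probs, values))
    var = sum(p * (r - mu) ** 2 for p, r in zip(probs, values))
    return mu, var

# Assumed example: a fair die, r_k = 1..6 with p_k = 1/6.
die_values = [1, 2, 3, 4, 5, 6]
die_probs = [Fraction(1, 6)] * 6
mu, var = mean_and_variance(die_values, die_probs)
print(mu, var)

# Limiting case N = 1, p_1 = 1: the "random" variable is fully controlled.
mu_single, var_single = mean_and_variance([4], [Fraction(1)])
print(mu_single, var_single)
```

Exact rational arithmetic is used so that the zero variance in the single-value case is exactly zero rather than a small rounding residue.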

3.4 The Expectation Operator


Statisticians avoid the mathematical awkwardness of probability density distributions and their
associated integrals by defining an expectation operator E. For any nonrandom function f with a
random argument $\tilde{x}$, we say that

$$ \mathrm{E}\big(f(\tilde{x})\big) $$

is the predicted mean, or average, value of $f(\tilde{x})$. We also call $\mathrm{E}(f(\tilde{x}))$ the expectation value of
$f(\tilde{x})$. Mathematically we define

$$ \mathrm{E}\big(f(\tilde{x})\big) = \int_{-\infty}^{\infty} p_{\tilde{x}}(x)\, f(x)\, dx . \tag{3.8a} $$

Just like before, $p_{\tilde{x}}(x)\,dx$ is the probability that the random variable $\tilde{x}$ takes on a value between
x and x + dx. We can find $\mathrm{E}(\tilde{x})$, the expectation value of $\tilde{x}$, by choosing $f(\tilde{x}) = \tilde{x}$ in Eq. (3.8a)
to get

$$ \mathrm{E}(\tilde{x}) = \int_{-\infty}^{\infty} p_{\tilde{x}}(x)\, x\, dx . \tag{3.8b} $$

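Comparing this to Eq. (3.5a) above, we see that the expectation value of $\tilde{x}$ is the same as the
predicted mean or average value of $\tilde{x}$,

$$ \mathrm{E}(\tilde{x}) = \mu_{\tilde{x}} , \tag{3.8c} $$

which makes good intuitive sense. Choosing $f(\tilde{x}) = (\tilde{x} - \mu_{\tilde{x}})^2$ gives

$$ \mathrm{E}\big((\tilde{x} - \mu_{\tilde{x}})^2\big) = \int_{-\infty}^{\infty} p_{\tilde{x}}(x)\, (x - \mu_{\tilde{x}})^2\, dx . \tag{3.8d} $$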


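Comparing this to Eq. (3.5b) above, we see that $\mathrm{E}((\tilde{x} - \mu_{\tilde{x}})^2)$ is the variance of $\tilde{x}$,

$$ v_{\tilde{x}} = \mathrm{E}\big((\tilde{x} - \mu_{\tilde{x}})^2\big) . \tag{3.8e} $$

A notation often used for the variance of $\tilde{x}$ instead of $v_{\tilde{x}}$ is

$$ \mathrm{Var}(\tilde{x}) = \mathrm{E}\big((\tilde{x} - \mu_{\tilde{x}})^2\big) . \tag{3.8f} $$

When the E operator is applied to any sort of random variable or function (for example,
$f(\tilde{x})$), the result is always a nonrandom variable or function, namely

$$ \int_{-\infty}^{\infty} p_{\tilde{x}}(x)\, f(x)\, dx . $$

For example, the characteristic function $\Phi_{\tilde{x}}$ of a random variable $\tilde{x}$, which is the nonrandom
Fourier transform of the probability density distribution of $\tilde{x}$,

$$ \Phi_{\tilde{x}}(\nu) = \int_{-\infty}^{\infty} p_{\tilde{x}}(x)\, e^{-2\pi i \nu x}\, dx , \tag{3.9a} $$

can be written as, using the E operator,

$$ \Phi_{\tilde{x}}(\nu) = \mathrm{E}\big(e^{-2\pi i \nu \tilde{x}}\big) . \tag{3.9b} $$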

To specify what happens when E is applied to a nonrandom variable c, we set up a random
variable $\tilde{\rho}$ that has the probability density distribution

$$ p_{\tilde{\rho}}(\rho) = \delta(\rho - c) . \tag{3.9c} $$

According to the discussion following Eqs. (3.7e,f) above, this makes $\tilde{\rho}$ equivalent to the
nonrandom variable c. Consequently, we can say that

$$ \mathrm{E}(c) = \mathrm{E}(\tilde{\rho}) \tag{3.9d} $$

and use Eq. (3.8b) above to get

$$ \mathrm{E}(c) = \int_{-\infty}^{\infty} p_{\tilde{\rho}}(\rho)\, \rho\, d\rho = \int_{-\infty}^{\infty} \delta(\rho - c)\, \rho\, d\rho = c . \tag{3.9e} $$

This justifies the general rule, which also makes good intuitive sense, that

$$ \mathrm{E}(c) = c \tag{3.9f} $$

for any nonrandom quantity c.
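
One way to see the expectation operator at work is to code it as a function that takes another function as its argument. The sketch below does this for a discrete random variable, where the integral of Eq. (3.8a) reduces to the weighted sum of Eq. (3.7d). The two-valued variable (values −1 and +1, each with probability 1/2) and the names `expectation` and `characteristic` are assumptions made for the example.

```python
import cmath
import math

def expectation(f, values, probs):
    """Discrete form of Eq. (3.8a): sum over k of p_k * f(x_k)."""
    return sum(p * f(x) for x, p in zip(values, probs))

# Assumed example: x takes the values -1 and +1 with equal probability.
values = [-1.0, 1.0]
probs = [0.5, 0.5]

mu = expectation(lambda x: x, values, probs)                # Eq. (3.8c)
var = expectation(lambda x: (x - mu) ** 2, values, probs)   # Eq. (3.8e)
const = expectation(lambda x: 5.0, values, probs)           # Eq. (3.9f): E(c) = c

def characteristic(nu):
    """Eq. (3.9b): the expectation of exp(-2*pi*i*nu*x)."""
    return expectation(lambda x: cmath.exp(-2j * math.pi * nu * x), values, probs)

print(mu, var, const, characteristic(0.1))
```

For this symmetric two-valued variable the characteristic function works out to cos(2πν), so `characteristic(0.1)` should have a negligible imaginary part.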
The expectation operator E can be applied to multiple random variables at the same time; all
that we need is the appropriate probability density distribution. Suppose, for example, that the
behavior of two random variables $\tilde{x}$ and $\tilde{X}$ is described by a two-argument probability density
distribution $p_{\tilde{x}\tilde{X}}(x, X)$, with $p_{\tilde{x}\tilde{X}}(x, X)\,dx\,dX$ being the probability that the random variable $\tilde{x}$
takes on a value between x and x + dx while the random variable $\tilde{X}$ takes on a value between X
and X + dX. No matter what the behavior of random variables $\tilde{x}$ and $\tilde{X}$, we can always
construct an appropriate probability density distribution $p_{\tilde{x}\tilde{X}}$. Since $\tilde{x}$ and $\tilde{X}$ must always take
on some values in the intervals

$$ -\infty < \tilde{x} < \infty \quad \text{and} \quad -\infty < \tilde{X} < \infty , $$

the same reasoning used to produce Eq. (3.4) now shows that

$$ \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dX\, p_{\tilde{x}\tilde{X}}(x, X) = 1 \tag{3.10a} $$

for any probability density distribution $p_{\tilde{x}\tilde{X}}$. The expectation value of any function of the random
variables $\tilde{x}$ and $\tilde{X}$, such as $f(\tilde{x}, \tilde{X})$, is defined to be

$$ \mathrm{E}\big(f(\tilde{x}, \tilde{X})\big) = \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dX\, p_{\tilde{x}\tilde{X}}(x, X)\, f(x, X) . \tag{3.10b} $$

In particular, we can always set $f(\tilde{x}, \tilde{X}) = \tilde{x}\tilde{X}$ to get the expected value of the random variables’
product,

$$ \mathrm{E}(\tilde{x}\tilde{X}) = \int_{-\infty}^{\infty} x\, dx \int_{-\infty}^{\infty} dX\, X\, p_{\tilde{x}\tilde{X}}(x, X) . \tag{3.10c} $$


3.5 Independent and Dependent Random Variables


When comparing two random variables such as x and X , one of the first questions that arises is
whether they are dependent or independent. When two random variables are dependent, the
random variables influence each other; and when two random variables are independent, they do
not.
Independent random variables are used to describe random quantities for which no cause-and-
effect relationship can be found. When, for example, we pick a car randomly from all the cars
sold in a given year, there is no reason to expect that the random variable representing the
brightness of the car’s headlights is associated with any particular value of the random variable
representing the car’s length. Lacking any evidence to the contrary, then, we say that these two
random variables ought to be independent. Similarly, if we pick someone at random from a
collection of adults, there is no obvious reason to assume that the random variable representing
the person’s yearly income is associated with any particular value of the person’s shoe size.
Again, we might assume that these are independent random variables. In general, when there is
no reason to connect the values of random quantities, we set them up in our models as
independent random variables.
Many times random variables turn out to be dependent in surprising ways. Returning to the
first of the previous examples, when we examine the connection between a car’s length and the
brightness of its headlights, it might turn out that very short cars are more likely to be European
sports cars frequently washed by their owners, making them more likely to have cleaner and thus
brighter headlights. Similarly, returning to the second example, a person’s shoe size and height
are connected; and statisticians have in fact shown that tall people, who are more likely to wear
large shoes, are also more likely to earn large incomes (if only because people living in the
United States, Australia, Canada, and Europe are more likely to be tall). Just as in these two
examples, many random variables that look like they ought to be unconnected and independent
turn out, after closer examination, to be dependent; in this sense, the independence of random
variables is the ideal case from which realistic random variables tend to deviate to a greater or
lesser degree.

3.6 Analyzing Independent Random Variables


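When $\tilde{x}$ and $\tilde{X}$ are independent random variables, their probability density distribution can be
written as26

$$ p_{\tilde{x}\tilde{X}}(x, X) = p_{\tilde{x}}(x) \cdot p_{\tilde{X}}(X) , \tag{3.11a} $$

where $p_{\tilde{x}}$ and $p_{\tilde{X}}$ are the standard probability density distributions for $\tilde{x}$ and $\tilde{X}$ when $\tilde{x}$ and $\tilde{X}$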

26
Athanasios Papoulis, Probability, Random Variables, and Stochastic Processes, p. 132.


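are treated as solitary random variables. This means that $p_{\tilde{x}}(x)\,dx$ is the probability that $\tilde{x}$ lies
between x and x + dx regardless of the value of $\tilde{X}$, and $p_{\tilde{X}}(X)\,dX$ is the probability that $\tilde{X}$ lies
between X and X + dX regardless of the value of $\tilde{x}$. We see that, according to Eqs. (3.10c) and
(3.11a), the expectation value of the product $\tilde{x}\tilde{X}$ of two independent random variables is

$$ \begin{aligned}
\mathrm{E}(\tilde{x}\tilde{X}) &= \int_{-\infty}^{\infty} x\, dx \int_{-\infty}^{\infty} dX\, X\, p_{\tilde{x}\tilde{X}}(x, X) = \int_{-\infty}^{\infty} x\, dx \int_{-\infty}^{\infty} dX\, X\, p_{\tilde{x}}(x)\, p_{\tilde{X}}(X) \\
&= \Big[ \int_{-\infty}^{\infty} p_{\tilde{x}}(x)\, x\, dx \Big] \cdot \Big[ \int_{-\infty}^{\infty} p_{\tilde{X}}(X)\, X\, dX \Big] .
\end{aligned} $$

According to Eqs. (3.8b) and (3.8c), this can be written as

$$ \mathrm{E}(\tilde{x}\tilde{X}) = \mathrm{E}(\tilde{x}) \cdot \mathrm{E}(\tilde{X}) \tag{3.11b} $$

or

$$ \mathrm{E}(\tilde{x}\tilde{X}) = \mu_{\tilde{x}}\, \mu_{\tilde{X}} . \tag{3.11c} $$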
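
The factorization behind Eqs. (3.11b) and (3.11c) can be checked exactly in the discrete case, where the double integral becomes a double sum. In the sketch below the two probability tables `x_pmf` and `X_pmf` are made-up examples, and the joint table is built as their product, which is the discrete analog of Eq. (3.11a).

```python
from fractions import Fraction
from itertools import product

# Assumed example distributions for two independent discrete variables.
x_pmf = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}
X_pmf = {-1: Fraction(1, 3), 3: Fraction(2, 3)}

# Discrete analog of Eq. (3.11a): the joint probability is the product.
joint = {(x, X): x_pmf[x] * X_pmf[X] for x, X in product(x_pmf, X_pmf)}

mu_x = sum(x * p for x, p in x_pmf.items())
mu_X = sum(X * p for X, p in X_pmf.items())

# Discrete analog of Eq. (3.10c): sum of x * X weighted by the joint probability.
E_xX = sum(x * X * p for (x, X), p in joint.items())

print(mu_x, mu_X, E_xX, mu_x * mu_X)
```

Because the joint table was constructed as a product, the last two printed numbers agree exactly; a joint table that does not factor this way would generally give different values.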

3.7 Large Numbers of Random Variables


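Our analysis of two random variables can be extended in a straightforward way to large
collections of random variables. If there are N random variables $\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_N$, then we can
always construct a probability density distribution

$$ p_{\tilde{x}_1 \tilde{x}_2 \cdots \tilde{x}_N}(x_1, x_2, \ldots, x_N) $$

such that

$$ p_{\tilde{x}_1 \tilde{x}_2 \cdots \tilde{x}_N}(x_1, x_2, \ldots, x_N)\, dx_1\, dx_2 \cdots dx_N $$

is the probability that $\tilde{x}_1$ lies between $x_1$ and $x_1 + dx_1$, that $\tilde{x}_2$ lies between $x_2$ and $x_2 + dx_2$, ...,
that $\tilde{x}_N$ lies between $x_N$ and $x_N + dx_N$. The expectation value of any function $f(\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_N)$ of
these N random variables is

$$ \mathrm{E}\big(f(\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_N)\big) = \int_{-\infty}^{\infty} dx_1 \int_{-\infty}^{\infty} dx_2 \cdots \int_{-\infty}^{\infty} dx_N\, f(x_1, x_2, \ldots, x_N)\, p_{\tilde{x}_1 \tilde{x}_2 \cdots \tilde{x}_N}(x_1, x_2, \ldots, x_N) . \tag{3.12a} $$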


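Note that nothing has been said so far about the connections between these N random variables;
they could be either dependent or independent. If we now assume that these N random variables
are all independent with respect to one another, then

$$ p_{\tilde{x}_1 \tilde{x}_2 \cdots \tilde{x}_N}(x_1, x_2, \ldots, x_N) = p_{\tilde{x}_1}(x_1)\, p_{\tilde{x}_2}(x_2) \cdots p_{\tilde{x}_N}(x_N) , \tag{3.12b} $$

where $p_{\tilde{x}_1}(x_1)\,dx_1$ is the probability that $\tilde{x}_1$ lies between $x_1$ and $x_1 + dx_1$ regardless of the values
of the other N − 1 random variables, $p_{\tilde{x}_2}(x_2)\,dx_2$ is the probability that $\tilde{x}_2$ lies between $x_2$ and
$x_2 + dx_2$ regardless of the values of the other N − 1 random variables, …, $p_{\tilde{x}_N}(x_N)\,dx_N$ is the
probability that $\tilde{x}_N$ lies between $x_N$ and $x_N + dx_N$ regardless of the values of the other N − 1
random variables. The expectation value of the product of these N random variables can now be
written as, setting $f(\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_N) = \tilde{x}_1 \tilde{x}_2 \cdots \tilde{x}_N$ in Eq. (3.12a),

$$ \begin{aligned}
\mathrm{E}(\tilde{x}_1 \tilde{x}_2 \cdots \tilde{x}_N) &= \int_{-\infty}^{\infty} dx_1 \int_{-\infty}^{\infty} dx_2 \cdots \int_{-\infty}^{\infty} dx_N\, [x_1 x_2 \cdots x_N]\, p_{\tilde{x}_1 \tilde{x}_2 \cdots \tilde{x}_N}(x_1, x_2, \ldots, x_N) \\
&= \int_{-\infty}^{\infty} p_{\tilde{x}_1}(x_1)\, x_1\, dx_1 \int_{-\infty}^{\infty} p_{\tilde{x}_2}(x_2)\, x_2\, dx_2 \cdots \int_{-\infty}^{\infty} p_{\tilde{x}_N}(x_N)\, x_N\, dx_N .
\end{aligned} $$

Again, we consult Eqs. (3.8b) and (3.8c) to get

$$ \mathrm{E}(\tilde{x}_1 \tilde{x}_2 \cdots \tilde{x}_N) = \mathrm{E}(\tilde{x}_1)\, \mathrm{E}(\tilde{x}_2) \cdots \mathrm{E}(\tilde{x}_N) \tag{3.12c} $$

or

$$ \mathrm{E}(\tilde{x}_1 \tilde{x}_2 \cdots \tilde{x}_N) = \mu_{\tilde{x}_1}\, \mu_{\tilde{x}_2} \cdots \mu_{\tilde{x}_N} . \tag{3.12d} $$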

3.8 Single-Variable Means from Multivariable Distributions


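We can calculate the predicted mean values of $\tilde{x}$ and $\tilde{X}$ by choosing $f(\tilde{x}, \tilde{X}) = \tilde{x}$ and
$f(\tilde{x}, \tilde{X}) = \tilde{X}$ in Eq. (3.10b) above. This gives

$$ \mu_{\tilde{x}} = \mathrm{E}(\tilde{x}) = \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dX\, x\, p_{\tilde{x}\tilde{X}}(x, X) \tag{3.13a} $$

and

$$ \mu_{\tilde{X}} = \mathrm{E}(\tilde{X}) = \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dX\, X\, p_{\tilde{x}\tilde{X}}(x, X) . \tag{3.13b} $$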


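Writing the double integrals as

$$ \mathrm{E}(\tilde{x}) = \int_{-\infty}^{\infty} x \Big[ \int_{-\infty}^{\infty} p_{\tilde{x}\tilde{X}}(x, X)\, dX \Big] dx \tag{3.13c} $$

and

$$ \mathrm{E}(\tilde{X}) = \int_{-\infty}^{\infty} X \Big[ \int_{-\infty}^{\infty} p_{\tilde{x}\tilde{X}}(x, X)\, dx \Big] dX , \tag{3.13d} $$

we compare them to the formula for the expected value of a random variable given in Eq. (3.8b).
This comparison suggests that, if we want to specify the behavior of one random variable while
disregarding the presence of the other, we can construct the single-argument probability density
distributions of $\tilde{x}$ and $\tilde{X}$ by writing

$$ p_{\tilde{x}}(x) = \int_{-\infty}^{\infty} p_{\tilde{x}\tilde{X}}(x, X)\, dX \tag{3.13e} $$

and

$$ p_{\tilde{X}}(X) = \int_{-\infty}^{\infty} p_{\tilde{x}\tilde{X}}(x, X)\, dx . \tag{3.13f} $$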

Up to this point, none of the integrations have required assumptions about the dependence or
independence of the random variables, so Eqs. (3.13e) and (3.13f) hold true both for dependent
and independent random variables $\tilde{x}$ and $\tilde{X}$. If we specify that $\tilde{x}$ and $\tilde{X}$ are independent, then
Eq. (3.11a) can be substituted into (3.13e) and (3.13f) to get

$$ p_{\tilde{x}}(x) = \int_{-\infty}^{\infty} p_{\tilde{x}}(x)\, p_{\tilde{X}}(X)\, dX = p_{\tilde{x}}(x) \int_{-\infty}^{\infty} p_{\tilde{X}}(X)\, dX $$

and

$$ p_{\tilde{X}}(X) = \int_{-\infty}^{\infty} p_{\tilde{x}}(x)\, p_{\tilde{X}}(X)\, dx = p_{\tilde{X}}(X) \int_{-\infty}^{\infty} p_{\tilde{x}}(x)\, dx . $$

Glancing back at Eq. (3.4), we note that these last two equalities are trivially true, because in both
cases the right-most integrals must be one.
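
The construction of single-variable distributions from a joint distribution, Eqs. (3.13e) and (3.13f), has a direct discrete counterpart: sum the joint table over the variable being disregarded. The sketch below uses a small made-up joint table (the entries of `joint` are assumptions for illustration) that deliberately does not factor into a product, to show that the marginal formulas do not require independence.

```python
from fractions import Fraction

# Assumed joint probabilities for (x, X); the entries sum to one.
joint = {
    (0, 10): Fraction(3, 10),
    (0, 20): Fraction(1, 10),
    (1, 10): Fraction(1, 10),
    (1, 20): Fraction(5, 10),
}

# Discrete analogs of Eqs. (3.13e) and (3.13f): sum out the other variable.
p_x, p_X = {}, {}
for (x, X), p in joint.items():
    p_x[x] = p_x.get(x, 0) + p
    p_X[X] = p_X.get(X, 0) + p

# Means from the joint table, Eqs. (3.13a,b), and from the marginals, Eq. (3.8b).
mu_x_joint = sum(x * p for (x, X), p in joint.items())
mu_X_joint = sum(X * p for (x, X), p in joint.items())
mu_x_marginal = sum(x * p for x, p in p_x.items())
mu_X_marginal = sum(X * p for X, p in p_X.items())

# The joint table factors into the marginals only for independent variables.
factors = all(joint[(x, X)] == p_x[x] * p_X[X] for (x, X) in joint)
print(p_x, p_X, mu_x_joint, mu_X_joint, factors)
```

The means computed both ways agree even though `factors` is false for this table.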

3.9 Analyzing Dependent Random Variables


Having found formulas for $\mu_{\tilde{x}}$ and $\mu_{\tilde{X}}$ that hold true for any pair of dependent or independent
random variables $\tilde{x}$ and $\tilde{X}$, we now use $\mu_{\tilde{x}}$ and $\mu_{\tilde{X}}$ to define a new random variable

$$ \tilde{y} = (\tilde{x} - \mu_{\tilde{x}})(\tilde{X} - \mu_{\tilde{X}}) . \tag{3.14a} $$

From Eq. (3.8c), we know that

$$ \mathrm{E}(\tilde{y}) = \mathrm{E}\big((\tilde{x} - \mu_{\tilde{x}})(\tilde{X} - \mu_{\tilde{X}})\big) \tag{3.14b} $$

is just the predicted average value of $\tilde{y}$. We can imagine, each time we acquire a random pair of
$\tilde{x}$ and $\tilde{X}$ values, comparing the sizes of $\tilde{x}$ and $\tilde{X}$ to their respective averages $\mu_{\tilde{x}}$ and $\mu_{\tilde{X}}$ by
subtracting $\mu_{\tilde{x}}$ and $\mu_{\tilde{X}}$ from them. If $\tilde{x}$ and $\tilde{X}$ are both simultaneously greater than, or both
simultaneously less than, their averages, then $\tilde{y}$ is positive; and if one is greater than its average
when the other is less than its average, then $\tilde{y}$ is negative. If there is a tendency for one of the
random variables to exceed its average whenever the other exceeds its average, or a tendency for
one of the random variables to fall below its average whenever the other falls below its average,
then $\tilde{y}$ has a greater probability of being positive than negative, so

$$ \mathrm{E}(\tilde{y}) > 0 . $$

If, on the other hand, there is a tendency for one of the random variables to exceed its average
when the other falls below its average, then $\tilde{y}$ has a greater probability of being negative than
positive, so

$$ \mathrm{E}(\tilde{y}) < 0 . $$

If $\mathrm{E}(\tilde{y})$ is zero, it indicates that $\tilde{y}$ is just as likely to be negative as positive, which means that
knowing one variable lies above or below its average tells us nothing about the likelihood that the
other variable lies above or below its average. Writing out the integral formula for $\mathrm{E}(\tilde{y})$ in terms
of the probability density distribution $p_{\tilde{x}\tilde{X}}(x, X)$ gives

$$ \mathrm{E}(\tilde{y}) = \mathrm{E}\big((\tilde{x} - \mu_{\tilde{x}})(\tilde{X} - \mu_{\tilde{X}})\big) = \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dX\, [(x - \mu_{\tilde{x}})(X - \mu_{\tilde{X}})]\, p_{\tilde{x}\tilde{X}}(x, X) . \tag{3.14c} $$

We say that the value of the integral in Eq. (3.14c) measures the covariance of random variables
$\tilde{x}$ and $\tilde{X}$. When

$$ \mathrm{E}(\tilde{y}) = \mathrm{E}\big((\tilde{x} - \mu_{\tilde{x}})(\tilde{X} - \mu_{\tilde{X}})\big) $$

is greater than zero, $\tilde{x}$ and $\tilde{X}$ are said to be positively correlated; when it is less than zero,
$\tilde{x}$ and $\tilde{X}$ are said to be negatively correlated; and when it equals zero, $\tilde{x}$ and $\tilde{X}$ are said
to be uncorrelated.
Evaluating $\mathrm{E}(\tilde{y})$ and finding it not equal to zero is a standard way of showing that two
random variables $\tilde{x}$ and $\tilde{X}$ are correlated and so cannot be independent. We cannot, however,
say that $\tilde{x}$ and $\tilde{X}$ are independent just because $\mathrm{E}(\tilde{y})$ is zero; that is, saying that $\tilde{x}$ and $\tilde{X}$ are
uncorrelated is a weaker statement than saying that $\tilde{x}$ and $\tilde{X}$ are independent. To show why this
is so, we set up a random variable $\tilde{\phi}$ which has a probability density distribution

$$ p_{\tilde{\phi}}(\phi) = \begin{cases} 1/(2\pi) & \text{for } 0 \le \phi < 2\pi \\ 0 & \text{for } \phi < 0 \text{ or } \phi \ge 2\pi \end{cases} . \tag{3.15a} $$

The probability density distribution $p_{\tilde{\phi}}$ shows that $\tilde{\phi}$ is equally likely to take on any value
between zero and 2π, and that $\tilde{\phi}$ never takes on values less than zero or greater than 2π. We next
define two random variables $\tilde{u}$ and $\tilde{v}$ such that

$$ \tilde{u} = \sin(\tilde{\phi}) \tag{3.15b} $$

and

$$ \tilde{v} = \cos(\tilde{\phi}) . \tag{3.15c} $$

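It follows that

$$ \mu_{\tilde{u}} = \mathrm{E}(\tilde{u}) = \mathrm{E}(\sin\tilde{\phi}) = \int_{-\infty}^{\infty} p_{\tilde{\phi}}(\phi)\, \sin(\phi)\, d\phi = \frac{1}{2\pi} \int_{0}^{2\pi} \sin(\phi)\, d\phi = 0 , \tag{3.15d} $$

and similar reasoning shows that

$$ \mu_{\tilde{v}} = \mathrm{E}(\tilde{v}) = \frac{1}{2\pi} \int_{0}^{2\pi} \cos(\phi)\, d\phi = 0 . \tag{3.15e} $$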

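Note that

$$ \begin{aligned}
\mathrm{E}\big((\tilde{u} - \mu_{\tilde{u}})(\tilde{v} - \mu_{\tilde{v}})\big) = \mathrm{E}(\tilde{u}\tilde{v}) &= \mathrm{E}\big((\sin\tilde{\phi})(\cos\tilde{\phi})\big) \\
&= \frac{1}{2\pi} \int_{0}^{2\pi} \sin(\phi)\cos(\phi)\, d\phi \\
&= \frac{1}{4\pi} \int_{0}^{2\pi} \sin(2\phi)\, d\phi = 0 ,
\end{aligned} \tag{3.15f} $$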

which means that $\tilde{u}$ and $\tilde{v}$ are uncorrelated random variables. On the other hand, we also know
that

$$ \tilde{u}^2 + \tilde{v}^2 = \sin^2\tilde{\phi} + \cos^2\tilde{\phi} = 1 , $$

which means that whenever $\tilde{u}$ takes on a particular random value, say 1/2, then $\tilde{v}$ must take on
one of the two random values

$$ \pm\sqrt{1 - (1/2)^2} = \pm\sqrt{3}/2 . $$

Consequently, u and v are by no means independent random variables even though by definition
they are uncorrelated random variables.
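
The sine and cosine example can be reproduced numerically by averaging over φ with the uniform density of Eq. (3.15a). The sketch below uses a midpoint-rule sum; the helper name `mean_over_phi` is hypothetical. As an extra check that is not in the text, it also compares the mean of the product of the squares with the product of the mean squares: for independent variables these would agree, by the same reasoning as Eq. (3.11b) applied to the squared variables.

```python
import math

def mean_over_phi(f, n=10_000):
    """Average of f(phi) for phi uniform on [0, 2*pi), via a midpoint-rule sum."""
    h = 2.0 * math.pi / n
    return sum(f((i + 0.5) * h) for i in range(n)) * h / (2.0 * math.pi)

E_u = mean_over_phi(math.sin)                                         # Eq. (3.15d)
E_v = mean_over_phi(math.cos)                                         # Eq. (3.15e)
E_uv = mean_over_phi(lambda phi: math.sin(phi) * math.cos(phi))       # Eq. (3.15f)

# Extra dependence check (an addition for illustration, not from the text).
E_u2 = mean_over_phi(lambda phi: math.sin(phi) ** 2)
E_v2 = mean_over_phi(lambda phi: math.cos(phi) ** 2)
E_u2v2 = mean_over_phi(lambda phi: (math.sin(phi) * math.cos(phi)) ** 2)

print(E_u, E_v, E_uv)        # all close to zero: u and v are uncorrelated
print(E_u2v2, E_u2 * E_v2)   # these differ: u and v are not independent
```

Analytically the last two quantities are 1/8 and 1/4, so the covariance test alone would miss the dependence that the constraint u² + v² = 1 makes obvious.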

3.10 Linearity of the Expectation Operator


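The expectation operator is linear with respect to all random quantities. To see why, we take any
two functions f and g whose arguments are the N random variables $\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_N$ and multiply
them by two nonrandom variables α and β. The expectation operator E applied to

$$ \alpha f(\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_N) + \beta g(\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_N) $$

then gives, according to Eq. (3.12a) above,

$$ \begin{aligned}
&\mathrm{E}\big(\alpha f(\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_N) + \beta g(\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_N)\big) \\
&\quad = \int_{-\infty}^{\infty} dx_1 \int_{-\infty}^{\infty} dx_2 \cdots \int_{-\infty}^{\infty} dx_N\, [\alpha f(x_1, x_2, \ldots, x_N) + \beta g(x_1, x_2, \ldots, x_N)]\, p_{\tilde{x}_1 \tilde{x}_2 \cdots \tilde{x}_N}(x_1, x_2, \ldots, x_N) \\
&\quad = \alpha \int_{-\infty}^{\infty} dx_1 \int_{-\infty}^{\infty} dx_2 \cdots \int_{-\infty}^{\infty} dx_N\, f(x_1, x_2, \ldots, x_N)\, p_{\tilde{x}_1 \tilde{x}_2 \cdots \tilde{x}_N}(x_1, x_2, \ldots, x_N) \\
&\qquad + \beta \int_{-\infty}^{\infty} dx_1 \int_{-\infty}^{\infty} dx_2 \cdots \int_{-\infty}^{\infty} dx_N\, g(x_1, x_2, \ldots, x_N)\, p_{\tilde{x}_1 \tilde{x}_2 \cdots \tilde{x}_N}(x_1, x_2, \ldots, x_N) \\
&\quad = \alpha\, \mathrm{E}\big(f(\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_N)\big) + \beta\, \mathrm{E}\big(g(\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_N)\big) .
\end{aligned} \tag{3.16a} $$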


Note that in the last step Eq. (3.12a) is applied again to return to the expectation operator.
According to Eq. (2.32a) in Chapter 2, the definition of a linear operator L is that

$$ L(\alpha f + \beta g) = \alpha L(f) + \beta L(g) \tag{3.16b} $$

for any two functions f, g and any two constants α, β. When we think of the nonrandom variables
α and β as “constants,” we see that Eqs. (3.16a) and (3.16b) provide plenty of justification for
calling the expectation operator E a linear operator with respect to all random quantities.
The linearity of E can be used to show that multiplying any random variable $\tilde{x}$ by a
nonrandom parameter α results in the mean of $\tilde{x}$ being multiplied by α and the variance of $\tilde{x}$
being multiplied by α². Starting with Eq. (3.8c), we multiply both sides by α to get

$$ \alpha\, \mathrm{E}(\tilde{x}) = \alpha \mu_{\tilde{x}} . \tag{3.16c} $$

Because E is linear, $\mathrm{E}(\alpha\tilde{x}) = \alpha\, \mathrm{E}(\tilde{x})$, which means that Eq. (3.16c) can be written as

$$ \mathrm{E}(\alpha\tilde{x}) = \alpha \mu_{\tilde{x}} . \tag{3.16d} $$

This shows that multiplying $\tilde{x}$ by α changes its average value from $\mu_{\tilde{x}}$ to $\alpha\mu_{\tilde{x}}$. As for the
variance $v_{\tilde{x}}$ of random variable $\tilde{x}$, according to Eq. (3.8e) we have

$$ \mathrm{E}\big((\tilde{x} - \mu_{\tilde{x}})^2\big) = v_{\tilde{x}} \tag{3.16e} $$

from the definition of the variance of $\tilde{x}$. Multiplying both sides by α² gives

$$ \alpha^2\, \mathrm{E}\big((\tilde{x} - \mu_{\tilde{x}})^2\big) = \alpha^2 v_{\tilde{x}} . \tag{3.16f} $$

Again the linearity of E lets us write

$$ \alpha^2\, \mathrm{E}\big((\tilde{x} - \mu_{\tilde{x}})^2\big) = \mathrm{E}\big(\alpha^2 (\tilde{x} - \mu_{\tilde{x}})^2\big) , $$

and taking α inside the square gives

$$ \alpha^2\, \mathrm{E}\big((\tilde{x} - \mu_{\tilde{x}})^2\big) = \mathrm{E}\big((\alpha\tilde{x} - \alpha\mu_{\tilde{x}})^2\big) . $$

This can be substituted into (3.16f) to get

$$ \mathrm{E}\big((\alpha\tilde{x} - \alpha\mu_{\tilde{x}})^2\big) = \alpha^2 v_{\tilde{x}} . \tag{3.16g} $$

Since $\alpha\tilde{x}$ is the new random variable which comes from multiplying $\tilde{x}$ by α and [according to
Eq. (3.16d)] the quantity $\alpha\mu_{\tilde{x}}$ is the mean of this new random variable, we now realize,
consulting the definition of the variance in Eq. (3.8e), that $\mathrm{E}((\alpha\tilde{x} - \alpha\mu_{\tilde{x}})^2)$ must be the variance
of the new random variable $\alpha\tilde{x}$. Equation (3.16e) reminds us that $v_{\tilde{x}}$ is the variance of the old
random variable $\tilde{x}$. Hence, Eq. (3.16g) states that if $\tilde{x}$ is multiplied by α then its variance must
be multiplied by α².
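
The scaling rules of Eqs. (3.16d) and (3.16g) can be verified exactly for a discrete random variable. The sketch below assumes a fair die and an arbitrary scale factor of −3/2; both choices, and the helper name `mean_and_variance`, are illustrative only.

```python
from fractions import Fraction

def mean_and_variance(values, probs):
    """Probability-weighted mean and variance of a discrete random variable."""
    mu = sum(p * x for p, x in zip(probs, values))
    var = sum(p * (x - mu) ** 2 for p, x in zip(probs, values))
    return mu, var

values = [1, 2, 3, 4, 5, 6]          # assumed example: a fair die
probs = [Fraction(1, 6)] * 6
alpha = Fraction(-3, 2)              # arbitrary nonrandom scale factor

mu, var = mean_and_variance(values, probs)
# The new random variable alpha * x takes the scaled values with the same probabilities.
mu_scaled, var_scaled = mean_and_variance([alpha * x for x in values], probs)

print(mu_scaled, alpha * mu)         # Eq. (3.16d): mean is multiplied by alpha
print(var_scaled, alpha ** 2 * var)  # Eq. (3.16g): variance is multiplied by alpha squared
```

A negative scale factor is used on purpose: the mean changes sign but the variance does not, since only α squared enters Eq. (3.16g).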
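The expectation operator usually can be moved inside an integral over a nonrandom variable.
Suppose function f depends on one nonrandom variable z in addition to N random variables
$\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_N$. Then, again using Eq. (3.12a), the expectation value of the integral

$$ \int_{z_A}^{z_B} f(z, \tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_N)\, dz $$

is

$$ \mathrm{E}\Big( \int_{z_A}^{z_B} f(z, \tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_N)\, dz \Big) = \int_{-\infty}^{\infty} dx_1 \int_{-\infty}^{\infty} dx_2 \cdots \int_{-\infty}^{\infty} dx_N\, p_{\tilde{x}_1 \tilde{x}_2 \cdots \tilde{x}_N}(x_1, x_2, \ldots, x_N) \int_{z_A}^{z_B} f(z, x_1, x_2, \ldots, x_N)\, dz . $$

As long as we can interchange the order of these integrations, which is almost always allowed
when dealing with physically realistic integrals, the expectation value can also be written as

$$ \mathrm{E}\Big( \int_{z_A}^{z_B} f(z, \tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_N)\, dz \Big) = \int_{z_A}^{z_B} dz \Big[ \int_{-\infty}^{\infty} dx_1 \int_{-\infty}^{\infty} dx_2 \cdots \int_{-\infty}^{\infty} dx_N\, p_{\tilde{x}_1 \tilde{x}_2 \cdots \tilde{x}_N}(x_1, x_2, \ldots, x_N)\, f(z, x_1, x_2, \ldots, x_N) \Big] . $$

This can, again applying Eq. (3.12a), be written as

$$ \mathrm{E}\Big( \int_{z_A}^{z_B} f(z, \tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_N)\, dz \Big) = \int_{z_A}^{z_B} \mathrm{E}\big(f(z, \tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_N)\big)\, dz . \tag{3.17a} $$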


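The same reasoning can be extended to M integrals over M nonrandom variables $z_1, z_2, \ldots, z_M$.
We have

$$ \begin{aligned}
&\mathrm{E}\Big( \int_{z_{1A}}^{z_{1B}} dz_1 \int_{z_{2A}}^{z_{2B}} dz_2 \cdots \int_{z_{MA}}^{z_{MB}} dz_M\, f(z_1, z_2, \ldots, z_M, \tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_N) \Big) \\
&\quad = \int_{-\infty}^{\infty} dx_1 \cdots \int_{-\infty}^{\infty} dx_N\, p_{\tilde{x}_1 \tilde{x}_2 \cdots \tilde{x}_N}(x_1, \ldots, x_N) \int_{z_{1A}}^{z_{1B}} dz_1 \cdots \int_{z_{MA}}^{z_{MB}} dz_M\, f(z_1, \ldots, z_M, x_1, \ldots, x_N) \\
&\quad = \int_{z_{1A}}^{z_{1B}} dz_1 \cdots \int_{z_{MA}}^{z_{MB}} dz_M \Big[ \int_{-\infty}^{\infty} dx_1 \cdots \int_{-\infty}^{\infty} dx_N\, p_{\tilde{x}_1 \tilde{x}_2 \cdots \tilde{x}_N}(x_1, \ldots, x_N)\, f(z_1, \ldots, z_M, x_1, \ldots, x_N) \Big] ,
\end{aligned} $$

which can also be written as

$$ \begin{aligned}
&\mathrm{E}\Big( \int_{z_{1A}}^{z_{1B}} dz_1 \int_{z_{2A}}^{z_{2B}} dz_2 \cdots \int_{z_{MA}}^{z_{MB}} dz_M\, f(z_1, z_2, \ldots, z_M, \tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_N) \Big) \\
&\quad = \int_{z_{1A}}^{z_{1B}} dz_1 \int_{z_{2A}}^{z_{2B}} dz_2 \cdots \int_{z_{MA}}^{z_{MB}} dz_M\, \mathrm{E}\big(f(z_1, z_2, \ldots, z_M, \tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_N)\big) .
\end{aligned} \tag{3.17b} $$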

The expectation operator can even be moved inside the integral of a random function

$$ \tilde{f}(z_1, z_2, \ldots, z_M) . $$

According to our definition of a random function in Sec. 3.2 above, we have

$$ \tilde{f}(z_1, z_2, \ldots, z_M) = f(z_1, z_2, \ldots, z_M, \tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_N) $$

for some set of random variables $\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_N$. Hence, we can just suppress the random variables
$\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_N$ in Eq. (3.17b) to get

$$ \begin{aligned}
&\mathrm{E}\Big( \int_{z_{1A}}^{z_{1B}} dz_1 \int_{z_{2A}}^{z_{2B}} dz_2 \cdots \int_{z_{MA}}^{z_{MB}} dz_M\, \tilde{f}(z_1, z_2, \ldots, z_M) \Big) \\
&\quad = \int_{z_{1A}}^{z_{1B}} dz_1 \int_{z_{2A}}^{z_{2B}} dz_2 \cdots \int_{z_{MA}}^{z_{MB}} dz_M\, \mathrm{E}\big(\tilde{f}(z_1, z_2, \ldots, z_M)\big) .
\end{aligned} \tag{3.17c} $$

This result is referred to more than once in the following chapters.
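
A small numerical experiment can make the interchange in Eq. (3.17a) concrete. The sketch below is only an illustration under assumptions not found in the text: the random function is taken to be sin(ωz) with ω restricted to three discrete values of assumed probability, the z integral runs from 0 to 1, and `integrate` is a hypothetical midpoint-rule helper.

```python
import math

# Assumed discrete random argument: omega takes these values with these probabilities.
omegas = [1.0, 2.0, 3.0]
probs = [0.2, 0.5, 0.3]

def f(z, omega):
    return math.sin(omega * z)

def integrate(g, lo, hi, n=10_000):
    """Midpoint-rule estimate of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * h) for i in range(n)) * h

z_a, z_b = 0.0, 1.0

# Left side of Eq. (3.17a): integrate each realization, then take the expectation.
lhs = sum(p * integrate(lambda z: f(z, w), z_a, z_b) for w, p in zip(omegas, probs))

# Right side of Eq. (3.17a): take the expectation at each z, then integrate.
rhs = integrate(lambda z: sum(p * f(z, w) for w, p in zip(omegas, probs)), z_a, z_b)

# Closed form for comparison: the integral of sin(w z) from 0 to 1 is (1 - cos w) / w.
exact = sum(p * (1.0 - math.cos(w)) / w for w, p in zip(omegas, probs))
print(lhs, rhs, exact)
```

With a finite set of ω values and a finite sum for the integral, the two orderings differ only by floating-point rounding; the nontrivial cases the text alludes to arise with genuinely infinite integrals.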

3.11 The Central Limit Theorem


The central limit theorem states that if there is a random variable $\tilde{s}_N$ equal to the sum of N
independent random variables $\tilde{r}_1, \tilde{r}_2, \ldots, \tilde{r}_N$, then

$$ \tilde{s}_N = \tilde{r}_1 + \tilde{r}_2 + \cdots + \tilde{r}_N \tag{3.18a} $$

has a probability density distribution $p_{\tilde{s}_N}(s_N)$ that resembles a Gaussian or normal probability
density distribution more and more as N gets large,

$$ p_{\tilde{s}_N}(s_N) \cong \frac{1}{\sigma_{\tilde{s}_N}\sqrt{2\pi}}\, e^{-\frac{(s_N - \mu_{\tilde{s}_N})^2}{2\sigma_{\tilde{s}_N}^2}} . \tag{3.18b} $$

In Eq. (3.18b), $\mu_{\tilde{s}_N}$ is the mean or average value of $\tilde{s}_N$ and $\sigma_{\tilde{s}_N}$ is the standard deviation of $\tilde{s}_N$
about its mean. Figure 3.1 is a plot of the Gaussian distribution specified on the right-hand side of
(3.18b). For large but finite values of N, this Gaussian distribution tends to be a relatively good
approximation of $p_{\tilde{s}_N}(s_N)$ for $s_N$ values near the peak in Fig. 3.1 and a not-so-good
approximation of $p_{\tilde{s}_N}(s_N)$ for $s_N$ values in the tails of Fig. 3.1, that is, for $s_N$ values far from
the peak.
The mean of $\tilde{s}_N$ comes from applying the expectation operator E to both sides of Eq. (3.18a).
Remembering that E is linear with respect to random quantities [see Eq. (3.16a) above], we get

$$ \mathrm{E}(\tilde{s}_N) = \mathrm{E}(\tilde{r}_1 + \tilde{r}_2 + \cdots + \tilde{r}_N) = \mathrm{E}(\tilde{r}_1) + \mathrm{E}(\tilde{r}_2) + \cdots + \mathrm{E}(\tilde{r}_N) , $$

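FIGURE 3.1. Plot of the Gaussian distribution $p_{\tilde{s}_N}(s_N)$ against $s_N$, with the mean $\mu_{\tilde{s}_N}$ and the standard deviation $\sigma_{\tilde{s}_N}$ on either side of it marked on the horizontal axis.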

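which becomes, applying Eq. (3.8c) above,

$$ \mu_{\tilde{s}_N} = \mu_{\tilde{r}_1} + \mu_{\tilde{r}_2} + \cdots + \mu_{\tilde{r}_N} . \tag{3.19a} $$

The variance of $\tilde{s}_N$ is, according to Eq. (3.8e),

$$ v_{\tilde{s}_N} = \mathrm{E}\big((\tilde{s}_N - \mu_{\tilde{s}_N})^2\big) , $$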

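which becomes, after substituting from Eqs. (3.18a) and (3.19a),

$$ v_{\tilde{s}_N} = \mathrm{E}\Big( \Big( \sum_{j=1}^{N} \tilde{r}_j - \sum_{j=1}^{N} \mu_{\tilde{r}_j} \Big)^2 \Big) = \mathrm{E}\Big( \Big( \sum_{j=1}^{N} (\tilde{r}_j - \mu_{\tilde{r}_j}) \Big)^2 \Big) . $$

Expanding the square inside the expectation operator gives

$$ v_{\tilde{s}_N} = \mathrm{E}\Big( \sum_{j=1}^{N} (\tilde{r}_j - \mu_{\tilde{r}_j})^2 + \sum_{j=1}^{N} \sum_{\substack{k=1 \\ k \ne j}}^{N} [(\tilde{r}_j - \mu_{\tilde{r}_j})(\tilde{r}_k - \mu_{\tilde{r}_k})] \Big) , $$

and the linearity of the expectation operator with respect to random quantities then lets us write
this as

$$ v_{\tilde{s}_N} = \sum_{j=1}^{N} \mathrm{E}\big((\tilde{r}_j - \mu_{\tilde{r}_j})^2\big) + \sum_{j=1}^{N} \sum_{\substack{k=1 \\ k \ne j}}^{N} \mathrm{E}\big((\tilde{r}_j - \mu_{\tilde{r}_j})(\tilde{r}_k - \mu_{\tilde{r}_k})\big) . \tag{3.19b} $$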

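Since $\tilde{r}_1, \tilde{r}_2, \ldots, \tilde{r}_N$ are independent random quantities, so must the random quantities $\tilde{r}_1 - \mu_{\tilde{r}_1}$,
$\tilde{r}_2 - \mu_{\tilde{r}_2}$, …, $\tilde{r}_N - \mu_{\tilde{r}_N}$ also be independent. Hence, according to Eq. (3.11b), we see that when
j ≠ k

$$ \mathrm{E}\big((\tilde{r}_j - \mu_{\tilde{r}_j})(\tilde{r}_k - \mu_{\tilde{r}_k})\big) = \mathrm{E}(\tilde{r}_j - \mu_{\tilde{r}_j}) \cdot \mathrm{E}(\tilde{r}_k - \mu_{\tilde{r}_k}) . \tag{3.19c} $$

But, applying the linearity of the expectation operator and Eqs. (3.8c) and (3.9f), we have

$$ \mathrm{E}(\tilde{r}_j - \mu_{\tilde{r}_j}) = \mathrm{E}(\tilde{r}_j) - \mathrm{E}(\mu_{\tilde{r}_j}) = \mu_{\tilde{r}_j} - \mu_{\tilde{r}_j} = 0 . $$

Consequently, Eq. (3.19c) becomes

$$ \mathrm{E}\big((\tilde{r}_j - \mu_{\tilde{r}_j})(\tilde{r}_k - \mu_{\tilde{r}_k})\big) = 0 \tag{3.19d} $$

when j ≠ k. Substituting this into (3.19b) gives

$$ v_{\tilde{s}_N} = \sum_{j=1}^{N} \mathrm{E}\big((\tilde{r}_j - \mu_{\tilde{r}_j})^2\big) , $$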


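which becomes, after applying Eq. (3.8e),

$$ v_{\tilde{s}_N} = v_{\tilde{r}_1} + v_{\tilde{r}_2} + \cdots + v_{\tilde{r}_N} , \tag{3.19e} $$

where

$$ \mathrm{E}\big((\tilde{r}_j - \mu_{\tilde{r}_j})^2\big) = v_{\tilde{r}_j} \tag{3.19f} $$

is the variance of $\tilde{r}_j$ for j = 1, 2, …, N. The standard deviation of a random quantity is the square
root of its variance [see Eq. (3.5c)], so formulas (3.19e) and (3.19f) can also be written as

$$ \sigma_{\tilde{s}_N}^2 = \sigma_{\tilde{r}_1}^2 + \sigma_{\tilde{r}_2}^2 + \cdots + \sigma_{\tilde{r}_N}^2 , \tag{3.19g} $$

where

$$ \sqrt{\mathrm{E}\big((\tilde{r}_j - \mu_{\tilde{r}_j})^2\big)} = \sigma_{\tilde{r}_j} \tag{3.19h} $$

is the standard deviation of $\tilde{r}_j$ for j = 1, 2, …, N and $\sigma_{\tilde{s}_N}$ is the standard deviation of $\tilde{s}_N$.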
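
The additivity of means and variances in Eqs. (3.19a) and (3.19e) can be checked exactly for discrete random variables by building the distribution of the sum directly. In the sketch below the three probability tables are made-up examples, and `pmf_of_sum` is a hypothetical helper that combines two independent discrete variables by multiplying their probabilities, the discrete analog of Eq. (3.11a).

```python
from fractions import Fraction

def mean_and_variance(pmf):
    """Mean and variance of a discrete variable given as {value: probability}."""
    mu = sum(r * p for r, p in pmf.items())
    var = sum((r - mu) ** 2 * p for r, p in pmf.items())
    return mu, var

def pmf_of_sum(pmf_a, pmf_b):
    """Distribution of a + b when a and b are independent."""
    out = {}
    for a, pa in pmf_a.items():
        for b, pb in pmf_b.items():
            out[a + b] = out.get(a + b, 0) + pa * pb
    return out

# Three assumed, deliberately different, independent variables.
r1 = {0: Fraction(1, 2), 1: Fraction(1, 2)}
r2 = {k: Fraction(1, 6) for k in range(1, 7)}
r3 = {-2: Fraction(1, 4), 0: Fraction(1, 2), 5: Fraction(1, 4)}

s = pmf_of_sum(pmf_of_sum(r1, r2), r3)
mu_s, var_s = mean_and_variance(s)
parts = [mean_and_variance(r) for r in (r1, r2, r3)]

print(mu_s, sum(mu for mu, _ in parts))     # Eq. (3.19a)
print(var_s, sum(var for _, var in parts))  # Eq. (3.19e)
```

Note that it is the variances, not the standard deviations, that add; the standard deviation of the sum is the square root of the summed variances, as in Eq. (3.19g).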
Returning to the approximation in Eq. (3.18b) used to explain the central limit theorem, we
notice that some care must be exercised in interpreting the limit as N → ∞ ; in particular, it is
clear from Eqs. (3.19a) and (3.19g) that there is a tendency for both µ sN and σ sN to become large
without limit as N increases, making the expression on the right-hand side of (3.18b) difficult to
interpret in the limit of large N. The central limit theorem can be written in terms of a
mathematically well-defined limit as N → ∞ if we are careful how the arguments of the
Gaussian or normal distribution are defined. To state the central limit theorem precisely, we
define a new random variable
sN − µ sN
zN = (3.20a)
σ s N

that has a probability density distribution pzN ( z N ) . Now we can present the central limit theorem
exactly by stating that
1 − z2 / 2
lim ª¬ pzN ( z ) º¼ = e . (3.20b)
N →∞ 2π

The right-hand side of (3.20b) is the Gaussian or normal distribution introduced above in Eq.
(3.6a) where the random variable has a mean of zero and a standard deviation of one. For any
large but finite value of N, we can recover the approximation in (3.18b) by assuming that pzN is
near its limit and then replacing z in (3.20b) by zN as defined in (3.20a). [The extra factor of σ sN
multiplying the √(2π) on the right-hand side of (3.18b) can be regarded as coming from Eq. (3.4)
above; if it isn't there, then the integral of the probability density distribution between −∞ and
+∞ does not equal one.]
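
The limit in Eq. (3.20b) can be illustrated with a short simulation. This is only a sketch under stated assumptions: the r_j are taken to be independent and uniformly distributed between 0 and 1 (so each has mean 1/2 and variance 1/12), N = 30 is an arbitrary choice, and the helper names are invented here. The code forms z_N as in Eq. (3.20a) and compares the fraction of samples with |z_N| ≤ 1 to the value predicted by the unit Gaussian.

```python
import math
import random

def standardized_sums(n_terms, num_trials=20000, seed=2):
    """Draw samples of z_N = (s_N - mu_sN) / sigma_sN for sums of uniform(0, 1) numbers."""
    rng = random.Random(seed)
    mu_sn = n_terms * 0.5                  # sum of the N individual means
    sigma_sn = math.sqrt(n_terms / 12.0)   # Eq. (3.19g) with every variance equal to 1/12
    return [
        (sum(rng.random() for _ in range(n_terms)) - mu_sn) / sigma_sn
        for _ in range(num_trials)
    ]

def fraction_within(zs, limit):
    """Fraction of the samples whose magnitude does not exceed limit."""
    return sum(1 for z in zs if abs(z) <= limit) / len(zs)

zs = standardized_sums(n_terms=30)
# Integral of the unit Gaussian in Eq. (3.20b) from -1 to +1.
gaussian_fraction = math.erf(1.0 / math.sqrt(2.0))
print(f"simulated: {fraction_within(zs, 1.0):.4f}   Gaussian: {gaussian_fraction:.4f}")
```

Because z_N is standardized, its mean and standard deviation stay at zero and one for every N, which is what makes the limit well defined.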

3.12 Averaging to Improve Experimental Accuracy


It is now easy to explain why averaging together many identical but independent measurements
from the same experiment improves the accuracy of the result. Suppose N independent
measurements are to be averaged together this way. We can say that each measurement is an
independent random number rj for j = 1, 2,… , N having the same mean value µ, with µ taken to
be the true value of the experimental quantity being measured. Since the measurements are all
identical, all the rj have the same standard deviation σ due to the same sorts of random errors
occurring in each independent measurement. When all the experimental results are averaged, we
create a new random number—namely, the sum of all the rj divided by N. Let’s call this new
random number a N . The work done in the previous section lets us write this as [see Eq. (3.18a)]

$$ a_N = \frac{s_N}{N} . \tag{3.21a} $$

Applying the expectation operator E to both sides gives, using the linearity of the expectation
operator (see Sec. 3.10 above),
$$ E(a_N) = \frac{1}{N} E(s_N) . \tag{3.21b} $$

Since E( sN ) = µ sN , Eq. (3.19a) shows that, since all the rj have the same mean value µ,

$$ E(s_N) = \mu_{r_1} + \mu_{r_2} + \cdots + \mu_{r_N} = N\mu . \tag{3.21c} $$

Hence, Eq. (3.21b) now becomes


$$ E(a_N) = \frac{1}{N} (N\mu) = \mu . \tag{3.21d} $$

Equation (3.21d) states that the expected value of the experimental average a N is µ, the true
value of the experimental quantity being measured. This is no great surprise, because the
averaging process would not make sense unless it were true. The typical size of the error left after
the rj are averaged together—that is, the amount by which a N is likely to be different from its
average value—is just its standard deviation [see Eqs. (3.5c) and (3.8e) above],


$$ \sigma_{a_N} = \sqrt{ E\big( (a_N - \mu)^2 \big) } , $$

which can also be written as, after substituting from Eq. (3.21a) and using the linearity of the
expectation operator,
$$ \sigma_{a_N} = \sqrt{ E\left( \left( \frac{1}{N} s_N - \mu \right)^2 \right) } = \frac{1}{N} \sqrt{ E\big( (s_N - N\mu)^2 \big) } . \tag{3.21e} $$

According to (3.21c), N µ is the mean value of sN , which makes

$$ E\big( (s_N - N\mu)^2 \big) $$

the variance vsN of sN [see Eq. (3.8e) above]. Hence, (3.21e) can be written as

$$ \sigma_{a_N} = \frac{1}{N} \sqrt{v_{s_N}} = \frac{1}{N} \sqrt{\sigma_{s_N}^2} $$

because the variance is the square of the standard deviation σ sN . Substituting from (3.19g) now
gives
$$ \sigma_{a_N} = \frac{1}{N} \sqrt{v_{s_N}} = \frac{1}{N} \sqrt{ \sigma_{r_1}^2 + \sigma_{r_2}^2 + \cdots + \sigma_{r_N}^2 } . $$

As already mentioned above, we can assume that all the rj have the same standard deviation σ.
Hence,
$$ \sigma_{a_N} = \frac{1}{N} \sqrt{N\sigma^2} = \frac{\sigma}{\sqrt{N}} . \tag{3.21f} $$

This shows that when the standard deviation or expected error in one measurement is σ, then the
standard deviation or expected error in the average a N of N identical but independent
measurements is σ/√N, a significantly smaller number. Although we use several formulas from
the previous section on the central limit theorem to get this result, there is no assumption here
that the rj obey any particular probability density distribution. In order to derive Eqs. (3.21d) and
(3.21f), all that is needed is that the rj are independent and that the probability density
distributions of the rj have the same mean and standard deviation.
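
The σ/√N behavior in Eq. (3.21f) is easy to check by simulation. The sketch below is illustrative only: the true value µ = 10, the single-measurement error σ = 1, and the function name std_of_average are assumptions made here. The measurement errors are drawn from a uniform rather than a Gaussian distribution, to reflect the remark above that no particular probability density distribution is required.

```python
import math
import random
import statistics

def std_of_average(n_meas, sigma=1.0, mu=10.0, num_trials=5000, seed=3):
    """Simulate many averages a_N of n_meas measurements; return (mean, std) of a_N."""
    rng = random.Random(seed)
    half_width = sigma * math.sqrt(3.0)   # uniform(-w, w) has standard deviation w / sqrt(3)
    averages = []
    for _ in range(num_trials):
        measurements = [mu + rng.uniform(-half_width, half_width) for _ in range(n_meas)]
        averages.append(sum(measurements) / n_meas)
    return statistics.mean(averages), statistics.pstdev(averages)

for n_meas in (1, 4, 25, 100):
    mean_a, std_a = std_of_average(n_meas)
    print(f"N = {n_meas:3d}   mean of a_N = {mean_a:.3f}   "
          f"std of a_N = {std_a:.3f}   sigma/sqrt(N) = {1.0 / math.sqrt(n_meas):.3f}")
```

The mean of a_N should stay near µ for every N, as Eq. (3.21d) requires, while its standard deviation should track σ/√N.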
    When spectrometers are used to make independent measurements of the same radiance
spectra, we can extend the above analysis to the spectral measurements by regarding the
independent but identical random variables rj as random functions of the spectral wavelength or
frequency, with different values of index j now representing different spectral curves from
independent spectral measurements. We can now repeat all the algebraic manipulations used in
(3.21a)–(3.21f) above while regarding every quantity except N as a function of the spectral
wavelength or frequency and end up with the same results. If, for example, the quantities are
regarded as functions of the spectral wavelength λ, then we just need to visualize a (λ)
immediately following the relevant variables. In a sense, all that is happening is that we have
decided to repeat the algebra of Eqs. (3.21a)–(3.21f) at each spectral wavelength. Equation
(3.21d), for example, becomes

$$ E\big( a_N(\lambda) \big) = \mu(\lambda) , \tag{3.22a} $$

showing that the point-by-point average of the rj (λ ) spectral curves creates another curve a N (λ )
whose expected value is the true spectrum µ(λ). The average spectrum a N (λ) is allowed to have
a different expected value µ(λ) at each wavelength λ because it is now, of course, taken to be a
function of λ. Similarly Eq. (3.21f) becomes

$$ \sigma_{a_N}(\lambda) = \frac{\sigma(\lambda)}{\sqrt{N}} . \tag{3.22b} $$

This shows that the expected error σ aN (λ) at wavelength λ of the average spectrum a N (λ) is
smaller by a factor of √N than the expected error σ(λ) at wavelength λ of a single spectral
measurement. The expected error σ(λ), just like the average µ(λ), is allowed to be different at
different wavelengths. As long as the expected value µ(λ) of a N (λ) is the true spectral curve, Eq.
(3.22b) shows that we can approach this true spectrum as closely as we desire—that is, make the
error in our point-by-point average spectrum arbitrarily small—by making N as large as
necessary.
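
The point-by-point averaging described by Eqs. (3.22a) and (3.22b) might be sketched as follows. Everything specific in this example is hypothetical: the shape of the true spectrum µ(λ), the wavelength dependence of σ(λ), the wavelength grid, and the function names. The noise is also taken to be Gaussian and independent from one wavelength to the next, which is a convenience of the simulation and not something the derivation requires.

```python
import math
import random

def true_spectrum(wavelength_nm):
    """Hypothetical true spectrum mu(lambda): one Gaussian-shaped line on a flat background."""
    return 1.0 + 4.0 * math.exp(-((wavelength_nm - 550.0) / 20.0) ** 2)

def single_shot_sigma(wavelength_nm):
    """Hypothetical single-measurement error sigma(lambda); it grows with wavelength."""
    return 0.2 + 0.001 * (wavelength_nm - 400.0)

def measure_spectrum(wavelengths, rng):
    """One simulated spectral curve r_j(lambda)."""
    return [rng.gauss(true_spectrum(w), single_shot_sigma(w)) for w in wavelengths]

def average_spectra(spectra):
    """Point-by-point average a_N(lambda) of a list of spectral curves."""
    count = len(spectra)
    return [sum(values) / count for values in zip(*spectra)]

def normalized_rms_error(wavelengths, spectrum):
    """RMS over wavelength of (spectrum - mu(lambda)) / sigma(lambda)."""
    squared = [
        ((value - true_spectrum(w)) / single_shot_sigma(w)) ** 2
        for w, value in zip(wavelengths, spectrum)
    ]
    return math.sqrt(sum(squared) / len(squared))

rng = random.Random(4)
wavelengths = [400.0 + k for k in range(301)]   # 400 nm to 700 nm in 1 nm steps
spectra = [measure_spectrum(wavelengths, rng) for _ in range(100)]
single_error = normalized_rms_error(wavelengths, spectra[0])
averaged_error = normalized_rms_error(wavelengths, average_spectra(spectra))
print(f"single spectrum: {single_error:.3f}   average of 100 spectra: {averaged_error:.3f}")
```

Dividing each error by σ(λ) before taking the RMS puts all wavelengths on the same footing, so the single-spectrum figure should be near 1 and the figure for the average of N = 100 spectra near 1/√100 = 0.1.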

3.13 Mean, Autocorrelation, Autocovariance of Random Functions of Time

Using the same notation as in the discussion following Eq. (3.2a) above, we write ñ(t) to
represent a random function ñ of a nonrandom time t. As we already mentioned at the end of Sec.
3.2, ñ(t) is often called a random or stochastic process. Having specified a random function—or
stochastic process or random process—called ñ(t), we know that for each time t there is a random
variable ñ(t); and when there are two different time values t1 and t2 with t1 ≠ t2, there is no reason
to expect the random variables ñ(t1) and ñ(t2) to behave the same way.


We also know the behavior of random variables can be described by probability density
distributions. Associated with any N sequential random variables ñ(t1), ñ(t2), ..., ñ(tN) specified
by the time values t1 < t2 < ⋯ < tN there is a probability density distribution

$$ p_{\tilde{n}(t_1)\tilde{n}(t_2)\cdots\tilde{n}(t_N)}(n_1, n_2, \ldots, n_N) , $$


such that
$$ p_{\tilde{n}(t_1)\tilde{n}(t_2)\cdots\tilde{n}(t_N)}(n_1, n_2, \ldots, n_N) \, dn_1 \, dn_2 \cdots dn_N $$

is the probability first that ñ(t1) takes on a value between n1 and n1 + dn1, and then that ñ(t2)
takes on a value between n2 and n2 + dn2, and then that ñ(t3) takes on a value between n3 and
n3 + dn3, …, and then that ñ(tN) takes on a value between nN and nN + dnN. The expectation
operator E has the same meaning as before: the expected or mean value of any function f of the
N random variables ñ(t1), ñ(t2), ..., ñ(tN) is

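$$ E\big( f(\tilde{n}(t_1), \tilde{n}(t_2), \ldots, \tilde{n}(t_N)) \big)
 = \int_{-\infty}^{\infty} dn_1 \int_{-\infty}^{\infty} dn_2 \cdots \int_{-\infty}^{\infty} dn_N \,
   f(n_1, n_2, \ldots, n_N) \, p_{\tilde{n}(t_1)\tilde{n}(t_2)\cdots\tilde{n}(t_N)}(n_1, n_2, \ldots, n_N) . \tag{3.23a} $$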

One of the most important expectation values associated with ñ occurs when we set N = 2 and
specify that
$$ f(\tilde{n}(t_1), \tilde{n}(t_2), \ldots, \tilde{n}(t_N)) = \tilde{n}(t_1) \cdot \tilde{n}(t_2) $$

to get the autocorrelation function

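$$ R_{\tilde{n}\tilde{n}}(t_1, t_2) = E\big( \tilde{n}(t_1) \cdot \tilde{n}(t_2) \big)
 = \int_{-\infty}^{\infty} dn_1 \int_{-\infty}^{\infty} dn_2 \, [n_1 n_2] \, p_{\tilde{n}(t_1)\tilde{n}(t_2)}(n_1, n_2) . \tag{3.23b} $$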

Other important expectation values are the mean of ñ as a function of time,


$$ \mu_{\tilde{n}}(t) = E\big( \tilde{n}(t) \big) = \int_{-\infty}^{\infty} n \, p_{\tilde{n}(t)}(n) \, dn , \tag{3.23c} $$
and the autocovariance of ñ,

$$ C_{\tilde{n}\tilde{n}}(t_1, t_2) = E\Big( \big( \tilde{n}(t_1) - \mu_{\tilde{n}}(t_1) \big) \big( \tilde{n}(t_2) - \mu_{\tilde{n}}(t_2) \big) \Big)
 = \int_{-\infty}^{\infty} dn_1 \int_{-\infty}^{\infty} dn_2 \, \big( n_1 - \mu_{\tilde{n}}(t_1) \big) \big( n_2 - \mu_{\tilde{n}}(t_2) \big) \, p_{\tilde{n}(t_1)\tilde{n}(t_2)}(n_1, n_2) . \tag{3.23d} $$

Clearly, when $\mu_{\tilde{n}}(t) = 0$ for all t, we have

$$ R_{\tilde{n}\tilde{n}}(t_1, t_2) = C_{\tilde{n}\tilde{n}}(t_1, t_2) . \tag{3.23e} $$

Almost always, the random functions used to represent noise in a physical system are specified in
such a way that µn ( t ) = 0 , which means the distinction between the autocorrelation function and
the autocovariance function becomes irrelevant.
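
In practice the expectations in Eqs. (3.23b)–(3.23d) are often estimated by averaging over many independently generated sample functions. The sketch below shows one way this might be done. The random function used, a random walk with drift sampled at integer times, is a hypothetical choice made only because its statistics are known exactly: with step mean d and step standard deviation s, the mean at time k is d·k and the autocovariance between times k1 ≤ k2 is s²·k1. The function names are likewise invented here.

```python
import random

def random_walk(num_steps, drift, step_sigma, rng):
    """One sample function, stored as its values at the integer times 1..num_steps."""
    value, path = 0.0, []
    for _ in range(num_steps):
        value += rng.gauss(drift, step_sigma)
        path.append(value)
    return path

def ensemble_statistics(paths, k1, k2):
    """Estimate mu(t1), mu(t2), R(t1, t2) and C(t1, t2) at the 1-based time indices k1, k2."""
    count = len(paths)
    x1 = [path[k1 - 1] for path in paths]
    x2 = [path[k2 - 1] for path in paths]
    mu1 = sum(x1) / count                                                   # Eq. (3.23c) at t1
    mu2 = sum(x2) / count                                                   # Eq. (3.23c) at t2
    autocorr = sum(a * b for a, b in zip(x1, x2)) / count                   # Eq. (3.23b)
    autocov = sum((a - mu1) * (b - mu2) for a, b in zip(x1, x2)) / count    # Eq. (3.23d)
    return mu1, mu2, autocorr, autocov

rng = random.Random(6)
paths = [random_walk(12, drift=0.1, step_sigma=1.0, rng=rng) for _ in range(20000)]
mu1, mu2, autocorr, autocov = ensemble_statistics(paths, k1=5, k2=12)
print(f"mu(5) = {mu1:.3f} (exact 0.5)   mu(12) = {mu2:.3f} (exact 1.2)")
print(f"R(5,12) = {autocorr:.3f} (exact 5.6)   C(5,12) = {autocov:.3f} (exact 5.0)")
```

The estimates satisfy R = C + µ(t1)µ(t2), so the autocorrelation and autocovariance coincide only when the mean vanishes, in agreement with Eq. (3.23e).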

3.14 Ensembles
Just as random variables are often regarded as taking on one or another specific value chosen
randomly from some collection of allowed nonrandom values, so too do we often think of
random functions as becoming one or another specific, nonrandom function chosen randomly
from a collection—or ensemble—of allowed nonrandom functions. We can visualize this
situation by imagining an infinitely long row of biased and crooked slot machines, one for every
value of t on the time axis.27 The slot machines do not necessarily behave identically and they are
wired together so that they can influence each other. When a slot machine’s lever is pulled, there
is never any jackpot; all that happens is that another number appears inside its window. Each time
we simultaneously pull all the levers of the slot machines, we randomly choose another member
of the ensemble of allowed functions. The probability pn ( t ) (n) dn that random variable ñ(t) takes
on a value between n and n + dn is just the probability that the slot machine at t takes on a value
between n and n + dn , and it is also the probability that some member function randomly chosen
from the ensemble of allowed functions has a value between n and n + dn at time t. In fact, we
can say that

$$ p_{\tilde{n}(t_1)\tilde{n}(t_2)\cdots\tilde{n}(t_N)}(n_1, n_2, \ldots, n_N) \, dn_1 \, dn_2 \cdots dn_N $$

is the probability, after the slot machine levers are pulled, that the slot machine at t1 has a value
between n1 and n1 + dn1, that the slot machine at t2 has a value between n2 and n2 + dn2, …, and
that the slot machine at tN has a value between nN and nN + dnN. It can also, of course, be thought
of as the probability that a member function randomly chosen from the ensemble of allowed
functions has values at times t1 < t2 < ⋯ < tN that lie between n1 and n1 + dn1, n2 and n2 + dn2,
…, nN and nN + dnN respectively.

27 An objection that could be raised here is that an infinite number of slot machines is only what is called countably
infinite whereas the number of points on the time axis is uncountably infinite, a much “larger” type of infinity. For
our purposes, the distinction between these two types of infinity is not important.

3.15 Stationary Random Functions


A random function ñ(t) is strictly stationary,28 or strict-sense stationary,29 if all its statistical
properties are unaffected when the origin of its time axis is changed (that is, when we change the
point at which t = 0). Mathematically we require, for any t1 < t2 < ⋯ < tN, that the probability
density distribution

$$ p_{\tilde{n}(t_1)\tilde{n}(t_2)\cdots\tilde{n}(t_N)}(n_1, n_2, \ldots, n_N) = p_{\tilde{n}(t_1+\tau)\tilde{n}(t_2+\tau)\cdots\tilde{n}(t_N+\tau)}(n_1, n_2, \ldots, n_N) \tag{3.24a} $$

for any value of τ and all N = 1, 2,… , ∞ . Thus, for any integrable function f with N arguments,

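$$ \int_{-\infty}^{\infty} dn_1 \int_{-\infty}^{\infty} dn_2 \cdots \int_{-\infty}^{\infty} dn_N \,
   f(n_1, n_2, \ldots, n_N) \, p_{\tilde{n}(t_1)\tilde{n}(t_2)\cdots\tilde{n}(t_N)}(n_1, n_2, \ldots, n_N)
 = \int_{-\infty}^{\infty} dn_1 \int_{-\infty}^{\infty} dn_2 \cdots \int_{-\infty}^{\infty} dn_N \,
   f(n_1, n_2, \ldots, n_N) \, p_{\tilde{n}(t_1+\tau)\tilde{n}(t_2+\tau)\cdots\tilde{n}(t_N+\tau)}(n_1, n_2, \ldots, n_N) , \tag{3.24b} $$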

where t1 < t2 < ⋯ < tN and N = 1, 2, …, ∞. This means that, according to Eq. (3.23a),

$$ E\big( f(\tilde{n}(t_1), \tilde{n}(t_2), \ldots, \tilde{n}(t_N)) \big) = E\big( f(\tilde{n}(t_1+\tau), \tilde{n}(t_2+\tau), \ldots, \tilde{n}(t_N+\tau)) \big) \tag{3.24c} $$

for any integrable function f, any value of τ , and N = 1, 2,… , ∞ . We note that when Eq. (3.24c)
holds true,
$$ E\big( f(\tilde{n}(t_1), \tilde{n}(t_2), \ldots, \tilde{n}(t_N)) \big) $$

cannot depend on all the N independent time values t1 , t2 ,…, t N as we might at first suppose. To
see why this is so, we just set τ = −t1 in (3.24c) to get

$$ E\big( f(\tilde{n}(t_1), \tilde{n}(t_2), \ldots, \tilde{n}(t_N)) \big)
 = E\big( f(\tilde{n}(0), \tilde{n}(t_2 - t_1), \tilde{n}(t_3 - t_1), \ldots, \tilde{n}(t_N - t_1)) \big) . \tag{3.24d} $$

28 Paul H. Wirsching, Thomas L. Paez, and Keith Ortiz, Random Vibrations: Theory and Practice (John Wiley and
Sons, Inc., New York, 1995), p. 80.
29 Athanasios Papoulis, Probability, Random Variables, and Stochastic Processes, p. 297.

This shows that
$$ E\big( f(\tilde{n}(t_1), \tilde{n}(t_2), \ldots, \tilde{n}(t_N)) \big) $$

must be a function of just the nonrandom time parameters (t2 − t1 ) , (t3 − t1 ) ,…, (t N − t1 ) and there
are, of course, only N − 1 of these.
Equations (3.24b)–(3.24d) can be understood in terms of the following thought experiment.
We randomly pick some function from the ensemble of allowed functions and choose N time
values t1 < t2 < ⋯ < tN. The randomly picked function has values n1, n2, …, nN at times
t1 , t2 ,…, t N respectively. Next, we create some nonrandom function f that has N arguments and is
not one of those physically unreasonable abstractions that mathematicians specialize in. We
calculate and store the value of f (n1 , n2 ,… , nN ) . Randomly choosing another function from the
ensemble of allowed functions for ñ(t), we again use n1, n2, …, nN at t1, t2, …, tN to calculate and
store a new value of f (n1 , n2 ,… , nN ) . Repeating this procedure enough times to get a large
collection of f values, we average them all together to get a good estimate of

$$ E\big( f(\tilde{n}(t_1), \tilde{n}(t_2), \ldots, \tilde{n}(t_N)) \big) . $$

Shifting to a new set of time values t1 + τ , t2 + τ ,…, t N + τ , we again generate another large
collection of f values, this time averaging them together to get a good estimate of

$$ E\big( f(\tilde{n}(t_1+\tau), \tilde{n}(t_2+\tau), \ldots, \tilde{n}(t_N+\tau)) \big) . $$

Since ñ is strict-sense stationary, we know that no matter what the positive integer N is, and no
matter what the function f is, and no matter what the value of τ is, both collections of f values
always have approximately the same average, with the difference between the averages becoming
less and less as the collections of f values get larger and larger.
To give an example of a random function ñ(t) that is strict-sense stationary, we define

$$ \tilde{n}(t) = \tilde{a} \cos(\omega t) + \tilde{b} \sin(\omega t) , \tag{3.25a} $$

where $\tilde{a}$ and $\tilde{b}$ obey a probability density distribution $p_{\tilde{a}\tilde{b}}(a, b)$ such that $p_{\tilde{a}\tilde{b}}(a, b) \, da \, db$ is the
probability that $\tilde{a}$ takes on a value between a and a + da when $\tilde{b}$ takes on a value between b and
b + db. We can also, just as correctly, say that $p_{\tilde{a}\tilde{b}}(a, b) \, da \, db$ is the probability that $\tilde{b}$ takes on a
value between b and b + db when $\tilde{a}$ takes on a value between a and a + da. We next require

$$ p_{\tilde{a}\tilde{b}}(a, b) = p_{\tilde{a}\tilde{b}}\big( \sqrt{a^2 + b^2} \big) . \tag{3.25b} $$

Equation (3.25b) says that $p_{\tilde{a}\tilde{b}}(a, b)$ is circularly symmetric because it depends on a and b only
through $\sqrt{a^2 + b^2}$, the “radius length” of a point whose x and y coordinates are a, b. Returning to
the slot-machine model for ñ(t) explained in Sec. 3.14, we note that randomly choosing values for
$\tilde{a}$ and $\tilde{b}$ is the same as simultaneously pulling the levers of all the slot machines representing
ñ(t) in Eq. (3.25a). Having pulled the levers and gotten, say, values a1 for $\tilde{a}$ and b1 for $\tilde{b}$, we
then know that the number in the window of the slot machine located at time value t1 is

a1 cos(ω t1 ) + b1 sin(ω t1 ) ,

we know that the number in the window of the slot machine located at time value t2 is

a1 cos(ω t2 ) + b1 sin(ω t2 ) ,

and so on. If we pull all the levers again and get values a2 for $\tilde{a}$ and b2 for $\tilde{b}$, then we know that
the slot machine at t1 has a number
a2 cos(ω t1 ) + b2 sin(ω t1 ) ,

we know the slot machine at t2 has a number

a2 cos(ω t2 ) + b2 sin(ω t2 ) ,

and so on. Because the probability density distribution $p_{\tilde{a}\tilde{b}}(a, b)$ completely determines the
statistics of random variables $\tilde{a}$ and $\tilde{b}$, we see that it must also completely determine the
statistics of ñ(t) in Eq. (3.25a).
    It is not difficult to show that ñ(t) in Eq. (3.25a) is strict-sense stationary when $p_{\tilde{a}\tilde{b}}$ is
circularly symmetric.30 Picking an arbitrary time interval τ, we construct two new random
variables

$$ \tilde{A} = \tilde{a} \cos(\omega\tau) + \tilde{b} \sin(\omega\tau) \tag{3.26a} $$

and

$$ \tilde{B} = \tilde{b} \cos(\omega\tau) - \tilde{a} \sin(\omega\tau) . \tag{3.26b} $$

30 Athanasios Papoulis, Probability, Random Variables, and Stochastic Processes, p. 301.

The reverse transformation to Eqs. (3.26a) and (3.26b) is, of course,

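$$ \tilde{a} = \tilde{A} \cos(\omega\tau) - \tilde{B} \sin(\omega\tau) \tag{3.26c} $$

and

$$ \tilde{b} = \tilde{B} \cos(\omega\tau) + \tilde{A} \sin(\omega\tau) , \tag{3.26d} $$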

which we can find by solving Eqs. (3.26a) and (3.26b) for $\tilde{a}$ and $\tilde{b}$ in terms of $\tilde{A}$ and $\tilde{B}$.
Equations (3.26a) and (3.26b) state that if random variables $\tilde{a}$ and $\tilde{b}$ take on the values a and b,
then random variables $\tilde{A}$ and $\tilde{B}$ must take on the values

a cos(ωτ ) + b sin(ωτ )
and
b cos(ωτ ) − a sin(ωτ )

respectively. Similarly Eqs. (3.26c) and (3.26d) state that if random variables $\tilde{A}$ and $\tilde{B}$ take on
values A and B, then random variables $\tilde{a}$ and $\tilde{b}$ must take on values