DIGITAL COMMUNICATIONS

Lecture notes
© Prof. Giorgio Taricco
Politecnico di Torino
2010/2011
Copyright Notice
All material in this course is the property of the Author. Copyright
and other intellectual property laws protect these materials.
Reproduction or retransmission of the materials, in whole or in
part, in any manner, without the prior written consent of the
copyright holder, is a violation of copyright law. A single printed
copy of the materials available through this course may be made,
solely for personal, noncommercial use. Individuals must preserve
any copyright or other notices contained in or associated with
them. Users may not distribute such copies to others, whether or
not in electronic form, whether or not for a charge or other
consideration, without prior written consent of the copyright holder
of the materials.
Outline
1. Basic concepts
2. Digital modulations over the AWGN channel
3. Information Theory
4. Channel codes
5. ISI and Equalization
Section Outline
1. Basic concepts
   - Model of a digital communication system
   - Band-pass signalling
   - Problem set 1
   - Signal spaces
   - Problem set 2
Reference books in Digital Communications
S. Benedetto and E. Biglieri, Principles of Digital Transmission:
With Wireless Applications. Kluwer, 1999.
U. Madhow, Fundamentals of Digital Communication. Cambridge
University Press, 2008.
J. Proakis and M. Salehi, Digital Communications (4th Edition).
McGraw-Hill, 2008.
B. Sklar, Digital communications: fundamentals and applications.
Prentice-Hall, 2001.
D. Tse and P. Viswanath, Fundamentals of Wireless
Communication. Cambridge University Press, 2005.
J.M. Wozencraft and I.M. Jacobs, Principles of communication
engineering. Wiley, 1965.
Model of a digital communication system

CHANNEL MODEL
- The main subject of this course is the study of digital communications over a wireless channel.
- To this purpose, it is useful to characterize the model of a digital communication system in order to get acquainted with its different constituent parts.

TOP LEVEL CLASSIFICATION
The model can be divided into three sections, as illustrated in the following picture:
1. The user section
2. The interface section
3. The channel section
Model of a digital communication system

[Block diagram: TX → ENCODER → MODULATOR → CHANNEL → DEMODULATOR → DECODER → RX; TX/RX form the user section, encoder/decoder and modulator/demodulator the interface section, and the channel the channel section]
Model of a digital communication system

ENCODER
- Implements source encoding to limit the amount of transmitted data (for example, voice can be encoded at 4 kbit/s or sent at 64 kbit/s with conventional telephony).
- Implements channel encoding to limit the effect of channel disturbances.

MODULATOR
- Converts the digital signal into a waveform to be transmitted over the channel.

CHANNEL
- Reproduces the transmitted waveform at the receiver.
- Its operation is affected by frequency distortion, fading, additive noise, and other disturbances.
Model of a digital communication system

DEMODULATOR
- Converts the received waveform into a sequence of samples to be processed by the decoder.

DECODER
- Implements channel decoding to limit the effect of the errors introduced by the channel.
- Implements source decoding.
Band-pass signals

- A band-pass signal has spectral components in a limited range of frequencies $f \in (-f_2, -f_1) \cup (f_1, f_2)$, provided that $0 < f_1 < f_2$.

[Figure: two-sided amplitude spectrum, nonzero over $(-f_2,-f_1)$ and $(f_1,f_2)$, with bandwidth $B_x$]

- A certain frequency in the range $(f_1, f_2)$ (usually the middle frequency) is called the carrier frequency and is denoted by $f_c$.
- The signal bandwidth is $B_x = f_2 - f_1$.
- It is often convenient to represent band-pass signals as equivalent complex signals with a low-pass frequency spectrum (i.e., including the zero frequency).
The analytic signal

A real band-pass signal $x(t)$ can be mapped to a complex analytic signal $x_a(t)$ by passing it through a linear filter with transfer function $2u(f) = 2\cdot 1_{f>0}$:
$$x(t) \;\rightarrow\; \boxed{\,2u(f)\,} \;\rightarrow\; x_a(t)$$
where $u(\cdot) = 1_{(\cdot)>0}$ is the unit step function (equal to 1 for a positive argument and 0 for a negative one).

Summarizing:
- The analytic signal is a complex representation of a real signal.
- It is used to simplify the analysis of modulated signals.
- It generalizes the concept of phasor used in electronics.
- The basic properties of the analytic signal derive from the Fourier transform.
The analytic signal (cont.)

If $x(t)$ is a real signal, then its Fourier transform is a Hermitian function since:
$$X(f)^* = \int_{-\infty}^{\infty} x(t)^*\,e^{+j2\pi ft}\,dt = \int_{-\infty}^{\infty} x(t)\,e^{-j2\pi(-f)t}\,dt = X(-f)$$
Therefore, the spectrum is completely determined by its positive frequency (or negative frequency) part.
Hilbert transform

Since
$$X_a(f) = 2u(f)X(f) = X(f) + \mathrm{sgn}(f)X(f) \triangleq X(f) + j\,\hat{X}(f),$$
applying $\mathcal{F}^{-1}$ yields:
$$x_a(t) = x(t) + j\,\hat{x}(t).$$
The signal $\hat{x}(t)$ is called the Hilbert transform of $x(t)$:
$$\hat{x}(t) = x(t) * \frac{1}{\pi t} = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{x(\tau)}{t-\tau}\,d\tau$$
Hilbert transform (cont.)

Here, the Cauchy principal part of the integral has been taken, namely,
$$\lim_{\epsilon\to 0,\ T\to\infty}\left(\int_{-T}^{t-\epsilon} + \int_{t+\epsilon}^{T}\right)\frac{x(\tau)}{t-\tau}\,d\tau$$
Spectral properties

Assuming $x(t)$ is a zero-mean stationary real random process, we have
$$G_{x_a}(f) = |2u(f)|^2\,G_x(f) = 4u(f)\,G_x(f)$$
Therefore,
$$G_x(f) = \frac{1}{4}\left[G_{x_a}(f) + G_{x_a}(-f)\right]$$
Moreover,
$$E[|x_a(t)|^2] = \int_0^{\infty} 4G_x(f)\,df = 2\int_{-\infty}^{\infty} G_x(f)\,df = 2\,E[x(t)^2]$$
Band-pass signalling

Assume that $x(t)$ is a zero-mean stationary band-pass random process with bandwidth $B_x$ and carrier frequency $f_c$, so that its power density spectrum is nonzero over the frequencies
$$f \in (-f_c - B_x/2,\ -f_c + B_x/2) \cup (f_c - B_x/2,\ f_c + B_x/2)$$
where $f_c > B_x/2 > 0$.
We define the baseband complex envelope of $x(t)$ as
$$\tilde{x}(t) = x_a(t)\,e^{-j2\pi f_c t}$$
The complex envelope is sometimes called the baseband equivalent signal.
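As a quick numerical illustration (my own addition, not part of the notes), the analytic signal and the complex envelope can be computed with scipy.signal.hilbert; the carrier frequency, sampling rate, and test envelope below are arbitrary choices.

```python
import numpy as np
from scipy.signal import hilbert

fs, fc = 1000.0, 100.0                 # sampling rate and carrier (arbitrary)
t = np.arange(0, 1, 1 / fs)

# Band-pass test signal: a slow envelope modulating the carrier
env = 1 + 0.5 * np.cos(2 * np.pi * 5 * t)
x = env * np.cos(2 * np.pi * fc * t)

x_a = hilbert(x)                       # analytic signal x(t) + j x_hat(t)
x_tilde = x_a * np.exp(-1j * 2 * np.pi * fc * t)   # complex envelope

# |x_tilde| should recover the slow envelope (away from the edges)
print(np.max(np.abs(np.abs(x_tilde) - env)[50:-50]))   # small residual
```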
The complex envelope $\tilde{x}(t)$

Then, we derive the autocorrelation function and the power density spectrum of $\tilde{x}(t)$:
$$R_{\tilde{x}}(\tau) = E\left[x_a(t+\tau)\,e^{-j2\pi f_c(t+\tau)}\;x_a^*(t)\,e^{+j2\pi f_c t}\right] = R_{x_a}(\tau)\,e^{-j2\pi f_c\tau} \;\Longrightarrow\; G_{\tilde{x}}(f) = G_{x_a}(f + f_c).$$
Then, the power density spectrum of $\tilde{x}(t)$ is nonzero over the frequencies $f \in (-B_x/2, B_x/2)$, i.e., it is a baseband signal with bandwidth $B_x/2$.
In-phase and quadrature components

The real and imaginary parts of $\tilde{x}(t) = x_c(t) + j\,x_s(t)$ are called the in-phase and quadrature components of the signal.
They can be expressed in terms of the signal itself and of its Hilbert transform:
$$\begin{cases} x_c(t) = \Re\mathrm{e}[x_a(t)\,e^{-j2\pi f_c t}] = x(t)\cos(2\pi f_c t) + \hat{x}(t)\sin(2\pi f_c t)\\ x_s(t) = \Im\mathrm{m}[x_a(t)\,e^{-j2\pi f_c t}] = \hat{x}(t)\cos(2\pi f_c t) - x(t)\sin(2\pi f_c t)\end{cases}$$
The previous relationships can be inverted and yield:
$$\begin{cases} x(t) = \Re\mathrm{e}[\tilde{x}(t)\,e^{j2\pi f_c t}] = x_c(t)\cos(2\pi f_c t) - x_s(t)\sin(2\pi f_c t)\\ \hat{x}(t) = \Im\mathrm{m}[\tilde{x}(t)\,e^{j2\pi f_c t}] = x_s(t)\cos(2\pi f_c t) + x_c(t)\sin(2\pi f_c t)\end{cases}$$
Frequency up-conversion = modulation

The modulation (or frequency up-conversion) of a real signal $x(t)$ consists in the following operation:
$$x(t) \;\rightarrow\; x(t)\cos(2\pi f_c t).$$
The modulation of a couple of real signals $x_c(t)$ and $x_s(t)$ consists in the following operation:
$$[x_c(t),\ x_s(t)] \;\rightarrow\; x_c(t)\cos(2\pi f_c t) - x_s(t)\sin(2\pi f_c t).$$
In the analytic signal domain, modulation can be represented as follows:
$$\tilde{x}(t) \;\rightarrow\; x_a(t) = \tilde{x}(t)\,e^{+j2\pi f_c t}.$$
Frequency down-conversion = demodulation

Therefore, demodulation (or frequency down-conversion) in the analytic signal domain is represented as follows:
$$x_a(t) \;\rightarrow\; \tilde{x}(t) = x_a(t)\,e^{-j2\pi f_c t}.$$
Correspondingly, in the real signal domain, demodulation can be represented by:
$$\begin{cases} x_c(t) = x(t)\cos(2\pi f_c t) + \hat{x}(t)\sin(2\pi f_c t)\\ x_s(t) = \hat{x}(t)\cos(2\pi f_c t) - x(t)\sin(2\pi f_c t)\end{cases}$$
Frequency down-conversion = demodulation (cont.)

In a real system, demodulation can be implemented by observing that:

MULTIPLICATION BY IN-PHASE CARRIER
$$x(t)\cos(2\pi f_c t + \varphi)\cdot 2\cos(2\pi f_c t + \varphi) = x(t) + x(t)\cos(4\pi f_c t + 2\varphi)$$
In other words, multiplication of the signal $x(t)\cos(2\pi f_c t + \varphi)$ by the phase-coherent sinusoid $2\cos(2\pi f_c t + \varphi)$ returns the superposition of
- the modulating signal $x(t)$;
- another modulated signal with carrier frequency $2f_c$.
Low-pass filtering with bandwidth $B_x$ eliminates the modulated signal with carrier frequency $2f_c$.
Demodulator

The following picture illustrates the block diagram of a demodulator with input:
$$x(t) = x_c(t)\cos(2\pi f_c t) - x_s(t)\sin(2\pi f_c t).$$
[Block diagram: $x(t)$ is multiplied by $2\cos(2\pi f_c t)$ and by $-2\sin(2\pi f_c t) = 2\cos(2\pi f_c t + \pi/2)$; each product passes through a low-pass filter, yielding $x_c(t)$ and $x_s(t)$, respectively]
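A discrete-time sketch of this demodulator (my own illustration; the carrier, rates, filter order, and cutoff are arbitrary choices, not from the notes):

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs, fc = 10000.0, 1000.0
t = np.arange(0, 0.1, 1 / fs)

# Example in-phase and quadrature components, slow compared with fc
xc = np.cos(2 * np.pi * 30 * t)
xs = np.sin(2 * np.pi * 50 * t)
x = xc * np.cos(2 * np.pi * fc * t) - xs * np.sin(2 * np.pi * fc * t)

# Multiply by the coherent carriers, then low-pass filter out the 2*fc terms
b, a = butter(4, 200 / (fs / 2))      # 200 Hz cutoff, well below 2*fc
xc_hat = filtfilt(b, a, 2 * x * np.cos(2 * np.pi * fc * t))
xs_hat = filtfilt(b, a, -2 * x * np.sin(2 * np.pi * fc * t))

print(np.max(np.abs(xc_hat[200:-200] - xc[200:-200])))   # small residual
print(np.max(np.abs(xs_hat[200:-200] - xs[200:-200])))
```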
Problem set 1

1. Calculate the analytic signal corresponding to
   - $x(t) = \cos(2\pi f_c t + \varphi)$.
   - $x(t) = \mathrm{sinc}(t)\cos(20\pi t)$.
   - $x(t) = \mathrm{sinc}(t)^2\cos(20\pi t)$.
2. Calculate the baseband equivalent signal corresponding to
   - $x(t) = \cos(41\pi t) + 2\sin(39\pi t)$, $f_c = 20$.
   - $x(t) = \mathrm{sinc}(t)\cos(20\pi t)$, $f_c = 10$.
   - $x(t) = \mathrm{sinc}(t)\cos(20\pi t)$, $f_c = 9$.
Signal spaces

- Signal spaces are linear (or vector) spaces built upon the concept of Hilbert space, i.e., finite- or infinite-dimensional complete inner product spaces.
- The elements of a signal space are real or complex signals $x(t)$ defined over a support interval $\mathcal{I}$, for example $\mathcal{I} = (0, T)$.
- The inner product of two elements (signals) $x$ and $y$ is defined as
$$(x, y) = \int_{\mathcal{I}} x(t)\,y^*(t)\,dt. \tag{1}$$
- Correspondingly, the induced norm of $x$ is given by
$$\|x\| \triangleq (x, x)^{1/2}. \tag{2}$$
- Accordingly, the set
$$L^2(\mathcal{I}) = \{x : \|x\| < \infty\}$$
is defined as a signal space.
Signal spaces (cont.)

- Inner products satisfy the Cauchy-Schwarz inequality:
$$|(x, y)| \le \|x\|\cdot\|y\|.$$
- If $|(x, y)| = \|x\|\cdot\|y\|$, then the two signals are proportional, i.e., $y(t) = \alpha\,x(t)$ for some $\alpha \in \mathbb{C}$.
- A signal $x(t) \in \mathcal{H} \subseteq L^2(\mathcal{I})$ if $\|x\|$ is finite.
- The squared norm of a signal $x(t)$ is the signal energy:
$$E(x) \triangleq (x, x) = \int_{\mathcal{I}} |x(t)|^2\,dt = \|x\|^2.$$
Signal spaces (cont.)

- A finite-dimensional signal space $\mathcal{H} \subseteq L^2(\mathcal{I})$ is identified through a base $[\phi_n(t)]_{n=1}^{N}$.
- The elements of a base have the following orthogonality property:
$$(\phi_m, \phi_n) = \begin{cases}1 & m = n\\ 0 & \text{otherwise}.\end{cases}$$
- A signal $x(t) \in \mathcal{H} \subseteq L^2(\mathcal{I})$ can be represented by the expansion
$$x(t) = \sum_{n=1}^{N} x_n\,\phi_n(t).$$
Signal spaces (cont.)

- The coefficients $x_n$ in this expansion can be calculated by
$$x_n = (x, \phi_n) = \int_{\mathcal{I}} x(t)\,\phi_n(t)^*\,dt.$$
- In many cases, a signal space $\mathcal{H}$ is defined as the set of all possible linear combinations of a set of signals:
$$\mathcal{H} = \mathcal{L}(s_1, \ldots, s_M) = \left\{x(t) = \alpha_1 s_1(t) + \cdots + \alpha_M s_M(t),\ (\alpha_1, \ldots, \alpha_M) \in \mathbb{C}^M\right\}.$$
- The set $\mathcal{L}(s_1, \ldots, s_M)$ is called the linear span of $s_1(t), \ldots, s_M(t)$.
- In general, the signal set $\{s_1, \ldots, s_M\}$ is not a base, but a base can be found by using the Gram-Schmidt algorithm.
Signal spaces (cont.)

The Gram-Schmidt algorithm finds a base $(\phi_n)_{n=1}^{N}$ of $\mathcal{H} = \mathcal{L}(s_1, \ldots, s_M)$ by the following set of iterative equations.
For $k = 1, \ldots, M$:
$$\begin{cases} d_k = s_k - \displaystyle\sum_{i=1}^{k-1} (s_k, \phi_i)\,\phi_i & \text{(projection step)}\\[2mm] \phi_k = \dfrac{d_k}{\|d_k\|} & \text{(normalization step)}\end{cases}$$
Signal spaces (cont.)

- At every projection step such that $d_k = 0$, the corresponding $\phi_k$ is not assigned and not accounted for in the remaining steps.
- The number of signals in the base is the number of dimensions of $\mathcal{H}$.
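A minimal numerical sketch of these steps on sampled signals (my own example, not from the notes); inner products become sums multiplied by the sampling step, and null projections are skipped via a tolerance.

```python
import numpy as np

def gram_schmidt(signals, dt, tol=1e-10):
    """Orthonormal base of the linear span of sampled signals.

    signals: list of 1-D arrays sampled with step dt.
    Inner product (x, y) = sum(x * conj(y)) * dt.
    """
    base = []
    for s in signals:
        d = s.astype(complex)
        for phi in base:
            d = d - np.sum(s * np.conj(phi)) * dt * phi   # projection step
        norm = np.sqrt(np.sum(np.abs(d) ** 2) * dt)
        if norm > tol:                                    # skip if d_k = 0
            base.append(d / norm)                         # normalization step
    return base

# The three rectangular signals used in Problem set 2 below, on (0, 1)
t = np.arange(0, 1, 1e-4)
s1 = ((t > 0) & (t < 0.6)).astype(float)
s2 = ((t > 0.4) & (t < 1)).astype(float)
s3 = np.ones_like(t)
print(len(gram_schmidt([s1, s2, s3], dt=1e-4)))   # dimension of L(s1, s2, s3)
```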
Signal spaces (cont.)

The Gram-Schmidt algorithm works since, at every step, the signal $d_k(t)$ is orthogonal to all previously generated signals $\phi_i(t)$, $i = 1, \ldots, k-1$. In fact,
$$(d_k, \phi_i) = \left(s_k - \sum_{\ell=1}^{k-1}(s_k, \phi_\ell)\,\phi_\ell,\ \phi_i\right) = (s_k, \phi_i) - \sum_{\ell=1}^{k-1}(s_k, \phi_\ell)(\phi_\ell, \phi_i) = (s_k, \phi_i) - (s_k, \phi_i) = 0.$$
Signal spaces (cont.)

The projection of a signal $x(t) \in \mathcal{H}$ over the subspace $\mathcal{Y} = \mathcal{L}(\phi_1, \ldots, \phi_N) \subseteq \mathcal{H}$ is a signal $x_Y(t)$ with the following properties:
- It can be expressed through the base of $\mathcal{Y}$ as follows:
$$x_Y(t) = \sum_{n=1}^{N} (x, \phi_n)\,\phi_n(t).$$
- It is the closest signal in $\mathcal{Y}$ to $x(t)$:
$$x_Y = \arg\min_{y\in\mathcal{Y}} \|x - y\|.$$
Problem set 2

Assume support interval $\mathcal{I} = (0, 1)$.
1. Let $\mathcal{H}$ be the linear span of $\cos(2\pi t)$ and $\sin(2\pi t)$. Determine if the signal $\cos(2\pi t + \pi/4)$ belongs to $\mathcal{H}$.
2. Given the signals $s_1(t) = u(t) - u(t-0.6)$, $s_2(t) = u(t-0.4) - u(t-1)$, and $s_3(t) = u(t) - u(t-1)$, apply the Gram-Schmidt algorithm to find a base of $\mathcal{L}(s_1, s_2, s_3)$ [$u(t) = 0$ for $t < 0$ and 1 for $t > 0$ is the unit step function].
3. Given the signals $s_1(t) = \cos(2\pi t)$ and $s_2(t) = \sin(6\pi t)$, apply the Gram-Schmidt algorithm to find a base of $\mathcal{L}(s_1, s_2)$.
4. Check Schwarz's inequality for the signals $s_1(t) = u(t) - u(t-0.6)$, $s_2(t) = u(t-0.4) - u(t-1)$, and $s_3(t) = u(t) - u(t-1)$.
5. Let $\mathcal{Y}$ be the linear span of $\sin(2\pi t)$ and $\sin(6\pi t)$. Find the projection of $x(t) = u(t) - u(t-1)$ over $\mathcal{Y}$ and calculate $\|x - x_Y\|^2$.
Outline
1. Basic concepts
2. Digital modulations over the AWGN channel
3. Information Theory
4. Channel codes
5. ISI and Equalization

Section Outline
2. Digital modulations over the AWGN channel
   - Additive White Gaussian Noise (AWGN) Channel
   - Linear digital modulation
   - Digital receiver design
   - Baseband digital modulation
   - Band-pass digital signalling
   - Signal detection
   - Standard digital modulations
   - Problem set 3
   - Power density spectrum of digital modulated signals
   - Comparison of digital modulations
   - Problem set 4
AWGN channel

This channel model is specified by the equation
$$y(t) = A\,x(t) + z(t) \tag{3}$$
where:
- $A$ is the real channel gain.
- $x(t)$ and $y(t)$ are the channel input and output signals.
- $z(t)$ is the zero-mean additive white Gaussian noise process. It has autocorrelation function and power density spectrum:
$$R_z(\tau) = E[z(t+\tau)\,z(t)] = \frac{N_0}{2}\,\delta(\tau), \qquad G_z(f) = \frac{N_0}{2}.$$
Linear modulations

We consider the following modulated signal:
$$x(t; a) = \sum_{n=1}^{N} a_n\,\phi_n(t) \tag{4}$$
where:
- The vector $a = (a_n)_{n=1}^{N}$ represents a modulation symbol vector and is taken from a finite set $\mathcal{A} = \{\alpha_1, \ldots, \alpha_M\}$.
- $\mathcal{A}$ is called the modulation alphabet or signal constellation.
- $\phi_n(t)$ is the $n$th shaping pulse of the modulated signal.
- $N$ is the number of dimensions of the modulation scheme.
- We assume that each $\phi_n(t) \neq 0$ only for $t \in (0, T)$.
- We also assume that $(\phi_m, \phi_n) = \delta_{mn} = 1$ if $m = n$ and 0 otherwise (Kronecker's delta).
Linear modulations (cont.)

- The signal $x(t; a)$ allows us to send one symbol vector every $T$ time units, so $T$ is called the symbol time, symbol interval, or signalling interval.
- The signal $x(t; a)$ belongs to the Hilbert space $\mathcal{H}$ generated by all possible linear combinations of the base signals $(\phi_n)_{n=1}^{N}$.
- The corresponding received signal
$$y(t) = A\sum_{n=1}^{N} a_n\,\phi_n(t) + z(t) \tag{5}$$
does not necessarily belong to $\mathcal{H} = \mathcal{L}(\phi_1, \ldots, \phi_N)$.
Linear modulations (cont.)

The projection of $y(t)$ on $\mathcal{H}$ is given by:
$$y_H(t) = \sum_{n=1}^{N} (A\,a_n + z_n)\,\phi_n(t), \tag{6}$$
where $z_n = (z, \phi_n) = \int_0^T z(t)\,\phi_n(t)\,dt$ is a Gaussian random variable representing the noise component along the $n$th dimension of $\mathcal{H}$.
Receiver design

- The goal of a digital receiver is to recover the transmitted symbol vector $a$ from the received signal $y(t)$.
- The correlation receiver projects $y(t)$ on $\mathcal{H}$ to obtain (6) and outputs the coefficients $y_n = A\,a_n + z_n$.
- In the absence of noise, the correlation receiver output is a scaled version (by the channel gain $A$) of the transmitted symbol vector.
- When noise is present, the receiver guesses which symbol vector from $\mathcal{A}$ was transmitted with the goal of minimizing the error probability.
- This process is called detection or decision.
Receiver design (cont.)

The correlation receiver can be interpreted as a matched filter by observing that:
$$y_n = \int_0^T y(t)\,\phi_n(t)\,dt = \int_0^T y(t)\,h_n(T - t)\,dt = \left[y(t) * h_n(t)\right]_{t=T},$$
where we defined the impulse response of the matched filter as $h_n(t) = \phi_n(T - t)$.
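The correlator/matched-filter equivalence is easy to check numerically; a small sketch (my own, with an arbitrary unit-energy pulse): the correlation integral and the filter output sampled at $t = T$ coincide.

```python
import numpy as np

T, fs = 1.0, 1000.0
t = np.arange(0, T, 1 / fs)
phi = np.sqrt(2 / T) * np.sin(np.pi * t / T)       # unit-energy pulse on (0, T)

rng = np.random.default_rng(0)
y = 0.7 * phi + 0.1 * rng.standard_normal(t.size)  # received A*a*phi(t) + noise

# Correlation receiver: integral of y(t) phi(t) over (0, T)
y_corr = np.sum(y * phi) / fs

# Matched filter h(t) = phi(T - t), output sampled at t = T
h = phi[::-1]
y_mf = np.convolve(y, h)[t.size - 1] / fs

print(np.isclose(y_corr, y_mf))   # True: the two statistics are identical
```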
Sequential receiver design

The signalling model described can be repeated over many symbol times.
We can write the sequential modulated signal as:
$$x(t; a_0, \ldots, a_L) = \sum_{i=0}^{L}\sum_{n=1}^{N} a_{i,n}\,\phi_n(t - iT).$$
The corresponding received signal over the AWGN channel is:
$$y(t) = A\sum_{i=0}^{L}\sum_{n=1}^{N} a_{i,n}\,\phi_n(t - iT) + z(t).$$
Sequential receiver design (cont.)

The matched filter structure lends itself to a sequential implementation accounting for the transmission of successive modulated symbols in time.
The bank of matched filters receiver is illustrated as follows:
[Block diagram: $y(t)$ feeds a bank of matched filters $\phi_1(T-t), \phi_2(T-t), \ldots, \phi_N(T-t)$, sampled at $t = (i+1)T$ to produce the outputs $y_{i,1}, y_{i,2}, \ldots, y_{i,N}$]
Sequential receiver design (cont.)

The output of the $n$th matched filter at time $t = (i+1)T$ is given by:
$$\begin{aligned}
y_{i,n} &= \left[y(t) * \phi_n(T - t)\right]_{t=(i+1)T}\\
&= \int_{-\infty}^{\infty} y(t_1)\,\phi_n\big(T - (i+1)T + t_1\big)\,dt_1\\
&= \int_0^T y(t_2 + iT)\,\phi_n(t_2)\,dt_2\\
&= A\sum_{j=0}^{L}\sum_{m=1}^{N} a_{j,m}\int_0^T \phi_m\big(t_2 + (i-j)T\big)\,\phi_n(t_2)\,dt_2 + \int_0^T z(t_2 + iT)\,\phi_n(t_2)\,dt_2\\
&= A\,a_{i,n} + z_{i,n}
\end{aligned}$$
This is just an extension of the digital receiver operation described over the first signalling interval $(0, T)$.
Baseband digital modulation

- If the symbols $a_n$ and the shaping pulses $\phi_n(t)$ are real, $x(t; a)$ in (4) represents a baseband digitally modulated signal.
- A simple example of digital modulation is the binary antipodal modulation with alphabet $\mathcal{A} = \{\pm 1\}$.
- If $a = (+1, +1, -1, -1, -1, +1, -1, +1, +1, -1)$, the signal is illustrated as follows.
[Figure: antipodal waveform $x(t)$ built from a rectangular shaping pulse $g(t)$ of duration $T$; the amplitude sign follows the symbol sequence]
Baseband digital modulation (cont.)

Another example (same data symbols).
[Figure: the same symbol sequence transmitted with a smooth (sinusoidal) shaping pulse]
Baseband digital modulations are represented in a one-dimensional space.
Band-pass modulation

We start from the baseband equivalent signal:
$$\tilde{x}(t; a) = a\,g(t),$$
where $g(t)$ is a real baseband signal of bandwidth much lower than $f_c$ and $a \in \mathcal{A}$ is a complex modulation symbol.
Then, we obtain the corresponding band-pass signal as:
$$x(t; a) = \Re\mathrm{e}\left[\tilde{x}(t; a)\,e^{j2\pi f_c t}\right] = \Re\mathrm{e}(a)\,[g(t)\cos(2\pi f_c t)] + \Im\mathrm{m}(a)\,[-g(t)\sin(2\pi f_c t)].
$$
Band-pass modulation (cont.)

The signal $x(t; a)$ can be interpreted as a two-dimensional linear modulation.
In fact, it can be represented as a linear combination of:
$$\begin{cases} \phi_1(t) = g(t)\cos(2\pi f_c t)\\ \phi_2(t) = -g(t)\sin(2\pi f_c t)\end{cases} \tag{7}$$
with coefficients $\Re\mathrm{e}(a)$ and $\Im\mathrm{m}(a)$.
The two orthogonal signals $(\phi_1, \phi_2)$ are the base of a two-dimensional signal space $\mathcal{H}$, provided that $\mathcal{F}[g(t)] = 0$ for $|f| \ge f_c$ and $\|g\|^2 = 2$.
Detection of transmitted symbols

- The receiver outputs an estimate of the transmitted symbol $a$ based on the received signal $y(t)$ over $t \in (0, T)$.
- The first stage of the receiver converts $y(t)$ into the vector
$$y = A\,a + z$$
where $a = (a_1, \ldots, a_N)$ and $z = (z_1, \ldots, z_N)$.
- We define a generic decision rule (or detection rule):
$$\hat{a}(y) = (\hat{a}_1(y), \ldots, \hat{a}_N(y)). \tag{8}$$
- $\hat{a}(y)$ maps $\mathcal{H}$ into the modulation alphabet $\mathcal{A}$.
- The decision rule can be optimized according to some goodness criterion.
- Typically, the goal is minimizing the error probability.
Optimum digital receiver

We can write the (average) error probability as follows:
$$P(e) = \sum_{m=1}^{M} P(\alpha_m)\,P(e \mid \alpha_m) \tag{9}$$
where
- $P(\alpha_m)$ is the a priori probability of transmitting $\alpha_m$.
- $P(e \mid \alpha_m)$ is the probability of error conditioned on the transmission of $\alpha_m$.
We notice that
$$P(e \mid \alpha_m) = P\big(\hat{a}(y = A\alpha_m + z) \neq \alpha_m\big),$$
i.e., the probability that the decision rule returns a symbol different from the transmitted one.
Optimum digital receiver (cont.)

It is plain to see that minimizing the (average) error probability is equivalent to maximizing the (average) probability of correct decision $P(c)$, since $P(c) = 1 - P(e)$.
Let us define:
- The pdf of $y$ given the transmitted symbol $\alpha$: $f(y \mid \alpha)$.
- The decision regions $\mathcal{R}_m \triangleq \{y : \hat{a}(y) = \alpha_m\}$, $m = 1, \ldots, M$.
Since the decision rule $\hat{a}(y)$ is a well defined function (i.e., single-valued) for all $y \in \mathcal{H} = \mathbb{R}^N$ (the signal space), the decision regions do not intersect and their union fills $\mathcal{H}$ itself:
$$\biguplus_{m=1}^{M} \mathcal{R}_m = \mathcal{H}$$
($\biguplus$ denotes the union of disjoint sets).
Optimum digital receiver (cont.)

Now, we can write the probability of correct decision as a function of the a priori probabilities $P(\alpha_m)$, $f(y \mid \alpha)$, and the decision rule. Since $\hat{a}(y) = \alpha_m$ for $y \in \mathcal{R}_m$, we get:
$$\begin{aligned}
P(c) &= \sum_{m=1}^{M} P(\alpha_m)\,P(\hat{a}(y) = \alpha_m \mid \alpha_m)\\
&= \sum_{m=1}^{M}\int_{y\in\mathcal{R}_m} P(\alpha_m)\,f(y \mid \alpha_m)\,dy\\
&= \sum_{m=1}^{M}\int_{y\in\mathcal{R}_m} P(\hat{a}(y))\,f(y \mid \hat{a}(y))\,dy\\
&= \int_{\mathcal{H}=\mathbb{R}^N} P(\hat{a}(y))\,f(y \mid \hat{a}(y))\,dy
\end{aligned} \tag{10}$$
Optimum digital receiver (cont.)

Maximizing $P(c)$ requires maximizing the integrand in (10), which can be accomplished by selecting the symbol $\alpha \in \mathcal{A}$ that maximizes
$$P(\alpha)\,f(y \mid \alpha)$$
for all possible received vectors $y$.
The resulting optimum decision rule is:
$$\hat{a}_{\mathrm{opt}}(y) = \arg\max_{\alpha\in\mathcal{A}} P(\alpha)\,f(y \mid \alpha). \tag{11}$$
Optimum digital receiver (cont.)

Since, applying the Bayes rule, we have
$$P(\alpha_m \mid y) = \frac{P(\alpha_m)\,f(y \mid \alpha_m)}{f(y)},$$
the optimum decision rule is equivalent to maximizing the a posteriori probability $P(\alpha_m \mid y)$.
Thus, the optimum decision rule is called maximum a-posteriori (MAP) decision.
Optimum digital receiver (cont.)

When transmitted symbols are equiprobable, i.e., $P(\alpha_m) = 1/M$, the MAP rule reduces to a maximum likelihood (ML) rule:
$$\hat{a}(y) = \arg\max_{\alpha_m} f(y \mid \alpha_m)$$
This name comes from the name of the functions $f(y \mid \alpha_m)$ (likelihood functions in radar theory).
Optimum digital receiver (cont.)

The decision regions can be represented as follows:
$$\mathcal{R}_m = \begin{cases} \{y : P(\alpha_m)\,f(y \mid \alpha_m) > P(\alpha_n)\,f(y \mid \alpha_n)\ \ \forall n \neq m\} & \text{MAP}\\ \{y : f(y \mid \alpha_m) > f(y \mid \alpha_n)\ \ \forall n \neq m\} & \text{ML}\end{cases}$$
Special case: The AWGN channel

Proposition. The additive noise components of an AWGN channel are iid Gaussian random variables with zero mean and variance $N_0/2$.

Proof.
We have, by definition,
$$z_n = \int_0^T z(t)\,\phi_n(t)\,dt$$
for $n = 1, \ldots, N$.
Then,
$$E[z_n] = \int_0^T E[z(t)]\,\phi_n(t)\,dt = 0$$
since the additive noise random process has zero mean.
Special case: The AWGN channel (cont.)

Moreover,
$$\begin{aligned}
E[z_n z_{n'}] &= \int_0^T\!\!\int_0^T E[z(t)\,z(t')]\,\phi_n(t)\,\phi_{n'}(t')\,dt\,dt'\\
&= \int_0^T\!\!\int_0^T \frac{N_0}{2}\,\delta(t - t')\,\phi_n(t)\,\phi_{n'}(t')\,dt\,dt'\\
&= \int_0^T \frac{N_0}{2}\,\phi_n(t)\,\phi_{n'}(t)\,dt\\
&= \frac{N_0}{2}\,\delta_{n,n'}.
\end{aligned}$$
In other words, different components of the noise vector $z$ are uncorrelated (and hence independent, since Gaussian), and each one has variance $N_0/2$.
Special case: The AWGN channel (cont.)

As a result, the conditional pdf of the received vector $y$ is
$$f(y \mid \alpha) = f_z(y - A\alpha) = (\pi N_0)^{-N/2}\,e^{-\|y - A\alpha\|^2/N_0}. \tag{12}$$
It is worth noting that the joint pdf (12) depends on the distance of the received signal from the transmitted one scaled by the channel gain $A$.
Using (12), the logarithms of the likelihood functions are readily obtained as follows:
$$\ln f(y \mid \alpha_m) = -\frac{N}{2}\ln(\pi N_0) - \frac{1}{N_0}\,\|y - A\alpha_m\|^2.$$
Since these functions depend on a distance, they are called decision metrics.
Special case: The AWGN channel (cont.)

The MAP and ML decision rules can be expressed in terms of decision metrics for the AWGN channel as follows:
$$\hat{a}(y) = \begin{cases} \arg\min_{\alpha_m}\left[\|y - A\alpha_m\|^2 - N_0\ln P(\alpha_m)\right] & \text{MAP}\\ \arg\min_{\alpha_m} \|y - A\alpha_m\|^2 & \text{ML}\end{cases}$$
As a result, the ML decision rule for the AWGN channel is often referred to as minimum distance decision.
The decision regions on the AWGN channel can be represented as follows:
$$\mathcal{R}_m = \begin{cases} \{y : \|y - A\alpha_m\|^2 - N_0\ln P(\alpha_m) < \|y - A\alpha_n\|^2 - N_0\ln P(\alpha_n)\ \ \forall n \neq m\} & \text{MAP}\\ \{y : \|y - A\alpha_m\|^2 < \|y - A\alpha_n\|^2\ \ \forall n \neq m\} & \text{ML}\end{cases}$$
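A compact sketch of these two rules (my own illustration; the constellation, priors, and test point are arbitrary):

```python
import numpy as np

def detect(y, constellation, N0, priors=None, A=1.0):
    """MAP (or, with uniform priors, ML = minimum distance) decision."""
    d2 = np.array([np.sum(np.abs(y - A * a) ** 2) for a in constellation])
    if priors is None:
        return np.argmin(d2)                      # ML metric
    return np.argmin(d2 - N0 * np.log(priors))    # MAP metric

# Example: the four points of the Voronoi figure that follows
const = np.array([[-2, -1], [2, 1], [-1, 1], [1, -1]], dtype=float)
y = np.array([0.3, 0.8])
print(detect(y, const, N0=1.0))                                   # ML
print(detect(y, const, N0=1.0, priors=np.array([0.7, 0.1, 0.1, 0.1])))  # MAP
```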
Special case: The AWGN channel (cont.)

Here is a (nontrivial) example of minimum distance decision regions (aka Voronoi regions):
[Figure: Voronoi decision regions of the four constellation points $(-2,-1)$, $(+2,+1)$, $(-1,+1)$, $(+1,-1)$]
Error probability

Using the optimum decision rule (11) and assuming that $\alpha_m$ has been transmitted, we can see that the symbol decision is incorrect if one or more of the following events occur:
$$\left\{P(\alpha_{m'})\,f(y \mid \alpha_{m'}) > P(\alpha_m)\,f(y \mid \alpha_m) \,\middle|\, \alpha_m\right\},$$
for $m' = 1, \ldots, M$ and $m' \neq m$.
Error probability (cont.)

Thus, the error probability, conditioned on the transmission of $\alpha_m$, is given by
$$P(e \mid \alpha_m) = P\left(\bigcup_{m'\neq m}\left\{P(\alpha_{m'})\,f(y \mid \alpha_{m'}) > P(\alpha_m)\,f(y \mid \alpha_m)\right\} \,\middle|\, \alpha_m\right)$$
where $\cup$ represents the union of events.
Error probability (cont.)

The above expression of the error probability is too complex to calculate analytically, whereas the pairwise error probabilities (PEPs)

Pairwise Error Probability
$$P(\alpha_m \to \alpha_{m'}) \triangleq P\left(P(\alpha_{m'})\,f(y \mid \alpha_{m'}) > P(\alpha_m)\,f(y \mid \alpha_m) \,\middle|\, \alpha_m\right)$$
can be calculated very simply!
Thus, lower and upper bounds are used to approximate $P(e \mid \alpha_m)$.
Error probability (cont.)

The bounds derive from basic probability theory. Recalling that probability is a measure (an area in two dimensions), it is clear that for two events $A_1, A_2$:
$$\begin{cases} P(A_1) \le P(A_1 \cup A_2)\\ P(A_2) \le P(A_1 \cup A_2)\\ P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2) \le P(A_1) + P(A_2)\end{cases}$$
Error probability (cont.)

Generalizing to $m$ events: given a set of events $A_1, \ldots, A_m$, the following inequalities hold:
$$\max_{1\le i\le m} P(A_i) \le P\left(\bigcup_i A_i\right) \le \sum_{i=1}^{m} P(A_i). \tag{13}$$
Error probability (cont.)

Applying the bounds (13) to the conditional probabilities $P(e \mid \alpha_m)$, we obtain
$$\sum_{m=1}^{M} P(\alpha_m)\max_{n\neq m} P(\alpha_m \to \alpha_n) \;\le\; P(e) \;\le\; \sum_{m=1}^{M} P(\alpha_m)\sum_{n\neq m} P(\alpha_m \to \alpha_n) \tag{14}$$
Error probability (cont.)

Assuming the MAP decision rule over the AWGN channel, and letting $A = 1$, the PEPs are given by
$$P(\alpha_m \to \alpha_n) = Q\left(\frac{\|\alpha_m - \alpha_n\|^2 + N_0\ln[P(\alpha_m)/P(\alpha_n)]}{\sqrt{2N_0}\,\|\alpha_m - \alpha_n\|}\right). \tag{15}$$
$Q(x)$ is the counter cumulative distribution function of the normalized Gaussian random variable (with zero mean and unit variance), whose distribution is denoted by $\mathcal{N}(0, 1)$.
The Q function

The Q function used in the previous expressions is defined as:
$$Q(x) \triangleq P(\mathcal{N}(0, 1) > x) = \int_x^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-u^2/2}\,du.$$
Applying integration by parts ($\int u\,dv = uv - \int v\,du$), we obtain the inequalities:
$$\frac{e^{-x^2/2}}{\sqrt{2\pi}\,x}\left(1 - \frac{1}{x^2}\right) \;\le\; Q(x) \;\le\; \frac{e^{-x^2/2}}{\sqrt{2\pi}\,x}.$$
The Q function (cont.)

The upper bound yields the asymptotic behavior:
$$Q(x) \approx \frac{e^{-x^2/2}}{\sqrt{2\pi}\,x}.$$
In many cases, the following crude asymptotic approximation is used:
$$Q(x) \approx e^{-x^2/2}.$$
The following diagram compares the Q-function and the two approximations, which are plotted as the red ($e^{-x^2/2}/(\sqrt{2\pi}\,x)$) and green ($e^{-x^2/2}$) dashed curves, respectively.
We can see that the approximation of the red curve is better than 10% for $x \ge 3$.
The Q function (cont.)

[Figure: $Q(x)$ and its two approximations plotted versus $x$ over $(-4, 4)$]
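For numerical work, $Q(x)$ is conveniently expressed through the complementary error function, $Q(x) = \frac{1}{2}\mathrm{erfc}(x/\sqrt{2})$. A small sketch (my own addition) comparing it with the refined approximation:

```python
import numpy as np
from scipy.special import erfc

def Q(x):
    """Gaussian tail probability via the complementary error function."""
    return 0.5 * erfc(x / np.sqrt(2))

x = np.array([1.0, 2.0, 3.0, 4.0])
approx_fine = np.exp(-x**2 / 2) / (np.sqrt(2 * np.pi) * x)

# Relative error of the refined approximation: about 10% at x = 3
print((approx_fine - Q(x)) / Q(x))
```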
Error probability of binary modulations

The inequalities (14) yield the exact error probability in the case of binary modulations ($M = 2$):
$$\begin{aligned}
P(e) &= P(\alpha_1)\,P(\alpha_1 \to \alpha_2) + P(\alpha_2)\,P(\alpha_2 \to \alpha_1)\\
&= P(\alpha_1)\,Q\left(\frac{\|\alpha_1 - \alpha_2\|^2 + N_0\ln[P(\alpha_1)/P(\alpha_2)]}{\sqrt{2N_0}\,\|\alpha_1 - \alpha_2\|}\right) + P(\alpha_2)\,Q\left(\frac{\|\alpha_1 - \alpha_2\|^2 - N_0\ln[P(\alpha_1)/P(\alpha_2)]}{\sqrt{2N_0}\,\|\alpha_1 - \alpha_2\|}\right).
\end{aligned}$$
Error probability of binary modulations (cont.)

With equiprobable signals, i.e., $P(\alpha_m) = 1/M$, inequalities (14) yield:
$$\frac{1}{M}\sum_{m=1}^{M}\max_{n\neq m} P(\alpha_m \to \alpha_n) \;\le\; P(e) \;\le\; \frac{1}{M}\sum_{m=1}^{M}\sum_{n\neq m} P(\alpha_m \to \alpha_n)$$
Here, $P(\alpha_m \to \alpha_n) = Q\big(\|\alpha_m - \alpha_n\|/\sqrt{2N_0}\big)$.
High SNR approximation

In most situations, one is mostly interested in the high SNR (and then low $N_0$) case.
Since the Q function decreases very quickly, we can keep in the bounds only the terms with minimum distance:
$$d_{\min} \triangleq \min_{m\neq n} \|\alpha_m - \alpha_n\| \tag{16}$$
and disregard the others, which are very small.
To be conservative, we use the upper bound to $P(e)$ and obtain this approximation:
$$P(e) \approx N_{\min}\,Q\left(\frac{d_{\min}}{\sqrt{2N_0}}\right) \tag{17}$$
where $N_{\min} = \dfrac{1}{M}\displaystyle\sum_{m=1}^{M}\sum_{n\neq m} 1_{\{\|\alpha_m - \alpha_n\| = d_{\min}\}}$ is the average number of minimum-distance neighbors.
Standard plots of P(e)

In most cases, the error probability is plotted in log-log graphics where
- the abscissa is the energy ratio $E_b/N_0$ expressed in dB on a linear scale;
- the ordinate is the error probability in logarithmic scale.
$E_b$ is the energy per information bit. For the uncoded modulations considered, we have:
$$E_b = \frac{E_s}{\log_2 M}, \qquad E_s = \sum_m P(\alpha_m)\,\|\alpha_m\|^2,$$
and $E_s$ is called the energy per symbol.
Bit error probability

- In some applications, it is better to consider the bit error probability $P_b(e)$ rather than the symbol error probability $P(e)$.
- Typically, modulation symbols are assigned to bit vectors (bit mapping), so that a symbol error corresponds to having a received bit vector different from the transmitted one.
- The bit error probability is the average number of errors in the received bit vector divided by the vector size:
$$P_b(e) = \frac{E[N_b]}{\log_2 M},$$
where $N_b$ denotes the number of bit errors.
- Of course, $P_b(e)$ depends on the bit mapping.
- Assuming high SNR, most errors occur between minimum distance symbols.
Bit error probability (cont.)

For some modulations it is possible to select a bit mapping such that all minimum distance symbols differ in their corresponding bit vectors at only one position (Gray encoding). For example:
$$\underset{00}{-3}\qquad \underset{01}{-1}\qquad \underset{11}{+1}\qquad \underset{10}{+3}$$
In this case, with high probability, every symbol error corresponds to 1 bit error (out of $\log_2 M$ transmitted bits). Therefore, we have:
$$P_b(e) \approx \frac{1}{\log_2 M}\,P(e).$$
PAM = Pulse Amplitude Modulation

The alphabet of M-PAM is $\mathcal{A} = \{2m - M - 1\}_{m=1}^{M}$.
For example, the constellation of 8-PAM is as follows:
$$-7\quad -5\quad -3\quad -1\quad +1\quad +3\quad +5\quad +7$$
The error probability of M-PAM is:
$$P(e) = 2\,\frac{M-1}{M}\,Q\left(\sqrt{\frac{6\log_2 M}{M^2 - 1}\,\frac{E_b}{N_0}}\right). \tag{18}$$
QAM = Quadrature Amplitude Modulation

The alphabet of M-QAM is
$$\mathcal{A} = \left\{(2m - \sqrt{M} - 1) + j\,(2n - \sqrt{M} - 1)\right\}_{m,n=1}^{\sqrt{M}}.$$
For example, the constellation of 16-QAM is as follows:
[Figure: 16-QAM square constellation with coordinates in $\{-3, -1, +1, +3\}$]

The error probability of M-QAM is:
$$P(e) = 4\,\frac{\sqrt{M}-1}{\sqrt{M}}\,Q\left(\sqrt{\frac{3\log_2 M}{M - 1}\,\frac{E_b}{N_0}}\right). \tag{19}$$
PSK = Phase Shift Keying

The alphabet of M-PSK is $\mathcal{A} = \left\{\sqrt{E_s}\,e^{j(2m-1)\pi/M}\right\}_{m=1}^{M}$.
For example, the constellation of 8-PSK is as follows:
[Figure: 8-PSK constellation, eight points of radius $\sqrt{E_s}$ equally spaced on a circle]

The error probability of M-PSK is:
$$P(e) \approx 2\,Q\left(\sqrt{2\sin^2\frac{\pi}{M}\,\log_2 M\,\frac{E_b}{N_0}}\right). \tag{20}$$
In the special case of $M = 4$ we have:
$$P(e) \approx 2\,Q\left(\sqrt{2\,\frac{E_b}{N_0}}\right).$$
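The closed forms (18)-(20) are easy to tabulate; a sketch (my own addition) evaluating them at a given $E_b/N_0$ in dB:

```python
import numpy as np
from scipy.special import erfc

def Q(x):
    return 0.5 * erfc(x / np.sqrt(2))

def pe_pam(M, ebn0):   # eq. (18)
    return 2 * (M - 1) / M * Q(np.sqrt(6 * np.log2(M) / (M**2 - 1) * ebn0))

def pe_qam(M, ebn0):   # eq. (19)
    r = np.sqrt(M)
    return 4 * (r - 1) / r * Q(np.sqrt(3 * np.log2(M) / (M - 1) * ebn0))

def pe_psk(M, ebn0):   # eq. (20)
    return 2 * Q(np.sqrt(2 * np.sin(np.pi / M)**2 * np.log2(M) * ebn0))

ebn0 = 10 ** (10 / 10)       # Eb/N0 = 10 dB in linear units
print(pe_pam(4, ebn0), pe_qam(16, ebn0), pe_psk(8, ebn0))
```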
Orthogonal modulations

The alphabet of an orthogonal modulation consists of $M$ vectors in $\mathbb{R}^M$ with a single nonzero coordinate equal to $\sqrt{E_s}$.
For example, a quaternary orthogonal modulation is represented by the following four signals:
$$\alpha_1 = (\sqrt{E_s}, 0, 0, 0),\quad \alpha_2 = (0, \sqrt{E_s}, 0, 0),\quad \alpha_3 = (0, 0, \sqrt{E_s}, 0),\quad \alpha_4 = (0, 0, 0, \sqrt{E_s}).$$
The error probability of an M-ary orthogonal modulation is:
$$P(e) \approx (M - 1)\,Q\left(\sqrt{\log_2 M\,\frac{E_b}{N_0}}\right). \tag{21}$$
Orthogonal modulations (cont.)

Two examples of orthogonal modulations are given as follows.
1. Pulse position modulation (M-PPM): Given the orthogonal pulse $\phi(t)$, the modulated signals are:
$$x_m(t) = \sqrt{M}\,\phi\big(Mt - (m-1)T\big), \tag{22}$$
i.e., $\phi(t)$ is contracted in time to $(0, T/M)$ and shifted by $(m-1)T/M$.
2. Frequency shift keying (M-FSK):
$$x_m(t) = \sqrt{2}\cos[2\pi(f_c + m\,\Delta f)\,t] \tag{23}$$
where $\Delta f\,T$ is an integer number.
Problem set 3

1. Derive the PEP (15).
2. Derive the union bound approximation (17) along with the expression of $N_{\min}$ by keeping only those terms from the upper bound
$$\frac{1}{M}\sum_{m=1}^{M}\sum_{n\neq m} P(\alpha_m \to \alpha_n)$$
corresponding to minimum distance errors, i.e., such that $\|\alpha_m - \alpha_n\| = d_{\min}$.
3. Derive the error probability in (18).
4. Derive the error probability in (19).
5. Derive the error probability in (20).
6. Check the orthogonality of the signals in (22) and (23).
7. Derive the error probability in (21).
Problem set 3 (cont.)

8. Calculate the error probability of the 32-QAM modulation characterized by the following signal set:
[Figure: 32-QAM cross constellation with coordinates taken from $\{-5, -3, -1, +1, +3, +5\}$]
Problem set 3 (cont.)

9. Find the error probability of the binary modulation whose signals are
$$s_1(t) = 1_{0<t<0.8},\qquad s_2(t) = 1_{0.4<t<1},\qquad T = 1.$$
Hint: it is not necessary to represent the signals in a normalized signal space; only the average energy and the distance are required.
Problem set 3 (cont.)

10. Consider the quaternary modulation obtained by using the following four signals:
$$\begin{aligned}
s_1(t) &= A\,[u(t) - u(t - T)]\\
s_2(t) &= A\,[u(t) - u(t - T/4) + u(t - T/2) - u(t - 3T/4)]\\
s_3(t) &= A\,[u(t) - u(t - T/4) - u(t - T/2) + u(t - 3T/4)]\\
s_4(t) &= A\,[u(t) - 2u(t - T/2) + u(t - T)]
\end{aligned}$$
Calculate i) the average energy per bit $E_b$, ii) the minimum distance $d_{\min}^2$, and iii) the average symbol error probability (high-SNR approximation) in the form $Q(\sqrt{\beta\,E_b/N_0})$ for a suitable constant $\beta$.
Problem set 3 (cont.)

11. Calculate the error probability of a 4-PSK signal set assuming that the receiver has a constant phase offset that rotates the decision regions by an angle $\theta$.
12. Calculate the error probability of an octonary signal set whose signals are located over two concentric circles with radii 1 and $\sqrt{0.5} + \sqrt{1.5}$. The signals are equally spaced over each circle and have a phase offset of $\pi/4$ radians between the corresponding signals over different circles.
13. Calculate the error probability of the digital modulation based on the following four signals:
$$s_m(t) = \sin\left(\frac{5\pi}{2T}\left[t - (m-1)\frac{T}{5}\right]\right)1_{|t - mT/5|\le T/5}$$
for $m = 1, 2, 3, 4$.
Power density spectrum of digital modulations

The power density spectrum of $x(t) = \sum_n a_n\,\phi(t - nT)$, where $a_n$ is a wide-sense stationary sequence with autocorrelation function $R_a(p) = E[a_{n+p}\,a_n^*]$, can be expressed as the product of two terms:

Power density spectrum: $G_x(f) = S_a(f)\,G_\phi(f)$
$$\begin{cases} S_a(f) \triangleq \displaystyle\sum_p R_a(p)\,e^{-j2\pi pfT} & \text{(data spectrum)}\\[2mm] G_\phi(f) \triangleq \dfrac{1}{T}\,|\Phi(f)|^2 & \text{(pulse spectrum)}\end{cases}$$
In many circumstances, the bandwidth of a digital signal is approximated by an expression depending only on the signalling interval $T$.
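As a numerical sanity check (my own addition): for iid zero-mean symbols, $S_a(f) = \sigma_a^2$, so $G_x(f) = \sigma_a^2\,|\Phi(f)|^2/T$. The sketch below compares a Welch estimate against this prediction for antipodal symbols and a square pulse (all parameters arbitrary).

```python
import numpy as np
from scipy.signal import welch

T, sps = 1.0, 16                     # symbol time, samples per symbol (arbitrary)
fs = sps / T
rng = np.random.default_rng(1)

a = rng.choice([-1.0, 1.0], 20000)   # iid antipodal symbols, sigma_a^2 = 1
x = np.repeat(a, sps)                # x(t) = sum_n a_n phi(t - nT), square pulse

f, Gx = welch(x, fs=fs, nperseg=4096)
# One-sided PSD = 2 * G_x(f) = 2 * T * sinc(fT)^2 for f > 0
Gx_theory = 2 * T * np.sinc(f * T) ** 2

k = (f > 0.05) & (f < 2.0)           # compare away from f = 0
print(np.mean(np.abs(Gx[k] - Gx_theory[k])))   # small average estimation error
```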
Shannon bandwidth

A common approximation to the bandwidth of a digital signal is the Shannon bandwidth:
$$W_{\mathrm{sh}} \triangleq N_d\cdot\frac{1}{2T}$$
where $N_d$ is the signal space dimension and $T$ is the symbol interval.
It can be shown that this approximation is very good when the number of dimensions is large.
However, even with $N_d = 1$, the bandwidth overhead is limited for suitably chosen pulses, as illustrated in the following example.
Bandwidth of antipodal signals

Consider a binary PAM signal with iid equiprobable symbols $a_n \in \{\pm 1\}$. The shaping pulse, $\phi(t)$, is one of the following:
$$\phi(t) = \begin{cases} 1_{t\in(0,T)} & \text{square pulse}\\ \sqrt{2}\sin(\pi t/T)\,1_{t\in(0,T)} & \text{sinusoidal pulse}\end{cases}$$
The power density spectrum is given by $G_x(f) = |\Phi(f)|^2/T$, and (check as homework):
$$\Phi(f) = \begin{cases} T\,\mathrm{sinc}(fT)\,e^{-j\pi fT} & \text{square pulse}\\ -\dfrac{\sqrt{2}\,e^{-j\pi fT}\cos(\pi fT)}{2\pi T\,\big(f^2 - 1/(2T)^2\big)} & \text{sinusoidal pulse}\end{cases}$$
Bandwidth of antipodal signals (cont.)

The following diagram plots the fractional power content
$$\eta(W) = \frac{\displaystyle\int_{-W}^{W} G_x(f)\,df}{\displaystyle\int_{-\infty}^{\infty} G_x(f)\,df}$$
versus the normalized bandwidth $WT$ (normalized with respect to the signalling rate $1/T$).
This quantity represents the fraction of power of the digital modulation signal contained in the bandwidth $W$ with respect to the total power.
Bandwidth of antipodal signals (cont.)

[Figure: fractional power content $\eta(W)$ versus $WT$ over $(0, 2)$ for the square and sine pulses]
Key parameters

The performance of different modulation schemes is described by three system parameters:
1. Error probability (symbol or bit).
2. Spectral efficiency, i.e., the ratio between the bit rate $R_b$ and the occupied bandwidth $W$.
3. The signal-to-noise ratio $E_b/N_0$.
For $N_d$-dimensional signal sets, the occupied bandwidth is approximately equal to the Shannon bandwidth
$$W_{\mathrm{sh}} = N_d\cdot\frac{1}{2T} = \frac{N_d\,R_b}{2\log_2 M}$$
Hence, the spectral efficiency is given by
$$\eta_b \triangleq \frac{R_b}{W_{\mathrm{sh}}} = \frac{2\log_2 M}{N_d}.$$
Spectral efficiency

- The spectral efficiency $\eta_b$ grows slowly (logarithmically) with the constellation size $M$ and decreases rapidly (linearly) with the number of dimensions $N_d$.
- For a fixed $M$, PAM modulations have higher spectral efficiency than orthogonal modulations. Therefore:
  - PAM modulations are used in channels with limited bandwidth (bandwidth limited channels) and high power.
  - Orthogonal modulations are used in channels with limited power (power limited channels) and large bandwidth.
Shannon's bound

Shannon's theorem yields the maximum bit rate that can be sustained with arbitrarily low error probability by an $N_d$-dimensional digital modulation with symbol interval $T$ over an AWGN channel:
$$R_b = \frac{N_d}{2T}\log_2\left(1 + \frac{S}{N}\right). \tag{24}$$
Here, $S$ is the received power, $N$ is the noise power, and $S/N$ is called the signal-to-noise ratio.
We assume that the signal bandwidth is $W_{\mathrm{sh}} = N_d/(2T)$.
Shannon's bound (cont.)

Since the noise power is $N = N_0 W_{\mathrm{sh}}$ and the signal power is $S = E_b/T_b = R_b E_b$, (24) can be written as:
$$R_b \le W_{\mathrm{sh}}\log_2\left(1 + \frac{R_b\,E_b}{W_{\mathrm{sh}}\,N_0}\right).$$
Since the spectral efficiency is $\eta_b = R_b/W_{\mathrm{sh}}$, we obtain:
$$\eta_b \le \log_2\left(1 + \eta_b\,\frac{E_b}{N_0}\right) \quad\Longrightarrow\quad \frac{E_b}{N_0} \ge \frac{2^{\eta_b} - 1}{\eta_b}.$$
Finally, for $\eta_b \to 0$, we have
$$\frac{E_b}{N_0} \ge \ln 2 \quad\Longleftrightarrow\quad \left(\frac{E_b}{N_0}\right)_{\mathrm{dB}} \ge 10\log_{10}(\ln 2) \approx -1.6\ \mathrm{dB}.$$
Shannon's bound (cont.)

[Figure: required $E_b/N_0$ (dB) versus spectral efficiency $R_b/W$ (bit/s/Hz, log scale), showing Shannon's bound with its $-1.6$ dB asymptote, the power-limited region (4-PPM to 1024-PPM) and the bandwidth-limited region (2-PAM to 1024-PAM), at bit error probabilities $P_b(e) = 10^{-4}, 10^{-6}, 10^{-8}$]
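A sketch (my own addition) evaluating the bound $E_b/N_0 \ge (2^{\eta_b} - 1)/\eta_b$ used to draw the Shannon bound curve in the figure above:

```python
import numpy as np

def ebn0_min_db(eta_b):
    """Minimum Eb/N0 (dB) sustaining spectral efficiency eta_b (bit/s/Hz)."""
    return 10 * np.log10((2 ** eta_b - 1) / eta_b)

for eta in [0.01, 1, 2, 4, 8]:
    print(eta, round(ebn0_min_db(eta), 2))
# As eta -> 0 the bound approaches 10*log10(ln 2) = -1.59 dB
```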
Problem set 4

1. Derive the power density spectrum formula $G_x(f) = S_a(f)\,G_\phi(f)$ for the signal
$$x(t) = \sum_n a_n\,\phi(t - nT), \tag{25}$$
where:
   - $a_n$ is a wide-sense stationary sequence with autocorrelation function $R_a(p) = E[a_{n+p}\,a_n^*]$;
   - $S_a(f) \triangleq \sum_p R_a(p)\,e^{-j2\pi pfT}$ is the data spectrum;
   - $G_\phi(f) \triangleq \frac{1}{T}|\Phi(f)|^2$ is the pulse spectrum.
Hint: Consider the (randomly delayed and stationary) signal $x(t - \Delta)$, with $\Delta$ uniformly distributed in $(0, T)$, and calculate the Fourier transform of its autocorrelation function to obtain the power density spectrum.
Problem set 4 (cont.)

2. Calculate the power density spectrum of the signal (25) assuming that the symbols $a_n$ are uncorrelated with mean $\mu_a$ and variance $\sigma_a^2$.
3. Calculate the power density spectrum of the signal (25) assuming that the transmitted symbols $a_n$ have zero mean and correlation $R_a(m) = \rho^{|m|}$ (where $\rho \in (0, 1)$), $\phi(t)$ has unit energy, and the average signal power is $P$.
4. Calculate the power density spectrum of (25) assuming that the transmitted symbols are iid and taken from a 4-PSK signal set with probabilities $(0.7, 0.1, 0.1, 0.1)$.
Outline
1. Basic concepts
2. Digital modulations over the AWGN channel
3. Information Theory
4. Channel codes
5. ISI and Equalization

Section Outline
3. Information Theory
   - Basic concepts, entropy, mutual information
   - Channel codes
   - Channel capacity
   - Continuous input-output channels
   - Shannon's capacity formula
   - Problem set 5
Information Theory: Basic Concepts

- Information theory was invented by Claude E. Shannon in 1948 to study the quantitative meaning of information and its flow in a communication system.
- Among other results, information theory allows us to find the maximum transmission rate which can be sustained on a channel with coding while keeping the information error probability arbitrarily low.
- Given a random variable $X$ with discrete alphabet $\mathcal{X}$ and probability distribution $p_X(x)$ defined for all $x \in \mathcal{X}$, we define its entropy as follows:
$$H(X) \triangleq -\sum_{x\in\mathcal{X}} p_X(x)\log_2 p_X(x). \tag{26}$$
- The entropy is measured in bits. If any $p_X(x) = 0$, we assume $0\log_2 0 = 0$.
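A direct transcription of (26) (my own sketch), with the convention $0\log_2 0 = 0$:

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a probability vector p, with 0*log2(0) = 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

print(entropy([0.01, 0.01, 0.01, 0.97]))   # ~0.2419 bits
print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0 bits
```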
Information Theory: Basic Concepts (cont.)

It is plain to see that, since probabilities are not greater than 1, the entropy is always nonnegative: $H(X) \ge 0$.
Moreover, if the cardinality of $\mathcal{X}$ is $N$, we have the entropy inequalities:
$$0 \le H(X) \le \log_2 N$$
We define the entropy function of a probability vector $p = (p_1, \ldots, p_N)$, such that $0 \le p_i \le 1$ and $\sum_{i=1}^{N} p_i = 1$, as follows:
$$H(p) = H(p_1, \ldots, p_N) \triangleq -\sum_{i=1}^{N} p_i\log_2 p_i. \tag{27}$$
Information Theory: Basic Concepts (cont.)

The entropy of a random variable is a measure of its uncertainty.
If the random variable takes one value with high probability and the other ones with small probabilities, then the entropy is very low:
$$H(0.01, 0.01, 0.01, 0.97) = 0.2419.$$
If instead the probabilities are equal:
$$H(0.25, 0.25, 0.25, 0.25) = 2.$$
Information Theory: Basic Concepts (cont.)

The entropy of a binary random variable with probability vector $(p, 1-p)$ is
$$H_b(p) \triangleq H(p, 1-p) = -p\log_2 p - (1-p)\log_2(1-p)$$
where $0 \le p \le 1$. Plainly, $H_b(0.5) = 1$, and the function plot is as follows:
[Figure: the binary entropy function $H_b(p)$ versus $p$, peaking at 1 for $p = 0.5$]
Information Theory: Basic Concepts (cont.)

The joint entropy of two random variables $X$ and $Y$ is defined as
$$H(X, Y) \triangleq -\sum_{(x,y)\in\mathcal{X}\times\mathcal{Y}} p_{XY}(x, y)\log_2 p_{XY}(x, y). \tag{28}$$
The concept can be extended directly to more than two random variables.
The conditional entropy of two random variables $X$ and $Y$ is defined as
$$H(X \mid Y) \triangleq -\sum_{(x,y)\in\mathcal{X}\times\mathcal{Y}} p_{XY}(x, y)\log_2 p_{X|Y}(x \mid y). \tag{29}$$
Information Theory: Basic Concepts (cont.)

The mutual information between two random variables is defined as

Mutual information
$$I(X; Y) \triangleq \sum_{(x,y)\in\mathcal{X}\times\mathcal{Y}} p_{XY}(x, y)\log_2\frac{p_{XY}(x, y)}{p_X(x)\,p_Y(y)}. \tag{30}$$
We can see that
$$I(X; Y) = H(X) + H(Y) - H(X, Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X). \tag{31}$$
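Definition (30) can be verified numerically from a joint pmf; a sketch (my own addition, using a binary symmetric channel as the test case):

```python
import numpy as np

def mutual_information(p_xy):
    """I(X;Y) in bits from a joint pmf matrix p_xy (rows: x, cols: y)."""
    px = p_xy.sum(axis=1, keepdims=True)
    py = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0
    return np.sum(p_xy[nz] * np.log2((p_xy / (px * py))[nz]))

# Joint pmf of a binary symmetric channel with p = 0.1 and uniform input
p = 0.1
p_xy = 0.5 * np.array([[1 - p, p], [p, 1 - p]])
print(mutual_information(p_xy))   # = 1 - Hb(0.1) ~ 0.531 bits
```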
Basic concepts on channel coding

- An $[M, n]$ channel code over $\mathcal{X}^n$, $\mathcal{C}$, is an invertible mapping $\mathcal{M} \to \mathcal{X}^n$ between a set of messages $\mathcal{M} = \{1, \ldots, M\}$ and a set of different $n$-tuples $(x_1, \ldots, x_n) \in \mathcal{X}^n$, which are called code words.
- The set of all possible code words is called the codebook.
- We define the code rate as:
$$R = \frac{\log_2 M}{n}\ \frac{\text{information bits}}{\text{channel symbols}}. \tag{32}$$
- Since $M$ cannot be larger than the number of all possible words from $\mathcal{X}^n$ (i.e., $|\mathcal{X}|^n$), we have the inequality
$$R \le \frac{\log_2(|\mathcal{X}|^n)}{n} = \log_2|\mathcal{X}|.$$
Basic concepts on channel coding (cont.)

- A binary channel code has $M = 2^k$ and $\mathcal{X} = \{0, 1\}$ and is called an $(n, k)$ code.
- Its rate is $R = \log_2(2^k)/n = k/n \le \log_2|\mathcal{X}| = 1$.
- Examples of binary codes:
  - Repetition code $(n, 1)$: $\{(0\ldots 0), (1\ldots 1)\}$ with rate $R = 1/n$.
  - Single parity-check code $(n, n-1)$. The code words are obtained by concatenating a parity bit to the first $(n-1)$ bits. For example, the (3,2) code has the following words: $(000), (011), (101), (110)$. The rate is $R = (n-1)/n$.
  - A code can have no structure at all. For example, this is a $[4, 5]$ binary code with rate $R = 2/5$: $c_1 = (10000)$, $c_2 = (01010)$, $c_3 = (10110)$, $c_4 = (11001)$.
Shannon Theorem and channel capacity

Shannon Theorem for discrete channels
Assume $X$ and $Y$ are the input and output, respectively, of a discrete communication channel described by the conditional probability distribution $p_{Y|X}(y \mid x)$.
Assume that all possible sequences of $n \ge 1$ channel transmissions are stationary and independent, i.e.,
$$P(Y_1 = y_1, \ldots, Y_n = y_n \mid X_1 = x_1, \ldots, X_n = x_n) = \prod_{i=1}^{n} p_{Y|X}(y_i \mid x_i).$$
Shannon Theorem and channel capacity (cont.)

Then, there exist
- a code sequence $\mathcal{C}_n$, where each codebook contains $M_n$ words $x^n(m) \in \mathcal{X}^n$, and the code rate sequence $R_n \to R$ as $n \to \infty$;
- a decoding function $\mathcal{D}(\cdot)$, mapping the channel output word $y \in \mathcal{Y}^n$ corresponding to the transmitted $x^n(m)$ back to a message estimate $m'$;
such that the maximum error probability
$$P_{n,\max}(e) \triangleq \max_{1\le m\le M_n} P\big(\mathcal{D}(y) \neq m \mid x^n(m)\ \text{transmitted}\big)$$
converges to zero as $n \to \infty$, provided that the limit code rate $R$ is smaller than the channel capacity
$$C = \max_{\text{all possible } p_X(x)} I(X; Y). \tag{33}$$
Shannon Theorem and channel capacity (cont.)

A simpler statement of Shannon Theorem
- According to Shannon Theorem, by using channel codes of very large length $n$, we can transmit reliably (i.e., with small error probability) up to $nC$ bits per codeword, where $C$ is the channel capacity given by (33).
- The channel capacity can be calculated numerically for all discrete channels by using the Blahut-Arimoto algorithm.
- In some special cases there are simple analytic expressions for $C$, but none of them applies in general.
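For reference, a compact sketch of the Blahut-Arimoto iteration (my own implementation outline, not from the notes; small constants guard the logarithms):

```python
import numpy as np

def blahut_arimoto(P, n_iter=200):
    """Capacity (bits) of a discrete channel with transition matrix P.

    P[i, j] = p(y_j | x_i). Alternates between the posterior q(x|y)
    and the input distribution r(x) that maximizes I(X;Y).
    """
    M = P.shape[0]
    r = np.full(M, 1.0 / M)                    # input distribution
    for _ in range(n_iter):
        q = r[:, None] * P                     # r(x) p(y|x)
        q /= q.sum(axis=0, keepdims=True)      # posterior q(x|y)
        log_w = np.sum(P * np.log(q + 1e-300), axis=1)
        r = np.exp(log_w)
        r /= r.sum()
    q = r[:, None] * P
    q /= q.sum(axis=0, keepdims=True)
    # I(X;Y) = sum_{x,y} r(x) p(y|x) log2[ q(x|y) / r(x) ]
    return np.sum(r[:, None] * P * np.log2(q / r[:, None] + 1e-300))

# Binary symmetric channel with p = 0.1: capacity 1 - Hb(0.1) ~ 0.531
p = 0.1
print(blahut_arimoto(np.array([[1 - p, p], [p, 1 - p]])))
```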
Shannon Theorem and channel capacity (cont.)

Channel capacity depends only on the conditional distribution $p_{Y|X}(y \mid x)$, which can be represented in matrix form as follows:
$$P = \begin{pmatrix}
p_{Y|X}(y_1 \mid x_1) & p_{Y|X}(y_2 \mid x_1) & \ldots & p_{Y|X}(y_N \mid x_1)\\
p_{Y|X}(y_1 \mid x_2) & p_{Y|X}(y_2 \mid x_2) & \ldots & p_{Y|X}(y_N \mid x_2)\\
\vdots & \vdots & \ddots & \vdots\\
p_{Y|X}(y_1 \mid x_M) & p_{Y|X}(y_2 \mid x_M) & \ldots & p_{Y|X}(y_N \mid x_M)
\end{pmatrix} \tag{34}$$
assuming $M = |\mathcal{X}|$ and $N = |\mathcal{Y}|$.
If the rows and columns of $P$ are, respectively, permutations of the first row and first column, then the channel is called strictly symmetric and its capacity is given by
$$C = \log_2|\mathcal{Y}| - H(p_1), \tag{35}$$
where $p_1$ is the first row of $P$.
Shannon Theorem and channel capacity (cont.)

For example, the binary symmetric channel has matrix
$$P = \begin{pmatrix} 1-p & p\\ p & 1-p\end{pmatrix}$$
with $\mathcal{X} = \mathcal{Y} = \{0, 1\}$. The parameter $p$ is called the error probability of the channel.
This matrix satisfies the symmetry condition stated above. Then, the channel capacity is
$$C = \log_2|\mathcal{Y}| - H(1-p, p) = 1 - H_b(p).$$
Information Theory Continuous input-output channels
Mutual information in the continuous case
Let X be a continuous random variable, i.e., with continuous
probability distribution characterized by the probability density
function f
X
(x).
We can approximate X by a discrete random variable X

taking on values x
i
(such that x
i+1
x
i
= ) with
probabilities p
i
= P([X x
i
[ < /2).
We can show that
H(X

) + log
2
h(X)
_
f
X
(x) log
2
f
X
(x)dx (36)
as 0.
Notice that H(X

) as 0.
Here, represents a discretization step and h(X) is called
the dierential entropy of X.
The concept can be extended to multiple random variables
and we get

H(X_Δ, Y_Δ) + 2 log_2 Δ → h(X, Y) ≜ −∬ f_{XY}(x, y) log_2 f_{XY}(x, y) dx dy     (37)

as Δ → 0.
We can see that while entropies diverge as Δ → 0, the mutual
information converges to a finite quantity:

lim_{Δ→0} I(X_Δ; Y_Δ) = h(X) + h(Y) − h(X, Y).

This is the starting point to extend Shannon Theorem to
continuous input-output channels.
Shannon Theorem for the additive Gaussian channel
We have the following property:

h(X + a) = h(X)     (38)

for every constant a.
For an additive channel Y = X + Z, Z represents Gaussian
distributed additive noise, and is always independent of X.
The following property holds:

h(Y|X) = h(Z).     (39)
Assuming the variance of X fixed and equal to σ², the
following inequality holds:

h(X) ≤ h(X_G) = (1/2) log_2(2πe σ²),     (40)

where X_G ~ 𝒩(0, σ²) (Gaussian with mean 0 and variance σ²).
The capacity of an additive Gaussian channel Y = X + Z,
where Z ~ 𝒩(0, N) and E[X²] ≤ S, is given by

C = (1/2) log_2(1 + S/N).     (41)

This is Shannon's capacity formula.
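A one-line numerical check of (41), with the SNR given in dB (helper name ours):

    import numpy as np

    def awgn_capacity(snr_db):
        # C = 0.5 * log2(1 + S/N), bits per (real) channel use
        return 0.5 * np.log2(1.0 + 10.0 ** (snr_db / 10.0))

    print(awgn_capacity(10.0))   # S/N = 10 dB -> about 1.73 bits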
Proof of Shannon's capacity formula
We can write the mutual information as

I(X; Y) = h(Y) − h(Z).     (42)

We search for the maximum of I(X; Y) over all possible f_X(x)
with the constraint E[X²] ≤ S.
Since Y = X + Z and Z is independent of X,

σ²_Y ≤ E[(X + Z)²]
     = E[X²] + 2E[XZ] + E[Z²]
     = E[X²] + 2E[X]E[Z] + E[Z²]
     ≤ S + N

and the upper limit can be attained when E[X] = 0 and
E[X²] = S.
Therefore, applying (40), we obtain

h(Y) ≤ h(Y_G) = (1/2) log_2(2πe(S + N))     (43)

where the maximum is attained when X ~ 𝒩(0, S) since the
sum of two Gaussian random variables, X + Z, is Gaussian.
This condition, X ~ 𝒩(0, S), maximizes h(Y) and hence
I(X; Y) = h(Y) − h(Z) over the possible input distributions
f_X(x) satisfying the power constraint E[X²] ≤ S.
Then, the channel capacity is achieved when X ~ 𝒩(0, S)
and its value is

C = (1/2) log_2(2πe(S + N)) − (1/2) log_2(2πeN) = (1/2) log_2(1 + S/N),

which is Shannon's capacity formula.
Problem set 5
1 Show that lim_{p→0} p log_2 p = 0.
2 Show that

      ln(1 + x) ≤ x     (44)

  for x > −1 by studying the function φ(x) = ln(1 + x) − x.
3 Show that if |𝒜| = N, then H(X) ≤ log_2 N. Hint: use (44).
4 Show eq. (36) and give an intuitive explanation of the fact
  that H(X_Δ) → ∞ as Δ → 0.
5 Show eq. (37).
6 Show eq. (38).
7 Show eq. (39).
8 Show eq. (40). Hint: Assume X and X_G both have zero
  mean since the differential entropy is invariant to constant
  shifts of the random variable. Then, let f(x) and g(x) be the
  pdfs of X and X_G, so that

      g(x) = (1/√(2πσ²)) e^{−x²/(2σ²)}.

  The proof of (40) is based on the following identity:

      ∫ f(x) log_2 g(x) dx = ∫ g(x) log_2 g(x) dx.

  Prove the identity above and use it together with the
  logarithmic inequality ln(1 + x) ≤ x to show (40).
9 Calculate the capacity of the channel with probability matrix

      P = [ 1−p−q    p       q
              q    1−p−q     p
              p       q    1−p−q ]

10 Consider the cascade of two channels represented by
   X → Y → Z forming a Markov chain, i.e., such that
   p_{Z|XY}(z|x, y) = p_{Z|Y}(z|y).
   Applying Bayes' rule, calculate the conditional probability
   distribution p_{Z|X}(z|x).
   Show that if P_{XY} represents the probability matrix of the
   channel X → Y, the following matrix property holds:

      P_{XZ} = P_{XY} P_{YZ}.

11 Consider the cascade of two binary symmetric channels.
   Check that the resulting channel is still a binary symmetric
   channel.
   Calculate its error probability.
   Extend the result to the cascade of three binary symmetric
   channels.
Channel codes
Section Outline
4 Channel codes
  Linear codes
  Properties of linear codes
  Hamming and Reed-Muller codes
  Decoding and Error performance
  Problem set 6
  Coding bounds
  Problem set 7
Linear codes
An important class of codes is that of linear codes.
Binary linear codes are defined by a binary generator matrix G
with dimensions k × n.
Representing messages by row vectors a ∈ {0, 1}^k and code
words by row vectors c ∈ {0, 1}^n, we have the following
modulo-2 encoding rule:

c = aG.     (45)
Example: let a = (101) and

G = [ 1110
      1101
      1011 ].

Then,

c = (101) G = (1110) + (1011) = (0101).

A binary linear code is systematic if G is in row-echelon form,
i.e., it can be written as

G = (I_k, P).

Here, I_k is the k × k identity matrix and P is a k × (n − k)
parity matrix.
By definition, code words are obtained as

c = a(I_k, P) = (a, aP).

The first k bits coincide with the message and are called
information bits, while the next (n − k) bits are called
parity-check bits.
Example: let a = (101) and

G = [ 100110
      010101
      001011 ].

Then,

c = (101) G = (100110) + (001011) = (101101).
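The encoding rule (45) is one line of code; a minimal Python/NumPy sketch using the systematic generator matrix of the example above:

    import numpy as np

    # G = (I_3, P) from the example above
    G = np.array([[1, 0, 0, 1, 1, 0],
                  [0, 1, 0, 1, 0, 1],
                  [0, 0, 1, 0, 1, 1]])

    def encode(a, G):
        # modulo-2 encoding rule c = aG
        return np.mod(np.asarray(a) @ G, 2)

    print(encode([1, 0, 1], G))   # -> [1 0 1 1 0 1], as in the example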
A linear code can be also specified by a parity-check matrix
H satisfying the property

cH^T = 0  ⟺  aGH^T = 0

for every code word c.
Since the identity holds for all possible information words a,
we have the equation GH^T = 0.
The parity-check matrix of a systematic code with generator
matrix G = (I_k, P) is

H = (P^T, I_{n−k}).

The property can easily be checked:

GH^T = (I_k, P) [ P
                  I_{n−k} ] = P + P = 0.
Properties of linear codes
The following important properties hold for linear codes:
1 The code words are all the possible linear combinations of the
  rows of the generator matrix.
2 The sum of two code words is a code word.
3 The all-zero word is a code word.
An important parameter of a code word is its Hamming
weight w_H(c), that is, the number of ones that it contains.
The set of all distinct weights in a code, together with the
number of code words of that weight, is the weight
distribution of the code.
Given two code words c_i and c_j, it is useful to define a
quantity to measure their difference.
This quantity is the Hamming distance d_ij = d_H(c_i, c_j)
between the two code words, defined as the number of
positions in which the two code words are different.
Clearly, d_ij satisfies the condition 0 ≤ d_ij ≤ n.
The Hamming distance between two binary words is equal to
the Hamming weight of their modulo-2 sum:

d_H(c_1, c_2) = w_H(c_1 + c_2).

The Hamming distance satisfies the triangle inequality:

d_H(c_1, c_3) ≤ d_H(c_1, c_2) + d_H(c_2, c_3)     (46)

for every triple of binary words c_1, c_2, c_3.
The smallest Hamming distance between two distinct code words
(i ≠ j) is called the minimum distance d_min of the code.
The following property simplifies the computation of d_min for
linear codes.
4 The minimum distance of a linear block code is the minimum
  weight of its nonzero code words.
In fact, the distance between two binary words is equal to the
weight of their sum, and the sum of two code words is still a
code word (Property 2).
A minimum distance decoder outputs the code word ĉ at
minimum Hamming distance from the received word r:

ĉ = arg min_{c ∈ 𝒞} d_H(c, r).
Theorem. The output codeword ĉ is uniquely determined
provided that the Hamming weight of the error word, w_H(e),
is lower than d_min/2.
Proof. Assume by contradiction that two code words c_1 and
c_2 both have Hamming distance t < d_min/2 from the
received word. Then, from the triangle inequality,

d_H(c_1, c_2) ≤ d_H(c_1, r) + d_H(r, c_2) = 2t < d_min,

contrary to the assumption that d_min is the minimum distance.
Then, a linear code with minimum distance d_min can correct
up to t = ⌊(d_min − 1)/2⌋ channel errors (packing radius or
correcting capability of the code).
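For small codes, Property 4 makes d_min (and hence t) easy to compute by enumeration; a minimal sketch (exponential in k, so only viable for short codes):

    from itertools import product
    import numpy as np

    def min_distance(G):
        # minimum nonzero codeword weight (Property 4)
        k = G.shape[0]
        return min(int(np.mod(np.array(a) @ G, 2).sum())
                   for a in product([0, 1], repeat=k) if any(a))

    G = np.array([[1, 0, 0, 1, 1, 0],
                  [0, 1, 0, 1, 0, 1],
                  [0, 0, 1, 0, 1, 1]])
    d = min_distance(G)
    print(d, (d - 1) // 2)   # d_min = 3, packing radius t = 1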
Hamming codes
Hamming codes have length n = 2^m − 1, size
k = 2^m − 1 − m, and d_min = 3 for m = 2, 3, . . .
Their generator matrix G = (I_k, P) can be obtained by filling
the parity matrix P with the binary rows of length m and
Hamming weight at least 2.
For example, with m = 3, we obtain the following generator
matrix:

G = [ 1000011
      0100101
      0010110
      0001111 ].     (47)
Reed-Muller codes
For any m and r < m, there is a Reed-Muller code with
parameters given by

n = 2^m,   k = Σ_{i=0}^{r} C(m, i),   d_min = 2^{m−r}.

The generator matrix G is defined as follows.
Let v_0 be a row vector whose 2^m elements are all 1s, and let
v_1, v_2, . . . , v_m be the rows of a matrix with all possible
m-tuples as columns.
The rows of the rth-order generator matrix are the vectors
v_0, v_1, . . . , v_m and all the pointwise products of v_1, . . . , v_m
two at a time, three at a time, up to r at a time.
Here the product vector v_i v_j has components given by the
products of the corresponding components of v_i and v_j.
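This construction is mechanical enough to code directly; a minimal sketch (function name ours):

    from itertools import combinations, product
    import numpy as np

    def reed_muller_G(r, m):
        # columns of [v1; ...; vm] run over all binary m-tuples
        V = np.array(list(product([0, 1], repeat=m))).T
        rows = [np.ones(2 ** m, dtype=int)]                 # v0: all ones
        for t in range(1, r + 1):
            for idx in combinations(range(m), t):
                rows.append(np.prod(V[list(idx)], axis=0))  # pointwise products
        return np.array(rows)

    G = reed_muller_G(1, 3)   # RM(1,3): n = 8, k = 1 + 3 = 4, d_min = 4
    print(G.shape)            # (4, 8)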
Standard array
The standard array of a linear code is a table containing all
the 2^n binary vectors (words) of length n, arranged with 2^k
columns and 2^{n−k} rows.
The rows are called cosets.
The words e in a coset have the same syndrome, defined as

s = eH^T.

Coset leaders are defined as the minimum weight words in the
coset and are located in the first position of the row.
Code words have zero syndrome (since cH^T = 0 by definition
of parity-check matrix) and are located in the first row of the
standard array.
A code is called perfect if all the coset leaders have Hamming
weight not exceeding the packing radius of the code.
A necessary condition for a code to be perfect is that

Σ_{i=0}^{t} C(n, i) = 2^{n−k}.

Only a few classes of codes are perfect: repetition codes with
odd n, Hamming codes, and the (23, 12, 7) Golay code.
Minimum distance decoding can be implemented by using the
standard array.
Let the code word c be transmitted and the word r be
received (after hard decisions on the bits).
Minimum distance decoding corresponds to finding a word e of
minimum Hamming weight such that r = c + e for some code
word c.
It can be implemented by calculating the syndrome of r:

s = rH^T = cH^T + eH^T = eH^T.

The syndrome determines the coset of r and the corresponding
coset leader, say ê, which is the word in the coset with
minimum Hamming weight.
Then, the decoded word is ĉ = r + ê.
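A minimal sketch of syndrome decoding in Python, building the syndrome-to-leader table by exhaustive search (so only viable for short codes; the (6,3) code below is the systematic example used earlier):

    from itertools import product
    import numpy as np

    def syndrome_table(H):
        # map each syndrome to a minimum-weight error pattern (coset leader)
        n = H.shape[1]
        table = {}
        for e in sorted(product([0, 1], repeat=n), key=sum):  # lightest first
            s = tuple(np.mod(np.array(e) @ H.T, 2))
            table.setdefault(s, np.array(e))                  # keep the leader
        return table

    def decode(r, H, table):
        # syndrome decoding: c_hat = r + e_hat (mod 2)
        s = tuple(np.mod(np.asarray(r) @ H.T, 2))
        return np.mod(r + table[s], 2)

    P = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1]])
    H = np.hstack([P.T, np.eye(3, dtype=int)])   # H = (P^T, I_{n-k})
    table = syndrome_table(H)
    r = np.array([1, 0, 1, 1, 0, 0])             # (101101) with one bit flipped
    print(decode(r, H, table))                   # -> [1 0 1 1 0 1]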
Standard array decoding requires the storage of 2^{n−k}
syndromes of length (n − k) and of 2^{n−k} error patterns of
length n: a total of 2^{n−k}(2n − k) bits.
In contrast, exhaustive decoding requires the storage of the
n 2^k bits representing all the code words.
Then, standard array decoding has lower complexity than
exhaustive decoding for high rate codes (k > n/2).
However, both methods are impractical even for moderately
large code size, and a more elaborate algebraic structure must be
assigned to the code in order to rely on computational
algorithms rather than on look-up tables.
Decoding strategy
The error performance depends on the decoding strategy used.
The following are among the most commonly used decoding strategies.
Complete decoding.
The decoder always outputs a decoded code word exploiting
the standard array.
It assumes that the error vector corresponds to the leader ê of
the coset containing the received word and outputs ĉ = r + ê.
In doing this, the decoder goes beyond the correction
capabilities of the code, so that some error vectors with
weight larger than t can lead to wrong decoding.
Bounded t-distance decoding.
This strategy consists of applying standard array decoding
only if the Hamming weight of the coset leader does not
exceed the code packing radius.
As a consequence, some received words are not decoded. In
those cases, the decoder declares a decoding failure.
Error detection.
This strategy consists of simply checking whether the received
word is a code word. If not, the decoder declares a decoding failure.
This strategy is equivalent to bounded 0-distance decoding.
Automatic Repeat reQuest (ARQ) can be applied whenever a
decoding failure occurs.
For transmission systems employing block codes, two error
probabilities are of interest.
The word error probability P_w(e), defined as the probability
that the decoder output is a code word different from that
transmitted.
The bit error probability P_b(e), defined as the probability that
an information bit is in error after decoding.
Since calculating exactly the bit error probability after
decoding is difficult, a simple approximation is used at high
SNR.
At high SNR, the error probability is small and the most likely
decoding error events correspond to minimum distance code
words.
In these cases, the number of code word bit differences is
d_min. Then, we have a fraction d_min/n of incorrect coded bits.
For most binary codes it can be shown that approximately the
same fraction of information bits is incorrect.
Therefore, we can use the approximation:

P_b(e) ≈ (d_min/n) P_w(e).
Error performance with HD complete decoding
Consider the case of coded transmission on a BSC.
Standard array decoding is successful if and only if the error
word is a coset leader. Therefore, the error probability is given
by

P_w(e) = 1 − Σ_{e ∈ L} p^{w_H(e)} (1 − p)^{n − w_H(e)}

where L is the set of coset leaders of the code.
Since the set L contains all the error words with weight lower
than or equal to t, the following upper bound to the word
error probability can be derived:

P_w(e) ≤ Σ_{i=t+1}^{n} C(n, i) p^i (1 − p)^{n−i} ≈ C(n, t+1) p^{t+1}

for p → 0.
Consequently, we can approximate the bit error probability
after decoding as:

P_b(e) ≈ (d_min/n) C(n, t+1) p^{t+1}.

Then, we assume that a 2^m-ary digital modulation is used
with asymptotic uncoded bit error probability

P_{b,U}(e) ≈ exp(−α E_{b,U}/N_0)     (48)

for some constant α. Here, E_{b,U} denotes the energy per
uncoded transmitted bit.
Then, neglecting all constant factors in the exponentials, the
bit error probability after decoding is

P_{b,HD}(e) ≈ exp(−α (t + 1) E_{b,U}/N_0) = exp(−α (t + 1) (k/n) E_{b,C}/N_0).     (49)

Here, E_{b,C} denotes the energy per information bit with coded
transmission, and satisfies the identity nE_{b,U} = kE_{b,C}.
Therefore, comparing (48) and (49), we can see that the
asymptotic coding gain with hard-decision decoding is given by

10 log_10( (k/n) ⌊(d_min + 1)/2⌋ ) dB,     (50)

assuming that the packing radius t assumes its maximum
possible value, i.e., ⌊(d_min − 1)/2⌋.
Error performance with SD complete decoding
All the previous derivations referred to hard-decision (HD)
decoding.
In this case, the decoder does not interact with the symbol
detector of the receiver.
Instead, the symbol detector detects the received signal
symbol-by-symbol and converts each symbol into bits.
The resulting binary sequences (or code words) are then
decoded.
Alternatively, demodulation and symbol detection can be
merged into a single device.
The received sample sequence after the demodulator is passed
directly to the decoder, which compares it by using a
Euclidean rather than Hamming distance metric.
This operating mode is called soft-decision (SD) decoding.
Calculating the word error probability with soft-decision
decoding is very difficult for general modulation schemes.
However, assuming binary PAM modulation, a simple
approximation based on the union bound can be derived.
In this case, the squared Euclidean distance between the
modulated code words is proportional to the Hamming
distance between the corresponding binary code words:

‖x_1 − x_2‖² = 4 E_s d_H(c_1, c_2).
Assuming equiprobable code word transmission, the word error
probability after soft-decision decoding is given by

P_w(e) = 2^{−k} Σ_{u=1}^{2^k} P( ⋃_{v≠u} { ‖x_u + z − x_v‖² < ‖z‖² } )
       ≤ 2^{−k} Σ_{u=1}^{2^k} Σ_{v≠u} P( ‖x_u + z − x_v‖² < ‖z‖² )
       = 2^{−k} Σ_{u=1}^{2^k} Σ_{v≠u} Q( ‖x_u − x_v‖ / √(2N_0) )
       ≈ (2^k − 1) Q( √(2 d_min E_s/N_0) )
Then, we have the following approximation of the bit error
probability:

P_b(e) ≈ exp(−d_min E_s/N_0) = exp(−d_min (k/n) E_b/N_0).

Notice that E_s = kE_b/n in this case because of the presence
of the code.
Without coding, the bit error probability is given by

P_b(e) = Q(√(2E_s/N_0)) ≈ exp(−E_s/N_0) = exp(−E_b/N_0).

Notice that E_s = E_b in this case.
Then, the asymptotic coding gain with soft-decision (ML)
decoding and binary PAM is given by

10 log_10( (k/n) d_min ) dB.     (51)

Comparing (50) and (51), we can see that for large Hamming
distance d_min, the gain advantage of SD decoding with
respect to HD decoding is approximately equal to 3 dB.
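The two gains (50) and (51) are easy to tabulate; a minimal sketch, evaluated for the (7,4) Hamming code:

    import numpy as np

    def coding_gains_db(n, k, d_min):
        # asymptotic coding gains (50) (HD) and (51) (SD)
        t_plus_1 = (d_min + 1) // 2        # floor((d_min + 1)/2) = t + 1
        g_hd = 10 * np.log10(k / n * t_plus_1)
        g_sd = 10 * np.log10(k / n * d_min)
        return g_hd, g_sd

    print(coding_gains_db(7, 4, 3))   # about (0.58 dB, 2.34 dB)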
Problem set 6
1 Write all the code words of the binary (7,4) Hamming code
  with generator matrix

      G = [ 1000011
            0100101
            0010110
            0001111 ].     (52)

2 Calculate the weight distribution of the previous code.
3 Calculate all the generator matrices for Reed-Muller codes
  with m = 3 and r = 0, 1, 2.
4 Assuming that all the coset leaders in the standard array of
  the (8, 4) Reed-Muller code have Hamming weight lower than
  or equal to 2, calculate the word error probability after
  complete decoding.
5 Write a standard array for the code with generator matrix

      G = [ 1 0 1 1 0
            0 1 1 0 1 ].
Coding bounds
We consider binary codes specified by the parameters
(n, k, d_min), representing the code length, dimension, and
minimum distance, respectively.
Previous results have shown that the coding gain increases
with the product of the code rate k/n by the packing radius
(with HD decoding) or with the minimum distance (with SD
decoding).
Thus, a valid design goal for a block code consists in
maximizing the minimum distance for a given code rate.
In order to check what is the maximum minimum distance for
given code parameters (n, k), we can resort to coding bounds.
More precisely, we have two categories of coding bounds:
existence bounds and achievable bounds.
Existence bounds give conditions for the existence of the
code. For example, for every (n, k) code the minimum
distance cannot be greater than a certain value.
These bounds are necessary for the existence of an
(n, k, d_min) code.
Achievable bounds give conditions that can be achieved only
by some codes. These bounds provide a lower bound on the
minimum distance of the best code, i.e., the one with greatest
minimum distance among (n, k) codes.
These bounds guarantee the existence of an (n, k, d_min) code.
Among the existence bounds, we consider the Singleton,
Hamming, and Plotkin bounds.
Among the achievable bounds, we consider the Gilbert and
Varshamov bounds.
In the following we define a sphere of radius t and dimension
n, centered at c, as the set {v ∈ {0, 1}^n : d_H(v, c) ≤ t}.
The volume of this sphere is the number of points it contains
and is given by

V_{t,n} ≜ Σ_{i=0}^{t} C(n, i).

Notice that C(n, i) represents the number of binary vectors
having i 1s and (n − i) 0s.
Therefore, C(n, i) is the volume of the shell of the sphere
containing all points at distance i from the center.
Singleton bound. If a binary linear (n, k, d_min) code exists,
then

d_min ≤ n − k + 1.     (53)

Proof. Every linear code can be represented by a generator
matrix in row-echelon form:

G = (I_k, P).

Plainly, the Hamming weight of each row of G cannot exceed
1 + (n − k), and each row is a code word.
Thus, the minimum Hamming distance cannot exceed
n − k + 1.
Hamming or sphere-packing bound. If a binary (n, k, d_min)
code exists, then

V_{t,n} = Σ_{i=0}^{t} C(n, i) ≤ 2^{n−k}.     (54)

This inequality yields a bound on the maximum t, and the
corresponding minimum distance is d_min = 2t + 2.
Proof. If t is the packing radius of the code, every code word
is surrounded by all the n-bit vectors at Hamming distance
lower than or equal to t, and the distance of all these words
from any other code word is greater than t.
Therefore, since there are 2^k code words, each occupying the
volume of a sphere of radius t in the space of binary vectors of
length n containing 2^n points, we have the following
inequality:

2^k V_{t,n} = 2^k Σ_{i=0}^{t} C(n, i) ≤ 2^n.

This inequality implies the bound (54).
Plotkin bound. If a binary linear (n, k, d_min) code exists, then

d_min ≤ n 2^{k−1} / (2^k − 1).     (55)

Proof. We build a 2^k × n array of binary code words.
Assume that a given column contains n_0 0s and n_1 1s, and
consider a code word having 1 at that column position.
Since the code is linear, adding this code word to all the code
words in the array reproduces the same array up to a row
permutation.
The number of 0s and 1s in that column becomes n_1 and n_0,
respectively.
Therefore, we have n_0 = n_1 = 2^{k−1} for each column.
Thus, we can calculate the average Hamming weight of the
nonzero code words:

w̄_H = n 2^{k−1} / (2^k − 1).

Finally, since the minimum Hamming weight cannot exceed its
average value, we have the bound (55).
Gilbert bound. A binary linear (n, k, d_min) code exists if

V_{d_min−1, n} = Σ_{i=0}^{d_min−1} C(n, i) < 2^{n−k+1}.     (56)

Proof. This result is proved by an iterative construction of the
generator matrix, choosing its rows in order to keep the
minimum distance greater than or equal to the target d_min.
First of all, we notice that a linear (n, k) code with minimum
distance d_min exists if a k × n generator matrix G exists and
all its row combinations have minimum Hamming weight d_min.
We fill the first row of G with an arbitrary binary vector with
Hamming weight greater than or equal to d_min.
Then, for ℓ = 2 to k, we choose the ℓth row in order to keep
the minimum distance above the threshold limit d_min.
This can be done according to the following steps.
Assume that the first (ℓ − 1) linearly independent rows of G
are given and the minimum Hamming weight of all their
possible nonzero linear combinations is at least d_min.
Then, if we write all the linear combinations of the first (ℓ − 1)
rows of G as c_1, c_2, . . . , c_{2^{ℓ−1}−1}, then

w_H(c_i) ≥ d_min,   i = 1, . . . , 2^{ℓ−1} − 1,

whenever c_i ≠ 0.
Let G_ℓ be the ℓth row of G to be determined.
If

w_H(c_i + G_ℓ) = d_H(c_i, G_ℓ) ≥ d_min,   i = 1, . . . , 2^{ℓ−1} − 1,

then the linear code generated by the first ℓ rows of G has
minimum Hamming distance greater than or equal to d_min.
Now, we can guarantee the existence of at least one vector G_ℓ
with the previous property if the vectors at distance up to
d_min − 1 from c_1, . . . , c_{2^{ℓ−1}−1} do not fill completely the
binary space {0, 1}^n.
The number of these vectors is, at most, 2^{ℓ−1} V_{d_min−1, n}.
Thus, if (but not only if)

2^{ℓ−1} V_{d_min−1, n} < 2^n,     (57)

we can find the matrix row G_ℓ such that the code generated
by the first ℓ rows of G has minimum distance at least d_min.
The previous argument provides the inequalities (57) for
ℓ = 2, . . . , k.
Among them, the most restrictive is the one corresponding to
ℓ = k, which implies all the others.
In conclusion, this provides the Gilbert bound (56).
Varshamov bound. A binary linear (n, k, d_min) code exists if

V_{d_min−2, n} = Σ_{i=0}^{d_min−2} C(n, i) < 2^{n−k}.     (58)

Proof. The proof is similar to that of the Gilbert bound, but in
this case we construct iteratively the parity-check matrix.
A sketch of the proof is given as follows.
A linear (n, k) code with minimum distance d_min exists if an
(n − k) × n parity-check matrix H exists and any sum of i
columns of H is nonzero for 1 ≤ i ≤ d_min − 1.
Assume the first (n − 1) columns of H to be given.
The nth column cannot be any sum of i of the first (n − 1)
columns of H for i = 0, . . . , d_min − 2.
A valid nth column can then be found as long as the number of
these sums, Σ_{i=0}^{d_min−2} C(n, i), is smaller than the number
of binary words of length (n − k), i.e., 2^{n−k}.
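All five bounds reduce to elementary arithmetic on V_{t,n}; a minimal Python sketch (helper names ours) that also answers the kind of question posed in Problem set 7 below:

    from math import comb

    def V(t, n):
        # sphere volume V_{t,n} = sum_{i=0}^{t} C(n, i)
        return sum(comb(n, i) for i in range(t + 1))

    def singleton(n, k, d): return d <= n - k + 1                        # (53)
    def hamming(n, k, d):   return V((d - 1) // 2, n) <= 2 ** (n - k)    # (54)
    def plotkin(n, k, d):   return d <= n * 2 ** (k - 1) / (2 ** k - 1)  # (55)
    def gilbert(n, k, d):   return V(d - 1, n) < 2 ** (n - k + 1)        # (56)
    def varshamov(n, k, d): return V(d - 2, n) < 2 ** (n - k)            # (58)

    n, k = 8, 4
    upper = max(d for d in range(1, n + 1)
                if singleton(n, k, d) and hamming(n, k, d) and plotkin(n, k, d))
    lower = max(d for d in range(1, n + 1)
                if gilbert(n, k, d) or varshamov(n, k, d))
    print(lower, upper)   # -> 3 4 for the best (8, 4) code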
Problem set 7
1 Calculate the lower and upper bounds to the minimum
distance of the best binary (8, 4) code.
2 Calculate the lower and upper bounds to the minimum
distance of the best binary (16, 8) code.
3 What is the minimum length n of a binary code of rate 1/3
required to achieve a soft-decision coding gain of 2 dB?
ISI and Equalization
Section Outline
5 ISI and Equalization
  Channel model
  Nyquist criterion
  Receiver design
  Truncation effects
  Eye diagram
  Matched filter
  Problem set 8
  Equalization
  Problem set 9
Channel model
We consider the transmission of the complex envelope of a
linear digital modulated signal

s(t) = Σ_n a_n g(t − nT) = [Σ_n a_n δ(t − nT)] * g(t),     (59)

where g(t) is the modulating pulse.
Assume the signal is transmitted over a baseband equivalent
communication channel with impulse response g_C(t).
The corresponding received signal is

r(t) = s(t) * g_C(t) + z(t)
     = [Σ_n a_n δ(t − nT)] * g(t) * g_C(t) + z(t).     (60)
The additive baseband equivalent noise term z(t) has
constant power density spectrum equal to N_0.
The receiver is based on a filter with impulse response g_R(t)
followed by a sampler taking samples at times t_n = t_0 + nT.
The signal output from the receive filter is

y(t) = r(t) * g_R(t)
     = [Σ_n a_n δ(t − nT)] * g(t) * g_C(t) * g_R(t) + z(t) * g_R(t).     (61)
The channel model is described in the following block
diagram:

[Block diagram: Σ_n a_n δ(t − nT) → G(f) → G_C(f) → adder (+ noise z(t)) → G_R(f) → sampler at t_n → y(t_n).]

Then, we define the overall channel impulse response

h(t) ≜ g(t) * g_C(t) * g_R(t)     (62)

and the received noise as

ν(t) ≜ z(t) * g_R(t).     (63)
As a result, the output signal from the receive filter is given by

y(t) = Σ_n a_n h(t − nT) + ν(t).     (64)

The nth sampled output after the receive filter is given by

y_n ≜ y(t_n) = Σ_m a_m h(t_n − mT) + ν(t_n)
            = Σ_m a_m h(t_{n−m}) + ν(t_n).     (65)

(In fact, t_{n−m} = t_0 + (n − m)T = t_0 + nT − mT = t_n − mT.)
Defining the samples h_n ≜ h(t_n) and ν_n ≜ ν(t_n), we obtain
the discrete equivalent baseband channel equation

y_n = Σ_m a_m h_{n−m} + ν_n = Σ_m h_m a_{n−m} + ν_n.     (66)

Notice that the first term on the rhs is the discrete convolution
of the data sequence a_n and the discrete baseband channel
impulse response sequence h_n.
Moreover, eq. (66) can be written in the following form:

y_n = h_0 a_n + Σ_{m≠0} h_m a_{n−m} + ν_n.     (67)
        (1)           (2)             (3)
The three terms in this expression are interpreted as follows:
(1) is the useful term, since it contains the symbol transmitted at
    the nth discrete time multiplied by the coefficient h_0, which
    depends on the transmission chain (modulation pulse,
    channel, receive filter).
(2) is called intersymbol interference (ISI) and can be assimilated
    to a disturbance like the noise.
(3) is the filtered noise sample.
Usually, the effect of the ISI is greater than the effect of noise,
so that a common strategy is aimed at removing ISI regardless
of the consequences on the filtered noise.
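A minimal simulation of the discrete channel (66) in Python (tap values and noise level are illustrative choices, not from the original notes):

    import numpy as np

    rng = np.random.default_rng(0)
    h = np.array([1.0, 0.2, 0.1])             # channel taps h_0, h_1, h_2
    a = rng.choice([-1.0, 1.0], size=1000)    # binary PAM symbols
    nu = 0.1 * rng.standard_normal(1000)      # filtered noise samples
    y = np.convolve(a, h)[:1000] + nu         # y_n = sum_m h_m a_{n-m} + nu_n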
Nyquist criterion
It is possible to design the receive filter in order to remove
completely the ISI.
This condition is met if we have h_n = h(t_n) = 0 for all n ≠ 0.
Since t_n = t_0 + nT, this is equivalent to the following
equation:

h(t + t_0) Σ_n δ(t − nT) = Σ_n h(t_n) δ(t − nT) = h(t_0) δ(t).     (68)
The Fourier transform of the Dirac train

δ_T(t) ≜ Σ_n δ(t − nT)

is given by

F[δ_T(t)] = Σ_n e^{−j2πnfT} = (1/T) Σ_n δ(f − n/T) = δ_{1/T}(f)/T.     (69)

From a basic property of the Fourier transform, we know that,
if H(f) = F[h(t)], then

H̃(f) ≜ F[h(t + t_0)] = e^{j2πft_0} H(f).
Inserting the previous results in eq. (68), after using the basic
properties of Fourier transforms, we obtain the following
equation:

H̃(f) * [(1/T) Σ_n δ(f − n/T)] = h(t_0).

Finally, we can rewrite the previous result as

Σ_n H̃(f − n/T) = h(t_0) T,     (70)

which is commonly referred to as the Nyquist criterion.
Receiver design based on Nyquist pulses
A common approach to the design of the modulation pulse
and of the receive filter is based on the Nyquist criterion.
Then, a transfer function H̃(f) is chosen satisfying eq. (70).
The raised cosine is the most popular transfer function used
for the design of communication systems.
Its normalized form is given by setting

H̃_RC(f; α, T) ≜
  T                                                for |f| < (1−α)/(2T)
  (T/2) {1 + cos[(πT/α)(|f| − (1−α)/(2T))]}        for (1−α)/(2T) ≤ |f| ≤ (1+α)/(2T)
  0                                                for |f| > (1+α)/(2T)
     (71)
The parameter α, 0 ≤ α ≤ 1, in H̃_RC(f; α, T) is the roll-off
factor.
The following diagram plots the raised cosine transfer function
with roll-off factor α = 0.25:

[Figure: H̃(f) versus f; the response is flat up to (1−α)/(2T), rolls off around ±1/(2T), and vanishes beyond (1+α)/(2T).]

It is plain to see that the sum Σ_n H̃(f − n/T) is constant,
and the Nyquist criterion is satisfied.
The roll-off factor represents the fractional increase of the
occupied bandwidth, (1 + α)/(2T), with respect to the
Shannon bandwidth, 1/(2T).
The raised cosine impulse response is given by

h̃_RC(t; α, T) ≜ F^{−1}[H̃_RC(f; α, T)]
             = sinc(t/T) cos(παt/T) / (1 − 4α²t²/T²)     (72)
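A minimal NumPy sketch of (72); the removable singularity at |t| = T/(2α) is patched with its limit, (π/4) sinc(1/(2α)), and the code assumes 0 < α ≤ 1:

    import numpy as np

    def raised_cosine(t, alpha, T=1.0):
        # h(t) = sinc(t/T) cos(pi a t / T) / (1 - 4 a^2 t^2 / T^2), eq. (72)
        t = np.asarray(t, dtype=float)
        den = 1.0 - (2.0 * alpha * t / T) ** 2
        num = np.sinc(t / T) * np.cos(np.pi * alpha * t / T)
        limit = (np.pi / 4.0) * np.sinc(1.0 / (2.0 * alpha))
        safe = np.where(np.isclose(den, 0.0), 1.0, den)
        return np.where(np.isclose(den, 0.0), limit, num / safe)

    t = np.arange(-4.0, 4.0, 1.0 / 16)
    h = raised_cosine(t, alpha=0.25)   # samples of the pulse plotted below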
The raised cosine impulse response with α = 0.25 is plotted
as follows:

[Figure: h̃(t) versus t over (−3T, 3T); the pulse equals 1 at t = 0 and vanishes at all nonzero integer multiples of T.]
Truncation effects on Nyquist pulses
In a physically realizable system implementation, the impulse
response must be causal, i.e., h(t) = 0 for t < 0.
To obtain a causal impulse response we can truncate it and
shift it to the right by t_0 time units.
Starting from the raised cosine impulse response h̃_RC(t; α, T)
and truncating it over the interval (−ℓT, ℓT), we obtain

h̃_RC,trunc(t; α, T, ℓ) ≜ h̃_RC(t; α, T) 1_{|t| < ℓT}.     (73)

In this way, the impulse response has a duration of 2ℓ symbol
times.
Shifting h̃_RC,trunc(t; α, T, ℓ) to the right by ℓT time units we
obtain

h_RC,trunc(t; α, T, ℓ) = h̃_RC,trunc(t − ℓT; α, T, ℓ),     (74)

whose support interval is (0, 2ℓT).
The time-shifting operation corresponds to a phase rotation of
the transfer function:

H_RC,trunc(f; α, T, ℓ) = H̃_RC,trunc(f; α, T, ℓ) e^{−j2πft_0}     (75)

where t_0 = ℓT is the 0th sampling time.
The result is an approximation of the raised cosine transfer
function which gets better as ℓ increases.
For α = 0.25 and ℓ = 4 we have:

[Figure: truncated and shifted pulse h_RC,trunc(t), with support (0, 8T) and peak at t = 4T.]

The following diagrams illustrate the transfer function for
different ℓ values, with T = 1 and positive values of f, since
H̃(−f) = H̃(f).
[Figure: raised cosine transfer function truncated over (−T, T), for roll-off α = 0, 0.1, 0.25, 0.5, 1.]

[Figure: raised cosine transfer function truncated over (−2T, 2T), same roll-off values.]

[Figure: raised cosine transfer function truncated over (−4T, 4T), same roll-off values.]

[Figure: raised cosine transfer function truncated over (−8T, 8T), same roll-off values.]
The following pictures show the fraction of power of the
truncated raised cosine transfer function within the bandwidth
W, for different values of roll-off and truncation interval.
More precisely, we define the truncated raised cosine transfer
function as

H̃_RC,trunc(f; α, T, ℓ) = ∫_{−ℓT}^{ℓT} h̃_RC,trunc(t; α, T) e^{−j2πft} dt.

Then, the power fraction function is defined as

η(W) = ∫_{−W}^{W} |H̃_RC,trunc(f; α, T, ℓ)|² df / ∫_{−∞}^{∞} |H̃_RC,trunc(f; α, T, ℓ)|² df.
[Figure: power fraction η(W) of the raised cosine truncated over (−T, T), for roll-off α = 0.1, 0.25, 0.5, 1.]

[Figure: power fraction η(W) of the raised cosine truncated over (−2T, 2T), same roll-off values.]

[Figure: power fraction η(W) of the raised cosine truncated over (−4T, 4T), same roll-off values.]

[Figure: power fraction η(W) of the raised cosine truncated over (−8T, 8T), same roll-off values.]
The following table summarizes the previous diagrams by
reporting the power fraction at the nominal bandwidth
(1 + α)/(2T).

α      ℓ     η((1+α)/(2T))   |   α      ℓ     η((1+α)/(2T))
0.1    0.5   0.824390        |   0.1    2     0.993237
0.25   0.5   0.872942        |   0.25   2     0.998809
0.5    0.5   0.924796        |   0.5    2     0.999878
1      0.5   0.965806        |   1      2     0.999995
0.1    1     0.977944        |   0.1    4     0.998811
0.25   1     0.991655        |   0.25   4     0.999920
0.5    1     0.998239        |   0.5    4     0.999997
1      1     0.999833        |   1      4     1.000000
We can see from the table that, allowing a maximum
out-of-band power of 1% of the total power, a roll-off
α = 0.25 is enough to keep the truncation interval at 2T
(the case ℓ = 1 in the table, with η = 0.991655).
In fact, a roll-off α = 0.223 is already sufficient to keep the
out-of-band power below 1%.
In other words, this corresponds to a raised cosine truncated
pulse ranging over two signalling periods.

[Figure: truncated raised cosine pulse h̃_trunc(t), shifted to the support interval (0, 2T).]
Eye diagram
The common method to visualize the effect of ISI is the eye
diagram.
The eye diagram is the superposition of many translated (by
integer multiples of the symbol interval) replicas of the receive
filter output y(t) in the absence of noise.
The shape of the eye diagram depends on the truncation
interval of the overall channel impulse response.
The following diagrams refer to a 2-PAM modulation based
on a raised cosine pulse with roll-off α = 0.25, truncation
intervals of 6 and 8 symbol intervals, respectively, and an
observation window of 2 symbol intervals.
[Figure: eye diagram, 2-PAM, raised cosine with α = 0.25, truncation interval of 6 symbol intervals; y(t) versus t/T over (−1, 1).]
[Figure: eye diagram, 2-PAM, raised cosine with α = 0.25, truncation interval of 8 symbol intervals.]
The difference between the two diagrams is that the second
contains many more signal paths, as a result of the longer
truncation interval, which includes more signal tails in the
diagram.
The following diagram refers to 4-PAM with a truncation
interval of 4 symbol intervals.
[Figure: eye diagram, 4-PAM, truncation interval of 4 symbol intervals.]
The following diagram refers again to 2-PAM with a truncation
interval of 8 symbol intervals but with greater roll-off, α = 1.
[Figure: eye diagram, 2-PAM, roll-off α = 1, truncation interval of 8 symbol intervals.]
We notice that the signal paths in the case of roll-off α = 1
are much less spread than in the case of α = 0.25, since the
signal tails decrease more rapidly.
In all cases, the signal paths pass through fixed amplitude
levels at integer multiples of the symbol time T.
This is due to the fact that the raised cosine pulse satisfies the
Nyquist criterion, so there is no ISI at these sampling times.
In the presence of ISI, things change drastically, as illustrated
by the following diagram corresponding to

h(t) = t e^{−t/(0.5T)} u(t)

truncated to 8 symbol intervals.
[Figure: eye diagram for h(t) = t e^{−t/(0.5T)} u(t); the maximum eye opening is marked between the nominal sampling instants.]
The signal paths do not pass through fixed points at the
sampling instants because of the presence of ISI.
The eye diagram gives a measure of the quality of the
transmission system through the horizontal and vertical
amplitude of the eye (the inner area).
The larger the vertical amplitude, the better the noise
immunity.
The larger the horizontal amplitude, the less sensitive the
system is to sampling time errors.
The availability of inexpensive digital signal processing devices
has reduced the importance of eye diagram calibration in
modern communication systems, which rely on heavy
numerical algorithms for equalization.
The advantage of such systems is that simple and cheap
low-pass filters can be used instead of more precise filters
designed upon the raised cosine transfer function.
Moreover, system tuning based on the eye diagram or similar
tools has become unnecessary in most cases.
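Eye diagrams like the ones above are easy to generate numerically; a minimal sketch (all parameter choices ours, reusing the raised_cosine sketch given earlier):

    import numpy as np

    def eye_traces(h, sps=16, n_sym=400, levels=(-1.0, 1.0), span=2, seed=0):
        # overlap windows of `span` symbols of the noiseless output
        # y(t) = sum_n a_n h(t - nT), sampled at sps points per symbol
        rng = np.random.default_rng(seed)
        a = rng.choice(levels, size=n_sym)
        x = np.zeros(n_sym * sps)
        x[::sps] = a                         # impulse train of symbols
        y = np.convolve(x, h)
        win = span * sps
        start = len(h)                       # skip the initial transient
        n_tr = (len(y) - start) // win
        return y[start:start + n_tr * win].reshape(n_tr, win)

    t = np.arange(-4.0, 4.0, 1.0 / 16)
    traces = eye_traces(raised_cosine(t, 0.25))
    # plotting all rows of `traces` superimposed yields the eye diagram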
Matched filter
Theorem. Given the modulation pulse g(t) and the channel
impulse response g_C(t), the optimum receive filter is the
matched filter with impulse response

g_R(t) = g_T(t_0 − t)*,     (76)

where g_T(t) ≜ g(t) * g_C(t).
Note. The transfer function of the matched filter is

G_R(f) = ∫ g_T(t_0 − t)* e^{−j2πft} dt
       = e^{−j2πft_0} ∫ g_T(u)* e^{j2πfu} du
       = e^{−j2πft_0} G_T(f)*.
Proof. This result can be derived as in the case of a symbol
time limited pulse g_T(t) considered before by properly
extending the concept.
Assuming a finite span of input data symbols a_n ∈ 𝒜 for
0 ≤ n ≤ N − 1, the signal Σ_n a_n g_T(t − nT) can be
embedded in a signal space

𝒮 = span( g_T(t), g_T(t − T), . . . , g_T(t − (N − 1)T) ),

where the signals g_T(t − nT) are not orthogonal.
We can use the Gram-Schmidt algorithm to derive an
equivalent orthogonal base

φ_n(t) = Σ_{m=0}^{n} ε_{nm} g_T(t − mT),   n = 0, . . . , N − 1.
Then, the correlation receiver can be obtained by calculating
the inner products (r(t), φ_n(t)), where

r(t) = Σ_n a_n g_T(t − nT) + z(t).

However, by using the base obtained from the GS algorithm,
we obtain

(r(t), φ_n(t)) = Σ_{m=0}^{n} ε_{nm} (r(t), g_T(t − mT)).

As a result, the inner products (r(t), g_T(t − nT)) provide all
the required information for the optimum detection of the
transmitted data symbols.
These inner products can be calculated by using a matched
filter with impulse response g_R(t) = g_T(t_0 − t)* sampled at
times t = t_n = t_0 + nT. In fact,

y(t_n) = ∫ r(t) g_R(t_n − t) dt
       = ∫ r(t) g_T(t_0 − t_n + t)* dt
       = ∫ r(t) g_T(t − nT)* dt
       = (r(t), g_T(t − nT)).

The sampling time t_0 must be chosen to make g_R(t) causal.
Equivalently, since g_R(t) = g_T(t_0 − t)*, we must have
g_T(t) = 0 for t > t_0.
Applying the matched filter concept, a receiver design based
on the raised cosine transfer function can be carried out by
assuming

G(f) = G_R(f) = H̃_RC(f; α, T)^{1/2},

after truncation and time-shifting for causality.
The corresponding impulse response can be obtained
analytically:

g(t) = g_R(t) = F^{−1}[H̃_RC(f; α, T)^{1/2}]
     = h̃_RRC(t; α, T)
     ≜ ( 4αt cos[π(1 + α)t/T] + T sin[π(1 − α)t/T] ) / ( πt (1 − 16α²t²/T²) )     (77)
The following diagram plots the root raised cosine impulse
response with α = 0.25:

[Figure: root raised cosine impulse response h̃_RRC(t) for α = 0.25.]

Also in this case we can carry out a numerical analysis of the
truncation effects on the fractional power content.
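A NumPy sketch of (77); the removable singularities at t = 0 and |t| = T/(4α) are patched with their standard limit values, and the code assumes 0 < α ≤ 1 and an array input:

    import numpy as np

    def root_raised_cosine(t, alpha, T=1.0):
        x = np.asarray(t, dtype=float) / T
        s0 = np.isclose(x, 0.0)                         # singularity at t = 0
        s1 = np.isclose(np.abs(x), 1.0 / (4 * alpha))   # singularities |t| = T/(4a)
        xs = np.where(s0 | s1, 0.1, x)                  # dummy abscissa, patched below
        num = 4 * alpha * xs * np.cos(np.pi * (1 + alpha) * xs) \
              + np.sin(np.pi * (1 - alpha) * xs)
        den = np.pi * xs * (1 - (4 * alpha * xs) ** 2)
        h = num / den
        h[s0] = 1 - alpha + 4 * alpha / np.pi
        h[s1] = (alpha / np.sqrt(2)) * (
            (1 + 2 / np.pi) * np.sin(np.pi / (4 * alpha))
            + (1 - 2 / np.pi) * np.cos(np.pi / (4 * alpha)))
        return h

    g = root_raised_cosine(np.arange(-4.0, 4.0, 1.0 / 16), alpha=0.25)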
Problem set 8
1 Consider a communication channel where the symbol interval
  is T = 1, the modulating pulse is g(t) = u(t) − u(t − 2), the
  channel impulse response is g_C(t) = δ(t) − 0.5 δ(t − 1), and
  the receiver is based on a matched filter.
  Calculate the matched filter's impulse response g_R(t).
  Calculate the overall impulse response h(t).
  Calculate the error probability corresponding to the
  transmission of binary PAM symbols. The noise variance is

      σ² = (N_0/2) ∫ |G_R(f)|² df.
Introduction
Instead of designing an overall channel transfer function
complying with the Nyquist criterion, we accept the presence
of ISI and try to remove it by using equalization techniques.
The resulting receiver relaxes the design constraints on the
receive filter (which may be very expensive) at the price of
heavier digital signal processing requirements (which become
less and less expensive as technology evolves).
We consider two types of linear equalizers: zero forcing (ZF)
and linear minimum mean-square error (LMMSE).
Equalization techniques rely on some methods from linear
algebra which are recalled in the following.
Some linear algebra
The notation A = (a_ij), i = 1, . . . , m, j = 1, . . . , n,
corresponds to the m × n matrix

A = [ a_11  a_12  . . .  a_1n
      a_21  a_22  . . .  a_2n
       .     .    . . .   .
      a_m1  a_m2  . . .  a_mn ]

and (A)_ij = a_ij is the element in the ith row and the jth
column.
The matrix A^T is the transpose of A and is defined by
(A^T)_ij = (A)_ji.
The matrix A^H is the Hermitian transpose of A and is
defined by (A^H)_ij = (A)_ji*.
The trace of a square n × n matrix A is defined by

Tr(A) = Σ_{i=1}^{n} (A)_ii.

The trace has the property Tr(AB) = Tr(BA). In fact,

Tr(AB) = Σ_i (AB)_ii = Σ_{i,j} (A)_ij (B)_ji = Σ_{i,j} (B)_ij (A)_ji = Tr(BA).
The squared Euclidean norm of a matrix A is

‖A‖² = Tr(AA^H).

The transposition, Hermitian transposition, and inverse
operators commute. For example: (A^H)^{−1} = (A^{−1})^H.
A square matrix A is symmetric if A = A^T. A square matrix
A is Hermitian if A = A^H.
The expression

x^H A x = Σ_{i,j} x_i* (A)_ij x_j

is a quadratic form in the variable vector x. If A is Hermitian,
then the quadratic form is real because

(x^H A x)* = (x^H A x)^H = x^H A^H x = x^H A x.
A Hermitian matrix is positive definite if x^H A x > 0 for all
nonzero complex vectors x. The eigenvalues of A are all real
and positive.
A Hermitian matrix is positive semidefinite if x^H A x ≥ 0 for
all complex vectors x. The eigenvalues of A are all real and
nonnegative.
Partial ordering relations are defined for Hermitian matrices:
A > B  ⟺  A − B is positive definite.
A ≥ B  ⟺  A − B is positive semidefinite.
Equalization
We start by rewriting the channel equation in matrix form for
the first N received signal samples:

y = Ha + z,

where y = (y_0, y_1, . . . , y_{N−1})^T, a = (a_0, a_1, . . . , a_{N−1})^T,
z = (z_0, z_1, . . . , z_{N−1})^T, and H is the N × N lower
triangular Toeplitz matrix built from the channel samples:

H = [ h_0   0    0   . . .   0    0
      h_1  h_0   0   . . .   0    0
      h_2  h_1  h_0  . . .   0    0
       .    .    .   . . .   .    .
      . . . . . . . . h_2   h_1  h_0 ].

Two types of linear equalization techniques exist:
Zero Forcing (ZF) equalization.
Linear MMSE equalization.
ZF equalization consists in removing completely the ISI by
multiplying the received vector by the matrix W_ZF = H^{−1}
before making decisions.
The main drawback of ZF equalization is noise enhancement,
i.e., the increase of the average noise power after equalization.
Linear MMSE equalization is obtained by minimizing the MSE
between the transmitted symbol vector a and the set of
vectors Wy, which are linear mappings of the received vector.
A ZF equalizer outputs the estimate

â_ZF = H^{−1} y = a + H^{−1} z.
According to the definition, a linear MMSE equalizer outputs
the estimate

â_LMMSE = ( arg min_W E[‖Wy − a‖²] ) y.

Expanding the MSE and defining the covariance matrices
Σ_aa = E[aa^H], Σ_ay = E[ay^H], Σ_ya = E[ya^H], and
Σ_yy = E[yy^H], we obtain the following expression:

ε²(W) = Tr( E[(Wy − a)(Wy − a)^H] )
      = Tr( E[Wyy^H W^H − Wya^H − ay^H W^H + aa^H] )
      = Tr( W Σ_yy W^H − W Σ_ya − Σ_ay W^H + Σ_aa ).
Elaborating the previous expression, we can see that

ε²(W) = Tr( (W − Σ_ay Σ_yy^{−1}) Σ_yy (W − Σ_ay Σ_yy^{−1})^H + Σ_aa − Σ_ay Σ_yy^{−1} Σ_ya ),

so that the matrix W = Σ_ay Σ_yy^{−1} minimizes the MSE.
Typically, the matrix W is derived by assuming iid zero mean
transmitted symbols with E[|a_n|²] = 1 and iid noise
components with E[|z_n|²] = σ².
Since

Σ_yy = E[(Ha + z)(Ha + z)^H] = HH^H + σ² I_N
Σ_ay = E[a(Ha + z)^H] = H^H

we have:

W_LMMSE = H^H (HH^H + σ² I_N)^{−1} = (H^H H + σ² I_N)^{−1} H^H

(this matrix identity will be derived in a problem).
Then, we can write the linear MMSE estimate of the
transmitted symbols as

â_LMMSE = W_LMMSE y.
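Both equalizers take a few lines of NumPy; a minimal sketch that builds H from the taps and forms W_ZF and W_LMMSE (tap values as in Problem set 9 below):

    import numpy as np

    def equalizers(h, N, sigma2):
        H = np.zeros((N, N))
        for m, hm in enumerate(h):
            H += hm * np.eye(N, k=-m)      # place h_m on the mth subdiagonal
        W_zf = np.linalg.inv(H)            # W_ZF = H^{-1}
        W_lmmse = np.linalg.solve(H.conj().T @ H + sigma2 * np.eye(N),
                                  H.conj().T)   # (H^H H + s^2 I)^{-1} H^H
        return H, W_zf, W_lmmse

    H, W_zf, W_lmmse = equalizers([1.0, 0.2, 0.1], N=3, sigma2=0.1)
    # given a received block y: a_zf = W_zf @ y, a_lmmse = W_lmmse @ y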
Equalization: SNR performance
We can now compare the SNRs obtained after ZF and linear
MMSE equalization.
1 With ZF equalization, â_ZF = a + H^{−1} z.
  Thus, assuming iid zero mean transmitted symbols with
  E[|a_n|²] = 1 and iid noise components with E[|z_n|²] = σ², we
  obtain:

      SNR_ZF = N / ( σ² Tr[(H^H H)^{−1}] )     (78)

  since

      E[‖H^{−1} z‖²] = E[Tr(H^{−1} zz^H (H^H)^{−1})]
                    = Tr(H^{−1} E[zz^H] (H^H)^{−1})
                    = σ² Tr[(H^H H)^{−1}].
2 With linear MMSE equalization,

      â_LMMSE = (H^H H + σ² I_N)^{−1} H^H H a + (H^H H + σ² I_N)^{−1} H^H z
              = a − σ² (H^H H + σ² I_N)^{−1} a + (H^H H + σ² I_N)^{−1} H^H z.

  Thus, assuming iid zero mean transmitted symbols with
  E[|a_n|²] = 1 and iid noise components with E[|z_n|²] = σ², we
  obtain:

      SNR_LMMSE = N / Tr[ σ⁴ (H^H H + σ² I_N)^{−2}
                          + σ² (H^H H + σ² I_N)^{−1} H^H H (H^H H + σ² I_N)^{−1} ]
                = N / Tr[ σ² (H^H H + σ² I_N)^{−1} ]     (79)
Comparing the two SNRs in (78) and (79), we can see that
the former is always lower than the latter.
Therefore, linear MMSE equalization is always better than ZF
equalization because

H^H H < H^H H + σ² I_N
⟹ (H^H H)^{−1} > (H^H H + σ² I_N)^{−1}
⟹ Tr[(H^H H)^{−1}] > Tr[(H^H H + σ² I_N)^{−1}].

However, it requires a good estimate of the noise variance.
Both equalization algorithms require the inversion of an
N × N matrix. However, the inverse for ZF equalization has
complexity O(N²) while that for linear MMSE equalization
has complexity O(N³).
Several iterative algorithms have been designed to reduce this
computational requirement. One of them is decision
feedback (DF) equalization.
We define T(·) as the minimum distance decision over the
signal set.
Then, DF equalization can be illustrated by the following
steps (a code sketch follows the list):
1 At n = 0, we make the decision â_0 = T(y_0/h_0).
2 At n = 1, we make the decision â_1 = T[(y_1 − h_1 â_0)/h_0].
3 At n = 2, we make the decision
  â_2 = T[(y_2 − h_1 â_1 − h_2 â_0)/h_0], and so on.
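A minimal sketch of these steps in Python (symbol set and tap values are illustrative):

    import numpy as np

    def df_equalize(y, h, symbols=(-1.0, 1.0)):
        # subtract the ISI of past decisions, scale by h_0, then decide T(.)
        symbols = np.asarray(symbols)
        a_hat = np.zeros(len(y))
        for n in range(len(y)):
            isi = sum(h[m] * a_hat[n - m]
                      for m in range(1, min(n, len(h) - 1) + 1))
            u = (y[n] - isi) / h[0]
            a_hat[n] = symbols[np.argmin(np.abs(symbols - u))]   # T(u)
        return a_hat

    h = [1.0, 0.2, 0.1]
    a = np.array([1.0, -1.0, 1.0, 1.0, -1.0])
    y = np.convolve(a, h)[:len(a)]      # noiseless received samples
    print(df_equalize(y, h))            # recovers a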
Problem set 9
1 Show that the squared Euclidean norm of a matrix A is
  ‖A‖² = Tr(AA^H).
2 Check the identity

      H^H (HH^H + σ² I_N)^{−1} = (H^H H + σ² I_N)^{−1} H^H.

3 Calculate in detail the SNR of the LMMSE equalizer.
4 Consider a channel with gains h_0 = 1, h_1 = 0.2, h_2 = 0.1,
  N = 3 iid zero mean transmitted symbols with E[|a_n|²] = 1,
  and iid noise components with E[|z_n|²] = σ².
  Calculate the SNRs with ZF and LMMSE equalization.
  Calculate the SNR without equalization (i.e., considering ISI as
  additive noise).
5 Derive an iterative algorithm to invert the lower triangular
  matrix H. (Hint: the result is a lower triangular matrix as
  well.)