
Lecture Notes of

Optical Communication Theory and Techniques


Part I: Communication Theory and Digital Transmission

Master Degree Program in Computer Science and Networking (MCSN) International Masters of Science in Communication Networks Engineering (IMCNE) Masters on Photonic Networks Engineering (MAPNET)

Contents
Chapter 1. Introduction to Communication Systems
  1.1. Modulation
  1.2. Distortion, Interference, and Noise
  1.3. Classification of Communication Systems
  1.4. Communication Networks
Chapter 2. Mathematical Introduction
  2.1. Definitions
  2.2. Discrete Representation of Signals
  2.3. Complete Bases
  2.4. Discrete Representation of Stochastic Processes
  2.5. Gaussian Random Variables
Chapter 3. Waveform Transmission Over Wideband Channels
  3.1. Introduction
  3.2. Optimum Detection Strategy
  3.3. Additive White Gaussian Noise Channel
  3.4. Optimum Receiver Structures
  3.5. Probability of Error
Chapter 4. Optimum Detection of Stochastic Signals
  4.1. Stochastic Signals in AWGN
  4.2. Signals with Random Amplitude
  4.3. Signals with Random Phase
Chapter 5. Optimum Detection in the Presence of Colored Noise
  5.1. Karhunen-Loève Series Expansion
  5.2. Reversibility Theorem
  5.3. Whitening Filter and Spectral Factorization
Chapter 6. Waveform Transmission Over Dispersive Channels
  6.1. Pulse Amplitude Modulation (PAM)
  6.2. Optimum Receiver for One-Shot Transmission
  6.3. Intersymbol Interference and Nyquist Criterion
  6.4. Power Spectrum of PAM Signals
  6.5. Optimum Terminal Filters

CHAPTER 1

Introduction to Communication Systems


A communication system conveys information from its source to a destination some distance away. By information we mean, in a broad sense, its physical manifestation as a message produced by the source and transmitted using a medium of any nature (electric, electromagnetic, acoustic, optic, etc.). The meaningfulness of a message is not related to its content and duration, but rather to its mere physical existence. In other words, any variation of the state of a medium, generated with the goal of communicating, represents information.

FIGURE 1.1. Block diagram of a communication system: source, input transducer, transmitter (TX), transmission channel, receiver (RX), output transducer, destination.

There are many communication systems, but generally all of them can be reduced in principle to the scheme shown in Fig. 1.1, whose elements are:
(1) A source of information producing the messages to be transmitted. The nature of a message can be diverse: written text, voice, images, a sequence of data, etc.
(2) An input transducer, needed to convert the message to an electrical signal.
(3) A transmitter, needed to convert the electrical signal to a form suitable for transmission through a given medium (the transmission channel). Signal processing for transmission almost always involves modulation and may also include coding.
(4) A transmission medium (channel), bridging the distance from source to destination. It may be a pair of wires, a coaxial cable, a radio wave, or a laser beam.
(5) A receiver, performing the inverse transformation of that operated by the transmitter, for delivery to the output transducer. Receiver operations include amplification to compensate for transmission loss, and demodulation and decoding.
(6) An output transducer that converts the output signal to the desired message form.
(7) A destination, which is the recipient of the information.

1.1. Modulation

In many cases, the communication system is used for transmitting several messages simultaneously, but in such a way that each message, originating from separate sources, can reach the proper destination. As an example, this is the case in radio communications, where free space is used as a transmission channel. Clearly, in order to be distinguished among all the other ones by the receiver, each message should have a proper form. That is to say that a new signal should be created in such a way that it carries the same information as the original message and, at the same time, it can be



distinguished on the basis of a distinctive characteristic and not by its information content. Such signal processing is referred to as modulation, and the new signal is called the modulated signal. It is now clear that the output transducer must be able to extract the information from the modulated signal. The whole communication system from source to destination is then called an information or communication link, regardless of whether it is part of a single or multiple communication system.

1.2. Distortion, Interference, and Noise

In the course of signal transmission, distortion and undesirable contamination by extraneous signals and noise of different nature are unavoidably introduced. Distortion is waveform perturbation caused by the imperfect response of the system to the desired signal itself. In most cases it may be corrected, or at least reduced, with the help of equalizers. Interference caused by extraneous signals from human sources (other transmitters, power lines, machinery, etc.) may be removed by appropriate filtering, to the extent that the interfering signals occupy different frequency bands than the desired signal. Noise constitutes one of the fundamental system limitations. It manifests itself as a random and unpredictable signal, produced by natural processes both internal and external to the system, that corrupts the desired signal, thus altering its information content. Noise produced by electronic devices is practically unavoidable because it is due to the random motion of electrons occurring at any temperature above absolute zero. There are also other types of noise, but thermal noise appears in every communication system. The noise relative to an information signal is measured in terms of the signal-to-noise power ratio S/N. For reliable communication, the ratio S/N should be as high as possible; amplification at the receiver is of no avail, because the noise will be amplified along with the signal. In order to have a sufficiently large S/N at the receiver, besides increasing the transmitted power, the receiver should also be properly designed. Indeed, owing to the different statistical properties of signal and noise, the signal may be enhanced with respect to the noise by means of suitable techniques.

1.3. Classification of Communication Systems

There are many kinds of information sources, including machines as well as people, and messages appear in various forms. Nonetheless, we can identify two distinct message categories, analog and digital. Thus, a communication system may be classified according to the signals obtained by the transducer.

a) Analog systems. The transducer performs a linear transformation on the source message, producing an electrical signal that varies with time, usually in a smooth and continuous fashion. Examples of analog messages are the acoustic pressure produced by a sound, or the light intensity at some point in a video image. Since the information resides in a time-varying waveform, an analog communication system should deliver this waveform with a specified degree of fidelity.


b) Digital systems. The source message, independently from its nature, is transformed into an ordered sequence of symbols selected from a finite set of discrete elements. Examples of digital messages are the printed letters on a page, or the keys pressed at a computer terminal. Since the information resides in discrete symbols, a digital communication system should deliver these symbols with a specified degree of accuracy in a specified amount of time.

1.4. Communication Networks

A communication system is only a part of a communication network, which is a complex system for message exchange, routing, and switching. Although in many cases the information to be transmitted is analog in nature, it is often converted into a digital message before entering the network. Indeed, digital transmission techniques are predominant for many reasons, such as the greater insensitivity of digital signals to channel impairments, the availability of highly reliable, low-cost, small-sized digital circuitry, and, last but not least, the fact that the great variety of signals to be transmitted can be made uniform after digitization. So, the evolution has led to integrated services digital networks (ISDN), where the most diverse services are all interconnected. This scenario involves a large variety of digital signals of different formats that must be routed and switched to their proper destinations. The only limit is the transmission capacity, whose demand is steadily growing, also thanks to the great success of the Internet and of wireless telephony in digital form, together with the advent of digital TV in MPEG format.

CHAPTER 2

Mathematical Introduction
This part of the course is devoted to the representation of the signals used in a communication system. We will see that a function of time may be associated to a vector and that we can translate some operations on signals (correlations, energy differences, etc.) into operations on vectors.

2.1. Definitions

DEFINITION 2.1. Let x(t) be a function defined in the interval (a, b). The energy E of x(t) is defined as the integral¹
E ≜ \int_a^b |x(t)|^2 dt.    (2.1)

¹The symbol ≜ means "equal by definition."

All signals appearing in a communication system can be represented by real or complex valued functions with finite energy in any finite interval of time. We will be dealing with signals belonging to the class of functions with E < ∞ over (a, b), and will denote such a class by L²(a, b). We will write x(t) ∈ L²(a, b) to mean that x(t) is a finite energy function over (a, b).

DEFINITION 2.2. The internal product of x(t) and y(t) is defined as
(x, y) ≜ \int_a^b x(t) y^*(t) dt    (2.2)
where y^*(t) is the complex conjugate of y(t). Note that (x, x) represents the energy of x(t).

DEFINITION 2.3. The norm of x(t), denoted as ||x||, is defined as the square root of its energy
||x|| ≜ \sqrt{\int_a^b |x(t)|^2 dt}.    (2.3)

DEFINITION 2.4. The function x(t) is said to be normalizable if ||x|| ≠ 0. Apparently, if ||x|| = 0, then x(t) = 0 almost everywhere, meaning that x(t) ≠ 0 only for a finite (or infinite but countable) number of values of t in (a, b).

DEFINITION 2.5. The distance between x(t) and y(t) is defined as the norm of x(t) − y(t),
||x − y|| = \sqrt{\int_a^b |x(t) − y(t)|^2 dt}.    (2.4)

If ||x − y|| = 0, x(t) and y(t) coincide almost everywhere. From a practical point of view, they are equivalent and we will consider that x(t) = y(t). Specifically, if the norm of a function is zero, we will assume that it is the zero function.
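These definitions translate directly into numerical computations once the signals are sampled. The following is a minimal sketch (not part of the original notes) that approximates energy, internal product, norm, and distance on (a, b) = (0, 1) by a Riemann sum; the particular signals x(t) = sin(2πt) and y(t) = cos(2πt) are illustrative choices only.

```python
import numpy as np

# Sampled versions of two finite-energy signals on (a, b) = (0, 1)
a, b, n_samples = 0.0, 1.0, 100_000
t = np.linspace(a, b, n_samples, endpoint=False)
dt = (b - a) / n_samples
x = np.sin(2 * np.pi * t)        # x(t), illustrative choice
y = np.cos(2 * np.pi * t)        # y(t), illustrative choice

def inner(u, v):
    # Internal product (u, v) = int_a^b u(t) v*(t) dt  (Definition 2.2)
    return np.sum(u * np.conj(v)) * dt

energy = inner(x, x).real                    # E = (x, x), Definition 2.1
norm_x = np.sqrt(energy)                     # ||x||, Definition 2.3
dist = np.sqrt(inner(x - y, x - y).real)     # ||x - y||, Definition 2.5

print(f"E = {energy:.4f}, ||x|| = {norm_x:.4f}")     # 0.5 and about 0.707
print(f"(x, y) = {inner(x, y):.2e}")                 # ~0: these two signals are orthogonal
print(f"||x - y|| = {dist:.4f}")                     # 1.0, since (x, y) = 0
```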



THEOREM 2.6 (Schwartz inequality). Given any x(t) and y(t), the following is always true:
|(x, y)| ≤ ||x|| ||y||,    (2.5)
where the equal sign is valid if and only if x(t) = αy(t) for some α.

PROOF. If ||y|| = 0, (2.5) is trivially true. Assuming ||y|| ≠ 0, for any α we have
||x − αy||² = ||x||² − 2Re{α^*(x, y)} + |α|² ||y||²    (2.6)
and, letting α = (x, y)/||y||², we obtain
0 ≤ ||x − αy||² = ||x||² − |(x, y)|²/||y||²,    (2.7)
from which (2.5) follows. Moreover, if x(t) = αy(t), then ||x|| = |α| ||y||, |(x, y)| = |α| ||y||², and thus (2.5) is true with the equal sign. Conversely, if (2.5) is true with the equal sign, then (2.7) is also true with the equal sign, and so is (2.6) for α = (x, y)/||y||². It follows that ||x − αy||² = 0, i.e., x(t) = αy(t).

DEFINITION 2.7. Two functions x(t) and y(t) are said to be orthogonal if (x, y) = 0.
DEFINITION 2.8. Let us consider a set of M functions² {x_i(t)}₁^M. Such functions are said to be linearly independent if
\sum_{i=1}^{M} c_i x_i(t) = 0,  t ∈ (a, b)    (2.8)
can only be satisfied for c_i = 0, i = 1, 2, ..., M. Conversely, if (2.8) can be satisfied by a set of M numbers {c_i}₁^M not all equal to zero, the M functions are said to be linearly dependent.

²We will use the abbreviated notation {a_i}₁^N to denote the set {a₁, a₂, ..., a_N}, and specify the index only to avoid ambiguity, as in {a_{ij}}_{i=1}^N.

EXAMPLE 2.9. The functions {1, t, t²} are linearly independent over any interval (a, b). Indeed, assuming that c₁ + c₂t + c₃t² = 0 and differentiating two times with respect to t, we get the following system of equations:
c₁ + c₂ t + c₃ t² = 0
c₂ + 2c₃ t = 0
2c₃ = 0
whose only solution is c₁ = c₂ = c₃ = 0. Thus, the functions {1, t, t²} are linearly independent.

REMARK 2.10. If the M functions {x_i(t)}₁^M are linearly independent, none of them can be identically equal to zero. In fact, if x_k(t) were identically zero, then \sum_{i=1}^{M} c_i x_i(t) = 0 with c_i = 0 for i ≠ k and c_k ≠ 0, and thus {x_i(t)}₁^M would be linearly dependent.

REMARK 2.11. If the M functions {x_i(t)}₁^M are linearly independent, eliminating any one of them, say x_k(t), the remaining M−1 functions {x₁(t), ..., x_{k−1}(t), x_{k+1}(t), ..., x_M(t)} are still linearly independent. Indeed, if they were linearly dependent, we could find a set of M−1 numbers c_i, not all equal to zero, such that
c₁x₁(t) + ... + c_{k−1}x_{k−1}(t) + c_{k+1}x_{k+1}(t) + ... + c_M x_M(t) = 0.


Thus, letting c_k = 0, we would have
\sum_{i=1}^{M} c_i x_i(t) = 0
where not all c_i are zero, meaning that {x_i(t)}₁^M would be linearly dependent.

REMARK 2.12. If the M functions {x_i(t)}₁^M are linearly independent, none of them is a linear combination of the others. Indeed, if
x_k(t) = \sum_{i≠k} c_i x_i(t),
we would have
x_k(t) − \sum_{i≠k} c_i x_i(t) = 0
and thus {x_i(t)}₁^M would be linearly dependent.

EXAMPLE 2.13. Over any interval (a, b), the functions {sin t, cos t} are linearly independent, while {sin t, cos t, cos(t − θ)} are linearly dependent. In fact, assuming that c₁ sin t + c₂ cos t = 0 and differentiating with respect to t, we get
c₁ sin t + c₂ cos t = 0
c₁ cos t − c₂ sin t = 0
whose only solution is c₁ = c₂ = 0, meaning that {sin t, cos t} are linearly independent. However, as cos(t − θ) = cos θ cos t + sin θ sin t, the functions {sin t, cos t, cos(t − θ)} are linearly dependent.

DEFINITION 2.14. Given M functions {x_i(t)}₁^M, all their linear combinations
\sum_{i=1}^{M} c_i x_i(t)
form a subspace S of L²(a, b), which is said to be the subspace generated by {x_i(t)}₁^M.

EXAMPLE 2.15. Recalling the previous example, cos(t − θ) belongs to the subspace generated by {sin t, cos t}.

EXAMPLE 2.16. The solutions of the differential equation (the dots denote differentiation with respect to time t)
\ddot{x}(t) + 3\dot{x}(t) + 2x(t) = 0,  t ∈ (0, ∞)
form a subspace, as they can be written as
x(t) = c₁ e^{−t} + c₂ e^{−2t}
where c₁ and c₂ are arbitrary constants.


2.2. Discrete Representation of Signals


DEFINITION 2.17. Given a subspace S, the functions {φ_i(t)}₁^N are said to be a basis B of S if
(1) they belong to S;
(2) they are linearly independent;
(3) any function of S can be written as a linear combination of {φ_i(t)}₁^N.

REMARK 2.18. Given that all φ_i(t) ∈ S, each of them is a linear combination of the functions {x_i(t)}₁^M which generated S. Thus, not only can any function of S be written as a linear combination of {φ_i(t)}₁^N, but any linear combination of {φ_i(t)}₁^N belongs to S, too.

THEOREM 2.19. If {φ_i(t)}₁^N are a basis B of S, then any x(t) ∈ S has a unique representation of the form
x(t) = \sum_{i=1}^{N} x_i φ_i(t).    (2.9)
The numbers {x_i}₁^N are said to be the components of x(t) with respect to the basis B.

PROOF. Assuming that a second representation of x(t) exists,
x(t) = \sum_{i=1}^{N} x_i' φ_i(t),    (2.10)
subtracting (2.10) from (2.9) we get
\sum_{i=1}^{N} (x_i − x_i') φ_i(t) = 0
and, given the linear independence of {φ_i(t)}₁^N, it must be x_i = x_i' for i = 1, 2, ..., N.

REMARK 2.20. According to the previous theorem, given a basis B = {φ_i(t)}₁^N of S, there exists a one-to-one correspondence between the functions of S and the N-dimensional vectors of a space C^N. Indeed, to each x(t) ∈ S corresponds in C^N one and only one column vector x = (x₁, x₂, ..., x_N)^T, which is said to be the image of x(t). Conversely, to each x ∈ C^N corresponds a linear combination (2.9) and thus a (unique) function of S. In other words, S and C^N are isomorphic with respect to the basis B.

REMARK 2.21. Once a basis is specified, linear operations (sum, subtraction, multiplication by constants) performed on the functions of S translate to linear operations on their images, as specified by the following theorem.

THEOREM 2.22. Given two functions x(t) and y(t) of S, and their images x and y with respect to a basis {φ_i(t)}₁^N, then³
x(t) + y(t) ↔ x + y
αx(t) ↔ αx

³The symbol ↔ means that LHS and RHS are in one-to-one correspondence.


PROOF. By hypothesis we can write
x(t) = \sum_{i=1}^{N} x_i φ_i(t)
y(t) = \sum_{i=1}^{N} y_i φ_i(t)
and summing the two equations we obtain
x(t) + y(t) = \sum_{i=1}^{N} (x_i + y_i) φ_i(t),
proving that the image of x(t) + y(t) is x + y. Similarly, multiplying x(t) by α we get
αx(t) = \sum_{i=1}^{N} α x_i φ_i(t),
which proves the second part of the theorem.

THEOREM 2.23. If B₁ ≜ {φ_i(t)}₁^N is a basis of S, and x(t) ∈ S is not identically zero, then (for some k) B₂ ≜ {x(t), φ₁(t), ..., φ_{k−1}(t), φ_{k+1}(t), ..., φ_N(t)} is a basis of S.

PROOF. As x(t) ∈ S, we have
x(t) = \sum_{i=1}^{N} x_i φ_i(t).    (2.11)
At least one of the components x_i of x(t) is not zero, because x(t) is not identically zero. So, assuming x_k ≠ 0, from (2.11) it follows that
φ_k(t) = \frac{x(t)}{x_k} − \frac{1}{x_k} \sum_{i≠k} x_i φ_i(t).    (2.12)
On the other hand, any y(t) ∈ S can be written as
y(t) = \sum_{i=1}^{N} y_i φ_i(t)    (2.13)
and, substituting (2.12) in (2.13), it follows that y(t) can be expressed as a linear combination of the functions of the basis B₂. So, the functions of B₂ satisfy conditions 1 and 3 of Definition 2.17, and we must only show that they are linearly independent. This can be proved by contradiction. If the functions of B₂ were linearly dependent, for some c = (c₁, c₂, ..., c_N)^T we would have
c_k x(t) + \sum_{i≠k} c_i φ_i(t) = 0    (2.14)
with c_k ≠ 0 (otherwise the functions of B₁ would be linearly dependent, see Remark 2.11). Then, replacing the expression for x(t) obtained from (2.14) in (2.12), we would have that φ_k(t) can be expressed as a linear combination of the remaining functions of B₁. But this would mean that {φ_i(t)}₁^N are linearly dependent (see Remark 2.12), which contradicts the hypothesis that B₁ is a basis.


REMARK 2.24. The previous theorem shows that, in general, a subspace S has more than one basis (and indeed we will see that the number of bases is infinite). Thus, there are several ways of associating the functions of S with vectors. However, the following theorem shows that, whatever the basis, the space of the images of the functions of S always has the same dimension.

THEOREM 2.25. If B₁ ≜ {φ_i(t)}₁^N and B₂ ≜ {ψ_i(t)}₁^M are two bases of S, then N = M.

PROOF BY CONTRADICTION. Assuming that N < M and applying Theorem 2.23, we replace φ_k(t) in B₁ with ψ₁(t), obtaining a new basis A₁ ≜ {ψ₁(t), φ₁(t), ..., φ_{k−1}(t), φ_{k+1}(t), ..., φ_N(t)}. Applying again the same theorem, we replace φ_j(t), j ≠ k, with ψ₂(t), obtaining the basis A₂ ≜ {ψ₁(t), ψ₂(t), φ₁(t), ..., φ_{j−1}(t), φ_{j+1}(t), ..., φ_{k−1}(t), φ_{k+1}(t), ..., φ_N(t)}.

Note that when expressing ψ₂(t) through the basis A₁, at least one of the coefficients of the functions φ_i(t) is not zero, otherwise we would have a relation like ψ₂(t) = cψ₁(t), which is incompatible with the hypothesis that B₂ is a basis. In other words, when inserting ψ₂(t) in A₁, we can eliminate one of the remaining functions φ_i(t) appearing with a non-zero coefficient in the expression of ψ₂(t) through A₁, and not ψ₁(t). This also means that if ψ₂(t) = c₁ψ₁(t) + c_m φ_m(t), or ψ₂(t) = c_m φ_m(t), with c_m ≠ 0, when inserting ψ₂(t) we must eliminate precisely φ_m(t) and not some other φ_i(t).

Iteratively applying the same theorem, we arrive at the conclusion that A_N ≜ {ψ₁(t), ..., ψ_N(t)} is a basis. Thus, ψ_{N+1}(t) can be expressed through the functions of A_N:
ψ_{N+1}(t) = \sum_{i=1}^{N} c_i ψ_i(t).
But this contradicts the fact that B₂ is a basis (see Remark 2.12).

DEFINITION 2.26. The dimension of a subspace S is the number of functions of its bases.

DEFINITION 2.27. A basis B ≜ {φ_i(t)}₁^N is said to be orthonormal if
(φ_i, φ_j) = δ_{ij},  i, j = 1, 2, ..., N
where δ_{ij} is the Kronecker symbol
δ_{ij} = 1 if i = j, 0 if i ≠ j.

REMARK 2.28. In general, computing the components of the image x of x(t) ∈ S with respect to a generic basis B may be a cumbersome task. However, if B = {φ_i(t)}₁^N is an orthonormal basis, the computation becomes easy. Indeed, writing
x(t) = \sum_{i=1}^{N} x_i φ_i(t),    (2.15)
and taking into account that (φ_i, φ_k) = δ_{ik}, from (2.15) we have that (x, φ_k) = x_k, i.e., the components of x are given by the internal product of x(t) and the functions of B:
x_k = (x, φ_k) = \int_a^b x(t) φ_k^*(t) dt,  k = 1, 2, ..., N.
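As a numerical illustration of Remark 2.28 (not in the original notes), the sketch below uses the orthonormal functions φ₁(t) = √2 sin(2πt) and φ₂(t) = √2 cos(2πt) on (0, 1), computes the components x_k = (x, φ_k) of a signal lying in their span, and checks that the expansion reproduces the signal; the basis and the test signal are illustrative choices.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 20_000, endpoint=False)
dt = t[1] - t[0]

def inner(u, v):
    # (u, v) = int_0^1 u(t) v*(t) dt, Riemann-sum approximation
    return np.sum(u * np.conj(v)) * dt

# An orthonormal basis of a 2-dimensional subspace S of L2(0, 1)
phi = [np.sqrt(2) * np.sin(2 * np.pi * t),
       np.sqrt(2) * np.cos(2 * np.pi * t)]

# A signal of S (true components chosen arbitrarily for the test)
x = 3.0 * phi[0] - 1.5 * phi[1]

# Components with respect to the orthonormal basis: x_k = (x, phi_k)
components = np.array([inner(x, p) for p in phi])
print("components:", np.round(components.real, 6))   # -> [ 3.  -1.5]

# Reconstruction x(t) = sum_k x_k phi_k(t)
x_rec = sum(c * p for c, p in zip(components, phi))
print("max reconstruction error:", np.max(np.abs(x - x_rec)))
```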


REMARK 2.29. Given an orthonormal basis B ≜ {φ_i(t)}₁^N, the correspondence between the functions of S and the vectors of C^N has some interesting properties. For example, if x(t) ∈ S, we have
x(t) = \sum_{i=1}^{N} x_i φ_i(t)
and thus
\int_a^b |x(t)|^2 dt = \sum_{i=1}^{N} \sum_{j=1}^{N} x_i x_j^* \int_a^b φ_i(t) φ_j^*(t) dt = \sum_{i=1}^{N} |x_i|^2.    (2.16)
This means that the energy of x(t) is equal to the squared distance from the origin of its image x. Similarly, given y(t) ∈ S,
y(t) = \sum_{i=1}^{N} y_i φ_i(t),
we have
\int_a^b |x(t) − y(t)|^2 dt = \sum_{i=1}^{N} |x_i − y_i|^2,    (2.17)
such that the distance between two functions may be interpreted as the distance between their images x and y. Finally, as regards the internal product of x(t) and y(t),
\int_a^b x(t) y^*(t) dt = \sum_{i=1}^{N} x_i y_i^*,    (2.18)
we see that it is equal to the scalar product x^T y^* of their images.

REMARK 2.30. Equations (2.16) and (2.18) allow us to define the angle between two real functions. From geometry, we know that the scalar product (also known as dot product) of two real vectors x and y may be written as
x^T y = ||x|| ||y|| cos θ
where θ is the angle between them. If x and y are the images of the real functions x(t) and y(t), from (2.16) we have that ||x|| = ||x(t)|| and ||y|| = ||y(t)||, and from (2.18) x^T y = (x, y). Thus, we can define the angle between x(t) and y(t) by
cos θ = (x, y) / (||x|| ||y||).    (2.19)
Note that, as |cos θ| ≤ 1, from (2.19) it follows that |(x, y)| ≤ ||x|| ||y||, which is the Schwartz inequality.

REMARK 2.31. There is some ambiguity in the definition of angle between complex vectors. If x and y are vectors with complex-valued components,
||x − y||^2 = (x − y)^T (x − y)^* = ||x||^2 + ||y||^2 − 2Re{x^T y^*},
and comparing this with the result obtained by the law of cosines,
||x − y||^2 = ||x||^2 + ||y||^2 − 2||x|| ||y|| cos θ,
we have
Re{x^T y^*} = ||x|| ||y|| cos θ.
However, if we use this last equation for defining the angle between the complex vectors x and y, then θ = π/2 may hold even when x^T y^* ≠ 0. The space of complex vectors being isomorphic with


the space of complex functions, this ambiguity is also present if we define the angle between two complex functions x(t) and y(t) by
cos θ = Re{(x, y)} / (||x|| ||y||),
such that it may be that θ = π/2 even when (x, y) ≠ 0.

REMARK 2.32. Given a subspace S and two orthonormal bases B ≜ {φ_i(t)}₁^N and B' ≜ {φ_i'(t)}₁^N, let x and x' be the images of x(t) ∈ S with respect to B and B', respectively. In order to find the relation between x and x', we note that
x(t) = \sum_{i=1}^{N} x_i φ_i(t)    (2.20)
x(t) = \sum_{i=1}^{N} x_i' φ_i'(t),    (2.21)
and, expressing each φ_i(t) as a linear combination of the functions of B',
φ_i(t) = \sum_{k=1}^{N} (φ_i, φ_k') φ_k'(t),  i = 1, 2, ..., N,    (2.22)
we substitute (2.22) in (2.20), getting
x(t) = \sum_{i=1}^{N} \sum_{k=1}^{N} x_k (φ_k, φ_i') φ_i'(t).    (2.23)
Comparing (2.23) with (2.21), we see that the relation between the images x and x' is
x' = Ax    (2.24)
where the components of the N × N matrix A are a_{ij} = (φ_j, φ_i'). It is easy to see that the matrix A is unitary, i.e., A^†A = AA^† = I, where A^† is the conjugate transpose (Hermitian adjoint) of A and I is the identity matrix. Indeed, multiplying both sides of (2.22) by φ_j^*(t), integrating over (a, b), and taking into account that (φ_k', φ_j) = (φ_j, φ_k')^*, we get
(φ_i, φ_j) = \sum_{k=1}^{N} (φ_i, φ_k')(φ_j, φ_k')^* = \sum_{k=1}^{N} a_{kj}^* a_{ki} = a_j^† a_i = δ_{ij},
where a_i is the i-th column of A, and a_j^† is the j-th row of A^†.

This means that the transformation (2.24) is a rotation. This result can be expressed as follows.

THEOREM 2.33. Given a subspace S and two orthonormal bases B and B', the images x and x' of x(t) ∈ S can be obtained from each other through a rotation.

REMARK 2.34. From the previous theorem we have that, as the possible number of rotations is infinite, there exists an infinite number of orthonormal bases.

REMARK 2.35. A method for determining an orthonormal basis of the subspace S generated by the set of functions X ≜ {x_i(t)}₁^M is provided by the Gram-Schmidt orthonormalization procedure, and consists in determining an orthonormal set of functions B ≜ {φ_i(t)}₁^N, with N ≤ M, such that
(1) each φ_i(t) ∈ S;
(2) each x(t) ∈ S can be expressed as a linear combination of the functions of B.


Note that, as {φ_i(t)}₁^N are orthonormal, they are also linearly independent, and thus B is a basis of S. Indeed, if
\sum_{i=1}^{N} c_i φ_i(t) = 0,
then it should also be
\int_a^b \left| \sum_{i=1}^{N} c_i φ_i(t) \right|^2 dt = \sum_{i=1}^{N} |c_i|^2 = 0,
which is possible only if c_i = 0 for all i.

If a function of X has zero norm, it can be eliminated without affecting the subspace S. So, in the following we will assume that no function of X has zero norm.

ALGORITHM 2.36 (The Gram-Schmidt procedure). The procedure consists of the following (at most) M steps:

Step 1: The first function of the sought basis B is
φ₁(t) ≜ x₁(t)/||x₁||.
Clearly, φ₁(t) is normalized, and x₁(t) = ||x₁|| φ₁(t) is a linear combination of the functions of B.

Step 2: Let us define the auxiliary function
γ₂(t) ≜ x₂(t) − (x₂, φ₁) φ₁(t).    (2.25)
If ||γ₂|| = 0, then x₂(t) is a linear combination of the functions of B, at least almost everywhere, and we proceed with the next step. Otherwise, if ||γ₂|| ≠ 0, we let
φ₂(t) ≜ γ₂(t)/||γ₂||.    (2.26)
Clearly, φ₂(t) is normalized, and it is easy to verify that it is orthogonal to φ₁(t) by replacing (2.25) in (2.26) and computing (φ₂, φ₁). Moreover, from (2.25) and (2.26) it follows that
x₂(t) = ||γ₂|| φ₂(t) + (x₂, φ₁) φ₁(t),
i.e., x₂(t) can be expressed as a linear combination of the functions of B.

Step k: We have already determined ℓ ≤ k−1 orthonormal functions {φ_i(t)}₁^ℓ, as at each step we introduce at most one new orthonormal function. So, let us consider the auxiliary function
γ_k(t) = x_k(t) − \sum_{i=1}^{ℓ} (x_k, φ_i) φ_i(t).    (2.27)
If ||γ_k|| = 0, then x_k(t) is a linear combination of the functions of B, and we proceed with the next step. Otherwise, if ||γ_k|| ≠ 0, we let
φ_{ℓ+1}(t) ≜ γ_k(t)/||γ_k||.    (2.28)
Clearly, φ_{ℓ+1}(t) is normalized and from (2.27) and (2.28) we have that
(φ_{ℓ+1}, φ_j) = 0,  j = 1, 2, ..., ℓ,
that is, {φ_i(t)}₁^{ℓ+1} is an orthonormal set. Moreover, again from (2.27) and (2.28) it follows that
x_k(t) = ||γ_k|| φ_{ℓ+1}(t) + \sum_{i=1}^{ℓ} (x_k, φ_i) φ_i(t),
i.e., x_k(t) can be expressed as a linear combination of the functions of B.

The procedure stops when k = M.

REMARK 2.37. The functions of B are N ≤ M, and the equal sign is valid iff no auxiliary function γ_i(t) has zero norm.

THEOREM 2.38. The linear independence of the functions of X is a necessary and sufficient condition for having N = M.

PROOF. By construction, the auxiliary functions γ_k(t) are a linear combination of the functions of X,
γ_k(t) = x_k(t) + \sum_{i=1}^{k−1} c_i x_i(t);
thus, if {x_i(t)}₁^M are linearly independent, it cannot be ||γ_k|| = 0, and thus N = M.

Conversely, if N = M and {x_i(t)}₁^M were linearly dependent, we could express one of them as a linear combination of the others. So, we could eliminate from X such a function, obtaining a set X' generating the same subspace S. Then, applying the Gram-Schmidt procedure to X', we would find a basis B' with at most M − 1 functions. Now we would have a basis B with N functions and another basis B' with fewer functions, and this would be in contradiction with the fact that all bases have the same number of functions (Theorem 2.25).
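The procedure of Algorithm 2.36 maps directly onto sampled signals. Below is a minimal sketch (not from the notes) that applies Gram-Schmidt to a set of sampled functions on (0, 1), skipping auxiliary functions of (numerically) zero norm, so that the number of basis functions returned equals the dimension of the generated subspace; the three generating functions are illustrative choices.

```python
import numpy as np

def gram_schmidt(signals, t, tol=1e-8):
    """Orthonormalize a list of sampled signals (sketch of Algorithm 2.36)."""
    dt = t[1] - t[0]
    inner = lambda u, v: np.sum(u * np.conj(v)) * dt
    basis = []
    for x in signals:
        # Auxiliary function: remove the projection onto the current basis
        gamma = x - sum(inner(x, phi) * phi for phi in basis)
        norm = np.sqrt(inner(gamma, gamma).real)
        if norm > tol:              # zero-norm auxiliary functions are skipped
            basis.append(gamma / norm)
    return basis

t = np.linspace(0.0, 1.0, 10_000, endpoint=False)
x1 = np.ones_like(t)                # illustrative generating functions
x2 = t
x3 = 2.0 * t - 1.0                  # linear combination of x1 and x2

basis = gram_schmidt([x1, x2, x3], t)
print("dimension of the generated subspace:", len(basis))   # -> 2

# Check orthonormality: the Gram matrix should be (close to) the identity
dt = t[1] - t[0]
gram = np.array([[np.sum(p * q) * dt for q in basis] for p in basis])
print(np.round(gram, 6))
```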

FIGURE 2.1. The signals x₁(t), x₂(t), and x₃(t) used in Example 2.39.

EXAMPLE 2.39. Applying the Gram-Schmidt procedure to find an orthonormal basis for the subspace generated by the functions in Fig. 2.1, we have that ||x₁|| = 1 and thus φ₁(t) = x₁(t). In the second step, the auxiliary function is
γ₂(t) = x₂(t) − (x₂, φ₁) φ₁(t) = x₂(t) − (1/2) φ₁(t)
and, being ||γ₂|| = 1/2, we find that
φ₂(t) = 2x₂(t) − φ₁(t).

In the third step, as φ₂(t) = x₃(t), it turns out that
γ₃(t) = x₃(t) − (x₃, φ₁) φ₁(t) − (x₃, φ₂) φ₂(t) = x₃(t) − φ₂(t) = 0,
meaning that the dimension of the subspace is 2 and one of its orthonormal bases is
{φ₁(t), φ₂(t)} = {x₁(t), x₃(t)}.

REMARK 2.40. Given any function q(t), the sum
x(t) = \sum_{k=−∞}^{∞} q(t − kT)    (2.29)
is periodic with period T and thus can be expanded in Fourier series
x(t) = \sum_{n=−∞}^{∞} X_n e^{j2πnt/T}    (2.30)
where the coefficients X_n are given by
X_n = \frac{1}{T} \int_0^T x(t) e^{−j2πnt/T} dt.    (2.31)
Replacing (2.29) in (2.31) we get
X_n = \frac{1}{T} \sum_{k=−∞}^{∞} \int_0^T q(t − kT) e^{−j2πnt/T} dt = \frac{1}{T} \sum_{k=−∞}^{∞} \int_{−kT}^{(1−k)T} q(τ) e^{−j2πn(τ+kT)/T} dτ
where we also performed the change of variable t − kT = τ. Taking into account that e^{−j2πnk} = 1, we obtain
X_n = \frac{1}{T} \sum_{k=−∞}^{∞} \int_{−kT}^{(1−k)T} q(τ) e^{−j2πnτ/T} dτ = \frac{1}{T} \int_{−∞}^{∞} q(τ) e^{−j2πnτ/T} dτ = \frac{1}{T} Q\left(\frac{n}{T}\right)    (2.32)
where
Q(f) = \int_{−∞}^{∞} q(t) e^{−j2πft} dt
is the Fourier transform of q(t).

PROPOSITION 2.41 (Poisson's sum formula). Replacing (2.32) in (2.30) and comparing with (2.29), we obtain Poisson's sum formula
T \sum_{k=−∞}^{∞} q(t − kT) = \sum_{n=−∞}^{∞} Q\left(\frac{n}{T}\right) e^{j2πnt/T}.    (2.33)


REMARK 2.42. With similar reasoning, it can also be shown that
\sum_{k=−∞}^{∞} Q\left(f − \frac{k}{T}\right) = T \sum_{k=−∞}^{∞} q(kT) e^{−j2πkfT},    (2.34)
which is the dual of (2.33) in the frequency domain.

PROPOSITION 2.43 (Parseval's formula). Given two arbitrary finite energy signals x₁(t) and x₂(t) with Fourier transforms X₁(f) and X₂(f), respectively, then
\int_{−∞}^{∞} x_1(t) x_2^*(t) dt = \int_{−∞}^{∞} X_1(f) X_2^*(f) df.

PROOF. Substituting for x₂(t) the inverse Fourier transform
x_2(t) = \int_{−∞}^{∞} X_2(f) e^{j2πft} df
and interchanging the order of time and frequency integrations gives
\int_{−∞}^{∞} x_1(t) x_2^*(t) dt = \int_{−∞}^{∞} x_1(t) \left[ \int_{−∞}^{∞} X_2^*(f) e^{−j2πft} df \right] dt = \int_{−∞}^{∞} \left[ \int_{−∞}^{∞} x_1(t) e^{−j2πft} dt \right] X_2^*(f) df = \int_{−∞}^{∞} X_1(f) X_2^*(f) df.

EXAMPLE 2.44. The functions
φ_n(t) = \sqrt{2B}\,sinc(2Bt − n),  n = 0, ±1, ±2, ..., ±N,    (2.35)
where sinc x ≜ sin(πx)/(πx) (see Fig. 2.2), are an orthonormal set for −∞ < t < ∞.

FIGURE 2.2. The function sinc(x) = sin(πx)/(πx).

This can be proved by showing that
(φ_n, φ_m) = \int_{−∞}^{∞} φ_n(t) φ_m^*(t) dt = δ_{nm}.
The above integral can be easily computed by using Parseval's formula. Given two functions x₁(t) and x₂(t) in L²(−∞, ∞), according to Parseval's formula
\int_{−∞}^{∞} x_1(t) x_2^*(t) dt = \int_{−∞}^{∞} X_1(f) X_2^*(f) df,


where X₁(f) and X₂(f) are the Fourier transforms of x₁(t) and x₂(t),
X_i(f) = \int_{−∞}^{∞} x_i(t) e^{−j2πft} dt.
As all φ_n(t) can be written in terms of φ_0(t),
φ_n(t) = φ_0\left(t − \frac{n}{2B}\right),    (2.36)
denoting by Φ_n(f) the Fourier transform of φ_n(t), we have Φ_n(f) = Φ_0(f) e^{−j2πfn/(2B)}, such that
(φ_n, φ_m) = \int_{−∞}^{∞} φ_n(t) φ_m^*(t) dt = \int_{−∞}^{∞} |Φ_0(f)|^2 e^{−j2πf(n−m)/(2B)} df.    (2.37)
In order to compute the RHS of (2.37), we must know Φ_0(f). It can be easily shown that (see Fig. 2.3)
Φ_0(f) = 1/\sqrt{2B} for |f| ≤ B, and 0 for |f| > B,    (2.38)
which, substituted in (2.37), proves the orthonormality of the functions {φ_n(t)}.

FIGURE 2.3. The pulse φ_0(t) = \sqrt{2B}\,sinc(2Bt), with zeros at multiples of 1/(2B), and its Fourier transform Φ_0(f), of height 1/\sqrt{2B} on (−B, B).
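The orthonormality claimed in Example 2.44 can also be checked numerically. The sketch below (not in the notes) evaluates (φ_n, φ_m) for φ_n(t) = √(2B) sinc(2Bt − n) on a long but finite time window; the bandwidth B and the window are illustrative choices, and the small residual error is due to truncating the infinite integration interval.

```python
import numpy as np

B = 2.0                                   # bandwidth, illustrative value
t = np.linspace(-60.0, 60.0, 600_001)     # finite window approximating (-inf, inf)
dt = t[1] - t[0]

def phi(n):
    # phi_n(t) = sqrt(2B) sinc(2Bt - n); np.sinc(x) = sin(pi x)/(pi x)
    return np.sqrt(2 * B) * np.sinc(2 * B * t - n)

for n, m in [(0, 0), (0, 1), (2, 5), (3, 3)]:
    ip = np.sum(phi(n) * phi(m)) * dt
    print(f"(phi_{n}, phi_{m}) = {ip:.6f}")   # ~1 for n == m, ~0 otherwise
```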

F IGURE 2.3. 2.3. Complete Bases R EMARK 2.45. We have seen that, given a subspace S in L2 (a, b) and one of its orthonormal bases B, we can associate any function in S to a vector. We now want to show how to associate any function x (t ) L2 (a, b) to a vector of a proper vector space. Of course, if x (t ) S , we already know how to do that, but if x (t ) S , it is not possible to express x (t ) as a linear combination of the functions of the basis B. So, we can only approximate x (t ) using the basis B, but with what degree of accuracy? The answer to this question may be given by using the results provided by the following theorem. T HEOREM 2.46 (Projection theorem). Given a subspace S and one of his orthonormal bases B N {i (t )}1 , for any x (t ) L2 (a, b) there exists in S a function x (t ) given by x (t ) =
N

( x , i )i (t )

(2.39)

i =1

such that it has the minimum distance from x (t ). Moreover, x (t ) x (t ) is orthogonal to any function of S .


PROOF. We will first show that x(t) − x̂(t) is orthogonal to any function y(t) ∈ S. To this end, we note that y(t) can be written as
y(t) = \sum_{i=1}^{N} y_i φ_i(t)    (2.40)
and thus, using (2.39) and (2.40),
(x − x̂, y) = (x, y) − (x̂, y) = \sum_{i=1}^{N} (x, φ_i) y_i^* − \sum_{i=1}^{N} (x, φ_i) y_i^* = 0.
In particular, x(t) − x̂(t) is orthogonal to x̂(t) − y(t) ∈ S, such that we can write
||x − y||^2 = ||x − x̂ + x̂ − y||^2 = ||x − x̂||^2 + ||x̂ − y||^2,
from which it follows that ||x − x̂|| is the minimum of ||x − y||, and such a minimum is achieved if and only if y(t) = x̂(t).

DEFINITION 2.47. The function x̂(t) defined in (2.39) is called the projection of x(t) onto S.

REMARK 2.48. The Gram-Schmidt procedure can now be restated as follows. When considering x_k(t) ∈ X, we already have an orthonormal set B_ℓ ≜ {φ_i(t)}₁^ℓ. So, we project x_k(t) onto B_ℓ, obtaining x̂_k(t), and examine the error x_k(t) − x̂_k(t). If x_k(t) belongs to the subspace generated by the functions of B_ℓ, the error is zero and we consider the next function x_{k+1}(t). Otherwise, x_k(t) − x̂_k(t) is orthogonal to the functions of B_ℓ, so, after normalization, we can add it to B_ℓ, obtaining a new set B_{ℓ+1}.

REMARK 2.49. Let us return to our initial problem of approximating x(t) ∉ S with a function of S. As already seen, the function of S nearest to x(t) is x̂(t). The difference e(t) = x(t) − x̂(t) is the instantaneous error, while ||e||²/(b − a) is the mean square error. Thus, ||e||² is a measure of the degree of accuracy of the approximation of x(t) by x̂(t). Observing that e(t) is orthogonal to any function of S, and in particular to x̂(t), we have that ||x||² = ||e + x̂||² = ||e||² + ||x̂||², and thus
||e||^2 = ||x||^2 − ||x̂||^2,    (2.41)
which, taking into account (2.39) and (2.16), can also be written as
||e||^2 = ||x||^2 − \sum_{i=1}^{N} |(x, φ_i)|^2.    (2.42)
This expression suggests that if we use a sequence of orthonormal bases B₁ ≜ {φ₁(t)}, B₂ ≜ {φ₁(t), φ₂(t)}, ..., B_N ≜ {φ₁(t), φ₂(t), ..., φ_N(t)}, and project x(t) onto them, we get a sequence of approximations x̂₁(t), x̂₂(t), ..., x̂_N(t), whose errors e_N(t) ≜ x(t) − x̂_N(t) are such that
||e_N||^2 = ||e_{N−1}||^2 − |(x, φ_N)|^2,    (2.43)
i.e., they form a decreasing sequence, and thus the approximation x̂_N(t) becomes better when increasing N. Then, we wonder whether, using an orthonormal basis with a countable infinity of functions, say B ≜ {φ_i(t)}₁^∞, it happens that ||e_N||² → 0 when N → ∞. In general, the answer is negative, as demonstrated by the following example.


EXAMPLE 2.50. Given the set of 2N+1 functions B_N ≜ {φ_i(t)}_{i=−N}^{N}, where φ_i(t) = \sqrt{2B}\,sinc(2Bt − i) as in Example 2.44, and x(t) ∈ L²(−∞, ∞), let us compute the limiting expression of ||e_N||² for N → ∞, being
e_N(t) ≜ x(t) − \sum_{n=−N}^{N} (x, φ_n) φ_n(t).    (2.44)
Using Parseval's theorem, ||e_N||² can be computed as
||e_N||^2 = \int_{−∞}^{∞} |e_N(t)|^2 dt = \int_{−∞}^{∞} |E_N(f)|^2 df    (2.45)
where E_N(f) is the Fourier transform of e_N(t). From (2.44) we have
E_N(f) = X(f) − \sum_{n=−N}^{N} (x, φ_n) Φ_n(f),    (2.46)
where X(f) and Φ_n(f) are the Fourier transforms of x(t) and φ_n(t), respectively. Using Parseval's formula, we can also write
(x, φ_n) = \int_{−∞}^{∞} X(ν) Φ_n^*(ν) dν,    (2.47)
and substituting (2.47) in (2.46), we get
E_N(f) = X(f) − \sum_{n=−N}^{N} Φ_n(f) \int_{−∞}^{∞} X(ν) Φ_n^*(ν) dν.    (2.48)
Substituting (2.36) in (2.48) and letting N → ∞, we obtain
E_∞(f) = X(f) − Φ_0(f) \sum_{n=−∞}^{∞} \int_{−∞}^{∞} X(ν) Φ_0^*(ν) e^{−j2π(f−ν)n/(2B)} dν,    (2.49)
and, letting α = ν − f, we arrive at
E_∞(f) = X(f) − Φ_0(f) \int_{−∞}^{∞} X(f + α) Φ_0^*(f + α) \sum_{n=−∞}^{∞} e^{j2πnα/(2B)} dα.    (2.50)
Recalling now Poisson's sum formula
T \sum_{n=−∞}^{∞} q(t − nT) = \sum_{n=−∞}^{∞} Q\left(\frac{n}{T}\right) e^{j2πnt/T},    (2.51)
where Q(f) is the Fourier transform of q(t), letting T = 2B, t = α, and q(t) = δ(t), δ(t) being the Dirac delta function, (2.51) becomes
2B \sum_{n=−∞}^{∞} δ(α − n2B) = \sum_{n=−∞}^{∞} e^{j2πnα/(2B)}.    (2.52)


Substituting (2.52) in (2.50), we have
E_∞(f) = X(f) − 2B Φ_0(f) \int_{−∞}^{∞} X(f + α) Φ_0^*(f + α) \sum_{n=−∞}^{∞} δ(α − n2B) dα = X(f) − 2B \sum_{n=−∞}^{∞} X(f + n2B) Φ_0(f) Φ_0^*(f + n2B).    (2.53)
On the other hand (see Fig. 2.3),
Φ_0(f) Φ_0^*(f + n2B) = |Φ_0(f)|^2 for n = 0, and 0 for n ≠ 0;
thus, as
|Φ_0(f)|^2 = 1/(2B) for |f| ≤ B, and 0 for |f| > B,
from (2.53) it follows that
E_∞(f) = 0 for |f| ≤ B, and X(f) for |f| > B,
and finally from (2.45), letting N → ∞,
||e_∞||^2 = \int_{|f|>B} |X(f)|^2 df.    (2.54)
From (2.54) we see that ||e_∞||² is equal to the energy of x(t) outside the bandwidth (−B, B). Thus, if x(t) is bandlimited to (−B, B), i.e., if X(f) = 0 for |f| > B, then ||e_N|| → 0 when N → ∞; otherwise ||e_∞|| ≠ 0.

Let us now compute the coefficients (x, φ_n) of the expansion of x(t). From (2.47) we have
(x, φ_n) = \int_{−∞}^{∞} X(f) Φ_0^*(f) e^{j2πfn/(2B)} df
and, recalling (2.38),
(x, φ_n) = \frac{1}{\sqrt{2B}} \int_{−B}^{B} X(f) e^{j2πfn/(2B)} df.    (2.55)
If x(t) is bandlimited to (−B, B), the integral in (2.55) is equal to x(n/(2B)), and thus
(x, φ_n) = \frac{1}{\sqrt{2B}}\, x\left(\frac{n}{2B}\right).    (2.56)
This result is known as the sampling theorem.

THEOREM 2.51 (Sampling theorem). If x(t) is bandlimited to (−B, B), it can be expressed as the series
x(t) = \sum_{n=−∞}^{∞} x\left(\frac{n}{2B}\right) sinc(2Bt − n),    (2.57)
i.e., x(t) can be reconstructed from its samples x(n/(2B)) spaced 1/(2B) seconds apart.
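Theorem 2.51 can be illustrated numerically. The following sketch (not part of the notes) builds a signal bandlimited to (−B, B), takes its samples at rate 2B, and reconstructs it with a truncated version of the series (2.57); the bandwidth, the test signal, and the truncation length are illustrative choices, and the truncation of the infinite sum is the only source of the residual error.

```python
import numpy as np

B = 4.0                     # bandwidth of the signal (illustrative)
Ts = 1.0 / (2 * B)          # sampling interval 1/(2B)

def x(t):
    # A signal bandlimited to (-B, B): sum of two sinusoids below B
    return np.sin(2 * np.pi * 1.0 * t) + 0.5 * np.cos(2 * np.pi * 3.0 * t)

n = np.arange(-400, 401)              # truncated index range of the series (2.57)
samples = x(n * Ts)                   # x(n/(2B))

t = np.linspace(-2.0, 2.0, 2001)      # evaluation instants
# x(t) = sum_n x(n/(2B)) sinc(2Bt - n)
x_rec = samples @ np.sinc(2 * B * t[None, :] - n[:, None])

print("max reconstruction error:", np.max(np.abs(x(t) - x_rec)))
```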


DEFINITION 2.52. Given a subspace S ⊆ L²(a, b) (not necessarily of finite dimension and possibly coincident with L²(a, b)) and an orthonormal basis with a countable infinity of functions {φ_i(t)}₁^∞, such a basis is said to be complete for the subspace S if
\lim_{N→∞} \left\| x(t) − \sum_{i=1}^{N} (x, φ_i) φ_i(t) \right\| = 0,  ∀ x(t) ∈ S,    (2.58)
i.e., the representation of x(t) is a series converging in the mean square sense. When this happens, the series converges to x(t) almost everywhere, and in practice we will assume that
x(t) = \sum_{i=1}^{∞} (x, φ_i) φ_i(t).    (2.59)

EXAMPLE 2.53. The Fourier basis
φ_n(t) = \frac{1}{\sqrt{T}} e^{j2πnt/T},  n = 0, ±1, ±2, ...,
is complete for all functions with a finite number of discontinuities and extrema in an interval of length T, for example the interval (t₀, t₀ + T).

EXAMPLE 2.54. As already seen, the basis whose functions are
φ_n(t) = \sqrt{2B}\,sinc(2Bt − n),  n = 0, ±1, ±2, ...,
is complete for the subspace of all functions of L²(−∞, ∞) with finite bandwidth B.

REMARK 2.55. The results (2.16), (2.17), and (2.18) can be extended to the case of complete bases, such that the energy of x(t) is equal to the squared distance from the origin of its image x,
||x||^2 = \sum_{i=1}^{∞} |x_i|^2,
the distance between x(t) and y(t) is equal to the distance between their images,
||x − y|| = \sqrt{\sum_{i=1}^{∞} |x_i − y_i|^2},
and the internal product of x(t) and y(t) is equal to the scalar product of their images,
(x, y) = \sum_{i=1}^{∞} x_i y_i^*.
2.4. Discrete Representation of Stochastic Processes

REMARK 2.56. Let us consider a stochastic process {n(t)} and assume that each realization n(t) has finite energy in the interval (a, b), i.e., n(t) ∈ S ⊆ L²(a, b) with probability equal to 1. In this case, given a complete orthonormal basis {φ_i(t)}₁^∞ of S, we can write (with probability 1)
n(t) = \sum_{i=1}^{∞} n_i φ_i(t),
where n_i ≜ (n, φ_i), such that the image of n(t) is a vector n = (n₁, n₂, ...)^T in an infinite dimensional space. As n depends on the realization n(t), its components n_i are random variables and the images of all realizations constitute a cloud of points, as illustrated in Fig. 2.4 by considering only the first two components n₁ and n₂.

FIGURE 2.4. The cloud of points formed in the (n₁, n₂) plane by the images of the realizations of {n(t)}.

The mean value E{n_i} and the correlation E{n_i n_j^*} of the components n_i can be evaluated by knowing the mean μ(t) and autocorrelation function R(t₁, t₂) of the process {n(t)}. Recalling that, by definition,
μ(t) ≜ E{n(t)},  R(t₁, t₂) ≜ E{n(t₁) n^*(t₂)},
we have
E{n_i} = E\left\{ \int_a^b n(t) φ_i^*(t) dt \right\} = \int_a^b E\{n(t)\} φ_i^*(t) dt = \int_a^b μ(t) φ_i^*(t) dt    (2.60)
and
E{n_i n_j^*} = E\left\{ \int_a^b n(t_1) φ_i^*(t_1) dt_1 \int_a^b n^*(t_2) φ_j(t_2) dt_2 \right\}
= \int_a^b \int_a^b E\{n(t_1) n^*(t_2)\} φ_i^*(t_1) φ_j(t_2) dt_1 dt_2
= \int_a^b \int_a^b R(t_1, t_2) φ_i^*(t_1) φ_j(t_2) dt_1 dt_2.    (2.61)
Similarly, the covariance cov(n_i, n_j) can be evaluated from the autocovariance function C(t₁, t₂) ≜ R(t₁, t₂) − μ(t₁) μ^*(t₂). Indeed, using (2.60) and (2.61), we get
cov(n_i, n_j) ≜ E{n_i n_j^*} − E{n_i} E{n_j^*}
= \int_a^b \int_a^b [R(t_1, t_2) − μ(t_1) μ^*(t_2)] φ_i^*(t_1) φ_j(t_2) dt_1 dt_2
= \int_a^b \int_a^b C(t_1, t_2) φ_i^*(t_1) φ_j(t_2) dt_1 dt_2.    (2.62)


REMARK 2.57. In general, as shown by (2.62), the covariance of the components of n depends on the chosen basis. We wonder whether bases exist such that the components of n are uncorrelated. For such a basis it would be
cov(n_i, n_j) = \int_a^b \int_a^b C(t_1, t_2) φ_i^*(t_1) φ_j(t_2) dt_1 dt_2 = λ_j δ_{ij},  i, j = 1, 2, ...,    (2.63)
with λ_j = var(n_j) = σ²_{n_j}. Recalling that
\int_a^b φ_i^*(t_1) φ_j(t_1) dt_1 = δ_{ij},    (2.64)
we replace (2.64) in the right-hand side of (2.63), obtaining
\int_a^b φ_i^*(t_1) \left[ \int_a^b C(t_1, t_2) φ_j(t_2) dt_2 − λ_j φ_j(t_1) \right] dt_1 = 0,  i, j = 1, 2, ....
A necessary and sufficient condition for the above integral to vanish is that the term in square brackets vanishes for any t₁ ∈ (a, b), i.e.,
\int_a^b C(t_1, t_2) φ_j(t_2) dt_2 = λ_j φ_j(t_1),  a ≤ t_1 ≤ b.
So, the answer to our question is provided by the following theorem.

THEOREM 2.58 (Karhunen-Loève). If {n(t)} is a stochastic process with zero mean and autocovariance function C(t₁, t₂), under hypotheses normally satisfied in practice:
(1) The set {φ_i(t)} of the normalized eigenfunctions of C(t₁, t₂), i.e., the normalized solutions of the Fredholm integral equation
\int_a^b C(t_1, t_2) φ(t_2) dt_2 = λ φ(t_1),  a ≤ t_1 ≤ b,  λ ∈ R,    (2.65)
is a complete orthonormal basis in (a, b).
(2) Denoting by {λ_i} the eigenvalues associated to the eigenfunctions {φ_i(t)} (i.e., the set of real values of λ for which the above equation admits solutions φ(t) not identically equal to zero), we have
C(t_1, t_2) = \sum_{i=1}^{∞} λ_i φ_i(t_1) φ_i^*(t_2).    (2.66)
(3) By using the orthonormal basis {φ_i(t)} for expanding the realizations of the process n(t), the components n_i of such an expansion are zero-mean random variables and
cov(n_i, n_j) = λ_i δ_{ij}.

REMARK 2.59. The orthonormal complete basis obtained by solving (2.65) and the corresponding expansion are referred to as the Karhunen-Loève basis and the Karhunen-Loève series expansion, respectively. The advantage of this expansion is that the components of the process turn out to be uncorrelated. However, this does not mean that they are also independent, unless they are jointly Gaussian.


2.5. Gaussian Random Variables

The N components x_i of the real vector x = (x₁, x₂, ..., x_N)^T are jointly Gaussian if the probability density function (pdf) of x has the following expression
p(x) = \frac{1}{\sqrt{(2π)^N \det C}} \exp\left[ −\frac{1}{2} (x − μ)^T C^{−1} (x − μ) \right]
where μ = E{x} = (μ₁, μ₂, ..., μ_N)^T is the mean value vector and C is the covariance matrix
C = E{(x − μ)(x − μ)^T} =
\begin{pmatrix}
σ²_{x_1} & cov(x_1 x_2) & \cdots & cov(x_1 x_N) \\
cov(x_2 x_1) & σ²_{x_2} & \cdots & cov(x_2 x_N) \\
\vdots & \vdots & \ddots & \vdots \\
cov(x_N x_1) & cov(x_N x_2) & \cdots & σ²_{x_N}
\end{pmatrix}

REMARK 2.60. If the random variables (r.v.) x₁, x₂, ..., x_N are uncorrelated, the matrix C and its inverse C^{−1} are diagonal, such that p(x) turns out to be the product of the marginal pdfs p_i(x_i),
p(x) = \prod_{i=1}^{N} p_i(x_i),  with  p_i(x_i) = \frac{1}{\sqrt{2πσ²_{x_i}}} \exp\left[ −\frac{(x_i − μ_i)^2}{2σ²_{x_i}} \right],
and thus x₁, x₂, ..., x_N are also independent.
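Remark 2.60 can be verified directly: with a diagonal C the joint pdf factors into the marginals. The sketch below (not from the notes) evaluates both expressions at a few random points for an illustrative diagonal covariance; mean and variances are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 0.5])        # mean vector (illustrative)
var = np.array([0.5, 2.0, 1.0])        # variances; uncorrelated -> C diagonal
C = np.diag(var)

def joint_pdf(x):
    # Jointly Gaussian pdf with mean mu and covariance C
    d = x - mu
    N = len(mu)
    quad = d @ np.linalg.inv(C) @ d
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** N * np.linalg.det(C))

def product_of_marginals(x):
    # Product of the one-dimensional Gaussian marginals p_i(x_i)
    return np.prod(np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var))

for _ in range(3):
    x = rng.normal(size=3)
    print(joint_pdf(x), product_of_marginals(x))   # the two values coincide
```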


THEOREM 2.61. If the r.v.s {x_i}₁^N are jointly Gaussian and {y_i}₁^N are obtained by a linear combination
y_i = \sum_{j=1}^{N} a_{ij} x_j,  i = 1, 2, ..., N,
then the r.v.s {y_i}₁^N are jointly Gaussian, too.

REMARK 2.62. It turns out that the components of n are jointly Gaussian if {n(t)} is a Gaussian process. Indeed, a process {n(t)} is Gaussian if (any finite number of) its samples n(t₁), n(t₂), ..., n(t_N) are jointly Gaussian and, taking into account that a generic component of n,
n_i = \int_a^b n(t) φ_i^*(t) dt ≈ \sum_k n(t_k) φ_i^*(t_k) Δt,
is (the limit of) a linear combination of jointly Gaussian r.v.s, it follows that {n_i} are jointly Gaussian.

REMARK 2.63. For a Gaussian process, the pdf of n is determined by the mean μ(t) and the autocovariance C(t₁, t₂). Indeed, the components of n are jointly Gaussian and their joint pdf only depends on μ and C, whose elements are given by
μ_i = \int_a^b μ(t) φ_i^*(t) dt,  c_{ij} = \int_a^b \int_a^b C(t_1, t_2) φ_i^*(t_1) φ_j(t_2) dt_1 dt_2.


DEFINITION 2.64. A stochastic process with mean μ(t) and autocorrelation R(t₁, t₂) is said to be wide-sense stationary (WSS) if μ(t) is independent of time and R(t₁, t₂) depends only on the time difference t₁ − t₂.

REMARK 2.65. As C(t₁, t₂) ≜ R(t₁, t₂) − μ(t₁) μ^*(t₂), a WSS process is such that both autocorrelation and autocovariance functions depend only on τ = t₁ − t₂. In the following, they will be denoted by R(τ) and C(τ).

DEFINITION 2.66. The power spectral density N(f) of a WSS process is defined as the Fourier transform of its autocorrelation function
N(f) ≜ \int_{−∞}^{∞} R(τ) e^{−j2πfτ} dτ,  R(τ) = \int_{−∞}^{∞} N(f) e^{j2πfτ} df.

REMARK 2.67. If x(t) is a WSS process with mean μ_x and autocorrelation function R_x(τ), then the output y(t) = x(t) * h(t) of a filter with impulse response h(t) and corresponding transfer function H(f) is still WSS, and the output power spectral density is given by
N_y(f) = N_x(f) |H(f)|^2.
Indeed, E{y(t)} = E{x(t)} * h(t) = μ_x H(0), and
R_y(τ) = E{y(t + τ) y^*(t)} = E\left\{ \int h(α) x(t + τ − α) dα \int h^*(β) x^*(t − β) dβ \right\}
= \int \int h(α) h^*(β) E\{x(t + τ − α) x^*(t − β)\} dα dβ
= \int \int h(α) h^*(β) R_x(τ − α + β) dα dβ
= R_x(τ) * h(τ) * h^*(−τ).

DEFINITION 2.68. A zero-mean WSS process is referred to as a white process if its power spectral density is independent of frequency. Denoting by N₀/2 its value, we have that
R(τ) = \int_{−∞}^{∞} \frac{N_0}{2} e^{j2πfτ} df = \frac{N_0}{2} δ(τ).
As μ(t) = 0, we also have that C(τ) = R(τ).

REMARK 2.69. If {n(t)} is a white process with N(f) = N₀/2, then the mean value of the components of n is zero and
cov(n_i, n_j) = \frac{N_0}{2} \int_a^b \int_a^b δ(t_1 − t_2) φ_i^*(t_1) φ_j(t_2) dt_1 dt_2 = \frac{N_0}{2} \int_a^b φ_i^*(t_1) φ_j(t_1) dt_1 = \frac{N_0}{2} δ_{ij},
such that they are uncorrelated, whatever the orthonormal basis used.

REMARK 2.70. A white process is not a process of the real world, as it has realizations with infinite energy even over finite time intervals. Indeed, it suffices to notice that the average energy


of its realizations is infinite,
E\left\{ \int |n(t)|^2 dt \right\} = \int E\{|n(t)|^2\} dt = \int R(t, t) dt,
and, as R(τ) is a Dirac delta, R(t, t) = R(0) = ∞ and the integral diverges.

We can attach a meaning to a white process by thinking of it as the limit of a succession of processes whose bandwidth gets larger and larger with respect to the bandwidths of the filters appearing in a communication system. As an example, let us consider a filter H(f) excited by noise n_i(t) with power spectral density N_i(f) constant over its bandwidth, as shown in Fig. 2.5.

FIGURE 2.5. A filter H(f) with bandwidth (−B, B) excited by noise n_i(t) with power spectral density N_i(f) constant over that bandwidth.

As the output power spectral density is given by
N_o(f) = N_i(f) |H(f)|^2,
it does not matter how N_i(f) vanishes outside the bandwidth from −B to B, and, as soon as N_i(f) = N₀/2 for −B ≤ f ≤ B, we always get
N_o(f) = \frac{N_0}{2} |H(f)|^2.
Thus, as regards computing N_o(f), we could replace n_i(t) by a white process with power spectral density N₀/2.

REMARK 2.71. In general, replacing n_i(t) with a white process is only valid for computing the output power spectral density. However, if n_i(t) is Gaussian and zero-mean, then its substitution with a white process is completely equivalent for all purposes. Indeed, if two Gaussian processes have the same mean and autocorrelation function, then they are the same process. Now, substituting n_i(t) with a white process does not change the output power spectral density and thus the autocorrelation function. On the other hand, the output process is still Gaussian and zero-mean. It follows that the output process remains unchanged.

REMARK 2.72. A white process does not satisfy the hypotheses of the Karhunen-Loève theorem. However, it has to be thought of as the limit of a succession of processes with finite bandwidth, each satisfying those hypotheses. As an example, the class of WSS processes with power spectral density (see Fig. 2.6)
N(f) = \frac{N_0/2}{1 + (f/f_0)^2}
is such that lim_{f₀→∞} N(f) = N₀/2, while the Karhunen-Loève basis tends to the Fourier basis and all the eigenvalues to N₀/2.

FIGURE 2.6. The power spectral density N(f) = (N₀/2)/[1 + (f/f₀)²].
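The statement of Remark 2.67 specialized to white noise, N_o(f) = (N₀/2)|H(f)|², can be checked by simulation. Below is a rough sketch (not in the notes) that filters discrete-time white Gaussian noise with a simple FIR filter, used here only as a stand-in for H(f), and compares an averaged periodogram of the output with (N₀/2)|H(f)|²; the filter taps, noise level, and segment length are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2 = 2.0                       # per-sample noise variance, plays the role of N0/2
h = np.array([0.25, 0.5, 0.25])    # simple FIR filter, a stand-in for H(f)

n_seg, seg_len = 2000, 256
psd_est = np.zeros(seg_len)
for _ in range(n_seg):
    x = rng.normal(scale=np.sqrt(sigma2), size=seg_len)
    y = np.convolve(x, h, mode="same")           # filtered noise
    psd_est += np.abs(np.fft.fft(y)) ** 2 / seg_len
psd_est /= n_seg                                  # averaged periodogram of the output

H = np.fft.fft(h, seg_len)                        # filter frequency response
psd_theory = sigma2 * np.abs(H) ** 2              # N_o(f) = (N0/2) |H(f)|^2

err = np.max(np.abs(psd_est - psd_theory)) / np.max(psd_theory)
print(f"max relative deviation: {err:.3f}")       # small, limited by estimation noise
```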


This last result can also be derived by formally solving (2.65),
\int_a^b C(t_1, t_2) φ(t_2) dt_2 = λ φ(t_1),
with
C(t_1, t_2) = \frac{N_0}{2} δ(t_1 − t_2).
It can be easily found that
\frac{N_0}{2} φ(t_1) = λ φ(t_1),
from which we have that λ = N₀/2, whatever φ(t). The functions of the basis remain undetermined. This is to be interpreted in the sense that the Karhunen-Loève basis depends on the succession of processes used for approximating a white process. In the case of the succession previously seen, the basis tends to the Fourier basis, but a different succession may lead to a different basis. In other words, for a white process, any complete orthonormal basis is a Karhunen-Loève basis and all the eigenvalues are equal to N₀/2.

DEFINITION 2.73. Given a real valued white Gaussian process {n(t)} with power spectral density N₀/2, the process defined as
x(t) = \int_0^t n(τ) dτ,  0 ≤ t ≤ T,
is referred to as a Wiener process on the interval (0, T).

EXAMPLE 2.74. In order to find the Karhunen-Loève basis of the Wiener process {x(t)}, we must know the autocovariance function. First of all, we note that {x(t)} has zero mean,
E\{x(t)\} = \int_0^t E\{n(τ)\} dτ = 0,
such that the autocovariance function can be written as
C(t_1, t_2) = E\{x(t_1) x(t_2)\} = E\left\{ \int_0^{t_1} \int_0^{t_2} n(τ_1) n(τ_2) dτ_1 dτ_2 \right\} = \int_0^{t_1} \int_0^{t_2} E\{n(τ_1) n(τ_2)\} dτ_1 dτ_2 = \frac{N_0}{2} \int_0^{t_1} \int_0^{t_2} δ(τ_1 − τ_2) dτ_1 dτ_2.
Performing the integration with respect to τ₁ we get
\int_0^{t_1} δ(τ_1 − τ_2) dτ_1 = 1 if t_1 ≥ τ_2, and 0 otherwise, i.e., u(t_1 − τ_2),
where we expressed the result in terms of the unit step function u(t). Using this result, the integration with respect to τ₂ gives
\int_0^{t_2} u(t_1 − τ_2) dτ_2 = t_1 if t_1 < t_2, and t_2 otherwise,


and thus
C(t_1, t_2) = \frac{N_0}{2} \min(t_1, t_2).
The Fredholm integral equation to be solved is
\frac{N_0}{2} \int_0^T \min(t_1, t_2) φ(t_2) dt_2 = λ φ(t_1).
By breaking the integral in two parts, we can write
\frac{N_0}{2} \int_0^{t_1} t_2 φ(t_2) dt_2 + \frac{N_0}{2} t_1 \int_{t_1}^T φ(t_2) dt_2 = λ φ(t_1),    (2.67)
and, differentiating two times with respect to t₁, we get
−\frac{N_0}{2} φ(t_1) = λ φ''(t_1),
which can be written as
φ''(t_1) + ω^2 φ(t_1) = 0,  ω^2 ≜ \frac{N_0}{2λ}.
By seeking a solution of the form φ(t₁) = e^{γt₁}, γ must satisfy the following equation:
γ^2 + ω^2 = 0,
i.e., γ = ±jω, such that we get
φ(t_1) = c_1 e^{jωt_1} + c_2 e^{−jωt_1},
and, in order to get a real valued function, it must be c₂ = c₁^*. Letting c₁ = (1/2) c e^{jθ}, with c, θ ∈ R, the solution can be written as
φ(t_1) = c cos(ωt_1 + θ).    (2.68)
Replacing (2.68) in (2.67) we get
\frac{N_0}{2} \left[ \int_0^{t_1} t_2 cos(ωt_2 + θ) dt_2 + t_1 \int_{t_1}^T cos(ωt_2 + θ) dt_2 \right] = λ cos(ωt_1 + θ),    (2.69)
which should be satisfied for all t₁ ∈ (0, T), and in particular for t₁ = 0, leading to 0 = cos θ, meaning that it should be θ = π/2 + kπ, with k = 0, ±1, ±2, .... We can use any value of k, as the corresponding values of φ are all equivalent. For example, using θ = −π/2 and normalizing (2.68), we get
φ(t_1) = \sqrt{\frac{2}{T}} sin(ωt_1),
but we are not done yet, as we still have to determine the eigenvalues λ_i. To this end, using θ = −π/2 and also letting t₁ = T in (2.69), we have that
\int_0^T t_2 sin(ωt_2) dt_2 = \frac{2λ}{N_0} sin(ωT),
i.e., by the change of variable ωt₂ = x (and recalling that ω² = N₀/(2λ)),
\int_0^{ωT} x sin x\, dx = sin(ωT)
should also be satisfied. Recalling the integration by parts rule ∫u dv = uv − ∫v du, we get
cos(ωT) = 0,
which is satisfied when the argument of the cosine function is equal to π/2 + (i − 1)π, with i = 1, 2, ..., i.e., when ω assumes the following values
ω_i = \frac{(2i − 1)π}{2T},  i = 1, 2, ....
Recalling that λ = N₀/(2ω²), the eigenvalues are
λ_i = \frac{2 N_0 T^2}{(2i − 1)^2 π^2},  i = 1, 2, ...,
and from (2.68) the normalized eigenfunctions are
φ_i(t) = \sqrt{\frac{2}{T}} sin\left( \frac{(2i − 1)π t}{2T} \right),  i = 1, 2, ....
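The eigenvalues found in Example 2.74 can be checked by discretizing the Fredholm equation (2.65): on a grid of step Δ, the integral operator becomes the matrix CΔ, whose eigenvalues approximate the λ_i. A minimal sketch (not from the notes), with N₀ and T chosen arbitrarily:

```python
import numpy as np

N0, T, n = 2.0, 1.0, 1000
t = (np.arange(n) + 0.5) * (T / n)          # midpoint grid on (0, T)
dt = T / n

# Autocovariance of the Wiener process: C(t1, t2) = (N0/2) min(t1, t2)
C = 0.5 * N0 * np.minimum.outer(t, t)

# Discretized Fredholm equation: (C * dt) phi = lambda * phi
eigval = np.linalg.eigvalsh(C * dt)[::-1]   # sort in decreasing order

i = np.arange(1, 6)
analytic = 2 * N0 * T**2 / ((2 * i - 1) ** 2 * np.pi**2)
print("numeric :", np.round(eigval[:5], 5))
print("analytic:", np.round(analytic, 5))   # lambda_i = 2 N0 T^2 / ((2i-1)^2 pi^2)
```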

CHAPTER 3

Waveform Transmission Over Wideband Channels


Even though, for the sake of generality, we have considered complex valued functions (signals) in our mathematical introduction, from now on we will only consider real valued ones, if not otherwise stated. Complex functions occur when bandpass signals are described using complex envelopes. For example,
s(t) = A(t) cos[ω₀t + θ(t)]
is completely specified by its amplitude A(t) and phase θ(t), once we know the carrier (angular) frequency ω₀. Indeed, we could use the so-called complex envelope s̃(t) = A(t)e^{jθ(t)} for representing s(t), as s(t) = Re{s̃(t)e^{jω₀t}}. The advantage of this representation lies in the fact that, if s(t) is a bandpass signal with its spectrum centered around f₀ = ω₀/(2π), then s̃(t) is a lowpass signal whose spectrum is centered around f = 0. However, in the following we will not use the complex envelope representation and thus all signals we deal with will be real valued.
FIGURE 3.1. System model: the source feeds the transmitter (TX), which sends s(t) over the channel; the noise n(t) is added by the channel, and the receiver (RX) observes r(t).

3.1. Introduction

With reference to Fig. 3.1, a discrete source of information emits every T seconds a message (or symbol) m_k belonging to a finite set {m_i}₁^M of M elements referred to as the alphabet. We will denote by m the generic message and by m_k one of the possible M different messages. In other words, m is a random variable and m_k is one of its possible values. The transmitter rigidly associates each message m_k to one of M different signals {s_i(t)}₁^M and, in correspondence to the emission of m_k, transmits a signal s_k(t) of duration T. We will denote by s(t) the generic transmitted signal belonging to the set of M possible signals, s(t) ∈ {s_i(t)}₁^M. In other words, s(t) is a stochastic process and s_k(t) is one of its possible realizations or sample functions. The transmission channel alters the signal s(t), such that the received signal r(t) (also referred to as the observable) seen by the receiver differs from s(t). All signals s_i(t) are different and they are known to the receiver. If the signals were not altered by the channel, the receiver could distinguish them without ambiguity. The signal may be altered in deterministic or random ways, or both, or even in more complex ways. In the following, we will assume that the channel is simply additive, meaning that the only alteration is the addition of noise, represented by a signal n(t) which is the realization (or sample function) of a stochastic process.
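A rough numerical sketch of this model (not part of the notes): M = 2 hypothetical rectangular signals of duration T, one of which is picked at random and observed in additive Gaussian noise. The waveforms, T, and the noise level are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(7)
T, n = 1.0, 1000                       # signaling interval and samples per interval
t = np.linspace(0.0, T, n, endpoint=False)
dt = t[1] - t[0]

# M = 2 known signals {s_i(t)}, both zero outside (0, T) (illustrative shapes)
signals = [np.ones(n), -np.ones(n)]    # s_1(t) = +1, s_2(t) = -1 on (0, T)

# The source emits a message m; the transmitter sends s_m(t)
m = rng.integers(len(signals))
s = signals[m]

# Additive noise: samples of a (band-limited stand-in for) white Gaussian process
sigma = 0.8
noise = sigma * rng.normal(size=n)

r = s + noise                          # observable r(t) = s(t) + n(t)
print("transmitted message index:", m)
print("energy of s(t):", np.sum(s**2) * dt, " energy of r(t):", np.sum(r**2) * dt)
```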
29

3.1. INTRODUCTION

30

Under the hypothesis that s(t) is transmitted starting from t = 0, the receiver should estimate the transmitted message m based on the observation of the channel output r(t) = s(t) + n(t) in the time interval 0 ≤ t ≤ T. Thus, at the time instant t = T, the receiver will emit a message m̂ (belonging to the alphabet) according to what signal it deems was transmitted. Notice that the observed waveform r(t) is a stochastic process not only because s(t) is a stochastic process (the receiver knows the shape of all signals but does not know which signal was actually transmitted), but also because of the alterations introduced by the channel through the addition of the noise n(t). Without the presence of the noise, the receiver could easily determine what signal was transmitted, as it knows the shape of all possible signals. But, due to the presence of the noise, the receiver may be wrong in its decisions. An error in the decision corresponds to the occurrence of the error event
E = {m̂ ≠ m}.
We would like to determine the strategy the receiver should follow in order to minimize the probability of error
P(E) = P(m̂ ≠ m)
based on the decisions taken from the observation of the received signal r(t). Another aspect we would like to investigate is the impact of the signal shapes on the performance of the receiver. We will see how the probability of error changes when changing the shape of the signals.

3.1.1. Limitations. The model that we are adopting (see Fig. 3.1) is quite simple, also due to some other assumptions that we will make. Let us examine these assumptions.

The noise is simply additive. This is not always true, because there are cases where the noise is multiplicative, as in the case of nonselective fading channels.

Signal and noise are statistically independent. This assumption, too, may not always be true. For example, in optical communications using a simple PIN receiver, the shot noise due to the signal may dominate the electronic (thermal) noise, such that the stronger the signal, the stronger the noise. However, in cheap receivers it may happen that the electronic noise dominates the shot noise, making this assumption valid again. Another case where this assumption holds is that of optical receivers using an optical amplifier before photodetection, as the optical noise added by the amplifier dominates both shot and electronic noise.

The noise is Gaussian. This is not always the case either. For example, in radio links, the electromagnetic pollution in proximity of urban zones has an impulsive nature with non-Gaussian statistics. In the case of optical communications, the shot noise is not strictly Gaussian. However, the optical noise added by optical amplifiers is modeled as a Gaussian process, making the theory we are going to develop still applicable.

Summarizing, we will assume that the noise is additive, Gaussian, and statistically independent of the signal. This model can be applied to many communication systems, such as microwave communications with satellites, radio links, transmission over cables, or, again, optical communications when using optical amplifiers.

3.1.1.1. Signals duration. Another important assumption we will initially make is about the duration of the signals. One of the signals {s_i(t)}₁^M is transmitted in the time interval (0, T), another one in the interval (T, 2T), and so on. Our fundamental hypothesis is that
s_i(t) = 0 for t ∉ (0, T),  i = 1, 2, ..., M,

3.2. OPTIMUM DETECTION STRATEGY

31

which is to say that the signal duration is at most T , such that there is no interference with the signals transmitted in the next time intervals. 3.1.1.2. Impact of the channel. According to the model in Fig. 3.1, the channel has no inuence on the signals (apart from the addition of noise). It is clear that no physical channel behaves like that and our theory would be unusable, in practice. In reality, it is only necessary that the transmitted signals si (t ) have a duration not greater than T and that the channel alters them (in a deterministic and known way to the receiver) such that the duration of the received signals si (t ) is still not greater than T . Indeed, in this case we can replace the actual channel with a ctitious one that does not alter the transmitted signals, provided that we also replace si (t ) with si (t ). Notice that this last assumption is quite restrictive and is tantamount to assuming that the channel bandwidth B is much larger than the signaling rate 1/T . In many practical cases B is of the same order as 1/T and the theory is unt. However, we will relieve this assumption in Chapter 6. As a last note, we will assume that the number of transmitted signals M is nite and the receiver knows those signals, having only to decide which of them was transmitted in each time interval T . Correspondingly, this kind of communication system is referred to as digital, in contrast to analog communication systems where M is innite and the signals are only statistically known to the receiver. 3.2. Optimum Detection Strategy Let us now determine the strategy (i.e., the decision rule) that a receiver should follow for minimizing the probability of error. Choosing a complete orthonormal basis {i (t )} 1 , the received waveform r (t ) can be expanded (with probability 1) as r (t ) = where ri = (r, i ) =
i =1

ri i (t )
0 T

0tT r (t )i (t )dt .

(3.1)

(3.2)

However, in order to avoid dealing with vectors with innite components, we will solve our problem in two steps. First, we will limit the summation in (3.1) to N terms and determine the strategy the receiver would follow if it was observing r (t ) =
N i =1

ri i (t )

(3.3)

instead of r (t ). Then, we will let N , such that r (t ) r (t ). Collecting the components of r (t ) into a vector r = (r1 , r2 , ..., r N )T (3.4)

we can now reformulate the problem as follows. When the source emits the message m, the receiver observes the waveform r (t ), i.e., the vector r , and, based on this observation, decides that m was transmitted. We want to determine in which way the receiver should take his decision such that to minimize the probability of error P(E | r ) = P(m =m|r) (3.5)

3.2. OPTIMUM DETECTION STRATEGY

32

averaged over all possible noise realizations and all possible messages. Indeed, (3.5) is a random variable because it depends on r , whose components are random variables depending on the actual transmitted signal and noise realization. In order to perform the required average, we should have a statistical knowledge of the vector r . This knowledge is summarized in the joint probability density function of its components p(r ). So, the average probability of error to be minimized can be written as P(E ) = E {P(E | r )} = P(E | r ) p(r )d r (3.6)
S

S = ( < r1 < , . . . , < r N < ). Note that the integrand in (3.6) is given by the product of P(E | r ), which depends on the adopted strategy (i.e., on the criterion by which m is decided given that r was received), and p(r ), which is a non-negative quantity independent of the strategy. Thus, (3.6) is minimum if the strategy minimizes P(E | r ) for any r . Thus, let us see how this can be achieved. Denoting by C the event correct decision, we have that P(E | r ) = 1 P(C | r ) , (3.7)

where

and thus, minimizing P(E | r ) is tantamount to maximizing P(C | r ). If the receiver, after observing r , selects the message mk , the probability of correct decision is the same as the probability that mk was transmitted, i.e., P(C | r ) = P(mk | r ), because the events correct decision conditional upon r and the transmitted message mk coincides with the selected message m (r )

are equivalent, and hence m (r ) = mk In other words, for maximizing P(C | r ) we should select the message m that maximizes P(m | r ), i.e., the a posteriori probability that message m is transmitted. 3.2.1. Maximum A Posteriori Probability (MAP) Criterion. Thus, our optimum decision strategy consists in selecting the message whose a posteriori probability of being transmitted is maximum. The MAP criterion can be expressed as m = argmax P(mi | r )
mi

P(C | r ) = P(mk | r ) .

(3.8)

where argmax denotes the value that maximizes a function. For example, argmax x f ( x ) denotes the value x for which f ( x ) is maximum, such that (for y > 0) argmax x (2 xy2 x 2 y) = y. Similarly, for x > 0, argminy (2 xy2 x 2 y) = x /4, because 2 xy2 x 2 y is minimum for y = x /4. Note that if two or more messages have the same a posteriori probability, the receiver can select one of them arbitrarily, without affecting P(C | r ) and thus also P(E ). A more convenient expression of the MAP criterion can be obtained by using the Bayes formula P(mi ) p(r | mi ) P(mi | r ) = (3.9) p(r ) where P(mi ) is the a priori probability that mi is transmitted and p(r | mi ) is the pdf of r conditional upon the transmission of mi .

3.3. ADDITIVE WHITE GAUSSIAN NOISE CHANNEL

33

As p(r ) 0 is independent of the decision rule, it does not affect the maximum, such that the MAP criterion becomes m = argmax P(mi ) p(r | mi ) (3.10)
mi

Note that the receiver must have a statistical knowledge of both source and channel. Indeed, it has to know the a priori probabilities of the messages and it has to know the pdf of r which depends on the noise, i.e., on the channel. 3.2.2. Maximum Likelihood (ML) Criterion. In case all messages are equally likely, the MAP criterion becomes m = argmax p(r | mi ) (3.11)
mi

which is referred to as maximum likelihood (ML) criterion. If the receiver has no statistical knowledge of the source, i.e., if it does not know the probabilities P(mi ), the MAP criterion cannot be followed and often one fall backs on the ML criterion. Clearly, if the receiver adopts this criterion, the probability of error is not minimum, unless the transmitted symbols are equally probable. However, if the probabilities P(mi ) do not differ much, ML and MAP are practically equivalent. Indeed, from the point of view of the receiver, the probability that mi is transmitted is equal to P(mi ) before observing r (t ), while it becomes P(mi | r ) afterwards. Now, observing r (t ) changes the probabilities of the messages, generally increasing the probability of the message actually transmitted. Thus, if this is mk , P(mk | r ) is normally much greater than the a posteriori probabilities of the other messages and then On the other hand, if the probabilities P(mi ) are not much different, this means that p(r | mk ) > p(r | mi ) i = k P(mk ) p(r | mk ) P(mi ) p(r | mi ) i = k .

and thus, the MAP and ML criteria perform about the same. 3.3. Additive White Gaussian Noise Channel Until now, we have made no assumption on the noise, so the previous relations hold whatever the noise statistics. Let us now examine the important case in which the noise is additive, white and Gaussian (AWGN channel). If message mi is transmitted, the received signal is r (t ) = si (t ) + n(t ) and hence r = si + n (3.12)
N where si and ni are the images of si (t ) and n(t ) with respect to {i (t )}1 . Denoting by pn (n ) the pdf of n , from (3.12) it follows that p(r | mi ) = pn (r si ) (3.13)

such that we can compute p(r | mi ) through pn (n ).

3.3. ADDITIVE WHITE GAUSSIAN NOISE CHANNEL

34

and det C = (N0 /2)N . Thus

The N components of n are zero-mean jointly Gaussian r.v.s and, as the noise is white, uncorrelated. Hence, they are also independent. This means that their covariance matrix C is diagonal and, if the power spectral density of the noise is N0 /2, N0 / 2 0 0 2/ N 0 0 0 0 N / 2 0 0 2 / N 0 0 0 1 C= , C = . . . . . . . . . . . . . . . . . . . . . . . . 0 0 N0 / 2 0 0 2/ N 0 pn (n ) = = and 1 (2 )N 1 exp n T C1 n 2 det C
N j =1 N j =1

1 1 exp ( N0 )N /2 N0

n2 j

(3.14)

1 1 p(r | mi ) = exp N / 2 ( N 0 ) N0

(r j si j )2 .

(3.15)

Taking into account this expression, the MAP and ML criteria can be written as MAP: m = argmin
mi N j =1

(r j si j )2 N0 ln P(mi )
N j =1

ML: m = argmin
mi

(r j si j )2

Letting now N , such that r (t ) r (t ), we obtain MAP: ML: m = argmin


mi

r si
mi

N0 ln P(mi )
2

(3.16) (3.17)

m = argmin

r si

being r si = di the distance between r and si .


s3 d3 d2 s2 r d1 s1

F IGURE 3.2.

3.4. OPTIMUM RECEIVER STRUCTURES

35

These expressions have a remarkable geometric interpretation (see Fig. 3.2). The receiver computes the distance di between r and si . Then, according to the ML criterion selects m = mk if r is closer to sk . Instead, according to the MAP criterion, the terms N0 ln P(mi ), accounting for the a priori probabilities of the messages, are to be subtracted to di2 . The greater P(mi ), the greater the quantity subtracted to di2 , and thus more frequent the decision in favor of mi . 3.4. Optimum Receiver Structures Let us now devise some block diagrams implementing the described decision strategies. We will focus on the MAP strategy, as the ML one is only a particular case. 3.4.1. Correlation Receiver. By expressing the distances through integrals, the MAP criterion (3.16) can be rewritten as T [r (t ) si (t )]2 dt N0 ln P(mi ) (3.18) m = argmin
mi 0

which, expanding the square and neglecting the terms that do not depend on the decision strategy, easily leads to m = argmax
mi T 0

r (t )si (t )dt + Ci

(3.19)

where Ci Ei

N0 Ei ln P(mi ) , 2 2 T si2 (t )dt .


0

(3.20) (3.21)

Of course, the energy Ei of the signal si (t ) is known to the receiver because it knows all the signals. If the receiver also knows N0 and P(mi ), i.e., it has a statistical knowledge of both channel and source, the Ci s are known constants and the MAP strategy can be implemented as shown in Fig. 3.3. This is called a correlation receiver, as the cascade of a multiplier and an integrator is referred to as a correlator.1
s1(t )
T 0

C1 rT s 1
CHOOSE MAX

s2(t ) r(t )
T 0

C2 rT s 2

sM (t )

. . .
T 0

CM rT s
M

F IGURE 3.3.
1

As we deal with real quantities, we can omit the conjugate in internal products, such that (r, s) = rT s.

3.4. OPTIMUM RECEIVER STRUCTURES

36

3.4.2. Matched Filter Receiver. An alternative method to implement the MAP strategy consists in substituting the correlators with matched lters. A lter is matched to the signal si (t ) if its impulse response is hi (t ) = si (T t ). Let us show how the correlators can be replaced with lters matched to the signals si (t ). Taking into account that the signals are real valued and vanish outside the time interval (0, T ), the output yi (t ) in response to the input r (t ) is given by yi (t ) = r ( )hi (t )d = r ( )si (T t + )d t = r ( )si (T t + )d and thus
t T

yi (T ) =

r ( )si ( )d

such that, at the time instant t = T , the response of hi (t ) to the input r (t ) coincides with the output of the i -th correlator in Fig. 3.3. Hence, another possible implementation of the MAP strategy is as shown in Fig. 3.4.
C1 s1 ( T t ) rT s 1 C2 r(t ) s2 ( T t ) rT s
2

CHOOSE MAX

. . .
sM ( T t )

. . .
rT s t=T
M

CM

F IGURE 3.4. 3.4.3. Other Receiver Structures. If the number M of signals is large, the previous implementations turn out to be expensive, because of the large number of correlators or matched lters they require. In such cases it may be advantageous implementing the MAP strategy in a different way. In the absence of noise, we could use an orthonormal basis for the subspace S generated by the Q M signals {si (t )}1 . For example, we can obtain such a basis, say {i (t )}1 , by means of the GramSchmidt procedure. As known, the dimension Q of the subspace S is less than or equal the number M of signals, i.e., Q M . However, in the presence of noise, we need a complete orthonormal basis in order to be able to expand the received signal. Supposing that {i (t )} 1 is such a basis,

3.4. OPTIMUM RECEIVER STRUCTURES

37

we can insert the functions i (t ) in the rst Q positions by correspondingly eliminating one of the functions i (t ), for some i , as explained in Theorem 2.23. At the end of the procedure we will have Q . The components of the images a complete orthonormal basis whose rst Q functions are2 {i (t )}1 M si of the signals {si (t )}1 with respect to this complete orthonormal basis are different from zero only in the rst Q positions, i.e., si = (si1 , si2 , ..., siQ , 0, 0, 0, ...)T
Q is a basis of the subspace generated by the signals, because, as {i (t )}1 Q

si (t ) =
j =1

si j j (t )

and thus each si (t ) is orthogonal to all functions k (t ) which have not been eliminated, due to the Q are orthogonal to them. fact that {i (t )}1 Now, going back to the MAP criterion written in the form (3.10) and reported here for convenience m = argmax P(mi ) p(r | mi ) ,
mi

(3.10)

if we rewrite the vector r as r= where

r1 r2

(3.22)

r1 r Q +1 . r Q +2 . r1 = , r = (3.23) 2 . . . rQ . i.e., such that r1 collects the rst Q components of r and r2 all remaining ones, we can also rewrite the MAP criterion as (3.24) m = argmax P(mi ) p(r1 , r2 | mi ) .
mi

Now, applying the chain rule factorization3 p(r1 , r2 | mi ) = p(r1 | mi ) p(r2 | r1 , mi ) we can also write m = argmax P(mi ) p(r1 | mi ) p(r2 | r1 , mi ) .
mi

(3.25) (3.26)

If r2 , conditional on r1 , is independent of mi , i.e., if then this last term does not play any role in the decision, which can be based only upon r1 . This means that the vector r2 is irrelevant to the decision and can be discarded, such that the MAP criterion reduces to m = argmax P(mi ) p(r1 | mi ) . (3.28)
mi

p(r2 | r1 , mi ) = p(r2 | r1 ) ,

(3.27)

Inserting the functions i (t ) provides a complete basis whose functions not necessarily are orthonormal. In this case, we can still apply the Gram-Schmidt procedure for orthonormalizing the new set, whose rst Q functions still still turn out to be i (t ), i = 1, 2, . . . , Q. 3 The equality is due to the fact that p(r2 | r1 , mi ) p(r1 , r2 | mi )/ p(r1 | mi ).
2

3.4. OPTIMUM RECEIVER STRUCTURES

38

In such a case, we say that the vector r1 is a sufcient statistics. This is exactly our case, because r = si + n and only its rst Q components depend on mi , such that xing r1 does not affect r2 . Hence, the MAP criterion can be written as
Q

m = argmax
mi j =1

r j si j + Ci

(3.29)

and can be implemented as shown in Fig. 3.5.


1(t )
i = 1, . . . , M
T 0

C1 r1 r s1
T

Q (t )

. . .
T 0

COMPUTE rT si

r (t )

T 0

r2

r s2
T

CHOOSE MAX

2(t )

C2

. . .
rT s M

rQ

CM

F IGURE 3.5. The advantage of this implementation is that it requires Q correlators instead of M and, usually, Q M . However, due to the fact that we now need to compute the correlations rT si , this scheme might not be more convenient than the other one. It depends on the number of signals M and on the dimension Q of their subspace. So, case by case, the cost of the network needed for computing rT si should be compared with the savings obtainable in terms of number of correlators in order to nd out what implementation is cheaper. Also in this case, we could use matched lters instead of correlators, as shown in Fig. 3.6.
C1
i = 1, . . . , M

1(T t )

r1

r s1
T

. . .
Q (T t ) rQ t=T

COMPUTE rT si

r (t )

2(T t )

r2

r s2
T

CHOOSE MAX

C2

. . .
rT s M

CM

F IGURE 3.6. Unless otherwise stated, in the following we will implicitly use complete orthonormal bases whose rst Q functions are an orthonormal basis for the subspace generated by the signals, such that we

3.4. OPTIMUM RECEIVER STRUCTURES

39

will be dealing only with Q-dimensional vectors. All other components that would be normally required in order to correctly represent the received waveform are irrelevant for the detection and can be simply discarded. Let us apply this concept to a practical example. E XAMPLE 3.1 (QPSK system). The so called quadrature phase-shift keying (QPSK) system is often used in radio links, satellite links, telephone lines, and ber-optic communications. It is also referred to as 4-PSK system as the employed signals are 4 sinusoidal pulses of duration T differing by their relative phase shifts: si (t ) = A cos(0 t i ) where i = (i 1) 0tT (3.30)

i = 1, 2, 3, 4 (3.31) 2 i.e., i {0, /2, , 3 /2}. Let us nd the optimum receiver for the AWGN channel with power spectral density N0 /2. Our system model is as shown in Fig. 3.1. As a rst step, we have to nd an orthonormal basis for the signals (3.30). Using the trigonometric identity cos( ) = cos cos + sin sin , they can be written as si (t ) = A cos i cos 0 t + A sin i sin 0 t 1 (t ) = a cos 0 t 2 (t ) = b sin 0 t (3.32) i.e., as a linear combination of the signals (3.33)

Let us now check whether they are orthogonal. By using the trigonometric identity sin 2 = 2 sin cos , their internal product turns out to be T ab T ab 1 (t )2 (t )dt = sin 20 t dt = (1 cos 20 T ) (3.34) 2 0 40 0 such that 1 (t ) and 2 (t ) are orthogonal if 20 T = k 2 , i.e., 0 T = k (3.35) with k an integer. As cos 0 t and sin 0 t are periodic with period 2 /0 , this means that an integer number of half-periods should be contained in the signaling interval (0, T ), as shown in Fig. 3.7. sin(0t ) T t
2 0 2 0

cos(0t )

F IGURE 3.7. The transmitter, according to the message emitted by the source, should transmit one of the signals {si (t )}4 1 . We can obtain s1 (t ) from an oscillator and synthesize the other ones by means of phaseshifters, such that the transmitter could be implemented as shown in Fig. 3.8.

3.4. OPTIMUM RECEIVER STRUCTURES

40

mk

sk (t )

s1(t )

s2(t ) /2

s3(t )

s4(t ) 3 / 2

A cos 0 t OSCILLATOR F IGURE 3.8. Every T seconds, the selector selects one of the signals {si (t )}4 1 , such that the clock frequency is 1/T . However, the oscillator frequency 0 is independent of the clock frequency. Hence, it is difcult to guarantee that (3.35) holds. Luckily, if 0 T 1, it is easy to show that the internal product of 1 (t ) and 2 (t ) is negligible with respect to their norms, and thus (see Fig. 3.9) T 1 2 cos = 1 2 0 2 (0 T 1)

2 1
F IGURE 3.9.

Indeed, under the hypothesis that 0 T 1, T a2 T a2 T sin 20 T 2 1 2 = 1 (t )dt = (1 + cos 20 t )dt = 1+ 2 0 2 20 T 0 T b2 T b2 T sin 20 T 2 2 (t )dt = (1 cos 20 t )dt = 2 2 = 1 2 0 2 20 T 0 and, taking into account (3.34), we get T 1 2 1 2
ab (1 40

a2 T 2 b2 T 2

(3.36) (3.37)

cos 20 T ) abT /2

1 cos 20 T 20 T

for 0 T

1.

From (3.36) and (3.37) we also see that, in order to have 1 = 2 = 1 it should be a = b = 2/T , and hence our orthonormal basis is 1 (t ) = 2 (t ) = 2 cos 0 t T 2 sin 0 t T

(3.38)

3.4. OPTIMUM RECEIVER STRUCTURES

41

Implementing an optimum correlation receiver as in Fig. 3.5, requires a linear network for computing the correlations rT si , such that we need to compute the images of the signals (3.30) with respect to the basis (3.38), i.e., the coefcients si1 and si2 of the expansion 2 cos 0 t + si2 T By comparing (3.32) and (3.39), we immediately obtain si (t ) = si1 1 (t ) + si2 2 (t ) = si1 si1 = A si2 = A such that the images our signals are 0 T s11 s A 21 , 2 = = , T s12 s A 2 22 0 r T s1 = A T cos i 2 T sin i 2 s31 A = s32 0
T 2 ,

2 sin 0 t . T

(3.39)

(3.40)

and hence the required correlations turn out to be

T T T T r1 , rT s2 = A r2 , r T s3 = A r1 , r T s4 = A r2 . (3.42) 2 2 2 2 We would also need the constants Ci in Fig. 3.5, but, as N0 Ei Ci = ln P(mi ) 2 2 and, for 0 T 1, T A2 T i , Ei = si2 (t )dt 2 0 we can omit them under the hypothesis that the messages are equally likely, because in this case the Ci s are the same for all signals. This is also true for the constants A T /2 in (3.42), which can be omitted, too. In conclusion, the block diagram of our optimum receiver is as in Fig. 3.10.
T 0
2 T

0 s41 = s42 A

T
2

(3.41)

r1
CHOOSE MAX

cos 0 t OSCILLATOR /2 1
T 0

r2 r1 r2

r (t )
2 T

sin 0 t

F IGURE 3.10. E XAMPLE 3.2 (Synchronization issues). In the implementation of a correlation receiver, there is a problem that should be addressed. The transmit and receive oscillators are independent of each other. So, there is no guarantee that they are synchronized, i.e., that their relative phase-shifts are the same. Most probably, they are not. Our computations were based on the fact that the local

3.4. OPTIMUM RECEIVER STRUCTURES

42

oscillator (i.e., the oscillator at the receive end) was synchronized to the received signals. If this is not the case, we would suffer a penalty and the receiver would not be optimum anymore. Indeed, let us continue Example 3.1 and suppose that the local oscillator is affected by a phase-shift , such that its output is 2/T cos(0 t + ) instead of 2/T cos 0 t . Letting r (t ) = si (t ) + n(t ), where si (t ) is as in (3.30) and n(t ) is AWGN with power spectral density N0 /2, for 0 T 1, the output of the correlators in Fig. 3.10 would be r1 = r2 = where si1 = A n1 = T cos(i + ) 2 2 T n(t ) cos(0 t + )dt T 0 si2 = A n2 = T sin(i + ) 2 2 T n(t ) sin(0 t + )dt , T 0 (3.43) (3.44) 2 T 2 T
0 T T

[si (t ) + n(t )] cos(0 t + )dt = si1 + n1 [si (t ) + n(t )] sin(0 t + )dt = si2 + n2

instead of r j = si j + n j (for j = 1, 2), where (3.40) and the n j s are zero-mean the si j s are as in Gaussian r.v.s with variance N0 /2. Since 2/T cos(0 t + ) and 2/T sin(0 t + ) are still orthonormal in (0, T ), n1 and n2 in (3.44) are still zero-mean Gaussian r.v.s with variance N0 /2. Thus, while the noise components remain unchanged, the signal components change, so that the image of the received signal may be closer to the image of the wrong signal, increasing the probability of error. For example, if = /2, the receiver would exchange s1 with s2 , s2 with s3 , and so on, and would be in error almost always. There are several techniques that ensure synchronization with the received signals, but they are beyond the scope of this course. Here, we only mention a self-synchronization technique that avoids a local oscillator and provides the required oscillation directly from the received signal. Using this technique, the block diagram of a QPSK receiver becomes as in Fig. 3.11, and we are now going to explain the operations that the SYNC circuit has to perform.
T 0

r1
CHOOSE MAX

r (t )

r2 1 r1 r2

/2
T 0

SYNC

F IGURE 3.11.

3.4. OPTIMUM RECEIVER STRUCTURES

43

In the absence of noise, the receiver knows that one the four signals (3.30) can be received. Their images si with respect to the basis (3.38) are shown in Fig. 3.12. We note that multiplying by 4 their angles transforms each of them into s1 . This means that if we are able to multiply by 4 the phase of the received signals, we actually remove the phase modulation and are able to obtain the required oscillation, which would be automatically in-phase with s1 (t ).
4

s2 s3 s4 F IGURE 3.12. s1

We can do this by using a nonlinear device whose output is the fourth power of the input. Indeed si4 (t ) = A cos(0 t i ) = A4 3 A4 A4 + cos(20 t 2i ) + cos(40 t 4i ) 8 2 8

and we are interested in the last term. Taking into account that the Fourier transform of si4 (t ) is composed by Dirac deltas centered at f = 0, 2 f0 , 4 f0 , being f0 = 0 /2 , we can obtain the desired term by using a very narrow bandpass lter (BPF) H ( f ) centered at 4 f0 (see Fig. 3.13).
H( f )

4 f 0

2 f 0

2 f0

4 f0

F IGURE 3.13. In order to obtain the desired frequency of oscillation, we have to use a frequency divider to bring down to f0 the oscillation at 4 f0 . So, a block diagram of the SYNC circuit in Fig. 3.11 is as in Fig. 3.14.
SYNC r (t )
4

BPF at 4 f0

FREQ. DIVIDER 4

cos 0 t + (t )

F IGURE 3.14. Due to the presence of the noise, the output of the SYNC circuit will not be proportional to cos 0 t but rather to cos 0 t + (t ) , where (t ) is hopefully small, both because the noise power over the signal bandwidth is normally much smaller than the signal power and because of the very narrow bandpass lter. However, as small as it can be, the phase noise (t ) still has an impact on the probability of error. For this reason, a local oscillator is still preferred and more sophisticated techniques are actually employed to ensure phase synchronization. E XAMPLE 3.3 (Matched-lter receiver). Let us now see how the QPSK receiver of Example 3.1 can be implemented by using matched lters. We need a couple of lters whose impulse response is matched to the signals of the basis (3.38). We can devise circuits having the desired impulse response by using resistors, capacitors, and inductors.

3.4. OPTIMUM RECEIVER STRUCTURES

44

A resistor is an electrical component that implements electrical resistance as a circuit element. The voltage across resistors terminals is proportional to the current owing through it. The constant of proportionality is called resistance. This relationship is represented by the Ohms law: v(t ) = R i (t ) where R is the resistors resistance measured in ohms (see Fig. 3.15).

F IGURE 3.15.

A capacitor is an electrical component basically consisting of two metal plates separated by a dielectric (insulator). Its circuit representation is shown in Fig. 3.16. Applying a voltage across the plates, an electric eld develops across the dielectric, causing positive charge to collect on one plate and negative charge on the other plate. The accumulated charge q(t ) is proC portional to the applied voltage v(t ), the constant of proportionality being the capacitance C (measured in farads), i.e., q(t ) = C v(t ). DifF IGURE 3.16. ferentiating this expression with respect to time t , the resulting current i (t ) = dq(t )/dt and the voltage v(t ) across the capacitor are related by dv(t ) i (t ) = C dt As can be seen, if v(t ) is constant, apart from an initial transient, a current cannot ow (due to the insulator between the plates) and we get i (t ) = 0 from the relation above. However, applying a varying voltage, the accumulated charges on the two plates continuously change, causing the ow of a (varying) current in the circuit external to the capacitor. An inductor is a conductive wire coiled in loops to reinforce the magnetic eld created by a current. Its circuit representation is shown in Fig. 3.17. An inductor opposes changes in the current through it by developing a voltage across it proportional to the rate of change of the current (it means that an ideal inductor would offer no resistance to a constant direct current). Indeed, according to Faradays law of induction, the voltage v(t ) induced in any closed circuit is equal to the time rate of change of the magnetic ux (t ) through the circuit, i.e., v(t ) = d (t )/dt . As the L magnetic ux through the coil is proportional to the current through it, i.e., (t ) = L i (t ), it follows that the relationship between the current through an inductor and the voltage across it is given by F IGURE 3.17. di (t ) v(t ) = L dt where the constant of proportionality L is called inductance (measured in henries). By using the Kirchhoffs circuit laws, we can analyze any linear electrical circuit by applying the previous relationships to write the differential equations governing the circuit. Usually, these differential equations turn out to be linear with constant coefcients, such that their solution can be obtained by means of either the Laplace transform (if we are interested in transients due to initial conditions) or the Fourier transform (if we are only interested to the steady state solution). Indeed, let us see how we can turn a differential equation into an algebraic one by the help of the Fourier transform.

3.4. OPTIMUM RECEIVER STRUCTURES

45

Recalling that, if X ( f ) is the Fourier transform of x (t ), then j 2 f X ( f ) is the Fourier transform of dx (t )/dt , and using for convenience the angular frequency 2 f , we can rewrite in the frequency domain the relationships between the current through and the voltage across the previous circuit elements as v(t ) = R i (t ) dv(t ) i (t ) = C dt di (t ) v (t ) = L dt = = = V () = R I () I () = j C V () V () = j L I ()

Hence, in the frequency domain, capacitors and inductors behave exactly as a resistor, i.e., voltages across and currents through them are proportional to each other. From the relations above, an inductor is equivalent to a resistor whose resistance is j L, while a capacitor is equivalent to a resistance with value 1/ j C . However, when working in the frequency domain, we refer to such (complex) values as impedances rather than resistances. Going back to our initial task, under the hypothesis that 0 T = 2k , with k an integer, we have to devise two lters whose impulse responses are matched to the signals (3.38), i.e., 2/T cos 0 t 0 t T 1 (T t ) = (3.45) 0 otherwise 2/T sin 0 t 0 t T 2 (T t ) = (3.46) 0 otherwise

To this end, let us consider the two circuits in Fig. 3.18.

(t )
+ C L h p (t ) (a) F IGURE 3.18. +

L + C hs (t ) (b)

(t )

The circuit in Fig. 3.18(a) is driven by a current, while the one in Fig. 3.18(b) by a voltage. If the input is a (current or voltage) Dirac delta, the output voltage is the impulse response. Let us denote the impulse response of the circuits in Fig. 3.18(a) and (b) as h p (t ) and hs (t ), respectively. Working in the frequency domain, we can deal with capacitors and inductors as if they were resistors with resistance or, better, impedance equal to 1/ j C and j L, respectively. As the Fourier transform of a Dirac delta is equal to 1, the input current in Fig. 3.18(a) becomes I () = 1 and the input voltage in Fig. 3.18(b) V () = 1. Thus, the corresponding circuits in the frequency domain are as in Fig. 3.19.

3.4. OPTIMUM RECEIVER STRUCTURES

46

I ( ) = 1 + 1 jC (a) F IGURE 3.19. j L H p ( ) + V ( ) = 1

j L + 1 jC (b) Hs( )

The impedance Z p equivalent to the parallel of two impedances Z1 and Z2 is given by Zp = So, letting Z1 = 1/ j C and Z2 = j L, Zp =
1 j L j C 1 + j L j C

Z1 Z2 . Z1 + Z2 j L 1 = 2 1 LC C j 2

1 LC

and the voltage H p () in Fig. 3.19(a) turns out to be H p () = Z p I () = where 1 j 2 C 2 0 (3.47)

1 . LC As regards the circuit in Fig. 3.19(b), the current through it is given by 2 0


1 j C

(3.48)

V () + j L

and hence, the output voltage is Hs () = where 2 0 is again as in (3.48). In order to nd the inverse Fourier transform of (3.47) and (5.22), let us consider the signal x (t ) = u(t )et where u(t ) is the unit step function and > 0, and compute its Fourier transform as a function of e(+ j )t 1 e(+ j )t dt = = . X () = + j 0 + j 0 1 j C 2 V () 0 = 1 2 2 + j L 0 j C (3.49)

3.4. OPTIMUM RECEIVER STRUCTURES

47

Recalling that the Fourier transform of x (t ) e j 0 t is X ( 0 ), the Fourier transforms of x (t ) cos 0 t = 1 x (t ) e j 0 t + e j 0 t 2 1 x (t ) sin 0 t = x (t ) e j 0 t e j 0 t 2j

are, respectively, 1 1 1 1 X ( 0 ) + X ( + 0 ) = + 2 2 + j ( 0 ) + j ( + 0 ) 1 1 1 1 X ( 0 ) X ( + 0 ) = 2j 2 j + j ( 0 ) + j ( + 0 ) + j + ( + j )2 0 2 0 + ( + j )2 2 0 Now, letting 0, we obtain j 2 2 0 0 2 2 0


F

such that

u(t )et cos 0 t u(t )et sin 0 t

u(t ) cos 0 t u(t ) sin 0 t 1 cos 0 t t 0 C h p (t ) = 0 t<0 0 sin 0 t t 0 hs (t ) = 0 t<0

such that 1 j H p () = 2 C 2 0 Hs () = 2 0 0 2
F

(3.50) (3.51)

As can be seen, (3.50) and (3.51) differ from (3.45) and (3.46) because

(1) their amplitudes are different, and because (2) (3.50) and (3.51) do not vanish outside the time interval (0, T ). The rst point above does not matter, because we can obtain the required amplitudes by proper amplication. As regards the second point, the output y p (t ) of the lter h p (t ) in response to the input r (t ) is given by y p (t ) = r ( )h p (t )d t = r ( )h p (t )d 1 t r ( ) cos 0 (t )d (3.52) = C

3.5. PROBABILITY OF ERROR

48

while the output y1 (t ) of the lter 1 (T t ) would be t y1 (t ) = 2/T r ( ) cos 0 (t )d


t T

(3.53)

At rst sight, (3.52) and (3.53) appear to be different. However, if we apply r (t ) at the input of h p (t ) at time t = 0, (3.52) becomes 1 t y p (t ) = r ( ) cos 0 (t )d C 0 because it would be r ( ) = 0 for < 0, such that for t = T we have y p (T ) = y1 (T ) apart from an inessential constant of proportionality. Similarly, it can be shown that ys (T ) = y2 (T ), such that the block diagram of an optimum QPSK receiver using matched lters is as in Fig. 3.20.
r1 r (t )
1 1

r2 r1 r2

CHOOSE MAX

t=T

F IGURE 3.20. 3.5. Probability of Error All optimum receivers follow the same strategy, even if their actual implementation is different. In the following we will refer to the MAP criterion, being the ML criterion a particular case. 3.5.1. Decision Zones. As already said, using a complete orthonormal basis whose rst Q Q M functions {i (t )}1 are an orthonormal basis of the subspace S generated by the M signals {si (t )}1 , the receiver should only take into account the projection of the received waveform r (t ) onto this subspace. This means that it has to deal with Q-dimensional vectors, being all other components irrelevant to the decision. In order to avoid introducing further symbols, we will still denote by r and si the images of r (t ) and si (t ) in the signal subspace S . According to the MAP criterion, the receiver follows the rule m = mk where Dk denotes the decision zone for message mk if r Dk
mi

Dk = r RQ : P(mk ) p(r | mk ) = max P(mi ) p(r | mi ) .

(3.54)

3.5. PROBABILITY OF ERROR

49

Hence, the receiver operates as follows. The signal subspace is partitioned (once and for all) into M regions. After observing the waveform r (t ), the receiver computes its image r and decides that mk was transmitted if r belongs to the region Dk . Fig. 3.21 shows a possible partitioning of the signal space into decision zones for the case Q = 2, M = 3.
s3 D3 D1 s1 D2 s2

F IGURE 3.21. When message mi is transmitted, the receiver makes an error if r Di , such that the probability of error, conditional upon the transmission of mi , can be written as or, in terms of the probability of correct decision P(C | mi ), as So, the average probability of error can be expressed as P (E ) = where P(C | mi ) = P(r Di | mi ) =
M i =1

P(E | mi ) = P(r Di | mi )

P(E | mi ) = 1 P(C | mi ) = 1 P(r Di | mi ) . P(E | mi )P(mi ) = 1


M i =1

P(C | mi )P(mi ) p(r | mi )d r

(3.55)

Di

(3.56)

can be computed using the pdf of r conditional upon mi , such that M P(E ) = 1 P(C ) = 1 P(mi ) p(r | mi )d r .
i =1 Di

(3.57)

As can be seen from (3.54) and (3.57), the M a priori probabilities P(mi ) and the M probability density functions p(r | mi ) sufce to specify the optimum detection strategy and the probability of error of an optimum receiver. The actual implementation depends on the specic model used for the transmission channel, which determines the functional form of the pdfs p(r | mi ). 3.5.2. Probability of error for the AWGN channel. In this case, conditional upon the transmission of mi , the image of the received waveform is r = si + n and hence (see also (3.15)) p(r | mi ) = pn (r si ) = 1 r si exp (2 2 )Q/2 2 2
Q j =1 2

1 1 = exp (2 2 )Q/2 2 2 where 2 N 0 / 2.

(r j si j )2

(3.58)

3.5. PROBABILITY OF ERROR

50

It follows that the decision zones (3.54) can be expressed as Dk = r RQ : r sk r sk


Q j =1 2 2

2 2 ln P(mk ) = min r si
mi 2

2 2 ln P(mi )

and their boundaries are actually hyperplanes, because at a boundary the following equation holds i.e., 2 2 ln P(mk ) = r si
2 Q j =1

2 2 ln P(mi ) (3.59)

(r j sk j ) 2 ln P(mk ) =

(r j si j )2 2 2 ln P(mi )

which is clearly linear in the r j s, as all squared terms r 2 j can be simplied. The probability of error can be computed as in (3.55) after evaluating the M integrals (3.56) using (3.58). Referring to the ensemble of the images of the transmitted signals as the signal constellation, the evaluation of these integrals can be made simpler by exploiting the following observations: (1) A rigid-body displacement of the signal constellation (i.e., a simultaneous translation and rotation without change of shape or size) entails the same rigid-body displacement of the decision zones. (2) A rigid-body displacement of the signal constellation does not change the conditional probabilities of error and thus the average error probability. It is easy to show that the two observations above hold. Indeed, denoting by x a displacement vector and by A a rotation matrix (such that A A = AA = I), and letting r = Ar + x, si = Asi + x, we have that r si 2 = Ar + x (Asi + x) 2 = A(r si ) 2 = r si 2 . So, if (3.59) holds for r and si , then it also holds for r and si , because the decision zones are displaced in the same manner as the signals. On the other hand, replacing r with r and si with si , also (3.58) remains unchanged, such that the integrals (3.56) have the same value after the rigid-body displacement. As an example, Fig. 3.22 shows the constellation in Fig. 3.21 rst rotated clockwise by /4 and then translated by x = ( E3 /2, E3 /2)T , E3 being the energy of s3 .
p(r | m1) p(r | m3) D3 D1 D2 p(r | m2) p(r | m 1) p(r | m 3) D 3 D 1 D 2 p(r | m 2)

F IGURE 3.22. As evident from (3.58), in the AWGN channel case, the conditional pdfs depend only on the power spectral density of the noise N0 /2 = 2 and on the images si of the signals. From the previous observations, we can conclude that, at equal conditions, i.e., same values of P(mi ) and N0 , the probability of error of two systems whose signal constellations simply differ by a rigid-body disM placement is the same. In other words, the shape of the signals {si (t )}1 does not matter as, whatever their shapes, the probability of error only depends on the signal constellation, i.e., on the relative positions of the images si in the Q-dimensional space S .

3.5. PROBABILITY OF ERROR

51

However, even if two systems have the same probability of error, i.e., the same performance, it does not mean that they are completely equivalent. Indeed, the same quality (probability of error) may have a different cost. The energy required for transmitting the symbol mi is given by T Q 2 2 si j = si2 (t )dt . E i = si =
j =1 0

As the symbol mi is transmitted with probability P(mi ), the average energy required for the transmission of a generic symbol is Es =
M i =1

Ei P(mi ) =

M i =1

si 2 P(mi ) .

(3.60)

So, it is clear that even if two systems have the same performance, their cost (required average energy) can be different. As a rigid-body displacement of the signal constellation changes Es but does not change P(E ), we wonder what is the rigid-body displacement that minimizes Es while leaving unaltered P(E ). In other words, we wonder what is the set of signals with minimum average energy that guarantees the same average probability of error of the original set. We observe that, interpreting P(mi ) as the mass of point si , the average energy (3.60) can be seen as the moment of inertia about the origin of the masses P(mi ) at points si . As known from analytical mechanics, such moment of inertia can be minimized by translating the masses such that their center of mass (or barycenter) coincides with the origin. As the center of mass is given by c=
M i =1

P(mi )si ,

the images si of the set of signals minimizing the average energy Es are given by si = si c . Note that any rotation around the origin does not modify the moment of inertia of the masses, i.e., the average energy of the signals. So, by a rotation of si , we can get different sets of signals all of which have minimum Es and same performance, despite their differently shaped waveforms. E XAMPLE 3.4. A communication system using orthogonal, equal-energy and equally probable binary signals, whose images are shown in Fig. 3.23(a), is not a minimum energy system. As P(m1 ) = P(m2 ) = 1/2, the barycenter of the signals s1 and s2 is in the middle of the line segment joining them. Hence, a minimum energy system can be obtained by rigidly moving the signal constellation such as the barycenter coincides with the origin, as done in Fig. 3.23(b). If the barycenter of two equal-energy and equally probable signals coincides with the origin, the two signals are the opposite of each other, i.e., s2 (t ) = s1 (t ). Such signals are referred to as antipodal. The probability of error for the systems in Fig. 3.23(a) and (b) is the same, but the average energy spent by the system using orthogonal signals is Es = E , while it is Es = E /2 for the system using antipodal signals.

3.5. PROBABILITY OF ERROR

52

s2 E d=

2E s 2

E 2

E 2

s 1 d= 2E

(a)

s1 E

(b) F IGURE 3.23.

As an example, if the orthonormal basis used for obtaining the images in Fig. 3.23 is as in (3.38) with 0 = k 2 /T , then, for 0 t T , E 2 E s1 (t ) = cos 0 t cos t s ( t ) = 0 1 T T . and E 2 E cos 0 t sin 0 t s2 (t ) = s2 (t ) = T T Notice that, with respect to the 1-dimensional orthonormal basis 1 (t ) = 1/T , for 0 t T , the signals s1 (t ) and s2 (t ) in Fig. 3.24 have the same images as in Fig. 3.23(b). Thus, at equal conditions (i.e., same a priori probabilities and same noise), the system using these signals has same error probability and same average energy of the system using the signals s1 (t ) and s2 (t ) above, despite the fact that their shapes are very different. Hence, the choice between different sets of signals having the same constellation should be based on considerations different from the probability of error, possibly based on the characteristics of the transmission channel.
E 2T s1 (t ) s2 (t )

T
E 2T

F IGURE 3.24. The probability of error depends on the evaluation of integrals like (3.56), which can be made easier by exploiting the fact that the signal constellation can be rigidly moved without changing the probability of error. 3.5.3. Binary Signals. In the case of binary signals, whatever the signal constellation, we can rigidly move it such that the middle point of the line segment that joins the signal images coincide with the origin and the signal images themselves lie on the horizontal axis, as shown in Fig. 3.25. Note that this is not necessarily the minimum energy conguration, it depending on whether the signals are equally probable or not. Thanks to this rigid-body displacement, the originally bidimensional problem becomes monodimensional. Taking into account that the ordinates of s1 and s2 are zero while their abscissas are

3.5. PROBABILITY OF ERROR

53

s 1 d

D 1 D 2 s1

D1 d

D2

s 2

s2

F IGURE 3.25. s11 = d /2 and s21 = d /2, respectively, from (3.59) we have that the abscissa of the boundary line between the two decision zones D1 and D2 satises + such that d 2
2

2 2 ln P(m1 ) =

d 2

2 2 ln P(m2 ) (3.61)

2 P(m1 ) N0 P(m1 ) = ln = ln . d P(m2 ) 2d P(m2 ) P(C | m1 ) = P(r D1 | m1 ) = P(r1 < | m1 ) d /2 + 1 (r1 + d /2)2 1 2 = exp dr1 = ey /2 dy 2 2 2 2 2 P(C | m2 ) = P(r D2 | m2 ) = P(r1 > | m2 ) (r1 d /2)2 1 1 2 = exp dr1 = ey /2 dy 2 d /2 2 2 2 2

The probability of error can now be computed as in (3.55) by using (3.56) and (3.58)

(3.62a)

(3.62b)

where we performed the change of variable y = (r1 d /2)/ . These expressions can be put in a standard form by introducing the Q-function 1 2 Q( x ) = ey /2 dy , (3.63) 2 x which represents the area under the tail of a Gaussian distribution with zero mean and unit variance, as illustrated in Fig. 3.26(a).
1 10-1
2 1 ey /2 2

1 x 2

exp(x2/2)

10-2 10-3 10-4 10-5

Q( x )

Q( x ) x (a) y

10-6

x 3

(b)

F IGURE 3.26.

3.5. PROBABILITY OF ERROR

54

For x > 3, the Q-function is very well approximated (the difference being less than 10%) by Q( x ) as can be seen from Fig. 3.26(b). Now, as 1 2 it turns out that 1 2
x

1 2 ex /2 x 2

(3.64)

ey /2 dy = 1

Q ( x ) = 1 Q ( x )

ey /2 dy = 1 Q( x )

(3.65) (3.66)

and thus (3.62a) and (3.62b) can also be written as d /2 + d /2 P(C | m2 ) = 1 Q P(C | m1 ) = 1 Q such that, taking into account that P(E | mi ) = 1 P(C | mi ), from (3.55) we get d /2 d /2 + + P(m2 )Q . If the signals are equally probable, then = 0 (see (3.61)), and thus P(E ) = P(m1 )Q P(E ) = Q d . 2 (3.67a) (3.67b)

Sometimes, instead of the Q-function, the so called complementary error function is used for expressing the probability of error. Its denition is similar to that of the Q-function 2 2 erfc( x ) = ey dy x such that they are simply related by erfc( x ) = 2Q( 2 x ) Q( x ) = 1 x erfc . 2 2

and, conversely,

When comparing different communication systems, it is useful to express the error probability in terms of the average energy instead of the distance between the signals. Indeed, as the average energy of a given constellation depends on the position of the signal images, a given system may be more energy efcient than others at equal P(E ). In other words, it is more meaningful to compare the probability of error for the same average energy rather than for the same signal distance. Indeed, in this last case we already know that P(E ) is the same. Let us examine this aspect in the case of binary orthogonal and antipodal signaling.

3.5. PROBABILITY OF ERROR

55

3.5.3.1. Orthogonal and Antipodal Binary Signals. If the signals are equally likely, the average energy is 1 Es = s1 2 + s2 2 2 such that s1 d 2 d s2 d 2 s 1 d 2 d 2 s 2 d2 E = s 2 d =Q P(E ) = Q 2 d2 E = s 4 d =Q P(E ) = Q 2

Es N0

ORTHOGONAL SIGNALS

Es 2 N0

ANTIPODAL SIGNALS

The probability of error is the same in both cases because the distance d between the signals is the same, but using antipodal signals allows spending half the energy that is required with orthogonal signals. Representing P(E ) on a logarithmic scale as a function of the ratio Es /N0 expressed in dB Es Es , = 10 log10 N0 dB N0 the two graphs are horizontally separated by about 3 dB (see Fig. 3.27) because, at equal P(E ), Es Es =2 N0 orthogonal N0 antipodal and 10 log10 2 = 3.0103 3 dB.
1 10 -2 10 -4

Orthogonal signals Antipodal signals 3 dB

P(E )

10 -6 10 10
-8

-10

10 -12 -5

10

15

20

[Es /N0]dB

F IGURE 3.27.

3.5. PROBABILITY OF ERROR

56

3.5.3.2. Generic Binary Signals. The squared distance between two generic binary signals is given by T 2 2 d = s1 s2 = |s1 (t ) s2 (t )|2 dt 0 T = E1 + E2 2 s1 (t )s2 (t )dt
0

where E1 and E2 are the energies of the signals. By dening the correlation coefcient of the two signals as their normalized internal product T 1 s1 (t )s2 (t )dt (3.68) E1 E2 0 we can also write d 2 = E 1 + E 2 2 E 1 E 2 . If the signals are equally likely, the probability of error can be written as d E + E 2 E E 1 2 1 2 P(E ) = Q = Q 2 2 N0 P(E ) = Q (1 ) Es . N0 (3.69)

and therefore, if the signals are also equal-energy, such that E1 = E2 = Es , (3.70)

Comparing (3.68) and (2.19), we see that the correlation coefcient between two signals is equal to the cosine of the angle between the signal images, that is, with reference to Fig. 3.28, = cos . Hence, (3.69) is nothing more than the law of cosines applied to the triangle having the origin and the signals as vertices.

s1 d E1 s2 E2

F IGURE 3.28. From this geometric interpretation, = 0 if the signals are orthogonal, while = 1 if they are antipodal. As || = | cos | 1, P(E ) is minimum for = 1, i.e., for antipodal signals, and maximum for = 1, corresponding to s1 (t ) = s2 (t ), this latter case being of no interest, as it leads to P(E ) = 1/2.

3.5. PROBABILITY OF ERROR

57

si

Di

b2 b1

s i

D i a2

(a) F IGURE 3.29.

a1

(b)

3.5.4. Rectangular Decision Zones. For Q = 2, if a decision zone Di has a rectangular shape with edges not parallel to the coordinate axes, as shown in Fig. 3.29(a), the evaluation of the integral (3.56) can be simplied by rotating and translating the signal constellation such that to obtain a conguration as in Fig. 3.29(b). Indeed, conditional to mi , the pdf of r = si + n is still as in (3.58), i.e.,
(r1 s )2 (r2 s )2 i1 i2 1 1 2 2 2 2 e e , 2 2 but (3.56) can now be evaluated as the product of two 1-dimensional integrals P(C | mi ) = P(r Di | mi ) = p(r | mi )d r

p(r | mi ) = pn1 (r1 si1 ) pn2 (r2 si2 ) =

2 =

N0 2

a2

b2

Di

= =

a1 a2 a1

b1

pn1 (r1 si1 ) pn2 (r2 si2 )dr1 dr2


b2 b1

pn1 (r1 si1 )dr1

pn2 (r2 si2 )dr2 .

s s a2 b2 i1 i2 1 1 2 /2 2 /2 y1 y2 e e dy dy2 P(C | mi ) = 1 a s b s 2 1 i 1 2 1 i 2 a1 si1 a2 si1 b1 si2 b2 si2 = Q Q Q Q . (3.71) Note that this procedure still holds if any of a1 , a2 , b1 , b2 is innite and can be readily extended to the case of signal spaces with dimension Q > 2.

By the change of variable yk = (rk sik )/ , for k = 1, 2, we get

If the signals are equally likely, it is easy to see that the decision zones of the rotated constellation are the various quadrants.

E XAMPLE 3.5 (Error probability of a QPSK system). Let us compute the probability of error for the QPSK system of Example 3.1. As already seen, using the basis (3.38), the images of the received signals are as in (3.41), such that the signal constellation is as in Fig. 3.30(a). Rotating it counterclockwise by /4, we obtain the constellation in Fig. 3.30(b), where AT AT AT AT s s s s11 21 = 31 = 41 = 2 2 2 2 A . = , , , (3.72) T A T A T A T s22 s32 s42 s12 2 2 2 2

3.5. PROBABILITY OF ERROR

58

T 2

s2 s1

s 2
A2T

A T 2

s 1
A T 2

s3
A
T 2

A
T 2

T 2

s4 (a) F IGURE 3.30.

A T s 3 2

s 4

(b)

By symmetry considerations, all the conditional probabilities of error P(E | mi ) are the same, hence and, letting a1 = b1 = 0 and a2 = b2 = , from (3.71), (3.72) and (3.66) we obtain b1 s12 a1 s11 Q P(E ) = 1 Q A T = 1 Q2 2 A T 2 . =1 1Q 2 The average energy spent for transmitting the signals si in Fig. 3.30(a) is Es = A2 T /2, so, as = N0 /2, the probability of error can also be expressed as P(E ) = 1 1 Q Es N0
2

P(E ) = P(E | mi ) = P(E | m1 ) = 1 P(C | m1 )

= 2Q

Es Q2 N0

Es . N0

(3.73)

with 0 T = k 2 , and we will assume that they are equally probable. Let us nd the probability of error of an optimum receiver for an AWGN channel with power spectral density N0 /2. The signal space has dimension Q= 1, so, using the orthonormal basis ( t ) = 2/T cos 0 t , taking into 1 T T account that r s1 = r1 A T /2 and r s2 = r1 A T /2, and being the constants Ci in (3.20) the same for both signals, the structure of the receiver in Fig. 3.5 specializes as shown in Fig. 3.31(a).

3.5.5. Comparing Different Systems. When comparing different communication systems, attention must be paid to perform a fair comparison. Let us illustrate the subtleties of a comparison by an example. We would like to compare the performance of a QPSK system with that of a binary phase-shift keying (BPSK) system. A BPSK system employs the antipodal signals s1 (t ) = A cos 0 t 0tT (3.74) s2 (t ) = A cos 0 t

Note that we actually dont need the second branch (the one providing r1 ), as we can simply compare r1 with the threshold 0 in order to decide which one between r1 and r1 is larger. Consequently, the block diagram of the optimum receiver becomes as in Fig. 3.31(b), where the comparator is denoted by its input/output characteristic.

3.5. PROBABILITY OF ERROR

59

CHOOSE MAX

r (t )

T 0

r1

r (t ) m 2/T cos 0 t

T 0

r1

2/T cos 0 t (a)

r1

(b)

F IGURE 3.31. As already known, for binary signaling P(E ) = Q d d , =Q 2 2 N0

where d is the distance between the signals. For antipodal signals, distance and average energy per signal are related by Es = d 2 /4, such that the error probability can be written as P (E ) = Q 2 Es . N0 (3.75)

For the sake of comparison, let us denote (3.73) and (3.75) as PQPSK (E ) and PBPSK (E ), respectively. As we want small Es /N0 should be large and, consequently, Q( Es /N0 ) is very error probabilities, small. Thus, Q2 ( Es /N0 ) Q( Es /N0 ) and it can be neglected in (3.73), such that PQPSK (E ) 2Q Es N0 Es 2 . N0

(3.76)

PBPSK (E ) = Q

What matters in the comparison is the argument of the Q function and, as can be seen, the argument is larger in the BPSK case, such that, for the same average energy Es , PQPSK (E ) PBPSK (E ) .

So, it may seem that the BPSK system is better, but we should note that this comparison is not fair. Indeed, in each signaling interval we transmit one out of four possible signals in the QPSK case, but only one out of two in the BPSK case, i.e., we transmit log2 4 = 2 bits in one case and log2 2 = 1 bit in the other case. Now, in order to transmit N bits, we need N signaling intervals with BPSK but only N /2 with QPSK, such that the total energy spent is N Es with BPSK and N Es /2 with QPSK. So, it is not fair comparing the two systems on the basis of the same average energy per signal, because, while it is true that we get a lower probability of error with the BPSK system, this is only because we actually spend more energy for transmitting the same amount of information. Hence, in order to be fair, the comparison should be performed on the basis of the same average energy per bit rather than per signal. If the number of signals is M , the average energy per bit is given by Eb = Es , log2 M

3.5. PROBABILITY OF ERROR

60

such that Eb = Es in the BPSK case, but Eb = Es /2 in the QPSK case. Replacing these values in (3.76), we have Eb PQPSK (E ) 2Q 2 N0 (3.77) Eb PBPSK (E ) = Q 2 N0 and hence, in terms of same energy per bit, PQPSK (E ) 2PBPSK (E ) so that the probability of error simply doubles. Now, a difference of a factor 2 in the probability of error is absolutely negligible in terms of energy-per-bit spent. Indeed, we are usually interested in how much the same performance costs rather than in what is P(E ) for a given cost, and Fig. 3.32 shows that the horizontal difference between the BPSK and QPSK curves is a small fraction of dB and tends to vanish at higher values of Eb /N0 , while their vertical difference is always a factor 2.
1 10-2 10-4 10-6 10-8 10-10 10-12 10-14 2 4 QPSK BPSK

P(E )

10

12

14

16

Eb/N0 [dB]

F IGURE 3.32. By looking at the signal constellations represented in Fig. 3.33, we see that the factor 2 is due to the fact that each QPSK signal has two neighbors, instead of only one as in the BPSK case. This means that the uncertainty of the receiver, and thus the probability of error, doubles. BPSK s2 2 Es Es s1 QPSK s2 Es

s3

s1

s4 F IGURE 3.33.

2E s

It can also be seen that, at equal average energy per signal Es , the distance between neighbors is 2 Es for BPSK, but 2Es for QPSK, such that, for causing an error, a larger noise excursion is needed for BPSK, explaining the lower error probability. However, as already said, if we perform the comparison in terms of total energy spent for transmitting a given amount of information, things change: it is true that the probability of error is lower in the BPSK case but it is also true that the cost increases.

3.5. PROBABILITY OF ERROR

61

3.5.6. Impact of Timing and Synchronization. We have already qualitatively analyzed the impact of the lack of carrier phase synchronization in Example 3.2, i.e., the problem of synchronizing the local oscillator such that it has the correct phase relation with the incoming signal. Another timing issue is symbol synchronization, consisting in the knowledge of where in time a signaling interval is located. We neglected this problem in our derivations by implicitly assuming that there was no propagation delay from the transmitter to the receiver. In practice this is not so and the propagation delay is generally unknown to the receiver, which has to derive symbol timing from the received signal for proper operation. For example, the lack of symbol synchronization would cause performing the integrals in Fig. 3.10 over the time interval (, + T ) instead of (0, T ), or sampling in Fig. 3.20 at t = + T instead of at t = T . Of course, a propagation delay also results in carrier phase mismatch, so that carrier and symbol synchronization are interrelated and the receiver has to cope with both. As already said, even if timing and synchronization techniques are beyond the scope of this course, nevertheless we want to analyze their impact on system performance. 3.5.6.1. Carrier Synchronization. Let us establish the impact of carrier phase mismatch on an optimum BPSK receiver (see Fig. 3.31). Instead of 2/T cos 0 t , the local oscillator produces 2/T cos(0 t + ), such that instead of r1 = we now have r1 = 2 T
0 T

2 T

r (t ) cos 0 tdt = A

T + 2

2 T

n(t ) cos 0 tdt

(3.78)

r (t ) cos(0 t + )dt = A

T cos + 2

2 T

n(t ) cos(0 t + )dt .

(3.79)

As already discussed in Example 3.2, the phase synchronization error has no effect on the statistics of the noise term in (3.79), but the signal component is reduced by a factor cos , as if the signal amplitude is A cos instead of A. So, the energy usable for detection reduces from Es = A2 T /2 to Es = Es cos2 and the probability of error becomes PBPSK (E ) = Q 2 Es cos > Q N0 2 Es . N0

(3.80)

In particular, if is such that cos < 0, then PBPSK (E ) > 1/2. As regards QPSK, from (3.43) we see that the signal constellation is rotated around the origin by , as shown in Fig. 3.34(a), but the receiver will still use the old decision zones. So, after rotating the constellation such that the decision zones coincide with the four quadrants, as in Fig. 3.34(b), we note that, by symmetry, we still have P(E ) = P(E | m1 ) = 1 P(C | m1 ) .


Figure 3.34. QPSK constellation with a carrier phase error θ: (a) rotated signals and original decision zones D1, …, D4; (b) constellation rotated back so that the decision zones coincide with the four quadrants; the first signal has components √Es cos(θ + π/4) and √Es sin(θ + π/4).

Taking into account that this time s'11 = √Es cos(π/4 + θ) and s'12 = √Es sin(π/4 + θ), from (3.71) with a1 = b1 = 0, a2 = b2 = ∞ and σ = √(N0/2), we get

P_QPSK(E) = 1 − [1 − Q(√Es cos(π/4 + θ)/√(N0/2))][1 − Q(√Es sin(π/4 + θ)/√(N0/2))]
          = 1 − [1 − Q(√(2Es/N0) cos(π/4 + θ))][1 − Q(√(2Es/N0) sin(π/4 + θ))]
          = Q(√(2Es/N0) cos(π/4 + θ)) + Q(√(2Es/N0) sin(π/4 + θ)) − Q(√(2Es/N0) cos(π/4 + θ)) Q(√(2Es/N0) sin(π/4 + θ)).   (3.81)

Comparing (3.81) and (3.73) is easy if |θ| > π/4 because, in this case, either √2 cos(π/4 + θ) or √2 sin(π/4 + θ) is negative, such that (3.81) is greater than 1/2, and thus much greater than (3.73). As regards the case |θ| < π/4, we note that the two terms √2 cos(π/4 + θ) and √2 sin(π/4 + θ) are equal to 1 for θ = 0. If θ increases from 0, √2 cos(π/4 + θ) decreases and vanishes for θ = π/4. At the same time, √2 sin(π/4 + θ) increases and reaches the maximum value √2. The opposite occurs when θ decreases. So, letting α ≜ min{√2 cos(π/4 + θ), √2 sin(π/4 + θ)}, we have that, for |θ| < π/4, it is always 0 ≤ α ≤ 1, and we can rewrite (3.81) as

P_QPSK(E) = Q(α√(Es/N0)) + Q(√((2 − α²)Es/N0)) − Q(α√(Es/N0)) Q(√((2 − α²)Es/N0)).   (3.82)

For |θ| < π/4, it is easier to compare the approximations of (3.82) and (3.73) obtained using (3.64). Indeed, neglecting the product of the Q functions, letting x ≜ √(Es/N0), and using the approximation (3.64), we arrive at the following comparison:

(1/(√(2π) α x)) e^(−α²x²/2) + (1/(√(2π) √(2−α²) x)) e^(−(2−α²)x²/2)  ≷  2 (1/(√(2π) x)) e^(−x²/2),

which reduces to

(1/α) e^((1−α²)x²/2) + (1/√(2−α²)) e^(−(1−α²)x²/2)  ≷  2.


Observing that α√(2 − α²) ≤ 1 for 0 < α ≤ 1 (indeed α²(2 − α²) = 1 − (1 − α²)² ≤ 1), and that the exponent (1 − α²)x²/2 is nonnegative while 1/α ≥ 1/√(2 − α²), we obtain a lower bound for the left-hand side by replacing both coefficients with their geometric mean 1/√(α√(2 − α²)) ≥ 1 (the larger coefficient multiplies the larger exponential, so this replacement can only decrease the sum). Hence

(1/α) e^((1−α²)x²/2) + (1/√(2−α²)) e^(−(1−α²)x²/2) ≥ (2/√(α√(2−α²))) cosh((1 − α²)x²/2) ≥ 2,

because cosh y ≥ 1 for any y, meaning that (3.82), and thus (3.81), is always greater than or equal to (3.73).
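The penalty caused by a static phase error can also be quantified directly from (3.80) and (3.81). The following is a minimal sketch (Python with SciPy assumed; the Es/N0 and phase values are illustrative only).

import numpy as np
from scipy.special import erfc

def Q(x):
    return 0.5 * erfc(x / np.sqrt(2))

EsN0 = 10 ** (11 / 10)                       # Es/N0 = 11 dB, illustrative
for theta_deg in (0, 10, 20, 30):
    th = np.radians(theta_deg)
    p_bpsk = Q(np.sqrt(2 * EsN0) * np.cos(th))            # Eq. (3.80)
    c, s = np.cos(np.pi/4 + th), np.sin(np.pi/4 + th)
    p_qpsk = (Q(np.sqrt(2*EsN0)*c) + Q(np.sqrt(2*EsN0)*s)
              - Q(np.sqrt(2*EsN0)*c) * Q(np.sqrt(2*EsN0)*s))   # Eq. (3.81)
    print(f"theta = {theta_deg:2d} deg   BPSK {p_bpsk:.2e}   QPSK {p_qpsk:.2e}")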

3.5.6.2. Symbol Synchronization. Considering a one-shot transmission and a BPSK system, the correlator in Fig. 3.31, due to a small timing error τ ≪ T, performs the integration in the time interval (τ, τ + T) instead of (0, T). As the transmitted signals are as in (3.74), instead of (3.78) we now have

r1 = √(2/T) ∫_τ^(τ+T) r(t) cos ω0 t dt = A√(T/2) (1 − τ/T) + √(2/T) ∫_τ^(τ+T) n(t) cos ω0 t dt.   (3.83)

Due to the stationarity of n(t), the shifted interval of integration has no effect on the statistics of the noise term in (3.83), but the signal component is reduced by a factor 1 − τ/T. So, the energy usable for detection reduces from Es = A²T/2 to E's = Es (1 − τ/T)² and the probability of error becomes

P_BPSK(E) = Q(√(2Es/N0) (1 − τ/T)) > Q(√(2Es/N0)).

However, if the timing error is very small its impact is negligible. For example, with Es/N0 = 12.5, i.e., about 11 dB, perfect timing gives P_BPSK(E) = Q(5) ≈ 3·10⁻⁷, while an error of 1% of a signaling interval, i.e., τ/T = 0.01, results in P_BPSK(E) = Q(4.95) ≈ 4·10⁻⁷. Instead, using a matched-filter receiver, things change considerably. In this case, the filter matched to s1(t) in (3.74) is

h1(t) = ψ1(T − t) = √(2/T) cos ω0 t for 0 ≤ t ≤ T, and 0 otherwise,

and, due to a timing error τ, the signal component is obtained by sampling its output at t = τ + T instead of t = T (see Fig. 3.35).

Figure 3.35. Matched-filter receiver: r(t) is filtered by ψ1(T − t) and the output r1 is sampled at t = τ + T.

The signal at the output of the matched filter is

z(t) = r(t) ∗ h1(t) = s(t) ∗ h1(t) + n(t) ∗ h1(t).   (3.84)


Here, as s(t) = A cos ω0 t for 0 ≤ t ≤ T and 0 otherwise,

s(t) ∗ h1(t) = A√(2/T) ∫_0^t cos ω0λ cos ω0(t − λ) dλ for 0 ≤ t ≤ T,
s(t) ∗ h1(t) = A√(2/T) ∫_(t−T)^T cos ω0λ cos ω0(t − λ) dλ for T ≤ t ≤ 2T,
s(t) ∗ h1(t) = 0 otherwise.

Taking into account that

cos ω0λ cos ω0(t − λ) = (1/2)[cos ω0 t + cos ω0(2λ − t)]

and that ω0 T = k·2π, we obtain

s(t) ∗ h1(t) = A√(T/2) [ (t/T) cos ω0 t + sin ω0 t/(ω0 T) ] for 0 ≤ t ≤ T,
s(t) ∗ h1(t) = A√(T/2) [ ((2T − t)/T) cos ω0 t − sin ω0 t/(ω0 T) ] for T ≤ t ≤ 2T,
s(t) ∗ h1(t) = 0 otherwise.

As, usually, ω0 T ≫ 1, we can neglect the term sin ω0 t/(ω0 T), such that

s(t) ∗ h1(t) ≈ √Es (t/T) cos ω0 t for 0 ≤ t ≤ T,
s(t) ∗ h1(t) ≈ √Es ((2T − t)/T) cos ω0 t for T ≤ t ≤ 2T,
s(t) ∗ h1(t) ≈ 0 otherwise,

where Es = A²T/2 is the average energy. The output of the matched filter when s1(t) is applied to its input is sketched in Fig. 3.36, where the expected maximum value √Es at t = T is also shown.

Figure 3.36. Output of the matched filter for input s1(t): an oscillation at the carrier frequency under a triangular envelope peaking at √Es for t = T.

However, because of the timing error τ, anyhow supposed to be small such that |τ|/T ≪ 1, the signal sample is now

√Es (1 − |τ|/T) cos ω0(τ + T) ≈ √Es cos ω0τ,

while the noise sample in (3.84), due to the stationarity of n(t), has the same statistics as before. So, as in the previous case, the energy usable for detection reduces from Es to E's = Es cos² ω0τ and the probability of error becomes

P_BPSK(E) = Q(√(2Es/N0) cos ω0τ) > Q(√(2Es/N0)).

This time, even an error of 1% of a signaling interval can be harmful. Indeed, suppose again that Es/N0 = 12.5, so that perfect timing would give P_BPSK(E) = Q(5) ≈ 3·10⁻⁷. If τ/T = 0.01, then


ω0τ = ω0T (τ/T) = 2πk · 0.01 and, for k = 1020, cos ω0τ ≈ 0.31, resulting in P_BPSK(E) = Q(1.55) ≈ 6·10⁻². As can be seen, a correlation receiver has much less sensitivity to timing error, since the integrated output does not oscillate at the carrier frequency. However, the local oscillator must be accurately synchronized, otherwise a phase synchronization error can be as dangerous as a timing error is for a matched-filter receiver, as evident from (3.80).

3.5.7. Orthogonal Signals. Let us now consider the case of M equal-energy and orthogonal signals. In this case, very frequently, they are also equally likely. Being the signals orthogonal, they are linearly independent, such that the dimensionality Q of their subspace is equal to M. In Fig. 3.37, the cases with M = 2 and M = 3 are represented.

Figure 3.37. Orthogonal signal constellations for M = 2 and M = 3.

The probability of error can be computed in the following way. Recalling that

P(E) = 1 − Σ_{i=1}^{M} P(r ∈ Di | mi) P(mi)

and being P(mi) = 1/M ∀i, we have

P(E) = 1 − (1/M) Σ_{i=1}^{M} P(r ∈ Di | mi).

Given the symmetry,

P(r ∈ Di | mi) = P(r ∈ D1 | m1)   ∀i,

and hence

P(E) = 1 − P(r ∈ D1 | m1).

Being the signals equally likely, the decision zone D1 is the set of all points nearest to s1, i.e., the points r whose first component is larger than all the others (see Fig. 3.38):

Figure 3.38. Decision zone D1: all points r with r1 > ri for i = 2, …, M.


D1 = {r ∈ R^M : r1 > r2, r1 > r3, …, r1 > rM} = {r ∈ R^M : ∩_{i=2}^{M} {r1 > ri}}

and hence

P(r ∈ D1 | m1) = P( ∩_{i=2}^{M} {r1 > ri} | m1 ).

Taking into account that, if m1 is transmitted,

r1 = √Es + n1,   ri = ni for i ≥ 2,

we can write

P(r ∈ D1 | m1) = ∫_{−∞}^{∞} P( ∩_{i=2}^{M} {r1 > ri} | m1, r1 ) p(r1 | m1) dr1,

where

p(r1 | m1) = (1/(√(2π) σ)) exp(−(r1 − √Es)²/(2σ²))

is the pdf of r1 conditional upon m1. Note that, conditional upon r1, the events {r1 > ri} are independent, such that

P( ∩_{i=2}^{M} {r1 > ri} | m1, r1 ) = Π_{i=2}^{M} P(ri < r1 | m1, r1) = [P(r2 < r1 | m1, r1)]^(M−1),

where, in the last equation, we exploited the symmetry of the signal constellation. As

P(r2 < r1 | m1, r1) = 1 − Q(r1/σ),

we get

P(r ∈ D1 | m1) = ∫_{−∞}^{∞} (1/(√(2π) σ)) exp(−(r1 − √Es)²/(2σ²)) [1 − Q(r1/σ)]^(M−1) dr1

and, finally, by the change of variable x = (r1 − √Es)/σ and recalling that σ² = N0/2,

P(E) = 1 − P(r ∈ D1 | m1) = 1 − ∫_{−∞}^{∞} (1/√(2π)) exp(−x²/2) [1 − Q(x + √(2Es/N0))]^(M−1) dx.   (3.85)
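Expression (3.85) is straightforward to evaluate with numerical quadrature. The following is a minimal sketch (Python with SciPy assumed; the Es/N0 value is illustrative).

import numpy as np
from scipy.special import erfc
from scipy.integrate import quad

def Q(x):
    return 0.5 * erfc(x / np.sqrt(2))

def pe_orthogonal(M, EsN0):
    # Symbol error probability (3.85) for M equally likely orthogonal signals
    g = np.sqrt(2 * EsN0)
    integrand = lambda x: np.exp(-x**2 / 2) / np.sqrt(2*np.pi) * (1 - Q(x + g)) ** (M - 1)
    val, _ = quad(integrand, -10, 10)   # the Gaussian weight makes (-10, 10) sufficient
    return 1 - val

for M in (2, 4, 16, 64):
    print(M, f"{pe_orthogonal(M, 10 ** (10/10)):.3e}")   # Es/N0 = 10 dB, illustrative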

The previous expression has to be computed numerically and is reported in Fig. 3.39(a) for M = 2^k, k = 1, 2, 3, 4, 5, 6. From Fig. 3.39(a) it may seem that it is preferable to use a small number of signals for obtaining a lower P(E) at equal Es/N0. But remember that the larger the number of signals, the larger the amount of information, such that systems with larger M transmit more information for the same energy. The increased P(E) is simply due to the increased uncertainty of the receiver. In other words, while it is true that the larger M, the larger P(E), it is also true that the larger M, the lower the energy required to transmit the same amount of information. As already said, a fair


Figure 3.39. Symbol error probability for M = 2^k orthogonal signals: (a) versus Es/N0 [dB] for M = 2, 4, …, 64; (b) versus Eb/N0 [dB] for M = 2, 4, …, 1024 and M → ∞, showing the −1.6 dB limit.
comparison should be based on the energy required to transmit each information bit, i.e., Eb. If M = 2^k, then Eb = Es/k, such that

Eb/N0 = (1/k)(Es/N0),   i.e.,   (Eb/N0)_dB = (Es/N0)_dB − 10 log₁₀ k,

and the graph of P(E) as a function of (Eb/N0)_dB for M = 2^k can be obtained by translating the corresponding graph in Fig. 3.39(a) by 10 log₁₀ k towards the left, as shown in Fig. 3.39(b), which also reports a few other graphs for k = 7, 8, 9, 10 and k = ∞. As the amount of shift towards the left increases with k (i.e., with M), now the situation is reversed, such that, for a given (sufficiently high) value of Eb/N0, the probability of error decreases when M increases. However, the advantage obtained by increasing M is not without price, because the complexity of the receiver increases and, as we will see, so does the bandwidth occupied by the signals. As can be seen from Fig. 3.39(b), there exists a minimum value of Eb/N0 required to achieve an arbitrarily small probability of error as M → ∞. This value can be obtained as a limit from (3.85) or through information theoretic arguments and it turns out to be equal to Eb/N0 = ln 2 ≈ 0.693, corresponding to −1.6 dB. Indeed, according to the Shannon theorem, only if the information rate Rb is less than the channel capacity C can the probability of error at the receiver be made arbitrarily small. The capacity in bits per second of an AWGN channel with bandwidth B is given by

C = B log₂(1 + S/(N0 B)),


where S is the average signal power. In our case B = ∞, thus, letting B = x S/N0, we have

lim_{B→∞} C = lim_{B→∞} B log₂(1 + S/(N0 B)) = log₂ [ lim_{x→∞} (1 + 1/x)^x ]^(S/N0) = log₂ e^(S/N0) = (1/ln 2)(S/N0).
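The limit above is easily visualized numerically. The following sketch (Python with NumPy assumed; S/N0 is set to 1 purely for illustration) shows C approaching S/(N0 ln 2) as B grows, and the corresponding −1.6 dB Shannon limit on Eb/N0.

import numpy as np

S_over_N0 = 1.0                      # illustrative; only the ratio matters
for B in (1.0, 10.0, 1e3, 1e6):
    C = B * np.log2(1 + S_over_N0 / B)        # AWGN capacity in bit/s
    print(f"B = {B:>9.0f}   C = {C:.6f}")
print("asymptote S/(N0 ln 2) =", S_over_N0 / np.log(2))
print("Shannon limit on Eb/N0 =", 10 * np.log10(np.log(2)), "dB")   # about -1.6 dB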

For M-ary signaling, the information rate in bits per second is Rb = (log₂ M)/T while S = Es/T = (Eb log₂ M)/T, so that the inequality Rb < C becomes

(log₂ M)/T < (1/ln 2)(Eb log₂ M)/(N0 T),

leading to Eb/N0 > ln 2. This result is valid for any communication system and thus also for systems using orthogonal signals.

3.5.7.1. Bit Error Probability. Sometimes, it is desirable to express the probability of error as the probability that a bit, rather than a symbol, is wrong. Indeed, a symbol error may cause more bit errors, and, moreover, systems with different numbers of signals are better compared in terms of the probability that a single information bit is wrong. In the case of equally probable orthogonal signals, it is easy to convert the probability of a symbol error into the equivalent probability of a bit error, as shown in the following. Due to the symmetry of the signal constellation, a symbol is equally likely mistaken with any other one. So, letting M = 2^k and given that an error occurs, as the number of symbols differing from the correct one in i bits is given by the binomial coefficient C(k, i) = k!/(i!(k − i)!), the probability that a symbol error causes i bit errors is given by

(number of symbols differing in i bits from the correct one)/(total number of wrong symbols) = C(k, i)/(2^k − 1).

Hence, the random variable X = "number of bit errors per symbol" is such that

P{X = i} = 1 − P(E) for i = 0,
P{X = i} = [C(k, i)/(2^k − 1)] P(E) for i = 1, 2, …, k,

and its mean value, i.e., the average number of bit errors per symbol, is

E{X} = Σ_{i=0}^{k} i P{X = i} = Σ_{i=1}^{k} i C(k, i) P(E)/(2^k − 1).

Now,

Σ_{i=1}^{k} i C(k, i) = Σ_{i=1}^{k} k (k−1)!/[(i−1)!(k−i)!] = k Σ_{i=1}^{k} C(k−1, i−1) = k Σ_{j=0}^{k−1} C(k−1, j) = k 2^(k−1),   (3.86)

where we performed the change of variable j = i − 1 and used the known result Σ_{i=0}^{n} C(n, i) = 2^n. Thus

E{X} = [k 2^(k−1)/(2^k − 1)] P(E) = (k/2) [M/(M − 1)] P(E)


and the average bit error probability is simply E{X} divided by k, the number of bits per symbol,

Pb = (1/2) [M/(M − 1)] P(E).   (3.87)

So, for M ≫ 1, Pb ≈ P(E)/2. The graphs of the bit error probability Pb are shown in Fig. 3.40 as a function of Eb/N0.
Figure 3.40. Bit error probability Pb for M = 2, 4, …, 1024 orthogonal signals and for M → ∞, as a function of Eb/N0 [dB].

3.5.8. Examples of Orthogonal Signals. Let us see how orthogonal signals can be generated.

3.5.8.1. Pulse Position Modulation. This method consists in generating M temporally disjoint signals. Given a pulse p(t) of duration T/M, for example

p(t) = A sin ω0 t for 0 ≤ t ≤ T/M, and 0 otherwise,   (3.88)

the signals are

si(t) = p(t − (i − 1)T/M),   i = 1, 2, …, M.

A communication system using this kind of signals is referred to as a pulse position modulation (PPM) system. Fig. 3.41 shows some of the possible signals of a PPM system.
Figure 3.41. PPM signals s1(t), s2(t), …, sM(t): the i-th signal is obtained by delaying p(t) by a multiple of T/M.


3.5.8.2. Frequency Shift Keying. The signals employed in a frequency shift keying (FSK) system are

si(t) = A sin[(ω0 + (i − 1)Δω) t + θi] for 0 ≤ t ≤ T, and 0 otherwise,   i = 1, 2, …, M,   (3.89)

and are sketched for i = 1, 2, 3, 4 and θi = 0 in Fig. 3.42.

Figure 3.42. FSK signals s1(t), …, s4(t) for θi = 0.

Let us verify under what conditions they are orthogonal. As the internal product of si(t) and sj(t) is

(si, sj) = A² ∫_0^T sin[(ω0 + (i−1)Δω)t + θi] sin[(ω0 + (j−1)Δω)t + θj] dt
         = (A²/2) ∫_0^T cos[(i−j)Δω t + θi − θj] dt − (A²/2) ∫_0^T cos[(2ω0 + (i+j−2)Δω)t + θi + θj] dt,

if ω0 T ≫ 1, the second integral is negligible and thus

(si, sj) ≈ (A²/2) [ sin((i−j)ΔωT + θi − θj) − sin(θi − θj) ] / [(i−j)Δω].

So, in the absence of phase synchronization, i.e., if θi − θj ≠ kπ, with k an integer, the minimum frequency deviation by which the signals are orthogonal is

Δω_min/(2π) = 1/T.

However, if the initial phases of the signals are synchronized such that θi − θj = kπ, i.e., if they have same or opposite phases, then

Δω_min/(2π) = 1/(2T).   (3.90)

3.5.8.3. Walsh Signals. The Walsh signals si(t), i = 1, …, M, are each obtained by a train of M pulses p(t) of duration T/M, delayed by a multiple of T/M and with possibly inverted polarity

si(t) = Σ_{k=1}^{M} a_{i,k} p(t − (k − 1)T/M),   (3.91)

where a_{i,k} = ±1 (see Fig. 3.43).
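Before moving on, the FSK orthogonality conditions just derived can be checked numerically. The following sketch (Python with NumPy assumed; T, f0 and the phase offset are arbitrary illustrative values) shows that a tone spacing of 1/(2T) suffices only with phase synchronism, while 1/T works for any phase.

import numpy as np

T, f0 = 1.0, 20.0                       # illustrative; f0*T >> 1 is assumed
t = np.linspace(0, T, 20000, endpoint=False)
dt = t[1] - t[0]

def inner(df, phi):
    # inner product of two FSK tones spaced df apart, second one with phase phi
    s_i = np.sin(2*np.pi*f0*t)
    s_j = np.sin(2*np.pi*(f0 + df)*t + phi)
    return np.sum(s_i * s_j) * dt

print(inner(1/(2*T), 0.0))   # ~0: orthogonal with phase synchronism, cf. (3.90)
print(inner(1/(2*T), 1.0))   # clearly nonzero: 1/(2T) is not enough without synchronism
print(inner(1/T, 1.0))       # ~0: a spacing of 1/T works for any phase offset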


Figure 3.43. The pulse p(t) of duration T/M and a Walsh signal si(t) built from M replicas of p(t) with polarities a_{i,k}.

All signals si(t) have the same energy Es and the internal product of any two of them is given by

(si, sj) = (Es/M) Σ_{k=1}^{M} a_{i,k} a_{j,k} = (Es/M) a_i^T a_j,

where a_i = (a_{i,1}, a_{i,2}, …, a_{i,M})^T and a_j = (a_{j,1}, a_{j,2}, …, a_{j,M})^T. So, the problem of generating M orthogonal Walsh signals consists in finding a set of M orthogonal vectors whose components are either +1 or −1. If M is a power of 2, such vectors can be obtained in the following way. For M = 2, it is easy to see that the columns of the matrix

H1 = [ 1  1 ; 1 −1 ]

are orthogonal. For M = 2², it is evident that also the matrix

H2 = [ H1  H1 ; H1 −H1 ] = [ 1 1 1 1 ; 1 −1 1 −1 ; 1 1 −1 −1 ; 1 −1 −1 1 ]

is such that its columns are orthogonal. Iteratively applying this method, one obtains the matrix H_{k+1} (referred to as a Hadamard matrix)

H_{k+1} = [ H_k  H_k ; H_k −H_k ],

whose columns are orthogonal if the columns of H_k are orthogonal. Indeed, any two different columns a_i and a_j of H_{k+1} are such that their first 2^k elements either coincide or are orthogonal. In the first case, the remaining 2^k elements are all opposite, such that a_i^T a_j = 0. In the second case, the remaining 2^k elements are also orthogonal and thus a_i^T a_j = 0.
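The Sylvester construction described above is easy to implement. The following sketch (Python with NumPy assumed) builds H_k recursively and verifies that the columns are mutually orthogonal.

import numpy as np

def hadamard(k):
    # Sylvester construction: H_{k+1} = [[H_k, H_k], [H_k, -H_k]], starting from H_0 = [1]
    H = np.array([[1]])
    for _ in range(k):
        H = np.block([[H, H], [H, -H]])
    return H

H = hadamard(3)                  # 8 x 8 matrix with +/-1 entries
M = H.shape[0]
print(np.array_equal(H.T @ H, M * np.eye(M, dtype=int)))   # True: columns orthogonal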

3.5.9. Bandwidth Requirement for Orthogonal Signaling. At equal conditions, i.e., same number of signals M, same average energy Es and same noise, all the previously examined systems using orthogonal signals have the same information rate Rb = (log₂ M)/T bit/s and the same error probability. Also the cost is the same, but, while the transmitters of the FSK system and of the system using Walsh signals deliver the energy Es over the whole signaling interval T, the transmitter of the PPM system should deliver it in a fraction 1/M of this time, such that the required power is M times larger. Now we would like to establish what is the required bandwidth for orthogonal signaling. The bandwidth occupied by a signal can be defined in different ways. From a practical point of view, a useful definition is the following. Let us consider two systems using the same kind of signals but different carrier frequencies, and suppose that they are perfectly synchronized (i.e., the signals are


perfectly known to the receivers). The minimum distance at which the carrier frequencies of the two systems can be placed such that the signals of a system do not interfere with the receiver of the other system is called effective bandwidth. Clearly, given this definition, as the receiver operates on the correlations (r, si), there will be no interference if the signals used by a system are orthogonal to those used by the other system.

3.5.9.1. PPM. If the two systems employ PPM signals obtained by delaying pulses like the one in (3.88) by a multiple of T/M

pi(t) = Ai sin ωi t for 0 ≤ t ≤ T/M, and 0 otherwise,   i = 1, 2,

fi = ωi/(2π) being the carrier frequency of system i, it is easy to see that the signals obtained by differently delaying p1(t) and p2(t) are orthogonal because they are temporally disjoint. For the same delay, the orthogonality condition is

∫_0^(T/M) p1(t) p2(t) dt = (A1 A2/2) ∫_0^(T/M) [cos(ω1 − ω2)t − cos(ω1 + ω2)t] dt = 0.

The contribution of cos(ω1 + ω2)t to the integral is negligible if (ω1 + ω2)T/M ≫ 1, such that

∫_0^(T/M) p1(t) p2(t) dt ≈ (A1 A2/2) sin[(ω1 − ω2)T/M]/(ω1 − ω2) = 0   ⟺   (ω1 − ω2)T/M = kπ,

and the minimum carrier offset turns out to be

|ω1 − ω2|/(2π) = M/(2T).

3.5.9.2. FSK. As regards orthogonal FSK signaling, suppose the two systems employ the following signals for 0 ≤ t ≤ T:

System 1:  si(t) = A1 sin[(ω1 + (i − 1)Δω) t],   i = 1, 2, …, M
System 2:  sj(t) = A2 sin[(ω2 + (j − 1)Δω) t],   j = 1, 2, …, M,

with ω2 ≥ ω1 + MΔω. From the result (3.90) of Section 3.5.8.2, the orthogonality condition implies that the highest frequency of the system with lower carrier and the lowest frequency of the system with higher carrier should differ by a multiple of 1/(2T). Hence, the minimum distance between the carrier frequencies, i.e., the effective bandwidth, is

|ω1 − ω2|/(2π) = M/(2T),

the same result we obtained with PPM.

3.5.9.3. Walsh Signals. Let us suppose that two systems use the Walsh signals (3.91) for amplitude modulating two carriers at frequencies fi = ωi/(2π), i = 1, 2. Denoting by si(t) and sj(t) the signals of the respective systems and under the hypothesis that (ω1 + ω2)T ≫ 1, the orthogonality


condition is given by

∫_0^T si(t) sin ω1 t · sj(t) sin ω2 t dt ≈ (1/2) ∫_0^T si(t) sj(t) cos(ω1 − ω2)t dt
 = (1/2) Σ_{k=1}^{M} ∫_((k−1)T/M)^(kT/M) si(t) sj(t) cos(ω1 − ω2)t dt = 0.

As in each subinterval ((k − 1)T/M, kT/M) the Walsh signals are constant, si(t) sj(t) is constant and the previous condition is equivalent to

∫_((k−1)T/M)^(kT/M) cos(ω1 − ω2)t dt = 0,   k = 1, 2, …, M,

leading to the result that, again, the minimum distance between the carrier frequencies should be

|ω1 − ω2|/(2π) = M/(2T).

In conclusion, other than same performance and energy efficiency, the three analyzed systems also have the same effective bandwidth. This result is valid in general, i.e., the bandwidth occupancy of a system employing orthogonal signals increases linearly with the number of signals M. We point out that the above value of the effective bandwidth is relative to the case of perfectly synchronized systems, i.e., when each communication system has a perfect knowledge of the signals used in any other system. If this is not the case, the effective bandwidth would double for all considered systems.
3.5.10. Biorthogonal Signals. Given a set of M/2 equal-energy orthogonal signals si(t), i = 1, …, M/2, by adding the opposite signals −si(t), i = 1, …, M/2, we get a new set of M signals such that to each si(t) corresponds the opposite signal −si(t). Such signals are referred to as biorthogonal signals. Fig. 3.44 sketches the cases for M = 4 and M = 6.

Figure 3.44. Biorthogonal signal constellations for M = 4 and M = 6.

The cases with M = 2 and M = 4 are perhaps the most common ones and correspond to the BPSK and QPSK systems, respectively. For a generic M, the probability of error can be computed as in the case of the orthogonal signals. If the signals are equally probable and equal-energy, for symmetry reasons we still have that

P(E) = P(E | m1) = 1 − P(r ∈ D1 | m1),


but now the decision is in favor of s1 if r1 > 0 and |ri| < r1 for i = 2, 3, …, M/2 (see Fig. 3.45), such that the decision zone D1 is

D1 = {r ∈ R^(M/2) : r1 > 0, |r2| < r1, |r3| < r1, …, |r_{M/2}| < r1},

and thus

P(r ∈ D1 | m1) = P( r1 > 0, ∩_{i=2}^{M/2} {|ri| < r1} | m1 ) = ∫_0^∞ P( ∩_{i=2}^{M/2} {|ri| < r1} | m1, r1 ) p(r1 | m1) dr1.

Figure 3.45. Decision zone D1 for biorthogonal signals: r1 > 0 and |ri| < r1 for i = 2, …, M/2.

Taking into account that, if m1 is transmitted,

r1 = √Es + n1,   ri = ni for 2 ≤ i ≤ M/2,

as, conditional upon r1, the events {|ri| < r1} are independent, we have that

P( ∩_{i=2}^{M/2} {|ri| < r1} | m1, r1 ) = Π_{i=2}^{M/2} P(|ri| < r1 | m1, r1) = [P(|r2| < r1 | m1, r1)]^(M/2 − 1)

and hence

P(r ∈ D1 | m1) = ∫_0^∞ (1/(√(2π) σ)) exp(−(r1 − √Es)²/(2σ²)) [ ∫_{−r1}^{r1} (1/(√(2π) σ)) exp(−r2²/(2σ²)) dr2 ]^(M/2 − 1) dr1
 = ∫_0^∞ (1/(√(2π) σ)) exp(−(r1 − √Es)²/(2σ²)) [1 − 2Q(r1/σ)]^(M/2 − 1) dr1.

Finally, after the change of variable x = (r1 − √Es)/σ and replacing σ² = N0/2,

P(E) = 1 − ∫_{−√(2Es/N0)}^{∞} (1/√(2π)) exp(−x²/2) [1 − 2Q(x + √(2Es/N0))]^(M/2 − 1) dx.

Except for small values of M, the performance of a system using biorthogonal signals is quite similar to that of a system using orthogonal signals. However, biorthogonal signals have some advantages, such as a simpler receiver due to the halved number of correlators, halved effective bandwidth, and minimum energy signal constellation. This last property explains why biorthogonal signals compete well with orthogonal signals.


The probability of error for biorthogonal signaling is shown as a function of Es /N0 in Fig. 3.46(a) and as a function of Eb /N0 in Fig. 3.46(b), where also the probability of error for orthogonal signaling is reported for comparison.
Figure 3.46. Error probability for biorthogonal signaling with M = 2, 4, …, 64: (a) versus Es/N0 [dB]; (b) versus Eb/N0 [dB], with orthogonal signaling shown for comparison.

As can be seen, for M = 2 biorthogonal signals perform better than orthogonal ones, as expected, but already for M = 4 the difference starts becoming negligible.

3.5.11. Bounding the Error Probability. In most cases, evaluating the probability of error can be difficult. For example, in the case of orthogonal signals it cannot be computed in closed form. In such cases, we resort to approximations or bounds. We can either upper or lower bound the probability of error, or do both things, preferably. We require the bounds to be tight for low error-probability values, so that the difference between upper and lower bound is small enough to give a reasonable approximation to the unknown true value. Of course, for being of any practical value, approximations and bounds should be more easily computable than direct evaluation of the probability of error.

3.5.11.1. Union Bound. The union bound is a simple upper bound and is based on the following reasoning. Let us consider a generic system with M signals. According to the MAP criterion (3.8), the receiver must prefer mj to mi if P(mj | r) > P(mi | r). Let us call Eij such event, that is

Eij = {mj is preferred to mi}.

So, P(Eij | mi) is the probability that, when mi is transmitted, mj is preferred by the receiver. If mi and mj were the only two messages used by the communication system, P(Eij | mi) would be equal to the conditional probability of error P(E | mi). For this reason, it is called the pairwise error probability. However, in general, if mi is transmitted, there are M − 1 such error events, namely, all the events Eij with j ≠ i, and the receiver makes an error if any one of them occurs. So, supposing that message mi is transmitted, the receiver makes an error if the following event occurs

Ei = {mi is not preferred} = ∪_{j=1, j≠i}^{M} Eij


and therefore the conditional probability of error can be written as

P(E | mi) = P(Ei | mi) = P( ∪_{j=1, j≠i}^{M} Eij | mi ).

Since the probability of a union of events is upper bounded by the sum of the probabilities of the single events, the conditional probability of error can be upper bounded as

P(E | mi) ≤ Σ_{j=1, j≠i}^{M} P(Eij | mi).   (3.92)

Averaging over all messages, we get the so-called union bound on the probability of error

P(E) = Σ_{i=1}^{M} P(mi) P(Ei | mi) ≤ Σ_{i=1}^{M} P(mi) Σ_{j=1, j≠i}^{M} P(Eij | mi).   (3.93)

For equally probable signals, P(mi) = 1/M for all i, and the union bound becomes

P(E) ≤ (1/M) Σ_{i=1}^{M} Σ_{j=1, j≠i}^{M} P(Eij | mi).   (3.94)

As P(Eij | mi) represents the conditional probability of error of a binary system using only the messages mi and mj, it is usually easier to compute than the conditional probability of error P(E | mi) when M > 2. In the AWGN case, the pairwise error probabilities can be computed as done in Section 3.5.3. Given the signal images si and sj as in Fig. 3.47 (see also Fig. 3.25 and compare with (3.67)),

P(Eij | mi) = Q( (dij/2 + λij)/σ ),   P(Eji | mj) = Q( (dij/2 − λij)/σ ),   (3.95)

Figure 3.47. Pairwise error event Eij for the signal images si and sj at distance dij, with binary threshold offset λij.

where σ = √(N0/2) is the standard deviation of the noise components, dij is the distance between si and sj, and λij is the binary threshold (compare with (3.61))

λij = (σ²/dij) ln[P(mi)/P(mj)].   (3.96)

The union bound, either (3.93) or (3.94), requires evaluating M(M − 1) pairwise error probabilities P(Eij | mi), each one depending on the constants dij and λij. However, taking into account that dij = dji and λij = −λji, only half of such constants actually need to be evaluated (see (3.95)).


If the signals are equally likely, λij = 0 ∀i, j, so that (3.92) and (3.94) become

P(E | mi) ≤ Σ_{j=1, j≠i}^{M} Q(dij/(2σ))   (3.97)

P(E) ≤ (1/M) Σ_{i=1}^{M} Σ_{j=1, j≠i}^{M} Q(dij/(2σ)).   (3.98)

In addition, if the signals are also equidistant, i.e., dij = d ∀i, j, observing that the double summation in (3.98) involves M(M − 1) terms, we get

P(E) = P(E | mi) ≤ (M − 1) Q(d/(2σ)).   (3.99)

Example 3.6. Let us compute the union bound for the BPSK and QPSK systems, supposing that the signals are equally probable. In both cases, for symmetry reasons (see Fig. 3.33),

P(E) = P(E | mi) = P(E | m1) = P(E1 | m1)   ∀i.

In the BPSK case, E1 = E12 and d12 = 2√Es, so, from either (3.92) or (3.97) with i = 1, and recalling that σ = √(N0/2),

P(E) ≤ P(E12 | m1) = Q(√(2Es/N0)).

It turns out that this is actually the exact value of P(E) (see (3.75)). This is not surprising, because, in this case, the pairwise error probability P(E12 | m1) coincides with the conditional probability of error P(E | m1). In the QPSK case, E1 = E12 ∪ E13 ∪ E14 and we have (see Fig. 3.33(b))

d12 = d14 = √(2Es),   d13 = 2√Es,

so, from either (3.92) or (3.97) with i = 1,

P(E) ≤ P(E12 | m1) + P(E13 | m1) + P(E14 | m1) = 2Q(√(Es/N0)) + Q(√(2Es/N0)),   (3.100)

to be compared to the exact value (3.73), reported here for convenience:

P(E) = 2Q(√(Es/N0)) − Q²(√(Es/N0)).
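The pairwise distances are all that is needed to evaluate (3.98). The following sketch (Python with SciPy assumed; the constellation coordinates and Es/N0 are illustrative) computes the union bound for the QPSK constellation and compares it with the exact value (3.73).

import numpy as np
from scipy.special import erfc

def Q(x):
    return 0.5 * erfc(x / np.sqrt(2))

def union_bound(points, N0):
    # Union bound (3.98) for equally likely signals with images given by `points` (M x Q array)
    M = len(points)
    sigma = np.sqrt(N0 / 2)
    total = 0.0
    for i in range(M):
        for j in range(M):
            if j != i:
                dij = np.linalg.norm(points[i] - points[j])
                total += Q(dij / (2 * sigma))
    return total / M

Es, N0 = 1.0, 0.1                                      # Es/N0 = 10 dB, illustrative
qpsk = np.sqrt(Es) * np.array([[1, 0], [0, 1], [-1, 0], [0, -1]])
exact = 2*Q(np.sqrt(Es/N0)) - Q(np.sqrt(Es/N0))**2     # Eq. (3.73)
print(f"union bound {union_bound(qpsk, N0):.3e}   exact {exact:.3e}")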

3.5.11.2. Expurgated Union Bound. The union bound may often be tightened when one or more events Eij are included in the union of other events. As an example, it may happen that, for some k,

Eik ⊂ ∪_{j=1, j≠i,k}^{M} Eij,


such that the error event Ei = {mi is not preferred} can be written as the union of M − 2 events rather than M − 1, i.e.,

Ei = ∪_{j=1, j≠i}^{M} Eij = ∪_{j=1, j≠i,k}^{M} Eij.

In this case,

P(E | mi) = P( ∪_{j=1, j≠i,k}^{M} Eij | mi ) ≤ Σ_{j=1, j≠i,k}^{M} P(Eij | mi)

is tighter than (3.92), as the positive term P(Eik | mi) has been expurgated.

Example 3.7. With reference to Example 3.6, as E13 ⊂ E12 ∪ E14 (see the QPSK constellation of Fig. 3.33, where E12, E13, E14 are half-plane events),

E1 = E12 ∪ E13 ∪ E14 = E12 ∪ E14

and therefore we get the tighter bound

P(E) ≤ P(E12 | m1) + P(E14 | m1) = 2Q(√(Es/N0)).

3.5.11.3. Minimum Distance Upper Bound. If the signals are equally likely, we can upper bound the pairwise error probabilities by using the minimum distance dmin between any two signals

dmin = min_{i≠j} dij.

Indeed, as Q(x) is a monotonically decreasing function of its argument x, we have

P(Eij | mi) = Q(dij/(2σ)) ≤ Q(dmin/(2σ)),

which, replaced in (3.98), gives the minimum distance upper bound

P(E) ≤ (1/M) Σ_{i=1}^{M} Σ_{j=1, j≠i}^{M} Q(dij/(2σ)) ≤ (M − 1) Q(dmin/(2σ)).   (3.101)

The minimum distance upper bound is valid for equally likely signals and, clearly, it is simpler to evaluate but looser than the union bound.

3.5.11.4. Minimum Distance Lower Bound. Upper bounds are useful approximations to the error probability, as they assure that the true performance can only be better, allowing us to play safe as regards system design. However, they do not provide information on how much better the performance could be. Complementary information is provided by lower bounds, such that, if the difference between lower and upper bounds is small, approximating the true probability of error by a middle value gives reasonable results. As

Ei = ∪_{j=1, j≠i}^{M} Eij,

a simple way to lower bound the probability of error, conditional upon the transmission of message mi, is approximating it by only one of the pairwise error probabilities. In fact, recalling that the probability of a union of events is lower bounded by each one of the probabilities of the single events, we have

P(Ei | mi) ≥ P(Eij | mi)   ∀j ≠ i

and, consequently,

P(Ei | mi) ≥ max_{j≠i} P(Eij | mi).

The lower bound on the probability of error is then obtained by averaging over all messages

P(E) = Σ_{i=1}^{M} P(mi) P(Ei | mi) ≥ Σ_{i=1}^{M} P(mi) max_{j≠i} P(Eij | mi).   (3.102)

In the AWGN case with equally likely signals, the pairwise error probabilities can be expressed as in (3.95) with λij = 0 ∀i, j. Recalling that Q(x) is monotone decreasing and denoting by di the distance of si from its nearest neighbor

di = min_{j, j≠i} dij,

we obtain

max_{j≠i} P(Eij | mi) = max_{j≠i} Q(dij/(2σ)) = Q(di/(2σ)).

Since P(mi) = 1/M for all i, the lower bound (3.102) can be written as

P(E) ≥ (1/M) Σ_{i=1}^{M} Q(di/(2σ)).   (3.103)

Now, since Q(x) decreases rapidly when its argument increases, we could neglect all the terms in (3.103) with di higher than a certain value. Choosing to neglect all terms with di > dmin, we obtain the so-called minimum distance lower bound

P(E) ≥ (ν_min/M) Q(dmin/(2σ)),   (3.104)

where ν_min denotes the number of signals that have at least one neighbor at distance dmin. This lower bound is of course looser than (3.103) but simpler to compute. Note that ν_min ≥ 2, because at least two signals have a neighbor at distance dmin.

3.5.11.5. Approximated Expression for the Error Probability. A good approximation to the probability of error can be obtained starting from the union bound (3.97) on the conditional probability of error, reported here for convenience:

P(E | mi) ≤ Σ_{j=1, j≠i}^{M} Q(dij/(2σ)).

By retaining only the terms with dij = dmin, we cannot be sure that the right-hand side is still an upper bound, but for dij/σ → ∞ it will asymptotically tend to the union bound. Indeed, it can be shown that, for α > 1,

lim_{x→∞} Q(αx)/Q(x) = 0,


meaning that Q(αx) is infinitesimal of higher order than Q(x). So, denoting by ν_i the number of signals at distance dmin from si, we have

P(E | mi) ≲ ν_i Q(dmin/(2σ)),

where the notation ≲ means that the right-hand side is an asymptotic upper bound. Averaging over all messages, we get

P(E) = Σ_{i=1}^{M} P(E | mi) P(mi) ≲ ν̄ Q(dmin/(2σ)),   (3.105)

where

ν̄ = Σ_{i=1}^{M} ν_i P(mi).

If all messages are equally probable,

ν̄ = (1/M) Σ_{i=1}^{M} ν_i

can be interpreted as the average number of nearest neighbors of the signals in the constellation. This approximation is especially valid for not so small values of P(E).

3.5.11.6. The Importance of the Minimum Distance. For the AWGN channel with an equally probable M-ary signal set, combining the minimum distance lower and upper bounds and taking into account that ν_min ≥ 2, we can bound the error probability as

(2/M) Q(dmin/(2σ)) ≤ P(E) ≤ (M − 1) Q(dmin/(2σ)).

This expression highlights the fact that the bounds differ only by a multiplicative constant, whatever the signal constellation. So, dmin is an important parameter determining, by itself, the quality of an equally probable M-ary signal constellation. Constellations with the same average energy can be compared on the basis of their minimum distance, as the one with the largest dmin is expected to perform better. However, we should also consider that the average number of nearest neighbors may play a determinant role, so that maximizing dmin may not be equivalent to minimizing P(E).

Example 3.8. Let us evaluate the minimum distance bounds and the approximate upper bound (3.105) for a QPSK system. The minimum distance is dmin = √(2Es) (see Fig. 3.33(b)), so from (3.101), the minimum distance upper bound is

P(E) ≤ 3 Q(dmin/(2σ)) = 3 Q(√(Es/N0)).

As all signals have at least one neighbor at minimum distance, ν_min = 4 and thus, from (3.104), the minimum distance lower bound is

P(E) ≥ Q(dmin/(2σ)) = Q(√(Es/N0)).

As regards the approximate upper bound, as each signal has 2 neighbors at minimum distance, ν_i = 2 for all i, such that ν̄ = (2 + 2 + 2 + 2)/4 = 2 and

P(E) ≲ 2 Q(dmin/(2σ)) = 2 Q(√(Es/N0)).


Notice that, in this case, the approximate upper bound coincides with the expurgated union bound (see Example 3.7). Fig. 3.48 shows the exact probability of error together with all considered bounds.
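The comparison in Fig. 3.48 can also be reproduced numerically. The following sketch (Python with SciPy assumed; the Es/N0 values are illustrative) evaluates the exact QPSK error probability (3.73) together with the bounds of Example 3.8.

import numpy as np
from scipy.special import erfc

def Q(x):
    return 0.5 * erfc(x / np.sqrt(2))

for EsN0_dB in (0.0, 5.0, 10.0):
    EsN0 = 10 ** (EsN0_dB / 10)
    q = Q(np.sqrt(EsN0))          # Q(dmin / (2 sigma)) for QPSK
    exact = 2*q - q**2            # Eq. (3.73)
    lower = q                     # minimum distance lower bound (3.104), nu_min/M = 1
    upper = 3*q                   # minimum distance upper bound (3.101), M - 1 = 3
    approx = 2*q                  # approximate/expurgated bound, nu_bar = 2
    print(f"{EsN0_dB:4.1f} dB  lower {lower:.2e}  exact {exact:.2e}  approx {approx:.2e}  upper {upper:.2e}")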
Figure 3.48. Exact QPSK error probability compared with the union bound, the expurgated union bound, and the minimum distance upper and lower bounds, as functions of Es/N0 [dB].

As can be seen, the union bounds are asymptotically tight, contrarily to the minimum distance upper and lower bounds. Also note that, for small values of Es/N0, the upper bounds can exceed 1.

Example 3.9 (M-ary PSK). The signals employed in an M-PSK system are

si(t) = A cos(ω0 t − θi),   0 ≤ t ≤ T,   θi = (i − 1) 2π/M,   i = 1, 2, …, M.

As easily verified, the dimensionality of the subspace of {si(t)}, i = 1, …, M, is Q = 2. For ω0 T = N·2π, with N an integer, or, alternatively, ω0 T ≫ 1, the energy of all signals is Es = A²T/2 and

ψ1(t) = √(2/T) cos ω0 t,   ψ2(t) = √(2/T) sin ω0 t,   0 ≤ t ≤ T,

are an orthonormal basis of the signal subspace. The images of the signals are evenly distributed along a circle with radius √Es and are shown in Fig. 3.49 for M = 8. The distance between two consecutive images is d = 2√Es sin(π/M). We have already computed the probability of error for M = 2 (BPSK) and M = 4 (QPSK). However, as the exact evaluation for other values of M is cumbersome, we will analyze the performance through upper and lower bounds.


Figure 3.49. 8-PSK constellation: the images lie on a circle of radius √Es, consecutive images are at distance d = 2√Es sin(π/M), and the half-plane error events E12 and E1M are indicated.

Given the symmetry,

P(E) = P(E | m1) = P(E1 | m1).

Since (see Fig. 3.49)

E1i ⊂ E12 ∪ E1M,   i = 3, …, M − 1,

we have that

E1 = ∪_{i=2}^{M} E1i = E12 ∪ E1M.

Thus, the expurgated union bound provides the upper bound

P(E) ≤ P(E12 | m1) + P(E1M | m1).

Still for symmetry,

P(E12 | m1) = P(E1M | m1) = Q(d/(2σ)),

such that, replacing d = 2√Es sin(π/M) and σ = √(N0/2),

P(E) ≤ 2 Q(√(2Es/N0) sin(π/M)).

As regards the lower bound, we note that ν_min = M, because all signals have at least one neighbor at minimum distance. So, the minimum distance lower bound (3.104) gives

P(E) ≥ Q(√(2Es/N0) sin(π/M)).

In conclusion,

Q(√(2Es/N0) sin(π/M)) ≤ P(E) ≤ 2 Q(√(2Es/N0) sin(π/M)),   (3.106)

so that upper and lower bounds differ only by a factor of 2, which is usually adequate for applications. Note that, as each signal has 2 neighbors at minimum distance, ν̄ = 2 and the approximate upper bound (3.105) again coincides with the expurgated union bound. Also note that the union bound is asymptotically tight for large values of Es/N0. Indeed, the exact probability of error would be

P(E1 | m1) = P(E12 ∪ E1M | m1) = P(E12 | m1) + P(E1M | m1) − P(E12 ∩ E1M | m1),

that is,

P(E) = 2 Q(√(2Es/N0) sin(π/M)) − P(E12 ∩ E1M | m1),


and it is clear that the upper bound is obtained by neglecting the last term. Now, since

E12 ∩ E1M ⊂ E1(M/2+1),

we have that

P(E12 ∩ E1M | m1) ≤ P(E1(M/2+1) | m1) = Q(2√Es/(2σ)) = Q(√(2Es/N0))

and, as sin(π/M) < 1, this term is infinitesimal of higher order than Q(√(2Es/N0) sin(π/M)) when Es/N0 increases.

Example 3.10 (Orthogonal signals). As already seen, in the case of M equally probable, equal-energy and orthogonal signals, the probability of error can only be expressed by means of an integral which should be computed numerically. Now, closed form formulas are quite useful for drawing conclusions and the bounds we analyzed can provide closed form approximations which are accurate, at least for large values of Es/N0. In this case, the signals are equidistant, hence, the union bound and the minimum distance upper bound coincide. Since all signals have at least one neighbor at minimum distance, ν_min = M. So, as the distance between the signals is d = √(2Es), we have

Q(√(Es/N0)) ≤ P(E) ≤ (M − 1) Q(√(Es/N0)).

Note that for M = 2 upper and lower bounds coincide. This is expected, as they are based on the pairwise error probabilities, which give exact results for binary signaling. However, the lower bound does not depend on M and becomes looser and looser when M increases. On the contrary, the upper bound is asymptotically tight for increasing values of Es/N0.

3.5.12. Bandwidth and Asymptotic Power Efficiency. According to Theorem 2.51, any signal x(t) bandlimited in the interval (−B, B) can be expressed as in (2.57). Now, if the energy of x(t) is almost entirely contained in a time interval T, we can assume that x(t) almost vanishes outside this time interval. Hence, as the spacing between samples is Δ = 1/(2B), the series (2.57) can be reduced to a summation of T/Δ = 2BT terms, neglecting only a small fraction of the energy. This fact suggests that the space of real signals of approximate duration T and approximate bandwidth B has approximate dimension 2BT. This is called the 2BT-theorem and can be proved more rigorously than our naive justification. So, according to this theorem, the dimensionality of a set of signals with duration T and bandwidth B is approximately N = 2BT, such that we could define the bandwidth of a signal set with N dimensions as

B = N/(2T).

This is known as the Shannon bandwidth and coincides with our definition of the effective bandwidth in the case of N orthogonal signals. The Shannon bandwidth may be interpreted as the minimum amount of bandwidth that the signals need, in contrast to the bandwidth actually used. As already discussed, for M-ary signaling, each message symbol carries log₂ M information bits. Thus, the information rate in bits per second is

Rb = (number of bits per symbol)/(signaling interval) = (log₂ M)/T

and the average power expended by the transmitter is

S = Es/T = (Eb log₂ M)/T = Eb Rb.


Defining the signal-to-noise ratio as the ratio between the average signal power S and the average noise power in the bandwidth B, the latter being (N0/2)·2B = N0 B, we have

S/(N0 B) = Eb Rb/(N0 B).

So, the signal-to-noise ratio is the product of Eb/N0 and the dimensionless quantity Rb/B, called the bandwidth (or spectral) efficiency. The spectral efficiency, even if dimensionless, is measured in bit/s/Hz and tells us how many bits per second are transmitted in a given bandwidth B. The higher the spectral efficiency, the more efficient the use of the available bandwidth. Recalling that, for high signal-to-noise ratios, the probability of error can be approximated as in (3.105), i.e., replacing σ = √(N0/2),

P(E) ≈ ν̄ Q(dmin/√(2N0)),

and taking antipodal binary signaling (for which dmin = 2√Eb) as a reference, we define the asymptotic power efficiency as the quantity γ satisfying

dmin/√(2N0) = √(2γEb/N0),

that is,

γ = dmin²/(4Eb).

The asymptotic power efficiency tells us how efficiently the available energy is used for generating a given minimum distance, and thus a given error probability. Hence, at least for high signal-to-noise ratios and for a comparable average number of nearest neighbors ν̄, the greater γ, the better the signal set.

The evaluation of a communication system should be based on the probability of error P(E), the value of Eb/N0 necessary to achieve it, and the spectral efficiency. Ideally, a system achieves a small P(E) with a low Eb/N0 and a high Rb/B. Of course, in a practical implementation one has also to consider complexity as a trade-off parameter. For example, let us compare M-ary orthogonal signaling and M-PSK. From Example 3.10 and recalling that Es = Eb log₂ M, for orthogonal signaling we have

Q(√((Eb/N0) log₂ M)) ≤ P(E) ≤ (M − 1) Q(√((Eb/N0) log₂ M)).

This expression evidences that, for a given Eb/N0, the probability of error decreases when M increases, because the asymptotic power efficiency γ = (1/2) log₂ M increases. On the other hand, for an M-PSK system, from (3.106) we have

Q(√((2Eb/N0)(log₂ M) sin²(π/M))) ≤ P(E) ≤ 2 Q(√((2Eb/N0)(log₂ M) sin²(π/M))),

but, this time, the asymptotic power efficiency γ = (log₂ M) sin²(π/M) decreases for M ≥ 4 (γ = 1 for M = 2, 4), such that, for a given Eb/N0, the probability of error increases when M increases. However, the bandwidth needed for an orthogonal signal set is B = M/(2T) and increases linearly with M, such that the spectral efficiency Rb/B = (2/M) log₂ M decreases, while the bandwidth needed by M-ary PSK is B = 1/T and is independent of M, because the dimensionality of the signal space


remains unchanged, such that the spectral efficiency Rb/B = log₂ M increases. We can conclude that orthogonal signals are power efficient, while M-PSK and, in general, any other system using bidimensional signals are bandwidth efficient. Strictly speaking, we should compare different systems on the basis of the bit error probabilities instead of the symbol error probabilities, as explained in the following.

3.5.13. Bit Error Probability. As already said, communication systems using a different number of signals should be compared by expressing the probability of error in terms of Eb/N0. However, a symbol error may cause a different number of bit errors (from 1 to k = log₂ M), so it is more meaningful comparing systems with different number of signals in terms of the probability of a bit error, rather than of a symbol error. When the probability of mistaking a signal with another one is the same, as in the case of equally probable and equal-energy orthogonal signals, the way the information bits are associated to the signals does not matter, i.e., the bit error probability does not depend on the adopted association rule. However, this is not the case in general. In the following, the rule by which log₂ M bits are assigned to each symbol will be referred to as a map. In order to clarify things, let us revisit our model in Fig. 3.1 and expand the block representing the (M-ary) source of information as done in Fig. 3.50.
Figure 3.50. Expanded model of the information source: a binary source, a serial-to-parallel converter (S/P) and a mapper produce the M-ary symbols feeding the transmitter TX, whose signal s(t) reaches the receiver RX through the AWGN channel.

A binary source emits every Tb seconds a binary symbol b. The binary symbols are grouped in blocks of length k by means of a serial-to-parallel converter S/P and are assigned to a symbol m belonging to a finite set M = {mi}, i = 1, …, M, of M = 2^k elements. The symbols m are thus emitted every T = kTb seconds, such that the signaling rate (also called baud rate) is 1/T while the information rate is Rb = 1/Tb = (log₂ M)/T. The same group of k = log₂ M bits is rigidly assigned to one of the M-ary symbols. There are M! different ways (all possible permutations of M objects) of associating each group of k bits to a different symbol, i.e., M! different maps are possible. For a given map, the probability of error can be computed as follows. Let us denote by X the random variable "number of bit errors per symbol", such that X can assume any value in the set {0, 1, 2, …, k}. Denoting by bij the Hamming distance of mi and mj, i.e., the number of bits by which mi and mj differ, conditional on the transmission of mi, the event {m̂ = mj} causes bij bit errors with probability P(m̂ = mj | mi). Partitioning the set M into the subsets Mℓ, ℓ = 0, 1, …, k, whose elements are all the mj's such that bij = ℓ, i.e.,

Mℓ = {mj ∈ M : bij = ℓ},

and taking into account that the events {m̂ = mj}, j = 1, 2, …, M, are disjoint, we have

P(X = ℓ | mi) = P( ∪_{mj∈Mℓ} {m̂ = mj} | mi ) = Σ_{mj∈Mℓ} P(m̂ = mj | mi).


Hence, assuming that mi is transmitted and taking into account that bii = 0, the average number of bit errors can be written as

E{X | mi} = Σ_{ℓ=0}^{k} ℓ P(X = ℓ | mi) = Σ_{ℓ=0}^{k} ℓ Σ_{mj∈Mℓ} P(m̂ = mj | mi) = Σ_{j=1, j≠i}^{M} bij P(m̂ = mj | mi)

and the conditional bit error probability is simply this quantity divided by k = log₂ M, the number of bits per symbol,

P(Eb | mi) = (1/log₂ M) Σ_{j=1, j≠i}^{M} bij P(m̂ = mj | mi).

The bit error probability is then obtained by averaging over all symbols

Pb ≜ P(Eb) = (1/log₂ M) Σ_{i=1}^{M} Σ_{j=1, j≠i}^{M} bij P(m̂ = mj | mi) P(mi).

In the case of equally probable and equal-energy orthogonal signals, a symbol is equally likely mistaken with any other one. Thus

P(m̂ = mj | mi) = P(E)/(M − 1)

and, as P(mi) = 1/M for all i,

Pb = (1/log₂ M) [P(E)/(M − 1)] (1/M) Σ_{i=1}^{M} Σ_{j=1, j≠i}^{M} bij.

The term

Σ_{j=1, j≠i}^{M} bij

gives the total number of bit errors when mistaking mi with all other symbols. For any map and any mi, the number of symbols differing by ℓ bits from mi is C(k, ℓ), with k = log₂ M, so that (see (3.86))

Σ_{j=1, j≠i}^{M} bij = Σ_{ℓ=1}^{k} ℓ C(k, ℓ) = k 2^(k−1) = (M/2) log₂ M

and

Pb = (1/log₂ M) [P(E)/(M − 1)] (1/M) Σ_{i=1}^{M} (M/2) log₂ M = [M/(2(M − 1))] P(E),

in accordance with (3.87).
Example 3.11. If the probability of mistaking a symbol with another one is not the same for all possible mistakes, the way the bits are mapped to the symbols influences the bit error probability. Let us verify this fact with the QPSK system. Fig. 3.51 shows the signal constellation rotated by π/4 counterclockwise together with two different maps.


Figure 3.51. QPSK constellation (rotated by π/4) with the natural map (s1 → 00, s2 → 01, s3 → 10, s4 → 11) and the Gray map (s1 → 00, s2 → 01, s3 → 11, s4 → 10).

The natural map simply associates the base-2 representation of i − 1 to signal si. Given the symmetry, the probability of a bit error is simply

Pb = P(Eb | m1) = (1/log₂ 4) Σ_{j=2}^{4} b1j P(m̂ = mj | m1).

Denoting by d = √(2Es) = 2√Eb the distance between any two adjacent signals and, for simplicity, letting

p = Q(d/(2σ)) = Q(√(2Eb/N0)),   p1j = P(m̂ = mj | m1),

we have

p12 = p14 = p(1 − p),   p13 = p².

For each map, the values of b1j are listed in Table 1.

Table 1.
Natural map: b12 = b13 = 1, b14 = 2.
Gray map:    b12 = b14 = 1, b13 = 2.

Thus, for the natural map we get

Pb = (1/2)[3p(1 − p) + p²] = (3/2)p − p²,

while, for the so-called Gray map,

Pb = (1/2)[2p(1 − p) + 2p²] = p.

As p < 1/2 whatever the value of Eb/N0, the Gray map gives a smaller bit error probability.
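The two bit error probabilities just derived can be verified by direct enumeration. The following sketch (Python with SciPy assumed; Eb/N0 is chosen for illustration) reproduces Pb for the natural and Gray maps of Table 1.

import numpy as np
from scipy.special import erfc

def Q(x):
    return 0.5 * erfc(x / np.sqrt(2))

p = Q(np.sqrt(2 * 10 ** (6/10)))          # p = Q(sqrt(2 Eb/N0)) at Eb/N0 = 6 dB, illustrative

# probabilities of mistaking s1 for s2, s3 (antipodal) and s4
p12 = p14 = p * (1 - p)
p13 = p ** 2

maps = {"natural": {2: 1, 3: 1, 4: 2},    # Hamming distances b_1j from Table 1
        "gray":    {2: 1, 3: 2, 4: 1}}
for name, b in maps.items():
    Pb = (b[2]*p12 + b[3]*p13 + b[4]*p14) / 2    # divide by log2(4) = 2 bits per symbol
    print(f"{name:8s} Pb = {Pb:.3e}")
print(f"reference p = {p:.3e}")           # the Gray map gives exactly Pb = p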


The result of the last example has a simple explanation. Because a signal is most likely mistaken with a neighboring one, the mapping should be performed such that neighboring symbols differ by the least possible number of bits. If it is possible to perform the assignment such that neighboring symbols differ by only one bit, the map is said to be a Gray map. Depending on the signal constellation, it is not always possible to adopt a Gray map. However, commonly used constellations corresponding to pulse amplitude modulation (PAM), phase-shift keying (PSK), and quadrature amplitude modulation (QAM) always allow Gray mapping (see Fig. 3.52).

Figure 3.52. Gray mappings for 4-PAM (00, 01, 11, 10), 8-PSK, and 16-QAM constellations.
A Gray map minimizes the bit error probability and also makes it possible to express it as a simple approximation in terms of the symbol error probability, which will be denoted for simplicity as Ps. Indeed, taking again advantage of the fact that the Q function decreases rapidly when its argument increases, if we neglect the probability of mistaking a symbol with one which is farther apart than a neighboring one, then we can assume that each symbol error simply produces a single bit error, such that

Pb ≈ Ps/log₂ M.

Note that this is a lower bound, as we are neglecting error events whose probability, however small, is finite. Anyway, as for Eb/N0 → ∞ the probability of these events is infinitesimal of higher order than the probability of a neighbor error, it is asymptotically tight. As an example, given that the symbol error probability for QPSK is Ps = 2p − p², if a Gray map is employed, the above approximation gives Pb ≈ p − p²/2, while the exact bit error probability computed previously is Pb = p. As a last note, recall that our comparison between QPSK and BPSK in (3.77) in terms of Ps evidenced a slight advantage of a factor of 2 for BPSK. We now know that QPSK with Gray mapping has the same bit error probability as BPSK. So, reporting Pb instead of Ps in Fig. 3.32, the QPSK curve coincides with the BPSK one.

CHAPTER 4

Optimum Detection of Stochastic Signals


There are practical situations where the receiver does not have a precise knowledge of the shape of the received signals. For example, this may be due to the lack of phase synchronization between the oscillators at the transmitter and the receiver. But, even if the transmitted signals were perfectly known, propagation from the transmitter to the receiver may introduce random attenuations and delays. If, for cost or any other reason, such quantities cannot be estimated, the theory developed in the previous chapter should be revised accordingly.

4.1. Stochastic Signals in AWGN

Suppose that the receiver has only a partial knowledge of the signals, i.e., the received signals are known except for the value of some parameters collected in a vector θ = (θ1, θ2, …)^T. In this case, correspondingly to the transmission of symbol mi, the received signal can be written as

r(t) = si(t, θ) + n(t),   0 < t < T,   i = 1, 2, …, M,

n(t) being AWGN with power spectral density N0/2. If the receiver has at least a statistical knowledge of the parameters, i.e., it knows their joint probability density function p(θ), an optimum detection strategy as per the MAP criterion can be devised under the reasonable hypothesis of independence of the parameters from both signals and noise. Note that, according to this model, the received signal is a stochastic process even in the absence of noise n(t) and conditional to the transmission of a given signal si(t), as the received signal si(t, θ) depends on the random vector θ.

4.1.1. Known Parameters. If, in some way, the receiver can estimate the parameters θ for each signaling interval, the detection strategy can be devised as in the previous chapter. Indeed, if θ is known, si(t, θ) is a deterministic waveform and the signal space generated by {si(t, θ)}, i = 1, …, M, has finite dimension, such that a sufficient statistic is given by the projection of the received signal r(t) onto the signal space. As

P(mi | r, θ) = P(mi) p(r, θ | mi)/p(r, θ) = P(mi) p(r | θ, mi) p(θ | mi)/[p(r | θ) p(θ)],

if the parameters are independent of the signals, i.e., if p(θ | mi) = p(θ), the MAP strategy is

m̂ = argmax_{mi} P(mi) p(r | θ, mi),

where

p(r | θ, mi) = (1/(πN0)^(Q/2)) exp(−(1/N0) ‖r − si(θ)‖²)   (4.1)


and {si(θ)}, i = 1, …, M, are the images of the signals, now depending on θ. Thus, the MAP strategy can also be written as either one of the following equivalent expressions (compare with (3.16) and (3.19))

m̂ = argmin_{mi} { ‖r − si(θ)‖² − N0 ln P(mi) }
  = argmax_{mi} { ∫_0^T r(t) si(t, θ) dt + (N0/2) ln P(mi) − (1/2) ∫_0^T si²(t, θ) dt },

highlighting the fact that, in general, the strategy and, hence, the error probability depend on the particular realization of θ. It follows that the average probability of error is given by

P(E) = ∫_Θ P(E | θ) p(θ) dθ,

where Θ is the domain of p(θ).

4.1.2. Unknown Parameters. If θ is only statistically known to the receiver through the joint pdf p(θ), we can proceed as follows. Assuming for simplicity that the space generated by the signals {si(t, θ)}, i = 1, …, M, has finite dimension Q whatever θ,¹ the MAP strategy is

m̂ = argmax_{mi} P(mi) p(r | mi),

but now p(r | mi) is still unknown to us, because of the dependence of r = si(θ) + n on θ. However, if θ is independent of the signals,

p(r | mi) = ∫_Θ p(r | θ, mi) p(θ | mi) dθ = ∫_Θ p(r | θ, mi) p(θ) dθ,

where Θ is the space of the parameters, i.e., the domain of p(θ). Recalling (4.1), we get

p(r | mi) = (1/(πN0)^(Q/2)) ∫_Θ exp(−(1/N0) ‖r − si(θ)‖²) p(θ) dθ

and thus

m̂ = argmax_{mi} P(mi) ∫_Θ exp(−(1/N0) ‖r − si(θ)‖²) p(θ) dθ
  = argmax_{mi} ∫_Θ exp{ (2/N0) [ ∫_0^T r(t) si(t, θ) dt + (N0/2) ln P(mi) − (1/2) ∫_0^T si²(t, θ) dt ] } p(θ) dθ

or, equivalently,

m̂ = argmax_{mi} ∫_Θ exp{fi(θ)} p(θ) dθ,   (4.2)

¹If this is not the case, we can follow the same reasoning as done in Section 3.3, arriving at the same conclusions.


where

fi(θ) = (2/N0) [ ∫_0^T r(t) si(t, θ) dt + Ci(θ) ]   (4.3)

Ci(θ) = (N0/2) ln P(mi) − (1/2) Ei(θ)   (4.4)

Ei(θ) = ∫_0^T si²(t, θ) dt.   (4.5)

This is the most general formulation of the MAP strategy for the detection of partially known signals in AWGN. Further simplifications can be obtained case by case, depending on the signals and the statistics of the unknown parameters.

4.2. Signals with Random Amplitude

If the channel introduces an attenuation α > 0, the received signals are

si(t, α) = α si(t),   i = 1, 2, …, M,

and, as α leaves unchanged the dimensionality of the signal space, the corresponding images are

si(α) = α si,

such that they move along lines passing through the origin and the points si. The decision regions will be a compromise between the regions corresponding to the values assumed by α, weighted according to the probability density associated to such values. In general, they are not easy to determine, as shown in the following example.

Example 4.1. A binary system employs equally probable orthogonal signals whose images, with respect to a given orthonormal basis, are s1 = (2a, 0)^T and s2 = (0, a)^T. The signals can reach the receiver following either of two equally likely paths, characterized by attenuation α1 = 1/2 and α2 = 1, respectively, such that si(α) = α si with

p(α) = (1/2) δ(α − α1) + (1/2) δ(α − α2).

Let us find the MAP strategy that an optimum receiver should follow and the decision zones. Being the signals equally likely, we can neglect their a priori probabilities, such that Ci(α) = −Ei(α)/2 with

Ei(α) = α² ‖si‖² = α² (2a/i)²,   i = 1, 2.

As

∫_0^T r(t) si(t, α) dt = r^T si(α) = α (2a/i) ri,   i = 1, 2,

and taking into account that the average transmitted energy is

Es = (1/2)(‖s1‖² + ‖s2‖²) = (5/2) a²,

such that a = √(2Es/5), we get

fi(α) = (2/N0) [ α (2a/i) ri − (1/2) α² (2a/i)² ] = (8Es/(5 i N0)) α (ri/a − α/i),   i = 1, 2.


Thus, according to (4.2), the MAP strategy is

e^(f1(α1)) + e^(f1(α2))  ≷(m1/m2)  e^(f2(α1)) + e^(f2(α2)),   (4.6)

i.e., select m1 if the left-hand side (LHS) is greater than the right-hand side (RHS), and m2 otherwise. The boundary between the decision zones is obtained by equating LHS and RHS in (4.6). Using the equality

e^x + e^y = 2 e^((x+y)/2) cosh((x − y)/2)

and taking into account that α1 = 1/2 and α2 = 1, such that

[fi(α1) + fi(α2)]/2 = (4Es/(5 i N0)) [ (α1 + α2) ri/a − (α1² + α2²)/i ] = (4Es/(5 i N0)) [ (3/(2a)) ri − 5/(4i) ],
[fi(α1) − fi(α2)]/2 = (4Es/(5 i N0)) [ (α1 − α2) ri/a − (α1² − α2²)/i ] = −(4Es/(5 i N0)) [ (1/(2a)) ri − 3/(4i) ],

the equation of the boundary can be written as

exp{ (Es/N0) [ (6/(5a)) r1 − 1 ] } cosh{ (Es/N0) [ (2/(5a)) r1 − 3/5 ] }
 = exp{ (Es/(2N0)) [ (6/(5a)) r2 − 1/2 ] } cosh{ (Es/(2N0)) [ (2/(5a)) r2 − 3/10 ] }.   (4.7)

From Fig. 4.1, we see that ln(cosh x) ≈ |x| − ln 2, the difference being less than 10% for |x| > 1.

Figure 4.1. Comparison of ln(cosh x) with the approximation |x| − ln 2.

So, taking the natural logarithm of both sides of (4.7), the equation of the boundary can be approximated as

(6/(5a)) r1 − 1 + |(2/(5a)) r1 − 3/5| = (1/2) [ (6/(5a)) r2 − 1/2 ] + (1/2) |(2/(5a)) r2 − 3/10|.   (4.8)

The approximation becomes loose when the absolute value of the argument of the hyperbolic cosines in (4.7) is less than about 1, i.e., for

(1 − (5/3)/(Es/N0)) (3/2)a < r1 < (1 + (5/3)/(Es/N0)) (3/2)a,
(1 − (20/3)/(Es/N0)) (3/4)a < r2 < (1 + (20/3)/(Es/N0)) (3/4)a.

Note that the true boundary (4.7) does depend on Es/N0, while the approximated one (4.8) does not. However, for Es/N0 → ∞, it becomes asymptotically exact. Equation (4.8) can be broken up into three parts, namely,

r2 = 2r1 − (3/4)a   if r1 ≤ (3/4)a,
r2 = r1             if (3/4)a ≤ r1 ≤ (3/2)a,
r2 = 2r1 − (3/2)a   if r1 ≥ (3/2)a,

and is reported in Fig. 4.2 together with (4.7) for Es/N0 = 7 dB in (a), and for Es/N0 = 11 dB in (b).
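The MAP rule (4.6) can also be applied directly, without deriving the boundary in closed form. The following is a minimal sketch (Python with NumPy assumed; a, Es/N0 and the test points are illustrative, and fi(α) is taken from the expression derived above) that evaluates the two metrics and decides accordingly.

import numpy as np

a = 1.0
EsN0 = 10 ** (11 / 10)                    # Es/N0 = 11 dB, as in Fig. 4.2(b)
alphas = np.array([0.5, 1.0])             # two equally likely attenuations

def f(i, alpha, r):
    # f_i(alpha) from Example 4.1: (8 Es / (5 i N0)) * alpha * (r_i / a - alpha / i)
    return 8 * EsN0 / (5 * i) * alpha * (r[i - 1] / a - alpha / i)

def decide(r):
    # MAP rule (4.6): compare the sums of exp(f_i) over the two attenuation values
    metric = [np.sum(np.exp([f(i, al, r) for al in alphas])) for i in (1, 2)]
    return 1 + int(np.argmax(metric))

print(decide(np.array([1.2, 0.2])))   # close to a scaled s1 -> decides 1
print(decide(np.array([0.1, 0.9])))   # close to a scaled s2 -> decides 2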


Figure 4.2. Exact boundary (4.7) and approximate boundary (4.8) between the decision zones for Es/N0 = 7 dB (a) and Es/N0 = 11 dB (b).

Notice how the boundary line progressively morphs from the boundary line corresponding to the decision regions for α = α1 to the one corresponding to the decision regions for α = α2. The solution simplifies considerably if the transmitted signals are equally probable with equal energy Es. In such a case, the images lie on a hypersphere with radius √Es. As can be easily seen from Fig. 4.3, if ‖r − αsi‖ < ‖r − αsk‖ is true for some α > 0, then it is true for any α > 0.

Figure 4.3. Equal-energy signal images s1, s2 and their scaled versions αs1, αs2: the ordering of the distances from r does not depend on α.

It follows that if

‖r − αsi‖ < ‖r − αsk‖   ∀k ≠ i

then, for any α,

exp(−(1/N0) ‖r − αsi‖²) > exp(−(1/N0) ‖r − αsk‖²)   ∀k ≠ i

and thus, for any p(α) such that p(α) = 0 for α < 0, the strategy becomes

m̂ = argmax_{mi} ∫_0^∞ exp(−(1/N0) ‖r − αsi‖²) p(α) dα = argmin_{mi} ‖r − αsi‖² = argmax_{mi} r^T si,

meaning that the decision regions do not depend on ξ and the optimum receiver can be designed without regard to the attenuation introduced by the channel. However, the probability of error depends on the statistics of ξ. For example, in the case of two equally probable and equal-energy orthogonal signals, P(E | ξ) = Q(√(ξ² E_b/N0)) and
P(E) = ∫₀^∞ Q(√(ξ² E_b/N0)) p(ξ) dξ .

4.3. Signals with Random Phase

In this section we will address narrow bandpass signals. A bandpass signal s(t) is a signal whose Fourier transform vanishes outside a frequency interval B centered around ±f0, with B < 2f0. It can be written as
s(t) = A(t) cos[ω0 t + θ(t)]
where A(t) is the envelope and θ(t) the phase of s(t), while f0 ≜ ω0/2π is referred to as the carrier frequency.² The envelope is defined to be nonnegative, so that A(t) ≥ 0. Negative amplitudes, when they occur, are absorbed in the phase by adding ±π.
4.3.1. Complex Envelope. Notice that, letting
s̃(t) ≜ A(t) e^{jθ(t)} ,   (4.9)
s(t) can be written as
s(t) = Re[ s̃(t) e^{jω0 t} ] = (1/2)[ s̃(t) e^{jω0 t} + s̃*(t) e^{−jω0 t} ]   (4.10)
and thus its Fourier transform is
S(f) = (1/2)[ S̃(f − f0) + S̃*(−f − f0) ] ,   (4.11)
where the first term in the RHS is centered around f0, while the second one around −f0 (see Fig. 4.4). It follows that the so-called complex envelope s̃(t) in (4.9), and thus both A(t) and θ(t), are lowpass signals with bandwidth B/2. When B ≪ 2f0, s(t) is referred to as a narrow bandpass signal and it looks like a sinusoid at frequency f0 with slowly changing amplitude and phase.
FIGURE 4.4. [Spectrum S(f): the terms (1/2)S̃*(−f − f0) and (1/2)S̃(f − f0), each of bandwidth B, centered around −f0 and +f0, respectively.]
² The choice of f0 can be different from the center of the bandwidth B. In this case, the phase θ(t) changes accordingly, but the envelope A(t) remains the same.

4.3.2. Hilbert Transform and Analytic Signal. As (1/2)S̃(f − f0) is the Fourier transform of s(t) for f ≥ 0,
S̃(f − f0) = 2 S(f) u(f)   (4.12)
where u(f) is the unit step function. Using the sign function
sgn(f) ≜ 1 for f > 0, −1 for f < 0
and a filter whose transfer function is
H(f) = −j sgn(f) ,
we can write 2u(f) = 1 + sgn(f) = 1 + jH(f), and therefore (4.12) as
S̃(f − f0) = S(f)[1 + jH(f)] .   (4.13)
Notice that (see Fig. 4.5) |H(f)| = 1 and
arg H(f) = −π/2 for f > 0, +π/2 for f < 0
FIGURE 4.5. [Amplitude |H(f)| = 1 and phase ∓π/2 of the Hilbert filter H(f) = −j sgn(f).]
so that H(f) simply introduces a −π/2 phase shift for f > 0 and a +π/2 phase shift for f < 0. Hence, H(f) is a −π/2 phase shifter, also called a Hilbert filter. Its output is referred to as the Hilbert transform of the input. We will denote Hilbert transforms by a reversed hat, such that the Hilbert transform of s(t) is š(t), whose Fourier transform is, in turn,
Š(f) = S(f) H(f) = −j sgn(f) S(f) .   (4.14)
Hence, (4.13) can be written as
S̃(f − f0) = S(f) + j Š(f)
and, denoting by x(t) its inverse Fourier transform, we have
x(t) ≜ s̃(t) e^{jω0 t} = s(t) + j š(t)   (4.15)

which is known as the analytic signal associated with s(t ). As H ( f ) has Hermitian symmetry, i.e., H ( f ) = H ( f ), its impulse response h(t ) is real. Therefore, the Hilbert transform s (t ) = s(t ) h(t ) of a real signal s(t ) is still real, such that from (4.15) we get Re s (t )e j 0 t = s(t ) in accordance with (4.10). E XAMPLE 4.2. The effect of the Hilbert lter on a sinusoid with arbitrary frequency is simply a /2 phase shift. So, the Hilbert transform of s(t ) = cos(0 t + ) is s (t ) = sin(0 t + ), while the Hilbert transform of s(t ) = sin(0 t + ) is s (t ) = cos(0 t + ). E XAMPLE 4.3. If x (t ) is a baseband signal such that X ( f ) = 0 for | f | > f0 , letting 0 = 2 f0 , the Hilbert transform of s(t ) = x (t ) cos 0 t is s (t ) = x (t ) sin 0 t . Indeed, as 1 1 S ( f ) = X ( f f0 ) + X ( f + f0 ) 2 2 and X ( f f0 ) = 0 for f < 0, while X ( f + f0 ) = 0 for f > 0, from (4.14) we have ( f ) = j X ( f f 0 ) + j X ( f + f 0 ) S 2 2

and, taking the inverse Fourier transform, j j s (t ) = x (t )e j 2 f0 t + x (t )e j 2 f0 t d f 2 2 j 0 t j 0 t e e = x (t ) 2j = x (t ) sin 0 t . Similarly, it can be shown that the Hilbert transform of s(t ) = x (t ) sin 0 t is s (t ) = x (t ) cos 0 t . E XAMPLE 4.4. As s(t ) = A(t ) cos 0 t + (t ) can also be written as s(t ) = A(t ) cos (t ) cos 0 t A(t ) sin (t ) sin 0 t , drawing upon the results of the previous example, if the Fourier transforms of A(t ) cos (t ) and A(t ) sin (t ) vanish for | f | > f0 , the Hilbert transform of s(t ) is s (t ) = A(t ) cos (t ) sin 0 t + A(t ) sin (t ) cos 0 t = A(t ) sin 0 t + (t ) . (t ) = A(t ) cos 0 t + (t ) . Similarly, the Hilbert transform of s(t ) = A(t ) sin 0 t + (t ) is s The complex envelope and the analytic signal are useful tools that allow a compact notation. We will make use of these tools in section 4.3.7.
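As a quick numerical illustration (a sketch, not part of the notes), scipy.signal.hilbert returns the analytic signal s(t) + j š(t) of (4.15), from which the envelope A(t) and the complex envelope s̃(t) of (4.9) can be recovered; all parameters below are assumed for the example.

    import numpy as np
    from scipy.signal import hilbert

    fs, f0 = 100e3, 5e3                      # sample rate and carrier (assumed)
    t = np.arange(0, 10e-3, 1/fs)
    A = 1 + 0.5*np.cos(2*np.pi*200*t)        # slowly varying envelope (bandwidth << f0)
    phi = 0.3*np.sin(2*np.pi*100*t)          # slowly varying phase
    s = A*np.cos(2*np.pi*f0*t + phi)         # narrow bandpass signal, as in (4.10)

    x = hilbert(s)                           # analytic signal s(t) + j*s_check(t), eq. (4.15)
    s_tilde = x*np.exp(-2j*np.pi*f0*t)       # complex envelope, eq. (4.9)

    sl = slice(100, -100)                    # discard edge effects of the discrete transform
    print(np.abs(np.abs(x) - A)[sl].max())            # |x(t)| ~ A(t)
    print(np.abs(np.angle(s_tilde) - phi)[sl].max())  # arg s_tilde(t) ~ phi(t)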

4.3.3. Noncoherent Detection of Bandpass Signals. Supposing that the employed signals are narrow bandpass, they can be written as si (t ) = Ai (t ) cos 0 t + i (t ) 0tT, i = 1, 2, . . . , M (4.16)

If, for whatever reason, the carrier phase-shift is completely unknown to the receiver and can be different for each signal, the received signals will be si (t , ) = Ai (t ) cos 0 t + i (t ) i . (4.17)

The complete lack of knowledge about i is reected by modeling it as uniformly distributed in an interval of length 2 , for example p(i ) = 1 , 2 i .

In this case, the MAP strategy and the resulting optimum receiver are said noncoherent, in contrast to the case where the carrier phase-shift is exactly known, which is referred to as coherent detection.3
In optical communications, the terms synchronous or asynchronous detection are used to refer to a detection strategy which assumes or not knowledge of the phase of the received signal, whereas coherent detection refers to the use of a laser as a local oscillator, independently of the detection strategy.
3

Under the hypothesis that 0 T 1 and taking into account that Ai (t ) and i (t ) vary much more slowly than cos(20 t ) because their bandwidth is much smaller than f0 , we have T Ei ( ) = si2 (t , )dt 0 T 2 = A2 i (t ) cos 0 t + i (t ) i dt 0 1 T 2 1 T 2 A (t )dt + A (t ) cos 20 t + 2i (t ) 2i dt = 2 0 i 2 0 i 1 T 2 0 A (t )dt (4.18) 2 0 i so that the energy of the signals and thus Ci ( ) in (4.4) are independent of . As fi ( ) in (4.3) only depends on i , assuming that the unknown phases i are independent, the MAP strategy (4.2) becomes 1 m = argmax exp fi (i ) d i . (4.19) 2 mi On the other hand, letting xi
T

yi we get
0 T

r (t ) Ai (t ) cos 0 t + i (t ) dt r (t ) Ai (t ) sin 0 t + i (t ) dt
0 T

(4.20)

r (t )si (t , )dt =

where

= zi cos(i i ) zi =

r (t ) Ai (t ) cos 0 t + i (t ) i dt

(4.21)

xi2 + yi2 yi i = arctan , xi such that 2zi Ei cos(i i ) + ln P(mi ) . N0 N0 Given the denition of the modied Bessel function of order 0, 1 1 x cos I0 ( x ) = e d = ex cos() d 2 2 we obtain 2zi Ei 1 m = argmax exp cos(i i ) d i exp ln P(mi ) 2 N0 N0 mi fi (i ) = = argmax I0
mi

(4.22)

(4.23)

2zi Ei exp ln P(mi ) N0 N0 2zi Ei + ln P(mi ) N0 N0

= argmax ln I0
mi

(4.24)

Then, the structure of an optimum receiver is as shown in Fig. 4.6.

FIGURE 4.6. [Bank of quadrature correlators: for each i, r(t) is correlated over (0, T) with si(t) and with its π/2-shifted version, giving xi and yi; a function generator forms zi = √(xi² + yi²), a second one forms ln I0(2zi/N0), the bias Ei/N0 − ln P(mi) is subtracted, and the largest result is chosen.]

A drawback of this scheme is that the function generators √(xi² + yi²) and ln I0(2zi/N0) are costly. Observing that I0(x) ≈ e^x/√(2πx) for x ≫ 1, we have
ln I0(x) ≈ x − ln√x − ln√(2π)
and, neglecting ln√x with respect to x, instead of (4.24) we could adopt the suboptimal strategy
m̂ = argmax_{mi} [ zi + (N0/2) ln P(mi) − Ei/2 ] .   (4.25)
Thus, for sufficiently high values of the signal-to-noise ratio, the optimum receiver is well approximated by eliminating the nonlinear element ln I0(·) and by adding the constants Ci = (1/2)N0 ln P(mi) − (1/2)Ei to zi, as shown in Fig. 4.7.

FIGURE 4.7. [For high SNR, the branch computing ln I0(2zi/N0) − Ei/N0 + ln P(mi) is replaced by zi plus the constant Ci = (1/2)N0 ln P(mi) − (1/2)Ei.]
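The quality of the approximation ln I0(x) ≈ x used to obtain (4.25) can be checked directly; the sketch below uses scipy's exponentially scaled Bessel function i0e, which avoids overflow for large arguments.

    import numpy as np
    from scipy.special import i0e

    x = np.array([1., 2., 5., 10., 20., 50., 100.])
    ln_I0 = np.log(i0e(x)) + x    # ln I0(x) = ln[exp(-x) I0(x)] + x, computed without overflow
    # The gap x - ln I0(x) grows only like ln(sqrt(2*pi*x)), so the relative error shrinks.
    print(np.column_stack([x, ln_I0, x - ln_I0]))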

As regards the other nonlinear element, let us consider the lter hi (t ) matched to si (t ) in (4.16), i.e., hi (t ) = si (T t ). Because hi (t ) = 0 for t < 0 and t > T , the output will be t v(t ) = r ( )hi (t )d t T t = r ( ) Ai (T t + ) cos 0 (T t + ) + i (T t + ) d t T t = r ( ) Ai (T t + ) cos 0 + i (T t + ) d cos 0 (T t )
t T

x (t )
t

y(t ) = x (t ) cos 0 (T t ) y(t ) sin 0 (T t ) = z(t ) cos 0 (T t ) + (t ) where z(t ) (t ) For t = T , we have x (T ) = y(T ) =
0

t T

r ( ) Ai (T t + ) sin 0 + i (T t + ) d sin 0 (T t )

x 2 (t ) + y2 (t ) y(t ) arctan . x (t )

r ( ) Ai ( ) cos 0 + i ( ) d r ( ) Ai ( ) sin 0 + i ( ) d

which coincide with xi and yi in (4.20), such that the envelope z(t ), at the time instant t = T , coincides with the output zi of the i -th function generator in Fig. 4.6. The envelope z(t ) can be obtained by the simple envelope detector shown in Fig. 4.8, so that a cheaper receiver can be realized as shown in Fig. 4.9.

F IGURE 4.8. Note that if the signals are equally likely, the approximate decision strategy becomes m = argmax zi
mi

Ei 2

(4.26)

and thus it is independent of N0 , as in the case of perfectly known signals.
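The matched-filter/envelope-detector branch of Fig. 4.9 can be simulated as below (a sketch with assumed parameters, noise omitted for clarity); the envelope is extracted here through the analytic signal, and its sample at t = T reproduces the quadrature-correlator output zi whatever the unknown phase.

    import numpy as np
    from scipy.signal import hilbert

    fs, f0, T = 200e3, 20e3, 1e-3            # assumed sample rate, carrier, duration
    t = np.arange(0, T, 1/fs)
    s_i = np.cos(2*np.pi*f0*t)               # reference signal s_i(t), unit amplitude
    theta = 2*np.pi*np.random.rand()         # unknown carrier phase
    r = np.cos(2*np.pi*f0*t - theta)         # received signal

    # Quadrature correlators, eqs. (4.20)-(4.22)
    x_i = np.trapz(r*np.cos(2*np.pi*f0*t), t)
    y_i = np.trapz(r*np.sin(2*np.pi*f0*t), t)
    z_corr = np.hypot(x_i, y_i)

    # Filter matched to s_i, i.e. s_i(T - t), followed by an envelope detector sampled at t = T
    v = np.convolve(r, s_i[::-1])/fs         # matched-filter output v(t)
    z_env = np.abs(hilbert(v))[len(t) - 1]   # envelope of v(t) at t = T

    print(z_corr, z_env)                     # the two values should approximately agree (~ T/2)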

FIGURE 4.9. [Bank of filters matched to si(T − t), each followed by an envelope detector sampled at t = T; the constants Ci = (1/2)N0 ln P(mi) − (1/2)Ei are added to the envelopes zi, and the largest result is chosen.]

If the signals are also equal-energy, the decision strategy simplifies considerably without resorting to approximations. Indeed, taking into account that I0(x) is a monotone increasing function for x > 0, when the signals are equally probable and equal-energy, the MAP strategy (4.24) becomes
m̂ = argmax_{mi} zi   (4.27)
or, equivalently,
m̂ = argmax_{mi} zi² .   (4.28)
In this case, the structure of the two equivalent receivers is shown in Fig. 4.10(a) and (b), respectively.
FIGURE 4.10. [(a) Bank of quadrature correlators computing zi = √(xi² + yi²), followed by a choose-max device; (b) the same bank computing zi² = xi² + yi², followed by a choose-max device.]
In the case of Fig. 4.10(a), in order to lower the cost of the receiver, the correlators and the nonlinear elements can be replaced with matched filters and envelope detectors. This is not necessary in the case of Fig. 4.10(b), as the nonlinear elements can be realized more easily.
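The squared-envelope receiver of Fig. 4.10(b) can be transcribed directly; the sketch below (assumed parameters, two tones spaced 2/T so that they stay orthogonal for any phase) correlates r(t) with each si(t) and its π/2-shifted version and picks the largest xi² + yi².

    import numpy as np

    fs, T, A = 100e3, 1e-3, 1.0
    t = np.arange(0, T, 1/fs)
    f = [10e3, 12e3]                          # candidate tones (spacing 2/T)

    def rx_noncoherent(r):
        """Squared-envelope metrics of Fig. 4.10(b) and the resulting decision."""
        z2 = []
        for fi in f:
            x = np.trapz(r*A*np.cos(2*np.pi*fi*t), t)
            y = np.trapz(r*A*np.sin(2*np.pi*fi*t), t)
            z2.append(x**2 + y**2)
        return int(np.argmax(z2))

    rng = np.random.default_rng(0)
    theta = rng.uniform(-np.pi, np.pi)        # unknown phase
    r = A*np.cos(2*np.pi*f[1]*t - theta) + 0.05*rng.standard_normal(t.size)  # s2 + noise
    print("decided index:", rx_noncoherent(r))   # 1, regardless of theta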

Notice that a photodiode produces an output proportional to the intensity (the square of the electrical field) of the light impinging upon it. So, if optical amplification is used, the signals are corrupted by amplified spontaneous emission (ASE) noise (modeled as AWGN), and the structure of an optimum noncoherent optical receiver employing a matched (optical) filter equivalent to Fig. 4.10(b) is as shown in Fig. 4.11.

FIGURE 4.11. [Optical noncoherent receiver: bank of optical filters matched to si(T − t), each followed by a photodiode producing zi², and a choose-max device.]

However, it should be noted that noncoherent detection is not always possible. For example, in the M -PSK case the signals are as in (4.16) with Ai (t ) = A i (t ) = i = (i 1) 2 M i = 1, 2, . . . , M .

Now, as the decision is based on zi2 = xi2 + yi2 , where xi and yi are as in (4.20), it means that zi is proportional to the norm of the projection of the received waveform r (t ) onto the bidimensional signal space. As the signals are equal-energy, they all have the same norm and thus all zi s are the same, whatever the transmitted signal buried in r (t ). Indeed, using the orthonormal basis (3.38) and replacing A = 2Es /T , we get xi = yi = where r1 =
0

r (t ) A cos(0 t i )dt = r (t ) A sin(0 t i )dt = 2 cos 0 tdt T

Es (r1 cos i + r2 sin i ) Es (r2 cos i r1 sin i )


0 T

r (t )

r2 =

r (t )

2 sin 0 tdt , T

and therefore xi² + yi² = Es (r1² + r2²) for all i. Thus, following the strategy (4.27) or (4.28) would be equivalent to randomly choosing one of the M signals, such that the probability of error would be P(E) = 1 − P(C) = 1 − 1/M, an unacceptably high value. In the M-PSK case, noncoherent detection is only possible by associating a symbol with the phase difference between two consecutively transmitted signals (differential coding) and then observing pairs of signals.

In general, all M -ary systems using bidimensional signals, such as M -PSK and M -QAM, cannot be detected noncoherently by only observing the received waveform over a single signaling interval, because the signals cannot be distinguished without a phase reference. Noncoherent detection is instead possible for orthogonal signals and also in the case of generic binary signals, provided that they are almost orthogonal. Notice that, for noncoherent detection, the orthogonality condition should hold whatever the phase difference. For example, A cos 0 t and A sin 0 t are orthogonal for coherent detection but not orthogonal for noncoherent detection,

because it is required that A cos(ω0 t + α) and A sin(ω0 t + β) be orthogonal for any α and β. Now, even if ω0 T ≫ 1,
∫₀ᵀ A cos(ω0 t + α) · A sin(ω0 t + β) dt = (A²/2) ∫₀ᵀ [ sin(2ω0 t + α + β) − sin(α − β) ] dt ≈ −(1/2) A² T sin(α − β) ≠ 0   if α ≠ β .
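This remark can be verified numerically (a sketch with assumed parameters): two tones at the same frequency in quadrature stop being orthogonal once arbitrary phase offsets are applied, whereas tones spaced by 1/T remain (nearly) orthogonal for any phase offsets.

    import numpy as np

    T, fs, f0 = 1e-3, 1e6, 50e3
    t = np.arange(0, T, 1/fs)
    ip = lambda u, v: np.trapz(u*v, t)                 # inner product over (0, T)

    a, b = 1.1, -0.7                                   # arbitrary phase shifts
    same_freq = ip(np.cos(2*np.pi*f0*t + a), np.sin(2*np.pi*f0*t + b))
    spaced    = ip(np.cos(2*np.pi*f0*t + a), np.cos(2*np.pi*(f0 + 1/T)*t + b))

    print(same_freq)   # proportional to sin(a - b): nonzero, not noncoherently orthogonal
    print(spaced)      # ~ 0 for any a, b: orthogonal whatever the phases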

4.3.4. Noncoherent Detection of FSK Signals. In this case, the transmitted signals si(t) are as in (4.16) with
A_i(t) = A ,   φ_i(t) = (i − 1) Δω t   (4.29)
so that (compare also with (3.89)), letting⁴
ω_i ≜ ω0 + (i − 1) Δω ,   (4.30)
s_i(t) = A cos ω_i t ,   0 ≤ t ≤ T ,   i = 1, 2, ..., M ,
while the received signals are s_i(t, θ) = A cos(ω_i t − θ_i). As already seen, for ω0 T ≫ 1 and in the absence of phase synchronization, such signals are orthogonal if the (angular) frequency deviation Δω is an integer multiple of 2π/T. So, the minimum frequency deviation for orthogonality is 1/T, two times the minimum value required when the transmitted signals are phase-synchronous. However, even if the transmitted signals were synchronized, the phase shifts θ_i introduced by the channel at the angular frequencies ω_i would require such minimum value to preserve the orthogonality. Thus, the required bandwidth is at least doubled with respect to coherent transmissions. It is easy to see that, for ω0 T ≫ 1, the energy of all signals is the same and independent of θ:
E_i(θ) = ∫₀ᵀ s_i²(t, θ) dt = A² T / 2 = E_s   ∀i .
So, if the signals are equally probable, the optimum detection strategy is as in (4.27) and the receiver structure as in Fig. 4.10 or Fig. 4.9, but without the constants C_i = (1/2)N0 ln P(m_i) − (1/2)E_i.

Now, as already seen in Example 3.3, the simple resonant parallel circuit of Fig. 3.18(a), tuned to the frequency 0 , has the impulse response (3.50). Tuning this circuit to the frequency i , by proper amplication its impulse response can be made equal to A cos i t t 0 hi (t ) = . 0 t<0
4

Let us now see how the matched lter in Fig. 4.9 can be replaced with a more easily realizable one. Letting i i T , where i is as in (4.30), the lter matched to si (t ) is A cos(i t i ) 0 t T . si (T t ) = 0 otherwise

Some authors write the angular frequencies i in terms of the middle angular frequency c (1 + M )/2 = 0 + (M 1)/2 as i = c + (2i 1 M )d , where d /2, and consider c as the carrier frequency. This is because, instead of M separate oscillators, a single one with free tuning frequency c may be used and then, according to the source message to be transmitted, c is changed to i (by changing an oscillator parameter). In this way, the resulting FSK signal is phase-continuous and exhibits a narrower spectrum due to the absence of abrupt phase transitions.

As can be seen, hi (t ) differs from the matched lter si (T t ) by the phase shift i and because it does not vanish for t > T . However, if we apply r (t ) to its input at t = 0, we can consider r (t ) = 0 for t < 0, such that the output will be t t v(t ) = r ( )hi (t )d = r ( ) A cos i (t )d 0 0 t t = r ( ) A cos i d cos i t + r ( ) A sin i d sin i t
0 0

x (t ) = z(t ) cos i t (t ) where z(t ) x 2 (t ) + y2 (t )

y(t )

(t )

arctan

y(t ) . x (t )

Instead, the output of the matched lter would be t t v (t ) = r ( )si (T t + )d = r ( ) A cos i (t ) i d t T t T t t = r ( ) A cos i d cos(i t i ) + r ( ) A sin i d sin(i t i ) x (t ) = z (t ) cos i t i (t ) where z (t ) x 2 (t ) + y 2 (t )
0 t T t T

y (t )

(t )

arctan

y (t ) . x (t )

Clearly, v(t ) = v (t ), but for t = T , we have x (T ) = x (T ) = y(T ) = y (T ) =

r ( ) A cos i d

r ( ) A sin i d

so that the envelope z(t ) coincides with the envelope z (t ) at the time instant t = T . In conclusion, a cheap implementation of the receiver is as shown in Fig. 4.12.
FIGURE 4.12. [Bank of resonant filters hi(t), each tuned to ωi and followed by an envelope detector sampled at t = T, and a choose-max device.]

4.3.4.1. Error Probability. Let us now compute the error probability for noncoherent detection of equally probable and equal-energy M -ary orthogonal signals. Although we specically refer to M -ary FSK, the result is valid for any other set of orthogonal signals. For symmetry, we have that P(E ) = P(E | m1 ) = 1 P(C | m1 ) . The receiver decides according to the strategy m = argmax zi
mi

(4.31)

(4.32)

based upon the random variables zi = xi2 + yi2 i = 1, 2 , . . . , M (4.33)

where xi and yi are as in (4.20). For FSK signals, they specialize as T xi = r (t ) A cos i tdt 0 t yi = r (t ) A sin i tdt
0

(4.34) (4.35)

with i as in (4.30). Supposing that m1 is transmitted, r (t ) = s1 (t , ) + n(t ) = A cos(1 t 1 ) + n(t ), where n(t ) is AWGN with power spectral density N0 /2. So, conditional upon , and taking into M M account that { A cos i t }1 and { A sin i t }1 are orthogonal to each other, each with energy Es = 2 A T /2, xi and yi are independent Gaussian random variables, all with variance Es N0 /2. As regards their mean value, from (4.34) and (4.35), we get T E { x i | m1 , } = [ A cos(1 t 1 ) + E {n(t )}] A cos i tdt 0 A2 T A2 T cos [(i 1 )t + 1 ] dt + cos [(i + 1 )t 1 ] dt = 2 0 2 0 E {yi | m1 , } = [ A cos(1 t 1 ) + E {n(t )}] A sin i tdt T A A2 T = sin [(i 1 )t + 1 ] + sin [(i + 1 )t 1 ] dt . 2 0 2 0
0 2 0 for 0 T 1

0 for 0 T T

For i = 1, the above integrals vanish, as i 1 = (i 1) and is an integer multiple of 2 /T . Thus, denoting by i j the Kronecker symbol, E { xi | m1 , } = Es cos 1 1i E {yi | m1 , } = Es sin 1 1i .

From the independence of the xi s and yi s and from (4.33), it follows that, conditionally to m1 and , also the zi s are independent. Moreover, as only z1 depends on , they are also independent

unconditionally from , such that, collecting all zi s in the vector z, their joint probability density function factorizes as p(z | m1 ) =
M i =1

p(zi | m1 ).

(4.36)

According to (4.32), the event correct decision C , conditional upon the transmission of m1 , is C = {z1 > z2 , z1 > z3 , . . . , z1 > z M } and thus P(C | m1 ) =
0

p(z1 | m1 )

M i =2

z1

p(zi | m1 )dzi dz1 .

Now, as for i 2 the Gaussian r.v.s xi and yi have same mean and variance, the probability density functions p(zi | m1 ) are all equal, such that z1 M 1 P(C | m1 ) = p(z1 | m1 ) p(z2 | m1 )dz2 dz1 . (4.37)
0 0

In order to determine p(z1 | m1 ) and p(z2 | m1 ), we consider the transformation xi2 + yi2 yi i = arctan xi which, for zi 0 and |i | , has the single solution xi = zi cos i yi = zi sin i . zi =

Recalling the fundamental theorem for the transformation of random variables, px y ( xi , yi | m1 , ) , pzi i (zi , i | m1 , ) = i i | J ( xi , yi )| and, as the Jacobian of the transformation is J ( xi , yi ) = we get z1 (z1 cos 1 Es cos 1 )2 + (z1 sin 1 Es sin 1 )2 exp E s N0 E s N0 2 2 z + Es 2Es z1 cos(1 1 ) z1 exp 1 = E s N0 E s N0 pz2 2 (z2 , 2 | m1 , ) = z2 px2 y2 (z2 cos 2 , z2 sin 2 | m1 , ) = = z2 (z2 cos 2 )2 + (z2 sin 2 )2 exp E s N0 E s N0 z2 z2 = exp 2 . E s N0 E s N0 pz1 1 (z1 , 1 | m1 , ) = z1 px1 y1 (z1 cos 1 , z1 sin 1 | m1 , )
z i xi i xi z i yi i yi

xi zi yi zi

x i 1 i yi i

1 , zi

To obtain the marginal pdfs of z1 and z2 , we integrate with respect to 1 in the rst case and 2 in the second one 2 2 z1 + Es 2z1 z1 exp p(z1 | m1 , ) = exp cos(1 1 ) d 1 E s N0 E s N0 N0 2 z2 + Es 2z1 2z1 = exp 1 I0 (4.38) E s N0 E s N0 N0 2 z2 2z2 exp . (4.39) p(z2 | m1 , ) = E s N0 E s N0 As can be seen, neither z1 nor z2 depend on , such that p(zi | m1 , ) = p(zi | m1 ). The conditional pdfs of z1 and z2 are known as Rice and Rayleigh distributions, respectively. Replacing (4.38) and (4.39) in (4.37), performing the inner integral and then the change of variable z 1 = x E s N0 / 2 yields P(C | m1 ) = e
E s / N0

xe

x 2 /2

Such expression can be manipulated into a closed form as follows. Using the binomial expansion 1 ex (4.40) can also be written as P(C | m1 ) =
M 1 k =0
2 /2

x I0

2Es x 2 /2 1e N0

M 1

dx .

(4.40)

M 1

M 1 k =0

M 1 2 (1)k ekx /2 k

M 1 (1)k eEs /N0 k

xe

(k +1) x 2 /2

From the normalization condition for the Rice distribution (4.38), we get
0

x I0

2Es dx . N0

(4.41)

p(z1 | m1 )dz1 = e
0

E s / N0

z2 2z1 2z1 exp 1 I0 dz1 = 1 E s N0 E s N0 N0

which, letting a = 2/(Es N0 ) and b = 2/N0 , can be rewritten as

xeax /2 I0 (bx )dx =

1 b2 exp . a 2a

Substituting now a = k + 1 and b = 2Es /N0 yields


0

xe

(k +1) x 2 /2

x I0

2Es 1 E s / N0 exp dx = N0 k+1 k+1

(4.42)

and thus, replacing (4.42) in (4.41) and using (4.31), P(E ) = P(E | m1 ) = 1 P(C | m1 ) =1 =
M 1 k =1 M 1 k =0

M 1 (1)k k Es exp k k+1 k + 1 N0

M 1 (1)k +1 k Es exp k k+1 k + 1 N0


M i =2

Es 1 exp = M 2 N0

M i 2 Es (1)i exp i i 2 N0

(4.43)

where, in the last equation, we performed the change of variable i = k + 1 and used the equality 1 M 1 M 1 = . i i M i For M = 2, it is Es = Eb and P(E ) = Pb , so we get
(NC) Pb =

Eb 1 exp 2 2 N0

(4.44)

to be compared with the bit error probability of a coherent system using orthogonal binary signals, which, for Eb /N0 1, can be approximated as E Eb 1 b (CO) Pb = Q exp . N0 2 N0 2 E b / N 0 The ratio
(CO) Pb (NC) Pb

Eb 2 N0

increases with Eb /N0 , but the the error probability curve becomes steeper and steeper, such that the (CO) (NC) horizontal difference between Pb and Pb actually decreases. For M > 2, we can express the bit error probability as a function of Eb /N0 by recalling that Eb Es = log2 M N0 N0 M Pb = P(E ) . 2(M 1) The bit error probability for noncoherent detection of equally probable and equal-energy orthogonal signals is shown in Fig. 4.13 as a function of Eb /N0 . For comparison, also the bit error probability for coherent detection is reported with dashed lines.
FIGURE 4.13. [Bit error probability of noncoherent (solid) and coherent (dashed) detection of equally probable, equal-energy orthogonal signals versus Eb/N0, for M = 2, 32, 10³, 10⁶; the curves approach the −1.6 dB limit as M → ∞.]
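A numerical sketch of (4.43) and of the more stable route through (4.40): for large M the alternating sum in (4.43) suffers from cancellation, whereas integrating (4.40) only ever adds small positive terms (this anticipates the remark below). The integrand is rewritten with the scaled Bessel function i0e to avoid overflow.

    import numpy as np
    from scipy.special import comb, i0e
    from scipy.integrate import quad

    def pe_sum(M, snr):
        """Alternating-sum form (4.43); snr = Es/N0 (linear)."""
        k = np.arange(1, M)
        return np.sum((-1)**(k + 1)*comb(M - 1, k)/(k + 1)*np.exp(-k/(k + 1)*snr))

    def pe_int(M, snr):
        """1 - P(C|m1), with P(C|m1) computed by numerical integration of (4.40)."""
        def f(x):
            b = x*np.sqrt(2*snr)
            # x*exp(-(x^2 + 2*snr)/2)*I0(b) = x*exp(-(x - sqrt(2*snr))^2/2)*i0e(b): no overflow
            return (x*np.exp(-(x*x + 2*snr)/2 + b)*i0e(b)
                    * (1 - np.exp(-x*x/2))**(M - 1))
        pc, _ = quad(f, 0, np.inf)
        return 1 - pc

    for M in (2, 4, 32):
        snr = np.log2(M)*10**(8/10)           # Es/N0 = log2(M)*Eb/N0, with Eb/N0 = 8 dB
        print(M, pe_sum(M, snr), pe_int(M, snr))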

As can be seen, the difference between coherent and noncoherent detection becomes negligible for increasing values of M. The asymptotic performance limit for M → ∞ is the same as for coherent detection. As a last note, using a computer, the probability of error can be easily computed as in (4.43) only for not so large values of M and for not so small values of P(E). This is because, for large M, the terms in the summation are large and with alternating signs, so that the numerical cancellation phenomenon takes place. Indeed, not all real numbers are representable in floating-point form, the format used by computers, and their representation is denser near zero. As any number is approximated by its nearest representation in floating point, two large and almost equal numbers may be approximated by two adjacent representations whose difference is much larger than the true difference. So, subtracting two large numbers can yield a very wrong result. This is called numerical cancellation or loss of significance. This problem can be avoided by numerically computing P(C | m1) as in (4.40), which entails always summing small positive values, and then P(E) as P(E) = P(E | m1) = 1 − P(C | m1).
4.3.5. Noncoherent Detection of OOK Signals. Especially in optical communications, another case of interest where the signals are not perfectly known by the receiver is binary on-off keying (OOK) signaling. Here, the transmitted signals si(t) are as in (4.16) with
A_1(t) = A ,   A_2(t) = 0 ,   φ_i(t) = 0 ,   i = 1, 2 ,   (4.45)
so that
s_1(t) = A cos ω0 t ,   s_2(t) = 0 ,   0 ≤ t ≤ T ,
while the received ones are
s_1(t, θ) = A cos(ω0 t − θ) ,   s_2(t, θ) = 0 ,   0 ≤ t ≤ T ,
where θ is an unknown phase shift, modeled as a random variable uniformly distributed in the interval (−π, π):
p(θ) = 1/(2π) for |θ| ≤ π, and 0 elsewhere.
If ω0 T ≫ 1, the energy of the received signals does not depend on θ:
E_1 = (1/2) A² T ,   E_2 = 0 .
Moreover, from (4.21) and (4.45), by dropping the index i,
∫₀ᵀ r(t) s_i(t, θ) dt = z cos(θ − φ) for i = 1, and 0 for i = 2,

where, specializing (4.20) and (4.22) to this case, z= x2 + y2 y x with y= x=

r (t ) A cos 0 tdt . r (t ) A sin 0 tdt

= arctan

According to (4.19) and (4.23), assuming equally likely signals, we can neglect the term ln P(mi ) in (4.23), and therefore 2z E1 f1 ( ) = cos( ) N0 N0 f2 ( ) = 0 from which 2z 1 E1 exp f1 ( ) d = I0 exp 2 N0 N0 1 exp f2 ( ) d = 1 . 2 Replacing these expressions in (4.19), the optimum detection strategy turns out to be E1 2z m1 if I0 N0 > exp N0 m = . E1 2z m2 if I0 N < exp N0 0
1 Denoting by I0 ( x ) the inverse function of I0 ( x ), we can also write m1 if z > z0 m = m if z < z 2 0

(4.46)

where

z0 =

E1 N 0 1 I0 exp 2 N0

(4.47)

Following the same reasoning that led to Fig. 4.12, the strategy (4.46) can be implemented in a simple and cheap way as shown in Fig. 4.14. z t =T F IGURE 4.14. An equivalent cheap structure for a ber-optic system employing optical amplication can be obtained by changing the strategy (4.46) to 2 m1 if z2 > z0 m = m if z2 < z2
2 0 5 5

m1 m2 z0

Actually, the cost of such optical receiver might depend on the optical lter. As it is technologically difcult obtaining an optical lter with a preassigned impulse response, if a cheap optical lter that only approximates the matched lter is used, the optical receiver will be suboptimum.

which is implemented by the structure in Fig. 4.15. r(t ) z2 t =T F IGURE 4.15. Notice that the probability of error will be the same for both receivers of Figs. 4.14 and 4.15. 4.3.5.1. Error Probability. The probability of error can be computed as 1 P(E ) = 1 P(C ) = 1 P(C | m1 ) + P(C | m2 ) 2 1 = 1 P(z > z0 | m1 ) + P(z < z0 | m2 ) (4.48) 2 so that we need the two pdfs p(z | m1 ) and p(z | m2 ). These pdfs can be computed in the same way as done for (4.38) and (4.39). It turns out that p(z | m1 ) is a Rice pdf while p(z | m2 ) is Rayleigh
2 z2 + E1 2z 2z exp I0 E 1 N0 E 1 N0 N0 2 z 2z exp . p(z | m2 ) = E 1 N0 E 1 N0 Therefore, by the change of variable z = x E1 N0 /2, we get P(z > z0 | m1 ) = p(z | m1 )dz

m1 m2 z2 0

s1 ( T t )

p(z | m1 ) =

for z 0

2E1 x 2 + 2 E 1 / N0 x exp I0 = 2 N0 z 0 / E 1 N0 / 2 2E1 z0 , = Q1 , N0 E 1 N0 / 2 where Q1 (a, b) is the rst-order Marcum Q-function, dened as x 2 + a2 I0 (ax )dx , Q1 (a, b) x exp 2 b and such that Q1 (a, 0) = 1

z0

dx x

(4.49)

(4.50)

Q1 (0, b) = exp(b2 /2) . Performing the change of variable z = x E1 N0 , we also obtain z0 z 2 / ( E 1 N0 ) 0 z2 P(z < z0 | m2 ) = p(z | m2 )dz = exp( x )dx = 1 exp 0 , E 1 N0 0 and thus, replacing (4.49) and (4.52) in (4.48), 2 z 2 E z 1 1 0 0 . 1 + exp P(E ) = Q1 , 2 E 1 N0 N0 E 1 N0 / 2

(4.51)

(4.52)

(4.53)

This expression for P(E ) is valid for any threshold z0 , but, of course, the minimum is achieved when z0 is chosen as in (4.47). For large values of x , by using the crude approximation I0 ( x ) ex , 1 as we already did for obtaining (4.25), we have that I0 (x) ln x . So, for large signal-to-noise 6 ratios E1 /N0 , (4.47) simplies to E1 z0 (4.54) 2 and, using this threshold and writing E1 in terms of the average energy per bit Eb = E1 /2, we get 1 E E E b b b P (E ) = Q1 , (4.55) 1 + exp 2 . 2 2 N0 N0 N0 For Eb /N0 1, this expression is amenable to simplication. Indeed, for b 1 and b ba the rst-order Marcum Q-function can be approximated by the standard Q-function (3.63) as the asymptotic expansion Q1 (a, b) Q(b a). In our case, b a = Eb /N0 , such that E E E Eb b b b Q1 , 1. for 2 Q 1 N0 N0 N0 N0 However, as limb0 Q1 (a, b) = 1 for all a, it is expected that Q1 (a, a/2) Therefore, (4.55) simplies to 1 Eb P(E ) exp 2 2 N0 1 also for a

1.

(4.56)

for both small and large values of Eb /N0 , with, possibly, some negligible loss of accuracy for intermediate values. This is conrmed in Fig. 4.16, where the error probability (4.53) corresponding to the optimum threshold (4.47) is reported as the solid curve, together with the error probability for z0 = E1 /2 in (4.55) (dashed curve) and the corresponding approximation (4.56) (crosses).
FIGURE 4.16. [Error probability of noncoherent OOK versus Eb/N0: optimum threshold (4.47) (solid), threshold z0 = E1/2 (dashed), and the approximation (4.56) (crosses).]
⁶ An excellent analytic approximation of (4.47) is z0 = (E1/2) √(1 + 4/(E1/N0)^0.92).
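A sketch (in Python, with an assumed E1/N0) comparing the exact threshold (4.47), the analytic fit of footnote 6, and the high-SNR value E1/2, together with the resulting error probability (4.53); the first-order Marcum Q-function is obtained here from the noncentral chi-square survival function through the identity Q1(a, b) = P(χ'²₂(a²) > b²).

    import numpy as np
    from scipy.special import i0e
    from scipy.optimize import brentq
    from scipy.stats import ncx2

    def marcum_q1(a, b):
        return ncx2.sf(b**2, df=2, nc=a**2)

    def z0_exact(E1, N0):
        """Numerically solve I0(2*z0/N0) = exp(E1/N0), i.e. eq. (4.47)."""
        g = lambda z: np.log(i0e(2*z/N0)) + 2*z/N0 - E1/N0
        return brentq(g, 0, 10*E1)

    def pe_ook(E1, N0, z0):
        """Error probability (4.53) for an arbitrary threshold z0."""
        q = marcum_q1(np.sqrt(2*E1/N0), z0/np.sqrt(E1*N0/2))
        return 0.5*(1 - q + np.exp(-z0**2/(E1*N0)))

    E1, N0 = 1.0, 10**(-10/10)                    # E1/N0 = 10 dB (assumed)
    z_opt = z0_exact(E1, N0)
    z_fit = 0.5*E1*np.sqrt(1 + 4/(E1/N0)**0.92)   # footnote 6
    print(z_opt, z_fit, E1/2)
    print(pe_ook(E1, N0, z_opt), pe_ook(E1, N0, E1/2))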

4.3.6. Performance Comparison of Binary Systems. The probability of error of the four binary systems analyzed so far is reported as a function of Eb/N0 in Fig. 4.17. As can be seen, the PSK system with coherent detection allows saving about 3 dB with respect to coherent orthogonal FSK (CO-FSK), about 3.5 dB with respect to noncoherent OOK (NC-OOK), and about 3.7 dB with respect to noncoherent FSK (NC-FSK). Noncoherent OOK is slightly better (by about 0.2 dB) than noncoherent FSK when using the optimum threshold. However, when using the threshold (4.54) for OOK, NC-FSK and NC-OOK perform practically the same.
FIGURE 4.17. [Bit error probability of PSK (coherent), CO-FSK, NC-OOK, and NC-FSK as a function of Eb/N0.]
It should also be noted that coherent FSK can perform better than reported in Fig. 4.17. Indeed, we recall that the probability of error for coherent detection of equally probable and equal-energy binary signals is given by (3.70),
P(E) = Q( √( (1 − ρ) Eb/N0 ) ) ,
where ρ is the correlation coefficient (3.68), which in this case is
ρ = (1/Eb) ∫₀ᵀ s1(t) s2(t) dt .
Now, for the binary FSK signals
s1(t) = A cos ω1 t ,   s2(t) = A cos ω2 t ,   0 ≤ t ≤ T ,
if ωi T ≫ 1, i = 1, 2, we have that Eb = A²T/2 and, letting Δf ≜ (ω1 − ω2)/(2π),
ρ = (A²/Eb) ∫₀ᵀ cos ω1 t cos ω2 t dt = (1/T) ∫₀ᵀ cos[(ω1 − ω2)t] dt = sin(2πΔf T)/(2πΔf T) = sinc(2Δf T) .
The behavior of ρ as a function of the frequency deviation Δf is reported in Fig. 4.18; the minimizing deviation can also be located numerically, as sketched below.
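As a check on the curve of Fig. 4.18 (a numerical sketch), the deviation minimizing ρ = sinc(2ΔfT) and the resulting gain over orthogonal signaling can be found with a bounded scalar minimization.

    import numpy as np
    from scipy.optimize import minimize_scalar

    T = 1.0                                           # normalized signaling interval
    rho = lambda df: np.sinc(2*df*T)                  # np.sinc(x) = sin(pi*x)/(pi*x)
    res = minimize_scalar(rho, bounds=(0.5/T, 1.0/T), method='bounded')
    print(res.x*T, res.fun)                           # ~ 0.715 and ~ -0.22
    print(10*np.log10(1 - res.fun))                   # ~ 0.9 dB advantage over rho = 0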


[Plot for Fig. 4.18: ρ versus Δf, with zero crossings at Δf = 1/(2T), 1/T, 3/(2T) and minimum near Δf0.]

F IGURE 4.18. As can be seen, the value f0 0.715/T of the frequency deviation that minimizes does not correspond to orthogonal signals. The minimum value of the correlation coefcient is 0.22, such that Eb P(E ) = Q 1.22 N0 and thus the CO-FSK curve in Fig. 4.17 should be shifted by 10 log10 1.22 0.9 dB to the left. 4.3.7. Noncoherent Detection of Equal-Energy Binary Signals. For equally likely and equalenergy binary signals, the MAP noncoherent detection strategy is equivalently given by either (4.27) or (4.28). If the received signals are orthogonal, the bit error probability is given by (4.44). However, this formula does not hold if the received signals are not orthogonal, because it was obtained under the hypothesis that the decision r.v.s zi were independent, which now is not the case, as we will show. In this case, it is mathematically convenient rewriting the zi s in terms of complex envelopes and analytic signals (as dened in (4.9) and (4.15), respectively) because this allows a more compact notation. By dening the complex random variables7 i where xi and yi are as in (4.20), the decision r.v.s zi can be written as zi = Thus, we have i = = = = where
0 T

xi jyi

i = 1, 2 ,

xi2 + yi2 = |i | .
0 T

(4.57)

r (t ) Ai (t ) cos 0 t + i (t ) dt j r (t ) Ai (t )e j [0 t +i (t )] dt r (t )s i (t )e j 0 t dt r (t )s i (t )dt

r (t ) Ai (t ) sin 0 t + i (t ) dt

(4.58) (4.59) (4.60)

s i (t ) = Ai (t )e j i (t ) is the complex envelope (also called lowpass equivalent) of si (t ) in (4.16), while s i (t ) = s i (t )e j 0 t = si (t ) + j s i (t )


7

The r.v.s i can also be equivalently dened as i

xi + jyi . We choose the minus sign only for our convenience.

is the analytic signal associated with si (t ), being s i (t ) the Hilbert transform of si (t ). Notice that, as i (t ), the strategy based on zi or zi2 is implemented by the structures in Fig. 4.10(a) s i (t ) = si (t ) j s or Fig. 4.10(b), respectively, because the /2 phase shifters produce just s i (t ).8 Taking into account that the signals are equally probable and equal-energy and that the energy is independent of (see (4.18)), we have that E1 = E2 = Eb , where Eb is the average energy per bit, which turns out to be half the energy of the complex envelope (4.59) 1 T 2 1 T A (t )dt = |s i (t )|2 dt i . Eb = 2 0 i 2 0 Recalling Example 4.4, the transmitted signals and their Hilbert transforms are si (t ) = Ai (t ) cos 0 t + i (t ) s i (t ) = Ai (t ) sin 0 t + i (t ) = Ai (t ) cos 0 t + i (t ) /2 , while the received signals are as in (4.17), also reported here for convenience si (t , ) = A1 (t ) cos 0 t + 1 (t ) i . Thus, writing the various signals in terms of their complex envelopes and taking into account that integrals of passband terms at frequencies 20 vanish for 0 T 1, we get T T si (t , )sk (t )dt = Re s i (t )e j i e j 0 t Re s k (t )e j 0 t dt 0 0 1 T s i (t )e j i e j 0 t + s i (t )e j i e j 0 t s k (t )e j 0 t + s k (t )e j 0 t dt = 4 0 1 T = s i (t )s k (t )e j i + s i (t )s k (t )e j i dt 4 0 1 T = Re s i (t )s k (t )e j i dt . 2 0 As the complex envelope of s k (t ) is s k (t )e j /2 = j s k (t ) and Re( jz) = Im(z), drawing upon the previous result, we also obtain T T si (t , )s k (t )dt = Re s i (t )e j i e j 0 t Re s k (t )e j /2 e j 0 t dt 0 0 1 T (t )e j /2 e j i dt = Re s i (t )s k 2 0 1 T = Im s i (t )s k (t )e j i dt , 2 0 so that T 1 T si (t , ) sk (t ) j s k (t ) dt = s i (t )s k (t )dt e j i . (4.61) 2 0 0 Therefore, supposing that s1 (t ) is transmitted, r (t ) = s1 (t , ) + n(t ), where n(t ) is the AWGN, and T 1 = 1 (t ) dt = Eb e j 1 + n 1 (4.62) [s1 (t , ) + n(t )] s1 (t ) j s
0 8

Actually, the sign does not matter and either s (t ) would do.

where n 1 = n1c + jn1s is a complex r.v. whose real and imaginary parts are T n1c n(t )s1 (t )dt 0 T n1 s n(t )s 1 (t )dt . As s1 (t ) and s 1 (t ) are orthogonal and both have energy Eb , n1c and n1s are independent zero-mean Gaussian r.v.s with equal variance 2 n = E b N0 / 2 . (4.63) Besides, dening the complex-valued correlation coefcient between the complex envelopes of s1 (t ) and s2 (t ) as T 1 s 1 (t ) s 2 (t ) dt , (4.64) 2Eb 0 and taking into account (4.61), we have that T 2 (t ) dt = Eb e j 1 + n 2 , (4.65) 2 = [s1 (t , ) + n(t )] s2 (t ) j s where, similarly to n 1 , n 2 = n2c + jn2s is a complex r.v. whose real and imaginary parts are inde2 pendent zero-mean Gaussian r.v.s with equal variance n as in (4.63). Proceeding as already done for obtaining (4.38), we nd that, conditional to the transmission of message m1 , both z1 = |1 | and z2 = |2 | are independent of and are Ricean-distributed p(z1 | m1 ) =
0 0

2 z2 + Eb 2z1 2z1 exp 1 I0 E b N0 E b N0 N0 2 2 2 z + | | E b 2z2 2| |z2 p(z2 | m1 ) = exp 2 I0 . E b N0 E b N0 N0 However, using (4.60) and (4.64), T T n(t1 ) s1 (t1 ) + j s 1 (t1 ) dt1 n(t2 ) s2 (t2 ) j s 2 (t2 ) dt2 E {n 1 n 2 } = E

T
0

0 T

= =

T
0

1 (t1 )e j 0 t1 s 2 (t2 )e j 0 t2 dt1 dt2 E n(t1 )n(t2 ) s

N0 T = s 1 (t1 ) s 2 (t1 ) dt1 2 0 = E b N0 ,

N0 (t1 t2 ) s 1 (t1 ) s 2 (t2 ) e j 0 (t1 t2 ) dt1 dt2 2

(4.66)

so that, if = 0 (i.e., if s1 (t ) and s2 (t ) are not orthogonal), n 1 and n 2 are correlated, and thus z1 = |1 | = |Eb e j 1 + n 1 | and z2 = |2 | = | Eb e j 1 + n 2 | are not independent. Therefore, even if the error probability is still given by (4.31)
0

where

P(E ) = P(E | m1 ) = 1 P(C | m1 ) , P(C | m1 ) = P(z1 > z2 | m1 ) =


z1 0

p(z1 , z2 | m1 )dz1 dz2 ,

now (4.36) does not hold anymore, i.e., and thus we cannot compute P(C | m1 ) as in (4.37). p(z1 , z2 | m1 ) = p(z1 | m1 ) p(z2 | m1 )

Instead of nding the joint pdf p(z1 , z2 | m1 ), the error probability can be alternatively expressed as where
2 z1 2 2 2 2 P(E ) = P(E | m1 ) = P(z1 < z2 | m1 ) = P(z1 < z2 | m1 ) = P(z1 z2 < 0 | m1 ) , 2 z2

(4.67)

is a quadratic form in complex-valued Gaussian r.v.s. Rewriting (4.62) and (4.65) as 1 e j 1 = Eb + n 1 e j 1 2 e j 1 = Eb + n 2 e j 1

we observe that the joint statistics of n 1 and n 2 are invariant to a phase rotation because En i e j 1 n k e j 1 Thus, letting En i e j 1 | 1 = e j 1 E n i | 1 = 0

| 1 = E n i n k | 1 .

1 = E b + n 1 2 = Eb + n 2 we have that
2 2 z1 z2 |1 |2 |2 |2 , (4.68) where means that LHS and RHS are statistically equivalent. For further development, it is convenient using a matrix notation. So, letting

we can write

1 , 2

n 1 , n 2 Eb Eb (4.69)

E { } =

= + n where denotes the Hermitian adjoint9 of , while A is the matrix A= The covariance matrix C of can be expressed as (see (4.63) and (4.66)) C = E

(4.70) (4.71)

|1 | |2 | = A ,
2 2

1 0 . 0 1

n T = =E n

E |n 1 |2 En 1 2n

and thus it is Hermitian, i.e., C = C . A standard result in matrix algebra is that a Hermitian matrix has real eigenvalues and orthogonal eigenvectors, even if the eigenvalues are not all distinct. Hence, if i is the eigenvalue associated to the eigenvector ui , i.e., if C ui = i ui ,
The Hermitian adjoint of a vector or matrix is the transpose and conjugate of the vector or matrix, such that is a short-hand for T .
9

En 2 1n E |n 1 |2

E b N0 E b N0 , E b N 0 E b N 0

(4.72)

by collecting the eigenvectors in a matrix U such that ui is the i -th column of U, we can write C U = U where = (4.73)

1 0 (4.74) 0 2 is a diagonal matrix with the diagonal elements i arranged in the same order as the columns ui of U. If the eigenvectors are normalized such that ui = 1, U turns out to be unitary, i.e., UU = I, being I the identity matrix. Therefore, multiplying both sides of (4.73) by U , C can be diagonalized as C = UU . (4.75) The matrix U that diagonalizes C is useful for devising a transformation producing independent r.v.s. However, we also need that the transformed variates have same variances. To this end, it helps factorizing the matrix as = r r (4.76) where r is the (easily invertible) square-root matrix 1/ 1 1 0 0 1 r = , r = . (4.77) 2 0 0 1/ 2 Performing now the following transformation
1 T = r U

(4.78) (4.79) so that the covariance matrix

we have that and, taking into account (4.70), = C turns out to be10

1 T r U (

1 T E { } = r U

) =

1 T , r U n T T

C = E

1 T = E r U n

1 T r U n

1 1 T Ur n = E r Un 1 1 n T Ur = r U E n 1 1 = r U C Ur .

Thus, taking into account (4.75) and (4.76), C = I (4.80) meaning that the components of are complex-valued independent Gaussian r.v.s with unit variances. Inverting (4.78) = U r and replacing in (4.71) with this expression, we obtain |1 |2 |2 |2 = B , where
10

(4.81) (4.82)

r UT AU r .

We recall that (XY)T = YT XT .

Let us now nd eigenvalues i and eigenvectors ui of C in (4.72). The generic eigenvalue and its associated eigenvector u should satisfy which has non-trivial solutions u = 0 only if so that (C I)u = 0 , (4.83)

det(C I) = (Eb N0 )2 | |2 (Eb N0 )2 = 0 ,

1 = Eb N0 (1 + | |) (4.84) 2 = Eb N0 (1 | |) . Replacing the above values in (4.83), we nd that the components of ui = (ui1 , ui2 )T , i = 1, 2, are such that | | u12 = u11 | | u22 = u21 . Therefore, choosing u11 = u21 = 1, the eigenvectors would be u1 = 1 , | |/ u2 = 1 | |/

T and it is easy to verify that they are orthogonal, as u1 u2 = 0, but u1 u1 and u2 , the unitary matrix U turns out to be

= u2

= 2. So, normalizing (4.85)

Now, letting
2

1 1 1 . U= |/ | |/ 2 | 1 2 = Eb N0 1 | |2 , UT AU = 0 1 , 1 0 2 0 1 2 = 2 . 0 0

(4.86)

as the matrix B in (4.82) proves to be

0 B = r U AU r = 1 2
T

2 2 We can also easily diagonalize B, as its eigenvalues are readily found to be 1 = and 2 = , with associated normalized eigenvectors

1 1 , v1 = 2 1 Therefore, letting V we have

1 1 v2 = . 2 1 1 1 1 2 1 1 2 0 2 0 (4.87)

B = V V .

Replacing this result in (4.81) yields |1 |2 |2 |2 = VV , and, with the further change of variable = V , we obtain
2 |1 |2 |2 |2 = = |1 |2 |2 |2 .

(4.88) (4.89)

From (4.87), (4.77) and (4.85) V


1 T r U

1 + ( |/ 1 2 2 1 )| , = 2 |/ 2 2 1 ( 1 + 2 )|

and, taking into account (4.88), (4.79), (4.69), (4.84), and (4.86) Eb 1 (1 | |) + 2 (1 + | |) 1 T E { } = V = V r U = 2 |) 1 (1 | |) 2 2 (1 + | = 1 2 Eb N0 1 | | + 1 + | | . 1 + | | 1 | |

Furthermore, from (4.88), (4.80) and (4.87), the covariance matrix C is C = E ( ) ( )T =E V ( )


T

V ( )

= VT IV =I,

= VT E ( ) ( )T V

1 T so that the components of = V r U are independent Gaussian r.v.s with unit variances. Thus, performing the last transformation

= we nally have E {} = = 1 4 E b 1 | |2 2

such that the components 1 and 2 of are independent Gaussian r.v.s both with common variance 2 . Besides, (4.89) becomes The advantage here is that, while 1 and 2 are not independent, 1 and 2 are. Therefore, recalling (4.68) and (4.67), as the joint pdf of |1 | and |2 | factorizes p|1 |,|2 | ( x , y) = p|1 | ( x ) p|2 | (y) , |1 |2 |2 |2 = |1 |2 |2 |2 .

C = E

1 | | + 1 + | | 1 + | | 1 | |
T 2 = I,

(4.90)

the error probability can be expressed as P(E ) = P(E | m1 ) = P(z1 < z2 | m1 ) = P(|1 | < |2 | | m1 ) = p|1 |,|2 | ( x , y)dxdy 0 x = p|1 | ( x ) p|2 | (y)dy dx .
0 x

(4.91)

Now, as already implicitly seen when we derived (4.38), if x1 and x2 are independent Gaussian r.v.s
2 2 + x2 is a Ricean-distributed r.v., with means i , i = 1, 2, and common variance 2 , then r = x1 whose pdf is r r 2 + s2 rs p(r ) = 2 exp I0 2 r0 (4.92) 2 2 2 2 where s2 = 1 + 2 . As i = ic + j is , i = 1, 2, and being ic and is real-valued Gaussian r.v.s, we

have that |i | =

2 2 ic + is . Thus, letting

2 and taking into account that = var(i ) = var(ic ) + var(is ), and that var(ic ) = var(is ), the pdfs 2 of |1 | and |2 | are as in (4.92) with s replaced by si and 2 = /2. Thus, from (4.90) and (4.86), the parameters specifying p|1 | ( x ) and p|2 | (y) are

si2 = |E {i }|2 = E 2 {ic } + E 2 {is }

i = 1, 2

1 2 E 1 | |2 1 + 1 | |2 2 b 1 2 2 s2 = Eb 1 | |2 1 1 | |2 2 1 2 = E b N 0 1 | |2 . 2 Performing the change of variable z = y/ and recalling (4.50), we can compute the inner integral in (4.91) as 2 y2 + s2 ys2 y p|2 | (y)dy = exp I dy 0 2 2 2 2 x x s2 z2 + (s2 / )2 = z exp I0 z dz 2 x / s2 x = Q1 , and thus 2 x 2 + s1 x xs1 s2 x P(E ) = exp I0 2 Q1 , dx . 2 2 2 0
2 s1 =

Surprisingly enough, this integral can be given a closed form expression. Letting 1 1 | |2 2 1 + 1 | |2 b= 2 a=

Eb P(E ) = Q1 a , N0 which, using the symmetry relation

it turns out that

Eb Eb a + b Eb 1 b exp I ab , 0 N0 2 2 N0 N0 2 + 2 I0 () , 2

Q1 (, ) + Q1 (, ) = 1 + exp can also be written as

1 E E E E b b b b P(E ) = (4.93) 1 Q 1 b , a + Q1 a , b . 2 N0 N0 N0 N0 For orthogonal signals, a = 0 and b = 1 because | | = 0, and, taking into account (4.51), from 1 1 (4.93) we obtain P(E ) = /2 exp( /2Eb /N0 ), as expected (see (4.44)). On the other hand, for | | = 1 (e.g., for antipodal signals) we have a = b = 1/2 and thus, from (4.93), P(E ) = 1/2. For 0 < | | < 1, P(E ) increases for increasing values of | |, as shown in Fig. 4.19.
FIGURE 4.19. [Bit error probability (4.93) of noncoherent detection of equally likely, equal-energy binary signals versus Eb/N0, for |ρ| = 0, 0.2, 0.4, 0.6, 0.8.]
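A sketch evaluating (4.93) for a few values of |ρ| (Marcum Q1 again obtained from the noncentral chi-square survival function); for |ρ| = 0 the result collapses to (4.44), as noted above.

    import numpy as np
    from scipy.stats import ncx2

    def marcum_q1(a, b):
        return ncx2.sf(b**2, df=2, nc=a**2)

    def pe_noncoherent_binary(rho_abs, snr):
        """Eq. (4.93); snr = Eb/N0 (linear), rho_abs = |rho|."""
        a = 0.5*(1 - np.sqrt(1 - rho_abs**2))
        b = 0.5*(1 + np.sqrt(1 - rho_abs**2))
        return 0.5*(1 - marcum_q1(np.sqrt(b*snr), np.sqrt(a*snr))
                      + marcum_q1(np.sqrt(a*snr), np.sqrt(b*snr)))

    snr = 10**(12/10)                        # Eb/N0 = 12 dB (assumed)
    for r in (0.0, 0.2, 0.5, 0.9):
        print(r, pe_noncoherent_binary(r, snr))
    print(0.5*np.exp(-snr/2))                # (4.44): matches the |rho| = 0 case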

CHAPTER 5

Optimum Detection in the Presence of Colored Noise


Until now we have always supposed that the noise is white. In this chapter we will remove this restriction and seek the structure and probability of error of an optimum receiver for the case in which the noise is additive, Gaussian, and colored, that is, wide-sense stationary with a non-constant power spectral density. There are different ways to accomplish this goal, but we are going to examine only a couple of them. The first one is the most direct and is based on the Kharunen-Love series expansion studied in chapter 2. However, this method might introduce issues about the realizability of the receiver. The second approach is more intuitive and is based on the concept of whitening the noise through a proper processing. However, this processing may require the observation of the received signal over a time interval larger than the duration of the transmitted signals. The reason is that now the noise autocorrelation function is not a Dirac delta anymore, and thus the noise outside the time interval occupied by the signals is correlated with the noise inside the interval, such that we can presumably get more knowledge for combating the inside-interval noise by also observing the outside-interval noise.

5.1. Kharunen-Love Series Expansion

If the additive noise is Gaussian but not white, although the received waveform can be expanded as in (3.1) using any complete orthonormal basis, in general the components (3.2) are not conditionally independent, unless the functions of the basis satisfy the Fredholm integral equation (2.65) with a = 0 and b = T. Thus, if the noise is colored, i.e., if its power spectral density N(f) is not constant, we have to actually find a Kharunen-Love basis.¹ Supposing that the noise is zero-mean and wide-sense stationary, the autocovariance function C(t1, t2) to be used in (2.65) coincides with the autocorrelation function and can be obtained as the inverse Fourier transform of N(f):
C(t1, t2) = R(t1 − t2) = ∫ N(f) e^{j2πf(t1 − t2)} df .
Once we find the eigenvalues {λ_i} and eigenfunctions {φ_i(t)} (both real-valued in our case), conditionally to m_i, the image r in (3.12) of the received signal r(t), truncated to the first N components, has joint pdf
p(r | m_i) = [ (2π)^N det C ]^{−1/2} exp[ −(1/2)(r − s_i)^T C^{−1} (r − s_i) ] ,
where now C = diag(λ1, λ2, ..., λN) and C^{−1} = diag(1/λ1, 1/λ2, ..., 1/λN),
Actually, this may be avoided, as explained later.



such that p(r | mi ) =

(2 )N det C j =1 Therefore, canceling common terms, letting N , and taking the logarithm, the MAP criterion (3.10) becomes 1 m = argmin (r j si j )2 2 ln P(mi ) . mi j j =1 As can be seen, because the eigenvalues can now be different from each other, the optimum detection strategy does not depend anymore on the distance between r and si . Expanding the square and neglecting the terms that do not depend on mi we get m = argmax
mi j =1

exp

1 2

1 (r j si j )2 . j

r j si j 1 j 2
0 T

j =1

si2j j

+ ln P(mi ) .

Taking into account that rj = si j = we can also write m = argmax


mi

r (t ) j (t )dt si (t ) j (t )dt j (t ) j ( ) d dt j
j =1

r (t )
T

si ( )
0 T

j =1

1 2 which, letting

si (t )

si ( )

j (t ) j ( ) d dt + ln P(mi ) , j

K (t , ) gi (t )
0 T

j =1

j (t ) j ( ) j

(5.1) (5.2)

si ( )K (t , )d ,
T

can be expressed in the more compact form m = argmax


mi

r (t ) gi (t )dt + Ci
0 T

(5.3) (5.4)

Ci

1 ln P(mi ) 2

si (t ) gi (t )dt .

From this expression,2 it is easy to see that a correlation receiver can be realized as in Fig. 3.3, by correlating r (t ) with gi (t ) rather than si (t ), or a matched lter receiver as in Fig. 3.4, by using gi (T t ) instead of si (T t ). However, we have to solve (2.65) and then compute gi (t ) as in (5.2) for i = 1, 2, . . . , M , which is not an easy task. Moreover, the problem of nding a Kharunen-Love
2

If j = N0 /2 for all j , from (5.1) and (5.2) it follows that gi (t ) = si (t )/(N0 /2) and thus (5.3) reduces to (3.19).

basis has a general solution only when the noise power spectral density is given as the ratio of polynomials, while in all other cases we have to get by with a numerical solution. Alternatively to nding a Kharunen-Love basis, we could directly obtain gi (t ), i = 1, . . . M , by solving M integral equations as follows. From (2.65) we have T C (t1 , t ) j (t )dt = j j (t1 ) , 0 t1 T ,
0

so that, multiplying (5.1) by C (t1 , t ) and integrating with respect to t , we get T C (t1 , t )K (t , )dt = j (t1 ) j ( ) , 0 t1 , T .
0 j =1

Accounting for (2.66) and the fact that j (t ) is real-valued, we multiply again this last expression by C (, t2 ) = C (t2 , ) and integrate with respect to , obtaining T T C (, t2 ) C (t1 , t )K (t , )dt d = j j (t1 ) j (t2 ) = C (t1 , t2 ) , 0 t1 , t2 T .
0 0 j =1

This implies that the inner integral must be a Dirac delta, i.e., T C (t1 , t )K (t , )dt = (t1 ) , 0 t1 , T .
0

Thus, multiplying (5.2) by C (t1 , t ) and integrating with respect to t we obtain T C (t1 , t ) gi (t )dt = si (t1 ) , 0 t1 T ,
0

(5.5)

which is the i-th integral equation to be solved for gi(t). Assuming that gi(t) vanishes outside the time interval (0, T), the integral equation (5.5) can be solved in a quite straightforward way by means of a Fourier transformation. Indeed, recalling our assumption of zero-mean and wide-sense stationary Gaussian noise, C(t1, t) = R(t1 − t). Thus, if we assume that gi(t) = 0 for t < 0 and t > T, (5.5) can be written as
∫ R(t1 − t) gi(t) dt = si(t1) ,
where the integral is simply the convolution of R(t) and gi(t). Hence, denoting by Gi(f) and Si(f) the Fourier transforms of gi(t) and si(t), respectively, and as the Fourier transform of R(t) is the noise power spectral density N(f), we have N(f) Gi(f) = Si(f), from which we get
Gi(f) = Si(f) / N(f)   (5.6)
and therefore
gi(t) = F^{−1}{ Gi(f) } = ∫ Gi(f) e^{j2πft} df .
When computed in this way, gi(t) may not vanish outside the time interval (0, T). However, as gi(t) must satisfy (5.5), we point out that only its shape for 0 ≤ t ≤ T matters.
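A numerical sketch of (5.6): given a rational noise spectrum (the Lorentzian of Example 5.1 below is used here) and a signal si(t), gi(t) = F⁻¹{Si(f)/N(f)} can be approximated on a grid with the FFT; all grid parameters are assumptions for the example, and only the shape of gi(t) on (0, T) is actually needed, as remarked above.

    import numpy as np

    T, fs, Npts = 1.0, 1024.0, 8192
    t = np.arange(Npts)/fs                   # long grid, so g_i(t) may spill outside (0, T)
    wc = 2*np.pi*4/T                         # w_c*T = 8*pi (four cycles, as in (5.7) with N = 4)
    s = np.where(t <= T, np.sin(wc*t), 0.0)  # s_i(t) on (0, T)

    N0, f0 = 1.0, 2.0
    f = np.fft.fftfreq(Npts, 1/fs)
    Nf = (N0/2)/(1 + (f/f0)**2)              # colored-noise spectrum, eq. (5.8)

    S = np.fft.fft(s)/fs                     # grid approximation of S_i(f)
    G = S/Nf                                 # eq. (5.6)
    g = np.fft.ifft(G).real*fs               # g_i(t) on the same grid
    print(g[:8])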

Although the detection problem in the presence of colored Gaussian noise can be considered formally solved, when following this procedure there is the risk of obtaining a non-realizable function, for example because gi (t ) contains singularities. This aspect is analyzed in the following example. E XAMPLE 5.1. A binary transmission system employs the equally probable antipodal signals A sin c t 0 t T s1 (t ) = s2 (t ) = , c T = 2 N , (5.7) 0 elsewhere N( f ) = From (5.6) we have N0 1 . 2 1 + ( f / f0 )2 (5.8)

while the power spectral density of the additive Gaussian noise is

2 1 + ( f / f0 )2 , N0 so that g1 (t ) = g2 (t ), because s1 (t ) = s2 (t ). Recalling that Gi ( f ) = Si ( f ) F and letting 0


1

(5.9)

d n s(t ) , ( j 2 f ) S ( f ) = dt n
n

2 f0 , the inverse Fourier transform of (5.9) yields gi (t ) = 2 1 d 2 si (t ) si (t ) 2 . N0 0 dt 2 0tT . elsewhere (5.10)

As c T = 2 N , s1 (0) = s1 (T ) = 0 and thus s1 (t )

Thus, g1 (t ) would not be realizable in this case and we could not use this approach for devising an optimum receiver.3 However, as this procedure is based on a gi (t ) that satises (5.5), let us see whether, even if neglecting the impulses, we can at least approximate (5.5). So, we use a function g1 (t ) obtained by replacing (5.11) and (5.12) (but dropping the Dirac deltas) in (5.10). Proceeding in this way, we get 2 2 c sin c t 0 t T N0 A 1 + 2 0 . (5.13) g1 (t ) = 0 elsewhere Now, the inverse Fourier transform of (5.8) yields R(t1 t2 ) =
3

However, s1 (t ) is not continuous in t = 0 and t = T , where it jumps from 0 to c A and viceversa, such that d 2 s1 (t ) 0tT c A (t ) (t T ) 2 c A sin c t = . (5.12) s1 (t ) 2 dt 0 elsewhere

ds1 (t ) c A cos c t = 0 dt

(5.11)

N0 0 e0 |t1 t2 | 4

Actually, as g1 (t ) appears always inside an integral, the strategy (5.3) is still realizable by a more complex receiver requiring also the samples of the received waveform r (t ) at t = 0 and t = T .

and hence R(t1 t2 ) g1 (t2 )dt2 =

where we have broken the integral into two parts such that |t1 t2 | = t1 t2 for 0 t2 t1 , and |t1 t2 | = t2 t1 for t1 t2 T . It turns out that t1 c 0 sin c t1 c cos c t1 e0 (t1 t2 ) sin c t2 dt2 = 2 e0 t1 + 2 2 0 + c 2 0 0 + c T 0 sin c T + c cos c T 0 (T t1 ) 0 sin c t1 + c cos c t1 e0 (t2 t1 ) sin c t2 dt2 = e + 2 2 2 2 t1 0 + c 0 + c So, taking into account that sin c T = 0 and cos c T = 1 (because c T = 2 N ), we obtain T c 0 T /2 R(t1 t2 ) g1 (t2 )dt2 = Ae sinh [0 (T/2 t1 )] + A sin c t1 , (5.14) 0 0 while, according to (5.5), we should have obtained s1 (t1 ) = A sin c t1 . This is of course expected, because, for the sake of realizability, we have neglected the impulses in (5.12). However, as and e0 T /2 sinh [0 (T/2 t1 )] 1
0 T

R(t1 t2 ) g1 (t2 )dt2 t1 T 2 A0 c 0 (t1 t2 ) = 1+ 2 e sin c t2 dt2 + e0 (t2 t1 ) sin c t2 dt2 2 0 0 t1


0

for 0 t1 T except for t1 = 0 and t1 = T ,

for 0 T 1 the rst term in the RHS of (5.14) becomes negligible with respect to the second one (irrespective of c T ), such that, with g1 (t ) as in (5.13), T R(t1 t2 ) g1 (t2 )dt2 s1 (t1 ) (5.15) Now, as s2 (t ) = s1 (t ) and g2 (t ) = g1 (t ), and being the signals equally likely, the constants Ci in (5.4) are such that C1 = C2 and can be dropped. Therefore, an optimum receiver can be realized as in Fig. 5.1(a) (correlation receiver) or Fig. 5.1(b) (matched lter receiver).
r (t )
T 0
0

lim e0 T /2 sinh [0 (T/2 t1 )] = 0

r (t )

g1 (T t ) (b)

z t=T

g1 (t )

(a)

m1 m = m2

if z > 0 if z < 0

F IGURE 5.1.

P(E ) = P(E | m1 ) = P(z < 0 | m1 ) , where, supposing that m1 is transmitted such that r (t ) = s1 (t ) + n(t ), T T z= s1 (t ) g1 (t )dt + n(t ) g1 (t )dt = s11 + n1 .
0 0

For symmetry,

From (5.7) and (5.13) s11

2 2 2 c s1 (t ) g1 (t )dt = A 1+ 2 N0 0 = = 2 2 2 c A 1+ 2 N0 0 2 2Eb c 1+ 2 N0 0

sin2 c t dt 1 cos 2c t dt 2 (5.16)

where Eb = A2 T /2 is the average energy per bit. As E {n(t )} = 0, the noise component T n1 n(t ) g1 (t )dt
0

is such that E {n1 } = 0, and therefore its variance is T 2 2 n(t1 ) g1 (t1 )dt1 n = E {n1 } = E =
T

n(t2 ) g1 (t2 )dt2

E {n(t1 )n(t2 )} g1 (t1 ) g1 (t2 )dt1 dt2


T

R(t1 t2 ) g1 (t2 )dt2 g1 (t1 )dt1

2 2Eb c 1+ 2 , N0 0

s1 (t1 ) g1 (t1 )dt1 (5.17)

where we used (5.15).4 Thus, taking into account that n1 is zero-mean, from (5.16) and (5.17) we get P(E ) = P(z < 0 | m1 ) = P(s11 + n1 < 0) = P(n1 < s11 ) = P(n1 > s11 ) = Q 2 2 E b c = Q N0 1 + 2 . 0 s11 n (5.18)

As can be seen, even with rational power spectra, this procedure may easily lead to difficulties. Hence, it may be better to follow the so-called whitening approach, based on performing a preliminary processing on r(t) to transform the problem into a white Gaussian noise problem and then use the white Gaussian noise solution obtained in the previous chapters. Indeed, if the preliminary processing is reversible, it has no effect on the performance of the system, as we are going to demonstrate.
We point out that the actual variance is slightly larger than this value, because, when g1 (t ) is as in (5.13), the exact relation is (5.14).
4

5.2. Reversibility Theorem

The reversibility theorem states that if R1 is an optimum receiver for r(t), then a receiver R2 optimum (according to the same criterion) for a reversible transformation of r(t) has the same performance as R1.
FIGURE 5.2. [Block diagram: R1 operating on r(t) with error probability P1; the reversible transformation T producing r′(t), followed by R2 (error probability P2); and R3, the cascade of T⁻¹ and R1, operating on r′(t) (error probability P3).]
With reference to Fig. 5.2, this means that if R1 is optimum for r(t) with probability of error P1, and R2 is optimum for r′(t) with probability of error P2, then P1 = P2 if T is a reversible transformation. Note that the theorem is valid whatever the transformation T, be it linear or not, time invariant or not. What matters is the reversibility of T, such that r(t) can be reconstructed from its transformed image. A simple proof of the theorem is as follows. Clearly, R2 cannot perform better than R1, or this would contradict our statement that R1 is optimum for r(t) (indeed, a better receiver for r(t) would be the cascade of T and R2), so P2 ≥ P1. On the other hand, we now show that R2 cannot perform worse than R1. Indeed, the receiver R3, which operates on r′(t), is such that P3 ≥ P2, otherwise R2 would not be optimum for r′(t). However, being R3 the cascade of the inverse transformation T⁻¹ and R1, which operates on r(t), we have that P3 = P1. In conclusion, P1 ≤ P2 ≤ P3 = P1, and thus P1 = P2 if T is reversible. As a last note, from our discussion it is clear that the reversibility of T is only a sufficient condition for having P1 = P2; it is not necessary.

5.3. Whitening Filter and Spectral Factorization

A linear and time-invariant transformation is a filter characterized by a transfer function H(f), defined as the Fourier transform of the impulse response h(t). A filter H(f) is reversible if its input r(t) can be reconstructed from the output r′(t) through a physically realizable filter H⁻¹(f). Clearly, if H(f) is reversible,
H⁻¹(f) = 1/H(f) .
Physical realizability implies stability and causality. By definition, a system is stable if its output is bounded for every bounded input, while it is causal if the output does not appear before applying the input, i.e., if it is not anticipative (so, a filter is causal if its impulse response vanishes for t < 0). As regards causality, a very general criterion is provided by the Paley-Wiener theorem, which states


that a necessary and sufficient condition for a filter H(f) to be realizable is that |H(f)| is square integrable and

\[ \int_{-\infty}^{\infty} \frac{\left|\ln|H(f)|\right|}{1+f^2}\, df < \infty , \tag{5.19} \]

meaning that the amplitude response |H(f)| cannot have too great a total attenuation. It may vanish for a discrete set of frequencies, but it cannot vanish over a finite band of frequencies. Moreover, from (5.19) it follows that |H(f)| cannot decay faster than exponentially, such that H(f) = e^{-|f|} corresponds to a realizable filter, but H(f) = e^{-f²} is not realizable.⁵ As regards stability, a necessary and sufficient condition is that the impulse response is absolutely integrable,

\[ \int_{-\infty}^{\infty} |h(t)|\, dt < \infty . \]

A simpler criterion for physical realizability, accounting for both stability and causality, is the following one. By analytic continuation, we extend H(f) to the complex plane s = σ + jω through the substitution f → s/(j2π). To avoid introducing further symbols, we let H(s) ≜ H(s/(j2π)).⁶ It can be shown that the filter H(f) is stable and causal if H(s) satisfies the following conditions:

(1) H(s) does not have poles in the right half of the s-plane.
(2) H(s) does not have multiple poles on the imaginary axis (such poles must be simple).
(3) If H(s) is a ratio of polynomials in s, the degree of the numerator of H(s) should not exceed the degree of the denominator by more than 1.

Thus, if all poles and zeros of H(s) are in the left half of the s-plane⁷ and those on the imaginary axis are not multiple, both H(f) and 1/H(f) are stable and causal, and hence H(f) is reversible. In this case, due to the reversibility theorem, an optimum receiver R1 for r(t) = s(t) + n(t) has the same probability of error as an optimum receiver R2 for the filtered signal (see Fig. 5.3)

\[ r'(t) = r(t) \ast h(t) = s(t) \ast h(t) + n(t) \ast h(t) = s'(t) + n'(t) . \]

FIGURE 5.3. [The filtered observation: r(t) → h(t) → r'(t) → R2.]

Denoting by Si(f) and S'i(f) the Fourier transforms of si(t) and s'i(t), and by N(f) and N'(f) the power spectral densities of n(t) and n'(t) = n(t) ∗ h(t), respectively, we have

\[ S'_i(f) = S_i(f)\, H(f) \tag{5.20} \]
\[ N'(f) = N(f)\, |H(f)|^2 . \tag{5.21} \]

⁵ This is not a difficulty but a theoretical impossibility. If the Paley-Wiener criterion is satisfied, one can associate a phase response θ(f) to |H(f)| such that |H(f)| e^{jθ(f)} is a causal filter, otherwise not. Of course, one could always approximate |H(f)| through a realizable filter.
⁶ The function H(s) that we obtain is nothing more than the bilateral Laplace transform of the impulse response h(t), defined as H(s) ≜ ∫ h(t) e^{-st} dt. As can be seen, Laplace and Fourier transforms can be formally obtained from each other by s = j2πf. However, we should bear in mind that this relation holds only if the Laplace transform exists also for σ = 0.
⁷ A filter whose poles and zeros all lie in the left half of the s-plane is referred to as minimum phase.
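As a small numerical illustration of the pole-location conditions (1)-(3) stated above, the sketch below checks a rational H(s) given by its numerator and denominator coefficients. The example transfer functions are assumptions chosen only for demonstration.

```python
import numpy as np

def is_stable_causal(num, den, tol=1e-9):
    """Rough check of the conditions for a rational H(s) = num(s)/den(s)
    (coefficients in decreasing powers of s): no poles in the open right
    half-plane, no multiple poles on the imaginary axis, and numerator
    degree not exceeding denominator degree by more than 1."""
    poles = np.roots(den)
    if np.any(poles.real > tol):
        return False                      # pole in the right half-plane
    imag_axis = poles[np.abs(poles.real) <= tol]
    for p in imag_axis:                   # multiple poles on the j-omega axis?
        if np.sum(np.abs(imag_axis - p) <= tol) > 1:
            return False
    return (len(num) - 1) <= (len(den) - 1) + 1

# Assumed examples: H(s) = (s+2)/(s^2+3s+2) has poles at -1, -2 (realizable);
# H(s) = 1/(s-1) has a pole at s = +1 (not realizable).
print(is_stable_causal([1, 2], [1, 3, 2]))   # True
print(is_stable_causal([1], [1, -1]))        # False
```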


Hence, if we are able to choose H(f) such that N'(f) is constant, the noise at the output of the filter would be white and we could devise an optimum receiver which observes the signals {s'i(t)}ᵢ₌₁ᴹ as done in the previous chapters. Moreover, if such a filter is reversible, this solution would also be optimum for the detection of the original signals {si(t)}ᵢ₌₁ᴹ corrupted by additive non-white noise.

Letting N'(f) = N₀/2 and accounting for (5.21), the whitening filter is such that⁸

\[ |H(f)|^2 = \frac{N_0/2}{N(f)} . \]

However, we cannot choose H(f) simply as the square root of the RHS, because we should also take into account physical realizability. The problem of finding a realizable whitening filter can be easily solved if the power spectral density N(f) can be expressed (or approximated) as a ratio of polynomials in f². As the impulse response h(t) is real-valued, its Fourier transform has Hermitian symmetry, so that |H(f)|² = H(f)H*(f) = H(f)H(−f). Therefore, performing the substitution f → s/(j2π), the whitening filter should satisfy

\[ H(s)H(-s) = \frac{N_0/2}{N(s)} . \tag{5.22} \]

Now, as N(f) is a real-valued, even and non-negative function whose inverse Fourier transform is an autocorrelation function, it can be shown that (see Fig. 5.4(a)):
FIGURE 5.4. [(a) Pole-zero pattern of N(s) in the s-plane; (b) assignment of the poles and zeros of 1/N(s) to H(s) and H(−s).]

Poles and zeros of N(s) are symmetrical with respect to the σ and jω axes. The zeros on the jω axis always occur in pairs. There are no poles on the jω axis.

Using this result, it is easy to infer from (5.22) the expression of H(s). Indeed, taking into account that both H(s) and H⁻¹(s) should be realizable for H(f) to be reversible, and that the poles of H(s) are the zeros of H⁻¹(s) and vice versa, it follows that H(s) should have neither poles nor zeros in the right half of the s-plane. Hence, as poles and zeros of N(s) are zeros and poles, respectively, of H(s)H(−s), we should assign all poles and zeros of 1/N(s) lying in the left half of the s-plane to H(s), and all poles and zeros of 1/N(s) lying in the right half of the s-plane to H(−s). If there are double zeros on the jω axis (resulting from double zeros of N(s)), one of them is assigned to H(s) and the other one to H(−s), as shown in Fig. 5.4(b).
⁸ The choice of N₀/2 is arbitrary, because it only affects H(f) by a multiplicative constant, leaving the signal-to-noise ratio unchanged.
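As a numerical counterpart of this pole/zero assignment, the sketch below factors a rational Q(s) = (N₀/2)/N(s), given as a ratio of polynomials in s, into H(s)H(−s) by assigning the left-half-plane roots to H(s). It is only a sketch under simplifying assumptions (simple roots, none on the jω axis, gain handled separately); the numerical example is assumed for illustration.

```python
import numpy as np

def lhp_factor(coeffs, tol=1e-9):
    """Monic polynomial whose roots are the left-half-plane roots of coeffs
    (coefficients in decreasing powers of s)."""
    roots = np.roots(coeffs)
    keep = roots[roots.real < -tol]
    return np.real_if_close(np.poly(keep)) if len(keep) else np.array([1.0])

def whitening_filter(q_num, q_den):
    """Spectral factorization sketch: split Q(s) = q_num(s)/q_den(s) = H(s)H(-s),
    returning numerator and denominator of H(s) up to a constant gain."""
    return lhp_factor(q_num), lhp_factor(q_den)

# Assumed example: Q(s) = 1 - (s/w0)^2 with w0 = 1, i.e. (1+s)(1-s)
h_num, h_den = whitening_filter([-1.0, 0.0, 1.0], [1.0])
print(h_num, h_den)   # ~ [1. 1.] / [1.]  ->  H(s) proportional to 1 + s
```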


Once H(s) is determined, letting s = j2πf we obtain H(f) and, using (5.20), the expression of the filtered signals {s'i(t)}ᵢ₌₁ᴹ as

\[ s'_i(t) = \int_{-\infty}^{\infty} S_i(f)\, H(f)\, e^{j2\pi f t}\, df , \]

such that the MAP strategy for their detection in AWGN can be derived.

Note that, due to the whitening filter, the duration T' of the signals {s'i(t)}ᵢ₌₁ᴹ is in general larger than T. Thus, the observation interval of the receiver operating on r'(t) is (0, T') and not (0, T). This receiver is equivalent to an optimum receiver observing r(t) over a time interval in general larger than T. Thus, unlike the AWGN case, the decision strategy may take advantage of the observation of r(t) even over a time interval where the transmitted signal is not present. As already noted, this is explained by the fact that, the noise being correlated, observing its shape outside the signaling interval may provide information on the shape of the noise inside the signaling interval. However, allowing the receiver more time for making a decision lowers the signaling rate. In the next chapter we will see how this can be avoided.

EXAMPLE 5.2. Let us find an optimum receiver for the binary transmission system of Example 5.1 by using the whitening approach. Letting f = s/(j2π) and ω₀ ≜ 2πf₀, from (5.8) we get

\[ N(s) = \frac{N_0}{2}\,\frac{1}{1-(s/\omega_0)^2} = \frac{N_0}{2}\,\frac{1}{(1+s/\omega_0)(1-s/\omega_0)} , \]

which, substituted in (5.22), yields

\[ H(s)H(-s) = \left(1+\frac{s}{\omega_0}\right)\left(1-\frac{s}{\omega_0}\right) , \]

such that

\[ H(s) = 1+\frac{s}{\omega_0} \]

and thus

\[ H(f) = 1+\frac{j2\pi f}{\omega_0} . \]

Hence, the whitening filter can be realized as shown in Fig. 5.5.
FIGURE 5.5. [Whitening filter realized as r'(t) = r(t) + (1/ω₀) dr(t)/dt.]

The signals s'₁(t) and s'₂(t) are readily found to be

\[ s'_1(t) = -s'_2(t) = s_1(t) + \frac{1}{\omega_0}\frac{ds_1(t)}{dt} = A\sin\omega_c t + A\frac{\omega_c}{\omega_0}\cos\omega_c t = A\sqrt{1+\frac{\omega_c^2}{\omega_0^2}}\,\sin(\omega_c t+\varphi) , \qquad 0 \le t \le T , \]


where φ = arctan(ωc/ω₀). The distance between these signals is

\[ d' = \sqrt{\int_0^{T} \left[s'_1(t)-s'_2(t)\right]^2 dt} = \sqrt{4A^2\left(1+\frac{\omega_c^2}{\omega_0^2}\right)\int_0^{T}\sin^2(\omega_c t+\varphi)\,dt} = 2\sqrt{E_b\left(1+\frac{\omega_c^2}{\omega_0^2}\right)} , \]

and thus the probability of error is

\[ P_b = Q\!\left(\sqrt{\frac{d'^2}{2N_0}}\right) = Q\!\left(\sqrt{\frac{2E_b}{N_0}\left(1+\frac{\omega_c^2}{\omega_0^2}\right)}\right) , \]

the same result as (5.18), as it should be. In the AWGN case with power spectral density N₀/2, we would have obtained

\[ d = \sqrt{4A^2\int_0^{T}\sin^2\omega_c t\,dt} = 2\sqrt{E_b} , \]

such that

\[ P_b = Q\!\left(\sqrt{\frac{2E_b}{N_0}}\right) , \]

a larger value, which is explained by the fact that, over the signal bandwidth, N(f) ≤ N₀/2.

EXAMPLE 5.3. A binary transmission system employs the equally likely and antipodal signals

\[ s_1(t) = -s_2(t) = \begin{cases} A & 0 \le t \le T \\ 0 & \text{elsewhere} \end{cases} , \tag{5.23} \]

while the power spectral density of the additive zero-mean Gaussian noise is

\[ N(f) = \frac{N_0}{2}\,\frac{1+(f/f_0)^2}{(f/f_0)^2} . \tag{5.24} \]

Letting ω₀ ≜ 2πf₀, from (5.22), the whitening filter is such that

\[ H(s)H(-s) = \frac{(s/\omega_0)^2}{(s/\omega_0)^2-1} = \frac{s/\omega_0}{(s/\omega_0)+1}\cdot\frac{s/\omega_0}{(s/\omega_0)-1} \]


and thus

\[ H(s) = \frac{s/\omega_0}{1+s/\omega_0} = 1-\frac{1}{1+s/\omega_0} , \qquad H(f) = 1-\frac{1}{1+jf/f_0} . \]

The inverse Fourier transform of H(f) yields

\[ h(t) = \delta(t) - \omega_0\, e^{-\omega_0 t}\, u(t) , \]

where u(t) is the unit step function. Hence, the signals at the output of the whitening filter are

\[ s'_1(t) = -s'_2(t) = s_1(t) \ast h(t) = s_1(t) - \omega_0\, s_1(t) \ast e^{-\omega_0 t}u(t) = s_1(t) - \omega_0\!\int_{-\infty}^{\infty}\! s_1(\tau)\, e^{-\omega_0(t-\tau)}\, u(t-\tau)\,d\tau = s_1(t) - \omega_0 A\, e^{-\omega_0 t}\!\int_0^{\min(T,t)}\! e^{\omega_0\tau}\,d\tau , \]

where we accounted for the fact that s₁(τ) = 0 for τ < 0 and τ > T, while u(t−τ) = 0 for τ > t. Performing the calculation yields

\[ s'_1(t) = -s'_2(t) = \begin{cases} A\, e^{-\omega_0 t} & 0 \le t \le T \\ -A\left(e^{\omega_0 T}-1\right) e^{-\omega_0 t} & t > T \end{cases} . \]

As can be seen, the optimum receiver should observe the signals for an infinitely long time. Let us see what happens when the receiver observes the signals either over (0, T) or (0, ∞). Letting D equal either T or ∞, depending on the duration of the observation interval, as the signals are equally likely and antipodal, the structure of the optimum receiver is as shown in Fig. 5.6.
FIGURE 5.6. [Receiver: r(t) → whitening filter H(f) = 1 − 1/(1+jf/f₀) → r'(t) → correlation with s'₁(t) over (0, D) → z; decide m̂ = m₁ if z > 0, m̂ = m₂ if z < 0.]

Letting

\[ E_D = \int_0^D |s'_1(t)|^2\,dt , \]

the probability of error can be expressed as

\[ P_b(E_D) = Q\!\left(\sqrt{\frac{2E_D}{N_0}}\right) . \tag{5.25} \]

Specifically, we get

\[ E_T = \int_0^T |s'_1(t)|^2\,dt = \frac{A^2}{2\omega_0}\left(1-e^{-2\omega_0 T}\right) = E_b\,\frac{1-e^{-2\omega_0 T}}{2\omega_0 T} \tag{5.26} \]

\[ E_\infty = \int_0^\infty |s'_1(t)|^2\,dt = E_T + \frac{A^2}{2\omega_0}\left(1-e^{-\omega_0 T}\right)^2 = E_b\,\frac{1-e^{-\omega_0 T}}{\omega_0 T} \tag{5.27} \]


where

\[ E_b = \int_0^T |s_1(t)|^2\,dt = A^2 T \]

is the average energy per bit. As can be seen, E_T ≤ E_∞, and thus P_b(E_T) ≥ P_b(E_∞). As

\[ \lim_{\omega_0 \to 0} \frac{E_T}{E_\infty} = \lim_{\omega_0 \to 0} \frac{1}{2}\,\frac{1-e^{-2\omega_0 T}}{1-e^{-\omega_0 T}} = 1 , \]

if ω₀T ≪ 1, we can limit the observation to the interval (0, T) without significantly increasing the probability of error. This is due to the fact that

\[ \lim_{\omega_0 \to 0} N(f) = \frac{N_0}{2} , \]

i.e., the noise becomes white. However, as

\[ \lim_{\omega_0 T \to \infty} \frac{E_T}{E_\infty} = \lim_{\omega_0 T \to \infty} \frac{1}{2}\,\frac{1-e^{-2\omega_0 T}}{1-e^{-\omega_0 T}} = \frac{1}{2} , \]

if ω₀T ≫ 1, for maintaining the same probability of error while observing the signals in the interval (0, T), we need 3 dB more energy. Note that already for ω₀T = 4, E_T/E_∞ ≈ 0.51.

EXAMPLE 5.4. Let us now solve Example 5.3 through the Karhunen-Loève series approach. Rewriting (5.24) as

\[ N(f) = \frac{N_0}{2}\left(1+\frac{\omega_0^2}{(2\pi f)^2}\right) , \tag{5.28} \]

where ω₀ ≜ 2πf₀, and letting τ = t₁ − t₂, the corresponding autocorrelation function is

\[ R(\tau) = \frac{N_0}{2}\left(\delta(\tau) + \omega_0^2\, \mathcal{F}^{-1}\!\left\{\frac{1}{(2\pi f)^2}\right\}\right) . \tag{5.29} \]

In order to find the inverse Fourier transform of 1/(2πf)², we reason as follows. Given a signal x(t) such that

\[ \frac{d^2 x(t)}{dt^2} = \delta(t) , \tag{5.30} \]

by taking the Fourier transform of both sides, we get (j2πf)² X(f) = 1, i.e.,

\[ X(f) = -\frac{1}{(2\pi f)^2} \]

and thus

\[ \mathcal{F}^{-1}\!\left\{\frac{1}{(2\pi f)^2}\right\} = -x(t) . \]

A signal x(t) satisfying (5.30) is x(t) = ½|t| + at + b, where a and b are constants. As the autocorrelation function must be even and no Dirac deltas appear in (5.28), necessarily a = b = 0, such that (5.29) becomes

\[ R(\tau) = \frac{N_0}{2}\left(\delta(\tau) - \frac{\omega_0^2}{2}|\tau|\right) . \]

As the noise is zero-mean, C(t, τ) = R(t−τ) and, replacing this expression in (5.5), we have

\[ \int_0^T R(t-\tau)\, g_i(\tau)\,d\tau = \frac{N_0}{2}\int_0^T \left(\delta(t-\tau)-\frac{\omega_0^2}{2}|t-\tau|\right) g_i(\tau)\,d\tau = s_i(t) , \qquad 0 \le t \le T , \tag{5.31} \]


which, performing the integral involving the Dirac delta, can also be written as

\[ \frac{N_0}{2}\left(g_i(t) - \frac{\omega_0^2}{2}\int_0^T |t-\tau|\, g_i(\tau)\,d\tau\right) = s_i(t) , \qquad 0 \le t \le T . \tag{5.32} \]

Differentiating two times with respect to t, we first obtain

\[ \frac{N_0}{2}\left(\frac{dg_i(t)}{dt} - \frac{\omega_0^2}{2}\int_0^T \mathrm{sgn}(t-\tau)\, g_i(\tau)\,d\tau\right) = \frac{ds_i(t)}{dt} , \qquad 0 \le t \le T , \]

and then

\[ \frac{N_0}{2}\left(\frac{d^2 g_i(t)}{dt^2} - \omega_0^2\int_0^T \delta(t-\tau)\, g_i(\tau)\,d\tau\right) = \frac{d^2 s_i(t)}{dt^2} , \qquad 0 \le t \le T . \]

By performing the integration we finally get

\[ \frac{N_0}{2}\left(\frac{d^2 g_i(t)}{dt^2} - \omega_0^2\, g_i(t)\right) = \frac{d^2 s_i(t)}{dt^2} , \qquad 0 \le t \le T , \]

whose Fourier transform is

\[ \frac{N_0}{2}\left[(j2\pi f)^2 - \omega_0^2\right] G_i(f) = (j2\pi f)^2\, S_i(f) . \tag{5.33} \]

Notice that, if we require that both sides of (5.32) vanish outside the time interval 0 ≤ t ≤ T, this would correspond to multiplying both sides by u(t) − u(t−T) and dropping the restriction 0 ≤ t ≤ T. Hence, taking the Fourier transform, both sides of (5.33) should be convolved with F{u(t) − u(t−T)} = T sinc(fT) exp(−jπfT). However, (5.33) would still hold, as X(f) ∗ V(f) = Y(f) ∗ V(f) implies X(f) = Y(f). Thus, we can simply ignore the restriction 0 ≤ t ≤ T in (5.32), solve (5.33) and then disregard the result outside the observation interval. From (5.33) we get

\[ G_i(f) = \frac{2}{N_0}\,\frac{(2\pi f)^2}{(2\pi f)^2+\omega_0^2}\, S_i(f) = -\frac{1}{N_0}\left[j2\pi f\, S_i(f)\right]\frac{1}{\omega_0}\left[j2\pi f\,\frac{2\omega_0}{(2\pi f)^2+\omega_0^2}\right] , \]

which, of course, could have been obtained directly from (5.6). Taking into account (5.23) and that, consulting a table of Fourier transform pairs,

\[ e^{-\omega_0|t|} \;\longleftrightarrow\; \frac{2\omega_0}{(2\pi f)^2+\omega_0^2} , \]

we have

\[ j2\pi f\, S_1(f) \;\longleftrightarrow\; A\left[\delta(t)-\delta(t-T)\right] , \qquad j2\pi f\,\frac{2\omega_0}{(2\pi f)^2+\omega_0^2} \;\longleftrightarrow\; -\omega_0\,\mathrm{sgn}(t)\, e^{-\omega_0|t|} , \]

leading to

\[ g_1(t) = \frac{A}{N_0}\left[\delta(t)-\delta(t-T)\right] \ast \mathrm{sgn}(t)\, e^{-\omega_0|t|} = \frac{A}{N_0}\left[\mathrm{sgn}(t)\, e^{-\omega_0|t|} - \mathrm{sgn}(t-T)\, e^{-\omega_0|t-T|}\right] , \]


such that we can assume

\[ g_1(t) = \begin{cases} \dfrac{A}{N_0}\left(e^{-\omega_0 t} + e^{-\omega_0 (T-t)}\right) & 0 \le t \le T \\ 0 & \text{otherwise} \end{cases} . \tag{5.34} \]

As the signals are equally probable and being s₂(t) = −s₁(t) and g₂(t) = −g₁(t), an optimum receiver can again be realized as in Fig. 5.1(a) or Fig. 5.1(b). Denoting by E_b = A²T the average energy per bit and assuming that m₁ is transmitted (such that r(t) = s₁(t) + n(t)), we have

\[ z = \int_0^T s_1(t)\, g_1(t)\,dt + \int_0^T n(t)\, g_1(t)\,dt = s_{11} + n_1 , \]

where, from (5.23) and (5.34),

\[ s_{11} \triangleq \int_0^T s_1(t)\, g_1(t)\,dt = \frac{A^2}{N_0}\int_0^T \left(e^{-\omega_0 t}+e^{-\omega_0 (T-t)}\right)dt = \frac{2E_b}{N_0}\,\frac{1-e^{-\omega_0 T}}{\omega_0 T} \tag{5.35} \]

\[ n_1 \triangleq \int_0^T n(t)\, g_1(t)\,dt . \tag{5.36} \]

As E{n(t)} = 0, n₁ turns out to be zero-mean (i.e., E{n₁} = 0), and therefore its variance is

\[ \sigma_n^2 = E\{n_1^2\} = E\left\{\int_0^T n(t)\, g_1(t)\,dt \int_0^T n(\tau)\, g_1(\tau)\,d\tau\right\} = \int_0^T\!\!\int_0^T E\{n(t)n(\tau)\}\, g_1(t)\, g_1(\tau)\,dt\,d\tau = \int_0^T\left[\int_0^T R(t-\tau)\, g_1(\tau)\,d\tau\right] g_1(t)\,dt = \int_0^T s_1(t)\, g_1(t)\,dt = \frac{2E_b}{N_0}\,\frac{1-e^{-\omega_0 T}}{\omega_0 T} , \tag{5.37} \]

where we used (5.31). Thus, taking into account that n₁ is zero-mean, the probability of error is

\[ P(E) = P(E \mid m_1) = P(z < 0 \mid m_1) = P(s_{11}+n_1 < 0) = P(n_1 > s_{11}) = Q\!\left(\frac{s_{11}}{\sigma_n}\right) \]

and, from (5.35) and (5.37), we get

\[ P(E) = Q\!\left(\sqrt{\frac{2E_b}{N_0}\,\frac{1-e^{-\omega_0 T}}{\omega_0 T}}\right) , \]

i.e., the same result as in (5.25) with E_D = E_∞ in (5.27). However, notice that in this case we need to observe the received waveform only in the time interval 0 ≤ t ≤ T.
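The short sketch below, with an assumed value of Eb/N0, reproduces the numbers quoted in Example 5.3: the ratio E_T/E_∞ as a function of ω₀T (about 0.51 at ω₀T = 4, tending to 1/2, i.e. a 3 dB penalty, for ω₀T ≫ 1) and the corresponding error probabilities from (5.25).

```python
import math

def Q(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def energy_ratio(w0T):
    # E_T / E_inf = (1 - exp(-2*w0*T)) / (2 * (1 - exp(-w0*T)))
    return (1.0 - math.exp(-2.0 * w0T)) / (2.0 * (1.0 - math.exp(-w0T)))

def pb(E_over_N0):
    # (5.25): Pb = Q( sqrt(2 * E_D / N0) )
    return Q(math.sqrt(2.0 * E_over_N0))

EbN0 = 10.0 ** (8.0 / 10.0)          # assumed Eb/N0 = 8 dB, for illustration only
for w0T in (0.1, 1.0, 4.0, 10.0):
    ET = EbN0 * (1.0 - math.exp(-2.0 * w0T)) / (2.0 * w0T)   # E_T / N0, from (5.26)
    Einf = EbN0 * (1.0 - math.exp(-w0T)) / w0T               # E_inf / N0, from (5.27)
    print(w0T, round(energy_ratio(w0T), 3), pb(ET), pb(Einf))
```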

CHAPTER 6

Waveform Transmission Over Dispersive Channels


Until now, we have supposed that the channel was not affecting the duration of the transmitted signals. We will now drop this hypothesis, which essentially boils down to supposing that the channel bandwidth is much larger than 1/T, the inverse of the signaling interval. Indeed, in many applications the channel bandwidth is of the order of 1/T, such that its impulse response may extend over many signaling intervals, and thus may spread (or disperse) the duration of the signals. We will consider communication systems employing pulse amplitude modulation (PAM) for transmitting information, as this modulation format is amenable to analytical development, but the results can be readily extended to any other kind of signaling.

6.1. Pulse Amplitude Modulation (PAM)

The signals used in an M-ary pulse amplitude modulation (PAM) transmission system are

\[ s_i(t) = a_i\, p(t) , \qquad 0 \le t \le T , \quad i = 1, 2, \ldots, M , \]

where M is even,¹

\[ a_i = (2i-1-M)\,\frac{d}{2} , \qquad i = 1, 2, \ldots, M , \]

such that the elements of the alphabet are

\[ \pm\frac{d}{2},\ \pm\frac{3d}{2},\ \ldots,\ \pm\frac{(M-1)\,d}{2} , \]

and p(t) is a pulse vanishing outside the interval 0 ≤ t ≤ T and conveniently chosen to have unit energy. We will suppose that the signals s_i(t) (or, which is the same, the message symbols a_i) are equally likely. At first, we will also suppose that the channel is wideband and AWGN, N₀/2 being the power spectral density of the noise. Clearly, the signal space is monodimensional and, as ||p|| = 1, p(t) is its orthonormal basis. The images of the signals are thus as shown in Fig. 6.1.

FIGURE 6.1. [Signal images s₁, …, s_M located at −(M−1)d/2, …, −3d/2, −d/2, d/2, 3d/2, …, (M−1)d/2 on the p(t) axis.]

The signals being equally likely, according to the MAP criterion, an optimum receiver follows the strategy

\[ \hat a = \arg\max_{a_i} \left\{ \int_0^T r(t)\, s_i(t)\,dt + C_i \right\} , \qquad C_i = -E_i/2 = -a_i^2/2 , \tag{6.1} \]

¹ It is not necessary that M be even; it will be assumed even only for convenience.


which is implemented by the structure in Fig. 6.2(a). As the signals s_i(t) differ only by a multiplicative constant a_i, instead of M correlators a single matched filter could be used, so that the structure becomes as in Fig. 6.2(b).

FIGURE 6.2. [(a) Bank of M correlators with s₁(t), …, s_M(t), bias terms −a_i²/2, and a "choose max" device; (b) single filter matched to p(t), sampled at t = T, followed by the bias terms −a_i²/2 and the "choose max" device.]

We could have directly obtained this last structure by observing that

\[ \int_0^T r(t)\, s_i(t)\,dt = a_i \int_0^T r(t)\, p(t)\,dt = a_i z , \]

z being the output of the filter matched to p(t), sampled at t = T. In fact, substituting this result in (6.1), the strategy can also be written as

\[ \hat a = \arg\max_{a_i} \left\{ a_i z - a_i^2/2 \right\} . \tag{6.2} \]

Now, taking into account that a₁ < a₂ < ⋯ < a_M, we have that a_i z − a_i²/2 > a_j z − a_j²/2 (meaning that a_i should be preferred to a_j) when z < (a_i + a_j)/2 if j > i, and when z > (a_i + a_j)/2 if j < i. So, the structure in Fig. 6.2(b) is equivalent to the one in Fig. 6.3, where the thresholds λ_i lie midway between two adjacent signal images.

FIGURE 6.3. [Filter matched to p(t), sampled at t = T, followed by a multiple-threshold detector with thresholds λ_i = (a_i + a_{i+1})/2, i = 1, …, M−1, placed midway between adjacent signal images; the decision is â = a₁ if z < λ₁, â = a_i if λ_{i−1} < z < λ_i, and â = a_M if z > λ_{M−1}.]
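A compact simulation of the decision rule in Fig. 6.3 is sketched below: it generates equally likely M-PAM symbols, adds Gaussian noise of variance N₀/2 to the matched-filter sample z = a_i + n, applies the midway thresholds, and compares the measured symbol error rate with the expression 2(M−1)/M · Q(d/2σ) derived next in (6.3). The chosen M, d, and N₀ are assumptions for illustration.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

M, d, N0 = 4, 2.0, 0.5                      # assumed illustrative parameters
levels = (2 * np.arange(1, M + 1) - 1 - M) * d / 2.0   # alphabet a_i
sigma = math.sqrt(N0 / 2.0)

n_sym = 200_000
a = rng.choice(levels, size=n_sym)          # equally likely symbols
z = a + rng.normal(0.0, sigma, size=n_sym)  # matched-filter sample z = a_i + n

# Decision: thresholds midway between adjacent images (nearest-level rule)
thresholds = (levels[:-1] + levels[1:]) / 2.0
a_hat = levels[np.searchsorted(thresholds, z)]

ser_sim = np.mean(a_hat != a)
Q = lambda x: 0.5 * math.erfc(x / math.sqrt(2.0))
ser_theory = 2.0 * (M - 1) / M * Q(d / (2.0 * sigma))   # equation (6.3)
print(ser_sim, ser_theory)
```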


Let us now compute the probability of error. Supposing that s_i(t) = a_i p(t) is transmitted, the received signal is r(t) = a_i p(t) + w(t), where w(t) is the additive white Gaussian noise. So, the sample z is given by

\[ z = a_i \int_0^T p^2(t)\,dt + \int_0^T w(t)\, p(t)\,dt = a_i + n , \]

where n = ∫₀ᵀ w(t) p(t) dt is a zero-mean Gaussian random variable with variance σ² = N₀/2. Thus

\[ P(E) = \sum_{i=1}^{M} P(E \mid a_i)\, P(a_i) = \frac{1}{M}\sum_{i=1}^{M} P(E \mid a_i) \]

and the conditional error probabilities can be easily evaluated. For symmetry reasons, P(E | a₁) = P(E | a_M) and P(E | a_i) = P(E | a_j) for i, j = 2, …, M−1. Hence, we have to evaluate only P(E | a₁) and P(E | a₂):

\[ P(E \mid a_1) = P(a_1+n > \lambda_1) = P(n > \lambda_1-a_1) = P\!\left(n > \frac{d}{2}\right) = Q\!\left(\frac{d}{2\sigma}\right) \]

\[ P(E \mid a_2) = P(a_2+n < \lambda_1) + P(a_2+n > \lambda_2) = P\!\left(n < -\frac{d}{2}\right) + P\!\left(n > \frac{d}{2}\right) = 2P\!\left(n > \frac{d}{2}\right) = 2\,Q\!\left(\frac{d}{2\sigma}\right) , \]

such that

\[ P(E) = \frac{1}{M}\left[2\,Q\!\left(\frac{d}{2\sigma}\right) + (M-2)\, 2\,Q\!\left(\frac{d}{2\sigma}\right)\right] = 2\,\frac{M-1}{M}\, Q\!\left(\frac{d}{2\sigma}\right) . \tag{6.3} \]

Let us now express the probability of error in terms of E_b/N₀. As the signals are equally likely and have the same energy in pairs, the average energy per symbol E_s is

\[ E_s = \frac{1}{M}\sum_{i=1}^{M} E_i = \frac{2}{M}\sum_{i=1}^{M/2} E_{M/2+i} = \frac{2}{M}\sum_{i=1}^{M/2} a_{M/2+i}^2 = \frac{d^2}{2M}\sum_{i=1}^{M/2} (2i-1)^2 . \]

Taking into account that

\[ \sum_{i=1}^{M} i^2 = \frac{1}{6}\, M(M+1)(2M+1) , \]

and observing that²

\[ \sum_{i=1}^{M} i^2 = \sum_{i=1}^{M/2} (2i-1)^2 + \sum_{i=1}^{M/2} (2i)^2 , \]

we have

\[ \sum_{i=1}^{M/2} (2i-1)^2 = \sum_{i=1}^{M} i^2 - 4\sum_{i=1}^{M/2} i^2 = \frac{1}{6}\, M(M+1)(2M+1) - \frac{4}{6}\,\frac{M}{2}\left(\frac{M}{2}+1\right)(M+1) = \frac{1}{6}\, M(M+1)(M-1) \]

and thus

\[ E_s = \frac{M^2-1}{12}\, d^2 . \]

² Here we simply perform the summation by first summing the odd and then the even terms.


As each symbol carries log₂M bits, E_b = E_s/log₂M, and solving for d we obtain

\[ d = \sqrt{\frac{12\, E_s}{M^2-1}} = \sqrt{\frac{12 \log_2 M}{M^2-1}\, E_b} \]

and therefore, recalling that σ² = N₀/2,

\[ P(E) = 2\,\frac{M-1}{M}\, Q\!\left(\frac{d}{2\sigma}\right) = 2\,\frac{M-1}{M}\, Q\!\left(\sqrt{\frac{6\log_2 M}{M^2-1}\,\frac{E_b}{N_0}}\right) . \tag{6.4} \]

6.2. Optimum Receiver for One-Shot Transmission

If the channel bandwidth is of the order of 1/T, we must revisit our system model in Fig. 3.1. The channel will be modeled as a linear time-invariant filter at whose output a white Gaussian noise w(t) is added, as shown in Fig. 6.4.
FIGURE 6.4. [System model: source → a_i → TX → s(t) → channel filter (+ w(t)) → r(t) → RX → â_i.]

Here, a_i is a symbol emitted by a source of information in the i-th signaling interval T and belonging to an M-ary alphabet³

\[ \left\{ a^{(j)} \right\}_{j=1}^{M} = \left\{ a^{(1)}, a^{(2)}, \ldots, a^{(M)} \right\} . \]

For a PAM system,

\[ a^{(i)} = (2i-1-M)\,\frac{A}{2} , \qquad i = 1, 2, \ldots, M , \tag{6.5} \]

and, corresponding to the emission of a_i, the transmitter generates a pulse a_i h_T(t−iT), obtained by delaying a pulse h_T(t) of duration T. Thus, the overall signal generated by the transmitter is

\[ s(t) = \sum_i a_i\, h_T(t-iT) , \tag{6.6} \]

where we omit the lower and upper limits of the summation to mean that the index i runs from −∞ to +∞. We can think that s(t) is generated by filtering a train of Dirac deltas ∑_i a_i δ(t−iT) with a filter whose impulse response is h_T(t), as shown in Fig. 6.5, where H_T(f) is the filter transfer function, i.e., the Fourier transform of h_T(t).

FIGURE 6.5. [∑_i a_i δ(t−iT) → H_T(f) → ∑_i a_i h_T(t−iT).]

³ Contrary to what was done in the previous section, we use a superscript to denote the elements of a set, reserving the subscript for the temporal index. So, a_i is the i-th symbol emitted by the source, while a^{(i)} is the i-th element of the M-ary alphabet.


The signal s(t) is distorted by the channel and corrupted by the additive white Gaussian noise w(t). Denoting by H_C(f) the channel filter transfer function, if the bandwidth of H_C(f) is large with respect to that of H_T(f) and the channel phase response is linear over the bandwidth of H_T(f), then the channel impulse response h_C(t) = F⁻¹{H_C(f)} can be approximated by a Dirac delta, i.e., h_C(t) ≈ H_C(0) δ(t−τ). Indeed (see Fig. 6.6), we would have that H_T(f)H_C(f) ≈ H_T(f)H_C(0)e^{−j2πfτ}, and the channel would simply introduce inessential attenuation and delay.

|HT ( f )| B B
2 f

|HC ( f )| f HC ( f )

F IGURE 6.6. In this case, as already seen, the optimum MAP receiver would be a lter matched to to HT ( f ), followed by a sampler and a multiple-threshold detector. Assuming the same structure also when the channel is not wideband, but, for now, leaving the lter HR ( f ) at the receiver unspecied, the model of a PAM transmission system is as shown in Fig. 6.7. s(t ) = ai hT (t iT )
i

w(t ) r(t ) HR ( f ) x(t ) tk = t0 + kT F IGURE 6.7. DECISION a k

ai (t iT )
i

HT ( f )

HC ( f )

Denoting by G( f ) = HT ( f )HC ( f )HR ( f )


1

(6.7)

the overall transfer function of the system and by g(t ) = F {G( f )} its inverse Fourier transform, the signal at the sampler can be written as x (t ) =
i

ai g(t iT ) + n(t )

where n(t ) is the response of the lter HR ( f ) to w(t ). Because w(t ) is AWGN with power spectral density N0 /2, n(t ) is a wide-sense stationary Gaussian process with zero-mean and power spectral 1 density 2 N0 |HR ( f )|2 . Sampling at the time instant tk t0 + kT , we get x (tk ) =
i

ai g(tk i ) + n(tk )

(6.8)

where, for all tk , n(tk ) is a Gaussian random variable with zero-mean and variance 2 equal to the power of n(t ) N0 2 = |HR ( f )|2 d f . (6.9) 2


Let us now see how H_R(f) and t₀ should be chosen. Considering a one-shot transmission, i.e., s(t) = a₀ h_T(t), the received signal would be r(t) = a₀ h_{TC}(t) + w(t), where h_{TC}(t) ≜ h_T(t) ∗ h_C(t) is the channel-filtered version of h_T(t). If the channel is dispersive, the duration of h_{TC}(t) is now t₀ > T. From the point of view of the receiver, it is as if the transmitted signal were a₀ h_{TC}(t) instead of a₀ h_T(t). So, supposing that the message symbols are equally likely, the MAP strategy is

\[ \hat a_0 = \arg\max_{a^{(i)}} \left\{ \int_0^{t_0} r(t)\, a^{(i)} h_{TC}(t)\,dt + C_i \right\} , \qquad C_i = -E_i/2 , \tag{6.10} \]

where now E_i is

\[ E_i = \left(a^{(i)}\right)^2 \int_0^{t_0} |h_{TC}(t)|^2\,dt = \left(a^{(i)}\right)^2 E_h , \tag{6.11} \]

E_h being the energy of h_{TC}(t). Letting

\[ z = \frac{1}{E_h}\int_0^{t_0} r(t)\, h_{TC}(t)\,dt , \]

the strategy can be rewritten as (compare with (6.2))

\[ \hat a_0 = \arg\max_{a^{(i)}} \left\{ a^{(i)} z - \left(a^{(i)}\right)^2/2 \right\} . \]

Given that the correlation between r(t) and h_{TC}(t) can also be obtained by sampling at t = t₀ the output of a filter matched to h_{TC}(t), the structure of the optimum receiver is as in Fig. 6.3 but with p(T−t) replaced by h_{TC}(t₀−t)/E_h and the output sampled at t = t₀ instead of t = T. So, using h_{TC}(t₀−t) instead of h_{TC}(t₀−t)/E_h, the optimum receiver is as in Fig. 6.8, where z' = E_h z.

FIGURE 6.8. [Filter h_{TC}(t₀−t), sampled at t = t₀, followed by a multiple-threshold detector with thresholds λ_i = E_h (a^{(i)}+a^{(i+1)})/2: â₀ = a^{(1)} if z' < λ₁, â₀ = a^{(i)} if λ_{i−1} < z' < λ_i, â₀ = a^{(M)} if z' > λ_{M−1}.]

In conclusion, recalling (6.10), as

\[ h_{TC}(t_0-t) \;\longleftrightarrow\; H_T^*(f)\, H_C^*(f)\, e^{-j2\pi f t_0} , \]

choosing

\[ H_R(f) = H_T^*(f)\, H_C^*(f)\, e^{-j2\pi f t_0} \tag{6.12} \]

and sampling at t₀ provides the optimum receiver for a one-shot transmission. Note that from (6.7) and (6.12) we have

\[ G(f) = H_T(f)\, H_C(f)\, H_R(f) = |H_T(f)\, H_C(f)|^2\, e^{-j2\pi f t_0} , \]

such that

\[ g(t_0) = \int_{-\infty}^{\infty} G(f)\, e^{j2\pi f t_0}\,df = \int_{-\infty}^{\infty} |H_T(f)\, H_C(f)|^2\,df = E_h . \tag{6.13} \]


Thus, as the decision sample is

\[ z' = x(t_0) = a_0\, g(t_0) + n(t_0) , \tag{6.14} \]

the distance between neighboring images is d = A g(t₀), while, from (6.9) and (6.12), n(t₀) is a zero-mean Gaussian random variable with variance

\[ \sigma^2 = \frac{N_0}{2}\int_{-\infty}^{\infty} |H_T(f)\, H_C(f)|^2\,df = \frac{N_0}{2}\, E_h . \tag{6.15} \]

As the probability of error can be written as in (6.3), taking into account (6.13) and (6.15), we have

\[ P(E) = 2\,\frac{M-1}{M}\, Q\!\left(\frac{d}{2\sigma}\right) = 2\,\frac{M-1}{M}\, Q\!\left(\frac{A\, g(t_0)}{2\sigma}\right) \tag{6.16} \]
\[ \phantom{P(E)} = 2\,\frac{M-1}{M}\, Q\!\left(\sqrt{\frac{A^2 E_h}{2 N_0}}\right) . \tag{6.17} \]

Using (6.5) and (6.11), the average received energy per symbol E_s is

\[ E_s = \frac{1}{M}\sum_{i=1}^{M} E_i = \frac{E_h}{M}\sum_{i=1}^{M} \left(a^{(i)}\right)^2 = \frac{M^2-1}{12}\, A^2 E_h , \tag{6.18} \]

such that, denoting by E_b the received energy per bit,

\[ A^2 E_h = \frac{12}{M^2-1}\, E_s = \frac{12\log_2 M}{M^2-1}\, E_b \]

and (6.17) can be rewritten as

\[ P(E) = 2\,\frac{M-1}{M}\, Q\!\left(\sqrt{\frac{6\log_2 M}{M^2-1}\,\frac{E_b}{N_0}}\right) , \]

i.e., the same result as in (6.4), as it should be.

Let us now interpret these results. Recalling the Schwarz inequality (2.5), let us rewrite it as

\[ \left|\int_{-\infty}^{\infty} V(f)\, W(f)\,df\right|^2 \le \int_{-\infty}^{\infty} |V(f)|^2\,df \int_{-\infty}^{\infty} |W(f)|^2\,df \tag{6.19} \]

and also recall that the equality holds when V(f) and W(f) are proportional. From (6.16), we see that P(E) depends on g(t₀)/σ or, which is the same, on g²(t₀)/σ². As

\[ g(t_0) = \int_{-\infty}^{\infty} G(f)\, e^{j2\pi f t_0}\,df = \int_{-\infty}^{\infty} H_T(f)\, H_C(f)\, H_R(f)\, e^{j2\pi f t_0}\,df , \]

letting V(f) = H_R(f) and W(f) = H_T(f)H_C(f)e^{j2πft₀}, we have

\[ g^2(t_0) = \left|\int_{-\infty}^{\infty} H_T(f)\, H_C(f)\, H_R(f)\, e^{j2\pi f t_0}\,df\right|^2 \le \int_{-\infty}^{\infty} |H_R(f)|^2\,df \int_{-\infty}^{\infty} |H_T(f)\, H_C(f)|^2\,df . \tag{6.20} \]

Using (6.9), we can also write

\[ \sigma^2 \ge \frac{N_0}{2}\,\frac{g^2(t_0)}{\displaystyle\int_{-\infty}^{\infty} |H_T(f)\, H_C(f)|^2\,df} . \]


Now, if we hold g(t₀) fixed, the right-hand side does not depend on H_R(f), such that it is a minimum for σ² when changing H_R(f). This minimum is achieved when the equality holds in (6.20), i.e., assuming the constant of proportionality equal to 1, when (6.12) holds.⁴ Thus, by choosing H_R(f) as in (6.12), we maximize the signal-to-noise ratio at the sampling time t₀ under the constraint that g(t₀) is fixed. As (6.13) descends from (6.12), taking into account (6.18), a constraint on g(t₀) is tantamount to a constraint on the received average energy and thus, in the last instance, on the received average power.

6.3. Intersymbol Interference and Nyquist Criterion

Of course, a one-shot transmission has no interest for us, because we would like to transmit continuously. We could do that by using a signaling rate 1/t₀, but, if the channel is very dispersive, t₀ could be much greater than T, the duration of the transmitted signals, so that we would be forced to a very slow rate. The way out could be to consider an entire block of symbols as a single signaling and to design a receiver for this super one-shot transmission. After transmitting a block, we could wait for a sufficient time and then transmit another block of symbols. The larger the block we consider, the more closely we approximate the signaling rate 1/T. However, this is not feasible. For example, if we transmit N symbols at rate 1/T, such that our block is a_i = (a_{i,1}, a_{i,2}, ..., a_{i,N}), the corresponding received signal is

\[ s_i(t) = \sum_{k=1}^{N} a_{i,k}\, h_{TC}(t-kT) , \]

where h_{TC}(t) is the impulse response of H_T(f)H_C(f). If our original alphabet has M symbols, the total number of signals is Mᴺ and the receiver should decide, with the minimum probability of error, which one of the Mᴺ signals was transmitted. In the AWGN case, the optimum receiver requires Mᴺ correlators or matched filters. The problem here is that such a receiver would not be realizable, due to the fact that Mᴺ is a huge number. As an example, suppose that we transmit 10⁴ symbols per second (a very low transmission rate by today's standards), such that T = 10⁻⁴ seconds. Let us suppose we use a binary alphabet, i.e., M = 2, and that we perform a one-shot transmission of duration 10⁻² seconds. Thus, N = 100 and we would need 2¹⁰⁰ ≈ 10³⁰ correlators, i.e., many more than the number of stars in our galaxy! So, let us see whether we can make do with the system model illustrated in the previous section. If we transmit a pulse every T seconds, sampling at t = t₀, from (6.8) we get

\[ x(t_0) = a_0\, g(t_0) + \sum_{i \ne 0} a_i\, g(t_{-i}) + n(t_0) \]

instead of (6.14), and, in general, sampling at t_k = t₀ + kT,

\[ x(t_k) = a_k\, g(t_0) + \sum_{i \ne k} a_i\, g(t_{k-i}) + n(t_k) , \]

where the term

\[ \sum_{i \ne k} a_i\, g(t_{k-i}) , \tag{6.21} \]

⁴ Note that P(E) is independent of the constant of proportionality, as signal and noise are affected in the same way and what matters is their ratio.


called intersymbol interference (ISI), is a random quantity acting as additional noise. As the MAP strategy that we followed only takes into account the AWGN producing n(t_k), the receiver in Fig. 6.8 would not be optimum anymore. However, if we were able to choose H_T(f) and H_R(f) such that G(f) in (6.7) yields

\[ g(t_k) = \begin{cases} 1 & k = 0 \\ 0 & k \ne 0 \end{cases} , \tag{6.22} \]

the ISI term in (6.21) would vanish, and the receiver in Fig. 6.8 would still be optimum, despite the dispersive channel.

6.3.1. The Nyquist Criterion. In order to find the shape of G(f) satisfying (6.22), it is convenient to introduce the function g_N(t) ≜ g(t + t₀), obtained by anticipating g(t) by t₀, as it is easier to express condition (6.22) in terms of its Fourier transform G_N(f). The relation between the Fourier transforms is G_N(f) = G(f)e^{j2πft₀}, so the findings on G_N(f) are readily transferred to G(f). Restating (6.22) in terms of g_N(t),

\[ g_N(kT) = \begin{cases} 1 & k = 0 \\ 0 & k \ne 0 \end{cases} , \tag{6.23} \]

and recalling the Poisson sum formula (2.34), we get

\[ \sum_k G_N\!\left(f-\frac{k}{T}\right) = T \sum_k g_N(kT)\, e^{-j2\pi k f T} = T , \tag{6.24} \]

which is referred to as the Nyquist criterion for the absence of intersymbol interference at the time instants kT. In the following, a Fourier transform satisfying the Nyquist criterion at t = kT will be denoted by the subscript N, as done above. Among all possible solutions, we are interested in the signals whose Fourier transform vanishes outside the frequency range |f| ≤ 1/T, such that, as G_N(f) is the Fourier transform of a real signal, (6.24) can be checked only for 0 ≤ f ≤ 1/T. Therefore, the Nyquist criterion becomes

\[ G_N(f) + G_N(f-1/T) = T , \qquad 0 \le f \le 1/T , \tag{6.25} \]

and it is clear that a triangular shape like the one in Fig. 6.9 satisfies it.

GN ( f + 1 / T )

GN ( f )

GN ( f 1 / T )
1 T 2 T

2 T

1 T

F IGURE 6.9.


The condition (6.25) entails that G_N(f) has a sort of Hermitian symmetry around f = 1/(2T). This symmetry condition can be made clear by substituting f = 1/(2T) + ν in (6.25). Taking into account that g_N(t) is real valued, G_N(−f) = G_N*(f), and thus

\[ G_N\!\left(\frac{1}{2T}+\nu\right) + G_N^*\!\left(\frac{1}{2T}-\nu\right) = T , \qquad 0 \le \nu \le \frac{1}{2T} . \]

If g_N(t) is an even function, its Fourier transform is real valued, too, and the Nyquist criterion can be written as

\[ G_N\!\left(\frac{1}{2T}+\nu\right) + G_N\!\left(\frac{1}{2T}-\nu\right) = T , \qquad 0 \le \nu \le \frac{1}{2T} , \]

so that, for example, the trapezoidal shape in Fig. 6.10 satisfies it.

FIGURE 6.10. [Trapezoidal G_N(f) of height T, flat for |f| ≤ (1−α)/(2T) and vanishing at |f| = (1+α)/(2T).]

However, a more realistic solution is the so-called raised cosine shape

\[ G_N(f) = \begin{cases} T & |f| \le \dfrac{1-\alpha}{2T} \\[4pt] T \cos^2\!\left[\dfrac{\pi T}{2\alpha}\left(|f|-\dfrac{1-\alpha}{2T}\right)\right] & \dfrac{1-\alpha}{2T} \le |f| \le \dfrac{1+\alpha}{2T} \\[4pt] 0 & |f| > \dfrac{1+\alpha}{2T} \end{cases} , \qquad 0 \le \alpha \le 1 , \]

where α is called the rolloff parameter. A raised cosine shape is represented in Fig. 6.11(a) for α = 0, 1/2, 1, while Fig. 6.11(b) shows the corresponding inverse Fourier transform

\[ g_N(t) = \mathrm{sinc}(t/T)\,\frac{\cos(\alpha\pi t/T)}{1-(2\alpha t/T)^2} . \]
FIGURE 6.11. [(a) Raised cosine G_N(f) for α = 0, 1/2, 1; (b) corresponding pulses g_N(t).]

As can be seen, as α decreases the bandwidth of G_N(f) decreases, while the leading and trailing oscillations of g_N(t) increase. For α = 0, G_N(f) has a rectangular shape and

\[ g_N(t) = \mathrm{sinc}(t/T) . \]
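The sketch below evaluates the raised-cosine pulse g_N(t) given above for a few rolloff values and checks the Nyquist condition (6.23), i.e., g_N(0) = 1 and g_N(kT) = 0 for k ≠ 0; the handling of the removable singularity at t = ±T/(2α) is an implementation detail of this sketch.

```python
import numpy as np

def g_raised_cosine(t, T, alpha):
    """Raised-cosine pulse g_N(t) = sinc(t/T) * cos(pi*alpha*t/T) / (1 - (2*alpha*t/T)^2)."""
    x = np.asarray(t, dtype=float) / T
    denom = 1.0 - (2.0 * alpha * x) ** 2
    out = np.empty_like(x)
    regular = np.abs(denom) > 1e-10
    out[regular] = (np.sinc(x[regular]) * np.cos(np.pi * alpha * x[regular])
                    / denom[regular])
    if alpha > 0:
        # limit value at t = +-T/(2*alpha): (pi/4) * sinc(1/(2*alpha))
        out[~regular] = (np.pi / 4.0) * np.sinc(1.0 / (2.0 * alpha))
    return out

T = 1.0
for alpha in (0.0, 0.5, 1.0):
    k = np.arange(-5, 6)
    samples = g_raised_cosine(k * T, T, alpha)
    print(alpha, np.round(samples, 12))     # 1 at k = 0, 0 elsewhere
```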


In this case, the bandwidth B = 1/(2T) of G_N(f) is the minimum compatible with the absence of intersymbol interference. Such bandwidth is referred to as the Nyquist bandwidth. In conclusion, (i) if the symbols are equally probable and independent, (ii) if ISI is avoided by choosing the terminal filters H_T(f) and H_R(f) such that

\[ G(f) = H_T(f)\, H_C(f)\, H_R(f) = G_N(f)\, e^{-j2\pi f t_0} , \tag{6.26} \]

where G_N(f) satisfies the Nyquist criterion (6.23), and (iii) H_R(f) is chosen as in (6.12), reported here for convenience,

\[ H_R(f) = H_T^*(f)\, H_C^*(f)\, e^{-j2\pi f t_0} , \tag{6.27} \]

then the receiver in Fig. 6.7 is optimum. From (6.26) and (6.27) we have that

\[ |H_R(f)|^2 = G_N(f) \tag{6.28} \]
\[ |H_T(f)|^2 = \frac{G_N(f)}{|H_C(f)|^2} . \tag{6.29} \]
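As an illustration of (6.28) and (6.29), the sketch below computes the amplitude responses of the terminal filters on a frequency grid for a raised-cosine G_N(f) and an assumed first-order channel amplitude response; both the channel model and the numbers are assumptions, not taken from the text.

```python
import numpy as np

def raised_cosine_GN(f, T, alpha):
    """Raised-cosine Nyquist spectrum G_N(f)."""
    af = np.abs(f)
    f1, f2 = (1 - alpha) / (2 * T), (1 + alpha) / (2 * T)
    GN = np.zeros_like(af)
    GN[af <= f1] = T
    mid = (af > f1) & (af <= f2)
    GN[mid] = T * np.cos(np.pi * T / (2 * alpha) * (af[mid] - f1)) ** 2
    return GN

T, alpha = 1.0, 0.5
f = np.linspace(-1.5 / T, 1.5 / T, 601)
GN = raised_cosine_GN(f, T, alpha)
HC = 1.0 / np.sqrt(1.0 + (f / (0.75 / T)) ** 2)   # assumed channel amplitude response

# Received-power constraint, eqs. (6.28)-(6.29):
HR_mag = np.sqrt(GN)                  # |H_R(f)| = sqrt(G_N(f))
HT_mag = np.sqrt(GN) / HC             # |H_T(f)| = sqrt(G_N(f)) / |H_C(f)|
print(HT_mag.max(), HR_mag.max())     # transmitter boosts where the channel attenuates
```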

So, when the system is designed in this way, it is clear that the transmitter has to take care of the channel. Indeed, if the channel introduces attenuation, from (6.29) we see that this is compensated for by a larger amplitude response of H_T(f). If there is a constraint on the maximum power that can be delivered by the transmitter, then the required signal-to-noise ratio at the receiver could possibly not be guaranteed. This is a direct consequence of the fact that the system is designed with a constraint on the received power. Indeed, this constraint derives from the fact that H_R(f) in (6.27) is guaranteed to be optimum when g(t₀) has a fixed value. We can overcome this restriction by optimizing the terminal filters under the constraint that the transmitted power is held constant. In order to be able to do this, we need to know the power spectrum of the transmitted signal.

6.4. Power Spectrum of PAM Signals

The power spectral density of a wide-sense stationary (WSS) process is defined as the Fourier transform of its autocorrelation function (see Definition 2.66). Unfortunately, a PAM signal such as (6.6), even modeling the succession of symbols {a_i} as a discrete-time stationary process, is a stochastic process which is neither stationary nor wide-sense stationary. Indeed, letting

\[ E\{a_i\} \triangleq m_a , \qquad E\{a_i a_j\} \triangleq R_a(i-j) , \]

the stochastic process

\[ x(t) = \sum_i a_i\, p(t-iT) \]

is such that

\[ \eta_x(t) = E\{x(t)\} = m_a \sum_i p(t-iT) \]
\[ R_x(t_1,t_2) = E\{x(t_1)\, x(t_2)\} = \sum_i \sum_j R_a(i-j)\, p(t_1-iT)\, p(t_2-jT) \]


and it can be seen that η_x(t) is not constant and R_x(t₁, t₂) does not depend only on the difference t₁ − t₂. Actually, x(t) turns out to be a wide-sense cyclostationary process of period T, as η_x(t) = η_x(t+T), ∀t, and R_x(t₁, t₂) = R_x(t₁+T, t₂+T), ∀t₁, t₂. However, an alternative definition of power spectral density, independent of stationarity, is possible. Given the truncated process

\[ x_\Delta(t) = \begin{cases} x(t) & |t| < \Delta/2 \\ 0 & |t| > \Delta/2 \end{cases} , \]

since each truncated realization has finite duration, it also has finite energy. So, we can use the Fourier transform

\[ X_\Delta(f) = \int_{-\Delta/2}^{\Delta/2} x(t)\, e^{-j2\pi f t}\,dt = \int_{-\infty}^{\infty} x_\Delta(t)\, e^{-j2\pi f t}\,dt \]

and the Parseval formula (see Proposition 2.43) to write the energy of each truncated realization as

\[ \int_{-\infty}^{\infty} |x_\Delta(t)|^2\,dt = \int_{-\infty}^{\infty} |X_\Delta(f)|^2\,df . \]

Thus, the average power P̄_x of x(t) can be written as

\[ \bar P_x = \lim_{\Delta\to\infty} \frac{1}{\Delta}\, E\left\{\int_{-\infty}^{\infty} |X_\Delta(f)|^2\,df\right\} = \int_{-\infty}^{\infty} \lim_{\Delta\to\infty} \frac{1}{\Delta}\, E\left\{|X_\Delta(f)|^2\right\} df , \]

whose integrand may be interpreted as the power spectral density. Therefore, the power spectrum of a stochastic process can be computed as

\[ S_x(f) = \lim_{\Delta\to\infty} \frac{1}{\Delta}\, E\left\{|X_\Delta(f)|^2\right\} \]

and it can be shown that, if x(t) is WSS, the power spectrum obtained in this way coincides with the Fourier transform of the autocorrelation function. Now, letting Δ = (2N+1)T, we have

\[ x_\Delta(t) = \sum_{k=-N}^{N} a_k\, p(t-kT) , \qquad X_\Delta(f) = \sum_{k=-N}^{N} a_k\, P(f)\, e^{-j2\pi f kT} , \]

where P(f) is the Fourier transform of p(t), and

\[ \frac{1}{\Delta}\, E\left\{|X_\Delta(f)|^2\right\} = \frac{|P(f)|^2}{(2N+1)T} \sum_{k=-N}^{N}\sum_{\ell=-N}^{N} E\{a_k a_\ell\}\, e^{-j2\pi f(k-\ell)T} = \frac{|P(f)|^2}{(2N+1)T} \sum_{k=-N}^{N}\sum_{\ell=-N}^{N} R_a(k-\ell)\, e^{-j2\pi f(k-\ell)T} . \]


Considering a (2N+1)×(2N+1) matrix whose rows and columns are numbered from −N to N, and indexing the diagonals with the difference k−ℓ, −2N ≤ k−ℓ ≤ 2N, where k identifies a row and ℓ a column, we have that the diagonal whose index is k−ℓ = 0 is the principal diagonal and has 2N+1 elements. The diagonals immediately above and below have index k−ℓ = ±1 and 2N elements, and, in general, the diagonals whose index is k−ℓ = n have 2N+1−|n| elements. Now, associating each term in the double summation above with the matrix element in the k-th row and ℓ-th column, we can first sum all terms with k−ℓ = n, which are 2N+1−|n| and are all equal, and then sum over the diagonal index n from −2N to 2N. So, the double summation can be arranged as the single sum

\[ \sum_{k=-N}^{N}\sum_{\ell=-N}^{N} R_a(k-\ell)\, e^{-j2\pi f(k-\ell)T} = \sum_{n=-2N}^{2N} (2N+1-|n|)\, R_a(n)\, e^{-j2\pi f n T} \]

and hence, as the limit Δ → ∞ corresponds to N → ∞,

\[ S_x(f) = \lim_{\Delta\to\infty} \frac{1}{\Delta}\, E\left\{|X_\Delta(f)|^2\right\} = \lim_{N\to\infty} \frac{|P(f)|^2}{T} \sum_{n=-2N}^{2N} \left(1-\frac{|n|}{2N+1}\right) R_a(n)\, e^{-j2\pi f n T} = \frac{|P(f)|^2}{T} \sum_{n=-\infty}^{\infty} R_a(n)\, e^{-j2\pi f n T} . \]

In the case of uncorrelated message symbols with mean m_a and variance σ_a² = E{a_i²} − m_a²,

\[ R_a(n) = \begin{cases} \sigma_a^2 + m_a^2 & n = 0 \\ m_a^2 & n \ne 0 \end{cases} \]

and

\[ \sum_{n=-\infty}^{\infty} R_a(n)\, e^{-j2\pi f n T} = \sigma_a^2 + m_a^2 \sum_{n=-\infty}^{\infty} e^{-j2\pi f n T} . \]

Then, drawing upon Poisson's sum formula,

\[ \sum_{n=-\infty}^{\infty} e^{-j2\pi f n T} = \frac{1}{T}\sum_{n=-\infty}^{\infty} \delta\!\left(f-\frac{n}{T}\right) , \]

and therefore

\[ S_x(f) = \frac{\sigma_a^2}{T}\, |P(f)|^2 + \frac{m_a^2}{T^2} \sum_{n=-\infty}^{\infty} \left|P\!\left(\frac{n}{T}\right)\right|^2 \delta\!\left(f-\frac{n}{T}\right) . \]

This result shows that, if m_a ≠ 0, the power spectrum of x(t) contains impulses at integer multiples of the signaling rate 1/T, unless P(f) = 0 at all f = n/T. This fact might be useful for synchronization purposes, as one of the harmonics could be extracted by a narrowband filter. If the message symbols are zero-mean, i.e., m_a = 0, the power spectrum simplifies to

\[ S_x(f) = \frac{\sigma_a^2}{T}\, |P(f)|^2 . \]
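The zero-mean result can be checked numerically. The sketch below builds long PAM realizations with a rectangular unit-energy pulse, averages periodograms of truncated realizations, and compares the estimate with σ_a²|P(f)|²/T at a few frequencies; the pulse shape, symbol statistics and sampling parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

T, Ns = 1.0, 8                      # symbol time and samples per symbol (assumed)
dt = T / Ns
Nsym, Ntrials = 512, 200            # realization length and number of averages
p = np.ones(Ns) / np.sqrt(T)        # rectangular pulse with unit energy

nfft = Nsym * Ns
S_est = np.zeros(nfft)
for _ in range(Ntrials):
    a = rng.choice([-1.0, 1.0], size=Nsym)          # zero-mean symbols, sigma_a^2 = 1
    x = np.zeros(nfft)
    for k, ak in enumerate(a):                      # x(t) = sum_k a_k p(t - kT)
        x[k * Ns:(k + 1) * Ns] += ak * p
    X = np.fft.fft(x) * dt                          # approximate Fourier transform
    S_est += np.abs(X) ** 2 / (Nsym * T)            # periodogram |X(f)|^2 / Delta
S_est /= Ntrials

f = np.fft.fftfreq(nfft, dt)
P_mag = np.sqrt(T) * np.sinc(f * T)                 # |P(f)| for the rectangular pulse
S_theory = (1.0 / T) * P_mag ** 2                   # sigma_a^2 |P(f)|^2 / T
for i in (0, nfft // (4 * Ns), nfft // (2 * Ns)):   # f = 0, 1/(4T), 1/(2T)
    print(round(f[i], 3), round(S_est[i], 3), round(S_theory[i], 3))
```

The estimated and theoretical values should roughly match, up to the statistical fluctuation of the averaged periodograms and the aliasing caused by sampling the rectangular pulse.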


6.5. Optimum Terminal Filters

Let us now review the way the terminal filters are dimensioned. If the received signals are fixed, the optimum receiving filter H_R(f) has to be the matched filter. In this case, we found that, dimensioning the transmitting filter so as to avoid ISI, the optimum terminal filters should be as in (6.28) and (6.29). However, we learned that the system in Fig. 6.7 is optimum according to the MAP criterion if the following three conditions hold:

(1) The message symbols a_i are independent and equally likely.
(2) The ISI is avoided.
(3) The signal-to-noise ratio at the receiver is maximized.

However, condition (3) above is to be verified subject to a constraint, because we could increase the signal-to-noise ratio without limit by either decreasing the variance σ² in (6.9) or, equivalently, increasing the transmitted power without limit. However, we cannot let σ² vanish by reducing H_R(f), because the signal component would also vanish, while, for obvious reasons, the transmitted power cannot be increased above a given value. The terminal filters dimensioned as in (6.28) and (6.29) correspond to a constraint on the received power. Now, we want to investigate what happens if we choose a constraint on the transmitted power. We would also like to account for nonideal Dirac delta pulses at the input of Fig. 6.7 and for colored noise.

6.5.1. Optimum Filters for Fixed Transmitted Power. For PAM signaling with equally probable and independent symbols as in (6.5), we have that m_a = E{a_i} = 0 and hence the power spectrum of the transmitted signal (6.6) is

\[ S_T(f) = \frac{\sigma_a^2}{T}\, |H_T(f)|^2 , \]

where, as m_a = 0,

\[ \sigma_a^2 = E\{a_i^2\} = \frac{M^2-1}{12}\, A^2 . \]

Thus, the transmitted signal power will be

\[ P_T = \frac{M^2-1}{12}\,\frac{A^2}{T}\int_{-\infty}^{\infty} |H_T(f)|^2\,df . \tag{6.30} \]

If the overall system transfer function G(f) is such that

\[ G(f) = H_T(f)\, H_C(f)\, H_R(f) = G_N(f)\, e^{-j2\pi f t_0} , \tag{6.31} \]

where G_N(f) satisfies the Nyquist criterion at the time instants kT, then at t_k = t₀ + kT

\[ g(t_k) = \begin{cases} 1 & k = 0 \\ 0 & k \ne 0 \end{cases} \]

and there is no ISI. In this case, the probability of error is given by (6.16) with g(t₀) = 1, i.e.,

\[ P(E) = 2\,\frac{M-1}{M}\, Q\!\left(\frac{A}{2\sigma}\right) , \]

where σ² is as in (6.9),

\[ \sigma^2 = \frac{N_0}{2}\int_{-\infty}^{\infty} |H_R(f)|^2\,df . \tag{6.32} \]


From (6.31) we get

\[ |H_T(f)| = \frac{G_N(f)}{|H_C(f)\, H_R(f)|} \tag{6.33} \]

and, substituting it in (6.30) and solving for A², we obtain

\[ A^2 = \frac{12T/(M^2-1)}{\displaystyle\int_{-\infty}^{\infty} \frac{G_N^2(f)}{|H_C(f)\, H_R(f)|^2}\,df}\; P_T , \tag{6.34} \]

so that, if we fix the transmitted power P_T, the value of A depends only on G_N(f), H_C(f) and H_R(f). Combining now (6.32) and (6.34), we have

\[ \left(\frac{A}{2\sigma}\right)^2 = \frac{6T/(M^2-1)}{\displaystyle N_0 \int_{-\infty}^{\infty} \frac{G_N^2(f)}{|H_C(f)\, H_R(f)|^2}\,df \int_{-\infty}^{\infty} |H_R(f)|^2\,df}\; P_T . \]

Holding P_T fixed, the previous expression is maximum when the denominator is minimum. Recalling the Schwarz inequality (6.19), this occurs when the integrands are proportional. Consequently, disregarding the inessential constant of proportionality, the optimum receiving filter is such that

\[ |H_R(f)|^2 = \frac{G_N(f)}{|H_C(f)|} \tag{6.35} \]

and, replacing this expression in (6.33), the optimum transmitting filter must satisfy

\[ |H_T(f)|^2 = \frac{G_N(f)}{|H_C(f)|} . \tag{6.36} \]

As can be seen, minimizing the probability of error under the constraint that the transmitted power is fixed requires that the amplitude responses of the two terminal filters be the same.

6.5.2. Nonideal Input Pulses. Practical systems often deviate from the schematization in Fig. 6.7. As an example, the signal that is actually applied to the transmitter is not a train of Dirac deltas, but rather a train of pulses ∑_i a_i p_τ(t−iT), where p_τ(t) is some pulse of duration τ. Most often, p_τ(t) is a rectangular pulse of duration T, as the information to be transmitted is processed by digital circuits working with rectangular rather than ideal pulses. We can account for nonideal pulses by incorporating their Fourier transform P_τ(f) into the transmitting filter, i.e., by thinking of H_T(f) as the cascade of P_τ(f) and the real transmitting filter H_T^{REAL}(f), as shown in Fig. 6.12.

FIGURE 6.12. [∑_i a_i δ(t−iT) → P_τ(f) → H_T^{REAL}(f); the cascade of the two blocks constitutes H_T(f).]
Thus, after performing the dimensioning, the actual transmitting lter will simply be
REAL HT (f ) =

HT ( f ) . P ( f )
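For the common case of a rectangular input pulse of duration τ = T, P_τ(f) = τ sinc(fτ)e^{−jπfτ}, and the sketch below evaluates the amplitude correction |H_T^REAL(f)| = |H_T(f)|/|P_τ(f)| (a sin(x)/x equalization) on a frequency grid kept inside the first null of P_τ(f) at f = 1/τ. The H_T(f) used here is just an assumed placeholder shape.

```python
import numpy as np

T = 1.0
tau = T                                   # rectangular input pulse of duration T
f = np.linspace(-0.9 / T, 0.9 / T, 401)   # grid inside the first null of P_tau at 1/tau

P_tau = tau * np.sinc(f * tau)            # |P_tau(f)| (the phase e^{-j pi f tau} is dropped)
HT = np.sqrt(np.maximum(0.0, 1.0 - np.abs(f) * T))   # assumed placeholder |H_T(f)|

# Real transmitting filter: |H_T^REAL(f)| = |H_T(f)| / |P_tau(f)|
HT_real = HT / P_tau
print(HT_real[len(f) // 2], HT_real[0])   # ~1 at f = 0, larger toward the band edge
```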


6.5.3. Colored Noise. If the noise in Fig. 6.7 is Gaussian but not white, we have already seen that an optimum receiver can be designed by considering the signals after a reversible whitening filter. So, if the power spectral density of the noise is S_n(f), denoting by H(f) the whitening filter such that

\[ S_n(f)\, |H(f)|^2 = N_0/2 \tag{6.37} \]
\[ H(f)\, H^{-1}(f) = 1 , \tag{6.38} \]

we can think of applying the cascade of H(f) and H⁻¹(f) at the receiver input, as shown in Fig. 6.13.

FIGURE 6.13. [H_T(f) → H_C(f) → (+ n(t)) → H(f) → H⁻¹(f) → r(t) → H_R(f) → x(t), sampled at t_k = t₀ + kT.]

So, as the noise after the whitening filter H(f) is AWGN with power spectral density N₀/2, the system in Fig. 6.13 is equivalent to that in Fig. 6.14, where w(t) = n(t) ∗ h(t), h(t) being the impulse response of H(f).

FIGURE 6.14. [H_T(f) → H'_C(f) = H_C(f)H(f) → (+ w(t)) → r'(t) → H'_R(f) = H⁻¹(f)H_R(f) → x(t), sampled at t_k = t₀ + kT.]

Notice that this system corresponds to the system in Fig. 6.7 when replacing the channel and the receiving filter with

\[ H'_C(f) = H_C(f)\, H(f) , \qquad H'_R(f) = H^{-1}(f)\, H_R(f) , \]

and that, due to (6.38), the condition for the absence of ISI (6.26) is not changed. So, the terminal filters can be dimensioned as before. Taking into account (6.37) and the fact that H⁻¹(f) = 1/H(f), if using a constraint on the received power, (6.28) and (6.29) become

\[ |H_R(f)|^2 = G_N(f)\,\frac{N_0/2}{S_n(f)} , \qquad |H_T(f)|^2 = \frac{G_N(f)}{|H_C(f)|^2}\,\frac{S_n(f)}{N_0/2} , \]


while, with a constraint on the transmitted power, (6.35) and (6.36) become

\[ |H_R(f)|^2 = \frac{G_N(f)}{|H_C(f)|}\sqrt{\frac{N_0/2}{S_n(f)}} , \qquad |H_T(f)|^2 = \frac{G_N(f)}{|H_C(f)|}\sqrt{\frac{S_n(f)}{N_0/2}} . \]

Notice how, in both cases, the transmitting filter emphasizes the signal at the frequencies where the noise is stronger, while the receiving filter provides the necessary deemphasis.
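To close, the sketch below evaluates the colored-noise versions of both filter pairs on a frequency grid, for a raised-cosine G_N(f), an assumed flat channel, and an assumed noise spectrum S_n(f) peaking at low frequencies; it also verifies that in both cases |H_T||H_C||H_R| = G_N, i.e., that the no-ISI condition (6.26) is preserved in magnitude. All numerical values are assumptions for illustration only.

```python
import numpy as np

def raised_cosine_GN(f, T, alpha):
    af = np.abs(f)
    f1, f2 = (1 - alpha) / (2 * T), (1 + alpha) / (2 * T)
    GN = np.where(af <= f1, T, 0.0)
    mid = (af > f1) & (af <= f2)
    return np.where(mid, T * np.cos(np.pi * T / (2 * alpha) * (af - f1)) ** 2, GN)

T, alpha, N0 = 1.0, 0.5, 1.0
f = np.linspace(-0.75 / T, 0.75 / T, 301)
GN = raised_cosine_GN(f, T, alpha)
HC = np.ones_like(f)                                        # assumed flat channel
Sn = (N0 / 2.0) * (1.0 + 4.0 / (1.0 + (4 * f * T) ** 2))    # assumed colored noise PSD

# Received-power constraint:
HR2_rx = GN * (N0 / 2.0) / Sn
HT2_rx = GN / HC ** 2 * Sn / (N0 / 2.0)
# Transmitted-power constraint:
HR2_tx = GN / HC * np.sqrt((N0 / 2.0) / Sn)
HT2_tx = GN / HC * np.sqrt(Sn / (N0 / 2.0))

# No-ISI check: |H_T| * |H_C| * |H_R| = G_N in both cases
assert np.allclose(np.sqrt(HT2_rx * HR2_rx) * HC, GN)
assert np.allclose(np.sqrt(HT2_tx * HR2_tx) * HC, GN)

i0 = len(f) // 2                                  # f = 0, where the noise is strongest
i1 = int(np.argmin(np.abs(f - 0.5 / T)))          # f = 1/(2T), where the noise is weaker
print(HT2_rx[i0], HT2_rx[i1], HT2_tx[i0], HT2_tx[i1])   # transmitter emphasis at f = 0
```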
