
INFORMATION THEORY

Information theory deals with mathematical modelling and analysis of a communications


system rather than with physical sources and physical channels.
Specifically, given an information source and a noisy channel, information theory provides
limits on :
1- The minimum number of bits per symbol required to fully represent the source.
(i.e. the efficiency with which information from a given source can be represented.)

2- The maximum rate at which reliable (error-free) communications can take place over the
noisy channel.

Since the whole purpose of a communications system is to transport information from a
source to a destination, the question arises as to how much information can be transmitted in
a given time. ( Normally the goal would be to transmit as much information as possible in as
small a time as possible such that this information can be correctly interpreted at the
destination.)
This of course leads to the next question, which is :
How can information be measured and how do we measure the rate at which information is
emitted from a source ?

Suppose that we observe the output emitted by a discrete source ( every unit interval or
signalling interval.)
The source output can be considered as a set, S, of discrete random events ( or outcomes).
These events are symbols from a fixed finite alphabet.
( for example the set or alphabet can be the numbers 1 to 6 on a die and each roll of the die
outputs a symbol being the number on the die upper face when the die comes to rest.
Another example is a digital binary source, where the alphabet is the digits "0" and "1", and
the source outputs a symbol of either "0" or "1" at random .)

If, in general, we consider a discrete random source which outputs symbols from a fixed finite alphabet of k symbols, then the set S contains all k symbols and we can write

S = { s_0, s_1, s_2, ......., s_(k-1) }    and

Σ_{i=0}^{k-1} p(s_i) = 1        (3.1)

In addition, we assume that the symbols emitted by the source during successive signalling intervals are statistically independent, i.e. the probability of any symbol being emitted at any signalling interval does not depend on the probability of occurrence of previous symbols. In other words, we have what is called a discrete memoryless source.

Can we find a measure of how much "information" is produced by this source ?

The idea of information is closely related to that of "uncertainty" and "surprise".



If the source emits an output s_i which has a probability of occurrence p(s_i) = 1, then all other symbols of the alphabet have a zero probability of occurrence and there is really no "uncertainty", "surprise", or information, since we already know beforehand (a priori) what the output symbol will be.
If, on the other hand, the source symbols occur with different probabilities, and the probability p(s_i) is low, then there is more "uncertainty", "surprise" and therefore "information" when the symbol s_i is emitted by the source, rather than another one with higher probability.

Thus the words "uncertainty", "surprise", and "information" are all closely related.
- Before the output s_i occurs, there is an amount of "uncertainty".
- When the output s_i occurs, there is an amount of "surprise".
- After the occurrence of the output s_i, there is a gain in the amount of "information".

All three amounts are really the same and we can see that the amount of information is
related to the inverse of the probability of occurrence of the symbol.

Definition :
The amount of information gained after observing the event s_i, which occurs with probability p(s_i), is

I(s_i) = log_2 [ 1 / p(s_i) ]   bits,   for i = 0, 1, 2, ..., (k-1)        (3.2)

The unit of information is called the "bit", a contraction of "binary digit".

This definition exhibits the following important properties that are intuitively satisfying:

1- I(s_i) = 0 for p(s_i) = 1

   i.e. if we are absolutely certain of the output of the source even before it occurs (a priori), then there is no information gained.

2- I(s_i) > 0 because 0 ≤ p(s_i) ≤ 1 for the symbols of the alphabet.

   i.e. the occurrence of an output s_i either provides some information or no information, but never brings about a loss of information (unless it is a severe blow to the head, which is highly unlikely from the discrete source!)

3- I(s_j) > I(s_i) for p(s_j) < p(s_i)

   i.e. the lower the probability of occurrence of an output, the more information we gain when it occurs.

4- I(s_j s_i) = I(s_j) + I(s_i) if the outputs s_j and s_i are statistically independent.



The use of the logarithm to the base 2 (instead of to the base 10 or to the base e) has been adopted in the measure of information because usually we are dealing with digital binary sources (however, it is useful to remember that log_2(a) = 3.322 log_10(a)). Thus if the source alphabet was the binary set of symbols, i.e. "0" or "1", and each symbol was equally likely to occur, i.e. s_0 having p(s_0) = 1/2 and s_1 having p(s_1) = 1/2,
we have :
I(s_i) = log_2 [ 1 / p(s_i) ] = log_2 [ 1 / (1/2) ] = log_2 (2) = 1 bit
Hence "one bit" is the amount of information that is gained when one of two possible and
equally likely (equiprobable) outputs occurs.

[Note that a "bit" is also used to refer to a binary digit when dealing with the transmission of
a sequence of 1's and 0's].
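
As a quick numerical check of equation (3.2), here is a minimal Python sketch (the probability values are chosen only for illustration):

```python
import math

def information_content(p):
    """Information gained, in bits, when a symbol of probability p occurs."""
    return math.log2(1 / p)

print(information_content(0.5))    # 1.0 bit  - equiprobable binary symbol
print(information_content(0.125))  # 3.0 bits - a rarer symbol carries more information
```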

Entropy
The amount of information, I(s_i), associated with the symbol s_i emitted by the source during a signalling interval depends on the symbol's probability of occurrence. In general, each source symbol has a different probability of occurrence. Since the source can emit any one of the symbols of its alphabet, a measure for the average information content per source symbol was defined and called the entropy of the discrete source, H (i.e. taking all the discrete source symbols into account).

Definition
The entropy, H, of a discrete memoryless source with source alphabet composed of the set S = { s_0, s_1, s_2, ......., s_(k-1) }, is a measure of the average information content per source symbol, and is given by:

H = Σ_{i=0}^{k-1} p(s_i) I(s_i) = Σ_{i=0}^{k-1} p(s_i) log_2 [ 1 / p(s_i) ]   bits/symbol        (3.3)



We note that the entropy, H, of a discrete memoryless source with an alphabet of k symbols is bounded as follows:

0 ≤ H ≤ log_2 k , where k is the number of source symbols.


Furthermore, we may state that:
1- H = 0, if and only if the probability p(s_i) = 1 for some symbol s_i, and the probabilities of the remaining source symbols are all zero. This lower bound on entropy corresponds to no uncertainty and no information.
2- H = log_2 k bits/symbol, if and only if p(s_i) = 1/k for all the k source symbols (i.e. they are all equiprobable). This upper bound on entropy corresponds to maximum uncertainty and maximum information.
Example:
Calculate the entropy of a discrete memoryless source with source alphabet S = { s_0, s_1, s_2 } with probabilities p(s_0) = 1/4, p(s_1) = 1/4, p(s_2) = 1/2.

H = Σ_{i=0}^{k-1} p(s_i) log_2 [ 1 / p(s_i) ]

  = p(s_0) log_2 [ 1/p(s_0) ] + p(s_1) log_2 [ 1/p(s_1) ] + p(s_2) log_2 [ 1/p(s_2) ]

  = (1/4) log_2 (4) + (1/4) log_2 (4) + (1/2) log_2 (2)

  = 0.5 + 0.5 + 0.5 = 1.5 bits/symbol
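
A minimal Python sketch to verify this entropy calculation (the probability list is taken from the example above):

```python
import math

def entropy(probs):
    """Entropy H of a discrete memoryless source, in bits/symbol."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

print(entropy([0.25, 0.25, 0.5]))  # 1.5 bits/symbol
```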





Information Rate
If the symbols are emitted from the source at a fixed rate (one symbol per signalling interval), denoted by r_s symbols/second, we can define the average source information rate R in bits per second as the product of the average information content per symbol, H, and the symbol rate r_s:

R = r_s H   bits/sec        (3.4)


Example
A discrete source emits one of five symbols once every millisecond. The source symbols
probabilities are 1/2, 1/4, 1/8, 1/16, and 1/16 respectively.
Find the source entropy and information rate.


H = Σ_{i=0}^{k-1} p(s_i) log_2 [ 1 / p(s_i) ]   bits/symbol, where in this case k = 5

H = p(s_0) log_2 [1/p(s_0)] + p(s_1) log_2 [1/p(s_1)] + p(s_2) log_2 [1/p(s_2)] + p(s_3) log_2 [1/p(s_3)] + p(s_4) log_2 [1/p(s_4)]

  = (1/2) log_2 (2) + (1/4) log_2 (4) + (1/8) log_2 (8) + (1/16) log_2 (16) + (1/16) log_2 (16)

  = 0.5 + 0.5 + 0.375 + 0.25 + 0.25 = 1.875 bits/symbol

R = r_s H   bits/sec

The information rate R = (1/10^-3) x 1.875 = 1875 bits/second.
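
The same numbers can be checked with a short sketch that reuses the entropy() helper shown earlier (the symbol rate and probabilities are those of this example):

```python
import math

def entropy(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

probs = [1/2, 1/4, 1/8, 1/16, 1/16]
symbol_rate = 1000            # one symbol every millisecond -> 1000 symbols/sec

H = entropy(probs)            # 1.875 bits/symbol
R = symbol_rate * H           # 1875 bits/sec
print(H, R)
```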



Entropy of a Binary Memoryless Source:
To illustrate the properties of H, let us consider a memoryless digital binary source for which symbol 0 occurs with probability p_0 and symbol 1 with probability p_1 = (1 - p_0).
The entropy of such a source equals:

H = p_0 log_2 [ 1/p_0 ] + p_1 log_2 [ 1/p_1 ]

  = p_0 log_2 [ 1/p_0 ] + (1 - p_0) log_2 [ 1/(1 - p_0) ]   bits

We note that
1- When p_0 = 0, the entropy H = 0. This follows from the fact that x log x → 0 as x → 0.
2- When p_0 = 1, the entropy H = 0.
3- The entropy H attains its maximum value, H_max = 1 bit, when p_0 = p_1 = 1/2, that is, symbols 0 and 1 are equally probable. (i.e. H = log_2 k = log_2 2 = 1)
   (H_max = 1 can be verified by differentiating H with respect to p_0 and equating to zero)
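
A small sketch of this binary entropy function; plotting it would show the maximum of 1 bit at p_0 = 1/2 (the sample points below are arbitrary):

```python
import math

def binary_entropy(p0):
    """Entropy of a binary memoryless source with P(symbol 0) = p0."""
    if p0 in (0.0, 1.0):
        return 0.0            # x*log2(x) -> 0 as x -> 0
    p1 = 1 - p0
    return p0 * math.log2(1 / p0) + p1 * math.log2(1 / p1)

for p0 in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(p0, round(binary_entropy(p0), 3))   # maximum H = 1 bit at p0 = 0.5
```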


















CHANNEL CAPACITY


In Information Theory, the transmission medium is treated as an abstract and noisy filter
called the channel. The maximum rate of information transmission through a channel is
called the channel capacity, C.



Channel Coding Theorem
Shannon showed that, if the information rate R [remember that R = r_s H bits/sec] is equal to or less than C, i.e. R ≤ C, then there exists a coding technique which enables transmission over the noisy channel with an arbitrarily small frequency of errors.
[A converse to this theorem states that it is not possible to transmit messages without error if R > C]

Thus the channel capacity is defined as the maximum rate of reliable (error-free) information
transmission through the channel.

Now consider a binary source with an available alphabet of k discrete messages (or symbols) which are equiprobable and statistically independent (these messages could be either single-digit symbols or could be composed of several digits each, depending on the situation). We assume that each message sent can be identified at the receiver; therefore this case is often called the discrete noiseless channel. The maximum entropy of the source is log_2 k bits, and if T is the transmission time of each message (i.e. r_s = 1/T symbols/sec), the channel capacity is

C = R = r_s H = r_s log_2 k   bits per second.


To attain this maximum the messages must be equiprobable and statistically independent.
These conditions form a basis for the coding of the information to be transmitted over the
channel.
In the presence of noise, the capacity of this discrete channel decreases as a result of the
errors made in transmission.


In making comparisons between various types of communications systems, it is convenient to
consider a channel which is described in terms of bandwidth and signal-to-noise ratio.





Review of Signal to Noise Ratio
The analysis of the effect of noise on digital transmission will be covered later on in this
course but before proceeding, we will review the definition of signal to noise ratio. It is
defined as the ratio of signal power to noise power at the same point in a system. It is
normally measured in decibels.

Signal to Noise Ratio (dB) = 10 log_10 (S/N)   dB


Noise is any unwanted signal. In electrical terms it is any unwanted introduction of energy
tending to interfere with the proper reception and reproduction of transmitted signals.


Channel Capacity Theorem

Bit errors and signal bandwidth are of prime importance when designing a communications system. In digital transmission systems noise may change the value of a transmitted digit during transmission (e.g. change a high voltage to a low voltage or vice versa).

This raises the question: Is it possible to invent a system with no bit errors at the output even when we have noise introduced into the channel? Shannon's Channel Capacity Theorem (also called the Shannon-Hartley Theorem) answers this question:

C = B log_2 (1 + S/N)   bits per second,

where C is the channel capacity, B is the channel bandwidth in hertz and S/N is the signal-to-noise
power ratio (watts/watts, not dB).

Although this formula is restricted to certain cases (in particular certain types of random
noise), the result is of widespread importance to communication systems because many
channels can be modelled by random noise.

From the formula, we can see that the channel capacity, C, decreases as the available
bandwidth decreases. C is also proportional to the log of (1+S/N), so as the signal to noise
level decreases C also decreases.
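
As an illustration, a short sketch evaluating the Shannon-Hartley formula (the bandwidth and SNR values below are only examples; note that an SNR quoted in dB must first be converted to a power ratio):

```python
import math

def channel_capacity(bandwidth_hz, snr_db):
    """Shannon-Hartley capacity C = B*log2(1 + S/N), with S/N given in dB."""
    snr_ratio = 10 ** (snr_db / 10)          # convert dB to a power ratio
    return bandwidth_hz * math.log2(1 + snr_ratio)

# Example: a voice-grade channel with 3.4 kHz bandwidth and 30 dB SNR
print(round(channel_capacity(3400, 30)))     # about 33 889 bits/sec
```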

The channel capacity theorem is one of the most remarkable results of information theory. In
a single formula, it highlights the interplay between three key system parameters: Channel
bandwidth, average transmitted power (or, equivalently, average received power), and noise
at the channel output.

The theorem implies that, for a given average transmitted power S and channel bandwidth B, we can transmit information at the rate of C bits per second, with arbitrarily small probability of error, by employing sufficiently complex encoding systems. It is not possible to transmit at a rate higher than C bits per second by any encoding system without a definite probability of error.
Hence, the channel capacity theorem defines the fundamental limit on the rate of error-free
transmission for a power-limited, band-limited Gaussian channel. To approach this limit,
however, the noise must have statistical properties approximating those of white Gaussian
noise.



Problems:

1. A voice-grade channel of the telephone network has a bandwidth of 3.4 kHz.
(a) Calculate the channel capacity of the telephone channel for a signal-to-noise ratio of 30
dB.
(b) Calculate the minimum signal-to-noise ratio required to support information transmission
through the telephone channel at the rate of 4800 bits/sec.
(c) Calculate the minimum signal-to-noise ratio required to support information transmission
through the telephone channel at the rate of 9600 bits/sec.

2. Alphanumeric data are entered into a computer from a remote terminal through a voice-grade telephone channel. The channel has a bandwidth of 3.4 kHz and an output signal-to-noise ratio of 20 dB. The terminal has a total of 128 symbols. Assume that the symbols are equiprobable and that successive transmissions are statistically independent.
(a) Calculate the channel capacity.
(b) Calculate the maximum symbol rate for which error-free transmission over the channel is
possible.

3. A black-and-white television picture may be viewed as consisting of approximately 3 x 10^5 elements, each one of which may occupy one of 10 distinct brightness levels with equal probability. Assume (a) the rate of transmission is 30 picture frames per second, and (b) the signal-to-noise ratio is 30 dB.
Using the channel capacity theorem, calculate the minimum bandwidth required to support
the transmission of the resultant video signal.

4. What is the minimum time required for the facsimile transmission of one picture over a standard telephone circuit?
There are about 2.25 x 10^6 picture elements to be transmitted and 12 brightness levels are to be used for good reproduction. Assume all brightness levels are equiprobable. The telephone circuit has a 3-kHz bandwidth and a 30-dB signal-to-noise ratio (these are typical parameters).


THE BINARY SYMMETRIC CHANNEL

Usually when a "1" or a "0" is sent it is received as a "1" or a "0", but occasionally a "1" will
be received as a "0" or a "0" will be received as a "1".
Let's say that on the average 1 out of 100 digits will be received in error, i.e. there is a
probability p = 1/100 that the channel will introduce an error.
This is called a Binary Symmetric Channel (BSC), and is represented by the following
diagram.


[Figure: Representation of the Binary Symmetric Channel with an error probability of p — each transmitted digit (0 or 1) is received correctly with probability (1 - p) and inverted with probability p.]



Now let us consider the use of this BSC model .
Say we transmit one information digit coded with a single even parity bit . This means that if
the information digit is 0 then the codeword will be 00 , and if the information digit is a 1
then the codeword will be 11.

As the codeword is transmitted through the channel, the channel may (or may not) introduce
an error according to the following error patterns:
E = 00 i.e. no errors
E = 01 i.e. a single error in the last digit
E = 10 i.e. a single error in the first digit
E = 11 i.e. a double error

The probability of no error is the probability of receiving both the first and the second transmitted digits correctly.
Here we have to remember our discussion on joint probability:
p(AB) = p(A) p(B/A) = p(A) p(B) when the occurrence of either of the two outcomes is independent of the occurrence of the other.
Thus the probability of no error is equal to the probability of receiving each digit correctly. This probability, according to the BSC model, is equal to (1 - p) per digit, where p is the probability of one digit being received incorrectly.
Thus the probability of no error = (1 - p)(1 - p) = (1 - p)^2.

Similarly, the probability of a single error in the first digit = p ( 1- p)
and the probability of a single error in the second digit = (1 - p) p ,
i.e. the probability of a single error is equal to the sum of the above two probabilities ( since
the two events are mutually exclusive), i.e.

the probability of a single error ( when a code with block length, n = 2 , is used, as in this
case)
is equal to 2 p(1 - p)

Similarly, the probability of a double error in the above example (i.e. the error pattern E = 11) is equal to p^2.
In summary these probabilities would be
p(E = 00) = (1 - p)^2
p(E = 01) = (1 - p) p
p(E = 10) = p (1 - p)
p(E = 11) = p^2

and if we substitute p = 0.01 (given in the above example) we find that
p(E = 00) = (1 - p)^2 = 0.9801
p(E = 01) = (1 - p) p = 0.0099
p(E = 10) = p (1 - p) = 0.0099
p(E = 11) = p^2 = 0.0001
Thus the probability of a single error per codeword = (1 - p) p + p (1 - p) = 2 p (1 - p) = 0.0198.

This shows that if p < 1/2, then the probability of no error is higher than the probability of a single error occurring, which in turn is higher than the probability of a double error.
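
A short sketch tabulating these error-pattern probabilities for the n = 2 case (with p = 0.01 as above):

```python
from itertools import product

p = 0.01            # BSC digit-error probability
n = 2               # block length: one information digit plus one parity digit

# The probability of a specific error pattern is p^weight * (1 - p)^(n - weight)
for pattern in product("01", repeat=n):
    weight = pattern.count("1")
    prob = (p ** weight) * ((1 - p) ** (n - weight))
    print("E =", "".join(pattern), " prob =", prob)
```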

Again, if we consider a block code with block length n = 3, then the
probability of no error p(E = 000) = (1 - p)^3,
probability of an error in the first digit p(E = 100) = p (1 - p)^2,
probability of a single error per codeword p(1e) = 3 p (1 - p)^2,
probability of a double error per codeword p(2e) = C(3,2) p^2 (1 - p) = 3 p^2 (1 - p),
probability of a triple error per codeword p(3e) = p^3.

And again, if we have a code with block length n = 4, then the
probability of no error p(E = 0000) = (1 - p)^4,
probability of an error in the first digit p(E = 1000) = p (1 - p)^3,
probability of a single error per codeword p(1e) = 4 p (1 - p)^3,
probability of a double error per codeword p(2e) = C(4,2) p^2 (1 - p)^2 = 6 p^2 (1 - p)^2,
probability of a triple error per codeword p(3e) = C(4,3) p^3 (1 - p) = 4 p^3 (1 - p),
probability of four errors per codeword p(4e) = p^4.

And again, if we have a code with block length n = 5, then the
probability of no error p(E = 00000) = (1 - p)^5,
probability of an error in the first digit p(E = 10000) = p (1 - p)^4,
probability of a single error per codeword p(1e) = 5 p (1 - p)^4,
probability of a double error per codeword p(2e) = C(5,2) p^2 (1 - p)^3 = 10 p^2 (1 - p)^3,
probability of a triple error per codeword p(3e) = C(5,3) p^3 (1 - p)^2 = 10 p^3 (1 - p)^2,
probability of four errors per codeword p(4e) = C(5,4) p^4 (1 - p) = 5 p^4 (1 - p),
probability of five errors per codeword p(5e) = p^5.





From all of this discussion, we realise that if the error pattern (of length n) has weight e, then the probability of occurrence of e errors in a codeword with block length n is

C(n,e) p^e (1 - p)^(n-e).

We also realise that, since p < 1/2, we have (1 - p) > p, and

(1 - p)^n > p (1 - p)^(n-1) > p^2 (1 - p)^(n-2) > ...............

Therefore an error pattern of weight 1 is more likely to occur than an error pattern of weight 2, and so on.
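
A small sketch of this general expression, C(n,e) p^e (1 - p)^(n-e) (the block length and p below are arbitrary illustration values):

```python
from math import comb

def prob_e_errors(n, e, p):
    """Probability of exactly e errors in a block of n digits over a BSC."""
    return comb(n, e) * (p ** e) * ((1 - p) ** (n - e))

n, p = 5, 0.01
for e in range(n + 1):
    print(e, prob_e_errors(n, e, p))   # drops off rapidly with e, since p < 1/2
```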


















The Communications System from the channel Coding Theorem point of view


[Figure: block diagram — source → encoder → decoder → user]
Information Theory Summary


1- A discrete memoryless source (DMS) is one that outputs symbols taken from a fixed finite alphabet which has k symbols. These symbols form a set S = {s_0, s_1, s_2, ..., s_(k-1)}, where the occurrence of each symbol s_i at the output of the source has a probability of occurrence p(s_i). (The probabilities of occurrence of the symbols are called the source statistics.)

and

Σ_{i=0}^{k-1} p(s_i) = 1

2- The amount of information gained after observing the output symbol s_i, which occurs with probability p(s_i), is

I(s_i) = log_2 [ 1 / p(s_i) ]   bits,   i = 0, 1, 2, ..., (k-1)

3- The entropy, H, of a discrete memoryless source with source alphabet composed of the set S = {s_0, s_1, s_2, ..., s_(k-1)}, is a measure of the average information content per source symbol, and is given by:

H = Σ_{i=0}^{k-1} p(s_i) I(s_i) = Σ_{i=0}^{k-1} p(s_i) log_2 [ 1 / p(s_i) ]   bits/symbol

4- Information rate (bit rate) = symbol rate x entropy
   R = r_s H   bits/sec


5- C = BW log_2 ( 1 + S/N )   bits/sec


6- BSC = Binary Symmetric Channel

7- Prob of e errors in n digits = C(n,e) p^e (1 - p)^(n-e).


CHANNEL CODING

Suppose that we wish to transmit a sequence of binary digits across a noisy channel. If we
send a one, a one will probably be received; if we send a zero, a zero will probably be
received. Occasionally, however, the channel noise will cause a transmitted one to be
mistakenly interpreted as a zero or a transmitted zero to be mistakenly interpreted as a one.
Although we are unable to prevent the channel from causing such errors, we can reduce their
undesirable effects with the use of coding.
The basic idea is simple. We take a set of k information digits which we wish to transmit, annex to them r check digits, and transmit the entire block of n = k + r channel digits.
Assuming that the channel noise changes sufficiently few of these transmitted channel digits,
the r check digits may provide the receiver with sufficient information to enable it to detect
and/or correct the channel errors.
(The detection and/or correction capability of a channel code will be discussed at some length
in the following pages.)
Given any particular sequence of k message digits, the transmitter must have some rule for
selecting the r check digits. This is called channel encoding.
Any particular sequence of n digits which the encoder might transmit is called a codeword.
Although there are 2^n different binary sequences of length n, only 2^k of these sequences are codewords, because the r check digits within any codeword are completely determined by the k information digits. The set consisting of these 2^k codewords, of length n each, is called a code (sometimes referred to as a code book).
No matter which codeword is transmitted, any of the 2^n possible binary sequences of length n may be received if the channel is sufficiently noisy. Given the n received digits, the decoder must attempt to decide which of the 2^k possible codewords was transmitted.

Repetition codes and single-parity-check codes

Among the simplest examples of binary codes are the repetition codes, with k = 1, r arbitrary,
and n = k + r = 1 + r . The code contains two codewords, the sequence of n zeros and the
sequence of n ones.
We may call the first digit the information digit; the other r digits, check digits. The value of
each check digit (each 0 or 1) in a repetition code is identical to the value of the information
digit. The decoder might use the following rule:
Count the number of zeros and the number of ones in the received bits. If there are more received zeros than ones, decide that the all-zero codeword was sent; if there are more ones than zeros, decide that the all-one codeword was sent. If the number of ones equals the number of zeros, do not decide (just flag the error).
This decoding rule will decode correctly in all cases when the channel noise changes less
than half the digits in any one block. If the channel noise changes exactly half of the digits in
any one block, the decoder will be faced with a decoding failure (i.e. it will not decode the
received word into any of the possible transmitted codewords) which could result in an ARQ
(automatic request to repeat the message). If the channel noise changes more than half of the
digits in any one block, the decoder will commit a decoding error; i.e. it will decode the
received word into the wrong codeword.
If channel errors occur infrequently, the probability of a decoding failure or a decoding error
for a repetition code of long block length is very small indeed. However, repetition codes are not very useful: they have only two codewords and a very low information rate R = k/n (also called the code rate), since all but one of the digits are check digits.
We are usually more interested in codes which have a higher information rate.

Extreme examples of such very high rate codes are those which use a single parity-check digit. This check digit is taken to be the modulo-2 sum (exclusive-OR) of the codeword's (n - 1) information digits.
(The information digits are added according to the exclusive-OR binary operation: 0 + 0 = 0, 0 + 1 = 1, 1 + 0 = 1, 1 + 1 = 0.) If the number of ones in the information word is even, the modulo-2 sum of all the information digits will be equal to zero; if the number of ones in the information word is odd, their modulo-2 sum will be equal to one.
Even parity means that the total number of ones in the codeword is even; odd parity means that the total number of ones in the codeword is odd. Accordingly, the parity bit (or digit) is calculated and appended to the information digits to form the codeword.
This type of code can only detect errors. A single digit error (or any odd number of digit errors) will be detected, but any combination of two digit errors (or any even number of digit errors) will cause a decoding error. Thus the single-parity-check type of code cannot correct errors.
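
A tiny sketch of the even-parity encoding and checking just described:

```python
def add_even_parity(info_bits):
    """Append a parity bit so that the total number of ones is even."""
    parity = 0
    for b in info_bits:
        parity ^= b                      # modulo-2 (exclusive-OR) sum
    return info_bits + [parity]

def parity_fails(word):
    """True if the word violates even parity (an odd number of errors occurred)."""
    total = 0
    for b in word:
        total ^= b
    return total == 1

codeword = add_even_parity([1, 0, 1])    # -> [1, 0, 1, 0]
received = [1, 1, 1, 0]                  # a single-bit error in the second position
print(codeword, parity_fails(received))  # [1, 0, 1, 0] True -> error detected
```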

These two examples, the repetition codes and the single-parity-check codes, provide the extreme, relatively trivial, cases of binary block codes. (Although relatively trivial, single-parity-check codes are used quite often because they are simple to implement.)
The repetition codes have enormous error-correction capability but only one information bit
per block. The single-parity-check codes have very high information rate but since they
contain only one check digit per block, they are unable to do more than detect an odd number
of channel errors.
There are other codes which have moderate information rate and moderate error-correction/detection capability, and we will study a few of them.

These codes are classified into two major categories:
Block codes , and Convolutional codes.


In block codes, a block of k information digits is encoded to a codeword of n digits
(n > k). For each sequence of k information digits there is a distinct codeword of n digits.

In convolutional codes, the coded sequence of n digits depends not only on the k
information digits but also on the previous N - 1 information digits (N > 1). Hence the coded
sequence for a certain k information digits is not unique but depends on N - 1 earlier
information digits.
In block codes, k information digits are accumulated and then encoded into an n-digit
codeword. In convolutional codes, the coding is done on a continuous, or running, basis
rather than by accumulating k information digits.

We will start by studying block codes. (and if there is time we might come back to study
convolutional codes).






BLOCK CODES
The block encoder input is a stream of information digits. The encoder segments the input information digit stream into blocks of k information digits, and for each block it calculates r check digits and outputs a codeword of n digits, where n = k + r (or r = n - k).
The code efficiency (also known as the code rate) is k/n.
Such a block code is denoted as an (n,k) code.
Block codes in which the k information digits are transmitted unaltered first and followed by
the transmission of the r check digits are called systematic codes, as shown in figure 1
below.
Since systematic block codes simplify implementation of the decoder and are always used in
practice we will consider only systematic codes in our studies.
( A non-systematic block code is one which has the check digits interspersed between the
information digits. For Linear block codes it can be shown that a non systematic block code
can always be transformed into a systematic one).



[Figure 1: an (n,k) block codeword in systematic form — the k information digits C_1, C_2, ..., C_k are transmitted first, followed by the r check digits C_(k+1), ..., C_(n-1), C_n.]




LINEAR BLOCK CODES

Linear block codes are a class of parity check codes that can be characterized by the (n, k)
notation described earlier.
The encoder transforms a block of k information digits (an information word) into a longer
block of n codeword digits, constructed from a given alphabet of elements. When the
alphabet consists of two elements (0 and 1), the code is a binary code comprised of binary
digits (bits). Our discussion of linear block codes is restricted to binary codes.
Again, the k-bit information words form 2^k distinct information sequences referred to as k-tuples (sequences of k digits).
An n-bit block can form as many as 2^n distinct sequences, referred to as n-tuples.
The encoding procedure assigns to each of the 2^k information k-tuples one of the 2^n n-tuples.
A block code represents a one-to-one assignment, whereby the 2^k information k-tuples are uniquely mapped into a new set of 2^k codeword n-tuples; the mapping can be accomplished via a look-up table, or via some encoding rules that we will study shortly.
Definition:
An (n, k) binary block code is said to be linear if, and only if, the modulo-2 addition (C_i ⊕ C_j) of any two codewords, C_i and C_j, is also a codeword. This property thus means that (for a linear block code) the all-zero n-tuple must be a member of the code book (because the modulo-2 addition of a codeword with itself results in the all-zero n-tuple).
A linear block code, then, is one in which n-tuples outside the code book cannot be created by the modulo-2 addition of legitimate codewords (members of the code book).
For example, the set of all 2^4 = 16 4-tuples (or 4-bit sequences) is shown below:


0000 0001 0010 0011 0100 0101 0110 0111

1000 1001 1010 1011 1100 1101 1110 1111


an example of a block code ( which is really a subset of the above set ) that forms a linear
code is

0000 0101 1010 1111

It is easy to verify that the addition of any two of these 4 code words in the code book can
only yield one of the other members of the code book and since the all-zero n-tuple is a
codeword this code is a linear binary block code.

Figure 5.13 illustrates, with a simple geometric analogy, the structure behind linear block codes. We can imagine the total set comprised of 2^n n-tuples. Within this set (also called the vector space) there exists a subset of 2^k n-tuples comprising the code book. These 2^k codewords, or points, shown in bold "sprinkled" among the more numerous 2^n points, represent the legitimate or allowable codeword assignments.






An information sequence is encoded into one of the 2^k allowable codewords and then transmitted. Because of noise in the channel, a corrupted version of the sent codeword (one of the other 2^n n-tuples in the total n-tuple set) may be received.

The objective of coding is that the decoder should be able to decide whether the received word is a valid codeword, or whether it is a codeword which has been corrupted by noise (i.e. detect the occurrence of one or more errors). Ideally, of course, the decoder should be able to decide which codeword was sent even if this transmitted codeword was corrupted by noise, and this process is called error-correction.

Thinking about it, if one is going to attempt to correct errors in a received word represented by a sequence of n binary symbols, then it is absolutely essential not to allow the use of all 2^n n-tuples as legitimate codewords.
If, in fact, every possible sequence of n binary symbols were a legitimate codeword, then in the presence of noise one or more binary symbols could be changed, and one would have no possible basis for determining if a received sequence was any more valid than any other sequence.
Carrying this thought a little further, if one wished that the coding system would correct the
occurrence of a single error, then it is both necessary and sufficient that each codeword
sequence differs from every other codeword in at least 3 positions.
In fact, if one wished that the coding system would correct the occurrence of e errors, then it
is both necessary and sufficient that each codeword sequence differs from every other
codeword in at least (2e +1) positions.

DEFINITION
The number of positions in which any two codewords differ from each other is called the
Hamming distance, and is normally denoted by d .

For example:
Looking at the (n,k) = (4,2) binary linear block code, mentioned earlier, which has the following codewords:
C_1 = 0000
C_2 = 0101
C_3 = 1010
C_4 = 1111

we see that the Hamming distance, d:
between C_2 and C_3 is equal to 4
between C_2 and C_4 is equal to 2
between C_3 and C_4 is equal to 2

We also observe that the Hamming distance between C_1 and any of the other codewords is equal to the "weight", that is, the number of ones in each of the other codewords.

We can also see that the minimum Hamming distance (i.e. the smallest Hamming distance between any pair of the codewords), denoted by d_min, of this code is equal to 2.
(The minimum Hamming distance of a binary linear block code is simply equal to the minimum weight of its non-zero codewords. This is due to the fact that the code is linear, meaning that if any two codewords are added together modulo-2 the result will be another codeword. Thus, to find the minimum Hamming distance of a linear block code, all we need to do is to find the minimum-weight non-zero codeword.)
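
A small sketch computing the Hamming distances and the minimum distance of this (4,2) code:

```python
from itertools import combinations

def hamming_distance(a, b):
    """Number of positions in which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))

code = ["0000", "0101", "1010", "1111"]
d_min = min(hamming_distance(a, b) for a, b in combinations(code, 2))
print(d_min)   # 2 -> this code can detect, but not correct, a single error
```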

Looking at the above code again, and keeping in mind what we said earlier about the "Hamming distance" property required of the codewords for a code to correct a single error:
we said that, to correct a single error, this code must have each of its codewords differing from every other codeword in at least (2e + 1) positions, where e in our case is 1 (i.e. a single error). That is, the minimum Hamming distance of the code must be at least 3. Therefore the above-mentioned code cannot correct the result of the occurrence of a single error (since its d_min = 2), but it can detect it.

To explain this further, let us consider the diagram in Figure 2.

[Figure 2:
a) Hamming spheres of radius e = 1 around codewords C_1 and C_2; Hamming distance between the codewords = 3; the code can correct a single error since d = 2e + 1.
b) Hamming spheres of radius e = 1 around codewords C_1 and C_2; Hamming distance between the codewords = 2; the code can only detect e = 1 error but cannot correct it, because d = e + 1 (i.e. d < 2e + 1).]

Imagine that we draw a sphere (called a Hamming sphere) of radius e = 1 around each codeword. This sphere will contain all n-tuples which are at a distance 1 away from that codeword (i.e. all n-tuples which differ from this codeword in one position).
If the minimum Hamming distance of the code is d_min < 2e + 1 (as in figure 2b, where d = 2), the occurrence of a single error will result in changing the codeword to the next n-tuple, and the decoder does not have enough information to decide whether codeword C_1 or C_2 was transmitted. The decoder, however, can detect that an error has occurred.
If we look at figure 2a we see that the code has d_min = 2e + 1 and that the occurrence of a single error results in the next n-tuple being received; in this case the decoder can make an unambiguous decision, based on what is called the nearest neighbour decoding rule, as to which of the two codewords was transmitted.
If the corrupted received n-tuple is not too unlike (not too distant from) a valid codeword, the decoder can make a decision that the transmitted codeword was the codeword "nearest in distance" to the received word.

Thus, in general, we can say that a binary linear code will correct e errors
if d_min = 2e + 1 (for odd d_min), or
if d_min = 2e + 2 (for even d_min).







A (6 , 3) Linear Block Code Example

Examine the following coding assignment that describes a (6, 3) code. There are 2^k = 2^3 = 8 information words, and therefore eight codewords.
There are 2^n = 2^6 = sixty-four 6-tuples in the total 6-tuple set (or vector space).

Information digits        Codewords
C_1 C_2 C_3               C_1 C_2 C_3 C_4 C_5 C_6

000                       000000
001                       001011
010                       010110
011                       011101
100                       100101
101                       101110
110                       110011
111                       111000

The parity-check equations for this code are
C_4 = C_1 ⊕ C_2
C_5 = C_2 ⊕ C_3
C_6 = C_1 ⊕ C_3

and its H matrix is

H = [ 1 1 0 1 0 0 ]
    [ 0 1 1 0 1 0 ]
    [ 1 0 1 0 0 1 ]



It is easy to check that the eight codewords shown above form a linear code (the all-zeros codeword is present, and the sum of any two codewords is another codeword, a member of the code). Therefore, these codewords represent a linear binary block code.

It is also easy enough to check that the minimum Hamming distance of the code is d_min = 3; thus we conclude that this code is a single-error-correcting code, since d_min = 2e + 1 (for odd d_min).








In the simple case of single-parity-check codes, the single parity was chosen to be the
modulo-2 sum of all the information digits.
Linear block codes contain several check digits, and each check digit is a function of the
modulo-2 sum of some (or all) of the information digits.

Let us consider the (6, 3) code, i.e. n = 6, k = 3, and there are r = n - k = 3 check digits.

We shall label the three information digits C_1, C_2, C_3 and the three check digits C_4, C_5 and C_6.
Let's choose to calculate the check digits from the information digits according to the following rules (each one of these equations must be independent of any or all of the others):

C_4 = C_1 ⊕ C_2
C_5 = C_2 ⊕ C_3
C_6 = C_1 ⊕ C_3

or in matrix notation

[ C_4 ]   [ 1 1 0 ] [ C_1 ]
[ C_5 ] = [ 0 1 1 ] [ C_2 ]
[ C_6 ]   [ 1 0 1 ] [ C_3 ]


The full codeword consists of the digits C_1, C_2, C_3, C_4, C_5, C_6.

Generally the n-tuple codeword is denoted as C = [C_1, C_2, C_3, C_4, C_5, C_6]

Every codeword must satisfy the parity-check equations

C_1 ⊕ C_2 ⊕ C_4 = 0
C_2 ⊕ C_3 ⊕ C_5 = 0
C_1 ⊕ C_3 ⊕ C_6 = 0

or in matrix notation

[ 1 1 0 1 0 0 ]   [ C_1 ]     [ 0 ]
[ 0 1 1 0 1 0 ]   [ C_2 ]  =  [ 0 ]
[ 1 0 1 0 0 1 ]   [ C_3 ]     [ 0 ]
                  [ C_4 ]
                  [ C_5 ]
                  [ C_6 ]

which can be written a little more compactly as

[ 1 1 0 1 0 0 ]          [ 0 ]
[ 0 1 1 0 1 0 ]  C^t  =  [ 0 ]
[ 1 0 1 0 0 1 ]          [ 0 ]

Here C^t denotes the column vector which is the transpose of the codeword
C = [C_1, C_2, C_3, C_4, C_5, C_6].
Even more compactly, we can write these parity-check equations as

H C^t = 0

where 0 is the three-dimensional column vector whose components are all zeros, and H is called the parity-check matrix. Thus in our example

H = [ 1 1 0 1 0 0 ]
    [ 0 1 1 0 1 0 ]
    [ 1 0 1 0 0 1 ]

Note that each row of the parity-check matrix is independent of all the other rows; we say that these rows are linearly independent (i.e. we cannot obtain any row by the linear addition of any combination of the other rows).

The 2^3 = 8 codewords in the code are


Information digits        Codewords
C_1 C_2 C_3               C_1 C_2 C_3 C_4 C_5 C_6


000 000000
001 001011
010 010110
011 011101
100 100101
101 101110
110 110011
111 111000


After the information sequence is encoded into the full codeword, the codeword is transmitted across the noisy channel.
The channel adds to this codeword the "noise word", also called the error pattern,
E = [E_1, E_2, E_3, E_4, E_5, E_6]

where E_i = 0 if the channel does not change the ith digit, and
      E_i = 1 if the channel changes the ith digit.

The received word is given by the sequence R = [R_1, R_2, R_3, R_4, R_5, R_6]

where R = C ⊕ E (i.e. R_i = C_i ⊕ E_i)

(note that E = R ⊕ C, since addition modulo-2 is the same as subtraction modulo-2)
For example, say that the transmitted codeword was C = [110011]
and the received word was R = [110111].
We can say that the error pattern was E = [000100].
If we multiply the transpose of the received word by the parity-check matrix H, what do we get?

H R^t = H (C ⊕ E)^t = H C^t ⊕ H E^t = H E^t = S^t     (since H C^t = 0)

The r-tuple S = [S_1, S_2, S_3] is called the syndrome.
This shows that the syndrome test, whether performed on the corrupted received word or on the error pattern that caused it, yields the same syndrome.
Since the syndrome digits are defined by the same equations as the parity-check equations, the syndrome digits reveal the parity-check failures on the received codeword. (This happens because the code is linear. An important property of linear block codes, fundamental to the decoding process, is that the mapping between correctable error patterns and syndromes is one-to-one, and this means that we not only can detect an error but we can also correct it.)

For example, using the received word given above, R = [110111]:

H R^t = [ 1 1 0 1 0 0 ] [ 1 ]     [ 1 ]
        [ 0 1 1 0 1 0 ] [ 1 ]  =  [ 0 ]  = S^t ,
        [ 1 0 1 0 0 1 ] [ 0 ]     [ 0 ]
                        [ 1 ]
                        [ 1 ]
                        [ 1 ]

where S = [S_1, S_2, S_3] = [100]

and, as we can see, the syndrome equals the fourth column of H, which points to the fourth bit being in error.
Now all the decoder has to do (after calculating the syndrome) is to invert the fourth bit position in the received word to produce the codeword that was sent, i.e. C = [110011].
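
A minimal sketch of this syndrome-decoding procedure for the (6,3) code above (single-error correction only):

```python
import numpy as np

# Parity-check matrix of the (6,3) code discussed above
H = np.array([[1, 1, 0, 1, 0, 0],
              [0, 1, 1, 0, 1, 0],
              [1, 0, 1, 0, 0, 1]])

def decode(received):
    """Correct a single error by matching the syndrome to a column of H."""
    r = np.array(received)
    syndrome = H.dot(r) % 2                  # S^t = H R^t (modulo-2)
    if not syndrome.any():
        return r.tolist()                    # zero syndrome: accept the word as sent
    for col in range(H.shape[1]):
        if np.array_equal(H[:, col], syndrome):
            r[col] ^= 1                      # flip the digit flagged by that column
            break
    return r.tolist()

print(decode([1, 1, 0, 1, 1, 1]))            # -> [1, 1, 0, 0, 1, 1]
```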

Having obtained a feel for what channel coding and decoding are about, let's apply this knowledge to a particular type of linear binary block code called the Hamming codes.

















HAMMING CODES

These are linear binary single-error-correcting codes having the property that the columns of the parity-check matrix, H, consist of all the distinct non-zero binary sequences of length r. Thus a Hamming code has as many parity-check-matrix columns as there are single-error sequences; these codes will correct all patterns of single errors in any transmitted codeword.

These codes have n = k + r, where n = 2^r - 1 and k = 2^r - 1 - r.
These codes have a guaranteed minimum Hamming distance d_min = 3.

For example, the parity-check matrix for the (7,4) Hamming code is

H = [ 1 1 1 0 1 0 0 ]
    [ 1 1 0 1 0 1 0 ]
    [ 1 0 1 1 0 0 1 ]


a) Determine the codeword for the information sequence 0011
b) If the received word, R, is 1000010, determine if an error has occurred. If it has, find the
correct codeword.

Solution:
a) Since H C^t = 0, we can use this equation to calculate the parity digits for the given information sequence as follows:

H C^t = [ 1 1 1 0 1 0 0 ] [ C_1 ]     [ 0 ]
        [ 1 1 0 1 0 1 0 ] [ C_2 ]  =  [ 0 ]
        [ 1 0 1 1 0 0 1 ] [ C_3 ]     [ 0 ]
                          [ C_4 ]
                          [ C_5 ]
                          [ C_6 ]
                          [ C_7 ]

i.e., with the information digits C_1 C_2 C_3 C_4 = 0 0 1 1,

[ 1 1 1 0 1 0 0 ] [ 0 ]     [ 0 ]
[ 1 1 0 1 0 1 0 ] [ 0 ]  =  [ 0 ]
[ 1 0 1 1 0 0 1 ] [ 1 ]     [ 0 ]
                  [ 1 ]
                  [ C_5 ]
                  [ C_6 ]
                  [ C_7 ]

By multiplying out the first row of H by the transpose of the codeword (the left-hand side) we get
1.0 ⊕ 1.0 ⊕ 1.1 ⊕ 0.1 ⊕ 1.C_5 ⊕ 0.C_6 ⊕ 0.C_7 = 0
0 ⊕ 0 ⊕ 1 ⊕ 0 ⊕ C_5 ⊕ 0 ⊕ 0 = 0
i.e. 1 ⊕ C_5 = 0 and C_5 = 1.

Similarly, by multiplying out the second row of the H matrix by the transpose of the codeword we obtain
1.0 ⊕ 1.0 ⊕ 0.1 ⊕ 1.1 ⊕ 0.C_5 ⊕ 1.C_6 ⊕ 0.C_7 = 0
0 ⊕ 0 ⊕ 0 ⊕ 1 ⊕ 0 ⊕ C_6 ⊕ 0 = 0
i.e. 1 ⊕ C_6 = 0 and C_6 = 1.

Similarly, by multiplying out the third row of the H matrix by the transpose of the codeword we obtain
1.0 ⊕ 0.0 ⊕ 1.1 ⊕ 1.1 ⊕ 0.C_5 ⊕ 0.C_6 ⊕ 1.C_7 = 0
0 ⊕ 0 ⊕ 1 ⊕ 1 ⊕ 0 ⊕ 0 ⊕ C_7 = 0
i.e. 1 ⊕ 1 ⊕ C_7 = 0 and C_7 = 0.

so that the codeword is
C = [C_1, C_2, C_3, C_4, C_5, C_6, C_7] = 0011110
b) To find whether an error has occurred or not, we use the equation H R^t = S^t. If the syndrome is zero then no error has occurred; if not, an error has occurred and is pinpointed by the syndrome.

Thus to compute the syndrome we multiply out the rows of H by the transpose of the
received word.


H R^t = [ 1 1 1 0 1 0 0 ] [ 1 ]     [ 1 ]
        [ 1 1 0 1 0 1 0 ] [ 0 ]  =  [ 0 ]
        [ 1 0 1 1 0 0 1 ] [ 0 ]     [ 1 ]
                          [ 0 ]
                          [ 0 ]
                          [ 1 ]
                          [ 0 ]

Because the syndrome is the third column of the parity-check matrix, the third position of the received word is in error and the correct codeword is 1010010.


The Generator Matrix of a linear binary block code

We saw above that the parity-check matrix of a systematic linear binary block code can be written in the following (n-k) by n matrix form

H = [ h  I_(n-k) ]

The generator matrix of this same code is written in the following k by n matrix form

G = [ I_k  h^t ]

The generator matrix is useful in obtaining the codeword from the information sequence according to the following formula

C = m G

where
C is the codeword [C_1, C_2, ......... , C_(n-1), C_n],
m is the information digit sequence [m_1, m_2, ....., m_k], and
G is the generator matrix of the code as given by the formula for G above.

Thus if we consider the single-error-correcting (n,k) = (7,4) Hamming code discussed previously, its parity-check matrix was

H = [ 1 1 1 0 1 0 0 ]
    [ 1 1 0 1 0 1 0 ]
    [ 1 0 1 1 0 0 1 ]

and thus its generator matrix would be

G = [ 1 0 0 0 1 1 1 ]
    [ 0 1 0 0 1 1 0 ]
    [ 0 0 1 0 1 0 1 ]
    [ 0 0 0 1 0 1 1 ]

Now if we had an information sequence given by the digits 0011, the codeword would be given by C = m G, i.e.

C = [ 0 0 1 1 ] [ 1 0 0 0 1 1 1 ]  =  0011110
                [ 0 1 0 0 1 1 0 ]
                [ 0 0 1 0 1 0 1 ]
                [ 0 0 0 1 0 1 1 ]
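
A short sketch of this encoding step and of the parity check H C^t = 0, using the (7,4) matrices above:

```python
import numpy as np

# Generator and parity-check matrices of the (7,4) Hamming code above
G = np.array([[1, 0, 0, 0, 1, 1, 1],
              [0, 1, 0, 0, 1, 1, 0],
              [0, 0, 1, 0, 1, 0, 1],
              [0, 0, 0, 1, 0, 1, 1]])
H = np.array([[1, 1, 1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0, 1, 0],
              [1, 0, 1, 1, 0, 0, 1]])

m = np.array([0, 0, 1, 1])          # information sequence from the example
codeword = m.dot(G) % 2             # C = m G  (modulo-2 arithmetic)
print(codeword)                     # [0 0 1 1 1 1 0]

print(H.dot(codeword) % 2)          # [0 0 0] -> every valid codeword satisfies H C^t = 0
```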


Thus, for the (n,k) = (7,4) Hamming code, the generator matrix and the code book are:

H = [ h  I_(n-k) ]

G = [ I_k  h^t ]

G = [ 1 0 0 0 1 1 1 ]   row 1
    [ 0 1 0 0 1 1 0 ]   row 2
    [ 0 0 1 0 1 0 1 ]   row 3
    [ 0 0 0 1 0 1 1 ]   row 4

      Combination of rows of G                 Codeword
 1    row 1                                    1000111
 2    row 2                                    0100110
 3    row 3                                    0010101
 4    row 4                                    0001011
 5    row 1 ⊕ row 2                            1100001
 6    row 1 ⊕ row 3                            1010010
 7    row 1 ⊕ row 4                            1001100
 8    row 2 ⊕ row 3                            0110011
 9    row 2 ⊕ row 4                            0101101
10    row 3 ⊕ row 4                            0011110
11    row 1 ⊕ row 2 ⊕ row 3                    1110100
12    row 1 ⊕ row 2 ⊕ row 4                    1101010
13    row 1 ⊕ row 3 ⊕ row 4                    1011001
14    row 2 ⊕ row 3 ⊕ row 4                    0111000
15    row 1 ⊕ row 2 ⊕ row 3 ⊕ row 4            1111111
16    row 1 ⊕ row 1 (or 2 ⊕ 2, 3 ⊕ 3, 4 ⊕ 4)   0000000

Potrebbero piacerti anche