
DIGITAL TRANSMISSION TECHNIQUES




Introduction to Communication Systems

Telecommunications is about transferring information from one location to another. This includes many
forms of information:

Telephone conversations,
Television signals,
Radio broadcast signals,
Computer files,
Other types of data.

To transfer the information, you need a channel between the two locations. This may be a wire pair, radio
signal, optical fiber, etc. Telecommunications companies receive payment for transferring their customers'
information. The financial bottom line is simple: the more information they can pass through a single
channel, the more money they make.

A communication (comm) system is concerned with the transfer of information from one location (the sender,
source, originator, or transmitter) to a second location (the destination, sink, or receiver) some distance away.
A comm system uses electrical, electronic or electromagnetic energy to transmit information. Electrical
signals travel at close to the speed of light; hence, information transfer can occur almost instantaneously.
Electrical energy can be transmitted:

Directly over wire line, cable, or optical fiber
Radiated through the atmosphere, as in terrestrial microwave links
Radiated into space, as in satellite or deep space communication

Examples of practical communication systems:

Telephone
Radio and Television
Cable TV
Wireless Personal Communication Systems
Cellular phones, beepers, pagers, etc.
Computer interconnections as in Internet or WWW
Computer Networks (LAN, WAN or MAN)
Satellite/Deep Space Communication (Telemetry)

An engineer with expertise in communication systems:

Is involved in the analysis, design and development of communication systems
Is responsible for integrating electronic devices into functional communication systems


Most basic representation of any communication system

[Figure: basic block diagram — information source/transmitter, channel, receiver/destination.]


Each block performs a unique function in the communication process and the performance of one can
affect the other. Basic representation can be expanded.


Digital communication system

[Figure: expanded block diagram of a digital communication system.]

Advantages of Digital Systems Compared to Analogue Systems

Pure digital processing offers many advantages over traditional analogue AM and FM systems, including:

Versatility and Flexibility:
Digital systems can be reprogrammed for other applications (at least where programmable DSP
chips are used)
Digital systems can be ported to different hardware (for example a different DSP chip or board
level product)

Repeatability and reproducibility:

Digital systems can be easily duplicated.
Digital systems do not depend on strict component tolerances.
Digital system responses do not drift with temperature.
Digital systems are accurate

Simplicity:

Some things can be done more easily digitally than with analogue systems.

Ease of design:

No special math skills needed to visualize the behaviour of small digital logic elements

Economy:
Lower equipment costs
Lower installation costs.
Lower maintenance costs
Lower overall cost

Performance consistency:

Performance does not change with temperature or other environmental variables

Signal extension:
Capacity is not limited to a single signal input.
Speed:
A digital logic element can produce an output in less than 10 nanoseconds.

For the past quarter century, fiber optic communication has been hailed as the superior method for
transmitting video, audio, data and various analog signals. Fiber offers many well-known advantages over
twisted pair and coaxial cable, including immunity to electrical interference and superior bandwidth. For
these and many other reasons, fiber optic transmission systems have been increasingly integrated into a
wide range of applications across many industries. However, while fiber optic transmission has
dramatically improved upon the standards obtained via conventional copper-based transmission mediums,
until recently, fiber optic technology has continued to use the same analog-based signaling techniques as its
predecessors. Now, a new generation of products that employs pure digital signaling to transmit analog
information offers the opportunity to raise the standard once again, bringing fiber optic transmission to a
whole new level. Digital systems offer superior performance, flexibility and reliability, and yet don't cost
any more than the older analog designs they replace. This section examines how digital signaling
over fiber is accomplished and the resulting benefits, from both a performance and an economic perspective.

Unfortunately, both LEDs and laser diodes are nonlinear devices. This means that it is difficult to control
the brightness of their light smoothly over a continuum, from completely off to completely on with all
variations in between. However, in an AM system, this is exactly how they are used. As a result, various
distortions to the transmitted signal occur, such as:

Degradation in the signal-to-noise ratio, or SNR, as the length of the fiber optic cable is increased
Nonlinear differential gain and phase errors of video signals
Poor dynamic range of audio signals

In an effort to improve upon the performance of AM-based fiber optic transmission systems, FM design
techniques were introduced. In these systems, the signal is conveyed by pulsing the LED or laser diode
completely on and off, with the rate of pulsing varying with respect to the original incoming signal. For
those familiar with FM transmission over mediums other than fiber, the term FM for this type of system
may be a bit confusing. FM stands for frequency modulation and might be interpreted, within the context
of fiber optics, to mean that the frequency of the light itself is modulated to transmit the signal. This is not
the case. A more accurate name for this type of system is PPM, which stands for Pulse Position Modulation.
However, across industries, FM and not PPM is the name associated with these types of systems. Just
remember that the "F" here refers to the frequency of the pulses, not to the frequency of the light itself.

While FM transmission systems eliminate many of the problems found in AM systems, which result from
difficulties in controlling the varying brightness level of light emanating from the diode, FM systems offer
their own unique set of problems. One distortion common in FM systems is called crosstalk. This occurs
when multiple FM carriers are transmitted over a single optical fiber, such as when using a multiplexer.
Crosstalk originates within either the transmitter or receiver unit and is the result of a drift in alignment of
critical filtering circuits designed to keep each carrier separate. When the filters do not perform properly,
one FM carrier may interfere with and distort another FM carrier. Fiber optic engineers can design FM
systems that minimize the likelihood of crosstalk occurring, but any improvement in design also means an
increase in price.

Another type of distortion is called intermodulation. Like crosstalk, this problem occurs in systems
designed to transmit multiple signals over a single fiber. Intermodulation originates in the transmitter unit
and is most often the result of a non-linearity present in a circuit common to the FM carriers. The result is
that the two (or more) original incoming signals interfere with each other before being combined into a
single optical signal, causing reduced fidelity in the transmitted optical signal.

As in analog-based fiber systems, transmitters in digital systems take in baseband video, audio and data
signals and the receivers output these signals in their original format. The digital difference occurs in
how the signals are processed and transmitted between transmitter and receiver. In a pure digital system,
the incoming base band signals are immediately run through analog to digital converters within the
transmitter. This converts the incoming signal or signals to a series of 1s and 0s, called digital streams.
Then, if more than one signal has been processed, the transmitter combines all the resulting digital streams
into a single digital stream. This combined stream is used to turn on and off the emitting diode at a very
high speed, corresponding to the 1s and 0s to be transmitted. At the receiving end, the process performed
by the transmitter is reversed. The combined digital bit stream is separated into multiple bit streams,
representing each of the unique, transmitted signals. These are then run through digital to analog
converters, and the receiver outputs video, audio and data in the same, analog format in which the signals
originated.
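
The sketch below is a much-simplified illustration (not any particular vendor's implementation) of the transmit/receive path just described: two baseband inputs are run through a toy A/D converter, their sample streams are interleaved into a single combined digital stream, and the receiver reverses the process. The function names, the 8-bit resolution and the 8 kHz rate are all illustrative choices.

# Simplified sketch (illustrative only) of the digital fiber transmit/receive path:
# digitize two analog inputs, interleave the resulting streams into one combined
# stream, then split and convert back at the "receiver".
import numpy as np

def adc(signal, bits=8):
    """Quantise samples in [-1, 1] to unsigned integer codes (a toy A/D converter)."""
    levels = 2 ** bits
    codes = np.clip(np.round((signal + 1.0) / 2.0 * (levels - 1)), 0, levels - 1)
    return codes.astype(np.uint8)

def dac(codes, bits=8):
    """Map integer codes back to amplitudes in [-1, 1] (a toy D/A converter)."""
    levels = 2 ** bits
    return codes.astype(float) / (levels - 1) * 2.0 - 1.0

fs = 8000                                   # assumed sampling rate, samples/s
t = np.arange(0, 0.01, 1 / fs)
video_like = np.sin(2 * np.pi * 440 * t)    # stand-ins for two baseband inputs
audio_like = 0.5 * np.sin(2 * np.pi * 1000 * t)

# Transmitter: A/D convert each input, then interleave the streams sample by sample.
stream = np.empty(2 * len(t), dtype=np.uint8)
stream[0::2] = adc(video_like)
stream[1::2] = adc(audio_like)
# 'stream' is the single combined digital stream that would key the laser/LED on and off.

# Receiver: de-interleave and D/A convert back to baseband.
out1, out2 = dac(stream[0::2]), dac(stream[1::2])
print("worst-case reconstruction error:", np.max(np.abs(out1 - video_like)))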



Let's examine some of these in detail and discuss their direct benefits both to the system installer and to the
user of the system. Unlike in AM and FM systems, pure digital transmission guarantees that the fidelity of
the baseband video, audio and data signals will remain constant throughout the system's entire available
optical budget. This is true whether you are transmitting one or multiple signals through the fiber, over
short or long distances (up to the longest distance allowed by the system). By contrast, analog systems
using AM signaling techniques exhibit a linear degradation in signal quality over the entire transmission
path. This characteristic, combined with the fact that AM systems can only transmit over multimode fiber,
limits the use of these systems to applications in which a relatively short transmission distance must be
covered. FM systems fare a bit better, with signal quality remaining relatively constant over short
transmission distances, but decreasing dramatically at longer distances. And even at short distances, there is
some signal degradation. Only systems that use pure digital transmission techniques can claim to have an
absolutely uniform signal quality over the entire transmission path, from transmitter to receiver.



[Figure: comparison of signal degradation in AM, FM and digital systems as a function of transmission distance — AM degrades linearly, FM degrades sharply at long distances, digital remains constant.]

Consistent signal quality, or lack thereof, is also an issue when considering systems designed to transmit
more than one signal over a single fiber (multiplexers). For example, a multichannel analog system
designed to carry four channels of video or audio might restrict the bandwidth allocated to each signal in
order to accommodate all the desired signals to be transmitted. Systems using digital transmission need not
make this compromise. Whether sending one, four, or even ten signals over a single fiber, the fidelity of
each signal remains the same.

Due to the inherent design of digital transmission systems, baseband analog signals maintain a higher
fidelity when transmitted using digital processing techniques. This is because the only signal distortion that occurs
in a digital system is that resulting from the analog to digital conversion and back again. While no
conversion is perfect, the state of today's A/D and D/A converter technology is so advanced that even the
lowest cost converters can produce video and audio quality that far exceeds that attainable by conventional
AM and FM based systems. Evidence of this can be seen by comparing signal-to-noise ratios and
measurements of nonlinear distortions (differential phase and differential gain) between digital and analog
products. Comparing systems designed to transmit similar signal types over the same fiber and at the same
wavelength, digital systems will consistently offer higher signal-to-noise ratios and lower levels of
nonlinear distortion.

In addition, digital design offers tremendous flexibility to fiber optic engineers who wish to scale the
performance level of a product to meet the needs of different markets, applications and budgets. By
changing the number of source coding bits used in the analog to digital conversion, engineers control the
transmission bandwidth of a system, affecting the product's ultimate performance and price. However,
regardless of the number of bits used, the other digital properties of the system remain constant, including
lack of distortion and consistent performance up to the maximum supported transmission distance. This
makes it easy for engineers to anticipate how different variations of a given system design will work if
scaled for high-end and lower end performance requirements. When designing analog systems, engineers
are constantly struggling with ways in which price and performance may be balanced without
compromising critical qualities of the transmitted signals.

Given the many advantages that digital systems offer, one might think that they cost more than traditional
analog systems. This is not the case. In fact, users of digital systems save money on many levels.
For starters, the cost of digital components has decreased greatly in recent years, resulting in the ability of
fiber optic manufacturers to design and manufacture products that are priced similarly or even less than
earlier generation analog products. Of course some manufacturers would like the public to believe that the
superior performance of digital systems can only be obtained for a premium price, but in reality, they are
simply choosing not to pass their own savings on to their customers. By competitive shopping, users of
fiber optic transmission systems should be able to find products offering digital performance at a better
price than analog systems.

Another advantage of digital systems over their analog predecessors is their ability to regenerate a
transmitted signal without incurring any additional degradation to the original baseband video, audio or
data signal. This is accomplished by using a device called a repeater. As light travels through a length of
fiber, its optical power gradually attenuates. Eventually, not enough light remains to be detected by a
receiver. By placing a repeater on the fiber at a point prior to where the light diminishes too much to be
detected, the repeater can regenerate and restore the digital signal to its original form. This regenerated
signal is then launched again on the same wavelength, with the same optical power found at the original
transmission point. It is important to note that the baseband video, audio or data signal is never actually
recovered in a repeater; only the data stream representing the original signal or signals is processed. In this
manner, the quality of the original baseband signal is not degraded, regardless of how many times the
signal is repeated and over how great a distance. The advantages this offers to the system designer are
obvious. Not only can tremendous distances be covered that far exceed the capability of any AM or FM
system, but the designer can also be assured that the quality of the received signal will always be consistent
and meet the performance requirements of the application.

Another cost factor to consider is the amount of maintenance and repair that a system is likely to require
throughout its years of use. Once again, digital systems provide a clear advantage. First of all, digital
systems require no adjustments in the installation process. Transmitter, cable and receiver are simply
connected together and the system is ready to go. By contrast, analog systems often need tuning and
adjustments to accommodate for transmission distance and signal strength. Additional setup time translates
to additional costs. Then, once a system is fully operational, digital systems are far more likely to perform
consistently and reliably over time. This is because by design, digital systems have far fewer components
that may break or malfunction. There is no possibility for the system to require retuning or adjustments.
And, the user is likely to spend less time troubleshooting because the digital design makes it immune to
interference, crosstalk, drifting, and other performance issues that plague traditional analog systems. Lower
transmitter and receiver costs; lower cable costs; lower maintenance costs. Digital fiber optic transmission
systems provide a clear economic advantage at every level.

Sampling and reconstruction

Most of the signals directly encountered in communication are continuous:

Light intensity that changes with distance;
Voltage that varies over time;

Analog-to-Digital Conversion (ADC) and Digital-to-Analog Conversion (DAC) are the processes that convert
signals between the continuous (analog) and digital domains so that they can be processed and transmitted
digitally. Digital information is different from its continuous counterpart in
two important respects:

It is sampled,
It is quantized.

Both of these restrict how much information a digital signal can contain. The analogue signal - a
continuous variable defined with infinite precision - is converted to a discrete sequence of measured values
that are represented digitally. Information is lost in converting from analogue to digital, due to:

Inaccuracies in the measurement
Uncertainty in timing
Limits on the duration of the measurement

These effects are called quantisation errors. Information management is about understanding what
information you need to retain, and what information you can afford to lose. In turn, this dictates the
selection of the:

Sampling frequency,
Number of bits,
Type of analog filtering needed for converting between the analog and digital realms.

The Sampling Theorem

The definition of proper sampling is quite simple. Suppose you sample a continuous signal in some
manner. If you can exactly reconstruct the analog signal from the samples, you must have done the
sampling properly. Even if the sampled data appears confusing or incomplete, the key information has been
captured if you can reverse the process. The phenomenon of sinusoids changing frequency during sampling
is called aliasing. Just as a criminal might take on an assumed name or identity (an alias), the sinusoid
assumes another frequency that is not its own. Since the digital data is no longer uniquely related to a
particular analog signal, an unambiguous reconstruction is impossible. According to our definition, this is
an example of improper sampling.

The sampling theorem, frequently called the Shannon sampling theorem, or the Nyquist sampling
theorem, after the authors of 1940s papers on the topic, indicates that a continuous signal can be properly
sampled, only if it does not contain frequency components above one-half of the sampling rate. For
instance, a sampling rate of 2,000 samples/second requires the analog signal to be composed of frequencies
below 1000 cycles/second. If frequencies above this limit are present in the signal, they will be aliased to
frequencies between 0 and 1000 cycles/second, combining with whatever information was legitimately
there.

Two terms are widely used when discussing the sampling theorem: the Nyquist frequency and the Nyquist
rate. Unfortunately, their meaning is not standardized. To understand this, consider an analog signal
composed of frequencies between DC and 3 kHz. To properly digitize this signal it must be sampled at
6,000 samples/sec (6 kHz) or higher. Suppose we choose to sample at 8,000 samples/sec (8 kHz), allowing
frequencies between DC and 4 kHz to be properly represented. In this situation there are four important
frequencies:

The highest frequency in the signal, 3 kHz;
Twice this frequency, 6 kHz;
The sampling rate, 8 kHz; and
One-half the sampling rate, 4 kHz.

Which of these four is the Nyquist frequency and which is the Nyquist rate? It depends who you ask! All of
the possible combinations are used. Fortunately, most authors are careful to define how they are using the
terms. In this course, they are both used to mean one-half the sampling rate. Just as aliasing can change the
frequency during sampling, it can also change the phase.

Quantization

Quantisation is a method of approximating the amplitude of an analog signal sample and recording it in terms of bits.

Quantisation takes the maximum amplitude or height of a given signal and divides it into a number of small
pieces that are stacked on top of each other to reach the highest amplitude. The actual amplitude value falls
somewhere within one of those stacked blocks. That block is then given as the height of the signal at that
sample.

In digital systems, the number of bits used to record each sample determines the number of quantisation
levels. With more bits there can be more levels, so the quantised signal more closely resembles the actual signal. The
sampling rate combined with the quantisation word length determines the potential maximum quality of an analog signal
converted to, and later reproduced from, digital data. The difficulty with using higher sampling rates
and longer quantisation word lengths is that the resulting file size becomes increasingly large, making storage
of very high-fidelity digital data a problem. In a binary system, digits can only take one of two values, a 1
or a 0. Therefore, if the signal is to be quantised into Q levels, the number of binary digits required to
completely encode the signal is $k = \log_2 Q$.
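
For instance (a small illustrative calculation), the snippet below shows how many bits are needed for a few choices of Q, and how the worst-case rounding error shrinks as a unit-amplitude sample range is divided into more levels.

# Illustrative only: number of bits k needed for Q quantisation levels, and the
# worst-case rounding error for a signal confined to [-1, 1].
import math

for Q in (8, 256, 65536):
    k = math.ceil(math.log2(Q))      # k = log2(Q) binary digits per sample
    step = 2.0 / Q                   # quantisation step for a [-1, 1] range
    print(f"Q = {Q:6d}  ->  k = {k:2d} bits, worst-case error = {step / 2:.6f}")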

Quantisation Noise

If a signal x(t) is quantised, then the transmitted signal only represents a finite number of levels equal to
$2^k = Q$ quantisation steps. This means that all input signal levels falling between two adjacent
quantisation steps are rounded off to the nearest step. This introduces errors into the signal amplitude.
These errors are called quantisation errors. The maximum error signal occurs when the input signal has a
value halfway between two successive quantisation levels. In this instance the error signal is:

$$\frac{\Delta}{2}$$

where $\Delta$ is the quantisation step.














Reducing Quantisation error

Reducing the quantisation step reduces the maximum value of the quantisation error. But reducing the
quantisation step means that more quantisation levels are required to encode the same signal, which in turn
implies that more codewords are required. Since the number of possible codewords depends on the number
of digits in a codeword, more digits will be required per codeword if the quantisation step is reduced. A
larger number of codeword digits implies that a larger bandwidth is required to transmit the same signal.
Therefore selection of the appropriate quantisation step is

[Figure: quantisation of an input signal x(t) — the staircase of quantised levels approximating the input signal and the resulting quantisation error.]
Mean Quantisation error over a single quantisation interval

If the $i^{th}$ quantisation level is $q_i$ and the quantisation step for that level is $\Delta_i$, then the signal
amplitudes that are all represented by the quantisation height $q_i$ lie in the range:

$$q_i - \frac{\Delta_i}{2} < x \le q_i + \frac{\Delta_i}{2}.$$

The squared quantisation error for each signal value is:

$$e_i^2 = (x - q_i)^2.$$

The probability distribution function for each squared error term is the same as the probability of
occurrence of the signal amplitude that gave rise to it, $p(x)$. The mean square error over the range
$q_i - \Delta_i/2 < x \le q_i + \Delta_i/2$ is the expected squared error in that interval, i.e.

$$E_i^2 = \int_{q_i - \Delta_i/2}^{q_i + \Delta_i/2} p(x)\, e_i^2\, dx.$$

This can be written as:

$$E_i^2 = \int_{q_i - \Delta_i/2}^{q_i + \Delta_i/2} p(x)\,(x - q_i)^2\, dx.$$

If the maximum signal excursion is much bigger than the quantisation step $\Delta_i$, then the signal probability
function $p(x)$ can be assumed to be constant over the quantisation interval $q_i - \Delta_i/2$ to $q_i + \Delta_i/2$. If the
probability function is uniform over this interval then:

$$p(x) = \frac{1}{\Delta_i}$$

and the mean square error over the quantisation interval is:

$$E_i^2 = \frac{1}{\Delta_i}\int_{q_i - \Delta_i/2}^{q_i + \Delta_i/2} (x - q_i)^2\, dx.$$

After carrying out the integration we get:

$$E_i^2 = \frac{1}{3\Delta_i}\Big[(x - q_i)^3\Big]_{q_i - \Delta_i/2}^{q_i + \Delta_i/2}$$

On evaluating we get:

$$E_i^2 = \frac{1}{\Delta_i}\cdot\frac{\Delta_i^3}{12} = \frac{\Delta_i^2}{12}$$

i.e. if the quantisation step for the $i^{th}$ interval is $\Delta_i$, then the mean square error over that $i^{th}$ interval is:

$$E_i^2 = \frac{\Delta_i^2}{12}.$$


Mean Square Error over the Entire Quantisation Range

If $P_i$ is the probability of the signal amplitude falling in the $i^{th}$ interval, then the mean square error over the
entire quantisation range is simply the expected mean square error $E^2$ over the entire range, i.e.

$$E^2 = \sum_{i=1}^{N} P_i E_i^2 = \sum_{i=1}^{N} P_i\,\frac{\Delta_i^2}{12} = \frac{1}{12}\sum_{i=1}^{N} P_i\,\Delta_i^2$$

For uniform quantisation the step interval $\Delta_i = \Delta$ is constant. The mean square error over the entire range for
uniform quantisation is:

$$E^2 = \frac{\Delta^2}{12}\sum_{i=1}^{N} P_i = \frac{\Delta^2}{12}$$

since the total probability over the entire quantisation range is one.

Maximum SQNR for a uniform Quantiser

The maximum SQNR occurs when the signal amplitude is the maximum value that can be handled by the
quantiser. Let us assume that the input signal is sinusoidal, i.e.

$$v = V_m \sin\omega t,$$

and that it has equal excursions in the positive and negative directions. If the quantiser has N
quantisation intervals, then N/2 quantisation intervals are required to quantise the peak value in either
direction.

[Figure: quantisation of a sinusoid — the range from $-V_m$ to $+V_m$ is covered by N quantisation steps, N/2 on either side of zero.]

Hence

$$V_m = \frac{N}{2}\,\Delta = \frac{1}{2}N\Delta.$$

If v is a voltage signal then the signal power S delivered to a load impedance will be proportional to:

$$S = \frac{1}{2}V_m^2 = \frac{1}{2}\left(\frac{1}{2}N\Delta\right)^2$$

This simplifies to:

$$S = \frac{1}{8}N^2\Delta^2.$$

Signal to quantisation noise ratio (SQNR) is defined as the ratio of signal power to mean squared error. Since
both the signal power and the quantisation noise power (as given by the mean squared error) are developed
across the same load, the SQNR is:

$$SQNR = \frac{S}{E^2} = \frac{\tfrac{1}{8}N^2\Delta^2}{\tfrac{1}{12}\Delta^2} = \frac{3}{2}N^2$$

SQNR can be expressed in decibel form as:

$$SQNR_{dB} = 10\log_{10}\!\left(\frac{3}{2}N^2\right)$$

which reduces to:

$$SQNR_{dB} = 1.8 + 20\log_{10}N.$$

For binary coded PCM:

$$N = 2^k$$

where k is the number of bits in a codeword. For this case the SQNR is given by:

$$SQNR_{dB} = 1.8 + 20\,k\log_{10}2 \text{ dB}$$

which simplifies to:

$$SQNR_{dB} = 1.8 + 6k \text{ dB}.$$
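
The 1.8 + 6k dB rule can be checked numerically. The sketch below (illustrative only: a full-scale sinusoid, a simple mid-rise uniform quantiser, and arbitrarily chosen word lengths) measures the SQNR directly and compares it with the formula.

# A small numerical check (a sketch, not a formal proof) of the rule derived above:
# for a full-scale sine wave and a k-bit uniform quantiser, SQNR ~ 1.8 + 6k dB.
import numpy as np

def sqnr_db(k, num_samples=100_000):
    """Quantise a full-scale sine with a k-bit mid-rise uniform quantiser and
    return the measured signal-to-quantisation-noise ratio in dB."""
    n_levels = 2 ** k                          # N = 2^k quantisation intervals
    delta = 2.0 / n_levels                     # step size for a signal spanning [-1, 1]
    t = np.arange(num_samples)
    x = np.sin(2 * np.pi * t / 997.0)          # full-scale sinusoid (Vm = 1)
    xq = (np.floor(x / delta) + 0.5) * delta   # mid-rise uniform quantisation
    noise = x - xq
    return 10 * np.log10(np.mean(x ** 2) / np.mean(noise ** 2))

for k in (4, 8, 12):
    print(k, round(sqnr_db(k), 1), round(1.8 + 6 * k, 1))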


Non-Linear Quantisation

With uniform quantisation the quantisation noise is constant: the mean square noise is

$$E^2 = \frac{\Delta^2}{12}$$

where $\Delta$ is the quantisation step. Since the quantisation noise is independent of signal level, a signal whose
level is X dB lower than the maximum will have an SQNR that is X dB worse than the maximum SQNR determined
above for uniform quantisation. This is because the SQNR is simply the ratio of signal power to mean square
quantisation error. In telephony, the probability of the presence of smaller amplitudes is much greater than that of larger
amplitudes. Non-uniform quantisation, where the lower amplitudes have smaller quantisation steps than the
larger amplitudes, must therefore be used. This improves the SQNR at lower signal amplitudes.

It is difficult to come up with a variable step quantiser. In general the SQNR for small signals is improved
by compressing the level range before coding and then using expansion after decoding at the receiver. The
combined process is called companding. In telecommunication, the term signal compression, in analogue
(usually audio) systems, means the reduction of the dynamic range of a signal by controlling it as a function
of the inverse relationship of its instantaneous value relative to a specified reference level.

Signal compression is usually expressed in dB. Instantaneous values of the input signal that are low,
relative to the reference level, are increased, and those that are high are decreased. Signal compression is
usually accomplished by separate devices called "compressors." It is used for many purposes, such as:

Improving signal-to-noise ratios prior to digitizing an analog signal for
transmission over a digital carrier system.
Preventing overload of succeeding elements of a system.
Matching the dynamic ranges of two devices.

Signal compression (in dB) may be a linear or nonlinear function of the signal level across the frequency
band of interest and may be essentially instantaneous or have fixed or variable delay times. Signal
compression always introduces distortion, which is usually not objectionable, if the compression is limited
to a few dB.






[Figure: uniform quantisation vs non-uniform quantisation — with non-uniform quantisation, smaller steps are used at low signal amplitudes and larger steps at high amplitudes.]
The original dynamic range of a compressed signal may be restored by a circuit called an "expander."

[Figure: compression and expansion transfer characteristics — output signal versus input signal for the compressor and for the complementary expander.]
In telephony, a standard audio signal for a single phone call is encoded as 8000 analog samples per second,
of 8 bits each, giving a 64 kbit/s digital signal known as DS0. The default encoding on a DS0 is either
μ-law (mu-law) PCM (North America and Japan) or A-law PCM (Europe and most of the rest of the world). These are
logarithmic compression systems in which a 12- or 13-bit linear PCM sample is mapped into an 8-bit
value. This system is described by the international standard G.711. The A-law algorithm is a standard
companding algorithm, used in European digital communications systems to optimize, i.e. modify, the
dynamic range of an analog signal for digitizing. It is similar to the mu-law algorithm used in North
America and Japan. For a given input x, the equation for A-law encoding is as follows:

$$F(x) = \operatorname{sgn}(x)\begin{cases}\dfrac{A\,|x|}{1+\ln A}, & |x| < \dfrac{1}{A}\\[2mm] \dfrac{1+\ln(A\,|x|)}{1+\ln A}, & \dfrac{1}{A}\le |x| \le 1\end{cases}$$

where A is the compression parameter. In Europe, A = 87.6 (the value 87.7 is also quoted in some references).
Note that, to avoid confusion with the sine function, this function is usually called the signum (sign) function.
The sign function is often represented as sgn and can be defined thus:

$$\operatorname{sgn}(x)=\begin{cases}-1, & x<0\\ \ \ 0, & x=0\\ \ \ 1, & x>0\end{cases}$$

or, using the Iverson bracket notation:

$$\operatorname{sgn}(x) = [x > 0] - [x < 0].$$

Any real number can be expressed as the product of its absolute value and its sign function:

$$x = \operatorname{sgn}(x)\,|x|. \qquad (1)$$

From equation (1) it follows that whenever x is not equal to 0 we have:

$$\operatorname{sgn}(x) = \frac{x}{|x|}.$$

A-law expansion is given by the inverse function:

$$F^{-1}(y) = \operatorname{sgn}(y)\begin{cases}\dfrac{|y|\,(1+\ln A)}{A}, & |y| < \dfrac{1}{1+\ln A}\\[2mm] \dfrac{e^{\,|y|(1+\ln A)-1}}{A}, & \dfrac{1}{1+\ln A}\le |y| \le 1\end{cases}$$
The reason for this encoding is that the wide dynamic range of speech does not lend itself well to efficient
linear digital encoding. A-law encoding effectively reduces the dynamic range of the signal, thereby
increasing the coding efficiency and resulting in a signal-to-distortion ratio that is superior to that obtained
by linear encoding for a given number of bits.

The A-law algorithm provides a slightly larger dynamic range than the mu-law at the cost of worse
proportional distortion for small signals. By convention, A-law is used for an international connection if at
least one country uses it. For the A-law the SQNR improvement factor is:
$$20\log_{10}\frac{dy}{dx} = 20\log_{10}\frac{A}{1+\ln A}$$

In telecommunication, a mu-law (μ-law) algorithm is a standard analog signal compression or companding
algorithm, used in digital communications systems of the North American and Japanese digital hierarchies,
to optimize (in other words, modify) the dynamic range of an audio analog signal prior to digitizing. It is
similar to the A-law algorithm used in Europe. For a given input x, the equation for μ-law encoding is as
follows:

$$F(x) = \operatorname{sgn}(x)\,\frac{\ln(1+\mu\,|x|)}{\ln(1+\mu)}, \qquad -1 \le x \le 1,$$

where μ = 255 (8 bits) in the North American and Japanese standards. μ-law expansion is then given by the
inverse equation:

$$F^{-1}(y) = \operatorname{sgn}(y)\,\frac{(1+\mu)^{|y|}-1}{\mu}, \qquad -1 \le y \le 1.$$
This encoding is used because speech has a wide dynamic range that does not lend itself well to efficient
linear digital encoding. Moreover, perceived intensity (loudness) is logarithmic. Mu-law encoding
effectively reduces the dynamic range of the signal, thereby increasing the coding efficiency and resulting
in a signal-to-distortion ratio that is greater than that obtained by linear encoding for a given number of bits.
The mu-law algorithm is also used in some standard programming environments for storing
and creating sound (such as the sun.audio classes in Java 1.1, the .au file format, and some C# methods).
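
The sketch below illustrates μ-law companding with μ = 255 using the continuous formula given above; it is not the exact segmented implementation used in real G.711 codecs, and the 8-bit code mapping shown is only one possible choice.

# A minimal sketch of mu-law companding as described above (mu = 255), using the
# continuous formula rather than the segmented tables used in real codecs.
import numpy as np

MU = 255.0

def mu_law_compress(x):
    """Map x in [-1, 1] to the companded value y in [-1, 1]."""
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def mu_law_expand(y):
    """Inverse mapping: recover x from the companded value y."""
    return np.sign(y) * ((1.0 + MU) ** np.abs(y) - 1.0) / MU

# Quantising the companded value with 8 bits gives small signals finer effective
# steps than uniform quantisation of x would.
x = np.array([-0.5, -0.01, 0.0, 0.001, 0.01, 0.5])
y = mu_law_compress(x)
codes = np.round((y + 1.0) / 2.0 * 255)          # 8-bit code words
x_hat = mu_law_expand(codes / 255 * 2.0 - 1.0)   # decode and expand
print(np.round(x_hat, 4))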

Number of Digits in a Codeword

Let us assume that the maximum number of levels needed for quantising a signal without clipping is Q, the
number of digits in the codeword is k, and n is the number of discrete values which can be taken by each
digit in the codeword.

Each of the k digits in the codeword can take one of n values, implying that the total number of possible
codewords is $n^k$.
To ensure that each quantisation level has a unique codeword, $n^k \ge Q$. The minimum number of
codewords that just satisfies this condition is $n^k = Q$.
Therefore, if each digit in a codeword can take any one of n values, the minimum number of digits
sufficient for uniquely encoding a signal with Q quantisation levels is $k = \log_n Q$.

Some forms of PCM combine signal processing with coding. Older versions of these systems applied the
processing in the analog domain as part of the A/D process; newer implementations do so in the digital
domain. These simple techniques have been largely rendered obsolete by modern transform-based signal
compression techniques.

Differential (or Delta) pulse-code modulation (DPCM) encodes the PCM values as differences
between the current and the previous value. For audio this type of encoding reduces the number
of bits required per sample by about 25% compared to PCM.
Adaptive DPCM (ADPCM) is a variant of DPCM that varies the size of the quantization step, to
allow further reduction of the required bandwidth for a given signal-to-noise ratio (SNR or S/N).

Where circuit costs are high and loss of voice quality is acceptable, it sometimes makes sense to compress
the voice signal even further. An ADPCM algorithm is used to map a series of 8 bit PCM samples into a
series of 4 bit ADPCM samples. In this way, the capacity of the line is doubled. The technique is detailed in
the G.726 standard. Later it was found that even further compression was possible and additional standards
were published. Some of these international standards describe systems and ideas which are covered by
privately owned patents and thus use of these standards requires payments to the patent holders.

Bandwidth Reduction Techniques For PCM

All physical transmission mediums have characteristics that only allow signal transmission over a finite
band of frequencies. This results in bandwidth being a limited and therefore a valuable resource. In PCM
bandwidth is directly proportional to the number of bits k in the resulting codeword. To save on bandwidth,
techniques have been produced which result in a smaller number of bits being generated.

Such techniques include:

Differential PCM (DPCM)
Predictive DPCM
Adaptive DPCM (ADPCM).
Delta Modulation.

Differential PCM (DPCM)

In DPCM the difference between adjacent samples is encoded and transmitted. For most transmitted signals
there is a high degree of correlation from sample to sample, i.e. there are strong similarities in signal content
between adjacent samples. As a result it suffices to transmit only the sample differences.








[Block diagram: DPCM encoder — the sampled signal is differenced with a one-sample-delayed ($T_s$) copy of itself and the difference is fed to a conventional PCM encoder, producing the differential PCM output.]
From the diagram, the DPCM encoder is simply an adjacent-sample differencing operation followed by a
conventional pulse code modulator. The sample differences require fewer quantisation levels than the actual
signal, which reduces the required codeword length.









[Block diagram: DPCM decoder — a conventional PCM decoder followed by an adder with a one-sample delay ($T_s$) that accumulates the received differences to reconstruct the samples.]

The receiver is simply a conventional PCM decoder followed by an adjacent sample summing operation.
Because only the sample differences are encoded, the resulting number of bits is small, and hence a smaller
bandwidth is required than that needed for PCM.
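
A toy DPCM encoder and decoder along the lines of the block diagrams above is sketched below. Quantisation of the differences is omitted to keep the feedback/accumulation structure obvious, so this version is lossless; the point it illustrates is that the differences have a much smaller range than the samples themselves.

# A toy DPCM encoder/decoder: the encoder transmits adjacent-sample differences,
# the decoder accumulates them. Quantisation of the differences is left out here.
import numpy as np

def dpcm_encode(samples):
    """Return the sequence of adjacent-sample differences (first sample sent as-is)."""
    diffs = np.empty_like(samples)
    previous = 0
    for i, s in enumerate(samples):
        diffs[i] = s - previous     # difference w.r.t. the delayed (previous) sample
        previous = s
    return diffs

def dpcm_decode(diffs):
    """Rebuild the samples by summing the received differences."""
    out = np.empty_like(diffs)
    previous = 0
    for i, d in enumerate(diffs):
        out[i] = previous + d       # adder with one-sample delay
        previous = out[i]
    return out

x = np.round(127 * np.sin(2 * np.pi * np.arange(40) / 40)).astype(int)
d = dpcm_encode(x)
print("max |sample|:", np.abs(x).max(), " max |difference|:", np.abs(d).max())
print("lossless here:", np.array_equal(dpcm_decode(d), x))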

Predictive DPCM

Predictive DPCM uses an algorithm, rather than only the previous sample, to predict an information signal's future value.
It then waits until the actual value is available for examination and subsequently transmits a signal that represents a
correction to the predicted value, i.e. the transmitted correction represents the information signal's surprising or
unpredictable part.










[Block diagram: predictive DPCM encoder — the sampled input $g(kT_s)$ is compared with the predictor output $\hat{g}(kT_s)$; the error $e(kT_s)$ is quantised (q-level quantiser), PCM-encoded and transmitted, and also fed back through the predictor.]
The input signal g(t) is an analogue signal. It is sampled to produce a sample sequence $g(kT_s)$. The encoder
determines the error signal $e(kT_s)$; this is the difference between the actual value $g(kT_s)$ and $\hat{g}(kT_s)$,
the value predicted from past samples. The error signal $e(kT_s)$ is then quantised to produce the quantised
error signal $e_q(kT_s)$, which is then encoded and transmitted. At the receiver the estimated output $\tilde{g}(kT_s)$
is obtained by summing the predictor output $\hat{g}(kT_s)$ and the received error signal $e_q(kT_s)$. The smoothing
filter is used to reconstruct the output signal $\tilde{g}(t)$. Note that in predictive DPCM the predictor coefficients
at the transmitter and the receiver are the same, and they are fixed. As a result there is no need to transmit
any information other than the error signal $e_q(kT_s)$.






[Block diagram: predictive DPCM decoder — a PCM decoder recovers the error signal $e_q(kT_s)$, which is added to the predictor output $\hat{g}(kT_s)$ to form the estimate $\tilde{g}(kT_s)$; a smoothing filter then reconstructs $\tilde{g}(t)$.]
Adaptive DPCM (ADPCM)

In adaptive DPCM both the quantisation step and the predictor coefficients are continuously modified (i.e.
adapted) to suit the changing signal statistics. In addition to the error signals, information on the step size
and predictor coefficients is transmitted for use by the receiver's decoder. 32 kbit/s ADPCM is used in
telecommunications in place of the normal 64 kbit/s PCM in some instances where bandwidth is at a
premium.

Delta Modulation

Delta modulation is a subclass of DPCM in which the error signal is encoded using only one bit. At the
transmitter, the sampled value $g(kT_s)$ is compared with a predicted value $\hat{g}(kT_s)$. The difference
signal $e(kT_s)$ is quantised into one of two values, $+\Delta$ or $-\Delta$ (note: $\Delta$ is the quantisation step height). The
output of the quantiser is encoded using one binary digit per sample and sent to the receiver. At the
receiver, the decoded value of the difference signal $e_q(kT_s)$ is added to the immediately preceding value of
the receiver output to obtain an estimated value $\tilde{g}(kT_s)$. This is then smoothed by the smoothing filter to
give $\tilde{g}(t)$.











[Block diagram: DM encoder — the sampled input $g(kT_s)$ is compared with the fed-back estimate $\hat{g}(kT_s)$ (one-sample delay $T_s$); the difference is quantised to $\pm\Delta$ by a two-level quantiser and sent as one bit per sample.]








[Block diagram: DM decoder — the received $\pm\Delta$ values $e_q(kT_s)$ are accumulated (adder with delay $T_s$) to form $\tilde{g}(kT_s)$, which a smoothing filter converts to $\tilde{g}(t)$.]

Problems of Delta Modulation Schemes

DM schemes suffer from start up errors, idling noise, and slope overload as shown below.










[Figure: common problems of DM — the start-up transient while the estimate catches up with the input, idling noise (hunting by $\pm\Delta$) while the input is constant, and slope overload when the input changes faster than one step per sample.]

Initially at start-up, the estimated signal $\hat{g}(kT_s)$ is less than the input signal $g(kT_s)$, so the output signal
$e_q(kT_s)$ has a value of $+\Delta$. When this error signal $e_q(kT_s)$ is fed back through the delay element it produces a
step change in $\hat{g}(kT_s)$ of height $\Delta$. This process continues during the start-up period until
$\hat{g}(kT_s) \approx g(kT_s)$. If the information signal g(t) remains constant for a significant period of time,
$\tilde{g}(kT_s)$ displays a hunting behaviour, resulting in the transmitted signal $e_q(kT_s)$ alternating between $+\Delta$
and $-\Delta$. This alternating signal is called idling noise (a form of quantisation noise).

If g(t) changes, $\tilde{g}(kT_s)$ and $\hat{g}(kT_s)$ follow in a stepwise fashion as long as successive samples of g(t)
do not differ by an amount greater than the step size $\Delta$. When the difference is greater than $\Delta$, then $\tilde{g}(kT_s)$
and $\hat{g}(kT_s)$ can no longer follow g(t). This form of error is called slope overload. In delta modulation,
reducing the step size reduces idling noise. Increasing the step size, however, reduces slope overload.
This conflict can be minimised by keeping the step size small (to reduce idling noise) and making the
sampling frequency much higher than the Nyquist sampling rate (minimising slope overload by enabling
$\tilde{g}(kT_s)$ and $\hat{g}(kT_s)$ to respond much faster to changes in g(t)). However, if the sampling rate is
increased, the bandwidth also increases, and this reduces the bandwidth advantage of delta modulation over
PCM.
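
The short simulation below (an illustrative sketch; the step size, sampling grid and test signal are arbitrary choices) reproduces the three effects just described: the start-up transient, idling around a constant input, and slope overload when the input changes by more than one step per sample.

# A sketch of a 1-bit delta modulator illustrating start-up, idling and slope overload.
import numpy as np

def delta_modulate(g, step):
    """Encode g into +/- step decisions and return (bits, receiver estimate)."""
    estimate = 0.0
    bits, estimates = [], []
    for sample in g:
        bit = 1 if sample >= estimate else 0   # compare input with fed-back estimate
        estimate += step if bit else -step     # accumulator at transmitter/receiver
        bits.append(bit)
        estimates.append(estimate)
    return np.array(bits), np.array(estimates)

t = np.arange(200)
g = np.where(t < 60, 1.0, np.cos(2 * np.pi * (t - 60) / 50))  # flat, then a cosine

bits, est = delta_modulate(g, step=0.1)
# The estimate climbs from 0 in steps of 0.1 (start-up), hunts +/- one step around
# 1.0 while the input is flat (idling noise), and lags the cosine on its steepest
# parts (slope overload: the input changes by up to ~0.126 per sample vs a 0.1 step).
print("worst tracking error:", np.max(np.abs(est - g)).round(3))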

Adaptive Delta Modulation

Adaptive delta modulation improves on delta modulation by decreasing the step height $\Delta$ during constant
inputs (this reduces idling noise) and increasing the step height during rapid signal changes (this reduces
slope overload errors).

Measures of Information

Source theory

Any process that generates successive messages can be considered a source of information. Sources can be
classified in order of increasing generality as:

Memoryless- The term "memoryless" as used here has a slightly different meaning than it normally does in
probability theory. Here a memoryless source is defined as one that generates successive messages
independently of one another and with a fixed probability distribution. However, the position of the first
occurrence of a particular message or symbol in a sequence generated by a memoryless source is actually a
memoryless random variable. The other terms have fairly standard definitions and are actually well studied
in their own right outside information theory.
Ergodic - in mathematics, a measure-preserving transformation T on a probability space is said to be
ergodic if the only measurable sets invariant under T have measure 0 or 1.
Stationary, - in the mathematical sciences, a stationary process is a stochastic process whose probability
distribution at a fixed time or position is the same for all times or positions. As a result, parameters such
as the mean and variance also do not change over time or position.
Stochastic - in the mathematics of probability, a stochastic process is a random function. This view of a
stochastic process as an indexed collection of random variables is the most common one.
Information theory

Information theory is a field of mathematics that considers three fundamental questions:

Compression: How much can data be compressed (abbreviated) so that another person can
recover an identical copy of the uncompressed data?
Lossy data compression: How much can data be compressed so that another person can recover
an approximate copy of the uncompressed data?
Channel capacity: How quickly can data be communicated to someone else through a noisy
medium?

These somewhat abstract questions are answered quite rigorously by using mathematics introduced by
Claude Shannon in 1948. His paper spawned the field of information theory, and the results have been
crucial to the success of the:

Voyager missions to deep space,
the invention of the CD,
the feasibility of mobile phones,
analysis of the code used by DNA,
numerous other fields.

Information theory is the mathematical theory of data communication and storage, generally considered to
have been founded in 1948 by Claude E. Shannon. The central paradigm of classic information theory is
the engineering problem of the transmission of information over a noisy channel. The most fundamental
results of this theory are:

Shannon's source coding theorem, which establishes that on average the number of bits needed to
represent the result of an uncertain event is given by the entropy; and
Shannon's noisy-channel coding theorem, which states that reliable communication is possible
over noisy channels provided that the rate of communication is below a certain threshold called
the channel capacity.

The channel capacity is achieved with appropriate encoding and decoding systems. Information theory is
closely associated with a collection of pure and applied disciplines that have been carried out under a
variety of banners in different parts of the world over the past half century or more:

adaptive systems,
anticipatory systems,
artificial intelligence,
complex systems,
complexity science,
cybernetics,
informatics,
machine learning,
systems sciences of many descriptions.

Information theory is a broad and deep mathematical theory, with equally broad and deep applications,
chief among them coding theory. Coding theory is concerned with finding explicit methods, called codes,
of increasing the efficiency and fidelity of data communication over a noisy channel, up to near the limit that
Shannon proved to be possible. These codes can be roughly subdivided into:

data compression codes
error-correction codes.
cryptographic ciphers;

Information theory is also used in:

information retrieval,
intelligence gathering,
gambling,
statistics,
musical composition.

Self-information

Shannon defined a measure of information content called the self-information or surprisal of a message m:

$$I(m) = \log\frac{1}{p(m)} = -\log p(m)$$

where p(m) = Pr(M = m) is the probability that message m is chosen from all possible choices in the
message space M. This equation causes messages with lower probabilities to contribute more to the overall
value of I(m). In other words, infrequently occurring messages are more valuable. (This is a consequence
of the property of logarithms that $-\log p(m)$ is very large when p(m) is near 0 for unlikely messages, and
very small when p(m) is near 1 for almost certain messages.) For example, if John says "See you later,
honey" to his wife every morning before leaving for the office, that information holds little "content" or
"value". But if he shouts "Get lost" at his wife one morning, then that message holds more value or content
(because, supposedly, the probability of him choosing that message is very low).

Information entropy

Entropy is a concept in information theory. Information entropy is occasionally called Shannon's entropy
in honor of Claude E. Shannon. The basic concept of entropy in information theory has to do with how
much randomness (or, alternatively, 'uncertainty') there is in a signal or random event. An alternative way
to look at this is to talk about how much information is carried by the signal.

As an example consider some English text, encoded as a string of letters, spaces, and punctuation (so our
signal is a string of characters). Since some characters are not very likely (e.g. 'z') while others are very
common (e.g. 'e') the string of characters is not really as random as it might be. On the other hand, since we
cannot predict what the next character will be: it is, to some degree, 'random'. Entropy is a measure of this
randomness, suggested by Shannon in his 1948 paper A Mathematical Theory of Communication. Shannon
offers a definition of entropy which satisfies the assumptions that:

The measure should be proportional (continuous) - i.e. changing the value of one of the
probabilities by a very small amount should only change the entropy by a small amount.
If all the outcomes (letters in the example above) are equally likely then increasing the number of
letters should always increase the entropy.
We should be able to make the choice (in our example of a letter) in two steps, in which case the
entropy of the final result should be a weighted sum of the entropies of the two steps.

(Note: Shannon and Weaver make reference to Tolman (1938), who in turn credits Pauli (1933) with the
definition of entropy that is used by Shannon. Elsewhere in statistical mechanics, the literature includes
references to von Neumann as having derived the same form of entropy in 1927, and it was von Neumann
who favoured the use of the existing term 'entropy'.)


Formal definitions

Claude E. Shannon defines entropy in terms of a discrete random event x, with possible states (or
outcomes) 1..n, as:

$$H(x) = \sum_{i=1}^{n} p_i \log_2\!\left(\frac{1}{p_i}\right) = -\sum_{i=1}^{n} p_i \log_2 p_i$$

That is, the entropy of the event x is the sum, over all possible outcomes i of x, of the product of the
probability of outcome i times the log of the inverse of the probability of i (which is also called its
surprisal - the entropy of x is the expected value of its outcome's surprisal). We can also apply this to a
general probability distribution, rather than a discrete-valued event. Shannon shows that any definition of
entropy satisfying his assumptions will be of the form:

$$H = -K\sum_{i=1}^{n} p_i \log p_i$$

where K is a constant (and is really just a choice of measurement units). Shannon defined a measure of
entropy, $H = -(p_1\log_2 p_1 + \cdots + p_n\log_2 p_n)$, that, when applied to an information source, could determine
the minimum channel capacity required to reliably transmit the source as encoded binary digits. This
measure has also been called surprisal, as it represents the "surprise" of seeing the outcome (a certain
outcome is not surprising). This term was coined by Myron Tribus in his 1961 book Thermostatics and
Thermodynamics. Some claim it is more accurate than "self-information", but it has not been widely used.
The formula can be derived by calculating the mathematical expectation of the amount of information
contained in a digit from the information source. Shannon's entropy measure came to be taken as a measure
of the uncertainty about the realization of a random variable. It thus served as a proxy capturing the concept
of information contained in a message as opposed to the portion of the message that is strictly determined
(hence predictable) by inherent structures. For example, redundancy in language structure or statistical
properties relating to the occurrence frequencies of letter or word pairs, triplets etc.

Examples
On tossing a coin, the chance of 'tail' is 0.5. When it is proclaimed that indeed 'tail' occurred, this amounts
to:

H('tail') = log2(1/0.5) = log2 2 = 1 bit of information.

When throwing a die, the probability of 'four' is 1/6. When it is proclaimed that 'four' has been thrown,
the amount of self-information is:

H('four') = log2(1/(1/6)) = log2 6 = 2.585 bits.

When, independently, two dice are thrown, the amount of information associated with {throw 1 = 'two' &
throw 2 = 'four'} equals:

H('throw 1 is two & throw 2 is four') = log2(1/Pr(throw 1 = 'two' & throw 2 = 'four')) = log2(1/(1/36)) = log2 36 = 5.170 bits.

This outcome equals the sum of the individual amounts of self-information associated with {throw 1 =
'two'} and {throw 2 = 'four'}; namely 2.585 + 2.585 = 5.170 bits.
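
The helper functions below (a sketch, not a library routine) reproduce the worked examples above and evaluate the entropy of a couple of simple sources.

# Self-information of single outcomes and Shannon entropy of a source.
import math

def self_information(p):
    """Information in bits conveyed by an outcome of probability p."""
    return math.log2(1 / p)

def entropy(probs):
    """Shannon entropy H = sum p_i log2(1/p_i), ignoring zero-probability outcomes."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

print(self_information(0.5))        # coin toss 'tail'              -> 1.0 bit
print(self_information(1 / 6))      # die shows 'four'              -> 2.585 bits
print(self_information(1 / 36))     # two independent dice          -> 5.170 bits
print(entropy([0.5, 0.5]))          # fair coin                     -> 1.0 bit/flip
print(entropy([1.0]))               # source that always emits 'A'  -> 0.0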

Shannon's definition of entropy is closely related to thermodynamic entropy as defined by physicists and
many chemists. Boltzmann and Gibbs did considerable work on statistical thermodynamics, which became
the inspiration for adopting the word entropy in information theory. There are relationships between
thermodynamic and informational entropy.

It is important to remember that entropy is a quantity defined in the context of a probabilistic model for a
data source. Independent fair coin flips have an entropy of 1 bit per flip. A source that always generates a
long string of A's has an entropy of 0, since the next character will always be an 'A'. The entropy rate of a
data source means the average number of bits per symbol needed to encode it. Empirically, the
entropy of English text is between about 1.1 and 1.6 bits per character, though clearly that will vary from text
source to text source. Experiments with human predictors show an information rate of 1.1 to 1.6 bits per
character, depending on the experimental setup; the Prediction by Partial Matching (PPM) compression
algorithm can compress English text to about 1.5 bits per character. From the preceding example, note the
following points:

The amount of entropy is not always an integer number of bits.
Many data bits may not convey information. For example, data structures often store information
redundantly, or have identical sections regardless of the information in the data structure.

Entropy effectively bounds the performance of the strongest lossless (or nearly lossless) compression
possible, which can be realized in theory by using the typical set or in practice using Huffman, Lempel-Ziv
or arithmetic coding. The performance of existing data compression algorithms is often used as a rough
estimate of the entropy of a block of data. A common way to define entropy for text is based on the Markov
model of text. For an order-0 source (each character is selected independently of the preceding characters), the
binary entropy is:

$$H_0 = -\sum_{i} p_i \log_2 p_i$$

where $p_i$ is the probability of symbol i. For a first-order Markov source (one in which the probability of selecting a
character depends only on the immediately preceding character), the entropy rate is:

$$H_1 = -\sum_{i} p_i \sum_{j} p_i(j)\,\log_2 p_i(j)$$

where i is a state (a certain preceding character) and $p_i(j)$ is the probability of j given i as the previous
character. For a second-order Markov source (the probability of a character depends on the two preceding
characters), the entropy rate is:

$$H_2 = -\sum_{i} p_i \sum_{j} p_i(j) \sum_{k} p_{i,j}(k)\,\log_2 p_{i,j}(k)$$

In general the b-ary entropy of a source $\mathcal{S} = (S,P)$ with source alphabet $S = \{a_1, \ldots, a_n\}$ and discrete
probability distribution $P = \{p_1, \ldots, p_n\}$, where $p_i$ is the probability of $a_i$ (say $p_i = p(a_i)$), is defined by:

$$H_b(\mathcal{S}) = -\sum_{i=1}^{n} p_i \log_b p_i$$

Note: the b in "b-ary entropy" is the number of different symbols of the "ideal alphabet" which is being
used as the standard yardstick to measure source alphabets. In information theory, two symbols are
necessary and sufficient for an alphabet to be able to encode information, therefore the default is to let b =
2 ("binary entropy"). Thus, the entropy of the source alphabet, with its given empiric probability
distribution, is a number equal to the number (possibly fractional) of symbols of the "ideal alphabet", with
an optimal probability distribution, necessary to encode each symbol of the source alphabet. Also note
that "optimal probability distribution" here means a uniform distribution: a source alphabet with n symbols
has the highest possible entropy (for an alphabet with n symbols) when the probability distribution of the
alphabet is uniform. This optimal entropy turns out to be $\log_b n$.
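
As an illustration of the order-0 and first-order formulas above, the sketch below estimates both quantities for a short piece of text using simple frequency counts (for such a short string the estimates are rough, but the mechanics are the point).

# A sketch of estimating the order-0 entropy and first-order Markov entropy rate
# of a piece of text, with probabilities estimated from frequency counts.
from collections import Counter
from math import log2

def order0_entropy(text):
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def order1_entropy_rate(text):
    """H = -sum_i p_i sum_j p_i(j) log2 p_i(j), where state i = previous character."""
    pair_counts = Counter(zip(text, text[1:]))
    state_counts = Counter(text[:-1])
    n_pairs = len(text) - 1
    h = 0.0
    for (i, j), c in pair_counts.items():
        p_ij = c / n_pairs                 # joint probability of (i, j)
        p_j_given_i = c / state_counts[i]  # conditional probability p_i(j)
        h -= p_ij * log2(p_j_given_i)
    return h

sample = "the theory of information measures the uncertainty in a source"
print(round(order0_entropy(sample), 3), round(order1_entropy_rate(sample), 3))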

Efficiency

A source alphabet encountered in practice will usually have a probability distribution that is less
than optimal. If the source alphabet has n symbols, it can be compared to an "optimized alphabet" with
n symbols whose probability distribution is uniform. The ratio of the entropy of the source alphabet to
the entropy of its optimized version is the efficiency of the source alphabet, which can be expressed as a
percentage. Equivalently, the efficiency of a source alphabet with n symbols is simply equal to its n-ary
entropy.

Joint entropy

The joint entropy of two discrete random variables X and Y is defined as the entropy of the joint distribution
of X and Y:

$$H(X,Y) = -\sum_{x}\sum_{y} p(x,y)\,\log p(x,y)$$

If X and Y are independent, then the joint entropy is simply the sum of their individual entropies. (Note:
the joint entropy is not to be confused with the cross entropy, despite similar notation.)

Conditional entropy (equivocation)

Given a particular value of a random variable Y, the conditional entropy of X given Y = y is defined as:

$$H(X \mid Y=y) = -\sum_{x} p(x \mid y)\,\log p(x \mid y)$$

where $p(x \mid y)$ is the conditional probability of x given y. The conditional entropy of X given Y, also called the
equivocation of X about Y, is then given by:

$$H(X \mid Y) = \sum_{y} p(y)\, H(X \mid Y=y) = -\sum_{x}\sum_{y} p(x,y)\,\log p(x \mid y)$$

A basic property of the conditional entropy is that:

$$H(X \mid Y) = H(X,Y) - H(Y).$$

Mutual information (transinformation)

It turns out that one of the most useful and important measures of information is the mutual information, or
transinformation. This is a measure of how much information can be obtained about one random variable
by observing another. The transinformation of X relative to Y (which represents conceptually the amount of
information about X that can be gained by observing Y) is given by:

$$I(X;Y) = \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}$$

A basic property of the transinformation is that:

$$I(X;Y) = H(X) - H(X \mid Y).$$

Mutual information is symmetric:

$$I(X;Y) = I(Y;X) = H(X) + H(Y) - H(X,Y).$$

Coding theory

Coding theory is the most important and direct application of information theory. It can be subdivided into:

data compression theory and
error correction theory.

Using a statistical description for data, information theory quantifies the number of bits needed to describe
the data. There are two formulations for the compression problem: in lossless data compression the data
must be reconstructed exactly, whereas lossy data compression examines how many bits are needed to
reconstruct the data to within a specified fidelity level. This fidelity level is measured by a function called a
distortion function. In information theory this is called rate distortion theory. Both lossless and lossy source
codes produce bits at the output which can be used as the inputs to the channel codes mentioned above. The
idea is to first compress the data, i.e. remove as much of its redundancy as possible, and then add just the
right kind of redundancy (i.e. error correction) needed to transmit the data efficiently and faithfully across a
noisy channel.

This division of coding theory into compression and transmission is justified by the information
transmission theorems, or source-channel separation theorems that justify the use of bits as the universal
currency for information in many contexts. However, these theorems only hold in the situation where one
transmitting user wishes to communicate to one receiving user. In scenarios with more than one transmitter
(the multiple-access channel), more than one receiver (the broadcast channel) or intermediary "helpers" (the
relay channel), or more general networks, compression followed by transmission may no longer be optimal.
Network information theory refers to these multi-agent communication models.

Entropy encoding
An entropy encoding is a coding scheme that assigns codes to symbols so as to match code lengths with
the probabilities of the symbols. Typically, entropy encoders are used to compress data by replacing
symbols represented by equal-length codes with symbols represented by codes proportional to the negative
logarithm of the probability. Therefore, the most common symbols use the shortest codes. According to
Shannon's source coding theorem, the optimal code length for a symbol is $-\log_b P$, where b is the number of
symbols used to make output codes and P is the probability of the input symbol. Three of the most common
entropy encoding techniques are:

Huffman coding,
Range encoding,
Arithmetic coding.

If the approximate entropy characteristics of a data stream are known in advance (especially for signal
compression), a simpler static code may be useful, such as:
Unary coding,
Elias gamma coding,
Fibonacci coding,
Golomb coding,
Rice coding.

Huffman coding

In computer science, Huffman coding is an entropy encoding algorithm used for lossless data
compression. The term refers to the use of a variable-length code table for encoding a source symbol (such
as a character in a file) where the variable-length code table has been derived in a particular way based on
the estimated probability of occurrence for each possible value of the source symbol. It was developed by
David A. Huffman as a Ph.D. student at MIT in 1952, and published in A Method for the Construction of
Minimum-Redundancy Codes.

Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a
prefix-free code (that is, the bit string representing some particular symbol is never a prefix of the bit string
representing any other symbol) that expresses the most common characters using shorter strings of bits than
are used for less common source symbols. Huffman was able to design the most efficient compression
method of this type: no other mapping of individual source symbols to unique strings of bits will produce a
smaller average output size when the actual symbol frequencies agree with those used to create the code.
(Huffman coding is such a widespread method for creating prefix-free codes that the term "Huffman code"
is widely used as a synonym for "prefix-free code" even when such a code was not produced by Huffman's
algorithm.)

For a set of symbols with a uniform probability distribution and a number of members which is a power of
two, Huffman coding is equivalent to simple binary block encoding. Assertions of the optimality of
Huffman coding should be phrased carefully, because its optimality can sometimes accidentally be over-
stated. For example, arithmetic coding ordinarily has better compression capability, because it does not
require the use of an integer number of bits for encoding each source symbol. LZW coding can also often
be more efficient, particularly when the input symbols are not independently-distributed, because it does
not depend on encoding each input symbol one at a time (instead, it batches up a variable number of input
symbols into each encoded syntax element). The efficiency of Huffman coding also depends heavily on
having a good estimate of the true probability of the value of each input symbol.

In 1951, David Huffman and his MIT information theory classmates were given the choice of a term paper
or a final exam. The professor, Robert M. Fano, assigned a term paper on the problem of finding the most
efficient binary code. Huffman, unable to prove any codes were the most efficient, was about to give up
and start studying for the final when he hit upon the idea of using a frequency-sorted binary tree, and
quickly proved this method the most efficient. In doing so, the student outdid his professor, who had
worked with information theory inventor Claude Shannon to develop a similar code. Huffman avoided the
major flaw of Shannon-Fano coding by building the tree from the bottom up instead of from the top down.

Given. A set of symbols and their costs.

Find. A prefix free binary character code (a set of code words) with minimum weighted path length.
Note-1. A code wherein each character is represented by a unique binary string (codeword) is called a
binary character code.
Note-2. A prefix free code is a code having the property that no codeword is a prefix of any other
codeword.
Note-3. In the standard Huffman coding problem, it is assumed that each symbol in the set that the code
words are constructed from has an equal cost to transmit: a code word whose length is N digits will always
have a cost of N, no matter how many of those digits are 0s, how many are 1s, etc. When working under
this assumption, minimizing the total cost of the message and minimizing the total number of digits are the
same thing.

Formalized description
Input.
Alphabet A = {a_1, a_2, ..., a_n}, which is the symbol alphabet of size n.

Set W = {w_1, w_2, ..., w_n}, which is the set of the symbol costs (weights), i.e. w_i = cost(a_i) for 1 <= i <= n.

Output.
Code H = {h_1, h_2, ..., h_n}, which is the set of (binary) code words, where h_i is the codeword for a_i, 1 <= i <= n.

Goal.
Let L(H) = w_1·length(h_1) + w_2·length(h_2) + ... + w_n·length(h_n) be the weighted path length of code H.
Condition: L(H) <= L(T) for any code T for the same alphabet and costs.
Samples
Sample-1
Input:
  Alphabet:    a    b    c     d    e    f      g    h    i
  Costs:       10   15   5     15   20   5      15   30   5
Output:
  Codewords:   000  010  0010  011  111  00110  110  10   00111

Weighted path length:
  L(H) = 10×3 + 15×3 + 5×4 + 15×3 + 20×3 + 5×5 + 15×3 + 30×2 + 5×5 = 355

Basic technique
The technique works by creating a binary tree of nodes. These can be stored in a regular array, the size of
which depends on the number of symbols (N). A node can be either a leaf node or an internal node.
Initially, all nodes are leaf nodes, which contain the symbol itself, the weight (frequency of appearance) of
the symbol and, optionally, a link to a parent node which makes it easy to read the code (in reverse) starting
from a leaf node. Internal nodes contain a symbol weight, links to two child nodes and the optional link to a
parent node. As a common convention, bit '0' represents following the left child and bit '1' represents
following the right child. A finished tree has N leaf nodes and N - 1 internal nodes.
A fast way to create a Huffman tree is to use the heap data structure, which keeps the nodes in partially
sorted order according to a predetermined criterion. In this case, the node with the lowest weight is always
kept at the root of the heap for easy access.

Creating the tree:
Start with as many leaves as there are symbols.
Push all leaf nodes into the heap.
While there is more than one node in the heap:
Remove two nodes with the lowest weight from the heap.
If the heap was storing copies of node data rather than pointers to nodes in final storage
for the tree, move these nodes to final storage.
Create a new internal node, with the two just-removed nodes as children (either node can
be either child) and the sum of their weights as the new weight.
Update the parent links in the two just-removed nodes to point to the just-created parent
node.
Push the new node into the heap.
The remaining node is the root node; the tree has now been generated.
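A minimal Python sketch of this procedure, using the standard heapq module and the Sample-1 weights, is given below. Tie-breaking between nodes of equal weight is arbitrary here, so the individual codewords may differ from those listed in Sample-1, but the weighted path length is still 355.

```python
import heapq
from itertools import count

def huffman_code(weights):
    """Return a {symbol: codeword} dict for a {symbol: weight} dict."""
    tiebreak = count()                        # keeps heap comparisons well defined
    heap = [(w, next(tiebreak), sym) for sym, w in weights.items()]
    heapq.heapify(heap)
    if len(heap) == 1:                        # degenerate single-symbol alphabet
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)     # remove the two lowest-weight nodes
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))
    codes = {}
    def walk(node, prefix):                   # '0' = left child, '1' = right child
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix
    walk(heap[0][2], "")
    return codes

weights = {'a': 10, 'b': 15, 'c': 5, 'd': 15, 'e': 20,
           'f': 5, 'g': 15, 'h': 30, 'i': 5}
codes = huffman_code(weights)
print(sum(weights[s] * len(codes[s]) for s in weights))   # weighted path length: 355
```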

It is generally beneficial to minimize the variance of codeword length. For example, a communication
buffer receiving Huffman-encoded data may need to be larger to deal with especially long symbols if the
tree is especially unbalanced. To reduce variance every newly generated node must be favored among same
weight nodes and placed as high as possible. This modification will retain the mathematical optimality of
the Huffman coding while minimizing the length of the longest character code.

Main properties

The frequencies used can be generic ones for the application domain that are based on average experience,
or they can be the actual frequencies found in the text being compressed. (This variation requires that a
frequency table or other hint as to the encoding must be stored with the compressed text; implementations
employ various tricks to store tables efficiently.) Huffman coding is optimal when the probability of each
input symbol is a negative power of two. Prefix-free codes tend to have slight inefficiency on small
alphabets, where probabilities often fall between these optimal points. Expanding the alphabet size by
coalescing multiple symbols into "words" before Huffman coding can help a bit. The worst case for
Huffman coding can happen when the probability of a symbol exceeds 2^(-1), making the upper limit of
inefficiency unbounded. To prevent this, run-length encoding can be used to preprocess the symbols.
Arithmetic coding produces slight gains over Huffman coding, but in practice these gains have seldom been
large enough to offset arithmetic coding's higher computational complexity and patent royalties. (As of
November 2001, IBM owns patents on the core concepts of arithmetic coding in several jurisdictions.)

Adaptive Huffman coding

A variation called adaptive Huffman coding calculates the frequencies dynamically based on recent actual
frequencies in the source string. This is somewhat related to the LZ family of algorithms.

Length-limited Huffman coding

Length-limited Huffman coding is a variant where the goal is still to achieve a minimum weighted path
length, but there is an additional restriction that the length of each codeword must be less than a given
constant.

Huffman template algorithm

Most often, the weights used in implementations of Huffman coding represent numeric probabilities, but
the algorithm given above does not require this; it requires only a way to order weights and to add them.
The Huffman template algorithm enables one to use non-numerical weights (costs, frequencies).

n-ary Huffman coding

The n-ary Huffman algorithm uses the {0, 1, ..., n - 1} alphabet to encode messages and build an n-ary
tree.

Huffman coding with unequal letter costs

In the standard Huffman coding problem, it is assumed that each symbol in the set that the code words are
constructed from has an equal cost to transmit: a code word whose length is N digits will always have a
cost of N, no matter how many of those digits are 0s, how many are 1s, etc. When working under this
assumption, minimizing the total cost of the message and minimizing the total number of digits are the
same thing. Huffman coding with unequal letter costs is the generalization in which this assumption is no
longer assumed true: the letters of the encoding alphabet may have non-uniform lengths, due to
characteristics of the transmission medium. An example is the encoding alphabet of Morse code, where a
'dash' takes longer to send than a 'dot', and therefore the cost of a dash in transmission time is higher. The
goal is still to minimize the weighted average codeword length, but it is no longer sufficient just to
minimize the number of symbols used by the message.

Applications

Arithmetic coding can be viewed as a generalization of Huffman coding. Although arithmetic coding offers
better compression performance than Huffman coding, Huffman coding is still in wide use because of its
simplicity, high speed and lack of encumbrance by patents. Huffman coding today is often used as a "back-
end" to some other compression method. DEFLATE (PKZIP's algorithm) and multimedia codecs such as
JPEG and MP3 have a front-end model and quantization followed by Huffman coding.

Range encoding

Range encoding is a data compression method that is believed to approach the compression ratio of
arithmetic coding, without the patent issues of arithmetic coding. Like arithmetic coding, range encoding
conceptually encodes all the symbols of the message into one number, unlike Huffman coding which
assigns each symbol a bit-pattern and concatenates all the bit-patterns together. Thus range encoding can
achieve greater compression ratios than the one-bit-per-symbol upper bound on Huffman encoding and it
does not suffer the inefficiencies that Huffman does when dealing with probabilities that are not exact
powers of two.
The central concept behind range encoding is this: given a large-enough range of integers, and a probability
estimation for the symbols, the initial range can easily be divided into sub-ranges whose sizes are
proportional to the probability of the symbol they represent. Each symbol of the message can then be
encoded in turn, by reducing the current range down to just that sub-range which corresponds to the next
symbol to be encoded. The decoder must have the same probability estimation the encoder used, which can
either be sent in advance, derived from already transferred data or be part of the compressor and
decompressor.
When all symbols have been encoded, merely identifying the sub-range is enough to communicate the
entire message (presuming of course that the decoder is somehow notified when it has extracted the entire
message). A single integer is actually sufficient to identify the sub-range, and it may not even be necessary
to transmit the entire integer; if there is a sequence of digits such that every integer beginning with that
prefix falls within the sub-range, then the prefix alone is all that's needed to identify the sub-range and thus
transmit the message.
Example
Suppose we want to encode the message "AABA<EOM>", where <EOM> is the end-of-message symbol.
For this example it is assumed that the decoder knows that we intend to use base 10 number system, an
initial range of [0, 100000) and the frequency algorithm {A: .60; B: .20; <EOM>: .20}. The first symbol
breaks down the range [0, 100000) into three subranges:
A: [ 0, 60000)
B: [ 60000, 80000)
<EOM>: [ 80000, 100000)
Since our first symbol is an A, it reduces our initial range down to [0, 60000). The second symbol choice
leaves us with three sub-ranges of this range, we show them following the already-encoded 'A':
AA: [ 0, 36000)
AB: [ 36000, 48000)
A<EOM>: [ 48000, 60000)
With two symbols encoded, our range is now [000000, 036000) and our third symbols leads to the
following choices:
AAA: [ 0, 21600)
AAB: [ 21600, 28800)
AA<EOM>: [ 28800, 36000)
This time it is the second of our three choices that represent the message we want to encode, and our range
becomes [21600, 28800). It looks harder to determine our sub-ranges in this case, but it is actually not: we
can merely subtract the lower bound from the upper bound to determine that there are 7200 numbers in our
range; that the first 4320 of them represent .60 of the total, the next 1440 represent the next .20, and the
remaining 1440 represent the remaining .20 of the total. Adding back the lower bound gives us our ranges:
AABA: [21600, 25920)
AABB: [25920, 27360)
AAB<EOM>: [27360, 28800)
Finally, with our range narrowed down to [21600, 25920), we have just one more symbol to encode. Using
the same technique as before for dividing up the range between the lower and upper bound, we find the
three sub-ranges are:
AABAA: [21600, 24192)
AABAB: [24192, 25056)
AABA<EOM>: [25056, 25920)
And since <EOM> is our final symbol, our final range is [25056, 25920). Because all five-digit integers
starting with "251" fall within our final range, it is one of the three-digit prefixes we could transmit that
would unambiguously convey our original message. (The fact that there are actually eight such prefixes in
all implies we still have inefficiencies. They have been introduced by our use of base 10 rather than base 2.)
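A minimal sketch of this process, using integer frequency counts (A: 3, B: 1, <EOM>: 1 out of 5) so that the arithmetic matches the worked example exactly, might look as follows; the function name, the lack of renormalization and the sub-range layout fixed by the dictionary order are assumptions made purely for illustration.

```python
def range_encode(message, freqs, low=0, size=100000):
    """Toy base-10 range encoder (no renormalization) for the example above."""
    total = sum(freqs.values())
    for symbol in message:
        cum = 0
        for s, f in freqs.items():              # dict order fixes the sub-range layout
            lo = size * cum // total            # start of this symbol's sub-range
            hi = size * (cum + f) // total      # end of this symbol's sub-range
            if s == symbol:
                low, size = low + lo, hi - lo   # narrow to the chosen sub-range
                break
            cum += f
    return low, low + size                      # any integer in [low, low+size) will do

freqs = {'A': 3, 'B': 1, '<EOM>': 1}            # i.e. probabilities 0.60, 0.20, 0.20
print(range_encode(['A', 'A', 'B', 'A', '<EOM>'], freqs))   # (25056, 25920)
```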
The central problem may appear to be selecting an initial range large enough that no matter how many
symbols we have to encode, we will always have a current range large enough to divide into non-zero sub-
ranges. In practice, however, this is not a problem, because instead of starting with a very large range and
gradually narrowing it down, the encoder works with a smaller range of numbers at any given time. After
some number of digits have been encoded, the leftmost digits will not change. In the example after
encoding just three symbols, we already knew that our final result would start with "2". Like arithmetic
coding, more digits are shifted in on the right as digits on the left are sent off. Arithmetic coding can be
thought of as a form of range encoding with the range starting at zero and extending to one.

Arithmetic coding

Arithmetic coding is a method for lossless data compression. It is a form of entropy encoding, but where
other entropy encoding techniques separate the input message into its component symbols and replace each
symbol with a code word, arithmetic coding encodes the entire message into a single number, a fraction n
where 0.0 <= n < 1.0.

How arithmetic coding works

Arithmetic coders produce near-optimal output for a given set of symbols and probabilities. Compression
algorithms that use arithmetic coding start by determining a model of the data -- basically a prediction of
what patterns will be found in the symbols of the message. The more accurate this prediction is, the closer
to optimality the output will be.
Example: a simple, static model for describing the output of a particular monitoring instrument over time
might be:
60% chance of symbol NEUTRAL
20% chance of symbol POSITIVE
10% chance of symbol NEGATIVE
10% chance of symbol END-OF-DATA. (The presence of this symbol means that the
stream will be 'internally terminated', as is fairly common in data compression; the first
and only time this symbol appears in the data stream, the decoder will know that the
entire stream has been decoded.)

Models can handle other alphabets than the simple four-symbol set chosen for this example, of course.
More sophisticated models are also possible: higher-order modeling changes its estimation of the current
probability of a symbol based on the symbols that precede it (the context), so that in a model for English
text, for example, the percentage chance of "u" would be much higher when it followed a "Q" or a "q".
Models can even be adaptive, so that they continuously change their prediction of the data based on what
the stream actually contains. The decoder must have the same model as the encoder.

Each step of the encoding process, except for the very last, is the same; the encoder has basically just three
pieces of data to consider:
- The next symbol that needs to be encoded
- The current interval (at the very start of the encoding process, the
interval is set to [0,1), but that will change)
- The probabilities the model assigns to each of the various symbols
that are possible at this stage (as mentioned earlier, higher-order or
adaptive models mean that these probabilities are not necessarily
the same in each step.)
The encoder divides the current interval into sub-intervals, each representing a fraction of the current
interval proportional to the probability of that symbol in the current context. Whichever interval
corresponds to the actual symbol that is next to be encoded becomes the interval used in the next step.
Example: for the four-symbol model above:
- The interval for NEUTRAL would be [0, 0.6)
- The interval for POSITIVE would be [0.6, 0.8)
- The interval for NEGATIVE would be [0.8, 0.9)
- The interval for END-OF-DATA would be [0.9, 1).
When all symbols have been encoded, the resulting interval identifies, unambiguously, the sequence of
symbols that produced it. Anyone who has the final interval and the model used can reconstruct the symbol
sequence that must have entered the encoder to result in that final interval.

It is not necessary to transmit the final interval, however; it is only necessary to transmit one fraction that
lies within that interval. In particular, it is only necessary to transmit enough digits (in whatever base) of
the fraction so that all fractions that begin with those digits fall into the final interval.

Example: we are now trying to decode a message encoded with the four-symbol model above. The
message is encoded in the fraction 0.538 (for clarity, we are using decimal, instead of binary; we are also
assuming that whoever gave us the encoded message gave us only as many digits as needed to decode the
message. We will discuss both issues later.)

We start, as the encoder did, with the interval [0,1), and using the same model, we divide it into the same
four sub-intervals that the encoder must have. Our fraction 0.538 falls into the sub-interval for NEUTRAL,
[0, 0.6); this indicates to us that the first symbol the encoder read must have been NEUTRAL, so we can
write that down as the first symbol of our message. We then divide the interval [0, 0.6) into sub-intervals:
- The interval for NEUTRAL would be [0, 0.36) -- 60% of [0, 0.6)
- The interval for POSITIVE would be [0.36, 0.48) -- 20% of [0, 0.6)
- The interval for NEGATIVE would be [0.48, 0.54) -- 10% of [0, 0.6)
- The interval for END-OF-DATA would be [0.54, 0.6). -- 10% of [0, 0.6)
Our fraction of .538 is within the interval [0.48, 0.54); therefore the second symbol of the message must
have been NEGATIVE. Once more we divide our current interval into sub-intervals:
- The interval for NEUTRAL would be [0.48, 0.516)
- The interval for POSITIVE would be [0.516, 0.528)
- The interval for NEGATIVE would be [0.528, 0.534)
- The interval for END-OF-DATA would be [0.534, 0.540).
Our fraction of .538 falls within the interval of the END-OF-DATA symbol; therefore, this must be our
next symbol. Since it is also the internal termination symbol, it means our decoding is complete. (If the
stream was not internally terminated, we would need to know where the stream stops from some other
source -- otherwise, we would continue the decoding process forever, mistakenly reading more symbols
from the fraction than were in fact encoded into it.)
The fact that the same message could have been encoded by the equally short fractions .534, .535, .536, .537 or .539
suggests that our use of decimal instead of binary introduced some inefficiency. This is correct; the
information content of a three-digit decimal is approximately 9.966 bits; we could have encoded the same
message in the binary fraction .10001010 (equivalent to .5390625 decimal) at a cost of only 8 bits. This is
only slightly larger than the information content, or entropy, of our message, which has a probability of
0.6% and hence an entropy of approximately 7.381 bits. (Note that the final zero must be specified in the binary
fraction, or else the message would be ambiguous.)
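A toy decoder for this example might look as follows; the model representation (an ordered list of symbol/probability pairs) and the fixed END-OF-DATA termination are assumptions made for illustration, and a practical decoder would use the finite-precision techniques described in the next section.

```python
def arithmetic_decode(fraction, model, end_symbol='END-OF-DATA'):
    """Toy arithmetic decoder for the four-symbol example above."""
    low, high = 0.0, 1.0
    decoded = []
    while True:
        cum = low
        for symbol, p in model:
            width = (high - low) * p           # size of this symbol's sub-interval
            if cum <= fraction < cum + width:  # the fraction selects the sub-interval
                decoded.append(symbol)
                low, high = cum, cum + width
                break
            cum += width
        if decoded[-1] == end_symbol:          # internally terminated stream
            return decoded

model = [('NEUTRAL', 0.6), ('POSITIVE', 0.2), ('NEGATIVE', 0.1), ('END-OF-DATA', 0.1)]
print(arithmetic_decode(0.538, model))   # ['NEUTRAL', 'NEGATIVE', 'END-OF-DATA']
```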

Precision and renormalization

The above explanations of arithmetic coding contain some simplification. In particular, they are written as
if the encoder first calculated the fractions representing the endpoints of the interval in full, using infinite
precision, and only converted the fraction to its final form at the end of encoding. Rather than try to
simulate infinite precision, most arithmetic coders instead operate at a fixed limit of precision that they
know the decoder will be able to match, and round the calculated fractions to their nearest equivalents at
that precision. An example shows how this would work if the model called for the interval [0,1) to be
divided into thirds, and this was approximated with 8 bit precision. Note that now that the precision is
known, so are the binary ranges we'll be able to use.

Symbol   Probability   Interval to eight-bit precision (as fractions)   Interval to eight-bit precision (in binary)   Range in binary
A        1/3           [0, 85/256)                                      [0.00000000, 0.01010101)                      00000000 - 01010100
B        1/3           [85/256, 171/256)                                [0.01010101, 0.10101011)                      01010101 - 10101010
C        1/3           [171/256, 1)                                     [0.10101011, 1.00000000)                      10101011 - 11111111
A process called renormalization keeps the finite precision from becoming a limit on the total number of
symbols that can be encoded. Whenever the range is reduced to the point where all values in the range
share certain beginning digits, those digits are sent to the output. However many digits of precision the
computer can handle, it is now handling fewer than that, so the existing digits are shifted left, and at the
right, new digits are added to expand the range as widely as possible. Note that this result occurs in two of
the three cases from our previous example.

Symbol   Probability   Range                 Digits that can be sent to output   Range after renormalization
A        1/3           00000000 - 01010100   0                                   00000000 - 10101001
B        1/3           01010101 - 10101010   None                                01010101 - 10101010
C        1/3           10101011 - 11111111   1                                   01010110 - 11111111

Teaching aid

An interactive visualization tool for teaching arithmetic coding, dasher.tcl, was also the first prototype of
the assistive communication system, Dasher.

Connections between arithmetic coding and other compression methods

Huffman coding

There is great similarity between arithmetic coding and Huffman coding; in fact, it has been shown that
Huffman coding is just a specialized case of arithmetic coding. However, because arithmetic coding translates the entire
message into one number represented in base b, rather than translating each symbol of the message into a
series of digits in base b, it will often approach optimal entropy encoding much more closely than Huffman
coding can.

Range encoding

There is a profound similarity between arithmetic coding and range encoding. In its classical
implementation, range encoding uses 9 bits more than arithmetic coding, so at the same precision range
encoding is a little behind in compression but ahead in speed because it operates on bytes. Range encoding,
unlike arithmetic coding, is generally believed not to be covered by any company's patents.

The idea behind range encoding is that, instead of starting with the interval [0,1) and dividing it into sub-
intervals proportional to the probability of each symbol, the encoder starts with a large range of non-
negative integers, such as 000,000,000,000 to 999,999,999,999, and divides it into sub-ranges proportional
to the probability of each symbol. When the sub-ranges get narrowed down sufficiently that the leading
digits of the final result are known, those digits may be shifted "left" out of the calculation, and replaced by
digits shifted in on the "right" -- each time this happens, it is roughly equivalent to a retroactive
multiplication of the size of the initial range.

US patents on arithmetic coding

A variety of specific techniques for arithmetic coding have been protected by US patents. Some of these
patents may be essential for implementing the algorithms for arithmetic coding that are specified in some
formal international standards. When this is the case, such patents are generally available for licensing
under what are called reasonable and non-discriminatory (RAND) licensing terms (at least as a matter of
standards-committee policy). In some well-known instances (including some involving IBM patents) such
licenses are available for free, and in other instances, licensing fees are required. The availability of
licenses under RAND terms does not necessarily satisfy everyone who might want to use the technology, as
what may be "reasonable" fees for a company preparing a proprietary software product may seem much
less reasonable for a free software or open source project.

One company well known for innovative work and patents in the area of arithmetic coding is IBM. Some
commenters feel that the notion that no kind of practical and effective arithmetic coding can be performed
without infringing on valid patents held by IBM or others is just a persistent urban legend in the data
compression community (especially considering that effective designs for arithmetic coding have now been
in use long enough for many of the original patents to have expired). However, since patent law provides
no "bright line" test that proactively allows you to determine whether a court would find a particular use to
infringe a patent, and as even investigating a patent more closely to determine what it actually covers could
actually increase the damages awarded in an unfavorable judgment, the patenting of these techniques has
nevertheless caused a chilling effect on their use. At least one significant compression software program,
bzip2, deliberately discontinued the use of arithmetic coding in favor of Huffman coding due to the patent
situation. Some US patents relating to arithmetic coding are listed below.
- Patent 4,122,440 (IBM) Filed March 4, 1977, Granted 24 October 1978 (Now expired)
- Patent 4,286,256 (IBM) Granted 25 August 1981 (presumably now expired)
- Patent 4,467,317 (IBM) Granted 21 August 1984 (presumably now expired)
- Patent 4,652,856 (IBM) Granted 4 February 1986 (presumably now expired)
- Patent 4,891,643 (IBM) Filed 15 September 1986, granted 2 January 1990
- Patent 4,905,297 (IBM) Granted 27 February 1990
- Patent 4,933,883 (IBM) Granted 12 June 1990
- Patent 4,935,882 (IBM) Granted 19 June 1990
- Patent 4,989,000 Filed 19 June 1989, granted 29 January 1991
- Patent 5,099,440 (IBM) Filed 5 January 1990, granted 24 March 1992
- Patent 5,272,478 (Ricoh) Filed 17 August 1992, granted 21 December 1993
Unary coding

Unary coding is an entropy encoding that represents a number n with n - 1 ones followed by a zero. For
example, 5 is represented as 11110. Unary coding is the optimal entropy encoding for the probability distribution:

P(x) = 2^(-x)

Elias gamma coding

Elias gamma code is a universal code encoding the positive integers. To code a number:
1. Write it in binary.
2. Subtract 1 from the number of bits written in step 1 and prepend that many zeros.
An equivalent way to express the same process:
1. Separate the integer into the highest power of 2 it contains (2^N) and the remaining N binary digits of the integer.
2. Encode N in unary; that is, as N zeroes followed by a one.
3. Append the remaining N binary digits to this representation of N.
The code begins:
1 1
2 010
3 011
4 00100
5 00101
6 00110
7 00111
8 0001000
9 0001001
10 0001010
11 0001011
12 0001100
13 0001101
14 0001110
15 0001111
16 000010000
17 000010001
To decode an Elias gamma-coded integer:
1. Read and count zeroes from the stream until you reach the first one. Call this count of zeroes N.
2. Considering the one that was reached to be the first digit of the integer, with a value of 2^N, read the
remaining N digits of the integer.
Gamma coding is used in applications where the largest encoded value is not known ahead of time, or to
compress data in which small values are much more frequent than large values.
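The following Python sketch implements unary coding and the first formulation of Elias gamma coding given above (write the number in binary, then prepend one fewer zeros than there are bits); it is illustrative only, and the function names are arbitrary.

```python
def unary(n):
    """Unary code of n: n-1 ones followed by a zero, as described above."""
    return '1' * (n - 1) + '0'

def gamma_encode(n):
    """Elias gamma code of a positive integer n."""
    binary = bin(n)[2:]                         # binary form, leading 1 included
    return '0' * (len(binary) - 1) + binary     # prepend len-1 zeros

def gamma_decode(bits):
    """Decode one gamma codeword from the front of a bit string."""
    zeros = 0
    while bits[zeros] == '0':                   # count the leading zeros: N
        zeros += 1
    value = int(bits[zeros:2 * zeros + 1], 2)   # the '1' plus the next N digits
    return value, bits[2 * zeros + 1:]          # decoded integer and remaining bits

print(gamma_encode(10))          # 0001010 (matches the table above)
print(gamma_decode('0001010'))   # (10, '')
```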

Generalizations
Gamma coding does not code zero or the negative integers. One way of handling zero is to add 1 before
coding and then subtract 1 after decoding. Another way is to prefix each nonzero code with a 1 and then
code zero as a single 0. One way to code all integers is to set up a bijection, mapping integers (0, 1, -1, 2, -
2, 3, -3, ...) to (1, 2, 3, 4, 5, 6, 7, ...) before coding. In mathematics, a function f from a set X to a set Y is
said to be bijective if for every y in Y there is exactly one x in X such that f(x) = y.

Elias delta coding

Elias delta code is a universal code encoding the positive integers. To code a number:
1. Write it in binary.
2. Count the bits, remove the leading one, and write that number of starting bits in binary preceding
the previous bit string.
3. Subtract 1 from the number of bits written in step 2 and prepend that many zeros.
An equivalent way to express the same process:
1. Separate the integer into the highest power of 2 it contains (2^N') and the remaining N' binary digits of the integer.
2. Encode N = N' + 1 with Elias gamma coding.
3. Append the remaining N' binary digits to this representation of N.
The code begins:
1  = 2^0      => N' = 0, N = 1 => 1
2  = 2^1 + 0  => N' = 1, N = 2 => 0100
3  = 2^1 + 1  => N' = 1, N = 2 => 0101
4  = 2^2 + 0  => N' = 2, N = 3 => 01100
5  = 2^2 + 1  => N' = 2, N = 3 => 01101
6  = 2^2 + 2  => N' = 2, N = 3 => 01110
7  = 2^2 + 3  => N' = 2, N = 3 => 01111
8  = 2^3 + 0  => N' = 3, N = 4 => 00100000
9  = 2^3 + 1  => N' = 3, N = 4 => 00100001
10 = 2^3 + 2  => N' = 3, N = 4 => 00100010
11 = 2^3 + 3  => N' = 3, N = 4 => 00100011
12 = 2^3 + 4  => N' = 3, N = 4 => 00100100
13 = 2^3 + 5  => N' = 3, N = 4 => 00100101
14 = 2^3 + 6  => N' = 3, N = 4 => 00100110
15 = 2^3 + 7  => N' = 3, N = 4 => 00100111
16 = 2^4 + 0  => N' = 4, N = 5 => 001010000
17 = 2^4 + 1  => N' = 4, N = 5 => 001010001

To decode an Elias delta-coded integer:
1. Read and count zeroes from the stream until you reach the first one. Call this count of zeroes L.
2. Considering the one that was reached to be the first digit of an integer, with a value of 2^L, read the
remaining L digits of the integer. Call this integer N (so N = N' + 1).
3. Put a one in the first place of the final output, representing the value 2^(N-1), and read and append the
following N - 1 digits.
Example: decoding 001010001
1. There are 2 leading zeroes, so L = 2.
2. Read the leading one plus 2 more bits: N = 00101 (binary) = 5.
3. N' = 5 - 1 = 4, so the remaining 4 bits of the code, '0001', complete the number.
4. Decoded number = 2^4 + 1 = 17
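Building on the gamma functions in the earlier sketch, Elias delta coding can be illustrated as follows (again, only a sketch; it reuses gamma_encode and gamma_decode defined above).

```python
def delta_encode(n):
    """Elias delta code: gamma-code the bit length of n, then append the
    remaining bits of n (everything after its leading 1)."""
    binary = bin(n)[2:]
    return gamma_encode(len(binary)) + binary[1:]

def delta_decode(bits):
    """Decode one delta codeword from the front of a bit string."""
    length, rest = gamma_decode(bits)             # N = N' + 1, the bit length
    tail, remaining = rest[:length - 1], rest[length - 1:]
    value = (1 << (length - 1)) + (int(tail, 2) if tail else 0)
    return value, remaining

print(delta_encode(17))             # 001010001
print(delta_decode('001010001'))    # (17, '')
```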

Elias omega coding

Elias omega coding is a universal code encoding the positive integers. Like Elias gamma coding and Elias
delta coding, it works by prefixing the integer with a representation of its order of magnitude in a universal
code. Unlike those other two codes, however, Elias omega uses itself recursively to encode that prefix;
thus, they are sometimes known as recursive Elias codes. To code a number:
1. Put a 'group' of "0" at the end of the representation.
2. If the number to be encoded is 1, stop; if not, add the binary representation of the number as a
'group' to the beginning of the representation.
3. Repeat the previous step, with the number of digits just written, minus 1, as the new number to be
encoded.
The code begins:
1 0
2 10 0
3 11 0
4 10 100 0
5 10 101 0
6 10 110 0
7 10 111 0
8 11 1000 0
9 11 1001 0
10 11 1010 0
11 11 1011 0
12 11 1100 0
13 11 1101 0
14 11 1110 0
15 11 1111 0
16 10 100 10000 0
17 10 100 10001 0
To decode an Elias omega-coded integer:
1. Start with a variable N, set to a value of 1.
2. Read the first 'group', which will either be a single "0", or a "1" followed by N more digits. If it is
a "0", it means the value of the integer is 1; if it starts with a "1", then N becomes the value of the
group interpreted as a binary number.
3. Read each successive group; it will either be a single "0", or a "1" followed by N more digits. If it
is a "0", it means the value of the integer is N; if it starts with a "1", then N becomes the value of
the group interpreted as a binary number.
Omega coding is used in applications where the largest encoded value is not known ahead of time, or to
compress data in which small values are much more frequent than large values.
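A short recursive-style sketch of Elias omega encoding and decoding, following the grouping procedure described above, might look like this (function names are illustrative):

```python
def omega_encode(n):
    """Elias omega code of a positive integer n."""
    code = '0'                          # the terminating '0' group
    while n > 1:
        group = bin(n)[2:]              # binary representation of n
        code = group + code             # prepend it as a group
        n = len(group) - 1              # next number to encode: group length - 1
    return code

def omega_decode(bits):
    """Decode one omega codeword from the front of a bit string."""
    n, i = 1, 0
    while bits[i] == '1':               # a group starting '0' terminates the integer
        group = bits[i:i + n + 1]       # a '1' followed by N more digits
        i += n + 1
        n = int(group, 2)
    return n, bits[i + 1:]

print(omega_encode(17))             # 10100100010 (i.e. 10 100 10001 0)
print(omega_decode('10100100010'))  # (17, '')
```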

Fibonacci coding
In mathematics, Fibonacci coding is a universal code that encodes positive integers into binary code
words. All tokens end with "11" and have no "11" before the end. The code begins as follows:
1 11
2 011
3 0011
4 1011
5 00011
6 10011
7 01011
8 000011
9 100011
10 010011
11 001011
12 101011
The Fibonacci code is closely related to Fibonacci representation, a positional numeral system sometimes
used by mathematicians. The Fibonacci code for a particular integer is exactly that of the integer's
Fibonacci representation, except with the order of its digits reversed and an additional "1" appended to the
end. To encode an integer X:
1. Find the largest Fibonacci number equal to or less than X; subtract this number from X, keeping
track of the remainder.
2. If the number we subtracted was the Nth unique Fibonacci number, put a one in the Nth digit of
our output.
3. Repeat the previous steps, substituting our remainder for X, until we reach a remainder of 0.
4. Place a one after the last naturally-occurring one in our output.
To decode a token in the code, remove the last "1", assign the remaining bits the values 1, 2, 3, 5, 8, 13, ... (the
Fibonacci numbers), and add the values corresponding to the bits that are "1".
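The encoding and decoding steps above can be sketched as follows; the greedy search for the largest Fibonacci number mirrors step 1, and the function names are illustrative only.

```python
def fib_encode(n):
    """Fibonacci code of a positive integer n."""
    fibs = [1, 2]
    while fibs[-1] <= n:                 # Fibonacci numbers 1, 2, 3, 5, 8, ...
        fibs.append(fibs[-1] + fibs[-2])
    bits, remainder = ['0'] * (len(fibs) - 1), n
    for i in range(len(fibs) - 2, -1, -1):
        if fibs[i] <= remainder:         # greedy: largest Fibonacci number first
            bits[i] = '1'
            remainder -= fibs[i]
    return ''.join(bits) + '1'           # append the terminating '1'

def fib_decode(token):
    """Decode a single Fibonacci codeword (token ending in '11')."""
    bits = token[:-1]                    # drop the final '1'
    fibs = [1, 2]
    while len(fibs) < len(bits):
        fibs.append(fibs[-1] + fibs[-2])
    return sum(f for f, b in zip(fibs, bits) if b == '1')

print(fib_encode(11))          # 001011 (matches the table above)
print(fib_decode('001011'))    # 11
```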

Comparison with other universal codes
Fibonacci coding has a useful property that sometimes makes it attractive in comparison to other universal
codes: it is easier to recover data from a damaged stream. With most other universal codes, if a single bit is
altered, none of the data that comes after it will be correctly read. With Fibonacci coding, on the other
hand, a changed bit may cause one token to be read as two, or cause two tokens to be read incorrectly as
one, but reading a "0" from the stream will stop the errors from propagating further. Since the only stream
that has no "0" in it is a stream of "11" tokens, the total edit distance between a stream damaged by a single
bit error and the original stream is at most three.

Golomb coding

Golomb coding is a form of entropy coding invented by Solomon W. Golomb that is optimal for alphabets
following geometric distributions, that is, when small values are vastly more common than large values. It
uses a tunable parameter b to divide an input value into two parts: q, the result of a division by b, and r, the
remainder. The quotient is sent in unary coding, followed by the remainder in truncated binary encoding.
When b = 1 Golomb coding is equivalent to unary coding. Formally, the two parts are given by the
following expressions, where x is the number being encoded:

q = int((x - 1) / b)

and

r = x - qb - 1

The final result looks like: (q encoded in unary)(r encoded in truncated binary).

Note that r can be of a varying number of bits. The parameter b is a function of the corresponding Bernoulli
process, which is parameterized by p = P(X = 0), the probability of success in a given Bernoulli trial. b and
p are related by these inequalities:

(1 - p)^b + (1 - p)^(b+1) <= 1 < (1 - p)^(b-1) + (1 - p)^b

The Golomb code for this distribution is equivalent to the Huffman code for the same probabilities, if it
were possible to compute the Huffman code. Rice coding is a special case of Golomb coding first
described by Robert Rice. It is equivalent to Golomb coding where the tunable parameter is a power of two.
This makes it extremely efficient for use on computers, since the division operation becomes a bitshift
operation and the remainder operation becomes a bitmask operation.
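Because the text singles out the power-of-two (Rice) case, the sketch below implements only that special case: the quotient as a run of ones terminated by a zero, and the remainder as k plain binary bits, so b = 2^k (with k >= 1 assumed). It is an illustration, not a reference implementation.

```python
def rice_encode(x, k):
    """Rice code of a positive integer x with parameter b = 2**k, k >= 1."""
    q = (x - 1) >> k                    # the division by b is just a bit shift
    r = (x - 1) & ((1 << k) - 1)        # the remainder is just a bit mask
    return '1' * q + '0' + format(r, '0{}b'.format(k))

def rice_decode(bits, k):
    """Decode one Rice codeword from the front of a bit string."""
    q = bits.index('0')                 # length of the unary quotient
    r = int(bits[q + 1:q + 1 + k], 2)
    return q * (1 << k) + r + 1, bits[q + 1 + k:]

print(rice_encode(9, 2))        # '11000' (q = 2, r = 0)
print(rice_decode('11000', 2))  # (9, '')
```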

Applications
The FLAC audio codec uses Rice coding to represent linear prediction residues. Apple's ALAC audio
codec uses a modified form of Rice coding after its adaptive FIR filter.

Shannon-Fano coding

In the field of data compression, Shannon-Fano coding is a technique for constructing a prefix code based
on a set of symbols and their probabilities (estimated or measured). The symbols are arranged in order from
most probable to least probable, and then divided into two sets whose total probabilities are as close as
possible to being equal. All symbols then have the first digits of their codes assigned; symbols in the first
set receive "0" and symbols in the second set receive "1". As long as any sets with more than one member
remain, the same process is repeated on those sets, to determine successive digits of their codes. When a set
has been reduced to one symbol, of course, this means the symbol's code is complete and will not form the
prefix of any other symbol's code.

The algorithm works, and it produces fairly efficient variable-length encodings; when the two smaller sets
produced by a partitioning are in fact of equal probability, the one bit of information used to distinguish
them is used most efficiently. Unfortunately, Shannon-Fano does not always produce optimal prefix codes;
the set of probabilities (.35, .17, .17, .16, .15) is an example of one that will be assigned non-optimal codes
by Shannon-Fano.
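A recursive sketch of the procedure, applied to the probability set quoted above, is shown below; with these probabilities it yields an average codeword length of 2.31 bits per symbol, compared with 2.30 for the corresponding Huffman code, which illustrates the point about non-optimality. The function name and data layout are assumptions for illustration.

```python
def shannon_fano(symbols):
    """Recursive Shannon-Fano coding. `symbols` is a list of (symbol, probability)
    pairs; returns a {symbol: codeword} dict."""
    if len(symbols) == 1:
        return {symbols[0][0]: ''}
    symbols = sorted(symbols, key=lambda sp: sp[1], reverse=True)
    total = sum(p for _, p in symbols)
    running, split, best_diff = 0.0, 1, float('inf')
    for i in range(1, len(symbols)):             # find the most balanced split
        running += symbols[i - 1][1]
        diff = abs(2 * running - total)          # |first part - second part|
        if diff < best_diff:
            best_diff, split = diff, i
    codes = {}
    for prefix, group in (('0', symbols[:split]), ('1', symbols[split:])):
        for sym, code in shannon_fano(group).items():
            codes[sym] = prefix + code
    return codes

probs = [('A', 0.35), ('B', 0.17), ('C', 0.17), ('D', 0.16), ('E', 0.15)]
print(shannon_fano(probs))
```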

For this reason, Shannon-Fano is almost never used; Huffman coding is almost as computationally simple
and always produces optimal prefix codes (optimal, that is, under the constraint that each symbol is
represented by a code formed of an integral number of bits). In practice, the codes produced by Huffman
coding are only genuinely optimal if the probabilities of the symbols are some power of a half. The
integral-bit constraint is often unnecessary, since the codes will be packed end-to-end in long sequences; in
such situations, arithmetic coding can produce greater overall compression than either Huffman or
Shannon-Fano, since it can encode in fractional numbers of bits that more closely approximate the actual
information content of each symbol. However, arithmetic coding has not made Huffman coding obsolete in the
way that Huffman coding made Shannon-Fano obsolete, both because arithmetic coding is more computationally
expensive and because it is covered by multiple patents. Range encoding, by contrast, is equally efficient and
is not plagued by patent issues.

JPEG Joint Photographic Experts Group

In computing, JPEG (pronounced jay-peg) is the most commonly used standard method of lossy
compression for photographic images. The file format which employs this compression is commonly also
called JPEG; the most common file extensions for this format are .jpeg, .jfif, .jpg, .JPG, or .JPE although
.jpg is the most common on all platforms.

The name stands for Joint Photographic Experts Group. JPEG itself specifies only how an image is
transformed into a stream of bytes, but not how those bytes are encapsulated in any particular storage
medium. A further standard, created by the Independent JPEG Group, called JFIF (JPEG File
Interchange Format) specifies how to produce a file suitable for computer storage and transmission (such as
over the Internet) from a JPEG stream. In common usage, when one speaks of a "JPEG file" one generally
means a JFIF file, or sometimes an Exif JPEG file. There are, however, other JPEG-based file formats,
such as JNG, and the TIFF format can carry JPEG data as well.

JPEG/JFIF is the format most used for storing and transmitting photographs on the World Wide Web. It is
not as well suited for line drawings and other textual or iconic graphics because its compression method
performs badly on these types of images (the PNG and GIF formats are in common use for that purpose;
GIF, having only 8 bits per pixel is not well suited for colour photographs, PNG can be used to losslessly
store photographs but the filesize makes it largely unsuitable for putting photographs on the web).












Channel Coding



The fundamental resources at the disposal of a communications engineer are:

- Signal power,
- Time and
- Bandwidth.

For a given communication system design these three resources can be traded against each other. In general
communications engineers seek to maximise data transfer rate, in a minimum bandwidth while maintaining
an acceptable quality of transmission. The quality of transmission, in the context of digital
communications, is essentially concerned with the probability of bit error, P_b, at the receiver.

Channel Capacity

The capacity of a channel is the maximum rate at which information can be transferred through a given
channel, and the rate of transmitting this information depends on the rate at which the source generates the
information as well as the fidelity of the channel. If the communication channel is noisy, then more effort is
needed to transmit the intended information reliably. We can optimise information transmission over a
noisy channel by applying the Shannon-Hartley theorem that states that:
R_max = B log2(1 + S/N)   bits/s

where R_max is the maximum rate of information transmission over a channel with bandwidth B and signal to
noise ratio S/N.

This theorem is also called the channel capacity theorem. It tells us that we can maintain the same rate of
information transmission at low signal to noise ratios (S/N) simply by increasing the available bandwidth
B. Alternatively, we can maintain the same transmission rate at reduced bandwidths simply by increasing
the signal power. Note that the signal to noise ratio increases if signal power is increased for the same noise
level.
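As a simple illustration, the sketch below evaluates this expression for a hypothetical 3.1 kHz channel with a 30 dB signal to noise ratio; the figures are chosen purely for the example.

```python
import math

def channel_capacity(bandwidth_hz, snr_linear):
    """Shannon-Hartley limit R_max = B log2(1 + S/N), in bits per second."""
    return bandwidth_hz * math.log2(1 + snr_linear)

snr = 10 ** (30 / 10)                        # 30 dB converted to a linear power ratio
print(round(channel_capacity(3100, snr)))    # about 30898 bits/s
```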

Need for Error control coding

In general communication systems convey information through imperfect channels. Channel impairments
are classified as:

1) Additive noise
a) White noise
b) Impulsive noise, e.g. man-made noise

2) Channel disturbances
a) Fading in radio channels
b) Breaks in transmission

The likely effects on digital signals are:
1) Uniformly random errors
These are digital errors that occur individually and independently. They are caused primarily by noise.
2) Burst errors
These are digital errors that occur in clusters. They are caused mainly by a combination of noise and
channel disturbances.
3) Erasures
These are irregular intervals when no reliable signal can be detected as a result of severe channel
disturbances.

Despite these impairments, communication systems are expected to transmit reliably at the required data
rate (often determined by the service being provided) within the bandwidth constraint of the available
channel and the power constraint of the particular application. For instance, in mobile communications the
prescribed channel allocations determine the bandwidth, and safety considerations or transceiver battery
technology determine the maximum radiated power. For reliable communication, the required data rate
must be achieved within an acceptable BER (Bit error rate) and time delay. Usually, some form of coding
is required if reliable communication is to be achieved within the given constraints of signal power and
available bandwidth. This form of coding is called error control coding or channel coding.

Error rate control concepts

The normal measure of error performance in digital communications is the bit error rate (BER). BER is the
average rate at which errors occur. If P_b is the probability of any transmitted binary digit being in error and
R_b is the bit transmission rate in the channel, then BER = P_b × R_b. If the error rate of a particular system
is too large then the following measures, either separately or in combination, are used to reduce the BER:

1) Increasing the transmitted power:
This is not always desirable, especially in mobile communications, where transceiver batteries limit
the maximum available power.

2) Applying diversity:
Diversity techniques are especially suitable for reducing errors due to signal fading. There are three
ways of implementing diversity:
a) Space diversity:
Data is transmitted simultaneously over two or more paths that are sufficiently separated to
prevent the same channel disturbance affecting more than one path.
b) Frequency diversity:
Two or more frequencies are used to transmit the same data.
c) Time diversity:
The same message is transmitted more than once at different times.

3) Using full duplex transmission:
This uses simultaneous two-way transmission. When the transmitter sends information to a receiver,
the information is echoed back to the transmitter on a separate feedback channel. Information echoed
back that contains errors can then be retransmitted. This technique requires twice the bandwidth of a
single direction (simplex) transmission that may be unacceptable in cases where bandwidth is limited.

4) Automatic repeat request (ARQ):
Here a simple error detecting code is used and if any error is detected in a given data block, then a
request is sent via a feedback channel to retransmit that block.
There are two major ARQ techniques:
a) Stop and wait ARQ
Each block of data is positively or negatively acknowledged by the receiver as being error
free before the next data block can be transmitted. If the transmitter does not receive any
acknowledgement within a time-out period, the data block is retransmitted in the same
way as if it had been negatively acknowledged.

b) Continuous ARQ
Blocks of data continue to be transmitted without waiting for each previous block to be
acknowledged. Continuous ARQ can, in turn, be divided into two variants:
(i) Go-back-n ARQ:
Data blocks carry a sequence or reference number n. Each acknowledgement
signal contains the reference number of a data block and effectively
acknowledges all data blocks up to n-1. When a negative acknowledgement is
received (or data block timed out) all data blocks starting from the reference
number in the last acknowledgement signal are retransmitted.

(ii) Selective repeat ARQ:
In this case only those data blocks explicitly negatively acknowledged are
retransmitted.

ARQ is very effective, and it is the main error control method in Group 3 facsimile
transmission.

5) Forward error correcting coding (FECC):
Source data is partitioned into blocks of data interleaved with sufficient check bits to enable the
receiver to correct the most probable errors. FECC is more complex than the other error control
techniques and had to wait for the advent of VLSI that made it practical to implement the FECC
algorithms in hardware.

Applications for error control Coding

Error control coding is used in digital systems in two cases:
- Where the value of the message is critically dependent on reliable
transmission
- When the transmission media is very prone to error.

Messages that are critically dependent on reliable transmission:
- Digital data with no internal redundancy
- Control signals for multiple-user systems of all kinds
- Voice or data that have undergone highly efficient source coding e.g. mobile communication.

Transmission media with potentially high error rates include:
- The switched telephone network used for the conveyance of voice-frequency data signals
- Very long radio paths, in space communications
- Radio paths which are particularly vulnerable to fading, such as in mobile systems
- Storage media which may exhibit localised flaws, leading to error bursts in recovered data.

Error coding in Specific applications

- Digital Speech over the Fixed telephone network:
The error rate of the medium is very low, and speech has so much redundancy that common bit errors
are not noticeable. Hence no error coding is implemented in this case.
- Computer Communications:
Computer data is very sensitive to errors, but it does not suffer adversely from delay. ARQ is the
preferred mode of error control in this case.
- Space Communications:
The long radio paths associated with space communications introduce very long propagation delays,
which mean ARQ, with its need for retransmission, is unsuitable. FECC codes are the preferred error
correction codes.
- Mobile communications:
Very efficient source coding algorithms are typically used to compress voice and data in mobile
applications since bandwidth is limited. Hence mobile applications are critically dependent on reliable
transmission since they have little or no redundancy. Error control techniques that are used are
generally complex mixtures of FECC techniques in the case of voice, and a combination of FECC and
ARQ techniques in the case of data.
- Audio Compact Disk systems:
Audio compact disk systems use FECC codes for error control so as to guarantee the high fidelity of
the stored audio signals (typically music).

An Introduction to Error Coding Techniques

Looking back at the diagram for a digital communication system we note that there are two coding schemes
involved:












The basic components of a digital communication system

The first coding scheme is source coding. Its main purpose is to improve transmission efficiency by
ensuring that the minimum possible code bits are assigned to each code word emanating from a source.

The second coding scheme, called error control coding (or channel coding), improves transmission
reliability by deliberately adding more bits to the code words before they are transmitted. These additional
bits are called check bits and they enable a receiver to detect, and if possible, to correct errors in the
received bit-stream.

(n, k) Block codes

In a block code, the source data is partitioned into blocks of k bits. The encoder in the transmitter translates
each block into a block of n bits (n > k) for transmission through the imperfect medium. The code is
conventionally known as the (n, k) block code.





The transmitted n-digit code word consists of the k data bits together with (n - k) check bits generated from
the data bits according to a defined algorithm.

[Figure: Information source and input transducer → Source encoder → Channel encoder → Digital modulator → Channel → Digital demodulator → Channel decoder → Source decoder → Output transducer → Output signal]

[Figure: Block encoder, k data bits in → n encoded bits out]







The decoder at the receiver recovers the k data bits. The ratio of the data bits to the total number of bits in
the code word is called the rate or efficiency (R) of the error code. i.e. R = k/n.

Error Control capabilities of a Code

The error control capabilities of a code depend on a key property of the code known as the Hamming
distance. The Hamming distance between two code words is defined as the number of bit positions in
which they differ.

The smallest Hamming distance between any pair of code words in a given code is called the distance of
the code. This distance is important because it determines how easy it is to change or alter one valid
codeword into another.

The number of nonzero components of a codeword is termed the Hamming weight of the codeword, and
for any code the distance of the code equals the minimum possible weight of a non-zero code word
belonging to the code.

Example:
Comment on the error correcting capabilities of codes with the following distances
a) d = 1
b) d = 2
c) d = 3

Solution:
a) If d = 1, then successive code words differ in only one bit position. Hence no error control is possible
since any error pattern can convert a valid word into another valid word.
b) If d = 2 then any two code words will differ in at least two bit positions. If a single bit error occurs, a
valid code word will be converted into an invalid code word. Hence it is possible to use this code to
detect single errors. However it is not possible to identify and correct the error, since the invalid word
is equally distant from several valid words, and there is no way of choosing between them. If a double
error occurs then a valid word is converted into another valid word, and there is no way the detector
can pick this error. If, on the other hand, a triple bit error occurs in a word, the detector will see this as
a single error. In fact, all even numbers of bit errors will go undetected and all odd numbers of bit errors will be
seen as single-bit errors.

c) If d = 3, any single bit error converts a valid word into an invalid word. This invalid word will be at a
distance 1 from the true source word but distance 2 from any alternative valid words. On the basic
assumption that single errors are more probable than multiple errors, the source word can be inferred
and the single error corrected. Double errors will be detected, but there is no way they can be corrected
with absolute certainty. In addition, all errors that are multiples of 3 will not be detected since they
simply convert valid words to other valid words. Other multiple errors will be seen as either single or
double errors.

The results of this example can be summarised as:

Error detection capability of a code:
If the distance of a code is d_min, then the code can successfully detect up to d_min - 1 errors.

Error correction capability:

If the distance of a code is d_min, then the code can successfully correct up to int((d_min - 1)/2) errors. Here int( )
indicates the integer part.
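These two results can be illustrated with a few lines of Python; the four-word code used here is hypothetical and chosen only to show the calculation.

```python
from itertools import combinations

def hamming_distance(a, b):
    """Number of bit positions in which two equal-length code words differ."""
    return sum(x != y for x, y in zip(a, b))

def code_distance(codewords):
    """Smallest Hamming distance between any pair of code words."""
    return min(hamming_distance(a, b) for a, b in combinations(codewords, 2))

code = ['000', '011', '101', '110']        # a hypothetical code with d = 2
d_min = code_distance(code)
print(d_min)                               # 2
print('detects up to', d_min - 1, 'errors')
print('corrects up to', (d_min - 1) // 2, 'errors')
```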

Single parity check code

The simplest error control coding technique is to add parity bits to source codes so as to ensure that the
parity of the transmitted code word is always even or odd. A code word is said to have even parity when
the number of 1s in the code word is even, otherwise it has odd parity. The receiver identifies an error in
the received codeword if it does not have the expected parity.
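A small sketch of single-parity encoding and checking (the function names are illustrative):

```python
def add_parity(codeword, even=True):
    """Append one parity bit so the result has even (or odd) parity."""
    ones = codeword.count('1')
    bit = '0' if (ones % 2 == 0) == even else '1'
    return codeword + bit

def parity_ok(received, even=True):
    """Check whether a received word has the expected parity."""
    return (received.count('1') % 2 == 0) == even

print(add_parity('0101'))              # '01010' for an even-parity system
print(add_parity('0101', even=False))  # '01011' for an odd-parity system
print(parity_ok('01011'))              # False in an even-parity system
```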

Practice Examples

1. For the following code words add a single parity bit so as to achieve
a) Even parity
b) Odd parity
0001, 0101, 1110, 011110, 010101

2. Source codes consisting of four code bits have a single bit added for parity checking. Determine the
received source codes that have errors for the case of
a) an even parity system
b) an odd parity system
00100, 11110, 01010, 00100, 01110, 01111

3. The ASCII code is used to represent 128 character and control symbols in personal computers. Each
ASCII code is 7 bits long. A parity bit is often added to bring the number of bits in a single codeword
to 8. Calculate the rate of the 8-bit ASCII code and determine the distance of both odd parity and even
parity 8 bit ASCII. Show that both odd and even parity schemes can only identify odd numbers of
errors and not even.

Array Codes

Multiple parity checks can be implemented to improve the simple parity check technique described above.
One of these techniques is to consider the bits to be encoded as a rectangular array whose rows and
columns constitute natural subsets e.g. a single row could represent a single code word, and a given number
of rows could make up a single transmission block i.e.







Transmission block arranged as an array
To each row will be appended a row parity check bit, and to each column is appended a column parity
check bit. Finally, an overall parity check bit (check on checks bit) is placed where the row parity check
column intersects with the column parity check row i.e.
















Transmission block arranged as an array with parity check bits

Changing any one bit alters three parity bits: one row parity, one column parity and the overall parity bit. This enables any single error to be located; it is at the intersection of the row and column whose parities are upset. A double error changes two parities if both errors are in a common row or column, and four parities otherwise. It can be detected but not corrected, since in each case two or more error patterns can yield the same condition. A triple error changes the overall parity and various row and column parities, again detectable but not correctable. A quadruple error may fail to be detected: consider four bit errors at the corners of a rectangle, which violate no parities whatsoever. We therefore conclude that the code has distance 4, and can be used to correct one error and detect two, or to detect three errors.
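The row/column parity idea can be sketched in a few lines of Python (an illustration only; the 4 x 4 data block and even parity are assumptions chosen to match the practice example that follows):

    def encode_array(block):
        # Append even-parity row checks, then a column-check row whose last
        # element is the check-on-checks bit.
        rows = [row + [sum(row) % 2] for row in block]
        col_checks = [sum(col) % 2 for col in zip(*rows)]
        return rows + [col_checks]

    def locate_single_error(coded):
        # A single error upsets exactly one row parity and one column parity.
        bad_rows = [i for i, row in enumerate(coded) if sum(row) % 2]
        bad_cols = [j for j, col in enumerate(zip(*coded)) if sum(col) % 2]
        if not bad_rows and not bad_cols:
            return None
        return bad_rows[0], bad_cols[0]

    block = [[1, 1, 0, 1], [1, 1, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]]
    coded = encode_array(block)
    coded[2][1] ^= 1                   # introduce one bit error
    print(locate_single_error(coded))  # (2, 1): the intersection of the upset row and column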

Practice Example

Array coding is used for error control purposes in a certain data transmission link. The digits are divided
into blocks of four rows and 4 columns. Row and column even parity check bits are added to each block. A check-on-checks digit is also added before transmission. The following is received at the receiver:
1101110110100100111100000
Determine whether the received codeword is the same as the transmitted codeword. Will double errors be
detected, and will it be possible to correct these errors? Explain clearly.

Linear Block Codes

In an (n, k) block code, the information data bits are divided into blocks of length k, and each block is then encoded into a code word of length n (with n > k). The original message bits form the first k bits of the new code word whilst the added bits make up the last (n - k) bits of the code word. These additional bits are called check bits. There are 2^k code words in an (n, k) block code.

A block code is linear if any linear combination of two code words is also a code word. In the binary case this requires that if two code words are valid members of the code then the code word obtained by component-wise modulo-2 addition of the two code words will also be a valid code word.

Construction of an (n, k) linear block code

Consider a k-bit message data sequence {d_1, d_2, d_3, ..., d_k}. The last (n - k) bits in the error control code word are generated according to some predetermined sequence:

C_(k+1) = p_11 d_1 + p_21 d_2 + ... + p_k1 d_k
C_(k+2) = p_12 d_1 + p_22 d_2 + ... + p_k2 d_k
...
C_n = p_1(n-k) d_1 + p_2(n-k) d_2 + ... + p_k(n-k) d_k

This can be more compactly written in matrix form as:

[C_(k+1), C_(k+2), ..., C_n] = [d_1, d_2, ..., d_k] | p_11  p_12  ...  p_1(n-k) |
                                                    | p_21  p_22  ...  p_2(n-k) |
                                                    | p_31  p_32  ...  p_3(n-k) |
                                                    |  :     :          :       |
                                                    | p_k1  p_k2  ...  p_k(n-k) |

This can be written in short hand notation as:

[C_(k+1), C_(k+2), ..., C_n] = D . P

where D is the row matrix representing the k- message bit sequence and P is the parity check generator
matrix. The whole n bit codeword is generated by the following matrix operation:


[C_1, C_2, C_3, ..., C_n] = [d_1, d_2, ..., d_k] | 1 0 0 ... 0   p_11  p_12  ...  p_1(n-k) |
                                                 | 0 1 0 ... 0   p_21  p_22  ...  p_2(n-k) |
                                                 | 0 0 1 ... 0   p_31  p_32  ...  p_3(n-k) |
                                                 |  :                  :           :       |
                                                 | 0 0 0 ... 1   p_k1  p_k2  ...  p_k(n-k) |

This can be written in shorthand notation as:

C = D . G

where D is the row matrix representing the k message bits, P is the parity check generator matrix and G is termed the generator matrix. The generator matrix G consists of a k x k identity matrix I_k and the parity check generator matrix P, i.e.

G = [ I_k  P ]

Associated with each (n, k) block code is a parity check matrix H that is used at the receiver to verify
whether the received codeword is error free or not.
H = [ P^T  I_(n-k) ]

A received codeword has been generated by the source generator matrix G if

C H^T = 0

where C is the codeword received without any errors, and H^T is the transpose of the parity check matrix H, i.e.

H^T = [ P       ]
      [ I_(n-k) ]


If the received codeword R is in error then:

R H^T = (C + E) H^T = C H^T + E H^T

where E is the error vector.

Since C H^T = 0, this reduces to:

R H^T = E H^T

The term R H^T is called the error syndrome S. If an error occurs in the i-th bit of the codeword C, then the error syndrome will have the same binary expression as the i-th row of the H^T matrix.
Note: In designing the codes, ensure that no row of the H^T matrix has an all-0s expression, since the syndrome for no errors corresponds to a row of all 0s.

To enable a linear (n, k) block code to correct all single errors the following inequality, known as the Hamming inequality, must be satisfied:

n ≥ k + log2(n + 1)

To convert a logarithm from one base to another, the rule is:

log_b(x) = log_a(x) / log_a(b)

For example, to convert from base 10 to base 2:

log_2(x) = log_10(x) / log_10(2)
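To tie G, H and the syndrome together numerically, here is a hedged sketch for a (7,4) code. The particular P matrix is an assumption chosen so that the columns of H are distinct and non-zero; it is not the matrix used in any of the practice examples below.

    import numpy as np

    P = np.array([[1, 1, 0],
                  [0, 1, 1],
                  [1, 1, 1],
                  [1, 0, 1]])                   # k x (n-k) parity generator, k = 4, n = 7
    G = np.hstack([np.eye(4, dtype=int), P])    # G = [ I_k  P ]
    H = np.hstack([P.T, np.eye(3, dtype=int)])  # H = [ P^T  I_(n-k) ]

    d = np.array([1, 0, 1, 1])                  # message bits
    c = d @ G % 2                               # codeword C = D.G (modulo-2)
    print((c @ H.T) % 2)                        # [0 0 0]: an error-free word gives a zero syndrome

    e = np.zeros(7, dtype=int)
    e[4] = 1                                    # error in the 5th bit of the codeword
    r = (c + e) % 2
    print((r @ H.T) % 2, H.T[4])                # the syndrome equals the 5th row of H^T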
Practice Examples

1. A binary code has a parity check generator matrix

P = | 1 0 1 |
    | 0 1 1 |
    | 1 1 1 |

a) Is the word 1 0 1 0 1 0 a valid code word?
b) If the code word 001111 is transmitted, but 0011101 is received, what is the resultant syndrome?
c) Where would the syndrome indicate an error had occurred?
d) List all the possible code words in this code.

2. A binary message consists of words that are 5 bits long. The message words are to be encoded using a single error correcting code. The first 5 bits of each code word must be the message bits d_1, d_2, d_3, d_4, d_5, while the remaining bits are check bits.
a) What is the minimum number of check bits?
b) Create suitable P and G matrices.
c) Construct an appropriate H matrix for this code.
d) Find the syndrome at the receiver if there is an error in d_5.

3. Consider a single error-correcting code for 11 message bits.
a) How many check bits are required?
b) Find a suitable G matrix.
c) Find the syndrome if the single error occurs in the 7th position.

4. Prove that if an error occurs in the i-th bit of a codeword C then the error syndrome S = R H^T (R being the received codeword) will have the same value as the i-th row of the H^T matrix.

Binary Cyclic Codes

Binary cyclic codes are a popular subclass of linear block codes in which every cyclic shift of a valid codeword is also a valid codeword. A cyclic redundancy check (CRC) is a type of function that takes as input a data stream of any length and produces as output a value of a certain fixed size. The term CRC is often used to denote either the function or the function's output. A CRC can be used as a checksum to detect alteration of data during transmission or storage. The CRC was invented by W. Wesley Peterson, and published in his 1961 paper. Their properties and advantages are as follows:

Their mathematical structure permits higher order correcting codes
Their code structures can be easily implemented in hardware by using simple shift registers and
exclusive-Or gates.
Are particularly good at detecting common errors caused by noise in transmission channels.
Cyclic code members are all cyclical shifts of one another.
Cyclic codes can be represented as, and derived using, polynomials

The cyclic property can be expressed as follows. If:

C = (C_1, C_2, C_3, ..., C_n)

Then:

C^(i) = (C_(i+1), ..., C_n, C_1, C_2, ..., C_i)

C^(i) is the result of cyclically shifting the original code word C by i bits. For instance, the following three codes are obtained from each other by cyclic shifts.

a) 1001011
b) 0010111
c) 0101110

NB: Code (b) is obtained by shifting code (a) left by one bit such that the left most bit in (a) becomes
the right most bit in (b). Code (c) is obtained by shifting code (b) left by one bit such that the left
most bit in (b) becomes the right most bit in (c).
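The shifts in the note above are easy to reproduce programmatically (a trivial sketch):

    def cyclic_shift_left(word, i=1):
        # Cyclically shift a bit string left by i positions.
        i %= len(word)
        return word[i:] + word[:i]

    print(cyclic_shift_left('1001011'))   # '0010111'  (code (b))
    print(cyclic_shift_left('0010111'))   # '0101110'  (code (c))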

Polynomial Representation of Code words

A codeword of length k bits:

m_0, m_1, ..., m_(k-1)

may be represented by a polynomial of order k - 1:

M(x) = m_0 + m_1 x + ... + m_(k-1) x^(k-1)



Cyclic Generator Polynomial

If g(x) is a polynomial of degree (n - k), and is a factor of 1 + x^n, then g(x) generates an (n, k) cyclic code in which the code polynomial V(x) for a data message:

D = (d_0, d_1, d_2, ..., d_(k-1))

is generated by:

V(x) = D(x) g(x)

in accordance with the following modulo-2 manipulation rules:

0 + 0 = 0    0 . 0 = 0
0 + 1 = 1    0 . 1 = 0
1 + 0 = 1    1 . 0 = 0
1 + 1 = 0    1 . 1 = 1

Addition is the same as subtraction; no carry is involved.

Using modulo-2 arithmetic, the codeword can be expressed in the following systematic form:

V = (r_0, r_1, r_2, ..., r_(n-k-1), d_0, d_1, ..., d_(k-1))

where r_0, ..., r_(n-k-1) are the (n - k) parity bits and d_0, ..., d_(k-1) are the k message bits.

The generated codeword can be expressed in polynomial form as:

V(x) = r_0 + r_1 x + r_2 x^2 + ... + r_(n-k-1) x^(n-k-1) + d_0 x^(n-k) + d_1 x^(n-k+1) + ... + d_(k-1) x^(n-1)

And the parity check bits can be represented in polynomial form as:

r(x) = r_0 + r_1 x + r_2 x^2 + ... + r_(n-k-1) x^(n-k-1)





This parity check polynomial is the remainder of dividing x^(n-k) D(x) by g(x), i.e.:

r(x) = rem[ x^(n-k) D(x) / g(x) ]



The parity check polynomial r(x) is also called the cyclic redundancy check. It is also important to realise that the expression x^(n-k) D(x) is simply the equivalent of appending a sequence of (n - k) zeros to the right hand side (i.e. the least significant bit side) of the message bit stream D(x). A numerical example illustrates this concept best: Given D(x) = 100111 and n = 9; k is the bit length of the message D(x), which is 6. In this case (n - k) = (9 - 6) = 3.

Hence x^(n-k) D(x) = x^3 D(x). But x^3 = 1000_bin, so

x^3 D(x) = 1000_bin . D(x) = 1000_bin . 100111_bin = 100111000_bin


From this illustration it is easily seen that the generated codeword V(x) is:

V(x) = x^(n-k) D(x) + r(x)


Error Detection and Correction

At the receiver if the received codeword V(x) is divided by the cyclic code generator g(x) and the remainder
is zero, then no errors have occurred. This can be easily verified as follows:

From the expression:
r(x) = rem[ x^(n-k) D(x) / g(x) ]

it follows that:

x^(n-k) D(x) = Q(x) g(x) + r(x)

where Q(x) is the quotient. If we subtract r(x) from x^(n-k) D(x) we get an expression which is an exact multiple of g(x). But subtraction is the same process as addition in modulo-2 arithmetic. Hence the expression we get is:

x^(n-k) D(x) + r(x) = Q(x) g(x)


But the right-hand side is simply the expression for the generated code word V(x). Put simply, V(x) is a multiple of the code generation polynomial g(x), and if no errors are introduced during transmission then the received codeword will be completely divisible by g(x).

Common Issues with CRC

The concept of the CRC as an error-detecting code gets complicated when an implementer or standards
committee turns it into a practical system. Here are some of the complications:

Sometimes an implementation prefixes a fixed bit pattern to the bitstream to be checked. This is
useful when clocking errors might insert 0-bits in front of a message, an alteration that would
otherwise leave the CRC unchanged.
Sometimes an implementation appends n 0-bits (n being the size of the CRC) to the bitstream to
be checked before the polynomial division occurs. This has the convenience that the CRC of the
original bitstream with the CRC appended is exactly zero, so the CRC can be checked simply by
performing the polynomial division on the expanded bitstream and comparing the remainder with
zero.
Sometimes an implementation exclusive-ORs a fixed bit pattern into the remainder of the
polynomial division.
Some schemes view the low-order bit of each byte as "first", which then during polynomial
division means "leftmost", which is contrary to our customary understanding of "low-order". This
convention makes sense when serial-port transmissions are CRC-checked in hardware, because
some widespread serial-port transmission conventions transmit bytes least-significant bit first.
With multi-byte CRCs, there can be confusion over whether the byte transmitted first (or stored in
the lowest-addressed byte of memory) is the least-significant byte or the most-significant byte. For
example, some 16-bit CRC schemes swap the bytes of the CRC.
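The variations listed above (an initial register value, appended zeros, reflected bit order and a final XOR) can all be expressed with one parameterised bitwise routine. The sketch below follows the commonly used parameter model; the CRC-16 parameters in the usage line (poly 0x1021, initial value 0xFFFF) are shown purely as an assumed illustration.

    def reflect(value, width):
        # Reverse the bit order of value over the given width.
        return int(format(value, '0{}b'.format(width))[::-1], 2)

    def crc_bitwise(data, width, poly, init=0, refin=False, refout=False, xorout=0):
        # Generic bitwise CRC (valid for widths of 8 or more): each input byte is fed
        # into the top of the register and the register is reduced bit by bit.
        crc = init
        top = 1 << (width - 1)
        mask = (1 << width) - 1
        for byte in data:
            if refin:
                byte = reflect(byte, 8)
            crc ^= byte << (width - 8)
            for _ in range(8):
                crc = ((crc << 1) ^ poly) if (crc & top) else (crc << 1)
                crc &= mask
        if refout:
            crc = reflect(crc, width)
        return crc ^ xorout

    # Illustrative parameters only: poly 0x1021 corresponds to x^16 + x^12 + x^5 + 1.
    print(hex(crc_bitwise(b'123456789', 16, 0x1021, init=0xFFFF)))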


Uses of Binary Cyclic Codes

An important reason for the popularity of CRCs for detecting the accidental alteration of data is their
efficiency guarantee. Typically, an n-bit CRC, applied to a data block of arbitrary length, will detect any
single error burst not longer than n bits (in other words, any single alteration that spans no more than n bits
of the data), and will detect a fraction 1 - 2^(-n) of all longer error bursts. Errors in both data transmission
channels and magnetic storage media tend to be distributed non-randomly (i.e. are "bursty"), making CRCs'
properties more useful than alternative schemes such as multiple parity checks.

CRCs are not, by themselves, suitable for protecting against intentional alteration of data (for example, in
authentication applications for data security), because their convenient mathematical properties make it
easy to compute the CRC adjustment required to match any given change to the data. While useful for error
detection, CRCs cannot be safely relied upon to fully verify data correctness in the face of deliberate (rather
than random) changes. It is often falsely assumed that when a message and its CRC are transmitted over an
open channel, then when it arrives and the CRC matches the message's calculated CRC then the message
can not have been altered in transit. This assumption is false because CRC is a poor method of data
encryption. In fact, it is not really encryption at all: it is supposed to be used for data integrity checks, but is
occasionally assumed to be used for encryption. When a CRC is calculated, the message is preserved (not
encrypted) and the constant-size CRC is tacked onto the end (i.e. the message can be just as easily read as
before the CRC was created).

Additionally, as the length of the CRC is usually much smaller than the length of the message, it is impossible to have a 1:1 relationship between CRCs and messages, so multiple messages will produce the same CRC. Of course, these codes are designed to be different enough that random (and usually only one or two bit) changes in the codeword will produce a vastly different CRC, and so the error is likely to be detected. If deliberate tampering (changes to bits) occurred in the message, then a new, phony CRC could be computed for the new message and replace the real CRC on the end of the packet (creating a packet that would be verified by any Data-Link entity). So CRCs can be relied upon to verify integrity but not correctness.

The polynomial coded message V(x) consists of the original k message bits, followed by (n - k) additional bits. The generator polynomial is carefully chosen so that almost all the errors are detected. Using a generator of degree k (NB: degree refers to the highest power in an expression) allows the detection of all burst errors affecting up to k consecutive bits. The generator chosen by ITU-T for the V.41 standard used extensively on wide area networks (WANs) is:

g(x) = x^16 + x^12 + x^5 + 1

The generator used on the Ethernet local area bus network is:

g(x) = x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1

CRC six-bit code words are also transmitted within the PCM plesiochronous multiplex to improve the robustness of frame alignment words. The error correcting capability of CRC codes is low, so they are mainly used where ARQ retransmission is deployed, rather than for error correction itself.
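The hexadecimal "normal" representations quoted in the table of common polynomials further below are simply the generator coefficients, with the top x^n term dropped, read as a binary number. A small illustrative helper (not from the original notes):

    def poly_to_hex(exponents):
        # Normal (MSB-first) representation: the leading x^n coefficient is implicit.
        width = max(exponents)
        value = 0
        for e in exponents:
            if e != width:
                value |= 1 << e
        return hex(value)

    print(poly_to_hex([16, 12, 5, 0]))   # 0x1021 for the V.41 / CRC-CCITT generator above
    print(poly_to_hex([32, 26, 23, 22, 16, 12, 11, 10, 8, 7, 5, 4, 2, 1, 0]))   # 0x4c11db7 for the Ethernet generator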

Criticism

CRCs as used in globally standardized telecommunications systems have not been fully standardized. Most
CRCs in current use have some weakness with respect to strength or construction. Standardization of CRCs
would allow for better designed CRCs to come into common use.

Both forms of CRC-8 in use have notable weaknesses mathematically.
The definition of CRC-12 is disputed, as there are 3 forms of CRC-12 in common use.
CCITT CRCs differ from ITU CRCs (of the same size), as the same entity has standardized
checksums more than once but in different eras.
Despite its popularity, the CRC-32 polynomial used by (among others) V.42, Ethernet, FDDI and
ZIP and PNG files, was arbitrarily chosen and generally exhibits very poor error detection
properties (in terms of Hamming distance per given block size) compared to polynomials selected
by algorithmic means
It is assumed that at least 10 other forms of CRC-16 and CRC-32 exist, but no form of CRC-16 or
CRC-32 in use is mathematically optimal.
The ITU and IEEE have been historically helpful in standardizing checksums used in
telecommunications equipment and protocols -- but have provided little to no support in
standardization since the end of the Cold War.





















Common polynomials (Name: polynomial -- representations)

CRC-1: x + 1 (most hardware; also known as parity bit) -- 0x1 or 0x1 (0x1)
CRC-4-ITU: x^4 + x + 1 (ITU G.704, p. 12) -- 0x3 or 0xC (0x9)
CRC-5-ITU: x^5 + x^4 + x^2 + 1 (ITU G.704, p. 9) -- 0x15 or 0x15 (0x0B)
CRC-5-USB: x^5 + x^2 + 1 (USB token packets) -- 0x05 or 0x14 (0x9)
CRC-6-ITU: x^6 + x + 1 (ITU G.704, p. 3) -- 0x03 or 0x30 (0x21)
CRC-7: x^7 + x^3 + 1 (telecom systems, MMC) -- 0x09 or 0x48 (0x11)
CRC-8-ATM: x^8 + x^2 + x + 1 (ATM HEC) -- 0x07 or 0xE0 (0xC1)
CRC-8-CCITT: x^8 + x^7 + x^3 + x^2 + 1 (1-Wire bus) -- 0x8D or 0xB1 (0x63)
CRC-8-Dallas/Maxim: x^8 + x^5 + x^4 + 1 (1-Wire bus) -- 0x31 or 0x8C (0x19)
CRC-8: x^8 + x^7 + x^6 + x^4 + x^2 + 1 -- 0xD5 or 0xAB (0x57)
CRC-8-SAE J1850: x^8 + x^4 + x^3 + x^2 + 1 -- 0x1D or 0xB8 (0x71)
CRC-10: x^10 + x^9 + x^5 + x^4 + x + 1 -- 0x233 or 0x331 (0x263)
CRC-11: x^11 + x^9 + x^8 + x^7 + x^2 + 1 (FlexRay) -- 0x385 or 0x50E (0x21D)
CRC-12: x^12 + x^11 + x^3 + x^2 + x + 1 (telecom systems [6][7]) -- 0x80F or 0xF01 (0xE03)
CRC-15-CAN: x^15 + x^14 + x^10 + x^8 + x^7 + x^4 + x^3 + 1 -- 0x4599 or 0x4CD1 (0x19A3)
CRC-16-Fletcher: not a CRC; see Fletcher's checksum -- used in Adler-32 A & B CRCs
CRC-16-CCITT: x^16 + x^12 + x^5 + 1 (X.25, V.41, CDMA, Bluetooth, XMODEM, PPP, IrDA, BACnet; known as CRC-CCITT) -- 0x1021 or 0x8408 (0x0811)
CRC-16-IBM: x^16 + x^15 + x^2 + 1 (SDLC, USB, many others; also known as CRC-16) -- 0x8005 or 0xA001 (0x4003)
CRC-24-Radix-64: x^24 + x^23 + x^18 + x^17 + x^14 + x^11 + x^10 + x^7 + x^6 + x^5 + x^4 + x^3 + x + 1 (FlexRay) -- 0x864CFB or 0xDF3261 (0xBE64C3)
CRC-30: x^30 + x^29 + x^21 + x^20 + x^15 + x^13 + x^12 + x^11 + x^8 + x^7 + x^6 + x^2 + x + 1 (CDMA) -- 0x2030B9C7 or 0x38E74301 (0x31CE8603)
CRC-32-Adler: not a CRC; see Adler-32 -- see Adler-32
CRC-32-IEEE 802.3: x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1 (V.42, MPEG-2, PNG [8]) -- 0x04C11DB7 or 0xEDB88320 (0xDB710641)
CRC-32C (Castagnoli): x^32 + x^28 + x^27 + x^26 + x^25 + x^23 + x^22 + x^20 + x^19 + x^18 + x^14 + x^13 + x^11 + x^10 + x^9 + x^8 + x^6 + 1 -- 0x1EDC6F41 or 0x82F63B78 (0x05EC76F1)
CRC-32K (Koopman): x^32 + x^30 + x^29 + x^28 + x^26 + x^20 + x^19 + x^17 + x^16 + x^15 + x^11 + x^10 + x^7 + x^6 + x^4 + x^2 + x + 1 -- 0x741B8CD7 or 0xEB31D82E (0xD663B05D)
CRC-64-ISO: x^64 + x^4 + x^3 + x + 1 (HDLC ISO 3309) -- 0x000000000000001B or 0xD800000000000000 (0xB000000000000001)
CRC-64-ECMA-182: x^64 + x^62 + x^57 + x^55 + x^54 + x^53 + x^52 + x^47 + x^46 + x^45 + x^40 + x^39 + x^38 + x^37 + x^35 + x^33 + x^32 + x^31 + x^29 + x^27 + x^24 + x^23 + x^22 + x^21 + x^19 + x^17 + x^13 + x^12 + x^10 + x^9 + x^7 + x^4 + x + 1 (as described in ECMA-182 p. 63) -- 0x42F0E1EBA9EA3693 or 0xC96C5795D7870F42 (0x92D8AF2BAF0E1E85)

Known to exist, but technologically defunct -- mainly replaced by cryptographic hash functions:
CRC-128 (IEEE)
CRC-256 (IEEE)


Examples

1. A (6,3) binary cyclic code with generator polynomial g(x) = 1 + x^2 is used in a certain transmission link. Construct the set of transmitted code words.
2. A (15,5) linear cyclic code has a generator polynomial g(x) = 1 + x^2 + x^4 + x^5 + x^8 + x^10. Find the codeword polynomial for the message polynomial 1 + x^2 + x^4.
3. For a (7,3) linear cyclic code with g(x) = 1 + x^2 + x^3 + x^4, find all the possible code words by first finding the CRC polynomial r(x) in each case.
4. The binary message 10111 is to be transmitted with cyclic redundancy check error detection. The CRC generator polynomial is g(x) = x + x^2 + x^3. What is the transmitted codeword? Show that if the codeword is received without error then dividing the codeword polynomial by g(x) gives a remainder of zero.

Digital Multiplexing Aspects

Multiplexing is a means of sharing a communication link by enabling simultaneous transmission of more
than one signal on the same physical bearer. A bearer is simply a transmission link between any two points.

Types of Multiplexing
Multiplexing falls into two groups:
- Frequency division multiplexing
- Time division multiplexing

In Frequency division multiplexing (FDM) simultaneous transmission is achieved by dividing the
bandwidth available on the communication link among the various signal channels.

In Time division multiplexing (TDM) simultaneous transmission is achieved by assigning the whole
physical medium to each channel for a fraction of the transmission time.

TDM is preferred over FDM because
- It offers uniform performance across all channels
- It is simpler to cascade many multiplexers
- It is compatible with digital switching
- Enables more sophisticated performance monitoring
- Compatible with digital traffic sources such as computers
- It can be implemented in VLSI

Time Division Multiplexing
In TDM a sample of data is taken from each source to be multiplexed. These signal sources are referred to
as tributaries. A period of time, called a time slot, is allocated for the transmission of each sample.
Time slots are assembled into frames to form a continuous high-speed data stream.
A frame is the smallest group of bits in the output data stream that contains at least one sample from each
input channel plus overhead control information that enables the individual timeslots to be identified and
subsequently separated back into individual channels at the receiver.

In principle the TDM multiplexer can be visualised as a commutator switching sequentially from one input
to the next:



[Figure: TDM multiplexer visualised as a commutator; samples A1, B1, C1, D1, A2, B2, C2, D2, ... from tributaries A to D are interleaved into successive time slots, with one whole frame containing one sample from each tributary]












In practice digital electronic techniques are used to achieve this.

At the receiver side a demultiplexer directs the individual timeslots to their respective channels as follows:


















Bit Interleaved TDM
If a multiplexer assigns each incoming channel a timeslot equal to one bit period of the outgoing data
stream, the arrangement is called bit interleaving.

Byte or Word Interleaved TDM
Instead of multiplexing bit by bit, a multiplexer can accept a group of bits, making up a word or a character,
from each channel in turn.

A byte is a block of 2^n bits, with n = 3 (8 bits) being the most popular.

Synchronicity
Ideally the incoming data streams are synchronous with each other and with the internal timing of the multiplexer. If synchronicity is lost, the demultiplexed channels may be misrouted. If there is a difference in the timing of the input data relative to that of the multiplexer, additional signal processing has to be implemented. If the digital sources are nearly synchronous with the multiplexer and any difference is constrained within predefined frequency limits, then the multiplexing scheme is said to be plesiochronous. The differences in frequency are normally due to the tolerances on crystal oscillators of the same nominal frequency, or dynamic changes due to clock jitter.

To solve this problem of plesiochronous data inputs, justification (or bit stuffing) is used to convert the data to a truly synchronous signal. Bit stuffing adds or subtracts extra bits from the incoming data stream to create a synchronous signal. Information to enable the receiver to identify the stuffed bits is also transmitted within the synchronous bit stream. Positive justification, whereby each tributary rate is synchronised to a rate slightly greater than the maximum allowable input rate, is the commonest. Fewer stuffing bits are added if the input is running fast and more stuffing bits are added if the input is running slow.

The PCM Primary Multiplex Group

PCM was originally designed for audio frequency transmission, where it was found that transmission at up to 2 Mbits/s was satisfactory. Telephone channels are combined using TDM to form an assembly of 24 or 30 channels. This is known as the primary multiplex group. The primary multiplex group is used as a building block for assembling larger channels in higher order multiplex systems. The operation of the primary multiplexer is shown below:

Two frame structures are widely used: the European 30-channel system and the DS1 24-channel system used in the USA and Japan. Both systems employ 8 bit coding, with the 30-channel system using A-law companding and the 24-channel system using µ-law. We will only consider the European 30-channel format since it is the one used in Southern Africa.

Frame Structure for the 30 Channel System

The frame for the 30-channel system is divided into 32 time slots, each with 8 digits. Each byte of eight
digits is obtained by sampling the analogue signal at 8kHz (i.e. 8000 times per second). Hence the total bit
rate is 8000 X 8 X 32 = 2 048 000 bits/s = 2.048Mbits/s. Time slot 1 to 15 and 17 to 31 are each allotted to
a speech channel. Time slot 0 is used for frame alignment whilst time slot 16 is used for signaling.
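The arithmetic behind these figures is easily checked (a trivial sketch):

    slots_per_frame = 32            # time slots 0 to 31
    bits_per_slot = 8
    frames_per_second = 8000        # one frame per 125 microsecond sampling interval

    bit_rate = frames_per_second * bits_per_slot * slots_per_frame
    slot_duration_us = 125 / slots_per_frame

    print(bit_rate)                       # 2048000 bit/s, i.e. 2.048 Mbit/s
    print(round(slot_duration_us, 2))     # roughly 3.91 microseconds per time slot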




























[Figure: frame structure of the 30-channel system; each 3.9 µs time slot carries 8 bits D0 to D7, with slot 0 used for frame alignment, slots 1 to 15 and 17 to 31 for speech and slot 16 for signaling]


















The Plesiochronous Digital Hierarchy

The primary multiplex group of 24 (for the 24 channel Multiplex) or 30 channels (as used in the European
30 channel system) is used as a building block for larger numbers of channels in higher order multiplex
systems. At each level in the hierarchy, several bit streams, known as tributaries, are combined by a
multiplexer. The output from a multiplexer may serve as a tributary to a multiplexer at the next higher level
in the hierarchy, or it may be sent directly over a line or radio link. There are 2 Plesiochronous digital
hierarchies: the European one (based on the 30 channel format and known as the PCM 30 Hierarchy) and
the North American one (based on the 24 channel format and generally known as the PCM 24 Hierarchy).

The European Plesiochronous Digital Hierarchy
In this PDH hierarchy each multiplexing step achieves a 4:1 multiplexing ratio, i.e. the higher level is made up of four lower level tributaries. However the bit rate increase is somewhat greater than the increase in channels carried. This is due to the need to accommodate justification bits as well as overhead control bits at each multiplexing level.























[Figure: the European plesiochronous digital hierarchy; four 2.048 Mbits/s (30-channel) tributaries are multiplexed to 8.448 Mbits/s (120 channels), then by further factors of four to 34.368 Mbits/s (480 channels) and on up to 564.992 Mbits/s (7680 channels). Within each 2.048 Mbits/s frame, slot 0 carries the frame alignment word and slots 1-15 and 17-31 carry the 8-bit speech samples (polarity bit plus amplitude bits)]








The Synchronous Digital Hierarchy

PDH is a sufficient technique for transporting telephony signals, but with the increase in other services such
as video and data there is a clear need to come up with a transmission system capable of multiplexing and
transmitting information streams of various bit rates. The Synchronous Digital Hierarchy (SDH) provides a
simple mechanism of multiplexing and demultiplexing information streams of various bit rates.

The key advantages of SDH are as follows:
Universal standards:
SDH is based on universal standards that permit interworking of all telecommunications networks and manufacturers' equipment.
Easier multiplexing and demultiplexing
It is cheaper to add and drop signals to meet customer requirements. To extract a 64 kbits/s signal from a 140 Mbits/s multiplexed stream in a PDH hierarchy, one has to demultiplex down to the 64 kbits/s level before the required stream can be extracted. With SDH a 64 kbits/s stream can be extracted directly from the high order multiplex level without the need of first demultiplexing.
Smaller and cheaper equipment
Because of the advanced techniques employed in SDH technology, equipment size is greatly reduced,
resulting in significant savings in space and power.
Comprehensive performance monitoring
Equipment performance monitoring is comprehensive and centralised. Performance parameters can
be measured at a number of levels in the SDH hierarchy and test signals can be remotely injected and
analysed.
Centralised Network monitoring and control
For PDH networks, field personnel carry out fault diagnosis and rectification, but in SDH alarms and
diagnostic tools are very detailed and centralised. This permits fault diagnosis to be carried out
remotely.
Easier to introduce new services
In SDH the reconfiguration of equipment can be carried out remotely permitting new services to be
offered and changes made to already existing services.

Key Features of the SDH Architecture

The key features of the synchronous digital hierarchy are as follows:
The basic modules making up the hierarchy are synchronous with each other
Multiplexing is by byte interleaving
Any of the existing PDH rates can be multiplexed into a common transport rate of 155.52 Mbits/s
For each transport level a wide range of overhead bits are included. These include multiplex overheads
such as tributary identity, error performance checks, and network management channels, all based on a
standard format that is independent of equipment manufacturer.
Higher transport rates are defined as integer multiples of 155.52 Mbits/s in a 4^n sequence. For example the next transport rate after the 155.52 Mbits/s rate is 622.08 Mbits/s, followed by 2488.32 Mbits/s.
Since the modules and their contents are synchronised, it is easy to drop and insert and to cross connect complete basic modules in a switch. This allows drop and insert and cross-connection down to 2 Mbits/s (in Europe and its former colonies) and 1.5 Mbits/s (in North America and Japan).
The modules are based on a frame period of 125 µs. This allows cross-connection down to 64 kbits/s. Moreover higher order multiplexed traffic can be fed directly into the switch core of a digital
exchange. This gives both considerable economies and allows broadband switched services to be
provided to customers.
The interfaces on the multiplexers can be optical or electrical.

The SDH frame Structure
The SDH has a repetitive frame structure with a periodicity of 125 µs, the same as that of PCM primary rate signals (i.e. the 2 Mbit/s frame in PDH-30 or the 1.5 Mbit/s frame in PDH-24). This allows for cross-connection of primary rate components within the SDH frame.

The basic SDH signal is termed the synchronous transport module at level 1 (STM-1). It consists of 9 equal segments with a burst of overhead bytes at the start of each. The remaining bytes contain a mixture of traffic and overheads, depending on the type of traffic carried. The overall length is 2430 bytes, with each overhead burst occupying nine bytes. The aggregate bit rate is therefore 155 520 kbits/s, which is usually referred to as 155 Mbits/s.

STM 1 frame structure







The frame is conventionally represented as 9 rows and 270 columns of 8 bit bytes. The first nine columns
are mostly for section overheads (SOH), such as frame alignment and error monitoring as well as the AU
pointer. The remaining 261 columns comprise the payload into which a variety of signals to be transported
are mapped.

Alternative STM 1 frame structure











Since all signals at STM 1 are synchronous, it is very simple to create higher transmission rates by byte
interleaving payloads without any additional complexity of justification or additional overheads:














The Multiplexing Process
[Figures: the STM-1 frame is drawn as 9 rows of 270 byte-wide columns repeating every 125 microseconds, the first 9 columns carrying the section overhead and AU pointer and the remaining 261 columns the STM-1 payload; four 155.52 Mbits/s STM-1 signals are byte-interleaved to form one 622.08 Mbits/s STM-4]

Tributaries from 1.544 Mbits/s up to 139.264 Mbits/s are synchronised, if necessary, with justification bits,
and then packaged in containers of suitable standardised sizes. Control and supervisory information is then
added to the container to form a virtual container (VC). The VC travels through the network as a complete
package until demultiplexing occurs. Virtual container sizes range from VC-11 for 1.5 Mbits/s, VC-12 for 2 Mbits/s, through VC-3 for 45 Mbits/s to VC-4 for 140 Mbits/s.

The VC itself is not fully synchronised with the STM-1 frame. A VC could have a frequency that is slightly
different from that of the STM 1. A pointer is used to mark the start position of the VC in the STM-1
frame. The VC together with its pointer is called the tributary unit. As VCs travel across the network,
being exposed to network timing variations, each VC is allowed to float between the STM 1 frames,
within its TU columns, constantly tracked by its pointer, new pointer values being calculated when
necessary. Thus the TU is locked to the STM-1 frame by the pointer.

The lower level tributary units can be multiplexed further using byte interleaving. Subsequent addition of
path overheads results in the creation of higher order VCs. A pointer is then calculated for each higher
order VC to give a higher order tributary unit. The final VC and its pointer constitute an administrative unit
(AU) and the pointer is given the special name AU pointer. If the final VC is VC-4, the resulting
administrative unit is called the AU-4. If the final VC is the VC-3, then the resulting administrative unit is
called the AU-3.

Line Coding
Many modern networks transmit information using the same basic capabilities required by Morse code: the
ability to control and detect the flow of energy between the sender and receiver and the ability to measure
time. That is, the symbols in the alphabets used by these systems are distinguished from one another by
detecting whether or not energy is flowing and measuring the time that elapses between changes in the state
of the energy flow. Unlike Morse code, these systems are designed to use a true binary alphabet,
distinguishing only two basic symbols.

The PCM signal is made up of bit sequences of 1s and 0s. Depending upon the transmission medium, difficulties can arise at the receiving end if the 1s and 0s are not transmitted in the correct form. A line
transmission in the channel and subsequent detection at the receiver. This is achieved by matching the
spectral characteristics of the digital signal to be transmitted to its channel and to provide guaranteed
recovery of timing signals at the receiver. For a line code system to be synchronized using in-band
information, there must not be long sequences of identical symbols, such as ones or zeroes. For binary
PCM systems, the density of 1-symbols is called 'ones-density'. In other cases, the long term DC value of
the modulated signal is important, as building up a DC offset will tend to bias detector circuits out of their
operating range. In this case special measures are taken to keep a count of the cumulative DC offset, and to
modify the codes if necessary to make the DC offset always tend back to zero.

A line code must be selected to suit the characteristics of the particular channel. For metallic cables that
have an essentially low-pass frequency characteristic, it is usually the last coding function performed before
transmission. For bandpass systems, such as optical fibres, radio systems and the analogue telephone
network, where modulation is required before transmission, line coding is implemented immediately before
the modulation function, or it is incorporated within the modulation process.

The normal method of accomplishing bit synchronisation is to use a form of line encoding. This enables the bit clock information to accompany the data. The data format is changed prior to transmission not only to make certain the clock information is present in the data stream but also to make the long-term mean voltage equal to zero, the latter requirement being necessary because frequently the transmission media bandwidth does not extend to DC. To achieve this objective the designer of a line code must consider the following conditions:

- Spectrum at low frequency: Most transmission systems are easier to design if a.c coupling is used
between waveform processing stages. This means that if the line code has a significant d.c spectral
component, the coupling circuits used (namely capacitors and transformers) will filter out the d.c
component of the line code, resulting in the receiver being unable to identify the transmitted symbols.
- Transmission Bandwidth Required: Where bandwidth is at a premium, multilevel line codes, in
which each transmitted symbol represents more than one bit of information, should be considered.
- Timing content: A line code should have sufficient embedded timing information in the transmitted
symbol sequence so that the distant receiver (and intermediate repeaters if used) can extract a reliable
clock to time their decision making processes.
- Error monitoring: A line code can assist in reliable data transmission by its very nature. For example
a line code can be constrained so that it never produces certain symbol sequences. The occurrence of
these illegal symbol sequences at the receiver will indicate that transmission errors have occurred.
- Efficiency: Line coding may involve adding redundancy to the transmitted digit stream, which lowers
the transmission efficiency. A selected line code must be as efficient as possible.




[Figure: relationship between line coding and other channel coding functions; the information to be transmitted first passes through channel coding (error control, coding for security, multiplexing etc.) and is then either applied to a line coder for a low-pass channel or to a modulator for a band-pass channel]


Comparison of three coding schemes

NRZ (NonReturn To Zero) Code [100% unipolar]

This is the most common type of digital signal since all logic circuits operate on the ON-OFF principle. In
NRZ a pulse lasts for the duration of the clock period. All the 1s have a positive polarity, which results in
the NRZ spectrum having a dc component. If V volts represent a 1 and zero volts represent a 0 then the
average dc component of the signal will depend on the average ratio of 1s and0s in the bit stream. For
an alternating bit stream101010., the average dc value will be V/2. The dc value can range from 0 volts
for all 0 transmission to V volts for all 1 transmission.

From the frequency spectrum there is no signal amplitude at the clock frequency f, so it is impossible to
extract the clock frequency at the receiving end. If a 1 is corrupted to a 0 or vice versa it is impossible to
detect this error. For these reasons NRZ is not suitable for signal transmission over long distances.

RZ (Return To Zero) Code [50% Unipolar]

RZ is similar to NRZ, with the difference that the pulse duration is one half of a clock period. Like NRZ it is suitable for logic circuits since logic circuits operate on the ON-OFF principle. RZ also has a dc component like NRZ. The fundamental frequency for RZ is at the clock frequency f, which makes it possible to extract the clock frequency provided that a long sequence of zeros is not transmitted. Like NRZ, detection of errors at the receiver is not possible.
Manchester encoding
One approach to encoding binary data that avoids this problem is Manchester encoding. The feature
required to make a code self-synchronizing is that there must be predictable transitions in the signal at
predictable times. Manchester encoding ensures this by representing information in such a way that there
will be a transition in the middle of the transmission of each binary digit. However, since there has to be a
transition in each of these time slots, 0's can not be distinguished from 1's simply by the presence or
absence of the flow of energy during the period of a bit. Instead, it is the nature of the transition that occurs
that is used to encode the type of digit being transmitted. A transition from a state where energy is flowing
to a state where no energy is flowing is interpreted as a 0. A transition from no energy flow to energy flow
is interpreted as a 1.
During the first half of the time period devoted to encoding a 0, the sender allows energy to flow to the
receiver. Then, midway through the period, the sender turns off the flow of energy. It is this downward
transition in the middle of the interval devoted to transmitting a bit that identifies it as a 0. To send a 1, on
the other hand, the sender would block the flow of energy for the first half of the interval used to transmit
60
the bit and then allow energy to flow for the second half of the bit. To make this concrete, the diagram
below shows how the binary sequence "10100110", would be encoded using Manchester Encoding.

Interpreting diagrams showing Manchester encoding of signals can be difficult. The problem is that our
eyes tend to focus on the square bumps rather than on the transitions. This makes it tempting to see the
pattern as an example of on-off keying (in which case it might represent the binary sequence
0110011010010110). The trick is to remember that there must be a transition in the middle of each bit time
and use this fact to break the sequence up into distinct bit times. When this is done, the diagram above
looks more like:

With the vertical red lines indicating the boundaries between bit times. Now, focusing on just the areas
between adjacent red lines, one should clearly see the two patterns of energy flow used to represent 0 and 1,
making it easy to associate binary values with each of the patterns as shown below.

In Manchester encoding the data is exclusive-ORed with the bit clock to produce the bit stream. The use of bi-polar pulses results in zero DC. This example shows the effect of the coding; notice that there is a transition at the centre of every data bit. It is this that can be used for timing extraction. The waveform has to be bi-polar to keep the average DC level small. This type of coding requires a wider bandwidth than is sometimes available, particularly in the wide area networks used for speech transmission. Accordingly, different format line codes have been designed. If the transmission is over a copper cable they are of the pseudo three level type, and over fibre optic networks two level codes are used which attempt to keep the number of ones and zeros approximately the same.

[Figure: the effect of Manchester encoding]
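A minimal sketch of Manchester coding and decoding (levels are written as 1/0 half-bit symbols; the convention that a digital 1 is a low-to-high mid-bit transition follows the description above, although some standards use the opposite convention):

    def manchester_encode(bits):
        # A 0 is sent high-then-low, a 1 low-then-high (transition in the middle of every bit).
        levels = []
        for b in bits:
            levels += [0, 1] if b == '1' else [1, 0]
        return levels

    def manchester_decode(levels):
        # Recover each bit from the direction of its mid-bit transition.
        return ''.join('1' if (a, b) == (0, 1) else '0'
                       for a, b in zip(levels[0::2], levels[1::2]))

    coded = manchester_encode('10100110')
    print(coded)                      # two half-bit levels per data bit
    print(manchester_decode(coded))   # '10100110'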
Pseudo three level codes
Historically, Alternate Mark Inversion (AMI) was the first code proposed, and most other pseudo three level codes are developments of AMI. In AMI the 1s are alternately positive and negative, hence there is no dc component in the AMI spectrum.

AMI Line Coded Data

Notice the coded data has been drawn Non-Return-to-Zero (NRZ). This is normal in order to save transmission bandwidth. A disadvantage of such a scheme is that long strings of zeros carry no timing content: a long stream of 0s results in difficulties in extracting the clock signal. For an alternating sequence the clock signal is simply obtained by rectifying the bit stream. Error monitoring can be obtained by noting violations of the alternate mark inversion rule.
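A short sketch of the AMI rule described above (marks alternate in polarity, spaces stay at zero; the starting polarity is arbitrary):

    def ami_encode(bits):
        # Alternate Mark Inversion: each 1 takes the opposite polarity to the previous 1.
        out, last = [], -1
        for b in bits:
            if b == '1':
                last = -last
                out.append(last)
            else:
                out.append(0)
        return out

    print(ami_encode('101100101'))   # [1, 0, -1, 1, 0, 0, -1, 0, 1]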

The HDBn Coding

Pulse synchronisation is usually required at the receiver to ensure that the samples, on the basis of which
symbol decisions are made, are taken at the correct instance in time. AMI schemes lose synchronisation
when data streams that have a long sequence of zeros are transmitted. To overcome this, high-density
bipolar substitution (HDBn) is used. In HDBn coding, when the number of continuous zeros exceeds n they
are replaced by a special code. High density Bi-polar number 3 (HDB3) is one development that overcomes
this by increasing the density of marks in the coded data.

The HDB3 code, where n = 3, is the code recommended by the ITU for PCM systems at multiplexed bit rates of 2, 8, and 34 Mbits/s. The HDB3 code seeks to limit the number of 0s in a long sequence to 3. This ensures clock extraction at the receiver. Long sequences of more than three 0s are avoided by replacing one or two 0s with 1s according to the following rules:

1. Invert every second 1 for as long as the number of consecutive 0s does not exceed three. (i.e. AMI
rule). Marks in the data follow the normal AMI practice, this requires no further illumination.
2. For consecutive 0s exceeding three, replace the fourth consecutive 0 with a violation pulse. Note
the violation pulse violates the AMI rule.

Every alternate violation pulse shall change polarity. If this rule cannot be applied set a 1 in accordance
with the AMI rule in the position of the first 0. This is called the B pulse.
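A hedged sketch of the HDB3 rules above: marks follow AMI, and every run of four zeros is replaced by 000V or B00V so that successive violation pulses alternate in polarity. The starting polarity and the test pattern are arbitrary choices.

    def hdb3_encode(bits):
        out = []
        last = -1              # polarity of the most recent pulse (arbitrary starting value)
        ones_since_sub = 0     # marks sent since the last substitution
        zeros = 0              # length of the current run of zeros
        for b in bits:
            if b == '1':
                last = -last
                out.append(last)
                ones_since_sub += 1
                zeros = 0
            else:
                out.append(0)
                zeros += 1
                if zeros == 4:
                    if ones_since_sub % 2 == 1:
                        out[-4:] = [0, 0, 0, last]        # 000V: V repeats the last polarity
                    else:
                        last = -last
                        out[-4:] = [last, 0, 0, last]     # B00V: B obeys AMI, V violates it
                    ones_since_sub = 0
                    zeros = 0
        return out

    print(hdb3_encode('10000110000001'))
    # [1, 0, 0, 0, 1, -1, 1, -1, 0, 0, -1, 0, 0, 1]: the two violations alternate (+1 then -1)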

Bipolar Variations
These extra marks need to be bi-polar violations (BPV's). Look now at the effect that this single rule has on the coded data (Fig 2.4). There is the unfortunate possibility that the BPV's will not alternate, and this would mean a non-zero mean value. A third rule is required.
3. When the first occurrence of four consecutive zeros is found, insert a single BPV in the position of the fourth zero. Subsequently it may be necessary to insert two marks, one as a BPV and one as a true mark. The decision is determined by whether the BPV's will alternate. Fig 2.5 shows the effect of the modification.

The effect of modification
The BPV's alternate as required. A similar coding, but with much simpler implementation rules, is B6ZS. In this a stream of six zeros is replaced by 0VB0VB, V being an extra mark that is a violation and B an extra mark that alternates in AMI fashion; see Fig 6.

B6ZS Encoding.
Other forms of coding group a number of binary digits together and map them into groups of ternary digits; 4B3T is one such example. This uses groups of four binary digits and identifies them with a particular group of three ternary digits. One advantage of such schemes is a reduction in the bandwidth required, there being only three symbols in the place of the original four. There is the complication of two distinct modes.
Table 1

Binary Word   Mode A   Mode B   Word Sum
0000          +0-      +0-      0
0001          -+0      -+0      0
0010          0-+      0-+      0
0011          +-0      +-0      0
0100          ++0      --0      2
0101          0++      0--      2
0110          +0+      -0-      2
0111          +++      ---      3
1000          ++-      --+      1
1001          +-+      -+-      1
1010          +-+      -+-      1
1011          +00      -00      1
1100          0+0      0-0      1
1101          00+      00-      1
1110          0+-      0+-      0
1111          -0+      -0+      0
If four binary digits are considered there are 16 possible groups; with three ternary digits there are 27 possible codings. But if the mean value of the data is to be zero, the 16 that are chosen should ideally have the same number of +ve marks as -ve marks. There are not quite enough combinations available with this property. Two modes are required to overcome this, the mode being chosen so that the running sum of the transmitted symbols stays close to zero (see the sketch below).
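A hedged sketch of the mode-selection idea (only a few rows of Table 1 are included, and the rule "use Mode A when the running digital sum is zero or negative, otherwise Mode B" is one common convention assumed here for illustration):

    # (Mode A word, Mode B word, word sum) for a subset of Table 1.
    TABLE = {
        '0000': ('+0-', '+0-', 0),
        '0100': ('++0', '--0', 2),
        '0111': ('+++', '---', 3),
        '1011': ('+00', '-00', 1),
    }

    def encode_4b3t(bits):
        # Choose Mode A or Mode B for each 4-bit group so the running sum stays near zero.
        running_sum, out = 0, []
        for i in range(0, len(bits), 4):
            mode_a, mode_b, word_sum = TABLE[bits[i:i + 4]]
            if word_sum == 0 or running_sum <= 0:
                out.append(mode_a)
                running_sum += word_sum
            else:
                out.append(mode_b)
                running_sum -= word_sum
        return out, running_sum

    print(encode_4b3t('0111' + '0100' + '0000' + '0111'))   # (['+++', '--0', '+0-', '---'], -2)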

The variation in mean level with time can also be plotted to demonstrate how rapidly the mean value will
tend to zero.

Coded Mark Inversion (CMI)

CMI is a polar NRZ code which uses both amplitude levels (each for half the symbol period) to represent a
digital 0 and either amplitude level (for the full symbol period) to represent a digital 1. The level used
alternates for successive digital ones. ITU-T recommended the use of CMI for 140 Mbits/s multiplexed
PCM.

Modulation for Bandpass channels

Modulation refers to the modification of one signal's characteristics in sympathy with another signal. The
signal being modified is called the carrier and the signal doing the modification is called the information
signal. In this topic we will look at intermediate or radio frequency bandpass modulation in which the
carrier is a sinusoid and the characteristics adjusted are amplitude, frequency and/or phase. IF modulation is
carried out to achieve the following objectives:
- Matching the information signal to the channel characteristics
- To enable frequency division multiplexing prior to transmission on a common physical transmission
medium
- To enable efficient antennas of reasonable size to be used in the communication system
- To fit the information signal in an assigned radio spectrum band

Evaluation of modulation schemes

Modulation schemes can be compared and evaluated on the basis of:
- Spectral efficiency
- Carrier to noise power
- Signal power required by different modulation schemes to achieve the same BER

Spectral efficiency is a measure of information transmission rate per Hz of Bandwidth and its units are
bit/s/Hz. It is often desirable to transmit a maximum information rate in a minimum possible bandwidth.
This is the case with radio communications in which radio spectrum is a scarce resource.

Types of IF modulation

IF modulation broadly falls into either of two schemes:
- Binary IF modulation, where a parameter of the carrier is varied between two values
- M-ary IF modulation, where a parameter of the carrier is varied between more than two values

Binary IF modulation

The basic binary IF modulation schemes are:
1. Binary amplitude shift keying
2. Binary frequency shift keying
3. Binary phase shift keying

Binary amplitude shift keying

In binary amplitude shift keyed (BASK) systems two digital symbols, zero and one, are represented by pulses of a sinusoidal carrier (frequency f_c) with two different amplitudes A_0 and A_1. In practice, one of the amplitudes, A_0, is set to zero, resulting in on-off keyed IF modulation. Mathematically we have:

f(t) = A_1 Π(t/T_0) cos(2π f_c t)   for a digital 1
f(t) = 0                            for a digital 0

T_0 is the symbol duration and Π( ) is the rectangular pulse function. BASK causes symbol energy to spread over more than one symbol period, resulting in intersymbol interference (ISI), and the probability of symbol error P_e is worsened. Performance is not so good and BASK is generally used for low bit rate signals.
Binary frequency shift keying

Binary frequency shift keying (BFSK) represents digital ones and zeros by carrier pulses with two distinct frequencies f_1 and f_2, i.e.

f(t) = A_1 Π(t/T_0) cos(2π f_1 t)   for a digital 1
f(t) = A_1 Π(t/T_0) cos(2π f_2 t)   for a digital 0



[Figure: BFSK waveform for an example bit stream; the transmitted carrier switches between frequency f_1 for a digital 1 and f_2 for a digital 0]
Bandwidth of BFSK

If the bit period of the modulating signal is T_b then its baseband frequency will be:

f_0 = 1/T_b

The frequency deviation is given by:

Δf = (f_1 - f_2)/2

where f_1 is the upper frequency and f_2 is the lower frequency. By Carson's rule for the bandwidth of frequency modulated signals, which states that the minimum bandwidth of an angle modulated signal is twice the sum of the peak frequency deviation and the highest modulating signal frequency, we get:

B = 2Δf + 2/T_b

The bandwidth is generally greater than 4/T_b.

The probability of error for best performance is given by:

P_e = (1/2) erfc( sqrt( 1.22 E_b / (2 N_0) ) )

If f_1 and f_2 are orthogonal then:

P_e = (1/2) erfc( sqrt( E_b / (2 N_0) ) )
Binary phase shift keying

Binary phase shift keying (BPSK) is a two phase modulation scheme in which a carrier is transmitted (0°, the reference) to indicate a MARK (or digital 1) condition, or the phase is reversed (180°) to indicate a SPACE (or digital 0). The phase shift does not have to be 180°, but this allows for the maximum separation of the digital states. Maximising the phase-state separation is important because the receiver has to distinguish one phase from the other even in a noisy environment. The phase-shift keying technique can be described mathematically as shown:

f(t) = A_1 Π(t/T_0) cos(2π f_c t)       for a digital 1
f(t) = A_1 Π(t/T_0) cos(2π f_c t + φ)   for a digital 0

Using a phase shift of φ = 180° amounts to reversing the phase each time the binary digit changes from a one to a zero and vice-versa. As a result such a scheme is also called a phase reversal scheme.

A simple block diagram for a binary phase shift-keying scheme is shown below:


The bandwidth for BPSK is given by:

BW ≈ 2/T_b

The probability of error is given by:

P_e = (1/2) erfc( sqrt( E_b / N_0 ) )

[Figure: BPSK waveform for an example bit stream, showing a 180° phase shift at each change of binary digit]

BPSK Demodulator

The BPSK demodulator must have an internal signal whose frequency is exactly equal to the incoming
carrier in order to be able to demodulate the received signal. In practice phase locked loop techniques are
used to recover the carrier from the incoming signal. This recovered carrier signal is used as the phase
reference.

Then the incoming signal and the recovered carrier signal are fed into a phase comparator or product
detector unit so as to recover the modulating signal. A generalised bock diagram of the PSK demodulator
with carrier recovery for phase reference is shown below:




Differential Phase Shift Keying
Differential phase shift keying means that the data is transmitted in the form of discrete phase shifts Δφ, where the phase reference is the previously transmitted signal phase. The advantage of this scheme is that neither the receiver nor the transmitter has to maintain an absolute phase reference against which the received signal is compared.

A simplified block diagram of a DPSK modulator is shown below:








An incoming information bit is XNORed with the preceding bit prior to entering the BPSK modulator. For
the first data bit, there is no preceding bit with which to compare it. Therefore an initial reference bit is
assumed. The DPSK demodulator is as shown:








The received signal is delayed by one bit time, then compared with the next signaling element in the
balanced modulator. If they are the same, a logic 1 is generated. If they are different, a logic 0 is generated.
If the reference phase is incorrectly assumed, only the first demodulated bit is in error.

The primary advantage of DPSK is the simplicity with which it can be implemented. With DPSK, no
carrier recovery circuit is required. A disadvantage of DPSK is that it requires between 1 and 3dB more
signal-to-noise ratio to achieve the same bit error rate as that of absolute PSK. For DPSK, the probability of
error is given by:
P_e = (1/2) e^(-E_b/N_0)
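The error-probability expressions quoted above for coherent orthogonal FSK, BPSK and DPSK can be evaluated directly (an illustrative comparison using Python's math.erfc):

    import math

    def pe_bpsk(ebn0):
        return 0.5 * math.erfc(math.sqrt(ebn0))

    def pe_fsk_orthogonal(ebn0):
        return 0.5 * math.erfc(math.sqrt(ebn0 / 2))

    def pe_dpsk(ebn0):
        return 0.5 * math.exp(-ebn0)

    for db in (4, 7, 10):
        ebn0 = 10 ** (db / 10)          # convert Eb/N0 from dB to a linear ratio
        print(db, pe_bpsk(ebn0), pe_fsk_orthogonal(ebn0), pe_dpsk(ebn0))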

Modulation techniques with Increased Spectral efficiency

Spectral efficiency η_s is defined as the rate of information transmission per unit of occupied bandwidth, i.e.

η_s = R_s H / B   bits/s/Hz

where R_s is the symbol rate, H is the entropy, and B is the occupied bandwidth. For an alphabet containing M statistically independent, equiprobable symbols, H = log2(M) bits/symbol. Since spectrum is a limited resource it is often desirable to minimise the bandwidth occupied by a signal of a given baud rate. Since R_s = 1/T_0 and H = log2(M) for statistically independent, equiprobable symbols, η_s can be expressed as:

η_s = log2(M) / (T_0 B)   bits/s/Hz

From this equation, it is clear that spectral efficiency is maximised by making the symbol alphabet size M large and the T_0 B product small. This is exactly the strategy employed by spectrally efficient modulation techniques.

M-ary Encoding
The term M-ary is derived from the word "binary". M is simply a digit that represents the number of
conditions possible. The digital modulation techniques discussed so far are binary systems because there
are only two possible outputs, one representing a digital one and the other representing a digital zero. With
digital modulation, we can increase the symbol alphabet by encoding at levels higher than two. For instance
for M = 4, we have four possible outputs; M = 8 we have 8 possible outputs etc.


Quaternary phase shift keying

Quaternary phase shift keying (QPSK) is a form of angle modulated digital modulation. In this scheme M =
4, and with QPSK four output phases are possible for a single carrier frequency. This means that four
different inputs can be fed into the QPSK modulator. For a binary system a word length of 2 bits is
sufficient to give us the four possible states. The corresponding binary words are 00, 01, 10, 11 i.e. in
QPSK the input data is grouped into groups of two bits called dibits. Each dibit code generates one of the
four possible conditions. For each 2-bit dibit clocked into the modulator, a single output is generated.
Therefore the baud rate at the output is one half of the input bit rate. The output of the QPSK modulator has exactly the same amplitude for all four states; the binary information is therefore encoded entirely in the phase of the output signal.

QPSK modulator
A block diagram of a QPSK modulator is shown below:



Two bits (a dibit) are serially clocked into the bit splitter and then output in parallel. One bit is directed to the I channel and the other to the Q channel. The I bit modulates a carrier that is in phase with the reference oscillator, hence the name "I" for the in-phase channel. The Q bit modulates a carrier that is 90° out of phase, or in quadrature, with the reference carrier, hence the name "Q" for the quadrature channel.

Once the dibit has been split into the I and Q channels, the operation is the same as for the BPSK
modulator. Essentially a QPSK modulator is two BPSK modulators combined in parallel. For each channel,
two phases are possible at the output of the balanced modulator. The linear summer then combines the two
quadrature signals to give one of four possible resultant phases:


Binary input (Q I)    QPSK output phase    Resultant
1 1                   +45°                 cos ω_c t + sin ω_c t = √2 sin(ω_c t + 45°)
1 0                   +135°                cos ω_c t - sin ω_c t = √2 sin(ω_c t + 135°)
0 0                   -135°                -cos ω_c t - sin ω_c t = √2 sin(ω_c t - 135°)
0 1                   -45°                 -cos ω_c t + sin ω_c t = √2 sin(ω_c t - 45°)
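The resultant phases in the table can be verified numerically. A minimal sketch, assuming the mapping used above (a logic 1 selects +sin ω_c t on the I channel or +cos ω_c t on the Q channel, a logic 0 selects the negative):

```python
import math

def qpsk_phase(q_bit, i_bit):
    """Resultant phase (degrees) when the I bit selects +/-sin(wct) and the
    Q bit selects +/-cos(wct); a logic 1 maps to +1 and a logic 0 to -1."""
    i_pol = 1 if i_bit else -1          # coefficient of sin(wct)
    q_pol = 1 if q_bit else -1          # coefficient of cos(wct)
    # i_pol*sin(wct) + q_pol*cos(wct) = sqrt(2)*sin(wct + phi), phi = atan2(q_pol, i_pol)
    return math.degrees(math.atan2(q_pol, i_pol))

for q, i in [(1, 1), (1, 0), (0, 0), (0, 1)]:
    print(f"dibit QI = {q}{i} -> output phase {qpsk_phase(q, i):+.0f} degrees")
```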

The QPSK can be represented on a phasor diagram as follows:


And a constellation diagram for the QPSK output is as shown:


Bandwidth Considerations of QPSK

Since the input data is divided into two channels, the bit rate in either the I or the Q channel is equal to one half of the input data rate (F_b/2). Assuming NRZ signalling, the highest fundamental frequency present at the input of either balanced modulator is one half of the channel bit rate, i.e. F_b/4 (the worst case being an alternating 1/0 pattern). By Nyquist's theorem, the minimum bandwidth required is twice this maximum fundamental frequency. In this case the minimum bandwidth we can achieve with QPSK is F_b/2, i.e. one half of the incoming bit rate. Also, assuming NRZ signals, the fastest output rate of change (baud rate) is also equal to one half the input bit rate. As with BPSK, the minimum bandwidth and the baud rate are equal.
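A quick numeric sketch of these relationships, using an assumed example input bit rate:

```python
fb = 10e6                      # example input bit rate: 10 Mbit/s (assumed value)
channel_rate = fb / 2          # bit rate in each of the I and Q channels
highest_fundamental = fb / 4   # worst-case fundamental of the NRZ data in each channel
min_bandwidth = 2 * highest_fundamental   # Nyquist: twice the highest fundamental = fb/2
baud = fb / 2                  # one output symbol per dibit

print(f"Input bit rate    : {fb / 1e6:.1f} Mbit/s")
print(f"Minimum bandwidth : {min_bandwidth / 1e6:.1f} MHz")
print(f"Output baud rate  : {baud / 1e6:.1f} Msymbols/s")
```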



QPSK Receiver

[Figure(s): QPSK diagrams with dibit labels 11, 10, 01 and 00 located on the ±sin ω_c t and ±cos ω_c t axes.]
Eight Phase PSK (8PSK)

Eight phase PSK (8PSK) is an M-ary encoding technique where M = 8. With an 8PSK modulator, there are eight possible output phases. To encode eight different phases, the incoming bits are considered in groups
of 3 bits, called tribits (remember 2^3 = 8).

Binary Input    8PSK output phase
0 0 0           -112.5°
0 0 1           -157.5°
0 1 0           -67.5°
0 1 1           -22.5°
1 0 0           +112.5°
1 0 1           +157.5°
1 1 0           +67.5°
1 1 1           +22.5°

The resulting constellation diagram is:

Quadrature Amplitude Modulation (QAM)

Quadrature Amplitude Modulation (QAM) is a form of modulation where the digital information is
contained in both the amplitude and the phase of the transmitted carrier.

Eight QAM

Eight QAM (8QAM) is an M-ary encoding technique where M = 8. Unlike 8PSK, the output signal from an
8QAM modulator is not a constant-amplitude signal. The truth table for 8QAM is shown:

Binary Input    8QAM output
                amplitude    phase
0 0 0           0.765 V      -135°
0 0 1           1.848 V      -135°
0 1 0           0.765 V      -45°
0 1 1           1.848 V      -45°
1 0 0           0.765 V      +135°
1 0 1           1.848 V      +135°
1 1 0           0.765 V      +45°
1 1 1           1.848 V      +45°

And the constellation diagram is as follows:
[Figure: 8QAM constellation diagram; eight points labelled with their tribits at amplitudes 0.765 V and 1.848 V and phases ±45° and ±135°, plotted on the ±sin ω_c t / ±cos ω_c t axes.]
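The constellation points can be generated directly from the 8QAM truth table above. A minimal sketch that converts each (amplitude, phase) entry into rectangular I/Q coordinates (which could then be plotted):

```python
import math

# (tribit, amplitude in volts, phase in degrees) taken from the 8QAM truth table above
table = [
    ("000", 0.765, -135), ("001", 1.848, -135),
    ("010", 0.765,  -45), ("011", 1.848,  -45),
    ("100", 0.765, +135), ("101", 1.848, +135),
    ("110", 0.765,  +45), ("111", 1.848,  +45),
]

for tribit, amp, phase in table:
    i = amp * math.cos(math.radians(phase))   # in-phase component
    q = amp * math.sin(math.radians(phase))   # quadrature component
    print(f"{tribit}: {amp:.3f} V at {phase:+4d} deg -> I = {i:+.3f}, Q = {q:+.3f}")
```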


Channel capacity

Let us consider the communication process over a discrete channel.

o---------o
| Noise |
o---------o
|
V
o-------------o X o---------o Y o----------o
| Transmitter |-------->| Channel |-------->| Receiver |
o-------------o o---------o o----------o

Here X represents the space of messages transmitted, and Y the space of messages received during a unit
time over our channel. Let p(y | x) be the conditional probability distribution function of Y given X. We will
consider p(y | x) to be an inherent fixed property of our communications channel (representing the nature of
the noise of our channel). Then the joint distribution of X and Y is completely determined by our channel
and by our choice of f(x), the marginal distribution of messages we choose to send over the channel. Under
these constraints, we would like to maximize the amount of information, or the signal, we can
communicate over the channel. The appropriate measure for this is the transinformation, and this maximum
transinformation is called the channel capacity, given by

C = \max_{f(x)} I(X;Y)

where the maximum is taken over all possible choices of the input distribution f(x).
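As a numerical illustration of this definition, the sketch below estimates the capacity of a binary channel by a brute-force search over input distributions; the channel matrix is an assumption chosen only for illustration:

```python
import math

def mutual_information(px, p_y_given_x):
    """I(X;Y) in bits for a discrete channel, given the input distribution px
    and the transition matrix p_y_given_x[x][y]."""
    py = [sum(px[x] * p_y_given_x[x][y] for x in range(len(px)))
          for y in range(len(p_y_given_x[0]))]
    info = 0.0
    for x in range(len(px)):
        for y in range(len(py)):
            pxy = px[x] * p_y_given_x[x][y]
            if pxy > 0:
                info += pxy * math.log2(pxy / (px[x] * py[y]))
    return info

# Example (assumed) channel: a binary channel with unequal error probabilities.
channel = [[0.9, 0.1],
           [0.2, 0.8]]

# Brute-force search over input distributions f(x): p(x=0) = a, p(x=1) = 1 - a.
capacity, best_a = max(
    (mutual_information([a, 1 - a], channel), a)
    for a in (i / 1000 for i in range(1, 1000))
)
print(f"Estimated capacity ~ {capacity:.4f} bits/use at p(x=0) ~ {best_a:.3f}")
```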


Noisy channel coding theorem

In information theory, the noisy-channel coding theorem establishes that however contaminated with
noise interference a communication channel may be, it is possible to communicate digital data
(information) error-free up to a given maximum rate through the channel. This surprising result, sometimes
called the fundamental theorem of information theory, or just Shannon's theorem, was first presented
by Claude Shannon in 1948. The Shannon limit or Shannon capacity of a communications channel is the
theoretical maximum information transfer rate of the channel, for a particular noise level. The theorem
describes the maximum possible efficiency of error-correcting methods versus levels of noise interference
and data corruption. The theorem doesn't describe how to construct the error-correcting method; it only tells us how good the best possible method can be. Shannon's theorem has wide-ranging applications in both communications and data storage. The theorem is of foundational importance to the modern field of information theory. The Shannon theorem states that given a noisy channel with information capacity C and
information transmitted at a rate R, then if

R ≤ C

there exists a coding technique that allows the probability of error at the receiver to be made arbitrarily
small. This means that theoretically, it is possible to transmit information without error up to a limit, C. The
converse is also important. If

the probability of error at the receiver increases without bound as the transmission rate is increased. No
useful information can be transmitted beyond the channel capacity. Simple schemes such as "send the message 3 times and use a best 2-out-of-3 voting scheme if the copies differ" are inefficient error-correction methods, unable to asymptotically guarantee that a block of data can be communicated free of
error. Advanced techniques such as Reed-Solomon codes and, more recently, Turbo codes come much
closer to reaching the theoretical Shannon limit, but at a cost of high computational complexity. With
Turbo codes and the computing power in today's digital signal processors, it is now possible to reach within
1/10 of one decibel of the Shannon limit.

Mathematical statement

Theorem (Shannon, 1948):
1. For every discrete memoryless channel, the channel capacity

C = \max_{p_X} I(X;Y)

has the following property. For any ε > 0 and R < C, for large enough N, there exists a code of length N and rate ≥ R and a decoding algorithm, such that the maximal probability of block error is ≤ ε.
2. If a probability of bit error p_b is acceptable, rates up to R(p_b) are achievable, where

R(p_b) = \frac{C}{1 - H_2(p_b)}

and H_2 is the binary entropy function.
3. For any p_b, rates greater than R(p_b) are not achievable.

Channel capacity of particular model channels
- A continuous-time analog communications channel subject to Gaussian noise: see the Shannon-Hartley theorem.
- A binary symmetric channel (BSC) with crossover probability p is a binary input, binary output channel that flips the input bit with probability p. The BSC has a capacity of 1 - H_b(p) bits per channel use, where H_b is the binary entropy function:

H_b(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)

- A binary erasure channel (BEC) with erasure probability p is a binary input, ternary output channel.
The possible channel outputs are 0, 1, and a third symbol 'e' called an erasure. The erasure represents
complete loss of information about an input bit. The capacity of the BEC is 1 - p bits per channel use.
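A short sketch evaluating the two capacity formulas above for a few example values of p:

```python
import math

def binary_entropy(p):
    """H_b(p) = -p*log2(p) - (1-p)*log2(1-p), with H_b(0) = H_b(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    """Capacity of a binary symmetric channel with crossover probability p."""
    return 1 - binary_entropy(p)

def bec_capacity(p):
    """Capacity of a binary erasure channel with erasure probability p."""
    return 1 - p

for p in (0.01, 0.1, 0.5):   # example probabilities
    print(f"p = {p}: BSC capacity = {bsc_capacity(p):.4f}, BEC capacity = {bec_capacity(p):.4f}")
```

Note that the BSC capacity falls to zero at p = 0.5, where the output is statistically independent of the input.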

In coding theory, a binary symmetric channel (or BSC) is an idealized model of a communications
channel that sends bits. In a BSC, the probability of a 1 becoming a 0 and of a 0 becoming a 1 are assumed
to be the same (hence the term symmetric). Since 1s and 0s may be represented very differently (as a pulse
and absence of a pulse, for instance), this assumption is often not valid in practical situations. However, this
assumption makes analysis much easier.

Formally, let p be the probability of an error occurring. Then the probability of a bit sent over a BSC being correctly received is (1 - p), and this probability is independent of what bit is sent. Assuming that p is known to the receiver, we may without loss of generality assume that p < 1/2, since otherwise we may simply invert (swap over) the received symbols, giving an error probability of 1 - p < 1/2. The story is more subtle if p is entirely unknown to the receiver.


The binary symmetric channel (BSC) is an idealised model used for a noisy channel:
- binary: the input and output symbols are 0 and 1;
- symmetric: p(0 → 1) = p(1 → 0).

Forward transition probabilities of the noisy binary symmetric channel.

Transmitted source symbols Xi, i = 0, 1. Received symbols Yj, j = 0, 1. Given P(Xi), the source probability: P(Yj|Xi) = Pe if i ≠ j, and P(Yj|Xi) = 1 - Pe if i = j, where Pe is the given error rate.

Calculate the average mutual information (the average amount of source information acquired per received
symbol, as distinguished from that per source symbol, which was given by the entropy H(X)).

Step 1: P(Yj,Xi)=P(Xi)P(Yj|Xi); i=0,1. j=0,1.

Step 2: P(Yj) = Σ_i P(Yj,Xi); j=0,1.

(Logically the probability of receiving a particular Yj is the sum of all joint probabilities over the range of Xi. For example, the probability of receiving a 1 is the probability of sending 1 and receiving 1 plus the probability of sending 0 and receiving 1; that is, the probability of sending 1 and receiving it correctly plus the probability of sending 0 and receiving it wrongly.)

Step 3: I(Yj,Xi) = log {P(Xi|Yj)/P(Xi) } =log {P(Yj,Xi)/ [P(Xi)P(Yj)]} ; i=0,1. j=0,1.

(This quantifies the amount of information conveyed when Xi is transmitted and Yj is received. Over a perfect noiseless channel this equals the self-information of Xi, because each received symbol uniquely identifies a transmitted symbol, with P(Xi|Yj) = 1. If the channel is so noisy that communication breaks down, P(Yj,Xi) = P(Xi)P(Yj) and this quantity is zero: no information has been transferred.)

Step 4: I(X;Y) = Σ_i Σ_j P(Yj,Xi) log{P(Xi|Yj)/P(Xi)} = Σ_i Σ_j P(Yj,Xi) log{P(Yj,Xi)/[P(Xi)P(Yj)]}
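The four steps translate directly into code. A minimal sketch, assuming an example source distribution and error rate:

```python
import math

def bsc_mutual_information(px0, pe):
    """Average mutual information I(X;Y) of a BSC, following steps 1-4 above.
    px0 is the probability of sending symbol 0, pe the channel error rate."""
    px = [px0, 1 - px0]
    # Step 1: joint probabilities P(Yj, Xi) = P(Xi) * P(Yj|Xi)
    p_joint = [[px[i] * (1 - pe if i == j else pe) for j in (0, 1)] for i in (0, 1)]
    # Step 2: received-symbol probabilities P(Yj) = sum over i of P(Yj, Xi)
    py = [p_joint[0][j] + p_joint[1][j] for j in (0, 1)]
    # Steps 3 and 4: average the per-pair information over all (Xi, Yj)
    info = 0.0
    for i in (0, 1):
        for j in (0, 1):
            if p_joint[i][j] > 0:
                info += p_joint[i][j] * math.log2(p_joint[i][j] / (px[i] * py[j]))
    return info

# Example values (assumed): equiprobable source, 10% error rate
print(f"I(X;Y) = {bsc_mutual_information(0.5, 0.1):.4f} bits/symbol")
```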

Equivocation
Equivocation represents the destructive effect of noise, i.e. the additional information needed to make the reception correct.

Imagine an observer who looks at both the transmitted and received digits: if they are the same, the observer reports a 1; if they differ, a 0.

The information sent by the observer is easily evaluated as -[p(0)logp(0)+p(1)logp(1)] applied to the binary
string. The probability of 0 is just the channel error probability.

Example: A binary system produces Marks and Spaces with equal probabilities, 1/8 of all pulses being
received in error. Find the information sent by the observer.
The information sent by the observer is -[7/8 log(7/8) + 1/8 log(1/8)] = 0.54 bits. Since the input information is 1 bit/symbol, the net information is 1 - 0.54 = 0.46 bits.

In other words, the noise in the system has destroyed 0.54 bits of information, i.e. the equivocation is 0.54 bits.
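The same figures can be reproduced with a few lines of code:

```python
import math

p_err = 1 / 8                               # channel error probability
# Information carried by the observer's binary report (1 = same, 0 = different)
observer_info = -(p_err * math.log2(p_err) + (1 - p_err) * math.log2(1 - p_err))
net_info = 1 - observer_info                # source entropy is 1 bit/symbol
print(f"Observer information (equivocation): {observer_info:.2f} bits")
print(f"Net information transferred        : {net_info:.2f} bits")
```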
- General expression for equivocation
Consider a specific pair of transmitted and received digits {x,y}

1. Noisy channel: probability change p(x) → p(x|y)
2. Receiver: probability correction p(x|y) → 1
The information provided by the observer = -log(p(x|y)). Averaging over all pairs gives the equivocation:

H(X|Y) = -\sum_{x,y} p(x,y) \log p(x|y)

The information transferred via the noisy channel (in the absence of the observer) is then

I(X;Y) = H(X) - H(X|Y)


Example: A binary system produces Marks with probability of 0.7 and Spaces with probability 0.3, 2/7 of
the Marks are received in error and 1/3 of the Spaces. Find the information transfer using the expression
for equivocation.
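A worked numerical sketch of this example, applying the equivocation expression above (the printed values follow directly from the given probabilities):

```python
import math

px = {"M": 0.7, "S": 0.3}                       # source probabilities
p_err = {"M": 2 / 7, "S": 1 / 3}                # error rate for each transmitted symbol

# Joint probabilities P(x sent, y received)
p_joint = {(x, y): px[x] * ((1 - p_err[x]) if x == y else p_err[x])
           for x in px for y in px}

# Received-symbol probabilities P(y)
py = {y: sum(p_joint[(x, y)] for x in px) for y in px}

h_x = -sum(p * math.log2(p) for p in px.values())                      # source entropy H(X)
h_x_given_y = -sum(p_joint[(x, y)] * math.log2(p_joint[(x, y)] / py[y])
                   for x in px for y in px)                            # equivocation H(X|Y)

print(f"H(X)   = {h_x:.3f} bits")
print(f"H(X|Y) = {h_x_given_y:.3f} bits (equivocation)")
print(f"I(X;Y) = {h_x - h_x_given_y:.3f} bits transferred per symbol")
```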

Summary of basic formulae by Venn Diagram



Channel capacity
C = max I(X;Y), that is, the maximum information transfer.

Binary Symmetric Channels
If the noise in the system is random, the probabilities of error for 0 and 1 are the same. The channel is therefore characterised by a single value p, the binary error probability.

Channel capacity of the BSC channel:

I(X;Y) = H(Y) - H(Y|X) = H(Y) - H(p)

Here H(Y|X) = H(p) is the backward equivocation (error entropy). Since p is fixed, I(X;Y) is maximum when H(Y) is maximum; this occurs when p(0) = p(1) = 1/2 at the receiver output, giving H(Y) = 1. Hence

C = 1 - H(p)



How to overcome the problem of information loss in a noisy channel?
(a) Physical solution?
(b) System solution. (Channel coding).

Source coding: The task of source coding is to represent the source information with the minimum number of symbols, under the assumption that the channel is noise-free. When a code is transmitted over a channel in the presence of noise, errors will occur.
Channel coding: The task of channel coding is to represent the source information in a manner that minimises the error probability in decoding. This is done by adding redundancy: an extra amount of information is included to compensate for the information lost (much as a room heating system in winter adds heat to compensate for different outdoor temperatures).
A symbol error is an error made under some decision rule: a received code word (some of whose bits may be in error) is classified as the wrong symbol, i.e. a symbol different from the one originally meant. The binomial distribution plays an important role in channel coding. A binomial experiment consists of n identical trials (think of coding a symbol as a binary digit sequence, i.e. a code word, so n is the length of the code word). Each trial has two possible outcomes, success S or failure F; S can be defined as a transmission error (1 → 0 or 0 → 1) occurring with probability p, the bit error rate.


The binomial probability

P(r) = \binom{n}{r} p^r (1-p)^{n-r}

is used to calculate the probability of r bit errors in an n-bit code word.

Coding in a noisy channel
- Error protection: improve the tolerance of errors.
- Error detection: indicate the occurrence of errors.
- Error correction.

Binary coding for error protection
Example: Assume a binary symmetric channel with p = 0.01 (error probability).
1) Coding by repetition
Code A = 00000, B = 11111, and use a majority decision rule: if there are more 0s than 1s, decide A. Up to 2 bit errors can then be tolerated without producing a symbol error.

Use the binomial probability distribution to find the symbol error probability p(e), i.e. the probability of 3 or more bit errors:

p(e) = \sum_{r=3}^{5} \binom{5}{r} p^r (1-p)^{5-r} \approx 1 \times 10^{-5} for p = 0.01.
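A quick sketch of this calculation for the 5-bit repetition code (n = 5, p = 0.01, with 3 or more bit errors causing a symbol error):

```python
from math import comb

def p_symbol_error(n, p, t):
    """Probability of more than t bit errors in an n-bit code word (bit error rate p)."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(t + 1, n + 1))

# 5-bit repetition code with majority decision tolerates t = 2 errors
print(f"p(e) = {p_symbol_error(5, 0.01, 2):.2e}")
```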


2) Coding by selection of code words
(Using 5 digits there are 32 possible code words, but we don't have to use them all.)
a) Two selections (i.e. repetition): A = 00000, B = 11111. This gives:

b) Thirty-two selections

c) Four selections
A compromise between the two extremes:
1. Enough code words to give a reasonable rate R.
2. Code words as different as possible, to reduce p(e), e.g. (see the sketch below).

Each code word differs from all the others in at least three digit positions, i.e. the minimum Hamming distance (MHD) = 3, so one error per code word can be tolerated.
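As an illustration, the minimum Hamming distance of a codebook can be checked with a short script; the four code words below are an assumed example with MHD = 3, not necessarily the set used in the original notes:

```python
from itertools import combinations

def hamming(a, b):
    """Number of bit positions in which two equal-length code words differ."""
    return sum(x != y for x, y in zip(a, b))

codebook = ["00000", "01110", "10101", "11011"]   # assumed example code words

mhd = min(hamming(a, b) for a, b in combinations(codebook, 2))
print(f"Minimum Hamming distance = {mhd}")        # 3 -> corrects (3 - 1) // 2 = 1 error
```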











