
Source: http://www.cs.cf.ac.uk/Dave/Multimedia/node237.html

Basics of Information Theory


According to Shannon, the entropy of an information source S is defined as:

\[ H(S) = \eta = \sum_i p_i \log_2 \frac{1}{p_i} \]

where p_i is the probability that symbol S_i in S will occur.

\log_2 \frac{1}{p_i} indicates the amount of information contained in S_i, i.e., the number of bits needed to code S_i.

For example, in an image with a uniform distribution of gray-level intensities, i.e. p_i = 1/256, the number of bits needed to code each gray level is 8 bits. The entropy of this image is 8.

Q: How about an image in which half of the pixels are white (I = 220) and half are black (I = 10)?
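A small C sketch (not part of the original notes) that evaluates the entropy formula for both images. For the half-white, half-black image each pixel carries log2(1/0.5) = 1 bit of information, so the entropy is 1:

#include <stdio.h>
#include <math.h>

/* Entropy H(S) = sum_i p_i * log2(1/p_i), in bits per symbol. */
double entropy(const double *p, int n)
{
    double h = 0.0;
    for (int i = 0; i < n; i++)
        if (p[i] > 0.0)               /* 0 * log2(1/0) is taken as 0 */
            h += p[i] * log2(1.0 / p[i]);
    return h;
}

int main(void)
{
    /* Uniform 256-level image: every p_i = 1/256. */
    double uniform[256];
    for (int i = 0; i < 256; i++)
        uniform[i] = 1.0 / 256.0;

    /* Half white (I = 220), half black (I = 10). */
    double bilevel[2] = { 0.5, 0.5 };

    printf("uniform image:  %.1f bits/symbol\n", entropy(uniform, 256)); /* 8.0 */
    printf("bi-level image: %.1f bits/symbol\n", entropy(bilevel, 2));   /* 1.0 */
    return 0;
}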

Pattern Substitution

This is a simple form of statistical encoding. Here we substitute frequently repeating patterns with a code. The code is shorter than the pattern, giving us compression.

A simple pattern substitution scheme could employ predefined codes (for example, replace all occurrences of 'The' with the code '&').

More typically, codes are assigned according to the frequency of occurrence of patterns:

Count occurrences of tokens

Sort in descending order

Assign codes to the highest-count tokens

A predefined symbol table may be used, i.e., assign code i to token i.

Huffman Coding
Huffman coding is based on the frequency of occurrence of a data item (e.g., a pixel in images). The principle is to use a smaller number of bits to encode the data that occurs more frequently. Codes are stored in a Code Book, which may be constructed for each image or for a set of images. In all cases the code book plus the encoded data must be transmitted to enable decoding.
The Huffman algorithm is briefly summarised as follows:


A bottom-up approach

1. Initialization: Put all nodes in an OPEN list, keep it sorted at all times (e.g., ABCDE).
2. Repeat until the OPEN list has only one node left:
(a) From OPEN pick two nodes having the lowest frequencies/probabilities, create a parent
node of them.
(b) Assign the sum of the children's frequencies/probabilities to the parent node and insert it
into OPEN.
(c) Assign code 0, 1 to the two branches of the tree, and delete the children from OPEN.
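A minimal C sketch of this bottom-up construction (not from the notes; the symbols A-E and their probabilities are made up for illustration). It repeatedly merges the two lowest-frequency open nodes, then prints each symbol's code by walking from leaf to root:

#include <stdio.h>

#define NSYM 5

struct node {
    double freq;
    int    left, right, parent;  /* indices, -1 if none */
    int    bit;                  /* 0 or 1 on the edge to the parent */
};

int main(void)
{
    /* Hypothetical alphabet A..E with assumed probabilities. */
    const char sym[NSYM] = { 'A', 'B', 'C', 'D', 'E' };
    const double p[NSYM] = { 0.35, 0.25, 0.20, 0.15, 0.05 };

    struct node t[2 * NSYM - 1];
    int used[2 * NSYM - 1] = { 0 };  /* 1 once a node has a parent */
    int n = NSYM;

    for (int i = 0; i < NSYM; i++)
        t[i] = (struct node){ p[i], -1, -1, -1, 0 };

    /* Repeat until only the root remains "open". */
    while (n < 2 * NSYM - 1) {
        int lo1 = -1, lo2 = -1;  /* two open nodes of lowest frequency */
        for (int i = 0; i < n; i++) {
            if (used[i]) continue;
            if (lo1 < 0 || t[i].freq < t[lo1].freq) { lo2 = lo1; lo1 = i; }
            else if (lo2 < 0 || t[i].freq < t[lo2].freq) lo2 = i;
        }
        /* Parent gets the sum of the children's frequencies. */
        t[n] = (struct node){ t[lo1].freq + t[lo2].freq, lo1, lo2, -1, 0 };
        t[lo1].parent = t[lo2].parent = n;
        t[lo1].bit = 0;          /* code 0 on one branch ... */
        t[lo2].bit = 1;          /* ... code 1 on the other  */
        used[lo1] = used[lo2] = 1;
        n++;
    }

    /* Read each code by walking leaf -> root, then print it reversed. */
    for (int i = 0; i < NSYM; i++) {
        char code[NSYM]; int len = 0;
        for (int j = i; t[j].parent >= 0; j = t[j].parent)
            code[len++] = '0' + t[j].bit;
        printf("%c: ", sym[i]);
        while (len--) putchar(code[len]);
        putchar('\n');
    }
    return 0;
}

Frequent symbols (A, B, C) come out with 2-bit codes, rare ones (D, E) with 3-bit codes.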

Adaptive Huffman Coding


The basic Huffman algorithm has been extended, for the following reasons:
(a) The previous algorithms require statistical knowledge which is often not available (e.g., live audio, video).
(b) Even when it is available, it can be a heavy overhead, especially when many tables must be sent, as when a non-order-0 model is used, i.e., one taking into account the impact of the previous symbol on the probability of the current symbol (e.g., "q" and "u" often come together, ...).
The solution is to use adaptive algorithms. As an example, Adaptive Huffman Coding is examined below. The idea is, however, applicable to other adaptive compression algorithms.
ENCODER
-------

Initialize_model();
while ((c = getc (input)) != eof)
{
    encode (c, output);
    update_model (c);
}

DECODER
-------

Initialize_model();
while ((c = decode (input)) != eof)
{
    putc (c, output);
    update_model (c);
}


The key is to have both the encoder and decoder use exactly the same Initialize_model() and update_model() routines.

update_model does two things: (a) increment the count, (b) update the Huffman tree (Fig 7.2).

During the updates, the Huffman tree maintains its sibling property, i.e. the nodes (internal and leaf) are arranged in order of increasing weights (see figure).

When swapping is necessary, the farthest node with weight W is swapped with the node whose weight has just been increased to W+1. Note: if the node with weight W has a subtree beneath it, the subtree goes with it.

The Huffman tree can look very different after node swapping (Fig 7.2); e.g., in the third tree, node A is again swapped and becomes the #5 node. It is now encoded using only 2 bits.

Note: the code for a particular symbol changes during the adaptive coding process.

Arithmetic Coding
Huffman coding and the like use an integer number (k) of bits for each symbol, hence k is
never less than 1. Sometimes, e.g., when sending a 1-bit image, compression becomes
impossible.

Idea: Suppose the alphabet is {X, Y} with

prob(X) = 2/3
prob(Y) = 1/3

If we are only concerned with encoding length-2 messages, then we can map all possible messages to intervals in the range [0..1]: XX occupies [0, 4/9), XY [4/9, 6/9), YX [6/9, 8/9), and YY [8/9, 1).

To encode a message, just send enough bits of a binary fraction to uniquely specify the interval.


Similarly, we can map all possible length-3 messages to intervals in the range [0..1].

Q: How to encode X Y X X Y X ?

Q: What about an alphabet with 26 symbols, or 256 symbols, ...?

In general, the number of bits is determined by the size of the interval.

Examples:

The first interval (XXX) has size 8/27, and needs 2 bits -> 2/3 bit per symbol (X).

The last interval (YYY) has size 1/27, and needs 5 bits.

In general, we need \(\lceil \log_2 (1/p) \rceil\) bits to represent an interval of size p. This approaches the optimal encoding as the message length goes to infinity.
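A C sketch (not in the original notes) of the interval-narrowing answer to the first question above: starting from [0, 1), each X keeps the lower 2/3 of the current interval and each Y the upper 1/3. For X Y X X Y X the final interval has size (2/3)^4 (1/3)^2 = 16/729, so ceil(log2(729/16)) = 6 bits suffice:

#include <stdio.h>
#include <math.h>

int main(void)
{
    /* Alphabet {X, Y}: X occupies the lower 2/3 of any interval, Y the rest. */
    const double pX = 2.0 / 3.0;
    const char *msg = "XYXXYX";

    double lo = 0.0, hi = 1.0;
    for (const char *c = msg; *c; c++) {
        double range = hi - lo;
        if (*c == 'X')
            hi = lo + range * pX;     /* narrow to the X sub-interval */
        else
            lo = lo + range * pX;     /* narrow to the Y sub-interval */
    }

    double size = hi - lo;            /* = (2/3)^4 * (1/3)^2 = 16/729 */
    printf("interval [%f, %f), size %f\n", lo, hi, size);
    printf("bits needed: %d\n", (int)ceil(-log2(size)));  /* 6 bits */
    return 0;
}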

Problem: how to determine the probabilities?

A simple idea is to use an adaptive model: start with a guess of the symbol frequencies, then update the frequencies with each new symbol.

Another idea is to take account of intersymbol probabilities, e.g., Prediction by Partial Matching (PPM).

Implementation Notes: Can be CPU and memory intensive; patented.


Lempel-Ziv-Welch (LZW) Algorithm


The LZW algorithm is a very common compression technique.
Suppose we want to encode the Oxford Concise English dictionary which contains about
159,000 entries. Why not just transmit each word as an 18 bit number?
Problems:

too many bits are needed,

everyone needs the same dictionary,

it only works for English text.

Solution: Find a way to build the dictionary adaptively.

Original methods due to Ziv and Lempel in 1977 and 1978. Terry Welch
improved the scheme in 1984 (called LZW compression).

It is used in UNIX compress -- 1D token stream (similar to below)

It is used in GIF compression -- 2D window tokens (treating the image as with Huffman Coding above).

Reference: Terry A. Welch, "A Technique for High Performance Data Compression", IEEE
Computer, Vol. 17, No. 6, 1984, pp. 8-19.
The LZW Compression Algorithm can be summarised as follows:

w = NIL;
while ( read a character k )
{
    if wk exists in the dictionary
        w = wk;
    else
    {
        add wk to the dictionary;
        output the code for w;
        w = k;
    }
}
Original LZW used a dictionary with 4K entries; the first 256 (0-255) are the ASCII codes.

Example:
Input string is "^WED^WE^WEE^WEB^WET".
w     k     output  index  symbol
---------------------------------
NIL   ^
^     W     ^       256    ^W
W     E     W       257    WE
E     D     E       258    ED
D     ^     D       259    D^
^     W
^W    E     256     260    ^WE
E     ^     E       261    E^
^     W
^W    E
^WE   E     260     262    ^WEE
E     ^
E^    W     261     263    E^W
W     E
WE    B     257     264    WEB
B     ^     B       265    B^
^     W
^W    E
^WE   T     260     266    ^WET
T     EOF   T

A 19-symbol input has been reduced to a 7-symbol plus 5-code output. Each code/symbol needs more than 8 bits, say 9 bits.
Usually, compression doesn't start until a large number of bytes (e.g., > 100) have been read in.
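A compact C sketch of the encoder loop above (an illustration, not a real implementation: strings are bounded and the dictionary is searched linearly rather than hashed). Run on the example string, it emits 94 87 69 68 256 69 260 261 257 66 260 84, matching the ^ W E D 256 E 260 261 257 B 260 T output column of the table above:

#include <stdio.h>
#include <string.h>

#define DICT_SIZE 4096
#define MAX_STR   64

static char dict[DICT_SIZE][MAX_STR];
static int  ndict;

/* Return the code for string s, or -1 if it is not in the dictionary. */
static int lookup(const char *s)
{
    for (int i = 0; i < ndict; i++)
        if (strcmp(dict[i], s) == 0) return i;
    return -1;
}

void lzw_encode(const char *input)
{
    ndict = 256;                        /* codes 0-255 are single bytes */
    for (int i = 0; i < 256; i++) {
        dict[i][0] = (char)i; dict[i][1] = '\0';
    }

    char w[MAX_STR] = "";               /* w = NIL */
    for (const char *k = input; *k; k++) {
        char wk[MAX_STR + 1];
        snprintf(wk, sizeof wk, "%s%c", w, *k);
        if (lookup(wk) >= 0) {          /* wk exists: extend w */
            strcpy(w, wk);
        } else {                        /* wk is new: add it, emit code for w */
            if (ndict < DICT_SIZE) strcpy(dict[ndict++], wk);
            printf("%d ", lookup(w));
            w[0] = *k; w[1] = '\0';     /* w = k */
        }
    }
    if (w[0]) printf("%d\n", lookup(w)); /* flush the final w */
}

int main(void)
{
    lzw_encode("^WED^WE^WEE^WEB^WET");
    return 0;
}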

The LZW Decompression Algorithm is as follows:


read a character k;
output k;
w = k;
while ( read a character k )    /* k could be a character or a code */
{
    entry = dictionary entry for k;
    output entry;
    add w + entry[0] to the dictionary;
    w = entry;
}

Example (continued):
Input string is "^WED<256>E<260><261><257>B<260>T".


w      k      output  index  symbol
-----------------------------------
       ^      ^
^      W      W       256    ^W
W      E      E       257    WE
E      D      D       258    ED
D      <256>  ^W      259    D^
<256>  E      E       260    ^WE
E      <260>  ^WE     261    E^
<260>  <261>  E^      262    ^WEE
<261>  <257>  WE      263    E^W
<257>  B      B       264    WEB
B      <260>  ^WE     265    B^
<260>  T      T       266    ^WET

Problem: What if we run out of dictionary space?

Solution 1: Keep track of unused entries and use LRU (least recently used) replacement.

Solution 2: Monitor compression performance and flush the dictionary when performance is poor.

Implementation Note: LZW can be made really fast; it grabs a fixed number of bits from the input stream, so bit parsing is very easy. Table lookup is automatic.

The Discrete Cosine Transform (DCT)


The discrete cosine transform (DCT) helps separate the image into parts (or
spectral sub-bands) of differing importance (with respect to the image's visual
quality). The DCT is similar to the discrete Fourier transform: it transforms a
signal or image from the spatial domain to the frequency domain (Fig 7.8).

DCT Encoding
The general equation for a 1D (N data items) DCT is defined by the following equation:

\[ F(u) = \sqrt{\frac{2}{N}} \, \Lambda(u) \sum_{i=0}^{N-1} f(i) \cos\left[\frac{(2i+1)\,u\pi}{2N}\right] \]

and the corresponding inverse 1D DCT transform is simply F^{-1}(u), i.e.:

\[ f(i) = \sqrt{\frac{2}{N}} \sum_{u=0}^{N-1} \Lambda(u)\, F(u) \cos\left[\frac{(2i+1)\,u\pi}{2N}\right] \]

where

\[ \Lambda(u) = \begin{cases} \frac{1}{\sqrt{2}} & \text{for } u = 0 \\ 1 & \text{otherwise} \end{cases} \]

The general equation for a 2D (N by M image) DCT is defined by the following equation:

\[ F(u,v) = \sqrt{\frac{2}{N}} \sqrt{\frac{2}{M}} \, \Lambda(u)\Lambda(v) \sum_{i=0}^{N-1} \sum_{j=0}^{M-1} f(i,j) \cos\left[\frac{(2i+1)\,u\pi}{2N}\right] \cos\left[\frac{(2j+1)\,v\pi}{2M}\right] \]

and the corresponding inverse 2D DCT transform is simply F^{-1}(u,v), i.e.:

\[ f(i,j) = \sqrt{\frac{2}{N}} \sqrt{\frac{2}{M}} \sum_{u=0}^{N-1} \sum_{v=0}^{M-1} \Lambda(u)\Lambda(v)\, F(u,v) \cos\left[\frac{(2i+1)\,u\pi}{2N}\right] \cos\left[\frac{(2j+1)\,v\pi}{2M}\right] \]

where \Lambda is as defined above.

The basic operation of the DCT is as follows:

The input image is N by M;

f(i,j) is the intensity of the pixel in row i and column j;

F(u,v) is the DCT coefficient in row u and column v of the DCT matrix.

For most images, much of the signal energy lies at low frequencies; these appear in the upper-left corner of the DCT.

Compression is achieved since the lower-right values represent higher frequencies, and are often small - small enough to be neglected with little visible distortion.

The DCT input is an 8 by 8 array of integers. This array contains each pixel's gray scale level; 8-bit pixels have levels from 0 to 255.

Therefore an 8-point DCT would be:

\[ F(u) = \frac{1}{2} \, \Lambda(u) \sum_{i=0}^{7} f(i) \cos\left[\frac{(2i+1)\,u\pi}{16}\right] \]

where

\[ \Lambda(u) = \begin{cases} \frac{1}{\sqrt{2}} & \text{for } u = 0 \\ 1 & \text{otherwise} \end{cases} \]

Question: What is F[0,0]?

Answer: F[0,0] is the DC component, proportional to the average intensity of the block; the remaining coefficients are the AC components.

The output array of DCT coefficients contains integers; these can range from -1024 to 1023.
It is computationally easier to implement, and more efficient, to regard the DCT as a set of basis functions which, for a known input array size (8 x 8), can be precomputed and stored. This involves simply computing values for a convolution mask (8 x 8 window) that is applied to the image (the pixel values covered by the window are multiplied by the mask values and summed, with the window applied across all rows/columns of the image). The mask values are simply calculated from the DCT formula. The 64 (8 x 8) DCT basis functions are illustrated in Fig 7.9.

DCT basis functions

Why DCT not FFT?

The DCT is similar to the Fast Fourier Transform (FFT), but can approximate linear signals well with fewer coefficients (Fig 7.10).

DCT/FFT Comparison

Computing the 2D DCT

Factoring reduces the problem to a series of 1D DCTs (Fig 7.11), as shown in the sketch after these equations:

apply a 1D DCT (vertically) to the columns;

apply a 1D DCT (horizontally) to the resultant vertical DCT above;

or alternatively, horizontal then vertical.

The equations are given by:

\[ G(u,j) = \frac{1}{2} \, \Lambda(u) \sum_{i=0}^{7} f(i,j) \cos\left[\frac{(2i+1)\,u\pi}{16}\right] \qquad F(u,v) = \frac{1}{2} \, \Lambda(v) \sum_{j=0}^{7} G(u,j) \cos\left[\frac{(2j+1)\,v\pi}{16}\right] \]
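A small C sketch of this row-column factoring (illustrative, not from the notes): a naive 8-point 1D DCT applied first to the rows and then to the columns. For a flat block all of the energy lands in the DC coefficient F[0][0]:

#include <stdio.h>
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define N 8

/* Lambda(u): 1/sqrt(2) for u = 0, 1 otherwise. */
static double lambda(int u) { return u == 0 ? 1.0 / sqrt(2.0) : 1.0; }

/* Naive 8-point 1D DCT: F(u) = 1/2 * Lambda(u) * sum_i f(i) cos((2i+1)u pi/16). */
static void dct1d(const double in[N], double out[N])
{
    for (int u = 0; u < N; u++) {
        double sum = 0.0;
        for (int i = 0; i < N; i++)
            sum += in[i] * cos(M_PI * u * (2 * i + 1) / (2.0 * N));
        out[u] = 0.5 * lambda(u) * sum;
    }
}

/* 2D DCT by factoring: 1D DCT on each row, then on each column. */
static void dct2d(double f[N][N], double F[N][N])
{
    double tmp[N][N], col[N], res[N];

    for (int i = 0; i < N; i++)          /* horizontal pass */
        dct1d(f[i], tmp[i]);

    for (int j = 0; j < N; j++) {        /* vertical pass */
        for (int i = 0; i < N; i++) col[i] = tmp[i][j];
        dct1d(col, res);
        for (int i = 0; i < N; i++) F[i][j] = res[i];
    }
}

int main(void)
{
    double f[N][N], F[N][N];

    /* A flat block of intensity 128: all energy should land in F[0][0]. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            f[i][j] = 128.0;

    dct2d(f, F);
    printf("F[0][0] = %.1f (DC), F[0][1] = %.1f (AC)\n", F[0][0], F[0][1]);
    return 0;
}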

Most software implementations use fixed-point arithmetic. Some fast implementations approximate coefficients so that all multiplies are shifts and adds.

The world record is 11 multiplies and 29 adds (C. Loeffler, A. Ligtenberg and G. Moschytz, "Practical Fast 1-D DCT Algorithms with 11 Multiplications", Proc. Int'l. Conf. on Acoustics, Speech, and Signal Processing 1989 (ICASSP '89), pp. 988-991).

Differential Encoding

This is a simple example of the transform coding mentioned earlier, and an instance of the predictive approach. Here:

The difference between the actual value of a sample and a prediction of that value is encoded. This is also known as predictive encoding.

Examples of the technique include: differential pulse code modulation, delta modulation and adaptive pulse code modulation -- they differ in the prediction part.

It is suitable where successive signal samples do not differ much, but are not zero, e.g. video (difference between frames) and some audio signals.

Differential pulse code modulation (DPCM) uses the simple prediction:

fpredict(ti) = factual(ti-1)

i.e. a simple Markov model where the current value is the prediction for the next value. So we simply need to encode:

∆f(ti) = factual(ti) - fpredict(ti) = factual(ti) - factual(ti-1)

If successive samples are close to each other, we only need to encode the first sample with a large number of bits, as in the example below (a code sketch follows it):

Actual Data:     9 10  7  6
Predicted Data:  0  9 10  7
Difference:     +9 +1 -3 -1
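A minimal C sketch of this DPCM scheme (not from the notes), reproducing the difference row of the table above; the decoder inverts the prediction exactly:

#include <stdio.h>

/* DPCM with the trivial predictor f_predict(t_i) = f_actual(t_i-1).
   The predictor is seeded with 0, so the first difference is the first
   sample itself (the one needing a large number of bits). */
void dpcm_encode(const int *samples, int *diffs, int n)
{
    int predicted = 0;
    for (int i = 0; i < n; i++) {
        diffs[i] = samples[i] - predicted;  /* encode the error only */
        predicted = samples[i];             /* next prediction       */
    }
}

void dpcm_decode(const int *diffs, int *samples, int n)
{
    int predicted = 0;
    for (int i = 0; i < n; i++) {
        samples[i] = predicted + diffs[i];  /* undo the prediction */
        predicted = samples[i];
    }
}

int main(void)
{
    int data[4] = { 9, 10, 7, 6 }, diffs[4], out[4];

    dpcm_encode(data, diffs, 4);
    dpcm_decode(diffs, out, 4);

    for (int i = 0; i < 4; i++)
        printf("%+d ", diffs[i]);   /* prints +9 +1 -3 -1 */
    printf("\n");
    return 0;
}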

Delta modulation is a special case of DPCM: the same predictor function is used, but the coding error is a single bit or digit that indicates whether the current sample should be increased or decreased by a step. It is not suitable for rapidly changing signals.

Adaptive pulse code modulation uses a fuller Markov model: data is extracted from a function of a series of previous values, e.g. the average of the last n samples. The characteristics of the sample are better preserved.

Vector Quantisation

The basic outline of this approach (a sketch of the encoding step follows the list) is:

The data stream is divided into (1D or 2D square) blocks -- vectors.

A table or code book is used to find a pattern for each block.

The code book can be dynamically constructed or predefined.

Each block is then encoded as a look-up value into the table.

Compression is achieved since the data is effectively subsampled and coded at this level.
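A hedged C sketch of the look-up step (the code book here is hypothetical and tiny): each block is mapped to the index of its nearest code book pattern, and only that index is transmitted:

#include <stdio.h>

#define DIM    4   /* block (vector) length: e.g. a 2x2 pixel block */
#define NCODE  3   /* code book size (tiny, for illustration)      */

/* A hypothetical predefined code book of DIM-dimensional patterns. */
static const int codebook[NCODE][DIM] = {
    {   0,   0,   0,   0 },   /* dark block   */
    { 128, 128, 128, 128 },   /* mid block    */
    { 255, 255, 255, 255 },   /* bright block */
};

/* Encode one block as the index of its nearest code book pattern
   (squared-error distance). */
int vq_encode(const int block[DIM])
{
    int best = 0;
    long bestdist = -1;
    for (int c = 0; c < NCODE; c++) {
        long dist = 0;
        for (int i = 0; i < DIM; i++) {
            long d = block[i] - codebook[c][i];
            dist += d * d;
        }
        if (bestdist < 0 || dist < bestdist) { bestdist = dist; best = c; }
    }
    return best;   /* the look-up value actually transmitted */
}

int main(void)
{
    int block[DIM] = { 120, 130, 125, 140 };
    printf("code = %d\n", vq_encode(block));  /* 1: closest to the mid block */
    return 0;
}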

Video Compression
We have studied the theory of encoding now let us see how this is applied in practice.
We need to compress video (and audio) in practice since:
1. Uncompressed video (and audio) data are huge. In HDTV, the bit rate easily exceeds 1 Gbps, posing big problems for storage and network communications. For example:
One of the formats defined for HDTV broadcasting within the United States is 1920 pixels horizontally by 1080 lines vertically, at 30 frames per second. If these numbers are all multiplied together, along with 8 bits for each of the three primary colors, the total data rate required would be approximately 1.5 Gb/sec. Because of the 6 MHz channel bandwidth allocated, each channel will only support a data rate of 19.2 Mb/sec, which is further reduced to 18 Mb/sec by the fact that the channel must also support audio, transport, and ancillary data information. As can be seen, this restriction in data rate means that the original signal must be compressed by a figure of approximately 83:1. This number seems all the more impressive when it is realized that the intent is to deliver very high quality video to the end user, with as few visible artifacts as possible.
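The arithmetic behind those figures, written out:

1920 × 1080 × 30 frames/sec × 24 bits/pixel ≈ 1.49 × 10^9 b/s ≈ 1.5 Gb/sec

(1.49 × 10^9) / (18 × 10^6) ≈ 83, i.e. a required compression ratio of about 83:1.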
2. Lossy methods have to be employed, since the compression ratio of lossless methods (e.g., Huffman, Arithmetic, LZW) is not high enough for image and video compression, especially when the distribution of pixel values is relatively flat.
The following compression types are commonly used in Video compression:

Spatial Redundancy Removal - Intraframe coding (JPEG)


Spatial and Temporal Redundancy Removal - Intraframe and Interframe coding
(H.261, MPEG)

These are discussed in the following sections.

Intra Frame Coding


The term intra frame coding refers to the fact that the various lossless and lossy compression techniques are performed relative to information that is contained only within the current frame, and not relative to any other frame in the video sequence. In other words, no temporal processing is performed outside of the current picture or frame. This mode will be described first because it is simpler, and because non-intra coding techniques are extensions to these basics. Figure 1 shows a block diagram of a basic video encoder for intra frames only. It turns out that this block diagram is very similar to that of a JPEG still image encoder, with only slight implementation differences.

The potential ramifications of this similarity will be discussed later. The basic processing blocks shown are the video filter, discrete cosine transform, DCT coefficient quantizer, and run-length amplitude/variable length coder. These blocks are described individually in the sections below, or have already been described in JPEG Compression.

A basic intra frame coding scheme is as follows:

Macroblocks are 16x16 pixel areas on the Y plane of the original image.

A macroblock usually consists of 4 Y blocks, 1 Cr block, and 1 Cb block.


In the example HDTV data rate calculation shown previously, the pixels were represented as 8-bit values for each of the primary colors red, green, and blue. It turns out that while this may be good for high performance computer generated graphics, it is wasteful in most video compression applications. Research into the Human Visual System (HVS) has shown that the eye is most sensitive to changes in luminance, and less sensitive to variations in chrominance. Since absolute compression is the name of the game, it makes sense that MPEG should operate on a color space that can effectively take advantage of the eye's different sensitivity to luminance and chrominance information. As such, H.261 (and MPEG) uses the YCbCr color space to represent the data values instead of RGB, where Y is the luminance signal, Cb is the blue color difference signal, and Cr is the red color difference signal.
A macroblock can be represented in several different manners when referring to the
YCbCr color space. Figure 7.13 below shows 3 formats known as 4:4:4, 4:2:2, and
4:2:0 video. 4:4:4 is full bandwidth YCbCr video, and each macroblock consists of 4
Y blocks, 4 Cb blocks, and 4 Cr blocks. Being full bandwidth, this format contains as
much information as the data would if it were in the RGB color space. 4:2:2 contains
half as much chrominance information as 4:4:4, and 4:2:0 contains one quarter of the
chrominance information. Although MPEG-2 has provisions to handle the higher
chrominance formats for professional applications, most consumer level products will
use the normal 4:2:0 mode.

Macroblock Video Formats

Because of the efficient manner of luminance and chrominance
representation, the 4:2:0 representation allows an immediate data
reduction from 12 blocks/macroblock to 6 blocks/macroblock, or 2:1
compared to full bandwidth representations such as 4:4:4 or RGB. To
generate this format without generating color aliases or artifacts requires
that the chrominance signals be filtered.

The macroblock is coded as follows (a sketch of these fields as a structure follows the list):

Many macroblocks will be exact matches (or close enough), so send the address of each block in the image -> Addr

Sometimes no good match can be found, so send an INTRA block -> Type

We will want to vary the quantization to fine-tune compression, so send the quantization value -> Quant

Motion vector -> vector

Some blocks in the macroblock will match well, others will match poorly, so send a bitmask indicating which blocks are present (Coded Block Pattern, or CBP)

Send the blocks (4 Y, 1 Cr, 1 Cb) as in JPEG.

Quantization is by a constant value for all DCT coefficients (i.e., no quantization table as in JPEG).
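A hedged C sketch of the information sent per coded macroblock. The field names are illustrative only; they do not follow the actual H.261/MPEG bitstream syntax:

#include <stdint.h>

/* One coded macroblock, as described in the list above. */
struct macroblock {
    uint16_t addr;           /* Addr: position of the macroblock in the image  */
    uint8_t  type;           /* Type: INTRA, or predicted with a motion vector */
    uint8_t  quant;          /* Quant: quantizer value for all coefficients    */
    int8_t   mv_x, mv_y;     /* vector: motion vector (predicted blocks only)  */
    uint8_t  cbp;            /* CBP: bitmask of which of the 6 blocks are sent */
    int16_t  block[6][8][8]; /* 4 Y, 1 Cr, 1 Cb blocks of DCT coefficients     */
};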

Inter-frame (P-frame) Coding


The previously discussed intra frame coding techniques were limited to processing the video
signal on a spatial basis, relative only to information within the current video frame.
Considerably more compression efficiency can be obtained however, if the inherent temporal,
or time-based redundancies, are exploited as well. Anyone who has ever taken a reel of the
old-style super-8 movie film and held it up to a light can certainly remember seeing that most
consecutive frames within a sequence are very similar to the frames both before and after the
frame of interest. Temporal processing to exploit this redundancy uses a technique known as
block-based motion compensated prediction, using motion estimation. A block diagram of the
basic encoder with extensions for non-intra frame coding techniques is given in Figure 7.14.
Of course, this encoder can also support intra frame coding as a subset.


P-Frame Coding
Starting with an intra, or I frame, the encoder can forward predict a future frame. This is commonly referred to as a P frame, and it may also be predicted from other P frames, although only in a forward time manner. As an example, consider a group of pictures that lasts for 6 frames. In this case, the frame ordering is given as I, P, P, P, P, P, I, P, P, P, P, P, ...

Each P frame in this sequence is predicted from the frame immediately preceding it, whether
it is an I frame or a P frame. As a reminder, I frames are coded spatially with no reference to
any other frame in the sequence.
P-coding can be summarised as follows:


A Coding Example (P-frame)

The previous image is called the reference image.

The image to code is called the target image.

Actually, the difference is encoded.

Subtle points:
1. We need to use the decoded image as the reference image, not the original. Why? (Because the decoder only has the decoded image; predicting from the original would let the encoder and decoder drift apart.)
2. We're using "Mean Absolute Difference" (MAD) to decide the best block. One can also use "Mean Squared Error" (MSE) = sum(E*E). A sketch of the MAD search is given below.

MPEG Video
MPEG compression essentially attempts to overcome some shortcomings of H.261 and JPEG:

Recall the H.261 dependencies:

The problem here is that many macroblocks need information that is not in the reference frame. For example:

The MPEG solution is to add a third frame type: the bidirectional frame, or B-frame.

B-frames search for a matching macroblock in both past and future frames.

A typical pattern is IBBPBBPBB IBBPBBPBB IBBPBBPBB.

The actual pattern is up to the encoder, and need not be regular.

MPEG Video Layers


MPEG video is broken up into a hierarchy of layers to help with error handling, random search and editing, and synchronization, for example with an audio bitstream. From the top level, the first layer is known as the video sequence layer, and is any self-contained bitstream, for example a coded movie or advertisement. The second layer down is the group of pictures, which is composed of one or more groups of intra (I) frames and/or non-intra (P and/or B) pictures that will be defined later. Of course the third layer down is the picture layer itself, and the next layer beneath it is called the slice layer. Each slice is a contiguous sequence of raster-ordered macroblocks, most often on a row basis in typical video applications, but not limited to this by the specification. Each slice consists of macroblocks, which are 16x16 arrays of luminance pixels, or picture data elements, with 2 8x8 arrays of associated chrominance pixels. The macroblocks can be further divided into distinct 8x8 blocks, for further processing such as transform coding. Each of these layers has its own unique 32-bit start code, defined in the syntax to consist of 23 zero bits followed by a one, then followed by 8 bits for the actual start code (i.e., the byte pattern 0x00 0x00 0x01 followed by the code byte). These start codes may have as many zero bits as desired preceding them.

The MPEG Video Bitstream

The MPEG Video Bitstream is summarised as follows:

Public domain tools mpeg_stat and mpeg_bits will analyze a bitstream.

Sequence Information
1. Video Params include width, height, aspect ratio of pixels, picture rate.
2. Bitstream Params are bit rate, buffer size, and constrained parameters flag (meaning the bitstream can be decoded by most hardware).
3. Two types of QTs (quantization tables): one for intra-coded blocks (I-frames) and one for inter-coded blocks (P-frames).

Group of Pictures (GOP) Information
1. Time code: bit field with SMPTE time code (hours, minutes, seconds, frame).
2. GOP Params are bits describing the structure of the GOP: Is the GOP closed? Is there a broken link?

Picture Information
1. Type: I, P, or B-frame?
2. Buffer Params indicate how full the decoder's buffer should be before starting decode.
3. Encode Params indicate whether half-pixel motion vectors are used.

Slice Information
1. Vert Pos: what line does this slice start on?
2. QScale: how is the quantization table scaled in this slice?

Macroblock Information
1. Addr Incr: number of MBs to skip.
2. Type: does this MB use a motion vector? What type?
3. QScale: how is the quantization table scaled in this MB?
4. Coded Block Pattern (CBP): bitmap indicating which blocks are coded.


Simple Audio Compression Methods


Traditional lossless compression methods (Huffman, LZW, etc.) usually don't work well on
audio compression (the same reason as in image compression).
The following are some of the Lossy methods applied to audio compression:

Silence Compression - detect the "silence", similar to run-length coding.

Adaptive Differential Pulse Code Modulation (ADPCM), e.g., in CCITT G.721 - 16 or 32 kbits/sec:
(a) encodes the difference between two consecutive signals, and
(b) adapts the quantization so that fewer bits are used when the value is smaller.

It is necessary to predict where the waveform is headed -> difficult.

Apple has a proprietary scheme called ACE/MACE: a lossy scheme that tries to predict where the wave will go in the next sample. About 2:1 compression. (A toy sketch of the ADPCM idea appears after this list.)

Linear Predictive Coding (LPC) fits the signal to a speech model and then transmits the parameters of the model. It sounds like a computer talking, at 2.4 kbits/sec.

Code Excited Linear Predictor (CELP) does LPC, but also transmits the error term - audio conferencing quality at 4.8 kbits/sec.
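The following C sketch is only a toy illustration of the two ADPCM ideas above (difference coding plus an adaptive quantizer step); it is not the G.721 algorithm, whose predictor and step adaptation are far more elaborate:

#include <stdio.h>
#include <stdlib.h>

/* Encode each difference as a 4-bit code: a sign plus 3 bits of magnitude
   measured in the current step size. The step adapts: it grows after large
   differences and shrinks after small ones, so slowly changing signals are
   coded finely and fast swings are still tracked. */
void adpcm_like_encode(const int *samples, int *codes, int n)
{
    int predicted = 0, step = 4;
    for (int i = 0; i < n; i++) {
        int diff = samples[i] - predicted;
        int sign = diff < 0 ? -1 : 1;
        int mag  = abs(diff) / step;
        if (mag > 7) mag = 7;               /* 3-bit magnitude limit */
        codes[i] = sign * mag;

        predicted += sign * mag * step;     /* track the decoder's state */
        if (mag >= 6) step *= 2;            /* adapt the quantizer step  */
        else if (mag <= 1 && step > 1) step /= 2;
    }
}

int main(void)
{
    int samples[8] = { 0, 3, 8, 20, 60, 140, 150, 152 };
    int codes[8];

    adpcm_like_encode(samples, codes, 8);
    for (int i = 0; i < 8; i++)
        printf("%d ", codes[i]);   /* 4-bit codes instead of full samples */
    printf("\n");
    return 0;
}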
