Lossless Compression Algorithms (Pattern Substitution)
Huffman Coding
Huffman coding is based on the frequency of occurrence of a data item (pixel in images). The
principle is to use a lower number of bits to encode the data that occurs more frequently.
Codes are stored in a Code Book which may be constructed for each image or a set of images.
In all cases the code book plus encoded data must be transmitted to enable decoding.
The Huffman algorithm is now briefly summarised:
http://www.cs.cf.ac.uk/Dave/Multimedia/node237.html
A bottom-up approach
1. Initialization: Put all nodes in an OPEN list, keep it sorted at all times (e.g., ABCDE).
2. Repeat until the OPEN list has only one node left:
(a) From OPEN pick two nodes having the lowest frequencies/probabilities, create a parent
node of them.
(b) Assign the sum of the children's frequencies/probabilities to the parent node and insert it
into OPEN.
(c) Assign code 0, 1 to the two branches of the tree, and delete the children from OPEN.
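The steps above can be sketched in Python. The symbol frequencies for A-E below are made up for illustration, and a min-heap stands in for the always-sorted OPEN list:

```python
import heapq

def huffman_codes(freqs):
    """Build a Huffman code book from a {symbol: frequency} mapping."""
    # Each heap entry: (frequency, tie-breaker, {symbol: code-so-far}).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # lowest frequency
        f2, _, right = heapq.heappop(heap)   # second lowest
        # Assign 0 to one branch, 1 to the other, by prefixing the codes.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        count += 1
        # Parent gets the sum of the children's frequencies.
        heapq.heappush(heap, (f1 + f2, count, merged))
    return heap[0][2]

codes = huffman_codes({"A": 15, "B": 7, "C": 6, "D": 6, "E": 5})
# The most frequent symbol gets the shortest code:
assert len(codes["A"]) < len(codes["E"])
```

Note that the resulting code book is prefix-free: no code is a prefix of another, so the encoded stream can be decoded unambiguously.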
ENCODER
-------
Initialize_model();
while ((c = getc (input)) != eof)
{
    encode (c, output);
    update_model (c);
}

DECODER
-------
Initialize_model();
while ((c = decode (input)) != eof)
{
    putc (c, output);
    update_model (c);
}
The key is to have both encoder and decoder use exactly the same
initialization and update_model routines.
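A toy illustration of why this matters. This is not real adaptive Huffman coding -- the "model" here just ranks symbols by count -- but the encoder and decoder stay in sync precisely because they share the same initialization and update_model:

```python
def make_model():
    return {s: 0 for s in "abc"}   # Initialize_model(): all counts zero

def ranking(model):
    # Code for a symbol = its rank by (count descending, then symbol);
    # a stand-in for rebuilding the Huffman tree after every symbol.
    return sorted(model, key=lambda s: (-model[s], s))

def update_model(sym, model):
    model[sym] += 1

enc_model, dec_model = make_model(), make_model()

codes = []
for c in "cccab":                       # encoder loop
    codes.append(ranking(enc_model).index(c))
    update_model(c, enc_model)

decoded = []
for code in codes:                      # decoder loop mirrors it exactly
    sym = ranking(dec_model)[code]
    decoded.append(sym)
    update_model(sym, dec_model)

assert codes == [2, 0, 0, 1, 2]         # 'c' gets code 0 once it is frequent
assert "".join(decoded) == "cccab"
```

If either side initialized or updated its model differently, the rankings would diverge and the decoder would emit the wrong symbols.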
Arithmetic Coding
Huffman coding and the like use an integer number (k) of bits for each symbol, hence k is
never less than 1. Sometimes, e.g., when sending a 1-bit image, compression becomes
impossible.
Consider a message over two symbols X, Y with

prob(X) = 2/3
prob(Y) = 1/3
Q: How to encode X Y X X Y X ?
Examples:
o The first interval (XXX) has size (2/3)^3 = 8/27, and needs 2 bits -> 2/3 bit per symbol (X)
o The last interval (YYY) has size (1/3)^3 = 1/27, and needs 5 bits
In general, we need ceil(-log2(p)) bits to represent an interval of size p. This approaches
optimal encoding as the message length goes to infinity.
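These figures can be checked directly: the interval for a message is the product of its symbol probabilities, and an interval of size p needs ceil(-log2(p)) bits:

```python
import math

p = {"X": 2 / 3, "Y": 1 / 3}   # the probabilities given above

def interval_size(message):
    # The interval for a message is the product of its symbol probabilities.
    size = 1.0
    for sym in message:
        size *= p[sym]
    return size

def bits_needed(size):
    # ceil(-log2 p) bits are enough to name an interval of size p.
    return math.ceil(-math.log2(size))

assert math.isclose(interval_size("XXX"), 8 / 27)
assert bits_needed(interval_size("XXX")) == 2   # 2/3 bit per symbol
assert bits_needed(interval_size("YYY")) == 5
```

So three X symbols cost only 2 bits in total, which Huffman coding (at least 1 bit per symbol) could never achieve.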
update_model does two things: (a) increment the count, (b) update the
Huffman tree (Fig 7.2).
o
During the updates, the Huffman tree will maintain its sibling
property, i.e. the nodes (internal and leaf) are arranged in order of
increasing weights (see figure).
The Huffman tree could look very different after node swapping
(Fig 7.2), e.g., in the third tree, node A is again swapped and
becomes the #5 node. It is now encoded using only 2 bits.
Note: Code for a particular symbol changes during the adaptive coding process
The original methods are due to Ziv and Lempel in 1977 and 1978. Terry Welch
improved the scheme in 1984 (called LZW compression).
Reference: Terry A. Welch, "A Technique for High Performance Data Compression", IEEE
Computer, Vol. 17, No. 6, 1984, pp. 8-19.
The LZW Compression Algorithm can be summarised as follows:

w = NIL;
while ( read a character k )
{
    if wk exists in the dictionary
        w = wk;
    else
    {
        add wk to the dictionary;
        output the code for w;
        w = k;
    }
}
The original LZW used a dictionary with 4K entries; the first 256 (0-255) are ASCII
codes.
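A runnable sketch of the pseudocode above (for brevity the dictionary is seeded only with the characters that actually occur, rather than all 256 ASCII codes; new entries still start at index 256, as in the trace that follows):

```python
def lzw_compress(data):
    # Seed the dictionary: each single character codes for itself.
    dictionary = {ch: ch for ch in set(data)}
    next_code = 256
    w = ""
    out = []
    for k in data:
        wk = w + k
        if wk in dictionary:
            w = wk
        else:
            out.append(dictionary[w])    # output the code for w
            dictionary[wk] = next_code   # add wk to the dictionary
            next_code += 1
            w = k
    if w:
        out.append(dictionary[w])        # flush the final w at EOF
    return out

out = lzw_compress("^WED^WE^WEE^WEB^WET")
# 7 literal symbols plus 5 dictionary codes:
assert out == ["^", "W", "E", "D", 256, "E", 260, 261, 257, "B", 260, "T"]
```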
Example:
Input string is "^WED^WE^WEE^WEB^WET".
w     k     output  index  symbol
-----------------------------------
NIL   ^
^     W     ^       256    ^W
W     E     W       257    WE
E     D     E       258    ED
D     ^     D       259    D^
^     W
^W    E     256     260    ^WE
E     ^     E       261    E^
^     W
^W    E
^WE   E     260     262    ^WEE
E     ^
E^    W     261     263    E^W
W     E
WE    B     257     264    WEB
B     ^     B       265    B^
^     W
^W    E
^WE   T     260     266    ^WET
T     EOF   T
A 19-symbol input has been reduced to 7-symbol plus 5-code output. Each
code/symbol will need more than 8 bits, say 9 bits.
Usually, compression doesn't start until a large number of bytes (e.g., >
100) are read in.
Example (continued):
Input string is "^WED<256>E<260><261><257>B<260>T".
w      k      output  index  symbol
-----------------------------------
       ^      ^
^      W      W       256    ^W
W      E      E       257    WE
E      D      D       258    ED
D      <256>  ^W      259    D^
<256>  E      E       260    ^WE
E      <260>  ^WE     261    E^
<260>  <261>  E^      262    ^WEE
<261>  <257>  WE      263    E^W
<257>  B      B       264    WEB
B      <260>  ^WE     265    B^
<260>  T      T       266    ^WET
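The decoder can be sketched the same way. Integer codes >= 256 stand for dictionary entries and plain characters for themselves, as in the trace; the decoder rebuilds the dictionary one step behind the compressor:

```python
def lzw_decompress(codes):
    dictionary = {}
    next_code = 256
    w = ""
    out = []
    for k in codes:
        if isinstance(k, int):
            # A code can arrive one step before the decoder enters it
            # (the KwKwK corner case): it must then be w + w[0].
            entry = dictionary[k] if k in dictionary else w + w[0]
        else:
            entry = k                    # a literal character
        out.append(entry)
        if w:
            # Mirror the compressor: it added w + first char of entry.
            dictionary[next_code] = w + entry[0]
            next_code += 1
        w = entry
    return "".join(out)

text = lzw_decompress(["^", "W", "E", "D", 256, "E", 260, 261, 257, "B", 260, "T"])
assert text == "^WED^WE^WEE^WEB^WET"
```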
DCT Encoding
The general equation for a 1D (N data items) DCT is defined by the following
equation:

    F(u) = sqrt(2/N) * C(u) * SUM[i=0..N-1] f(i) * cos( (2i+1) * u * pi / 2N )

The general equation for a 2D (N by M image) DCT is defined by the following equation:

    F(u,v) = sqrt(2/N) * sqrt(2/M) * C(u) * C(v)
             * SUM[i=0..N-1] SUM[j=0..M-1] f(i,j)
               * cos( (2i+1) * u * pi / 2N ) * cos( (2j+1) * v * pi / 2M )

F(u,v) is the DCT coefficient in row u and column v of the DCT matrix.
For most images, much of the signal energy lies at low frequencies; these
appear in the upper left corner of the DCT.
where C(u) = 1/sqrt(2) for u = 0, and C(u) = 1 otherwise (similarly for C(v)).
The output array of DCT coefficients contains integers; these can range
from -1024 to 1023.
It is computationally easier to implement and more efficient to regard the
DCT as a set of basis functions which, given a known input array size (8 x
8), can be precomputed and stored. This involves simply computing values
for a convolution mask (8 x 8 window) that is applied across all
rows/columns of the image (summing the products of the mask values and the
pixels the window overlaps). The values are simply calculated from the DCT
formula. The 64 (8 x 8) DCT basis functions are illustrated in Fig 7.9.
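A sketch of this idea in plain Python: the basis values are computed once from the 1D DCT formula, and the 2D DCT of a block is then the 1D transform applied over the rows, then over the columns:

```python
import math

N = 8
C = [1 / math.sqrt(2)] + [1.0] * (N - 1)   # C(0) = 1/sqrt(2), else 1

# Precompute the basis once: B[u][i] = sqrt(2/N) * C(u) * cos((2i+1) u pi / 2N)
B = [[math.sqrt(2 / N) * C[u] * math.cos((2 * i + 1) * u * math.pi / (2 * N))
     for i in range(N)] for u in range(N)]

def dct2(block):
    # Rows first: rows[j][u] is the 1D DCT of row j.
    rows = [[sum(B[u][i] * block[j][i] for i in range(N)) for u in range(N)]
            for j in range(N)]
    # Then columns, giving the 2D coefficients.
    return [[sum(B[v][j] * rows[j][u] for j in range(N)) for v in range(N)]
            for u in range(N)]

# A uniform block puts all its energy in the DC (top-left) coefficient:
F = dct2([[100.0] * N for _ in range(N)])
assert abs(F[0][0] - 800.0) < 1e-6
```

For a flat block every coefficient except F(0,0) is (numerically) zero, illustrating why the signal energy of smooth images concentrates in the upper-left corner.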
DCT is similar to the Fast Fourier Transform (FFT), but can approximate lines well
with fewer coefficients (Fig 7.10)
DCT/FFT Comparison
Differential Encoding
This is the simple example of transform coding mentioned earlier, and an instance of
predictive coding.

It is suitable where successive signal samples do not differ much, but are not
zero. E.g. video -- difference between frames, some audio signals.

The predictor is simply the previous sample:

    fpredict(t_i) = factual(t_(i-1))

i.e. a simple Markov model where the previous value is used as the prediction of the
current value. So we simply need to encode the difference:

    delta(t_i) = factual(t_i) - fpredict(t_i)

If successive samples are close to each other we only need to encode the first sample with
a large number of bits; the differences can then be coded with far fewer:
Actual Data:    9  10   7   6
Predicted Data: 0   9  10   7
Differences:   +9  +1  -3  -1
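In Python, using 0 as the prediction for the first sample, as in the example:

```python
def diff_encode(samples):
    # Encode each sample as its difference from the previous one;
    # the first prediction is 0, so the first delta is the sample itself.
    prev, deltas = 0, []
    for s in samples:
        deltas.append(s - prev)
        prev = s
    return deltas

def diff_decode(deltas):
    # Invert the encoding by accumulating the differences.
    prev, out = 0, []
    for d in deltas:
        prev += d
        out.append(prev)
    return out

assert diff_encode([9, 10, 7, 6]) == [9, 1, -3, -1]
assert diff_decode([9, 1, -3, -1]) == [9, 10, 7, 6]
```

The small differences can then be entropy-coded with far fewer bits than the raw samples.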
Vector Quantisation
The basic outline of this approach is:
Video Compression
We have studied the theory of encoding now let us see how this is applied in practice.
We need to compress video (and audio) in practice since:
1.
Uncompressed video (and audio) data are huge. In HDTV, the bit rate easily exceeds 1
Gbps. -- big problems for storage and network communications. For example:
One of the formats defined for HDTV broadcasting within the United States is 1920
pixels horizontally by 1080 lines vertically, at 30 frames per second. If these numbers
are all multiplied together, along with 8 bits for each of the three primary colors, the
total data rate required would be approximately 1.5 Gb/sec. Because of the 6 MHz
channel bandwidth allocated, each channel will only support a data rate of 19.2
Mb/sec, which is further reduced to 18 Mb/sec by the fact that the channel must also
support audio, transport, and ancillary data information. As can be seen, this
restriction in data rate means that the original signal must be compressed by a figure
of approximately 83:1. This number seems all the more impressive when it is realized
that the intent is to deliver very high quality video to the end user, with as few visible
artifacts as possible.
2.
Lossy methods have to be employed since the compression ratio of lossless methods
(e.g., Huffman, Arithmetic, LZW) is not high enough for image and video
compression, especially when distribution of pixel values is relatively flat.
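The HDTV figures in point 1 can be verified with a few lines of arithmetic:

```python
# Raw HDTV rate: 1920 x 1080 pixels, 30 frames/sec, 3 colours x 8 bits each.
raw_bps = 1920 * 1080 * 30 * 3 * 8
assert raw_bps == 1_492_992_000            # i.e. approximately 1.5 Gb/sec

# Only 18 Mb/sec of the channel is left for video after audio,
# transport, and ancillary data:
ratio = raw_bps / 18_000_000
assert round(ratio) == 83                  # the approximately 83:1 figure
```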
The following compression types are commonly used in Video compression:
The potential ramifications of this similarity will be discussed later. The basic
processing blocks shown are the video filter, discrete cosine transform, DCT
coefficient quantizer, and run-length amplitude/variable length coder. These
blocks are described individually in the sections below or have already been
described in JPEG Compression.
Because of the efficient manner of luminance and chrominance
representation, the 4:2:0 representation allows an immediate data
reduction from 12 blocks/macroblock to 6 blocks/macroblock, or 2:1
compared to full bandwidth representations such as 4:4:4 or RGB. To
generate this format without generating color aliases or artifacts requires
that the chrominance signals be filtered.
P-Frame Coding
Starting with an intra, or I frame, the encoder can forward predict a future frame.
This is commonly referred to as a P frame, and it may also be predicted from
other P frames, although only in a forward time manner. As an example, consider
a group of pictures that lasts for 6 frames. In this case, the frame ordering is
given as I,P,P,P,P,P,I,P,P,P,P,
Each P frame in this sequence is predicted from the frame immediately preceding it, whether
it is an I frame or a P frame. As a reminder, I frames are coded spatially with no reference to
any other frame in the sequence.
P-coding can be summarised as follows:
Subtle points:
1.
Need to use decoded image as reference image, not original. Why? (The decoder
only has the decoded frames, so the encoder must predict from the same data or
the two will drift apart.)
2.
We're using "Mean Absolute Difference" (MAD) to decide best block. Can
also use "Mean Squared Error" (MSE) = sum(E*E)
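A minimal sketch of the block-matching search using MAD. The 4 x 4 frame, 2 x 2 block, and +/-2 search range here are toy values; real coders use 16 x 16 macroblocks and much larger search windows:

```python
def mad(a, b):
    # Mean Absolute Difference between two equal-sized blocks.
    n = len(a) * len(a[0])
    return sum(abs(x - y) for ra, rb in zip(a, b)
               for x, y in zip(ra, rb)) / n

def best_match(ref, block, top, left, search=2):
    # Exhaustive search for the motion vector (dy, dx) within +/- search
    # pixels that minimises MAD against the reference frame.
    h, w = len(block), len(block[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y and y + h <= len(ref) and 0 <= x and x + w <= len(ref[0]):
                cand = [row[x:x + w] for row in ref[y:y + h]]
                err = mad(block, cand)
                if best is None or err < best[0]:
                    best = (err, (dy, dx))
    return best

# The 2x2 block at (1,1) in the current frame appears shifted one pixel
# right in the reference, so the best vector should be (0, +1):
ref = [[0, 0, 0, 0],
       [0, 0, 5, 6],
       [0, 0, 7, 8],
       [0, 0, 0, 0]]
err, (dy, dx) = best_match(ref, [[5, 6], [7, 8]], top=1, left=1)
assert (dy, dx) == (0, 1) and err == 0.0
```

Swapping `mad` for a mean-squared-error function gives the MSE variant mentioned above; the search structure is unchanged.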
MPEG Video
MPEG compression is essentially an attempt to overcome some shortcomings of H.261 and
JPEG:
The problem here is that many macroblocks need information that is not in the
reference frame.
For example:
Start codes consist of a run of zero bits, a one bit, and then 8 bits for the actual
start code. These start codes may have as many zero bits as desired preceding them.
Sequence Information
1.
Video Params include width, height, aspect ratio of pixels, picture rate.
2.
Bitstream Params are bit rate, buffer size, and constrained parameters
flag (means bitstream can be decoded by most hardware)
3.
Two types of QTs: one for intra-coded blocks (I-frames) and one for inter-coded blocks (P-frames).
GOP Information
1.
Time code: bit field with SMPTE time code (hours, minutes, seconds,
frame).
2.
GOP Params are bits describing the structure of the GOP. Is the GOP closed? Does it
have a broken link?
Picture Information
1.
Type: I, P, or B-frame?
2.
Buffer Params indicate how full decoder's buffer should be before starting
decode.
3.
Encode Params indicate whether half pixel motion vectors are used.
Slice information
1.
Vert Pos: what line does this slice start on?
2.
QScale: How is the quantization table scaled in this slice?
Macroblock information
1.
Addr Incr: number of MBs to skip.
2.
Type: Does this MB use a motion vector? What type?
3.
QScale: How is the quantization table scaled in this MB?
4.
Coded Block Pattern (CBP): bitmap indicating which blocks are coded.
Linear Predictive Coding (LPC) fits the signal to a speech model and then
transmits the parameters of the model. It sounds like a computer talking, at 2.4
kbits/sec.
Code Excited Linear Predictor (CELP) does LPC, but also transmits the error
term -- audio conferencing quality at 4.8 kbits/sec.