Multimedia Compression: Audio, Image and Video Require Vast Amounts of Data

Multimedia Compression
Audio, image and video require vast amounts of data
320x240x8 bits grayscale image: 77 KB
1100x900x24 bits color image: 3 MB
640x480x24 bits x 30 frames/sec: 27.6 MB/sec
Low network bandwidth does not allow real-time video transmission
Slow storage or processing devices do not allow fast playback
Compression reduces storage requirements
E.G.M. Petrakis Multimedia Compression 1
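The sizes quoted on the first slide follow from width x height x bits per pixel (x frames per second for video). A minimal sketch of that arithmetic, assuming decimal units (1 KB = 1000 bytes, 1 MB = 10^6 bytes); the helper name `raw_size_bytes` is illustrative and not from the slides:

```python
def raw_size_bytes(width, height, bits_per_pixel, frames=1):
    """Uncompressed size in bytes of `frames` images of the given dimensions."""
    return width * height * bits_per_pixel * frames // 8

gray = raw_size_bytes(320, 240, 8)         # 76,800 bytes, about 77 KB
color = raw_size_bytes(1100, 900, 24)      # 2,970,000 bytes, about 3 MB
video = raw_size_bytes(640, 480, 24, 30)   # 27,648,000 bytes per second, about 27.6 MB/s
```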
Classification of Techniques
Lossless: recover the original representation exactly
Lossy: recover a representation similar to the original one; higher compression ratios, more common in practice
Hybrid: JPEG, MPEG and px64 combine several approaches
Compression Standards
[Figure from Furht et al. 96]

Lossless Techniques
[Figure from Furht et al. 96]

Lossy Techniques
[Figure from Furht et al. 96]

JPEG Modes of Operation
Sequential DCT: the image is encoded in one left-to-right, top-to-bottom scan
Progressive DCT: the image is encoded in multiple scans (if the transmission time is long, a rough decoded image can be reproduced early)
Hierarchical: encoding at multiple resolutions
Lossless: exact reproduction
JPEG Block Diagrams
[Figure from Furht et al. 96]

JPEG Encoder
Three main blocks:
Forward Discrete Cosine Transform (FDCT)
Quantizer
Entropy Encoder
Essentially the sequential JPEG encoder
Main component of the progressive, lossless and hierarchical encoders
Used for both gray-level and color images

Sequential JPEG
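Pixels in $[0, 2^p - 1]$ are shifted to $[-2^{p-1}, 2^{p-1} - 1]$
The image is divided into 8x8 blocks
Each 8x8 block is DCT transformed:
$$F(u,v) = \frac{C(u)}{2}\,\frac{C(v)}{2} \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y) \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}$$
$$C(u) = \begin{cases} 1/\sqrt{2} & \text{for } u = 0 \\ 1 & \text{for } u > 0 \end{cases} \qquad C(v) = \begin{cases} 1/\sqrt{2} & \text{for } v = 0 \\ 1 & \text{for } v > 0 \end{cases}$$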
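The 8x8 forward DCT can be evaluated directly from its definition, with scale factor C(u)C(v)/4 and C(0) = 1/sqrt(2). The sketch below is a plain, unoptimized evaluation (real encoders use fast factorizations); the function name `fdct_8x8` and the constant test block are illustrative assumptions.

```python
import math

def fdct_8x8(block):
    """Forward DCT of an 8x8 block of level-shifted samples (list of 8 rows)."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = 0.0
            for x in range(8):
                for y in range(8):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = c(u) * c(v) / 4 * s
    return out

# A constant block has only a DC component: F(0,0) = 8 * 10 = 80.
flat = [[10] * 8 for _ in range(8)]
coeffs = fdct_8x8(flat)
```

For a constant block all AC coefficients vanish (up to floating-point error), which is the simplest case of the energy compaction the DCT is used for.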
DCT Coefficients
F(0,0) is the DC coefficient: proportional to the average value over the 64 samples (8 times the mean)
The remaining 63 coefficients are the AC coefficients
Pixels in [-128,127]: DCT coefficients in [-1024,1023]
Most coefficients are 0 or close to 0 and need not be encoded
This concentration of energy in a few coefficients is what makes compression possible
Quantization Step
All 64 DCT coefficients are quantized
Fq(u,v) = Round[F(u,v)/Q(u,v)]
Reduces to 0 the amplitude of coefficients that contribute little or nothing to image quality
Discards information that is not visually significant
The quantization steps Q(u,v) are specified by quantization tables
JPEG allows up to 4 quantization tables
Quantization Tables
[Figure from Furht et al. 96]
for (i = 0; i < 8; i++)
    for (j = 0; j < 8; j++)
        Q[i][j] = 1 + (1 + i + j) * quality;

quality = 1: best quality, lowest compression
quality = 25: poor quality, highest compression
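A short sketch of the quantization step, assuming the quality-driven table Q(i,j) = 1 + (1 + i + j) * quality on an 8x8 grid. The names `quant_table` and `quantize` are illustrative. Note that Python's built-in `round` rounds exact halves to the nearest even integer, which may differ from the rounding rule of a particular encoder.

```python
def quant_table(quality):
    """8x8 table whose step size grows with frequency index (i + j) and with `quality`."""
    return [[1 + (1 + i + j) * quality for j in range(8)] for i in range(8)]

def quantize(F, Q):
    """Fq(u,v) = Round[F(u,v) / Q(u,v)] for an 8x8 block of DCT coefficients."""
    return [[round(F[u][v] / Q[u][v]) for v in range(8)] for u in range(8)]

Q = quant_table(2)
F = [[0.0] * 8 for _ in range(8)]
F[0][0] = 80.0    # large low-frequency coefficient survives
F[7][7] = 10.0    # small high-frequency coefficient is quantized to 0
Fq = quantize(F, Q)
```

With quality = 2 the step size ranges from 3 at (0,0) to 31 at (7,7), so the same amplitude is kept at low frequencies and dropped at high ones.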

AC Coefficients
The 63 AC coefficients are ordered by a zig-zag sequence [Figure from Furht et al. 96]
The zig-zag sequence places low frequencies before high frequencies
High frequencies are likely to be 0
Sequences of such 0 coefficients are encoded with fewer bits
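The zig-zag sequence can be generated by walking the anti-diagonals of the 8x8 block and alternating direction on each one. A minimal sketch (function names are illustrative):

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):                    # s = row + col indexes one anti-diagonal
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]   # row increasing
        if s % 2 == 0:
            diag.reverse()                        # even diagonals run bottom-left to top-right
        order.extend(diag)
    return order

def zigzag_scan(block):
    """Flatten an 8x8 block into a list of 64 values in zig-zag order."""
    return [block[i][j] for i, j in zigzag_order(len(block))]

# Label each position with its row-major index to make the order visible.
block = [[8 * i + j for j in range(8)] for i in range(8)]
scan = zigzag_scan(block)
```

The first element of the scan is the DC coefficient; the remaining 63 are the AC coefficients from low to high frequency.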
DC Coefficients
Predictive coding of DC coefficients
Adjacent blocks have similar DC intensities
Coding the differences between consecutive DC values yields high compression
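Predictive coding of the DC coefficients amounts to storing each block's DC value as the difference from the previous block's DC value. A sketch, assuming the predictor starts at 0 for the first block; the DC values used here are made up for illustration.

```python
def dc_differences(dc_values):
    """Replace each DC value by its difference from the previous block's DC value."""
    diffs = []
    previous = 0                      # assumed initial predictor
    for dc in dc_values:
        diffs.append(dc - previous)
        previous = dc
    return diffs

def dc_reconstruct(diffs):
    """Inverse operation: running sum of the differences."""
    values = []
    previous = 0
    for d in diffs:
        previous += d
        values.append(previous)
    return values

diffs = dc_differences([52, 55, 54, 54])   # hypothetical DC values of 4 adjacent blocks
```

After the first block the differences are small numbers, which need fewer bits than the DC values themselves.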

Entropy Encoding
Encodes sequences of quantized DCT coefficients into binary sequences
AC: (runlength, size) (amplitude)
DC: (size, amplitude)
runlength: number of consecutive 0s before the amplitude, up to 15; takes up to 4 bits for coding
Longer runs are split: (39,4)(12) = (15,0)(15,0)(7,4)(12), where each (15,0) stands for 16 zeros
amplitude: first non-zero value after the run
size: number of bits needed to encode the amplitude
0 0 0 0 0 0 476: (6,9)(476)
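The (runlength, size)(amplitude) symbols can be produced with a single pass over the zig-zag ordered AC coefficients. The sketch below represents each symbol as a tuple `(runlength, size, amplitude)` and uses `(15, 0, None)` for a run of 16 zeros; both the tuple layout and the function names are assumptions made for illustration. Trailing zeros are simply dropped here, whereas a real JPEG encoder emits an end-of-block symbol for them.

```python
def size_of(value):
    """Number of bits needed to represent the magnitude of `value`."""
    return abs(value).bit_length()

def ac_symbols(ac):
    """Turn a list of quantized AC coefficients into (runlength, size, amplitude) tuples."""
    symbols = []
    run = 0
    for value in ac:
        if value == 0:
            run += 1
            continue
        while run > 15:                      # runlength is limited to 4 bits
            symbols.append((15, 0, None))    # stands for 16 consecutive zeros
            run -= 16
        symbols.append((run, size_of(value), value))
        run = 0
    return symbols

long_run = ac_symbols([0] * 39 + [12])
short_run = ac_symbols([0] * 6 + [476])
```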
Huffman coding
Converts each symbol sequence into binary
The DC coefficient is coded first, followed by the ACs
Huffman tables are specified in JPEG
Each (runlength, size) is encoded using Huffman coding
Each (amplitude) is encoded using a variable-length integer code
(1,4)(12) => (11111101101100)
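The variable-length integer code for an amplitude uses exactly `size` bits. The sketch below assumes the usual JPEG convention: a positive amplitude is written in plain binary, and a negative amplitude is written as the binary of (amplitude + 2^size - 1). The function name `amplitude_bits` is illustrative.

```python
def amplitude_bits(value):
    """Bit string of the variable-length integer code for a non-zero amplitude."""
    size = abs(value).bit_length()
    if size == 0:
        return ""                      # amplitude 0 carries no extra bits
    if value < 0:
        value += (1 << size) - 1       # map negative values below 2^(size-1)
    return format(value, "0{}b".format(size))

bits_12 = amplitude_bits(12)           # the amplitude from the example above
```

For the amplitude 12 this yields the four bits `1100`, which are the last four bits of the binary string in the (1,4)(12) example.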
Example of Huffman table
[Figure from Furht et al. 96]

JPEG Encoding of an 8x8 block
[Figure from Furht et al. 96]

Compression Measures
Compression ratio (CR): increases with higher compression
CR = OriginalSize/CompressedSize
Root Mean Square Error (RMS): better quality with lower RMS
$$\mathrm{RMS} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (X_i - x_i)^2}$$
$X_i$: original pixel values
$x_i$: restored pixel values
$n$: total number of pixels
[Figure from Furht et al. 96]
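Both measures are straightforward to compute. The sketch below assumes the standard reading of RMS, the square root of the mean squared difference between original and restored pixel values; the sample values are made up for illustration.

```python
import math

def compression_ratio(original_size, compressed_size):
    """CR = OriginalSize / CompressedSize (same units for both)."""
    return original_size / compressed_size

def rms_error(original, restored):
    """Root mean square error between two equally long pixel sequences."""
    n = len(original)
    return math.sqrt(sum((X - x) ** 2 for X, x in zip(original, restored)) / n)

cr = compression_ratio(76800, 7680)                   # hypothetical 10:1 compression
rms = rms_error([10, 20, 30, 40], [12, 18, 30, 40])   # squared errors: 4, 4, 0, 0
```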

JPEG Decoder
The same steps in reverse order
The binary sequences are converted to symbol sequences using the Huffman tables
Dequantization: F(u,v) = Fq(u,v) Q(u,v)
Inverse DCT:
$$f(x,y) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C(u)\,C(v)\,F(u,v) \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}$$
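The two decoder steps, dequantization and inverse DCT, can be sketched directly from the definitions, with C(0) = 1/sqrt(2) and C(k) = 1 otherwise. Function names and the test data are illustrative; this is a direct evaluation, not a fast transform.

```python
import math

def dequantize(Fq, Q):
    """F(u,v) = Fq(u,v) * Q(u,v): approximate recovery of the DCT coefficients."""
    return [[Fq[u][v] * Q[u][v] for v in range(8)] for u in range(8)]

def idct_8x8(F):
    """Inverse DCT of an 8x8 block of coefficients, returning level-shifted samples."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * 8 for _ in range(8)]
    for x in range(8):
        for y in range(8):
            s = 0.0
            for u in range(8):
                for v in range(8):
                    s += (c(u) * c(v) * F[u][v]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[x][y] = s / 4
    return out

# Only the DC coefficient is non-zero: every restored sample equals F(0,0) / 8.
Q = [[3] * 8 for _ in range(8)]
Fq = [[0] * 8 for _ in range(8)]
Fq[0][0] = 27
pixels = idct_8x8(dequantize(Fq, Q))
```

Because quantization rounded the coefficients, the restored samples are in general only close to the originals, which is where the loss in lossy JPEG comes from.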


Progressive JPEG
When image encoding or transmission takes a long time, it is useful to produce first an approximation of the original image that is then improved gradually
[Figure from Furht et al. 96]

Progressive Spectral Selection
The DCT coefficients are grouped into several bands
Low-frequency bands are sent first
band1: DC coefficient only
band2: AC1,AC2 coefficients
band3: AC3, AC4, AC5, AC6 coefficients
band4: AC7, AC8 coefficients

Lossless JPEG
Simple predictive encoding
[Figure: prediction schemes, Furht et al. 96]

Hierarchical JPEG
Produces a set of images at multiple resolutions
Begins with small images (obtained by down-sampling) and continues with larger images
The reduced image is scaled up to the next resolution and used as a predictor for the higher-resolution image

Encoding
1. Down-sample the image by $2^a$ in each of x, y
2. Encode the reduced-size image (sequential, progressive, ...)
3. Up-sample the reduced image by 2
4. Interpolate by 2 in x, y
5. Use the up-sampled image as a predictor
6. Encode the differences (predictive coding)
7. Go to step 1 until the full resolution is encoded
[Figure from Furht et al. 96]
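One level of the hierarchical scheme can be sketched as: down-sample, up-sample the reduced image to form a predictor, and keep only the differences. The sketch below makes several simplifying assumptions that are not in the slides: down-sampling keeps every second pixel, up-sampling replicates pixels instead of interpolating, image dimensions are even, and the reduced image and the differences are kept as plain lists rather than being JPEG encoded.

```python
def downsample2(img):
    """Keep every second pixel in each direction."""
    return [row[::2] for row in img[::2]]

def upsample2(img):
    """Scale up by 2 in each direction by pixel replication."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def encode_level(img):
    """Return the reduced image and the differences from its up-sampled version."""
    small = downsample2(img)
    predictor = upsample2(small)
    diff = [[a - b for a, b in zip(r_img, r_pred)] for r_img, r_pred in zip(img, predictor)]
    return small, diff

def decode_level(small, diff):
    """Rebuild the higher-resolution image from the reduced image and the differences."""
    predictor = upsample2(small)
    return [[p + d for p, d in zip(r_pred, r_diff)] for r_pred, r_diff in zip(predictor, diff)]

img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
small, diff = encode_level(img)
```

A decoder can display `small` first and refine it once the differences for the next resolution arrive.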

JPEG for Color images
Encoding of 3 bands (RGB, HSV etc.) in two ways:
Non-interleaved data ordering: encodes each band separately
Interleaved data ordering: different bands are combined into Minimum Coded Units (MCUs)
Interleaving makes it possible to display, print or transmit images in parallel with decompression

Interleaved JPEG
Minimum Coded Unit (MCU): the smallest group of interleaved data blocks (8x8)
[Figure from Furht et al. 96]

Video Compression
Various video encoding standards: QuickTime, DVI, H.261, MPEG etc.
Basic idea: compute motion between adjacent frames and transmit only the differences
Motion is computed between blocks
Effective encoding of camera and object motion

MPEG
The Moving Picture Coding Experts Group (MPEG) is a working group for the development of standards for compression, decompression, processing, and coded representation of moving pictures and audio
MPEG groups are open and have attracted large participation
http://mpeg.telecomitalialab.com
MPEG Features
Random access
Fast forward / reverse searches
Reverse playback
Audio visual synchronization
Robustness to errors
Editability
Cost trade-off

MPEG-1, 2
At least 4 MPEG standards finished or under construction
MPEG-1: storage and retrieval of moving pictures and audio on storage media
352x288 pixels/frame, 25 fps, at 1.5 Mbps
Real-time encoding even on an old PC
MPEG-2: higher quality, same principles
720x576 pixels/frame, 2-80 Mbps

MPEG-4
Encodes video content as objects
Based on identifying, tracking and encoding object layers which are rendered on top of each other
Enables objects to be manipulated individually or collectively in an audiovisual scene (interactive video)
Only a few implementations
Higher compression ratios
MPEG-7
Standard for the description of multimedia content
XML Schema for content description
Does not standardize the extraction of descriptions
MPEG-1, 2 and 4 make content available
MPEG-7 makes content semantics available
MPEG-1, 2 Compression
Compression of full-motion video by interframe compression: stores differences between frames
A stream contains I, P and B frames in a given pattern
Corresponding blocks are compared; motion vectors are computed and stored in P and B frames
[Figure from Furht et al. 96]

Frame Structures
I frames: self-contained, JPEG encoded
Random access frames in MPEG streams
Low compression
P frames: predictive coding with reference to the previous I or P frame
Higher compression
B frames: bidirectional or interpolated coding using past and future I or P frames
Highest compression

Example of MPEG Stream
[Figure from Furht et al. 96]
B frames 2, 3, 4 are bi-directionally coded using I frame 1 and P frame 5
P frame 5 must be decoded before B frames 2, 3, 4
I frame 9 must be decoded before B frames 6, 7, 8
Frame order for transmission: 1 5 2 3 4 9 6 7 8
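The transmission order follows from one rule: every B frame must be sent after both of its reference frames. A minimal sketch that reorders a display-order pattern accordingly; the function name and the string encoding of the pattern are illustrative assumptions.

```python
def transmission_order(frame_types):
    """Map a display-order pattern such as 'IBBBPBBBI' to 1-based frame numbers
    in the order they must be transmitted and decoded."""
    order = []
    pending_b = []                        # B frames waiting for their future reference
    for number, kind in enumerate(frame_types, start=1):
        if kind == "B":
            pending_b.append(number)
        else:                             # I or P frame: send it, then the waiting B frames
            order.append(number)
            order.extend(pending_b)
            pending_b = []
    order.extend(pending_b)               # trailing B frames lack a future reference
    return order

order = transmission_order("IBBBPBBBI")
```

For the stream in the example this reproduces the order 1 5 2 3 4 9 6 7 8.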
MPEG Coding Sequences
The MPEG application determines the sequence of I, P, B frames
For fast random access, code the whole video as I frames (MJPEG)
High compression is achieved by using a large number of B frames
Good sequence: (IBBPBBPBB)(IBBPBBPBB)...
Motion Estimation
The motion estimator finds the best matching block for each block of a P or B frame
Block: 8x8 or 16x16 pixels
P frames use only forward prediction: a block in the current frame is predicted from a past frame
B frames use forward prediction, backward prediction or prediction by interpolation: the average of the forward and backward predicted blocks
Motion Vectors
[Figure from Furht et al. 96; block: 16x16 pixels]
One or two motion vectors per block
One vector for forward predicted P or B frames or backward predicted B frames
Two vectors for interpolated B frames

MPEG Encoding
I frames are JPEG compressed
P, B frames are encoded in terms of previous or future frames
Motion vectors are estimated and the differences between predicted and actual blocks are computed
These error terms are DCT encoded
Entropy encoding produces a compact binary code
Special cases: static and intracoded blocks

MPEG Encoder
[Figure from Furht et al. 96; label: JPEG encoding]

MPEG Decoder
[Figure from Furht et al. 96]

Motion Estimation Techniques
Not specified by MPEG
Block matching techniques
Estimate the motion of an n x m block in the present frame in relation to pixels in previous or future frames
The block is compared with candidate blocks of a previous or future frame within a search area of size (m+2p) x (n+2p)
Typical values: m = n = 16, p = 6
Block Matching
[Figure from Furht et al. 96: search area in block matching techniques]
Typical case: n = m = 16, p = 6
F: block in current frame
G: search area in previous (or future) frame

Cost functions
The block is assumed to have moved to the position that minimizes a cost function
I. Mean Absolute Difference (MAD)
$$\mathrm{MAD}(dx,dy) = \frac{1}{mn} \sum_{i=-n/2}^{n/2} \sum_{j=-m/2}^{m/2} \left| F(i,j) - G(i+dx, j+dy) \right|$$
F(i,j): a block in the current frame
G(i,j): the same block in a previous or future frame
(dx,dy): vector for the search location
dx, dy in [-p, p]
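A sketch of the MAD cost for one candidate displacement. It makes some assumptions for illustration: frames are lists of rows, the block is addressed by its top-left corner (`top`, `left`) instead of its center, and `dx` displaces the first (row) index, following the pairing of i with dx in the formula. The candidate block is assumed to lie inside the reference frame.

```python
def mad(cur, ref, top, left, dx, dy, n=16, m=16):
    """Mean absolute difference between the n x m block of `cur` at (top, left)
    and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for i in range(n):
        for j in range(m):
            total += abs(cur[top + i][left + j] - ref[top + i + dx][left + j + dy])
    return total / (n * m)

# Reference frame: a simple ramp. Current frame: the same content displaced by (1, 2).
ref = [[8 * r + c for c in range(8)] for r in range(8)]
cur = [[ref[r + 1][c + 2] if r + 1 < 8 and c + 2 < 8 else 0 for c in range(8)]
       for r in range(8)]

cost_true = mad(cur, ref, 2, 2, 1, 2, n=2, m=2)    # candidate equal to the true motion
cost_zero = mad(cur, ref, 2, 2, 0, 0, n=2, m=2)    # candidate assuming no motion
```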

More Cost Functions
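II. Mean Squared Difference (MSD)
$$\mathrm{MSD}(dx,dy) = \frac{1}{mn} \sum_{i=-n/2}^{n/2} \sum_{j=-m/2}^{m/2} \left[ F(i,j) - G(i+dx, j+dy) \right]^2$$
III. Cross-Correlation Function (CCF)
$$\mathrm{CCF}(dx,dy) = \frac{\sum_i \sum_j F(i,j)\, G(i+dx, j+dy)}{\left[ \sum_i \sum_j F^2(i,j) \right]^{1/2} \left[ \sum_i \sum_j G^2(i+dx, j+dy) \right]^{1/2}}$$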
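MSD and the normalized cross-correlation can be sketched in the same way. The conventions are assumptions for illustration: blocks are addressed by their top-left corner, `dx` displaces the row index, and the cross-correlation is treated as a similarity measure (1.0 for identical blocks, so the best match maximizes it). The cross-correlation is undefined when either block is all zeros.

```python
import math

def block_pairs(cur, ref, top, left, dx, dy, n, m):
    """Yield (F, G) pixel pairs for the current block and the displaced reference block."""
    for i in range(n):
        for j in range(m):
            yield cur[top + i][left + j], ref[top + i + dx][left + j + dy]

def msd(cur, ref, top, left, dx, dy, n=16, m=16):
    """Mean squared difference."""
    return sum((f - g) ** 2 for f, g in block_pairs(cur, ref, top, left, dx, dy, n, m)) / (n * m)

def ccf(cur, ref, top, left, dx, dy, n=16, m=16):
    """Normalized cross-correlation between the two blocks."""
    pairs = list(block_pairs(cur, ref, top, left, dx, dy, n, m))
    num = sum(f * g for f, g in pairs)
    den = math.sqrt(sum(f * f for f, _ in pairs)) * math.sqrt(sum(g * g for _, g in pairs))
    return num / den

ref = [[8 * r + c for c in range(8)] for r in range(8)]
cur = [[ref[r + 1][c + 2] if r + 1 < 8 and c + 2 < 8 else 0 for c in range(8)]
       for r in range(8)]
```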

More Cost Functions
IV. Pixel Difference Classification (PDC)
$$\mathrm{PDC}(dx,dy) = \sum_i \sum_j T(dx,dy,i,j)$$
$$T(dx,dy,i,j) = \begin{cases} 1 & \text{if } |F(i,j) - G(i+dx, j+dy)| \le t \\ 0 & \text{otherwise} \end{cases}$$
t: predefined threshold
Each pixel is classified as a matching pixel (T=1) or a mismatching pixel (T=0)
The matching block maximizes PDC
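PDC simply counts the pixels whose absolute difference stays within the threshold t. A sketch under illustrative conventions (top-left block addressing, `dx` displacing the row index, and a threshold value chosen arbitrarily for the example):

```python
def pdc(cur, ref, top, left, dx, dy, t, n=16, m=16):
    """Number of matching pixels: those with |F - G| <= t."""
    matches = 0
    for i in range(n):
        for j in range(m):
            diff = abs(cur[top + i][left + j] - ref[top + i + dx][left + j + dy])
            if diff <= t:
                matches += 1
    return matches

ref = [[8 * r + c for c in range(8)] for r in range(8)]
cur = [[ref[r + 1][c + 2] if r + 1 < 8 and c + 2 < 8 else 0 for c in range(8)]
       for r in range(8)]

best = pdc(cur, ref, 2, 2, 1, 2, t=3, n=2, m=2)    # all 4 pixels match at the true motion
worst = pdc(cur, ref, 2, 2, 0, 0, t=3, n=2, m=2)   # every pixel differs by 10
```

Unlike MAD and MSD, the best candidate is the one with the largest value.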
Block Matching Techniques
Exhaustive: very slow but accurate
Approximation: faster but less accurate
Three-step search
2-D logarithmic search
Conjugate direction search
Parallel hierarchical 1-D search (not discussed here)
Pixel difference classification (not discussed here)

Exhaustive Search
Evaluates the cost function at every location in the search area
Requires $(2p+1)^2$ computations of the cost function
For p = 6 this is 169 computations per block
Very simple to implement but very slow
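Exhaustive search is a double loop over all displacements in [-p, p], keeping the one with the lowest cost. The sketch below uses MAD as the cost and skips candidates that would fall outside the reference frame; the function names, the top-left block addressing and the synthetic ramp frames are assumptions made for illustration.

```python
def mad(cur, ref, top, left, dx, dy, n, m):
    """Mean absolute difference for one candidate displacement."""
    total = 0
    for i in range(n):
        for j in range(m):
            total += abs(cur[top + i][left + j] - ref[top + i + dx][left + j + dy])
    return total / (n * m)

def exhaustive_search(cur, ref, top, left, p=6, n=16, m=16):
    """Return (best vector, best cost, number of cost evaluations)."""
    rows, cols = len(ref), len(ref[0])
    best_cost, best_vector, evaluations = None, None, 0
    for dx in range(-p, p + 1):
        for dy in range(-p, p + 1):
            if not (0 <= top + dx <= rows - n and 0 <= left + dy <= cols - m):
                continue                          # candidate outside the frame
            cost = mad(cur, ref, top, left, dx, dy, n, m)
            evaluations += 1
            if best_cost is None or cost < best_cost:
                best_cost, best_vector = cost, (dx, dy)
    return best_vector, best_cost, evaluations

# Synthetic frames: the current frame is the reference displaced by (3, 1).
ref = [[32 * r + c for c in range(32)] for r in range(32)]
cur = [[ref[r + 3][c + 1] if r + 3 < 32 and c + 1 < 32 else 0 for c in range(32)]
       for r in range(32)]
vector, cost, evaluations = exhaustive_search(cur, ref, 12, 12, p=6, n=8, m=8)
```

For a block well inside the frame and p = 6, the loop evaluates the cost (2p+1)^2 = 169 times.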

Three-Step Search
Computes the cost function at the center and 8 surrounding locations in the search area
The location with the minimum cost becomes the center location for the next step
The search range is reduced by half

Three-Step Motion Vector Estimation (p=6)
[Figure from Furht et al. 96]

Three-Step Search
1. Compute the cost (MAD) at 9 locations: the center + 8 locations at distance 3 from the center
2. Pick the location with minimum MAD and recompute MAD at 9 locations at distance 2 from this new center
3. Pick the location with minimum MAD and do the same at distance 1 from the center
The smallest MAD over all locations indicates the final estimate
In the example: M24 at (dx,dy) = (1,6)
Requires 25 computations of MAD (9 + 8 + 8, since each new center has already been evaluated)
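A sketch of the three-step search for p = 6, using step sizes 3, 2 and 1 and MAD as the cost. Function names, top-left block addressing and the synthetic frames are illustrative assumptions; candidates outside the reference frame are skipped. Being an approximation, the search is not guaranteed to find the global minimum on arbitrary image content.

```python
def mad(cur, ref, top, left, dx, dy, n, m):
    """Mean absolute difference for one candidate displacement."""
    total = 0
    for i in range(n):
        for j in range(m):
            total += abs(cur[top + i][left + j] - ref[top + i + dx][left + j + dy])
    return total / (n * m)

def three_step_search(cur, ref, top, left, n=16, m=16, steps=(3, 2, 1)):
    """Return (best vector, best cost, number of cost evaluations)."""
    rows, cols = len(ref), len(ref[0])
    best_dx, best_dy = 0, 0
    best_cost = mad(cur, ref, top, left, 0, 0, n, m)
    evaluations = 1
    for step in steps:
        center_dx, center_dy = best_dx, best_dy       # center is fixed during a step
        for ddx in (-step, 0, step):
            for ddy in (-step, 0, step):
                if ddx == 0 and ddy == 0:
                    continue                          # center was already evaluated
                dx, dy = center_dx + ddx, center_dy + ddy
                if not (0 <= top + dx <= rows - n and 0 <= left + dy <= cols - m):
                    continue                          # candidate outside the frame
                cost = mad(cur, ref, top, left, dx, dy, n, m)
                evaluations += 1
                if cost < best_cost:
                    best_cost, best_dx, best_dy = cost, dx, dy
    return (best_dx, best_dy), best_cost, evaluations

# Synthetic frames: the current frame is the reference displaced by (3, 1).
ref = [[32 * r + c for c in range(32)] for r in range(32)]
cur = [[ref[r + 3][c + 1] if r + 3 < 32 and c + 1 < 32 else 0 for c in range(32)]
       for r in range(32)]
vector, cost, evaluations = three_step_search(cur, ref, 12, 12, n=8, m=8)
```

On this synthetic example the search reaches the true displacement with 25 cost evaluations, compared with 169 for the exhaustive search over the same range.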
2-D Logarithmic Search
Combines a cost function with a predefined threshold T
Check the cost at M(0,0) and at 2 horizontal and 2 vertical locations, and take the minimum
If the cost at any location is less than T, the search is complete
If not, search again along the direction of minimum cost, within a smaller region
[Figure from Furht et al. 96]
if cost at M(0,0) < T then search ends!
compute cost at M1, M2, M3, M4; take their min;
if min cost < cost at M(0,0)
    if (min cost < T) then search ends!
    else compute cost along the direction of minimum cost (M5, M6 in the example);
else compute cost in the neighborhood of the min cost within p/2 (M5 in the example)

Conjugate Direction Search
[Figure from Furht et al. 96]
Repeat
find min MAD along dx = 0, -1, 1 (y fixed): M(1,0) in the example
find min MAD along dy = 0, -1, 1 starting from the previous min (x fixed): M(2,2) in the example
search similarly along the direction connecting the above minima
Other Compression Techniques
Digital Video Interactive (DVI): similar to MPEG-2
Fractal Image Compression
Find regions resembling fractals
Image representation at various resolutions
Sub-band image and video coding
Split signal into smaller frequency bands
Wavelet-based coding
References
B. Furht, S. W. Smoliar, H.-J. Zhang, Video and Image Processing in Multimedia Systems, Kluwer Academic Publishers, 1996
