
Saturn Software Mills

Visit for more technical articles

Introduction to MPEG Video Stream

Digital techniques have made rapid progress in audio and
video. Digital information is more robust and more error
resilient than analog signals, which means that generation
losses during recording and losses in transmission can be
eliminated. The compact disc (CD) was the first consumer
product to demonstrate this. Digital recording and
transmission techniques also allow content manipulation
that is not possible in analog. Once audio or video is
digitized, the contents are in the form of data, and such
data can be handled in the same way as any other kind of
data. However, production-standard digital video generates
over 200 megabits per second of data, and this bit rate
requires extensive capacity for storage and wide bandwidth
for transmission. This storage and bandwidth requirement
can be reduced by compression. Compression is a way of
expressing digital audio and video by using less data.
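As a rough check of the 200 Mbit/s figure, the following sketch computes the uncompressed bit rate of studio video. It assumes ITU-R BT.601 4:2:2 sampling (13.5 MHz for luminance and 6.75 MHz for each of the two chrominance components) with 8-bit samples; these values come from that recommendation rather than from this article, and `raw_bit_rate` is a hypothetical helper.

```python
def raw_bit_rate(luma_hz: int, chroma_hz: int, bits_per_sample: int) -> int:
    """Uncompressed bit rate in bits/s for one luma and two chroma sample streams."""
    # One Y stream plus two colour-difference streams (Cb and Cr).
    samples_per_second = luma_hz + 2 * chroma_hz
    return samples_per_second * bits_per_sample


# Assumed BT.601 4:2:2 parameters: 13.5 MHz luma, 6.75 MHz per chroma component.
rate = raw_bit_rate(13_500_000, 6_750_000, 8)
print(f"{rate / 1e6:.0f} Mbit/s")  # 216 Mbit/s
```

With 10-bit samples the same sampling rates give 270 Mbit/s.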

Figure 1. Video Sequence in MPEG stream (layers: video
sequence, group of pictures, picture, slice, macroblock and
8x8-pixel block; structure of macroblock: Y blocks 1-4, Cb
block 5, Cr block 6)

MPEG is one of the most popular audio/video compression
techniques because it is not just a single standard.
Instead, it is a range of standards suitable for different
applications but based on similar principles. MPEG is an
acronym for the Moving Picture Experts Group, established by
ISO (International Organization for Standardization) and IEC
(International Electrotechnical Commission). A video is a
sequence of pictures, and each picture is an array of
pixels. This video data is organized hierarchically in an
MPEG video stream: a video sequence consists of the layers
group of pictures (GOP), picture, slice, macroblock and
block. The complete hierarchy is shown in figure 1.

Video Sequence
Begins with a sequence header, includes one or more groups
of pictures, and ends with an end-of-sequence code.

Group of Pictures (GOP)

A header and a series of one or more pictures intended to
allow random access into the sequence.
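Every layer from the sequence down to the slice begins with a start code: the byte prefix 00 00 01 followed by one byte that identifies the layer, so a decoder can locate sequence headers, GOPs, pictures and slices without parsing everything in between. The sketch below scans a byte string for these codes. The code values are those of the MPEG-1/MPEG-2 video specifications (they are not given in this article), `scan_start_codes` and `classify` are hypothetical helpers, and the example stream is a toy byte string rather than a decodable stream.

```python
# Start-code values (the byte after the 00 00 01 prefix) from the
# MPEG-1/MPEG-2 video specifications.
PICTURE_START = 0x00
SLICE_FIRST, SLICE_LAST = 0x01, 0xAF
SEQUENCE_HEADER = 0xB3
SEQUENCE_END = 0xB7
GROUP_START = 0xB8


def classify(code: int) -> str:
    """Map a start-code byte to the layer it introduces."""
    if code == SEQUENCE_HEADER:
        return "sequence header"
    if code == GROUP_START:
        return "group of pictures"
    if code == PICTURE_START:
        return "picture"
    if SLICE_FIRST <= code <= SLICE_LAST:
        return "slice"
    if code == SEQUENCE_END:
        return "sequence end"
    return "other"


def scan_start_codes(data: bytes) -> list[tuple[int, str]]:
    """Return (offset, layer) for every 00 00 01 xx start code in data."""
    found = []
    i = data.find(b"\x00\x00\x01")
    while i != -1 and i + 3 < len(data):
        found.append((i, classify(data[i + 3])))
        i = data.find(b"\x00\x00\x01", i + 3)
    return found


# Toy stream: each start code is followed by a few filler bytes.
stream = (b"\x00\x00\x01\xb3" + b"\x11" * 4    # sequence header
          + b"\x00\x00\x01\xb8" + b"\x22" * 4  # group of pictures
          + b"\x00\x00\x01\x00" + b"\x33" * 2  # picture
          + b"\x00\x00\x01\x01" + b"\x44" * 3  # first slice
          + b"\x00\x00\x01\xb7")               # end of sequence
for offset, layer in scan_start_codes(stream):
    print(offset, layer)
```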

Picture
This is the primary coding unit of a video sequence. A
picture consists of three rectangular matrices representing
luminance (Y) and two chrominance (Cb and Cr) values. The Y
matrix has an even number of rows and columns. The Cb and Cr
matrices are one half the size of the Y matrix in both the
horizontal and vertical directions.
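When the chrominance matrices are half the size of the luminance matrix in both directions, the plane sizes and the number of 16x16 macroblocks follow directly from the picture dimensions. The sketch below assumes the width and height are multiples of 16, `picture_layout` is a hypothetical name, and the 352x288 size is only an example.

```python
def picture_layout(width: int, height: int) -> dict:
    """Plane sizes (width, height) and macroblock count for a 4:2:0 picture."""
    if width % 16 or height % 16:
        raise ValueError("this sketch assumes dimensions that are multiples of 16")
    chroma = (width // 2, height // 2)  # half size horizontally and vertically
    return {
        "Y": (width, height),
        "Cb": chroma,
        "Cr": chroma,
        "macroblocks": (width // 16) * (height // 16),
    }


print(picture_layout(352, 288))
# {'Y': (352, 288), 'Cb': (176, 144), 'Cr': (176, 144), 'macroblocks': 396}
```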

Slice
A slice contains one or more contiguous macroblocks. The
order of the macroblocks within a slice is from left to
right and top to bottom. Slices are important in the
handling of errors: if the bitstream contains an error, the
decoder can skip to the start of the next slice.
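The sketch below illustrates both slice properties described above: macroblocks numbered left to right and top to bottom map to a (row, column) position, and after an error the decoder can search forward for the next slice start code. It assumes the MPEG-1/MPEG-2 start-code convention (prefix 00 00 01, slice codes 0x01 to 0xAF), which this article does not describe; `next_slice` and `mb_position` are hypothetical helpers.

```python
SLICE_FIRST, SLICE_LAST = 0x01, 0xAF  # slice start-code byte range in MPEG-1/2
PREFIX = b"\x00\x00\x01"              # start-code prefix


def next_slice(data: bytes, error_pos: int) -> int:
    """Offset of the first slice start code at or after error_pos, or -1.

    A decoder that detects corrupt data can discard everything before this
    offset and resume decoding at the next slice.
    """
    i = data.find(PREFIX, error_pos)
    while i != -1:
        if i + 3 < len(data) and SLICE_FIRST <= data[i + 3] <= SLICE_LAST:
            return i
        i = data.find(PREFIX, i + 1)
    return -1


def mb_position(address: int, mb_width: int) -> tuple[int, int]:
    """(row, column) of a macroblock numbered left to right, top to bottom."""
    return divmod(address, mb_width)


# Toy byte string, not a decodable stream.
data = (b"\x00\x00\x01\x01" + b"\xaa" * 4    # slice 1
        + b"\x00\x00\x01\xb8"                 # a non-slice start code
        + b"\x00\x00\x01\x02" + b"\xbb" * 3)  # slice 2
print(next_slice(data, 5))  # 12: resume at slice 2, skipping the 0xB8 code
print(mb_position(23, 22))  # (1, 1) for a picture 22 macroblocks wide
```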

Macroblock
This is the basic coding unit in the MPEG algorithm. It is
a 16x16 pixel segment in a frame. If each chrominance
component has one-half the vertical and horizontal
resolution of the luminance component, a macroblock
consists of four Y blocks, one Cb block and one Cr block.
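The sketch below cuts one macroblock of a 4:2:0 picture into its six 8x8 blocks in the order of figure 1: the four luminance blocks (top-left, top-right, bottom-left, bottom-right), then Cb, then Cr. Storing each plane as a list of rows and the name `macroblock_blocks` are assumptions made for illustration.

```python
def macroblock_blocks(y, cb, cr, mb_row, mb_col):
    """Return the six 8x8 blocks of a 4:2:0 macroblock: Y1, Y2, Y3, Y4, Cb, Cr."""
    def block(plane, top, left):
        # Take 8 rows starting at `top`, and 8 samples from each starting at `left`.
        return [row[left:left + 8] for row in plane[top:top + 8]]

    ly, lx = 16 * mb_row, 16 * mb_col  # macroblock origin in the luma plane
    cy, cx = 8 * mb_row, 8 * mb_col    # same area in the half-size chroma planes
    return [
        block(y, ly, lx), block(y, ly, lx + 8),           # Y1, Y2
        block(y, ly + 8, lx), block(y, ly + 8, lx + 8),   # Y3, Y4
        block(cb, cy, cx),                                # Cb
        block(cr, cy, cx),                                # Cr
    ]


# Toy 32x32 picture: each sample encodes its own position.
y = [[r * 100 + c for c in range(32)] for r in range(32)]
cb = [[1000 + r * 10 + c for c in range(16)] for r in range(16)]
cr = [[2000 + r * 10 + c for c in range(16)] for r in range(16)]
blocks = macroblock_blocks(y, cb, cr, 1, 1)
print(len(blocks), blocks[0][0][0], blocks[4][0][0])  # 6 1616 1088
```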

Block
This is the smallest coding unit in the MPEG algorithm. It
consists of 8x8 pixels and can be one of three types:
luminance (Y), red chrominance (Cr) or blue chrominance
(Cb).

Picture Types
The MPEG standard defines three types of pictures:
• Intra Pictures (I-Pictures)
• Predicted Pictures (P-Pictures)
• Bidirectional Pictures (B-Pictures)

These three types of pictures are combined to form a group
of pictures (GOP). Typical GOP structures are as follows:


Intra Pictures
Intra pictures, or I-pictures, are coded using only
information present in the picture itself, and they provide
potential random access points into the compressed video
data. They use only transform coding and provide moderate
compression.

Predicted Pictures
Predicted pictures, or P-pictures, are coded with respect
to the nearest previous I- or P-picture. This technique is
called forward prediction. P-pictures use motion
compensation to provide more compression than is possible
with I-pictures.

Bidirectional Pictures
Bidirectional pictures, or B-pictures, are pictures that
use both a past and a future picture as a reference. This
technique is called bidirectional prediction. B-pictures
provide the most compression because they use both the past
and future pictures as references; however, their
computation time is the largest.
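The reference rules for the three picture types can be summarized in code. The sketch below takes a GOP pattern in display order, written as a string such as "IBBPBBP" (an example pattern chosen for illustration, not one given in this article), and lists the reference pictures each picture would use. `references` is a hypothetical helper, and a reference that falls outside the given pattern is reported as None.

```python
from typing import Optional


def references(pattern: str) -> list[tuple[int, str, Optional[int], Optional[int]]]:
    """For each picture in display order return (index, type, past_ref, future_ref).

    I-pictures use no reference, P-pictures use the nearest previous I- or
    P-picture, and B-pictures use the nearest previous and next I- or P-picture.
    """
    anchors = [i for i, t in enumerate(pattern) if t in "IP"]
    result = []
    for i, t in enumerate(pattern):
        past = max((a for a in anchors if a < i), default=None)
        future = min((a for a in anchors if a > i), default=None)
        if t == "I":
            result.append((i, t, None, None))
        elif t == "P":
            result.append((i, t, past, None))      # forward prediction only
        elif t == "B":
            result.append((i, t, past, future))    # bidirectional prediction
        else:
            raise ValueError(f"unknown picture type {t!r}")
    return result


for entry in references("IBBPBBP"):
    print(entry)
```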

Encoding of Intra Picture

The MPEG transform coding algorithm for intra pictures
includes the following steps:
• Discrete cosine transform (DCT)
• Quantization
• Run-length encoding
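The DCT and quantization steps can be sketched directly. The code below computes the orthonormal two-dimensional DCT-II of an 8x8 block and then quantizes it with a single uniform step size. The uniform step is a simplification assumed here: MPEG actually scales coefficients with a quantizer matrix and a quantizer scale, which this sketch omits, and the function names are hypothetical.

```python
import math

N = 8  # MPEG blocks are 8x8 pixels


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an 8x8 block given as 8 rows of 8 numbers."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):              # vertical frequency index
        for v in range(N):          # horizontal frequency index
            total = 0.0
            for x in range(N):      # pixel row
                for y in range(N):  # pixel column
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs


def quantize(coeffs, step):
    """Simplified uniform quantizer: divide every coefficient by step and round."""
    return [[round(f / step) for f in row] for row in coeffs]


flat = [[128] * N for _ in range(N)]
q = quantize(dct_2d(flat), 16)
print(q[0][0])  # 64: the DC coefficient 1024 divided by 16
```

For a flat block the DCT puts all of the energy into the DC coefficient (64 pixels of value 128 give 1024), and every AC coefficient quantizes to zero.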

Figure 2. Encoding of Intra Picture (for every 8x8 block of
each macroblock: DCT, quantization, zig-zag scan, RLE,
Huffman coding)

An 8x8 block in a picture generally contains high spatial
redundancy. To reduce this redundancy, the MPEG algorithm
transforms 8x8 blocks of pixels from the spatial domain to
the frequency domain with the discrete cosine transform
(DCT). The DCT concentrates most of the block's energy in a
few low-frequency coefficients, and quantization then rounds
the small high-frequency coefficients to zero, so the
combination results in many of the high-frequency
coefficients being zero. To take maximum advantage of this,
the coefficients are organized in a zigzag order to produce
long runs of zeros. Run-length encoding turns this zigzag
sequence into (run, level) pairs, where the run is the
number of zeros preceding a non-zero coefficient and the
level is that coefficient's value. The pairs are then coded
with a variable-length code (Huffman encoding), which uses
shorter codes for commonly occurring pairs and longer codes
for less common pairs. The intra picture coding steps are
shown in figure 2.
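The zigzag scan, run-length coding and variable-length coding steps can be sketched as follows. `zigzag_order` produces the usual zigzag scan of an 8x8 block, `run_length` turns scanned coefficients into (run, level) pairs ending with an end-of-block marker, and `huffman_code_lengths` builds a Huffman code from symbol frequencies; all three names are hypothetical. MPEG itself uses fixed variable-length code tables defined in the standard, so building a code from counted frequencies here only illustrates why common pairs receive shorter codes. The example passes only the AC coefficients, since MPEG codes the DC coefficient of intra blocks separately.

```python
import heapq
from itertools import count


def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zigzag scan order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Walk the anti-diagonals (r + c constant); odd diagonals run down and to
    # the left, even diagonals run up and to the right.
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))


def run_length(values):
    """(run, level) pairs for a zigzag-ordered sequence, ending with 'EOB'.

    run is the number of zeros before each non-zero level; trailing zeros
    are replaced by a single end-of-block marker.
    """
    pairs, run = [], 0
    for level in values:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    pairs.append("EOB")
    return pairs


def huffman_code_lengths(freq):
    """Code length in bits for each symbol of a Huffman code built from freq."""
    if len(freq) == 1:
        return {sym: 1 for sym in freq}
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    heap = [(f, next(tiebreak), {sym: 0}) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every code inside them.
        merged = {sym: depth + 1 for sym, depth in {**a, **b}.items()}
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


# A quantized block with three non-zero coefficients.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = 64, -3, 2
scanned = [block[r][c] for r, c in zigzag_order()]
print(run_length(scanned[1:]))  # [(0, -3), (1, 2), 'EOB']

lengths = huffman_code_lengths({"EOB": 10, (0, 1): 6, (0, -1): 3, (5, 2): 1})
print(lengths)  # the most frequent symbols get the shortest codes
```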

Figure 3. Encoding of Predicted Picture (target frame,
reference frame, best match and motion vector, followed by
quantization and RLE)


Encoding of Predicted Picture

A P-picture is coded with reference to a previous image
(the reference image), which is an I- or P-picture, as shown
in figure 3. Motion-compensated prediction is used to
exploit the temporal redundancy. Since consecutive frames
are closely related, it is possible to accurately predict
the data of one frame from the data of a reference image,
provided the translation (motion) between them is estimated.
This translation is known as the motion vector of the
macroblock. In P-pictures, each 16x16 macroblock is
predicted from a macroblock of the previously encoded I- or
P-picture. A search is conducted in the reference frame to
find the macroblock that most closely matches the macroblock
under consideration in the P frame. The difference between
the two macroblocks is the prediction error. This error is
transformed with the DCT and quantized, and finally
run-length encoding and Huffman encoding are applied to the
data.
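The block-matching search described above can be sketched as an exhaustive (full) search. The article does not specify the match criterion or the search range, so the code below assumes the sum of absolute differences (SAD) and a window of plus or minus 7 pixels; frames are lists of rows, and `sad`, `motion_search` and `residual` are hypothetical helpers.

```python
def sad(target, ref, ty, tx, ry, rx, size):
    """Sum of absolute differences between the target block at (ty, tx)
    and the reference block at (ry, rx)."""
    return sum(abs(target[ty + i][tx + j] - ref[ry + i][rx + j])
               for i in range(size) for j in range(size))


def motion_search(target, ref, ty, tx, size=16, search=7):
    """Full search: return ((dy, dx), cost) for the lowest-SAD reference block
    within +/- search pixels of the target block's own position."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = ty + dy, tx + dx
            # Skip candidate blocks that would fall outside the reference frame.
            if 0 <= ry <= height - size and 0 <= rx <= width - size:
                cost = sad(target, ref, ty, tx, ry, rx, size)
                if best is None or cost < best[1]:
                    best = ((dy, dx), cost)
    return best


def residual(target, ref, ty, tx, mv, size=16):
    """Prediction error: target block minus the motion-compensated reference block."""
    dy, dx = mv
    return [[target[ty + i][tx + j] - ref[ty + dy + i][tx + dx + j]
             for j in range(size)]
            for i in range(size)]


ref = [[r * 48 + c for c in range(48)] for r in range(48)]
# Toy target frame: pixel (r, c) holds the value of reference pixel (r + 2, c - 3).
target = [[(r + 2) * 48 + (c - 3) for c in range(48)] for r in range(48)]
mv, cost = motion_search(target, ref, 16, 16)
print(mv, cost)  # (2, -3) 0
```

Because the toy target is an exact shift of the reference, the search recovers the motion vector with zero SAD and the prediction error is all zeros; with real video the residual is small but non-zero and is passed on to the DCT.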

Encoding of Bi-directional Pictures

A B-picture is a bidirectionally predicted picture. Two
frames are used to predict the current B-picture: the
previous reference frame and the next reference frame.
Hence B-pictures are coded like P-pictures, except that the
motion vectors can reference the previous reference picture,
the next reference picture, or both. Consider a B-picture B
that is predicted from two reference frames R1 and R2, where
R1 is the previous I- or P-picture and R2 is the next I- or
P-picture. For each macroblock MB of B, find the closest
match MB1 in R1 and MB2 in R2. The predicted macroblock PM
is calculated as follows:

PM = NINT (α1 MB1 + α2 MB2)

Here NINT is the nearest-integer operator, and α1 and α2
are chosen as follows:
α1 = 0.5 and α2 = 0.5 if both matches are satisfactory.
α1 = 1 and α2 = 0 if only the first match is satisfactory.
α1 = 0 and α2 = 1 if only the second match is satisfactory.
α1 = 0 and α2 = 0 if neither match is satisfactory.

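Finally, the error block E is computed as the difference
MB - PM. This error block E is coded in the same way as an
intra block (DCT, quantization, run-length and Huffman
coding). When neither match is satisfactory, PM is zero and
E equals MB, so the macroblock is effectively intra coded.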
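The B-picture prediction and error block can be written out as below. The article does not define when a match is "satisfactory", so the sketch takes that decision as two boolean inputs (for example, the result of comparing a SAD against a threshold, which is an assumption). NINT is implemented as floor(x + 0.5), so halves are rounded up; this rounding rule is also an assumption, and `predict_b` and `error_block` are hypothetical names.

```python
import math


def predict_b(mb1, mb2, ok1, ok2):
    """PM = NINT(a1*MB1 + a2*MB2), with a1 and a2 chosen by the rules above."""
    if ok1 and ok2:
        a1, a2 = 0.5, 0.5
    elif ok1:
        a1, a2 = 1.0, 0.0
    elif ok2:
        a1, a2 = 0.0, 1.0
    else:
        a1, a2 = 0.0, 0.0
    # NINT assumed to round halves up: floor(x + 0.5).
    return [[math.floor(a1 * p + a2 * q + 0.5) for p, q in zip(row1, row2)]
            for row1, row2 in zip(mb1, mb2)]


def error_block(mb, pm):
    """E = MB - PM, element by element."""
    return [[m - p for m, p in zip(row_m, row_p)] for row_m, row_p in zip(mb, pm)]


# Toy 2x2 "macroblocks" (real ones are 16x16).
mb1 = [[10, 11], [12, 13]]
mb2 = [[20, 20], [12, 14]]
mb = [[16, 16], [12, 13]]
pm = predict_b(mb1, mb2, True, True)
print(pm)                   # [[15, 16], [12, 14]]
print(error_block(mb, pm))  # [[1, 0], [0, -1]]
```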