H264 Review and Comparison

Review and Comparison
Francisco Aguirre-Ramos
fjaguirre@ieee.org
Introduction
Video compression standards
Video encoding/decoding
Fundamentals
What is a video sequence?
Colour spaces
RGB
YUV
Prediction
Temporal prediction
Spatial prediction
Frame types
H.264/AVC
H.264 components
Macroblocks
Slices
Decoding/Display order
Reference picture lists
H.264 prediction
Intra-prediction
Inter-prediction
Transforms and quantization
H.264 coding
Summary of encoding
The High Efficiency Video Coding (HEVC)
What the HEVC?
Main differences between H.264 and H.265
Conclusions
References

Consumer internet video streaming accounted for 56
exabytes of internet traffic in 2010; it is predicted that
this will grow to 403 exabytes by 2015
- Cisco, 2011

It would take over 6 million years to watch the
amount of video that will cross global IP networks
each month in 2016. Every second, 1.2 million
minutes of video content will cross the network in
2016
- Cisco, 2012

Why are standards so important?
Simplify inter-operability between
encoders/decoders from different manufacturers
Make it possible to build complete platforms
Avoid patent issues
Evolution of video compression standards
1990: H.261 (VCEG). Video conferencing
1993: MPEG-1 Part 2 (MPEG). VCD, interactive CDs
1994: H.262/MPEG-2 Video (VCEG + MPEG). DVD-Video, HDTV
1995 - 1998: H.263 (VCEG), H.263v2 (VCEG), MPEG-4 Part 2 (MPEG). Flash video
2003: MPEG-4 Part 10, H.264/AVC (JVT). Mobile applications, adaptive multi-bitrate streaming, HDTV broadcasting, Blu-ray Video, 3DTV
ITU-T Video Coding Experts Group (VCEG)
ISO/IEC Moving Picture Experts Group (MPEG)
VCEG + MPEG = Joint Video Team (JVT) for H.264/AVC, and later the Joint Collaborative Team on Video Coding (JCT-VC) for HEVC
Are we missing
something?
Video Encoding or Coding?
Cambridge Online Dictionary Definition:
Encoding: to change something into a system for
sending messages secretly, or to represent
complicated information in a simple or short way.
Coding: to represent a message in code so that it
can only be understood by the person who is
meant to receive it.
We will refer to encoding as something
wider than coding.
Encoded object: Coded info 1, Coded info 2, ..., Coded info n (produced by Mechanism 1, Mechanism 2, ..., Mechanism n)
We decode when we take something coded
or encoded and recover its original form.
What is a video sequence?
A video is a representation of a real-world visual scene
The scene is sampled at a point in time to produce a frame
Sampling is repeated at intervals of time to produce a motion
picture
The Hobbit: An Unexpected Journey was filmed and released at 48 FPS!
Digital video is just sampling a continuous signal spatially and temporally.
To obtain a 2D spatially sampled image (frame), CCD or CMOS sensors are usually used to capture each color component
Digital video applications need a way to
capture and represent colour information
Two colour spaces will be explained
RGB
YUV
RGB represents colour using three numbers: Red, Green and Blue
If we have an infinite range of values we can create any colour
RGB is used in digital video

RGB components of a colour image
Luminance (Y) and Chrominance (UV) can
represent a color image
Luminance represents the achromatic image
Chrominance represents the color information
The human visual system (HVS) is less sensitive to
colour than to luminance
Chrominance information can be stored with less
resolution than luminance
In the RGB colour space the three colours are
equally important
YCbCr can be calculated from RGB values
Digital video uses luma and chroma (YCbCr)
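As a rough illustration of the two ideas above (deriving luma and chroma from RGB, and storing chroma at lower resolution), here is a minimal Python sketch. The weights are the ITU-R BT.601 values, which are an assumption since the slides do not say which coefficients they use. The function names and the 2x2 averaging are hypothetical choices made for this example.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one RGB sample to (Y, Cb, Cr); BT.601 weights are assumed."""
    kr, kb = 0.299, 0.114             # assumed luma weights (ITU-R BT.601)
    kg = 1.0 - kr - kb
    y = kr * r + kg * g + kb * b      # luma: weighted sum of R, G and B
    cb = 0.5 * (b - y) / (1.0 - kb)   # scaled blue-difference chroma
    cr = 0.5 * (r - y) / (1.0 - kr)   # scaled red-difference chroma
    return y, cb, cr


def subsample_2x2(plane):
    """Halve a chroma plane in both directions by averaging 2x2 groups.

    Assumes the plane has an even number of rows and columns.
    """
    rows, cols = len(plane), len(plane[0])
    return [
        [
            (plane[i][j] + plane[i][j + 1]
             + plane[i + 1][j] + plane[i + 1][j + 1]) / 4.0
            for j in range(0, cols, 2)
        ]
        for i in range(0, rows, 2)
    ]


# White has full luma and (almost exactly) zero chroma.
y, cb, cr = rgb_to_ycbcr(255, 255, 255)
```

With 2x2 averaging, each chroma plane holds a quarter of the samples of the luma plane, which is where part of the saving comes from before any prediction is applied.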
Video sequences contain high levels of spatial and temporal redundancy
The goal of a prediction model is to reduce this redundancy
Prediction from previously coded frames is called temporal (Inter) prediction
Prediction from previously coded image samples in the same frame is called spatial (Intra) prediction
The predicted frame is created from one or more
past or future frames known as reference frames
Frame n, Frame n+1, Difference
Motion estimation helps to reduce the information to code
It is possible to estimate the trajectory of
each pixel between successive video frames
Lighting changes and uncovered regions have to
be coded
Optical flow between frames
With one trajectory per pixel, this is still
too much information
Instead of calculating trajectories for each pixel, blocks can be used.
Motion estimation: take an MxN-sample region and search the reference frame for a similar area (the one giving the smallest residual)
Motion compensation: calculate the residual between the two regions
Code the residual block and the offset between the current block and the position of the predictor (motion vector)
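The block-based steps above can be sketched in Python as an exhaustive (full) search. This is only an illustration: the sum of absolute differences (SAD) as the matching cost, the search range and all function names are assumptions, and real encoders use faster search strategies.

```python
def sad(frame, ref, top, left, dy, dx, size):
    """Sum of absolute differences between a block and a displaced reference block."""
    total = 0
    for i in range(size):
        for j in range(size):
            total += abs(frame[top + i][left + j]
                         - ref[top + dy + i][left + dx + j])
    return total


def motion_search(frame, ref, top, left, size, search_range):
    """Full search: return (motion_vector, residual_block) with the smallest SAD."""
    rows, cols = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that fall outside the reference frame.
            if not (0 <= top + dy and top + dy + size <= rows):
                continue
            if not (0 <= left + dx and left + dx + size <= cols):
                continue
            cost = sad(frame, ref, top, left, dy, dx, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    dy, dx = best_mv
    # Motion compensation: subtract the chosen predictor from the current block.
    residual = [[frame[top + i][left + j] - ref[top + dy + i][left + dx + j]
                 for j in range(size)] for i in range(size)]
    return best_mv, residual


# A bright 2x2 patch moves one row down and two columns right between frames.
ref = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
for i in range(2):
    for j in range(2):
        ref[2 + i][2 + j] = 200
        cur[3 + i][4 + j] = 200
mv, residual = motion_search(cur, ref, top=3, left=4, size=2, search_range=2)
```

In this toy case the search finds the patch in the reference frame, so the residual is all zeros and only the motion vector needs to be coded.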

Inter-prediction in block based video compression
standards
Spatial prediction uses the current frame, with its previously coded blocks, as the reference frame
Intra-prediction available samples
Extrapolation modes and residuals are coded
Depending on the prediction used, frames can be classified as:
Intra-prediction frames (I): only spatial prediction is used; these frames are inserted to refresh the inter-prediction
Inter-prediction frames:
Predictive (P)
Bi-predictive (B)

Characteristics
Enhanced compression efficiency
Bitrate reduction of about 50% at comparable quality, compared to previous standards
Decoder complexity is about 4x that of an MPEG-2 decoder
The encoder is about 8x more complex than an MPEG-2 encoder
Network friendly

An H.264 video encoder carries out prediction,
transforming and coding processes to produce a
compressed H.264 bitstream
Video codec: high level overview
The H.264 basic block is the Macroblock (MB)
An MB covers 16x16 luma samples and can be split into blocks as small as 8x8 or 4x4
Intra- and Inter-prediction are performed at MB level
An MB is formed by Luma and Chroma blocks
Each encoded frame is composed of one or more slices
A slice consists of a slice header and an integer number of MBs
Possible scenarios:
One slice per coded picture
N slices per picture (variable size)
N slices per picture (fixed size)
Intra-prediction is performed only over the MBs contained in the same slice
Decoding and display order are not always the same
Decoding order follows the inter-prediction dependencies
Display order follows the original video sequence
Pictures used for reference are stored in the Decoded Picture Buffer (DPB)
DPB pictures can be marked as:
Short-term reference picture, indexed according to its frame_num or picture order count (POC)
Long-term reference picture, indexed according to a reference number
When the DPB is full, the oldest short-term reference picture is removed (automatic, sliding-window mode)
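A minimal sketch of the automatic (sliding-window) behaviour described above, in Python. The class name, the capacity accounting and the long-term index are hypothetical simplifications for illustration. A real decoder tracks considerably more state per picture.

```python
from collections import deque


class SlidingWindowDPB:
    """Toy decoded picture buffer that only tracks frame numbers."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.short_term = deque()   # oldest short-term picture on the left
        self.long_term = {}         # hypothetical long-term index -> frame_num

    def add_short_term(self, frame_num):
        """Store a new reference picture; return the evicted frame_num, if any."""
        removed = None
        full = len(self.short_term) + len(self.long_term) >= self.capacity
        if full and self.short_term:
            removed = self.short_term.popleft()   # drop the oldest short-term
        self.short_term.append(frame_num)
        return removed

    def mark_long_term(self, frame_num, index):
        """Move a short-term picture to the long-term set."""
        self.short_term.remove(frame_num)
        self.long_term[index] = frame_num


dpb = SlidingWindowDPB(capacity=3)
for n in (0, 1, 2):
    dpb.add_short_term(n)
dpb.mark_long_term(0, index=0)    # picture 0 is now protected from eviction
evicted = dpb.add_short_term(3)   # buffer is full: oldest short-term leaves
```

Here picture 1 is evicted rather than picture 0, because long-term pictures are not touched by the sliding window.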
A small example
Reference pictures are ordered in one or two lists (list0 and list1) prior to encoding or decoding a slice
For P slices, only list0 is used
Short-term reference pictures are in decoding order
For B slices, both list0 and list1 are used
Short-term reference pictures are in display order
list0 (P slice): the default order is decreasing order of decoding (most recently decoded first)
list0 (B slice): the default order is:
Decreasing order of POC, for pictures with POC earlier than the current picture
Increasing order of POC, for pictures with POC later than the current picture
list1 (B slice): the default order is:
Increasing order of POC, for pictures with POC later than the current picture
Decreasing order of POC, for pictures with POC earlier than the current picture
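The default ordering rules above can be written down in a few lines of Python. This sketch covers short-term pictures only. The function names and the sample POC values are assumptions made for illustration.

```python
def default_list0_p(dpb):
    """P slice: dpb holds (decode_index, poc) pairs; most recently decoded first."""
    ordered = sorted(dpb, key=lambda pic: pic[0], reverse=True)
    return [poc for _, poc in ordered]


def default_lists_b(dpb_pocs, current_poc):
    """B slice: return (list0, list1) built from the POCs in the DPB."""
    past = sorted((p for p in dpb_pocs if p < current_poc), reverse=True)
    future = sorted(p for p in dpb_pocs if p > current_poc)
    list0 = past + future    # closest past picture first, then future pictures
    list1 = future + past    # closest future picture first, then past pictures
    return list0, list1


# Hypothetical DPB contents while coding the picture with POC 68.
list0, list1 = default_lists_b([60, 64, 72, 76], current_poc=68)
p_list0 = default_list0_p([(0, 60), (1, 72), (2, 64)])
```

Index 0 of each list is the cheapest to signal, so the default order puts the temporally closest pictures there.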

A small example for P slices (list0)
Now with B slices (list0, list1), frame 68
We can say that list0 is used to predict from past pictures while list1 is for future pictures
Although this is tedious, it is fundamental when we are trying to achieve high compression
H.264/AVC supports a wide range of
prediction options
Intra-prediction
Many different prediction modes
Inter-prediction
Motion compensation
Sub-pixel interpolation
Multiple MB sizes (and luma block sizes)
Filtering process to reduce artifacts
Some facts:
An intra (I) MB is coded without referring to any
data outside the current slice
I-MB may occur in any slice type
Every MB in an I slice is an I MB
Intra prediction uses samples from adjacent,
previously coded blocks to predict the values in
the current block
Best block size selection is performed at the encoder; there are different prediction modes for each block size
Intra-prediction block size: notes
16x16 (luma): four possible prediction modes
8x8 (luma): nine possible prediction modes
4x4 (luma): nine possible prediction modes
Chroma: four possible prediction modes (one mode shared by both chroma components)
These prediction modes are available depending on which neighbouring blocks are present in the slice
The chosen mode is signalled predictively, relative to the modes of neighbouring blocks
These are the prediction modes for 4x4 blocks; 8x8 blocks use a similar set
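To make the idea of extrapolation modes concrete, the following Python sketch implements three of the nine 4x4 luma modes (vertical, horizontal and DC) and picks the one with the smallest sum of absolute differences. The mode selection criterion and the function names are assumptions. Real encoders usually weigh distortion against the bits needed, and the six directional modes are omitted here.

```python
def predict_4x4(above, left, mode):
    """Build a 4x4 prediction from the 4 samples above and the 4 to the left."""
    if mode == "vertical":       # copy the row above downwards
        return [list(above) for _ in range(4)]
    if mode == "horizontal":     # copy the left column across
        return [[left[i]] * 4 for i in range(4)]
    if mode == "dc":             # rounded mean of the 8 neighbouring samples
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("unknown mode: " + mode)


def best_mode(block, above, left):
    """Return (mode, cost) with the smallest sum of absolute differences."""
    best = None
    for mode in ("vertical", "horizontal", "dc"):
        pred = predict_4x4(above, left, mode)
        cost = sum(abs(block[i][j] - pred[i][j])
                   for i in range(4) for j in range(4))
        if best is None or cost < best[1]:
            best = (mode, cost)
    return best


# A block of vertical stripes is predicted perfectly from the row above.
above = [10, 20, 30, 40]
left = [10, 10, 10, 10]
block = [[10, 20, 30, 40] for _ in range(4)]
mode, cost = best_mode(block, above, left)
```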
16x16 blocks are used in homogeneous areas with low motion and texture
The mode used is signalled
These are the prediction modes for 16x16 blocks; chroma modes are similar
Example of intra block size choices
Some facts:
Inter-prediction blocks can range in size from 16x16
down to 4x4
Reference picture is chosen from DPB
The offset between the current partition and the
prediction region in the reference picture is a motion
vector (MV)
MVs can point to integer, half- or quarter-sample positions in the luma component, and to eighth-sample positions for chroma
MVs are differentially coded from the MVs of neighbouring blocks
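A small Python sketch of differential MV coding follows. It assumes the common case in which the predictor is the component-wise median of three neighbouring MVs (left, above, above-right). The slides do not spell out the predictor, and H.264 has special cases that are ignored here. MVs are expressed in quarter-sample units, and the function names are hypothetical.

```python
def median_mv(mv_a, mv_b, mv_c):
    """Component-wise median of three neighbouring motion vectors."""
    return tuple(sorted(components)[1] for components in zip(mv_a, mv_b, mv_c))


def mv_difference(mv, mv_a, mv_b, mv_c):
    """Difference between the actual MV and its predictor; this is what gets coded."""
    pred_x, pred_y = median_mv(mv_a, mv_b, mv_c)
    return (mv[0] - pred_x, mv[1] - pred_y)


# Quarter-sample units: (6, -2) means 1.5 samples in x and -0.5 samples in y.
predictor = median_mv((4, 0), (6, -2), (5, -8))
mvd = mv_difference((6, -2), (4, 0), (6, -2), (5, -8))
```

Because neighbouring blocks tend to move together, the difference is usually small and cheap to code.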
More facts:
A prediction block may be generated from a single prediction region in a reference picture, for a P- or B-MB, or from two prediction regions in reference pictures, for a B-MB
The prediction block may be weighted according to the temporal distance
In a B-MB, a block may be predicted in direct mode
An example of bi-prediction
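As a simplified illustration of bi-prediction, the Python sketch below combines two prediction regions sample by sample. With equal weights it reduces to a rounded average. The integer weights are a hypothetical stand-in for weighted prediction and do not reproduce the exact H.264 weighting formula.

```python
def bi_predict(pred0, pred1, w0=1, w1=1):
    """Combine two prediction blocks with integer weights and rounding."""
    total = w0 + w1
    return [
        [(w0 * a + w1 * b + total // 2) // total for a, b in zip(row0, row1)]
        for row0, row1 in zip(pred0, pred1)
    ]


pred_from_list0 = [[100, 102]]
pred_from_list1 = [[110, 105]]
averaged = bi_predict(pred_from_list0, pred_from_list1)
# Give more weight to the (assumed) temporally closer list0 reference.
weighted = bi_predict(pred_from_list0, pred_from_list1, w0=3, w1=1)
```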
Each 16x16 P- or B-MB may be predicted
using a range of block sizes
Each block or partition can have its own MV
Partitions are not necessarily square
Motion vectors are coded using a predictive approach
In H.264/AVC the transform and quantization processes are designed to minimize computational complexity and avoid encoder/decoder mismatch. This is achieved by:
Using an integer transform that can be carried out using integer or fixed-point arithmetic
Integrating the transform's scaling step into quantization, which minimizes the number of multiplications required to process a block of residual data
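The integer transform idea can be illustrated with the 4x4 core transform matrix used by H.264/AVC. The Python sketch below computes only the unscaled core transform, Cf * X * Cf^T. The scaling and quantization that normally follow are left out, and the helper names are hypothetical.

```python
# 4x4 forward core transform matrix: integer entries only.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Multiply two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    """Transpose a 4x4 matrix."""
    return [[m[j][i] for j in range(4)] for i in range(4)]


def forward_core_transform(block):
    """Unscaled forward transform of a 4x4 residual block."""
    return matmul(matmul(CF, block), transpose(CF))


# A flat residual block has all of its energy in the top-left (DC) coefficient.
flat = [[5] * 4 for _ in range(4)]
coeffs = forward_core_transform(flat)
```

Every operation is an integer addition, subtraction or multiplication by 2, so encoder and decoder produce bit-identical results.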

Forward transform and quantization
Re-scaling and inverse transform
A coded H.264 stream consists of a series of
coded symbols.
The H.264/AVC standard specifies several
methods for coding symbols into a binary
pattern:
Fixed length code: A symbol is converted into a binary
code with a specified length (n bits).
Exponential-Golomb variable length code: The
symbol is represented as a codeword with varying
number of bits (v bits). Shorter codewords are
assigned to frequent symbols.

CAVLC: Context-Adaptive Variable Length Coding, a specially designed method of coding transform coefficients in which different VLC tables are chosen depending on the statistics of recently coded coefficients.
CABAC: Context-Adaptive Binary Arithmetic Coding, a method of arithmetic coding in which the probability models are updated based on previous coding statistics.
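Of the methods listed above, Exponential-Golomb coding is simple enough to show in full. The Python sketch below builds unsigned and signed codewords as bit strings and decodes an unsigned one. Working on strings rather than a real bitstream, and the function names, are choices made for readability only.

```python
def ue(code_num):
    """Unsigned Exp-Golomb codeword for a non-negative integer, as a bit string."""
    bits = bin(code_num + 1)[2:]           # binary form of code_num + 1
    return "0" * (len(bits) - 1) + bits    # prefix of zeros, then the value


def se(value):
    """Signed Exp-Golomb: map the integer to a code number, then code it with ue."""
    code_num = 2 * value - 1 if value > 0 else -2 * value
    return ue(code_num)


def decode_ue(bitstring):
    """Decode one unsigned codeword; return (code_num, remaining_bits)."""
    zeros = 0
    while bitstring[zeros] == "0":
        zeros += 1
    end = 2 * zeros + 1
    return int(bitstring[zeros:end], 2) - 1, bitstring[end:]


codewords = [ue(n) for n in range(5)]
value, rest = decode_ue("00100" + "1")
```

Small values get short codewords, which suits syntax elements whose small values are the most frequent.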
Symbols occurring in the syntax above the slice data level (headers, video parameters) are coded with:
Fixed Length Codes or Exp-Golomb codes.
Symbols at slice data level and below (prediction modes, coefficients, etc.) are coded with either:
CABAC, or
CAVLC + Fixed Length Codes or Exp-Golomb codes.
Typical H.264 encoder

Developed by the Joint Collaborative Team on Video Coding (JCT-VC)
Designed to fulfil the requirements of new video applications
Focused on:
Increased video resolution
Parallel processing architectures
It will allow:
Full-HDTV broadcasting
UHDTV (up to 8K x 4K resolution)
Mobile HD content
Full-HD 3DTV
Provides a coding efficiency improvement of about 50% over AVC (half the bitrate at the same quality)
The HEVC encoding process is up to 10x more complex than that of AVC
H.265/ HEVC - MPEG-H Part 2
We will summarize the encoding process in five steps:
Partitioning
Prediction
Reconstruction
Coding
Packetization

HEVC introduces a larger block structure than previous standards
A larger block structure provides higher compression performance
The basic block is known as the largest coding unit (LCU); it can be recursively split into smaller coding units (CUs)
The CU is used as the basic unit for intra- and inter-coding

Picture partitioning
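The recursive LCU-to-CU split can be sketched as a quadtree in Python. The split criterion used here (sample range inside the block compared to a threshold) is purely hypothetical. Actual encoders typically decide by comparing rate-distortion costs. Function and parameter names are assumptions as well.

```python
def split_cu(picture, top, left, size, min_size, threshold):
    """Return a list of (top, left, size) coding units covering one LCU."""
    samples = [picture[top + i][left + j]
               for i in range(size) for j in range(size)]
    spread = max(samples) - min(samples)
    # Stop at the minimum CU size or when the block is flat enough.
    if size <= min_size or spread <= threshold:
        return [(top, left, size)]
    half = size // 2
    units = []
    for dy in (0, half):
        for dx in (0, half):
            units += split_cu(picture, top + dy, left + dx,
                              half, min_size, threshold)
    return units


# A 16x16 "LCU" that is flat except for one bright sample in a corner.
picture = [[0] * 16 for _ in range(16)]
picture[0][0] = 255
units = split_cu(picture, 0, 0, size=16, min_size=4, threshold=10)
```

Only the quadrant containing the detail is split down to the minimum size, while flat areas stay as large CUs.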
Picture partitioning/Prediction Units
A transform unit (TU) is the basic unit for the transform and quantization processes
The size and shape of the TUs depend on the size of the prediction unit (PU)
TUs can be as small as 4x4 or as large as 32x32
Picture partitioning/Transform Units
Picture partitioning/Example
Prediction/Intra
HEVC has 35 luma and 6 chroma intra-prediction modes
(H.264/AVC has 9 luma and 4 chroma intra modes)
Asymmetric motion partitions (AMP) improve coding efficiency by allowing PUs to fit the shapes in the picture
The accuracy of motion compensation in HEVC is 1/4 pel for luma samples
Motion information is coded using advanced motion vector prediction (AMVP); merge and skip modes are also available

Prediction/Inter
HEVC applies square and non-square DCT-like integer transforms
The integer transforms used in HEVC are better approximations to the DCT than those used in H.264/AVC
An integer discrete sine transform (DST) is used for some residuals

Reconstruction/T&Q
HEVC drafts defined three in-loop filtering methods:
Deblocking filter: similar to the one in H.264/AVC
Sample Adaptive Offset (SAO): classifies pixels into different categories and adds a simple offset to each pixel based on its category
Adaptive Loop Filtering (ALF): constructed from the original image and designed to minimize the distortion between the reconstructed and the original images (ALF was dropped from the final version of the standard)

Reconstruction/In-loop Filtering
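The "classify, then add an offset" idea behind SAO can be sketched for the edge-offset case in Python. Each sample is compared with its two neighbours along one direction (horizontal here) and placed in one of five categories. The offset values are hypothetical, clipping to the valid sample range is omitted, and the function names are assumptions.

```python
def edge_category(a, c, b):
    """Classify sample c against its two neighbours a and b."""
    if c < a and c < b:
        return 1                      # local minimum
    if (c < a and c == b) or (c == a and c < b):
        return 2                      # concave corner
    if (c > a and c == b) or (c == a and c > b):
        return 3                      # convex corner
    if c > a and c > b:
        return 4                      # local maximum
    return 0                          # none of the above: left unchanged


def sao_edge_row(row, offsets):
    """Apply per-category offsets along one row; border samples are untouched."""
    out = list(row)
    for i in range(1, len(row) - 1):
        # Classification always uses the unfiltered input samples.
        out[i] = row[i] + offsets[edge_category(row[i - 1], row[i], row[i + 1])]
    return out


# Hypothetical offsets: pull minima up and maxima down to smooth ringing.
offsets = {0: 0, 1: 2, 2: 1, 3: -1, 4: -2}
filtered = sao_edge_row([10, 8, 10, 10, 12, 10], offsets)
```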
Coding/Tiles
Coding/Slices
Wavefront parallel processing (WPP) is an efficient mechanism for parallel encoding/decoding
Unlike slices, WPP does not break prediction dependencies
The basic concept is to start processing a new row of LCUs with a new parallel process as soon as two LCUs have been processed in the row above
Coding/WPP
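The wavefront dependency rule can be simulated in a few lines of Python. The sketch assumes every LCU takes one time step and that there is one worker per row. Both are simplifications, and the function name is hypothetical. An LCU may start once its left neighbour and the LCU above and to the right are finished.

```python
def wpp_start_times(rows, cols):
    """Earliest start step of each LCU under the wavefront dependency rule."""
    start = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            ready = 0
            if c > 0:
                # The left neighbour in the same row must be finished.
                ready = max(ready, start[r][c - 1] + 1)
            if r > 0:
                # So must the LCU above and to the right (clamped at the row end).
                dep = min(c + 1, cols - 1)
                ready = max(ready, start[r - 1][dep] + 1)
            start[r][c] = ready
    return start


start = wpp_start_times(rows=3, cols=5)
total_steps = start[-1][-1] + 1    # steps until the last LCU is done
```

For this 3x5 picture the wavefront finishes in 9 steps instead of the 15 a single sequential process would need, with each row starting two steps after the one above.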
Although HEVC is the new video compression standard, it has a long way to go before it is fully adopted.
Most of the technologies that will need HEVC are still at an early stage.
H.264/AVC will continue to be the industry standard for a while.
A lot of research is being done in different HEVC-related areas.
Ostermann, J., Bormans, J., List, P., Marpe, D., Narroschke, M., Pereira, F., Stockhammer, T., et al. (2004). Video coding with H.264/AVC: tools, performance, and complexity. IEEE Circuits and Systems Magazine, 4(1), 7-28. doi:10.1109/MCAS.2004.1286980
Ohm, J., Sullivan, G. J., Schwarz, H., Tan, T. K., & Wiegand, T. (2012). Comparison of the Coding Efficiency of Video Coding Standards Including High Efficiency Video Coding (HEVC).
Sullivan, G. J., Ohm, J., Han, W., & Wiegand, T. (2012). Overview of the High Efficiency Video Coding (HEVC) Standard.
Bross, B., Han, W.-J., Ohm, J.-R., Sullivan, G. J., & Wiegand, T. (2012). High Efficiency Video Coding (HEVC) Text Specification Draft 8 (pp. 1-261).
Goldman, M. S. (2011). High Efficiency Video Coding (HEVC): The Next Generation Compression Technology. SMPTE Conferences, 2011(1), 1-11. doi:10.5594/M001098
Pourazad, M., Doutre, C., Azimi, M., & Nasiopoulos, P. (2012). HEVC: The New Gold Standard for Video Compression: How Does HEVC Compare with H.264/AVC? IEEE Consumer Electronics Magazine, 1(3), 36-46. doi:10.1109/MCE.2012.2192754
Nightingale, J., Wang, Q., & Grecos, C. (2012). HEVStream: a framework for streaming and evaluation of high efficiency video coding (HEVC) content in loss-prone networks. IEEE Transactions on Consumer Electronics, 58(2), 404-412. doi:10.1109/TCE.2012.6227440
Richardson, I. E. G. (2010). The H.264 Advanced Video Compression Standard (2nd ed., p. 316). John Wiley & Sons Ltd.
