High Efficiency Video Coding (HEVC)
Prepared By:
Raviraj P Makwana (P13VL020)
ELECTRONICS DEPARTMENT
SVNIT SURAT
Guided By:
Mr. Anand Darji
ELECTRONICS DEPARTMENT
SVNIT SURAT
BASIC BUILDING BLOCKS OF VIDEO CODING
Why do we need compression?
Compression reduces the bandwidth required to transmit and store digital video.
Example:
- 480x640 pixels/frame, 24 bits/pixel, progressive scanning at 60 frames/s:
Raw bit rate = 480 x 640 x 24 x 60 = 442,368,000 bits/s = 442.368 Mbps
If the channel bandwidth is 20 Mbps, a compression ratio of about 22 is
required (442.368 / 20 = 22.1).
As the number of pixels per frame increases, the required compression ratio
increases.
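The arithmetic in this example can be checked with a short sketch. It assumes 24 bits per pixel, which is the value that reproduces the 442.368 Mbps figure; the function names are illustrative only.

```python
def raw_bit_rate(width, height, bits_per_pixel, fps):
    """Uncompressed bit rate in bits per second."""
    return width * height * bits_per_pixel * fps


def required_compression_ratio(raw_bps, channel_bps):
    """How many times the raw stream must shrink to fit the channel."""
    return raw_bps / channel_bps


# Assumption: 24 bits per pixel (8 bits for each of three colour components).
raw = raw_bit_rate(640, 480, 24, 60)
ratio = required_compression_ratio(raw, 20_000_000)

print(f"raw rate: {raw / 1e6:.3f} Mbps")
print(f"required compression ratio: {ratio:.1f}")
```

Doubling both picture dimensions quadruples the raw rate, and with it the compression ratio needed for the same channel.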
How to achieve the compression?
Reduce redundant or repeated information
Reduce irrelevant information
Video contains substantial spatial and temporal redundancy. Compression is
achieved by exploiting this redundancy.
Spatial redundancy: Neighboring pixels are similar
Temporal redundancy: Adjacent frames are similar
Interlacing, Recording
Each frame is divided into two fields: the odd lines and the even lines.
Interlacing can either halve the required bandwidth or double the vertical
resolution.
Video was first recorded on open-reel magnetic tape by the BBC in
1955
Consumer-friendly cassettes were standardized in the 1970s
Adopted in home entertainment in the 1980s.
Predictive compression
Three frame types are used: I-frames, P-frames and B-frames.
I-frame (Intra-coded picture): a fully specified picture. It is the least
compressible type but does not require other video frames to decode.
P-frame (Predicted picture): holds only the changes in the image from
the previous frame. P-frames are also known as delta-frames.
B-frame (Bi-predictive picture): saves even more space by using
differences between the current frame and both the preceding and
following frames to specify its content.
[Figure credit: Petteri Aimonen, own work]
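The delta-frame idea can be sketched with plain sample differences. This is a simplification for illustration: real P-frames use motion-compensated prediction, and the function names here are hypothetical.

```python
def encode_p_frame(previous, current):
    """Store only the per-sample change relative to the previous frame."""
    return [[c - p for p, c in zip(prow, crow)]
            for prow, crow in zip(previous, current)]


def decode_p_frame(previous, delta):
    """Rebuild the current frame from the previous frame plus the delta."""
    return [[p + d for p, d in zip(prow, drow)]
            for prow, drow in zip(previous, delta)]


# A bright sample moves one position to the right between two frames.
i_frame = [[10, 10, 10],
           [10, 50, 10],
           [10, 10, 10]]
next_frame = [[10, 10, 10],
              [10, 10, 50],
              [10, 10, 10]]

delta = encode_p_frame(i_frame, next_frame)
nonzero = sum(1 for row in delta for d in row if d != 0)
print(delta)
print("non-zero delta samples:", nonzero)
```

Most delta samples are zero, which is what makes the P-frame cheaper to code than a fully specified I-frame.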
Hybrid Video Coding
[Figure: video encoder]
[Figure: video decoder]
VIDEO SOURCE CODING BASICS
Video coding often uses a colour representation having three components
called Y, Cb, and Cr.
Y is called luma and represents brightness.
Cb and Cr are called chroma components and represent the extent to
which the colour deviates from gray toward blue and red, respectively.
Video coding typically uses 4:2:0 sampling, in which each chroma component
has half the horizontal and half the vertical resolution of luma.
The two basic video formats are progressive and interlaced.
If the two fields of a frame are captured at different time instants, the
frame is referred to as an interlaced frame, and otherwise it is referred to
as a progressive frame.
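A rough sketch of what 4:2:0 sampling means for the amount of data per frame. It assumes 8 bits per sample and even picture dimensions; the function name is illustrative.

```python
def samples_per_frame_420(width, height):
    """Return (luma_samples, chroma_samples) for one 4:2:0 frame.

    Assumes even width and height. Cb and Cr are each subsampled by 2
    horizontally and vertically.
    """
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)  # Cb plus Cr
    return luma, chroma


luma, chroma = samples_per_frame_420(640, 480)
# Assumption: 8 bits per sample.
bits_per_pixel = 8 * (luma + chroma) / (640 * 480)
print(luma, chroma, bits_per_pixel)
```

Under these assumptions 4:2:0 needs 12 bits per pixel on average, half of the 24 bits needed when all three components are kept at full resolution.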
Techniques for digital compression
1) Prediction
2) Transformation
3) Quantization
4) Entropy coding
Brief History of Video Coding
CRTs and Analog Broadcast
The first video systems evolved from oscilloscopes.
A CRT display is essentially an oscilloscope with a second dimension.
A CRT-based display device could present a signal modulated on a
transmitted radio frequency carrier. Thus television was born.
Colour television was first developed by RCA (Radio Corporation of
America) in the late 1940s.
In 1935 the British government defined high definition television
(HDTV) as having at least 240 lines.
Today the definition of HDTV is at least 720 lines.
Motion JPEG
Uses intra-picture (Intra) coding:
each picture is coded without referring to other pictures in the video
sequence.
The picture arrays are segmented into equal-size blocks of 8x8 samples
each.
These blocks are transformed by a DCT.
The DCT coefficients are then quantized and transmitted using
variable-length codes.
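The transform and quantization steps can be sketched in pure Python. This is a minimal illustration, not the JPEG algorithm itself: it uses a single uniform quantizer step (an assumption; JPEG uses a table of steps) and omits the variable-length coding stage.

```python
import math

N = 8  # block size used by the scheme described above


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (direct, unoptimized form)."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * s
    return out


def quantize(coeffs, step):
    """Uniform quantization: one step size for every coefficient (assumption)."""
    return [[round(c / step) for c in row] for row in coeffs]


# A perfectly flat block: all energy ends up in the DC coefficient.
flat = [[128] * N for _ in range(N)]
levels = quantize(dct_2d(flat), step=16)
print(levels[0])
```

For smooth image regions most quantized coefficients are zero, which is why the subsequent variable-length coding is effective.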
H.120
Uses inter-picture (Inter) coding:
video can be represented more efficiently by sending only the
changes in the video scene rather than coding all regions repeatedly.
This technique is called conditional replenishment (CR).
It had two versions, one for PAL and the other for NTSC.
Drawback: inability to refine the approximation given by a repetition.
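Conditional replenishment can be sketched as "resend a block only if it changed". The block size, the threshold and the function names below are assumptions made for illustration and are not taken from H.120.

```python
def changed_blocks(previous, current, block=2, threshold=0):
    """Return {(row, col): samples} for every block that differs enough."""
    updates = {}
    for r in range(0, len(current), block):
        for c in range(0, len(current[0]), block):
            cur = [row[c:c + block] for row in current[r:r + block]]
            prev = [row[c:c + block] for row in previous[r:r + block]]
            diff = sum(abs(a - b)
                       for crow, prow in zip(cur, prev)
                       for a, b in zip(crow, prow))
            if diff > threshold:
                updates[(r, c)] = cur
    return updates


def replenish(previous, updates):
    """Decoder side: repeat the previous frame, then overwrite sent blocks."""
    frame = [row[:] for row in previous]
    for (r, c), blk in updates.items():
        for i, brow in enumerate(blk):
            frame[r + i][c:c + len(brow)] = brow
    return frame


previous = [[0] * 4 for _ in range(4)]
current = [row[:] for row in previous]
current[3][3] = 9  # only the bottom-right block changes

updates = changed_blocks(previous, current)
print(sorted(updates))
```

A block is either repeated unchanged or replaced outright, which matches the limitation noted above: a repetition cannot be refined by a small correction.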
H.261
A hybrid codec:
it uses both prediction and transformation.
Inter prediction uses only P-pictures (no B-pictures).
Used for ISDN videoconferencing.
Maximum bit rate of about 2 Mbps.
Limited to 4:2:0 chroma sub-sampling.
MPEG-1
Combines techniques from H.261 and JPEG.
Limited to 1.5 Mbps, 4:2:0 and stereo audio only.
The standard is divided into parts:
Part 1 is for systems.
Part 2 is for video (based on H.261).
Part 3 is for audio.
Supports only progressive-scan (non-interlaced) pictures.
MPEG-2/H.262
Supports standard-definition (SD, 720x576 or 720x480 pixel) resolutions and
high-definition (HD) video signals with a pixel resolution of
1920x1080.
Target data rate is 4-30 Mbit/s.
Adds support for interlaced video.
An MPEG-2 decoder is backward compatible with the MPEG-1 standard.
H.263
Used for video conferencing at low bit rates for mobile wireless
communications.
At very low bit rates, video quality is better by a factor of two
compared to MPEG-2 / H.262.
H.264 or MPEG-4 Part 10/AVC
Originally known as H.26L or JVT.
ITU name: H.264
ISO name: MPEG-4 Part 10/Advanced Video Coding (AVC).
HEVC or H.265
Up to 8K UHDTV (8192x4320 maximum)
12-bit color bit depth
4:4:4 and 4:2:2 chroma sub-sampling
Supports up to 300 fps (earlier versions only supported up to 59.94
fps)
Data rates of several GB/s
About half the bit rate of H.264 at comparable subjective quality.
H.264/AVC VIDEO CODING STANDARD
Some key building blocks of the NAL design:
1) NAL Units:
A NAL unit is effectively a packet that contains an integer number of bytes.
The first byte of each NAL unit is a header byte that contains an
indication of the type of data in the NAL unit.
The remaining bytes contain payload data of the type indicated by the
header.
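A small sketch of reading that header byte. The field names and bit widths (1-bit forbidden_zero_bit, 2-bit nal_ref_idc, 5-bit nal_unit_type) follow the H.264 syntax and are not spelled out in these slides; the helper names are illustrative.

```python
def parse_nal_header(first_byte):
    """Split the one-byte H.264 NAL unit header into its three fields."""
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x1,
        "nal_ref_idc": (first_byte >> 5) & 0x3,
        "nal_unit_type": first_byte & 0x1F,
    }


def is_vcl(nal_unit_type):
    """In H.264, types 1 to 5 carry coded slice data (VCL NAL units)."""
    return 1 <= nal_unit_type <= 5


for byte in (0x67, 0x68, 0x65):
    header = parse_nal_header(byte)
    print(hex(byte), header, "VCL" if is_vcl(header["nal_unit_type"]) else "non-VCL")
```

In this example 0x67 and 0x68 decode to types 7 and 8 (sequence and picture parameter sets), while 0x65 decodes to type 5, a coded slice of an IDR picture.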
A. H.264/AVC NAL (Network Abstraction Layer)
Two classes of NAL units:
a. VCL (Video Coding Layer) NAL units
contain the data that represents the values of the samples in the
video pictures.
b. Non-VCL NAL units
contain information such as parameter sets (important header data
that can apply to a large number of VCL NAL units) and
supplemental enhancement information (timing information and
other supplemental data that may enhance usability of the decoded
video signal but are not necessary for decoding the values of the
samples in the video pictures).
2) Parameter Sets:
A parameter set contains important header information that can apply to a
large number of VCL NAL units.
Two types of parameter sets:
I. sequence parameter sets, which apply to a series of consecutive
coded video pictures;
II. picture parameter sets, which apply to the decoding of one or
more individual pictures.
Key VCL NAL units for a picture each contain an identifier that refers
to the content of the relevant picture parameter set.
Each picture parameter set contains an identifier that refers to the
relevant sequence parameter set.
In this way a small amount of data (the identifier) can be used to refer to
a larger amount of information (the parameter set).
Sequence and picture parameter sets can be sent well ahead of the
VCL NAL units that they apply to.
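The chain of identifiers described above can be sketched as two table lookups. All field names and values here (pps_id, sps_id, width, entropy_coding and so on) are hypothetical and chosen only to show the indirection.

```python
# Parameter sets received ahead of the slices, keyed by their identifiers.
sps_table = {0: {"width": 1920, "height": 1080}}
pps_table = {3: {"sps_id": 0, "entropy_coding": "CABAC"}}


def resolve_parameters(slice_header, pps_table, sps_table):
    """Follow slice -> picture parameter set -> sequence parameter set."""
    pps = pps_table[slice_header["pps_id"]]
    sps = sps_table[pps["sps_id"]]
    merged = dict(sps)
    merged.update({k: v for k, v in pps.items() if k != "sps_id"})
    return merged


# The slice itself carries only a small identifier.
slice_header = {"pps_id": 3}
params = resolve_parameters(slice_header, pps_table, sps_table)
print(params)
```

Because the tables are filled before any slice arrives, the parameter sets can be delivered early or over a more reliable channel than the slice data.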
3) Access Units:
The set of VCL and non-VCL NAL units that is associated with a single
decoded picture is referred to as an access unit.
An access unit contains all macroblocks of the picture, possibly some
redundant approximations of some parts of the picture for error resilience
purposes (referred to as redundant slices), and other supplemental
information associated with the picture.
B. H.264/AVC VCL
The VCL design follows the so-called block-based hybrid video coding approach.
There is no single coding element in the VCL that provides the majority of
the significant improvement in compression efficiency in relation to prior
video coding standards.
1) Macroblocks, Slices, and Slice Groups:
Each picture is partitioned into fixed-size macroblocks that each cover
16x16 samples of the luma component and
8x8 sample regions of each of the two chroma components.
The samples of a macroblock are predicted, either spatially or temporally,
and the resulting prediction residual is transmitted using transform coding:
it is transformed, quantized and entropy coded.
The macroblocks of the picture are organized into slices, which
represent regions of a given picture that can be decoded
independently.
Each slice is a sequence of macroblocks that is processed in the order
of a raster scan.
A picture may contain one or more slices.
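A quick sketch of how many macroblocks a picture contains and how a raster-scan macroblock address maps to a position. Rounding up for picture sizes that are not multiples of 16 is an assumption of this sketch; the function names are illustrative.

```python
def macroblock_grid(width, height, mb_size=16):
    """Number of macroblock columns and rows (rounded up, by assumption)."""
    cols = -(-width // mb_size)   # ceiling division
    rows = -(-height // mb_size)
    return cols, rows


def mb_position(address, cols):
    """Map a raster-scan macroblock address to (row, column)."""
    return divmod(address, cols)


cols, rows = macroblock_grid(1920, 1080)
print(cols, rows, cols * rows)
print(mb_position(121, cols))
```

With a raster-scan order, a slice can be described simply by the address of its first macroblock and the number of macroblocks it contains.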
Slices can be used for:
error resilience, as the partitioning of the picture allows spatial
concealment within the picture and as the start of each slice provides
a resynchronization point at which the decoding process can be
reinitialized.
creating well-segmented payloads for packets that fit the maximum
transfer unit (MTU) size of a network;
parallel processing, as each slice can be encoded and decoded
independently of the other slices of the picture.
The error resilience aspect of slices can be further enhanced (among
other uses) through the use of the flexible macroblock ordering (FMO)
technique.
2) Slice Types:
I slice: A slice in which all macroblocks of the slice are coded using
Intra prediction.
P slice: In addition to the coding types of the I slice, macroblocks of a
P slice can also be coded using Inter prediction with at most one
motion-compensated prediction (MCP) signal per block.
B slice: In addition to the coding types available in a P slice,
macroblocks of a B slice can also be coded using Inter prediction with
two MCP signals per prediction block that are combined using a
weighted average.
SP slice: A so-called switching P slice that is coded such that efficient
and exact switching between different video streams (or efficient
jumping from place to place within a single stream) becomes possible
without the large number of bits needed for an I slice.
SI slice: A so-called switching I slice that allows an exact match with
an SP slice for random access or error recovery purposes, while using
only Intra prediction.
3) Intra-Picture Prediction:
Intra 4x4 mode is based on predicting each 4x4 luma block separately
and is well suited for coding parts of a picture with significant detail.
The Intra 16x16 mode does prediction and residual coding on the
entire 16x16 luma block and is more suited for coding very smooth
areas of a picture.
In addition, a separate chroma prediction is conducted.
In H.263 and MPEG-4 Visual, Intra prediction is conducted in the
transform domain.
Intra prediction in H.264/AVC is always conducted in the spatial
domain.
In Intra 16x16 mode, the whole 16x16 luma component of the
macroblock is predicted at once, and only four prediction modes are
supported:
I. Vertical
II. Horizontal
III. DC
IV. Plane
The chroma samples of an Intra macroblock are predicted using
similar prediction techniques as for the luma component in Intra
16x16 macroblocks.
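The Vertical, Horizontal and DC modes can be sketched as follows. The sketch uses a 4x4 block to keep the example short (the same formulas apply for 16x16), leaves out the Plane mode, and picks a mode by sum of absolute differences, which is an encoder-side choice assumed here rather than something the standard prescribes.

```python
def predict_vertical(top, n):
    """Every row repeats the reconstructed samples above the block."""
    return [list(top) for _ in range(n)]


def predict_horizontal(left, n):
    """Every column repeats the reconstructed samples left of the block."""
    return [[left[r]] * n for r in range(n)]


def predict_dc(top, left, n):
    """All samples take the rounded mean of the top and left neighbours."""
    dc = (sum(top) + sum(left) + n) // (2 * n)
    return [[dc] * n for _ in range(n)]


def sad(block, pred):
    """Sum of absolute differences between a block and its prediction."""
    return sum(abs(b - p) for brow, prow in zip(block, pred)
               for b, p in zip(brow, prow))


def best_mode(block, top, left):
    n = len(block)
    candidates = {
        "vertical": predict_vertical(top, n),
        "horizontal": predict_horizontal(left, n),
        "dc": predict_dc(top, left, n),
    }
    return min(candidates, key=lambda name: sad(block, candidates[name]))


top = [10, 20, 30, 40]
left = [10, 10, 10, 10]
block = [[10, 20, 30, 40] for _ in range(4)]  # vertical stripes
print(best_mode(block, top, left))
```

Only the residual between the block and the chosen prediction then needs to be transformed and coded.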
I_PCM Intra macroblock type
No prediction is performed and the raw values of the samples are
simply sent without compression.
This mode is primarily included for decoder implementation reasons.
4) Inter-Picture Prediction:
Inter-Picture Prediction in P Slices:
P macroblocks can be partitioned into smaller regions for MCP with
luma block sizes of 16x16, 16x8, 8x16, and 8x8 samples.
The prediction signal for each predictive-coded MxN luma block is
obtained by motion compensation (MC), which is specified by a
translational motion vector (MV) and a picture reference index.
The syntax allows MVs to point over picture boundaries.
The MV values are differentially coded using either median or
directional prediction from neighboring blocks. No MV value
prediction (or any other form of prediction) takes place across slice
boundaries.
A P macroblock can also be coded in the so-called P Skip mode.
In this mode the reconstructed signal is obtained using only a prediction
signal; no prediction residual is transmitted.
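Median MV prediction and differential coding can be sketched as below. The choice of three neighbouring blocks (left, above, above-right) and the function names are assumptions of this sketch; directional prediction and unavailable neighbours are not handled.

```python
def median_mv(a, b, c):
    """Component-wise median of three neighbouring motion vectors."""
    mx = sorted([a[0], b[0], c[0]])[1]
    my = sorted([a[1], b[1], c[1]])[1]
    return (mx, my)


def mv_difference(mv, predictor):
    """The value that is actually coded: MV minus its prediction."""
    return (mv[0] - predictor[0], mv[1] - predictor[1])


# Assumed neighbours: left, above and above-right blocks.
left, above, above_right = (4, 0), (6, -2), (5, 8)
predictor = median_mv(left, above, above_right)
mvd = mv_difference((6, 1), predictor)
print(predictor, mvd)
```

Since neighbouring blocks tend to move together, the differences are usually small and cheap to entropy code. The median also ignores a single outlier such as the (5, 8) vector above.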
Inter-Picture Prediction in B Slices:
B slices can use two distinct MCP signals for building the prediction signal.
B slices use a similar macroblock partitioning as P slices.
Bipredictive and Direct prediction modes are provided.
If no prediction error signal is transmitted for a Direct macroblock
mode, it is also referred to as B Skip mode and can be coded very
efficiently.
Weighted Prediction in P and B Slices:
In previous standards, biprediction has typically been performed with a
simple (1/2, 1/2) averaging of the two prediction signals.
In H.264/AVC, an encoder can specify scaling weights and offsets to
be used for each prediction signal in the P and B macroblocks of a
slice.
The weighting and offset values can be inferred from temporally
related relationships or can be specified explicitly.
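The difference between plain averaging and explicit weighting can be sketched per sample. The integer form below (weights with a shared log2 denominator, plus offsets) is modelled on how explicit weighting is usually described; the exact rounding should be checked against the standard, and the parameter names are illustrative.

```python
def clip(value, bit_depth=8):
    """Clamp a sample to the valid range for the given bit depth."""
    return max(0, min((1 << bit_depth) - 1, value))


def default_bipred(p0, p1):
    """Simple (1/2, 1/2) average with rounding."""
    return (p0 + p1 + 1) >> 1


def weighted_bipred(p0, p1, w0, w1, o0, o1, log_wd):
    """Weighted combination of two prediction samples (sketch).

    w0 and w1 are integer weights with denominator 2**log_wd,
    o0 and o1 are additive offsets.
    """
    value = ((p0 * w0 + p1 * w1 + (1 << log_wd)) >> (log_wd + 1)) \
        + ((o0 + o1 + 1) >> 1)
    return clip(value)


print(default_bipred(100, 50))
print(weighted_bipred(100, 50, 32, 32, 0, 0, 5))  # equal weights
print(weighted_bipred(100, 50, 48, 16, 0, 0, 5))  # leans toward p0
```

With equal weights the result matches the plain average. Unequal weights are useful for fades, where one reference picture is a better brightness match than the other.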
5) Transform, Scaling, and Quantization:
Similar to previous video coding standards, H.264/AVC uses spatial
transform coding of the prediction residual.
In H.264/AVC, the transformation is applied to 4x4 blocks (instead of
the larger 8x8 blocks used in previous standards).
A separable integer transform with similar properties to a 4x4 DCT is
used.
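A sketch of the 4x4 forward core transform, Y = C X C^T, using the integer matrix commonly given for H.264/AVC. The scaling that makes the transform orthonormal is folded into the quantizer and is not shown; helper names are illustrative.

```python
# Integer core transform matrix commonly given for H.264/AVC.
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def matmul(a, b):
    """4x4 integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_core_transform(x):
    """Apply the transform to rows and columns: Y = CF * X * CF^T."""
    return matmul(matmul(CF, x), transpose(CF))


flat = [[5] * 4 for _ in range(4)]
coeffs = forward_core_transform(flat)
print(coeffs)
```

Every entry of the matrix is 1 or 2 in magnitude, so the products reduce to additions, subtractions and shifts, and encoder and decoder get bit-identical results.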
Reasons for using a smaller transform size (4x4):
Because of the improved prediction process for both Inter and Intra coding,
the residual signal has less spatial correlation.
The smaller 4x4 transform has visual benefits, resulting in less noise
around edges.
The smaller transform requires less computation and a smaller
processing word length.
A quantization parameter (QP) is used for determining the
quantization of transform coefficients in H.264/AVC.
It can take on 52 values.
The quantization step size is controlled logarithmically by QP rather
than linearly as in previous standards.
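The logarithmic relation can be sketched as a step size that doubles for every increase of 6 in QP. The six base values are the ones commonly quoted in the H.264 literature and are not given in these slides; treat them as an assumption of this sketch.

```python
# Commonly quoted step sizes for QP 0..5 (assumption, not from the slides).
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step size for a QP in the range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


for qp in (0, 6, 12, 28, 51):
    print(qp, qstep(qp))
```

An increase of 1 in QP raises the step size by roughly 12%, so the 52 values cover a wide range of bit rates with even perceptual spacing.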
The quantized transform coefficients of a block generally are scanned
in a zigzag fashion and transmitted using entropy coding.
The 2x2 DC coefficients of the chroma component are scanned in
raster-scan order.
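A sketch of a zigzag scan for a 4x4 block. The order is generated by walking the anti-diagonals in alternating directions; the function names are illustrative, and the field-scan alternative used for interlaced material is not covered.

```python
def zigzag_order(n=4):
    """List of (row, col) positions in zigzag order for an n x n block."""
    order = []
    for s in range(2 * n - 1):
        # All positions on the anti-diagonal row + col == s, top to bottom.
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            diagonal.reverse()  # even diagonals are walked bottom to top
        order.extend(diagonal)
    return order


def zigzag_scan(block):
    """Flatten a square block into a list following the zigzag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


levels = [
    [7, 3, 0, 0],
    [2, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print([r * 4 + c for r, c in zigzag_order()])
print(zigzag_scan(levels))
```

The scan places the low-frequency coefficients first, so the non-zero levels cluster at the start and the long run of trailing zeros is cheap to code.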
All inverse transform operations in H.264/AVC can be implemented
using only additions, subtractions, and bit-shifting operations on 16-bit
integer values.
6) Entropy Coding:
In H.264/AVC, two alternatives for entropy coding are supported.
I. context-adaptive variable-length coding (CAVLC).
II. context-adaptive binary arithmetic coding (CABAC).
CABAC has higher complexity than CAVLC, but has better coding
efficiency.
When using CAVLC, the quantized transform coefficients are coded
using VLC tables that are switched depending on the values of
previous syntax elements.
CABAC uses context-conditional probability estimates and adjusts its
probability estimates to adapt to nonstationary statistical behavior.
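The benefit of adapting probability estimates can be illustrated with a toy model. This is not the CABAC state machine or its arithmetic coder: it uses a simple exponential update with an assumed adaptation rate and measures the ideal code length, -log2(p), of each binary symbol.

```python
import math


def adaptive_cost(bits, rate=0.1, p_one=0.5):
    """Ideal code length (in bits) with an estimate of P(1) that adapts.

    rate and the clamping limits are illustrative assumptions.
    """
    total = 0.0
    for b in bits:
        p = p_one if b == 1 else 1.0 - p_one
        total += -math.log2(p)
        # Move the estimate toward the symbol just seen.
        p_one += rate * (b - p_one)
        p_one = min(max(p_one, 0.01), 0.99)
    return total


# Non-stationary source: a long run of zeros followed by a run of ones.
symbols = [0] * 90 + [1] * 10
fixed_cost = float(len(symbols))  # a fixed P(1) = 0.5 costs 1 bit per symbol
print(f"fixed: {fixed_cost:.1f} bits, adaptive: {adaptive_cost(symbols):.1f} bits")
```

The adaptive estimate spends far fewer bits on the skewed part of the sequence and recovers after the statistics change.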
7) In-Loop Deblocking Filter:
Block-based coding can produce visible block artifacts,
especially at low bit rates.
H.264/AVC defines an adaptive in-loop deblocking filter.
The filter reduces blockiness while basically retaining the sharpness of
the true edges in the scene.
The filter typically reduces bit rate by 5% to 10% for the same objective
quality as the nonfiltered video, and improves subjective quality even
more.
8) Adaptive Frame/Field Coding Operation:
H.264/AVC allows the following interlace-specific coding methods:
frame mode: combining the two fields together as a frame and coding the
entire frame as a picture;
field mode: not combining the two fields and instead coding each single field
as a separate picture;
macroblock-adaptive frame/field mode (MBAFF): coding the entire frame as a
picture, but enabling the selection of individual pairs of vertically adjacent
macroblocks within the picture to be split into fields for prediction and
residual coding.
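The relation between a frame and its two fields can be sketched with list slicing. The sketch assumes an even number of rows and uses illustrative function names.

```python
def split_fields(frame):
    """Return (top_field, bottom_field): even-indexed and odd-indexed rows."""
    return frame[0::2], frame[1::2]


def merge_fields(top, bottom):
    """Interleave two fields back into a frame."""
    frame = []
    for top_row, bottom_row in zip(top, bottom):
        frame.extend([top_row, bottom_row])
    return frame


frame = [[row] * 4 for row in range(6)]  # each row holds its own index
top, bottom = split_fields(frame)
print([r[0] for r in top], [r[0] for r in bottom])
```

In field mode each of the two row sets is coded as its own picture, while in frame mode the interleaved rows are coded together.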
9) Hypothetical Reference Decoder:
A key benefit provided by a standard is the assurance that all
decoders that conform to the standard will be able to decode any
conforming compressed video bitstream.
The H.264/AVC hypothetical reference decoder (HRD) specifies operation of an
idealized decoder with two buffers having specified capacity constraints:
The coded picture buffer (CPB) models the arrival and removal timing of the
coded bits.
The decoded picture buffer (DPB) models the storage for decoded pictures.
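The role of the CPB can be illustrated with a much simplified buffer simulation. It assumes a constant arrival rate and instantaneous removal of one coded picture per frame interval; the actual HRD timing rules are more detailed, and all names and numbers here are hypothetical.

```python
def simulate_cpb(picture_sizes, bit_rate, frame_rate, cpb_size, initial_fullness):
    """Track buffer fullness in bits after each picture is removed.

    Returns (trace, status) where status is "ok", "underflow" or "overflow".
    """
    fullness = initial_fullness
    bits_per_interval = bit_rate / frame_rate
    trace = []
    for size in picture_sizes:
        if size > fullness:
            return trace, "underflow"  # picture not fully received in time
        fullness -= size               # decoder removes the coded picture
        fullness += bits_per_interval  # channel keeps delivering bits
        if fullness > cpb_size:
            return trace, "overflow"   # buffer capacity exceeded
        trace.append(fullness)
    return trace, "ok"


trace, status = simulate_cpb([200, 50, 100], bit_rate=1000, frame_rate=10,
                             cpb_size=500, initial_fullness=300)
print(trace, status)
```

An encoder can run such a model while coding to make sure the picture sizes it produces never violate the buffer constraints of the target level.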
10) Profiles and Levels:
A profile defines a syntax that can be used in generating a conforming
bitstream.
A level places constraints on the values of key parameters (such as
maximum bit rate, buffering capacity, or picture resolution).
All decoders conforming to a specific profile must support all features
in that profile.
In the first version of H.264/AVC, three profiles are defined. These are
the Baseline, Main, and Extended profiles.
The features of the H.264/AVC design
Set 0 (basic features for efficiency, robustness, and flexibility): I and
P slices, CAVLC, and other basics.
Set 1 (enhanced robustness/flexibility features): FMO (flexible
macroblock ordering), ASO (arbitrary slice ordering), and redundant
slices.
Set 2 (further enhanced robustness/flexibility features): SP/SI slices
and slice data partitioning.
Set 3 (enhanced coding efficiency features): B slices, weighted
prediction, field coding, and macroblock-adaptive frame/field coding.
Set 4 (a further coding efficiency feature): CABAC.
The Baseline profile, which emphasizes coding efficiency and
robustness with low computational complexity, supports the features
of sets 0 and 1.
The Main profile, which emphasizes primarily coding efficiency
alone, supports the features of sets 0, 3, and 4.
The Extended profile, which emphasizes robustness and flexibility
with high coding efficiency, supports the features of sets 0, 1, 2, and 3
(all features except CABAC).
The JVT experts group has done further work to extend the capabilities of
H.264/AVC with important new enhancements known as the Fidelity
Range Extensions (FRExt), including four new profiles (the High, High
10, High 4:2:2, and High 4:4:4 profiles).
For next week's presentation
Overview of the High Efficiency Video Coding (HEVC) Standard
Gary J. Sullivan, Fellow, IEEE, Jens-Rainer Ohm, Member, IEEE, Woo-Jin Han, Member, IEEE, and
Thomas Wiegand, Fellow, IEEE
High Efficiency Video Coding: The Next Frontier in Video Compression
BY Jens-Rainer Ohm and Gary J. Sullivan
IEEE SIGNAL PROCESSING MAGAZINE [152] JANUARY 2013
