
Video Coding Using Motion Compensation

(Chapter 9 continues)

Yao Wang, Polytechnic University, Brooklyn, NY 11201

Outline

Block-Based Hybrid Video Coding


    Overview of Block-Based Hybrid Video Coding
    Overlapped Block Motion Compensation
    Coding mode selection and rate control
    Loop filtering

Characteristics of Typical Videos

Frame t-1

Frame t

Adjacent frames are similar and changes are due to object or camera motion

Key Idea in Video Compression

Predict a new frame from a previous frame and code only the prediction error
    The prediction error is coded using the DCT method
    Prediction errors have smaller energy than the original pixel values and can be coded with fewer bits
Regions that cannot be predicted well are coded directly using the DCT
Work on each macroblock (MB) (16x16 pixels) independently for reduced complexity
    Motion compensation is done at the MB level
    DCT coding of the error is done at the block level (8x8 pixels)

Temporal Prediction

No Motion Compensation:
    Works well in stationary regions
    \hat{f}(t, m, n) = f(t-1, m, n)

Uni-directional Motion Compensation:
    Does not work well for regions uncovered by object motion
    \hat{f}(t, m, n) = f(t-1, m - d_x, n - d_y)

Bi-directional Motion Compensation:
    Handles covered/uncovered regions better
    \hat{f}(t, m, n) = w_b f(t-1, m - d_{b,x}, n - d_{b,y}) + w_f f(t+1, m - d_{f,x}, n - d_{f,y})
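The three temporal prediction modes can be sketched in a few lines of NumPy. This is only an illustration under simplifying assumptions: one displacement is applied to the whole frame (a real coder uses one MV per macroblock), frame borders wrap around, and the function names and the default weights of 0.5 are hypothetical.

```python
import numpy as np

def predict_no_mc(prev):
    """No motion compensation: f_hat(t, m, n) = f(t-1, m, n)."""
    return prev.astype(np.float64)

def predict_uni(ref, dx, dy):
    """Uni-directional MC: f_hat(t, m, n) = ref(m - dx, n - dy).

    np.roll gives out[m, n] = ref[m - dx, n - dy], with wrap-around at the
    borders (an assumption made here only to keep the sketch short).
    """
    return np.roll(ref.astype(np.float64), shift=(dx, dy), axis=(0, 1))

def predict_bi(prev, nxt, d_b, d_f, w_b=0.5, w_f=0.5):
    """Bi-directional MC: weighted sum of predictions from t-1 and t+1."""
    from_prev = predict_uni(prev, *d_b)  # uses frame t-1 and displacement d_b
    from_next = predict_uni(nxt, *d_f)   # uses frame t+1 and displacement d_f
    return w_b * from_prev + w_f * from_next

prev = np.arange(16).reshape(4, 4)
shifted = predict_uni(prev, 1, 0)   # every row moves down by one
error = prev - predict_no_mc(prev)  # all zeros for a static scene
```

The prediction error (current frame minus the prediction) is what the coder would then transform and quantize.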

Encoder: Typical Block-Based Hybrid Video Coder

Block-based: each frame is divided into blocks of fixed size
Hybrid: motion-compensated temporal prediction combined with transform coding
VLC: variable-length coding (Huffman)

Decoder Block Diagram

Typical Block-Based Hybrid Video Coder

If temporal prediction is successful, the prediction error block needs fewer bits than the original block. This is called P-mode.
If not, the original block is coded directly using transform coding. This is called intra-mode.
If bidirectional prediction is used, the mode is called B-mode.
Both B-mode and P-mode are inter-modes.
Hence the picture types I-frame, P-frame, and B-frame.
The blocks used for motion estimation are larger than the blocks used for the DCT and are called macroblocks (MBs).
A number of MBs form a group of blocks (GOB) or slice; several GOBs form a picture.

Different Coding Modes

MB Structure in 4:2:0 Color Format

4 8x8 Y blocks

1 8x8 Cb block

1 8x8 Cr block

Block Matching Algorithm for Motion Estimation

MV
Search Region

Frame t-1 (Reference Frame)

Frame t (Predicted frame)
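A minimal sketch of exhaustive block matching, assuming a sum-of-absolute-differences (SAD) criterion and a small search range. The function name, the `search=7` default, and the convention that the MV points from the current block to its match in the reference frame are choices made for this illustration, not part of any standard.

```python
import numpy as np

def block_match(cur, ref, top, left, block=16, search=7):
    """Full search: return ((dy, dx), sad) for the block of `cur` whose
    top-left corner is (top, left). The candidate block in `ref` sits at
    (top + dy, left + dx); candidates falling outside the frame are skipped.
    """
    target = cur[top:top + block, left:left + block].astype(np.int64)
    best_mv, best_sad = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = top + dy, left + dx
            if r < 0 or c < 0 or r + block > ref.shape[0] or c + block > ref.shape[1]:
                continue
            cand = ref[r:r + block, c:c + block].astype(np.int64)
            sad = int(np.abs(target - cand).sum())
            if best_sad is None or sad < best_sad:
                best_mv, best_sad = (dy, dx), sad
    return best_mv, best_sad

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(64, 64))
cur = np.roll(ref, shift=(2, -3), axis=(0, 1))  # content moved down 2, left 3
mv, sad = block_match(cur, ref, top=24, left=24)
```

In this synthetic case the search finds a perfect match (SAD of zero) at the displacement that undoes the shift. Faster search strategies visit fewer candidates at the cost of possibly missing the best one.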



Macroblock Coding in I-Mode

DCT transform each 8x8 DCT block

Quantize the DCT coefficients with properly chosen quantization matrices

The quantized DCT coefficients are zig-zag ordered and run-length coded
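The zig-zag ordering and run-length step can be sketched as follows. This is an illustrative simplification: all 64 coefficients, including the DC term, are treated the same way, and the `"EOB"` marker and function names are hypothetical stand-ins for the actual code tables.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag order."""
    def key(rc):
        r, c = rc
        s = r + c  # index of the anti-diagonal
        # Odd diagonals are walked top-right to bottom-left (row increasing),
        # even diagonals bottom-left to top-right (column increasing).
        return (s, r if s % 2 else c)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)

def run_level_pairs(block):
    """Convert a quantized block into (zero_run, level) pairs plus an EOB marker."""
    pairs, run = [], 0
    for r, c in zigzag_order(len(block)):
        value = block[r][c]
        if value == 0:
            run += 1            # count zeros preceding the next non-zero value
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")         # trailing zeros are implied by end-of-block
    return pairs

block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = 12, -3, 5
print(run_level_pairs(block))   # [(0, 12), (0, -3), (1, 5), 'EOB']
```

Because quantization leaves most high-frequency coefficients at zero, the zig-zag scan groups them into long runs at the end of the block, which the end-of-block marker represents at almost no cost.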


Macroblock Coding in P-Mode

Estimate one MV for each macroblock (16x16)

Depending on the motion compensation error, determine the coding mode (intra, inter-with-no-MC, inter-with-MC, etc.)

The original values (for intra mode) or motion compensation errors (for inter mode) in each of the DCT blocks (8x8) are DCT transformed, quantized, zig-zag/alternate scanned, and run-length coded


Macroblock Coding in B-Mode

Same as for the P-mode, except a macroblock can be predicted from a previous picture, a following one, or both.

vf

vb


Overlapped Block Motion Compensation (OBMC)


Conventional block motion compensation
    One best-matching block is found in a reference frame
    The current block is replaced by the best-matching block

OBMC
    Each pixel in the current block is predicted by a weighted average of several corresponding pixels in the reference frame
    The corresponding pixels are determined by the MVs of the current MB as well as those of adjacent MBs
    The weight for each corresponding pixel depends on the expected accuracy of the associated MV

\psi(x), \psi_r(x), \psi_p(x): coded, reference, and predicted frames

OBMC Using 4 Neighboring MBs

Should be inversely proportional to the distance between x and the center of


Optimal Weighting Design


Convert to an optimization problem:
Minimize

Subject to

Optimal weighting functions (solution):


(R autocorrelation matrix, r cross-correlation vector)
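One standard way to write this constrained least-squares problem is sketched below. The notation is an assumption made for illustration: p_k(x) denotes the reference-frame pixel selected by the MV of the k-th neighboring MB, h_k(x) its weight, \psi(x) the pixel being predicted, and \mathbf{1} a vector of ones.

```latex
\begin{aligned}
&\text{Minimize}   && E\Big\{ \big| \psi(x) - \textstyle\sum_{k} h_k(x)\, p_k(x) \big|^2 \Big\} \\
&\text{subject to} && \textstyle\sum_{k} h_k(x) = 1 \\[4pt]
&\text{With}       && \mathbf{p} = [p_1(x), \dots, p_K(x)]^T,\quad
                      \mathbf{R} = E\{\mathbf{p}\,\mathbf{p}^T\},\quad
                      \mathbf{r} = E\{\psi(x)\,\mathbf{p}\} \\[4pt]
&\text{Solution}   && \mathbf{h} = \mathbf{R}^{-1}\!\left( \mathbf{r}
                      - \frac{\mathbf{1}^T \mathbf{R}^{-1}\mathbf{r} - 1}
                             {\mathbf{1}^T \mathbf{R}^{-1}\mathbf{1}}\, \mathbf{1} \right)
\end{aligned}
```

The solution follows from setting the gradient of the Lagrangian \mathbf{h}^T\mathbf{R}\mathbf{h} - 2\mathbf{h}^T\mathbf{r} + \lambda(\mathbf{1}^T\mathbf{h} - 1) to zero and choosing \lambda so that the weights sum to one.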


How to Determine MVs with OBMC

Option 1: use the conventional block-matching algorithm (BMA), minimizing the prediction error (MAD) within each MB independently
Option 2: minimize the prediction error assuming OBMC
    Solve for the MV of the current MB while keeping fixed the MVs of the neighboring MBs found in previous iterations

Weighting Coefficients Used in H.263


Window Function Corresponding to H.263 Weights for OBMC


Rate Control

Rate control:
    The coding method necessarily yields a variable bit rate
    Rate control is necessary when the video is to be sent over a constant bit rate (CBR) channel, where the rate averaged over a short period should be constant
    The fluctuation within the period can be smoothed by a buffer at the encoder output

Problem of rate control:


    Step 1) Determine the target rate at the frame, GOB, and MB level, based on the current buffer fullness
    Step 2) Satisfy the frame-level target rate by varying the frame rate (skip frames when necessary)
    Step 3) Satisfy the GOB/MB-level target rate by varying the coding mode at each MB
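Steps 1 and 2 can be illustrated with a toy encoder-buffer model. Everything here is a hypothetical sketch: the linear rule that maps buffer fullness to a frame target and the skip-on-overflow policy are assumptions chosen for simplicity, not rules taken from any standard.

```python
def frame_target_bits(channel_rate_bps, frame_rate_hz, buffer_bits, buffer_size_bits):
    """Step 1 (sketch): shrink the frame budget as the buffer fills.

    Assumed rule: a half-full buffer gets the nominal budget, a full buffer
    gets zero, and an empty buffer gets twice the nominal budget.
    """
    nominal = channel_rate_bps / frame_rate_hz
    fullness = buffer_bits / buffer_size_bits
    return max(0.0, 2.0 * (1.0 - fullness) * nominal)

def simulate_buffer(frame_bits, channel_rate_bps, frame_rate_hz, buffer_size_bits):
    """Step 2 (sketch): skip any frame that would overflow the buffer.

    The channel drains channel_rate / frame_rate bits per frame interval.
    Returns the per-frame decisions and the final buffer occupancy.
    """
    drain = channel_rate_bps / frame_rate_hz
    buffer_bits, decisions = 0.0, []
    for bits in frame_bits:
        if buffer_bits + bits > buffer_size_bits:
            decisions.append("skip")
            bits = 0
        else:
            decisions.append("code")
        buffer_bits = max(0.0, buffer_bits + bits - drain)
    return decisions, buffer_bits

decisions, level = simulate_buffer([4000, 4000, 20000, 4000],
                                   channel_rate_bps=64000,
                                   frame_rate_hz=10,
                                   buffer_size_bits=16000)
print(decisions)  # ['code', 'code', 'skip', 'code']
```

A practical controller would also act at the GOB/MB level (Step 3), for example by changing the coding mode or quantizer step size for the remaining MBs of a frame.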


Video Coding Standards


(Chapter 13)

Yao Wang, Polytechnic University, Brooklyn, NY 11201

Outline

Overview of Standards and Their Applications
ITU-T (International Telecommunication Union) Standards for Audio-Visual Communications
    H.261
    H.263
    H.263+, H.263++

ISO (International Organization for Standardization) Standards for


    MPEG-1
    MPEG-2
    MPEG-4
    MPEG-7

Standardization

ITU: International Telecommunication Union, 1865
International Radiotelegraph Convention signed, 1906
CCIF: International Telephone Consultative Committee, 1924
CCIT: International Telegraph Consultative Committee, 1925
CCIR: International Radio Consultative Committee, 1927
CCIT and CCIF merged into CCITT, 1956; CCITT published H.261 in 1989
CCIR became ITU-R and CCITT became ITU-T, 1992
ITU-T has Study Groups (SG); SG 16 covers multimedia. Each SG is divided into Working Parties (WP), each dealing with several Questions

http://www.itu.int/home/index.html

Standardization

International Electrotechnical Commission (IEC), 1906
International Organization for Standardization (ISO), 1947
Joint ISO/IEC Technical Committee 1 (JTC1) on Information Technology
    Subcommittee 24: Computer Graphics and Image Processing (VRML)
    Subcommittee 29: Coding (MPEG)

http://www.iso.org/iso/en/ISOOnline.frontpage

Multimedia Communications Standards and Applications

Standards     | Application                                             | Video Format    | Raw Data Rate | Compressed Data Rate
H.320 (H.261) | Video conferencing over ISDN                            | CIF             | 37 Mbps       | >=384 Kbps
              |                                                         | QCIF            | 9.1 Mbps      | >=64 Kbps
H.323 (H.263) | Video conferencing over Internet                        | 4CIF/CIF/QCIF   |               | >=64 Kbps
H.324 (H.263) | Video over phone lines/wireless                         | QCIF            | 9.1 Mbps      | >=18 Kbps
MPEG-1        | Video distribution on CD/WWW                            | CIF             | 30 Mbps       | 1.5 Mbps
MPEG-2        | Video distribution on DVD/digital TV                    | CCIR601 4:2:0   | 128 Mbps      | 3-10 Mbps
MPEG-4        | Multimedia distribution over Inter/Intranet             | QCIF/CIF        |               | 28-1024 Kbps
GA-HDTV       | HDTV broadcasting                                       | SMPTE296/295    | <=700 Mbps    | 18-45 Mbps
MPEG-7        | Multimedia databases (content description and retrieval) |                 |               |
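The raw data rates quoted for CIF and QCIF can be checked with a short calculation. The sketch assumes 8 bits per sample, 30 frames per second, and 4:2:0 sampling (chroma adds half as many samples as luma, hence the factor 1.5); the function name is hypothetical.

```python
def raw_rate_mbps(width, height, fps=30, bits_per_sample=8, chroma_factor=1.5):
    """Uncompressed bit rate in Mbit/s for a 4:2:0 video format."""
    samples_per_frame = width * height * chroma_factor  # Y plus Cb and Cr
    return samples_per_frame * bits_per_sample * fps / 1e6

cif = raw_rate_mbps(352, 288)    # about 36.5 Mbit/s, quoted as 37 Mbps
qcif = raw_rate_mbps(176, 144)   # about 9.1 Mbit/s
print(f"CIF: {cif:.1f} Mbit/s, QCIF: {qcif:.1f} Mbit/s")
```

Compared with a compressed rate of 64 Kbps, the QCIF figure corresponds to a compression ratio of roughly 140:1.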

H.261 Video Coding Summary


H.261 is a 1990 ITU video coding standard originally designed for transmission over ISDN lines, on which data rates are multiples of 64 kbit/s. The coding algorithm was designed to operate at data rates between 40 kbit/s and 2 Mbit/s. The standard supports CIF and QCIF video frames with luma resolutions of 352x288 and 176x144, respectively (and 4:2:0 sampling with chroma resolutions of 176x144 and 88x72, respectively).

History: H.261 was the first practical digital video coding standard. Its design was a pioneering effort: all subsequent international video coding standards (MPEG-1, MPEG-2/H.262, H.263, and even H.264) have been based on it.

Design: The basic processing unit of the design is called a macroblock. Each macroblock consists of a 16x16 array of luma samples and two corresponding 8x8 arrays of chroma samples, using 4:2:0 sampling and a YCbCr color space. Inter-picture prediction removes temporal redundancy, with motion vectors (MVs) used to help the codec compensate for motion. Transform coding using an 8x8 discrete cosine transform (DCT) removes spatial redundancy. Scalar quantization is then applied to round the transform coefficients to the appropriate precision, and the quantized transform coefficients are zig-zag scanned and entropy coded (using a run-level variable-length code) to remove statistical redundancy.

H.261 Video Coding Standard


For video-conferencing/video phone
    Video coding standard in H.320
    Low delay (real-time, interactive)
    Slow motion in general

For transmission over ISDN
    Fixed bandwidth: p x 64 Kbps, p = 1, 2, ..., 30

Video format:
    CIF (352x288, above 128 Kbps)
    QCIF (176x144, 64-128 Kbps)
    4:2:0 color format, progressive scan

Published in 1990
Each macroblock can be coded in intra- or inter-mode
Periodic insertion of intra-mode to eliminate error propagation due to network impairments
Integer-pel accuracy motion estimation in inter-mode

H.261 Encoder

T: transform
Q: quantizer
F: loop filter (low-pass filters the prediction)
P: motion estimation and compensation

Coding control signals:
    p: flag, INTRA/INTER
    t: flag
    qz: quantizer indication
    q: quantizing index for transform coefficients
    d: motion vector
    f: loop filter on/off

DCT Coefficient Quantization

DC coefficient in intra-mode: uniform, step size = 8

Others: uniform with dead zone, step size = 2 to 64 (MQUANT)
    Dead zone: [-T, T]
    Avoids coding too many small coefficients, which are typically due to noise
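The difference between the two quantizers can be sketched as follows. This is a generic illustration, not the exact H.261 reconstruction rule: the dead zone is assumed to equal one step size, and the mid-interval reconstruction and function names are hypothetical.

```python
import math

def quantize_intra_dc(coef, step=8):
    """Uniform quantizer without dead zone: round to the nearest multiple of step."""
    return int(math.floor(coef / step + 0.5))

def quantize_deadzone(coef, step):
    """Uniform quantizer with dead zone: magnitudes below `step` map to index 0."""
    magnitude = int(abs(coef) // step)   # truncation toward zero widens the zero bin
    return magnitude if coef >= 0 else -magnitude

def dequantize_deadzone(index, step):
    """Reconstruct at the middle of the interval (an assumed rule for this sketch)."""
    if index == 0:
        return 0.0
    sign = 1 if index > 0 else -1
    return sign * (abs(index) + 0.5) * step

for c in (7, -7, 20, -20):
    q = quantize_deadzone(c, step=8)
    print(c, "->", q, "->", dequantize_deadzone(q, step=8))
```

With a step size of 8, the inputs 7 and -7 fall inside the dead zone and produce index 0, so small noise-like coefficients cost no bits, while 20 maps to index 2.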


Motion Estimation and Compensation

Integer-pel accuracy in the range [-16, 16]
Methods for generating the MVs are not specified in the standard
    (Standards only define the bit stream syntax, or the decoder operation)
MVs are coded differentially (DMV)
Encoder and decoder both use the decoded MVs to perform motion compensation
Loop filtering can be applied to suppress temporal propagation of coding noise
    Separable filter [1/4, 1/2, 1/4]
    The loop filter can be turned on or off
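A minimal sketch of the separable [1/4, 1/2, 1/4] loop filter applied to one 8x8 prediction block. Two simplifications are assumed: pixels on the block edge are left unfiltered in the direction where a tap would fall outside the block, and the integer rounding a real codec performs is omitted. The function name is hypothetical.

```python
import numpy as np

def loop_filter_block(block):
    """Apply the 1-D filter [1/4, 1/2, 1/4] along rows, then along columns."""
    b = block.astype(np.float64)
    out = b.copy()
    # Horizontal pass on interior columns; edge columns keep their values.
    out[:, 1:-1] = 0.25 * b[:, :-2] + 0.5 * b[:, 1:-1] + 0.25 * b[:, 2:]
    tmp = out.copy()
    # Vertical pass on interior rows; edge rows keep their values.
    out[1:-1, :] = 0.25 * tmp[:-2, :] + 0.5 * tmp[1:-1, :] + 0.25 * tmp[2:, :]
    return out

impulse = np.zeros((8, 8))
impulse[4, 4] = 16
smoothed = loop_filter_block(impulse)
print(smoothed[3:6, 3:6])   # 3x3 kernel [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
```

The impulse response shows the equivalent 2-D low-pass kernel: the filter spreads isolated coding noise in the prediction over neighboring pixels instead of letting it propagate unchanged from frame to frame.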


Variable Length Coding

DCT coefficients are converted into run-length representations and then coded using VLC (Huffman coding for each pair of symbols)
    Symbol: (zero run-length, non-zero value range)

Other information is also coded using VLC (Huffman coding)

Parameter Selection and Rate Control

MTYPE: macroblock type (intra vs. inter, zero vs. non-zero MV in inter)
CBP: coded block pattern (which blocks in an MB have non-zero DCT coefficients)
MQUANT: optional (allows the quantizer step size to change at the MB level); should be varied to satisfy the rate constraint
MV: ideally determined not only by the prediction error but also by the total bits used for coding the MV and the DCT coefficients of the prediction error
Loop filter: on/off

Formats supported

Note: H.261 encoder must support specific formats


H.263 Video Coding Summary


H.263 is a video codec originally designed by the ITU-T in 1995/1996 as a low-bitrate compressed format standard for videoconferencing. It is one member of the H.26x family of video coding standards in the domain of the ITU-T Video Coding Experts Group (VCEG).

The codec was first designed for use in H.324-based systems (PSTN and other circuit-switched network videoconferencing and videotelephony), but has since found use in H.323 (RTP/IP-based videoconferencing), H.320 (ISDN-based videoconferencing), RTSP (streaming media), and SIP (Internet conferencing) solutions as well.

Most Flash Video content (as used on sites such as YouTube, Google Video, and MySpace) is encoded in this format, though some sites now use VP6 encoding, which has been supported since Flash 8. H.263 video can be decoded with the free LGPL-licensed libavcodec library (part of the ffmpeg project), which is used by programs such as ffdshow, VLC media player, and MPlayer.

H.263 was developed as an evolutionary improvement based on experience from H.261, the previous ITU-T standard for video compression, and from the MPEG-1 and MPEG-2 standards. Its first version was completed in 1995 and provided a suitable replacement for H.261 at all bitrates.

H.263 Video Coding Standard

H.263 is the video coding standard in H.323/H.324, targeted at visual telephony over PSTN or the Internet
Developed later than H.261, so it can accommodate computationally more intensive options
    Initial version (H.263 baseline): 1995
    H.263+: 1997
    H.263++: 2000
Goal: improved quality at lower rates
Result: significantly better quality at lower rates
    Better video at 18-24 Kbps than H.261 at 64 Kbps
    Enables video phone over regular phone lines (28.8 Kbps) or wireless modem

Improvements over H.261


Better motion estimation
    Half-pel accuracy motion estimation with bilinear interpolation filter
    Larger motion search range [-31.5, 31], and unrestricted MVs at boundary blocks
    More efficient predictive coding for MVs (median prediction using three neighbors)
    Overlapped block motion compensation (option)
    Variable block size: 16x16 -> 8x8, 4 MVs per MB (option)
    Bidirectional temporal prediction (PB-picture) (option)
3-D VLC for DCT coefficients (run-length, value, EOB: end of block)
Syntax-based arithmetic coding (option)
    4% savings in bit rate for P-mode, 10% savings for I-mode, at 50% more computation
The options, when chosen properly, can improve the PSNR by 0.5-1.5 dB over the default in the 20-70 kbps range.
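Half-pel motion compensation needs reference samples at positions halfway between pixels. The sketch below shows bilinear interpolation with coordinates given in half-pel units; the function name is hypothetical, and the integer rounding used by an actual codec is left out.

```python
import numpy as np

def sample_half_pel(ref, y2, x2):
    """Return the reference sample at (y2 / 2, x2 / 2) by bilinear interpolation.

    y2 and x2 are non-negative integers in half-pel units: even values land on
    a pixel, odd values land halfway between two neighboring pixels.
    """
    y0, x0 = y2 // 2, x2 // 2
    y1, x1 = y0 + (y2 % 2), x0 + (x2 % 2)  # same as (y0, x0) on integer positions
    # Averaging the four surrounding samples covers all cases: integer positions
    # repeat one pixel four times, half positions average two or four pixels.
    return (ref[y0, x0] + ref[y0, x1] + ref[y1, x0] + ref[y1, x1]) / 4.0

ref = np.array([[0.0, 4.0],
                [8.0, 12.0]])
print(sample_half_pel(ref, 0, 1))  # halfway between 0 and 4 -> 2.0
print(sample_half_pel(ref, 1, 1))  # center of the four pixels -> 6.0
```

An MV of (0.5, 0) therefore predicts each pixel from the average of two horizontally adjacent reference pixels, which also acts as a mild low-pass filter on the prediction.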

Prediction of MVs

Half-pixel motion compensation if needed
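Median prediction of MVs can be sketched as below. The median is taken separately for each component, and only the difference from the predictor is coded. The choice of the left, above, and above-right MBs as the three neighbors is stated here as an assumption, and the function names are hypothetical.

```python
def median_mv_predictor(left, above, above_right):
    """Component-wise median of three neighboring MVs, each an (x, y) pair."""
    xs = sorted(mv[0] for mv in (left, above, above_right))
    ys = sorted(mv[1] for mv in (left, above, above_right))
    return (xs[1], ys[1])   # middle element of three sorted values

def mv_difference(mv, left, above, above_right):
    """Difference between the actual MV and its median predictor."""
    px, py = median_mv_predictor(left, above, above_right)
    return (mv[0] - px, mv[1] - py)

neighbors = ((1, 2), (3, -1), (2, 5))
print(median_mv_predictor(*neighbors))       # (2, 2)
print(mv_difference((2.5, 2), *neighbors))   # (0.5, 0)
```

Because neighboring MBs usually move together, the difference is typically small and costs fewer bits than the MV itself; the median also ignores a single outlier neighbor.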


Prediction of MVs

The encoder needs to decide for which MBs 4 MVs should be used, since extra bits are needed to code them

PB-Picture Mode

PB-picture mode codes two pictures as a group. The second picture (P) is coded first; then the first picture (B) is coded using both the P-picture and the previously coded picture. This avoids the reordering of pictures required in the normal B-mode, but it still incurs more coding delay than using P-frames only.

In a B-block, forward prediction (from the previous frame) can be used for all pixels; backward prediction (from the future frame) is used only for those pels for which the backward motion vector points to pels inside the current MB. Pixels in the white area use only forward prediction.

An improved PB-frame mode, defined in H.263+, removes this restriction.

Performance of H.261 and H.263

[Figure: performance comparison on the Foreman sequence, QCIF, 12.5 Hz. Curves: OBMC, 4 MVs, etc.; half-pel MC, +/- 32; integer MC, +/- 16, loop filter; integer MC, +/- 32; integer MC, +/- 16]

H.323 protocols for multimedia


ITU-T Multimedia Communications Standards


(multimedia communication over PSTN)

H.324 Terminal
