
Chapter 3 HUFFMAN CODING

Yeuan-Kuen Lee [ MCU, CSIE ]

Outline

3.1 Overview
3.2 The Huffman Coding Algorithm
    3.2.1 Minimum Variance Huffman Codes
    3.2.2 Optimality of Huffman Codes (*)
    3.2.3 Length of Huffman Codes (*)
    3.2.4 Extended Huffman Codes (*)
3.3 Nonbinary Huffman Codes (*)
3.4 Adaptive Huffman Coding
    3.4.1 Update Procedure
    3.4.2 Encoding Procedure
    3.4.3 Decoding Procedure
3.5 Golomb Codes
3.6 Rice Codes
    3.6.1 CCSDS Recommendation for Lossless Compression
3.7 Tunstall Codes
3.8 Applications of Huffman Coding
    3.8.1 Lossless Image Compression
    3.8.2 Text Compression
    3.8.3 Audio Compression
3.9 Summary
3.10 Projects and Problems

3.1 Overview

In this chapter, we describe a very popular coding algorithm called the Huffman coding algorithm. We will:
- present a procedure for building Huffman codes when the probability model for the source is known;
- present a procedure for building codes when the source statistics are unknown;
- describe techniques for code design that are in some sense similar to the Huffman coding approach;
- look at some applications.

3.2 The Huffman Coding Algorithm

This technique was developed by David Huffman as part of a class assignment; the class was the first ever in the area of information theory and was taught by Robert Fano at MIT. The codes generated using this technique are called Huffman codes. These codes are
- prefix codes, and
- optimum for a given model (set of probabilities).
The technique is based on two observations regarding optimum prefix codes:
1. In an optimum code, symbols that occur more frequently (have a higher probability of occurrence) will have shorter codewords than symbols that occur less frequently.
2. In an optimum code, the two symbols that occur least frequently will have codewords of the same length.

3.2 The Huffman Coding Algorithm

In an optimum code, the two symbols that occur least frequently will have codewords of the same length. To see this, suppose an optimum code C exists in which the two codewords corresponding to the two least probable symbols do not have the same length, and suppose the longer codeword is k bits longer than the shorter one. Because C is a prefix code, the shorter codeword cannot be a prefix of the longer one, so even after we drop the last k bits of the longer codeword the two codewords remain distinct. As these codewords correspond to the least probable symbols in the alphabet, no other codeword can be longer than these codewords; therefore there is no danger that the shortened codeword would become the prefix of some other codeword.


3.2 The Huffman Coding Algorithm

Furthermore, by dropping these k bits we obtain a new code that has a shorter average length than C. But this violates our initial contention that C is an optimal code. Therefore, for an optimal code the second observation also holds true.

A simple requirement
The Huffman procedure adds a simple requirement to these two observations: the codewords corresponding to the two lowest probability symbols differ only in the last bit. That is, if γ and δ are the two least probable symbols in an alphabet, and the codeword for γ is m*0, then the codeword for δ is m*1. Here m is a string of 1s and 0s, and * denotes concatenation.

3.2 The Huffman Coding Algorithm

Example 3.2.1 Design of a Huffman code
An alphabet A = { a1, a2, a3, a4, a5 } with
P( a1 ) = P( a3 ) = 0.2,  P( a2 ) = 0.4,  P( a4 ) = P( a5 ) = 0.1
The entropy is
H = -2 * 0.2 log2(0.2) - 0.4 log2(0.4) - 2 * 0.1 log2(0.1) = 2.122 bits/symbol

Table 3.1 The initial five-letter alphabet
Letter    Probability    Codeword
a2        0.4            c(a2)
a1        0.2            c(a1)
a3        0.2            c(a3)
a4        0.1            c(a4)
a5        0.1            c(a5)

The two symbols with the lowest probability are a4 and a5. We assign their codewords as
c(a4) = α1 * 0
c(a5) = α1 * 1
where α1 is a binary string and * denotes concatenation.
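As a quick arithmetic check, the entropy stated above can be reproduced in a couple of lines; a minimal sketch of our own (variable names are illustrative):

import math

probs = [0.4, 0.2, 0.2, 0.1, 0.1]          # P(a2), P(a1), P(a3), P(a4), P(a5)
entropy = -sum(p * math.log2(p) for p in probs)
print(round(entropy, 3))                    # 2.122 bits/symbol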


3.2 The Huffman Coding Algorithm

Define a new alphabet A' = { a1, a2, a3, a4' } where a4' is composed of a4 and a5, and P( a4' ) = P( a4 ) + P( a5 ) = 0.2.

Table 3.2 The reduced four-letter alphabet
Letter    Probability    Codeword
a2        0.4            c(a2)
a1        0.2            c(a1)
a3        0.2            c(a3)
a4'       0.2            α1

In this alphabet A', a3 and a4' are the two letters at the bottom of the sorted list. We assign their codewords as
c(a3)  = α2 * 0
c(a4') = α2 * 1
but c(a4') = α1. Therefore α1 = α2 * 1, which means that
c(a4) = α1 * 0 = α2 * 10
c(a5) = α1 * 1 = α2 * 11

3.2 The Huffman Coding Algorithm

We again define a new alphabet A'' = { a1, a2, a3' } where a3' is composed of a3 and a4', and P( a3' ) = P( a3 ) + P( a4' ) = 0.4.

Table 3.3 The reduced three-letter alphabet
Letter    Probability    Codeword
a2        0.4            c(a2)
a3'       0.4            α2
a1        0.2            c(a1)

In this case, the least probable symbols are a3' and a1. Therefore,
c(a3') = α3 * 0
c(a1)  = α3 * 1
but c(a3') = α2. Therefore α2 = α3 * 0, which means that
c(a3) = α2 * 0  = α3 * 00
c(a4) = α2 * 10 = α3 * 010
c(a5) = α2 * 11 = α3 * 011


3.2 The Huffman Coding Algorithm

We again define a new alphabet A''' = { a3'', a2 } where a3'' is composed of a3' and a1, and P( a3'' ) = P( a3' ) + P( a1 ) = 0.6.

Table 3.4 The reduced two-letter alphabet
Letter    Probability    Codeword
a3''      0.6            α3
a2        0.4            c(a2)

We now have only two letters, so the codeword assignment is straightforward:
c(a3'') = 0
c(a2)   = 1
but c(a3'') = α3. Therefore α3 = 0, which means that
c(a1) = α3 * 1   = 01
c(a3) = α3 * 00  = 000
c(a4) = α3 * 010 = 0010
c(a5) = α3 * 011 = 0011

3.2 The Huffman Coding Algorithm

Table 3.5 Huffman code for the original five-letter alphabet
Letter    Probability    Codeword
a2        0.4            1
a1        0.2            01
a3        0.2            000
a4        0.1            0010
a5        0.1            0011

The average length for this code is
l = 0.4*1 + 0.2*2 + 0.2*3 + 0.1*4 + 0.1*4 = 2.2 bits/symbol
A measure of the efficiency of this code is its redundancy, the difference between the average length and the entropy. In this case, the redundancy = 2.2 - 2.122 = 0.078 bits/symbol.
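The average length and redundancy can likewise be verified with a few lines; again a small sketch of our own (the dictionary layout is illustrative):

import math

code = {"a2": ("1", 0.4), "a1": ("01", 0.2), "a3": ("000", 0.2),
        "a4": ("0010", 0.1), "a5": ("0011", 0.1)}        # Table 3.5

avg_len = sum(p * len(cw) for cw, p in code.values())
entropy = -sum(p * math.log2(p) for _, p in code.values())
print(round(avg_len, 1), round(avg_len - entropy, 3))     # 2.2  0.078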


3.2 The Huffman Coding Algorithm

[Figure 3.1 The Huffman encoding procedure. The symbol probabilities are listed in parentheses; at each stage the letters a2 (0.4), a1 (0.2), a3 (0.2), a4 (0.1), a5 (0.1) are sorted by probability and the two least probable letters are combined, with 0/1 labels on the branches.]

3.2 The Huffman Coding Algorithm

We build the binary tree starting at the leaf nodes.

[Figure 3.2 Building the binary Huffman tree: the leaves a2 (0.4), a1 (0.2), a3 (0.2), a4 (0.1), a5 (0.1) are combined pairwise into internal nodes of weight 0.2, 0.4, 0.6, and finally the root (1.0), with 0/1 labels on the branches.]

Notice the similarity between Figures 3.1 and 3.2. This is not surprising, as they are a result of viewing the same procedure in two different ways.
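For readers who want to experiment, the tree-building view can be sketched with a priority queue. This is our own minimal illustration, not the book's code; ties may be broken differently than in Figure 3.2, so the exact bit patterns can differ while the codeword lengths stay optimal:

import heapq
from itertools import count

def huffman_code(probabilities):
    """Build a binary Huffman code by repeatedly merging the two least probable subtrees."""
    tick = count()                                    # tie-breaker so heap entries always compare
    heap = [(p, next(tick), {sym: ""}) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)            # least probable subtree
        p1, _, code1 = heapq.heappop(heap)            # second least probable subtree
        merged = {s: "0" + c for s, c in code0.items()}
        merged.update({s: "1" + c for s, c in code1.items()})
        heapq.heappush(heap, (p0 + p1, next(tick), merged))
    return heap[0][2]

print(huffman_code({"a1": 0.2, "a2": 0.4, "a3": 0.2, "a4": 0.1, "a5": 0.1}))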

3.2.1 Minimum Variance Huffman Codes

The Huffman procedure is not unique when there are ties in the probabilities. Starting again from the reduced four-letter alphabet of Table 3.2, we can instead place the combined letter a4' as high as possible in the sorted list (Table 3.6), and do the same at each later step.

Table 3.2 Reduced four-letter alphabet
Letter    Probability    Codeword
a2        0.4            c(a2)
a1        0.2            c(a1)
a3        0.2            c(a3)
a4'       0.2            α1

Table 3.6 Reduced four-letter alphabet (combined letter placed as high as possible)
Letter    Probability    Codeword
a2        0.4            c(a2)
a4'       0.2            α1
a1        0.2            c(a1)
a3        0.2            c(a3)

3.2.1 Minimum Variance Huffman Codes

Table 3.7 Reduced three-letter alphabet (a1' combines a1 and a3)
Letter    Probability    Codeword
a1'       0.4            α2
a2        0.4            c(a2)
a4'       0.2            α1

Table 3.8 Reduced two-letter alphabet (a2' combines a2 and a4')
Letter    Probability    Codeword
a2'       0.6            α3
a1'       0.4            α2

3.2.1 Minimum Variance Huffman Codes

Table 3.9 Minimum variance Huffman code
Letter    Probability    Codeword
a1        0.2            10
a2        0.4            00
a3        0.2            11
a4        0.1            010
a5        0.1            011

3.2.1 Minimum Variance Huffman Codes

[Figure 3.3 The minimum variance Huffman encoding procedure: the same combining steps as in Figure 3.1, with the combined letter placed as high as possible in the sorted list at each stage.]

The average length for this code is
l = 0.4*2 + 0.2*2 + 0.2*2 + 0.1*3 + 0.1*3 = 2.2 bits/symbol
These two codes are identical in terms of their redundancy. However, the variance of the length of the codewords is significantly different.
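The last remark can be made concrete with a short arithmetic sketch of our own, using the codeword lengths from Tables 3.5 and 3.9:

first_code   = [(0.4, 1), (0.2, 2), (0.2, 3), (0.1, 4), (0.1, 4)]   # lengths from Table 3.5
min_var_code = [(0.4, 2), (0.2, 2), (0.2, 2), (0.1, 3), (0.1, 3)]   # lengths from Table 3.9

def length_variance(code):
    mean = sum(p * l for p, l in code)               # 2.2 bits/symbol for both codes
    return sum(p * (l - mean) ** 2 for p, l in code)

print(round(length_variance(first_code), 2),
      round(length_variance(min_var_code), 2))       # 1.36 vs 0.16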

3.2.1 Minimum Variance Huffman Codes

[Figure 3.4 Two Huffman trees corresponding to the same probabilities: the tree of Figure 3.2 and the minimum variance tree.]

3.4 Adaptive Huffman Coding

Two parameters are added to the binary tree:
1. Weight: for an external node (leaf), the number of times the symbol has been encountered; for an internal node, the sum of the weights of its offspring.
2. Node number: a unique number assigned to each node.
For an alphabet of size n there are 2n-1 nodes (internal + external), with node numbers y1, y2, y3, ..., y(2n-1) and weights x1, x2, x3, ..., x(2n-1) such that x1 <= x2 <= x3 <= ... <= x(2n-1).

Sibling property: nodes y(2j-1) and y(2j) are siblings for 1 <= j < n, and the node number of their parent is greater than y(2j-1) and y(2j).

3.4 Adaptive Huffman Coding

[Diagram: symbols enter the transmitter, which produces a code stream such as 01101 for the receiver.]

Before the beginning of transmission, a fixed code for each symbol is agreed upon between transmitter and receiver, and both start from the same initial tree, which consists of a single NYT (not yet transmitted) node.

If the source has an alphabet ( a1, a2, ..., am ) of size m, then pick e and r such that m = 2^e + r and 0 <= r < 2^e.
Example: m = 26, 26 = 2^4 + 10, so e = 4 and r = 10.

The letter ak is encoded as
- the (e+1)-bit binary representation of k-1, if 1 <= k <= 2r;
- the e-bit binary representation of k-r-1, otherwise.

For m = 26:
a1:  1 <= 2*10,  send the binary code for 1-1 = 0:      00000 (5 bits)
a2:  2 <= 2*10,  send the binary code for 2-1 = 1:      00001 (5 bits)
a22: 22 > 2*10,  send the binary code for 22-10-1 = 11: 1011 (4 bits)

As transmission progresses, nodes corresponding to symbols transmitted will be added to the tree, and the tree is reconfigured using an update procedure.
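This fixed code is easy to implement; a minimal sketch of our own (the function name is illustrative):

def fixed_code(k, e, r):
    """Fixed binary code for the k-th letter (1-based) of an alphabet of size 2**e + r."""
    if 1 <= k <= 2 * r:
        return format(k - 1, "0{}b".format(e + 1))     # (e+1)-bit representation of k-1
    return format(k - r - 1, "0{}b".format(e))         # e-bit representation of k-r-1

# m = 26 = 2**4 + 10, so e = 4 and r = 10
print(fixed_code(1, 4, 10), fixed_code(2, 4, 10), fixed_code(22, 4, 10))
# 00000 00001 1011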

3.4 Adaptive Huffman Coding

When a symbol is encountered for the first time:
1. the code for the NYT node is transmitted,
2. followed by the fixed code for the symbol,
3. a node for the symbol is created,
4. and the symbol is taken out of the NYT list.

Both transmitter and receiver start with the same tree structure, and the update procedure is identical on both sides. Therefore, the encoding and decoding processes remain synchronized.

3.4.1 Update Procedure

The update procedure requires that the nodes be in a fixed order. This ordering is preserved by numbering the nodes. The largest node number is given to the root of the tree, and the smallest number is assigned to the NYT node. The numbers from the NYT node to the root are assigned in increasing order from left to right, and from lower to upper levels. The set of nodes with the same weight makes up a block. The function of the update procedure is to preserve the sibling property.

3.4.1 Update Procedure

Figure 3.6 Update procedure (flowchart). In outline:
1. START. Is this the first appearance of the symbol?
2. If yes: the NYT node gives birth to a new NYT node and an external node for the symbol; the weights of the new external node and of the old NYT node are incremented; go to the old NYT node and continue at step 5.
3. If no: go to the symbol's external node.
4. If the node number is not the maximum in its block, switch the node with the highest numbered node in the block. Then increment the node weight.
5. If this is the root node, STOP. Otherwise go to the parent node and repeat from step 4.
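The flowchart translates into code fairly directly. The following is a compact, illustrative Python sketch of the update procedure; class and method names such as Node, AdaptiveTree, and update are ours, and the NYT node is taken as the left child and the new external node as the right child, as in the example that follows:

class Node:
    def __init__(self, number, weight=0, symbol=None, parent=None):
        self.number = number          # unique node number, kept in sibling-property order
        self.weight = weight          # symbol count for leaves, sum of children otherwise
        self.symbol = symbol          # None for internal nodes and for the NYT node
        self.parent = parent
        self.left = self.right = None

class AdaptiveTree:
    def __init__(self, alphabet_size):
        self.root = self.nyt = Node(2 * alphabet_size - 1)   # initial tree: NYT node only
        self.nodes = [self.root]                             # every node, for block searches
        self.leaf = {}                                       # symbol -> external node

    def _swap(self, a, b):
        """Exchange the tree positions of two nodes; node numbers stay with the positions."""
        a.number, b.number = b.number, a.number
        pa, pb = a.parent, b.parent
        if pa is pb:
            pa.left, pa.right = pa.right, pa.left
        else:
            if pa.left is a: pa.left = b
            else:            pa.right = b
            if pb.left is b: pb.left = a
            else:            pb.right = a
            a.parent, b.parent = pb, pa

    def update(self, symbol):
        node = self.leaf.get(symbol)
        if node is None:
            # First appearance: the NYT node gives birth to a new NYT node and a leaf.
            old = self.nyt
            old.left = self.nyt = Node(old.number - 2, parent=old)
            old.right = leaf = Node(old.number - 1, weight=1, symbol=symbol, parent=old)
            old.weight += 1
            self.nodes += [leaf, self.nyt]
            self.leaf[symbol] = leaf
            node = old.parent             # weights below were already incremented
        while node is not None:
            # Swap with the highest-numbered node of the same weight, unless it is the parent.
            block = [n for n in self.nodes
                     if n.weight == node.weight and n.number > node.number]
            if block:
                top = max(block, key=lambda n: n.number)
                if top is not node.parent:
                    self._swap(node, top)
            node.weight += 1              # increment node weight
            node = node.parent            # go to parent node; stop after the root

tree = AdaptiveTree(26)
for s in "aardv":                         # the message prefix used in Example 3.4.1
    tree.update(s)
print(tree.root.weight)                   # 5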

3.4.1 Update Procedure

Example 3.4.1 Update procedure
Message [ a a r d v a r k ], where the alphabet consists of the 26 lowercase letters of the English alphabet. Total number of nodes = 2 * 26 - 1 = 51.

(a) The encoder starts with the initial tree, a single NYT node numbered 51. For the first a, send the fixed binary code 00000, since the index of a is 1. The NYT node gives birth to a new NYT node (49, weight 0) and an external node for a (50, weight 1); the root (51) now has weight 1. For the second a, send 1, the path from the root to a's node; the weights of a's node and of the root become 2, giving the tree for ( aa ).

3.4.1 Update Procedure

( aa ) → r: Send 0 for the NYT node, then send the fixed code for r. Since the index of r is 18, the fixed code is the 5-bit representation of 17, i.e. 10001. Updating the tree for r: the old NYT node (49) gives birth to a new NYT node (47, weight 0) and an external node for r (48, weight 1); node 49 gets weight 1 and the root weight becomes 3, giving the tree for ( aar ).

3.4.1 Update Procedure

( aar ) → d: Send 00 for the NYT node, then send the fixed code for d. Since the index of d is 4, the fixed code is the 5-bit representation of 3, i.e. 00011. Updating the tree for d: the old NYT node (47) gives birth to a new NYT node (45, weight 0) and an external node for d (46, weight 1); the weights along the path to the root are incremented, giving node 47 weight 1, node 49 weight 2, and the root (51) weight 4. No swap is needed, because each node on this path already has the highest number in its block. This is the tree for ( aard ).

3.4.1 Update Procedure

( aard ) → v: Send 000 for the NYT node, then send the fixed code for v. Since the index of v is 22 and 22 > 2r = 20, the fixed code is the 4-bit representation of 22 - 10 - 1 = 11, i.e. 1011. Updating the tree for v: the old NYT node (45) gives birth to a new NYT node (43, weight 0) and an external node for v (44, weight 1), and node 45 gets weight 1. Its parent, node 47, is now no longer the highest-numbered node in its block (node 48, r, has the same weight), so nodes 47 and 48 are swapped ("Swap nodes"); the subtree containing the NYT node, v, and d now carries node number 48, and its weight is incremented to 2.

3.4.1 Update Procedure

Continuing the update for v: node 49 (weight 2) is in turn no longer the highest-numbered node in its block (node 50, a, has the same weight), so nodes 49 and 50 are also swapped ("Swap nodes"); the swapped subtree's weight becomes 3, and the root weight becomes 5. The final tree for ( aardv ) is: root (51, weight 5) with children a (49, weight 2) and an internal node (50, weight 3); under node 50 are r (47, weight 1) and an internal node (48, weight 2); under node 48 are an internal node (45, weight 1) and d (46, weight 1); under node 45 are the NYT node (43, weight 0) and v (44, weight 1).

3.4.2 Encoding Procedure

Figure 3.8 (a) flowchart of the encoding procedure:
1. START: read in a symbol.
2. If this is the first appearance of the symbol, send the code for the NYT node followed by the index in the NYT list.
3. Otherwise, the code is the path from the root node to the corresponding node.
4. Call the update procedure and continue at A.
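The "path from the root node" box can be implemented with a short helper once the tree stores parent pointers, as in the Node sketch given earlier; this is an illustrative helper, not the book's code:

def path_code(node):
    """Code for a node: the path from the root, 0 for a left branch and 1 for a right branch."""
    bits = []
    while node.parent is not None:
        bits.append("0" if node.parent.left is node else "1")
        node = node.parent
    return "".join(reversed(bits))

# e.g. path_code(tree.leaf["a"]) or path_code(tree.nyt) on the AdaptiveTree sketch above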

3.4.2 Encoding Procedure

Figure 3.8 (b) flowchart of the encoding procedure (continued):
A: Is this the last symbol? If no, go back and read in the next symbol; if yes, STOP.

3.4.2 Encoding Procedure

Example 3.4.2 Encoding procedure
Encoding the message of Example 3.4.1 with the procedure above produces the bit stream shown on the slides:

000001
010001000001100010110

(00000 for the first a, 1 for the second a, 0 followed by 10001 for r, 00 followed by 00011 for d, 000 followed by 1011 for v, and so on.)

3.4.3 Decoding Procedure

Figure 3.9 (a) flowchart of the decoding procedure:
START (B): go to the root of the tree. While the current node is not an external node, read a bit and go to the corresponding child node. When an external node is reached: if it is not the NYT node, decode the element corresponding to the node and continue at C; if it is the NYT node, continue at A.

3.4.3 Decoding Procedure

Figure 3.9 (b) flowchart of the decoding procedure (continued):
A: read e bits and let p be the e-bit number. If p is less than r, read one more bit (p is then the (e+1)-bit number); otherwise add r to p. Continue at D.
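The NYT branch of Figure 3.9 (b) and (c) can be written as a small helper; read_bit is an assumed callable returning the next received bit, and the function name is ours (an illustrative sketch):

def decode_nyt_index(read_bit, e, r):
    """Return the 1-based index k of a newly seen symbol, inverting the fixed code."""
    p = 0
    for _ in range(e):                 # read e bits
        p = (p << 1) | read_bit()
    if p < r:                          # the fixed code had e+1 bits: read one more
        p = (p << 1) | read_bit()
    else:                              # the fixed code had e bits: add r to p
        p = p + r
    return p + 1                       # decode the (p+1)-th element in the NYT list

bits = iter([1, 0, 0, 0, 1])           # the fixed code 10001 sent for r in Example 3.4.1
print(decode_nyt_index(lambda: next(bits), 4, 10))   # 18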

3.4.3 Decoding Procedure

Figure 3.9 (c) flowchart of the decoding procedure (continued):
D: decode the (p+1)-th element in the NYT list.
C, D: call the update procedure. If this is the last bit, STOP; otherwise go back to B (the root of the tree).

3.4.3 Decoding Procedure

Example 3.4.3 Decoding procedure
The decoder receives the bit stream produced in Example 3.4.2:

000001
010001000001100010110

Starting from the same initial tree (a single NYT node) and applying the same update procedure as the encoder, it recovers the original message symbol by symbol.

3.8 Applications of Huffman Coding

3.8.1 Lossless Image Compression

Table 3.23 Compression using Huffman codes on pixel values
Image Name    Bits/Pixel    Total Size (bytes)    Compression Ratio
Sena          7.01          57,504                1.14
Sensin        7.49          61,430                1.07
Earth         4.94          40,534                1.62
Omaha         7.12          58,374                1.12

Table 3.24 Compression using Huffman codes on pixel difference values
Image Name    Bits/Pixel    Total Size (bytes)    Compression Ratio
Sena          4.02          32,968                1.99
Sensin        4.70          38,541                1.70
Earth         4.13          33,880                1.93
Omaha         6.42          52,643                1.24

Figure 3.10 Test images (Sena, Sensin, Earth, Omaha): 256*256 gray-scale raw images.
ftp://ftp.mkp.com/pub/Sayood/uncompressed_software/datasets/images/
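A rough way to reproduce the gap between Tables 3.23 and 3.24 is to compare the first-order entropy of the pixel values with that of the pixel differences; a minimal NumPy sketch of our own (the file name is a placeholder for one of the raw test images):

import numpy as np

def entropy_bits(values):
    """Empirical first-order entropy, in bits per sample."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

img  = np.fromfile("sena.raw", dtype=np.uint8).reshape(256, 256)   # placeholder path
diff = np.diff(img.astype(np.int16), axis=1)                       # horizontal differences
print(entropy_bits(img), entropy_bits(diff))   # differences usually need fewer bits/pixel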

3.8.1 Lossless Image Compression

Table 3.25 Compression using adaptive Huffman codes on pixel difference values
Image Name    Bits/Pixel    Total Size (bytes)    Compression Ratio
Sena          3.93          32,261                2.03
Sensin        4.63          37,896                1.73
Earth         4.82          39,504                1.66
Omaha         6.39          52,321                1.25

Adaptive Huffman coder
- Advantage: can be used as an on-line or real-time coder.
- Disadvantages: more vulnerable to errors; more difficult to implement.

3.8.2 Text Compression

Table 3.26 Probabilities of occurrence of the letters in the English alphabet in the U.S. Constitution
Letter  Probability    Letter  Probability    Letter  Probability
A       0.057305       J       0.002031       S       0.060289
B       0.014876       K       0.001016       T       0.078085
C       0.025775       L       0.031403       U       0.018474
D       0.026811       M       0.015892       V       0.009882
E       0.112578       N       0.056035       W       0.007576
F       0.022875       O       0.058215       X       0.002264
G       0.009523       P       0.021034       Y       0.011702
H       0.042915       Q       0.000973       Z       0.001502
I       0.053475       R       0.048819

3.8.2 Text Compression

Table 3.27 Probabilities of occurrence of the letters in the English alphabet in this chapter
Letter  Probability    Letter  Probability    Letter  Probability
A       0.049885       J       0.000394       S       0.042657
B       0.016110       K       0.002450       T       0.061142
C       0.025835       L       0.025835       U       0.015794
D       0.030232       M       0.016494       V       0.004988
E       0.097434       N       0.048039       W       0.012207
F       0.019745       O       0.050642       X       0.003413
G       0.012053       P       0.015007       Y       0.008466
H       0.035723       Q       0.001509       Z       0.001050
I       0.048783       R       0.040492

3.8.2 Text Compression

[Bar charts of the letter probabilities (0 to 0.12, letters A through Z) for the U.S. Constitution and for this chapter.]
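A table like 3.26 or 3.27 can be generated for any text with a few lines of code; the following sketch of our own also prints the first-order entropy of the letter distribution:

import math
from collections import Counter

def letter_probabilities(text):
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    counts = Counter(letters)
    total = sum(counts.values())
    return {c: counts[c] / total for c in sorted(counts)}

probs = letter_probabilities("this chapter describes the huffman coding algorithm")
entropy = -sum(p * math.log2(p) for p in probs.values())
print(probs, round(entropy, 2))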

3.8.3 Audio Compression

CD-quality audio data:
- Each stereo channel is sampled at 44.1 kHz.
- Each sample is represented by 16 bits.
(So the amount of data stored on one CD is enormous.)

3.8.3 Audio Compression

With 16 bits per sample there are 65,536 distinct values, so a Huffman coder would require 65,536 distinct (variable-length) codewords. In most applications, a codebook of this size would not be practical. Techniques for handling such large alphabets include recursive indexing (Chapter 8) and other methods [180].

Table 3.28 Huffman codes of 16-bit CD-quality audio
File Name    Original File Size (bytes)    Entropy (bits)    Estimated Compressed File Size (bytes)    Compression Ratio
Mozart       939,862                       12.8              725,420                                   1.30
Cohn         402,442                       13.8              349,300                                   1.15
Mir          884,020                       13.7              759,540                                   1.16

Table 3.29 Huffman codes of differences of 16-bit CD-quality audio
File Name    Original File Size (bytes)    Entropy (bits)    Estimated Compressed File Size (bytes)    Compression Ratio
Mozart       939,862                       9.7               569,792                                   1.65
Cohn         402,442                       10.4              261,590                                   1.54
Mir          884,020                       10.9              602,240                                   1.47
