Sei sulla pagina 1di 44

C#ODE Studio

Huffman Coding: An Application of Binary Trees and Priority Queues

CS 102

C#ODE Studio Encoding and Compression of Data

Fax Machines ASCII Variations on ASCII


min number of bits needed cost of savings patterns modifications

CS 102

C#ODE Studio Purpose of Huffman Coding

Proposed by Dr. David A. Huffman in 1952


A Method for the Construction of Minimum Redundancy Codes

Applicable to many forms of data transmission


Our example: text files

CS 102

The Basic Algorithm

C#ODE Studio

Huffman coding is a form of statistical coding Not all characters occur with the same frequency! Yet all characters are allocated the same amount of space
1 char = 1 byte, be it

or

CS 102

The Basic Algorithm

C#ODE Studio

Any savings in tailoring codes to frequency of character? Code word lengths are no longer fixed like ASCII. Code word lengths vary and will be shorter for the more frequently used characters.
CS 102

C#ODE The (Real) Basic Algorithm


1. 2. 3. 4. 5.

Studio

Scan text to be compressed and tally occurrence of all characters. Sort or prioritize characters based on number of occurrences in text. Build Huffman code tree based on prioritized list. Perform a traversal of tree to determine all code words. Scan text again and create new file using the Huffman codes.
CS 102

Building a Tree

C#ODE Studio

Scan the original text

Consider the following short text:


Eerie eyes seen near lake.

Count up the occurrences of all characters in the text


CS 102

Building a Tree

C#ODE Studio

Scan the original text

Eerie eyes seen near lake. What characters are present?


E e r i space y s n a r l k .

CS 102

Building a Tree

C#ODE Studio

Scan the original text

Eerie eyes seen near lake.


What is the frequency of each character in the text?

Char Freq. E 1 e 8 r 2 i 1 space 4

Char Freq. y 1 s 2 n 2 a 2 l 1
CS 102

Char Freq. k 1 . 1

Building a Tree

C#ODE Studio

Prioritize characters

Create binary tree nodes with character and frequency of each character Place nodes in a priority queue
The lower the occurrence, the higher the priority in the queue

CS 102

Building a Tree
Uses binary tree nodes

C#ODE Studio

Prioritize characters

public class HuffNode { public char myChar; public int myFrequency; public HuffNode myLeft, myRight; }

priorityQueue myQueue;

CS 102

Building a Tree

C#ODE Studio

The queue after inserting all nodes

E
1

i
1

y
1

l
1

k
1

.
1

r
2

s
2

n
2

a
2

sp
4

e
8

Null Pointers are not shown


CS 102

Building a Tree

C#ODE Studio

While priority queue contains two or more nodes


Create new node Dequeue node and make Dequeue next node and subtree Frequency of new node frequency of left and Enqueue new node back
CS 102

it left subtree make it right equals sum of right children into queue

Building a Tree

C#ODE Studio

E 1

i 1

y 1

l 1

k 1

. 1

r 2

s 2

n 2

a 2

sp 4

e 8

CS 102

Building a Tree

C#ODE Studio

y 1

l 1

k 1

. 1

r 2

s 2

n 2

a 2

sp 4

e 8

2 E 1 i 1

CS 102

Building a Tree

C#ODE Studio

y 1

l 1

k 1

. 1

r 2

s 2

n 2

a 2 E 1

2
i 1

sp 4

e 8

CS 102

Building a Tree

C#ODE Studio

k 1

. 1

r 2

s 2

n 2

a 2

sp 4

e 8

E 1

i 1

2 y 1

l 1

CS 102

Building a Tree

C#ODE Studio

k 1

. 1

r 2

s 2

n 2

a 2

2 y 1

2 l 1

sp 4

e 8

E 1

i 1

CS 102

Building a Tree

C#ODE Studio

2 y 1

2 l 1

sp

2
E 1 i 1

2 k 1 . 1

CS 102

Building a Tree

C#ODE Studio

2 i 1

sp

2
E 1

4 y 1 l 1
k 1 . 1

CS 102

Building a Tree

C#ODE Studio

2
E 1 i 1

sp 4 . 1

e 8

y 1

l 1
4 r 2 s 2

k 1

CS 102

Building a Tree

C#ODE Studio

2 E 1 i 1
y 1

2
l 1 k 1

2
. 1

sp 4 r 2

4 s 2

CS 102

Building a Tree

C#ODE Studio

2 y 1

2 l 1 k 1

2 . 1

sp 4 r 2

4 s 2

e 8

E 1

i 1

4 n 2 a 2

CS 102

Building a Tree

C#ODE Studio

2 y 1

2 l 1 k 1

2 . 1

sp 4 r 2

4 s 2 n 2

4 a 2

e 8

E 1

i 1

CS 102

Building a Tree

C#ODE Studio

2 k 1 . 1

sp 4 r 2

4 s 2 n 2

4 a 2

4 2 E 1 i 1 2

y 1

l 1 CS 102

Building a Tree

C#ODE Studio

2 k 1 . 1

sp 4 r 2

4 s 2 n 2

4 a 2 2

4 2 y 1

e 8

E 1

i 1

l 1

CS 102

Building a Tree

C#ODE Studio

4 r 2 s 2 n 2

4 a 2 E 1 6 2 k 1 . 1 sp 4 2 i 1

4 2 y 1 l 1

e 8

CS 102

Building a Tree

C#ODE Studio

4 r 2 s 2 n 2

4 a 2

e sp 4 8

2 E 1 i 1
y 1

2 l 1 k 1

. 1

What is happening to the characters with a low number of occurrences?


CS 102

Building a Tree

C#ODE Studio

4 2 E 1 i 1 2 2 l 1

6 sp 4

e 8

y 1

k 1

. 1

8 4 r 2 CS 102 s 2 n 2 4 a 2

Building a Tree

C#ODE Studio

4 2 E 1 i 1 2 2 l 1

6 sp 4

e 8 4 r 2 s 2

8 4 n 2 a 2

y 1

k 1

. 1

CS 102

Building a Tree

C#ODE Studio

e 8

4
r 2 s 2 n 2

4 10 a 2 2 4 2 y 1 6

2 l 1
k 1 . 1

sp 4

E 1

i 1

CS 102

Building a Tree

C#ODE Studio

e 8

10 4 4 a 2 E 1 2 i 1 2 2 l 1 6 sp 4

4
r 2 s 2 n 2

y 1

k 1

. 1

CS 102

Building a Tree

C#ODE Studio

10 4 2 E 1 i 1 2 2 l 1 16 6 sp 4 e 8 4 r 2 s 2 n 2 8 4 a 2

y 1

k 1

. 1

CS 102

Building a Tree

C#ODE Studio

10 4 2 E 1 i 1 2 2 l 1 6 sp 4

16 e 8 4 r 2 s 2 n 2

8 4
a 2

y 1

k 1

. 1

CS 102

Building a Tree
26

C#ODE Studio

10
4 2 E 1 i 1 y 1 2 l 1 k 1 2 . 1

16 e 8 sp 4 r 2 4 s 2 n 2

8 4
a 2

CS 102

Building a Tree

C#ODE Studio

26 10 4 2 E 1 i 1 y 1 2 l 1 k 1 6 e 8 sp 4 . 1 r 2 4 s 2 n 2 16 8 4 a 2

After enqueueing this node there is only one node left in priority queue.

CS 102

Building a Tree
Dequeue the single node left in the queue. This tree contains the new code words for each character.
2 10 4 2 2 6

C#ODE Studio

26 16 e 8 sp 4 4 8 4

Frequency of root node should equal number of characters in text.

E i y l k . 1 1 1 1 1 1

r s n a 2 2 2 2

Eerie eyes seen near lake.


CS 102

26 characters

Encoding the File


Traverse Tree for Codes
Perform a traversal of the tree to obtain new code words Going left is a 0 going right is a 1 code word is only completed when a leaf node is reached

C#ODE Studio

26 10 16 e 8 sp 4 4 8 4

4 2 2 2

E i y l k . 1 1 1 1 1 1

r s n a 2 2 2 2

CS 102

Encoding the File


Traverse Tree for Codes
Char E i y l k . space e r s n a Code 0000 0001 0010 0011 0100 0101 011 10 1100 1101 1110 1111

C#ODE Studio

26 10 16 e 8 sp 4 4 8 4

4 2 2 2

E i y l k . 1 1 1 1 1 1

r s n a 2 2 2 2

CS 102

Encoding the File


Rescan text and encode file using new code words
Eerie eyes seen near lake.

C#ODE Studio

0000101100000110011 1000101011011010011 1110101111110001100 1111110100100101


Why is there no need for a separator character?
.

Char E i y l k . space e r s n a
CS 102

Code 0000 0001 0010 0011 0100 0101 011 10 1100 1101 1110 1111

Encoding the File


Results

C#ODE Studio

Have we made 0000101100000110011 things any 1000101011011010011 better? 1110101111110001100 73 bits to encode 1111110100100101 the text ASCII would take 8 * 26 = 208 bits If modified code used 4 bits per character are needed. Total bits 4 * 26 = 104. Savings not as great.
CS 102

Decoding the File

C#ODE Studio

How does receiver know what the codes are? Tree constructed for each text file.
Considers frequency for each file Big hit on compression, especially for smaller files

Tree predetermined
based on statistical analysis of text files or file types

Data transmission is bit based versus byte based


CS 102

Decoding the File


Once receiver has tree it scans incoming bit stream 0 go left 1 go right
26 10 4 6

C#ODE Studio

16 e 8 sp 4 4 8 4

101000110111101111 01111110000110101

E i y l k . 1 1 1 1 1 1

r s n a 2 2 2 2

CS 102

Summary

C#ODE Studio

Huffman coding is a technique used to compress files for transmission Uses statistical coding
more frequently used symbols have shorter code words

Works well for text and fax transmissions An application that uses several data structures
CS 102

Potrebbero piacerti anche