CS 102
C#ODE Studio
Huffman coding is a form of statistical coding.
Not all characters occur with the same frequency, yet in a fixed-width encoding all characters are allocated the same amount of space:
1 char = 1 byte, be it a frequent character or a rare one.
Are there any savings in tailoring codes to the frequency of each character?
Code word lengths are no longer fixed, as they are in ASCII.
Code word lengths vary and are shorter for the more frequently used characters.
1. Scan the text to be compressed and tally the occurrences of all characters.
2. Sort or prioritize the characters based on their number of occurrences in the text.
3. Build the Huffman code tree based on the prioritized list.
4. Perform a traversal of the tree to determine all code words.
5. Scan the text again and create a new file using the Huffman codes.
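The first step, tallying character occurrences, can be sketched in a few lines. This is a minimal illustration, not the course's code: the class name `FrequencyTally`, the method `tally`, and the sample sentence are assumptions (the sentence was picked because its counts total 26 characters with 8 occurrences of `e`).

```java
import java.util.Map;
import java.util.TreeMap;

class FrequencyTally {
    // Count how many times each character occurs in the text.
    static Map<Character, Integer> tally(String text) {
        Map<Character, Integer> counts = new TreeMap<>();
        for (char c : text.toCharArray()) {
            counts.merge(c, 1, Integer::sum); // start at 1, otherwise add 1
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample sentence: an assumption made for this sketch.
        Map<Character, Integer> counts = tally("Eerie eyes seen near lake.");
        counts.forEach((c, n) -> System.out.println("'" + c + "' " + n));
    }
}
```

A `TreeMap` is used only so the printout comes out in a stable order; any map would do for the tally itself.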
Building a Tree
Char   Freq.
E      1
e      8
r      2
i      1
space  4
y      1
s      2
n      2
a      2
l      1
k      1
.      1
Prioritize characters
Create binary tree nodes holding each character and its frequency.
Place the nodes in a priority queue.
The lower the occurrence count, the higher the priority in the queue.
Uses binary tree nodes
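public class HuffNode {
    public char myChar;               // character held by a leaf node
    public int myFrequency;           // occurrence count, or sum of the children's counts
    public HuffNode myLeft, myRight;  // subtrees; both null for a leaf
}

priorityQueue myQueue;  // holds HuffNode objects, lowest frequency first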
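One way to set up such a queue is sketched below. It assumes `java.util.PriorityQueue` as a stand-in for the slide's `priorityQueue` type, and the names `QueueSetup` and `makeQueue` are hypothetical. Note that `java.util.PriorityQueue` makes no promise about the order of nodes with equal frequency, so ties may come out in a different order than in the slides.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.PriorityQueue;

class QueueSetup {
    // Same fields as the slide's HuffNode, plus a convenience constructor.
    static class HuffNode {
        char myChar;
        int myFrequency;
        HuffNode myLeft, myRight;

        HuffNode(char c, int frequency) {
            myChar = c;
            myFrequency = frequency;
        }
    }

    // Create one leaf node per character and queue it, lowest frequency first.
    static PriorityQueue<HuffNode> makeQueue(Map<Character, Integer> counts) {
        PriorityQueue<HuffNode> queue =
            new PriorityQueue<>(Comparator.comparingInt((HuffNode n) -> n.myFrequency));
        for (Map.Entry<Character, Integer> e : counts.entrySet()) {
            queue.add(new HuffNode(e.getKey(), e.getValue()));
        }
        return queue;
    }

    public static void main(String[] args) {
        PriorityQueue<HuffNode> queue = makeQueue(Map.of('e', 8, ' ', 4, 'r', 2, 'E', 1));
        while (!queue.isEmpty()) {
            HuffNode n = queue.poll(); // always the lowest remaining frequency
            System.out.println("'" + n.myChar + "' " + n.myFrequency);
        }
    }
}
```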
Initial priority queue, front to back:
E 1, i 1, y 1, l 1, k 1, . 1, r 2, s 2, n 2, a 2, sp 4, e 8
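While the priority queue contains two or more nodes:
  Create a new node
  Dequeue a node and make it the left subtree
  Dequeue the next node and make it the right subtree
  The frequency of the new node equals the sum of the frequencies of its left and right children
  Enqueue the new node back into the queue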
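The merge loop can be sketched as follows. This is an illustration under assumptions rather than the course's implementation: `java.util.PriorityQueue` stands in for the queue, `TreeBuilder` and `build` are hypothetical names, and ties between equal frequencies may be broken differently than in the slides (the resulting tree is still a valid Huffman tree).

```java
import java.util.Comparator;
import java.util.PriorityQueue;

class TreeBuilder {
    static class HuffNode {
        char myChar;
        int myFrequency;
        HuffNode myLeft, myRight;

        HuffNode(char c, int frequency) {
            myChar = c;
            myFrequency = frequency;
        }
    }

    // Repeatedly merge the two lowest-frequency nodes until one tree remains.
    static HuffNode build(char[] chars, int[] freqs) {
        PriorityQueue<HuffNode> queue =
            new PriorityQueue<>(Comparator.comparingInt((HuffNode n) -> n.myFrequency));
        for (int i = 0; i < chars.length; i++) {
            queue.add(new HuffNode(chars[i], freqs[i]));
        }
        while (queue.size() >= 2) {
            HuffNode parent = new HuffNode('\0', 0); // internal node, holds no character
            parent.myLeft = queue.poll();            // first dequeue becomes the left subtree
            parent.myRight = queue.poll();           // second dequeue becomes the right subtree
            parent.myFrequency = parent.myLeft.myFrequency + parent.myRight.myFrequency;
            queue.add(parent);                       // enqueue the merged node
        }
        return queue.poll(); // the single remaining node is the root (null if input was empty)
    }

    public static void main(String[] args) {
        char[] chars = {'E', 'i', 'y', 'l', 'k', '.', 'r', 's', 'n', 'a', ' ', 'e'};
        int[] freqs = {1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 4, 8};
        System.out.println("root frequency: " + build(chars, freqs).myFrequency);
    }
}
```

The root's frequency is the sum of all leaf frequencies, which gives a quick sanity check on the finished tree.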
Queue contents after each merge, front to back. A merged node is written [frequency: left right].
 1. y 1, l 1, k 1, . 1, r 2, s 2, n 2, a 2, [2: E i], sp 4, e 8
 2. k 1, . 1, r 2, s 2, n 2, a 2, [2: E i], [2: y l], sp 4, e 8
 3. r 2, s 2, n 2, a 2, [2: E i], [2: y l], [2: k .], sp 4, e 8
 4. n 2, a 2, [2: E i], [2: y l], [2: k .], sp 4, [4: r s], e 8
 5. [2: E i], [2: y l], [2: k .], sp 4, [4: r s], [4: n a], e 8
 6. [2: k .], sp 4, [4: r s], [4: n a], [4: [2: E i] [2: y l]], e 8
 7. [4: r s], [4: n a], [4: [2: E i] [2: y l]], [6: [2: k .] sp], e 8
 8. [4: [2: E i] [2: y l]], [6: [2: k .] sp], e 8, [8: [4: r s] [4: n a]]
 9. e 8, [8: [4: r s] [4: n a]], [10: [4] [6]]
10. [10: [4] [6]], [16: e [8]]
11. [26: [10] [16]]
After enqueueing this node there is only one node left in priority queue.
Dequeue the single node left in the queue. This tree contains the new code words for each character.
26
  10
    4
      2: E 1, i 1
      2: y 1, l 1
    6
      2: k 1, . 1
      sp 4
  16
    e 8
    8
      4: r 2, s 2
      4: n 2, a 2

The frequency of the root node, 26, equals the number of characters in the text: 26 characters.
Char   Code
E      0000
i      0001
y      0010
l      0011
k      0100
.      0101
space  011
e      10
r      1100
s      1101
n      1110
a      1111
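A table like this falls out of a depth-first traversal in which going left appends a 0 and going right appends a 1, with a code word completed at each leaf. The sketch below writes out by hand the tree shape implied by the code table and then traverses it; the names `CodeTable`, `leaf`, `join`, `collect`, and `slideTree` are hypothetical, and frequencies are omitted because the traversal does not need them.

```java
import java.util.Map;
import java.util.TreeMap;

class CodeTable {
    static class HuffNode {
        char myChar;
        HuffNode myLeft, myRight;
    }

    static HuffNode leaf(char c) {
        HuffNode n = new HuffNode();
        n.myChar = c;
        return n;
    }

    static HuffNode join(HuffNode left, HuffNode right) {
        HuffNode n = new HuffNode();
        n.myLeft = left;
        n.myRight = right;
        return n;
    }

    // Depth-first traversal: left appends '0', right appends '1'.
    // A code word is complete only when a leaf is reached.
    static void collect(HuffNode node, String path, Map<Character, String> codes) {
        if (node.myLeft == null && node.myRight == null) {
            codes.put(node.myChar, path);
            return;
        }
        collect(node.myLeft, path + "0", codes);
        collect(node.myRight, path + "1", codes);
    }

    // Tree shape implied by the code table, built by hand for this sketch.
    static HuffNode slideTree() {
        HuffNode ten = join(join(join(leaf('E'), leaf('i')), join(leaf('y'), leaf('l'))),
                            join(join(leaf('k'), leaf('.')), leaf(' ')));
        HuffNode sixteen = join(leaf('e'),
                                join(join(leaf('r'), leaf('s')), join(leaf('n'), leaf('a'))));
        return join(ten, sixteen);
    }

    public static void main(String[] args) {
        Map<Character, String> codes = new TreeMap<>();
        collect(slideTree(), "", codes);
        codes.forEach((c, code) -> System.out.println("'" + c + "' " + code));
    }
}
```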
Have we made things any better?
Encoded text:
0000101100000110011
100010101101011
110110101110011
11101011111100011
001111110100100101
84 bits to encode the text.
ASCII would take 8 * 26 = 208 bits.
If a modified fixed-length code is used, 4 bits per character are needed (there are 12 distinct characters), for a total of 4 * 26 = 104 bits. Against that baseline the savings are not as great.
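The size of the encoded text can be checked directly from the tables: it is the sum, over all characters, of frequency times code word length. A small sketch of that arithmetic follows; `BitCount` and `totalBits` are hypothetical names, and the arrays simply restate the frequency and code tables.

```java
class BitCount {
    // Total encoded size: sum over all characters of frequency * code word length.
    static int totalBits(int[] freqs, String[] codes) {
        int bits = 0;
        for (int i = 0; i < freqs.length; i++) {
            bits += freqs[i] * codes[i].length();
        }
        return bits;
    }

    public static void main(String[] args) {
        // Table order: E i y l k . space e r s n a
        int[] freqs = {1, 1, 1, 1, 1, 1, 4, 8, 2, 2, 2, 2};
        String[] codes = {"0000", "0001", "0010", "0011", "0100", "0101",
                          "011", "10", "1100", "1101", "1110", "1111"};
        int chars = 0;
        for (int f : freqs) {
            chars += f; // total number of characters in the text
        }
        System.out.println("Huffman:     " + totalBits(freqs, codes) + " bits");
        System.out.println("8-bit ASCII: " + 8 * chars + " bits");
        System.out.println("4-bit fixed: " + 4 * chars + " bits");
    }
}
```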
How does the receiver know what the codes are?
Tree constructed for each text file:
  Considers the frequencies in each file.
  The tree must be sent with the file, which is a big hit on compression, especially for smaller files.
Tree predetermined:
  Based on statistical analysis of text files or file types.
Bit stream to decode with the tree:
101000110111101111 01111110000110101
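Because no code word is a prefix of another, the receiver can decode by reading bits left to right and emitting a character as soon as the bits read so far match a code word. That is equivalent to walking the tree from the root and restarting at each leaf. The sketch below uses a lookup map built from the code table instead of a tree; `HuffDecoder`, `slideTable`, and `decode` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

class HuffDecoder {
    // Code table from the slides, keyed by code word.
    static Map<String, Character> slideTable() {
        String chars = "Eiylk. ersna"; // same order as the codes below
        String[] codes = {"0000", "0001", "0010", "0011", "0100", "0101",
                          "011", "10", "1100", "1101", "1110", "1111"};
        Map<String, Character> table = new HashMap<>();
        for (int i = 0; i < codes.length; i++) {
            table.put(codes[i], chars.charAt(i));
        }
        return table;
    }

    // Read bits left to right; emit a character whenever the bits
    // accumulated so far form a complete code word.
    static String decode(String bits, Map<String, Character> table) {
        StringBuilder out = new StringBuilder();
        StringBuilder current = new StringBuilder();
        for (char bit : bits.toCharArray()) {
            current.append(bit);
            Character c = table.get(current.toString());
            if (c != null) {
                out.append(c);
                current.setLength(0); // start the next code word
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String bits = "101000110111101111" + "01111110000110101";
        System.out.println(decode(bits, slideTable()));
    }
}
```

With the code table from the slides, the bit stream shown decodes to `eel snarl.`.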
Summary
Huffman coding is a technique used to compress files for transmission.
Uses statistical coding: more frequently used symbols have shorter code words.
Works well for text and fax transmissions.
An application that uses several data structures.