Course : T0034 Algorithm Design & Analysis
Year   : 2013

Session 19
Huffman Code

DATA COMPRESSION
How does a compression program really work?
Why can the size of a file be reduced without losing any of its
content?
Suppose we store the letter A. The computer recognizes the
letter as character number 65, so the letter is
stored on the hard disk as 1000001 (65 decimal written in
binary). It takes 7 binary digits to store the letter A.
Data compression tries to store the data using as few
binary digits as possible.
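The mapping from a letter to its number and its 7 binary digits can be checked with a small sketch. The helper name `to_binary` is only an illustration and does not come from the slides.

```python
def to_binary(ch: str, width: int = 7) -> str:
    """Return the character's code number as a zero-padded binary string."""
    return format(ord(ch), f"0{width}b")

print(ord("A"))        # 65
print(to_binary("A"))  # 1000001
```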
Bina Nusantara

ENCODING CHARACTER
ASCII (American Standard Code for Information Interchange)
The ASCII character encoding consists of 128 characters, each a 7-bit binary number:
95 printable characters and 33 control characters.

ISO 8859-1 is an 8-bit character standard that can store 256 characters.
This standard is also called the Latin-1 character encoding.

UTF-8 (Unicode Transformation Format) is a standard character encoding
that allows many languages to be shown at the same time.
The number of bits used to store a character varies.
ASCII characters are stored in 1 byte (8 bits).
Extended Latin, Greek, Cyrillic, Armenian, Hebrew, Arabic, Syriac, and Thaana
characters are stored in 2 bytes (16 bits).
Characters of most other languages are stored in 3 bytes (24 bits). The Javanese/Balinese
(hanacaraka) and Buginese scripts from Indonesia are in this group.
Characters beyond that range, such as many emoji, are stored in 4 bytes (32 bits).
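The varying UTF-8 lengths can be observed directly. The sketch below is illustrative only; the sample characters and the helper name `utf8_length` are assumptions chosen for the example, not part of the slides.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes UTF-8 needs for a single character."""
    return len(ch.encode("utf-8"))

# Sample characters chosen for illustration.
samples = {
    "A": "ASCII letter",
    "\u00e9": "Latin letter e with acute accent",
    "\u03a9": "Greek capital omega",
    "\ua98f": "Javanese letter ka",
    "\U0001f600": "emoji (outside the 3-byte range)",
}

for ch, description in samples.items():
    print(f"U+{ord(ch):04X}  {utf8_length(ch)} byte(s)  {description}")
```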


CHARACTER TABLE
Binary     Dec  Char  Binary     Dec  Char  Binary     Dec  Char
0010 0000   32  sp    0100 0000   64  @     0110 0000   96  `
0010 0001   33  !     0100 0001   65  A     0110 0001   97  a
0010 0010   34  "     0100 0010   66  B     0110 0010   98  b
0010 0011   35  #     0100 0011   67  C     0110 0011   99  c
0010 0100   36  $     0100 0100   68  D     0110 0100  100  d
0010 0101   37  %     0100 0101   69  E     0110 0101  101  e
0010 0110   38  &     0100 0110   70  F     0110 0110  102  f
0010 0111   39  '     0100 0111   71  G     0110 0111  103  g
0010 1000   40  (     0100 1000   72  H     0110 1000  104  h
0010 1001   41  )     0100 1001   73  I     0110 1001  105  i
0010 1010   42  *     0100 1010   74  J     0110 1010  106  j
0010 1011   43  +     0100 1011   75  K     0110 1011  107  k
0010 1100   44  ,     0100 1100   76  L     0110 1100  108  l
0010 1101   45  -     0100 1101   77  M     0110 1101  109  m
0010 1110   46  .     0100 1110   78  N     0110 1110  110  n
0010 1111   47  /     0100 1111   79  O     0110 1111  111  o
0011 0000   48  0     0101 0000   80  P     0111 0000  112  p
0011 0001   49  1     0101 0001   81  Q     0111 0001  113  q
0011 0010   50  2     0101 0010   82  R     0111 0010  114  r
0011 0011   51  3     0101 0011   83  S     0111 0011  115  s
0011 0100   52  4     0101 0100   84  T     0111 0100  116  t
0011 0101   53  5     0101 0101   85  U     0111 0101  117  u
0011 0110   54  6     0101 0110   86  V     0111 0110  118  v
0011 0111   55  7     0101 0111   87  W     0111 0111  119  w
0011 1000   56  8     0101 1000   88  X     0111 1000  120  x
0011 1001   57  9     0101 1001   89  Y     0111 1001  121  y
0011 1010   58  :     0101 1010   90  Z     0111 1010  122  z
0011 1011   59  ;     0101 1011   91  [     0111 1011  123  {
0011 1100   60  <     0101 1100   92  \     0111 1100  124  |
0011 1101   61  =     0101 1101   93  ]     0111 1101  125  }
0011 1110   62  >     0101 1110   94  ^     0111 1110  126  ~
0011 1111   63  ?     0101 1111   95  _


HUFFMAN ALGORITHM
Invented by David A. Huffman in 1951, while he was a
doctoral student at the Massachusetts Institute of
Technology (MIT).
He discovered a method to build a binary tree based
on character frequency. This binary tree, called the Huffman
tree, is one of the foundations of data compression in the ZIP
format.
The technique is also used as a coding step in the JPEG
image format and the MP3 music file format.

CREATING A HUFFMAN TREE

1. Sort the characters by frequency in a table, smallest first.
2. Take the top 2 entries of the table and make them nodes of the tree.
   A character becomes a leaf node that holds the character and its frequency.
3. Make a new node as the parent of those two nodes. Its
   frequency is the sum of the frequencies of both children.
4. Remove the two used nodes from the table.
5. Insert the new node into the table, keeping the table sorted.
6. Repeat from step 2 until only one node is left in the table; that node is the root.

Besides the encoded bits, the table that contains the Huffman codes
has to be saved so that the data can be decoded again.
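The tree-building steps can be sketched in Python. This is a minimal illustration, not the slides' own implementation: it assumes a non-empty input text, uses a heap in place of a re-sorted table, and the names `build_huffman_tree` and `assign_codes` are made up for the example. Because ties between equal frequencies can be broken in different ways, the individual codes it produces may differ from a hand-built tree, while the total encoded length stays the same.

```python
import heapq
from collections import Counter
from itertools import count

def build_huffman_tree(text):
    """Return the root node. A leaf is (freq, char); an inner node is (freq, left, right)."""
    tie = count()  # unique tie-breaker so the heap never compares nodes directly
    heap = [(freq, next(tie), (freq, ch)) for ch, freq in Counter(text).items()]
    heapq.heapify(heap)                       # step 1: smallest frequency first
    while len(heap) > 1:                      # step 6: repeat until one node is left
        f1, _, left = heapq.heappop(heap)     # steps 2 and 4: take and remove
        f2, _, right = heapq.heappop(heap)    # the two smallest entries
        parent = (f1 + f2, left, right)       # step 3: parent holds the summed frequency
        heapq.heappush(heap, (f1 + f2, next(tie), parent))  # step 5: insert the new node
    return heap[0][2]

def assign_codes(node, prefix="", table=None):
    """Walk the tree: a left edge adds '0', a right edge adds '1'."""
    table = {} if table is None else table
    if len(node) == 2:                        # leaf: (freq, char)
        table[node[1]] = prefix or "0"        # a one-symbol text still gets one bit
    else:
        assign_codes(node[1], prefix + "0", table)
        assign_codes(node[2], prefix + "1", table)
    return table

codes = assign_codes(build_huffman_tree("LOGIKA ALGORITMA"))
for ch, code in sorted(codes.items(), key=lambda kv: (len(kv[1]), kv[1])):
    print("sp" if ch == " " else ch, code)
```

A heap is used here only because it keeps the smallest entries on top without re-sorting the whole table after every insertion.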


EXAMPLE CASE
Suppose we store the words:
LOGIKA ALGORITMA

Then the frequency table, sorted from the smallest frequency, is:

Char  Frequency
K     1
R     1
T     1
M     1
sp    1
L     2
O     2
G     2
I     2
A     3
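The frequencies for this phrase can be recounted with a short sketch. The function name `frequency_table` is hypothetical, and the ordering of characters with equal frequency is an arbitrary choice made for the example.

```python
from collections import Counter

def frequency_table(text):
    """(character, frequency) pairs, smallest frequency first; ties ordered by character."""
    return sorted(Counter(text).items(), key=lambda kv: (kv[1], kv[0]))

for ch, freq in frequency_table("LOGIKA ALGORITMA"):
    print("sp" if ch == " " else ch, freq)
```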

HUFFMAN TREE


HUFFMAN CODE TABLE

Char  Code
K     11110
R     11111
T     0100
M     0101
sp    1110
L     011
O     100
G     101
I     110
A     00

Char  Code   Length
L     011    3 bits
O     100    3 bits
G     101    3 bits
I     110    3 bits
K     11110  5 bits
A     00     2 bits
sp    1110   4 bits
A     00     2 bits
L     011    3 bits
G     101    3 bits
O     100    3 bits
R     11111  5 bits
I     110    3 bits
T     0100   4 bits
M     0101   4 bits
A     00     2 bits
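Adding up the code lengths gives the size of the compressed text. The sketch below uses the codes listed for each character of the example; the comparison against 7 bits per character is an added illustration, and the cost of storing the code table itself is not counted. The names `encode` and `decode` are hypothetical helpers.

```python
# Codes taken from the example (the space character stands for "sp").
CODES = {
    "L": "011", "O": "100", "G": "101", "I": "110", "K": "11110",
    "A": "00", " ": "1110", "R": "11111", "T": "0100", "M": "0101",
}

def encode(text, codes):
    """Concatenate the code of every character."""
    return "".join(codes[ch] for ch in text)

def decode(bits, codes):
    """Read bits left to right; no code is a prefix of another, so the first match is correct."""
    reverse = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return "".join(out)

text = "LOGIKA ALGORITMA"
bits = encode(text, CODES)
print(len(bits))            # 52 bits with the Huffman codes
print(len(text) * 7)        # 112 bits with 7 bits per character
print(decode(bits, CODES))  # LOGIKA ALGORITMA
```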

EXERCISE
Create the frequency table, the Huffman tree, and
the Huffman code table to compress:
DESIGN AND ANALYSIS OF ALGORITHMS


REVIEW
DATA COMPRESSION
ENCODING CHARACTER
CHARACTER TABLE



END
