
Huffman Coding-

• Huffman Coding, also called Huffman Encoding, is a famous greedy algorithm used for the lossless compression of data.
• It uses variable-length encoding: variable-length codes are assigned to the characters depending on how frequently they occur in the given text.
• The character that occurs most frequently gets the smallest code, and the character that occurs least frequently gets the largest code.

Prefix Rule-
To prevent ambiguity while decoding, Huffman coding enforces the prefix rule, which ensures that the code assigned to any character is not a prefix of the code assigned to any other character. For example, if 'e' is assigned the code 10, then no other character's code may begin with 10; this lets the decoder read the encoded bit stream from left to right without ambiguity.

Major Steps in Huffman Coding-


There are two major steps in Huffman coding-
1. Building a Huffman tree from the input characters
2. Assigning codes to the characters by traversing the Huffman tree

Steps to construct Huffman Tree-

Step-01:
Create a leaf node for each of the given characters, storing in each node the frequency of occurrence of its character.

Step-02:
Arrange all the nodes in increasing order of the frequency value they contain.

Step-03:
Take the two nodes having the minimum frequencies and create a new internal node whose frequency equals the sum of the two nodes' frequencies. Make the first (smaller) node the left child and the other node the right child of the newly created node.

Step-04:
Keep repeating Step-02 and Step-03 until all the nodes form a single tree.
After following all the above steps, our desired Huffman tree will be constructed.
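
To make the construction concrete, here is a minimal sketch in Python using the standard heapq module. The function name huffman_codes and the representation of tree nodes are illustrative choices, not part of the text above; the sketch also applies the 0-for-left, 1-for-right labelling convention discussed later.

import heapq

def huffman_codes(freq):
    # Build a Huffman tree from a {character: frequency} map and
    # return a {character: code} map.
    # Heap entries are (frequency, tie_breaker, tree); a tree is either
    # a character (leaf) or a (left, right) pair (internal node).
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # minimum-frequency node
        f2, _, right = heapq.heappop(heap)   # next minimum node
        heapq.heappush(heap, (f1 + f2, counter, (left, right)))
        counter += 1
    codes = {}
    def assign(tree, code):
        if isinstance(tree, tuple):          # internal node
            assign(tree[0], code + "0")      # left edge labelled '0'
            assign(tree[1], code + "1")      # right edge labelled '1'
        else:                                # leaf node: a character
            codes[tree] = code or "0"        # handle a one-symbol alphabet
    _, _, root = heap[0]
    assign(root, "")
    return codes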
Important Formulas-

Formula-01:
Average code length
= ∑ ( frequency_i × code length_i ) / ∑ ( frequency_i )

Formula-02:
Total number of bits in Huffman encoded message
= Total number of characters in the message × Average code length per character
= ∑ ( frequency_i × code length_i )

Time Complexity-

• extractMin( ) is called 2 × (n - 1) times if there are n nodes, since building the tree requires n - 1 merge steps and each step extracts the two minimum nodes.
• As extractMin( ) calls minHeapify( ), each call takes O(log n) time.
• Thus, the overall complexity becomes O(n log n).

Time Complexity of Huffman Coding = O(n log n)

where n is the number of unique characters in the text

PRACTICE PROBLEMS BASED ON HUFFMAN CODING-

Problem-01:
A file contains the following characters with the frequencies as shown. If Huffman coding is used for
data compression, determine-

1. Huffman code for each character


2. Average code length
3. Length of Huffman encoded message (in bits)
Characters    Frequencies
a             10
e             15
i             12
o             3
u             4
s             13
t             1

Solution-
First let us construct the Huffman tree using the steps we have learnt above-

Step-01: Create a leaf node for each character: t(1), o(3), u(4), a(10), i(12), s(13), e(15).

Step-02: Merge the two minimum-frequency nodes t(1) and o(3) into a new internal node of frequency 1 + 3 = 4.

Step-03: Merge this node (4) with u(4) into a new internal node of frequency 4 + 4 = 8.

Step-04: Merge the node (8) with a(10) into a new internal node of frequency 8 + 10 = 18.

Step-05: Merge i(12) and s(13) into a new internal node of frequency 12 + 13 = 25.

Step-06: Merge e(15) with the node (18) into a new internal node of frequency 15 + 18 = 33.

Step-07: Merge the nodes (25) and (33) into the root node of frequency 25 + 33 = 58. The Huffman tree is now complete.
After we have constructed the Huffman tree, we will assign weights to all the edges. Let us assign
weight ‘0’ to the left edges and weight ‘1’ to the right edges.

Note:
• We can equally well assign weight '1' to the left edges and weight '0' to the right edges.
• The only thing to keep in mind is that the convention adopted at the time of encoding must also be followed at the time of decoding.

After assigning weight ‘0’ to the left edges and weight ‘1’ to the right edges, we get-

1. Huffman code for the characters-


We will traverse the Huffman tree from the root node to each leaf node in turn and write down the Huffman code for every character-
• a = 111
• e = 10
• i = 00
• o = 11001
• u = 1101
• s = 01
• t = 11000
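
As a rough check, feeding this frequency table to the huffman_codes sketch given earlier reproduces these code lengths (the exact 0/1 patterns can differ, since ties in the heap may be broken in a different order):

freq = {'a': 10, 'e': 15, 'i': 12, 'o': 3, 'u': 4, 's': 13, 't': 1}
codes = huffman_codes(freq)
print({ch: len(code) for ch, code in codes.items()})
# lengths: a=3, e=2, i=2, o=5, u=4, s=2, t=5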

From here, we can observe-


• Characters occurring less frequently in the text are assigned the larger codes.
• Characters occurring more frequently in the text are assigned the smaller codes.

2. Average code length-


We know,
Average code length
= ∑ ( frequency_i × code length_i ) / ∑ ( frequency_i )
= { (10 × 3) + (15 × 2) + (12 × 2) + (3 × 5) + (4 × 4) + (13 × 2) + (1 × 5) } / (10 + 15 + 12 + 3 + 4 + 13 + 1)
= 146 / 58
≈ 2.52 bits per character

3. Length of Huffman encoded message-


We know-
Total number of bits in Huffman encoded message
= Total number of characters in the message × Average code length per character
= 58 × (146 / 58)
= 146 bits
(Using the rounded average 2.52 gives 58 × 2.52 = 146.16, which only approximates the exact count ∑ ( frequency_i × code length_i ) = 146 bits.)
Hamming code:
Hamming code is a set of error-correcting codes that can be used to detect and correct errors that occur when data is moved or stored from the sender to the receiver. It is a technique developed by R.W. Hamming for error correction.
Redundant bits –
Redundant bits are extra binary bits that are generated and added to the information-carrying bits of a data transfer, so that errors introduced during the transfer can be detected and corrected.
The number of redundant bits can be calculated using the following formula:
2^r ≥ m + r + 1
where r = number of redundant bits, m = number of data bits
Suppose the number of data bits is 7; then r = 4 is the smallest value satisfying the inequality, since 2^4 = 16 ≥ 7 + 4 + 1 = 12 (while 2^3 = 8 < 11).
Thus, the number of redundant bits = 4.
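
A minimal sketch of this calculation in Python (the function name redundant_bits is illustrative):

def redundant_bits(m):
    # Smallest r such that 2^r >= m + r + 1.
    r = 0
    while 2 ** r < m + r + 1:
        r += 1
    return r

print(redundant_bits(7))  # -> 4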
Parity bits –
A parity bit is a bit appended to a block of binary data to ensure that the total number of 1's in the data is even or odd. Parity bits are used for error detection. There are two types of parity bits:
1. Even parity bit:
In the case of even parity, for a given set of bits, the number of 1's is counted. If that count is odd, the parity bit value is set to 1, making the total count of 1's an even number. If the total number of 1's in a given set of bits is already even, the parity bit's value is 0.
2. Odd parity bit:
In the case of odd parity, for a given set of bits, the number of 1's is counted. If that count is even, the parity bit value is set to 1, making the total count of 1's an odd number. If the total number of 1's in a given set of bits is already odd, the parity bit's value is 0.
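
Both cases can be sketched in a few lines of Python (the function name parity_bit and the example bit string are illustrative; bits are passed as a string):

def parity_bit(bits, even=True):
    ones = bits.count("1")                 # count the 1's in the data
    if even:
        return 0 if ones % 2 == 0 else 1   # make the total count even
    return 0 if ones % 2 == 1 else 1       # make the total count odd

print(parity_bit("1011000"))               # even parity -> 1 (three 1's)
print(parity_bit("1011000", even=False))   # odd parity  -> 0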
General Algorithm of Hamming code –
The Hamming Code is simply the use of extra parity bits to allow the identification of an error. The steps are listed below (a sketch of the procedure in code follows the list).
1. Write the bit positions starting from 1 in binary form (1, 10, 11, 100, etc.).
2. All the bit positions that are a power of 2 are marked as parity bits (1, 2, 4, 8, etc.).
3. All the other bit positions are marked as data bits.
4. Each data bit is included in a unique set of parity bits, as determined by its bit position in binary form:
a. Parity bit 1 covers all the bit positions whose binary representation includes a 1 in the least significant position (1, 3, 5, 7, 9, 11, etc.).
b. Parity bit 2 covers all the bit positions whose binary representation includes a 1 in the second position from the least significant bit (2, 3, 6, 7, 10, 11, etc.).
c. Parity bit 4 covers all the bit positions whose binary representation includes a 1 in the third position from the least significant bit (4–7, 12–15, 20–23, etc.).
d. Parity bit 8 covers all the bit positions whose binary representation includes a 1 in the fourth position from the least significant bit (8–15, 24–31, 40–47, etc.).
e. In general, each parity bit covers all bits where the bitwise AND of the parity position and the bit position is non-zero.
5. Since we check for even parity, set a parity bit to 1 if the total number of ones in the positions it checks is odd.
6. Set a parity bit to 0 if the total number of ones in the positions it checks is even.
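
Here is a minimal sketch of the encoding procedure in Python, assuming the conventions used in this text: positions are numbered from 1 upward right to left, parity bits sit at the power-of-2 positions, even parity is used, and the leftmost data bit goes to the highest position. The function name hamming_encode is illustrative.

def hamming_encode(data):
    # data: bit string, most significant bit first, e.g. "1011001".
    m = len(data)
    r = 0
    while 2 ** r < m + r + 1:   # number of redundant bits
        r += 1
    n = m + r
    code = [0] * (n + 1)        # index 0 unused; positions 1..n
    # Fill data bits into the non-power-of-2 positions, the highest
    # position receiving the leftmost data bit.
    remaining = list(data)
    for pos in range(n, 0, -1):
        if pos & (pos - 1) != 0:            # not a power of 2: data bit
            code[pos] = int(remaining.pop(0))
    # Set each parity bit p so the positions it covers have even parity.
    p = 1
    while p <= n:
        ones = sum(code[i] for i in range(1, n + 1) if i & p and i != p)
        code[p] = ones % 2
        p *= 2
    return "".join(str(code[pos]) for pos in range(n, 0, -1))

print(hamming_encode("1011001"))  # -> 10101001110, as derived below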
Determining the position of redundant bits –
These redundant bits are placed at the positions that correspond to powers of 2.
As in the above example:
1. The number of data bits = 7
2. The number of redundant bits = 4
3. The total number of bits = 11
4. The redundant bits are placed at positions corresponding to powers of 2: positions 1, 2, 4, and 8
Suppose the data to be transmitted is 1011001; the bits will then be placed as follows (positions 11 down to 1, with the redundant bits still to be determined):

Position:  11  10  9  8   7  6  5  4   3  2   1
Bit:       1   0   1  R8  1  0  0  R4  1  R2  R1
Determining the Parity bits –


1. The R1 bit is calculated using parity check at all the bit positions whose binary representation includes a 1 in the least significant position.
R1: bits 1, 3, 5, 7, 9, 11

To find the redundant bit R1, we check for even parity. Since the total number of 1's in all the bit positions corresponding to R1 is an even number, the value of R1 (parity bit's value) = 0.
2. The R2 bit is calculated using parity check at all the bit positions whose binary representation includes a 1 in the second position from the least significant bit.
R2: bits 2, 3, 6, 7, 10, 11

To find the redundant bit R2, we check for even parity. Since the total number of 1's in all the bit positions corresponding to R2 is an odd number, the value of R2 (parity bit's value) = 1.
3. The R4 bit is calculated using parity check at all the bit positions whose binary representation includes a 1 in the third position from the least significant bit.
R4: bits 4, 5, 6, 7

To find the redundant bit R4, we check for even parity. Since the total number of 1's in all the bit positions corresponding to R4 is an odd number, the value of R4 (parity bit's value) = 1.
4. The R8 bit is calculated using parity check at all the bit positions whose binary representation includes a 1 in the fourth position from the least significant bit.
R8: bits 8, 9, 10, 11

To find the redundant bit R8, we check for even parity. Since the total number of 1's in all the bit positions corresponding to R8 is an even number, the value of R8 (parity bit's value) = 0.
Thus, the data transferred is:

Position:  11  10  9  8  7  6  5  4  3  2  1
Bit:       1   0   1  0  1  0  0  1  1  1  0

i.e. the 11-bit codeword 10101001110.
Error detection and correction –


Suppose in the above example the 6th bit is changed from 0 to 1 during data transmission. Recomputing the parity checks on the received word then gives R8 R4 R2 R1 = 0 1 1 0.

The failed checks give the binary number 0110, whose decimal representation is 6. Thus, bit 6 contains the error. To correct the error, the 6th bit is changed from 1 back to 0.
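
The detection-and-correction step can be sketched the same way; the failed parity checks, read as a binary number, point at the erroneous position (the function name hamming_correct is illustrative):

def hamming_correct(code):
    # code: bit string with position n on the left down to position 1.
    n = len(code)
    bits = [0] + [int(b) for b in reversed(code)]   # bits[i] = position i
    syndrome, p = 0, 1
    while p <= n:
        ones = sum(bits[i] for i in range(1, n + 1) if i & p)
        if ones % 2 == 1:       # parity check for position p fails
            syndrome += p
        p *= 2
    if syndrome:                # single-bit error: flip it back
        bits[syndrome] ^= 1
    return syndrome, "".join(str(bits[i]) for i in range(n, 0, -1))

received = "10101101110"            # the codeword above with bit 6 flipped
print(hamming_correct(received))    # -> (6, '10101001110')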
Shannon-Fano Algorithm for Data Compression
DATA COMPRESSION AND ITS TYPES
Data Compression, also known as source coding, is the process of encoding or converting data in
such a way that it consumes less memory space. Data compression reduces the number of
resources required to store and transmit data.
It can be done in two ways- lossless compression and lossy compression. Lossy compression
reduces the size of data by removing unnecessary information, while there is no data loss in
lossless compression.
WHAT IS SHANNON FANO CODING?
Shannon Fano Algorithm is an entropy encoding technique for lossless data compression of
multimedia. Named after Claude Shannon and Robert Fano, it assigns a code to each symbol
based on their probabilities of occurrence. It is a variable-length encoding scheme; that is, the codes assigned to the symbols will be of varying lengths.
HOW DOES IT WORK?
The steps of the algorithm are as follows:
1. Create a list of probabilities or frequency counts for the given set of symbols so that the
relative frequency of occurrence of each symbol is known.
2. Sort the list of symbols in decreasing order of probability, the most probable ones to the left
and least probable to the right.
3. Split the list into two parts, with the total probability of both the parts being as close to each
other as possible.
4. Assign the value 0 to the left part and 1 to the right part.
5. Repeat the steps 3 and 4 for each part, until all the symbols are split into individual
subgroups.
The Shannon codes are valid as long as the code of each symbol is unique; by construction, no code is a prefix of another, so the encoded stream can be decoded unambiguously.
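
The splitting procedure can be sketched recursively in Python; the function name shannon_fano and the choice of split point (minimising the probability difference between the two parts) are illustrative:

def shannon_fano(symbols):
    # symbols: list of (symbol, probability) pairs; returns {symbol: code}.
    symbols = sorted(symbols, key=lambda sp: sp[1], reverse=True)
    codes = {s: "" for s, _ in symbols}
    def split(group):
        if len(group) <= 1:
            return
        total = sum(p for _, p in group)
        # Choose the split where the two parts are closest in probability.
        running, best_i, best_diff = 0.0, 0, float("inf")
        for i in range(len(group) - 1):
            running += group[i][1]
            diff = abs(total - 2 * running)
            if diff < best_diff:
                best_i, best_diff = i, diff
        left, right = group[:best_i + 1], group[best_i + 1:]
        for s, _ in left:
            codes[s] += "0"     # left part gets a 0
        for s, _ in right:
            codes[s] += "1"     # right part gets a 1
        split(left)
        split(right)
    split(symbols)
    return codes

print(shannon_fano([("A", 0.22), ("B", 0.28), ("C", 0.15), ("D", 0.30), ("E", 0.05)]))
# -> {'D': '00', 'B': '01', 'A': '10', 'C': '110', 'E': '111'}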
EXAMPLE:
The task is to construct Shannon codes for the given set of symbols using the Shannon-Fano lossless compression technique.
Solution:
Let P(x) be the probability of occurrence of symbol x:

Symbol:       A     B     C     D     E
Probability:  0.22  0.28  0.15  0.30  0.05

1. Upon arranging the symbols in decreasing order of probability:

Symbol:       D     B     A     C     E
Probability:  0.30  0.28  0.22  0.15  0.05

P(D) + P(B) = 0.30 + 0.28 = 0.58


and,
P(A) + P(C) + P(E) = 0.22 + 0.15 + 0.05 = 0.42
Since these two parts split the total probability almost equally, the table is divided into
{D, B} and {A, C, E}
and we assign them the values 0 and 1 respectively.
2. Now, in the {D, B} group,

P(D) = 0.30 and P(B) = 0.28

which means that P(D) ≈ P(B), so divide {D, B} into {D} and {B} and assign 0 to D and 1 to B.
3. In the {A, C, E} group,

P(A) = 0.22 and P(C) + P(E) = 0.20


So the group is divided into
{A} and {C, E}
and they are assigned values 0 and 1 respectively.
4. In the {C, E} group,

P(C) = 0.15 and P(E) = 0.05


So divide them into {C} and {E} and assign 0 to {C} and 1 to {E}.
Note: The splitting now stops, as each symbol has been separated into its own subgroup.

The Shannon codes for the set of symbols are:

Symbol:  D    B    A    C    E
Code:    00   01   10   110  111