
Data Compression

Media Signal Processing, Presentation 2

Presented by: Jahanzeb Farooq, Michael Osadebey
What is Data Compression?

Definition
-Reducing the amount of data required to represent a source of
information (while preserving the original content as much as
possible).
Objectives
1- Reduce the amount of data storage space required.
2- Reduce data transmission time over the network.
Categories of Data Compression

Lossy Data Compression


-The original message can never be recovered exactly as it was before
it was compressed.
-Not suitable for critical data, where we cannot afford to lose even a
single bit.
-Used mostly in sound, video, and image compression, where the losses
can be tolerated.
-A threshold level is used for truncation. (For example, in a sound file,
very high and low frequencies, which the human ear cannot hear, may
be truncated from the file.)
-Examples: JPEG, MPEG
-Lossy techniques are much more effective at compression than
lossless methods, but the higher the compression ratio, the more noise
is added to the data.
Categories of Data Compression

Lossless Data Compression


-The original message can be exactly decoded.
-Repeated patterns in a message are found and encoded in an
efficient manner.
-Also referred to as ’redundancy reduction’.
-Required for textual data, executable code, word processing
files, and tabulated numbers.
-Popular algorithms: LZW (Lempel-Ziv-Welch), RLE (Run-Length
Encoding), Huffman coding, arithmetic coding, delta encoding.
-GIF images (an example of lossless image compression)
Applications: Why We Need Data Compression?

The two most important points are:


1-Data storage
-Modern data processing applications require storage of large volumes
of data.
-Compressing a file to half of its original size is equivalent to doubling
the capacity of the storage medium.
2-Data transmission
-Modern communication networks require massive transfer of data
over communication channels.
-Compressing the amount of data to be transmitted is equivalent to
increasing the capacity of the communication channel.
-The smaller a file is, the faster it can be transferred over the channel.
Applications

-Data compression has a wide range of applications; it is used
almost everywhere.
Types
-Image Compression (e.g. JPEG images)
-Audio Compression (e.g. MP3 audio)
-Video Compression (e.g. DVDs)
-General Data Compression (e.g. ZIP files)
Data Compression Algorithms

1-Huffman coding
2-Run Length Encoding
3-Lempel-Ziv-Welch Encoding
4-Arithmetic coding
5-Delta Encoding

Some others...
6-Adaptive Huffman coding
7-Wavelet compression
8-Discrete Cosine Transform
Huffman Coding

-The characters in a data file are converted to a binary code.


-The most common characters in the input file (characters with higher
probability) are assigned short binary codes, and
-the least common characters (with lower probabilities) are assigned
longer binary codes.
-Codes can therefore be of different lengths.
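The idea can be sketched in a few lines of Python. This is an illustrative sketch, not code from the presentation; the function name and the sample string are made up for the example.

```python
# Illustrative Huffman coding sketch: frequent characters get short codes.
import heapq
from collections import Counter

def huffman_codes(text):
    """Return a dict mapping each character to its binary code string."""
    freq = Counter(text)
    # Heap entries: (frequency, tie-breaker, node); a node is either a
    # leaf character or a (left, right) pair of subtrees.
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    if len(heap) == 1:  # degenerate case: only one distinct symbol
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        # Repeatedly merge the two least frequent subtrees.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, counter, (left, right)))
        counter += 1
    codes = {}
    def walk(node, prefix):
        # Left edges append '0', right edges append '1'.
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix
    walk(heap[0][2], "")
    return codes

codes = huffman_codes("this is an example of huffman coding")
# The space (the most frequent character here) gets one of the shortest codes.
```

The result is a prefix-free code: no code is a prefix of another, so a bit stream can be decoded unambiguously.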
Lempel-Ziv-Welch

-Uses a dictionary or code table.


-Done by constructing a "dictionary" of words or parts of words in a
message, and then using pointers to the words in the dictionary.
-LZW compresses text, executable code, and similar data files to
about one-half their original size. Higher compression ratios of up to
5:1 are also achievable.
Example:

The string "ain" can be stored in the dictionary and then pointed to
when it repeats.
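A minimal compressor can be sketched as follows. This is an illustrative sketch, not the presenters' code; the sample string is invented so that substrings like "ain" repeat.

```python
# Illustrative LZW compression sketch: the dictionary starts with all
# single characters and grows as new strings are seen, so repeated
# substrings like "ain" are later emitted as a single code.
def lzw_compress(data):
    """Compress a string into a list of dictionary codes."""
    dictionary = {chr(i): i for i in range(256)}  # initial single-char entries
    next_code = 256
    current = ""
    output = []
    for ch in data:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate  # keep extending the current match
        else:
            output.append(dictionary[current])  # emit code for longest match
            dictionary[candidate] = next_code   # add the new string
            next_code += 1
            current = ch
    if current:
        output.append(dictionary[current])
    return output

codes = lzw_compress("rain in spain falls mainly on the plain")
# Fewer codes are emitted than there are input characters, because
# repeated fragments such as "ain" and "in " match dictionary entries.
```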
Run Length Encoding

-Coding data with frequently repeated characters.


-It is called run-length encoding because each run of repeated bits is
coded in fewer bits by simply stating how many bits the run contains.
Example:
-A file with 0 as the repeating character.
-Each run of zeros is replaced by two characters in the compressed file.
-For the first 3 repeated 0's in the original file, the first encoded pair
in the compressed file shows that '0' was repeated 3 times.
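The scheme can be sketched as follows; this is an illustrative Python sketch (function names and sample input invented for the example), not the presenters' code.

```python
# Illustrative run-length encoding sketch: each run of a repeated
# character becomes a (count, character) pair.
def rle_encode(data):
    """Encode a string as a list of (count, char) pairs."""
    runs = []
    for ch in data:
        if runs and runs[-1][1] == ch:
            runs[-1][0] += 1       # extend the current run
        else:
            runs.append([1, ch])   # start a new run
    return [(count, ch) for count, ch in runs]

def rle_decode(runs):
    """Reverse the encoding exactly (RLE is lossless)."""
    return "".join(ch * count for count, ch in runs)

encoded = rle_encode("000111100000")
# encoded == [(3, '0'), (4, '1'), (5, '0')]
```

Note that RLE only pays off when runs are long; for data without repeats, the (count, char) pairs can make the output larger than the input.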
Arithmetic Coding

-Message is encoded as a real number in an interval from 0 to 1.


-Shows better performance than Huffman coding.
Disadvantages
-The whole codeword must be received before decoding can start.
-If there is a corrupt bit in the codeword, the entire message could
become corrupt.
-Only a limited number of symbols can be encoded within one codeword
(due to finite numerical precision).
Arithmetic Coding

Encoding the example message "ABD":

Symbol   Probability   Initial Interval
A        0.2           (0.0, 0.2)
B        0.3           (0.2, 0.5)
C        0.1           (0.5, 0.6)
D        0.4           (0.6, 1.0)

After 'A', the interval narrows to (0.0, 0.2), subdivided as:
A (0.0, 0.04)    B (0.04, 0.1)    C (0.1, 0.12)    D (0.12, 0.2)

After 'B', the interval narrows to (0.04, 0.1), subdivided as:
A (0.04, 0.052)    B (0.052, 0.07)    C (0.07, 0.076)    D (0.076, 0.1)

After 'D', the final interval is (0.076, 0.1); any number in this
interval identifies the message "ABD".
