Research Question: To what extent are data compression techniques like the Lempel-Ziv-Welch algorithm, DEFLATE compression and DWT compression effective in compressing TIFF images with regard to data compression ratio, space-time complexity and efficiency?
Krises Maskey
1. Introduction
The following essay will focus on various algorithms used to compress data and will compare them according to speed, compression ratio, efficiency and practicality in cloud computing. This is mainly helpful because large data can be compressed into fewer bits, which saves both time and money. This essay will specifically look into Lempel-Ziv-Welch compression, DEFLATE compression and the lossy compressions used in cloud computing. Data compression is the basis of multimedia applications around the world: without data compression algorithms it would not be possible to upload images, audio, text and video to websites, and mobile phones would not be able to provide clear telecommunication. Of the many data compression techniques currently in use, Lempel-Ziv-Welch, DEFLATE and lossy compression are three examples. These techniques allow large data files to be compressed into a smaller number of bits while keeping the space-time complexity manageable.
Hence the question: to what extent are data compression techniques like the Lempel-Ziv-Welch algorithm, DEFLATE compression and lossy compression more effective in cloud computing with regard to compression ratio, space-time complexity and efficiency?
Even though each compression algorithm uses different techniques to compress files, both approaches have the same function: they search for duplicated data in the file and use algorithms to store it in a more compact representation. Lossless data compression reduces size by identifying and eliminating statistical redundancy; its major advantage is that no data is lost in the process. Lossy data compression, in contrast, reduces size by removing unnecessary and repeated information, and its major drawback is that data is lost in the process.
Lossy data compression methods mainly include DCT (Discrete Cosine Transform) and vector quantization, whereas lossless data compression methods include RLE (Run-Length Encoding), Huffman coding, string-table compression and LZW (Lempel-Ziv-Welch). There are several measures for evaluating a compression technique, such as the size reduction from the source file to the compressed version. To understand and compare the algorithms in this essay, the following metrics are considered (a small sketch showing how some of them can be computed follows this list):
i. Compression ratio
Compression ratio is defined as the ratio of the size of the input file to the size of the compressed output file. For example, when a file is compressed and compared to its original size, and the result shows the compressed file is three times smaller than the original one, it is said to have a compression ratio of 3:1.
ii. Computational complexity
Computational complexity describes the resources and time required to run an algorithm. For example, we can use the O-notation, which mainly denotes the time efficiency and storage requirement. However, the behavior of compression algorithms can be very inconsistent, so computational complexity alone can never reliably predict how an algorithm will perform on a particular input.
iii. Compression time
Compression time is the time required to compress a file. Even though the times for encoding and decoding a file are considered separately, for some applications decoding time is more important than encoding time, while for other applications both are equally important.
v. Entropy
Entropy measures the amount of information in the data and depends on the statistical characteristics of the input. Entropy can also be used as a theoretical bound on the number of bits needed to represent the data, which gives a theoretical idea of how much compression can be achieved.
vi. Redundancy
Redundancy describes how far the data is from being random. The higher the redundancy of the input, the more compression can be achieved, whereas lower redundancy results in a lower compression ratio.
vii. Overhead
The amount of extra data added to the compressed version of the input is known as overhead; it is added because it is needed for decompression later on. The overhead can at times be large, but it should be much smaller than the space saved by compression.
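The metrics above can be computed directly once a compressor is available. Below is a minimal illustrative sketch in Python, using the standard zlib module purely as a stand-in compressor; the metrics function and the sample input are my own choices for illustration.

import math
import zlib
from collections import Counter

def metrics(data: bytes) -> dict:
    # Compress with a stand-in compressor (zlib) just to obtain an output size.
    compressed = zlib.compress(data)
    # Shannon entropy in bits per byte, estimated from byte frequencies.
    counts = Counter(data)
    entropy = -sum((c / len(data)) * math.log2(c / len(data)) for c in counts.values())
    return {
        "compression_ratio": len(data) / len(compressed),             # 3.0 means 3:1
        "saving_percentage": 100 * (1 - len(compressed) / len(data)),
        "entropy_bits_per_byte": entropy,
    }

print(metrics(b"aaaaabbbbccc" * 100))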
Lempel-Ziv-Welch (LZW) was created by Abraham Lempel, Jacob Ziv and Terry Welch. It builds on the algorithm originally published by Lempel and Ziv in 1978, which Welch refined in 1984. Since then, LZW has become a very common compression algorithm, used mainly in GIF, PDF and text files. LZW is a type of lossless compression, meaning no data is lost when compressing. The algorithm is fairly simple to implement and has one of the highest throughput potentials in hardware implementations. LZW is widely used to compress UNIX files, text and GIF images.
LZW compression works by searching for recurring patterns, or redundancy, in the data in order to save space. LZW is seen as one of the fastest techniques for general-purpose data compression. It primarily reads a sequence of symbols, groups the symbols into strings, and converts the strings into codes; this way the codes take up less space than the strings they replace.
a. LZW compression mainly uses a code table, typically with 4096 entries. Codes 0-255 in the code table are always assigned to represent single bytes from the input file.
b. At the start the code table contains only these first 256 entries; as encoding proceeds, the remaining entries of the table are filled. Compression is achieved when codes 256 and above are used to represent sequences of two or more bytes.
c. The encoding continues as LZW identifies repeated sequences in the data and adds them to the code table.
d. The file can be decompressed by taking each code from the compressed file and translating it through the code table to find what character or string of characters it represents.
The major idea of this compression technique is that, as the input data is processed, a dictionary keeps a correspondence between the longest encountered words and a list of code values. These words are then replaced by their corresponding codes, and in this way the input file is compressed to a smaller size.
Sample pseudocode (LZW decompression):

Initialize table with single-character strings
OLD = first input code
output translation of OLD
WHILE not end of input stream
    NEW = next input code
    IF NEW is not in the string table
        S = translation of OLD
        S = S + C
    ELSE
        S = translation of NEW
    output S
    C = first character of S
    add translation of OLD + C to the string table
    OLD = NEW
END WHILE
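Below is a minimal, illustrative Python sketch of the same idea, showing both LZW compression and the decompression routine described by the pseudocode above. The table handling is simplified (no maximum table size or code-width management), so it is a sketch of the technique rather than a production implementation.

def lzw_compress(data: bytes) -> list:
    # Codes 0-255 represent single bytes; new strings receive codes from 256 upwards.
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    result, current = [], b""
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate                  # keep extending the current match
        else:
            result.append(table[current])        # output the code for the longest match
            table[candidate] = next_code         # add the new string to the table
            next_code += 1
            current = bytes([byte])
    if current:
        result.append(table[current])
    return result

def lzw_decompress(codes: list) -> bytes:
    # Mirrors the pseudocode: the table is rebuilt while decoding.
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    old = codes[0]
    output = bytearray(table[old])
    for new in codes[1:]:
        if new in table:
            s = table[new]
        else:                                    # special case: code not yet in the table
            s = table[old] + table[old][:1]
        output += s
        table[next_code] = table[old] + s[:1]    # add translation of OLD + first char of S
        next_code += 1
        old = new
    return bytes(output)

sample = b"TOBEORNOTTOBEORTOBEORNOT"
assert lzw_decompress(lzw_compress(sample)) == sample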
The DEFLATE algorithm splits the input data into a series of blocks, each corresponding to a block of the input stream. The input blocks are compressed using a combination of the LZ77 compression algorithm and Huffman coding. When the data is compressed, the LZ77 algorithm finds repeated substrings (words, letters, etc.) and replaces them with backward references. The LZ77 stage can use a reference for a duplicated string occurring in the same or preceding blocks, up to 32K input bytes back.
a. LZ77 compression
LZ77 was developed by Abraham Lempel and Jacob Ziv and uses a window divided into a search buffer and a look-ahead buffer. The search buffer is typically around 8192 bytes, while the look-ahead buffer is only about 10 to 20 bytes. The algorithm can be described as follows: first, the longest prefix of the look-ahead buffer that starts in the search buffer is found. This prefix is then encoded as a triplet (a, b, c), where 'a' is the distance from the end of the search buffer to the beginning of the found prefix, 'b' is the length of the prefix, and 'c' is the first character after the prefix in the look-ahead buffer.
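A small illustrative Python sketch of this triplet encoding is given below. The buffer sizes are deliberately tiny (my own choices, not the sizes DEFLATE uses), and a brute-force search is used for clarity rather than speed.

def lz77_encode(data: bytes, search_size: int = 255, lookahead_size: int = 15):
    i, triples = 0, []
    while i < len(data):
        best_len, best_dist = 0, 0
        start = max(0, i - search_size)
        # Find the longest prefix of the look-ahead buffer that starts in the search buffer.
        for j in range(start, i):
            length = 0
            while (length < lookahead_size and i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        next_char = data[i + best_len]
        triples.append((best_dist, best_len, next_char))   # the (a, b, c) triplet
        i += best_len + 1
    return triples

def lz77_decode(triples) -> bytes:
    out = bytearray()
    for dist, length, ch in triples:
        for _ in range(length):
            out.append(out[-dist])    # copy byte by byte from 'dist' positions back
        out.append(ch)
    return bytes(out)

msg = b"abracadabra abracadabra"
assert lz77_decode(lz77_encode(msg)) == msg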
b. Huffman compression
This compression technique is based on the redundancy of data items: the most frequent items are encoded with a smaller number of bits. The algorithm creates a binary tree, known as the Huffman tree, based on the frequency of the data. A Huffman algorithm starts by assembling the elements of the data and assigning each one a 'weight', a number that represents its relative frequency within the dataset to be compressed. These weights can be guessed, measured exactly from passes through the data, or some combination of the two. The elements are then selected two at a time, the elements with the lowest weights being chosen; the two elements are combined into a new node whose weight is the sum of the two, and the process is repeated until a single tree, the Huffman tree, remains.
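The following short Python sketch illustrates this procedure: symbol weights are measured from the data, the two lowest-weight elements are repeatedly merged, and the code for each symbol is read off the finished tree. It is a minimal illustration, not the exact code construction DEFLATE uses.

import heapq
from collections import Counter

def huffman_codes(data: bytes) -> dict:
    # Heap entries are (weight, tie_breaker, tree); a tree is either a leaf symbol
    # (an int) or a (left, right) pair of subtrees.
    heap = [(w, i, sym) for i, (sym, w) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)          # pick the two lowest weights...
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next_id, (t1, t2)))   # ...and merge them
        next_id += 1
    codes = {}
    def walk(tree, path):
        if isinstance(tree, tuple):              # internal node: branch left/right
            walk(tree[0], path + "0")
            walk(tree[1], path + "1")
        else:                                    # leaf: record the code word
            codes[tree] = path or "0"
    walk(heap[0][2], "")
    return codes

codes = huffman_codes(b"this is an example of a huffman tree")
# The most frequent symbols (such as the space) receive the shortest codes.
print(sorted(codes.items(), key=lambda kv: len(kv[1]))[:5])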
When Huffman compression and LZ77 compression are combined, the DEFLATE compression algorithm is formed. DEFLATE is considered one of the most efficient and effective compression techniques.
First the compression is done with LZ77, which is followed by Huffman coding. One set of Huffman trees used by this algorithm is already defined by the DEFLATE specification itself, so no extra space needs to be taken to store those trees. In this process the data is initially broken up into 'blocks', and each block uses a single mode of compression: either no compression, compression with the fixed Huffman trees defined by the specification, or compression with a pair of Huffman trees created specifically for the current block and stored alongside it.
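DEFLATE is available in most languages through standard libraries. The short example below uses Python's zlib module, which implements the DEFLATE method, simply to show the effect of the algorithm on repetitive input; the sample string and compression level are arbitrary choices.

import zlib

original = b"DEFLATE combines LZ77 back-references with Huffman coding. " * 50
compressed = zlib.compress(original, level=9)     # level 9 favors ratio over speed
assert zlib.decompress(compressed) == original

print(f"original: {len(original)} bytes, compressed: {len(compressed)} bytes")
print(f"compression ratio: {len(original) / len(compressed):.1f}:1")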
Discrete Wavelet Transform (DWT) is used in image compression; it separates an image into sets of approximation and detail coefficients. This technique is used in signal and image processing, mainly for lossless image compression.
In this paper we will specifically look at the Haar wavelet transform method. The Haar wavelet transform is one of the simplest transforms for image compression, and the process involved is also very simple, as it only requires calculating averages and differences of adjacent pixels. The Haar DWT is also faster and more computationally efficient than other sinusoidal-based discrete transforms. Its major drawback, however, is the trade-off between image quality and its decreased energy compaction compared to the DCT. As a general rule of thumb, when computational complexity increases the achievable compression ratio also increases; since the computational complexity of the Haar DWT is at a minimum, its achievable compression ratio is correspondingly lower.
The Haar DWT operates on a matrix, first row-wise, where the sums and differences of consecutive elements are found. The sums and differences are stored such that, if the matrix is split in half from top to bottom, the sums can be found on one side and the differences on the other. The next operation occurs column-wise, where the image is split in half from left to right, storing the sums in one half and the differences in the other. This process is repeated on the smaller power-of-two sub-matrix containing the sums, which results in sums of sums. The number of times this process is applied is referred to as the depth of the transform.
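A minimal NumPy sketch of a single level of this transform is shown below. It assumes a square image whose side length is a power of two and uses averages (normalized sums) and differences of adjacent pixels, as described above; repeating the function on the sums quarter of its output increases the depth of the transform.

import numpy as np

def haar_level(image):
    img = image.astype(float)
    # Row-wise: averages of consecutive pairs go to one half, differences to the other.
    rows = np.concatenate([(img[:, 0::2] + img[:, 1::2]) / 2,
                           (img[:, 0::2] - img[:, 1::2]) / 2], axis=1)
    # Column-wise: the same operation applied down the columns.
    return np.concatenate([(rows[0::2, :] + rows[1::2, :]) / 2,
                           (rows[0::2, :] - rows[1::2, :]) / 2], axis=0)

image = np.arange(64).reshape(8, 8)
print(haar_level(image).round(1))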
b. The basis vectors of the Haar matrix are sequency ordered.
c. A drawback of the Haar DWT is that it has poor energy compaction for images.
d. Orthogonality: the original signal is split into a low-frequency and a high-frequency part, and filters that enable this splitting without duplicating information are said to be orthogonal.
e. Compact support: the magnitude response of the filter should be exactly zero outside the frequency range covered by the transform.
f. Perfect reconstruction: if the input signal can be reproduced exactly from a weighted sum of the basis functions, so that the reproduced sample values are identical to those of the input signal, the transform is said to have the perfect reconstruction property. If, in addition, no information redundancy is present in the sampled signal, the wavelet transform is, as stated above, orthonormal.
An image is defined as a two-dimensional function f(x, y), where x and y are the spatial (plane) coordinates and the amplitude at any pair of coordinates (x, y) is known as the gray level or intensity of the image at that point. An image can be called a 'digital image' when x, y and the amplitude values of f are all finite and discrete quantities. When an image is stored it takes up some amount of storage, which increases as more images are stored. To address this, image compression is used, which helps to minimize the amount of memory needed to represent an image. A large number of bits is required to represent a single image, and when the image needs to be stored or transferred it is often infeasible to do so without reducing the number of bits. This is a dilemma for the world at present, with large numbers of images stored on our smartphones taking up considerable storage. To overcome this issue image compression was developed; it can be defined as the process of reducing the amount of data required to represent an image, which is achieved by removing redundant information from the image. The three types of redundancy exploited for this are:
1. Coding redundancy: An image is represented by a certain number of bits, and the gray levels used to represent the image depend on the number of bits used to represent each pixel.
a) To reduce the coding redundancy of the gray levels in an image we can use a variable-length code, i.e. vary the number of bits used to represent each pixel. The general idea is to use fewer bits for the more frequent gray levels and a greater number of bits for the less frequent gray levels in an image. This way the entire image is represented using the smallest possible number of bits, which reduces the coding redundancy (a small worked example appears after this list).
2. Inter-pixel redundancy: In an image each pixel depends on its neighboring pixels; inter-pixel redundancy means two or more pixels are strongly dependent on one another. For example, when watching a video with a high frame rate, successive frames contain almost the same information. Similarly, in still images, the higher the spatial resolution, the more alike neighboring pixels are.
3. Psycho-visual redundancy: Psycho-visual redundancy is information that is ignored by the human eye or is unimportant in an image. Reducing this can compress the image further without a noticeable loss in the perceived amount of detail.
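As a toy illustration of coding redundancy, suppose an image uses only four gray levels with the made-up frequencies below. A fixed-length code needs 2 bits for every pixel, while a variable-length code that gives shorter codes to the more frequent levels needs fewer bits per pixel on average.

probabilities = {0: 0.60, 1: 0.25, 2: 0.10, 3: 0.05}    # hypothetical gray-level frequencies
variable_code = {0: "0", 1: "10", 2: "110", 3: "111"}   # shorter codes for frequent levels

fixed_bits = 2                                            # 4 levels need 2 bits each
avg_bits = sum(p * len(variable_code[g]) for g, p in probabilities.items())

print(f"fixed-length code: {fixed_bits} bits/pixel")
print(f"variable-length code: {avg_bits:.2f} bits/pixel")  # 0.60*1 + 0.25*2 + 0.10*3 + 0.05*3 = 1.55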
Now it is important to consider which of the three algorithms is most efficient. Compression ratio, time complexity and computational complexity were brought up at the beginning of this essay but were not applied when the three algorithms were explored. To find out, an experiment will be carried out in which these are measured for each algorithm and compared.
For this experiment all three algorithms will compress a TIFF-format image as input. As mentioned earlier, the comparison will only consider lossless compression techniques. The LZW algorithm, DEFLATE and the Haar DWT will be compared on the basis of compression ratio, saving percentage, and which compresses the most. The complication with comparing these algorithms is that they rely on different resources, owing to their different uses in the field of data compression: LZW, as mentioned earlier, is a dictionary-based compression technique which compresses an image by removing its spatial redundancy; the DEFLATE algorithm compresses an image without reducing its quality; and Haar DWT compression compresses an image according to its color redundancy.
The experiment will measure the compression time, the compression ratio of all three algorithms, and the complexity of each one. By varying the compression ratio with respect to complexity, a clear relationship between the image and the algorithm can be determined, and how this relationship differs with the size of the input file can also be concluded from this experiment.
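A minimal sketch of how compression time and ratio could be recorded for one algorithm is shown below; the measure function and the file name sample.tiff are hypothetical placeholders, and any of the compressors discussed above (for example zlib.compress for DEFLATE) could be passed in.

import time
import zlib
from pathlib import Path

def measure(compress, path):
    data = Path(path).read_bytes()
    start = time.perf_counter()
    compressed = compress(data)              # run the algorithm under test
    elapsed = time.perf_counter() - start
    return {
        "compression_ratio": len(data) / len(compressed),
        "saving_percentage": 100 * (1 - len(compressed) / len(data)),
        "time_seconds": elapsed,
    }

print(measure(zlib.compress, "sample.tiff"))   # "sample.tiff" is a placeholder input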
Therefore, I hypothesize that there will be a positive relationship between compression ratio and the file size of the image, as described above. I also believe that LZW will be the most efficient of the three algorithms when compressing TIFF images. Since efficiency will be the conclusive factor, the time it takes for each algorithm to compress a TIFF image will also be measured.