An Updated Data Compression Algorithm for Dynamic Data
Vishal Gupta #1, Dheeraj Chandra Murari *2
# Asst. Professor, Department of Computer Science & Engineering, Bipin Tripathi Kumaon Institute of Technology, Dwarahat, Almora, Uttarakhand, India
* Asst. Professor, Department of Computer Science & Engineering, Bipin Tripathi Kumaon Institute of Technology, Dwarahat, Almora, Uttarakhand, India
Abstract — This paper presents a compression algorithm for dynamic data, whose size keeps increasing rapidly. We use a data compression algorithm that filters dynamic data repeatedly so that a heavy volume of data can be passed through a low-bandwidth cable, in spite of heavy load, without any data loss or change in nature. It is a memory-efficient data compression technique based on a block approach that keeps the data in compressed form as long as possible and enables new data to be appended to the already compressed text. The algorithm requires only minimal decompression to support updates of data, using a little pre-processing, which reduces to a minimum the unnecessary time that the algorithms designed till now spend on compression-decompression for each update. Further, the text document can be modified as required without decompressing and recompressing the whole document.
Keywords — Dynamic data, LZW compression, Block Data Structure, Text Compression

I. INTRODUCTION
In the present era, data transfer has become a very important factor for commercial as well as educational systems, but heavy load on network bandwidth forces us to face many uncomfortable situations. Using this technique we can reduce the number of data bits without any external hardware support for processing the data. Data compression is widely used in office automation systems to save storage space and network bandwidth. Many compression methods have been proposed to date that reduce the size of such documents by various degrees, and some of them work efficiently for specific applications [1, 2, 3]. One important characteristic of office documents is that their size keeps increasing at a very fast pace. Moreover, these automation systems require frequent modification of already stored text [4]. Hence, if the data is kept in compressed form, it needs to undergo repeated cycles of compression and decompression with each update.
In this paper, we focus on dynamic documents that are appended with new data very frequently. We propose a compression method for such documents, inspired by Lempel-Ziv compression, Huffman coding and the K-map [5]. The algorithm creates blocks of input text data while compressing it. The novel approach is to keep the data in compressed state as long as possible and to decompress only the block that needs to be modified or appended with new data. This saves the repeated cycles of compression-decompression that other approaches to date require for each modification of the same data. Moreover, our algorithm supports queries on the compressed text based on the location of a word in the original text. In computer science and information theory, Huffman coding is an entropy encoding algorithm used for lossless data compression. Huffman coding uses a specific method for choosing the representation of each symbol, resulting in a prefix code that expresses the most common source symbols using shorter bit strings than those used for less common source symbols. In a Karnaugh map, the Boolean variables are transferred (generally from a truth table) and ordered according to the principles of Gray code, in which only one variable changes between adjacent squares. Once the table is generated and the output possibilities are transcribed, the data is arranged into the largest possible groups containing 2^n cells (n = 0, 1, 2, 3, ...) [1] and the minterm is generated through the axiom laws of Boolean algebra. Our compression scheme is based on the combination of all these.

International Journal of Computer Trends and Technology (IJCTT), Volume 4, Issue 5, May 2013
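To make the prefix-code property described above concrete, the Huffman construction can be sketched as follows. This is a minimal illustration only, not the implementation used by our algorithm:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a prefix code: frequent symbols get shorter bit strings."""
    freq = Counter(text)
    # Each heap entry: (frequency, unique tie-breaker, {symbol: code so far})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate case: only one distinct symbol
        return {s: "0" for s in heap[0][2]}
    i = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)  # two least frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        # Prefix '0' to one subtree's codes and '1' to the other's
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, i, merged))
        i += 1
    return heap[0][2]

codes = huffman_codes("aaaabbc")
# 'a' is most frequent, so its code is never longer than the others
assert len(codes["a"]) <= len(codes["b"]) <= len(codes["c"])
```

No code is a prefix of another, so the bit stream can be decoded unambiguously.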
II. RELATED WORK

Most data compression algorithms based on adaptive dictionaries have their roots in two algorithms developed in 1977 and 1978, known as LZ77 [7] and LZ78 [8]. Various improved variants of the LZ78 algorithm have been designed. The most important modification was made by Terry Welch, published in 1984 and known as LZW [5]. Many variants of this algorithm also exist today [1, 2, 3, 9], many of which are specific to some application. Relatively more advanced methods are based on coding words as the basic unit [15, 16, 17]. These compression techniques use the word as the unit to put up in the dictionary to aid compression. Experiments with word-based LZW have been done by Horspool and Cormack [14]. The recursive block structured data compression algorithm [19] describes an approach for compressing data by creating blocks of data for compression. Further improved approaches for database compression based on the structure of stored data have been proposed [18]. There has been research on efficient database compression and data retrieval systems that retrieve the required data from compressed text efficiently [20, 21, 22]. New techniques that create blocks of data have also been proposed [21]. Moffat has described a static compression technique for dynamic texts [24]. In this paper, we propose a new technique of data compression that creates blocks of data while compressing the input text and stores the corresponding information about the blocks in an index table. The proposed algorithms for data append and update use this index structure, which forms the basis of their efficiency.
III. PROPOSED DYNAMIC COMPRESSION TECHNIQUE

In this section, we outline the algorithm we designed for dynamic documents, inspired by the LZW compression algorithm. The main idea behind our algorithm is to keep the data in compressed state as long as possible and to decompress only the minimal part for each update. The algorithm creates blocks of input text as it parses the text for compression, and stores the corresponding block information in an index table while compressing. Each update and modification requires processing of this index table and decompression of the corresponding block only.

A. BLOCKS AND INDEX CREATION
The key point in the algorithm is the creation of blocks of input text during compression. The compression algorithm adds every new string of characters it sees to the lexicon. This is usually implemented using a trie data structure in which the substrings encountered in the input text are stored.
Fig. I Lexicon implementation by trie data structure.

The lexicon contains all one-character strings at the time of initialization. Now, if each compressed code is of x bits, there can be 2^x different nodes in the trie. After creation of 2^x nodes, the trie gets filled up completely and no more nodes can be added.

No. of bits per compressed code: x
No. of nodes in lexicon: 2^x
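A minimal sketch of such a trie-based lexicon follows. The class and method names are illustrative, not taken from our implementation; it only shows how the lexicon starts with all one-character strings and assigns a fresh code to each new substring:

```python
class TrieNode:
    __slots__ = ("children", "code")
    def __init__(self, code):
        self.children = {}   # next character -> TrieNode
        self.code = code     # compressed code assigned to this substring

class Lexicon:
    """Trie-based lexicon, initialized with all one-character strings."""
    def __init__(self, alphabet):
        self.root = TrieNode(None)
        self.next_code = 0
        for ch in alphabet:
            self.root.children[ch] = TrieNode(self.next_code)
            self.next_code += 1

    def insert(self, node, ch):
        """Add a new substring (node's string + ch) under a fresh code."""
        node.children[ch] = TrieNode(self.next_code)
        self.next_code += 1

lex = Lexicon("ab")
assert lex.root.children["a"].code == 0 and lex.next_code == 2
lex.insert(lex.root.children["a"], "b")   # lexicon now also contains "ab"
assert lex.root.children["a"].children["b"].code == 2
```

With x-bit codes, `next_code` reaching 2^x is exactly the "trie full" condition that closes a block.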
Our algorithm treats the amount of input text that fills up the trie once as a block. It keeps counting the number of words and characters per block. When a block of data is compressed, the algorithm creates an entry for the block in the index table, along with the number of words and characters in the block. The starting and ending pointers of each block are added to the index table; they point, respectively, to the first and last bits of the compressed text corresponding to that block of input text. This creates a data structure with pre-processing information, so that an update or modification of the compressed text requires decompression of only the block containing the data to be updated.
Fig. II Index structure showing blocks with the corresponding entries per block.
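The block and index creation described above can be sketched as follows. This is a deliberately simplified model, not our exact algorithm: it uses a tiny two-letter alphabet, a flat dictionary in place of the trie, and hypothetical index fields (character count and output-code span stand in for the paper's word/character counts and bit pointers):

```python
def compress_in_blocks(text, alphabet="ab", code_bits=2):
    """LZW-style compression that closes a block whenever the lexicon
    fills all 2^x code slots, recording one index entry per block."""
    cap = 2 ** code_bits
    fresh = lambda: {ch: i for i, ch in enumerate(alphabet)}
    d, out, index = fresh(), [], []
    block_start = block_chars = 0
    w = ""
    for ch in text:
        if w + ch in d:
            w += ch
            continue
        out.append(d[w])            # emit code for longest known prefix
        block_chars += len(w)
        d[w + ch] = len(d)          # grow the lexicon
        if len(d) == cap:           # lexicon full: close the block
            index.append({"chars": block_chars,
                          "start": block_start, "end": len(out)})
            d = fresh()             # reset the lexicon for the next block
            block_start, block_chars = len(out), 0
        w = ch
    if w:                           # flush the pending prefix
        out.append(d[w])
        block_chars += len(w)
    if block_chars:
        index.append({"chars": block_chars,
                      "start": block_start, "end": len(out)})
    return out, index

out, index = compress_in_blocks("ababab")
# every input character is accounted to exactly one block
assert sum(b["chars"] for b in index) == len("ababab")
```

An update then needs only a lookup in `index` to find which span of codes to decompress, leaving all other blocks untouched.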
B. K-MAP REVIEW
The K-map is a technique for representing Boolean functions. It comprises a box for every variable (represented by a column) in the truth table. All the inputs in the map are arranged in a way that keeps the Gray code property true for every two adjacent cells [Ercegovac, et al., 1999]. The maximal rectangular groups that cover the 1s give a minimum implementation. The proposed compression technique uses a 4-input K-map, as can be seen in Figure I.
Figure I. K-map for input functions
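The Gray-code ordering of K-map cells mentioned above can be generated with the standard formula i XOR (i >> 1). A minimal sketch (illustrative, not part of the proposed algorithm):

```python
def gray(n):
    """n-bit Gray code sequence: adjacent values differ in exactly one bit."""
    return [i ^ (i >> 1) for i in range(2 ** n)]

# Row/column labels of a 4-variable K-map: 2 variables per axis
rows = gray(2)
assert rows == [0, 1, 3, 2]   # AB = 00, 01, 11, 10
# Adjacent cells (including the wrap-around) differ in exactly one variable
for i in range(4):
    diff = rows[i] ^ rows[(i + 1) % 4]
    assert bin(diff).count("1") == 1
```

This single-bit-change ordering is what makes adjacent K-map cells groupable into the rectangles used for minimization.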
The proposed algorithm uses a K-map with 4 variables; for this 4-input K-map, 256 different combinations are considered. These 256 different combinations evaluate to only 82 distinct minimized expressions. The minimized expressions may contain 1, 2, 3 or 4 variables. For example, the number of different expressions that contain at most one variable is 10 out of the 82: for the input variables (A, B, C, D), the set S1 that contains at most one variable is S1 = {A, A`, B, B`, C, C`, D, D`, 0, 1}. The distribution of these 82 expressions by the number of variables in them is given below.
Table 1 below contains all the possibilities of the minimized terms with the corresponding distinct variables in each minimized term.

Table 1. All possible combinations for the 4-input K-map
C. Proposed Compression Algorithm

An algorithm for compression of data using the new approach:

[Step 1]: Input the data value into the circuit.
[Step 2]: Use the compression algorithm to compress the data.
[Step 3]: Encrypt the compressed data.
[Step 4]: Apply the K-map algorithm.
[Step 5]: Repeat steps 1-4 for the next data value.
D. Proposed Decompression Algorithm
[Flow diagram: compressed data → decompressed data → decrypted data → original data]
An Algorithm for Decompression of data
[Step 1]: Input the compressed data value into the circuit.
[Step 2]: Use the decompression algorithm to decompress the data.
[Step 3]: Decrypt the decompressed data.
[Step 4]: Use the decoder to decode the data.
[Step 5]: Repeat steps 1-4 for the next data value.
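To make the compress-then-encrypt and decrypt-then-decompress round trip concrete, here is a minimal sketch. Plain LZW stands in for the block scheme, and a simple XOR over the codes stands in for the unspecified encryption and K-map stages; every name and the key value are illustrative assumptions, not the authors' implementation:

```python
def lzw_compress(text):
    """Plain LZW over the byte alphabet (a stand-in for the block scheme)."""
    d = {chr(i): i for i in range(256)}
    w, out = "", []
    for ch in text:
        if w + ch in d:
            w += ch
        else:
            out.append(d[w])        # emit code for longest known prefix
            d[w + ch] = len(d)
            w = ch
    if w:
        out.append(d[w])
    return out

def lzw_decompress(codes):
    d = {i: chr(i) for i in range(256)}
    w = d[codes[0]]
    out = [w]
    for code in codes[1:]:
        entry = d[code] if code in d else w + w[0]  # the cScSc special case
        out.append(entry)
        d[len(d)] = w + entry[0]    # mirror the compressor's dictionary
        w = entry
    return "".join(out)

KEY = 0x5A  # hypothetical key; XOR stands in for the encryption/K-map stages
encrypt = decrypt = lambda codes: [c ^ KEY for c in codes]

msg = "TOBEORNOTTOBEORTOBEORNOT"
wire = encrypt(lzw_compress(msg))            # compression steps 1-3
assert lzw_decompress(decrypt(wire)) == msg  # decompression recovers the text
```

The point of the sketch is simply that each decompression step inverts the corresponding compression step, so the pipeline is lossless end to end.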
IV. CONCLUSION

We conclude that the proposed algorithm is more secure and achieves better compression than previous algorithms. A further benefit is that there is no need to decrypt the previously stored data in order to append new data, because of the block structure used, which also increases its efficiency.
V. FUTURE WORK
The proposed algorithm will be implemented in practice to determine its advantages and disadvantages in the real world and to improve it further.
REFERENCES
[1] J. A. Storer and M. Cohn, editors, Proc. 1999 IEEE Data Compression Conference, Los Alamitos, California: IEEE Computer Society Press, 1999.
[2] J. A. Storer and M. Cohn, editors, Proc. 2000 IEEE Data Compression Conference, Los Alamitos, California: IEEE Computer Society Press, 2000.
[3] D. Salomon, Data Compression, Springer Verlag, 1998.
[4] I. H. Witten, A. Moffat and T. C. Bell, Managing Gigabytes: Compressing and Indexing Documents and Images, Van Nostrand Reinhold, 1994.
[5] Terry A. Welch, "A technique for high-performance data compression", IEEE Computer, Vol. 17, No. 6, 1984, pp. 8-19.
[6] Vishal Gupta, Neha Bora, Nitin Arora, "Equivalence between the Number of Binary Trees, Stack Permutations and Chain Matrix Multiplication", International Journal of Advances in Engineering, Science and Technology, Vol. 2, No. 3, pp. 232-235, Aug-Oct 2012.
[7] J. Ziv and A. Lempel, "A universal algorithm for sequential data compression", IEEE Transactions on Information Theory, Vol. 23, Issue 3, 1977, pp. 337-343.
[8] J. Ziv and A. Lempel, "Compression of individual sequences via variable-rate coding", IEEE Transactions on Information Theory, Vol. IT-24, Issue 5, 1978, pp. 530-536.
[9] Vishal Gupta, "Comparative Performance Analysis of AODV, DSR, DSDV, LAR1 and WRP Routing Protocols in MANET Using GloMoSim 2.0.3 Simulator", International Journal of Computer Applications (0975-8887), Vol. 52, No. 20, pp. 16-24, August 2012.
[10] J. A. Storer and T. G. Szymanski, "Data Compression via Textual Substitution", Journal of the ACM, Vol. 29, 1982, pp. 928-951.
[11] Vishal Gupta, Neha Bora, Deepika Sharma, "A Survey of Dynamic Instruction Scheduling for Microprocessors Having Out Of Order Execution", International Journal of Computer Application, Vol. 5, Issue 2, pp. 80-88, October 2012.
[12] Suresh Kumar, Vishal Gupta and Vivek Kumar Tamta, "Dynamic Instruction Scheduling for Microprocessors Having Out Of Order Execution", Computer Engineering and Intelligent Systems, IISTE, Vol. 3, No. 4, pp. 10-14, 2012.
[13] T. C. Bell, "Data Compression in Full-Text Retrieval Systems", Journal of the American Society for Information Science, Vol. 44, No. 9, 1993, pp. 508-531.
[14] Nandita Goyal Bhatnagar, Vishal Gupta and Anubha Chauhan, "A Comparative Study of Differentiation Between Macintosh and Windows Operating System", IJREISS, Vol. 2, Issue 6, pp. 77-85, June 2012.
[15] R. N. Horspool and G. V. Cormack, "Constructing word-based Text Compression Algorithms", in Proc. 2nd IEEE Data Compression Conference, 1992.
[16] D. Pirckl, "Word-based LZW Compression", Master thesis, Palacky University, Olomouc, Czech Republic, 1998.
[17] J. Yang and S. Savari, "Dictionary-based English text compression using word endings", in Data Compression Conference, 2007.
[18] K. S. Ng, L. M. Cheng and C. H. Wong, "Dynamic word based text compression", in Proceedings of the Fourth International Conference on Document Analysis and Recognition, Vol. 1, pp. 412-416.
[19] M. Tilgner, M. Ishida and T. Yamaguchi, "Recursive block structured data compression", in Data Compression Conference, 1997.
[20] H. Kruse and A. Mukherjee, "Data compression using text encryption", in Data Compression Conference, 1997.
[21] Suresh Kumar, Madhu Rawat, Vishal Gupta and Satendra Kumar, "The Novel Lossless Text Compression Technique Using Ambigram Logic and Huffman Coding", Information and Knowledge Management, IISTE, Vol. 2, No. 2, pp. 25-31, 2012.
[22] W. K. Ng and C. V. Ravishankar, "Block-oriented compression techniques for large statistical databases", IEEE Transactions on Knowledge and Data Engineering, Vol. 9, No. 2, 1997, pp. 314-328.
[23] A. Moffat and R. Wan, "RE-store: A system for compressing, browsing and searching large documents", in Proc. 8th Intl. Symp. on String Processing and Information Retrieval, 2001, pp. 162-174.
[24] A. Moffat, N. B. Sharman and J. Zobel, "Static Compression for Dynamic Texts", in IEEE Data Compression Conference, 1997, pp. 126-135.