Raee2018 PDF

Design and Implementation of Efficient DA
Architecture for LeGall 5/3 DWT

Fatima Aziz, Sadaf Javed, Syeda Eima Iftikhar Gardezi, Ch. Jabbar Younis, Mehboob Alam
Department of Electrical Engineering, Mirpur University of Science and Technology (MUST),
Mirpur - 10250 (AJK), Pakistan,
fatimaaziz.pk@gmail.com, sadafjaved.pk@gmail.com, eimaiftikhar.pk@gmail.com,
jabbaryounis.ee@must.edu.pk , m.alam@ieee.org
Abstract: In recent years, wavelet based image and video processing has gained increasing importance in multimedia applications. In this paper, we design and implement a hardware-efficient lossless LeGall 5/3 Discrete Wavelet Transform (DWT) using the Distributed Arithmetic (DA) architecture. The DA based architecture exploits the computational similarities within the intermediate results of the coefficient module and uses only adders and shift registers to achieve a low complexity architecture. In our proposed implementation, the DWT coefficient matrix is distributed by selective analysis of the input, the output and the preferred precision of the coefficients. The DWT is implemented in a Hardware Description Language (HDL), with its main computational unit based on the DA architecture. A hardware comparison shows savings in excess of 75% compared to the direct filter based implementation, and 22% in comparison to other general purpose and optimized DWT architectures. The proposed solution is high performance and low power, and greatly eases the real time transmission and compression of high fidelity video over bandwidth-limited channels.

Index Terms: LeGall 5/3 DWT, Distributed Arithmetic, Wavelet, Video Processing, Image Processing

I. INTRODUCTION

There is a growing interest in the transmission of multimedia content such as audio, images, and digital video over band-limited channels [1], [2]. The use of the internet and various mobile applications is forcing a steady increase in the demand for multimedia content. Given the computing performance and storage capacity currently available in various devices, there is increasing interest in the processing of multimedia data, which is routinely transferred and processed at various stages [3]. Digital image and video compression plays a significant role in the transmission and storage of multimedia content. Most images share the property that adjacent pixels are highly correlated and therefore contain redundant information [4], [5]. The main idea behind using a transform in image compression is to obtain a representation of the image in which the pixels are less correlated. The wavelet transform is useful in this respect: it decomposes a signal into a multi-scale description and hence permits a compressed view of its information content [6]. The DWT has many existing architectures and hardware realizations, however the high demands of real time video compression require a very high-throughput, low-power architecture for the DWT [7]–[11].

There exist a few architectures which implement the 5/3 wavelet transform. The direct filter based implementation employs all 8 multipliers, which in the case of array multiplication at full 8-bit precision result in 58 adders [12]. The filter based implementation has the advantage of processing the input data in parallel, however this convolution based implementation has very high computational complexity. Another interesting implementation is the lifting based architecture, which is inherently low-complexity and reduces the computational complexity by half. However, a lifting based implementation requires bit-serial data and may result in an architecture with low clock frequency and high latency. Hybrid architectures in the form of folded architectures are another choice for the implementation of the DWT [17]. The folded architecture gives an efficient implementation of the 5/3 wavelet transform, however its use of 19 adders is far from ideal. In our implementation, we have selected a filter based architecture, which processes parallel data to provide high throughput and can achieve higher clock frequencies. In addition, carefully selected hardwired operations give low complexity hardware with a multiplierless VLSI implementation of the 5/3 LeGall wavelet transform.

Currently the wavelet transform is part of various multimedia compression standards, e.g. JPEG2000, MPEG-4 and Motion JPEG, and provides the much needed linear transform operation in these standards for high speed applications [13], [14]. In this paper, we explore a low complexity hardware implementation of the lossless 5/3 LeGall DWT. In our implementation, we have used distributed arithmetic to take advantage of computational redundancy, which reduces the hardware needed to implement the wavelet transform. In real time applications, these computationally expensive wavelet transforms are often implemented in hardware to meet high throughput requirements. The motivation is to design a new hardware implementation that is simple and has low hardware complexity for the DWT.

II. DISCRETE WAVELET TRANSFORM

A DWT is a transform in which samples are taken discretely. The key feature of the DWT over other transforms is that it gives both frequency and location information simultaneously. Mathematically it is defined as:

$$x(t) = \sum_{j,k} a_{j,k}\, 2^{j/2}\, \psi(2^{j} t - k), \tag{1}$$
where $\psi$ is the wavelet function, $j$ and $k$ are the frequency and space variables, and $a_{j,k}$ is the wavelet coefficient. Let us express the input signal $x(t)$ at a scale of $j + 1$ as follows:

$$x(t) = \sum_{k} a_{j+1,k}\, 2^{(j+1)/2}\, \psi(2^{j+1} t - k). \tag{2}$$

Note that at one scale lower resolution, i.e. at $j$, wavelets are necessary and the function can be expressed as

$$x(t) = \sum_{k} a_{j,k}\, 2^{j/2}\, \varphi(2^{j} t - k) + \sum_{k} b_{j,k}\, 2^{j/2}\, \psi(2^{j} t - k). \tag{3}$$

Fig. 2. A wavelet transformation has both time location and frequency information about the time-domain signal.

Note that if $\varphi_{j,k}(t)$ and $\psi_{j,k}(t)$ are both orthogonal and normalized, the coefficients at scale $j$ can be found by taking the inner product and interchanging the sum with the integral as

$$a_{j,k} = \sum_{m} h_0(m - 2k)\, a_{j+1}(m), \tag{4}$$

and

$$b_{j,k} = \sum_{m} h_1(m - 2k)\, a_{j+1}(m), \tag{5}$$
where $h_0$ and $h_1$ are the filter coefficients. It is these scale and translation variables which enable the wavelet transform to carry both the time and the frequency information of the signal. Let us consider a time domain signal as shown in Fig. 1, with no frequency information. The wavelet transform, once applied to the signal, has both time location and frequency information as shown in Fig. 2. The simplest DWT implementation uses the filter bank, where at each level the signal is processed by a set of low-pass (LP) and high-pass (HP) filters, as given by the two convolution functions (4) and (5). In the next subsection, we analyze the LeGall 5/3 discrete wavelet transform and its direct implementation using a filter bank.

Fig. 1. Original signal in time-domain.

A. LeGall 5/3 DWT

The DWT can be implemented by a set of filters as given by (4) and (5). In the LeGall 5/3 wavelet transform, each level of resolution can be implemented by a set of LP and HP filters. Note that in this filtering operation, the input sequence is convolved with the filter coefficients. Fig. 3 shows a single level of 5/3 wavelet decomposition. In order to process the input data x[n], the signal is first filtered simultaneously by an LP filter and an HP filter. The data is thus divided into 2 branches, which are later downsampled by 2 to keep the same number of approximation and detail coefficients. At any level of decomposition, the wavelet coefficients are calculated from the previous stage.

Fig. 3. Single level of 5/3 DWT decomposition, showing the LP and HP filters giving the approximation and detail DWT coefficients respectively.

Let us represent the two filter components of the LeGall 5/3 wavelet filters, $g^{5/3}[n]$ and $h^{5/3}[n]$, as

$$h^{5/3}[n] = h[-2]\,x[n+2] + h[-1]\,x[n+1] + h[0]\,x[n] + h[1]\,x[n-1] + h[2]\,x[n-2], \tag{6}$$
and

$$g^{5/3}[n] = g[-1]\,x[n+1] + g[0]\,x[n] + g[1]\,x[n-1]. \tag{7}$$

Table I gives the filter coefficients of the 5/3 wavelet transform, with a 5-coefficient LP filter and a 3-coefficient HP filter.

TABLE I
COEFFICIENTS OF 5/3 LEGALL WAVELET TRANSFORM

n      LP Filter h[n]    HP Filter g[n]
0      6/8               1
±1     2/8               −(4/8)
±2     −(1/8)

The input x[n] in (6) and (7) can be represented as a data vector, however depending upon the level of decomposition, it has several outputs, each one describing a main and a detail component as shown in Fig. 3. The figure shows a digital filter based implementation, where the transform is implemented using the two filters. The most basic form of this transform is a 1D single level, with just one filter of each kind to transform the signal, as shown in Fig. 3.

The 5/3 LeGall wavelet transform is part of the bi-orthogonal and symmetric wavelet family. It is a lossless transform, with no quantization of coefficients in its implementation. The 5/3 transform can be implemented using a filter bank, but the most popular choice is the lifting scheme. Because the lifting scheme is serial in nature, it cannot be used for certain applications. In a filter implementation, the hardware performance of the 5/3 wavelet transform depends on the accuracy of the coefficients. In the next section, we design and implement a hardware-efficient, low power, lossless LeGall 5/3 DWT based on the DA architecture.

III. PROPOSED DA ARCHITECTURE FOR LEGALL 5/3 DWT

Distributed Arithmetic (DA) based architectures are most commonly used in digital signal processing where one of the vectors is pre-determined [15]. In many cases, the results are precomputed and stored in memories or lookup tables. The resulting architectures are usually symmetrical and regular, which can give multiplierless implementations of the original function. There are several distributed arithmetic based implementations of digital signal processing algorithms. DA based architectures are best known for calculating vector inner products. In the case of the filter based DWT implementation, the coefficient matrix is fixed, and the DA architecture can preprocess and exploit the redundancy. In a DA architecture, the arithmetic multiplication by a coefficient is distributed by taking into account the redundancy in computation. DA based architectures are reported to provide savings as high as 88% compared to non-DA filter based implementations [16].

DA is basically an efficient method of generating an inner product in one step. The mathematical derivation of DA is a mixture of Boolean and ordinary algebra. Let us consider the following sum of products:

$$y = \sum_{n=1}^{N} H_n x_n, \tag{8}$$

where $H_n$ are the coefficients, $x_n$ is the input signal and $N$ is the number of products. Let $x_n$ be represented in 2's complement such that $|x_n| < 1$; then $x_n$ can be expressed as

$$x_n = -b_{n0} + \sum_{m=1}^{M-1} b_{nm}\, 2^{-m}, \tag{9}$$

where $b_{nm}$ are the bits and $b_{n0}$ represents the sign bit. Using (8) and (9) to find $y$, we have

$$y = \sum_{n=1}^{N} H_n \left[ -b_{n0} + \sum_{m=1}^{M-1} b_{nm}\, 2^{-m} \right]. \tag{10}$$

Note that the above equation is a form of inner product. Without loss of generality, let us first represent the filter coefficients in terms of decimal numbers as shown in Table II.

TABLE II
COEFFICIENTS OF 5/3 LEGALL WAVELET TRANSFORM

LP co-efficients h[k]    HP co-efficients g[k]
-0.125                   -0.5
0.25                     1
0.75                     -0.5
0.25
-0.125

The filter coefficients given in Table II become the taps of the FIR filter and help in transforming the signal. Let us first represent the wavelet filters $g^{5/3}[n]$ and $h^{5/3}[n]$ using the DA mathematical representation given by (10). Here $A_k$ denotes the $k$-th filter coefficient of Table II and $b_{km}$ the $m$-th bit of the $k$-th input sample, with $b_{k0}$ the sign bit as in (9). For the HP filter $g^{5/3}[n]$, we can write

$$g^{5/3}[n] = -\sum_{k=1}^{3} A_k b_{k0} + \sum_{k=1}^{3} \left[ (A_k b_{k1})2^{-1} + (A_k b_{k2})2^{-2} + (A_k b_{k3})2^{-3} + \dots + (A_k b_{k7})2^{-7} \right] \tag{11}$$

$$\begin{aligned}
g^{5/3}[n] = {} & -[A_1 b_{10} + A_2 b_{20} + A_3 b_{30}] \\
& + [(A_1 b_{11})2^{-1} + (A_1 b_{12})2^{-2} + (A_1 b_{13})2^{-3} + \dots + (A_1 b_{17})2^{-7}] \\
& + [(A_2 b_{21})2^{-1} + (A_2 b_{22})2^{-2} + (A_2 b_{23})2^{-3} + \dots + (A_2 b_{27})2^{-7}] \\
& + [(A_3 b_{31})2^{-1} + (A_3 b_{32})2^{-2} + (A_3 b_{33})2^{-3} + \dots + (A_3 b_{37})2^{-7}]
\end{aligned} \tag{12}$$

$$\begin{aligned}
g^{5/3}[n] = {} & -[A_1 b_{10} + A_2 b_{20} + A_3 b_{30}] \\
& + [(A_1 b_{11}) + (A_2 b_{21}) + (A_3 b_{31})]\,2^{-1} \\
& + [(A_1 b_{12}) + (A_2 b_{22}) + (A_3 b_{32})]\,2^{-2} \\
& + \dots \\
& + [(A_1 b_{17}) + (A_2 b_{27}) + (A_3 b_{37})]\,2^{-7}.
\end{aligned} \tag{13, 14}$$
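To make the regrouping in (13) concrete, the following Python sketch evaluates the HP inner product one bit plane at a time and checks it against the direct sum of products. It is only an illustration under stated assumptions: the 8-bit word (sign bit b_k0 plus seven fractional bits b_k1 to b_k7), the helper names `to_bits`, `da_inner_product` and `direct_inner_product`, and the use of exact `Fraction` arithmetic are choices made here and are not taken from the HDL design.

```python
from fractions import Fraction

# HP coefficients A_1..A_3 as listed in Table II.
HP_COEFFS = [Fraction(-1, 2), Fraction(1), Fraction(-1, 2)]
WORD_BITS = 8  # assumption: sign bit b_k0 plus fractional bits b_k1..b_k7


def to_bits(x):
    """Return the two's complement bits [b0, ..., b7] of x, with -1 <= x < 1."""
    scaled = Fraction(x) * 2 ** (WORD_BITS - 1)
    limit = 2 ** (WORD_BITS - 1)
    if scaled.denominator != 1 or not -limit <= scaled < limit:
        raise ValueError("x is not representable in the assumed word length")
    code = int(scaled) % (1 << WORD_BITS)  # wrap negatives into two's complement
    return [(code >> (WORD_BITS - 1 - m)) & 1 for m in range(WORD_BITS)]


def da_inner_product(coeffs, samples):
    """Bit-plane form of (13): add the coefficients whose input bit is set,
    then weight the plane by -1 (sign plane) or 2**-m."""
    bits = [to_bits(x) for x in samples]
    acc = Fraction(0)
    for m in range(WORD_BITS):
        plane_sum = sum(a for a, b in zip(coeffs, bits) if b[m])
        weight = Fraction(-1) if m == 0 else Fraction(1, 2 ** m)
        acc += weight * plane_sum
    return acc


def direct_inner_product(coeffs, samples):
    """Reference: ordinary sum of products, as in (8)."""
    return sum(a * Fraction(x) for a, x in zip(coeffs, samples))


if __name__ == "__main__":
    window = [Fraction(3, 8), Fraction(-1, 4), Fraction(5, 16)]
    print(da_inner_product(HP_COEFFS, window))      # -19/32
    print(direct_inner_product(HP_COEFFS, window))  # -19/32
```

Each bit plane needs only additions of fixed coefficients followed by a fixed shift, which is the property a hardwired adder network can exploit.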
We started with the standard DA definition of the HP filter $g^{5/3}[n]$ in (11). The coefficients are then distributed over the input, and bits with the same shift are brought together in (13). Similarly, for the LP filter $h^{5/3}[n]$, we can write

$$h^{5/3}[n] = -\sum_{k=1}^{5} A_k b_{k0} + \sum_{k=1}^{5} \left[ (A_k b_{k1})2^{-1} + (A_k b_{k2})2^{-2} + (A_k b_{k3})2^{-3} + \dots + (A_k b_{k7})2^{-7} \right] \tag{15}$$

$$\begin{aligned}
h^{5/3}[n] = {} & -[A_1 b_{10} + A_2 b_{20} + \dots + A_5 b_{50}] \\
& + [(A_1 b_{11})2^{-1} + (A_1 b_{12})2^{-2} + (A_1 b_{13})2^{-3} + \dots + (A_1 b_{17})2^{-7}] \\
& + [(A_2 b_{21})2^{-1} + (A_2 b_{22})2^{-2} + (A_2 b_{23})2^{-3} + \dots + (A_2 b_{27})2^{-7}] \\
& \;\;\vdots \\
& + [(A_5 b_{51})2^{-1} + (A_5 b_{52})2^{-2} + (A_5 b_{53})2^{-3} + \dots + (A_5 b_{57})2^{-7}]
\end{aligned} \tag{16}$$

$$\begin{aligned}
h^{5/3}[n] = {} & -[A_1 b_{10} + A_2 b_{20} + \dots + A_5 b_{50}] \\
& + [(A_1 b_{11}) + (A_2 b_{21}) + \dots + (A_5 b_{51})]\,2^{-1} \\
& + [(A_1 b_{12}) + (A_2 b_{22}) + \dots + (A_5 b_{52})]\,2^{-2} \\
& + \dots \\
& + [(A_1 b_{17}) + (A_2 b_{27}) + \dots + (A_5 b_{57})]\,2^{-7}.
\end{aligned} \tag{17}$$

In order to implement this in HDL, we assume a precision of 8 fractional bits, using the Q1.8 fixed-point number format. Note that in Qn.m format, n gives the number of bits before the binary point and m the number of bits to its right. Initially, the downsampling by two splits the input between the LP and HP filters. The first architecture block is the delay line, which makes parallel data available to the coefficient adder butterflies block as shown in Fig. 4. In the coefficient adder butterflies, the hardwired fixed connections are made keeping in view the redundancy of the addition operations; the details are given in Table III.

Fig. 4. Schematic delay line, which makes the data available for parallel processing by the co-efficient adder butterflies block.

TABLE III
HARDWIRED FIXED ROUTING, WITH SHIFT OPERATION

HP                   LP
W1 = (-X0)           W1 = (-X0)
W2 = (X0 >> 1)       W2 = (X0 >> 1)
W3 = (X1)            W3 = (X0 >> 2)
W4 = (-X2)           W4 = (X0 >> 3)
W5 = (X2 >> 1)       W5 = (X1 >> 2)
W1 = (-X3)           W6 = (X2 >> 1)
W2 = (X3 >> 1)       W7 = (X2 >> 2)
W3 = (X2)            W8 = (X3 >> 2)
W4 = (-X1)           W9 = (-X4)
W5 = (X1 >> 1)       W10 = (X4 >> 1)
W1 = (-X1)           W11 = (X4 >> 2)
W2 = (X1 >> 1)       W12 = (X4 >> 3)

The next hardware components are the adder butterflies blocks for the LP and HP filters, given in Fig. 5 and Fig. 6. They exploit the redundancy in computation and result in fewer additions. A comparison of the proposed DA architecture with other implementations is given in Table IV. Table IV shows savings in excess of 75% compared to the direct filter based implementation, and 22% in comparison to other general purpose and optimized DWT architectures. The 5/3 LeGall DWT was modeled and simulated using Verilog HDL. The waveform of the Verilog HDL simulation, showing the clock, reset, input signal and wavelet coefficients, is given in Fig. 7. The waveform was verified by comparing the results with a Matlab simulation of the 5/3 LeGall DWT for the same set of inputs.

IV. CONCLUSION

In this work, we proposed and implemented a hardware-efficient lossless LeGall 5/3 DWT based on the DA architecture. In our proposed implementation, the DWT coefficient matrix is distributed by selective analysis of the coefficients. A hardware comparison shows savings of 75% and 22% in comparison to direct and other optimized DWT architectures respectively. The proposed solution has low computational complexity and is well suited for the real time transmission and compression of high fidelity video over bandwidth-limited channels.
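As a software illustration of the hardwired routing in Table III, the Python sketch below sums the twelve LP terms and the first five HP terms using only negation, right shifts and additions. It is a hedged sketch, not the Verilog model of this work: the function names, the `GUARD_BITS` pre-scaling (chosen here so that a shift by 3 discards nothing), the restriction to interior samples, and the choice of even positions for LP and odd positions for HP outputs are all assumptions made for the example.

```python
GUARD_BITS = 3  # assumption: pre-scale by 2**3 so that x >> 3 is exact


def lp_shift_add(x0, x1, x2, x3, x4):
    """LP output from the LP terms W1..W12 of Table III (inputs pre-scaled)."""
    terms = [
        -x0, x0 >> 1, x0 >> 2, x0 >> 3,  # W1..W4:   -x0/8
        x1 >> 2,                         # W5:        x1/4
        x2 >> 1, x2 >> 2,                # W6, W7:  3*x2/4
        x3 >> 2,                         # W8:        x3/4
        -x4, x4 >> 1, x4 >> 2, x4 >> 3,  # W9..W12:  -x4/8
    ]
    return sum(terms)


def hp_shift_add(x0, x1, x2):
    """HP output from the HP terms W1..W5 of Table III."""
    terms = [
        -x0, x0 >> 1,  # W1, W2: -x0/2
        x1,            # W3:      x1
        -x2, x2 >> 1,  # W4, W5: -x2/2
    ]
    return sum(terms)


def analyze(samples):
    """One decomposition level over interior samples, downsampled by 2.

    Returns (low, high); every output is scaled by 2**GUARD_BITS.
    """
    scaled = [s << GUARD_BITS for s in samples]
    low = [lp_shift_add(*scaled[n - 2:n + 3])
           for n in range(2, len(scaled) - 2, 2)]
    high = [hp_shift_add(*scaled[n - 1:n + 2])
            for n in range(1, len(scaled) - 1, 2)]
    return low, high


if __name__ == "__main__":
    low, high = analyze([10, 12, 9, 7, 15, 20, 18, 11])
    print(low)   # [67, 117]
    print(high)  # [20, -40, 28]
```

Dividing the first LP value by 2**3 gives 67/8, which equals -10/8 + 12/4 + 3*9/4 + 7/4 - 15/8 computed with the coefficients of Table II, so the shift-and-add network reproduces the filter without any multiplier.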
Fig. 5. The co-efficient adder butterflies block of LP filter, showing full-adder used in the block.

Fig. 6. The co-efficient adder butterflies block of HP filter, showing full-adder used in the block.

TABLE IV
HARDWARE COMPLEXITY WITH EXISTING 5/3 LEGALL WAVELET TRANSFORM

Implementations             Adders    Multipliers
Folded DWT [17]             19        0
Lifting DWT [18]            4         2 (14 Adders)
Convolution DWT [18]        6         4 (12 Adders)
Filter based Direct DWT     0         8 (56 Adders)
Proposed DA Architecture    14        0

Fig. 7. Waveform of Verilog HDL simulation, showing clock, reset, input signal and the calculated wavelet co-efficients.

REFERENCES

[1] J. Huang, H. Wang, and Y. Qian, “Game User-oriented Multimedia Transmission over Cognitive Radio Networks,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 27, no. 1, pp. 198-208, 2017.
[2] M. Xing, J. He, and L. Cai, “Maximum-utility Scheduling for Multimedia Transmission in Drive-thru Internet,” IEEE Transactions on Vehicular Technology, vol. 65, no. 4, pp. 2649-2658, 2016.
[3] L. Gao, J. Song, X. Liu, J. Shao, J. Liu, and J. Shao, “Learning in High-dimensional Multimedia Data: The State of the Art,” Multimedia Systems, vol. 23, no. 3, pp. 303-313, 2017.
[4] D. Báscones, C. González, and D. Mozos, “Hyperspectral Image Compression Using Vector Quantization, PCA and JPEG2000,” Remote Sensing, vol. 10, no. 6, 2018.
[5] M. Alam, W. Badawy, and G. Jullien, “An Efficient Architecture for a Lifted 2D Biorthogonal DWT,” Springer Journal of VLSI Signal Processing, vol. 40, no. 3, pp. 335-342, April 2005.
[6] O. Rioul and M. Vetterli, “Wavelets and signal processing,” IEEE Signal Processing Magazine, vol. 8, no. 4, pp. 14-38, 1991.
[7] M. Martina and G. Masera, “Low-Complexity, Efficient 9/7 Wavelet Filters VLSI Implementation,” IEEE Transactions on Circuits and Systems II, Nov. 13, 2006.
[8] P. Enfedaque, F. Auli-Llinas, and J. C. Moure, “Implementation of the DWT in a GPU through a Register-based Strategy,” IEEE Transactions on Parallel and Distributed Systems, vol. 26, no. 12, pp. 3394-3406, 2015.
[9] S. Bhavani and K. Thanushkodi, “A Survey on Coding Algorithms in Medical Image Compression,” International Journal on Computer Science and Engineering, vol. 02, no. 05, pp. 1429-1434, 2010.
[10] E. Maria. Angelopoulou and Peter Y.K. Chaung, “Implementation and Comparison of the 5/3 Lifting 2D Discrete Wavelet Transform Computation Schedules on FPGAs,” Journal of Signal Processing Systems, vol. 51, no. 1, pp. 3-21, 2008.
[11] G. Shi and W. S. Gan, “An efficient folded architecture for lifting based discrete wavelet transform,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 56, no. 4, pp. 290-294, 2009.
[12] D. J. LeGall, “Sub-band Coding of Images with Low Computational Complexity,” U.S. Patent 4,829,378, issued May 9, 1989.
[13] B. Wu and C. Lin, “A High-performance and Memory-efficient Pipeline Architecture for the 5/3 and 9/7 Discrete Wavelet Transform of JPEG2000 Codec,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, no. 12, pp. 1615-1628, 2005.
[14] Iain E. H. Richardson, “H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia,” John Wiley & Sons, 2004.
[15] M. Alam, W. Badawy, and G. Jullien, “A New Time Distributed DCT Architecture for MPEG-4 Hardware Reference Model,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, no. 5, pp. 726-730, May 2005.
[16] M. A. Shams, A. Chidanandan, W. Pan, and Magdy A. Bayoumi, “NEDA: A low-power high-performance DCT architecture,” IEEE Transactions on Signal Processing, vol. 54, no. 3, pp. 955-964, 2006.
[17] M. Martina and G. Masera, “Multiplierless Folded 9/7-5/3 Wavelet VLSI Architecture,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 54, no. 9, pp. 770-774, Sep. 2007.
[18] C. Lian, K. Chen, H. Chen, and L. Chen, “Lifting based Discrete Wavelet Transform Architecture for JPEG2000,” in Proc. IEEE Int. Symp. Circuits Syst., vol. 2, Sydney, Australia, pp. 445-448, May 2001.
[19] C. Schremmer, “Multimedia Applications of the Wavelet Transform,” Ph.D. Dissertation, Universität Mannheim, 2001.
[20] M. Alam, C. A. Rahman, W. Badawy, and G. Jullien, “Efficient Distributed Arithmetic Based DWT Architecture for Multimedia Applications,” The 3rd IEEE International Workshop on System-on-Chip for Real-Time Applications, pp. 333-336, July 2003.
