VOL. ASSP-22, NO. 2, APRIL 1974
Abstract-The structure of transforms having the convolution property is developed. A particular transform is proposed that is defined on a finite ring of integers with arithmetic carried out modulo Fermat numbers. This Fermat number transform (FNT) is ideally suited to digital computation, requiring on the order of N log N additions, subtractions and bit shifts, but no multiplications. In addition to being efficient, the Fermat number transform implementation of convolution is exact, i.e., there is no roundoff error. There is a restriction on sequence length imposed by word length, but multidimensional techniques are discussed which overcome this limitation. Results of an implementation on the IBM 370/155 are presented and compared with the fast Fourier transform (FFT), showing a substantial improvement in efficiency and accuracy.
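$$y_n = x_n * h_n = \sum_{k=0}^{N-1} x_k h_{n-k}, \quad n = 0, 1, \ldots, N-1. \tag{1}$$

$$X = Tx, \quad H = Th, \quad Y = Ty \tag{2}$$

so that

$$Y = X \otimes H. \tag{3}$$

With the $t_{k,m}$'s denoting the elements of $T$,

$$Y_k = \sum_{n=0}^{N-1} t_{k,n} y_n = \sum_{n=0}^{N-1} t_{k,n} \sum_{m=0}^{N-1} x_m h_{n-m} = \sum_{m=0}^{N-1} \sum_{l=0}^{N-1} x_m h_l t_{k,m+l}$$

$$X_k = \sum_{m=0}^{N-1} t_{k,m} x_m, \quad H_k = \sum_{l=0}^{N-1} t_{k,l} h_l, \quad k = 0, 1, \ldots, N-1 \tag{4}$$

$$X_k H_k = \sum_{m=0}^{N-1} \sum_{l=0}^{N-1} x_m h_l t_{k,m} t_{k,l}. \tag{5}$$

$$t_{k,m+l} = t_{k,m} t_{k,l}, \quad k, l, m = 0, 1, \ldots, N-1. \tag{6}$$

$$t_{k,m} = t_{k,1}^{m}, \quad k, m = 0, 1, \ldots, N-1. \tag{7}$$

$$t_{k,m}^{N} = 1, \quad k, m = 0, 1, \ldots, N-1. \tag{8}$$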
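$$t_{k,1} = t_{1,1}^{k}. \tag{9}$$

Denoting $t_{1,1}$ by $\alpha$ gives the elements of $T$ as

$$t_{k,m} = \alpha^{km}, \quad k, m = 0, 1, \ldots, N-1. \tag{10}$$

The elements of $T^{-1}$ are

$$t^{-1}_{k,m} = N^{-1} \alpha^{-km}, \quad k, m = 0, 1, \ldots, N-1. \tag{11}$$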
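$$T T^{-1} = I$$

or

$$N^{-1} \sum_{n=0}^{N-1} \alpha^{kn} \alpha^{-ln} = \delta_{k,l} \tag{12}$$

$$N^{-1} \sum_{n=0}^{N-1} \alpha^{pn} = \begin{cases} 1 & \text{if } p \equiv 0 \pmod N \\ 0 & \text{otherwise.} \end{cases} \tag{13}$$

$$(\alpha^{p} - 1) \sum_{n=0}^{N-1} \alpha^{pn} = \alpha^{pN} - 1 = 0. \tag{14}$$

[Equation (15) is illegible in the source; only its index range, $k, m = 0, 1, \ldots, N-1$, survives.]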
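Relation (13) and the identity $TT^{-1} = I$ can be checked numerically in a small ring. The following is a minimal sketch, assuming the modulus 17 and $\alpha = 4$ (an element of order 4 modulo 17); the function names are illustrative and do not come from the paper.

```python
def transform_matrix(alpha, n, modulus):
    """Matrix T with entries t[k][m] = alpha**(k*m) mod modulus."""
    return [[pow(alpha, k * m, modulus) for m in range(n)] for k in range(n)]


def inverse_transform_matrix(alpha, n, modulus):
    """Matrix with entries N^{-1} * alpha**(-k*m) mod modulus."""
    n_inv = pow(n, -1, modulus)          # modular inverse, Python 3.8+
    alpha_inv = pow(alpha, -1, modulus)
    return [[n_inv * pow(alpha_inv, k * m, modulus) % modulus
             for m in range(n)] for k in range(n)]


def matmul_mod(a, b, modulus):
    """Product of two square matrices with entries reduced mod modulus."""
    n = len(a)
    return [[sum(a[i][j] * b[j][k] for j in range(n)) % modulus
             for k in range(n)] for i in range(n)]


def scaled_power_sum(alpha, p, n, modulus):
    """Left side of (13): N^{-1} * sum_{i=0}^{N-1} alpha**(p*i) mod modulus."""
    n_inv = pow(n, -1, modulus)
    return n_inv * sum(pow(alpha, p * i, modulus) for i in range(n)) % modulus
```

For example, `scaled_power_sum(4, p, 4, 17)` can be evaluated for several values of `p` and compared with the two cases of (13).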
III. TRANSFORMS IN THE RING OF INTEGERS MODULO AN INTEGER
IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, APRIL 1974
In order to have a computational advantage in implementing cyclic convolution, we will derive a transform having the cyclic convolution property in this ring. In addition, we would like this transform to have a fast computational algorithm. First, we will present results about the existence of such transforms, which were shown to depend on the existence of an $\alpha$ of order $N$ and the existence of $N^{-1}$ in the ring. The number theory necessary to understand the derivations is very basic and can be found in any elementary book on number theory [14].
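Let

$$F = p_1^{r_1} p_2^{r_2} \cdots p_l^{r_l}. \tag{16}$$

$$\alpha^{N} = 1 \pmod F \tag{17}$$

which implies

$$\alpha^{N} = 1 \pmod{p_i^{r_i}}, \quad i = 1, 2, \ldots, l. \tag{18}$$

$$\alpha^{p} \neq \alpha^{q} \pmod{p_i^{r_i}}, \quad i = 1, 2, \ldots, l, \quad p \neq q, \quad 0 \le p, q \le N-1. \tag{19}$$

[Equations (20) and (21) are illegible in the source.]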
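Conditions (17) to (19) can be explored by brute force for small moduli. The sketch below is one reading of those conditions, not the paper's procedure: it assumes that a usable $\alpha$ must have multiplicative order exactly $N$ modulo every prime power factor of $F$, and that $N$ must be invertible modulo $F$. All function names are hypothetical.

```python
from math import gcd


def prime_power_factors(f):
    """Return the prime power factors p**r of f by trial division."""
    factors = []
    p = 2
    while p * p <= f:
        if f % p == 0:
            q = 1
            while f % p == 0:
                f //= p
                q *= p
            factors.append(q)
        p += 1
    if f > 1:
        factors.append(f)
    return factors


def order_mod(a, m):
    """Multiplicative order of a modulo m, or None if a is not a unit."""
    if gcd(a, m) != 1:
        return None
    order, x = 1, a % m
    while x != 1:
        x = x * a % m
        order += 1
    return order


def find_alpha(f, n):
    """Smallest alpha with order n modulo every prime power factor of f."""
    if gcd(n, f) != 1:                   # N^{-1} must exist in the ring
        return None
    factors = prime_power_factors(f)
    for a in range(1, f):
        if all(order_mod(a, q) == n for q in factors):
            return a
    return None
```

Under these assumptions, `find_alpha(17, 4)` returns an element of order 4 modulo 17, while `find_alpha(15, 4)` returns `None` because no element has order 4 modulo 3.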
AGARWAL AND BURRUS: FAST CONVOLUTION APPLICATIONS TO DIGITAL FILTERING
$$F_t = 2^{b} + 1, \quad b = 2^{t} \tag{22}$$

$$\alpha^{N} = 1 \pmod{F_t}$$

Note that $\alpha^{-n} = \alpha^{N-n}$ and $2^{-n} = -2^{b-n} \pmod{F_t}$.
As discussed in Sections II and III, we can perform cyclic convolution of two integer sequences of length $N$, using the FNT, if certain conditions are satisfied. Now we discuss the possible values of $N$ for which an FNT exists, given a choice of $F_t$. As implied by Theorem 1 of Section III, for transforms (mod $F_t$) of length $N$ to exist, $N$ should divide $O(F_t)$.

Fermat numbers up to $F_4$ are all primes; therefore, for these cases $O(F_t) = 2^{b}$, and we can have an FNT for any length $N = 2^{m}$, $m \le b$. For these Fermat primes the integer 3 is an $\alpha$ of order $N = 2^{b}$, giving the largest possible transform length. Of course, there are other integers of order $2^{b}$ as well ($2^{b-1}$ in all, counting 3). They can be obtained by taking odd powers of 3. If an integer $\alpha$ is of order $N$, then $\alpha^{p}$ (mod $F_t$) will be of order $N/p$, if $N/p$ is an integer. Therefore, using Fermat primes, an integer $\alpha$ of order $2^{m}$, $m \le b$, is given by $3^{2^{b-m}}$ (mod $F_t$). The integer 2 is of order $2b = 2^{t+1}$. If $\alpha$ is taken as 2 or a power of 2, all the powers of $\alpha$ would be some powers of 2, and, for these cases, as discussed in the beginning of this section, the FNT can be computed very efficiently and is called the RT.
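The statements about orders modulo Fermat primes can be verified directly. The following is a minimal sketch; the helper `two_power_order` is hypothetical and assumes the order it is asked for is a power of two, which holds for every unit modulo a Fermat prime.

```python
def fermat(t):
    """Fermat number F_t = 2**b + 1 with b = 2**t."""
    return 2 ** (2 ** t) + 1


def two_power_order(a, modulus):
    """Order of a modulo `modulus`, assuming that order is a power of two."""
    order, x = 1, a % modulus
    while x != 1:
        x = x * x % modulus              # each squaring halves the remaining order
        order *= 2
        if order > modulus:
            raise ValueError("order is not a power of two")
    return order


def alpha_of_order(m, t):
    """Integer of order 2**m modulo the Fermat prime F_t, as 3**(2**(b-m))."""
    b = 2 ** t
    return pow(3, 2 ** (b - m), fermat(t))
```

For instance, with `t = 2` (so `b = 4` and `F_t = 17`), `two_power_order(3, 17)` and `two_power_order(2, 17)` give the orders of 3 and 2 discussed above.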
There are no other known Fermat primes. For digital filtering applications, $F_5$ ($b = 32$) and $F_6$ ($b = 64$) seem to be most practical. Lucas [13] has proven that every prime factor of composite $F_t$ is of the form $K \cdot 2^{t+2} + 1$. Therefore, $2^{t+2}$ divides $O(F_t)$ for $t > 4$. In particular, it can be verified that for $F_5$ and $F_6$, $O(F_t) = 2^{t+2}$. Therefore, for these choices of Fermat numbers, the maximum possible transform length is $2^{t+2} = 4b$. Also, we assert that $\alpha = \sqrt{2}$ given by (23) is of order $2^{t+2}$ mod $F_t$, $t \ge 2$.
$$\alpha = \sqrt{2} = 2^{2^{t-2}} \left( 2^{2^{t-1}} - 1 \right) \bmod F_t, \qquad \alpha^{2} = 2 \bmod F_t. \tag{23}$$

The proof that $\alpha$ given by (23) is of order $2^{t+2}$ with respect to any factor of $F_t$ is given in Appendix C. Any odd power of $\sqrt{2}$ will also be of order $2^{t+2}$. By raising $\sqrt{2}$ to the $2^{t+2-m}$th power, we obtain an integer $\alpha$ of order $2^{m}$, $m \le t + 2$.
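The closed form for $\sqrt{2}$ modulo $F_t$ can be checked for the practical word lengths. This is a small sketch under the assumption that the expression in (23) is $2^{b/4}(2^{b/2} - 1)$ with $b = 2^{t}$; the function names are illustrative.

```python
def sqrt2_mod_fermat(t):
    """Candidate square root of 2 modulo F_t = 2**b + 1, b = 2**t, t >= 2."""
    b = 2 ** t
    f = 2 ** b + 1
    return (2 ** (b // 4) * (2 ** (b // 2) - 1)) % f


def two_power_order(a, modulus):
    """Order of a modulo `modulus`, assuming that order is a power of two."""
    order, x = 1, a % modulus
    while x != 1:
        x = x * x % modulus
        order *= 2
        if order > modulus:
            raise ValueError("order is not a power of two")
    return order


def alpha_of_order(m, t):
    """Raise sqrt(2) to the 2**(t+2-m)th power to get an element of order 2**m."""
    f = 2 ** (2 ** t) + 1
    return pow(sqrt2_mod_fermat(t), 2 ** (t + 2 - m), f)
```

With `t = 2` the candidate is 6, and `6 * 6 % 17` equals 2, so 6 plays the role of $\sqrt{2}$ modulo 17.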
Table I gives values of $N$ for the two most important values of $\alpha$ and also gives the maximum possible $N$ for the most practical values of $b$. The Fermat number transform has also been defined by Rader [10], using $\alpha = 2$, but his development does not indicate the possibility of using [text missing in the source].

Using the results of Section III, we have found the maximum possible length $N$ for which an FNT exists, for the most practical values of $b$, and have found the corresponding integer $\alpha$. Using $\alpha = \sqrt{2}$, the maximum possible lengths for these transforms have increased by a factor of 2, which makes them more useful. In the next section we discuss a two-dimensional convolution scheme [...]
TABLE I

t    b     F_t             N for α = 2    N for α = √2    N_max    α for N_max
3    8     2^8 + 1         16             32              256      3
4    16    2^16 + 1        32             64              65536    3
5    32    2^32 + 1        64             128             128      √2
6    64    2^64 + 1        128            256             256      √2
Example

To make the ideas of this section more clear, we now present an example. This example will illustrate several points: treatment of negative values in the data, the structure of the transform and the inverse transform matrix, negative powers of $\alpha$, frequent "overflow" during computation, meaninglessness of the transform values, and exactness of the final answer. This example will not demonstrate the efficient implementation of the RT using the binary arithmetic. That will be illustrated in Section VI on implementation of the RT.

Consider two sequences $x = (2, -2, 1, 0)$ and $h = (1, 2, 0, 0)$, whose convolution is desired. From the overflow consideration, it is sufficient if we work modulo $F_2 = 17$. We want $N = 4$; for $F_2$ the integer 2 is of order 8, therefore $2^{2} = 4$ is an $\alpha$ of order 4. The transformation matrix $T$ is given by
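$$T = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 4 & 4^{2} & 4^{3} \\ 1 & 4^{2} & 4^{4} & 4^{6} \\ 1 & 4^{3} & 4^{6} & 4^{9} \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 4 & -1 & -4 \\ 1 & -1 & 1 & -1 \\ 1 & -4 & -1 & 4 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 4 & 16 & 13 \\ 1 & 16 & 1 & 16 \\ 1 & 13 & 16 & 4 \end{bmatrix} \pmod{17}$$

$$T^{-1} = 4^{-1} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 4^{-1} & 4^{-2} & 4^{-3} \\ 1 & 4^{-2} & 4^{-4} & 4^{-6} \\ 1 & 4^{-3} & 4^{-6} & 4^{-9} \end{bmatrix} = -4 \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -4 & -1 & 4 \\ 1 & -1 & 1 & -1 \\ 1 & 4 & -1 & -4 \end{bmatrix} = 13 \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 13 & 16 & 4 \\ 1 & 16 & 1 & 16 \\ 1 & 4 & 16 & 13 \end{bmatrix} \pmod{17}.$$

The negative value $-2$ in $x$ is represented as $-2 + 17 = 15$, so that $x = (2, 15, 1, 0)$. Then

$$X = Tx = (1, 10, 5, 9), \quad H = Th = (3, 9, 16, 10)$$

and

$$Y = X \otimes H = (3, 90, 80, 90) = (3, 5, 12, 5) \pmod{17}$$

$$y = T^{-1} Y = (2, 2, 14, 2) = (2, 2, -3, 2) \pmod{17}.$$

TABLE II
MAXIMUM ONE-DIMENSIONAL CYCLIC CONVOLUTION LENGTHS USING TWO-DIMENSIONAL FNT OR RT

Word length b    N for α = 2    N for α = √2
16               512            2048
32               2048           8192
64               8192           32768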
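The length-4 example modulo 17 can be reproduced with a direct (not fast) evaluation of the transform. The following is a minimal sketch with hypothetical function names; it maps residues larger than half the modulus back to negative integers, which is an assumption about how signed results are to be read.

```python
def fnt(seq, alpha, modulus):
    """Direct transform: X[k] = sum_m seq[m] * alpha**(k*m) mod modulus."""
    n = len(seq)
    return [sum(seq[m] * pow(alpha, k * m, modulus) for m in range(n)) % modulus
            for k in range(n)]


def inverse_fnt(seq, alpha, modulus):
    """Direct inverse: x[m] = N^{-1} * sum_k seq[k] * alpha**(-k*m) mod modulus."""
    n = len(seq)
    n_inv = pow(n, -1, modulus)          # Python 3.8+
    alpha_inv = pow(alpha, -1, modulus)
    return [n_inv * sum(seq[k] * pow(alpha_inv, k * m, modulus) for k in range(n))
            % modulus for m in range(n)]


def cyclic_convolve(x, h, alpha, modulus):
    """Cyclic convolution via transform, pointwise product, inverse transform."""
    big_x = fnt(x, alpha, modulus)
    big_h = fnt(h, alpha, modulus)
    big_y = [a * c % modulus for a, c in zip(big_x, big_h)]
    y = inverse_fnt(big_y, alpha, modulus)
    # Interpret residues above modulus // 2 as negative values.
    return [v - modulus if v > modulus // 2 else v for v in y]


print(cyclic_convolve([2, -2, 1, 0], [1, 2, 0, 0], alpha=4, modulus=17))
```

The intermediate transform values carry no obvious meaning, yet the final result is exact as long as the true output values fit in the range represented by the modulus.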
A. Negation

Let

$$A = \sum_{i=0}^{b-1} a_i 2^{i}, \quad a_i = 0 \text{ or } 1.$$

Then, with $\bar{a}_i$ denoting the complement of $a_i$,

$$-A = -\sum_{i=0}^{b-1} a_i 2^{i} = \sum_{i=0}^{b-1} \bar{a}_i 2^{i} - (2^{b} - 1) = \sum_{i=0}^{b-1} \bar{a}_i 2^{i} - (2^{b} - 1) + 2^{b} + 1 \bmod F_t = \sum_{i=0}^{b-1} \bar{a}_i 2^{i} + 2 \bmod F_t.$$

Thus to negate a number, we have to complement each bit and add 2 to the result. For example,

(mod 17): 4 = 0100; -4 = 1011 + 10 = 1101 = 13; 13 + 4 = 17.

B. Addition

When we add two $b$-bit integers, we obtain a $b$-bit integer and possibly a carry bit. The carry bit represents $2^{b} = -1 \bmod F_t$. To implement arithmetic modulo $2^{b} + 1$, we simply subtract the carry bit. Thus the hardware should be of the carry subtract type. For example,

(mod 17): 10 + 9 = 19 = 2 (mod 17)

      1010
    + 1001
    ------
    1 0011
    -    1
    ------
      0010

C. Subtraction

Subtraction is implemented as an addition by first negating the subtrahend and then adding them. Addition must be done according to B.

D. Multiplication

Let

$$A \times B = C_L + C_H 2^{b} = C_L - C_H \pmod{F_t}.$$

When two $b$-bit integers are multiplied, we get a $2b$-bit product; $C_L$ is its low-order register and $C_H$ its high-order register. For example, (mod 17), for the product 0111 0101:

    C_H = 0111, C_L = 0101
    -C_H = 1010
    C_L + (-C_H) = 0101 + 1010 = 1111 = 15.

E. Multiplication by a Power of 2

If $\alpha$ is taken as 2 or a power of 2, RT's are obtained and the only multiplications involved in computing them are those by some powers of 2. These multiplications are particularly simple to implement in arithmetic modulo $F_t$. Suppose we need to multiply the contents of a register by $2^{k}$, $0 < k < b$; all we need to do is left-shift the contents of the register by $k$ bits and subtract the $k$ overflow bits. A convenient way to do this is to append the data register on the left with a register of zeros, left-shift the double register by $k$ positions, dropping the leading zeros, and then subtract the higher order register from the lower order register, as in D. If $k$ is outside the range $0 < k < b$, we make use of the fact that $2^{b} = -1 \bmod F_t$. Computation of the inverse transform requires multiplications by negative powers of 2, which can be converted to positive powers by the following relationship: $2^{-k} = -2^{b-k} \bmod F_t$. An alternate method to multiply by $2^{-k}$ is to load the data in the higher order register of a double register, filling the lower order register with zeros, then right-shift the double register by $k$ positions and subtract the low-order register from the high-order register mod $F_t$. For fast implementation of the bit shift operation, the shift amount $k$ should be expressed in binary form, and at a clock pulse shifting should be done by a power of 2. For example,

(mod 17): 11 × 2^3 = 88 = 3 mod 17

    11 = 0000 1011
    Shift left 3 positions: 0101 1000 (C_H = 0101, C_L = 1000)
    C_L + (-C_H):
      1000
    + 1100
    ------
    1 0100
    -    1
    ------
      0011 = 3

(mod 17): 11 × 2^-3 = 11 × (-2^(4-3)) = -22 = 12 mod 17

    11 = 1011 0000
    Shift right 3 positions: 0001 0110 (C_H = 0001, C_L = 0110)
    C_H + (-C_L) = 0001 + 1011 = 1100 = 12.
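The register operations described in A through E can be imitated with integer shifts and masks. The sketch below assumes a word length of b = 4 (arithmetic modulo 17) and uses hypothetical function names. It returns values in the range 0 to 2^b, so the one value that does not fit in b bits (2^b, i.e. -1) is still reported rather than lost; inputs are assumed to be b-bit integers.

```python
B = 4                       # word length b, so the modulus is F_2 = 17
MASK = (1 << B) - 1         # selects the low-order b bits
F = (1 << B) + 1


def add(a, c):
    """Carry-subtract addition: the carry stands for 2**b = -1 (mod F)."""
    total = a + c
    low, carry = total & MASK, total >> B
    return (low - carry) % F


def negate(a):
    """Complement each of the b bits and add 2."""
    return add(~a & MASK, 2)


def mul_pow2(a, k):
    """Multiply by 2**k, 0 < k < b: shift left, then low register minus high."""
    shifted = a << k
    high, low = shifted >> B, shifted & MASK
    return add(low, negate(high))


def mul_neg_pow2(a, k):
    """Multiply by 2**(-k): load a in the high register, shift right by k,
    then high register minus low register."""
    shifted = (a << B) >> k
    high, low = shifted >> B, shifted & MASK
    return add(high, negate(low))
```

As a usage example, `mul_pow2(11, 3)` and `mul_neg_pow2(11, 3)` correspond to the two shift examples worked above.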
To compute a length $N$ fast RT, $N \log_2 N$ additions/subtractions and $(N/2) \log_2 (N/2)$ multiplications by some powers of 2 are required, which are implemented as bit shifts and subtractions. To compute the convolution using the FFT, most of the time is taken in computing the complex multiplications required to compute the transforms. A comparison with the RT reveals that these complex multiplications are replaced by bit shifts and subtractions, which are much faster operations. This results in considerable computational savings in the implementation of convolution. This fact has been verified on a general purpose machine (IBM 370/155). The computation required to multiply the two transforms is about the same for both implementations. To convolve long sequences using the two-dimensional FNT or RT, the computational effort increases by, at the most, a factor of 2. Still, the RT implementation of convolution is much faster than the FFT implementation.

RT's have some additional advantages over the FFT. First, the FFT implementation requires storing all the powers of $W$, requiring a significant amount of storage, which may be an important factor for a small minicomputer or a special purpose hardware implementation. Second, fixed-point FFT implementation introduces a significant amount of roundoff noise at the output (the number of bits depends on the data; the figure and the reference number are illegible in the source). This degrades the signal-to-noise ratio during the filtering operations. The FNT or RT implementation is error free; the only source of error is input A/D quantization.
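The operation count above corresponds to a radix-2 algorithm in which every twiddle multiplication is a shift. The following is a sketch of one such algorithm (a recursive decimation-in-time form chosen for brevity, not necessarily the authors' implementation). It assumes $\alpha = 2^{\text{shift}}$ has order equal to the sequence length modulo $2^{b} + 1$, and it lets Python's `%` operator stand in for the shift-and-subtract reduction a real implementation would use.

```python
def fast_rt(seq, shift, b):
    """Radix-2 transform of seq modulo 2**b + 1 with alpha = 2**shift.

    len(seq) must be a power of two equal to the order of alpha.
    """
    modulus = (1 << b) + 1
    n = len(seq)
    if n == 1:
        return [seq[0] % modulus]
    even = fast_rt(seq[0::2], 2 * shift, b)    # sub-transforms use alpha**2
    odd = fast_rt(seq[1::2], 2 * shift, b)
    half = n // 2
    out = [0] * n
    for k in range(half):
        t = (odd[k] << (shift * k)) % modulus  # multiply by alpha**k via a shift
        out[k] = (even[k] + t) % modulus
        out[k + half] = (even[k] - t) % modulus  # uses alpha**(n/2) = -1
    return out


def rt_cyclic_convolve(x, h, shift, b):
    """Cyclic convolution modulo 2**b + 1 using the fast transform."""
    modulus = (1 << b) + 1
    n = len(x)
    big_y = [a * c % modulus
             for a, c in zip(fast_rt(x, shift, b), fast_rt(h, shift, b))]
    # alpha**(-1) = 2**(2b - shift), because 2**(2b) = 1 (mod 2**b + 1).
    y = fast_rt(big_y, 2 * b - shift, b)
    n_inv = pow(n, -1, modulus)
    return [n_inv * v % modulus for v in y]
```

For example, with `b = 8` and `shift = 1` the modulus is 257 and $\alpha = 2$ has order 16, so sequences of length 16 can be convolved.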
VIII. IMPLEMENTATION ON THE IBM 370/155

The word length of the IBM 360-370 series is 32 bits and, therefore, is well suited for the implementation of convolution using the FNT or RT modulo $F_5 = 2^{32} + 1$. It has two's complement representation of negative integers, i.e., $-x$ is represented as $2^{32} - x$. We want negative integers to be represented as the complement modulo $2^{32} + 1$, i.e., $-x$ is to be represented by $2^{32} + 1 - x$, and to accomplish this 1 is added whenever a negative integer is encountered in the data. As noted before, on this machine, we cannot represent $-1$. If a $-1$ is encountered in the data, it is rounded either to 0 or to $-2$. This is equivalent to introducing some initial quantization noise. If the data are uncorrelated, the probability that $-1$ will appear after an arithmetical operation during computation of the transform is roughly $2^{-32}$ per operation. This error introduced during computation is rather serious, and probably the corresponding output block will be meaningless. Assuming $N = 64$, the probability that a particular block is erroneous is roughly [value missing in the source]. For most filtering operations, this may be permissible.

Logical add (ALR) and subtract (SLR) instructions are used to add and subtract two integers modulo $F_5$. If a carry is detected after add, 1 is subtracted from the result, and if no carry is detected after subtract, 1 is added to the result. Multiplication by $2^{k}$ is done using the logical left shift operation (SLDL) and multiplication by $2^{-k}$ is [...]
TABLE III
CYCLIC CONVOLUTION TIMINGS FOR LENGTH N REAL SEQUENCES
[Table body not recoverable from the source; the surviving column labels are "FNT or RT", "ms", and α.]

[...] seem to be the best choice for implementation on digital [...]
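The carry rules given in Section VIII for adding and subtracting 32-bit words modulo $F_5$ can be emulated as follows. This is a sketch, not the authors' assembly code: Python integers stand in for the machine registers, the function names are hypothetical, and the unrepresentable result $2^{32}$ (that is, $-1$) is reported with an exception, which is a choice made here for illustration.

```python
WORD = 1 << 32              # one more than the largest 32-bit word
F5 = WORD + 1               # Fermat number F_5 = 2**32 + 1


def add_mod_f5(a, c):
    """Add two 32-bit words; if a carry occurs, subtract 1 from the result."""
    total = a + c
    if total >= WORD:                    # carry out of the 32-bit word
        word = total - WORD - 1          # 2**32 = -1 (mod F5)
        if word < 0:
            raise OverflowError("result is 2**32 (-1), not representable")
        return word
    return total


def sub_mod_f5(a, c):
    """Subtract two 32-bit words; on a borrow (no carry on the machine), add 1."""
    diff = a - c
    if diff < 0:                         # the machine word would be diff + 2**32
        word = diff + WORD + 1
        if word == WORD:
            raise OverflowError("result is 2**32 (-1), not representable")
        return word
    return diff
```

For example, `sub_mod_f5(5, 10)` yields the representation of $-5$ modulo $F_5$, namely $2^{32} + 1 - 5$.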
IX. GENERALIZATIONS AND OTHER APPLICATIONS
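APPENDIX A

A. Definition

Transform:

$$F(k) = T\{f(n)\} = \sum_{n=0}^{N-1} f(n) \alpha^{nk}, \quad k = 0, 1, \ldots, N-1, \quad \alpha^{N} = 1. \tag{A1}$$

Inverse Transform:

$$f(n) = N^{-1} \sum_{k=0}^{N-1} F(k) \alpha^{-nk}, \quad n = 0, 1, \ldots, N-1. \tag{A2}$$

B. Periodicity

$$f(n + N) = f(n). \tag{A3}$$

$F(k)$ can also be periodically extended, similar to the extension of $f(n)$,

$$F(k + N) = F(k). \tag{A4}$$

C. Orthogonality

The inner product of two basis functions is given by [see (12) and (13)]

$$\sum_{k=0}^{N-1} \alpha^{nk} \alpha^{-mk} = \begin{cases} N & \text{if } n \equiv m \bmod N \\ 0 & \text{otherwise.} \end{cases} \tag{A5}$$

E. Symmetry Property

If $f(n) = f(-n) = f(N - n)$, then $F(k) = F(-k) = F(N - k)$.

If $f(n) = -f(-n) = -f(N - n)$, then $F(k) = -F(-k) = -F(N - k)$.

G. Shift Theorem

If $T\{f(n)\} = F(k)$, then

$$T\{f(n + m)\} = F(k) \alpha^{-mk}.$$

H. Fast Computational Algorithm

If $N$ can be factored as

$$N = r_1 r_2 \cdots r_m \tag{A9}$$

[the rest of this statement is illegible in the source].

I. Convolution Property

The transform of the cyclic convolution of two sequences is the product of their transforms, i.e., if $X(k) = T\{x(n)\}$ and $H(k) = T\{h(n)\}$, then

$$T\left\{ \sum_{m=0}^{N-1} x(m) h(n - m) \right\} = X(k) H(k). \tag{A10}$$

J. Parseval's Theorem

$$\sum_{n=0}^{N-1} x(n) h(n) = N^{-1} \sum_{k=0}^{N-1} X(k) H(-k) \tag{A11}$$

and

$$\sum_{n=0}^{N-1} x(n) h(-n) = N^{-1} \sum_{k=0}^{N-1} X(k) H(k). \tag{A12}$$

For the particular case of $x(n) = h(n)$,

$$\sum_{n=0}^{N-1} x^{2}(n) = N^{-1} \sum_{k=0}^{N-1} X(k) X(-k) \tag{A13}$$

and

$$\sum_{n=0}^{N-1} x(n) x(-n) = N^{-1} \sum_{k=0}^{N-1} X(k)^{2}. \tag{A14}$$

Note that in this case the conventional Parseval's theorem does not apply, since in a ring a magnitude is undefined.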
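The shift and Parseval-type relations of this appendix hold exactly in modular arithmetic and can be spot-checked. The following is a minimal sketch, assuming the ring of integers modulo 17 with $\alpha = 4$ and $N = 4$; the function names are illustrative, and negative indices are taken modulo $N$.

```python
def transform(f, alpha, modulus):
    """F(k) = sum_n f(n) * alpha**(n*k) mod modulus."""
    n = len(f)
    return [sum(f[i] * pow(alpha, i * k, modulus) for i in range(n)) % modulus
            for k in range(n)]


def shift_theorem_holds(f, m, alpha, modulus):
    """Check T{f(n + m)} == F(k) * alpha**(-m*k) for every k."""
    n = len(f)
    shifted = [f[(i + m) % n] for i in range(n)]
    alpha_inv = pow(alpha, -1, modulus)          # Python 3.8+
    big_f = transform(f, alpha, modulus)
    expected = [big_f[k] * pow(alpha_inv, m * k, modulus) % modulus
                for k in range(n)]
    return transform(shifted, alpha, modulus) == expected


def parseval_sides(x, h, alpha, modulus):
    """Return both sides of sum x(n)h(n) = N^{-1} sum X(k)H(-k) mod modulus."""
    n = len(x)
    big_x = transform(x, alpha, modulus)
    big_h = transform(h, alpha, modulus)
    lhs = sum(a * c for a, c in zip(x, h)) % modulus
    rhs = pow(n, -1, modulus) * sum(big_x[k] * big_h[(-k) % n]
                                    for k in range(n)) % modulus
    return lhs, rhs
```

A call such as `parseval_sides([2, 15, 1, 0], [1, 2, 0, 0], 4, 17)` returns two residues that should agree.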
K. Stretch Property

The stretch property exists if the transform for the longer length sequence exists in that ring.

L. Sampling Property

The sampling property exists.
APPENDIX B

PROOF OF THEOREM 1

As shown before, the existence of such a transform depends on the existence of an $\alpha$ of order $N$ with respect to each prime power factor modulus, $p_i^{r_i}$. According to Euler's Theorem [14], $N$, the order of $\alpha$, must divide $\varphi(p_i^{r_i})$, where $\varphi(p_i^{r_i})$ is Euler's $\varphi$ function given by

$$\varphi(p_i^{r_i}) = p_i^{r_i - 1}(p_i - 1). \tag{B1}$$

[The connecting text is illegible in the source.]

$$N \mid O(F). \tag{B2}$$

$$\alpha \equiv \alpha_i \pmod{p_i^{r_i}}, \quad i = 1, 2, \ldots, l. \tag{B3}$$

This is our desired $\alpha$ of order $N$. Since $N$ and $p_i$ are relatively prime, $N^{-1}$ exists in this ring. This completes the proof of the theorem.

APPENDIX C

$$\alpha^{2^{t+2}} = 2^{2^{t+1}} = 1 \bmod F_t = 1 \bmod p_i^{r_i}.$$

$$N_i \mid 2^{t+2}. \tag{C2}$$

$$\alpha^{2^{t+1}} = 2^{2^{t}} = -1 \bmod F_t = -1 \bmod p_i^{r_i}. \tag{C3}$$