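IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. ASSP-25, NO. 5, OCTOBER 1977
AGARWAL AND COOLEY: ALGORITHMS FOR DIGITAL CONVOLUTION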
I. INTRODUCTION AND BACKGROUND
THE calculation of the finite digital convolution

$$y_i = \sum_{k=0}^{N-1} h_{i-k} x_k \tag{1.1}$$

$$X_n = \sum_{k=0}^{N-1} x_k W^{nk}, \qquad n = 0, 1, \ldots, N-1 \tag{1.2}$$
The computationally expensive convolution operation in (1.1) corresponds to the N complex multiplications in (1.3). The DFT is, therefore, said to have the cyclic convolution property (CCP). Since the FFT algorithm enables one to calculate the DFT in O(N log N) operations, the entire convolution requires O(N log N) operations.
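As a small numerical illustration of the cyclic convolution property, the following sketch compares direct evaluation of (1.1) with the transform route. It is an illustration only: the function names are hypothetical, a naive O(N^2) DFT stands in for an FFT, and the sign convention of the exponent is an assumption that does not affect the result.

```python
import cmath

def cyclic_direct(h, x):
    """Direct evaluation of (1.1); indices of h are taken modulo N."""
    n = len(x)
    return [sum(h[(i - k) % n] * x[k] for k in range(n)) for i in range(n)]

def dft(seq, inverse=False):
    """Naive O(N^2) DFT, used here only to keep the sketch dependency-free."""
    n = len(seq)
    sign = 1 if inverse else -1
    out = []
    for m in range(n):
        acc = sum(v * cmath.exp(sign * 2j * cmath.pi * m * k / n)
                  for k, v in enumerate(seq))
        out.append(acc / n if inverse else acc)
    return out

def cyclic_via_dft(h, x):
    """Transform, multiply term by term (N complex products), transform back."""
    products = [a * b for a, b in zip(dft(h), dft(x))]
    return dft(products, inverse=True)

h = [1, 2, 3, 4]
x = [5, 6, 7, 8]
exact = cyclic_direct(h, x)        # integer arithmetic, no rounding
approx = cyclic_via_dft(h, x)      # complex floats carrying small rounding errors
recovered = [round(v.real) for v in approx]
```

Here `exact` stays in integers, while `approx` holds complex floating-point values that must be rounded to recover the integer result.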
A seemingly paradoxical situation arises here when one considers that all numbers in (1.1) may be integers, making exact calculation of the convolution possible. However, the computationally efficient DFT method involves intermediate quantities, i.e., sines and cosines, which are irrational numbers, thereby making exact results impossible on a digital machine. This, as shown by Agarwal and Burrus [2], is a consequence of the fact that, in order to have the CCP, a transformation must have the form
$$w_i = \sum_{k=\max(0,\,i-N+1)}^{\min(N-1,\,i)} h_{i-k} x_k$$

$$X(z) = \sum_{i=0}^{N-1} x_i z^i. \tag{2.2}$$

$$m_j = W(a_j) = H(a_j) X(a_j), \qquad j = 0, 1, \ldots, 2N-2 \tag{2.4}$$

of linear combinations of the h_i's and x_i's. The Lagrange interpolation formula may be used to uniquely determine the 2N - 2 degree polynomial

$$\begin{aligned}
y_0 &= w_0 + w_N \\
y_1 &= w_1 + w_{N+1} \\
&\;\;\vdots \\
y_{N-2} &= w_{N-2} + w_{2N-2} \\
y_{N-1} &= w_{N-1}
\end{aligned} \tag{2.10}$$

which leads to

$$y = Cm \tag{2.11}$$
where C is an N by 2N - 1 matrix obtained from C* by performing the row operations on C* corresponding to (2.10).
Here, and in what follows, we seek algorithms of the general form (2.6) and (2.8) or (2.11), except that we will not require that x be multiplied by the same matrix as h and consider, instead, algorithms of a more general form,

$$m = (Ah) \times (Bx). \tag{2.12}$$
$$w_0 + w_1 z + w_2 z^2 = (h_0 + h_1 z)(x_0 + x_1 z). \tag{2.14}$$

$$\begin{aligned}
m_0 &= (h_0 - h_1)(x_0 - x_1) \\
m_1 &= h_0 x_0 \\
m_2 &= (h_0 + h_1)(x_0 + x_1)
\end{aligned} \tag{2.21}$$

and

$$W(z) = m_0 \frac{(z-1)z}{(-2)(-1)} + m_1 \frac{(z+1)(z-1)}{1 \cdot (-1)} + m_2 \frac{z(z+1)}{1 \cdot 2} \tag{2.22}$$
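The three-product scheme for the length-2 noncyclic convolution can be sketched as follows. This is a minimal illustration, assuming the evaluation points z = -1, 0, 1 implied by the products in (2.21); the function name is hypothetical, and the interpolation is written out as explicit coefficients of W(z) rather than as a general Lagrange routine.

```python
from fractions import Fraction

def toom_cook_2x2(h, x):
    """Noncyclic convolution of two length-2 sequences with 3 multiplications."""
    h0, h1 = h
    x0, x1 = x
    m0 = (h0 - h1) * (x0 - x1)   # W(-1)
    m1 = h0 * x0                 # W(0)
    m2 = (h0 + h1) * (x0 + x1)   # W(1)
    # Coefficients of the degree-2 polynomial through the three samples.
    w0 = m1
    w1 = Fraction(m2 - m0, 2)
    w2 = Fraction(m0 + m2, 2) - m1
    return [w0, w1, w2]

# Compare with the 4-multiplication definition w0 = h0*x0, w1 = h0*x1 + h1*x0, w2 = h1*x1.
result = toom_cook_2x2((3, 5), (7, 11))
```

The divisions by 2 act only on sums of products, so when h is fixed they can be folded into precomputed constants, which is the role of the matrix A in (2.12).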
(2.23)
(2.25)
and

$$Y(z) = W(z) \bmod (z^N - 1). \tag{2.26}$$

The polynomial z^N - 1 is factored into a product of irreducible polynomials with integer coefficients

$$z^N - 1 = P_{d_1}(z) P_{d_2}(z) \cdots P_{d_K}(z). \tag{2.27}$$

$$Y(z) = \sum_{j} Y_j(z) S_j(z) \tag{2.30}$$

where

$$S_j(z) \equiv 1 \bmod P_{d_j}(z) \tag{2.31}$$

and

$$S_j(z) \equiv 0 \bmod P_{d_k}(z), \qquad k \neq j. \tag{2.32}$$

$$T_j(z) = (z^N - 1)/P_{d_j}(z) \tag{2.33}$$

$$Q_j(z) = [T_j(z)]^{-1} \bmod P_{d_j}(z) \tag{2.34}$$

$$\cdots \equiv 1 \bmod P_{d_j}(z). \tag{2.36}$$

$$\sum_{j=1}^{K} (2 n_j - 1) = 2N - K. \tag{2.40}$$
than optimal algorithms for the small convolutions (2.42). In many cases, the algorithms developed by Agarwal and Burrus [1] did this but it was not known, when they were written, how close they were to being optimal.

Evidently, the manipulations to be carried out in deriving the A, B, and C operators are quite tedious and fraught with opportunities for errors. Therefore, SCRATCHPAD [8] was of enormous help in deriving and checking error-free expressions for a sequence of calculations of intermediate quantities leading to expressions for the final results. The authors of SCRATCHPAD added a few commands to the language which made the entire procedure quite simple. At first, SCRATCHPAD was used interactively to develop concepts and expressions which helped to minimize the number of additions and to yield formulas convenient for programming. Then, the resulting set of commands was run in a batch mode to develop alternate formulas for each N and to go up to higher N. In using SCRATCHPAD for the above calculations, all one had to do was to define the various polynomials recursively and request the printing of various formulas at appropriate points. The program then printed out expressions for
1) the x'_p's in terms of the x_j's (formulas for the h'_q's are the same),
2) the y'_i's in terms of the products of the h'_p's and the x'_p's, and
3) the y_i's in terms of the y'_i's.
Other quantities such as the factors of z^N - 1 were also given, but not really needed to describe the final algorithms.

The numbers of operations for some of the convolution formulas derived by the above methods are given in Table I, where K is the number of divisors of N, 2N - K is the minimum number of multiplications required for an N-point convolution, and M and A are the number of multiplications and additions, respectively, required for the algorithms given in Appendix A.

TABLE I
THEORETICAL MINIMUM NUMBER OF MULTIPLICATIONS FOR CONVOLUTION AND NUMBER OF MULTIPLICATIONS AND ADDITIONS FOR ALGORITHMS OF APPENDIX A

  N     K    2N-K    M     A
  2     2     2      2     4
  3     2     4      4    11
  4     3     5      5    15
  5     2     8     10    35
  6     4     8      8    44
  7     2    12     19    72
  8     4    12     14    46
  9     3    15     22    98
 10     4    16
 11     2    20
 12     6    18

C. An Example with N = 4
The derivation of an optimal algorithm for a cyclic N = 4 convolution will be given here in detail, according to the methods in Section II-B. The convolution is defined by

$$y_i = \sum_{k=0}^{3} h_{i-k} x_k, \qquad i = 0, 1, 2, 3. \tag{2.43}$$

In terms of polynomials whose coefficients are the sequences involved, this corresponds to

$$Y(z) = H(z) X(z) \bmod (z^4 - 1). \tag{2.44}$$

$$\begin{aligned}
P_1(z) &= z - 1 \\
P_2(z) &= z + 1 \\
P_4(z) &= z^2 + 1.
\end{aligned} \tag{2.45}$$

$$\begin{aligned}
T_1(z) &= (z + 1)(z^2 + 1) \\
T_2(z) &= (z - 1)(z^2 + 1) \\
T_3(z) &= z^2 - 1
\end{aligned} \tag{2.46}$$

and

$$\begin{aligned}
Q_1(z) &= [T_1(z)]^{-1} \bmod (z - 1) = \tfrac{1}{4} \\
Q_2(z) &= [T_2(z)]^{-1} \bmod (z + 1) = -\tfrac{1}{4} \\
Q_3(z) &= [T_3(z)]^{-1} \bmod (z^2 + 1) = -\tfrac{1}{2}
\end{aligned} \tag{2.47}$$

giving

$$\begin{aligned}
S_1(z) &= (z^3 + z^2 + z + 1)/4 \\
S_2(z) &= -(z^3 - z^2 + z - 1)/4 \\
S_3(z) &= -(z^2 - 1)/2.
\end{aligned} \tag{2.48}$$

(2.49)
are

$$\begin{aligned}
H_1(z) &= h'_0 = h_0 + h_1 + h_2 + h_3 \\
H_2(z) &= h'_1 = h_0 - h_1 + h_2 - h_3 \\
H_3(z) &= h'_2 + h'_3 z = (h_0 - h_2) + (h_1 - h_3) z.
\end{aligned} \tag{2.50}$$

(2.51)
are exactly the same form as those for H_j(z). The relation

$$Y_j(z) = H_j(z) X_j(z) \bmod P_{d_j}(z) \tag{2.52}$$

is, in terms of the coefficients of H_j(z) and X_j(z),

$$\begin{aligned}
y'_0 &= h'_0 x'_0 \\
y'_1 &= h'_1 x'_1 \\
y'_2 &= h'_2 x'_2 - h'_3 x'_3 \\
y'_3 &= h'_2 x'_3 + h'_3 x'_2.
\end{aligned} \tag{2.53}$$

The calculation of Y_3(z) is exactly like complex multiplication and is carried out as though z = \sqrt{-1}. Therefore, as shown in Section II-A, the Cook-Toom algorithm can be used to compute y'_2 and y'_3 in 3 instead of 4 multiplications. For the
present purpose, however, we will use a slightly different complex number multiplication algorithm also requiring 3 multiplications, but requiring fewer additions involving the variable data x_i and y_i. The result is that we have to compute the five products

$$\begin{aligned}
m_0 &= h'_0 x'_0 \\
m_1 &= h'_1 x'_1 \\
m_2 &= h'_2 (x'_2 + x'_3) \\
m_3 &= (h'_2 - h'_3) x'_2 \\
m_4 &= (h'_2 + h'_3) x'_3.
\end{aligned} \tag{2.54}$$

$$\begin{aligned}
y'_0 &= m_0 \\
y'_1 &= m_1 \\
y'_2 &= m_2 - m_4 \\
y'_3 &= m_2 - m_3.
\end{aligned} \tag{2.55}$$

$$Y(z) = \sum_{j=1}^{3} Y_j(z) S_j(z) \tag{2.56}$$
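The three-multiplication product modulo z^2 + 1 used for Y_3(z) can be checked with a few lines of code. This is a sketch under the reading of (2.54) and (2.55) in which m_2 = h'_2(x'_2 + x'_3), m_3 = (h'_2 - h'_3)x'_2 and m_4 = (h'_2 + h'_3)x'_3; the function name is hypothetical.

```python
def product_mod_z2_plus_1(h2, h3, x2, x3):
    """(h2 + h3*z)(x2 + x3*z) mod (z^2 + 1) using 3 multiplications.

    The sums h2 - h3 and h2 + h3 involve only h and can be precomputed
    when h is fixed, leaving one data addition (x2 + x3) before the
    products and two subtractions after them.
    """
    m2 = h2 * (x2 + x3)
    m3 = (h2 - h3) * x2
    m4 = (h2 + h3) * x3
    y2 = m2 - m4   # equals h2*x2 - h3*x3
    y3 = m2 - m3   # equals h2*x3 + h3*x2
    return y2, y3

# Same numbers as the complex product (1 + 2i)(3 + 4i) = -5 + 10i.
pair = product_mod_z2_plus_1(1, 2, 3, 4)
```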
As mentioned above, we assume that h_i is fixed and used repeatedly for many x_i sequences. Accordingly, we simplify the computation by redefining the m_k's and combining the 1/4 and 1/2 factors with the h_i's. The resulting algorithm, as described in Appendix A, is of the general form of (2.11) and (2.12). The algorithms for N = 2, 3, 4, 5, 6, 7, 8, and 9 are given in Appendix A so as to show the grouping of terms, by means of parentheses, which hopefully minimizes the number of additions. With the above arrangement it is seen that for N = 4, not counting the calculation of Ah, there are 5 multiplications and 15 additions compared with the 16 multiplications and 12 additions required by direct use of the defining formula (2.43). It is interesting to note that, if the parentheses are grouped around intermediate quantities occurring as the coefficients of reduced polynomials, a grouping of additions is obtained which we have, in every case, been unable to improve upon in terms of the number of additions required. However, we know of no theorems about the minimum number of additions, or of systematic procedures for reducing the number of additions.

$$y_i = \sum_{k=0}^{N-1} h_{i-k} x_k \tag{3.1}$$

$$N = r_1 r_2 \tag{3.2}$$

with mutually prime factors r_1 and r_2. This permits us to define the one-to-one mapping

$$i \leftrightarrow (i_1, i_2) \tag{3.3}$$

where i_1 and i_2 are defined by the congruence relations

$$\begin{aligned}
i_1 &\equiv i \bmod r_1, \qquad 0 \le i_1 < r_1 \\
i_2 &\equiv i \bmod r_2, \qquad 0 \le i_2 < r_2.
\end{aligned} \tag{3.4}$$

The CRT says that there is a unique solution i to the congruences (3.4) which is given by

$$i \equiv i_1 s_1 + i_2 s_2 \bmod N, \qquad 0 \le i < N \tag{3.5}$$

where

$$\begin{aligned}
s_1 &\equiv 1 \bmod r_1 \\
s_2 &\equiv 1 \bmod r_2
\end{aligned} \tag{3.6}$$

$$\begin{aligned}
s_1 &\equiv 0 \bmod r_2 \\
s_2 &\equiv 0 \bmod r_1.
\end{aligned} \tag{3.7}$$

This mapping was used by Good [7] and Thomas [17] for expressing the DFT as a multidimensional DFT, thereby reducing the amount of computation required. This procedure is described by Cooley, Lewis, and Welch [5].
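The index mapping (3.3) through (3.7) can be sketched in a few lines. The helper names below are hypothetical, and the constants s_1 and s_2 are built from modular inverses, which is one standard way of satisfying (3.6) and (3.7); Python 3.8 or later is assumed for the three-argument `pow` with exponent -1.

```python
from math import gcd

def crt_constants(r1, r2):
    """Return (s1, s2) with s1 = 1 mod r1, s1 = 0 mod r2 and vice versa for s2."""
    if gcd(r1, r2) != 1:
        raise ValueError("r1 and r2 must be mutually prime")
    q1 = pow(r2, -1, r1)   # inverse of r2 modulo r1
    q2 = pow(r1, -1, r2)   # inverse of r1 modulo r2
    return q1 * r2, q2 * r1

def to_pair(i, r1, r2):
    """Forward map (3.4): one-dimensional index to a pair of residues."""
    return i % r1, i % r2

def from_pair(i1, i2, r1, r2):
    """Inverse map (3.5): recover i from its residues."""
    s1, s2 = crt_constants(r1, r2)
    return (i1 * s1 + i2 * s2) % (r1 * r2)

r1, r2 = 4, 5
round_trip = [from_pair(*to_pair(i, r1, r2), r1, r2) for i in range(r1 * r2)]
```

Because differences of indices map to differences of residues, a cyclic shift by k in the one-dimensional sequence becomes a cyclic shift by (k mod r_1, k mod r_2) in the two-dimensional array, which is what turns the one-dimensional cyclic convolution into a two-dimensional one.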
$$s_1 = q_1 r_2, \qquad s_2 = q_2 r_1 \tag{3.8}$$

where

$$q_1 = (r_2)^{-1} \bmod r_1, \qquad q_2 = (r_1)^{-1} \bmod r_2 \tag{3.9}$$

and

$$y_{i_1,i_2} = \sum_{k_1=0}^{r_1-1} \sum_{k_2=0}^{r_2-1} h_{i_1-k_1,\,i_2-k_2}\, x_{k_1,k_2} \tag{3.10}$$

$$y = Hx$$

where the index of y, which is also the row index of H, is the sequence of pairs (k_1, k_2) in lexicographical order. Although y, h, and x are vectors, it will sometimes help to explain certain operations by thinking of them as two-dimensional arrays with row and column indices i_1 and i_2, respectively, or k_1 and k_2, respectively, whichever the case may be. Equation (3.10) represents a two-dimensional cyclic convolution where the first dimension is of length r_1 and the second dimension is of length r_2. It will be shown below that this two-dimensional cyclic convolution can be computed using a two-dimensional transformation having the CCP. Being a two-dimensional transformation, it can be expressed as a direct product of two one-dimensional transformations having the CCP for lengths r_1 and r_2. Let us assume that both these transformations are rectangular transforms of the type represented by (2.23).

With subscripts to denote which of the factors r_1 or r_2 the matrices refer to, we let A_1, B_1, and C_1 represent a set of rectangular matrices of dimensions M_1 × r_1, M_1 × r_1, and r_1 × M_1, respectively, having the CCP for length r_1 and requiring M_1 multiplications. Similarly, A_2, B_2, and C_2 represent a set of rectangular matrices of dimensions M_2 × r_2, M_2 × r_2, and r_2 × M_2, respectively, having the CCP for length r_2 and requiring M_2 multiplications. Then, the two-dimensional rectangular transformation having the CCP can be derived as follows.

For the moment, let h and x be regarded as two-dimensional arrays. The sum over k_1 in (3.10) is, for each fixed i_2 and k_2, a convolution of column i_2 - k_2 of the array h with column k_2 of the array x. Each of these convolutions may be computed by the above transform methods, giving

$$y_{i_1,i_2} = \sum_{k_2=0}^{r_2-1} \sum_{n_1=0}^{M_1-1} C^1_{i_1,n_1} H_{n_1,\,i_2-k_2} X_{n_1,k_2} \tag{3.12}$$

where

$$H_{n_1,k_2} = \sum_{k_1=0}^{r_1-1} A^1_{n_1,k_1} h_{k_1,k_2} \tag{3.13}$$

and

$$X_{n_1,k_2} = \sum_{k_1=0}^{r_1-1} B^1_{n_1,k_1} x_{k_1,k_2}. \tag{3.14}$$

The superscript 1 is put on the elements of A_1, B_1, and C_1. By changing the order of summation in (3.12), we obtain

$$y_{i_1,i_2} = \sum_{n_1=0}^{M_1-1} C^1_{i_1,n_1} \sum_{k_2=0}^{r_2-1} H_{n_1,\,i_2-k_2} X_{n_1,k_2} \tag{3.15}$$

$$y_{i_1,i_2} = \sum_{n_1=0}^{M_1-1} \sum_{n_2=0}^{M_2-1} C^1_{i_1,n_1} C^2_{i_2,n_2} \bar{H}_{n_1,n_2} \bar{X}_{n_1,n_2} \tag{3.16}$$

where

$$\bar{H}_{n_1,n_2} = \sum_{k_2=0}^{r_2-1} \sum_{k_1=0}^{r_1-1} A^2_{n_2,k_2} A^1_{n_1,k_1} h_{k_1,k_2}$$

$$\bar{X}_{n_1,n_2} = \sum_{k_2=0}^{r_2-1} \sum_{k_1=0}^{r_1-1} B^2_{n_2,k_2} B^1_{n_1,k_1} x_{k_1,k_2}. \tag{3.17}$$

In operator notation, the calculation can be described(3) by

$$y = C_1 C_2 [(A_2 A_1 h) \times (B_2 B_1 x)]. \tag{3.18}$$

The notation B_2 B_1 x means that one computes the transform B_1 of the columns of x and then the transform B_2 of the rows of the result. Since the ordering of the operators corresponds to the ordering of the summations, they commute. However, the ordering of the operators affects the sizes of intermediate arrays, the number of additions, and program organization. These will be discussed in Section V-A.

We have thus shown that the composite two-dimensional transform algorithm as described by (3.18) has the CCP. Mapping the result into the one-dimensional array y_i via the CRT (3.5) yields the one-dimensional convolution (3.1). Hence, the total transformation (3.18) has the one-dimensional CCP with respect to the one-dimensional sequences y_i, h_i, and x_i, i = 0, 1, ..., N - 1.

(3) Equation (3.18) can be written in Kronecker product notation as

$$y = (C_1 \otimes C_2)[(A_2 \otimes A_1) h \times (B_2 \otimes B_1) x],$$

where ⊗ denotes the Kronecker product and × denotes element by element multiplication. However, this notation serves no useful purpose and can cause some confusion. Therefore, it will not be used here.

(3.19)

$$A_j = A_{B_j} + A_{C_j}. \tag{3.20}$$

$$A(r_1, r_2) = A_1 r_2 + A_2 M_1 \tag{3.21}$$

operations. The reader may verify that if the C_j's were applied in the order C_2 C_1, one would obtain

$$A^*(r_1, r_2) = A_{B_1} r_2 + A_{B_2} M_1 + A_{C_1} M_2 + A_{C_2} r_1. \tag{3.22}$$

This is more complicated than (3.21) and makes it more difficult to minimize the number of additions. Both of these formulas were tested with actual operation counts and, in only one case, was it found that (3.22) gave fewer additions. Therefore, we have adopted the convention of placing the C_j operators in the reverse order of that used for the B_j's in order to be able to use (3.21). As mentioned earlier, this ordering also simplifies programming.

Now let us consider reversing the order of the factors. If the transforms are computed first along index 2 and then along index 1, the total number of additions required will be

$$A(r_2, r_1) = A_2 r_1 + A_1 M_2. \tag{3.23}$$

$$A(r_1, r_2) < A(r_2, r_1)$$

or

$$A_1 r_2 + A_2 M_1 < A_2 r_1 + A_1 M_2$$

from which it follows that

$$\frac{M_1 - r_1}{A_1} < \frac{M_2 - r_2}{A_2}. \tag{3.24}$$

(3.25)
is smaller. Values of T(r) are listed in Table II.

TABLE II
TABLE OF VALUES OF T(r_j) = (M_j - r_j)/A_j

  r_j    T(r_j)
   2     0.000
   3     0.091
   4     0.066
   5     0.142
   6     0.045
   7     0.166
   8     0.130
   9     0.131

$$N = r_1 r_2 \cdots r_t \tag{3.26}$$

where, as stated above, the r_j's are mutually prime. The multidimensional index mapping is defined by

$$i_j \equiv i \bmod r_j, \qquad j = 1, 2, \ldots, t \tag{3.27}$$

$$i \equiv \sum_{j=1}^{t} i_j s_j \bmod N \tag{3.28}$$

where

$$s_j = q_j (N/r_j)$$

in which q_j satisfies

$$q_j (N/r_j) \equiv 1 \bmod r_j. \tag{3.29}$$
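The two-factor scheme of (3.18), combined with the index mapping (3.4), can be sketched for N = 6 = 2 · 3. This is an illustration under stated assumptions: the matrices follow the a_k and b_k formulas of the N = 2 and N = 3 algorithms in Appendix A, the N = 3 output matrix is written out from those definitions (the text there is incomplete), and all function and variable names are hypothetical. Exact rational arithmetic is used so that the result can be compared with the direct sum.

```python
from fractions import Fraction as F

# Rectangular transforms for length 2 (M = 2) and length 3 (M = 4).
A_N2 = [[F(1, 2), F(1, 2)], [F(1, 2), F(-1, 2)]]
B_N2 = [[1, 1], [1, -1]]
C_N2 = [[1, 1], [1, -1]]
A_N3 = [[F(1, 3), F(1, 3), F(1, 3)], [1, 0, -1], [0, 1, -1],
        [F(1, 3), F(1, 3), F(-2, 3)]]
B_N3 = [[1, 1, 1], [1, 0, -1], [0, 1, -1], [1, 1, -2]]
C_N3 = [[1, 1, 0, -1], [1, -1, -1, 2], [1, 0, 1, -1]]

def along_first(mat, arr):
    """Apply mat along the first index: out[n][j] = sum_k mat[n][k] * arr[k][j]."""
    return [[sum(row[k] * arr[k][j] for k in range(len(arr)))
             for j in range(len(arr[0]))] for row in mat]

def along_second(mat, arr):
    """Apply mat along the second index: out[i][n] = sum_k mat[n][k] * arr[i][k]."""
    return [[sum(row[k] * line[k] for k in range(len(line))) for row in mat]
            for line in arr]

def cyclic_direct(h, x):
    n = len(x)
    return [sum(h[(i - k) % n] * x[k] for k in range(n)) for i in range(n)]

def cyclic_6_composite(h, x):
    r1, r2 = 2, 3
    h2d = [[None] * r2 for _ in range(r1)]
    x2d = [[None] * r2 for _ in range(r1)]
    for i in range(r1 * r2):                      # input map (3.4)
        h2d[i % r1][i % r2] = h[i]
        x2d[i % r1][i % r2] = x[i]
    H = along_second(A_N3, along_first(A_N2, h2d))    # A2 A1 h, size 2 x 4
    X = along_second(B_N3, along_first(B_N2, x2d))    # B2 B1 x, size 2 x 4
    Y = [[a * b for a, b in zip(hr, xr)] for hr, xr in zip(H, X)]  # 8 products
    y2d = along_first(C_N2, along_second(C_N3, Y))    # C1 C2 [...]
    return [y2d[i % r1][i % r2] for i in range(r1 * r2)]

h = [1, 2, 3, 4, 5, 6]
x = [7, -1, 0, 2, 5, 3]
composite = cyclic_6_composite(h, x)
```

The element-by-element product touches M_1 M_2 = 2 · 4 = 8 entries, matching the multiplication count listed for N = 6, and reading the output at (i mod r_1, i mod r_2) avoids the explicit reconstruction (3.5).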
$$A(r_1, r_2, \ldots, r_t) = A_1 r_2 \cdots r_t + M_1 A_2 r_3 \cdots r_t + M_1 M_2 A_3 r_4 \cdots r_t + \cdots + M_1 \cdots M_{t-1} A_t. \tag{3.32}$$

$$T(r_i) = \frac{M_i - r_i}{A_i},$$

i.e., such that

$$T(r_k) < T(r_j) \quad \text{when } k < j. \tag{3.33}$$
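The effect of the ordering rule (3.33) on the addition count (3.32) can be explored numerically. The sketch below is illustrative: the (M, A) pairs are those quoted for the short algorithms of Appendix A, the entry for length 7 is left out because its addition count is not legible in the table text, and the function names are hypothetical.

```python
from itertools import permutations

# (M, A) for the short algorithms of lengths 2, 3, 4, 5, 6, 8, 9.
SHORT = {2: (2, 4), 3: (4, 11), 4: (5, 15), 5: (10, 35),
         6: (8, 44), 8: (14, 46), 9: (22, 98)}

def additions(order):
    """Evaluate (3.32) for the factors taken in the given order."""
    total = 0
    for j, r in enumerate(order):
        mults_before = 1
        for p in order[:j]:
            mults_before *= SHORT[p][0]
        lengths_after = 1
        for q in order[j + 1:]:
            lengths_after *= q
        total += mults_before * SHORT[r][1] * lengths_after
    return total

def t_value(r):
    """T(r) = (M - r) / A, the quantity used to sort the factors."""
    m, a = SHORT[r]
    return (m - r) / a

def best_order(factors):
    """Brute-force search over all orderings, for comparison with the rule."""
    return min(permutations(factors), key=additions)

rule_order = tuple(sorted([3, 4, 5], key=t_value))
search_order = best_order([3, 4, 5])
```

For the factors 3, 4, 5 the rule and the exhaustive search agree on the order 4, 3, 5, which gives the 1200 additions listed for N = 60.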
$$N = 128\, r_2 r_3 \cdots r_t. \tag{4.1}$$

(4.2)
while the number of length 128 FNTs and inverse FNTs required is

$$F = 2 r_2 r_3 \cdots r_t. \tag{4.3}$$

$$128\,(A_2 r_3 r_4 \cdots r_t + M_2 A_3 r_4 \cdots r_t + \cdots + M_2 M_3 \cdots M_{t-1} A_t). \tag{4.4}$$
TABLE III
NUMBER OF MULTIPLICATIONS AND ADDITIONS PER OUTPUT POINT FOR CONVOLUTION USING COMPOSITE ALGORITHMS FORMED FROM THE RECTANGULAR TRANSFORMS IN APPENDIX A

    N    Factors     Total Number of    Total Number    Multiplications   Additions
         of N        Multiplications    of Additions    per Point         per Point
    6    2, 3                 8                 34            1.33            5.67
   12    4, 3                20                100            1.67            8.33
   18    2, 9                44                232            2.44           12.89
   20    4, 5                50                250            2.50           12.50
   30    2, 3, 5             80                450            2.67           15.00
   36    4, 9               110                625            3.06           17.36
   60    4, 3, 5            200               1200            3.33           20.00
   72    8, 9               308               1786            4.28           24.80
   84    4, 3, 7            380               2140            4.52           25.48
  120    3, 8, 5            560               3320            4.67           27.67
  180    4, 9, 5           1100               6975            6.11           38.75
  210    2, 3, 5, 7        1520               8910            7.24           42.42
  360    8, 9, 5           3080             19 710            8.56           54.75
  420    4, 3, 5, 7        3800             22 800            9.05           54.29
  504    8, 9, 7           5852             34 678           11.61           68.81
  840    3, 8, 5, 7      10 640             63 560           12.67           75.67
 1260    4, 9, 5, 7      20 900            128 025           16.59          101.61
 2520    8, 9, 5, 7      58 520            359 730           23.22          142.75

TABLE IV
NUMBER OF MULTIPLICATIONS AND ADDITIONS PER OUTPUT POINT FOR CONVOLUTION USING COMPOSITE FFT ALGORITHMS (RADICES 2, 4, 8)

    N    Real Multiplications    Real Additions
         per Point               per Point
    4           2.00                  7.00
    8           2.50                  9.50
   16           4.25                 12.37
   32           5.12                 14.81
   64           6.06                 17.53
  128           8.03                 20.51
  256           9.01                 23.00
  512          10.00                 25.15
 1024          12.00                 28.75
 2048          13.00                 31.25
 4096          14.00                 34.00

Note: It is assumed that one will do two real transforms with each complex FFT.

TABLE V
AMOUNT OF COMPUTATION FOR CONVOLUTION USING THE FNT IN MULTIDIMENSIONAL ALGORITHMS

    N    Factors of N     Number of Multiplies    Number of Extra Adds
                          per Point               per Point
  128    128 × 1                 1.0                     0.00
  384    128 × 3                 1.33                    3.66
  640    128 × 5                 2.0                     7.00
  896    128 × 7                 2.71                   10.28
 1152    128 × 9                 2.44                   10.88
 1920    128 × 3 × 5             2.66                   13.00
Table V lists the amount of computation required for multidimensional implementation of cyclic convolution using FNTs and rectangular transforms.
The data in Table V are to be compared with those in Tables III and IV, where comparable data for the computation of convolutions by rectangular transform and FFT methods are given. The comparison is difficult to make since the FNT does depend for its efficiency upon special machine hardware for the transformations. However, the data do show how much is to be gained if one has a machine with such hardware. The reduction in numbers of multiplications is quite impressive. For example, a mixed radix FFT algorithm (see [16]) for 1024 points takes 12 multiplications per output point to compute a cyclic convolution while the FNT, used with the present algorithms for a composite 896 point transform, takes only 2.71 multiplications per output point. The comparable figure for 840 points with the composite rectangular transform method is 12.67 multiplications per output point. For N = 1920, we have 2.66 multiplications per output point for the
V. MISCELLANEOUS CONSIDERATIONS
A. Programming of the Algorithm and Machine Organization
We first summarize the calculation in matrix operator notation. The two-dimensional convolution (3.10) may be written in the form

$$y = h ** x \tag{5.1}$$

$$H = A_1 h \tag{5.2}$$

$$X = B_1 x \tag{5.3}$$

$$Y = H \times\!* \; X \tag{5.4}$$

$$y = C_1 Y. \tag{5.5}$$
$$\bar{H} = A_2 H \tag{5.6}$$

$$\bar{X} = B_2 X \tag{5.7}$$

$$\bar{Y} = \bar{H} \times\times \; \bar{X} \tag{5.8}$$

$$Y = C_2 \bar{Y} \tag{5.9}$$

where the ×× in (5.8) denotes element by element multiplication of all elements. The above formulation can be used to define the structure of a program for implementing the algorithm. Such a program would carry out the operations defined by (5.2)-(5.5) in that order. This would essentially be an r_1-point convolution program operating on vectors. In computing (5.4), however, the program would compute the convolutions by performing the operations defined by (5.6)-(5.9) in that order. The latter computation can be done by a subroutine having exactly the same structure as (5.2)-(5.5). This is essentially an r_2-point convolution subroutine also operating on vectors. On step (5.8), an element by element multiplication is performed. If there were a third factor, (5.8) would contain a convolution and would be computed by still another convolution subroutine operating on vectors. This could thus proceed for as many levels of subroutines as there are factors in N.

For convolutions of real sequences, the rectangular transform approach requires only real arithmetic as compared with complex arithmetic required by the FFT algorithm. This should reduce hardware complexity considerably.

It may appear that the CRT mapping of a one-dimensional sequence into a multidimensional array may require substantial computation. However, this is not so. To map a one-dimensional sequence of length N into a t-dimensional array of dimensions r_1, r_2, ..., r_t [as given by (3.27)], we set up t address registers which give the t-dimensional array address for each data point. As the input data come in sequentially, all address registers are updated by one. These address registers are so set up that when the contents of the jth register becomes r_j, it is automatically reset to zero. Using this scheme, no additional computation is required for the address mapping. After computing the convolution, removing the data from the machine using (3.28) will require a substantial amount of computation. We can get around this by removing the data sequentially in the form of a one-dimensional sequence y. Again, we use the scheme as described above to give the t-dimensional array address where the output is residing. For both input and output we use the mapping (3.27) which is much simpler. If the h sequence is fixed, the rectangular transform of h can be precomputed and stored in a read-only memory (ROM).

For basic short length convolution algorithms, the A, B, and C matrices are very simple and require few additions. Furthermore, as mentioned above, a rectangular transformation with respect to one index is done for all values of the other indices and is, therefore, a vector operation which can be done simultaneously or in pipelined fashion for all vector elements. This can be done conveniently by an array processor where one may even consider hard-wiring the circuits which compute the rectangular transforms.

Also, since the computation involves multidimensional transforms, it can easily be adapted to a two-level memory hierarchy. A slow memory unit can be used to store all the data, and a fast memory unit can be used to compute on a part of the data at a time (usually on a row or a column).

B. Bounds on Intermediate Results
If a multidimensional convolution is implemented in modular arithmetic (for example when the FNT is used) then we do not have to worry about the intermediate values as long as the final output is correctly bounded. But if ordinary arithmetic is used, all the intermediate values should be correctly bounded so that no overflow of the intermediate values occurs. Below, we will give some simple bounds for the case where data are real and only rectangular transforms are used. It is assumed that the h sequence is predetermined and remains fixed. Results are given for the two-dimensional case, but they generalize easily to more than two dimensions.

Let

$$N = r_1 r_2 \tag{5.10}$$

and let

$$x_{\max} = \max_{k_1,k_2} |x_{k_1,k_2}|. \tag{5.11}$$

A bound y_max on the magnitudes of the elements of y in (5.1) satisfies

$$|y_i|_{\max} \le x_{\max} \sum_{k_1=0}^{r_1-1} \sum_{k_2=0}^{r_2-1} |h_{k_1,k_2}|. \tag{5.12}$$

The above bound is also a least upper bound. For a particular x array it can be achieved. Equation (5.12) is a bound on the output, but we also need bounds on the intermediate results. Consider the X array (5.3) obtained after computing the B_1 transform along the first dimension. A simple bound on the elements of X satisfies

$$|X_{n_1,j_2}| \le x_{\max} B(r_1, n_1) \tag{5.13}$$

for all n_1, j_2 where here, and in what follows,

$$B(r_j, n_j) = \sum_{k_j=0}^{r_j-1} |B^j_{n_j,k_j}|, \qquad j = 1, 2. \tag{5.14}$$

The absolute values of the elements of the \bar{X} array, (5.7), are bounded by

$$|\bar{X}_{n_1,n_2}| \le |X_{n_1,j_2}|_{\max} B(r_2, n_2) \tag{5.15}$$

where the max refers to the maximum with respect to j_2. This, with (5.13), gives

$$|\bar{X}_{n_1,n_2}| \le x_{\max} B(r_1, n_1) B(r_2, n_2), \qquad n_1 = 0, 1, \ldots, M_1 - 1, \quad n_2 = 0, 1, \ldots, M_2 - 1. \tag{5.16}$$
Both bounds (5.13) and (5.15) are least upper bounds. We get a bound on the elements of the transform \bar{Y} in (5.8) in terms of the known fixed \bar{H} by substituting the bound (5.16) in
(5.17)
to get

$$|\bar{Y}_{n_1,n_2}| \le x_{\max} |\bar{H}_{n_1,n_2}| B(r_1, n_1) B(r_2, n_2). \tag{5.18}$$
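The row-sum quantity B(r_j, n_j) of (5.14) is easy to tabulate for a given transform. The sketch below assumes the length-3 B matrix implied by the b_k formulas of Appendix A; the names are hypothetical and the input bound of 100 is an arbitrary example value.

```python
# Rows of B for the N = 3 algorithm: b0 = x0 + x1 + x2, b1 = x0 - x2,
# b2 = x1 - x2, b3 = (x0 - x2) + (x1 - x2).
B_N3 = [[1, 1, 1], [1, 0, -1], [0, 1, -1], [1, 1, -2]]

def row_bounds(B):
    """B(r, n): sum of absolute values along row n of the transform matrix."""
    return [sum(abs(v) for v in row) for row in B]

def transform(B, x):
    """Apply the rectangular transform to a single column of data."""
    return [sum(b * v for b, v in zip(row, x)) for row in B]

x_max = 100
bounds = [x_max * b for b in row_bounds(B_N3)]

# An input whose signs match row 3 of B attains the bound for that row,
# which illustrates why (5.13) is a least upper bound.
worst_case = [x_max, x_max, -x_max]
attained = transform(B_N3, worst_case)[3]
```

For a two-factor algorithm the bound (5.16) is the product of two such row sums, one per dimension, so word lengths for intermediate arrays can be fixed in advance from the matrices alone.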
VI. CONCLUSIONS
The multidimensional method for computing convolutions was investigated by Agarwal and Burrus [1] in order to permit the efficient use of FNTs. While this presented computational advantages for computers capable of the special arithmetic required for the FNT, it was also shown that even without the FNT, a general-purpose computer could compute convolutions by this method in fewer multiplications than others using the FFT for sequence lengths up to around 128. The present paper suggests the use of the CRT for mapping into multidimensional sequences. This, with improved short convolution algorithms, makes the multidimensional method better than FFT methods for sequence lengths up to around 420. The present methods are also more attractive since they do not require complex arithmetic with sines and cosines. This means that the calculation can be carried out in integer arithmetic without rounding errors.
Theoretical results from computational complexity theory showing how close the special algorithms are to optimal are cited. Some of this theory is used for developing systematic techniques for deriving optimal short convolution algorithms. It is expected that these techniques, using computer-based formula manipulation systems, will be useful for developing tailor-made convolution algorithms which take advantage of the special properties of a given computer. For the same reasons, one may also expect such techniques to have an effect on the design of special-purpose digital processing systems.

APPENDIX A
CONVOLUTION ALGORITHMS FOR 2 ≤ N ≤ 9
Optimal and near-optimal algorithms for a number of short convolutions are given with the number of multiplications M and the number of additions A_B, A_C, and A. The operations involving h are not counted. The elements of Ah and Bx are denoted by a_k and b_k, k = 0, ..., M - 1, respectively.
The expressions for a_k and b_k are written with parentheses arranged so as to show the ordering of the operations, which
TABLE VI
OPTIMUM SIZE SEGMENTS OF LONG SEQUENCES WHEN CONVOLVING WITH A SHORT SEQUENCE BY RECTANGULAR TRANSFORM METHODS

  Filter Tap      N     Number of            Multiplications    F2(P, N)
  Length P              Multiplications M    per Point F1(N)
       2          6              8                1.33             1.60
       4         12             20                1.66             2.22
       8         30             80                2.66             ...
      16         60            200                3.33             4.44
      32        120            560                4.66             6.29
      64        180           1100                6.11             9.40
     128        420           3800                9.04            12.97
     256        840         10 640               12.66            18.17
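The last column of Table VI appears to follow from sectioning a long sequence into length-N cyclic convolutions, each of which yields N - P + 1 new output points for a P-tap filter. That formula is an assumption inferred from the tabulated numbers and is not stated in the surviving text; the sketch below, with hypothetical names, reproduces the legible entries for P = 2 through P = 128 (the P = 256 entry comes out slightly higher than the printed value).

```python
# (P, N, M) triples read from Table VI.
ROWS = [(2, 6, 8), (4, 12, 20), (16, 60, 200),
        (32, 120, 560), (64, 180, 1100), (128, 420, 3800)]

def f1(n, m):
    """Multiplications per point of the length-n cyclic convolution."""
    return m / n

def f2(p, n, m):
    """Multiplications per useful output point, assuming n - p + 1 new
    outputs per length-n section (overlap-type sectioning)."""
    return m / (n - p + 1)

per_output = {p: round(f2(p, n, m), 2) for p, n, m in ROWS}
```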
N = 2 Algorithm: M = 2, A_B = 2, A_C = 2, A = 4:
a_0 = (h_0 + h_1)/2
a_1 = (h_0 - h_1)/2
b_0 = x_0 + x_1
b_1 = x_0 - x_1
m_k = a_k b_k, k = 0, 1
y_0 = m_0 + m_1
y_1 = m_0 - m_1.

N = 3 Algorithm: M = 4, A_B = 5, A_C = 6, A = 11:
a_0 = (h_0 + h_1 + h_2)/3
a_1 = h_0 - h_2
a_2 = h_1 - h_2
a_3 = [(h_0 - h_2) + (h_1 - h_2)]/3
b_0 = x_0 + x_1 + x_2
b_1 = x_0 - x_2
b_2 = x_1 - x_2
b_3 = (x_0 - x_2) + (x_1 - x_2)
m_k = a_k b_k, k = 0, 1, 2, 3
y_0 = m_0 + (m_1 - m_3)
y_1 = m_0 - (m_1 - m_3) - (m_2 - m_3)
y_2 = m_0 + (m_2 - m_3).
N = 4 Algorithm: M = 5, A_B = 7, A_C = 8, A = 15:
a_0 = [(h_0 + h_2) + (h_1 + h_3)]/4
a_1 = [(h_0 + h_2) - (h_1 + h_3)]/4
a_2 = (h_0 - h_2)/2
a_3 = [(h_0 - h_2) - (h_1 - h_3)]/2
a_4 = [(h_0 - h_2) + (h_1 - h_3)]/2
b_0 = (x_0 + x_2) + (x_1 + x_3)
b_1 = (x_0 + x_2) - (x_1 + x_3)
b_2 = (x_0 - x_2) + (x_1 - x_3)
b_3 = x_0 - x_2
b_4 = x_1 - x_3
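The N = 4 algorithm can be exercised end to end as follows. The a_k and b_k expressions are those listed above; the output combinations are not legible in this part of the appendix, so the ones used here are a reconstruction from (2.55) and the polynomials S_j(z) of the worked example and should be read as an assumption. The function name is hypothetical, and integer inputs are assumed so that `Fraction` can hold the scaled h terms exactly.

```python
from fractions import Fraction as F

def cyclic_4(h, x):
    """Length-4 cyclic convolution with 5 multiplications."""
    h0, h1, h2, h3 = h
    x0, x1, x2, x3 = x
    # Ah: depends only on h and can be precomputed when h is fixed.
    hs, ht = h0 + h2, h1 + h3
    hd, he = h0 - h2, h1 - h3
    a = [F(hs + ht, 4), F(hs - ht, 4), F(hd, 2), F(hd - he, 2), F(hd + he, 2)]
    # Bx: 7 additions.
    xs, xt = x0 + x2, x1 + x3
    xd, xe = x0 - x2, x1 - x3
    b = [xs + xt, xs - xt, xd + xe, xd, xe]
    m = [ak * bk for ak, bk in zip(a, b)]      # the 5 multiplications
    # Output stage: 8 additions (reconstructed grouping).
    u0, u1 = m[0] + m[1], m[0] - m[1]
    u2, u3 = m[2] - m[4], m[2] - m[3]
    return [u0 + u2, u1 + u3, u0 - u2, u1 - u3]

def cyclic_direct(h, x):
    n = len(x)
    return [sum(h[(i - k) % n] * x[k] for k in range(n)) for i in range(n)]

fast = cyclic_4([1, 2, 3, 4], [5, 6, 7, 8])
```

The addition counts in the comments (7 before and 8 after the products) agree with A_B = 7 and A_C = 8 in the heading of the algorithm.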
N = 6 Algorithm: M = 8, A_B = 18, A_C = 26, A = 44:
where
1 1 1 1 1 1
1 0 0 0 0 0 - 1
A = d i a g ( l 1 -1 1 1 1 1 1 ) - B / 6
0 1 0 0 0 0 - 1
where
0 0 1 0 0 0 - 1
1
0 - 1
0-1
1 - 1
1-1
1 - 1
B=
1 - 1
0 - 1 - 1
0 - 1 - 1
0 0 0 0 1 0 - 1
0 0 0 0 0 1 - 1
1 0 0 1 0 0 - 2
0-1-1
1 -1
1 -1
0 1 0 0 1 0 - 2
1 -1
1
0 0 0 1 0 0 - 1
A=
0 0 1 0 0 1 - 2
1 1 0 0 0 0 - 2
0 1 1 0 0 0 - 2
1 0 1 0 0 0 - 2
0 0 0 1 1 0 - 2
0 0 0 0 1 1 - 2
0 0 0 1 0 1 - 2
1 1 0 1 1 0 - 4
0 1 1 0 1 1 - 4
1 1 1 1 1 1 - 6
1 -2 -1
1
C=
-2
2 - 1 -1
1 - 1 - 2
-2
1 -1
1
2 -1
-1
-1
1
1
2 -1
2
1-
2-1
1 -1
-2
1
1 -2
1
1
1
1
1 10 - 1 - 1 - 1 - 1
0 - 1
1 - 1 - 10 - 1 - 1
0-1
1 - 11 - 10 - 1 - 1
00
0 - 1
0 - 1
0 - 1
1 0 - 1
C= 1 - 1 - 1 - 1 - 1
1
1 ' 10 - 1 - 10 - 1
1 1 - 1 1 1 - 1 1 0
1 0
1 1
0 - 1
1 0 - 1 - 10-1
1 1
0-1-1-1
0 - 1
1-1
uo = mo - m18
Also,
u 1 = m-l m5
0 1 0 1 0 - 1
0-1u2=m4+m6
u3=m1 +m3
1 -1 1 - 1 - 1 1 -1 1
U 4 = m.2 - m6
0
1 0 1 0 - 1 0 - 1
~ ~ = m ~ + m ~ t m ~ + m ~ - m ~
0 0 0 1 0 0 0 - 1
u6 =uO - u3
u, =uot u5
0 0 1 - 1 0 0 - 1 1
yo=~o+~1-~2-m3+m9+ml~
0 0 1 0 0 0 - 1 0
y1=uo-u1-u2-m2+mlo+m15
0 1 0 0 0 - 1 0 0
y2=~6+~4-m5+m12+m14
B=
Y3=U6-u4-m4+m7+mll
1-1 0 0 - 1 1 0
0
y4=~7+m1-m7-m10-m13+m16
1 0 0 0 - 1 0 0 0
y5=(mo+m0)+(2m~+2m~)+m~-~o-~1-~2-~3
1 1 -1 - 1 1 1 -1
-Y4-Y6
y , = ~ ~ + m ~ - m ~ - m ~ ~ - m ~ ~ + m ~ 7 .
0 1 0 - 1 0 1 0 - 1
1
where
1 00 - 1 - 1
1
1 -1 - 1 -1
0 - 1 - 1
1 0 1 0 - 1 0 - 1
E=
-1
1
0
1 -1
-1
-1
-1
1 1 -1
1 0 - 1
1 0 - 1
1 0 - 1
-1
-1 - 1
1
-1
1 - 1 -1
1 -1
-1
1 -1
1 -1
1 -1
1 -1
1 -1
1 -1
1 -1
1 0 - 1
1 -1
-1
1 0 - 1
1 -1
1 -1
1 -1
(B4)
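$$y = CY$$

$$H_k = \sum_{p=0}^{N-1} A_{k,p} h_p$$

$$X_k = \sum_{q=0}^{N-1} B_{k,q} x_q, \qquad k = 0, 1, 2, \ldots, M - 1. \tag{B6}$$

$$y_n = \sum_{k=0}^{M-1} C_{n,k} \Big\{ \sum_{p=0}^{N-1} A_{k,p} h_p \Big\} \Big\{ \sum_{q=0}^{N-1} B_{k,q} x_q \Big\}$$

$$y_n = \sum_{q=0}^{N-1} \sum_{p=0}^{N-1} h_p x_q \Big\{ \sum_{k=0}^{M-1} C_{n,k} A_{k,p} B_{k,q} \Big\}.$$

$$\sum_{k=0}^{M-1} C_{n,k} A_{k,p} B_{k,q} =
\begin{cases}
1 & \text{if } p + q \equiv n \bmod N \\
0 & \text{otherwise.}
\end{cases} \tag{B9}$$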
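Condition (B9) can be tested mechanically for any candidate triple of matrices. The sketch below uses the N = 2 matrices implied by the Appendix A formulas; the function name is hypothetical, and exact rationals are used so that the comparison with 0 and 1 is not affected by rounding.

```python
from fractions import Fraction as F

def has_ccp(A, B, C):
    """Check that sum_k C[n][k] * A[k][p] * B[k][q] is 1 when p + q = n (mod N)
    and 0 otherwise, for all n, p, q."""
    M, N = len(A), len(A[0])
    for n in range(N):
        for p in range(N):
            for q in range(N):
                s = sum(C[n][k] * A[k][p] * B[k][q] for k in range(M))
                expected = 1 if (p + q) % N == n else 0
                if s != expected:
                    return False
    return True

A_N2 = [[F(1, 2), F(1, 2)], [F(1, 2), F(-1, 2)]]
B_N2 = [[1, 1], [1, -1]]
C_N2 = [[1, 1], [1, -1]]

ok = has_ccp(A_N2, B_N2, C_N2)
# Flipping one sign in C breaks the property.
broken = has_ccp(A_N2, B_N2, [[1, 1], [1, 1]])
```

A check of this kind is a convenient guard when transcribing the larger matrices by hand, since a single wrong entry makes the test fail.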
REFERENCES
[1] R. C. Agarwal and C. S. Burrus, "Fast one-dimensional digital convolution by multidimensional techniques," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-22, pp. 1-10, Feb. 1974.
[2] -, "Fast convolution using Fermat number transforms with applications to digital filtering," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-22, pp. 87-99, Apr. 1974.
[3] -, "Number theoretic transforms to implement fast digital convolution," Proc. IEEE, vol. 63, pp. 550-560, Apr. 1975.
[4] G. D. Bergland, "A fast Fourier transform algorithm using base 8 iterations," Math. Comput., vol. 22, pp. 275-279, Apr. 1968.
[5] J. W. Cooley, P. A. W. Lewis, and P. D. Welch, "Historical notes on the fast Fourier transform," IEEE Trans. Audio Electroacoust., vol. AU-15, pp. 76-79, June 1967.
[6] -, "The fast Fourier transform: Programming considerations in the calculation of sine, cosine and Laplace transforms," J. Sound Vib., vol. 22, pp. 315-337, July 1970.
[7] I. J. Good, "The interaction algorithm and practical Fourier analysis," J. Royal Statist. Soc., ser. B, vol. 20, pp. 361-372, 1958; addendum, vol. 22, 1960, pp. 372-375, (MR 21 1674; MR 23 A4231).
[8] J. H. Griesmer, R. D. Jenks, and D. Y. Y. Yun, "SCRATCHPAD user's manual," IBM Res. Rep. RA 70, IBM Watson Res. Cen., Yorktown Heights, NY, June 1975; and SCRATCHPAD Technical Newsletter No. 1, Nov. 15, 1975.
[9] D. E. Knuth, "Seminumerical algorithms," in The Art of Computer Programming, vol. 2. Reading, MA: Addison-Wesley, 1971.
[10] T. Nagell, Introduction to Number Theory. New York: Wiley, 1951.
[11] P. J. Nicholson, "Algebraic theory of finite Fourier transforms," J. Comput. Syst. Sci., vol. 5, pp. 524-527, Oct. 1971.
[12] J. M. Pollard, "The fast Fourier transform in a finite field," Math. Comput., vol. 25, no. 114, pp. 365-374, Apr. 1971.
[13] C. M. Rader, "Discrete convolutions via Mersenne transforms," IEEE Trans. Comput., vol. C-21, pp. 1269-1273, Dec. 1972.
[14] I. S. Reed and T. K. Truong, "The use of finite fields to compute convolutions," IEEE Trans. Inform. Theory, vol. IT-21, pp. 208-213, Mar. 1975.
[15] -, "Complex integer convolutions over a direct sum of Galois fields," IEEE Trans. Inform. Theory, vol. IT-21, pp. 657-661, Nov. 1975.
[18] S. Winograd, "Some bilinear forms whose multiplicative complexity depends on the field of constants," IBM Res. Rep. RC 5669, IBM Watson Research Cen., Yorktown Heights, NY, Oct. 10, 1975.
[19] -, "On computing the discrete Fourier transform," Proc. Nat. Acad. Sci. USA, vol. 73, no. 4, pp. 1005-1006, Apr. 1976.
I. INTRODUCTION
IN MANY engineering applications it is desired to combine a number of discrete-time signals in a linear fashion to obtain a composite sequence with enhanced signal-to-noise ratio. The enhancement technique applies, for example, to data acquisition in the presence of severe electromagnetic radiation, to simulation of spinal reflex transmission, and to the design of two-dimensional digital filters [1]-[4]. A newly proposed application is the multichannel processing of signals sensed by an array of geophones in a coal gasification project [5]. In each case, the composite sequence is obtained by passing the signals through individual filters followed by a summer. These filters are somewhat similar to the Wiener and the Kalman filters in the sense that a least squares criterion is used to determine the filter coefficients.
They differ from the Wiener and the Kalman filters in the specification of a priori information. Rather than being given as a signal covariance matrix or as the output of a linear dynamic system driven by white noise, the signal information is given in the form of constraints on the filter coefficients. For this reason, the filters are called constrained least squares (CLS) filters.
For the case of two discrete-time signals and three filter points, Claerbout [6] established that the CLS filter coefficients constitute the solution of a block-Toeplitz system. The solution procedure requires the inversion of a third-order block-Toeplitz matrix of block size 2 × 2. In the general case

Fig. 1. Composite sequence.

II. VECTOR-MATRIX FORMULATION
The CLS coefficients will be obtained as a solution of a system of linear equations. Let f_i(k), i = 1, 2, ..., M, be M discrete-time signals, each of which has exactly a signal elements, i.e., k = 0, 1, ..., a - 1, and each of which is passed through an individual sample-data filter followed by a summer, as shown in Fig. 1.
The impulse response h_i(k) of the sample-data filter is of duration b, i.e., k = 0, 1, ..., b - 1. Without loss of generality, f_i and h_i can be considered to be identically zero for all indices less than zero and for all indices greater than (a - 1) and

Manuscript received July 9, 1976; revised April 11, 1977.
The author was with General Dynamics, Orlando, FL. He is now with the International Telephone and Telegraph Corporation, Stamford, CT 06902.