1975 Number Theoretic Transforms To Implement Fast Digital Convolution

550 PROCEEDINGS OF THE IEEE, VOL. 63, NO. 4, APRIL 1975
Number Theoretic Transforms to Implement Fast Digital Convolution
Invited Paper
Abstract: Transforms using number theoretic concepts are developed as a method for fast and error-free calculation of finite digital convolution. The transforms are defined on finite fields and rings of integers with arithmetic carried out modulo an integer, and it is shown that under certain conditions this gives the same results as conventional digital convolution. Because of these characteristics they are ideally suited to digital computation by taking into account quantization of amplitude as well as time in their definition. When the modulus is chosen as a Fermat number, a transform results that requires only on the order of N log N additions and word shifts but no multiplications. In addition to being efficient, they have no roundoff error and do not require storage of basis functions. There is a restriction on sequence length imposed by word length and a problem with overflow, but methods for overcoming these are presented. Results of an implementation on an IBM 370/155 are presented and compared with the fast Fourier transform, showing a substantial improvement in efficiency and accuracy. Variations on the basic number theoretic transforms are also presented.

Manuscript received September 5, 1974; revised October 15, 1974. This work was supported in part by the National Science Foundation under Grant GK-23697.
R. C. Agarwal was with the Department of Electrical Engineering, Rice University, Houston, Tex. He is now with the IBM Thomas J. Watson Research Center, Yorktown Heights, N.Y. 10598.
C. S. Burrus is with the Department of Electrical Engineering, Rice University, Houston, Tex. 77001.

I. INTRODUCTION

Finite digital convolution is a numerical procedure defined by

$$y(n) = \sum_{m=0}^{N-1} h(n-m)\,x(m), \qquad n = 0, 1, 2, \ldots \tag{1}$$

and symbolically denoted

$$y(n) = h(n) * x(n)$$

where $x(n)$, $h(n)$, and $y(n)$ are digital number sequences. This operation has many very powerful applications. It is used to implement nonrecursive or finite-impulse-response digital filters either directly or with sectioning or block techniques [1], [2] and recursive or infinite-impulse-response digital filters by block methods [3]. It also is used to carry out auto and cross correlation as well as for computations such as polynomial multiplication and multiplication of very large integers [4]-[6].

There are several methods to implement finite convolution that differ in the amount of computation required, the effects of arithmetic roundoff, and the amount of storage required. It is somewhat difficult to compare various algorithms because of the tradeoffs between these various factors that depend on the hardware or software that is available. However, because of the complexity of performing multiplication, the number of multiplications necessary to implement convolution is often an important factor to be minimized.

The use of transform methods has proven to be useful when an application allows sequences to be processed in blocks. The most versatile transform is the discrete Fourier transform (DFT) defined by

$$\mathrm{DFT}[x] \triangleq X(k) = \sum_{n=0}^{N-1} x(n)\exp(-j2\pi nk/N), \qquad k = 0, 1, \ldots, N-1 \tag{2}$$

and the inverse transform

$$\mathrm{IDFT}[X] \triangleq x(n) = N^{-1}\sum_{k=0}^{N-1} X(k)\exp(j2\pi nk/N), \qquad n = 0, 1, \ldots, N-1. \tag{3}$$

The property of this transform that is important here is the cyclic convolution property (CCP) which states that

$$\mathrm{DFT}[h * x] = \mathrm{DFT}[h] \cdot \mathrm{DFT}[x].$$

This implies that a convolution can be calculated by

$$y(n) = \mathrm{IDFT}\{\mathrm{DFT}[h] \cdot \mathrm{DFT}[x]\} \tag{4}$$

using two transforms, N multiplications, and one inverse transform. The convolution implemented by (4) is called cyclic convolution since it evaluates (1) as if $h(n)$ and $x(n)$ were periodically extended outside of the range from 0 to $(N-1)$ or, equivalently, the indices are evaluated mod N. Normal finite convolution can be calculated by cyclic convolution if zeros are appended to $x(n)$ and $h(n)$ to prevent folding or aliasing [1], [2].

This transform approach became useful only when Cooley and Tukey [7] introduced a very efficient algorithm known as the fast Fourier transform (FFT) for calculating the DFT and its inverse in (2) and (3). The number of multiplications necessary to calculate the FFT of a number sequence of length N is on the order of $N \log_2 N$. Implementation of convolution using the FFT results in a considerable savings in multiplication when lengths are approximately above N = 32. The disadvantage of this approach is in the form of significant amounts of roundoff error [8], storage or generation of the complex basis functions that have to be rounded, and still a considerable amount of multiplying.

If one looks for the properties that a general transform with the DFT structure

$$X(k) = \sum_{n=0}^{N-1} x(n)\,\alpha^{nk} \tag{5}$$

must have to have the CCP, it is found [9], [10] that $\alpha$ is a
root of unity of order N, i.e., N is the least positive integer such that

$$\alpha^N = 1. \tag{6}$$

This analysis shows that in the complex number field, the conventional DFT with $\alpha = \exp(-j2\pi/N)$ is the only transform given by (5) with the CCP. If, however, other fields and arithmetic systems are used, new transforms become possible with very interesting properties. This is pursued by considering mathematical systems that are fundamentally compatible with digital computing capability.

In any practical situation, or when working with digital machines, the data are available only with some finite precision, and therefore, without loss of generality, the data can be considered to be integers with some upper bound. To compute convolution in this digital domain, operations in the complex number field of the continuous domain can be imitated in a finite field or, more generally, in a finite ring of integers under additions and multiplications modulo some integer M. An integer $\alpha$ of order N replaces the $\exp(-j2\pi/N)$ used in a DFT. In this ring, when two integer sequences $x(n)$ and $h(n)$ are convolved, the output integer sequence $y(n)$ is congruent to the conventional convolution of $x(n)$ and $h(n)$ mod M. In the ring of integers mod M, conventional integers can be unambiguously represented if their absolute value is less than M/2. If the input integer sequences $x(n)$ and $h(n)$ are so scaled that $|y(n)|$ never exceeds M/2, we would get the same results by implementing convolution in the ring of integers mod M as that obtained with normal arithmetic. This is similar to the overflow constraint in fixed-point digital machines. In most digital filtering applications, $h(n)$ represents the impulse response and is known a priori; also the maximum magnitude of the input signal is usually known.

By working in a finite field or ring of integers with arithmetic carried out modulo an integer M, a large class of transforms exist that have the CCP. By special choices of the length N, the modulus M, and the value $\alpha$, it is possible to have transforms that need only word shifts and additions but no multiplications, that have an FFT type fast algorithm, that do not require storage of complex values for $\alpha$, and that have no roundoff errors. These transforms are called number theoretic transforms (NTT) and they look very promising in the evaluation of finite convolutions. Their main disadvantage seems to be a relation of the sequence length N to the required word length that can require long word lengths for long sequence lengths.

These number theoretic transforms are truly digital transforms, taking into account the quantization in amplitude and the finite precision of digital signals. They bear the same relation to digital signals as the DFT does to discrete-time or sampled data signals and the Fourier or Laplace transforms do to continuous-time signals. In the same manner that the relation of discrete-time signals to continuous-time signals through sampling involves a possible folding or aliasing in the frequency domain, the relation of calculations with the DFT to calculations with the number theoretic transforms involves a possible folding of the amplitude that must be taken into account.

The literature on transforms of these types is fairly recent. Knuth [4] has proposed the use of transforms in finite fields. Pollard [5] discussed transforms having the CCP in a finite field and also gives conditions for having transforms with the CCP in a finite ring of integers. Good [11] also mentioned the use of transforms in a finite ring of integers. Schonhage and Strassen [6] defined transforms having the CCP modulo a Fermat number and discussed their application to fast multiplication of very large integers. Knuth [12] elaborated on the work of Schonhage and Strassen. Nicholson [10] presented an algebraic theory of FFTs in any ring and established fast FFT-type algorithms to compute these transforms. Rader [13], [14] proposed number theoretic transforms in rings of integers modulo both Mersenne and Fermat numbers. He first proposed the application to digital signal processing, showed that the transforms could be calculated using only additions and bit shifting, showed the word length constraint, and suggested two-dimensional transforms as a possible relaxation of that constraint. Agarwal and Burrus [9], [15] discussed number theoretic transforms in detail, defined Fermat number transforms, and also proposed their application for fast digital convolution. They also suggested possible hardware and software implementations. Their implementation on the IBM 370/155 showed a factor of 3 to 5 speed improvement over efficient FFT implementations of cyclic convolution for lengths up to 256. An earlier article by Takahasi and Ishibashi [30] was recently brought to our attention by Dr. J. W. Cooley of IBM.

II. MODULAR ARITHMETIC

In this section, some of the basic concepts of modular arithmetic from number theory relevant to NTT will be discussed. This can be found in most basic books on number theory [16], [17].

Two integers a and b are said to be congruent mod M if

$$a = b + kM \tag{7}$$

where k is some integer and M is the modulus. This is written as

$$a \equiv b \pmod{M}. \tag{8}$$

All integers are congruent mod M to some integer in the finite set $\{0, 1, 2, \ldots, M-1\}$ which is called the set of integers mod M and denoted by $Z_M$. $Z_M$ is also known as the ring of integers mod M. If in a ring of integers multiplicative inverses exist for all nonzero integers, this ring becomes a field, and it can be shown that $Z_M$ is a field if and only if M is a prime. We will use the symbol $Z_M$ and the expression "the ring of integers mod M" for rings as well as fields since a field is also a ring. The following basic arithmetic operations are permissible with modular arithmetic.

Addition: Example, 7 + 12 = 19 = 2 (mod 17).
Negation: Example, -7 = -7 + 17 = 10 (mod 17).
Subtraction: Example, 7 - 12 = 7 + (-12) = 7 + 5 = 12 (mod 17).
Multiplication: Example, 7 × 12 = 84 = 16 (mod 17).
Multiplicative Inverse: The multiplicative inverse of an integer b in $Z_M$ exists if and only if b and M are relatively prime. In that case $b^{-1}$ is an integer such that $b \times b^{-1} = 1$ (mod M). Example, $7^{-1}$ = 5 (mod 17); 7 × 5 = 35 = 1 (mod 17).
Division: a/b exists if and only if b has an inverse. In that case $a/b = a \times b^{-1}$. Example, 12/7 = 12 × 5 = 9 (mod 17); 7 × 9 = 12 (mod 17).

This may seem like a rather peculiar way to do arithmetic but it is used quite often by everyone. In discussing the day of the week, one uses an arithmetic mod 7, or in stating the time, one is calculating mod 12 or perhaps 24. Indeed the mantissa of a number in scientific notation is evaluated mod 10.
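The mod-17 examples above can be reproduced with a few lines of Python. This is only a minimal sketch: the helper names (`add`, `neg`, `sub`, `mul`, `inv`, `div`, `centered`) are illustrative and do not come from the paper, and the inverse relies on Python's built-in three-argument `pow` with a negative exponent (available from Python 3.8).

```python
# Arithmetic in Z_M, mirroring the mod-17 examples of Section II.
# All helper names are hypothetical, chosen only for this illustration.

M = 17

def add(a, b, m=M):
    return (a + b) % m

def neg(a, m=M):
    return (-a) % m

def sub(a, b, m=M):
    return (a - b) % m

def mul(a, b, m=M):
    return (a * b) % m

def inv(b, m=M):
    # Raises ValueError when b and m are not relatively prime,
    # i.e. exactly when no multiplicative inverse exists.
    return pow(b, -1, m)

def div(a, b, m=M):
    return mul(a, inv(b, m), m)

def centered(a, m=M):
    """Map a residue to the signed representative of smallest magnitude."""
    r = a % m
    return r - m if r > m // 2 else r

print(add(7, 12), neg(7), sub(7, 12), mul(7, 12), inv(7), div(12, 7))
```

The `centered` helper reflects the remark in the Introduction that ordinary integers are represented unambiguously mod M only while their absolute value stays below M/2.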
Because of the nature of modular arithmetic, numbers do not have sizes or magnitude. We cannot say that a particular number is larger than another or that two numbers are close. Tuesday may not be close to Wednesday or come before Wednesday if they occur in different weeks.

As was mentioned in the introduction, for the existence of transforms with the DFT structure given in (5) and having the CCP, it is necessary that an integer exist that is the Nth root of unity. We will now consider this problem using modular arithmetic. First Euler's $\varphi$ function is defined as $\varphi(M)$, the number of integers in $Z_M$ that are relatively prime to M. For M a prime, $\varphi(M) = M - 1$. If M is composite and its prime factored form is denoted by $M = p_1^{r_1} p_2^{r_2} \cdots p_l^{r_l}$, then the general expression for $\varphi$ is

$$\varphi(M) = M(1 - 1/p_1)(1 - 1/p_2) \cdots (1 - 1/p_l).$$

An important theorem known as Euler's theorem states that for every $\alpha$ relatively prime to M

$$\alpha^{\varphi(M)} = 1 \pmod{M}. \tag{9}$$

For M prime this reduces to Fermat's theorem

$$\alpha^{M-1} = 1 \pmod{M} \tag{10}$$

which holds for all nonzero elements of $Z_M$ since they are all relatively prime to M if M is prime.

There are certain roots of unity that are of particular interest. If N is the least positive integer such that

$$\alpha^N = 1 \pmod{M} \tag{11}$$

then $\alpha$ is said to be a root of unity of order N, or simply of order N. In some of the literature $\alpha$ is said to belong to the exponent N, or N is the exponent to which $\alpha$ belongs. Another terminology says $\alpha$ is a primitive Nth root of unity.

If the order of $\alpha$ (the exponent to which $\alpha$ belongs) is equal to $\varphi(M)$, then $\alpha$ is called a primitive root (do not confuse with a primitive Nth root of unity). If M is prime and $\alpha$ is a primitive root, the set of integers $\{\alpha^k \pmod{M},\ k = 0, 1, 2, \ldots, M-2\}$ is the total set of nonzero elements in $Z_M$. Thus all nonzero integers in $Z_M$ can be generated by powers of a primitive root. This characterizes the entire field.

Euler's theorem implies that if $\alpha$ is of order N then N must divide $\varphi(M)$, denoted by $N \mid \varphi(M)$. If M is prime it can be shown that roots of order N exist if and only if $N \mid (M-1)$ and the roots are given by

$$\alpha = \alpha_\varphi^{(M-1)/N} \tag{12}$$

where $\alpha_\varphi$ denotes a primitive root. More generally, if $\alpha$ is a root of order N then

$$\alpha^k \text{ is of order } N/k \text{ if } k \mid N$$
$$\alpha^k \text{ is of order } N \text{ if } N \text{ and } k \text{ are relatively prime.} \tag{13}$$

This implies the number of roots of order N is given by $\varphi(N)$ and, therefore, the number of primitive roots is $\varphi(\varphi(M))$. These relations will allow one to calculate all of the roots of all possible orders from one primitive root. Tables will often list primes and the smallest primitive root for each.

These ideas will become clearer by looking at an example. Consider the field $Z_7$ with arithmetic mod 7. First we will give the first few evaluations of Euler's function.

$$\varphi(1) = 1 \quad \varphi(2) = 1 \quad \varphi(3) = 2 \quad \varphi(4) = 2 \quad \varphi(5) = 4 \quad \varphi(6) = 2 \quad \varphi(7) = 6. \tag{14}$$

Consider raising each element of $Z_7$ to powers from 1 to 6 (mod 7).

  N    1^N  2^N  3^N  4^N  5^N  6^N
  1     1    2    3    4    5    6
  2     1    4    2    2    4    1
  3     1    1    6    1    6    6
  4     1    2    4    4    2    1
  5     1    4    5    2    3    6
  6     1    1    1    1    1    1

This illustrates several very interesting features. Consider the various roots of order N.

  N    Roots of order N
  1    1
  2    6
  3    2, 4
  6    3, 5

Only those N that divide $\varphi(M) = \varphi(7) = 6$ have roots that belong to them. The number of roots is given by $\varphi(N)$ and the number of primitive roots is $\varphi(\varphi(M)) = 2$; they are 3 and 5. Note that both of the primitive roots generate all the nonzero elements of the field while the other roots generate cyclic subsets with N distinct members. Also note that Euler's theorem (Fermat's theorem in this case) is indeed satisfied in that all elements raised to the 6th power are congruent to unity, and (13) does generate all the roots of order N from the primitive roots. Also note that every nonzero integer $\alpha$ has an inverse $\alpha^{M-2}$. For a nonprime M, $\alpha$ has an inverse given by $\alpha^{\varphi(M)-1}$ if $\alpha$ and M are relatively prime.

By considering a similar example with M a composite rather than a prime, one observes several differences. First, $Z_M$ is not a field since not all elements will have inverses. There is no primitive root that will generate the entire ring, only subsets with $\varphi(M)$ elements.

When considering a nonprime modulus M, $Z_M$ is a ring and inverses exist only for integers relatively prime to M. Let M have the following unique prime power factorization.

$$M = p_1^{r_1} p_2^{r_2} \cdots p_l^{r_l}. \tag{15}$$

When the arithmetic is done mod M, it is in effect done modulo each prime power $p_i^{r_i}$ simultaneously [4], [18]. A set of arithmetic operations can be done either modulo each $p_i^{r_i}$ separately and the final result mod M obtained using the Chinese remainder theorem [4], [16], [18], or alternatively all the operations may be done mod M, but they must be valid operations modulo each $p_i^{r_i}$. An integer $\alpha$ is said to be of order N in $Z_M$ if and only if it is of order N in each $Z_{p_i^{r_i}}$. Here we present some basic results.

$$a \equiv b \pmod{M} \tag{16}$$

is true if and only if

$$a \equiv b \pmod{p_i^{r_i}}, \qquad i = 1, 2, \ldots, l. \tag{17}$$

If we know the residues of an integer a modulo each $p_i^{r_i}$, we can uniquely reconstruct the integer a (mod M) using the Chinese remainder theorem given in the following. Let

$$M_i = M / p_i^{r_i}$$

and

$$d_i \equiv M_i^{-1} \pmod{p_i^{r_i}}$$

then

$$a \equiv \sum_{i=1}^{l} \left(a \bmod p_i^{r_i}\right) M_i\, d_i \pmod{M}. \tag{18}$$

III. NUMBER THEORETIC TRANSFORMS

In this section, the definition and basic conditions for the existence of the NTT will be presented, and in particular the allowed relations between the modulus M, the transform length N, and the basis function $\alpha$ are spelled out.

If we have a length N sequence of numbers, then a transform pair of the form given by

$$X(k) = \sum_{n=0}^{N-1} x(n)\,\alpha^{nk} \tag{19}$$

$$x(n) = N^{-1} \sum_{k=0}^{N-1} X(k)\,\alpha^{-nk} \tag{20}$$

is said to have a DFT structure. By requiring that application of the transform method in (4) results in cyclic convolution, the following theorem can be proven.

Theorem 1: A length N transform having the DFT structure will implement cyclic convolution if and only if there exists an inverse of N and an element $\alpha$, a root of unity of order N, i.e., N is the least positive integer such that

$$\alpha^N = 1.$$

This is a very general result applying to both rings and fields that are finite or infinite, and it has been developed from a variety of points of view [5], [9], [10]. In addition to the CCP, transforms of this type also allow fast computation algorithms of the FFT type when N is highly composite [10].

Theorem 1 is somewhat difficult to use when investigating various possible moduli with modular arithmetic, so an alternate set of conditions will be developed. Let $Z_M$ represent the ring of integers $\{0, 1, 2, \ldots, M-1\}$ with arithmetic carried out mod M.

Let M have the following unique prime power factorization

$$M = p_1^{r_1} p_2^{r_2} \cdots p_l^{r_l} \tag{21}$$

where the $p_i$'s are distinct primes. As pointed out in Section II, when we carry out arithmetic mod M, we are in effect doing it modulo each $p_i^{r_i}$ simultaneously.

Therefore, the length N number theoretic transform having the CCP in $Z_M$ must also have the CCP in $Z_{p_i^{r_i}}$ for $i = 1, 2, \ldots, l$. This requires that (mod $p_i^{r_i}$) an integer of order N must exist in $Z_{p_i^{r_i}}$, i.e., N is the least positive integer such that

$$\alpha^N = 1 \pmod{p_i^{r_i}}, \qquad i = 1, 2, \ldots, l. \tag{22}$$

Furthermore, since the inverse transform requires $N^{-1}$, the inverse of N should exist in $Z_{p_i^{r_i}}$, or, N should be relatively prime to M. Now we investigate the existence of an $\alpha$ of order N in each $Z_{p_i^{r_i}}$. By Euler's theorem (9) and (22), we have

$$N \mid \varphi(p_i^{r_i}), \qquad i = 1, 2, \ldots, l$$

or

$$N \mid p_i^{r_i - 1}(p_i - 1). \tag{23}$$

Since N is relatively prime to M (or the $p_i$'s),

$$N \mid (p_i - 1), \qquad i = 1, 2, \ldots, l$$

and

$$N \mid \gcd(p_1 - 1,\ p_2 - 1,\ \ldots,\ p_l - 1).$$

We define $O(M)$ as the greatest common divisor (gcd) of the $(p_i - 1)$

$$O(M) \triangleq \gcd\{p_1 - 1,\ p_2 - 1,\ \ldots,\ p_l - 1\}. \tag{24}$$

Therefore,

$$N \mid O(M). \tag{25}$$

Equation (25) gives the necessary condition for the existence of a transform of length N in the arithmetic mod M. Now consider the converse of it. If $N \mid O(M)$ or $N \mid \varphi(p_i^{r_i})$, then there exist integers $\alpha_i$ (mod $p_i^{r_i}$) of order N in $Z_{p_i^{r_i}}$. Using these $\alpha_i$ we can construct transforms (mod $p_i^{r_i}$) which have the DFT structure of (19) and are invertible. Combining these transforms by the Chinese remainder theorem (18) one can obtain a transform (mod M) having the CCP in $Z_M$. Alternatively, one can combine the $\alpha_i$'s by the Chinese remainder theorem to obtain an $\alpha$ (mod M) of order N in $Z_M$ and construct the final transform using this $\alpha$. The results will be identical. Therefore, (25) is the necessary and sufficient condition for the existence of an invertible transform of length N which has the CCP mod M. This is stated in the form of a theorem [9], [15].

Theorem 2: A length N transform having the DFT structure will implement cyclic convolution mod M if and only if

$$N \mid O(M).$$

This also establishes the maximum transform length in $Z_M$ as

$$N_{\max} = O(M). \tag{26}$$

This is a very important theorem that states exactly what the possible transform lengths for a given modulus are.

Although both theorems as stated here assume the DFT structure of (19), they hold for any general transform having the CCP [10]. It is possible in a ring with a composite modulus to have a transform with the CCP but not the DFT structure.¹ Transforms of this sort do not allow an FFT-type fast algorithm and, therefore, do not seem promising.

¹ An example of a transform in a ring of integers having the CCP but not the DFT structure was produced by G. Kopec at M.I.T. and led to this observation.

For number theoretic transforms to be attractive in comparison to other implementations of convolution, they should be computationally efficient. There are three requirements that will be considered. First, N should be highly composite (preferably a power of 2) for a fast FFT-type algorithm to exist and should be large enough for practical sequence lengths. Second, since complex multiplications take most of the computational effort in calculating the FFT, it is important that the multiplication by powers of $\alpha$ be a simple operation. This is possible if the powers of $\alpha$ have binary representations with very few bits; preferably $\alpha$ should also be a power of two, so that multiplication by a power of $\alpha$ reduces to a word shift. Third, in order to facilitate arithmetic mod M, M should also have a binary representation with very few bits and should be large enough to prevent overflow.

Although the class of all possible number theoretic transforms seems very large at first consideration, closer examination shows that very few seem to satisfy the aforementioned criteria. The parameters that must be chosen are M, N, and $\alpha$. Unfortunately the conditions given by Theorems 1 and 2 do
not give a systematic way of determining the "best" choices. As a result one must use intuition, insight, and a bit of searching. Usually an M is selected and the resulting possible N and $\alpha$ are then examined.

First we see that if M is even, it has a factor of 2 and, therefore, $O(M)$ and $N_{\max}$ are 1, which implies M should be odd. If M is a prime then $O(M) = M - 1$, which is as large as one could hope for in a field of M integers. For $M = 2^k - 1$, let k be a composite PQ, where P is prime. Then $2^P - 1$ divides $2^{PQ} - 1$ and the maximum possible length of the transform will be governed by the length possible for $2^P - 1$. Therefore, only the prime k need to be considered interesting. Numbers of this form are known as Mersenne numbers, and Rader [14] has discussed convolution using Mersenne numbers in detail. For Mersenne number transforms, it can be shown that transforms of length at least 2P exist and the corresponding $\alpha$ is -2. Mersenne number transforms are not of as much interest because 2P is not highly composite and, therefore, we do not have fast FFT-type computational algorithms.

For $M = 2^k + 1$ and k odd, 3 divides $2^k + 1$ and the largest possible transform length is 2; thus we consider only k even. Let k be $s2^t$, where s is an odd integer. Then $2^{2^t} + 1$ divides $2^{s2^t} + 1$ and the length of the possible transform will be governed by the length possible for $2^{2^t} + 1$. Therefore, integers of the form $M = 2^{2^t} + 1$ are of interest. These numbers are known as Fermat numbers and will be discussed in detail in this paper. Fermat numbers seem to be optimum in the sense of having transforms whose length is interesting while the word size is moderate. Numbers of the form $2^{s2^t} + 1$ are also of limited interest and are discussed in Section IX. A systematic investigation of those M which require more than a two-bit representation is difficult. Our preliminary investigation in that direction has not been very encouraging.

IV. FERMAT NUMBER TRANSFORMS

In this section, we consider one of the most promising number theoretic transforms, where the modulus is chosen to be a Fermat number

$$M = F_t = 2^{2^t} + 1 = 2^b + 1, \qquad b = 2^t \tag{27}$$

and $F_t$ is called the tth Fermat number. Originally, Fermat conjectured [16] that these numbers were all prime, but unfortunately not only was the conjecture wrong, it seems that only $F_0$ through $F_4$ are prime and all the others are composite. The first few values are:

  F0 = 3                                                   (prime)
  F1 = 5                                                   (prime)
  F2 = 17                                                  (prime)
  F3 = 257                                                 (prime)
  F4 = 65 537                                              (prime)
  F5 = 4 294 967 297 = 641 × 6 700 417                     (composite)
  F6 ≈ 1.84 × 10^19 = 274 177 × 67 280 421 310 721         (composite)

Number theoretic transforms with a Fermat number as a modulus are called Fermat number transforms (FNT).

As discussed in the last section, for the FNT of length N to exist, N must divide $O(F_t) = N_{\max}$. Now, we consider transform lengths possible in arithmetic modulo various Fermat numbers and also give the corresponding values of $\alpha$.

Since Fermat numbers up to $F_4$ are prime, $O(F_t) = 2^b$, and we can have an FNT for any length $N = 2^m$, $m \le b$. For these Fermat primes the integer 3 is an $\alpha$ of order $N = 2^b$, allowing the largest possible transform length. There are $2^{b-1} - 1$ other integers which are also of order $2^b$ and can be obtained from (13). The integer 2 is of order $N = 2b = 2^{t+1}$. If $\alpha$ is taken as 2 or a power of 2, all the powers of $\alpha$ would be some powers of 2, and for these cases, as discussed in the last section and in [14], the FNT can be computed very efficiently and is called the Rader transform (RT).

To better see the character of these prime moduli consider an example for $F_2$ similar in manner to that in Section II. If the modulus is $M = F_2 = 17$ then

[Table: powers $\alpha^N \pmod{17}$ of the nonzero elements of $Z_{17}$.]

Here we see that 3 and 6 are primitive roots that will generate the entire field $Z_{17}$. The value 2 is of order 8 and 4 is of order 4. Also note that $6 = \sqrt{2}$ in the sense that $6^2 = 2 \pmod{17}$.

For digital filtering applications, the composites $F_5$ (b = 32) and $F_6$ (b = 64) also seem to be practical. Lucas [19] has proven that every prime factor of a composite $F_t$ is of the form $K2^{t+2} + 1$. Therefore, $2^{t+2}$ divides $O(F_t)$ for $t > 4$. In particular it can be verified that for $F_5$ and $F_6$, $O(F_t) = 2^{t+2}$. Therefore, for these choices of Fermat numbers, the maximum possible transform length is $N = 2^{t+2} = 4b$. Also, we assert that $\alpha_{4b}$ given by (28) is of order 4b in $Z_{F_t}$, $t \ge 2$.

$$\alpha_{4b} = 2^{b/4}\left(2^{b/2} - 1\right). \tag{28}$$

We denote this $\alpha_{4b}$ as $\sqrt{2}$ because

$$\alpha_{4b}^2 = 2 \pmod{F_t}.$$

The proof that $\alpha_{4b}$ given by (28) is of order 4b with respect to any factor of $F_t$ is given in [9]. Any odd power of $\sqrt{2}$ will also be of order $2^{t+2}$. By raising $\sqrt{2}$ to the $(2^{t+2-m})$th power, we obtain an integer $\alpha$ of order $2^m$, $m \le t + 2$.

Table I below gives values of N for the two most important values of $\alpha$ and also gives the maximum possible N for the most practical values of b.

For FNT's with a prime or composite modulus we see that $\alpha = 2$ or a power of 2 is possible for sequence lengths up to $N = 2b = 2^{t+1}$. This is a very desirable situation since N is highly composite, allowing an FFT type algorithm, and all multiplications by powers of $\alpha$ are simple word shifts. If $\alpha = \sqrt{2}$ is used then sequences of length $N = 4b = 2^{t+2}$ are possible, but one stage in the FFT algorithm will require two shifts [9]. This $\alpha = \sqrt{2}$ and the resulting N = 4b give the maximum length possible for $F_5$ and $F_6$; however, for prime $F_t$ further increases in N are possible up to $N = 2^b$ if more stages of the FFT algorithm are allowed to have multiplication rather than simple word shifts. From this example it is seen that $\alpha = \sqrt{2} = 6$ for $M = F_2$ gives the maximum possible N.
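The orders quoted above, the definition of $O(M)$ in (24), and the claim that multiplication by a power of 2 modulo $F_t$ is a word shift can all be checked numerically. The following is a rough sketch under stated assumptions: the function names (`order`, `fermat`, `sqrt2`, `max_length`, `mul_pow2`) are hypothetical, the order is found by brute force rather than by any method from the paper, and the prime factors of $F_5$ are those listed above.

```python
from math import gcd

def order(a, m):
    """Multiplicative order of a modulo m, found by repeated multiplication."""
    if gcd(a, m) != 1:
        raise ValueError("a is not relatively prime to m, so it has no order")
    n, x = 1, a % m
    while x != 1:
        x = (x * a) % m
        n += 1
    return n

def fermat(t):
    """The t-th Fermat number F_t = 2^b + 1 with b = 2^t, as in (27)."""
    return 2 ** (2 ** t) + 1

def sqrt2(t):
    """alpha_4b of (28): 2^(b/4) * (2^(b/2) - 1) mod F_t, for t >= 2."""
    b = 2 ** t
    return (2 ** (b // 4) * (2 ** (b // 2) - 1)) % fermat(t)

def max_length(prime_factors):
    """O(M) of (24): gcd of (p - 1) over the distinct prime factors p of M."""
    g = 0
    for p in prime_factors:
        g = gcd(g, p - 1)
    return g

def mul_pow2(x, k, t):
    """x * 2^k mod F_t for 0 <= x < F_t and 0 <= k < b, without a multiply.

    The shifted value is split as hi * 2^b + lo; since 2^b = -1 (mod F_t)
    the product is lo - hi, and the final % only folds a negative
    difference back into the range 0 .. F_t - 1.
    """
    b = 2 ** t
    y = x << k
    lo, hi = y & ((1 << b) - 1), y >> b
    return (lo - hi) % fermat(t)

for t in range(2, 6):
    F, b = fermat(t), 2 ** t
    s = sqrt2(t)
    print(t, b, order(2, F), s, order(s, F))
```

For each t the printed orders of 2 and of $\sqrt{2}$ should equal 2b and 4b respectively, matching the statements around (28); `max_length([641, 6700417])` evaluates $O(F_5)$ from the factorization given in the text.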
TABLE I
PARAMETERS FOR SEVERAL POSSIBLE IMPLEMENTATIONS FOR FNTs

  t    b    F_t         N for α = 2*   N for α = √2   N_max    α for N_max
  3    8    2^8 + 1     16             32             256      3
  4    16   2^16 + 1    32             64             65536    3
  5    32   2^32 + 1    64             128            128      √2
  6    64   2^64 + 1    128            256            256      √2

* This case corresponds to the Rader Transform.

Because of the nature of modular arithmetic discussed in Section II, the FNT coefficients do not seem to have any physical meaning. Although the signal for which the FNT is being taken may be very small, its FNT coefficients may lie anywhere between 0 and $F_t - 1$. This is because the concept of magnitude does not exist. This also means that the concept of closeness of two numbers does not exist in modular arithmetic. Therefore, approximations or roundings are not allowed in modular arithmetic. A seemingly small approximation in the transform domain may introduce serious error in the final result. But, because of the nature of modular arithmetic, there is no need for approximation. During various stages of the computation each accumulation of signal overflows many times. But still the end result of the convolution will be exact if the input signals are properly bounded. Some of the properties of the FNTs are given in [9, appendix A].

Example

To make the ideas of this section more clear, we now present an example. This example will illustrate several points: treatment of negative values in the data, the structure of the transform and the inverse transform matrix, negative powers of $\alpha$, frequent overflow during computation, meaninglessness of the transform values, and exactness of the final answer. This example will not demonstrate the efficient implementation of the FNT using binary arithmetic.

Consider two sequences x = (2, -2, 1, 0) and h = (1, 2, 0, 0), whose convolution is desired. From the overflow consideration, it is sufficient if we work modulo $F_2 = 17$. We want N = 4; for $F_2$ the integer 2 is of order 8, therefore $2^2 = 4$ is an $\alpha$ of order 4.

The transformation matrix T is given by

$$T = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 4 & -1 & -4 \\ 1 & -1 & 1 & -1 \\ 1 & -4 & -1 & 4 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 4 & 16 & 13 \\ 1 & 16 & 1 & 16 \\ 1 & 13 & 16 & 4 \end{bmatrix} \pmod{17}.$$

Since $4^{-1} = -4 \pmod{17}$, the inverse transformation matrix $T^{-1}$ is given by

$$T^{-1} = 4^{-1}\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -4 & -1 & 4 \\ 1 & -1 & 1 & -1 \\ 1 & 4 & -1 & -4 \end{bmatrix} = 13\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 13 & 16 & 4 \\ 1 & 16 & 1 & 16 \\ 1 & 4 & 16 & 13 \end{bmatrix} \pmod{17}.$$

The transforms of x and h are given by

$$X = Tx = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 4 & 16 & 13 \\ 1 & 16 & 1 & 16 \\ 1 & 13 & 16 & 4 \end{bmatrix} \begin{bmatrix} 2 \\ 15 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 10 \\ 5 \\ 9 \end{bmatrix} \pmod{17}.$$

Note that in x, -2 was represented by -2 + 17 = 15. Similarly, H = (3, 9, 16, 10) and the term-by-term product is Y = X · H = (3, 90, 80, 90) = (3, 5, 12, 5) (mod 17).

Taking the inverse transform of Y,

y = (2, 2, 14, 2) (mod 17).

According to our assumption, integers are supposed to lie between -8 and 8. Therefore, 14 must be represented as 14 - 17 = -3. This gives y = (2, 2, -3, 2), which is the correct answer. Also, note that y is a symmetric sequence; therefore, Y is also a symmetric sequence. Other than this, the transform values seem to have no interpretation.

For many applications a direct application of the FNT to implement convolution will result in a significant improvement over any alternative methods. There are many other situations where the constraints of the transform are too severe.

If data magnitude or machine constraints dictate a certain word length and hence a certain $F_t$, the allowed sequence length N may be too short. If input data magnitude and filter length indicate a possible output magnitude that would exceed $F_t/2$, then overflow becomes a problem. If b-bit words are used with a modulus of $F_t = 2^b + 1$, the machine can represent $2^b$ integers but the transform needs $2^b + 1$. We now consider several partial solutions to these problems.
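The worked example of Section IV can be reproduced directly from the transform pair (19), (20). The sketch below is illustrative only: the function names are hypothetical, the transform is evaluated by the direct $O(N^2)$ sums rather than by an FFT-type algorithm or word shifts, and the signed output range is an assumption matching the "between -8 and 8" convention of the example.

```python
def ntt(x, alpha, M):
    """Direct evaluation of X(k) = sum_n x(n) * alpha^(n k) mod M, as in (19)."""
    N = len(x)
    return [sum(x[n] * pow(alpha, n * k, M) for n in range(N)) % M
            for k in range(N)]

def intt(X, alpha, M):
    """Direct evaluation of x(n) = N^-1 * sum_k X(k) * alpha^(-n k) mod M, as in (20)."""
    N = len(X)
    n_inv = pow(N, -1, M)        # inverse of N; must exist (Theorem 1)
    a_inv = pow(alpha, -1, M)    # alpha^-1, used for the negative powers
    return [n_inv * sum(X[k] * pow(a_inv, n * k, M) for k in range(N)) % M
            for n in range(N)]

def cyclic_convolve(x, h, alpha, M):
    """Cyclic convolution via transform, pointwise product, inverse transform."""
    X = ntt([v % M for v in x], alpha, M)   # negative inputs become residues
    H = ntt([v % M for v in h], alpha, M)
    Y = [(a * b) % M for a, b in zip(X, H)]
    y = intt(Y, alpha, M)
    # Map residues above M/2 back to negative integers.
    return [v - M if v > M // 2 else v for v in y]

x = [2, -2, 1, 0]
h = [1, 2, 0, 0]
print(ntt([v % 17 for v in x], 4, 17))   # transform of x with -2 stored as 15
print(ntt(h, 4, 17))                     # transform of h
print(cyclic_convolve(x, h, 4, 17))      # signed cyclic convolution
```

The result is exact only while the true output magnitudes stay below M/2; with larger inputs the same code returns the convolution folded modulo 17, which is the overflow problem taken up next.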
V. METHODS FOR CONVOLVING LONG SEQUENCES AND FOR AVOIDING OVERFLOW

Arithmetic mod $F_t$ can be implemented using a $b = 2^t$ bit representation of integers with some provision for representing $2^b$. We have seen that the maximum length of sequences which can be cyclically convolved using the FNT with $\alpha = 2$ is $N = 2b$, and therefore the length of sequences which can be convolved is proportional to the word length in bits. Thus, for long sequences, the word length requirement may be excessive. Rader [14] suggested using a two-dimensional convolution scheme to convolve long one-dimensional sequences, and Agarwal and Burrus [15], [23] presented such a two-dimensional convolution scheme. Using this scheme, cyclic convolution of length N = LP is implemented as a two-dimensional cyclic convolution of length 2L by P. This two-dimensional cyclic convolution can be implemented using a two-dimensional FNT [15], [23] defined similarly to the one-dimensional transform. Using this two-dimensional scheme, the word length required is proportional to the square root of the length of the sequences to be convolved, which would give a maximum sequence length of $8b^2$, rather than $4b$. If P is taken as the maximum possible length 4b, and 2L is a small integer, then either direct convolution or another high-speed algorithm could be employed [23] to compute convolution along the short dimension, and the one-dimensional FNT could be used along the long dimension. Computationally this combination can be very efficient, as will be shown in the implementation in Section VIII. Table II lists the maximum lengths for two-dimensional transforms. This approach requires approximately a factor of two increase in computation and storage requirements over a direct one-dimensional implementation.

TABLE II
MAXIMUM ONE-DIMENSIONAL CYCLIC CONVOLUTION LENGTHS USING TWO-DIMENSIONAL FNT OR RT

Word Length b    N for α = 2    N for α = √2
16               512            2048
32               2048           8192
64               8192           32768

Another approach to achieving longer transforms than allowed with $\alpha = 2$ is to use $\alpha = \sqrt{2}$ in (28). It can be shown [9], [15] that $\alpha = \sqrt{2}$ is of order $N = 4b = 2^{t+2}$ and that multiplication by $\sqrt{2}$ requires two word shifts.

Examination of the FFT algorithm [1] shows that if $\alpha = \sqrt{2}$ is used, only one stage will require multiplication by odd powers of $\sqrt{2}$, and from [9] this can be done with two word shifts. The other stages will multiply by even powers of $\sqrt{2}$ and, therefore, use a single shift as for the case with $\alpha = 2$. This modification is relatively simple and allows a doubling of the allowed N. Note in the example for $F_2$ that $\sqrt{2} = 6$, which also gives $N = N_{\max}$.

Each additional square root of $\alpha$ results in a doubling of the allowed N (up to $N = N_{\max}$) and adds an additional stage of calculation to the FFT algorithm. Unfortunately, beyond $\sqrt{2}$ each stage will require a general multiplication. For the case where $F_t$ is a prime, if a few stages of multiplication can be allowed, then N can be increased [25]. For $F_5$ and $F_6$, $\alpha = \sqrt{2}$ gives the maximum N.

This use of $\alpha \ne 2$ can be combined with the two-dimensional methods to give various desired N.

Another possible problem arises because of the modular arithmetic. In the ring of integers mod M, conventional integers can be unambiguously represented only if their absolute value is less than M/2. If the input integer sequences x(n) and h(n) are so scaled that |y(n)| never exceeds M/2, we would get the same results by implementing convolution in the ring of integers modulo M as that obtained with normal arithmetic. This is similar to the overflow constraint in fixed-point digital machines. In most digital filtering applications, h(n) represents the impulse response and is known a priori; also the maximum magnitude of the input signal is usually known. In this situation, we can bound the peak output magnitude by

$$|y(n)| \le |x(n)|_{\max} \sum_{k=0}^{N-1} |h(k)|. \tag{29}$$

This may well require a longer word length than is possible or practical.
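To make the word-length consequence of the peak-output bound concrete, the following sketch multiplies the maximum input magnitude by the sum of the impulse-response magnitudes and then picks the smallest word length $b = 2^t$ for which the bound stays below $F_t/2$. The function names and the sample filter are hypothetical and chosen only for illustration.

```python
def peak_output_bound(x_max, h):
    """Upper bound on |y(n)| for any input with |x(n)| <= x_max."""
    return x_max * sum(abs(v) for v in h)


def min_fermat_word_length(bound):
    """Smallest b = 2**t such that bound < (2**b + 1) / 2."""
    b = 1
    while 2 * bound >= 2**b + 1:
        b *= 2
    return b


h = [1, -3, 5, -3, 1]   # hypothetical impulse response
x_max = 127             # hypothetical peak input magnitude

bound = peak_output_bound(x_max, h)
b = min_fermat_word_length(bound)
print(bound, b)
```

Because b is restricted to powers of 2, a bound that only slightly exceeds $F_t/2$ forces a doubling of the word length, which is the difficulty the remainder of this section addresses.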
One possible solution to this overflow problem involves segmenting the words into shorter blocks and convolving them separately [9]:

$$x(n) = x_2(n) + x_1(n)\,2^k, \qquad |x_2(n)| < 2^k$$
$$h(n) = h_2(n) + h_1(n)\,2^k, \qquad |h_2(n)| < 2^k \tag{30}$$

$$y = x * h = (x_1 * h_1)\,2^{2k} + (x_1 * h_2 + x_2 * h_1)\,2^k + x_2 * h_2. \tag{31}$$

Now, since $x_1$, $h_1$, $x_2$, and $h_2$ have roughly half the number of bits, it should be possible to convolve them using approximately half the number of bits. If necessary, a more precise analysis of the above situation could be easily performed. In (31), the last term, in comparison to the first term, is very small and can be neglected. We need to take two transforms for x and two transforms for h; the summation shown within the parentheses can be performed in the transform domain. Finally, we need to take two inverse transforms, one for $x_1 * h_1$ and the other for $(x_1 * h_2 + x_2 * h_1)$.
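The word-segmentation identity can be checked numerically. The sketch below is an illustration under stated assumptions: it uses direct cyclic convolution in place of the FNT, arbitrary sample data, and a hypothetical `split` helper that takes the low part as the non-negative remainder modulo $2^k$.

```python
def cyclic_convolve(a, b):
    """Direct cyclic convolution of two equal-length integer lists."""
    n = len(a)
    return [sum(a[m] * b[(i - m) % n] for m in range(n)) for i in range(n)]


def split(seq, k):
    """Split each v as v = low + high * 2**k with 0 <= low < 2**k."""
    high = [v >> k for v in seq]
    low = [v & ((1 << k) - 1) for v in seq]
    return high, low


def add(a, b):
    return [u + v for u, v in zip(a, b)]


k = 4
x = [100, -37, 250, 9]      # arbitrary sample data
h = [77, 5, -120, 33]

x1, x2 = split(x, k)        # high and low parts of x
h1, h2 = split(h, k)        # high and low parts of h

high = cyclic_convolve(x1, h1)
mid = add(cyclic_convolve(x1, h2), cyclic_convolve(x2, h1))
low = cyclic_convolve(x2, h2)

# Recombine the three partial convolutions with their power-of-two weights.
y_rebuilt = [(a << (2 * k)) + (b << k) + c for a, b, c in zip(high, mid, low)]
y_direct = cyclic_convolve(x, h)
print(y_rebuilt == y_direct)
```

Each partial convolution involves operands of roughly half the original width, which is what allows a shorter word length per transform; dropping `low` gives the approximation described in the text.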
There is another alternative to this problem suggested by Rader [24] and Parks [25]. This is based on the Chinese remainder theorem. The convolution is done modulo two different integers $M_1$ and $M_2$, where $M_1$ and $M_2$ are such that the cyclic convolution in $Z_{M_1}$ and $Z_{M_2}$ is easily implemented on the same machine. The final result mod $M_1 M_2$ is obtained by the Chinese remainder theorem. $M_1$ is usually a Fermat number and the cyclic convolution in $Z_{M_1}$ is computed using the FNT. Rader [24] suggested using $M_2$ a power of 2; in that case the convolution in $Z_{M_2}$ is computed by taking the sequences mod $M_2$, convolving them in $Z_{M_1}$ using the FNT, and then reducing them back to $Z_{M_2}$. In this case $M_2$ should be small enough so that no error is introduced by implementing convolution mod $M_2$ in $Z_{M_1}$. This requires

$$N M_2^2 < M_1. \tag{32}$$

Parks [25] suggested that $M_2$ could be a Fermat number, just smaller than $M_1$. In this case the convolution in $Z_{M_2}$ can also be done using an FNT. Furthermore, the same machine can be utilized to compute the FNT in $Z_{M_2}$. Because $M_2 \mid (M_1 - 2)$, arithmetic in $Z_{M_2}$ can be carried out in $Z_{(M_1 - 2)}$. Example: $M_1 = 2^{16} + 1$, $M_1 - 2 = 2^{16} - 1$, and $M_2 = 2^8 + 1$. These ideas can be further extended.
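As an illustration of the two-modulus idea, the sketch below convolves the same data modulo $M_1 = 2^{16}+1$ and $M_2 = 2^8+1$ and recombines the residues with the Chinese remainder theorem. It is a sketch under assumptions: direct modular convolution stands in for the FNT, the sample data are arbitrary, the helper names are hypothetical, and Python 3.8 or later is assumed for `pow(M1, -1, M2)`.

```python
M1 = 2**16 + 1   # Fermat number F_4
M2 = 2**8 + 1    # Fermat number F_3, which divides M1 - 2


def cyclic_convolve(a, b):
    """Exact cyclic convolution, used here as the reference result."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]


def cyclic_convolve_mod(a, b, m):
    """Cyclic convolution with the result reduced modulo m."""
    return [v % m for v in cyclic_convolve(a, b)]


def crt_centered(r1, r2):
    """Combine r1 (mod M1) and r2 (mod M2) into the centred residue mod M1*M2."""
    t = ((r2 - r1) * pow(M1, -1, M2)) % M2
    v = r1 + M1 * t
    big = M1 * M2
    return v - big if v > big // 2 else v


x = [1200, -900, 1500, 700]     # arbitrary sample data
h = [1100, 1300, -800, 950]

y1 = cyclic_convolve_mod(x, h, M1)
y2 = cyclic_convolve_mod(x, h, M2)
y = [crt_centered(a, b) for a, b in zip(y1, y2)]
y_exact = cyclic_convolve(x, h)
print(y == y_exact)
```

Here the outputs exceed $M_1/2$, so neither modulus alone would give the right answer, yet the combined residue is exact because the outputs stay below $M_1 M_2/2$.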
Still another approach to solving the sequence length N and word length constraints would be to use block processing [3]. By breaking the sequence of length N into smaller blocks, and scaling and processing them separately with the FNT, one can combine the results to get the desired output. This can be viewed as a type of two-dimensional processing.
VI. OVERFLOW AND QUANTIZATION CONSIDERATIONS

As mentioned in Section V, we could perform cyclic convolution modulo an integer M and obtain the correct result if the absolute value of the output never exceeds M/2. If this condition is violated, the resulting error is rather serious. Because of the nature of the modular arithmetic we obtain folding or aliasing of the signal amplitude. This situation could be avoided if the signals are properly quantized (or normalized).
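The amplitude folding can be seen in a very small case. The sketch below is only an illustration with M = 17 and arbitrary sample sequences: one filter keeps the output within ±8 and is recovered exactly, the other produces an output of 10, which folds to -7. The helper names are hypothetical.

```python
M = 17  # small Fermat modulus, so outputs must stay within -8..8


def cyclic_convolve(a, b):
    """Exact cyclic convolution in ordinary integer arithmetic."""
    n = len(a)
    return [sum(a[m] * b[(i - m) % n] for m in range(n)) for i in range(n)]


def centered(v, m=M):
    """Residue of v modulo m, mapped to the range -(m // 2)..(m // 2)."""
    v %= m
    return v - m if v > m // 2 else v


x = [2, -2, 1, 0]
h_safe = [1, 2, 0, 0]    # exact output stays within -8..8
h_large = [5, 6, 0, 0]   # exact output reaches 10

exact_safe = cyclic_convolve(x, h_safe)
folded_safe = [centered(v) for v in exact_safe]

exact_large = cyclic_convolve(x, h_large)
folded_large = [centered(v) for v in exact_large]

print(exact_safe, folded_safe)
print(exact_large, folded_large)
```

Nothing in the folded output signals that an error occurred, which is why the output magnitude has to be bounded before the convolution is carried out.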
Let x(n) and h(n) represent the original signals. They may have fractional parts (bits to the right of the binary point). To make use of the number theoretic transforms these sequences must be integer sequences. This is easily accomplished by merely shifting the binary point all the way to the right. This introduces scale factors in the sequences. The integer sequences $\tilde{x}(n)$ and $\tilde{h}(n)$ are given by

$$\tilde{x}(n) = x(n)\,2^{b_1} \tag{33}$$

$$\tilde{h}(n) = h(n)\,2^{b_2} \tag{34}$$

where $b_1$ and $b_2$, respectively, represent the number of bits to the right of the binary point in the x(n) and h(n) sequences. Then

$$\tilde{y}(n) = \tilde{x}(n) * \tilde{h}(n) = 2^{b_1 + b_2}\, x(n) * h(n). \tag{35}$$

Now we give some upper bounds on the output $\tilde{y}(n)$. These are due to Jackson [21]. The $L_p$ norm of a signal is defined by

$$\|x\|_p = \left[ \frac{1}{N} \sum_{n=0}^{N-1} |x(n)|^p \right]^{1/p}. \tag{36}$$

The output of the cyclic convolution is bounded by

$$|\tilde{y}(n)| \le N\, \|\tilde{x}\|_p\, \|\tilde{h}\|_q, \qquad \frac{1}{p} + \frac{1}{q} = 1, \quad p, q \ge 1. \tag{37}$$

In particular

$$|\tilde{y}(n)| \le N\, \|\tilde{x}\|_2\, \|\tilde{h}\|_2 \tag{38}$$

$$|\tilde{y}(n)| \le N\, \|\tilde{x}\|_1\, \|\tilde{h}\|_\infty \tag{39}$$

$$|\tilde{y}(n)| \le N\, \|\tilde{x}\|_\infty\, \|\tilde{h}\|_1. \tag{40}$$

All these are valid upper bounds, but they differ in the amount of computation required to use them. The computation of the $L_2$ norm requires N multiplications, the computation of the $L_1$ norm requires N additions, and the computation of the $L_\infty$ norm requires N comparisons because $\|x\|_\infty = |x(n)|_{\max}$. Depending on the particular situation any of these bounds can be used. To avoid aliasing error $|\tilde{y}(n)|_{\max}$ should be less than M/2. If the bound thus computed exceeds the maximum allowable value, i.e., M/2, either or both of the signals are scaled down by truncating or rounding the low-order bits. We choose new integers $\hat{b}_1$ and $\hat{b}_2$ which are less than or equal to $b_1$ and $b_2$, respectively, and obtain integer sequences $\hat{x}(n)$ and $\hat{h}(n)$ as follows:

$$\hat{x}(n) = [\,x(n)\,2^{\hat{b}_1}\,], \qquad \hat{h}(n) = [\,h(n)\,2^{\hat{b}_2}\,] \tag{41}$$

where $[\,\cdot\,]$ represents rounding to the nearest integer. This introduces roundoff noise sources $\epsilon_x(n)$ and $\epsilon_h(n)$:

$$\hat{x}(n) = x(n)\,2^{\hat{b}_1} + \epsilon_x(n) \tag{42}$$

$$\hat{h}(n) = h(n)\,2^{\hat{b}_2} + \epsilon_h(n). \tag{43}$$

The output

$$\hat{y}(n) = \hat{x}(n) * \hat{h}(n) \approx 2^{\hat{b}_1 + \hat{b}_2}\, y(n) \tag{44}$$

$$|\hat{y}(n)| \approx |y(n)\,2^{\hat{b}_1 + \hat{b}_2}| \le M/2. \tag{45}$$

Eq. (45) suggests that we have some freedom in choosing $\hat{b}_1$ and $\hat{b}_2$ with the constraint that $\hat{b}_1 + \hat{b}_2$ is constant. This freedom can be utilized to minimize the effect of the quantization of the sequences on the output noise-to-signal ratio. If the bound (38) is used to minimize the output noise-to-signal ratio, $\hat{b}_1$ and $\hat{b}_2$ should be so selected [22] that

$$\|\hat{x}\|_2 = \|\hat{h}\|_2. \tag{46}$$

The quantization of the sequences can be done on a block-to-block basis.

The output $\hat{y}(n)$ thus obtained would roughly have twice the number of bits. Not all these bits are meaningful. Roughly speaking, the lower half of the bits represent noise. Using quantization noise models, an analysis can be made for the roundoff noise in $\hat{y}(n)$ (due to the quantization of x(n) and h(n)) and an appropriate number of low-order bits of $\hat{y}(n)$ can be discarded for further processing.

VII. ARITHMETIC CONSIDERATIONS
Since the structure of the FNT is similar to that of the DFT, and N is a power of 2, a fast implementation of the FNT similar to the FFT with radix 2 exists. As a matter of fact, all we have to do is replace $W = \exp(-j2\pi/N)$ by $\alpha$ in any FFT algorithm [1], [7].
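The substitution of $\alpha$ for W can be sketched with a textbook recursive radix-2 decimation-in-time routine. This is a minimal illustration, not the implementation reported in this paper: the function names are hypothetical, the modulus $F_3 = 257$ with $\alpha = 2$ (order 16) is chosen only to keep the numbers small, and Python 3.8 or later is assumed for modular inverses via `pow`.

```python
def fnt(seq, root, modulus):
    """Radix-2 transform X(k) = sum x(n) * root**(n*k) mod modulus.

    len(seq) must be a power of 2 and root must have order len(seq).
    """
    n = len(seq)
    if n == 1:
        return [seq[0] % modulus]
    root_sq = root * root % modulus
    even = fnt(seq[0::2], root_sq, modulus)
    odd = fnt(seq[1::2], root_sq, modulus)
    out = [0] * n
    twiddle = 1
    for k in range(n // 2):
        t = twiddle * odd[k] % modulus
        out[k] = (even[k] + t) % modulus
        out[k + n // 2] = (even[k] - t) % modulus   # root**(n/2) = -1
        twiddle = twiddle * root % modulus
    return out


def fnt_convolve(x, h, root, modulus):
    """Cyclic convolution via forward transforms, product, inverse transform."""
    n = len(x)
    X = fnt(x, root, modulus)
    H = fnt(h, root, modulus)
    Y = [a * b % modulus for a, b in zip(X, H)]
    y = fnt(Y, pow(root, -1, modulus), modulus)
    n_inv = pow(n, -1, modulus)
    half = modulus // 2
    result = []
    for v in y:
        v = v * n_inv % modulus
        result.append(v - modulus if v > half else v)   # centred residue
    return result


F3 = 2**8 + 1                  # Fermat number with b = 8
x = list(range(-3, 13))        # 16 arbitrary samples
h = [1, -2, 1] + [0] * 13      # short impulse response, zero padded
y = fnt_convolve(x, h, 2, F3)
print(y)
```

With the root equal to 2, every twiddle multiplication in the loop is a multiplication by a power of 2; the generic `*` and `%` operators are used here only for brevity.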
In computing the FNT, arithmetic is done mod $2^b + 1$. In this arithmetic the only allowed integers are $0, 1, \ldots, 2^b$, and all integers whose absolute values do not exceed $2^{b-1}$ can be represented unambiguously. Negative integers are represented by adding $2^b + 1$ to them; this is similar to the two's complement representation of negative integers. Using a b-bit register, all integers from 0 to $2^b - 1$ can be represented. A very unfortunate problem arises in that $2^b$ (which is also $-1$) cannot be represented.
If $-1$ is encountered in the data it is rounded either to 0 or to $-2$. This is equivalent to introducing some initial quantization noise and will not have any serious effect on the result. But if, during the course of the computation, $2^b$ appears as a result of some arithmetic operation (mod $F_t$), it cannot be represented correctly and will give rise to serious error in the output for that block. If the data are uncorrelated, the probability that this number will appear after an arithmetic operation is approximately $2^{-b}$. For digital filtering applications, b would typically be 32 or 64; in these cases the probability of occurrence of $2^b$ is extremely small. If b = 32 and N = 64, the probability that a particular block is erroneous is roughly $10^{-7}$. For some filtering operations, this may be permissible. If, however, this possible error cannot be allowed, an extra bit will be required to represent $2^b$ at the expense of slightly more complicated hardware.
The arithmetic unit has to be wired to do arithmetic modulo $F_t$. Therefore, whenever a carry bit ($2^b = -1 \bmod F_t$) is encountered it should be subtracted from the data. The arithmetic will be of the carry-subtract type (similar to the carry-add arithmetic required for 1's complement hardware). This is the only major difference as compared to conventional hardware units. If this is implemented, the hardware will automatically do arithmetic modulo Fermat numbers.
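The carry-subtract rule can be mimicked in software. The sketch below is an illustration under assumptions: the word length b = 8 and the names `reduce_fermat`, `add_mod`, and `mul_pow2` are hypothetical, and Python integers stand in for registers that can hold the value $2^b$. Because $2^b \equiv -1$, each successive b-bit field of a number is alternately added and subtracted.

```python
B = 8                 # word length in bits (b = 2**t with t = 3)
MASK = (1 << B) - 1   # selects the low b bits
F = (1 << B) + 1      # Fermat number 2**8 + 1 = 257


def reduce_fermat(v):
    """Reduce v >= 0 modulo 2**B + 1 using shifts, masks, adds and subtracts."""
    acc, sign = 0, 1
    while v:
        acc += sign * (v & MASK)   # 2**B = -1, so fields alternate in sign
        sign = -sign
        v >>= B
    while acc < 0:
        acc += F
    while acc >= F:
        acc -= F
    return acc


def add_mod(a, b):
    """Addition mod F: the carry out of the low b bits is subtracted."""
    return reduce_fermat(a + b)


def mul_pow2(a, k):
    """Multiply by 2**k mod F: shift, then subtract the higher order bits."""
    return reduce_fermat(a << k)


print(add_mod(200, 100), mul_pow2(200, 5))
```

The same reduction serves both addition and multiplication by a power of 2, so no general multiplier appears anywhere in the sketch.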
In [9], it is explained how to do various basic arithmetic operations modulo Fermat numbers. If $\alpha$ is taken as 2 or a power of 2, the only multiplications involved in computing the fast FNT are those by some powers of 2. These multiplications are particularly simple to implement in arithmetic modulo $F_t$. They are implemented as a double register shift followed by the subtraction of the higher order bits [9]. This is much simpler than complex multiplication, which is the main reason why the FNT is much faster than the FFT. For implementation of the fast FNT, unlike the FFT, we do not need to store the powers of $\alpha$ (if $\alpha$ is taken as 2 or a power of 2).

VIII. COMPARISON WITH THE FFT

As noted in the previous section, computing the FNT is a very simple operation on a binary machine. Now let us compare the complexity of various basic operations involved in computing the FNT vis-a-vis the FFT. If the two sequences x(n) and h(n) have $b_1$ and $b_2$ bit representations, respectively, and are of length N, then the output y(n) would need no more than a $(b_1 + b_2 + \log_2 N)$ bit representation. To obtain the correct result, $b \ge b_1 + b_2 + \log_2 N$. In Section V, we have given a better bound on the output. In Section VI, we have given other bounds. Roughly speaking, we need twice the number of bits to carry out the convolution using the FNT as compared to the fixed-point FFT implementation of the convolution. But in the DFT, every data point is treated as a complex number and therefore requires two words, one for the real part and one for the imaginary part. Thus, in effect, the hardware requirement for the two transforms is about the same. Although for real data it is possible to make use of the symmetry properties of the DFT, this requires extra computation and for the purpose of comparison it will be ignored, even though we have taken it into account for our IBM 370/155 implementation to be discussed later. Therefore, we shall assume that in the FFT implementation, each data point is represented by a b/2 bit real part and a b/2 bit imaginary part.

One b/2 bit complex addition is equivalent to two b/2 bit real additions, which are comparable to a b-bit addition mod $F_t$. Thus, the complexity of addition/subtraction is the same in both the transforms. Similarly, it can be shown that a b/2 bit complex multiplication is comparable to a b-bit multiplication mod $F_t$. Computation of the RT requires multiplications by powers of 2, which, implemented as bit shifts and subtractions, become much simpler operations compared to the complex multiplications required in the FFT implementation.

To compute a length N fast RT, $N \log_2 N$ additions/subtractions and $(N/2) \log_2 (N/2)$ multiplications by some powers of 2 are required, the latter implemented as bit shifts and subtractions. To compute the convolution using the FFT, most of the time is taken in computing the complex multiplications required to compute the transforms. A comparison with the RT reveals that these complex multiplications are replaced by bit shifts and subtractions, which are much faster operations. This results in considerable computational savings in the implementation of convolution. The computation required to multiply the two transforms is about the same for both the implementations. To convolve long sequences using the two-dimensional RT the computational effort and required storage increase by, at the most, a factor of 2. Still, the FNT implementation of convolution is much faster as compared to the FFT implementation.

These transforms were implemented in assembler language on an IBM 370/155 which has a 32-bit word length [9]. The results were compared with an efficient FFT program for computing convolution which makes use of the symmetry of the DFT for real data (see Table III) [20].

TABLE III
CYCLIC CONVOLUTION TIMINGS FOR REAL SEQUENCES OF LENGTH N

Length N:         32, 64, 128, 256, 256, 512, 1024, 2048
FFT (ms):         16, 31, 245, 530
FNT or RT (ms):   3.3, 1.4, 16.6 (a), 80.0 (c), 166.0 (c), 340.0 (c), 720.0 (c)

(a) Using α = √2.
2 by 128 convolution.
(c) Using two-dimensional RT.

IX. GENERALIZATIONS, VARIATIONS, AND OTHER RESULTS

A. Other Choices for M

In this paper, we have primarily discussed number theoretic transforms in the rings of integers modulo Fermat numbers. These numbers seem to be the best choice for implementation on binary computers. Nevertheless, any odd integer M can be used, as was discussed in Section III. Rader [14] proposed the use of Mersenne numbers $M_p = 2^p - 1$, where p is a prime. Mersenne number transforms for $\alpha = -2$ have $N = 2p$ and, therefore, do not have an FFT-type fast computational algorithm.

FNTs require the computing word length to be a power of 2. Many computers do not have a word length that is a power of 2. Transforms similar to FNTs exist for many of these situations. For example, on a 24-bit machine, we may perform convolution modulo $M = 2^{24} + 1$. For this M, $\alpha = 2^3$ gives N = 16, and $\alpha = 2^{3/2} = 2^7(2^{12} - 1)$ gives N = 32 as the maximum length.

In general one could take $M = 2^{s 2^t} + 1$, where s is an odd integer. In that case, $\alpha = 2^s$ would give $N = 2^{t+1}$, and $\alpha = (2^s)^{1/2} = 2^{[(s-1)/2 + s 2^{t-2}]}(2^{s 2^{t-1}} - 1)$ would give $N = 2^{t+2}$. In many situations, it may be possible to have transforms of greater length than $2^{t+2}$. For example, taking $M = 2^{40} + 1$, the maximum possible transform length is 256, and taking $M = 2^{80} + 1$, the maximum possible transform length is 1024, but the corresponding $\alpha$'s may not be simple.

For many computers whose word length is not a power of 2, if the above formulation is used to compute transforms analogous to the FNT, the maximum transform length is very small. But, if we are willing to sacrifice the effective word length to some extent, we can increase the maximum transform length significantly. Let b, a multiple of 4, be the word length of the machine. Let

$$M = 2^b + 1 = M_1 M_2. \tag{47}$$

$M_1$ and $M_2$ may be nonprimes. It can be easily proved that

$$O(M) = \gcd\{O(M_1), O(M_2)\}. \tag{48}$$

It may so happen that $M_1$ is a small integer and $O(M_2) \gg O(M)$; therefore, because of the presence of $M_1$ the maximum transform length is being considerably reduced, while at the same time $M_1$ may not be increasing the maximum allowable output ($y_{\max}$) or the effective word length significantly. In this situation we can compute transforms in $Z_{M_2}$ with maximum transform length $O(M_2)$ and maximum allowable output magnitude $M_2/2$. At the expense of reduced output range, we have increased the transform length. Furthermore, arithmetic mod $M_2$ can be conveniently carried out as arithmetic mod M, because $M_2$ is a factor of M. At the end of the computation, we have to reduce the result mod $M_2$.

Table IV shows this factorization of M for several values of b. $\log_2 M_2$ shows the effective word length of the machine.
TABLE IV
FACTORIZATION OF M = 2^b + 1 AS M1 M2 AND THE MAXIMUM TRANSFORM LENGTH CORRESPONDING TO M2

Machine word length b; O(M); M1; M2; effective word length log2 M2 (approx.); Nmax = O(M2); N for α = 2 (2b); N for α = √2 (4b).

b    O(M)   M1                  M2                         log2 M2   Nmax = O(M2)   N (α = 2)   N (α = √2)
12   16     17                  241                        8         240            24          48
20   16     17                  61681                      16        61680          40          80
24   32     257                 97 × 673                   16        96             48          96
28   16     17                  15790321                   24        15790320       56          112
36   16     17 × 241            433 × 38737                24        144            72          144
40   256    257                 4278255361                 32        4278255360     80          160
48   64     65537               193 × 22253377             32        192            96          192
56   32     257                 5153 × 54410972897         48        224            112         224
60   16     17 × 241 × 61681    4562284561                 32        4562284560     120         240
72   32     97 × 257 × 673      577 × 487824887233         48        576            144         288
80   1024   65537               414721 × 44479210368001    64        15360          160         320
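A few rows of Table IV can be checked numerically. The sketch below is an illustration under assumptions: the prime factorizations for b = 12, 24, and 36 are taken from the table, and the helper names are hypothetical. For each row it confirms that $M_1 M_2 = 2^b + 1$, computes $O(M_2)$ as the gcd of $p - 1$ over the prime factors of $M_2$, and finds the orders of 2 and of $2^{b/4}(2^{b/2} - 1)$ modulo $M_2$. `math.prod` needs Python 3.8 or later.

```python
from functools import reduce
from math import gcd, prod


def order(a, m):
    """Multiplicative order of a modulo m (a must be coprime to m)."""
    k, v = 1, a % m
    while v != 1:
        v = v * a % m
        k += 1
    return k


def max_length(primes):
    """O(.) for a product of distinct primes: gcd of (p - 1)."""
    return reduce(gcd, [p - 1 for p in primes])


# b: (prime factors of M1, prime factors of M2), as listed in Table IV
ROWS = {
    12: ([17], [241]),
    24: ([257], [97, 673]),
    36: ([17, 241], [433, 38737]),
}

results = {}
for b, (f1, f2) in ROWS.items():
    m1, m2 = prod(f1), prod(f2)
    assert m1 * m2 == 2**b + 1               # factorization of M
    root2 = 2 ** (b // 4) * (2 ** (b // 2) - 1) % m2
    results[b] = (
        max_length(f2),                      # Nmax = O(M2)
        order(2, m2),                        # expected 2b
        root2 * root2 % m2,                  # expected 2
        order(root2, m2),                    # expected 4b
    )

print(results)
```

The brute-force `order` loop is adequate here only because the orders involved are at most a few hundred.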
Note that for the choices of $M_2$ shown in Table IV, 2 is always of order 2b, and also there exists an integer $\sqrt{2} = 2^{b/4}(2^{b/2} - 1)$ which is of order 4b in $Z_{M_2}$. This leads to an efficient implementation of these transforms, because powers of $\alpha$ are simple and arithmetic mod M is also simple. For these transforms the word length is not a power of 2, but still the transform length 4b is highly composite. The case b = 60 needs special attention: if $M_1$ is taken as 17, then for the remaining $M_2$, $O(M_2) = 240$ but $\sqrt{2}$ is not of order 240. This is because $\sqrt{2}$ is of order 48 in $Z_{241}$ and of order 80 in $Z_{61681}$. For this case, although one can find an integer $\alpha$ which is of order 240 in $Z_{M_2}$, it is not likely to be simple.

Thus far, our discussion was based on the assumption that the computer is a binary computer. Many computers have a decimal representation of integers, and for these computers it will be efficient if the arithmetic is done mod $M = 10^b + 1$ and $\alpha$ and powers of $\alpha$ are powers of 10. We have compiled Table V to be used for decimal computers. Similar tables can be compiled for other radices also.

TABLE V
SOME PARAMETERS FOR NUMBER THEORETIC TRANSFORMS IN DECIMAL ARITHMETIC

Machine digits b; O(M); M1; M2; effective digits log10 M2 (approx.); Nmax = O(M2); N for α = 10.

b    O(M)   M1          M2                                 log10 M2   Nmax = O(M2)   N (α = 10)
4    8      1           73 × 137                           4          8              8
6    100    101         9901                               4          9900           12
8    16     17          5882353                            7          5882352        16
10   20     101         3541 × 27961                       8          60             20
12   8      73 × 137    99990001                           8          99990000       24
16   32     1           353 × 449 × 641 × 1409 × 69857     16         32             32

B. Complex Number Theoretic Transforms

The integer field $Z_M$ (assuming M to be prime) can be extended to a complex integer field, denoted by $Z_M^c$, if the following equation does not have a solution in $Z_M$:

$$x^2 + 1 = 0. \tag{49}$$

This means that $-1$ does not have a square root in $Z_M$ or, equivalently, that a root of order 4 does not exist in $Z_M$. This implies

$$4 \nmid O(M) = M - 1. \tag{50}$$
Equation (50) is the only condition required for $Z_M^c$ to exist. In $Z_M^c$ every integer is represented as $a + jb$; $a, b \in Z_M$. All the arithmetical operations are done as in normal complex arithmetic with $j^2 = -1$. Both real and imaginary parts are evaluated mod M, separately. The concept of magnitude and phase does not exist in $Z_M^c$. Complex number theoretic transforms (CNT) similar to the NTT exist in $Z_M^c$, and can be used to compute the cyclic convolution of two complex integer sequences. To avoid error due to aliasing, both real and imaginary parts of the output should be separately bounded by M/2. The idea of the CNT has been considered by Reed [26].

Theorem: A transform having the cyclic convolution property in $Z_M^c$ exists if and only if

$$N \mid (M^2 - 1). \tag{51}$$

We will not give a formal proof of this theorem, but we will outline a procedure to find a complex integer $\alpha$ of order N in $Z_M^c$, if N divides $(M^2 - 1)$.
Theorem: In $Z_M^c$,

$$(a + jb)^{M+1} = a^2 + b^2. \tag{52}$$

This theorem can be easily proved. This theorem implies that every complex integer is at most an (M + 1)th root of a real integer in $Z_M^c$. Let $a + jb$ be an Nth root of a real integer, i.e., N is the least positive integer such that

$$(a + jb)^N = \text{a real integer} \tag{53}$$

then by (52)

$$N \mid (M + 1). \tag{54}$$
How to find an $\alpha$ of order $M^2 - 1$ in $Z_M^c$:

Consider complex integers of the form $(1 + jb)$ and search over $b \in Z_M$ such that $(1 + jb)$ is an (M + 1)th root of a real integer (proof for the existence of such a b can be given). Then,

$$(1 + jb)^{M+1} = 1 + b^2. \tag{55}$$

Let $\alpha_{M-1}$ be a root of order $(M - 1)$ in $Z_M$. Then $1 + b^2$ can be written as some power of $\alpha_{M-1}$:

$$(1 + b^2) = \alpha_{M-1}^{x}. \tag{56}$$

It can be shown that x is odd. Then, $\alpha_{M^2-1}$ given by (57) is of order $(M^2 - 1)$ in $Z_M^c$:

$$\alpha_{M^2-1} = (1 + jb)\,\alpha_{M-1}^{(1-x)/2}. \tag{57}$$

This can be easily proved. By raising $\alpha_{M^2-1}$ to the $((M^2 - 1)/N)$th power, we can find a complex integer $\alpha_N$ of order N in $Z_M^c$ if $N \mid (M^2 - 1)$. It can be easily proved that

$$\gcd(M + 1, M - 1) = 2. \tag{58}$$

Let

$$N_1 = \gcd(M - 1, N), \qquad N = N_1 \times N_2;$$

then $\alpha_N$ will be complex, but $\alpha_{N_1} = \alpha_N^{N_2}$ will be real.
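The search procedure can be traced for a small Mersenne prime. The sketch below is an illustration under assumptions: M = 7 is chosen only because it is small and satisfies $4 \nmid (M - 1)$, complex integers are stored as (real, imaginary) pairs, the helper names are hypothetical, and the exponent $(1 - x)/2$ is reduced modulo $M - 1$ so that it is non-negative.

```python
from math import gcd

M = 7   # Mersenne prime 2**3 - 1; M - 1 = 6 is not divisible by 4


def cmul(u, v):
    """Product of two complex integers (re, im) with j*j = -1, mod M."""
    return ((u[0] * v[0] - u[1] * v[1]) % M, (u[0] * v[1] + u[1] * v[0]) % M)


def cpow(u, e):
    """u raised to the non-negative power e by repeated squaring."""
    result = (1, 0)
    while e:
        if e & 1:
            result = cmul(result, u)
        u = cmul(u, u)
        e >>= 1
    return result


def corder(u):
    """Least k >= 1 with u**k = 1."""
    k, v = 1, u
    while v != (1, 0):
        v = cmul(v, u)
        k += 1
    return k


def real_root_degree(u):
    """Least n >= 1 such that u**n is a real integer."""
    n, v = 1, u
    while v[1] != 0:
        v = cmul(v, u)
        n += 1
    return n


# Root of order M - 1 in Z_M (a primitive root), found by search.
g = next(a for a in range(2, M) if corder((a, 0)) == M - 1)

# Search b such that 1 + jb is an (M + 1)th root of a real integer.
b = next(c for c in range(1, M) if real_root_degree((1, c)) == M + 1)

# Write 1 + b*b as g**x, then scale 1 + jb by g**((1 - x) / 2).
x = next(e for e in range(1, M) if pow(g, e, M) == (1 + b * b) % M)
k = ((1 - x) // 2) % (M - 1)
alpha = cmul((1, b), (pow(g, k, M), 0))     # order M*M - 1

N = 8
alpha_n = cpow(alpha, (M * M - 1) // N)     # order N
n1 = gcd(M - 1, N)
n2 = N // n1
alpha_n1 = cpow(alpha_n, n2)                # real, of order n1

print(g, b, x, alpha, corder(alpha), alpha_n, alpha_n1)
```

For this modulus the search yields a generator of all 48 nonzero elements, and the length-8 root derived from it becomes real after being raised to the power $N_2 = 4$.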
In the FFT algorithm to compute the CNT, the part corresponding to $N_2$ will require complex arithmetical operations but the part corresponding to $N_1$ will require only real arithmetical operations. CNT are good in theory as they offer more choice in transform lengths. But, so far no CNT have been found for which powers of $\alpha$ are simple. CNT do not exist for Fermat numbers, but they exist for Mersenne numbers.
The only extension we have investigated is when

$$M = p_1 p_2 \cdots p_l \tag{59}$$

where the $p_i$ are distinct primes. We have not investigated the case when M contains prime powers. $Z_M$ can be extended to $Z_M^c$ as before if

$$4 \nmid (p_i - 1), \qquad i = 1, 2, \ldots, l. \tag{60}$$

Also the Chinese remainder theorem can be used in $Z_M^c$. It is applied separately to the real and imaginary parts. For CNT to exist in $Z_M^c$, they must exist in $Z_{p_i}^c$ also. This gives the following theorem.

Theorem: A CNT of length N in $Z_M^c$ exists if and only if

$$N \mid \gcd\{p_1^2 - 1,\; p_2^2 - 1,\; \ldots,\; p_l^2 - 1\}. \tag{61}$$

We find $\alpha_i$ of order N in $Z_{p_i}^c$ and then combine them by the Chinese remainder theorem to obtain an $\alpha$ of order N in $Z_M^c$.
C. Application to Two-Dimensional Filtering

Rader [27] has recently discussed the application of the FNT to two-dimensional filtering. In 2-D applications, the length of the impulse response along each dimension is not too large; therefore the FNTs are ideally suited for this application, because the length constraint of the FNTs is not important. Other choices of M discussed in this section can also be used for this application.
REFERENCES

[1] B. Gold and C. M. Rader, Digital Processing of Signals. New York: McGraw-Hill, 1969.
[2] T. G. Stockham, "High speed convolution and correlation," in AFIPS Conf. Proc., 1966 Joint Computer Conf., vol. 28, pp. 229-233 (also in [28]).
[3] C. S. Burrus, "Block realization of digital filters," IEEE Trans. Audio Electroacoust., vol. AU-20, pp. 230-235, Oct. 1972.
[4] D. E. Knuth, The Art of Computer Programming, vol. 2, Seminumerical Algorithms. Reading, Mass.: Addison-Wesley, 1969.
[5] J. M. Pollard, "The fast Fourier transform in a finite field," Math. Comput., vol. 25, pp. 365-374, Apr. 1971.
[6] A. Schönhage and V. Strassen, "Fast multiplication of large numbers," Computing (in German), vol. 7, pp. 281-292, 1971.
[7] J. W. Cooley and J. W. Tukey, "An algorithm for machine calculation of complex Fourier series," Math. Comput., vol. 19, pp. 297-301, 1966 (also in [28]).
[8] A. V. Oppenheim and C. Weinstein, "Effects of finite register length in digital filtering and the fast Fourier transform," Proc. IEEE, vol. 60, pp. 957-976, Aug. 1972.
[9] R. C. Agarwal and C. S. Burrus, "Fast convolution using Fermat number transforms with applications to digital filtering," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-22, pp. 87-97, Apr. 1974.
[10] P. J. Nicholson, "Algebraic theory of finite Fourier transforms," J. Comput. Syst. Sci., vol. 5, pp. 524-547, 1971.
[11] I. J. Good, "The relation between two fast Fourier transforms," IEEE Trans. Comput., vol. C-20, pp. 310-317, Mar. 1971.
[12] D. E. Knuth, "The art of computer programming: errata et addenda," Comput. Sci. Dep., Stanford Univ., Stanford, Calif., Rep. STAN-CS-71-194, pp. 21-26, Jan. 1971.
[13] C. M. Rader, "The number theoretic DFT and exact discrete convolution," presented at IEEE Arden House Workshop on Digital Signal Processing, Harriman, N.Y., Jan. 11, 1972.
[14] C. M. Rader, "Discrete convolution via Mersenne transforms," IEEE Trans. Comput., vol. C-21, pp. 1269-1273, Dec. 1972.
[15] R. C. Agarwal and C. S. Burrus, "Fast digital convolution using Fermat transforms," in Southwest IEEE Conf. Rec., pp. 538-543, Apr. 1973.
[16] O. Ore, Number Theory and Its History. New York: McGraw-Hill, 1948.
[17] G. H. Hardy and E. M. Wright, The Theory of Numbers. Oxford, England: Oxford Univ. Press, 1960.
[18] N. S. Szabo and R. I. Tanaka, Residue Arithmetic and Its Applications to Computer Technology. New York: McGraw-Hill, 1967.
[19] L. E. Dickson, History of the Theory of Numbers, vol. I. Washington, D.C.: Carnegie Institute, 1919, p. 376.
[20] R. C. Singleton, "An algorithm for computing the mixed radix fast Fourier transform," IEEE Trans. Audio Electroacoust., vol. AU-17, pp. 93-103, June 1969 (also in [28]).
[21] L. B. Jackson, "On the interaction of round-off noise and dynamic range in digital filters," Bell Syst. Tech. J., vol. 49, pp. 159-184, Feb. 1970 (also in [28]).
[22] R. C. Agarwal, "On realization of digital filters," Ph.D. dissertation, Dep. Elec. Eng., Rice Univ., Houston, Tex., Dec. 1973.
[23] R. C. Agarwal and C. S. Burrus, "Fast one-dimensional digital convolution by multi-dimensional techniques," IEEE Trans. Acoust., Speech, and Signal Processing, vol. ASSP-22, pp. 1-10, Feb. 1974.
[24] C. M. Rader, private communication.
[25] T. W. Parks, private communication.
[26] I. S. Reed, private communication.
[27] C. M. Rader, "On the application of the number theoretic transforms of high speed convolution to two-dimensional filtering," IEEE Trans. Circuit Theory, to be published.
[28] L. R. Rabiner and C. M. Rader, Eds., Digital Signal Processing. New York: IEEE Press, 1972.
[29] C. M. Rader, "A note on exact discrete Fourier transforms," IEEE Trans. Audio Electroacoust. (Corresp.), vol. AU-21, pp. 558-559, Dec. 1973.
[30] H. Takahasi and Y. Ishibashi, "A new method for exact calculation by a digital computer," Inform. Process. Jap., vol. 1, pp. 28-42, 1962.
