DSP

4. Discrete Fourier Transform (DFT)/Fast Fourier Transform (FFT)
The transformation of discrete data between the time and frequency domain is
quite useful in extracting information from the signal. The DFT expresses signals as a
linear combination of sinusoidal or complex exponential signals with various angular
frequencies. This decomposition of signals allows one to examine the effects of the
system on each signal component. It is based on the Fourier series, which is a linear
weighted sum of harmonically related sinusoids or complex exponentials. The DFT
and its fast computational algorithm, the fast Fourier transform (FFT), are among the
most important tools in signal processing. The DFT is commonly applied to
implement linear filtering, spectral analysis, and correlation analysis.
We will analyze various types of DFT and FFT algorithms by implementing
them on the TMS320C40 quad development system.
4.1 Some Definitions
We first review the Fourier series and the Fourier transform before proceeding
with the DFT.
4.1.1 Fourier Series

The Fourier series of any periodic waveform x(t) [3] is given by
$$x(t) = \sum_{n=-\infty}^{\infty} c_n\, e^{jn\omega t} \tag{4.1}$$
where $c_n$ is the Fourier series coefficient given by
$$c_n = \frac{1}{T_p}\int_{-T_p/2}^{T_p/2} x(t)\, e^{-jn\omega t}\, dt \tag{4.2}$$
Here t is the independent variable, and the frequencies nω are known as the nth
harmonics of ω, where ω=2πf=2π/Tp and Tp is the repetition period [5, 22].
4.1.2 Fourier Transform

The Fourier transform is given by
$$X(\omega) = \int_{-\infty}^{\infty} x(t)\, e^{-j\omega t}\, dt \tag{4.3}$$
whereas the inverse Fourier transform is given by
$$x(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} X(\omega)\, e^{j\omega t}\, d\omega \tag{4.4}$$
Unlike the Fourier series, the Fourier transform does not require that the signal be
periodic [21].
4.2 Discrete Fourier Transform (DFT)
The discrete Fourier transform (DFT) of a discrete-time signal x[n] is defined by
$$X(k) = \sum_{n=0}^{N-1} x[n]\, W_N^{kn}, \qquad k = 0, 1, \ldots, N-1 \tag{4.5}$$
where
$$W_N = e^{-j2\pi/N} \tag{4.6}$$
The factor $W_N$, also known as the twiddle factor, appears in equation (4.5) raised to
the power kn, which can take on any integer value from 0 up to $(N-1)^2$; because
$W_N^N = 1$, only N distinct values occur [20, 23]. Each point of the DFT in equation
(4.5) can be calculated using N complex multiplications and (N-1) complex additions.
Therefore, $N^2$ complex multiplications and N(N-1) complex additions are needed to
compute N DFT values [5]. If we count two real additions for every complex addition,
and four real multiplications plus two real additions for every complex multiplication,
then we need a total of $4N^2 - 2N$ real additions and $4N^2$ real multiplications to
compute the N DFT values.
The inverse discrete Fourier transform (IDFT) is given by
$$x[n] = \frac{1}{N}\sum_{k=0}^{N-1} X(k)\, W_N^{-kn}, \qquad n = 0, 1, \ldots, N-1 \tag{4.7}$$
We can see from equations (4.5) and (4.7) that the only differences between the DFT
and the IDFT are the factor of 1/N and the negative exponent in the IDFT [21].
Therefore, a DFT routine can be converted into an IDFT routine by reversing the order
of the elements of the twiddle-factor table and dividing by N. Since
$W_N^{-k} = W_N^{N-k}$, the negative exponents are obtained simply by reading the
table of twiddle factors in reverse order.
We may express the N-point DFT in matrix form [5] as
$$\mathbf{X}_N = \mathbf{W}_N\, \mathbf{x}_N \tag{4.8}$$
where $\mathbf{W}_N$ is the N×N matrix of linear transformation, $\mathbf{x}_N$ is the N-point vector of the
signal x[n], and $\mathbf{X}_N$ is the N-point vector of frequency samples defined by
$$\mathbf{W}_N = \begin{bmatrix}
1 & 1 & 1 & \cdots & 1 \\
1 & W_N & W_N^{2} & \cdots & W_N^{N-1} \\
1 & W_N^{2} & W_N^{4} & \cdots & W_N^{2(N-1)} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & W_N^{N-1} & W_N^{2(N-1)} & \cdots & W_N^{(N-1)(N-1)}
\end{bmatrix} \tag{4.9}$$
$$\mathbf{x}_N = \begin{bmatrix} x(0) \\ x(1) \\ \vdots \\ x(N-1) \end{bmatrix}, \qquad
\mathbf{X}_N = \begin{bmatrix} X(0) \\ X(1) \\ \vdots \\ X(N-1) \end{bmatrix}$$
In the same terms, the IDFT is given by
$$\mathbf{x}_N = \frac{1}{N}\, \mathbf{W}_N^{*}\, \mathbf{X}_N \tag{4.10}$$
where $\mathbf{W}_N^{*}$ is the complex conjugate of $\mathbf{W}_N$.
4.2.1 Computation of the DFT: The Goertzel Algorithm

Besides the direct approach, there are modified algorithms for calculating the DFT.
We discuss one of the most commonly used: the Goertzel algorithm.
The Goertzel algorithm is derived by evaluating the modified z-transform polynomial
X(z) at $z = W_N^k$:
$$X(k) = X(z)\big|_{z = W_N^k} = \mathrm{DFT}\{x[n]\} \tag{4.11}$$
where
$$X(z) = \sum_{n=0}^{N-1} x[n]\, z^{n} \tag{4.12}$$
The polynomial can be evaluated recursively, which gives a difference equation [21] of
the form
$$y[n] = z\, y[n-1] + x[N-n] \tag{4.13}$$
Setting $z = W_N^k$, the Goertzel algorithm is written in the form of a first-order filter as
$$y[n] = W_N^k\, y[n-1] + x[N-n], \qquad y[-1] = 0 \tag{4.14}$$
(with x[N] taken as zero), and
$$X(k) = y[N] \tag{4.15}$$
Figure (4.1) shows the Goertzel algorithm as a recursive filter.
Figure 4.1 Goertzel Algorithm as a Recursive Filter (the input x[N-n] is added to the
delayed output y[n-1] scaled by $W_N^k$ to form y[n])
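The second-order Goertzel algorithm [5, 21] is given by
$$q[n] = 2\cos\!\left(\frac{2\pi k}{N}\right) q[n-1] - q[n-2] + x[n]$$
$$y[n] = q[n] - \left[\cos\!\left(\frac{2\pi k}{N}\right) - j\sin\!\left(\frac{2\pi k}{N}\right)\right] q[n-1], \qquad q[0] = q[-1] = 0 \tag{4.16}$$
The Goertzel algorithm is more efficient than the FFT only if the DFT is to be
computed for fewer than $\log_2 N$ values of k [21].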
4.2.2 Notes on DFT

The DFT provides the same number of points as exist in the sampled signal,
since the DFT is a complete and unique representation of the sampled signal. This is
due to the fact that the twiddle factors $W_N^{kn}$ are orthogonal over the finite interval
[22, 25], and both the signal and its spectrum are repeated or aliased, making it
unnecessary to carry the sum beyond the edge of the window in time or the fundamental
interval in frequency [22]. We also observe that the IDFT provides a periodic signal
reconstruction with period NTs [25], where Ts is the sampling interval. As we
mentioned earlier, the DFT is calculated by a series of multiply-and-add steps, of
which there may be a very large number. For N=1024, about one million complex
multiplications and one million complex additions are required [3]. This cost is very
undesirable, but it can be reduced if we note the built-in redundancy in
equations (4.5) and (4.6). We will discuss this reduction in the amount of calculation
in detail later.
The DFT is the Fourier transform of a sampled signal computed at equally
spaced frequency intervals of size fs/N, where N is the number of points and fs is the
sampling frequency [22]. Since the DFT is a special case extension of Fourier
analysis, it is closely related to other transforms. Table (4.1) shows the relationship
between the Fourier transform, Fourier series, Fourier transform of sampled data, and
the DFT.
                              Time Domain              Frequency Domain
                              data        range        data        range
Fourier transform             continuous  infinite     continuous  infinite
Fourier series                continuous  periodic     discrete    infinite
FT of sampled data            discrete    infinite     continuous  periodic
Discrete Fourier transform    discrete    periodic     discrete    periodic
Table 4.1 Relationship between the FT, Fourier Series, FT of Sampled Data, and
DFT [22]
4.2.3 Properties of the DFT
The DFT has some very important mathematical properties which are
useful for algorithm development and play an important role in applying the DFT to
signal processing applications. Table (4.2) provides a brief listing of these properties.
Property               Time domain            Frequency domain
Time reversal          x[N-n]                 X(N-k)
Linearity              αx1[n] + βx2[n]        αX1(k) + βX2(k)
Circular shifting      x[((n-M))_N]           X(k) e^(-j2πkM/N)
Circular convolution   x1[n] ⊗ x2[n]          X1(k) X2(k)
Periodicity            x[n] = x[n+N]          X(k) = X(k+N)
Spectrum shift         x[n] e^(j2πMn/N)       X(((k-M))_N)
Multiplication         x1[n] x2[n]            (1/N) X1(k) ⊗ X2(k)
Parseval’s theorem     Σ x[n] y*[n]           (1/N) Σ X(k) Y*(k)
Complex conjugate      x*[n]                  X*(N-k)
Correlation            x[n] ⊗ y*[-n]          X(k) Y*(k)
Table 4.2 Properties of the DFT
The symmetry properties of the DFT are described in detail in [5, 25].
Another important use of the DFT is to perform linear filtering in the frequency
domain. This is an alternative to time-domain convolution and, for long sequences, it
is also computationally more efficient than time-domain convolution. Linear filtering
via the DFT involves operations on a block of data; therefore the input sequence is
sectioned into smaller blocks [24]. There are two well-known methods for doing this:
the overlap-add and the overlap-save methods. These methods are explained in [5, 21].
Example 4.1:
We first develop and implement a direct method to compute the DFT and IDFT
on the ‘C40. Files hprg4_1.c and dprg4_1.c implement this algorithm. Please note
that this method does not use a complex structure. The real and imaginary parts of
the input data are separated into two arrays, x and y, respectively. The output is
uploaded in the form of, again, two arrays, xx and yy. We tested this program by
computing the DFT of:
x(n)={4,2,1,4,6,3,5,2}
Note that since there is no imaginary part in this input data, the input array y holds
eight zero values. The DFT is calculated by setting the value of inv in dprg4_1.c to
“0”. If we set inv to “1” then the program will compute the IDFT. The DFT output
is:
Real Imaginary
27.000000 0.000000
-4.121320 3.292893
4.000000 1.000000
0.121320 -4.707107
5.000000 -0.000000
0.121320 4.707107
4.000000 -1.000000
-4.121320 -3.292893
We also used this algorithm to compute the DFT of the sequence
x(n)=cos(2π(0.1)n), n = 1, ..., 100
Figure (4.2) shows the DFT for this sequence.
Figure 4.2 DFT of x(n)=cos(2π(0.1)n)
Example 4.2:
We modify hprg4_1.c and dprg4_1.c to use a complex structure. This will
enable us to compute more efficiently. Instead of using two separate arrays
for the real and imaginary parts, the programs use a common C structure of the form
typedef struct {
    float real;
    float imag;
} complex;
This enables us to use the data in the form
x[n].real and x[n].imag
Great care is needed in coding algorithms using the complex structure for the ‘C40. C
can also use double instead of float, but this will not work for a ‘C40 implementation
using Sonitech’s board. Also, the buffer uploaded from the ‘C40 to the Pentium host
program must hold at least twice as many values as there are data points, to
accommodate both the real and the imaginary parts. Files hprg4_1a.c and dprg4_1a.c
are used for implementation.
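As an illustration of working with such a structure, the sketch below defines complex addition and multiplication helpers. The names cplx, cadd and cmul are assumptions made for this example and are not taken from hprg4_1a.c or dprg4_1a.c; the type is called cplx here, rather than complex, only to avoid a clash with the C99 <complex.h> macro on a host compiler.

```c
/* Complex value stored as two floats, as in the structure above.
   cplx, cadd and cmul are hypothetical names for illustration. */
typedef struct {
    float real;
    float imag;
} cplx;

/* Complex addition: two real additions. */
cplx cadd(cplx a, cplx b)
{
    cplx s = { a.real + b.real, a.imag + b.imag };
    return s;
}

/* Complex multiplication: four real multiplications and two real additions. */
cplx cmul(cplx a, cplx b)
{
    cplx p = { a.real * b.real - a.imag * b.imag,
               a.real * b.imag + a.imag * b.real };
    return p;
}
```

These operation counts are the ones used in section 4.2 to arrive at the totals of real multiplications and additions for the direct DFT.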

Example 4.3:
We implement the Goertzel algorithm in this example. Files hprg4_2.c and
dprg4_2.c implement this algorithm. This algorithm is a general example which
implements both the first order and the second order Goertzel algorithm. The program
computes a 5-point DFT of a sequence:
x(n)={1,2,3}
Recommended Exercises
1. Modify hprg4_1.c and dprg4_1.c to compute the IDFT of the output given in
Example (4.1). Create new files. The answer should be equal to the input data
used in Example (4.1).
2. Modify hprg4_1.c and dprg4_1.c to read and write the input and the output
from data files. Generate a sequence [3] based on the following equation in
MATLAB.
x[n] = Q^n , n = 0, 1, ... , 31
where
Q = 0.9 + j0.3
Save this sequence in a data file and compute the DFT. (Hint: Use “save”
command in MATLAB to create a data file. Use “help save” in MATLAB for
further information.)
Plot the absolute value of the DFT.
3. Calculate the DFT of the following two sequences:
x[n]={0,1,1,0}
x[n]={1,0,0,1}
Use programs hprg4_1.c/dprg4_1.c and hprg4_2.c/dprg4_2.c.
4. Plot the following sequence:
x[n]=sin(n), n=0,...,99
Find the DFT of this sequence and plot it using MATLAB.
5. Convert the program in file dprg4_1.c into an optimized assembly language file.
4.3 Fast Fourier Transform (FFT)
The direct calculation of the DFT from equation (4.5) is computationally
intensive, and consequently slow, making it unrealistic for real-time digital signal
processing. This limitation was removed by the fast Fourier transform (FFT)
algorithm presented by Cooley and Tukey [26] in 1965. The FFT is simply an
efficient method for computing the DFT. It is not only efficient, but it also reduces
round-off errors by a factor of $(\log_2 N)/N$, where N is the number of data samples
[26, 27]. If we have $N = 2^n$ samples, then only on the order of $N\log_2 N$ arithmetic
operations are required to compute the N-point DFT, as compared to the $N^2$
operations used by the direct computation of the DFT. For $N = 1024 = 2^{10}$, the DFT
requires $N^2$ = 1,048,576 multiplications whereas the FFT algorithms reduce this to
$N\log_2 N$ = 10,240, a factor-of-100 improvement [22]!
There are many types of FFT algorithms available. We discuss four of the
most commonly used algorithms.
4.3.1 Radix-2 Algorithms

This algorithm was developed by Cooley and Tukey in 1965 [26]. It
assumes that N is a power of 2 and hence is called a radix-2 algorithm. There are also
other similar algorithms including radix-4, radix-8, and radix-16 algorithms. The
radix-2 algorithm has two forms based on its application in the time or frequency
domain. When applied in the time domain, the algorithm is called a decimation-in-time
(DIT) FFT, and when applied in the frequency domain, it is called a
decimation-in-frequency (DIF) FFT. The DIF was independently developed by Sande
and Gentleman in 1966 and by Cooley and Stockham, and is discussed in the next
section.
Let a sequence {x[n]} be separated into two N/2-point sequences [27], with one
corresponding to the even-numbered samples {y[n]} and the second to the
odd-numbered samples {z[n]} of x[n], as shown by equation (4.17).
$$y[n] = x[2n], \qquad z[n] = x[2n+1], \qquad n = 0, 1, \ldots, \frac{N}{2}-1 \tag{4.17}$$
Using equation (4.17), we can rewrite equation (4.5) as
$$X(k) = \sum_{n=0}^{(N/2)-1} y[n]\, W_N^{2kn} + \sum_{n=0}^{(N/2)-1} z[n]\, W_N^{k(2n+1)}, \qquad k = 0, 1, \ldots, N-1 \tag{4.18}$$
or
$$X(k) = \sum_{n=0}^{(N/2)-1} y[n]\, W_N^{2kn} + W_N^{k} \sum_{n=0}^{(N/2)-1} z[n]\, W_N^{2kn}, \qquad k = 0, 1, \ldots, N-1 \tag{4.19}$$
But $W_N^2 = W_{N/2}$ [3]. Using this equality in equation (4.19) we get
$$X(k) = \sum_{n=0}^{(N/2)-1} y[n]\, W_{N/2}^{kn} + W_N^{k} \sum_{n=0}^{(N/2)-1} z[n]\, W_{N/2}^{kn}, \qquad k = 0, 1, \ldots, N-1 \tag{4.20}$$
and
$$X(k) = Y(k) + W_N^{k}\, Z(k) \tag{4.21}$$
where Y(k) and Z(k) are the N/2-point DFTs of y[n] and z[n]. Since Y(k) and Z(k) are
periodic with period N/2, we can rewrite equation (4.21) using this periodic property
and the identity $W_N^{k+N/2} = -W_N^{k}$ as
$$X\!\left(k + \frac{N}{2}\right) = Y(k) - W_N^{k}\, Z(k), \qquad k = 0, 1, \ldots, \frac{N}{2}-1 \tag{4.22}$$
We observe that the N-point DFT is computed using two length-(N/2) DFTs and some
extra operations called butterfly operations. Figure (4.3) shows one of the butterfly
operations. Each of these butterflies is a length-2 DFT, breaking a length-N DFT up
into a two-dimensional length-(2×N/2) DFT. It involves one complex multiplication
and two complex additions.
Figure 4.3 Butterfly Operation (inputs a and b; outputs $a + bW_N^r$ and $a - bW_N^r$)
Figure (4.4) shows an 8-point decimation-in-time (DIT) FFT. This is so called
because the alternate time samples are decimated by the process. We can see from this
figure that an N-point FFT contains N/2 butterflies per stage with $\log_2 N$ stages, giving
a total of $(N/2)\log_2 N$ butterflies [3]. This method of FFT calculation involves
$(N/2)\log_2 N$ complex multiplications and $N\log_2 N$ complex additions, providing a
reduction in computations. For a 1024-point DFT, the computation time is reduced by
two orders of magnitude if the FFT is employed [3, 27].
As we can see from Figure (4.4), the input is in scrambled order. This
shuffling is necessary because the input, output, and intermediate data must occupy the
same memory locations so that the memory requirement for the FFT is of order N.
This shuffling is achieved by using bit-reversal where samples are stored in bit-
reversed order. Table (4.3) explains this process [22].
Decimal numbers        0    1    2    3    4    5    6    7
Binary equivalents     000  001  010  011  100  101  110  111
Bit-reversed binary    000  100  010  110  001  101  011  111
Decimal equivalents    0    4    2    6    1    5    3    7
Table 4.3 Bit-Reversal [22]
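The mapping in Table (4.3) can be computed by shifting the bits of the index out at one end and in at the other. The sketch below is one simple way to do it; the function name bit_reverse is an assumption made for illustration.

```c
/* Reverse the lowest `bits` bits of index i (bits = log2 N).
   bit_reverse is a hypothetical helper name. */
unsigned bit_reverse(unsigned i, unsigned bits)
{
    unsigned r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (i & 1u);   /* move the lowest bit of i into r */
        i >>= 1;
    }
    return r;
}
```

For N = 8 (three bits) the function reproduces the last row of the table; for example, index 1 maps to 4 and index 3 maps to 6.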

Figure 4.4 Signal Flow Graph (8-point radix-2 DIT FFT; inputs in bit-reversed order
x[0], x[4], x[2], x[6], x[1], x[5], x[3], x[7]; outputs in natural order)
4.3.2 Decimation-in-frequency (DIF) Radix-2 Algorithm

This algorithm is closely related to the DIT FFT discussed in section 4.3.1, as it
is derived by reversing the direction of signal flow in figure (4.4). A sequence x[n] is
divided into two N/2-point sequences. The first sequence, y[n], is composed of the first
N/2 points and the second sequence, z[n], is composed of the last N/2 points of the
sequence x[n]; that is,
$$y[n] = x[n], \qquad z[n] = x\!\left[n + \frac{N}{2}\right], \qquad n = 0, 1, \ldots, \frac{N}{2}-1 \tag{4.23}$$
Then we have
$$X(k) = \sum_{n=0}^{(N/2)-1} x[n]\, W_N^{kn} + \sum_{n=N/2}^{N-1} x[n]\, W_N^{kn} \tag{4.24}$$
Using equation (4.23) and the property $W_N^{kN/2} = (-1)^k$, we can rewrite equation (4.24)
as
$$X(k) = \sum_{n=0}^{(N/2)-1} \left[\, x[n] + (-1)^k\, x\!\left[n + \frac{N}{2}\right] \right] W_N^{kn} \tag{4.25}$$
Now, with
$$k = 0, 1, \ldots, \frac{N}{2}-1 \tag{4.26}$$
we can split X(k) into even- and odd-numbered samples by using $W_N^2 = W_{N/2}$ as
$$X(2k) = \sum_{n=0}^{(N/2)-1} \big[\, y[n] + z[n] \,\big]\, W_{N/2}^{kn}$$
$$X(2k+1) = \sum_{n=0}^{(N/2)-1} \Big\{ \big[\, y[n] - z[n] \,\big]\, W_N^{n} \Big\}\, W_{N/2}^{kn} \tag{4.27}$$
Again, the N-length DFT can be computed using two length (N/2) DFTs. Figure (4.5)
shows the signal flow graph for 8-point radix-2 DIF FFT.
Figure 4.5 Signal Flow Graph for 8-Point Radix-2 DIF FFT (inputs in natural order;
outputs in bit-reversed order 0, 4, 2, 6, 1, 5, 3, 7)
We can see that the DIF requires the same number of multiplications and additions
as the DIT FFT. It has input samples in natural (unshuffled) order
and yields frequency samples in bit-reversed (shuffled) order [27]. Figure (4.6) shows
the butterfly diagram for the DIF FFT. It only differs from the DIT FFT in whether the
multiplication by $W^k$ comes before (DIT) or after (DIF) the summation step. In other
words, DIT refers to grouping the input sequence into even and odd samples, whereas
DIF refers to grouping the output (frequency) sequence into even and odd samples.
Therefore, for a given DIT FFT algorithm there exists an inverse which is of DIF form
[27], repeatedly dividing the output sequence into even and odd samples.
Figure 4.6 Butterfly Diagram for the DIF FFT (inputs a and b; outputs a + b and
$(a - b)W_N^r$)
4.3.3 FFT Radix-4 algorithm

This method is applied when the number of data points in the DFT is a power
of 4. The first stage of a DIT radix-4 FFT is given by
$$\begin{aligned}
X\!\left(k + \frac{qN}{4}\right) ={}& \sum_{n=0}^{(N/4)-1} x[4n]\, W_{N/4}^{nk}
 + (-j)^{q}\, W_N^{k} \sum_{n=0}^{(N/4)-1} x[4n+1]\, W_{N/4}^{nk} \\
&+ (-1)^{q}\, W_N^{2k} \sum_{n=0}^{(N/4)-1} x[4n+2]\, W_{N/4}^{nk}
 + (j)^{q}\, W_N^{3k} \sum_{n=0}^{(N/4)-1} x[4n+3]\, W_{N/4}^{nk}
\end{aligned} \tag{4.28}$$
for k = 0, 1, ..., (N/4)-1 and q = 0, 1, 2, 3.
Figure (4.7) shows the signal flow graph for a radix-4 DIT FFT. The input order in
this figure is obtained by reversing the base-4 representation; e.g., $6 = 12_4$, which
reverses to $21_4 = 9$. This method requires $(9/4)N\log_4 N - (43/12)N + 16/3$ real
multiplications and $(25/4)N\log_4 N - (43/12)N + 16/3$ real additions, assuming that
the complex multiplications are done using three real multiplications and three
additions. The radix-4 FFT is used to reduce the number of complex multiplications by
a factor of almost 2. Furthermore, the number of additions is also reduced. All this
increases the computational speed of the FFT algorithms.
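The operation counts above assume a complex multiplication carried out with three real multiplications and three real additions. One way to achieve this, sketched below, is to store each twiddle factor w = c + jd as the three precomputed values c, d - c and c + d. The names cplx, twiddle3, make_twiddle3 and cmul3 are assumptions made for this illustration.

```c
typedef struct { float real; float imag; } cplx;   /* hypothetical type name */

/* Twiddle factor w = c + jd stored as the three values the product needs. */
typedef struct {
    float c;          /* Re(w)         */
    float d_minus_c;  /* Im(w) - Re(w) */
    float c_plus_d;   /* Re(w) + Im(w) */
} twiddle3;

/* Done once per twiddle factor, outside the FFT loops. */
twiddle3 make_twiddle3(float c, float d)
{
    twiddle3 t = { c, d - c, c + d };
    return t;
}

/* (a + jb)(c + jd) using three real multiplications and three real additions. */
cplx cmul3(cplx x, twiddle3 w)
{
    float k1 = w.c * (x.real + x.imag);   /* addition 1, multiplication 1 */
    float k2 = x.real * w.d_minus_c;      /* multiplication 2             */
    float k3 = x.imag * w.c_plus_d;       /* multiplication 3             */
    cplx p = { k1 - k3, k1 + k2 };        /* additions 2 and 3            */
    return p;
}
```

Expanding the products shows that k1 - k3 = ac - bd and k1 + k2 = ad + bc, which are the real and imaginary parts of the ordinary four-multiplication product.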

Figure 4.7 Signal Flow Graph for a Radix-4 DIT FFT [24] (16 points; inputs in
base-4 digit-reversed order x[0], x[4], x[8], x[12], x[1], x[5], x[9], x[13], x[2], x[6],
x[10], x[14], x[3], x[7], x[11], x[15]; outputs in natural order)
A radix-4 decimation-in-frequency (DIF) FFT is obtained in a similar fashion as
in the case of the radix-2 DIF FFT. The N/4-point sub-sequences representing the DIF
FFT are
$$\begin{aligned}
X(4k) &= \sum_{n=0}^{(N/4)-1} \left[ x[n] + x\!\left[n + \tfrac{N}{4}\right] + x\!\left[n + \tfrac{N}{2}\right] + x\!\left[n + \tfrac{3N}{4}\right] \right] W_N^{0}\, W_{N/4}^{kn} \\
X(4k+1) &= \sum_{n=0}^{(N/4)-1} \left[ x[n] - j\,x\!\left[n + \tfrac{N}{4}\right] - x\!\left[n + \tfrac{N}{2}\right] + j\,x\!\left[n + \tfrac{3N}{4}\right] \right] W_N^{n}\, W_{N/4}^{kn} \\
X(4k+2) &= \sum_{n=0}^{(N/4)-1} \left[ x[n] - x\!\left[n + \tfrac{N}{4}\right] + x\!\left[n + \tfrac{N}{2}\right] - x\!\left[n + \tfrac{3N}{4}\right] \right] W_N^{2n}\, W_{N/4}^{kn} \\
X(4k+3) &= \sum_{n=0}^{(N/4)-1} \left[ x[n] + j\,x\!\left[n + \tfrac{N}{4}\right] - x\!\left[n + \tfrac{N}{2}\right] - j\,x\!\left[n + \tfrac{3N}{4}\right] \right] W_N^{3n}\, W_{N/4}^{kn}
\end{aligned} \tag{4.29}$$
Figure (4.8) shows the signal flow graph for a radix-4 DIF FFT.
Figure 4.8 Signal Flow Graph for a Radix-4 DIF FFT (16 points; inputs in natural
order; outputs in base-4 digit-reversed order 0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7,
11, 15)
4.3.4 FFT Split-radix algorithm

A split-radix FFT algorithm is realized by applying a radix-2 FFT to the
even-indexed terms and a radix-4 FFT algorithm to the odd-indexed terms. Figure
(4.9) shows a length-16 split-radix FFT signal flow graph. The shaded area is the
L-shaped butterfly used in this method. The even part (top half)
progresses one stage as a radix-2, while the odd part (bottom half) progresses two
stages as a radix-4; hence the name split-radix [24]. This method requires
$N\log_2 N - (2/3)N + (2/3)(-1)^{\log_2 N}$ real multiplications and
$3N\log_2 N - (2/3)N + (2/3)(-1)^{\log_2 N}$ real additions, assuming that the complex
multiplications are done using three real multiplications and three additions.
Figure 4.9 Signal Flow Graph for Length 16 Split-Radix FFT (inputs in natural order;
outputs in bit-reversed order 0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15)
Example 4.4:

We now implement the Cooley-Tukey Radix-2 DIF and DIT algorithms on the
‘C40. Files hprg4_4.c and dprg4_4.c are used in this example. This is a very basic
implementation of the FFT, requiring the input data to be in separate arrays for the real
and imaginary parts; the length of the input data should be equal to or greater than the
number of FFT points to be calculated. We tested this algorithm with the following
input data:
x(n)={1,2,3,4}
The output after executing the file hprg4_4.exe is
Real        Imaginary
10.000000   0.000000
-2.000000   2.000000
-2.000000   0.000000
-2.000000   -2.000000
Example 4.5:
We now implement another FFT method, intended for real data. Files
hprg4_5.c and dprg4_5.c are used in this sample implementation. This FFT routine
accomplishes the in-place transformation of a real sequence of length N in a storage
area using the method given by Brigham [41]. In this method the original real
sequence of N points is viewed as a complex sequence of N/2 points,
$[x[0], x[1], \ldots, x[N-1]] = [u_0, u_1, \ldots, u_{N/2-1}]$, with $u_m = x[2m] + j\,x[2m+1]$.
We tested this algorithm with the sequence
x[n] = {-6, -2, 0, 2, 4, 6, 3, -1}
which resulted in the following output (frequency samples k = 0 through N/2):
Real        Imaginary
6.0         0.0
-17.7782    6.5355
-5.0        -3.0
-2.2218     0.5355
-4.0        0.0
We used this program to compute the FFT of the following hypothetical voltage
waveform sequence.
$$x_k = \begin{cases}
0, & k < 100 \\[4pt]
\exp\!\left(-\dfrac{k-100}{176}\right) \sin\!\left(\dfrac{2\pi (k-100)}{160}\right), & k = 100, 101, \ldots, 1023
\end{cases}$$
Figure (4.10) shows the plot of this sequence.
Figure 4.10 Plot of Hypothetical Voltage Waveform
Figure (4.11) shows the FFT plot of the first 100 values.
Figure 4.11 FFT of Figure (4.10) Sequence

The sequence selected has no physical significance. It was selected to test the program
for large data sets.
Example 4.6:
Files hprg4_7.c and dprg4_7.c implement the inverse FFT using real data.
The transformations are in place and N has to be a power of 2. We later combined this
program structure with the programs used in Example (4.5) into one larger program
using two ‘C40s. This results in a flexible program.
Example 4.7:
In this example, we implement the FFT computations on complex input data.
Files hprg4_8.c and dprg4_8.c are created for this implementation. The complex
computation requires twice the storage of the implementations
discussed in Example (4.5) and Example (4.6). The computation is in place, requiring
N to be a power of 2. The basic algorithm here is an extension of the Cooley-Tukey
algorithm [26]. The output has two main parts: the spectral components and the
output time series, which is N times the original data. In our actual program output,
the 2nd and 3rd columns belong to the output time series, whereas the spectral
components are in the 4th and 5th columns. Please note that the 4th column is the real
part and the 5th column is the imaginary part of the data.
Example 4.8:
We now implement the Radix-4 FFT. Files hprg4_3.c and dprg4_3.c are used
in this sample implementation. This implementation is based on the Cooley-Tukey
Radix-4 DIF FFT program [21, 26]. The real part of the input data is in array x, whereas
the imaginary part is in array y. In this implementation, the length of the input data
should be equal to or greater than the FFT length n. The program was tested with the
following data.
x(n) = {1, 2, 3, 4}

Example 4.9:
In the last example of this section we implement the Radix-8 FFT. Files
hprg4_6.c and dprg4_6.c are created to implement this algorithm. This algorithm is
based on the Cooley-Tukey Radix-8 DIF FFT program [21]. This version is a basic
one-butterfly version of the algorithm. The Radix-8 FFT has a few more adds than the
Radix-4 FFT, but the total number of multiplies and adds decreases [21]. Array x has
the real part and array y has the imaginary part of the data. We tested this algorithm
with the following sequence.
x(n)={1,2,3}
for FFT of length 8.
Recommended Exercises
1. Modify the programs in files hprg4_4.c and dprg4_4.c to find the DIF and DIT
Radix-2 FFT of the following sequence. Plot the FFT using MATLAB.
x(n)=cos(2π(0.1)n), n = 1, ..., 100
2. Convert the program in dprg4_4.c to an assembly language program with C
lines inserted as comments in the code without using optimization.
3. Generate the sequence
x=sin[n], n=1, ..., 100
Use program files hprg4_5.c and dprg4_5.c to compute the FFT.
4. Combine into one the FFT and IFFT of real sequences (files are hprg4_5.c,
hprg4_6.c, dprg4_5.c, dprg4_6.c). Use ‘C40 # 1 to implement the FFT and
‘C40 # 2 for the IFFT.
5. Compute using hprg4_8.c and dprg4_8.c the FFT of the following sequence:
x[n]=exp((0.9+0.6j)n), n=1,2, ... , 150
Plot the FFT magnitude.
6. Convert the Radix-4 algorithms in file dprg4_3.c into optimized assembly
language code. Use the sequence given in Exercise 3 to compute the FFT.
7. Use the following sequence for Radix-8 and Radix-4 calculations.
x(n)={1,1,0,0,1,1,1,1,0,0,1,1,0,0}
4.4 Some Additional FFT Algorithms
In addition to the DFT/FFT algorithms described above, there are a number of
other algorithms in existence. We discuss two of these algorithms.
4.4.1 The Prime-Factor FFT algorithm

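The prime-factor FFT algorithm (PFA) converts the one-dimensional DFT into
a multidimensional DFT where each dimension is transformed independently. No
twiddle factors are involved in this algorithm, as efficient short-length modules are
used to transform each dimension [21]. If we assume N has Q relatively prime
factors, $N = N_1 N_2 \cdots N_Q$, then the constants $M_i$ are defined as
$$M_i = \left\langle \left(\frac{N}{N_i}\right)^{-1} \right\rangle_{N_i}, \qquad i = 1, 2, \ldots, Q \tag{4.30}$$
where $\langle\,\cdot\,\rangle_{N_i}$ denotes reduction modulo $N_i$. The prime factor maps for time and
frequency are given by
$$\begin{aligned}
n &\equiv \left\langle \frac{N}{N_1} M_1 n_1 + \frac{N}{N_2} M_2 n_2 + \cdots + \frac{N}{N_Q} M_Q n_Q \right\rangle_{N}, \\
k &\equiv \left\langle \frac{N}{N_1} k_1 + \frac{N}{N_2} k_2 + \cdots + \frac{N}{N_Q} k_Q \right\rangle_{N}
\end{aligned} \tag{4.31}$$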

Substituting these values in equation (4.5) and rearranging, we get the prime-factor
FFT algorithm [21]
$$X[k_1, k_2, \ldots, k_Q] = \sum_{n_1=0}^{N_1-1} \sum_{n_2=0}^{N_2-1} \cdots \sum_{n_Q=0}^{N_Q-1} x[n_1, n_2, \ldots, n_Q]\; W_{N_1}^{n_1 k_1} W_{N_2}^{n_2 k_2} \cdots W_{N_Q}^{n_Q k_Q} \tag{4.32}$$
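The index maps of equation (4.31) can be sketched for the two-factor case Q = 2. The code below is an illustration under that assumption; the function names mod_inverse, pfa_time_index and pfa_freq_index are hypothetical, and the modular inverse is found by brute-force search, which is adequate for the short lengths used as factors. N1 and N2 must be relatively prime and at least 2.

```c
/* Multiplicative inverse of a modulo m by brute-force search.
   Returns 0 if no inverse exists. */
int mod_inverse(int a, int m)
{
    for (int t = 1; t < m; t++)
        if ((a * t) % m == 1) return t;
    return 0;
}

/* Time index map of equation (4.31) for Q = 2:
   n = < (N/N1) M1 n1 + (N/N2) M2 n2 >_N, where N/N1 = N2 and N/N2 = N1. */
int pfa_time_index(int n1, int n2, int N1, int N2)
{
    int N  = N1 * N2;
    int M1 = mod_inverse(N2 % N1, N1);   /* M1 = <(N/N1)^-1>_N1 */
    int M2 = mod_inverse(N1 % N2, N2);   /* M2 = <(N/N2)^-1>_N2 */
    return (N2 * M1 * n1 + N1 * M2 * n2) % N;
}

/* Frequency index map of equation (4.31) for Q = 2:
   k = < (N/N1) k1 + (N/N2) k2 >_N. */
int pfa_freq_index(int k1, int k2, int N1, int N2)
{
    return (N2 * k1 + N1 * k2) % (N1 * N2);
}
```

For N = 15 = 3 × 5 the maps become n = (10 n1 + 6 n2) mod 15 and k = (5 k1 + 3 k2) mod 15. The product nk then reduces to (5 n1 k1 + 3 n2 k2) mod 15, so the kernel separates into a 3-point and a 5-point factor with no twiddle factors between them.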
4.4.2 Winograd’s Algorithm

This method was developed by S. Winograd in 1975 for efficient calculation of
prime-length cyclic convolution using a minimum number of multiplications. The
Winograd FFT factors the DFT into a multidimensional DFT with each dimension
computed using Winograd short-length modules, which consist of three parts: the
preweave additions, the multiplications, and the postweave additions [21]. The modules,
based on Rader’s permutations, transform a DFT into a cyclic convolution and then
use Winograd’s efficient short-length cyclic convolution algorithms. Figure (4.12)
shows a short Winograd DFT module.
Figure 4.12 Short Winograd DFT Module
4.5 Application Examples
4.5.1 Fast Correlation based on FFT

The correlation function of x[k] and y[k] is given by $r_{xy}(n)$, where
$$r_{xy}(n) = \mathrm{corr}\{x[k], y[k]\} \triangleq \sum_{k=0}^{M-1} x[k]\, y[n+k], \qquad n = 0, 1, \ldots, M-1 \tag{4.33}$$
$r_{xy}(n)$ is the cross-correlation function of x[k] and y[k], and $r_{xx}(n)$ is the
autocorrelation function of x[k]. The cross-correlation function is a very sensitive
measure of the similarity of two signals, as it is maximized when the two signals are
similar in frequency content and are in phase with each other. The correlation function,
as a general rule, is related to the power spectrum by the Fourier transform, which makes
it one of the better methods to estimate the frequency content in a signal [8, 21].
We implemented this algorithm in files hprg4_9.c and dprg4_9.c. The
algorithm requires two data vectors x and y to be correlated (for autocorrelation, both x
and y are the same vector). In our design example we implemented this algorithm to
compute the autocorrelation of a length-1500 sine wave with added white noise. The
signal-to-noise ratio (SNR) of this sequence is 0.1 or -10 dB. Note that the correlation
function in dprg4_9.c requires the sequence length to be one plus a power of two. The
next power of 2 above 1500 is 2048, so a total signal length of 2049 is selected,
satisfying the zero-padding requirement with the value of 300. (See hprg4_9.c and
dprg4_9.c). The input signal is shown in figure (4.13), plotted from file hprg4_9i.dat.
Only the first 300 samples are plotted.
Figure 4.13 First 300 Samples of Sine Wave with Added Uniform White Noise
The autocorrelation sequence results are stored in file hprg4_9o.dat, from where they
are plotted using MATLAB. Figure (4.14) shows the first 300 samples of the
autocorrelation result. Figure (4.14) shows that there is an impulse present at n=0 due
to the white noise, which is uncorrelated for n≠0. There is also a cosine component at
the beginning due to the sine wave. We can infer that the correlation function in cases
like this can be used in conjunction with the spectrum to identify periodic components,
as shown by the sinusoidal waveform in figure (4.14).
Figure 4.14 Autocorrelation Function of the Sine Wave With Added White Noise
4.5.2 Fast Convolution based on FFT

The convolution function is very similar to the filtering algorithm. The
convolution $c_{xy}(n)$ of two sequences x[k] and y[k] is obtained by using equation (4.34)
[21].
$$c_{xy}(n) = \mathrm{conv}\{x[k], y[k]\} \triangleq \sum_{k=0}^{M-1} x[k]\, y[n-k], \qquad n = 0, 1, \ldots, M-1 \tag{4.34}$$
In the fast convolution method the FFT is applied. According to the convolution theorem,
convolution in the time domain is equivalent to multiplication in the frequency domain.
Therefore, for a system with input x, impulse response h, and output y, we can write the
convolution in the frequency domain as
$$Y(k) = H(k)\, X(k) \tag{4.35}$$
We used the real and complex FFT functions to calculate the convolution in this
implementation. Files hprg41_0.c and dprg41_0.c are used in this implementation.
We tested our algorithm by computing the output of a linear system, given the input
sequence and the system’s impulse response [41]. The input signal is a damped sine
wave describing an electromagnetic source, in volts. Figure (4.15) shows the input
signal and figure (4.16) shows the impulse response of the linear system.
Figure 4.15 Input Signal for Convolution -- Damped Sine Wave

Figure 4.16 Linear System’s Impulse Response for Convolution
The input signal is generated in array x and the system’s impulse response in y. Figure
(4.17) shows the convolution results.
Figure 4.17 Output of the Linear System Using Convolution
The impulses in the impulse response represent various reflectors, and their effect is
visible in figure (4.17). These reflectors create different paths from transmitter to
receiver, causing the input signal to be added with different delays and different
strengths at the output.
4.5.3 Average Periodogram based on FFT

The average Periodogram is a method for estimating the power spectrum of a
signal. We use the real-data FFT function here to compute the DFT of
each segment of the sequence. The algorithm requires a window type, iwindo, which is
1 for rectangular, 2 for tapered rectangular, 3 for triangular, 4 for Hanning, 5 for
Hamming, or 6 for Blackman. It also requires an overlap value, which must
be at least 0 and less than 1 (refer to files hprg41_1.c and dprg41_1.c). We first tested
the program using a single sinusoid at 1/8th of the sampling frequency, with the
sampling frequency taken as 1. Figure (4.18) shows the input sinusoid, whereas figure
(4.19) illustrates the power density spectrum.

Figure 4.18 Input Sinusoid for Periodogram
Figure 4.19 Periodogram of Sinusoid

We can see from figure (4.19) that the single peak is at about 1/8th of the sampling
frequency. The power density in figure (4.19) can easily be converted to dB scale by
using
$$20 \log_{10}(\mathrm{Amp})$$
A more practical example of the Periodogram is spectrum estimation for
noisy signals. We next processed a sequence of length 1000 containing a sine wave at
3/32 Hz-s, a sine wave at 7/32 Hz-s, and added uniform white noise. The sequence
was generated in MATLAB. Figure (4.20) shows the input sequence.
Figure 4.20 Input Sequence with Two Sinusoids and Uniform White Noise (x(N) versus N)
We can see from figure (4.20) that the sinusoidal components are hardly visible. We
apply the Periodogram method, and figure (4.21) plots the output in dB scale.

Figure 4.21 Power Spectrum of Two Sine Waves plus Added Noise (Sampling Frequency is 1)
It is evident from figure (4.21) that the average Periodogram method was able to
identify the two peaks at frequencies 3/32 and 7/32 Hz-s.
We also applied the Periodogram to the voice data file ltlchk.dat.
Assuming a sampling frequency of 5500 Hz-s, the following spectrum was plotted.

Figure 4.22 Power Density Spectrum of Voice Data File ltlchk.dat (Power Spectrum Magnitude in dB versus Frequency)
Application Exercises
1. Generate the following sequences.

x[n] = sin(2π0.1n) + sin(2π0.3n), n = 1, 2, ... , 100
y[n] = cos(2π0.1n) + cos(2π0.3n), n = 1, 2, ... , 100
Calculate and plot the autocorrelation of x[n] and of y[n], and the cross-correlation
between x[n] and y[n].
2. Use the convolution algorithm files to compute 100 samples of the output of
the following filter [41, 21]:
$$\sin\!\left(\frac{2\pi k}{16}\right) \;\longrightarrow\; 0.25 + 0.25z^{-1} + 0.25z^{-2}$$

3. Calculate and plot the power spectral density of white uniform noise. Use the
following commands in MATLAB to generate the sequence.
» rand('uniform')
» rand('seed',12357)
» for n=1:1000
x(n)=sqrt(12*3)*(rand-0.5);
end
4. Try running Periodogram using the data hubble.dat. Do you get the following
power density spectrum, assuming a sampling frequency of 1?
Figure 4.23 Power Density Spectrum of hubble.dat
5. Use the file input.dat from chapter 2 and plot the power density spectrum.
4.6 Appendix 4 -- Application Programs

• Host File hprg4_1.c
/* HPRG4_1.C */
/* Direct Computation of DFT */
/* does not involve complex structure */
/* Input data: Real part in x
Imaginary part in y
Output data: Real part in xx
Imaginary part in yy */
#include <stdio.h>
#include <stdlib.h>
#include <conio.h>
#include "c:\spirit40\source\s40tools.h"
#define size 16 /* This code will work for a max size of 600 */
#define pi 3.141592654
#define maxbits 30
int inv;
long npt;
float x[size], y[size],xx[size],yy[size];
long inv_addr, npt_addr,x_addr,flag_addr,xx_addr,y_addr,yy_addr;
void main()
{
int status;
int n;
long k;
int k2;
printf("\n\n\t***Direct Computation of DFT***\n\n");

inv=0;
npt=8;
/* Initialization */
for (n=0;n<size;++n){
x[n]=0;
y[n]=0;
}
/* Input data. This can also be read from data files */
x[1]=4;
x[2]=2;
x[3]=1;
x[4]=4;
x[5]=6;
x[6]=3;
x[7]=5;
x[8]=2;
y[1]=0;
y[2]=0;
y[3]=0;
y[4]=0;
y[5]=0;
y[6]=0;
y[7]=0;
y[8]=0;

printf("the values read are...\n");
for (n=1;n<=npt;n++) printf("%f %f\n",x[n], y[n]);
dsp_select(1);
printf("\n\tC40 #1 is selected....\n\n");
if (dsp_dl_exec("dprg4_1.out")==-1){
printf("Error in downloading the file\n");
exit(1);
}
dsp_reset();
x_addr=dsp_get_laddr("_x");
y_addr=dsp_get_laddr("_y");
xx_addr=dsp_get_laddr("_xx");
yy_addr=dsp_get_laddr("_yy");
flag_addr=dsp_get_laddr("_flag");
dsp_dl_long_array(x_addr,size,(long*)x);
dsp_dl_long_array(y_addr,size,(long*)y);
status=1;
dsp_dl_int_array(flag_addr,1,&status);
while(1){
dsp_up_int_array(flag_addr,1,&status);
if (status==0) break;
if (kbhit()) exit(0);
}
dsp_up_long_array(xx_addr,size,(long*)&xx);
dsp_up_long_array(yy_addr,size,(long*)&yy);
printf("\n");
for (k=1;k<=npt;++k){
k2=k-1;
printf("%d \t%f \t%f \n",k2, xx[k], yy[k]);
}
}
• ‘C40 File dprg4_1.c
#include "c:\sim4x\math.h"
#include "c:\sim4x\stdlib.h"
#define size 16
#define pi 3.141592654
#define maxbits 30
int inv=0,flag=0; /* inv==0 for DFT and 1 for IDFT */

int npt=8;
float x[size],y[size];
float xx[size], yy[size];
main()
{
long k,n;

double WN, wk, c, s, XR[size], XI[size];
init_c40();
while (flag!=1);
dsp30(x,size);
dsp30(y,size);
WN=2*pi/npt;
if (inv==1)
WN=-WN;
for(k=0;k<npt;++k){
XR[k]=0.0;XI[k]=0.0;
wk=k*WN;
for (n=0;n<npt;++n){
c=cos(n*wk);s=sin(n*wk);
XR[k]=XR[k]+x[n+1]*c+y[n+1]*s;
XI[k]=XI[k]-x[n+1]*s+y[n+1]*c;
}
if (inv==1){
XR[k]=XR[k]/npt;
XI[k]=XI[k]/npt;
}
}
for (k=1;k<=npt;++k){
xx[k]=XR[k-1];
yy[k]=XI[k-1];
}
ieee30(xx,size);
ieee30(yy,size);
flag=0;
while(1);
}

/* HPRG4_2.C */
/* Goertzel Computation of DFT */
#include <stdio.h>
#include <stdlib.h>
#include <conio.h>
#define N 5
#define NN 3
long ord_addr, x_addr,flag_addr,y_addr,a_addr, b_addr;
void main()
{
int status;
int ord;
int k;
float x[NN]={1,2,3}, y[NN]={0,0,0};
float a[N],b[N];
printf("\n\n\t***Goertzel Computation of DFT***\n\n");
printf("Enter 1 for DFT first order and 2 for DFT second order= ");
scanf("%d",&ord);

dsp_select(1);
exit(1);
}
dsp_reset();
ord_addr=dsp_get_laddr("_ord");
a_addr=dsp_get_laddr("_a");
b_addr=dsp_get_laddr("_b");
dsp_dl_long_array(x_addr,NN,(long*)x);
dsp_dl_long_array(y_addr,NN,(long*)y);
dsp_dl_int_array(ord_addr,1,(int*)&ord); /* pass the address of ord, not its value */
status=1;
while(1){
}
dsp_up_long_array(a_addr,N,(long*)a);
dsp_up_long_array(b_addr,N,(long*)b);
printf("\n");
printf("The Goertzel DFT is...\n");
for (k=0;k<N;++k){
printf("%d \t%f \t%f \n",k, a[k], b[k]);
}
}
#define N 5 /* N -point DFT */
#define NN 3
int flag=0,ord;
float x[NN], y[NN], a[N], b[N];
main()
{
float cc,a2,a1,b2,b1,at,s,c,bt,q,t;
int k,j,i;
init_c40();
while (flag!=1);
dsp30(x,NN);

dsp30(y,NN);
/* goertzel algorithm */
if (ord==1){
q=6.283185307179586/N;
for (j=0;j<N;++j){
c=cos(q*j);
s=sin(q*j);
at=x[N-1];
bt=y[N-1];
for (i=0;i<N-2;++i){
t=c*at+s*bt+x[N-(i+2)];
bt=c*bt-s*at+y[N-(i+2)];
at=t;
}
a[j]=c*at+s*bt+x[0];
b[j]=c*bt-s*at+y[0];
}
}
else {
q=6.283185307179586/N;
for (j=0;j<N;++j){
s=sin(q*j);
c=cos(q*j);
cc=2*c;
a2=0;
b2=0;
a1=x[N-1];
b1=y[N-1];
for (i=0;i<N-2;++i){
t=a1;
a1=cc*a1-a2+x[N-(i+2)];
a2=t;
t=b1;
b1=cc*b1-b2+y[N-(i+2)];
b2=t;
}
a[j]=c*a1-a2-s*b1+x[0];
b[j]=s*a1+c*b1-b2+y[0];
}
}
ieee30(&a,N);
ieee30(&b,N);
flag=0;
while(1);
}
• Host File hprg4_1a.c
• ‘C40 File dprg4_1a.c
#define size 600
#define pi 3.141592654
typedef struct {
float real;
float imag;
}complex;
int flag=0;
int inv;
int npt;
int k,n;

float WN, wk, c, s, XR[size], XI[size];
complex x[size];
complex xx[size];
main()
{
int k;
init_c40();
while (flag!=1);
dsp30(x,npt*2); /* each complex point holds two floats */
WN=2*pi/npt;
if (inv==1)
WN=-WN;
for(k=0;k<npt;++k){
XR[k]=0.0;XI[k]=0.0;
wk=k*WN;
for (n=0;n<npt;++n){ /* inner sum over n, as in dprg4_1.c */
c=cos(n*wk);s=sin(n*wk);
XR[k]=XR[k]+x[n].real*c+x[n].imag*s;
XI[k]=XI[k]-x[n].real*s+x[n].imag*c;
}
if (inv==1){
XR[k]=XR[k]/npt;
XI[k]=XI[k]/npt;
}
}
for (k=0;k<npt;++k){
xx[k].real=XR[k];
xx[k].imag=XI[k];
}
ieee30(xx,npt*2);
flag=0;
while(1);
}
/* FFT-Radix-4 DIF */
#include <stdio.h>
#include <stdlib.h>
#include <conio.h>
#include <math.h>
long x_addr, y_addr, flag_addr,m_addr,n_addr;
void main()
{
int i,status,n,m;
float x[4]={1,2,3,4};
float y[4]={0,0,0,0};
m=1;
n=pow(4,m); /* length of data is greater and equal to n */

printf("\n\t*** FFT Radix-4 DIF*** \n");
dsp_select(1);
printf("\nC40 #1 is selected.....\n\n");
exit(1);
}
dsp_reset();
m_addr=dsp_get_laddr("_m");
n_addr=dsp_get_laddr("_n");
dsp_dl_long_array(x_addr,n,(long*)x);
dsp_dl_long_array(y_addr,n,(long*)y);
dsp_dl_int_array(m_addr,1,(int*)&m);
dsp_dl_int_array(n_addr,1,(int*)&n);
status=1;
while(1){
}
dsp_up_long_array(x_addr,n,(long*)x);
dsp_up_long_array(y_addr,n,(long*)y);
printf("# Real Imag\n");

for (i=0;i<n;++i) printf("%d %4.4f %4.4f\n",i,x[i],y[i]);
/* FFT RADIX-4 DIF */

int m,n,n2,k,n1,j,i,i1,i2,i3;
float e,a,c01,c02,si1,si2,b,c,c03,si3,r3,r1,r2,s1,s3,r4,s2,s4,x[4],y[4];
int flag=0;
main()
{
init_c40();

while (flag!=1);
dsp30(x,4);
dsp30(y,4);
n2=n;
for (k=0;k<m;++k){
n1=n2;
n2=n2/4;
e=6.283185307179586/n1;
a=0;
for (j=0;j<n2;++j){
b=a+a;
c=a+b;
c01=cos(a);
c02=cos(b);
c03=cos(c);
si1=sin(a);
si2=sin(b);
si3=sin(c);
a=(j+1)*e; /* angle for the next j (0-based loop index) */
for (i=j;i<n;i+=n1){
i1=i+n2;
i2=i1+n2;
i3=i2+n2;
r1=x[i]+x[i2];
r3=x[i]-x[i2];
s1=y[i]+y[i2];
s3=y[i]-y[i2];
r2=x[i1]+x[i3];
r4=x[i1]-x[i3];
s2=y[i1]+y[i3];
s4=y[i1]-y[i3];
x[i]=r1+r2;
r2=r1-r2;
r1=r3-s4;
r3=r3+s4;
y[i]=s1+s2;
s2=s1-s2;
s1=s3+r4;
s3=s3-r4;
x[i1]=c01*r3+si1*s3;
y[i1]=c01*s3-si1*r3;
x[i2]=c02*r2+si2*s2;
y[i2]=c02*s2-si2*r2;
x[i3]=c03*r1+si3*s1;
y[i3]=c03*s1-si3*r1;
}
}
}
/* digit reverse counter */

j=0; /* 0-based indices, as in the radix-2 listing */
n1=n-1;
for (i=0;i<n1;++i) {
if (i>j) goto g1;
r1=x[j];
x[j]=x[i];
x[i]=r1;
r1=y[j];

y[j]=y[i];
y[i]=r1;
g1: k=n/4;
g2: if (k*3>j) goto g3;
j=j-k*3;
k=k/4;
goto g2;
g3: j=j+k;
}
ieee30(&x,4);
ieee30(&y,4);
flag=0;
while(1);
}
/* HPRG4_4.C */
/* Cooley-Tukey FFT Radix-2 DIF and DIT */
/* x array of real part of input length >=N */
/* y array of imaginary part of input length >=N */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <conio.h>
#define N 4 /* since N=2**m */
int m=2;
float xx[N],yy[N];
float x[N]={1,2,3,4}, /* Sample Input data */
y[N]={0,0,0,0};
long ch_addr, x_addr,flag_addr,xx_addr,y_addr,yy_addr,m_addr;

FILE *ofp;
void main()
{
int status,ch;
int k,i;
printf("\n\n\t***FFT Radix-2 Calculations (Sample program)***\n\n");

printf("The input data is....\n");
for (i=0;i<N;i++) printf("%f %f\n",x[i],y[i]);
printf("Enter 1 for DIF FFT or 2 for DIT FFT= ");
scanf("%d",&ch);
if (ch==1) printf("\n\nYou have selected Radix-2 DIF\n\n");
if (ch==2) printf("\n\nYou have selected Radix-2 DIT\n\n");
dsp_select(1);
exit(1);
}

dsp_reset();
ch_addr=dsp_get_laddr("_ch");
m_addr=dsp_get_laddr("_m");
dsp_dl_long_array(x_addr,N,(long*)x);
dsp_dl_long_array(y_addr,N,(long*)y);
dsp_dl_int_array(ch_addr,1,(int*)&ch);
dsp_dl_int_array(m_addr,1,(int*)&m);
status=1;
while(1){
}
dsp_up_long_array(xx_addr,N,(long*)&xx);
dsp_up_long_array(yy_addr,N,(long*)&yy);
printf("\n");
for (k=0;k<N;++k){
printf("%d \t%f \t%f \n",k, xx[k], yy[k]);
}
ofp=fopen("fftdtdf.dat","w");
printf("The result was written in file 'fftdtdf.dat'....\n");
for (i=0;i<N;++i)
fprintf(ofp, "%f %f\n", xx[i],yy[i]);
fclose(ofp);
}
#define N 4
int ch,flag=0;
int n2,k,n1,j,l,i,m;
float e,a,c,s,xt,yt;
float x[N],y[N],xx[N],yy[N];
main()
{
init_c40();
while (flag!=1);
dsp30(x,N);
dsp30(y,N);

if (ch==1) { /* FFT Radix-2 DIF */
n2=N;
for (k=0;k<m;++k){
n1=n2;
n2=n2/2;
e=6.283185307179586/n1; /* 2*pi/n1 */
a=0;
for (j=0;j<n2;++j){
c=cos(a);
s=sin(a);
a=(j+1)*e;
for (i=j;i<N;i=i+n1) {
l=i+n2;
xt=x[i]-x[l];
x[i]=x[i]+x[l];
yt=y[i]-y[l];
y[i]=y[i]+y[l];
x[l]=c*xt+s*yt;
y[l]=c*yt-s*xt;
}
}
}
/* digit Reverse counter */
j=0;
for (i=0;i<N-1;++i){
if (i>j) goto a1;
else {
xt=x[j];
x[j]=x[i];
x[i]=xt;
xt=y[j];
y[j]=y[i];
y[i]=xt;
}
a1: k=N/2;
a2: if (k>j) goto a3;

else {
j=j-k;
k=k/2;
goto a2;
}
a3: j=j+k;
}
}
else if (ch==2) { /* FFT Radix-2 DIT */
/* Digit Reverse counter */

j=0;
for (i=0;i<N-1;++i){
if (i>j) goto l1;
else {
xt=x[j];
x[j]=x[i];
x[i]=xt;
xt=y[j];
y[j]=y[i];
y[i]=xt;
}

l1: k=N/2;
l2: if (k>j) goto l3;

else {
j=j-k;
k=k/2;
goto l2;
}
l3: j=j+k;
}
/* Main FFT loops */
n1=1;
for (k=0;k<m;++k){
n2=n1;
n1=n2*2;
e=6.283185307179586/n1; /* 2*pi/n1 */
a=0;
for (j=0;j<n2;++j){
c=cos(a);
s=sin(a);
a=(j+1)*e;
for (i=j;i<N;i=i+n1) {
l=i+n2;
xt=c*x[l]+s*y[l];
yt=c*y[l]-s*x[l];
x[l]=x[i]-xt;
x[i]=x[i]+xt;
y[l]=y[i]-yt;
y[i]=y[i]+yt;
}
}
}
}
for (j=0;j<N;++j) {
xx[j]=x[j];
yy[j]=y[j];
}
ieee30(xx,N);
ieee30(yy,N);
flag=0;
while(1);
}
• Host file hprg4_5.c
/* HPRG4_5.C */
/* FFT Computation */
/* Real Data
N=power of 2
Storage N+2 real */
#include <stdio.h>
#include <stdlib.h>
#include <conio.h>

#define nn 8
float x[10] = { -6.0, -2.0, 0.0, 2.0, 4.0, 6.0, 3.0, -1.0, 0.0, 0.0};
long x_addr, flag_addr,xx_addr;
float xx[nn+2]; /* N+2 reals, i.e. N/2+1 complex values */
void main()
{
int status,m;
printf("\n\nFFT Real sequence\n\n");
dsp_select(1);
exit(1);
}
dsp_reset();
dsp_dl_long_array(x_addr,10,(long*)x);
status=1;
while(1){
}
dsp_up_long_array(xx_addr,nn+2,(long *)&xx);
printf("M REAL IMAGINARY\n");
for (m = 0 ; m <= nn+1 ; m += 2)

{
printf("%d %f %f\n", m / 2, xx[m], xx[m+1]);
}
}
/* DPRG4_5.C */
/* Real Sequence */
#define PI 3.14159265358979323846
static long neg_i1 = -1;
typedef struct {
float r, i;
} complex;

long nn=8;
float x[10];
float xx[10]; /* nn+2 reals; the copy loop below writes xx[0..nn+1] */
long m;
int flag=0;
main()
{
int i;
init_c40();
while (flag!=1);
dsp30(x,10);
spfftr(x, &nn);
for (i=0;i<=nn+1;++i){
xx[i]=x[i];
}
ieee30(xx,nn+2);
flag=0;
while(1);
}
/* spfftr---N=2**K samples--N must be at least 4 and must be power of 2 */

void spfftr(complex *x, long *n)
{
void r_cnjg();
void spfftc();
long m, tmp_int;
complex u, tmp, tmp_complex;
float tpn, tmp_float;
tpn = (float) (2.0 * PI / (double) *n);
tmp_int = *n / 2;
spfftc(x, &tmp_int, &neg_i1);
x[*n / 2].r = x[0].r;

x[*n / 2].i = x[0].i;
for (m = 0 ; m <= (*n / 4) ; ++m)

{
u.r = (float) sin((double) m * tpn);
u.i = (float) cos((double) m * tpn);
r_cnjg(&tmp_complex, &x[*n / 2 - m]);
tmp.r = (((1.0 + u.r) * x[m].r - u.i * x[m].i)

+ (1.0 - u.r) * tmp_complex.r - -u.i * tmp_complex.i) / 2.0;
tmp.i = (((1.0 + u.r) * x[m].i + u.i * x[m].r)

+ (1.0 - u.r) * tmp_complex.i + -u.i * tmp_complex.r) / 2.0;
tmp_float = ((1.0 - u.r) * x[m].r - -u.i * x[m].i

+ (1.0 + u.r) * tmp_complex.r - u.i * tmp_complex.i) / 2.0;
x[m].i = ((1.0 - u.r) * x[m].i + -u.i * x[m].r
+ (1.0 + u.r) * tmp_complex.i + u.i * tmp_complex.r) / 2.0;
x[m].r = tmp_float;
r_cnjg(&x[*n / 2 - m], &tmp);

}
return;
}
/* fft of N=2**K complex data points ,using time decomposition with

input bit reversal */
void spfftc(complex *x, long *n, long *isign)

{
void complex_exp();
long i, l, m, mr,tmp_int;
complex t, tmp_complex, tmp;
float pisign;
pisign = (float) ((double) *isign * PI);
mr = 0;
for (m = 1 ; m < *n ; ++m)

{
l = *n;
l /= 2;
while (mr + l >= *n)

{
l /= 2;
}
mr = mr % l + l;
if (mr > m)
{
t.r = x[m].r;
t.i = x[m].i;
x[m].r = x[mr].r;
x[m].i = x[mr].i;
x[mr].r = t.r;
x[mr].i = t.i;
}
}
l = 1;
while (l < *n)

{
for (m = 0 ; m < l ; ++m)
{
tmp_int = l * 2;
for (i = m ; tmp_int < 0 ? i >= (*n - 1) : i < *n ;

i += tmp_int)
{
tmp.r = 0.0;

tmp.i = (float) m * pisign / (float) l;
complex_exp(&tmp_complex, &tmp);
t.r = x[i + l].r * tmp_complex.r - x[i + l].i * tmp_complex.i;

t.i = x[i + l].r * tmp_complex.i + x[i + l].i * tmp_complex.r;
x[i + l].r = x[i].r - t.r;

x[i + l].i = x[i].i - t.i;
x[i].r = x[i].r + t.r;

x[i].i = x[i].i + t.i;
}
}
l *= 2;
}
return;
}
/* complex conjugate */
void r_cnjg(complex *r, complex *z)
{
r->r = z->r;
r->i = -z->i;
}
/* complex exponential */
void complex_exp(complex *r, complex *z)
{
double expx;
expx = exp((double) z->r);
r->r = (float) expx * cos((double) z->i);

r->i = (float) expx * sin((double) z->i);
}
/* HPRG4_6.C */
/* FFT Radix-8 Computation */
#include <stdio.h>
#include <stdlib.h>
#include <conio.h>
#define nn 8
float x[nn]={ 1,2,3}, y[nn]={0,0,0},xx[nn],yy[nn];

long x_addr, y_addr,flag_addr,xx_addr,yy_addr;
FILE *ofp;
void main()
{
int status,i;
printf("\n\nFFT Radix-8\n\n");

dsp_select(1);
exit(1);
}
dsp_reset();
dsp_dl_long_array(x_addr,nn,(long*)x);
dsp_dl_long_array(y_addr,nn,(long*)y);
status=1;
while(1){
}
dsp_up_long_array(xx_addr,nn,(long *)&xx);
dsp_up_long_array(yy_addr,nn,(long *)&yy);
ofp=fopen("hprg4_6.dat","w");
printf("\n\n");
printf("Real Imaginary\n");
for (i=0;i<nn;++i) {
printf("%f %f\n",xx[i],yy[i]);
fprintf(ofp, "%f %f\n", xx[i],yy[i]);
}
fclose(ofp);
printf("The data was saved in hprg4_6.dat\n");
}
/* dprg4_6.c---FFT Radix-8 */
#define nn 8
int m,n,n2,n1,j,k,i,jj;
float co2,co3,co4,co5,co6,co7,co8,si2,si3,si4,si5,si6,si7,si8;
int i1,i2,i3,i4,i5,i6,i7,i8;
float r1,r2,r3,r4,r5,r6,r7,r8,s1,s2,s3,s4,s5,s6,s7,s8,t1,t2;
float x[nn], y[nn],c81,e1,a,b,c,d,e,f,g,t,xx[nn],yy[nn]; /* angles a..g are floats */
int flag=0;
main()
{

init_c40();
while (flag!=1);
dsp30(x,nn);
dsp30(y,nn);
m=1;
n=pow(8,m);
c81=0.707106778;
n2=n;
for (k=0;k<m;++k) {
n1=n2;
n2=n2/8;
e1=6.283185307179586/n1;
a=0;
for (j=0;j<n2;++j) {
b=a+a;
c=a+b;
d=a+c;
e=a+d;
f=a+e;
g=a+f;
co2=cos(a);
co3=cos(b);
co4=cos(c);
co5=cos(d);
co6=cos(e);
co7=cos(f);
co8=cos(g);
si2=sin(a);
si3=sin(b);
si4=sin(c);
si5=sin(d);
si6=sin(e);
si7=sin(f);
si8=sin(g);
a=(j+1)*e1;
for (i1=j;i1<n;i1=i1+n1) {
i2=(i1)+n2;
i3=(i2)+n2;
i4=(i3)+n2;
i5=(i4)+n2;
i6=(i5)+n2;
i7=(i6)+n2;
i8=(i7)+n2;
r1=x[i1]+x[i5];
r5=x[i1]-x[i5];
r2=x[i2]+x[i6];
r6=x[i2]-x[i6];
r3=x[i3]+x[i7];
r7=x[i3]-x[i7];
r4=x[i4]+x[i8];
r8=x[i4]-x[i8];
t1=r1-r3;
r1=r1+r3;
r3=r2-r4;
r2=r2+r4;
x[i1]=r1+r2;
r2=r1-r2;
s1=y[i1]+y[i5];

s5=y[i1]-y[i5];
s2=y[i2]+y[i6];
s6=y[i2]-y[i6];
s3=y[i3]+y[i7];
s7=y[i3]-y[i7];
s4=y[i4]+y[i8];
s8=y[i4]-y[i8];
t2=s1-s3;
s1=s1+s3;
s3=s2-s4;
s2=s2+s4;
y[i1]=s1+s2;
s2=s1-s2;
r1=t1+s3;
t1=t1-s3;
s1=t2-r3;
t2=t2+r3;
x[i5]=co5*r2+si5*s2;
y[i5]=co5*s2-si5*r2;
x[i7]=co7*t1+si7*t2;
y[i7]=co7*t2-si7*t1;
r1=(r6-r8)*c81;
r6=(r6+r8)*c81;
s1=(s6-s8)*c81;
s6=(s6+s8)*c81;
t1=r5-r1;
r5=r5+r1;
r8=r7-r6;
r7=r7+r6;
t2=s5-s1;
s5=s5+s1;
s8=s7-s6;
s7=s7+s6;
r1=r5+s7;
r5=r5-s7;
r6=t1+s8;
t1=t1-s8;
s1=s5-r7;
s5=s5+r7;
s6=t2-r8;
t2=t2+r8;
x[i4]=co4*t1+si4*t2;
y[i4]=co4*t2-si4*t1;
}
}
}
/* Digit Reverse Counter */

j=0; /* 0-based indices, as in the radix-2 listing */
for (i=0;i<n-1;++i){
if (i>j) goto a1;
else {
t1=x[j];

x[j]=x[i];
x[i]=t1;
t1=y[j];
y[j]=y[i];
y[i]=t1;
}
a1: k=n/8;
a2: if (k*7>j) goto a3;

else {
j=j-k*7;
k=k/8;
goto a2;
}
a3: j=j+k;
}
for (jj=0;jj<nn;++jj) {
xx[jj]=x[jj];
yy[jj]=y[jj];
}
ieee30(&xx,nn);
ieee30(&yy,nn);
flag=0;
while(1);
}
/* HPRG4_7.C */
/* Inverse FFT Computation */
/* Real Data
N=power of 2
Storage N+2 real */
#include <stdio.h>
#include <stdlib.h>
#include <conio.h>
#define nn 8
float x[10] = { -6.0, -2.0, 0.0, 2.0, 4.0, 6.0, 3.0, -1.0, 0.0, 0.0 };
long x_addr, flag_addr,xx_addr;
float xx[nn];
void main()
{
int status,m;
printf("\n\n\t Inverse FFT of Real sequence\n\n");
printf("the input sequence is...\n");
for (m=0;m<10;m++) printf("\t%f\n",x[m]);
dsp_select(1);
exit(1);
}

dsp_reset();
dsp_dl_long_array(x_addr,10,(long*)x);
status=1;
while(1){
}
dsp_up_long_array(xx_addr,nn,(long *)&xx);
printf("The Inverse FFT of the input sequence is..\n");
for (m = 0 ; m < nn ; m ++)

{
printf("%d %.2f \n", m , xx[m]);
}
}
/* DPRG4_7.C */
/* Inverse FFT--Real Sequence */
#define pi 3.14159265358979323846

static long pos_i1 = 1;
typedef struct {
float r, i;
} complex;
long nn=8;
float x[10];
float xx[10]; /* the copy loop in main writes xx[0..nn+1] */
long m;
int flag=0;
main()
{
long i;
init_c40();
while (flag!=1);
dsp30(x,10);
spfftr(x, &nn);
spiftr(x, &nn);

for (i=0;i<=nn+1;++i){
xx[i]=x[i];
}
ieee30(xx,nn+1);
flag=0;
while(1);
}

{
void r_cnjg();
void spfftc();
long m, tmp_int;
tpn = (float) (2.0 * pi / (double) *n);
tmp_int = *n / 2;
x[*n / 2].r = x[0].r;

x[*n / 2].i = x[0].i;
for (m = 0 ; m <= (*n / 4) ; ++m)

{
tmp.r = (((1.0 + u.r) * x[m].r - u.i * x[m].i)

tmp.i = (((1.0 + u.r) * x[m].i + u.i * x[m].r)


x[m].i = ((1.0 - u.r) * x[m].i + -u.i * x[m].r
x[m].r = tmp_float;
r_cnjg(&x[*n / 2 - m], &tmp);

}
return;
}
void spiftr(complex *x, long *n)

{
void r_cnjg();
void spfftc();
long m, tmp_int;
complex u, tmp_complex, tmp;

for (m = 0 ; m <= (*n / 4) ; ++m)
{
u.i = (float) -cos((double) m * tpn);
tmp.r = ((1.0 + u.r) * x[m].r - u.i * x[m].i)

+ ((1.0 - u.r) * tmp_complex.r - -u.i * tmp_complex.i);
tmp.i = ((1.0 + u.r) * x[m].i + u.i * x[m].r)
+ ((1.0 - u.r) * tmp_complex.i + -u.i * tmp_complex.r);
tmp_float = ((1.0 - u.r) * x[m].r - -u.i * x[m].i)

+ ((1.0 + u.r) * tmp_complex.r - u.i * tmp_complex.i);
x[m].i = ((1.0 - u.r) * x[m].i + -u.i * x[m].r)
+ ((1.0 + u.r) * tmp_complex.i + u.i * tmp_complex.r);
x[m].r = tmp_float;
r_cnjg(&x[*n / 2 - m], &tmp);

}
tmp_int = *n / 2;
spfftc(x, &tmp_int, &pos_i1);
return;
}

{
r->r = z->r;
r->i = -z->i;
}

{
void complex_exp();
float pisign;
pisign = (float) ((double) *isign * pi);
mr = 0;
for (m = 1 ; m < *n ; ++m)

{
l = *n;
l /= 2;

{
l /= 2;
}
mr = mr % l + l;
if (mr > m)

{
t.r = x[m].r;
t.i = x[m].i;
x[m].r = x[mr].r;
x[m].i = x[mr].i;
x[mr].r = t.r;
x[mr].i = t.i;
}
}
l = 1;
while (l < *n)

{
for (m = 0 ; m < l ; ++m)
{
tmp_int = l * 2;

i += tmp_int)
{
tmp.r = 0.0;

x[i + l].r = x[i].r - t.r;

x[i + l].i = x[i].i - t.i;
x[i].r = x[i].r + t.r;

x[i].i = x[i].i + t.i;
}
}
l *= 2;
}
return;
}
{
double expx;

}
/* HPRG4_8.C */
/* FFT Computation */
/* Complex data
N=power of 2
Storage N complex */

#include <stdio.h>
#include <stdlib.h>
#include <conio.h>
#define nn 4
typedef struct {
float r, i;
} complex;
complex x1[4] = { {2.0, 1.0 },{0.0, 2.0 },

{1.0, 1.0 },{-1.0, 0.0 } };
long x1_addr, flag_addr,x2_addr;

complex x2[4];
void main()
{
int status,m;
printf("\n\n\t FFT of Complex sequence\n\n");
dsp_select(1);
exit(1);
}
dsp_reset();
x1_addr=dsp_get_laddr("_x1");
x2_addr=dsp_get_laddr("_x2");
dsp_dl_long_array(x1_addr,nn*2,(long*)x1);
status=1;
while(1){
}
dsp_up_long_array(x1_addr,nn*2,(long *)&x1);
dsp_up_long_array(x2_addr,nn*2,(long *)x2);
printf("The FFT of the complex input sequence is..\n");
for (m = 0 ; m <= 3 ; m++)

{
printf("%d %4.4f %4.4f %4.4f %4.4f\n",
m, x1[m].r, x1[m].i, x2[m].r, x2[m].i);
}

• ‘C40 dprg4_8.c
/* DPRG4_8.C */
/* FFT--Complex Sequence */
#define pi 3.14159265358979323846

typedef struct {
float r, i;
} complex;
complex x1[4];
complex x2[4];
long m;
int flag=0;
main()
{
init_c40();
while (flag!=1);
dsp30(x1,4*2);
spfftc(x1, &pos_i4, &neg_i1);
for (m = 0 ; m <= 3 ; ++m)

{
x2[m].r = x1[m].r;
x2[m].i = x1[m].i;
}
spfftc(x1, &pos_i4, &pos_i1);
ieee30(x1,4*2);
ieee30(x2,4*2);
flag=0;
while(1);
}

{
void complex_exp();
float pisign;
mr = 0;
for (m = 1 ; m < *n ; ++m)

{
l = *n;
l /= 2;

{
l /= 2;
}
mr = mr % l + l;
if (mr > m)
{
t.r = x[m].r;
t.i = x[m].i;
x[m].r = x[mr].r;
x[m].i = x[mr].i;
x[mr].r = t.r;
x[mr].i = t.i;
}
}
l = 1;
while (l < *n)

{
for (m = 0 ; m < l ; ++m)
{
tmp_int = l * 2;

i += tmp_int)
{
tmp.r = 0.0;

x[i + l].r = x[i].r - t.r;

x[i + l].i = x[i].i - t.i;
x[i].r = x[i].r + t.r;

x[i].i = x[i].i + t.i;
}
}
l *= 2;
}
return;
}

{
double expx;

}

/* HPRG4_9.C */
/* Calculates Correlation based on FFT */
#include <stdio.h>
#include <stdlib.h>
#include <conio.h>
#include <math.h>
long x_addr, flag_addr, y_addr;
int p1 = 1;
int p2 = 300;
int p3 = 2049;
int error, k, number_points;

long seed;
float x[2050];
double sprand(long *seed);

void save_file(float *y, int *number_points, int *number_plots, char *name);
void main()
{
int status;
printf("\n\n\n*** CORRELATION/AUTOCORRELATION CALCULATION BASED ON FFT ***\n\n\n");

printf("AutoCorrelation of 1500 samples of SineWave in Noise\n\n");
seed = 100;
for (k = 0 ; k <= 1499 ; ++k)

{
x[k] = (float) (sin(2.0 * M_PI * (double) k / 25.0)
+ sqrt(60.0) * (sprand(&seed) - 0.5));
}
for (k = 1500 ; k <= 2047 ; ++k)

{
x[k] = 0.0;
}
number_points = 300;
save_file(x,&number_points,&p1,"hprg4_9i");
printf("The input data was saved in hprg4_9i.dat\n");
dsp_select(1);
exit(1);
}
dsp_reset();

dsp_dl_long_array(x_addr,p3,(long*)&x);
status=1;
while(1){
}
printf("\nThe data is being uploaded..\n");
dsp_up_long_array(x_addr,300,(long*)x);
save_file(x,&number_points,&p1,"hprg4_9o");
printf("The output data was saved in hprg4_9o.dat\n");
}
double sprand(long *seed)

{
double ret_val;
*seed = 2045 * *seed + 1;

*seed -= (*seed / 1048576) * 1048576;
ret_val = (double) ((*seed + 1) / 1048577.0);

return(ret_val);
}
void save_file(float *y, int *number_points, int *number_plots, char *name)

{
FILE *fp;
char filename[32]; /* room for the name, ".DAT" and the terminator */
int i, j;
sprintf(filename,"%s.DAT",name);
fp = fopen(filename,"w+");
for ( i = 0 ; i < *number_points ; i++ ) {

for ( j = *number_plots-1 ; j >= 0 ; j-- ) {
fprintf(fp,"%15.7E", y[i + (j * *number_points)]);
}
fprintf(fp,"\n");
}
fclose(fp);
}
/* DPRG4_9.C */
/* Correlation/Autocorrelation */
#define pi 3.14159265358979323846
typedef struct {

float r, i;
} complex;
long i0 = 0;
long p1 = 1;
long P2 = 300;
long P3 = 2049;
void spcorr();
long error, k, number_points;
float x[2050];
int flag=0;
main()
{
int i;
init_c40();
while (flag!=1);
dsp30(x,2050);
spcorr(x, x, &P3, &i0, &P2, &error);
ieee30(x,300);
flag=0;
while(1);
}
void spcorr(float *x, float *y, long *l, long *type, long *nmax, long *error)
{
void spfftr(), spiftr();
long j, k, m, n;
complex cx;
float test;
n = *l - 1;
if (*nmax < 0 || *nmax >= n)
{
*error = 2;
return;
}
test = (float) n;
test /= 2.0;
while ((test - 2.0) > 0.0)

{
test /= 2.0;
}
if ((test - 2.0) == 0)
{
for (k = 0 ; k < n && y[k] == 0.0 ; ++k) ;
for (j = n - 1 ; j >= 0 && x[j] == 0.0 ; --j) ;

if ((n - 1 - j) < (*nmax - k))
{
*error = 3;
return;
}
spfftr(x, &n);
if (*type != 0)
{
spfftr(y, &n);
}
for (m = 0 ; m <= (n / 2) ; ++m)

{
cx.r = x[m * 2] * y[m * 2] - -x[(m * 2) + 1] * y[(m * 2) + 1];
cx.i = x[m * 2] * y[(m * 2) + 1] + -x[(m * 2) + 1] * y[m * 2];
x[m * 2] = cx.r / n;
x[(m * 2) + 1] = cx.i / n;
}
spiftr(x, &n);
*error = 0;
}
else if ((test - 2.0) < 0.0)
{
*error = 1;
}
return;
}
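The FFT-based routine spcorr can be cross-checked against a direct evaluation of the correlation sum. The sketch below is a hedged reference implementation, not one of the original files. The name corr_direct is hypothetical, and the 1/n scaling is chosen to match the division by n in spcorr, under the assumption that the sequences are zero-padded so that no circular wrap-around occurs.

```c
/* Direct correlation estimate
       r[m] = (1/n) * sum_{k=0}^{n-1-m} x[k] * y[k+m],  m = 0..nmax,
   for real sequences x and y of length n. Passing the same array for
   x and y gives the autocorrelation. Requires 0 <= nmax < n. */
void corr_direct(const double *x, const double *y, int n, int nmax, double *r)
{
    int m, k;
    for (m = 0; m <= nmax; ++m) {
        double acc = 0.0;
        for (k = 0; k + m < n; ++k)
            acc += x[k] * y[k + m];
        r[m] = acc / n;
    }
}
```

The direct sum costs on the order of n times nmax multiplications, so for long records with many lags the FFT route is faster. For short tests the direct form is a convenient reference.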

{
void r_cnjg();
void spfftc();
long m, tmp_int;
tmp_int = *n / 2;
x[*n / 2].r = x[0].r;

x[*n / 2].i = x[0].i;
for (m = 0 ; m <= (*n / 4) ; ++m)

{
tmp.r = (((1.0 + u.r) * x[m].r - u.i * x[m].i)

tmp.i = (((1.0 + u.r) * x[m].i + u.i * x[m].r)


x[m].i = ((1.0 - u.r) * x[m].i + -u.i * x[m].r
x[m].r = tmp_float;
r_cnjg(&x[*n / 2 - m], &tmp);

}
return;
}

{
void complex_exp();
float pisign;
mr = 0;
for (m = 1 ; m < *n ; ++m)

{
l = *n;
l /= 2;

{
l /= 2;
}
mr = mr % l + l;
if (mr > m)
{
t.r = x[m].r;
t.i = x[m].i;
x[m].r = x[mr].r;
x[m].i = x[mr].i;
x[mr].r = t.r;
x[mr].i = t.i;
}
}
l = 1;
while (l < *n)

{
for (m = 0 ; m < l ; ++m)
{
tmp_int = l * 2;

i += tmp_int)
{
tmp.r = 0.0;


x[i + l].r = x[i].r - t.r;

x[i + l].i = x[i].i - t.i;
x[i].r = x[i].r + t.r;

x[i].i = x[i].i + t.i;
}
}
l *= 2;
}
return;
}
void spiftr(complex *x, long *n)

{
void r_cnjg();
void spfftc();
long m, tmp_int;
complex u, tmp_complex, tmp;
for (m = 0 ; m <= (*n / 4) ; ++m)

{
u.i = (float) -cos((double) m * tpn);
tmp.r = ((1.0 + u.r) * x[m].r - u.i * x[m].i)

+ ((1.0 - u.r) * tmp_complex.r - -u.i * tmp_complex.i);
tmp.i = ((1.0 + u.r) * x[m].i + u.i * x[m].r)
+ ((1.0 - u.r) * tmp_complex.i + -u.i * tmp_complex.r);
tmp_float = ((1.0 - u.r) * x[m].r - -u.i * x[m].i)

+ ((1.0 + u.r) * tmp_complex.r - u.i * tmp_complex.i);
x[m].i = ((1.0 - u.r) * x[m].i + -u.i * x[m].r)
+ ((1.0 + u.r) * tmp_complex.i + u.i * tmp_complex.r);
x[m].r = tmp_float;
r_cnjg(&x[*n / 2 - m], &tmp);

}
tmp_int = *n / 2;
spfftc(x, &tmp_int, &p1);
return;
}

{
double expx;

}

{
r->r = z->r;
r->i = -z->i;
}
/* HPRG41_0.C */
/* Convolution based on FFT */
#include <stdio.h>
#include <stdlib.h>
#include <conio.h>
#include <math.h>

int i0 = 0;
int p1 = 1;
int p2 = 2049;
float coeff[8] = { 8.0, 4.0, 7.0, 2.0, 6.0, 3.0, 1.0, 1.0 };
void save_file(float *y, int *number_points, int *number_plots, char *name);
void main()
{
int status;
int k,number_points;
static float x[2050], y[2050];
printf("\n\n\n*** CONVOLUTION CALCULATION BASED ON FFT ***\n\n\n");

printf("Input Signal is a damped sine wave\n\n");
for (k = 0 ; k <= 2047 ; ++k)

{
x[k] = (float) (exp(-k / 50.0) *
sin(2.0 * M_PI * (double) k / 50.0));
if (k > 400)
{
x[k] = 0.0;
}
y[k] = 0.0;
/* System's impulse response */
if (k <= 700 && k % 100 == 0)
{
y[k] = coeff[k / 100];
}
}

number_points = 1000;
save_file(x,&number_points,&p1,"inp1");
save_file(y,&number_points,&p1,"inp2");
dsp_select(1);
exit(1);
}
dsp_reset();
dsp_dl_long_array(x_addr,p2,(long*)&x);
dsp_dl_long_array(y_addr,p2,(long*)&y);
status=1;
while(1){
}
dsp_up_long_array(x_addr,1000,(long*)x);
save_file(x,&number_points,&p1,"prg41_0");
}
void save_file(float *y, int *number_points, int *number_plots, char *name)

{
FILE *fp;
char filename[32]; /* room for the name, ".DAT" and the terminator */
int i, j;
sprintf(filename,"%s.DAT",name);
fp = fopen(filename,"w+");
for ( i = 0 ; i < *number_points ; i++ ) {

for ( j = *number_plots-1 ; j >= 0 ; j-- ) {
fprintf(fp,"%15.7E", y[i + (j * *number_points)]);
}
fprintf(fp,"\n");
}
fclose(fp);
}

/* DPRG41_1.C */
#define pi 3.14159265358979323846
#define ABS(x) ((x) >= 0 ? (x) : -(x))

#define MIN(a,b) ((a) <= (b) ? (a) : (b))
#define MAX(a,b) ((a) >= (b) ? (a) : (b))
typedef struct {
float r, i;
} complex;

long pos_i1 = 1;
long pos_i16 = 16;
long pos_i31 = 31;
float f0 = 0.0;
long k, error, nsgmts, number_points;

float work[34], x[32], y[17];
int flag=0;
main()
{
init_c40();
while (flag!=1);
dsp30(x,32);
dsp30(&f0,1); /* pass the address of f0 */
sppowr(x, y, work, &pos_i31, &pos_i16, &pos_i1, &f0, &nsgmts, &error);
ieee30(y,17);
flag=0;
while(1);
}
void sppowr(float *x, float *y, float *work, long *lx, long *ly, long *iwindo, float *ovrlap, long *nsgmts, long
*error)
{
void spmask(), spfftr();
long m, nsamp, tmp_int, isegmt, nshift;

float base, tsv;
if ((*lx + 1) < (*ly * 2))

{
*error = 2;
return;
}
base = (float) (*ly);

base /= 2.0;
while ((base - 2.0) > 0.0)

{
base /= 2.0;
}
if ((base - 2.0) == 0)
{
for (m = 0 ; m <= *ly ; ++m)

{
y[m] = 0.0;
}
nshift = MIN(MAX((long) ((*ly * 2) * (1.0 - *ovrlap) + 0.5), 1),

*ly * 2);
*nsgmts = 1 + (*lx + 1 - (*ly * 2)) / nshift;
for (isegmt = 0 ; isegmt < *nsgmts ; ++isegmt)

{
for (nsamp = 0 ; nsamp < (*ly * 2) ; ++nsamp)
{
work[nsamp] = x[nshift * isegmt + nsamp];
}
tmp_int = (*ly * 2) - 1;
spmask(work, &tmp_int, iwindo, &tsv, error);
if (*error != 0)
{
return;
}
tmp_int = *ly * 2;
spfftr(work, &tmp_int);
for (m = 0 ; m <= *ly ; ++m)

{
y[m] += (work[m * 2] * work[m * 2] + work[(m * 2) + 1]
* work[(m * 2) + 1]) / (tsv * *nsgmts);
}
}
}
else if ((base - 2.0) < 0.0)
{
*error = 3;
}
return;
}
void spmask(float *x, long *lx, long *type, float *tsv, long *error)
{
double spwndo();
long k, tmp_int;
double w;
if (*type < 1 || *type > 6)

{
*error = 1;
return;
}
*tsv = 0.0;
for (k = 0 ; k <= *lx ; ++k)
{
tmp_int = 1 + *lx;
w = spwndo(type, &tmp_int, &k);
x[k] *= (float) w;
*tsv += (float) (w * w);

}
*error = 0;
return;
}

{
void r_cnjg();
void spfftc();
long m, tmp_int;
tmp_int = *n / 2;
x[*n / 2].r = x[0].r;

x[*n / 2].i = x[0].i;
for (m = 0 ; m <= (*n / 4) ; ++m)

{
tmp.r = (((1.0 + u.r) * x[m].r - u.i * x[m].i)

tmp.i = (((1.0 + u.r) * x[m].i + u.i * x[m].r)


x[m].i = ((1.0 - u.r) * x[m].i + -u.i * x[m].r
x[m].r = tmp_float;
r_cnjg(&x[*n / 2 - m], &tmp);

}
return;
}

{
void complex_exp();
float pisign;
mr = 0;
for (m = 1 ; m < *n ; ++m)

{
l = *n;

l /= 2;

{
l /= 2;
}
mr = mr % l + l;
if (mr > m)
{
t.r = x[m].r;
t.i = x[m].i;
x[m].r = x[mr].r;
x[m].i = x[mr].i;
x[mr].r = t.r;
x[mr].i = t.i;
}
}
l = 1;
while (l < *n)

{
for (m = 0 ; m < l ; ++m)
{
tmp_int = l * 2;

i += tmp_int)
{
tmp.r = 0.0;

x[i + l].r = x[i].r - t.r;

x[i + l].i = x[i].i - t.i;
x[i].r = x[i].r + t.r;

x[i].i = x[i].i + t.i;
}
}
l *= 2;
}
return;
}

{
double expx;


}
#ifndef KR
void r_cnjg(complex *r, complex *z)
#else
void r_cnjg(r, z)
complex *r, *z;
#endif
{
r->r = z->r;
r->i = -z->i;
}
/* Value of window sample k (0 <= k < n) for the selected window type:
   1 rectangular, 2 tapered rectangular, 3 triangular,
   4 Hanning, 5 Hamming, 6 Blackman.
   Returns 0.0 if type or k is out of range. */
double spwndo(long *type, long *n, long *k)
{
    long l;
    double ret_val;

    if ((*type < 1 || *type > 6) || (*k < 0 || *k >= *n))
    {
        ret_val = 0.0;
        return(ret_val);
    }
    ret_val = 1.0;
    switch (*type)
    {
    case 1:
        break;
    case 2:
        /* Cosine taper over the first and last tenth of the window. */
        l = (*n - 2) / 10;
        if (*k <= l)
        {
            ret_val = 0.5 * (1.0 - cos((double) *k * pi
                      / ((double) l + 1.0)));
        }
        if (*k > (*n - l - 2))
        {
            ret_val = 0.5 * (1.0 - cos((double) (*n - *k - 1) * pi
                      / ((double) l + 1.0)));
        }
        break;
    case 3:
        ret_val = 1.0 - ABS(1.0 - (double) (*k * 2) / ((double) *n - 1.0));
        break;
    case 4:
        ret_val = 0.5 * (1.0 - cos((double) (*k * 2) * pi
                  / ((double) *n - 1.0)));
        break;
    case 5:
        ret_val = 0.54 - 0.46 * cos((double) (*k * 2) * pi
                  / ((double) *n - 1.0));
        break;
    case 6:
        ret_val = 0.42 - 0.5 * cos((double) (*k * 2) * pi
                  / ((double) *n - 1.0))
                  + 0.08 * cos((double) (*k * 4) * pi
                  / ((double) *n - 1.0));
        break;
    }
    return(ret_val);
}
• Host file hprg41_1.c
/* HPRG41_1.C */
/* Periodogram example */
#include <stdio.h>
#include <stdlib.h>
#include <conio.h>
#include <math.h>
FILE *ifp ,*ofp;

void main()
{
int status,k;
float x[32],y[17];
ifp=fopen("sigin.dat","w");
ofp=fopen("sigout.dat","w");
for (k = 0 ; k <=31 ; ++k)
{
x[k] = (float) sin(2.0 * M_PI * (double) k / 8.0);
fprintf(ifp,"%f\n",x[k]);
}
fclose(ifp);
dsp_select(1);
exit(1);
}
dsp_reset();

dsp_dl_long_array(x_addr,32,(long*)&x);
status=1;
while(1){
}
dsp_up_long_array(y_addr,17,(long*)&y);
for (k=0;k<=16;++k) {
printf("%f\n",y[k]);
fprintf(ofp,"%f\n",y[k]);
}
fclose(ofp);
}

• DSP file dprg41_1.c
/* DPRG41_1.C */
#include <math.h>
#define pi 3.14159265358979323846
#define ABS(x) ((x) >= 0 ? (x) : -(x))

#define MIN(a,b) ((a) <= (b) ? (a) : (b))
#define MAX(a,b) ((a) >= (b) ? (a) : (b))
typedef struct {
float r, i;
} complex;

long pos_i1 = 1;
long pos_i16 = 16;
long pos_i31 = 31;
float f0 = 0.0;
long k, error, nsgmts, number_points;

float work[34], x[32], y[17];
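volatile int flag = 0;    /* changed from outside this program, so it must not be cached */
main()
{
    init_c40();
    while (flag != 1);    /* wait until the input data is in place */
    dsp30(x, 32);
    dsp30(&f0, 1);
    sppowr(x, y, work, &pos_i31, &pos_i16, &pos_i1, &f0, &nsgmts, &error);
    ieee30(y, 17);
    flag = 0;             /* mark the results as ready */
    while (1);
}
void sppowr(float *x, float *y, float *work, long *lx, long *ly, long *iwindo,
            float *ovrlap, long *nsgmts, long *error)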
{
void spmask(), spfftr();
long m, nsamp, tmp_int, isegmt, nshift;

float base, tsv;
if ((*lx + 1) < (*ly * 2))

{
*error = 2;
return;
}
base = (float) (*ly);

base /= 2.0;
while ((base - 2.0) > 0.0)

{
base /= 2.0;
}
if ((base - 2.0) == 0)
{
for (m = 0 ; m <= *ly ; ++m)
{
y[m] = 0.0;
}
nshift = MIN(MAX((long) ((*ly * 2) * (1.0 - *ovrlap) + 0.5), 1),

*ly * 2);
*nsgmts = 1 + (*lx + 1 - (*ly * 2)) / nshift;
for (isegmt = 0 ; isegmt < *nsgmts ; ++isegmt)

{
for (nsamp = 0 ; nsamp < (*ly * 2) ; ++nsamp)
{
work[nsamp] = x[nshift * isegmt + nsamp];
}
tmp_int = (*ly * 2) - 1;
spmask(work, &tmp_int, iwindo, &tsv, error);
if (*error != 0)
{
return;
}
tmp_int = *ly * 2;
spfftr(work, &tmp_int);
for (m = 0 ; m <= *ly ; ++m)

{

y[m] += (work[m * 2] * work[m * 2] + work[(m * 2) + 1]
* work[(m * 2) + 1]) / (tsv * *nsgmts);
}
}
}
else if ((base - 2.0) < 0.0)
{
*error = 3;
}
return;
}
void spmask(float *x, long *lx, long *type, float *tsv, long *error)
{
double spwndo();
long k, tmp_int;
double w;
if (*type < 1 || *type > 6)

{
*error = 1;
return;
}
*tsv = 0.0;
for (k = 0 ; k <= *lx ; ++k)
{
tmp_int = 1 + *lx;
w = spwndo(type, &tmp_int, &k);
x[k] *= (float) w;
*tsv += (float) (w * w);
}
*error = 0;
return;
}

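/* FFT of a real sequence of length n (n a power of 2), computed in place.
   x holds the n real samples packed as n/2 complex values and must have
   room for n/2 + 1; on return x[0..n/2] holds DFT terms 0..n/2. */
void spfftr(complex *x, long *n)
{
    void r_cnjg();
    void spfftc();
    long m, tmp_int;
    long isign = -1;    /* forward transform (assumed; the constant passed in the original is not shown) */
    complex u, tmp, tmp_complex;
    float tpn, tmp_float;

    tpn = (float) (2.0 * pi / (double) *n);

    /* Complex FFT of length n/2 on the packed data. */
    tmp_int = *n / 2;
    spfftc(x, &tmp_int, &isign);
    x[*n / 2].r = x[0].r;
    x[*n / 2].i = x[0].i;

    /* Separate the even- and odd-sample spectra, two bins per pass:
       X[m] = ((1 - u) x[m] + (1 + u) conj(x[n/2 - m])) / 2 */
    for (m = 0 ; m <= (*n / 4) ; ++m)
    {
        u.r = (float) sin((double) m * tpn);
        u.i = (float) cos((double) m * tpn);
        r_cnjg(&tmp_complex, &x[*n / 2 - m]);
        tmp.r = (((1.0 + u.r) * x[m].r - u.i * x[m].i)
                + (1.0 - u.r) * tmp_complex.r - -u.i * tmp_complex.i) / 2.0;
        tmp.i = (((1.0 + u.r) * x[m].i + u.i * x[m].r)
                + (1.0 - u.r) * tmp_complex.i + -u.i * tmp_complex.r) / 2.0;
        tmp_float = ((1.0 - u.r) * x[m].r - -u.i * x[m].i
                + (1.0 + u.r) * tmp_complex.r - u.i * tmp_complex.i) / 2.0;
        x[m].i = ((1.0 - u.r) * x[m].i + -u.i * x[m].r
                + (1.0 + u.r) * tmp_complex.i + u.i * tmp_complex.r) / 2.0;
        x[m].r = tmp_float;
        r_cnjg(&x[*n / 2 - m], &tmp);
    }
    return;
}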

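/* In-place radix-2 decimation-in-time FFT of n complex points
   (n a power of 2). isign = -1 selects the forward transform,
   isign = +1 the unscaled inverse. */
void spfftc(complex *x, long *n, long *isign)
{
    void complex_exp();
    long i, l, m, mr, tmp_int;
    complex t, tmp, tmp_complex;
    float pisign;

    pisign = (float) ((double) *isign * pi);

    /* Bit-reversal permutation: mr tracks the bit-reversed value of m. */
    mr = 0;
    for (m = 1 ; m < *n ; ++m)
    {
        l = *n;
        l /= 2;
        while (mr + l >= *n)
        {
            l /= 2;
        }
        mr = mr % l + l;
        if (mr > m)
        {
            t.r = x[m].r;
            t.i = x[m].i;
            x[m].r = x[mr].r;
            x[m].i = x[mr].i;
            x[mr].r = t.r;
            x[mr].i = t.i;
        }
    }

    /* Butterfly stages with span l = 1, 2, 4, ..., n/2. */
    l = 1;
    while (l < *n)
    {
        for (m = 0 ; m < l ; ++m)
        {
            tmp_int = l * 2;
            for (i = m ; i < *n ; i += tmp_int)
            {
                /* Twiddle factor exp(j * isign * pi * m / l). */
                tmp.r = 0.0;
                tmp.i = (float) m * pisign / (float) l;
                complex_exp(&tmp_complex, &tmp);
                t.r = x[i + l].r * tmp_complex.r - x[i + l].i * tmp_complex.i;
                t.i = x[i + l].r * tmp_complex.i + x[i + l].i * tmp_complex.r;
                x[i + l].r = x[i].r - t.r;
                x[i + l].i = x[i].i - t.i;
                x[i].r = x[i].r + t.r;
                x[i].i = x[i].i + t.i;
            }
        }
        l *= 2;
    }
    return;
}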

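/* r = exp(z) for complex z. */
void complex_exp(complex *r, complex *z)
{
    double expx;

    expx = exp((double) z->r);
    r->r = (float) (expx * cos((double) z->i));
    r->i = (float) (expx * sin((double) z->i));
}
/* r = complex conjugate of z. */
#ifndef KR
void r_cnjg(complex *r, complex *z)
#else
void r_cnjg(r, z)
complex *r, *z;
#endif
{
    r->r = z->r;
    r->i = -z->i;
}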
/* Value of window sample k (0 <= k < n) for the selected window type:
   1 rectangular, 2 tapered rectangular, 3 triangular,
   4 Hanning, 5 Hamming, 6 Blackman.
   Returns 0.0 if type or k is out of range. */
double spwndo(long *type, long *n, long *k)
{
    long l;
    double ret_val;

    if ((*type < 1 || *type > 6) || (*k < 0 || *k >= *n))
    {
        ret_val = 0.0;
        return(ret_val);
    }
    ret_val = 1.0;
    switch (*type)
    {
    case 1:
        break;
    case 2:
        /* Cosine taper over the first and last tenth of the window. */
        l = (*n - 2) / 10;
        if (*k <= l)
        {
            ret_val = 0.5 * (1.0 - cos((double) *k * pi
                      / ((double) l + 1.0)));
        }
        if (*k > (*n - l - 2))
        {
            ret_val = 0.5 * (1.0 - cos((double) (*n - *k - 1) * pi
                      / ((double) l + 1.0)));
        }
        break;
    case 3:
        ret_val = 1.0 - ABS(1.0 - (double) (*k * 2) / ((double) *n - 1.0));
        break;
    case 4:
        ret_val = 0.5 * (1.0 - cos((double) (*k * 2) * pi
                  / ((double) *n - 1.0)));
        break;
    case 5:
        ret_val = 0.54 - 0.46 * cos((double) (*k * 2) * pi
                  / ((double) *n - 1.0));
        break;
    case 6:
        ret_val = 0.42 - 0.5 * cos((double) (*k * 2) * pi
                  / ((double) *n - 1.0))
                  + 0.08 * cos((double) (*k * 4) * pi
                  / ((double) *n - 1.0));
        break;
    }
    return(ret_val);
}
