Freescale Semiconductor Application Note
Document Number: AN3666 Rev. 0, 11/2010
Software Optimization of FFTs and IFFTs Using the SC3850 Core
The Fast Fourier Transform (FFT) is a numerically efficient algorithm used to compute the Discrete Fourier Transform (DFT). The Radix2 and Radix4 algorithms are used mostly for practical applications due to their simple structures. Compared with Radix2 FFT, Radix4 FFT provides a 25% savings in multipliers. For a complex Npoint Fourier transform, the Radix4 FFT reduces the number of complex multiplications from N ^{2} to 3(N/4)log N and the number of complex additions from N ^{2} to 8(N/4)log _{4} N, where log _{4} N is the number of stages and N/4 is the number of butterflies in each stage. FFTs are of importance to a wide variety of applications, such as telecommunications (3GPPLTE, WiMAX, and so on). For example, Orthogonal Frequency Division Multiplexing (OFDM) signals are generated using the FFT algorithm.
This application note describes the implementation of the Radix4 decimationintime (DIT) FFT algorithm using the Freescale StarCore SC3850 core. The document discusses how to use new features available in the SC3850 core, such as dualmultiplier, to improve the performance of the FFT. Code optimization and performance results are also investigated in this document. Typical reference code is included in this document to demonstrate the implementation details.
4
Contents
1. Introduction 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
2 

2. Radix4 FFT Algorithm 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
4 

3. SC3850 Data Types and Instructions 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
15 

4. Implementation on the SC3850 Core 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
21 

5. Experimental Results 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
47 

6. Conclusions 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
50 

7. References 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
. 
50 
© 2008–2010 Freescale Semiconductor, Inc.
Introduction
1
1.1
Introduction
Overview
The discrete Fourier transform (DFT) plays an important role in the analysis, design, and implementation of discretetime signal processing algorithms and systems because efficient algorithms exist for the computation of the DFT. These efficient algorithms are called Fast Fourier Transform (FFT) algorithms. In terms of multiplications and additions, the FFT algorithms can be orders of magnitude more efficient than competing algorithms.
It is well known that the DFT takes N ^{2} complex multiplications and N ^{2} complex additions for complex
Npoint transform. Thus, direct computation of the DFT is inefficient. The basic idea of the FFT algorithm is to break up an Npoint DFT transform into successive smaller and smaller transforms known as butterflies (basic computational elements). The small transforms used can be 2point DFTs known as Radix2, 4point DFTs known as Radix4, or other points. A twopoint butterfly requires 1 complex multiplication and 2 complex additions, and a 4point butterfly requires 3 complex multiplications and 8 complex additions. Therefore, the Radix2 FFT reduces the complexity of a Npoint DFT down to (N/2)log _{2} N complex multiplications and Nlog _{2} N complex additions since there are log _{2} N stages and each stage has N/2 2point butterflies. For the Radix4 FFT, there are log _{4} N stages and each stage has N/4 4point butterflies. Thus, the total number of complex multiplication is (3N/4)log _{4} N = (3N/8)log _{2} N and the number of required complex additions is 8(N/4)log _{4} N = Nlog _{2} N.
Above all, the radix4 FFT requires only 75% as many complex multiplies as the radix2 FFT, although it uses the same number of complex additions. These additional savings make it a widelyused FFT algorithm. Thus, we would like to use Radix4 FFT if the number of points is power of 4. However, if the number of points is power of 2 but not power of 4, the Radix2 algorithm must be used to complete the whole FFT process. In this application note, we will only discuss Radix4 FFT algorithm.
Now, let’s consider an example to demonstrate how FFTs are used in real applications. In the 3GPPLTE (Long Term Evolution), Mpoint DFT and Inverse DFT (IDFT) are used to convert the signal between frequency domain and time domain. 3GPPLTE aims to provide for an uplink speed of up to 50Mbps and
a downlink speed of up to 100Mbps. For this purpose, 3GPPLTE physical layer uses Orthogonal
Frequency Division Multiple Access (OFDMA) on the downlink and Single Carrier  Frequency Division Multiple Access (SCFDMA) on the uplink. Figure 1 shows the transmitter and receiver structure of OFDMA and SCFDMA systems.
We can see from this example that DFT and IDFT are the key elements to represent changing signals in time and frequency domains. In real applications, FFTs are normally used to for high performance instead of direct DFT calculation.
Software Optimization of FFTs and IFFTs Using the SC3850 Core, Rev. 0
2
Freescale Semiconductor
{ Xn }
{ Xn }
Downlink: OFDMA
Figure 1. Transmitter and Receiver Structure of SCFDMA and OFDMA Systems
1.2
Organization
The rest of the document is organized as follows:
• Section 2, “Radix4 FFT Algorithm” gives a brief background for DFT, and describes (decimation in frequency) DIF and (decimation in time) DIT Radix4 FFT algorithms.
• Section 3, “SC3850 Data Types and Instructions” investigates some features in SC3850, which can be used for efficient FFT implementation.
• Section 4, “Implementation on the SC3850 Core” provides the detailed implementation on SC3850, and discusses fixedpoint implementation issues. How to fully utilize the resource in SC3850 and optimize the implementation is also discussed. Source code is included for reference.
• Section 5, “Experimental Results” presents experimental results.
• Section 6, “Conclusions” presents the conclusion.
• Section 7, “References” provides a list of useful references.
Software Optimization of FFTs and IFFTs Using the SC3850 Core, Rev. 0
Freescale Semiconductor
3
Radix4 FFT Algorithm
2 Radix4 FFT Algorithm
2.1 DFT and IDFT
The Fast Fourier Transform (FFT) is a computationally efficient algorithm to calculate a Discrete Fourier
Transform (DFT). The DFT X(k), k=0,1,2,
X(k)
=
,N1
N – 1
∑
x(n)
of a sequence x(n), n=0,1,2,
exp
⎛
⎝
2πnk ⎞
⎠
j 
–
N
,N1
is defined as
Eqn. 1
Eqn. 2
is the twiddle factor. Equation 1
is called the Npoint DFT of the sequence of x(n). For each value of k, the value of X(k) represents the
Fourier transform at the frequency
2πk

N
. The IDFT is defined as follows:
x(n)
–nk
^{W} N
=
=
=
=
1

N
N – 1
∑
= 0
– 1
k
N
X(k)
exp
⎛
⎝
2πnk ⎞
⎠
j 
N
1

N
∑
k = 0
–nk
X(k)W _{N}
exp
⎛
⎝ j
⎞
N ⎠
2πnk

cos
⎛
⎝
2πnk ⎞
⎠

N
+
j
sin
⎛
⎝
⎞
⎠
2πnk

N
Eqn. 3
Eqn. 4
Equation 3 is essentially the same as Equation 1. The differences are that the exponent of the twiddle factor in Equation 3 is the negative of the one in Equation 1 and the scaling factor is 1/N. The IDFT can be simply computed using the same algorithms for DFT but with conjugated twiddle factors. Alternatively, we can use the same twiddles factors for DFT with conjugated input and output to compute IDFT. Equation 1 is also called the analysis equation and Equation 3 the synthesis equation.
Software Optimization of FFTs and IFFTs Using the SC3850 Core, Rev. 0
4
Freescale Semiconductor
Radix4 FFT Algorithm
From Equation 1, it is clear that to compute X(k) for each k, it requires N complex multiplications and N1 complex additions. So, for N values of k, that is, for the entire DFT, it requires N ^{2} complex multiplications
and complex additions. Thus, the DFT is very computationally intensive. Note that a
multiplication of two complex numbers requires four real
multiplications and two real additions. A complex addition requires
two real additions.
We will present two commonly used FFT algorithms: decimation in frequency (DIF) and decimation in time (DIT). Please note that the Radix4 algorithms work out only when the FFT length N is a power of four.
NN(
– 1) ≈ N ^{2}
(a + jb) × (c + jd) = (ac – bd) + j(bc + ad)
(a + jb) + (c + jd) = (ac+
) + jb(
+ d)
2.2 Radix4 DIF FFT
We will use the properties shown by Equation 5 in the derivation of the algorithm.
Symmetry property:
Periodicity property:
N
k + 
W _{N}
2
^{W} _{N}
k + N
=
=
–
k
^{W} N
k
^{W} N
Eqn. 5
The Radix4 DIF FFT algorithm breaks a Npoint DFT calculation into a number of 4point DFTs (4point butterflies). Compared with direct computation of Npoint DFT, 4point butterfly calculation requires much less operations. The Radix4 DIF FFT can be derived as shown in Equation 6.
X(k)
=
=
=
=
=
∑
nk
x(n)W _{N}
n 
= 0 

N 
2N 

 
– 1 
 – 1 4 
4 
∑
n = 0
x(n)W
nk
N
++
N
∑
x(n)W
nk
N
n _{=} 
4
 3N – 1
4
∑
2N
n _{=} 
4
x(n)W
nk
N
+
N – 1
∑
n _{=}  3N
4
nk
x(n)W _{N}
N
 – 1
4
∑
n = 0
N
 – 1
4
nk
x(n)W _{N}
∑ x(n)W
nk
N
N
 – 1
4
4
⎞
⎠
W
⎛
⎝
n +  ^{⎞} k
N
4
⎠
N
N
++
∑
x
⎛
⎝
n + 
n = 0
N
 – 1
4
⎛
++
⎝
Nk

4
∑
x
N
n + 
⎞
⎠
W
N
W
N
nk
4
N
 – 1
4
∑
n = 0
x
⎛
⎝
2N ⎞
⎠
n + 
4
W
n +  2N ^{⎞} ⎠ k
⎛
⎝
N
4
+
N
 – 1
4
∑
n = 0
x
⎛
⎝
3N
n + 
^{⎞}
⎠
4
⎛
⎝
n +  3N ^{⎞} ⎠ k
4
^{W} N
N 
N 

 
– 1 
 
– 1 
4 
4 
2Nk

W
N
4
∑ x
⎛
⎝
2N ⎞
⎠
n + 
4
W
nk
N
+
3Nk

^{W} N
4 ∑ x
⎛ ⎝
3N
n + 
^{⎞}
⎠
4
nk
^{W} N
n = 0
N
 – 1
4
⎧
∑ ⎨
⎩
n = 0
x(n)
n = 0
⎛
++
⎝
Nk

4
N
n + 
⎞
⎠
W
N
x
4
W
2Nk

4
N
x ⎛
⎝
2N ⎞
⎠
n + 
4
n = 0
+ W
3Nk

4
N
x
⎛
⎝
3N ⎞
⎠
n + 
4
⎫
⎬ ^{W} N
⎭
nk
n = 0
k = 0123,,,, …, N – 1
Eqn. 6
Software Optimization of FFTs and IFFTs Using the SC3850 Core, Rev. 0
Freescale Semiconductor
5
Radix4 FFT Algorithm
The special factors of Equation 7.
Nk 
2NK 
3Nk 
 
 
 
4
^{W} N
,
^{W}
^{W}
^{W}
^{W} N
4
Nk

4
N
2Nk

N
4
3Nk

N
4
=
=
=
, and
^{W} N
4 in above equation can be calculated as shown in
exp
⎛
⎝
exp
exp
2πk
N
–j  ⋅ 
N
2πk
4
4
⎞
⎠
3N ⎞
⎠
⎛
⎝
2N
–j  ⋅ 
N
2πk
⎛
⎝
–j  ⋅ 
N
4
⎞
⎠
⎛
⎝
πk ⎞
⎠
–
==
exp
j 
2
(–j) ^{k}
exp(–jπk)
==
(–1) ^{k}
j ^{k}
== ⎛ 3πk ⎞
exp
⎝
j 
–
2
⎠
Eqn. 7
Then, Equation 6 can be rewritten as shown in Equation 8.
X(k)
=
N
 – 1
4
∑
n = 0
⎧
⎨
⎩
x(n)
⎛
++
⎝
N
n + 
⎞
⎠
(–j)
k
x
4
(–1)
k
x
⎛
⎝
2N ⎞
⎠
n + 
4
k = 0123,,,, …, N – 1
+ j
k
x
⎛
⎝
3N ⎞
⎠
n + 
4
⎫ ⎬ ^{W} N ⎭
nk
Eqn. 8
Considering
very similar to a N/4point FFT. However, it is not an FFT of length N/4 because the twiddle factor
depends on N instead of N/4. To make this equation an N/4point FFT, the transform X(k) can be broken
into four parts as shown in Equation 9.
as one signal, Equation 8 looks
{
x(n)
++
(–j)
k
xn(
+ N ⁄ 4)
(–1)
k
x(n + (2N) ⁄ 4)
+
j
k
x(n + (3N) ⁄ 4)}
nk
^{W} N
X(4k) 
= 
X(4k + 1) 
= 
X(4k + 2) 
= 
X(4k + 3) 
= 
k 
= 
N
 – 1
4
∑
n
N

4
= 0
– 1
∑
n
N

4
= 0
– 1
∑
n
N

4
= 0
– 1
∑
n = 0
⎧
⎨
⎩
⎧
⎨
⎩
⎧
⎨
⎩
⎧
⎨
⎩
x(n)
x(n)
x(n)
x(n)
⎛
⎝
N
⎞
⎠
++ n + 
x
4
–
jx
⎛
⎝
N
n + 
⎞
⎠
4
–
x
⎛
⎝
2N ⎞
⎠
n + 
4
⎛
⎝
2N ⎞
⎠
xn + 
4
–
+
x
⎛
⎝
N
n + 
⎞
⎠
4
jx
⎛
⎝
N
n + 
⎞
⎠
4
+
⎛
⎝
2N ⎞
⎠
xn + 
4
– x
⎝ ⎛ n + 
2N ⎞
⎠
4
N
0123 ,,,, … _{,}  – 1
4
+ x
⎛
⎝
3N ⎞
⎠
n + 
4
⎫ ⎬ ^{W} N ⎭
0
nk ^{W} N ⁄ 4
+
jx
⎛
⎝
3N ⎞
⎠
n + 
4
⎫ ⎬ ^{W} N ⎭
n
nk ^{W} N ⁄ 4
– x
⎛
⎝
3N ⎞
⎠
n + 
4
⎫
⎬
⎭
–
jx
⎝ ⎛ n + 
3N ⎞
⎠
4
^{W} N
^{2}^{n} ^{W} N ⁄ 4
nk
⎫
⎬
⎭
^{W} N
^{3}^{n} ^{W} N ⁄ 4
nk
Eqn. 9
Software Optimization of FFTs and IFFTs Using the SC3850 Core, Rev. 0
6
Freescale Semiconductor
Radix4 FFT Algorithm
In Equation 9, the twiddle factors are obtained as shown in Equation 10.
^{W}
4nk
N
=
_{W} N
n(4k + 1)
_{W} N
n(4k + 2)
_{W}
n(4k + 3)
N
exp
⎛
⎝
– j
⎛
⎝
2π  ⋅ 4nk ⎞
N ⎠
⎞
⎠
⎛
⎝
2πnk
j 
⎞
⎠
==
–
N ⁄ 4
exp
0
^{W} N
nk
^{W} N ⁄ 4
⎛
⎝
– j
⎛
⎝
2π ⋅ n(4k + 1) ⎞

N
⎠
2π ⋅ n(4k + 2) ⎞
⎠

⎞
⎠
⎛
⎝
2πn
2πnk ⎞
⎠
=
exp
exp
exp
⎛
⎝
– j
– j
⎛
⎝
==
–
⎛
⎝
exp
exp
exp
j  – j 
N
N ⁄ 4
2π ⋅ 2n
^{W} N
n
nk ^{W} N ⁄ 4
^{W} N
^{W} N
⎞
⎠
2πnk ⎞
⎠
2πnk ⎞
⎠
===
–
N
⎛
⎝
⎛
⎝
2π ⋅ n(4k + 3) ⎞

⎠
⎞
⎠
⎛
⎝
j  – j 
N
2π ⋅ 3n
N ⁄ 4
j  – j 
N
N ⁄ 4
^{2}^{n} ^{W} N ⁄ 4
nk
^{3}^{n} ^{W} N ⁄ 4
nk
===
–
N
If we use the definitions shown in Equation 11.
y(n) 
= 

yn( 
+ N ⁄ 4) 
= 
y(n + (2N) ⁄ 4) 
= 

y(n + (3N) ⁄ 4) 
= 
⎧
⎨
⎩
⎧
⎨
⎩
⎧
⎨
⎩
⎧
⎨
⎩
x(n)
x(n)
x(n)
x(n)
⎛
⎝
N
⎞
⎠
++ x n + 
4
–
jx
⎛
⎝
N
n + 
⎞
⎠
4
–
x
2N
⎛ ⎝ n + 
⎞
⎠
4
2N
xn + 
⎛
⎝
⎞
⎠
4
– x
N
⎝ ⎛ n + 
⎞
⎠
4
+
jx
N
⎛ ⎝ n + 
⎞
⎠
4
+
⎛
⎝
2N ⎞
⎠
xn + 
4
– x
2N
⎛ ⎝ n + 
⎞
⎠
4
+ x
⎛
⎝
3N ⎞
⎠
n + 
4
⎫
⎬ ^{W} N
⎭
0
+ jx
⎛
⎝
3N ⎞
⎠
n + 
4
⎫
⎬ ^{W} N
⎭
n
– x
⎝ ⎛ n + 
3N ⎞
⎠
4
⎫
⎬ ^{W} N
⎭
2n
–
jx
⎛ ⎝ n + 
3N ⎞
⎠
4
⎫
⎬ ^{W} N
⎭
3n
Eqn. 10
Eqn. 11
From Equation 9, we can see that X(4k), X(4k+1), X(4k+2), and X(4k+3) are N/4point FFT of y(n), y(n+N/4), y(n+2n/4), and y(n+3N/4), respectively. As a result, an Npoint DFT is reduced to the computation of four N/4point DFTs.
Figure 2 shows the corresponding Radix4 butterfly calculation of Equation 11. Through further rearrangement, it can be shown that each Radix4 DIF butterfly requires of 3 complex multiplications and 8 complex additions (12 real multiplications and 22 real additions). The decimation of the data sequences referred to as stage 1 of the decomposition. This process can be repeated again and again until the resulting sequences are reduced to onepoint sequences. For N=4 ^{p} , this decimation can be performed p=log _{4} N times. An example is shown in Figure 4 to illustrate the whole process. Thus the total number of complex multiplications is 3(N/4)log _{4} N since there are 3 complex multiplications each butterfly, N/4 butterflies each stage, and log _{4} N stages. The number of complex additions is 8(N/4)log _{4} N since there are 8 complex additions each butterfly, N/4 butterflies each stage, and log _{4} N stages.
Software Optimization of FFTs and IFFTs Using the SC3850 Core, Rev. 0
Freescale Semiconductor
7
Radix4 FFT Algorithm
Figure 2. Radix4 DIF Butterfly
In Figure 2, each of output y(n), y(n + N / 4), y(n + 2N / 4), and y(n + 3N / 4) is a sum of four input signals x(n), x(n + N / 4), x(n + 2N / 4), and x(n + 3N / 4), each multiplied by either +1, –1, j, or –j, and then the sum is multiplied by a twiddle factor (1, W _{N} ^{n} , W _{N} ^{2}^{n} , W _{N} ^{3}^{n} ). The process of multiplication after addition is called “Decimation In Frequency (DIF)”. On the other hand, the process of addition after the multiplication is called “Decimation In Time (DIT)”. To illustrate the algorithm clearly, a simplified butterfly is shown in Figure 3.
x(n)
x(n + N/4)
x(n + 2N/4)
x(n + 3N/4)
y(n)
y(n + N/4)
y(n + 2N/4)
y(n + 3N/4)
y(n) : intermediate signal
Figure 3. Simplified Radix4 DIF Butterfly
The Radix4 DIF FFT algorithm processes an array of data by successive passes over the input data samples. On each pass, the algorithm performs Radix4 DIF butterflies, where each butterfly picks up four complex data and returns four complex data as shown in Figure 2 and Figure 3. The input data, output data, and the twiddle factors are assumed in the following format:
Software Optimization of FFTs and IFFTs Using the SC3850 Core, Rev. 0
8
Freescale Semiconductor
Input data:
Output data:
Twiddle factors:
x
x
x(n) =
x
⎛
⎝
n +  ^{⎞}
⎠
N
4
⎛
⎝
⎛
⎝
n +  2N ^{⎞} ⎠
4
n +  ^{⎞}
⎠
3N
4
=
=
=
y
y
y(n) =
y
⎛
⎝
n +  ^{⎞}
⎠
N
4
⎛
⎝
n +  2N ^{⎞} ⎠
4
⎛
⎝
n +  3N ^{⎞} ⎠
4
=
=
=
Ar + j × Ai
Br + j × Bi
Cr + j × Ci
Dr + j × Di
Ar′ + j × Ai′
Br′ + j × Bi′
Cr′ + j × Ci′
Dr′ + j × Di′
Wbr + j × Wbi
W _{N} ^{2}^{n} = Wcr + j × Wci
^{W} N
n
=
W _{N} ^{3}^{n} = Wdr + j × Wdi
Radix4 FFT Algorithm
Eqn. 12
Eqn. 13
Eqn. 14
The real and imaginary part of output data for Radix4 DIF butterfly is specified in Equation 15.
Ar′ = Ar +++Br Cr Dr
= (Ar + Br) + (Cr + Dr)
Ai′ = Ai +++Bi Ci Di
= (Ai + Ci) + (Bi + Di)
Br′ = (Ar + Bi – Cr – Di) × Wbr – (Ai – Br – Ci + Dr) × Wbi
= ((Ar – Cr) + (Bi – Di)) × Wbr – ((Ai – Ci) – (Br – Dr)) × Wbi
Bi′ = (Ai – Br – Ci + Dr) × Wbr + (Ar + Bi – Cr – Di) × Wbi
= ((Ai – Ci) – (Br – Dr)) × Wbr + ((Ar – Cr) + (Bi – Di)) × Wbi
Cr′ = (Ar – Br + Cr – Dr) × Wcr – (Ai – Bi + Ci – Di) × Wci
Eqn. 15
= ((Ar + Cr) – (Br + Dr)) × Wcr – ((Ai + Ci) – (Bi + Di)) × Wci
Ci′ = (Ai – Bi + Ci – Di) × Wcr + (Ar – Br + Cr – Dr) × Wci
= ((Ai + Ci) – (Bi + Di)) × Wcr + ((Ar + Cr) – (Br + Dr)) × Wci
Dr′ = (Ar – Bi – Cr + Di) × Wdr – (Ai + Br – Ci – Dr) × Wdi
= ((Ar – Cr) – (Bi – Di)) × Wdr – ((Ai – Ci) + (Br – Dr)) × Wdi
Di′ = (Ai + Br – Ci – Dr) × Wdr + (Ar – Bi – Cr + Di) × Wdi
= ((Ai – Ci) + (Br – Dr)) × Wdr + ((Ar – Cr) – (Bi – Di)) × Wdi
Software Optimization of FFTs and IFFTs Using the SC3850 Core, Rev. 0
Freescale Semiconductor
9
Radix4 FFT Algorithm
As we can see from Equation 15 that there are 12 real multiplications and 22 real additions (count only once for duplicated additions and multiplications). It is equivalent to 3 complex multiplications and 8 complex additions since one complex multiplication requires four real multiplications plus two real additions and one complex addition requires two real additions. As mentioned before, in Radix4 DIF butterfly algorithm, the Npoint FFT consists of log _{4} (N) stages, and each stage consists of N/4point Radix4 DIF butterflies. Therefore, Radix4 DIF butterfly calculation reduces the number of complex
multiplication that are needed for a Npoint DFT from N ^{2} to 3(N/4)log _{4} N (from 4N ^{2} to 3Nlog _{4} N in terms of real multiplications). For example, the number of real multiplications needed for a 1024point DFT is
reduced from
. The improvement of the Radix4 DIF butterfly algorithm
over the direct calculation of the DFT is approximately 273 times.
An example of 16point Radix4 DIF FFT diagram is shown in Figure 4 to illustrate the picture of an entire FFT algorithm. There are log _{4} 16=2 stages and each stage has 16/4=4 butterflies in the figure.
4N ^{2} = 4194304
to
3N
log
4
N
= 15360
x(0)
x(1)
x(2)
x(3)
x(4)
x(5)
x(6)
x(7)
x(8)
x(9)
x(10)
x(11)
x(12)
x(13)
x(14)
x(15)
Figure 4. An example of a 16point Radix4 DIF FFT
Software Optimization of FFTs and IFFTs Using the SC3850 Core, Rev. 0
10
Freescale Semiconductor
Radix4 FFT Algorithm
In Figure 4, the inputs are normally ordered while the outputs are presented in a digitreversed order. In the case of Radix4, the digits are 0, 1, 2, 3 (quaternary system). A direct result of digitreversed order for 16point Radix4 DIF FFT is summarized in Table 1. For example, the output X(9) occupies the position “6” as 6 _{1}_{0} = 12 _{4} and the digit reversed number is 21 _{4} = 9 _{1}_{0} , where the subscripts denote the bases used to present the number. That is, if the input index n is written as a base4 number, the output index is reversed base4 number. In order to store the output data to the correct order, the position of output data needs to be replaced with the digitreversed order. An efficient method to calculate the output indices will be introduced in Section 4 based on the bitreversed addressing mode provided by the SC3850 core.
Table 1. Digital Reversed Order of a 16point Radix4 FFT
Index 
Digital pattern 
Digital reversed pattern 
Digital reversed index 
0 
00 
00 
0 
1 
01 
10 
4 
2 
02 
20 
8 
3 
03 
30 
12 
4 
10 
01 
1 
5 
11 
11 
5 
6 
12 
21 
9 
7 
13 
31 
13 
8 
20 
02 
2 
9 
21 
12 
6 
10 
22 
22 
10 
11 
23 
32 
14 
12 
30 
03 
3 
13 
31 
13 
7 
14 
32 
23 
11 
15 
33 
33 
15 
Software Optimization of FFTs and IFFTs Using the SC3850 Core, Rev. 0
Freescale Semiconductor
11
Radix4 FFT Algorithm
2.3 Radix4 DIT FFT
It is assumed that N is a power of 4, that is, N=4 ^{p} . The Radix4 DIT FFT can be derived as shown in Equation 16.
X(k)
=
N – 1
∑
nk
x(n)W _{N}
n 
= 0 

N 
N 
N 
N 

 
– 1 
 
– 1 
 
– 1 
 
– 1 
4 
4 
4 
4 
= 
∑ n = 0 
x(4n)W 4nk N +++ ∑ x(4n + 1)W _{N} ^{(}^{4}^{n} ^{+} ^{1}^{)}^{k} ∑ x(4n + 2)W _{N} ^{(}^{4}^{n} ^{+} ^{2}^{)}^{k} n = 0 n = 0 n = 0 
∑ x(4n + 3)W _{N} ^{(}^{4}^{n} ^{+} ^{3}^{)}^{k} 

N 
N N N 

 
– 1 
 – 1  – 1  
– 1 

= 
4 ∑ 
x(4n)W 4nk N 4 ++ N N W k ∑ x(4n + 1)W 4nk W 2k N 4 ∑ x(4n + 2)W 4nk N + 3k ^{W} N 4 
∑ 
x(4n + 3)W _{N} 
4nk 
Eqn. 16 

n 
= 0 
n = 0 n = 0 n 
= 
0 

N 
– 1 
N  – 1 N – 1 
N 
– 1 

 
 
 

= 
4 ∑ 
x(4n)W nk N ⁄ 4 4 ++ N N ⁄ 4 W k ∑ x(4n + 1)W nk W 2k N 4 ∑ x(4n + 2)W nk N ⁄ 4 + 3k ^{W} N 
4 ∑ nk x(4n + 3)W _{N} _{⁄} _{4} 

= 
n = 0 P(k) 
++ N W k Q(k) W n 2k N = 0 R(k) + W 3k N S(k) n = 0 
n = 0 
k = 0123,,,, …, N – 1
Each of the sums, P(k), Q(k), R(k), and S(k), in Equation 16 is recognized as an N/4point DFT. Although
the index k ranges over N values, k=0,1,2,
k=0,1,2,
since they are periodic with period N/4. The transform X(k) can be broken into four parts
as shown in Equation 17.
,N1,
each of the sums can be computed only for
,N/41,
X
X(k)
⎛
⎝
N
k + 
⎞
⎠
4
X
⎛
⎝
2N ⎞
⎠
k + 
4
X
⎛
⎝
3N ⎞
⎠
k + 
4
k
= ++
N
P(k)
W
k
Q(k)
W
2k
N
R(k)
+
W
3k
N
S(k)
⎛
⎝
N
k + 
4
⎞
⎠
= ++
N
P(k)
W
Q(k)
W
2
N
⎛
⎝
N
k + 
4
⎞
⎠
R(k)
+
3
⎛
⎝
^{W} N
N
k + 
4
⎞
⎠
S(k)
=
P(k)
–
jW
k
N
Q(k)
2k
– R(k)
W
N
+
jW
3k
N
S(k)
⎛
⎝
k + 
4
2N ⎞
⎠
= ++
N
P(k)
W
Q(k)
W
2
N
⎛
⎝
2N ⎞
⎠
k + 
4
R(k)
+
=
P(k)
–
W
k
N
Q(k)
+
W
2k
N
R(k)
⎛
⎝
k + 
4
3N ⎞
⎠
= ++
N
P(k)
W
Q(k)
W
2
N
⎛
⎝
–
W
3k
N
S(k)
3N ⎞
⎠
k + 
4
R(k)
+
P(k)
= +
=
012 ,,,
jW
k
N
Q(k)
N
… _{,}  _{–} _{1}
4
–
W
2k
N
R(k)
–
jW
3k
N
S(k)
Molto più che documenti.
Scopri tutto ciò che Scribd ha da offrire, inclusi libri e audiolibri dei maggiori editori.
Annulla in qualsiasi momento.