Sei sulla pagina 1di 52
Freescale Semiconductor Application Note Document Number: AN3666 Rev. 0, 11/2010 Software Optimization of FFTs and
Freescale Semiconductor Application Note Document Number: AN3666 Rev. 0, 11/2010 Software Optimization of FFTs and

Freescale Semiconductor Application Note

Document Number: AN3666 Rev. 0, 11/2010

Software Optimization of FFTs and IFFTs Using the SC3850 Core

The Fast Fourier Transform (FFT) is a numerically efficient algorithm used to compute the Discrete Fourier Transform (DFT). The Radix-2 and Radix-4 algorithms are used mostly for practical applications due to their simple structures. Compared with Radix-2 FFT, Radix-4 FFT provides a 25% savings in multipliers. For a complex N-point Fourier transform, the Radix-4 FFT reduces the number of complex multiplications from N 2 to 3(N/4)log N and the number of complex additions from N 2 to 8(N/4)log 4 N, where log 4 N is the number of stages and N/4 is the number of butterflies in each stage. FFTs are of importance to a wide variety of applications, such as telecommunications (3GPP-LTE, WiMAX, and so on). For example, Orthogonal Frequency Division Multiplexing (OFDM) signals are generated using the FFT algorithm.

This application note describes the implementation of the Radix-4 decimation-in-time (DIT) FFT algorithm using the Freescale StarCore SC3850 core. The document discusses how to use new features available in the SC3850 core, such as dual-multiplier, to improve the performance of the FFT. Code optimization and performance results are also investigated in this document. Typical reference code is included in this document to demonstrate the implementation details.

4

Contents

1. Introduction

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

2

2. Radix-4 FFT Algorithm

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

4

3. SC3850 Data Types and Instructions

.

.

.

.

.

.

.

.

.

.

.

.

15

4. Implementation on the SC3850 Core

.

.

.

.

.

.

.

.

.

.

.

.

21

5. Experimental Results

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

47

6. Conclusions

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

50

7. References

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

50

© 2008–2010 Freescale Semiconductor, Inc.

. . . . . . . . . . . . . . . 50
Introduction 1 1.1 Introduction Overview The discrete Fourier transform (DFT) plays an importa nt role
Introduction 1 1.1 Introduction Overview The discrete Fourier transform (DFT) plays an importa nt role

Introduction

1

1.1

Introduction

Overview

The discrete Fourier transform (DFT) plays an important role in the analysis, design, and implementation of discrete-time signal processing algorithms and systems because efficient algorithms exist for the computation of the DFT. These efficient algorithms are called Fast Fourier Transform (FFT) algorithms. In terms of multiplications and additions, the FFT algorithms can be orders of magnitude more efficient than competing algorithms.

It is well known that the DFT takes N 2 complex multiplications and N 2 complex additions for complex

N-point transform. Thus, direct computation of the DFT is inefficient. The basic idea of the FFT algorithm is to break up an N-point DFT transform into successive smaller and smaller transforms known as butterflies (basic computational elements). The small transforms used can be 2-point DFTs known as Radix-2, 4-point DFTs known as Radix-4, or other points. A two-point butterfly requires 1 complex multiplication and 2 complex additions, and a 4-point butterfly requires 3 complex multiplications and 8 complex additions. Therefore, the Radix-2 FFT reduces the complexity of a N-point DFT down to (N/2)log 2 N complex multiplications and Nlog 2 N complex additions since there are log 2 N stages and each stage has N/2 2-point butterflies. For the Radix-4 FFT, there are log 4 N stages and each stage has N/4 4-point butterflies. Thus, the total number of complex multiplication is (3N/4)log 4 N = (3N/8)log 2 N and the number of required complex additions is 8(N/4)log 4 N = Nlog 2 N.

Above all, the radix-4 FFT requires only 75% as many complex multiplies as the radix-2 FFT, although it uses the same number of complex additions. These additional savings make it a widely-used FFT algorithm. Thus, we would like to use Radix-4 FFT if the number of points is power of 4. However, if the number of points is power of 2 but not power of 4, the Radix-2 algorithm must be used to complete the whole FFT process. In this application note, we will only discuss Radix-4 FFT algorithm.

Now, let’s consider an example to demonstrate how FFTs are used in real applications. In the 3GPP-LTE (Long Term Evolution), M-point DFT and Inverse DFT (IDFT) are used to convert the signal between frequency domain and time domain. 3GPP-LTE aims to provide for an uplink speed of up to 50Mbps and

a downlink speed of up to 100Mbps. For this purpose, 3GPP-LTE physical layer uses Orthogonal

Frequency Division Multiple Access (OFDMA) on the downlink and Single Carrier - Frequency Division Multiple Access (SC-FDMA) on the uplink. Figure 1 shows the transmitter and receiver structure of OFDMA and SC-FDMA systems.

We can see from this example that DFT and IDFT are the key elements to represent changing signals in time and frequency domains. In real applications, FFTs are normally used to for high performance instead of direct DFT calculation.

Software Optimization of FFTs and IFFTs Using the SC3850 Core, Rev. 0

{ Xn } { Xn } Introduction SC-FDMA Transmitter N-po int Subcarrier M-point DFT Mapping
{ Xn } { Xn } Introduction SC-FDMA Transmitter N-po int Subcarrier M-point DFT Mapping

{ Xn }

{ Xn }

Introduction SC-FDMA Transmitter N-po int Subcarrier M-point DFT Mapping IDFT Cyclic Prefix & Pulse Shaping
Introduction
SC-FDMA Transmitter
N-po int
Subcarrier
M-point
DFT
Mapping
IDFT
Cyclic Prefix &
Pulse Shaping
DAC & RF
Channel
SC-FDMA Receiver
Subcarrier De-
N-point
Mapping&
M-point
Cyclic Prefix
RF & ADC
IDFT
Equalization
DFT
Removal
Uplink SC-FDMA OFDMA Transmitter Subcarrier M-point Cyclic Prefix & Pulse Shaping DAC & RF Mapping
Uplink SC-FDMA
OFDMA Transmitter
Subcarrier
M-point
Cyclic Prefix &
Pulse Shaping
DAC & RF
Mapping
IDFT
Channel
OFDMA Receiver
Subcarrier De -
Mapping&
Equalization
M-point
Cyclic Prefix
RF & ADC
DFT
Removal

Downlink: OFDMA

Figure 1. Transmitter and Receiver Structure of SC-FDMA and OFDMA Systems

1.2

Organization

The rest of the document is organized as follows:

Section 2, “Radix-4 FFT Algorithm” gives a brief background for DFT, and describes (decimation in frequency) DIF and (decimation in time) DIT Radix-4 FFT algorithms.

Section 3, “SC3850 Data Types and Instructions” investigates some features in SC3850, which can be used for efficient FFT implementation.

Section 4, “Implementation on the SC3850 Core” provides the detailed implementation on SC3850, and discusses fixed-point implementation issues. How to fully utilize the resource in SC3850 and optimize the implementation is also discussed. Source code is included for reference.

Section 5, “Experimental Results” presents experimental results.

Section 6, “Conclusions” presents the conclusion.

Section 7, “References” provides a list of useful references.

Software Optimization of FFTs and IFFTs Using the SC3850 Core, Rev. 0

Radix-4 FFT Algorithm 2 Radix-4 FFT Algorithm 2.1 DFT and IDFT The Fast Fourier Transform
Radix-4 FFT Algorithm 2 Radix-4 FFT Algorithm 2.1 DFT and IDFT The Fast Fourier Transform

Radix-4 FFT Algorithm

2 Radix-4 FFT Algorithm

2.1 DFT and IDFT

The Fast Fourier Transform (FFT) is a computationally efficient algorithm to calculate a Discrete Fourier

Transform (DFT). The DFT X(k), k=0,1,2,

X(k)

=

,N-1

N – 1

x(n)

of a sequence x(n), n=0,1,2,

exp

2πnk

j -------------

N

,N-1

is defined as

n = 0 N – 1 = ∑ x(n)W N nk n = 0 2πnk
n
= 0
N – 1
=
x(n)W N
nk
n
= 0
2πnk ⎞
nk
=
exp
j -------------
W N
N
2πnk ⎞
2πnk ⎞
=
cos
-------------
j
sin
-------------
N
N ⎠
In Equation 1 and Equation 2, N is the number of data,
j
=
–1
, and
nk
W N

Eqn. 1

Eqn. 2

is the twiddle factor. Equation 1

is called the N-point DFT of the sequence of x(n). For each value of k, the value of X(k) represents the

Fourier transform at the frequency

2πk

----------

N

. The IDFT is defined as follows:

x(n)

–nk

W N

=

=

=

=

1

----

N

N – 1

= 0

– 1

k

N

X(k)

exp

2πnk

j -------------

N

1

----

N

k = 0

–nk

X(k)W N

exp

j

N

2πnk

-------------

cos

2πnk

-------------

N

+

j

sin

2πnk

-------------

N

Eqn. 3

Eqn. 4

Equation 3 is essentially the same as Equation 1. The differences are that the exponent of the twiddle factor in Equation 3 is the negative of the one in Equation 1 and the scaling factor is 1/N. The IDFT can be simply computed using the same algorithms for DFT but with conjugated twiddle factors. Alternatively, we can use the same twiddles factors for DFT with conjugated input and output to compute IDFT. Equation 1 is also called the analysis equation and Equation 3 the synthesis equation.

Software Optimization of FFTs and IFFTs Using the SC3850 Core, Rev. 0

Radix-4 FFT Algorithm From Equation 1 , it is clear that to compute X(k) for
Radix-4 FFT Algorithm From Equation 1 , it is clear that to compute X(k) for

Radix-4 FFT Algorithm

From Equation 1, it is clear that to compute X(k) for each k, it requires N complex multiplications and N-1 complex additions. So, for N values of k, that is, for the entire DFT, it requires N 2 complex multiplications

and complex additions. Thus, the DFT is very computationally intensive. Note that a

multiplication of two complex numbers requires four real

multiplications and two real additions. A complex addition requires

two real additions.

We will present two commonly used FFT algorithms: decimation in frequency (DIF) and decimation in time (DIT). Please note that the Radix-4 algorithms work out only when the FFT length N is a power of four.

NN(

– 1) ≈ N 2

(a + jb) × (c + jd) = (ac – bd) + j(bc + ad)

(a + jb) + (c + jd) = (ac+

) + jb(

+ d)

2.2 Radix-4 DIF FFT

We will use the properties shown by Equation 5 in the derivation of the algorithm.

Symmetry property:

Periodicity property:

N

k + ----

W N

2

W N

k + N

=

=

k

W N

k

W N

Eqn. 5

The Radix-4 DIF FFT algorithm breaks a N-point DFT calculation into a number of 4-point DFTs (4-point butterflies). Compared with direct computation of N-point DFT, 4-point butterfly calculation requires much less operations. The Radix-4 DIF FFT can be derived as shown in Equation 6.

X(k)

=

=

=

=

=

nk

x(n)W N

n

= 0

N

2N

----

– 1

------- – 1

4

4

n = 0

x(n)W

nk

N

++

N

x(n)W

nk

N

n = ----

4

------- 3N – 1
4

2N

n = -------

4

x(n)W

nk

N

+

N – 1

n = ------- 3N

4

nk

x(n)W N

N

---- – 1

4

n = 0

N

---- – 1

4

nk

x(n)W N

x(n)W

nk

N

N

---- – 1

4

4

W

n + ---- k

N

4

N

N

++

x

n + ----

n = 0

N

---- – 1

4

++

Nk

-------

4

x

N

n + ----

W

N

W

N

nk

4

N

---- – 1

4

n = 0

x

2N

n + -------

4

W

n + ------- 2N k

N

4

+

N

---- – 1

4

n = 0

x

3N

n + -------

4

n + ------- 3N k

4

W N

N

N

----

– 1

----

– 1

4

4

2Nk

-----------

W

N

4

x

2N

n + -------

4

W

nk

N

+

3Nk

-----------

W N

4 x

⎛ ⎝

3N

n + -------

4

nk

W N

n = 0

N

---- – 1

4

n = 0

x(n)

n = 0

++

Nk

-------

4

N

n + ----

W

N

x

4

W

2Nk

-----------

4

N

x

2N

n + -------

4

n = 0

+ W

3Nk

-----------

4

N

x

3N

n + -------

4

W N

nk

n = 0

k = 0123,,,, …, N – 1

Eqn. 6

Software Optimization of FFTs and IFFTs Using the SC3850 Core, Rev. 0

Radix-4 FFT Algorithm The special factors of Equation 7 . Nk 2NK 3Nk ------- ------------
Radix-4 FFT Algorithm The special factors of Equation 7 . Nk 2NK 3Nk ------- ------------

Radix-4 FFT Algorithm

The special factors of Equation 7.

Nk

2NK

3Nk

-------

------------

-----------

4

W N

,

W

W

W

W N

4

Nk

-------

4

N

2Nk

-----------

N

4

3Nk

-----------

N

4

=

=

=

, and

W N

4 in above equation can be calculated as shown in

exp

exp

exp

2πk

N

–j ---------- ----

N

2πk

4

4

3N

2N

–j ---------- -------

N

2πk

–j ---------- -------

N

4

πk

==

exp

j ------

2

(–j) k

exp(–jπk)

==

(–1) k

j k

== 3πk

exp

j ----------

2

Eqn. 7

Then, Equation 6 can be rewritten as shown in Equation 8.

X(k)

=

N

---- – 1

4

n = 0

x(n)

++

N

n + ----

(–j)

k

x

4

(–1)

k

x

2N

n + -------

4

k = 0123,,,, …, N – 1

+ j

k

x

3N

n + -------

4

⎫ ⎬ W N

nk

Eqn. 8

Considering

very similar to a N/4-point FFT. However, it is not an FFT of length N/4 because the twiddle factor

depends on N instead of N/4. To make this equation an N/4-point FFT, the transform X(k) can be broken

into four parts as shown in Equation 9.

as one signal, Equation 8 looks

{

x(n)

++

(–j)

k

xn(

+ N 4)

(–1)

k

x(n + (2N) ⁄ 4)

+

j

k

x(n + (3N) ⁄ 4)}

nk

W N

X(4k)

=

X(4k + 1)

=

X(4k + 2)

=

X(4k + 3)

=

k

=

N

---- – 1

4

n

N

----

4

= 0

– 1

n

N

----

4

= 0

– 1

n

N

----

4

= 0

– 1

n = 0

x(n)

x(n)

x(n)

x(n)

N

++ n + ----

x

4

jx

N

n + ----

4

x

2N

n + -------

4

2N

xn + -------

4

+

x

N

n + ----

4

jx


N

n + ----

4

+

2N

xn + -------

4

– x

⎝ ⎛ n + -------

2N

4

N

0123 ,,,, … , ---- – 1

4

+ x

3N

n + -------

4

⎫ ⎬ W N

0

nk W N 4

+

jx

3N

n + -------

4

⎫ ⎬ W N

n

nk W N 4

– x

3N

n + -------

4

jx

⎝ ⎛ n + -------

3N

4

W N

2n W N 4

nk

W N

3n W N 4

nk

Eqn. 9

Software Optimization of FFTs and IFFTs Using the SC3850 Core, Rev. 0

Radix-4 FFT Algorithm In Equation 9 , the twiddle factors are obtained as shown in
Radix-4 FFT Algorithm In Equation 9 , the twiddle factors are obtained as shown in

Radix-4 FFT Algorithm

In Equation 9, the twiddle factors are obtained as shown in Equation 10.

W

4nk

N

=

W N

n(4k + 1)

W N

n(4k + 2)

W

n(4k + 3)

N

exp

– j

2π -------------------- 4nk

N

2πnk

j -------------

==

N 4

exp

0

W N

nk

W N 4

– j

2π ⋅ n(4k + 1)

-----------------------------------

N

2π ⋅ n(4k + 2)

-----------------------------------

2πn

2πnk

=

exp

exp

exp

– j

– j

==

exp

exp

exp

j ---------- – j -------------

N

N 4

2π ⋅ 2n

W N

n

nk W N 4

W N

W N

2πnk

2πnk

===

N

2π ⋅ n(4k + 3)

-----------------------------------

j ----------------- – j -------------

N

2π ⋅ 3n

N 4

j ----------------- – j -------------

N

N 4

2n W N 4

nk

3n W N 4

nk

===

N

If we use the definitions shown in Equation 11.

 

y(n)

=

yn(

+ N 4)

=

y(n + (2N) ⁄ 4)

=

y(n + (3N) ⁄ 4)

=





x(n)

x(n)

x(n)

x(n)

N

++ x n + ----

4

jx


N

n + ----

4

x

2N

⎛ ⎝ n + -------

4

2N

xn + -------

4

– x

N

⎝ ⎛ n + ----

4

+

jx

N

⎛ ⎝ n + ----

4

+

2N

xn + -------

4

– x

2N

⎛ ⎝ n + -------

4

+ x


3N

n + -------

4

W N

0

+ jx


3N

n + -------

4

W N

n

– x

⎝ ⎛ n + -------

3N

4

W N

2n

jx

⎛ ⎝ n + -------

3N

4

W N

3n

Eqn. 10

Eqn. 11

From Equation 9, we can see that X(4k), X(4k+1), X(4k+2), and X(4k+3) are N/4-point FFT of y(n), y(n+N/4), y(n+2n/4), and y(n+3N/4), respectively. As a result, an N-point DFT is reduced to the computation of four N/4-point DFTs.

Figure 2 shows the corresponding Radix-4 butterfly calculation of Equation 11. Through further rearrangement, it can be shown that each Radix-4 DIF butterfly requires of 3 complex multiplications and 8 complex additions (12 real multiplications and 22 real additions). The decimation of the data sequences referred to as stage 1 of the decomposition. This process can be repeated again and again until the resulting sequences are reduced to one-point sequences. For N=4 p , this decimation can be performed p=log 4 N times. An example is shown in Figure 4 to illustrate the whole process. Thus the total number of complex multiplications is 3(N/4)log 4 N since there are 3 complex multiplications each butterfly, N/4 butterflies each stage, and log 4 N stages. The number of complex additions is 8(N/4)log 4 N since there are 8 complex additions each butterfly, N/4 butterflies each stage, and log 4 N stages.

Software Optimization of FFTs and IFFTs Using the SC3850 Core, Rev. 0

Radix-4 FFT Algorithm (1,1,1,1) A x(n) y(n) A’ (1,-j,-1,j) Wb B x(n + N/4) y(n
Radix-4 FFT Algorithm (1,1,1,1) A x(n) y(n) A’ (1,-j,-1,j) Wb B x(n + N/4) y(n

Radix-4 FFT Algorithm

(1,1,1,1) A x(n) y(n) A’ (1,-j,-1,j) Wb B x(n + N/4) y(n + N/4) B’
(1,1,1,1)
A x(n)
y(n)
A’
(1,-j,-1,j)
Wb
B x(n + N/4)
y(n + N/4)
B’
(1,-1,1,-1)
Wc
C x(n + 2N/4)
y(n + 2N/4)
C’
(1,j,-1,-j)
Wd
D x(n + 3N/4)
y(n + 3N/4)
D’
2
π
Wb
= exp ⎜
j
y(n) : intermediate signal
N
n ⎟ ⎞
2
Wc
= exp ⎜
j
π ⋅ 2
N
n ⎟ ⎞
( n = 0,1,
,N/4-1
)
2
⎪ Wd
= exp ⎜
j
π ⋅ 3
N
n ⎟ ⎞

Figure 2. Radix-4 DIF Butterfly

In Figure 2, each of output y(n), y(n + N / 4), y(n + 2N / 4), and y(n + 3N / 4) is a sum of four input signals x(n), x(n + N / 4), x(n + 2N / 4), and x(n + 3N / 4), each multiplied by either +1, –1, j, or –j, and then the sum is multiplied by a twiddle factor (1, W N n , W N 2n , W N 3n ). The process of multiplication after addition is called “Decimation In Frequency (DIF)”. On the other hand, the process of addition after the multiplication is called “Decimation In Time (DIT)”. To illustrate the algorithm clearly, a simplified butterfly is shown in Figure 3.

x(n)

x(n + N/4)

x(n + 2N/4)

x(n + 3N/4)

0 n 2n 3n W N 0 -> 0 W N n -> n W
0
n
2n
3n
W N 0 -> 0
W N n -> n
W N 2n -> 2n
W N 3n -> 3n

y(n)

y(n + N/4)

y(n + 2N/4)

y(n + 3N/4)

y(n) : intermediate signal

Figure 3. Simplified Radix-4 DIF Butterfly

The Radix-4 DIF FFT algorithm processes an array of data by successive passes over the input data samples. On each pass, the algorithm performs Radix-4 DIF butterflies, where each butterfly picks up four complex data and returns four complex data as shown in Figure 2 and Figure 3. The input data, output data, and the twiddle factors are assumed in the following format:

Software Optimization of FFTs and IFFTs Using the SC3850 Core, Rev. 0

Input data: Output data: Twiddle factors: x x x ( n ) = x ⎛
Input data: Output data: Twiddle factors: x x x ( n ) = x ⎛

Input data:

Output data:

Twiddle factors:

x

x

x(n) =

x

n + ----

N

4

n + ------- 2N

4

n + -------

3N

4

=

=

=

y

y

y(n) =

y

n + ----

N

4

n + ------- 2N

4

n + ------- 3N

4

=

=

=

Ar + j × Ai

Br + j × Bi

Cr + j × Ci

Dr + j × Di

Ar+ j × Ai

Br+ j × Bi

Cr+ j × Ci

Dr+ j × Di

Wbr + j × Wbi

W N 2n = Wcr + j × Wci

W N

n

=

W N 3n = Wdr + j × Wdi

Radix-4 FFT Algorithm

Eqn. 12

Eqn. 13

Eqn. 14

The real and imaginary part of output data for Radix-4 DIF butterfly is specified in Equation 15.

Ar= Ar +++Br Cr Dr

= (Ar + Br) + (Cr + Dr)

Ai= Ai +++Bi Ci Di

= (Ai + Ci) + (Bi + Di)

Br= (Ar + Bi – Cr – Di) × Wbr – (Ai – Br – Ci + Dr) × Wbi

= ((Ar – Cr) + (Bi – Di)) × Wbr – ((Ai – Ci) (Br – Dr)) × Wbi

Bi= (Ai – Br – Ci + Dr) × Wbr + (Ar + Bi – Cr – Di) × Wbi

= ((Ai – Ci) (Br – Dr)) × Wbr + ((Ar – Cr) + (Bi – Di)) × Wbi

Cr= (Ar – Br + Cr – Dr) × Wcr – (Ai – Bi + Ci – Di) × Wci

Eqn. 15

= ((Ar + Cr) (Br + Dr)) × Wcr – ((Ai + Ci) (Bi + Di)) × Wci

Ci= (Ai – Bi + Ci – Di) × Wcr + (Ar – Br + Cr – Dr) × Wci

= ((Ai + Ci) (Bi + Di)) × Wcr + ((Ar + Cr) (Br + Dr)) × Wci

Dr= (Ar – Bi – Cr + Di) × Wdr – (Ai + Br – Ci – Dr) × Wdi

= ((Ar – Cr) (Bi – Di)) × Wdr – ((Ai – Ci) + (Br – Dr)) × Wdi

Di= (Ai + Br – Ci – Dr) × Wdr + (Ar – Bi – Cr + Di) × Wdi

= ((Ai – Ci) + (Br – Dr)) × Wdr + ((Ar – Cr) (Bi – Di)) × Wdi

Software Optimization of FFTs and IFFTs Using the SC3850 Core, Rev. 0

Radix-4 FFT Algorithm As we can see from Equation 15 that there are 12 real
Radix-4 FFT Algorithm As we can see from Equation 15 that there are 12 real

Radix-4 FFT Algorithm

As we can see from Equation 15 that there are 12 real multiplications and 22 real additions (count only once for duplicated additions and multiplications). It is equivalent to 3 complex multiplications and 8 complex additions since one complex multiplication requires four real multiplications plus two real additions and one complex addition requires two real additions. As mentioned before, in Radix-4 DIF butterfly algorithm, the N-point FFT consists of log 4 (N) stages, and each stage consists of N/4-point Radix-4 DIF butterflies. Therefore, Radix-4 DIF butterfly calculation reduces the number of complex

multiplication that are needed for a N-point DFT from N 2 to 3(N/4)log 4 N (from 4N 2 to 3Nlog 4 N in terms of real multiplications). For example, the number of real multiplications needed for a 1024-point DFT is

reduced from

. The improvement of the Radix-4 DIF butterfly algorithm

over the direct calculation of the DFT is approximately 273 times.

An example of 16-point Radix-4 DIF FFT diagram is shown in Figure 4 to illustrate the picture of an entire FFT algorithm. There are log 4 16=2 stages and each stage has 16/4=4 butterflies in the figure.

4N 2 = 4194304

to

3N

log

4

N

= 15360

x(0)

x(1)

x(2)

x(3)

x(4)

x(5)

x(6)

x(7)

x(8)

x(9)

x(10)

x(11)

x(12)

x(13)

x(14)

x(15)

Digit reversed order (quaternary system) X(0) 0 X(4) 0 x(8) 0 X(12) 0 0 X(1)
Digit reversed order
(quaternary system)
X(0)
0
X(4)
0
x(8)
0
X(12)
0
0
X(1)
X(5)
1
X(9)
2
X(13)
3
0
X(2)
2
X(6)
4
X(10)
6
X(14)
0
3
X(3)
6
X(7)
9
X(11)
X(15)

Figure 4. An example of a 16-point Radix-4 DIF FFT

Software Optimization of FFTs and IFFTs Using the SC3850 Core, Rev. 0

Radix-4 FFT Algorithm In Figure 4 , the inputs are normally ordered while the outputs
Radix-4 FFT Algorithm In Figure 4 , the inputs are normally ordered while the outputs

Radix-4 FFT Algorithm

In Figure 4, the inputs are normally ordered while the outputs are presented in a digit-reversed order. In the case of Radix-4, the digits are 0, 1, 2, 3 (quaternary system). A direct result of digit-reversed order for 16-point Radix-4 DIF FFT is summarized in Table 1. For example, the output X(9) occupies the position “6” as 6 10 = 12 4 and the digit reversed number is 21 4 = 9 10 , where the subscripts denote the bases used to present the number. That is, if the input index n is written as a base-4 number, the output index is reversed base-4 number. In order to store the output data to the correct order, the position of output data needs to be replaced with the digit-reversed order. An efficient method to calculate the output indices will be introduced in Section 4 based on the bit-reversed addressing mode provided by the SC3850 core.

Table 1. Digital Reversed Order of a 16-point Radix-4 FFT

Index

Digital pattern

Digital reversed pattern

Digital reversed index

0

00

00

0

1

01

10

4

2

02

20

8

3

03

30

12

4

10

01

1

5

11

11

5

6

12

21

9

7

13

31

13

8

20

02

2

9

21

12

6

10

22

22

10

11

23

32

14

12

30

03

3

13

31

13

7

14

32

23

11

15

33

33

15

Software Optimization of FFTs and IFFTs Using the SC3850 Core, Rev. 0

Radix-4 FFT Algorithm 2.3 Radix-4 DIT FFT It is assumed that N is a power
Radix-4 FFT Algorithm 2.3 Radix-4 DIT FFT It is assumed that N is a power

Radix-4 FFT Algorithm

2.3 Radix-4 DIT FFT

It is assumed that N is a power of 4, that is, N=4 p . The Radix-4 DIT FFT can be derived as shown in Equation 16.

X(k)

=

N – 1

nk

x(n)W N

n

= 0

N

N

N

N

----

– 1

----

– 1

----

– 1

----

– 1

4

4

4

4

=

n

= 0

x(4n)W

4nk

N

+++

x(4n + 1)W N (4n + 1)k

x(4n + 2)W N (4n + 2)k

n = 0

n = 0

n = 0

x(4n + 3)W N (4n + 3)k

 

N

N

N

N

----

– 1

----

– 1

----

– 1

----

– 1

=

4

x(4n)W

4nk

N

4

++

N

N

W

k

x(4n + 1)W

4nk

W

2k

N

4

x(4n + 2)W

4nk

N

+

3k

W N

4

x(4n + 3)W N

4nk

Eqn. 16

n

= 0

n

= 0

n

= 0

n

=

0

N

– 1

N

---- – 1

N

– 1

N

– 1

----

----

----

=

4

x(4n)W

nk N 4

4

++

N

N 4

W

k

x(4n + 1)W

nk

W

2k

N

4

x(4n + 2)W

nk N 4

+

3k

W N

4

nk

x(4n + 3)W N 4

 

=

n

= 0

P(k)

++

N

W

k

Q(k)

W

n

2k

N

= 0

R(k)

+

W

3k

N

S(k)

n = 0

n = 0

 

k = 0123,,,, …, N – 1

Each of the sums, P(k), Q(k), R(k), and S(k), in Equation 16 is recognized as an N/4-point DFT. Although

the index k ranges over N values, k=0,1,2,

k=0,1,2,

since they are periodic with period N/4. The transform X(k) can be broken into four parts

as shown in Equation 17.

,N-1,

each of the sums can be computed only for

,N/4-1,

X

X(k)

N

k + ----

4

X

2N

k + -------

4

X

3N

k + -------

4

k

= ++

N

P(k)

W

k

Q(k)

W

2k

N

R(k)

+

W

3k

N

S(k)

N

k + ----

4

= ++

N

P(k)

W

Q(k)

W

2

N

N

k + ----

4

R(k)

+

3

W N

N

k + ----

4

S(k)

=

P(k)

jW

k

N

Q(k)

2k

– R(k)

W

N

+

jW

3k

N

S(k)

k + -------

4

2N

= ++

N

P(k)

W

Q(k)

W

2

N

2N

k + -------

4

R(k)

+

=

P(k)

W

k

N

Q(k)

+

W

2k

N

R(k)

k + -------

4

3N

= ++

N

P(k)

W

Q(k)

W

2

N

W

3k

N

S(k)

3N

k + -------

4

R(k)

+

P(k)

= +

=

012 ,,,

jW

k

N

Q(k)

N

, ---- 1

4

W

2k

N

R(k)

jW

3k

N

S(k)