A DFT and FFT Tutorial

The Fourier Transform Part X – FFT 2
The Fast Fourier Transform – Part 2
Last time we found out that the Discrete Fourier Transform (DFT), although a
hugely powerful tool, can be very time consuming to calculate. In 1965
two mathematicians, James Cooley and John Tukey, published a paper which
proposed a method that made the calculation of the DFT much more efficient.
Their observation, that Cosine and Sine waves repeat every 2π radians, meant that
many of the calculations necessary to compute the DFT of a signal repeated
themselves too. If the results of the repeated calculations could be stored and used
again the next time the calculation occurred, this could vastly reduce the number of
calculations required.
In order to take advantage of this repeating property of Sine and Cosine waves, we
use a method known as “Divide and Conquer”. We demonstrated how “Divide and
Conquer” works and found that there were 4 stages to the process:
1. SPLIT – Keep splitting the samples into groups of half the number of samples until you
are left with only a pair of samples in each group.
2. CALCULATE – Perform your algorithm on each of the sample pairs.
3. COMBINE – Use the results from the calculation you just performed to form the input
of the next stage of the problem.
4. REPEAT – Keep repeating 2 and 3 until you have an overall answer.
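The four stages above map naturally onto a short recursive function. What follows is a minimal Python sketch, not the code from Cooley and Tukey's paper. It assumes the number of samples is a power of 2 and uses the common forward-DFT convention X(k) = Σ x(n)·e^(−i2πkn/N). The name `fft_recursive` is invented for illustration. The even/odd slicing is the SPLIT, the 2-sample case is the CALCULATE, the add/subtract loop is the COMBINE, and the recursion does the REPEAT.

```python
import cmath
import math


def fft_recursive(x):
    """Radix-2 FFT organised as SPLIT / CALCULATE / COMBINE / REPEAT.

    Assumes len(x) is a power of 2 and the forward convention
    X(k) = sum over n of x(n) * exp(-2*pi*i*k*n/N).
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("the number of samples must be a power of 2")
    if n == 1:
        return [complex(x[0])]
    if n == 2:
        # CALCULATE: a 2-point DFT is just a sum and a difference.
        return [complex(x[0] + x[1]), complex(x[0] - x[1])]
    # SPLIT: even-indexed and odd-indexed samples become two half-size problems.
    # REPEAT: each half is handled by this same function (recursion).
    even = fft_recursive(x[0::2])
    odd = fft_recursive(x[1::2])
    # COMBINE: twiddle the odd half's results, then add and subtract.
    half = n // 2
    result = [0j] * n
    for k in range(half):
        w = cmath.exp(-2j * math.pi * k / n)  # twiddle factor W_N^k
        result[k] = even[k] + w * odd[k]
        result[k + half] = even[k] - w * odd[k]
    return result
```

For 16 samples, splitting into even and odd indexes three times leaves the pairs (x(0), x(8)), (x(4), x(12)) and so on. This is the same bottom row of sample pairs the post works with.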
Having completed the splitting stage in the last post, we're now going to look at
how we actually begin to calculate a 2-point DFT (that is, a DFT with only 2
samples in it). As we'll soon discover, calculating a 2-point DFT is simplicity itself.
2. Calculate
Why is working out a 2-point DFT so easy? A 2-point DFT gives us only 2
frequencies in its result, which we'll call a(0) and a(1).
To calculate these terms we perform four multiplication steps and two addition steps:
1. Multiply the first sample, x(0), by a Cosine and Sine wave at frequency index k=0.
2. Multiply the second sample, x(1), by a Cosine and Sine wave at frequency index k=0.
3. Add the results from 1 and 2 together: this gives us our first frequency term a(0).
4. Multiply the first sample, x(0), by a Cosine and Sine wave at frequency index k=1.
5. Multiply the second sample, x(1), by a Cosine and Sine wave at frequency index k=1.
6. Add the results from 4 and 5 together: this gives us our second frequency term a(1).
To calculate the first term, a(0), let's plot a Cosine and a Sine wave with a
frequency index of zero (k=0):
You can see that a Cosine wave with a frequency of zero is simply a straight line
with a magnitude of 1. Any signal multiplied by 1 remains unchanged. So if my
signal at time 0 is x(0) and my signal at time 1 is x(1), after the multiplication with
the Cosine wave, these values will remain the same.
A Sine wave with a frequency of 0 is also a straight line, but it has a magnitude of
zero. Any signal multiplied by zero is simply zero, so the Sine wave does not play
any part in the calculation.
Therefore, to calculate our first frequency term a(0), we simply add together the 2
samples of our signal x(0) and x(1):
a(0) = x(0) + x(1)
To calculate the second frequency term, a(1), let's plot a Cosine and a Sine
wave with a frequency index of one (k=1):
You can see that when the index of our sample is zero, the Cosine wave has an
amplitude of 1, so we multiply x(0) by 1.
When the index of our sample is 1, the Cosine wave has an amplitude of -1,
so we multiply x(1) by -1.
The Sine wave is again zero at both these sample indexes so again it plays no part
in the calculation.
So our second term, a(1), is also very easy to calculate:
a(1) = x(0) - x(1)

This multiplication and addition process for a 2-point DFT can be shown on a
special diagram known as a butterfly diagram, so called because its shape resembles the
wings of a butterfly.
Going back to the bottom row of our divide and conquer diagram, we perform this
operation on each group of 2 samples.
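Here is a small Python check that the shortcut and the long way agree. It is a sketch that assumes real-valued samples. The names `dft_2point` and `dft_2point_long_way` are hypothetical. The long way multiplies by the Cosine and Sine waves at k = 0 and k = 1, exactly as in the six steps listed earlier.

```python
import math


def dft_2point(x0, x1):
    """The shortcut: a(0) = x(0) + x(1) and a(1) = x(0) - x(1)."""
    return x0 + x1, x0 - x1


def dft_2point_long_way(x0, x1):
    """Multiply each sample by a Cosine and a Sine wave at k = 0 and k = 1.

    Returns [(cosine_sum, sine_sum) for k = 0, (cosine_sum, sine_sum) for k = 1].
    The sine sums are zero apart from floating-point rounding."""
    samples = (x0, x1)
    terms = []
    for k in (0, 1):
        cos_sum = sum(s * math.cos(2 * math.pi * k * n / 2) for n, s in enumerate(samples))
        sin_sum = sum(s * math.sin(2 * math.pi * k * n / 2) for n, s in enumerate(samples))
        terms.append((cos_sum, sin_sum))
    return terms


# Each group of 2 samples in the bottom row gets the same treatment.
print(dft_2point(0.5, 0.5))          # (1.0, 0.0)
print(dft_2point_long_way(0.5, 0.5))
```

In the long way, the Sine sums come out as tiny rounding residues (around 1e-16) rather than exact zeros. This is the numerical face of the Sine wave playing no part in a 2-point DFT.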

Twiddle Factors
Before we can continue to the next stage of the “Divide and Conquer” method,
the combining stage, we need to realize that there has been a bit of a twiddle!
What’s a twiddle?
The definition for the word “twiddle” which my dictionary gives is: “to twist,
move, or fiddle with”. Something in the previous stage of the calculation has been
fiddled with.
To try and explain what twiddle has been done, I’m going to use the Cosine
multiplication at a frequency index of 1 as an example.
If we go back to the graph of our original signal, which we saw last time, I'm going
to look at the signal as if I had sampled it only twice over its duration, at sample
index 0 and sample index 8, because this is what the multiplication for the first
2-point DFT (the first sample pair in the bottom row of the split signal) looks like.
We know from our butterfly diagram that in a 2-point DFT, at a frequency index of
1, the Cosine wave must have an amplitude of 1 at sample index 0 and an amplitude
of -1 at sample index 8. We can see that this is true from the graph. However, what
happens when we try to perform a 2-point DFT on the next 2 points in the bottom row
of the split signal? Yes, sample indexes 4 and 12! I'm talking about you!!
As you can see from the graph above, at these 2 sample indexes, the Cosine wave’s
amplitude is no longer 1 and -1. As it stands, we cannot use the above graph to
calculate a 2-point DFT for samples 4 and 12. We need to do something to
it before we perform the first multiplication.
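A quick numerical check of what the graph shows, as a sketch. It assumes the Cosine wave at frequency index 1 completes one cycle over the 16-sample duration, i.e. cos(2πn/16). `cosine_k1` is a hypothetical helper. Pushing the wave to the right along the x-axis by π/2 radians (4 sample positions) is modelled by subtracting π/2 from the angle.

```python
import math

N = 16  # length of the original signal, in samples


def cosine_k1(n, shift=0.0):
    """Cosine wave at frequency index 1 over N samples, pushed `shift`
    radians to the right along the x-axis."""
    return math.cos(2 * math.pi * n / N - shift)


for n in (0, 8, 4, 12):
    unshifted = cosine_k1(n)
    shifted = cosine_k1(n, shift=math.pi / 2)
    print(f"sample {n:2d}: unshifted {unshifted:+.3f}, shifted by pi/2 {shifted:+.3f}")
```

Unshifted, samples 0 and 8 sit at 1 and -1 as a 2-point DFT needs, while samples 4 and 12 sit at 0. After the π/2 shift, samples 4 and 12 read 1 and -1.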
What we do is twiddle it a bit. We change the phase of the Cosine wave, or in
other words we push the whole Cosine wave along the x-axis a bit until it is sitting
in the correct position over samples 4 and 12, so that the amplitudes of the Cosine
at those points are 1 and -1 respectively. To see what I mean, look at the following
animation:
The Fourier Transform Part XI – FFT 3

In 1965 two mathematicians, James Cooley and John Tukey, published a paper
which proposed a method that made the calculation of the DFT much more
efficient. In our example of a 16-point DFT (a DFT with 16 samples in it), we saw
that the key to calculating it efficiently was to use the Divide and Conquer method:
1. SPLIT – Keep splitting the samples into groups of half the number of samples until you
are left with only a pair of samples in each group.
2. CALCULATE – Perform your algorithm on each of the sample pairs.
3. COMBINE – Use the results from the calculation you just performed to form the input
of the next stage of the problem.
4. REPEAT – Keep repeating 2 and 3 until you have an overall answer.
In the last post, we looked at how to compute a 2-point DFT for each of the sample
pairs in the "Calculate" stage mentioned above and found that it was quite easy to
do. However, before we move on to the "Combine" stage of the method, I would
first like to take a look at how we would normally calculate a 4-point DFT without
Cooley and Tukey's help.
The 4-Point DFT
In a 4-point DFT, we take 4 samples of our signal over its duration. We
multiply the 4 samples in the signal by 4 samples of a Cosine wave and 4 samples
of a Sine wave at frequency index 0. We then add up the products to give
ourselves 2 results, one for the Cosine multiplication and one for the Sine
multiplication. This gives us the Cosine and Sine contributions of our signal for
frequency 0. We then repeat this for each of the 4 different frequency indexes
available in a 4-point DFT.
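The "long way" just described can be sketched in Python to show how much work it involves. `dft_direct` is a hypothetical helper that keeps the Cosine and Sine sums separate, as the text does, rather than combining them with i. For N samples it performs N × N × 2 multiplications.

```python
import math


def dft_direct(x):
    """The long way: for every frequency index k, multiply every sample by a
    Cosine and a Sine wave and add up the products.

    Returns a list of (cosine_sum, sine_sum) pairs, one per frequency index,
    plus the number of multiplications performed."""
    n_samples = len(x)
    results = []
    multiplications = 0
    for k in range(n_samples):
        cos_sum = 0.0
        sin_sum = 0.0
        for n, sample in enumerate(x):
            angle = 2 * math.pi * k * n / n_samples
            cos_sum += sample * math.cos(angle)
            sin_sum += sample * math.sin(angle)
            multiplications += 2
        results.append((cos_sum, sin_sum))
    return results, multiplications


# Four hypothetical sample values standing in for x(0), x(4), x(8), x(12).
results, count = dft_direct([1.0, 2.0, 3.0, 4.0])
for k, (cos_sum, sin_sum) in enumerate(results):
    print(f"b({k}): cosine {cos_sum:+.4f}, sine {sin_sum:+.4f}")
print("multiplications:", count)   # 4 frequencies x 4 samples x 2 waves = 32
```

The values [1, 2, 3, 4] are only there to exercise the function. For 16 samples the same loop needs 512 multiplications. This quadratic growth is the cost that the Divide and Conquer approach is meant to cut down.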
Let’s look at the source waves for each of the 4 frequencies, starting with
frequency zero which I’m going to call b(0):
In the left hand graph above, we can see our signal (shown in green, which I'm
going to call "x") divided into 4 points at samples 0, 4, 8 and 12. I'm therefore
going to call the four points on our signal x(0), x(4), x(8) and x(12).
This is multiplied by a Cosine wave with a frequency of zero. (A Cosine wave with
a frequency of zero is simply a straight line with an amplitude of 1.) Any number
multiplied by 1 remains unchanged. So when we add together all the multiplied
points for our Cosine contribution at a frequency of zero, we're simply going to
get:
b(0) = x(0) + x(4) + x(8) + x(12)
In the right hand graph above, we can see our signal (again shown in green)
multiplied by a Sine wave with a frequency of zero. A Sine wave with a frequency
of zero is simply a straight line with an amplitude of zero. Any number multiplied
by zero equals zero, so there is no Sine contribution for this first frequency.
Next we’ll look at frequency 1 which I’m going to call b(1):
Looking at the left hand graph first, we see that at sample index 0, the Cosine wave
is 1, so our signal at x(0) will simply be multiplied by 1. At sample index 4, the
Cosine wave is zero, so sample x(4) will be multiplied by zero. This is also true of
sample x(12). Sample x(8) is multiplied by -1. So for the Cosine contribution, we're
going to get:
b(1) = x(0) - x(8)
–If you’ve been following the previous posts in this blog, the above equation might
begin to stir certain memories–
Looking at the right hand graph, we see that at sample index 0, the Sine wave is
zero, so x(0) will be multiplied by zero. This is also true for sample x(8). At
sample index 4, the Sine wave has an amplitude of 1, so x(4) will be multiplied by
1. At sample index 12, the Sine wave has an amplitude of -1, so x(12) will be
multiplied by -1. So for the Sine contribution, we're going to get:
b(1) = x(4) - x(12)
–any more memories stirring yet? —
Now I know that my old math professors would be screaming at me at this point
for sloppy, ambiguous writing. We cannot have b(1) being equal to 2 completely
different things as I have written it in the two equations above. So let’s go back to
our complex notation and represent the Cosine contribution as a real number and
the Sine contribution as an imaginary number (by multiplying it by “i”), so to keep
everyone happy, I’ll rewrite my sloppy notation as follows:
If this last equation confuses you, as it would have confused me not so long ago,
ignore it and just remember that the "i" is only there to keep the Cosine and
Sine terms separate, as I have done in the 2 equations preceding it. It's not even all
that important. What I'm driving at with all these graphs (and those cryptic
comments in green) will hopefully become clearer as we continue.
So now for frequency 2 which I’m going to call b(2):
Looking at the left hand graph, you'll excuse me if I cut all the description and just
go straight to the formula. Hopefully, you'll be getting the gist of it by now. For
the Cosine contribution for b(2):
b(2) = x(0) - x(4) + x(8) - x(12)
–This equation looks similar to the equation for b(0) save for a couple of minus
signs! —
Looking at the right hand graph… oh look, all the Sine samples are zero again so
there is no Sine component for this frequency.
— bit like b(0) in that respect! —
Let’s move on to frequency 3 (the final frequency in a 4-point DFT) which I’m
going to call b(3).
Looking at the left hand graph for the Cosine contribution:
b(3) = x(0) - x(8)
— Now I’ve definitely seen that somewhere before! —
Looking at the right hand graph for the Sine contribution:
b(3) = -x(4) + x(12)
— OK so the minuses and pluses are switched round, but I’ve seen this equation
before too! —
Again, to keep my old math lecturer happy, let’s combine these equations properly:
Alright! So enough of the green comments and all the nods and winks to previous
blog posts! Let’s spell it out in words of one syllable and chuck in a nice, big red
font for good measure.
The Calculations are repeating themselves!
We might need to change a plus sign to a minus sign here and there, but all we are
doing is adding or subtracting our 4 samples x(0), x(4), x(8) and x(12) in different
orders.
There is a definite similarity between the even frequencies, b(0) and b(2), and
likewise between the odd frequencies, b(1) and b(3). What is more, we've done some
of these calculations before, when we were calculating the 2-point DFTs in the
previous blog post.
It seems quite plain that calculations involving x(0) and x(8) are repeating
themselves with regularity, at least in the Cosine calculation, but x(4) and x(12)
seem to come and go as they please. In order to help us ground what we have
instinctively understood, we need to look now to the “Combine” stage of Cooley
and Tukey’s algorithm.
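The pattern can be made visible by printing, for each frequency b(k), the Cosine and Sine multipliers applied to x(0), x(4), x(8) and x(12). This is a sketch, and `weights` is a hypothetical helper. The Sine weights are the raw values of the Sine wave, before any i is attached.

```python
import math

N = 4  # m = 0, 1, 2, 3 stands for samples x(0), x(4), x(8), x(12)


def weights(k):
    """Cosine and Sine multipliers applied to x(0), x(4), x(8), x(12) for b(k).

    round() turns tiny floating-point residues such as 6e-17 into 0."""
    cos_w = [round(math.cos(2 * math.pi * k * m / N)) for m in range(N)]
    sin_w = [round(math.sin(2 * math.pi * k * m / N)) for m in range(N)]
    return cos_w, sin_w


for k in range(N):
    cos_w, sin_w = weights(k)
    print(f"b({k}): cosine weights {cos_w}, sine weights {sin_w}")
```

Every multiplier turns out to be 0, 1 or -1, so a 4-point DFT is nothing but additions and subtractions. b(1) and b(3) share their Cosine weights, and their Sine weights are mirror images of each other.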
3. Combine
The beauty of the Cooley-Tukey method is that it allows us to treat each stage of
the calculation as if we were only calculating a 2-point DFT. We can do this
because, at each stage, we have already done half of the work in the
previous stage, and we use the results we have already calculated to help us
calculate the next stage.
As we saw last time, to calculate a 2-point DFT, we simply add together the 2
samples for the first frequency term and subtract them for the second. Let's look at
how we can still use this same method, combining what we calculated for the
2-point DFTs to calculate a 4-point DFT.
We have already calculated a 2-point DFT for samples 0 and 8, so we can reuse
that result. We can look at this on a butterfly diagram as shown below:
Starting at the top left of the diagram, x(0) and x(8) are used to calculate the two
frequency terms a(0) and a(1) for a 2-point DFT. These terms, a(0) and a(1), are
then fed into the top 2 inputs of the 4-point DFT. However, as this is a 4-point
DFT, we need 2 more inputs to make up the numbers. These come from samples x(4)
and x(12), which produced the frequency components a(2) and a(3) from our
second 2-point DFT in the bottom left hand corner of the diagram.
However, if we now look at a(2) and a(3) going into the second butterfly (the 4-
point DFT) in the middle of the diagram, we see that something happens to these
samples as they are fed in. They both get multiplied by a weird term which I’ve
called “W”. This W term is known as a “Twiddle Factor” (also called a phase
factor).
We spoke about "twiddle factors" at the end of the last post. It is now that the
effects of these twiddle factors come into play. To remind ourselves of
what is happening, let's look at the 2-point DFT for samples x(4) and x(12).
Remember that in the 2-point DFT there is no Sine contribution, because the Sine
waves at both frequencies are zero at the sample points used in a 2-point DFT.
Therefore, we're going to look at what happens to the Cosine wave.
In order to perform the 2-point DFT on samples x(4) and x(12), we had to shift our
Cosine wave along the x-axis by π/2 radians, as shown in the animation below.
This is because samples x(4) and x(12) did not lie at the correct points on the
x-axis for a 2-point DFT to be performed on them. However, now that we are dealing
with a 4-point DFT, x(4) and x(12) do lie at the correct points, so we have to
"twiddle" the result we obtained from the 2-point DFT by the amount we shifted
the Cosine wave. Watch what happens to the 2 samples on the Cosine wave at
sample index 4 and sample index 12 as the wave is phase shifted back to where it
should be. I've highlighted the samples with yellow circles.
You can see how they slide down the Cosine wave to the values they should be
when the Cosine wave is in position for a 4-point DFT.
As we saw in the 2-point DFT:
• the frequency term a(2) is made up of the sum of the samples, x(4) + x(12).
• the frequency term a(3) is simply the difference between the samples, x(4) - x(12).
If both of these samples x(4) and x(12) are modified by the shifting back of the
Cosine wave, then the corresponding frequency terms a(2) and a(3) will also be
similarly affected.
Therefore before a(2) and a(3) can be fed into the 4-point DFT, they undergo an
extra multiplication.
If we look at the term a(2) on the butterfly diagram above, as it gets fed into the
bottom half of the 4-point DFT, it gets multiplied by a twiddle factor “W” with a 0
written just above it and a 4 written just below it.
As we see from the above equation, this twiddle factor is a complex number. The
“0” above the W refers to the frequency index of a(2) in the 2-point
DFT (remember a(2) was the first of 2 frequency terms, therefore its index is “0”).
The “4” below the W refers to the number of points in the DFT we are currently
calculating.
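For reference, the twiddle factor can be written compactly as below. This assumes the usual forward-DFT sign convention, with a minus sign in front of the Sine term. It is the convention that reproduces the b(·) and c(·) tables in the numerical example later on. Some texts put a plus sign there instead. For a real signal such as this one, that flips the sign of every imaginary part of the result but leaves the method itself unchanged.

```latex
\begin{aligned}
W_N^{k} &= \cos\left(\frac{2\pi k}{N}\right) - i\sin\left(\frac{2\pi k}{N}\right) = e^{-i2\pi k/N} \\
W_4^{0} &= \cos(0) - i\sin(0) = 1 \\
W_4^{1} &= \cos\left(\frac{\pi}{2}\right) - i\sin\left(\frac{\pi}{2}\right) = -i \\
b(k) &= a(k) + W_4^{k}\,a(k+2), \qquad b(k+2) = a(k) - W_4^{k}\,a(k+2), \qquad k = 0, 1
\end{aligned}
```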
However, here’s the interesting thing.

In the equation above, if we disregard the "0" for a second, notice that 2π/4 is
equal to π/2, which is exactly the number of radians we had to shift the Cosine wave
through when we were trying to perform the 2-point DFT on the 2 samples x(4)
and x(12). So now that we are calculating a 4-point DFT, in order to prepare the term
a(2), we have shifted the Cosine wave back (by π/2 radians) to where it should
be. It just so happens that this particular twiddle factor has no effect on the input
a(2), as cos(0 × π/2) = 1. If we look at what we are doing on a graph for this first
frequency index:
As we are talking about a frequency of zero, it doesn't matter by how much we
phase shift the Cosine wave; it will always have an amplitude of 1. Likewise, it doesn't
matter how much we phase shift the Sine wave; it will always have an amplitude of
zero.
So there is no imaginary part to the result, as the Sine term is zero: sin(0 × π/2) = 0.
So putting this all together for our first frequency term of the 4-point DFT, which I
called b(0), the twiddle factor for a(2) was simply equal to 1, as we just saw, so:
b(0) = a(0) + a(2)
This is exactly the b(0) calculation we calculated when we did it the long way,
only we already did half the work in the last stage, as:
a(0) = x(0) + x(8)
and:
a(2) = x(4) + x(12)
so:
b(0) = x(0) + x(4) + x(8) + x(12)
Cooley and Tukey's algorithm just doesn't stop making life easier! By combining
the results from the previous 2-point DFT stage, we can now treat our 4-point
DFT as if it were made up of interleaved 2-point DFTs, as a(0) and a(2) have already
been calculated. This means that no sooner have we calculated our term for b(0)
than we get b(2) thrown in for free. We can reuse the a(0) and a(2) terms we used
to calculate b(0), but just change the sign of a(2) to calculate b(2), just as we did
with samples x(0) and x(8) in the 2-point DFT.
So putting this all together for our third frequency term of the 4-point DFT, which I
called b(2), the twiddle factor for a(2) is the same as before (=1), so:
b(2) = a(0) - a(2)
This is exactly the b(2) calculation we calculated when we did it the long way,
only we already did half the work in the last stage, as:
a(0) = x(0) + x(8)
and:
a(2) = x(4) + x(12)
so:
b(2) = x(0) - x(4) + x(8) - x(12)
We do exactly the same process for the other two frequency terms in the 4-point
DFT, b(1) and b(3). The only difference is the twiddle factor.
Looking first at b(1), the frequency of our Cosine wave is 1, as we saw in the
animation (here it is again):
At a frequency of 1, the twiddle factor does have an effect on samples x(4) and
x(12), so a(3), which is made up of x(4) and x(12), will be similarly affected.
We already saw in the animation that the Cosine term at sample indexes 4 and 12
at this frequency is zero so for b(1) there is no Cosine component. However, look
what happens to the Sine component as we shift the Sine wave back to where it
should be for the 4-point DFT.
There is a Sine component as Sin (2π/4), which is the same as Sin (π/2), is equal
to 1. This is why the “i” remains in the twiddle factor equation above as it signifies
a Sine component with an amplitude of 1.
Using the butterfly diagram we can plug what we know back into the calculation
for b(1).
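For the Cosine component, the Cosine part of the twiddle factor, cos(2π/4), is zero,
so a(3) contributes nothing and the Cosine component of b(1) is simply a(1).
From before, we know that:
a(1) = x(0) - x(8)
For the first time, the twiddle factor has a Sine component, sin(2π/4) = 1, so the
Sine component of b(1) is a(3) multiplied by 1. From before, we know that:
a(3) = x(4) - x(12)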
So putting this all together:
This is exactly the b(1) calculation we calculated when we did it the long way as
we can rewrite this as:
This being the Cooley-Tukey algorithm, having calculated our frequency b(1), we
get frequency b(3) for free, simply by changing the sign of a(3).
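The whole 4-point butterfly fits in a few lines of Python. This is a sketch assuming W_4^k = e^(−i2πk/4), the convention that matches the numerical example's tables. `combine_to_4point` is a hypothetical name. The values fed in are x(0), x(4), x(8) and x(12) from the numerical example later in the post.

```python
import cmath
import math


def combine_to_4point(a0, a1, a2, a3):
    """Combine two 2-point DFTs into one 4-point DFT with a butterfly.

    a0, a1: 2-point DFT of x(0), x(8); a2, a3: 2-point DFT of x(4), x(12).
    Assumes the twiddle factor W_4^k = exp(-2*pi*i*k/4)."""
    w0 = cmath.exp(-2j * math.pi * 0 / 4)  # = 1, so a(2) passes through unchanged
    w1 = cmath.exp(-2j * math.pi * 1 / 4)  # = -i (up to rounding): a pure Sine part
    b0 = a0 + w0 * a2
    b2 = a0 - w0 * a2   # b(2) for free: same two terms, sign of a(2) flipped
    b1 = a1 + w1 * a3
    b3 = a1 - w1 * a3   # b(3) for free: same two terms, sign flipped
    return [b0, b1, b2, b3]


# x(0), x(4), x(8), x(12) from the numerical example later in the post
x0, x4, x8, x12 = 0.5, -0.25, 0.5, -0.75
b = combine_to_4point(x0 + x8, x0 - x8, x4 + x12, x4 - x12)
for k, value in enumerate(b):
    print(f"b({k}) = {value.real:+.4f} {value.imag:+.4f}i")
```

The printout should match rows b(0) to b(3) of the 4-point table in the numerical example (0, -0.5i, 2 and 0.5i). Tiny floating-point residues may show up as -0.0000.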
Just as we repeated the 2-point DFT process to cover all the samples in the
signal, we do exactly the same for all the 4-point DFTs. This combines the
eight 2-point DFTs into four 4-point DFTs.
This is shown on a butterfly diagram as follows:

Consecutive pairs of 2-point DFTs combine to form the 4 inputs of each
4-point DFT. In each of the 4-point DFTs, the second pair of inputs is multiplied by
twiddle factors, and the twiddled values are then combined with the first pair of inputs
(added for the first two outputs, subtracted for the last two) to calculate the result.
We have now covered all the relevant stages for the FFT. Next time we'll apply the
methods we have learned in this post to combine the four 4-point DFTs into two
8-point DFTs, then into one 16-point DFT, which will give us our final result. We'll
then follow this up with a numeric example and actually calculate a 16-point FFT
for a real-life signal.
The Fourier Transform Part XII – FFT 4
So now we finally come to the end of our journey through the Fourier Transform in
general and the Fast Fourier Transform in particular. We’ve divided, we’ve
calculated and at the end of the last post, we were in the middle of combining our
smaller DFTs into larger ones. We’d got as far as a 4-point DFT. We’re now going
to combine the four 4-point DFTs into two 8-point DFTs and finally into one 16-
point DFT to get the Fourier Transform for our 16 sample signal.
The way we combine the 4-point DFTs into an 8-point DFT is almost identical to
the way we combined the 2-point DFTs into 4-point DFTs; the only things that
change are the twiddle factors. Below is the butterfly diagram for combining the
four 4-point DFTs into two 8-point DFTs.
Starting from the left hand side of the diagram, we see the four 4-point DFTs which
produce the results b(0) to b(15). These results are then fed into the two 8-point
DFTs. Taking the upper 8-point DFT as an example, we see that just as the first 2
terms going into each 4-point DFT were unaffected by twiddle factors, for an
8-point DFT it is the first 4 terms that are unaffected. The next 4 terms, however,
are twiddled: in order to perform the second 4-point DFT, the Cosine and Sine waves
had to be shifted, in the same way as the shifted waves in the 2-point DFTs we spoke
about in the last 2 posts. It is the twiddle factors that now shift these waves back to
where they should be for the 8-point DFT.
How are these twiddle factors calculated?
Again we see the "W" notation we met last time. The number above and to the right
of the W indicates the frequency index, within its 4-point DFT, of the term being fed
into the input, and the number below and to the right of the W indicates the order of
the DFT we are now calculating. Unsurprisingly, the number is 8 for an 8-point DFT.
So the 4 twiddle factors are:
Disregarding the number in blue, we see that each of the twiddle factors has shifted
the Cosine and Sine waves by 2π/8 or π/4 radians (or 45° for those who prefer
degrees).
Finally, having computed the two 8-point DFTs, we combine everything together
into one 16-point DFT. Here is the whole process in a butterfly diagram from the
initial samples x(0) to x(15) all the way to the final frequency domain results X(0)
to X(15):
This time we see that the first 8 terms, c(0) to c(7), going into the 16-point DFT are
unaffected by twiddle factors, whereas the second 8 terms, c(8) to c(15), are
multiplied by twiddle factors as follows:
Disregarding the number in blue, we see that each of the twiddle factors has shifted
the Cosine and Sine waves by 2π/16 or π/8 radians (or 22.5° for those who prefer
degrees).
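The twiddle factors for the 8-point and 16-point stages can be generated rather than looked up. This is a sketch, again assuming W_N^k = cos(2πk/N) − i·sin(2πk/N), and `twiddle_factors` is a hypothetical helper. Successive factors are 2π/N apart in phase: 45° for N = 8 and 22.5° for N = 16.

```python
import cmath
import math


def twiddle_factors(n_points):
    """W_N^k for k = 0 .. N/2 - 1, assuming W_N^k = exp(-2*pi*i*k/N)."""
    return [cmath.exp(-2j * math.pi * k / n_points) for k in range(n_points // 2)]


for n_points in (8, 16):
    step = math.degrees(2 * math.pi / n_points)   # phase step between factors
    print(f"N = {n_points}: step of {step:g} degrees")
    for k, w in enumerate(twiddle_factors(n_points)):
        print(f"  W_{n_points}^{k} = {w.real:+.4f} {w.imag:+.4f}i")
```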
This Divide and Conquer method can be extended to any size of FFT, so long as the
number of samples being fed into the FFT is a power of 2 (2, 4, 8, 16, 32, 64, 128,
256, etc.).
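The whole process (reorder the samples, then combine 2-, 4-, 8-, ..., N-point DFTs) can also be written as a loop over stages instead of recursion. This is a hypothetical sketch, `fft_iterative`, under the same sign-convention assumption. It uses bit reversal, one common way of producing the split order, and it refuses inputs whose length is not a power of 2, mirroring the restriction just stated.

```python
import cmath
import math


def fft_iterative(x):
    """Stage-by-stage radix-2 FFT: reorder the samples, then combine
    2-point, 4-point, 8-point, ... DFTs until one N-point DFT remains.

    Sketch only; assumes the twiddle factor W_N^k = exp(-2*pi*i*k/N)."""
    n = len(x)
    if n < 2 or n & (n - 1):
        raise ValueError("the number of samples must be a power of 2 (2, 4, 8, 16, ...)")
    bits = n.bit_length() - 1
    # SPLIT: bit-reversed order puts x(0) next to x(N/2), and so on.
    data = [complex(x[int(format(i, f"0{bits}b")[::-1], 2)]) for i in range(n)]
    size = 2
    while size <= n:                         # REPEAT for sizes 2, 4, 8, ..., N
        half = size // 2
        for start in range(0, n, size):      # each group of `size` consecutive values
            for k in range(half):
                # For size 2, w is always 1: the plain sum/difference of CALCULATE.
                w = cmath.exp(-2j * math.pi * k / size)   # twiddle factor W_size^k
                top = data[start + k]
                bottom = w * data[start + k + half]
                data[start + k] = top + bottom            # COMBINE: add ...
                data[start + k + half] = top - bottom     # ... and subtract
        size *= 2
    return data
```

Each pass of the `while` loop is one column of the butterfly diagram. For 16 samples there are four passes, one each for the 2-, 4-, 8- and 16-point stages.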
So now we have a method for computing a 16-point FFT. In the next post, I'm
going to finally put some numbers to all the theory we have been learning
throughout this blog and try to calculate the FFT for a real signal.
The Fourier Transform Part XIII – Numerical Example
The Fast Fourier Transform – Numerical Example
What frequencies make up the following signal? This signal has 16 samples in it, so
we are going to run a 16-point FFT to find out the answer.
The 16 samples in the signal have the following values:
Sample Amplitude
x(0) 0.5000
x(1) 0.5845
x(2) -0.1768
x(3) -0.4492
x(4) -0.2500
x(5) -0.4492
x(6) -0.1768
x(7) 0.5845
x(8) 0.5000
x(9) 0.1226
x(10) 0.1768
x(11) -0.2579
x(12) -0.7500
x(13) -0.2579
x(14) 0.1768
x(15) 0.1226
Firstly we divide and reorder our samples into groups of 2 using the bit reversal
method, so sample x(0) gets grouped with sample x(8). Sample x(4) gets grouped
with sample x(12) etc.:
Sample Amplitude
x(0) 0.5000
x(8) 0.5000
x(4) -0.2500
x(12) -0.7500
x(2) -0.1768
x(10) 0.1768
x(6) -0.1768
x(14) 0.1768
x(1) 0.5845
x(9) 0.1226
x(5) -0.4492
x(13) -0.2579
x(3) -0.4492
x(11) -0.2579
x(7) 0.5845
x(15) 0.1226
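The reordering can be generated rather than worked out by hand. This is a sketch, and `bit_reversed_indices` is a hypothetical helper. The sample that ends up in position i is the one whose index is i with its binary digits reversed. For 16 samples, position 1 (0001) holds sample 8 (1000).

```python
def bit_reversed_indices(n):
    """Order in which the samples are grouped, for n a power of 2.

    Reversing the binary digits of each position gives the sample index
    that belongs there."""
    bits = n.bit_length() - 1
    return [int(format(i, f"0{bits}b")[::-1], 2) for i in range(n)]


order = bit_reversed_indices(16)
pairs = [(order[i], order[i + 1]) for i in range(0, 16, 2)]
print(pairs)
# [(0, 8), (4, 12), (2, 10), (6, 14), (1, 9), (5, 13), (3, 11), (7, 15)]
```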
Now we perform a 2-point DFT on each of the sample pairs. This is very easy, as
we simply add the samples together for the first term, then subtract them for the
second, so:
a(0) = x(0) + x(8) = 0.5000 + 0.5000 = 1.0000
a(1) = x(0) - x(8) = 0.5000 - 0.5000 = 0.0000
…and so on for each of the eight 2-point DFTs.

So the results for all the 2-point DFTs are as follows:
Frequency Value
a(0) 1.0000
a(1) 0.0000
a(2) -1.0000
a(3) 0.5000
a(4) 0.0000
a(5) -0.3536
a(6) 0.0000
a(7) -0.3536
a(8) 0.7071
a(9) 0.4619
a(10) -0.7071
a(11) -0.1913
a(12) -0.7071
a(13) -0.1913
a(14) 0.7071
a(15) 0.4619
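The 2-point stage can be reproduced from the two tables above with a few lines of Python. This is a sketch: `samples` and `order` are copied from those tables, and each pair is simply added and subtracted.

```python
samples = [0.5000, 0.5845, -0.1768, -0.4492, -0.2500, -0.4492, -0.1768, 0.5845,
           0.5000, 0.1226, 0.1768, -0.2579, -0.7500, -0.2579, 0.1768, 0.1226]

# Sample order after the bit-reversal regrouping shown above
order = [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]

a = []
for i in range(0, 16, 2):
    first, second = samples[order[i]], samples[order[i + 1]]
    a.extend([first + second, first - second])   # a 2-point DFT: sum, then difference

for k, value in enumerate(a):
    print(f"a({k}) {value:.4f}")
```

Printed to 4 decimal places, the values should agree with the table above.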
The next stage is to start combining the results of the previous stage into larger and
larger DFTs until we arrive back at a 16-point DFT. So the next stage is to
combine the eight 2-point DFTs into four 4-point DFTs. We use the output of the
previous stage to form the input of the next stage, so we can treat it like a 2-point
DFT. It is now that the twiddle factors begin to come into play, so let's remind
ourselves of their values for a 4-point DFT:
W_4^0 = 1
W_4^1 = 0 - i1 = -i
Notice that for the first time, in the 4-point DFT we have an imaginary term so
there is going to be a Sine component to some of the results as well as a Cosine
component. This makes the multiplication by the twiddle factor a little
more “complex” as the twiddle factor is a complex number.
Before we begin with the numeric calculation, I want to take a quick look at
multiplication with complex numbers. If you already know about multiplication
with complex numbers, skip ahead to "Continuing with our numeric example" below.
Multiplication with Complex Numbers
A complex number can be one of three types of numbers:
1. A completely real number (a number with the imaginary part equal to zero)
2. A completely imaginary number (a number with the real part equal to zero)
3. A number with both a real and an imaginary component (a number with a non-zero real
part and a non-zero imaginary part)
Therefore there can be four types of multiplication:
1. A real number multiplied by another real number
2. A real number multiplied by an imaginary number
3. An imaginary number multiplied by another imaginary number
4. A complex number multiplied by another complex number
A real number multiplied by another real number
When we multiply two real numbers together, we never even think about complex
numbers, because any real number multiplied by another real number simply gives
us a real number as the result.
To prove this, let's write a real number in complex form. We do this by simply
multiplying the imaginary part of the complex number by zero. So, for example,
the number 2 could be written as:
2 + i0
Let's multiply it by 3, which can be written as:
3 + i0
So if we want to multiply these 2 numbers together:
(2 + i0)(3 + i0)
Because we are multiplying two numbers that are inside brackets, we have to
multiply them in four stages using the FOIL method:
Stage 1 – First: We multiply the real number in each of the brackets (the first
number in each of the brackets).
Stage 2 – Outside: We multiply the real number in the first bracket by the
imaginary number in the second bracket (or the two numbers on the outer side of
the 2 brackets).
Stage 3 – Inside: We multiply the imaginary number in the first bracket by the real
number in the second bracket (or the two numbers on the inner side of the 2
brackets).
Stage 4 – Last: We multiply the imaginary numbers in each of the brackets (the
last number in each of the brackets).
So our calculation is going to look like:
(2 × 3) + (2 × i0) + (i0 × 3) + (i0 × i0)
All but the first set of brackets in the above equation contain a multiplication by
zero, so only the first set of brackets will give a non-zero result. Therefore:
(2 + i0)(3 + i0) = 2 × 3 = 6
The result of the multiplication is a real number.

A real number multiplied by an imaginary number
What sort of number will we get if we multiply a real number by an imaginary
number? Let's try multiplying 5 (a real number) by i4 (an imaginary number).
Firstly, we write out our multiplication in complex form:
(5 + i0)(0 + i4)
Using the same FOIL method again will give us:
(5 × 0) + (5 × i4) + (i0 × 0) + (i0 × i4)
You can see in the above equation that only the second set of brackets contains the
multiplication of two non-zero terms; therefore, only the second set of
brackets will give a non-zero result:
(5 + i0)(0 + i4) = 5 × i4 = i20
So the answer to our question is that we will get an imaginary result if we
multiply a real number by an imaginary number.
An imaginary number multiplied by another imaginary number
What sort of number will we get if we multiply an imaginary number by another
imaginary number? Let's try multiplying i2 (an imaginary number) by i4 (another
imaginary number). Again, let's write out our multiplication in complex form:
(0 + i2)(0 + i4)
FOIL will give us:
(0 × 0) + (0 × i4) + (i2 × 0) + (i2 × i4)
This time the only set of brackets containing two non-zero terms is the last set. So:
(0 + i2)(0 + i4) = i2 × i4 = i²8
HERE COMES THE COOL BIT: Remember the definition of the imaginary
number i:
i = √-1
Which means that:
i² = -1
So:
(0 + i2)(0 + i4) = -8
The answer to our question this time is that an imaginary number multiplied by
another imaginary number gives us a real number as the result.
A complex number multiplied by another complex number
What sort of number will we get if we multiply a complex number (that is, a
number with both a real and an imaginary component) by another complex number?
Let's try multiplying 8 + i5 by 3 + i9 (another complex number).
Using FOIL, we'll write it out in full:
(8 × 3) + (8 × i9) + (i5 × 3) + (i5 × i9)
This time there are no zero terms anywhere in the calculation, so each of the 4
brackets in the equation above is going to give us a result:
(24) + (i72) + (i15) + (i²45)
The first set of brackets contains a real number (24). The second and third sets of
brackets contain imaginary numbers, so we can add these two brackets together (i72 + i15 = i87).
The final set of brackets contains our old friend i-squared, which is equal to -1, so
the final set of brackets contains a real number (-45) that can be combined with the real
number in the first set of brackets (24). So the answer to our problem is:
(8 + i5)(3 + i9) = -21 + i87
So a complex number multiplied by another complex number gives us a
complex number as the result. However, this is not always true. Depending on the
values of the real and imaginary parts of the numbers, the result could also be
completely real or completely imaginary. Just look at the next example.
Complex Conjugates
One way the multiplication of two complex numbers can yield a real
result is if the two complex numbers are complex conjugates. A complex conjugate pair
is a pair of complex numbers that are identical except for the sign of their imaginary parts. For example:
is the complex conjugate of:
If we were to multiply these two complex numbers together:
Using FOIL, this would expand to:
If we do the multiplications inside each of the 4 brackets we get:
Notice how the second and third brackets cancel each other out leaving a result that
is completely real. So:
If two complex conjugates are multiplied together, they give a real result.
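Python has complex numbers built in, which makes it easy to check each of the examples above. Note that Python writes the imaginary unit as j rather than i, so i4 becomes 4j. The conjugate pair 3 + i4 and 3 − i4 is a hypothetical example chosen for illustration; any pair would do.

```python
# Python writes the imaginary unit as j: i4 is written 4j.
print(complex(2, 0) * complex(3, 0))   # real x real           -> (6+0j)
print(5 * 4j)                          # real x imaginary      -> 20j
print(2j * 4j)                         # imaginary x imaginary -> (-8+0j)
print((8 + 5j) * (3 + 9j))             # complex x complex     -> (-21+87j)

z = 3 + 4j                             # hypothetical example value
print(z.conjugate())                   # -> (3-4j)
print(z * z.conjugate())               # conjugate pair        -> (25+0j)
```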
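Summary of the rules of complex multiplication
• A real number × a real number = a real number
• A real number × an imaginary number = an imaginary number
• An imaginary number × an imaginary number = a real number
• A complex number × a complex number = a complex number (or, in special cases such as complex conjugates, a purely real or purely imaginary number)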
Continuing with our numeric example
To help you with the next section, this website might be useful: Complex Number
Calculator.
So now we’ve sorted out how to multiply real, imaginary and complex numbers,
let’s remind ourselves of the twiddle factors for a 4-point DFT:
So the calculations for the 4-point DFTs will work as follows:
…and so on for each of the four 4-point DFTs.

Frequency  Real (Cosine)  Imaginary (Sine)
b(0)        0              0
b(1)        0             -0.5
b(2)        2              0
b(3)        0              0.5
b(4)        0              0
b(5)       -0.3536         0.3536
b(6)        0              0
b(7)       -0.3536        -0.3536
b(8)        0              0
b(9)        0.4619         0.1913
b(10)       1.4142         0
b(11)       0.4619        -0.1913
b(12)       0              0
b(13)      -0.1913        -0.4619
b(14)      -1.4142         0
b(15)      -0.1913         0.4619
Next we combine the results from the four 4-point DFTs into two 8-point DFTs.
We now have 4 twiddle factors to calculate:
Things are starting to get a little bit messy with lots of calculations with real and
imaginary parts to them so I’ll walk through one in detail and leave you to apply it
to all the others. You can check your answers in the table at the end of the 8-point
DFT section.
Let’s take the calculation of the frequency c(5) as an example. From the butterfly
diagram we see that c(5) is calculated using the following formula:
We saw above that:
and:
and:
So substituting these numbers into the equation:
Now this all looks very complicated, but all we need to do is break it down. Firstly
we’ll multiply out the brackets using the FOIL method. So the multiplication of the
brackets looks like:
Using FOIL gives us the result:
So:
Frequency  Real (Cosine)  Imaginary (Sine)
c(0)        0.0000         0.0000
c(1)        0.0000         0.0001
c(2)        2.0000         0.0000
c(3)        0.0000         1.0001
c(4)        0.0000         0.0000
c(5)        0.0000        -1.0001
c(6)        2.0000         0.0000
c(7)        0.0000        -0.0001
c(8)        0.0000         0.0000
c(9)        0.0000         0.0000
c(10)       1.4142         1.4142
c(11)       0.9238        -0.3826
c(12)       0.0000         0.0000
c(13)       0.9238         0.3826
c(14)       1.4142        -1.4142
c(15)       0.0000         0.0000
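Every combine stage follows the same pattern: the outputs of two half-size DFTs, one computed from the even-indexed samples and one from the odd-indexed samples, are merged using the twiddle factors W_N^k = e^(-j2πk/N) for k = 0 … N/2 - 1. The sketch below shows that step in general form. The function names `dft` and `combine` and the 8-sample test signal are hypothetical, and it does not reproduce the exact b( ) and c( ) indexing of the butterfly diagram used above; it only checks that the merged result matches a direct DFT.

```python
import cmath

def dft(x):
    """Direct O(N^2) DFT: F(n) = sum_k x(k) * exp(-j*2*pi*k*n/N)."""
    N = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * n / N) for k in range(N))
            for n in range(N)]

def combine(even_dft, odd_dft):
    """Merge two N/2-point DFTs into one N-point DFT using twiddle factors."""
    half = len(even_dft)
    N = 2 * half
    out = [0j] * N
    for k in range(half):
        w = cmath.exp(-2j * cmath.pi * k / N)   # twiddle factor W_N^k
        t = w * odd_dft[k]
        out[k] = even_dft[k] + t                # top output of the butterfly
        out[k + half] = even_dft[k] - t         # bottom output, since W_N^(k+N/2) = -W_N^k
    return out

# Hypothetical 8-sample test signal.
x = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, -2.0, 1.5]
merged = combine(dft(x[0::2]), dft(x[1::2]))
direct = dft(x)
print(all(abs(a - b) < 1e-9 for a, b in zip(merged, direct)))   # True
```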
Finally we come to the 16-point DFT:

This time there are 8 twiddle factors to think about. As we saw in the previous
post, these are:
Again the method we use to calculate each of the results is identical to the method
we used in the 8-point example, so, at the risk of making this post any longer than
it needs to be, I’ll skip the working and cut straight to the results which can be seen
in the following table:
Frequency  Real (Cosine)  Imaginary (Sine)
X(0)        0.0000         0.0000
X(1)        0.0000         0.0000
X(2)        4.0000         0.0000
X(3)        0.0000         0.0002
X(4)        0.0000         0.0000
X(5)        0.0000        -2.0000
X(6)        0.0000         0.0000
X(7)        0.0000        -0.0001
X(8)        0.0000         0.0000
X(9)        0.0000         0.0001
X(10)       0.0000         0.0000
X(11)       0.0000         2.0000
X(12)       0.0000         0.0000
X(13)       0.0000        -0.0002
X(14)       4.0000         0.0000
X(15)       0.0000         0.0000
Now you can see that there are lots of zeros in the various terms.
However, it is a bit difficult to tell what is going on in our signal from these results.
We need to use Pythagoras to get a magnitude value for each frequency to tell us
what that frequency’s actual contribution is. We do this by combining the real part
for each frequency, ℜ, and the imaginary part, ℑ, in the following manner:
$$|X| = \sqrt{\Re^2 + \Im^2}$$
This gives us the following results:
Frequency Index  Magnitude
0                0.0000
1                0.0000
2                4.0000
3                0.0002
4                0.0000
5                2.0000
6                0.0000
7                0.0001
8                0.0000
9                0.0001
10               0.0000
11               2.0000
12               0.0000
13               0.0002
14               4.0000
15               0.0000
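The magnitude step is one line per frequency. The sketch below copies the real and imaginary parts from the 16-point results table into a list named `X` (a name chosen here for illustration) and applies sqrt(ℜ² + ℑ²) via `math.hypot`. It then keeps only indices below N/2 = 8, on the assumption stated above that the upper half are reflections.

```python
import math

# Real and imaginary parts of X(0)..X(15), copied from the 16-point table above.
X = [
    (0.0, 0.0), (0.0, 0.0), (4.0, 0.0), (0.0, 0.0002),
    (0.0, 0.0), (0.0, -2.0), (0.0, 0.0), (0.0, -0.0001),
    (0.0, 0.0), (0.0, 0.0001), (0.0, 0.0), (0.0, 2.0),
    (0.0, 0.0), (0.0, -0.0002), (4.0, 0.0), (0.0, 0.0),
]

# Pythagoras: magnitude = sqrt(Re^2 + Im^2); math.hypot computes exactly this.
magnitudes = [math.hypot(re, im) for re, im in X]

for index, mag in enumerate(magnitudes):
    print(f"{index:2d}  {mag:.4f}")

# Only indices below N/2 = 8 are unique; the rest are reflections.
peaks = [i for i, mag in enumerate(magnitudes[:8]) if mag > 1.0]
print(peaks)   # [2, 5]
```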
But this would be much clearer in graphical form so let’s look at our answer
properly and see which frequencies make up our signal:
So our signal is made up of 2 frequencies (remember all the frequencies above half
the sampling rate, 8Hz in this case, are simply reflected frequencies). These
frequencies are 2Hz and 5Hz.
So there we have it! That’s how the Fourier Transform works, which brings us to
the end of the blog.
As I was working through this numeric example, I wrote my own FFT algorithm in
JavaScript to help me perform all the calculations necessary. I’ll post it with a full
explanation in the next post.
DFT and FFT TUTORIAL
A DFT is a "Discrete Fourier Transform". An FFT is a "Fast Fourier Transform". An
FFT is a DFT, but is much faster for calculations. The whole point of the FFT is speed in
calculating a DFT.
The Goal of This Tutorial

The goal of this tutorial is to show how to take a discretely sampled wave, usually
from nature, and convert it to the frequency domain by using an FFT. An FFT is a
DFT of a particular form. An FFT is simply a fast way of calculating a DFT.
Before explaining the FFT I explain the DFT. I do this for three reasons.
1. A DFT is much simpler to understand mathematically, which is better for learning.
2. Once you understand what the terms of a DFT mean, they apply to the FFT,
so you are learning the FFT too.
3. You will gain an appreciation for what the FFT accomplishes, because it is
derived from the DFT and the only purpose of the complex math of an FFT
is to speed up the DFT calculation. It's purely about speedy number
crunching and changes none of the fundamentals.
Why Do This?
The reason to learn about the DFT and FFT is to get a frequency spectrum of a
wave, that is, to understand better what frequencies it is composed of. This might
allow you to identify a sound wave that you have sampled more reliably than you
could from the time wave alone, which is useful for speech recognition. Or,
maybe you want to add or subtract frequencies and recreate the original wave with
these modifications using an inverse Fourier Transform. Applying the same idea to
images, you could, for example, remove dirty spots or noise from an image, or find
recurring patterns in it. You may have other ideas as to what you can do
with the frequency components of a wave. The sky is the limit!
The DFT
The Discrete Fourier Transform converts discrete data from a time wave into a frequency
spectrum. Using the DFT implies that the finite segment that is analyzed is one period of an
infinitely extended periodic signal.
The DFT equation (Equation 1):
$$F(n) = \sum_{k=0}^{N-1} x(k)\, e^{-j 2\pi k n / N}, \qquad n = 0, 1, \ldots, N-1$$
x(k) is the time wave that is converted to a frequency spectrum by the DFT.
Here are key concepts required to understand a DFT:
1. The "sampling rate", sr. The sampling rate is the number of samples taken
per unit of time (usually per second). For simplicity we will make the time interval
between samples equal. This is the "sample interval", si.
2. The fundamental period, T, is the period of all the samples taken. This is
also called the "window".
3. The "fundamental frequency" is f0, which is 1/T. f0 is the first harmonic,
the second harmonic is 2*f0, the third is 3*f0, etc.
4. The number of samples is N.
5. The "Nyquist Frequency", fc, is half the sampling rate. The Nyquist
frequency is the maximum frequency that can be detected for a given
sampling rate. This is because in order to measure a wave you need at least
two sample points per cycle to identify it (a trough and a peak).
6. "Euler's formula": $e^{j\theta} = \cos\theta + j\sin\theta$
7. The sampled part of the time wave, x(t), should be "typical" of how the
wave behaves over all time that it exists.
8. $W_N = e^{-j 2\pi / N}$. This notation makes handling the exponential easier.
It is sometimes called the "twiddle factor."
For simplicity, we will sample a sine wave with a small number of points, N, and
perform a DFT on it, then we will employ each of the concepts above. Note, the
sine wave is a time wave, and could be any wave in nature, for example a sound
wave. The horizontal axis is time. The vertical axis is amplitude.
Diagram 1
Notice how in the diagram above we are sampling four points. The fundamental
period, T, of the wave sampled is set to 2*pi. This applies to any wave we want to
sample. The interval between samples is 2*pi/N, so in this case it is 2*pi/4. Thus,
the interval between samples is pi/2 in this case.
The time wave is thus, x(k) = sin(pi/2*k) for k = 0 to N -1. The last point sampled
is always the point just before 2*pi, because the wave is considered to be a
repeating pattern and wraps around back to the value at k = 0, so you aren't missing
any information.
We also need to know the time taken to sample the wave, so that we can tie it to a
frequency. In our example, the time taken for the fundamental period, T, is 0.1
seconds (this value is measured when the wave is captured). That means the sine
wave is a 10 Hz wave. Hertz = cycles per second. Also, the sampling interval, si,
is the fundamental period time divided by the number of samples. So, si = T/N =
(0.1)/4 seconds, or 0.025 seconds. The sampling rate, or frequency, sr, = 1/si = 40
Hz, or 40 samples per second.
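The arithmetic above can be written out directly. This is a minimal sketch using the example's values (T = 0.1 s, N = 4); the variable names simply mirror the symbols in the text.

```python
# Sampling parameters for the four-point sine-wave example above.
T = 0.1        # fundamental period (window length) in seconds
N = 4          # number of samples

si = T / N     # sample interval, about 0.025 s
sr = 1 / si    # sampling rate, about 40 samples per second (40 Hz)
f0 = 1 / T     # fundamental frequency, about 10 Hz
fc = sr / 2    # Nyquist frequency, about 20 Hz

print(si, sr, f0, fc)
```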
For the sine wave, the value at each of the four points sampled is:
x(0) = sin(0) = 0, x(1) = sin(pi/2) = 1, x(2) = sin(pi) = 0, x(3) = sin(3*pi/2) = -1
And, before we plug into the DFT, some more on Wn, the twiddle factor,
referenced above:
The DFT formula, then, for a four point sample and with the twiddle factor is:
$$F(n) = \sum_{k=0}^{3} x(k)\, W_4^{kn}$$
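Now, Euler's Formula for N=4 (Equation 2):
$$W_4^{kn} = e^{-j 2\pi kn / 4} = \cos\left(\frac{2\pi kn}{4}\right) - j\sin\left(\frac{2\pi kn}{4}\right)$$
For the equation above, where k*n = 0 to N - 1, i.e. 0 to 3, here are the results:
$$W_4^{0} = 1, \quad W_4^{1} = -j, \quad W_4^{2} = -1, \quad W_4^{3} = j$$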
Notice that any additional integer values of kn will cycle back around. For
example, kn = 4 cycles back to kn=0, so the value is 1. kn = 5 cycles back around
to kn = 1, so the value is -j. The equation "kn modulus 4" determines which value
of W is selected. Also, note that for larger numbers of samples the cycle is longer. So for N=8
the equation would be "kn modulus 8". This is probably why W is called the
"twiddle factor".
Now put this together for the DFT:
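Here is the DFT worked out for all four points and for four frequencies:
$$\begin{aligned}
F(0) &= x(0)W_4^{0} + x(1)W_4^{0} + x(2)W_4^{0} + x(3)W_4^{0} = 0 + 1 + 0 - 1 = 0 \\
F(1) &= x(0)W_4^{0} + x(1)W_4^{1} + x(2)W_4^{2} + x(3)W_4^{3} = 0 - j + 0 - j = -2j \\
F(2) &= x(0)W_4^{0} + x(1)W_4^{2} + x(2)W_4^{4} + x(3)W_4^{6} = 0 - 1 + 0 + 1 = 0 \\
F(3) &= x(0)W_4^{0} + x(1)W_4^{3} + x(2)W_4^{6} + x(3)W_4^{9} = 0 + j + 0 + j = 2j
\end{aligned}$$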
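The "kn modulus 4" lookup described above can be sketched in a few lines. The function name `twiddle` is hypothetical; reducing kn modulo N first mirrors the text, and gives the same value as using kn directly because W repeats every N steps.

```python
import cmath

def twiddle(N, kn):
    """W_N^(kn) = exp(-j*2*pi*kn/N), looked up via kn modulo N as in the text."""
    return cmath.exp(-2j * cmath.pi * (kn % N) / N)

# kn = 0..7 for N = 4: the four values 1, -j, -1, j repeat.
for kn in range(8):
    w = twiddle(4, kn)
    print(kn, complex(round(w.real), round(w.imag)))   # rounding hides ~1e-16 noise
```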
Evaluating the output data. Each F(n) value corresponds to a particular
frequency. The frequency of the point is determined by the fundamental frequency
multiplied by n, i.e. f = f0*n, where f0 = 1/T = 10 Hz. Each output value is a
complex number, real + j*imaginary, which encodes both the amplitude and the
phase of that frequency component. The fundamental frequency, or first harmonic,
is 10 Hz, as calculated above. The magnitude at a frequency is calculated as
sqrt(real*real + imaginary*imaginary).
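The whole four-point example can be reproduced with a direct DFT. This is a minimal sketch (the helper `W` is a hypothetical name for W_4^(kn)); to within floating-point rounding it gives F(0) = 0, F(1) = -2j, F(2) = 0 and F(3) = 2j, so the magnitudes are 0, 2, 0 and 2 at 0, 10, 20 and 30 Hz.

```python
import cmath
import math

N = 4
f0 = 10.0                                           # fundamental frequency in Hz (1/T)
x = [math.sin(math.pi / 2 * k) for k in range(N)]   # samples 0, 1, 0, -1

def W(kn):
    """Twiddle factor W_4^(kn) = exp(-j*2*pi*kn/4)."""
    return cmath.exp(-2j * cmath.pi * kn / N)

# F(n) = sum over k of x(k) * W^(kn)
F = [sum(x[k] * W(k * n) for k in range(N)) for n in range(N)]

for n, value in enumerate(F):
    magnitude = abs(value)                          # sqrt(real^2 + imag^2)
    print(f"{f0 * n:4.0f} Hz  F({n}) = {value.real:+.3f}{value.imag:+.3f}j  |F| = {magnitude:.3f}")
```

Note that F(3) is the complex conjugate of F(1), which is why the 30 Hz spike has the same height as the 10 Hz one.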
Below is a frequency spectrum plot for the sine wave determined from the DFT we
just worked through:
Diagram 2
The frequency plot is in the "frequency domain". The magnitudes are plotted in
Diagram 2. The spike at 10 Hz shows that the DFT pulled out one of the
frequencies that is in the sine wave. In fact, the sine wave is a 10 Hz sine wave, so
that makes sense. However, the spike at 30 Hz should not be there, because there is
no 30 Hz wave in the sine wave. So what accounts for that spike? Well, this is
where the Nyquist Frequency, fc, mentioned above comes in. The Nyquist
frequency is the cut off point above which the data from the DFT is no longer
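valid. The sampling rate is 40 Hz, and fc is half the sampling frequency, which
means that any frequency above 20 Hz will not be valid in this case. So, the 30 Hz
frequency is a spurious signal: for a real-valued input, F(N - n) is the complex
conjugate of F(n), so the 30 Hz spike (n = 3) is simply a mirror image of the 10 Hz
spike (n = 1).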
That completes analysis of a very simple wave.
Most waves will have many more frequencies in them, and thus many more spikes
of various magnitudes along the frequency spectrum. For example, Diagram 3,
below, is a plot of a triangle wave in time and its corresponding frequency
spectrum:
Diagram 3
The next section is on the FFT. The FFT builds on the knowledge of the DFT
described above, so the DFT should be understood before moving on to the FFT.
Overview of The FFT

This is a "decimation in time" FFT, because it works by repeatedly splitting
(decimating) the input time samples, x(k), into even-indexed and odd-indexed groups.
The time wave could be a sound wave, for example.
Speed!
The only reason to learn the FFT is for speed. An FFT is a very efficient DFT
calculating algorithm.
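How fast is an FFT versus a "straight" DFT? A straight DFT needs about N*N
multiplies and an FFT about N*log2(N), so the speed-up is (Equation 1):
$$\frac{N^2}{N \log_2 N} = \frac{N}{\log_2 N}$$
This means that a 1024 sample FFT is 1024/10 = 102.4 times faster than the
"straight" DFT. For larger numbers of samples the speed advantage improves. For
example, for 4096 samples the FFT is 4096/12, or over 340 times faster.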
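The ratio N/log2(N) is easy to tabulate. This is a minimal sketch that counts multiplies only, which is an assumption: real run time also depends on additions, memory access and the implementation. The function name is hypothetical.

```python
import math

def dft_to_fft_ratio(N):
    """Ratio of N*N (straight DFT) to N*log2(N) (FFT) multiplication counts."""
    return (N * N) / (N * math.log2(N))

for N in (8, 1024, 4096):
    print(N, round(dft_to_fft_ratio(N), 1))
```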
Learning the FFT is a bit of a challenge, but I'm hoping this tutorial will make it
relatively easy to learn. Here is a basic outline of the tutorial:
1. First, you'll need to learn the "Danielson-Lanczos Lemma" (D-L Lemma). This will
require long equation writing, but it's a vital component of the FFT. I'll give several
examples.
2. You'll need to understand the "twiddle factor", W_N. This was discussed a little during
the DFT tutorial. Along with the "D-L Lemma", it is essential to understanding how an
FFT works.
3. Then the "Butterfly Diagram" will be explained. This builds on the first two concepts
above. The Butterfly diagram is a diagrammatic representation of an FFT algorithm.
4. You'll also learn about the "reverse bit pattern" for data input and the reason for it.
The Danielson-Lanczos Lemma
This is the first key to understanding the FFT. It takes quite a few steps, but I've broken
the tutorial down into small digestible steps to make this as smooth as possible.
Here is the Danielson-Lanczos Lemma (Equation 2):
$$F(n) = \sum_{k=0}^{N/2-1} x(2k)\, W_{N/2}^{kn} + W_N^{n} \sum_{k=0}^{N/2-1} x(2k+1)\, W_{N/2}^{kn} = E(n) + W_N^{n}\, O(n)$$
Note it is a DFT broken up into two summations of half the size of the original.
The first summation is the "even terms", E, and the second is the "odd terms", O.
W is the "twiddle factor", and understanding it is another key to understanding the
FFT. Here is the "twiddle factor" (Equation 3):
$$W_N = e^{-j 2\pi / N}$$
How to Expand the DFT

To expand the DFT into even and odd terms as in the lemma above, you do the
following. For the even term you substitute 2k into k, then you create a summation
of half the size of the original. For the odd term you substitute 2k + 1 into k, then
create a summation of half the size of the original.
Here is the summation halved:
$$\sum_{k=0}^{N-1} \;\longrightarrow\; \sum_{k=0}^{N/2-1}$$

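The example below shows the process required for a first level expansion, using
$W_N^{2} = W_{N/2}$ (Equation 4):
$$\begin{aligned}
\text{Even: } \sum_{k=0}^{N/2-1} x(2k)\, W_N^{2kn} &= \sum_{k=0}^{N/2-1} x(2k)\, W_{N/2}^{kn} = E(n) \\
\text{Odd: } \sum_{k=0}^{N/2-1} x(2k+1)\, W_N^{(2k+1)n} &= W_N^{n} \sum_{k=0}^{N/2-1} x(2k+1)\, W_{N/2}^{kn} = W_N^{n}\, O(n)
\end{aligned}$$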
Note the "twiddle factor" above and where it comes from.
And putting the even and odd terms together from above, we get the Danielson-Lanczos
first level expansion (Equation 5):
$$F(n) = E(n) + W_N^{n}\, O(n)$$
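A quick numerical check of the first level expansion F(n) = E(n) + W_N^n O(n): evaluate the even-indexed and odd-indexed half sums separately and compare against a direct DFT. This is a sketch only; the function names and the 8-sample signal are hypothetical.

```python
import cmath

def dft(x):
    """Direct DFT: F(n) = sum_k x(k) * W_N^(kn), with W_N = exp(-j*2*pi/N)."""
    N = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * n / N) for k in range(N))
            for n in range(N)]

def dl_first_level(x, n):
    """Evaluate F(n) = E(n) + W_N^n * O(n) from the two half-size sums."""
    N = len(x)
    half = N // 2
    E = sum(x[2 * k] * cmath.exp(-2j * cmath.pi * k * n / half) for k in range(half))
    O = sum(x[2 * k + 1] * cmath.exp(-2j * cmath.pi * k * n / half) for k in range(half))
    return E + cmath.exp(-2j * cmath.pi * n / N) * O     # W_N^n times the odd sum

x = [3.0, -1.0, 2.0, 0.5, 0.0, 4.0, -2.0, 1.0]   # hypothetical 8-sample wave
full = dft(x)
split = [dl_first_level(x, n) for n in range(len(x))]
print(max(abs(a - b) for a, b in zip(full, split)))   # only floating-point rounding error
```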
The above is a first level break down. You can continue to break each term down
into even and odd terms until you run out of samples and only have one value in
the summation. Like this:
Equation 6
This happens because you keep halving the number of values summed on each
expansion of the equation. For the FFT we want all summations to be expanded
down to 1 term. Here is the pattern of expansion for the Danielson-Lanczos
Lemma:
As shown in the diagram above, the D-L Lemma breaks down in a binary manner.
That is, the number of terms expands as follows: 1, 2, 4, 8, 16, 32, etc. In order to
reduce every summation to a single term, therefore, we must have a power-of-2
number of samples, or N = 2^r samples. So, this (radix-2) FFT requires N = 2^r samples.
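Applying the lemma repeatedly until each summation holds a single sample gives a recursive FFT. The sketch below is one straightforward way to write that recursion, not the author's implementation; the name `fft` is hypothetical, and it rejects lengths that are not powers of two for the reason given above.

```python
import cmath

def fft(x):
    """Recursive radix-2 decimation-in-time FFT built from the D-L Lemma.

    len(x) must be a power of two, because each level halves the problem
    until every summation contains a single sample.
    """
    N = len(x)
    if N == 0 or N & (N - 1):
        raise ValueError("length must be a power of two (N = 2**r)")
    if N == 1:
        return [complex(x[0])]          # a one-term summation is just the sample
    E = fft(x[0::2])                    # DFT of even-indexed samples
    O = fft(x[1::2])                    # DFT of odd-indexed samples
    out = [0j] * N
    for n in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * n / N) * O[n]   # W_N^n * O(n)
        out[n] = E[n] + t
        out[n + N // 2] = E[n] - t      # E, O repeat every N/2, and W_N^(n+N/2) = -W_N^n
    return out

print(fft([0.0, 1.0, 0.0, -1.0]))       # approximately [0, -2j, 0, 2j]
```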
For a sample size of N=2, a first level expansion will be enough to get the
summations to unity. The first level expansion will look like this after plugging
into equation 5 above (Equation 7):
$$F(n) = x(0) + W_2^{n}\, x(1)$$
The summations are reduced to unity, and all that remains is a twiddle factor and
input values, x(0) and x(1). This is the general form used for the Butterfly
diagram, shown later in this tutorial.
The next example expands the D-L Lemma to 4 terms.
For N = 4 samples, the equation must be expanded again to four terms. Below is
the expansion to four terms. As with the first level expansion, substitute 2k into k
and reduce the summation by half for the even terms and substitute 2k+1 into k and
reduce the summation by half for the odd terms. The E and O below refer to
equation 5.
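Here is the even value expanded from E:
$$\sum_{k=0}^{N/4-1} x(4k)\, W_{N/4}^{kn}$$
Here is the odd value expanded from E:
$$W_{N/2}^{n} \sum_{k=0}^{N/4-1} x(4k+2)\, W_{N/4}^{kn}$$
Here is the even value expanded from O:
$$\sum_{k=0}^{N/4-1} x(4k+1)\, W_{N/4}^{kn}$$
Here is the odd value expanded from O:
$$W_{N/2}^{n} \sum_{k=0}^{N/4-1} x(4k+3)\, W_{N/4}^{kn}$$
And finally (Equation 8):
$$F(n) = \sum_{k=0}^{N/4-1} x(4k)\, W_{N/4}^{kn} + W_{N/2}^{n} \sum_{k=0}^{N/4-1} x(4k+2)\, W_{N/4}^{kn} + W_N^{n}\left[\sum_{k=0}^{N/4-1} x(4k+1)\, W_{N/4}^{kn} + W_{N/2}^{n} \sum_{k=0}^{N/4-1} x(4k+3)\, W_{N/4}^{kn}\right]$$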
Now, N = 4 samples, and using the same procedure as was used for two samples,
equation 8 becomes (Equation 9):
$$F(n) = x(0) + W_2^{n}\, x(2) + W_4^{n}\left[x(1) + W_2^{n}\, x(3)\right]$$
Once again, as with N=2, the summations have been reduced to unity, and all you
have remaining are "twiddle factors" and the input values, x(0), x(1), x(2), and
x(3).
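For N = 4, the fully expanded form is F(n) = x(0) + W_2^n x(2) + W_4^n [x(1) + W_2^n x(3)], obtained from equation 5 by expanding E and O once more. The sketch below evaluates that nested expression directly and compares it with a straight DFT; the helper names and sample values are hypothetical.

```python
import cmath

def W(N, m):
    """Twiddle factor W_N^m = exp(-j*2*pi*m/N)."""
    return cmath.exp(-2j * cmath.pi * m / N)

def eq9(x, n):
    """Four-sample expansion: x(0) + W_2^n x(2) + W_4^n [x(1) + W_2^n x(3)]."""
    return x[0] + W(2, n) * x[2] + W(4, n) * (x[1] + W(2, n) * x[3])

def dft(x):
    """Straight DFT for comparison."""
    N = len(x)
    return [sum(x[k] * W(N, k * n) for k in range(N)) for n in range(N)]

x = [1.0, 2.0, 3.0, 4.0]     # hypothetical samples
expanded = [eq9(x, n) for n in range(4)]
direct = dft(x)
print(expanded)
print(direct)
```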
The next example will be an 8 term expansion, shown but not worked through.
Expanded to 8 Terms
Equation 10
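After the expansion above you can plug in N = 8 samples and k = 0, since every
summation reduces to a single term when N = 8 (Equation 11):
$$F(n) = x(0) + W_2^{n} x(4) + W_4^{n}\left[x(2) + W_2^{n} x(6)\right] + W_8^{n}\left\{x(1) + W_2^{n} x(5) + W_4^{n}\left[x(3) + W_2^{n} x(7)\right]\right\}$$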
Observations
Note two things about equations 7, 9 and 11, repeated below. First, the order of the
input values, x(n), is "reverse binary". For example, from left to right the order for the 4
term equation is x(0), x(2), x(1) and x(3). The order for the 8 term equation is x(0),
x(4), x(2), x(6), x(1), x(5), x(3), and x(7). This naturally happens when the D-L
Lemma is expanded. The Butterfly Diagram makes use of this fact. The second
thing to note is that the "twiddle factors", W, build up with each new expansion,
so that you multiply more of them together. The Butterfly diagram also deals with this by
adding "stages", which you will see later in this tutorial.
Equation 7
Equation 9
Equation 11
More on the "Reverse Binary" pattern.

Here are two examples of the "reverse binary" pattern:
For 4 inputs:
Count from 0 to 3 in binary 00, 01, 10, 11. Now, reverse the bits of each number
and you get 00, 10, 01, 11. In decimal this is 0, 2, 1, and 3.
So, the values in the D-L equation would be x(0), x(2), x(1), and x(3). This is what
you see in equation 9 above.
For 8 inputs:
Count from 0 to 7 in binary 000, 001, 010, 011, 100, 101, 110, 111. Now, reverse
the bits of each number and you get 000, 100, 010, 110, 001, 101, 011, 111. In
decimal this sequence is 0, 4, 2, 6, 1, 5, 3, and 7.
So, the values in the D-L equation for 8 samples would be x(0), x(4), x(2), x(6),
x(1), x(5), x(3), and x(7). This is what you see in equation 11 above.
The same pattern holds for all expansions of the D-L Lemma, and is made use of
by the Butterfly Diagram.
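The reverse binary ordering can be generated exactly as described: write each index in r bits, reverse the bits, and read the result back as a number. A minimal sketch; the function name is hypothetical, and N is assumed to be a power of two.

```python
def bit_reverse_order(N):
    """Return the indices 0..N-1 in reverse-binary order (N must be a power of two)."""
    bits = N.bit_length() - 1            # r, where N = 2**r
    order = []
    for i in range(N):
        reversed_i = int(format(i, f"0{bits}b")[::-1], 2)   # write i in r bits, reverse, read back
        order.append(reversed_i)
    return order

print(bit_reverse_order(4))   # [0, 2, 1, 3]
print(bit_reverse_order(8))   # [0, 4, 2, 6, 1, 5, 3, 7]
```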
Next I'll discuss the "twiddle factor" and then put it together with the Danielson-
Lanczos Lemma to create the Butterfly diagram, which is the FFT in diagram
form.
The "Twiddle Factor"

The twiddle factor, W, describes a "rotating vector", which rotates in increments
according to the number of samples, N. Here are graphs where N = 2, 4 and 8 samples.
The Redundancy and Symmetry of the "Twiddle Factor"
As shown in the diagram above, the twiddle factor has redundancy in values as the
vector rotates around. For example, W for N=2 is the same for n = 0, 2, 4, 6, etc.
And W for N=8 is the same for n = 3, 11, 19, 27, etc. In general, $W_N^{n+N} = W_N^{n}$.
The symmetry is that values which are 180 degrees out of phase are the negative
of each other, $W_N^{n+N/2} = -W_N^{n}$. So, for example, W for N = 4 samples where
n = 0, 4, 8, etc. is the negative of W where n = 2, 6, 10, etc.
The Butterfly diagram takes advantage of this redundancy and symmetry, which is
part of what makes the FFT possible.
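Both properties are easy to confirm numerically. The sketch below checks the examples quoted above (N = 2, N = 8 and N = 4); the helper names are hypothetical, and a small tolerance absorbs floating-point rounding.

```python
import cmath

def W(N, n):
    """Twiddle factor W_N^n = exp(-j*2*pi*n/N)."""
    return cmath.exp(-2j * cmath.pi * n / N)

def close(a, b, tol=1e-9):
    """Compare two complex numbers, allowing for floating-point rounding."""
    return abs(a - b) < tol

# Redundancy: W_N^n repeats every N steps around the circle.
repeats_n2 = all(close(W(2, n), W(2, 0)) for n in (0, 2, 4, 6))
repeats_n8 = all(close(W(8, n), W(8, 3)) for n in (3, 11, 19, 27))

# Symmetry: values half a turn (N/2 steps) apart are negatives of each other.
symmetric_n4 = all(close(W(4, n + 2), -W(4, n)) for n in range(8))

print(repeats_n2, repeats_n8, symmetric_n4)   # True True True
```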
The Butterfly Diagram

The Butterfly Diagram builds on the Danielson-Lanczos Lemma and the twiddle
factor to create an efficient algorithm. The Butterfly Diagram is the FFT
algorithm represented as a diagram.
First, here is the simplest butterfly. It's the basic unit, consisting of just two inputs
and two outputs.
That diagram is the fundamental building block of a butterfly. It has two input
values, or N=2 samples, x(0) and x(1), and results in two output values F(0) and
F(1). The diagram comes from the D-L Lemma for two inputs.
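This can be shown by taking equation 7 above and plugging in the values n = 0 and
n = 1, thus:
$$\begin{aligned}
F(0) &= x(0) + W_2^{0}\, x(1) = x(0) + x(1) \\
F(1) &= x(0) + W_2^{1}\, x(1) = x(0) - x(1)
\end{aligned}$$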
So, the Butterfly comes from the Danielson-Lanczos Lemma, but it also uses the
twiddle factor to take advantage of redundancies and symmetry in the D-L Lemma.
To get a full understanding of the Butterfly, a four input Butterfly will be required.
That is described next.
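Since W_2^0 = 1 and W_2^1 = -1, the two-input butterfly needs only an addition and a subtraction. A minimal sketch with a hypothetical function name:

```python
def butterfly2(x0, x1):
    """Two-input butterfly: F(0) = x(0) + x(1), F(1) = x(0) - x(1).

    These come from F(n) = x(0) + W_2^n x(1) with W_2^0 = 1 and W_2^1 = -1.
    """
    return x0 + x1, x0 - x1

print(butterfly2(3.0, 5.0))   # (8.0, -2.0)
```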
Constructing A 4 Input Butterfly Diagram

Here I will show you, step by step, how to construct a 4 input Butterfly Diagram.
Next, extend the lines and connect the upper and lower butterflies.
Finally, label the butterfly.

Note the order of input values is "reverse bit" order. The Butterfly uses the natural
expansion order of the Danielson-Lanczos Lemma, which is why the input is
ordered that way. This was described earlier.
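The four output equations for the butterfly, obtained by setting n = 0 to 3 in equation 9, are (Equation 12):
$$\begin{aligned}
F(0) &= [x(0) + x(2)] + [x(1) + x(3)] \\
F(1) &= [x(0) - x(2)] + W_4^{1}[x(1) - x(3)] \\
F(2) &= [x(0) + x(2)] - [x(1) + x(3)] \\
F(3) &= [x(0) - x(2)] - W_4^{1}[x(1) - x(3)]
\end{aligned}$$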
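The four-input butterfly can be sketched as two stages: first the 2-input butterflies on the reverse-bit pairs (x(0), x(2)) and (x(1), x(3)), then a combining stage that uses W_4^0 = 1 and W_4^1 = -j. The outputs are F(0) = [x(0)+x(2)] + [x(1)+x(3)], F(1) = [x(0)-x(2)] + W_4^1[x(1)-x(3)], F(2) = [x(0)+x(2)] - [x(1)+x(3)] and F(3) = [x(0)-x(2)] - W_4^1[x(1)-x(3)]. The function name is hypothetical.

```python
import cmath

def butterfly4(x):
    """Four-input radix-2 butterfly with inputs taken in reverse-bit order.

    Stage 1 combines the pairs (x(0), x(2)) and (x(1), x(3)) with W_2.
    Stage 2 combines those results with W_4^0 = 1 and W_4^1 = -j.
    """
    # Stage 1: two 2-point butterflies on the reverse-bit ordered inputs.
    a0, a1 = x[0] + x[2], x[0] - x[2]
    b0, b1 = x[1] + x[3], x[1] - x[3]
    # Stage 2: twiddle the lower results and combine.
    w1 = cmath.exp(-2j * cmath.pi / 4)        # W_4^1, approximately -j
    return [a0 + b0, a1 + w1 * b1, a0 - b0, a1 - w1 * b1]

print(butterfly4([0.0, 1.0, 0.0, -1.0]))     # approximately [0, -2j, 0, 2j]
```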
The N Log N savings

Remember, for a straight DFT you needed N*N multiplies. The N Log N savings
comes from the structure of the Butterfly: an N input diagram has log2(N) stages of
N/2 butterflies each, and this tutorial counts two multiplies per Butterfly. In the 4 input
diagram above, there are 4 butterflies, so there are a total of 4*2 = 8 multiplies, and
4*log2(4) = 8. This is how you get the computational savings in the FFT! The log
is base 2, as described earlier. See equation 1.
In the next part I provide an 8 input butterfly example for completeness.
An 8 Input Butterfly
Here is an example of an 8 input butterfly:
The 8 input butterfly diagram has 12 2-input butterflies and thus 12*2 = 24
multiplies.
N log2 N = 8 log2(8) = 24. A straight DFT has N*N multiplies, or 8*8 = 64
multiplies. That's a pretty good saving for a small sample. The savings are over
100 times for N = 1024, and they increase as the number of samples increases.
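The counts above follow a simple formula: log2(N) stages of N/2 butterflies each. The sketch below tabulates them, counting two multiplies per butterfly as this tutorial does (an accounting convention; implementations differ in how they count). The function name is hypothetical.

```python
import math

def butterfly_counts(N):
    """Butterflies and multiplies for an N-point radix-2 FFT, counting two
    multiplies per butterfly as this tutorial does, versus N*N for a DFT."""
    stages = int(math.log2(N))          # log2(N) stages
    butterflies = (N // 2) * stages     # N/2 butterflies per stage
    return butterflies, 2 * butterflies, N * N

for N in (4, 8, 1024):
    print(N, butterfly_counts(N))
```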
The Basic Idea

A Fourier Transform converts a wave in the time domain to the frequency domain.
Note, for a full discussion of the Fourier Series and Fourier Transform that are the foundation of the DFT
and FFT, see the Superposition Principle, Fourier Series, Fourier Transform Tutorial.
Every wave has one or more frequencies and amplitudes in it. An example is a sound wave. If someone
speaks, whistles, plays an instrument, etc., to generate a sound wave, then any sample of that sound wave
has a set of frequencies with amplitudes that describe that wave.
According to the mathematician Joseph Fourier, you can take a set of sine waves of different amplitudes,
frequencies and phases and sum them together to represent essentially any periodic wave form. These
component sine waves each have a frequency and amplitude. A plot of frequency versus magnitude
(amplitude) on an x-y graph of these sine wave components is a frequency spectrum, or frequency
domain, plot. See Diagram 1, below.
An inverse Fourier Transform converts the frequency domain components back into the original time wave.
You can reassemble the time wave from the frequency components using the Inverse Fourier
Transform. The Inverse Fourier Transform won't be discussed here, but once you have learned the forward
transform the inverse is very easy to learn, because the math is almost identical. Using the Fourier and
Inverse Fourier Transforms together, not only can you reassemble the original wave, you can also change
the time wave by altering its frequency components. You can add them, remove them, or tweak their
values. This is a powerful method by which to change the character of the time wave.
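To illustrate how close the two transforms are, here is a minimal sketch of an inverse DFT: the same sum with the sign of the exponent flipped, divided by N. Putting the 1/N factor on the inverse is an assumption here (it matches a forward DFT written without 1/N); other conventions split the factor differently. The function names are hypothetical, and the test samples are the four sine samples sin(pi/2 * k).

```python
import cmath

def dft(x):
    """Forward DFT: F(n) = sum_k x(k) * exp(-j*2*pi*k*n/N), with no 1/N factor."""
    N = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * n / N) for k in range(N))
            for n in range(N)]

def idft(F):
    """Inverse DFT: same sum with the exponent's sign flipped, then divided by N."""
    N = len(F)
    return [sum(F[n] * cmath.exp(2j * cmath.pi * k * n / N) for n in range(N)) / N
            for k in range(N)]

x = [0.0, 1.0, 0.0, -1.0]          # four samples of sin(pi/2 * k)
rebuilt = idft(dft(x))
print([round(v.real, 6) for v in rebuilt])
```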
A DFT is a "Discrete Fourier Transform". An FFT is a "Fast Fourier Transform". The IDFT below
is the "Inverse DFT" and the IFFT is the "Inverse FFT". A DFT is a Fourier Transform that takes a finite
number of discrete samples of a time wave and converts them into a frequency spectrum. However,
calculating a DFT is sometimes too slow, because of the number of multiplies required. An FFT is an
algorithm that speeds up the calculation of a DFT. In essence, an FFT is a DFT for speed. The entire
purpose of an FFT is to speed up the calculations.
Diagram 1
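The equation for the Discrete Fourier Transform is (Equation 1):
$$F(n) = \sum_{k=0}^{N-1} x(k)\, e^{-j 2\pi k n / N}, \qquad n = 0, 1, \ldots, N-1$$
Where F(n) is the complex amplitude (magnitude and phase) at the frequency n, x(k) is the k-th time
sample, and N is the number of discrete samples taken.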
Outline For Learning About The DFT and FFT

Here is an outline of the steps used to explain both the DFT and FFT.
1. First the DFT will be explained. This is the vital first step, since an FFT is a DFT and there are,
therefore, basic concepts in common with both. Learning this first will make understanding the
FFT easier.
2. Once you understand the basic concepts of a DFT, the FFT will be explained. This is broken
into several steps.
3. The "Danielson-Lanczos Lemma" will be explained, which is the first step to understanding the
FFT.
4. The "twiddle factor" will be explained, which is another key to understanding the FFT.
5. The "Butterfly Diagram" will be explained. The Butterfly is an FFT in diagram form. It's the
final step of this tutorial and builds on the prior concepts.
6. Several examples will be given along with the basic concepts above.

<< Previous Next >>
The goal of this tutorial is to show how to take a discretely sampled wave, usually from nature, and
convert it to the frequency domain by using an FFT. An FFT is a DFT of a particular form. An FFT is
simply a fast way of calculating a DFT. Before explaining the FFT I explain the DFT. I do this for three
reasons.
1. A DFT is much simpler to understand mathematically, which is better for learning.

2. Once you understand what the terms of a DFT mean, they apply to the FFT, so you are learning
the FFT too.
3. You will gain an appreciation for what the FFT accomplishes, because it is derived from the
DFT and the only purpose of the complex math of an FFT is to speed up the DFT calculation.
It's purely about speedy number crunching and changes none of the fundamentals.
Why Do This?
<< Previous Next >>
The reason to learn about the DFT and FFT is in order to get a frequency spectrum of a wave or to
understand better what frequencies it is composed of. This might allow you to better identify, for
example, a sound wave that you have sampled than could be done with the time wave, which is useful for
speech recognition. Or, maybe you want to add or subtract frequencies and recreate the original wave
with these modifications using an inverse Fourier Transform. Doing this with light waves you could, for
example, remove dirty spots or noise from an image, or find recurring patterns in an image. You may
have other ideas as to what you can do with the frequency components of a wave. The sky is the limit!
The DFT
<< Previous Next >>
The Discrete Fourier Transform converts discrete data from a time wave into
a frequency spectrum. Using the DFT implies that the finite segment that is
analyzed is one period of an infinitely extended periodic signal.
The DFT equation:
Equation 1
1. The "sampling rate", sr. The sampling rate is the number of samples taken over a time period.
For simplicity we will make the time interval between samples equal. This is the "sample
interval", si.
2. The fundamental period, T, is the period of all the samples taken. This is also called the
"window".
3. The "fundamental frequency" is f0, which is 1/T. f0 is the first harmonic, the second harmonic
is 2*f0, the third is 3*f0, etc.
5. The "Nyquist Frequency", fc, is half the sampling rate. The Nyquist frequency is the maximum
frequency that can be detected for a given sampling rate. This is because in order to measure a
wave you need at least two sample points to identify it (trough and peak).
7. The sampled part of the time wave, x(t), should be "typical" of how the wave behaves over all
time that it exists.
8. This notation makes handling the exponential easier. This is sometimes called
the "twiddle factor."
For simplicity, we will sample a sine wave with a small number of points, N, and perform a DFT on it,
then we will employ each of the concepts above. Note, the sine wave is a time wave, and could be any
wave in nature, for example a sound wave. The horizontal axis is time. The vertical axis is amplitude.
Diagram 1
Notice how in the diagram above we are sampling four points. The fundamental period, T, of the wave
sampled is set to 2*pi. This applies to any wave we want to sample. The interval between samples is
2*pi/N, so in this case it is 2*pi/4. Thus, the interval between samples is pi/2 in this case.
The time wave is thus, x(k) = sin(pi/2*k) for k = 0 to N -1. The last point sampled is always the point
just before 2*pi, because the wave is considered to be a repeating pattern and wraps around back to the
value at k = 0, so you aren't missing any information.
We also need to know the time taken to sample the wave, so that we can tie it to a frequency. In our
example, the time taken for the fundamental period, T, is 0.1 seconds (this value is measured when the
wave is captured). That means the sine wave is a 10 Hz wave. Hertz = cycles per second. Also,
the sampling interval, si, is the fundamental period time divided by the number of samples. So, si =
T/N = 0.1/4 seconds, or 0.025 seconds. The sampling rate, or frequency, sr, = 1/si = 40 Hz, or 40
samples per second.
And, before we plug into the DFT, some more on Wn, the twiddle factor, referenced above:
Equation 2
Notice that any additional integer values of kn will cycle back around. For example, kn = 4 cycles back to
kn=0, so the value is 1. kn = 5 cycles back around to kn = 1, so the value is -j. The equation "kn modulus
4" determines which value of W is selected. Also, note that for larger samples the cycle is bigger. So for
N=8 the equation would be "kn modulus 8". This is probably why W is called the "twiddle factor".
Evaluating the output data. Each F(n) value refers to a particular frequency. The frequency of the point is
determined by the fundamental frequency multiplied by n i.e. f = f0*n, where f0=1/T = 10Hz. The output
values are the phase of the frequencies, which are represented by a real part and an imaginary part thus:
real + j*imaginary. The fundamental frequency, first harmonic, is 10 Hz as calculated above.
The magnitude at a frequency is Calculated thus sqrt(real*real + imaginary*imaginary).
Below is a frequency spectrum plot for the sine wave determined from the DFT we just worked through:
Diagram 2
The frequency plot is in the "frequency domain". The spike at 10 Hz shows that the DFT pulled out one
of the frequencies that is in the sine wave. In fact, the sine wave is a 10 Hz sine wave, so that makes
sense. However, the spike at 30 Hz should not be there, because there is no 30 Hz wave in the sine wave.
So what accounts for that spike? Well, this is where the Nyquist Frequency, fc, mentioned above comes
in. The Nyquist frequency is the cut off point above which the data from the DFT is no longer valid. The
sampling rate is 40 Hz, and fc is half the sampling frequency, which means that any frequency above 20
Hz will not be valid in this case. So, the 30 Hz frequency is a spurious signal.
Most waves will have many more frequencies in them, and thus many more spikes of various magnitudes
along the frequency spectrum. For example, below is a triangle wave in time and the corresponding
frequency spectrum of that wave:
Diagram 3
The Next section is FFT. The FFT builds on the knowledge above, so it should be understood before
moving on to the FFT.
Overview of The FFT

<< Previous Next >>
This is a "decimation in time" FFT, because the input value to the FFT is the
time wave, x(t). The time wave could be a sound wave, for example.
Speed!
The only reason to learn the FFT is for speed. An FFT is a very efficient DFT calculating algorithm.

Equation 1
This means that a 1024 sample FFT is 102.4 times faster than the "straight" DFT. For larger numbers of
samples the speed advantage improves. For example, for 4096 samples the FFT is over 340 times faster.

1. First, you'll need to learn the "Danielson-Lanczos Lemma" (D-L Lemma). This will require
long equation writing, but it's a vital component of the FFT. I'll give several examples.
2. You'll need to understand the "twiddle factor" -- . This was discussed a little during the
DFT tutorial. It along with the "D-L Lemma" are essential to understanding how an FFT works.
3. Then the "Butterfly Diagram" will be explained. This builds on the first two concepts above.
The Butterfly diagram is a diagrammatic representation of an FFT algorithm.
The Danielson-Lanczos Lemma

<< Previous Next >>
This is the first key to understanding the FFT. It takes quite a few steps, but
I've broken the tutorial down into small digestible steps to make this as
smooth as possible.
Equation 2
Note it is a DFT broken up into two summations of half the size of the original. The first summation is the
"even terms", E, and the second is the "odd terms", O. W is the "twiddle factor", and understanding it is
another key to understanding the FFT. Here is the "twiddle factor".
Equation 3
$$W_N^{kn} = e^{-j2\pi kn/N}$$

To expand the DFT into even and odd terms as in the lemma above, you do the following. For the even
term you substitute 2k for k, then you create a summation of half the size of the original. For the odd
term you substitute 2k + 1 for k, then create a summation of half the size of the original.

Equation 4
And putting the even and odd terms together from above, we get the Danielson Lanczos first level
expansion:
Equation 5
The above is a first level break down. You can continue to break each term down into even and odd terms
until you run out of samples and only have one value in the summation. Like this:
Equation 6
This happens because you keep halving the number of values summed on each expansion of the
equation. For the FFT we want all summations to be expanded down to 1 term. Here is the pattern of
expansion for the Danielson-Lanczos Lemma:
As shown in the diagram above, the D-L Lemma breaks down in a binary manner. That is, the number of
terms expands as follows: 1, 2, 4, 8, 16, 32, etc. To reduce every summation to a single term, therefore,
the number of samples must be a power of 2, N = 2^r. So, an FFT requires N = 2^r
samples.
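The lemma translates almost line for line into a recursive function. The sketch below is an illustration, not the author's code: it assumes the DFT convention F(n) = sum of x(k)*e^(-j2πkn/N) with no 1/N scaling, and the names dft and fft_dl are hypothetical. Each call splits its input into even terms, x(2k), and odd terms, x(2k+1), until only one value is left in each summation, which is why N must be a power of 2.

```python
import cmath

def dft(x):
    """Straight DFT: F(n) = sum over k of x(k) * exp(-j*2*pi*k*n/N)."""
    N = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * n / N) for k in range(N))
            for n in range(N)]

def fft_dl(x):
    """FFT by repeated Danielson-Lanczos expansion: F(n) = E(n) + W_N^n * O(n)."""
    N = len(x)
    if N == 0 or N & (N - 1):
        raise ValueError("the number of samples must be a power of 2, N = 2^r")
    if N == 1:
        return [complex(x[0])]           # one value left in the summation
    E = fft_dl(x[0::2])                  # even terms x(0), x(2), ...
    O = fft_dl(x[1::2])                  # odd terms  x(1), x(3), ...
    half = N // 2
    # E and O repeat every N/2 values of n, hence the n % half
    return [E[n % half] + cmath.exp(-2j * cmath.pi * n / N) * O[n % half]
            for n in range(N)]

x = [1, 2, 3, 4, 0, -1, 2.5, 7]
print(max(abs(a - b) for a, b in zip(fft_dl(x), dft(x))))  # only rounding error
```

Calling fft_dl with, say, 6 samples raises a ValueError, because the halving cannot reach one-term summations.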
For a sample size of N=2, a first level expansion will be enough to get the summations to unity. The first
level expansion will look like this after plugging into equation 5 above.
Equation 7
$$F(n) = x(0) + W_2^{n}x(1)$$
The summations are reduced to unity, and all that remains is a twiddle factor and input values, x(0) and
x(1). This is the general form used for the Butterfly diagram, shown later in this tutorial.
Expansion of the Danielson-Lanczos to Four Terms

For N = 4 samples, the equation must be expanded again to four terms. Below is the expansion to four
terms. As with the first level expansion, substitute 2k for k and reduce the summation by half for the
even terms, and substitute 2k+1 for k and reduce the summation by half for the odd terms. The E and O
below refer to equation 5.


And finally:
Equation 8
Now, N = 4 samples, and using the same procedure as was used for two samples, equation 8 becomes:
Equation 9
$$F(n) = x(0) + W_2^{n}x(2) + W_4^{n}\left[x(1) + W_2^{n}x(3)\right]$$
Once again, as with N=2, the summations have been reduced to unity, and all you have remaining are
"twiddle factors" and the input values, x(0), x(1), x(2), and x(3).
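To check the N = 4 expansion numerically, the sketch below evaluates it in the form F(n) = x(0) + W_2^n x(2) + W_4^n [x(1) + W_2^n x(3)], which is what the even/odd substitution produces with W_N^kn = e^(-j2πkn/N), and compares it with a straight DFT. The helper names W, four_point and dft are hypothetical.

```python
import cmath

def W(N, e):
    """Twiddle factor W_N^e = exp(-j*2*pi*e/N)."""
    return cmath.exp(-2j * cmath.pi * e / N)

def four_point(x):
    """Fully expanded 4-point D-L form; inputs appear as x(0), x(2), x(1), x(3)."""
    return [x[0] + W(2, n) * x[2] + W(4, n) * (x[1] + W(2, n) * x[3])
            for n in range(4)]

def dft(x):
    """Straight DFT for comparison."""
    N = len(x)
    return [sum(x[k] * W(N, k * n) for k in range(N)) for n in range(N)]

x = [0, 1, 0, -1]           # one cycle of a sine wave, 4 samples
print(four_point(x))
print(dft(x))
```

Both lists should come out as approximately 0, -2j, 0 and 2j, apart from tiny rounding errors.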
The Danielson-Lanczos Lemma Expanded to 8 Terms

Equation 10
After the expansion above, you can plug in N = 8 samples and k = 0, since every summation reduces to a
single term when N = 8.
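Equation 11
$$F(n) = x(0) + W_2^{n}x(4) + W_4^{n}\left[x(2) + W_2^{n}x(6)\right] + W_8^{n}\left\{x(1) + W_2^{n}x(5) + W_4^{n}\left[x(3) + W_2^{n}x(7)\right]\right\}$$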
Danielson-Lanczos Lemma Observations

Note two things about equations 7, 9 and 11, repeated below. First, the order of the input values, x(n), is
"reverse binary". For example, left to right the order for the 4-term equation is x(0), x(2), x(1) and x(3).
The order for the 8-term equation is x(0), x(4), x(2), x(6), x(1), x(5), x(3), and x(7). This happens
naturally when the D-L Lemma is expanded, and the Butterfly Diagram makes use of it. The second
thing to note is that the "twiddle factors", W, build up with each new expansion, so that more of them are
multiplied together. The Butterfly diagram deals with this by adding "stages", which you will see
later in this tutorial.
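Equation 7
$$F(n) = x(0) + W_2^{n}x(1)$$
Equation 9
$$F(n) = x(0) + W_2^{n}x(2) + W_4^{n}\left[x(1) + W_2^{n}x(3)\right]$$
Equation 11
$$F(n) = x(0) + W_2^{n}x(4) + W_4^{n}\left[x(2) + W_2^{n}x(6)\right] + W_8^{n}\left\{x(1) + W_2^{n}x(5) + W_4^{n}\left[x(3) + W_2^{n}x(7)\right]\right\}$$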

For 4 inputs:
Count from 0 to 3 in binary 00, 01, 10, 11. Now, reverse the bits of each number and you get 00, 10, 01,
11. In decimal this is 0, 2, 1, and 3.
So, the values in the D-L equation would be x(0), x(2), x(1), and x(3). This is what you see in equation 9
above.
For 8 inputs:
Count from 0 to 7 in binary 000, 001, 010, 011, 100, 101, 110, 111. Now, reverse the bits of each number
and you get 000, 100, 010, 110, 001, 101, 011, 111. In decimal this sequence is 0, 4, 2, 6, 1, 5, 3, and 7.
So, the values in the D-L equation for 8 samples would be x(0), x(4), x(2), x(6), x(1), x(5), x(3), and x(7).
This is what you see in equation 11 above.
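The bit-reversal recipe above is easy to automate. A minimal sketch (the function names are hypothetical) that reverses the log2(N) lowest bits of each index:

```python
def bit_reverse(i, bits):
    """Reverse the order of the lowest `bits` bits of i (e.g. 001 becomes 100)."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)   # move the lowest bit of i into r
        i >>= 1
    return r

def reverse_binary_order(N):
    """Order of the inputs x(n) in the fully expanded D-L equation (N a power of 2)."""
    bits = N.bit_length() - 1    # log2(N)
    return [bit_reverse(i, bits) for i in range(N)]

print(reverse_binary_order(4))   # [0, 2, 1, 3]
print(reverse_binary_order(8))   # [0, 4, 2, 6, 1, 5, 3, 7]
```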
The same pattern holds for all expansions of the D-L Lemma, and is made use of by the Butterfly
Diagram.
Next I'll discuss the "twiddle factor" and then put it together with the Danielson-Lanczos Lemma to create
the Butterfly diagram, which is the FFT in diagram form.

The twiddle factor, W, describes a "rotating vector", which rotates in
increments according to the number of samples, N. Here are graphs where N
= 2, 4 and 8 samples.

As shown in the diagram above, the twiddle factor has redundant values as the vector rotates around.
For example, W for N = 2 is the same for n = 0, 2, 4, 6, etc., and W for N = 8 is the same for n = 3, 11, 19,
27, etc.
There is also symmetry: values that are 180 degrees out of phase are the negatives of each other.
For example, for N = 4 samples, W for n = 0, 4, 8, etc., is the negative of W for n = 2, 6, 10, etc.
The Butterfly diagram takes advantage of this redundancy and symmetry, which is part of what makes the
FFT possible.
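Both properties can be confirmed numerically. This sketch assumes W_N^n = e^(-j2πn/N), the same convention as the DFT equation in this tutorial; the helper names W and close are illustrative.

```python
import cmath

def W(N, n):
    """Twiddle factor W_N^n = exp(-j*2*pi*n/N): a unit vector turned n steps of 1/N of a circle."""
    return cmath.exp(-2j * cmath.pi * n / N)

def close(a, b, tol=1e-12):
    return abs(a - b) < tol

# Redundancy: the vector returns to the same place every N steps
redundant = all(close(W(8, 3), W(8, n)) for n in (11, 19, 27))
# Symmetry: half a turn (N/2 steps) away gives the negative value
symmetric = all(close(W(4, n), -W(4, n + 2)) for n in (0, 4, 8))
print(redundant, symmetric)             # True True

# So only N distinct values ever need to be computed and stored
table = [W(8, n) for n in range(8)]
print(close(table[27 % 8], W(8, 27)))   # True
```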

The Butterfly Diagram builds on the Danielson-Lanczos Lemma and the twiddle factor to create an
efficient algorithm. The Butterfly Diagram is the FFT algorithm represented as a diagram.
First, here is the simplest butterfly. It's the basic unit, consisting of just two inputs and two outputs.
That diagram is the fundamental building block of a butterfly. It has two input values, or N=2 samples,
x(0) and x(1), and results in two output values F(0) and F(1). The diagram comes from the D-L Lemma
for two inputs.
This can be shown by taking equation 7 above and plugging in for values n=0 and n=1, thus:
$$F(0) = x(0) + W_2^{0}x(1) = x(0) + x(1)$$
$$F(1) = x(0) + W_2^{1}x(1) = x(0) - x(1)$$
So, the Butterfly comes from the Danielson-Lanczos Lemma, but it also uses the twiddle factor to take
advantage of redundancies and symmetry in the D-L Lemma.
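In code, a two-input butterfly is just a multiply by the twiddle factor followed by a sum and a difference. This sketch (the function name is hypothetical) uses the symmetry W_2^1 = -W_2^0, so the twiddle multiply is done only once per butterfly:

```python
def butterfly(a, b, w=1):
    """Two-input butterfly: returns (a + w*b, a - w*b)."""
    t = w * b          # the single twiddle-factor multiply
    return a + t, a - t

# N = 2: the twiddle factor is W_2^0 = 1, so F(0) = x(0) + x(1) and F(1) = x(0) - x(1)
F0, F1 = butterfly(3, 5)
print(F0, F1)  # 8 -2
```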
To get a full understanding of the Butterfly, a four input Butterfly will be required. That is described next.


Note the order of input values is "reverse bit" order. The Butterfly uses the natural expansion order of the
Danielson-Lanczos Lemma, which is why the input is ordered that way. This was described earlier.

The four-input Butterfly equations are derived below.
Equation 12
The N Log N savings

Remember, for a straight DFT you needed N*N multiplies. The N Log N savings comes from the fact that
there are two multiplies per Butterfly. In the 4 input diagram above, there are 4 butterflies, so there are a
total of 4*2 = 8 multiplies, and 4 Log(4) = 8. This is how you get the computational savings in the
FFT! The log is base 2, as described earlier. See equation 1.
The 8 input butterfly diagram has 12 two-input butterflies and thus 12*2 = 24 multiplies.
N Log N = 8 Log(8) = 24. A straight DFT has N*N multiplies, or 8*8 = 64 multiplies. That's a pretty
good savings for a small sample. The savings are over 100 times for N = 1024, and they keep increasing
as the number of samples grows.
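The butterfly counts above follow a simple pattern: log2(N) stages with N/2 butterflies in each. A short sketch (the function name is hypothetical) that reproduces the numbers, using this tutorial's count of two multiplies per butterfly:

```python
def butterfly_count(N):
    """A radix-2 FFT has log2(N) stages of N/2 two-input butterflies (N a power of 2)."""
    stages = N.bit_length() - 1        # log2(N)
    return stages * (N // 2)

for N in (4, 8, 1024):
    b = butterfly_count(N)
    print(f"N={N}: {b} butterflies, {2 * b} FFT multiplies, {N * N} DFT multiplies")
```

For N = 4 and N = 8 this gives 4 and 12 butterflies (8 and 24 multiplies); for N = 1024 it gives 10240 multiplies against 1048576 for the straight DFT.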
Computing FFT Twiddle Factors

Rick Lyons, August 8, 2010

Some days ago I read a post on the comp.dsp newsgroup and, if I understood the
poster's words, it seemed that the poster would benefit from knowing how to
compute the twiddle factors of a radix-2 fast Fourier transform (FFT).
Then, later it occurred to me that it might be useful for this blog's readers to be
aware of algorithms for computing FFT twiddle factors. So,... what follows are two
algorithms showing how to compute the individual twiddle factors of an N-point
decimation-in-frequency (DIF) and an N-point decimation-in-time (DIT) FFT.
The vast majority of FFT applications use (notice how I used the word "use"
instead of the clumsy word "utilize") standard, pre-written, FFT software routines.
However, there are non-standard FFT applications (for example, specialized
harmonic analysis, transmultiplexers, or perhaps using an FFT to implement a bank
of filters) where only a subset of the full N-sample complex FFT results are
required. Those oddball FFT applications, sometimes called "pruned FFTs",
require computation of individual FFT twiddle factors, and that's the purpose of
this blog.
(If, by chance, the computation of FFT twiddle factors is of no interest to you, you
might just scroll down to the "A Little History of the FFT" part of this blog.)
Before we present the two twiddle factor computation algorithms, let's understand
the configuration of a single "butterfly" operation used in our radix-2 FFTs. We've
all seen the signal flow drawings of FFTs with their arrays of butterfly operations.
There are various ways of implementing a butterfly operation, but my favorites are
the efficient single-complex-multiply butterflies shown in Figure 1. A DIF
butterfly is shown in Figure 1(a), while a DIT butterfly is shown in Figure 1(b). In
Figure 1 the twiddle factors are shown as $e^{-j2\pi Q/N}$, where variable Q is merely an
integer in the range of 0 ≤ Q ≤ (N/2)–1.
To simplify this blog's follow-on figures, we'll use Figures 1(c) and 1(d) to
represent the DIF and DIT butterflies. As such, Figure 1(c) is equivalent to Figure
1(a), and Figure 1(d) is equivalent to Figure 1(b).
Figure 1: Single-complex-multiply DIF and DIT butterflies.
Computing DIF Twiddle Factors
Take a look at Figure 2 showing the butterfly operations for an 8-point radix-2 DIF
FFT.
Figure 2: 8-point DIF FFT signal flow diagram.
For the radix-2 DIF FFT using the Figures 1(c) and 1(d) butterflies,
• The N-point DIF FFT has log2(N) stages, numbered P = 1, 2, ..., log2(N).
• Each stage comprises N/2 butterflies.
• Not counting the –1 twiddle factors, the Pth stage has $N/2^P$ unique twiddle factors, numbered
k = 0, 1, 2, ..., $N/2^P - 1$, as indicated by the bold numbers above the upward-pointing arrows at the
bottom of Figure 2.
Given those characteristics, the kth unique twiddle factor Q for the Pth stage is computed using:
$$Q = \frac{k \cdot 2^P}{2} \qquad (1)$$
where $0 \le k \le N/2^P - 1$. For example, for the second stage (P = 2) of an N = 8-point DIF FFT, the
unique Q factors are:
k = 0, $Q = 0 \cdot 2^2/2 = 0 \cdot 4/2 = 0$
k = 1, $Q = 1 \cdot 2^2/2 = 1 \cdot 4/2 = 2$.
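The DIF rule Q = k·2^P/2 is easy to tabulate for every stage. A minimal Python sketch (the function name is hypothetical) that lists the unique Q values for stage P, along with the corresponding twiddle angles 2πQ/N:

```python
import math

def dif_q_factors(N, P):
    """Unique Q values of stage P of an N-point radix-2 DIF FFT: Q = k*2^P/2, k = 0..N/2^P - 1."""
    return [k * 2**P // 2 for k in range(N // 2**P)]

N = 8
for P in range(1, N.bit_length()):      # P = 1 .. log2(N)
    qs = dif_q_factors(N, P)
    angles = [round(2 * math.pi * q / N, 4) for q in qs]
    print(f"P={P}: Q={qs}, angles (rad)={angles}")
```

For N = 8 this lists Q = [0, 1, 2, 3] for stage 1, [0, 2] for stage 2 (the example above) and [0] for stage 3.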
Computing DIT Twiddle Factors
Here's an algorithm for computing the individual twiddle factor angles of a radix-2
DIT FFT. Consider Figure 3 showing the butterfly signal flow of an 8-point DIT
FFT.
Figure 3: 8-point DIT FFT signal flow diagram.
For the DIT FFT using the Figures 1(c) and 1(d) butterflies,
• The N-point DIT FFT has log2(N) stages, numbered P = 1, 2, ..., log2(N).
• Each stage comprises N/2 butterflies.
• Not counting the –1 twiddle factors, the Pth stage has N/2 twiddle factors,
numbered k = 0, 1, 2, ..., N/2–1 as indicated by the upward arrows at the
bottom of Figure 3.
Given those characteristics, the kth DIT twiddle Q factor for the Pth stage is
computed using:
$$Q = \left[\left\lfloor \frac{k \cdot 2^P}{N} \right\rfloor\right]_{\text{bit-rev}} \qquad (2)$$
where 0 ≤ k ≤ N/2–1. The ⌊q⌋ operation means the integer part of q. The $[z]_{\text{bit-rev}}$
function represents the three-step operation of:
[1] convert decimal integer z to a binary number represented by log2(N)–1 binary bits,
[2] perform bit reversal on the binary number as discussed in Section 4.5, and
[3] convert the bit reversed number back to a decimal integer.
As an example of using Eq.(2), for the second stage (P = 2) of an N = 8-point DIT
FFT, the k = 3 twiddle Q factor is:
$$Q = \left[\left\lfloor 3 \cdot 2^2/8 \right\rfloor\right]_{\text{bit-rev}} = \left[\lfloor 1.5 \rfloor\right]_{\text{bit-rev}} = [1]_{\text{bit-rev}} = 2$$
The above $[1]_{\text{bit-rev}}$ operation is: take the decimal number 1 and represent it with
log2(N)–1 = 2 bits, i.e., as $01_2$. Next, reverse those bits to a binary $10_2$ and convert
that binary number to our desired decimal result of 2.
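The DIT rule can be coded the same way. In this sketch (the names bit_rev and dit_q_factor are hypothetical) the floor ⌊k·2^P/N⌋ is done with integer division, and the bit-reversal step uses log2(N)–1 bits as described above:

```python
def bit_rev(z, bits):
    """Reverse the lowest `bits` bits of the integer z."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (z & 1)
        z >>= 1
    return r

def dit_q_factor(N, P, k):
    """Q for the kth twiddle factor of stage P of an N-point radix-2 DIT FFT."""
    bits = N.bit_length() - 2           # log2(N) - 1 bits for a power-of-2 N
    return bit_rev((k * 2**P) // N, bits)

print(dit_q_factor(8, 2, 3))                      # 2, as in the worked example
print([dit_q_factor(8, 3, k) for k in range(4)])  # [0, 2, 1, 3]
```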
A Little History of the FFT
The radix-2 FFT has a very interesting history. For example, one of the driving
forces behind the development of the FFT was the United States' desire to detect
nuclear explosions inside the Soviet Union in the early 1960s. Also, if it hadn't
been for the influence of a patent attorney, the Cooley-Tukey radix-2 FFT
algorithm might well have been known as the Sande-Tukey algorithm, named after
Gordon Sande and John Tukey. (That's the same Gordon Sande that occasionally
posts on the comp.dsp newsgroup.) For those and other interesting FFT historical
facts, see the following web sites.
Fast Fourier Transform (FFT)
In this section we present several methods for computing the DFT efficiently. In
view of the importance of the DFT in various digital signal processing
applications, such as linear filtering, correlation analysis, and spectrum analysis, its
efficient computation is a topic that has received considerable attention by many
mathematicians, engineers, and applied scientists.
From this point on, we change notation: X(k), rather than y(k) as in previous
sections, denotes the Fourier coefficients of x(n).
Basically, the computational problem for the DFT is to compute the sequence
{X(k)} of N complex-valued numbers given another sequence of data {x(n)} of
length N, according to the formula
$$X(k) = \sum_{n=0}^{N-1} x(n)\,W_N^{kn}, \qquad 0 \le k \le N-1, \qquad W_N = e^{-j2\pi/N}$$
In general, the data sequence x(n) is also assumed to be complex valued. Similarly,
the IDFT becomes
$$x(n) = \frac{1}{N}\sum_{k=0}^{N-1} X(k)\,W_N^{-kn}, \qquad 0 \le n \le N-1$$
Since DFT and IDFT involve basically the same type of computations, our
discussion of efficient computational algorithms for the DFT applies as well to the
efficient computation of the IDFT.
We observe that for each value of k, direct computation of X(k) involves N
complex multiplications (4N real multiplications) and N-1 complex additions (4N-2
real additions). Consequently, to compute all N values of the DFT requires $N^2$
complex multiplications and $N^2-N$ complex additions.
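A direct implementation makes the N^2 cost visible. This sketch (the function name is hypothetical) evaluates the formula for X(k) term by term and counts the complex multiplications by x(n):

```python
import cmath

def direct_dft(x):
    """Direct DFT X(k) = sum_n x(n) W_N^{kn}; also returns the complex-multiply count."""
    N = len(x)
    mults = 0
    X = []
    for k in range(N):
        acc = 0j
        for n in range(N):
            acc += x[n] * cmath.exp(-2j * cmath.pi * k * n / N)  # one complex multiply
            mults += 1
        X.append(acc)
    return X, mults

X, mults = direct_dft([1, 0, 0, 0, 0, 0, 0, 0])
print(mults)  # 64, i.e. N^2 for N = 8
```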
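Direct computation of the DFT is inefficient primarily because it does not
exploit the symmetry and periodicity properties of the phase factor $W_N$. In
particular, these two properties are:
$$\text{Symmetry property: } W_N^{k+N/2} = -W_N^{k}$$
$$\text{Periodicity property: } W_N^{k+N} = W_N^{k}$$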
The computationally efficient algorithms described in this section, known
collectively as fast Fourier transform (FFT) algorithms, exploit these two basic
properties of the phase factor.
Radix-2 FFT Algorithms
Let us consider the computation of the $N = 2^v$ point DFT by the divide-and-conquer
approach. We split the N-point data sequence into two N/2-point data sequences
f1(n) and f2(n), corresponding to the even-numbered and odd-numbered samples of
x(n), respectively, that is,
$$f_1(n) = x(2n), \qquad f_2(n) = x(2n+1), \qquad n = 0, 1, \ldots, \tfrac{N}{2}-1$$
Thus f1(n) and f2(n) are obtained by decimating x(n) by a factor of 2, and hence the
resulting FFT algorithm is called a decimation-in-time algorithm.
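Now the N-point DFT can be expressed in terms of the DFTs of the decimated
sequences as follows:
$$X(k) = \sum_{m=0}^{N/2-1} f_1(m)\,W_N^{2mk} + W_N^{k}\sum_{m=0}^{N/2-1} f_2(m)\,W_N^{2mk}$$
But $W_N^2 = W_{N/2}$. With this substitution, the equation can be expressed as
$$X(k) = \sum_{m=0}^{N/2-1} f_1(m)\,W_{N/2}^{km} + W_N^{k}\sum_{m=0}^{N/2-1} f_2(m)\,W_{N/2}^{km} = F_1(k) + W_N^{k}F_2(k)$$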
where F1(k) and F2(k) are the N/2-point DFTs of the sequences f1(m) and f2(m),
respectively.
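Since F1(k) and F2(k) are periodic, with period N/2, we have F1(k+N/2) = F1(k) and
F2(k+N/2) = F2(k). In addition, the factor $W_N^{k+N/2} = -W_N^{k}$. Hence the equation may
be expressed as
$$X(k) = F_1(k) + W_N^{k}F_2(k), \qquad X\!\left(k+\tfrac{N}{2}\right) = F_1(k) - W_N^{k}F_2(k), \qquad k = 0, 1, \ldots, \tfrac{N}{2}-1$$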
We observe that the direct computation of F1(k) requires $(N/2)^2$ complex
multiplications. The same applies to the computation of F2(k). Furthermore, there
are N/2 additional complex multiplications required to compute $W_N^k F_2(k)$. Hence
the computation of X(k) requires $2(N/2)^2 + N/2 = N^2/2 + N/2$ complex
multiplications. This first step results in a reduction of the number of
multiplications from $N^2$ to $N^2/2 + N/2$, which is about a factor of 2 for N large.
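The effect of this first split can be tabulated directly. A short sketch with hypothetical function names, using the counts from the paragraph above:

```python
def direct_mults(N):
    """Complex multiplications for the direct N-point DFT."""
    return N * N

def one_split_mults(N):
    """Two direct N/2-point DFTs plus N/2 twiddle multiplies."""
    return 2 * (N // 2) ** 2 + N // 2

for N in (8, 64, 1024):
    ratio = direct_mults(N) / one_split_mults(N)
    print(N, direct_mults(N), one_split_mults(N), round(ratio, 3))
```

The ratio climbs toward 2 as N grows (about 1.778 for N = 8 and 1.998 for N = 1024).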
Figure TC.3.1 First step in the decimation-in-time algorithm.
By computing N/4-point DFTs, we would obtain the N/2-point DFTs F1(k) and
F2(k) from the relations
The decimation of the data sequence can be repeated again and again until the
resulting sequences are reduced to one-point sequences. For $N = 2^v$, this
decimation can be performed $v = \log_2 N$ times. Thus the total number of complex
multiplications is reduced to $(N/2)\log_2 N$. The number of complex additions is
$N\log_2 N$.
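Carried all the way down, the decimation gives the familiar iterative radix-2 FFT: reorder the input into bit-reversed order, then run log2 N stages of N/2 butterflies each. The sketch below is a common textbook-style formulation written for illustration, not code from the original text; the function name is hypothetical and N is assumed to be a power of 2.

```python
import cmath

def fft_dit_iterative(x):
    """Radix-2 decimation-in-time FFT: bit-reversed reordering, then log2(N) butterfly stages."""
    a = [complex(v) for v in x]          # work on a copy, updated in place
    N = len(a)
    if N == 0 or N & (N - 1):
        raise ValueError("N must be a power of 2")

    # Step 1: bit-reversed input order, e.g. x(0), x(4), x(2), x(6), ... for N = 8
    j = 0
    for i in range(1, N):
        bit = N >> 1
        while j & bit:                   # add 1 to j, counting from the most significant bit
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

    # Step 2: stages of 2-point, 4-point, ..., N-point DFTs built from butterflies
    size = 2
    while size <= N:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)    # W_size, the twiddle increment
        for start in range(0, N, size):
            w = 1 + 0j
            for k in range(half):
                t = w * a[start + k + half]          # twiddle multiply of this butterfly
                a[start + k + half] = a[start + k] - t
                a[start + k] = a[start + k] + t
                w *= w_step
        size *= 2
    return a

print(fft_dit_iterative([0, 1, 0, -1]))
```

Each stage performs N/2 twiddle multiplies, which gives the (N/2)log2 N total quoted above; the running update of w adds a few extra multiplies that a precomputed table of twiddle factors would avoid.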
For illustrative purposes, Figure TC.3.2 depicts the computation of an N = 8 point
DFT. We observe that the computation is performed in three stages, beginning with
the computations of four two-point DFTs, then two four-point DFTs, and finally,
one eight-point DFT. The combination of the smaller DFTs to form the larger
DFT is illustrated in Figure TC.3.3 for N = 8.
Figure TC.3.2 Three stages in the computation of an N = 8-point DFT.

Figure TC.3.3 Eight-point decimation-in-time FFT algorithm.
Figure TC.3.4 Basic butterfly computation in the decimation-in-time FFT
algorithm.
An important observation concerns the order of the input data sequence
after it is decimated (v-1) times. For example, if we consider the case where N = 8,
we know that the first decimation yields the sequence x(0), x(2), x(4), x(6), x(1),
x(3), x(5), x(7), and the second decimation results in the sequence x(0), x(4), x(2),
x(6), x(1), x(5), x(3), x(7). This shuffling of the input data sequence has a
well-defined order, as can be ascertained from observing Figure TC.3.5, which illustrates
the decimation of the eight-point sequence.
Figure TC.3.5 Shuffling of the data and bit reversal.
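Another important radix-2 FFT algorithm, called the decimation-in-frequency
algorithm, is obtained by using the divide-and-conquer approach. To derive the
algorithm, we begin by splitting the DFT formula into two summations, one of
which involves the sum over the first N/2 data points and the second sum involves
the last N/2 data points. Thus we obtain
$$X(k) = \sum_{n=0}^{N/2-1} x(n)\,W_N^{kn} + W_N^{Nk/2}\sum_{n=0}^{N/2-1} x\!\left(n+\tfrac{N}{2}\right)W_N^{kn} = \sum_{n=0}^{N/2-1}\left[x(n) + (-1)^k\,x\!\left(n+\tfrac{N}{2}\right)\right]W_N^{kn}$$
Now, let us split (decimate) X(k) into the even- and odd-numbered samples. Thus
we obtain
$$X(2k) = \sum_{n=0}^{N/2-1}\left[x(n) + x\!\left(n+\tfrac{N}{2}\right)\right]W_{N/2}^{kn}, \qquad k = 0, 1, \ldots, \tfrac{N}{2}-1$$
$$X(2k+1) = \sum_{n=0}^{N/2-1}\left\{\left[x(n) - x\!\left(n+\tfrac{N}{2}\right)\right]W_N^{n}\right\}W_{N/2}^{kn}, \qquad k = 0, 1, \ldots, \tfrac{N}{2}-1$$
where we have used the fact that $W_N^2 = W_{N/2}$.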
The computational procedure above can be repeated through decimation of the
N/2-point DFTs X(2k) and X(2k+1). The entire process involves $v = \log_2 N$ stages of
decimation, where each stage involves N/2 butterflies of the type shown in Figure
TC.3.7. Consequently, the computation of the N-point DFT via the decimation-in-frequency
FFT requires $(N/2)\log_2 N$ complex multiplications and $N\log_2 N$ complex
additions, just as in the decimation-in-time algorithm. For illustrative purposes, the
eight-point decimation-in-frequency algorithm is given in Figure TC.3.8.
Figure TC.3.6 First stage of the decimation-in-frequency FFT algorithm.
Figure TC.3.7 Basic butterfly computation in the decimation-in-frequency.
Figure TC.3.8 N = 8-point decimation-in-frequency FFT algorithm.
We observe from Figure TC.3.8 that the input data x(n) occurs in natural order, but
the output DFT occurs in bit-reversed order. We also note that the computations
are performed in place. However, it is possible to reconfigure the decimation-in-
frequency algorithm so that the input sequence occurs in bit-reversed order while
the output DFT occurs in normal order. Furthermore, if we abandon the
requirement that the computations be done in place, it is also possible to have both
the input data and the output DFT in normal order.
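The X(2k) and X(2k+1) equations above lead directly to a recursive decimation-in-frequency sketch. This illustration (the function name is hypothetical; N is assumed to be a power of 2) writes X(2k) and X(2k+1) back into their proper positions, so unlike the in-place flow graph of Figure TC.3.8 its output comes out in natural order:

```python
import cmath

def fft_dif(x):
    """Recursive radix-2 decimation-in-frequency FFT (len(x) a power of 2)."""
    N = len(x)
    if N == 0 or N & (N - 1):
        raise ValueError("N must be a power of 2")
    if N == 1:
        return [complex(x[0])]
    half = N // 2
    # DIF butterflies: sums feed X(2k), twiddled differences feed X(2k+1)
    g = [x[n] + x[n + half] for n in range(half)]
    h = [(x[n] - x[n + half]) * cmath.exp(-2j * cmath.pi * n / N) for n in range(half)]
    X = [0j] * N
    X[0::2] = fft_dif(g)    # even-numbered outputs from an N/2-point DFT
    X[1::2] = fft_dif(h)    # odd-numbered outputs from an N/2-point DFT
    return X

print(fft_dif([0, 1, 0, -1]))
```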
Radix-4 FFT Algorithm
When the number of data points N in the DFT is a power of 4 (i.e., $N = 4^v$), we can,
of course, always use a radix-2 algorithm for the computation. However, for this
case, it is computationally more efficient to employ a radix-4 FFT algorithm.
Let us begin by describing a radix-4 decimation-in-time FFT algorithm briefly. We
split or decimate the N-point input sequence into four subsequences, x(4n),
x(4n+1), x(4n+2), x(4n+3), n = 0, 1, ..., N/4-1.
Thus the four N/4-point DFTs F(l, q) obtained from the above equation are
combined to yield the N-point DFT. The expression for combining the N/4-point
DFTs defines a radix-4 decimation-in-time butterfly, which can be expressed in
matrix form as
The radix-4 butterfly is depicted in Figure TC.3.9a and in a more compact form in
Figure TC.3.9b. Note that each butterfly involves three complex multiplications,
since $W_N^0 = 1$, and 12 complex additions.
Figure TC.3.9 Basic butterfly computation in a radix-4 FFT algorithm.
By performing the additions in two steps, it is possible to reduce the number of
additions per butterfly from 12 to 8. This can be accomplished by expressing the
matrix of the linear transformation mentioned previously as a product of two
matrices as follows:
Figure TC.3.10 Sixteen-point radix-4 decimation-in-time algorithm with input in
normal order and output in digit-reversed order
A 16-point, radix-4 decimation-in-frequency FFT algorithm is shown in Figure
TC.3.11. Its input is in normal order and its output is in digit-reversed order. It has
exactly the same computational complexity as the decimation-in-time radix-4 FFT
algorithm.
Figure TC.3.11 Sixteen-point, radix-4 decimation-in-frequency algorithm with
input in normal order and output in digit-reversed order.
For illustrative purposes, let us re-derive the radix-4 decimation-in-frequency
algorithm by breaking the N-point DFT formula into four smaller DFTs. We have
From the definition of the twiddle factors, we have
The relation is not an N/4-point DFT because the twiddle factor depends on N and
not on N/4. To convert it into an N/4-point DFT, we subdivide the DFT sequence
into four N/4-point subsequences, X(4k), X(4k+1), X(4k+2), and X(4k+3), k = 0, 1,
..., N/4-1. Thus we obtain the radix-4 decimation-in-frequency DFT as
$$X(4k+p) = \sum_{n=0}^{N/4-1}\left[\sum_{l=0}^{3} x\!\left(n+\tfrac{lN}{4}\right)W_4^{lp}\right]W_N^{np}\,W_{N/4}^{kn}, \qquad p = 0, 1, 2, 3$$
where we have used the property $W_N^{4kn} = W_{N/4}^{kn}$. Note that the input to each N/4-point
DFT is a linear combination of four signal samples scaled by a twiddle
factor. This procedure is repeated v times, where $v = \log_4 N$.
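The radix-4 decimation-in-frequency split can be sketched recursively in the same spirit. This illustration (the function name is hypothetical; N is assumed to be a power of 4) forms, for each p = 0..3, the four-sample combinations x(n + lN/4) weighted by W_4^(lp), which are only ±1 and ±j, multiplies by the twiddle factor W_N^(np), and computes an N/4-point DFT to obtain X(4k+p):

```python
import cmath

W4 = [1, -1j, -1, 1j]        # W_4^m for m = 0..3: multiplication-free

def fft_radix4_dif(x):
    """Recursive radix-4 decimation-in-frequency FFT; len(x) must be a power of 4."""
    N = len(x)
    if N == 0 or N & (N - 1) or (N.bit_length() - 1) % 2:
        raise ValueError("N must be a power of 4")
    if N == 1:
        return [complex(x[0])]
    q = N // 4
    X = [0j] * N
    for p in range(4):
        y = []
        for n in range(q):
            s = sum(x[n + l * q] * W4[(l * p) % 4] for l in range(4))   # 4-point step
            y.append(s * cmath.exp(-2j * cmath.pi * n * p / N))          # twiddle W_N^(np)
        X[p::4] = fft_radix4_dif(y)      # N/4-point DFT gives X(4k + p)
    return X

print(fft_radix4_dif([0, 1, 0, -1]))
```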
Split-Radix FFT Algorithms
An inspection of the radix-2 decimation-in-frequency flowgraph shown in Figure
TC.3.8 indicates that the even-numbered points of the DFT can be computed
independently of the odd-numbered points. This suggests the possibility of using
different computational methods for independent parts of the algorithm, with the
objective of reducing the number of computations. The split-radix FFT (SRFFT)
algorithms exploit this idea by using both a radix-2 and a radix-4 decomposition in
the same FFT algorithm.
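First, we recall that in the radix-2 decimation-in-frequency FFT algorithm, the
even-numbered samples of the N-point DFT are given as
$$X(2k) = \sum_{n=0}^{N/2-1}\left[x(n) + x\!\left(n+\tfrac{N}{2}\right)\right]W_{N/2}^{kn}, \qquad k = 0, 1, \ldots, \tfrac{N}{2}-1$$
A radix-2 algorithm suffices for this computation.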
The odd-numbered samples {X(2k+1)} of the DFT require the pre-multiplication of
the input sequence with the twiddle factors $W_N^n$. For these samples a radix-4
decomposition produces some computational efficiency because the four-point
DFT has the largest multiplication-free butterfly. Indeed, it can be shown that
using a radix greater than 4 does not result in a significant reduction in
computational complexity.
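If we use a radix-4 decimation-in-frequency FFT algorithm for the odd-numbered
samples of the N-point DFT, we obtain the following N/4-point DFTs:
$$X(4k+1) = \sum_{n=0}^{N/4-1}\left\{\left[x(n) - x\!\left(n+\tfrac{N}{2}\right)\right] - j\left[x\!\left(n+\tfrac{N}{4}\right) - x\!\left(n+\tfrac{3N}{4}\right)\right]\right\}W_N^{n}\,W_{N/4}^{kn}$$
$$X(4k+3) = \sum_{n=0}^{N/4-1}\left\{\left[x(n) - x\!\left(n+\tfrac{N}{2}\right)\right] + j\left[x\!\left(n+\tfrac{N}{4}\right) - x\!\left(n+\tfrac{3N}{4}\right)\right]\right\}W_N^{3n}\,W_{N/4}^{kn}$$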
Figure TC.3.12 shows the flow graph for an in-place 32-point decimation-in-frequency
SRFFT algorithm.
Figure TC.3.12 Length-32 split-radix FFT algorithms from paper by Duhamel
(1986); reprinted with permission from the IEEE
Figure TC.3.13 Butterfly for SRFFT algorithm.
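The split-radix idea can also be sketched recursively: a radix-2 step produces the even-numbered outputs X(2k) from one N/2-point DFT, and a radix-4 step produces the odd-numbered outputs X(4k+1) and X(4k+3) from two N/4-point DFTs, following the equations above. This is an illustrative decimation-in-frequency sketch with a hypothetical function name, assuming N is a power of 2; it is not the in-place flow graph of Figure TC.3.12.

```python
import cmath

def fft_split_radix(x):
    """Recursive split-radix DIF FFT: radix-2 for X(2k), radix-4 for X(4k+1), X(4k+3)."""
    N = len(x)
    if N == 0 or N & (N - 1):
        raise ValueError("N must be a power of 2")
    if N == 1:
        return [complex(x[0])]
    if N == 2:
        return [complex(x[0] + x[1]), complex(x[0] - x[1])]
    h, q = N // 2, N // 4
    X = [0j] * N
    # Even-numbered outputs: one N/2-point DFT of x(n) + x(n + N/2)
    X[0::2] = fft_split_radix([x[n] + x[n + h] for n in range(h)])
    # Odd-numbered outputs: two N/4-point DFTs
    a = [x[n] - x[n + h] for n in range(q)]            # x(n) - x(n + N/2)
    b = [x[n + q] - x[n + 3 * q] for n in range(q)]    # x(n + N/4) - x(n + 3N/4)
    y1 = [(a[n] - 1j * b[n]) * cmath.exp(-2j * cmath.pi * n / N) for n in range(q)]
    y3 = [(a[n] + 1j * b[n]) * cmath.exp(-2j * cmath.pi * 3 * n / N) for n in range(q)]
    X[1::4] = fft_split_radix(y1)                      # X(4k + 1)
    X[3::4] = fft_split_radix(y3)                      # X(4k + 3)
    return X

print(fft_split_radix([0, 1, 0, -1]))
```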
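| N | Real mults, radix-2 | Real mults, radix-4 | Real mults, radix-8 | Real mults, split radix | Real adds, radix-2 | Real adds, radix-4 | Real adds, radix-8 | Real adds, split radix |
|---|---|---|---|---|---|---|---|---|
| 16 | 24 | 20 | - | 20 | 152 | 148 | - | 148 |
| 32 | 88 | - | - | 68 | 408 | - | - | 388 |
| 64 | 264 | 208 | 204 | 196 | 1032 | 976 | 972 | 964 |
| 128 | 712 | - | - | 516 | 2504 | - | - | 2308 |
| 256 | 1800 | 1392 | - | 1284 | 5896 | 5488 | - | 5380 |
| 512 | 4360 | - | 3204 | 3076 | 13576 | - | 12420 | 12292 |
| 1024 | 10248 | 7856 | - | 7172 | 30728 | 28336 | - | 27652 |

A dash means the algorithm does not apply to that N (N is not a power of the radix).
Table TC.3.1 Number of Nontrivial Real Multiplications and Additions to Compute an N-point Complex DFT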
You can keep expanding the butterfly by the same procedure.
The Basic Idea

A Fourier Transform converts a wave in the time domain to the frequency domain.
Note, for a full discussion of the Fourier Series and Fourier Transform that are the
foundation of the DFT and FFT, see the Superposition Principle, Fourier Series,
Fourier Transform Tutorial.
Every wave has one or more frequencies and amplitudes in it. An example is a
sound wave. If someone speaks, whistles, plays an instrument, etc., to generate a
sound wave, then any sample of that sound wave has a set of frequencies with
amplitudes that describe that wave.
According to the mathematician Joseph Fourier, you can take a set of sine waves
of different amplitudes and frequencies and sum them together to equal any wave
form. These component sine waves each have a frequency and amplitude. A plot of
frequency versus magnitude (amplitude) on an x-y graph of these sine wave
components is a frequency spectrum, or frequency domain, plot. See Diagram 1,
below.
An inverse Fourier Transform converts the frequency domain components back into the
original time wave.
You can reassemble the time wave from the frequency components using the
Inverse Fourier Transform. The inverse Fourier Transform won't be discussed here,
but once you have learned the Fourier Transform, the inverse is very easy to learn,
because the math is almost identical. Using the Fourier and Inverse Fourier Transforms
together, not only can you reassemble the original wave, you can also change the time
wave by altering its frequency components. You can add them, remove them, or tweak
their values. This is a powerful method by which to change the character of the time wave.

A DFT is a "Discrete Fourier Transform". An FFT is a "Fast Fourier Transform". The IDFT below
is "Inverse DFT" and IFFT is "Inverse FFT". A DFT is a Fourier Transform that takes a discrete
number of samples of a time wave and converts them into a frequency spectrum. However,
calculating a DFT is sometimes too slow, because of the number of multiplies required. An FFT is an
algorithm that speeds up the calculation of a DFT. In essence, an FFT is a DFT built for
speed. The entire purpose of an FFT is to speed up the calculations.
Diagram 1
Equation 1
$$F(n) = \sum_{k=0}^{N-1} x(k)\, e^{-j2\pi kn/N}, \qquad n = 0, 1, \ldots, N-1$$
Where F(n) is the amplitude at the frequency, n, and N is the number of discrete
samples taken.
Outline For Learning About The DFT and FFT

Here is an outline of the steps used to explain both the DFT and FFT.
1. First the DFT will be explained. This is the vital first step, since an FFT is a DFT and there are,
therefore, basic concepts in common with both. Learning this first will make understanding the
FFT easier.
2. Once you understand the basic concepts of a DFT, the FFT will be explained. This is broken
into several steps.
3. The "Danielson-Lanczos Lemma" will be explained, which is the first step to understanding the
FFT.
4. The "twiddle factor" will be explained, which is another key to understanding the FFT.
5. The "Butterfly Diagram" will be explained. The Butterfly is an FFT in diagram form. It's the
final step of this tutorial and builds on the prior concepts.
6. Several examples will be given along with the basic concepts above.

The goal of this tutorial is to show how to take a discretely sampled wave, usually from nature, and
convert it to the frequency domain by using an FFT. An FFT is a DFT of a particular form. An FFT is
simply a fast way of calculating a DFT. Before explaining the FFT I explain the DFT. I do this for three
reasons.
1. A DFT is much simpler to understand mathematically, which is better for learning.
2. Once you understand what the terms of a DFT mean, they apply to the FFT, so you are learning
the FFT too.
3. You will gain an appreciation for what the FFT accomplishes, because it is derived from the
DFT and the only purpose of the complex math of an FFT is to speed up the DFT calculation.
It's purely about speedy number crunching and changes none of the fundamentals.
Why Do This?
The reason to learn about the DFT and FFT is to get the frequency spectrum of a wave, that is, to
understand better what frequencies it is composed of. This might allow you to identify a sound wave that
you have sampled more easily than you could from the time wave alone, which is useful for speech
recognition. Or, maybe you want to add or subtract frequencies and recreate the original wave
with these modifications using an inverse Fourier Transform. Doing the same thing with an image, you
could, for example, remove dirty spots or noise, or find recurring patterns. You may have other ideas as
to what you can do with the frequency components of a wave. The sky is the limit!
The DFT
The Discrete Fourier Transform converts discrete data from a time wave into
a frequency spectrum. Using the DFT implies that the finite segment that is
analyzed is one period of an infinitely extended periodic signal.
The DFT equation:
Equation 1
$$F(n) = \sum_{k=0}^{N-1} x(k)\, e^{-j2\pi kn/N}, \qquad n = 0, 1, \ldots, N-1$$
1. The "sampling rate", sr. The sampling rate is the number of samples taken per unit of time (for
example, per second). For simplicity we will make the time interval between samples equal. This is the
"sample interval", si.
2. The fundamental period, T, is the period of all the samples taken. This is also called the
"window".
3. The "fundamental frequency" is f0, which is 1/T. f0 is the first harmonic, the second harmonic
is 2*f0, the third is 3*f0, etc.
5. The "Nyquist Frequency", fc, is half the sampling rate. The Nyquist frequency is the maximum
frequency that can be detected for a given sampling rate. This is because in order to measure a
wave you need at least two sample points per cycle to identify it (trough and peak).
7. The sampled part of the time wave, x(t), should be "typical" of how the wave behaves over all
time that it exists.
8. $W_N = e^{-j2\pi/N}$. This notation makes handling the exponential easier. This is sometimes called
the "twiddle factor."
For simplicity, we will sample a sine wave with a small number of points, N, and perform a DFT on it,
then we will employ each of the concepts above. Note, the sine wave is a time wave, and could be any
wave in nature, for example a sound wave. The horizontal axis is time. The vertical axis is amplitude.
Diagram 1
Notice how in the diagram above we are sampling four points. The fundamental period, T, of the wave
sampled is set to 2*pi. This applies to any wave we want to sample. The interval between samples is
2*pi/N, so in this case it is 2*pi/4. Thus, the interval between samples is pi/2 in this case.
The time wave is thus, x(k) = sin(pi/2*k) for k = 0 to N -1. The last point sampled is always the point
just before 2*pi, because the wave is considered to be a repeating pattern and wraps around back to the
value at k = 0, so you aren't missing any information.
We also need to know the time taken to sample the wave, so that we can tie it to a frequency. In our
example, the time taken for the fundamental period, T, is 0.1 seconds (this value is measured when the
wave is captured). That means the sine wave is a 10 Hz wave. Hertz = cycles per second. Also,
the sampling interval, si, is the fundamental period time divided by the number of samples. So, si =
T/N = 0.1/4 seconds, or 0.025 seconds. The sampling rate, or frequency, sr, = 1/si = 40 Hz, or 40
samples per second.
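As a quick check of the arithmetic above, here is a small C++ sketch that derives si, sr, f0 and the Nyquist frequency fc from T and N. The struct and function names are our own, chosen only to mirror the symbols used in the text.

```cpp
#include <cassert>
#include <cmath>

// Sampling parameters for N samples taken over a fundamental period T (seconds).
struct SamplingParams {
    double si;  // sample interval, seconds (T / N)
    double sr;  // sampling rate, samples per second (1 / si)
    double f0;  // fundamental frequency, Hz (1 / T)
    double fc;  // Nyquist frequency, Hz (sr / 2)
};

SamplingParams sampling_params(double T, int N) {
    SamplingParams p;
    p.si = T / N;
    p.sr = 1.0 / p.si;
    p.f0 = 1.0 / T;
    p.fc = p.sr / 2.0;
    return p;
}
```

For T = 0.1 s and N = 4 this gives si = 0.025 s, sr = 40 Hz, f0 = 10 Hz and fc = 20 Hz, the values used in the rest of the example.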
And, before we plug into the DFT, some more on Wn, the twiddle factor, referenced above:
Equation 2 (N = 4): $W_4 = e^{-j2\pi/4}$, so $W_4^0 = 1$, $W_4^1 = -j$, $W_4^2 = -1$, $W_4^3 = j$
Notice that any additional integer values of kn will cycle back around. For example, kn = 4 cycles back to
kn=0, so the value is 1. kn = 5 cycles back around to kn = 1, so the value is -j. The equation "kn modulus
4" determines which value of W is selected. Note also that for larger N the cycle is longer, so for
N=8 the equation would be "kn modulus 8". This is probably why W is called the "twiddle factor".
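The "kn modulus N" rule translates directly into code. In this sketch (the function name twiddle is our own) the exponent is reduced modulo N before evaluating W_N^(kn) = e^{-j2π(kn mod N)/N}, so for N = 4, kn = 5 gives the same value as kn = 1, namely -j.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Twiddle factor W_N^(kn), with W_N = e^{-j*2*pi/N}. Since W_N^N = 1, only
// (kn mod N) matters, so the exponent is reduced before evaluating.
std::complex<double> twiddle(long long kn, long long N) {
    const double pi = std::acos(-1.0);
    const long long m = ((kn % N) + N) % N;  // also handles negative kn
    return std::polar(1.0, -2.0 * pi * static_cast<double>(m) / static_cast<double>(N));
}
```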
Evaluating the output data: each F(n) value refers to a particular frequency. The frequency of the point is
determined by the fundamental frequency multiplied by n, i.e. f = f0*n, where f0 = 1/T = 10 Hz. The output
values are complex numbers that carry both the amplitude and the phase of each frequency, written as a real
part and an imaginary part: real + j*imaginary. The fundamental frequency, or first harmonic, is 10 Hz, as calculated above.
The magnitude at a frequency is calculated as sqrt(real*real + imaginary*imaginary).
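The worked numbers are not reproduced in the text, so here is a minimal sketch of a straight DFT applied to the four samples x(k) = sin(pi/2*k). The function names dft and example_samples are illustrative, not from the original tutorial. Note that std::abs on a std::complex computes exactly sqrt(real*real + imaginary*imaginary).

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Straight DFT: F(n) = sum_k x(k) * W_N^(kn), with W_N = e^{-j*2*pi/N}.
std::vector<cd> dft(const std::vector<cd>& x) {
    const double pi = std::acos(-1.0);
    const std::size_t N = x.size();
    std::vector<cd> F(N);
    for (std::size_t n = 0; n < N; ++n) {
        cd sum = 0.0;
        for (std::size_t k = 0; k < N; ++k) {
            // W_N^(kn) only depends on kn mod N, which keeps the angle small.
            sum += x[k] * std::polar(1.0, -2.0 * pi * double((k * n) % N) / double(N));
        }
        F[n] = sum;
    }
    return F;
}

// The four samples of x(k) = sin(pi/2 * k), k = 0..3, from the example.
std::vector<cd> example_samples() {
    const double pi = std::acos(-1.0);
    std::vector<cd> x(4);
    for (int k = 0; k < 4; ++k) x[k] = std::sin(pi / 2.0 * k);
    return x;
}
```

Up to rounding, this gives F(0) = F(2) = 0, F(1) = -2j and F(3) = 2j: a magnitude of 2 at n = 1 (1 × 10 Hz = 10 Hz) and at n = 3 (30 Hz), which are the two spikes discussed below.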
Below is a frequency spectrum plot for the sine wave determined from the DFT we just worked through:
Diagram 2
The frequency plot is in the "frequency domain". The spike at 10 Hz shows that the DFT pulled out one
of the frequencies that is in the sine wave. In fact, the sine wave is a 10 Hz sine wave, so that makes
sense. However, the spike at 30 Hz should not be there, because there is no 30 Hz wave in the sine wave.
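So what accounts for that spike? Well, this is where the Nyquist frequency, fc, mentioned above comes
in. The Nyquist frequency is the cut-off point above which the data from the DFT is no longer valid. The
sampling rate is 40 Hz, and fc is half the sampling frequency, which means that any frequency above 20
Hz will not be valid in this case. So, the 30 Hz frequency is a spurious signal: for a real input such as this
sine wave, the bins above fc are mirror images (complex conjugates) of the bins below it, and the 30 Hz
spike is simply the 10 Hz component reflected about fc (40 Hz - 10 Hz = 30 Hz).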
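The mirror-image behaviour can be checked numerically. This sketch (hypothetical helper names) tests the conjugate symmetry F(N - n) = conj(F(n)) that every DFT of a real input has, and maps a bin index to the frequency it actually represents by folding it about N/2.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// True if F(N - n) == conj(F(n)) for n = 1..N-1, within tol.
bool is_conjugate_symmetric(const std::vector<cd>& F, double tol = 1e-9) {
    const std::size_t N = F.size();
    for (std::size_t n = 1; n < N; ++n)
        if (std::abs(F[N - n] - std::conj(F[n])) > tol) return false;
    return true;
}

// Frequency (Hz) that bin n really represents for a real signal: bins above
// the Nyquist bin N/2 are folded back below it.
double folded_frequency(std::size_t n, std::size_t N, double f0) {
    std::size_t m = n % N;
    if (m > N / 2) m = N - m;
    return m * f0;
}
```

For the example spectrum {0, -2j, 0, 2j} the symmetry holds, and bin 3 folds back to 10 Hz.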
Most waves will have many more frequencies in them, and thus many more spikes of various magnitudes
along the frequency spectrum. For example, below is a triangle wave in time and the corresponding
frequency spectrum of that wave:
Diagram 3
The next section covers the FFT. The FFT builds on the material above, so make sure you understand it before
moving on.
Overview of The FFT

This is a "decimation in time" FFT, because the input value to the FFT is the
time wave, x(t). The time wave could be a sound wave, for example.
Speed!
The only reason to learn the FFT is speed. An FFT is a very efficient algorithm for calculating the DFT.

Equation 1: $\dfrac{N^2}{N\log_2 N} = \dfrac{N}{\log_2 N}$ (operations for a straight DFT divided by operations for an FFT)
This means that a 1024-sample FFT is 1024/10 = 102.4 times faster than the "straight" DFT. For larger numbers of
samples the speed advantage improves. For example, for 4096 samples the FFT is over 340 times faster (4096/12 ≈ 341).
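Assuming the comparison is N*N operations for the straight DFT against N*log2(N) for the FFT (this reproduces both figures quoted above), the speed-up is simply N/log2(N). A one-function sketch, with a name of our own choosing:

```cpp
#include <cassert>
#include <cmath>

// Ratio of operation counts: straight DFT (N*N) versus FFT (N*log2 N).
double fft_speedup(double N) {
    return (N * N) / (N * std::log2(N));
}
```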

1. First, you'll need to learn the "Danielson-Lanczos Lemma" (D-L Lemma). This requires writing out
some long equations, but it's a vital component of the FFT. I'll give several examples.
2. You'll need to understand the "twiddle factor", W. This was discussed a little during the
DFT tutorial. It, along with the D-L Lemma, is essential to understanding how an FFT works.
3. Then the "Butterfly Diagram" will be explained. This builds on the first two concepts above.
The Butterfly diagram is a diagrammatic representation of an FFT algorithm.
The Danielson-Lanczos Lemma

This is the first key to understanding the FFT. It takes quite a few steps, but
I've broken the tutorial down into small digestible steps to make this as
smooth as possible.
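Equation 2: $F(n) = \sum_{k=0}^{N/2-1} x(2k)\,W_{N/2}^{kn} + W_N^{n}\sum_{k=0}^{N/2-1} x(2k+1)\,W_{N/2}^{kn} = E(n) + W_N^{n}\,O(n)$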
Note it is a DFT broken up into two summations of half the size of the original. The first summation is the
"even terms", E, and the second is the "odd terms", O. W is the "twiddle factor", and understanding it is
another key to understanding the FFT. Here is the "twiddle factor".
Equation 3: $W_N = e^{-j2\pi/N}$

To expand the DFT into even and odd terms as in the lemma above, do the following. For the even
terms, substitute 2k for k, which gives a summation of half the size of the original. For the odd
terms, substitute 2k + 1 for k, which again gives a summation of half the size of the original.
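To see the even/odd split at work, the sketch below performs one level of the expansion: it builds the half-size sums E(n) and O(n) from the even- and odd-indexed samples and recombines them as F(n) = E(n) + W_N^n O(n), with W_N = e^{-j2π/N}. It is checked against a straight DFT; the function names are our own.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Straight DFT, used as the reference: F(n) = sum_k x(k) W_N^(kn).
std::vector<cd> dft(const std::vector<cd>& x) {
    const double pi = std::acos(-1.0);
    const std::size_t N = x.size();
    std::vector<cd> F(N);
    for (std::size_t n = 0; n < N; ++n)
        for (std::size_t k = 0; k < N; ++k)
            F[n] += x[k] * std::polar(1.0, -2.0 * pi * double((k * n) % N) / double(N));
    return F;
}

// One level of the Danielson-Lanczos split (N assumed even):
// F(n) = E(n) + W_N^n O(n), where E and O are the N/2-point DFTs of the
// even-indexed (k -> 2k) and odd-indexed (k -> 2k+1) samples.
std::vector<cd> dl_split(const std::vector<cd>& x) {
    const double pi = std::acos(-1.0);
    const std::size_t N = x.size(), half = N / 2;
    std::vector<cd> even(half), odd(half);
    for (std::size_t k = 0; k < half; ++k) {
        even[k] = x[2 * k];
        odd[k] = x[2 * k + 1];
    }
    const std::vector<cd> E = dft(even), O = dft(odd);  // half-size summations
    std::vector<cd> F(N);
    for (std::size_t n = 0; n < N; ++n)  // E and O repeat with period N/2 in n
        F[n] = E[n % half] + std::polar(1.0, -2.0 * pi * double(n) / double(N)) * O[n % half];
    return F;
}
```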

Equation 4
And putting the even and odd terms together from above, we get the Danielson-Lanczos first-level
expansion:
Equation 5
The above is a first level break down. You can continue to break each term down into even and odd terms
until you run out of samples and only have one value in the summation. Like this:
Equation 6
This happens because you keep halving the number of values summed on each expansion of the
equation. For the FFT we want all summations to be expanded down to 1 term. Here is the pattern of
expansion for the Danielson-Lanczos Lemma:
As shown in the diagram above, the D-L Lemma breaks down in a binary manner. That is, the number of
terms grows as 1, 2, 4, 8, 16, 32, etc. To reduce every summation to a single term, therefore,
the number of samples must be a power of 2, N = 2^r. So, an FFT requires N = 2^r
samples.
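Applying the split recursively until each summation has a single term gives a complete radix-2 FFT. The sketch below (our own function names) rejects lengths that are not a power of 2 for exactly the reason given above. It also uses W_N^(n+N/2) = -W_N^n, so E(n) and O(n) only need to be combined for n < N/2; that symmetry is discussed with the twiddle factor later.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <stdexcept>
#include <vector>

using cd = std::complex<double>;

bool is_power_of_two(std::size_t N) { return N != 0 && (N & (N - 1)) == 0; }

// Recursive radix-2 FFT: keep applying the even/odd split until every
// summation has a single term. Forward transform, W_N = e^{-j*2*pi/N}.
std::vector<cd> fft_recursive(const std::vector<cd>& x) {
    const std::size_t N = x.size();
    if (!is_power_of_two(N)) throw std::invalid_argument("FFT length must be N = 2^r");
    if (N == 1) return x;  // a one-term summation is just the sample itself

    std::vector<cd> even(N / 2), odd(N / 2);
    for (std::size_t k = 0; k < N / 2; ++k) {
        even[k] = x[2 * k];
        odd[k] = x[2 * k + 1];
    }
    const std::vector<cd> E = fft_recursive(even);
    const std::vector<cd> O = fft_recursive(odd);

    const double pi = std::acos(-1.0);
    std::vector<cd> F(N);
    for (std::size_t n = 0; n < N / 2; ++n) {
        const cd t = std::polar(1.0, -2.0 * pi * double(n) / double(N)) * O[n];
        F[n] = E[n] + t;          // F(n)       = E(n) + W_N^n O(n)
        F[n + N / 2] = E[n] - t;  // F(n + N/2) = E(n) - W_N^n O(n)
    }
    return F;
}
```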
For a sample size of N=2, a first level expansion will be enough to get the summations to unity. The first
level expansion will look like this after plugging into equation 5 above.
Equation 7: $F(n) = x(0) + W_2^{n}\,x(1)$
The summations are reduced to unity, and all that remains is a twiddle factor and input values, x(0) and
x(1). This is the general form used for the Butterfly diagram, shown later in this tutorial.
Expansion of the Danielson-Lanczos to Four Terms

For N = 4 samples, the equation must be expanded again to four terms. Below is the expansion to four
terms. As with the first-level expansion, substitute 2k for k and halve the summation for the
even terms, and substitute 2k+1 for k and halve the summation for the odd terms. The E and O
below refer to equation 5.


And finally:
Equation 8
Now, N = 4 samples, and using the same procedure as was used for two samples, equation 8 becomes:
Equation 9: $F(n) = x(0) + W_2^{n}\,x(2) + W_4^{n}\left[x(1) + W_2^{n}\,x(3)\right]$
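Applying the split twice for N = 4 gives F(n) = x(0) + W_2^n x(2) + W_4^n [x(1) + W_2^n x(3)], which is how we read equation 9. The sketch below (our own names) evaluates that expression directly for n = 0..3. For the input 1, 2, 3, 4 it should match a straight DFT, namely 10, -2+2j, -2, -2-2j.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <complex>

using cd = std::complex<double>;

// W_N^e = e^{-j*2*pi*e/N}
cd W(int N, int e) {
    const double pi = std::acos(-1.0);
    return std::polar(1.0, -2.0 * pi * e / N);
}

// The N = 4 expansion: note the bit-reversed input order x(0), x(2), x(1), x(3)
// and the nested twiddle factors W_4^n * W_2^n on x(3).
std::array<cd, 4> fft4_expanded(const std::array<cd, 4>& x) {
    std::array<cd, 4> F;
    for (int n = 0; n < 4; ++n)
        F[n] = x[0] + W(2, n) * x[2] + W(4, n) * (x[1] + W(2, n) * x[3]);
    return F;
}
```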
Once again, as with N=2, the summations have been reduced to unity, and all you have remaining are
"twiddle factors" and the input values, x(0), x(1), x(2), and x(3).
The Danielson-Lanczos Lemma Expanded to 8 Terms

Equation 10
After the expansion above, you can plug in N = 8 samples and k = 0, since every summation reduces to a
single term when N = 8.
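Equation 11: $F(n) = x(0) + W_2^{n}\,x(4) + W_4^{n}\left[x(2) + W_2^{n}\,x(6)\right] + W_8^{n}\left\{x(1) + W_2^{n}\,x(5) + W_4^{n}\left[x(3) + W_2^{n}\,x(7)\right]\right\}$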
Danielson-Lanczos Lemma Observations

Note two things about equations 7, 9 and 11, repeated below. First, the order of the input values, x(n), is
"reverse binary" (bit-reversed). For example, left to right the order for the 4-term equation is x(0), x(2), x(1) and x(3).
The order for the 8-term equation is x(0), x(4), x(2), x(6), x(1), x(5), x(3), and x(7). This naturally
happens when the D-L Lemma is expanded. The Butterfly Diagram makes use of this fact. The second
thing to note is that the "twiddle factors", W, build up with each new expansion, so that you multiply
more of them together. The Butterfly diagram also deals with this by adding "stages", which you will see
later in this tutorial.
Equation 7
Equation 9
Equation 11

For 4 inputs:
Count from 0 to 3 in binary 00, 01, 10, 11. Now, reverse the bits of each number and you get 00, 10, 01,
11. In decimal this is 0, 2, 1, and 3.
So, the values in the D-L equation would be x(0), x(2), x(1), and x(3). This is what you see in equation 9
above.
For 8 inputs:
Count from 0 to 7 in binary 000, 001, 010, 011, 100, 101, 110, 111. Now, reverse the bits of each number
and you get 000, 100, 010, 110, 001, 101, 011, 111. In decimal this sequence is 0, 4, 2, 6, 1, 5, 3, and 7.
So, the values in the D-L equation for 8 samples would be x(0), x(4), x(2), x(6), x(1), x(5), x(3), and x(7).
This is what you see in equation 11 above.
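The bit-reversal recipe above is easy to code. This sketch (function names are our own) reverses the lowest r bits of each index and builds the input order for an N = 2^r point transform.

```cpp
#include <cassert>
#include <vector>

// Reverse the lowest `bits` bits of i; for bits = 3, 1 (001) becomes 4 (100).
unsigned bit_reverse(unsigned i, unsigned bits) {
    unsigned r = 0;
    for (unsigned b = 0; b < bits; ++b) {
        r = (r << 1) | (i & 1u);  // shift the lowest bit of i into r
        i >>= 1;
    }
    return r;
}

// Input order for an N = 2^bits point transform: entry m is the index of
// the sample x(.) that goes in position m.
std::vector<unsigned> bit_reversed_order(unsigned bits) {
    std::vector<unsigned> order;
    for (unsigned m = 0; m < (1u << bits); ++m) order.push_back(bit_reverse(m, bits));
    return order;
}
```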
The same pattern holds for all expansions of the D-L Lemma, and is made use of by the Butterfly
Diagram.
Next I'll discuss the "twiddle factor" and then put it together with the Danielson-Lanczos Lemma to create
the Butterfly diagram, which is the FFT in diagram form.

The twiddle factor, W, describes a "rotating vector", which rotates in
increments according to the number of samples, N. Here are graphs where N
= 2, 4 and 8 samples.

Diagram: The "Twiddle Factor" for N = 2, 4 and 8
As shown in the diagram above, the twiddle factor has redundancy in values as the vector rotates around.
For example, W for N = 2 is the same for n = 0, 2, 4, 6, etc. And W for N = 8 is the same for n = 3, 11, 19,
27, etc.
The symmetry is that values 180 degrees out of phase are the negatives of each other.
For example, for N = 4 samples, W for n = 0, 4, 8, etc. is the negative of W for n = 2, 6, 10, etc.
The Butterfly diagram takes advantage of this redundancy and symmetry, which is part of what makes the
FFT possible.
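Both properties are easy to confirm numerically. The helper below (our own names) checks periodicity, W_N^(n+N) = W_N^n, and the half-turn symmetry, W_N^(n+N/2) = -W_N^n, for every n in one cycle, with W_N = e^{-j2π/N}.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

using cd = std::complex<double>;

// W_N^n = e^{-j*2*pi*n/N}
cd W(long long N, long long n) {
    const double pi = std::acos(-1.0);
    return std::polar(1.0, -2.0 * pi * static_cast<double>(n) / static_cast<double>(N));
}

// True if, for every n in 0..N-1, the twiddle factor repeats after N steps
// (redundancy) and flips sign after N/2 steps (symmetry). N assumed even.
bool twiddle_properties_hold(long long N, double tol = 1e-9) {
    for (long long n = 0; n < N; ++n) {
        if (std::abs(W(N, n + N) - W(N, n)) > tol) return false;
        if (std::abs(W(N, n + N / 2) + W(N, n)) > tol) return false;
    }
    return true;
}
```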

The Butterfly Diagram builds on the Danielson-Lanczos Lemma and the twiddle factor to create an
efficient algorithm. The Butterfly Diagram is the FFT algorithm represented as a diagram.
First, here is the simplest butterfly. It's the basic unit, consisting of just two inputs and two outputs.
That diagram is the fundamental building block of a butterfly. It has two input values, or N=2 samples,
x(0) and x(1), and results in two output values F(0) and F(1). The diagram comes from the D-L Lemma
for two inputs.
This can be shown by taking equation 7 above and plugging in for values n=0 and n=1, thus:
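Assuming equation 7 reads F(n) = x(0) + W_2^n x(1), and using W_2^0 = 1 and W_2^1 = e^{-jπ} = -1, the two outputs work out as:

```latex
\begin{aligned}
F(0) &= x(0) + W_2^{0}\,x(1) = x(0) + x(1) \\
F(1) &= x(0) + W_2^{1}\,x(1) = x(0) - x(1)
\end{aligned}
```

The same x(1) term is added for F(0) and subtracted for F(1); that cross-over is what gives the butterfly its shape.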
So, the Butterfly comes from the Danielson-Lanczos Lemma, but it also uses the twiddle factor to take
advantage of redundancies and symmetry in the D-L Lemma.
To get a full understanding of the Butterfly, a four input Butterfly will be required. That is described next.


Note the order of input values is "reverse bit" order. The Butterfly uses the natural expansion order of the
Danielson-Lanczos Lemma, which is why the input is ordered that way. This was described earlier.

The equations for the four-input butterfly are derived below.
Equation 12
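In code, the four-input butterfly is two stages of the 2-point butterfly (a, b) -> (a + w b, a - w b). This sketch (our own function names) loads the inputs in bit-reversed order, applies stage 1 with W_2^0 = 1 and stage 2 with W_4^0 = 1 and W_4^1 = -j, and leaves F(0)..F(3) in natural order. It illustrates the structure; it is not an optimised implementation.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <complex>

using cd = std::complex<double>;

// The basic 2-point butterfly: (a, b) -> (a + w*b, a - w*b), done in place.
void butterfly(cd& a, cd& b, cd w) {
    const cd t = w * b;
    b = a - t;
    a = a + t;
}

// Four-input butterfly diagram: inputs are loaded in bit-reversed order
// x(0), x(2), x(1), x(3); outputs come out as F(0)..F(3) in natural order.
std::array<cd, 4> fft4_butterfly(const std::array<cd, 4>& x) {
    const double pi = std::acos(-1.0);
    std::array<cd, 4> v = {x[0], x[2], x[1], x[3]};
    // Stage 1: two 2-input butterflies, twiddle W_2^0 = 1.
    butterfly(v[0], v[1], 1.0);
    butterfly(v[2], v[3], 1.0);
    // Stage 2: twiddles W_4^0 = 1 and W_4^1 = e^{-j*pi/2} = -j.
    butterfly(v[0], v[2], 1.0);
    butterfly(v[1], v[3], std::polar(1.0, -pi / 2.0));
    return v;
}
```

That is four 2-input butterflies in total, which is the count used in the savings argument below.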
The N Log N savings

Remember, for a straight DFT you needed N*N multiplies. The N log N savings comes from the structure of the
diagram: there are log2(N) stages of N/2 butterflies each, and two multiplies per butterfly. In the 4-input diagram
above, there are 4 butterflies, so there are a total of 4*2 = 8 multiplies, and 4 log2(4) = 8. This is how you get
the computational savings in the FFT! The log is base 2, as described earlier. See equation 1.
The 8-input butterfly diagram has 12 two-input butterflies and thus 12*2 = 24 multiplies.
N log2 N = 8 log2(8) = 24. A straight DFT has N*N multiplies, or 8*8 = 64 multiplies. That's a pretty
good saving for a small sample. The savings are over 100 times for N = 1024, and they increase as the
number of samples increases.
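The counting argument can be written down directly. Assuming N = 2^r, there are r stages of N/2 butterflies each and, counting two multiplies per butterfly as the text does, N log2 N multiplies in total. The struct and function names below are our own.

```cpp
#include <cassert>

struct OpCounts {
    long long butterflies;     // 2-input butterflies in the whole diagram
    long long fft_multiplies;  // 2 per butterfly, as counted in the text
    long long dft_multiplies;  // N*N for a straight DFT
};

// N is assumed to be a power of 2.
OpCounts count_ops(long long N) {
    long long stages = 0;  // log2(N)
    for (long long m = N; m > 1; m /= 2) ++stages;
    const long long butterflies = (N / 2) * stages;  // N/2 butterflies per stage
    return {butterflies, 2 * butterflies, N * N};
}
```

count_ops(8) gives 12 butterflies and 24 multiplies against 64 for the DFT; count_ops(1024) gives 10240 against 1048576, the factor of 102.4 quoted earlier.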
You can keep expanding the butterfly by the same procedure.
GPU_FFT
GPU_FFT release 3.0 is a Fast Fourier Transform library for the Raspberry Pi
which exploits the BCM2835 SoC GPU hardware to deliver ten times more data
throughput than is possible on the 700 MHz ARM of the original Raspberry Pi 1.
Kernels are provided for all power-of-2 FFT lengths between 256 and 4,194,304
points inclusive. Accuracy has been significantly improved, without compromising
performance, using a novel floating-point precision enhancement technique.
Raspberry Pi Foundation CEO Eben Upton attended the Radio Society of Great
Britain (RSGB) 2013 Annual Convention, where radio amateurs told him FFT
performance of its 700 MHz ARM limited the Pi's usefulness in Software Defined
Radio (SDR) applications. That was the impetus which led me to develop
GPU_FFT; and I wish to thank Eben Upton, Dom Cobley, Tim Mamtora, Gary
Keall and Dave Emett for helping.
GPU_FFT runs on the quad processing units (QPUs) in the BCM2835 V3D block.
There is a huge amount of interest in general-purpose programming of graphics
hardware; and, in February 2014, Broadcom gave the Pi community a fantastic
present by publishing the Videocore IV V3D Architecture Specification. Now,
anybody can develop code for the GPU and some pioneers already have. VC4ASM
is a full-featured macro assembler by Marcel Müller.
Similar throughput is now achievable using NEON instructions on the Pi 2 ARMv7; however, GPU_FFT remains
useful on the Pi 1 and as an example of general-purpose GPU (GPGPU) programming.
What's new in release 3.0?
- log2N = 22 transform length
- Relative root-mean-square (rms) error reduced to ~ 1 part-per-million (ppm)
What was new in release 2.0?
- More transform lengths supported: log2N = 18, 19, 20 and 21
- Hardware transposer to accelerate 2-dimensional transforms. See hello_fft_2d demo
- Small batches of short transforms run more efficiently by avoiding mailbox overhead
Getting started
This article is about how GPU_FFT works; but first a few pointers on how to use
it. GPU_FFT is distributed with the Raspbian operating system. The following
commands build and run the hello_fft demo:
cd /opt/vc/src/hello_pi/hello_fft
make
sudo ./hello_fft.bin 12
These files contain all the information you need to use GPU_FFT:
hello_fft.c, hello_fft_2d.c    Demo source code
gpu_fft.txt                    User Guide
The hello_fft demo isn't very spectacular. It just computes some performance
figures. The newly released hello_fft_2d demo is a bit more visual:
make hello_fft_2d.bin
sudo ./hello_fft_2d.bin
It generated the above image. Now back to how GPU_FFT works ...
Bit reversal
Two ways of implementing the classic radix-2 Cooley-Tukey FFT algorithm are:
decimation in time (DIT) and decimation in frequency (DIF). Both yield the same
result; they differ in the ordering of inputs and outputs. The choice is either to read
inputs or write outputs in bit-reversed-ordinal sequence. Neither option is attractive
when transforming data in SDRAM because memory controllers transfer blocks of
consecutive addresses in bursts. Randomly accessing memory in the way bit-
reversal dictates is expensive if there is more data than will fit in cache.
Figure: eight-point data flow graphs, "Decimation in time" and "Decimation in frequency" (ordinals shown in binary, bit-reversed)
Butterflies
The eight-point transforms depicted above have three so-called butterfly stages. In
general, an N-point FFT is computed in log2(N) stages by the radix-2 algorithm. A
naïve implementation might transfer data to and from SDRAM at every stage; but
to do so log2(N) times is inefficient. Ideally, we would perform multiple stages of
computation in registers and minimise memory access. FFT performance on
modern hardware is usually limited by memory bandwidth, not floating-point
computation.
Say we could accommodate four complex numbers in registers. Looking at the
above DIT data flow graph, if we loaded x(0), x(4), x(2) and x(6) from memory,
we could compute the first two butterfly stages in registers; temporarily save the
result to memory; and do the same for the odd-numbered ordinals 1, 5, 3 and 7; but
then we hit a problem. The inputs to the third butterfly stage do not fit in registers, and it
gets worse with more stages. We will always run out of registers eventually, even
if we have quite a lot, as the BCM2835 QPU does.
Bit rotation
Compare these 256-point DIT FFT data flow graphs:
DIT FFT 256 classic
DIT FFT 256 practical
The classic data flow graph illustrates the above-mentioned implementation
challenge: the number of inputs to the butterflies doubles at each stage. In the
practical flow: extra "twists" (re-orderings) have been inserted after the fourth and
final stages; no butterfly is then a function of more than sixteen inputs; all
computation can be performed in registers using an FFT_16 codelet; and SDRAM
is only read/written twice. The extra twists are bit-rotations not reversals. Ordinals
are circularly shifted by four. Doing so twice in an eight-stage transform leaves
outputs in the correct positions. Butterfly wing span doubles from 16 to 32
between the 4th and 5th stages of the classic flow. Rotating ordinals by 4-bits
drops wing span back down to 2 again in the practical flow.
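The bit-rotation idea can be checked in a few lines. Assuming "circularly shifted by four" means rotating the 8-bit ordinal of a 256-point transform by 4 bit positions (the direction does not matter for this check), two such rotations bring every ordinal back to where it started. The function name is our own; this is a sketch, not GPU_FFT code.

```cpp
#include <cassert>

// Rotate the low `bits` bits of ordinal i left by s positions (1 <= bits <= 31).
unsigned rotate_bits(unsigned i, unsigned s, unsigned bits) {
    const unsigned mask = (1u << bits) - 1u;
    i &= mask;
    s %= bits;
    if (s == 0) return i;
    return ((i << s) | (i >> (bits - s))) & mask;
}
```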
The above PDF files were generated by a C program, Butter, which was written to
visualize how twiddles change as we step through the data. You will also need
libHaru to build it, or, you can run the zipped executable if you trust me. I promise
it is safe. The zipped code generates a 4096-point "practical" flow graph; but the
#defines can be changed for other lengths.
Architecture
We would prefer to access memory "linearly" in increasing address order;
however, rotations and reversals require "scattered" accesses. DIT unavoidably
begins by reading data from bit-reversed-ordinal positions. Then there are two
alternative ways to do the rotations:
GPU_FFT takes the high road because the VPM/VDW are good at bit-rotated
writes and the same write code can then be used for both passes. But why DIT?
Why not DIF? Because a bit-reversed write would be slow.
End-to-end data path through V3D hardware on Pi 1:
The TMU (texture & memory lookup unit) is good at fetching contiguous or
scattered data from SDRAM, although it was not specifically optimised for bit-
reversed reads. The TMU cache is directly-mapped, aliasing on 4k boundaries.
Unfortunately, bit-reversal of consecutive ordinals makes upper address bits
toggle, causing collisions and evictions! VC4 L2 Cache can only be used on Pi 1;
and must be avoided for GPU_FFT and OpenGL to coexist. The ARM does not
access SDRAM through VC4 L2 cache on Pi 2. It is safe for VDW output to
bypass V3D internal caches, since they are smaller than the ping-pong data buffers.
Parallelism
The BCM2835 V3D QPU (quad processing unit) is a 16 SIMD (single instruction,
multiple data) processor. Registers and data paths are 16*32 bits wide, representing
vectors of 16 values. Instructions can act upon all 16 values in parallel, or,
optionally, on selected values; and the 16 can be rotated within a vector. These
features are exploited in GPU_FFT to implement a very fast FFT_16 codelet.
Working with complex numbers, only two registers are needed to store 16 real and
16 imaginary parts. The TMU has a convenient programming interface which
accepts a vector of 16 potentially scattered addresses and returns 16 words from
SDRAM.
BCM2835 V3D has 12 QPUs; but GPU_FFT only uses 8. It happens that bit-
rotated writes can be done very neatly, through the VPM and VDW, by 8 QPUs
working in parallel. One is master and 7 are slaves, synchronised using
semaphores. QPUs can access the VPM (vertex pipeline memory) either
horizontally or vertically. Each QPU outputs real and imaginary vectors to VPM
columns. The master then writes VPM rows to SDRAM through the VDW as a
series of bursts, with a stride between rows:
QPU0               QPU1                       QPU6                 QPU7
re[i+0]  im[i+0]   re[i+16]  im[i+16]  . . .  re[i+96]  im[i+96]   re[i+112]  im[i+112]   base
re[i+1]  im[i+1]   re[i+17]  im[i+17]  . . .  re[i+97]  im[i+97]   re[i+113]  im[i+113]   base + stride
re[i+2]  im[i+2]   re[i+18]  im[i+18]  . . .  re[i+98]  im[i+98]   re[i+114]  im[i+114]   base + stride*2
   .                                    VPM                                                  .
The VDW write setup STRIDE is specified as a 13-bit field on page 59 of the
Videocore IV V3D Architecture Specification but luckily the hardware actually
supports 16-bits. The value specified in the setup is 64 bytes less than the true
stride shown above because the internal address register to which it is added has
already advanced 16*4 bytes.
Codelets
The QPUs are kept in lock-step. In the FFT-256 example, each QPU consumes 16
points of data per step. Together they process 16*8 = 128 points. Only two such
steps per pass and two passes are required to complete the entire FFT-256. Longer
transforms require more steps, more passes and GPU_FFT also uses FFT-32 and
FFT-64 codelets as follows:
log2N   N       Passes   Structure†
8       256     2        4+4
9       512     2        5+4
10      1024    2        5+5
11      2048    2        6+5
12      4096    3        4+4+4
13      8192    3        5+4+4
14      16384   3        5+5+4
15      32768   3        5+5+5
16      65536   3        6+5+5
17      128K    4        5+4+4+4
18      256K    4        5+4+4+5
19      512K    4        5+4+5+5
20      1M      4        5+5+5+5
21      2M      4        6+5+5+5
22      4M      4        6+6+5+5
† 4=FFT-16; 5=FFT-32; 6=FFT-64
Minimising the number of passes minimises SDRAM access.
log2N=18 and longer transforms require VDW write strides exceeding 65535 bytes
and must therefore issue multiple single-row VDW write commands.
5+4+4+5 has lower root-mean-square (rms) error than 5+5+4+4.
128k and 256k could be done as 6+6+5 and 6+6+6. When these shaders were
originally written, the FFT-64 codelet could only be used in the first pass. Now it
can be used in any pass and they could be optimised.
Complex arithmetic
All floating-point arithmetic in GPU_FFT is on complex numbers. This code
performs 16 complex multiplications in parallel:
#
# complex p(r0, r1) = complex a(r0, r1) * complex b(ra_re, rb_im)
#
nop; fmul r3, r1, rb_im # ii = a.im*b.im
nop; fmul r2, r0, ra_re # rr = a.re*b.re
nop; fmul r1, r1, ra_re # ir = a.im*b.re
nop; fmul r0, r0, rb_im # ri = a.re*b.im
fadd r1, r0, r1 # p.im = ri + ir
fsub r0, r2, r3 # p.re = rr - ii
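For readers more at home in C++ than in QPU assembler, here is a scalar restatement of the same arithmetic (a sketch, with our own type and function names). Each lane computes the same four products and the same subtraction and addition as the assembly; the loop over 16 lanes stands in for the QPU's 16-way SIMD.

```cpp
#include <array>
#include <cassert>

constexpr int kLanes = 16;  // the QPU processes 16 values per instruction
using Vec = std::array<float, kLanes>;

// p = a * b for 16 complex numbers held as separate real/imaginary vectors.
void cmul16(const Vec& a_re, const Vec& a_im, const Vec& b_re, const Vec& b_im,
            Vec& p_re, Vec& p_im) {
    for (int lane = 0; lane < kLanes; ++lane) {
        const float ii = a_im[lane] * b_im[lane];  // a.im*b.im
        const float rr = a_re[lane] * b_re[lane];  // a.re*b.re
        const float ir = a_im[lane] * b_re[lane];  // a.im*b.re
        const float ri = a_re[lane] * b_im[lane];  // a.re*b.im
        p_im[lane] = ri + ir;                      // p.im = ri + ir
        p_re[lane] = rr - ii;                      // p.re = rr - ii
    }
}
```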
Twiddles
Twiddle factors are unit vectors in the complex plane, which "rotate" as we work
through the data. Rotation in this context is the angular not the bit-shift variety!
The size and frequency of the rotations varies between passes. Twiddles in the
final pass change on every loop. Twiddles in the first pass do not change at all. The
general rule for a 4-pass transform with codelet structure a+b+c+d is as follows:
Pass                 1     2        3      4
Codelet              a     b        c      d
Rotation (Δk)        -     c.d      d      1
Interval (points)    -     b.c.d    c.d    d
This 3-pass FFT-4096 flow graph helped me visualize the subtleties:
DIT FFT 4096 (6MB zip of 33MB pdf) or download Butter.zip and run the
executable.
Twiddle factors are applied to data within the codelets at the points indicated on
the data flow. Each integer, k, represents a complex multiplication by
Ideally, twiddles should be calculated to a higher precision than the data; but the
QPU only supports 32-bit single-precision floats. Trigonometric recurrences are
used to minimise loss of precision:
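The recurrence itself is not shown here. A standard form that fits the α, β, cosθ and sinθ quantities named below is cos(θ+δ) = cosθ - (α cosθ + β sinθ) and sin(θ+δ) = sinθ - (α sinθ - β cosθ), with α = 2 sin²(δ/2) and β = sin δ. We assume, without having confirmed it against the GPU_FFT source, that this is the form used. The C++ sketch below (our own function name) runs it in float or double so the two can be compared with std::cos and std::sin.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Advance (c, s) = (cos t, sin t) by delta, `steps` times, using
//   c' = c - (alpha*c + beta*s),  s' = s - (alpha*s - beta*c),
// with alpha = 2 sin^2(delta/2) and beta = sin(delta). T is float or double.
template <typename T>
void twiddle_recurrence(T& c, T& s, double delta, std::size_t steps) {
    const T alpha = static_cast<T>(2.0 * std::sin(delta / 2) * std::sin(delta / 2));
    const T beta = static_cast<T>(std::sin(delta));
    for (std::size_t i = 0; i < steps; ++i) {
        const T dc = alpha * c + beta * s;
        const T ds = alpha * s - beta * c;
        c -= dc;  // the "final two subtractions" of the recurrence
        s -= ds;
    }
}
```

Writing the update as a small correction subtracted from the current value keeps the rounding error of each step small compared with multiplying by cos δ directly.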
Unfortunately, the QPU always rounds down (not to the nearest LSB) and
significant rounding errors accumulated, seriously compromising accuracy in
releases 1.0 and 2.0 of GPU_FFT. This has been addressed in release 3.0 using a
precision enhancement technique based on the work of Dekker [2] and Thall [3]
which corrects rounding errors, improving accuracy by almost three orders of
magnitude in the longer transforms!
Thall and Dekker use the "unevaluated sum" of two single-precision floats to
represent one double-precision quantity and provide primitives for addition,
subtraction and multiplication. To avoid a performance hit, only the cosθ and sinθ
terms are maintained at higher precision; and only the final two subtractions of the
recurrence are evaluated using Thall's primitives. Two QPU instructions per
iteration became twenty-six.
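The "unevaluated sum" idea can be sketched in C++ with float standing in for the QPU's single precision. This follows the well-known TwoSum / Fast2Sum construction used in Thall's df64 addition, with subtraction done as addition of the negated operand. The names df64, df64_sub, to_df64 and to_double are our own; this is not the GPU_FFT macro df64_sub32. It assumes IEEE round-to-nearest float arithmetic, so it must not be compiled with -ffast-math.

```cpp
#include <cassert>
#include <cmath>

// A higher-precision value held as the unevaluated sum hi + lo of two floats.
struct df64 { float hi, lo; };

// Knuth's TwoSum: s + e == a + b exactly (round-to-nearest assumed).
static void two_sum(float a, float b, float& s, float& e) {
    s = a + b;
    const float bb = s - a;
    e = (a - (s - bb)) + (b - bb);
}

// Dekker's Fast2Sum: same guarantee, but requires |a| >= |b|.
static void quick_two_sum(float a, float b, float& s, float& e) {
    s = a + b;
    e = b - (s - a);
}

// a - b in df64 arithmetic: add the negated operand.
df64 df64_sub(df64 a, df64 b) {
    float s, e, t, f;
    two_sum(a.hi, -b.hi, s, e);  // high parts, with exact error term
    two_sum(a.lo, -b.lo, t, f);  // low parts, with exact error term
    e += t;
    quick_two_sum(s, e, s, e);   // renormalise
    e += f;
    quick_two_sum(s, e, s, e);
    return {s, e};
}

df64 to_df64(double d) {
    const float hi = static_cast<float>(d);
    return {hi, static_cast<float>(d - hi)};
}

double to_double(df64 x) { return double(x.hi) + double(x.lo); }
```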
Real and imaginary parts are stored at offsets from ra_tw_re and rb_tw_im
respectively:
Codelet   Offset                        Description
FFT-16    TW16_Pn_STEP                  per-step α, β rotation for nth pass
FFT-32    TW32_Pn_STEP                  per-step α, β rotation for nth pass
FFT-64    TW48_Pn_STEP, TW64_Pn_STEP    per-step α, β rotation for nth pass
FFT-16    TW16+0                        1st stage unpacked twiddles
FFT-16    TW16+1                        2nd stage unpacked twiddles
FFT-16    TW16+2                        3rd stage unpacked twiddles
FFT-16    TW16+3, 4                     high-precision cosθ, sinθ
FFT-32    TW32+0, 1                     high-precision cosθ, sinθ
FFT-64    TW48+0, 1; TW64+0, 1          high-precision cosθ, sinθ
STEP values and starting values for cosθ, sinθ are loaded from memory addresses
passed-in through uniforms. Each of the 8 QPUs has unique (rx_tw_unique)
starting values in the final pass. Other values are common (rx_tw_shared) to all
shaders.
Twiddle management macros:
load_tw                                       Load α, β and starting cosθ, sinθ
body_fft_16                                   FFT-16 codelet
fft_twiddles_32                               FFT-32 codelet helper
unpack_twiddles                               Updates TW16
rotate, next_twiddles_16, next_twiddles_32    Rotate one step
df64_sub32                                    Enhanced-precision subtract
Using TW#i notation to denote the ith element of a vector, twiddles are packed in
the TW16 and TW32 registers:
Element #0 of the TW16 registers is not used. Twiddles are "unpacked" to suit the
4 stages of the FFT-16 codelet and conditional flags control which elements are
multiplied at each stage:
.macro body_fft_16
.rep i, 4
and.setf -, elem_num, (1<<i)
Element   #0 #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15
TW16+0     0  1  0  1  0  1  0  1  0  1   0   1   0   1   0   1
TW16+1     0  1  2  3  0  1  2  3  0  1   2   3   0   1   2   3
TW16+2     0  1  2  3  4  5  6  7  0  1   2   3   4   5   6   7
TW16+3     0  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15
TW32       0  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15
(Each entry i stands for TW16#i, or TW32#i in the last row.)
TMU read pipeline

Read-related macros:
read_rev, read_lin     Queue TMU request for bit-reverse or linear (sequential) ordinal read
load_rev, load_lin     Call read_* to queue another request and wait for response from previous
interleave, swizzle    Swap request/response elements to improve cache-efficiency
bit_rev                Bit-reversal primitive
At least one request is always queued to avoid the performance penalty of TMU
pipeline stalls. One too many requests are issued and one response must be thrown
away after the loop. Pseudo-code:
#######################################
# Pass 1
read_rev
foreach step {
load_rev
...
}
ldtmu0 # Wait for last (unwanted) response
ldtmu0
#######################################
# Pass 2
read_lin
foreach step {
load_lin
...
}
ldtmu0 # Wait for last (unwanted) response
ldtmu0
Write subtleties
VPM and VDW write macros:
Macro                                                Caller   Description
inst_vpm                                             Both     Setup VPM column write commands for this QPU instance (0...7)
write_vpm_16, write_vpm_32, write_vpm_64             Both     Write codelet output to VPM columns (FFT_32 and FFT_64 final butterfly stage done here)
body_ra_save_16, body_ra_save_32, body_ra_save_64    Master   write_vpm_*, sync with slaves and queue VDW write
body_ra_sync                                         Master   Sync with slaves
body_rx_save_slave_16, _32, _64                      Slaves   write_vpm_* and sync with master
body_rx_sync_slave                                   Slaves   Sync with master
FFT_16 and FFT_32 codelet VPM writes are double-buffered, allowing the slaves
to proceed into the next step without waiting for the VDW write. FFT_16
alternates between the first 16 and second 16 VPM rows. FFT_32 alternates
between the first 32 and second 32 VPM rows. FFT_64 is not double-buffered
because user shaders can only use the first 64 rows. Note how the order of
semaphore instructions in the FFT_64 body_*_save_* macros differs from the others.
Notice also the use of VPM readbacks to ensure preceding writes have completed;
and the postponement of vw_wait until the last moment.
Mailbox
GPU_FFT uses a kernel character device known as the "mailbox" for
communication between the ARM and the BCM2835 VPU. Running start.elf from
the SD-Card boot partition, the VPU is a powerful (and on the Pi under-used)
vector processor that manages the Videocore hardware. GPU_FFT uses two
mailbox API functions:
unsigned qpu_enable(int file_desc, unsigned enable);
unsigned execute_qpu(int file_desc, unsigned num_qpus, unsigned control, unsigned noflush, unsigned timeout);
The first enables or disables the GRAFX (V3D) power domain and opens a V3D
driver handle for running user shaders. The second executes one or more shaders
on the QPUs.
Unfortunately, calling through the mailbox involves a kernel context switch,
incurring a ~ 100µs performance penalty. This is not an issue for qpu_enable() which
need only be called twice; however, it is a significant overhead on execute_qpu()
when computing short transforms. This is why GPU_FFT 1.0 allowed batches of
transforms to be executed with a single mailbox call. GPU_FFT 3.0 still supports
batch mode; but, when the expected shader runtime is below a #defined threshold,
it avoids the mailbox overhead by poking V3D hardware registers directly from the
ARM side and busy-waits for shader completion.
V3D hardware registers are documented in the Videocore IV V3D Architecture
Specification.
Currently, programs which use the mailbox must run as root; however, Marcel
Müller is developing vcio2 a kernel driver to access the Raspberry Pi VideoCore
IV GPU without root privileges.
The mailbox special character device file has moved from ./char_dev to /dev/vcio in
the latest release.
Alternative algorithms
Instead of calculating twiddle factors on-the-fly using recurrences, a memory look-
up table can be used. This may be a full cycle, or just a single quadrant in size. The
latter occupies less memory; but requires more pre- and post-look-up processing.
Both consume TMU memory bandwidth. Performance of the log2N=20 Cooley-
Tukey shader with twiddle look-up was investigated.
Stockham autosort FFT algorithms access data in "natural" order, avoiding the bit-
reversal or "shuffle" step required by Cooley-Tukey. Radix-2 Stockham algorithms
perform N-point transforms in log2N passes, effectively shuffling indices 1-bit per
pass as they go. The number of passes can be reduced with higher radix codelets.
Based on work published by Microsoft Research [1], a mixed-radix Stockham
routine with twiddle look-up was prototyped, first in "C" and then in QPU
assembler.
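For comparison, here is a minimal radix-2 Stockham autosort FFT in C++. It is our own sketch, not the Microsoft Research code or the GPU_FFT prototype: each of the log2(N) passes reads from one buffer and writes to the other with strides that keep the output in natural order, so no bit-reversal step is needed. For brevity, twiddles are computed with cos/sin rather than taken from a look-up table.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cd = std::complex<double>;

// Radix-2 Stockham autosort FFT (forward, W = e^{-j*2*pi/n}). x.size() must
// be a power of 2; the result is returned in natural order.
std::vector<cd> stockham_fft(std::vector<cd> x) {
    const double pi = std::acos(-1.0);
    const std::size_t N = x.size();
    std::vector<cd> y(N);
    // n: length of the sub-transforms still to do; s: stride (how many there are).
    for (std::size_t n = N, s = 1; n > 1; n /= 2, s *= 2) {
        const std::size_t m = n / 2;
        for (std::size_t p = 0; p < m; ++p) {
            const cd wp = std::polar(1.0, -2.0 * pi * double(p) / double(n));
            for (std::size_t q = 0; q < s; ++q) {
                const cd a = x[q + s * p];
                const cd b = x[q + s * (p + m)];
                y[q + s * (2 * p)] = a + b;
                y[q + s * (2 * p + 1)] = (a - b) * wp;
            }
        }
        std::swap(x, y);  // this pass's output is the next pass's input
    }
    return x;
}
```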
The above experiments produced modest accuracy improvements; but throughput
was disappointing and performance counters implicated TMU cache aliasing.
References
1. High Performance Discrete Fourier Transforms on Graphics Processors.
2. T. J. Dekker, A Floating-Point Technique for Extending the Available Precision, 1971.
3. Andrew Thall, Extended-Precision Floating-Point Numbers for GPU Computation, Alma College.
Complex DFT and FFT Algorithm Implementation Using C++ Programming Language
All four members of the Fourier transform family can be carried out with either real numbers or complex
numbers. In my previous post, I shared how to implement the real DFT algorithm using C++. In this
post, I will implement the complex-number version of the DFT algorithm using C++. After that, I
will also implement the Fast Fourier Transform (FFT) algorithm. The FFT is another method for
calculating the DFT. While it produces the same result as the DFT algorithm, it is far more
efficient, often reducing the computation time by a factor of hundreds. The algorithm that I use in this
post is based on The Scientist and Engineer's Guide to Digital Signal Processing book. This book
is a very good introduction to DSP for beginners.
The complex DFT transforms two N point time domain signals into two N point frequency
domain signals. The two time domain signals are called the real part and the imaginary part, just
as are the frequency domain signals. You can calculate the real DFT using the complex DFT by
moving the N point signal into the real part of the complex DFT's time domain, then setting all of the
samples in the imaginary part to zero. Samples 0 to N/2 of the complex DFT correspond to the
real DFT's spectrum.
This is the equation for calculating the complex DFT:
The compact version of this equation is in the exponential form. Many textbooks use this form.
From this form, we can expand the equation using Euler's formula; then, with complex number
multiplication, we get the final equations for calculating the real and imaginary parts. These final
equations are needed for implementing the forward complex DFT code. There are several notations for the
equation:
With equation 1, we can implement the forward complex DFT code:
#include <cmath>

// Implement forward complex DFT equation (Eq. 1)
// Assumes globals: N (number of samples), PI, time domain arrays xr[], xi[]
// and frequency domain arrays REX[], IMX[].
void cdft()
{
    float sr, si;

    // Zero REX[] and IMX[] so they can be used as accumulators
    for (int k = 0; k < N; k++)
    {
        REX[k] = 0;
        IMX[k] = 0;
    }

    // Loop for each value in frequency domain
    for (int k = 0; k < N; k++)
    {
        // Loop for each value in time domain
        for (int n = 0; n < N; n++)
        {
            // Correlate with the complex sinusoid, sr and si
            sr = cos(2*PI*k*n/N);
            si = -sin(2*PI*k*n/N);
            REX[k] += xr[n]*sr - xi[n]*si;
            IMX[k] += xr[n]*si + xi[n]*sr;
        }
    }
}
As with the real DFT, to calculate the inverse complex DFT we first need to scale the frequency
domain signals to get the sinusoid amplitudes; in the complex case both the real and imaginary
parts are divided by N (Equation 2).
After that, we can calculate the inverse complex DFT using this equation:
The equation for calculating the inverse complex DFT is similar to the forward complex DFT,
but without the minus sign in the exponent. From this form we can expand the equation to get
the final equations for the real and imaginary parts of the time domain signal. Using Equation 2
and the final Equation 3, we can implement the inverse complex DFT code:
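In the same notation, and again reconstructed to agree with the icdft() code below, the scaling step (Equation 2) and the inverse transform with its real and imaginary parts (Equation 3) can be written as:

```latex
% Equation 2: scale the spectrum by 1/N
\mathrm{Re}\,\bar{X}[k] = \frac{\mathrm{Re}\,X[k]}{N}, \qquad \mathrm{Im}\,\bar{X}[k] = \frac{\mathrm{Im}\,X[k]}{N}

% Inverse complex DFT, exponential form (no minus sign in the exponent)
x[n] = \sum_{k=0}^{N-1} \bar{X}[k]\, e^{+j 2\pi k n / N}

% Equation 3: real and imaginary parts of the time domain signal
x_r[n] = \sum_{k=0}^{N-1} \left[ \mathrm{Re}\,\bar{X}[k] \cos\!\left(\tfrac{2\pi k n}{N}\right) - \mathrm{Im}\,\bar{X}[k] \sin\!\left(\tfrac{2\pi k n}{N}\right) \right]

x_i[n] = \sum_{k=0}^{N-1} \left[ \mathrm{Re}\,\bar{X}[k] \sin\!\left(\tfrac{2\pi k n}{N}\right) + \mathrm{Im}\,\bar{X}[k] \cos\!\left(\tfrac{2\pi k n}{N}\right) \right]
```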
// Implement the inverse complex DFT equation (Eq. 3)
// Note: REX[] and IMX[] are scaled in place (Eq. 2), so they no longer
// hold the original spectrum after this call
void icdft()
{
    float sr, si;

    // Find the cosine and sine wave amplitudes (Eq. 2)
    for (int k = 0; k < N; k++)
    {
        REX[k] /= N;
        IMX[k] /= N;
    }

    // Zero xr[] and xi[] so they can be used as accumulators
    for (int n = 0; n < N; n++)
    {
        xr[n] = 0;
        xi[n] = 0;
    }

    // Loop for each value in time domain
    for (int n = 0; n < N; n++)
    {
        // Loop for each value in frequency domain
        for (int k = 0; k < N; k++)
        {
            // Correlate with the complex sinusoid, sr and si
            sr = cos(2*PI*k*n/N);
            si = -sin(2*PI*k*n/N);
            xr[n] += REX[k]*sr + IMX[k]*si;
            xi[n] += -REX[k]*si + IMX[k]*sr;
        }
    }
}
To check that this code is operating properly, we can use MATLAB to plot the original time
domain signal and the time domain signal from the inverse complex DFT. The two signals should
be the same; in practice they are not exactly identical because of round-off error.
Original Signal:
Inverse Complex DFT Signal:
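The same round-trip check can be sketched outside MATLAB. The JavaScript below is a minimal illustration, not the C++ code above: complexDft and inverseComplexDft are hypothetical names, and the real and imaginary parts are kept in separate arrays to mirror xr[]/xi[] and REX[]/IMX[].

```javascript
// Minimal sketch: complex DFT (Eq. 1) and inverse complex DFT (Eqs. 2 and 3).
// Real and imaginary parts live in separate arrays, like xr[]/xi[] and REX[]/IMX[].
function complexDft(xr, xi) {
  const N = xr.length;
  const re = new Array(N).fill(0);
  const im = new Array(N).fill(0);
  for (let k = 0; k < N; k++) {
    for (let n = 0; n < N; n++) {
      // Correlate with the complex sinusoid cos(2πkn/N) - j·sin(2πkn/N)
      const c = Math.cos(2 * Math.PI * k * n / N);
      const s = -Math.sin(2 * Math.PI * k * n / N);
      re[k] += xr[n] * c - xi[n] * s;
      im[k] += xr[n] * s + xi[n] * c;
    }
  }
  return { re, im };
}

function inverseComplexDft(re, im) {
  const N = re.length;
  const xr = new Array(N).fill(0);
  const xi = new Array(N).fill(0);
  for (let n = 0; n < N; n++) {
    for (let k = 0; k < N; k++) {
      // Eq. 2 scales each term by 1/N; Eq. 3 uses the positive exponent
      const c = Math.cos(2 * Math.PI * k * n / N);
      const s = Math.sin(2 * Math.PI * k * n / N);
      xr[n] += (re[k] * c - im[k] * s) / N;
      xi[n] += (re[k] * s + im[k] * c) / N;
    }
  }
  return { xr, xi };
}

// Round trip on a real test signal (imaginary part all zero)
const signalRe = [1, 2, 0, -1, 3, 0.5, -2, 4];
const signalIm = new Array(signalRe.length).fill(0);
const spectrum = complexDft(signalRe, signalIm);
const restored = inverseComplexDft(spectrum.re, spectrum.im);
const maxError = Math.max(...restored.xr.map((v, n) => Math.abs(v - signalRe[n])));
console.log(maxError); // close to 0, but not exactly 0, because of round-off error
```

Unlike icdft(), this sketch does not scale the spectrum in place, so the spectrum can be reused after the inverse transform.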
The FFT algorithm is another method for calculating the DFT. Below is the code for the forward
and inverse FFT. If you want to learn more about this algorithm, see The Scientist and
Engineer's Guide to Digital Signal Processing.
// Upon entry, REX[] and IMX[] contain the real and imaginary parts of the input
// Upon return, REX[] and IMX[] contain the FFT output
// N must be a power of 2 (uses <cmath> and the same globals as cdft())
void fft()
{
    int nm1 = N - 1;
    int nd2 = N / 2;
    int m = (int)std::lround(std::log2((double)N));  // number of stages, rounded to avoid truncation error
    int j = nd2;
    int k;
    int le, le2;
    float ur, ui, sr, si;
    int jm1;
    int ip;
    float tr, ti;

    // Bit reversal sorting
    for (int i = 1; i <= N - 2; i++)
    {
        // Swap each pair once, when i is below its bit-reversed partner j
        if (i < j)
        {
            tr = REX[j];
            ti = IMX[j];
            REX[j] = REX[i];
            IMX[j] = IMX[i];
            REX[i] = tr;
            IMX[i] = ti;
        }
        // Advance j to the bit-reversed value of i + 1
        k = nd2;
        while (k <= j)
        {
            j -= k;
            k /= 2;
        }
        j += k;
    }

    // Loop for each stage
    for (int l = 1; l <= m; l++)
    {
        le = 1 << l;   // points in each sub-DFT at this stage (2^l)
        le2 = le / 2;
        ur = 1;
        ui = 0;
        // Calculate sine and cosine values
        sr = cos(PI / le2);
        si = -sin(PI / le2);
        // Loop for each sub DFT
        for (int jj = 1; jj <= le2; jj++)
        {
            jm1 = jj - 1;
            // Loop for each butterfly
            for (int i = jm1; i <= nm1; i += le)
            {
                ip = i + le2;
                tr = REX[ip] * ur - IMX[ip] * ui;
                ti = REX[ip] * ui + IMX[ip] * ur;
                REX[ip] = REX[i] - tr;
                IMX[ip] = IMX[i] - ti;
                REX[i] = REX[i] + tr;
                IMX[i] = IMX[i] + ti;
            }
            // Rotate (ur, ui) by (sr, si) to get the next twiddle factor
            tr = ur;
            ur = tr * sr - ui * si;
            ui = tr * si + ui * sr;
        }
    }
}

// Upon entry, REX[] and IMX[] contain the real and imaginary parts
// of the complex frequency domain
// Upon return, REX[] and IMX[] contain the complex time domain
void ifft()
{
    // Change the sign of IMX[]
    for (int k = 0; k <= N - 1; k++)
    {
        IMX[k] = -IMX[k];
    }

    // Calculate forward FFT
    fft();

    // Divide the time domain by N and change the sign of IMX[]
    for (int n = 0; n <= N - 1; n++)
    {
        REX[n] /= N;
        IMX[n] /= -N;
    }
}
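The ifft() above never evaluates the inverse equation directly: it conjugates the spectrum (changes the sign of IMX[]), runs the forward transform, then divides by N and conjugates again. A minimal JavaScript sketch of that identity follows; to keep it self-contained, a naive O(N²) DFT (forwardDft, a hypothetical helper) stands in for fft().

```javascript
// Sketch of the conjugation trick used by ifft(): negate the imaginary part,
// run the forward transform, then divide by N and negate the imaginary part again.
// forwardDft is a naive O(N^2) DFT standing in for fft().
function forwardDft(re, im) {
  const N = re.length;
  const outRe = new Array(N).fill(0);
  const outIm = new Array(N).fill(0);
  for (let k = 0; k < N; k++) {
    for (let n = 0; n < N; n++) {
      const angle = -2 * Math.PI * k * n / N;
      outRe[k] += re[n] * Math.cos(angle) - im[n] * Math.sin(angle);
      outIm[k] += re[n] * Math.sin(angle) + im[n] * Math.cos(angle);
    }
  }
  return { re: outRe, im: outIm };
}

function inverseViaForward(re, im) {
  const N = re.length;
  // Change the sign of the imaginary part, then run the forward transform
  const t = forwardDft(re, im.map((v) => -v));
  // Divide by N and change the sign of the imaginary part again
  return { re: t.re.map((v) => v / N), im: t.im.map((v) => -v / N) };
}

const x = { re: [0.5, -1, 2, 3], im: [1, 0, -0.5, 2] };
const X = forwardDft(x.re, x.im);
const back = inverseViaForward(X.re, X.im);
console.log(back); // equal to x up to round-off error
```

The trick works because conjugating the input and output of a forward transform flips the sign of the exponent, which is exactly the difference between the forward and inverse equations.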
Butterfly diagram
From Wikipedia, the free encyclopedia
Signal-flow graph connecting the inputs x (left) to the outputs y that depend on
them (right) for a "butterfly" step of a radix-2 Cooley–Tukey FFT. This diagram
resembles a butterfly (as in the morpho butterfly shown for comparison), hence the
name, although in some countries it is also called the hourglass diagram.
In the context of fast Fourier transform algorithms, a butterfly is a portion of the
computation that combines the results of smaller discrete Fourier transforms
(DFTs) into a larger DFT, or vice versa (breaking a larger DFT up into
subtransforms). The name "butterfly" comes from the shape of the data-flow
diagram in the radix-2 case, as described below.[1] The earliest occurrence in print
of the term is thought to be in a 1969 MIT technical report.[2][3] The same structure
can also be found in the Viterbi algorithm, used for finding the most likely
sequence of hidden states.
Most commonly, the term "butterfly" appears in the context of the Cooley–Tukey
FFT algorithm, which recursively breaks down a DFT of composite size n = rm
into r smaller transforms of size m, where r is the "radix" of the transform. These
smaller DFTs are then combined via size-r butterflies, which themselves are DFTs
of size r (performed m times on corresponding outputs of the sub-transforms)
pre-multiplied by roots of unity (known as twiddle factors). (This is the "decimation in
time" case; one can also perform the steps in reverse, known as "decimation in
frequency", where the butterflies come first and are post-multiplied by twiddle
factors. See also the Cooley–Tukey FFT article.)
Radix-2 butterfly diagram

In the case of the radix-2 Cooley–Tukey algorithm, the butterfly is simply a DFT
of size 2 that takes two inputs (x0, x1) (corresponding outputs of the two
sub-transforms) and gives two outputs (y0, y1) by the formula (not including twiddle
factors):
$$y_0 = x_0 + x_1, \qquad y_1 = x_0 - x_1$$
If one draws the data-flow diagram for this pair of operations, the (x0, x1) to (y0, y1)
lines cross and resemble the wings of a butterfly, hence the name (see also the
illustration at right).
A decimation-in-time radix-2 FFT breaks a length-N DFT into two length-N/2
DFTs followed by a combining stage consisting of many butterfly operations.
More specifically, a radix-2 decimation-in-time FFT algorithm on n = 2^p inputs
with respect to a primitive n-th root of unity ω relies on O(n log n) butterflies of
the form:
$$y_0 = x_0 + x_1\,\omega^k, \qquad y_1 = x_0 - x_1\,\omega^k$$
where k is an integer depending on the part of the transform being computed.

Whereas the corresponding inverse transform can mathematically be performed by
replacing ω with ω⁻¹ (and possibly multiplying by an overall scale factor,
depending on the normalization convention), one may also directly invert the
butterflies:
$$x_0 = \frac{y_0 + y_1}{2}, \qquad x_1 = \frac{(y_0 - y_1)\,\omega^{-k}}{2}$$
corresponding to a decimation-in-frequency FFT algorithm.
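A minimal JavaScript sketch of one radix-2 butterfly and its direct inverse, following the formulas above. It assumes ω = e^{−2πi/n} (the forward-transform sign convention); the {re, im} object shape and the helper names mul, twiddle, butterfly and inverseButterfly are hypothetical, chosen only for this example.

```javascript
// Complex multiply for plain {re, im} objects
function mul(a, b) {
  return { re: a.re * b.re - a.im * b.im, im: a.re * b.im + a.im * b.re };
}

// Twiddle factor ω^k, assuming the primitive n-th root of unity ω = e^{-2πi/n}
function twiddle(k, n) {
  const angle = -2 * Math.PI * k / n;
  return { re: Math.cos(angle), im: Math.sin(angle) };
}

// Forward butterfly: y0 = x0 + x1·ω^k, y1 = x0 - x1·ω^k
function butterfly(x0, x1, w) {
  const t = mul(x1, w);
  return [
    { re: x0.re + t.re, im: x0.im + t.im },
    { re: x0.re - t.re, im: x0.im - t.im },
  ];
}

// Inverse butterfly: x0 = (y0 + y1)/2, x1 = (y0 - y1)·ω^(-k)/2
function inverseButterfly(y0, y1, w) {
  // ω^(-k) is the complex conjugate of ω^k because |ω| = 1
  const wInv = { re: w.re, im: -w.im };
  const d = mul({ re: y0.re - y1.re, im: y0.im - y1.im }, wInv);
  return [
    { re: (y0.re + y1.re) / 2, im: (y0.im + y1.im) / 2 },
    { re: d.re / 2, im: d.im / 2 },
  ];
}

const w = twiddle(1, 8);
const [y0, y1] = butterfly({ re: 1, im: 2 }, { re: -3, im: 0.5 }, w);
const [x0, x1] = inverseButterfly(y0, y1, w);
console.log(x0, x1); // recovers {re: 1, im: 2} and {re: -3, im: 0.5} up to round-off
```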
Other uses
The butterfly can also be used to improve the randomness of large arrays of
partially random numbers, by bringing every 32- or 64-bit word into causal contact
with every other word through a desired hashing algorithm, so that a change in any
one bit has the possibility of changing all the bits in the large array.[4]
The Fourier Transform Part XIV – FFT Algorithm
In this post, we’re going to develop an algorithm to implement all that we have
learned in the last few posts about the FFT. In developing this algorithm, I’ve
started from the smallest part of the computation I could think of and worked
outwards. The basic building block of the FFT is the “Butterfly” calculation.
This calculation is iterated many times over the course of the FFT.
The snippets of code that appear in this post are written in JavaScript.
Before we start, let’s define some terms:

Any size of FFT will be broken down into stages. For example, I've shown a
16-point FFT in the diagram above. The number of stages can be calculated by the
following formula:
$$\text{NumberOfStages} = \log_2(\text{FFTSize})$$
If you plug the number 16 into the FFTSize of the formula above you’ll find that
there are 4 stages required to calculate the FFT as shown in the diagram above.
These stages are numbered 0-3.
Each stage of the calculation has a number of groups of butterflies in it. In stage 0,
there are 8 groups, each group containing 2 samples being fed into 1 butterfly.
In stage 1, there are 4 groups, each group containing 4 samples being fed into 2
overlapped butterflies.
In stage 2, there are 2 groups, each group containing 8 samples being fed into
4 overlapped butterflies.
…and so on.
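The number of samples in each group can be calculated by the following formula:
$$\text{NumberOfSamplesInGroup} = 2^{\text{StageIndex} + 1}$$
The number of groups in each stage can be calculated by the following formula:
$$\text{NumberOfGroups} = \frac{\text{FFTSize}}{\text{NumberOfSamplesInGroup}}$$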
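Applying these formulas stage by stage reproduces the counts described above for the 16-point FFT. A small sketch (describeStages is a hypothetical helper; FFTSize is assumed to be a power of 2):

```javascript
// Sketch: how a radix-2 FFT of FFTSize points divides into stages and groups,
// using the stage, group-size and group-count formulas. FFTSize must be a power of 2.
function describeStages(FFTSize) {
  const NumberOfStages = Math.round(Math.log2(FFTSize));
  const stages = [];
  for (let DFTStage = 0; DFTStage < NumberOfStages; DFTStage++) {
    const NumberOfSamplesInGroup = Math.pow(2, DFTStage + 1);
    stages.push({
      stage: DFTStage,
      groups: FFTSize / NumberOfSamplesInGroup,
      samplesPerGroup: NumberOfSamplesInGroup,
      butterfliesPerGroup: NumberOfSamplesInGroup / 2,
    });
  }
  return stages;
}

for (const s of describeStages(16)) {
  console.log(`stage ${s.stage}: groups=${s.groups}, samples/group=${s.samplesPerGroup}, butterflies/group=${s.butterfliesPerGroup}`);
}
// stage 0: groups=8, samples/group=2, butterflies/group=1
// stage 1: groups=4, samples/group=4, butterflies/group=2
// stage 2: groups=2, samples/group=8, butterflies/group=4
// stage 3: groups=1, samples/group=16, butterflies/group=8
```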
Each butterfly takes two samples and adds them together for the first term and
subtracts them for the second term. Things are complicated slightly (but not much)
by the fact that the second sample needs to be multiplied by a twiddle factor. This
addition and subtraction repeats itself constantly throughout the computation of the
FFT, which means we are going to reuse the piece of code that does it a lot, so it
makes sense to program it into its own function. The generalized butterfly for any
sample pair in the FFT can be calculated as follows.
x is the input sample
F is the output frequency term
FFTSize is the total number of samples going into the FFT
n is the sample index (0, 1, 2, 3, …, FFTSize-1)
N is the number of samples in each group of the current stage of the FFT
k is the order of the twiddle factor within each group ( 0, 1, 2, 3, …, (N/2)-1 )
W is the twiddle factor

However, although the butterfly lies right at the heart of the calculation, it is itself
made up of 3 even more basic calculations: one complex add (to add the 2 samples
together), one complex subtract (to subtract one sample from the other) and one
complex multiply (to multiply the second sample by the twiddle factor).
So we are first going to develop 3 functions to do each of these 3 operations.

Throughout the whole algorithm, I'm going to use my own data structure to store the
complex numbers. This is a structure made up of 2 floating point numbers, one for
the Real term and one for the Imaginary term, as follows:
var ComplexNumber = {Real: 0, Imaginary: 0};
So let's start simple and write the function which adds together 2 complex numbers.
To do this we need to add the real and imaginary terms separately.
function ComplexAdd(ComplexNumber0, ComplexNumber1)
{
  var ComplexResult = {Real: 0, Imaginary: 0};

  ComplexResult.Real = ComplexNumber0.Real + ComplexNumber1.Real;
  ComplexResult.Imaginary = ComplexNumber0.Imaginary + ComplexNumber1.Imaginary;

  return ComplexResult;
}
The function accepts 2 complex numbers (ComplexNumber0 and ComplexNumber1),
adds together the real and imaginary parts separately and returns the result
(ComplexResult).
Writing the subtract function is just as simple. We only need to change the addition
sign to a subtraction sign:
function ComplexSubtract(ComplexNumber0, ComplexNumber1)
{
  var ComplexResult = {Real: 0, Imaginary: 0};

  ComplexResult.Real = ComplexNumber0.Real - ComplexNumber1.Real;
  ComplexResult.Imaginary = ComplexNumber0.Imaginary - ComplexNumber1.Imaginary;

  return ComplexResult;
}
The third function is the complex multiply function. Remember that multiplying
two complex numbers is like multiplying 2 brackets. We have to use the FOIL
method I mentioned in the previous post.
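As a worked example of FOIL for two complex numbers a + ib and c + id, the four products correspond to the First, Outside, Inside and Last variables in ComplexMultiply below, and i² = −1 turns the Last term into a negative real part:

```latex
(a + ib)(c + id) = \underbrace{ac}_{\text{First}} + \underbrace{iad}_{\text{Outside}} + \underbrace{ibc}_{\text{Inside}} + \underbrace{i^2 bd}_{\text{Last}} = (ac - bd) + i(ad + bc)

% Numerical example with a = 1, b = 2, c = 3, d = 4
(1 + 2i)(3 + 4i) = 3 + 4i + 6i + 8i^2 = (3 - 8) + (4 + 6)i = -5 + 10i
```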
function ComplexMultiply(ComplexNumber0, ComplexNumber1)
{
  var ComplexResult = {Real: 0, Imaginary: 0};
  var First;
  var Outside;
  var Inside;
  var Last;

  // First - Produces real result
  First = ComplexNumber0.Real * ComplexNumber1.Real;
  // Outside - Produces imaginary result
  Outside = ComplexNumber0.Real * ComplexNumber1.Imaginary;
  // Inside - Produces imaginary result
  Inside = ComplexNumber0.Imaginary * ComplexNumber1.Real;
  // Last - Produces real result multiplied by i-squared (i-squared = -1)
  Last = -1 * ComplexNumber0.Imaginary * ComplexNumber1.Imaginary;

  ComplexResult.Real = First + Last;
  ComplexResult.Imaginary = Inside + Outside;

  return ComplexResult;
}
Now that we have the 3 mathematical operations, we can write the code for the
butterfly. The butterfly has to add together the first and second samples to produce
the first frequency term and subtract the second sample from the first to produce
the second frequency term. The second sample is always multiplied by the relevant
twiddle factor. Therefore the inputs to the function are the two samples and the
twiddle factor. All of these are complex numbers. The function returns an
array containing the two frequency terms, again as complex numbers.
function Butterfly(Sample0, Sample1, TwiddleFactor)
{
  var Frequency = [];
  var TwiddledSample1 = ComplexMultiply(Sample1, TwiddleFactor);

  Frequency[0] = ComplexAdd(Sample0, TwiddledSample1);
  Frequency[1] = ComplexSubtract(Sample0, TwiddledSample1);

  return Frequency;
}
Next we need a function which calculates the twiddle factors. The number of
twiddle factors depends on which stage of the calculation we are currently doing
and can be calculated by raising 2 to the power of the index of the current stage.
So, for example, if we are in the second stage of the FFT, the stage index will be
equal to 1 (remember the index starts from zero) and there will be 2 twiddle
factors (2^1 = 2). Therefore we need a "StageIndex" input to the function to tell it
which stage of the calculation we need the twiddle factors for. The function then
returns an array of complex numbers which are the twiddle factors. The twiddle
factors are calculated using the following formula:
$$W_N^k = e^{-i2\pi k/N} = \cos\left(\frac{2\pi k}{N}\right) - i\sin\left(\frac{2\pi k}{N}\right)$$
where:
N is the number of samples in each group of the current stage of the FFT
k is the order of the twiddle factor within the current group.
For the second stage of the FFT, k will range between 0 and 1.
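As a quick worked example, in the second stage (StageIndex = 1) each group holds N = 4 samples and k takes the values 0 and 1, so the two twiddle factors are:

```latex
W_4^0 = \cos 0 - i\sin 0 = 1

W_4^1 = \cos\!\left(\frac{\pi}{2}\right) - i\sin\!\left(\frac{\pi}{2}\right) = -i
```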
function CalculateTwiddleFactors(StageIndex)
{
  var NumberOfTwiddleFactors = Math.pow(2, StageIndex);
  var TwiddleFactors = Array(NumberOfTwiddleFactors);
  var NumberOfSamples = Math.pow(2, StageIndex + 1);
  var i;

  for (i = 0; i < NumberOfTwiddleFactors; i++)
  {
    var ComplexNumber = {Real: 0, Imaginary: 0};

    ComplexNumber.Real = Math.cos(2 * Math.PI * i / NumberOfSamples);
    ComplexNumber.Imaginary = -1 * Math.sin(2 * Math.PI * i / NumberOfSamples);
    TwiddleFactors[i] = ComplexNumber;
  }

  return TwiddleFactors;
}
Now we’ve got the butterfly sorted out with all its twiddle factors, we need to
prepare the samples that are going to be input into the first stage of the FFT.
Remember the samples have to be placed in a special order, a bit-reversed order,
not the order they occur in naturally. Therefore the following function will reorder
them for us:
function BitReversal(SamplesIn, NumberOfSamples)
{
  var NewOrder;
  var SamplesOut = [];
  var NumberOfBits;
  var SampleIndex;
  var BitIndex;

  // Bits needed to index the samples (e.g. 4 bits for 16 samples);
  // rounding guards against floating point error in the logarithm ratio
  NumberOfBits = Math.round(Math.log(NumberOfSamples) / Math.log(2));
  for (SampleIndex = 0; SampleIndex < NumberOfSamples; SampleIndex++)
  {
    // Build the bit-reversed index one bit at a time:
    // bit BitIndex of SampleIndex becomes bit (NumberOfBits - 1 - BitIndex)
    NewOrder = 0;
    for (BitIndex = 0; BitIndex < NumberOfBits; BitIndex++)
    {
      if ((SampleIndex & Math.pow(2, BitIndex)) == Math.pow(2, BitIndex))
        NewOrder = NewOrder + Math.pow(2, (NumberOfBits - 1 - BitIndex));
    }
    SamplesOut[SampleIndex] = SamplesIn[NewOrder];
  }
  return SamplesOut;
}
The function accepts at its input the array of samples to be reordered (I've called
the array "SamplesIn"). We also have to tell the function how many samples there
are in the array (NumberOfSamples) for reasons that will become clear shortly.
What this function does is take the bits of the index of the current sample and
reverse them. For example, if we have 16 samples in all and the index of the
current sample is 4 (which in binary is 0100), the new index of this sample will
be 2 (which in binary is 0010).
The total number of samples in the FFT is important, as the order will change
depending on how many samples there are: it determines the number of bits to be
reversed. If there are 32 samples in my FFT, for example, it takes 5 bits to
describe the number 31 (remember the indexes start from 0, so the highest index
will be 31, not 32). Therefore, if I bit-reverse sample index 4 (which in binary is
00100) in a 32-point FFT, the new index will remain 4, as reversing the bits of the
number 4 in 5-bit binary gives 00100, which is the same as before.
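The same bit reversal can also be written with JavaScript's bitwise operators instead of Math.pow. A sketch reproducing the examples above (reverseBits is a hypothetical helper, not part of the calculator code):

```javascript
// Sketch: reverse the lowest NumberOfBits bits of an index using bitwise operators
function reverseBits(index, NumberOfBits) {
  let reversed = 0;
  for (let bit = 0; bit < NumberOfBits; bit++) {
    // Shift the result left and append the next-lowest bit of index
    reversed = (reversed << 1) | ((index >> bit) & 1);
  }
  return reversed;
}

console.log(reverseBits(4, 4)); // 2  (0100 -> 0010 in a 16-point FFT)
console.log(reverseBits(4, 5)); // 4  (00100 -> 00100 in a 32-point FFT)

// Full bit-reversed order for a 16-point FFT
const order16 = Array.from({ length: 16 }, (_, i) => reverseBits(i, 4));
console.log(order16.join(" ")); // 0 8 4 12 2 10 6 14 1 9 5 13 3 11 7 15
```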
The function then uses this new, bit-reversed index (I’ve called it “NewOrder” in
the above function) to place the samples into a new array in a new order. I’ve
called the new array: “SamplesOut”. This is then returned to the main function and
the calculation of the FFT can commence.
So here is the FFT itself:
function FFT(SampleArray, FFTSize)
{
  // Rounding guards against floating point error in the logarithm ratio
  var NumberOfStages = Math.round(Math.log(FFTSize) / Math.log(2));
  var DFTStage;
  var SampleIndex;
  var GroupIndex;
  var NumberOfSamplesInGroup;
  var NumberOfGroups;
  var CombinedIndex;
  var HalfOfSamplesInGroup;
  var TwiddleFactors = [];
  var Sample0;
  var Sample1;
  var TwiddleFactor;
  var Results;

  // Reorder the samples using the bit reversal technique
  SampleArray = BitReversal(SampleArray, FFTSize);

  // Main FFT calculation loop
  for (DFTStage = 0; DFTStage < NumberOfStages; DFTStage++)
  {
    // Calculate the twiddle factors for this stage of the DFT
    TwiddleFactors = CalculateTwiddleFactors(DFTStage);

    // Prepare to organize the samples into groups
    NumberOfSamplesInGroup = Math.pow(2, DFTStage + 1);
    NumberOfGroups = FFTSize / NumberOfSamplesInGroup;
    HalfOfSamplesInGroup = NumberOfSamplesInGroup / 2;

    // Perform the Butterfly calculation on each group
    for (GroupIndex = 0; GroupIndex < NumberOfGroups; GroupIndex++)
    {
      for (SampleIndex = 0; SampleIndex < HalfOfSamplesInGroup; SampleIndex++)
      {
        CombinedIndex = NumberOfSamplesInGroup * GroupIndex + SampleIndex;

        // Prepare samples and twiddle factor for input into the butterfly
        Sample0 = SampleArray[CombinedIndex];
        Sample1 = SampleArray[CombinedIndex + HalfOfSamplesInGroup];
        TwiddleFactor = TwiddleFactors[SampleIndex];

        // Do the butterfly calculation
        Results = Butterfly(Sample0, Sample1, TwiddleFactor);

        // Place results back into the sample array ready for the next stage
        SampleArray[CombinedIndex] = Results[0];
        SampleArray[CombinedIndex + HalfOfSamplesInGroup] = Results[1];
      }
    }
  }
  return SampleArray;
}
The first thing to do is to work out how many stages we're going to go through to
calculate the FFT, using the formula from the beginning of this post (the base-2
logarithm of FFTSize). We then set up a counter (DFTStage) to keep track of which
stage we are currently calculating. Before we can begin the calculation, we have to
reorder our samples so they are in the correct order for the FFT. We do this using
the BitReversal function mentioned above. Once the input samples are in the right
order, we can begin the main loop, which loops through each stage of the FFT
calculation.
We calculate all the twiddle factors for this stage then prepare two more loops, the
second of which will run inside the first. The first loops through each group and
within that loop sits another loop which runs the butterfly itself on each of the pairs
of samples within the group.
The results are placed back into the input array ready for inputting into the next
stage of the FFT until all the stages have been calculated. This array, which now
contains all the results is returned at the end of the function.
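To check that this structure really produces the DFT, here is a compact, self-contained sketch of the same iterative radix-2 approach (bit reversal, then stages of butterflies with twiddle factors W_N^k), compared against a direct O(N²) DFT. fftSketch and dftSketch are hypothetical names; unlike the post's FFT(), the twiddle factors are computed inline rather than by CalculateTwiddleFactors.

```javascript
// Sketch: iterative radix-2 FFT compared with a direct DFT.
// Complex values use the {Real, Imaginary} shape from this post.
// The input length must be a power of 2.
function fftSketch(samples) {
  const size = samples.length;
  const bits = Math.round(Math.log2(size));

  // Copy the samples into bit-reversed order (the input array is untouched)
  const data = samples.map((_, i) => {
    let rev = 0;
    for (let bit = 0; bit < bits; bit++) rev = (rev << 1) | ((i >> bit) & 1);
    return { Real: samples[rev].Real, Imaginary: samples[rev].Imaginary };
  });

  // One pass per stage; groupSize plays the role of N in W_N^k
  for (let groupSize = 2; groupSize <= size; groupSize *= 2) {
    const half = groupSize / 2;
    for (let start = 0; start < size; start += groupSize) {
      for (let k = 0; k < half; k++) {
        // Twiddle factor W_N^k = cos(2πk/N) - i·sin(2πk/N)
        const wr = Math.cos(2 * Math.PI * k / groupSize);
        const wi = -Math.sin(2 * Math.PI * k / groupSize);
        const a = data[start + k];
        const b = data[start + k + half];
        // Multiply the second sample by the twiddle factor
        const tr = b.Real * wr - b.Imaginary * wi;
        const ti = b.Real * wi + b.Imaginary * wr;
        // Butterfly: sum and difference
        data[start + k] = { Real: a.Real + tr, Imaginary: a.Imaginary + ti };
        data[start + k + half] = { Real: a.Real - tr, Imaginary: a.Imaginary - ti };
      }
    }
  }
  return data;
}

// Direct DFT for comparison
function dftSketch(samples) {
  const size = samples.length;
  return samples.map((_, k) => {
    let Real = 0;
    let Imaginary = 0;
    samples.forEach((x, n) => {
      const c = Math.cos(2 * Math.PI * k * n / size);
      const s = -Math.sin(2 * Math.PI * k * n / size);
      Real += x.Real * c - x.Imaginary * s;
      Imaginary += x.Real * s + x.Imaginary * c;
    });
    return { Real, Imaginary };
  });
}

const input = [1, 0, -1, 2, 0.5, 3, 0, -2].map((v) => ({ Real: v, Imaginary: 0 }));
const fast = fftSketch(input);
const slow = dftSketch(input);
const worst = Math.max(...fast.map((f, k) =>
  Math.hypot(f.Real - slow[k].Real, f.Imaginary - slow[k].Imaginary)));
console.log(worst); // close to zero: both give the same spectrum up to round-off
```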
The FFT returns an array of complex frequency terms which in itself doesn’t help
us much. What we need to do now, is take the results of the FFT calculation and
extract Magnitude and Phase data for each frequency index.
To calculate the magnitude we use Pythagoras:
function CalculateMagnitude(SampleArray, FFTSize)
{
  var Magnitude = [];
  var Real;
  var Imaginary;
  var i;

  for (i = 0; i < FFTSize; i++)
  {
    Real = SampleArray[i].Real;
    Imaginary = SampleArray[i].Imaginary;
    // Pythagoras: magnitude = sqrt(Real^2 + Imaginary^2)
    Magnitude[i] = Math.sqrt(Math.pow(Real, 2) + Math.pow(Imaginary, 2));
  }

  return Magnitude;
}
To calculate the phase we use the inverse Tan function:
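function CalculatePhase(SampleArray, FFTSize)
{
  var Phase = [];
  var Real;
  var Imaginary;
  var i;

  for (i = 0; i < FFTSize; i++)
  {
    Real = SampleArray[i].Real;
    Imaginary = SampleArray[i].Imaginary;
    // Math.atan2 uses the signs of both parts to return the angle in the correct
    // quadrant (-180 to +180 degrees); Math.atan(Imaginary / Real) loses the
    // quadrant and gives NaN when both parts are 0
    Phase[i] = 180 * Math.atan2(Imaginary, Real) / Math.PI;
  }

  return Phase;
}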
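One caveat with the inverse tangent: Math.atan(Imaginary / Real) only returns angles between −90° and +90°, so it cannot tell −1 − i apart from 1 + i, and it gives NaN when both parts are 0. JavaScript's Math.atan2(Imaginary, Real) uses the signs of both parts to return the full −180° to +180° range. A small comparison (the two function names are hypothetical):

```javascript
// Phase in degrees using the single-argument inverse tangent
function phaseWithAtan(Real, Imaginary) {
  return 180 * Math.atan(Imaginary / Real) / Math.PI;
}

// Phase in degrees using the two-argument inverse tangent
function phaseWithAtan2(Real, Imaginary) {
  return 180 * Math.atan2(Imaginary, Real) / Math.PI;
}

// -1 - 1i lies in the third quadrant, so its phase is -135 degrees
console.log(phaseWithAtan(-1, -1));  // approximately 45 (wrong quadrant)
console.log(phaseWithAtan2(-1, -1)); // approximately -135
```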
I purposely wrote this algorithm in JavaScript so that it could be run in a web
browser, and this is what we'll be doing next time with my JavaScript FFT calculator,
allowing you to plug any signal into the input (so long as the number of samples
in your signal is a power of 2) and calculate its FFT. The calculator allows
you to see the results at each stage of the calculation as well as all the twiddle
factors.
The Fourier Transform Part XV – FFT Calculator
Before I show you my special FFT Calculator which I wrote based on all the things
we have covered in this blog, let me keep you in suspense a little longer while we
review what we’ve learned in the blog. (…or I could just cut the c#@p and you
can click here to go straight to the calculator)
Review of the Fourier Transform Blog
Over the course of this Blog series we’ve taken a detailed look into the workings of
the Fourier Transform. I’ve been looking at the Fourier Transform through the eyes
of a sound engineer, using it to analyze sound signals. However, the Fourier
Transform is more versatile than that. It is used to analyze all manner of signals
such as pictures and even video.
In Part 1, we looked at the motivation behind the Fourier Transform, as a tool used
to give us information about our signal.
In Part 2, we saw that any signal can be seen as a collection of sine waves and we
considered how we might modify those sine waves, playing with their properties of
Frequency and Amplitude and adding them together to make more complex, richer
sounds.
In Part 3, we added phase to the list of properties we can play with, shifting the
waves in time.
In Part 4, we looked at Complex Numbers, the language used to describe the
Fourier Transform algorithm.
In Part 5, we looked at Convolution, the way that we can measure the phase,
amplitude and frequency of the different sine waves present in a signal by testing it
with Cosine and Sine waves at known frequencies.
In Part 6, we looked at the Fourier Transform equation itself and understood via
the language of Complex Numbers what exactly it was doing.
In Part 7, we noticed that there was a problem with the Fourier Transform as it
stands in that it makes a number of inconvenient assumptions about our ability to
deal with infinities. Therefore we modified the Fourier Transform equation a little
to make it more Discrete.
In Part 8, we found that making the Fourier Transform discrete presented us with a
new problem of spectral leakage as the Fourier Transform assumes that signals go
on forever, something that is not true in our real world of finite things. Therefore
we saw how Windowing helped reduce the problem.
In Parts 9 – 12, we began to look at how we would implement the Fourier
Transform on a computer and noticed that there was a problem in that it required a
very large number of complex computations to calculate. However, to our rescue
came Cooley and Tukey, who noticed that a large number of computations were
repeating themselves within the Fourier Transform. This discovery enabled them
to develop a special algorithm called the Fast Fourier Transform, which
remembered the repeating computations so that they could be reused in later
stages of the calculation.
In Part 13, we did a numerical example and worked our way through a 16-point
FFT.
In Part 14, we wrote our own implementation of the FFT in JavaScript.
And now, in Part 15, the final post in the series (yes I know I’ve said that before),
we’re going to actually run the algorithm on a signal and look at the different
stages of the FFT calculation.
I used the code from Part 14 to write an implementation in JavaScript that should
hopefully (if I’ve got all the bugs out) run in your browser. This is an FFT
calculator which lets you take a peek at the different stages of the computation.
The FFT Calculator
There are 6 different screens in the calculator:
In the first screen, you plug in the values of the samples you want to analyze. Each
line should contain a numeric value representing the amplitude of that sample.
REMEMBER: Your signal must contain a number of samples that is a power of 2
(e.g. 2, 4, 8, 16, 32, 64, … etc.) Alternatively, you can use one of the 7 test signals
I’ve prepared for you. Just select the test signal you want to use from the select box
on the first screen. If your signal contains a valid number of samples, a “Next”
button will appear at the top of the calculator. Click it to proceed to the next
screen.
In the second screen, you will see a graph of your signal. This is a time domain
graph and shows you the amplitude of each sample you entered. Click “Next” to
continue. Alternatively you can click “Previous” to go back to the previous screen.
In the third screen, you will see a summary of the FFT that is about to be
performed on your signal. The summary details how many samples there are in
your signal, how many stages there will be to the FFT calculation, how many
groups of butterflies there are in each stage and how many butterflies there are in
each group. You can then select which specific butterfly in which group and in
which stage of the calculation you would like to see by pressing on the “Show
Butterfly” button. If you are a little confused as to what a stage, group or butterfly
is, just scroll down the page to see an example of a butterfly diagram for a 16-point
FFT. Once you have seen all the butterflies you want, click “Next”. Alternatively
you can click “Previous” to go back to the previous screen.
In the fourth screen, you will see a table containing all the results of the final stage
of the FFT calculation. For each frequency index, you will see the Cosine
contribution and the Sine contribution. Next to that, you will see the magnitude of
that frequency calculated using Pythagoras’ Theorem, and the phase (in degrees) of
that frequency calculated using the inverse tangent rule. Click “Next” to go to the
next screen or click “Previous” to go back to the previous screen.
In the fifth screen, you will see a frequency domain graph of the Magnitude of
each frequency. Click “Next” to go to the next screen or click “Previous” to go
back to the previous screen.
In the sixth screen, you will see a frequency domain graph of the Phase of each
frequency. This is the final screen in the calculator. Click “Back to Start” to go
back to the first screen and try out a new signal or click “Previous” to go back to
the previous screen.
