The Radix 2 Decimation In Frequency (DIF) Algorithm.
The Maths.
We defined the FFT as:

  F[k] = SUM(n=0..N-1) f[n] * exp(-2*pi*j*n*k/N),   k=0..N-1

The DIF trick is to split the output (frequency) index k into its even and odd subsets, and to express each subset as an N/2 point transform of a sequence built from the top and bottom halves of the input:

Even k, k=2k' (k'=0..N/2-1):

  F[2k'] = SUM(n=0..N/2-1) (f[n] + f[n+N/2]) * exp(-2*pi*j*n*k'/(N/2))

Odd k, k=2k'+1 (k'=0..N/2-1):

  F[2k'+1] = SUM(n=0..N/2-1) (f[n] - f[n+N/2]) * T(N,n) * exp(-2*pi*j*n*k'/(N/2))

where T(N,n) = exp(-2*pi*j*n/N) is the 'twiddle factor'. So an N point transform can be computed as two N/2 point transforms: one of the 'even' sequence fe[n] = f[n] + f[n+N/2], and one of the 'odd' sequence fo[n] = (f[n] - f[n+N/2]) * T(N,n).
A Recursive DIF FFT Routine.
Given the above results, we can now have a 'first stab' at a recursive routine to implement this algorithm (in a hypothetical Pascal like programming language which supports complex numbers and allows arrays as function arguments and results):

FUNCTION DIF(N,f);
  LOCAL N',n,fe,fo,Fe,Fo,k',F;
  IF N==1 THEN RETURN(f);              {trivial if N==1}
  ELSE BEGIN                           {perform 2 sub-transforms}
    N':=N/2;                           {size of sub-transforms}
    FOR n:=0 TO (N'-1) DO              {perform N' DIF 'butterflies'}
    BEGIN
      fe[n]:= f[n]+f[n+N'];            {even subset}
      fo[n]:=(f[n]-f[n+N'])*T(N,n);    {odd subset}
    END;
    Fe:=DIF(N',fe);                    {even k}
    Fo:=DIF(N',fo);                    {odd k}
    FOR k':=0 TO (N'-1) DO
    BEGIN
      F[2*k' ]:= Fe[k'];               {even k}
      F[2*k'+1]:= Fo[k'];              {odd k}
    END;
    RETURN(F);
  END;

This is the simplest form of DIF implementation and directly reflects the mathematical derivation of the algorithm. The process of calculating the fe and fo components is conventionally referred to as a (DIF) 'butterfly', and is the basic primitive operation of the FFT. (Apparently, this name was acquired from certain diagrammatic representations of this operation which resemble butterflies.)
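For concreteness, here is a minimal runnable sketch of the same routine in Python (Python is not the language used elsewhere in this article; the names T and dif are just illustrative choices):

import cmath

def T(N, n):
    # twiddle factor exp(-2*pi*j*n/N)
    return cmath.exp(-2j * cmath.pi * n / N)

def dif(f):
    # recursive DIF FFT of a list of complex samples (length must be a power of 2);
    # returns the spectrum in normal order, mirroring the pseudocode above
    N = len(f)
    if N == 1:
        return f                                            # trivial if N == 1
    Nh = N // 2                                             # size of sub-transforms
    fe = [f[n] + f[n + Nh] for n in range(Nh)]              # even subset
    fo = [(f[n] - f[n + Nh]) * T(N, n) for n in range(Nh)]  # odd subset
    Fe, Fo = dif(fe), dif(fo)                               # the two sub-transforms
    F = [0] * N
    for k in range(Nh):                                     # interleave the results
        F[2 * k] = Fe[k]
        F[2 * k + 1] = Fo[k]
    return F

Comparing the output against a direct DFT (or a library FFT) on a small random input is an easy way to confirm the sign convention of the twiddle factors.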
DIF FFT Algorithmic Complexity (Why it's fast).
The algorithmic complexity of an FFT is usually quantified in terms of the total number of butterfly operations performed. Let this number be C(p) for a 2^p point transform. Looking at the DIF routine above, it is easy to see that C(p) must satisfy the following recurrence relation (a 2^p point DIF performs 2^(p-1) butterflies and then two 2^(p-1) point sub-transforms):

  C(p) = 2^(p-1) + 2*C(p-1),   C(0) = 0

which has the solution C(p) = p*2^(p-1), i.e. (N/2)*log2(N) butterflies for an N point transform. This is why the FFT is fast: evaluating the DFT directly from its definition costs of the order of N^2 operations, whereas the FFT costs of the order of N*log2(N).
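As a purely illustrative check of that closed form (Python, with C defined by the recurrence above):

def C(p):
    # butterfly count from the recurrence C(p) = 2^(p-1) + 2*C(p-1), C(0) = 0
    return 0 if p == 0 else (1 << (p - 1)) + 2 * C(p - 1)

# closed form: C(p) = p * 2^(p-1)
assert all(C(p) == p * (1 << (p - 1)) for p in range(1, 20))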
A Recursive 'In Place' DIF FFT Routine.

If you coded something like the above routine in a real programming language it would work fine. However, from an efficiency point of view it is somewhat less than ideal. Allocating local arrays on the stack will make this implementation fairly memory hungry, and the last loop in the above routine performs no useful numerical computation; it simply re-arranges values previously calculated.

What if the last loop were somehow replaced by a simple catenation of the results of the two sub-transforms? Say we put all the even k samples in the top half of the result and all the odd k samples in the bottom half of the result. Treating k as a binary number, the least significant bit of k determines which 'half' of the result array the corresponding value resides in: the least significant bit of k becomes the most significant bit of the corresponding (binary) array index. Because the function is recursive, this bit swapping will occur for each sub-transform too, i.e. the array index is determined by bit reversing k. Doing this won't save any time unless we can arrange that the catenation operation is actually performed implicitly by each sub-transform. This suggests that an 'in place' algorithm should be used (one which performs all necessary operations in a single array, rather than allocating temporary arrays on the stack). Looking at the form of the first loop in the above routine indicates how to do this: put the 'even' subset values back in f[n] (the top half) and the 'odd' subset values back in f[n+N'] (the bottom half), then perform the DIF operation on each half. Pulling all these ideas together, we get a more efficient (recursive) in-place DIF routine:
{Perform in place DIF of N points starting at position BaseE.
 DIF(0,N,f) performs a DIF FFT on the entire array f, N = size of f}
PROCEDURE DIF(BaseE,N, VAR f);            {f is an external array}
  LOCAL N',BaseO,n,e,o;
  IF N==1 THEN {do nothing}
  ELSE BEGIN
    N':=N>>1;                             {shift right to get size of sub-transforms}
    BaseO:=BaseE+N';                      {split block into 2 halves}
    FOR n:=0 TO (N'-1) DO
    BEGIN
      e:= f[BaseE+n]+f[BaseO+n];
      o:=(f[BaseE+n]-f[BaseO+n])*T(N,n);
      f[BaseE+n]:=e;
      f[BaseO+n]:=o;
    END;
    DIF(BaseE,N',f);                      {even sub-transform}
    DIF(BaseO,N',f);                      {odd sub-transform}
  END;

This version of the DIF routine is a little simpler and more efficient than the first, but has the disadvantage that the output is in 'jumbly' (bit reversed) order. This may or may not be a serious problem, depending on what the output is to be used for and whether or not your processor/programming language supports bit reversed addressing (most DSPs do). If bit reversed addressing is not available then you may need to produce a bit reversed index look-up table to access the output.
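Again purely for illustration, a minimal Python sketch of this in-place routine, together with the kind of bit-reversed index look-up table mentioned above (the names dif_in_place and bit_reverse_table are illustrative, not from the article):

import cmath

def T(N, n):
    return cmath.exp(-2j * cmath.pi * n / N)

def dif_in_place(f, base=0, N=None):
    # in-place recursive DIF FFT; leaves the spectrum in bit-reversed order
    if N is None:
        N = len(f)
    if N == 1:
        return
    Nh = N >> 1
    base_o = base + Nh                       # split block into 2 halves
    for n in range(Nh):                      # DIF butterflies for this block
        e = f[base + n] + f[base_o + n]
        o = (f[base + n] - f[base_o + n]) * T(N, n)
        f[base + n], f[base_o + n] = e, o
    dif_in_place(f, base, Nh)                # even sub-transform
    dif_in_place(f, base_o, Nh)              # odd sub-transform

def bit_reverse_table(p):
    # table[k] is the p-bit reversal of k; F[k] ends up stored at f[table[k]]
    table = []
    for k in range(1 << p):
        r, x = 0, k
        for _ in range(p):
            r = (r << 1) | (x & 1)
            x >>= 1
        table.append(r)
    return table

Reading the transformed array through such a table recovers the spectrum in normal order.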
It is worth noting that this is the simplest formulation of the 'in-place' DIF algorithm. It takes normal order input and generates bit reversed order output. However, this is not fundamental to the algorithm; it is merely a consequence of the simplest implementation. It is possible to write a reverse order DIF FFT which takes bit reversed order input and generates normal order output. This requires the DIF FFT routine to be amended in two ways (a sketch of such a variant follows the list):
- Passing an additional parameter which specifies the spacing between elements in each sub-array to be transformed. In the simple code above, this is always 1. In the reverse order FFT, this will start at 1 and double for each sub-transform. This parameter is also a useful feature for multi-dimensional transforms.
- Since the input to each sub-transform is now in bit reversed order, the twiddle factors must also be used in bit reversed order. This isn't difficult if the twiddle factors are taken from a table in bit reversed order, or if bit reversed addressing is available.
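To make those two amendments concrete, here is one way such a reverse-order routine might look in Python (an illustrative reconstruction rather than the article's own code; dif_rev, bitrev and T are invented names, the element spacing doubles on each recursion, and the twiddles are fetched in bit-reversed order):

import cmath

def bitrev(x, bits):
    # reverse the low 'bits' bits of x
    r = 0
    for _ in range(bits):
        r = (r << 1) | (x & 1)
        x >>= 1
    return r

def T(N, n):
    return cmath.exp(-2j * cmath.pi * n / N)

def dif_rev(f, base=0, N=None, spacing=1):
    # reverse-order in-place DIF: expects the input samples in bit-reversed order
    # and leaves the spectrum in normal order (power-of-two length assumed)
    if N is None:
        N = len(f)
    if N == 1:
        return
    Nh = N >> 1
    bits = Nh.bit_length() - 1                   # log2(Nh)
    for m in range(Nh):                          # butterflies pair adjacent elements
        i = base + (2 * m) * spacing
        j = i + spacing
        tw = T(N, bitrev(m, bits))               # twiddles used in bit-reversed order
        e = f[i] + f[j]
        o = (f[i] - f[j]) * tw
        f[i], f[j] = e, o
    dif_rev(f, base,           Nh, 2 * spacing)  # even frequencies
    dif_rev(f, base + spacing, Nh, 2 * spacing)  # odd frequencies

On a random test vector this gives the same spectrum as the normal-order routine followed by a bit-reversal unscramble, which is a convenient way to check the twiddle ordering.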
The 'Classic' Non-Recursive DIF FFT Routine.
For many processors/languages a recursive routine is not attractive because of the overhead incurred by a procedure call. Be careful though: in reality an aversion to recursion could cost performance (see the note about cache efficiency). There is also a particular problem with DSPs, which often have small hardware stacks, so deeply recursive routines will cause stack overflow. It is fairly easy to see that the above routine can be flattened into nested loops, to yield something resembling the 'classic' (non-recursive) DIF FFT. Each DIF uses 2 'half size' DIFs, which in turn will use 4 'quarter size' DIFs, etc.; the last (non-trivial) DIF will be for size 2. So, keeping notation consistent with the above recursive DIF, we get 3 nested loops:
- Pass Loop: An N (=2^p) point transform will perform p 'passes', indexed by P=0..p-1.
- Block Loop: Pass P will operate on Bp (=2^P) sub-blocks, each of size Np (=N/Bp=2^(p-P)), indexed by b=0..Bp-1.
- Butterfly Loop: Each sub-block operation will perform Np' (=Np/2=2^(p-P-1)) butterflies, indexed by n=0..Np'-1.
{Perform in place DIF of 2^p points (= size of f)}
PROCEDURE DIF(p,VAR f);
  LOCAL Bp,Np,Np',P,b,n,BaseE,BaseO,e,o;
BEGIN {DIF}
  {initialise pass parameters}
  Bp:=1;                                {No. of blocks}
  Np:=1<<p;                             {No. of points in each block}
  {perform p passes}
  FOR P:=0 TO (p-1) DO
  BEGIN {pass loop}
    Np':=Np>>1;                         {No. of butterflies}
    BaseE:=0;                           {Reset even base index}
    FOR b:=0 TO (Bp-1) DO
    BEGIN {block loop}
      BaseO:=BaseE+Np';                 {calc odd base index}
      FOR n:=0 TO (Np'-1) DO
      BEGIN {butterfly loop}
        e:= f[BaseE+n]+f[BaseO+n];
        o:=(f[BaseE+n]-f[BaseO+n])*T(Np,n);
        f[BaseE+n]:=e;
        f[BaseO+n]:=o;
      END; {butterfly loop}
      BaseE:=BaseE+Np;                  {start of next block}
    END; {block loop}
    {calc parameters for next pass}
    Bp:=Bp<<1;                          {twice as many blocks}
    Np:=Np>>1;                          {half as many points in each block}
  END; {pass loop}
END; {DIF}
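As before, here is a small Python transliteration of the same three-loop structure, purely as an illustration (dif_classic is an invented name; like the in-place recursive version, it leaves the output in bit-reversed order):

import cmath

def T(N, n):
    return cmath.exp(-2j * cmath.pi * n / N)

def dif_classic(f):
    # non-recursive in-place radix-2 DIF FFT; len(f) must be 2**p
    N = len(f)
    p = N.bit_length() - 1
    Bp, Np = 1, N                      # No. of blocks, No. of points per block
    for _ in range(p):                 # pass loop
        Nph = Np >> 1                  # No. of butterflies per block
        base_e = 0
        for _ in range(Bp):            # block loop
            base_o = base_e + Nph
            for n in range(Nph):       # butterfly loop
                e = f[base_e + n] + f[base_o + n]
                o = (f[base_e + n] - f[base_o + n]) * T(Np, n)
                f[base_e + n], f[base_o + n] = e, o
            base_e += Np               # start of next block
        Bp <<= 1                       # twice as many blocks
        Np >>= 1                       # half as many points in each block

Unscrambling the result through a bit-reversed index table (as sketched earlier) gives the spectrum in normal order.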
That's as far as we'll go on DIF code. Of course, if you look, there's still plenty of scope for further optimisation of this routine. We won't do it here for fear of obscuring the code even more. The remainder of this section is devoted to a few observations about the practical details of the twiddle factors, which so far have been largely neglected.

Twiddle Factor Tricks.
Recall the form of the twiddle factors:

  T(N,n) = exp(-2*pi*j*n/N),   n = 0..N/2-1

The next trick is to notice that often the twiddle factor will be either +1 or -j (it will never be -1 or +j, since n only runs up to N/2-1). In such cases there is no need to do a full blown complex multiply (remember this requires 4 real multiplies and 2 real adds); simple addition and subtraction will suffice. We shall call butterflies that use one or other of these twiddle factors 'trivial butterflies'. Specifically:

  T(N,0)   = +1 : the complex multiply in the butterfly disappears altogether;
  T(N,N/4) = -j : multiplying by -j is just a swap of the real and imaginary parts with a sign change.

In the last (N=2) pass every butterfly uses T(2,0)=+1, so the whole pass consists of trivial butterflies. In each of the previous passes there are Bp (=2^P) sub-blocks and therefore 2^P butterflies which use a twiddle factor of +1 (at n=0).
In the penultimate (N=4) pass the butterflies will use either T(4,0)=+1 or T(4,1)=-j, so this pass also uses only trivial butterflies. As in the above case, in each of the previous (N>4, P<p-2) passes, there are Bp (=2^P) sub-blocks and therefore 2^P butterflies which use a twiddle factor of -j (at n=N/4).
We can now calculate exactly how many of these trivial butterflies Triv(p) there are in a typical DIF FFT for sizes >4 (p>2). This is the sum of the number of butterflies in the last 2 passes (the easy bit) and the number of trivial butterflies in all the other passes (the hard bit, but the result in Annex B is useful here):

  Triv(p) = 2*2^(p-1) + SUM(P=0..p-3) 2*2^P
          = 2^p + (2^(p-1) - 2)
          = 3*2^(p-1) - 2

In other words, roughly 3N/2 of the (N/2)*log2(N) butterflies are trivial. For example, a 1024 point (p=10) transform performs 5120 butterflies, of which 1534 are trivial.
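A brute-force count in Python (illustrative only; count_trivial is an invented name) agrees with that formula:

def count_trivial(p):
    # count butterflies whose twiddle T(Np, n) is +1 (n = 0) or -j (n = Np/4),
    # over all passes of a 2**p point radix-2 DIF FFT
    total = 0
    Bp, Np = 1, 1 << p
    for _ in range(p):                 # pass loop
        for n in range(Np >> 1):       # butterflies in one block
            if n == 0 or 4 * n == Np:  # twiddle is +1 or -j
                total += Bp            # the same twiddle index occurs in every block
        Bp, Np = Bp << 1, Np >> 1
    return total

assert all(count_trivial(p) == 3 * (1 << (p - 1)) - 2 for p in range(3, 16))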
The non-recursive routine presented above can be modified to exploit trivial butterflies by doing something like this (a sketch of the modified inner loop follows the list):
- Make the last 2 passes 'specials' which have the appropriate twiddle factors (+1 or -j) 'hard wired' into the code.
- For the remaining passes (Np>4, Np'>2), break the inner butterfly loop into two, each half of which has a 'special' as its first butterfly. The first loop's special butterfly should use +1 as the twiddle factor; the second loop's special butterfly should use -j.
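For illustration, here is how the split inner loop might look in Python for a single sub-block of one of the remaining (Np > 4) passes (an illustrative sketch with invented names; in real DSP code the -j case would be coded directly on the real and imaginary parts rather than as a complex multiply):

import cmath

def T(N, n):
    return cmath.exp(-2j * cmath.pi * n / N)

def block_butterflies_split(f, base_e, Np):
    # butterflies for one sub-block of one pass (Np > 4), with the two trivial
    # twiddles (+1 at n=0, -j at n=Np/4) hard-wired instead of multiplied
    Nq = Np >> 2                        # position of the -j butterfly
    base_o = base_e + (Np >> 1)

    # first half of the loop: n = 0 is the '+1' special
    e = f[base_e] + f[base_o]
    o = f[base_e] - f[base_o]           # twiddle is +1: no multiply needed
    f[base_e], f[base_o] = e, o
    for n in range(1, Nq):              # general butterflies
        e = f[base_e + n] + f[base_o + n]
        o = (f[base_e + n] - f[base_o + n]) * T(Np, n)
        f[base_e + n], f[base_o + n] = e, o

    # second half of the loop: n = Np/4 is the '-j' special
    e = f[base_e + Nq] + f[base_o + Nq]
    d = f[base_e + Nq] - f[base_o + Nq]
    o = complex(d.imag, -d.real)        # multiply by -j: new re = old im, new im = -old re
    f[base_e + Nq], f[base_o + Nq] = e, o
    for n in range(Nq + 1, Np >> 1):    # general butterflies
        e = f[base_e + n] + f[base_o + n]
        o = (f[base_e + n] - f[base_o + n]) * T(Np, n)
        f[base_e + n], f[base_o + n] = e, o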
As a final note, we can observe that there are further opportunities to exploit which we haven't considered: