
A Floating-point Coprocessor Configured by an FPGA in a Digital Platform Based on a Fixed-point DSP for Power Electronics

Haibing Hu, Tianjun Jin, Xianmiao Zhang, Zhengyu Lu, Zhaoming Qian
National Key Lab. of Power Electronics, Zhejiang University, Hangzhou, China
huhaibing@163.com

(Project supported by the National Natural Science Foundation of China, 50237030ZD.)
Abstract: A floating-point coprocessor configured in an FPGA is designed to enhance the computational capability of a digital platform based on a fixed-point DSP, making the platform suitable for computationally intensive tasks. Detailed design procedures for the coprocessor are presented. A new division algorithm is proposed that combines a lookup-table algorithm with a multiplicative algorithm in order to reduce both the number of LEs (Logic Elements in the FPGA) and the latency. Error analysis of the proposed algorithm shows that the maximum absolute approximation error is less than 2 ulp (units in the last place). The coprocessor speed can reach 25 MFLOPS (million floating-point operations per second). An FFT is used to test the computational efficiency of the floating-point units. Experimental results show that the computation time with the FPU is five times less than that of the DSP software routines.

Keywords: floating-point; FPGA; power electronics; FFT

I. INTRODUCTION

The computational complexity of modern power electronics applications is steadily increasing. Meanwhile, high computational accuracy is in great demand for high-performance applications. In areas such as Active Power Filters (APF), high-performance motor servo drives, and sensorless speed control techniques for motor drives, floating-point digital signal processors (DSPs) have advantages over their fixed-point counterparts in accomplishing these computationally intensive tasks [1].

There are two typical digital platforms for such computationally demanding applications.

1) Some researchers configure their digital platforms around floating-point DSPs with extension circuits for A/D converters, D/A converters, PWM generators, and other peripherals specific to motor drives and power electronics [2,3,4]. This approach not only increases the system cost greatly but also reduces the system reliability, because of the many "discrete" components added to the platform.

2) Another typical digital platform uses both a fixed-point DSP and a floating-point DSP [5], combining the high computation capability and accuracy of floating-point DSPs with the advantages of fixed-point DSPs dedicated to power electronics and motor drive applications. This type of platform also has drawbacks: (1) high cost; (2) complicated programming; (3) troublesome communication between the two DSPs.

Nevertheless, fixed-point DSPs have remained the mainstay of the industry, mainly because of their cost. With the advancement of FPGA technology, floating-point units configured in an FPGA have become affordable. In this paper, a new digital platform for power electronics and motor drives is proposed to overcome the problems above. The platform uses a fixed-point DSP (TMS320F2812) as the core processor and floating-point units (FPUs) configured in an FPGA as a coprocessor to enhance the computational capability. The coprocessor implements a floating-point adder, multiplier, and divider, each completing within 40 ns with an approximation error of less than 2 ulp. The coprocessor can be viewed as a peripheral of the fixed-point DSP and is accessed via the DSP's external interface.

II. PRINCIPLE OF FLOATING-POINT UNITS

Single-precision numbers in the binary IEEE standard are formatted as shown in Fig. 1. The most significant bit is the sign bit, which indicates a negative number if it is set to 1. The following field holds the exponent with a constant bias added to it. As shown in Fig. 1, the remaining part of the number is normalized to have one non-zero bit to the left of the binary point [6].

[Figure 1. Format of an IEEE single-precision floating-point number: bit 31 is the sign S, bits 30-23 hold the biased exponent e + bias, and bits 22-0 hold the fraction f of the significand s = 1.f (the leading 1 is hidden).]

Therefore, the value represented by the standard format can be expressed as

$$m = (-1)^{S} \times 2^{e} \times 1.f \qquad (1)$$

where S is the sign bit, e the unbiased exponent, and 1.f the significand. The range of single-precision floating-point numbers runs from -3.4028236e+38 to -1.1754944e-38 and from +1.1754944e-38 to +3.4028236e+38.
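As a concrete illustration of this field layout and of equation (1), the following C sketch unpacks the three fields of a single-precision value. It is ours, not part of the paper's VHDL design, and the helper name decode_ieee754 is invented for illustration.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Unpack the sign, biased exponent, and fraction fields of an
 * IEEE 754 single-precision value (bias = 127, hidden leading 1). */
static void decode_ieee754(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);        /* reinterpret the 32 bits */

    uint32_t sign = bits >> 31;            /* bit 31                  */
    uint32_t bexp = (bits >> 23) & 0xFF;   /* bits 30..23: e + 127    */
    uint32_t frac = bits & 0x7FFFFF;       /* bits 22..0: f           */

    /* value = (-1)^sign * 2^(bexp-127) * 1.f, per equation (1) */
    printf("%g: sign=%u exp=%d mantissa=0x%06X\n",
           x, (unsigned)sign, (int)bexp - 127, (unsigned)frac);
}

int main(void)
{
    decode_ieee754(-6.5f);   /* prints: sign=1 exp=2 mantissa=0x500000 */
    return 0;
}
```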
III. FLOATING-POINT ALGORITHMS

A. Addition algorithm [6,7]

The floating-point addition and subtraction algorithm studied here is similar to that used in most traditional processors. The layout of the floating-point addition implementation is shown in Fig. 2.


[Figure 2. Flow chart of a 32-bit floating-point adder. Stage 1: exponent comparison and mantissa alignment/selection; Stage 2: mantissa addition/subtraction, mantissa and operand comparison, and result multiplexing; Stage 3: leading-zero detection, exponent calculation, sign judgement, and normalization.]

As can be seen from Fig. 2, the floating-point adder is divided into three stages. The numbers on the lines in the flow chart denote data widths. The computation procedure for each stage is as follows.

Stage 1:
- The larger of the two exponents is found by comparing the exponents of operand A and operand B.
- If E1 is larger than E2, E2 is subtracted from E1 to obtain the number of positions by which mantissa B must be shifted to the right, so that mantissa A and mantissa B are aligned before the addition or subtraction in the next stage.

Stage 2:
- If S1 equals S2, mantissa A is added to mantissa B; otherwise mantissa A is subtracted from mantissa B, or vice versa, as decided by the absolute values of operand A and operand B.
- According to the signs and absolute values of operand A and operand B, one of the three results obtained in the step above is routed to stage 3.

Stage 3:
- The result from the previous stage is shifted to the left until its highest-order bit is a one; this is the so-called "leading-zero detection".
- The shifted result is normalized to comply with the mantissa part of the floating-point format.
- The new exponent is calculated by subtracting the shift count produced by the leading-zero detection from the larger exponent.
- The new sign is obtained from the signs and absolute values of operand A and operand B.
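The three stages above map naturally onto a software model. The C sketch below is our simplified rendering of Fig. 2, restricted to normalized, nonzero operands, truncating instead of rounding (the paper does not specify the rounding behavior), and ignoring exponent over/underflow; fp_add is an illustrative name.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static float fp_add(float fa, float fb)
{
    uint32_t a, b;
    memcpy(&a, &fa, 4);
    memcpy(&b, &fb, 4);

    /* Ensure |A| >= |B| so the subtraction below cannot go negative. */
    if ((a & 0x7FFFFFFFu) < (b & 0x7FFFFFFFu)) { uint32_t t = a; a = b; b = t; }

    uint32_t sa = a >> 31, sb = b >> 31;
    int      ea = (int)((a >> 23) & 0xFF);
    int      eb = (int)((b >> 23) & 0xFF);
    uint32_t ma = (a & 0x7FFFFFu) | 0x800000u;    /* 24-bit 1.f        */
    uint32_t mb = (b & 0x7FFFFFu) | 0x800000u;

    /* Stage 1: exponent comparison and mantissa alignment. */
    int shift = ea - eb;
    mb = (shift > 31) ? 0u : mb >> shift;

    /* Stage 2: add or subtract the mantissas depending on the signs. */
    uint32_t m = (sa == sb) ? ma + mb : ma - mb;
    uint32_t s = sa;                              /* larger operand wins */
    if (m == 0) return 0.0f;                      /* exact cancellation  */

    /* Stage 3: leading-zero detection, normalization, new exponent. */
    int e = ea;
    if (m & 0x1000000u) { m >>= 1; e++; }         /* carry out of bit 23 */
    else while (!(m & 0x800000u)) { m <<= 1; e--; }

    uint32_t r = (s << 31) | ((uint32_t)e << 23) | (m & 0x7FFFFFu);
    float out;
    memcpy(&out, &r, 4);
    return out;
}

int main(void)
{
    printf("%g\n", fp_add(1.5f, 2.25f));   /* 3.75 */
    printf("%g\n", fp_add(5.0f, -1.25f));  /* 3.75 */
    return 0;
}
```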

B. Multiplication algorithm

Floating-point multiplication is much easier than floating-point addition and is much like integer multiplication. The block diagram of the 32-bit floating-point multiplier is shown in Fig. 3. As can be seen from Fig. 3, the three elements (sign, exponent, and mantissa) of the result are obtained by the following procedures.

Sign: the new sign bit is produced by a single XOR gate.

Mantissa: a 24-bit multiplier multiplies the mantissas of operand A and operand B. Only a 25-bit product is kept, which reduces chip area and latency significantly; 25 bits are accurate enough to obtain the new mantissa, because each mantissa is at least 1.0 and less than 2.0, so the product lies between 1.0 and 4.0. For the same reason, the leading-zero detection and normalization steps are much easier than those of floating-point addition.

Exponent: the new exponent is obtained by adding the two exponents of operand A and operand B and the one-bit carry from the leading-zero detection block (with the constant bias subtracted once, since both stored exponents carry the bias).

[Figure 3. Flow chart of a 32-bit floating-point multiplier: sign judgement, 24-bit mantissa multiplication with carry-bit processing, exponent addition, leading-zero detection, and normalization.]
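The corresponding C sketch below is again our simplified model (normalized, nonzero inputs; truncation instead of rounding), not the paper's VHDL.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static float fp_mul(float fa, float fb)
{
    uint32_t a, b;
    memcpy(&a, &fa, 4);
    memcpy(&b, &fb, 4);

    /* Sign: one XOR gate. */
    uint32_t s = (a ^ b) >> 31;

    /* Mantissa: 24x24-bit product has 48 bits; keep the top 25,
     * as in the paper, in a 2.23 fixed-point format. */
    uint64_t ma = (a & 0x7FFFFFu) | 0x800000u;
    uint64_t mb = (b & 0x7FFFFFu) | 0x800000u;
    uint64_t p  = (ma * mb) >> 23;

    /* Exponent: add the biased exponents, subtract one bias. */
    int e = (int)((a >> 23) & 0xFF) + (int)((b >> 23) & 0xFF) - 127;

    /* Product is in [1.0, 4.0): at most a one-position normalization. */
    if (p & (1ULL << 24)) { p >>= 1; e++; }

    uint32_t r = (s << 31) | ((uint32_t)e << 23) | ((uint32_t)p & 0x7FFFFFu);
    float out;
    memcpy(&out, &r, 4);
    return out;
}

int main(void)
{
    printf("%g\n", fp_mul(1.5f, -2.5f));  /* -3.75 */
    return 0;
}
```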
C. Division algorithm

In general, floating-point division is accomplished in hardware by an algorithm belonging to one of three general classes: (1) lookup table; (2) subtractive; (3) multiplicative [6]. Considering the precision requirements of power electronics applications, and in order to reduce the number of LEs and the latency, a new division algorithm is proposed that combines the lookup-table algorithm with the multiplicative algorithm. The block diagram of the 32-bit floating-point divider is shown in Fig. 4. The algorithm is derived as follows.
[Figure 4. Flow chart of the 32-bit floating-point divider: sign judgement, operand splitting into high and low parts (A_HI, A_LO, B_HI, B_LO), a 2K x 25-bit reciprocal ROM, one 25-bit and three 12-bit multipliers, aligned addition and subtraction, exponent subtraction, leading-zero detection, and normalization.]

Suppose the dividend and divisor are single-precision numbers in the following format:

$$m_1 = (-1)^{s_1} \times 2^{E_1} \times 1.f_1 \qquad (2)$$

$$m_2 = (-1)^{s_2} \times 2^{E_2} \times 1.f_2 \qquad (3)$$

Then the quotient q can be expressed as (4):

$$q = \frac{m_1}{m_2} = (-1)^{s_1 - s_2} \times 2^{E_1 - E_2} \times \frac{1.f_1}{1.f_2} \qquad (4)$$

Let $A = 1.f_1$ and $B = 1.f_2$. Both A and B can be split into a high part and a low part, as shown in Fig. 5.

[Figure 5. Splitting A and B into high and low parts: the upper 24 - n bits (bit 23 down to bit n) form a_h and b_h, and the lower n bits (bit n-1 down to bit 0) form a_l and b_l.]

Therefore A and B can be expressed as (5) and (6):

$$A = a_h + a_l \qquad (5)$$

$$B = b_h + b_l \qquad (6)$$

Substituting these expressions into (4), the last factor can be rewritten as follows:

$$\frac{1.f_1}{1.f_2} = \left(\frac{a_h}{b_h} + \frac{a_l}{b_h}\right)\left(\frac{1}{1 + b_l/b_h}\right) \qquad (7)$$

The term $\frac{1}{1 + b_l/b_h}$ can be expanded into a Taylor series:

$$\frac{1}{1 + b_l/b_h} = 1 - \left(\frac{b_l}{b_h}\right) + \left(\frac{b_l}{b_h}\right)^2 - \cdots \qquad (8)$$

By properly selecting n, the second-order and higher terms of this Taylor series can be neglected, giving the rational approximation $\frac{1}{1 + b_l/b_h} \approx 1 - \frac{b_l}{b_h}$.

So expression (4) can be approximated using equation (9):

$$\frac{1.f_1}{1.f_2} \approx \frac{a_h}{b_h} + \frac{a_l}{b_h} - \frac{a_h}{b_h}\times\frac{b_l}{b_h} - \frac{a_l}{b_h}\times\frac{b_l}{b_h} \qquad (9)$$

For the same reason mentioned above, the last term in (9) can also be neglected.

The value of $1/b_h$ is stored in a lookup table (the 2K x 25-bit ROM in Fig. 4). With this scheme the division requires only four multiplications and two additions, which greatly reduces the chip cost and latency.
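To make the data flow concrete, here is a small C model of equation (9) with the last term dropped and n = 12. It models only the series-truncation error; the 25-bit ROM quantization and the reduced-width multipliers of Fig. 4 are not modeled, and all names are ours.

```c
#include <stdint.h>
#include <stdio.h>
#include <math.h>

#define N_LOW 12                        /* n: width of the low parts */
#define SCALE 8388608.0                 /* 2^23: mantissa weight     */

/* The paper stores 1/b_h in a 2K x 25-bit ROM indexed by the upper
 * mantissa bits; this model simply computes the entry exactly. */
static double recip_bh(uint32_t bh_bits)
{
    return 1.0 / (bh_bits / SCALE);
}

/* Approximate 1.f1 / 1.f2 per equation (9) with the a_l*b_l term
 * neglected. Inputs are 24-bit mantissas with the hidden 1 set. */
static double div_approx(uint32_t m1, uint32_t m2)
{
    uint32_t mask = (1u << N_LOW) - 1;
    uint32_t ah = m1 & ~mask, al = m1 & mask;    /* A = a_h + a_l */
    uint32_t bh = m2 & ~mask, bl = m2 & mask;    /* B = b_h + b_l */

    double r  = recip_bh(bh);                    /* 1/b_h from "ROM" */
    double A  = (ah + al) / SCALE;
    double Ah = ah / SCALE, Bl = bl / SCALE;

    return A * r - (Ah * r) * (Bl * r);          /* eq. (9), truncated */
}

int main(void)
{
    uint32_t m1 = 0xD2345A, m2 = 0x9ABCDE;       /* arbitrary mantissas */
    double approx = div_approx(m1, m2);
    double exact  = (double)m1 / (double)m2;
    printf("approx=%.9f exact=%.9f err=%.3f ulp\n",
           approx, exact, fabs(approx - exact) * SCALE);
    return 0;
}
```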
D. Error analysis

Clearly, only the division algorithm introduces an approximation error. Is this error acceptable? The maximum absolute approximation error can be found mathematically. The following inequalities (10) and (11) hold for the infinite Taylor series because

$$0 < \frac{b_l}{b_h} < 1 \quad \text{and} \quad 1.0 < b_h < 2\,:$$

$$\left(\frac{b_l}{b_h}\right)^2 - \left(\frac{b_l}{b_h}\right)^3 + \left(\frac{b_l}{b_h}\right)^4 - \cdots < \left(\frac{b_l}{b_h}\right)^2 < (b_l)^2 \qquad (10)$$

$$\frac{a_l}{b_h} \times \frac{b_l}{b_h} < a_l \times b_l \qquad (11)$$

So the maximum absolute approximation error can be estimated using equation (12):

$$E_{max} = (b_l)^2 + a_l \times b_l \qquad (12)$$

The values of $a_l$ and $b_l$ have the binary form

$$a_l = 0.\underbrace{0 \cdots 0}_{24-n}\,\underbrace{x \cdots x}_{n} \qquad (13)$$

$$b_l = 0.\underbrace{0 \cdots 0}_{24-n}\,\underbrace{x \cdots x}_{n} \qquad (14)$$

Using these expressions, the values of $(b_l)^2$ and $a_l \times b_l$ take the form

$$(b_l)^2 = 0.\underbrace{0 \cdots 0}_{2(24-n)-1}\,\underbrace{x \cdots x}_{2n} \qquad (15)$$

$$a_l \times b_l = 0.\underbrace{0 \cdots 0}_{2(24-n)-1}\,\underbrace{x \cdots x}_{2n} \qquad (16)$$

Therefore the maximum absolute approximation error can be expressed as (17):

$$Error_{max} = 0.\underbrace{0 \cdots 0}_{2(24-n)-2}\,\underbrace{x \cdots x}_{2n+1} \qquad (17)$$

With n = 11, the maximum absolute approximation error is less than 1 ulp (unit in the last place). However, as n decreases, the lookup table grows exponentially. In this paper we made a tradeoff between the size of the lookup table and the approximation error by selecting n = 12. According to expression (17), the maximum absolute approximation error is then less than 2 ulp, which is accurate enough for power electronics applications.
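The bound in (17) can also be sanity-checked numerically. The sketch below, which reuses div_approx and SCALE from the previous listing, sweeps random mantissa pairs and records the worst truncation error in ulp; by the analysis above it should stay below 2 ulp for n = 12. This harness is ours, not from the paper.

```c
#include <stdlib.h>

/* Monte-Carlo check of the < 2 ulp bound for n = 12; reuses
 * div_approx() and SCALE from the previous sketch. */
static double max_ulp_error(long trials)
{
    double worst = 0.0;
    for (long i = 0; i < trials; i++) {
        /* random 24-bit mantissas with the hidden 1 forced on */
        uint32_t m1 = 0x800000u | ((((uint32_t)rand() << 12)
                                    ^ (uint32_t)rand()) & 0x7FFFFF);
        uint32_t m2 = 0x800000u | ((((uint32_t)rand() << 12)
                                    ^ (uint32_t)rand()) & 0x7FFFFF);
        double e = fabs(div_approx(m1, m2) - (double)m1 / (double)m2)
                   * SCALE;             /* absolute error in ulp */
        if (e > worst) worst = e;
    }
    return worst;                        /* analytical bound: < 2 ulp */
}
```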
IV. SIMULATION AND IMPLEMENTATION

A. Simulation

With the introduction of high-level hardware description languages such as VHDL and Verilog HDL, rapid design of floating-point units has become possible. According to the algorithms described above, the three arithmetic units (floating-point adder, multiplier, and divider) were developed in VHDL on Altera's Quartus II 4.2 platform. To verify the correctness of the algorithms, only synthesis and simulation tools were used. Owing to its size, the VHDL source code is not presented here. To illustrate the correctness of the algorithms, several simulation waveforms of the floating-point arithmetic units are shown in Fig. 6.
Fig. (6).

[Figure 6. Simulation results of the floating-point units: (a) adder; (b) multiplier; (c) divider.]
B. Implementation

For this project we chose an Altera Cyclone FPGA, the EP1C6, featuring 5890 LEs and 92160 bits of embedded RAM, with 98 pins available to the user. After allocating the pins with the assignment editor, a programming file is compiled by the Quartus II software and downloaded to the FPGA over JTAG using the Quartus II programmer. Table 2 summarizes the resources dedicated to each floating-point unit.

Table 2. Resources used in the arithmetic units

Arithmetic unit   LEs    Memory bits   PLLs
Adder             883    0             0
Multiplier        995    0             0
Divider           1505   51200         1

To verify the floating-point units as a whole, we developed a program that compares results calculated by the DSP against those obtained from the FPGA. Since the DSP (TMS320F2812) is fixed-point by nature, its floating-point operations are implemented by a run-time support library (rtl2800.lib). The flow chart of this program is shown in Fig. 7. The DSP operates at 100 MHz with an external-interface access time of less than 40 ns, which means the floating-point units can operate at up to 25 MHz. As shown in Fig. 7, the program randomly generates operands A and B, sends them to the FPGA, retrieves the result from the FPGA in the next operation, computes the same result with the DSP library, and then compares the two results. If the difference between them is greater than 2 ulp (from the analysis above, the error tolerance is less than 2 ulp), the DSP halts with a fatal error; otherwise the test routine continues. We ran this test program continuously for several hours, which validated the floating-point units in hardware.
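The paper does not give the DSP source, but the test loop could take roughly the following shape. The memory-mapped register addresses (FPU_OP_A, FPU_OP_B, FPU_RESULT) are invented for illustration; the real addresses depend on which F2812 external-interface zone the coprocessor occupies.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical memory map for the FPGA coprocessor registers. */
#define FPU_OP_A   (*(volatile uint32_t *)0x00100000)
#define FPU_OP_B   (*(volatile uint32_t *)0x00100004)
#define FPU_RESULT (*(volatile uint32_t *)0x00100008)

static uint32_t rand32(void)
{
    return ((uint32_t)rand() << 16) ^ (uint32_t)rand();
}

void fpu_test_loop(void)
{
    for (;;) {
        /* random bit patterns (screening of NaN/Inf patterns omitted) */
        uint32_t a = rand32(), b = rand32();

        FPU_OP_A = a;                       /* send operands to FPGA    */
        FPU_OP_B = b;
        uint32_t hw = FPU_RESULT;           /* read back the FPU result */

        float fa, fb, fr;
        memcpy(&fa, &a, 4);
        memcpy(&fb, &b, 4);
        fr = fa + fb;                       /* software add (rtl2800)   */

        /* Treat the bit-pattern difference as an ulp count; adequate
         * for this sketch when both results have the same sign. */
        uint32_t sw;
        memcpy(&sw, &fr, 4);
        uint32_t diff = hw > sw ? hw - sw : sw - hw;
        if (diff > 2)
            for (;;) ;                      /* halt with a fatal error  */
    }
}
```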
[Figure 7. Flow chart for testing the floating-point units with the DSP: start; randomly produce 32-bit operands A and B; send them to the FPGA; read the result from the FPGA; calculate the result with the DSP; compare the two results; if the difference is within 2 ulp, loop, otherwise halt with a fatal error.]
V. VERIFICATION AND COMPARISON STUDY

The FFT algorithm was adopted to verify the correctness of the results obtained by the FPU and to compare the computational efficiency of the FPU against the DSP software routines (supported by the run-time library). All these experiments were carried out on the digital platform shown in Fig. 8.

[Figure 8. Proposed digital platform for testing the FPUs.]

The FFT results calculated by the FPU were exported using the "data saving" feature available in CCS and then analyzed in Matlab. The FFT results obtained by the FPU are fully consistent with those calculated by Matlab.

To study the relative computational efficiency of the DSP routines and the FPU, FFTs of different lengths were run. The time consumed by each method was measured with the DSP T2 timer as the interval between the beginning and the end of the FFT calculation. Note that the operating environment was identical for all these tests. The computation times of the two methods are shown in Fig. 9.

[Figure 9. Computation time (ms) of the FFT for the DSP algorithms and the coprocessor, for lengths from 16 to 1024 points; the vertical axis spans 0 to 120 ms.]

From this figure it is clear that the computation time of the FPU is five times less than that of the DSP routines. If the FPU were interfaced to the DSP with a 32-bit data width, or integrated into the DSP like other peripherals, the computational efficiency would improve further, since at least two-thirds of the time is spent organizing, feeding, and retrieving data to and from the FPU.
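A timing harness for this comparison could look like the sketch below. The names timer_reset, timer_read_us, fft_fpu, and fft_dsp are placeholders for the paper's T2-timer access and the two FFT implementations; none of them are names from the paper.

```c
#include <stdint.h>
#include <stdio.h>

extern void     timer_reset(void);           /* restart the DSP T2 timer */
extern uint32_t timer_read_us(void);         /* elapsed microseconds     */
extern void     fft_fpu(float *buf, int n);  /* FFT using the FPGA FPU   */
extern void     fft_dsp(float *buf, int n);  /* FFT using rtl2800.lib    */

/* Time one FFT of length n with each method under identical conditions. */
void compare_fft(float *buf, int n)
{
    timer_reset();
    fft_fpu(buf, n);
    uint32_t t_fpu = timer_read_us();

    timer_reset();
    fft_dsp(buf, n);
    uint32_t t_dsp = timer_read_us();

    printf("%4d points: FPU %lu us, DSP %lu us\n",
           n, (unsigned long)t_fpu, (unsigned long)t_dsp);
}
```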
VI. CONCLUSION

In this paper, a floating-point coprocessor was successfully configured in an FPGA to enhance the computational capability and flexibility of a digital platform for power electronics applications. The coprocessor operates at 25 MFLOPS. Its computation precision, though lower than the IEEE 754 standard, which requires 0.5 ulp, is within 2 ulp, which is sufficiently accurate for common applications. The platform is being used for prototype development and for the implementation of computationally intensive algorithms in various power electronics applications.

REFERENCES

[1] Mongkol Konghirun, Longya Xu, Jennifer Skinner-Gray. "Quantization Errors in Digital Motor Control Systems". Power Electronics and Motion Control Conference, Aug. 2004, pp. 1421-1426.
[2] D. D. Bester, J. A. du Toit, J. H. R. Enslin. "High Performance DSP/FPGA Controller for Implementation of Computationally Intensive Algorithms". IEEE International Symposium on Industrial Electronics, Jul. 1998, pp. 240-244.
[3] Habib-ur Rehman, Richard J. Hampo. "A Flexible High Performance Advanced Controller for Electric Machines". Applied Power Electronics Conference and Exposition, Feb. 2000, pp. 939-943.
[4] Joep Jacob, Dirk Detjen, et al. "Rapid Prototyping Tools for Power Electronic Systems: Demonstration with Shunt Active Power Filters". IEEE Trans. on Power Electronics, Vol. 19(2), 2004, pp. 500-507.
[5] Wangjun Lei, Fang Zhuo, et al. "Development of 100KVA Active Filter with Digital Controlled Multiple Parallel Power Converters". IEEE Power Electronics Specialists Conference, Jun. 2004, pp. 1121-1126.
[6] Albert Austin Liddicoat. "High-performance Arithmetic for Division and the Elementary Functions". Ph.D. Dissertation, Stanford University, 2002.
[7] Nabeel Shirazi, Al Walters, Peter Athanas. "Quantitative Analysis of Floating Point Arithmetic on FPGA Based Custom Computing Machines". IEEE Symposium on FPGAs for Custom Computing Machines, Apr. 1995, pp. 155-162.
