
Approaches to Low-Power Implementations of DSP Systems

Class Advisor: Dr. Fakhraie
Presenter: Nariman Moezi
DSP Design & Implementation Course Seminar, Spring 2004

Outline

- Reduced two's complement representation
- Low-power scheduling techniques for embedded DSP software
- Low-power multipliers
  - Mitchell-based logarithmic multiplier
  - Power-aware pipelined multiplier

Reduced two's complement representation

Two's complement representation is widely used to implement arithmetic operations. If X has a small magnitude and switches between positive and negative values, its sign extension switches between strings of zeros and strings of ones, which causes unnecessary switching activity in the upper bits.
If X has magnitude less than $2^{m-1}$ ($m < N$), we can represent it by the sum of an m-bit vector $\{\bar{x}_{m-1}, x_{m-2}, \ldots, x_0\}$ and a constant vector having a string of ones from bit N-1 to bit m-1 at the MSB side:

(Zhan Yu et al., 2002)
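The decomposition described above can be checked numerically. The following is a minimal Python sketch, not the cited paper's implementation. It assumes that the m-bit vector stores bit m-1 inverted, which is what makes the sum with the all-ones constant equal X modulo $2^N$. The function names are illustrative only.

```python
def reduced_representation(value: int, n_bits: int, m: int) -> tuple[int, int]:
    """Split value (|value| < 2**(m-1)) into an m-bit vector and a constant.

    The vector keeps bits m-2..0 of X and stores bit m-1 inverted (assumption).
    The constant has ones from bit N-1 down to bit m-1.
    """
    if not (m < n_bits and abs(value) < (1 << (m - 1))):
        raise ValueError("need m < N and |value| < 2**(m-1)")
    pattern = value & ((1 << n_bits) - 1)      # N-bit two's complement pattern
    low = pattern & ((1 << m) - 1)             # bits m-1..0 of X
    vector = low ^ (1 << (m - 1))              # invert bit m-1
    constant = ((1 << n_bits) - 1) & ~((1 << (m - 1)) - 1)
    return vector, constant


def recombine(vector: int, constant: int, n_bits: int) -> int:
    """Add vector and constant modulo 2**N and read the result as signed."""
    total = (vector + constant) & ((1 << n_bits) - 1)
    return total - (1 << n_bits) if total >> (n_bits - 1) else total


if __name__ == "__main__":
    for x in (3, -3):
        v, c = reduced_representation(x, n_bits=10, m=4)
        print(x, format(v, "04b"), format(c, "010b"), recombine(v, c, 10))
```

With N = 10 and m = 4, the values +3 and -3 differ only in the 4-bit vector, while the constant stays fixed, so the upper bits no longer toggle when the sign changes.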

APPLICATION: Low-power FIR filter using reduced two's complement representation
Consider a hybrid-form adaptive FIR filter whose inputs are 5-level data symbols taking values in {-2, -1, 0, 1, 2}. Assume the coefficients are N-bit two's complement numbers. Multiplications by such symbols are simply shift and complement operations. Assume that we detect that the maximum magnitude of a coefficient H is less than $2^{m-2}$. Then the corresponding partial product P has a magnitude less than $2^{m-1}$, because the input magnitude is at most 2.
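As an illustration of why these multiplications reduce to shift and complement operations, here is a small Python sketch. It is a behavioural model under the stated 5-level input assumption, not a description of the test chip's datapath, and the function name is hypothetical.

```python
def partial_product(h: int, symbol: int) -> int:
    """Multiply coefficient h by a 5-level symbol using only shift and negate."""
    if symbol not in (-2, -1, 0, 1, 2):
        raise ValueError("symbol must be one of -2, -1, 0, 1, 2")
    if symbol == 0:
        return 0
    magnitude = h << 1 if abs(symbol) == 2 else h    # times 2 is a left shift
    return -magnitude if symbol < 0 else magnitude   # negation: complement, add 1


if __name__ == "__main__":
    m = 5
    h_limit = 1 << (m - 2)            # |H| < 2**(m-2)
    worst = max(abs(partial_product(h, s))
                for h in range(-h_limit + 1, h_limit)
                for s in (-2, -1, 0, 1, 2))
    print(worst, "<", 1 << (m - 1))   # bound on |P| stated in the text
```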

- Coefficient maximum-magnitude detection (an example with two taps and 6-bit coefficients)

- Partial-Product generation using reduced twos complement representation

- As the adaptive filter updates the coefficients, the word length of the reduced representation changes, and so does the error introduced by using the reduced representation. We can build a compensation-vector correction path that imitates the error propagation in the accumulation path.

- A test chip was implemented in 0.25 µm CMOS technology. It used a 160-tap hybrid-form filter with 8 taps per hybrid section and a coefficient word length of 10 bits. When operating at 2.5 V with a 100 MHz clock, a 32% power saving was measured, as summarized in this table:

Low-Power Scheduling Techniques for Embedded DSP Software

This section describes an instruction-level power model for a Fujitsu DSP processor and techniques to reduce the power this processor consumes. The processor has a special architecture that allows instructions to be packed into pairs. Its Booth multiplier is a major source of energy consumption for DSP programs, so a micro-architectural power model of the on-chip Booth multiplier is developed and analyzed for further power minimization. Based on this model, an effective technique of local code modification by operand swapping is used to reduce power consumption further.

(S. Malik et al., IEEE Trans. VLSI Syst., 1997)

An example of a sequence of four instructions where the overhead cost between instructions 1 and 3 cannot be ignored

The sum of the measured currents for the four instructions is 204 mA. The sum of the base costs (37.2 + 14.4 + 36.6 + 14.4) and the overhead costs of adjacent instructions (18.4 + 18.4 + 18.4 + 18.4) is only 176.2 mA, which underestimates the actual cost by 13.6%. The difference of 27.8 mA between the two figures comes from the circuit-state overhead between the non-adjacent instructions 1 and 3. This is due to a special design at the inputs of the multiplier: there is a latch between each operand and the multiplier that retains the old value until the next multiply instruction is executed. This overhead depends on the previous and current values of the input latches for each multiply operation.
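The arithmetic behind these figures can be reproduced with a few lines of Python. The numbers are the ones quoted above; the variable names are only illustrative.

```python
# Figures quoted in the text, all in mA.
base_costs_ma = [37.2, 14.4, 36.6, 14.4]          # one base cost per instruction
adjacent_overheads_ma = [18.4, 18.4, 18.4, 18.4]  # overhead between neighbours
measured_ma = 204.0                               # measured sum for the sequence

# Instruction-level estimate: base costs plus adjacent-instruction overheads.
estimate_ma = sum(base_costs_ma) + sum(adjacent_overheads_ma)

# What the estimate misses: overhead between non-adjacent instructions 1 and 3.
shortfall_ma = measured_ma - estimate_ma
shortfall_pct = 100.0 * shortfall_ma / measured_ma

print(f"estimate  = {estimate_ma:.1f} mA")
print(f"shortfall = {shortfall_ma:.1f} mA ({shortfall_pct:.1f}%)")
```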

Instruction packing for low power

A special feature of the target DSP processor's architecture is its ability to pack an ALU-type instruction and a data transfer instruction into one codeword for simultaneous execution. The average current for a packed instruction is only slightly higher than the average current for a sequence of the two unpacked instructions, so packing saves energy because the same work completes in fewer cycles.

Comparison of energy consumed by packed and unpacked instructions

As for the overhead cost of MAC instructions, when MAC is packed with a data transfer instruction, especially LAB, which changes the data values in registers A and B used by MAC as inputs, a wide variation in overhead cost is observed (from 1.4 mA to 33.0 mA). This variation is mainly due to the complex Booth multiplier implemented in the MAC unit.
The fundamental idea behind the Booth multiplier is to recode B using the "skipping over 1s" technique. For example, a 7-digit B value 0011110, which would need four additions of shifted A, can be recoded to a new value that requires only one addition and one subtraction.

Micro-architectural model for the Booth multiplier

0011110 (weight = 4)
recoded: 0 1 0 0 0 -1 0 (weight = 2)
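The recoding in this example can be sketched in Python. This assumes plain radix-2 Booth recoding, where digit i is b[i-1] - b[i]. The multiplier on the actual processor may use a different radix, so treat this as an illustration of the weight concept only. Function names are hypothetical.

```python
def booth_recode(b: int, n_bits: int) -> list[int]:
    """Radix-2 Booth digits of an n-bit two's complement pattern, MSB first.

    Digit i equals b[i-1] - b[i] with b[-1] = 0, so each digit is -1, 0 or +1.
    """
    bits = [(b >> i) & 1 for i in range(n_bits)]            # LSB first
    digits = [(bits[i - 1] if i > 0 else 0) - bits[i] for i in range(n_bits)]
    return digits[::-1]


def booth_weight(b: int, n_bits: int) -> int:
    """Number of nonzero digits, i.e. additions or subtractions required."""
    return sum(1 for d in booth_recode(b, n_bits) if d != 0)


if __name__ == "__main__":
    b = 0b0011110
    print(bin(b).count("1"))     # weight before recoding
    print(booth_recode(b, 7))    # digits after recoding, MSB first
    print(booth_weight(b, 7))    # weight after recoding
```

In this digit convention a +1 digit costs one operation and a -1 digit costs one operation of the opposite kind, so the weight counts both.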

We can reduce the number of additions and subtractions simply by swapping the operands in registers A and B, which can reduce the current. The table gives three experiments where swapping the operands changes the measured current:

Variation of measured current by swapping operands op1 and op2 in registers A and B for MAC:LAB instructions.

Another factor that determines the power consumption of the multiplier is switching activity. For the Booth multiplier, the relevant characteristic of A is its switching activity; for B, it is both the weight factor and the switching activity.

Average current drawn by MAC:LAB for different characteristics of consecutive values in A and B.

For a typical DSP application, MAC:LAB instructions are usually applied to a sequence of data for filter operations such as the sum of products $c_i \cdot X_i$.

Since only C is known in advance and there is no information about X, we treat C as the value in B. If the switching activity or the weight factor of C is high, we can swap the operands.
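A compile-time check of this kind could look like the following Python sketch. The thresholds `max_weight` and `max_switching` are hypothetical tuning parameters, not values from the cited paper, and radix-2 Booth recoding is assumed for the weight.

```python
def booth_weight(b: int, n_bits: int) -> int:
    """Count nonzero radix-2 Booth digits of the n-bit pattern b."""
    prev, weight = 0, 0
    for i in range(n_bits):
        bit = (b >> i) & 1
        weight += int(bit != prev)   # digit b[i-1] - b[i] is nonzero when bits differ
        prev = bit
    return weight


def switching_activity(values: list[int], n_bits: int) -> float:
    """Average number of bit flips between consecutive n-bit values."""
    if len(values) < 2:
        return 0.0
    mask = (1 << n_bits) - 1
    flips = [bin((a ^ b) & mask).count("1") for a, b in zip(values, values[1:])]
    return sum(flips) / len(flips)


def should_swap(coeffs: list[int], n_bits: int,
                max_weight: float, max_switching: float) -> bool:
    """Suggest moving the coefficients out of register B when they look costly."""
    mask = (1 << n_bits) - 1
    avg_weight = sum(booth_weight(c & mask, n_bits) for c in coeffs) / len(coeffs)
    return (avg_weight > max_weight
            or switching_activity(coeffs, n_bits) > max_switching)


if __name__ == "__main__":
    print(should_swap([0b0101010] * 3, 7, max_weight=3.0, max_switching=3.0))
    print(should_swap([0b0011110] * 3, 7, max_weight=3.0, max_switching=3.0))
```

The first coefficient set alternates bits and therefore has a high Booth weight, while the second has long runs of ones and a low weight.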

Comparison of power consumption for 5 DSP programs by different scheduling techniques

Improved Mitchell-Based Logarithmic Multiplier for Low-Power DSP Applications

The technique of multiplying two numbers using logarithms is simple. Take the logarithms of two multiplicands, add the logarithms together and then take the antilogarithm of the resulting summation.

Mitchell's method of calculating logarithms: assume $N = 25_{10} = 11001_2$. The MSB is bit 4, which gives a characteristic of $100_2$, and the remaining bits ($1001_2$) give the fraction. This gives a value for the logarithm of $100.1001_2$ ($= 4.5625_{10}$). The correct value of $\log_2 25$ is 4.6439.
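The worked example can be reproduced with a short Python sketch of Mitchell's approximation. The characteristic is taken from the position of the leading one and the remaining bits are read as a binary fraction. The function name is illustrative.

```python
import math


def mitchell_log2(n: int) -> float:
    """Approximate log2(n) as characteristic plus binary fraction."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    k = n.bit_length() - 1              # position of the leading one
    x = (n - (1 << k)) / (1 << k)       # remaining bits as a fraction in [0, 1)
    return k + x


if __name__ == "__main__":
    approx = mitchell_log2(25)
    exact = math.log2(25)
    print(approx)                  # 4.5625
    print(round(exact, 4))         # 4.6439
```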

(Duncan J. McLaren et al., IEEE 2003)

A binary number N can be written as:

The antilogarithms of these two equations are:

Note that k represents the characteristic and x the binary fraction, with x in the range 0 ≤ x < 1. The true logarithm and the approximation using the Mitchell method are:

The logarithm of a product is equal to the sum of the logarithms of the multiplicands
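To make the log-add-antilog flow concrete, here is a minimal Python sketch of an uncorrected Mitchell multiplier. It assumes positive integer operands and uses the same linear approximation for the antilogarithm as for the logarithm. Helper names are hypothetical.

```python
def split(n: int) -> tuple[int, float]:
    """Return (characteristic k, fraction x) such that n = 2**k * (1 + x)."""
    k = n.bit_length() - 1
    return k, (n - (1 << k)) / (1 << k)


def mitchell_multiply(a: int, b: int) -> float:
    """Approximate a * b by adding Mitchell logarithms (a, b >= 0)."""
    if a == 0 or b == 0:
        return 0.0
    k1, x1 = split(a)
    k2, x2 = split(b)
    k, frac = k1 + k2, x1 + x2
    if frac >= 1.0:                    # carry from fraction into characteristic
        k, frac = k + 1, frac - 1.0
    return (1.0 + frac) * 2.0 ** k     # linear antilogarithm


if __name__ == "__main__":
    print(mitchell_multiply(25, 25), 25 * 25)   # approximation vs exact
    print(mitchell_multiply(8, 4), 8 * 4)       # powers of two are exact
```

For 25 × 25 the sketch returns 576 against the exact 625. The uncorrected method never overestimates, and its relative error stays below about 11.1%.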

To correct the error the following is used:

This shows that, to provide the correct answer, an error correction factor should be added to the summation before the antilogarithm is calculated.
However, this would be impractical. The approach is instead to average the value of the correction factor over a range of x values and add this average to the summation. This results in a multiplier of improved accuracy. The two fractional parts are split into 8 ranges, from 0 to 1 in steps of 0.125. This means that the 3 most significant bits of x can be used to select the error correction factor, which is precalculated.
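The idea of a precalculated, range-averaged correction can be sketched in Python. This sketch makes an assumption that the source does not spell out: it corrects each operand's logarithm separately by the average of log2(1 + x) - x over the range selected by the top 3 fraction bits. The table in the cited paper may be defined differently, and the antilogarithm stage is not modelled here.

```python
import math

RANGES = 8   # eight ranges of width 0.125, indexed by the top 3 fraction bits


def build_correction_table(samples_per_range: int = 1000) -> list[float]:
    """Average log2(1 + x) - x over each range by midpoint sampling."""
    table = []
    for r in range(RANGES):
        lo = r / RANGES
        total = 0.0
        for s in range(samples_per_range):
            x = lo + (s + 0.5) / (samples_per_range * RANGES)
            total += math.log2(1.0 + x) - x
        table.append(total / samples_per_range)
    return table


CORRECTION = build_correction_table()


def corrected_log2(n: int) -> float:
    """Mitchell logarithm plus the averaged correction for its range."""
    k = n.bit_length() - 1
    x = (n - (1 << k)) / (1 << k)
    index = int(x * RANGES)            # same as the top 3 bits of the fraction
    return k + x + CORRECTION[index]


if __name__ == "__main__":
    for n in (25, 100, 1000):
        k = n.bit_length() - 1
        plain = k + (n - (1 << k)) / (1 << k)
        print(n, abs(plain - math.log2(n)), abs(corrected_log2(n) - math.log2(n)))
```

One trade-off visible in this sketch is that exact powers of two, which the uncorrected method handles without error, pick up a small error from the averaged correction of the first range.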

To test the multiplier further, it was used as part of a real application, in this case a Finite Impulse Response (FIR) Filter. The filter was an 11-tap low-pass FIR, with a normalized cut-off frequency of 0.25. The filter was implemented in Verilog using the standard multiplier, the un-modified Mitchell multipliers and the Improved Mitchell multipliers. The input was 16-bit and the output was 32-bit. The figure below shows the magnitude response from each of the three implementations.

Power-aware Pipelined Multiplier Design Based On 2-Dimensional Pipeline Gating

Although unpipelined Boolean multipliers are naturally power aware with respect to changes in input precision, deeply pipelined designs are not. In an unpipelined Boolean multiplier, a low-precision calculation (such as 0001 × 0001) dissipates much less power than a high-precision calculation (such as 1111 × 1111). In deeply pipelined designs, the number of registers is much larger than that of the other elements, and the registers are clocked regardless of the operand values, so these designs lose this natural power awareness.

(Jia Di, J. S. Yuan et al., GLSVLSI 2003)

To solve this problem and improve the power awareness of deeply pipelined multipliers, a novel technique, 2-dimensional pipeline gating, is proposed. This technique gates the clock to the registers in both the vertical and horizontal directions.

In a 4 × 4 multiplier, when the input precision is 4 (for example, calculating 1111 × 1111), S is generated from all inner partial products. If the input precision is 2 (for example, calculating 0011 × 0011), the partial products containing X2 or Y2 (the ones enclosed by a rectangle) can also be disabled.
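Which partial products remain active at a given precision can be illustrated with a small Python model. This is a behavioural sketch only: it assumes unsigned operands and marks individual partial products, whereas the actual design gates the clock of pipeline registers. Function names are hypothetical.

```python
def operand_precision(x: int, y: int) -> int:
    """Smallest p such that both unsigned operands fit in p bits."""
    return max(x.bit_length(), y.bit_length())


def enabled_partial_products(n: int, precision: int) -> list[list[bool]]:
    """enabled[i][j] is True when partial product X_j * Y_i is still needed."""
    return [[i < precision and j < precision for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    for x, y in ((0b1111, 0b1111), (0b0011, 0b0011)):
        p = operand_precision(x, y)
        grid = enabled_partial_products(4, p)
        active = sum(cell for row in grid for cell in row)
        print(f"precision {p}: {active} of 16 partial products active")
```

At precision 2 only the four partial products formed from X0, X1, Y0, and Y1 stay active. The others involve bits that are zero at this precision.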

References

M. T. Lee, V. Tiwari, S. Malik, and M. Fujita, "Power analysis and minimization techniques for embedded DSP software," IEEE Trans. VLSI Syst., vol. 5, pp. 123-135, Mar. 1997.
Jia Di, J. S. Yuan et al., "Power-aware Pipelined Multiplier Design Based On 2-Dimensional Pipeline Gating," GLSVLSI '03, April 28-29, 2003.
Zhan Yu et al., "A Low Power Adaptive Filter Using Dynamic Reduced 2's-Complement Representation," IEEE Custom Integrated Circuits Conference, 2002.
Duncan J. McLaren et al., "Improved Mitchell-Based Logarithmic Multiplier for Low Power DSP Applications," IEEE, 2003.
