
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 15, NO. 7, JULY 2007

733

A Novel Charge Recycling Design Scheme Based on Adiabatic Charge Pump


Ka-Ming Keung, Student Member, IEEE, Vineela Manne, and Akhilesh Tyagi, Member, IEEE
Abstract: Power consumption has become a critical design criterion for integrated circuits given the growing importance of portable battery-operated devices. A typical CMOS gate driven by a power supply ($V_{DD}$) draws energy equal to $CV_{DD}^2$ during every cycle of operation. We propose a new approach to recycle this charge with an adiabatic charge pump that moves the slower adiabatic components away from the critical path of the logic. The critical path of the system, and hence the delay, does not change. This is achieved by overlapping the adiabatic charge pump delays with the computing path logic delays. Many embedded high-performance applications such as digital signal processing (DSP), which exhibit datapath parallelism, are ideal candidates for this scheme. The proposed method has been implemented in DSP computations. SPICE simulation-based results indicate that the proposed scheme reduces energy consumption in these DSP circuits by as much as 18% (9.94% on average) with no perceptible loss in performance. The area penalty for these energy savings is in the 1% to 2% range. The leakage energy reduction in 45-nm BPTM averages 46%.

Index Terms: Charge pump, charge recycling, low power.

I. INTRODUCTION

Over the last 15 years, the semiconductor industry has witnessed exponential growth in the deployment of highly complex digital circuits in the communication and digital signal processing domains. This, together with the growing demand for portable wireless devices with sophisticated functionality, has driven designers to focus on increasing operating frequency and integration density, and on improving energy efficiency [1]–[3]. Power dissipation is identified as one of the technology barriers by the International Technology Roadmap for Semiconductors (ITRS) [14], which further projects that power density in future technology nodes is likely to get even worse. The emerging paradigms of wearable, ubiquitous, and pervasive computing are leading to an explosion in the number and type of portable systems. Energy efficiency takes on special significance for these systems, which must deliver maximal computing services for a given battery weight. Many of these applications require high performance and yet need to consume low energy [4], [5]. Hence, the challenge is not just low-power and low-energy design, but design with the lowest possible energy for a given performance. In order to maximize energy efficiency, low-power design techniques and concepts need to be implemented at multiple abstraction levels: technology, circuit, architecture, and algorithm [6].

In this paper, a new circuit- and architecture-level design method of conserving energy by recycling dirty charge is proposed. The charge bound for the ground terminal (ground-bound charge/current) is collected at a virtual ground terminal. This charge is pumped up to a higher potential with an adiabatic charge pump to serve as virtual internal $V_{DD}$ terminals. This reduces energy wastage, resulting in lower energy consumption. Another, unintended, benefit of the charge recycling scheme is reduced subthreshold leakage current [25]. There are two factors at play in the leakage reduction. First, the virtual ground nodes boost the potential of the source terminals of n-channel transistors. This reduces subthreshold leakage exponentially, due to reduced $V_{GS}$ and to increased $V_T$ from the body effect. The second factor contributing to higher energy efficiency is that even the subthreshold leakage charge can be collected and recycled, since the virtual ground capacitor has a significantly more favorable leakage profile than a transistor channel. The same issues apply to a virtual $V_{DD}$ node as well. We also quantify the leakage energy reduction benefits of the charge recycling design style in this paper.

Of course, the charge pump used in this process will have some dissipative losses. The key is to ensure that these dissipative losses constitute a small fraction of the potential energy available in the charge stored at a virtual ground. Over a variety of DSP computations, which appear amenable to this design style, charge recycling results in up to 18% energy savings.

Section II presents background on power consumption. Section III introduces the proposed charge recycling scheme. A specific charge pump-based implementation of this scheme is presented in Section IV. The results of the experimental evaluation of the scheme with respect to energy savings, area and delay overhead, and leakage energy are provided in Section V.
Section VI concludes the paper.

II. POWER DISSIPATION IN CMOS DIGITAL CIRCUITS

The main sources of power consumption in CMOS circuits can be classified into two types, namely static power dissipation and dynamic power dissipation, as given in (1):

$$P_{\text{total}} = P_{\text{static}} + P_{\text{dynamic}}. \tag{1}$$

Manuscript received February 25, 2004; revised October 14, 2005 and February 5, 2007. K.-M. Keung and A. Tyagi are with the Electrical and Computer Engineering Department, Iowa State University, Ames, IA 50011 USA (e-mail: tyagi@iastate.edu). V. Manne is with Micron Technology, Inc., Boise, ID 83701 USA. Digital Object Identifier 10.1109/TVLSI.2007.899220

The first term refers to static dissipation, which is mainly due to leakage current arising from substrate injection and subthreshold effects. These components are determined primarily by fabrication technology. The second term in (1) refers to dynamic power dissipation, which occurs when the outputs toggle

1063-8210/$25.00 © 2007 IEEE


due to a change in input logic level. This can be further classified into switching dissipation and short-circuit dissipation. Switching dissipation results from charging and discharging of parasitic and load capacitances in the circuit. Each time a capacitor charges, energy is drawn from the power supply $V_{DD}$. This charge is then dumped into ground when the capacitor discharges. The energy loss through this mechanism depends on design factors such as the load and parasitic capacitance $C_L$, supply voltage $V_{DD}$, clock frequency $f$, and the activity factor $\alpha$, which is the probability of a transition per clock cycle. Short-circuit dissipation occurs mainly during switching transients. When both the pMOS (pull-up) and the nMOS (pull-down) networks are ON simultaneously, a direct short-circuit path from the supply to the ground terminal is formed. This effect is more prominent when the inputs switch slowly in relation to the intrinsic RC delays of the gate, which prolongs the short-circuit condition: the input voltage is higher than the n-type threshold $V_{Tn}$ and less than $V_{DD} - |V_{Tp}|$. Of all these energy components, the one most easily addressed at the logic and physical design levels is the dynamic switching energy, $\alpha C_L V_{DD}^2$.

Adiabatic design styles rely on the observation from the physics of computation [7], [8] that the laws of physics do not impose any lower bound on the energy requirements of an elementary gate. The key observation here is that loss of information does require energy. However, a computation that is reversible need not lose any information, and hence can be made to consume as little energy as desired, subject to other limits. The typical charge path from $V_{DD}$ to the output node to ground necessarily loses information and hence adds to the energy cost. An adiabatic computation is also reversible. The reverse path is often called the charge recovery path. The technique is to feed energy into the circuit during the computing phase and to pull it back into the power supply during the charge recovery phase.
This trick, however, still does not prevent frictional energy losses, which are the dissipative losses in resistances. These frictional losses can be minimized by matching the rate of energy input, captured by the rate at which $V_{DD}$ rises, to the intrinsic capacity of the gate given by its RC characteristics. This is the essence of the adiabatic design style. Such rate regulation of circuit energization also makes adiabatic circuits slower than classical CMOS design styles. Moreover, the need to make $V_{DD}$ oscillatory to switch between the computing and charge recovery phases places extreme demands on CMOS to build RLC oscillators that match the clock frequencies of classical CMOS. Many variants of this core adiabatic design style have been developed to overcome these shortcomings, but of necessity these are hybrid approaches that do not deliver the full benefits of the pure adiabatic principle. The proposed technique also falls into the latter family of design styles that adopt some of the adiabatic principles. It traps, on a virtual ground capacitor, the ground-bound charge that classical CMOS sends down the ground path through normal switching and leakage. This charge is then recycled through a pumping action to a higher potential. In order to minimize the frictional losses in the charge pumping operation, or equivalently to maximize the fraction of the ground capacitor energy transferred to the virtual $V_{DD}$, the charge pump operates in an adiabatic mode.
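To make the frictional-loss argument concrete, the following Python sketch compares the energy dissipated in a series resistance $R$ when a load $C$ is charged abruptly by a voltage step ($\frac{1}{2}CV^2$, independent of $R$) with the loss when the supply ramps linearly over a time $T$, using the standard first-order RC result $E = CV^2(\tau/T)\,[1 - (\tau/T)(1 - e^{-T/\tau})]$ with $\tau = RC$. The function names and component values are illustrative assumptions, not taken from the paper. For $T \gg \tau$ the loss approaches $CV^2\tau/T$, which is why slower energization reduces dissipation.

```python
import math

def step_charge_loss(c: float, v: float) -> float:
    """Energy (J) dissipated in the series resistance when C is charged
    from 0 V to v by an abrupt voltage step (independent of R)."""
    return 0.5 * c * v * v

def ramp_charge_loss(r: float, c: float, v: float, t_ramp: float) -> float:
    """Energy (J) dissipated in R when the supply ramps linearly from 0 V to v
    over t_ramp seconds and is then held at v until C is fully charged."""
    tau = r * c
    x = tau / t_ramp
    one_minus_exp = -math.expm1(-t_ramp / tau)  # 1 - exp(-T/tau), accurate for small T/tau
    return c * v * v * x * (1.0 - x * one_minus_exp)

if __name__ == "__main__":
    R, C, V = 1e3, 10e-15, 1.0  # hypothetical: 1 kOhm, 10 fF, 1 V (tau = 10 ps)
    print(f"step          : {step_charge_loss(C, V):.3e} J")
    for t in (1e-12, 1e-11, 1e-10, 1e-9):
        print(f"ramp T={t:.0e} s: {ramp_charge_loss(R, C, V, t):.3e} J")
```

As the ramp time shrinks toward zero the ramp result converges to the step result, and each tenfold increase in $T$ (once $T \gg \tau$) cuts the loss roughly tenfold, at the cost of a slower transition.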

Fig. 1. Conceptual description of the proposed architecture.

This scheme achieves overall energy savings of up to 18% (9.94% on average) with no perceptible performance degradation for a variety of DSP computations. The area penalty for these energy savings is in the 1% to 2% range. The leakage energy reduction averages 28.87% in the AMI 0.5-μm process and 46% in 45-nm BPTM.

III. CONCEPTUAL DESCRIPTION

Fig. 1 illustrates a schema for the proposed design methodology. The system or algorithm under consideration is first divided into various subblocks. Some of these subblocks serve as source blocks for charge, i.e., their ground-bound charge is trapped at a virtual ground node. Others function as target blocks: their $V_{DD}$ can be a virtual one, driven by the charge collected at a virtual ground and then pumped to a higher potential. Note that a source block can switch between the virtual ground and the real ground, as shown in Fig. 1. When the voltage at the virtual ground is too high, the source block uses the real ground. An appropriate mechanism for deriving such a control signal is discussed later in the paper. Similarly, the target block may switch between the virtual $V_{DD}$ and the real $V_{DD}$, controlled by a dynamically computed signal. Also note that although Fig. 1 shows the source block fed by the real $V_{DD}$ and the target block connected to the real ground, nothing in the schema prevents them from connecting to a virtual $V_{DD}$ and a virtual ground, respectively. In the later discussion, to reduce clutter, we will usually avoid illustrations of blocks connected to both a virtual ground and a virtual $V_{DD}$, and will not show the ground and $V_{DD}$ control switches explicitly.

A virtual ground will collect charge only if the source block exhibits a significant amount of switching activity. This is likely for systems with a large number of transistors that undergo considerable switching activity. When a significant fraction of the transistors in a source block switch often, a considerable amount of charge is routed to ground.
This is the charge targeted for recycling in the proposed mechanism, resulting in increased energy efficiency for the system. The scheme suits pipelined logic blocks well. The earlier pipeline stages can act as source blocks. They are driven by clean $V_{DD}$ terminals (and virtual ground terminals). The charge collected and recycled from the earlier stages can drive the later pipeline stages through virtual $V_{DD}$ nodes. In other words, the charge recycling mechanism forms its own parallel pipeline dovetailed with the logic pipeline. This charge recycling pipeline is inserted explicitly as a design technique. Note that the charge recycling pipeline need not be synchronous in our schema. In fact, the scheme we adopt deploys self-timed (asynchronous) control for charge recycling. The following discussion and the specific DSP filter implementations described later illustrate the concept further.

A system is first divided into conceptual logic blocks with specific functionality. The computation or pipeline schedule dictates which of these logic blocks ought to be source blocks and which ought to be target blocks. The identified source (target) logic blocks are then associated with a virtual ground (virtual $V_{DD}$) node. We need to ensure that the amount of charge collected at the virtual ground of a source block is a significant fraction of the charge needed for a typical computation phase in the target block fed by this source block. Such matching of source and target blocks can be determined through an analysis of switching activity, number of transistors, and the relative schedule of the block within the system schedule. The size of the virtual ground (or virtual $V_{DD}$) capacitor is chosen carefully to match the expected amount of collected charge and the threshold voltage for switching between the virtual ground and the real ground. The local ground node of a source block is then connected to this virtual ground capacitor, which collects all the ground-bound charge for most of the computation phases. As the virtual ground capacitor collects more charge, its voltage increases. The voltage on the virtual ground capacitor is continuously monitored to ensure that it can still serve as an effective logic 0.
A predetermined voltage threshold should not be exceeded, so that the virtual ground does not hinder the performance of the circuit or create noise immunity issues. Once the capacitor voltage reaches this threshold, the capacitor is disconnected from the logic block. Any further ground-bound charge from the source block is either collected in another virtual ground capacitor or routed to the real ground. The charge collected at the virtual ground capacitor will be at a potential up to the cutoff threshold voltage. It needs to be boosted to a voltage closer to $V_{DD}$ in order to serve as a virtual $V_{DD}$. The charged ground capacitor is then connected to a charge pump circuit to transfer the charge to another virtual $V_{DD}$ capacitor at a higher potential. The target block to be powered by the virtual $V_{DD}$ is fed by the charge pump. Note that the charge pump introduces its own delay, measured from the time the ground capacitor reaches its cutoff threshold voltage to the time the virtual $V_{DD}$ capacitor reaches the right potential. In order to hide this charge pump latency, the target block needs to be activated at least one charge pump delay after the source block's computation is done. Pipelined computations will often contain many such collections of source and target blocks that are sufficiently separated in the computation schedule. It may be necessary to drive the target block initially from clean $V_{DD}$ until the charge pump is primed. At that point, the target block is switched to the virtual $V_{DD}$.
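The collect, monitor, and disconnect behavior described above can be sketched with a simple charge-sharing model. The Python below is an illustrative assumption, not the authors' circuit: each high-to-low transition is modeled as a node of capacitance `c_node`, precharged to $V_{DD}$, sharing its charge with the virtual ground capacitor, and once the voltage reaches the cutoff the hypothetical `VirtualGround` object stops capturing charge, mimicking the switch to the real ground. Transistor resistance, leakage, and comparator delay are ignored.

```python
from dataclasses import dataclass

@dataclass
class VirtualGround:
    """Hypothetical model of a source block's virtual ground capacitor."""
    c_vg: float              # virtual ground capacitance (F)
    v_cutoff: float          # cutoff threshold voltage (V)
    v: float = 0.0           # present virtual ground voltage (V)
    connected: bool = True   # False once the comparator has tripped

    def discharge(self, c_node: float, vdd: float) -> float:
        """Model one high-to-low transition of a node of capacitance c_node
        precharged to vdd. Returns the charge (C) captured on the virtual
        ground, or 0.0 if the charge is routed to the real ground instead."""
        if not self.connected:
            return 0.0
        # Ideal charge sharing between the node and the virtual ground capacitor.
        v_new = (c_node * vdd + self.c_vg * self.v) / (c_node + self.c_vg)
        captured = self.c_vg * (v_new - self.v)
        self.v = v_new
        if self.v >= self.v_cutoff:
            # Threshold reached: disconnect and hand the capacitor to the charge pump.
            self.connected = False
        return captured

# Hypothetical values: 1 pF virtual ground, 10 fF switched nodes, 1.0 V supply, 0.2 V cutoff.
vg = VirtualGround(c_vg=1e-12, v_cutoff=0.2)
captured = [vg.discharge(c_node=10e-15, vdd=1.0) for _ in range(100)]
n_captured = sum(1 for q in captured if q > 0.0)
print(f"transitions captured: {n_captured}, V_vg = {vg.v:.3f} V, Q = {sum(captured):.3e} C")
```

Each captured transition raises the virtual ground by a smaller amount than the last, because the node swing shrinks as the virtual ground rises; the cutoff and the ratio of `c_vg` to `c_node` together set how much charge is collected before the source block must fall back to the real ground.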

The second design concern is to ensure that the energy collected from a source block's virtual ground capacitor, along with the extra energy pumped in by the charge pump, meets the energy needs of the target block. The charge collected at the virtual ground capacitor is dirty, primarily because its low potential renders it useless for the intended purpose of energizing a logic block. In the recycling metaphor, it needs to be cleaned up, or pushed up to a near-$V_{DD}$ potential, to be useful. Note that, just like recycled material, recycled charge may not be a perfect source of clean charge.

When should the recycling, or the charge pump operation, be initiated? The simplest control mechanism to activate the charge pump is a completion signal from the source block. In this case, there can be significant uncertainty with respect to the voltage at the virtual ground. The amount of charge collected at the virtual ground depends on the number of transitions that actually occur in the source block, and this number is input dependent. If the source logic block has very low variance in its switching activity over all possible inputs, this issue is moot: one could design with respect to an expected average number of transitions, which, owing to the low variance, will be close to the number of transitions that actually occur. In general, this uncertainty in the amount of charge is what makes it dirty charge: the distribution of its potential is broad. The symmetric consumer side of this equation is the number of transitions to be supported in the target block by the virtual $V_{DD}$. The voltage the virtual $V_{DD}$ is charged up to depends on the amount of charge on the virtual ground and the capacitance of the virtual $V_{DD}$ node. The charge stored on the virtual $V_{DD}$ should be large enough to supply a certain number of target block transitions, and yet retain a high enough voltage to serve as an effective $V_{DD}$. Estimating the energy needs of the target block, or equivalently its transition activity, requires knowledge of the input vector distribution, which is seldom available beforehand. For almost all simple systems, however, an average input case with a slight error margin can be considered.

In large systems, the best pairing of source and target blocks may arise from selecting identical logic blocks separated in the schedule. For instance, an adder source block drives another adder target block. This is a natural match, since the two are symmetric in terms of logic transitions. As the complexity of the system under consideration increases, it may be difficult to match every source block to a single target block, simply due to the diversity in their switching characteristics. In such situations, the matching problem takes on a polygyny or polyandry flavor: a single source block may be matched with multiple target blocks, a single target block may be matched with multiple source blocks, or even multiple source blocks may be matched with multiple target blocks. This is similar to the bin-packing problem with additional schedule constraints, where the underlying elements have nonuniform sizes in the switching activity metric. In summary, the source charge can be spatially distributed, temporally distributed, or a combination of both.

Note that this scheme also naturally provides a limited degree of voltage scaling. The logic blocks with a virtual $V_{DD}$ or virtual ground operate with a lower voltage swing, reduced by the cutoff voltages of the virtual ground and virtual $V_{DD}$ nodes. The proposed scheme can also help with subthreshold leakage energy, which is emerging as one of the big challenges for future technology nodes. If the transistor isolating the virtual ground from the clean ground has a high threshold voltage (or a body-source bias), then all (or most) of the subthreshold leakage charge will be trapped at the virtual ground. When the amount of collected leakage charge exceeds a predetermined threshold, the charge pump pumps it up into another virtual $V_{DD}$ node. An array of static memory subbanks can be organized as a collection of source and target blocks of leakage charge.

IV. PROPOSED ARCHITECTURE

Charge pumps are widely used as dc-dc converter circuits in many systems to generate voltages higher than the supply voltage. Most charge pumps are based on the Dickson charge pump circuit [9]. Although these traditional charge pump designs serve the purpose of boosting the voltage, a slight modification to the traditional scheme has been adopted in this paper to achieve low-power operation.

A. Standard Charge Pump Operation

The basic theory behind charge pump operation can be explained with the simple circuit shown in Fig. 2. The circuit consists of two diodes, $D_1$ and $D_2$, and a capacitor $C$. The diodes act as self-timed switches and ensure that charge flows in one direction, from input to output. The lower plate of capacitor $C$ is controlled by a clock signal $\phi$ of magnitude $V_\phi$. In each clock cycle, charge is pumped along the diode chain as capacitor $C$ is charged through diode $D_1$ and then discharged through diode $D_2$. The basic operation occurs in two phases [10]. During Phase 1, the lower plate of capacitor $C$ is connected to ground (clock low), and the capacitor is charged through $D_1$, as shown in Fig. 3. Assuming sufficient charging time is provided, the capacitor is charged to the voltage $V_{in} - V_D$, where $V_D$ represents the drop across the diode. During Phase 2, the clock signal is raised high. This causes the voltage at the lower plate of $C$ to rise from 0 to $V_\phi$, and the voltage at the top plate of $C$ to rise from $V_{in} - V_D$ to $V_{in} - V_D + V_\phi$ through bootstrapping. This turns OFF diode $D_1$ and switches ON diode $D_2$. Charge is pushed from the top plate of $C$ to the output node until the output node reaches a final voltage given by

$$V_{out} = V_{in} + V_\phi - 2V_D. \tag{2}$$

Fig. 2. Simplified model of a charge pump circuit.

Fig. 3. Current path during charge pump operation.

Assuming the clock voltage is much higher than twice the diode drop, a boosted voltage higher than the voltage at the input node is obtained at the output node. Note that, according to the ITRS roadmap 2006 executive summary [14], the supply voltage $V_{DD}$ is expected to scale to 0.6 V from the current 0.9 V for low-power applications, and to 0.9 V from the current 1.1 V for high-performance applications. Clock voltages of this order cannot satisfy the condition $V_\phi > 2V_D$ with a typical silicon diode drop of 0.7 to 0.8 V. Furthermore, the charge pump sensitivity is also affected by the diode voltage drop. The RFID domain uses charge pumps with zero- or low-threshold junctions [24], and these designs can be adapted for such low-$V_{DD}$ applications. A complete analysis of the charge pump, including the effects of external resistive load and stray capacitances, along with the necessary and sufficient conditions for correct charge pump operation, is given in [10].

The circuit shown in Fig. 3 can be directly implemented for discrete component systems. For integrated applications, however, the diodes are replaced by diode-connected n-channel MOS transistors, where the diode drop is replaced by the MOS threshold voltage $V_T$. In certain systems, the voltage drop across the diode-connected n-channel transistor is critical. In such cases, with some slight additional modification, the diode-connected n-channel transistors are replaced by bidirectional low-drop switches implemented using transmission gates. Lin and Chua [21] give a generalized analysis of capacitor-diode networks as a framework for charge pumps. Such a general capacitor-diode network allows the charge pump to provide multiple voltage multipliers at different points in the network. This expands the variety of consumer virtual $V_{DD}$ nodes for each charge pump, each with its own voltage level.

B. Modification to Standard Charge Pump

As explained in Section III, the proposed approach to charge recycling is to collect the ground-bound charge from a source logic block in a capacitor and then use a charge pump circuit to generate a virtual $V_{DD}$ for a target block. For the proposed solution to be viable, we need to ensure that the additional frictional energy to operate the charge pump, along with the various control circuitry, does not constitute a significant portion of the total energy saved by the scheme, which would render the scheme inapplicable. Although the traditional charge pump does serve the purpose of boosting the voltage well, considerable energy is drawn from the external clock sources. Of course, some of this energy becomes potential energy held by the charge at the boosted voltage level. It is the dissipative, or frictional, energy lost in the transistors of the charge pump that we wish to minimize. The frictional energy loss is typically not an issue in cases where the main criterion is to generate a high voltage. However, if the traditional charge pump is directly deployed for charge recycling with the virtual ground capacitor as the input energy source, then during every cycle of charge pump operation, when charge is shared between the virtual ground capacitor and capacitor $C$ of the charge pump, half of the energy is dissipated in the switches. In other words, if the charging of capacitor $C$ occurs almost instantaneously, the dissipated energy is $\frac{1}{2}C\,\Delta V^2$, where $\Delta V$ is the voltage difference between the input capacitor and $C$. This loss is inevitable regardless of the network design parameters.

An adiabatic charge sharing between the virtual ground capacitor and the charge pump capacitor $C$ can alleviate this frictional loss. The adiabatic principle states that the energy dissipated while charging a given node capacitance to a particular voltage can, to a first approximation, be asymptotically reduced to zero if the charging time tends to infinity [11]–[13]. One possible realization of adiabatic charge sharing with $C$ is stepwise charging over $N$ steps to reach an eventual voltage difference of $\Delta V$. The dissipation then drops from the original $\frac{1}{2}C\,\Delta V^2$ to $\frac{1}{2}C\,\Delta V^2/N$. Hence, our operating design principle for the charge pump is to charge capacitor $C$ in incremental steps of $\Delta V/N$ rather than in one step. The sources for these incremental voltage steps can be a spatial collection of capacitors $C_1, \ldots, C_N$ at voltages $V_1, \ldots, V_N$, respectively, such that $V_i = i\,\Delta V/N$, where $i = 1, \ldots, N$. The energy dissipated during the charging process for such a system is then given by

$$E_{diss} = \sum_{i=1}^{N} \frac{1}{2}C\left(\frac{\Delta V}{N}\right)^2 = \frac{C\,\Delta V^2}{2N} \tag{3}$$

where $N$ is the number of input stages. Thus, increasing the number of input stages reduces the dissipated energy. This provides a parameterized model of frictional energy loss in the charge pump, in which energy loss can be traded for charge pump delay.

Fig. 4. Three-input stage charge pump.

The choice of the number of charging stages $N$ depends on design objectives such as area, power, and speed.
As the number of stages increases, not only does the complexity of the circuit increase, but other issues such as energy dissipation in the control circuitry, leakage from the multiple input capacitors, speed of operation, and area overhead become critical. A tradeoff must be reached between energy dissipation and circuit complexity. Based on these criteria, the default number of input stages to the charge pump was set to three in this paper. This may not be the optimal choice, but simulations have confirmed it to be a practical one. The circuit diagram of the three-stage input model of the charge pump is given in Fig. 4.
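Equation (3) can be checked numerically. The Python sketch below (function names are illustrative) charges $C$ from 0 to $\Delta V$ through $N$ sources at $V_i = i\,\Delta V/N$, treating each step as abrupt so that it dissipates $\frac{1}{2}C(\Delta V/N)^2$, and compares the total with the closed form $C\,\Delta V^2/(2N)$. Modeling each source as an ideal voltage source (an input capacitor much larger than $C$) is an assumption of the sketch.

```python
def stepwise_loss(c: float, dv: float, n: int) -> float:
    """Energy (J) dissipated when C is charged from 0 V to dv through n ideal
    sources at voltages dv/n, 2*dv/n, ..., dv, each step being abrupt."""
    loss = 0.0
    v_c = 0.0
    for i in range(1, n + 1):
        v_src = i * dv / n               # V_i = i * dv / n
        step = v_src - v_c               # voltage across the switch when it closes
        loss += 0.5 * c * step * step    # loss of one abrupt charging step
        v_c = v_src                      # C settles to the source voltage
    return loss

def eq3_loss(c: float, dv: float, n: int) -> float:
    """Closed form of (3): C * dv**2 / (2 * n)."""
    return c * dv * dv / (2 * n)

if __name__ == "__main__":
    C, DV = 100e-15, 1.0  # hypothetical: 100 fF intermediate capacitor, 1 V swing
    for n in (1, 2, 3, 5, 10):
        print(f"N={n:2d}: simulated {stepwise_loss(C, DV, n):.3e} J, "
              f"eq. (3) {eq3_loss(C, DV, n):.3e} J")
```

With the default of three input stages, the loss is one third of the single-step value. The sketch deliberately omits the control, leakage, and delay costs that, as noted above, limit how large $N$ can usefully be.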

Fig. 5. Voltage comparator.

C. Circuit Operation

The input to the charge pump consists of three capacitors $C_1$, $C_2$, and $C_3$ that are charged to predetermined voltage levels $V_1$, $V_2$, and $V_3$, respectively, with $V_1 < V_2 < V_3$. This is achieved through a charge pump controller that cycles the connections of $C_1$ to $C_3$ among the virtual ground nodes of three different source blocks. The voltage on each charge pump input capacitor is carefully monitored during the charging process, and the capacitor is disconnected from its source block once the required voltage is reached. This task of observing the voltage and breaking the connection between the virtual ground capacitor and the source block is controlled by a simple voltage comparator. The voltage comparator implemented in this scheme (see Fig. 5) is similar to an SRAM cell, with one input fixed at a desired reference voltage level $V_{ref}$, which in our example will be one of $V_1$, $V_2$, or $V_3$. The other input ($V_{in}$) is connected to the virtual ground capacitor. The voltage comparator acts like a differential amplifier: its output settles to logic 0 or logic 1 depending on whether $V_{in} < V_{ref}$ or $V_{in} > V_{ref}$. Initially, the voltage on the capacitor is zero, and hence the comparator output is 0 (or 1, depending on the design). As the capacitor charges, its voltage increases, and once it exceeds the reference voltage, the comparator output toggles. The comparator output is then used to disconnect the virtual ground capacitor and to initiate the charge pump. The control that initiates the charge pump could also be decoupled from the virtual-ground-capacitor-full event; we describe such a scenario in the following.

The charge collected on the input virtual ground capacitors $C_1$, $C_2$, and $C_3$ of the charge pump is then transferred in steps to the intermediate capacitor $C$, as shown in Steps 1, 2, and 3 of Fig. 6. The input capacitors are connected to the intermediate capacitor through transmission gates controlled by special step-wave signals.
The capacitor with the lowest voltage (say $C_1$) is connected to $C$ first, by closing the first gate. The charge on $C_1$ is now shared between the two capacitors $C_1$ and $C$. For equal capacitance values (and $C$ initially discharged), the voltage on $C_1$ and $C$ finally settles at $V_1/2$, assuming a negligible drop across the transmission gate. Once the voltage on $C$ settles to a steady value, the input capacitor $C_1$ is disconnected and $C_2$ is connected to $C$ by closing the second gate. It is essential to disconnect $C_1$ before connecting $C_2$, to prevent reverse charge flow from $C_2$ or $C$ to $C_1$. Once the second gate is switched on, since the voltage on capacitor $C_2$ is higher than the voltage on capacitor $C$ ($V_1/2$ after Step 1), charge sharing occurs between the two capacitors. After sufficient charge transfer occurs between them, the second gate is opened and the third gate is closed to connect the last capacitor $C_3$ to $C$ for charge sharing. Once again, it is necessary to disconnect $C_2$ before connecting $C_3$ to the charge pump. Once capacitor $C$ has accumulated all the charge, it is disconnected from the inputs.

The next phase is to transfer the charge from capacitor $C$ to the output capacitor $C_{out}$, the virtual $V_{DD}$ capacitor. This transfer also ought to occur in an adiabatic, stepwise fashion to minimize frictional losses. $C$ is connected to the output capacitor through a fourth switch. During the entire duration of the first three steps (when capacitor $C$ is connected to the various inputs), the lower plate of $C$ is at ground potential, causing charge to flow from the input capacitors into capacitor $C$ of the charge pump. Up to this point, only recycled energy has been transferred into the charge pump. Now the charge pump also draws energy from the clock to boost the potential further. During Step 4, the clock signal is raised high to $V_\phi$, which raises the lower plate of $C$ to $V_\phi$ and forces the voltage at the top plate of the capacitor to increase by $V_\phi$ through bootstrapping. When $C$ is connected to $C_{out}$, charge sharing takes place between the two capacitors, and energy is transferred from the charge pump to the output capacitor. As observed earlier, the charge transfer from $C$ to $C_{out}$ also occurs in voltage steps to mimic an adiabatic process, so that frictional losses are minimized. Unlike the input case, where multiple capacitors with stepwise voltages differing by a constant delta are available, $C$ is the only source capacitor in the output charging path. The adiabatic charging is therefore achieved by raising the voltage at the lower plate of capacitor $C$ in small steps. We have used a three-step transfer for this phase as well. This ensures maximum charge transfer to the output capacitor. The process is repeated cyclically until the energy on the output capacitor equals the energy required by the target logic block, since this capacitor will serve as the virtual $V_{DD}$.

Fig. 6. Stepwise operation of the charge pump.
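The ideal voltages reached in Steps 1 to 4 follow from charge conservation alone. The Python sketch below assumes ideal capacitors and switches, a full $V_\phi$ bootstrap with no parasitic capacitance, and hypothetical function names; for equal capacitances it reproduces the $V_1/2$ value of Step 1. In this lossless model, raising the lower plate of $C$ in three steps rather than one does not change the final output voltage; the stepping only reduces the frictional loss, which (3) quantifies.

```python
def share(c_a: float, v_a: float, c_b: float, v_b: float) -> float:
    """Common voltage after ideal charge sharing between two capacitors."""
    return (c_a * v_a + c_b * v_b) / (c_a + c_b)

def pump_cycle(inputs, c, v_c, c_out, v_out, v_phi):
    """One idealized cycle of the three-input charge pump.

    inputs: list of (capacitance, voltage) pairs for the input capacitors.
    Returns (voltage on C after Steps 1-3, output voltage after Step 4)."""
    # Steps 1-3: connect inputs to C one at a time, lowest voltage first,
    # disconnecting each before the next is connected.
    for c_i, v_i in sorted(inputs, key=lambda p: p[1]):
        v_c = share(c_i, v_i, c, v_c)
    # Step 4: raise the lower plate of C to v_phi (bootstrap), then share with C_out.
    v_top = v_c + v_phi
    v_out = share(c, v_top, c_out, v_out)
    return v_c, v_out

# Hypothetical values: equal 1 pF capacitors, inputs at 0.1, 0.2, 0.3 V, 1.0 V clock.
C = 1e-12
v_after_inputs, v_out = pump_cycle([(C, 0.1), (C, 0.2), (C, 0.3)], C, 0.0, C, 0.0, 1.0)
print(f"V_C after Steps 1-3: {v_after_inputs:.4f} V, V_out after Step 4: {v_out:.4f} V")
```

Calling `pump_cycle` repeatedly with freshly charged inputs and the returned `v_out` models the cyclic operation described above, in which the output capacitor is topped up until it holds the energy the target block needs.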

D. Energy Analysis of Charge Pump Circuit

As the primary focus of this paper is to recycle charge in order to lower the overall energy consumption of the system, the additional energy needed to drive the charge pump must be as small as possible. A complete energy analysis of the various input sources, the available outputs, and the charge pump operation is thus imperative. The two sources of input energy to the charge pump system are the energy stored in the input capacitors and the energy provided by the external clock signal source that controls the various gates and the bottom plate of capacitor $C$. The only output is the energy stored in the output capacitor of the charge pump, which is eventually used as the virtual $V_{DD}$. Despite adiabatic charging, energy dissipation still occurs in various segments of the charge pump circuit, due to the charging/discharging of the switch gate capacitances and to capacitor leakage. The first source of input energy, the charged input capacitors, does not add to the implementation cost, as it is collected from charge that would otherwise have been dumped to ground. The second source of input energy, the additional energy provided by the various control and clock signals that are essential for circuit operation, does add to the implementation cost and has to be considered in the energy analysis.

Let us consider a target logic block that needs energy $E_T$ to perform the desired operations. Without any internal charge recovery scheme, this entire energy is drawn from the external clean $V_{DD}$ source, and the energy cost is $E_T$. To estimate the reduction in cost with the proposed charge recycling scheme, consider instead a system incorporating a virtual $V_{DD}$. Let $E_{in}$ denote the total energy obtained from the various virtual ground capacitors that act as inputs to a charge pump.
is the result of collection of ground-bound charge from This switching activity in source blocks and hence can be considered free energy. When the charge pump is functional, as explained earlier, this energy is transferred in steps from the input capacitors to the output capacitor. At any instant of time, the energy drawn from this input source is given by (4)

KEUNG et al.: NOVEL CHARGE RECYCLING DESIGN SCHEME


Fig. 7. Block level implementation of charge pump recovery scheme.

where $E_{in}(0)$ is the energy available at the input at time 0 and $E_{in}(t)$ is the energy remaining at the input after time $t$. If $T$ is the period of one cycle of charge pump operation, then $\Delta E_{in}(T)$ is the magnitude of the energy drawn in one iteration of the charge pump. Let $E_{out}(t)$ denote the energy available at the output capacitor of the charge pump at time $t$. The energy drawn by the target block after time $T$ is then given by

$$\Delta E_{out}(T) = E_{out}(T) - E_{res} \tag{5}$$

where $E_{res}$ is the residual energy present on the output capacitor at the end of one iteration. Let $E_{clk}$ denote the total energy provided by the clock source during one cycle of charge pump operation in transferring charge from the intermediate capacitor to the output capacitor. Over and above the energy drawn from the clock, the charge pump also draws energy to control the various gates and to drive the comparators. To simplify the energy calculations, all these sources are combined and denoted as $E_{clk}$ in the equations. The energy dissipated in the charge pump as resistive loss in the transmission gates and in charging and discharging the various switches is combined as $E_{diss}$. Thus, during every cycle of charge pump operation, energy is drawn from the input capacitor and the clock source. Part of it is dissipated in the operation of the circuit. The rest is split between usable energy available at the output capacitor and unusable energy stored in the intermediate capacitor of the charge pump, which cannot be tapped directly. Based on the law of conservation of energy, the various energy figures can be arranged as

$$\Delta E_{in}(T) + E_{clk} = E_{out}(T) + E_{diss} + E_{rem} \tag{6}$$

where $E_{rem}$ is the remaining unusable energy stored in the intermediate capacitor. Equation (6) indicates the relationship between the various energy terms. The magnitude of each term depends on the system under consideration, the source and target blocks, and the activities they undergo. To source a target logic block that requires $E_T$ units of energy, $E_{out}(T)$ needs to equal $E_T$. The other design parameters of the system can then be determined based on $E_T$.
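The per-cycle energy balance of (6) has the form $\Delta E_{in} + E_{clk} = E_{out} + E_{diss} + E_{rem}$. Here $\Delta E_{in}$ is the free energy recovered from the virtual grounds, $E_{clk}$ is the clock and control energy, $E_{out}$ is the usable output energy, $E_{diss}$ is the dissipation, and $E_{rem}$ is the unusable residue. The balance can be solved for the external energy $E_{clk}$ needed to deliver a target energy $E_T$. The saving relative to drawing all of $E_T$ from the clean supply is then $E_T - E_{clk}$.

The sketch below treats every term as a known per-cycle number. In practice, $E_{diss}$ and $E_{rem}$ would come from simulation. The class and function names and the example values are hypothetical and are not taken from the paper's tables.

```python
from dataclasses import dataclass


@dataclass
class CycleBudget:
    """Per-cycle energy terms of one charge pump, all in the same unit (e.g. pJ)."""
    e_in_drawn: float  # Delta E_in(T): energy taken from the virtual-ground input capacitors ("free")
    e_diss: float      # E_diss: resistive and switch-charging losses inside the pump
    e_rem: float       # E_rem: unusable energy left on the intermediate capacitor


def required_clock_energy(budget: CycleBudget, e_target: float) -> float:
    """Solve Delta E_in + E_clk = E_out + E_diss + E_rem for E_clk, with E_out = E_T."""
    return e_target + budget.e_diss + budget.e_rem - budget.e_in_drawn


def savings_percent(budget: CycleBudget, e_target: float) -> float:
    """Share of E_T no longer drawn from the clean supply: 100 * (E_T - E_clk) / E_T."""
    if e_target <= 0:
        raise ValueError("e_target must be positive")
    e_clk = required_clock_energy(budget, e_target)
    return 100.0 * (e_target - e_clk) / e_target


if __name__ == "__main__":
    # Hypothetical per-cycle figures in pJ, chosen only to exercise the arithmetic.
    budget = CycleBudget(e_in_drawn=3.0, e_diss=0.8, e_rem=0.7)
    e_t = 10.0
    print(f"E_clk = {required_clock_energy(budget, e_t):.2f} pJ")
    print(f"saving = {savings_percent(budget, e_t):.1f} %")
```

Because $E_{rem}$ is counted as a loss here, the resulting percentage is conservative. This matches the remark that this figure of merit underestimates the charge pump's efficiency.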

The input energy $\Delta E_{in}(T)$ can be estimated from the voltage on the input capacitor at the end of the first cycle of charge pump operation. As explained earlier, this input energy term is considered free energy. The amount of external energy $E_{clk}$ required to generate $E_{out}(T)$ can then be computed from (6). Without recycling, the cost is the full energy $E_T$ required by the target block. The energy saved by the proposed charge recycling scheme, in terms of total energy consumption, is therefore $E_T - E_{clk}$. The percentage of energy saved, which is also an indirect estimate of the efficiency of the charge pump circuit, is given by

$$\eta = \frac{E_T - E_{clk}}{E_T} \times 100\% \tag{7}$$

Note that this figure of merit underestimates the efficiency, since it treats the residual energy $E_{rem}$ left on the intermediate capacitor as lost. In a pipelined design with a time-multiplexed charge pump, this cost is paid only once over multiple charge pumping operations.

E. Block-Level Implementation: An Example
A typical block-level implementation of the proposed scheme is shown in Fig. 7. The source blocks are selected based on the criteria explained in the previous sections and are clustered to provide virtual ground inputs to the charge pump. The system is also designed so that the computational delay of the charge pump overlaps with the delay from the source cluster to the target block. This serves two purposes. It avoids any further delay degradation in the circuit due to the adiabatic charge pump. It also ensures that the output of the charge pump is boosted to the proper supply voltage to support the expected number of transitions in the target block.

A pipelined computation has many attributes that favor the proposed charge recycling scheme. One such system is an $N$-tap FIR filter, shown in Fig. 8, which is built from delay elements, multiplier blocks, adder blocks, and the weights by which the corresponding inputs are multiplied. The proposed charge pump circuit fits conveniently into such a structure. Note that the $j$th block is executed only after all the preceding blocks 1 through $j-1$ have executed. A charge pump sourced by an earlier $i$th stage can drive a target


Fig. 8. FIR filter structure: An implementation example.

Fig. 9. Spatial implementation of multiple charge pumps.

block given by a later $j$th stage, for $i < j$. As long as blocks $i$ and $j$ are chosen carefully, the entire charge pump delay can be overlapped with the FIR computation. The logic blocks of Fig. 8 can be mapped onto SB1, SB2, and SB3 of the source block cluster in Fig. 7. The ground terminals of these source blocks are connected to capacitors that form virtual grounds at these nodes, and these capacitors serve as inputs to the charge pump. Let these virtual ground capacitors reach their threshold voltages no later than time $t_s$. Let $t_{cp}$ denote the time the charge pump takes to boost the virtual $V_{DD}$ node voltage to an acceptable level. The target block, TB, is chosen so that the source-to-target block delay is at least $t_s + t_{cp}$. Note that time $t_{cp}$ is sufficient for the charge pump to pump enough energy into its virtual $V_{DD}$. When target block transitions drain the virtual $V_{DD}$ node below a threshold, the target block is disconnected from this virtual $V_{DD}$ (charge pump) and connected to an alternate clean $V_{DD}$ source.

F. Multiple Charge Pumps-Based Implementations
In a complex system comprising several subblocks that are scattered both in spatial location on the chip and in their temporal schedule, multiple charge pumps are a better alternative to one complex charge recycling mechanism. Multiple charge pumps can also expose many more opportunities for charge recycling and thereby improve energy savings.
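The block-selection rule of Section E can be checked mechanically for a sequential pipeline such as the FIR of Fig. 8. The rule requires the target stage to start at least $t_s + t_{cp}$ after its source stage, where $t_s$ is the time for the source's virtual ground capacitors to reach threshold and $t_{cp}$ is the charge pump boost time.

The sketch below makes several assumptions. Stages run strictly one after another with known per-stage delays, and $t_s$ is measured from the start of the source stage. The function names, the 0-based indexing, and the delays used in the example are all hypothetical.

```python
from itertools import accumulate
from typing import List, Optional


def stage_start_times(stage_delays: List[float]) -> List[float]:
    """Start time of each stage when stages execute strictly one after another."""
    return [0.0] + list(accumulate(stage_delays))[:-1]


def earliest_target(stage_delays: List[float], src: int,
                    t_s: float, t_cp: float) -> Optional[int]:
    """Earliest stage j > src (0-indexed) that starts at least t_s + t_cp after
    stage src starts, i.e. late enough to hide the charge pump latency.

    t_s is measured from the start of the source stage (an assumption of this
    sketch); returns None when no later stage leaves enough time.
    """
    starts = stage_start_times(stage_delays)
    for j in range(src + 1, len(stage_delays)):
        if starts[j] - starts[src] >= t_s + t_cp:
            return j
    return None


if __name__ == "__main__":
    delays = [2.0] * 8  # hypothetical 8-stage pipeline, 2 ns per stage
    for src in range(len(delays)):
        print(src, earliest_target(delays, src, t_s=1.0, t_cp=4.0))
```

Late source stages have no partner that starts late enough, which is why the source blocks should sit early in the computation.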

The two principal schemas for deploying multiple charge pumps are as follows.
1) Spatially implementing multiple charge pumps, interlaced among different source and target blocks. This presupposes that many spatially distinct source blocks with favorable schedules with respect to their target blocks are available.
2) Time multiplexing multiple charge pumps to provide a continuous virtual $V_{DD}$ supply to certain target blocks. In this case, the same source block could be time multiplexed to provide stepwise voltages for the charge pump.

Fig. 9 shows an implementation involving multiple charge pumps to further reduce energy consumption. In this method, various source block clusters provide spatially distributed virtual grounds as inputs to different charge pumps. These in turn provide virtual supplies for various target blocks. Here, each charge pump is associated with a single target block. Target block activity that is higher than average can pull the virtual $V_{DD}$ voltage below a predetermined threshold. When this happens, the target block is disconnected from this charge pump and connected to the clean $V_{DD}$. Once the charge pump has reenergized the virtual $V_{DD}$ node enough to sustain another round of transitions, the target block is reconnected to the same charge pump. Fig. 10 shows yet another way of connecting multiple charge pumps in a circuit. Some target blocks are always fed from virtual $V_{DD}$ terminals. Such a target block cycles between multiple virtual $V_{DD}$ terminals with a cohesive recycling schedule. For


Fig. 10. Time multiplexed multiple charge pumps.

example, consider a system of two charge pumps that are time-multiplexed to feed the target block, as shown in Fig. 10. At first, the output of CP1 is connected as virtual $V_{DD}$ to the target block. While CP1 is supplying energy to the target block, charge pump CP2 is in the charging phase. Once the virtual $V_{DD}$ terminal of CP1 falls below a tolerable value, CP1 is disconnected from the target block and CP2 is connected to it. Note that since the charging periods of CP1 and CP2 are disjoint, the source blocks for the two charge pumps need to have disjoint activity periods. The scheme generalizes easily to $k$ charge pumps. Each charge pump then participates in $k-1$ charging cycles and one disjoint virtual $V_{DD}$ cycle, in which it feeds the target block.

V. SIMULATION RESULTS AND PERFORMANCE ANALYSIS
SPICE simulations were performed with many different computations and charge pump schemas to confirm and evaluate the functionality and efficiency of the proposed scheme. Results indicate that energy savings of up to 18% can be achieved for many systems without any loss in performance. The proposed scheme is an ideal candidate for large systems that can be divided into many subblocks with a sequential schedule. To estimate the performance and efficiency of the scheme, three different DSP applications, namely FIR, FFT, and DCT, were chosen. These systems are common in domains where energy efficiency is usually a major issue, for example, cellular phones. Complete transistor-level circuits of all three systems were implemented in SPICE (in a Cadence design setup) in TSMC 0.18-μm technology, with a supply voltage of 1.8 V. They were then tested with the adiabatic charge pump circuits.

A. Energy Comparison
To estimate the energy savings of the proposed scheme, all three architectures were implemented with and without the charge recovery scheme discussed in this paper, and the resulting total energy savings were computed. The systems were

first divided into smaller blocks such as multipliers, adders, or a combination of both. Fig. 11 shows the schematic diagram used in our FIR implementation. Certain blocks were chosen to supply virtual ground (source blocks) and some others to be powered from virtual $V_{DD}$ (target blocks). The source and target blocks were chosen so that they were separated by a significant computational delay. An adiabatically controlled charge pump was included in each of these systems. During the computation between the source and target blocks, the adiabatic charge pump recycles the charge and generates sufficient virtual $V_{DD}$ voltage for the target block. For all the systems, the energy drawn from the power supply pin with and without the CPs (charge pumps) was evaluated through simulation and is tabulated in Table I. Set 1 and Set 2 in the table refer to two different random input sets applied to these systems. Fig. 12 compares the percentage energy savings obtained with the proposed charge recycling scheme in different systems with different energy costs. For most of the systems, energy savings of about 15% were observed. This figure can be improved further by refining the charge pump circuits to draw minimal energy. Another variation of the scheme uses several charge pumps multiplexed in the time or space domain. It yields higher energy savings for large circuits, in which the energy overhead of the additional circuitry is small. This is illustrated in Table II, which shows energy costs for selected circuits with and without the charge pumps. Fig. 13 plots the percentage energy savings against the energy cost of the system. Note that this scheme is most beneficial, i.e., yields higher savings, when used to generate virtual $V_{DD}$ for large or high-activity logic blocks.
This is because small logic blocks with low signal activity require very little energy. In such blocks, the energy dissipated in the control circuitry, which is a constant quantity, dominates and reduces the efficiency of the scheme. In large circuits with high signal activity, on the other hand, the energy spent in the control circuitry is a negligible fraction of the total energy saved.
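The argument above can be illustrated with a toy model: a fixed control overhead weighs less as the target block grows. Purely for illustration, assume that recycled charge supplies a fixed fraction $r$ of the target block's energy $E_T$, and that the control circuitry costs a constant $E_{ctrl}$ per cycle. Neither $r$ nor $E_{ctrl}$ is a value reported in this paper. The function name is also hypothetical.

```python
def net_savings_percent(e_target: float, recovered_fraction: float,
                        e_ctrl: float) -> float:
    """Toy model: percent of a target block's energy E_T saved when a fraction
    of E_T is supplied by recycled charge but the charge pump control circuitry
    costs a fixed e_ctrl per cycle (all energies in the same unit)."""
    if e_target <= 0:
        raise ValueError("e_target must be positive")
    return 100.0 * (recovered_fraction * e_target - e_ctrl) / e_target


if __name__ == "__main__":
    r, e_ctrl = 0.20, 1.0  # hypothetical: 20 % recovery, 1 pJ fixed control cost
    for e_t in (2.0, 10.0, 100.0, 1000.0):  # target-block energy per cycle, pJ
        print(f"E_T = {e_t:7.1f} pJ -> net saving {net_savings_percent(e_t, r, e_ctrl):6.2f} %")
```

In this model, the net saving turns negative for very small blocks and approaches $r$ from below as $E_T$ grows. This mirrors the qualitative trend described in the text. The specific numbers say nothing about the measured circuits.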


Fig. 11. FIR implementation schematic.
TABLE I ENERGY COSTS WITH AND WITHOUT THE CHARGE PUMPS (CP) WITH TWO RANDOM INPUTS (SET 1 AND SET 2)

These effects are reflected in Fig. 14, which depicts the energy comparison for the implementation involving multiple time-multiplexed charge pumps. Note that time multiplexing multiple charge pumps to provide an extended virtual supply to target blocks can improve the overall energy savings in large circuits.

B. Area Comparison
We assess the area overhead of the charge-pump-based recycling scheme with respect to both transistor count and layout area. The transistor count comparison is based on the number

Fig. 12. Energy cost comparison.

of equivalent minimum-sized transistors, estimated for the systems with and without the charge pump. The various input and intermediate capacitors are equated to a representative number of minimum-feature-sized transistors. Assuming that all the capacitors are realized from the gate capacitances of transistors, each capacitance value can be translated into a fixed number


TABLE II ENERGY COSTS WITH AND WITHOUT MULTIPLEXED CHARGE PUMPS (CP)

TABLE IV LAYOUT AREA OVERHEAD WITH CHARGE RECYCLING

Fig. 15. Layout area overhead with charge recycling.
Fig. 13. Energy savings.
TABLE V DELAY FIGURES WITH AND WITHOUT THE CHARGE PUMP

Fig. 14. Energy cost comparison for time multiplexed implementation.

We also generated layouts for these system designs with and without the charge pump. These layouts were done with AMI05 (0.6-μm process) design rules. The netlists extracted from these layouts were simulated with Spectre, a SPICE-level simulator, using the BSIM3v3 model. Table IV and Fig. 15 present the area data. The increase in area is of the order of 3.4% on average. This is an acceptable overhead given the corresponding energy benefits.

C. Delay Comparison
Another figure of merit that needs to be computed is the speed of operation of the target block or, equivalently, the circuit delay. This is because the virtual $V_{DD}$ generated by the charge pump is not exactly equal to the clean $V_{DD}$. The charge pump is not an ideal voltage source with an infinite supply of charge, and its voltage drops gradually with time. A logic block powered with virtual $V_{DD}$ is therefore likely to be slightly slower than with a clean $V_{DD}$, due to the reduced voltage swing. Table V shows the delays of a few sample circuits under this lowered voltage swing, instantiated with sample inputs. Note that the percentage increase in delay due to the lower drive is negligible and does not degrade circuit performance.

D. Leakage Energy Reduction
An unexpected side effect of introducing virtual ground and virtual $V_{DD}$ nodes is a reduction in leakage current. This is because

TABLE III AREA IN TRANSISTOR COUNT WITH AND WITHOUT THE CHARGE PUMP

of minimum-sized transistors. With this model, as shown in Table III, the additional logic, including the charge pump and control circuitry, corresponds to fewer than 100 equivalent transistors. This is less than 1% of the total transistor count of the circuits considered. The energy savings can be increased further by spatial implementations of multiple charge pumps. Although this would increase the transistor count, the additional area overhead would still be a small fraction of the total system area.
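The capacitor-to-transistor equivalence can be sketched with a parallel-plate gate-oxide model, $C_g \approx C_{ox} W L$ with $C_{ox} = \varepsilon_{ox}/t_{ox}$, rounding up to whole transistors. This model ignores overlap and fringing capacitance. The oxide thickness, the minimum device dimensions, and the 20-fF capacitor are illustrative assumptions, not the parameters used in the paper, and the function names are hypothetical.

```python
import math

EPS0 = 8.854e-12        # vacuum permittivity, F/m
EPS_OX = 3.9 * EPS0     # permittivity of SiO2 (relative permittivity 3.9), F/m


def gate_capacitance(w_m: float, l_m: float, t_ox_m: float) -> float:
    """Parallel-plate gate capacitance C_ox * W * L (overlap and fringing ignored)."""
    return EPS_OX / t_ox_m * w_m * l_m


def equivalent_min_transistors(c_farads: float, w_min: float, l_min: float,
                               t_ox: float) -> int:
    """Whole number of minimum-sized transistor gates needed to realize c_farads."""
    return math.ceil(c_farads / gate_capacitance(w_min, l_min, t_ox))


if __name__ == "__main__":
    # Hypothetical 0.18-um-like device: W = 0.27 um, L = 0.18 um, t_ox = 4 nm.
    n = equivalent_min_transistors(20e-15, 0.27e-6, 0.18e-6, 4e-9)
    print(f"20 fF ~ {n} minimum-sized transistor gates")
```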


TABLE VI LEAKAGE CURRENT COMPARISON DATA

Fig. 16. % reduction in leakage current with charge recycling.

the path from the logic node to ground or $V_{DD}$ is now cut off by a capacitor whose leakage characteristics are significantly different from those of the transistors driving the logic node. Whatever charge a device leaks is now captured by the virtual ground or virtual $V_{DD}$ node instead of flowing unimpeded to ground or from $V_{DD}$. Additional benefits accrue from the non-zero virtual ground voltage level. It acts as a leakage clamp on the driving transistor, both by reducing its $V_{DS}$ and by effectively raising its threshold through the body effect. Recall that the subthreshold leakage current can be modeled as ([23, p. 201])

$$I_{sub} = A\, e^{\frac{1}{n v_T}\left(V_{GS} - V_{th0} - \gamma' V_{SB} + \eta V_{DS}\right)}\left(1 - e^{-V_{DS}/v_T}\right)$$

where $A = \mu_0 C_{ox} \frac{W}{L_{eff}} v_T^2 e^{1.8}$ and $v_T = kT/q$ is the thermal voltage. Note that the leakage current decreases exponentially as $V_{DS}$ is reduced, due to the drain-induced barrier lowering (DIBL) effect [23]. We measured the leakage current in three systems, ALU/adder, FIR, and DCT, with and without charge pumps. To assess the impact of future technology nodes, we gathered leakage current data for the AMI05 (0.5-μm), 45-nm BPTM, and 32-nm BPTM technology nodes. BPTM is the Berkeley Predictive Technology Model, now known as PTM for Predictive Technology Model [15], [16]. The following methodology was adopted for leakage current measurement. First, the average voltages at the virtual ground and virtual $V_{DD}$ nodes were measured by running an HSPICE transient simulation of our test case for one cycle. Next, a dc sweep simulation on the module ground and $V_{DD}$ determined the average leakage current at the previously measured average voltages. The result varies depending on the test case. The leakage current is usually smaller when the virtual ground voltage is high or the virtual $V_{DD}$ voltage is low. In a system without a charge pump, the ground and $V_{DD}$ nodes are held at the voltages suggested in the technology files for that node. For instance, ground is 0 V and $V_{DD}$ is 0.7 V in the 45-nm BPTM technology node. The best-case data is measured by forcing the virtual ground to $V_{IL}$ and the virtual $V_{DD}$ to $V_{IH}$, where $V_{IL}$ and $V_{IH}$ are the input voltages at which the slope of an inverter's voltage transfer characteristic equals $-1$ [26]. We present the leakage current data for the AMI05, 45-nm BPTM, and 32-nm BPTM processes. Note that the AMI06 process supports capacitors realized between the poly2 and poly layers [22]. Table VI presents the raw measured leakage current levels. Fig. 16 illustrates the percentage reduction of the charge-pump-based designs over the original designs.
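The subthreshold model above explains why a slightly raised virtual ground cuts leakage so sharply. For an off NMOS whose source sits at virtual ground voltage $V_{vg}$, with gate and body at 0 V, three terms change. $V_{GS}$ becomes $-V_{vg}$, $V_{SB}$ becomes $V_{vg}$, and $V_{DS}$ shrinks by $V_{vg}$. The sketch below evaluates the resulting current ratio, in which $A$ and $V_{th0}$ cancel. The values of $n$, $\gamma'$, and $\eta$ are hypothetical, chosen only for illustration. The 0.7-V supply is the 45-nm BPTM value quoted above.

```python
import math


def leakage_ratio(v_vg: float, vdd: float = 0.7, n: float = 1.5,
                  gamma_lin: float = 0.2, eta: float = 0.08,
                  v_t: float = 0.0259) -> float:
    """Subthreshold current of an off NMOS (gate and body at 0 V, drain at vdd)
    whose source sits on a virtual ground at v_vg, relative to v_vg = 0.

    Uses I_sub ~ exp((V_GS - V_th0 - gamma' V_SB + eta V_DS) / (n v_T))
                 * (1 - exp(-V_DS / v_T));
    the prefactor A and V_th0 cancel in the ratio.
    """
    def relative_current(v_s: float) -> float:
        v_gs = -v_s          # gate held at 0 V
        v_sb = v_s           # body held at 0 V
        v_ds = vdd - v_s
        exponent = (v_gs - gamma_lin * v_sb + eta * v_ds) / (n * v_t)
        return math.exp(exponent) * (1.0 - math.exp(-v_ds / v_t))

    return relative_current(v_vg) / relative_current(0.0)


if __name__ == "__main__":
    for v in (0.0, 0.025, 0.05, 0.1):
        print(f"virtual ground {v * 1000:5.1f} mV -> leakage x{leakage_ratio(v):.3f}")
```

Only the subthreshold current of a single device is modeled. Gate leakage, junction leakage, and transistor stacking are ignored, so these numbers are not comparable to Table VI.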

VI. CONCLUSION
The proposed charge recycling approach is a charge recovery mechanism. It cuts the $V_{DD}$-to-ground path by introducing virtual ground nodes that collect the released ground-bound charge. This charge is then recycled into virtual $V_{DD}$ nodes to supply other logic blocks that are activated later in time and hence can tolerate the intervening charge pump latency. The charge pumps use adiabatic charge transfer in order to save energy, because a charge pump operating normally would result in a net energy loss. The proposed charge recycling places adiabatic blocks only in paths that can tolerate their latency, so the scheme does not result in any performance loss. We incorporated this design methodology in several DSP filters such as FIR and DCT/IDCT. The resulting energy savings are of the order of 10% on average, and the performance loss for this energy saving is less than 1% on average. The area overhead for charge recycling is in the 1%–2% range. The leakage energy reduction averages 28.87% in the AMI05 0.5-μm process and 46% in 45-nm BPTM. Spatially and temporally multiplexed charge pumps increase the deployability of the scheme and improve the energy savings to as much as 18%. Future work includes developing CAD algorithms to identify the logic blocks that can benefit from the proposed scheme, at both the logic synthesis and netlist levels. Another direction to pursue is to assess the leakage energy advantage of charge recycling in array structures, especially in processor caches.

REFERENCES
[1] T. Arslan, D. H. Horrocks, and A. T. Erdogan, "Overview and design directions for low-power circuits and architectures for digital signal processing," in Dig. IEE Colloq. Low-Power Analog. Digit. VLSI: ASICS Techn. Appl., 1995, pp. 6/1–6/5.
[2] A. T. Erdogan, M. Hasan, and T. Arslan, "Algorithmic low power FIR cores," Proc. Inst. Elect. Eng.-Circuits, Devices Syst., vol. 150, no. 3, pp. 155–160, 2003.
[3] A. D. Garcia, J.-L. Danger, and W. Burleson, "Low power digital design in FPGAs: A study of pipeline architectures implemented in a FPGA using a low supply voltage to reduce power consumption," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), 2000, pp. 561–564.
[4] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, "Low-power CMOS digital design," IEEE J. Solid-State Circuits, vol. 27, no. 4, pp. 473–484, Apr. 1992.
[5] U. Ko, P. T. Balsara, and W. Lee, "Low-power design techniques for high-performance CMOS adders," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 3, no. 2, pp. 327–333, Jun. 1995.
[6] A. Guyot and S. J. Abou-Samra, "Low power CMOS digital design," in Proc. Int. Conf. Microelectron. (ICM), 1998, pp. IP6–IP13.


[7] Workshop Phys. Comput., Dallas, TX, 1992.
[8] Workshop Phys. Comput., Dallas, TX, 1994.
[9] J. Dickson, "On-chip high-voltage generation in nMOS integrated circuits using an improved voltage multiplier technique," IEEE J. Solid-State Circuits, vol. 11, no. 3, pp. 374–378, Jun. 1976.
[10] L. Pylarinos, "Charge pumps: An overview," Dept. Electr. Comput. Eng., Univ. Toronto, ON, Canada, (2001). [Online]. Available: http://www.eecg.toronto.edu/kphang/ece1371/chargepumps.pdf
[11] W. C. Athas, J. G. Koller, and L. J. Svensson, "An energy-efficient CMOS line driver using adiabatic switching," in Proc. IEEE 4th Great Lakes Symp. VLSI, 1994, pp. 196–199.
[12] W. C. Athas, L. J. Svensson, J. G. Koller, N. Tzartzanis, and E. Chou, "Low-power digital systems based on adiabatic-switching principles," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 2, no. 4, pp. 398–407, Dec. 1994.
[13] L. J. Svensson and J. G. Koller, "Driving a capacitive load without dissipating fCV²," in Proc. IEEE Symp. Low-Power Electron., 1994, pp. 100–101.
[14] ITRS, "International technology roadmap for semiconductors," 2005. [Online]. Available: http://public.itrs.net
[15] Arizona State Univ., Tempe, "Berkeley predictive technology model, now known as predictive technology model," (2007). [Online]. Available: http://www-device.eecs.berkeley.edu, http://www.eas.asu.edu/~ptm
[16] Y. Cao, T. Sato, D. Sylvester, M. Orshansky, and C. Hu, "New paradigm of predictive MOSFET and interconnect modeling for early circuit design," in Proc. CICC, 2000, pp. 201–204.
[17] Avant! Corporation, Mountain View, CA, Star-HSPICE 2001.4, (2001).
[18] Taiwan Semiconductor Manufacturing Company Ltd., Hsinchu, Taiwan, "Taiwan Semiconductor Manufacturing Company Ltd. homepage," (2007). [Online]. Available: http://www.tsmc.com
[19] D. Burger and T. M. Austin, "The SimpleScalar tool set," Comput. Sci. Dept., Univ. Wisconsin-Madison, Tech. Rep. #1342, 1997, Version 2.0.
[20] S. J. E. Wilton and N. P. Jouppi, "An enhanced access and cycle time model for on-chip caches," WRL Res. Tech. Rep. 93/5, 1994.
[21] P. M. Lin and L. O. Chua, "Topological generation and analysis of voltage multiplier circuits," IEEE Trans. Circuits Syst., vol. CAS-24, no. 10, pp. 517–530, Oct. 1977.
[22] MOSIS, Marina del Rey, CA, "MOSIS: The MOSIS service," (2007). [Online]. Available: http://www.mosis.org
[23] K. Roy and S. C. Prasad, Low-Power CMOS VLSI Circuit Design. New York: Wiley, 2000.
[24] U. Karthaus and M. Fischer, "Fully integrated passive UHF RFID transponder IC with 16.7-μW minimum RF input power," IEEE J. Solid-State Circuits, vol. 38, no. 10, pp. 1602–1608, Oct. 2003.
[25] S. Mukhopadhyay, A. Raychowdhury, and K. Roy, "Accurate estimation of total leakage current in scaled CMOS logic circuits based on compact current modeling," in Proc. 40th Design Autom. Conf. (DAC), 2003, pp. 169–174.
[26] J. Rabaey, A. Chandrakasan, and B. Nikolić, Digital Integrated Circuits. Englewood Cliffs, NJ: Prentice-Hall, 2003.

Ka-Ming Keung (S'06) received the B.S. degree in computer engineering from Iowa State University, Ames, in 2004, where he is currently pursuing the Ph.D. degree in computer engineering. His research interests include low-power design, logic design, and FPGA.

Vineela Manne received the B.Tech. degree in electronics and communications engineering from the Jawaharlal Nehru Technological University, Hyderabad, India, in 2001, and the M.S. degree in computer engineering from Iowa State University, Ames, in 2003. She is currently working with Micron Technology Inc., Boise, ID, as a Product Engineer in NAND Flash memory.

Akhilesh Tyagi (M'88) received the B.E. degree (honors) in electrical and electronics engineering from the Birla Institute of Technology and Science, Pilani, India, in 1981, the M.Tech. degree in computer engineering from the Indian Institute of Technology, New Delhi, India, in 1983, and the Ph.D. degree in computer science from the University of Washington, Seattle, in 1988. He is now with the Electrical and Computer Engineering Department, Iowa State University, Ames. From August 1987 to June 1993, he was an Assistant Professor with the Department of Computer Science, University of North Carolina, Chapel Hill. Subsequent to that, he was with the Department of Computer Science. His research interests include VLSI complexity theory and low energy design, secure and DRM architectures and compilers.
