INTRODUCTION
Owing to the popularity of portable electronic products, low-power design has
attracted more attention in recent years. As technology advances, a system-on-a-chip
(SoC) design can contain more components, which leads to a higher power density. This
pushes power dissipation to the limits of what packaging, cooling, or other
infrastructure can support. Reducing power consumption not only extends battery life
but also avoids overheating, which would otherwise make packaging and cooling more
difficult. Consequently, power consumption in complex SoCs has become a major
challenge for designers.
In addition, in modern VLSI designs, the power consumed by clocking accounts for a
significant part of the whole design, especially for designs using deeply scaled
CMOS technologies. Several methods have therefore been proposed to reduce the
power consumption of clocking. For a given design in which the locations of the cells
have been determined, the power consumed by clocking can be reduced further by
replacing several flip-flops with multi-bit flip-flops. During clock tree synthesis,
fewer flip-flops means fewer clock sinks. The resulting clock network therefore
consumes less power and uses fewer routing resources.
Furthermore, once smaller flip-flops are replaced by larger multi-bit flip-flops,
device variations in the corresponding circuit can be effectively reduced. As CMOS
technology progresses, the driving capability of an inverter-based clock buffer
increases significantly. The driving capability of a clock buffer can be evaluated by
the number of minimum-sized inverters that it can drive on a given rising or falling time.
Because of this phenomenon, several flip-flops can share a common clock buffer to avoid
unnecessary waste of power.
Fig.1.1 shows the block diagrams of 1- and 2-bit flip-flops. If we replace the
two 1-bit flip-flops as shown in Fig.1.1 by the 2-bit flip-flop as shown in Fig.1.2, the
total power consumption can be reduced because the two 1-bit flip-flops can share the
same clock buffer.
Fig 1.1 Two single bit flip-flops before merging and after merging
However, the locations of some flip-flops change after this replacement, and so
the wire lengths of the nets connecting pins to a flip-flop also change. To avoid
violating the timing constraints, we require that the wire lengths of nets connecting
pins to a flip-flop cannot be longer than specified values after this process. In
addition, to guarantee that a new flip-flop can be placed within the desired region,
we also need to consider the area capacity of the region.
Power plays a significant part in any design, so one may need to focus on
power reduction strategies. To reduce power consumption, many low-power
design techniques have been introduced, for example clock gating, power gating,
multi-supply-voltage designs, dynamic voltage and frequency scaling, and minimizing
the clock network. Among these techniques, minimizing and merging the clock network is
essential in reducing the power consumption of an SoC (System on Chip). Reducing
the power in a circuit design also tends to reduce its complexity and wire
length. Accordingly, different methods have been proposed [2], [3] to obtain a
design with reduced power consumption.
Power is consumed in two forms: static and dynamic power. In dynamic power, a
change of the input signal to a different logic level results in switching and
short-circuit power in the design. Static power is not affected by level changes
at the input and output. The Multi-bit Flip-flop
Dept of ECE, VLSI & ES, GVIC
EXISTING METHOD
2.1 NEED FOR LOW POWER DESIGN
In the early 1970s, designing digital circuits for high speed and minimum area
were the main design constraints. Most EDA tools were designed specifically to meet
these criteria. Power consumption was also part of the design process but not very
visible. Reducing the area of digital circuits is not as big an issue today because,
with new IC fabrication techniques, many millions of transistors can fit in a single
IC. On the other hand, shrinking circuit sizes have paved the way for reduced power
consumption in order to have an extended battery life. Also, in submicron
technologies, there is a limit on the proper functioning of circuits due to the heat
generated by power dissipation. Market forces are demanding low power not only for
better battery life but also for reliability, portability, performance, cost, and time
to market. This is especially true in the field of personal computing devices,
wireless communication systems, and home entertainment systems, which are becoming
popular nowadays. Devices used for high-performance computing in particular need to
dissipate less power to function properly and for a long period of time.
Keeping all this in mind, low power design has become one of the most important
design parameters for VLSI (Very Large Scale Integration) systems.
2.1.1 DESIGN FLOW WITH AND WITHOUT POWER
A typical top-down VLSI design approach is illustrated in Fig. 2.1.
The figure summarizes the flow of steps that must be followed from a system-level
specification to the physical design. The approach was aimed at performance
optimization and area minimization. However, introducing the third parameter of power
dissipation made designers change the flow as shown on the right-hand side of
Fig. 2.1. At each of the design levels there are two important power factors, namely
power optimization and power estimation. Power optimization is defined as the
process of obtaining the best design knowing the design constraints and without
violating the design specifications. In order to meet the design and required goals,
a power optimization technique unique to that level should be employed. Power
estimation is defined as the process of calculating the power and energy dissipated
with a certain degree of accuracy at different phases of the design process. Power
estimation techniques evaluate the effect of various optimizations and design
modifications on power at different abstraction levels.
Generally, a design performs a power optimization step first and then a power
estimation step, but within a given design level there is no specific design
procedure. Each design level includes a large collection of low power techniques.
Each may result in a significant reduction of power dissipation. However, a
certain combination of low power techniques may lead to better results than
another series of techniques.
Generally, power is consumed when capacitors in the circuits are either
charged or discharged due to switching activity. So at higher levels of a system this
power dissipation is reduced by reducing the switching activity, which is done by
shutting down portions of the system when they are not needed. Large VLSI
circuits contain different components such as a processor, a functional unit, and
controllers. The idea of power reduction is to stop any components of the processor
when they are not needed so that less power is dissipated while the processor is in
operation.
The first semiconductor chips held two transistors each. Subsequent
advances added more transistors and, as a consequence, more individual functions or
systems were integrated over time. The first integrated circuits held only a few
devices, perhaps as many as ten diodes, transistors, resistors and capacitors, making it
possible to fabricate one or more logic gates on a single device. Now known
retrospectively as small-scale integration (SSI), improvements in technique led to
devices with hundreds of logic gates, known as medium-scale integration (MSI).
Further improvements led to large-scale integration (LSI), i.e. systems with at
least a thousand logic gates. Current technology has moved far past
this mark, and today's microprocessors have many millions of gates and billions of
individual transistors.
At one time there was an effort to name and calibrate various levels
of large-scale integration above VLSI. Terms like ultra-large-scale integration
(ULSI) were used. But the huge number of gates and transistors available on
common devices has rendered such fine distinctions moot. Terms suggesting greater
than VLSI levels of integration are no longer in widespread use.
Fig 2.2: Relationship between different abstraction level & Power estimation
techniques
on the spot
power.
problem, several solutions have been proposed that overcome it by using
probabilistic measures. These approaches use probabilities as a compact
way to describe a large set of possible logic signals. Another approach
to average power estimation is to obtain the current waveform by
performing a simulation.
simulation-based
provide a rough measure of the trend of power consumption before implementation.
However, they may not be very accurate owing to the lack of implementation details.
In contrast, bottom-up methods are useful when reusing a previously
designed logic block, so that all detailed internal structures of
the circuit are known. A power macro-model is built for such logic blocks in
this kind of method. When the logic block is used in another application, the
corresponding power macro-model can be reused to estimate the power dissipation of
the block without performing any simulation at gate level or transistor level. The
usage of the power model is shown in Fig. 2.3. This kind of power modeling approach
is very useful in IP-based SoC designs.
TOOLS REQUIRED
A variety of tools is involved in this thesis. Even though this thesis is
all about simulation and power calculation of macros using tools, other tools
were used prior to the power tools to provide the required input to them.
More emphasis is given to the tools that are mainly involved in
power estimation. The tools have been classified as power tools and non-power tools.
chapter are some of the non-power tools involved in the entire design flow. A short
description of each of these tools along with its working flow is given in this
chapter to explain its functionality. The subsequent chapter discusses each of the
power tools in detail, as most of the thesis involves the use of these power
tools. The following chapter also discusses the design flow from code
writing to SPICE net-list simulation, clearly explaining the usage of these tools at
the respective level.
3.1.1 SIMULATION TOOL
Initially, Verilog or VHDL code for a particular design is written and tested.
Simulation is done using Mentor's ModelSim for both VHDL and Verilog, and other
Verilog simulators. ModelSim is a simulation and debugging tool for VHDL, Verilog,
and other mixed-language designs from Mentor Graphics. The basic simulation flow
is as shown in Fig. 2.5. Initially, a working library is created and the code is
compiled using the commands that depend on whether the code is VHDL or
Verilog.
Verilog Compiled Simulator (VCS) from Synopsys is a high-performance,
high-capacity Verilog simulator that incorporates advanced high-level
abstraction verification into an open platform. The basic work flow for VCS
consists of two basic steps:
a) Compiling source files into executable binary files
tools that comprise a complete
Synopsys power tools offer power analysis and optimization throughout the
design cycle, from RTL to the gate level. Analyzing power early in the design cycle
can significantly affect the quality of the design. Improvements made to the design
while it is at RTL level can yield even better results eventually. These power
tools not only give accurate measurements but also help calculate power more quickly.
Power consumption is calculated at three levels of abstraction. The tools used
at these levels are:
a) RTL level: RTL Power Estimator
b) Gate level: Power Compiler (based on switching activity)
c) Transistor level: NanoSim
3.2.1 POWER COMPILER
Power Compiler is an add-on product to Design Compiler. The Power
Compiler tool optimizes the design for power. Working in conjunction with the
Design Compiler tool, Power Compiler provides simultaneous optimization for
timing, power and area. In addition to the standard inputs to synthesis (RTL or
gate-level net-list, technology library), Power Compiler uses two other inputs:
switching activity of design elements and power constraints. It contains all the
analysis capabilities of Design Power.
Power Compiler uses the same power analysis engine as Design Power.
This allows Power Compiler to use the same switching activity for
optimization that Design Power uses for analysis. It accepts either user-defined
switching activity, switching activity from simulation, or a combination of both. It
provides RTL clock gating and optimizes the circuit based on circuit activity,
capacitance, and transition times. Power Compiler can be used not only as a
standalone product but also in coordination with Design Compiler,
Module Compiler, Physical Compiler and Floorplan Manager.
3.2.2. POWER COMPILER METHODOLOGY
Power Compiler is used at the RTL and gate levels to calculate power and
perform power optimization as needed. At each level of abstraction,
simulation, analysis and optimization can be performed to refine the design
before moving to the next lower level. Simulation and the resulting switching
activity give analysis and optimization the information needed to refine the
design before going to the next lower level of abstraction. The higher the level of
design abstraction, the greater the power savings that can be achieved. Fig.4.2
describes the power flow at each abstraction level. Fig 3.3 shows the power flow
from RTL to gate level. Cell internal power and net toggling directly affect the
dynamic power of a design. To report or optimize power, Power Compiler requires toggle
information for the design. This toggle information is called switching activity.
Power Compiler models switching activity in terms of static probability and
toggle rate. Static probability is the probability that a signal is at a certain logic state
and is expressed as a number between 0 and 1. It is calculated during simulation of
the design by comparing the time of a signal at a certain logic state to the total time of
the simulation. Toggle rate is the number of logic-0-to-logic-1 and logic-1-to-logic-0
transitions of a design object per unit of time.
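The two quantities defined above can be computed directly from a simulated waveform. The following is a minimal sketch, assuming the waveform is given as a time-sorted list of (time, value) events starting at time 0; the function name and the event format are illustrative assumptions and are not part of Power Compiler.

```python
def switching_activity(events, t_end):
    """Return (static probability of logic 1, toggle rate) for one signal.

    events: time-sorted list of (time, value) pairs, value in {0, 1},
            with the first event at time 0 (assumed format).
    t_end:  total simulation time.
    """
    time_high = 0.0
    toggles = 0
    # Pair each event with the next one; the last event lasts until t_end.
    following = events[1:] + [(t_end, None)]
    for (t, v), (t_next, v_next) in zip(events, following):
        if v == 1:
            time_high += t_next - t
        if v_next is not None and v_next != v:
            toggles += 1  # counts both 0-to-1 and 1-to-0 transitions
    return time_high / t_end, toggles / t_end


# Signal is high during [2, 5) and [8, 10) of a 10-unit simulation.
sp, tr = switching_activity([(0, 0), (2, 1), (5, 0), (8, 1)], t_end=10)
print(sp, tr)
```

Here the signal spends 5 of 10 time units at logic 1 and makes 3 transitions, which gives the static probability and the toggle rate per unit of time.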
The following Fig 3.5 shows the methodology of power calculation
using the combination of Power Compiler and Design Compiler. The flow of data
between the different steps and the tools used is also shown. Before starting to
calculate power using Power Compiler, the desired gate-level net-list of the design
should first be generated. The power methodology starts with the RTL design and
finishes with a power-optimized gate-level net-list. Ultimately, Power Compiler is
used to calculate power using the gate-level net-list produced by Design Compiler
or the power-optimized gate-level net-list produced by Power Compiler itself. As seen
in the figure, most of the processes use Design Compiler, but
the simulation process shown is outside the Design Compiler tool and is done as
part of power calculation.
The main purpose of simulation is to generate information about the
switching activity of the design and create a back-annotation file. This
file can contain switching activity from RTL simulation or gate-level simulation.
Initially, the RTL design is given to the HDL compiler to create a
technology-independent format called a GTECH design. This is the result of the HDL
compiler analyzing and elaborating the design. This formatted design is given as an
information compared to RTL forward-annotation file. This file
contains information from the technology library about cells with state- and
path-dependent power models. The lib2saif command is used to get this
forward-annotation file.
3. From the alphabetically ordered symbols that appear in the Symbols panel, select
the required symbols for our design and add them to the schematic file.
The required symbols are two 2-input AND gates, two 1-input inverters, and
one 2-input OR gate.
4. To connect the gates in the schematic file, select Add > Wire and use the wires to
draw the connections. You might need to zoom in on the schematic file to be able to
connect the gates.
5. To connect the input/output ports to our design, select Add > I/O Marker and
connect two ports to the input and one to the output.
Both Add > Wire and Add > I/O Marker can be found in the panel of icons that
appears at the left of the schematic.
8. Save the final schematic file my_xor.sch, which contains the final design.
In Design Panel > Processes > ISim Simulator, double-click Simulate.
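Before simulating the schematic, the expected truth table can be checked in software. The sketch below assumes the usual XOR wiring for the listed symbols (each AND gate takes one input directly and the other through an inverter, and the OR gate combines the two AND outputs); the schematic itself is not reproduced in this text, so this wiring and all function names are assumptions made for illustration.

```python
def and2(a, b):
    """2-input AND gate on bits 0/1."""
    return a & b


def or2(a, b):
    """2-input OR gate on bits 0/1."""
    return a | b


def inv(a):
    """1-input inverter on a bit 0/1."""
    return 1 - a


def my_xor(a, b):
    # Assumed wiring: (a AND NOT b) OR (NOT a AND b)
    return or2(and2(a, inv(b)), and2(inv(a), b))


# Enumerate all input combinations, as a simulator test bench would.
truth_table = [(a, b, my_xor(a, b)) for a in (0, 1) for b in (0, 1)]
for row in truth_table:
    print(row)
```

The output column should be 1 only when exactly one input is 1, which is what the simulated waveform of my_xor.sch is expected to show.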
IMPLEMENTATION
In the previous technique [1], time is wasted finding impossible combinations of
flip-flops (FFs), and many single-bit FFs are used. This may increase the
complexity. To reduce the power, the multi-bit flip-flop (MBFF) concept is
used. It requires identifying a legal placement region for each FF. In the
first stage, the feasible placement regions of an FF associated with different pins
are found based on the timing constraints defined on the pins. Then, the legal
placement region of the FF can be obtained as the overlapped area of
these regions.
However, because these regions are diamond shaped, it is not easy to
identify the overlapped region. The overlapped area can be identified
more easily if we transform the coordinate system of the cells to get
rectangular regions. In the second stage, we build a combination table,
which defines all possible combinations of FFs that yield a new
multi-bit FF provided by the library.
The flip-flops can be merged with the help of the table. After the legal
placement regions of flip-flops are found and the combination table is built,
we can use them to merge flip-flops. To speed up our program, we divide a
chip into several bins and merge flip-flops in a local bin.
However, flip-flops in different bins may also be mergeable. Thus, we
have to combine several bins into a larger bin and repeat this step until no
flip-flop can be merged any more. In this section, we detail each stage of our
method. In the first subsection, we show a simple formula to transform the
original coordinate system into a new one so that a legal placement region for
each flip-flop can be identified more easily. The second subsection
presents the flow of building the combination table. Finally, the replacement of
flip-flops is described in the last subsection.
overlapping area of several regions. As shown in Fig. 4.1(a), there are two pins p1 and
p2 connecting to a flip-flop f1, and the feasible placement regions for the two pins are
enclosed by dotted lines, which are denoted by Rp(p1) and Rp(p2), respectively.
Thus, the legal placement region R(f1) for f1 is the overlapping part of these regions.
In Fig. 4.1(b), R(f1) and R(f2) represent the legal placement regions of f1 and f2.
Because R(f1) and R(f2) overlap, we can replace f1 and f2 by a new flip-flop f3
without violating the timing constraint, as shown in Fig. 4.1(c).
However, it is not easy to identify and record feasible placement regions if their
shapes are diamonds. Moreover, four coordinates are required to record an overlapping
region [see Fig. 4.2(a)]. Thus, if we can rotate each segment by 45°, the
Fig. 4.1. (a) Feasible regions Rp(p1) and Rp(p2) for pins p1 and p2, which are
enclosed by dotted lines, and the legal region R(f1) for f1, which is enclosed by solid
lines. (b) Legal placement regions R(f1) and R(f2) for f1 and f2, and the feasible area
R3, which is the overlap region of R(f1) and R(f2). (c) New flip-flop f3 that can be
used to replace f1 and f2 without violating timing constraints for all pins p1, p2,
p3, and p4.
Fig. 4.2. (a) Overlapping region of two diamond shapes. (b) Rectangular shapes
obtained by rotating the diamond shapes in (a) by 45°.
shapes of all regions would become rectangular, which makes identification of
overlapping regions very simple. For example, the legal placement region,
enclosed by dotted lines in Fig. 4.2(a), can be identified more easily if we change its
original coordinate system [see Fig. 4.2(b)]. In such a condition, we only need two
coordinates, which are the left-bottom corner and right-top corner of a rectangle, as
shown in Fig. 4.2(b), to record the overlapped area instead of using four coordinates.
The equations used to transform the coordinate system are shown in (1) and (2).
Suppose the location of a point in the original coordinate system is denoted by (x, y).
After coordinate transformation, the new coordinate is denoted by (x', y'). In the
original transformation equations, each value needs to be divided by the square root
of 2, which would induce a longer computation time. Since we only need to know the
relative locations of flip-flops, this computation is ignored in our method.
Thus, we use x'' and y'' to denote the coordinates of transformed locations.
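The effect of the transformation can be illustrated with a short sketch. Equations (1) and (2) are not reproduced in this text, so the code assumes the standard 45° rotation with the division by the square root of 2 dropped, that is, a point (x, y) is mapped to (x + y, y - x). Under this assumption a diamond of Manhattan radius r around a pin becomes an axis-aligned square, and the overlap of two regions needs only two corner coordinates. All function names are hypothetical.

```python
def transform(x, y):
    """Assumed scaled 45-degree rotation: (x, y) -> (x + y, y - x)."""
    return x + y, y - x


def diamond_to_rect(cx, cy, r):
    """Map the diamond |x - cx| + |y - cy| <= r to a rectangle
    (x1, y1, x2, y2) in the transformed coordinate system."""
    u, v = transform(cx, cy)
    return (u - r, v - r, u + r, v + r)


def overlap(a, b):
    """Overlap of two rectangles given as (left, bottom, right, top),
    or None if they do not intersect."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return (x1, y1, x2, y2) if x1 <= x2 and y1 <= y2 else None


# Feasible regions of two pins with a maximum wire length of 4 each.
r1 = diamond_to_rect(0, 0, 4)
r2 = diamond_to_rect(4, 2, 4)
legal = overlap(r1, r2)
print(legal)
```

The left-bottom and right-top corners returned by overlap are the two coordinates that the text says are sufficient to record a legal placement region after the transformation.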
T = InitializationCombinationTable(L);
InsertPseudoType(L);
SortByBitNumber(L);
for each ni in T do
    InsertChildrens(ni, NULL, NULL);
index = 0;
while index != size(T) do
    range_first = index;
    range_second = size(T);
    index = size(T);
    for each ni in T
        for j = 1 to range_first do TypeVerify(ni, nj, T);
        for j = i to range_second do TypeVerify(ni, nj, T);
    T = DuplicateCombinationDelete(T);
    T = UnusedCombinationDelete(T);
InsertPseudoType(L):
1. for i = (bmin+1) to (bmax-1)
2.     if (L does not contain a type whose bit width is equal to i)
3.         insert a pseudo type typej with bit width i to L;
InsertChildrens(n, n1, n2):
1. n.left_child = n1;
2. n.right_child = n2;
combination whose bit width is 4, there must exist flip-flops whose bit widths are
2 and 3 in L [please see the last two binary trees in Fig. 9(e) for example]. Thus, we
have to create two pseudo types of flip-flops with 2 and 3 bit if L does not provide
these flip-flops. Function InsertPseudoType in Algorithm 1 shows how to create
Fig. 4.4. Example of building the combination table.
(a) Initialize the library L and the combination table T. (b) Pseudo types are added
into L, and the corresponding binary tree is also built for each combination in T. (c)
New combination n3 is obtained from combining two n1s. (d) New combination n4 is
obtained from combining n1 and n3, and the combination n5 is obtained from
combining two n3s. (e) New combination n6 is obtained from combining n1 and n4.
(f) Last combination table is obtained after deleting the unused combination in Fig. 4.4(e).
To delete them from the table, the two functions DuplicateCombinationDelete
and UnusedCombinationDelete are called (Lines 14 and 15). In
DuplicateCombinationDelete, it checks whether duplicated combinations exist
or not. If duplicated combinations exist, only the one with the smallest height of
its corresponding binary tree is left and the others are deleted. In
1. if (mod(b(typej), 2) == 0)
2.     b1 = ⌊b(typej)/2⌋, b2 = ⌊b(typej)/2⌋;
3. else
4.     b1 = ⌊b(typej)/2⌋, b2 = b(typej) − ⌊b(typej)/2⌋;
5. for i = 1 to 2
6.     if ((bi > bmin) && (L does not contain a type whose bit width is equal to bi))
7.         insert a pseudo type typej with bit width bi to L;
8.         PseudoTypeVerifyInsertion(typej, L);
type in L. If the combination is not included into any other combinations, it will be
deleted.
For example, suppose a library L only provides two types of flip-flops, whose
bit widths are 1 and 4 (i.e., bmin = 1 and bmax = 4), in Fig. 4.4(a). We first initialize
two combinations n1 and n2 to represent these two types of flip-flops in the table T
[see Fig. 4.4(a)]. Next, the function InsertPseudoType is performed to check whether
the flip-flop types with bit widths between 1 and 4 exist or not. Thus, two kinds of
flip-flop types whose bit widths are 2 and 3 are added into L, and all types of flip-flops
in L are sorted according to their bit widths [see Fig. 4.4(b)]. Now, for each
combination in T, we build a binary tree with 0-level, and the root of the binary
tree denotes the combination. Next, we try to build new legal combinations according
to the present combinations. By combining two 1-bit flip-flops in the first combination,
a new combination n3 can be obtained [see Fig. 4.4(c)]. Similarly, we can get a new
combination n4 (n5) by combining n1 and n3 (two n3s) [see Fig. 4.4(d)]. Finally, n6 is
obtained by combining n1 and n4. All possible combinations of flip-flops are shown in
Fig. 4.4(e). Among these combinations, n5 and n6 are duplicated since they both
represent the same condition, which replaces four 1-bit flip-flops by a 4-bit flip-flop.
To speed up our program, n6 is deleted from T rather than n5 because its height is
larger. After this procedure, n4 becomes an unused combination [see Fig. 4.4(e)] since
the root of the binary tree of n4 corresponds to the pseudo type, type3, in L and it is only
included in n6. After deleting n6, n4 also needs to be deleted. The last combination
table T is shown in Fig. 4.4(f).
In order to enumerate all possible combinations in the combination table, all
the flip-flops whose bit widths range between bmax and bmin and do not exist in L
should be inserted into L in the above procedure. However, this is time consuming.
To improve the running time, only some types of flip-flops need to be inserted. There
exist several choices if we want to build a binary tree corresponding to a type typej.
However, the complete binary tree has the smallest height. Thus, for building a binary
tree of a certain combination ni whose type is typej, only the flip-flops whose bit
widths are (⌊b(typej)/2⌋) and (b(typej) − ⌊b(typej)/2⌋) should exist in L. Algorithm 2
shows the enhanced procedure to insert flip-flops of pseudo types. For each typej in L,
the function PseudoTypeVerifyInsertion recursively checks the existence of flip-flops
whose bit widths are around ⌊b(typej)/2⌋ and adds them into L if they do not exist (see
Lines 1 and 2). In the function PseudoTypeVerifyInsertion, it divides the bit width
b(typej) into two parts ⌊b(typej)/2⌋ and ⌊b(typej)/2⌋ (⌊b(typej)/2⌋ and
b(typej) − ⌊b(typej)/2⌋) if b(typej) is an even (odd) number (see Lines 1–4 in
PseudoTypeVerifyInsertion), and it would insert a pseudo type typej into L if the type
is not provided by L and its bit width is larger than the minimum bit width (denoted by
bmin) of flip-flops in L (see Lines 5–8 in PseudoTypeVerifyInsertion). The same
procedure repeats in the newly created type. Note that this method works only when the
1-bit type exists in L. We still have to insert pseudo flip-flops by the function
InsertPseudoType in Algorithm 1 if the 1-bit flip-flop is not provided by L.
Fig. 4.5. Detailed flow to merge flip-flops.
For example, assume a library L only provides two kinds of flip-flops whose
bit widths are 1 and 7. In the new procedure, it first adds two pseudo types of
flip-flops whose bit widths are 3 and 4, respectively, for the flip-flop with 7 bit
(i.e., L becomes [1 3 4 7]). Next, the flip-flop whose bit width is 2 is added to L for
the flip-flop with 4 bit (i.e., L becomes [1 2 3 4 7]). For the flip-flop with 3 bit, the
procedure stops because flip-flops with 1 and 2 bits already exist in L. In the new
procedure, we do not need to insert 5- and 6-bit pseudo types to L.
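The enhanced insertion procedure can be sketched in a few lines. This is a minimal illustration, assuming the library is represented only by the set of bit widths it provides; the function names follow the pseudocode, while the set-based representation and the wrapper function are assumptions made for the example. The recursion visits the halves depth-first, so the order in which pseudo types are added may differ from the narrative above, but the resulting library is the same.

```python
def pseudo_type_verify_insertion(b, widths, b_min):
    """Split bit width b into floor(b/2) and b - floor(b/2) and insert
    each part as a pseudo type if it is missing and larger than b_min."""
    half = b // 2
    for part in (half, b - half):
        if part > b_min and part not in widths:
            widths.add(part)                       # new pseudo type
            pseudo_type_verify_insertion(part, widths, b_min)


def insert_pseudo_types(library_widths):
    """Hypothetical wrapper: apply the check to every type in the library
    and return the sorted list of bit widths, pseudo types included."""
    widths = set(library_widths)
    b_min = min(widths)
    for b in sorted(library_widths):
        pseudo_type_verify_insertion(b, widths, b_min)
    return sorted(widths)


result = insert_pseudo_types([1, 7])
print(result)
```

For the library with 1- and 7-bit flip-flops, the sketch adds only the 2-, 3-, and 4-bit pseudo types and skips the 5- and 6-bit ones, matching the example in the text.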
4.2.2 Merge Flip-Flops
We have shown how to build a combination table in Section III-B. Now, we
would like to show how to use the combination table to combine flip-flops in this
subsection. To reduce the complexity, we first divide the whole placement region into
several subregions, and use the combination table to replace flip-flops in each
subregion. Then, several subregions are combined into a larger subregion and the
flip-flops are replaced again so that those flip-flops in the neighboring subregions
can be replaced further. Finally, those flip-flops with pseudo types are deleted in the
last stage because they are not provided by the supported library. Fig. 4.5 shows this flow.
1) Region Partition (Optional): To speed up our program, we divide the whole chip
into several subregions. By suitable partition, the computation complexity of merging
flip-flops can be reduced significantly (the related quantitative analysis will be shown
in Section V). As shown in Fig. 11, we divide the region into several subregions, and
each subregion contains six bins, where a bin is the smallest unit of a subregion.
2) Replacement of Flip-flops in Each Subregion: Before illustrating our procedure to
merge flip-flops, we first give an equation to measure the quality if two flip-flops are
going to be replaced by a new flip-flop as follows:
cost = routing_length − α × available_area (5)
where routing_length denotes the total routing length between the new flip-flop and
the pins connected to it, and available_area represents the available area in the
feasible region for placing the new flip-flop. α is a weighting factor (the related
analysis of the value α will be shown in Section V). The cost function includes the
term routing_length to favor a replacement that induces shorter wire length. Besides, if
the region has larger available space to place a new flip-flop, it implies that it has
higher opportunities to combine with other flip-flops in the future and more power
reduction. Thus, we will give it a smaller cost. Once the flip-flops cannot be merged to
a higher-bit type (as the 4-bit combination n4 in Fig. 4.4), we ignore the
available_area in the cost function, and hence α is set to 0.
After a combination has been built, we will do the replacements of flip-flops
according to the combination table. First, we link flip-flops below the combinations
corresponding to their types in the library. Then, for each combination n in T, we
serially merge the flip-flops linked below the left child and the right child of n from
leaves to root. Algorithm 3 shows the procedure to get a new flip-flop corresponding to
the combination n. Based on its binary tree, we can find the combinations associated with
the left child and right child of the root. Hence, the flip-flops in the lists, named lleft
and lright, linked below the combinations of its left child and its right child are
checked (see Lines 2 and 3). Then, for each flip-flop fi in lleft, the best flip-flop fbest
in lright, which is the flip-flop that can be merged with fi with the smallest cost
recorded in cbest, is picked. For each pair of flip-flops in the respective list, the
combination cost [based on (5)] is computed if they can be merged and the pair with
the smallest cost is chosen (see Lines 4–11). Finally, we add a new flip-flop f' in the
list of the combination n and remove the picked flip-flops which constitute f' (see
Lines 12–14).
Fig. 4.7. Example of replacements of flip-flops. (a) Sets of flip-flops before merging.
(b) Two 1-bit flip-flops, f1 and f2, are replaced by the 2-bit flip-flop f3. (c) Two 1-bit
flip-flops, f4 and f5, are replaced by the 2-bit flip-flop f6.
(d) Two 2-bit flip-flops, f7 and f8, are replaced by the 4-bit flip-flop f9.
(e) Two 2-bit flip-flops, f3 and f6, are replaced by the 4-bit flip-flop f10.
(f) Sets of flip-flops after merging.
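The pair selection described above can be sketched as a greedy search. Algorithm 3 itself is not reproduced in this text, so the code below is only an illustration under stated assumptions: flip-flops are identified by name, the mergeability test and the cost are passed in as functions, and the cost in (5) is read as a weighted difference between routing length and available area, which is an assumed form. All names are hypothetical.

```python
def merge_cost(routing_length, available_area, alpha):
    # Assumed reading of (5): longer wires raise the cost, free area lowers it.
    return routing_length - alpha * available_area


def pick_pairs(l_left, l_right, can_merge, cost):
    """For each flip-flop in l_left, pick the unused flip-flop in l_right
    that can be merged with it at the smallest cost."""
    used = set()
    pairs = []
    for fi in l_left:
        if fi in used:
            continue
        f_best, c_best = None, float("inf")
        for fj in l_right:
            if fj == fi or fj in used or not can_merge(fi, fj):
                continue
            c = cost(fi, fj)
            if c < c_best:
                f_best, c_best = fj, c
        if f_best is not None:
            pairs.append((fi, f_best))
            used.update((fi, f_best))  # both are replaced by the new flip-flop
    return pairs


# Toy example: 1-bit flip-flops on a line, mergeable when at most 3 units apart.
pos = {"f1": 0, "f2": 1, "f4": 10, "f5": 11, "f7": 30}


def near(a, b):
    return abs(pos[a] - pos[b]) <= 3


def dist_cost(a, b):
    return merge_cost(abs(pos[a] - pos[b]), 0, 0.0)


ones = list(pos)
pairs = pick_pairs(ones, ones, near, dist_cost)
print(pairs)
```

In this toy setting, f1 is paired with f2 and f4 with f5, while f7 has no mergeable partner and stays a 1-bit flip-flop. In a real flow the mergeability test would check the overlap of legal placement regions and the capacity constraint.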
For example, given a library containing three types of flip-flops (1-, 2-, and 4-bit), we
first build a combination table T as shown in Fig. 4.7(a). In the beginning, the
flip-flops with various types are, respectively, linked below n1, n2, and n3 in
T according to their types. Suppose we want to form a flip-flop in n4, which
needs two 1-bit flip-flops according to the combination table. Each pair of flip-flops in
n1 is selected and checked to see if they can be combined (note that they also have
to satisfy the timing and capacity constraints described in Section II). If there are
several possible choices, the pair with the smallest cost value is chosen to break the
tie. In Fig. 4.7(a), f1 and f2 are chosen because their combination gains the smallest
cost. Thus, we add a new node f3 in the list below n4, and then delete f1 and f2 from
their original list [see Fig. 4.7(b)]. Similarly, f4 and f5 are combined to obtain a new
flip-flop f6, and the result is shown in Fig. 4.7(c). After all flip-flops in the
combinations of 1-level trees (n4 and n5) are obtained as shown in Fig. 4.7(d), we
start to form the flip-flops in the combinations of 2-level trees (n6 and n7). In Fig.
4.7(e), there exist some flip-flops in the lists below n2 and n4, and we will merge them
to get flip-flops in n6 and n7, respectively. Suppose there is no overlap region between
the couple of flip-flops in n2 and n4. It fails to form a 4-bit flip-flop in n6. Since the
2-bit flip-flops f3 and f6 are mergeable, we can combine them to obtain a 4-bit flip-flop
f10 in n7. Finally, because there exists no couple of flip-flops that can be combined
further, the procedure finishes as shown in Fig. 4.7(f).
Fig. 4.8. Combination of flip-flops near subregion boundaries. (a) Result of replacing
flip-flops in each subregion. (b) Result of replacing flip-flops in each new subregion,
which is obtained from combining twelve subregions in (a).
Fig. 4.9. Combination of subregions into a larger one. (a) Placement is originally
partitioned into 16 subregions for replacement. (b) Subregion bounded by bold line is
obtained from combining four neighboring subregions in (a). (c) Subregion bounded
by bold line is obtained from combining four subregions in (b).
Dept of ECE, VLSI & ES, GVIC
If the available overlap region of two flip-flops exists, we can assign a new one to
replace those flip-flops. Once there is sufficient space to place the new flip-flop in
the available region, the algorithm will perform the replacement, and the newly
generated flip-flop will be placed in the grid that makes the wire length between the
flip-flop and its connected pins smallest. If the capacity constraint of the bin, Bk,
which the grid belongs to will be violated after the new flip-flop is placed on that
grid, we will search the bins near Bk to find a new available grid for the new
flip-flop. If none of the bins which overlap the available region of the new flip-flop
can satisfy the capacity constraint after the placement of the new flip-flop, the
program will stop the replacement of the two flip-flops.
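A minimal Python sketch of this placement step is given below. It makes several simplifying assumptions that are not taken from the text: wire length is measured as Manhattan distance, the search of nearby bins is approximated by trying candidate grids in order of increasing wire length, and the names `place_flip_flop`, `bin_of`, `bin_used`, and `bin_capacity` are hypothetical.

```python
def place_flip_flop(candidate_grids, pins, bin_of, bin_used, bin_capacity, area):
    """Pick a grid for the merged flip-flop, or return None if no bin has room.

    candidate_grids -- (x, y) grid points inside the available region
    pins            -- (x, y) locations of the pins the flip-flop connects to
    bin_of          -- function mapping a grid point to its bin id
    bin_used        -- dict: bin id -> area already occupied (updated on success)
    bin_capacity    -- dict: bin id -> maximum usable area
    area            -- area of the new flip-flop
    """
    def wire_length(g):
        # Assumed metric: sum of Manhattan distances to all connected pins.
        return sum(abs(g[0] - px) + abs(g[1] - py) for px, py in pins)

    # Try grids from shortest to longest wire length and skip full bins.
    for grid in sorted(candidate_grids, key=wire_length):
        b = bin_of(grid)
        if bin_used[b] + area <= bin_capacity[b]:
            bin_used[b] += area
            return grid
    return None  # every overlapping bin is full: abandon this replacement


# Toy example: bins are 10-unit-wide columns; bin 0 is almost full.
used = {0: 9, 1: 0}
capacity = {0: 10, 1: 10}
chosen = place_flip_flop(
    candidate_grids=[(8, 5), (9, 5), (11, 5), (12, 5)],
    pins=[(9, 4), (9, 6)],
    bin_of=lambda g: g[0] // 10,
    bin_used=used,
    bin_capacity=capacity,
    area=2,
)
```

In the toy data the two grids closest to the pins fall in the full bin, so the sketch settles on the nearest grid in the neighboring bin.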
3) Bottom-Up Flow of Subregion Combinations (Optional): As shown in Fig. 4.8(a),
there may exist some flip-flops in the boundary of each subregion that cannot be
replaced by any flip-flop in its subregion. However, these flip-flops may be merged
with other flip-flops in neighboring subregions as shown in Fig. 4.8(b). Hence, to
reduce power consumption further, we can combine several subregions to obtain a larger
subregion and perform the replacement again in the new subregion. The procedure
repeats until we cannot achieve any replacement in the new subregion. Fig. 4.9 gives an
example of this hierarchical flow. As shown in Fig. 4.9(a), suppose we divide a chip
into 16 subregions in the beginning. After the replacement of flip-flops is finished in
each subregion, four subregions are combined to get a larger one as shown in
Fig. 4.9(b). If some flip-flops in the new subregions can still be replaced by new
flip-flops in other new subregions, we combine four subregions in Fig. 4.9(b) to get a
larger one as shown in Fig. 4.9(c) and perform the replacement in the new subregion
again. As the procedure repeats at a higher level, the number of mergeable flip-flops
gets fewer, so much time would be spent for little improvement in power saving.
Therefore, there exists a tradeoff between power saving and runtime in our program.
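The hierarchical flow can be sketched in Python as shown below. The sketch assumes a square grid of base subregions whose side is a power of two, merges 2 x 2 neighboring regions per level, and stops once a combined level achieves no replacement. The callback `replace_in_region` is a hypothetical stand-in for the replacement procedure, and the merge counts in the example are invented.

```python
def subregion_levels(n):
    """Yield, level by level, the grouping of an n x n grid of base subregions.

    n must be a power of two. Each level is a list of regions, and each region is
    the list of (row, col) base subregions it covers. Every level merges 2 x 2
    neighbouring regions of the previous level.
    """
    size = 1
    while size <= n:
        level = []
        for r0 in range(0, n, size):
            for c0 in range(0, n, size):
                level.append([(r, c)
                              for r in range(r0, r0 + size)
                              for c in range(c0, c0 + size)])
        yield level
        size *= 2


def hierarchical_replace(n, replace_in_region):
    """Run replacement bottom-up; stop when a combined level achieves no merge.

    replace_in_region(cells) is a hypothetical callback that performs the
    replacement inside one region and returns the number of merges it made.
    """
    history = []
    for level in subregion_levels(n):
        merges = sum(replace_in_region(cells) for cells in level)
        history.append((len(level), merges))
        if merges == 0 and len(level) < n * n:  # only combined levels end the flow
            break
    return history


# Toy example: 16 base subregions; fewer merges are found at each higher level.
history = hierarchical_replace(4, lambda cells: {1: 3, 4: 1, 16: 0}[len(cells)])
```

Each entry of `history` pairs the number of regions at a level with the merges found there, which makes the diminishing return at higher levels easy to inspect.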
4) De-Replace and Replace (Optional): Since the pseudo type is an intermediate type,
which is used to enumerate all possible combinations in the combination table T, we
have to remove the flip-flops belonging to pseudo types. Thus, after the above
procedures have been applied, we perform the de-replacement and replacement functions
if there exist any flip-flops belonging to a pseudo type. For example, if there still
exists a flip-flop, fi, belonging to n3 after the replacements in Fig. 4.7(f), we have
to de-replace fi into two flip-flops originally belonging to n1. After de-replacing, we
do the replacements of flip-flops according to T without consideration of the
combinations whose corresponding type is pseudo in L.
4.3.2 MODULES
This section focuses on three types of modules, which are explained below.
4.3.2.1 DESIGN AND ANALYSIS OF MULTI-BIT FLIP-FLOPS
This module reduces power consumption by replacing several single-bit flip-flops
with fewer multi-bit flip-flops. A multi-bit flip-flop is used in place of several
single-bit flip-flops to improve clock synchronization. This reduces the power that
would otherwise be wasted in driving numerous clock sinks.
at the clock edge, and if the data changes at other times, the output will be
unaffected. D flip-flops are by far the most common type of flip-flop, and some
devices are built entirely from D flip-flops. They are commonly used for shift
registers and input synchronization.
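The edge-triggered behaviour described above can be illustrated with a small behavioural model in Python. The model assumes a rising-edge trigger (the text only says "the clock edge") and omits clear and preset inputs; the class name `DFlipFlop` and its interface are invented for this sketch. Setting `bits` above 1 models a multi-bit flip-flop whose bits share one clock.

```python
class DFlipFlop:
    """Behavioural model of an edge-triggered D flip-flop (n bits, one shared clock)."""

    def __init__(self, bits=1):
        self.q = [0] * bits   # stored outputs, assumed to start at 0
        self._prev_clk = 0    # last clock level seen

    def step(self, clk, d):
        """Apply clock level `clk` and data bits `d`; return the current outputs."""
        if self._prev_clk == 0 and clk == 1:  # rising edge: capture the data
            self.q = list(d)
        self._prev_clk = clk
        return list(self.q)


ff = DFlipFlop(bits=2)
trace = [
    ff.step(0, [1, 0]),  # clock low: outputs keep their initial value
    ff.step(1, [1, 0]),  # rising edge: data is captured
    ff.step(1, [0, 1]),  # data changes while the clock is high: ignored
    ff.step(0, [0, 1]),  # falling edge: ignored
    ff.step(1, [0, 1]),  # next rising edge: new data is captured
]
```

The third and fourth steps show the property stated in the text: data changes away from the triggering edge leave the output unaffected.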
4.4 OBJECTIVES
1. To reduce the power consumption.
2. To reduce the area.
3. To reduce the delay and power of the clock network.
4. To control clock skew by using a common clock signal.
The above objectives can be achieved by merging several flip-flops and
synchronizing them with a common clock signal.
RESULTS
5.1 SIMULATION AND SYNTHESIS OUTPUT
This section presents the simulation and synthesis results for the different
flip-flops and for the adder that was designed as the application module.
For the single-bit flip-flop, with the clock leading edge at 0 and the trailing
edge at 1, the D input is set to 1, the clear input is set to 1, and the preset input
is set to 1. The output is then 1; the simulated waveform is shown in Fig. 5.1.
Flip-Flop Type    Delay (ns)    Clock Power (W)
1-bit             5.531         0.0127
2-bit             5.531         0.0127
4-bit             5.531         0.0130
8-bit             5.531         0.0166
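To make the reported figures easier to compare, the short Python sketch below divides each clock power value by the number of bits stored. This per-bit normalization is an assumption of the sketch and is not a metric reported in the results; the numbers themselves are copied from the results above.

```python
# Delay (ns) and clock power (W) per flip-flop width, copied from the results above.
results = {
    1: (5.531, 0.0127),
    2: (5.531, 0.0127),
    4: (5.531, 0.0130),
    8: (5.531, 0.0166),
}


def clock_power_per_bit(table):
    """Return {bits: clock power divided by the number of stored bits}."""
    return {bits: power / bits for bits, (_delay, power) in table.items()}


per_bit = clock_power_per_bit(results)
for bits, p in per_bit.items():
    print(f"{bits}-bit: {p:.6f} W per bit")
```

Under this normalization the clock power per stored bit falls as the flip-flop width grows, while the reported delay stays the same for all four types.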
CONCLUSION
This project has proposed a methodology for flip-flop substitution for power
reduction in digital integrated circuit design. The flip-flop substitution procedure
relies on the combination table, which records the relationships among the flip-flop
types. By following the substitution rules from the combination table, impossible
combinations of flip-flops are not considered, which reduces execution time. Besides
power reduction, the objective of minimizing the total wire length is also considered
in the cost function. The Verilog source code for the application module described in
the preceding sections was developed and simulated using the ISim simulator. The
source code for the single-bit and multi-bit flip-flops was also designed, simulated,
and synthesized using the Xilinx ISE Design Suite. This methodology is applicable to
any circuit comprising multiple flip-flops, such as counters and registers.
REFERENCES
Dept of ECE, VLSI & ES, GVIC
54
[1] Ya-Ting Shyu, Jai-Ming Lin, Chun-Po Huang, Cheng-Wu Lin, Ying- Zu Lin,
and Soon- Jyh Chang, 2013, Effective and efficient approach for power reduction
by using Multi-bit Flip-flops in IEEE transactions on VLSI, vol. 21, no. 4.
[2] H. Kawagachi and T. Sakurai, 1997, A reduced clock-swing flip-flop (RCSFF)
for 63% clock power reduction , in VLSI Circuits Dig. Tech. Papers Symp., pp. 97
98.
[3] Y. Cheon, P.-H. Ho, A. B. Kahng, S. Reda, and Q. Wang, 2005, Power-aware
placement , in Proc. Design Autom. Conf., pp. 795800.
[4] Y.-T. Chang, C.-C. Hsu, P.-H. Lin, Y.-W. Tsai and S.-F. Chen, 2010, Postplacement power optimization with multi-bit flip- flops , in Proc. IEEE/ACM
Comput.-Aided Design Int. Conf., SanJose, CA, pp. 218223.
[5] P. Gronowski, W. J. Bowhill, R. P. Preston, M. K. Gowan, and R.L. Allmon,
High- performance microprocessor design, IEEE J. Solid-State Circuits, vol. 33,
no. 5, pp. 676686, May 1998.
[6] L. Chen, A. Hung, H.-M. Chen, E. Y.-W. Tsai, S.-H. Chen, M.-H. Ku, and C.C.Chen,
Using
multi-bit
flip-flop
for
clock
power saving
by
Design
55