Reduction of Power Using Multi-Bit Flip-Flops

REDUCTION OF POWER BY USING MULTI-BIT FLIP-FLOPS FOR VLSI APPLICATIONS
INTRODUCTION
Because of the popularity of portable electronic products, low-power design
has attracted more attention in recent years. As technology advances, a system-on-a-chip
(SOC) design can contain more components, which leads to a higher power density. This
makes power dissipation reach the limits of what packaging, cooling, or other
infrastructure can support. Reducing power consumption not only extends battery life
but also avoids the overheating problem, which would otherwise increase the difficulty
of packaging and cooling. Consequently, power consumption in complex SOCs has
become a major challenge for designers.
In addition, in modern VLSI designs, the power consumed by clocking accounts for a
significant part of the total, particularly for designs using deeply scaled
CMOS technologies. Several methods have therefore been proposed to reduce the
power consumption of clocking. For a given design in which the locations of the cells
have been determined, the power consumed by clocking can be reduced further by
replacing several flip-flops with multi-bit flip-flops. During clock tree synthesis,
fewer flip-flops means fewer clock sinks. The resulting clock network therefore
consumes less power and uses fewer routing resources.
Furthermore, once smaller flip-flops are replaced by larger multi-bit flip-flops,
device variations in the corresponding circuit can be effectively reduced. As CMOS
technology progresses, the driving capability of an inverter-based clock buffer
increases significantly. The driving capability of a clock buffer can be evaluated by
the number of minimum-sized inverters that it can drive within a given rise or fall time.
Because of this phenomenon, several flip-flops can share a common clock buffer to
avoid unnecessary power consumption.
Fig. 1.1 shows the block diagrams of 1-bit and 2-bit flip-flops. If we replace the
two 1-bit flip-flops by the 2-bit flip-flop, both shown in Fig. 1.1, the total power
consumption can be reduced because the two 1-bit flip-flops can share the same
clock buffer.
Dept of ECE, VLSI & ES, GVIC
Fig 1.1 Two single bit flip-flops before merging and after merging
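The effect of merging on the number of clock sinks can be illustrated with a small Python sketch. It is only an illustration under simple assumptions: every flip-flop cell presents exactly one clock pin, and the function name clock_sinks and the 64-bit register count are hypothetical values chosen for the example, not figures from this work.

```python
from math import ceil


def clock_sinks(total_bits: int, bits_per_flip_flop: int) -> int:
    """Return the number of clock pins (sinks) needed to store total_bits
    when each flip-flop cell stores bits_per_flip_flop bits.

    Assumption: each flip-flop cell has exactly one clock pin.
    """
    if total_bits < 0 or bits_per_flip_flop < 1:
        raise ValueError("total_bits must be >= 0 and bits_per_flip_flop >= 1")
    return ceil(total_bits / bits_per_flip_flop)


if __name__ == "__main__":
    # Hypothetical design with 64 stored bits.
    for width in (1, 2, 4, 8):
        print(f"{width}-bit flip-flops: {clock_sinks(64, width)} clock sinks")
```

Under these assumptions, replacing two 1-bit flip-flops with one 2-bit flip-flop halves the number of clock sinks that the clock tree has to drive.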
However, the locations of some flip-flops change after this replacement, and so
do the wire lengths of the nets connecting pins to a flip-flop. To avoid violating
the timing constraints, we require that the wire lengths of nets connecting pins
to a flip-flop not exceed specified values after this process. In addition, to
guarantee that a new flip-flop can be placed within the desired region, we also
need to consider the area capacity of the region.
Power plays a significant role in any design, so a designer needs to focus on
power reduction strategies. To reduce power consumption, many low-power
design techniques have been proposed, for example clock gating, power gating,
multi-supply-voltage design, dynamic voltage and frequency scaling, and minimizing
the clock network. Among these techniques, minimizing and merging the clock network
is essential to reducing the power consumption of an SOC (System on Chip). Reducing
the power in a circuit design also tends to reduce its complexity and wire
length. Accordingly, various techniques have been proposed [2], [3] for
low-power design.
Power dissipation is divided into static and dynamic power. Dynamic power arises
when a change in an input signal between logic levels causes switching and
short-circuit power in the design. Static power does not depend on level changes
at the inputs and outputs. The multi-bit flip-flop (MBFF) is an effective power
reduction technique. It is used to reduce the number of flip-flops in the storage
stage. An MBFF handles multiple bits of data in a single flip-flop cell using a
single clock pulse. The idea of the MBFF is presented in an adder application, where
it is used to reduce the number of FFs that are not enabled in the circuit design.
MBFFs have advantages over single-bit flip-flops (SBFFs): smaller design area,
controllable clock, less delay on the clock network, and efficient use of
routing resources.
A multi-bit flip-flop works in the same way as a single-bit flip-flop: whenever
the clock enters its active state, the flip-flop latches all data to the output, and
in the idle state it holds the data. The basic structure of the multi-bit flip-flop is
given in Fig. 1; it shows that instead of using single-bit FFs we can substitute
multi-bit FFs, with 2-bit, 4-bit, and 8-bit FFs created as separate modules.
When a given number of storage bits is required, the corresponding module is
brought into the active region and the others remain in the inactive (sleep
mode) region.
In the proposed work, only the flip-flop that stores the required number of bits
is enabled, using a single clock, while the others are in sleep mode. The flip-flops
that are not enabled during the storage stage do not consume power.
Several smaller FFs are replaced by a larger MBFF that uses fewer clock
sources; moreover, device variations in the corresponding circuit can be effectively
reduced. The FFs can be merged with the help of a combination table, which is
built dynamically according to the number of storage bits required, with power
taken into consideration. The merged FFs can be used for memory applications. By
reducing the number of FFs, the clock sink area and the clock dynamic power are
effectively reduced.
An 8-bit flip-flop can be arranged using this approach, as shown below. There are
8 inputs to the D-type flip-flop, and a single clock signal enables all 8 flip-flops
and produces the corresponding outputs.
Fig 1.2 8 bit flip flop
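The behaviour described for the 8-bit flip-flop can be sketched as a small behavioural model in Python. This is an illustrative model only, not the circuit used in this work: the class name MultiBitFlipFlop and its tick method are hypothetical, and the model assumes rising-edge triggering, which the text does not specify.

```python
class MultiBitFlipFlop:
    """Behavioural model of a D-type multi-bit flip-flop with one shared clock.

    Assumption: data is captured on the rising edge of the clock; at all
    other times the stored bits are held.
    """

    def __init__(self, width: int = 8) -> None:
        self.width = width
        self.q = [0] * width      # stored output bits
        self._prev_clk = 0        # previous clock level, used for edge detection

    def tick(self, clk: int, d: list) -> list:
        """Apply clock level clk and data inputs d, and return the outputs."""
        if len(d) != self.width:
            raise ValueError("d must supply one bit per stored bit")
        if self._prev_clk == 0 and clk == 1:
            # A single clock edge captures all bits at once.
            self.q = list(d)
        self._prev_clk = clk
        return list(self.q)


if __name__ == "__main__":
    ff = MultiBitFlipFlop(8)
    print(ff.tick(0, [1, 0, 1, 0, 1, 0, 1, 0]))  # clock low: outputs hold
    print(ff.tick(1, [1, 0, 1, 0, 1, 0, 1, 0]))  # rising edge: all 8 bits captured
    print(ff.tick(1, [0] * 8))                   # clock stays high: outputs hold
```

The single tick call standing in for one shared clock pin is the point of the model: all eight bits are updated by the same clock event.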
EXISTING METHOD
2.1 NEED FOR LOW POWER DESIGN
In the early 1970s, designing digital circuits for high speed and minimum
area were the main design constraints. Most EDA tools were designed specifically
to meet these criteria. Power consumption was also part of the design process
but not very visible. Reducing the area of digital circuits is not as big an
issue today because, with new IC fabrication techniques, many millions of
transistors can fit in a single IC. On the other hand, shrinking circuit sizes
have created the need for reduced power consumption in order to obtain an
extended battery life. Also, in submicron technologies, the proper functioning of
circuits is constrained by the heat generated by power dissipation. Market forces
are demanding low power not only for longer battery life but also for reliability,
portability, performance, cost, and time to market. This is especially true of
personal computing devices, wireless communication systems, and home
entertainment systems, which are becoming popular nowadays. Devices used for
high-performance computing in particular need to dissipate less power to function
properly over a long period of time. Keeping all this in mind, low-power design
has become one of the most important design parameters for VLSI (Very Large
Scale Integration) systems.
2.1.1 DESIGN FLOW WITH AND WITHOUT POWER
A typical top-down VLSI design approach is illustrated in Fig. 2.1.
The figure summarizes the flow of steps required to go from a system-level
specification to the physical design. The approach was aimed at performance
optimization and area minimization. However, introducing the third parameter of
power dissipation made designers alter the flow, as shown on the right-hand side of
Fig. 2.1. At each design level there are two important power factors, namely
power optimization and power estimation. Power optimization is defined as the
process of obtaining the best design given the design constraints, without
violating the design specifications. To meet the design and required goals,
a power optimization technique unique to that level should be employed. Power
estimation is defined as the process of calculating the power and energy
dissipated, with a certain degree of accuracy, at different phases of the design
process. Power estimation techniques evaluate the effect of various optimizations
and design modifications on power at different abstraction levels.
Generally, a design flow performs a power optimization step first and then a power
estimation step, but within a given design level there is no specific design
procedure. Each design level includes a large collection of low-power techniques.
Each may result in a significant reduction of power dissipation. However, a
certain combination of low-power techniques may lead to better results than
another sequence of techniques.
Generally, power is consumed when capacitors in the circuit are either
charged or discharged due to switching activity. At higher levels of a system,
this power dissipation is therefore saved by reducing the switching activity, which
is done by shutting down portions of the system when they are not needed. Large VLSI
circuits contain different components such as a processor, functional units, and
controllers. The idea of power reduction is to stop any component of the processor
when it is not needed, so that less power is dissipated while the processor is in
operation.
The first semiconductor chips held two transistors each. Subsequent
advances added more transistors and, as a result, more individual functions or
systems were integrated over time. The first integrated circuits held only a few
devices, perhaps as many as ten diodes, transistors, resistors, and capacitors, making it
possible to fabricate one or more logic gates on a single device. This is now known
retrospectively as small-scale integration (SSI); improvements in technique led to
devices with hundreds of logic gates, known as medium-scale integration (MSI).
Further improvements led to large-scale integration (LSI), i.e., systems with at
least a thousand logic gates. Current technology has moved far past
this mark, and today's microprocessors have many millions of gates and billions of
individual transistors.
At one time there was an effort to name and calibrate various levels
of large-scale integration above VLSI. Terms like ultra-large-scale integration
(ULSI) were used. But the huge number of gates and transistors available on
common devices has rendered such fine distinctions moot. Terms suggesting
greater-than-VLSI levels of integration are no longer in widespread use.
Fig 2.1: VLSI design flow
2.2 RELATIONSHIP BETWEEN DIFFERENT ABSTRACTION LEVELS
The relationship between design abstraction level and power estimation
techniques is shown in Fig. 2.2. Power estimation at a high level is much
faster, but its accuracy is worse because of the limited design information.
A number of CAD techniques for power estimation at lower levels of
abstraction, such as transistor level [2-4] or gate level, have been proposed.
Generally speaking, they can provide more accurate estimation results.
However, they may become impractical for complex designs because whole-system
simulation requires too many computation resources at such low abstraction
levels. In addition, once the design has been specified down to gate level or
lower, it may be too expensive to go back and fix high-power problems. Most
importantly, IP vendors may not provide such low-level descriptions of an IP, in
order to protect their knowledge.
Fig 2.2: Relationship between different abstraction levels & power estimation techniques
2.3 BASIC CONCEPTS FOR POWER

The power dissipation of digital CMOS circuits can be described by
Pavg = Pdynamic + Pshort-circuit + Pleakage + Pstatic
2.3.1 STATIC POWER
Static power is the power dissipated by a gate when it is not switching, that is,
when it is inactive or static. Ideally, CMOS (Complementary Metal Oxide
Semiconductor) circuits dissipate no static (DC) power, since in the steady state
there is no direct path from Vdd to ground. This condition can never be
realized in practice, since in reality the MOS transistor is not a perfect
switch. There will always be leakage currents, subthreshold currents, and
substrate injection currents, which give rise to the static component of power
dissipation. The largest share of static power results from source-to-drain
subthreshold leakage, which is caused by reduced threshold voltages that prevent
the gate from turning off completely.
2.3.2 DYNAMIC POWER

Dynamic power is the power dissipated while the circuit is active. A
circuit is active whenever the voltage on a net changes due to some stimulus applied
to the circuit. In other words, dynamic power dissipation is caused by
charging and discharging. Because the voltage on an input net can change without
necessarily resulting in a logic transition at the output, dynamic power can be
dissipated even when an output net does not change its logic state. This component
of dynamic power dissipation is the result of charging and discharging parasitic
capacitances in the circuit.
Dynamic power of a circuit is composed of
a) Switching power
b) Internal power
2.3.2.1 SWITCHING POWER
The switching power of a driving cell is the power dissipated by the charging
and discharging of the load capacitance at the output of the cell. The total
load capacitance at the output of a driving cell is the sum of the net and
gate capacitances on the driving output. The charging and discharging are the
result of logic transitions. Switching power increases as logic transitions increase.
Therefore, the switching power of a cell is a function of both the total load
capacitance at the cell output and the rate of logic transitions.
Switching power comprises 70-90 percent of the power dissipation of an active
CMOS circuit.
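The dependence of switching power on load capacitance and transition rate can be made concrete with a short Python sketch. The expression used, 0.5 * C * Vdd^2 * TR, is the common textbook approximation and is an assumption here, since the text does not state a formula; the function name and the numeric values are hypothetical.

```python
def switching_power(load_capacitance_f: float, vdd_v: float, toggle_rate_hz: float) -> float:
    """Estimate the switching power of one net in watts.

    Assumed textbook model: P = 0.5 * C * Vdd^2 * TR, where TR counts both
    0-to-1 and 1-to-0 transitions per second.
    """
    return 0.5 * load_capacitance_f * vdd_v ** 2 * toggle_rate_hz


if __name__ == "__main__":
    # Hypothetical net: 10 fF load, 1.0 V supply, 100 million toggles per second.
    base = switching_power(10e-15, 1.0, 100e6)
    doubled_load = switching_power(20e-15, 1.0, 100e6)
    print(f"base: {base:.3e} W, doubled load: {doubled_load:.3e} W")
```

In this model the power scales linearly with both the load capacitance and the toggle rate, which matches the qualitative statement above.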
2.3.2.2 INTERNAL POWER
Internal power is any power dissipated within the boundary of
a cell. During switching, a circuit dissipates internal power by charging or
discharging any capacitances internal to the cell. Internal power
includes power dissipated by a momentary short circuit between the P and N
transistors of a gate, called short-circuit power.
2.3.3 SHORT-CIRCUIT POWER

The short-circuit power consumption, Pshort-circuit, is caused by the current
flowing through the direct path that exists between the power supply and
ground during the transition phase.
2.3.4 LEAKAGE POWER
The PMOS and NMOS transistors used in a CMOS logic circuit
generally have non-zero reverse leakage and subthreshold currents.
These currents contribute to the total power dissipation even when
the transistors are not performing any switching action. The leakage power
dissipation, Pleakage, is caused by two types of leakage currents:
a) Reverse-bias diode leakage current
b) Subthreshold current through a turned-off transistor channel
2.4 OVERVIEW OF POWER ESTIMATION TECHNIQUES

In our research, we focus on estimating the dynamic power
dissipation of digital circuits, which is directly related to chip heating and
battery life. This is quite different from estimating the worst-case
instantaneous power. Because this is a strongly input-pattern-dependent
problem, several solutions have been proposed to overcome it by using
probabilistic measures. These approaches use probabilities as a compact
way to describe a large set of possible logic signals. Another approach
to average power estimation is to obtain the current waveform by
performing a simulation. We refer to these methods as simulation-based
techniques. In the literature, many simulation-based approaches have been proposed
at various abstraction levels. Generally speaking, the comparison of
accuracy and speed among those approaches can be summarized as in Fig. 2.2.
The most accurate power estimation approach is to perform transistor-level
simulation, because the detailed information of the whole design is known.
Nevertheless, it has the worst efficiency because it requires too many
computation resources and takes too much time for simulation. Gate-level power
simulation techniques can provide a better trade-off between accuracy and
efficiency, but it may still cost a lot of redesign time to solve power problems
when the design is already at gate level. Compared to other approaches, it is
much harder for high-level power estimation to obtain highly accurate results,
because too much detailed information about the design is missing. However, if the
accuracy can be improved to an acceptable range, high-level power
estimation techniques will become very useful because power problems can be
detected much earlier and more quickly. In the following section, high-level
power estimation is introduced.
2.5 HIGH-LEVEL POWER ESTIMATION

To avoid expensive redesign steps for such complex designs,
designers have to estimate the power consumption at a higher design stage
to determine whether further improvements are required. It is impractical
for SOC designs to use traditional SPICE-like simulation at
transistor level, as mentioned in Chapter 1. Consequently, a number of CAD
techniques have been proposed for gate-level power estimation. However, once
the design has been implemented at gate level, it may still be too late
or too expensive to improve the design for power consumption problems. This
implies that high-level power estimation techniques are essential for designing
such a complex design with fewer redesign cycles.
Fig 2.3: A usage of high-level power model

A number of high-level power estimation techniques have been proposed and
surveyed in the literature. They are usually classified into top-down and
bottom-up styles. In the top-down techniques, a circuit is specified as a Boolean
function without detailed information about the circuit structure. Top-down methods
usually use abstract measures such as entropy to quantify the amount of
information change as an indicator of power consumption. They are useful
when designing a logic block that has not been designed before.
High-level power estimation techniques can be roughly divided
into two categories: top-down and bottom-up. In the top-down techniques, a
combinational circuit is specified only as a Boolean function, without information
on the circuit implementation.
Normally, they estimate the switching activity of
circuits by means of entropy. Entropy is a characterization of a random
variable or a random process and is commonly used in information
theory as a measure of information-carrying capacity.
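As an illustration of the entropy measure mentioned above, the following Python sketch computes the Shannon entropy of an observed sequence of signal values. It shows only the information-theoretic quantity, not a complete top-down power estimator; the function name entropy_bits and the sample sequences are hypothetical.

```python
from collections import Counter
from math import log2


def entropy_bits(samples: list) -> float:
    """Return the Shannon entropy, in bits, of the observed sample values.

    Probabilities are estimated from relative frequencies in samples.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    total = len(samples)
    counts = Counter(samples)
    return sum(-(n / total) * log2(n / total) for n in counts.values())


if __name__ == "__main__":
    constant_signal = [0, 0, 0, 0, 0, 0, 0, 0]   # carries no information
    balanced_signal = [0, 1, 1, 0, 1, 0, 0, 1]   # 0 and 1 equally likely
    print(entropy_bits(constant_signal), entropy_bits(balanced_signal))
```

A signal that never changes has zero entropy, while a binary signal that takes both values equally often reaches the maximum of one bit, which is why entropy can serve as a rough indicator of switching activity.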
Fig 2.4: High level power modeling concept

These kinds of top-down techniques are useful when one is designing a
logic block that has not been designed previously, because they can
provide a rough measure of the trend of power consumption before
implementation. However, they may not have very good accuracy owing to the
lack of implementation details.
In contrast, bottom-up methods are useful when reusing a previously
designed logic block, so that all detailed internal structures of
the circuit are known. In this kind of method, a power macro-model is
built for such logic blocks. When the logic block is
used in another application, the corresponding power macro-model can
be reused to estimate the power dissipation of the block without performing any
simulation at gate level or transistor level. The usage of the power model is
shown in Fig. 2.3. This kind of power modeling approach is very useful in
IP-based SOC designs.
TOOLS REQUIRED
A variety of tools are involved in this thesis. Although
this thesis is all about simulation and power calculation of
macros, which are done using power tools, other tools are used
before the power tools to provide the required input to them.
More emphasis is given to the tools that are mainly involved in
power estimation. The tools are classified as power tools and non-power tools.
3.1 NON-POWER TOOLS

Non-power tools include simulation tools, synthesis tools, layout
tools, extraction tools, and waveform viewers.
The tools discussed in this
chapter are some of the non-power tools involved in the complete design flow. A short
description of each of these tools, along with its working flow, is given in this
chapter to explain its functionality. The subsequent chapter discusses each of the
power tools in detail, as most of the thesis involves the use of these power
tools. The following chapter also discusses the design flow from code
writing to SPICE net-list simulation, clearly explaining the usage of these tools at
the respective levels.
3.1.1 SIMULATION TOOL
Initially, Verilog or VHDL code for a particular design is written and tested.
Simulation is done using Mentor's ModelSim for both VHDL and Verilog, and with other
Verilog simulators. ModelSim is a simulation and debugging tool from Mentor Graphics
for VHDL, Verilog, and mixed-language designs. The basic simulation flow
is as shown in Fig. 2.5. Initially, a working library is created and the code is
compiled using commands that depend on whether the code is VHDL or
Verilog.
Verilog Compiled Simulator (VCS) from Synopsys is a high-performance,
high-capacity Verilog simulator that incorporates advanced high-level
abstraction verification into an open platform. The basic workflow for VCS
consists of two basic steps:
a) Compiling source files into executable binary files
b) Running the executable binary file

This two-step approach simulates the design faster and uses less
memory than interpretive simulators. The basic design flow is given in Fig. 3.1.
Fig 3.1: shows the design flow

3.1.2 SYNTHESIS TOOL
Design Compiler is the core of the Synopsys synthesis software products. It
comprises tools that synthesize HDL designs into optimized technology-dependent,
gate-level designs. It supports a wide range of hierarchical design styles and can
optimize both combinational and sequential designs for speed, area, and power.
The basic Design Compiler (Design Vision) synthesis process is given in Fig.
3.2. Design Compiler is a powerful tool in whose environment other products can be
run using specific commands. Some of the products that can be accessed are
HDL Compiler, Automated Chip Synthesis, FPGA Compiler, Behavioral Compiler, and
Power Compiler. HDL Compiler reads and writes Verilog or VHDL design files. The
Verilog or VHDL compiler reads the HDL files and performs translation and
architectural optimization of the designs. The appropriate HDL compiler is
automatically called by Design Compiler when it reads an HDL design file.
Fig 3.2: shows basic design compiler synthesis process
3.2 POWER TOOLS

This thesis involves the use of Synopsys power tools. The power products
are tools that comprise a complete methodology for low-power design.
Synopsys power tools offer power analysis and optimization throughout the
design cycle, from RTL to the gate level. Analyzing power early in the design cycle
can significantly affect the quality of the design. Improvements made while the
design is at the RTL level can yield even better results in the end. These power
tools not only make accurate measurements but also help in calculating power
more quickly.
Power consumption is calculated at three levels of abstraction. The tools used
at these levels are:
a) RTL Level - RTL Power Estimator
b) Gate Level - Power Compiler (based on switching activity)
c) Transistor Level - NanoSim
3.2.1 POWER COMPILER
Power Compiler is an add-on product to Design Compiler. The Power
Compiler tool optimizes the design for power. Working in conjunction with the
Design Compiler tool, Power Compiler provides simultaneous optimization for
timing, power, and area. In addition to the standard inputs to synthesis (RTL or
gate-level net-list, technology library, design constraints, and parasitics), Power
Compiler uses two other inputs: switching activity of design elements and power
constraints. It contains all the analysis capabilities of Design Power.
Power Compiler uses the same power analysis engine as Design Power.
This allows Power Compiler to use the same switching activity for
optimization that Design Power uses for analysis. It accepts either user-defined
switching activity, switching activity from simulation, or a combination of both. It
provides RTL clock gating and optimizes the circuit based on circuit activity,
capacitance, and transition times. Power Compiler can be used not only as a
standalone product but also in coordination with Design Compiler,
Module Compiler, Physical Compiler, and Floorplan Manager.
3.2.2 POWER COMPILER METHODOLOGY
Power Compiler is used at the RTL and gate levels to calculate power and
perform power optimization as needed. At each level of abstraction,
simulation, analysis, and optimization can be performed to refine the design
before moving to the next lower level. Simulation and the resulting switching
activity give analysis and optimization the information needed to refine the
design before moving to the next lower level of abstraction. The higher the level of
design abstraction, the greater the power savings that can be achieved. Fig. 3.3
describes the power flow at each abstraction level, and Fig. 3.4 shows the power flow
from RTL to gate level. Cell internal power and net toggling directly affect the
dynamic power of a design. To report or optimize power, Power Compiler requires toggle
information for the design. This toggle information is called switching activity.
Fig 3.3: shows power flow at each of the abstraction level

Power Compiler models switching activity in terms of static probability and
toggle rate. Static probability is the probability that a signal is at a certain logic state
and is expressed as a number between 0 and 1. It is calculated during simulation of
the design by comparing the time of a signal at a certain logic state to the total time of
the simulation. Toggle rate is the number of logic-0-to-logic-1 and logic-1-to-logic-0
transitions of a design object per unit of time.
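The two quantities defined above can be computed from a recorded waveform as in the following Python sketch. It is a simplified illustration and not the Power Compiler implementation: the function name switching_activity, the representation of the waveform as a list of (time, value) changes, and the example numbers are all assumptions made for this example.

```python
def switching_activity(changes: list, total_time: float) -> tuple:
    """Return (static_probability_of_1, toggle_rate) for one signal.

    changes is a time-ordered list of (time, value) pairs; the first entry
    must be at time 0 and gives the initial value. total_time is the length
    of the simulation in the same time unit.
    """
    if not changes or changes[0][0] != 0 or total_time <= 0:
        raise ValueError("need an initial value at time 0 and total_time > 0")
    time_at_one = 0.0
    toggles = 0
    for index, (time, value) in enumerate(changes):
        # Each value lasts until the next change or the end of simulation.
        next_time = changes[index + 1][0] if index + 1 < len(changes) else total_time
        if value == 1:
            time_at_one += next_time - time
        # Count both 0-to-1 and 1-to-0 transitions.
        if index > 0 and value != changes[index - 1][1]:
            toggles += 1
    return time_at_one / total_time, toggles / total_time


if __name__ == "__main__":
    # Hypothetical signal over 10 time units: low, high at t=2, low at t=5, high at t=6.
    waveform = [(0, 0), (2, 1), (5, 0), (6, 1)]
    static_probability, toggle_rate = switching_activity(waveform, 10)
    print(static_probability, toggle_rate)
```

For the example waveform the signal is at logic 1 for 7 of the 10 time units and toggles 3 times, so the static probability of logic 1 is 0.7 and the toggle rate is 0.3 transitions per time unit.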
Fig. 3.5 shows the methodology of power calculation
using the combination of Power Compiler and Design Compiler. The flow of data
between the different steps and the tools used is also shown. Before calculating
power with Power Compiler, the desired gate-level net-list of the design must
first be generated. The power methodology starts with the RTL design and finishes with a
power-optimized gate-level net-list. Ultimately, Power Compiler is used to
calculate power using the gate-level net-list produced by Design Compiler
or the power-optimized gate-level net-list produced by Power Compiler itself. As seen
in the figure, most of the processes take place in Design Compiler, but
the simulation process shown is outside the Design Compiler tool and is done as
part of power calculation.
Fig 3.4: shows power flow from RTL to Gate level

The main purpose of simulation is to generate information about the
switching activity of the design and create a file called the back-annotation
file. This file can contain switching activity from RTL simulation or gate-level
simulation. Initially, the RTL design is given to HDL Compiler to create a
technology-independent format called a GTECH design. This is the result of HDL
Compiler analyzing and elaborating the design. This formatted design is given as an
input to Design Compiler. Before it is compiled by Design Compiler, the rtl2saif
command is used to create a forward-annotation file, which is later used for simulation.
The GTECH design is then given as input to Design Compiler, which
produces an output that is given to Power Compiler.
The forward-annotation SAIF file is given as an input to RTL simulation,
which produces a back-annotation SAIF file that is used by Power Compiler. The
forward-annotation file contains directives that determine which design elements are
to be traced during simulation. Gate-level simulation can also use a library
forward-annotation file. The forward-annotation file used for gate-level simulation
has different information from the RTL forward-annotation file: it
contains information from the technology library about cells with state- and
path-dependent power models. The lib2saif command is used to generate this
forward-annotation file.
Fig 3.5: shows power methodology in power compiler

During power analysis, Power Compiler uses the annotated switching activity
to evaluate the power consumption of the design. During power optimization, Power
Compiler uses the annotated switching activity to make decisions about the design.
3.3 STARTING ISE SOFTWARE

3.3.1 WINDOWS
To start ISE, double-click the desktop icon,
or go to Start > All Programs > Xilinx ISE Design Suite 14.1 > ISE Design Tools >
Project Navigator,
or select Start > Run and run the following commands:
1. C:\Xilinx\14.1\ISE_DS\settings32.bat
2. C:\Xilinx\14.1\ISE_DS\ISE\bin\nt\ise.exe
3.3.2 LINUX
To start ISE, open your terminal and run the following commands:
1. cd /local/Xilinx/13.2/ISE_DS/
2. source settings32.csh
3. ise
3.3.3 CREATE A NEW PROJECT
To create a new ISE project, select File > New Project. The page Create New
Project appears.
1. In the field Project Name, type tutorial_1. You can choose another name that does
not contain any white spaces.
2. In the field Project Location, browse to a location (directory path) for the new
project.
If you use Windows, browse to a directory under your Z drive. If you use
Linux, browse to a directory under your home directory.
Note that a tutorial_1 subdirectory is created automatically.
3. In the field Top-level source type, select > Schematic.
4. Click > Next to move to the page Project Settings.
3.3.4 PROJECT SETTINGS PAGE

1. In the field Evaluation Development Board, select > Virtex 6 ML605 Evaluation
Platform.
2. In the field Simulator, Select > ISim(VHDL/Verilog).
3. In the field Preferred Language, Select > VHDL.
4. Click > Next to move to the page Project Summary.
5. Click > Finish in the page Project Summary.
Fig 3.6: shows the project location and type
Fig 3.7: shows the specific device and project properties

3.3.5 CREATE A NEW DESIGN
To study how to create a new design, we will design in this section a 2-Input
X-OR Gate. The X-OR function is defined as:
$Y = A_1 \oplus B_1 = A_1\overline{B_1} + \overline{A_1}B_1$.
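As a quick sanity check of the sum-of-products form of the X-OR function, the short Python sketch below enumerates the truth table. It is only an illustration and not part of the ISE flow; the function name xor_sop is a hypothetical helper, and the gate structure it mimics (two inverters, two 2-input AND gates, one 2-input OR gate) is the one used later in the schematic.

```python
from itertools import product


def xor_sop(a1: int, b1: int) -> int:
    """Sum-of-products X-OR: Y = A1*not(B1) + not(A1)*B1 (hypothetical helper)."""
    not_a1 = 1 - a1          # first inverter
    not_b1 = 1 - b1          # second inverter
    and_1 = a1 & not_b1      # first 2-input AND gate
    and_2 = not_a1 & b1      # second 2-input AND gate
    return and_1 | and_2     # 2-input OR gate


# Enumerate all four input combinations as (A1, B1, Y) rows.
truth_table = [(a1, b1, xor_sop(a1, b1)) for a1, b1 in product((0, 1), repeat=2)]
for row in truth_table:
    print(row)
```

The same table is what the ISim waveform should show for Y1 against A1 and B1.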
3.3.5.1 CREATE A SCHEMATIC SOURCE
1. In ISE Design Suite, click on the Design tab on the left side of ISE to
go to the Design Panel.
2. In the Design Panel, right-click on the icon tutorial_1 and select > New Source to
move to the page Select Source Type.
Fig 3.8: shows how to create new source

3. The page Select Source Type
i) In the field File Name, type my_xor
You can choose another name that does not contain any white spaces
ii) From the Column at the left-side, select Schematic as a Source Type
iii) Tick the option > Add To Project.
iv) Click > Next to move to the page Project Summary
4. In the page Project Summary, click > Finish.
A Schematic source file my_xor.sch is added to the project
3.3.5.2 EDIT THE SCHEMATIC FILE
1. In ISE Design Suite, go to the Design Panel and open the source file my_xor.sch
by double-clicking it.
2. In ISE Design Suite, go to the Symbols Panel.
3. From the alphabetically ordered symbols that appear in the Symbols Panel, select
the required symbols for our design and add them to the schematic file.
The required symbols are two 2-input AND gates, two 1-input inverters, and
one 2-input OR gate.
4. To connect the gates in the schematic file, select > Add > Wire and use the wires
to draw the connections. You might need to zoom in on the schematic file to be able
to connect the gates.
5. To connect the input/output ports to our design, select Add > I/O Marker and
connect two ports to the inputs and one to the output.
Both Add > Wire and Add > I/O Marker can be found in the panel of icons that
appears at the left of the schematic.
Fig 3.9: shows selecting the source type

6. Rename each port by double-clicking it, selecting Nets and typing the required
name (A1, B1, or Y1).
7. To check the correctness of the schematic, select > Tools > Check Schematic.
This check finds mistakes such as floating pins or unconnected
wires. However, it cannot detect a faulty design.
8. Save the final schematic file my_xor.sch, which contains the final design.
3.4 SIMULATE DESIGN

In this section, we will simulate our design to verify that it behaves as we
expect. We will use the Integrated Simulator (ISim).
3.4.1 ISIM
1. Open the Design Panel. In the Design Panel, under View, select > Simulation.
2. In Design Panel > Hierarchy, select > my_xor_tst.vhd.
3. In Design Panel > Processes > ISim Simulator, double-click > Simulate
Behavioral Model to open the Integrated Simulator (ISim).

3.4.1.2 ISIM WINDOW
i) Zoom out to view the whole simulation time; the default simulation time is
1000 ns.
ii) The simulator shows the three ports A1, B1 and Y1.
iii) Compare the value of Y1 with A1 and B1. Y1 should always be equal to
A1 X-OR B1.
Fig 3.10: shows how to simulate
Fig 3.11: The waveforms result
IMPLEMENTATION
In the previous technique [1], time is wasted in searching for impossible
combinations of flip-flops (FFs), and many single-bit FFs are used, which
increases complexity. To reduce the power, the multi-bit flip-flop (MBFF) concept
is used. The method first has to identify a legal placement region for every FF. In
the first stage, the feasible placement regions of an FF associated with its
different pins are found based on the timing constraints defined on the pins. The
legal placement region of the FF is then obtained as the overlapped area of
these regions.
However, because these regions are diamond-shaped, it is not easy to
identify the overlapped area. The overlapped area can be identified more easily if
the coordinate system of the cells is transformed so that the regions become
rectangular. In the second stage, a combination table is built, which defines all
possible combinations of FFs that yield a new multi-bit FF provided by the
library.
The flip-flops can be merged with the help of the table. After the legal
placement regions of the flip-flops are found and the combination table is built,
we can use them to merge flip-flops. To speed up our program, we divide the
chip into several bins and merge flip-flops within a local bin.
However, flip-flops in different bins may also be mergeable. Therefore, we
combine several bins into a larger bin and repeat this step until no flip-flop
can be merged any more. In this section, we detail each stage of our
method. The first subsection gives a simple formula for transforming the
original coordinate system into a new one so that the legal placement region of
each flip-flop can be identified more easily. The second subsection
shows the flow of building the combination table. Finally, the replacement of
flip-flops is described in the last subsection.
4.1 TRANSFORMATION OF PLACEMENT SPACE

We have shown in Section II that the shape of a feasible placement region
associated with one pin pi connecting to a flip-flop fi is a diamond. Since there
may exist several pins connecting to fi, the legal placement region of fi is the
overlapping area of several regions. As shown in Fig. 4.1(a), there are two pins p1
and p2 connecting to a flip-flop f1, and the feasible placement regions for the two
pins are enclosed by dotted lines, which are denoted by Rp(p1) and Rp(p2),
respectively. Thus, the legal placement region R(f1) for f1 is the overlapping part
of these regions. In Fig. 4.1(b), R(f1) and R(f2) represent the legal placement
regions of f1 and f2. Because R(f1) and R(f2) overlap, we can replace f1 and f2 by a
new flip-flop f3 without violating the timing constraint, as shown in Fig. 4.1(c).
Fig. 4.1. (a) Feasible regions Rp(p1) and Rp(p2) for pins p1 and p2, which are
enclosed by dotted lines, and the legal region R(f1) for f1, which is enclosed by
solid lines. (b) Legal placement regions R(f1) and R(f2) for f1 and f2, and the
feasible area R3, which is the overlap region of R(f1) and R(f2). (c) New flip-flop
f3 that can be used to replace f1 and f2 without violating timing constraints for
all pins p1, p2, p3, and p4.
However, it is not easy to identify and record feasible placement regions if their
shapes are diamonds. Moreover, four coordinates are required to record an
overlapping region [see Fig. 4.2(a)]. Thus, if we rotate each segment by 45°, the
shapes of all regions become rectangular, which makes identification of
overlapping regions very simple. For example, the legal placement region
enclosed by dotted lines in Fig. 4.2(a) can be identified more easily if we change
its original coordinate system [see Fig. 4.2(b)]. In that case, we only need two
coordinates, which are the left-bottom corner and the right-top corner of a
rectangle, as shown in Fig. 4.2(b), to record the overlapped area instead of using
four coordinates.
Fig. 4.2 (a) Overlapping region of two diamond shapes. (b) Rectangular shapes
obtained by rotating the diamond shapes in (a) by 45°.
The equations used to transform the coordinate system are shown in (1) and (2).
Suppose the location of a point in the original coordinate system is denoted by
(x, y). After coordinate transformation, the new coordinate is denoted by (x', y'):
$$x' = \frac{x + y}{\sqrt{2}} \qquad (1)$$
$$y' = \frac{-x + y}{\sqrt{2}} \qquad (2)$$
In the original transformed equations, each value needs to be divided by the
square root of 2, which would induce a longer computation time. Since we only need
to know the relative locations of flip-flops, this computation is ignored in our
method. Thus, we use x'' and y'' to denote the coordinates of transformed locations:
$$x'' = \sqrt{2}\,x' = x + y \qquad (3)$$
$$y'' = \sqrt{2}\,y' = -x + y \qquad (4)$$
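The effect of the transformation can be illustrated with a short Python sketch. It assumes the scaled 45-degree rotation x'' = x + y, y'' = -x + y, and it assumes that the feasible region of a pin is the diamond of all points within a given Manhattan distance (here called radius) of the pin. The names transform, diamond_to_rect and overlap are hypothetical helpers introduced only for this illustration.

```python
from typing import Optional, Tuple

# (x_lo, y_lo, x_hi, y_hi): left-bottom and right-top corners in transformed coordinates
Rect = Tuple[int, int, int, int]


def transform(x: int, y: int) -> Tuple[int, int]:
    """Scaled 45-degree rotation (assumed form): x'' = x + y, y'' = -x + y."""
    return x + y, -x + y


def diamond_to_rect(cx: int, cy: int, radius: int) -> Rect:
    """The diamond |x - cx| + |y - cy| <= radius becomes an axis-aligned square."""
    tx, ty = transform(cx, cy)
    return tx - radius, ty - radius, tx + radius, ty + radius


def overlap(a: Rect, b: Rect) -> Optional[Rect]:
    """Overlap of two rectangles, recorded by two corners; None if they are disjoint."""
    x_lo, y_lo = max(a[0], b[0]), max(a[1], b[1])
    x_hi, y_hi = min(a[2], b[2]), min(a[3], b[3])
    if x_lo > x_hi or y_lo > y_hi:
        return None
    return x_lo, y_lo, x_hi, y_hi


# Two pins with assumed positions and slack radii (illustrative numbers only).
region_p1 = diamond_to_rect(0, 0, 4)
region_p2 = diamond_to_rect(4, 0, 4)
legal_region = overlap(region_p1, region_p2)
print(legal_region)
```

After the transformation, checking whether two regions overlap reduces to four comparisons, and the result is stored with only two corner points.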
4.2 OUR ALGORITHM

Our design flow can be roughly divided into three stages. Please see Fig. 4.3 for
our flow. In the beginning, we have to identify a legal placement region for each
flip-flop fi. First, the feasible placement regions of a flip-flop associated with
different pins are found based on the timing constraints defined on the pins. Then,
the legal placement region of the flip-flop fi can be obtained as the overlapped
area of these regions. However, because these regions are diamond-shaped, it is not
easy to identify the overlapped area. Therefore, the overlapped area can be
identified more easily if we transform the coordinate system of cells to get
rectangular regions. In the second stage, we would like to build a combination
table, which defines all possible combinations of flip-flops in order to get a new
multi-bit flip-flop provided by the library. The flip-flops can be merged with
the help of the table. After the legal placement regions of flip-flops are found
and the combination table is built, we can use them to merge flip-flops. To speed up
our program, we divide a chip into several bins and merge flip-flops in a local bin.
However, the flip-flops in different bins may be mergeable. Thus, we have to combine
several bins into a larger bin and repeat this step until no flip-flop can be merged
anymore.
Fig. 4.3 Flow chart of our algorithm.

In this section, we detail each stage of our method. In the first subsection, we
show a simple formula to transform the original coordinate system into a new one
so that a legal placement region for each flip-flop can be identified more easily.
The second subsection presents the flow of building the combination table. Finally,
the replacement of flip-flops is described in the last subsection.
4.2.1 COMBINATION TABLE

Several flip-flops can be replaced by one multi-bit flip-flop. In the proposed
methodology, a combination table is built, which is used to obtain the feasible
flip-flops before replacement. The table is also used to identify the specific
flip-flop that will be enabled in the active region. Using this combination table,
the flip-flops can be replaced step by step, which reduces the complexity of the
design. Since only one combination of flip-flops needs to be considered at a time,
the search effort is reduced.
If we want to replace several flip-flops by a new flip-flop f (note that the bit
width of f should equal the sum of the bit widths of these flip-flops), we have to
make sure that the new flip-flop f is provided by the library L when the feasible
regions of these flip-flops overlap. In this paper, we build a combination table,
which records all possible combinations of flip-flops to get feasible flip-flops
before replacements. Thus, we can gradually replace flip-flops according to the
order of the combinations of flip-flops in this table. Since only one combination of
flip-flops needs to be considered at a time, the search time can be reduced greatly.
In this subsection, we illustrate how to build a combination table.
The pseudo code for building a combination table T is shown in Algorithm 1.
We use a binary tree to represent one combination for simplicity. Each node in the
tree denotes one type of a flip-flop in L. The types of flip-flops denoted by leaves
constitute the type of the flip-flop in the root. For each node, the bit width of the
corresponding flip-flop equals the sum of the bit widths of the flip-flops denoted by
its left and right children [see Fig. 4.4(e) for an example]. Let ni denote one
combination in T, and b(ni) denote its bit width. In the beginning, we initialize a
combination ni for each kind of flip-flop in L (see Line 1). Then, in order to
represent all combinations by using a binary tree, we may add pseudo types, which
denote those flip-flops that are not provided by the library (see Line 2). For
example, assume that a library only supports two kinds of flip-flops whose bit widths
are 1 and 4, respectively. In order to use a binary tree to denote a combination
whose bit width is 4, there must exist flip-flops whose bit widths are 2 and 3 in L
[see the last two binary trees in Fig. 4.4(e) for an example]. Thus, we have to
create two pseudo types of flip-flops with 2 and 3 bits if L does not provide these
flip-flops. Function InsertPseudoType in Algorithm 1 shows how to create pseudo
types. Let bmax and bmin denote the maximum and minimum bit widths of flip-flops in
L. InsertPseudoType inserts all flip-flops whose bit widths are larger than bmin and
smaller than bmax into L if they are not provided by L originally. After this
procedure, all combinations in L are sorted according to their bit widths in
ascending order (Line 3). At present, all combinations are represented by binary
trees with 0 level. Thus, we assign NULL to the right and left child of each
combination (see Lines 4 and 5). Finally, for every two kinds of combinations in T,
we try to combine them to create a new combination (Lines 6-13). If the new
combination is a flip-flop of a feasible type (this can be checked by the function
TypeVerify), we add it to the table T. In the function TypeVerify, we first add the
bit widths of the two combinations together and store the result in bsum (see Line 1
in TypeVerify). Then, we add a new combination n to T with bit width bsum if L has
such a kind of flip-flop. After these procedures, there may exist some duplicated
or unused combinations in T. Thus, we have to delete them from the table, and the
two functions DuplicateCombinationDelete and UnusedCombinationDelete are called for
this purpose (Lines 14 and 15). DuplicateCombinationDelete checks whether duplicated
combinations exist. If they do, only the one with the smallest height of its
corresponding binary tree is kept and the others are deleted.
UnusedCombinationDelete checks the combinations whose corresponding type is a pseudo
type in L. If such a combination is not included in any other combination, it is
deleted.

Algorithm 1 Build Combination Table.
1   T = InitializationCombinationTable(L);
2   InsertPseudoType(L);
3   SortByBitNumber(L);
4   for each ni in T do
5       InsertChildrens(ni, NULL, NULL);
6   index = 0;
7   while index != size(T) do
8       range_first = index;
9       range_second = size(T);
10      index = size(T);
11      for each ni in T
12          for j = 1 to range_first do TypeVerify(ni, nj, T);
13          for j = i to range_second do TypeVerify(ni, nj, T);
14  T = DuplicateCombinationDelete(T);
15  T = UnusedCombinationDelete(T);
InsertPseudoType(L):
1   for i = (bmin + 1) to (bmax - 1)
2       if (L does not contain a type whose bit width is equal to i)
3           insert a pseudo type typej with bit width i to L;
InsertChildrens(n, n1, n2):
1   n.left_child = n1;
2   n.right_child = n2;
TypeVerify(n1, n2, T):
1   bsum = b(n1) + b(n2);
2   if (L contains a type whose bit width is bsum)
3       insert a new combination n whose bit width is bsum to T;
4       InsertChildrens(n, n1, n2);

Fig. 4.4. Example of building the combination table.
(a) Initialize the library L and the combination table T. (b) Pseudo types are added
into L, and the corresponding binary tree is also built for each combination in T.
(c) New combination n3 is obtained from combining two n1s. (d) New combination n4 is
obtained from combining n1 and n3, and the combination n5 is obtained from
combining two n3s. (e) New combination n6 is obtained from combining n1 and n4.
(f) Last combination table is obtained after deleting the unused combination in
Fig. 4.4(e).

Algorithm 2 InsertPseudoTypes (optional)
InsertPseudoType(L):
1   for each typej in L do
2       PseudoTypeVerifyInsertion(typej, L);
PseudoTypeVerifyInsertion(typej, L):
1   if (mod(b(typej), 2) == 0)
2       b1 = ⌊b(typej)/2⌋, b2 = ⌊b(typej)/2⌋;
3   else
4       b1 = ⌊b(typej)/2⌋, b2 = b(typej) - ⌊b(typej)/2⌋;
5   for i = 1 to 2
6       if ((bi > bmin) && (L does not contain a type whose bit width is equal to bi))
7           insert a pseudo type typej with bit width bi to L;
8           PseudoTypeVerifyInsertion(typej, L);
For example, suppose a library L only provides two types of flip-flops, whose
bit widths are 1 and 4 (i.e., bmin = 1 and bmax = 4), in Fig. 4.4(a). We first
initialize two combinations n1 and n2 to represent these two types of flip-flops in
the table T [see Fig. 4.4(a)]. Next, the function InsertPseudoType is performed to
check whether the flip-flop types with bit widths between 1 and 4 exist or not. Thus,
two kinds of flip-flop types whose bit widths are 2 and 3 are added into L, and all
types of flip-flops in L are sorted according to their bit widths [see Fig. 4.4(b)].
Now, for each combination in T, we build a binary tree with 0 level, and the root of
the binary tree denotes the combination. Next, we try to build new legal
combinations according to the present combinations. By combining two 1-bit
flip-flops in the first combination, a new combination n3 can be obtained [see
Fig. 4.4(c)]. Similarly, we can get a new combination n4 (n5) by combining n1 and n3
(two n3s) [see Fig. 4.4(d)]. Finally, n6 is obtained by combining n1 and n4. All
possible combinations of flip-flops are shown in Fig. 4.4(e). Among these
combinations, n5 and n6 are duplicated since they both represent the same condition,
which replaces four 1-bit flip-flops by a 4-bit flip-flop. To speed up our program,
n6 is deleted from T rather than n5 because its height is larger. After this
procedure, n4 becomes an unused combination [see Fig. 4.4(e)] since the root of the
binary tree of n4 corresponds to the pseudo type, type3, in L and it is only
included in n6. After deleting n6, n4 also needs to be deleted. The last combination
table T is shown in Fig. 4.4(f).
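The example can be reproduced with a small Python sketch. This is a simplified illustration under stated assumptions, not the implementation used in this work: two combinations are treated as duplicates when they have the same multiset of leaf bit widths, all widths between bmin and bmax are allowed as pseudo types, and the names Combo and build_combination_table are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Combo:
    """One combination: a binary tree whose root width is the sum of its children."""
    bits: int
    left: Optional["Combo"] = None
    right: Optional["Combo"] = None

    def height(self) -> int:
        if self.left is None or self.right is None:
            return 0
        return 1 + max(self.left.height(), self.right.height())

    def leaves(self) -> Tuple[int, ...]:
        if self.left is None or self.right is None:
            return (self.bits,)
        return tuple(sorted(self.left.leaves() + self.right.leaves()))


def build_combination_table(library: Set[int]) -> List[Combo]:
    b_min, b_max = min(library), max(library)
    widths = set(range(b_min, b_max + 1))   # library types plus pseudo types
    pseudo = widths - library
    # Key = multiset of leaf widths (assumed duplicate criterion).
    best: Dict[Tuple[int, ...], Combo] = {(b,): Combo(b) for b in sorted(library)}
    changed = True
    while changed:
        changed = False
        current = list(best.values())
        for a in current:
            for b in current:
                if a.bits + b.bits not in widths:
                    continue                 # no such type, not even a pseudo one
                cand = Combo(a.bits + b.bits, a, b)
                key = cand.leaves()
                if key not in best or cand.height() < best[key].height():
                    best[key] = cand         # keep the duplicate with the smallest height
                    changed = True
    table = list(best.values())
    # Repeatedly drop pseudo-type combinations that no other combination uses.
    while True:
        used = {child for c in table for child in (c.left, c.right) if child is not None}
        kept = [c for c in table if c.bits not in pseudo or c in used]
        if len(kept) == len(table):
            return kept
        table = kept


table = build_combination_table({1, 4})
summary = sorted((c.bits, c.height()) for c in table)
print(summary)
```

For the library with 1-bit and 4-bit flip-flops, the sketch keeps the two library types, the 2-bit pseudo combination, and the 4-bit combination built from two 2-bit ones, while the 3-bit pseudo combination is dropped as unused.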
In order to enumerate all possible combinations in the combination table, all
the flip-flops whose bit widths range between bmax and bmin and do not exist in L
should be inserted into L in the above procedure. However, this is time-consuming.
To improve the running time, only some types of flip-flops need to be inserted.
There exist several choices if we want to build a binary tree corresponding to a
type typej. However, the complete binary tree has the smallest height. Thus, for
building a binary tree of a certain combination ni whose type is typej, only the
flip-flops whose bit widths are ⌊b(typej)/2⌋ and (b(typej) - ⌊b(typej)/2⌋) should
exist in L. Algorithm 2 shows the enhanced procedure to insert flip-flops of pseudo
types. For each typej in L, the function PseudoTypeVerifyInsertion recursively
checks the existence of flip-flops whose bit widths are around ⌊b(typej)/2⌋ and adds
them into L if they do not exist (see Lines 1 and 2). The function
PseudoTypeVerifyInsertion divides the bit width b(typej) into two parts,
⌊b(typej)/2⌋ and ⌊b(typej)/2⌋ (⌊b(typej)/2⌋ and b(typej) - ⌊b(typej)/2⌋), if
b(typej) is an even (odd) number (see Lines 1-4 in PseudoTypeVerifyInsertion),
and it inserts a pseudo type typej into L if the type is not provided by L and its
bit width is larger than the minimum bit width (denoted by bmin) of flip-flops in L
(see Lines 5-8 in PseudoTypeVerifyInsertion). The same procedure repeats for the
newly created type. Note that this method works only when the 1-bit type exists in
L. We still have to insert pseudo flip-flops by the function InsertPseudoType in
Algorithm 1 if the 1-bit flip-flop is not provided by L.
Fig. 4.5 Detailed flow to merge flip-flops.
For example, assume a library L only provides two kinds of flip-flops whose
bit widths are 1 and 7. In the new procedure, it first adds two pseudo types of
flip-flops whose bit widths are 3 and 4, respectively, for the flip-flop with 7 bits
(i.e., L becomes [1 3 4 7]). Next, the flip-flop whose bit width is 2 is added to L
for the flip-flop with 4 bits (i.e., L becomes [1 2 3 4 7]). For the flip-flop with
3 bits, the procedure stops because flip-flops with 1 and 2 bits already exist in L.
In the new procedure, we do not need to insert 5- and 6-bit pseudo types into L.
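The two ways of inserting pseudo types can be compared with a minimal Python sketch. The library is modelled simply as a set of bit widths, and the function names insert_pseudo_types_full and insert_pseudo_types_halving are hypothetical names for the procedures of Algorithm 1 and Algorithm 2, respectively.

```python
from typing import Set


def insert_pseudo_types_full(library: Set[int]) -> Set[int]:
    """Algorithm 1 style: add every missing width between b_min and b_max."""
    return set(library) | set(range(min(library) + 1, max(library)))


def insert_pseudo_types_halving(library: Set[int]) -> Set[int]:
    """Algorithm 2 style: recursively add only the widths floor(b/2) and b - floor(b/2)."""
    b_min = min(library)
    widths = set(library)

    def verify_insert(b: int) -> None:
        b1 = b // 2            # floor(b / 2)
        b2 = b - b1            # equals b1 when b is even
        for part in (b1, b2):
            if part > b_min and part not in widths:
                widths.add(part)        # new pseudo type
                verify_insert(part)     # repeat on the newly created type

    for b in sorted(library):
        verify_insert(b)
    return widths


full = insert_pseudo_types_full({1, 7})
halved = insert_pseudo_types_halving({1, 7})
print(sorted(full), sorted(halved))
```

For the library with 1-bit and 7-bit flip-flops, the halving procedure yields the widths 1, 2, 3, 4 and 7, so the 5- and 6-bit pseudo types are never created.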
4.2.2 MERGE FLIP-FLOPS
We have shown how to build a combination table in Section 4.2.1. Now, we
would like to show how to use the combination table to combine flip-flops in this
subsection. To reduce the complexity, we first divide the whole placement region
into several subregions, and use the combination table to replace flip-flops in each
subregion. Then, several subregions are combined into a larger subregion and the
flip-flops are replaced again so that those flip-flops in the neighboring subregions
can be replaced further. Finally, those flip-flops with pseudo types are deleted in
the last stage because they are not provided by the supported library. Fig. 4.5
shows this flow.
1) Region Partition (Optional): To speed up our program, we divide the whole chip
into several subregions. By suitable partition, the computation complexity of
merging flip-flops can be reduced significantly (the related quantitative analysis
will be shown in Section V). As shown in Fig. 4.6, we divide the region into several
subregions, and each subregion contains six bins, where a bin is the smallest unit
of a subregion.
Fig. 4.6 Example of region partition with six bins in one subregion.
2) Replacement of Flip-Flops in Each Subregion: Before illustrating our procedure
to merge flip-flops, we first give an equation to measure the quality if two
flip-flops are going to be replaced by a new flip-flop as follows:
$$\text{cost} = \text{routing\_length} - \alpha \times \sqrt{\text{available\_area}} \qquad (5)$$
where routing_length denotes the total routing length between the new flip-flop and
the pins connected to it, and available_area represents the available area in the
feasible region for placing the new flip-flop. α is a weighting factor (the related
analysis of the value α will be shown in Section V). The cost function includes the
term routing_length to favor a replacement that induces shorter wirelength. Besides,
if the region has larger available space to place a new flip-flop, it implies that
it has higher opportunities to combine with other flip-flops in the future and more
power reduction. Thus, we give it a smaller cost. Once the flip-flops cannot be
merged to a higher-bit type (as the 4-bit combination n4 in Fig. 4.4), we ignore the
available_area in the cost function, and hence α is set to 0.
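A minimal Python sketch of this cost measure is given below. It assumes that the cost has the form routing_length - alpha * sqrt(available_area), where alpha stands for the weighting factor; the function name merge_cost, the flag can_merge_higher and the numeric values are hypothetical and serve only as an illustration.

```python
import math


def merge_cost(routing_length: float, available_area: float,
               alpha: float, can_merge_higher: bool = True) -> float:
    """Assumed form of the cost: routing_length - alpha * sqrt(available_area)."""
    if not can_merge_higher:
        alpha = 0.0   # available_area is ignored for the highest-bit type
    return routing_length - alpha * math.sqrt(available_area)


# Same wirelength, different free space in the feasible region (illustrative values).
cost_small_region = merge_cost(routing_length=10.0, available_area=4.0, alpha=0.5)
cost_large_region = merge_cost(routing_length=10.0, available_area=16.0, alpha=0.5)
cost_top_type = merge_cost(10.0, 16.0, alpha=0.5, can_merge_higher=False)
print(cost_small_region, cost_large_region, cost_top_type)
```

With equal routing length, the candidate with the larger available area receives the smaller cost, which matches the intention described in the text.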
After a combination table has been built, we do the replacements of flip-flops
according to the combination table. First, we link flip-flops below the combinations
corresponding to their types in the library. Then, for each combination n in T, we
serially merge the flip-flops linked below the left child and the right child of n
from leaves to root. Algorithm 3 shows the procedure to get a new flip-flop
corresponding to the combination n. Based on its binary tree, we can find the
combinations associated with the left child and right child of the root. Hence, the
flip-flops in the lists, named lleft and lright, linked below the combinations of
its left child and its right child are checked (see Lines 2 and 3). Then, for each
flip-flop fi in lleft, the best flip-flop fbest in lright, which is the flip-flop
that can be merged with fi with the smallest cost recorded in cbest, is picked. For
each pair of flip-flops in the respective lists, the combination cost [based on (5)]
is computed if they can be merged, and the pair with the smallest cost is chosen
(see Lines 4-11). Finally, we add a new flip-flop f to the list of the combination n
and remove the picked flip-flops which constitute f (see Lines 12-14).
Fig. 4.7. Example of replacements of flip-flops. (a) Sets of flip-flops before
merging. (b) Two 1-bit flip-flops, f1 and f2, are replaced by the 2-bit flip-flop
f3. (c) Two 1-bit flip-flops, f4 and f5, are replaced by the 2-bit flip-flop f6.
(d) Two 2-bit flip-flops, f7 and f8, are replaced by the 4-bit flip-flop f9.
(e) Two 2-bit flip-flops, f3 and f6, are replaced by the 4-bit flip-flop f10.
(f) Sets of flip-flops after merging.
For example, given a library containing three types of flip-flops (1-, 2-, and
4-bit), we first build a combination table T as shown in Fig. 4.7(a). In the
beginning, the flip-flops with various types are, respectively, linked below n1, n2,
and n3 in T according to their types. Suppose we want to form a flip-flop in n4,
which needs two 1-bit flip-flops according to the combination table. Each pair of
flip-flops in n1 is selected and checked to see if they can be combined (note that
they also have to satisfy the timing and capacity constraints described in
Section II). If there are several possible choices, the pair with the smallest cost
value is chosen to break the tie. In Fig. 4.7(a), f1 and f2 are chosen because their
combination gains the smallest cost. Thus, we add a new node f3 in the list below
n4, and then delete f1 and f2 from their original list [see Fig. 4.7(b)]. Similarly,
f4 and f5 are combined to obtain a new flip-flop f6, and the result is shown in
Fig. 4.7(c). After all flip-flops in the combinations of 1-level trees (n4 and n5)
are obtained as shown in Fig. 4.7(d), we start to form the flip-flops in the
combinations of 2-level trees (n6 and n7). In Fig. 4.7(e), there exist some
flip-flops in the lists below n2 and n4, and we merge them to get flip-flops in n6
and n7, respectively. Suppose there is no overlap region between the couple of
flip-flops in n2 and n4. It fails to form a 4-bit flip-flop in n6. Since the 2-bit
flip-flops f3 and f6 are mergeable, we can combine them to obtain a 4-bit flip-flop
f10 in n7. Finally, because there exists no couple of flip-flops that can be
combined further, the procedure finishes as shown in Fig. 4.7(f).
Fig. 4.8. Combination of flip-flops near subregion boundaries. (a) Result of
replacing flip-flops in each subregion. (b) Result of replacing flip-flops in each
new subregion, which is obtained from combining twelve subregions in (a).
Fig. 4.9. Combination of subregions into a larger one. (a) Placement is originally
partitioned into 16 subregions for replacement. (b) Subregion bounded by the bold
line is obtained from combining four neighboring subregions in (a). (c) Subregion
bounded by the bold line is obtained from combining four subregions in (b).
If the available overlap region of two flip-flops exists, we can assign a new one
to replace those flip-flops. Once there is sufficient space to place the new
flip-flop in the available region, the algorithm performs the replacement, and the
newly generated flip-flop is placed in the grid that makes the wirelength between
the flip-flop and its connected pins smallest. If the capacity constraint of the
bin Bk to which the grid belongs would be violated after the new flip-flop is placed
on that grid, we search the bins near Bk to find a new available grid for the new
flip-flop. If none of the bins which overlap the available region of the new
flip-flop can satisfy the capacity constraint after the placement of the new
flip-flop, the program stops the replacement of the two flip-flops.
3) Bottom-Up Flow of Subregion Combinations (Optional): As shown in Fig. 4.8(a),
there may exist some flip-flops at the boundary of each subregion that cannot be
replaced by any flip-flop in its subregion. However, these flip-flops may be merged
with other flip-flops in neighboring subregions as shown in Fig. 4.8(b). Hence, to
reduce power consumption further, we can combine several subregions to obtain a
larger subregion and perform the replacement again in the new subregion. The
procedure repeats until we cannot achieve any replacement in the new subregion.
Fig. 4.9 gives an example of this hierarchical flow. As shown in Fig. 4.9(a),
suppose we divide a chip into 16 subregions in the beginning. After the replacement
of flip-flops is finished in each subregion, four subregions are combined to get a
larger one as shown in Fig. 4.9(b). Suppose some flip-flops in the new subregions
can still be replaced by new flip-flops in other new subregions; we then combine
four subregions in Fig. 4.9(b) to get a larger one as shown in Fig. 4.9(c) and
perform the replacement in the new subregion again. As the procedure repeats at a
higher level, the number of mergeable flip-flops gets smaller. However, it would
take much time to get little improvement in power saving. Thus, there exists a
tradeoff between power saving and run time in our program.
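The bottom-up grouping of subregions can be sketched in a few lines of Python. The sketch assumes that subregions are indexed by (column, row) on a regular grid and that four neighboring subregions (a 2 x 2 block) are combined at each level; the function name combine_level and the flip-flop labels are hypothetical. The replacement step itself is not modelled here, only the grouping.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Cell = Tuple[int, int]   # (column, row) index of a subregion


def combine_level(groups: Dict[Cell, List[str]]) -> Dict[Cell, List[str]]:
    """Combine each 2 x 2 block of neighboring subregions into one larger subregion."""
    merged: Dict[Cell, List[str]] = defaultdict(list)
    for (col, row), flip_flops in groups.items():
        merged[(col // 2, row // 2)].extend(flip_flops)
    return dict(merged)


# 16 subregions on a 4 x 4 grid, each holding one leftover flip-flop (illustrative).
level0 = {(c, r): [f"ff_{c}_{r}"] for c in range(4) for r in range(4)}
level1 = combine_level(level0)   # 4 larger subregions
level2 = combine_level(level1)   # 1 subregion covering the whole chip
print(len(level0), len(level1), len(level2))
```

After each call, the replacement procedure would be run again inside every enlarged subregion, and the loop would stop once a level produces no further replacement.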
4) De-Replace and Replace (Optional): Since the pseudo type is an intermediate
type, which is used to enumerate all possible combinations in the combination table
T, we have to remove the flip-flops belonging to pseudo types. Thus, after the above
procedures have been applied, we perform the de-replacement and replacement
functions if there exist any flip-flops belonging to a pseudo type. For example, if
there still exists a flip-flop fi belonging to n3 after the replacements in
Fig. 4.4(f), we have to de-replace fi into two flip-flops originally belonging to
n1. After de-replacing, we do the replacements of flip-flops according to T without
consideration of the combinations whose corresponding type is pseudo in L.
4.3 BLOCK DIAGRAM AND ITS MODULES

This deals with the block diagram of the proposed method and its modules.
4.3.1 BLOCK DIAGRAM
The block diagram of the application of multi-bit flip-flops using a QCL adder
is shown in Fig 4.10. Two inputs are given to the QCL adder. The QCL adder is built
from majority-logic XOR, AND and OR gates. The output of the QCL adder is fed to the
highest bit "1" finding algorithm. This algorithm finds the number of bits, and the
combination table is built in order to merge the flip-flops; the result is stored in
the variable register banks.
4.3.2 MODULES
This focuses on three different types of modules which are explained below.
4.3.2.1 DESIGN AND ANALYSIS OF MULTI-BIT FLIP-FLOPS
This module is used to reduce the power consumption by replacing several
single-bit flip-flops with fewer multi-bit flip-flops. Multi-bit flip-flops are used
instead of many single-bit flip-flops to improve clock synchronization. This reduces
the unnecessary power wasted by driving numerous clock sinks.
Fig 4.10 Block diagram

4.3.2.2 DESIGN OF MEMORY DEVICE USING MULTI-BIT FLIP FLOP

This is the application module to be developed. The memory is designed
mainly using multi-bit flip-flops. With this design, the power consumption of the
memory device is reduced compared to a memory built from single-bit flip-flops.
4.3.2.3 DESIGN AND ANALYSIS OF THE INTEGRATION MODULE
We integrate all the sub-modules and simulate the output signals. The
1-bit, 2-bit, 4-bit and 8-bit FFs are created as partitioned blocks as shown
in Fig. 3. The two inputs are a and b; a is represented as input1 and b as
input2. These two inputs are added and the result is stored in the FF. After that,
the number of bits available in the region is checked. The selected FFs are used
when they are enabled, and the output is displayed. This reduces the power and delay
of the design. Low power in turn affects cost, size, weight, performance and
reliability.
The multiplier application can also be implemented in this proposed work.
Instead of adding the bits, they can be multiplied, and the result is stored in the
specific enabled flip-flop. For example, assume that a library supports only two
kinds of flip-flops whose bit widths are 1 and 4; the specific flip-flop is then
selected and enabled in the active region, while the others remain in sleep mode
(inactive region).
The D-FF is used in this proposed work. It provides synchronous data
transfer and is used for storage. Unlike a latch, an FF copies the data from the
input pin to the output only once per clock period and does not allow multiple
logic values to pass through in a clock cycle. Data is transferred at either the
rising or the falling clock edge, depending on the flip-flop configuration. Unlike a
latch, an FF is not level-sensitive but edge-triggered. In other words, data is
stored into an FF only at the active edge of the clock. The 16-bit FF can also be
generated as shown in Fig. 4; it reduces the power and the number of memory devices
compared to single-bit flip-flops. In general, the adder libraries comprise AND and
XOR as well as majority gates. The register banks are used to store the bits when
enabled.
The D flip-flop is the edge-triggered variant of the transparent latch. On the
rising edge of the clock (usually, although negative-edge triggering is also
possible), the output is given the value of the D input at that moment. The output
can change only at the clock edge, and if the input changes at other times, the
output is unaffected. D flip-flops are by far the most common type of flip-flop,
and some devices are made entirely from D flip-flops. They are commonly used
for shift registers and input synchronization.
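The edge-triggered behaviour described above can be illustrated with a small behavioural model in Python. This is only a sketch for explanation and not the HDL used in this project; the class name DFlipFlop and its method tick are hypothetical, the clear and preset inputs are left out, and a multi-bit flip-flop is modelled as several D inputs sharing one clock.

```python
from typing import List


class DFlipFlop:
    """Behavioural model of a rising-edge-triggered D flip-flop with `width` bits."""

    def __init__(self, width: int = 1) -> None:
        self.q: List[int] = [0] * width   # stored output bits
        self._prev_clk = 0                # last clock level seen

    def tick(self, clk: int, d: List[int]) -> List[int]:
        """Apply one clock sample; D is captured only on a 0 -> 1 transition."""
        if self._prev_clk == 0 and clk == 1:
            self.q = list(d)
        self._prev_clk = clk
        return list(self.q)


ff2 = DFlipFlop(width=2)             # 2-bit flip-flop: one shared clock, two D inputs
before_edge = ff2.tick(0, [1, 0])    # clock low: output keeps its reset value
at_edge = ff2.tick(1, [1, 0])        # rising edge: d1 = 1, d2 = 0 are captured
clock_high = ff2.tick(1, [0, 1])     # input changes while clock is high: ignored
next_edge = ff2.tick(0, [0, 1]) and ff2.tick(1, [0, 1])   # next rising edge captures
print(before_edge, at_edge, clock_high, next_edge)
```

The single `_prev_clk` variable reflects the main point of a multi-bit flip-flop: all bits are driven by one common clock input instead of one clock sink per bit.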
4.4 OBJECTIVES
1. To reduce the power consumption.
2. To reduce the area.
3. To reduce the delay and power of a clock network.
4. To control clock skew because of common clock signal.
The above objectives can be achieved by merging several flip-flops and
synchronizing with clock signals.
4.5 PROBLEM STATEMENT

The following problems have been identified:
1. Each of the several flip-flops needs a separate clock signal; hence, the power
consumption is high.
2. Since each of the several flip-flops needs a separate clock signal, the area
consumed is also high.
RESULTS
5.1 SIMULATION AND SYNTHESIS OUTPUT
These results contain the simulation and synthesis results for the different
flip-flops and for the adder, which was designed as the application module.
For a single-bit flip-flop, when the clock input is 0 at the leading edge and
1 at the trailing edge, the D input is given as 1, the clear input is given as 1
and the preset input is given as 1, the output is 1; the simulated waveform is
shown in Fig. 5.1.
Fig 5.1: Simulation diagram of single bit flip-flop

Synthesis was then performed on the single-bit flip-flop to generate the
synthesis reports. The RTL schematic of the single-bit flip-flop is shown in
Fig. 5.2.
Fig 5.2: RTL schematic for single bit flip-flop.

For the multi-bit (2-bit) flip-flop, the clock input is 0 at the leading edge and 1
at the trailing edge, the d1 input is 1, the d2 input is 0, the clear input is 1, and
the preset input is 0. The output q1 is then 1 and the output q2 is 0, as shown in
Fig. 5.3.
The output of the D flip-flop equals its input only when the clear input is high.
The clock signal varies according to the period assigned to it before the
simulation, as shown in Fig. 5.1.
The output changes as the input changes; the figures shown are some examples.
Besides the 2-bit multi-bit flip-flop, 4-bit and 8-bit multi-bit flip-flops can also
be used. Fig. 5.3 shows the simulation output of the 2-bit multi-bit flip-flop.
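The 2-bit case above can be mimicked with a behavioral sketch in which every bit shares one clock and one clear input. This is a Python illustration under stated assumptions, not the project's Verilog: the class name `MultiBitFlipFlop` is hypothetical, clear is assumed to be active-low and sampled on the clock edge (the text only says the output follows the input when clear is high), and preset is left out because its polarity is not stated.

```python
class MultiBitFlipFlop:
    """n-bit register whose bits all share one clock and one clear input."""

    def __init__(self, width):
        self.width = width
        self.q = [0] * width
        self._prev_clk = 0  # one edge detector models the single shared clock pin

    def tick(self, clk, d, clear):
        if len(d) != self.width:
            raise ValueError("d must supply one value per bit")
        # All bits update together on the same rising edge.
        if self._prev_clk == 0 and clk == 1:
            # Assumption: clear is active-low, so clear == 1 lets the data through.
            self.q = list(d) if clear == 1 else [0] * self.width
        self._prev_clk = clk
        return list(self.q)


mbff = MultiBitFlipFlop(2)
before_edge = mbff.tick(clk=0, d=[1, 0], clear=1)  # no edge yet, outputs stay 0
after_edge = mbff.tick(clk=1, d=[1, 0], clear=1)   # q1 = 1, q2 = 0
```

With d1 = 1, d2 = 0, and clear = 1, the model gives q1 = 1 and q2 = 0 after the clock edge, matching the values described for Fig. 5.3.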
Fig 5.3: Simulation diagram of multi bit flip-flop

Synthesis was then performed on the multi-bit flip-flop to generate the
synthesis reports. The RTL schematic of the multi-bit flip-flop is shown in the
figure below.
Fig. 5.4 shows the synthesis results of the 2-bit flip-flop.
Fig 5.4: Synthesis output of multi bit flip-flop

For the adder, the clock input is 0 at the leading edge and 1 at the trailing edge,
the reset input is 1, input a is 00000000, and input b is 01111111. The output is
01111111, as shown in Fig. 5.5.
The two inputs, 00000000 for a and 01111111 for b, are added by the adder used
in the module, and the sum is stored in cout as 01111111. The one-bit finding
algorithm then counts the number of ones in the cout output and reports it on the
en output. For these inputs, en is 7 because the output contains seven ones. The
module then enables the corresponding registers to store the ones, choosing them
from the combination table.
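The add-then-count step described above can be sketched as follows. This is a Python illustration only; the function name `add_and_count` and the fixed 8-bit width are assumptions, and the project's Verilog module may be organized differently.

```python
def add_and_count(a, b, width=8):
    """Add two unsigned values and count the ones in the sum.

    Returns the sum as a binary string of `width` bits (the cout value in the
    text) and the number of ones it contains (the en value in the text).
    """
    total = (a + b) & ((1 << width) - 1)   # keep only the low `width` bits
    ones = bin(total).count("1")           # the "one-bit finding" step
    return format(total, "0{}b".format(width)), ones


cout, en = add_and_count(0b00000000, 0b01111111)
```

For a = 00000000 and b = 01111111 this yields cout = 01111111 and en = 7, the values reported for the simulation.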
Fig. 5.5: Simulation output of adder

For example, if input a is 00000011 and input b is 00000011, the output is
00000110, and the clock signal is supplied only to the flip-flops required to hold
that output.
The number of ones in this output is 2, and this count is reported by the en
signal in the code. Based on the en value, the combination table is consulted and
the corresponding registers are enabled, while the remaining flip-flops stay
deactivated. In this way, the power required for operation in the system-on-chip
can be reduced.
The register values were then reset, and input a was set to 00000000 and input b
to 00011111. The output was 00011111, and the en output was 5 because the output
contains five ones. The module then looks up the combination for 5 and enables the
1-bit and 4-bit registers to store the five bits of data. Again, only the required
registers are active, which reduces the power needed by the system-on-chip.
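The register selection for a given en value can be sketched as a lookup that splits the count into available register widths. The text does not give the contents of the combination table, so the rule below (one register each of widths 1, 2, 4, and 8, chosen widest first) is an assumption made for illustration, and `select_registers` is a hypothetical name. It does reproduce the stated case of en = 5 mapping to the 1-bit and 4-bit registers.

```python
def select_registers(en, widths=(8, 4, 2, 1)):
    """Pick register widths whose sizes sum to `en`.

    Assumption: one register of each width is available and the widest
    fitting register is chosen first. The real combination table may differ.
    """
    chosen = []
    remaining = en
    for w in widths:
        if remaining >= w:
            chosen.append(w)
            remaining -= w
    if remaining != 0:
        raise ValueError("en exceeds the total register capacity")
    return sorted(chosen)


enabled_for_5 = select_registers(5)   # 1-bit and 4-bit registers
enabled_for_2 = select_registers(2)   # 2-bit register only
```

Registers not returned by the lookup stay disabled, which is the mechanism the text credits for the power saving.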
After synthesis was performed for the 1-, 2-, 4-, and 8-bit flip-flops, the
synthesis reports were analyzed; the comparison is given in Table 5.1.
Flip-Flop Type    Delay (ns)    Clock Power (W)
1-bit             5.531         0.0127
2-bit             5.531         0.0127
4-bit             5.531         0.0130
8-bit             5.531         0.0166
Table 5.1: Comparison Table
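The figures in Table 5.1 are easier to compare when expressed per stored bit. The short calculation below assumes that each reported clock power is for the whole flip-flop cell and that the alternative to an n-bit cell is n separate 1-bit cells, each drawing the 1-bit figure; both are assumptions made for illustration, and the variable names are hypothetical.

```python
# Clock power in watts from Table 5.1, keyed by flip-flop width in bits.
clock_power = {1: 0.0127, 2: 0.0127, 4: 0.0130, 8: 0.0166}

# Clock power per stored bit for each cell type.
per_bit = {bits: power / bits for bits, power in clock_power.items()}

# Fractional saving per bit relative to using 1-bit flip-flops throughout.
baseline = per_bit[1]
saving = {bits: 1 - p / baseline for bits, p in per_bit.items()}

for bits in sorted(clock_power):
    print(bits, per_bit[bits], saving[bits])
```

Under these assumptions the 2-bit cell draws half the clock power per bit of the 1-bit cell, since both cells report the same total of 0.0127 W.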
CONCLUSION
This project has proposed a methodology for flip-flop substitution for power
reduction in digital integrated circuit design. The flip-flop substitution procedure
relies on the combination table, which records the relationships among the
flip-flop types. By following the substitution rules of the combination table,
impossible combinations of flip-flops are not considered, which reduces
execution time. Besides power reduction, the objective of minimizing the total
wire length is also included in the cost function. The Verilog source code for the
application module was developed as described in the preceding sections and
simulated using the ISim simulator. The source code for the single-bit and multi-bit
flip-flops was also designed, simulated, and synthesized using the Xilinx ISE Design
Suite. This methodology is applicable to any circuit containing multiple
flip-flops, such as counters and registers.
REFERENCES
54
[1] Ya-Ting Shyu, Jai-Ming Lin, Chun-Po Huang, Cheng-Wu Lin, Ying- Zu Lin,
and Soon- Jyh Chang, 2013, Effective and efficient approach for power reduction
by using Multi-bit Flip-flops in IEEE transactions on VLSI, vol. 21, no. 4.
[2] H. Kawagachi and T. Sakurai, 1997, A reduced clock-swing flip-flop (RCSFF)
for 63% clock power reduction , in VLSI Circuits Dig. Tech. Papers Symp., pp. 97
98.
[3] Y. Cheon, P.-H. Ho, A. B. Kahng, S. Reda, and Q. Wang, 2005, Power-aware
placement , in Proc. Design Autom. Conf., pp. 795800.
[4] Y.-T. Chang, C.-C. Hsu, P.-H. Lin, Y.-W. Tsai and S.-F. Chen, 2010, Postplacement power optimization with multi-bit flip- flops , in Proc. IEEE/ACM
Comput.-Aided Design Int. Conf., SanJose, CA, pp. 218223.
[5] P. Gronowski, W. J. Bowhill, R. P. Preston, M. K. Gowan, and R.L. Allmon,
High- performance microprocessor design, IEEE J. Solid-State Circuits, vol. 33,
no. 5, pp. 676686, May 1998.
[6] L. Chen, A. Hung, H.-M. Chen, E. Y.-W. Tsai, S.-H. Chen, M.-H. Ku, and C.C.Chen,
Using
multi-bit
flip-flop
for
clock
power saving
by
Design
Compiler, in Proc. Synopsys User Group (SNUG), 2010.

[7] J.-T. Yan and Z.-W. Chen, Construction of constrained multi-bit flip-flops for
clock power reduction, in Proc. ICGCS, pp. 675678, 2010.
[8] S.-H. Wang, Y.-Y. Liang, T.-Y. Kuo, and W.-K. Mak, Power-driven flipflop merging and relocation, in Proc. ISPD, pp. 107114, 2011.
[9] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, 2003, Digital Integrated
Circuits: A Design Perspective, 2nd ed. Upper Saddle River, NJ: Prentice-Hall
[10] Y. Kretchmer, 2001, Using multi-bit register inference to save area and power,
EE Times Asia.