
LOW POWER AND AREA-EFFICIENT SHIFT REGISTER USING PULSED LATCHES

CHAPTER 1
INTRODUCTION
This project proposes a low-power and area-efficient shift register using pulsed
latches. The area and power consumption are reduced by replacing flip-flops with pulsed
latches. The timing problem between pulsed latches is solved by using multiple
non-overlapping delayed pulsed clock signals instead of the conventional single pulsed
clock signal. The shift register keeps the number of pulsed clock signals small by
grouping the latches into several sub shift registers and using additional temporary
storage latches. A 256-bit shift register using pulsed latches was fabricated in a
0.18 µm CMOS process with VDD = 1.8 V. The core area is 6600 µm², and the power
consumption is 1.2 mW at a 100 MHz clock frequency. The proposed shift register saves
37% area and 44% power compared with the conventional shift register built from
flip-flops.
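As a rough cross-check of these figures, the Python sketch below derives the area per bit, the energy per bit per clock cycle, and the flip-flop baseline implied by the quoted savings. It assumes that a saving of 37% means the proposed value equals (1 - 0.37) times the baseline; the baseline numbers it prints are therefore inferred, not measured, and the helper name implied_baseline is hypothetical.

```python
# Figures reported for the fabricated 256-bit pulsed-latch shift register.
BITS = 256
AREA_UM2 = 6600.0      # core area in square micrometres
POWER_W = 1.2e-3       # measured power in watts
F_CLK_HZ = 100e6       # clock frequency in hertz
AREA_SAVING = 0.37     # quoted area saving vs. the flip-flop shift register
POWER_SAVING = 0.44    # quoted power saving vs. the flip-flop shift register


def implied_baseline(proposed: float, saving: float) -> float:
    """Back-calculate the flip-flop baseline, assuming saving = 1 - proposed / baseline."""
    return proposed / (1.0 - saving)


area_per_bit_um2 = AREA_UM2 / BITS
energy_per_bit_fj = POWER_W / (BITS * F_CLK_HZ) * 1e15  # energy per bit per clock cycle
ff_area_um2 = implied_baseline(AREA_UM2, AREA_SAVING)
ff_power_mw = implied_baseline(POWER_W, POWER_SAVING) * 1e3

print(f"area per bit            : {area_per_bit_um2:.1f} um^2")
print(f"energy per bit per cycle: {energy_per_bit_fj:.1f} fJ")
print(f"implied flip-flop area  : {ff_area_um2:.0f} um^2")
print(f"implied flip-flop power : {ff_power_mw:.2f} mW")
```

Under that assumption the flip-flop register would need roughly 10,500 µm² and about 2.1 mW, and the pulsed-latch version spends about 47 fJ per stored bit per clock cycle.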
In VLSI design, power consumption has become a very important issue. Sequential logic
circuits, such as registers, memory elements and counters, are heavily used in the
implementation of Very Large Scale Integrated (VLSI) circuits. Power dissipation is
critical for battery-operated systems, such as laptops, calculators, cell phones and MP3
players, since it determines the battery life. Therefore, designs are needed that consume
less power while maintaining comparable performance. A flip-flop is a data storage
element that is updated by its clock. When many flip-flop stages are clocked together,
the clock switches every stage at every cycle, which leads to high switching activity and
increases latency. Timing elements such as flip-flops and latches, together with the
clock interconnection network that drives them, are among the most power-consuming
components in modern VLSI systems. In this work, the area, power and transistor count of
designs using several latch and flip-flop stages are compared. This thesis explores the
use of pulsed latches for timing optimization. Flip-flops are the basic storage elements
used extensively in all kinds of digital designs; a flip-flop is a circuit that stores
state information, and low power consumption is one of the main objectives in its design.

A shift register is a basic building block in VLSI circuits. Shift registers are
commonly used in many applications, such as digital filters, communication receivers
and image processing ICs. Recently, as the size of image data continues to increase
due to the high demand for high-quality images, the word length of the shift register
has increased in order to process large image data in image processing ICs. An
image-extraction and vector-generation VLSI chip uses a 4K-bit shift register. A 10-bit
208-channel output LCD column driver IC uses a 2K-bit shift register. A 16-megapixel
CMOS image sensor uses a 45K-bit shift register. As the word length of the shift
register increases, its area and power consumption become important design
considerations.
The architecture of a shift register is quite simple. An N-bit shift register is
composed of N data flip-flops connected in series. The speed of the flip-flop is less
important than its area and power consumption because there is no logic circuit between
the flip-flops of a shift register. Therefore, the smallest flip-flop is the most suitable
choice for reducing the area and power consumption of the shift register. Recently,
pulsed latches have replaced flip-flops in many applications, because a pulsed latch is
much smaller than a flip-flop. However, a pulsed latch cannot be used directly in a shift
register because of the timing problem between pulsed latches. This paper proposes a
low-power and area-efficient shift register using pulsed latches. The shift register
solves the timing problem by using multiple non-overlapping delayed pulsed clock signals
instead of the conventional single pulsed clock signal. It keeps the number of pulsed
clock signals small by grouping the latches into several sub shift registers and using
additional temporary storage latches. The rest of the paper is organized as follows:
Section II describes the architecture of the proposed shift register, and Section III
presents the measurement results of the fabricated chip.
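The text does not spell out the exact pulse sequence, so the following Python model is only a behavioural sketch under an assumed ordering. It assumes that each sub shift register holds m latches plus one temporary storage latch, that the temporary latches are pulsed first to save the last bit of their group, and that the latches inside every group are then pulsed from the last one back to the first, with the same m + 1 non-overlapping pulses shared by all groups. Because no pulse enables a latch together with the latch that feeds it, no latch input changes while that latch is pulsed. The functions ideal_shift, pulsed_cycle and matches_ideal are hypothetical names.

```python
import random
from typing import List


def ideal_shift(q: List[int], din: int) -> List[int]:
    """Reference behaviour: every bit moves one stage right and din enters stage 0."""
    return [din] + q[:-1]


def pulsed_cycle(q: List[int], t: List[int], din: int, m: int) -> None:
    """Apply one clock cycle to latches q (grouped m per sub shift register) in place.

    Assumed pulse order, one non-overlapping pulse per step:
      pulse 0      -> every temporary latch t[g] stores the last bit of group g
      pulses 1..m  -> latch position m-1, then m-2, ..., 0 of every group
    so m + 1 pulsed clock signals serve a register of any length.
    """
    groups = len(q) // m
    for g in range(groups):                       # pulse 0: temporary storage latches
        t[g] = q[g * m + m - 1]
    for pos in range(m - 1, -1, -1):              # reverse order inside each group
        captured = {}
        for g in range(groups):                   # the same pulse drives every group
            idx = g * m + pos
            if pos > 0:
                captured[idx] = q[idx - 1]        # left neighbour not pulsed yet: old data
            elif g == 0:
                captured[idx] = din               # serial input of the whole register
            else:
                captured[idx] = t[g - 1]          # saved last bit of the previous group
        for idx, value in captured.items():       # latches sharing a pulse update together
            q[idx] = value


def matches_ideal(n: int, m: int, cycles: int = 100, seed: int = 0) -> bool:
    """Drive random serial data through both models and compare after every cycle."""
    assert n % m == 0, "n must be a multiple of the group size m"
    rng = random.Random(seed)
    q_ref, q, t = [0] * n, [0] * n, [0] * (n // m)
    for _ in range(cycles):
        din = rng.randint(0, 1)
        q_ref = ideal_shift(q_ref, din)
        pulsed_cycle(q, t, din, m)
        if q != q_ref:
            return False
    return True


print(matches_ideal(256, 4))  # True: 64 groups of 4 latches, 5 pulsed clocks
```

In this sketch a larger m means fewer temporary latches (one per group) but more pulsed clock signals (m + 1); for example, m = 4 on a 256-bit register uses 64 temporary latches (the last one is never read) and five pulsed clocks.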
Current trends will eventually mandate low-power design automation on a very large scale
to match the power consumption trends of today's and future integrated chips. The dynamic
power consumption of a VLSI design is given by the generalized relation P = α·C·VDD²·f,
where α is the switching activity factor, C the switched capacitance, VDD the supply
voltage and f the clock frequency. Since power is proportional to the square of the
voltage, voltage scaling is the most prominent way to reduce power dissipation. Since a
pulsed latch contains fewer transistors, and therefore less switched capacitance, than a
flip-flop, the pulsed latch consumes less power than the flip-flop.
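The relation P = α·C·VDD²·f can be evaluated directly. The Python sketch below uses illustrative values (activity factor 0.1, 10 pF of switched capacitance, 100 MHz) that are assumptions, not data from this design; it shows that scaling VDD from 1.8 V to 1.2 V reduces dynamic power to (1.2/1.8)² ≈ 0.44 of its value, whereas halving the switched capacitance, as replacing flip-flops with smaller latches aims to do, halves it.

```python
def dynamic_power(alpha: float, c_load: float, vdd: float, f_clk: float) -> float:
    """Dynamic (switching) power in watts: P = alpha * C * VDD^2 * f."""
    return alpha * c_load * vdd ** 2 * f_clk


# Illustrative values only; they are not taken from the fabricated chip.
ALPHA = 0.1        # activity factor: fraction of the capacitance switched per cycle
C_LOAD = 10e-12    # total switched capacitance in farads
F_CLK = 100e6      # clock frequency in hertz

p_18 = dynamic_power(ALPHA, C_LOAD, 1.8, F_CLK)
p_12 = dynamic_power(ALPHA, C_LOAD, 1.2, F_CLK)
p_half_c = dynamic_power(ALPHA, C_LOAD / 2, 1.8, F_CLK)  # half the switched capacitance

print(f"VDD = 1.8 V           : {p_18 * 1e3:.3f} mW")
print(f"VDD = 1.2 V           : {p_12 * 1e3:.3f} mW (ratio {p_12 / p_18:.3f})")
print(f"1.8 V, half the C load: {p_half_c * 1e3:.3f} mW")
```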

A master-slave flip-flop using two latches can be replaced by a pulsed latch
consisting of a latch and a pulsed clock signal. All pulsed latches share the pulse
generation circuit for the pulsed clock signal. As a result, the area and power
consumption of the pulsed latch become almost half of those of the master-slave
flip-flop. The pulsed latch is therefore an attractive solution for small area and low
power consumption. However, the pulsed latch cannot be used directly in shift registers
because of a timing problem. Consider a shift register that consists of several latches
driven by a single pulsed clock signal (CLK_pulse). Its operation waveforms show the
timing problem: the output signal of the first latch (Q1) changes correctly because the
input signal of the first latch (IN) is constant during the clock pulse width, but the
second latch has an uncertain output signal (Q2) because its input signal (Q1) changes
during the clock pulse width. One solution to the timing problem is to add delay
circuits between latches, as shown in Fig. 3(a). The output signal of each latch is
delayed and reaches the next latch after the clock pulse. As shown in Fig. 3(b), the
output signals of the first and second latches (Q1 and Q2) change during the clock pulse
width, but the input signals of the second and third latches (D2 and D3) become equal to
the output signals of the first and second latches (Q1 and Q2) only after the clock
pulse. As a result, all latches have constant input signals during the clock pulse.
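The timing problem can be reproduced with a toy discrete-time model, sketched below in Python. It is an assumption-laden illustration rather than a circuit simulation: time advances in unit steps, a latch is ideally transparent while the shared pulse is high, pulse_width is the pulse length in steps, and stage_delay is the delay from one latch output to the next latch input (the latch delay plus any inserted delay circuit). The function simulate and its parameters are hypothetical.

```python
from typing import List


def simulate(din_per_cycle: List[int], n_latches: int, pulse_width: int,
             stage_delay: int, period: int = 20) -> List[List[int]]:
    """Toy discrete-time model of latches sharing one pulsed clock.

    While the pulse is high a latch is transparent: its output follows the previous
    stage's output from stage_delay steps earlier. Otherwise it holds its value.
    Returns the latch contents at the end of every clock cycle.
    """
    assert stage_delay >= 1 and period > pulse_width + stage_delay
    steps = period * len(din_per_cycle)
    q = [[0] * steps for _ in range(n_latches)]     # q[i][t]: output of latch i at step t
    snapshots = []
    for t in range(steps):
        cycle, phase = divmod(t, period)
        pulse_high = phase < pulse_width            # shared pulsed clock (CLK_pulse)
        for i in range(n_latches):
            if i == 0:
                d = din_per_cycle[cycle]            # IN is constant for the whole cycle
            else:
                src = t - stage_delay               # latch delay plus any added delay
                d = q[i - 1][src] if src >= 0 else 0
            held = q[i][t - 1] if t > 0 else 0
            q[i][t] = d if pulse_high else held
        if phase == period - 1:
            snapshots.append([q[i][t] for i in range(n_latches)])
    return snapshots


data = [1, 0, 0, 0]                                       # a single "1" followed by zeros
print(simulate(data, 4, pulse_width=3, stage_delay=1))    # stage delay shorter than pulse
print(simulate(data, 4, pulse_width=3, stage_delay=4))    # delay circuits: longer than pulse
```

With stage_delay=1 the single "1" races through three latches during the first pulse, giving [1, 1, 1, 0] instead of [1, 0, 0, 0]. With stage_delay=4, longer than the 3-step pulse, each latch sees a constant input while it is pulsed and the data advances exactly one stage per cycle, which is the behaviour the delay circuits of Fig. 3(a) are meant to provide.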
CHAPTER-2

LITERATURE SURVEY

Single event effects (SEEs) caused by radiation are a major concern for circuits that
must operate in harsh environments, for example in space applications. In this paper,
new techniques for the implementation of moving average filters that provide protection
against SEEs are presented; they have a lower circuit complexity and cost than
traditional techniques such as triple modular redundancy (TMR). The effectiveness of
these techniques has been evaluated using a software fault injection platform, and the
circuits have been synthesized for a commercial library in order to assess their
complexity. The main idea behind the presented approach is to exploit the structure of
moving average filter implementations to deal with SEEs at a higher level of
abstraction.

Gigabit Ethernet on Category-5 cable is the next-generation high-speed Ethernet LAN for
the twisted-pair copper medium, with a minimum required reach of 100 meters. This paper
presents a brief overview of the transmission scheme agreed upon by the IEEE 802.3ab
task force for 1 Gb/s full-duplex operation over four pairs of Category-5 cable. Some
system-level simulation results are presented, followed by a discussion of the type of
digital and analog circuits required for a single-chip mixed-signal CMOS implementation
of the transceiver. For reliable operation under worst-case cabling conditions, the DSP
portion of the transceiver has to perform over 150 giga-operations per second.

A feature-extraction and vector-generation VLSI has been developed for real-time image
recognition.
An arrayed-shift-register architecture has been employed in conjunction with pipelined
directional-edge-filtering circuitry. As a result, it has become possible to scan an
image, pixel by pixel, with a 64 × 64-pixel recognition window and to generate a
64-dimensional feature vector every 64 clock cycles. In order to set the threshold for
the edge-filtering operation adaptively to local luminance variation, a high-speed median
circuit has been developed. A binary median search algorithm has been implemented using
high-precision majority voting circuits working on the mixed-signal principle. A
prototype chip was designed and fabricated in a 0.18-µm 5-metal CMOS technology.
High-speed feature vector generation in less than 9.7 ns per vector element has been
experimentally demonstrated. It is possible to scan a VGA-size image at a rate of 6.1
frames/s, thus generating as many as 1.5 × 10^6 feature vectors per second for
recognition. This is more than 10^3 times faster than software processing running on a
3-GHz general-purpose processor.
This paper presents a 10-bit column driver IC for active-matrix LCDs, with a proposed
iterative charge-sharing based (ICSB) capacitor-string that interpolates two output
voltages from a resistor-string DAC. Iterative mode change between a capacitive voltage
division mode and a charge sharing mode in the ICSB capacitor-string interpolation
suppresses the effect of mismatches between capacitors and that of parasitic
capacitances; thus, a highly linear capacitor sub-DAC is realized. In addition, the
area-sharing layout technique, which stacks the interpolation capacitor-string on top of
the R-DAC area, reduces the driver channel size and extends the bit resolution of the
gamma-corrected nonlinear main R-DAC. Consequently, the proposed ICSB capacitor-string
interpolation scheme provides highly uniform channel performance by passively dividing
the coarse voltages from the global resistor-string DAC with high area efficiency, and
more effective bit resolution for nonlinear gamma correction. The prototype column
driver IC was implemented using a 0.11-μm CMOS process. The area occupied by the DAC and
buffer amplifier per channel is only 188 × 15 μm², and the static power consumption is
0.9 μA/channel, with no additional static power dissipation for the interpolation. The
measured maximum DNL and INL are 0.25 LSB and 0.43 LSB, respectively. The measured
maximum inter-channel DVO is 5.6 mV. The proposed chip achieves state-of-the-art
performance in terms of chip size and channel-to-channel uniformity.

The design and scaling of a 21 mm × 21 mm CMOS image sensor for charged-particle
imaging, "EM7," is presented and compared to its smaller prototype, EM5. The sensor
contains ~50 million transistors spanning its 16 million pixels, and includes over 4,100
parallel analog processing and A/D conversion circuits, utilizing 12 parallel 10-bit
readout buses for high data throughput. The clock distribution design in EM7 minimizes
the clock delay by dividing the chip into multiple parallel sections, each driven locally
by a tree-like clock structure. With this technique, simulations showed that the readout
shift-register clock delay is reduced from 4.7 ns to 0.14 ns, and the row shift-register
clock delay is reduced from 1.7 ns to 0.12 ns. With similar local buffering, the ADC
Gray-code counter delay is reduced from 35 ns to 0.9 ns. These improvements allow EM7 to
sustain image acquisition at 75 frames/s, for a continuous data throughput of over
10 Gb/s. The large chip dimensions and the increased power consumption in EM7 also
require more robust power distribution. A matrix-math simulation shows that the
worst-case pixel IR voltage drop was improved from 20 mV to 8 mV. Similarly, the pixel's
worst-case analog output IR drop is reduced from 80.7 mV to 2.58 mV, and its bandwidth is
thus increased from 6.92 MHz to 14.4 MHz. The power supply IR drop in the output
processing stage's op-amps is reduced from 327 mV to 35 mV, their open-loop gain
variation is reduced from 525% to 28%, and their worst-case bandwidth is increased from
0.87 MHz to 764 MHz.

This paper presents new techniques to evaluate the energy and delay of flip-flop and
latch designs and shows that no single existing design performs well across the wide
range of operating regimes present in complex systems. It proposes the use of a selection
of flip-flop and latch designs, each tuned for different activation patterns and speed
requirements. The technique is illustrated on a pipelined MIPS processor datapath running
SPECint95 benchmarks, where total flip-flop and latch energy is reduced by over 60%
without increasing the cycle time.

Flip-flops (FFs) are key building blocks in the design of high-speed energy-efficient
microprocessors, as their data-to-output delay (D-Q) and power dissipation strongly affect
the processor's clock period and overall power. From previous analyses, the
Transmission-Gate Pulsed Latch (TGPL) proved to be the most energy-efficient FF in a
large portion of the design space, ranging from high-speed designs (minimizing ED^j
products with j > 1) to minimum-ED-product designs, while simple master-slave FFs (TGFF
and ACFF) are the most energy-efficient in the low-power region of the E-D space. TGPL
also has the lowest D-Q delay, along with STFF. However, the latter has considerably
worse energy efficiency; hence, TGPL is the best reference for a comparison. In this
work, two new FFs are introduced: the Conditional Push-Pull Pulsed Latch (CP3L) and a
version with a Shareable Pulse Generator (CSP3L). The adoption of a fast push-pull
second stage, which requires a conditional pulse generator (PG), enables 50-to-100% delay
improvements compared to TGPL, and absolute D-Q up to 0.7 FO4. CP3L and CSP3L also
exhibit superior energy efficiency to TGPL in terms of minimum ED^3 and ED products. A
test chip was fabricated in 65 nm CMOS technology (VDD = 1 V) to measure the delay and
energy consumption of CP3L, CSP3L and TGPL in minimum-ED and minimum-ED^3 sizing.
Different loadings are used in the minimum-ED (16×) and minimum-ED^3 (64×) cases.

This paper proposes a set of rules for consistent estimation of the real performance and
power features of flip-flop and master-slave latch structures. A new simulation and
optimization approach is presented, targeting both high-performance and power budget
issues. The analysis approach reveals the sources of performance and power-consumption
bottlenecks in different design styles. Certain misleading parameters have been properly
modified and weighted to reflect the real properties of the compared structures.
Furthermore, the results of the comparison of representative master-slave latches and
flip-flops illustrate the advantages of the approach and the suitability of different
design styles for high-performance and low-power applications.
CHAPTER-3
PROJECT DESCRIPTION
3.1 EXISTING SYSTEM:
The shift register is another type of sequential logic circuit that is used for the
storage or transfer of data in the form of binary numbers; it "shifts" the data out once
every clock cycle, hence the name shift register. It basically consists of several
single-bit "D-type data latches", one for each bit (0 or 1), connected together in a
serial or daisy-chain arrangement so that the output of one data latch becomes the input
of the next latch, and so on. The data bits may be fed into or out of the register
serially, i.e. one after the other from either the left or the right direction, or in
parallel, i.e. all together. The number of individual data latches required to make up a
single shift register is determined by the number of bits to be stored, the most common
width being 8 bits, i.e. eight individual data latches. Shift registers are used for data
storage or data movement; in calculators or computers they store data such as two binary
numbers before they are added together, or convert data from serial to parallel or from
parallel to serial format. The individual data latches that make up a single shift
register are all driven by a common clock (Clk) signal, making them synchronous devices.
Shift register ICs are generally provided with a clear or reset connection so that they
can be "SET" or "RESET" as required. Generally, shift registers operate in one of four
different modes, with the basic movement of data through a shift register being:
• Serial-in to Parallel-out (SIPO) - The register is loaded with serial data, one bit at
a time, with the stored data being available in parallel form.
• Serial-in to Serial-out (SISO) - The data is shifted serially "IN" and "OUT" of the
register, one bit at a time, in either a left or right direction under clock control.
• Parallel-in to Serial-out (PISO) - The parallel data is loaded into the register
simultaneously and is shifted out of the register serially, one bit at a time, under
clock control.
• Parallel-in to Parallel-out (PIPO) - The parallel data is loaded simultaneously into the
register and transferred together to the respective outputs by the same clock pulse.
The effect of data movement from left to right through a shift register can also be
presented graphically.
Also, the directional movement of the data through a shift register can be to the left
(left shifting), to the right (right shifting), left-in but right-out (rotation), or both
left and right within the same register, thereby making it bidirectional.
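The shift directions listed above can be sketched on a list of bits, as in the Python example below. The function names are hypothetical, and rotate_right assumes that rotation means the bit leaving one end is fed back into the other end.

```python
from typing import List


def shift_right(bits: List[int], serial_in: int) -> List[int]:
    """Right shift: serial_in enters at the left end, the rightmost bit is dropped."""
    return [serial_in] + bits[:-1]


def shift_left(bits: List[int], serial_in: int) -> List[int]:
    """Left shift: serial_in enters at the right end, the leftmost bit is dropped."""
    return bits[1:] + [serial_in]


def rotate_right(bits: List[int]) -> List[int]:
    """Rotation: the bit leaving the right end is fed back into the left end."""
    return [bits[-1]] + bits[:-1]


def bidirectional(bits: List[int], serial_in: int, direction: str) -> List[int]:
    """A direction control input selects left or right shifting in the same register."""
    if direction == "right":
        return shift_right(bits, serial_in)
    if direction == "left":
        return shift_left(bits, serial_in)
    raise ValueError("direction must be 'left' or 'right'")


reg = [1, 0, 1, 1]
print(shift_right(reg, 0))              # [0, 1, 0, 1]
print(shift_left(reg, 0))               # [0, 1, 1, 0]
print(rotate_right(reg))                # [1, 1, 0, 1]
print(bidirectional(reg, 1, "left"))    # [0, 1, 1, 1]
```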
Serial-In to Parallel-Out (SIPO)

Fig 3.1: Serial-In to Parallel-Out (SIPO)


The operation is as follows. Let us assume that all the flip-flops (FFA to FFD) have
just been RESET (CLEAR input) and that all the outputs QA to QD are at logic level "0",
i.e. there is no parallel data output. If a logic "1" is connected to the DATA input pin
of FFA, then on the first clock pulse the output of FFA, and therefore QA, will be set
HIGH to logic "1", with all the other outputs still remaining LOW at logic "0". Assume
now that the DATA input pin of FFA has returned LOW again to logic "0", giving one data
pulse, 0-1-0. The second clock pulse will change the output of FFA to logic "0" and the
output of FFB (QB) HIGH to logic "1", as its input D has the logic "1" level on it from
QA. The logic "1" has now moved, or been "shifted", one place along the register to the
right, as it is now at QB. When the third clock pulse arrives, this logic "1" value moves
to the output of FFC (QC), and so on until the arrival of the fifth clock pulse, which
sets all the outputs QA to QD back to logic level "0" because the input to FFA has
remained constant at logic level "0". The effect of each clock pulse is to shift the
data contents of each stage one place to the right, as shown in the following table,
until the complete data value 0-0-0-1 is stored in the register. This data value can
now be read directly from the outputs QA to QD, so the data has been converted from a
serial input signal to a parallel output. The truth table shows the propagation of the
logic "1" through the register from left to right.

Basic Movement of Data through a Shift Register

Clock Pulse No   QA   QB   QC   QD
0                0    0    0    0
1                1    0    0    0
2                0    1    0    0
3                0    0    1    0
4                0    0    0    1
5                0    0    0    0

Note that after the fourth clock pulse has ended, the 4 bits of data (0-0-0-1) are
stored in the register and will remain there provided clocking of the register has
stopped. In practice, the input data to the register may consist of various
combinations of logic "1" and "0". Commonly available SIPO ICs include the standard
8-bit 74LS164 and the 74LS594.
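The truth table can be regenerated with a short behavioural model. The Python sketch below assumes ideal edge-triggered flip-flops and samples the DATA input once per clock pulse, giving the sequence 1, 0, 0, 0, 0 for the single data pulse in the example; sipo_table is a hypothetical helper.

```python
from typing import List, Tuple


def sipo_table(serial_data: List[int], width: int = 4) -> List[Tuple[int, ...]]:
    """Return (clock pulse no., QA, QB, ...) after reset and after each clock pulse."""
    q = [0] * width                     # all outputs cleared by RESET
    rows = [(0, *q)]
    for pulse, bit in enumerate(serial_data, start=1):
        q = [bit] + q[:-1]              # DATA enters FFA, every bit moves one place right
        rows.append((pulse, *q))
    return rows


for pulse, *outputs in sipo_table([1, 0, 0, 0, 0]):
    print(pulse, *outputs)              # one row of the truth table per line
```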
Serial-In to Serial-Out (SISO):

Fig 3.2: Serial-In to Serial-Out (SISO)


This shift register is very similar to the SIPO above, except that, where before the
data was read directly in parallel form from the outputs QA to QD, this time the data is
allowed to flow straight through the register and out of the other end. Since there is
only one output, the DATA leaves the shift register one bit at a time in a serial
pattern, hence the name Serial-in to Serial-out shift register, or SISO.
The SISO shift register is one of the simplest of the four configurations, as it has
only three connections: the serial input (SI), which determines what enters the
left-hand flip-flop; the serial output (SO), which is taken from the output of the
right-hand flip-flop; and the sequencing clock signal (Clk). The logic circuit diagram
in Fig 3.2 shows a generalized 4-bit serial-in to serial-out shift register.
One may wonder what the point of a SISO shift register is if the output data is exactly
the same as the input data. This type of shift register also acts as a temporary
storage device or as a time delay device for the data, with the amount of time delay
being controlled by the number of stages in the register (4, 8, 16, etc.) or by varying
the application of the clock pulses.
Commonly available ICs include the 74HC595 8-bit serial-in/serial-out shift register
with 3-state outputs.
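The time-delay use of a SISO register can be illustrated with the Python sketch below, which models a 4-stage register as a fixed-length collections.deque. The helper siso_outputs is hypothetical, and SO is read just after each clock pulse.

```python
from collections import deque
from typing import List


def siso_outputs(serial_in: List[int], stages: int = 4) -> List[int]:
    """Clock each input bit into a SISO register and record SO after every clock pulse."""
    reg = deque([0] * stages, maxlen=stages)   # reg[0] = left-hand stage, reg[-1] drives SO
    outputs = []
    for bit in serial_in:
        reg.appendleft(bit)                    # maxlen drops the old right-hand bit
        outputs.append(reg[-1])                # serial output (SO) after this clock pulse
    return outputs


data = [1, 0, 1, 1, 0, 0, 0, 0, 0]
print(siso_outputs(data))   # each bit reaches SO three clock pulses after it is loaded
```

In this model a bit loaded on one clock pulse reaches SO three pulses later (after the fourth pulse in total), and changing stages changes the delay accordingly.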
Parallel-In to Serial-Out (PISO):
The Parallel-in to Serial-out shift register acts in the opposite way to the serial-in to
parallel-out register above. The data is loaded into the register in a parallel format,
i.e. all the data bits enter their inputs simultaneously through the parallel input pins
PA to PD of the register. The data is then read out sequentially in the normal
shift-right mode from the register at Q, representing the data present at PA to PD. This
data is output one bit at a time on each clock cycle in a serial format. It is important
to note that with this system a clock pulse is not required to parallel load the
register, as the data is already present, but four clock pulses are required to unload
the data.

Fig 3.3: Parallel-In to Serial-Out (PISO)


As this type of shift register converts parallel data, such as an 8-bit data word, into
serial format, it can be used to multiplex many different input lines into a single
serial DATA stream, which can be sent directly to a computer or transmitted over a
communications line. Commonly available ICs include the 74HC166 8-bit
Parallel-in/Serial-out shift register.
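A behavioural sketch of the PISO unload sequence follows. It assumes, as in the text, that the parallel load needs no clock pulse, that the serial output Q is taken from the right-hand stage, and that zeros are shifted in from the left; under these assumptions PD appears first and PA last. piso_unload is a hypothetical helper.

```python
from typing import List, Tuple


def piso_unload(parallel_in: List[int], fill: int = 0) -> Tuple[List[int], List[int]]:
    """Load PA..PD at once, then shift right once per clock pulse.

    Returns the serial output Q seen before each clock pulse, and the final register.
    """
    reg = list(parallel_in)         # parallel load: no clock pulse needed in this model
    serial_out = []
    for _ in range(len(reg)):       # one clock pulse per bit to unload
        serial_out.append(reg[-1])  # Q is taken from the right-hand stage
        reg = [fill] + reg[:-1]     # shift right, fill enters on the left
    return serial_out, reg


bits_out, remaining = piso_unload([1, 0, 1, 1])   # PA, PB, PC, PD
print(bits_out, remaining)                         # [1, 1, 0, 1] [0, 0, 0, 0]
```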
Parallel-In to Parallel-Out (PIPO):
The final mode of operation is the Parallel-in to Parallel-out shift register. This type
of register also acts as a temporary storage device or as a time delay device, similar to
the SISO configuration above. The data is presented in a parallel format to the parallel
input pins PA to PD and then transferred together directly to their respective output
pins QA to QD by the same clock pulse. Thus one clock pulse loads and unloads the
register. This arrangement for parallel loading and unloading is shown in Fig 3.4.

Fig 3.4: Parallel-In to Parallel-Out (PIPO)

The PIPO shift register is the simplest of the four configurations, as it has only three
connections: the parallel input (PI), which determines what enters the flip-flops, the
parallel output (PO) and the sequencing clock signal (Clk).
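The single-clock load and unload of a PIPO register can be sketched as a small Python class. The class name PIPORegister is hypothetical, and the model ignores setup and hold timing.

```python
from typing import List


class PIPORegister:
    """Parallel-in to parallel-out register: one clock pulse loads and unloads all bits."""

    def __init__(self, width: int = 4) -> None:
        self.q = [0] * width                 # outputs QA..QD after reset

    def clock(self, parallel_in: List[int]) -> List[int]:
        if len(parallel_in) != len(self.q):
            raise ValueError("parallel input width must match the register width")
        self.q = list(parallel_in)           # PA..PD are copied to QA..QD together
        return list(self.q)


reg = PIPORegister()
print(reg.q)                    # [0, 0, 0, 0] before the first clock pulse
print(reg.clock([1, 0, 1, 1]))  # [1, 0, 1, 1] held until the next clock pulse
```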
CHAPTER-4
TOOLS REQUIRED
4.1 Introduction to VLSI:
4.1.1 Historical Perspective:

The electronics industry has achieved phenomenal growth over the last two
decades, mainly due to rapid advances in integration technologies and large-scale
systems design; in short, due to the advent of VLSI. The number of applications of
integrated circuits in high-performance computing, telecommunications, and consumer
electronics has been rising steadily, and at a very fast pace. Typically, the required
computational power (or, in other words, the intelligence) of these applications is the
driving force for the fast development of this field. Figure 4.1 gives an overview of the
prominent trends in information technologies over the next few decades. The current
leading-edge technologies (such as low bit-rate video and cellular communications)
already provide the end-users a certain amount of processing power and portability.

Figure 4.1: Overview of the prominent trends in information technologies.

This trend is expected to continue, with very important implications for VLSI and
systems design. One of the most important characteristics of information services is
their increasing need for very high processing power and bandwidth (in order to handle
real-time video, for example). The other important characteristic is that information
services tend to become more and more personalized (as opposed to collective services
such as broadcasting), which means that the devices must be more intelligent to answer
individual demands, and at the same time they must be portable to allow more
flexibility/mobility. Table-1 shows the evolution of logic complexity in integrated
circuits over the last three decades and marks the milestones of each era. Here, the
numbers for circuit complexity should be interpreted only as representative examples
that show the order of magnitude. A logic block can contain anywhere from 10 to 100
transistors, depending on the function. State-of-the-art examples of ULSI chips, such as
the DEC Alpha or the INTEL Pentium, contain 3 to 6 million transistors.
ERA                                    YEAR   COMPLEXITY (no. of logic blocks per chip)
Single transistor                      1959   less than 1
Unit logic (one gate)                  1960   1
Multi-function                         1962   2 - 4
Complex function                       1964   5 - 20
Medium Scale Integration (MSI)         1967   20 - 200
Large Scale Integration (LSI)          1972   200 - 2000
Very Large Scale Integration (VLSI)    1978   2000 - 20000
Ultra Large Scale Integration (ULSI)   1989   20000 - ?

Table-1: Evolution of logic complexity in integrated circuits.


The most important message here is that the logic complexity per chip has been
(and still is) increasing exponentially. The monolithic integration of a large number of
functions on a single chip usually provides:

• Less area/volume and therefore, compactness
• Less power consumption
• Less testing requirements at system level
• Higher reliability, mainly due to improved on-chip interconnects
• Higher speed, due to significantly reduced interconnection length
• Significant cost savings
Figure-4.2: Evolution of integration density and min feature size, as seen in the
early 1980s.

Therefore, the current trend of integration will also continue in the foreseeable
future. Advances in device manufacturing technology, and especially the steady
reduction of minimum feature size (minimum length of a transistor or an interconnect
realizable on chip) support this trend. Figure 4.2 shows the history and forecast of chip
complexity - and minimum feature size - over time, as seen in the early 1980s. At that
time, a minimum feature size of 0.3 microns was expected around the year 2000. A
minimum size of 0.25 microns was readily achievable by the year 1995. As a direct
result of this, the integration density has also exceeded previous expectations - the first
64 Mbit DRAM, and the INTEL Pentium microprocessor chip containing more than 3
million transistors were already available by 1994, pushing the envelope of integration
density.

Figure-4.3: Level of integration over time, for memory chips and logic chips.
Generally speaking, logic chips such as microprocessor chips and digital signal
processing (DSP) chips contain not only large arrays of memory (SRAM) cells, but also
many different functional units. As a result, their design complexity is considered much
higher than that of memory chips, although advanced memory chips contain some
sophisticated logic functions. This translates into an increase in the design cycle time,
which is the time period from the start of chip development until mask-tape
delivery. However, in order to make the best use of the current technology, the chip
development time has to be short enough to allow the maturing of chip manufacturing
and timely delivery to customers. As a result, the level of actual logic integration tends to
fall short of the integration level achievable with the current processing technology.
Sophisticated computer-aided design (CAD) tools and methodologies are developed and
applied in order to manage the rapidly increasing design complexity.
4.1.2 VLSI Design Flow:
The design process, at various levels, is usually evolutionary in nature. It starts
with a given set of requirements. Initial design is developed and tested against the
requirements. When requirements are not met, the design has to be improved. If such
improvement is either not possible or too costly, then the revision of requirements and its
impact analysis must be considered. The Y-chart (first introduced by D. Gajski) shown
in Fig. 4.4 illustrates a design flow for most logic chips, using design activities on three
different axes (domains) which resemble the letter Y.

Figure-4.4: Typical VLSI design flow in three domains (Y-chart representation).


The Y-chart consists of three major domains, namely:

• behavioral domain,
• structural domain,
• geometrical layout domain.

The design flow starts from the algorithm that describes the behavior of the target chip.
The corresponding architecture of the processor is first defined. It is mapped onto the
chip surface by floor planning. The next design evolution in the behavioral domain
defines finite state machines (FSMs) which are structurally implemented with functional
modules such as registers and arithmetic logic units (ALUs). These modules are then
geometrically placed onto the chip surface using CAD tools for automatic module
placement followed by routing, with the goal of minimizing the interconnect area and
signal delays. The third evolution starts with a behavioral module description. Individual
modules are then implemented with leaf cells. In standard-cell based design, leaf cells
are already pre-designed and stored in a library for logic design use.

Figure-4.5: A more simplified view of VLSI design flow.


Figure 4.5 provides a more simplified view of the VLSI design flow, taking into
account the various representations, or abstractions of design - behavioral, logic, circuit
and mask layout. Note that the verification of design plays a very important role in every
step during this process. The failure to properly verify a design in its early phases
typically causes significant and expensive re-design at a later stage.

4.1.3 Design Hierarchy:


The use of hierarchy, or the divide-and-conquer technique, involves dividing a module
into sub-modules and then repeating this operation on the sub-modules until the
complexity of the smaller parts becomes manageable. This approach is very similar to the
software case, where large programs are split into smaller and smaller sections until
simple subroutines, with well-defined functions and interfaces, can be written. In
Section 4.1.2, we have seen that the design of a VLSI chip can be represented in three
domains. Correspondingly, a hierarchy structure can be described in each domain
separately. However, it is important for the simplicity of design that the hierarchies
in different domains can be mapped into each other easily. This physical view describes
the external geometry of the adder and how pin locations allow some signals (in this
case the carry signals) to be transferred from one sub-block to the other without
external routing. At lower levels of the physical hierarchy, the internal mask.

Figure-4.6: Structural decomposition of a four-bit adder circuit, showing the hierarchy down to gate level.
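The same structural hierarchy can be written directly in Verilog. The sketch below is only an illustration, not the design drawn in the figure; the module names full_adder and adder4 are assumptions. A gate-level full adder acts as the leaf cell, and the four-bit adder instantiates four copies of it, passing the internal carries c1 to c3 from one sub-block to the next.

```verilog
// Leaf cell: one-bit full adder built from Verilog gate primitives.
module full_adder (
    input  wire a,
    input  wire b,
    input  wire cin,
    output wire sum,
    output wire cout
);
    wire axb, ab, cx;              // internal gate outputs
    xor g1 (axb, a, b);            // a XOR b
    xor g2 (sum, axb, cin);        // sum = a XOR b XOR cin
    and g3 (ab, a, b);             // carry generated by a and b
    and g4 (cx, axb, cin);         // carry propagated from cin
    or  g5 (cout, ab, cx);
endmodule

// Next level up: four full adders chained through their carries.
module adder4 (
    input  wire [3:0] a,
    input  wire [3:0] b,
    input  wire       cin,
    output wire [3:0] sum,
    output wire       cout
);
    wire c1, c2, c3;               // carries passed between sub-blocks
    full_adder fa0 (.a(a[0]), .b(b[0]), .cin(cin), .sum(sum[0]), .cout(c1));
    full_adder fa1 (.a(a[1]), .b(b[1]), .cin(c1),  .sum(sum[1]), .cout(c2));
    full_adder fa2 (.a(a[2]), .b(b[2]), .cin(c2),  .sum(sum[2]), .cout(c3));
    full_adder fa3 (.a(a[3]), .b(b[3]), .cin(c3),  .sum(sum[3]), .cout(cout));
endmodule
```

Each level only needs the interface of the level below it, which is what keeps the design manageable as it grows.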
Figure-4.7: Regular design of a 2-1 MUX, a DFF and an adder, using inverters and
tri-state buffers.
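As a rough illustration of the tri-state style in Figure 4.7, the 2-1 MUX can be described with Verilog's built-in not and bufif1 gate primitives. The module name mux2_tristate is hypothetical. The inverter produces the complement of the select line, so exactly one buffer drives the shared output at a time while the other is in the high-impedance (z) state.

```verilog
// 2-to-1 multiplexer made of one inverter and two tri-state buffers.
module mux2_tristate (
    input  wire a,
    input  wire b,
    input  wire sel,
    output wire out
);
    wire sel_n;
    not    inv (sel_n, sel);       // complement of the select line
    bufif1 t1  (out, a, sel);      // drives a onto out when sel = 1, else z
    bufif1 t0  (out, b, sel_n);    // drives b onto out when sel = 0, else z
endmodule
```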

4.1.4 VLSI Design Styles:

Several design styles can be considered for chip implementation of specified algorithms
or logic functions. Each design style has its own merits and shortcomings, and thus a
proper choice has to be made by designers in order to provide the functionality at low
cost.
(i) Field Programmable Gate Array (FPGA)
Fully fabricated FPGA chips containing thousands of logic gates
or even more, with programmable interconnects, are available to users for their custom
hardware programming to realize desired functionality. A typical field programmable
gate array (FPGA) chip consists of I/O buffers, an array of configurable logic blocks
(CLBs), and programmable interconnect structures. The programming of the
interconnects is implemented by programming of RAM cells whose output terminals are
connected to the gates of MOS pass transistors. A general architecture of FPGA from
XILINX is shown in Fig. 4.8. A more detailed view showing the locations of switch
matrices used for interconnect routing is given in Fig. 4.9. A simple CLB (model
XC2000 from XILINX) is shown in Fig. 4.10. It consists of four signal input terminals
(A, B, C, D), a clock signal terminal, user-programmable multiplexers, an SR-latch, and
a look-up table (LUT). The LUT is a digital memory that stores the truth table of the
Boolean function.
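Because the LUT is simply a small memory addressed by the input signals, its behavior can be sketched in a few lines of Verilog. The module lut4 and its 16-bit INIT parameter are hypothetical and do not reproduce the XC2000 configuration format; they only show how the four inputs (A, B, C, D) select one stored truth-table bit. Changing INIT changes the logic function without changing the hardware, which is the essence of FPGA configuration.

```verilog
// Hypothetical 4-input look-up table: a 16-entry truth table in INIT.
module lut4 #(
    parameter [15:0] INIT = 16'h8000   // default: 4-input AND
) (
    input  wire a,                     // address bit 0
    input  wire b,                     // address bit 1
    input  wire c,                     // address bit 2
    input  wire d,                     // address bit 3
    output wire f
);
    wire [15:0] truth = INIT;          // the "programmed" memory contents
    // The inputs form a 4-bit address; the addressed bit is the output.
    assign f = truth[{d, c, b, a}];
endmodule
```

For example, INIT = 16'h8000 realizes a four-input AND (only address 15 stores a 1), and INIT = 16'h6996 realizes a four-input XOR (odd parity).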
The CLB is configured such that many different logic functions can be realized
by programming its array. More sophisticated CLBs have also been introduced to map
complex functions. At this stage, the chip design is completely described in terms of
available logic cells. Next, the placement and routing step assigns individual logic cells
to FPGA sites (CLBs) and determines the routing patterns among the cells in accordance
with the netlist. After routing is completed, the on-chip performance of the design can be
simulated and verified before downloading the design for programming of the FPGA chip.
The programming of the chip remains valid as long as the chip is powered on or until new
programming is done. In most cases, full utilization of the FPGA chip area is not
possible; many cell sites may remain unused.

Figure-4.8: General architecture of Xilinx FPGAs.

Figure-4.9: Detailed view of switch matrices and interconnection routing between CLBs.

Figure-4.10: XC2000 CLB of the Xilinx FPGA.

The largest advantage of FPGA-based design is the very short turn-around time,
i.e., the time required from the start of the design process until a functional chip is
available. FPGA chips are usually more expensive than other realization alternatives
(such as gate arrays or standard cells) for the same design, but for small-volume
production of ASIC chips and for fast prototyping, FPGA offers a very valuable
option.
(ii) Gate Array Design
In terms of fast prototyping capability, the gate array (GA) ranks second after the
FPGA. While the design implementation of the FPGA chip is done with user
programming, that of the gate array is done with metal mask design and processing. Gate
array implementation requires a two-step manufacturing process: The first phase, which
is based on generic (standard) masks, results in an array of uncommitted transistors on
each GA chip. These uncommitted chips can be stored for later customization, which is
completed by defining the metal interconnects between the transistors of the array (Fig.
4.11). Since the patterning of metallic interconnects is done at the end of the chip
fabrication, the turn-around time can still be short, a few days to a few weeks. Figure
4.12 shows a corner of a gate array chip which contains bonding pads on its left and
bottom edges, diodes for I/O protection, nMOS transistors and pMOS transistors for chip
output driver circuits in the neighboring areas of bonding pads, arrays of nMOS
transistors and pMOS transistors, underpass wire segments, and power and ground buses
along with contact windows.

Figure-4.11: Basic processing steps required for gate array implementation.

Figure-4.12: A corner of a typical gate array chip.

Figure 4.13 shows a magnified portion of the internal array with metal mask
design (metal lines highlighted in dark) to realize a complex logic function. Typical gate
array platforms allow dedicated areas, called channels, for intercell routing between rows
or columns of MOS transistors, as shown in Figs. 4.12 and 4.13. The interconnection
patterns to realize basic logic gates can be stored in a library. Some platforms also
offer dedicated memory (RAM) arrays to allow a higher density where memory
functions are required. Figure 4.14 shows the layout views of a conventional gate array
and a gate array platform with two dedicated memory banks. With the use of multiple
interconnect layers, the routing can be achieved over the active cell areas; thus, the
routing channels can be removed as in Sea-of-Gates (SOG) chips. Here, the entire chip
surface is covered with uncommitted nMOS and pMOS transistors. As in the gate array
case, neighboring transistors can be customized using a metal mask to form basic logic
gates. For intercell routing, however, some of the uncommitted transistors must be
sacrificed. This approach results in more flexibility for interconnections, and usually in a
higher density. The basic platform of a SOG chip is shown in Fig. 4.15. Figure 4.16
offers a brief comparison between the channeled (GA) and the channelless (SOG)
approaches.

Figure-4.13: Metal mask design to realize a complex logic function on a channeled


GA platform.

Figure-4.14: Layout views of a conventional GA chip and a gate array with two
memory banks.
Figure-4.15: The platform of a Sea-of-Gates (SOG) chip.

In general, the GA chip utilization factor, as measured by the used chip area
divided by the total chip area, is higher than that of the FPGA and so is the chip speed,
since more customized design can be achieved with metal mask designs. Current
gate array chips can implement as many as hundreds of thousands of logic gates.
(iii) Standard-Cells Based Design
The standard-cells based design is one of the most prevalent full custom design
styles, which require the development of a full custom mask set. The standard cell is also
called the polycell. In this design style, all of the commonly used logic cells are
developed, characterized, and stored in a standard cell library. A typical library may
contain a few hundred cells including inverters, NAND gates, NOR gates, complex AOI
and OAI gates, D-latches, and flip-flops. Each cell is characterized in several
different categories, including:

• delay time vs. load capacitance
• circuit simulation model
• timing simulation model
• fault simulation model
• cell data for place-and-route
• mask data

To enable automated placement of the cells and routing of inter-cell connections,
each cell layout is designed with a fixed height. The power and ground rails typically run
parallel to the upper and lower boundaries of the cell; thus, neighboring cells share a
common power and ground bus. The input and output pins are located on the upper and
lower boundaries of the cell. Figure 4.17 shows the layout of a typical standard cell.
Notice that the nMOS transistors are located closer to the ground rail while the pMOS
transistors are placed closer to the power rail.

Figure-4.16: A standard cell layout example.

Figure 4.18 shows a floor plan for standard-cell based design. Inside the I/O
frame, which is reserved for I/O cells, the chip area contains rows or columns of standard
cells. Between cell rows are channels for dedicated inter-cell routing. As in the case of
Sea-of-Gates, with over-the-cell routing, the channel areas can be reduced or even
removed provided that the cell rows offer sufficient routing space. The physical design
and layout of logic cells ensure that when cells are placed into rows, their heights are
matched and neighboring cells can be abutted side-by-side, which provides natural
connections for power and ground lines in each row. The signal delay, noise margins,
and power consumption of each cell should also be optimized with proper sizing of
transistors using circuit simulation.

Figure-4.17: A simplified floor plan of standard-cells-based design.


If a number of cells must share the same input and/or output signals, a common
signal bus structure can also be incorporated into the standard-cell-based chip layout.
Figure 4.19 shows the simplified symbolic view of a case where a signal bus has been
inserted between the rows of standard cells. Note that in this case the chip consists of
two blocks, and power/ground routing must be provided from both sides of the layout
area. Standard-cell based designs may consist of several such macro-blocks, each
corresponding to a specific unit of the system architecture such as ALU, control logic,
etc.

Figure-4.18: Simplified floor plan consisting of two separate blocks and a common
signal bus.

After chip logic design is done using standard cells in the library, the most
challenging task is to place individual cells into rows and interconnect them in a way that
meets stringent design goals in circuit speed, chip area, and power consumption. Many
advanced CAD tools for place-and-route have been developed and used to achieve such
goals. In addition, circuit models that include interconnect parasitics can be extracted
from the chip layout and used for timing simulation and analysis to identify timing-critical
paths. For timing-critical paths, proper gate sizing is often practiced to meet the timing
requirements. In many VLSI chips, such as microprocessors and digital signal processing
chips, standard-cells based design is used to implement complex control logic modules.
Some full custom chips can also be implemented exclusively with standard cells.

Finally, Fig. 4.20 shows the detailed mask layout of a standard-cell-based chip
with an uninterrupted single block of cell rows, and three memory banks placed on one
side of the chip. Notice that within the cell block, the separations between neighboring
rows depend on the number of wires in the routing channel between the cell rows. If a
high interconnect density can be achieved in the routing channel, the standard cell rows
can be placed closer to each other, resulting in a smaller chip area. The availability of
dedicated memory blocks also reduces the area, since the realization of memory
elements using standard cells would occupy a larger area.

Figure-4.19: Mask layout of a standard-cell-based chip with a single block of cells and three memory banks.

(iv) Full Custom Design


Although the standard-cells based design is often called full custom design, in a
strict sense, it is somewhat less than fully custom since the cells are pre-designed for
general use and the same cells are utilized in many different chip designs. In a fuller
custom design, the entire mask design is done anew without use of any library. However,
the development cost of such a design style is becoming prohibitively high. Thus, the
concept of design reuse is becoming popular in order to reduce design cycle time and
development cost. For logic chip design, a good compromise can be achieved by combining
different design styles on the same chip, such as standard cells, data-path cells and PLAs.
In real full-custom layout, in which the geometry, orientation and placement of every
transistor is done individually by the designer, design productivity is usually very low,
typically 10 to 20 transistors per day, per designer. In digital CMOS VLSI, full-custom
design is rarely used due to the high labor cost. Exceptions include the design of
high-volume products such as memory chips, high-performance microprocessors and FPGA
masters.
Figure 4.21 shows the full layout of the Intel 486 microprocessor chip, which is a good
example of a hybrid full-custom design. Here, one can identify four different design
styles on one chip: memory banks (RAM cache), data-path units consisting of bit-slice
cells, control circuitry mainly consisting of standard cells, and PLA blocks.

Figure-4.20: Overview of VLSI design styles.


4.2 INTRODUCTION TO XILINX:
4.2.1 MIGRATING PROJECTS FROM PREVIOUS ISE SOFTWARE RELEASES:
When you open a project file from a previous release, the ISE® software prompts
you to migrate your project. If you click Backup and Migrate or Migrate only, the
software automatically converts your project file to the current release. If you click
Cancel, the software does not convert your project and, instead, opens Project Navigator
with no project loaded.
Note: After you convert your project, you cannot open it in previous versions of the
ISE software, such as the ISE 11 software. However, you can optionally create a
backup of the original project as part of project migration, as described below.
4.2.2 To Migrate a Project

• In the ISE 12 Project Navigator, select File > Open Project.
• In the Open Project dialog box, select the .xise file to migrate.
  Note: You may need to change the extension in the Files of type field to
  display .npl (ISE 5 and ISE 6 software) or .ise (ISE 7 through ISE 10
  software) project files.
• In the dialog box that appears, select Backup and Migrate or Migrate Only.
• The ISE software automatically converts your project to an ISE 12 project.
  Note: If you chose to Backup and Migrate, a backup of the original
  project is created at project_name_ise12migration.zip.
• Implement the design using the new version of the software.

Note: Implementation status is not maintained after migration.


4.2.3 IP Modules:
If your design includes IP modules that were created using CORE Generator™
software or Xilinx® Platform Studio (XPS) and you need to modify these modules, you
may be required to update the core. However, if the core netlist is present and you do
not need to modify the core, updates are not required and the existing netlist is used
during implementation.
4.2.4 Obsolete Source File Types:
The ISE 12 software supports all of the source types that were supported in the
ISE 11 software. If you are working with projects from previous releases, state diagram
source files (.dia), ABEL source files (.abl), and test bench waveform source files (.tbw)
are no longer supported. For state diagram and ABEL source files, the software finds an
associated HDL file and adds it to the project, if possible. To convert a TBW file after
project migration, see Converting a TBW File to an HDL Test Bench.

4.2.5 Using ISE Example Projects:


To help familiarize you with the ISE® software and with FPGA and CPLD
designs, a set of example designs is provided with Project Navigator. The examples
show different design techniques and source types, such as VHDL, Verilog, schematic,
or EDIF, and include different constraints and IP.

To Open an Example

• Select File > Open Example.
• In the Open Example dialog box, select the Sample Project Name.
  o Note: To help you choose an example project, the Project Description
    field describes each project. In addition, you can scroll to the right to see
    additional fields, which provide details about the project.
• In the Destination Directory field, enter a directory name or browse to the
  directory.
• Click OK.
  o The example project is extracted to the directory you specified in the
    Destination Directory field and is automatically opened in Project
    Navigator. You can then run processes on the example project and save
    any changes.

Note: If you modified an example project and want to overwrite it with the
original example project, select File > Open Example, select the Sample Project Name,
and specify the same Destination Directory you originally used. In the dialog box that
appears, select Overwrite the existing project and click OK.
4.2.6 Creating a Project:
Project Navigator allows you to manage your FPGA and CPLD designs using an
ISE® project, which contains all the source files and settings specific to your design.
First, you create a project; then you add source files and set process properties.
After you create a project, you can run processes to implement, constrain, and analyze
your design. Project Navigator provides a wizard to help you create a project as follows.
Note: If you prefer, you can create a project using the New Project dialog box
instead of the New Project Wizard. To use the New Project dialog box, deselect the
Use New Project wizard option in the ISE General page of the Preferences dialog
box.
To Create a Project

• Select File > New Project to launch the New Project Wizard.
• In the Create New Project page, set the name, location, and project type, and
  click Next.
• For EDIF or NGC/NGO projects only: In the Import EDIF/NGC Project page,
  select the input and constraint file for the project, and click Next.
• In the Project Settings page, set the device and project properties, and click
  Next.
• In the Project Summary page, review the information, and click Finish to
  create the project.

4.2.7 Design panel:


Project Navigator manages your project based on the design properties (top-level
module type, device type, synthesis tool, and language) you selected when you created
the project. It organizes all the parts of your design and keeps track of the processes
necessary to move the design from design entry through implementation to
programming the targeted Xilinx® device.
Note: For information on changing design properties, see Changing Design Properties.
You can now perform any of the following:
• Create new source files for your project.
• Add existing source files to your project.
• Run processes on your source files.

4.2.8 Creating a Copy of a Project:


You can create a copy of a project to experiment with different source options
and implementations. Depending on your needs, the design source files for the copied
project and their location can vary as follows:

• Design source files are left in their existing location, and the copied project
  points to these files.
• Design source files, including generated files, are copied and placed in a
  specified directory.
• Design source files, excluding generated files, are copied and placed in a
  specified directory.

4.2.9 Using the Project Browser:


Alternatively, you can create an archive of your project, which puts all of
the project contents into a ZIP file. Archived projects must be unzipped before being
opened in Project Navigator. For information on archiving, see Creating a Project
Archive.
To Create a Copy of a Project

1. Select File > Copy Project.
2. In the Copy Project dialog box, enter the Name for the copy.
Note: The name for the copy can be the same as the name for the project, as long
as you specify a different location.
3. Enter a directory Location to store the copied project.
4. Optionally, enter a Working directory.
By default, this is blank, and the working directory is the same as the project
directory. However, you can specify a working directory if you want to keep
your ISE® project file (.xise extension) separate from your working area.
5. Optionally, enter a Description for the copy.
The description can be useful in identifying key traits of the project for
reference later.
6. In the Source options area, do the following:
Select one of the following options:
• Keep sources in their current locations - to leave the design source files in their
  existing location.
4.2.10. Exclude generated files from the copy:
When you select this option, the copied project opens in a state in which
processes have not yet been run. To automatically open the copy after creating it, select
Open the copied project.
Note: By default, this option is disabled. If you leave this option disabled, the original
project remains open after the copy is made. Click OK.
4.2.11 Creating a Project Archive:
A project archive is a single, compressed ZIP file with a .zip extension. By default, it
contains all project files, source files, and generated files, including the following:

• User-added sources and associated files
• Remote sources
• Verilog `include files
• Files in the macro search path
• Generated files
• Non-project files
4.2.12 Archive a Project:

• Select Project > Archive.
• In the Project Archive dialog box, specify a file name and directory for
  the ZIP file.
• Optionally, select Exclude generated files from the archive to exclude
  generated files and non-project files from the archive.
• Click OK.

A ZIP file is created in the specified directory. To open the archived project, you
must first unzip the ZIP file, and then, you can open the project.
Note: Sources that reside outside of the project directory are copied into a
remote_sources subdirectory in the project archive.
4.3 Introduction to Verilog:
In the semiconductor and electronic-design industry, Verilog is a hardware description
language (HDL) used to model electronic systems. Verilog HDL, not to be confused
with VHDL (a competing language), is most commonly used in the design, verification,
and implementation of digital logic chips at the register-transfer level of abstraction. It is
also used in the verification of analog and mixed-signal circuits.
4.3.1 Overview:

Hardware description languages such as Verilog differ from software programming
languages because they include ways of describing the propagation of time and signal
dependencies (sensitivity). There are two assignment operators: a blocking assignment
(=) and a non-blocking (<=) assignment. The non-blocking assignment allows designers
to describe a state-machine update without needing to declare and use temporary
storage variables (in a general-purpose programming language, such temporary
variables are needed to hold operand values that will be used later). Since these concepts
are part of Verilog's language semantics, designers could quickly write descriptions of
large circuits in a relatively compact and concise form. At the time of Verilog's
introduction (1984), Verilog represented a tremendous productivity improvement for
circuit designers who were already using graphical schematic capture software and
specially-written software programs to document and simulate electronic circuits.
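A shift register, the circuit studied in this thesis, shows why the non-blocking assignment matters. In the hypothetical module shift4 below, each stage is written with a non-blocking assignment, and the statements are deliberately listed from the input stage to the output stage. Because every right-hand side is sampled before any stage is updated, each stage still receives its neighbor's old value on a clock edge, and no temporary variables are needed. With blocking assignments in the same order, the input value would ripple through all four stages in a single clock.

```verilog
// Hypothetical 4-stage serial-in, parallel-out shift register.
module shift4 (
    input  wire       clk,
    input  wire       rst,
    input  wire       din,
    output reg  [3:0] q
);
    always @(posedge clk or posedge rst)
        if (rst)
            q <= 4'b0000;
        else begin
            // Listed from input to output on purpose: non-blocking
            // assignments read all right-hand sides first, so each
            // stage still receives its neighbor's previous value.
            q[0] <= din;
            q[1] <= q[0];
            q[2] <= q[1];
            q[3] <= q[2];
        end
endmodule
```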
Verilog's concept of 'wire' consists of both signal values (4-state: "1, 0, floating,
undefined") and strengths (strong, weak, etc.). This system allows abstract modeling of
shared signal lines, where multiple sources drive a common net. When a wire has
multiple drivers, the wire's (readable) value is resolved by a function of the source
drivers and their strengths. A subset of statements in the Verilog language
is synthesizable. Verilog modules that conform to a synthesizable coding style, known as
RTL (register-transfer level), can be physically realized by synthesis software. Synthesis
software algorithmically transforms the (abstract) Verilog source into a netlist, a
logically equivalent description consisting only of elementary logic primitives (AND,
OR, NOT, flip-flops, etc.) that are available in a specific FPGA or VLSI technology.
Further manipulations to the netlist ultimately lead to a circuit fabrication blueprint
(such as a photo mask set for an ASIC or a bitstream file for an FPGA).
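The resolution of a net with several drivers can be seen in a small sketch. The module shared_net below is hypothetical: two continuous assignments drive the same wire, and each driver outputs z (high impedance) when it is disabled. In simulation, bus takes the value of the single enabled driver, floats to z when neither driver is enabled, and becomes x (undefined) when both drive opposite values with equal strength.

```verilog
// Two tri-state drivers sharing one net; the simulator resolves its value.
module shared_net (
    input  wire en_a,    // driver A enable
    input  wire val_a,   // driver A data
    input  wire en_b,    // driver B enable
    input  wire val_b,   // driver B data
    output wire bus
);
    // Each continuous assignment is a separate driver of the same net.
    assign bus = en_a ? val_a : 1'bz;
    assign bus = en_b ? val_b : 1'bz;
endmodule
```

A net declared as wire accepts multiple drivers in this way, whereas a reg is assigned only procedurally and cannot be driven by several continuous assignments.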

4.3.2 HISTORY:
4.3.2 (a) Beginning

Verilog was one of the first modern hardware description languages to be invented. It
was created by Phil Moorby and Prabhu Goel during the winter of 1983/1984 at Automated
Integrated Design Systems (renamed Gateway Design Automation in 1985) as a hardware
modeling language. Gateway Design Automation was purchased by Cadence Design Systems
in 1990. Cadence now has full proprietary rights to Gateway's Verilog and the Verilog-XL,
the HDL simulator that would become the de facto standard (of Verilog logic simulators)
for the next decade. Originally, Verilog was intended to describe and allow simulation;
only afterwards was support for synthesis added.

4.3.2 (b) Verilog-95

With the increasing success of VHDL at the time, Cadence decided to make the
language available for open standardization. Cadence transferred Verilog into the public
domain under the Open Verilog International (OVI) (now known as Accellera)
organization. Verilog was later submitted to IEEE and became IEEE Standard 1364-
1995, commonly referred to as Verilog-95. In the same time frame Cadence initiated the
creation of Verilog-A to put standards support behind its analog simulator Spectre.
Verilog-A was never intended to be a standalone language and is a subset of Verilog-
AMS which encompassed Verilog-95.
4.3.2(c) Verilog 2001

Extensions to Verilog-95 were submitted back to IEEE to cover the deficiencies that
users had found in the original Verilog standard. These extensions became IEEE Standard
1364-2001, known as Verilog-2001. Verilog-2001 is a significant upgrade from Verilog-95.
First, it adds explicit support for (2's complement) signed nets and variables.
Previously, code authors had to perform signed operations using awkward bit-level
manipulations. The same function under Verilog-2001 can be more succinctly described by
one of the built-in operators: +, -, /, *, >>>. A generate/endgenerate construct (similar
to VHDL's generate/end generate) allows Verilog-2001 to control instance and statement
instantiation through normal decision operators (case/if/else). Using
generate/endgenerate, Verilog-2001 can instantiate an array of instances, with control
over the connectivity of the individual instances. File I/O has been improved by several
new system tasks. Finally, a few syntax additions were introduced to improve code
readability (e.g. always @*, named parameter override, C-style function/task/module
header declaration).
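Several of these Verilog-2001 additions appear together in the hypothetical sketch below: C-style (ANSI) port headers, a parameter, and a generate loop that instantiates an array of flip-flops while controlling how each instance is connected. The names dff_stage and shift_reg_gen are assumptions made for illustration; this is a plain flip-flop shift register, not the pulsed-latch design proposed in this thesis.

```verilog
// One storage stage: a positive-edge-triggered D flip-flop.
module dff_stage (
    input  wire clk,
    input  wire d,
    output reg  q
);
    always @(posedge clk)
        q <= d;
endmodule

// Parameterized N-stage shift register built with a generate loop.
module shift_reg_gen #(
    parameter N = 8                   // number of stages
) (
    input  wire         clk,
    input  wire         din,
    output wire [N-1:0] q
);
    wire [N:0] chain;                 // chain[0] is the serial input
    assign chain[0] = din;

    genvar i;
    generate
        for (i = 0; i < N; i = i + 1) begin : g_stage
            // Stage i reads chain[i] and drives chain[i+1].
            dff_stage u_ff (.clk(clk), .d(chain[i]), .q(chain[i+1]));
        end
    endgenerate

    assign q = chain[N:1];            // q[0] is the first stage
endmodule
```

The stage count is chosen at instantiation with a named parameter override, for example shift_reg_gen #(.N(256)) sr (.clk(clk), .din(din), .q(q)); with q declared as a 256-bit wire.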

4.3.2(d) Verilog 2005

Not to be confused with SystemVerilog, Verilog 2005 (IEEE Standard 1364-2005) consists
of minor corrections, spec clarifications, and a few new language features (such as the
uwire keyword). A separate part of the Verilog standard, Verilog-AMS, attempts to
integrate analog and mixed-signal modeling with traditional Verilog.

4.3.2(e) SystemVerilog

SystemVerilog is a superset of Verilog-2005, with many new features and capabilities to
aid design verification and design modeling. As of 2009, the SystemVerilog and Verilog
language standards were merged into SystemVerilog 2009 (IEEE Standard 1800-2009). In
the late 1990s, the Verilog Hardware Description Language (HDL) became the most widely
used language for describing hardware for simulation and synthesis. However, the first
two versions standardized by the IEEE (1364-1995 and 1364-2001) had only simple
constructs for creating tests. As design sizes outgrew the verification capabilities of
the language, commercial Hardware Verification Languages (HVL) such as OpenVera and e
were created. Companies that did not want to pay for these tools instead spent hundreds
of man-years creating their own custom tools. This productivity crisis (along with a
similar one on the design side) led to the creation of Accellera, a consortium of EDA
companies and users who wanted to create the next generation of Verilog. The donation
of the OpenVera language formed the basis for the HVL features of SystemVerilog.
Accellera's goal was met in November 2005 with the adoption of IEEE Standard 1800-2005
for SystemVerilog, IEEE (2005). The most valuable benefit of SystemVerilog is that it
allows the user to construct reliable, repeatable verification environments, in a
consistent syntax, that can be used across multiple projects.

Some of the typical features of an HVL that distinguish it from a Hardware Description
Language such as Verilog or VHDL are:
• Constrained-random stimulus generation
• Functional coverage
• Higher-level structures, especially Object Oriented Programming
• Multi-threading and interprocess communication
• Support for HDL types such as Verilog's 4-state values
• Tight integration with event-simulator for control of the design
There are many other useful features, but these allow you to create test benches at
a higher level of abstraction than you are able to achieve with an HDL or a programming
language such as C. SystemVerilog provides the best framework to achieve coverage-driven
verification (CDV). CDV combines automatic test generation, self-checking test
benches, and coverage metrics to significantly reduce the time spent verifying a design.
The purpose of CDV is to:
• Eliminate the effort and time spent creating hundreds of tests.
• Ensure thorough verification using up-front goal setting.

4.3.2(f) Examples
Ex1: A hello world program looks like this:
module main;
  initial
    begin
      $display("Hello world!");
      $finish;
    end
endmodule
Ex2: A simple example of two flip-flops follows:

module top_level(clock, reset);
  input clock;
  input reset;
  reg flop1;
  reg flop2;
  always @ (posedge reset or posedge clock)
    if (reset)
      begin
        flop1 <= 0;
        flop2 <= 1;
      end
    else
      begin
        flop1 <= flop2;
        flop2 <= flop1;
      end
endmodule

The "<=" operator in Verilog is another aspect of its being a hardware description
language as opposed to a normal procedural language. This is known as a "non-blocking"
assignment. The new value is not written until the end of the current simulation time
step, after the right-hand sides of all such assignments at that clock edge have been
evaluated. This means that the order of the assignments is irrelevant and will produce
the same result: flop1 and flop2 will swap values every clock. The other assignment
operator, "=", is referred to as a blocking assignment. When "=" assignment is used, for
the purposes of logic, the target variable is updated immediately. In the above example,
had the statements used the "=" blocking operator instead of "<=", flop1 and flop2 would
not have been swapped; both would end up holding the old value of flop2.

Ex3: An example counter circuit follows:

module Div20x (rst, clk, cet, cep, count, tc);


// TITLE 'Divide-by-20 Counter with enables'
// enable CEP is a clock enable only
// enable CET is a clock enable and
// enables the TC output
LOW POWER AND AREA-EFFICIENT SHIFT REGISTER USING PULSED LATCHES

// a counter using the Verilog language


parameter size = 5;
parameter length = 20;
input rst; // These inputs/outputs represent
input clk; // connections to the module.
input cet;
input cep;
output [size-1:0] count;
output tc;
reg [size-1:0] count; // Signals assigned
// within an always
// (or initial) block
// must be of type reg
wire tc; // Other signals are of type wire
// The always statement below is a parallel process
// that executes any time the signals
// rst or clk transition from low to high
always @ (posedge clk or posedge rst)
if (rst) // This causes reset of the cntr
count <= {size{1'b0}};
else
if (cet && cep) // Enables both true
begin
if (count == length-1)
count <= {size{1'b0}};
else
count <= count + 1'b1;
end
// tc is continuously assigned
// the value of the expression
assign tc = (cet && (count == length-1));
endmodule

Ex4: An example of delays:

reg a, b, c, d;
wire e;
...
always @(b or e)
begin
a = b & e;
b = a | b;
#5 c = b;
d = #6 c ^ e;
end
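The two delay forms in Ex4 behave differently: `#5 c = b;` waits and then evaluates `b`, while `d = #6 c ^ e;` evaluates `c ^ e` immediately and only writes `d` six time units later. The stand-alone sketch below isolates that difference; the module name, initial values and stimulus times are assumptions chosen to make the effect visible.

```verilog
// Regular delay vs intra-assignment delay (illustrative module).
module delay_demo;
  reg b, c, d, e;

  initial begin
    b = 1; c = 1; d = 1; e = 0;
    // Regular delay: wait 5 units, THEN evaluate b (b is 0 by then).
    #5 c = b;            // t=5: c = 0
    // Intra-assignment delay: evaluate c ^ e NOW (t=5, 0 ^ 0 = 0),
    // but write the result to d only at t=11.
    d = #6 c ^ e;
  end

  initial begin
    #2 b = 0;            // t=2: changes b before c is assigned
    #5 e = 1;            // t=7: changes e while d's update is pending
  end

  initial
    #12 $display("t=%0t c=%b d=%b", $time, c, d);  // c=0 d=0
endmodule
```

With a regular delay instead (`#6 d = c ^ e;`), the expression would be evaluated at t=11, after e has changed, and d would end up as 1.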

4.3.3 Constants:

The definition of constants in Verilog supports the addition of a width parameter.


The basic syntax is:

<Width in bits>'<base letter><number>

Examples:

• 12'h123 - Hexadecimal 123 (using 12 bits)
• 20'd44 - Decimal 44 (using 20 bits - 0 extension is automatic)
• 4'b1010 - Binary 1010 (using 4 bits)
• 6'o77 - Octal 77 (using 6 bits)
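The sketch below assigns each constant from the list to a register of matching width and prints the decimal values; the register names are illustrative. The last assignment shows the opposite of zero extension: a constant wider than its target is truncated to the low-order bits.

```verilog
// Sized constants: <width>'<base><value>. Register names are illustrative.
module const_demo;
  reg [11:0] h;
  reg [19:0] dec;
  reg [3:0]  bin;
  reg [5:0]  oct;
  reg [3:0]  trunc;

  initial begin
    h     = 12'h123;   // 0001_0010_0011 = 291
    dec   = 20'd44;    // upper bits are zero-extended
    bin   = 4'b1010;   // 10
    oct   = 6'o77;     // 111_111 = 63
    trunc = 8'hAB;     // 8 bits into a 4-bit reg: only the low nibble (4'hB) is kept
    $display("h=%0d dec=%0d bin=%0d oct=%0d trunc=%h", h, dec, bin, oct, trunc);
  end
endmodule
```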

4.3.4 Synthesizable Constructs:

There are several statements in Verilog that have no analog in real hardware, e.g.
$display. Consequently, much of the language cannot be used to describe hardware. The
examples presented here are the classic subset of the language that has a direct mapping
to real gates.

// Mux examples - Three ways to do the same thing.
// (These are alternatives: a module would contain only one
// of the three declarations of out.)

// The first example uses continuous assignment
wire out;
assign out = sel ? a : b;
// the second example uses a procedure
// to accomplish the same thing.
reg out;
always @(a or b or sel)
begin
case(sel)
1'b0: out = b;
1'b1: out = a;
endcase
end
// Finally - you can use if/else in a
// procedural structure.
reg out;
always @(a or b or sel)
if (sel)
out = a;
else
out = b;
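To check that the three styles really describe the same multiplexer, each can be wrapped in its own module and compared over all eight input combinations. The module names (`mux_assign`, `mux_case`, `mux_if`, `mux_demo`) are hypothetical.

```verilog
// The three 2:1 mux styles above, each in its own (hypothetical) module.
module mux_assign (input a, b, sel, output out);
  assign out = sel ? a : b;                // continuous assignment
endmodule

module mux_case (input a, b, sel, output reg out);
  always @(a or b or sel)
    case (sel)                             // procedural case
      1'b0: out = b;
      1'b1: out = a;
    endcase
endmodule

module mux_if (input a, b, sel, output reg out);
  always @(a or b or sel)
    if (sel) out = a;                      // procedural if/else
    else     out = b;
endmodule

module mux_demo;
  reg a, b, sel;
  wire o1, o2, o3;
  integer i, mismatches;

  mux_assign u1 (a, b, sel, o1);
  mux_case   u2 (a, b, sel, o2);
  mux_if     u3 (a, b, sel, o3);

  initial begin
    mismatches = 0;
    for (i = 0; i < 8; i = i + 1) begin
      {sel, a, b} = i;                     // low 3 bits of i: every input combination
      #1;
      if (o1 !== o2 || o2 !== o3 || o1 !== (sel ? a : b))
        mismatches = mismatches + 1;
    end
    $display("mismatches = %0d", mismatches);
  end
endmodule
```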

The next interesting structure is a transparent latch; it will pass the input to the
output when the gate signal is set for "pass-through", and captures the input and stores it
upon transition of the gate signal to "hold". In the example below the "pass-through"
level of the gate would be when the value of the if clause is true, i.e. gate = 1. This is
read "if gate is true, the din is fed to latch out continuously." Once the if clause is false,
the last value at latch out will remain and is independent of the value of din.

Ex6: // Transparent latch example


reg out;
always @(gate or din)
if(gate)
out = din; // Pass through state
// Note that the else isn't required here. The variable out
// follows din while gate is high; when gate goes low, out
// will remain constant.
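The following minimal sketch wraps the latch template in a module and drives it through a pass-through phase and a hold phase; the names `d_latch` and `latch_demo` are illustrative. This level-sensitive behaviour is also what the pulsed latches later in this report rely on.

```verilog
// The transparent latch from Ex6 in a module, plus a stimulus that shows
// the pass-through and hold phases. Names are illustrative.
module d_latch (input gate, input din, output reg out);
  always @(gate or din)
    if (gate)
      out = din;               // pass-through while gate = 1
endmodule

module latch_demo;
  reg gate, din;
  reg out_pass, out_hold;      // samples taken during each phase
  wire out;

  d_latch u0 (gate, din, out);

  initial begin
    gate = 1; din = 0;
    #1 din = 1;                // gate open: out follows din
    #1 out_pass = out;         // 1
    gate = 0;                  // close the latch while out = 1
    #1 din = 0;                // change din while the latch is closed
    #1 out_hold = out;         // still 1
    $display("pass=%b hold=%b", out_pass, out_hold);
  end
endmodule
```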

The flip-flop is the next significant template; in Verilog, the D-flop is the
simplest, and it can be modeled as:

reg q;
always @(posedge clk)
q <= d;
The significant thing to notice in the example is the use of the non-blocking
assignment. A basic rule of thumb is to use <= when there is a
posedge or negedge statement within the always clause. A variant of the D-flop is one
with an asynchronous reset.

reg q;
always @(posedge clk or posedge reset)
if(reset)
q <= 0;
else
q <= d;

The next variant includes both an asynchronous reset and an asynchronous set
condition; again the convention comes into play, i.e. the reset term is followed by the set
term.

reg q;
always @(posedge clk or posedge reset or posedge set)
if(reset)
q <= 0;
else
if(set)
q <= 1;
else
q <= d;

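Note: If this model is used to model a Set/Reset flip-flop, simulation errors
can result. Consider the following test sequence of events: 1) reset goes high, 2) clk goes
high, 3) set goes high, 4) clk goes high again, 5) reset goes low, followed by 6) set going
low. At step 5, reset is released while set is still high, so a real flip-flop would set q to 1.
However, the always block is sensitive only to the rising edges of clk, reset and set, so the
falling edge of reset does not trigger it and the simulated q stays at 0.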
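The sketch below drives the set/reset model with exactly the six-step sequence from the note and records q right after step 5. The module names are illustrative; the flip-flop body is the model from the text.

```verilog
// Set/reset flip-flop model from the text, driven with the six-step
// sequence from the note. Module and signal names are illustrative.
module dff_sr (input clk, reset, set, d, output reg q);
  always @(posedge clk or posedge reset or posedge set)
    if (reset)
      q <= 0;
    else if (set)
      q <= 1;
    else
      q <= d;
endmodule

module sr_mismatch_demo;
  reg clk, reset, set, d;
  reg q_after_release;
  wire q;

  dff_sr u0 (clk, reset, set, d, q);

  initial begin
    clk = 0; reset = 0; set = 0; d = 0;
    #1 reset = 1;              // 1) reset goes high  -> q = 0
    #1 clk = 1;                // 2) clk goes high
    #1 clk = 0; set = 1;       // 3) set goes high (reset still has priority)
    #1 clk = 1;                // 4) clk goes high again
    #1 reset = 0;              // 5) reset goes low while set is still high
    #1 q_after_release = q;    // model: 0, real flip-flop: 1
    set = 0;                   // 6) set goes low
    $display("q after reset release = %b", q_after_release);
  end
endmodule
```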

4.3.5 Initial Vs Always:

There are two separate ways of declaring a Verilog process. These are
the always and the initial keywords. The always keyword indicates a free-running
process. The initial keyword indicates a process executes exactly once. Both constructs
begin execution at simulator time 0, and both execute until the end of the block. Once
an always block has reached its end, it is rescheduled (again).
//Examples:
initial
begin
a = 1; // Assign a value to reg a at time 0
#1; // Wait 1 time unit
b = a; // Assign the value of reg a to reg b
end
always @(a or b) // Any time a or b CHANGE, run the process
begin
if (a)
c = b;
else
d = ~b;
end // Done with this block, now return to the top (i.e. the @ event-control)
always @(posedge a)// Run whenever reg a has a low to high change
a <= b;

These are the classic uses for these two keywords, but there are two significant
additional uses. The most common of these is an always keyword without
the @(...) sensitivity list. It is possible to use always as shown below:

always
begin // Always begins executing at time 0 and NEVER stops
clk = 0; // Set clk to 0
#1; // Wait for 1 time unit
clk = 1; // Set clk to 1
#1; // Wait 1 time unit
end // Keeps executing - so continue back at the top of the begin

The always keyword acts like the C construct while(1) {...} in the sense
that it will execute forever. The other interesting exception is the use of
the initial keyword with the addition of the forever keyword.
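The initial + forever combination gives the same free-running clock as the always block above. In this minimal sketch the edge counter and the final $finish are added only so that the simulation terminates; the names are illustrative.

```verilog
// initial + forever clock generator; the counter and $finish exist only
// so the demonstration terminates (names are illustrative).
module forever_clk_demo;
  reg clk;
  integer rising_edges, edges_at_10;

  initial begin
    clk = 0;
    forever #1 clk = ~clk;            // toggles every unit: period of 2 time units
  end

  initial rising_edges = 0;
  always @(posedge clk) rising_edges = rising_edges + 1;

  initial begin
    #10 edges_at_10 = rising_edges;   // edges at t = 1, 3, 5, 7, 9
    $display("rising edges by t=10: %0d", edges_at_10);  // 5
    #10 $finish;                      // forever never stops by itself
  end
endmodule
```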
4.3.6 Race Condition:

The order of execution isn't always guaranteed within Verilog. This can best be
illustrated by a classic example. Consider the code snippet below:

initial
a = 0;
initial
b = a;
initial
begin
#1;
$display ("Value a=%b Value of b=%b",a,b);
end
What will be printed out for the values of a and b? Depending on the order of execution
of the initial blocks, it could be zero and zero, or zero and x (if b = a runs before a = 0,
b copies the still-unassigned value x).
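One way to make the result deterministic is to put the dependent assignments in a single initial block, where statements execute in order. This is a sketch of that fix, reusing the variable names of the snippet above; the module name is illustrative.

```verilog
// Race-free version of the snippet above: the dependent assignments share
// one process, so "b = a" always runs after "a = 0". Module name is illustrative.
module race_free_demo;
  reg a, b;

  initial begin
    a = 0;
    b = a;                     // guaranteed to see a = 0
  end

  initial begin
    #1;
    $display("Value a=%b Value of b=%b", a, b);  // a=0, b=0
  end
endmodule
```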
4.3.7 Operators:

Note: These operators are not shown in order of precedence.

Operator type    Operator symbols    Operation performed

Bitwise          ~                   Bitwise NOT (1's complement)
                 &                   Bitwise AND
                 |                   Bitwise OR
                 ^                   Bitwise XOR
                 ~^ or ^~            Bitwise XNOR
Logical          !                   NOT
                 &&                  AND
                 ||                  OR
Reduction        &                   Reduction AND
                 ~&                  Reduction NAND
                 |                   Reduction OR
                 ~|                  Reduction NOR
                 ^                   Reduction XOR
                 ~^ or ^~            Reduction XNOR
Arithmetic       +                   Addition
                 -                   Subtraction
                 -                   2's complement
                 *                   Multiplication
                 /                   Division
                 **                  Exponentiation (*Verilog-2001)
Relational       >                   Greater than
                 <                   Less than
                 >=                  Greater than or equal to
                 <=                  Less than or equal to
                 ==                  Logical equality (result can be x if an operand
                                     has x or z bits)
                 !=                  Logical inequality (result can be x if an operand
                                     has x or z bits)
                 ===                 4-state logical equality (x and z bits are
                                     compared literally)
                 !==                 4-state logical inequality (x and z bits are
                                     compared literally)
Shift            >>                  Logical right shift
                 <<                  Logical left shift
                 >>>                 Arithmetic right shift (*Verilog-2001)
                 <<<                 Arithmetic left shift (*Verilog-2001)
Concatenation    {,}                 Concatenation
Replication      {n{m}}              Replicate value m for n times
Conditional      ?:                  Conditional

Table 2: List of Operators.
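The short sketch below exercises some of the operators from Table 2 that are easy to confuse: reduction versus bitwise forms, == versus === in the presence of an x bit, and logical versus arithmetic right shift. The module and register names are illustrative, and the values in the comments follow the Verilog-2001 operator rules.

```verilog
// A few easily confused operators from Table 2 (illustrative names).
module operator_demo;
  reg [3:0] v, x_val;
  reg signed [7:0] s;
  reg r_and, r_or, r_xor, eq, case_eq;
  reg [7:0] lsr, asr;

  initial begin
    v     = 4'b1011;
    r_and = &v;                      // reduction AND: 1&0&1&1 = 0
    r_or  = |v;                      // reduction OR: 1
    r_xor = ^v;                      // reduction XOR (parity of three 1s): 1
    x_val   = 4'b10x1;
    eq      = (x_val == 4'b1001);    // x bit and no known mismatch: result is x
    case_eq = (x_val === 4'b10x1);   // x compared literally: 1
    s   = -8'sd16;                   // 8'b1111_0000
    lsr = s >> 2;                    // logical shift fills with 0: 8'b0011_1100
    asr = s >>> 2;                   // arithmetic shift keeps the sign: 8'b1111_1100
    $display("&v=%b |v=%b ^v=%b ==:%b ===:%b lsr=%b asr=%b",
             r_and, r_or, r_xor, eq, case_eq, lsr, asr);
  end
endmodule
```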

4.3.8 System Tasks:


System tasks are available to handle simple I/O, and various design measurement
functions. All system tasks are prefixed with $ to distinguish them from user tasks and
functions. This section presents a short list of the most often used tasks. It is by no means
a comprehensive list.

• $display - Print to screen a line followed by an automatic newline.
• $write - Write to screen a line without the newline.
• $swrite - Print to variable a line without the newline.
• $sscanf - Read from variable a format-specified string. (*Verilog-2001)
• $fopen - Open a handle to a file (read or write)
• $fdisplay - Write to file a line followed by an automatic newline.
• $fwrite - Write to file a line without the newline.
• $fscanf - Read from file a format-specified string. (*Verilog-2001)
• $fclose - Close and release an open file handle.
• $readmemh - Read hex file content into a memory array.
• $readmemb - Read binary file content into a memory array.
• $monitor - Print out all the listed variables whenever any of them changes value.
• $time - Value of current simulation time.
• $dumpfile - Declare the VCD (Value Change Dump) format output file name.
• $dumpvars - Turn on and dump the variables.
• $dumpports - Turn on and dump the variables in Extended-VCD format.
• $random - Return a random value.
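A few of these tasks can be seen together in one illustrative module: $dumpfile/$dumpvars record a VCD waveform, $monitor prints whenever count changes, and $write/$display show the newline difference. The file name demo.vcd and all signal names are arbitrary assumptions; the module writes that file into the working directory.

```verilog
// A few system tasks in one place. "demo.vcd" and the names are arbitrary.
module systask_demo;
  reg [3:0] count;

  initial begin
    $dumpfile("demo.vcd");                      // VCD output file name
    $dumpvars(0, systask_demo);                 // dump every signal in this module
    $monitor("t=%0t count=%0d", $time, count);  // prints whenever count changes
    count = 0;
    repeat (3) #5 count = count + 1;
    $write("$write adds no newline, ");
    $display("$display does (t=%0t)", $time);
  end
endmodule
```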


CHAPTER-5

PROPOSED SYSTEM
An N-bit shift register is composed of N series-connected data flip-flops. The
speed of the flip-flops is less important than their area and power consumption, because
there is no circuit between the flip-flops in the shift register. The smallest flip-flop is
suitable for the shift register to reduce the area and power consumption. Recently, pulsed
latches have replaced flip-flops in many applications, because a pulsed latch is much
smaller than a flip-flop. But the pulsed latch cannot be used in a shift register due to the
timing problem between pulsed latches.
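For reference, the conventional shift register described above, N series-connected D flip-flops, can be written behaviourally as below. The module and port names and the default N = 8 are assumptions; the testbench shifts in an 8-bit pattern MSB first and checks that it ends up in q.

```verilog
// Conventional N-bit shift register: N D flip-flops in series (N >= 2).
// Module/port names and N = 8 are assumptions for this sketch.
module ff_shift_reg #(parameter N = 8) (
  input  wire         clk,
  input  wire         in,
  output reg  [N-1:0] q        // q[0] is the first stage, q[N-1] the last
);
  always @(posedge clk)
    q <= {q[N-2:0], in};       // every stage takes the old value of the stage before it
endmodule

module ff_shift_demo;
  reg clk, in;
  reg [7:0] pattern;
  integer i;
  wire [7:0] q;

  ff_shift_reg #(.N(8)) dut (.clk(clk), .in(in), .q(q));

  initial begin
    clk = 0;
    pattern = 8'b1011_0010;
    for (i = 7; i >= 0; i = i - 1) begin   // shift the pattern in, MSB first
      in = pattern[i];
      #1 clk = 1;
      #1 clk = 0;
    end
    $display("q = %b", q);                 // 10110010
  end
endmodule
```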

Figure 5.1: (a) Master-slave flip-flop (b) Pulsed latch

A master-slave flip-flop using two latches in Fig. 5.1(a) can be replaced by a pulsed
latch consisting of a latch and a pulsed clock signal in Fig. 5.1(b). All pulsed latches share
the pulse generation circuit for the pulsed clock signal. As a result, the area and power
consumption of the pulsed latch become almost half of those of the master-slave
flip-flop. The pulsed latch is an attractive solution for small area and low power
consumption.

Figure 5.2: Shift register with latches and a pulsed clock signal. (a) Schematic.
(b) Waveforms.

The pulsed latch cannot be used in shift registers due to the timing problem, as
shown in Fig. 5.2. The shift register in Fig. 5.2(a) consists of several latches and a pulsed
clock signal (CLK_pulse). The operation waveforms in Fig. 5.2(b) show the timing
problem in the shift register. The output signal of the first latch (Q1) changes correctly
because the input signal of the first latch (IN) is constant during the clock pulse width
(TPULSE). But the second latch has an uncertain output signal (Q2) because its input
signal (Q1) changes during the clock pulse width.
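The timing problem can be reproduced with a purely behavioural model. In this sketch, a simplification rather than a transistor-level model, each latch is transparent while its pulse is high and has an assumed D-to-Q delay of 1 time unit; the shared pulse is 3 units wide, so a new value has time to race through more than one latch in a single pulse. All module and signal names are illustrative.

```verilog
// Behavioural sketch of the timing problem: three latches share ONE pulsed
// clock. Assumed values: 1-unit D-to-Q delay, 3-unit pulse width.
module pulsed_latch (input pulse, input d, output reg q);
  always @(pulse or d)
    if (pulse) q <= #1 d;       // transparent while the pulse is high
endmodule

module single_pulse_demo;
  reg clk_pulse, in;
  wire q1, q2, q3;

  pulsed_latch l1 (clk_pulse, in, q1);
  pulsed_latch l2 (clk_pulse, q1, q2);
  pulsed_latch l3 (clk_pulse, q2, q3);

  initial begin
    clk_pulse = 0; in = 0;
    #1 clk_pulse = 1; #3 clk_pulse = 0;   // first pulse: all latches become 0
    #2 in = 1;
    #2 clk_pulse = 1; #3 clk_pulse = 0;   // intended: shift by ONE position
    #2 $display("q1=%b q2=%b q3=%b", q1, q2, q3);  // 1 1 1: the new 1 raced through
  end
endmodule
```

A correct single shift would leave q1 = 1, q2 = 0, q3 = 0; the model instead ends with the new 1 in all three latches.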
Figure 5.2: Shift register with latches, delay circuits, and a pulsed clock signal.
(a) Schematic. (b) Waveforms.

One solution for the timing problem is to add delay circuits between latches, as
shown in Fig. 5.2(a). The output signal of each latch is delayed and reaches the next
latch after the clock pulse. As shown in Fig. 5.2(b), the output signals of the first and
second latches (Q1 and Q2) change during the clock pulse width, but the input signals of
the second and third latches (D2 and D3) become the same as the output signals of the
first and second latches (Q1 and Q2) only after the clock pulse. As a result, all latches
have constant input signals during the clock pulse and no timing problem occurs between
the latches.

Figure 5.2: Shift register with latches and pulsed clock signals. (a) Schematic.
(b) Waveforms.

However, the delay circuits cause large area and power overheads. Another
solution is to use multiple non-overlap delayed pulsed clock signals, as shown in Fig.
5.2(a). The delayed pulsed clock signals are generated when a pulsed clock signal goes
through delay circuits. Each latch uses a pulsed clock signal which is delayed from the
pulsed clock signal used by its next latch. Therefore, each latch updates its data after its
next latch has updated its data. As a result, each latch has a constant input during its
clock pulse and no timing problem occurs between latches.
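Using the same kind of behavioural latch (again with an assumed 1-unit D-to-Q delay), the sketch below gives each latch its own non-overlapping pulse and fires the pulses from the last latch to the first, as in the multiple non-overlap delayed pulsed clock scheme. Pulse width, gap and all names are assumptions.

```verilog
// Each latch gets its own pulse; pulses fire in reverse order (last latch
// first), so every latch samples an input that is not changing.
// Assumed: 1-unit D-to-Q delay, 3-unit pulses, 1-unit gaps.
module pulsed_latch (input pulse, input d, output reg q);
  always @(pulse or d)
    if (pulse) q <= #1 d;
endmodule

module multi_pulse_demo;
  reg p1, p2, p3, in;
  wire q1, q2, q3;

  pulsed_latch l1 (p1, in, q1);
  pulsed_latch l2 (p2, q1, q2);
  pulsed_latch l3 (p3, q2, q3);

  // One shift: pulse for latch 3, then latch 2, then latch 1.
  task shift;
    begin
      p3 = 1; #3 p3 = 0; #1;
      p2 = 1; #3 p2 = 0; #1;
      p1 = 1; #3 p1 = 0; #1;
    end
  endtask

  initial begin
    p1 = 0; p2 = 0; p3 = 0; in = 0;
    shift; shift; shift;          // flush zeros through all three latches
    in = 1; shift;
    $display("after 1 shift:  q1=%b q2=%b q3=%b", q1, q2, q3);  // 1 0 0
    in = 0; shift;
    $display("after 2 shifts: q1=%b q2=%b q3=%b", q1, q2, q3);  // 0 1 0
  end
endmodule
```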

Figure 5.4: Delayed pulsed clock generator

Five non-overlap delayed pulsed clock signals are generated by the delayed pulsed clock
generator in Fig. 5.4. The sequence of the pulsed clock signals is in the opposite order of
the five latches. Initially, the pulsed clock signal CLK_pulseT updates the latch data T1
from Q4. Then, the pulsed clock signals CLK_pulse4 down to CLK_pulse1 update the four
latch data from Q4 to Q1 sequentially. The latches Q2–Q4 receive data from their
previous latches Q1–Q3, but the first latch Q1 receives data from the input of the shift
register (IN). The operations of the other sub shift registers are the same as that of sub
shift register #1, except that the first latch receives data from the temporary storage latch
in the previous sub shift register.

As shown in Fig. 5.4, each pulsed clock signal is generated in a clock-pulse circuit
consisting of a delay circuit and an AND gate. A K-bit sub shift register consists of K+1
latches (K data latches and one temporary storage latch) and requires K+1 pulsed clock
signals. When an N-bit shift register is divided into K-bit sub shift registers, the number of
sub shift registers (M) is N/K, and each sub shift register has a temporary storage latch,
so N/K latches are added for temporary storage. Therefore, the number of clock-pulse
circuits is K+1 and the total number of latches is N+N/K.

The conventional delayed pulsed clock circuits in Fig. 5.2 can be used to save the AND
gates in the delayed pulsed clock generator in Fig. 5.4. In the conventional delayed pulsed
clock circuits, the clock pulse width must be larger than the sum of the rising and falling
times of all inverters in the delay circuits to keep the shape of the pulsed clock. However,
in the delayed pulsed clock generator in Fig. 5.4, the clock pulse width can be shorter than
the sum of the rising and falling times, because each sharp pulsed clock signal is generated
from an AND gate and two delayed signals. Therefore, the delayed pulsed clock generator
is suitable for short pulsed clock signals.
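A behavioural sketch of how K+1 non-overlapping pulses can be derived from one clock with a delay chain and one AND gate per pulse is shown below for K = 4. The tap assignment (each pulse is a delayed clock ANDed with the inverse of the next delayed clock) and the delay value TD are assumptions made for illustration; this is not the transistor-level circuit of the report.

```verilog
// Behavioural sketch of a delayed pulsed clock generator (K = 4).
// Assumed: ideal delay elements of TD units; each pulse is one delayed
// clock ANDed with the inverse of the next delayed clock.
module delayed_pulse_gen #(parameter K = 4, parameter TD = 2) (
  input  wire       clk,
  output wire [K:0] pulse    // pulse[K] = CLK_pulseT, pulse[K-1:0] = CLK_pulseK..CLK_pulse1
);
  wire [2*K+1:0] d;          // d[i] is clk delayed by i*TD
  assign d[0] = clk;

  genvar i;
  generate
    for (i = 1; i <= 2*K+1; i = i + 1) begin : chain
      assign #TD d[i] = d[i-1];                       // one delay element
    end
    for (i = 0; i <= K; i = i + 1) begin : gates
      // Slot s = K - i is high from 2*s*TD to (2*s+1)*TD after a rising CLK edge,
      // so CLK_pulseT fires first and CLK_pulse1 last, with a TD gap in between.
      assign pulse[i] = d[2*(K-i)] & ~d[2*(K-i)+1];
    end
  endgenerate
endmodule

module pulse_gen_demo;
  reg clk;
  reg overlap;
  wire [4:0] pulse;
  time first_rise [0:4];
  integer k;

  delayed_pulse_gen #(.K(4), .TD(2)) gen (.clk(clk), .pulse(pulse));

  // Flag any instant at which two or more pulses are high together.
  always @(pulse)
    if ((pulse & (pulse - 5'd1)) != 5'd0) overlap = 1'b1;

  // Record when each pulse first goes high.
  genvar g;
  generate
    for (g = 0; g <= 4; g = g + 1) begin : rec
      always @(posedge pulse[g])
        if (first_rise[g] == 0) first_rise[g] = $time;
    end
  endgenerate

  initial begin
    overlap = 1'b0;
    for (k = 0; k <= 4; k = k + 1) first_rise[k] = 0;
    clk = 0;
    #40 clk = 1;               // one rising edge, after the delay chain has settled
    #20 clk = 0;
    #40;
    for (k = 4; k >= 0; k = k - 1)
      $display("pulse[%0d] rose at t=%0t", k, first_rise[k]);   // 40, 44, 48, 52, 56
    $display("overlap = %b", overlap);                             // 0
  end
endmodule
```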

The power is consumed mainly in latches and clock-pulse circuits. Each latch consumes
power for data transition and clock loading. When the circuit powers are normalized to a
latch, the power consumption of a latch and of a clock-pulse circuit are 1 and $\alpha_P$,
respectively. With K+1 clock-pulse circuits and N(1+1/K) latches, the total power
consumption is

$$P_{total}(K) = \alpha_P (K+1) + N\left(1 + \frac{1}{K}\right).$$

Setting $dP_{total}/dK = \alpha_P - N/K^2 = 0$ gives the minimum at $K = \sqrt{N/\alpha_P}$, so
the integer K for the minimum power is selected as the divisor of N nearest to
$\sqrt{N/\alpha_P}$. In the selection of K, the clock buffers in Fig. 5.4 are not considered.
The total size of the clock buffers is determined by the total clock loading of the latches.
Although the number of latches increases from N to N(1+1/K), the increment ratio of the
clock buffers is small. The number of clock buffers is K. As K increases, the size of a clock
buffer decreases in proportion to 1/K, because the number of latches connected to a clock
buffer (M = N/K) is proportional to 1/K. Therefore, the total size of the clock buffers
increases only slightly with increasing K, and the effect of the clock buffers can be
neglected when choosing K.
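The sizing rule can be checked numerically. The sketch below evaluates the latch count, the clock-pulse circuit count and the normalized power model from the text for every divisor K of N = 256 and reports the minimum. The value alphaP = 4.0 is hypothetical; with it, sqrt(N/alphaP) = 8, which divides 256, so K = 8 should come out as the best choice.

```verilog
// Normalized power model from the text, evaluated for every divisor K of N:
//   latches = N + N/K, clock-pulse circuits = K + 1,
//   P(K) = alphaP*(K+1) + N*(1 + 1/K).
// alphaP = 4.0 is a hypothetical value, not a figure from this design.
module k_select_demo;
  parameter integer N = 256;
  real alphaP, p, best_p;
  integer k, best_k;

  initial begin
    alphaP = 4.0;
    best_p = 1.0e30;
    best_k = 0;
    for (k = 1; k <= N; k = k + 1)
      if (N % k == 0) begin                     // K must divide N
        p = alphaP * (k + 1) + N * (1.0 + 1.0 / k);
        $display("K=%0d latches=%0d pulse circuits=%0d P=%0.2f",
                 k, N + N / k, k + 1, p);
        if (p < best_p) begin
          best_p = p;
          best_k = k;
        end
      end
    $display("best K = %0d, P = %0.2f", best_k, best_p);   // K = 8, P = 324.00
  end
endmodule
```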
Figure 5.4: Proposed shift register

However, the multiple non-overlap delayed pulsed clock solution also requires many delay
circuits. Fig. 5.4 shows an example of the proposed shift register. The proposed shift
register is divided into sub shift registers to reduce the number of delayed pulsed clock
signals. A 4-bit sub shift register consists of five latches and performs shift operations
with five non-overlap delayed pulsed clock signals (CLK_pulse1:4 and CLK_pulseT). In
4-bit sub shift register #1, four latches store 4-bit data (Q1-Q4) and the last latch stores
1-bit temporary data (T1), which will be stored in the first latch (Q5) of 4-bit sub shift
register #2. Fig. 5.5 shows the operation waveforms of the proposed shift register.

Figure 5.5: Proposed shift register waveforms


The proposed shift register reduces the number of delayed pulsed clock signals
significantly, but it increases the number of latches because of the additional temporary
storage latches.
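The following cycle-level sketch checks the logic of the proposed organization: on each main clock edge, the temporary latches are loaded first (CLK_pulseT), then the data latches from the last to the first (CLK_pulseK down to CLK_pulse1). The blocking assignments inside one always block stand in for the ordered, non-overlapping pulses; latch-level timing is not modelled. N = 16, K = 4 and all names are illustrative, and the testbench compares the result against an ordinary 16-bit shift register.

```verilog
// Cycle-level sketch of the proposed shift register: N bits split into
// M = N/K sub shift registers, each with K data latches and one temporary
// latch. Ordered blocking assignments stand in for the pulse sequence
// (CLK_pulseT, then CLK_pulseK .. CLK_pulse1). N = 16, K = 4 are illustrative.
module proposed_shift_reg #(parameter N = 16, parameter K = 4) (
  input  wire         clk,
  input  wire         in,
  output reg  [N-1:0] q        // q[0] is Q1 (first bit); q[N-1] is the last bit
);
  localparam M = N / K;        // number of sub shift registers
  reg [M-1:0] t;               // temporary storage latches T1..TM
  integer m, j;

  always @(posedge clk) begin
    // CLK_pulseT: each temporary latch copies the last bit of its sub register.
    for (m = 0; m < M; m = m + 1)
      t[m] = q[m*K + K - 1];
    // CLK_pulseK .. CLK_pulse2: shift inside each sub register, last latch first.
    for (j = K - 1; j >= 1; j = j - 1)
      for (m = 0; m < M; m = m + 1)
        q[m*K + j] = q[m*K + j - 1];
    // CLK_pulse1: first latch takes IN or the previous sub register's T latch.
    for (m = 0; m < M; m = m + 1)
      if (m == 0) q[0]   = in;
      else        q[m*K] = t[m-1];
  end
endmodule

module proposed_shift_demo;
  reg clk, in;
  reg [15:0] ref_q;            // reference: an ordinary 16-bit shift register
  wire [15:0] q;
  integer cycle, errors;

  proposed_shift_reg #(.N(16), .K(4)) dut (.clk(clk), .in(in), .q(q));

  initial begin
    clk = 0; errors = 0;
    for (cycle = 0; cycle < 40; cycle = cycle + 1) begin
      in = $random;            // keeps bit 0 of a random word
      #1 clk = 1;
      ref_q = {ref_q[14:0], in};
      #1 clk = 0;
      if (cycle >= 15 && q !== ref_q) errors = errors + 1;  // compare once filled
    end
    $display("mismatches after fill: %0d", errors);
  end
endmodule
```

Blocking assignments to a clocked register would normally be avoided in synthesizable RTL; here they are used deliberately, because their fixed execution order is what mimics the pulse sequence.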
Figure 5.6: Minimum clock cycle time of the proposed shift register

The maximum value of K is limited by the target clock frequency. As shown in Fig. 5.6,
the minimum clock cycle time is

$$T_{CLK\_MIN} = T_{CP} + K \times T_{DELAY} + T_{CQ},$$

where $T_{CP}$ is the delay from the rising edge of the main clock signal (CLK) to the rising
edge of the first pulsed clock signal (CLK_pulseT), $T_{DELAY}$ is the delay between two
neighboring pulsed clock signals, and $T_{CQ}$ is the delay from the rising edge of the last
pulsed clock signal (CLK_pulse1) to the output signal of the latch Q1. $T_{CLK\_MIN}$ grows
linearly with K, so as K increases, the maximum clock frequency
($f_{CLK\_MAX} = 1/T_{CLK\_MIN}$) decreases roughly in proportion to 1/K. Therefore, K must
be chosen below the maximum value determined by the maximum clock frequency of the
target application.
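To illustrate how the bound on K follows from this formula, the sketch below evaluates T_CLK_MIN and f_CLK_MAX for K = 1, 2, 4, 8, 16. The delays (T_CP = 200 ps, T_DELAY = 300 ps, T_CQ = 150 ps) and the 400 MHz target are hypothetical numbers, not measurements from this design.

```verilog
// T_CLK_MIN = T_CP + K*T_DELAY + T_CQ and f_CLK_MAX = 1/T_CLK_MIN for
// K = 1..16 (powers of two, which divide N = 256). All delays and the
// target frequency are hypothetical values (ps and MHz).
module tclk_demo;
  real t_cp, t_delay, t_cq, t_min, f_max_mhz, f_target_mhz;
  integer k, k_max;

  initial begin
    t_cp = 200.0; t_delay = 300.0; t_cq = 150.0;   // ps (assumed)
    f_target_mhz = 400.0;                          // assumed target clock
    k_max = 0;
    for (k = 1; k <= 16; k = k * 2) begin
      t_min     = t_cp + k * t_delay + t_cq;       // ps
      f_max_mhz = 1.0e6 / t_min;                   // 1/ps = 1e6 MHz
      $display("K=%0d T_CLK_MIN=%0.0f ps f_CLK_MAX=%0.1f MHz", k, t_min, f_max_mhz);
      if (f_max_mhz >= f_target_mhz)
        k_max = k;                                 // largest K that still meets the target
    end
    $display("largest K meeting %0.0f MHz: %0d", f_target_mhz, k_max);  // 4
  end
endmodule
```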
The K+1 pulsed clock signals in Fig. 5.6 are supplied to all sub shift registers. Each pulsed
clock signal arrives at the sub shift registers at different times due to the pulse skew in the
wire. The pulse skew increases in proportion to the wire distance from the delayed pulsed
clock generator. All pulsed clock signals have almost the same pulse skew when they
arrive at the same sub shift register. Therefore, within the same sub shift register, the
pulse skew differences between the pulsed clock signals are very small, and clock pulse
intervals larger than these differences cancel out their effects. Also, the pulse skew
differences between different sub shift registers do not cause any timing problem,
because the two latches connecting two sub shift registers use the first and last pulsed
clocks (CLK_pulseT and CLK_pulse1), which have a long clock pulse interval. In a long
shift register, a short clock pulse cannot pass through a long wire due to parasitic
capacitance and resistance. At the end of the wire, the clock pulse shape is degraded
because the rising and falling times of the clock pulse increase due to the wire delay. A
simple solution is to increase the clock pulse width to keep the clock pulse shape, but this
decreases the maximum clock frequency. Another solution is to insert clock buffers and
clock trees to send the short clock pulse with a small wire delay, but this increases the
area and power overhead. Moreover, multiple clock pulses add even more overhead for
multiple clock buffers and clock trees.

The maximum clock frequency in the conventional shift register is limited only by the
delay of the flip-flops, because there is no logic between the flip-flops. Therefore, the
area and power consumption are more important than the speed when selecting the
flip-flop. The proposed shift register uses latches instead of flip-flops to reduce the area
and power consumption.

Figure 5.7: Schematic of the SSASPL


In the chip implementation, the SSASPL (static differential sense amplifier shared pulse
latch) in Fig. 5.7, which is the smallest latch, is selected. The original SSASPL with 9
transistors [6] is modified to the 7-transistor SSASPL in Fig. 5.7 by removing the inverter
that generates the complementary data input (Db) from the data input (D). In the
proposed shift register, the differential data inputs (D and Db) of each latch come from the
differential data outputs (Q and Qb) of the previous latch. The SSASPL uses the smallest
number of transistors (7 transistors) and consumes the lowest clock power because it has
a single transistor driven by the pulsed clock signal.
The SSASPL updates the data with three NMOS transistors (M1-M3) and holds the data
with four transistors in two cross-coupled inverters. It requires two differential data
inputs (D and Db) and a pulsed clock signal. When the pulsed clock signal is high, its data
is updated: node Q or Qb is pulled down to ground according to the input data (D and
Db). The pull-down current of the NMOS transistors (M1-M3) must be larger than the
pull-up current of the PMOS transistors in the inverters. The SSASPL was implemented
and simulated in a 0.18 µm CMOS process at VDD = 1.8 V. The sizes (W/L) of the three
NMOS transistors (M1-M3) are 1 µm/0.18 µm. The sizes of the NMOS and PMOS
transistors in the two inverters are all 0.5 µm/0.18 µm. The minimum clock pulse width
of the SSASPL to update the data is 62 ps in a typical process simulation (TT) and 54–76 ps
across all process corner simulations (FF-SS). The rising and falling times of the clock
pulse are approximately 100 ps. The clock pulse shape can be degraded by wire delay,
signal coupling and supply noise, so a clock pulse width of 170 ps was selected by adding
a timing margin.
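At the logic level, the SSASPL behaves like a latch with differential data inputs and outputs. The sketch below is a behavioural abstraction only (it does not model the ratioed NMOS pull-down against the PMOS pull-up) and chains two such latches Q/Qb to D/Db, firing the later stage's pulse first as in the proposed shift register. Module and signal names are illustrative.

```verilog
// Behavioural abstraction of the SSASPL (not a transistor-level model):
// while the pulse is high and D/Db are complementary, the outputs take the
// input value; otherwise the cross-coupled inverters hold the stored value.
module ssaspl_model (
  input  wire pulse,
  input  wire d, db,       // differential data (Q/Qb of the previous latch)
  output reg  q, qb
);
  always @(pulse or d or db)
    if (pulse && (d !== db)) begin
      q  = d;              // net effect of the NMOS pull-down: Q follows D
      qb = db;
    end
endmodule

module ssaspl_demo;
  reg p1, p2, d, db;
  wire q1, qb1, q2, qb2;

  ssaspl_model l1 (p1, d,  db,  q1, qb1);
  ssaspl_model l2 (p2, q1, qb1, q2, qb2);   // D/Db driven by Q/Qb of latch 1

  initial begin
    p1 = 0; p2 = 0; d = 1; db = 0;
    #1 p1 = 1; #1 p1 = 0;                   // latch 1 stores 1
    d = 0; db = 1;
    #1 p2 = 1; #1 p2 = 0;                   // later stage first: latch 2 stores 1
    #1 p1 = 1; #1 p1 = 0;                   // then latch 1 stores 0
    #1 $display("q1=%b qb1=%b q2=%b qb2=%b", q1, qb1, q2, qb2);  // 0 1 1 0
  end
endmodule
```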
CHAPTER-6
SIMULATION AND SYNTHESIS RESULTS
6.1 Existing system:
//top module
module top(
input [7:0] nm,
input tm_nm,clk,rst,xmitH,
input [2:0] sh,
input [15:0] baud_rate_div,
output p_f
);

wire [7:0] out,y,y1,rec_dataH;


bistcu b1(nm,y,rst,clk,tm_nm,out);
lfsr1 l1(clk,tm_nm,y);
uart_top u1(clk,rst,out,rec_readyH,xmitH,rec_dataH,baud_rate_div,xmit_doneH);
misr m1(rec_dataH,sh,y1);
TRA t1(y1,clk,rst,p_f);

endmodule

// linear feedback shift register


module lfsr1(clk,rst,y);
input clk,rst;
output reg [7:0]y;
always@(posedge clk)

if(rst==0) y<=8'b10010101;
else begin
y[7]<= y[1] ^ y[0];
y[6]<=y[7];
y[5]<=y[6];
y[4]<=y[5];
y[3]<=y[4];
y[2]<=y[3];
y[1]<=y[2];
y[0]<=y[1];
end
endmodule

6.2 Synthesis report


Started : "Synthesize - XST".
Running xst...
Command Line: xst -intstyle ise -ifn "F:/vlsi/2015 m.tech/BISTUART/top.xst" -ofn
"F:/vlsi/2015 m.tech/BISTUART/top.syr"
Reading design: top.prj
=============================================================
* HDL Compilation *
=============================================================
Compiling verilog file "u_xmit.v" in library work
Compiling verilog file "u_rcvr.v" in library work
Module <u_xmit> compiled
Compiling verilog file "org.v" in library work
Module <u_rcvr> compiled
Compiling verilog file "mux.v" in library work
Module <org> compiled
Compiling verilog file "EP1.v" in library work
Module <mux> compiled
Compiling verilog file "EP0.v" in library work
Module <EP1> compiled
Compiling verilog file "dff.v" in library work
Module <EP0> compiled
Compiling verilog file "baud.v" in library work
Module <dff> compiled
Compiling verilog file "uart_top.v" in library work
Module <baud> compiled
Compiling verilog file "TRA.v" in library work
Module <uart_top> compiled
Compiling verilog file "misr.v" in library work
Module <TRA> compiled


Compiling verilog file "lfsr1.v" in library work
Module <misr> compiled
Compiling verilog file "bistcu.v" in library work
Module <lfsr1> compiled
Compiling verilog file "top.v" in library work
Module <bistcu> compiled
Module <top> compiled
No errors in compilation
Analysis of file <"top.prj"> succeeded.
=============================================================
* Design Hierarchy Analysis *
=============================================================
Analyzing hierarchy for module <top> in library <work>.
Analyzing hierarchy for module <bistcu> in library <work>.
Analyzing hierarchy for module <lfsr1> in library <work>.
Analyzing hierarchy for module <uart_top> in library <work>.
Analyzing hierarchy for module <misr> in library <work>.
Analyzing hierarchy for module <TRA> in library <work>.
Analyzing hierarchy for module <mux> in library <work>.
Analyzing hierarchy for module <dff> in library <work>.
Analyzing hierarchy for module <u_rcvr> in library <work> with parameters.
Hi = "1"
Lo = "0"
r_center = "010"
r_sample = "100"
r_start = "000"
r_stop = "101"
r_wait = "011"
word_len = "1000"
Analyzing hierarchy for module <u_xmit> in library <work> with parameters.
Hi = "1"
Lo = "0"
word_len = "00000000000000000000000000001000"
x_idle = "000"
x_shift = "011"
x_shiftReg = "10"
x_start = "001"
x_startbit = "00"
x_stop = "100"
x_stopbit = "01"
x_wait = "010"
Analyzing hierarchy for module <baud> in library <work>.
Analyzing hierarchy for module <EP0> in library <work>.
Analyzing hierarchy for module <EP1> in library <work>.
Analyzing hierarchy for module <org> in library <work>.
=============================================================
* HDL Analysis *
=============================================================
Analyzing top module <top>.
Module <top> is correct for synthesis.
Analyzing module <bistcu> in library <work>.
Module <bistcu> is correct for synthesis.
Analyzing module <mux> in library <work>.
Module <mux> is correct for synthesis.
Analyzing module <dff> in library <work>.
Module <dff> is correct for synthesis.
Analyzing module <lfsr1> in library <work>.
Module <lfsr1> is correct for synthesis.
Analyzing module <uart_top> in library <work>.
Module <uart_top> is correct for synthesis.
Analyzing module <u_rcvr> in library <work>.
Hi = 1'b1
Lo = 1'b0
r_center = 3'b010
r_sample = 3'b100
r_start = 3'b000
r_stop = 3'b101
r_wait = 3'b011
word_len = 4'b1000
<par_dataH>
Module <u_rcvr> is correct for synthesis.
Analyzing module <u_xmit> in library <work>.
Hi = 1'b1
Lo = 1'b0
word_len = 32'sb00000000000000000000000000001000
x_idle = 3'b000
x_shift = 3'b011
x_shiftReg = 2'b10
x_start = 3'b001
x_startbit = 2'b00
x_stop = 3'b100
x_stopbit = 2'b01
x_wait = 3'b010
Module <u_xmit> is correct for synthesis.
Analyzing module <baud> in library <work>.
Module <baud> is correct for synthesis.
Analyzing module <misr> in library <work>.
Module <misr> is correct for synthesis.
Analyzing module <TRA> in library <work>.
Module <TRA> is correct for synthesis.
Analyzing module <EP0> in library <work>.
<clk>
Module <EP0> is correct for synthesis.
Analyzing module <EP1> in library <work>.
<clk>
Module <EP1> is correct for synthesis.
Analyzing module <org> in library <work>.
Module <org> is correct for synthesis.
=============================================================
* HDL Synthesis *
=============================================================

Performing bidirectional port resolution...


Synthesizing Unit <lfsr1>.
Related source file is "lfsr1.v".


Found 8-bit register for signal <y>.
Found 1-bit xor2 for signal <y_7$xor0000> created at line 28.
Summary:
inferred 8 D-type flip-flop(s).
Unit <lfsr1> synthesized.
Synthesizing Unit <misr>.
Related source file is "misr.v".
Found 8-bit shifter logical right for signal <b>.
Summary:
inferred 1 Combinational logic shifter(s).
Unit <misr> synthesized.
Synthesizing Unit <mux>.
Related source file is "mux.v".
Unit <mux> synthesized.
Synthesizing Unit <dff>.
Related source file is "dff.v".
Found 8-bit register for signal <q>.
Summary:
inferred 8 D-type flip-flop(s).
Unit <dff> synthesized.
Synthesizing Unit <u_rcvr>.
Related source file is "u_rcvr.v".
Found finite state machine <FSM_0> for signal <state>.
-----------------------------------------------------------------------
| States |5 |
| Transitions | 10 |
| Inputs |4 |
| Outputs |5 |
| Clock | sys_clk (rising_edge) |
| Reset | sys_rst_l (negative) |
| Reset type | asynchronous |
| Reset State | 000 |
| Encoding | automatic |
| Implementation | LUT |
-----------------------------------------------------------------------
Found 1-bit register for signal <rec_readyH>.
Found 4-bit up counter for signal <bitcell_cntrH>.
Found 8-bit register for signal <par_dataH>.
Found 1-bit register for signal <rec_datH>.
Found 1-bit register for signal <rec_datsyncH>.
Found 4-bit up counter for signal <recd_bitcntrH>.
Summary:
inferred 1 Finite State Machine(s).
inferred 2 Counter(s).
inferred 11 D-type flip-flop(s).
Unit <u_rcvr> synthesized.
Synthesizing Unit <u_xmit>.
Related source file is "u_xmit.v".
Found finite state machine <FSM_1> for signal <state>.
-----------------------------------------------------------------------
| States |5 |
| Transitions | 10 |
| Inputs |5 |
| Outputs |7 |
| Clock | sys_clk (rising_edge)|
| Reset | sys_rst_l (negative)|
| Reset type | asynchronous |
| Reset State | 000 |
| Encoding | automatic |
| Implementation | LUT
-----------------------------------------------------------------------
Using one-hot encoding for signal <xmitDataselH>.
Found 1-bit register for signal <xmit_doneH>.
Found 4-bit up counter for signal <bitcell_cntrH>.
Found 4-bit up counter for signal <bitcountH>.
Found 8-bit register for signal <xmit_shiftRegH>.
Summary:
inferred 1 Finite State Machine(s).
inferred 2 Counter(s).
inferred 9 D-type flip-flop(s).


Unit <u_xmit> synthesized.
Synthesizing Unit <baud>.
Related source file is "baud.v".
Found 1-bit register for signal <baud_clk>.
Found 16-bit comparator equal for signal <baud_clk$cmp_eq0000> created at line 20.
Found 16-bit register for signal <clk_div>.
Found 16-bit adder for signal <clk_div$addsub0000> created at line 34.
Found 16-bit comparator greater for signal <clk_div$cmp_gt0000> created at line 26.
Summary:
inferred 17 D-type flip-flop(s).
inferred 1 Adder/Subtractor(s).
inferred 2 Comparator(s).
Unit <baud> synthesized.
Synthesizing Unit <EP0>.
Related source file is "EP0.v".
Found 1-bit tri state buffer for signal <y>.
Summary:
inferred 1 Tristate(s).
Unit <EP0> synthesized.
Synthesizing Unit <EP1>.
Related source file is "EP1.v".
Found 1-bit tristate buffer for signal <y>.
Summary:
inferred 1 Tristate(s).
Unit <EP1> synthesized.
Synthesizing Unit <org>.
Related source file is "org.v".
Unit <org> synthesized.
Synthesizing Unit <bistcu>.
Related source file is "bistcu.v".
Unit <bistcu> synthesized.
Synthesizing Unit <uart_top>.
Related source file is "uart_top.v".


Unit <uart_top> synthesized.
Synthesizing Unit <TRA>.
Related source file is "TRA.v".
Found 1-bit adder for signal <w1>.
Found 1-bit adder for signal <w2>.
Found 1-bit adder for signal <w3>.
Found 1-bit adder for signal <w4>.
Found 1-bit adder for signal <w5>.
Found 1-bit adder for signal <w6>.
Found 1-bit adder for signal <w7>.
Found 1-bit adder for signal <w8>.
Summary:
inferred 8 Adder/Subtractor(s).
6.3 Proposed system Simulation Report:

Block diagram for 256-bit shift register

RTL schematic for clock generator

RTL schematic for D_latch

Final report

6.4 Proposed system Synthesis Report:


HDL Synthesis Report

Macro Statistics

# Adders/Subtractors :9

1-bit adder :8

16-bit adder :1

# Counters :4

4-bit up counter :4

# Registers : 31

1-bit register : 29

16-bit register :1

8-bit register :1

# Latches :1

8-bit latch :1
# Comparators :2

16-bit comparator equal :1

16-bit comparator greater :1

# Logic shifters :1

8-bit shifter logical right :1

# Tristates :8

1-bit tristate buffer :8

# Xors :1

1-bit xor2 :1

* Advanced HDL Synthesis *

Analyzing FSM <FSM_1> for best encoding.
Optimizing FSM <u1/u2/state/FSM> on signal <state[1:3]> with gray encoding.
-------------------
 State | Encoding
-------------------
 1     | 000
 2     | 001
 010   | 011
 011   | 110
 100   | 010
-------------------

Analyzing FSM <FSM_0> for best encoding.
Optimizing FSM <u1/u1/state/FSM> on signal <state[1:3]> with gray encoding.
-------------------
 State | Encoding
-------------------
 3     | 000
 010   | 001
 011   | 011
 100   | 110
 101   | 010
-------------------
Advanced HDL Synthesis Report

Macro Statistics
# FSMs : 2
# Adders/Subtractors : 9
 1-bit adder : 8
 16-bit adder : 1
# Counters : 4
 4-bit up counter : 4
# Registers : 53
 Flip-Flops : 53
# Latches : 1
 8-bit latch : 1
# Comparators : 2
 16-bit comparator equal : 1
 16-bit comparator greater : 1
# Logic shifters : 1
 8-bit shifter logical right : 1
# Xors : 1
 1-bit xor2 : 1

* Low Level Synthesis *

Optimizing unit <top> ...
Optimizing unit <lfsr1> ...
Optimizing unit <dff> ...
Optimizing unit <u_xmit> ...
Optimizing unit <baud> ...
Optimizing unit <u_rcvr> ...
Optimizing unit <TRA> ...
Mapping all equations...
Building and optimizing final netlist ...

Found area constraint ratio of 100 (+ 5) on block top, actual ratio is 1.
Final Macro Processing …
Final Register Report

Macro Statistics
# Registers : 73
Flip-Flops : 73

* Partition Report *

Partition Implementation Status

No Partitions were found in this design.

* Final Report *
Clock Information:
-----------------------------------------------+---------------------------+-------+
Clock Signal                                   | Clock buffer(FF name)     | Load  |
-----------------------------------------------+---------------------------+-------+
clk                                            | IBUF+BUFG                 | 33    |
u1/b1/baud_clk1                                | BUFG                      | 40    |
u1/u1/state_cmp_eq0003(u1/u1/state_FSM_Out01:O)| NONE(*)(u1/u1/rec_dataH_7)| 8     |
-----------------------------------------------+---------------------------+-------+
(*) This 1 clock signal(s) are generated by combinatorial logic, and XST is not able to
identify which are the primary clock signals. Please use the CLOCK_SIGNAL constraint
to specify the clock signal(s) generated by combinatorial logic.
INFO:Xst:2169 - HDL ADVISOR - Some clock signals were not automatically
buffered by XST with BUFG/BUFR resources. Please use the buffer_type constraint in
order to insert these buffers to the clock signals to help prevent skew problems.

Asynchronous Control Signals Information:
----------------------------------------------------+------------------------+-------+
Control Signal                                      | Buffer(FF name)        | Load  |
----------------------------------------------------+------------------------+-------+
b1/d1/rst_inv(u1/u2/state_FSM_Acst_FSM_inv1_INV_0:O)| NONE(u1/b1/baud_clk)   | 57    |
----------------------------------------------------+------------------------+-------+
Timing Summary:
Speed Grade: -5
Minimum period: 4.999ns (Maximum Frequency: 200.050MHz)
Minimum input arrival time before clock: 5.602ns
Maximum output required time after clock: 9.569ns
Maximum combinational path delay: 10.421ns
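The maximum frequency in the timing summary is simply the reciprocal of the minimum period, f_max = 1/T_min. The short Python sketch below reproduces that arithmetic; the helper names `max_frequency_mhz` and `period_ns` are hypothetical and are not part of the XST tool flow. Evaluating 1/4.999 ns gives about 200.040 MHz rather than the printed 200.050 MHz, which presumably means the period shown in the report is rounded (200.050 MHz corresponds to roughly 4.99875 ns).

```python
def max_frequency_mhz(min_period_ns: float) -> float:
    """Return the maximum clock frequency in MHz for a minimum period in ns.

    Hypothetical helper: f_max = 1 / T_min, and 1 / (1 ns) = 1000 MHz.
    """
    return 1000.0 / min_period_ns


def period_ns(frequency_mhz: float) -> float:
    """Inverse relation: clock period in ns for a frequency in MHz."""
    return 1000.0 / frequency_mhz


if __name__ == "__main__":
    # Minimum period as printed in the timing summary
    print(f"{max_frequency_mhz(4.999):.3f} MHz")  # prints 200.040 MHz
    # Period implied by the printed 200.050 MHz figure
    print(f"{period_ns(200.050):.5f} ns")  # prints 4.99875 ns
```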

Process "Synthesize - XST" completed successfully.


CHAPTER-7

CONCLUSION
This project proposed a low-power and area-efficient shift register using pulsed
latches. The shift register reduces area and power consumption by replacing flip-flops
with pulsed latches. The timing problem between pulsed latches is solved by using multiple
non-overlapping delayed pulsed clock signals instead of a single pulsed clock signal. Only
a small number of pulsed clock signals is needed, because the latches are grouped into
several sub shift registers and additional temporary storage latches are used. A 256-bit
shift register was fabricated using a 0.18 µm CMOS process with VDD = 1.8 V. Its core area
is 6600 µm². It consumes 1.2 mW at a 100 MHz clock frequency. The proposed shift register
saves 37% area and 44% power compared to the conventional shift register with flip-flops.
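The ordering idea behind the delayed pulsed clocks can be illustrated with a small behavioral model. The Python sketch below is not the fabricated circuit or the project's Verilog; it is a simplified model built on stated assumptions: each latch is transparent only during its own pulse, the pulses of one clock cycle never overlap, and the same N+1 delayed pulses drive every sub shift register. The function names, the group sizes M and N, and the worst-case race model in `shift_single_pulse` are hypothetical. Firing the last latch of each group first lets every latch read a value its neighbour has not yet overwritten, and the temporary storage latch keeps the last bit of each group so the next group can still pick it up after the shift. In this model only N+1 distinct pulsed clocks are needed instead of one per bit.

```python
from typing import List, Tuple


def shift_single_pulse(q: List[int], din: int) -> List[int]:
    """Worst-case model of a single shared pulsed clock (hypothetical).

    Every latch is transparent during the same pulse, and the pulse is
    assumed to be wider than the delay of the whole chain, so new data
    races through every stage instead of moving by one position.
    """
    q = q[:]
    prev = din
    for i in range(len(q)):
        q[i] = prev  # transparent latch follows its freshly updated input
        prev = q[i]
    return q


def shift_delayed_pulses(
    groups: List[List[int]], temps: List[int], din: int
) -> Tuple[List[List[int]], List[int]]:
    """One clock cycle of the grouped shift register with delayed pulses.

    groups[k] holds the N latches of sub shift register k and temps[k] its
    temporary storage latch. The same non-overlapping pulses drive all
    groups: pulse N+1 fires first, then pulses N, N-1, ..., 1.
    """
    n = len(groups[0])
    groups = [g[:] for g in groups]
    temps = temps[:]
    # Pulse N+1: each temporary latch saves the last bit of its group
    for k, g in enumerate(groups):
        temps[k] = g[n - 1]
    # Pulses N..2: shift inside each group, last latch first, so each latch
    # reads a neighbour value that has not been overwritten in this cycle
    for j in range(n - 1, 0, -1):
        for g in groups:
            g[j] = g[j - 1]
    # Pulse 1: the first latch of each group takes the previous group's
    # temporary latch; group 0 takes the serial input
    for k, g in enumerate(groups):
        g[0] = din if k == 0 else temps[k - 1]
    return groups, temps


def flatten(groups: List[List[int]]) -> List[int]:
    """Concatenate the groups into one bit vector (input side first)."""
    return [bit for g in groups for bit in g]


if __name__ == "__main__":
    M, N = 4, 4  # hypothetical sizes: 4 sub shift registers of 4 latches
    groups = [[0] * N for _ in range(M)]
    temps = [0] * M
    ideal = [0] * (M * N)  # reference: ideal flip-flop shift register
    for bit in [1, 0, 1, 1, 0, 0, 1, 0] * 3:
        groups, temps = shift_delayed_pulses(groups, temps, bit)
        ideal = [bit] + ideal[:-1]
        assert flatten(groups) == ideal
    print("delayed pulses match ideal shift:", flatten(groups) == ideal)
    print("single pulse, one cycle:", shift_single_pulse([0] * 8, 1))
```

In the single-pulse case every latch ends up holding the new input bit after one cycle, which is the race the delayed pulses avoid. Larger values of M and N (with M × N = 256 for the register described above) can be substituted to exercise a full-size model.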
CHAPTER-8

FUTURE SCOPE

Shift registers are used in almost every kind of component, for example memory devices and
communication devices, so there is a need for high-speed shifters that consume little power.
This project has shown a 256-bit shift register using pulsed latches that consumes low power.
In future, such shift registers can be implemented for high bit rates and very high-speed
devices, which would make them useful in portable devices and in general use.
INDEX
List of Figures:
Fig. No. Fig. Name
3.1 Serial-In to Parallel-Out (SIPO)
3.2 Serial-In to Serial-Out (SISO)
3.3 Parallel-In to Serial-Out (PISO)
3.4 Parallel-In to Parallel-Out (PIPO)
4.1 Overview of the prominent trends in information technologies
4.2 Evolution of integration density and minimum feature size, as seen in the early 1980s
4.3 Level of integration over time for memory chips and logic chips
4.4 Typical VLSI design flow in three domains
4.5 A more simplified view of the VLSI design flow
4.6 Structural decomposition of a four-bit adder circuit, showing the hierarchy down to gate level
4.7 Regular design of a 2-1 MUX, a DFF and an adder using inverters and tri-state buffers
4.8 General architecture of Xilinx FPGAs
4.9 Detailed view of switch matrices and interconnection routing between CLBs
4.10 XC2000 CLB of the Xilinx FPGA
4.11 Basic processing steps required for gate array implementation
4.12 A corner of a typical gate array chip
4.13 Metal mask design to realize a complex logic function on a channeled GA platform
4.14 Layout views of a conventional GA chip and a gate array with two memory banks
4.15 The platform of a sea-of-gates (SOG) chip
4.16 A standard cell layout example
4.17 A simplified floor plan of a standard-cell-based design
4.18 Simplified floor plan consisting of two separate blocks and a common signal bus
4.19 Mask layout of a standard-cell-based chip with a single block of cells and three memory banks
4.20 Overview of VLSI design styles
5.1 (a) Master-slave flip-flop (b) Pulsed latch
5.2 Shift register with latches and a pulsed clock signal: (a) schematic (b) waveforms
5.3 Delayed pulsed clock generator
5.4 Proposed shift register
5.5 Proposed shift register waveforms
5.6 Minimum clock cycle time of the proposed shift register
5.7 Schematic of the SSASPL

List of Tables:
Table No. Table Name
1 Evolution of logic complexity in integrated circuits
2 List of operators