Chap8 1 2 PDF

Principles of VLSI Design Subsystem Design CMPE 413/CMSC 711
Digital Device Components

A simple processor illustrates many of the basic components used in any digital system:
Memory
Control
Input-Output
Datapath
Datapath: The core. All other components are support units that either store the results of the datapath or determine what happens in the next cycle.
University of Maryland Baltimore County (UMBC), 1966 (December 11, 2000 3:44 pm)

Memory:
A broad range of classes exists, determined by the way data is accessed:
Read-Only vs. Read-Write
Sequential vs. Random access
Single-ported vs. Multi-ported access
Or by their data retention characteristics:
Dynamic vs. Static
Stay tuned for a more extensive treatment of memories.
Control:
An FSM (sequential circuit) implemented using random logic, PLAs or memories.
Interconnect and Input-Output:
Parasitic resistance, capacitance and inductance affect the performance of wires both on and off the chip.
Growing die size increases the length of the on-chip interconnect, increasing the value of the parasitics.

Datapath elements include adders, multipliers, shifters, BFUs, etc.
The speed of these elements often dominates the overall system performance, so optimization techniques are important.
However, as we will see, the task is non-trivial since there are multiple equivalent logic and circuit topologies to choose from, each with advantages and disadvantages in terms of speed, power and area.
Also, optimizations focused at one design level, e.g., sizing transistors, lead to inferior designs.
Bit-sliced organization is common for datapaths.
[Figure: bit-sliced datapath. Bit slices 0 to 4 run from Data-In to Data-Out through Registers, Adder, Shifter and Multiplexer stages, with Control signals shared across the slices.]
Datapath Operators: Addition/Subtraction

Let's start with addition, since it is a very common datapath element and often a speed-limiting element.
Optimizations can be applied at the logic or circuit level.
Logic-level optimizations try to rearrange the Boolean equations to produce a faster or smaller circuit, e.g. the carry look-ahead adder.
Circuit-level optimizations manipulate transistor sizes and circuit topology to optimize speed.
Let's start with some basic definitions before considering optimizations:
A  B  Ci | G(A.B)  P(A+B)  P(A XOR B) | Sum  Co | Carry status
0  0  0  |   0       0         0      |  0   0  | delete
0  0  1  |   0       0         0      |  1   0  | delete
0  1  0  |   0       1         1      |  1   0  | propagate
0  1  1  |   0       1         1      |  0   1  | propagate
1  0  0  |   0       1         1      |  1   0  | propagate
1  0  1  |   0       1         1      |  0   1  | propagate
1  1  0  |   1       1         0      |  0   1  | generate
1  1  1  |   1       1         0      |  1   1  | generate

G(A.B): (generate)
Occurs when a Co is internally generated within the adder (independent of Ci).
P(A+B): (propagate)
Indicates that Ci is propagated (passed) to Co.
P(A XOR B): (propagate)
Used in some adders for the P term since it can be reused to generate the sum term.
D($\overline{A} \cdot \overline{B}$): (delete)
Ensures that a carry bit will be deleted at Co.
The Boolean expressions for S and Co are:
$Sum = A\,\overline{B}\,\overline{C_i} + \overline{A}\,B\,\overline{C_i} + \overline{A}\,\overline{B}\,C_i + A\,B\,C_i = A \oplus B \oplus C_i$
$Carry = A\,B + A\,C_i + B\,C_i$

But S and Co can be written in terms of G and P:
$C_o(G, P) = G + P\,C_i$ (P may be either A + B or A XOR B in this case).
$S(G, P) = P \oplus C_i$ (with P = A XOR B).
Note that G and P are independent of Ci.
(Also, Co and S can be expressed in terms of delete (D).)
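The G/P formulation can be checked by brute force. The following is a minimal sketch that enumerates all eight input combinations and confirms that Co = G + P.Ci holds for either definition of P, and that S = (A XOR B) XOR Ci. The function name `full_adder_terms` and the tuple layout are illustrative choices, not something defined in the slides.

```python
from itertools import product


def full_adder_terms(a, b, ci):
    """Return (G, P_or, P_xor, D, S, Co_or, Co_xor) for one full-adder bit."""
    g = a & b                    # generate
    p_or = a | b                 # propagate, OR form
    p_xor = a ^ b                # propagate, XOR form (reusable for the sum)
    d = (1 - a) & (1 - b)        # delete: NOT A AND NOT B
    s = p_xor ^ ci               # sum needs the XOR form of P
    co_or = g | (p_or & ci)      # carry using P = A + B
    co_xor = g | (p_xor & ci)    # carry using P = A XOR B
    return g, p_or, p_xor, d, s, co_or, co_xor


def check_all_inputs():
    """Compare both carry forms and the sum against plain integer addition."""
    for a, b, ci in product((0, 1), repeat=3):
        _, _, _, _, s, co_or, co_xor = full_adder_terms(a, b, ci)
        total = a + b + ci
        if s != total % 2 or co_or != total // 2 or co_xor != total // 2:
            return False
    return True
```

Both carry forms agree because the only input pair on which A + B and A XOR B differ is A = B = 1, where G is already 1.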
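Ripple-carry adder:
[Figure: four full adders (FA) in a chain. Stage k takes Ak, Bk and the carry from the previous stage (Ci,0 for stage 0; Co,0 = Ci,1 and so on) and produces Sk and Co,k, for k = 0 to 3.]
The critical path (worst case delay over all possible inputs) is a ripple from lsb to msb.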

The delay in this case is proportional to the number of bits, N, in the input words:
$t_{adder} = (N - 1)\,t_{carry} + t_{sum}$
where tcarry and tsum are the propagation delays from Ci to Co and from Ci to S, respectively.
One possible worst case bit pattern (msb on the left, so the carry ripples from lsb to msb) is:
A: 00000001; B: 01111111
Convince yourself that this is true.
Note that when optimizing this structure, it is far more important to optimize tcarry than tsum.
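To see why this pattern is a worst case, the sketch below adds two words bit by bit and tracks when each stage's carry-in settles. The timing model is an assumption made for illustration: a propagate stage delivers its carry-out one t_carry after its carry-in arrives, while a generate or delete stage delivers it one t_carry after the operands are applied. The function name `ripple_carry_add` is hypothetical.

```python
def ripple_carry_add(a, b, n_bits, c_in=0):
    """Add two n_bits-wide words with a ripple-carry chain.

    Returns (sum, carry_out, worst_arrival), where worst_arrival is the latest
    time at which any stage's carry-in settles, in units of t_carry.
    """
    carry = c_in
    arrival = 0   # settling time of the carry into the current stage
    worst = 0
    total = 0
    for k in range(n_bits):
        ak = (a >> k) & 1
        bk = (b >> k) & 1
        worst = max(worst, arrival)
        total |= (ak ^ bk ^ carry) << k
        if ak ^ bk:
            arrival += 1   # propagate: carry-out waits for carry-in
        else:
            arrival = 1    # generate or delete: depends only on Ak, Bk
        carry = (ak & bk) | ((ak ^ bk) & carry)
    return total, carry, worst
```

For A = 00000001 and B = 01111111 with N = 8, bit 0 generates a carry and bits 1 to 6 propagate it, so the carry into bit 7 settles after N - 1 = 7 carry delays, leaving one t_sum to produce the final sum bit.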
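The inverting property of a full adder can be used to achieve this goal:
[Figure: a full adder (FA) with inputs A, B, Ci and outputs S, Co is equivalent to the same FA with all inputs and outputs inverted.]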

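Thus,
$\overline{S}(A, B, C_i) = S(\overline{A}, \overline{B}, \overline{C_i})$
$\overline{C_o}(A, B, C_i) = C_o(\overline{A}, \overline{B}, \overline{C_i})$
One possible (un-optimized) implementation:
[Figure: gate-level full adder. S is formed as P XOR Ci from A, B and Ci; Co is formed from G(A.B) and Ci.P(A + B).]
The transistor-level diagram uses 32 transistors (see Weste and Eshraghian).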

Co is reused in the S term as:
$Sum = A\,B\,C_i + (A + B + C_i)\,\overline{C_o}$
[Figure: 28-transistor full adder in which Co is reused in the sum circuit. The symmetrical design eliminates diffusion caps and reduces series R. Are the n and p trees duals of each other?]
Even with some design tricks, e.g., placing the critical-path transistors (those driven by Ci) closest to the output and using a symmetrical design, this implementation is slow.
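The reuse of Co in the sum term is easy to verify exhaustively. This small sketch evaluates Sum = A.B.Ci + (A + B + Ci).NOT(Co) for every input and compares it with ordinary addition; the helper names are illustrative only.

```python
from itertools import product


def sum_from_carry(a, b, ci):
    """Compute (S, Co) with S derived from the inverted carry-out."""
    co = (a & b) | (a & ci) | (b & ci)
    s = (a & b & ci) | ((a | b | ci) & (1 - co))
    return s, co


def identity_holds():
    """True if the Co-reuse form matches a + b + ci for all eight inputs."""
    return all(
        sum_from_carry(a, b, ci) == ((a + b + ci) % 2, (a + b + ci) // 2)
        for a, b, ci in product((0, 1), repeat=3)
    )
```

The identity works because the sum is 1 either when all three inputs are 1, or when at least one input is 1 and no carry is produced (exactly one input is 1).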

The load capacitance on Co in the previous version consists of 2 diffusion capacitances (inverter) and 6 (next bit) gate capacitances.
[Figure: n-bit ripple-carry adder/subtractor with inputs A<n>, B<n>, Cin and a Subtract control, outputs S<n> and carry C<n+1>, plus an Overflow flag and the sign of the result.]
Eliminates the inverter delay per bit for carry!
This version increases Co's load to 4 diffusion caps, 2 internal (sum) gate caps plus the 6 (next bit) gate caps.

Serial addition can be used if area is a concern:
[Figure: bit-serial adder. n-bit shift registers hold the addend, augend and result around a single full adder; Cout is stored in a 1-bit register (with Set and Clr) and fed back as Cin. All registers share Clk.]
In this case, you want equal Sum and Carry delays in order to minimize clock cycle time.
Bit-level pipelining can be used to break the dependency between addition time and the number of bits by inserting FAs between each register bit.
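A behavioural sketch of the serial adder may help: one full adder is reused every clock, and a 1-bit register carries Cout into the next cycle. The function name `bit_serial_add` and the integer-based "shift registers" are simplifications assumed here, not details from the slides.

```python
def bit_serial_add(addend, augend, n_bits):
    """Add two n_bits-wide words one bit per clock, lsb first.

    Returns (result, final_carry), with result truncated to n_bits.
    """
    carry = 0      # 1-bit carry register, cleared before the first clock
    result = 0
    for clk in range(n_bits):
        a = (addend >> clk) & 1   # bit shifted out of the addend register
        b = (augend >> clk) & 1   # bit shifted out of the augend register
        s = a ^ b ^ carry
        carry = (a & b) | (carry & (a ^ b))   # stored for the next clock
        result |= s << clk                     # shifted into the result register
    return result, carry
```

Because both the sum bit and the carry bit must be ready before the next clock edge, the clock period is set by the slower of the two, which is why equal Sum and Carry delays are desirable here.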

Transmission-gate Adder:
[Figure: transmission-gate full adder with inputs A, B, Ci, internal XOR and XNOR signals, and outputs S and Co. Total transistors: 26.]
Note: S and Co delay times are approximately equal, which is good for multipliers.
See Weste and Eshraghian for an 18 transistor implementation.


Dynamic Adder Design: np-CMOS adder
[Figure: two-bit np-CMOS adder. Stage 0 takes A0, B0, Ci0 and produces S0 and Ci1; stage 1 takes A1, B1, Ci1 and produces S1 and Ci2.]


Dynamic Adder Design: Manchester Carry-Chain adder.
A chain of pass-transistors is used to implement the carry chain.
[Figure: Manchester carry chain from Ci,0 through pass transistors gated by P0 to P4, with G0 to G4 acting on nodes Co,0 to Co,4. The labelled relative transistor sizes decrease along the chain (rows 3 to 1, 3.5 to 1, and 4 to 1.5).]
Transistor sizes are largest at the start of the chain since the worst case is to discharge all nodes Co,k.
Precharge: All intermediate nodes, e.g. Co,0, are charged to VDD.
Evaluate: Node Co,k is discharged, for example, if there is an incoming carry, Ci,0, and the previous propagate signals, P0 to Pk-1, are high.
Only 4 diffusion capacitances are present per node, but the distributed RC nature of the chain results in a delay that is quadratic in the number of bits.
Buffers and/or transistor sizing can be used to improve performance.

Consider the worst case delay of the carry chain:
[Figure: RC ladder with series resistors R1 to R6, a capacitor Ck to ground after each resistor Rk, and the output (Out) taken at the last node.]
Elmore delay is given by:
$t_p = 0.69 \sum_{i=1}^{N} C_i \left( \sum_{j=1}^{i} R_j \right)$
The delay of the RC network is then:
$t_p = 0.69\,(C_1 R_1 + C_2 (R_1 + R_2) + C_3 (R_1 + R_2 + R_3) + C_4 (R_1 + R_2 + R_3 + R_4) + C_5 (R_1 + R_2 + R_3 + R_4 + R_5) + C_6 (R_1 + R_2 + R_3 + R_4 + R_5 + R_6))$
Since R1 appears 6 times in the expression, it makes sense to minimize its contribution.
Note that reducing R by a factor, e.g. k, at each stage increases the capacitance by a factor k and increases area.
A k-factor of 1.5 reduces delay by 40% and increases area by 3.5X.
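The Elmore sum is straightforward to evaluate numerically. The sketch below implements the double sum directly; the function name and the unit-valued example are assumptions for illustration and do not attempt to reproduce the 40% and 3.5X figures quoted above.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay of an RC ladder: 0.69 * sum_i C_i * (R_1 + ... + R_i).

    resistances[k] is the series resistor feeding node k and capacitances[k]
    is the capacitor at node k, both listed from input to output.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one resistor per capacitor")
    total = 0.0
    upstream_r = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r          # resistance between the source and this node
        total += c * upstream_r
    return 0.69 * total
```

For a uniform chain with every R and C equal to 1, the sum is 1 + 2 + ... + N = N(N + 1)/2, so six stages give 0.69 x 21 and twelve stages give 0.69 x 78, which illustrates the quadratic growth with the number of bits.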

Carry-Bypass adder:
[Figure: four full adders (FA) with inputs (P0, G0) to (P3, G3) ripple the carry from Ci,0 through Co,0, Co,1, Co,2 to Co,3. A multiplexer controlled by BP = P0P1P2P3 selects either Ci,0 or the rippled Co,3 as the block carry-out.]
Assume Ak and Bk (for k = 0...3) are set such that all Pk (propagate) are high.
In this case, an incoming carry Ci,0 = 1 propagates along the complete chain and Co,3 = 1.
In other words:
if (P0P1P2P3 == 1) then Co,3 = Ci,0 else either DELETE or GENERATE occurred.
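The bypass rule can be modelled in a few lines. This sketch assumes the XOR form of propagate (P = A XOR B), which is what makes the statement "all P high implies Co,3 = Ci,0" exact; `bypass_block` is a hypothetical name.

```python
def bypass_block(a, b, c_in, width=4):
    """Model one carry-bypass block.

    Returns (c_out, bp, ripple_out): the muxed carry-out, the bypass signal
    BP = P0.P1...P(width-1), and the carry produced by the ripple chain.
    """
    carry = c_in
    bp = 1
    for k in range(width):
        ak = (a >> k) & 1
        bk = (b >> k) & 1
        p = ak ^ bk              # propagate (XOR form, assumed here)
        g = ak & bk              # generate
        bp &= p
        carry = g | (p & carry)  # ripple through this bit
    ripple_out = carry
    c_out = c_in if bp else ripple_out   # the bypass multiplexer
    return c_out, bp, ripple_out
```

The multiplexer never changes the logical result; it only lets the carry skip the chain when every bit propagates, so `c_out` always equals `ripple_out`.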

Linear Carry-Select adder:
One way around waiting for the incoming carry is to compute the result of both possible values in advance and let the incoming carry select the correct result.
[Figure: carry-select block that adds bits k to k+3. A Setup stage produces P and G; a 0-carry propagation chain and a 1-carry propagation chain run in parallel; a Mux controlled by Co,k-1 selects the carry vector and Co,k+3; Sum Generation follows.]
The select operation is much faster than the time to compute either of the two possible carry vectors.
A Square-Root Carry-Select Adder (delay = $O(N^{1/2})$) is constructed by increasing the number of input bits in each block from lsb to msb, i.e., higher order blocks take more operand bits than lower order blocks.
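A behavioural sketch of the linear (fixed block size) carry-select scheme follows. Each block computes its result for carry-in 0 and carry-in 1, and the carry from the previous block acts as the mux select. Python's integer `+` stands in for the two carry chains inside a block, and the name `carry_select_add` is an assumption.

```python
def carry_select_add(a, b, n_bits, block=4):
    """Add two n_bits-wide words with a linear carry-select organisation.

    Returns (sum, carry_out), with sum truncated to n_bits.
    """
    carry = 0
    result = 0
    for lo in range(0, n_bits, block):
        width = min(block, n_bits - lo)
        mask = (1 << width) - 1
        a_blk = (a >> lo) & mask
        b_blk = (b >> lo) & mask
        sum0 = a_blk + b_blk        # speculative result assuming carry-in = 0
        sum1 = a_blk + b_blk + 1    # speculative result assuming carry-in = 1
        chosen = sum1 if carry else sum0   # mux driven by the incoming carry
        result |= (chosen & mask) << lo
        carry = chosen >> width            # selected block carry-out
    return result, carry
```

In hardware both speculative sums are available before the incoming carry arrives, so each block adds only a mux delay to the critical path.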

Carry look-ahead adder (avoiding the ripple altogether):
Compute the carries to each stage in parallel.
The carry out of the kth stage is computed as:
$C_{o,k} = G_k + P_k\,C_{o,k-1}$, where $G_k = A_k B_k$ and $P_k = A_k + B_k$
The dependency between Co,k and Co,k-1 can be eliminated by expanding Co,k-1:
$C_{o,k} = G_k + P_k\,(G_{k-1} + P_{k-1}\,C_{o,k-2})$
For example, for 4 stages of look-ahead:
$C_0 = G_0 + P_0 C_i$
$C_1 = G_1 + P_1 G_0 + P_1 P_0 C_i$
$C_2 = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_i$
$C_3 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_i$
Note that the low-order terms, e.g., P0 and G0, appear in the expression for every bit, making the fanout load large.
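The expanded look-ahead equations can be evaluated directly as flat sums of products. The sketch below builds each carry from G, P and Ci only, with no dependence on the previous carry, and can be compared against ordinary addition. The function name `lookahead_carries` is illustrative.

```python
def lookahead_carries(a, b, c_in, width=4):
    """Return [C0, ..., C(width-1)] using the fully expanded equations.

    C_k = G_k + P_k G_(k-1) + ... + P_k ... P_1 G_0 + P_k ... P_0 C_i,
    with G_k = A_k . B_k and P_k = A_k + B_k.
    """
    g = [(a >> k) & (b >> k) & 1 for k in range(width)]
    p = [((a >> k) | (b >> k)) & 1 for k in range(width)]
    carries = []
    for k in range(width):
        c = 0
        for j in range(k, -1, -1):
            term = g[j]                  # G_j ...
            for m in range(j + 1, k + 1):
                term &= p[m]             # ... gated by P_(j+1) ... P_k
            c |= term
        tail = c_in                      # C_i gated by P_0 ... P_k
        for m in range(k + 1):
            tail &= p[m]
        carries.append(c | tail)
    return carries
```

The nested loops make the cost visible: the product terms grow with k, which mirrors the fan-in and fanout problem noted above.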

Carry look-ahead adder:
One possible implementation without using simple logic gates.
[Figure: complex-gate implementation of Co,3 from inputs G0 to G3, P0 to P3 and Ci,0.]
Size and fan-in of the gates limit the look-ahead size to about four.

Carry look-ahead adder:
Factoring term C3 yields:
$C_3 = G_3 + P_3 (G_2 + P_2 (G_1 + P_1 (G_0 + P_0 C_{i,0})))$
Domino CMOS implementation:
[Figure: domino CMOS gate computing C<3>, clocked by Clk, with inputs P<0> to P<3>, G<0> to G<3> and Ci,0.]
Worst case is pull-down through 6 series n-channel transistors.
Other high speed versions are given in Weste and Eshraghian.

The Logarithmic look-ahead adder: $O(\log_2 N)$ delay:
[Figure: 8-bit tree adder. Inputs (G0, P0) to (G7, P7) are combined by a forward binary tree and an inverse binary tree of dot operators to produce the carries Co,0 to Co,7; intermediate group terms such as (C4-7, P4-7) appear inside the tree.]
The dot operator is defined as: $(g, p) \cdot (g', p') = (g + p\,g',\; p\,p')$
The number of logic levels is proportional to log2N, fan-in is limited and the layout is compact (jigsaw puzzle) (see Rabaey for details).
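The dot operator is associative, which is what allows the carries to be computed in a logarithmic number of levels. The sketch below uses a recursive-doubling (Kogge-Stone style) scan rather than the forward/inverse binary tree drawn in the figure; it is a swapped-in organisation chosen for brevity, and the function names are assumptions.

```python
def dot(hi, lo):
    """Dot operator: (g, p) . (g', p') = (g + p.g', p.p')."""
    g, p = hi
    g2, p2 = lo
    return (g | (p & g2), p & p2)


def prefix_carries(a, b, n_bits, c_in=0):
    """Return ([Co,0 ... Co,n-1], levels) using a parallel-prefix scan."""
    gp = [(((a >> k) & (b >> k)) & 1, ((a >> k) | (b >> k)) & 1)
          for k in range(n_bits)]
    dist = 1
    levels = 0
    while dist < n_bits:
        # Every position combines with the group ending dist bits below it.
        gp = [dot(gp[k], gp[k - dist]) if k >= dist else gp[k]
              for k in range(n_bits)]
        dist *= 2
        levels += 1
    # gp[k] now spans bits k..0; fold in the external carry-in.
    carries = [g | (p & c_in) for g, p in gp]
    return carries, levels
```

For 8 bits the scan finishes in 3 levels, matching log2 N; the tree in the figure reaches the same carries with fewer dot cells at the cost of extra levels in the inverse tree.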
Datapath Operators: Comparison

Magnitude Comparators:
May be built from an adder, complementer (XOR gates) and a zero detect unit.
[Figure: 4-bit comparator. A<3:0> and B<3:0> feed an adder chain whose carry-out gives B >= A; a zero-detect NOR gate on the sum bits gives B = A.]
Think about the modifications necessary to make it a signed comparator (Hint: A couple of XOR gates).
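One way to read the comparator is as a subtractor: B - A is formed as B + NOT(A) + 1, the carry-out signals B >= A, and a zero result signals B = A. The sketch below models that unsigned reading; the exact wiring of the figure is not recoverable from the text, so treat the structure and the name `compare_unsigned` as assumptions.

```python
def compare_unsigned(a, b, n_bits=4):
    """Return (b_ge_a, b_eq_a) for unsigned n_bits-wide operands."""
    mask = (1 << n_bits) - 1
    # B - A computed as B + NOT(A) + 1 (two's complement subtraction).
    diff = (b & mask) + (~a & mask) + 1
    b_ge_a = diff >> n_bits              # carry-out of the adder
    b_eq_a = int((diff & mask) == 0)     # zero detect (NOR of the sum bits)
    return b_ge_a, b_eq_a
```

For example, with A = 3 and B = 5 the adder produces 5 + 12 + 1 = 18, so the carry-out is 1 (B >= A) and the low four bits are non-zero (B differs from A).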
Datapath Operators: Binary Counters

Asynchronous: Based on the Toggle register.
[Figure: a toggle (T) register, and a "Ripple Carry" binary counter in which Clk drives the first toggle stage and each stage's output clocks the next, producing Q<0>, Q<1>, Q<2> and Q<3>.]
Not a good choice for performance and testability (with no reset).
Datapath Operators: Binary Counters

Synchronous counter.
[Figure: four 1-bit registers (D, Q) with outputs Q<0> to Q<3>, each fed through a 0/1 multiplexer, sharing Clk and Clear.]
Replace AND gate with an adder for up/down counting capability.
Weste and Eshraghian also show a version that can be initialized.
Datapath Operators: Multiplication

Multiplication can be broken down into two steps:
Computation of partial products.
Accumulation of the shifted partial products.
     1100
   x 0101
   ------
     1100
    0000
   1100
  0000
  -------
  0111100
Binary multiplication is equivalent to an AND operation.
Multipliers may be classified by the format in which data words are accessed:
Serial
Serial/parallel
Parallel
The parallel form computes the partial products in parallel.
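The two steps can be sketched as follows, using the 1100 x 0101 example from above. Each partial product is the multiplicand ANDed with one multiplier bit and shifted into position; the function name `shift_and_add` is an illustrative choice.

```python
def shift_and_add(multiplicand, multiplier, n_bits):
    """Return (partial_products, product) for unsigned operands.

    partial_products[j] is the multiplicand ANDed with multiplier bit j,
    already shifted left by j positions.
    """
    partials = []
    for j in range(n_bits):
        y_j = (multiplier >> j) & 1
        row = multiplicand if y_j else 0   # AND of every bit with Y_j
        partials.append(row << j)          # shift into its column
    return partials, sum(partials)         # accumulate the shifted rows
```

Calling `shift_and_add(0b1100, 0b0101, 4)` yields the rows 1100, 0000, 110000 and 0000 (as integers 12, 0, 48, 0) and the product 0111100, i.e. 60.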

Parallel Unsigned Multiplication:
Multiplying 2 unsigned binary integers results in:
$X = \sum_{i=0}^{m-1} X_i 2^i$
$Y = \sum_{j=0}^{n-1} Y_j 2^j$
$P = X\,Y = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} X_i Y_j 2^{i+j} = \sum_{k=0}^{m+n-1} P_k 2^k$
                        X3    X2    X1    X0      Multiplicand
                        Y3    Y2    Y1    Y0      Multiplier
                        X3Y0  X2Y0  X1Y0  X0Y0
                  X3Y1  X2Y1  X1Y1  X0Y1
            X3Y2  X2Y2  X1Y2  X0Y2
      X3Y3  X2Y3  X1Y3  X0Y3
P7    P6    P5    P4    P3    P2    P1    P0
There are m*n summands produced by a set of m*n AND gates in parallel.

Parallel Multiplication:
Multiplication is carried out using a bitwise AND of the operand bits, Xi and Yj.
Most of the work (and delay) is in summing the partial products.
[Figure: multiplier cell. An AND gate of X and Y performs the multiplication; a full adder with inputs A, B and Ci and outputs Sum and Co sums the partial products.]
An NxN multiplier requires:
N(N-2) full adders
N half adders
N^2 AND gates

Array multiplier:
[Figure: 4x4 array multiplier of dimensions M x N. Rows of AND gates combine X3 to X0 with Y0, Y1, Y2 and Y3; three rows of adders (HA FA FA HA, then FA FA FA HA twice) accumulate the partial products into P0 to P7.]
$t_{mult} = [(M-1) + (N-2)]\,t_{carry} + (N-1)\,t_{sum} + t_{and}$
There are a large number of nearly identical critical paths in this circuit.

From the delay expression and the fact that all critical paths have the same
length, minimizing tmult requires minimizing both tcarry and tsum.
This is in contrast with the adder where minimizing tcarry was key.
The transmission gate adder is a good choice here.
Parallel Signed Multiplication:
Baugh-Wooley algorithm: Only 3 additional adders required over the unsigned version.
Let A and B represent signed integers:
$A = -a_{m-1} 2^{m-1} + \sum_{i=0}^{m-2} a_i 2^i$
$B = -b_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} b_i 2^i$
$P = A\,B = \left( -a_{m-1} 2^{m-1} + \sum_{i=0}^{m-2} a_i 2^i \right) \left( -b_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} b_i 2^i \right)$
$= a_{m-1} b_{n-1} 2^{m+n-2} + \sum_{i=0}^{m-2} \sum_{j=0}^{n-2} a_i b_j 2^{i+j} - \sum_{i=0}^{m-2} a_i b_{n-1} 2^{n-1+i} - \sum_{i=0}^{n-2} a_{m-1} b_i 2^{m-1+i}$
Expanding shows that the last two rows of summands are all negative, so the algorithm simply adds in their negations.
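The expansion of the signed product can be checked numerically. The sketch below evaluates the four groups of summands (one positive corner term, the positive body, and the two negative rows) for two's complement operands and compares the total with ordinary multiplication. It verifies the algebra only; it does not model the adder array, and the function names are assumptions.

```python
def to_bits(value, width):
    """Two's complement bits of value, lsb first."""
    return [(value >> i) & 1 for i in range(width)]


def baugh_wooley_expansion(a, b, m, n):
    """Evaluate the expanded product of an m-bit and an n-bit signed integer."""
    abits = to_bits(a, m)
    bbits = to_bits(b, n)
    # a_(m-1) b_(n-1) 2^(m+n-2): product of the two sign bits (positive).
    corner = (abits[m - 1] * bbits[n - 1]) << (m + n - 2)
    # Sum of a_i b_j 2^(i+j) over the magnitude bits (positive).
    body = sum((abits[i] * bbits[j]) << (i + j)
               for i in range(m - 1) for j in range(n - 1))
    # The two rows that carry a negative weight.
    row_b_sign = sum((abits[i] * bbits[n - 1]) << (n - 1 + i) for i in range(m - 1))
    row_a_sign = sum((abits[m - 1] * bbits[i]) << (m - 1 + i) for i in range(n - 1))
    return corner + body - row_b_sign - row_a_sign
```

For instance, `baugh_wooley_expansion(-3, 5, 4, 4)` evaluates to -15, and the check can be repeated over every pair of 4-bit signed values.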
Parallel Signed Multiplication:
[Figure: 8x8 signed array multiplier. AND gates form the summands (ai bj) for a7 to a0 and b0 to b7; rows of adders (ADD) accumulate them into the product bits P0 to P15.]

Carry-Save Multiplier:
Carry bits can be passed diagonally downwards instead of to the left.
Here the carry bits are not immediately added but rather saved for the next adder stage.
[Figure: 4x4 carry-save multiplier. Rows of HA and FA cells (HA HA HA HA, then HA FA FA FA twice) are followed by a vector-merging adder row (HA FA FA HA).]
Cost: a little extra area.
Advantage: the critical path is uniquely defined:
$t_{mult} = (N-1)\,t_{carry} + t_{and} + t_{merge}$ (assuming $t_{add} = t_{carry}$).
Minimizing tmerge is useful, e.g. use a carry-select or look-ahead adder for the vector merge.
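The carry-save idea can be sketched at word level: each row of adders compresses three words into a sum word and a carry word without propagating carries sideways, and only the final vector-merging adder performs a true carry-propagate addition. The word-level formulation and the function names are assumptions for illustration.

```python
def carry_save_step(x, y, z):
    """3:2 compression: returns (sum_word, carry_word) with x+y+z preserved."""
    partial_sum = x ^ y ^ z
    carry = ((x & y) | (x & z) | (y & z)) << 1   # carries move one column left
    return partial_sum, carry


def carry_save_multiply(x, y, n_bits):
    """Unsigned multiply that saves carries until a final merge."""
    s, c = 0, 0
    for j in range(n_bits):
        row = (x << j) if (y >> j) & 1 else 0    # partial product for Y_j
        s, c = carry_save_step(s, c, row)        # no carry ripple within a row
    return s + c                                 # vector-merging adder
```

Since `s + c` always equals the running total of the partial products, the only long carry chain is the last addition, which is why a fast merge adder pays off.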

Serial Unsigned Multiplication:
Useful if area is a concern.
[Figure: serial multiplier built from gates G1 and G2, a full adder with Cin, a 1-bit register (Clk, reset) and a serial register holding P7 to P0.]
Xi and Yi are delivered serially to the inputs of G1 at different rates.
Computes the summands row-wise from right to left.
Disadvantage: quadratic delay: $t_{mult} = M \times N \times t_{carry}$
Serial/Parallel Unsigned Multiplier shown in Weste and Eshraghian.


Booth Encoding:
A special encoding of the multiplier word reduces the number of required addition stages and speeds up multiplication substantially.
Radix-4 scheme:
$Y = \sum_{j=0}^{(N-1)/2} Y_j 4^j$ with $Y_j \in \{-2, -1, 0, 1, 2\}$
The number of partial products (and additions) is halved, resulting in an area and speed advantage.
The disadvantage is a somewhat more involved multiplier cell.
AND operation replaced with inversion and shift logic.
Virtually every multiplier in use employs the Booth scheme.
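A minimal sketch of radix-4 recoding follows, assuming a two's complement multiplier with an even number of bits. It uses the standard digit rule Y_j = y(2j-1) + y(2j) - 2.y(2j+1) with y(-1) = 0, which the slides do not spell out; the function names are hypothetical.

```python
def booth_radix4_digits(y, n_bits):
    """Recode an n_bits-wide two's complement value into radix-4 digits.

    Returns digits d_0 ... d_(n_bits/2 - 1), each in {-2, -1, 0, 1, 2},
    such that sum(d_j * 4**j) == y.
    """
    if n_bits % 2:
        raise ValueError("this sketch assumes an even bit width")
    bits = [(y >> i) & 1 for i in range(n_bits)]
    digits = []
    prev = 0                                   # y_(-1) = 0
    for j in range(0, n_bits, 2):
        digits.append(prev + bits[j] - 2 * bits[j + 1])
        prev = bits[j + 1]
    return digits


def booth_multiply(x, y, n_bits):
    """Multiply using one partial product per radix-4 digit of y."""
    return sum(d * x * 4 ** j
               for j, d in enumerate(booth_radix4_digits(y, n_bits)))
```

An 8-bit multiplier yields only four digits, so four partial products (each 0, ±X or ±2X, i.e. a shift and an optional inversion) replace the eight AND rows.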

Wallace Multiplier:
Trees can be used to replace the linear partial-sum adders:
[Figure: left, a slice of a 6-bit carry-save multiplier in which inputs Y0 to Y5 pass through a linear chain of full adders (the number of ripple stages is N-2); right, the same six inputs reduced by a tree of full adders to Sum and C outputs.]
Advantage: O(log2N) multiplication time.
Disadvantage: Very irregular, so it is difficult to lay out.
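The tree reduction can be sketched at word level by applying 3:2 compression to groups of three rows in parallel until only two rows remain. The grouping order below is one simple choice, not necessarily the wiring in the figure, and `wallace_reduce` is a hypothetical name.

```python
def wallace_reduce(operands):
    """Reduce a list of non-negative words with layers of 3:2 compressors.

    Returns (total, levels), where levels is the number of compressor layers
    used before the final two-operand addition.
    """
    rows = list(operands)
    levels = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3        # rows that form complete triples
        nxt = []
        for i in range(0, full, 3):
            x, y, z = rows[i:i + 3]
            nxt.append(x ^ y ^ z)                             # sum word
            nxt.append(((x & y) | (x & z) | (y & z)) << 1)    # carry word
        nxt.extend(rows[full:])                 # leftovers pass to the next layer
        rows = nxt
        levels += 1
    return sum(rows), levels                    # final carry-propagate addition
```

Six rows shrink to four, then three, then two, so three compressor layers suffice where the linear chain needs N - 2 = 4 stages.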
Datapath Operators: Shifters

Right/Left 1-bit shifter:
[Figure: four 2:1 multiplexers with a shared Right/Left select S. Inputs A3 to A0 plus the shift-in bits IR and IL produce outputs H3 to H0.]
Datapath Operators: Shifters

Barrel shifter:
[Figure: 4-bit barrel shifter array with select lines s<3> to s<0>, data input l<6:0> and outputs r<3> to r<0>.]
shift   result
1       l<3:0>
2       l<4:1>
4       l<5:2>
8       l<6:3>
Arithmetic and logical shifts and rotates are possible by muxing l<6:0> to the appropriate values.
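Reading the shift/result table as a one-hot select, the barrel shifter simply picks a 4-bit window out of the 7-bit input l<6:0>. The sketch below models that window selection and shows how loading l<6:0> appropriately produces a rotate; the one-hot interpretation and the function names are assumptions.

```python
def barrel_window(l, s_onehot):
    """Select r<3:0> from l<6:0>: 1 -> l<3:0>, 2 -> l<4:1>, 4 -> l<5:2>, 8 -> l<6:3>."""
    offsets = {1: 0, 2: 1, 4: 2, 8: 3}
    if s_onehot not in offsets:
        raise ValueError("select must be one-hot: 1, 2, 4 or 8")
    return (l >> offsets[s_onehot]) & 0xF


def rotate_right(a, amount):
    """Rotate a 4-bit value right by 0..3 by loading l = {a<2:0>, a<3:0>}."""
    a &= 0xF
    l = ((a & 0x7) << 4) | a
    return barrel_window(l, 1 << amount)
```

Loading the upper bits l<6:4> with zeros instead gives a logical right shift, and loading them with copies of the sign bit gives an arithmetic right shift.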