
Principles of VLSI Design Subsystem Design CMPE 413/CMSC 711

Digital Device Components


A simple processor illustrates many of the basic components used in any
digital system:

[Figure: processor block diagram showing Memory, Control, Input-Output and
Datapath blocks.]

Datapath: The core. All other components are support units that either store
the results of the datapath or determine what happens in the next cycle.

UMBC, December 11, 2000 3:44 pm

Memory:
A broad range of classes exists, distinguished by the way data is accessed:
  Read-Only vs. Read-Write
  Sequential vs. Random access
  Single-ported vs. Multi-ported access
or by their data retention characteristics:
  Dynamic vs. Static
Stay tuned for a more extensive treatment of memories.

Control:
An FSM (sequential circuit) implemented using random logic, PLAs or
memories.

Interconnect and Input-Output:
Parasitic resistance, capacitance and inductance affect the performance of
wires both on and off the chip.
Growing die size increases the length of the on-chip interconnect,
increasing the value of the parasitics.

Datapath elements include adders, multipliers, shifters, BFUs, etc.
The speed of these elements often dominates the overall system performance,
so optimization techniques are important.

However, as we will see, the task is non-trivial since there are multiple
equivalent logic and circuit topologies to choose from, each with advantages
and disadvantages in terms of speed, power and area.

Also, optimizations focused on a single design level, e.g., sizing transistors,
lead to inferior designs.

Bit-sliced organization is common for datapaths.

[Figure: bit-sliced datapath. Identical slices Bit 0 through Bit 4 run from
Data-In to Data-Out and contain Registers, Adder, Shifter and Multiplexer
blocks, with Control applied across all slices.]

Datapath Operators: Addition/Subtraction


Let's start with addition, since it is a very common datapath element and
often a speed-limiting element.

Optimizations can be applied at the logic or circuit level.

Logic-level optimizations try to rearrange the Boolean equations to produce
a faster or smaller circuit, e.g. carry look-ahead adder.
Circuit-level optimizations manipulate transistor sizes and circuit topology
to optimize speed.

Let's start with some basic definitions before considering optimizations:

A  B  Ci   G(A.B)  P(A+B)  P(A XOR B)   Sum  Co   Carry status
0  0  0      0       0         0         0   0    delete
0  0  1      0       0         0         1   0    delete
0  1  0      0       1         1         1   0    propagate
0  1  1      0       1         1         0   1    propagate
1  0  0      0       1         1         1   0    propagate
1  0  1      0       1         1         0   1    propagate
1  1  0      1       1         0         0   1    generate
1  1  1      1       1         0         1   1    generate

G(A.B): (generate)
Occurs when a Co is internally generated within the adder (occurs
independent of Ci).

P(A+B): (propagate)
Indicates that Ci is propagated (passed) to Co.

P(A XOR B): (propagate)
Used in some adders for the P term since it can be reused to generate the
sum term.

D($\overline{A} \cdot \overline{B}$): (delete)
Ensures that a carry bit will be deleted at Co.

The Boolean expressions for S and Co are:

$$Sum = A\,\overline{B}\,\overline{C_i} + \overline{A}\,B\,\overline{C_i} + \overline{A}\,\overline{B}\,C_i + A\,B\,C_i = A \oplus B \oplus C_i$$
$$Carry = A\,B + A\,C_i + B\,C_i$$


But S and Co can be written in terms of G and P:
Co(G, P) = G + P.Ci    (P can be either A + B or A XOR B in this case)
S(G, P) = P XOR Ci     (here P must be A XOR B)

Note that G and P are independent of Ci.

(Also, Co and S can be expressed in terms of delete (D)).
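
The G, P and D definitions can be checked exhaustively against ordinary arithmetic. The following is a minimal Python sketch; the function name `full_adder_gp` and the status strings are illustrative choices, not part of the course notes.

```python
from itertools import product

def full_adder_gp(a, b, ci):
    """Return (sum, carry_out, carry_status) for one full-adder bit."""
    g = a & b                 # generate: Co = 1 regardless of Ci
    p_or = a | b              # propagate, OR form (usable for Co only)
    p_xor = a ^ b             # propagate, XOR form (usable for Co and S)
    d = (1 - a) & (1 - b)     # delete: Co = 0 regardless of Ci
    co = g | (p_xor & ci)
    # Either definition of P yields the same carry out.
    assert co == g | (p_or & ci)
    s = p_xor ^ ci
    status = "generate" if g else "delete" if d else "propagate"
    return s, co, status

# Every input combination must agree with A + B + Ci = 2*Co + S.
for a, b, ci in product((0, 1), repeat=3):
    s, co, _ = full_adder_gp(a, b, ci)
    assert 2 * co + s == a + b + ci
```

The loop reproduces the eight rows of the truth table, including the carry status column.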

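Ripple-carry adder:

[Figure: four full adders (FA) in a chain, bit 0 to bit 3. Stage k takes Ak,
Bk and Ci,k and produces Sk and Co,k; each carry out feeds the next stage
(Co,0 = Ci,1, and so on), from Ci,0 at the input to Co,3 at the output.]

The critical path (worst case delay over all possible inputs) is a ripple from
lsb to msb.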
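
As a behavioral illustration of the ripple-carry structure, the sketch below chains full adders in Python. The helper names (`full_adder`, `ripple_carry_add`, `to_bits`, `from_bits`) and the lsb-first list convention are assumptions made for this example only.

```python
def full_adder(a, b, ci):
    """One full-adder bit: returns (sum, carry_out)."""
    s = a ^ b ^ ci
    co = (a & b) | (a & ci) | (b & ci)
    return s, co

def ripple_carry_add(a_bits, b_bits, ci=0):
    """Add two equal-length bit lists (index 0 = lsb).

    Returns (sum_bits, carry_out). Each stage waits for the carry of the
    stage before it, which is what makes the delay linear in the word length.
    """
    sum_bits = []
    carry = ci
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry

def to_bits(x, n):
    """n-bit pattern of x as a list, lsb first."""
    return [(x >> k) & 1 for k in range(n)]

def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << k for k, bit in enumerate(bits))
```

For example, `ripple_carry_add(to_bits(9, 4), to_bits(8, 4))` returns the sum bits of 1 together with a carry out of 1, since 9 + 8 = 17 overflows four bits.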

The delay in this case is proportional to the number of bits, N, in the input
words:
$$t_{adder} = (N - 1)\,t_{carry} + t_{sum}$$
where $t_{carry}$ and $t_{sum}$ are the propagation delays from Ci to Co and
from Ci to S, respectively.
One possible worst case bit pattern (a carry rippling from lsb to msb) is:
A: 00000001; B: 01111111
Convince yourself that this is true.

Note that when optimizing this structure, it is far more important to optimize
$t_{carry}$ than $t_{sum}$.
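
One way to convince yourself is to count how many carry stages a generated carry passes through. The sketch below assumes the two bit strings are written msb first, so that A = 1 and B = 127; the function names are hypothetical. It only measures the ripple length for a given operand pair and does not search for the global worst case.

```python
def ripple_adder_delay(n, t_carry, t_sum):
    """Delay model from the notes: (N - 1) * t_carry + t_sum."""
    return (n - 1) * t_carry + t_sum

def longest_carry_ripple(a, b, n):
    """Longest chain of carry stages a single carry travels through.

    A generate bit starts a chain of length 1, each following propagate bit
    (A XOR B = 1) extends it, and a delete bit ends it.
    """
    longest = run = 0
    for k in range(n):
        ak, bk = (a >> k) & 1, (b >> k) & 1
        if ak & bk:            # generate: a new carry starts here
            run = 1
        elif ak ^ bk:          # propagate: extends a carry already in flight
            run = run + 1 if run else 0
        else:                  # delete: any carry stops here
            run = 0
        longest = max(longest, run)
    return longest

# Pattern from the notes, read msb first (an assumption of this sketch).
A = int("00000001", 2)
B = int("01111111", 2)
stages = longest_carry_ripple(A, B, 8)
```

For this pattern the carry is generated at bit 0 and propagated through bits 1 to 6, so `stages` is N - 1 = 7, after which the sum of bit 7 still has to be formed.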

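The inverting property of a full adder can be used to achieve this goal:

[Figure: a full adder with inputs A, B, Ci and outputs S, Co is equivalent to
the same full adder with every input and output inverted.]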

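Thus,
$$\overline{S}(A, B, C_i) = S(\overline{A}, \overline{B}, \overline{C_i})$$
$$\overline{C_o}(A, B, C_i) = C_o(\overline{A}, \overline{B}, \overline{C_i})$$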
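
The inverting property is easy to confirm by brute force. This is a small Python check under the usual full-adder equations; `inverting_property_holds` is a name made up for the example.

```python
from itertools import product

def full_adder(a, b, ci):
    """Returns (sum, carry_out) for one bit."""
    return a ^ b ^ ci, (a & b) | (a & ci) | (b & ci)

def inverting_property_holds():
    """True if inverting all inputs inverts both outputs, for every input."""
    for a, b, ci in product((0, 1), repeat=3):
        s, co = full_adder(a, b, ci)
        s_inv, co_inv = full_adder(1 - a, 1 - b, 1 - ci)
        if (s_inv, co_inv) != (1 - s, 1 - co):
            return False
    return True
```

The property follows from arithmetic: inverting all three inputs turns A + B + Ci into 3 - (A + B + Ci), which inverts both bits of the two-bit result.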

One possible (un-optimized) implementation:

[Figure: gate-level full adder. The sum output is S = P XOR Ci with
P = A XOR B; the carry output is Co = G + Ci.P with G = A.B and P = A + B.]

Transistor level diagram uses 32 transistors (see Weste and Eshraghian).


Co is reused in the S term as:
$$Sum = A\,B\,C_i + (A + B + C_i)\,\overline{C_o}$$
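
The reuse of the carry in the sum term relies on the complement of Co. A short Python check of that identity, with `sum_from_carry` as a hypothetical helper name:

```python
from itertools import product

def sum_from_carry(a, b, ci):
    """Sum bit built from Co: A.B.Ci + (A + B + Ci).NOT(Co)."""
    co = (a & b) | (a & ci) | (b & ci)
    return (a & b & ci) | ((a | b | ci) & (1 - co))

# The expression must match the XOR form of the sum for all eight inputs.
for a, b, ci in product((0, 1), repeat=3):
    assert sum_from_carry(a, b, ci) == a ^ b ^ ci
```

With exactly one input high, Co is 0 and the second term fires; with all three high, the first term fires; with two high, Co is 1 and both terms are 0.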

[Figure: transistor-level complementary CMOS full adder (28 transistors) with
inputs A, B, Ci and outputs S, Co.]

The symmetrical design eliminates diffusion capacitances and reduces series
resistance.
Are the n and p trees duals of each other?

Even with some design tricks, e.g., placing the transistors on the critical
path (those driven by Ci) closest to the output and using a symmetrical design,
this implementation is slow.

The load capacitance on Co in the previous version consists of 2 diffusion
capacitances (inverter) and 6 (next bit) gate capacitances.

[Figure: n-bit adder/subtractor built from full-adder stages, with inputs
A<0>..A<n>, B<0>..B<n>, Cin and Subtract, and outputs S<0>..S<n>, the carry
C<n+1>, Overflow and the sign of the result.]

Eliminates the inverter delay per bit for carry!

This version increases Co's load to 4 diffusion caps, 2 internal (sum) gate
caps plus the 6 (next bit) gate caps.

Serial addition can be used if area is a concern:

[Figure: bit-serial adder. Two n-bit shift registers hold the addend and the
augend and feed a single full adder; Cout is stored in a 1-bit register (with
Clk, Set and Clr) and fed back as Cin; the sum bits shift into an n-bit result
register.]

In this case, you want equal Sum and Carry delays in order to minimize clock
cycle time.

Bit-level pipelining can be used to break the dependency between addition
time and the number of bits by inserting FAs between each register bit.

Transmission-gate Adder:

Total transistor count is 26.

[Figure: transmission-gate full adder with inputs A, B and Ci, internal XOR and
XNOR signals, and outputs S and Co.]

Note: S and Co delay times are approximately equal -- good for multipliers.

See Weste and Eshraghian for an 18 transistor implementation.



Dynamic Adder Design: np-CMOS adder

[Figure: two-bit np-CMOS adder. Bit 0 takes A0, B0 and Ci0 and produces S0 and
Ci1; bit 1 takes A1, B1 and Ci1 and produces S1 and Ci2.]


Dynamic Adder Design: Manchester Carry-Chain adder.
A chain of pass-transistors is used to implement the carry chain.

[Figure: Manchester carry chain from Ci,0 to Co,4. Pass transistors gated by
P0..P4 link the nodes Co,0..Co,4, and transistors gated by G0..G4 discharge the
nodes. Relative transistor sizes taper from 4 at the Ci,0 end down to 1 at the
Co,4 end.]

Transistor sizes are largest at the Ci,0 end since the worst case is to
discharge all nodes Co,k.
Precharge: All intermediate nodes, e.g. Co,0, are charged to VDD.
Evaluate: Node Co,k is discharged, for example, if there is an incoming
carry, Ci,0, and the previous propagate signals, P0 to Pk-1, are high.

Only 4 diffusion capacitances are present per node, but the distributed RC
nature of the chain results in a delay that is quadratic in the number of bits.
Buffers and/or transistor sizing can be used to improve performance.

Consider the worst case delay of the carry chain:

[Figure: RC ladder with series resistors R1..R6 and node capacitors C1..C6 to
ground, ending at Out.]

Elmore delay is given by:
$$t_p = 0.69 \sum_{i=1}^{N} C_i \left( \sum_{j=1}^{i} R_j \right)$$

The delay of the RC network is then:
$$t_p = 0.69\,\bigl(C_1 R_1 + C_2 (R_1 + R_2) + C_3 (R_1 + R_2 + R_3) + C_4 (R_1 + R_2 + R_3 + R_4)$$
$$\qquad + C_5 (R_1 + R_2 + R_3 + R_4 + R_5) + C_6 (R_1 + R_2 + R_3 + R_4 + R_5 + R_6)\bigr)$$

Since R1 appears 6 times in the expression, it makes sense to minimize its
contribution.

Note that reducing R by a factor, e.g. k, at each stage increases the
capacitance by a factor k and increases area.
A k-factor of 1.5 reduces delay by 40% and increases area by 3.5X.
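
The Elmore sum is straightforward to evaluate numerically. The sketch below is a plain Python transcription of the formula; `elmore_delay` is an illustrative name, and the unit R and C values are placeholders rather than data from the notes. It does not attempt to reproduce the 40% and 3.5X figures, which depend on device parameters not given here.

```python
def elmore_delay(resistances, capacitances):
    """t_p = 0.69 * sum_i C_i * (R_1 + ... + R_i) for an RC chain.

    Both lists are ordered from the driven end (R1, C1) to the output.
    """
    total = 0.0
    upstream_r = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r          # resistance between the source and node i
        total += c * upstream_r
    return 0.69 * total

def uniform_chain_delay(n, r=1.0, c=1.0):
    """Delay of n identical stages; grows as n * (n + 1) / 2."""
    return elmore_delay([r] * n, [c] * n)
```

With identical stages the sum collapses to 0.69 RC N(N+1)/2, which is the quadratic dependence on the number of bits noted for the Manchester chain.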

Carry-Bypass adder:

[Figure: four full adders (FA) in a ripple chain with inputs (P0, G0)..(P3, G3)
and carries Ci,0, Co,0..Co,3. A Mux controlled by BP = P0P1P2P3 selects either
Ci,0 or the rippled Co,3 as the block carry out.]

Assume Ak and Bk (for k = 0...3) are set such that all Pk (propagate) are
high.
In this case, an incoming carry Ci,0 = 1 propagates along the complete
chain and Co,3 = 1.
In other words:
if (P0P1P2P3 == 1) then Co,3 = Ci,0 else either DELETE or GENERATE
occurred.
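
The bypass rule can be modeled behaviorally as follows. This Python sketch assumes the XOR form of propagate (with the OR form, an all-high P could hide a generate), and `bypass_block_carry` is a hypothetical name.

```python
def bypass_block_carry(a_bits, b_bits, ci):
    """Carry out of a 4-bit block with a bypass mux (index 0 = lsb)."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # XOR propagate (assumed)
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate
    bp = all(p)                                   # BP = P0.P1.P2.P3
    if bp:
        return ci          # mux selects the bypass path: Co,3 = Ci,0
    carry = ci
    for gk, pk in zip(g, p):
        carry = gk | (pk & carry)   # otherwise the rippled carry is selected
    return carry
```

When BP is low, some stage deleted or generated a carry, so the block output no longer depends on a carry that entered at Ci,0 and travelled the whole chain.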

Linear Carry-Select adder:
One way around waiting for the incoming carry is to compute the result
of both possible values in advance and let the incoming carry select the
correct result.

[Figure: carry-select block that adds bits k to k+3. A Setup stage produces P
and G; a 0-carry propagation chain and a 1-carry propagation chain compute the
two possible carry vectors; a Mux controlled by Co,k-1 selects the carry vector
and Co,k+3; a Sum Generation stage produces the sum bits.]

The select operation is much faster than the time to compute either of the two
possible carry vectors.

For Square-Root Carry-Select, higher order blocks take more operand bits than
lower order blocks.

A Square-Root Carry-Select Adder (delay = $O(\sqrt{N})$) is constructed by
increasing the number of input bits in each block from lsb to msb.
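
A behavioral sketch of carry-select in Python is given below. The block partition is passed in as a list so that both the linear (equal blocks) and square-root (growing blocks) variants can be tried; the names `ripple` and `carry_select_add` and the example partitions are assumptions of this sketch.

```python
def ripple(a_bits, b_bits, ci):
    """Ripple-carry addition of one block; returns (sum_bits, carry_out)."""
    sums, carry = [], ci
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (a & carry) | (b & carry)
    return sums, carry

def carry_select_add(a, b, n, block_sizes):
    """Add two n-bit words; block_sizes lists block widths from lsb to msb."""
    assert sum(block_sizes) == n
    a_bits = [(a >> k) & 1 for k in range(n)]
    b_bits = [(b >> k) & 1 for k in range(n)]
    result, carry, lo = 0, 0, 0
    for width in block_sizes:
        hi = lo + width
        # Both candidates are computed without waiting for the incoming carry.
        sums0, c0 = ripple(a_bits[lo:hi], b_bits[lo:hi], 0)
        sums1, c1 = ripple(a_bits[lo:hi], b_bits[lo:hi], 1)
        # The incoming carry only drives the multiplexer.
        sums, carry = (sums1, c1) if carry else (sums0, c0)
        for k, s in enumerate(sums):
            result |= s << (lo + k)
        lo = hi
    return result, carry
```

In hardware the two candidate chains of every block run concurrently, so only the chain of mux selections lies on the critical path. A partition such as `[4, 4]` models the linear version and one such as `[1, 2, 2, 3]` models blocks that grow toward the msb.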

Carry look-ahead adder (avoiding the ripple altogether):
Compute the carries to each stage in parallel.
The carry out of the kth stage is computed as:
Co,k = Gk + Pk . Co,k-1
where Gk = Ak . Bk and Pk = Ak + Bk.

The dependency between Co,k and Co,k-1 can be eliminated by
expanding Co,k-1:
Co,k = Gk + Pk . (Gk-1 + Pk-1 . Co,k-2)

For example, for 4 stages of look-ahead:
C0 = G0 + P0Ci
C1 = G1 + P1G0 + P1P0Ci
C2 = G2 + P2G1 + P2P1G0 + P2P1P0Ci
C3 = G3 + P3G2 + P3P2G1 + P3P2P1G0 + P3P2P1P0Ci

Note that the low-order terms, e.g., P0 and G0, appear in the expression for
every bit, making the fanout load large.
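
The four expanded equations can be transcribed directly and compared with arithmetic. This is a minimal Python sketch; `lookahead_carries` is an illustrative name, and operands are passed as 4-bit integers.

```python
def lookahead_carries(a, b, ci):
    """Return [C0, C1, C2, C3] for 4-bit a and b from the expanded equations."""
    g = [((a >> k) & 1) & ((b >> k) & 1) for k in range(4)]   # Gk = Ak.Bk
    p = [((a >> k) & 1) | ((b >> k) & 1) for k in range(4)]   # Pk = Ak+Bk
    # In Python, & binds tighter than |, matching product-of-sums notation.
    c0 = g[0] | p[0] & ci
    c1 = g[1] | p[1] & g[0] | p[1] & p[0] & ci
    c2 = g[2] | p[2] & g[1] | p[2] & p[1] & g[0] | p[2] & p[1] & p[0] & ci
    c3 = (g[3] | p[3] & g[2] | p[3] & p[2] & g[1]
          | p[3] & p[2] & p[1] & g[0] | p[3] & p[2] & p[1] & p[0] & ci)
    return [c0, c1, c2, c3]
```

Each carry is a two-level function of the inputs, so none of them waits for another; the price is the growing number and width of product terms.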

Carry look-ahead adder:
One possible implementation without using simple logic gates.

[Figure: transistor-level look-ahead carry gate with inputs G0..G3, P0..P3 and
Ci,0 and output Co,3.]

Size and fan-in of the gates limit the look-ahead to about four bits.

Carry look-ahead adder:
Factoring term C3 yields:
C3 = G3 + P3(G2 + P2(G1 + P1(G0 + P0Ci,0)))

Domino CMOS implementation:

[Figure: domino CMOS gate, clocked by Clk, computing C<3> from P<0>..P<3>,
G<0>..G<3> and Ci,0.]

Worst case is pull-down through 6 series n-channel transistors.

Other high speed versions are given in Weste and Eshraghian.

The Logarithmic look-ahead adder: O(log2 N) delay:

[Figure: 8-bit logarithmic look-ahead tree. Inputs (G0, P0) to (G7, P7) are
combined by a forward binary tree and an inverse binary tree of dot operators
to produce Co,0 to Co,7.]

The dot operator is defined as: (g, p) . (g', p') = (g + pg', pp')
The number of logic levels is proportional to log2 N, fan-in is limited and the
layout is compact (jigsaw puzzle) (see Rabaey for details).
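
Because the dot operator is associative, the carries can be computed by any tree of dot operations. The Python sketch below uses a simple distance-doubling scan rather than the forward/inverse binary tree of the figure; that substitution is a choice made for brevity, and the names `dot` and `prefix_carries` are illustrative. The carry-in is modeled as an extra element (Ci, 0) below bit 0.

```python
def dot(left, right):
    """(g, p) . (g', p') = (g + p.g', p.p'); `left` is the more significant group."""
    g, p = left
    g2, p2 = right
    return (g | (p & g2), p & p2)

def prefix_carries(a, b, n, ci=0):
    """Return [Co,0, ..., Co,n-1] using about log2(n + 1) levels of dot operators."""
    cur = [(ci, 0)]   # carry-in acts as a generate with no propagate
    for k in range(n):
        ak, bk = (a >> k) & 1, (b >> k) & 1
        cur.append((ak & bk, ak | bk))
    d = 1
    while d < len(cur):
        # Each element absorbs the group that ends d positions below it.
        cur = [dot(cur[i], cur[i - d]) if i >= d else cur[i]
               for i in range(len(cur))]
        d *= 2
    return [g for g, _ in cur[1:]]
```

After the last level, element k + 1 holds the group generate of bits k down to the carry-in, which is exactly Co,k.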

Datapath Operators: Comparison


Magnitude Comparators:
May be built from an adder, complementer (XOR gates) and a zero
detect unit.

[Figure: 4-bit magnitude comparator with inputs A<3:0> and B<3:0>. The adder
chain produces B >= A, and a zero-detect NOR gate produces B = A.]

Think about the modifications necessary to make it a signed comparator
(Hint: A couple of XOR gates).
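
For the unsigned case, the adder-based comparison can be modeled as B + NOT(A) + 1. The Python sketch below assumes this standard two's complement subtraction; `compare_unsigned` is a hypothetical name, and the signed variant posed above is left as the exercise it is.

```python
def compare_unsigned(a, b, n=4):
    """Return (b_ge_a, b_eq_a) for n-bit unsigned a and b."""
    mask = (1 << n) - 1
    total = (b & mask) + (~a & mask) + 1   # B + NOT(A) + 1 = B - A + 2**n
    carry_out = (total >> n) & 1           # 1 exactly when B >= A
    diff = total & mask
    zero = diff == 0                       # NOR of all difference bits
    return bool(carry_out), zero
```

The carry out of the adder is set exactly when the subtraction does not borrow, which is why it can serve as the B >= A output.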


Datapath Operators: Binary Counters


Asynchronous: Based on the Toggle register.

[Figure: a toggle (T) register, and a four-stage "Ripple Carry" binary counter
in which Clk drives the first T register and each stage clocks the next,
producing Q<0>..Q<3>.]

Not a good choice for performance and testability (with no reset).


Synchronous counter.

[Figure: four 1-bit registers (D, Q, Clk, Clear) producing Q<0>..Q<3>, each fed
through a 0/1 multiplexer, with common Clk and Clear lines.]

Replace the AND gate with an adder for up/down counting capability.

Weste and Eshraghian also show a version that can be initialized.


Datapath Operators: Multiplication


Multiplication can be broken down into two steps:
Computation of partial products.
Accumulation of the shifted partial products.

        1100
   x    0101
   ---------
        1100
       0000
      1100
     0000
   ---------
     0111100

Binary multiplication of single bits is equivalent to the AND operation.

Multipliers may be classified by the format in which data words are accessed:
Serial
Serial/parallel
Parallel

The parallel form computes the partial products in parallel.
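
The two steps can be written out in a few lines of Python. This sketch uses hypothetical helper names (`partial_products`, `multiply`) and reproduces the 1100 x 0101 example.

```python
def partial_products(x, y, n):
    """Row j is X AND-ed bitwise with bit Yj, then shifted left by j."""
    rows = []
    for j in range(n):
        yj = (y >> j) & 1
        row = sum((((x >> i) & 1) & yj) << i for i in range(n))
        rows.append(row << j)
    return rows

def multiply(x, y, n):
    """Unsigned n x n multiplication: accumulate the shifted partial products."""
    return sum(partial_products(x, y, n))

rows = partial_products(0b1100, 0b0101, 4)
product = multiply(0b1100, 0b0101, 4)
```

Here `rows` holds the four shifted partial products and `product` is 60, i.e. 0111100 in binary, matching the worked example.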


Parallel Unsigned Multiplication:

Multiplying 2 unsigned binary integers results in:

$$X = \sum_{i=0}^{m-1} X_i 2^i \qquad Y = \sum_{j=0}^{n-1} Y_j 2^j$$
$$P = XY = \sum_{i=0}^{m-1} X_i 2^i \cdot \sum_{j=0}^{n-1} Y_j 2^j = \sum_{k=0}^{m+n-1} P_k 2^k$$

                        X3    X2    X1    X0     Multiplicand
                        Y3    Y2    Y1    Y0     Multiplier
                      --------------------------
                        X3Y0  X2Y0  X1Y0  X0Y0
                  X3Y1  X2Y1  X1Y1  X0Y1
            X3Y2  X2Y2  X1Y2  X0Y2
      X3Y3  X2Y3  X1Y3  X0Y3
  ----------------------------------------------
  P7    P6    P5    P4    P3    P2    P1    P0

There are m*n summands produced by a set of m*n AND gates in parallel.


Parallel Multiplication:
Multiplication is carried out using a bitwise AND of the operands, Xi
and Yj.
Most of the work (and delay) is in summing the partial products.

[Figure: multiplier cell combining an AND gate on X and Y (the multiplication)
with a full adder with inputs A, B, Ci and outputs Sum, Co (summing the partial
products).]

An NxN multiplier requires:
N(N-2) full adders
N half adders
N^2 AND gates


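Array multiplier:

[Figure: 4x4 array multiplier. Multiplicand bits X3..X0 are ANDed with
multiplier bits Y0..Y3, one row per multiplier bit; three rows of adders
(HA FA FA HA, FA FA FA HA, FA FA FA HA) accumulate the partial products and
produce P0..P7.]

$$t_{mult} = [(M-1) + (N-2)]\,t_{carry} + (N-1)\,t_{sum} + t_{and}$$

There are a large number of nearly identical critical paths in this circuit.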

From the delay expression and the fact that all critical paths have the same
length, minimizing tmult requires minimizing both tcarry and tsum.

This is in contrast with the adder where minimizing tcarry was key.
The transmission gate adder is a good choice here.

Parallel Signed Multiplication:

Baugh-Wooley algorithm: Only 3 additional adders required over the
unsigned version.

Let A and B represent signed integers:
$$A = -a_{m-1} 2^{m-1} + \sum_{i=0}^{m-2} a_i 2^i$$
$$B = -b_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} b_i 2^i$$

$$P = \left(-a_{m-1} 2^{m-1} + \sum_{i=0}^{m-2} a_i 2^i\right) \left(-b_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} b_i 2^i\right)$$
$$= a_{m-1} b_{n-1} 2^{m+n-2} + \sum_{i=0}^{m-2} \sum_{j=0}^{n-2} a_i b_j 2^{i+j} - \sum_{i=0}^{m-2} a_i b_{n-1} 2^{n-1+i} - \sum_{i=0}^{n-2} a_{m-1} b_i 2^{m-1+i}$$

Expanding shows that the last two rows of summands are all negative so the
algorithm simply adds in their negations.
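
The four-term expansion can be evaluated term by term and compared with an ordinary signed product. This Python sketch only checks the algebra; it does not model the Baugh-Wooley adder array itself, and the helper names (`bit`, `signed_value`, `expanded_product`) are made up for the example.

```python
def bit(x, k):
    """Bit k of the pattern x."""
    return (x >> k) & 1

def signed_value(pattern, width):
    """Two's complement value of a width-bit pattern."""
    low = pattern & ((1 << (width - 1)) - 1)
    return -bit(pattern, width - 1) * (1 << (width - 1)) + low

def expanded_product(a, b, m, n):
    """P = A.B from the expansion: two positive and two negative groups."""
    t1 = (bit(a, m - 1) & bit(b, n - 1)) << (m + n - 2)
    t2 = sum((bit(a, i) & bit(b, j)) << (i + j)
             for i in range(m - 1) for j in range(n - 1))
    t3 = sum((bit(a, i) & bit(b, n - 1)) << (n - 1 + i) for i in range(m - 1))
    t4 = sum((bit(a, m - 1) & bit(b, j)) << (m - 1 + j) for j in range(n - 1))
    return t1 + t2 - t3 - t4
```

The terms `t3` and `t4` are the two negative rows of summands; a hardware implementation adds their negations instead of subtracting.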

Parallel Signed Multiplication:

[Figure: 8x8 Baugh-Wooley array multiplier. For each multiplier bit b0..b7, a
row of AND gates forms the summands (a7 bj)..(a0 bj) and a row of ADD cells
accumulates them, producing product bits P0..P15.]

Carry-Save Multiplier:
Carry bits can be passed diagonally downwards instead of to the left.

[Figure: 4x4 carry-save array. Rows of adder cells (HA HA HA HA; HA FA FA FA;
HA FA FA FA) pass their carries diagonally down to the next row; a final row
(HA FA FA HA) forms the vector-merging adder.]

Cost: A little extra area.
Advantage: Critical path is uniquely defined:
$$t_{mult} = (N-1)\,t_{carry} + t_{and} + t_{merge}$$
(Assuming tadd = tcarry).
Minimizing tmerge is useful, e.g. use carry-select or lookahead.

Here the carry bits are not immediately added but rather saved for the
next adder stage.
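
The idea of saving carries can be shown at word level. In the Python sketch below, each row is reduced with a 3:2 compression that keeps a separate carry word, and only the final step performs a carry-propagating addition. The names `carry_save_add` and `carry_save_multiply` are illustrative, and the word-level model ignores the physical cell arrangement.

```python
def carry_save_add(x, y, z):
    """3:2 compression of three words into (sum_word, carry_word).

    Every bit position is an independent full adder, so no carry ripples;
    the invariant is x + y + z == sum_word + carry_word.
    """
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1
    return sum_word, carry_word

def carry_save_multiply(x, y, n):
    """Unsigned n x n multiply with carries saved between rows."""
    s, c = 0, 0
    for j in range(n):
        row = (x << j) if (y >> j) & 1 else 0   # shifted partial product j
        s, c = carry_save_add(s, c, row)        # carries are saved, not rippled
    return s + c                                # vector-merging adder
```

The only carry-propagating addition is the last line, which is why a fast adder (carry-select or look-ahead) is worthwhile there.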

Serial Unsigned Multiplication:

Useful if area is a concern.

[Figure: serial multiplier with gates G1 and G2, a 1-bit register (Clk, reset),
a Cin input, and a serial register holding P7..P0.]

Xi and Yi are delivered serially to the inputs of G1 at different rates.
Computes the summands row-wise from right to left.
Disadvantage: Quadratic delay: tmult = M x N x tcarry

Serial/Parallel Unsigned Multiplier shown in Weste and Eshraghian.



Booth Encoding:
A special encoding of the multiplier word reduces the number of
required addition stages and speeds up multiplication substantially.

Radix-4 scheme:
$$Y = \sum_{j=0}^{(N-1)/2} Y_j 4^j \quad \text{with} \quad Y_j \in \{-2, -1, 0, 1, 2\}$$

The number of partial products (and additions) is halved, resulting in
area and speed advantage.

The disadvantage is a somewhat more involved multiplier cell.

AND operation replaced with inversion and shift logic.

Virtually every multiplier in use employs the Booth scheme.
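
A common way to obtain the radix-4 digits is to examine overlapping groups of three multiplier bits. The Python sketch below assumes a two's complement multiplier with an even number of bits and the standard recoding rule Yj = -2 y(2j+1) + y(2j) + y(2j-1), with y(-1) = 0; the rule and the function names are not spelled out in the notes and are stated here as assumptions.

```python
def booth_radix4_digits(y, n):
    """Recode an n-bit two's complement multiplier (n even) into n/2 digits.

    Each digit lies in {-2, -1, 0, 1, 2}; digit j has weight 4**j.
    """
    assert n % 2 == 0

    def bit(k):
        return 0 if k < 0 else (y >> k) & 1

    return [-2 * bit(2 * j + 1) + bit(2 * j) + bit(2 * j - 1)
            for j in range(n // 2)]

def booth_multiply(x, y, n):
    """Signed multiply using one partial product per radix-4 digit."""
    return sum(d * x * 4 ** j
               for j, d in enumerate(booth_radix4_digits(y, n)))
```

Each digit selects 0, +X, -X, +2X or -2X, which is where the inversion and shift logic in the multiplier cell comes from. For y = 6 with n = 4 the digits are -2 and 2, and -2 + 2 * 4 = 6.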


Wallace Multiplier:
Trees can be used to replace the linear partial-sum adders:

[Figure: two ways to sum the column bits Y0..Y5 with full adders (FA), carries
Ci-1 in and Ci out. One is a slice of a 6-bit carry-save multiplier, a linear
chain of FAs in which the # of ripple stages is N-2; the other is a Wallace
tree of FAs producing Sum and C from the same inputs.]

Adv: O(log2 N) mult time.
Disadv: Very irregular -- difficult to layout.

Datapath Operators: Shifters


Right/Left 1-bit shifter:

[Figure: four 2-to-1 multiplexers (inputs 0 and 1, select S) driven by a common
Right/Left control. Data inputs A3..A0 with serial inputs IR and IL; outputs
H3..H0.]


Barrel shifter:

[Figure: 4x4 barrel shifter array with select lines s<3>..s<0>, inputs l<6:0>
and outputs r<3>..r<0>.]

shift   result r<3:0>
1       l<3:0>
2       l<4:1>
4       l<5:2>
8       l<6:3>

Arithmetic and logical shifts and rotates are possible by muxing l<6:0> to the
appropriate values.
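
The shift table can be read as a sliding 4-bit window over l<6:0>. The Python sketch below assumes the shift column is the one-hot value on s<3:0> (1, 2, 4 or 8); that reading, the function names, and the rotate wiring are assumptions of this example.

```python
def barrel_shift(l_bits, s):
    """Select a 4-bit window of the 7-bit word l<6:0>.

    s is the one-hot select s<3:0>: 1 -> l<3:0>, 2 -> l<4:1>,
    4 -> l<5:2>, 8 -> l<6:3>.
    """
    assert s in (1, 2, 4, 8)
    offset = s.bit_length() - 1        # index of the asserted select line
    return (l_bits >> offset) & 0xF    # r<3:0> = l<offset+3 : offset>

def rotate_right_4(a, amount):
    """Rotate a 4-bit word right by 0..3 by wiring l<6:0> = {a<2:0>, a<3:0>}."""
    a &= 0xF
    l_bits = ((a & 0x7) << 4) | a
    return barrel_shift(l_bits, 1 << amount)
```

Feeding zeros or copies of the sign bit into the upper bits of l instead of a<2:0> would give logical or arithmetic right shifts with the same array.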