VLSI Arith Addition

VLSI Arithmetic
Adders & Multipliers

Prof. Vojin G. Oklobdzija
University of California
http://www.ece.ucdavis.edu/acsel
Introduction
Digital computer arithmetic belongs to computer architecture, but it is also an aspect of logic design.
The objective of computer arithmetic is to develop algorithms that use the available hardware in the most efficient way.
Speed, power, and chip area are the most commonly used measures, which creates a strong link between the algorithms and the implementation technology.
Oklobdzija 2004
Computer Arithmetic
Basic Operations
Addition
Multiplication
Multiply-Add
Division
Evaluation of Functions
Multi-Media
Addition of Binary Numbers

Full Adder. The full adder is the fundamental building block
of most arithmetic circuits:
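[Figure: full-adder block with inputs a_i, b_i, C_in and outputs s_i, C_out]
The sum and carry outputs are described as:
$$s_i = \bar a_i \bar b_i c_i + \bar a_i b_i \bar c_i + a_i \bar b_i \bar c_i + a_i b_i c_i$$
$$c_{i+1} = \bar a_i b_i c_i + a_i \bar b_i c_i + a_i b_i \bar c_i + a_i b_i c_i = a_i b_i + a_i c_i + b_i c_i$$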
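The full-adder equations can be checked exhaustively against integer addition. The following is a minimal sketch; the function name `full_adder` is an illustrative choice, not something defined on the slides.

```python
from itertools import product

def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (sum, carry_out) for one-bit inputs a, b and carry-in c."""
    s = a ^ b ^ c                          # 1 when an odd number of inputs is 1
    c_out = (a & b) | (a & c) | (b & c)    # 1 when at least two inputs are 1
    return s, c_out

# Exhaustive check: a + b + c must equal 2*c_out + s for all 8 input rows.
for a, b, c in product((0, 1), repeat=3):
    s, c_out = full_adder(a, b, c)
    assert 2 * c_out + s == a + b + c
```

The XOR form used for the sum is the compact equivalent of the four-minterm expression.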

Inputs          Outputs
ci  ai  bi      si  ci+1
0   0   0       0   0
0   0   1       1   0      Propagate
0   1   0       1   0      Propagate
0   1   1       0   1      Generate
1   0   0       1   0
1   0   1       0   1      Propagate
1   1   0       0   1      Propagate
1   1   1       1   1      Generate
Full-Adder Implementation
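The full-adder operation is defined by the equations:
$$s_i = \bar a_i \bar b_i c_i + \bar a_i b_i \bar c_i + a_i \bar b_i \bar c_i + a_i b_i c_i = a_i \oplus b_i \oplus c_i = p_i \oplus c_i$$
$$c_{i+1} = \bar a_i b_i c_i + a_i \bar b_i c_i + a_i b_i = g_i + p_i c_i$$
Carry-propagate: $p_i = a_i \oplus b_i$
Carry-generate: $g_i = a_i b_i$
A one-bit adder can be implemented as shown.
[Figure: one-bit adder with inputs a_i, b_i, c_in and outputs s_i, c_out]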
High-Speed Addition
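$$c_{i+1} = g_i + p_i c_i, \qquad g_i = a_i b_i, \qquad p_i = a_i \oplus b_i$$
$$s_i = p_i \oplus c_i$$
A one-bit adder can be implemented more efficiently with a multiplexer, because the MUX is faster.
[Figure: multiplexer-based one-bit adder with inputs a, c_in and outputs s, c_out]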
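One way to see why a multiplexer can produce the carry: when p_i = 1 the carry-in is passed through, and when p_i = 0 the two operand bits are equal, so either of them is the carry. The sketch below assumes the multiplexer selects between c_in and a_i; the helper names `mux` and `mux_adder_bit` are hypothetical.

```python
from itertools import product

def mux(sel: int, d1: int, d0: int) -> int:
    """2:1 multiplexer: returns d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0

def mux_adder_bit(a: int, b: int, c_in: int) -> tuple[int, int]:
    """One-bit adder whose carry is selected by the propagate signal."""
    p = a ^ b                  # propagate
    s = p ^ c_in               # sum
    c_out = mux(p, c_in, a)    # p = 1: pass c_in; p = 0: a == b, so carry = a
    return s, c_out

# The multiplexer carry must agree with c_out = g + p*c_in for every input.
for a, b, c in product((0, 1), repeat=3):
    s, c_out = mux_adder_bit(a, b, c)
    assert c_out == ((a & b) | ((a ^ b) & c))
    assert 2 * c_out + s == a + b + c
```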
The Ripple-Carry Adder

[Figure: 4-bit ripple-carry adder; four full adders (FA) with inputs A0..A3, B0..B3 and carry-in Ci,0, outputs S0..S3, and each carry-out Co,k feeding the next carry-in (Co,0 = Ci,1)]
Worst-case delay is linear in the number of bits:
$$t_d = O(N)$$
$$t_{adder} \approx (N-1)\,t_{carry} + t_{sum}$$
Goal: make the carry path circuit as fast as possible.
From Rabaey
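A ripple-carry adder can be modeled by chaining the one-bit carry recurrence. This is a behavioral sketch only; the little-endian bit-list representation and the helper names are assumptions made for illustration.

```python
def to_bits(x: int, n: int) -> list[int]:
    """Little-endian list of the n low bits of x."""
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bits: list[int]) -> int:
    return sum(bit << i for i, bit in enumerate(bits))

def ripple_carry_add(a_bits, b_bits, c_in=0):
    """Add two equal-length little-endian bit lists; return (sum_bits, carry_out)."""
    assert len(a_bits) == len(b_bits)
    carry, sum_bits = c_in, []
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | ((a ^ b) & carry)   # carry ripples to the next stage
    return sum_bits, carry

def adder_delay(n: int, t_carry: float, t_sum: float) -> float:
    """Worst-case delay model: (N - 1) carry stages followed by one sum stage."""
    return (n - 1) * t_carry + t_sum
```

For example, `ripple_carry_add(to_bits(11, 4), to_bits(6, 4))` yields sum bits for 1 with a carry-out of 1, since 11 + 6 = 17.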
Inversion Property
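[Figure: a full adder with inverted inputs and outputs is equivalent to a full adder with true inputs and outputs]
$$\bar S(A, B, C_i) = S(\bar A, \bar B, \bar C_i)$$
$$\bar C_o(A, B, C_i) = C_o(\bar A, \bar B, \bar C_i)$$
From Rabaey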
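The inversion property can be confirmed by checking all eight input combinations. A small sketch, with `full_adder` and `inversion_property_holds` as illustrative names:

```python
from itertools import product

def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)

def inversion_property_holds() -> bool:
    """True if inverting all inputs inverts both the sum and the carry-out."""
    for a, b, c in product((0, 1), repeat=3):
        s, c_out = full_adder(a, b, c)
        s_inv, c_out_inv = full_adder(1 - a, 1 - b, 1 - c)
        if (s_inv, c_out_inv) != (1 - s, 1 - c_out):
            return False
    return True
```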
Minimize Critical Path by Reducing Inverting Stages
[Figure: 4-bit ripple-carry adder built from alternating even and odd cells; inputs A0..A3, B0..B3, Ci,0; outputs S0..S3, Co,0..Co,3]
Exploit the inversion property.
Note: two different types of cells are needed.
From Rabaey
Ripple Carry Adder

Carry chain of an RCA implemented using a multiplexer from the standard-cell library (Oklobdzija, ISCAS88):
[Figure: multiplexer carry chain for bits i, i+1, i+2 with inputs a, b and outputs s_i, s_{i+1}, s_{i+2}; the critical path runs from c_in to c_out]
Manchester Carry-Chain
Realization of the Carry Path
A simple and very popular scheme for implementing the carry signal path.
[Figure: Manchester carry chain from carry in to carry out, with a generate device, a propagate device, and a predischarge & kill device]
Original Design
T. Kilburn, D. B. G. Edwards, D. Aspinall, "Parallel Addition in Digital Computers: A New Fast 'Carry' Circuit", Proceedings of IEE, Vol. 106, pt. B, p. 464, September 1959.
Manchester Carry Chain (CMOS)

Implement P with pass-transistors.
Implement G with pull-up, kill (delete) with pull-down.
Use dynamic logic to reduce complexity and increase speed.
[Figure: dynamic Manchester carry chain starting at Ci,0, with propagate devices P0..P4 and generate devices G0..G4, supplied from VDD]
Kilburn, et al, IEE Proc, 1959.
Pass-Transistor Realization in DPL

[Figure: DPL pass-transistor cells for XOR/XNOR, AND/NAND, and OR/NOR; each cell is a multiplexer followed by a buffer, driven by complementary inputs A, B and select/sum signals]
Carry-Skip Adder
MacSorley, Proc IRE 1/61
Lehman, Burla, IRE Trans on Comp, 12/61
Carry-Skip Adder
[Figure: 4-bit block of full adders with propagate signals P0..P3, generate signals G0..G3, and carries Ci,0, Co,0..Co,3; a multiplexer controlled by the bypass signal selects between Ci,0 and the rippled Co,3]
BP = P0 P1 P2 P3
Idea: if (P0 and P1 and P2 and P3) = 1, then Co,3 = Ci,0; otherwise the block kills or generates the carry.
From Rabaey
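The bypass idea can be sketched behaviorally: the block carry-out is taken from the carry-in when every bit propagates, and from the ripple chain otherwise. The function name `skip_block` and the bit-list interface are assumptions for illustration; the code models logic values only, not timing.

```python
def skip_block(a_bits, b_bits, c_in):
    """One carry-skip block on little-endian bit lists.

    Returns (sum_bits, c_out, bypass).
    """
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # per-bit propagate
    g = [a & b for a, b in zip(a_bits, b_bits)]   # per-bit generate
    carry, sum_bits = c_in, []
    for p_i, g_i in zip(p, g):
        sum_bits.append(p_i ^ carry)
        carry = g_i | (p_i & carry)               # ripple inside the block
    bypass = int(all(p))                          # BP = P0 P1 P2 P3
    c_out = c_in if bypass else carry             # bypass multiplexer
    return sum_bits, c_out, bypass
```

Both multiplexer inputs carry the same logic value whenever the bypass is active; the benefit is that the carry-in does not have to ripple through the block.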
Carry-Skip Adder:
N bits, k bits/group, r = N/k groups
[Figure: N-bit carry-skip adder made of r groups of k bits; each group produces sums S, a group propagate P, and a generate G, with AND-OR skip logic on the carry path from C_in to C_out]
Critical path delay = 2(k-1) + (N/k - 2)
Carry-Skip Adder
[Figure: propagation delay t_p versus word length N for the ripple adder and the bypass adder; the curves cross at N = 4..8]
$$t_d = 2(k-1)\,t_{RCA} + \left(\frac{N}{k} - 2\right) t_{SKIP}$$
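The trade-off in the group size k can be explored numerically. The sketch below assumes the skip term counts the r - 2 = N/k - 2 intermediate groups and, by default, unit ripple and skip delays; `cska_delay` and `best_group_size` are hypothetical names.

```python
def cska_delay(n: int, k: int, t_rca: float = 1.0, t_skip: float = 1.0) -> float:
    """Critical-path delay of an N-bit carry-skip adder with k-bit groups.

    Ripple through the first and last group (k - 1 stages each) and
    skip over the N/k - 2 groups in between.
    """
    return 2 * (k - 1) * t_rca + (n / k - 2) * t_skip

def best_group_size(n: int, t_rca: float = 1.0, t_skip: float = 1.0) -> int:
    """Group size (restricted to divisors of n) with the smallest delay."""
    candidates = [k for k in range(1, n + 1) if n % k == 0]
    return min(candidates, key=lambda k: cska_delay(n, k, t_rca, t_skip))
```

With unit delays, N = 32 and k = 4 give a delay of 12, which matches the CSKA figure quoted in the comparison with the Variable Block Adder.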
Variable Block Adder

(Oklobdzija, Barnes: IBM 1985)
Carry-chain of a 32-bit Variable Block Adder

[Figure: carry chain of an N-bit variable block adder with groups of varying size; each group produces S, P, and G, and the carry signal path either ripples within a group or skips over it]
Any-point-to-any-point delay = 9, as compared to 12 for CSKA.
Carry-chain block size determination for a 32-bit Variable Block Adder
Delay Calculation for Variable Block Adder

[Figure: 4-bit skip block with propagate signals P0..P3, generate signals G0..G3, bypass signal BP, carry-in Ci,0, and carry-out Co,3]
Delay model:

Variable Group Length
$$t_d = c_1 + c_2\sqrt{N + c_3}$$
Oklobdzija, Barnes, Arith85

Variable Block Lengths
No closed form solution for delay

It is a dynamic programming problem
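Although the optimal block sizes have no closed form, a candidate assignment is easy to evaluate. The sketch below assumes a simplified model that is not taken from the slides: one unit of delay per bit of ripple and one unit per skipped block. The block sizes in `variable` are a hypothetical example, not the sizes from the original figure.

```python
def worst_case_carry_delay(block_sizes: list[int]) -> int:
    """Worst-case carry delay (unit delays) for blocks listed LSB first.

    A carry generated in block i ripples to the end of block i, skips the
    blocks in between, and ripples into block j.
    """
    worst = max(size - 1 for size in block_sizes)   # carry stays in one block
    m = len(block_sizes)
    for i in range(m):
        for j in range(i + 1, m):
            ripple_out = block_sizes[i] - 1
            skips = j - i - 1
            ripple_in = block_sizes[j] - 1
            worst = max(worst, ripple_out + skips + ripple_in)
    return worst

uniform = [4] * 8                            # 32 bits, fixed 4-bit groups
variable = [1, 2, 3, 4, 5, 6, 5, 3, 2, 1]    # 32 bits, hypothetical sizes
```

Under this model the uniform assignment evaluates to 12 and the variable one to 9, the same figures quoted for CSKA and VBA. A search or dynamic-programming procedure would call such an evaluator over many candidate assignments.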
Delay Comparison: Variable Block Adder

[Figure: delay versus adder size N (4 to 60) for VBA, CLA, and multi-level VBA]
VLSI Arithmetic
Lecture 4
Review
Lecture 3

[Figure: delay versus adder size N (4 to 60), as in Lecture 3, with annotations: VBA follows a square-root dependency, CLA a logarithmic dependency; multi-level VBA is also shown]
Circuit Issues
Adder speed cannot be estimated from:
the number of logic gates in the critical path
the number of transistors in the path
the number of logic levels in the path
Estimating adder speed is much more complex, and the claims made for many of the fast schemes can be misleading.
Fan-Out Dependency
Fan-In Dependency
This looks like Logical Effort (1985)
Carry-Lookahead Adder
(Weinberger and Smith, 1958)
ARITH-13: Presenting Achievement Award to Arnold Weinberger of IBM (who invented the CLA adder in 1958)
Ref: A. Weinberger and J. L. Smith, "A Logic for High-Speed Addition", National Bureau of Standards, Circ. 591, p. 3-12, 1958.
CLA Definitions: One-bit adder

$$c_{i+1} = g_i + p_i c_i, \qquad g_i = a_i b_i, \qquad p_i = a_i \oplus b_i$$
$$s_i = p_i \oplus c_i$$
[Figure: one-bit adder cell with inputs a, c_in and outputs s, c_out]
CLA Definitions: 4-bit Adder
[Figure: four one-bit cells with inputs a_i..a_{i+3}, b_i..b_{i+3}, each producing g and p, with carries C_i..C_{i+4}]
$$c_{i+1} = \bar a_i b_i c_i + a_i \bar b_i c_i + a_i b_i = g_i + p_i c_i$$
$$c_{i+2} = g_{i+1} + p_{i+1} c_{i+1} = g_{i+1} + p_{i+1}(g_i + p_i c_i) = g_{i+1} + p_{i+1} g_i + p_{i+1} p_i c_i$$
Carry-Lookahead Adder: 4 bits
[Figure: four one-bit cells with inputs a_i..a_{i+3}, b_i..b_{i+3}, each producing g and p, with carries C_i..C_{i+4}]
$$c_{i+3} = g_{i+2} + p_{i+2} c_{i+2} = g_{i+2} + p_{i+2}(g_{i+1} + p_{i+1} g_i + p_{i+1} p_i c_i)$$
$$= g_{i+2} + p_{i+2} g_{i+1} + p_{i+2} p_{i+1} g_i + p_{i+2} p_{i+1} p_i c_i$$
$$c_{i+4} = g_{i+3} + p_{i+3} c_{i+3} = g_{i+3} + p_{i+3}(g_{i+2} + p_{i+2} g_{i+1} + p_{i+2} p_{i+1} g_i + p_{i+2} p_{i+1} p_i c_i)$$
$$= \underbrace{g_{i+3} + p_{i+3} g_{i+2} + p_{i+3} p_{i+2} g_{i+1} + p_{i+3} p_{i+2} p_{i+1} g_i}_{G_j} + \underbrace{p_{i+3} p_{i+2} p_{i+1} p_i}_{P_j}\, c_i$$
$$G_j = g_{i+3} + p_{i+3} g_{i+2} + p_{i+3} p_{i+2} g_{i+1} + p_{i+3} p_{i+2} p_{i+1} g_i$$
$$P_j = p_{i+3} p_{i+2} p_{i+1} p_i$$
One gate delay to calculate p, g.
One gate delay to calculate P and two for G.
Three gate delays to calculate C4(j+1):
$$c_{4(j+1)} = G_j + P_j c_{4j}$$
Compare that to 8 in RCA!
[Figure: 4-bit P, G group taking p and g from bits 4j..4j+3 and the group carry-in, producing the group P, G and the carry C4(j+1)]
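The expanded lookahead equations can be checked against ordinary addition. This is a behavioral sketch of one 4-bit group with i = 0; the name `cla4` and the bit-list interface are illustrative assumptions.

```python
def cla4(a_bits, b_bits, c_in):
    """4-bit carry-lookahead group on little-endian bit lists.

    Returns (sum_bits, c4, G, P), where G and P are the group signals.
    """
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    g = [a & b for a, b in zip(a_bits, b_bits)]
    # Every carry is written directly in terms of g, p and c_in (no rippling).
    c1 = g[0] | (p[0] & c_in)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c_in)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c_in)
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P = p[3] & p[2] & p[1] & p[0]
    c4 = G | (P & c_in)
    sum_bits = [p[0] ^ c_in, p[1] ^ c1, p[2] ^ c2, p[3] ^ c3]
    return sum_bits, c4, G, P
```

The group signals G and P depend only on the operand bits, so a second lookahead level can combine four such groups in the same way.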
(Weinberger and Smith)
$$G^*_j = G_{i+3} + P_{i+3} G_{i+2} + P_{i+3} P_{i+2} G_{i+1} + P_{i+3} P_{i+2} P_{i+1} G_i$$
$$P^*_j = P_{i+3} P_{i+2} P_{i+1} P_i$$
$$c_{4(j+1)} = G^*_k + P^*_k c_{4j}$$
[Figure: second-level lookahead block combining the P, G signals of groups j..j+3 into P*, G*]
Additional two gate delays.
C16 will take a total of 5 vs. 32 for RCA!
32-bit Carry Lookahead Adder

[Figure: 32-bit carry-lookahead adder, consisting of:
individual adders generating g_i, p_i, and sum S_i;
carry-lookahead blocks of 4 bits generating G_i, P_i, and C_in for the adders;
carry-lookahead super-blocks of 4-bit blocks generating G*_i, P*_i, and C_in for the 4-bit blocks;
a group producing the final carry C_out and C16]
Critical path delay = 1 (for g_i, p_i) + 2 x 2 (for G, P) + 3 x 2 (for C_in) + 1 XOR (for Sum) = approx. 12 gate delays
(Weinberger and Smith: original derivation, 1958)
Carry-Lookahead Adder (Weinberger and Smith): please notice the similarity with Parallel-Prefix Adders!
Motorola: CLA Implementation Example
A. Naini, D. Bearden and W. Anderson, "A 4.5nS 96b CMOS Adder Design", Proceedings of the IEEE Custom Integrated Circuits Conference, May 3-6, 1992.
[Figure: block diagram of Motorola's 64-bit CLA built from PG blocks and carry blocks, with bit signals P0..P63, G0..G63, group signals such as P3:0, G3:0, G15:0, G47:0, G63:0, and carries C0..C64]
Critical path: A, B - G0 - G3:0 - G15:0 - G47:0 - C48 - C60 - C63 - S63
Times annotated in the figure: 1.05nS, 1.7nS, 2.0nS, 2.35nS, 2.7nS, 3.75nS, 4.8nS
Critical path in Motorola's 64-bit CLA
Motorola's 64-bit CLA: conventional PG Block
The carry ripples locally, with 5 transistors in the path.
The situation here is no better: basically, this is MCC performance with carry-skip.
One should not expect results any better than VBA.
Motorola's 64-bit CLA: Modified PG Block
Intermediate propagate signals Pi:0 are generated to speed up C3.
The critical path still resembles MCC.
Motorola's 64-bit CLA
[Figure: block diagram of Motorola's 64-bit CLA with PG blocks and carry blocks; critical path: A, B - G0 - G3:0 - G15:0 - G47:0 - C48 - C60 - C63 - S63; times annotated in the figure: 1.05nS, 1.7nS, 1.8nS, 2.0nS, 2.2nS, 2.35nS, 2.7nS, 2.9nS, 3.2nS, 3.55nS, 3.75nS, 3.9nS, 4.8nS]
Delay Optimized CLA

B. Lee, V. G. Oklobdzija
Journal of VLSI Signal Processing, Vol.3, No.4, October 1991
Delay Optimized CLA: Lee-Oklobdzija 91
(a.) fixed groups and levels
(b.) variable-sized groups, fixed levels
(c.) variable-sized groups and fixed levels
(d.) variable-sized groups and levels
Two-Levels of Logic Implementation of the Carry Block
Two-Levels of Logic Implementation of the Carry-Lookahead Block
Three-Levels of Logic Implementation of the Carry Block (restricted fan-in)
Three-Levels of Logic Implementation of the Carry Lookahead (restricted fan-in)
Delay Optimized CLA: Lee-Oklobdzija 91
[Figures: delay of the two-level BCLA and of the three-level BCLA]
(a.) 2-level BCLA = 8.5nS
(b.) 3-level BCLA = 8.9nS
Ling's Adder
Huey Ling, "High-Speed Binary Adder", IBM Journal of Research and Development, Vol. 25, No. 3, 1981.
Used in: IBM 3033, IBM 168, Amdahl V6, HP, etc.
Ling's Derivations
Define:
$$C_{i+1} = g_i + p_i C_i, \qquad H_{i+1} = C_{i+1} + C_i, \qquad g_i = a_i b_i$$
g_i implies C_{i+1}, which implies H_{i+1}; thus: $g_i = g_i H_{i+1}$
[Figure: one-bit cell with inputs a_i, b_i, carries c_i, c_{i+1}, and sum s_i]

ai  bi  pi  gi  ti
0   0   0   0   0
0   1   1   0   1
1   0   1   0   1
1   1   0   1   1

$$p_i C_{i+1} = p_i g_i + p_i p_i C_i = p_i C_i$$
$$p_i H_{i+1} = p_i C_{i+1} + p_i C_i = p_i C_i$$
$$C_{i+1} = g_i + p_i C_i = g_i H_{i+1} + p_i C_i = g_i H_{i+1} + p_i H_{i+1} = t_i H_{i+1}$$
Ling's Derivations
From:
$$H_{i+1} = C_{i+1} + C_i \quad \text{and} \quad C_{i+1} = g_i + p_i C_i$$
$$H_{i+1} = g_i + p_i C_i + C_i = g_i + C_i$$
because $C_{i+1} = t_i H_{i+1}$:
$$H_{i+1} = g_i + t_{i-1} H_i \quad \text{(fundamental expansion)}$$
Now we need to derive the Sum equation.
Ling Adder
Variation of CLA:
$$p_i = a_i \oplus b_i, \qquad g_i = a_i b_i$$
$$C_{i+1} = g_i + p_i C_i$$
$$S_i = p_i \oplus C_i$$
Ling's equations:
$$t_i = a_i + b_i, \qquad g_i = a_i b_i$$
$$H_{i+1} = g_i + t_{i-1} H_i$$
$$S_i = (t_i \oplus H_{i+1}) + g_i t_{i-1} H_i$$
Ling, IBM J. Res. Dev, 5/81
Ling Adder
Variation of CLA:
$$C_{i+1} = g_i + g_i C_i + p_i C_i = g_i + (g_i + p_i) C_i$$
$$C_{i+1} = g_i + t_i C_i$$
Ling's equation:
$$H_{i+1} = g_i + t_{i-1} H_i$$
Ling uses a different transfer function. Four such functions have the desired properties (Ling's is one of them).
See: Doran, IEEE Trans on Comp. Vol 37, No.9 Sept. 1988.
Ling Adder
Conventional (fan-in of 5):
$$C_4 = g_3 + t_3 g_2 + t_3 t_2 g_1 + t_3 t_2 t_1 g_0 + t_3 t_2 t_1 t_0 C_{in}$$
Ling:
$$H_4 = g_3 + t_2 g_2 + t_2 t_1 g_1 + t_2 t_1 t_0 g_0 + t_2 t_1 t_0 t_{-1} C_{in}$$
which, since $t_i g_i = g_i$, simplifies to (fan-in of 4):
$$H_4 = g_3 + g_2 + t_2 g_1 + t_2 t_1 g_0 + t_2 t_1 t_0 C_{in}$$
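The relation between Ling's pseudo-carry and the conventional carry can be checked exhaustively for a 4-bit group. A minimal sketch follows; the function name `ling_carry4` is hypothetical, and the code uses the simplified fan-in-of-4 form of H4.

```python
def ling_carry4(a_bits, b_bits, c_in=0):
    """Return (H4, C4) for little-endian 4-bit operands.

    g_i = a_i AND b_i (generate), t_i = a_i OR b_i (transfer).
    """
    g = [a & b for a, b in zip(a_bits, b_bits)]
    t = [a | b for a, b in zip(a_bits, b_bits)]
    h4 = (g[3] | g[2] | (t[2] & g[1]) | (t[2] & t[1] & g[0])
          | (t[2] & t[1] & t[0] & c_in))
    c4 = t[3] & h4          # C_{i+1} = t_i H_{i+1}
    return h4, c4

# Compare with the true carries: C4 from a 4-bit add, C3 from a 3-bit add.
for x in range(16):
    for y in range(16):
        for c in (0, 1):
            xb = [(x >> i) & 1 for i in range(4)]
            yb = [(y >> i) & 1 for i in range(4)]
            h4, c4 = ling_carry4(xb, yb, c)
            true_c4 = (x + y + c) >> 4
            true_c3 = ((x & 7) + (y & 7) + c) >> 3
            assert c4 == true_c4
            assert h4 == (true_c4 | true_c3)    # H4 = C4 + C3
```

The t_3 factor is left out of H4 and applied afterwards, which is what reduces the fan-in from 5 to 4.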
Advantages of Ling's Adder
Uniform loading in fan-in and fan-out.
H16 contains 8 terms, as compared to G16, which contains 15.
H16 can be implemented with one level of logic (in ECL), while G16 cannot.
(Ling's adder takes full advantage of wired-OR, which is of special importance when ECL technology is used.)
VLSI Arithmetic
Lecture 5
Review
Lecture 4
Advantages of Ling's Adder
Uniform loading in fan-in and fan-out.
H16 contains 8 terms, as compared to G16, which contains 15.
H16 can be implemented with one level of logic (in ECL), while G16 cannot (with 8-way wire-OR).
(Ling's adder takes full advantage of wired-OR, which is of special importance when ECL technology is used; his IBM limitation was a fan-in of 4 and a wire-OR of 8.)
Ling: Weinberger Notes
Advantage of Ling's Adder
32-bit adder used in: IBM 3033, IBM S370/Model 168, Amdahl V6.
Implements 32-bit addition in 3 levels of logic.
Implements 32-bit AGEN (B + Index + Disp) in 4 levels of logic (rather than 6).
5 levels of logic for the 64-bit adder used in an HP processor.
Implementation of Ling's Adder in CMOS
(S. Naffziger, "A Subnanosecond 64-b Adder", ISSCC 96)
$$H_4 = g_3 + g_2 + t_2 g_1 + t_2 t_1 g_0$$
$$C_{i+1} = t_i H_{i+1}$$
$$C_{16} = p_{15} H_{16} = p_{15}(g_{15} + g_{11} + t_{11} g_7 + t_{11} t_7 g_0)$$
[Figures: circuit schematics from S. Naffziger, ISSCC 96]
Ling Adder Critical Path
Ling Adder: Circuits
[Figure: clocked (CK) dynamic circuits with inputs A0..A3, B0..B3, group signals G0..G4, P1, P2, P4, long-carry signals LC, LCH, LCL, conditional carries C0H, C0L, C1H, C1L, signals K, G, P, and outputs SumH, SumL]
LCS4 Critical Path
[Figure: LCS4 critical path from in1 through 4b, 12b, 32b, and 16b stages, with signals (k,p) or (g,p), P4, G3, G4, C15, C31, C47, and sum outputs S48..S63]
LCS4 Logical Effort Delay
Prefix-4 Ling/Conditional-Sum (Dynamic - Long Carry Path)

Stages          Branch   LE     Parasitic
dg3# (dg3)      4.0      0.98   2.97
g4 (NAND2)      2.0      1.11   1.84
C15# (GG4)      1.0      1.01   1.80
C15 (INV)       1.0      1.00   1.00
C47# (LC)       3.0      1.03   3.32
C47 (INV)       1.0      1.00   1.00
C47#b (INV)     1.0      1.00   1.00
C47b (INV)      1.0      1.00   1.00
S63# (SUM)      16.0     0.86   1.36
S63 (INV)       1.0      1.00   1.00

Total Branch: 3.84E+02
Total LE: 9.73E-01
Path Effort: 3.74E+02
fo, opt: 1.81
Effort Delay: 66 ps
Parasitic Delay: 70 ps
Total Delay: 136 ps (7.2 FO4)
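The totals in the logical-effort table can be reproduced from the per-stage values. The sketch below assumes the path effort is the product of the branching efforts and the logical efforts (an electrical effort of 1), which is consistent with the tabulated numbers; the function names are illustrative.

```python
import math

# Per-stage values copied from the table (10 stages on the long carry path).
branch = [4.0, 2.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 16.0, 1.0]
logical_effort = [0.98, 1.11, 1.01, 1.00, 1.03, 1.00, 1.00, 1.00, 0.86, 1.00]

def path_effort(branch, logical_effort, electrical_effort=1.0):
    """Path effort F = (product of branch) * (product of LE) * H."""
    return math.prod(branch) * math.prod(logical_effort) * electrical_effort

def optimal_stage_effort(effort: float, n_stages: int) -> float:
    """Stage effort that minimizes delay: the N-th root of the path effort."""
    return effort ** (1.0 / n_stages)

F = path_effort(branch, logical_effort)
f_opt = optimal_stage_effort(F, len(branch))
print(f"total branch = {math.prod(branch):.0f}, path effort = {F:.0f}, fo,opt = {f_opt:.2f}")
```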
Results:
0.5u technology
Speed: 0.930 nS
Nominal process, 80C, V = 3.3V
See: S. Naffziger, "A Subnanosecond 64-b Adder", ISSCC 96
Prefix Adders
and
Parallel Prefix Adders
from: Ercegovac-Lang
Prefix Adders
The following recurrence operation is defined:
$$(g, p) \circ (g', p') = (g + p g',\; p p')$$
such that:
$$(G_i, P_i) = \begin{cases} (g_0, p_0) & i = 0 \\ (g_i, p_i) \circ (G_{i-1}, P_{i-1}) & 1 \le i \le n \end{cases}$$
$$c_{i+1} = G_i \quad \text{for } i = 0, 1, \ldots, n$$
$$c_1 = g_0 + p_0 c_{in}, \qquad (g_{-1}, p_{-1}) = (c_{in}, c_{in})$$
This operation is associative, but not commutative.
It can also span a range of bits (overlapping and adjacent).
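The prefix operator and its algebraic properties can be exercised directly. The sketch below evaluates all carries with a recursive-doubling schedule (the combining distance doubles at each level), which is one of several possible parallel-prefix schedules; the function names and the bit-list interface are assumptions for illustration.

```python
from itertools import product

def prefix_op(hi, lo):
    """(g, p) o (g', p') = (g + p g', p p'); hi is the more significant operand."""
    g, p = hi
    g_lo, p_lo = lo
    return g | (p & g_lo), p & p_lo

def is_associative() -> bool:
    pairs = list(product((0, 1), repeat=2))
    return all(prefix_op(prefix_op(x, y), z) == prefix_op(x, prefix_op(y, z))
               for x, y, z in product(pairs, repeat=3))

def is_commutative() -> bool:
    pairs = list(product((0, 1), repeat=2))
    return all(prefix_op(x, y) == prefix_op(y, x)
               for x, y in product(pairs, repeat=2))

def prefix_carries(a_bits, b_bits, c_in=0):
    """Return [c_0, c_1, ..., c_n] for little-endian operands.

    Position 0 holds (g_-1, p_-1) = (c_in, c_in); position i + 1 holds (g_i, p_i).
    """
    nodes = [(c_in, c_in)] + [(a & b, a ^ b) for a, b in zip(a_bits, b_bits)]
    dist = 1
    while dist < len(nodes):
        # Each node absorbs the group that ends dist positions below it.
        nodes = [prefix_op(nodes[i], nodes[i - dist]) if i >= dist else nodes[i]
                 for i in range(len(nodes))]
        dist *= 2
    return [g for g, _ in nodes]
```

Associativity is what allows the groups to be combined in any bracketing, and therefore in a tree of logarithmic depth.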
Parallel Prefix Adders: variety of possibilities
Pyramid Adder:
M. Lehman, "A Comparative Study of Propagation Speed-up Circuits in Binary Arithmetic Units", IFIP Congress, Munich, Germany, 1962.
Hybrid BK-KS Adder
Parallel Prefix Adders: S. Knowles 1999
operation is associative: h>ijk
operation is idempotent: h>ijk
produces carry: cin=0
Parallel Prefix Adders: Ladner-Fischer
Exploits associativity, but not idempotency.
Produces minimal logical depth.
(16,8,4,2,1)
Two wires at each level. Uniform fan-in of two.
Large fan-out (of 16; n/2); large capacitive loading combined with long wires (in the last stages).
Parallel Prefix Adders: Kogge-Stone
Exploits idempotency to limit the fan-out to 1.
Dramatic increase in wires. The wire span remains the same as in Ladner-Fischer.
Buffers are needed in both cases: K-S, L-F.
Kogge-Stone Adder
Parallel Prefix Adders: Brent-Kung
Sets the fan-out to one.
Avoids the explosion of wires (as in K-S).
Makes no sense in CMOS:
the fan-out = 1 limit is arbitrary and extreme;
much of the capacitive load is due to wire (anyway).
It is more efficient to insert buffers in L-F than to use the B-K scheme.
Brent-Kung Adder
Parallel Prefix Adders: Han-Carlson
Is a hybrid synthesis of L-F and K-S.
Trades an increase in logic depth for a reduction in fan-out:
effectively a higher-radix variant of K-S.
Others do it similarly, by serializing the prefix computation at the higher fan-out nodes.
Others similarly trade logical depth for a reduction of fan-out and wire.
Parallel Prefix Adders: variety of possibilities
from: Knowles
bounded by L-F and K-S at ends
Knowles 1999
The following rules are used:
Lateral wires at the jth level span 2^j bits.
Lateral fan-out at the jth level is a power of 2, up to 2^j.
Lateral fan-out at the jth level cannot exceed that at the (j+1)th level.

Knowles 1999
The number of minimal-depth graphs of this type is given in:
[Table: number of minimal-depth graphs]
At 4 bits there are only K-S and L-F; beyond that there are several new possibilities.
Example of a new 32-bit adder [4,4,2,2,1]
[Figure: prefix graph of the 32-bit adder [4,4,2,2,1]]

Knowles 1999
Delay is given in terms of FO4 inverter delay, worst case (the nominal case is 40-50% faster).
K-S is the fastest.
K-S adders are wire limited (requiring 80% more area).
The difference between the examined schemes is less than 15%.
Knowles 1999: Conclusion
Irregular, hybrid schemes are possible.
The speed-up of 15% is achieved at the cost of large wiring, hence area and power.
Circuits close in speed to K-S are available at significantly lower wiring cost.
VLSI Arithmetic
Lecture 6
Review
Lecture 5
Two Parallel Prefix Adder Structures
Kogge-Stone: log(bits) carry stages; extra wiring
Han-Carlson: log(bits) + 1 carry stages; reduced wiring and gates
Possibilities for Further Research
The logical depth is important (Knowles was right).
The fan-out is less important than the fan-in (Knowles was wrong):
It is possible to examine a variety of topologies with restricted and varied fan-in.
Driving strength and Logical Effort rules were overlooked, or at least neglected:
It is possible to create a number of topologies that take LE rules into account.
It is further possible to combine the rules with a compound domino implementation, taking advantage of the two different rules governing dynamic and static gates.
It is still possible to produce a better adder!
Other Types of Adders
Conditional Sum Adder
J. Sklansky, "Conditional-Sum Addition Logic", IRE Transactions on Electronic Computers, EC-9, p. 226-231, 1960.
[Figures: conditional-sum adder, from Ercegovac-Lang]
Carry-Select Adder
O. J. Bedrij, "Carry-Select Adder", IRE Transactions on Electronic Computers, June 1962, p.340-34
O. J. Bedrij, IBM Poughkeepsie, 1962
[Figure: carry-select adder, from Ercegovac-Lang]
Addition under the assumptions Cin = 0 and Cin = 1.
Carry-Select Adder: combining two 32-b VBAs in select mode
Delay = VBA32 + MUX
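The carry-select principle can be sketched behaviorally: each block is added twice, once assuming Cin = 0 and once assuming Cin = 1, and the actual carry selects one of the two results. The sketch uses a plain ripple adder inside each block as a stand-in for whatever block adder is used; the function names and the default block size are assumptions.

```python
def ripple(a_bits, b_bits, c_in):
    """Ripple-carry add of little-endian bit lists; returns (sum_bits, carry_out)."""
    carry, sum_bits = c_in, []
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | ((a ^ b) & carry)
    return sum_bits, carry

def carry_select_add(a_bits, b_bits, c_in=0, block=4):
    """Carry-select addition over fixed-size blocks, LSB block first."""
    sum_bits, carry = [], c_in
    for start in range(0, len(a_bits), block):
        a_blk = a_bits[start:start + block]
        b_blk = b_bits[start:start + block]
        s0, c0 = ripple(a_blk, b_blk, 0)    # result assuming Cin = 0
        s1, c1 = ripple(a_blk, b_blk, 1)    # result assuming Cin = 1
        # The incoming carry acts as the multiplexer select.
        s_sel, carry = (s1, c1) if carry else (s0, c0)
        sum_bits.extend(s_sel)
    return sum_bits, carry
```

In hardware the two candidate results of every block are computed in parallel, so only the multiplexer selection remains on the carry path between blocks.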
