
VLSI Arithmetic

Adders & Multipliers


Prof. Vojin G. Oklobdzija
University of California
http://www.ece.ucdavis.edu/acsel

Introduction
Digital computer arithmetic belongs to computer architecture, but it is also
an aspect of logic design.
The objective of computer arithmetic is to develop algorithms that use the
available hardware as efficiently as possible.
Speed, power and chip area are the most commonly used measures, which ties
the algorithms closely to the technology of implementation.
Oklobdzija 2004

Computer Arithmetic

Basic Operations

Addition
Multiplication
Multiply-Add
Division

Evaluation of Functions
Multi-Media

Addition of Binary Numbers


Full Adder. The full adder is the fundamental building block
of most arithmetic circuits:

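[Figure: full-adder block with inputs ai, bi and Cin, and outputs si and Cout]

The sum and carry outputs are described as:

$$s_i = \bar{a}_i \bar{b}_i c_i + \bar{a}_i b_i \bar{c}_i + a_i \bar{b}_i \bar{c}_i + a_i b_i c_i$$

$$c_{i+1} = \bar{a}_i b_i c_i + a_i \bar{b}_i c_i + a_i b_i \bar{c}_i + a_i b_i c_i = a_i b_i + a_i c_i + b_i c_i$$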
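The sum and carry equations of the full adder can be checked with a short behavioural model. This is only a sketch: the function name `full_adder` is an illustrative choice, and the model says nothing about gate-level structure or timing.

```python
def full_adder(a: int, b: int, c_in: int) -> tuple[int, int]:
    """Return (sum, carry_out) for one-bit inputs a, b and c_in."""
    s = a ^ b ^ c_in                            # 1 when an odd number of inputs is 1
    c_out = (a & b) | (a & c_in) | (b & c_in)   # 1 when at least two inputs are 1
    return s, c_out


# Exhaustive comparison with ordinary integer addition.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            s, c_out = full_adder(a, b, c)
            assert 2 * c_out + s == a + b + c
```

The loop visits the same eight input combinations as the full-adder truth table.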
Full-adder truth table:

  Inputs       | Outputs   |
  ci  ai  bi   | si  ci+1  | Carry behaviour
  0   0   0    | 0   0     |
  0   0   1    | 1   0     | Propagate
  0   1   0    | 1   0     | Propagate
  0   1   1    | 0   1     | Generate
  1   0   0    | 1   0     |
  1   0   1    | 0   1     | Propagate
  1   1   0    | 0   1     | Propagate
  1   1   1    | 1   1     | Generate


Full-Adder Implementation
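Full-adder operation is defined by the equations:

$$s_i = \bar{a}_i \bar{b}_i c_i + \bar{a}_i b_i \bar{c}_i + a_i \bar{b}_i \bar{c}_i + a_i b_i c_i = a_i \oplus b_i \oplus c_i = p_i \oplus c_i$$

$$c_{i+1} = \bar{a}_i b_i c_i + a_i \bar{b}_i c_i + a_i b_i = g_i + p_i c_i$$

with carry-propagate $p_i = a_i \oplus b_i$ and carry-generate $g_i = a_i \cdot b_i$.

A one-bit adder could be implemented as shown.

[Figure: one-bit adder built from p_i and g_i, with inputs a_i, b_i, c_in and outputs s_i, c_out]
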

High-Speed Addition
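$$c_{i+1} = g_i + p_i c_i, \qquad g_i = a_i \cdot b_i, \qquad p_i = a_i \oplus b_i$$

$$s_i = p_i \oplus c_i$$

Because g_i and p_i are never 1 at the same time, the carry output is a 2:1
multiplexer controlled by p_i: c_{i+1} = c_i when p_i = 1, and g_i otherwise.

[Figure: one-bit adder with inputs a, b and c_in, a multiplexer producing c_out, and sum output s]

A one-bit adder could be implemented more efficiently this way, because a MUX is faster.
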

The Ripple-Carry Adder

[Figure: four full adders (FA) in a chain; stage k takes A_k, B_k and carry-in C_{i,k}, and produces S_k and carry-out C_{o,k} (= C_{i,k+1})]

Worst-case delay is linear in the number of bits:

$$t_d = O(N)$$

$$t_{adder} \approx (N-1)\,t_{carry} + t_{sum}$$

Goal: make the fastest possible carry-path circuit.

From Rabaey
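The ripple-carry structure can be sketched behaviourally: each stage computes its propagate and generate signals and hands its carry to the next stage, which is why the worst-case delay grows linearly with N. The helper names `ripple_carry_add`, `to_bits` and `from_bits` are illustrative assumptions, and bit lists are taken to be little-endian.

```python
def ripple_carry_add(a_bits, b_bits, c_in=0):
    """Add two little-endian bit lists of equal length; return (sum_bits, c_out)."""
    assert len(a_bits) == len(b_bits)
    carry = c_in
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        p = a ^ b                   # carry-propagate
        g = a & b                   # carry-generate
        sum_bits.append(p ^ carry)  # s_i = p_i xor c_i
        carry = g | (p & carry)     # c_{i+1} = g_i + p_i c_i, passed to the next stage
    return sum_bits, carry


def to_bits(value, n):
    """Little-endian list of the n low-order bits of value."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))
```

For example, `ripple_carry_add(to_bits(11, 4), to_bits(6, 4))` adds 11 and 6 in four bits; the sum bits encode 1 and the carry-out is 1, since 11 + 6 = 17.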

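Inversion Property

[Figure: two full-adder cells; inverting the inputs A, B and Ci of a full adder inverts its outputs S and Co]

$$\bar{S}(A, B, C_i) = S(\bar{A}, \bar{B}, \bar{C}_i)$$

$$\bar{C}_o(A, B, C_i) = C_o(\bar{A}, \bar{B}, \bar{C}_i)$$

From Rabaey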
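The inversion property is easy to confirm exhaustively. The sketch below uses an assumed behavioural `full_adder` helper (not a circuit from the slides) and checks that complementing all three inputs complements both outputs.

```python
def full_adder(a, b, c_in):
    """Behavioural full adder: returns (sum, carry_out)."""
    s = a ^ b ^ c_in
    c_out = (a & b) | (a & c_in) | (b & c_in)
    return s, c_out


def inversion_property_holds():
    """True if inverting A, B and Ci inverts both S and Co for every input."""
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                s, c_out = full_adder(a, b, c)
                s_inv, c_out_inv = full_adder(1 - a, 1 - b, 1 - c)
                if (s_inv, c_out_inv) != (1 - s, 1 - c_out):
                    return False
    return True
```

This is the property that lets alternate stages of a ripple chain work on inverted carries, so that the inverters can be taken out of the carry path.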

Minimize Critical Path by Reducing Inverting Stages

[Figure: 4-bit ripple-carry adder built from alternating even and odd FA cells, with inputs A0..A3, B0..B3 and Ci,0, sums S0..S3 and carries Co,0..Co,3]

Exploit the inversion property.
Note: two different types of cells are needed.
From Rabaey


Ripple Carry Adder

Carry chain of an RCA implemented using a multiplexer from the standard cell
library (Oklobdzija, ISCAS88):

[Figure: multiplexer-based carry chain for bits i, i+1 and i+2, with inputs a and b, carries c_in, c_i, c_{i+1}, c_out and sums s_i, s_{i+1}, s_{i+2}; the critical path runs along the carry chain]


Manchester Carry-Chain
Realization of the Carry Path
A simple and very popular scheme for implementing the carry signal path.

[Figure: Manchester carry chain between carry-in and carry-out, showing the generate device, the propagate device, and the predischarge & kill device of a stage]


Original Design
T. Kilburn, D. B. G. Edwards, D. Aspinall, "Parallel Addition in Digital Computers:
A New Fast "Carry" Circuit", Proceedings of IEE, Vol. 106, pt. B, p. 464, September 1959.


Manchester Carry Chain (CMOS)


Implement P with pass transistors.
Implement G with pull-up, and kill (delete) with pull-down.
Use dynamic logic to reduce the complexity and speed up the circuit.

[Figure: dynamic Manchester carry chain starting at Ci,0, with pass transistors P0..P4, generate devices G0..G4 and the VDD supply]

Kilburn, et al, IEE Proc, 1959.


Pass-Transistor Realization in DPL

[Figure: DPL gates XOR/XNOR, AND/NAND and OR/NOR, each drawn as a multiplexer followed by a buffer, driven by true and complemented inputs A and B]


Carry-Skip Adder
MacSorley, Proc IRE 1/61
Lehman, Burla, IRE Trans on Comp, 12/61


Carry-Skip Adder
[Figure: 4-bit block of full adders with propagate and generate signals P0..P3 and G0..G3, carry-in Ci,0 and carries Co,0..Co,3, shown without and with a bypass multiplexer at the output]

$$BP = P_0 P_1 P_2 P_3$$

From Rabaey

Idea: if (P0 and P1 and P2 and P3) = 1, then Co,3 = Ci,0; else kill or generate.
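The bypass idea can be sketched as a behavioural block: the ripple chain still computes the sums, while a multiplexer controlled by BP forwards the incoming carry when every bit propagates. The function name `skip_block` and the little-endian bit lists are assumptions made for illustration; the model shows that the bypass never changes the result, only which path produces the carry-out.

```python
def skip_block(a_bits, b_bits, c_in):
    """One carry-skip block. Returns (sum_bits, c_out, bp)."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    g = [a & b for a, b in zip(a_bits, b_bits)]
    carry = c_in
    sum_bits = []
    for p_i, g_i in zip(p, g):
        sum_bits.append(p_i ^ carry)
        carry = g_i | (p_i & carry)   # slow path: ripple through the block
    bp = int(all(p))                  # BP = P0 P1 P2 P3
    c_out = c_in if bp else carry     # bypass multiplexer
    return sum_bits, c_out, bp
```

With a = 1010 and b = 0101 (binary) every bit propagates, so `skip_block([0, 1, 0, 1], [1, 0, 1, 0], 1)` returns BP = 1 and the carry-out equals the carry-in.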


Carry-Skip Adder:
N bits, k bits/group, r = N/k groups

[Figure: carry-skip adder with r groups of k bits between C_in and C_out; each group ripples its carry to produce the sums S and forms a group propagate P with an AND gate, and OR gates merge the skipped and generated carries between groups]

Critical path delay = 2(k-1) + (N/k - 2)


Carry-Skip Adder

[Figure: propagation delay t_p versus number of bits N for the ripple adder and the bypass adder; the curves cross at N = 4..8]

$$t_d = 2(k-1)\,t_{RCA} + \left(\frac{N}{k} - 2\right) t_{SKIP}$$

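The carry-skip delay expression can be explored numerically. The sketch below assumes the worst-case path ripples through k-1 bits in the first group, skips the r-2 middle groups, and ripples through k-1 bits in the last group, with unit values for `t_rca` and `t_skip` unless stated otherwise. The function names are illustrative, and only group sizes that divide N are considered.

```python
def carry_skip_delay(n_bits, k, t_rca=1.0, t_skip=1.0):
    """Worst-case carry delay for r = n_bits / k groups of k bits each."""
    if n_bits % k != 0:
        raise ValueError("k must divide n_bits")
    r = n_bits // k
    return 2 * (k - 1) * t_rca + (r - 2) * t_skip


def best_group_size(n_bits):
    """Group size (a divisor of n_bits, at least two groups) with the smallest delay."""
    candidates = [k for k in range(1, n_bits + 1)
                  if n_bits % k == 0 and n_bits // k >= 2]
    return min(candidates, key=lambda k: carry_skip_delay(n_bits, k))
```

Under these assumptions a 32-bit adder with 4-bit groups has a delay of 12 unit delays, and 4 is the best fixed group size among the divisors of 32.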

Variable Block Adder


(Oklobdzija, Barnes: IBM 1985)


Carry-chain of a 32-bit Variable Block Adder
(Oklobdzija, Barnes: IBM 1985)

[Figure: carry chain from C_in to C_out over bits a_0 b_0 ... a_{N-1} b_{N-1}, split into groups of varying size with group signals G and P; the carry signal path alternates between rippling inside a group and skipping over groups]


Carry-chain of a 32-bit Variable Block Adder
(Oklobdzija, Barnes: IBM 1985)

[Figure: 32-bit carry chain with variable block sizes and its worst-case carry path]

Any-point-to-any-point delay = 9,
as compared to 12 for CSKA
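The benefit of variable block sizes can be illustrated with a simple unit-delay model: rippling through one bit costs 1 and skipping one block costs 1. The function name and the tapered block sizes below are hypothetical choices for illustration, not the sizes from the Oklobdzija-Barnes design, and the model ignores the fan-in and loading effects that a real delay model includes.

```python
def worst_case_carry_delay(block_sizes):
    """Longest carry path between any two points under the unit-delay model."""
    worst = max(size - 1 for size in block_sizes)      # ripple inside a single block
    for i, first in enumerate(block_sizes):
        for j in range(i + 1, len(block_sizes)):
            last = block_sizes[j]
            skipped = j - i - 1                        # blocks bypassed in between
            worst = max(worst, (first - 1) + skipped + (last - 1))
    return worst


uniform = [4] * 8                          # fixed 4-bit groups, 32 bits in total
tapered = [1, 2, 3, 4, 5, 6, 5, 3, 2, 1]   # hypothetical variable sizes, 32 bits in total
```

In this model the uniform arrangement gives 12 and the tapered one gives 9: small blocks at the ends shorten the ripple portions of the long paths, while large blocks in the middle reduce the number of skips.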


Carry-chain block size determination for a


32-bit Variable Block Adder
(Oklobdzija, Barnes: IBM 1985)


Delay Calculation for Variable Block Adder
(Oklobdzija, Barnes: IBM 1985)

[Figure: 4-bit block with signals P0..P3 and G0..G3, carry-in Ci,0, bypass signal BP and carry-out Co,3, annotated with the delay model]


Variable Block Adder
(Oklobdzija, Barnes: IBM 1985)

Variable group length:

$$t_d = c_1 + c_2 \sqrt{N + c_3}$$

Oklobdzija, Barnes, Arith85

Variable block lengths:
No closed-form solution for delay.
It is a dynamic programming problem.


Delay Comparison: Variable Block Adder
(Oklobdzija, Barnes: IBM 1985)

[Figure: delay (0 to 16) versus adder size N (4 to 60) for VBA, CLA and multi-level VBA]


VLSI Arithmetic
Lecture 4
Prof. Vojin G. Oklobdzija
University of California
http://www.ece.ucdavis.edu/acsel

Review
Lecture 3


Delay Comparison: Variable Block Adder

[Figure: delay (0 to 16) versus adder size N (4 to 60); the VBA curve shows a square-root dependency and the CLA curve a logarithmic dependency; the multi-level VBA is also shown]


Circuit Issues
Adder speed cannot be estimated from:
the number of logic gates in the critical path
the number of transistors in the path
the number of logic levels in the path

Estimating adder speed is much more complex, and many of the supposedly
fast schemes can be misleading.


Fan-Out Dependency


Fan-In Dependency
This looks like Logical Effort (1985)

Delay Comparison: Variable Block Adder
(Oklobdzija, Barnes: IBM 1985)


Carry-Lookahead Adder
(Weinberger and Smith, 1958)

ARITH-13: presenting the Achievement Award to Arnold Weinberger of IBM, who
invented the CLA adder in 1958.

Ref: A. Weinberger and J. L. Smith, "A Logic for High-Speed Addition",
National Bureau of Standards, Circ. 591, p. 3-12, 1958.


CLA Definitions: One-bit adder

$$c_{i+1} = g_i + p_i c_i, \qquad g_i = a_i \cdot b_i, \qquad p_i = a_i \oplus b_i$$

$$s_i = p_i \oplus c_i$$

[Figure: one-bit adder cell with inputs a, b and c_in, and outputs s and c_out]


CLA Definitions: 4-bit Adder

[Figure: four one-bit cells for bits i to i+3, each producing g and p, with carries C_i, C_{i+1}, C_{i+2}, C_{i+3} and C_{i+4}]

$$c_{i+1} = \bar{a}_i b_i c_i + a_i \bar{b}_i c_i + a_i b_i = g_i + p_i c_i$$

$$c_{i+2} = g_{i+1} + p_{i+1} c_{i+1} = g_{i+1} + p_{i+1}(g_i + p_i c_i) = g_{i+1} + p_{i+1} g_i + p_{i+1} p_i c_i$$


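Carry-Lookahead Adder: 4 bits

[Figure: four one-bit cells for bits i to i+3, each producing g and p, with carries C_i to C_{i+4}]

$$c_{i+3} = g_{i+2} + p_{i+2} c_{i+2} = g_{i+2} + p_{i+2}(g_{i+1} + p_{i+1} g_i + p_{i+1} p_i c_i)$$
$$\phantom{c_{i+3}} = g_{i+2} + p_{i+2} g_{i+1} + p_{i+2} p_{i+1} g_i + p_{i+2} p_{i+1} p_i c_i$$

$$c_{i+4} = g_{i+3} + p_{i+3} c_{i+3} = g_{i+3} + p_{i+3}(g_{i+2} + p_{i+2} g_{i+1} + p_{i+2} p_{i+1} g_i + p_{i+2} p_{i+1} p_i c_i)$$
$$\phantom{c_{i+4}} = \underbrace{g_{i+3} + p_{i+3} g_{i+2} + p_{i+3} p_{i+2} g_{i+1} + p_{i+3} p_{i+2} p_{i+1} g_i}_{G_j} + \underbrace{p_{i+3} p_{i+2} p_{i+1} p_i}_{P_j}\, c_i$$
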
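The expanded carry equations can be written out directly, so that every carry depends only on the g and p signals and the block carry-in. The following is a behavioural sketch; the name `cla4` and the little-endian bit lists are illustrative assumptions.

```python
def cla4(a_bits, b_bits, c_in):
    """4-bit carry-lookahead block. Returns (sum_bits, c4, group_g, group_p)."""
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    # Every carry is a two-level function of g, p and c_in; none waits for another.
    c1 = g[0] | (p[0] & c_in)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c_in)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c_in)
    group_g = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    group_p = p[3] & p[2] & p[1] & p[0]
    c4 = group_g | (group_p & c_in)
    carries = [c_in, c1, c2, c3]
    sum_bits = [p_i ^ c_i for p_i, c_i in zip(p, carries)]
    return sum_bits, c4, group_g, group_p
```

The `group_g` and `group_p` values correspond to G_j and P_j and are what a second lookahead level would consume.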

Carry-Lookahead Adder

$$G_j = g_{i+3} + p_{i+3} g_{i+2} + p_{i+3} p_{i+2} g_{i+1} + p_{i+3} p_{i+2} p_{i+1} g_i$$

$$P_j = p_{i+3} p_{i+2} p_{i+1} p_i$$

$$c_{4(j+1)} = G_j + P_j c_{4j}$$

One gate delay to calculate p, g.
One to calculate P and two for G.
Three gate delays to calculate C4(j+1).
Compare that to 8 in RCA!

[Figure: 4-bit P, G group block with inputs g and p for bits i to i+3 and C_in, producing carries C_{4j+1}, C_{4j+2}, C_{4j+3} and C_{4(j+1)}]


Carry-Lookahead Adder
(Weinberger and Smith)

$$G^*_j = G_{i+3} + P_{i+3} G_{i+2} + P_{i+3} P_{i+2} G_{i+1} + P_{i+3} P_{i+2} P_{i+1} G_i$$

$$P^*_j = P_{i+3} P_{i+2} P_{i+1} P_i$$

$$c_{4(j+1)} = G^*_j + P^*_j c_{4j}$$

[Figure: second-level lookahead block with inputs G and P for groups j to j+3 and carry C_{4j}, producing G*, P* and carries C_{4j+1}, C_{4j+2}, C_{4j+3} and C_{4(j+1)}]

Additional two gate delays:
C16 will take a total of 5 vs. 32 for RCA!

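A two-level lookahead can be sketched by reusing the same 4-input lookahead logic twice: once on the bit-level g and p signals inside each block, and once on the block-level G and P signals. The names `group_pg`, `lookahead_carries` and `cla16` are illustrative assumptions, and the 16-bit width is chosen to match the C16 example above.

```python
def group_pg(g, p):
    """Group generate and propagate of four (g, p) inputs; index 0 is least significant."""
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P = p[3] & p[2] & p[1] & p[0]
    return G, P


def lookahead_carries(g, p, c_in):
    """Carries out of positions 0..3 of a 4-input lookahead block."""
    c1 = g[0] | (p[0] & c_in)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c_in)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c_in)
    G, P = group_pg(g, p)
    return [c1, c2, c3, G | (P & c_in)]


def cla16(a, b, c_in=0):
    """16-bit addition with two lookahead levels. Returns (sum, c16)."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(16)]
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(16)]
    blocks = [group_pg(g[lo:lo + 4], p[lo:lo + 4]) for lo in range(0, 16, 4)]
    G = [blk[0] for blk in blocks]
    P = [blk[1] for blk in blocks]
    # Second level: carries into each 4-bit block (c0, c4, c8, c12) and c16.
    block_carry = [c_in] + lookahead_carries(G, P, c_in)
    total = 0
    for blk in range(4):
        lo = 4 * blk
        inner = lookahead_carries(g[lo:lo + 4], p[lo:lo + 4], block_carry[blk])
        local = [block_carry[blk]] + inner[:3]      # carry into each bit of the block
        for k in range(4):
            total |= (p[lo + k] ^ local[k]) << (lo + k)
    return total, block_carry[4]
```

The carry into each block comes from the second level rather than from the neighbouring block, which is what removes the ripple between blocks.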

32-bit Carry Lookahead Adder


[Figure: 32-bit carry-lookahead adder built from:
  individual adders generating g_i, p_i and sum S_i
  carry-lookahead blocks of 4 bits generating G_i, P_i and C_in for the adders
  carry-lookahead super-blocks of 4-bit blocks generating G*_i, P*_i and C_in for the 4-bit blocks
  a group producing the final carry C_out and C_16]

Critical path delay = 1 (for g_i, p_i) + 2 x 2 (for G, P) + 3 x 2 (for C_in) + 1 XOR (for Sum) = approx. 12 gate delays


Carry-Lookahead Adder
(Weinberger and Smith: original derivation, 1958 )


Carry-Lookahead Adder (Weinberger and Smith)
Please note the similarity with parallel-prefix adders!


Motorola: CLA Implementation Example
A. Naini, D. Bearden and W. Anderson, "A 4.5nS 96b CMOS Adder Design",
Proceedings of the IEEE Custom Integrated Circuits Conference, May 3-6, 1992.

Critical path in Motorola's 64-bit CLA

[Figure: 64-bit CLA built from PG blocks and carry blocks, producing group signals such as P3:0/G3:0, P15:0/G15:0, P31:0/G31:0, P47:0/G47:0 and P63:0/G63:0, and carries C0 to C64.
Critical path: A, B - G0 - G3:0 - G15:0 - G47:0 - C48 - C60 - C63 - S63.
Annotated times along the path: 1.05nS, 1.7nS, 2.0nS, 2.35nS, 2.7nS, 3.75nS, 4.8nS]

Motorola's 64-bit CLA: Conventional PG Block

[Figure: conventional PG block; the carry ripples locally, with 5 transistors in the path]

No better situation here!
Basically, this is MCC performance with carry-skip.
One should not expect any better results than VBA.


Motorola's 64-bit CLA: Modified PG Block

Intermediate propagate signals Pi:0 are generated to speed up C3;
the critical path still resembles MCC.


Motorola's 64-bit CLA

[Figure: the 64-bit CLA of PG blocks and carry blocks with the critical path A, B - G0 - G3:0 - G15:0 - G47:0 - C48 - C60 - C63 - S63.
Annotated times: 1.8nS, 2.2nS, 2.9nS, 3.2nS, 3.55nS, 3.9nS; the overlaid copy of the figure also repeats 1.05nS, 1.7nS, 2.0nS, 2.35nS, 2.7nS, 3.75nS and 4.8nS from the previous critical-path figure]

Delay Optimized CLA


B. Lee, V. G. Oklobdzija
Journal of VLSI Signal Processing, Vol.3, No.4, October 1991

Delay Optimized CLA: Lee-Oklobdzija 91
(a.) fixed groups and levels
(b.) variable-sized groups, fixed levels
(c.) variable-sized groups and fixed levels
(d.) variable-sized groups and levels


Two-Levels of Logic Implementation of the Carry Block

Two-Levels of Logic Implementation of the Carry-Lookahead Block

Three-Levels of Logic Implementation of the Carry Block (restricted fan-in)

Three-Levels of Logic Implementation of the Carry Lookahead (restricted fan-in)


Delay Optimized CLA: Lee-Oklobdzija 91

[Figures: delay of the two-level BCLA and delay of the three-level BCLA]

Delay Optimized CLA: Lee-Oklobdzija 91
(a.) 2-level BCLA = 8.5nS
(b.) 3-level BCLA = 8.9nS


Ling's Adder
Huey Ling, "High-Speed Binary Adder",
IBM Journal of Research and Development, Vol. 25, No. 3, 1981.
Used in: IBM 3033, IBM 168, Amdahl V6, HP, etc.

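Ling's Derivations

[Figure: bit cell i with inputs a_i and b_i, carries c_i and c_{i+1}, and sum s_i]

  ai  bi | pi  gi  ti
  0   0  | 0   0   0
  0   1  | 1   0   1
  1   0  | 1   0   1
  1   1  | 0   1   1

Define:

$$C_{i+1} = g_i + p_i C_i, \qquad H_{i+1} = C_{i+1} + C_i, \qquad g_i = a_i \cdot b_i, \qquad t_i = a_i + b_i = g_i + p_i$$

g_i implies C_{i+1}, which implies H_{i+1}; thus g_i = g_i H_{i+1}.

$$p_i C_{i+1} = p_i g_i + p_i p_i C_i = p_i C_i$$

$$p_i H_{i+1} = p_i C_{i+1} + p_i C_i = p_i C_i$$

so that p_i C_i = p_i H_{i+1}, and therefore

$$C_{i+1} = g_i + p_i C_i = g_i H_{i+1} + p_i C_i = g_i H_{i+1} + p_i H_{i+1} = t_i H_{i+1}$$
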

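Ling's Derivations

From:

$$C_{i+1} = g_i + p_i C_i \qquad\text{and}\qquad H_{i+1} = C_{i+1} + C_i$$

it follows that

$$H_{i+1} = g_i + p_i C_i + C_i = g_i + C_i$$

and, because

$$C_{i+1} = t_i H_{i+1}$$

holds for every bit position (so C_i = t_{i-1} H_i), the fundamental expansion is:

$$H_{i+1} = g_i + t_{i-1} H_i$$

Now we need to derive the sum equation.
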

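Ling Adder
Variation of CLA:

CLA equations:

$$p_i = a_i \oplus b_i, \qquad g_i = a_i \cdot b_i$$
$$C_{i+1} = g_i + p_i C_i$$
$$S_i = p_i \oplus C_i$$

Ling's equations:

$$t_i = a_i + b_i, \qquad g_i = a_i \cdot b_i$$
$$H_{i+1} = g_i + t_{i-1} H_i$$
$$S_i = (t_i \oplus H_{i+1}) + g_i t_{i-1} H_i$$

Ling, IBM J. Res. Dev, 5/81
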
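Ling's recurrence and sum equation can be checked against ordinary addition with a small behavioural model. The sketch assumes the boundary conditions H_0 = C_in and t_{-1} = 1, so that C_0 = t_{-1} H_0 = C_in; these boundary values and the name `ling_add` are assumptions made for illustration.

```python
def ling_add(a_bits, b_bits, c_in=0):
    """Add little-endian bit lists using Ling's pseudo-carry H. Returns (sum_bits, c_out)."""
    g = [a & b for a, b in zip(a_bits, b_bits)]
    t = [a | b for a, b in zip(a_bits, b_bits)]
    h = c_in       # assumed boundary value H_0 = C_in
    t_prev = 1     # assumed boundary value t_{-1} = 1
    sum_bits = []
    for g_i, t_i in zip(g, t):
        c_i = t_prev & h                              # C_i = t_{i-1} H_i
        h_next = g_i | c_i                            # H_{i+1} = g_i + t_{i-1} H_i
        sum_bits.append((t_i ^ h_next) | (g_i & c_i)) # S_i = (t_i xor H_{i+1}) + g_i t_{i-1} H_i
        h, t_prev = h_next, t_i
    c_out = t_prev & h                                # C_n = t_{n-1} H_n
    return sum_bits, c_out
```

The true carry never has to be formed inside the recurrence itself; it is recovered from H only where it is needed, which is the source of the reduced fan-in in Ling's scheme.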

Ling Adder
Variation of CLA:

$$C_{i+1} = g_i + g_i C_i + p_i C_i = g_i + (g_i + p_i) C_i$$

$$C_{i+1} = g_i + t_i C_i$$

Ling's equation:

$$H_{i+1} = g_i + t_{i-1} H_i$$

Ling uses a different transfer function.
Four of those functions have the desired
properties (Ling's is one of them).

See: Doran, IEEE Trans. on Comp., Vol. 37, No. 9, Sept. 1988.


Ling Adder

Conventional (fan-in of 5):

$$C_4 = g_3 + t_3 g_2 + t_3 t_2 g_1 + t_3 t_2 t_1 g_0 + t_3 t_2 t_1 t_0 C_{in}$$

Ling (fan-in of 4):

$$H_4 = g_3 + t_2 g_2 + t_2 t_1 g_1 + t_2 t_1 t_0 g_0 + t_2 t_1 t_0 t_{-1} C_{in}$$

$$H_4 = g_3 + g_2 + t_2 g_1 + t_2 t_1 g_0 + t_2 t_1 t_0 C_{in}$$


Advantages of Ling's Adder

Uniform loading in fan-in and fan-out.
H16 contains 8 terms, as compared to G16, which contains 15.
H16 can be implemented with one level of logic (in ECL), while G16 cannot.
(Ling's adder takes full advantage of wired-OR, which is of special
importance when ECL technology is used.)


VLSI Arithmetic
Lecture 5
Prof. Vojin G. Oklobdzija
University of California
http://www.ece.ucdavis.edu/acsel

Review
Lecture 4


Advantages of Ling's Adder

Uniform loading in fan-in and fan-out.
H16 contains 8 terms, as compared to G16, which contains 15.
H16 can be implemented with one level of logic (in ECL), while G16 cannot
(with 8-way wire-OR).
(Ling's adder takes full advantage of wired-OR, which is of special
importance when ECL technology is used; his IBM limitation was a fan-in
of 4 and a wire-OR of 8.)


Ling: Weinberger Notes


Advantage of Ling's Adder

32-bit adder used in: IBM 3033, IBM S/370 Model 168, Amdahl V6.
Implements 32-bit addition in 3 levels of logic.
Implements 32-bit AGEN (B + Index + Disp) in 4 levels of logic (rather than 6).
5 levels of logic for the 64-bit adder used in an HP processor.


Implementation of Ling's Adder in CMOS
(S. Naffziger, "A Subnanosecond 64-b Adder", ISSCC 96)

$$H_4 = g_3 + g_2 + t_2 g_1 + t_2 t_1 g_0$$

$$C_{i+1} = t_i H_{i+1}$$

$$C_{16} = p_{15} H_{16} = p_{15}\,(g_{15} + g_{11} + t_{11} g_7 + t_{11} t_7 g_0)$$

[Figures: circuit diagrams from S. Naffziger, ISSCC 96]


Ling Adder Critical Path

Ling Adder: Circuits

[Figure: clocked (CK) dynamic circuits generating G0..G4 and P1, P2, P4 from inputs A0..A3 and B0..B3, together with the long-carry (LC) and sum stages using dual-rail signals C0L/C0H, C1L/C1H, LCL/LCH and SumL/SumH]


LCS4 Critical Path

[Figure: critical path from the inputs through 4b (k,p) or (g,p) generation, G3, P4 and G4, the 12b stage producing C15, the 32b stage producing C31 and C47, and the 16b sum stage producing S48 to S63]


LCS4 Logical Effort Delay
Prefix-4 Ling/Conditional-Sum (Dynamic - Long Carry Path)

  Stage          Branch   LE     Parasitic
  dg3# (dg3)     4.0      0.98   2.97
  g4 (NAND2)     2.0      1.11   1.84
  C15# (GG4)     1.0      1.01   1.80
  C15 (INV)      1.0      1.00   1.00
  C47# (LC)      3.0      1.03   3.32
  C47 (INV)      1.0      1.00   1.00
  C47#b (INV)    1.0      1.00   1.00
  C47b (INV)     1.0      1.00   1.00
  S63# (SUM)     16.0     0.86   1.36
  S63 (INV)      1.0      1.00   1.00

  Total Branch   Total LE   Path Effort   fo, opt
  3.84E+02       9.73E-01   3.74E+02      1.81

  Effort Delay (ps)   Parasitic Delay (ps)   Total Delay (ps)   Total Delay (FO4)
  66                  70                     136                7.2

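The summary figures of the logical-effort table can be reproduced from the per-stage branch and LE values. The sketch assumes that the path effort is the product of the total branching effort and the total logical effort (that is, no additional electrical-effort factor), which matches the 3.74E+02 entry; the variable names are illustrative.

```python
import math

# Per-stage values for the ten stages, in path order.
branch = [4.0, 2.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 16.0, 1.0]
logical_effort = [0.98, 1.11, 1.01, 1.00, 1.03, 1.00, 1.00, 1.00, 0.86, 1.00]

total_branch = math.prod(branch)              # product of the branching efforts
total_le = math.prod(logical_effort)          # product of the logical efforts
path_effort = total_branch * total_le         # assumed: no extra electrical-effort term
stage_effort = path_effort ** (1 / len(branch))   # optimum effort per stage

print(f"{total_branch:.3g} {total_le:.3g} {path_effort:.3g} {stage_effort:.3g}")
```

The optimum stage effort is the tenth root of the path effort because the path has ten stages.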

Results:
0.5 um technology
Speed: 0.930 nS
Nominal process, 80C, V = 3.3V

See: S. Naffziger, "A Subnanosecond 64-b Adder", ISSCC 96


Prefix Adders
and
Parallel Prefix Adders

from: Ercegovac-Lang

Prefix Adders

The following recurrence operation is defined:

$$(g, p) \circ (g', p') = (g + p\,g',\; p\,p')$$

such that:

$$(G_i, P_i) = \begin{cases} (g_0, p_0) & i = 0 \\ (g_i, p_i) \circ (G_{i-1}, P_{i-1}) & 1 \le i \le n \end{cases}$$

$$c_{i+1} = G_i \qquad \text{for } i = 0, 1, \ldots, n$$

$$c_1 = g_0 + p_0 c_{in}, \qquad (g_{-1}, p_{-1}) = (c_{in}, c_{in})$$

This operation is associative, but not commutative.
It can also span a range of bits (overlapping and adjacent).

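The prefix operator and its two evaluation orders can be sketched as follows. The serial version applies the recurrence bit by bit; the parallel version combines spans of 1, 2, 4, ... positions per level, in the style of a Kogge-Stone network. Both assume c_in = 0, and the function names are illustrative assumptions.

```python
def prefix_op(left, right):
    """(g, p) o (g', p') = (g + p g', p p'); left is the more significant operand."""
    g, p = left
    g2, p2 = right
    return g | (p & g2), p & p2


def prefix_carries_serial(a_bits, b_bits):
    """Carries c_1..c_n (c_{i+1} = G_i), computed by applying the recurrence in order."""
    pairs = [(a & b, a ^ b) for a, b in zip(a_bits, b_bits)]
    acc = pairs[0]
    carries = [acc[0]]
    for pair in pairs[1:]:
        acc = prefix_op(pair, acc)
        carries.append(acc[0])
    return carries


def prefix_carries_parallel(a_bits, b_bits):
    """Same carries in ceil(log2 n) levels, doubling the combined span at each level."""
    level = [(a & b, a ^ b) for a, b in zip(a_bits, b_bits)]
    span = 1
    while span < len(level):
        level = [prefix_op(level[i], level[i - span]) if i >= span else level[i]
                 for i in range(len(level))]
        span *= 2
    return [g for g, _ in level]
```

The parallel version is valid because the operator is associative: the grouping of the operands can change as long as their order does not.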

Parallel Prefix Adders: variety of possibilities
from: Ercegovac-Lang

Pyramid Adder:
M. Lehman, "A Comparative Study of Propagation Speed-up Circuits in Binary Arithmetic
Units", IFIP Congress, Munich, Germany, 1962.

Hybrid BK-KS Adder


Parallel Prefix Adders: S. Knowles 1999

operation is associative: h>ijk

operation is idempotent: h>ijk

produces carry: cin=0

Oklobdzija 2004

Computer Arithmetic

118

Parallel Prefix Adders: Ladner-Fischer

Exploits associativity, but not idempotency.
Produces minimal logical depth.

Parallel Prefix Adders: Ladner-Fischer
(16,8,4,2,1)

Two wires at each level. Uniform fan-in of two.
Large fan-out (of 16; n/2); large capacitive loading
combined with the long wires (in the last stages).


Parallel Prefix Adders: Kogge-Stone

Exploits idempotency to limit the fan-out to 1.
Dramatic increase in wires. The wire span remains the same as in Ladner-Fischer.
Buffers are needed in both cases: K-S, L-F.

Kogge-Stone Adder


Parallel Prefix Adders: Brent-Kung

Sets the fan-out to one.
Avoids the explosion of wires (as in K-S).
Makes no sense in CMOS:
  the fan-out = 1 limit is arbitrary and extreme
  much of the capacitive load is due to wire anyway

It is more efficient to insert buffers in L-F than to use the B-K scheme.

Brent-Kung Adder


Parallel Prefix Adders: Han-Carlson

A hybrid synthesis of L-F and K-S.
Trades an increase in logic depth for a reduction in fan-out:
  effectively a higher-radix variant of K-S.
Others similarly trade logical depth for a reduction of fan-out and wire,
by serializing the prefix computation at the higher fan-out nodes.


Parallel Prefix Adders:


variety of possibilities
from: Knowles

bounded by L-F and K-S at ends

Oklobdzija 2004

Computer Arithmetic

126

Parallel Prefix Adders: variety of possibilities
Knowles 1999

The following rules are used:
Lateral wires at the jth level span 2^j bits.
Lateral fan-out at the jth level is a power of 2, up to 2^j.
Lateral fan-out at the jth level cannot exceed that at the (j+1)th level.


Parallel Prefix Adders: variety of possibilities
Knowles 1999

The number of minimal-depth graphs of this type is given in: [table not recovered]
At 4 bits there are only K-S and L-F; afterwards there are several new possibilities.

Example of a new 32-bit adder [4,4,2,2,1]


Parallel Prefix Adders: variety of possibilities
Knowles 1999

Delay is given in terms of FO4 inverter delay, worst case
(the nominal case is 40-50% faster).
K-S is the fastest.
K-S adders are wire limited (requiring 80% more area).
The difference between the examined schemes is less than 15%.

Conclusion
Irregular, hybrid schemes are possible.
The speed-up of 15% is achieved at the cost of large wiring, hence area and power.
Circuits close in speed to K-S are available at significantly lower wiring cost.


VLSI Arithmetic
Lecture 6
Prof. Vojin G. Oklobdzija
University of California
http://www.ece.ucdavis.edu/acsel

Review
Lecture 5


Two Parallel Prefix Adder Structures

Kogge-Stone: log(bits) carry stages; extra wiring.
Han-Carlson: log(bits) + 1 carry stages; reduced wiring and gates.



Possibilities for Further Research

The logical depth is important (Knowles was right).
The fan-out is less important than fan-in (Knowles was wrong):
  It is possible to examine a variety of topologies with restricted and varied fan-in.

Driving strength and Logical Effort rules were overlooked, or at least neglected:
  It is possible to create a number of topologies taking LE rules into account.
  It is further possible to combine the rules with a compound domino
  implementation, taking advantage of the two different rules governing
  dynamic and static logic.

It is still possible to produce a better adder!


Other Types of Adders


Conditional Sum Adder
J. Sklansky, "Conditional-Sum Addition Logic",
IRE Transactions on Electronic Computers, EC-9, p. 226-231, 1960.

[Figures: conditional sum adder, from: Ercegovac-Lang]


Carry-Select Adder
O. J. Bedrij, "Carry-Select Adder", IRE
Transactions on Electronic Computers, June
1962, p.340-34

[Figure: carry-select adder, from: Ercegovac-Lang]

Carry-Select Adder
Addition is performed under both assumptions, Cin = 0 and Cin = 1.
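The carry-select principle can be sketched behaviourally: every block is added twice, once for each possible carry-in, and the real carry then only drives a multiplexer. The function names, the little-endian bit lists and the use of a plain ripple adder inside each block are assumptions made for illustration.

```python
def ripple(a_bits, b_bits, c_in):
    """Plain ripple-carry addition of little-endian bit lists."""
    carry, sum_bits = c_in, []
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | ((a ^ b) & carry)
    return sum_bits, carry


def carry_select_add(a_bits, b_bits, c_in=0, block=4):
    """Add block by block, precomputing each block for carry-in 0 and carry-in 1."""
    sum_bits, carry = [], c_in
    for lo in range(0, len(a_bits), block):
        a_blk, b_blk = a_bits[lo:lo + block], b_bits[lo:lo + block]
        s0, c0 = ripple(a_blk, b_blk, 0)    # result assuming Cin = 0
        s1, c1 = ripple(a_blk, b_blk, 1)    # result assuming Cin = 1
        # The incoming carry selects one of the two precomputed results.
        block_sum, carry = (s1, c1) if carry else (s0, c0)
        sum_bits.extend(block_sum)
    return sum_bits, carry
```

In hardware the two block additions run in parallel, so only the multiplexer delay is added per block along the carry path.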


Carry-Select Adder:
combining two 32-b VBAs in select mode

Delay = VBA32 + MUX

Carry-Select Adder
O.J. Bedrij, IBM Poughkeepsie, 1962

