Sei sulla pagina 1di 95

Part III

The Arithmetic/Logic Unit

Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 1

About This Presentation


This presentation is intended to support the use of the textbook
Computer Architecture: From Microprocessors to Supercomputers,
Oxford University Press, 2005, ISBN 0-19-515455-X. It is updated
regularly by the author as part of his teaching of the upper-division
course ECE 154, Introduction to Computer Architecture, at the
University of California, Santa Barbara. Instructors can use these
slides freely in classroom teaching and for other educational
purposes. Any other use is strictly prohibited. Behrooz Parhami
Edition

Released

Revised

Revised

Revised

Revised

First

July 2003

July 2004

July 2005

Mar. 2006

Jan. 2007

Jan. 2008

Jan. 2009

Jan. 2011

Oct. 2014

Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 2

III The Arithmetic/Logic Unit


Overview of computer arithmetic and ALU design:
Review representation methods for signed integers
Discuss algorithms & hardware for arithmetic ops
Consider floating-point representation & arithmetic
Topics in This Part
Chapter 9

Number Representation

Chapter 10 Adders and Simple ALUs


Chapter 11 Multipliers and Dividers
Chapter 12 Floating-Point Arithmetic
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 3

Preview of Arithmetic Unit in the Data Path


Incr PC

Next addr
jta

Next PC

ALUOvfl

(PC)

PC

Instr
cache

rs
rt

inst

0
1
2

rd
31
imm
op

Br&Jump

Instruction fetch

Fig. 13.3
Oct. 2014

Register
writeback

(rs)
Ovfl

Reg
file

ALU
(rt)

/
16

ALU
out

Data
cache

Data
out

Data
in

Func

0
32
SE / 1

Data
addr

0
1
2

Register input

fn

RegDst
RegWrite

Reg access / decode

ALUSrc
ALUFunc

ALU operation

DataRead
RegInSrc
DataWrite

Data access

Key elements of the single-cycle MicroMIPS data path.


Computer Architecture, The Arithmetic/Logic Unit

Slide 4

Computer Arithmetic as a Topic of Study


Brief overview article
Encyclopedia of Info Systems,
Academic Press, 2002,
Vol. 3, pp. 317-333
Our textbooks treatment
of the topic falls between
the extremes (4 chaps.)
Graduate course
ECE 252B Text:
Computer Arithmetic,
Oxford U Press, 2000
(2nd ed., 2010)
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 5

9 Number Representation
Arguably the most important topic in computer arithmetic:
Affects system compatibility and ease of arithmetic
Twos complement, flp, and unconventional methods
Topics in This Chapter
9.1 Positional Number Systems
9.2 Digit Sets and Encodings
9.3 Number-Radix Conversion
9.4 Signed Integers
9.5 Fixed-Point Numbers
9.6 Floating-Point Numbers
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 6

9.1 Positional Number Systems


Representations of natural numbers {0, 1, 2, 3, }
||||| ||||| ||||| ||||| ||||| ||
27
11011
XXVII

sticks or unary code


radix-10 or decimal code
radix-2 or binary code
Roman numerals

Fixed-radix positional representation with k digits


k1

Value of a number: x = (xk1xk2 . . . x1x0)r = xi r i


i=0

For example:
27 = (11011)two = (124) + (123) + (022) + (121) + (120)
Number of digits for [0, P]: k = logr (P + 1) = logr P + 1
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 7

Unsigned Binary Integers


0000

1111
1110

15

0001
0010

1
2

14
1101

0011
3

13
1100

12

1011

Turn x notches
counterclockwise
to add x

Inside: Natural number


Outside: 4-bit encoding

11

1010

0101

12
11
10

15
0

2
3
4
5

9 8 7

6
9

1001

0100

4
5

10

14
13

8
1000

0110

7
0111

Turn y notches
clockwise
to subtract y

Figure 9.1 Schematic representation of 4-bit code for


integers in [0, 15].
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 8

Representation Range and Overflow

Overflow region max

max

Numbers smaller
than max

Overflow region
Numbers larger
than max

Finite set of representable numbers

Figure 9.2 Overflow regions in finite number representation systems.


For unsigned representations covered in this section, max = 0.
Example 9.2, Part d
Discuss if overflow will occur when computing 317 316 in a number
system with k = 8 digits in radix r = 10.
Solution
The result 86 093 442 is representable in the number system which
has a range [0, 99 999 999]; however, if 317 is computed en route to
the final result, overflow will occur.
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 9

9.2 Digit Sets and Encodings


Conventional and unconventional digit sets
Decimal digits in [0, 9]; 4-bit BCD, 8-bit ASCII
Hexadecimal, or hex for short: digits 0-9 & a-f
Conventional ternary digit set in [0, 2]
Conventional digit set for radix r is [0, r 1]
Symmetric ternary digit set in [1, 1]
Conventional binary digit set in [0, 1]
Redundant digit set [0, 2], encoded in 2 bits
( 0 2 1 1 0 )two and ( 1 0 1 0 2 )two represent 22
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 10

Carry-Save Numbers
Radix-2 numbers using the digits 0, 1, and 2
Example: (1 0 2 1)two = (123) + (022) + (221) + (120) = 13
Possible encodings
(a) Binary

(b) Unary

0
1
2

0
1
1
2

MSB
LSB
Oct. 2014

00
01
10
11 (Unused)
1 0 2 1
0 0 1 0 = 2
1 0 0 1 = 9

00
01 (First alternate)
10 (Second alternate)
11

First bit
Second bit

Computer Architecture, The Arithmetic/Logic Unit

1 0 2 1
0 0 1 1 = 3
1 0 1 0 = 10
Slide 11

The Notion of Carry-Save Addition


Digit-set combination: {0, 1, 2} + {0, 1} = {0, 1, 2, 3} = {0, 2} + {0, 1}
This bit
being 1
represents
overflow
(ignore it)

Carry-save
input

Carry-save
addition

Two
carry-save
inputs

Binary input
Carry-save
0
output

Carry-save
addition

a. Carry-save addition.

b. Adding two carry-save numbers.

Figure 9.3 Adding a binary number or another


carry-save number to a carry-save number.
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 12

9.3 Number Radix Conversion


Two ways to convert numbers from an old radix r to a new radix R

Perform arithmetic in the new radix R


Suitable for conversion from radix r to radix 10
Horners rule:
(xk1xk2 . . . x1x0)r = (((0 + xk1)r + xk2)r + . . . + x1)r + x0
(1 0 1 1 0 1 0 1)two = 0 + 1 1 2 + 0 2 2 + 1 5 2 + 1
11 2 + 0 22 2 + 1 45 2 + 0 90 2 + 1 181

Perform arithmetic in the old radix r


Suitable for conversion from radix 10 to radix R
Divide the number by R, use the remainder as the LSD
and the quotient to repeat the process
19 / 3 rem 1, quo 6 / 3 rem 0, quo 2 / 3 rem 2, quo 0
Thus, 19 = (2 0 1)three
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 13

Justifications for Radix Conversion Rules

( xk 1 xk 2 x0 ) r xk 1r

k 1

xk 2 r

k 2

x1r x0

x0 r ( x1 r ( x2 r ()))
Justifying Horners rule.

x
Binary representation of x/2

Figure 9.4
Oct. 2014

x mod 2

Justifying one step of the conversion of x to radix 2.


Computer Architecture, The Arithmetic/Logic Unit

Slide 14

9.4 Signed Integers


We dealt with representing the natural numbers
Signed or directed whole numbers = integers
{ . . . , 3, 2, 1, 0, 1, 2, 3, . . . }

Signed-magnitude representation
+27 in 8-bit signed-magnitude binary code 0 0011011
27 in 8-bit signed-magnitude binary code 1 0011011
27 in 2-digit decimal code with BCD digits 1 0010 0111

Biased representation
Represent the interval of numbers [N, P] by the unsigned
interval [0, P + N]; i.e., by adding N to every number
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 15

Twos-Complement Representation
With k bits, numbers in the range [2k1, 2k1 1] represented.
Negation is performed by inverting all bits and adding 1.
0000

1111
1

1110

+0

0001
+1

0010
+2

2
1101

0011
3

1011

1100

Turn x notches
counterclockwise
to add x

+3
+4
+5

5
6

1001

0100

4
5
6

0101

1
0

2
3
4
5

7 8 7

+6
7

1010

2
3

8
1000

+7

0110
0111

Turn 16 y notches
counterclockwise to
add y (subtract y)

Figure 9.5 Schematic representation of 4-bit 2s-complement


code for integers in [8, +7].
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 16

Conversion from 2s-Complement to Decimal


Example 9.7
Convert x = (1 0 1 1 0 1 0 1)2s-compl to decimal.
Solution
Given that x is negative, one could change its sign and evaluate x.
Shortcut: Use Horners rule, but take the MSB as negative
1 2 + 0 2 2 + 1 3 2 + 1 5 2 + 0 10 2 + 1
19 2 + 0 38 2 + 1 75

Sign Change for a 2s-Complement Number


Example 9.8
Given y = (1 0 1 1 0 1 0 1)2s-compl, find the representation of y.
Solution
y = (0 1 0 0 1 0 1 0) + 1 = (0 1 0 0 1 0 1 1)2s-compl
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

(i.e., 75)
Slide 17

Twos-Complement Addition and Subtraction

k
/

c in
Adder

k
/

k
/

k
/

xy

c out

y or
y
AddSub
Figure 9.6
Oct. 2014

Binary adder used as 2s-complement adder/subtractor.


Computer Architecture, The Arithmetic/Logic Unit

Slide 18

9.5 Fixed-Point Numbers


Positional representation: k whole and l fractional digits
Value of a number: x = (xk1xk2 . . . x1x0 . x1x2 . . . xl )r = xi r i
For example:
2.375 = (10.011)two = (121) + (020) + (021) + (122) + (123)
Numbers in the range [0, rk ulp] representable, where ulp = r l
Fixed-point arithmetic same as integer arithmetic
(radix point implied, not explicit)
Twos complement properties (including sign change) hold here as well:
(01.011)2s-compl = (021) + (120) + (021) + (122) + (123) = +1.375
(11.011)2s-compl = (121) + (120) + (021) + (122) + (123) = 0.625
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 19

Fixed-Point 2s-Complement Numbers


1.111
.125

1.110

0.000
+0

.25

0.001
+.125

0.010
+.25

1.101

0.011
.375

1.011

.5

1.100

+.5

0.100

+.625

.625

0.101
+.75

.75
1.010

+.375

.875
1.001

1
1.000

+.875

0.110

0.111

Figure 9.7 Schematic representation of 4-bit 2s-complement


encoding for (1 + 3)-bit fixed-point numbers in the range [1, +7/8].
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 20

Radix Conversion for Fixed-Point Numbers


Convert the whole and fractional parts separately.
To convert the fractional part from an old radix r to a new radix R:

Perform arithmetic in the new radix R


Evaluate a polynomial in r 1: (.011)two = 0 21 + 1 22 + 1 23
Simpler: View the fractional part as integer, convert, divide by r l
(.011)two = (?)ten
Multiply by 8 to make the number an integer: (011)two = (3)ten
Thus, (.011)two = (3 / 8)ten = (.375)ten

Perform arithmetic in the old radix r


Multiply the given fraction by R, use the whole part as the MSD
and the fractional part to repeat the process
(.72)ten = (?)two
0.72 2 = 1.44, so the answer begins with 0.1
0.44 2 = 0.88, so the answer begins with 0.10
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 21

9.6 Floating-Point Numbers


Useful for applications where very large and very small
numbers are needed simultaneously
Fixed-point representation must sacrifice precision
for small values to represent large values
x = (0000 0000 . 0000 1001)two
Small number
y = (1001 0000 . 0000 0000)two
Large number
Neither y2 nor y / x is representable in the format above
Floating-point representation is like scientific notation:
20 000 000 = 2 10 7
+0.000 000 007 = +7 109
Sign
Oct. 2014

Significand

Exponent base

Exponent

Computer Architecture, The Arithmetic/Logic Unit

Also, 7E9

Slide 22

ANSI/IEEE Standard Floating-Point Format (IEEE 754)


Revision (IEEE 754R) was completed in 2008: The revised version includes
16-bit and 128-bit binary formats, as well as 64- and 128-bit decimal formats

Short (32-bit) format


8 bits,
bias = 127,
126 to 127

23 bits for fractional part


(plus hidden 1 in integer part)

Sign Exponent
11 bits,
bias = 1023,
1022 to 1023

Short exponent range is 127 to 128


but the two extreme values
are reserved for special operands
(similarly for the long format)

Significand
52 bits for fractional part
(plus hidden 1 in integer part)

Long (64-bit) format

Figure 9.8
Oct. 2014

The two ANSI/IEEE standard floating-point formats.


Computer Architecture, The Arithmetic/Logic Unit

Slide 23

Short and Long IEEE 754 Formats: Features


Table 9.1 Some features of ANSI/IEEE standard floating-point formats
Feature
Word width in bits
Significand in bits
Significand range
Exponent bits
Exponent bias
Zero (0)
Denormal

Single/Short
32
23 + 1 hidden
[1, 2 223]
8
127
e + bias = 0, f = 0
e + bias = 0, f 0
represents 0.f 2126
e + bias = 255, f = 0
e + bias = 255, f 0
e + bias [1, 254]
e [126, 127]
represents 1.f 2e

Double/Long
64
52 + 1 hidden
[1, 2 252]
11
1023
e + bias = 0, f = 0
e + bias = 0, f 0
represents 0.f 21022
e + bias = 2047, f = 0
e + bias = 2047, f 0
e + bias [1, 2046]
e [1022, 1023]
represents 1.f 2e

min

2126 1.2 1038

21022 2.2 10308

max

2128 3.4 1038

21024 1.8 10308

Subnormal

Infinity ()
Not-a-number (NaN)
Ordinary number

Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 24

10 Adders and Simple ALUs


Addition is the most important arith operation in computers:
Even the simplest computers must have an adder
An adder, plus a little extra logic, forms a simple ALU
Topics in This Chapter
10.1 Simple Adders
10.2 Carry Propagation Networks
10.3 Counting and Incrementation
10.4 Design of Fast Adders
10.5 Logic and Shift Operations
10.6 Multifunction ALUs
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 25

10.1 Simple Adders


Inputs
x

Outputs

s
c

0
0
1
1

0
1
0
1

0
0
0
1

Inputs

Digit-set interpretation:
{0, 1} + {0, 1}
= {0, 2} + {0, 1}

HA

0
1
1
0

Outputs

cin

cout

0
0
0
0
1
1
1
1

0
0
1
1
0
0
1
1

0
1
0
1
0
1
0
1

0
0
0
1
0
1
1
1

0
1
1
0
1
0
0
1

Figures 10.1/10.2
Oct. 2014

cout

FA

cin

Digit-set interpretation:
{0, 1} + {0, 1} + {0, 1}
= {0, 2} + {0, 1}

Binary half-adder (HA) and full-adder (FA).

Computer Architecture, The Arithmetic/Logic Unit

Slide 26

Full-Adder Implementations
x
y

HA

c out
HA

c in

x
y
c out

s
(a) FA built of two HAs
x
y

c out

0
1
2
3

0
1
2
3

c in

s
(b) CMOS mux-based FA

c in
s
(c) Two-level AND-OR FA

Figure10.3 Full adder implemented with two half-adders, by means


of two 4-input multiplexers, and as two-level gate network.
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 27

Ripple-Carry Adder: Slow But Simple

x31
c32
cout

y31

FA

Oct. 2014

c31

. . .

c2

y1

FA

x0
c1

y0

FA

Critical path
s31

Figure 10.4

x1

s1

c0
cin

s0

Ripple-carry binary adder with 32-bit inputs and output.

Computer Architecture, The Arithmetic/Logic Unit

Slide 28

Carry Chains and Auxiliary Signals


Bit positions

15 14 13 12
----------1 0 1 1

11 10 9 8
----------0 1 1 0

7 6 5 4
----------0 1 1 0

cout 0 1 0 1
1 0 0 1
1 1
\__________/\__________________/
4
6

g = xy

Oct. 2014

Carry chains and their lengths

3 2 1 0
----------1 1 1 0

0
0 0 1 1 cin
\________/\____/
3
2

Computer Architecture, The Arithmetic/Logic Unit

p=xy

Slide 29

Carry Chains Illustrated with Dominoes

Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 30

10.2 Carry Propagation Networks


gi pi

Carry is:

0
0
1
1

annihilated or killed
propagated
generated
(impossible)

0
1
0
1

g k1 p k1

xi

g k2 p k2

yi

gi = xi yi
pi = xi yi
g i+1 p i+1

gi

pi

...

...

g1 p1

g0 p0

c0

Carry network
ck

c k1

...
c k2

ci
c i+1

...
c1

c0

si

Figure 10.5 The main part of an adder is the carry network. The rest
is just a set of gates to produce the g and p signals and the sum bits.
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 31

Ripple-Carry Adder Revisited


The carry recurrence: ci+1 = gi pi ci
Latency of k-bit adder is roughly 2k gate delays:
1 gate delay for production of p and g signals, plus
2(k 1) gate delays for carry propagation, plus
1 XOR gate delay for generation of the sum bits
gk1 pk1

gk2 pk2

g1

p1

...

ck

ck1

Figure 10.6
Oct. 2014

ck2

c2

c1

g0

p0

c0

The carry propagation network of a ripple-carry adder.


Computer Architecture, The Arithmetic/Logic Unit

Slide 32

The Complete Design of a Ripple-Carry Adder


gi pi

Carry is:

0
0
1
1

annihilated or killed
propagated
generated
(impossible)

0
1
0
1

g k1 p k1
gk1 pk1

ck

ck

xi

g k2 p k2

c k1

gi = xi yi
pi = xi yi
g i+1 p i+1

gi

pi

...

...
gk2 pk2

Carry network

ck2

...
c k2

g1 p1
g1

.
ck1

yi

ci
c i+1

p1

.
c2

c1

...
c1

g0 p0
g0

c0

p0

c0

c0

si

Figure 10.6 (ripple-carry network) superimposed on


Figure 10.5 (general structure of an adder).
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 33

First Carry Speed-Up Method: Carry Skip


g4j+3 p4j+3

c4j+4

c4j+3

g4j+2 p4j+2

c4j+2

g4j+1

p4j+1

c4j+1

g4j

p4j

c4j

One-way street

Freeway

Figures 10.7/10.8
A 4-bit section of a ripple-carry network
with skip paths and the driving analogy.
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 34

Mux-Based Skip Carry Logic


Fig. 10.7

g4j+3 p4j+3

c4j+4

g4j+2 p4j+2

c4j+3

g4j+3 p4j+3

c4j+4

0
1c4j+4

p[4j, 4j+3]

c4j+3

g4j+1

c4j+2

g4j+2 p4j+2

p4j+1

g4j

p4j

c4j+1

g4j+1

c4j+2

p4j+1

c4j

g4j

p4j

c4j+1

c4j

The carry-skip adder of Fig. 10.7 works fine if we begin with a


clean slate, where all signals are 0s; otherwise, it will run into
problems, which do not exist in this mux-based implementation
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 35

10.3 Counting and Incrementation


k
/
k
/

Data in
IncrInit

0
1

k
/

Count
register
D

k
/

c in

_
C

Adder /
k

Update
a
(Increment
amount)

Figure 10.9
Oct. 2014

c out

Schematic diagram of an initializable synchronous counter.


Computer Architecture, The Arithmetic/Logic Unit

Slide 36

Circuit for Incrementation by 1


Figure 10.6

Substantially
simpler than
an adder

gk1 pk1

ck

xk1

ck

0g1

gk2 pk2

sk1

x1

...
ck1

ck2

c2

xk2

ck1

p1

ck2

sk2

c2

s2

p0

x0

c1

s1

s0

Figure 10.10 Carry propagation network


and sum logic for an incrementer.
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

x0

c0

c1

x1

...

0g0

Slide 37

10.4 Design of Fast Adders


Carries can be computed directly without propagation
For example, by unrolling the equation for c3, we get:
c3 = g2 p2 c2 = g2 p2 g1 p2 p1 g0 p2 p1 p0 c0

We define generate and propagate signals for a block


extending from bit position a to bit position b as follows:
g[a,b] = gb pb gb1 pb pb1 gb2 . . . pb pb1 pa+1 ga
p[a,b] = pb pb1 . . . pa+1 pa

Combining g and p signals for adjacent blocks:


g[h,j] = g[i+1,j] p[i+1,j] g[h,i]
p[h,j] = p[i+1,j] p[h,i]

i+1 i

[h, j] = [i + 1, j] [h, i]
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 38

Carries as Generate Signals for Blocks [ 0, i ]


gi pi

Carry is:

0
0
1
1

annihilated or killed
propagated
generated
(impossible)

0
1
0
1

g k1 p k1

xi

g k2 p k2

yi

Assuming c0 = 0,
we have ci = g [0,i 1]
g i+1 p i+1

gi

pi

...

...

g1 p1

g0 p0

c0

Carry network
ck

c k1

Figure 10.5
Oct. 2014

...
c k2

ci
c i+1

...
c1

c0

si
Computer Architecture, The Arithmetic/Logic Unit

Slide 39

Second Carry Speed-Up Method: Carry Lookahead


[7, 7 ]

[6, 6 ]

[5, 5 ]

[4, 4 ] [3, 3 ]

[2, 2 ]

[1, 1 ]

[6, 7 ]

[0, 0 ]

g [1, 1] p [1, 1]
g [0, 0]
p [0, 0]

[2, 3 ]
[4, 5 ]

[0, 1 ]

[4, 7 ]
[0, 3 ]

[0, 7 ]

[0, 6 ]

[0, 5 ]

[0, 4 ] [0, 3 ]

[0, 2 ]

g [0, 1] p [0, 1]
[0, 1 ]

[0, 0 ]

Figure 10.11 Brent-Kung lookahead carry network for an 8-digit adder,


along with details of one of the carry operator blocks.
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 40

Recursive Structure of Brent-Kung Carry Network


[7, 7]

[6, 6]

[5, 5]

[4, 4]

[3, 3]

[2, 2]

[1, 1]

[0, 0]
[7, 7 ]

[6, 6 ]

[5, 5 ]

[4, 4 ] [3, 3 ]

[2, 2 ]

[6, 7 ]

[2, 3 ]
[0, 1 ]

[4, 7 ]
[0, 3 ]

[0, 6]

[0, 5]

[0, 4]

[0, 7 ]

[0, 7]

[0, 0 ]

[4, 5 ]

4-input Brent-Kung
carry network

[1, 1 ]

[0, 3]

[0, 2]

[0, 1]

[0, 6 ]

[0, 5 ]

[0, 4 ] [0, 3 ]

[0, 2 ]

[0, 1 ]

[0, 0 ]

[0, 0]

Figure 10.12 Brent-Kung lookahead carry network for an 8-digit adder,


with only its top and bottom rows of carry-operators shown.
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 41

An Alternate Design: Kogge-Stone Network

c8 = g [0,7]

c7 = g [0,6]

c6 = g [0,5]

c5 = g [0,4]

c4 = g [0,3]

c3 = g [0,2]

c2 = g [0,1]

c1 = g [0,0]

Kogge-Stone lookahead carry network for an 8-digit adder.


Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 42

Brent-Kung vs. Kogge-Stone Carry Network


[7, 7 ]

[6, 6 ]

[5, 5 ]

[4, 4 ]

[3, 3 ]

[2, 2 ]

[1, 1 ]

[6, 7 ]

[0, 0 ]

g [1, 1] p [1, 1]
g [0, 0]
p [0, 0]

[2, 3 ]
[0, 1 ]

[4, 5 ]

[4, 7 ]
[0, 3 ]

[0, 7 ]

[0, 6 ]

[0, 5 ]

[0, 4 ]

[0, 3 ]

[0, 2 ]

g [0, 1] p [0, 1]
[0, 1 ]

11 carry operators
4 levels

Oct. 2014

[0, 0 ]

17 carry operators
3 levels

Computer Architecture, The Arithmetic/Logic Unit

Slide 43

Carry-Lookahead Logic with 4-Bit Block


p i+3

g i+3

p i+2

g i+2

p i+1 g i+1 p i

gi

g [i, i+3]

Intermeidte carries

p [i, i+3]

Block signal generation

ci

c i+3

c i+2

c i+1

Figure 10.13
Blocks needed in the design of carry-lookahead adders
with four-way grouping of bits.
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 44

Third Carry Speed-Up Method: Carry Select


Allows doubling of adder width with a single-mux additional delay
x

c out

Adder

c in

Version 0
of sum bits

c out

Figure 10.14
Oct. 2014

[a, b]

Adder

[a, b]

c in

Version 1
of sum bits
a

[a, b]

The lower
a positions,
(0 to a 1)
are added
as usual

Carry-select addition principle.


Computer Architecture, The Arithmetic/Logic Unit

Slide 45

10.5 Logic and Shift Operations


Conceptually, shifts can be implemented by multiplexing
Right-shifted
values
RightLeft
Shift amount
5

00, x[30, 2]
0, x[31, 1]
x[31, 0]
32

32 32

6-bit code specifying


shift direction & amount

Figure 10.15

x[0], 00...0
x[1, 0], 00...0

x[30, 0], 0
x[31, 0]

. . .

32

32

31

32

32
33

. . .

32
62

32
63

Multiplexer

Oct. 2014

00...0, x[31]

Left-shifted
values

32

Multiplexer-based logical shifting unit.


Computer Architecture, The Arithmetic/Logic Unit

Slide 46

Arithmetic Shifts
Purpose: Multiplication and division by powers of 2
sra $t0,$s1,2
srav $t0,$s1,$s0
op

31

25

20

rt

15

rd

10

sh

fn

0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 1
ALU
instruction

op

31

rs

# $t0 ($s1) right-shifted by 2


# $t0 ($s1) right-shifted by ($s0)

Unused

25

rs

Source
register

20

rt

Destination
register

15

rd

Shift
amount

10

sh

sra = 3

fn

0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1
ALU
instruction

Figure 10.16
Oct. 2014

Amount
register

Source
register

Destination
register

Unused

srav = 7

The two arithmetic shift instructions of MiniMIPS.


Computer Architecture, The Arithmetic/Logic Unit

Slide 47

Practical Shifting in Multiple Stages


00
01
10
11

No shift
Logical left
Logical right
Arith right

x[31], x[31, 1]
0, x[31, 1]
x[30, 0], 0
x[31, 0]
32
32
32
32
0

y[31, 0]
0
2

(0 or 2)-bit shift

z[31, 0]

Multiplexer
0

32
2

(a) Single-bit shifter

Figure 10.17
Oct. 2014

(0 or 4)-bit shift

(0 or 1)-bit shift

(b) Shifting by up to 7 bits

Multistage shifting in a barrel shifter.


Computer Architecture, The Arithmetic/Logic Unit

Slide 48

Bit Manipulation via Shifts and Logical Operations


Bits 10-15

AND with mask to isolate a field: 0000 0000 0000 0000 1111 1100 0000 0000
Right-shift by 10 positions to move field to the right end of word
The result word ranges from 0 to 63, depending on the field pattern
32-pixel (4 8) block of
black-and-white image:

Row 0

Row 1

Row 2

Row 3

Representation
as 32-bit word:

1010 0000 0101 1000 0000 0110 0001 0111

Hex equivalent:

0xa0a80617

Figure 10.18 A 4 8 block of a black-and-white


image represented as a 32-bit word.
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 49

10.6 Multifunction ALUs


Arith fn (add, sub, . . .)

Operand 1

Operand 2

Arith
unit
Logic
unit

Result

Select fn type
(logic or arith)

Logic fn (AND, OR, . . .)


General structure of a simple arithmetic/logic unit.
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 50

ConstVar

Shift function

Constant
5
amount

Amount

Variable
amount

00
01
10
11

No shift
Logical left
Logical right
Arith right

Shifter

Function
class

32
5 LSBs

Shifted y

Adder
y

0 or 1

c0

32

32

k
/

c 31

xy

Shift
Set less
Arithmetic
Logic

2
0

Shorthand
symbol
for ALU

MSB

32

32

00
01
10
11

An ALU for
MiniMIPS

Control

c 32

x
Func

AddSub

s
ALU
Logic
unit
AND
OR
XOR
NOR

00
01
10
11

32input
NOR

Ovfl
Zero

Logic function

Zero

Ovfl

Figure 10.19 A multifunction ALU with 8 control signals (2 for


function class, 1 arithmetic, 3 shift, 2 logic) specifying the operation.
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 51

11 Multipliers and Dividers


Modern processors perform many multiplications & divisions:
Encryption, image compression, graphic rendering
Hardware vs programmed shift-add/sub algorithms
Topics in This Chapter
11.1 Shift-Add Multiplication
11.2 Hardware Multipliers
11.3 Programmed Multiplication
11.4 Shift-Subtract Division
11.5 Hardware Dividers
11.6 Programmed Division
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 52

11.1 Shift-Add Multiplication


Multiplicand
Multiplier

x
y

Partial
products
bit-matrix

y0
y
1
y2
y3

Product

Figure 11.1

x
x
x
x

20
21
22
23

Multiplication of 4-bit numbers in dot notation.

z(j+1) = (z(j) + yj x 2k) 21 with z(0) = 0 and z(k) = z


| add |
| shift right |
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 53

Binary and Decimal Multiplication


Position
7 6 5 4 3 2 1 0
=========================
x24
1 0 1 0
y
0 0 1 1
=========================
z (0)
0 0 0 0
4
+y0x2
1 0 1 0

2z (1)
0 1 0 1 0
(1)
z
0 1 0 1 0
4
+y1x2
1 0 1 0

2z (2)
0 1 1 1 1 0
(2)
z
0 1 1 1 1 0
4
+y2x2
0 0 0 0

2z (3)
0 0 1 1 1 1 0
(3)
z
0 0 1 1 1 1 0
4
+y3x2
0 0 0 0

2z (4)
0 0 0 1 1 1 1 0
(4)
z
0 0 0 1 1 1 1 0
=========================

Figure 11.2
Oct. 2014

Example 11.1

Position
7 6 5 4 3 2 1 0
=========================
x104
3 5 2 8
y
4 0 6 7
=========================
z (0)
0 0 0 0
4
+y0x10 2 4 6 9 6

10z (1)
2 4 6 9 6
(1)
z
0 2 4 6 9 6
4
+y1x10 2 1 1 6 8

10z (2)
2 3 6 3 7 6
(2)
z
2 3 6 3 7 6
4
+y2x10 0 0 0 0 0

10z (3)
0 2 3 6 3 7 6
(3)
z
0 2 3 6 3 7 6
4
+y3x10 1 4 1 1 2

10z (4)
1 4 3 4 8 3 7 6
(4)
z
1 4 3 4 8 3 7 6
=========================

Step-by-step multiplication examples for 4-digit unsigned numbers.


Computer Architecture, The Arithmetic/Logic Unit

Slide 54

Twos-Complement Multiplication
Position
7 6 5 4 3 2 1 0
=========================
x24
1 0 1 0
y
0 0 1 1
=========================
z (0)
0 0 0 0 0
4
+y0x2
1 1 0 1 0

2z (1)
1 1 0 1 0
(1)
z
1 1 1 0 1 0
4
+y1x2
1 1 0 1 0

2z (2)
1 0 1 1 1 0
(2)
z
1 1 0 1 1 1 0
4
+y2x2
0 0 0 0 0

2z (3)
1 1 0 1 1 1 0
(3)
z
1 1 1 0 1 1 1 0
4
+(y3x2 ) 0 0 0 0 0

2z (4)
1 1 1 0 1 1 1 0
(4)
z
1 1 1 0 1 1 1 0
=========================

Figure 11.3
Oct. 2014

Example 11.2

Position
7 6 5 4 3 2 1 0
=========================
x24
1 0 1 0
y
1 0 1 1
=========================
z (0)
0 0 0 0 0
4
+y0x2
1 1 0 1 0

2z (1)
1 1 0 1 0
(1)
z
1 1 1 0 1 0
4
+y1x2
1 1 0 1 0

2z (2)
1 0 1 1 1 0
(2)
z
1 1 0 1 1 1 0
4
+y2x2
0 0 0 0 0

2z (3)
1 1 0 1 1 1 0
(3)
z
1 1 1 0 1 1 1 0
4
+(y3x2 ) 0 0 1 1 0

2z (4)
0 0 0 1 1 1 1 0
(4)
z
0 0 0 1 1 1 1 0
=========================

Step-by-step multiplication examples for 2s-complement numbers.


Computer Architecture, The Arithmetic/Logic Unit

Slide 55

11.2 Hardware Multipliers


Shift
Multiplier y
Doublewidth partial product z (j)

Hi

Shift

Lo

Multiplicand x

Mux

yj

Enable
Select

c out

Figure 11.4
Oct. 2014

Adder

c in

AddSub

Hardware multiplier based on the shift-add algorithm.


Computer Architecture, The Arithmetic/Logic Unit

Slide 56

The Shift Part of Shift-Add


From adder
cout

Sum
/k

Partial product

/k1

Multiplier
/k1

/k

To adder

yj

Figure11.5
Shifting incorporated in the connections to
the partial product register rather than as a separate phase.
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 57

High-Radix Multipliers
Multiplicand
Multiplier

x
y
0, x, 2x, or 3x

Product

Radix-4 multiplication in dot notation.

z(j+1) = (z(j) + yj x 2k) 41 with z(0) = 0 and z(k/2) = z


| add |
Assume k even
| shift right |
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 58

Tree Multipliers
All partial products

Several partial products

...

...

Large tree of
carry-save
adders

Logdepth

Adder

Logdepth

Product
(a) Full-tree multiplier

Figure 11.6
Oct. 2014

Small tree of
carry-save
adders

Adder
Product
(b) Partial-tree multiplier

Schematic diagram for full/partial-tree multipliers.


Computer Architecture, The Arithmetic/Logic Unit

Slide 59

Array Multipliers
0
s

x3 0
0

x2 0
0

x1 0
0

x0 0

MA

MA

MA

MA

0
0

Figure 9.3a
(Recalling
carry-save
addition)

s
MA

y0

c
MA

MA

MA

y1

z0

0
MA

MA

MA

MA

y2

z1

0
MA

MA

MA

MA

y3

z2

0
z3
FA

Figure 11.7
Oct. 2014

Our original
dot-notation
representing
multiplication

FA

FA

HA

z7

z6

z5

Straightened
dots to depict
array multiplier
to the left

z4

Array multiplier for 4-bit unsigned operands.

Computer Architecture, The Arithmetic/Logic Unit

Slide 60

11.3 Programmed Multiplication


MiniMIPS instructions related to multiplication
mult
multu
mfhi
mflo

$s0,$s1
$s2,$s3
$t0
$t1

#
#
#
#

set
set
set
set

Hi,Lo to ($s0)($s1); signed


Hi,Lo to ($s2)($s3); unsigned
$t0 to (Hi)
$t1 to (Lo)

Example 11.3
Finding the 32-bit product of 32-bit integers in MiniMIPS
Multiply; result will be obtained in Hi,Lo
For unsigned multiplication:
Hi should be all-0s and Lo holds the 32-bit result
For signed multiplication:
Hi should be all-0s or all-1s, depending on the sign bit of Lo
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 61

Emulating a Hardware Multiplier in Software


Example 11.4 (MiniMIPS shift-add program for multiplication)
Shift
$a1
(multiplier
Multiplier
y y)
$v0Doublewidth
(Hi part of z)partial
$v1product
(Lo part zof(j)z)
$t0 (carry-out)
$a0Multiplicand
(multiplicandx x)

Also, holds
LSB of Hi
during shift

Shift

Mux

$t1 (bit j of y)

yj

Enable
Select

$t2 (counter)
c out

Adder

c in

AddSub

Part of the
control in
hardware

Figure 11.8 Register usage for programmed multiplication


superimposed on the block diagram for a hardware multiplier.
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 62

Multiplication When There Is No Multiply Instruction


Example 11.4 (MiniMIPS shift-add program for multiplication)
shamu: move
move
addi
mloop: move
move
srl
subu
subu
beqz
addu
sltu
noadd: move
srl
subu
subu
sll
addu
srl
sll
addu
addi
bne
jr
Oct. 2014

$v0,$zero
$vl,$zero
$t2,$zero,32
$t0,$zero
$t1,$a1
$a1,1
$t1,$t1,$a1
$t1,$t1,$a1
$t1,noadd
$v0,$v0,$a0
$t0,$v0,$a0
$t1,$v0
$v0,1
$t1,$t1,$v0
$t1,$t1,$v0
$t0,$t0,31
$v0,$v0,$t0
$v1,1
$t1,$t1,31
$v1,$v1,$t1
$t2,$t2,-1
$t2,$zero,mloop
$ra

#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#

initialize Hi to 0
initialize Lo to 0
init repetition counter to 32
set c-out to 0 in case of no add
copy ($a1) into $t1
halve the unsigned value in $a1
subtract ($a1) from ($t1) twice to
obtain LSB of ($a1), or y[j], in $t1
no addition needed if y[j] = 0
add x to upper part of z
form carry-out of addition in $t0
copy ($v0) into $t1
halve the unsigned value in $v0
subtract ($v0) from ($t1) twice to
obtain LSB of Hi in $t1
carry-out converted to 1 in MSB of $t0
right-shifted $v0 corrected
halve the unsigned value in $v1
LSB of Hi converted to 1 in MSB of $t1
right-shifted $v1 corrected
decrement repetition counter by 1
if counter > 0, repeat multiply loop
return to the calling program

Computer Architecture, The Arithmetic/Logic Unit

Slide 63

11.4 Shift-Subtract Division


Divisor

Subtracted
bit-matrix

z
Dividend
y 3 x 23
y 2 x 22
y 1 x 21
y 0 x 20
s

Figure11.9

Quotient

Remainder

Division of an 8-bit number by a 4-bit number in dot notation.

z(j) = 2z(j1) ykj x 2k with z(0) = z and z(k) = 2k s


|shift|
| subtract |
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 64

Integer and Fractional Unsigned Division


Position
7 6 5 4 3 2 1 0
=========================
z
0 1 1 1 0 1 0 1
4
x2
1 0 1 0
=========================
z (0)
0 1 1 1 0 1 0 1
(0)
2z
0 1 1 1 0 1 0 1
4
y3x2
1 0 1 0
y3=1

z (1)
0 1 0 0 1 0 1
(1)
2z
0 1 0 0 1 0 1
4
y2x2
0 0 0 0
y2=0

z (2)
1 0 0 1 0 1
(2)
2z
1 0 0 1 0 1
4
y1x2
1 0 1 0
y1=1

z (3)
1 0 0 0 1
(3)
2z
1 0 0 0 1
4
y0x2
1 0 1 0
y0=1

z (4)
0 1 1 1
s
0 1 1 1
y
1 0 1 1
=========================

Figure 11.10
Oct. 2014

Example 11.5

Position
1 2 3 4 5 6 7 8
==========================
z
.1 4 3 5 1 5 0 2
x
.4 0 6 7
==========================
z (0)
.1 4 3 5 1 5 0 2
(0)
10z
1.4 3 5 1 5 0 2
y1x
1.2 2 0 1
y1=3

z (1)
.2 1 5 0 5 0 2
(1)
10z
2.1 5 0 5 0 2
y2x
2.0 3 3 5
y2=5

z (2)
.1 1 7 0 0 2
(2)
10z
1.1 7 0 0 2
y3x
0.8 1 3 4
y3=2

z (3)
.3 5 6 6 2
(3)
10z
3.5 6 6 2
y4x
3.2 5 3 6
y4=8

z (4)
.3 1 2 6
s
.0 0 0 0 3 1 2 6
y
.3 5 2 8
==========================

Division examples for binary integers and decimal fractions.


Computer Architecture, The Arithmetic/Logic Unit

Slide 65

Division with Same-Width Operands


Position
7 6 5 4 3 2 1 0
=========================
z
0 0 0 0 1 1 0 1
4
x2
0 1 0 1
=========================
z (0)
0 0 0 0 1 1 0 1
(0)
2z
0 0 0 1 1 0 1
4
y3x2
0 0 0 0
y3=0

z (1)
0 0 0 1 1 0 1
(1)
2z
0 0 1 1 0 1
4
y2x2
0 0 0 0
y2=0

z (2)
0 0 1 1 0 1
(2)
2z
0 1 1 0 1
4
y1x2
0 1 0 1
y1=1

z (3)
0 0 0 1 1
(3)
2z
0 0 1 1
4
y0x2
1 0 1 0
y0=0

z (4)
0 0 1 1
s
0 0 1 1
y
0 0 1 0
=========================

Figure 11.11
Oct. 2014

Example 11.6

Position
1 2 3 4 5 6 7 8
==========================
z
.0 1 0 1
x
.1 1 0 1
==========================
z (0)
.0 1 0 1
(0)
2z
0.1 0 1 0
y1x
0.0 0 0 0
y1=0

z (1)
.1 0 1 0
(1)
2z
1.0 1 0 0
y2x
0.1 1 0 1
y2=1

z (2)
.0 1 1 1
(2)
2z
0.1 1 1 0
y3x
0.1 1 0 1
y3=1

z (3)
.0 0 0 1
(3)
2z
0.0 0 1 0
y4x
0.0 0 0 0
y4=0

z (4)
.0 0 1 0
s
.0 0 0 0 0 0 1 0
y
.0 1 1 0
==========================

Division examples for 4/4-digit binary integers and fractions.


Computer Architecture, The Arithmetic/Logic Unit

Slide 66

Signed Division
Method 1 (indirect): strip operand signs, divide, set result signs
Dividend
z=5
z=5
z = 5
z = 5

Divisor
x=3
x = 3
x=3
x = 3

Quotient
y=1
y = 1
y = 1
y=1

Remainder
s=2
s=2
s = 2
s = 2

Method 2 (direct 2s complement): develop quotient with digits


1 and 1, chosen based on signs, convert to digits 0 and 1
Restoring division: perform trial subtraction, choose 0 for q digit
if partial remainder negative
Nonrestoring division: if sign of partial remainder is correct,
then subtract (choose 1 for q digit) else add (choose 1)
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 67

11.5 Hardware Dividers


Shift
Quotient y
Partial rem ainder z (j) (initially z)

Hi

Shift

Lo

Mux 1

Load

Quotient
digit
selector

Divisor x

yk j

Enable

Select

out

Adder

Trial difference

Figure 11.12
Oct. 2014

in

1
(Always
subtract)

Hardware divider based on the shift-subtract algorithm.


Computer Architecture, The Arithmetic/Logic Unit

Slide 68

The Shift Part of Shift-Subtract


qkj
From adder
/k

Partial remainder

/k

Quotient
/k

MSB

/k

To adder
Figure 11.13 Shifting incorporated in the connections to the
partial remainder register rather than as a separate phase.
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 69

High-Radix Dividers
x

Divisor

Quotient

Dividend

0, x, 2x, or 3x
s

Remainder

Radix-4 division in dot notation.

z(j) = 4z(j1) (yk2j+1 yk2j)two x 2k with z(0) = z and z(k/2) = 2ks


|shift|
Assume k even
| subtract |
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 70

Array Dividers
z7

y3

x3

z6

MS

x1

MS

x0

z5

MS

y2

x2

z4

MS

MS

z3
0

z2

d
MS

MS

MS

z1

y1
MS

MS

MS

MS

Our original
dot-notation
for division

z0

y0
MS

s3

MS

s2

Figure 11.14
Oct. 2014

MS

s1

MS

s0

Straightened
dots to depict
an array divider

Array divider for 8/4-bit unsigned integers.


Computer Architecture, The Arithmetic/Logic Unit

Slide 71

11.6 Programmed Division


MiniMIPS instructions related to division
div
divu
mfhi
mflo

$s0,$s1
$s2,$s3
$t0
$t1

#
#
#
#

Lo = quotient, Hi = remainder
unsigned version of division
set $t0 to (Hi)
set $t1 to (Lo)

Example 11.7
Compute z mod x, where z (singed) and x > 0 are integers
Divide; remainder will be obtained in Hi
if remainder is negative,
then add |x| to (Hi) to obtain z mod x
else Hi holds z mod x
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 72

Emulating a Hardware Divider in Software


Example 11.8 (MiniMIPS shift-add program for division)
Shift
$a1
(quotient
Quotient
y y)
$v0
Partial
(Hi part
remainder
of z)
$v1
z (j) (initially
(Lo part of
z)z)
Shift

$t0 (MSB of Hi)

yk j
Load

$t1 (bit k j of y)

$a0
Divisor
(divisorx x)

Mux

Enable

Quotient
digit
selector

Select

$t2 (counter)
c out

Adder

Trial difference

c in

1
(Always
subtract)

Part of the
control in
hardware

Figure 11.15 Register usage for programmed division superimposed


on the block diagram for a hardware divider.
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 73

Division When There Is No Divide Instruction


Example 11.7 (MiniMIPS shift-subtract program for division)
shsdi: move
move
addi
dloop: slt
sll
slt
or
sll
sge
or
sll
or
beq
subu
nosub: addi
bne
move
jr
Oct. 2014

$v0,$a2
$vl,$a3
$t2,$zero,32
$t0,$v0,$zero
$v0,$v0,1
$t1,$v1,$zero
$v0,$v0,$t1
$v1,$v1,1
$t1,$v0,$a0
$t1,$t1,$t0
$a1,$a1,1
$a1,$a1,$t1
$t1,$zero,nosub
$v0,$v0,$a0
$t2,$t2,-1
$t2,$zero,dloop
$v1,$a1
$ra

#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#

initialize Hi to ($a2)
initialize Lo to ($a3)
initialize repetition counter to 32
copy MSB of Hi into $t0
left-shift the Hi part of z
copy MSB of Lo into $t1
move MSB of Lo into LSB of Hi
left-shift the Lo part of z
quotient digit is 1 if (Hi) x,
or if MSB of Hi was 1 before shifting
shift y to make room for new digit
copy y[k-j] into LSB of $a1
if y[k-j] = 0, do not subtract
subtract divisor x from Hi part of z
decrement repetition counter by 1
if counter > 0, repeat divide loop
copy the quotient y into $v1
return to the calling program

Computer Architecture, The Arithmetic/Logic Unit

Slide 74

Divider vs Multiplier: Hardware Similarities


Shift
Quotient y
Partial rem ainder z (j) (initially z)

Shift

yk j

Multiplier y
Dou blewi dth pa rtial product z (j)

Load

Shift

Shift

Quotient
digit
selector

Divisor x

Mux

Enable

Multiplicand x

yj

Enable

Mux

Select

Select

Figure 11.12
c

out

Adder

in

Trial difference
x3 0
0

x2 0
0

MA

x1 0
0

MA

x0 0

z7

y3
MA

Adder

c out

1
(Always
subtract)

0
MA

Figure 11.4

y0

x3

z6

MS

x2

x1
z5

MS

c in

x0
z4

MS

MS

0
MA

MA

MA

MA

y1

z0

0
MA

MA

MA

MA

y2

z1

MS

MS

MS

MS

MS

MS

MS

0
MA

MA

MA

MA

y3

MS

z3
FA

FA

HA

z7

z6

z5

Oct. 2014

S traighten ed
dots to depic t
arr ay m ultiplier
to the left

Figure 11.14

z4

Our original
dot-notation
for division

z0

y0
MS

MS

MS

0
FA

z1

y1
MS

z2

z3

z2

y2
O ur o rigin al
dot-n otation
rep res entin g
m ultiplic ation

AddSub

s3

s2

s1

s0

Turn upside-down

Computer Architecture, The Arithmetic/Logic Unit

0
Straightened
dots to depict
an array divider

Figure 11.7
Slide 75

12 Floating-Point Arithmetic
Floating-point is no longer reserved for high-end machines
Multimedia and signal processing require flp arithmetic
Details of standard flp format and arithmetic operations
Topics in This Chapter
12.1 Rounding Modes
12.2 Special Values and Exceptions
12.3 Floating-Point Addition
12.4 Other Floating-Point Operations
12.5 Floating-Point Instructions
12.6 Result Precision and Errors
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 76

12.1 Rounding Modes


Short (32-bit) format
8 bits,
23 bits for fractional part
bias = 127, (plus hidden 1 in integer part)
126 to 127

Sign Exponent

IEEE 754
Format

0, , NaN

Denormals:
0.f 2emin

1.f 2e

Significand

11 bits,
bias = 1023,
1022 to 1023

52 bits for fractional part


(plus hidden 1 in integer part)

Denormals allow
graceful underflow

Long (64-bit) format

Negative numbers
max
min
FLP

Sparser

Overflow
region

Denser

Positive numbers
min +
FLP +

Denser

Figure 12.1
Oct. 2014

Underflow
example

Sparser

Underflow
regions
Midway
example

max +

Overflow
region
Typical
example

Overflow
example

Distribution of floating-point numbers on the real line.


Computer Architecture, The Arithmetic/Logic Unit

Slide 77

Round-to-Nearest (Even)
rtnei(x)

rtni(x)

x
4

Oct. 2014

(a) Round to nearest even integer

Figure 12.2

x
2

(b) Round to nearest integer

Two round-to-nearest-integer functions for x in [4, 4].


Computer Architecture, The Arithmetic/Logic Unit

Slide 78

Directed Rounding
ritni(x)

rutni(x)

x
4

x
4

(a) Round inward to nearest integer

(b) Round upward to nearest integer

Figure 12.3 Two directed round-to-nearest-integer functions for x in [4, 4].


Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 79

12.2 Special Values and Exceptions


Zeros, infinities, and NaNs (not a number)
0 Biased exponent = 0, significand = 0 (no hidden 1)
Biased exponent = 255 (short) or 2047 (long), significand = 0
NaN Biased exponent = 255 (short) or 2047 (long), significand 0
Arithmetic operations with special operands
(+0) + (+0) = (+0) (0) = +0
(+0) (+5) = +0
(+0) / (5) = 0
(+) + (+) = +
x (+) =
(+) x = , depending on the sign of x
x / (+) = 0, depending on the sign of x
(+) = +
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 80

Exceptions
Undefined results lead to NaN (not a number)
(0) / (0) = NaN
(+) + () = NaN
(0) () = NaN
() / () = NaN

Arithmetic operations and comparisons with NaNs


NaN + x = NaN
NaN + NaN = NaN
NaN 0 = NaN
NaN NaN = NaN

NaN < 2 false


NaN = Nan false
NaN (+) true
NaN NaN true

Examples of invalid-operation exceptions


Addition:
Multiplication:
Division:
Square-root:
Oct. 2014

(+) + ()
0
0 0 or
Operand < 0

Computer Architecture, The Arithmetic/Logic Unit

Slide 81

12.3 Floating-Point Addition


(2e1s1) + (2e1(s2 / 2e1e2)) = 2e1(s1 s2 / 2e1e2)
(2e2s2)

Numbers to be added:
x = 25 1.00101101
y = 21 1.11101101
Operands after alignment shift:
x = 25 1.00101101
y = 25 0.000111101101
Result of addition:
s = 25 1.010010111101
s = 25 1.01001100
Figure 12.4
Oct. 2014

Operand with
smaller exponent
to be preshifted

Extra bits to be
rounded off
Rounded sum

Alignment shift and rounding in floating-point addition.


Computer Architecture, The Arithmetic/Logic Unit

Slide 82

Inp ut 1

Hardware for
Floating-Point
Addition

Inp ut 2
Unpack

Signs Exponents

Significands

AddSub
Mu x

Sub

Possible swap
& compleme nt
Align
significands

Control
& sign
logic

Add
Norma lize
& round

Figure 12.5
Simplified schematic of
a floating-point adder.

Sign

Exponent

Significand

Pack
Outp ut

Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 83

12.4 Other Floating-Point Operations


Overflow
(underflow)
possible

Floating-point multiplication

(2e1s1) (2e2s2) = 2e1+ e2(s1 s2)


Product of significands in [1, 4)
If product is in [2, 4), halve to normalize (increment exponent)
Overflow
(underflow)
possible

Floating-point division

(2e1s1) / (2e2s2) = 2e1 e2(s1 / s2)


Ratio of significands in (1/2, 2)
If ratio is in (1/2, 1), double to normalize (decrement exponent)
Floating-point square-rooting
(2es)1/2 = 2e/2(s)1/2
= 2(e1)2(2s)1/2
Normalization not needed
Oct. 2014

when e is even
when e is odd

Computer Architecture, The Arithmetic/Logic Unit

Slide 84

Hardware for
Floating-Point
Multiplication
and Division

Input 1

Input 2
Unpack

Signs Exponents

Significands

MulDiv

Multiply
or divide

Control
& sign
logic

Normalize
& round

Figure 12.6 Simplified


schematic of a floatingpoint multiply/divide unit.

Sign

Exponent

Significand

Pack
Output

Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 85

12.5 Floating-Point Instructions


Floating-point arithmetic instructions for MiniMIPS:
add.s
sub.d
mul.d
div.s
neg.s
31

op

$f0,$f8,$f10
$f0,$f8,$f10
$f0,$f8,$f10
$f0,$f8,$f10
$f0,$f8
25

ex

20

#
#
#
#
#

ft

set
set
set
set
set

15

$f0
$f0
$f0
$f0
$f0

fs

to
to
to
to
to
10

($f8) +fp
($f8) fp
($f8) fp
($f8) /fp
($f8)

($f10)
($f10)
($f10)
($f10)

fd

fn

0 1 0 0 0 1 0 0 0 0 x 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 x x x
Floating-point
instruction

s=0
d=1

Source
register 2

Source
register 1

Destination
register

add.* = 0
sub.* = 1
mul.* = 2
div.* = 3
neg.* = 7

Figure 12.7 The common floating-point instruction format for


MiniMIPS and components for arithmetic instructions. The extension
(ex) field distinguishes single (* = s) from double (* = d) operands.
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 86

The Floating-Point Unit in MiniMIPS


...
m 2 32

Loc 0 Loc 4 Loc 8


4 B / location

Memory

Loc
Loc
m 8 m 4

up to 2 30 words

Coprocessor 1

...
EIU

$0
$1
$2

(Main proc.)

$31

ALU

Execution
& integer
unit

(Coproc. 1)

Integer
mul/div

FP
arith

Hi

Lo

FPU

$0
$1
$2

Floatingpoint unit

$31

Pairs of registers,
beginning with an
even-numbered
one, are used for
double operands

BadVaddr Trap &


(Coproc. 0) Status memory
Cause unit
EPC

TMU

Chapter
10

Chapter
11

Figure 5.1
Oct. 2014

Chapter
12

Memory and processing subsystems for MiniMIPS.


Computer Architecture, The Arithmetic/Logic Unit

Slide 87

Floating-Point Format Conversions


MiniMIPS instructions for number format conversion:
cvt.s.w
cvt.d.w
cvt.d.s
cvt.s.d
cvt.w.s
cvt.w.d
31

op

$f0,$f8
$f0,$f8
$f0,$f8
$f0,$f8
$f0,$f8
$f0,$f8
25

ex

#
#
#
#
#
#
20

set
set
set
set
set
set

ft

$f0
$f0
$f0
$f0
$f0
$f0

15

to
to
to
to
to
to

fs

single(integer $f8)
double(integer $f8)
double($f8)
single($f8,$f9)
integer($f8)
integer($f8,$f9)
10

fd

fn

0 1 0 0 0 1 0 0 0 0 x 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 x x x
Floating-point
instruction

Figure 12.8
Oct. 2014

*.w = 0
w.s = 0
w.d = 1
*.* = 1

Unused

Source
register

Destination
register

To format:
s = 32
d = 33
w = 36

Floating-point instructions for format conversion in MiniMIPS.


Computer Architecture, The Arithmetic/Logic Unit

Slide 88

Floating-Point Data Transfers


MiniMIPS instructions for floating-point load, store, and move:
lwc1
swc1
mov.s
mov.d
mfc1
mtc1
31

$f8,40($s3)
$f8,A($s3)
$f0,$f8
$f0,$f8
$t0,$f12
$f8,$t4

op

25

20

ft

load mem[40+($s3)] into $f8


store ($f8) into mem[A+($s3)]
load $f0 with ($f8)
load $f0,$f1 with ($f8,$f9)
load $t0 with ($f12)
load $f8 with ($t4)
15

fs

10

fd

fn

0 1 0 0 0 1 0 0 0 0 x 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0
Floating-point
instruction
31

ex

#
#
#
#
#
#

op

s=0
d=1
25

rs

Unused

20

rt

Source
register
15

rd

Destination
register
10

sh

mov.* = 6

fn

0 1 0 0 0 1 0 0 x 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Floating-point
instruction

Figure 12.9
Oct. 2014

mfc1 = 0
mtc1 = 4

Source
register

Destination
register

Unused

Unused

Instructions for floating-point data movement in MiniMIPS.


Computer Architecture, The Arithmetic/Logic Unit

Slide 89

Floating-Point Branches and Comparisons


MiniMIPS instructions for floating-point load, store, and move:
bc1t
bc1f
c.eq.*
c.lt.*
c.le.*
31

L
L
$f0,$f8
$f0,$f8
$f0,$f8

op

25

20

branch on fp flag true


branch on fp flag false
if ($f0)=($f8), set flag to true
if ($f0)<($f8), set flag to true
if ($f0)($f8), set flag to true
rt

operand / offset

15

0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 x 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1
Floating-point
instruction
31

rs

#
#
#
#
#

op

bc1? = 8

25

ex

true = 1
false = 0
20

ft

Offset

Correction: 1 1 x x x 0
15

fs

10

fd

fn

0 1 0 0 0 1 0 0 0 0 x 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0
Floating-point
instruction

Figure 12.10
Oct. 2014

s=0
d=1

Source
register 2

Source
register 1

Unused

c.eq.* = 50
c.lt.* = 60
c.le.* = 62

Floating-point branch and comparison instructions in MiniMIPS.


Computer Architecture, The Arithmetic/Logic Unit

Slide 90

Floating-Point
Instructions of
MiniMIPS

Copy

Table 12.1
Arithmetic

s/d for single/double


# 0/1 for single/double

Conversions

Memory access
Control transfer
Oct. 2014

Instruction

Usage

Move s/d registers


Move fm coprocessor 1
Move to coprocessor 1
Add single/double
Subtract single/double
Multiply single/double
Divide single/double
Negate single/double
Compare equal s/d
Compare less s/d
Compare less or eq s/d
Convert integer to single
Convert integer to double
Convert single to double
Convert double to single
Convert single to integer
Convert double to integer
Load word coprocessor 1
Store word coprocessor 1
Branch coproc 1 true
Branch coproc 1 false

mov.* fd,fs
mfc1 rt,rd
mtc1 rd,rt
add.* fd,fs,ft
sub.* fd,fs,ft
mul.* fd,fs,ft
div.* fd,fs,ft
neg.* fd,fs
c.eq.* fs,ft
c.lt.* fs,ft
c.le.* fs,ft
cvt.s.w fd,fs
cvt.d.w fd,fs
cvt.d.s fd,fs
cvt.s.d fd,fs
cvt.w.s fd,fs
cvt.w.d fd,fs
lwc1 ft,imm(rs)
swc1 ft,imm(rs)
bc1t L
bc1f L

Computer Architecture, The Arithmetic/Logic Unit

ex fn
#
0
4
#
#
#
#
#
#
#
#
0
0
1
1
0
1
rs
rs
8
8
Slide 91

6
0
1
2
3
7
50
60
62
32
33
33
32
36
36

12.6 Result Precision and Errors


Example 12.4
Laws of algebra may not hold in floating-point arithmetic. For example,
the following computations show that the associative law of addition,
(a + b) + c = a + (b + c), is violated for the three numbers shown.
Numbers to be added first
a =-25 1.10101011
b = 25 1.10101110

Numbers to be added first


b = 25 1.10101110
c =-22 1.01100101

Compute a + b
25 0.00000011
a+b = 22 1.10000000
c =-22 1.01100101

Compute b + c (after preshifting c)


25 1.101010110011011
b+c = 25 1.10101011 (Round)
a =-25 1.10101011

Compute (a + b) + c
22 0.00011011
Sum = 26 1.10110000

Compute a + (b + c)
25 0.00000000
Sum = 0 (Normalize to special code for 0)

Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 92

Error Control and Certifiable Arithmetic


Catastrophic cancellation in subtracting almost equal numbers:
Area of a needlelike triangle
A = [s(s a)(s b)(s c)]1/2

b
a

Possible remedies
Carry extra precision in intermediate results (guard digits):
commonly used in calculators
Use alternate formula that does not produce cancellation errors
Certifiable arithmetic with intervals
A number is represented by its lower and upper bounds [xl, xu]
Example of arithmetic: [xl, xu] +interval [yl, yu] = [xl +fp yl, xu +fp yu]
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 93

Evaluation of Elementary Functions


Approximating polynomials
ln x = 2(z + z3/3 + z5/5 + z7/7 + . . . ) where z = (x 1)/(x + 1)
ex = 1 + x/1! + x2/2! + x3/3! + x4/4! + . . .
cos x = 1 x2/2! + x4/4! x6/6! + x8/8! . . .
tan1 x = x x3/3 + x5/5 x7/7 + x9/9 . . .
Iterative (convergence) schemes
For example, beginning with an estimate for x1/2, the following
iterative formula provides a more accurate estimate in each step
q(i+1) = 0.5(q(i) + x/q(i))
Table lookup (with interpolation)
A pure table lookup scheme results in huge tables (impractical);
hence, often a hybrid approach, involving interpolation, is used.
Oct. 2014

Computer Architecture, The Arithmetic/Logic Unit

Slide 94

Function Evaluation by Table Lookup


h bits
xH

Input x

k - h bits
xL
xL

Table
for a

f(x)

Table
for b
Best linear
approximation
in subinterval

Multiply
x
xH
Add
Output
Figure 12.12
Oct. 2014

f(x)

The linear approximation above is


characterized by the line equation
a + b x L , where a and b are
read out from tables based on x H

Function evaluation by table lookup and linear interpolation.


Computer Architecture, The Arithmetic/Logic Unit

Slide 95

Potrebbero piacerti anche