f37 Book Intarch Pres pt3 PDF

Part III
The Arithmetic/Logic Unit
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 1
About This Presentation

This presentation is intended to support the use of the textbook
Computer Architecture: From Microprocessors to Supercomputers,
Oxford University Press, 2005, ISBN 0-19-515455-X. It is updated
regularly by the author as part of his teaching of the upper-division
course ECE 154, Introduction to Computer Architecture, at the
University of California, Santa Barbara. Instructors can use these
slides freely in classroom teaching and for other educational
purposes. Any other use is strictly prohibited. Behrooz Parhami
Edition
Released
Revised
Revised
Revised
Revised
First
July 2003
July 2004
July 2005
Mar. 2006
Jan. 2007
Jan. 2008
Jan. 2009
Jan. 2011
Oct. 2014
Oct. 2014
Slide 2
III The Arithmetic/Logic Unit

Overview of computer arithmetic and ALU design:
Review representation methods for signed integers
Discuss algorithms & hardware for arithmetic ops
Consider floating-point representation & arithmetic
Topics in This Part
Chapter 9
Number Representation
Chapter 10 Adders and Simple ALUs

Chapter 11 Multipliers and Dividers
Chapter 12 Floating-Point Arithmetic
Oct. 2014
Slide 3
Preview of Arithmetic Unit in the Data Path

Incr PC
Next addr
jta
Next PC
ALUOvfl
(PC)
PC
Instr
cache
rs
rt
inst
0
1
2
rd
31
imm
op
Br&Jump
Instruction fetch
Fig. 13.3
Oct. 2014
Register
writeback
(rs)
Ovfl
Reg
file
ALU
(rt)
/
16
ALU
out
Data
cache
Data
out
Data
in
Func
0
32
SE / 1
Data
addr
0
1
2
Register input
fn
RegDst
RegWrite
Reg access / decode
ALUSrc
ALUFunc
ALU operation
DataRead
RegInSrc
DataWrite
Data access
Key elements of the single-cycle MicroMIPS data path.

Slide 4
Computer Arithmetic as a Topic of Study

Brief overview article
Encyclopedia of Info Systems,
Academic Press, 2002,
Vol. 3, pp. 317-333
Our textbooks treatment
of the topic falls between
the extremes (4 chaps.)
Graduate course
ECE 252B Text:
Computer Arithmetic,
Oxford U Press, 2000
(2nd ed., 2010)
Oct. 2014
Slide 5
9 Number Representation
Arguably the most important topic in computer arithmetic:
Affects system compatibility and ease of arithmetic
Twos complement, flp, and unconventional methods
Topics in This Chapter
9.1 Positional Number Systems
9.2 Digit Sets and Encodings
9.3 Number-Radix Conversion
9.4 Signed Integers
9.5 Fixed-Point Numbers
9.6 Floating-Point Numbers
Oct. 2014
Slide 6
9.1 Positional Number Systems

Representations of natural numbers {0, 1, 2, 3, }
||||| ||||| ||||| ||||| ||||| ||
27
11011
XXVII
sticks or unary code

radix-10 or decimal code
radix-2 or binary code
Roman numerals
Fixed-radix positional representation with k digits

k1
Value of a number: x = (xk1xk2 . . . x1x0)r = xi r i

i=0
For example:
27 = (11011)two = (124) + (123) + (022) + (121) + (120)
Number of digits for [0, P]: k = logr (P + 1) = logr P + 1
Oct. 2014
Slide 7
Unsigned Binary Integers

0000
1111
1110
15
0001
0010
1
2
14
1101
0011
3
13
1100
12
1011
Turn x notches
counterclockwise
to add x
Inside: Natural number

Outside: 4-bit encoding
11
1010
0101
12
11
10
15
0
2
3
4
5
9 8 7
6
9
1001
0100
4
5
10
14
13
8
1000
0110
7
0111
Turn y notches
clockwise
to subtract y
Figure 9.1 Schematic representation of 4-bit code for

integers in [0, 15].
Oct. 2014
Slide 8
Representation Range and Overflow
Overflow region max
max
Numbers smaller
than max
Overflow region
Numbers larger
than max
Finite set of representable numbers
Figure 9.2 Overflow regions in finite number representation systems.

For unsigned representations covered in this section, max = 0.
Example 9.2, Part d
Discuss if overflow will occur when computing 317 316 in a number
system with k = 8 digits in radix r = 10.
Solution
The result 86 093 442 is representable in the number system which
has a range [0, 99 999 999]; however, if 317 is computed en route to
the final result, overflow will occur.
Oct. 2014
Slide 9
9.2 Digit Sets and Encodings

Conventional and unconventional digit sets
Decimal digits in [0, 9]; 4-bit BCD, 8-bit ASCII
Hexadecimal, or hex for short: digits 0-9 & a-f
Conventional ternary digit set in [0, 2]
Conventional digit set for radix r is [0, r 1]
Symmetric ternary digit set in [1, 1]
Conventional binary digit set in [0, 1]
Redundant digit set [0, 2], encoded in 2 bits
( 0 2 1 1 0 )two and ( 1 0 1 0 2 )two represent 22
Oct. 2014
Slide 10
Carry-Save Numbers
Radix-2 numbers using the digits 0, 1, and 2
Example: (1 0 2 1)two = (123) + (022) + (221) + (120) = 13
Possible encodings
(a) Binary
(b) Unary
0
1
2
0
1
1
2
MSB
LSB
Oct. 2014
00
01
10
11 (Unused)
1 0 2 1
0 0 1 0 = 2
1 0 0 1 = 9
00
01 (First alternate)
10 (Second alternate)
11
First bit
Second bit
1 0 2 1
0 0 1 1 = 3
1 0 1 0 = 10
Slide 11
The Notion of Carry-Save Addition

Digit-set combination: {0, 1, 2} + {0, 1} = {0, 1, 2, 3} = {0, 2} + {0, 1}
This bit
being 1
represents
overflow
(ignore it)
Carry-save
input
Carry-save
addition
Two
carry-save
inputs
Binary input
Carry-save
0
output
Carry-save
addition
a. Carry-save addition.
b. Adding two carry-save numbers.
Figure 9.3 Adding a binary number or another

carry-save number to a carry-save number.
Oct. 2014
Slide 12
9.3 Number Radix Conversion

Two ways to convert numbers from an old radix r to a new radix R
Perform arithmetic in the new radix R

Suitable for conversion from radix r to radix 10
Horners rule:
(xk1xk2 . . . x1x0)r = (((0 + xk1)r + xk2)r + . . . + x1)r + x0
(1 0 1 1 0 1 0 1)two = 0 + 1 1 2 + 0 2 2 + 1 5 2 + 1
11 2 + 0 22 2 + 1 45 2 + 0 90 2 + 1 181
Perform arithmetic in the old radix r

Suitable for conversion from radix 10 to radix R
Divide the number by R, use the remainder as the LSD
and the quotient to repeat the process
19 / 3 rem 1, quo 6 / 3 rem 0, quo 2 / 3 rem 2, quo 0
Thus, 19 = (2 0 1)three
Oct. 2014
Slide 13
Justifications for Radix Conversion Rules
( xk 1 xk 2 x0 ) r xk 1r
k 1
xk 2 r
k 2
x1r x0
x0 r ( x1 r ( x2 r ()))
Justifying Horners rule.
x
Binary representation of x/2
Figure 9.4
Oct. 2014
x mod 2
Justifying one step of the conversion of x to radix 2.

Slide 14
9.4 Signed Integers

We dealt with representing the natural numbers
Signed or directed whole numbers = integers
{ . . . , 3, 2, 1, 0, 1, 2, 3, . . . }
Signed-magnitude representation
+27 in 8-bit signed-magnitude binary code 0 0011011
27 in 8-bit signed-magnitude binary code 1 0011011
27 in 2-digit decimal code with BCD digits 1 0010 0111
Biased representation
Represent the interval of numbers [N, P] by the unsigned
interval [0, P + N]; i.e., by adding N to every number
Oct. 2014
Slide 15
Twos-Complement Representation
With k bits, numbers in the range [2k1, 2k1 1] represented.
Negation is performed by inverting all bits and adding 1.
0000
1111
1
1110
+0
0001
+1
0010
+2
2
1101
0011
3
1011
1100
Turn x notches
counterclockwise
to add x
+3
+4
+5
5
6
1001
0100
4
5
6
0101
1
0
2
3
4
5
7 8 7
+6
7
1010
2
3
8
1000
+7
0110
0111
Turn 16 y notches
counterclockwise to
add y (subtract y)
Figure 9.5 Schematic representation of 4-bit 2s-complement

code for integers in [8, +7].
Oct. 2014
Slide 16
Conversion from 2s-Complement to Decimal

Example 9.7
Convert x = (1 0 1 1 0 1 0 1)2s-compl to decimal.
Solution
Given that x is negative, one could change its sign and evaluate x.
Shortcut: Use Horners rule, but take the MSB as negative
1 2 + 0 2 2 + 1 3 2 + 1 5 2 + 0 10 2 + 1
19 2 + 0 38 2 + 1 75
Sign Change for a 2s-Complement Number

Example 9.8
Given y = (1 0 1 1 0 1 0 1)2s-compl, find the representation of y.
Solution
y = (0 1 0 0 1 0 1 0) + 1 = (0 1 0 0 1 0 1 1)2s-compl
Oct. 2014
(i.e., 75)
Slide 17
Twos-Complement Addition and Subtraction
k
/
c in
Adder
k
/
k
/
k
/
xy
c out
y or
y
AddSub
Figure 9.6
Oct. 2014
Binary adder used as 2s-complement adder/subtractor.

Slide 18
9.5 Fixed-Point Numbers

Positional representation: k whole and l fractional digits
Value of a number: x = (xk1xk2 . . . x1x0 . x1x2 . . . xl )r = xi r i
For example:
2.375 = (10.011)two = (121) + (020) + (021) + (122) + (123)
Numbers in the range [0, rk ulp] representable, where ulp = r l
Fixed-point arithmetic same as integer arithmetic
(radix point implied, not explicit)
Twos complement properties (including sign change) hold here as well:
(01.011)2s-compl = (021) + (120) + (021) + (122) + (123) = +1.375
(11.011)2s-compl = (121) + (120) + (021) + (122) + (123) = 0.625
Oct. 2014
Slide 19
Fixed-Point 2s-Complement Numbers

1.111
.125
1.110
0.000
+0
.25
0.001
+.125
0.010
+.25
1.101
0.011
.375
1.011
.5
1.100
+.5
0.100
+.625
.625
0.101
+.75
.75
1.010
+.375
.875
1.001
1
1.000
+.875
0.110
0.111
Figure 9.7 Schematic representation of 4-bit 2s-complement

encoding for (1 + 3)-bit fixed-point numbers in the range [1, +7/8].
Oct. 2014
Slide 20
Radix Conversion for Fixed-Point Numbers

Convert the whole and fractional parts separately.
To convert the fractional part from an old radix r to a new radix R:
Perform arithmetic in the new radix R

Evaluate a polynomial in r 1: (.011)two = 0 21 + 1 22 + 1 23
Simpler: View the fractional part as integer, convert, divide by r l
(.011)two = (?)ten
Multiply by 8 to make the number an integer: (011)two = (3)ten
Thus, (.011)two = (3 / 8)ten = (.375)ten
Perform arithmetic in the old radix r

Multiply the given fraction by R, use the whole part as the MSD
and the fractional part to repeat the process
(.72)ten = (?)two
0.72 2 = 1.44, so the answer begins with 0.1
0.44 2 = 0.88, so the answer begins with 0.10
Oct. 2014
Slide 21
9.6 Floating-Point Numbers

Useful for applications where very large and very small
numbers are needed simultaneously
Fixed-point representation must sacrifice precision
for small values to represent large values
x = (0000 0000 . 0000 1001)two
Small number
y = (1001 0000 . 0000 0000)two
Large number
Neither y2 nor y / x is representable in the format above
Floating-point representation is like scientific notation:
20 000 000 = 2 10 7
+0.000 000 007 = +7 109
Sign
Oct. 2014
Significand
Exponent base
Exponent
Also, 7E9
Slide 22
ANSI/IEEE Standard Floating-Point Format (IEEE 754)

Revision (IEEE 754R) was completed in 2008: The revised version includes
16-bit and 128-bit binary formats, as well as 64- and 128-bit decimal formats
Short (32-bit) format

8 bits,
bias = 127,
126 to 127
23 bits for fractional part

(plus hidden 1 in integer part)
Sign Exponent
11 bits,
bias = 1023,
1022 to 1023
Short exponent range is 127 to 128

but the two extreme values
are reserved for special operands
(similarly for the long format)
Significand
Long (64-bit) format
Figure 9.8
Oct. 2014
The two ANSI/IEEE standard floating-point formats.

Slide 23
Short and Long IEEE 754 Formats: Features

Table 9.1 Some features of ANSI/IEEE standard floating-point formats
Feature
Word width in bits
Significand in bits
Significand range
Exponent bits
Exponent bias
Zero (0)
Denormal
Single/Short
32
23 + 1 hidden
[1, 2 223]
8
127
e + bias = 0, f = 0
e + bias = 0, f 0
represents 0.f 2126
e + bias = 255, f = 0
e + bias = 255, f 0
e + bias [1, 254]
e [126, 127]
represents 1.f 2e
Double/Long
64
52 + 1 hidden
[1, 2 252]
11
1023
e + bias = 0, f = 0
e + bias = 0, f 0
represents 0.f 21022
e + bias = 2047, f = 0
e + bias = 2047, f 0
e + bias [1, 2046]
e [1022, 1023]
represents 1.f 2e
min
2126 1.2 1038
21022 2.2 10308
max
2128 3.4 1038
21024 1.8 10308
Subnormal
Infinity ()
Not-a-number (NaN)
Ordinary number
Oct. 2014
Slide 24
10 Adders and Simple ALUs

Addition is the most important arith operation in computers:
Even the simplest computers must have an adder
An adder, plus a little extra logic, forms a simple ALU
10.1 Simple Adders
10.2 Carry Propagation Networks
10.3 Counting and Incrementation
10.4 Design of Fast Adders
10.5 Logic and Shift Operations
10.6 Multifunction ALUs
Oct. 2014
Slide 25
10.1 Simple Adders

Inputs
x
Outputs
s
c
0
0
1
1
0
1
0
1
0
0
0
1
Inputs
Digit-set interpretation:
{0, 1} + {0, 1}
= {0, 2} + {0, 1}
HA
0
1
1
0
Outputs
cin
cout
0
0
0
0
1
1
1
1
0
0
1
1
0
0
1
1
0
1
0
1
0
1
0
1
0
0
0
1
0
1
1
1
0
1
1
0
1
0
0
1
Figures 10.1/10.2
Oct. 2014
cout
FA
cin
Digit-set interpretation:
{0, 1} + {0, 1} + {0, 1}
= {0, 2} + {0, 1}
Binary half-adder (HA) and full-adder (FA).
Slide 26
Full-Adder Implementations
x
y
HA
c out
HA
c in
x
y
c out
s
(a) FA built of two HAs
x
y
c out
0
1
2
3
0
1
2
3
c in
s
(b) CMOS mux-based FA
c in
s
(c) Two-level AND-OR FA
Figure10.3 Full adder implemented with two half-adders, by means

of two 4-input multiplexers, and as two-level gate network.
Oct. 2014
Slide 27
Ripple-Carry Adder: Slow But Simple
x31
c32
cout
y31
FA
Oct. 2014
c31
. . .
c2
y1
FA
x0
c1
y0
FA
Critical path
s31
Figure 10.4
x1
s1
c0
cin
s0
Ripple-carry binary adder with 32-bit inputs and output.
Slide 28
Carry Chains and Auxiliary Signals

Bit positions
15 14 13 12
----------1 0 1 1
11 10 9 8
----------0 1 1 0
7 6 5 4
----------0 1 1 0
cout 0 1 0 1
1 0 0 1
1 1
\__________/\__________________/
4
6
g = xy
Oct. 2014
Carry chains and their lengths
3 2 1 0
----------1 1 1 0
0
0 0 1 1 cin
\________/\____/
3
2
p=xy
Slide 29
Carry Chains Illustrated with Dominoes
Oct. 2014
Slide 30
10.2 Carry Propagation Networks

gi pi
Carry is:
0
0
1
1
annihilated or killed
propagated
generated
(impossible)
0
1
0
1
g k1 p k1
xi
g k2 p k2
yi
gi = xi yi
pi = xi yi
g i+1 p i+1
gi
pi
...
...
g1 p1
g0 p0
c0
Carry network
ck
c k1
...
c k2
ci
c i+1
...
c1
c0
si
Figure 10.5 The main part of an adder is the carry network. The rest
is just a set of gates to produce the g and p signals and the sum bits.
Oct. 2014
Slide 31
Ripple-Carry Adder Revisited

The carry recurrence: ci+1 = gi pi ci
Latency of k-bit adder is roughly 2k gate delays:
1 gate delay for production of p and g signals, plus
2(k 1) gate delays for carry propagation, plus
1 XOR gate delay for generation of the sum bits
gk1 pk1
gk2 pk2
g1
p1
...
ck
ck1
Figure 10.6
Oct. 2014
ck2
c2
c1
g0
p0
c0
The carry propagation network of a ripple-carry adder.

Slide 32
The Complete Design of a Ripple-Carry Adder

gi pi
Carry is:
0
0
1
1
propagated
generated
(impossible)
0
1
0
1
g k1 p k1
gk1 pk1
ck
ck
xi
g k2 p k2
c k1
gi = xi yi
pi = xi yi
g i+1 p i+1
gi
pi
...
...
gk2 pk2
Carry network
ck2
...
c k2
g1 p1
g1
.
ck1
yi
ci
c i+1
p1
.
c2
c1
...
c1
g0 p0
g0
c0
p0
c0
c0
si
Figure 10.6 (ripple-carry network) superimposed on

Figure 10.5 (general structure of an adder).
Oct. 2014
Slide 33
First Carry Speed-Up Method: Carry Skip

g4j+3 p4j+3
c4j+4
c4j+3
g4j+2 p4j+2
c4j+2
g4j+1
p4j+1
c4j+1
g4j
p4j
c4j
One-way street
Freeway
Figures 10.7/10.8
A 4-bit section of a ripple-carry network
with skip paths and the driving analogy.
Oct. 2014
Slide 34
Mux-Based Skip Carry Logic

Fig. 10.7
g4j+3 p4j+3
c4j+4
g4j+2 p4j+2
c4j+3
g4j+3 p4j+3
c4j+4
0
1c4j+4
p[4j, 4j+3]
c4j+3
g4j+1
c4j+2
g4j+2 p4j+2
p4j+1
g4j
p4j
c4j+1
g4j+1
c4j+2
p4j+1
c4j
g4j
p4j
c4j+1
c4j
The carry-skip adder of Fig. 10.7 works fine if we begin with a

clean slate, where all signals are 0s; otherwise, it will run into
problems, which do not exist in this mux-based implementation
Oct. 2014
Slide 35
10.3 Counting and Incrementation

k
/
k
/
Data in
IncrInit
0
1
k
/
Count
register
D
k
/
c in
_
C
Adder /
k
Update
a
(Increment
amount)
Figure 10.9
Oct. 2014
c out
Schematic diagram of an initializable synchronous counter.

Slide 36
Circuit for Incrementation by 1

Figure 10.6
Substantially
simpler than
an adder
gk1 pk1
ck
xk1
ck
0g1
gk2 pk2
sk1
x1
...
ck1
ck2
c2
xk2
ck1
p1
ck2
sk2
c2
s2
p0
x0
c1
s1
s0
Figure 10.10 Carry propagation network

and sum logic for an incrementer.
Oct. 2014
x0
c0
c1
x1
...
0g0
Slide 37
10.4 Design of Fast Adders

Carries can be computed directly without propagation
For example, by unrolling the equation for c3, we get:
c3 = g2 p2 c2 = g2 p2 g1 p2 p1 g0 p2 p1 p0 c0
We define generate and propagate signals for a block

extending from bit position a to bit position b as follows:
g[a,b] = gb pb gb1 pb pb1 gb2 . . . pb pb1 pa+1 ga
p[a,b] = pb pb1 . . . pa+1 pa
Combining g and p signals for adjacent blocks:

g[h,j] = g[i+1,j] p[i+1,j] g[h,i]
p[h,j] = p[i+1,j] p[h,i]
i+1 i
[h, j] = [i + 1, j] [h, i]
Oct. 2014
Slide 38
Carries as Generate Signals for Blocks [ 0, i ]

gi pi
Carry is:
0
0
1
1
propagated
generated
(impossible)
0
1
0
1
g k1 p k1
xi
g k2 p k2
yi
Assuming c0 = 0,
we have ci = g [0,i 1]
g i+1 p i+1
gi
pi
...
...
g1 p1
g0 p0
c0
Carry network
ck
c k1
Figure 10.5
Oct. 2014
...
c k2
ci
c i+1
...
c1
c0
si
Slide 39
Second Carry Speed-Up Method: Carry Lookahead

[7, 7 ]
[6, 6 ]
[5, 5 ]
[4, 4 ] [3, 3 ]
[2, 2 ]
[1, 1 ]
[6, 7 ]
[0, 0 ]
g [1, 1] p [1, 1]
g [0, 0]
p [0, 0]
[2, 3 ]
[4, 5 ]
[0, 1 ]
[4, 7 ]
[0, 3 ]
[0, 7 ]
[0, 6 ]
[0, 5 ]
[0, 4 ] [0, 3 ]
[0, 2 ]
g [0, 1] p [0, 1]
[0, 1 ]
[0, 0 ]
Figure 10.11 Brent-Kung lookahead carry network for an 8-digit adder,

along with details of one of the carry operator blocks.
Oct. 2014
Slide 40
Recursive Structure of Brent-Kung Carry Network

[7, 7]
[6, 6]
[5, 5]
[4, 4]
[3, 3]
[2, 2]
[1, 1]
[0, 0]
[7, 7 ]
[6, 6 ]
[5, 5 ]
[4, 4 ] [3, 3 ]
[2, 2 ]
[6, 7 ]
[2, 3 ]
[0, 1 ]
[4, 7 ]
[0, 3 ]
[0, 6]
[0, 5]
[0, 4]
[0, 7 ]
[0, 7]
[0, 0 ]
[4, 5 ]
4-input Brent-Kung
carry network
[1, 1 ]
[0, 3]
[0, 2]
[0, 1]
[0, 6 ]
[0, 5 ]
[0, 4 ] [0, 3 ]
[0, 2 ]
[0, 1 ]
[0, 0 ]
[0, 0]
Figure 10.12 Brent-Kung lookahead carry network for an 8-digit adder,

with only its top and bottom rows of carry-operators shown.
Oct. 2014
Slide 41
An Alternate Design: Kogge-Stone Network
c8 = g [0,7]
c7 = g [0,6]
c6 = g [0,5]
c5 = g [0,4]
c4 = g [0,3]
c3 = g [0,2]
c2 = g [0,1]
c1 = g [0,0]
Kogge-Stone lookahead carry network for an 8-digit adder.

Oct. 2014
Slide 42
Brent-Kung vs. Kogge-Stone Carry Network

[7, 7 ]
[6, 6 ]
[5, 5 ]
[4, 4 ]
[3, 3 ]
[2, 2 ]
[1, 1 ]
[6, 7 ]
[0, 0 ]
g [1, 1] p [1, 1]
g [0, 0]
p [0, 0]
[2, 3 ]
[0, 1 ]
[4, 5 ]
[4, 7 ]
[0, 3 ]
[0, 7 ]
[0, 6 ]
[0, 5 ]
[0, 4 ]
[0, 3 ]
[0, 2 ]
g [0, 1] p [0, 1]
[0, 1 ]
11 carry operators
4 levels
Oct. 2014
[0, 0 ]
17 carry operators
3 levels
Slide 43
Carry-Lookahead Logic with 4-Bit Block

p i+3
g i+3
p i+2
g i+2
p i+1 g i+1 p i
gi
g [i, i+3]
Intermeidte carries
p [i, i+3]
Block signal generation
ci
c i+3
c i+2
c i+1
Figure 10.13
Blocks needed in the design of carry-lookahead adders
with four-way grouping of bits.
Oct. 2014
Slide 44
Third Carry Speed-Up Method: Carry Select

Allows doubling of adder width with a single-mux additional delay
x
c out
Adder
c in
Version 0
of sum bits
c out
Figure 10.14
Oct. 2014
[a, b]
Adder
[a, b]
c in
Version 1
of sum bits
a
[a, b]
The lower
a positions,
(0 to a 1)
are added
as usual
Carry-select addition principle.

Slide 45
10.5 Logic and Shift Operations

Conceptually, shifts can be implemented by multiplexing
Right-shifted
values
RightLeft
Shift amount
5
00, x[30, 2]
0, x[31, 1]
x[31, 0]
32
32 32
6-bit code specifying

shift direction & amount
Figure 10.15
x[0], 00...0
x[1, 0], 00...0
x[30, 0], 0
x[31, 0]
. . .
32
32
31
32
32
33
. . .
32
62
32
63
Multiplexer
Oct. 2014
00...0, x[31]
Left-shifted
values
32
Multiplexer-based logical shifting unit.

Slide 46
Arithmetic Shifts
Purpose: Multiplication and division by powers of 2
sra $t0,$s1,2
srav $t0,$s1,$s0
op
31
25
20
rt
15
rd
10
sh
fn
0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 1
ALU
instruction
op
31
rs
# $t0 ($s1) right-shifted by 2

# $t0 ($s1) right-shifted by ($s0)
Unused
25
rs
Source
register
20
rt
Destination
register
15
rd
Shift
amount
10
sh
sra = 3
fn
0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1
ALU
instruction
Figure 10.16
Oct. 2014
Amount
register
Source
register
Destination
register
Unused
srav = 7
The two arithmetic shift instructions of MiniMIPS.

Slide 47
Practical Shifting in Multiple Stages

00
01
10
11
No shift
Logical left
Logical right
Arith right
x[31], x[31, 1]
0, x[31, 1]
x[30, 0], 0
x[31, 0]
32
32
32
32
0
y[31, 0]
0
2
(0 or 2)-bit shift
z[31, 0]
Multiplexer
0
32
2
(a) Single-bit shifter
Figure 10.17
Oct. 2014
(0 or 4)-bit shift
(0 or 1)-bit shift
(b) Shifting by up to 7 bits
Multistage shifting in a barrel shifter.

Slide 48
Bit Manipulation via Shifts and Logical Operations

Bits 10-15
AND with mask to isolate a field: 0000 0000 0000 0000 1111 1100 0000 0000
Right-shift by 10 positions to move field to the right end of word
The result word ranges from 0 to 63, depending on the field pattern
32-pixel (4 8) block of
black-and-white image:
Row 0
Row 1
Row 2
Row 3
Representation
as 32-bit word:
1010 0000 0101 1000 0000 0110 0001 0111
Hex equivalent:
0xa0a80617
Figure 10.18 A 4 8 block of a black-and-white

image represented as a 32-bit word.
Oct. 2014
Slide 49
10.6 Multifunction ALUs

Arith fn (add, sub, . . .)
Operand 1
Operand 2
Arith
unit
Logic
unit
Result
Select fn type
(logic or arith)
Logic fn (AND, OR, . . .)

General structure of a simple arithmetic/logic unit.
Oct. 2014
Slide 50
ConstVar
Shift function
Constant
5
amount
Amount
Variable
amount
00
01
10
11
No shift
Logical left
Logical right
Arith right
Shifter
Function
class
32
5 LSBs
Shifted y
Adder
y
0 or 1
c0
32
32
k
/
c 31
xy
Shift
Set less
Arithmetic
Logic
2
0
Shorthand
symbol
for ALU
MSB
32
32
00
01
10
11
An ALU for
MiniMIPS
Control
c 32
x
Func
AddSub
s
ALU
Logic
unit
AND
OR
XOR
NOR
00
01
10
11
32input
NOR
Ovfl
Zero
Logic function
Zero
Ovfl
Figure 10.19 A multifunction ALU with 8 control signals (2 for

function class, 1 arithmetic, 3 shift, 2 logic) specifying the operation.
Oct. 2014
Slide 51
11 Multipliers and Dividers

Modern processors perform many multiplications & divisions:
Encryption, image compression, graphic rendering
Hardware vs programmed shift-add/sub algorithms
11.1 Shift-Add Multiplication
11.2 Hardware Multipliers
11.3 Programmed Multiplication
11.4 Shift-Subtract Division
11.5 Hardware Dividers
11.6 Programmed Division
Oct. 2014
Slide 52
11.1 Shift-Add Multiplication

Multiplicand
Multiplier
x
y
Partial
products
bit-matrix
y0
y
1
y2
y3
Product
Figure 11.1
x
x
x
x
20
21
22
23
Multiplication of 4-bit numbers in dot notation.
z(j+1) = (z(j) + yj x 2k) 21 with z(0) = 0 and z(k) = z

| add |
| shift right |
Oct. 2014
Slide 53
Binary and Decimal Multiplication

Position
7 6 5 4 3 2 1 0
=========================
x24
1 0 1 0
y
0 0 1 1
=========================
z (0)
0 0 0 0
4
+y0x2
1 0 1 0
2z (1)
0 1 0 1 0
(1)
z
0 1 0 1 0
4
+y1x2
1 0 1 0
2z (2)
0 1 1 1 1 0
(2)
z
0 1 1 1 1 0
4
+y2x2
0 0 0 0
2z (3)
0 0 1 1 1 1 0
(3)
z
0 0 1 1 1 1 0
4
+y3x2
0 0 0 0
2z (4)
0 0 0 1 1 1 1 0
(4)
z
0 0 0 1 1 1 1 0
=========================
Figure 11.2
Oct. 2014
Example 11.1
Position
7 6 5 4 3 2 1 0
=========================
x104
3 5 2 8
y
4 0 6 7
=========================
z (0)
0 0 0 0
4
+y0x10 2 4 6 9 6
10z (1)
2 4 6 9 6
(1)
z
0 2 4 6 9 6
4
+y1x10 2 1 1 6 8
10z (2)
2 3 6 3 7 6
(2)
z
2 3 6 3 7 6
4
+y2x10 0 0 0 0 0
10z (3)
0 2 3 6 3 7 6
(3)
z
0 2 3 6 3 7 6
4
+y3x10 1 4 1 1 2
10z (4)
1 4 3 4 8 3 7 6
(4)
z
1 4 3 4 8 3 7 6
=========================
Step-by-step multiplication examples for 4-digit unsigned numbers.

Slide 54
Twos-Complement Multiplication
Position
7 6 5 4 3 2 1 0
=========================
x24
1 0 1 0
y
0 0 1 1
=========================
z (0)
0 0 0 0 0
4
+y0x2
1 1 0 1 0
2z (1)
1 1 0 1 0
(1)
z
1 1 1 0 1 0
4
+y1x2
1 1 0 1 0
2z (2)
1 0 1 1 1 0
(2)
z
1 1 0 1 1 1 0
4
+y2x2
0 0 0 0 0
2z (3)
1 1 0 1 1 1 0
(3)
z
1 1 1 0 1 1 1 0
4
+(y3x2 ) 0 0 0 0 0
2z (4)
1 1 1 0 1 1 1 0
(4)
z
1 1 1 0 1 1 1 0
=========================
Figure 11.3
Oct. 2014
Example 11.2
Position
7 6 5 4 3 2 1 0
=========================
x24
1 0 1 0
y
1 0 1 1
=========================
z (0)
0 0 0 0 0
4
+y0x2
1 1 0 1 0
2z (1)
1 1 0 1 0
(1)
z
1 1 1 0 1 0
4
+y1x2
1 1 0 1 0
2z (2)
1 0 1 1 1 0
(2)
z
1 1 0 1 1 1 0
4
+y2x2
0 0 0 0 0
2z (3)
1 1 0 1 1 1 0
(3)
z
1 1 1 0 1 1 1 0
4
+(y3x2 ) 0 0 1 1 0
2z (4)
0 0 0 1 1 1 1 0
(4)
z
0 0 0 1 1 1 1 0
=========================
Step-by-step multiplication examples for 2s-complement numbers.

Slide 55
11.2 Hardware Multipliers

Shift
Multiplier y
Doublewidth partial product z (j)
Hi
Shift
Lo
Multiplicand x
Mux
yj
Enable
Select
c out
Figure 11.4
Oct. 2014
Adder
c in
AddSub
Hardware multiplier based on the shift-add algorithm.

Slide 56
The Shift Part of Shift-Add

From adder
cout
Sum
/k
Partial product
/k1
Multiplier
/k1
/k
To adder
yj
Figure11.5
Shifting incorporated in the connections to
the partial product register rather than as a separate phase.
Oct. 2014
Slide 57
High-Radix Multipliers
Multiplicand
Multiplier
x
y
0, x, 2x, or 3x
Product
Radix-4 multiplication in dot notation.
z(j+1) = (z(j) + yj x 2k) 41 with z(0) = 0 and z(k/2) = z

| add |
Assume k even
| shift right |
Oct. 2014
Slide 58
Tree Multipliers
All partial products
Several partial products
...
...
Large tree of
carry-save
adders
Logdepth
Adder
Logdepth
Product
(a) Full-tree multiplier
Figure 11.6
Oct. 2014
Small tree of
carry-save
adders
Adder
Product
(b) Partial-tree multiplier
Schematic diagram for full/partial-tree multipliers.

Slide 59
Array Multipliers
0
s
x3 0
0
x2 0
0
x1 0
0
x0 0
MA
MA
MA
MA
0
0
Figure 9.3a
(Recalling
carry-save
addition)
s
MA
y0
c
MA
MA
MA
y1
z0
0
MA
MA
MA
MA
y2
z1
0
MA
MA
MA
MA
y3
z2
0
z3
FA
Figure 11.7
Oct. 2014
Our original
dot-notation
representing
multiplication
FA
FA
HA
z7
z6
z5
Straightened
dots to depict
array multiplier
to the left
z4
Array multiplier for 4-bit unsigned operands.
Slide 60
11.3 Programmed Multiplication

MiniMIPS instructions related to multiplication
mult
multu
mfhi
mflo
$s0,$s1
$s2,$s3
$t0
$t1
#
#
#
#
set
set
set
set
Hi,Lo to ($s0)($s1); signed

Hi,Lo to ($s2)($s3); unsigned
$t0 to (Hi)
$t1 to (Lo)
Example 11.3
Finding the 32-bit product of 32-bit integers in MiniMIPS
Multiply; result will be obtained in Hi,Lo
For unsigned multiplication:
Hi should be all-0s and Lo holds the 32-bit result
For signed multiplication:
Hi should be all-0s or all-1s, depending on the sign bit of Lo
Oct. 2014
Slide 61
Emulating a Hardware Multiplier in Software

Example 11.4 (MiniMIPS shift-add program for multiplication)
Shift
$a1
(multiplier
Multiplier
y y)
$v0Doublewidth
(Hi part of z)partial
$v1product
(Lo part zof(j)z)
$t0 (carry-out)
$a0Multiplicand
(multiplicandx x)
Also, holds
LSB of Hi
during shift
Shift
Mux
$t1 (bit j of y)
yj
Enable
Select
$t2 (counter)
c out
Adder
c in
AddSub
Part of the
control in
hardware
Figure 11.8 Register usage for programmed multiplication

superimposed on the block diagram for a hardware multiplier.
Oct. 2014
Slide 62
Multiplication When There Is No Multiply Instruction

Example 11.4 (MiniMIPS shift-add program for multiplication)
shamu: move
move
addi
mloop: move
move
srl
subu
subu
beqz
addu
sltu
noadd: move
srl
subu
subu
sll
addu
srl
sll
addu
addi
bne
jr
Oct. 2014
$v0,$zero
$vl,$zero
$t2,$zero,32
$t0,$zero
$t1,$a1
$a1,1
$t1,$t1,$a1
$t1,$t1,$a1
$t1,noadd
$v0,$v0,$a0
$t0,$v0,$a0
$t1,$v0
$v0,1
$t1,$t1,$v0
$t1,$t1,$v0
$t0,$t0,31
$v0,$v0,$t0
$v1,1
$t1,$t1,31
$v1,$v1,$t1
$t2,$t2,-1
$t2,$zero,mloop
$ra
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
initialize Hi to 0
initialize Lo to 0
init repetition counter to 32
set c-out to 0 in case of no add
copy ($a1) into $t1
halve the unsigned value in $a1
subtract ($a1) from ($t1) twice to
obtain LSB of ($a1), or y[j], in $t1
no addition needed if y[j] = 0
add x to upper part of z
form carry-out of addition in $t0
copy ($v0) into $t1
halve the unsigned value in $v0
subtract ($v0) from ($t1) twice to
obtain LSB of Hi in $t1
carry-out converted to 1 in MSB of $t0
right-shifted $v0 corrected
halve the unsigned value in $v1
LSB of Hi converted to 1 in MSB of $t1
right-shifted $v1 corrected
decrement repetition counter by 1
if counter > 0, repeat multiply loop
return to the calling program
Slide 63
11.4 Shift-Subtract Division

Divisor
Subtracted
bit-matrix
z
Dividend
y 3 x 23
y 2 x 22
y 1 x 21
y 0 x 20
s
Figure11.9
Quotient
Remainder
Division of an 8-bit number by a 4-bit number in dot notation.
z(j) = 2z(j1) ykj x 2k with z(0) = z and z(k) = 2k s

|shift|
| subtract |
Oct. 2014
Slide 64
Integer and Fractional Unsigned Division

Position
7 6 5 4 3 2 1 0
=========================
z
0 1 1 1 0 1 0 1
4
x2
1 0 1 0
=========================
z (0)
0 1 1 1 0 1 0 1
(0)
2z
0 1 1 1 0 1 0 1
4
y3x2
1 0 1 0
y3=1
z (1)
0 1 0 0 1 0 1
(1)
2z
0 1 0 0 1 0 1
4
y2x2
0 0 0 0
y2=0
z (2)
1 0 0 1 0 1
(2)
2z
1 0 0 1 0 1
4
y1x2
1 0 1 0
y1=1
z (3)
1 0 0 0 1
(3)
2z
1 0 0 0 1
4
y0x2
1 0 1 0
y0=1
z (4)
0 1 1 1
s
0 1 1 1
y
1 0 1 1
=========================
Figure 11.10
Oct. 2014
Example 11.5
Position
1 2 3 4 5 6 7 8
==========================
z
.1 4 3 5 1 5 0 2
x
.4 0 6 7
==========================
z (0)
.1 4 3 5 1 5 0 2
(0)
10z
1.4 3 5 1 5 0 2
y1x
1.2 2 0 1
y1=3
z (1)
.2 1 5 0 5 0 2
(1)
10z
2.1 5 0 5 0 2
y2x
2.0 3 3 5
y2=5
z (2)
.1 1 7 0 0 2
(2)
10z
1.1 7 0 0 2
y3x
0.8 1 3 4
y3=2
z (3)
.3 5 6 6 2
(3)
10z
3.5 6 6 2
y4x
3.2 5 3 6
y4=8
z (4)
.3 1 2 6
s
.0 0 0 0 3 1 2 6
y
.3 5 2 8
==========================
Division examples for binary integers and decimal fractions.

Slide 65
Division with Same-Width Operands

Position
7 6 5 4 3 2 1 0
=========================
z
0 0 0 0 1 1 0 1
4
x2
0 1 0 1
=========================
z (0)
0 0 0 0 1 1 0 1
(0)
2z
0 0 0 1 1 0 1
4
y3x2
0 0 0 0
y3=0
z (1)
0 0 0 1 1 0 1
(1)
2z
0 0 1 1 0 1
4
y2x2
0 0 0 0
y2=0
z (2)
0 0 1 1 0 1
(2)
2z
0 1 1 0 1
4
y1x2
0 1 0 1
y1=1
z (3)
0 0 0 1 1
(3)
2z
0 0 1 1
4
y0x2
1 0 1 0
y0=0
z (4)
0 0 1 1
s
0 0 1 1
y
0 0 1 0
=========================
Figure 11.11
Oct. 2014
Example 11.6
Position
1 2 3 4 5 6 7 8
==========================
z
.0 1 0 1
x
.1 1 0 1
==========================
z (0)
.0 1 0 1
(0)
2z
0.1 0 1 0
y1x
0.0 0 0 0
y1=0
z (1)
.1 0 1 0
(1)
2z
1.0 1 0 0
y2x
0.1 1 0 1
y2=1
z (2)
.0 1 1 1
(2)
2z
0.1 1 1 0
y3x
0.1 1 0 1
y3=1
z (3)
.0 0 0 1
(3)
2z
0.0 0 1 0
y4x
0.0 0 0 0
y4=0
z (4)
.0 0 1 0
s
.0 0 0 0 0 0 1 0
y
.0 1 1 0
==========================
Division examples for 4/4-digit binary integers and fractions.

Slide 66
Signed Division
Method 1 (indirect): strip operand signs, divide, set result signs
Dividend
z=5
z=5
z = 5
z = 5
Divisor
x=3
x = 3
x=3
x = 3
Quotient
y=1
y = 1
y = 1
y=1
Remainder
s=2
s=2
s = 2
s = 2
Method 2 (direct 2s complement): develop quotient with digits

1 and 1, chosen based on signs, convert to digits 0 and 1
Restoring division: perform trial subtraction, choose 0 for q digit
if partial remainder negative
Nonrestoring division: if sign of partial remainder is correct,
then subtract (choose 1 for q digit) else add (choose 1)
Oct. 2014
Slide 67
11.5 Hardware Dividers

Shift
Quotient y
Partial rem ainder z (j) (initially z)
Hi
Shift
Lo
Mux 1
Load
Quotient
digit
selector
Divisor x
yk j
Enable
Select
out
Adder
Trial difference
Figure 11.12
Oct. 2014
in
1
(Always
subtract)
Hardware divider based on the shift-subtract algorithm.

Slide 68
The Shift Part of Shift-Subtract

qkj
From adder
/k
Partial remainder
/k
Quotient
/k
MSB
/k
To adder
Figure 11.13 Shifting incorporated in the connections to the
partial remainder register rather than as a separate phase.
Oct. 2014
Slide 69
High-Radix Dividers
x
Divisor
Quotient
Dividend
0, x, 2x, or 3x
s
Remainder
Radix-4 division in dot notation.
z(j) = 4z(j1) (yk2j+1 yk2j)two x 2k with z(0) = z and z(k/2) = 2ks

|shift|
Assume k even
| subtract |
Oct. 2014
Slide 70
Array Dividers
z7
y3
x3
z6
MS
x1
MS
x0
z5
MS
y2
x2
z4
MS
MS
z3
0
z2
d
MS
MS
MS
z1
y1
MS
MS
MS
MS
Our original
dot-notation
for division
z0
y0
MS
s3
MS
s2
Figure 11.14
Oct. 2014
MS
s1
MS
s0
Straightened
dots to depict
an array divider
Array divider for 8/4-bit unsigned integers.

Slide 71
11.6 Programmed Division

MiniMIPS instructions related to division
div
divu
mfhi
mflo
$s0,$s1
$s2,$s3
$t0
$t1
#
#
#
#
Lo = quotient, Hi = remainder
unsigned version of division
set $t0 to (Hi)
set $t1 to (Lo)
Example 11.7
Compute z mod x, where z (singed) and x > 0 are integers
Divide; remainder will be obtained in Hi
if remainder is negative,
then add |x| to (Hi) to obtain z mod x
else Hi holds z mod x
Oct. 2014
Slide 72
Emulating a Hardware Divider in Software

Example 11.8 (MiniMIPS shift-add program for division)
Shift
$a1
(quotient
Quotient
y y)
$v0
Partial
(Hi part
remainder
of z)
$v1
z (j) (initially
(Lo part of
z)z)
Shift
$t0 (MSB of Hi)
yk j
Load
$t1 (bit k j of y)
$a0
Divisor
(divisorx x)
Mux
Enable
Quotient
digit
selector
Select
$t2 (counter)
c out
Adder
Trial difference
c in
1
(Always
subtract)
Part of the
control in
hardware
Figure 11.15 Register usage for programmed division superimposed

on the block diagram for a hardware divider.
Oct. 2014
Slide 73
Division When There Is No Divide Instruction

Example 11.7 (MiniMIPS shift-subtract program for division)
shsdi: move
move
addi
dloop: slt
sll
slt
or
sll
sge
or
sll
or
beq
subu
nosub: addi
bne
move
jr
Oct. 2014
$v0,$a2
$vl,$a3
$t2,$zero,32
$t0,$v0,$zero
$v0,$v0,1
$t1,$v1,$zero
$v0,$v0,$t1
$v1,$v1,1
$t1,$v0,$a0
$t1,$t1,$t0
$a1,$a1,1
$a1,$a1,$t1
$t1,$zero,nosub
$v0,$v0,$a0
$t2,$t2,-1
$t2,$zero,dloop
$v1,$a1
$ra
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
initialize Hi to ($a2)
initialize Lo to ($a3)
initialize repetition counter to 32
copy MSB of Hi into $t0
left-shift the Hi part of z
copy MSB of Lo into $t1
move MSB of Lo into LSB of Hi
left-shift the Lo part of z
quotient digit is 1 if (Hi) x,
or if MSB of Hi was 1 before shifting
shift y to make room for new digit
copy y[k-j] into LSB of $a1
if y[k-j] = 0, do not subtract
subtract divisor x from Hi part of z
decrement repetition counter by 1
if counter > 0, repeat divide loop
copy the quotient y into $v1
return to the calling program
Slide 74
Divider vs Multiplier: Hardware Similarities

Shift
Quotient y
Partial rem ainder z (j) (initially z)
Shift
yk j
Multiplier y
Dou blewi dth pa rtial product z (j)
Load
Shift
Shift
Quotient
digit
selector
Divisor x
Mux
Enable
Multiplicand x
yj
Enable
Mux
Select
Select
Figure 11.12
c
out
Adder
in
Trial difference
x3 0
0
x2 0
0
MA
x1 0
0
MA
x0 0
z7
y3
MA
Adder
c out
1
(Always
subtract)
0
MA
Figure 11.4
y0
x3
z6
MS
x2
x1
z5
MS
c in
x0
z4
MS
MS
0
MA
MA
MA
MA
y1
z0
0
MA
MA
MA
MA
y2
z1
MS
MS
MS
MS
MS
MS
MS
0
MA
MA
MA
MA
y3
MS
z3
FA
FA
HA
z7
z6
z5
Oct. 2014
S traighten ed
dots to depic t
arr ay m ultiplier
to the left
Figure 11.14
z4
Our original
dot-notation
for division
z0
y0
MS
MS
MS
0
FA
z1
y1
MS
z2
z3
z2
y2
O ur o rigin al
dot-n otation
rep res entin g
m ultiplic ation
AddSub
s3
s2
s1
s0
Turn upside-down
0
Straightened
dots to depict
an array divider
Figure 11.7
Slide 75
12 Floating-Point Arithmetic
Floating-point is no longer reserved for high-end machines
Multimedia and signal processing require flp arithmetic
Details of standard flp format and arithmetic operations
12.1 Rounding Modes
12.2 Special Values and Exceptions
12.3 Floating-Point Addition
12.4 Other Floating-Point Operations
12.5 Floating-Point Instructions
12.6 Result Precision and Errors
Oct. 2014
Slide 76
12.1 Rounding Modes

Short (32-bit) format
8 bits,
bias = 127, (plus hidden 1 in integer part)
126 to 127
Sign Exponent
IEEE 754
Format
0, , NaN
Denormals:
0.f 2emin
1.f 2e
Significand
11 bits,
bias = 1023,
1022 to 1023

Denormals allow
graceful underflow
Long (64-bit) format
Negative numbers
max
min
FLP
Sparser
Overflow
region
Denser
Positive numbers
min +
FLP +
Denser
Figure 12.1
Oct. 2014
Underflow
example
Sparser
Underflow
regions
Midway
example
max +
Overflow
region
Typical
example
Overflow
example
Distribution of floating-point numbers on the real line.

Slide 77
Round-to-Nearest (Even)
rtnei(x)
rtni(x)
x
4
Oct. 2014
(a) Round to nearest even integer
Figure 12.2
x
2
(b) Round to nearest integer
Two round-to-nearest-integer functions for x in [4, 4].

Slide 78
Directed Rounding
ritni(x)
rutni(x)
x
4
x
4
(a) Round inward to nearest integer
(b) Round upward to nearest integer
Figure 12.3 Two directed round-to-nearest-integer functions for x in [4, 4].

Oct. 2014
Slide 79
12.2 Special Values and Exceptions

Zeros, infinities, and NaNs (not a number)
0 Biased exponent = 0, significand = 0 (no hidden 1)
Biased exponent = 255 (short) or 2047 (long), significand = 0
NaN Biased exponent = 255 (short) or 2047 (long), significand 0
Arithmetic operations with special operands
(+0) + (+0) = (+0) (0) = +0
(+0) (+5) = +0
(+0) / (5) = 0
(+) + (+) = +
x (+) =
(+) x = , depending on the sign of x
x / (+) = 0, depending on the sign of x
(+) = +
Oct. 2014
Slide 80
Exceptions
Undefined results lead to NaN (not a number)
(0) / (0) = NaN
(+) + () = NaN
(0) () = NaN
() / () = NaN
Arithmetic operations and comparisons with NaNs

NaN + x = NaN
NaN + NaN = NaN
NaN 0 = NaN
NaN NaN = NaN
NaN < 2 false

NaN = Nan false
NaN (+) true
NaN NaN true
Examples of invalid-operation exceptions

Addition:
Multiplication:
Division:
Square-root:
Oct. 2014
(+) + ()
0
0 0 or
Operand < 0
Slide 81
12.3 Floating-Point Addition

(2e1s1) + (2e1(s2 / 2e1e2)) = 2e1(s1 s2 / 2e1e2)
(2e2s2)
Numbers to be added:
x = 25 1.00101101
y = 21 1.11101101
Operands after alignment shift:
x = 25 1.00101101
y = 25 0.000111101101
Result of addition:
s = 25 1.010010111101
s = 25 1.01001100
Figure 12.4
Oct. 2014
Operand with
smaller exponent
to be preshifted
Extra bits to be
rounded off
Rounded sum
Alignment shift and rounding in floating-point addition.

Slide 82
Inp ut 1
Hardware for
Floating-Point
Addition
Inp ut 2
Unpack
Signs Exponents
Significands
AddSub
Mu x
Sub
Possible swap
& compleme nt
Align
significands
Control
& sign
logic
Add
Norma lize
& round
Figure 12.5
Simplified schematic of
a floating-point adder.
Sign
Exponent
Significand
Pack
Outp ut
Oct. 2014
Slide 83
12.4 Other Floating-Point Operations

Overflow
(underflow)
possible
Floating-point multiplication
(2e1s1) (2e2s2) = 2e1+ e2(s1 s2)

Product of significands in [1, 4)
If product is in [2, 4), halve to normalize (increment exponent)
Overflow
(underflow)
possible
Floating-point division
(2e1s1) / (2e2s2) = 2e1 e2(s1 / s2)

Ratio of significands in (1/2, 2)
If ratio is in (1/2, 1), double to normalize (decrement exponent)
Floating-point square-rooting
(2es)1/2 = 2e/2(s)1/2
= 2(e1)2(2s)1/2
Normalization not needed
Oct. 2014
when e is even
when e is odd
Slide 84
Hardware for
Floating-Point
Multiplication
and Division
Input 1
Input 2
Unpack
Signs Exponents
Significands
MulDiv
Multiply
or divide
Control
& sign
logic
Normalize
& round
Figure 12.6 Simplified

schematic of a floatingpoint multiply/divide unit.
Sign
Exponent
Significand
Pack
Output
Oct. 2014
Slide 85
12.5 Floating-Point Instructions

Floating-point arithmetic instructions for MiniMIPS:
add.s
sub.d
mul.d
div.s
neg.s
31
op
$f0,$f8,$f10
$f0,$f8,$f10
$f0,$f8,$f10
$f0,$f8,$f10
$f0,$f8
25
ex
20
#
#
#
#
#
ft
set
set
set
set
set
15
$f0
$f0
$f0
$f0
$f0
fs
to
to
to
to
to
10
($f8) +fp
($f8) fp
($f8) fp
($f8) /fp
($f8)
($f10)
($f10)
($f10)
($f10)
fd
fn
0 1 0 0 0 1 0 0 0 0 x 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 x x x
Floating-point
instruction
s=0
d=1
Source
register 2
Source
register 1
Destination
register
add.* = 0
sub.* = 1
mul.* = 2
div.* = 3
neg.* = 7
Figure 12.7 The common floating-point instruction format for

MiniMIPS and components for arithmetic instructions. The extension
(ex) field distinguishes single (* = s) from double (* = d) operands.
Oct. 2014
Slide 86
The Floating-Point Unit in MiniMIPS

...
m 2 32
Loc 0 Loc 4 Loc 8

4 B / location
Memory
Loc
Loc
m 8 m 4
up to 2 30 words
Coprocessor 1
...
EIU
$0
$1
$2
(Main proc.)
$31
ALU
Execution
& integer
unit
(Coproc. 1)
Integer
mul/div
FP
arith
Hi
Lo
FPU
$0
$1
$2
Floatingpoint unit
$31
Pairs of registers,
beginning with an
even-numbered
one, are used for
double operands
BadVaddr Trap &

(Coproc. 0) Status memory
Cause unit
EPC
TMU
Chapter
10
Chapter
11
Figure 5.1
Oct. 2014
Chapter
12
Memory and processing subsystems for MiniMIPS.

Slide 87
Floating-Point Format Conversions

MiniMIPS instructions for number format conversion:
cvt.s.w
cvt.d.w
cvt.d.s
cvt.s.d
cvt.w.s
cvt.w.d
31
op
$f0,$f8
$f0,$f8
$f0,$f8
$f0,$f8
$f0,$f8
$f0,$f8
25
ex
#
#
#
#
#
#
20
set
set
set
set
set
set
ft
$f0
$f0
$f0
$f0
$f0
$f0
15
to
to
to
to
to
to
fs
single(integer $f8)
double(integer $f8)
double($f8)
single($f8,$f9)
integer($f8)
integer($f8,$f9)
10
fd
fn
0 1 0 0 0 1 0 0 0 0 x 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 x x x
Floating-point
instruction
Figure 12.8
Oct. 2014
*.w = 0
w.s = 0
w.d = 1
*.* = 1
Unused
Source
register
Destination
register
To format:
s = 32
d = 33
w = 36
Floating-point instructions for format conversion in MiniMIPS.

Slide 88
Floating-Point Data Transfers

MiniMIPS instructions for floating-point load, store, and move:
lwc1
swc1
mov.s
mov.d
mfc1
mtc1
31
$f8,40($s3)
$f8,A($s3)
$f0,$f8
$f0,$f8
$t0,$f12
$f8,$t4
op
25
20
ft
load mem[40+($s3)] into $f8

store ($f8) into mem[A+($s3)]
load $f0 with ($f8)
load $f0,$f1 with ($f8,$f9)
load $t0 with ($f12)
load $f8 with ($t4)
15
fs
10
fd
fn
0 1 0 0 0 1 0 0 0 0 x 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0
Floating-point
instruction
31
ex
#
#
#
#
#
#
op
s=0
d=1
25
rs
Unused
20
rt
Source
register
15
rd
Destination
register
10
sh
mov.* = 6
fn
0 1 0 0 0 1 0 0 x 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Floating-point
instruction
Figure 12.9
Oct. 2014
mfc1 = 0
mtc1 = 4
Source
register
Destination
register
Unused
Unused
Instructions for floating-point data movement in MiniMIPS.

Slide 89
Floating-Point Branches and Comparisons

MiniMIPS instructions for floating-point load, store, and move:
bc1t
bc1f
c.eq.*
c.lt.*
c.le.*
31
L
L
$f0,$f8
$f0,$f8
$f0,$f8
op
25
20
branch on fp flag true

branch on fp flag false
if ($f0)=($f8), set flag to true
if ($f0)<($f8), set flag to true
if ($f0)($f8), set flag to true
rt
operand / offset
15
0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 x 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1
Floating-point
instruction
31
rs
#
#
#
#
#
op
bc1? = 8
25
ex
true = 1
false = 0
20
ft
Offset
Correction: 1 1 x x x 0
15
fs
10
fd
fn
0 1 0 0 0 1 0 0 0 0 x 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0
Floating-point
instruction
Figure 12.10
Oct. 2014
s=0
d=1
Source
register 2
Source
register 1
Unused
c.eq.* = 50
c.lt.* = 60
c.le.* = 62
Floating-point branch and comparison instructions in MiniMIPS.

Slide 90
Floating-Point
Instructions of
MiniMIPS
Copy
Table 12.1
Arithmetic
s/d for single/double

# 0/1 for single/double
Conversions
Memory access
Control transfer
Oct. 2014
Instruction
Usage
Move s/d registers

Move fm coprocessor 1
Move to coprocessor 1
Add single/double
Subtract single/double
Multiply single/double
Divide single/double
Negate single/double
Compare equal s/d
Compare less s/d
Compare less or eq s/d
Convert integer to single
Convert integer to double
Convert single to double
Convert double to single
Convert single to integer
Convert double to integer
Load word coprocessor 1
Store word coprocessor 1
Branch coproc 1 true
Branch coproc 1 false
mov.* fd,fs
mfc1 rt,rd
mtc1 rd,rt
add.* fd,fs,ft
sub.* fd,fs,ft
mul.* fd,fs,ft
div.* fd,fs,ft
neg.* fd,fs
c.eq.* fs,ft
c.lt.* fs,ft
c.le.* fs,ft
cvt.s.w fd,fs
cvt.d.w fd,fs
cvt.d.s fd,fs
cvt.s.d fd,fs
cvt.w.s fd,fs
cvt.w.d fd,fs
lwc1 ft,imm(rs)
swc1 ft,imm(rs)
bc1t L
bc1f L
ex fn
#
0
4
#
#
#
#
#
#
#
#
0
0
1
1
0
1
rs
rs
8
8
Slide 91
6
0
1
2
3
7
50
60
62
32
33
33
32
36
36
12.6 Result Precision and Errors

Example 12.4
Laws of algebra may not hold in floating-point arithmetic. For example,
the following computations show that the associative law of addition,
(a + b) + c = a + (b + c), is violated for the three numbers shown.
Numbers to be added first
a =-25 1.10101011
b = 25 1.10101110
Numbers to be added first

b = 25 1.10101110
c =-22 1.01100101
Compute a + b
25 0.00000011
a+b = 22 1.10000000
c =-22 1.01100101
Compute b + c (after preshifting c)

25 1.101010110011011
b+c = 25 1.10101011 (Round)
a =-25 1.10101011
Compute (a + b) + c
22 0.00011011
Sum = 26 1.10110000
Compute a + (b + c)
25 0.00000000
Sum = 0 (Normalize to special code for 0)
Oct. 2014
Slide 92
Error Control and Certifiable Arithmetic

Catastrophic cancellation in subtracting almost equal numbers:
Area of a needlelike triangle
A = [s(s a)(s b)(s c)]1/2
b
a
Possible remedies
Carry extra precision in intermediate results (guard digits):
commonly used in calculators
Use alternate formula that does not produce cancellation errors
Certifiable arithmetic with intervals
A number is represented by its lower and upper bounds [xl, xu]
Example of arithmetic: [xl, xu] +interval [yl, yu] = [xl +fp yl, xu +fp yu]
Oct. 2014
Slide 93
Evaluation of Elementary Functions

Approximating polynomials
ln x = 2(z + z3/3 + z5/5 + z7/7 + . . . ) where z = (x 1)/(x + 1)
ex = 1 + x/1! + x2/2! + x3/3! + x4/4! + . . .
cos x = 1 x2/2! + x4/4! x6/6! + x8/8! . . .
tan1 x = x x3/3 + x5/5 x7/7 + x9/9 . . .
Iterative (convergence) schemes
For example, beginning with an estimate for x1/2, the following
iterative formula provides a more accurate estimate in each step
q(i+1) = 0.5(q(i) + x/q(i))
Table lookup (with interpolation)
A pure table lookup scheme results in huge tables (impractical);
hence, often a hybrid approach, involving interpolation, is used.
Oct. 2014
Slide 94
Function Evaluation by Table Lookup

h bits
xH
Input x
k - h bits
xL
xL
Table
for a
f(x)
Table
for b
Best linear
approximation
in subinterval
Multiply
x
xH
Add
Output
Figure 12.12
Oct. 2014
f(x)
The linear approximation above is

characterized by the line equation
a + b x L , where a and b are
read out from tables based on x H
Function evaluation by table lookup and linear interpolation.

Slide 95

f37 Book Intarch Pres pt3 PDF

Caricato da

Informazioni sul documento

Descrizione originale:

Titolo originale

Copyright

Formati disponibili

Condividi questo documento

Condividi o incorpora il documento

Opzioni di condivisione

Hai trovato utile questo documento?

Questo contenuto è inappropriato?

Copyright:

Formati disponibili

f37 Book Intarch Pres pt3 PDF

Caricato da

Copyright:

Formati disponibili

Part III

The Arithmetic/Logic Unit

Computer Architecture, The Arithmetic/Logic Unit

About This Presentation

Computer Architecture, The Arithmetic/Logic Unit

III The Arithmetic/Logic Unit

Chapter 10 Adders and Simple ALUs

Computer Architecture, The Arithmetic/Logic Unit

Preview of Arithmetic Unit in the Data Path

Reg access / decode

Key elements of the single-cycle MicroMIPS data path.

Computer Arithmetic as a Topic of Study

Computer Architecture, The Arithmetic/Logic Unit

Computer Architecture, The Arithmetic/Logic Unit

9.1 Positional Number Systems

sticks or unary code

Fixed-radix positional representation with k digits

Value of a number: x = (xk1xk2 . . . x1x0)r = xi r i

Computer Architecture, The Arithmetic/Logic Unit

Unsigned Binary Integers

Inside: Natural number

Figure 9.1 Schematic representation of 4-bit code for

Computer Architecture, The Arithmetic/Logic Unit

Representation Range and Overflow

Overflow region max

Finite set of representable numbers

Figure 9.2 Overflow regions in finite number representation systems.

Computer Architecture, The Arithmetic/Logic Unit

9.2 Digit Sets and Encodings

Computer Architecture, The Arithmetic/Logic Unit

Computer Architecture, The Arithmetic/Logic Unit

The Notion of Carry-Save Addition

b. Adding two carry-save numbers.

Figure 9.3 Adding a binary number or another

Computer Architecture, The Arithmetic/Logic Unit

9.3 Number Radix Conversion

Perform arithmetic in the new radix R

Perform arithmetic in the old radix r

Computer Architecture, The Arithmetic/Logic Unit

Justifications for Radix Conversion Rules

Justifying one step of the conversion of x to radix 2.

9.4 Signed Integers

Computer Architecture, The Arithmetic/Logic Unit

Figure 9.5 Schematic representation of 4-bit 2s-complement

Computer Architecture, The Arithmetic/Logic Unit

Conversion from 2s-Complement to Decimal

Sign Change for a 2s-Complement Number

Computer Architecture, The Arithmetic/Logic Unit

Twos-Complement Addition and Subtraction

Binary adder used as 2s-complement adder/subtractor.

9.5 Fixed-Point Numbers

Computer Architecture, The Arithmetic/Logic Unit

Fixed-Point 2s-Complement Numbers

Figure 9.7 Schematic representation of 4-bit 2s-complement

Computer Architecture, The Arithmetic/Logic Unit

Radix Conversion for Fixed-Point Numbers

Perform arithmetic in the new radix R

Perform arithmetic in the old radix r

Computer Architecture, The Arithmetic/Logic Unit

9.6 Floating-Point Numbers

Computer Architecture, The Arithmetic/Logic Unit

ANSI/IEEE Standard Floating-Point Format (IEEE 754)