
Floating Point Arithmetic

Ravi Paruchuri
Haritha Talluri
Vasuki Mulukutla
Satish Gogisetty

MIPS floating-point arithmetic


Floating-point computations are vital for many
applications, but correct implementation of floating-point
hardware and software is very tricky.
We'll study the IEEE 754 standard for floating-point arithmetic.
How floating-point numbers are represented.
The basic addition algorithm.
MIPS support for IEEE 754 arithmetic.
We won't worry much about hardware implementations of these ideas.
09/18/15

Floating Point Arithm

A history lesson in a CS class!


In the past, each machine had its own implementation of floating-point arithmetic hardware and/or software.
It was impossible to write portable programs that would produce the same results on different systems.
Many strange tricks were needed to get correct answers out of some machines, such as Crays or the IBM System/370.
But nobody was even really sure what "correct" meant!
It wasn't until 1985 that the IEEE 754 standard was adopted.
Having a standard ensures that all compliant machines will produce the same outputs for the same program.
The standard is very complex and difficult to implement fully.

Floating-point representation

IEEE numbers are stored in a kind of binary scientific notation:

    +/- mantissa x 2^exponent

We represent a binary floating-point number with three fields:
A sign bit, s.
An exponent field, e.
A fraction field, f.
The standard defines several different precisions:
For single-precision numbers, e is 8 bits long and f is 23 bits long. This gives a total of 32 bits, including the sign bit:

    | s (1 bit) | e (8 bits) | f (23 bits) |

In double precision, e is 11 bits and f is 52 bits, for a total of 64 bits:

    | s (1 bit) | e (11 bits) | f (52 bits) |

There are also various extended precision formats; for example, Intel chips use an 80-bit format internally.
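These fields can be unpacked with ordinary bit operations. As an illustrative aside (Python, not part of the MIPS material), using the standard struct module to obtain the raw 32-bit pattern of a single-precision value:

```python
import struct

def fields(x):
    """Decompose a value into its IEEE 754 single-precision s, e, f fields."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]  # raw 32-bit pattern
    s = bits >> 31               # 1 sign bit
    e = (bits >> 23) & 0xFF      # 8 exponent bits
    f = bits & 0x7FFFFF          # 23 fraction bits
    return s, e, f

print(fields(1.0))   # (0, 127, 0)
```

For 1.0 this yields a zero sign bit, the actual exponent 0 stored as 0 + 127, and an all-zero fraction.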

Sign
The sign bit is 0 for positive numbers and 1 for negative numbers.
But unlike two's-complement integers, IEEE values are stored in sign-magnitude format.


Mantissa


The field f represents a binary fraction.
There is an implicit 1 to the left of the binary point.
For example, if f is 01101..., the actual mantissa would be 1.01101...
This is called a normalized number; there is exactly one nonzero digit to the left of the point.
Numbers have a unique normalized representation, even though there are many ways to write a number in regular scientific notation:

    2.32 x 10^2 = 23.2 x 10^1 = 0.232 x 10^3 = ...

A side effect is that we get a little more precision: there are really 24 bits of mantissa, the implicit 1 plus the 23 stored fraction bits.
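The extra implicit bit is easy to demonstrate. A small Python sketch (illustrative): the round-trip through struct rounds a value to single precision.

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack('>f', struct.pack('>f', x))[0]

# The implicit leading 1 means 24 significant bits, not 23:
print(f32(1 + 2**-23) == 1 + 2**-23)   # True: the 24th significant bit survives
print(f32(1 + 2**-24) == 1.0)          # True: one bit smaller is rounded away
```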

Exponent

    | s (1 bit) | e (8 bits) | f (23 bits) |

The e field represents the exponent as a biased number.
It stores the actual exponent plus 127 for single precision, or the actual exponent plus 1023 in double precision.
This essentially converts all possible exponents from -127 to +128 into unsigned numbers from 0 to 255.
Two examples:
If the exponent is 4, the e field will be 4 + 127 = 131 (10000011 in binary).
If e contains 01011101 (93 in decimal), the actual exponent is 93 - 127 = -34.
Storing a biased exponent before a normalized mantissa also means that floating-point values can be compared as if they were sign-magnitude integers.
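Bias arithmetic is just addition and subtraction; a quick Python check of the two examples above:

```python
BIAS = 127                      # 1023 for double precision

def encode_exp(actual):         # actual exponent -> stored e field
    return actual + BIAS

def decode_exp(e):              # stored e field -> actual exponent
    return e - BIAS

print(encode_exp(4))            # 131, which is 10000011 in binary
print(decode_exp(0b01011101))   # -34
```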

Converting an IEEE 754 number to decimal

The decimal value of an IEEE number is:

    (1 - 2s) x (1 + f) x 2^(e - bias)

Here, the s, e and f fields are assumed to be converted to decimal.
(1 - 2s) is 1 or -1, depending on whether the sign bit is 0 or 1.
We add the implicit 1 to the fraction field f.
Again, the bias is either 127 or 1023, for single or double precision.

Example IEEE-decimal conversion

Let's find the decimal value of the following IEEE number:

    1 | 01111100 | 11000000000000000000000

First convert each individual field to decimal.
The sign bit s is 1.
The e field contains 01111100 = 124.
The mantissa is 0.11000... = 0.75.
Then just plug these decimal values of s, e and f into the formula:

    (1 - 2s) x (1 + f) x 2^(e - bias)

This gives us (1 - 2) x (1 + 0.75) x 2^(124 - 127) = -1.75 x 2^-3 = -0.21875.
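As a sanity check, the same conversion in Python (illustrative), comparing the slide's formula against the machine's own decoding of the identical 32-bit pattern:

```python
import struct

# The slide's bit pattern: s = 1, e = 01111100, f = 1100...0
bits = 0b1_01111100_11000000000000000000000

s = bits >> 31
e = (bits >> 23) & 0xFF
f = (bits & 0x7FFFFF) / 2**23                # fraction as a value in [0, 1)

value = (1 - 2*s) * (1 + f) * 2.0**(e - 127)
print(value)                                              # -0.21875
print(struct.unpack('>f', struct.pack('>I', bits))[0])    # -0.21875
```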

Converting a decimal number to IEEE 754

What is the single-precision representation of 347.625?
1. Convert this to binary: 347.625 = 101011011.101 in binary.
2. Normalize the number by shifting the binary point until there is a single 1 to the left:
    101011011.101 x 2^0 = 1.01011011101 x 2^8
3. The digits to the right of the binary point, 01011011101, comprise the fractional field f.
4. The number of times you shifted gives the exponent. In this case, we shifted 8 places to the left. The field e contains the exponent plus the bias; we have 8 + 127 = 135 = 10000111 in binary.
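Putting the pieces together, the machine's own encoding of 347.625 matches the four steps above (a Python sketch; struct is only used to read back the raw bits):

```python
import struct

bits = struct.unpack('>I', struct.pack('>f', 347.625))[0]
print(hex(bits))   # 0x43add000

# Sign 0, e = 10000111 (135), f = 01011011101 followed by twelve zeros:
assert bits >> 31 == 0
assert (bits >> 23) & 0xFF == 8 + 127
assert bits & 0x7FFFFF == 0b01011011101 << 12
```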

Special values

In practice, e = 00000000 and e = 11111111 and their double-precision counterparts are reserved for special purposes.
If the mantissa is always (1 + f), then how is 0 represented?
The fraction field f should be 0000...0000.
The exponent field e contains the value 00000000.
With sign magnitude, there are two zeroes: +0.0 and -0.0.
There are also representations of positive and negative infinity:
The fraction is 0000...0000.
The exponent is set to 11111111.
Finally, there is a special NaN (not a number) value, with e = 11111111 and a nonzero fraction, which represents the result of invalid operations such as 0/0.
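These special encodings can be inspected directly. An illustrative Python sketch:

```python
import struct, math

def bits_of(x):
    """Raw 32-bit IEEE 754 single-precision pattern of x, as an integer."""
    return struct.unpack('>I', struct.pack('>f', x))[0]

print(f'{bits_of(0.0):08X}')        # 00000000
print(f'{bits_of(-0.0):08X}')       # 80000000: sign magnitude gives two zeros
print(f'{bits_of(math.inf):08X}')   # 7F800000: e all ones, f zero

nan = bits_of(math.nan)             # NaN: e all ones, f nonzero
print((nan >> 23) & 0xFF, nan & 0x7FFFFF != 0)
```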

Range of single-precision numbers

The value formula for single precision is (1 - 2s) x (1 + f) x 2^(e - 127).
The largest possible number is (2 - 2^-23) x 2^127 = 2^128 - 2^104.
The largest possible e is 11111110 (254).
The largest possible f is 11111111111111111111111 (1 - 2^-23).
And the smallest positive non-zero number is 1 x 2^-126 = 2^-126.
The smallest e is 00000001 (1).
The smallest f is 00000000000000000000000 (0).
In comparison, the smallest and largest possible 32-bit two's-complement integers are only -2^31 and 2^31 - 1.
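The two extremes above can be cross-checked against their bit patterns (illustrative Python; e = 254 with f all ones, and e = 1 with f = 0):

```python
import struct

max_f32 = (2 - 2**-23) * 2.0**127    # largest finite single = 2^128 - 2^104
min_norm = 2.0**-126                 # smallest positive normalized single

# The same values read straight out of the extreme bit patterns:
assert struct.unpack('>f', struct.pack('>I', 0x7F7FFFFF))[0] == max_f32
assert struct.unpack('>f', struct.pack('>I', 0x00800000))[0] == min_norm
print(max_f32, min_norm)   # about 3.40e+38 and 1.18e-38
```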

Finiteness

There aren't more IEEE numbers than there are bit patterns.
With 32 bits, there are 2^32, or about 4 billion, different bit patterns.
These can represent 4 billion integers or 4 billion reals.
But there are an infinite number of reals, and the IEEE format can only represent some of the ones from about -2^128 to +2^128.
This causes enormous headaches in doing floating-point arithmetic.
Not even every integer between -2^128 and +2^128 can be represented.
Small roundoff errors can quickly accumulate with multiplications or exponentiations, resulting in big errors.
Rounding errors can invalidate many basic arithmetic principles, such as associativity.
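Both problems are easy to trigger. An illustrative Python sketch (the first part rounds through single precision; the associativity example uses ordinary doubles):

```python
import struct

def f32(x):
    """Round a value to the nearest single-precision number."""
    return struct.unpack('>f', struct.pack('>f', x))[0]

# 2^24 + 1 needs 25 significant bits, one more than single precision has:
print(f32(2**24) == 2**24)        # True
print(f32(2**24 + 1) == 2**24)    # True: the integer is rounded back down

# Rounding also breaks associativity, even in double precision:
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))   # False
```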

Floating-point addition example

To get a feel for floating-point operations, we'll do an addition example.
To keep it simple, we'll use base 10 scientific notation.
Assume the mantissa and exponent have four and two digits each.
The text shows an example for the addition:
    99.99 + 0.161 = 100.151
As normalized numbers, the operands would be written as:
    9.999 x 10^1 and 1.610 x 10^-1

Steps 1-2: the actual addition

Step 1: Equalize the exponents
We can't add the mantissas until the exponents are the same.
The number with the smaller exponent should be rewritten by increasing its exponent and shifting the point leftward:
    1.610 x 10^-1 = 0.0161 x 10^1
With four significant digits, this gets rounded to 0.016 x 10^1.
Rewriting the number with the smaller exponent could result in a loss of the least significant digits (the rightmost 1 in this case).
But rewriting the number with the larger exponent could result in a loss of the most significant digits, which is much worse.

Step 2: Add the mantissas

     9.999 x 10^1
  +  0.016 x 10^1
    10.015 x 10^1

Steps 3-5: representing the result

Step 3: Normalize the result if necessary
This step may cause the point to shift either left or right, and the exponent to either increase or decrease.
For our example, 10.015 x 10^1 becomes 1.0015 x 10^2.

Step 4: Round the number if needed
1.0015 x 10^2 gets rounded to 1.002 x 10^2.

Step 5: Repeat Step 3 if the result is no longer normalized
It's possible for rounding to add digits; for example, rounding 9.9995 yields 10.000.
We don't need this step in our example.

Our final result is 1.002 x 10^2, or 100.2.
The correct answer is 100.151, so we have the right answer to four significant digits.
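The whole five-step sequence can be reproduced with Python's decimal module, whose context precision plays the role of the slide's four-digit mantissa (an illustrative sketch, not part of the original slides):

```python
from decimal import Decimal, localcontext

with localcontext() as ctx:
    ctx.prec = 4                 # four significant digits, as on the slide
    a = Decimal('99.99')         # 9.999 x 10^1
    b = Decimal('0.161')         # 1.610 x 10^-1
    s = a + b                    # align, add, normalize and round in one step
    print(s)                     # 100.2 (the exact sum 100.151, rounded)
```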

Multiplication

Multiplication is a very common floating-point operation.
To multiply two values, first multiply their magnitudes and add the exponents:

     9.999 x 10^1
  x  1.610 x 10^-1
    16.098 x 10^0

You can then round and normalize the result, yielding 1.610 x 10^1.
The sign of a product is the xor of the signs of the operands:
If two numbers have the same sign, their product is positive; otherwise it is negative.
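The same computation in a four-digit decimal context (illustrative; both operands here are positive, so the xor of the signs is 0):

```python
from decimal import Decimal, localcontext

with localcontext() as ctx:
    ctx.prec = 4                                  # four significant digits
    p = Decimal('9.999E+1') * Decimal('1.610E-1')
    print(p)                                      # 16.10, i.e. 1.610 x 10^1
```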

Floating-point hardware

Intel introduced the 8087 coprocessor around 1981.
The main CPU would call the 8087 for floating-point operations.
The 8087 had eight special 80-bit (extended precision) registers that could be accessed in a stack-like fashion.
Some of the IEEE standard is based on the 8087.
Intel's 80486, introduced in 1989, included floating-point support in the processor itself.
The MIPS floating-point architecture and instruction set still reflect the old coprocessor days.
There are separate floating-point registers and special instructions for accessing those registers.
Some of the instruction names include "coprocessor".

MIPS floating-point architecture

MIPS includes a separate set of 32 floating-point registers, $f0-$f31.
Each floating-point register is 32 bits long and can hold a single-precision number.
Two registers can be combined to store a double-precision number.
You can store up to 16 double-precision values in registers $f0-$f1, $f2-$f3, ..., $f30-$f31.
There are also separate instructions for floating-point arithmetic. The operands must be floating-point registers.

    add.s $f1, $f2, $f3    # Single-precision $f1 = $f2 + $f3
    add.d $f2, $f4, $f6    # Double-precision $f2 = $f4 + $f6

Other basic operations include sub.s and sub.d for subtraction.

Floating-point register transfers

mov.s and mov.d move data between floating-point registers.
Use mtc1 and mfc1 to transfer data between the regular registers ($0-$31) and the floating-point registers ($f0-$f31).

    mtc1 $t0, $f0    # $f0 = $t0
    mfc1 $t0, $f0    # $t0 = $f0

There are also special instructions for transferring data between floating-point registers and memory.
(The base address is still given in an integer register.)

    lwc1 $f2, 0($a0)    # $f2 = M[$a0]
    swc1 $f4, 4($sp)    # M[$sp+4] = $f4

Miscellaneous notes:
Be careful with the order of the integer and floating-point registers in these instructions.
The "c1" stands for "coprocessor 1".

Floating-point comparisons

We also need special instructions for comparing floating-point values, since slt and sltu only apply to signed and unsigned integers.

    c.le.s $f2, $f4
    c.eq.s $f2, $f4
    c.lt.s $f2, $f4

These each set a special coprocessor register to 1 or 0.
You can then branch depending on whether this register contains 1 or 0.

    bc1t Label    # branch if true
    bc1f Label    # branch if false

For example, to branch to Exit if $f2 = $f4:

    c.eq.s $f2, $f4
    bc1t   Exit

Floating-point functions
To pass data to and from floating-point functions:
The arguments are placed in $f12-$f15.
Return values go into $f0-$f1.
We also split the register-saving chores, just like before.
$f0-$f19 are caller-saved.
$f20-$f31 are callee-saved.
These are the same basic ideas as before because we still have the same problems to solve, but now with different registers.

The hard part

Loading constants is probably the hardest part of writing floating-point programs in MIPS!
$f0 is not hardwired to the value 0.0.
MIPS does not have immediate floating-point instructions, and SPIM doesn't support any such pseudo-instructions.
One solution is to store your floating-point constants as program data, which can be loaded with an l.s pseudo-instruction:

    .data
    alpha: .float 0.55555

Type conversions

You can also cast integer values into floating-point ones with the MIPS type conversion instructions:

    cvt.s.w $f4, $f2

Here "s" is the type to convert to, "w" is the type to convert from, $f4 is the floating-point destination register, and $f2 is the floating-point source register.
Possible types include integers (w), single-precision (s) and double-precision (d) floating-point.

    li      $t0, 32     # $t0 = 32
    mtc1    $t0, $f2    # $f2 = 32
    cvt.s.w $f4, $f2    # $f4 = 32.0

Summary

IEEE 754 is a floating-point arithmetic standard that defines number representations and operations.
Having a finite number of bits means we can't represent all possible real numbers, and must introduce errors by approximation.
MIPS processors support the IEEE 754 standard.
There is a completely different set of floating-point registers.
New instructions do floating-point arithmetic.
