
Floating Point Arithmetic

Ravi Paruchuri
Haritha Talluri
Vasuki Mulukutla
Satish Gogisetty

MIPS floating-point arithmetic


Floating-point computations are vital for many
applications, but correct implementation of floating-point
hardware and software is very tricky.
We'll study the IEEE 754 standard for floating-point arithmetic.
How floating-point numbers are represented.
The basic addition algorithm.
MIPS support for IEEE 754 arithmetic.
We won't worry much about hardware implementations of these ideas.
09/18/15

Floating Point Arithm

A history lesson in a CS class!


In the past, each machine had its own implementation of floating-point arithmetic hardware and/or software.
It was impossible to write portable programs that would produce the same results on different systems.
Many strange tricks were needed to get correct answers out of some machines, such as Crays or the IBM System/370.
But nobody was even really sure what "correct" meant!
It wasn't until 1985 that the IEEE 754 standard was adopted.
Having a standard ensures that all compliant machines will produce the same outputs for the same program.
The standard is very complex and difficult to implement fully.

Floating-point representation

IEEE numbers are stored in a kind of binary scientific notation:

    +/- mantissa x 2^exponent

We represent a binary floating-point number with three fields:
A sign bit, s.
An exponent field, e.
A fraction field, f.
The standard defines several different precisions:
For single-precision numbers, e is 8 bits long and f is 23 bits long. This gives a total of 32 bits, including the sign bit:

    | s (1 bit) | e (8 bits) | f (23 bits) |

In double precision, e is 11 bits and f is 52 bits, for a total of 64 bits:

    | s (1 bit) | e (11 bits) | f (52 bits) |

There are also various extended precision formats; for example, Intel chips use an 80-bit format internally.
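These fields can be unpacked with ordinary bit operations. As an illustrative aside (Python, not part of the MIPS material), using the standard struct module to obtain the raw 32-bit pattern of a single-precision value:

```python
import struct

def fields(x):
    """Decompose a value into its IEEE 754 single-precision s, e, f fields."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]  # raw 32-bit pattern
    s = bits >> 31               # 1 sign bit
    e = (bits >> 23) & 0xFF      # 8 exponent bits
    f = bits & 0x7FFFFF          # 23 fraction bits
    return s, e, f

print(fields(1.0))   # (0, 127, 0)
```

For 1.0 this yields a zero sign bit, the actual exponent 0 stored as 0 + 127, and an all-zero fraction.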

Sign
The sign bit is 0 for positive numbers and 1 for negative numbers.
But unlike two's-complement integers, IEEE values are stored in sign-magnitude format.


Mantissa


The field f represents a binary fraction.
There is an implicit 1 to the left of the binary point.
For example, if f is 01101..., the actual mantissa would be 1.01101...
This is called a normalized number; there is exactly one nonzero digit to the left of the point.
Numbers have a unique normalized representation, even though there are many ways to write a number in regular scientific notation:

    2.32 x 10^2 = 23.2 x 10^1 = 0.232 x 10^3 = ...

A side effect is that we get a little more precision: there are really 24 bits of mantissa, the implicit 1 plus the 23 stored fraction bits.
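The extra implicit bit is easy to demonstrate. A small Python sketch (illustrative): the round-trip through struct rounds a value to single precision.

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack('>f', struct.pack('>f', x))[0]

# The implicit leading 1 means 24 significant bits, not 23:
print(f32(1 + 2**-23) == 1 + 2**-23)   # True: the 24th significant bit survives
print(f32(1 + 2**-24) == 1.0)          # True: one bit smaller is rounded away
```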

Exponent

    | s (1 bit) | e (8 bits) | f (23 bits) |

The e field represents the exponent as a biased number.
It stores the actual exponent plus 127 for single precision, or the actual exponent plus 1023 in double precision.
This essentially converts all possible exponents from -127 to +128 into unsigned numbers from 0 to 255.
Two examples:
If the exponent is 4, the e field will be 4 + 127 = 131 (10000011 in binary).
If e contains 01011101 (93 in decimal), the actual exponent is 93 - 127 = -34.
Storing a biased exponent before a normalized mantissa also means that floating-point values can be compared as if they were sign-magnitude integers.
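Bias arithmetic is just addition and subtraction; a quick Python check of the two examples above:

```python
BIAS = 127                      # 1023 for double precision

def encode_exp(actual):         # actual exponent -> stored e field
    return actual + BIAS

def decode_exp(e):              # stored e field -> actual exponent
    return e - BIAS

print(encode_exp(4))            # 131, which is 10000011 in binary
print(decode_exp(0b01011101))   # -34
```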

Converting an IEEE 754 number to decimal

The decimal value of an IEEE number is:

    (1 - 2s) x (1 + f) x 2^(e - bias)

Here, the s, e and f fields are assumed to be converted to decimal.
(1 - 2s) is 1 or -1, depending on whether the sign bit is 0 or 1.
We add the implicit 1 to the fraction field f.
Again, the bias is either 127 or 1023, for single or double precision.

Example IEEE-decimal conversion

Let's find the decimal value of the following IEEE number:

    1 | 01111100 | 11000000000000000000000

First convert each individual field to decimal.
The sign bit s is 1.
The e field contains 01111100 = 124.
The mantissa is 0.11000... = 0.75.
Then just plug these decimal values of s, e and f into the formula:

    (1 - 2s) x (1 + f) x 2^(e - bias)

This gives us (1 - 2) x (1 + 0.75) x 2^(124 - 127) = -1.75 x 2^-3 = -0.21875.
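As a sanity check, the same conversion in Python (illustrative), comparing the slide's formula against the machine's own decoding of the identical 32-bit pattern:

```python
import struct

# The slide's bit pattern: s = 1, e = 01111100, f = 1100...0
bits = 0b1_01111100_11000000000000000000000

s = bits >> 31
e = (bits >> 23) & 0xFF
f = (bits & 0x7FFFFF) / 2**23                # fraction as a value in [0, 1)

value = (1 - 2*s) * (1 + f) * 2.0**(e - 127)
print(value)                                              # -0.21875
print(struct.unpack('>f', struct.pack('>I', bits))[0])    # -0.21875
```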

Converting a decimal number to IEEE 754

What is the single-precision representation of 347.625?
1. Convert this to binary: 347.625 = 101011011.101 in binary.
2. Normalize the number by shifting the binary point until there is a single 1 to the left:
    101011011.101 x 2^0 = 1.01011011101 x 2^8
3. The digits to the right of the binary point, 01011011101, comprise the fractional field f.
4. The number of times you shifted gives the exponent. In this case, we shifted 8 places to the left. The field e contains the exponent plus the bias; we have 8 + 127 = 135 = 10000111 in binary.
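Putting the pieces together, the machine's own encoding of 347.625 matches the four steps above (a Python sketch; struct is only used to read back the raw bits):

```python
import struct

bits = struct.unpack('>I', struct.pack('>f', 347.625))[0]
print(hex(bits))   # 0x43add000

# Sign 0, e = 10000111 (135), f = 01011011101 followed by twelve zeros:
assert bits >> 31 == 0
assert (bits >> 23) & 0xFF == 8 + 127
assert bits & 0x7FFFFF == 0b01011011101 << 12
```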

Special values

In practice, e = 00000000 and e = 11111111 and their double-precision counterparts are reserved for special purposes.
If the mantissa is always (1 + f), then how is 0 represented?
The fraction field f should be 0000...0000.
The exponent field e contains the value 00000000.
With sign magnitude, there are two zeroes: +0.0 and -0.0.
There are also representations of positive and negative infinity:
The fraction is 0000...0000.
The exponent is set to 11111111.
Finally, there is a special NaN (not a number) value, with e = 11111111 and a nonzero fraction, which represents the result of invalid operations such as 0/0.
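These special encodings can be inspected directly. An illustrative Python sketch:

```python
import struct, math

def bits_of(x):
    """Raw 32-bit IEEE 754 single-precision pattern of x, as an integer."""
    return struct.unpack('>I', struct.pack('>f', x))[0]

print(f'{bits_of(0.0):08X}')        # 00000000
print(f'{bits_of(-0.0):08X}')       # 80000000: sign magnitude gives two zeros
print(f'{bits_of(math.inf):08X}')   # 7F800000: e all ones, f zero

nan = bits_of(math.nan)             # NaN: e all ones, f nonzero
print((nan >> 23) & 0xFF, nan & 0x7FFFFF != 0)
```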

Range of single-precision numbers

The value formula for single precision is (1 - 2s) x (1 + f) x 2^(e - 127).
The largest possible number is (2 - 2^-23) x 2^127 = 2^128 - 2^104.
The largest possible e is 11111110 (254).
The largest possible f is 11111111111111111111111 (1 - 2^-23).
And the smallest positive non-zero number is 1 x 2^-126 = 2^-126.
The smallest e is 00000001 (1).
The smallest f is 00000000000000000000000 (0).
In comparison, the smallest and largest possible 32-bit two's-complement integers are only -2^31 and 2^31 - 1.
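The two extremes above can be cross-checked against their bit patterns (illustrative Python; e = 254 with f all ones, and e = 1 with f = 0):

```python
import struct

max_f32 = (2 - 2**-23) * 2.0**127    # largest finite single = 2^128 - 2^104
min_norm = 2.0**-126                 # smallest positive normalized single

# The same values read straight out of the extreme bit patterns:
assert struct.unpack('>f', struct.pack('>I', 0x7F7FFFFF))[0] == max_f32
assert struct.unpack('>f', struct.pack('>I', 0x00800000))[0] == min_norm
print(max_f32, min_norm)   # about 3.40e+38 and 1.18e-38
```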

Finiteness

There aren't more IEEE numbers than there are bit patterns.
With 32 bits, there are 2^32, or about 4 billion, different bit patterns.
These can represent 4 billion integers or 4 billion reals.
But there are an infinite number of reals, and the IEEE format can only represent some of the ones from about -2^128 to +2^128.
This causes enormous headaches in doing floating-point arithmetic.
Not even every integer between -2^128 and +2^128 can be represented.
Small roundoff errors can quickly accumulate with multiplications or exponentiations, resulting in big errors.
Rounding errors can invalidate many basic arithmetic principles, such as associativity.
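Both problems are easy to trigger. An illustrative Python sketch (the first part rounds through single precision; the associativity example uses ordinary doubles):

```python
import struct

def f32(x):
    """Round a value to the nearest single-precision number."""
    return struct.unpack('>f', struct.pack('>f', x))[0]

# 2^24 + 1 needs 25 significant bits, one more than single precision has:
print(f32(2**24) == 2**24)        # True
print(f32(2**24 + 1) == 2**24)    # True: the integer is rounded back down

# Rounding also breaks associativity, even in double precision:
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))   # False
```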

Floating-point addition example

To get a feel for floating-point operations, we'll do an addition example.
To keep it simple, we'll use base 10 scientific notation.
Assume the mantissa and exponent have four and two digits each.
The text shows an example for the addition:
    99.99 + 0.161 = 100.151
As normalized numbers, the operands would be written as:
    9.999 x 10^1 and 1.610 x 10^-1

Steps 1-2: the actual addition

Step 1: Equalize the exponents
We can't add the mantissas until the exponents are the same.
The number with the smaller exponent should be rewritten by increasing its exponent and shifting the point leftward:
    1.610 x 10^-1 = 0.0161 x 10^1
With four significant digits, this gets rounded to 0.016 x 10^1.
Rewriting the number with the smaller exponent could result in a loss of the least significant digits (the rightmost 1 in this case).
But rewriting the number with the larger exponent could result in a loss of the most significant digits, which is much worse.

Step 2: Add the mantissas

     9.999 x 10^1
  +  0.016 x 10^1
    10.015 x 10^1

Steps 3-5: representing the result

Step 3: Normalize the result if necessary
This step may cause the point to shift either left or right, and the exponent to either increase or decrease.
For our example, 10.015 x 10^1 becomes 1.0015 x 10^2.

Step 4: Round the number if needed
1.0015 x 10^2 gets rounded to 1.002 x 10^2.

Step 5: Repeat Step 3 if the result is no longer normalized
It's possible for rounding to add digits; for example, rounding 9.9995 yields 10.000.
We don't need this step in our example.

Our final result is 1.002 x 10^2, or 100.2.
The correct answer is 100.151, so we have the right answer to four significant digits.
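The whole five-step sequence can be reproduced with Python's decimal module, whose context precision plays the role of the slide's four-digit mantissa (an illustrative sketch, not part of the original slides):

```python
from decimal import Decimal, localcontext

with localcontext() as ctx:
    ctx.prec = 4                 # four significant digits, as on the slide
    a = Decimal('99.99')         # 9.999 x 10^1
    b = Decimal('0.161')         # 1.610 x 10^-1
    s = a + b                    # align, add, normalize and round in one step
    print(s)                     # 100.2 (the exact sum 100.151, rounded)
```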

Multiplication

Multiplication is a very common floating-point operation.
To multiply two values, first multiply their magnitudes and add the exponents:

     9.999 x 10^1
  x  1.610 x 10^-1
    16.098 x 10^0

You can then round and normalize the result, yielding 1.610 x 10^1.
The sign of a product is the xor of the signs of the operands:
If two numbers have the same sign, their product is positive; otherwise it is negative.
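The same computation in a four-digit decimal context (illustrative; both operands here are positive, so the xor of the signs is 0):

```python
from decimal import Decimal, localcontext

with localcontext() as ctx:
    ctx.prec = 4                                  # four significant digits
    p = Decimal('9.999E+1') * Decimal('1.610E-1')
    print(p)                                      # 16.10, i.e. 1.610 x 10^1
```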

Floating-point hardware

Intel introduced the 8087 coprocessor around 1981.
The main CPU would call the 8087 for floating-point operations.
The 8087 had eight special 80-bit (extended precision) registers that could be accessed in a stack-like fashion.
Some of the IEEE standard is based on the 8087.
Intel's 80486, introduced in 1989, included floating-point support in the processor itself.
The MIPS floating-point architecture and instruction set still reflect the old coprocessor days.
There are separate floating-point registers and special instructions for accessing those registers.
Some of the instruction names include "coprocessor".

MIPS floating-point architecture

MIPS includes a separate set of 32 floating-point registers, $f0-$f31.
Each floating-point register is 32 bits long and can hold a single-precision number.
Two registers can be combined to store a double-precision number.
You can store up to 16 double-precision values in registers $f0-$f1, $f2-$f3, ..., $f30-$f31.
There are also separate instructions for floating-point arithmetic. The operands must be floating-point registers.

    add.s $f1, $f2, $f3    # Single-precision $f1 = $f2 + $f3
    add.d $f2, $f4, $f6    # Double-precision $f2 = $f4 + $f6

Other basic operations include sub.s and sub.d for subtraction.

Floating-point register transfers

mov.s and mov.d move data between floating-point registers.
Use mtc1 and mfc1 to transfer data between the regular registers ($0-$31) and the floating-point registers ($f0-$f31).

    mtc1 $t0, $f0    # $f0 = $t0
    mfc1 $t0, $f0    # $t0 = $f0

There are also special instructions for transferring data between floating-point registers and memory.
(The base address is still given in an integer register.)

    lwc1 $f2, 0($a0)    # $f2 = M[$a0]
    swc1 $f4, 4($sp)    # M[$sp+4] = $f4

Miscellaneous notes:
Be careful with the order of the integer and floating-point registers in these instructions.
The "c1" stands for "coprocessor 1".

Floating-point comparisons

We also need special instructions for comparing floating-point values, since slt and sltu only apply to signed and unsigned integers.

    c.le.s $f2, $f4
    c.eq.s $f2, $f4
    c.lt.s $f2, $f4

These each set a special coprocessor register to 1 or 0.
You can then branch depending on whether this register contains 1 or 0.

    bc1t Label    # branch if true
    bc1f Label    # branch if false

For example, to branch to Exit if $f2 = $f4:

    c.eq.s $f2, $f4
    bc1t   Exit

Floating-point functions
To pass data to and from floating-point functions:
The arguments are placed in $f12-$f15.
Return values go into $f0-$f1.
We also split the register-saving chores, just like before.
$f0-$f19 are caller-saved.
$f20-$f31 are callee-saved.
These are the same basic ideas as before because we still have the same problems to solve, but now with different registers.

The hard part

Loading constants is probably the hardest part of writing floating-point programs in MIPS!
$f0 is not hardwired to the value 0.0.
MIPS does not have immediate floating-point instructions, and SPIM doesn't support any such pseudo-instructions.
One solution is to store your floating-point constants as program data, which can be loaded with an l.s pseudo-instruction:

    .data
    alpha: .float 0.55555

Type conversions

You can also cast integer values into floating-point ones with the MIPS type conversion instructions:

    cvt.s.w $f4, $f2

Here "s" is the type to convert to, "w" is the type to convert from, $f4 is the floating-point destination register, and $f2 is the floating-point source register.
Possible types include integers (w), single-precision (s) and double-precision (d) floating-point.

    li      $t0, 32     # $t0 = 32
    mtc1    $t0, $f2    # $f2 = 32
    cvt.s.w $f4, $f2    # $f4 = 32.0

Summary

IEEE 754 is a floating-point arithmetic standard that defines number representations and operations.
Having a finite number of bits means we can't represent all possible real numbers, and must introduce errors by approximation.
MIPS processors support the IEEE 754 standard.
There is a completely different set of floating-point registers.
New instructions do floating-point arithmetic.
