Ravi Paruchuri
Haritha Talluri
Vasuki Mulukutla
Satish Gogisetty
Floating-point representation
IEEE numbers are stored in a kind of binary scientific notation:
± mantissa × 2^exponent
We represent a binary floating-point number with three
fields:
A sign bit, s.
An exponent field, e.
A fraction field, f.
The standard defines several different precisions:
For single precision numbers, e is 8 bits long and f is 23 bits long. This gives a total of 32 bits, including the sign bit.
s (1 bit) | e (8 bits) | f (23 bits)
09/18/15
Floating Point Arithm
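For double precision numbers, e is 11 bits long and f is 52 bits long. This gives a total of 64 bits, including the sign bit.
s (1 bit) | e (11 bits) | f (52 bits)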
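The three fields can be pulled out of a stored value with shifts and masks. The following is a minimal Python sketch, not part of the original slides; the helper names float32_fields and float64_fields are hypothetical, and the example value -0.15625 is chosen only for illustration.

```python
import struct

def float32_fields(x):
    """Return (s, e, f) as integers for x rounded to single precision."""
    # Pack as a big-endian 32-bit float, then reread the same 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = bits >> 31              # 1 sign bit
    e = (bits >> 23) & 0xFF     # 8 exponent bits
    f = bits & 0x7FFFFF         # 23 fraction bits
    return s, e, f

def float64_fields(x):
    """Return (s, e, f) as integers for a double precision value."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    s = bits >> 63                  # 1 sign bit
    e = (bits >> 52) & 0x7FF        # 11 exponent bits
    f = bits & ((1 << 52) - 1)      # 52 fraction bits
    return s, e, f

s, e, f = float32_fields(-0.15625)   # -0.15625 = -1.25 x 2^-3
print(s, format(e, "08b"), format(f, "023b"))
# prints: 1 01111100 01000000000000000000000
```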
Sign
The sign bit is 0 for positive numbers and 1 for negative numbers.
But unlike integers, IEEE values are stored in signed magnitude
format.
Mantissa
The mantissa is (1 + f), where f is the 23-bit fraction field in single precision.
Exponent
The e field is 8 bits long in single precision and holds a biased exponent: the actual exponent is e - bias.
Example fields: s | 01111100 | 11000000000000000000000
Value: (1 - 2s) × (1 + f) × 2^(e - bias)
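The formula can be applied directly to the example fields e = 01111100 and f = 11000000000000000000000. The sketch below is only an illustration: decode_single is a hypothetical helper, the bias of 127 is the single-precision value, and because the sign bit of the example is not shown, both s = 0 and s = 1 are evaluated. It covers normalized numbers only (e is neither all zeros nor all ones).

```python
def decode_single(s, e_bits, f_bits, bias=127):
    """Evaluate (1 - 2s) x (1 + f) x 2^(e - bias) for a normalized number."""
    e = int(e_bits, 2)                      # exponent field as an unsigned integer
    f = int(f_bits, 2) / 2 ** len(f_bits)   # fraction field as a value in [0, 1)
    return (1 - 2 * s) * (1 + f) * 2.0 ** (e - bias)

E_BITS = "01111100"                  # 124
F_BITS = "11000000000000000000000"   # 0.75

print(decode_single(0, E_BITS, F_BITS))   # 0.21875
print(decode_single(1, E_BITS, F_BITS))   # -0.21875
```

Here e - bias = 124 - 127 = -3 and 1 + f = 1.75, so the magnitude is 1.75 × 2^-3.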
The e field is the exponent plus the bias; we have 8 + 127 = 135.
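The biased exponent can be checked for any normalized value. This is a rough sketch: biased_exponent is a hypothetical helper, and 300.0 is an assumed example value whose actual exponent happens to be 8 (the number used on the original slide is not recoverable here).

```python
import math

BIAS = 127  # single-precision bias

def biased_exponent(x):
    """e field for a normalized value x (not 0, infinity, NaN, or denormalized)."""
    # math.frexp returns (m, p) with abs(m) in [0.5, 1) and x == m * 2**p,
    # so the exponent that goes with a 1.xxx mantissa is p - 1.
    _, p = math.frexp(x)
    return (p - 1) + BIAS

e = biased_exponent(300.0)    # 300.0 = 1.171875 x 2^8
print(e, format(e, "08b"))    # 135 10000111
```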
Special values
In practice, e=00000000 and e=11111111 and their double precision counterparts are reserved for special purposes.
If the mantissa is always (1 + f), then how is 0
represented?
The fraction field f should be 0000...0000.
The exponent field e contains the value 00000000.
With signed magnitude, there are two zeroes: +0.0 and
-0.0.
There are also representations of positive and negative infinity:
The fraction is 0000...0000.
The exponent is set to 11111111.
Finally, there is a special "not a number" value.
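The reserved encodings can be inspected by printing the single-precision bit patterns of the special values. This is a small sketch with a hypothetical helper bits32. In IEEE 754 a NaN has e = 11111111 and a non-zero fraction; the exact fraction bits of Python's math.nan may vary by platform, so the sketch only prints them.

```python
import math
import struct

def bits32(x):
    """Single-precision bit pattern of x as a 32-character string."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return format(bits, "032b")

specials = [("+0.0", 0.0), ("-0.0", -0.0),
            ("+inf", math.inf), ("-inf", -math.inf), ("NaN", math.nan)]

for name, value in specials:
    b = bits32(value)
    # Split the string into the s, e, and f fields.
    print(f"{name:>5}: s={b[0]} e={b[1:9]} f={b[9:]}")
```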
Finiteness
There aren't more IEEE numbers than bit patterns.
With 32 bits, there are 2^32, or about 4 billion, different bit patterns.
These can represent 4 billion integers or 4 billion reals.
But there are an infinite number of reals, and the IEEE format can only represent some of the ones from about -2^128 to +2^128.
This causes enormous headaches in doing floating-point arithmetic.
Not even every integer between -2^128 and +2^128 can be represented.
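A small integer already shows the gap. With a 23-bit fraction, single precision has 24 significant bits, so 2^24 + 1 cannot be stored exactly. The sketch below uses a hypothetical helper to_float32 that rounds a Python float (a double) to single precision and back.

```python
import struct

def to_float32(x):
    """Round x to the nearest single-precision value, returned as a Python float."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

n = 2 ** 24                                 # 16777216
print(to_float32(float(n)) == n)            # True: 2^24 is representable
print(to_float32(float(n + 1)) == n + 1)    # False: 2^24 + 1 is not
print(to_float32(float(n + 2)) == n + 2)    # True: 2^24 + 2 is representable again
```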
Small roundoff errors can quickly accumulate with multiplications or exponentiations, resulting in big errors.
Rounding errors can invalidate many basic arithmetic laws, such as the associativity of addition.
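Both effects are easy to reproduce. The sketch below uses Python floats, which are IEEE 754 double precision rather than the single precision discussed above; the values 0.1, 0.2, and 0.3 are chosen only because none of them is exactly representable in binary.

```python
# Grouping changes the result: addition is not associative in floating point.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)    # False
print(left, right)

# Roundoff accumulates: adding 0.1 ten times does not give exactly 1.0.
total = 0.0
for _ in range(10):
    total += 0.1
print(total == 1.0)     # False
print(total)
```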
Multiplication
Multiplication is a very common floating-point operation.
To multiply two values, first multiply their magnitudes and add their exponents.
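The rule can be sketched in Python by splitting each operand into a magnitude and an exponent. This is an illustration of the idea rather than how hardware does it: multiply is a hypothetical helper, math.frexp and math.ldexp stand in for field extraction and renormalization, and the sign handling (the product is negative when exactly one operand is negative) is added here for completeness.

```python
import math

def multiply(x, y):
    """Multiply two floats by treating sign, magnitude, and exponent separately."""
    mx, ex = math.frexp(abs(x))    # abs(x) == mx * 2**ex, with mx in [0.5, 1) or 0
    my, ey = math.frexp(abs(y))
    magnitude = mx * my            # multiply the magnitudes
    exponent = ex + ey             # add the exponents
    negative = (x < 0) != (y < 0)  # exclusive-or of the operand signs
    result = math.ldexp(magnitude, exponent)   # magnitude * 2**exponent
    return -result if negative else result

print(multiply(1.5, 2.5))     # 3.75
print(multiply(6.0, -0.5))    # -3.0
```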
Floating-point hardware
Intel introduced the 8087 coprocessor around 1981.
The main CPU would call the 8087 for floating-point operations.
The 8087 had eight special 80-bit (extended precision) registers that could be accessed in a stack-like fashion.
Some of the IEEE standard is based on the 8087.
mtc1 $t0, $f0       # $f0 = $t0
mfc1 $t0, $f0       # $t0 = $f0
lwc1 $f2, 0($a0)    # $f2 = M[$a0]
swc1 $..., 4($sp)   # M[$sp+4] = $...
Miscellaneous notes:
Be careful with the order of the integer and floating-point
registers in these instructions.
The c1 stands for coprocessor 1.
Floating-point comparisons
We also need special instructions for comparing floating-point values, since slt and sltu only apply to signed and unsigned integers.
c.le.s $f2, $f4
c.eq.s $f2, $f4
c.lt.s $f2, $f4
These each set a special coprocessor register to 1 or 0.
You can then branch depending on whether this register
contains 1 or 0.
bc1t Label    # branch if true
bc1f Label    # branch if false
For example, to branch to Exit if $f2 = $f4:
c.eq.s $f2, $f4
bc1t Exit
Floating-point functions
To pass data to and from floating-point functions:
The arguments are placed in $f12-$f15.
Return values go into $f0-$f1.
We also split the register-saving chores, just like before.
$f0-$f19 are caller-saved.
$f20-$f31 are callee-saved.
These are the same basic ideas as before because we still have the same problems to solve, but now with different registers.
Type conversions
You can also cast integer values into floating-point ones with MIPS type conversion instructions.
cvt.s.w $f4, $f2
s: type to convert to
w: type to convert from
$f4: floating-point destination register
$f2: floating-point source register

li      $t0, 32     # $t0 = 32
mtc1    $t0, $f2    # $f2 = 32
cvt.s.w $f4, $f2    # $f4 = 32.0
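The conversion step matters because mtc1 copies bits without changing them: after the move, $f2 holds the integer bit pattern for 32, which is not the floating-point value 32.0. The Python sketch below models the difference. The functions mtc1 and cvt_s_w are hypothetical stand-ins named after the instructions, not an emulation of MIPS.

```python
import struct

def mtc1(int_value):
    """Model of mtc1: keep the 32 integer bits and read them back as a float."""
    return struct.unpack(">f", struct.pack(">i", int_value))[0]

def cvt_s_w(int_value):
    """Model of cvt.s.w: convert the integer's numeric value to single precision."""
    return struct.unpack(">f", struct.pack(">f", float(int_value)))[0]

raw = mtc1(32)           # the bits 0x00000020 interpreted as a float
converted = cvt_s_w(32)  # the value 32 as a float

print(raw)               # a tiny denormalized number, not 32.0
print(converted)         # 32.0
```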
Summary
IEEE 754 is a floating-point arithmetic standard that defines number representations and operations.
Having a finite number of bits means we can't represent all possible real numbers, and must introduce errors by approximation.
MIPS processors support the IEEE 754 standard.
There is a completely different set of floating-point registers.
New instructions do floating-point arithmetic,