
12.1 Rounding Modes

Rounding: the process of obtaining the best possible floating-point representation for a given real value. ANSI/IEEE standard default: round to the nearest floating-point number; when the value lies exactly midway between two adjacent floating-point numbers, choose the one whose significand has an LSB of 0 (of two adjacent floating-point numbers, the significand of one must end in 0 and the other in 1). This is called round-to-nearest-even. For example, when rounding to integers, 3.5 and 4.5 are both rounded to 4, the closest even number.
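The tie-breaking rule can be observed at the significand level with a small Python sketch. It assumes an IEEE 754 platform on which `struct` converts a double to a single using the default rounding mode; the helper name `to_single` is illustrative only.

```python
import struct

def to_single(x: float) -> float:
    """Round a Python float (a double) to the nearest single, then widen it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

ulp = 2.0 ** -23  # spacing of single-precision values in [1, 2)

# Exactly midway between 1 (significand ends in 0) and 1 + ulp (ends in 1).
tie_low = 1.0 + ulp / 2
# Exactly midway between 1 + ulp (ends in 1) and 1 + 2*ulp (ends in 0).
tie_high = 1.0 + 3 * ulp / 2

rounded_low = to_single(tie_low)    # the tie goes down, to the even significand
rounded_high = to_single(tie_high)  # the tie goes up, to the even significand
```

Both ties land on the neighbor whose significand ends in 0, so one is rounded down and the other up.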

Other rounding methods
– Round inward (toward 0): choose, of the two possible values, the one closer to 0.
– Round upward (toward +∞): choose the larger of the two possible values.
– Round downward (toward -∞): choose the smaller of the two possible values.
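As a rough illustration at integer granularity, the three directed modes correspond to Python's `math.trunc`, `math.ceil`, and `math.floor`. The wrapper names below are made up for this sketch.

```python
import math

def round_inward(x: float) -> int:
    """Toward 0: drop the fractional part."""
    return math.trunc(x)

def round_upward(x: float) -> int:
    """Toward +infinity: the larger of the two neighboring integers."""
    return math.ceil(x)

def round_downward(x: float) -> int:
    """Toward -infinity: the smaller of the two neighboring integers."""
    return math.floor(x)

samples = [-2.5, -0.3, 0.3, 2.5]
# Each row: (x, inward, upward, downward)
table = [(x, round_inward(x), round_upward(x), round_downward(x)) for x in samples]
```

Inward rounding agrees with upward rounding for negative inputs and with downward rounding for positive inputs.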

Example 12.1 Rounding to the nearest integer
a. Consider the rounded even integer corresponding to a real signed-magnitude number x as a function rtnei(x). Plot this round-to-nearest-even-integer function for x in the range [-4, 4].
b. Repeat part a for the function rtni(x), that is, the round-to-nearest-integer function, where the midway values are always rounded up.
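The two functions differ only at the midway points, which a short sketch can tabulate instead of plotting. One assumption to flag: "rounded up" is read here as toward +∞, so `rtni` is modeled as floor(x + 0.5). A signed-magnitude reading that rounds midway values away from 0 would differ for negative x.

```python
import math

def rtnei(x: float) -> int:
    """Round to nearest integer, ties to even (Python's built-in behavior)."""
    return round(x)

def rtni(x: float) -> int:
    """Round to nearest integer, ties toward +infinity (assumed reading of 'up')."""
    return math.floor(x + 0.5)

midpoints = [k + 0.5 for k in range(-4, 4)]  # -3.5, -2.5, ..., 3.5
comparison = [(x, rtnei(x), rtni(x)) for x in midpoints]
```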


Example 12.2 Directed rounding
a. Consider the inward-directed rounding of a real signed-magnitude number x as a function ritni(x). Plot this round-inward-to-nearest-integer function for x in the range [-4, 4].
b. Repeat part a for the round-upward-to-nearest-integer function rutni(x).

Figure 12.3 Two directed round-to-nearest-integer functions for x in [-4, 4].

Figure 12.3 (Continued)

12.2 Special Values and Exceptions
Five special values in the ANSI/IEEE floating-point standard:
– ±0: biased exponent = 0, significand = 0 (no hidden 1)
– ±∞: biased exponent = 255 (short) or 2047 (long), significand = 0
– NaN: biased exponent = 255 (short) or 2047 (long), significand ≠ 0
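These encodings can be checked by unpacking the bit fields of a short (32-bit) value. The following is a minimal sketch; `classify_single` is a hypothetical helper, and the field widths (1 sign bit, 8 exponent bits, 23 fraction bits) are those of the short format.

```python
import struct

def classify_single(x: float) -> str:
    """Classify x by the exponent and fraction fields of its short encoding."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = "-" if bits >> 31 else "+"
    exponent = (bits >> 23) & 0xFF   # 8-bit biased exponent
    fraction = bits & 0x7FFFFF       # 23-bit stored significand field
    if exponent == 0 and fraction == 0:
        return sign + "0"
    if exponent == 255 and fraction == 0:
        return sign + "inf"
    if exponent == 255:
        return "NaN"
    return "ordinary"                # any other number

labels = [classify_single(v) for v in (0.0, -0.0, float("inf"), float("-inf"), float("nan"), 1.5)]
```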

12.3 Floating-Point Addition
Consider the addition of ±2^e1 × s1 and ±2^e2 × s2, where e1 > e2:
(±2^e1 × s1) + (±2^e2 × s2) = ±2^e1 × (s1 ± s2 / 2^(e1-e2))
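The alignment step in this identity can be sketched with exact rational arithmetic. This is only an illustration: it keeps every bit and performs no rounding, and the (sign, exponent, significand) triple and the name `fp_add` are assumptions made for the example, not the adder's actual datapath.

```python
from fractions import Fraction

def fp_add(sign1, e1, s1, sign2, e2, s2):
    """Add sign1 * 2**e1 * s1 and sign2 * 2**e2 * s2; return (sign, e, s), s in [1, 2)."""
    if e1 < e2:  # make operand 1 the one with the larger exponent
        sign1, e1, s1, sign2, e2, s2 = sign2, e2, s2, sign1, e1, s1
    aligned = s2 / Fraction(2) ** (e1 - e2)  # preshift: s2 / 2^(e1-e2)
    total = sign1 * s1 + sign2 * aligned     # s1 +/- aligned s2
    sign = -1 if total < 0 else 1
    s, e = abs(total), e1
    while s >= 2:                            # normalize: right shift
        s, e = s / 2, e + 1
    while 0 < s < 1:                         # normalize: left shift
        s, e = s * 2, e - 1
    return sign, e, s

same_sign = fp_add(1, 3, Fraction(3, 2), 1, 1, Fraction(5, 4))   # 12 + 2.5
cancel = fp_add(1, 3, Fraction(3, 2), -1, 3, Fraction(5, 4))     # 12 - 10
```

The second call shows why a postnormalization shift is needed: subtracting nearly equal operands leaves a significand below 1.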


Figure 12.6 Simplified schematic of a floating-point adder.

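12.4 Other Floating-Point Operations
Multiplication of ±2^e1 × s1 and ±2^e2 × s2: exponents are added and significands multiplied.
(±2^e1 × s1) × (±2^e2 × s2) = ±2^(e1+e2) × (s1 × s2)
Division of ±2^e1 × s1 by ±2^e2 × s2: exponents are subtracted and significands divided.
(±2^e1 × s1) / (±2^e2 × s2) = ±2^(e1-e2) × (s1 / s2)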
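A minimal exact-arithmetic sketch of both identities follows. The triple layout and the names `normalize`, `fp_mul`, and `fp_div` are assumptions for illustration, and rounding is omitted. With significands in [1, 2), a product lies in [1, 4) and a quotient in (1/2, 2), so at most one normalization shift is needed in either case.

```python
from fractions import Fraction

def normalize(sign, e, s):
    """Bring a nonzero significand s back into [1, 2), adjusting the exponent."""
    while s >= 2:
        s, e = s / 2, e + 1
    while 0 < s < 1:
        s, e = s * 2, e - 1
    return sign, e, s

def fp_mul(x, y):
    """(sign, e, s) product: multiply signs and significands, add exponents."""
    return normalize(x[0] * y[0], x[1] + y[1], x[2] * y[2])

def fp_div(x, y):
    """(sign, e, s) quotient: divide significands, subtract exponents."""
    return normalize(x[0] * y[0], x[1] - y[1], x[2] / y[2])

product = fp_mul((1, 3, Fraction(3, 2)), (-1, 1, Fraction(3, 2)))  # 12 * -3
quotient = fp_div((1, 3, Fraction(1)), (1, 1, Fraction(3, 2)))     # 8 / 3
```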

Figure 12.6 Simplified schematic of a floating-point multiply/divide unit.

12.5 Floating-Point Instructions
Figure 12.7 The common floating-point instruction format for MiniMIPS and components for arithmetic instructions. The extension (ex) field distinguishes single (* = s) from double (* = d) operands.
10 floating-point arithmetic instructions (5 different operations: add, subtract, multiply, divide, negate):
add.s $f0,$f8,$f10 # set $f0 to ($f8)+($f10)
add.d $f0,$f8,$f10 # set $f0,$f1 to ($f8,$f9)+($f10,$f11)
Single operands can be in any of the floating-point registers. Double operands occupy a pair of registers and must be specified by the even-numbered register of the pair.

Figure 12.8 Floating-point instructions for format conversion in MiniMIPS.
6 format conversion instructions: integer to single/double, single to double, double to single, and single/double to integer.
cvt.s.w $f0,$f8 # set $f0 to single(integer $f8)
cvt.d.w $f0,$f8 # set $f0 to double(integer $f8)
cvt.d.s $f0,$f8 # set $f0 to double($f8)
cvt.s.d $f0,$f8 # set $f0 to single($f8,$f9)
cvt.w.s $f0,$f8 # set $f0 to integer($f8)
cvt.w.d $f0,$f8 # set $f0 to integer($f8,$f9)
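Some of these conversions are exact and others are not. The Python sketch below is an analogy rather than MiniMIPS semantics: it uses `struct` to narrow a double to a single, assuming an IEEE 754 platform with the default rounding mode, and the helper name is illustrative.

```python
import struct

def single_from_double(x: float) -> float:
    """Narrow a double to single precision (loosely analogous to cvt.s.d)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Integer to single: a 32-bit integer can carry more significant bits than
# the 24-bit single significand, so large integers may be rounded.
big = 2 ** 24 + 1
as_single = single_from_double(float(big))

# Double to single: 0.1 has more significant bits than a single can hold.
narrowed = single_from_double(0.1)

# Single to double: widening is exact, so a second narrowing changes nothing.
round_trip = single_from_double(narrowed)
```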

Figure 12.9 Instructions for floating-point data movement in MiniMIPS.
6 data transfer instructions: load/store word to/from coprocessor 1, move single/double from one FP register to another, move (copy) between FP registers and CPU general registers.
lwc1 $f8,40($s3) # load mem[40+($s3)] into $f8
swc1 $f8,A($s3) # store ($f8) into mem[A+($s3)]
mov.s $f0,$f8 # load $f0 with ($f8)
mov.d $f0,$f8 # load $f0,$f1 with ($f8,$f9)
mfc1 $t0,$f12 # load $t0 with ($f12)
mtc1 $f8,$t4 # load $f8 with ($t4)

Figure 12.10 Floating-point branch and comparison instructions in MiniMIPS.
2 branch and 6 comparison instructions. The FP unit has a flag that is set to T or F based on 6 comparisons (equal, less than, or less than or equal, each for the single/double data type).
bc1t L # branch on FP flag true
bc1f L # branch on FP flag false
c.eq.* $f0,$f8 # if ($f0)=($f8), set flag to true
c.lt.* $f0,$f8 # if ($f0)<($f8), set flag to true
c.le.* $f0,$f8 # if ($f0)≤($f8), set flag to true

Table 12.1 The 30 MiniMIPS floating-point instructions: because the op field contains 17 for all but two of the instructions (49 for lwc1 and 50 for swc1), it is not shown.

12.6 Result Precision and Errors
FP arithmetic can be quite dangerous and must be used with proper care, because the results of FP computations are inexact. Why?
– Many real numbers do not have an exact binary representation within a finite word format. This is referred to as representation error.
– Even for values that are exactly representable, FP arithmetic can produce inexact results. For example, the product of two short FP numbers has a 48-bit significand that must be rounded to 23 bits (plus the hidden 1). This is called computation error.
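Both kinds of error can be measured exactly with rational arithmetic. The sketch below uses Python floats, which are long (64-bit) rather than short values, so the bit counts differ from the 48-bit example above, but the effect is the same.

```python
from fractions import Fraction

# Representation error: 0.1 has no finite binary expansion, so the stored
# value differs slightly from one tenth.
stored = Fraction(0.1)  # exact value of the double nearest to 0.1
representation_error = stored - Fraction(1, 10)

# Computation error: x is exactly representable, but x * x needs 61
# significant bits and must be rounded to the 53 bits of a double.
x = 1.0 + 2.0 ** -30
exact_square = Fraction(x) * Fraction(x)
computation_error = Fraction(x * x) - exact_square
```

Here the representation error is positive (the stored value is slightly above 0.1), and the computation error is exactly the discarded term, -2^-60.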

Example 12.4 The associative law of addition does not hold in general in FP arithmetic. For example, with binary significands:
a = -2^5 × (1.10101011)
b = 2^5 × (1.10101110)
c = -2^-2 × (1.01100101)
Is (a + b) + c = a + (b + c)?

Figure 12.11 Algebraically equivalent computations may yield different results with floating-point arithmetic.
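The question posed in Example 12.4 can be worked through with a toy floating-point model. The sketch assumes each sum is rounded to a 9-bit significand (hidden 1 plus the 8 fraction bits shown for a, b, and c) using round-to-nearest-even; `round_sig` and `fp_add` are hypothetical helpers written for this illustration.

```python
from fractions import Fraction

def round_sig(x, bits=9):
    """Round x to `bits` significant bits, ties to even."""
    if x == 0:
        return Fraction(0)
    sign = -1 if x < 0 else 1
    m, e = abs(x), 0
    while m >= 2:                    # scale m into [1, 2), tracking the exponent
        m, e = m / 2, e + 1
    while m < 1:
        m, e = m * 2, e - 1
    q = round(m * 2 ** (bits - 1))   # round() on a Fraction breaks ties to even
    return sign * q * Fraction(2) ** (e - (bits - 1))

def fp_add(x, y):
    """Exact sum followed by a single rounding, as an ideal FP adder would do."""
    return round_sig(x + y)

a = -Fraction(0b110101011, 2 ** 8) * 2 ** 5
b = Fraction(0b110101110, 2 ** 8) * 2 ** 5
c = -Fraction(0b101100101, 2 ** 8) / 2 ** 2

left = fp_add(fp_add(a, b), c)    # (a + b) + c
right = fp_add(a, fp_add(b, c))   # a + (b + c)
```

Under these assumptions, a + b is exact and the left-hand side comes out as 2^-6 × 1.1011, while b + c is rounded to a value equal to -a, so the right-hand side is 0.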

Using guard digits to avoid excessive error. For example, in a 10-digit calculator, 1/3 is represented as 0.333 333 333 3; multiplying by 3 yields 0.999 999 999 9, not 1. In a calculator with 2 guard digits, however, 1/3 is held internally as 0.333 333 333 333 but still displayed as 0.333 333 333 3; multiplying by 3 yields 0.999 999 999 999, which is rounded to 1 for display.
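The calculator example can be reproduced with Python's `decimal` module, using one context for the 10 displayed digits and another for the 12 internal digits. The round-half-even display rounding is an assumption of this sketch; a real calculator may round differently.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

display = Context(prec=10, rounding=ROUND_HALF_EVEN)   # digits shown to the user
internal = Context(prec=12, rounding=ROUND_HALF_EVEN)  # 10 digits plus 2 guard digits

# Without guard digits: every intermediate result is held to 10 digits.
third_10 = display.divide(Decimal(1), Decimal(3))
product_10 = display.multiply(third_10, Decimal(3))

# With guard digits: compute with 12 digits, round to 10 only for display.
third_12 = internal.divide(Decimal(1), Decimal(3))
product_12 = internal.multiply(third_12, Decimal(3))
shown = display.plus(product_12)   # unary plus rounds to the display precision
```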

Figure 12.12 Function evaluation by table lookup and linear interpolation.

