Floating Point in computers Comply with standards: IEEE 754 ISO/IEC 559.

1 Floating Point in computers Comply with standards: IEEE 754 ISO/IEC 559

2 Timeline Introduction: quite short. Binary review: not so long. Integer Arithmetic: 1/3. Floating Point: 1/3. Floating Point Arithmetic: 1/3. Other issues: extra short.

3 Introduction Who does computer arithmetic? Intel’s spare money How is it done in hardware? How Integer relates to Floating point Now, we go back to “computer structure”

4 Binary numbers What is 1001011.00101? The set bits of the integer part weigh 64, 8, 2 and 1.
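The question on this slide can be checked mechanically. The sketch below is only an illustration (the function name `binary_to_decimal` is made up here, not taken from the slides): it sums the positional weights on both sides of the binary point.

```python
def binary_to_decimal(bits: str) -> float:
    """Evaluate a binary numeral that may contain a binary point."""
    whole, _, frac = bits.partition(".")
    value = 0.0
    # Integer part: weights 1, 2, 4, ... counted from the right.
    for i, bit in enumerate(reversed(whole)):
        value += int(bit) * 2 ** i
    # Fraction part: weights 1/2, 1/4, ... counted from the left.
    for i, bit in enumerate(frac, start=1):
        value += int(bit) * 2 ** -i
    return value


print(binary_to_decimal("1001011.00101"))  # 75.15625
```

The integer part contributes 64 + 8 + 2 + 1 = 75 and the fraction contributes 1/8 + 1/32.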

5 Signed Binary Integers Sign-magnitude 2’s complement 1’s complement biased

6 Sign-Magnitude High order bit = Sign. 0101 = 5, 1101 = -5. Two zeros.

7 2’s complement Number + Negative = $2^n$. 0101 = 5, 1011 = -5. Easy addition (drop carry). Formula: $-a_{n-1} 2^{n-1} + a_{n-2} 2^{n-2} + \dots + a_1 2^1 + a_0$
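As a small sketch of the two properties on this slide (the helper names `twos_complement_value` and `negate` are illustrative, not from the presentation), the first function evaluates the weighted-sum formula and the second uses Number + Negative = 2^n.

```python
def twos_complement_value(bits: str) -> int:
    """Evaluate -a[n-1]*2^(n-1) + a[n-2]*2^(n-2) + ... + a[0]."""
    n = len(bits)
    value = -int(bits[0]) * 2 ** (n - 1)      # the top bit has negative weight
    for i, bit in enumerate(bits[1:], start=1):
        value += int(bit) * 2 ** (n - 1 - i)  # remaining bits weigh as usual
    return value


def negate(bits: str) -> str:
    """Negate an n-bit pattern using Number + Negative = 2^n."""
    n = len(bits)
    return format((2 ** n - int(bits, 2)) % 2 ** n, f"0{n}b")


print(twos_complement_value("0101"))  # 5
print(twos_complement_value("1011"))  # -5
print(negate("0101"))                 # 1011
```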

8 1’s Complement Negative: complement to 1. 0101 = 5, 1010 = -5. Two zeros. Number + Negative = $2^n - 1$

9 Biased Binary = Number + Bias. Bias = 5: 1010 = 5 (5 + 5 = 10), 0000 = -5 ((-5) + 5 = 0). Relative order remains.

10 Integer Arithmetic

11 Adding (unsigned) Integers Elementary school: 11001101 + 10000110 = 101010011 (205 + 134 = 339), adding column by column with carries. Result has n+1 bits!

12 Adding Integers - hardware Half Adder: inputs a, b; outputs s, $C_{out}$. Full Adder: inputs a, b, $C_{in}$; outputs s, $C_{out}$. 2 logical levels.

13 Ripple carry Adder [Diagram: a chain of full adders for bits 0 to n-1, inputs $a_i$, $b_i$, output $s_i$, each carry out feeding the next carry in, ending in $C_{out}$] Slow - 2n logical levels. Small constant (CMOS). Other ways exist.
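The chain of full adders can be simulated in software. This is a minimal sketch, assuming bit lists with the most significant bit first; `full_adder` and `ripple_carry_add` are names chosen here for illustration (the type hints need Python 3.9 or later).

```python
def full_adder(a: int, b: int, c_in: int) -> tuple[int, int]:
    """One full adder: returns (sum bit, carry out)."""
    s = a ^ b ^ c_in
    c_out = (a & b) | (c_in & (a ^ b))
    return s, c_out


def ripple_carry_add(a_bits: list[int], b_bits: list[int]) -> list[int]:
    """Add two n-bit unsigned numbers (MSB first); the result has n+1 bits."""
    carry = 0
    result = []
    # Walk from the least significant bit, passing the carry along the chain.
    for a, b in zip(reversed(a_bits), reversed(b_bits)):
        s, carry = full_adder(a, b, carry)
        result.append(s)
    result.append(carry)  # the final carry out is bit n of the result
    return result[::-1]


print(ripple_carry_add([1, 1, 0, 1], [0, 0, 1, 1]))  # [1, 0, 0, 0, 0]
```

Each stage has to wait for the carry of the stage before it, which is where the 2n logical levels come from.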

14 Adding Signed Integers In 2’s complement: $b + (-a) = b + (2^n - a) = 2^n + (b - a)$ and $(-b) + (-a) = (2^n - b) + (2^n - a) = 2^n + (2^n - (b + a))$, hence add as integers and discard the carry out. Example: 0011 + 1100 = ?
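A hedged sketch of "add as integers, discard carry out" follows. The function name and the 4-bit default width are assumptions made for this example; overflow detection is deliberately left out.

```python
def add_twos_complement(a: int, b: int, n: int = 4) -> int:
    """Add two signed values using n-bit 2's complement patterns."""
    mask = (1 << n) - 1
    # Encode both operands as n-bit patterns, add, and drop the carry out.
    raw = ((a & mask) + (b & mask)) & mask
    # Reinterpret the n-bit result: a set top bit means a negative value.
    return raw - (1 << n) if raw >> (n - 1) else raw


print(add_twos_complement(3, -4))   # 0011 + 1100 = 1111, i.e. -1
print(add_twos_complement(-3, -4))  # 1101 + 1100 = (1)1001, i.e. -7
```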

15 Subtracting Integers Add the negation Negating 2’s complement: 11010100101011000110000 = ? 00001001010110101001110

16 Integer (unsigned) Multiplication Elementary school: 1101 * 1001 = 1101 + 1101000 = 1110101 (13 * 9 = 117). Result is 2n bits!

17 Hardware Multiplier P = 0; loop n times: (i) if $A_0 = 1$, add B to P; (ii) right-shift the pair (P, A), shifting the carry in. Registers: P (n bits plus carry), A (n bits), B (n bits).
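The register loop can be mimicked with integers. This is a sketch under the assumption that P and A form one double-width shift register, with the low bit of P moving into the top of A; `shift_add_multiply` is an illustrative name.

```python
def shift_add_multiply(a: int, b: int, n: int) -> int:
    """Multiply two n-bit unsigned integers with the shift-and-add loop."""
    p = 0
    for _ in range(n):
        if a & 1:          # (i) low bit of A set: add B to P
            p += b         # P may briefly need n+1 bits (the carry)
        # (ii) right-shift (P, A): P's low bit becomes A's high bit
        a = (a >> 1) | ((p & 1) << (n - 1))
        p >>= 1
    return (p << n) | a    # the 2n-bit product sits in P:A


print(shift_add_multiply(13, 9, 4))             # 117
print(format(shift_add_multiply(13, 9, 4), "b"))  # 1110101
```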

18 Integer (unsigned) Division Elementary school: 1101 / 11 by long division. Result: 0100, Rem 1. Dec: 13/3 = 4, Rem 1

19 Hardware Divider P = 0; loop n times: (i) left-shift the pair (P, A); (ii) subtract B from P: if positive, set $a_0 = 1$; if negative, set $a_0 = 0$ and restore P (add B). Registers: P (n+1 bits), A (n bits), B (n+1 bits, leading 0).

20 Example 13 / 3 = 4 (remainder 1): n = 4, A = 1101, B = 00011, P = 00000

21 Initial state: P = 00000, A = 1101, B = 00011

22 Final state: P = 00001 (Remainder), A = 0100 (Quotient), B = 00011
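The 13 / 3 example can be traced with a short simulation of the restoring loop. This is a sketch only: `restoring_divide` is a name chosen here, and a nonzero divisor is assumed (the type hint needs Python 3.9 or later).

```python
def restoring_divide(a: int, b: int, n: int) -> tuple[int, int]:
    """Divide n-bit unsigned a by b; returns (quotient, remainder)."""
    p = 0
    for _ in range(n):
        # (i) left-shift the pair (P, A): A's top bit moves into P
        p = (p << 1) | ((a >> (n - 1)) & 1)
        a = (a << 1) & ((1 << n) - 1)
        # (ii) trial subtraction of B from P
        p -= b
        if p >= 0:
            a |= 1   # positive: quotient bit is 1
        else:
            p += b   # negative: quotient bit stays 0, restore P
    return a, p      # A holds the quotient, P the remainder


print(restoring_divide(0b1101, 0b0011, 4))  # (4, 1)
```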

23 Division - remarks Non-restoring Algorithm Load P only if positive Check for 0 (Total) Result is 2n bits!

24 Integer arithmetic - remarks Signed Multiply and Division –Algorithms exist –We will not use them What to do with extra bits? Faster methods

25 Floating Point

26 Non Integers - Other Methods Fixed Point –example: # # #. # –Binary point shifted –Integer arithmetic (extra shifting) –Small number magnitude Rational –a/b (a, b ∈ Z)

27 Floating Point Exponent + Significand (= Mantissa): $x = s \times 2^e$. Example: s = 101, e = 011 (binary): $x = 101_2 \times 2^{11_2} = 5 \times 2^3 = 40 = 101000_2$

28 Uniqueness Denormal Numbers: $123.456 \times 10^7$, $0.123 \times 10^4$. Normalized: $\#.\#\#\# \times 10^{\#}$, e.g. $1.123 \times 10^4$. What about 0?

29 Floating Point Standard Why Standardize? –Hardware accelerators –Software compatibility –Build Software Libraries –etc. Standards: IEEE 754-1985, ISO/IEC 559. Includes: Structure, Arithmetic results

30 Float Types 4 Precision Types: –Single –Single extended –Double –Double extended

31 Single Precision 32 bits: Sign (1) | Exponent (8) | Significand (23). Exponent (e): Biased (+127). Significand (f): Fixed fraction: 0.###… Number: $1.f \times 2^{e-127}$

32 Single Precision - Example Bits: 1 10000001 01000000000000000000000. Sign = 1 (negative). Exponent: 10000001 = 129, so 129 - 127 = 2. Fraction: 01000… = 0.01000…, so $1.f = 1.01_2 = 1.25$. $X = -1.25 \times 2^2 = -5$
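The decoding steps of this example can be reproduced as a sketch. `decode_single` is a name invented for illustration, and it only handles normalized patterns (1 ≤ e ≤ 254); `struct` is used purely as a cross-check against the machine's own interpretation of the same 32 bits.

```python
import struct


def decode_single(pattern: str):
    """Decode a normalized single-precision bit pattern by hand."""
    bits = pattern.replace(" ", "")
    sign = int(bits[0])
    e = int(bits[1:9], 2)             # biased exponent
    f = int(bits[9:], 2) / 2 ** 23    # fraction 0.f
    value = (-1) ** sign * (1 + f) * 2.0 ** (e - 127)
    return sign, e, f, value


pattern = "1 10000001 01000000000000000000000"
sign, e, f, value = decode_single(pattern)
print(sign, e, f, value)  # 1 129 0.25 -5.0

# Cross-check: let the hardware format interpret the same 4 bytes.
raw = int(pattern.replace(" ", ""), 2).to_bytes(4, "big")
(machine_value,) = struct.unpack(">f", raw)
print(machine_value)      # -5.0
```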

33 Single Precision - Range $E_{max} = 127$ (e = 254), $E_{min} = -126$ (e = 1). Why $|E_{min}| < |E_{max}|$? –$1/2^{E_{min}}$ does not overflow. Why Biased notation? What about 0 and 255?

34 Floating Point Precision

35 Examples We shall use base 10 sometimes: f will have 3 digits, $E_{max}$ will be 98, $E_{min}$ will be -97. Ex: $5.34 \times 10^{70}$

36 NaN Not a Number. Result of illegal computation: – –Any computation involving a NaN. Encoding: $e = E_{max} + 1$ and $f \ne 0$: # 11111111 ####################### Many NaNs (different f’s)
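The encoding rule can be inspected from Python as a rough sketch. `single_fields` is an illustrative helper, and the exact fraction bits of the NaN that Python produces are platform-dependent, so the example only checks that they are nonzero.

```python
import math
import struct


def single_fields(x: float):
    """Split the single-precision encoding of x into (sign, exponent, fraction)."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    return word >> 31, (word >> 23) & 0xFF, word & 0x7FFFFF


nan = float("nan")
sign, e, f = single_fields(nan)
print(e == 255, f != 0)         # True True: all-ones exponent, nonzero fraction
print(math.isnan(nan + 1.0))    # True: NaN propagates through arithmetic
print(nan == nan)               # False: a NaN compares unequal even to itself
print(single_fields(float("inf")))  # (0, 255, 0): same exponent, zero fraction
```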

37 NaN’s in use Zero finder outside domain –f(x) = sqrt(x) - 1 Works since all computations on a NaN yield NaN. No exception caused!

38 Zeros 0 00000000 00000000000000000000000 = ? This is NOT $1.0 \times 2^{E_{min}}$. 1 00000000 00000000000000000000000 = ? 0 is signed! ±0 both exist! What is the difference?

39 Signed 0’s +0 = -0, BUT multiply/divide keep sign rules. Motivation: –Using inf correctly (described later) –log(x): log(0) = -inf, log(negative) = NaN; what is log(x) if x → (-0)?
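A brief sketch of signed-zero behaviour, using only Python's standard `math` module. One caveat is an assumption about the host language rather than about IEEE 754: Python raises `ZeroDivisionError` for float division by zero instead of returning ±inf, so the division below uses a zero numerator.

```python
import math

pos_zero, neg_zero = 0.0, -0.0

print(pos_zero == neg_zero)               # True: the two zeros compare equal
print(math.copysign(1.0, neg_zero))       # -1.0: but the sign bit is really there

# Multiply and divide follow the usual sign rules even when the result is zero.
print(math.copysign(1.0, neg_zero * 3.0))   # -1.0  ((-0) * (+) is -0)
print(math.copysign(1.0, neg_zero * -3.0))  # 1.0   ((-0) * (-) is +0)
print(math.copysign(1.0, neg_zero / 3.0))   # -1.0  ((-0) / (+) is -0)
```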

40 ± inf More logic: $e = E_{max} + 1$ and $f = 0$: # 11111111 00000000000000000000000

41 Inf usage Example (if $\tan^{-1}$ is defined properly)

42 More on 0’s and inf’s General Rule for 0/inf arithmetic: –Take appropriate limit: 1/(1/x) where x = 0 or inf. Why not Max # instead?

43 Zeros and inf’s - yet again $x/(x^2 + 1)$ is bad! Why? $1/(x + x^{-1})$ is better. Do we need to check for x = 0? Using two zeros and inf’s saves some special-case checks.
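The difference between the two formulas shows up for large x, as this sketch suggests (the names `bad` and `good` are just labels for the two expressions). Note one assumption about the host language: Python raises `ZeroDivisionError` for `1.0 / 0.0`, so the x = 0 case that IEEE infinities would handle silently cannot be shown with plain Python floats.

```python
import math


def bad(x: float) -> float:
    """x / (x^2 + 1): the square overflows to inf long before x does."""
    return x / (x * x + 1.0)


def good(x: float) -> float:
    """1 / (x + 1/x): no intermediate overflow for large x."""
    return 1.0 / (x + 1.0 / x)


print(bad(2.0), good(2.0))  # both about 0.4
print(bad(1e200))           # 0.0, because x*x overflowed to inf
print(good(1e200))          # about 1e-200, the expected answer
```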

44 Denormalized numbers Example: –$x = 1.23 \times 10^{-98}$, $y = 1.11 \times 10^{-98}$ –$x - y = 1.20 \times 10^{-99}$, which is flushed to 0 –so: x - y = 0 but x ≠ y –think of: if (x ≠ y) then z = 1/(x - y) Solution: –use denormalized numbers!

45 Denormal Numbers Smallest normal: $1.0 \times 2^{E_{min}}$. Below, use denormal: $0.f \times 2^{E_{min}}$. Encoding: $e = E_{min} - 1$ and $f \ne 0$: # 00000000 ####################### Gradual underflow: $1.23 \times 10^{-4}$ → (/10) → $0.12 \times 10^{-4}$ → (/10) → $0.01 \times 10^{-4}$ → (/10) → 0

46 Denormal Numbers Back to our Example: –$x = 1.23 \times 10^{-98}$, $y = 1.11 \times 10^{-98}$ –$x - y = 0.12 \times 10^{-98}$ –and this is not 0!
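The same effect can be observed with binary doubles instead of the 3-digit decimal format used on the slides. This sketch assumes the platform keeps gradual underflow enabled (the usual case for CPython); the multipliers 3.0 and 2.5 are arbitrary choices for the illustration.

```python
import sys

smallest_normal = sys.float_info.min  # 2^-1022 for IEEE double precision

x = 3.0 * smallest_normal
y = 2.5 * smallest_normal
diff = x - y                          # 0.5 * 2^-1022, below the normal range

print(x != y)                         # True
print(diff != 0.0)                    # True: stored as a denormal, not flushed
print(0.0 < diff < smallest_normal)   # True
print(1.0 / diff)                     # finite, so "if x != y: z = 1/(x-y)" is safe
```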

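47 Flush to 0 Vs Gradual Underflow [Figure: two number lines starting at 0 with marks at $2^{-4}$, $2^{-3}$, $2^{-2}$, $2^{-1}$, one for flush to 0 and one for gradual underflow]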

48 Special Values - Summary
Exponent | Fraction | Represents
$E_{min} - 1$ | f = 0 | ±0
$E_{min} - 1$ | f ≠ 0 | $0.f \times 2^{E_{min}}$
$E_{min} \le e \le E_{max}$ | ---- | $1.f \times 2^e$
$E_{max} + 1$ | f = 0 | ±inf
$E_{max} + 1$ | f ≠ 0 | NaN

49 Rounding Why is rounding needed? Infinitely many numbers → finite representation. Integers only overflow. Almost all operations need rounding. IEEE specifies algorithms for arithmetic.

50 Numbers need rounding Out of range: –$x > 2 \times 2^{E_{max}}$, $x < 1 \times 2^{E_{min}}$ Between 2 floats: –$0.1_{10} = 0.00011001100\ldots_2 = 1.1001100\ldots \times 2^{-4}$ –$1.1001 \times 2^{-4}$

51 Measuring Error ULPs (units in last place) –$1.2 \times 10^{-1}$ vs 0.124: 0.4 ulps (one ulp is 0.01 here) –$1.2 \times 10^{-1}$ vs 0.118: 0.2 ulps Relative Error –Difference/Original –$1.2 \times 10^{-1}$ vs 0.124: Err = 0.004/0.124 = 0.032
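Both error measures can be computed exactly for binary doubles, as a sketch. It assumes Python 3.9 or later for `math.ulp`; the helper names are illustrative, and exact rational arithmetic from `fractions` avoids rounding in the measurement itself.

```python
import math
from fractions import Fraction


def error_in_ulps(approx: float, exact: Fraction) -> Fraction:
    """Distance between a float and the exact value, in units of the float's ulp."""
    return abs(Fraction(approx) - exact) / Fraction(math.ulp(approx))


def relative_error(approx: float, exact: Fraction) -> Fraction:
    """Difference / Original."""
    return abs(Fraction(approx) - exact) / abs(exact)


tenth = Fraction(1, 10)
print(float(error_in_ulps(0.1, tenth)))   # at most 0.5: 0.1 is correctly rounded
print(float(relative_error(0.1, tenth)))  # at most 2**-53
```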

52 Calculate Using Rounding Benign cancellation –Calculate 10.1 - 9.93 (= 0.17): $1.01 \times 10^1 - 0.99 \times 10^1 = 0.02 \times 10^1 = 2.00 \times 10^{-1}$ –30 ulps!

53 Rounding problems Catastrophic cancellation –$b^2 - 4ac$ –both $b^2$ and $4ac$ are rounded –the (-) exposes the error –b = 3.34, a = 1.22, c = 2.28: $b^2 = 11.2$, $4ac = 11.1$, $b^2 - 4ac = 0.10$, correct = 0.0292 (70.8 ulps)
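The slide's numbers can be reproduced with Python's `decimal` module set to 3 significant digits. This is a sketch: using `decimal` to stand in for the 3-digit base-10 format of the examples is an assumption of this illustration.

```python
from decimal import Decimal, localcontext

a, b, c = Decimal("1.22"), Decimal("3.34"), Decimal("2.28")

with localcontext() as ctx:
    ctx.prec = 3                 # every operation rounds to 3 digits
    b2 = b * b                   # 11.1556 rounds to 11.2
    four_ac = 4 * a * c          # 11.1264 rounds to 11.1
    rounded = b2 - four_ac       # the subtraction itself is exact: 0.1

exact = b * b - 4 * a * c        # default precision: 0.0292
# One ulp of 1.00 x 10^-1 is 0.001.
error_ulps = (rounded - exact) / Decimal("0.001")

print(b2, four_ac, rounded)      # 11.2 11.1 0.1
print(exact, error_ulps)         # 0.0292 70.8
```

The subtraction introduces no new error; it only exposes the rounding already present in the two products.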

54 IEEE Arithmetic Requirement: +, -, ×, ÷, √ should be EXACTLY rounded; remainder should be EXACTLY rounded; integer conversion should be EXACTLY rounded. Not all operations (transcendental, binary to decimal). “Tie break” - Round to Even

55 Round to Even How will 1.005 be rounded? –Round Up: 1.01 –Round Even: 1.00 Why? Example: –$x_i = x_{i-1} + y - y$, $x_0 = 1.00$, y = 0.125 –Round up: 1.00, 1.01, 1.02, … –Round even: 1.00, 1.00, 1.00, …
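The drift can be reproduced with the `decimal` module. This sketch assumes each intermediate result is rounded to two decimal places (the helper name `drift` and the constant `CENT` are made up for the illustration).

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

CENT = Decimal("0.01")


def drift(rounding: str, steps: int = 3) -> list:
    """Iterate x = (x + y) - y, rounding after each operation."""
    x, y = Decimal("1.00"), Decimal("0.125")
    history = [x]
    for _ in range(steps):
        x = (x + y).quantize(CENT, rounding=rounding)
        x = (x - y).quantize(CENT, rounding=rounding)
        history.append(x)
    return history


print([str(v) for v in drift(ROUND_HALF_UP)])    # ['1.00', '1.01', '1.02', '1.03']
print([str(v) for v in drift(ROUND_HALF_EVEN)])  # ['1.00', '1.00', '1.00', '1.00']
```

Rounding ties upward biases every step in the same direction, so the error accumulates; rounding ties to even lets the two ties cancel.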

56 Float Multiplication Significands: integer multiply. Exponents: biased addition. “Biased addition”: –detect Overflow: use n+1 bit adder –detect Underflow: harder (Denormals)

57 Rounding Multiplication 1.23 × 6.78 = 8.3394, round to 8.34. 2.83 × 4.47 = 12.6501, round to 1.27. 1.28 × 7.81 = 09.9968, round to 1.00. Binary cases: 1.0001 1 1.0010 0 1.0010 1 0.1101 0 Round bit 0 Round bit 1 All rest 0 Round bit 1 All rest 0 Shift needed

58 Round, Guard, Sticky 0.110100010: number, guard, round, sticky. 1.001000100: number, round, sticky.

59 Rounding Multiplication Product in registers (P, A): $x_0 x_1 . x_2 x_3 x_4 x_5\ g\ r\ s\ s\ s\ s$ (r: round digit, s: sticky bits). Case 1: $x_0 = 0$, shift: result $x_1 . x_2 x_3 x_4 x_5 g$. Case 2: $x_0 = 1$, inc. exp: result $x_0 . x_1 x_2 x_3 x_4 x_5$.

60 Rounding rules r = 0 → rounded OK. r = 1, s = 1 → add 1 to LSB. r = 1, s = 0 → add 1 if LSB = 1. Denormals → extra shifting.
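The three rules map directly onto a small function. This is a sketch with the significand held as an integer; `round_nearest_even` is an illustrative name, and renormalizing after a carry out of the top bit is left to the caller.

```python
def round_nearest_even(significand: int, r: int, s: int) -> int:
    """Apply round-to-nearest-even given the round bit r and sticky bit s."""
    if r == 0:
        return significand                    # below half: already rounded
    if s == 1:
        return significand + 1                # above half: add 1 to the LSB
    return significand + (significand & 1)    # exact tie: add 1 only if LSB = 1


print(bin(round_nearest_even(0b1010, 0, 1)))  # 0b1010
print(bin(round_nearest_even(0b1010, 1, 1)))  # 0b1011
print(bin(round_nearest_even(0b1010, 1, 0)))  # 0b1010 (tie, LSB already even)
print(bin(round_nearest_even(0b1011, 1, 0)))  # 0b1100 (tie, round to even)
```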

61 Float addition Compute all digits and round? –$1.00 \times 2^{20} + 1.00 \times 2^{-20}$ = 10000000….0000001 –too long! Use Round and Sticky bits: –shift to same exponent –r = first discarded digit –s = OR of the rest of the discarded digits

62 Float addition - example Calculate: $1.10011 \times 2^0 + 1.10001 \times 2^{-5}$. Shift exponents: $1.10011 \times 2^0 + 0.0000110001 \times 2^0$. Add the kept digits: 1.10011 + 0.00001 = 1.10100, with r = 1 and s = 0|0|0|1 = 1. Round needed! → 1.10101
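The worked example can be replayed with integer significands. This is a simplified sketch: it assumes two positive operands of equal width, takes the exponent difference as `shift`, and ignores renormalization if the sum carries out; the function name is made up here.

```python
def add_with_round_sticky(big: int, small: int, shift: int) -> int:
    """Add significands after aligning `small` right by `shift` bits."""
    kept = small >> shift                       # digits that survive alignment
    discarded = small & ((1 << shift) - 1)      # digits shifted out
    # r = first discarded digit, s = OR of the remaining discarded digits
    r = ((discarded >> (shift - 1)) & 1) if shift > 0 else 0
    s = int((discarded & ((1 << (shift - 1)) - 1)) != 0) if shift > 1 else 0
    total = big + kept
    if r and (s or (total & 1)):                # round to nearest, ties to even
        total += 1
    return total


# 1.10011 x 2^0 + 1.10001 x 2^-5, significands written without the binary point
result = add_with_round_sticky(0b110011, 0b110001, 5)
print(format(result, "b"))  # 110101, i.e. 1.10101
```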

63 Signed Addition/Subtraction Simplest way - convert to 2’s cmpl. Cancellation of high order bit - shift more bits cancel - How many guard digits? 1.00000 1.11111 0.11111 + 1.00000 0.00000101111 - 1.1111101000 1 cmpl

64 Float Division Significands: integer division. Exponents: biased subtraction. Very similar to Multiplication. Dividing using integer divide: compute 2 more bits (round, guard). Use remainder as sticky bit (Why?) Sign bit: XOR

65 More on floats

66 Rounding modes IEEE specifies 4 modes: –Nearest (default) –towards 0 –towards +inf –towards -inf. Affects overflow (How?)
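Python floats do not expose a way to switch the hardware rounding mode, so this sketch uses the `decimal` module's rounding constants as stand-ins for the four IEEE modes. The mapping in `MODES` and the helper `round_all` are choices made for this illustration.

```python
from decimal import (Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR,
                     ROUND_HALF_EVEN)

MODES = {
    "nearest": ROUND_HALF_EVEN,    # default, ties to even
    "toward 0": ROUND_DOWN,        # truncate
    "toward +inf": ROUND_CEILING,
    "toward -inf": ROUND_FLOOR,
}


def round_all(value: str, places: str = "0.01") -> dict:
    """Round one value to two decimals under each of the four modes."""
    return {name: str(Decimal(value).quantize(Decimal(places), rounding=mode))
            for name, mode in MODES.items()}


print(round_all("1.236"))
# {'nearest': '1.24', 'toward 0': '1.23', 'toward +inf': '1.24', 'toward -inf': '1.23'}
print(round_all("-1.236"))
# {'nearest': '-1.24', 'toward 0': '-1.23', 'toward +inf': '-1.23', 'toward -inf': '-1.24'}
```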

67 Exceptions Set a flag at: –Underflow: $(1.0 \times 2^{E_{min}}) \times (1.0 \times 2^{E_{min}})$ –Overflow: $(1.0 \times 2^{E_{max}}) \times (1.0 \times 2^{E_{max}})$ –Divide by 0: 1/0 –Inexact: rounding was needed –Invalid: operations that return NaN. Flags are sticky.

68 Speeding up Different algorithms may be used. Result should be exact. Divide: SRT algorithm in Pentium –5/2048 entries in a table –1/9,000,000 chance –check:

69 Precision Why extended precisions? –Return higher accuracy (D*D → ext. D) –use for computations:

