CSE 246: Computer Arithmetic Algorithms and Hardware Design Instructor: Prof. Chung-Kuan Cheng Fall 2006 Lecture 9: Floating Point Numbers.

CSE 246: Computer Arithmetic Algorithms and Hardware Design Instructor: Prof. Chung-Kuan Cheng Fall 2006 Lecture 9: Floating Point Numbers

CSE 2462 Motivation  Maximal information with given bit numbers.  Arithmetic with proper precision.  Fairness of rounding.  Features at the expenses of the complexity of the operations.

CSE 2463 Topics:  Floating Point Numbers (IEEE P754)  Standard  Operations  Exceptional Situations  Rounding Modes  Numerical Computing with IEEE Floating Point Arithmetic, Michael L. Overton, SIAM

CSE 2464 Standard 2 32  Typically  Goal: Dynamic Range: largest #/ smallest #  If too large, holes between # ’ s

CSE 2465 Standard  ulp (unit in the last place)  Difference between two consecutive values of the significand. 3 Parts  x = ~s b e :sign, significand, exponent Sign Bit 8-bit exponent 23-bit Significand

CSE 2466 Standard  ~ e 1 e 2 e 3 e 4 e 5 e 6 e 7 e 8 s 1 s 2 s 3 … s 22 s 23 1. s 1 s 2 s 3 … s 22 s 23 normalized number 0. s 1 s 2 s 3 … s 22 s 23 denormalized number e 1 e 2 e 3 e 4 e 5 e 6 e 7 e 8 0 0 0 0 0 0 0 0 0x= 0.s 1 s 2 s 3 … s 22 s 23 2 -126 1 0 0 0 0 0 0 0 1 x= 1.s 1 s 2 s 3 … s 22 s 23 2 -126 2 0 0 0 0 0 0 1 0 x= 1.s 1 s 2 s 3 … s 22 s 23 2 -125. 127 0 1 1 1 1 1 1 1 x= 1.s 1 s 2 s 3 … s 22 s 23 2 0. 253 1 1 1 1 1 1 0 1 x= 1.s 1 s 2 s 3 … s 22 s 23 2 126 254 1 1 1 1 1 1 1 0 x= 1.s 1 s 2 s 3 … s 22 s 23 2 127 255 1 1 1 1 1 1 1 1 x= Inf if (s 1 … s 23 )= 0, NaN otherwise. NaN  Not a Number

CSE 2467 Standard 0.01x2 -3 = 0.001x2 -2  Same number, so normalize to remove redundancy  Use a default 1 in front for one more bit precision.  Smallest Number 0.00 … 01x2 -126 = 1.0x2 -23 x2 -126 = 1x2 -149

CSE 2468 Standard - Example ~ eeeeeeee sssss sssss sssss sssss sss 0 00000000 00000000000000000000000 = 0.000 … 0x2 -126 1 00000000 00000000000000000000000 =-0.000 … 0x2 -126 0 00000000 00000000000000000000001 = 0.000 … 1x2 -149 0 00000001 00000000000000000000000 = 1.000 … 0x2 -126 normalized minimum 0 00000001 00000000000000000000001 = 1.000 … 1x2 -126. 0 01111111 00000000000000000000000 = 1.000 … 0x2 0 0 01111111 00000000000000000000001 = 1.000 … 1x2 0 0 10000000 00000000000000000000001 = 1.000 … 1x2 1

CSE 2469 Standard – Example Cont. 0 11111110 00000000000000000000000 = 1.000 … 0x2 127 0 11111110 00000000000000000000001 = 1.000 … 1x2 127 0 11111110 11111111111111111111111 = 1.111 … 1x2 127 - Normalized Maximum 0 11111111 00000000000000000000000 = Inf N min = 1.0 x 2 -126 N max = (2 – 2 -23 )2 127

CSE 24610 Double Floating Point ~ e 1 e 2 … e 11 s 1 s 2 … s 52 0 00 … 000 s 1 s 2 … s 52 x=0.s 1 s 2 … s 52 2 -1022 0 00 … 001 s 1 s 2 … s 52 x=1.s 1 s 2 … s 52 2 -1022. 0 01 … 111 s 1 s 2 … s 52 x=1.s 1 s 2 … s 52 2 0 0 10 … 000 s 1 s 2 … s 52 x=1.s 1 s 2 … s 52 2 1. 0 11 … 110 s 1 s 2 … s 52 x=1.s 1 s 2 … s 52 2 1023 0 11 … 111 s 1 s 2 … s 52 x=Inf if (s 1 … s 52 )=0

CSE 24611 Overflow/Underflow N max N min SparserDenser Overflow Underflow

CSE 24612 Addition/Multiplication  ~s 1 xb e1 + (~s 2 xb e2 ) = ~sxb e = ~s 1 xb e1 + ~s 2 /b e1-e2 x b e1 = (~s 1 + ~s 2 /b e1-e2 ) x b e1  (~s 1 xb e1 ) x (~s 2 xb e2 ) = ~(s 1 xs 2 )b e1+e2

CSE 24613 Exceptions a/0 = Inf if a > 0 a/Inf = 0if a != 0 a · 0 = 0 a · Inf = Inf if a > 0 a + Inf = Inf 0 · Inf = invalid operation (NaN) 0/0 = invalid operation (NaN) Inf - Inf = NaN NaP op a = NaN

CSE 24614 Rounding Mode  Adder Output = C out z 1 z 0.z -1 z -2 … z - l GRS Guard Bit Round Bit Sticky Bit, OR of all bits below bit R 1.101 x 2 3 +1.110 x 2 3 11.011 x 2 3 1.1011x2 4 Normalize – need to round or

CSE 24615 Rouding 1.110 2 3 - 1.101 2 3 0.001 2 3 1.000 2 0 normalize 1.101 2 3 - 1.111 2 2 1.101 2 3 - 0.1101 2 3 0.1101 2 3 1.101 2 2 Guard bit

CSE 24616 Rounding  Round to the nearest even 1.10111 toward 0 1.1011 Toward + Inf 1.1100 Toward - Inf 1.1011

CSE 24617 Conventional Rounding Error Rounding Error 1.10100  1.101= 0 1.10101  1.101= -0.25 1.10110  1.110 = +0.5 1.10111  1.110 = +0.25 Average Error = 0.5/4 = 0.125

CSE 246: Computer Arithmetic Algorithms and Hardware Design Instructor: Prof. Chung-Kuan Cheng Fall 2006 Lecture 9: Floating Point Numbers.

Similar presentations

Presentation on theme: "CSE 246: Computer Arithmetic Algorithms and Hardware Design Instructor: Prof. Chung-Kuan Cheng Fall 2006 Lecture 9: Floating Point Numbers."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

CSE 246: Computer Arithmetic Algorithms and Hardware Design Instructor: Prof. Chung-Kuan Cheng Fall 2006 Lecture 9: Floating Point Numbers.

Similar presentations

Presentation on theme: "CSE 246: Computer Arithmetic Algorithms and Hardware Design Instructor: Prof. Chung-Kuan Cheng Fall 2006 Lecture 9: Floating Point Numbers."— Presentation transcript:

Similar presentations

About project

Feedback