CSE 246: Computer Arithmetic Algorithms and Hardware Design Instructor: Prof. Chung-Kuan Cheng Fall 2006 Lecture 10 Floating Point Number Rounding, Polynomial.


1 CSE 246: Computer Arithmetic Algorithms and Hardware Design Instructor: Prof. Chung-Kuan Cheng Fall 2006 Lecture 10 Floating Point Number Rounding, Polynomial Expression

2 Topics:
- Rounding floating-point (F.P.) numbers
- Polynomial expression

3 Rounding the numbers
Why we need the guard bit, the round bit, and the sticky bit.

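4 Example 1
$1.00000_2 \times 2^{4} - 1.10000_2 \times 2^{-3}$
Align the operands to the larger exponent:
$1.00000000_2 \times 2^{4}$
$-\,0.00000011_2 \times 2^{4}$
$=\ 0.11111101_2 \times 2^{4}$
Renormalize: $1.1111101_2 \times 2^{3}$
Take 5 bits after the binary point: result = $1.11111_2 \times 2^{3}$
The two dropped bits (0 and 1) are labeled round bit and sticky bit.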

5 Rounding
- We need only one guard bit for normalization after addition.
- Assumption: operands are normalized.
- Why?

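6 Example 2
$1.00001_2 \times 2^{3} - 1.01011_2 \times 2^{-1}$
Align the operands to the larger exponent:
$1.00001_2 \times 2^{3}$
$-\,0.000101011_2 \times 2^{3}$
$=\ 0.111100101_2 \times 2^{3}$
Renormalize: $1.11100101_2 \times 2^{2}$
Take 5 bits after the binary point: result = $1.11101_2 \times 2^{2}$
Dropped bits: 101. The round bit (the bit on the boundary) is 1 and the bits below it are non-zero => round up.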

7 Theory behind it
[Figure: the guard bit and the round bit sit below the kept bits; all other bits are ORed together into the sticky bit.]
- When shifting right, we don't need to remember anything more than 3 bits below.
This is a necessary and sufficient condition.
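
The rounding step in Examples 1 and 2 can be sketched in Python as follows. This is a minimal illustration, not the lecture's hardware: the function name `round_with_sticky` is hypothetical, the significand is held as a plain integer, and ties are broken toward even, which is an assumption because the slides do not state a tie rule.

```python
def round_with_sticky(sig: int, extra: int) -> int:
    """Drop the `extra` low bits of an integer significand.

    The highest dropped bit acts as the round bit; all lower dropped
    bits are ORed into a single sticky bit. Ties go to even (assumed).
    A carry out of the top bit is not renormalized here.
    """
    if extra == 0:
        return sig
    kept = sig >> extra
    round_bit = (sig >> (extra - 1)) & 1
    sticky = int((sig & ((1 << (extra - 1)) - 1)) != 0)
    if round_bit and (sticky or (kept & 1)):
        kept += 1
    return kept


# Example 1: 1.1111101 keeps 5 fraction bits, drops 2 bits (0, 1).
example1 = round_with_sticky(0b11111101, 2)
# Example 2: 1.11100101 keeps 5 fraction bits, drops 3 bits (1, 0, 1).
example2 = round_with_sticky(0b111100101, 3)
print(bin(example1), bin(example2))
```

In Example 1 the round bit is 0, so the result is truncated; in Example 2 the round bit is 1 and the sticky bit is set, so the kept bits are incremented.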

8 Polynomial Approximation of Functions

9 Taylor Series
$f(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + \dots$
Example: $\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \dots$

10 Taylor Series
How to calculate the value of the function? Group common factors.
Given: $P_N(x) = \sum_{i=0}^{N} c_i x^i = c_0 + x(c_1 + x(c_2 + \dots + x(c_{N-1} + x c_N)))$
Recursively:
$R(N) = c_N$
$R(i-1) = c_{i-1} + x R(i)$
…
$P_N(x) = R(0)$
This takes N multiplies and N adds.
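
The recursion R(i-1) = c_{i-1} + x R(i) can be sketched as a short loop. The function name `horner` and the convention that `coeffs[i]` holds c_i are assumptions made for this illustration.

```python
def horner(coeffs, x):
    """Evaluate c_0 + c_1 x + ... + c_N x^N with N multiplies and N adds.

    coeffs[i] is c_i (assumed layout). Starts from R(N) = c_N and
    applies R(i-1) = c_{i-1} + x * R(i) down to R(0).
    """
    result = coeffs[-1]
    for c in reversed(coeffs[:-1]):
        result = c + x * result
    return result


# 1 + 2x + 3x^2 at x = 2
print(horner([1, 2, 3], 2))
```

Each loop iteration depends on the previous one, which is why this form runs in series on a single multiplier and adder.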

11 Taylor Series
- 1 adder => do it in series.
- Given more components => can we go faster?
- Take N = 7 as an example: $c_7 x^7 + c_6 x^6 + c_5 x^5 + c_4 x^4 + c_3 x^3 + c_2 x^2 + c_1 x + c_0$
How to accelerate?

12 Taylor Series
$c_7 x^7 + c_6 x^6 + c_5 x^5 + c_4 x^4 + c_3 x^3 + c_2 x^2 + c_1 x + c_0$
Use 3 stages to generate $x^k$. Use $x^k$ to generate the polynomial expression.
[Figure: multipliers generating $x, x^2, x^3, \dots, x^7$ and adders combining the terms. Annotations: carry-save = constant time; log n.]

13 Taylor Series
Pairs: $c_7 x + c_6$, $c_5 x + c_4$, $c_3 x + c_2$, $c_1 x + c_0$
Then: $x^2(c_7 x + c_6) + c_5 x + c_4$ and $x^2(c_3 x + c_2) + c_1 x + c_0$
Finally: $x^4[x^2(c_7 x + c_6) + c_5 x + c_4] + x^2(c_3 x + c_2) + c_1 x + c_0$
[Figure: squaring chain $x \to x^2 \to x^4$.]
This is a bit faster. Only 2 stages.
But what is the fastest way to produce the result? And the most energy efficient? => minimize the number of multiplies.
All of this uses +'s and ×'s. We need to get rid of them. => Let's try table look-up.
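
The grouping on this slide (commonly known as Estrin's scheme) can be sketched for N = 7 as below. The function name `estrin_degree7` and the `c[i]` = c_i layout are assumptions for illustration; in software the statements run sequentially, whereas the slide's point is that the independent products at each level could run in parallel hardware.

```python
def estrin_degree7(c, x):
    """Evaluate a degree-7 polynomial using the pairwise grouping.

    c[i] is c_i (assumed layout), len(c) must be 8.
    """
    # Two squaring stages produce the powers that are needed.
    x2 = x * x
    x4 = x2 * x2
    # Level 1: four independent linear terms.
    p01 = c[1] * x + c[0]
    p23 = c[3] * x + c[2]
    p45 = c[5] * x + c[4]
    p67 = c[7] * x + c[6]
    # Level 2: combine pairs with x^2.
    low = x2 * p23 + p01
    high = x2 * p67 + p45
    # Level 3: combine halves with x^4.
    return x4 * high + low


coeffs = [1, 2, 3, 4, 5, 6, 7, 8]
print(estrin_degree7(coeffs, 2))
```

This sketch uses 9 multiplies rather than Horner's 7, trading extra multiplies for a shorter dependency chain.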

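14 Taylor Series – Table look-up
SRAM/DRAM => eats power. ROM => better option.
$f(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + \dots$
Suppose there is a table as a binary tree.
Let $x = x_H + x_L$ and $x_0 = x_H$.
Example: $x = 110101$, $x_H = 110000$, $x_L = 000101$
$f(x_H + x_L) = f(x_H) + x_L f'(x_H) + \frac{x_L^2}{2!} f''(x_H) + \dots$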

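15 Taylor Series – Table look-up
1st order: $f(x_H + x_L) \approx f(x_H) + x_L f'(x_H)$
=> Only 1 multiplication!
[Figure: $x_H$ indexes Table-1, which returns $f(x_H)$, and Table-2, which returns $f'(x_H)$. A multiplier forms $x_L \times f'(x_H)$ and an adder adds $f(x_H)$ to produce $f(x_H + x_L)$.]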
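
A minimal sketch of the first-order look-up, under several assumptions not stated on the slide: x is an unsigned fixed-point integer in [0, 1) with `h_bits + l_bits` fraction bits, the function is sin, and the tables hold Python floats rather than fixed-width ROM words. The names `build_tables` and `lookup_first_order` are hypothetical.

```python
import math


def build_tables(f, df, h_bits):
    """Table-1 holds f(x_H), Table-2 holds f'(x_H), one entry per x_H."""
    n = 1 << h_bits
    table1 = [f(i / n) for i in range(n)]
    table2 = [df(i / n) for i in range(n)]
    return table1, table2


def lookup_first_order(x, table1, table2, h_bits, l_bits):
    """Approximate f(x_H + x_L) as f(x_H) + x_L * f'(x_H)."""
    index = x >> l_bits                      # high bits select the entry
    x_low = (x & ((1 << l_bits) - 1)) / (1 << (h_bits + l_bits))
    return table1[index] + x_low * table2[index]   # one multiply, one add


H_BITS, L_BITS = 6, 6
t1, t2 = build_tables(math.sin, math.cos, H_BITS)
x_fixed = 0b110101_000101                    # x_H = 110101, x_L = 000101
approx = lookup_first_order(x_fixed, t1, t2, H_BITS, L_BITS)
print(approx, math.sin(x_fixed / (1 << (H_BITS + L_BITS))))
```

Changing the function only requires rebuilding the two tables; the multiplier and adder stay the same.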

16 Taylor Series
- With an extra order => 1 extra table and 1 extra multiplier.
- To change the function, all you have to do is change the content of the table.
- Problem? => Now it's the size of the table!
[Figure annotation: L input bits -> $2^L$ table entries.]

17 Taylor Series
- Let's split x into 3 sections (instead of the previous 2, high and low):
$x = x_1 + x_2 2^{-k} + x_3 2^{-2k}$
=> $f(x) = f(x_1 + x_2 2^{-k}) + x_3 2^{-2k} f'(x_1) + \epsilon$, with $\epsilon \approx 2^{-3k}$
f(x) requires a $2^n \times V_n$ table:
$2^n$: number of entries, where n is the number of bits of x
$V_n$: number of bits of f(x)
32-bit x => $2^{32} \times 2^{32} = 2^{64}$, 64 bits -> HUGE!!
-> But do we really need all those numbers in the table?
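
The three-section formula can be checked numerically with a short sketch. Everything beyond the formula itself is an assumption here: x is a fixed-point integer in [0, 1) with 3k fraction bits, f is sin, and the two function calls stand in for what would be table look-ups (one indexed by the top 2k bits, one by the top k bits). The name `three_part_approx` is hypothetical.

```python
import math


def three_part_approx(f, df, x_int, k):
    """Approximate f(x) as f(x1 + x2*2^-k) + x3*2^-2k * f'(x1).

    x_int is x scaled by 2^(3k); each section is k bits wide.
    """
    mask = (1 << k) - 1
    x3 = x_int & mask
    x2 = (x_int >> k) & mask
    x1 = x_int >> (2 * k)
    hi = x1 / (1 << k)            # x1
    mid = x2 / (1 << (2 * k))     # x2 * 2^-k
    lo = x3 / (1 << (3 * k))      # x3 * 2^-2k
    # f(hi + mid) and df(hi) stand in for the two table look-ups.
    return f(hi + mid) + lo * df(hi)


K = 4
worst_error = max(
    abs(three_part_approx(math.sin, math.cos, xi, K) - math.sin(xi / (1 << (3 * K))))
    for xi in range(1 << (3 * K))
)
print(worst_error, 2 ** (-3 * K))
```

For this choice of f the measured worst-case error stays below $2^{-3k}$, in line with the slide's estimate for $\epsilon$.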

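18 Taylor Series
Let $\epsilon$ (0 or 1) be the bit lost when halving, and $\lfloor \cdot \rfloor$ the lower limit (floor).
$x \cdot y = \frac{(x+y)^2}{4} - \frac{(x-y)^2}{4}$
$= \left( \lfloor \frac{x+y}{2} \rfloor + \frac{\epsilon}{2} \right)^2 - \left( \lfloor \frac{x-y}{2} \rfloor + \frac{\epsilon}{2} \right)^2$
$= \lfloor \frac{x+y}{2} \rfloor^2 - \lfloor \frac{x-y}{2} \rfloor^2 + \epsilon \cdot y$
[Figure: x -> table -> $x^2$.]
The content of the lower bits determines the lower bits of the result, but not the other bits!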
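
The identity above turns a multiplication into two look-ups in a table of squares plus a conditional add. The sketch below assumes unsigned n-bit integer operands; the names `make_square_table` and `table_multiply` are hypothetical, and the operands are swapped so that x >= y, which keeps the table index non-negative.

```python
def make_square_table(n_bits):
    """Squares of 0 .. 2^n_bits - 1, enough for floor((x + y) / 2)."""
    return [i * i for i in range(1 << n_bits)]


def table_multiply(x, y, squares):
    """Compute x * y from two square look-ups and an optional add of y."""
    if x < y:
        x, y = y, x                  # ensure x - y >= 0
    a = (x + y) >> 1                 # floor((x + y) / 2)
    b = (x - y) >> 1                 # floor((x - y) / 2)
    eps = (x + y) & 1                # 1 when x + y (and x - y) is odd
    return squares[a] - squares[b] + (y if eps else 0)


squares_4bit = make_square_table(4)
print(table_multiply(13, 11, squares_4bit))
```

With x = 3 and y = 2 the look-ups give 4 - 0 and the correction adds y = 2, for a product of 6.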

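19 Taylor Series
- Table size: $2^n \times v$ vs. $2^n \times (v - w) + 2^L \times w$
$2^n \times (v - w) + 2^L \times w = 2^n \times v - (2^n \times w - 2^L \times w)$
$= 2^n \times v - w(2^n - 2^L)$
Size of table is reduced by $w(2^n - 2^L)$.
[Figure: single table: n-bit x -> $2^n \times v$ table -> v-bit f(x). Split tables: n-bit x -> $2^n \times (v - w)$ table -> upper v - w bits; L bits of x -> $2^L \times w$ table -> lower w bits of f(x).]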
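
A minimal sketch of the split table, using the squaring function as the example. It assumes L = w, which holds for squaring because the low w bits of x^2 depend only on the low w bits of x; the slide does not fix L. The names `build_split_square_tables`, `square_from_tables`, and `bits_saved` are hypothetical.

```python
def build_split_square_tables(n, w):
    """High table: 2^n entries of the upper bits; low table: 2^w entries of w bits."""
    low_mask = (1 << w) - 1
    high_table = [(x * x) >> w for x in range(1 << n)]
    low_table = [(i * i) & low_mask for i in range(1 << w)]
    return high_table, low_table


def square_from_tables(x, high_table, low_table, w):
    """Reassemble x^2 from the upper-bits table and the lower-bits table."""
    return (high_table[x] << w) | low_table[x & ((1 << w) - 1)]


def bits_saved(n, v, w, L):
    """Single-table size minus split-table size, in bits: w * (2^n - 2^L)."""
    return (1 << n) * v - ((1 << n) * (v - w) + (1 << L) * w)


high, low = build_split_square_tables(8, 4)
print(square_from_tables(200, high, low, 4), bits_saved(8, 16, 4, 4))
```

For n = 8, v = 16 and w = L = 4, the saving is 4 × (256 - 16) = 960 bits.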

