CSE 575 Computer Arithmetic Spring 2003 Mary Jane Irwin (www. cse. psu

Division by Reciprocation
Assuming a fast multiplier is provided, another way to do division is via reciprocation, $Q = P/D = P \times (1/D)$, which can be particularly efficient if several divisions by the same divisor need to be performed.
Compute the reciprocal by
- Series expansion (Taylor series)
- Additive iteration (Newton-Raphson)
Also consider that divide happens infrequently; we would like divide time to approximately equal multiply time (if possible).

Two Goals
- Shoot for logarithmic convergence (e.g., double the number of “resolved” quotient bits each iteration)
- Use only simple operations (e.g., add, subtract, compare, multiply)

Reciprocation by Series Expansion
Let $D = 1 + X$ with $\tfrac{1}{2} \le D < 1$. Then, based on the Maclaurin series,
$g(X) = 1/D = 1/(1 + X) = 1 - X + X^2 - X^3 + X^4 - \cdots$
and since $X = D - 1$, the above can be factored (for $\tfrac{1}{2} \le D < 1$) into
$1/D = (1 - X)(1 + X^2)(1 + X^4)(1 + X^8)(1 + X^{16}) \cdots$
Notice that the 2's complement of $1 + X^j$ is $1 - X^j$, since $2 - (1 + X^j) = 1 - X^j$, and that $(1 + X^j)(1 - X^j) = 1 - X^{2j}$.
(The Maclaurin series is a special case of the Taylor series.)
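As a rough numerical check of the factored form, the following Python sketch multiplies the first few factors and compares the result with $1/D$. The function name `reciprocal_by_series` and the `num_factors` parameter are illustrative choices, not from the slides, and floating point stands in for the fixed-point hardware.

```python
def reciprocal_by_series(d, num_factors=5):
    """Approximate 1/d as (1 - x)(1 + x^2)(1 + x^4)... with x = d - 1."""
    x = d - 1.0              # D = 1 + X, so X is in [-1/2, 0) for D in [1/2, 1)
    product = 1.0 - x        # first factor (1 - X)
    power = x * x            # X^2
    for _ in range(num_factors - 1):
        product *= 1.0 + power   # multiply in the factor (1 + X^(2^k))
        power *= power           # square: X^2 -> X^4 -> X^8 -> ...
    return product

approx = reciprocal_by_series(0.75)
print(approx, 1.0 / 0.75)
```

With five factors the product equals $(1/D)(1 - X^{32})$, so the remaining relative error is $X^{32}$, which is at most $2^{-32}$ on this range of $D$.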

IBM 360/91 Approach
Compute $1/D$ to 32-bit precision via
- $(1 - X)(1 + X^2)(1 + X^4)$ by table look-up
- $1 - X^8 = [(1 - X)(1 + X^2)(1 + X^4)](1 + X)$
- $1 + X^8$ is the 2's complement of $1 - X^8$
- $1 - X^{16} = (1 + X^8)(1 - X^8)$
- $1 + X^{16}$ is the 2's complement of $1 - X^{16}$
- $1 - X^{32} = (1 + X^{16})(1 - X^{16})$
- $1 + X^{32}$ is the 2's complement of $1 - X^{32}$
Requires a $2^8 \times 8$ table look-up and three multiplies to compute the needed terms.
Start off with a ROM table look-up (for speed): we want the first 8 bits (to the right of the binary point) to be “correct”, so we need a $2^8 \times 8$-bit ROM (or better, a $2^{10} \times 8$-bit ROM) to give the first 8 bits of the inverse.

Series Expansion Calculations
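$1/D = (1 - X)(1 + X^2)(1 + X^4)(1 + X^8)(1 + X^{16})(1 + X^{32})$
[Diagram: the table look-up value is multiplied by $(1 + X)$ to give $(1 - X^8)$, whose 2's complement is $(1 + X^8)$; each later stage uses two multiplies (*) and a 2's complement to produce $(1 - X^{16})$ and $(1 + X^{16})$, then $(1 - X^{32})$ and $(1 + X^{32})$.]
Need two multipliers per iteration.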
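The chain of multiplies and 2's complements can be traced with a small Python sketch. This is only an illustration under stated assumptions: the name `series_dataflow` and the `stages` parameter are hypothetical, the ROM look-up is replaced by computing the three-factor product directly, and exact rationals are used instead of truncated hardware words.

```python
from fractions import Fraction

def series_dataflow(d, stages=3):
    """Follow the table look-up / multiply / 2's complement chain for 1/d."""
    x = d - 1                                  # D = 1 + X
    # Stand-in for the ROM: the slides obtain this product by table look-up.
    product = (1 - x) * (1 + x**2) * (1 + x**4)
    minus = product * (1 + x)                  # 1 - X^8
    for _ in range(stages):
        plus = 2 - minus                       # 2's complement: 1 + X^(2^k)
        product = product * plus               # multiply 1: extend the running product
        minus = minus * plus                   # multiply 2: 1 - X^(2^(k+1))
    # The final update of `minus` is not needed for the result.
    return product

d = Fraction(3, 4)
approx = series_dataflow(d)
error = 1 / d - approx
print(float(approx), float(error))
```

Each stage performs two independent multiplications on the same `plus` value, which is one way to read the note about needing two multipliers per iteration.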

Additive Iteration
The iteration function must be based on a continuous and differentiable function, set up as $f(X) = 0$,
- whose root is $Q = P/D$ or $1/D$ (or something close)
- for which we can develop an iterative method for finding the root
- where the iterations contain only simple operations (i.e., no divide)

Newton-Raphson Approach
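Newton-Raphson method: determine a root of $f(X) = 0$, giving the iterative recurrence
$X_{i+1} = X_i - f(X_i)/f'(X_i)$
which follows from the tangent slope $f'(X_i) = f(X_i)/(X_i - X_{i+1})$.
[Figure: curve $f(X)$ with the tangent at $X_i$ crossing the X axis at $X_{i+1}$; the successive iterates $X_{i+1}$, $X_{i+2}$ approach the root.]
This is a second-order method (i.e., it uses the first derivative) and will need 2 multiplies per iteration.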

Newton-Raphson Reciprocation
Newton-Raphson applied to reciprocation uses
$f(X) = 1/X - D = 0$
which has a root at $X = 1/D$. Since $f'(X) = -(1/X)^2$, this gives the recurrence
$X_{i+1} = X_i (2 - X_i D)$
requiring two multiplies per iteration. Choose $X_0$ such that $0 < X_0 < 2/D$.
In general, for $D$ in $[1/2, 1)$, $1/D$ is in $(1, 2]$, so
- picking $X_0 = 1.5$ is simple and adequate (and since the initial error $\epsilon_0 = 1/D - X_0$ satisfies $|\epsilon_0| < 1/D$, convergence is guaranteed)
- choosing $X_0 = 1$ also converges, and the first iteration only requires one multiply; choosing $X_0 = 1$ approaches the root from below
$f(X)$ is continuous and differentiable, and $f(X) = 0$ means $1/X = D$, so $X = 1/D$: the root is at the reciprocal, and it is reached by a simple iteration using 2 multiplies.
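A minimal Python sketch of this recurrence is given below. The function name `nr_reciprocal`, its default arguments, and the sample dividends are assumptions made for illustration; floating point is used in place of the fixed-point datapath.

```python
def nr_reciprocal(d, x0=1.5, iterations=5):
    """Approximate 1/d with X_{i+1} = X_i * (2 - X_i * d); uses no division."""
    x = x0
    for _ in range(iterations):
        x = x * (2.0 - x * d)    # two multiplies and one subtraction
    return x

# Reuse one reciprocal for several divisions by the same divisor.
recip = nr_reciprocal(0.75)
quotients = [p * recip for p in (3.0, 1.5, 0.375)]
print(recip, quotients)
```

Once the reciprocal is available, each further division by the same divisor costs a single multiply.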

Decimal Example
Find $1/D$ where $D = 0.75$ ($1/D = 1.33333\ldots$)
[For lecture: the iterates and errors up to $X_3$ and $\epsilon_3$ are left blank on the slide.]
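The iterates for this example can be generated with a short Python sketch. The starting value $X_0 = 1.5$ is an assumption (the slide does not state which initial value was used in lecture), and `nr_iterates` is a hypothetical helper name. Exact rationals are used so the printed decimals are not affected by rounding inside the loop.

```python
from fractions import Fraction

def nr_iterates(d, x0, count):
    """Return [X_1, ..., X_count] for X_{i+1} = X_i * (2 - X_i * d)."""
    xs, x = [], x0
    for _ in range(count):
        x = x * (2 - x * d)
        xs.append(x)
    return xs

d = Fraction(3, 4)                              # D = 0.75
iterates = nr_iterates(d, Fraction(3, 2), 3)    # X0 = 1.5 is an assumption
for i, x in enumerate(iterates, start=1):
    print(f"X{i} = {float(x):.10f}   error = {float(1 / d - x):.3e}")
```

Under that assumption the first two iterates are $X_1 = 21/16 = 1.3125$ and $X_2 = 1365/1024 = 1.3330078125$.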

Convergence Rate
Gives quadratic convergence ($\epsilon_{i+1} \le |\epsilon_i|^2$).
With $X_{i+1} = X_i (2 - X_i D)$ and $\epsilon_i = 1/D - X_i$:
$X_i = 1/D - \epsilon_i = (1 - D\epsilon_i)/D$
$\epsilon_{i+1} = 1/D - X_{i+1} = 1/D - [X_i (2 - X_i D)] = [1 - 2DX_i + (DX_i)^2]/D$
Substituting for $X_i$:
$\epsilon_{i+1} = [1 - 2D((1 - D\epsilon_i)/D) + (1 - D\epsilon_i)^2]/D = D\epsilon_i^2$
Recall that $D < 1$, so $\epsilon_{i+1} \le |\epsilon_i|^2$.
To get 32 bits of precision, need 5 iterations, each requiring two multiplies.
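The error recurrence and the iteration count can be checked exactly with rational arithmetic. In this sketch the helper names `error_sequence` and `iterations_for_bits` are hypothetical, and the choice of $D = 1/2$ with $X_0 = 1.5$ (so that $\epsilon_0 = 0.5$) is an assumed test case, picked because it gives the largest initial error for that starting value.

```python
from fractions import Fraction

def error_sequence(d, x0, count):
    """Return [e_0, ..., e_count] with e_i = 1/d - X_i."""
    x = x0
    errors = [1 / d - x]
    for _ in range(count):
        x = x * (2 - x * d)
        errors.append(1 / d - x)
    return errors

def iterations_for_bits(d, x0, bits):
    """Count iterations until |e_i| <= 2**-bits."""
    x, count = x0, 0
    while abs(1 / d - x) > Fraction(1, 2**bits):
        x = x * (2 - x * d)
        count += 1
    return count

d = Fraction(1, 2)                        # assumed test case: e_0 = 0.5 for X0 = 1.5
errors = error_sequence(d, Fraction(3, 2), 5)
recurrence_holds = all(errors[i + 1] == d * errors[i] ** 2 for i in range(5))
needed = iterations_for_bits(d, Fraction(3, 2), 32)
print(recurrence_holds, needed)
```

For this case the errors are $2^{-1}, 2^{-3}, 2^{-7}, 2^{-15}, 2^{-31}, 2^{-63}$, so the fifth iteration is the first to reach 32 bits.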

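Binary Example
Find $1/D$ where $D = 0.75 = 0.1100_2$
$X_1 = 2 - D$, with $\epsilon_1 \le 2^{-2}$
$X_2$, with $\epsilon_2 \le 2^{-4}$
$X_3$, with $\epsilon_3 \le 2^{-8}$
[For lecture: the numeric values are left blank on the slide.]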
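The binary iterates can be reproduced with the following Python sketch. It takes $X_0 = 1$, which is what $X_1 = 2 - D$ implies; the helper `to_binary` and the 16-bit display width are illustrative choices rather than anything specified on the slide.

```python
from fractions import Fraction

def to_binary(value, frac_bits):
    """Format a non-negative value in binary, truncated to frac_bits fraction bits."""
    scaled = int(value * 2**frac_bits)
    whole, frac = divmod(scaled, 2**frac_bits)
    return f"{whole:b}.{frac:0{frac_bits}b}"

d = Fraction(3, 4)            # D = 0.1100 in binary
x = Fraction(1)               # X0 = 1, implied by X1 = 2 - D
rows = []
for i in range(1, 4):
    x = x * (2 - x * d)       # X_{i+1} = X_i * (2 - X_i * D)
    rows.append((i, x, 1 / d - x))
    print(f"X{i} = {to_binary(x, 16)}   error = {float(1 / d - x):.3e}")
```

The exact iterates are $X_1 = 5/4$, $X_2 = 85/64$, and $X_3 = 21845/16384$, and each error stays within the bound listed for it.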

Initial Values
For $D$ in $[\tfrac{1}{2}, 1)$ a good initial value would be $X_0 = 1.5$, since it limits $|\epsilon_0|$ to a maximum of 0.5.
A better approximation would be
$X_0 = 4(\sqrt{3} - 1) - 2D = 2.9282 - 2D$
that can be obtained easily and quickly from $D$ by shifting and adding.
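To compare the two starting values, the sketch below scans a grid of divisors and records the worst initial error for each. The grid size and the function names (`max_initial_error`, `constant_guess`, `linear_guess`) are assumptions for illustration; a grid scan only approximates the true maximum.

```python
import math

def max_initial_error(guess, samples=512):
    """Largest |1/D - guess(D)| over a grid of D values in [1/2, 1)."""
    worst = 0.0
    for k in range(samples):
        d = 0.5 + 0.5 * k / samples
        worst = max(worst, abs(1.0 / d - guess(d)))
    return worst

def constant_guess(d):
    return 1.5

def linear_guess(d):
    return 4.0 * (math.sqrt(3.0) - 1.0) - 2.0 * d

worst_constant = max_initial_error(constant_guess)
worst_linear = max_initial_error(linear_guess)
print(worst_constant, worst_linear)
```

On this grid the constant guess has a worst-case $|\epsilon_0|$ of 0.5, while the linear guess stays just under 0.1.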

Speeding it Up
Iterative division takes $2\log_2 n - 1$ multiplications. So with 64-bit numbers and a 5 ns multiplier, division would need $5 \times (2 \times 6 - 1) = 55$ ns.
Speedups are possible through
- Reducing the number of multiplies by doing a better initial guess (e.g., with a table look-up)
- Using narrower multiplications
- Performing the multiply faster
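The latency estimate can be written as a small Python helper. The name `iterative_division_cost` is hypothetical, and rounding $\log_2 n$ up for word sizes that are not powers of two is an assumption that the slide does not address.

```python
import math

def iterative_division_cost(n_bits, multiply_ns):
    """Return (multiplies, latency in ns) using the 2*log2(n) - 1 estimate."""
    # Rounding up for non-power-of-two widths is an assumption.
    multiplies = 2 * math.ceil(math.log2(n_bits)) - 1
    return multiplies, multiplies * multiply_ns

multiplies, latency_ns = iterative_division_cost(64, 5)
print(multiplies, latency_ns)
```

For 64-bit operands and a 5 ns multiplier this gives 11 multiplies and 55 ns, matching the figure above.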

Key References
Anderson, The IBM system 360/91 floating point execution unit, IBM J Res. Development, 11(1):34-53, 1967.
Flynn, On division by functional iteration, IEEE Trans. on Computers, C-19(8): , 1970.
Oberman and Flynn, Division algorithms and implementation, IEEE Trans. on Computers, C-46(8): , 1997.
Waser and Flynn, Introduction to Arithmetic for Digital Systems Designers, HRW, 1982.
