Presentation is loading. Please wait.

Presentation is loading. Please wait.

Lecture 11 Advanced Dividers.

Similar presentations


Presentation on theme: "Lecture 11 Advanced Dividers."— Presentation transcript:

1 Lecture 11 Advanced Dividers

2 Required Reading Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Design Chapter 15 Variation in Dividers 15.3, Combinational and Array Dividers Chapter 16, Division by Convergence

3 Division versus Multiplication
Division is more complex than multiplication: Need for quotient digit selection or estimation Overflow possibility: the high-order k bits of z must be strictly less than d; this overflow check also detects the divide-by-zero condition. Pentium III latencies Instruction Latency Cycles/Issue Load / Store Integer Multiply Integer Divide Double/Single FP Multiply Double/Single FP Add Double/Single FP Divide The ratios haven’t changed much in later Pentiums, Atom, or AMD products* *Source: T. Granlund, “Instruction Latencies and Throughput for AMD and Intel x86 Processors,” Feb. 2012 May 2012 Computer Arithmetic, Division

4 Classification of Dividers
Array Dividers Dividers by Convergence Sequential Radix-2 High-radix Restoring Non-restoring regular SRT using carry save adders SRT using carry save adders

5 Fractional Division

6 Unsigned Fractional Division
zfrac Dividend z-1z z-(2k-1)z-2k dfrac Divisor d-1d d-(k-1) d-k qfrac Quotient q-1q q-(k-1) q-k sfrac Remainder …0s-(k+1) s-(2k-1) s-2k k bits

7 Integer vs. Fractional Division
For Integers: z = q d + s  2-2k z 2-2k = (q 2-k) (d 2-k) + s (2-2k) For Fractions: zfrac = qfrac dfrac + sfrac where zfrac = z 2-2k dfrac = d 2-k qfrac = q 2-k sfrac = s 2-2k

8 Unsigned Fractional Division Overflow
Condition for no overflow: zfrac < dfrac

9 Sequential Fractional Division
Basic Equations s(0) = zfrac s(j) = 2 s(j-1) - q-j dfrac for j=1..k 2k · sfrac = s(k) sfrac = 2-k · s(k)

10 Fig. 13.2 Examples of sequential division with integer and fractional operands.

11 Array Dividers

12 Sequential Fractional Division
Basic Equations sfrac(0) = zfrac s(j) = 2 s(j-1) - q-j dfrac s(k)frac = 2k sfrac

13 Restoring Unsigned Fractional Division
s(0) = z for j = 1 to k if 2 s(j-1) - d > 0 q-j = 1 s(j) = 2 s(j-1) - d else q-j = 0 s(j) = 2 s(j-1)

14 Restoring Array Divider
May 2012 Computer Arithmetic, Division

15 Non-Restoring Unsigned Fractional Division
s(-1) = z-d for j = 0 to k-1 if s(j-1) > 0 q-j = 1 s(j) = 2 s(j-1) - d else q-j = 0 s(j) = 2 s(j-1) + d end for if s(k-1) > 0 q-k = 1 q-k = 0

16 Nonrestoring Array Divider
Similarity to array multiplier is deceiving Critical path May 2012 Computer Arithmetic, Division

17 Division by Convergence

18 Division by Convergence
Chapter Goals Show how by using multiplication as the basic operation in each division step, the number of iterations can be reduced Chapter Highlights Digit-recurrence as convergence method Convergence by Newton-Raphson iteration Computing the reciprocal of a number Hardware implementation and fine tuning May 2012 Computer Arithmetic, Division

19 16.1 General Convergence Methods
Sequential digit-at-a-time (binary or high-radix) division can be viewed as a convergence scheme As each new digit of q = z / d is determined, the quotient value is refined, until it reaches the final correct value Convergence is from below in restoring division and oscillating in nonrestoring division Digit q 1 Meanwhile, the remainder s = z – q  d approaches 0; the scaled remainder is kept in a certain range, such as [– d, d) May 2012 Computer Arithmetic, Division

20 Elaboration on Scaled Remainder in Division
The partial remainder s(j) in division recurrence isn’t the true remainder but a version scaled by 2j Division with left shifts s(j) = 2s(j–1) – qk–j (2k d) with s(0) = z and |–shift–| s(k) = 2ks |––– subtract –––| Digit q 1 Quotient digit selection keeps the scaled remainder bounded (say, in the range –d to d) to ensure the convergence of the true remainder to 0 May 2012 Computer Arithmetic, Division

21 Recurrence Formulas for Convergence Methods
u (i+1) = f(u (i), v (i)) v (i+1) = g(u (i), v (i)) u (i+1) = f(u (i), v (i), w (i)) v (i+1) = g(u (i), v (i), w (i)) w (i+1) = h(u (i), v (i), w (i)) Constant Desired function Guide the iteration such that one of the values converges to a constant (usually 0 or 1) The other value then converges to the desired function The complexity of this method depends on two factors: a. Ease of evaluating f and g (and h) b. Rate of convergence (number of iterations needed) May 2012 Computer Arithmetic, Division

22 16.2 Division by Repeated Multiplications
Motivation: Suppose add takes 1 clock and multiply 3 clocks 64-bit divide takes 64 clocks in radix 2, 32 in radix 4  Divide faster via multiplications faster if 10 or fewer needed Idea: Converges to q Force to 1 Remainder often not needed, but can be obtained by another multiplication if desired: s = z – qd To turn the identity into a division algorithm, we face three questions: 1. How to select the multipliers x(i) ? 2. How many iterations (pairs of multiplications)? 3. How to implement in hardware? May 2012 Computer Arithmetic, Division

23 Formulation as a Convergence Computation
Idea: Force to 1 Converges to q d (i+1) = d (i) x (i) Set d (0) = d; make d (m) converge to 1 z (i+1) = z (i) x (i) Set z (0) = z; obtain z/d = q  z (m) Question 1: How to select the multipliers x (i) ? x (i) = 2 – d (i) This choice transforms the recurrence equations into: d (i+1) = d (i) (2 - d (i)) Set d (0) = d; iterate until d (m)  1 z (i+1) = z (i) (2 - d (i)) Set z (0) = z; obtain z/d = q  z (m) u (i+1) = f(u (i), v (i)) v (i+1) = g(u (i), v (i)) Fits the general form May 2012 Computer Arithmetic, Division

24 Determining the Rate of Convergence
d (i+1) = d (i) x (i) Set d (0) = d; make d (m) converge to 1 z (i+1) = z (i) x (i) Set z (0) = z; obtain z/d = q  z (m) Question 2: How quickly does d (i) converge to 1? We can relate the error in step i + 1 to the error in step i: d (i+1) = d (i) (2 - d (i)) = 1 – (1 – d (i))2 1 – d (i+1) = (1 – d (i))2 For 1 – d (i)  e, we get 1 – d (i+1)  e2: Quadratic convergence In general, for k-bit operands, we need 2m – 1 multiplications and m 2’s complementations where m = log2 k May 2012 Computer Arithmetic, Division

25 Quadratic Convergence
Table Quadratic convergence in computing z/d by repeated multiplications, where 1/2  d = 1 – y < 1 ––––––––––––––––––––––––––––––––––––––––––––––––––––––– i d (i) = d (i–1) x (i–1), with d (0) = d x (i) = 2 – d (i) 0 1 – y = (.1xxx xxxx xxxx xxxx)two  1/ y 1 1 – y 2 = (.11xx xxxx xxxx xxxx)two  3/ y 2 2 1 – y 4 = ( xxxx xxxx xxxx)two  15/ y 4 3 1 – y 8 = ( xxxx xxxx)two  255/ y 8 4 1 – y 16 = ( )two = 1 – ulp Each iteration doubles the number of guaranteed leading 1s (convergence to 1 is from below) Beginning with a single 1 (d  ½), after log2 k iterations we get as close to 1 as is possible in a fractional representation May 2012 Computer Arithmetic, Division

26 Graphical Depiction of Convergence to q
Fig Graphical representation of convergence in division by repeated multiplications. May 2012 Computer Arithmetic, Division

27 16.5 Hardware Implementation
Repeated multiplications: Each pair of ops involves the same multiplier d (i+1) = d (i) (2 - d (i)) Set d (0) = d; iterate until d (m)  1 z (i+1) = z (i) (2 - d (i)) Set z (0) = z; obtain z/d = q  z (m) Fig Two multiplications fully overlapped in a 2-stage pipelined multiplier. May 2012 Computer Arithmetic, Division

28 16.3 Division by Reciprocation
The Newton-Raphson method can be used for finding a root of f (x) = 0 Start with an initial estimate x(0) for the root Iteratively refine the estimate via the recurrence x(i+1) = x(i) – f (x(i)) / f (x(i)) Justification: tan a(i) = f (x(i)) = f (x(i)) / (x(i) – x(i+1)) Fig Convergence to a root of f(x) = 0 in the Newton-Raphson method. May 2012 Computer Arithmetic, Division

29 Computing 1/d by Convergence
1/d is the root of f (x) = 1/x – d f (x) = –1/x2 Substitute in the Newton-Raphson recurrence x(i+1) = x(i) – f (x(i)) / f (x(i)) to get: x (i+1) = x (i) (2 - x (i)d) One iteration = Two multiplications + One 2’s complementation Error analysis: Let d (i) = 1/d – x(i) be the error at the ith iteration d (i+1) = 1/d – x (i+1) = 1/d – x (i) (2 – x (i) d) = d (1/d – x (i))2 = d (d (i))2 Because d < 1, we have d (i+1) < (d (i))2 -d 1/d x f(x) May 2012 Computer Arithmetic, Division

30 Choosing the Initial Approximation to 1/d
With x(0) in the range 0 < x(0) < 2/d, convergence is guaranteed Justification: | d(0) | = | x(0) – 1/d | < 1/d d(1) = | x(1) – 1/d | = d (d(0))2 = (d d(0)) d(0) < d(0) For d in [1/2, 1): Simple choice x(0) = 1.5 Max error = 0.5 < 1/d Better approx x(0) = 4(3 – 1) – 2d = – 2d Max error  0.1 1 x 1/x 2 May 2012 Computer Arithmetic, Division

31 16.4 Speedup of Convergence Division
Compute y = 1/d Do the multiplication yz Division can be performed via 2 log2 k – 1 multiplications This is not yet very impressive 64-bit numbers, 3-ns multiplier  33-ns division Three types of speedup are possible: Fewer multiplications (reduce m) Narrower multiplications (reduce the width of some x(i)s) Faster multiplications May 2012 Computer Arithmetic, Division

32 Initial Approximation via Table Lookup
Convergence is slow in the beginning: it takes 6 multiplications to get 8 bits of convergence and another 5 to go from 8 bits to 64 bits d x(0) x(1) x(2) = ( )two Better approx Approx to 1/d Read this value, x(0+), directly from a table, thereby reducing 6 multiplications to 2 A 2w  w lookup table is necessary and sufficient for w bits of convergence after 2 multiplications Example with 4-bit lookup: d = xxxx (11/16  d < 12/16) Inverses of the two extremes are 16/11  and 16/12  So, is a good estimate for 1/d  = (11/8)  (11/16) = 121/128 =  = (11/8)  (3/4) = 33/32 = May 2012 Computer Arithmetic, Division

33 Visualizing the Convergence with Table Lookup
Fig Convergence in division by repeated multiplications with initial table lookup. May 2012 Computer Arithmetic, Division

34 Convergence Does Not Have to Be from Below
Fig Convergence in division by repeated multiplications with initial table lookup and the use of truncated multiplicative factors. May 2012 Computer Arithmetic, Division

35 Sequential Dividers with Carry-Save Adders

36 Block diagram of a radix-2 SRT divider with partial remainder in stored-carry form

37 Pentium bug (1) October 1994 Thomas Nicely, Lynchburg Collage, Virginia finds an error in his computer calculations, and traces it back to the Pentium processor November 7, 1994 First press announcement, Electronic Engineering Times Late 1994 Tim Coe, Vitesse Semiconductor presents an example with the worst-case error c = / Pentium = Correct result =

38 Pentium bug (2) Intel admits “subtle flaw” November 30, 1994
Intel’s white paper about the bug and its possible consequences Intel - average spreadsheet user affected once in 27,000 years IBM - average spreadsheet user affected once every 24 days Replacements based on customer needs December 20, 1994 Announcement of no-question-asked replacements

39 Pentium bug (3) Error traced back to the look-up table used by
the radix-4 SRT division algorithm 2048 cells, 1066 non-zero values {-2, -1, 1, 2} 5 non-zero values not downloaded correctly to the lookup table due to an error in the C script

40

41 Follow-up Courses

42 DIGITAL SYSTEMS DESIGN
ECE 681 VLSI Design for ASICs (Fall semesters) H. Homayoun, project/lab, front-end and back-end ASIC design with Synopsys tools 2. ECE 699 Digital Signal Processing Hardware Architectures (Spring semesters) A. Cohen, project, FPGA design for DSP ECE 682 VLSI Test Concepts (Spring semesters) T. Storey, homework

43 NETWORK AND SYSTEM SECURITY
ECE 646 Cryptography and Computer Network Security (Fall semesters) K.Gaj, hardware, software, or analytical project 2. ECE 746 Advanced Applied Cryptography (Spring semesters) J.-P. Kaps, hardware, software, or analytical project ECE 899 Cryptographic Engineering (Spring semesters) J.-P. Kaps, research-oriented project


Download ppt "Lecture 11 Advanced Dividers."

Similar presentations


Ads by Google