CSE 575 Computer Arithmetic Spring 2003 Mary Jane Irwin (www. cse. psu


1 CSE 575 Computer Arithmetic Spring 2003 Mary Jane Irwin (www. cse. psu

2 Table Lookup Arithmetic Def’n
Given an m-variable function $f(x_{m-1}, x_{m-2}, \ldots, x_1, x_0)$, the table lookup evaluation of $f$ requires the construction of a $2^u \times n$ table that holds, for each combination of input values (needing a total of $u$ bits to represent), the desired $n$-bit result. The $u$-bit string is obtained by concatenating the input values to form the table address. The $n$-bit value is the content of the table at that address.
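The definition above can be sketched in a few lines of Python. This is only an illustration under assumptions not in the slides: the helper names `build_table` and `lookup`, the 4-bit operand widths, and the choice to place the first operand in the most significant address bits are all hypothetical.

```python
def build_table(f, widths):
    """Tabulate f for every input combination; widths[i] is the bit width of operand i."""
    u = sum(widths)                      # total address width in bits
    table = []
    for addr in range(1 << u):           # 2**u table entries
        operands = []
        shift = u
        for w in widths:                 # split the address back into operand fields
            shift -= w
            operands.append((addr >> shift) & ((1 << w) - 1))
        table.append(f(*operands))
    return table


def lookup(table, widths, *operands):
    """Concatenate the operands into an address and read the table."""
    addr = 0
    for w, x in zip(widths, operands):
        addr = (addr << w) | (x & ((1 << w) - 1))
    return table[addr]


# Hypothetical example: a 4-bit by 4-bit multiplication table (2**8 entries).
MUL_TABLE = build_table(lambda x, y: x * y, (4, 4))
```

With these assumed widths, `lookup(MUL_TABLE, (4, 4), 7, 9)` reads the entry at address `0b0111_1001`.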

3 Arithmetic by Table Lookup
Advantages
- memory is denser and can be made more robust than random logic
- reduces the cost of hardware development
- more flexible (allows last-minute design changes)
- reduces the number of building blocks
Disadvantages
- slow for large tables
- table size grows exponentially with input size

4 Direct Table Lookup
Figure: Operand(s) (u bits) address a $2^u \times n$ table (ROM or RAM), which returns the Result(s) (n bits).
- Unary (single-variable) functions ($1/x$, $\ln x$, $x^2$): limited to 12 to 16 bits of operand precision (a $2^{12}$- to $2^{16}$-entry table)
- Binary functions ($xy$, $x/y$, $x^y$): limited to 8 bits of operand precision (a $2^{8+8} = 2^{16}$-entry table); operand bits are concatenated to form the address
Note: 2**10 = 1K, 2**16 = 64K

5 Indirect Table Lookup
One way to reduce the table size is to check whether the binary function can be converted into a unary function (this requires pre- and postprocessing logic).
Figure: Operand(s) (u bits) -> Preprocessing logic -> Smaller table(s) (ROM or RAM), addressed with far fewer than u bits -> Postprocessing logic -> Result(s) (n bits).
Note: The boundary between using tables in a supporting role (as we have seen for SRT division and function evaluation) and using them in a primary role is quite fuzzy. Pure logic and pure table lookup are the extreme points in a continuum of hybrid solutions.

6 Indirect Table Lookup Example
Pre- and postprocessing hardware should be simple and fast. Consider the multiplication identity that converts the problem to one of squaring:
$X \cdot Y = \frac{1}{4}\left[(X+Y)^2 - (X-Y)^2\right]$
- Preprocessing computes X+Y and X-Y
- Two tables look up $(X+Y)^2$ and $(X-Y)^2$
- Postprocessing does a subtraction and a 2-bit right shift
- Alternatively, share one table and do the two square evaluations serially
- Many further tricks (see book section 24.2) can reduce the address size by a few bits
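A minimal Python sketch of the squaring identity, offered as an illustration rather than as the hardware design from the lecture. It assumes unsigned 8-bit operands, a single shared table of squares, and addressing the table with |X-Y| instead of a two's-complement difference; the names `N`, `SQUARES`, and `multiply_via_squares` are hypothetical.

```python
N = 8  # assumed operand width in bits

# One shared table of squares with 2**(N+1) entries, enough to cover X+Y.
SQUARES = [v * v for v in range(1 << (N + 1))]


def multiply_via_squares(x, y):
    """Multiply two unsigned N-bit values using only add, subtract, lookup, and shift."""
    s = x + y                 # preprocessing: sum
    d = abs(x - y)            # preprocessing: difference (squared, so the sign is irrelevant)
    diff_of_squares = SQUARES[s] - SQUARES[d]   # two table lookups and a subtraction
    return diff_of_squares >> 2                  # postprocessing: divide by 4
```

The difference of squares equals 4XY exactly, so the 2-bit right shift loses nothing.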

7 Indirect Multiplication
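Figure: X (n bits) and Y (n bits) feed an adder and a subtractor; each (n+1)-bit result addresses a ROM with $2^{n+1}$ addresses and 2n content bits, one holding $(X+Y)^2$ and the other $(X-Y)^2$; the ROM outputs are subtracted and passed through a right shifter (¼) to give X*Y (2n bits).
Note: Assuming 16-bit operands, this takes two 2**17 ROM tables by 32 bits (= 2**18 by 32 bits of storage), as opposed to one 2**32 ROM table by 16 bits.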

8 Hardware Optimizations
Since X+Y and X-Y are either both even or both odd, the two least significant bits of $(X+Y)^2$ and $(X-Y)^2$ are identical (both 00 or both 01) and cancel each other out in the postprocessing subtraction.
- can reduce the tables to $2^{n+1} \times (2n-2)$
- can eliminate the postprocessing right shifter
Improved speed at no cost!
Note: Assuming 16-bit operands, this takes two 2**17 ROM tables by 30 bits (= 2**18 by 30 bits of storage). Speed improves because the table is a little smaller and the final shift step is eliminated.
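The parity argument above can be checked with a short Python sketch. It assumes unsigned 8-bit operands and a table addressed by |X-Y|; the names `TRIMMED_SQUARES`, `low_bits_match`, and `multiply_trimmed` are hypothetical, and the sketch does not model the exact table word width quoted on the slide.

```python
N = 8  # assumed operand width in bits

# Squares stored without their two least significant bits.
TRIMMED_SQUARES = [(v * v) >> 2 for v in range(1 << (N + 1))]


def low_bits_match(x, y):
    """True when (x+y)**2 and (x-y)**2 agree in their two least significant bits."""
    return ((x + y) ** 2) & 3 == ((x - y) ** 2) & 3


def multiply_trimmed(x, y):
    """Multiply using trimmed squares; no final right shift is needed."""
    return TRIMMED_SQUARES[x + y] - TRIMMED_SQUARES[abs(x - y)]
```

Because the dropped bits are equal in both squares, subtracting the trimmed entries already yields the difference divided by 4.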

9 (X+Y)/2 = (X+Y)/2 + /2 and (X-Y)/2 = (X-Y)/2 + /2
More Hardware Opts For a factor-of-2 table size reduction Let  denote the lsb of X+Y and X-Y then (X+Y)/2 = (X+Y)/2 + /2 and (X-Y)/2 = (X-Y)/2 + /2 Then ¼[(X+Y)2–(X-Y)2] = ((X+Y)/2 + /2)2 – ((X-Y)/2 + /2)2 = (X+Y)/22 - (X-Y)/22 + Y Preprocessing computes X+Y and X-Y dropping the lsb Two tables to lookup (X+Y)2 and (X-Y)2 Postprocessing does three operand addition with the third operand being 0 or Y
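A hedged Python sketch of the halved-address scheme follows. It assumes unsigned 8-bit operands and relies on Python's `>>` rounding toward negative infinity, which matches the floor in the derivation even when X-Y is negative; the names `HALF_SQUARES` and `multiply_halved` are hypothetical.

```python
N = 8  # assumed operand width in bits

# Table of squares indexed by a halved sum or halved difference: only 2**N entries.
HALF_SQUARES = [v * v for v in range(1 << N)]


def multiply_halved(x, y):
    """Multiply via floor((x+y)/2)**2 - floor((x-y)/2)**2 + eps*y."""
    s = x + y
    eps = s & 1                   # shared lsb of x+y and x-y
    a = s >> 1                    # floor((x+y)/2)
    b = (x - y) >> 1              # floor((x-y)/2); >> floors for negative ints too
    third = y if eps else 0       # third operand of the final addition: 0 or y
    return HALF_SQUARES[a] - HALF_SQUARES[abs(b)] + third
```

For example, with x = 1 and y = 2 the lookups give 1 and 1, and the third operand contributes 2.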

10 Opt. Indirect Multiplication
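Figure: X (n bits) and Y (n bits) feed an adder and a subtractor; each n-bit result (lsb dropped) addresses a ROM with $2^n$ addresses and 2n-1 content bits, one holding $(X+Y)^2$ and the other $(X-Y)^2$; a CSA and an adder combine the ROM outputs to give X*Y (2n bits).
Note: Takes two 2**16 ROM tables by 31 bits (= 2**17 by 31 bits of storage).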

11 Another Lookup Example
To add the sign-and-logarithm representations of X = (SX, LX) and Y = (SY, LY) (for $X > Y > 0$) to get Z = (SZ, LZ), where $LZ = \log Z = \log(X \pm Y)$, the computation of LZ can be done as
$LZ = \log(X \pm Y) = \log[X(1 \pm Y/X)] = \log X + \log(1 \pm Y/X) = LX + \log(1 \pm \log^{-1}\Delta)$, where $\Delta = LY - LX$
A binary-input ROM is too expensive: for 12-bit operands it would be a 2**24 table of 12-bit results!
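The addition case of this formula can be sketched in Python as follows. Everything specific here is an assumption made for illustration: base-2 logarithms, a table indexed by the quantized value of LX - LY with 8 fractional bits, a cutoff at a difference of 16, and the names `FRAC_BITS`, `MAX_DIFF`, `PHI_PLUS`, and `log_add`. Subtraction would need a second table for $\log(1 - \log^{-1}\Delta)$, which is not shown.

```python
import math

FRAC_BITS = 8   # assumed fractional precision of the table address
MAX_DIFF = 16   # assumed largest LX - LY covered by the table

# Unary table: entry i holds log2(1 + 2**(-i / 2**FRAC_BITS)), i.e. Delta = -i / 2**FRAC_BITS.
PHI_PLUS = [math.log2(1.0 + 2.0 ** (-i / (1 << FRAC_BITS)))
            for i in range((MAX_DIFF << FRAC_BITS) + 1)]


def log_add(lx, ly):
    """Return an approximation of log2(X + Y) given lx = log2(X) >= ly = log2(Y)."""
    index = round((lx - ly) * (1 << FRAC_BITS))   # subtractor output, quantized
    if index >= len(PHI_PLUS):
        return lx                                  # Y is negligible compared with X
    return lx + PHI_PLUS[index]                    # one lookup and one addition
```

The table is addressed by a single quantity, so its size depends on the width of the difference rather than on the widths of both operands.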

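12 Indirect Log Addition
Figure: LX (u bits) and LY (u bits) feed a subtractor that forms $\Delta$ (u+1 bits); $\Delta$ addresses a ROM with $2^{u+1}$ addresses and u content bits holding $\log(1 \pm \log^{-1}\Delta)$; an adder combines the ROM output with LX to give LZ (u bits).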

13 SIMD Arrays
Bit-serial processing on SIMD arrays (e.g., CM2)
- accommodate as many processors as possible
- simple, so many will fit on one chip
- with limited I/O, to fit pin limitations
- bit-serial processors
- all computing the same operation on their local data (SIMD)

14 Bit-Serial ALU Design
Flexibility is provided by hardware that supports any function with three single-bit inputs and two single-bit outputs; each output can be any of the $2^{2^3} = 256$ three-input logic functions.
How can the 256 functions be encoded within an 8-bit opcode? Do it with multiplexors, and use the truth table of each function as the opcode!

15 Multiplexor Arithmetic
Arithmetic with multiplexors! $a + b + c_{in} = 2 \cdot carry_{out} + sum$
Note: Show a binary adder with the left MUX computing the sum and the right MUX computing the carry. The MUX control inputs are the three 1-bit operands (a, b, cin), while the MUX data inputs are the “opcodes” that drive addition.
Figure: two MUXes, each controlled by a, b, cin, producing carryout and sum.
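A small Python sketch of the truth-table-as-opcode idea, applied to addition. The bit ordering is an assumption: opcode bit i is taken to be the function value when (a, b, c) read as a binary number equals i. The names `mux8`, `SUM_OPCODE`, `CARRY_OPCODE`, and `bit_serial_add` are hypothetical.

```python
def mux8(opcode, a, b, c):
    """8-to-1 multiplexor: the three operand bits select one bit of the 8-bit opcode."""
    select = (a << 2) | (b << 1) | c
    return (opcode >> select) & 1


# Truth tables used as opcodes (assumed bit order: bit i <-> inputs a,b,c = binary i).
SUM_OPCODE = 0b10010110     # a XOR b XOR c: set at selects 1, 2, 4, 7
CARRY_OPCODE = 0b11101000   # majority(a, b, c): set at selects 3, 5, 6, 7


def bit_serial_add(x_bits, y_bits):
    """Add two equal-length bit lists (least significant bit first), one bit per step."""
    carry = 0
    result = []
    for a, b in zip(x_bits, y_bits):
        result.append(mux8(SUM_OPCODE, a, b, carry))   # sum uses the incoming carry
        carry = mux8(CARRY_OPCODE, a, b, carry)        # then update the carry
    return result, carry
```

Changing the two opcodes changes the operation without changing the datapath, which is the flexibility the slide describes.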

16 Bit-Serial Mux ALU
Figure: inputs a and b come from local memory; the f opcode and the g opcode (both from central control) drive two MUXes that produce f(a,b,c) and g(a,b,c); output goes to local memory.

17 FPGAs
Field programmable gate arrays (FPGAs)
- as many logic components (LUTs) as possible on a chip
- flexible/programmable logic structure
- with limited I/O, to fit interconnect limitations (connection vias)
- once again can use muxes, with a pre-programmed local control memory

18 Key References
Ling, An approach to implementing multiplication with small tables, IEEE Trans. on Computers, 39(5), 1990.
Noetzel, An interpolating memory unit for function evaluation, IEEE Trans. on Computers, 38(3), 1989.
Parhami, Computer Arithmetic, Oxford Univ. Press, 1999.
Tang, Table lookup algorithms for elementary functions and their error analysis, Proc. Symp. Computer Arithmetic, 1991.

