Presentation is loading. Please wait.

Presentation is loading. Please wait.

Lecture 18: Datapath Functional Units

Similar presentations


Presentation on theme: "Lecture 18: Datapath Functional Units"— Presentation transcript:

1 Lecture 18: Datapath Functional Units

2 Outline Comparators Shifters Multi-input Adders Multipliers
18: Datapath Functional Units

3 Comparators 0’s detector: A = 00…000 1’s detector: A = 11…111
Equality comparator: A = B Magnitude comparator: A < B 18: Datapath Functional Units

4 1’s & 0’s Detectors 1’s detector: N-input AND gate
0’s detector: NOTs + 1’s detector (N-input NOR) Asymmetric design that favors the late-arriving inputs. Here, the delay from the latest bit A7 is a single gate 18: Datapath Functional Units

5 1’s & 0’s Detectors Fast pseudo-nMOS or dynamic NOR detector.
Works well for words up to about 16 bits For larger words, the gates can be split into 8–16-bit chunks to reduce the parasitic delay 18: Datapath Functional Units

6 Equality Comparator Check if each bit is equal (XNOR, aka equality gate) 1’s detect on bitwise equality 18: Datapath Functional Units

7 Magnitude Comparator Compute B – A and look at sign B – A = B + ~A + 1
For unsigned numbers, carry out is sign bit If there is a carry-out: A<=B. Otherwise: A>B. zero detector indicates that the numbers are equal. 18: Datapath Functional Units

8 Signed vs. Unsigned For signed numbers, comparison is harder
C: carry out Z: zero (all bits of B – A are 0) N: negative (MSB of result) V: overflow (inputs had different signs, output sign  B) S: N xor V (sign of result) 18: Datapath Functional Units

9 Counters Features of counters:
Resettable: counter value is reset to 0 when RESET is asserted Loadable: counter value is loaded with N-bit value when LOAD is asserted Enabled: counter counts only on clock cycles when EN is asserted Reversible: counter increments or decrements based on UP/DOWN input Terminal Count: TC output asserted when counter overflows 18: Datapath Functional Units

10 Binary Counter Asynchronous ripple-carry counter The falling transition of each register clocks the subsequent register. Asynchronous circuits introduce a whole assortment of problems. Delay is long. 18: Datapath Functional Units

11 Binary Counter Synchronous up/down counter Resettable
Cycle time is limited by the ripple-carry delay with reset, load, and enable 18: Datapath Functional Units

12 Fast Binary Counter The speed of the previous counter is limited by the adder It can be overcome by dividing the counter into two or more segments. A 32-bit counter could be constructed from: a 4-bit prescalar counter 28-bit counter 18: Datapath Functional Units

13 Shifters Logical Shift: Shifts number left or right and fills with 0’s
1011 LSR 1 = LSL1 = 0110 Arithmetic Shift: Shifts number left or right. Rt shift sign extends 1011 ASR1 = ASL1 = 0110 Rotate: Shifts number left or right and fills with lost bits 1011 ROR1 = ROL1 = 0111 18: Datapath Functional Units

14 Funnel Shifter A funnel shifter can do all six types of shifts
creates a 2N – 1-bit input word Z from A and/or the kill values. Selects N-bit field Y from 2N–1-bit input Shift by k bits (0  k < N) Logically involves N N:1 multiplexers 18: Datapath Functional Units

15 Funnel Source Generator
Rotate Right Logical Right Arithmetic Right Rotate Left Logical/Arithmetic Left 18: Datapath Functional Units

16 Array Funnel Shifter N N-input multiplexers
Use 1-of-N hot select signals for shift amount nMOS pass transistor design (Vt drops!) N=4 18: Datapath Functional Units

17 Array Funnel Shifter Source generator consists of a 2N – 1-bit 5:1 multiplexer controlled by the shift type and direction. Z2N-2:N 18: Datapath Functional Units

18 Logarithmic Funnel Shifter
Log N stages of 2-input muxes No select decoding needed 18: Datapath Functional Units

19 32-bit Logarithmic Funnel
The first stage performs a coarse shift right by 0, 8, 16, or 24 bits. The second stage performs a fine shift right by 0–7 bits. The mux decode block conditionally inverts k for left shifts. 18: Datapath Functional Units

20 Barrel Shifter Barrel shifters perform right rotations using wrap-around wires. Left rotations are right rotations by N – k = k + 1 bits. Shifts are rotations with the end bits masked off. Barrel shifter forms: Array Logarithmic: better for large shifts. 18: Datapath Functional Units

21 Logarithmic Barrel Shifter
1 1 Right rotate only Right/Left Shift & Rotate Right/Left shift 18: Datapath Functional Units

22 32-bit Logarithmic Barrel
Datapath never wider than 32 bits First stage preshifts by 1 to handle left shifts 18: Datapath Functional Units

23 Multi-input Adders Suppose we want to add k N-bit words
Ex: = 10111 Straightforward solution: k-1 N-input CPAs Large and slow 18: Datapath Functional Units

24 Carry Save Addition A full adder sums 3 inputs and produces 2 outputs
Carry output has twice weight of sum output (X + Y + Z = S + 2C) N full adders in parallel are called carry save adder Produce N sums and N carry outs 18: Datapath Functional Units

25 CSA Application Use k-2 stages of CSAs
Keep result in carry-save redundant form Final CPA computes actual result 18: Datapath Functional Units

26 Multiplication Multiplication is less common than addition.
Multiplier is essential for Microprocessors Digital signal processors Graphics engines 18: Datapath Functional Units

27 Multiplication Example: M x N-bit multiplication
Produce N M-bit partial products (PPs) Sum these to produce M+N-bit product 18: Datapath Functional Units

28 General Form Multiplicand: Y = (yM-1, yM-2, …, y1, y0)
Multiplier: X = (xN-1, xN-2, …, x1, x0) Product: 18: Datapath Functional Units

29 Dot Diagram Large multiplications are illustrated using dot diagrams.
Each dot represents a bit 18: Datapath Functional Units

30 Multiplier design Depends on: Latency Throughput Energy Area
Design complexity Simple approach: use an M + 1-bit carry-propagate adder (CPA) to add the first two partial products then another CPA to add the third partial product to the running sum, and so forth Requires N – 1 CPAs and is slow Better approach: using arrays 18: Datapath Functional Units

31 Array Multiplier Fast multipliers use carry-save adders (CSAs)
4 × 4 array multiplier for unsigned numbers using an array of CSAs. Each cell contains: 2-input AND gate Full adder (CSA) First row: converts first PPs into CS form Other rows: use CSA to add PP to CS of previous row and generate a CS result N LSB outputs: are directly sum of CSAs. MSB output bits arrive in CS form require M-bit CPA to convert into binary form. (CPA: Carry-Propagate Adder) (CSA: Carry Save Adder) 18: Datapath Functional Units

32 Rectangular Array Squash array to fit rectangular floorplan (the parallelogram shape wastes space. 18: Datapath Functional Units

33 Improvement The first row of CSAs adds the first partial product to a pair of 0s (Sin=0 and Cin=0). Leads to a regular structure but is inefficient. The first row of CSAs can be used to add the first three partial products together. This reduces the number of rows by two and reduces the adder propagation delay Another improvement: replace the bottom row with a faster CPA such as a lookahead or tree adder The critical path of an array multiplier involves N–2 CSAs and a CPA. 18: Datapath Functional Units

34 Two’s Complement Array Multiplication
MSB of a two’s complement number has a negative weight. Two of the PPs have negative weight and should be subtracted. Subtraction is handled by taking the two’s complement of the terms to be subtracted, i.e., inverting the bits and adding one. 18: Datapath Functional Units

35 Baugh-Wooley multiplier algorithm
The upper parallelogram: Multiplication of all but the MSBs of the inputs Next row: product of the MSB (X5Y5) Next two pairs of rows: the inversions of the terms to be subtracted. Each term has implicit leading and trailing zeros, which are inverted to leading and trailing ones. Extra ones must be added in the least significant column when taking the two’s complement. 18: Datapath Functional Units

36 Modified Baugh-Wooley multiplier
Reduces the number of partial products: precomputing the sums of the constant ones pushing some of the terms upward into extra columns 18: Datapath Functional Units

37 squashed into a rectangle
Almost identical to the unsigned multiplier AND gates are replaced by NAND gates in the hatched cells. 1s are added in place of 0s at two of the unused inputs. Signed and unsigned arrays are so similar A single array can be used for both XOR gates are used to conditionally invert some of the terms depending on the mode. 18: Datapath Functional Units

38 Fewer Partial Products
Array multiplier requires N partial products If we looked at groups of r bits, we could form N/r partial products. Faster and smaller? Called radix-2r encoding Ex: r = 2: look at pairs of bits Form partial products of 00  0 01  Y 10  2Y simple shift 11  3Y hard multiple, requiring a slow carry-propagate addition of Y+2Y First three are easy, but 3Y requires adder  18: Datapath Functional Units

39 Booth Encoding Observe that: 3Y = 4Y – Y 2Y = 4Y – 2Y
4Y in a radix-4 multiplier array is equivalent to Y in the next row 18: Datapath Functional Units

40 Booth Encoding Instead of 3Y, try –Y, then increment next partial product to add 4Y Similarly, for 2Y, try –2Y + 4Y in next partial product 18: Datapath Functional Units

41 Example 18: Datapath Functional Units

42 Booth Hardware Booth encoder generates control lines for each PP
Booth selectors choose PP bits 18: Datapath Functional Units

43 Sign Extension Partial products can be negative
Require sign extension, which is cumbersome High fanout on most significant bit 18: Datapath Functional Units

44 Simplified Sign Ext. Sign bits are either all 0’s or all 1’s
Note that all 0’s is all 1’s + 1 in proper column Use this to reduce loading on MSB 18: Datapath Functional Units

45 Even Simpler Sign Ext. No need to add all the 1’s in hardware
Precompute the answer! 18: Datapath Functional Units

46 Advanced Multiplication
Signed vs. unsigned inputs Higher radix Booth encoding Array vs. tree CSA networks 18: Datapath Functional Units


Download ppt "Lecture 18: Datapath Functional Units"

Similar presentations


Ads by Google