1 Arithmetic Circuits for Number Crunching: Adders, Subtractors,
Multipliers and Dividers Dr. Tassadaq Hussain Instructor: Dr. Rehan Ahmed

2 Roadmap
Arithmetic Circuits (Addition, Subtraction, Multiplication, Division), Approximate Computing, Fractional Numbers (Fixed Point, Floating Point).
[Image: Allegory of Arithmetic, from "Margarita Philosophica," 1504, by Gregor Reisch]

3 ADDITION and SUBTRACTION

4 Design Tradeoffs There are many ways to build adders (or any function). Which is the right implementation? It depends on your system’s requirements. Optimization metrics: speed (time), power, and area.

5 Single Cycle vs. Multi Cycle Arithmetic
Arithmetic units can be written as combinational blocks or multi-cycle units. Combinational blocks: outputs depend only on inputs, and results are available in one clock cycle. Multi-cycle units: the result is computed over multiple cycles; they can be much smaller (require fewer logic resources) and can be pipelined for a higher clock frequency.

6 Common Adder Architectures
Carry Ripple/Propagate Adder, Carry Select Adder, Carry Look-Ahead Adder, Carry Save Adder

7 Carry Ripple/Propagate Adder

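8 Review: 1-Bit (Half) Adder
[Figure: half-adder truth table and schematic with inputs A, B and outputs S (sum), C (carry).] What if we want to add more than one bit?

9 Review: Full Adder
[Figure: full-adder truth table and schematic with inputs A, B, Cin and outputs S, Cout.]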

10 Aside: How do we perform Subtraction?

11 Aside: Subtraction
Can perform subtraction with addition: $A - B = A + (-B)$. Recall two's complement numbers: $-B = \bar{B} + 1$ (invert all bits of B and add 1). Therefore $A - B = A + \bar{B} + 1$.
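A minimal sketch of the identity above, assuming unsigned operands held in a fixed width. The function name `subtract_via_add` and the default width `n_bits = 4` are illustrative choices, not from the slides.

```python
def subtract_via_add(a: int, b: int, n_bits: int = 4) -> int:
    """Return (a - b) modulo 2**n_bits using only inversion and addition."""
    mask = (1 << n_bits) - 1
    not_b = ~b & mask               # invert all bits of B
    return (a + not_b + 1) & mask   # A + ~B + 1, carry-out discarded


print(subtract_via_add(9, 5))   # 4
print(subtract_via_add(5, 9))   # 12, which is -4 in 4-bit two's complement
```

The second call shows that a negative result appears as its two's complement bit pattern.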

12 Back to Carry Propagate/Ripple Adder…

13 Carry Propagate Adder (CPA)
Connect Full Adders to make a Carry Propagate Adder (Ripple Adder). [Figure: four full adders with inputs A3..A0, B3..B0 and Cin, producing S3..S0 and carry-out C4.] The right-most stage is the least-significant bit (LSB). The carry-out of the previous stage feeds into the carry-in of the next stage. Can extend to any number of bits – well, we’ll see that …

14 How fast is this Ripple Carry Adder?

15 Delay of a Full Adder What is the critical path delay of a Full Adder?
[Figure: full adder with inputs A, B, Cin and outputs S, Cout.] Assume all gates have the same gate propagation delay $t_{PD}$. If inputs A, B, Cin arrive at time 0: S is ready after $t_{PD}$ (A, B, Cin → XOR gate → S). Cout is ready after $3\,t_{PD}$ (A, B → OR → AND → OR → Cout). Critical path delay is $3\,t_{PD}$.

16 Delay of Carry Propagate Adder
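Consider a 2-bit Carry Propagate Adder. [Figure: two chained full adders; bit 0 takes A0, B0, Cin and produces S0, Cout; bit 1 takes A1, B1 and that Cout and produces S1, Cout.]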

17 Delay of Carry Propagate Adder
Consider a 2-bit Carry Propagate Adder. [Figure: the same two chained full adders with the carry path highlighted.] Inputs A0, A1, B0, B1 arrive at time 0. Cout of the first bit is ready after $3\,t_{PD}$. The next Cout is ready after another $2\,t_{PD}$. Critical path delay is $5\,t_{PD}$.

18 Delay of Carry Propagate Adder
Delay for an N-bit Carry Propagate Adder is $2(N-1)\,t_{PD} + 3\,t_{PD}$. [Figure: N chained full adders from bit 0 to bit N-1; the first stage contributes 3 gate delays and each later stage 2 gate delays.] Delay is proportional to N: SLOW for large N. When 32- or 64-bit numbers are used, this delay may become unacceptably high.
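The ripple structure and its delay formula can be sketched in software. This is a bit-level model under the slide's assumption that every gate has the same delay; the function names are illustrative and the carry expression follows the OR → AND → OR path described for the full adder.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a | b))   # OR -> AND -> OR path
    return s, cout


def ripple_carry_add(a: int, b: int, n_bits: int, cin: int = 0) -> tuple[int, int]:
    """Chain n_bits full adders; each carry-out feeds the next carry-in."""
    carry = cin
    result = 0
    for i in range(n_bits):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry


def ripple_delay(n_bits: int) -> int:
    """Critical path in units of t_PD: 3 for the first stage, 2 per later stage."""
    return 2 * (n_bits - 1) + 3


print(ripple_carry_add(0b1101, 0b1011, 4))  # (8, 1): 13 + 11 = 24 = 1_1000
print(ripple_delay(2), ripple_delay(32))    # 5 65
```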

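19 Aside: Combined Adder/Subtractor
$A - B = A + \bar{B} + 1$. [Figure: each B input (B3..B0) passes through a 2:1 multiplexer controlled by MODE that selects B or its inverse before the adder; MODE also drives the adder's carry-in. Outputs are S3..S0 and Cout (C4).] Subtraction (Mode = 1): B inputs are inverted and carry-in is 1. Addition (Mode = 0): B inputs are NOT inverted and carry-in is 0. Exercise: How does this affect delay?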

20 Can we make faster Adders i.e.
Do something about the Carry Generation?

21 Carry Select Adder (CSA)

22 Carry-Select Adder Carry Propagate Adders are slow because the high-order bits need to wait for the carry-in from lower-order bits. Calculate high-order bits for BOTH cases of carry-in. Then select the correct case when carry-in is ready Trade-off area (use more gates) for faster performance

23 Carry Select Adder: 8-Bit Example
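[Figure: 8-bit carry-select adder built from 4-bit Carry Propagate Adders. One adder computes S3..0 from A3..0, B3..0 and Cin. Two adders compute the sum of A7..4 and B7..4, one with carry-in 0 and one with carry-in 1; a multiplexer driven by the low adder's carry-out selects S7..4 and Cout.] Bits 7..4 are computed in parallel with bits 3..0.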
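A behavioural sketch of the 8-bit carry-select idea, assuming 8-bit unsigned inputs. The helper `add_block` stands in for a 4-bit Carry Propagate Adder and uses Python's `+` rather than a gate-level model; both names are illustrative.

```python
def add_block(a: int, b: int, cin: int, n_bits: int = 4) -> tuple[int, int]:
    """Stand-in for an n-bit carry propagate adder: returns (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << n_bits) - 1), total >> n_bits


def carry_select_add8(a: int, b: int, cin: int = 0) -> tuple[int, int]:
    """8-bit carry-select adder: the upper half is computed for both carry-in cases."""
    lo_sum, lo_cout = add_block(a & 0xF, b & 0xF, cin)
    hi_if_0 = add_block(a >> 4, b >> 4, 0)   # upper bits assuming carry-in = 0
    hi_if_1 = add_block(a >> 4, b >> 4, 1)   # upper bits assuming carry-in = 1
    hi_sum, cout = hi_if_1 if lo_cout else hi_if_0   # the multiplexer
    return (hi_sum << 4) | lo_sum, cout


print(carry_select_add8(200, 100))  # (44, 1): 300 = 256 + 44
```

In hardware the three 4-bit adders run at the same time, so the extra cost after the low half finishes is only the multiplexer.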

24 Carry Look-Ahead Adder (CLA)

25 Approach
Pre-compute parts of the carry logic. For each bit of the addition, independently calculate two terms: the Generate term $G_i = f(A_i, B_i)$ and the Propagate term $P_i = f(A_i, B_i)$. $G_i$ and $P_i$ are independent of the carry terms $C_i$. Then, by using the various Generate and Propagate terms, we can compute the carry terms at each bit without the ripple effect.

26 Generate Term Does adding the i-th bits of A, B generate a carry?
Example: A = 011, B = 010. Look at each bit position INDEPENDENTLY. Does adding bit 0 generate a carry? $A_0 + B_0 = 1 + 0 = 01$: NO (carry NOT generated), so $G_0 = 0$. Does adding bit 1 generate a carry? $A_1 + B_1 = 1 + 1 = 10$: YES (carry generated), so $G_1 = 1$. Does adding bit 2 generate a carry? $A_2 + B_2 = 0 + 0 = 00$: NO, so $G_2 = 0$.

27 Generate Term
$A_i$, $B_i$ would generate a carry if both are 1: $G_i = A_i \text{ and } B_i$. [Truth table: $G_i = 1$ only when $A_i = 1$ and $B_i = 1$.]

28 Propagate Term If there was a carry-in at the i-th bit, would it propagate to the next stage? Example: A = 011, B = 010. Look at each bit position INDEPENDENTLY, with a hypothetical carry-in of 1. Would bit 0 propagate a carry? $1 + 0 + 1 = 10$: YES (carry propagated), so $P_0 = 1$. Would bit 1 propagate a carry? $1 + 1 + 1 = 11$: YES (carry propagated), so $P_1 = 1$. Would bit 2 propagate a carry? $0 + 0 + 1 = 01$: NO, so $P_2 = 0$.

29 Propagate Term
$A_i$, $B_i$ would propagate a carry if either is 1: $P_i = A_i \text{ or } B_i$. [Truth table: $P_i = 0$ only when $A_i = 0$ and $B_i = 0$.]

30 Will the i+1 Position Produce a Carry Out?
Carry into position i+1 = $G_i$ or ($P_i$ and $C_{in,i}$). The first term means a carry was GENERATED; the second term means a carry was PROPAGATED.

31 4-Bit Example
Let $C_i$ denote the carry-in of stage i; this means that it is also the carry-out of the previous stage. [Figure: four stages, Bit 0 to Bit 3, with inputs A3..A0, B3..B0, C0, outputs S0..S3 and carries C1..C4.]
$C_1 = G_0 + P_0 \cdot C_0$
$C_2 = G_1 + P_1 \cdot C_1$
$C_3 = G_2 + P_2 \cdot C_2$
$C_4 = G_3 + P_3 \cdot C_3$
Note: We are using logical operators here: + means OR, · means AND.

32 Carry Look-Ahead Logic
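$C_1 = G_0 + P_0 \cdot C_0$
$C_2 = G_1 + P_1 \cdot C_1$
$C_3 = G_2 + P_2 \cdot C_2$
$C_4 = G_3 + P_3 \cdot C_3$
Perform forward substitution:
$C_1 = G_0 + P_0 \cdot C_0$
$C_2 = G_1 + P_1 \cdot G_0 + P_1 \cdot P_0 \cdot C_0$
$C_3 = G_2 + P_2 \cdot G_1 + P_2 \cdot P_1 \cdot G_0 + P_2 \cdot P_1 \cdot P_0 \cdot C_0$
$C_4 = G_3 + P_3 \cdot G_2 + P_3 \cdot P_2 \cdot G_1 + P_3 \cdot P_2 \cdot P_1 \cdot G_0 + P_3 \cdot P_2 \cdot P_1 \cdot P_0 \cdot C_0$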
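The forward-substituted carry equations can be checked with a short bit-level sketch. It assumes the slide definitions (generate is AND, propagate is OR); the function name `cla_add4` is illustrative.

```python
def cla_add4(a: int, b: int, c0: int = 0) -> tuple[int, int]:
    """4-bit carry look-ahead adder: returns (4-bit sum, C4)."""
    abit = [(a >> i) & 1 for i in range(4)]
    bbit = [(b >> i) & 1 for i in range(4)]
    g = [abit[i] & bbit[i] for i in range(4)]   # generate terms
    p = [abit[i] | bbit[i] for i in range(4)]   # propagate terms

    # Every carry depends only on G, P and C0: no ripple.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    total = 0
    for i in range(4):
        total |= (abit[i] ^ bbit[i] ^ carries[i]) << i   # sum-only full adder
    return total, c4


print(cla_add4(0b0011, 0b0010))  # (5, 0)
```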

33 Carry Look-Ahead Delay
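[Figure: AND-OR networks computing $C_1$ from $G_0$, $P_0$, $C_0$ and $C_2$ from $G_1$, $P_1$, $G_0$, $P_0$, $C_0$.]
$C_1 = G_0 + P_0 \cdot C_0$
$C_2 = G_1 + P_1 \cdot G_0 + P_1 \cdot P_0 \cdot C_0$
$C_3 = G_2 + P_2 \cdot G_1 + P_2 \cdot P_1 \cdot G_0 + P_2 \cdot P_1 \cdot P_0 \cdot C_0$
$C_4 = G_3 + P_3 \cdot G_2 + P_3 \cdot P_2 \cdot G_1 + P_3 \cdot P_2 \cdot P_1 \cdot G_0 + P_3 \cdot P_2 \cdot P_1 \cdot P_0 \cdot C_0$
If $C_0$ and all $G_i$ and $P_i$ terms are available at the same time, ALL $C_i$ terms will be ready after 2 gate delays. No ripple effect!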

34–38 CLA - Overall Critical Path Delay
[Figure: 4-bit CLA. Inputs A3..A0, B3..B0 and C0 feed four Generate/Propagate blocks (G3..G0, P3..P0), a Carry Look-Ahead Unit (C1..C4) and four sum-only full adders (S3..S0).]
Inputs arrive at time 0. All G, P terms are available after a single gate delay ($t_{PD}$). The carries from the Carry Look-Ahead Unit are ready after another $2\,t_{PD}$ (the AND-OR networks of the Carry Look-Ahead Delay slide). The sum bits are ready after another $t_{PD}$. ALL outputs are ready after $4\,t_{PD}$.

39 CLA - Scalability
In theory, we could build Carry Look-Ahead Adders of any size N. However, the equations get more complex very quickly, and we need wider and wider gates (slow) in the carry logic.
$C_1 = G_0 + P_0 \cdot C_0$
$C_2 = G_1 + P_1 \cdot G_0 + P_1 \cdot P_0 \cdot C_0$
$C_3 = G_2 + P_2 \cdot G_1 + P_2 \cdot P_1 \cdot G_0 + P_2 \cdot P_1 \cdot P_0 \cdot C_0$
$C_4 = G_3 + P_3 \cdot G_2 + P_3 \cdot P_2 \cdot G_1 + P_3 \cdot P_2 \cdot P_1 \cdot G_0 + P_3 \cdot P_2 \cdot P_1 \cdot P_0 \cdot C_0$
Typically do not extend beyond 4 bits.

40 Hierarchical Carry Look-Ahead Adder

41 Building Larger Carry Look-Ahead Adder
Hierarchically build bigger CLAs. Create a 4-bit CLA like we discussed. The 4-bit CLA now also computes a group generate term GG and a group propagate term PG:
$PG = P_3 \cdot P_2 \cdot P_1 \cdot P_0$
$GG = G_3 + P_3 \cdot G_2 + P_3 \cdot P_2 \cdot G_1 + P_3 \cdot P_2 \cdot P_1 \cdot G_0$
Combine 4 x 4-bit CLAs to build a 16-bit CLA. The 16-bit CLA can also compute its own GG and PG terms. Can then take 4 x 16-bit CLAs to build a 64-bit CLA, and so on…

42 4-Bit CLA Alternative Schematic
[Figure: 4-bit CLA drawn as four bit slices, each with Generate, Propagate and Sum Logic producing S3..S0, connected to a Carry Look-Ahead Unit that takes G3..G0, P3..P0 and C0 and produces C1..C4.] Same circuit as the 4-bit CLA in the Overall Critical Path Delay slides, but drawn differently. Use this as the building block for the 16-bit CLA.

43 16-bit Carry Look-Ahead Adder
[Figure: four 4-bit CLA adders (bits 3..0, 7..4, 11..8, 15..12), each producing 4 sum bits plus GG and PG, connected to a Carry Look-Ahead Unit with inputs G0 P0, G4 P4, G8 P8, G12 P12 and C0 and outputs C4, C8, C12, C16.]

44 16-bit Carry Look-Ahead Adder
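Carry Look-Ahead logic is the same, now using the group terms of each 4-bit block:
$C_4 = G_0 + P_0 \cdot C_0$
$C_8 = G_4 + P_4 \cdot C_4$
$C_{12} = G_8 + P_8 \cdot C_8$
$C_{16} = G_{12} + P_{12} \cdot C_{12}$
Perform forward substitution:
$C_4 = G_0 + P_0 \cdot C_0$
$C_8 = G_4 + P_4 \cdot G_0 + P_4 \cdot P_0 \cdot C_0$
$C_{12} = G_8 + P_8 \cdot G_4 + P_8 \cdot P_4 \cdot G_0 + P_8 \cdot P_4 \cdot P_0 \cdot C_0$
$C_{16} = G_{12} + P_{12} \cdot G_8 + P_{12} \cdot P_8 \cdot G_4 + P_{12} \cdot P_8 \cdot P_4 \cdot G_0 + P_{12} \cdot P_8 \cdot P_4 \cdot P_0 \cdot C_0$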
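A sketch of the group-level carries for a 16-bit adder, assuming the GG and PG definitions of the 4-bit block. For brevity it evaluates the un-substituted recurrence block by block, which gives the same values as the forward-substituted equations but does not model their two-gate-delay structure. Function names are illustrative.

```python
def group_terms(a4: int, b4: int) -> tuple[int, int]:
    """Group generate (GG) and group propagate (PG) of one 4-bit block."""
    g = [((a4 >> i) & (b4 >> i)) & 1 for i in range(4)]
    p = [((a4 >> i) | (b4 >> i)) & 1 for i in range(4)]
    pg = p[3] & p[2] & p[1] & p[0]
    gg = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    return gg, pg


def cla16_group_carries(a: int, b: int, c0: int = 0) -> list[int]:
    """Return [C0, C4, C8, C12, C16] for 16-bit inputs a and b."""
    carries = [c0]
    for k in range(4):
        gg, pg = group_terms((a >> (4 * k)) & 0xF, (b >> (4 * k)) & 0xF)
        carries.append(gg | (pg & carries[-1]))   # C_next = GG + PG . C_prev
    return carries


print(cla16_group_carries(0x00FF, 0x0001))  # [0, 1, 1, 0, 0]
```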

45–51 16-bit CLA Critical Path Delay
[Figure: the 16-bit CLA built from four 4-bit CLA adders and a Carry Look-Ahead Unit, with one 4-bit CLA expanded to show its Generate/Propagate blocks, its own Carry Look-Ahead Unit and its sum-only full adders.]
Inputs arrive at time 0. GG of the 4-bit CLAs is ready after $3\,t_{PD}$. Carries from the 16-bit Carry Look-Ahead Unit are ready after another $2\,t_{PD}$ (at $5\,t_{PD}$). Carries in the 4-bit CLAUs are ready after another $2\,t_{PD}$ (at $7\,t_{PD}$). Sum bits are ready after another $t_{PD}$ (at $8\,t_{PD}$). Overall critical path delay is $8\,t_{PD}$.

52 64-bit Carry Look-Ahead Adder
[Figure: four 16-bit CLA adders (bits 15..0, 31..16, 47..32, 63..48), each producing 16 sum bits plus GG and PG, connected to a Carry Look-Ahead Unit with inputs G0 P0, G16 P16, G32 P32, G48 P48 and C0 and outputs C16, C32, C48, C64.] Exercise: What is the critical path delay of this 64-bit CLA?

53 CLA Critical Path Delay
Delay for an N-bit Carry Look-Ahead Adder is $4 \log_4 N \cdot t_{PD}$. [Figure: hierarchy of G/P blocks, 4-bit CLAUs, 16-bit Carry Look-Ahead Units and a 64-bit Carry Look-Ahead Unit. Group $G_i$, $P_i$ terms propagate down, group carry terms propagate up, and the carries then go into the full adders.] Delay increases logarithmically w.r.t. N, proportional to the number of hierarchical “levels”.

54 CPA vs CLA Critical Path Delay
[Plot: critical path delay (gate propagation delays, 0 to 140) versus adder size (0 to 60 bits). Carry Propagate Adder: $O(N)$. Carry Look-Ahead Adder: $O(\log N)$.]
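The two delay formulas from the slides can be tabulated directly. This sketch counts CLA hierarchy levels with integer arithmetic; rounding up for sizes that are not powers of 4 is an assumption of the sketch, not something the slides state.

```python
def cpa_delay(n_bits: int) -> int:
    """Ripple (carry propagate) adder delay in t_PD: 2(N-1) + 3."""
    return 2 * (n_bits - 1) + 3


def cla_delay(n_bits: int) -> int:
    """Hierarchical CLA delay in t_PD: 4 per level, with log4(N) levels."""
    levels, size = 0, 1
    while size < n_bits:   # each level groups four blocks of the level below
        size *= 4
        levels += 1
    return 4 * levels


for n in (4, 16, 64):
    print(n, cpa_delay(n), cla_delay(n))
# 4 9 4
# 16 33 8
# 64 129 12
```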

55 Carry Save Adder (CSA): Adding Multiple Operands

56 Motivation How do you add more than two numbers?
There are many applications where you need to add more than two numbers at a time Multiplication is a good example which we will see soon … Say we want to add M N-bit numbers…

57 One Approach …
Add the first two numbers. Add the next number to the previous sum. Repeat. This needs M-1 adders. Assuming CLA adders, the delay is $(M-1) \cdot t_{CLA} = O(M \cdot \log N)$.

58 Another Approach…
Use a full tree topology. [Figure: tree of CLA adders.] Add pairs of numbers in parallel. Add pairs of partial sums in parallel. Repeat for each level. This still needs M-1 adders. Assuming CLA adders, the delay is $\log_2 M \cdot t_{CLA} = O(\log M \cdot \log N)$.

59 Better Approach: Carry Save Adder
Defer the addition of carry bits until later. Basic idea: add 3 operands at a time (A, B, D) and produce 2 output numbers, the partial sum bits (P) and the carry bits (C); do this with full adders. Then add the partial sum and carry with a fast adder; do this with a Carry Look-Ahead Adder.
$A + B + D = P + 2 \cdot C$
Example: A = 9 (1001), B = 12 (1100), D = 13 (1101) give P = 8 (1000) and C = 13 (1101), so P + 2*C = 34 (100010).
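The slide's example can be reproduced with bitwise operations, since each bit position is an independent full adder. A minimal sketch, with an illustrative function name:

```python
def carry_save_add(a: int, b: int, d: int) -> tuple[int, int]:
    """Reduce three operands to (partial sum P, carry C) with A + B + D = P + 2*C."""
    p = a ^ b ^ d                      # sum output of each full adder
    c = (a & b) | (a & d) | (b & d)    # carry output of each full adder
    return p, c


p, c = carry_save_add(9, 12, 13)
print(p, c, p + 2 * c)  # 8 13 34
```

The final `p + 2 * c` stands in for the fast adder at the end of the chain.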

60 Add 3 Numbers – Produce 2 Outputs
Remember, full adders already add 3 operands and produce 2 outputs. Use full adders but don’t connect the carry chain. [Figure: N independent full adders; bit i takes $A_i$, $B_i$, $D_i$ and produces $C_i$ and $P_i$, for i = 0 to N-1.]

61 Critical Path Delay …
Critical path delay is just the full adder delay. [Figure: the same N independent full adders.] All carry bits are ready after $3\,t_{PD}$, independent of N.

62 Adding More than 3 Operands
Use the 3-input CSA as a building block. [Figure: N-bit CSA block with N-bit inputs A, B, D and N-bit outputs C, P.] The delay of a CSA is INDEPENDENT of N: $O(1)$.

63 Adding 4 Operands
[Figure: A, B, D feed an N-bit CSA; its outputs and E feed a second N-bit CSA; a CLA produces S.] Need a standard adder such as a CLA at the end to add the two remaining numbers into one.

64 Adding 6 Operands
[Figure: operands A, B, D, E, F, G reduced by N-bit CSAs, followed by a CLA that produces S.]

65 What about the 2C part?
$A + B + D = P + 2 \cdot C$. Multiplying by 2 can be done with bit-shifting. Implement through wiring. [Figure: a 4-bit CSA made of four full adders (inputs A3..A0, B3..B0, D3..D0), whose C outputs are wired one position to the left into a second CSA made of five full adders together with E, followed by a CLA producing S.]

66 What about the 2C part?
CSA adder width increases by 1 bit per level to prevent overflow. No problem! CSA delay does not scale with width! [Figure: the same circuit labelled as a 4-bit CSA, a 5-bit CSA and a 5-bit CLA.]
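Chaining carry-save stages for more than three operands can be sketched as follows. The shift `c << 1` models the wiring that implements the 2C term, and Python's unbounded integers stand in for the growing adder width. The built-in `sum` at the end plays the role of the final CLA; function names are illustrative and operands are assumed non-negative.

```python
def carry_save_add(a: int, b: int, d: int) -> tuple[int, int]:
    """One CSA level: A + B + D = P + 2*C."""
    return a ^ b ^ d, (a & b) | (a & d) | (b & d)


def add_operands(operands: list[int]) -> int:
    """Reduce operands three at a time with CSAs, then finish with one normal addition."""
    nums = list(operands)
    while len(nums) > 2:
        a, b, d = nums[:3]
        p, c = carry_save_add(a, b, d)
        nums = nums[3:] + [p, c << 1]   # 2*C is just C wired one bit to the left
    return sum(nums)                    # stand-in for the final CLA


print(add_operands([9, 12, 13, 7]))  # 41
```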

67 Multiplication

68 Multiplication
Multiplying 1-bit numbers is an AND operation: 0 x 0 = 0, 0 x 1 = 0, 1 x 0 = 0, 1 x 1 = 1. [Figure: AND gate with inputs A, B and output A x B.]

69 Multiplication
Multiplying 1-bit x N-bit is an AND operation. [Figure: bits A3..A0 each ANDed with B to give P3..P0, where A x B = P.] Example: 1011 x 0 = 0000 and 1011 x 1 = 1011.

70–76 N-Bit x N-Bit (Unsigned) Multiplication
Consider 4-bit x 4-bit multiplication of A3 A2 A1 A0 by B3 B2 B1 B0. Each partial-product row is A ANDed with one bit of B, shifted left by that bit's position:
Row 0: A3·B0 A2·B0 A1·B0 A0·B0
Row 1: A3·B1 A2·B1 A1·B1 A0·B1 (shifted left by 1)
Row 2: A3·B2 A2·B2 A1·B2 A0·B2 (shifted left by 2)
Row 3: A3·B3 A2·B3 A1·B3 A0·B3 (shifted left by 3)
Sum of the rows: P7 P6 P5 P4 P3 P2 P1 P0
Sounds familiar, right? This is multiplication by hand. How do we express this multiplication in hardware?

77 Hardware Implementation: N-Bit x N-Bit (Unsigned) Multiplication

80 Hardware Implementation: N-Bit x N-Bit (Unsigned) Multiplication
Do you see any problem in this circuit?

81 Using Fast Adders….
[Figure: the partial-product rows A3..A0·B0, A3..A0·B1, A3..A0·B2 and A3..A0·B3 summed row by row with fast adders (and half adders, HA) to produce P7..P0.]

82 Large Multipliers [aka Decomposed Multipliers]
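Construct large multipliers out of smaller multipliers: construct a 2N-bit x 2N-bit multiplier using N-bit x N-bit multipliers. Let A and B be 2N-bit numbers such that:
$A = A_{2N-1} A_{2N-2} \dots A_N \, A_{N-1} A_{N-2} \dots A_0$, with $A_H = A_{2N-1} \dots A_N$ and $A_L = A_{N-1} \dots A_0$
$B = B_{2N-1} B_{2N-2} \dots B_N \, B_{N-1} B_{N-2} \dots B_0$, with $B_H = B_{2N-1} \dots B_N$ and $B_L = B_{N-1} \dots B_0$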

83–86 2N-Bit x 2N-Bit Multiplier
$A_H$ is the upper N bits of A and $A_L$ is the lower N bits of A. $B_H$ is the upper N bits of B and $B_L$ is the lower N bits of B. Therefore,
$A = A_H \cdot 2^N + A_L$
$B = B_H \cdot 2^N + B_L$
And so,
$A \cdot B = (A_H \cdot 2^N + A_L)(B_H \cdot 2^N + B_L) = A_H B_H \cdot 2^{2N} + (A_H B_L + A_L B_H) \cdot 2^N + A_L B_L$
The four products $A_H B_H$, $A_H B_L$, $A_L B_H$, $A_L B_L$ use N-bit x N-bit multipliers. The multiplications by $2^{2N}$ and $2^N$ are bit shifting. The additions use 2N-bit adders.

87 2N-Bit x 2N-Bit Multiplier
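[Figure: operands AH AL and BH BL; the four partial products AL·BL, AH·BL, AL·BH and AH·BH are aligned by their shifts and summed into the 4N-bit result.]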
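The decomposition can be sketched in a few lines. Here `mul_n` is a hypothetical stand-in for an N-bit x N-bit hardware multiplier (it simply uses `*`), and the shifts model the multiplications by powers of two.

```python
def mul_n(x: int, y: int) -> int:
    """Stand-in for an N-bit x N-bit multiplier block."""
    return x * y


def mul_2n(a: int, b: int, n: int) -> int:
    """Multiply two 2N-bit numbers using four N-bit x N-bit multiplications."""
    mask = (1 << n) - 1
    ah, al = a >> n, a & mask   # upper and lower N bits of A
    bh, bl = b >> n, b & mask   # upper and lower N bits of B
    return ((mul_n(ah, bh) << (2 * n))
            + ((mul_n(ah, bl) + mul_n(al, bh)) << n)
            + mul_n(al, bl))


print(mul_2n(0xAB, 0xCD, 4))  # 35055, same as 0xAB * 0xCD
```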

88 Fmax For A Combinational Multiplier
In Verilog, P = A * B; (make sure P has twice the number of bits of A and B). The synthesis tool will create a combinational multiplier if DSP block inference is turned off. Measured on our Cyclone FPGA:
8 bits x 8 bits: Fmax = 464 MHz
16 bits x 16 bits: Fmax = 396 MHz
32 bits x 32 bits: Fmax = 104 MHz
64 bits x 64 bits: Fmax = 66 MHz

89 Serial (Multi-cycle) Multiplier

90 Multi Cycle Multiplier
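How do you multiply by hand? With A = 1101 and B = 1011, write one shifted partial product per bit of B (1101, 11010, 000000, 1101000) and add them to get the product P = 10001111.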

91 One Possible Algorithm
[Figure: the partial products 1101, 11010, 000000, … being accumulated into P.]
P = 0
while B != 0:
    if B(0) == 1:
        P = P + A
    A = A << 1
    B = B >> 1
Note: This is NOT Verilog
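The pseudo-code translates almost line for line into a runnable sketch. It assumes unsigned operands; the function name is illustrative, and `b & 1` plays the role of B(0).

```python
def shift_add_multiply(a: int, b: int) -> int:
    """Multi-cycle style multiplication: one conditional add and two shifts per step."""
    p = 0
    while b != 0:
        if b & 1:      # B(0) == 1
            p = p + a
        a = a << 1
        b = b >> 1
    return p


print(bin(shift_add_multiply(0b1101, 0b1011)))  # 0b10001111 (13 x 11 = 143)
```

Each loop iteration corresponds to one clock cycle of a multi-cycle multiplier, so the loop runs at most once per bit of B.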

92–98 Example
[Animation frames: step-by-step trace of the algorithm for A = 1101 (13) and B = 1011 (11), where 13 × 11 = 143, i.e. 1101 × 1011 = 10001111. Each frame shows A, B, the previous product P_old and the new product P for one iteration.]
P = 0
while B != 0:
    if B(0) == 1:
        P = P + A
    A = A << 1
    B = B >> 1

99 Continue until B is ZERO

100 Final Result
13 × 11 = 143, i.e. 1101 × 1011 = 10001111. B is now zero, so the loop ends with the product in P.

101 Top-Level Schematic
[Figure: Controller (State Machine) and Datapath. Inputs: s, A (n bits), B (n bits). Outputs: done, P (2n bits). Controller-to-datapath signals: loadA, shiftA, loadB, shiftB, loadP, selP. Datapath-to-controller signals: z, b0.] When s goes high, n-bit values are available on A and B. The machine then multiplies, and when it is finished, asserts done and puts the result on P. It then waits for s to go low.

102 Datapath
[Figure: a 2n-bit shift-left Register A (loadA, shiftA) and an n-bit shift-right Register B (loadB, shiftB), loaded from the n-bit data inputs; a 2n-bit adder computing P + A; a multiplexer controlled by selP (Psel) in front of the 2n-bit Register P (loadP); outputs z and b0 to the controller and the 2n-bit output P. All registers are clocked by clk.]
P = 0
while B != 0:
    if B(0) == 1:
        P = P + A
    A = A << 1
    B = B >> 1

103 State Machine
[State diagram. Transition labels: reset, s=0, s=1, z=0, z=1. State outputs: (1) loadP=1, selP=0; (2) shiftA=1, shiftB=1, selP=1, and loadP=1 if b0 else loadP=0; (3) done=1.]

104 Fmax For A Multi-Cycle Multiplier
Measured on our Cyclone FPGA. Combinational multiplier Fmax from the earlier slide:
8 bits x 8 bits: Fmax = 464 MHz
16 bits x 16 bits: Fmax = 396 MHz
32 bits x 32 bits: Fmax = 104 MHz
64 bits x 64 bits: Fmax = 66 MHz
Multi-cycle multiplier Fmax:
64 bits x 64 bits: Fmax = 400 MHz
Faster Fmax, but there is now LATENCY: it takes up to N clock cycles to compute the product.

105 Division

106 Fmax For A Combinational Divider
In Verilog you can infer a combinational divider: Q = A / B; Measured on our Cyclone FPGA:
8 bits / 8 bits: Fmax = 79 MHz
16 bits / 16 bits: Fmax = 25 MHz
32 bits / 32 bits: Fmax = 9 MHz
64 bits / 64 bits: Fmax = 3 MHz (uses 13% of the logic resources!)
Too slow and too large. Almost always use a multi-cycle divider in practice.

107 Review: Pencil and Paper Division
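[Figure: long division worked in decimal and in binary, with dividend A, divisor B, quotient Q and remainder R. Decimal: 140 ÷ 9 = 15 remainder 5 (intermediate values 50 and 45). Binary: 10001100 ÷ 1001 = 1111 remainder 101.]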

108 Restoring Division Algorithm
Dividend A (N bit), Divisor B (N-1 bit), Quotient Q (N bit), Remainder R (N-1 bit), with $A = Q \times B + R$.
S = A << 1
repeat N times:
    S[2N-1..N] = S[2N-1..N] - B
    if S < 0:
        S[2N-1..N] = S[2N-1..N] + B
        S = S << 1
        S[0] = 0
    else:
        S = S << 1
        S[0] = 1
Q = S[N-1..0]
R = S[2N-1..N+1]
S is an internal variable that is 2N bits wide. At the end, S[N] is not used. Note: This is NOT Verilog
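A runnable sketch of the restoring division pseudo-code, assuming an N-bit dividend and a non-zero divisor of at most N-1 bits as stated on the slide. The restore step is modelled by simply leaving S unchanged when the trial subtraction would go negative; names are illustrative.

```python
def restoring_divide(a: int, b: int, n: int = 8) -> tuple[int, int]:
    """Return (quotient, remainder) of a / b using the 2N-bit register S."""
    assert 0 <= a < (1 << n) and 0 < b < (1 << (n - 1))
    low_mask = (1 << n) - 1
    s = a << 1
    for _ in range(n):
        upper = (s >> n) - b            # trial subtraction on S[2N-1..N]
        if upper < 0:
            s = s << 1                  # restore (keep old upper half), shift in 0
        else:
            s = (((upper << n) | (s & low_mask)) << 1) | 1   # keep result, shift in 1
    q = s & low_mask                    # Q = S[N-1..0]
    r = s >> (n + 1)                    # R = S[2N-1..N+1]; bit N is not used
    return q, r


print(restoring_divide(140, 9))  # (15, 5)
```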

109 Flow Chart
Start.
1. Load the divisor into the 8-bit Divisor register and the dividend into the 16-bit Remainder register. Shift the Remainder register left by 1 bit.
2. Subtract the Divisor register from the left half of the Remainder register, and place the result in the left half of the Remainder register.
Test Remainder:
3a. If Remainder >= 0: shift the Remainder register one bit to the left, setting the new rightmost bit to 1.
3b. If Remainder < 0: restore the original value by adding the Divisor register to the left half of the Remainder register and placing the sum in the left half of the Remainder register. Also, shift the Remainder register one bit to the left, setting the new rightmost bit to 0.
8th repetition? If no, go back to step 2. If yes, done: the quotient is in bits (7:0) of the Remainder register and the remainder is in bits (15:9) of the Remainder register (note that bit 8 is ignored).
The 16-bit Remainder register referred to in this flowchart is S from the pseudo-code.

110–128 Example
[Animation frames: restoring division of A = 140 (10001100) by B = 9 (1001) with N = 8. The quotient is 15 (1111) and the remainder is 5 (101). Each frame shows S_old, B and S for one step of the pseudo-code.]
S = A << 1
repeat N times:
    S15..8 = S15..8 - B
    if S < 0:
        S15..8 = S15..8 + B
        S = S << 1
        S0 = 0
    else:
        S = S << 1
        S0 = 1
Q = S7..0
R = S15..9
Iterations 1 and 2: the subtraction makes S negative, so B is added back and S is shifted left with S0 = 0. Iterations 3 and 4: same as iterations 1 and 2 (showing outcome only). Iterations 5 to 8: the subtraction leaves S non-negative, so S is shifted left with S0 = 1. After iteration 8, the quotient Q is in S7..0 and the remainder R is in S15..9 (note bit 8 is not used).

129 Overall
[Timing diagram: load A = 10001100 and B = 1001, then clock cycles 1 to 8, each with a shift left and a subtract, showing the A/Q and R register contents.]

130 Datapath and Controller
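[Figure: Controller (State Machine) and Datapath. Controller input: start. Controller-to-datapath signals: load, sel, shift, inbit, add. Datapath-to-controller signal: sign. Datapath inputs: divisor and dividend (n bits each). Datapath outputs: Quotient and Remainder (n bits each).]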

131 Datapath – 8bit Version
Data signals: Dividend A, Divisor B, Quotient Q = S7..0, Remainder R = S15..9. Control signals (in): CLK, LOAD, ADD, SEL, SHIFT, INBIT. Control signals (out): SIGN. Operators: & means concatenate.
[Figure: an 8-bit register for B (LOAD, CLK); an 8-bit Add/Subtract unit (ADD) operating on S15..8 and B, whose bit 7 gives SIGN; concatenations (&) forming 16-bit values; a 16-bit 3:1 MUX with inputs 10, 01, 11 selected by the 2-bit SEL; a combinational left shift (SHIFT, INBIT); and the 16-bit register S (CLK).]

132 State Machine
Controller state machine (dc = don't care):
From any state, start=1: load=1, sel=10, shift=1, inbit=0, add=dc (S = A << 1).
Subtract state: load=0, sel=01, shift=0, inbit=dc, add=0 (S15..8 = S15..8 - B).
If sign=1: load=0, sel=01, shift=1, inbit=0, add=1 (S15..8 = S15..8 + B; S << 1; S0 = 0).
If sign=0: load=0, sel=11, shift=1, inbit=1, add=dc (S << 1; S0 = 1).
At the end, Q = S7..0 and R = S15..9.
Needs 2N clock cycles: 2 cycles per iteration. Can use a counter for loop control that counts to 2N.

133 Signed Arithmetic

134 Review: Binary Numbers
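Unsigned numbers: all bits represent the magnitude of a positive integer ($b_{n-1} \dots b_1 b_0$ is the magnitude, with $b_{n-1}$ the MSB). Signed numbers: the left-most bit represents the sign of a number ($b_{n-1}$ is the sign, where 0 denotes + and 1 denotes −; $b_{n-2} \dots b_1 b_0$ is the magnitude, with $b_{n-2}$ the MSB).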

135 Review: Negative Numbers Representation
Negative numbers can be represented in following ways: Sign and magnitude +5 = 0101 and −5 = 1101 1’s complement 2’s complement

136 Review: 1’s Complement Let K be the negative equivalent of an n-bit positive number P. Then, in 1’s complement representation K is obtained by subtracting P from $2^n - 1$, namely $K = (2^n - 1) - P$. This means that K can be obtained by inverting all bits of P.

137 Review: 2’s Complement
Let K be the negative equivalent of an n-bit positive number P. Then, in 2’s complement representation, K is obtained by subtracting P from 2^n, namely K = 2^n - P. The 2’s complement can be computed by inverting all bits of P and then adding 1 to the resulting 1’s-complement number.
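A small sketch can confirm that the subtraction definitions and the "invert the bits (then add 1)" shortcuts agree. The function names below are illustrative only, and a 4-bit word is assumed for the printed example.

```python
def ones_complement(p: int, n: int) -> int:
    """K = (2^n - 1) - P."""
    return ((1 << n) - 1) - p


def twos_complement(p: int, n: int) -> int:
    """K = 2^n - P, kept to n bits."""
    return ((1 << n) - p) % (1 << n)


def invert_bits(p: int, n: int) -> int:
    """Flip every one of the n bits of p."""
    return ~p & ((1 << n) - 1)


n = 4
for p in range(1, 1 << (n - 1)):
    # Definition by subtraction matches the bit-inversion shortcut.
    assert ones_complement(p, n) == invert_bits(p, n)
    assert twos_complement(p, n) == (invert_bits(p, n) + 1) % (1 << n)

print(format(ones_complement(5, n), "04b"), format(twos_complement(5, n), "04b"))
```

For P = 5 (0101) this prints 1010 for the 1's complement and 1011 for the 2's complement.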

138 Example: Interpretation of four-bit signed integers

139 Suitability of Different Number Representations
To assess the suitability of different number representations, it is necessary to investigate their use in arithmetic operations, particularly in addition and subtraction Addition of positive numbers is the same for all three number representations. But there are significant differences when negative numbers are involved.

140 1’s Complement Addition
(+5) + (+2) = (+7)
(-5) + (+2) = (-3)
(+5) + (-2) = (+3), after the carry-out is added back in
(-5) + (-2) = (-7), after the carry-out is added back in
The conclusion from these examples is that the addition of 1’s complement numbers may or may not be simple. In some cases a correction is needed, which amounts to an extra addition that must be performed. Consequently, the time needed to add two 1’s complement numbers may be twice as long as the time needed to add two unsigned numbers.
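The correction step can be made concrete with a short sketch. It assumes 4-bit words, and the helper names (`encode_ones`, `decode_ones`, `ones_complement_add`) are hypothetical. The extra addition is the "end-around carry": a carry out of the sign position is added back into the least-significant bit.

```python
N = 4
MASK = (1 << N) - 1


def encode_ones(value: int) -> int:
    """Encode a small signed integer as a 4-bit 1's complement pattern."""
    return value if value >= 0 else ~(-value) & MASK


def decode_ones(bits: int) -> int:
    """Decode a 4-bit 1's complement pattern back to a signed integer."""
    return bits if bits < (1 << (N - 1)) else -(~bits & MASK)


def ones_complement_add(x_bits: int, y_bits: int) -> int:
    total = x_bits + y_bits
    if total > MASK:                 # carry out of the sign position
        total = (total & MASK) + 1   # second addition: end-around carry
    return total & MASK


for x, y in [(5, 2), (-5, 2), (5, -2), (-5, -2)]:
    result = ones_complement_add(encode_ones(x), encode_ones(y))
    print(x, y, format(result, "04b"), decode_ones(result))
```

Only the last two cases take the correction branch, which is why a 1's complement adder can need up to two addition delays.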

141 2’s Complement Addition
(+5) + (+2) = (+7)
(-5) + (+2) = (-3)
(+5) + (-2) = (+3), carry-out ignored
(-5) + (-2) = (-7), carry-out ignored
The addition process is the same, regardless of the signs of the operands.

142 2’s Complement Subtraction
(+5) - (+2) = (+5) + (-2) = (+3), carry-out ignored
(-5) - (+2) = (-5) + (-2) = (-7), carry-out ignored
(+5) - (-2) = (+5) + (+2) = (+7)
(-5) - (-2) = (-5) + (+2) = (-3)
The key conclusion of this section is that the subtraction operation can be realized as the addition operation, using a 2’s complement of the subtrahend, regardless of the signs of the two operands.
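As a sketch of this conclusion, the code below performs 4-bit subtraction using only an adder: the subtrahend is inverted, 1 is added, and any carry out of the top bit is discarded. The helper names are illustrative and the 4-bit width is an assumption made to match the examples.

```python
N = 4
MASK = (1 << N) - 1


def to_bits(value: int) -> int:
    """Encode a signed integer as a 4-bit 2's complement pattern."""
    return value & MASK


def to_signed(bits: int) -> int:
    """Decode a 4-bit 2's complement pattern."""
    return bits - (1 << N) if bits & (1 << (N - 1)) else bits


def add(x_bits: int, y_bits: int) -> int:
    return (x_bits + y_bits) & MASK          # carry-out is ignored


def subtract(x_bits: int, y_bits: int) -> int:
    negated = (~y_bits + 1) & MASK           # 2's complement of the subtrahend
    return add(x_bits, negated)              # A - B = A + (~B + 1)


for x, y in [(5, 2), (-5, 2), (5, -2), (-5, -2)]:
    diff = subtract(to_bits(x), to_bits(y))
    print(x, "-", y, "=", to_signed(diff), format(diff, "04b"))
```

The same `add` function handles every sign combination, so no correction step is needed, unlike the 1's complement case.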

143 Signed Multiplication
The multiplication algorithm we discussed previously works for both signed and unsigned integers. But, for signed numbers, first perform SIGN EXTENSION on the operands to twice as many bits.
4-bit Example: 0011 (3) × 1011 (-5). Perform sign extension to create 8-bit operands.
No Sign Extension: 0011 × 1011 = 0010 0001 (33, NOT -15)
With Sign Extension: 0000 0011 × 1111 1011; the least-significant 8 bits of the product are 1111 0001 (-15)
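The 3 × (-5) example can be checked with a short sketch. The function names are hypothetical; the unsigned multiply stands in for the multiplier discussed earlier, and only the low 2n bits of the product are kept.

```python
def sign_extend(bits: int, from_bits: int, to_bits: int) -> int:
    """Copy the sign bit of a from_bits-wide pattern into the new upper bits."""
    if bits & (1 << (from_bits - 1)):
        bits |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return bits


def signed_multiply(x_bits: int, y_bits: int, n: int = 4) -> int:
    """Multiply two n-bit 2's complement patterns; returns a 2n-bit pattern."""
    wide = 2 * n
    product = sign_extend(x_bits, n, wide) * sign_extend(y_bits, n, wide)
    return product & ((1 << wide) - 1)       # keep the least-significant 2n bits


no_extension = 0b0011 * 0b1011               # treats 1011 as unsigned 11
with_extension = signed_multiply(0b0011, 0b1011)
print(format(no_extension, "08b"), format(with_extension, "08b"))
```

Without sign extension the result is 00100001 (33); with it the low 8 bits are 11110001, which is -15 in 8-bit 2's complement.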

144 Signed Division To handle signed binary number division, we first convert both the dividend and the divisor to positive numbers to perform the division, and then correct the signs of the results as needed. We adopt a convention that the remainder and the dividend shall have the same sign. That is, if the dividend is positive, then the remainder will be positive. If the dividend is negative, then the remainder will be negative. As for the quotient, it will be positive if the divisor and the dividend have the same sign. Otherwise, it will be negative. Here are some examples that illustrate these conventions: 0111/0011 = 0010 R 0001 ( 7 /3 = 2, remainder = 1) 0111/1101 = 1110 R 0001 ( 7/-3 = -2, remainder = 1) 1001/0011 = 1110 R 1111 ( -7/3 = -2, remainder = -1) 1001/1101 = 0010 R 1111 ( -7/-3 = 2, remainder = -1)

145 Signed Division
To summarize, if the dividend is negative, then two's complement must be applied to the remainder at the end. If the dividend and the divisor have different signs, then the quotient must be negated with a 2's complement operation at the end.
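A minimal sketch of this convention follows. It divides the magnitudes and then fixes up the signs; `signed_divide` and `bits4` are illustrative names, and Python's built-in `divmod` on the absolute values stands in for the unsigned divider.

```python
def signed_divide(dividend: int, divisor: int) -> tuple[int, int]:
    """Return (quotient, remainder) with the remainder taking the dividend's sign."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    quotient, remainder = divmod(abs(dividend), abs(divisor))  # unsigned division
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient          # signs differ: negate the quotient
    if dividend < 0:
        remainder = -remainder        # remainder follows the dividend's sign
    return quotient, remainder


def bits4(value: int) -> str:
    """4-bit 2's complement pattern of a small signed integer."""
    return format(value & 0xF, "04b")


for a, b in [(7, 3), (7, -3), (-7, 3), (-7, -3)]:
    q, r = signed_divide(a, b)
    print(f"{bits4(a)}/{bits4(b)} = {bits4(q)} R {bits4(r)}")
```

Note that this differs from Python's own `//` and `%` operators, which round toward negative infinity and give the remainder the sign of the divisor.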

146 Fractional Numbers

147 Dealing with Fractional Values
So far, we have been working with integral values.
Two ways to represent non-integer numbers in binary:
Fixed Point Representation: smaller hardware, simpler to implement; constant resolution
Floating Point Representation: larger range of values, more accurate at extremes

148 Fixed-point Representation

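149 Fractional Numbers
In decimal (base 10): 7241.0381 = 7×10^3 + 2×10^2 + 4×10^1 + 1×10^0 + 0×10^-1 + 3×10^-2 + 8×10^-3 + 1×10^-4
In binary (base 2), each position is weighted by a power of 2 in the same way: 1×2^3 + 0×2^2 + 0×2^1 + 1×2^0 + 0×2^-1 + ... (fractional terms continue down to 2^-4)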

150 Fixed Point Representation
Uses N bits to represent a number. Location of radix point is FIXED. Location of radix point determines range and precision.

151 More Formally: Qm.n Format
The Qm.n format of an N-bit number sets m bits to the left and n bits to the right of the binary point. For example, a Q15.1 number has 15 integer bits and 1 fractional bit, and a Q1.14 number has 1 integer bit and 14 fractional bits. Q format is often used in hardware that does not have a floating-point unit and in applications that require constant resolution.

152 8-bit Example
4 bits after radix point (fractional bit weights ... 0.125, 0.0625)
Precision: 0.0625
Range: 0 to 15.9375

153 8-bit Example
5 bits after radix point (integer bit weights 4, 2, 1)
Precision: 0.03125
Range: 0 to 7.96875
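The trade-off between range and precision can be computed directly. This is a small sketch for unsigned fixed-point numbers only; the function name is hypothetical.

```python
def unsigned_fixed_point_limits(total_bits: int, frac_bits: int) -> tuple[float, float]:
    """Return (precision, largest value) of an unsigned fixed-point format."""
    precision = 2.0 ** -frac_bits                     # weight of the last bit
    largest = ((1 << total_bits) - 1) * precision     # all bits set to 1
    return precision, largest


for frac_bits in (4, 5):
    print(frac_bits, unsigned_fixed_point_limits(8, frac_bits))
```

Moving the radix point one position to the left halves both the step size and the largest representable value.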

154 Example
Decimal: 2 × 10^1 + 6 × 10^0 + 5 × 10^-1 = 26.5
Binary (bit weights 2^5 2^4 2^3 2^2 2^1 2^0 . 2^-1 2^-2 2^-3 ...): 11010.1 = 1 × 2^4 + 1 × 2^3 + 0 × 2^2 + 1 × 2^1 + 0 × 2^0 + 1 × 2^-1 = 16 + 8 + 2 + 0.5 = 26.5
All digits (or bits) to the left of the binary point carry a weight of 2^0, 2^1, 2^2, and so on. Digits (or bits) to the right of the binary point carry a weight of 2^-1, 2^-2, 2^-3, and so on.

155 What do you Observe here?
[Figure: two bit-position tables with weights 2^5 2^4 2^3 2^2 2^1 2^0, a binary point, then 2^-1 2^-2 2^-3; the same bit pattern is shown with the binary point in two different positions.]
A careful reader should now realize that the bit pattern of 53 and 26.5 is exactly the same. The only difference is the position of the binary point. In the case of 53 (base 10), there is "no" binary point. Alternatively, we can say the binary point is located at the far right, at position 0. (Think in decimal: 53 and 53.0 represent the same number.)
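The observation that one bit pattern stands for different values depending on the binary point can be sketched as a pair of conversion helpers. The names `to_fixed` and `from_fixed` are illustrative, and the stored value is simply a Python integer.

```python
def to_fixed(value: float, frac_bits: int) -> int:
    """Scale a real value by 2^frac_bits and round to the nearest integer."""
    return round(value * (1 << frac_bits))


def from_fixed(raw: int, frac_bits: int) -> float:
    """Interpret a stored integer with frac_bits bits after the binary point."""
    return raw / (1 << frac_bits)


raw = to_fixed(26.5, 1)
print(format(raw, "b"), raw)                 # same bits as the integer 53
for frac_bits in (0, 1, 2, 3):
    print(frac_bits, from_fixed(raw, frac_bits))
```

The hardware stores only the integer; the position of the binary point is a convention that the designer keeps track of.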

156 Exercise Consider the following binary representation:
Now using Q5,3 format figure out what number is represented? 1

157 Negative Numbers
Remember, we use 2's complement to represent negative numbers. The table shows all the numbers representable with 4-bit 2's complement. The n / 2 column assumes the binary point is at position 1.

Bit Pattern   Number Represented (n)   n / 2
1111          -1                       -0.5
1110          -2                       -1
1101          -3                       -1.5
1100          -4                       -2
1011          -5                       -2.5
1010          -6                       -3
1001          -7                       -3.5
1000          -8                       -4
0111          7                        3.5
0110          6                        3
0101          5                        2.5
0100          4                        2
0011          3                        1.5
0010          2                        1
0001          1                        0.5
0000          0                        0

158 Arithmetic with Fixed-point Representation

159 Addition and Subtraction
Addition and subtraction are performed by treating the fixed-point numbers as integers and adding them using standard addition/subtraction. In terms of the bit count of the result, that can be expressed as: [m.n] ± [m.n] = [m+1 . n]. Example: Adding two numbers represented in [8.8]-format gives a sum that is represented in [9.8].
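A quick sketch shows why the sum needs one extra integer bit. It assumes unsigned [8.8] operands, and the helper names are hypothetical.

```python
FRAC_BITS = 8  # [8.8] format: 8 integer bits, 8 fractional bits


def to_fixed(value: float) -> int:
    return round(value * (1 << FRAC_BITS))


def from_fixed(raw: int) -> float:
    return raw / (1 << FRAC_BITS)


a = to_fixed(255.5)
b = to_fixed(255.5)
total = a + b                      # plain integer addition, radix point unchanged

print(a.bit_length(), total.bit_length(), from_fixed(total))
```

Each operand fits in 16 bits, but the sum 511.0 needs 17 bits: 9 integer bits plus the same 8 fractional bits.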

160 Multiplication and Division
Treat bits as integers and perform the operation, but pay attention to the radix point in the result and shift accordingly.
Example: 3.25 × 6.5 = 21.125
If your system is using 8-bit Fixed Point Numbers with 2 fractional bits, the product has 4 fractional bits and needs to be shifted right by 2 to return to the format. This leads to loss of precision (result is 21). If only we could have used 3 of the 8 bits for the fraction!
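The loss of precision in the 3.25 × 6.5 example can be reproduced with a short sketch. The helper names are illustrative, and truncation (a plain right shift) is assumed as the rounding behaviour.

```python
def to_fixed(value: float, frac_bits: int) -> int:
    return round(value * (1 << frac_bits))


def from_fixed(raw: int, frac_bits: int) -> float:
    return raw / (1 << frac_bits)


def fixed_multiply(a_raw: int, b_raw: int, frac_bits: int) -> int:
    """Integer multiply, then shift right to drop the extra fractional bits."""
    return (a_raw * b_raw) >> frac_bits      # product has 2 * frac_bits fraction bits


for frac_bits in (2, 3):
    a = to_fixed(3.25, frac_bits)
    b = to_fixed(6.5, frac_bits)
    product = fixed_multiply(a, b, frac_bits)
    print(frac_bits, from_fixed(product, frac_bits))
```

With 2 fractional bits the result is 21.0; with 3 fractional bits the exact 21.125 survives the shift.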

161 Multiplication and Division
In terms of the number of bits in the multiplication result, [m.n] × [m.n] = [2m . 2n]. Division: [m1.n1] / [m2.n2] ≈ [m1 + n2 . n1 − n2], where n1 ≥ n2.

162 Floating-point Numbers

163 IEEE 754 Binary Floating Point Standard
Binary floating point encoding scheme established in 1985 and used in almost every electronic/computing system.
Most commonly used formats from the standard:
Single-Precision (32 bits): Sign = bit 31 (1 bit); Biased Exponent = bits 30..23 (8 bits); Mantissa (Significand) = bits 22..0 (23 bits)
Double-Precision (64 bits): Sign = bit 63 (1 bit); Biased Exponent = bits 62..52 (11 bits); Mantissa (Significand) = bits 51..0 (52 bits)
Value = (-1)^sign × 1.mantissa × 2^(exponent - bias)

165 Converting a Number to IEEE754
[Figure: 32-bit layout with Sign (bit 31), Biased Exponent (bits 30..23), and Mantissa (Significand) (bits 22..0)]

166 Converting a Number to IEEE754
Convert the number to scientific notation (in this example the result has the factor 2^5). This is called NORMALIZATION.

167 Converting a Number to IEEE754
First bit of a normalized non-zero binary value is ALWAYS 1. Don’t need to store it.

168 Converting a Number to IEEE754
Pad with trailing 0’s and store as Mantissa.

169 Converting a Number to IEEE754
Sign bit is 0 for positive numbers and 1 for negative. Do NOT convert Mantissa to Two’s Complement.

170 Converting a Number to IEEE754
The exponent in this example is 5 (base 10) = 101 (base 2). Exponent needs to be signed. But Two’s Complement makes comparisons more complex. Add a BIAS (offset) to put exponent into unsigned range.

171 Exponent Bias
bias = 2^(k-1) - 1, where k is the number of bits used to store the exponent
Single-Precision: k = 8, so bias = 2^7 - 1 = 127; exponent in range of -126 to 127; exponent after bias in range of 1 to 254 (0, 255 have special meaning)
Double-Precision: k = 11, so bias = 2^10 - 1 = 1023; exponent in range of -1022 to 1023; exponent after bias in range of 1 to 2046

172 Converting a Number to IEEE754
With the normalized value scaled by 2^5, the biased exponent is 5 + 127 = 132 = 10000100 (base 2). The Sign, Biased Exponent, and Mantissa (Significand) fields are then filled in.

173 1 1.mantissa  2exponentbias
Converting From IEEE754 sign 1 1.mantissa  2exponentbias 31 30 23 22 Sign Biased Exponent Mantissa (Significand)   1   2 127 2 10 2 10

174 More Examples
Example 1: Determine the IEEE754 Single-Precision representation of -6.375.
-6.375 = -110.011 (base 2) = -1.10011 × 2^2
Sign: 1
Mantissa: 10011
Biased Exponent = 2 + 127 = 129 = 10000001 (base 2)
Result: 1 10000001 10011000000000000000000
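The field values of Example 1 can be cross-checked with Python's standard `struct` module. The sketch assumes the example value is -6.375, which is what sign 1, exponent 2, and mantissa 10011 imply; `single_fields` is a hypothetical helper name.

```python
import struct


def single_fields(value: float) -> tuple[int, int, int]:
    """Return (sign, biased exponent, mantissa) of value as IEEE754 single precision."""
    # Pack as a big-endian 32-bit float, then reread the same bytes as an integer.
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    sign = bits >> 31
    biased_exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    return sign, biased_exponent, mantissa


sign, biased_exponent, mantissa = single_fields(-6.375)
print(sign, format(biased_exponent, "08b"), format(mantissa, "023b"))
```

The printed fields are 1, 10000001, and 10011 followed by 18 zeros.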

175 1   More Examples 1.01  2   1 1.01  2  0.1562510
Determine the decimal number represented by the following bits interpreted as an IEEE754 Single-Precision value 1 1.01  2 127 2 10 2     1 1.01  2 124 127 10 10 2
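Decoding can be sketched by applying the value formula to the three fields. The bit pattern used below is an assumption: it takes sign 0 together with the biased exponent 124 and mantissa 01 from the example, and the function handles normal numbers only.

```python
import struct


def decode_single(bits: int) -> float:
    """Decode a 32-bit pattern as a normal IEEE754 single-precision number."""
    sign = bits >> 31
    biased_exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if biased_exponent in (0, 255):
        raise ValueError("zero, subnormal, infinity and NaN are not handled here")
    significand = 1 + mantissa / 2**23            # restore the hidden leading 1
    return (-1) ** sign * significand * 2.0 ** (biased_exponent - 127)


# Assumed pattern: sign 0, exponent 01111100 (124), mantissa 0100...0
pattern = (0 << 31) | (0b01111100 << 23) | (0b01 << 21)
print(hex(pattern), decode_single(pattern))

# Cross-check against the platform's own interpretation of the same bytes.
assert decode_single(pattern) == struct.unpack(">f", pattern.to_bytes(4, "big"))[0]
```

The pattern 0x3e200000 decodes to 0.15625; setting the sign bit would give -0.15625.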

176 Special IEEE754 Numbers
Zero: Biased Exponent: all 0’s; Mantissa: all 0’s; Sign: 0 for +0, 1 for -0
Not a Number (NaN): Biased Exponent: all 1’s; Mantissa: non-zero; Sign: don’t care. E.g. sqrt(-1)
Infinity: Biased Exponent: all 1’s; Mantissa: all 0’s; Sign: 0 for +infinity, 1 for -infinity. E.g. divide by +0 or divide by -0
Subnormal Numbers: Biased Exponent: all 0’s; Mantissa: non-zero. More on this later.
Need to support all of these for strict IEEE compliance. For many applications, strict compliance is not required.
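The special cases reduce to two comparisons on the exponent field and one on the mantissa. A minimal single-precision sketch, with `classify_single` as an illustrative name:

```python
def classify_single(bits: int) -> str:
    """Classify a 32-bit IEEE754 single-precision pattern."""
    biased_exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if biased_exponent == 0:                      # exponent field all 0's
        return "zero" if mantissa == 0 else "subnormal"
    if biased_exponent == 0xFF:                   # exponent field all 1's
        return "infinity" if mantissa == 0 else "NaN"
    return "normal"


for pattern in (0x00000000, 0x80000000, 0x7F800000, 0x7FC00000, 0x00000001, 0x3F800000):
    print(hex(pattern), classify_single(pattern))
```

The sign bit plays no part in the classification; it only distinguishes +0 from -0 and +infinity from -infinity.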

177 Floating Point Arithmetic
In general:
Addition/Subtraction:
  Denormalize operands so that both have the same exponent
  Perform operation on mantissa with integer arithmetic
  Normalize result
Multiplication/Division:
  XOR the sign bits of operands
  Add/subtract unbiased exponents using integer arithmetic
  Perform operation on mantissa using integer arithmetic (need to pay attention to location of radix points)
Don’t worry about details now. You can always look these up.

178 More about Floating Point Arithmetic
There is a lot more to be discussed on Floating-Point (rounding, accuracy, etc.) that is beyond the scope of this course. These details can be really important depending on your application.
Notable Case: 1991, Dhahran, Saudi Arabia. 28 US soldiers were killed because a missile interception system's internal clock had drifted by 0.33 s due to floating-point rounding error accumulated over several days. The 0.33 s translated into an incorrect calculation of the incoming missile's location by about 600 m. After correct initial detection, the system looked at the wrong part of the sky and found no missile. Thus it did not proceed with an interception attempt.

179 Fixed Point vs. Floating Point
Fixed Point: Simpler circuitry (need fewer logic/routing resources) for arithmetic operations
Floating Point: Better precision; higher dynamic range of representable values

180 Why Floating-Point Needs More Hardware
Need to denormalize operands
Need to renormalize results
Need to perform separate operations on exponent and mantissa
Need to support subnormal and normal representations (optional)

181 Overflow
Largest positive single-precision number: (2 - 2^-23) × 2^127 ≈ 3.4 × 10^38
OVERFLOW occurs when an arithmetic operation leads to a result that is too big to represent.

182 Underflow
Smallest positive single-precision normalized number: 1.0 × 2^-126 ≈ 1.18 × 10^-38
UNDERFLOW occurs when an arithmetic operation leads to a result smaller than this value.

183 Underflow
NORMAL NUMBERS: Numbers that can be represented by the normalized format
UNDERFLOW GAP: The range of 0 to the smallest normal number
SUBNORMAL NUMBERS: Numbers in the underflow gap
IEEE754 has a mechanism to represent a subset of the Subnormal Numbers.

184 Subnormal Numbers in IEEE754
To encode a subnormal number in IEEE754, the Biased Exponent has all 0’s, which represents the smallest possible exponent (Single-Precision: -126; Double-Precision: -1022). The Mantissa is interpreted as being preceded by 0. instead of 1.
Example (Single-Precision): 0.mantissa × 2^-126
Trade-off: precision (number of significant digits) for range.

185 IEEE754 Subnormal Range
Smallest positive single-precision subnormal number: 2^-23 × 2^-126 = 2^-149 ≈ 1.4 × 10^-45
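The three single-precision limits can be checked against the bit patterns that encode them, using Python's standard `struct` module. The variable names are illustrative; the values are computed in Python's double precision, which represents all three exactly.

```python
import struct


def single_from_bits(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE754 single-precision value."""
    return struct.unpack(">f", bits.to_bytes(4, "big"))[0]


largest = (2 - 2**-23) * 2.0**127          # exponent 254, mantissa all 1's
smallest_normal = 2.0**-126                # exponent 1, mantissa all 0's
smallest_subnormal = 2.0**-149             # exponent 0, mantissa 00...01

assert largest == single_from_bits(0x7F7FFFFF)
assert smallest_normal == single_from_bits(0x00800000)
assert smallest_subnormal == single_from_bits(0x00000001)

print(largest, smallest_normal, smallest_subnormal)
```

Results above `largest` overflow, and results between 0 and `smallest_normal` fall into the underflow gap that subnormal numbers partly fill.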

186 Doing Maths on FPGA
(Initial Version – Updates in Progress)

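187 Pyramid Smoothing
[Figure: image pyramid with levels L0 (16 x 16), L1 (8 x 8), L2 (4 x 4), and L3 (2 x 2). Down-Sampling moves from L0 toward L3; Up-Sampling moves from L3 toward L0.]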

188 Parent Calculation – Down-Sampling Phase
Ri Result is a floating-point number

189

190 Approximate Computing

191 THANK YOU

192 Adders: Single cycle vs. Multicycle
All of these are single-cycle (combinational) adders. Usually, adders are done as combinational blocks. Alternatively, you can add one bit per cycle; this requires building a datapath / controller.
Exercise: design a multi-cycle adder. Compare it to the “bit counting” circuit in an earlier lecture.

193 Adding 16 Operands
[Figure: a tree of N-bit CSAs with log_{3/2} M levels reduces the M operands to two values, which a final CLA adds to produce S.]
Delay for sum bits scales in O(log M + log N). CLA delay scales in O(log N).
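A carry-save adder (CSA) turns three operands into two without propagating carries, so a tree of them shrinks the operand count by a factor of about 3/2 per level. The sketch below is an illustration on Python integers; the function names are hypothetical, and the built-in `sum` stands in for the final carry look-ahead addition.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compression: a + b + c == sum_bits + carry_bits, with no carry chain."""
    sum_bits = a ^ b ^ c                                 # per-bit sum
    carry_bits = ((a & b) | (a & c) | (b & c)) << 1      # per-bit carry, shifted up
    return sum_bits, carry_bits


def csa_tree_sum(operands: list[int]) -> tuple[int, int]:
    """Reduce operands with CSA levels until two remain; return (total, levels)."""
    values = list(operands)
    levels = 0
    while len(values) > 2:
        grouped = len(values) - len(values) % 3
        reduced = []
        for i in range(0, grouped, 3):
            reduced.extend(carry_save_add(*values[i:i + 3]))
        reduced.extend(values[grouped:])                 # leftovers pass through
        values = reduced
        levels += 1
    return sum(values), levels                           # final carry-propagate add


print(csa_tree_sum(list(range(1, 17))))
```

For 16 operands this grouping goes 16, 11, 8, 6, 4, 3, 2, so six CSA levels are followed by a single carry-propagating addition.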

194 Delay Scalability for 64-Bit Operands
[Plot: Critical Path Delay (gate propagation delays, axis from 0 to 200) versus Number of Operands M. Curves: Unbalanced CLA Tree, O(M × log N); Balanced CLA Tree, O(log M × log N); Balanced CSA Tree, O(log M + log N).]

195 16-bit Carry Look-Ahead Adder
[Figure: 16-bit carry look-ahead adder built from four 4-bit slices. Each bit position has Generate/Propagate/Sum Logic with inputs Ai, Bi and outputs Gi, Pi, Si. Each slice has a 4-bit Carry Look-Ahead Unit that produces the internal carries C1 to C3 and the group signals GG and PG. A 16-bit Carry Look-Ahead Unit takes C0 and the group signals of the four slices (positions 0, 4, 8, 12) and produces C4, C8, C12, C16 and the overall GG and PG.]
Use this as a building block for a 64-bit CLA.

