Published by Susanti Johan. Modified over 5 years ago.
1
Arithmetic Circuits for Number Crunching: Adders, Subtractors, Multipliers and Dividers
Dr. Tassadaq Hussain
Instructor: Dr. Rehan Ahmed
2
Roadmap
- Arithmetic Circuits: Addition, Subtraction, Multiplication, Division
- Approximate Computing
- Fractional Numbers: Fixed Point, Floating Point
(Image: Allegory of Arithmetic, from "Margarita Philosophica," 1504, by Gregor Reisch)
3
ADDITION and SUBTRACTION
4
Design Tradeoffs
There are many ways to build adders (or any function). Which implementation is right depends on your system's requirements.
Optimization metrics: speed (time), power and area. (Figure: a triangle trading off power, area and time.)
5
Single Cycle vs. Multi Cycle Arithmetic
Arithmetic units can be written as:
- Combinational blocks
  - Outputs depend only on inputs
  - Results available in one clock cycle
- Multi-cycle
  - Result is computed over multiple cycles
  - Can be much smaller (requires fewer logic resources)
  - Can be pipelined for a higher clock frequency
6
Common Adder Architectures
- Carry Ripple/Propagate Adder
- Carry Select Adder
- Carry Look-ahead Adder
- Carry Save Adder
7
Carry Ripple/Propagate Adder
8
Review: 1-Bit (Half) Adder
Truth table (A, B → C, S): 00 → 00, 01 → 01, 10 → 01, 11 → 10, so $S = A \oplus B$ and $C = A \cdot B$.
What if we want to add more than one bit?
9
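Review: Full Adder
A full adder adds three 1-bit inputs A, B and Cin and produces a sum S and a carry-out Cout:
$S = A \oplus B \oplus C_{in}$
$C_{out} = A \cdot B + (A + B) \cdot C_{in}$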
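A minimal Python sketch of the half and full adders (our own illustration, not from the slides), treating every signal as the integer 0 or 1. The function names are hypothetical; the carry is written as $A B + (A + B) C_{in}$ to match the OR, AND, OR gate path used in the delay analysis, which is logically equivalent to the majority function $AB + AC_{in} + BC_{in}$.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum, carry) for two 1-bit inputs."""
    return a ^ b, a & b


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (sum, carry_out) for three 1-bit inputs."""
    s = a ^ b ^ cin                   # one 3-input XOR gate
    cout = (a & b) | ((a | b) & cin)  # OR -> AND -> OR path: 3 gate delays
    return s, cout


# Exhaustive check against integer addition.
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout = full_adder(a, b, cin)
            assert 2 * cout + s == a + b + cin
```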
10
Aside: How do we perform Subtraction?
11
Aside: Subtraction
We can perform subtraction with addition: $A - B = A + (-B)$
Recall two's complement numbers: $-B = \overline{B} + 1$ (invert all bits of B and add 1)
Therefore: $A - B = A + \overline{B} + 1$
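The identity can be checked with a short Python sketch, assuming n-bit unsigned bit patterns; `subtract` is a hypothetical helper name. Masking with $2^n - 1$ models an n-bit adder whose carry out of the top bit is dropped.

```python
def subtract(a: int, b: int, n: int) -> int:
    """Compute a - b modulo 2**n using only bit inversion and addition."""
    mask = (1 << n) - 1
    b_inverted = ~b & mask               # invert all n bits of B
    return (a + b_inverted + 1) & mask   # add 1; drop the carry out of bit n-1
```

For example, `subtract(2, 5, 4)` returns 13, i.e. 1101, the 4-bit two's complement pattern for −3.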
12
Back to Carry Propagate/Ripple Adder…
13
Carry Propagate Adder (CPA)
Connect full adders in a chain to make a Carry Propagate Adder (Ripple Adder). (Figure: four full adders with inputs A3..A0 and B3..B0, carry-in Cin at bit 0, sum outputs S3..S0 and carry-out C4.)
- The right-most stage is the least-significant bit (LSB).
- The carry-out of each stage feeds the carry-in of the next stage.
- Can extend to any number of bits (well, we'll see about that…)
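A behavioural Python sketch of an n-bit ripple adder, assuming Python integers as bit vectors (bit 0 = LSB); the function names are our own. Each loop iteration plays the role of one full adder, and the `carry` variable is the wire from one stage's Cout to the next stage's Cin.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One full adder: returns (sum, carry_out) for 1-bit inputs."""
    s = a ^ b ^ cin
    cout = (a & b) | ((a | b) & cin)
    return s, cout


def ripple_carry_add(a: int, b: int, n: int, cin: int = 0) -> tuple[int, int]:
    """Add two n-bit numbers with a chain of n full adders; returns (sum, cout)."""
    carry = cin
    total = 0
    for i in range(n):                       # stage 0 is the right-most (LSB) stage
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i                      # carry-out of stage i feeds stage i + 1
    return total, carry
```

For example, `ripple_carry_add(0b1011, 0b0110, 4)` returns `(0b0001, 1)`: 11 + 6 = 17 does not fit in 4 bits, so the carry-out is 1.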
14
How fast is this Ripple Carry Adder?
15
Delay of a Full Adder
What is the critical path delay of a full adder?
(Figure: full adder with inputs A, B, Cin and outputs S, Cout.)
Assume all gates have the same gate propagation delay tPD. If inputs A, B, Cin arrive at time 0:
- S is ready after tPD (A, B, Cin → XOR gate → S).
- Cout is ready after 3 tPD (A, B → OR → AND → OR → Cout).
Critical path delay is 3 tPD.
16
Delay of Carry Propagate Adder
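Consider a 2-bit Carry Propagate Adder: bit 0 takes A0, B0 and Cin and produces S0; its Cout drives the Cin of bit 1, which takes A1, B1 and produces S1 and the final Cout.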
17
Delay of Carry Propagate Adder
Consider a 2-bit Carry Propagate Adder. Inputs A0, A1, B0, B1 (and Cin) arrive at time 0.
- The Cout of the first bit is ready after 3 tPD.
- The next Cout is ready after another 2 tPD: the OR of A1 and B1 was computed in parallel, so only the AND and OR gates that follow the carry-in remain on the path.
- Critical path delay is 5 tPD.
18
Delay of Carry Propagate Adder
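Delay for an N-bit Carry Propagate Adder is $(2N + 1)\,t_{PD}$: 3 tPD for the first stage plus 2 tPD for each of the other N − 1 stages.
(Figure: N full adders chained from bit 0 (A0, B0, Cin) to bit N−1, with 3 tPD marked on the first stage and 2 tPD on each later stage.)
- Delay is proportional to N: SLOW for large N.
- When 32- or 64-bit numbers are used, this delay may become unacceptably high.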
19
Aside: Combined Adder/Subtractor
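$A - B = A + \overline{B} + 1$
(Figure: a 4-bit adder with inputs A3..A0 and B3..B0, carry-in Cin, outputs S3..S0 and C4; each B input passes through a 2:1 multiplexer, controlled by MODE, that selects either B_i or its inverse.)
- Addition: MODE = 0. B inputs are NOT inverted; carry-in is 0.
- Subtraction: MODE = 1. B inputs are inverted; carry-in is 1.
Exercise: How does this affect delay?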
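A Python sketch of the combined adder/subtractor, assuming 4-bit operands; the name `add_sub` is illustrative. It XORs each B_i with MODE, which is logically the same as the per-bit 2:1 multiplexer choosing between B_i and its inverse; MODE also drives the carry-in.

```python
def add_sub(a: int, b: int, mode: int, n: int = 4) -> tuple[int, int]:
    """mode = 0: a + b; mode = 1: a - b (two's complement). Returns (S, Cout)."""
    carry = mode                           # carry-in equals MODE
    result = 0
    for i in range(n):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ mode         # B_i is inverted when MODE = 1
        s = ai ^ bi ^ carry
        carry = (ai & bi) | ((ai | bi) & carry)
        result |= s << i
    return result, carry
```

For example, `add_sub(7, 3, 1)` returns `(4, 1)`: 0111 + 1100 + 1 = 1 0100.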
20
Can we make faster adders, i.e., do something about the carry generation?
21
Carry Select Adder (CSA)
22
Carry-Select Adder
- Carry Propagate Adders are slow because the high-order bits need to wait for the carry-in from the lower-order bits.
- Calculate the high-order bits for BOTH cases of carry-in (0 and 1).
- Then select the correct case when the carry-in is ready.
- Trade off area (use more gates) for faster performance.
23
Carry Select Adder: 8-Bit Example
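(Figure: bits 3..0 are added by a 4-bit Carry Propagate Adder with carry-in Cin. Bits 7..4 are added by two 4-bit Carry Propagate Adders, one with carry-in 0 and one with carry-in 1; a 2:1 multiplexer controlled by the carry-out of bits 3..0 selects S7..4 and Cout.)
Bits 7..4 are computed in parallel with bits 3..0.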
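A behavioural Python sketch of the 8-bit carry-select adder (names are ours). The helper `cpa4` stands in for a 4-bit carry propagate adder and is modelled arithmetically rather than gate by gate; both candidate upper sums are computed before the lower carry-out is used, mirroring the parallel hardware.

```python
def cpa4(a: int, b: int, cin: int) -> tuple[int, int]:
    """4-bit carry propagate adder, modelled arithmetically: returns (sum, cout)."""
    total = a + b + cin
    return total & 0xF, total >> 4


def carry_select_add8(a: int, b: int, cin: int = 0) -> tuple[int, int]:
    """8-bit carry-select adder built from three 4-bit adders and a 2:1 mux."""
    a_lo, b_lo = a & 0xF, b & 0xF
    a_hi, b_hi = (a >> 4) & 0xF, (b >> 4) & 0xF
    lo_sum, lo_cout = cpa4(a_lo, b_lo, cin)
    hi_if_0 = cpa4(a_hi, b_hi, 0)              # bits 7..4 assuming carry-in 0
    hi_if_1 = cpa4(a_hi, b_hi, 1)              # bits 7..4 assuming carry-in 1
    hi_sum, cout = hi_if_1 if lo_cout else hi_if_0   # the multiplexer
    return (hi_sum << 4) | lo_sum, cout
```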
24
Carry Look-Ahead Adder (CLA)
25
Approach
Pre-compute parts of the carry logic. For each bit of the addition, independently calculate two terms:
- Generate term $G_i = f(A_i, B_i)$
- Propagate term $P_i = f(A_i, B_i)$
G_i and P_i are independent of the carry terms C_i. Then, using the various generate and propagate terms, we can compute the carry term at each bit without the ripple effect.
26
Generate Term
Does adding the i-th bits of A and B generate a carry?
Example: A = 011, B = 010. Look at each bit position INDEPENDENTLY:
- Does adding bit 0 generate a carry? A0 + B0 = 1 + 0 = 01. NO (carry not generated): G0 = 0
- Does adding bit 1 generate a carry? A1 + B1 = 1 + 1 = 10. YES (carry generated): G1 = 1
- Does adding bit 2 generate a carry? A2 + B2 = 0 + 0 = 00. NO: G2 = 0
27
Generate Term
A_i, B_i would generate a carry if both are 1 (truth table: G_i = 1 only for A_i = B_i = 1):
$G_i = A_i \cdot B_i$ (A_i AND B_i)
28
Propagate Term
If there was a carry-in at the i-th bit, would it propagate to the next stage?
Example: A = 011, B = 010. Look at each bit position INDEPENDENTLY, with a hypothetical carry-in of 1:
- Would bit 0 propagate a carry? A0 + B0 + 1 = 1 + 0 + 1 = 10. YES (carry propagated): P0 = 1
- Would bit 1 propagate a carry? A1 + B1 + 1 = 1 + 1 + 1 = 11. YES (carry propagated): P1 = 1
- Would bit 2 propagate a carry? A2 + B2 + 1 = 0 + 0 + 1 = 01. NO: P2 = 0
29
Propagate Term
A_i, B_i would propagate a carry if either is 1 (truth table: P_i = 0 only for A_i = B_i = 0):
$P_i = A_i + B_i$ (A_i OR B_i)
30
Will the i+1 Position Produce a Carry Out?
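$C_{i+1} = G_i + P_i \cdot C_i$ (Cout_i = G_i OR (P_i AND Cin_i))
The carry out of position i, which is the carry into position i + 1, is 1 if a carry was GENERATED at position i, or if a carry came in and was PROPAGATED.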
31
4-Bit Example
Let C_i denote the carry-in of stage i; this means that it is also the carry-out of the previous stage. (Figure: bits 0 to 3 with inputs A_i, B_i and sums S_i; carries C0 through C4.)
Note: we are using logical operators here: + means OR, · means AND.
$C_1 = G_0 + P_0 C_0$
$C_2 = G_1 + P_1 C_1$
$C_3 = G_2 + P_2 C_2$
$C_4 = G_3 + P_3 C_3$
32
Carry Look-Ahead Logic
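Start from $C_1 = G_0 + P_0 C_0$, $C_2 = G_1 + P_1 C_1$, $C_3 = G_2 + P_2 C_2$, $C_4 = G_3 + P_3 C_3$ and perform forward substitution:
$C_1 = G_0 + P_0 C_0$
$C_2 = G_1 + P_1 G_0 + P_1 P_0 C_0$
$C_3 = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0$
$C_4 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0$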
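The forward-substituted equations can be checked exhaustively with a short Python sketch (our own illustration; function names are hypothetical), using $G_i = A_i B_i$ and $P_i = A_i + B_i$ from the earlier slides. Each carry is computed directly from the G, P terms and C0, with no carry feeding into another carry.

```python
def gp_terms(a: int, b: int, n: int = 4) -> tuple[list[int], list[int]]:
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]      # G_i = A_i AND B_i
    p = [((a >> i) | (b >> i)) & 1 for i in range(n)]    # P_i = A_i OR B_i
    return g, p


def cla4_carries(a: int, b: int, c0: int) -> list[int]:
    """[C1, C2, C3, C4] from the forward-substituted (two-level) equations."""
    g, p = gp_terms(a, b)
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]


def cla4_add(a: int, b: int, c0: int = 0) -> tuple[int, int]:
    """4-bit CLA with sum-only full adders: S_i = A_i XOR B_i XOR C_i."""
    c = [c0] + cla4_carries(a, b, c0)
    s = 0
    for i in range(4):
        s |= ((((a ^ b) >> i) & 1) ^ c[i]) << i
    return s, c[4]
```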
33
Carry Look-Ahead Delay
Each forward-substituted carry equation is a two-level AND-OR network. If C0 and all G_i and P_i terms are available at the same time, ALL C_i terms will be ready after 2 gate delays. No ripple effect!
34
Overall Critical Path Delay
(Figure: 4-bit CLA. Inputs A3..A0, B3..B0 and C0 arrive at time 0. Four Generate/Propagate blocks produce G3..G0 and P3..P0; the Carry Look-Ahead Unit produces C1..C4; four sum-only full adders produce S3..S0, with C4 as the carry-out.)
35
Overall Critical Path Delay
(Figure: the same 4-bit CLA.) Inputs arrive at time 0. All G, P terms are available after a single gate delay (tPD).
36
Overall Critical Path Delay
(Figure: the same 4-bit CLA.) The carries C1..C4 from the Carry Look-Ahead Unit are ready after tPD + 2 tPD = 3 tPD, because the carry logic adds two gate delays, as discussed on the Carry Look-Ahead Delay slide.
37
Overall Critical Path Delay
(Figure: the same 4-bit CLA.) Each sum-only full adder then needs one more tPD (its XOR gate) once its carry-in is ready: tPD + 2 tPD + tPD.
38
CLA - Overall Critical Path Delay
(Figure: the same 4-bit CLA.) G, P after tPD, carries after another 2 tPD, sums after another tPD: ALL outputs are ready after 4 tPD.
39
CLA - Scalability
In theory, we could build Carry Look-Ahead Adders of any size N. However, the equations get more complex very quickly, and we need wider and wider gates (which are slow) in the carry logic:
$C_1 = G_0 + P_0 C_0$
$C_2 = G_1 + P_1 G_0 + P_1 P_0 C_0$
$C_3 = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0$
$C_4 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0$
C4 alone already needs a 5-input AND and a 5-input OR. CLAs are therefore typically not extended beyond 4 bits.
40
Hierarchical Carry Look-Ahead Adder
41
Building Larger Carry Look-Ahead Adder
Build bigger CLAs hierarchically:
- Create a 4-bit CLA like the one we discussed.
- The 4-bit CLA now also computes a group generate term GG and a group propagate term PG:
  $PG = P_3 P_2 P_1 P_0$
  $GG = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0$
- Combine 4 × 4-bit CLAs to build a 16-bit CLA.
- The 16-bit CLA can also compute its own GG and PG terms.
- We can then take 4 × 16-bit CLAs to build a 64-bit CLA, and so on…
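A Python sketch of a two-level (16-bit) CLA built from group terms, assuming 4-bit blocks with $G_i = A_i B_i$ and $P_i = A_i + B_i$; all function names are hypothetical. The same `lookahead4` function is used at both levels, which is the point of the hierarchy: the top unit consumes the blocks' (GG, PG) pairs exactly as a 4-bit unit consumes (G_i, P_i).

```python
def gp4(g: list[int], p: list[int]) -> tuple[int, int]:
    """Group (GG, PG) of four (G, P) pairs, index 0 = least significant."""
    gg = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    pg = p[3] & p[2] & p[1] & p[0]
    return gg, pg


def lookahead4(g: list[int], p: list[int], c0: int) -> list[int]:
    """Carries [C0, C1, C2, C3, C4] from four (G, P) pairs (flattened form)."""
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c0, c1, c2, c3, c4]


def cla16_add(a: int, b: int, c0: int = 0) -> tuple[int, int]:
    g = [(a >> i) & (b >> i) & 1 for i in range(16)]
    p = [((a >> i) | (b >> i)) & 1 for i in range(16)]
    # Level 1: each 4-bit block reports its group terms (GG, PG).
    groups = [gp4(g[4 * k:4 * k + 4], p[4 * k:4 * k + 4]) for k in range(4)]
    # Level 2: the top look-ahead unit returns [C0, C4, C8, C12, C16].
    block_c = lookahead4([gg for gg, _ in groups], [pg for _, pg in groups], c0)
    s = 0
    for k in range(4):
        # Back inside block k: carries from the block's own carry-in.
        c = lookahead4(g[4 * k:4 * k + 4], p[4 * k:4 * k + 4], block_c[k])
        for j in range(4):
            i = 4 * k + j
            s |= ((((a ^ b) >> i) & 1) ^ c[j]) << i   # sum-only full adder
    return s, block_c[4]
```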
42
4-Bit CLA Alternative Schematic
(Figure: four Generate/Propagate/Sum Logic blocks for bits 3..0, each sending G_i, P_i to the Carry Look-Ahead Unit and receiving its carry-in C_i back; the unit also produces C4.)
Same circuit as the 4-bit CLA on the preceding slides, but drawn differently. We use this as the building block for the 16-bit CLA.
43
16-bit Carry Look-Ahead Adder
(Figure: four 4-bit CLA adders handle bits 3..0, 7..4, 11..8 and 15..12 and produce S3..0, S7..4, S11..8 and S15..12. Each passes its group terms GG, PG to a Carry Look-Ahead Unit as (G0, P0), (G4, P4), (G8, P8) and (G12, P12); from C0, the unit returns the block carries C4, C8, C12 and produces C16.)
44
16-bit Carry Look-Ahead Adder
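The Carry Look-Ahead logic is the same, applied to the group terms (G0, P0 are the GG, PG of bits 3..0; G4, P4 those of bits 7..4; and so on):
$C_4 = G_0 + P_0 C_0$
$C_8 = G_4 + P_4 C_4$
$C_{12} = G_8 + P_8 C_8$
$C_{16} = G_{12} + P_{12} C_{12}$
Perform forward substitution:
$C_4 = G_0 + P_0 C_0$
$C_8 = G_4 + P_4 G_0 + P_4 P_0 C_0$
$C_{12} = G_8 + P_8 G_4 + P_8 P_4 G_0 + P_8 P_4 P_0 C_0$
$C_{16} = G_{12} + P_{12} G_8 + P_{12} P_8 G_4 + P_{12} P_8 P_4 G_0 + P_{12} P_8 P_4 P_0 C_0$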
45
16-bit CLA Critical Path Delay
(Figure: the 16-bit CLA from the previous slide, annotated step by step on the following slides.)
46
16-bit CLA Critical Path Delay
(Figure: the 16-bit CLA.) Inputs arrive at time 0. The GG terms of the 4-bit CLAs are ready after 3 tPD: 1 tPD for G_i, P_i plus 2 tPD for the AND-OR network.
47
16-bit CLA Critical Path Delay
(Figure: the 16-bit CLA.) The carries C4, C8, C12 from the Carry Look-Ahead Unit are ready after another 2 tPD, i.e., at 5 tPD.
48
16-bit CLA Critical Path Delay
(Figure: the 16-bit CLA with one 4-bit CLA block expanded to show its Generate/Propagate blocks, Carry Look-Ahead Unit and sum-only full adders. The block's carry-in from the top-level Carry Look-Ahead Unit arrives at 5 tPD.)
49
16-bit CLA Critical Path Delay
(Figure: the same expanded view.) The carries inside each 4-bit Carry Look-Ahead Unit (CLAU) are ready after another 2 tPD, i.e., at 7 tPD.
50
16-bit CLA Critical Path Delay
(Figure: the same expanded view.) The sum bits are ready after another tPD, i.e., at 8 tPD.
51
16-bit CLA Critical Path Delay
(Figure: the 16-bit CLA.) Overall critical path delay is 8 tPD.
52
64-bit Carry Look-Ahead Adder
(Figure: four 16-bit CLA adders for bits 15..0, 31..16, 47..32 and 63..48, each passing its GG, PG to a Carry Look-Ahead Unit as (G0, P0), (G16, P16), (G32, P32) and (G48, P48); the unit returns C16, C32, C48 and produces C64.)
Exercise: What is the critical path delay of this 64-bit CLA?
53
CLA Critical Path Delay
Delay for an N-bit Carry Look-Ahead Adder is $4 \log_4 N \cdot t_{PD}$.
(Figure: a hierarchy of 4-bit CLAUs under 16-bit Carry Look-Ahead Units under a 64-bit Carry Look-Ahead Unit. Group G_i, P_i terms propagate down, group carry terms propagate up, and finally the carries go into the full adders.)
- Delay increases logarithmically with N.
- It is proportional to the number of hierarchical levels.
54
CPA vs CLA Critical Path Delay
(Plot: critical path delay, in gate propagation delays (axis up to 140), versus adder size in bits. The Carry Propagate Adder grows as O(N); the Carry Look-Ahead Adder grows as O(log N).)
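The two curves follow from the delay formulas derived earlier: $(2N + 1)\,t_{PD}$ for the carry propagate adder and $4 \log_4 N \cdot t_{PD}$ for the hierarchical CLA. A small Python sketch tabulating them, assuming unit gate delays and (for the CLA) N a power of 4; the function names are ours.

```python
def cpa_delay(n: int) -> int:
    """Ripple-carry adder critical path in units of tPD.

    The first full adder needs 3 tPD for its carry; each of the other
    n - 1 stages adds 2 tPD (one AND plus one OR on the carry path).
    """
    return 3 + 2 * (n - 1)


def cla_delay(n: int) -> int:
    """Hierarchical CLA critical path in units of tPD (4 per level)."""
    levels, size = 0, 1
    while size < n:
        size *= 4
        levels += 1
    if size != n:
        raise ValueError("n must be a power of 4")
    return 4 * levels


for n in (4, 16, 64):
    print(f"{n:2d}-bit: CPA {cpa_delay(n):3d} tPD, CLA {cla_delay(n):2d} tPD")
```

For 64 bits this gives 129 tPD for the ripple adder versus 12 tPD for the CLA.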
55
Carry Save Adder (CSA): Adding Multiple Operands
56
Motivation
How do you add more than two numbers?
- There are many applications where you need to add more than two numbers at a time.
- Multiplication is a good example, which we will see soon…
- Say we want to add M N-bit numbers…
57
One Approach … CLA
Add the first two numbers. Add the next number to the previous sum. Repeat.
- Need M − 1 adders.
- Assuming CLA adders, the delay is $(M - 1)\,t_{CLA} = O(M \log N)$.
58
Another Approach…
Use a full tree topology: add pairs of numbers in parallel, add pairs of partial sums in parallel, and repeat for each level. (Figure: a binary tree of CLAs.)
- Still need M − 1 adders.
- Assuming CLA adders, the delay is $\lceil \log_2 M \rceil \, t_{CLA} = O(\log M \log N)$.
59
Better Approach: Carry Save Adder
Defer the addition of carry bits until later.
Basic idea:
- Add 3 operands at a time (A, B, D).
- Produce 2 output numbers, partial sum bits (P) and carry bits (C), such that $A + B + D = P + 2C$. Do this with full adders.
- Add the partial sum and the carries with a fast adder. Do this with a Carry Look-Ahead Adder.
Example: A = 1001 (9), B = 1100 (12), D = 1101 (13) give P = 1000 (8) and C = 1101 (13). Adding P to C shifted left by one (11010) gives S = 100010 (34) = P + 2C.
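Because every bit position is an independent full adder, a carry-save adder can be modelled with whole-word bitwise operations. A Python sketch (the name `carry_save_add` is ours) reproducing the example above:

```python
def carry_save_add(a: int, b: int, d: int) -> tuple[int, int]:
    """3:2 compression: one full adder per bit, no carry chain."""
    p = a ^ b ^ d                     # partial-sum bit of every full adder
    c = (a & b) | (a & d) | (b & d)   # carry bit of every full adder
    return p, c


p, c = carry_save_add(9, 12, 13)
print(p, c, p + 2 * c)   # 8 13 34
```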
60
Add 3 Numbers – Produce 2 Outputs
Remember, full adders already add 3 operands and produce 2 outputs.
Use full adders but don't connect the carry chain: the full adder for bit i takes A_i, B_i, D_i and produces P_i and C_i, for i = 0 … N−1.
61
Critical Path Delay
The critical path delay is just the full adder delay. (Figure: N unconnected full adders on A_i, B_i, D_i producing P_i, C_i.)
All carry bits are ready after 3 tPD, independent of N.
62
Adding More than 3 Operands
Use the 3-input CSA as a building block: an N-bit CSA takes three N-bit inputs A, B, D and produces N-bit outputs C and P.
The delay of a CSA adder is INDEPENDENT of N: O(1).
63
Adding 4 Operands
(Figure: an N-bit CSA reduces A, B, D to two numbers; a second N-bit CSA combines them with E; a CLA adds the final two numbers to give S.)
We need a standard adder such as a CLA at the end to add the two remaining numbers into one.
64
Adding 6 Operands
(Figure: a tree of N-bit CSAs reduces the six operands A, B, D, E, F, G to two numbers, which a CLA adds to give S.)
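A Python sketch of reducing many operands with CSAs, assuming non-negative Python integers as words (no width limit); the helper names are hypothetical. Each pass of the outer loop is one level of CSAs working in parallel, and the final `sum` of the last two numbers stands in for the carry propagate or look-ahead adder.

```python
def carry_save_add(a: int, b: int, d: int) -> tuple[int, int]:
    """One CSA: full adders with no carry chain, so a + b + d == p + 2c."""
    p = a ^ b ^ d
    c = (a & b) | (a & d) | (b & d)
    return p, c


def add_many(operands: list[int]) -> int:
    """Sum many non-negative integers with CSA levels and one final adder."""
    ops = list(operands)
    while len(ops) > 2:
        next_level = []
        while len(ops) >= 3:                  # every CSA in a level works in parallel
            p, c = carry_save_add(ops.pop(), ops.pop(), ops.pop())
            next_level += [p, c << 1]         # the 2C part is a 1-bit shift (wiring)
        ops = next_level + ops                # 0 to 2 leftover operands move up a level
    return sum(ops)                           # final carry propagate / look-ahead adder
```

For example, `add_many([9, 12, 13, 7, 100, 3])` returns 144.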
65
What about the 2C part?
Recall that a CSA produces P and C with $A + B + D = P + 2C$. Multiplying by 2 can be done with bit-shifting, which is implemented through wiring alone: carry bit C_i is simply routed to bit position i + 1 of the next stage.
(Figure: a CSA of full adders on A3..A0, B3..B0, D3..D0; its P and shifted C outputs feed a second CSA together with E; a CLA produces the final sum S.)
66
What about the 2C part?
The CSA adder width increases by 1 bit per level to prevent overflow. No problem: CSA delay does not scale with width!
(Figure: the same circuit labelled with widths: a 4-bit CSA, then a 5-bit CSA taking E, then a 5-bit CLA producing S.)
67
Multiplication
68
Multiplication
Multiplying 1-bit numbers is an AND operation:
0 × 0 = 0
0 × 1 = 0
1 × 0 = 0
1 × 1 = 1
(Figure: an AND gate with inputs A, B and output A × B.)
69
Multiplication
Multiplying 1-bit × N-bit is also an AND operation: each bit of A is ANDed with B. (Figure: four AND gates compute P3..P0 from A3..A0 and B.)
Examples: 1011 × 0 = 0000; 1011 × 1 = 1011.
70
N-Bit x N-Bit (Unsigned) Multiplication
Consider 4-bit × 4-bit multiplication: A3 A2 A1 A0 × B3 B2 B1 B0.
71
N-Bit x N-Bit (Unsigned) Multiplication
Consider 4-bit × 4-bit multiplication: A3 A2 A1 A0 × B3 B2 B1 B0.
Row 0 (A ANDed with B0): A3·B0 A2·B0 A1·B0 A0·B0
72
N-Bit x N-Bit (Unsigned) Multiplication
Consider 4-bit × 4-bit multiplication: A3 A2 A1 A0 × B3 B2 B1 B0.
Row 0: A3·B0 A2·B0 A1·B0 A0·B0
Row 1, shifted left by one position: A3·B1 A2·B1 A1·B1 A0·B1
73
N-Bit x N-Bit (Unsigned) Multiplication
Consider 4-bit × 4-bit multiplication: A3 A2 A1 A0 × B3 B2 B1 B0.
Rows 0 and 1 as before; row 2, shifted left by two positions: A3·B2 A2·B2 A1·B2 A0·B2
74
N-Bit x N-Bit (Unsigned) Multiplication
Consider 4-bit × 4-bit multiplication: A3 A2 A1 A0 × B3 B2 B1 B0.
Rows 0 to 2 as before; row 3, shifted left by three positions: A3·B3 A2·B3 A1·B3 A0·B3
75
N-Bit x N-Bit (Unsigned) Multiplication
Consider 4-bit × 4-bit multiplication: A3 A2 A1 A0 × B3 B2 B1 B0.
Adding the four rows (row j is A ANDed with B_j, shifted left by j positions) gives the product P7 P6 P5 P4 P3 P2 P1 P0:
$P = \sum_{j=0}^{3} (A \cdot B_j)\, 2^{j}$
76
N-Bit x N-Bit (Unsigned) Multiplication
Consider 4-bit × 4-bit multiplication: adding the four shifted rows A·B0, A·B1, A·B2, A·B3 gives the product P7..P0.
Sounds familiar, right? This is multiplication by hand. How do we express this multiplication in hardware?
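The same array of AND gates and shifted rows can be written as a short Python sketch, assuming unsigned n-bit inputs (the name `array_multiply` is ours). Row j is A ANDed with B_j, shifted left by j positions:

```python
def array_multiply(a: int, b: int, n: int = 4) -> int:
    """Unsigned n x n multiply by summing n shifted rows of AND gates."""
    product = 0
    for j in range(n):                      # one row per multiplier bit B_j
        bj = (b >> j) & 1
        row = 0
        for i in range(n):                  # row j holds A_i AND B_j
            row |= (((a >> i) & 1) & bj) << i
        product += row << j                 # shift row j left by j positions
    return product                          # fits in 2n bits: P_{2n-1} .. P_0
```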
77
Hardware Implementation: N-Bit x N-Bit (Unsigned) Multiplication
78
Hardware Implementation: N-Bit x N-Bit (Unsigned) Multiplication
79
Hardware Implementation: N-Bit x N-Bit (Unsigned) Multiplication
80
Hardware Implementation: N-Bit x N-Bit (Unsigned) Multiplication
Do you see any problem in this circuit?
81
Using Fast Adders…
(Figure: the partial-product rows A·B0, A·B1, A·B2 and A·B3 are summed by a chain of fast adders (with half adders, HA, at the edges), producing P7..P0.)
82
Large Multipliers [aka Decomposed Multipliers]
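Construct large multipliers out of smaller multipliers: build a 2N-bit × 2N-bit multiplier using N-bit × N-bit multipliers.
Let A and B be 2N-bit numbers such that:
$A = \underbrace{A_{2N-1} A_{2N-2} \ldots A_N}_{A_H}\, \underbrace{A_{N-1} A_{N-2} \ldots A_0}_{A_L}$
$B = \underbrace{B_{2N-1} B_{2N-2} \ldots B_N}_{B_H}\, \underbrace{B_{N-1} B_{N-2} \ldots B_0}_{B_L}$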
83
2N-Bit x 2N-Bit Multiplier
A_H is the upper N bits of A and A_L is the lower N bits of A; B_H is the upper N bits of B and B_L is the lower N bits of B.
Therefore $A = A_H 2^N + A_L$ and $B = B_H 2^N + B_L$.
And so:
$A \cdot B = (A_H 2^N + A_L)(B_H 2^N + B_L) = A_H B_H 2^{2N} + (A_H B_L + A_L B_H) 2^N + A_L B_L$
84
2N-Bit x 2N-Bit Multiplier
$A \cdot B = A_H B_H 2^{2N} + (A_H B_L + A_L B_H) 2^N + A_L B_L$
The four products A_H B_H, A_H B_L, A_L B_H and A_L B_L are computed with N-bit × N-bit multipliers.
85
2N-Bit x 2N-Bit Multiplier
$A \cdot B = A_H B_H 2^{2N} + (A_H B_L + A_L B_H) 2^N + A_L B_L$
The factors $2^{2N}$ and $2^N$ are implemented by bit shifting.
86
2N-Bit x 2N-Bit Multiplier
$A \cdot B = A_H B_H 2^{2N} + (A_H B_L + A_L B_H) 2^N + A_L B_L$
The sums of the partial products are formed with 2N-bit adders.
87
2N-Bit x 2N-Bit Multiplier
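(Figure: A = {A_H, A_L} and B = {B_H, B_L}; the products A_L B_L, A_H B_L, A_L B_H and A_H B_H are aligned at offsets 0, N, N and 2N and summed into the 4N-bit result.)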
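A Python sketch of the decomposition $A \cdot B = A_H B_H 2^{2N} + (A_H B_L + A_L B_H) 2^N + A_L B_L$. Here `mul_nxn` is a hypothetical stand-in for an N-bit × N-bit hardware multiplier block (it simply uses Python's `*`); shifts and additions combine the four partial products.

```python
def mul_nxn(x: int, y: int, n: int) -> int:
    """Stand-in for an N-bit x N-bit hardware multiplier block."""
    assert 0 <= x < (1 << n) and 0 <= y < (1 << n)
    return x * y


def mul_2nx2n(a: int, b: int, n: int) -> int:
    """2N-bit x 2N-bit product from four N-bit x N-bit products."""
    mask = (1 << n) - 1
    a_h, a_l = a >> n, a & mask        # upper and lower N bits of A
    b_h, b_l = b >> n, b & mask        # upper and lower N bits of B
    high = mul_nxn(a_h, b_h, n) << (2 * n)                        # A_H B_H 2^(2N)
    middle = (mul_nxn(a_h, b_l, n) + mul_nxn(a_l, b_h, n)) << n   # (A_H B_L + A_L B_H) 2^N
    low = mul_nxn(a_l, b_l, n)                                     # A_L B_L
    return high + middle + low         # the 4N-bit result
```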
88
Fmax For A Combinational Multiplier
In Verilog: P = A * B; (make sure P has twice the number of bits of A and B).
The synthesis tool will create a combinational multiplier if DSP block inference is turned off. Measured on our Cyclone FPGA:
- 8 bits × 8 bits: Fmax = 464 MHz
- 16 bits × 16 bits: Fmax = 396 MHz
- 32 bits × 32 bits: Fmax = 104 MHz
- 64 bits × 64 bits: Fmax = 66 MHz
89
Serial (Multi-cycle) Multiplier
90
Multi Cycle Multiplier
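How do you multiply by hand?
For A = 1101 and B = 1011, write one copy of A, shifted left by i positions, for every 1 bit B_i of B (a row of zeros for every 0 bit), then add the rows: 1101 + 11010 + 000000 + 1101000 = 10001111 = P.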
91
One Possible Algorithm
P = 0
while B != 0:
    if B(0) == 1:
        P = P + A
    A = A << 1
    B = B >> 1
Note: This is NOT Verilog.
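The pseudo-code maps directly onto Python, where each pass through the loop corresponds to one clock cycle of the multi-cycle multiplier. A minimal sketch for unsigned operands (the function name is ours):

```python
def shift_add_multiply(a: int, b: int) -> int:
    """Multi-cycle (shift-and-add) unsigned multiply; one loop pass = one cycle."""
    p = 0
    while b != 0:
        if b & 1:          # B(0) == 1
            p = p + a
        a = a << 1
        b = b >> 1
    return p
```

For example, `shift_add_multiply(0b1101, 0b1011)` returns 143. The loop runs once per bit of B up to its highest 1 bit, which is why the latency is up to N cycles.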
92
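Example: A = 1101 (13), B = 1011 (11). Each iteration records the old product (Pold) and the new product P:
- Iteration 1: B(0) = 1, so P = 0 + 1101 = 1101 (13). Then A = 11010, B = 101.
- Iteration 2: B(0) = 1, so P = 1101 + 11010 = 100111 (39). Then A = 110100, B = 10.
- Iteration 3: B(0) = 0, so P stays 100111 (39). Then A = 1101000, B = 1.
- Iteration 4: B(0) = 1, so P = 100111 + 1101000 = 10001111 (143). Then A = 11010000, B = 0.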
99
Continue until B is ZERO
100
Final Result
When B reaches zero, P = 10001111 = 143 = 13 × 11.
101
Top-Level Schematic
(Figure: a Controller (state machine) and a Datapath. The controller receives s, b0 and z and drives loadA, shiftA, loadB, shiftB, loadP, selP and done; the datapath takes the n-bit inputs A and B and produces the 2n-bit product P.)
When s goes high, n-bit values are available on A and B. The machine then multiplies, and when it is finished, asserts done and puts the result on P. It then waits for s to go low.
102
Datapath
(Figure: a 2n-bit shift-left Register A (loadA, shiftA) loaded from the n-bit A input; an n-bit shift-right Register B (loadB, shiftB) loaded from DataB, providing b0 = B(0) and the zero flag z; a 2n-bit adder computing P + A; a multiplexer controlled by selP that chooses between 0 and the adder output; and a 2n-bit Register P (loadP) that drives the P output.)
The datapath implements:
P = 0
while B != 0:
    if B(0) == 1:
        P = P + A
    A = A << 1
    B = B >> 1
103
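State Machine
(Figure: state diagram with transitions on reset, s and z. State outputs shown: loadP = 1, selP = 0 (clear P); the multiply step, repeated while z = 0: shiftA = 1, shiftB = 1, selP = 1, and loadP = 1 if b0 else loadP = 0; when z = 1: done = 1 until s returns to 0.)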
104
Fmax For A Multi-Cycle Multiplier
Measured on our Cyclone FPGA:
- Combinational multiplier Fmax (from the earlier slide): 8 bits × 8 bits: 464 MHz; 16 bits × 16 bits: 396 MHz; 32 bits × 32 bits: 104 MHz; 64 bits × 64 bits: 66 MHz.
- Multi-cycle multiplier Fmax: 64 bits × 64 bits: 400 MHz.
Faster Fmax, but there is now LATENCY: it takes up to N clock cycles to compute the product.
105
Division
106
Fmax For A Combinational Divider
In Verilog you can infer a combinational divider: Q = A / B;
Measured on our Cyclone FPGA:
- 8 bits / 8 bits: Fmax = 79 MHz
- 16 bits / 16 bits: Fmax = 25 MHz
- 32 bits / 32 bits: Fmax = 9 MHz
- 64 bits / 64 bits: Fmax = 3 MHz (uses 13% of the logic resources!)
Too slow and too large: in practice, a multi-cycle divider is almost always used.
107
Review: Pencil and Paper Division
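Example: 140 ÷ 9.
Decimal: 9 goes into 14 once (14 − 9 = 5); bring down the 0 to get 50; 9 goes into 50 five times (50 − 45 = 5). So Q = 15, R = 5.
Binary: A = 10001100 (140), B = 1001 (9) give Q = 1111 (15) and R = 101 (5).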
108
Restoring Division Algorithm
Dividend A (N bits), Divisor B (N − 1 bits), Quotient Q (N bits), Remainder R (N − 1 bits).
S is an internal variable that is 2N bits wide.
S = A << 1
repeat N times:
    S[2N-1..N] = S[2N-1..N] - B
    if S < 0:
        S[2N-1..N] = S[2N-1..N] + B
        S = S << 1
        S[0] = 0
    else:
        S = S << 1
        S[0] = 1
Q = S[N-1..0]
R = S[2N-1..N+1]
At the end, S[N] is not used.
Note: This is NOT Verilog.
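A Python sketch of the restoring algorithm, assuming unsigned operands with an N-bit dividend and a non-zero (N − 1)-bit divisor. Python's signed integers stand in for the hardware's sign-bit test on S[2N-1..N]; the function name is ours.

```python
def restoring_divide(a: int, b: int, n: int = 8) -> tuple[int, int]:
    """Restoring division of an n-bit dividend by an (n-1)-bit divisor.

    S is 2n bits wide: its upper half S[2n-1..n] holds the partial
    remainder, and quotient bits are shifted in at S[0].
    """
    if not (0 <= a < (1 << n) and 0 < b < (1 << (n - 1))):
        raise ValueError("need an n-bit dividend and a non-zero (n-1)-bit divisor")
    low_mask = (1 << n) - 1
    s_mask = (1 << (2 * n)) - 1
    s = (a << 1) & s_mask                      # S = A << 1
    for _ in range(n):                         # repeat N times
        hi = (s >> n) - b                      # S[2n-1..n] = S[2n-1..n] - B
        if hi < 0:                             # S < 0 (hardware: sign bit)
            hi += b                            # restore: S[2n-1..n] += B
            s = ((hi << n) | (s & low_mask)) << 1          # S = S << 1, S[0] = 0
        else:
            s = (((hi << n) | (s & low_mask)) << 1) | 1    # S = S << 1, S[0] = 1
        s &= s_mask
    quotient = s & low_mask                    # Q = S[n-1..0]
    remainder = s >> (n + 1)                   # R = S[2n-1..n+1]; S[n] is unused
    return quotient, remainder


print(restoring_divide(140, 9))   # (15, 5)
```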
109
Flow Chart
1. Load the divisor into the 8-bit Divisor register and the dividend into the 16-bit Remainder register. Shift the Remainder register left by 1 bit.
2. Subtract the Divisor register from the left half of the Remainder register, and place the result in the left half of the Remainder register.
3. Test the remainder:
   3a. Remainder ≥ 0: shift the Remainder register one bit to the left, setting the new rightmost bit to 1.
   3b. Remainder < 0: restore the original value by adding the Divisor register to the left half of the Remainder register and placing the sum in the left half of the Remainder register. Also shift the Remainder register one bit to the left, setting the new rightmost bit to 0.
4. 8th repetition? No: go back to step 2. Yes: done. The quotient is in bits 7:0 of the Remainder register and the remainder is in bits 15:9 (note that bit 8 is ignored).
The 16-bit Remainder register referred to in this flowchart is S from the pseudo-code.
110
Example
A = 140 (10001100), B = 9 (1001); expected Q = 15 (1111), R = 5 (101).
Initialization: S = A << 1 = 00000001 00011000, so S[15..8] = 00000001.
111
Example (Iteration 1)
Subtract: S[15..8] = 00000001 − 00001001 = 11111000 (−8). S < 0.
112
Example (Iteration 1)
Restore: S[15..8] = 11111000 + 00001001 = 00000001.
113
Example (Iteration 1)
Shift: S = S << 1 with S[0] = 0, giving S = 00000010 00110000.
114
Example (Iteration 2)
Subtract: S[15..8] = 00000010 − 00001001 = 11111001 (−7). S < 0.
115
Example (Iteration 2)
Restore: S[15..8] = 11111001 + 00001001 = 00000010.
116
Example (Iteration 2)
Shift: S = S << 1 with S[0] = 0, giving S = 00000100 01100000.
117
Example (Iteration 3)
Same as iterations 1 and 2 (4 − 9 < 0: restore, then shift in 0). Showing the outcome only: S = 00001000 11000000.
118
Example (Iteration 4)
Same as iterations 1 and 2 (8 − 9 < 0: restore, then shift in 0). Showing the outcome only: S = 00010001 10000000.
119
Example (Iteration 5)
Subtract: S[15..8] = 00010001 − 00001001 = 00001000 (8). S ≥ 0, so no restore is needed.
120
Example (Iteration 5)
Shift: S = S << 1 with S[0] = 1, giving S = 00010001 00000001.
121
Example (Iteration 6)
Subtract: S[15..8] = 00010001 − 00001001 = 00001000 (8). S ≥ 0.
122
Example (Iteration 6)
Shift: S = S << 1 with S[0] = 1, giving S = 00010000 00000011.
123
Example (Iteration 7)
Subtract: S[15..8] = 00010000 − 00001001 = 00000111 (7). S ≥ 0.
124
Example (Iteration 7)
Shift: S = S << 1 with S[0] = 1, giving S = 00001110 00000111.
125
Example (Iteration 8)
Subtract: S[15..8] = 00001110 − 00001001 = 00000101 (5). S ≥ 0.
126
Example (Iteration 8)
Shift: S = S << 1 with S[0] = 1, giving S = 00001010 00001111.
127
Example (Quotient)
Q = S[7..0] = 00001111 = 15.
128
Example (Remainder)
R = S[15..9] = 0000101 = 5 (note: bit 8 is not used).
129
Overall
(Figure: the whole division at a glance. After loading A = 10001100 and B = 1001, each of the 8 iterations shifts left and subtracts; the quotient bits accumulate in the A/Q part of the register and the remainder in the R part.)
130
Datapath and Controller
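(Figure: the controller (state machine) receives start and the datapath's sign signal and drives load, sel, shift, inbit and add. The datapath takes the n-bit divisor and dividend and produces the n-bit quotient and remainder.)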
131
Datapath – 8-bit Version
Data signals: Dividend A, Divisor B, Quotient Q = S[7..0], Remainder R = S[15..9].
Control signals (in): CLK, LOAD, ADD, SEL, SHIFT, INBIT. Control signals (out): SIGN. Operator &: concatenation.
(Figure: an 8-bit register (LOAD, CLK) holds the divisor B. An add/subtract unit (ADD) combines S[15..8] with B; its bit 7 is SIGN. A 16-bit 3:1 multiplexer (SEL = 01, 10, 11) chooses between the add/subtract result concatenated with S[7..0], the dividend A, and S itself; its output passes through a combinational left shifter (SHIFT, INBIT) into the 16-bit register S.)
132
State Machine
(Figure: controller state machine; dc = don't care.)
- From any state, start = 1: load = 1, sel = 10, shift = 1, inbit = 0, add = dc. Implements S = A << 1.
- Subtract: load = 0, sel = 01, shift = 0, inbit = dc, add = 0. Implements S[15..8] = S[15..8] − B.
- sign = 1 (S < 0): load = 0, sel = 01, shift = 1, inbit = 0, add = 1. Implements S[15..8] = S[15..8] + B, then S = S << 1 with S[0] = 0.
- sign = 0: load = 0, sel = 11, shift = 1, inbit = 1, add = dc. Implements S = S << 1 with S[0] = 1.
At the end, Q = S[7..0] and R = S[15..9].
Each iteration takes 2 clock cycles, so the divider needs 2N clock cycles. A counter that counts to 2N can be used for loop control.
133
Signed Arithmetic
134
Review: Binary Numbers
- Unsigned numbers: all bits b_{n−1} … b_1 b_0 represent the magnitude of a non-negative integer; b_{n−1} is the MSB.
- Signed numbers: the left-most bit (MSB) b_{n−1} represents the sign of the number (0 denotes +, 1 denotes −), and the bits b_{n−2} … b_0 represent the magnitude.
135
Review: Negative Numbers Representation
Negative numbers can be represented in the following ways:
- Sign and magnitude: +5 = 0101 and −5 = 1101
- 1’s complement
- 2’s complement
136
Review: 1’s Complement
Let K be the negative equivalent of an n-bit positive number P. Then, in 1’s complement representation, K is obtained by subtracting P from $2^n - 1$, namely $K = (2^n - 1) - P$. This means that K can be obtained by inverting all bits of P.
137
Review: 2’s Complement
Let K be the negative equivalent of an n-bit positive number P. Then, in 2’s complement representation, K is obtained by subtracting P from $2^n$, namely $K = 2^n - P$. The 2’s complement can be computed by inverting all bits of P and then adding 1 to the resulting 1’s complement number.
138
Example: Interpretation of four-bit signed integers
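The interpretation table can be regenerated with a small Python sketch (helper names are ours) that decodes each 4-bit pattern under sign-and-magnitude, 1’s complement and 2’s complement. Python integers have no negative zero, so the patterns 1000 (sign-and-magnitude) and 1111 (1’s complement), which represent −0, print as +0.

```python
N = 4


def sign_magnitude(bits: int) -> int:
    magnitude = bits & ((1 << (N - 1)) - 1)                      # lower N-1 bits
    return -magnitude if bits >> (N - 1) else magnitude


def ones_complement(bits: int) -> int:
    return bits - ((1 << N) - 1) if bits >> (N - 1) else bits   # K = (2^n - 1) - P


def twos_complement(bits: int) -> int:
    return bits - (1 << N) if bits >> (N - 1) else bits          # K = 2^n - P


print("b3b2b1b0  sign-mag  1's  2's")
for bits in range(2 ** N - 1, -1, -1):
    print(f"  {bits:04b}      {sign_magnitude(bits):+3d}   "
          f"{ones_complement(bits):+3d}  {twos_complement(bits):+3d}")
```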
139
Suitability of Different Number Representations
To assess the suitability of different number representations, it is necessary to investigate their use in arithmetic operations, particularly in addition and subtraction Addition of positive numbers is the same for all three number representations. But there are significant differences when negative numbers are involved.
140
1’s Complement Addition
Examples (4 bits):
- (+5) + (+2): 0101 + 0010 = 0111 (+7)
- (−5) + (+2): 1010 + 0010 = 1100 (−3)
- (+5) + (−2): 0101 + 1101 = 1 0010; add the carry back in: 0010 + 1 = 0011 (+3)
- (−5) + (−2): 1010 + 1101 = 1 0111; add the carry back in: 0111 + 1 = 1000 (−7)
The conclusion from these examples is that the addition of 1’s complement numbers may or may not be simple. In some cases a correction is needed (adding the carry-out back in at the LSB), which amounts to an extra addition that must be performed. Consequently, the time needed to add two 1’s complement numbers may be twice as long as the time needed to add two unsigned numbers.
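A Python sketch of 4-bit 1’s complement addition with the carry-out correction (helper names are ours). The second addition, `+ 1`, is the extra step that can double the addition time:

```python
N = 4
MASK = (1 << N) - 1


def ones_encode(v: int) -> int:
    """1's complement pattern of a small integer: negatives are (2**N - 1) - |v|."""
    return v if v >= 0 else MASK + v


def ones_decode(bits: int) -> int:
    return bits - MASK if bits >> (N - 1) else bits


def ones_add(x: int, y: int) -> int:
    total = x + y                        # first addition
    if total >> N:                       # carry out of the MSB ...
        total = (total & MASK) + 1       # ... is added back in at the LSB (second addition)
    return total & MASK


for a, b in [(5, 2), (-5, 2), (5, -2), (-5, -2)]:
    print(f"({a:+d}) + ({b:+d}) = {ones_decode(ones_add(ones_encode(a), ones_encode(b))):+d}")
```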
141
2’s Complement Addition
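Four 4-bit examples (any carry out of the MSB is ignored):
(+5) + (+2) = (+7):  0101 + 0010 = 0111
(−5) + (+2) = (−3):  1011 + 0010 = 1101
(+5) + (−2) = (+3):  0101 + 1110 = 1 0011, ignore the carry: 0011
(−5) + (−2) = (−7):  1011 + 1110 = 1 1001, ignore the carry: 1001
The addition process is the same, regardless of the signs of the operands.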
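Because the carry-out is simply discarded, 2’s complement addition is ordinary unsigned addition truncated to n bits. A sketch with hypothetical helper names, assuming results that fit in n bits (no overflow):

```python
def twos_complement_add(x: int, y: int, n: int = 4) -> int:
    """Add two n-bit 2's complement patterns; the carry out of the MSB is dropped."""
    return (x + y) & ((1 << n) - 1)


def to_signed(bits: int, n: int = 4) -> int:
    """Interpret an n-bit pattern as a 2's complement integer."""
    return bits - (1 << n) if (bits >> (n - 1)) & 1 else bits


print(to_signed(twos_complement_add(0b1011, 0b1110)))  # (-5) + (-2) -> -7
```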
142
2’s Complement Subtraction
Each subtraction is performed by adding the 2’s complement of the subtrahend (any carry out of the MSB is ignored):
(+5) − (+2) = (+3):  0101 + 1110 = 1 0011, ignore the carry: 0011
(−5) − (+2) = (−7):  1011 + 1110 = 1 1001, ignore the carry: 1001
(+5) − (−2) = (+7):  0101 + 0010 = 0111
(−5) − (−2) = (−3):  1011 + 0010 = 1101
The key conclusion of this section is that subtraction can be realized as addition, using the 2’s complement of the subtrahend, regardless of the signs of the two operands.
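In code, the subtractor is just the adder fed with the complemented subtrahend. A minimal sketch over 4-bit patterns; `twos_complement_sub` is a hypothetical name.

```python
def twos_complement_sub(x: int, y: int, n: int = 4) -> int:
    """Compute x - y on n-bit 2's complement patterns using only addition."""
    mask = (1 << n) - 1
    negated_y = (~y + 1) & mask        # 2's complement of the subtrahend: invert, add 1
    return (x + negated_y) & mask      # ordinary addition, carry-out ignored


print(format(twos_complement_sub(0b0101, 0b0010), "04b"))  # (+5) - (+2) -> 0011
```

In hardware the "+1" is usually supplied through the adder's carry-in, so one adder plus a row of inverters performs both operations.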
143
Signed Multiplication
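The multiplication algorithm we discussed previously works for both signed and unsigned integers. But for signed operands, first perform SIGN EXTENSION on the operands to twice as many bits, then keep only the least-significant half of the product.
4-bit example, 3 × (−5), with sign extension to 8-bit operands:
No sign extension: 0011 (3) × 1011 (−5), 8 LSBs of the product = 0010 0001 (33, NOT −15)
With sign extension: 0000 0011 (3) × 1111 1011 (−5), 8 LSBs of the product = 1111 0001 (−15)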
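The sign-extension trick can be checked exhaustively for 4-bit operands. A sketch, assuming the multiplier itself is an ordinary unsigned multiplier; `sign_extend` and `signed_multiply` are hypothetical names.

```python
def sign_extend(bits: int, n: int, width: int) -> int:
    """Copy the sign bit of an n-bit pattern into bits n..width-1."""
    if (bits >> (n - 1)) & 1:
        bits |= ((1 << width) - 1) ^ ((1 << n) - 1)
    return bits


def signed_multiply(x: int, y: int, n: int = 4) -> int:
    """Multiply two n-bit 2's complement patterns with an unsigned multiplier."""
    width = 2 * n
    product = sign_extend(x, n, width) * sign_extend(y, n, width)  # unsigned multiply
    return product & ((1 << width) - 1)                            # keep the 2n LSBs


print(format(signed_multiply(0b0011, 0b1011), "08b"))   # 11110001 (-15)
print(format((0b0011 * 0b1011) & 0xFF, "08b"))          # 00100001 (33) without extension
```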
144
Signed Division To handle signed binary division, we first convert both the dividend and the divisor to positive numbers, perform the division, and then correct the signs of the results as needed. We adopt the convention that the remainder has the same sign as the dividend: if the dividend is positive, the remainder is positive; if the dividend is negative, the remainder is negative. The quotient is positive if the divisor and the dividend have the same sign; otherwise, it is negative. Here are some examples that illustrate these conventions:
0111 / 0011 = 0010 R 0001   (7 / 3 = 2, remainder 1)
0111 / 1101 = 1110 R 0001   (7 / −3 = −2, remainder 1)
1001 / 0011 = 1110 R 1111   (−7 / 3 = −2, remainder −1)
1001 / 1101 = 0010 R 1111   (−7 / −3 = 2, remainder −1)
145
Signed Division To summarize: if the dividend is negative, the 2's complement must be applied to the remainder at the end. If the dividend and the divisor have different signs, the quotient must be negated with a 2's complement operation at the end.
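The sign-correction rules can be sketched on ordinary integers. This is a minimal sketch, assuming Python's `divmod` stands in for the unsigned divider; `signed_divide` is a hypothetical name. The resulting convention (quotient truncated toward zero, remainder with the dividend's sign) is the same one C99 uses for `/` and `%`.

```python
def signed_divide(dividend: int, divisor: int) -> tuple[int, int]:
    """Divide magnitudes, then fix signs: remainder follows the dividend,
    quotient is negative when the operand signs differ."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    q, r = divmod(abs(dividend), abs(divisor))  # unsigned division of the magnitudes
    if dividend < 0:
        r = -r                                  # 2's complement the remainder
    if (dividend < 0) != (divisor < 0):
        q = -q                                  # 2's complement the quotient
    return q, r


print(signed_divide(-7, 3))  # (-2, -1)
```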
146
Fractional Numbers
147
Dealing with Fractional Values
So far, we have been working with integer values. There are two ways to represent non-integer numbers in binary:
- Fixed-Point Representation: smaller hardware, simpler to implement, constant resolution
- Floating-Point Representation: larger range of values, more accurate at extremes
148
Fixed-point Representation
149
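Fractional Numbers
In decimal (base 10): $7241.0381 = 7\times10^{3} + 2\times10^{2} + 4\times10^{1} + 1\times10^{0} + 0\times10^{-1} + 3\times10^{-2} + 8\times10^{-3} + 1\times10^{-4}$
In binary (base 2), the same positional rule uses powers of 2: $1\times2^{3} + 0\times2^{2} + 0\times2^{1} + 1\times2^{0} + 0\times2^{-1} + \cdots$ down to the $2^{-4}$ position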
150
Fixed Point Representation
Uses N bits to represent a number. The location of the radix point is FIXED, and it determines the range and precision.
151
More Formally: Qm.n Format
The Qm.n format of an N-bit number sets m bits to the left and n bits to the right of the binary point. For example, a Q15.1 number has 15 integer bits and 1 fractional bit. A Q1.14 number has 1 integer bit and 14 fractional bits. Q format is often used in hardware that does not have a floating-point unit and in applications that require constant resolution.
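Storing a real value in Qm.n amounts to scaling it by 2^n and keeping the integer. A sketch, assuming an unsigned (m+n)-bit Qm.n format and round-to-nearest; many DSP conventions use signed Q formats instead, and `to_q`/`from_q` are hypothetical names.

```python
def to_q(value: float, m: int, n: int) -> int:
    """Encode value as an unsigned (m+n)-bit Qm.n integer (round to nearest)."""
    raw = round(value * (1 << n))              # move the binary point n places right
    if not 0 <= raw < (1 << (m + n)):
        raise OverflowError(f"{value} does not fit in Q{m}.{n}")
    return raw


def from_q(raw: int, n: int) -> float:
    """Decode a Qm.n integer: divide by 2**n (the value of the LSB is 2**-n)."""
    return raw / (1 << n)


raw = to_q(3.25, 6, 2)
print(format(raw, "08b"), from_q(raw, 2))     # 00001101 3.25
```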
152
8-bit Example: 4 bits after the radix point (unsigned)
Precision: $2^{-4} = 0.0625$. Range: 0 to $1111.1111_2 = 15.9375$.
153
8-bit Example: 5 bits after the radix point (unsigned)
Precision: $2^{-5} = 0.03125$. Range: 0 to $111.11111_2 = 7.96875$.
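Both examples follow from one formula: precision is the weight of the last bit, and the largest value is the all-ones pattern times that weight. A sketch for unsigned fixed point; `fixed_point_limits` is a hypothetical name.

```python
def fixed_point_limits(total_bits: int, frac_bits: int) -> tuple[float, float]:
    """Return (precision, largest value) of an unsigned fixed-point format."""
    precision = 2.0 ** -frac_bits                 # weight of the least-significant bit
    largest = ((1 << total_bits) - 1) * precision  # all ones
    return precision, largest


print(fixed_point_limits(8, 4))   # (0.0625, 15.9375)
print(fixed_point_limits(8, 5))   # (0.03125, 7.96875)
```

Moving the radix point one place left halves both the precision step and the range, which is the trade-off the two examples show.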
154
Example: $2\times10^{1} + 6\times10^{0} + 5\times10^{-1} = 26.5$. In binary, $11010.1_2 = 1\times2^{4} + 1\times2^{3} + 0\times2^{2} + 1\times2^{1} + 0\times2^{0} + 1\times2^{-1} = 16 + 8 + 2 + 0.5 = 26.5$. All digits (or bits) to the left of the binary point carry weights of $2^0, 2^1, 2^2$, and so on, moving left. Digits (or bits) to the right of the binary point carry weights of $2^{-1}, 2^{-2}, 2^{-3}$, and so on, moving right.
155
What do you Observe here?
$53_{10} = 110101_2$ (binary point at position 0) and $26.5_{10} = 11010.1_2$ (binary point at position 1). A careful reader should now realize that the bit patterns of 53 and 26.5 are exactly the same; the only difference is the position of the binary point. In the case of $53_{10}$, there is "no" binary point. Alternatively, we can say the binary point is located at the far right, at position 0. (Think in decimal: 53 and 53.0 represent the same number.)
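The observation can be checked by evaluating both strings with positional weights and by rescaling the integer pattern. A sketch; `binary_to_value` is a hypothetical helper that assumes a well-formed string of 0's and 1's with at most one point.

```python
def binary_to_value(text: str) -> float:
    """Evaluate a binary string such as '11010.1' using positional weights."""
    integer_part, _, fraction_part = text.partition(".")
    value = 0.0
    for i, bit in enumerate(reversed(integer_part)):   # weights 2**0, 2**1, ...
        value += int(bit) * 2 ** i
    for i, bit in enumerate(fraction_part, start=1):   # weights 2**-1, 2**-2, ...
        value += int(bit) * 2 ** -i
    return value


pattern = 0b110101
# Same bits, binary point moved one place left == divide the integer by 2**1
print(binary_to_value("110101"), binary_to_value("11010.1"), pattern / 2**1)  # 53.0 26.5 26.5
```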
156
Exercise Consider the following binary representation:
Now, using Q5.3 format, figure out what number is represented.
157
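Negative Numbers Remember, we use 2's complement to represent negative numbers. The table shows all the numbers representable with 4-bit 2's complement; the n/2 column assumes the binary point is at position 1:
Bit Pattern | Number Represented (n) | n/2
1111 | −1 | −0.5
1110 | −2 | −1
1101 | −3 | −1.5
1100 | −4 | −2
1011 | −5 | −2.5
1010 | −6 | −3
1001 | −7 | −3.5
1000 | −8 | −4
0111 | 7 | 3.5
0110 | 6 | 3
0101 | 5 | 2.5
0100 | 4 | 2
0011 | 3 | 1.5
0010 | 2 | 1
0001 | 1 | 0.5
0000 | 0 | 0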
158
Arithmetic with Fixed-point Representation
159
Addition and Subtraction
Addition and subtraction are performed by treating the fixed-point numbers as integers and using standard integer addition/subtraction (both operands must have the same number of fractional bits). In terms of the bit count of the result, this can be expressed as: [m.n] ± [m.n] = [(m+1).n], since one extra integer bit holds a possible carry. Example: adding two numbers represented in [8.8] format gives a sum that is represented in [9.8].
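The one-bit growth can be seen with two large [8.8] values. A sketch, assuming unsigned [8.8] numbers stored as Python integers; `FRAC_BITS` and `to_fixed` are hypothetical names.

```python
FRAC_BITS = 8  # [8.8]: 8 integer bits, 8 fractional bits (unsigned in this sketch)


def to_fixed(x: float) -> int:
    """Scale by 2**FRAC_BITS and round to the nearest integer."""
    return round(x * (1 << FRAC_BITS))


a = to_fixed(200.5)
b = to_fixed(100.25)
s = a + b                                  # plain integer addition; radix point unchanged
print(s / (1 << FRAC_BITS), s.bit_length())  # 300.75 17 -> needs [9.8]
```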
160
Multiplication and Division
Treat the bits as integers and perform the operation, but pay attention to the radix point in the result and shift accordingly. Example: 3.25 × 6.5 = 21.125 ($10101.001_2$). If your system uses 8-bit fixed-point numbers with 2 fractional bits, the product (which has 4 fractional bits) must be shifted right by 2 bits to return to that format. This leads to a loss of precision (the result is 21). If only we could have used 3 of the 8 bits for the fraction!
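The example can be reproduced with integer arithmetic. A sketch, assuming unsigned raw values and truncation when shifting back; `fixed_mul` is a hypothetical name.

```python
def fixed_mul(a_raw: int, b_raw: int, frac_bits: int) -> int:
    """Multiply two fixed-point raw values with frac_bits fractional bits each."""
    product = a_raw * b_raw        # integer multiply: result has 2 * frac_bits fractional bits
    return product >> frac_bits    # shift back to frac_bits fractional bits (truncates)


# 2 fractional bits: 3.25 -> 13, 6.5 -> 26
print(fixed_mul(13, 26, 2) / 4)    # 21.0   (precision lost)
# 3 fractional bits: 3.25 -> 26, 6.5 -> 52
print(fixed_mul(26, 52, 3) / 8)    # 21.125 (fits: 169 < 256)
```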
161
Multiplication and Division
In terms of the number of bits in the multiplication result: [m.n] × [m.n] = [2m.2n]. Division: [m1.n1] ÷ [m2.n2] ≈ [(m1 + n2).(n1 − n2)], where n1 ≥ n2.
162
Floating-point Numbers
163
IEEE 754 Binary Floating Point Standard
Binary floating-point encoding scheme established in 1985 and used in almost every electronic/computing system. Most commonly used formats from the standard:
Single-Precision (32 bits): Sign (1 bit, bit 31) | Biased Exponent (8 bits, bits 30..23) | Mantissa (Significand) (23 bits, bits 22..0)
Double-Precision (64 bits): Sign (1 bit, bit 63) | Biased Exponent (11 bits, bits 62..52) | Mantissa (Significand) (52 bits, bits 51..0)
Value = $(-1)^{\text{sign}} \times 1.\text{mantissa} \times 2^{\text{exponent} - \text{bias}}$
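The single-precision field layout and the value formula can be inspected with Python's standard `struct` module, which packs a value as an IEEE754 single. A sketch for normal numbers only; `float32_fields` and `value_from_fields` are hypothetical names.

```python
import struct


def float32_fields(x: float) -> tuple[int, int, int]:
    """Split the single-precision encoding of x into (sign, biased exponent, mantissa)."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))  # 32-bit pattern as an integer
    sign = word >> 31                  # bit 31
    biased_exponent = (word >> 23) & 0xFF  # bits 30..23
    mantissa = word & 0x7FFFFF         # bits 22..0
    return sign, biased_exponent, mantissa


def value_from_fields(sign: int, biased_exponent: int, mantissa: int) -> float:
    """(-1)^sign * 1.mantissa * 2^(exponent - 127); valid for normal numbers only."""
    return (-1) ** sign * (1 + mantissa / 2**23) * 2.0 ** (biased_exponent - 127)


print(float32_fields(1.0))                            # (0, 127, 0)
print(value_from_fields(*float32_fields(-6.375)))     # -6.375
```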
165
Converting a Number to IEEE754
Field layout: Sign (bit 31) | Biased Exponent (bits 30..23) | Mantissa (Significand) (bits 22..0)
166
Converting a Number to IEEE754
Convert the number to binary scientific notation, of the form $1.\text{xxx}_2 \times 2^5$ in this example. This is called NORMALIZATION.
167
Converting a Number to IEEE754
The first bit of a normalized non-zero binary value is ALWAYS 1, so we don’t need to store it.
168
Converting a Number to IEEE754
Pad the bits after the binary point with trailing 0’s to 23 bits and store them as the Mantissa.
169
Converting a Number to IEEE754
The sign bit is 0 for positive numbers and 1 for negative numbers. Do NOT convert the Mantissa to two’s complement: IEEE754 stores the sign and the magnitude separately.
170
Converting a Number to IEEE754
The exponent in $\times 2^5$ is $5_{10} = 101_2$. The exponent needs to be signed, but two’s complement makes comparisons more complex, so add a BIAS (offset) to put the exponent into an unsigned range.
171
Exponent Bias: $\text{bias} = 2^{k-1} - 1$, where k is the number of bits used to store the exponent.
Single-Precision: k = 8, bias = $2^7 - 1$ = 127. Exponent in range [−126, 127]; exponent after bias in range [1, 254] (0 and 255 have special meaning).
Double-Precision: k = 11, bias = $2^{10} - 1$ = 1023. Exponent in range [−1022, 1023]; exponent after bias in range [1, 2046] (0 and 2047 have special meaning).
172
Converting a Number to IEEE754
Biased exponent: $5 + 127 = 132_{10} = 10000100_2$, stored in bits 30..23, between the Sign (bit 31) and the Mantissa (bits 22..0).
173
Converting From IEEE754
Value = $(-1)^{\text{sign}} \times 1.\text{mantissa} \times 2^{\text{biased exponent} - 127}$ (Single-Precision fields: Sign bit 31, Biased Exponent bits 30..23, Mantissa bits 22..0)
174
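More Examples
Example 1: Determine the IEEE754 Single-Precision representation of $-6.375_{10} = -110.011_2 = -1.10011_2 \times 2^2$.
Sign: 1. Mantissa: 10011 (padded with trailing 0’s to 23 bits). Biased Exponent: $2 + 127 = 129_{10} = 10000001_2$.
Result: 1 10000001 10011000000000000000000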
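The conversion steps (normalize, drop the leading 1, pad the mantissa, add the bias, set the sign) can be sketched and cross-checked against the encoding that `struct` produces. A sketch handling only finite, non-zero values in the normal range; `encode_float32` and `struct_float32` are hypothetical names. `math.frexp` returns a significand in [0.5, 1), so it is doubled to obtain the 1.xxx form, and Python's `round` breaks ties to even, matching IEEE754's default rounding.

```python
import math
import struct


def encode_float32(x: float) -> str:
    """Encode a normal, non-zero x as 'S EEEEEEEE MMM...M' following the slide steps."""
    if x == 0 or not math.isfinite(x):
        raise ValueError("this sketch handles only finite, non-zero values")
    sign = 1 if x < 0 else 0                  # sign-magnitude: sign bit only
    m, e = math.frexp(abs(x))                 # abs(x) = m * 2**e with 0.5 <= m < 1
    exponent = e - 1                          # normalized: abs(x) = (2m) * 2**exponent
    mantissa = round((2 * m - 1) * 2**23)     # drop the leading 1, keep 23 bits
    if mantissa == 1 << 23:                   # rounding carried into the leading 1
        mantissa, exponent = 0, exponent + 1
    biased = exponent + 127                   # add the bias
    if not 1 <= biased <= 254:
        raise ValueError("value is outside the normal single-precision range")
    return f"{sign} {biased:08b} {mantissa:023b}"


def struct_float32(x: float) -> str:
    """Reference encoding via struct, split into the same three fields."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    bits = f"{word:032b}"
    return f"{bits[0]} {bits[1:9]} {bits[9:]}"


print(encode_float32(-6.375))   # 1 10000001 10011000000000000000000
```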
175
More Examples
Determine the decimal number represented by the following bits interpreted as an IEEE754 Single-Precision value (biased exponent 124, mantissa 01 followed by 0’s):
$(-1)^{0} \times 1.01_2 \times 2^{124 - 127} = 1.01_2 \times 2^{-3} = 0.00101_2 = 0.15625_{10}$
176
Special IEEE754 Numbers
- Zero: Biased Exponent all 0’s; Mantissa all 0’s; Sign 0 for +0, 1 for −0.
- Not a Number (NaN): Biased Exponent all 1’s; Mantissa non-zero; Sign don’t care. E.g. sqrt(−1).
- Infinity: Biased Exponent all 1’s; Mantissa all 0’s; Sign 0 for +infinity, 1 for −infinity. E.g. dividing a non-zero number by +0 or −0.
- Subnormal Numbers: Biased Exponent all 0’s; Mantissa non-zero. More on this in the next slides.
Strict IEEE compliance requires supporting all of these; for many applications, strict compliance is not required.
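The special cases are decided entirely by the exponent and mantissa fields. A sketch that classifies a value after packing it as single precision with the standard `struct` module; `classify_float32` is a hypothetical name.

```python
import struct


def classify_float32(x: float) -> str:
    """Classify x by the fields of its IEEE754 single-precision encoding."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign = word >> 31
    exponent = (word >> 23) & 0xFF
    mantissa = word & 0x7FFFFF
    if exponent == 0xFF:                          # biased exponent all 1's
        if mantissa:
            return "NaN"
        return "-infinity" if sign else "+infinity"
    if exponent == 0:                             # biased exponent all 0's
        if mantissa == 0:
            return "-0" if sign else "+0"
        return "subnormal"
    return "normal"


for v in (0.0, -0.0, float("inf"), float("nan"), 1e-40, 1.0):
    print(v, classify_float32(v))
```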
177
Floating Point Arithmetic
In general:
Addition/Subtraction
- Denormalize the operands so that both have the same exponent
- Perform the operation on the mantissas with integer arithmetic
- Normalize the result
Multiplication/Division
- XOR the sign bits of the operands
- Add/subtract the unbiased exponents using integer arithmetic
- Perform the operation on the mantissas using integer arithmetic (pay attention to the location of the radix points)
Don’t worry about the details now; you can always look them up.
178
More about Floating Point Arithmetic
There is a lot more to be discussed on floating-point arithmetic (rounding, accuracy, etc.) that is beyond the scope of this course. These details can be very important, depending on your application. Notable case: in 1991 in Dhahran, Saudi Arabia, 28 US soldiers were killed because the internal clock of a missile-interception system had drifted by 0.33 s due to a rounding error accumulated over several days of operation. The 0.33 s drift translated into an error of about 600 m in the calculated location of the incoming missile. After a correct initial detection, the system looked at the wrong part of the sky and found no missile, so it did not proceed with an interception attempt.
179
Fixed Point vs. Floating Point
Fixed Point: simpler circuitry (needs fewer logic/routing resources) for arithmetic operations.
Floating Point: better precision; higher dynamic range of representable values.
180
Why Floating-Point Needs More Hardware
- Need to denormalize operands
- Need to renormalize results
- Need to perform separate operations on exponent and mantissa
- Need to support subnormal and normal representations (optional)
181
Overflow Largest positive single-precision number: $(2 - 2^{-23}) \times 2^{127} \approx 3.4 \times 10^{38}$. OVERFLOW occurs when an arithmetic operation leads to a result that is too big to represent.
182
Underflow Smallest positive single-precision normalized number: $1.0 \times 2^{-126} \approx 1.18 \times 10^{-38}$. UNDERFLOW occurs when an arithmetic operation leads to a non-zero result smaller in magnitude than this value.
183
Underflow
- NORMAL NUMBERS: numbers that can be represented by the normalized format
- UNDERFLOW GAP: the range from 0 to the smallest normal number
- SUBNORMAL NUMBERS: numbers in the underflow gap
IEEE754 has a mechanism to represent a subset of the subnormal numbers.
184
Subnormal Numbers in IEEE754
To encode a subnormal number in IEEE754:
- The Biased Exponent is all 0’s, which represents the smallest possible exponent (Single-Precision: −126; Double-Precision: −1022).
- The Mantissa is interpreted as being preceded by 0. instead of 1.
Example: $0.\text{mantissa}_2 \times 2^{-126}$ (Single-Precision).
This trades precision (number of significant digits) for range.
185
IEEE754 Subnormal Range Smallest positive single-precision subnormal number: $2^{-23} \times 2^{-126} = 2^{-149} \approx 1.4 \times 10^{-45}$.
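The three limits above correspond to specific bit patterns, which can be reinterpreted as single-precision values with the standard `struct` module. A sketch; `float32_from_bits` and `subnormal_value` are hypothetical names.

```python
import struct


def float32_from_bits(word: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", word))[0]


def subnormal_value(mantissa: int) -> float:
    """Biased exponent all 0's: value = 0.mantissa * 2**-126 (no hidden 1)."""
    return (mantissa / 2**23) * 2.0**-126


largest_normal = float32_from_bits(0x7F7FFFFF)      # biased exponent 254, mantissa all 1's
smallest_normal = float32_from_bits(0x00800000)     # biased exponent 1, mantissa all 0's
smallest_subnormal = float32_from_bits(0x00000001)  # biased exponent 0, mantissa 0...01
print(largest_normal, smallest_normal, smallest_subnormal)
```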
186
Doing Maths on FPGA (Initial Version – Updates in Progress)
187
Pyramid Smoothing (figure: an image pyramid with levels L0 = 16 × 16, L1 = 8 × 8, L2 = 4 × 4 and L3 = 2 × 2; down-sampling moves from L0 toward L3, and up-sampling moves back)
188
Parent Calculation – Down-Sampling Phase
R_i: the result is a floating-point number.
190
Approximate Computing
191
THANK YOU
192
Adders: Single cycle vs. Multicycle
All of these are single-cycle (combinational) adders; usually, adders are built as combinational blocks. Alternatively, you can add one bit per cycle, which requires building a datapath and a controller. Exercise: design a multi-cycle adder and compare it to the “bit counting” circuit in an earlier lecture.
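One possible starting point for the exercise is a bit-serial adder: a single full adder plus a carry flip-flop, processing one bit pair per clock cycle. This software model is a sketch under that assumption, not the intended solution; `full_adder` and `bit_serial_add` are hypothetical names, and each loop iteration stands for one clock cycle.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry-out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def bit_serial_add(x: int, y: int, n: int) -> tuple[int, int]:
    """Add two n-bit numbers with one full adder, one bit per 'clock cycle'."""
    carry = 0                               # carry flip-flop, cleared before cycle 0
    total = 0
    for cycle in range(n):                  # LSB first; n cycles in total
        s, carry = full_adder((x >> cycle) & 1, (y >> cycle) & 1, carry)
        total |= s << cycle                 # shift the sum bit into the result register
    return total, carry                     # n-bit sum and final carry-out


print(bit_serial_add(0b1011, 0b0110, 4))   # (1, 1): 11 + 6 = 17 = 1 0001
```

The trade-off matches the single-cycle vs. multi-cycle discussion: one full adder of logic instead of N, at the cost of N clock cycles per addition.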
193
Adding 16 Operands
(Figure: M = 16 operands reduced by a tree of N-bit carry-save adders (CSAs) with $\log_{3/2} M$ levels, followed by a final carry look-ahead adder (CLA) that produces S.)
The CSA tree's delay scales in $O(\log M)$ and the CLA delay scales in $O(\log N)$, so the delay for the sum bits scales in $O(\log M + \log N)$.
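The reduction can be modeled on whole words: each CSA is a row of full adders that turns three operands into a sum word and a carry word without propagating carries. A sketch, assuming unbounded Python integers in place of fixed N-bit wires; `carry_save_add` and `csa_tree_sum` are hypothetical names, and the final `sum` stands in for the CLA.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 compressor on whole words: x + y + z == sum_word + carry_word."""
    sum_word = x ^ y ^ z                                  # per-bit sum, no carry chain
    carry_word = ((x & y) | (x & z) | (y & z)) << 1       # per-bit majority, shifted left
    return sum_word, carry_word


def csa_tree_sum(operands: list[int]) -> tuple[int, int]:
    """Reduce operands level by level with CSAs, then do one carry-propagate add.

    Returns (total, number_of_csa_levels)."""
    level = list(operands)
    levels = 0
    while len(level) > 2:
        next_level = []
        for i in range(0, len(level) - 2, 3):             # every full group of three
            next_level += carry_save_add(level[i], level[i + 1], level[i + 2])
        next_level += level[len(level) - len(level) % 3:]  # 0, 1 or 2 leftovers pass through
        level = next_level
        levels += 1
    return sum(level), levels                             # final adder (a CLA in hardware)


ops = [i * 37 + 5 for i in range(16)]
print(csa_tree_sum(ops))   # total equals sum(ops); 16 operands take 6 CSA levels here
```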
194
Delay Scalability for 64-Bit Operands
(Figure: critical path delay, in gate propagation delays, versus the number of operands M for 64-bit operands, comparing an unbalanced CLA tree, $O(M \log N)$; a balanced CLA tree, $O(\log M \cdot \log N)$; and a balanced CSA tree, $O(\log M + \log N)$.)
195
16-bit Carry Look-Ahead Adder
(Figure: four 4-bit blocks, each with per-bit Generate/Propagate/Sum logic producing S3..S0, feed four 4-bit Carry Look-Ahead Units; each unit outputs a group generate GG and a group propagate PG. A second-level 16-bit Carry Look-Ahead Unit takes GG0/PG0, GG4/PG4, GG8/PG8, GG12/PG12 and C0, and produces C4, C8, C12 and C16, plus GG/PG for the whole 16 bits.)
Use this as a building block for a 64-bit CLA.
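The two-level structure can be modeled bit by bit. A sketch with hypothetical names `cla4` and `cla16`: `cla4` writes out the flattened look-ahead equations for one 4-bit unit, and `cla16` plays the second-level unit. For brevity the second-level carries are written as a loop over C(k+1) = GG_k + PG_k · C_k; the hardware unit expands these into two-level sum-of-products form so that C4..C16 are produced in parallel.

```python
def cla4(a: int, b: int, c0: int) -> tuple[int, int, int, int]:
    """4-bit carry look-ahead unit: returns (sum, carry_out, GG, PG)."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]   # generate  Gi = Ai AND Bi
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(4)]   # propagate Pi = Ai XOR Bi
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    gg = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    pg = p[3] & p[2] & p[1] & p[0]
    c4 = gg | (pg & c0)
    carries = [c0, c1, c2, c3]
    s = sum((p[i] ^ carries[i]) << i for i in range(4))       # Si = Pi XOR Ci
    return s, c4, gg, pg


def cla16(a: int, b: int, c0: int = 0) -> tuple[int, int]:
    """16-bit adder from four cla4 blocks and a second-level look-ahead unit."""
    blocks = [((a >> 4 * k) & 0xF, (b >> 4 * k) & 0xF) for k in range(4)]
    gp = [cla4(x, y, 0)[2:] for x, y in blocks]   # GG/PG do not depend on carry-in
    carries = [c0]                                # C0, C4, C8, C12, C16
    for gg, pg in gp:
        carries.append(gg | (pg & carries[-1]))
    total = 0
    for k, (x, y) in enumerate(blocks):
        total |= cla4(x, y, carries[k])[0] << (4 * k)
    return total, carries[4]


print(cla16(0x1234, 0xFEDC))   # (4368, 1): 0x1234 + 0xFEDC = 0x11110
```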