Arithmetic.1 2/15 Computer Arithmetic: ALU performance is critical (App. C.5, C.6, 4th ed.)

arithmetic.2 Requirements: the CPU needs a 32-bit ALU
(1) Functional specification
- inputs: two 32-bit operands A, B; 4-bit mode m
- outputs: 32-bit result S, 1-bit carry c, 1-bit overflow ovf
- operations: add, addu, sub, subu, and, or, xor, nor, slt, sltu
(2) Block diagram (schematic symbol / Verilog description)
[Figure: ALU block with 32-bit inputs A and B, 4-bit mode input m, 32-bit output S, and 1-bit outputs c and ovf]

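arithmetic.3 1-bit adder review (Appendix B.5, B.6)
A B Cin | Cout Sum
0 0 0   | 0    0
0 0 1   | 0    1
0 1 0   | 0    1
0 1 1   | 1    0
1 0 0   | 0    1
1 0 1   | 1    0
1 1 0   | 1    0
1 1 1   | 1    1
$\text{Sum} = \bar{a}\bar{b}c + \bar{a}b\bar{c} + a\bar{b}\bar{c} + abc = a \oplus b \oplus c$ (XOR)
$\text{Carryout} = \bar{a}bc + a\bar{b}c + ab\bar{c} + abc = ab + ac + bc$
[Figure: full adder with inputs A, B, Cin and outputs Sum, Co]
2 units of delay from A/B to Sum; 1 unit of delay from Cin to Sum.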
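The two equations can be checked against the truth table exhaustively. A minimal Python sketch; `full_adder` is a hypothetical helper name, and bits are plain 0/1 integers.

```python
from itertools import product

def full_adder(a, b, cin):
    """1-bit full adder: returns (sum, carry_out) for bits a, b, cin."""
    s = a ^ b ^ cin                          # Sum = a XOR b XOR c
    cout = (a & b) | (a & cin) | (b & cin)   # Carryout = ab + ac + bc
    return s, cout

# Print the truth table in the slide's column order: A B Cin | Cout Sum
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    print(a, b, cin, "|", cout, s)
```

The sum-of-products form and the XOR / majority forms are interchangeable; the second is what the gate-level circuits on the following slides implement.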

arithmetic.4 Carry-out circuit
[Figure: gate-level circuit computing Cout from Cin, a, b]
2 units of delay from Cin to Cout.

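arithmetic.5 1-bit ALU cell: ADD, AND, OR
[Figure: inputs A and B feed an AND gate, an OR gate, and a 1-bit full adder (with CarryIn and CarryOut); a multiplexer controlled by S-select chooses and / or / add as Result]
The full adder is a 3→2 element (3 input bits, 2 output bits); its truth table is the one in the 1-bit adder review.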

arithmetic.6 Additional operations: subtract, AND, OR
$A - B = A + (-B) = A + \bar{B} + 1$: form the two's complement by inverting B and adding one.
[Figure: the 1-bit ALU cell (full adder, AND, OR, mux with S-select) with an invert control on input B]
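A minimal Python sketch of the invert-and-add-one idea for 32-bit words. It assumes, as in the usual version of this design, that the "+1" enters as the carry-in of bit 0; `add_sub` is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF

def add_sub(a, b, subtract):
    """32-bit adder/subtractor: A - B is computed as A + not(B) + 1.

    The invert control flips B; the +1 is assumed to arrive as the
    carry-in of the least significant bit. Returns (result, carry_out).
    """
    b_in = (~b & MASK32) if subtract else (b & MASK32)
    carry_in = 1 if subtract else 0
    total = (a & MASK32) + b_in + carry_in
    return total & MASK32, total >> 32
```

For unsigned operands the carry out of a subtraction is 1 exactly when A >= B (no borrow): `add_sub(7, 3, True)` gives `(4, 1)`, while `add_sub(3, 7, True)` gives `(0xFFFFFFFC, 0)`, the 32-bit pattern of -4.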

arithmetic.7 1-bit ALU: AND, OR, $a + b$, $a + \bar{b}$ (most significant bit ALU)
Delays:
- Result: 1 gate delay
- from a to Result: 2
- from b to Result: 2 (ignoring the b invert)

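arithmetic.8 Final 32-bit ALU, including zero detect
[Figure: 32 1-bit ALU cells sharing the Operation control lines; a Zero output is asserted when every Result bit is 0]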

arithmetic.9 Behavioral representation: Verilog, RTL (FYI)
```verilog
module ALU(A, B, m, S, c, ovf);
  input  [0:31] A, B;
  input  [0:3]  m;
  output [0:31] S;
  output        c, ovf;
  reg    [0:31] S;
  reg           c, ovf;

  always @(A, B, m)
  begin
    case (m)
      0: S = A + B;
      // ... (remaining operations elided on the slide)
    endcase
  end
endmodule
```
The code is written, simulated, and verified, then translated into hardware (mapped): this is how complex digital design is done.
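The Verilog above shows only the add case. As a companion, here is a Python behavioral model covering the operation list from the requirements slide, plus the zero detect. The slide does not give the 4-bit mode encoding, so this sketch uses operation names instead (a hypothetical interface), and it assumes only the signed `add`/`sub` report overflow.

```python
MASK32 = 0xFFFFFFFF
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1

def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def alu(a, b, op):
    """Behavioral 32-bit ALU: returns (S, c, ovf, zero).

    `op` is an operation name standing in for the slide's 4-bit mode m,
    whose encoding is not given (hypothetical interface).
    """
    a &= MASK32
    b &= MASK32
    c = ovf = 0
    if op in ("add", "addu"):
        total = a + b
        s, c = total & MASK32, total >> 32
        if op == "add":  # only the signed add reports overflow
            ovf = int(not INT32_MIN <= to_signed(a) + to_signed(b) <= INT32_MAX)
    elif op in ("sub", "subu"):
        total = a + (~b & MASK32) + 1  # A + not(B) + 1
        s, c = total & MASK32, total >> 32
        if op == "sub":
            ovf = int(not INT32_MIN <= to_signed(a) - to_signed(b) <= INT32_MAX)
    elif op == "and":
        s = a & b
    elif op == "or":
        s = a | b
    elif op == "xor":
        s = a ^ b
    elif op == "nor":
        s = ~(a | b) & MASK32
    elif op == "slt":
        s = int(to_signed(a) < to_signed(b))
    elif op == "sltu":
        s = int(a < b)
    else:
        raise ValueError(f"unsupported operation: {op}")
    return s, c, ovf, int(s == 0)  # zero detect
```

The same bit patterns can give different answers for signed and unsigned operations: with a = 0xFFFFFFFF and b = 1, `slt` yields 1 (-1 < 1) while `sltu` yields 0.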

arithmetic.10 Overflow? A 4-bit example
Examples: 7 + 3 = 10, but ...; -4 - 5 = -9, but ...
Decimal  2's complement      Decimal  2's complement
0        0000                -1       1111
1        0001                -2       1110
2        0010                -3       1101
3        0011                -4       1100
4        0100                -5       1011
5        0101                -6       1010
6        0110                -7       1001
7        0111                -8       1000
Worked additions in 4 bits:
  0111   (7)
+ 0011   (3)
= 1010   (reads as -6)

  1100   (-4)
+ 1011   (-5)
= 1 0111 (carry out discarded; reads as 7)

arithmetic.11 Overflow detection
Overflow: the arithmetic result is too large (or too small) to represent properly.
- Example: a 4-bit two's-complement number x must satisfy -8 ≤ x ≤ 7.
When adding operands with different signs, overflow cannot occur!
Overflow occurs when adding:
- 2 positive numbers and the sum is negative
- 2 negative numbers and the sum is positive
On your own: prove that you can detect overflow by checking carry into MSB ≠ carry out of MSB.
(Same 4-bit examples: 0111 + 0011 = 1010 gives -6 for 7 + 3; 1100 + 1011 = 0111 gives 7 for -4 + -5.)

arithmetic.12 Overflow detection logic
Carry into MSB ≠ carry out of MSB. For an N-bit ALU:
$\text{Overflow} = \text{CarryIn}[N-1] \oplus \text{CarryOut}[N-1]$
[Figure: four 1-bit ALUs in a ripple chain (A_i, B_i → Result_i; CarryOut_i feeds CarryIn_{i+1}, starting from CarryIn0); an XOR of CarryIn3 and CarryOut3 produces Overflow]
X Y | X XOR Y
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 0
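The "on your own" claim can at least be checked exhaustively for 4 bits before proving it. A Python sketch with hypothetical helper names: one function takes the carries from a bit-by-bit ripple add, the other applies the range rule -8 ≤ sum ≤ 7.

```python
def add4_with_carries(a, b):
    """Ripple-add two 4-bit patterns; return (sum, carry_into_msb, carry_out_of_msb)."""
    carry, s, carry_into_msb = 0, 0, 0
    for i in range(4):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if i == 3:
            carry_into_msb = carry
        s |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (ai & carry) | (bi & carry)
    return s, carry_into_msb, carry

def to_signed4(x):
    """Interpret a 4-bit pattern as two's complement (-8 .. 7)."""
    return x - 16 if x & 0b1000 else x

def overflow_by_carries(a, b):
    _, cin_msb, cout_msb = add4_with_carries(a, b)
    return cin_msb ^ cout_msb            # Overflow = CarryIn[3] XOR CarryOut[3]

def overflow_by_range(a, b):
    return int(not -8 <= to_signed4(a) + to_signed4(b) <= 7)

# Exhaustive comparison over all 256 pairs of 4-bit operands.
mismatches = [(a, b) for a in range(16) for b in range(16)
              if overflow_by_carries(a, b) != overflow_by_range(a, b)]
print("mismatches:", mismatches)
```

Both slide examples (7 + 3 and -4 + -5) give an overflow of 1 from either function, and an empty mismatch list is consistent with the general proof.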

arithmetic.13 MIPS ALU requirements
- Add, AddU, Sub, SubU, AddI, AddIU → 2's complement adder/subtractor with overflow detection
- And, Or, AndI, OrI, Xor, XorI, Nor → logical AND, logical OR, XOR, NOR
- SLTI, SLTIU (set less than) → 2's complement adder with inverter; check the sign bit of the result
The ALU must support these operations.

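arithmetic.14 MIPS arithmetic instruction format (review)
Signed arithmetic generates overflow, no carry.
R-type: op (bits 31-26) | rs (25-21) | rt (20-16) | rd (15-11) | shamt (10-6) | funct (5-0)
I-type: op (bits 31-26) | rs (25-21) | rt (20-16) | immed (15-0, 16 bits)
Opcode (op) and function (funct) values, in octal:
Type    op   funct
ADDI    10   xx
ADDIU   11   xx
SLTI    12   xx
SLTIU   13   xx
ANDI    14   xx
ORI     15   xx
XORI    16   xx
LUI     17   xx

Type    op   funct
ADD     00   40
ADDU    00   41
SUB     00   42
SUBU    00   43
AND     00   44
OR      00   45
XOR     00   46
NOR     00   47
(none)  00   50
(none)  00   51
SLT     00   52
SLTU    00   53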

arithmetic.15 Ripple adder performance?
The critical path of an n-bit ripple-carry adder is n × CP, where CP is the carry delay through one bit.
[Figure: four 1-bit ALUs chained (A_i, B_i → Result_i; CarryOut_i feeds CarryIn_{i+1}, starting from CarryIn0)]
Very slow: must improve. Assume t = carry delay per bit:
- a 32-bit ALU needs 32t units of delay
- a 64-bit ALU needs 64t units of delay
(Full adder: 2 units of delay from A/B to Sum; 1 unit of delay from Cin to Sum.)
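A Python sketch of the ripple-carry chain. It records the carry entering each bit, which makes the serial dependence visible: bit i cannot finish until carry i has arrived. The names and the simple `n * t` delay model are illustrative assumptions, following the slide's "t = carry delay per bit".

```python
def ripple_carry_add(a, b, n, cin=0):
    """n-bit ripple-carry adder built from 1-bit full adders.

    Returns (sum, carry_out, carries), where carries[i] is the carry into bit i.
    Each carry depends on the previous one, so the chain is serial.
    """
    carry, s, carries = cin, 0, []
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        carries.append(carry)
        s |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (ai & carry) | (bi & carry)
    return s, carry, carries

def ripple_delay(n, t):
    """Critical-path delay of an n-bit ripple-carry adder, t = carry delay per bit."""
    return n * t
```

For example, `ripple_carry_add(0b0111, 0b0011, 4)` returns `(0b1010, 0, [0, 1, 1, 1])`: the carry generated at bit 0 has to ripple through bits 1 to 3.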

arithmetic.16 Fast addition: carry lookahead
Carry inputs can be precomputed by logic. For each bit i, define generate $g_i = a_i \cdot b_i$ and propagate $p_i = a_i + b_i$.
$c_1 = g_0 + c_0 \cdot p_0 = a_0 \cdot b_0 + c_0 \cdot (a_0 + b_0)$
$c_2 = g_1 + p_1 \cdot c_1 = g_1 + p_1 \cdot g_0 + p_1 \cdot p_0 \cdot c_0 = a_1 \cdot b_1 + c_1 \cdot (a_1 + b_1)$
$c_3 = g_2 + p_2 \cdot g_1 + p_2 \cdot p_1 \cdot g_0 + p_2 \cdot p_1 \cdot p_0 \cdot c_0$
$c_4 = g_3 + p_3 \cdot g_2 + p_3 \cdot p_2 \cdot g_1 + p_3 \cdot p_2 \cdot p_1 \cdot g_0 + p_3 \cdot p_2 \cdot p_1 \cdot p_0 \cdot c_0$
So $c_4 = f(a_3, b_3, a_2, b_2, a_1, b_1, a_0, b_0, c_0)$: every carry depends only on the inputs.
Delays: 1 unit for each p and g; 3 units for each carry.

arithmetic.17 Fast addition: carry lookahead, 4 bits
A B | C-out
0 0 | 0      "kill"
0 1 | C-in   "propagate"
1 0 | C-in   "propagate"
1 1 | 1      "generate"
$g = a \cdot b$ (1 delay); $p = a + b$ (1 delay)
$c_0 = C_{in}$
$c_1 = g_0 + c_0 \cdot p_0$
$c_2 = g_1 + g_0 \cdot p_1 + c_0 \cdot p_0 \cdot p_1$
$c_3 = g_2 + g_1 \cdot p_2 + g_0 \cdot p_1 \cdot p_2 + c_0 \cdot p_0 \cdot p_1 \cdot p_2$
$c_4 = \ldots$
Group signals for the 4-bit block:
$G_0 = g_3 + p_3 \cdot g_2 + p_3 \cdot p_2 \cdot g_1 + p_3 \cdot p_2 \cdot p_1 \cdot g_0$
$P_0 = p_3 \cdot p_2 \cdot p_1 \cdot p_0$
[Figure: four sum cells S fed by (a0, b0) to (a3, b3), each producing g and p for the lookahead logic]
Delays: 3 units for G0; 3 units for c1, c2, c3 (and c4); 4 units for S1, S2, S3; 2 units for S0.
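The 4-bit lookahead equations can be checked against ordinary addition. A Python sketch (`cla4` is a hypothetical name) that follows the slide in using $p = a + b$ (OR). The sum bits still use $a \oplus b \oplus c$, because OR-propagate is correct for the carries but not for the sum.

```python
def cla4(a, b, c0):
    """4-bit carry-lookahead adder; returns (sum, c4, G, P)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]    # generate: a AND b
    p = [((a >> i) | (b >> i)) & 1 for i in range(4)]  # propagate: a OR b
    c1 = g[0] | (c0 & p[0])
    c2 = g[1] | (g[0] & p[1]) | (c0 & p[0] & p[1])
    c3 = (g[2] | (g[1] & p[2]) | (g[0] & p[1] & p[2])
          | (c0 & p[0] & p[1] & p[2]))
    c4 = (g[3] | (g[2] & p[3]) | (g[1] & p[2] & p[3]) | (g[0] & p[1] & p[2] & p[3])
          | (c0 & p[0] & p[1] & p[2] & p[3]))
    carries = [c0, c1, c2, c3]
    s = 0
    for i in range(4):
        s |= (((a >> i) ^ (b >> i) ^ carries[i]) & 1) << i
    # Group generate / propagate for use by a second lookahead level.
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P = p[3] & p[2] & p[1] & p[0]
    return s, c4, G, P
```

Every carry is a flat AND-OR expression of the inputs, so no carry waits on another. The group outputs also satisfy $c_4 = G_0 + P_0 \cdot c_0$, which is the form the second level builds on.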

arithmetic.18 Carry lookahead, 2nd level (16 bits)
Add a 2nd-level abstraction for more practical 4-bit units: each $P_i$, $G_i$ handles 4 bits at a time (bits 0-3, 4-7, 8-11, ...).
$P_0 = p_3 \cdot p_2 \cdot p_1 \cdot p_0$; $G_0 = g_3 + p_3 \cdot g_2 + p_3 \cdot p_2 \cdot g_1 + p_3 \cdot p_2 \cdot p_1 \cdot g_0$
$P_1 = p_7 \cdot p_6 \cdot p_5 \cdot p_4$; $G_1 = g_7 + p_7 \cdot g_6 + p_7 \cdot p_6 \cdot g_5 + p_7 \cdot p_6 \cdot p_5 \cdot g_4$
$P_2 = p_{11} \cdot p_{10} \cdot p_9 \cdot p_8$; $G_2 = g_{11} + p_{11} \cdot g_{10} + p_{11} \cdot p_{10} \cdot g_9 + p_{11} \cdot p_{10} \cdot p_9 \cdot g_8$
$P_3 = p_{15} \cdot p_{14} \cdot p_{13} \cdot p_{12}$; $G_3 = \ldots$
Delays: 3 units for G0, G1, G2, G3; 2 units for P0, P1, P2, P3.

arithmetic.19 Fast addition: cascaded carry lookahead (16-bit)
[Figure: four 4-bit adders, each passing its G and P to a second-level CLA block that also receives C0]
$c_4 = G_0 + C_0 \cdot P_0$
$c_8 = G_1 + G_0 \cdot P_1 + C_0 \cdot P_0 \cdot P_1$
$c_{12} = G_2 + G_1 \cdot P_2 + G_0 \cdot P_1 \cdot P_2 + C_0 \cdot P_0 \cdot P_1 \cdot P_2$
$c_{16} = \ldots$
Delays: c4 has 4 units of delay; c8, c12, c16 have 5 units.
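A Python sketch of the two-level structure. Group (G, P) pairs come from each 4-bit block, the second level produces c4, c8, c12, and each block then forms its sum from its boundary carry. Assumptions: ripple carry is used inside each block for brevity, the helper names are hypothetical, and c16 is written recursively as $G_3 + P_3 \cdot c_{12}$, which is logically equal to the fully expanded form.

```python
def block_gp(a4, b4):
    """Group generate/propagate (G, P) for one 4-bit block, using p = a OR b."""
    g = [(a4 >> i) & (b4 >> i) & 1 for i in range(4)]
    p = [((a4 >> i) | (b4 >> i)) & 1 for i in range(4)]
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P = p[3] & p[2] & p[1] & p[0]
    return G, P

def ripple4(a4, b4, cin):
    """4-bit sum with ripple carry inside the block (carry out not needed)."""
    s, carry = 0, cin
    for i in range(4):
        ai, bi = (a4 >> i) & 1, (b4 >> i) & 1
        s |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (ai & carry) | (bi & carry)
    return s

def nibble(x, k):
    """Bits 4k .. 4k+3 of x."""
    return (x >> (4 * k)) & 0xF

def cla16(a, b, c0):
    """16-bit adder whose block-boundary carries come from 2nd-level lookahead."""
    G, P = zip(*(block_gp(nibble(a, k), nibble(b, k)) for k in range(4)))
    c4 = G[0] | (c0 & P[0])
    c8 = G[1] | (G[0] & P[1]) | (c0 & P[0] & P[1])
    c12 = G[2] | (G[1] & P[2]) | (G[0] & P[1] & P[2]) | (c0 & P[0] & P[1] & P[2])
    c16 = G[3] | (P[3] & c12)  # recursive form; equal to the expanded sum of products
    s = 0
    for k, cin in enumerate((c0, c4, c8, c12)):
        s |= ripple4(nibble(a, k), nibble(b, k), cin) << (4 * k)
    return s, c16
```

Because c4, c8 and c12 depend only on the G, P signals and C0, all four blocks can start their internal additions at about the same time instead of waiting for each other.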

arithmetic.20 Carry lookahead homework
Calculate the performance of a 16-bit carry-lookahead adder similar to the one discussed in class. The design has 2 options:
1. ripple carry is used inside each 4-bit cell;
2. carry lookahead is used inside each 4-bit cell.
Both cases use carry lookahead to predict the 4-bit boundary carries [c4, c8, c12].
For both designs, draw a table showing the delay of each adder bit (Sum0 to Sum15) as well as the carry at each stage of the design.

arithmetic.21 8-bit carry lookahead adder (each 4-bit block is also a CLA)
$c_5 = g_4 + c_4 \cdot p_4$
[Figure: two 4-bit CLA blocks joined by a second-level lookahead, annotated with delays]

arithmetic.22 8-bit CLA, using ripple carry inside each 4-bit block
[Figure: bits (a0, b0) to (a3, b3) produce Result0 to Result3, bits (a4, b4) to (a7, b7) produce Result4 to Result7; a 2nd-level carry lookahead supplies c4]
Delay annotations (gate-delay units):
Bit i              0  1  2  3  4  5  6  7
Carry into bit i   0  2  4  6  4  6  8  10
Result_i           2  3  5  7  5  7  9  11
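The delay annotations can be regenerated from the unit delays used earlier. This Python sketch assumes Cin → Cout = 2 units, Cin → Sum = 1 unit, and A/B → Sum = 2 units; it takes the block-boundary carry times as a parameter (c0 at 0, c4 at 4 units from the second-level lookahead). `ripple_block_delays` is a hypothetical name.

```python
def ripple_block_delays(n_bits=8, block=4, block_carry_times=(0, 4)):
    """Arrival times (gate-delay units) for an adder whose blocks ripple internally.

    Assumed unit delays: Cin -> Cout = 2, Cin -> Sum = 1, A/B -> Sum = 2.
    block_carry_times[k] is the arrival time of the carry into block k.
    Returns (carry_times, sum_times), indexed by bit.
    """
    carry_times, sum_times = [], []
    t = 0
    for i in range(n_bits):
        if i % block == 0:                  # block boundary: carry from lookahead
            t = block_carry_times[i // block]
        carry_times.append(t)
        sum_times.append(max(t + 1, 2))     # the later of the Cin path and the A/B path
        t += 2                              # ripple to the next bit
    return carry_times, sum_times
```

With the defaults it returns `([0, 2, 4, 6, 4, 6, 8, 10], [2, 3, 5, 7, 5, 7, 9, 11])`, matching the table above. The same approach extends to the 16-bit homework once the boundary carry times are known.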

arithmetic.23 Additional MIPS ALU requirements
- Mult, MultU, Div, DivU → need 32-bit multiply and divide, signed and unsigned
- Sll, Srl, Sra → need left shift, right shift, and right shift arithmetic by 0 to 31 bits
- Nor (left as an exercise!) → logical NOR, or use 2 steps: (A OR B) XOR 1111...1111

arithmetic.24 Multiply, Divide & Shift

arithmetic.25 MIPS arithmetic instructions
Instruction         Example           Meaning                        Comments
add                 add $1,$2,$3      $1 = $2 + $3                   3 operands; exception possible
subtract            sub $1,$2,$3      $1 = $2 - $3                   3 operands; exception possible
add immediate       addi $1,$2,100    $1 = $2 + 100                  + constant; exception possible
add unsigned        addu $1,$2,$3     $1 = $2 + $3                   3 operands; no exceptions
subtract unsigned   subu $1,$2,$3     $1 = $2 - $3                   3 operands; no exceptions
add imm. unsigned   addiu $1,$2,100   $1 = $2 + 100                  + constant; no exceptions
multiply            mult $2,$3        Hi, Lo = $2 × $3               64-bit signed product
multiply unsigned   multu $2,$3       Hi, Lo = $2 × $3               64-bit unsigned product
divide              div $2,$3         Lo = $2 ÷ $3, Hi = $2 mod $3   Lo = quotient, Hi = remainder
divide unsigned     divu $2,$3        Lo = $2 ÷ $3, Hi = $2 mod $3   Unsigned quotient & remainder
move from Hi        mfhi $1           $1 = Hi                        Used to get a copy of Hi
move from Lo        mflo $1           $1 = Lo                        Used to get a copy of Lo

arithmetic.26 MULTIPLY (unsigned)
Paper-and-pencil example (unsigned):
Multiplicand        1000  (A)
Multiplier        × 1001  (B)
                    1000
                   0000
                  0000
                 1000
Product         01001000
m bits × n bits = (m + n)-bit product.
Binary makes it easy:
- 0 → place 0 (0 × multiplicand)
- 1 → place a copy (1 × multiplicand)
4 versions of multiply hardware & algorithm: successive refinement.

arithmetic.27 Fast multiply: array multiplier
Stage i accumulates $A \times 2^i$ if $B_i = 1$.
Q: How much hardware is needed for a 32-bit multiplier? What is the critical path?
[Figure: 4 × 4 array multiplier. Multiplicand A (A0 to A3) is repeated in each row, row i is gated by multiplier bit Bi, the first row adds to 0000, and the product is P0 to P7; each cell combines Bi and Aj]
Cell delays?

arithmetic.28 Multiplier operation
- At each stage, shift the multiplicand left (× 2).
- Multiplier bit Bi determines whether to add in the shifted multiplicand.
- Accumulate the 2n-bit partial product at each stage.
[Figure: the same 4 × 4 array: multiplication using shift & add]

arithmetic.29 Multiplication using shift & add: long-multiplication approach
multiplicand        1000
multiplier        × 1001
                    1000
                   0000
                  0000
                 1000
product          1001000
The length of the product is the sum of the operand lengths.
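The long-multiplication rule (bit 1: place a shifted copy, bit 0: place 0) fits in a few lines of Python. A sketch for unsigned operands only; `shift_add_multiply` is a hypothetical name.

```python
def shift_add_multiply(multiplicand, multiplier, n):
    """Unsigned long multiplication: add a shifted copy of the multiplicand
    for every 1 bit of the n-bit multiplier."""
    product = 0
    for i in range(n):
        if (multiplier >> i) & 1:          # bit 1: place a copy shifted left by i
            product += multiplicand << i   # bit 0: place 0, i.e. add nothing
    return product
```

`shift_add_multiply(0b1000, 0b1001, 4)` gives `0b1001000` (8 × 9 = 72). For n-bit operands the result always fits in 2n bits.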

arithmetic.30 Multiplication hardware using shift & add
[Figure: shift-and-add multiplier datapath; the product register is initially 0]

arithmetic.31 Optimized multiplier using shift & add
- Perform the steps in parallel: add and shift.
- One cycle per partial-product addition: acceptable if the frequency of multiplications is low.
[Figure: 32-bit ALU, multiplicand register, and product register]

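arithmetic.32 Multiply algorithm
Start.
1. Test Product0 (the least significant bit of the Product register).
   1a. If Product0 = 1: add the multiplicand to the left half of the product and place the result in the left half of the Product register. If Product0 = 0, go directly to step 2.
2. Shift the Product register right 1 bit.
3. 32nd repetition? No (< 32 repetitions): go back to step 1. Yes (32 repetitions): done.
Example with 4-bit operands (so 4 repetitions): multiplicand 0010, multiplier 0011 loaded into the right half of Product.
Step     Product     Multiplicand
Start    0000 0011   0010
1:       0010 0011   0010
2:       0001 0001   0010
1:       0011 0001   0010
2:       0001 1000   0010
1:       0001 1000   0010
2:       0000 1100   0010
1:       0000 1100   0010
2:       0000 0110   0010
Done     0000 0110   0010   (2 × 3 = 6)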
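The flowchart can be traced in Python for any width n. This sketch (`product_register_multiply` is a hypothetical name) keeps the multiplier in the right half of a 2n-bit product register. It lets the left-half addition produce one extra carry bit, which the following right shift brings back into range. It returns the final product and the register value after each repetition.

```python
def product_register_multiply(multiplicand, multiplier, n):
    """Unsigned multiply with a right-shifting 2n-bit Product register.

    The multiplier starts in the right half and the left half starts at 0.
    Returns (product, trace), where trace holds the register value at the
    start and after each of the n repetitions.
    """
    mask = (1 << n) - 1
    product = multiplier & mask
    trace = [product]
    for _ in range(n):
        if product & 1:                                # 1. test Product0
            left = (product >> n) + multiplicand       # 1a. add into the left half
            product = (left << n) | (product & mask)   # may carry into bit 2n
        product >>= 1                                  # 2. shift right 1 bit
        trace.append(product)
    return product, trace
```

For the slide's example, `product_register_multiply(0b0010, 0b0011, 4)` returns 6 with the trace `0000 0011, 0001 0001, 0001 1000, 0000 1100, 0000 0110` (the values after each step 2 in the table).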

arithmetic.33 MIPS logical instructions
Instruction                    Example          Meaning              Comment
and                            and $1,$2,$3     $1 = $2 & $3         3 reg. operands; logical AND
or                             or $1,$2,$3      $1 = $2 | $3         3 reg. operands; logical OR
xor                            xor $1,$2,$3     $1 = $2 ^ $3         3 reg. operands; logical XOR
nor                            nor $1,$2,$3     $1 = ~($2 | $3)      3 reg. operands; logical NOR
and immediate                  andi $1,$2,10    $1 = $2 & 10         logical AND reg, constant
or immediate                   ori $1,$2,10     $1 = $2 | 10         logical OR reg, constant
xor immediate                  xori $1,$2,10    $1 = $2 ^ 10         logical XOR reg, constant
shift left logical             sll $1,$2,10     $1 = $2 << 10        shift left by constant
shift right logical            srl $1,$2,10     $1 = $2 >> 10        shift right by constant
shift right arithm.            sra $1,$2,10     $1 = $2 >> 10        shift right (sign extend)
shift left logical variable    sllv $1,$2,$3    $1 = $2 << $3        shift left by variable
shift right logical variable   srlv $1,$2,$3    $1 = $2 >> $3        shift right by variable
shift right arithm. variable   srav $1,$2,$3    $1 = $2 >> $3        shift right arith. by variable

arithmetic.34 How shift instructions are implemented
Two kinds:
- logical: the value shifted in is always "0"
- arithmetic: on right shifts, sign extend
[Figure: msb ... lsb register with "0" shifted in at either end]
An instruction can request 0 to 31 bits to be shifted!
Examples:
- 1011 → 1110: shift right arithmetic by 2
- 1011 → 1100: shift left logical by 2
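A Python sketch of the three MIPS-style shifts on n-bit patterns (default 32). It assumes a shift amount 0 ≤ k < n, matching the 5-bit shift field. The function names mirror the mnemonics but are only illustrative.

```python
def sll(x, k, n=32):
    """Shift left logical: zeros enter at the lsb. Assumes 0 <= k < n."""
    mask = (1 << n) - 1
    return (x << k) & mask

def srl(x, k, n=32):
    """Shift right logical: zeros enter at the msb."""
    mask = (1 << n) - 1
    return (x & mask) >> k

def sra(x, k, n=32):
    """Shift right arithmetic: copies of the sign bit enter at the msb."""
    mask = (1 << n) - 1
    x &= mask
    result = x >> k
    if x >> (n - 1):                          # negative: fill the top k bits with 1s
        result |= (mask << (n - k)) & mask
    return result
```

With n = 4, `sra(0b1011, 2, 4)` gives `0b1110` and `sll(0b1011, 2, 4)` gives `0b1100`, matching the examples. `srl(0b1011, 2, 4)` gives `0b0010`, since the logical right shift fills with zeros.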

arithmetic.35 ARM: barrel shifter
The shift value can be either:
- a 5-bit unsigned integer, or
- specified in the bottom byte of another register.
Example: ADD r0, r1, r2, LSL #7
Semantics: r2 is shifted left by 7 and then added to r1; the result goes to r0.
[Figure: Operand 2 passes through the barrel shifter before reaching the ALU; Operand 1 goes directly to the ALU, which produces Result]

arithmetic.36 Barrel shifter, used in ICs
Shift right, using one transistor per switch.
[Figure: switch-matrix barrel shifter]

arithmetic.37 Barrel shifter, used in ICs: shift left & right
[Figure: inputs A5 to A0, outputs D3 to D0, shift-right controls SR0, SR1, SR2 and shift-left controls SL1, SL2, SL3]
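A Python model of the switch-matrix idea for the shift-right half of the figure. Assumptions: A0 to A5 are the input bits, D0 to D3 the outputs, and SR0 to SR2 are one-hot controls meaning "shift right by k". Each (output, shift) pair has one switch, so a single control line closes one switch per output. The shift-left controls are not modeled, and `crossbar_shift_right` is a hypothetical name.

```python
def crossbar_shift_right(data, select, width):
    """Shift right through a switch matrix: one switch per (output, shift) pair.

    data:   input bits, index 0 = least significant (A0, A1, ...)
    select: one-hot shift controls; select[k] == 1 closes every switch that
            connects input bit i + k to output bit i (a shift right by k)
    width:  number of output bits (D0, D1, ...)
    """
    assert sum(select) == 1, "exactly one shift amount may be selected"
    assert len(data) >= width + len(select) - 1
    out = []
    for i in range(width):
        bit = 0
        for k, closed in enumerate(select):
            bit |= closed & data[i + k]   # the switch for (output i, shift k)
        out.append(bit)
    return out
```

Any shift takes the same time, one pass through a closed switch, which is why this structure beats a shift register that moves one bit per cycle. The cost is width × (number of shift amounts) switches: 4 × 3 = 12 for the shift-right half here.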

arithmetic.38 Summary: multiply & shift
- Multiply: successive refinement to reach the final design: 32-bit adder, 64-bit shift register, 32-bit multiplicand register.
- Fast multiply: array multiplier.
- Shifter: successive refinement, from a 1-bit-at-a-time shift register to a barrel shifter.

arithmetic.39 Floating point arithmetic
How to represent:
- numbers with fractions, e.g., 3.1416
- very small numbers, e.g., 0.000000001
- very large numbers, e.g., $3.15576 \times 10^9$
Fixed point vs. floating point: a number system with a floating decimal point.
Normalized numbers: no leading 0s, a single digit before the decimal point, e.g., $1.0 \times 10^{-9}$ and $3.15576 \times 10^9$.

arithmetic.40 Floating point notation: IEEE 754
Examples: $6.02 \times 10^{23}$ and $1.673 \times 10^{-24}$. Each is a sign and magnitude: a mantissa with a decimal point, times a radix (base) raised to an exponent.
IEEE F.P.: $\pm 1.M \times 2^{E - 127}$
Issues:
- arithmetic (+, -, *, /)
- representation, normal form
- range and precision: single, double
- rounding
- exceptions (e.g., divide by zero, overflow, underflow)

arithmetic.41 Floating-point arithmetic
Floating-point numbers in the IEEE 754 standard, single precision:
S (1 bit) | E (8 bits) | M (23 bits)
- sign: 1 bit
- exponent: excess-127 binary integer; the actual exponent is e = E - 127
- mantissa: sign + magnitude, normalized binary significand with a hidden integer bit: 1.M
$N = (-1)^S \times 2^{E - 127} \times (1.M)$, for $0 < E < 255$
0 = 0 00000000 0...0
-1.5 = 1 01111111 10...0
Numbers that can be represented lie in the range $2^{-126} \times 1.0$ to $2^{127} \times (2 - 2^{-23})$.
Double precision IEEE 754 (64 bits): sign = 1 bit, exponent = 11 bits (bias = 1023), mantissa = 52 bits.
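The field layout and the formula for N can be checked with Python's standard `struct` module, which packs a float into its IEEE 754 single-precision bytes. A sketch with hypothetical helper names; `value_from_fields` handles only normalized numbers (0 < E < 255).

```python
import struct

def single_fields(x):
    """Split x (rounded to IEEE 754 single) into its (S, E, M) bit fields."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def value_from_fields(s, e_field, m):
    """N = (-1)^S * 2^(E - 127) * 1.M, for normalized numbers 0 < E < 255."""
    if not 0 < e_field < 255:
        raise ValueError("E = 0 and E = 255 are reserved, not normalized")
    return (-1) ** s * (1 + m / 2**23) * 2.0 ** (e_field - 127)
```

`single_fields(-1.5)` gives `(1, 127, 0x400000)`, which is 1 01111111 10...0 as on the slide. Zero comes back as `(0, 0, 0)`; it uses the reserved E = 0 and is not covered by the formula.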

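arithmetic.42 Exponent bias used to simplify comparisons
With a 2's complement exponent, negative exponents would have the larger bit patterns, which is bad for sorting and comparison. With a bias, the exponent field runs from 0000 0000 (most negative exponent) to 1111 1111 (most positive exponent).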
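One consequence of the biased exponent is that non-negative single-precision values compare correctly as plain unsigned integers. A Python sketch using the standard `struct` module; the sample values are arbitrary.

```python
import struct

def single_bits(x):
    """Raw 32-bit pattern of x stored as an IEEE 754 single."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

positives = [1e30, 0.5, 3.0, 0.0, 1e-30, 1.5, 1.0]
by_value = sorted(positives)
by_bits = sorted(positives, key=single_bits)  # unsigned-integer comparison only
```

The two orderings agree. This holds only for non-negative values: the sign bit is sign-magnitude, so `single_bits(-1.0)` is larger than `single_bits(1.0)`, and negative numbers need separate handling in an integer comparator.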

arithmetic.43 Floating point: example review
Represents $(-1)^S \times (1 + \text{Fraction}) \times 2^{\text{Exponent} - \text{Bias}}$
- bias = 127 for a 32-bit word
- S = 1: negative; S = 0: positive or zero
Example (from fraction to floating-point representation): -0.75

arithmetic.44 Floating-point example (review)
Represent -0.75:
- $-0.75 = (-1)^1 \times 1.1_2 \times 2^{-1}$
- S = 1
- Fraction = 1000…00₂
- Exponent = -1 + Bias
  - Single: -1 + 127 = 126 = 01111110₂
  - Double: -1 + 1023 = 1022 = 01111111110₂
Single: 1 01111110 1000…00
Double: 1 01111111110 1000…00
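The hand encoding of -0.75 can be assembled field by field and compared with the bytes Python's `struct` module produces. A sketch; `encode_single` and `encode_double` are hypothetical names that take the actual (unbiased) exponent and an already-aligned fraction field.

```python
import struct

def encode_single(s, e, fraction):
    """Pack sign, actual exponent e, and 23-bit fraction into a single-precision word."""
    return (s << 31) | ((e + 127) << 23) | fraction

def encode_double(s, e, fraction):
    """Same for double precision: 11-bit exponent with bias 1023, 52-bit fraction."""
    return (s << 63) | ((e + 1023) << 52) | fraction

# -0.75 = (-1)^1 x 1.1 (base 2) x 2^-1: the fraction field is 1000...00
single_word = encode_single(1, -1, 1 << 22)
double_word = encode_double(1, -1, 1 << 51)
print(hex(single_word), hex(double_word))
```

The printed words are `0xbf400000` and `0xbfe8000000000000`, the hexadecimal forms of the single and double bit strings above.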

arithmetic.45 Addition and multiply algorithm issues
For addition (or subtraction), with X the operand with the smaller exponent:
(1) compute Ye - Xe (getting ready to align the binary points);
(2) right-shift Xm that many positions to form $X_m \times 2^{X_e - Y_e}$;
(3) compute $X_m \times 2^{X_e - Y_e} + Y_m$.
(4) For multiply, the doubly biased exponent must be corrected with an extra subtraction of the bias amount. Excess-8 example:
- Xe = 7 → 1111 = 15 = 7 + 8
- Ye = -3 → 0101 = 5 = -3 + 8
- Xe + Ye → 10100 = 20 = 4 + 8 + 8; subtracting 8 gives 12 = 4 + 8
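The bias correction in step (4) in a few lines of Python, using the slide's excess-8 example. The helper names are hypothetical.

```python
BIAS = 8  # excess-8 exponents, as in the slide's 4-bit example

def to_biased(e):
    """Stored (biased) form of an actual exponent e."""
    return e + BIAS

def multiply_exponents(xe_biased, ye_biased):
    """Biased exponent of a product.

    Adding two biased exponents counts the bias twice, so one bias is
    subtracted to return to excess-8 form.
    """
    return xe_biased + ye_biased - BIAS
```

`to_biased(7)` is 15 (1111) and `to_biased(-3)` is 5 (0101). Their raw sum is 20 (10100), and `multiply_exponents(15, 5)` returns 12, which is `to_biased(4)`.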

arithmetic.46 Floating point addition
Step 1: align, round
Step 2: add
Step 3: normalize; check for overflow or underflow
Step 4: round
Example: [Figure: worked addition example]
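A Python sketch of the addition steps on a toy binary format, not IEEE 754. Assumptions: positive operands only; a number is `(e, m)` with value $m / 2^P \times 2^e$ and a normalized significand $2^P \le m < 2^{P+1}$ (1.xxx with P = 3 fraction bits); no exponent range, so Step 3's overflow/underflow check is omitted; and rounding sends ties upward, which is a simplification (IEEE 754's default is ties-to-even). Step 1 is done exactly by shifting the larger significand left. That is equivalent to shifting the smaller one right while keeping every shifted-out bit, and it leaves all rounding to Step 4.

```python
from fractions import Fraction

P = 3  # fraction bits in the toy format (assumption)

def value(x):
    """Exact value of a toy float (e, m) = m / 2**P * 2**e."""
    e, m = x
    return Fraction(m, 1 << P) * Fraction(2) ** e

def fp_add(x, y):
    """Add two positive toy floats: align, add, normalize, round."""
    (ex, mx), (ey, my) = x, y
    if ex < ey:                      # make x the operand with the larger exponent
        (ex, mx), (ey, my) = (ey, my), (ex, mx)
    # Step 1: align the binary points (exactly, see the note above).
    d = ex - ey
    # Step 2: add; total is scaled by 2**(ey - P).
    total = (mx << d) + my
    e = ey
    # Step 3: normalize back to P + 1 significant bits. For two normalized
    # positive operands, extra is always >= 1.
    extra = total.bit_length() - (P + 1)
    # Step 4: round to nearest, ties up (simplification).
    total = (total + (1 << (extra - 1))) >> extra
    e += extra
    if total.bit_length() > P + 1:   # rounding carried into a new leading bit
        total >>= 1
        e += 1
    return e, total
```

For example, 1.5 + 0.75 is `fp_add((0, 0b1100), (-1, 0b1100))`, which gives `(1, 0b1001)`, i.e. $1.001_2 \times 2^1 = 2.25$. Adding $2^{-4}$ to 1.875 needs one more fraction bit than the format has, so the sum 1.9375 rounds to 2.0 and renormalizes in the final step.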

arithmetic.47 Floating point multiplication
Step 1: add the exponents and subtract the bias; multiply the mantissas
Step 2: normalize and check for overflow/underflow
Step 3: round
Step 4: determine the sign (sign of X XOR sign of Y)
Example: [Figure: worked multiplication example]
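The multiplication steps on a toy format, sketched in Python. Assumptions, none of them IEEE 754: a number is `(s, E, m)` with value $(-1)^s \times m / 2^P \times 2^{E - \text{BIAS}}$, P = 3 fraction bits, BIAS = 127, and a normalized significand $2^P \le m < 2^{P+1}$. Exponent range checks (Step 2's overflow/underflow test) are omitted, and ties round up as a simplification.

```python
from fractions import Fraction

P = 3       # fraction bits in the toy format (assumption)
BIAS = 127  # excess-127 exponent, as in single precision

def value(x):
    """Exact value of a toy float (s, E, m)."""
    s, E, m = x
    return (-1) ** s * Fraction(m, 1 << P) * Fraction(2) ** (E - BIAS)

def fp_mul(x, y):
    """Multiply two toy floats following the four steps on the slide."""
    (sx, Ex, mx), (sy, Ey, my) = x, y
    # Step 1: add exponents, subtract the bias once; multiply significands.
    E = Ex + Ey - BIAS
    prod = mx * my                       # 2P fraction bits; 1.0 <= product < 4.0
    # Step 2: normalize to P + 1 significant bits (extra is P or P + 1).
    extra = prod.bit_length() - (P + 1)
    E += extra - P                       # +1 when the product is >= 2.0
    # Step 3: round to nearest, ties up (simplification).
    m = (prod + (1 << (extra - 1))) >> extra
    if m.bit_length() > P + 1:           # rounding produced 10.000...
        m >>= 1
        E += 1
    # Step 4: the sign of the product.
    return sx ^ sy, E, m
```

For example, -1.5 × 2.5 is `fp_mul((1, 127, 0b1100), (0, 128, 0b1010))`, which gives `(1, 128, 0b1111)`, i.e. $-1.111_2 \times 2^1 = -3.75$. The single `- BIAS` in Step 1 is the correction for the doubly biased exponent.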

arithmetic.48 FP adder hardware
Much more complex than an integer adder.
Doing it in one clock cycle would take too long:
- much longer than integer operations
- a slower clock would penalize all instructions
An FP adder therefore usually takes several cycles and is pipelined.

arithmetic.49 FP adder hardware
[Figure: FP adder datapath divided into Step 1, Step 2, Step 3, Step 4]

arithmetic.50 Floating point: overflow & underflow
- Overflow: the exponent is too large to be represented.
- Underflow: the negative exponent is too small to fit in the exponent field.

arithmetic.51 Summary of floating point arithmetic
- IEEE floating point standard: 32 bit and 64 bit
- Converting decimal numbers to floating point and vice versa
- Overflow and underflow
- Floating point add and multiply
