331 W07.1Spring 2006 14:332:331 Computer Architecture and Assembly Language Spring 2006 Week 7 ALU Design [Adapted from Dave Patterson’s UCB CS152 slides.


1 331 W07.1 Spring 2006 14:332:331 Computer Architecture and Assembly Language, Spring 2006, Week 7: ALU Design [Adapted from Dave Patterson’s UCB CS152 slides and Mary Jane Irwin’s PSU CSE331 slides]

2 331 W07.2 Spring 2006 Heads Up
- This week’s material
  - MIPS logic and multiply instructions
    - Reading assignment: PH 3.1-3.4
  - MIPS ALU design
    - Reading assignment: PH B.5, B.6

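3 331 W07.3 Spring 2006 Review: MIPS Arithmetic Instructions

R-type: op | Rs | Rt | Rd | funct
I-type: op | Rs | Rt | Immed 16
(bit positions marked on the format diagram: 31, 25, 20, 15, 5, 0)

Type   op   funct
ADD    00   100000
ADDU   00   100001
SUB    00   100010
SUBU   00   100011
AND    00   100100
OR     00   100101
XOR    00   100110
NOR    00   100111

Type   op   funct
-      00   101000
-      00   101001
SLT    00   101010
SLTU   00   101011
-      00   101100

ALU operation codes (hex): 0 add, 1 addu, 2 sub, 3 subu, 4 and, 5 or, 6 xor, 7 nor, a slt, b sltu

- Expand immediates to 32 bits before the ALU
- 10 operations, so the operation can be encoded in 4 bits

[Figure: ALU with 32-bit inputs A and B, a 4-bit operation input m, a 32-bit result, and 1-bit zero and ovf outputs]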
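The 4-bit ALU codes listed above match the low four bits of each funct value in the table. A minimal Python sketch of that mapping follows; the function name `alu_op_from_funct` and the idea of deriving the code by masking funct are illustrative assumptions, not something the slide states.

```python
# ALU operation codes from the slide (hex digit -> mnemonic).
ALU_OPS = {
    0x0: "add", 0x1: "addu", 0x2: "sub", 0x3: "subu",
    0x4: "and", 0x5: "or", 0x6: "xor", 0x7: "nor",
    0xA: "slt", 0xB: "sltu",
}


def alu_op_from_funct(funct: int) -> str:
    """Map a 6-bit R-type funct field to an ALU operation name.

    Assumption: the 4-bit ALU code is simply funct[3:0].
    """
    return ALU_OPS[funct & 0xF]


if __name__ == "__main__":
    for funct in (0b100000, 0b100010, 0b101010):
        print(f"{funct:06b} -> {alu_op_from_funct(funct)}")
```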

4 331 W07.4 Spring 2006 Review: A 32-bit Adder/Subtractor
- Built out of 32 full adders (FAs)

[Figure: 32 one-bit FAs chained in ripple-carry fashion; inputs A0..A31, B0..B31 and add/subt; c0 = carry_in, c32 = carry_out; outputs S0..S31]

1-bit FA (inputs A, B, carry_in; outputs S, carry_out):
    S = A xor B xor carry_in
    carry_out = (A and B) or (A and carry_in) or (B and carry_in)    (majority function)

- Small but slow!
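The full-adder equations above can be chained into a ripple-carry adder/subtractor. The sketch below is one way to do it in Python. It assumes the add/subt signal inverts each B bit and also drives c0, which is the usual two's complement trick; the transcript does not show how the figure wires add/subt, so treat that detail as an assumption.

```python
def full_adder(a, b, carry_in):
    """1-bit full adder: sum is the XOR of the inputs, carry is their majority."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    return s, carry_out


def ripple_add_sub(a, b, subtract=0, width=32):
    """Add (subtract=0) or subtract (subtract=1) by rippling through `width` FAs.

    Assumption: subtraction is done as a + (not b) + 1, so `subtract`
    both inverts each B bit and supplies the initial carry c0.
    """
    carry = subtract
    result = 0
    for i in range(width):
        a_i = (a >> i) & 1
        b_i = ((b >> i) & 1) ^ subtract
        s, carry = full_adder(a_i, b_i, carry)
        result |= s << i
    return result, carry


if __name__ == "__main__":
    print(ripple_add_sub(5, 3))
    print(ripple_add_sub(5, 3, subtract=1))
```

The loop makes the "slow" part visible: bit i cannot be computed until the carry from bit i-1 is known.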

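5 331 W07.5 Spring 2006 Minimal Implementation of a Full Adder

    -- The entity declaration is not on the slide; it is assumed here so
    -- that the architecture below can be compiled on its own.
    library ieee;
    use ieee.std_logic_1164.all;

    entity full_adder is
      port (A, B, cin : in std_logic; S, cout : out std_logic);
    end full_adder;

    architecture concurrent_behavior of full_adder is
      signal t1, t2, t3, t4, t5 : std_logic;
    begin
      t1   <= not A after 1 ns;                              -- inverter
      t2   <= not cin after 1 ns;                            -- inverter
      t4   <= not((A or cin) and B) after 2 ns;              -- or-and-inverter
      t3   <= not((t1 or t2) and (A or cin)) after 2 ns;     -- A xnor cin
      t5   <= t3 nand B after 2 ns;                          -- 2-input nand
      S    <= not((B or t3) and t5) after 2 ns;              -- A xor B xor cin
      cout <= not((t1 or t2) and t4) after 2 ns;             -- majority(A, B, cin)
    end concurrent_behavior;

- Can you create the equivalent schematic? Can you determine the worst-case delay (the worst-case timing path through the circuit)?
- Gate library: inverters, 2-input nands, or-and-inverters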
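One way to check the gate network and explore the timing question is to evaluate it in software. The Python sketch below mirrors the seven signal assignments and tracks a settle time for each signal. It makes two assumptions: all inputs change at time 0, and the delay is the static longest path, ignoring whether a particular input pattern actually exercises that path. The function names are illustrative.

```python
from itertools import product


def full_adder_gates(a, b, cin):
    """Evaluate the gate network; return (S, cout, settle_S_ns, settle_cout_ns)."""
    def inv(x):
        return 1 - x

    t1, d1 = inv(a), 1                                    # inverter, 1 ns
    t2, d2 = inv(cin), 1                                  # inverter, 1 ns
    t4, d4 = inv((a | cin) & b), 2                        # or-and-inverter, 2 ns
    t3, d3 = inv((t1 | t2) & (a | cin)), max(d1, d2) + 2
    t5, d5 = inv(t3 & b), d3 + 2                          # nand, 2 ns
    s, ds = inv((b | t3) & t5), max(d3, d5) + 2
    cout, dc = inv((t1 | t2) & t4), max(d1, d2, d4) + 2
    return s, cout, ds, dc


def matches_full_adder():
    """Check all 8 input combinations against the sum and majority equations."""
    for a, b, cin in product((0, 1), repeat=3):
        s, cout, _, _ = full_adder_gates(a, b, cin)
        if s != a ^ b ^ cin or cout != (a & b) | (a & cin) | (b & cin):
            return False
    return True


if __name__ == "__main__":
    print(matches_full_adder(), full_adder_gates(1, 1, 1)[2:])
```

Under these assumptions the longest path runs through t1 (or t2), t3, t5 and S, for 1 + 2 + 2 + 2 = 7 ns, while cout settles after 4 ns.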

6 331 W07.6 Spring 2006 Logic Operations
- Logic operations operate on the individual bits of the operands.

    $t2 = 0…0 0000 1101 0000
    $t1 = 0…0 0011 1100 0000

    and $t0, $t1, $t2    $t0 =
    or  $t0, $t1, $t2    $t0 =
    xor $t0, $t1, $t2    $t0 =
    nor $t0, $t1, $t2    $t0 =

- How do we expand our FA design to handle the logic operations and, or, xor, nor?
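The slide leaves the four $t0 results blank as an exercise. A short Python sketch can compute them; it assumes 32-bit registers and that the elided upper bits of $t1 and $t2 are all zero, as the "0…0" prefix suggests.

```python
MASK32 = 0xFFFFFFFF

t1 = 0b0011_1100_0000   # low 12 bits of $t1; upper bits assumed zero
t2 = 0b0000_1101_0000   # low 12 bits of $t2; upper bits assumed zero

results = {
    "and": t1 & t2,
    "or": t1 | t2,
    "xor": t1 ^ t2,
    "nor": ~(t1 | t2) & MASK32,   # mask keeps the complement to 32 bits
}

if __name__ == "__main__":
    for name, value in results.items():
        print(f"{name:3} $t0 = {value:032b}")
```

Note that nor is the only one of the four that sets the upper 20 bits, since it complements the zero-filled part of the word.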

7 331 W07.7 Spring 2006 A Simple ALU Cell
[Figure: 1-bit ALU cell built around a 1-bit FA; inputs A, B, carry_in, add/subt, op; outputs result, carry_out]

8 331 W07.8 Spring 2006 An Alternative ALU Cell
[Figure: 1-bit ALU cell built around a 1-bit FA; inputs A, B, carry_in and select lines s2, s1, s0; outputs result, carry_out]

9 331 W07.9 Spring 2006 The Alternative ALU Cell’s Control Codes

s2  s1  s0  c_in  result      function
0   0   0   0     A           transfer A
0   0   0   1     A + 1       increment A
0   0   1   0     A + B       add
0   0   1   1     A + B + 1   add with carry
0   1   0   0     A - B - 1   subt with borrow
0   1   0   1     A - B       subtract
0   1   1   0     A - 1       decrement A
0   1   1   1     A           transfer A
1   0   0   x     A or B      or
1   0   1   x     A xor B     xor
1   1   0   x     A and B     and
1   1   1   x     !A          complement A
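The control-code table can be reproduced with a small word-level model. The Python sketch below assumes one plausible implementation: when s2 = 0, the lines s1 s0 pick the adder's second operand (0, B, not B, or all ones) and c_in is added on top. The transcript does not show the cell's internals, so this selection scheme and the name `alt_alu` are assumptions that happen to match every row of the table.

```python
def alt_alu(a, b, s2, s1, s0, c_in=0, width=32):
    """Word-level model of the alternative ALU cell's control codes."""
    mask = (1 << width) - 1
    select = (s1 << 1) | s0
    if s2 == 0:
        # Arithmetic half: second adder operand is 0, B, not B, or all ones.
        operand = [0, b & mask, ~b & mask, mask][select]
        return (a + operand + c_in) & mask
    # Logic half: c_in is a don't-care.
    return [a | b, a ^ b, a & b, ~a][select] & mask


if __name__ == "__main__":
    print(alt_alu(5, 3, 0, 1, 0, c_in=1))   # s2 s1 s0 = 010, c_in = 1: subtract
```

For example, "A - B" falls out as A + (not B) + 1, and "A - 1" as A plus the all-ones word.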

10 331 W07.10 Spring 2006 Tailoring the ALU to the MIPS ISA
- Need to support the set-on-less-than instruction (slt)
  - remember: slt is an arithmetic instruction
  - produces a 1 if rs < rt and 0 otherwise
  - use subtraction: (a - b) < 0 implies a < b
- Need to support test for equality (beq)
  - use subtraction: (a - b) = 0 implies a = b
- Need to add the overflow detection hardware

11 331 W07.11 Spring 2006 Modifying the ALU Cell for slt
[Figure: 1-bit ALU cell built around a 1-bit FA with an added less input; inputs A, B, carry_in, add/subt, op, less; outputs result, carry_out]

12 331 W07.12 Spring 2006 Modifying the ALU for slt
[Figure: 32 ALU cells for bits 0 to 31, each with inputs A_i, B_i and less and output result_i]
- First perform a subtraction
- Make the result 1 if the subtraction yields a negative result
- Make the result 0 if the subtraction yields a non-negative (zero or positive) result
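The three steps above can be sketched in Python as follows. The slide takes the sign bit of the difference as the slt result; the sketch additionally XORs that sign with the subtraction's overflow flag, which is a refinement added here (not stated on the slide) so the answer stays correct when a - b overflows. The function name `slt` and its parameters are illustrative.

```python
def slt(a, b, width=32):
    """Return 1 if a < b as signed two's complement values, else 0."""
    mask = (1 << width) - 1
    msb = 1 << (width - 1)
    a &= mask
    b &= mask
    diff = (a - b) & mask                  # step 1: perform the subtraction
    sign = 1 if diff & msb else 0          # sign bit of the difference
    # a - b overflows when a and b differ in sign and the result's sign
    # differs from a's sign (assumed refinement, see lead-in).
    overflow = 1 if ((a ^ b) & (a ^ diff) & msb) else 0
    return sign ^ overflow


if __name__ == "__main__":
    print(slt(3, 5), slt(5, 3), slt(0x7FFFFFFF, 0x80000000))
```

Without the overflow term, comparing the largest positive value with the most negative value would give the wrong answer, because the difference wraps around and appears negative.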

13 331 W07.13 Spring 2006 Modifying the ALU for Zero
[Figure: 32 ALU cells for bits 0 to 31, each with inputs A_i, B_i and less and output result_i, sharing the add/subt and op controls; labels 0, 0 and set appear on the less/set wiring]
- First perform subtraction
- Insert additional logic to detect when all result bits are zero

14 331 W07.14 Spring 2006 Review: Overflow Detection
- Overflow: the result is too large to represent in the number of bits allocated
- Overflow occurs when
  - adding two positives yields a negative
  - or, adding two negatives gives a positive
  - or, subtracting a negative from a positive gives a negative
  - or, subtracting a positive from a negative gives a positive
- On your own: prove you can detect overflow by:
  - Carry into MSB xor Carry out of MSB

    carries:  0 1 1 1            carries:  1 0 0 0
                0 1 1 1     7                1 1 0 0    -4
              + 0 0 1 1     3              + 1 0 1 1    -5
                1 0 1 0    -6                0 1 1 1     7
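The "carry into MSB xor carry out of MSB" rule can be tried out on the two 4-bit examples. The Python sketch below is illustrative only; the function name `add_with_overflow` and the default width of 4 are assumptions chosen to match the examples.

```python
def add_with_overflow(a, b, width=4):
    """Add two `width`-bit two's complement values.

    Returns (result, overflow), where overflow is the carry into the MSB
    XOR the carry out of the MSB.
    """
    mask = (1 << width) - 1
    low_mask = mask >> 1                     # every bit below the MSB
    a &= mask
    b &= mask
    carry_into_msb = ((a & low_mask) + (b & low_mask)) >> (width - 1)
    total = a + b
    carry_out_of_msb = total >> width
    return total & mask, carry_into_msb ^ carry_out_of_msb


if __name__ == "__main__":
    print(add_with_overflow(0b0111, 0b0011))   # 7 + 3
    print(add_with_overflow(0b1100, 0b1011))   # -4 + -5
```

In the first example the carry into the MSB is 1 and the carry out is 0; in the second it is the other way around. Either mismatch signals overflow.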

15 331 W07.15 Spring 2006 Modifying the ALU for Overflow
[Figure: 32 ALU cells for bits 0 to 31, each with inputs A_i, B_i and less and output result_i, sharing the add/subt and op controls; outputs zero and overflow; labels 0, 0 and set appear on the less/set wiring]
- Modify the most significant cell to determine the overflow output setting
- Disable overflow bit setting for unsigned arithmetic

16 331 W07.16 Spring 2006 Example:
When do the result outputs settle at their final values for the inputs:
    add/subt = 0, op = 000, A = 1111, B = 0001

17 331 W07.17 Spring 2006 Example: cont’d
When do the result outputs settle at their final values for the inputs:
    add/subt = 0, op = 100, A = 1111, B = 0001

18 331 W07.18 Spring 2006 Example: cont’d
When do the result outputs settle at their final values for the inputs:
    add/subt = 1, op = 101, A = 1111, B = 0001
What is the zero output for these inputs?

19 331 W07.19 Spring 2006 Example: cont’d
With the ALU design described in class, we assumed that a subtraction operation had to be performed as part of the beq instruction. When do the outputs settle? Is there a faster alternative?

20 331 W07.20 Spring 2006 But What about Performance?
- Critical path of an n-bit ripple-carry adder is n*CP, where CP is the carry-path delay through one 1-bit cell
- Design trick: throw hardware at it (carry lookahead)

[Figure: four 1-bit ALUs chained in ripple-carry fashion; cell i has inputs Ai, Bi, CarryIni and outputs Resulti, CarryOuti, for i = 0..3]

21 331 W07.21 Spring 2006 Fast carry using “infinite” hardware (Parallel)
- cout = b·cin + a·cin + a·b

    c1 = (b0 + a0)·c0 + a0·b0
       = a0·b0 + a0·c0 + b0·c0
    c2 = (b1 + a1)·c1 + a1·b1
       = (b1 + a1)·((b0 + a0)·c0 + a0·b0) + a1·b1
       = a1·a0·b0 + a1·a0·c0 + b1·a0·c0 + b1·a0·b0 + a1·b0·c0 + b1·b0·c0 + b1·a1
    c3 = a2·a1·a0·b0 + a2·a1·a0·c0 + a2·b1·a0·c0 + a2·b1·a0·b0 + a2·a1·b0·c0 + a2·b1·b0·c0 + a2·b1·a1 + …
    …

- Outputs settle much faster
  - D_c3 = 2*D_and + D_or (best case)
  - …
  - D_c31 = 5*D_and + D_or (best case)
- Problem: prohibitively expensive

22 331 W07.22 Spring 2006 Hierarchical Solution I
- Hierarchical solution I
  - Group 32 bits into 8 4-bit groups
  - Within each group, use carry lookahead
  - Use the 4-bit group as a building block, and connect the groups in ripple-carry fashion

23 331 W07.23 Spring 2006 First Level: Propagate and generate

    c_{i+1} = (a_i·b_i) + (a_i + b_i)·c_i
    g_i = a_i·b_i
    p_i = a_i + b_i
    c_{i+1} = g_i + p_i·c_i

- c_{i+1} = 1 if
  - g_i = 1, or
  - p_i and c_i = 1

    c1 = g0 + (p0·c0)
    c2 = g1 + (p1·g0) + (p1·p0·c0)
    c3 = g2 + (p2·g1) + (p2·p1·g0) + (p2·p1·p0·c0)
    c4 = g3 + (p3·g2) + (p3·p2·g1) + (p3·p2·p1·g0) + (p3·p2·p1·p0·c0)
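As a sanity check on the flattened equations, the Python sketch below computes c1..c4 directly from the generate and propagate signals and compares them with carries obtained by rippling the recurrence c_{i+1} = g_i + p_i·c_i. The function names are illustrative assumptions.

```python
def cla4_carries(a, b, c0):
    """Return [c1, c2, c3, c4] for 4-bit a, b using the flattened equations."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]      # g_i = a_i and b_i
    p = [((a >> i) | (b >> i)) & 1 for i in range(4)]    # p_i = a_i or b_i
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]


def ripple_carries(a, b, c0):
    """Reference: the same carries computed one bit at a time."""
    carries, c = [], c0
    for i in range(4):
        a_i, b_i = (a >> i) & 1, (b >> i) & 1
        c = (a_i & b_i) | ((a_i | b_i) & c)
        carries.append(c)
    return carries


if __name__ == "__main__":
    print(cla4_carries(0b1111, 0b0001, 0))
```

Each flattened carry depends only on the g, p and c0 inputs, which is what lets the hardware compute all four in parallel.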

24 331 W07.24 Spring 2006 Hierarchical Solution I (16 bit)
[Figure: 4-bit carry-lookahead ALUs connected in ripple-carry fashion; ALU0 takes A0..A3, B0..B3 and c0 = carry_in and produces result 0-3; ALU1 takes A4..A7, B4..B7 and c4 = carry_in and produces result 4-7; and so on]
Delay = 4 * Delay(4-bit carry-lookahead ALU)

25 331 W07.25 Spring 2006 Hierarchical Solution II
- Hierarchical solution I
  - Group 32 bits into 8 4-bit groups
  - Within each group, use carry lookahead
  - Use the 4-bit group as a building block, and connect the groups in ripple-carry fashion
- Hierarchical solution II
  - Group 32 bits into 8 4-bit groups
  - Within each group, use carry lookahead
  - Another level of carry lookahead is used to connect these 4-bit groups

26 331 W07.26 Spring 2006 Hierarchical Solution II
[Figure: four 4-bit ALUs (inputs A0..A3/B0..B3, A4..A7/B4..B7, A8..A11/B8..B11, A12..A15/B12..B15) producing result 0-3, result 4-7, result 8-11 and result 12-15; each sends its group signals (P0/G0 to P3/G3) to a carry-lookahead unit, which takes cin and returns C1, C2, C3 and cout]
- Input a0-a15, b0-b15
- Calculate P0-P3, G0-G3
- Calculate C1-C4
- Each 4-bit ALU calculates its results

27 331 W07.27 Spring 2006 Fast Carry using the second level abstraction

    P0 = p3·p2·p1·p0
    P1 = p7·p6·p5·p4
    P2 = p11·p10·p9·p8
    P3 = p15·p14·p13·p12

    G0 = g3 + (p3·g2) + (p3·p2·g1) + (p3·p2·p1·g0)
    G1 = g7 + (p7·g6) + (p7·p6·g5) + (p7·p6·p5·g4)
    G2 = g11 + (p11·g10) + (p11·p10·g9) + (p11·p10·p9·g8)
    G3 = g15 + (p15·g14) + (p15·p14·g13) + (p15·p14·p13·g12)

    C1 = G0 + (P0·c0)
    C2 = G1 + (P1·G0) + (P1·P0·c0)
    C3 = G2 + (P2·G1) + (P2·P1·G0) + (P2·P1·P0·c0)
    C4 = G3 + (P3·G2) + (P3·P2·G1) + (P3·P2·P1·G0) + (P3·P2·P1·P0·c0)
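The second-level equations can be exercised with a small 16-bit model. In the Python sketch below, each 4-bit slice reports its group propagate and generate, the lookahead unit evaluates C1..C4 from them, and each slice then adds its own bits with the carry it was handed. The function names are illustrative, and Python's `+` stands in for the 4-bit ALU inside each slice.

```python
def group_pg(a4, b4):
    """Group propagate P and group generate G for one 4-bit slice."""
    g = [(a4 >> i) & (b4 >> i) & 1 for i in range(4)]
    p = [((a4 >> i) | (b4 >> i)) & 1 for i in range(4)]
    P = p[3] & p[2] & p[1] & p[0]
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    return P, G


def add16_lookahead(a, b, c0=0):
    """16-bit add using the second-level carry equations; returns (sum, cout)."""
    slices = [((a >> (4 * k)) & 0xF, (b >> (4 * k)) & 0xF) for k in range(4)]
    P, G = zip(*(group_pg(x, y) for x, y in slices))
    C1 = G[0] | (P[0] & c0)
    C2 = G[1] | (P[1] & G[0]) | (P[1] & P[0] & c0)
    C3 = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & P[0] & c0)
    C4 = (G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
          | (P[3] & P[2] & P[1] & G[0]) | (P[3] & P[2] & P[1] & P[0] & c0))
    result = 0
    for k, carry_in in enumerate((c0, C1, C2, C3)):
        x, y = slices[k]
        result |= ((x + y + carry_in) & 0xF) << (4 * k)   # slice k's 4-bit sum
    return result, C4


if __name__ == "__main__":
    print(add16_lookahead(0x00FF, 0x0001))
```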

28 331 W07.28 Spring 2006 Shift Operations
- Also need operations to pack and unpack 8-bit characters into 32-bit words
- Shifts move all the bits in a word left or right

    sll $t2, $s0, 8    # $t2 = $s0 << 8 bits
    srl $t2, $s0, 8    # $t2 = $s0 >> 8 bits

- Such shifts are logical because they fill with zeros

         op      rs     rt     rd     shamt  funct
    sll  000000  00000  10000  01010  01000  000000
    srl  000000  00000  10000  01010  01000  000010

29 331 W07.29 Spring 2006 Shift Operations, con’t
- An arithmetic shift (sra) maintains the arithmetic correctness of the shifted value (i.e., a number shifted right one bit should be ½ of its original value; a number shifted left should be 2 times its original value)
  - so sra uses the most significant bit (sign bit) as the bit shifted in
  - note that there is no need for a sla when using two’s complement number representation

    sra $t2, $s0, 8    # $t2 = $s0 >> 8 bits

- The shift operation is implemented by hardware (usually a barrel shifter) outside the ALU

         op      rs     rt     rd     shamt  funct
    sra  000000  00000  10000  01010  01000  000011
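The difference between the logical and arithmetic right shifts shows up only when the sign bit is set. The Python sketch below models sll, srl and sra on 32-bit words; the function names mirror the MIPS mnemonics, and the 32-bit masking is needed because Python integers are unbounded.

```python
MASK32 = 0xFFFFFFFF


def sll(x, shamt):
    """Shift left logical: zeros enter from the right."""
    return (x << shamt) & MASK32


def srl(x, shamt):
    """Shift right logical: zeros enter from the left."""
    return (x & MASK32) >> shamt


def sra(x, shamt):
    """Shift right arithmetic: copies of the sign bit enter from the left."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32          # reinterpret as negative so >> sign-extends
    return (x >> shamt) & MASK32


if __name__ == "__main__":
    for f in (sll, srl, sra):
        print(f.__name__, f"{f(0x80000000, 8):08x}")
```

Applying sra by one bit to the word that encodes -8 gives the word that encodes -4, which is the "half the original value" property mentioned above.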

30 331 W07.30 Spring 2006 Multiplication
- More complicated than addition
  - accomplished via shifting and addition

            0010    (multiplicand)
          x 1011    (multiplier)
            ----
            0010
           0010     (partial product
          0000       array)
         0010
        --------
        00010110    (product)

- Double precision product produced
- More time and more area to compute
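The partial-product array above corresponds to a simple shift-and-add loop. The Python sketch below handles unsigned operands only; the function name and the `width` parameter are illustrative assumptions.

```python
def shift_add_multiply(multiplicand, multiplier, width=4):
    """Unsigned multiply: add a shifted multiplicand for each 1 bit of the multiplier."""
    product = 0
    for i in range(width):
        if (multiplier >> i) & 1:
            product += multiplicand << i    # row i of the partial-product array
    return product


if __name__ == "__main__":
    print(f"{shift_add_multiply(0b0010, 0b1011):08b}")
```

Two width-bit operands can produce a result of up to 2*width bits, which is why the product is double precision.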

31 331 W07.31 Spring 2006 MIPS Multiply Instruction

    mult $s0, $s1    # hi||lo = $s0 * $s1

          op      rs     rt     rd     shamt  funct
    mult  000000  10000  10001  00000  00000  011000

- The low-order word of the product is left in processor register lo and the high-order word is left in register hi
- Instructions mfhi rd and mflo rd are provided to move the product to (user accessible) registers in the register file
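To make the hi/lo split concrete, the Python sketch below models a signed 32-bit by 32-bit multiply and returns the two halves of the 64-bit product as unsigned words. The helper names are illustrative; the sketch models only the arithmetic result, not the instruction's timing or the register file.

```python
def to_signed32(x):
    """Interpret the low 32 bits of x as a two's complement value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def mult(rs, rt):
    """Signed multiply; returns (hi, lo), the high and low words of the product."""
    product = (to_signed32(rs) * to_signed32(rt)) & 0xFFFFFFFFFFFFFFFF
    return product >> 32, product & 0xFFFFFFFF


if __name__ == "__main__":
    hi, lo = mult(0x00010000, 0x00010000)
    print(f"hi = {hi:08x}, lo = {lo:08x}")
```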

32 331 W07.32 Spring 2006 Review: MIPS ISA, so far

Category: Arithmetic (R & I format)
Instr               Op Code    Example               Meaning
add                 0 and 32   add $s1, $s2, $s3     $s1 = $s2 + $s3
add unsigned        0 and 33   addu $s1, $s2, $s3    $s1 = $s2 + $s3
subtract            0 and 34   sub $s1, $s2, $s3     $s1 = $s2 - $s3
subt unsigned       0 and 35   subu $s1, $s2, $s3    $s1 = $s2 - $s3
add immediate       8          addi $s1, $s2, 6      $s1 = $s2 + 6
add imm. unsigned   9          addiu $s1, $s2, 6     $s1 = $s2 + 6
multiply            0 and 24   mult $s1, $s2         hi || lo = $s1 * $s2
multiply unsigned   0 and 25   multu $s1, $s2        hi || lo = $s1 * $s2
divide              0 and 26   div $s1, $s2          lo = $s1/$s2, rem. in hi
divide unsigned     0 and 27   divu $s1, $s2         lo = $s1/$s2, rem. in hi

Category: Logical (R & I format)
Instr               Op Code    Example               Meaning
and                 0 and 36   and $s1, $s2, $s3     $s1 = $s2 & $s3
or                  0 and 37   or $s1, $s2, $s3      $s1 = $s2 | $s3
xor                 0 and 38   xor $s1, $s2, $s3     $s1 = $s2 xor $s3
nor                 0 and 39   nor $s1, $s2, $s3     $s1 = !($s2 | $s3)
and immediate       12         andi $s1, $s2, 6      $s1 = $s2 & 6
or immediate        13         ori $s1, $s2, 6       $s1 = $s2 | 6
xor immediate       14         xori $s1, $s2, 6      $s1 = $s2 xor 6

33 331 W07.33 Spring 2006 Review: MIPS ISA, so far con’t

Category: Shift (R format)
Instr               Op Code    Example               Meaning
sll                 0 and 0    sll $s1, $s2, 4       $s1 = $s2 << 4
srl                 0 and 2    srl $s1, $s2, 4       $s1 = $s2 >> 4
sra                 0 and 3    sra $s1, $s2, 4       $s1 = $s2 >> 4

Category: Data Transfer (I format)
Instr               Op Code    Example               Meaning
load word           35         lw $s1, 24($s2)       $s1 = Memory($s2+24)
store word          43         sw $s1, 24($s2)       Memory($s2+24) = $s1
load byte           32         lb $s1, 25($s2)       $s1 = Memory($s2+25)
load byte unsigned  36         lbu $s1, 25($s2)      $s1 = Memory($s2+25)
store byte          40         sb $s1, 25($s2)       Memory($s2+25) = $s1
load upper imm      15         lui $s1, 6            $s1 = 6 * 2^16
move from hi        0 and 16   mfhi $s1              $s1 = hi
move to hi          0 and 17   mthi $s1              hi = $s1
move from lo        0 and 18   mflo $s1              $s1 = lo
move to lo          0 and 19   mtlo $s1              lo = $s1

34 331 W07.34 Spring 2006 Review: MIPS ISA, so far con’t

Category: Cond. Branch (I & R format)
Instr                            Op Code    Example               Meaning
br on equal                      4          beq $s1, $s2, L       if ($s1==$s2) go to L
br on not equal                  5          bne $s1, $s2, L       if ($s1 !=$s2) go to L
set on less than                 0 and 42   slt $s1, $s2, $s3     if ($s2<$s3) $s1=1 else $s1=0
set on less than unsigned        0 and 43   sltu $s1, $s2, $s3    if ($s2<$s3) $s1=1 else $s1=0
set on less than immediate       10         slti $s1, $s2, 6      if ($s2<6) $s1=1 else $s1=0
set on less than imm. unsigned   11         sltiu $s1, $s2, 6     if ($s2<6) $s1=1 else $s1=0

Category: Uncond. Jump (J & R format)
Instr                            Op Code    Example               Meaning
jump                             2          j 2500                go to 10000
jump and link                    3          jal 2500              go to 10000; $ra=PC+4
jump register                    0 and 8    jr $s1                go to $s1
jump and link reg                0 and 9    jalr $s1, $s2         go to $s1, $s2=PC+4

