14:332:331 Computer Architecture and Assembly Language, Spring 2006, Week 7: ALU Design


331 W07.1 Spring 2006 14:332:331 Computer Architecture and Assembly Language Spring 2006 Week 7 ALU Design [Adapted from Dave Patterson’s UCB CS152 slides and Mary Jane Irwin’s PSU CSE331 slides]

331 W07.2 Spring 2006 Heads Up
- This week’s material
  - MIPS logic and multiply instructions
    - Reading assignment: PH
  - MIPS ALU design
    - Reading assignment: PH B.5, B.6

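331 W07.3 Spring 2006 Review: MIPS Arithmetic Instructions
R-type: op | Rs | Rt | Rd | funct
I-type: op | Rs | Rt | Immed 16
Instructions listed with their op and funct fields: ADD, ADDU, SUB, SUBU, AND, OR, XOR, NOR, SLT, SLTU
ALU operation codes (hex): 0 add, 1 addu, 2 sub, 3 subu, 4 and, 5 or, 6 xor, 7 nor, a slt, b sltu
- expand immediates to 32 bits before ALU
- 10 operations so can encode in 4 bits
[Figure: ALU with 32-bit inputs A and B, 4-bit operation input m, 32-bit result output, and 1-bit zero and ovf outputs]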

331 W07.4 Spring 2006 Review: A 32-bit Adder/Subtractor
- Built out of 32 full adders (FAs)
[Figure: chain of 32 1-bit FAs with inputs A0..A31, B0..B31 and add/subt; c0 = carry_in, intermediate carries c1, c2, c3, ..., c32 = carry_out; sum outputs S0..S31]
1-bit FA: inputs A, B, carry_in; outputs S, carry_out
    S = A xor B xor carry_in
    carry_out = (A and B) or (A and carry_in) or (B and carry_in)   (majority function)
- Small but slow!
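
The full-adder equations above can be exercised in software. The following is a minimal Python sketch, not the course's implementation: the function names are hypothetical, and it assumes the usual arrangement in which the add/subt signal both inverts each B bit and drives c0.

```python
def full_adder(a, b, carry_in):
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)  # majority function
    return s, carry_out


def ripple_add_sub(a, b, subtract=0, width=32):
    """Add (subtract=0) or subtract (subtract=1) two width-bit values.

    Assumption: add/subt is XORed into every B bit and also fed to c0,
    so subtraction is computed as A + not(B) + 1.
    """
    carry = subtract                      # c0 = carry_in
    result = 0
    for i in range(width):                # the carry ripples from bit 0 upward
        a_i = (a >> i) & 1
        b_i = ((b >> i) & 1) ^ subtract   # invert B when subtracting
        s, carry = full_adder(a_i, b_i, carry)
        result |= s << i
    return result, carry                  # carry is c_width = carry_out


if __name__ == "__main__":
    print(ripple_add_sub(5, 3))               # addition
    print(ripple_add_sub(5, 3, subtract=1))   # subtraction
```

The loop makes the "small but slow" point concrete: bit i cannot produce its sum until the carry from bit i-1 is known.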

331 W07.5 Spring 2006 Minimal Implementation of a Full Adder
    architecture concurrent_behavior of full_adder is
      signal t1, t2, t3, t4, t5: std_logic;
    begin
      t1   <= not A after 1 ns;                                -- inverter
      t2   <= not cin after 1 ns;                              -- inverter
      t4   <= not((A or cin) and B) after 2 ns;                -- or-and-invert
      t3   <= not((t1 or t2) and (A or cin)) after 2 ns;       -- A xnor cin
      t5   <= t3 nand B after 2 ns;                            -- 2-input nand
      S    <= not((B or t3) and t5) after 2 ns;                -- A xor B xor cin
      cout <= not((t1 or t2) and t4) after 2 ns;               -- majority(A, B, cin)
    end concurrent_behavior;
- Can you create the equivalent schematic? Can you determine worst case delay (the worst case timing path through the circuit)?
- Gate library: inverters, 2-input nands, or-and-inverters
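
One way to approach the two questions on this slide is to simulate the gate network. The Python sketch below is an illustration under stated assumptions: it reads the cout assignment as not((t1 or t2) and t4), and it estimates worst-case delay topologically (latest input arrival plus gate delay), ignoring false paths. The helper names are hypothetical.

```python
from itertools import product


def netlist(a, b, cin):
    """Evaluate the slide's gate network.

    Every signal is a pair (logic level, settle time in ns); primary
    inputs are assumed to be stable at time 0.
    """
    def gate(fn, delay, *inputs):
        level = fn(*(v for v, _ in inputs))
        return level, max(t for _, t in inputs) + delay

    def oai(x, y, z):
        """or-and-invert: not((x or y) and z)."""
        return 1 - ((x | y) & z)

    A, B, C = (a, 0), (b, 0), (cin, 0)
    t1 = gate(lambda x: 1 - x, 1, A)
    t2 = gate(lambda x: 1 - x, 1, C)
    t4 = gate(oai, 2, A, C, B)
    t3 = gate(lambda p, q, x, y: 1 - ((p | q) & (x | y)), 2, t1, t2, A, C)
    t5 = gate(lambda x, y: 1 - (x & y), 2, t3, B)
    s = gate(oai, 2, B, t3, t5)
    cout = gate(oai, 2, t1, t2, t4)   # assumed parenthesization
    return s, cout


if __name__ == "__main__":
    for a, b, cin in product((0, 1), repeat=3):
        (s, s_time), (cout, cout_time) = netlist(a, b, cin)
        print(a, b, cin, "->", "S =", s, "cout =", cout)
    print("S settles by", s_time, "ns; cout settles by", cout_time, "ns")
```

Under these assumptions the longest path runs through an inverter, then t3, t5 and S (1 + 2 + 2 + 2 ns), while cout settles earlier through t4.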

331 W07.6 Spring 2006 Logic Operations
- Logic operations operate on individual bits of the operand.
    $t2 = 0…
    $t1 = 0…
    and $t0, $t1, $t2    $t0 =
    or  $t0, $t1, $t2    $t0 =
    xor $t0, $t1, $t2    $t0 =
    nor $t0, $t1, $t2    $t0 =
- How do we expand our FA design to handle the logic operations (and, or, xor, nor)?
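
To see what the four logic instructions compute, here is a small Python sketch that models them on 32-bit words. The operand values are illustrative assumptions chosen for this example (the slide's own register contents did not survive in the transcript), and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # keep results to 32 bits, like a MIPS register


def mips_and(a, b):
    return a & b & MASK32


def mips_or(a, b):
    return (a | b) & MASK32


def mips_xor(a, b):
    return (a ^ b) & MASK32


def mips_nor(a, b):
    # Python's ~ yields a negative int, so mask back to 32 bits.
    return ~(a | b) & MASK32


if __name__ == "__main__":
    t1 = 0x00003C00  # illustrative value, not from the slide
    t2 = 0x00000DC0  # illustrative value, not from the slide
    for name, op in (("and", mips_and), ("or", mips_or),
                     ("xor", mips_xor), ("nor", mips_nor)):
        print(f"{name:3s} $t0 = 0x{op(t1, t2):08X}")
```

Each result bit depends only on the two operand bits in the same position, which is why these operations can be added to every 1-bit ALU cell without any carry logic.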

331 W07.7 Spring 2006 A Simple ALU Cell
[Figure: 1-bit FA cell with inputs A, B, carry_in and add/subt; output carry_out; op selects the result output]

331 W07.8 Spring 2006 An Alternative ALU Cell
[Figure: 1-bit FA cell with inputs A, B, carry_in and control inputs s2, s1, s0; outputs result and carry_out]

331 W07.9 Spring 2006 The Alternative ALU Cell’s Control Codes
    s2  s1  s0  c_in  result       function
    0   0   0   0     A            transfer A
    0   0   0   1     A + 1        increment A
    0   0   1   0     A + B        add
    0   0   1   1     A + B + 1    add with carry
    0   1   0   0     A - B - 1    subt with borrow
    0   1   0   1     A - B        subtract
    0   1   1   0     A - 1        decrement A
    0   1   1   1     A            transfer A
    1   0   0   x     A or B       or
    1   0   1   x     A xor B      xor
    1   1   0   x     A and B      and
    1   1   1   x     !A           complement A
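
The control-code table can be checked with a word-level model. The Python sketch below is one possible reading, not the slide's circuit: it assumes that in arithmetic mode (s2 = 0) the bits s1 s0 select the second adder operand as 0, B, not(B), or all ones, which reproduces every arithmetic row of the table. The function name and the 4-bit default width are hypothetical.

```python
def alt_alu(a, b, s2, s1, s0, c_in=0, width=4):
    """Word-level model of the alternative ALU cell's control codes."""
    mask = (1 << width) - 1
    select = (s1 << 1) | s0
    if s2 == 0:
        # Arithmetic mode: result = A + operand + c_in (assumed structure).
        operand = (0, b, ~b & mask, mask)[select]
        return (a + operand + c_in) & mask
    # Logic mode: c_in is a don't-care.
    return (a | b, a ^ b, a & b, ~a & mask)[select] & mask


if __name__ == "__main__":
    a, b = 0b0101, 0b0011
    for s2 in (0, 1):
        for s1 in (0, 1):
            for s0 in (0, 1):
                for c_in in ((0, 1) if s2 == 0 else (0,)):
                    r = alt_alu(a, b, s2, s1, s0, c_in)
                    print(s2, s1, s0, c_in, format(r, "04b"))
```

For example, code 010 with c_in = 1 computes A + not(B) + 1, which is A - B in two's complement; with c_in = 0 the same code gives A - B - 1.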

331 W07.10 Spring 2006 Tailoring the ALU to the MIPS ISA
- Need to support the set-on-less-than instruction (slt)
  - remember: slt is an arithmetic instruction
  - produces a 1 if rs < rt and 0 otherwise
  - use subtraction: (a - b) < 0 implies a < b
- Need to support test for equality (beq)
  - use subtraction: (a - b) = 0 implies a = b
- Need to add the overflow detection hardware

331 W07.11 Spring 2006 Modifying the ALU Cell for slt
[Figure: 1-bit FA cell with inputs A, B, carry_in, add/subt, op and less; outputs result and carry_out]

331 W07.12 Spring 2006 Modifying the ALU for slt
[Figure: 32 ALU cells, bit 0 through bit 31, each with inputs A, B and less and a result output]
- First perform a subtraction
- Make the result 1 if the subtraction yields a negative result
- Make the result 0 if the subtraction yields a non-negative result
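
The subtract-then-test-the-sign idea can be sketched in Python as follows. This is an illustration, not the slide's hardware: the function name is hypothetical, and the XOR with the overflow signal is an added refinement (an assumption beyond the three bullets above) so that the answer stays correct when a - b overflows.

```python
def slt(a, b, width=32):
    """Return 1 if a < b as signed width-bit values, else 0."""
    mask = (1 << width) - 1
    msb = width - 1
    diff = (a + (~b & mask) + 1) & mask      # a - b in two's complement
    sign = diff >> msb                       # sign bit of the difference
    a_sign, b_sign = (a >> msb) & 1, (b >> msb) & 1
    # Subtraction overflows when the operand signs differ and the
    # result's sign differs from a's sign.
    overflow = (a_sign ^ b_sign) & (a_sign ^ sign)
    return sign ^ overflow


if __name__ == "__main__":
    print(slt(3, 5))                    # 3 < 5
    print(slt(5, 3))                    # 5 < 3
    print(slt(0xFFFFFFFF, 0))           # -1 < 0
    print(slt(0x7FFFFFFF, 0x80000000))  # largest positive < most negative
```

Without the overflow correction the last case would report 1, because the raw difference has its sign bit set.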

331 W07.13 Spring 2006 Modifying the ALU for Zero
[Figure: 32 ALU cells, bit 0 through bit 31, each with inputs A, B and less and a result output; shared add/subt and op controls; set output from the bit 31 cell]
- First perform subtraction
- Insert additional logic to detect when all result bits are zero

331 W07.14 Spring 2006 Review: Overflow Detection
- Overflow: the result is too large to represent in the number of bits allocated
- Overflow occurs when
  - adding two positives yields a negative
  - or, adding two negatives gives a positive
  - or, subtracting a negative from a positive gives a negative
  - or, subtracting a positive from a negative gives a positive
- On your own: Prove you can detect overflow by:
  - Carry into MSB xor Carry out of MSB
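
The "on your own" claim can at least be checked exhaustively for a small word size before attempting a proof. The Python sketch below does this for 4-bit addition; the function name and the choice of width are assumptions made for the example.

```python
def add_with_overflow(a, b, width=4):
    """Add two width-bit values; return (result, overflow flag).

    The flag is (carry into the MSB) xor (carry out of the MSB).
    """
    mask = (1 << width) - 1
    low_mask = mask >> 1                       # every bit below the MSB
    carry_into_msb = ((a & low_mask) + (b & low_mask)) >> (width - 1)
    total = a + b
    carry_out_of_msb = total >> width
    return total & mask, carry_into_msb ^ carry_out_of_msb


def to_signed(x, width=4):
    """Interpret a width-bit pattern as a two's complement integer."""
    return x - (1 << width) if x >> (width - 1) else x


if __name__ == "__main__":
    mismatches = 0
    for a in range(16):
        for b in range(16):
            _, flag = add_with_overflow(a, b)
            true_sum = to_signed(a) + to_signed(b)
            really_overflows = not (-8 <= true_sum <= 7)
            mismatches += flag != int(really_overflows)
    print("mismatches:", mismatches)
```

Subtraction is covered by the same check, since the ALU performs a - b as a + not(b) + 1.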

331 W07.15 Spring 2006 Modifying the ALU for Overflow
[Figure: 32 ALU cells, bit 0 through bit 31, each with inputs A, B and less and a result output; shared add/subt and op controls; zero output; set and overflow outputs from the bit 31 cell]
- Modify the most significant cell to determine overflow output setting
- Disable overflow bit setting for unsigned arithmetic

331 W07.16 Spring 2006 Example: When do the result outputs settle at their final values for the inputs: add/subt = 0, op = 000, A = 1111, B = 0001

331 W07.17 Spring 2006 Example: cont’d When do the result outputs settle at their final values for the inputs: add/subt = 0, op = 100, A = 1111, B = 0001

331 W07.18 Spring 2006 Example: cont’d When do the result outputs settle at their final values for the inputs: add/subt = 1, op = 101, A = 1111, B = 0001. What is the zero output for these inputs?

331 W07.19 Spring 2006 Example: cont’d With the ALU design described in class, we assumed that a subtraction operation had to be performed as part of the beq instruction. When do the outputs settle? Is there a faster alternative?

331 W07.20 Spring 2006 But What about Performance?
- Critical path of n-bit ripple-carry adder is n*CP
- Design trick: throw hardware at it (Carry Lookahead)
[Figure: four 1-bit ALUs in a ripple chain; the cell for bit i has inputs Ai, Bi, CarryIn i and outputs Result i, CarryOut i, for i = 0 to 3]

331 W07.21 Spring 2006 Fast carry using “infinite” hardware (Parallel)
- cout = b cin + a cin + a b
    c1 = (b0 + a0) c0 + a0 b0
       = a0 b0 + a0 c0 + b0 c0
    c2 = (b1 + a1) c1 + a1 b1
       = (b1 + a1)((b0 + a0) c0 + a0 b0) + a1 b1
       = a1 a0 b0 + a1 a0 c0 + b1 a0 c0 + b1 a0 b0 + a1 b0 c0 + b1 b0 c0 + b1 a1
    c3 = a2 a1 a0 b0 + a2 a1 a0 c0 + a2 b1 a0 c0 + a2 b1 a0 b0 + a2 a1 b0 c0 + a2 b1 b0 c0 + a2 b1 a1 + …
    …
- Outputs settle much faster
  - D_c3 = 2*D_and + D_or (best case)
  - …
  - D_c31 = 5*D_and + D_or (best case)
- Problem: Prohibitively expensive

331 W07.22 Spring 2006 Hierarchical Solution I
- Hierarchical solution I
  - Group 32 bits into 8 4-bit groups
  - Within each group, use carry look ahead
  - Use 4-bit as a building block, and connect them in ripple carry fashion.

331 W07.23 Spring 2006 First Level: Propagate and generate
    ci+1 = (ai bi) + (ai + bi) ci
    gi = ai bi
    pi = (ai + bi)
    ci+1 = gi + pi ci
- ci+1 = 1 if
  - gi = 1, or
  - pi and ci = 1
- Expanding the recurrence:
    c1 = g0 + (p0 c0)
    c2 = g1 + (p1 g0) + (p1 p0 c0)
    c3 = g2 + (p2 g1) + (p2 p1 g0) + (p2 p1 p0 c0)
    c4 = g3 + (p3 g2) + (p3 p2 g1) + (p3 p2 p1 g0) + (p3 p2 p1 p0 c0)
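
The propagate/generate equations translate directly into code. The following Python sketch of a 4-bit carry-lookahead adder uses the slide's definitions (gi = ai and bi, pi = ai or bi) and the expanded carry equations; the function name is hypothetical.

```python
def cla4(a, b, c0=0):
    """4-bit carry-lookahead addition; returns (4-bit sum, c4)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]      # generate
    p = [((a >> i) | (b >> i)) & 1 for i in range(4)]    # propagate
    # Every carry is a two-level function of g, p and c0 only.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    total = 0
    for i in range(4):
        total |= (((a >> i) ^ (b >> i) ^ carries[i]) & 1) << i
    return total, c4


if __name__ == "__main__":
    print(cla4(0b1111, 0b0001))
    print(cla4(0b0101, 0b0011, c0=1))
```

Unlike the ripple-carry loop, no carry here waits on another carry: c1 through c4 are all computed from the inputs in parallel.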

331 W07.24 Spring 2006 Hierarchical Solution I (16 bit)
[Figure: 4-bit ALU0 (inputs A0-A3, B0-B3, c0 = carry_in, output result 0-3) rippling into 4-bit ALU1 (inputs A4-A7, B4-B7, c4 = carry_in, output result 4-7), and so on]
Delay = 4 * Delay(4-bit carry look-ahead ALU)

331 W07.25 Spring 2006 Hierarchical Solution II
- Hierarchical solution I
  - Group 32 bits into 8 4-bit groups
  - Within each group, use carry look ahead
  - Use 4-bit as a building block, and connect them in ripple carry fashion.
- Hierarchical solution II
  - Group 32 bits into 8 4-bit groups
  - Within each group, use carry look ahead
  - Another level of carry look ahead is used to connect these 4-bit groups

331 W07.26 Spring 2006 Hierarchical Solution II
[Figure: four 4-bit ALUs (A0-A3/B0-B3, A4-A7/B4-B7, A8-A11/B8-B11, A12-A15/B12-B15) producing result 0-3, result 4-7, result 8-11 and result 12-15, plus group signals P0/G0 through P3/G3; these feed a carry-lookahead unit with input cin and outputs C1, C2, C3 and cout]
- input a0-a15, b0-b15
- calculate P0-P3, G0-G3
- calculate C1-C4
- each 4-bit ALU calculates its results

331 W07.27 Spring 2006 Fast Carry using the second level abstraction
- Group propagate:
    P0 = p3.p2.p1.p0
    P1 = p7.p6.p5.p4
    P2 = p11.p10.p9.p8
    P3 = p15.p14.p13.p12
- Group generate:
    G0 = g3 + (p3.g2) + (p3.p2.g1) + (p3.p2.p1.g0)
    G1 = g7 + (p7.g6) + (p7.p6.g5) + (p7.p6.p5.g4)
    G2 = g11 + (p11.g10) + (p11.p10.g9) + (p11.p10.p9.g8)
    G3 = g15 + (p15.g14) + (p15.p14.g13) + (p15.p14.p13.g12)
- Group carries:
    C1 = G0 + (P0 c0)
    C2 = G1 + (P1 G0) + (P1 P0 c0)
    C3 = G2 + (P2 G1) + (P2 P1 G0) + (P2 P1 P0 c0)
    C4 = G3 + (P3 G2) + (P3 P2 G1) + (P3 P2 P1 G0) + (P3 P2 P1 P0 c0)
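
A 16-bit version of the second-level scheme can be sketched in Python as below. The names are hypothetical, and one simplification is assumed: inside each 4-bit group the sum is computed with ordinary integer addition, standing in for the group's own carry-lookahead ALU, so the sketch focuses on the group-level P, G and C signals.

```python
def group_pg(a4, b4):
    """Group propagate P and group generate G for one 4-bit slice."""
    g = [(a4 >> i) & (b4 >> i) & 1 for i in range(4)]
    p = [((a4 >> i) | (b4 >> i)) & 1 for i in range(4)]
    P = p[3] & p[2] & p[1] & p[0]
    G = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
         | (p[3] & p[2] & p[1] & g[0]))
    return P, G


def cla16(a, b, c0=0):
    """16-bit addition with second-level carry lookahead; returns (sum, C4)."""
    slices = [((a >> 4 * k) & 0xF, (b >> 4 * k) & 0xF) for k in range(4)]
    P, G = zip(*(group_pg(x, y) for x, y in slices))
    # Group carries depend only on P, G and c0, not on one another.
    C1 = G[0] | (P[0] & c0)
    C2 = G[1] | (P[1] & G[0]) | (P[1] & P[0] & c0)
    C3 = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & P[0] & c0)
    C4 = (G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
          | (P[3] & P[2] & P[1] & G[0]) | (P[3] & P[2] & P[1] & P[0] & c0))
    result = 0
    for k, ((x, y), cin) in enumerate(zip(slices, (c0, C1, C2, C3))):
        # Stand-in for the 4-bit CLA ALU of group k (assumption, see lead-in).
        result |= ((x + y + cin) & 0xF) << (4 * k)
    return result, C4


if __name__ == "__main__":
    print(cla16(0x0FFF, 0x0001))
    print(cla16(0xFFFF, 0x0001))
```

The order of evaluation mirrors the steps listed for the carry-lookahead unit: compute P0-P3 and G0-G3, then C1-C4, then let each 4-bit ALU produce its results.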

331 W07.28 Spring 2006 Shift Operations
- Also need operations to pack and unpack 8-bit characters into 32-bit words
- Shifts move all the bits in a word left or right
    sll $t2, $s0, 8    # $t2 = $s0 << 8 bits
    srl $t2, $s0, 8    # $t2 = $s0 >> 8 bits
    Instruction format: op | rs | rt | rd | shamt | funct
- Such shifts are logical because they fill with zeros

331 W07.29 Spring 2006 Shift Operations, cont’d
- An arithmetic shift (sra) maintains the arithmetic correctness of the shifted value (i.e., a number shifted right one bit should be ½ of its original value; a number shifted left should be 2 times its original value)
  - so sra uses the most significant bit (sign bit) as the bit shifted in
  - note that there is no need for a sla when using two’s complement number representation
    sra $t2, $s0, 8    # $t2 = $s0 >> 8 bits
- The shift operation is implemented by hardware (usually a barrel shifter) outside the ALU
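
The difference between logical and arithmetic shifts is easy to see in a small model. The Python sketch below mimics sll, srl and sra on 32-bit words; the function names follow the MIPS mnemonics, but the code is an illustration rather than a description of the barrel shifter hardware, and the example word is an assumed value.

```python
MASK32 = 0xFFFFFFFF


def sll(x, shamt):
    """Shift left logical: zeros enter from the right."""
    return (x << shamt) & MASK32


def srl(x, shamt):
    """Shift right logical: zeros enter from the left."""
    return (x & MASK32) >> shamt


def sra(x, shamt):
    """Shift right arithmetic: copies of the sign bit enter from the left."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32            # reinterpret the pattern as a negative int
    return (x >> shamt) & MASK32  # Python's >> on negative ints is arithmetic


if __name__ == "__main__":
    word = 0x41424344  # assumed example: four packed 8-bit characters
    # Unpack the character in bits 15..8 by shifting it to the top,
    # then back down to the bottom.
    print(hex(srl(sll(word, 16), 24)))
    print(hex(srl(0xFF000000, 8)), hex(sra(0xFF000000, 8)))
```

For a non-negative value srl and sra agree; they differ only when the sign bit is set.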

331 W07.30 Spring 2006 Multiplication
- More complicated than addition
  - accomplished via shifting and addition
            0010   (multiplicand)
          x_1011   (multiplier)
            0010
           0010    (partial product
          0000      array)
         0010
        00010110   (product)
- Double precision product produced
- More time and more area to compute
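
The shift-and-add procedure can be written out in a few lines. This Python sketch handles unsigned operands only and uses a hypothetical function name; it reuses the slide's operands, 0010 and 1011.

```python
def shift_add_multiply(multiplicand, multiplier, width=4):
    """Unsigned multiply by summing shifted partial products.

    The product of two width-bit operands needs up to 2*width bits.
    """
    product = 0
    for i in range(width):
        if (multiplier >> i) & 1:
            # Multiplier bit i is 1: add the multiplicand shifted left by i.
            product += multiplicand << i
        # Multiplier bit i is 0: the partial product is all zeros.
    return product


if __name__ == "__main__":
    p = shift_add_multiply(0b0010, 0b1011)
    print(format(p, "08b"))
```

One addition per multiplier bit is why multiplication costs more time (or more adder hardware) than a single add.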

331 W07.31 Spring 2006 MIPS Multiply Instruction
    mult $s0, $s1    # hi||lo = $s0 * $s1
    Instruction format: op | rs | rt | rd | shamt | funct
- Low-order word of the product is left in processor register lo and the high-order word is left in register hi
- Instructions mfhi rd and mflo rd are provided to move the product to (user accessible) registers in the register file
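
As a rough model of how the 64-bit product is split across hi and lo, consider the Python sketch below. The function name mirrors the mnemonic, but the code is only an illustration of the signed mult behavior described above; it is not a simulator.

```python
def mult(rs, rt):
    """Signed 32 x 32 -> 64-bit multiply; returns (hi, lo) as 32-bit words."""
    def to_signed(x):
        x &= 0xFFFFFFFF
        return x - (1 << 32) if x & 0x80000000 else x

    product = (to_signed(rs) * to_signed(rt)) & 0xFFFFFFFFFFFFFFFF
    hi = product >> 32          # what mfhi would move to a register
    lo = product & 0xFFFFFFFF   # what mflo would move to a register
    return hi, lo


if __name__ == "__main__":
    hi, lo = mult(0x00010000, 0x00010000)   # 2^16 * 2^16 = 2^32
    print(hex(hi), hex(lo))
    hi, lo = mult(0xFFFFFFFF, 2)            # -1 * 2 = -2
    print(hex(hi), hex(lo))
```

When the product fits in 32 bits, hi holds only sign extension (all zeros or all ones) and lo alone carries the result.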

331 W07.32 Spring 2006 Review: MIPS ISA, so far
    Category         Instr               Op Code    Example               Meaning
    Arithmetic       add                 0 and 32   add $s1, $s2, $s3     $s1 = $s2 + $s3
    (R & I format)   add unsigned        0 and 33   addu $s1, $s2, $s3    $s1 = $s2 + $s3
                     subtract            0 and 34   sub $s1, $s2, $s3     $s1 = $s2 - $s3
                     subt unsigned       0 and 35   subu $s1, $s2, $s3    $s1 = $s2 - $s3
                     add immediate       8          addi $s1, $s2, 6      $s1 = $s2 + 6
                     add imm. unsigned   9          addiu $s1, $s2, 6     $s1 = $s2 + 6
                     multiply            0 and 24   mult $s1, $s2         hi || lo = $s1 * $s2
                     multiply unsigned   0 and 25   multu $s1, $s2        hi || lo = $s1 * $s2
                     divide              0 and 26   div $s1, $s2          lo = $s1/$s2, rem. in hi
                     divide unsigned     0 and 27   divu $s1, $s2         lo = $s1/$s2, rem. in hi
    Logical          and                 0 and 36   and $s1, $s2, $s3     $s1 = $s2 & $s3
    (R & I format)   or                  0 and 37   or $s1, $s2, $s3      $s1 = $s2 | $s3
                     xor                 0 and 38   xor $s1, $s2, $s3     $s1 = $s2 xor $s3
                     nor                 0 and 39   nor $s1, $s2, $s3     $s1 = !($s2 | $s3)
                     and immediate       12         andi $s1, $s2, 6      $s1 = $s2 & 6
                     or immediate        13         ori $s1, $s2, 6       $s1 = $s2 | 6
                     xor immediate       14         xori $s1, $s2, 6      $s1 = $s2 xor 6

331 W07.33 Spring 2006 Review: MIPS ISA, so far cont’d
    Category         Instr                Op Code    Example              Meaning
    Shift            sll                  0 and 0    sll $s1, $s2, 4      $s1 = $s2 << 4
    (R format)       srl                  0 and 2    srl $s1, $s2, 4      $s1 = $s2 >> 4
                     sra                  0 and 3    sra $s1, $s2, 4      $s1 = $s2 >> 4
    Data Transfer    load word            35         lw $s1, 24($s2)      $s1 = Memory($s2+24)
    (I format)       store word           43         sw $s1, 24($s2)      Memory($s2+24) = $s1
                     load byte            32         lb $s1, 25($s2)      $s1 = Memory($s2+25)
                     load byte unsigned   36         lbu $s1, 25($s2)     $s1 = Memory($s2+25)
                     store byte           40         sb $s1, 25($s2)      Memory($s2+25) = $s1
                     load upper imm       15         lui $s1, 6           $s1 = 6 * 2^16
                     move from hi         0 and 16   mfhi $s1             $s1 = hi
                     move to hi           0 and 17   mthi $s1             hi = $s1
                     move from lo         0 and 18   mflo $s1             $s1 = lo
                     move to lo           0 and 19   mtlo $s1             lo = $s1

331 W07.34 Spring 2006 Review: MIPS ISA, so far cont’d
    Category          Instr                             Op Code    Example               Meaning
    Cond. Branch      br on equal                       4          beq $s1, $s2, L       if ($s1==$s2) go to L
    (I & R format)    br on not equal                   5          bne $s1, $s2, L       if ($s1 !=$s2) go to L
                      set on less than                  0 and 42   slt $s1, $s2, $s3     if ($s2<$s3) $s1=1 else $s1=0
                      set on less than unsigned         0 and 43   sltu $s1, $s2, $s3    if ($s2<$s3) $s1=1 else $s1=0
                      set on less than immediate        10         slti $s1, $s2, 6      if ($s2<6) $s1=1 else $s1=0
                      set on less than imm. unsigned    11         sltiu $s1, $s2, 6     if ($s2<6) $s1=1 else $s1=0
    Uncond. Jump      jump                              2          j 2500                go to 10000
    (J & R format)    jump and link                     3          jal 2500              go to 10000; $ra=PC+4
                      jump register                     0 and 8    jr $s1                go to $s1
                      jump and link reg                 0 and 9    jalr $s1, $s2         go to $s1, $s2=PC+4