ALU for Computers (MIPS)

Presentation transcript:

1 ALU for Computers (MIPS) Design a fast ALU for the MIPS ISA. Requirements: –support the arithmetic/logic operations: add, addi, addiu, sub, subu, and, or, andi, ori, xor, xori, slt, slti, sltu, sltiu –design a multiplier –design a divider

2 Review Digital Logic Gates: Combinational Logic

3 Review Digital Logic PLA: AND array, OR array

4 Review Digital Logic

5 A D latch implemented with NOR gates. A D flip-flop with a falling-edge trigger.

6 Review Digital Logic D flip-flop (inputs D and CLK, output Q): the value of D is sampled on the positive clock edge. Q outputs the sampled value for the rest of the cycle.

7 Review: Edge-Triggering in Verilog Correct version: module ff(D, Q, CLK); input D, CLK; output Q; reg Q; always @(posedge CLK) Q <= D; endmodule Buggy version (this module code has two bugs; where?): module ff(D, Q, CLK); input D, CLK; output Q; always @(CLK) Q <= D; endmodule

8 Inputs: CLK, Change, Rst. Outputs: R (red), Y (yellow), G (green). If Change == 1 on a positive CLK edge, the traffic light changes. If Rst == 1 on a positive CLK edge, R Y G = 1 0 0.

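9 State diagram: three R Y G states; Change == 1 moves to the next state, Rst == 1 forces the reset state.

10 State diagram (as on slide 9), with the Change input labeled.

11 State diagram (as on slide 9) with “One-Hot Encoding”: three D flip-flops hold R, G, and Y.

12 State diagram (as on slide 9) with hardware: Next State Combinational Logic takes Change, Rst, and the current R, G, Y values from three D flip-flops.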

13 State Elements: Traffic Light Controller wire next_R, next_Y, next_G; output R, Y, G; ??? (three D flip-flops hold R, G, Y)

14 module ff(Q, D, CLK); input D, CLK; output Q; reg Q; always @(posedge CLK) Q <= D; endmodule The value of D is sampled on the positive clock edge. Q outputs the sampled value for the rest of the cycle.

15 State Elements: Traffic Light Controller wire next_R, next_Y, next_G; output R, Y, G; ff ff_R(R, next_R, CLK); ff ff_Y(Y, next_Y, CLK); ff ff_G(G, next_G, CLK);

16 Next State Logic: Traffic Light Controller (Next State Combinational Logic: inputs Change, Rst, R, G, Y; outputs next_R, next_Y, next_G) wire next_R, next_Y, next_G; assign next_R = rst ? 1'b1 : (change ? G : R); assign next_Y = rst ? 1'b0 : (change ? R : Y); assign next_G = rst ? 1'b0 : (change ? Y : G);

17 wire next_R, next_Y, next_G; output R, Y, G; assign next_R = rst ? 1'b1 : (change ? G : R); assign next_Y = rst ? 1'b0 : (change ? R : Y); assign next_G = rst ? 1'b0 : (change ? Y : G); ff ff_R(R, next_R, CLK); ff ff_Y(Y, next_Y, CLK); ff ff_G(G, next_G, CLK);
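The controller's behavior can be checked with a small software model. The following Python sketch mirrors the three assign statements and the three flip-flops; the function names `next_state` and `simulate`, and the choice to start from an all-zero state before reset, are assumptions made for illustration and are not part of the slides.

```python
def next_state(r, y, g, change, rst):
    """Next-state logic, one line per assign statement (one-hot R, Y, G)."""
    next_r = 1 if rst else (g if change else r)
    next_y = 0 if rst else (r if change else y)
    next_g = 0 if rst else (y if change else g)
    return next_r, next_y, next_g


def simulate(inputs, state=(0, 0, 0)):
    """Apply one (change, rst) pair per positive clock edge.

    Returns the list of (R, Y, G) values held by the flip-flops after each edge.
    """
    trace = []
    for change, rst in inputs:
        state = next_state(*state, change, rst)
        trace.append(state)
    return trace


if __name__ == "__main__":
    # Reset once, then assert Change on three consecutive edges.
    for r, y, g in simulate([(0, 1), (1, 0), (1, 0), (1, 0)]):
        print(r, y, g)
```

With these equations the one-hot bit moves R to Y to G and back to R on each edge where Change is 1, and holds its value otherwise.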

18 Logic Diagram: Traffic Light Controller (Next State Combinational Logic feeding three D flip-flops for R, G, Y, shown with the state diagram: Change == 1 advances the state, Rst == 1 resets it)

19 ALU for MIPS ISA design a 1-bit ALU using AND gate, OR gate, a full adder, and a mux

20 ALU for MIPS ISA design a 32-bit ALU by cascading 32 1-bit ALUs

21 ALU for MIPS A 1-bit ALU performing AND, OR, addition, and subtraction. If we set Binvert = Carryin = 1, then we can perform a - b, because a - b = a + (not b) + 1 in 2’s complement.
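A bit-level model makes the Binvert/Carryin trick concrete. The Python sketch below cascades 1-bit slices into a ripple ALU; the operation encoding (0 = AND, 1 = OR, 2 = ADD) and the names `alu_1bit` and `alu_ripple` are assumptions for illustration, not the slides' control encoding.

```python
def alu_1bit(a, b, carry_in, binvert, op):
    """One ALU slice: AND gate, OR gate, full adder, and an output mux.

    op is a hypothetical encoding: 0 = AND, 1 = OR, 2 = ADD.
    """
    b = b ^ binvert                    # Binvert selects b or its complement
    total = a + b + carry_in           # full adder
    sum_bit, carry_out = total & 1, total >> 1
    result = (a & b, a | b, sum_bit)[op]
    return result, carry_out


def alu_ripple(a, b, binvert, op, width=32):
    """Cascade `width` slices; the carry ripples from bit 0 upward."""
    carry = binvert                    # Carryin of bit 0 = Binvert (1 for a - b)
    result = 0
    for i in range(width):
        bit, carry = alu_1bit((a >> i) & 1, (b >> i) & 1, carry, binvert, op)
        result |= bit << i
    return result


if __name__ == "__main__":
    print(alu_ripple(7, 5, 0, 2))       # 7 + 5
    print(alu_ripple(7, 5, 1, 2))       # 7 - 5
    print(hex(alu_ripple(5, 7, 1, 2)))  # 5 - 7 as a 32-bit 2's complement pattern
```

Tying the bit-0 Carryin to Binvert supplies the "+ 1" of the 2's complement negation without extra hardware.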

22

23 ALU for MIPS include a “less” input for set-on-less-than (slt)

24 ALU for MIPS Design the most-significant-bit ALU. The most significant bit needs to do more work: it detects overflow, and its result is used for slt. How to detect an overflow: overflow = carryin{MSB} xor carryout{MSB}; overflow = 1 means overflow; overflow = 0 means no overflow. Set-on-less-than: slt $1, $2, $3 ; if $2 < $3 then $1 = 1, else $1 = 0. If the MSB of $2 - $3 is 1, then $1 = 1, because in 2’s complement the MSB of a negative number is 1.
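The overflow rule and the slt rule can be tried on small word sizes. This Python sketch computes the carry into and out of the MSB explicitly; the function names and the `width` parameter are assumptions for illustration. The `slt_sign_bit` function follows the slide literally (sign bit of a - b), which gives the right answer only when the subtraction itself does not overflow.

```python
def add_sub_msb(a, b, subtract=False, width=32):
    """Add or subtract width-bit values; return (result, overflow).

    overflow = carry into the MSB XOR carry out of the MSB.
    """
    mask = (1 << width) - 1
    low_mask = (1 << (width - 1)) - 1          # every bit below the MSB
    a &= mask
    b = (~b if subtract else b) & mask         # Binvert
    cin0 = 1 if subtract else 0                # Carryin of bit 0
    carry_in_msb = ((a & low_mask) + (b & low_mask) + cin0) >> (width - 1)
    total = a + b + cin0
    carry_out_msb = total >> width
    return total & mask, carry_in_msb ^ carry_out_msb


def slt_sign_bit(a, b, width=32):
    """slt as described on the slide: the MSB of a - b."""
    result, _overflow = add_sub_msb(a, b, subtract=True, width=width)
    return result >> (width - 1)


if __name__ == "__main__":
    print(add_sub_msb(7, 1, width=4))                  # 0111 + 0001 overflows
    print(add_sub_msb(3, 2, width=4))                  # no overflow
    print(slt_sign_bit(2, 3, width=4), slt_sign_bit(3, 2, width=4))
```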

25 ALU for MIPS a 1-bit ALU for the MSB Overflow =Carryin XOR Carryout

26 A 32-bit ALU constructed from 32 1-bit ALUs

27 A 32-bit ALU with zero detector

28

29 A Verilog behavioral definition of a MIPS ALU.

30 ALU for MIPS The critical path of a 32-bit ripple-carry adder is 32 x the carry propagation delay. How to solve this problem –design trick: use more hardware –design trick: look ahead, peek –carry-lookahead adder (CLA) CLA carry table (a b: cout): 0 0: 0 (nothing happens); 0 1: cin (propagate cin); 1 0: cin (propagate cin); 1 1: 1 (generate). propagate = a + b; generate = ab

31 ALU for MIPS CLA using 4 bits as an example. Two 4-bit numbers: a3a2a1a0, b3b2b1b0. p0 = a0 + b0; g0 = a0b0 (and likewise for bits 1 to 3). c1 = g0 + p0c0; c2 = g1 + p1c1; c3 = g2 + p2c2; c4 = g3 + p3c3. Larger CLA adders can be constructed by cascading 4-bit CLA adders. Other adders: carry-select adder, carry-skip adder.
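The carry equations above are written as a recurrence; the lookahead idea is to expand them so every carry depends only on the p, g, and c0 signals. The Python sketch below does that expansion for 4 bits; the function name `cla_4bit` and its return convention are assumptions for illustration.

```python
def cla_4bit(a, b, c0=0):
    """4-bit carry-lookahead addition; returns (4-bit sum, carry out c4)."""
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    p = [x | y for x, y in zip(a_bits, b_bits)]   # propagate = a + b (OR)
    g = [x & y for x, y in zip(a_bits, b_bits)]   # generate  = ab   (AND)

    # Each carry is expanded so it needs only p, g, and c0 (no rippling).
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    total = 0
    for i in range(4):
        total |= (a_bits[i] ^ b_bits[i] ^ carries[i]) << i
    return total, c4


if __name__ == "__main__":
    print(cla_4bit(0b0110, 0b0111))   # 6 + 7
```

In hardware all four carries are available after a fixed number of gate delays, at the cost of wider AND/OR gates.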

32 Design Process Divide and Conquer –using simple components –glue simple components together –work on the things you know how to do. The unknown will become obvious as you make progress Successive Refinement –multiplier design –divider design

33 Multiplier Paper-and-pencil method: multiplicand 0110 x multiplier = product; n bits x m bits = (m+n)-bit product. In binary, a multiplier bit of 0 places 0, and a multiplier bit of 1 places a copy of the multiplicand.

34 Multiply Hardware Version 1 32 bits x 32 bits, using a 64-bit multiplicand register (shift left), a 64-bit ALU (ADD), a 64-bit product register (write), and a 32-bit multiplier register (shift right). Check the rightmost bit of the multiplier to decide whether to add 0 or the multiplicand. Control provides four control signals.

35 Multiply Algorithm Version 1 1. Test multiplier0 (i.e., bit 0 of the multiplier). 1a. If multiplier0 = 1, add the multiplicand to the product and place the result in the product register. 2. Shift the multiplicand left 1 bit. 3. Shift the multiplier right 1 bit. 4. 32nd repetition? If yes, done; if no, go to 1.
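The steps of version 1 can be sketched in Python as follows. The function name `multiply_v1` and the `width` parameter (32 on the slides) are assumptions for illustration; the registers are modeled as plain integers masked to their hardware widths.

```python
def multiply_v1(multiplicand, multiplier, width=32):
    """Unsigned shift-and-add multiply, version 1.

    The multiplicand lives in a 2*width-bit register and shifts left;
    the multiplier shifts right; the product register is 2*width bits.
    """
    reg_mask = (1 << (2 * width)) - 1
    multiplicand &= (1 << width) - 1
    multiplier &= (1 << width) - 1
    product = 0
    for _ in range(width):
        if multiplier & 1:                                # step 1: test multiplier0
            product = (product + multiplicand) & reg_mask  # step 1a: add
        multiplicand = (multiplicand << 1) & reg_mask      # step 2: shift left
        multiplier >>= 1                                   # step 3: shift right
    return product


if __name__ == "__main__":
    print(multiply_v1(0b0110, 0b0101, width=4))   # 6 x 5
```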

36 Multiply Algorithm Version 1 Example (table columns: iteration, step, multiplier, multiplicand, product)

37 Multiplier Algorithm Version 1 Observations from version 1: half of the bits in the multiplicand register are always 0, so a 64-bit adder is wasted (for 32 bits x 32 bits); 0s are inserted into the multiplicand as it shifts left, so the least significant bits of the product do not change once formed; 3 steps per bit. Improvement: shift the product right instead of shifting the multiplicand left (by adding to the left half of the product register).

38 Multiply Hardware Version 2 32-bit multiplicand register, 32-bit ALU (ADD), 64-bit product register (shift right, write), 32-bit multiplier register (shift right), plus control. Check the rightmost bit of the multiplier to decide whether to add 0 or the multiplicand. Write the sum into the left half of the product register.

39 Multiply Algorithm Version 2 1. test multiplier0 (i.e., bit 0 of the multiplier) 1a. if multiplier0 = 1 add multiplicand to the left half of product and place the result in the left half of product register; 2. shift product reg. right 1 bit 3. shift multiplier reg. right 1 bit 4. 32nd repetition ? if yes done if no, go to 1.

40 Multiply Algorithm Version 2 Example (table columns: iteration, step, multiplier, multiplicand, product)

41 Multiply Version 2 Observations –product reg. wastes space that exactly matches the size of multiplier –3 steps per bit –combine multiplier register and product register

42 Multiply Hardware Version 3 32-bit multiplicand register, 32-bit ALU (ADD), 64-bit product register (shift right, write into left half), plus control; the multiplier register is part of the product register.

43 Multiply Algorithm Version 3 1. test product0 (multiplier is in the right half of product register) 1a. if product0 = 1 add multiplicand to the left half of product and place the result in the left half of product register 2. shift product register right 1 bit 3. 32nd repetition ? if yes, done if no, go to 1.
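Version 3 can be modeled with a single product register that starts with the multiplier in its right half. In this Python sketch the name `multiply_v3` and the `width` parameter are assumptions for illustration; the carry out of the add is kept simply because Python integers do not truncate, which stands in for the saved carry bit in hardware.

```python
def multiply_v3(multiplicand, multiplier, width=32):
    """Unsigned multiply, version 3: multiplier shares the product register."""
    low_mask = (1 << width) - 1
    multiplicand &= low_mask
    product = multiplier & low_mask          # multiplier in the right half
    for _ in range(width):
        if product & 1:                      # step 1: test product0
            # step 1a: add to the left half; the sum may carry into
            # bit 2*width, and the shift below brings that carry back in
            product += multiplicand << width
        product >>= 1                        # step 2: shift product right
    return product


if __name__ == "__main__":
    print(multiply_v3(0b1110, 0b1011, width=4))   # 14 x 11
```

For 14 x 11 with `width=4` the second add produces a 9-bit intermediate value, which is the case where the carry has to be preserved.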

44 Multiply Algorithm Version 3 Example (table columns: iteration, step, multiplicand, product) 1110 x 1011 = 14 x 11 = 154; the carry out of the add needs to be saved.

45 Multiply Algorithm Version 3 Observations: 2 steps per bit; because the multiplier and product share one register, it shifts right 1 bit once per iteration (rather than two shifts as in versions 1 and 2). MIPS registers Hi and Lo correspond to the left and right halves of the product. MIPS has the instruction multu. How about signed numbers in multiplication? –method 1: keep the signs of both numbers and multiply the magnitudes; after 32 repetitions, change the product to the appropriate sign –method 2: Booth’s algorithm –Booth’s algorithm is more elegant for signed-number multiplication –Booth’s algorithm uses the same hardware as version 3

46 Booth’s Algorithm The motivation for Booth’s algorithm is speed. Example: 2 x 6 = 0010 x 0110, comparing the normal approach with Booth’s approach. Booth’s approach: replace a string of 1s in the multiplier by two actions. Action 1: at the beginning of a string of 1s, subtract the multiplicand. Action 2: at the end of a string of 1s, add the multiplicand.

47 Booth’s Algorithm Action table (current bit, bit to the right i.e. previous bit: explanation: action): 1 0: beginning of a run of 1s: subtract multiplicand from left half of product; 1 1: middle of a run of 1s: no arithmetic operation; 0 1: end of a run of 1s: add multiplicand to left half of product; 0 0: middle of a run of 0s: no arithmetic operation.

48 Booth’s Algorithm Example (table columns: iteration, step, multiplicand, product; steps: initial, subtract, shift right, shift right, shift right, add, shift right) -2 x 7 = -14; in signed binary, 1110 x 0111 = 1111 0010. To begin with, we put the multiplier in the right half of the product register; the previous bit starts at 0.
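Booth's action table can be exercised with a short model, using the 4-bit operands 1110 and 0111 from the example. In this Python sketch the name `booth_multiply` is hypothetical, and the product register is given one extra sign bit (2*width + 1 bits); that extra bit is an assumption of this sketch, added so the most negative multiplicand does not overflow the left half, and is not something the slides specify.

```python
def booth_multiply(multiplicand, multiplier, width=32):
    """Signed multiply of two width-bit 2's complement values (Booth)."""
    reg_bits = 2 * width + 1                 # assumption: one extra sign bit
    reg_mask = (1 << reg_bits) - 1
    low_mask = (1 << width) - 1

    m = multiplicand & low_mask              # sign-extend the multiplicand
    if m >> (width - 1):
        m -= 1 << width

    product = multiplier & low_mask          # multiplier in the right half
    prev = 0                                 # bit to the right of bit 0
    for _ in range(width):
        cur = product & 1
        if cur == 1 and prev == 0:           # beginning of a run of 1s
            product = (product - (m << width)) & reg_mask
        elif cur == 0 and prev == 1:         # end of a run of 1s
            product = (product + (m << width)) & reg_mask
        prev = cur
        sign = product >> (reg_bits - 1)     # arithmetic shift right
        product = (product >> 1) | (sign << (reg_bits - 1))

    if product >> (reg_bits - 1):            # reinterpret as a signed value
        product -= 1 << reg_bits
    return product


if __name__ == "__main__":
    print(booth_multiply(0b1110, 0b0111, width=4))   # 1110 is -2, 0111 is 7
```

The shift must be arithmetic (sign-preserving) because the left half holds a signed partial product.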

49 Divide Algorithm Paper-and-pencil method: dividend = quotient x divisor + remainder (remainder = dividend modulo divisor)

50 Divide Hardware Version 1 64-bit divisor register (shift right), 64-bit ALU, 32-bit quotient register (shift left), 64-bit remainder register (write), plus control. Put the dividend in the remainder register initially.

51 Divide Algorithm Version 1 Start: place the dividend in the remainder register. 1. Subtract the divisor from the remainder and place the result in the remainder. 2. Test the remainder. 2a. If remainder >= 0, shift the quotient left, setting the new rightmost bit to 1. 2b. If remainder < 0, restore the original value by adding the divisor to the remainder and placing the sum in the remainder; shift the quotient left, setting the new least significant bit to 0. 3. Shift the divisor right 1 bit. 4. n+1 repetitions? If yes, done; if no, go to 1.
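A minimal Python sketch of version 1 follows. It assumes the divisor starts in the left half of its double-width register (which is what makes shifting it right meaningful); the name `divide_v1`, the `width` parameter, and the use of a signed Python integer to model the "remainder < 0" test are all assumptions for illustration.

```python
def divide_v1(dividend, divisor, width=32):
    """Unsigned restoring division, version 1; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    remainder = dividend                 # start: dividend in remainder register
    div = divisor << width               # assumption: divisor in the left half
    quotient = 0
    for _ in range(width + 1):           # n + 1 repetitions
        remainder -= div                 # step 1: subtract
        if remainder >= 0:               # step 2a: quotient bit is 1
            quotient = (quotient << 1) | 1
        else:                            # step 2b: restore, quotient bit is 0
            remainder += div
            quotient = quotient << 1
        div >>= 1                        # step 3: shift divisor right
    return quotient & ((1 << width) - 1), remainder


if __name__ == "__main__":
    print(divide_v1(7, 2, width=4))      # 0111 / 0010
```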

52 Divide Algorithm Version 1 Example (table columns: iteration, step, quotient, divisor, remainder)

53 Divide Algorithm Version 1 Observations –half of the bits in the divisor register are always 0 –half of the divisor register is wasted –half of the 64-bit ALU is wasted Possible improvement –instead of shifting the divisor right, shift the remainder left –the first step cannot produce a 1 in the quotient, so switch the order to shift first and then subtract; this saves one iteration

54 Divide Hardware Version 2 32-bit divisor register, 32-bit ALU, 32-bit quotient register, 64-bit remainder register, plus control; the diagram connects divisor, 32-bit ALU, remainder, quotient, and control, with a shift-left control.

55 Divide Algorithm Version 2 start: place dividend in remainder 1. shift remainder left 1 bit 2. sub. divisor from the left half of remainder and place the result in the left half of remainder 3. test remainder 3a. if remainder >= 0, shift quotient to left setting the new rightmost bit to 1 3b. if remainder <0, restore the original value by adding divisor to the left half of remainder, and place the sum in the left of the remainder. also shift quotient to left and setting new least significant bit 0 4. n repetitions ? if yes, done, if no, go to 1.

56 Divide Algorithm Version 2 Example (table columns: iteration, step, quotient, divisor, remainder)

57 Divide Algorithm Version 2 Observations –3 steps (shift remainder left, subtract, shift quotient left) Further improvement (version 3) –eliminate the quotient register by combining it with the remainder register, which already shifts left –the loop then contains only two steps, because one shift moves the remainder in the left half and the quotient in the right half at the same time –a consequence of combining the two registers is that the remainder is shifted left one time too many by the last iteration –final correction step: shift back the remainder in the left half of the remainder register (i.e., shift only the left half right 1 bit)

58 Divide Hardware Version 3 32-bit divisor register, 32-bit ALU, 64-bit remainder register (shift left, write), 0-bit quotient register (quotient bits shift into the remainder register as it shifts left), plus control.

59 Divide Algorithm Version 3 Start: place the dividend in the remainder register. 1. Shift the remainder left 1 bit. 2. Subtract the divisor from the left half of the remainder and place the result in the left half of the remainder. 3. Test the remainder. 3a. If remainder >= 0, shift the remainder left, setting the new rightmost bit to 1. 3b. If remainder < 0, restore the original value by adding the divisor to the left half of the remainder and placing the sum in the left half of the remainder; also shift the remainder left, setting the new least significant bit to 0. 4. n repetitions? If yes, done; if no, go to 2.
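Version 3, including the final correction shift, can be sketched in Python as below. The name `divide_v3` and the `width` parameter are assumptions for illustration. The combined register is an unbounded Python integer, so a bit shifted past position 2*width is not lost; real hardware would need to account for that bit.

```python
def divide_v3(dividend, divisor, width=32):
    """Unsigned restoring division, version 3; returns (quotient, remainder).

    One register holds the remainder (left half) and, as it shifts left,
    the quotient bits (right half).
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    low_mask = (1 << width) - 1
    reg = (dividend & low_mask) << 1             # step 1: initial shift left
    for _ in range(width):                       # n repetitions
        left = (reg >> width) - divisor          # step 2: subtract from left half
        if left >= 0:                            # step 3a: keep result, shift in 1
            reg = (((left << width) | (reg & low_mask)) << 1) | 1
        else:                                    # step 3b: restore, shift in 0
            reg = reg << 1
    quotient = reg & low_mask                    # quotient is in the right half
    remainder = reg >> (width + 1)               # correction: left half >> 1
    return quotient, remainder


if __name__ == "__main__":
    print(divide_v3(7, 2, width=4))              # 0111 / 0010
```

Skipping the subtraction result in step 3b has the same effect as subtracting and then adding the divisor back.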

60 Divide Algorithm Version 3 Example (table columns: iteration, step, divisor, remainder) Correction step: shift the left half of the remainder right 1 bit; the quotient is in the right half.

61 Divide Algorithm Version 3 Observations –same hardware as multiply: needs a 32-bit ALU to add and subtract and a 64-bit register that shifts left and right –divide algorithm version 3 is called the restoring division algorithm for unsigned numbers Signed-number division –simplest method: remember the signs of the dividend and divisor, make both positive, and finally complement the quotient and remainder as necessary; the dividend and remainder must have the same sign; the quotient is negative if the dividend and divisor signs disagree –SRT method (named after three people): an efficient algorithm

62 Floating Point Numbers What can be represented in N bits? unsigned: 0 to 2^N - 1; 2’s complement: -2^(N-1) to 2^(N-1) - 1; 1’s complement: -2^(N-1) + 1 to 2^(N-1) - 1; BCD: 0 to 10^(N/4) - 1. How about very small numbers, very large numbers; rationals, such as 2/3; irrationals, such as √2; transcendentals, such as e, π?

63 Floating Point Numbers Mantissa (aka significand) and exponent, e.g. 6.12 x 10^(exponent) using a radix of 10. IEEE standard F.P.: mantissa = sign + magnitude; the magnitude is normalized with a hidden integer bit: 1.M. exponent = E - 127 (excess 127), 0 < E < 255. An FP number N = (-1)^S x 2^(E-127) x (1.M). Single precision fields: S (1 bit), E (8 bits), M (23 bits).

64 Floating Point Numbers Single precision FP numbers: -0.75 = -0.11b = -1.1 x 2^-1, E = 126; -5 = -101b = -1.01 x 2^2, E = 129; 7 = 111b = 1.11 x 2^2, E = 129
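The S, E, and M fields of a single-precision value can be inspected directly. This Python sketch uses the standard `struct` module to obtain the 32-bit pattern; the helper name `float_fields` is an assumption for illustration, and the sample values are the decimal equivalents of the binary patterns -0.11b, -101b, and 111b.

```python
import struct


def float_fields(x):
    """Return (S, E, M) for x encoded as an IEEE 754 single-precision number."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                 # S: 1 bit
    exponent = (bits >> 23) & 0xFF    # E: 8 bits, excess 127
    mantissa = bits & 0x7FFFFF        # M: 23 bits, hidden leading 1 not stored
    return sign, exponent, mantissa


if __name__ == "__main__":
    for value in (-0.75, -5.0, 7.0):
        s, e, m = float_fields(value)
        print(value, s, e, format(m, "023b"))
```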

65 Floating Point Numbers Single precision FP number. What is the smallest (normalized) number in magnitude? (1.0) x 2^-126. What is the largest number in magnitude? (1.11...1) binary x 2^127 = (2 - 2^-23) x 2^127

66 Floating Point Numbers Single precision FP numbers (exponent, significand: object represented): 0, 0: zero; 0, nonzero: denormalized numbers; 1 to 254, anything: floating point numbers; 255, 0: infinity; 255, nonzero: NaN (Not a Number). Other topics in FP numbers: 1. extra bits for rounding 2. guard bit, sticky bit 3. algorithms for FP numbers
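The exponent/significand cases translate into a small decision function. In this Python sketch the name `classify` and the returned label strings are assumptions for illustration; the inputs are the raw 8-bit exponent field E and 23-bit significand field M.

```python
def classify(exponent, significand):
    """Name the object encoded by single-precision fields E and M."""
    if exponent == 0:
        return "zero" if significand == 0 else "denormalized"
    if exponent == 255:
        return "infinity" if significand == 0 else "NaN"
    return "normalized"               # E in 1..254, any significand


if __name__ == "__main__":
    for e, m in [(0, 0), (0, 1), (129, 0x600000), (255, 0), (255, 1)]:
        print(e, hex(m), classify(e, m))
```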

67 Floating Point Numbers Double precision –64 bits total: 1-bit sign, 11-bit exponent (excess 1023 bias), 52-bit significand –Number is: (-1)^S x (1.M) x 2^(E-1023)

68 Basic Addition Algorithm Steps for Y + X, assuming Y >= X: 1. Align binary points (denormalize the smaller number): a. compute Diff = Exp(Y) - Exp(X); Exp = Exp(Y) b. Sig(X) = Sig(X) >> Diff 2. Add the aligned components: Sig = Sig(X) + Sig(Y) 3. Normalize the sum: shift Sig right (incrementing Exp) or left (decrementing Exp) until the leading bit is 1 4. Check for overflow in Exp 5. Round 6. Repeat step 3 if the sum is no longer normalized
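The align, add, and normalize steps can be tried on small significands. In this Python sketch a positive number is a pair (sig, exp) meaning sig x 2^(exp - frac_bits), where sig is an integer holding the 1.xxxx pattern; that representation, the name `fp_add`, the sample operands, and the choice to truncate instead of round are all assumptions for illustration. Exponent overflow checking is omitted.

```python
def fp_add(sig_y, exp_y, sig_x, exp_x, frac_bits=4):
    """Add two positive normalized numbers; returns (sig, exp).

    Bits shifted out during alignment are discarded (truncation).
    """
    if exp_y < exp_x:                          # make Y the larger-exponent operand
        sig_y, exp_y, sig_x, exp_x = sig_x, exp_x, sig_y, exp_y
    diff = exp_y - exp_x                       # step 1a
    sig_x >>= diff                             # step 1b: align binary points
    sig = sig_y + sig_x                        # step 2: add aligned components
    exp = exp_y
    while sig >= (1 << (frac_bits + 1)):       # step 3: shift right, Exp + 1
        sig >>= 1
        exp += 1
    while sig and sig < (1 << frac_bits):      # step 3: shift left, Exp - 1
        sig <<= 1
        exp -= 1
    return sig, exp


if __name__ == "__main__":
    # Hypothetical operands: 1.0110 x 2^3 (11.0) + 1.1000 x 2^2 (6.0)
    sig, exp = fp_add(0b10110, 3, 0b11000, 2)
    print(format(sig, "b"), exp, sig * 2.0 ** (exp - 4))
```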

69 Addition Example (4-bit significand) Steps shown: align binary points (denormalize the smaller number), giving both operands as x 2^3; add the aligned components, giving a sum x 2^3; normalize the sum to x 2^4. No overflow, no rounding.

70 Another Addition Example –4-bit significand; an extra bit is needed for accuracy 1. Align binary point 2. Subtract the aligned components 3. Normalize: the result, x 2^2, is 4.75. Without the extra bit, the result would be 4.5, which is off by 0.25. This is too much!

71 Accuracy and Rounding We want arithmetic to be fully precise –IEEE 754 keeps two extra digits on the right during intermediate calculations (guard digit, round digit). The alignment step can cause data to be discarded (shifted out on the right). Example: 2.56 x 10^0 + 2.34 x 10^2 = 0.0256 x 10^2 + 2.34 x 10^2 = 2.3656 x 10^2, with guard digit 5 and round digit 6. With two digits to round: 0 to 49 round down, 51 to 99 round up. Answer = 2.37 x 10^2. Without the guard and round digits, the answer would be 2.36 x 10^2.
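The effect of the guard and round digits can be reproduced with integer arithmetic. This Python sketch assumes 3-digit decimal significands stored as integers (234 means 2.34) and assumes the operands 2.34 x 10^2 and 2.56 x 10^0, which yield the 2.37 and 2.36 answers quoted above; the function name, the `extra_digits` parameter, and the round-half-to-even tie rule are illustrative assumptions. Renormalization after rounding is omitted.

```python
def add_with_guard_digits(sig_big, exp_big, sig_small, exp_small, extra_digits=2):
    """Add two decimal numbers, keeping `extra_digits` digits while aligning.

    Returns (significand, exponent) of the rounded sum.
    """
    scale = 10 ** extra_digits
    shift = exp_big - exp_small
    # Alignment: digits pushed beyond the extra digits are discarded.
    aligned_small = (sig_small * scale) // (10 ** shift)
    total = sig_big * scale + aligned_small
    kept, dropped = divmod(total, scale)
    half = scale // 2                       # 50 when two extra digits are kept
    if dropped > half or (dropped == half and extra_digits and kept % 2 == 1):
        kept += 1                           # round up (ties go to even)
    return kept, exp_big


if __name__ == "__main__":
    print(add_with_guard_digits(234, 2, 256, 0, extra_digits=2))
    print(add_with_guard_digits(234, 2, 256, 0, extra_digits=0))
```

With two extra digits the dropped part is 56, so the sum rounds up; with none, the 0.0056 contribution is lost before the add.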