CS141-L3-1Tarun Soni, Summer ‘03 More ALUs and floating point numbers  Today: The rest of chap 4:  Multiplication, Division and Floating point numbers.

Presentation transcript:

CS141-L3-1 Tarun Soni, Summer ‘03 More ALUs and floating point numbers. Today: the rest of Chapter 4: multiplication, division, and floating point numbers.

CS141-L3-2 Tarun Soni, Summer ‘03 The story so far (basically ISA and some ALU stuff):
 – Instruction set architectures
 – Performance issues
 – 2s complement, addition, subtraction

CS141-L3-3 Tarun Soni, Summer ‘03 CPU: The big picture. Executing an entire instruction: Instruction Fetch -> Instruction Decode -> Operand Fetch -> Execute -> Result Store -> Next Instruction. Design hardware for each of these steps!

CS141-L3-4 Tarun Soni, Summer ‘03 CPU: Clocking. [Timing diagram: Clk waveform with Setup and Hold windows around each clock edge; values are "don't care" outside those windows.] All storage elements are clocked by the same clock edge.

CS141-L3-5 Tarun Soni, Summer ‘03 CPU: Big Picture: Control and Data Path. [Figure: instruction memory (Adr, Inst) feeds the Control block (Op, Fun) and the data path (Rs, Rt, Rd, Imm16); control signals nPC_sel, RegWr, RegDst, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg; the data path returns Equal.]

CS141-L3-6 Tarun Soni, Summer ‘03 CPU: The abstract version. Logical vs. physical structure. [Figure: PC and Next Address logic, ideal instruction memory (Instruction Address), 32-bit register file (Rw, Ra, Rb; 5-bit Rd, Rs, Rt), ALU with 32-bit inputs A and B, ideal data memory (Data In, Data Address, Data Out), all clocked by Clk; Control sends control signals to the datapath and receives conditions.]

CS141-L3-7 Tarun Soni, Summer ‘03 Multiplication and Division

CS141-L3-8 Tarun Soni, Summer ‘03 The 32-bit ALU, limited edition
 – Bit-slice plus extra on the two ends
 – Overflow means the number is too large for the representation
 – Carry-lookahead and other adder tricks
[Figure: 32 one-bit slices ALU0 .. ALU31, each with inputs a_i, b_i, cin and outputs s_i, co; 32-bit A, B and S; 4-bit M; Ovflw output. Control logic produces select, comp and c-in; overflow = signed-arith and (cin xor co).]

CS141-L3-9 Tarun Soni, Summer ‘03 The Design Process
Divide and Conquer (e.g., ALU)
 – Formulate a solution in terms of simpler components.
 – Design each of the components (subproblems).
Generate and Test (e.g., ALU)
 – Given a collection of building blocks, look for ways of putting them together that meet the requirement.
Successive Refinement (e.g., multiplier, divider)
 – Solve "most" of the problem (i.e., ignore some constraints or special cases), then examine and correct shortcomings.
Formulate High-Level Alternatives (e.g., shifter)
 – Articulate many strategies to "keep in mind" while pursuing any one approach.
Work on the Things You Know How to Do
 – The unknown will become “obvious” as you make progress.
Optimization criteria: Delay [logic levels, fan in/out], Area [gate count, package count, pin out], Cost, Power, Design time

CS141-L3-10 Tarun Soni, Summer ‘03 The 32-bit ALU, limited edition
Supported operations: 000 = and, 001 = or, 010 = add, 110 = subtract, 111 = slt
Tuned performance by using carry-lookahead adders. What about other instructions?
 multiply           mult  $2,$3   Hi, Lo = $2 x $3               64-bit signed product
 multiply unsigned  multu $2,$3   Hi, Lo = $2 x $3               64-bit unsigned product
 divide             div   $2,$3   Lo = $2 ÷ $3, Hi = $2 mod $3   Lo = quotient, Hi = remainder
 divide unsigned    divu  $2,$3   Lo = $2 ÷ $3, Hi = $2 mod $3   unsigned quotient & remainder
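To make the Hi/Lo convention concrete, here is a minimal C sketch of what the four instructions leave in the two registers. The type `hilo_t` and the function names are illustrative assumptions, not part of the slides or of any MIPS toolchain, and signed `div` is left out.

```c
#include <stdint.h>

/* Hypothetical model of the Hi/Lo register pair. */
typedef struct { uint32_t hi, lo; } hilo_t;

/* mult: 64-bit signed product, upper word in Hi, lower word in Lo. */
hilo_t mips_mult(int32_t rs, int32_t rt) {
    uint64_t p = (uint64_t)((int64_t)rs * (int64_t)rt);
    return (hilo_t){ (uint32_t)(p >> 32), (uint32_t)p };
}

/* multu: same split, but the operands are treated as unsigned. */
hilo_t mips_multu(uint32_t rs, uint32_t rt) {
    uint64_t p = (uint64_t)rs * rt;
    return (hilo_t){ (uint32_t)(p >> 32), (uint32_t)p };
}

/* divu: Lo = quotient, Hi = remainder. The caller must pass rt != 0. */
hilo_t mips_divu(uint32_t rs, uint32_t rt) {
    return (hilo_t){ rs % rt, rs / rt };
}
```

For example, `mips_divu(7, 2)` yields Lo = 3 and Hi = 1.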

CS141-L3-11 Tarun Soni, Summer ‘03 Grade school: paper and pencil example: Multiplicand 1000 x Multiplier = Product
m bits x n bits = m+n bit product
Binary makes it easy:
 – 0 => place 0 (0 x multiplicand)
 – 1 => place multiplicand (1 x multiplicand)
We’ll look at a couple of versions of multiplication hardware.

CS141-L3-12 Tarun Soni, Summer ‘03 Unsigned basic multiplier. [Figure: 4-bit array multiplier; each row ANDs A3..A0 with one of B0..B3 and adds it into the running sum, producing P7..P0.] Stage i accumulates A * 2^i if B_i == 1.

CS141-L3-13 Tarun Soni, Summer ‘03 Unsigned basic multiplier. [Figure: the same 4-bit array multiplier, inputs A3..A0 and B3..B0, outputs P7..P0.]
 – at each stage shift A left (x 2)
 – use the next bit of B to determine whether to add in the shifted multiplicand
 – accumulate the 2n-bit partial product at each stage

CS141-L3-14 Tarun Soni, Summer ‘03 Unsigned basic multiplier: the algorithm
for (i = 0; i < 32; i++) {
    if (multiplier[0] == 1) {      // we could test multiplier[i] and skip the shift
        product += multiplicand;   // product is a 64-bit register; the adder is 64 bits wide!
    }
    multiplicand <<= 1;            // shift multiplicand to prepare for the next add
                                   // (multiplicand is in a 64-bit register)
    multiplier >>= 1;              // position the i’th bit on the lsb for the test
}

CS141-L3-15 Tarun Soni, Summer ‘03 Unsigned basic multiplier: 64-bit Multiplicand reg, 64-bit ALU, 64-bit Product reg, 32-bit Multiplier reg. Multiplier = datapath + control. [Figure: Multiplicand register (64 bits, shift left) and Product register (64 bits, write) feed the 64-bit ALU; Multiplier register (32 bits, shift right); Control drives the shifts and the write.]
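As a runnable counterpart to the first-version algorithm, the following C sketch mirrors the register widths listed on the slide. The function name `mul_v1` is an assumption made for illustration; the loop body follows the slide's three steps (test, shift multiplicand left, shift multiplier right).

```c
#include <stdint.h>

/* Multiplier version 1: 64-bit multiplicand and product, 32-bit multiplier. */
uint64_t mul_v1(uint32_t multiplicand32, uint32_t multiplier) {
    uint64_t multiplicand = multiplicand32;  /* starts in the right half */
    uint64_t product = 0;
    for (int i = 0; i < 32; i++) {
        if (multiplier & 1u) {
            product += multiplicand;         /* 64-bit add */
        }
        multiplicand <<= 1;                  /* prepare for the next add */
        multiplier >>= 1;                    /* bring the next bit to the lsb */
    }
    return product;
}
```

Half of the 64-bit multiplicand register is always zero here, which is the waste the later versions remove.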

CS141-L3-16 Tarun Soni, Summer ‘03 Some observations
 – 1 clock per cycle => about 100 clocks per multiply (32 iterations of up to 3 steps each)
 – Ratio of multiply to add: 5:1 to 100:1
 – 1/2 of the bits in the multiplicand are always 0 => the 64-bit adder is wasted
 – 0s are inserted at the right of the multiplicand as it is shifted left => the least significant bits of the product never change once formed
 – Instead of shifting the multiplicand to the left, shift the product to the right?
Questions to ask: Speed? Power/efficiency of the adder? Pattern of the result in the product register?

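CS141-L3-17 Tarun Soni, Summer ‘03 Multiplier 2.0: 32-bit Multiplicand reg, 32-bit ALU, 64-bit Product reg, 32-bit Multiplier reg. [Figure: Multiplicand register (32 bits) feeds the 32-bit ALU together with the left half of the Product register (64 bits, shift right, write); Multiplier register (32 bits, shift right); Control.]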

CS141-L3-18 Tarun Soni, Summer ‘03 Multiplier 2.0: flowchart
Start
1. Test Multiplier0.
   Multiplier0 = 1: 1a. Add the multiplicand to the left half of the product & place the result in the left half of the Product register.
   Multiplier0 = 0: skip the add.
2. Shift the Product register right 1 bit.
3. Shift the Multiplier register right 1 bit.
32nd repetition? No (< 32 repetitions): go back to step 1. Yes (32 repetitions): Done.
for (i = 0; i < 32; i++) {
    if (multiplier[0] == 1) {
        product[63:32] += multiplicand;   // product is a 64-bit register; the adder is 32 bits wide!
    }
    product >>= 1;                        // shift product right, saving product[i:0] for the final result
    multiplier >>= 1;                     // position the i’th bit on the lsb for the test
}

CS141-L3-19 Tarun Soni, Summer ‘03 Multiplier 2.0. [Figure: the version 2.0 datapath (32-bit Multiplicand, 32-bit ALU, 64-bit Product with shift right and write, 32-bit Multiplier with shift right, Control) next to a worked trace with columns Product | Multiplier | Multiplicand | Next.]

CS141-L3-20 Tarun Soni, Summer ‘03 Multiplier 3.0. The Product register wastes space that exactly matches the size of the multiplier => combine the Multiplier register and the Product register. 32-bit Multiplicand reg, 32-bit ALU, 64-bit Product reg (0-bit Multiplier reg). [Figure: Multiplicand register (32 bits) and the left half of the Product register feed the 32-bit ALU; the Product register (64 bits, shift right, write) holds the multiplier in its right half; Control.]

CS141-L3-21 Tarun Soni, Summer ‘03 Multiplier 3.0: flowchart
Start (multiplier loaded into the right half of the Product register)
1. Test Product0.
   Product0 = 1: 1a. Add the multiplicand to the left half of the product & place the result in the left half of the Product register.
   Product0 = 0: skip the add.
2. Shift the Product register right 1 bit.
32nd repetition? No (< 32 repetitions): go back to step 1. Yes (32 repetitions): Done.
for (i = 0; i < 32; i++) {
    if (product[0] == 1) {
        product[63:32] += multiplicand;   // product is a 64-bit register; the adder is 32 bits wide!
    }
    product >>= 1;                        // shift product right, saving product[i:0] for the final result
}
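The combined-register version can be sketched in C as follows. The name `mul_v3` is an illustrative assumption. One detail the flowchart leaves implicit is the carry out of the 32-bit add; this sketch keeps it in a separate variable and shifts it back in, which is one way (not necessarily the hardware's way) to avoid losing it.

```c
#include <stdint.h>

/* Multiplier version 3: the multiplier lives in the right half of the product. */
uint64_t mul_v3(uint32_t multiplicand, uint32_t multiplier) {
    uint64_t product = multiplier;           /* left half starts at 0 */
    for (int i = 0; i < 32; i++) {
        uint64_t carry = 0;
        if (product & 1u) {
            /* 32-bit add into the left half; the sum may need 33 bits. */
            uint64_t hi = (product >> 32) + multiplicand;
            carry = hi >> 32;
            product = (hi << 32) | (product & 0xFFFFFFFFu);
        }
        /* Shift right 1 bit, feeding the carry into the top bit. */
        product = (product >> 1) | (carry << 63);
    }
    return product;                          /* Hi = left half, Lo = right half */
}
```

After 32 iterations every multiplier bit has been shifted out, and the register holds the full 64-bit unsigned product.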

CS141-L3-22 Tarun Soni, Summer ‘03 More observations
 – 2 steps per bit because Multiplier & Product are combined
 – MIPS registers Hi and Lo are the left and right halves of Product
 – Gives us the MIPS instruction multu
 – How can you make it faster?
What about signed multiplication?
 – The easiest solution is to make both operands positive & remember whether to complement the product when done (leave out the sign bit, run for 31 steps).
 – Apply the definition of 2’s complement: need to sign-extend partial products and subtract at the end.
 – Booth’s Algorithm is an elegant way to multiply signed numbers using the same hardware as before and save cycles; it can handle multiple bits at a time.

CS141-L3-23 Tarun Soni, Summer ‘03 Booth’s algorithm. Example: 2 x 6 = 0010 x 0110.
Conventional shift-and-add, one step per multiplier bit starting from the right:
 – shift (0 in multiplier)
 – add (1 in multiplier)
 – add (1 in multiplier)
 – shift (0 in multiplier)
An ALU with add or subtract gets the same result in more than one way: 6 = –2 + 8, i.e. 0110 = –0010 + 1000.
For example, 0010 x 0110 again:
 – shift (0 in multiplier)
 – sub (first 1 in multiplier)
 – shift (middle of string of 1s)
 – add (prior step had last 1)

CS141-L3-24 Tarun Soni, Summer ‘03 Booth’s algorithm
 Current bit   Bit to the right   Explanation           Op
 1             0                  Begins run of 1s      sub
 1             1                  Middle of run of 1s   none
 0             1                  End of run of 1s      add
 0             0                  Middle of run of 0s   none
Originally for speed (when shift was faster than add). Replace a string of 1s in the multiplier with an initial subtract when we first see a one, and then later an add for the bit after the last one.

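CS141-L3-25 Tarun Soni, Summer ‘03 Booth’s algorithm: example (2 x 7). [Trace table with columns Operation | Multiplicand | Product | next?]
 0.  initial value           next: 10 -> sub
 1a. P = P - m               next: shift P (sign ext)
 1b. shift                   next: 11 -> nop, shift
 2.  shift                   next: 11 -> nop, shift
 3.  shift                   next: 01 -> add
 4a. P = P + m               next: shift
 4b. shift                   done

CS141-L3-26 Tarun Soni, Summer ‘03 Booth’s algorithm: example (2 x -3). [Trace table with columns Operation | Multiplicand | Product | next?]
 0.  initial value           next: 10 -> sub
 1a. P = P - m               next: shift P (sign ext)
 1b. shift                   next: 01 -> add
 2a. P = P + m               next: shift P
 2b. shift                   next: 10 -> sub
 3a. P = P - m               next: shift
 3b. shift                   next: 11 -> nop
 4a. nop                     next: shift
 4b. shift                   done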
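The bit-pair rule and the sign-extending shift can be checked with a small C sketch of Booth's algorithm on 32-bit operands. The name `booth_mul` and the choice to keep the left half of the product in a wide signed variable are assumptions made for this illustration; the arithmetic shift is written as a floor division because right-shifting negative values is implementation-defined in C.

```c
#include <stdint.h>

/* Booth's algorithm: signed 32 x 32 -> 64-bit product. */
int64_t booth_mul(int32_t multiplicand, int32_t multiplier) {
    int64_t  hi = 0;                       /* left half of Product (signed) */
    uint32_t lo = (uint32_t)multiplier;    /* right half holds the multiplier */
    int prev = 0;                          /* "bit to the right", initially 0 */
    for (int i = 0; i < 32; i++) {
        int cur = (int)(lo & 1u);
        if (cur == 1 && prev == 0) {
            hi -= multiplicand;            /* 10: begins a run of 1s -> sub */
        } else if (cur == 0 && prev == 1) {
            hi += multiplicand;            /* 01: ends a run of 1s -> add */
        }                                  /* 00 and 11: no operation */
        prev = cur;
        /* Arithmetic shift right of the pair hi:lo by one bit. */
        uint32_t moved = (uint32_t)(hi & 1);
        lo = (lo >> 1) | (moved << 31);
        hi = (hi - (hi & 1)) / 2;          /* floor(hi / 2), keeps the sign */
    }
    return hi * 4294967296LL + (int64_t)lo;
}
```

Both worked examples on the slides come out as expected: `booth_mul(2, 7)` is 14 and `booth_mul(2, -3)` is -6.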

CS141-L3-27 Tarun Soni, Summer ‘03 Division. [Long-division example: Quotient 1001 above the Dividend, Divisor to the left, successive subtractions, Remainder (or modulo result) at the bottom.]
 – See how big a number can be subtracted, creating a quotient bit on each step
 – Binary => 1 * divisor or 0 * divisor
 – Dividend = Quotient x Divisor + Remainder => sizeof(Dividend) = sizeof(Quotient) + sizeof(Divisor)
 – 3 versions of divide, successive refinement

CS141-L3-28 Tarun Soni, Summer ‘03 Division 1.0: 64-bit Divisor reg, 64-bit ALU, 64-bit Remainder reg, 32-bit Quotient reg. [Figure: Divisor register (64 bits, shift right) and Remainder register (64 bits, write) feed the 64-bit ALU; Quotient register (32 bits, shift left); Control.]

CS141-L3-29 Tarun Soni, Summer ‘03 Division 1.0: flowchart. Takes n+1 steps for an n-bit Quotient & Remainder.
Start
1. Subtract the Divisor register from the Remainder register, and place the result in the Remainder register.
   Test Remainder.
   Remainder >= 0: 2a. Shift the Quotient register to the left, setting the new rightmost bit to 1.
   Remainder < 0:  2b. Restore the original value by adding the Divisor register to the Remainder register, and place the sum in the Remainder register. Also shift the Quotient register to the left, setting the new least significant bit to 0.
3. Shift the Divisor register right 1 bit.
33rd repetition? No (< 33 repetitions): go back to step 1. Yes (33 repetitions): Done.
[Trace table with columns Quotient | Divisor | Remainder]
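A minimal C sketch of this first divider follows. The names `divres_t` and `div_v1` are assumptions for illustration. Because the registers are modelled as unsigned values, the "subtract, test for negative, restore" sequence is written as a comparison before the subtraction, which has the same effect.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t quotient, remainder; } divres_t;

/* Divider version 1: 64-bit divisor and remainder, 33 iterations. */
divres_t div_v1(uint32_t dividend, uint32_t divisor32) {
    assert(divisor32 != 0);
    uint64_t remainder = dividend;                 /* dividend in the right half */
    uint64_t divisor = (uint64_t)divisor32 << 32;  /* divisor starts in the left half */
    uint32_t quotient = 0;
    for (int i = 0; i < 33; i++) {
        if (remainder >= divisor) {
            remainder -= divisor;                  /* result stays >= 0 */
            quotient = (quotient << 1) | 1u;       /* step 2a */
        } else {
            quotient <<= 1;                        /* step 2b: restore, shift in 0 */
        }
        divisor >>= 1;                             /* step 3 */
    }
    return (divres_t){ quotient, (uint32_t)remainder };
}
```

The first of the 33 iterations can never produce a 1, since the divisor still sits entirely in the left half; that wasted step is what the next version removes.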

CS141-L3-30 Tarun Soni, Summer ‘03 Division 2.0
 – 1/2 of the bits in the divisor are always 0 => 1/2 of the 64-bit adder is wasted => 1/2 of the divisor register is wasted
 – Instead of shifting the divisor to the right, shift the remainder to the left?
 – The 1st step cannot produce a 1 in the quotient bit (otherwise the quotient would be too big) => switch the order to shift first and then subtract, which can save 1 iteration
32-bit Divisor reg, 32-bit ALU, 64-bit Remainder reg, 32-bit Quotient reg. [Figure: Divisor register (32 bits) and the left half of the Remainder register (64 bits, shift left, write) feed the 32-bit ALU; Quotient register (32 bits, shift left); Control.]

CS141-L3-31 Tarun Soni, Summer ‘03 Division 2.0: flowchart
Start: Place the Dividend in the Remainder register.
1. Shift the Remainder register left 1 bit.
2. Subtract the Divisor register from the left half of the Remainder register, & place the result in the left half of the Remainder register.
   Test Remainder.
   Remainder >= 0: 3a. Shift the Quotient register to the left, setting the new rightmost bit to 1.
   Remainder < 0:  3b. Restore the original value by adding the Divisor register to the left half of the Remainder register, & place the sum in the left half of the Remainder register. Also shift the Quotient register to the left, setting the new least significant bit to 0.
nth repetition? No (< n repetitions): go back to step 1. Yes (n repetitions): Done.

CS141-L3-32 Tarun Soni, Summer ‘03 Division 3.0. Eliminate the Quotient register by combining it with the Remainder register as it is shifted left.
 – Start by shifting the Remainder left as before.
 – Thereafter the loop contains only two steps, because shifting the Remainder register shifts both the remainder in the left half and the quotient in the right half.
 – The consequence of combining the two registers and the new order of operations in the loop is that the remainder will be shifted left one time too many.
 – Thus the final correction step must shift back only the remainder in the left half of the register.
32-bit Divisor reg, 32-bit ALU, 64-bit Remainder reg (0-bit Quotient reg). [Figure: Divisor register (32 bits) and the left half ("HI") of the Remainder register feed the 32-bit ALU; the Remainder register (64 bits, shift left, write) holds the quotient in its right half ("LO"); Control.]

CS141-L3-33 Tarun Soni, Summer ‘03 Division 3.0: flowchart. [Trace table with columns Remainder | Divisor]
Start: Place the Dividend in the Remainder register.
1. Shift the Remainder register left 1 bit.
2. Subtract the Divisor register from the left half of the Remainder register, & place the result in the left half of the Remainder register.
   Test Remainder.
   Remainder >= 0: 3a. Shift the Remainder register to the left, setting the new rightmost bit to 1.
   Remainder < 0:  3b. Restore the original value by adding the Divisor register to the left half of the Remainder register, & place the sum in the left half of the Remainder register. Also shift the Remainder register to the left, setting the new least significant bit to 0.
nth repetition? No (< n repetitions): go back to step 2. Yes (n repetitions; n = 4 here): Done. Shift the left half of the Remainder right 1 bit.
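The combined remainder/quotient register can be modelled in C as a sketch. The names `divres_t` and `div_v3` are illustrative assumptions, and n is fixed at 32. The left half is held in a variable wider than 32 bits so that the bit pushed out by the extra left shift is not lost; that is a modelling choice of this sketch, not something the slide specifies.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t quotient, remainder; } divres_t;

/* Divider version 3: quotient bits are shifted into the right half. */
divres_t div_v3(uint32_t dividend, uint32_t divisor) {
    assert(divisor != 0);
    uint64_t hi = 0;                 /* left half ("HI"), may briefly need 33 bits */
    uint32_t lo = dividend;          /* right half ("LO") */

    hi = (hi << 1) | (lo >> 31);     /* step 1: initial shift left by 1 */
    lo <<= 1;

    for (int i = 0; i < 32; i++) {
        uint32_t bit = 0;
        if (hi >= divisor) {         /* step 2: the subtraction would stay >= 0 */
            hi -= divisor;
            bit = 1;                 /* step 3a; otherwise 3b leaves hi restored */
        }
        hi = (hi << 1) | (lo >> 31); /* shift the whole register left ... */
        lo = (lo << 1) | bit;        /* ... and insert the new quotient bit */
    }
    hi >>= 1;                        /* final correction: undo the extra shift */
    return (divres_t){ lo, (uint32_t)hi };
}
```

At the end the right half holds the quotient (Lo) and the corrected left half holds the remainder (Hi).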

CS141-L3-34 Tarun Soni, Summer ‘03 Division: some signed details. Sign of remainder = ?
 7/4 = (Q=1, R=3) or 7/4 = (Q=2, R=-1): both satisfy a = Q*b + R. Which do you prefer?
Convention: for a/b = (Q, R), Sign(R) = Sign(a). Thus 7/4 = (Q=1, R=3) and -7/4 = (Q=-1, R=-3).
[Figure: number line marking -a, 0 and a, showing a = Q*b + R with R measured from Q*b toward a.]
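One way to apply this convention on top of an unsigned divider is to divide the magnitudes and fix the signs afterwards. The sketch below does that in C; `sdivres_t` and `signed_div` are hypothetical names, and the one overflowing case (most negative value divided by -1) is simply excluded by an assertion.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int32_t quotient, remainder; } sdivres_t;

/* Signed divide built on unsigned magnitudes; remainder takes the dividend's sign. */
sdivres_t signed_div(int32_t a, int32_t b) {
    assert(b != 0);
    assert(!(a == INT32_MIN && b == -1));   /* quotient would not fit */
    int64_t wa = a, wb = b;                 /* widen so negation cannot overflow */
    uint32_t ua = (uint32_t)(wa < 0 ? -wa : wa);
    uint32_t ub = (uint32_t)(wb < 0 ? -wb : wb);
    uint32_t uq = ua / ub;
    uint32_t ur = ua % ub;
    /* Quotient is negative when the operand signs differ. */
    int64_t q = ((a < 0) != (b < 0)) ? -(int64_t)uq : (int64_t)uq;
    /* Remainder follows the sign of the dividend a. */
    int64_t r = (a < 0) ? -(int64_t)ur : (int64_t)ur;
    return (sdivres_t){ (int32_t)q, (int32_t)r };
}
```

C99 and later define `/` and `%` on signed integers the same way (truncation toward zero), so `-7 / 4` is -1 and `-7 % 4` is -3.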

CS141-L3-35 Tarun Soni, Summer ‘03 Floating Point. What can be represented in N bits?
 Unsigned:        0 to 2^N - 1
 2s complement:   -2^(N-1) to 2^(N-1) - 1
 1s complement:   -2^(N-1) + 1 to 2^(N-1) - 1
But what about:
 – very large numbers? 9,349,398,989,787,762,244,859,087,678
 – very small numbers?
 – rationals: 2/3
 – irrationals: sqrt(2)
 – transcendentals: e

CS141-L3-36 Tarun Soni, Summer ‘03 Floating Point. Scientific notation has the form mantissa x radix^exponent, e.g. a mantissa of 6.02 (with its decimal point) times the radix (base) 10 raised to an exponent. IEEE F.P.: ±1.M x 2^e
Issues:
 ° Arithmetic (+, -, *, /)
 ° Representation, normal form
 ° Range and precision
 ° Rounding
 ° Exceptions (e.g., divide by zero, overflow, underflow)
 ° Errors
 ° Properties (negation, inversion, if A ≠ B then A - B ≠ 0)

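CS141-L3-37 Tarun Soni, Summer ‘03 Floating Point: binary fractions. A 4-bit integer d3 d2 d1 d0 = d3 x 2^3 + d2 x 2^2 + d1 x 2^1 + d0 x 2^0, so with a binary point d2 d1 d0 . d-1 d-2 d-3 = d2 x 2^2 + d1 x 2^1 + d0 x 2^0 + d-1 x 2^-1 + d-2 x 2^-2 + d-3 x 2^-3. E.g., .75 = 3/4 = 3/2^2 = 1/2 + 1/4 = .11 (binary)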

CS141-L3-38 Tarun Soni, Summer ‘03 Floating Point. Representation of floating point numbers in the IEEE 754 standard, single precision:
 Fields (width in bits): S sign (1) | E exponent (8) | M mantissa (23)
 – exponent: excess-127 binary integer; the actual exponent is e = E - 127
 – mantissa: sign + magnitude, normalized binary significand with hidden integer bit: 1.M
 N = (-1)^S x 2^(E-127) x (1.M), with 0 < E < 255
The magnitude of numbers that can be represented is in the range 2^-126 x (1.0) to 2^127 x (2 - 2^-23), which is approximately 1.18 x 10^-38 to 3.40 x 10^38.
Integer comparison is valid on IEEE floating point numbers of the same sign!

CS141-L3-39 Tarun Soni, Summer ‘03 Floating Point
 – The leading “1” bit of the significand is implicit
 – The exponent is “biased” to make sorting easier: all 0s is the smallest exponent, all 1s is the largest
 – bias of 127 for single precision and 1023 for double precision
 – summary: (–1)^sign x (1 + significand) x 2^(exponent – bias)
Example:
 – decimal: -.75 = -3/4 = -3/2^2
 – binary: -.11 = -1.1 x 2^-1
 – floating point: exponent = -1 + 127 = 126 = 01111110
 – IEEE single precision: Sign = 1, Exponent = 01111110, Significand = 10000000000000000000000
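The -0.75 example can be checked on a real machine by pulling the three fields out of a `float`. This C sketch assumes the platform's `float` is the IEEE 754 single-precision format (true on common hardware, but not guaranteed by the C standard); `ieee_fields_t` and `decode_float` are names invented for the illustration.

```c
#include <stdint.h>
#include <string.h>

typedef struct { uint32_t sign, exponent, fraction; } ieee_fields_t;

/* Split a float into sign (1 bit), biased exponent (8 bits), fraction (23 bits). */
ieee_fields_t decode_float(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);        /* reinterpret the bytes safely */
    return (ieee_fields_t){
        bits >> 31,                        /* S */
        (bits >> 23) & 0xFFu,              /* E, still biased by 127 */
        bits & 0x7FFFFFu                   /* M, hidden leading 1 not stored */
    };
}
```

For `decode_float(-0.75f)` the fields are sign 1, exponent 126 and fraction 0x400000, which is the bit pattern 1 01111110 100...0 from the example.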

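CS141-L3-40 Tarun Soni, Summer ‘03 Floating Point Addition. How do you add in scientific notation?
Basic algorithm: 1. Align 2. Add 3. Normalize 4. Round
Approximate algorithm (assumes Exp(A) >= Exp(B); swap the operands otherwise):
while (Exp(A) > Exp(B)) {                 // 1. align: make the exponents equal
    shift Mantissa(B) right;
    Exp(B)++;
}
Mantissa(Result) = Mantissa(A) + Mantissa(B);   // 2. add
Exp(Result) = Exp(A);                           // or Exp(B), they are equal now
if (Mantissa(Result) overflowed) {        // 3. normalize after a carry out
    shift Mantissa(Result) right;
    Exp(Result)++;
}
while (Mantissa(Result)[msb] != 1) {      // 3. normalize after cancellation
    shift Mantissa(Result) left;
    Exp(Result)--;
}
Round(Mantissa(Result));                  // 4. round; renormalize if rounding carries out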
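The align / add / normalize / round sequence can be made runnable with a deliberately small model. The sketch below is an assumption-laden toy, not IEEE 754 arithmetic: it handles only positive normalized values, rounds by truncation, and ignores overflow, zero and special values. The type `toyfp` and the function `toy_add` are hypothetical.

```c
#include <stdint.h>

/* Toy positive float: value = sig * 2^(exp - 23), with sig in [2^23, 2^24),
 * i.e. a 24-bit significand whose top bit is the (explicit) leading 1. */
typedef struct { uint32_t sig; int exp; } toyfp;

toyfp toy_add(toyfp a, toyfp b) {
    if (a.exp < b.exp) {                    /* make a the larger-exponent operand */
        toyfp t = a; a = b; b = t;
    }
    int shift = a.exp - b.exp;              /* 1. align the smaller operand */
    uint32_t bsig = (shift < 32) ? (b.sig >> shift) : 0;

    uint32_t sum = a.sig + bsig;            /* 2. add: sum < 2^25 */
    int exp = a.exp;

    if (sum >> 24) {                        /* 3. normalize after carry out */
        sum >>= 1;                          /* 4. "round" by dropping the low bit */
        exp++;
    }
    return (toyfp){ sum, exp };
}
```

As a check, 1.5 is `{0xC00000, 0}` and 0.75 is `{0xC00000, -1}`; their sum comes out as `{0x900000, 1}`, which is 1.125 x 2^1 = 2.25.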

CS141-L3-41Tarun Soni, Summer ‘03 Floating Point

CS141-L3-42Tarun Soni, Summer ‘03 Floating Point Addition

CS141-L3-43 Tarun Soni, Summer ‘03 Floating Point Multiplication. How do you multiply in scientific notation? (9.9 x 10^4)(5.2 x 10^2) = 51.48 x 10^6 = 5.148 x 10^7
Basic algorithm:
 1. Add exponents
 1a. Correct for the bias in the exponent representation (Exp -= 127), since adding two biased exponents counts the bias twice
 2. Multiply
 3. Normalize
 4. Round
 5. Set sign
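The five steps map onto a few lines of C if a simplified number format is assumed. In the sketch below, `toyfp` and `toy_mul` are hypothetical names; the format borrows the single-precision bias of 127 and a 24-bit significand, rounds by truncation, and does not check for exponent overflow, zero or special values.

```c
#include <stdint.h>

/* Toy float: sign bit, biased exponent (bias 127), and a 24-bit significand
 * sig in [2^23, 2^24) whose top bit is the leading 1. */
typedef struct { uint32_t sign; int32_t biased_exp; uint32_t sig; } toyfp;

toyfp toy_mul(toyfp a, toyfp b) {
    toyfp r;
    /* 1 and 1a: add exponents, then remove the doubly counted bias. */
    r.biased_exp = a.biased_exp + b.biased_exp - 127;
    /* 2: multiply significands; the product lies in [2^46, 2^48). */
    uint64_t p = (uint64_t)a.sig * b.sig;
    /* 3: normalize if the product reached 2^47 (i.e. is >= 2.0). */
    if (p >> 47) {
        p >>= 1;
        r.biased_exp++;
    }
    /* 4: round by truncating back to 24 bits. */
    r.sig = (uint32_t)(p >> 23);
    /* 5: set the sign. */
    r.sign = a.sign ^ b.sign;
    return r;
}
```

For instance, -1.5 times 1.5 (both with biased exponent 127 and significand 0xC00000) gives sign 1, biased exponent 128 and significand 0x900000, i.e. -1.125 x 2^1 = -2.25.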

CS141-L3-44 Tarun Soni, Summer ‘03 Floating Point Accuracy Issues
FP accuracy:
 – Extremely important in scientific calculations
 – Very tiny errors can accumulate over time
The IEEE 754 FP standard has four rounding modes:
 – always round up
 – always round down
 – truncate
 – round to nearest => in case of a tie, round to nearest even
Requires extra bits in intermediate representations.

CS141-L3-45 Tarun Soni, Summer ‘03 Floating Point Accuracy Issues. How many extra bits?
 – Guard bits: bits to the right of the least significant bit of the significand, computed for use in normalization (they could become significant at that point) and rounding.
 – IEEE 754 has three extra bits and calls them guard, round, and sticky.
 – IEEE spec: the result is as if it had been computed exactly and then rounded.

CS141-L3-46 Tarun Soni, Summer ‘03 Floating Point Overflows: Infinity and NaNs
Infinity:
 – the result of an operation overflows, i.e., is larger than the largest number that can be represented
 – overflow is not the same as divide by zero (which raises a different exception)
 – +/- infinity is encoded as sign S, exponent all 1s, fraction all 0s
 – it may make sense to do further computations with infinity, e.g., X/0 > Y may be a valid comparison
NaN:
 – not a number, but not infinity (e.g. sqrt(-4))
 – raises the invalid operation exception (unless the operation is = or ≠)
 – NaN is encoded as sign S, exponent all 1s, fraction non-zero; HW decides what goes in the fraction
 – NaNs propagate: f(NaN) = NaN
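These behaviours are easy to observe from C on a machine with IEEE 754 doubles and the default (non-trapping) exception handling, which is an assumption of this sketch. The helper names are invented for illustration; `volatile` is used only so the compiler performs the divisions at run time.

```c
#include <math.h>

/* 1.0 / 0.0 at run time: raises divide-by-zero and returns +infinity. */
double make_infinity(void) {
    volatile double zero = 0.0;
    return 1.0 / zero;
}

/* 0.0 / 0.0 at run time: raises invalid-operation and returns a NaN. */
double make_nan(void) {
    volatile double zero = 0.0;
    return zero / zero;
}

/* NaNs propagate through arithmetic: NaN + 1.0 is still a NaN. */
int nan_propagates(double x) {
    return isnan(x + 1.0);
}
```

Comparisons against infinity stay meaningful (`make_infinity() > 1e308` holds), while a NaN compares unequal to everything, including itself.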

CS141-L3-47 Tarun Soni, Summer ‘03 Summary
 – Multiplication and division take much longer than addition, requiring multiple addition steps.
 – Floating point extends the range of numbers that can be represented, at the expense of precision (accuracy).
 – FP operations are very similar to integer operations, but with pre- and post-processing.
 – Rounding implementation is critical to accuracy over time.