Presentation transcript:

1 Chapter Three Last revision: 4/17/2015

2 Arithmetic Where we've been: –Performance (seconds, cycles, instructions) –Abstractions: Instruction Set Architecture, Assembly Language and Machine Language What's up ahead: –Implementing the Architecture [Figure: ALU with 32-bit inputs a and b, an operation select input, and a 32-bit result]

3 Bits are just bits (no inherent meaning) — conventions define relationship between bits and numbers Binary numbers (base 2): 0000, 0001, 0010, 0011, ... represent decimal 0 to 2^n – 1 for n bits Of course it gets more complicated: numbers are finite (overflow) fractions and real numbers negative numbers (e.g., no MIPS subi instruction; addi can add a negative number) How do we represent negative numbers? i.e., which bit patterns will represent which numbers? Numbers

4 Sign Magnitude / One's Complement / Two's Complement (3-bit examples):
000 = +0 / +0 / +0
001 = +1 / +1 / +1
010 = +2 / +2 / +2
011 = +3 / +3 / +3
100 = -0 / -3 / -4
101 = -1 / -2 / -3
110 = -2 / -1 / -2
111 = -3 / -0 / -1
Issues: balance, number of zeros, ease of operations Which one is best? Why? Possible Representations

5 32 bit signed numbers:
0000 0000 0000 0000 0000 0000 0000 0000 two = 0 ten
0000 0000 0000 0000 0000 0000 0000 0001 two = + 1 ten
0000 0000 0000 0000 0000 0000 0000 0010 two = + 2 ten
...
0111 1111 1111 1111 1111 1111 1111 1110 two = + 2,147,483,646 ten
0111 1111 1111 1111 1111 1111 1111 1111 two = + 2,147,483,647 ten (maxint)
1000 0000 0000 0000 0000 0000 0000 0000 two = – 2,147,483,648 ten (minint)
1000 0000 0000 0000 0000 0000 0000 0001 two = – 2,147,483,647 ten
1000 0000 0000 0000 0000 0000 0000 0010 two = – 2,147,483,646 ten
...
1111 1111 1111 1111 1111 1111 1111 1101 two = – 3 ten
1111 1111 1111 1111 1111 1111 1111 1110 two = – 2 ten
1111 1111 1111 1111 1111 1111 1111 1111 two = – 1 ten
MIPS

6 Negating a two's complement number: invert all bits and add 1 –remember: “negate (+/-)” and “invert (1/0)” are quite different! Converting n bit numbers into numbers with more than n bits: –MIPS 16 bit immediate gets converted to 32 bits for arithmetic –copy the most significant bit (the sign bit) into the other bits: 0010 -> 0000 0010, 1010 -> 1111 1010 –"sign extension" (lbu vs. lb) Two's Complement Operations
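
A minimal C sketch (not from the slides) of both operations on 16-bit values: negation by invert-and-add-one, and sign extension by copying the sign bit when widening to 32 bits.

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        int16_t x = 5;

        /* Negate: invert all bits, then add 1 (two's complement). */
        int16_t neg = (int16_t)(~x + 1);

        /* Sign-extend the 16-bit value to 32 bits: the cast copies the
           sign bit into the upper 16 bits, just as lb/addi do in MIPS. */
        int32_t wide = (int32_t)neg;

        /* Zero-extend instead (the lbu-style alternative): upper bits become 0. */
        uint32_t zext = (uint16_t)neg;

        printf("x=%d  -x=%d  sign-extended=0x%08x  zero-extended=0x%08x\n",
               x, neg, (unsigned)wide, (unsigned)zext);
        return 0;
    }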

7 Just like in grade school (carry/borrow 1s) Two's complement operations are easy –subtraction using addition of negative numbers Overflow (result too large for finite computer word): –e.g., adding two n-bit numbers does not always yield an n-bit number note that the term overflow is somewhat misleading: it does not mean a carry “overflowed” Addition & Subtraction

8 No overflow when adding a positive and a negative number No overflow when signs are the same for subtraction Overflow occurs when the value affects the sign: –overflow when adding two positives yields a negative –or, adding two negatives gives a positive –or, subtract a negative from a positive and get a negative –or, subtract a positive from a negative and get a positive Consider the operations A + B, and A – B –Can overflow occur if B is 0 ? –Can overflow occur if A is 0 ? Detecting Overflow
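
A small C sketch (not from the slides) of the rule above: signed overflow in A + B can only happen when A and B have the same sign and the sum's sign differs; when B = 0 or A = 0 the sum simply equals the other operand, so overflow is impossible.

    #include <stdint.h>
    #include <stdio.h>

    /* Detects signed overflow of a + b without relying on undefined behaviour:
       do the addition in unsigned arithmetic, then inspect the sign bits. */
    int add_overflows(int32_t a, int32_t b) {
        uint32_t sum = (uint32_t)a + (uint32_t)b;
        /* Overflow iff a and b have the same sign but the sum's sign differs. */
        return ((a >= 0) == (b >= 0)) && ((int32_t)sum >= 0) != (a >= 0);
    }

    int main(void) {
        printf("%d\n", add_overflows(2000000000, 2000000000)); /* 1: two positives give a negative */
        printf("%d\n", add_overflows(5, -7));                  /* 0: mixed signs never overflow    */
        printf("%d\n", add_overflows(0, -2147483647 - 1));     /* 0: A == 0 cannot cause overflow  */
        return 0;
    }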

9 An exception (interrupt) occurs –Control jumps to predefined address for exception –Interrupted address is saved for possible resumption Details based on software system / language –example: flight control vs. homework assignment Don't always want to detect overflow — new MIPS instructions: addu, addiu, subu note: addiu still sign-extends! note: sltu, sltiu for unsigned comparisons –Roll over: circular buffers –Saturation: pixel lightness control Effects of Overflow

10 Problem: Consider a logic function with three inputs: A, B, and C. Output D is true if at least one input is true Output E is true if exactly two inputs are true Output F is true only if all three inputs are true Show the truth table for these three functions. Show the Boolean equations for these three functions. Show an implementation consisting of inverters, AND, and OR gates. Review: Boolean Algebra & Gates
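
One way to check your answers (not part of the slides): a short C loop that enumerates the truth table. D is the three-input OR, F is the three-input AND, and E is true when exactly two inputs are true (as a sum of products, E = AB·C' + A·B'·C + A'·BC).

    #include <stdio.h>

    int main(void) {
        printf("A B C | D E F\n");
        for (int a = 0; a <= 1; a++)
            for (int b = 0; b <= 1; b++)
                for (int c = 0; c <= 1; c++) {
                    int d = a | b | c;            /* at least one input true */
                    int e = (a + b + c) == 2;     /* exactly two inputs true */
                    int f = a & b & c;            /* all three inputs true   */
                    printf("%d %d %d | %d %d %d\n", a, b, c, d, e, f);
                }
        return 0;
    }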

11 Let's build an ALU to support the andi and ori instructions –we'll just build a 1 bit ALU, and use 32 of them Possible Implementation (sum-of-products): [Figure: 1-bit ALU with data inputs a and b, control input operation, output result, plus its truth table (op, a, b, res)] An ALU (arithmetic logic unit)

12 Selects one of the inputs to be the output, based on a control input Let's build our ALU using a MUX: [Figure: 2-input multiplexor with data inputs A and B, select input S, output C] Review: The Multiplexor note: we call this a 2-input mux even though it has 3 inputs!

13 Not easy to decide the “best” way to build something –Don't want too many inputs to a single gate –Don't want to have to go through too many gates –for our purposes, ease of comprehension is important Let's look at a 1-bit ALU for addition: How could we build a 1-bit ALU for add, and, and or? How could we build a 32-bit ALU? Different Implementations cout = a·b + a·cin + b·cin sum = a XOR b XOR cin
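
The two adder equations translate directly into code. Here is a small C sketch (an illustration, not the hardware) of a 1-bit full adder used bit by bit to add two words, mimicking the ripple of carries.

    #include <stdint.h>
    #include <stdio.h>

    /* One-bit full adder: sum = a xor b xor cin, cout = ab + a*cin + b*cin. */
    static void full_adder(int a, int b, int cin, int *sum, int *cout) {
        *sum  = a ^ b ^ cin;
        *cout = (a & b) | (a & cin) | (b & cin);
    }

    /* Add two 32-bit words one bit at a time (a software ripple-carry adder). */
    uint32_t ripple_add(uint32_t a, uint32_t b) {
        uint32_t result = 0;
        int carry = 0;
        for (int i = 0; i < 32; i++) {
            int s;
            full_adder((a >> i) & 1, (b >> i) & 1, carry, &s, &carry);
            result |= (uint32_t)s << i;
        }
        return result;
    }

    int main(void) {
        printf("%u\n", (unsigned)ripple_add(123456u, 654321u)); /* prints 777777 */
        return 0;
    }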

14 Building a 32 bit ALU
R0 = a AND b; R1 = a OR b; R2 = a + b;
Case (Op) { 0: R = R0; 1: R = R1; 2: R = R2 }
which is the same as
Case (Op) { 0: R = a AND b; 1: R = a OR b; 2: R = a + b }
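
A C rendering of the case statement above (a sketch, not the hardware; the op encodings are the ones used on this slide: 0 = AND, 1 = OR, 2 = ADD). The switch plays the role of the multiplexor that selects which result to pass through.

    #include <stdint.h>
    #include <stdio.h>

    /* Selects one of three results, just as the multiplexor in the 1-bit ALU does. */
    uint32_t alu(uint32_t a, uint32_t b, int op) {
        switch (op) {
            case 0:  return a & b;  /* R0 = a AND b */
            case 1:  return a | b;  /* R1 = a OR b  */
            case 2:  return a + b;  /* R2 = a + b   */
            default: return 0;      /* undefined op */
        }
    }

    int main(void) {
        printf("0x%08x\n", (unsigned)alu(0x0000ffffu, 0x00ff00ffu, 0)); /* AND */
        printf("0x%08x\n", (unsigned)alu(0x0000ffffu, 0x00ff00ffu, 1)); /* OR  */
        printf("%u\n",     (unsigned)alu(40u, 2u, 2));                  /* ADD */
        return 0;
    }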

15 Two's complement approach: just negate b and add. How do we negate? A very clever solution: Result = a + (~b) + 1 What about subtraction (a – b)?

16 Need to support the set-on-less-than instruction (slt) –remember: slt is an arithmetic instruction –produces a 1 if rs < rt and 0 otherwise –use subtraction: (a-b) < 0 implies a < b Need to support test for equality (beq $t5, $t6, $t7) –use subtraction: (a-b) = 0 implies a = b Tailoring the ALU to the MIPS
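
A small C sketch (illustrative helpers, not MIPS code) of how both tests fall out of the same subtraction: slt looks at the sign of a - b, beq looks at whether a - b is zero.

    #include <stdint.h>
    #include <stdio.h>

    /* slt-style result: 1 if a < b, else 0, derived from the sign of a - b.
       (For clarity this ignores the overflow correction real hardware needs
       when a - b itself overflows.) */
    int set_less_than(int32_t a, int32_t b) {
        int32_t diff = (int32_t)((uint32_t)a - (uint32_t)b);
        return diff < 0;
    }

    /* beq-style test: equal exactly when a - b is zero. */
    int equal(int32_t a, int32_t b) {
        return (a - b) == 0;
    }

    int main(void) {
        printf("%d %d\n", set_less_than(3, 7), equal(5, 5)); /* prints 1 1 */
        return 0;
    }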

17 Supporting slt Can we figure out the idea? [Figure: 1-bit ALU with data inputs a and b, a Less input, CarryIn and CarryOut, a Binvert control, and an Operation select (0–3) choosing the Result]

18

19 Test for equality Notice control lines: 000 = and, 001 = or, 010 = add, 110 = subtract, 111 = slt Note: zero is a 1 when the result is zero! [Figure: the 1-bit ALU cell (a, b, Less, CarryIn, Binvert, Operation, Result, CarryOut) replicated to form the full ALU with a Zero output]

20 Conclusion We can build an ALU to support the MIPS instruction set –key idea: use multiplexor to select the output we want –we can efficiently perform subtraction using two’s complement –we can replicate a 1-bit ALU to produce a 32-bit ALU Important points about hardware –all of the gates are always working –the speed of a gate is affected by the number of inputs to the gate –the speed of a circuit is affected by the number of gates in series (on the “critical path” or the “deepest level of logic”) Our primary focus: comprehension, however, –Clever changes to organization can improve performance (similar to using better algorithms in software) –we’ll look at two examples for addition and multiplication

21 Is a 32-bit ALU as fast as a 1-bit ALU? Is there more than one way to do addition? –two extremes: ripple carry and sum-of-products Can you see the ripple? How could you get rid of it?
c1 = b0·c0 + a0·c0 + a0·b0
c2 = b1·c1 + a1·c1 + a1·b1
c3 = b2·c2 + a2·c2 + a2·b2
c4 = b3·c3 + a3·c3 + a3·b3
Substituting each carry into the next to get a two-level sum-of-products for c2, c3, c4, ... is not feasible! Why? Problem: ripple carry adder is slow

22 An approach in-between our two extremes Motivation: –If we didn't know the value of carry-in, what could we do? –When would we always generate a carry? gi = ai·bi –When would we propagate the carry? pi = ai + bi Did we get rid of the ripple?
c1 = g0 + p0·c0
c2 = g1 + p1·c1 = g1 + p1·g0 + p1·p0·c0
c3 = g2 + p2·c2 = g2 + p2·g1 + p2·p1·g0 + p2·p1·p0·c0
c4 = g3 + p3·c3 = g3 + p3·g2 + p3·p2·g1 + p3·p2·p1·g0 + p3·p2·p1·p0·c0
Feasible! Why? Carry-lookahead adder
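
A C sketch of the generate/propagate idea for a 4-bit slice (illustrative only): every carry is computed directly from g, p, and c0, so no carry has to wait for the previous bit to ripple.

    #include <stdio.h>

    int main(void) {
        int a[4] = {1, 0, 1, 1};   /* bits a0..a3 (lsb first): 1101 two = 13 */
        int b[4] = {1, 1, 1, 0};   /* bits b0..b3 (lsb first): 0111 two = 7  */
        int c0 = 0;

        int g[4], p[4], c[5];
        for (int i = 0; i < 4; i++) {
            g[i] = a[i] & b[i];    /* generate:  carry out regardless of carry in */
            p[i] = a[i] | b[i];    /* propagate: carry out if there is a carry in */
        }

        /* Each carry depends only on g, p, and c0 -- no ripple. */
        c[0] = c0;
        c[1] = g[0] | (p[0] & c0);
        c[2] = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0);
        c[3] = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0);
        c[4] = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
                    | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0);

        int sum = 0;
        for (int i = 0; i < 4; i++)
            sum |= (a[i] ^ b[i] ^ c[i]) << i;
        sum |= c[4] << 4;          /* carry out becomes bit 4 */

        printf("sum = %d\n", sum); /* 13 + 7 = 20 */
        return 0;
    }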

23 Can’t build a 16 bit adder this way... (too big) Could use ripple carry of 4-bit CLA adders Better: use the CLA principle again! Use principle to build bigger adders

24 More complicated than addition –accomplished via shifting and addition More time and more area Let's look at 3 versions based on the grade-school algorithm: 0010 (multiplicand) × 1011 (multiplier) = 0010110 (product) Negative numbers: convert and multiply –there are better techniques, we won't look at them Multiplication
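
The grade-school algorithm in C (a sketch that mirrors the shift-and-add loop of the first hardware version, not the actual implementation): test the low bit of the multiplier, conditionally add the multiplicand, then shift.

    #include <stdint.h>
    #include <stdio.h>

    /* Shift-and-add multiply of two unsigned 32-bit values into a 64-bit product. */
    uint64_t shift_add_multiply(uint32_t multiplicand, uint32_t multiplier) {
        uint64_t product = 0;
        uint64_t mcand = multiplicand;           /* will be shifted left each step */
        for (int i = 0; i < 32; i++) {
            if (multiplier & 1)                  /* low bit of multiplier set?     */
                product += mcand;                /* ...then add the multiplicand   */
            mcand <<= 1;                         /* shift multiplicand left        */
            multiplier >>= 1;                    /* shift multiplier right         */
        }
        return product;
    }

    int main(void) {
        /* The slide's example: 0010 two x 1011 two = 2 x 11 = 22 = 0010110 two */
        printf("%llu\n", (unsigned long long)shift_add_multiply(0x2, 0xB));
        return 0;
    }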

25 Multiplication: Implementation

26 Second Version

27 Final Version

28 Divide: Paper & Pencil
        1001  Quotient
Divisor 1000 | 1001010  Dividend
             –1000
                 10
                101
               1010
              –1000
                 10  Remainder (or Modulo result)
See how big a number can be subtracted, creating a quotient bit on each step Binary => 1 * divisor or 0 * divisor Dividend = Quotient × Divisor + Remainder => | Dividend | = | Quotient | + | Divisor | (sizes in bits) 3 versions of divide, successive refinement

29 DIVIDE HARDWARE Version 1 64-bit Divisor reg, 64-bit ALU, 64-bit Remainder reg, 32-bit Quotient reg [Figure: Divisor register (64 bits, shift right) feeding a 64-bit ALU; Remainder register (64 bits, write); Quotient register (32 bits, shift left); control logic]

30 Divide Algorithm Version 1 Takes n+1 steps for n-bit Quotient & Remainder Start: Place Dividend in Remainder 1. Subtract the Divisor register from the Remainder register, and place the result in the Remainder register. Test Remainder: –Remainder ≥ 0: 2a. Shift the Quotient register to the left, setting the new rightmost bit to 1. –Remainder < 0: 2b. Restore the original value by adding the Divisor register to the Remainder register, and place the sum in the Remainder register. Also shift the Quotient register to the left, setting the new least significant bit to 0. 3. Shift the Divisor register right 1 bit. n+1 repetitions? –No (< n+1 repetitions): repeat from step 1. –Yes (n+1 repetitions, n = 4 here): Done

31 Divide Algorithm I example (7 / 2) [Table: iteration-by-iteration contents of the Remainder, Quotient, and Divisor registers over the five steps] Answer: Quotient = 3 Remainder = 1

32 Observations on Divide Version 1 Half of the bits in the Divisor register are always 0 => half of the 64-bit adder is wasted => half of the Divisor register is wasted Instead of shifting the divisor to the right, shift the remainder to the left? 1st step cannot produce a 1 in the quotient bit (otherwise the quotient would be too big) => switch order to shift first and then subtract, can save 1 iteration

33 Divide Algorithm I example: wasted space [Table: the same 7 / 2 trace, highlighting the register halves that never hold useful information]

34 Divide: Paper & Pencil The same worked example (1001010 ÷ 1000 = 1001, remainder 10), written with an extra leading quotient digit: Notice that there is no way to get a 1 in the leading digit! (this would be an overflow, since the quotient would have n+1 bits)

35 DIVIDE HARDWARE Version 2 32-bit Divisor reg, 32-bit ALU, 64-bit Remainder reg, 32-bit Quotient reg [Figure: Divisor register (32 bits) feeding a 32-bit ALU; Remainder register (64 bits, shift left); Quotient register (32 bits, shift left); control logic]

36 Divide Algorithm Version 2 Start: Place Dividend in Remainder 1. Shift the Remainder register left 1 bit. 2. Subtract the Divisor register from the left half of the Remainder register, and place the result in the left half of the Remainder register. Test Remainder: –Remainder ≥ 0: 3a. Shift the Quotient register to the left, setting the new rightmost bit to 1. –Remainder < 0: 3b. Restore the original value by adding the Divisor register to the left half of the Remainder register, and place the sum in the left half of the Remainder register. Also shift the Quotient register to the left, setting the new least significant bit to 0. nth repetition? –No (< n repetitions): repeat from step 1. –Yes (n repetitions, n = 4 here): Done

37 Observations on Divide Version 2 Eliminate the Quotient register by combining it with the Remainder register as it is shifted left –Start by shifting the Remainder left as before. –Thereafter the loop contains only two steps because the shifting of the Remainder register shifts both the remainder in the left half and the quotient in the right half –The consequence of combining the two registers and the new order of the operations in the loop is that the remainder will be shifted left one time too many. –Thus the final correction step must shift back only the remainder in the left half of the register

38 DIVIDE HARDWARE Version 3 32-bit Divisor reg, 32-bit ALU, 64-bit Remainder reg, (0-bit Quotient reg) [Figure: Divisor register (32 bits) feeding a 32-bit ALU; Remainder register (64 bits, shift left) whose halves play the roles of “HI” and “LO”; control logic]

39 Divide Algorithm Version 3 Start: Place Dividend in Remainder 1. Shift the Remainder register left 1 bit. 2. Subtract the Divisor register from the left half of the Remainder register, and place the result in the left half of the Remainder register. Test Remainder: –Remainder ≥ 0: 3a. Shift the Remainder register to the left, setting the new rightmost bit to 1. –Remainder < 0: 3b. Restore the original value by adding the Divisor register to the left half of the Remainder register, and place the sum in the left half of the Remainder register. Also shift the Remainder register to the left, setting the new least significant bit to 0. nth repetition? –No (< n repetitions): repeat from step 2. –Yes (n repetitions, n = 4 here): Done. Shift left half of Remainder right 1 bit.

40 Observations on Divide Version 3 Same hardware as Multiply: just need an ALU to add or subtract, and a 64-bit register to shift left or shift right Hi and Lo registers in MIPS combine to act as a 64-bit register for multiply and divide Signed Divides: Simplest is to remember the signs, make both operands positive, and complement the quotient and remainder if necessary –Note: Dividend and Remainder must have the same sign –Note: Quotient is negated if Divisor sign and Dividend sign disagree e.g., –7 ÷ 2 = –3, remainder = –1 –What about –7 ÷ 2 = –4, remainder = +1? Possible for quotient to be too large: if you divide a 64-bit integer by 1, the quotient is 64 bits (“called saturation”)
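
The restoring algorithm of Version 3, sketched in C for unsigned 32-bit operands (an illustration, not the MIPS hardware): one 64-bit word holds the shifting remainder in its upper half and accumulates quotient bits in its lower half, and each of the 32 iterations shifts left, tries the subtraction, and either keeps or restores.

    #include <stdint.h>
    #include <stdio.h>

    /* Restoring division in the style of divide hardware Version 3.
       Illustrative sketch; assumes divisor != 0 and divisor < 2^31 so the
       shifted remainder always fits in the 64-bit word. */
    uint32_t restoring_divide(uint32_t dividend, uint32_t divisor, uint32_t *rem) {
        uint64_t r = dividend;                     /* dividend starts in the low half   */
        uint64_t dvs = (uint64_t)divisor << 32;    /* divisor aligned with the top half */
        for (int i = 0; i < 32; i++) {
            r <<= 1;                               /* shift remainder/quotient left     */
            if (r >= dvs)                          /* does the divisor go in?           */
                r = (r - dvs) | 1;                 /* yes: subtract, quotient bit = 1   */
            /* no: leave r unchanged ("restore"), quotient bit stays 0 */
        }
        *rem = (uint32_t)(r >> 32);                /* upper half: remainder */
        return (uint32_t)r;                        /* lower half: quotient  */
    }

    int main(void) {
        uint32_t rem;
        uint32_t q = restoring_divide(7, 2, &rem);
        printf("7 / 2 = %u remainder %u\n", (unsigned)q, (unsigned)rem); /* 3 remainder 1 */
        return 0;
    }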

41 Floating Point (a brief look) We need a way to represent –numbers with fractions, e.g., 3.1416 –very small numbers, e.g., 0.000000001 –very large numbers, e.g., 3.15576 × 10^9 Representation: –sign, exponent, significand: (–1)^sign × significand × 2^exponent –more bits for significand gives more accuracy –more bits for exponent increases range IEEE 754 floating point standard: –single precision: 8 bit exponent, 23 bit significand –double precision: 11 bit exponent, 52 bit significand

42 Recall Scientific Notation 6.02 × 10^23: sign and magnitude of the mantissa, decimal point, radix (base) 10, exponent 23 IEEE F.P.: ± 1.M × 2^e Issues: –Arithmetic (+, -, *, /) –Representation, Normal form –Range and Precision –Rounding –Exceptions (e.g., divide by zero, overflow, underflow) –Errors –Properties (negation, inversion, if A ≠ B then A – B ≠ 0)

43 Floating-Point Arithmetic Representation of floating point numbers in IEEE 754 standard: single precision: 1-bit sign, 8-bit exponent (excess 127, binary integer), 23-bit mantissa (sign + magnitude, normalized binary significand with hidden integer bit): 1.M actual exponent is e = E – 127 N = (–1)^S × 2^(E–127) × (1.M) with 0 < E < 255 Magnitude of numbers that can be represented is in the range 2^(–126) × (1.0) to 2^127 × (2 – 2^(–23)), which is approximately 1.18 × 10^(–38) to 3.40 × 10^38 (integer comparison valid on IEEE Fl.Pt. numbers of same sign!)

44 IEEE 754 floating-point standard Leading “1” bit of significand is implicit Exponent is “biased” to make sorting easier –all 0s is smallest exponent, all 1s is largest –bias of 127 for single precision and 1023 for double precision –summary: (–1)^sign × (1 + significand) × 2^(exponent – bias) Example: –decimal: –0.75 = –3/4 = –3/2^2 –binary: –0.11 = –1.1 × 2^(–1) –floating point: exponent field = –1 + 127 = 126 = 0111 1110 –IEEE single precision: 1 01111110 10000000000000000000000
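
You can check the worked example by letting a C compiler encode the value; this sketch (not from the slides) prints the fields of -0.75f: sign 1, exponent field 126 = 0111 1110, fraction 100...0.

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int main(void) {
        float f = -0.75f;
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);           /* reinterpret the float's bytes */

        uint32_t sign = bits >> 31;               /* 1 bit                          */
        uint32_t expo = (bits >> 23) & 0xFF;      /* 8 bits, biased by 127          */
        uint32_t frac = bits & 0x7FFFFF;          /* 23 bits, hidden leading 1      */

        printf("bits = 0x%08X  sign=%u  exponent=%u (unbiased %d)  fraction=0x%06X\n",
               (unsigned)bits, (unsigned)sign, (unsigned)expo,
               (int)expo - 127, (unsigned)frac);
        /* prints: bits = 0xBF400000  sign=1  exponent=126 (unbiased -1)  fraction=0x400000 */
        return 0;
    }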

45 Floating Point Complexities Operations are somewhat more complicated (see text) In addition to overflow we can have “underflow” Accuracy can be a big problem –IEEE 754 keeps two extra bits, guard and round –four rounding modes –positive divided by zero yields “infinity” –zero divided by zero yields “not a number” –other complexities Implementing the standard can be tricky Not using the standard can be even worse –see text for description of 80x86 and Pentium bug!

46 Floating-Point Addition Use 9.999 × 10^1 + 1.610 × 10^(–1); assume four decimal digits of significand and two decimal digits of exponent Step 1. Align the number with the smaller exponent –1.610 × 10^(–1) becomes 0.016 × 10^1 Step 2. Significand addition –9.999 + 0.016 = 10.015, i.e., 10.015 × 10^1 Step 3. Normalized scientific notation (overflow or underflow check) –10.015 × 10^1 = 1.0015 × 10^2 Step 4. Round the number –1.002 × 10^2

47 Flow Diagram of Floating-Point Addition Start 1. Compare the exponents of the two numbers. Shift the smaller number to the right until its exponent matches the larger exponent 2. Add the significands 3. Normalize the sum, either shifting right and incrementing the exponent or shifting left and decrementing the exponent Overflow or underflow? –Yes: Exception –No: 4. Round the significand to the appropriate number of bits Still normalized? –No: back to step 3 –Yes: Done
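
A sketch of the four steps in C using decimal significands (a hypothetical helper fp_add, four significant digits assumed; the demo operands 9.999 × 10^1 and 1.610 × 10^(–1) are the textbook-style example, and the loop back to step 3 after rounding is omitted for brevity).

    #include <stdio.h>
    #include <math.h>

    /* Add (sa x 10^ea) + (sb x 10^eb), keeping four significant decimal digits,
       following the four steps on the flow chart. Illustrative only. */
    void fp_add(double sa, int ea, double sb, int eb) {
        /* Step 1: align the number with the smaller exponent. */
        if (ea < eb) { double ts = sa; int te = ea; sa = sb; ea = eb; sb = ts; eb = te; }
        sb = sb / pow(10.0, ea - eb);             /* shift smaller significand right */

        /* Step 2: add the significands. */
        double s = sa + sb;
        int e = ea;

        /* Step 3: normalize so the significand is in [1, 10). */
        while (fabs(s) >= 10.0)            { s /= 10.0; e++; }
        while (s != 0.0 && fabs(s) < 1.0)  { s *= 10.0; e--; }

        /* Step 4: round to four significant decimal digits. */
        s = round(s * 1000.0) / 1000.0;

        printf("%.3f x 10^%d\n", s, e);
    }

    int main(void) {
        fp_add(9.999, 1, 1.610, -1);   /* prints 1.002 x 10^2 */
        return 0;
    }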

48 Block Diagram of Floating-Point Addition [Figure: two operands (sign, exponent, fraction); a small ALU computes the exponent difference, which controls the shift-right of the smaller fraction; a big ALU adds the significands; shift left/right with increment/decrement normalizes; rounding hardware produces the result (sign, exponent, fraction), all under control logic]

49 Floating-Point Multiplication (I) Use 1.110 × 10^10 × 9.200 × 10^(–5); assume four decimal digits of significand and two decimal digits of exponent Step 1. Add the exponents together, and subtract the bias from the sum: –new exponent = 10 + (–5) = 5 (with biased exponents: 137 + 122 – 127 = 132) Step 2. Multiplication on significands: –1.110 × 9.200 = 10.212, i.e., 10.212 × 10^5

50 Floating-Point Multiplication (II) Step 3. Normalized scientific notation (overflow or underflow check) –10.212 × 10^5 = 1.0212 × 10^6 Step 4. Round the number –1.021 × 10^6 Step 5. Sign determination –both operands are positive, so the result is +1.021 × 10^6

51 Flow Diagram of Floating-Point Multiplication Start 1. Add the biased exponents of the two numbers, subtracting the bias from the sum to get the new biased exponent 2. Multiply the significands 3. Normalize the product if necessary, shifting it right and incrementing the exponent Overflow or underflow? –Yes: Exception –No: 4. Round the significand to the appropriate number of bits Still normalized? –No: back to step 3 –Yes: 5. Set the sign of the product to positive if the signs of the original operands are the same; if they differ make the sign negative Done
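
The matching sketch for multiplication (same assumptions: decimal significands, four digits, hypothetical helper fp_mul): add exponents, multiply significands, normalize, round, set the sign.

    #include <stdio.h>
    #include <math.h>

    /* Multiply (sa x 10^ea) by (sb x 10^eb), four significant decimal digits,
       following the five steps on the flow chart. Illustrative only. */
    void fp_mul(double sa, int ea, double sb, int eb) {
        int sign = ((sa < 0) != (sb < 0)) ? -1 : 1;   /* Step 5: sign of the product */
        sa = fabs(sa);
        sb = fabs(sb);

        int e = ea + eb;               /* Step 1: add the exponents (with biased
                                          exponents you would also subtract the bias) */
        double s = sa * sb;            /* Step 2: multiply the significands            */

        while (s >= 10.0) { s /= 10.0; e++; }   /* Step 3: normalize                   */

        s = round(s * 1000.0) / 1000.0;         /* Step 4: round to four digits        */

        printf("%c%.3f x 10^%d\n", sign < 0 ? '-' : '+', s, e);
    }

    int main(void) {
        fp_mul(1.110, 10, 9.200, -5);   /* prints +1.021 x 10^6 */
        return 0;
    }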

52 Floating-Point Instructions in MIPS
MIPS floating-point operands:
–32 floating-point registers: $f0, $f1, $f2, …, $f31. MIPS floating-point registers are used in pairs for double precision numbers.
–Memory words: Memory[0], Memory[4], …, Memory[…]. Accessed only by data transfer instructions. MIPS uses byte addresses, so sequential word addresses differ by 4. Memory holds data structures, such as arrays, and spilled registers, such as those saved on procedure calls.
MIPS floating-point assembly language (category | instruction | example | meaning | comments):
Arithmetic
–FP add single | add.s $f2,$f4,$f6 | $f2 = $f4 + $f6 | FP add (single precision)
–FP subtract single | sub.s $f2,$f4,$f6 | $f2 = $f4 - $f6 | FP subtract (single precision)
–FP multiply single | mul.s $f2,$f4,$f6 | $f2 = $f4 × $f6 | FP multiply (single precision)
–FP divide single | div.s $f2,$f4,$f6 | $f2 = $f4 / $f6 | FP divide (single precision)
–FP add double | add.d $f2,$f4,$f6 | $f2 = $f4 + $f6 | FP add (double precision)
–FP subtract double | sub.d $f2,$f4,$f6 | $f2 = $f4 - $f6 | FP subtract (double precision)
–FP multiply double | mul.d $f2,$f4,$f6 | $f2 = $f4 × $f6 | FP multiply (double precision)
–FP divide double | div.d $f2,$f4,$f6 | $f2 = $f4 / $f6 | FP divide (double precision)
Data transfer
–load word copr. 1 | lwc1 $f1,100($s2) | $f1 = Memory[$s2+100] | 32-bit data to FP register
–store word copr. 1 | swc1 $f1,100($s2) | Memory[$s2+100] = $f1 | 32-bit data to memory
Conditional branch
–branch on FP true | bc1t 25 | if (cond == 1) go to PC+4+100 | PC-relative branch if FP cond.
–branch on FP false | bc1f 25 | if (cond == 0) go to PC+4+100 | PC-relative branch if not cond.
–FP compare single (eq,ne,lt,le,gt,ge) | c.lt.s $f2,$f4 | if ($f2 < $f4) cond = 1; else cond = 0 | FP compare less than, single precision
–FP compare double (eq,ne,lt,le,gt,ge) | c.lt.d $f2,$f4 | if ($f2 < $f4) cond = 1; else cond = 0 | FP compare less than, double precision

53 MIPS Floating-Point Architecture [Table: binary encoding of each floating-point instruction]
–R-format: add.s, sub.s, mul.s, div.s, add.d, sub.d, mul.d, div.d, c.lt.s, c.lt.d (e.g., add.s $f2,$f4,$f6; c.lt.s $f2,$f4)
–I-format: lwc1, swc1, bc1t, bc1f (e.g., lwc1 $f1,100($s2); bc1t 25)
Field sizes: 6 bits, 5 bits, 5 bits, 5 bits, 5 bits, 6 bits; all MIPS instructions are 32 bits

54 Floating-Point C Program Compiling Convert a temperature in Fahrenheit to Celsius C code: float f2c (float fahr) { return ((5.0/9.0)*(fahr-32.0)); } MIPS ‘code’:
f2c: lwc1  $f16, const5($gp)
     lwc1  $f18, const9($gp)
     div.s $f16, $f16, $f18
     lwc1  $f18, const32($gp)
     sub.s $f18, $f12, $f18
     mul.s $f0, $f16, $f18
     jr    $ra

55 Floating-Point C Procedure with Two-Dimensional Matrices (I) Perform matrix multiplication of X = X + Y * Z C code:
void mm (double x[][32], double y[][32], double z[][32])
{
  int i, j, k;
  for (i = 0; i != 32; i = i + 1)
    for (j = 0; j != 32; j = j + 1)
      for (k = 0; k != 32; k = k + 1)
        x[i][j] = x[i][j] + y[i][k] * z[k][j];
}

56 Floating-Point C Procedure with Two-Dimensional Matrices (II) MIPS ‘code’:
mm: …
    li    $t1, 32          ; loop termination
    li    $s0, 0           ; i = 0
L1: li    $s1, 0           ; j = 0
L2: li    $s2, 0           ; k = 0
    sll   $t2, $s0, 5      ; $t2 = i * 32
    addu  $t2, $t2, $s1    ; $t2 = i * 32 + j
    sll   $t2, $t2, 3      ; byte offset of [i][j]
    addu  $t2, $a0, $t2    ; byte address of x[i][j]
    l.d   $f4, 0($t2)      ; $f4 = x[i][j]
L3: sll   $t0, $s2, 5      ; $t0 = k * 32
    addu  $t0, $t0, $s1    ; $t0 = k * 32 + j
    sll   $t0, $t0, 3      ; byte offset of [k][j]
    addu  $t0, $a2, $t0    ; byte address of z[k][j]
    l.d   $f16, 0($t0)     ; $f16 = z[k][j]
    sll   $t0, $s0, 5      ; $t0 = i * 32
    addu  $t0, $t0, $s2    ; $t0 = i * 32 + k
    sll   $t0, $t0, 3      ; byte offset of [i][k]
    addu  $t0, $a1, $t0    ; byte address of y[i][k]
    l.d   $f18, 0($t0)     ; $f18 = y[i][k]
    mul.d $f16, $f18, $f16 ; $f16 = y[i][k] * z[k][j]
    add.d $f4, $f4, $f16   ; $f4 = x[i][j] + y[i][k] * z[k][j]
    addiu $s2, $s2, 1      ; k = k + 1
    bne   $s2, $t1, L3
    s.d   $f4, 0($t2)      ; x[i][j] = $f4
    addiu $s1, $s1, 1      ; j = j + 1
    bne   $s1, $t1, L2
    addiu $s0, $s0, 1      ; i = i + 1
    bne   $s0, $t1, L1
    …

57 Rounding with Guard Digits Add 2.56 ten × 10^0 to 2.34 ten × 10^2, assuming three significant decimal digits and rounding to the nearest decimal number With guard and round digits –Shift the smaller number to the right to align the exponents, so 2.56 ten × 10^0 becomes 0.0256 ten × 10^2 –Guard holds the digit 5, round holds the digit 6 –2.34 + 0.0256 = 2.3656, which yields 2.37 ten × 10^2 Without guard and round digits –2.34 + 0.02 = 2.36, which yields 2.36 ten × 10^2

58 Chapter Four Summary Computer arithmetic is constrained by limited precision Bit patterns have no inherent meaning but standards do exist –two's complement –IEEE 754 floating point Computer instructions determine the “meaning” of the bit patterns Performance and accuracy are important so there are many complexities in real machines (i.e., algorithms and implementation) We are ready to move on (and implement the processor); you may want to look back (Section 4.12 is great reading!)