CSCI-365 Computer Organization Lecture Note: Some slides and/or pictures in the following are adapted from: Computer Organization and Design, Patterson.

Slides:

Advertisements

Similar presentations

CSE431 Chapter 3.1Irwin, PSU, 2008 CSE 431 Computer Architecture Fall 2008 Chapter 3: Arithmetic for Computers Mary Jane Irwin ( )

Advertisements

CSCI-365 Computer Organization Lecture Note: Some slides and/or pictures in the following are adapted from: Computer Organization and Design, Patterson.

Arithmetic in Computers Chapter 4 Arithmetic in Computers2 Outline Data representation integers Unsigned integers Signed integers Floating-points.

Chapter 3 Arithmetic for Computers. Exam 1 CSCE

CSCE 212 Chapter 3: Arithmetic for Computers Instructor: Jason D. Bakos.

Lecture 16: Computer Arithmetic Today’s topic –Floating point numbers –IEEE 754 representations –FP arithmetic Reminder –HW 4 due Monday 1.

Integer Multiplication (1/3)

1 Lecture 9: Floating Point Today’s topics:  Division  IEEE 754 representations  FP arithmetic Reminder: assignment 4 will be posted later today.

Chapter 3 Arithmetic for Computers. Multiplication More complicated than addition accomplished via shifting and addition More time and more area Let's.

1  2004 Morgan Kaufmann Publishers Chapter Three.

CSE431 L03 MIPS Arithmetic Review.1Irwin, PSU, 2005 CSE 431 Computer Architecture Fall 2005 Lecture 03: MIPS Arithmetic Review Mary Jane Irwin (

Computer Systems Organization: Lecture 3

ECE 15B Computer Organization Spring 2010 Dmitri Strukov Lecture 11: Floating Point Partially adapted from Computer Organization and Design, 4 th edition,

ECE 15B Computer Organization Spring 2010 Dmitri Strukov Lecture 6: Logic/Shift Instructions Partially adapted from Computer Organization and Design, 4.

COMPUTER ARCHITECTURE & OPERATIONS I Instructor: Hao Ji.

CPSC 321 Computer Architecture ALU Design – Integer Addition, Multiplication & Division Copyright 2002 David H. Albonesi and the University of Rochester.

Systems Architecture Lecture 14: Floating Point Arithmetic

IT 251 Computer Organization and Architecture Introduction to Floating Point Numbers Chia-Chi Teng.

Computer Organization and Architecture Computer Arithmetic Chapter 9.

Computer Arithmetic Nizamettin AYDIN

Computer Arithmetic. Instruction Formats Layout of bits in an instruction Includes opcode Includes (implicit or explicit) operand(s) Usually more than.

CEN 316 Computer Organization and Design Computer Arithmetic Floating Point Dr. Mansour AL Zuair.

Computing Systems Basic arithmetic for computers.

ECE232: Hardware Organization and Design

CPS3340 COMPUTER ARCHITECTURE Fall Semester, /14/2013 Lecture 16: Floating Point Instructor: Ashraf Yaseen DEPARTMENT OF MATH & COMPUTER SCIENCE.

Computer Organization CS224 Fall 2012 Lesson 21. FP Example: °F to °C  C code: float f2c (float fahr) { return ((5.0/9.0)*(fahr )); } fahr in $f12,

CSF 2009 Arithmetic for Computers Chapter 3. Arithmetic for Computers Operations on integers Addition and subtraction Multiplication and division Dealing.

Lecture 9: Floating Point

Computer Arithmetic II Instructor: Mozafar Bag-Mohammadi Ilam University.

Floating Point Representation for non-integral numbers – Including very small and very large numbers Like scientific notation – –2.34 × –

Computer Architecture Chapter 3 Instructions: Arithmetic for Computer Yu-Lun Kuo 郭育倫 Department of Computer Science and Information Engineering Tunghai.

CS.305 Computer Architecture Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from slides kindly made available by Dr Mary.

CSCI-365 Computer Organization Lecture Note: Some slides and/or pictures in the following are adapted from: Computer Organization and Design, Patterson.

FLOATING POINT ARITHMETIC P&H Slides ECE 411 – Spring 2005.

CS 61C L3.2.1 Floating Point 1 (1) K. Meinz, Summer 2004 © UCB CS61C : Machine Structures Lecture Floating Point Kurt Meinz inst.eecs.berkeley.edu/~cs61c.

CHAPTER 3 Floating Point.

CDA 3101 Fall 2013 Introduction to Computer Organization

Restoring Unsigned Integer Division

Chapter 3 Arithmetic for Computers. Chapter 3 — Arithmetic for Computers — 2 Arithmetic for Computers Operations on integers Addition and subtraction.

Floating Point Numbers Representation, Operations, and Accuracy CS223 Digital Design.

Chapter 3 Arithmetic for Computers CprE 381 Computer Organization and Assembly Level Programming, Fall 2012 Revised from original slides provided by MKP.

EI 209 Chapter 3.1CSE, 2015 EI 209 Computer Organization Fall 2015 Chapter 3: Arithmetic for Computers Haojin Zhu ( )

C OMPUTER O RGANIZATION AND D ESIGN The Hardware/Software Interface 5 th Edition Chapter 3 Arithmetic for Computers.

CS 224 Computer Organization Spring 2011 Arithmetic for Computers With thanks to M.J. Irwin, D. Patterson, and J. Hennessy for some lecture slide contents.

CSE 340 Computer Architecture Spring 2016 MIPS Arithmetic Review.

William Stallings Computer Organization and Architecture 8th Edition

/ Computer Architecture and Design

Morgan Kaufmann Publishers Arithmetic for Computers

Computer Architecture & Operations I

Morgan Kaufmann Publishers Arithmetic for Computers

Arithmetic for Computers

Morgan Kaufmann Publishers Arithmetic for Computers

/ Computer Architecture and Design

Floating Point Arithmetics

Arithmetic for Computers

CSCE 350 Computer Architecture

CS 61C: Great Ideas in Computer Architecture Floating Point Arithmetic

Computer Arithmetic Multiplication, Floating Point

ECEG-3202 Computer Architecture and Organization

Chapter 3 Arithmetic for Computers

Morgan Kaufmann Publishers Arithmetic for Computers

Chapter 3 Arithmetic for Computers

Number Representation

Presentation transcript:

CSCI-365 Computer Organization Lecture Note: Some slides and/or pictures in the following are adapted from: Computer Organization and Design, Patterson & Hennessy, ©2005 Some slides and/or pictures in the following are adapted from: slides ©2008 UCB 9-11

CSE431 Chapter 3.2Irwin, PSU, 2008  32-bit signed numbers (2’s complement): two = 0 ten two = + 1 ten two = + 2,147,483,646 ten two = + 2,147,483,647 ten two = – 2,147,483,648 ten two = – 2,147,483,647 ten two = – 2 ten two = – 1 ten Number Representations maxint minint  Converting <32-bit values into 32-bit values l copy the most significant bit (the sign bit) into the “empty” bits > > l sign extend versus zero extend ( lb vs. lbu ) MSB LSB

CSE431 Chapter 3.3Irwin, PSU, 2008 MIPS Arithmetic Logic Unit (ALU)  Must support the Arithmetic/Logic operations of the ISA add, addi, addiu, addu sub, subu mult, multu, div, divu sqrt and, andi, nor, or, ori, xor, xori beq, bne, slt, slti, sltiu, sltu 32 m (operation) result A B ALU 4 zeroovf 1 1  With special handling for l sign extend – addi, addiu, slti, sltiu l zero extend – andi, ori, xori l overflow detection – add, addi, sub

CSE431 Chapter 3.4Irwin, PSU, 2008 Dealing with Overflow OperationOperand AOperand BResult indicating overflow A + B≥ 0 < 0 A + B< 0 ≥ 0 A - B≥ 0< 0 A - B< 0≥ 0  Overflow occurs when the result of an operation cannot be represented in 32-bits, i.e., when the sign bit contains a value bit of the result and not the proper sign bit  When adding operands with different signs or when subtracting operands with the same sign, overflow can never occur  MIPS signals overflow with an exception (aka interrupt) – an unscheduled procedure call where the EPC contains the address of the instruction that caused the exception

CSE431 Chapter 3.5Irwin, PSU, 2008 Two ’ s Complement Arithmetic  Addition is accomplished by adding the codes, ignoring any final carry  Subtraction: change the sign and add  16 + (-23) =?  16 - (-23) =?  (-16) =?

CSE431 Chapter 3.6Irwin, PSU, 2008

CSE431 Chapter 3.7Irwin, PSU, 2008

CSE431 Chapter 3.8Irwin, PSU, 2008 Hardware for Addition and Subtraction

CSE431 Chapter 3.9Irwin, PSU, 2008 Multiply  Binary multiplication is just a bunch of right shifts and adds multiplicand multiplier partial product array double precision product n 2n n can be formed in parallel and added in parallel for faster multiplication

CSE431 Chapter 3.10Irwin, PSU, 2008 Multiplication Example 1011 Multiplicand (11 dec) x 1101 Multiplier (13 dec) 1011 Partial products 0000 Note: if multiplier bit is 1 copy 1011 multiplicand (place value) 1011 otherwise zero Product (143 dec) Note: need double length result

CSE431 Chapter 3.11Irwin, PSU, 2008 Add and Right Shift Multiplier Hardware multiplicand 32-bit ALU multiplier Control add shift right product

CSE431 Chapter 3.12Irwin, PSU, 2008 Add and Right Shift Multiplier Hardware multiplicand 32-bit ALU multiplier Control add shift right product = = 5 add add = 30

CSE431 Chapter 3.13Irwin, PSU, 2008 Unsigned Binary Multiplication

CSE431 Chapter 3.14Irwin, PSU, 2008 Execution of Example

CSE431 Chapter 3.15Irwin, PSU, 2008 Multiplying Negative Numbers  This does not work!  Solution 1 l Convert to positive if required l Multiply as above l If signs were different, negate answer  Solution 2 Booth ’ s algorithm

CSE431 Chapter 3.16Irwin, PSU, 2008 Booth ’ s Algorithm

CSE431 Chapter 3.17Irwin, PSU, 2008 Example of Booth ’ s Algorithm

CSE431 Chapter 3.18Irwin, PSU, 2008  Multiply ( mult and multu ) produces a double precision product mult $s0, $s1 # hi||lo = $s0 * $s1 Low-order word of the product is left in processor register lo and the high-order word is left in register hi Instructions mfhi rd and mflo rd are provided to move the product to (user accessible) registers in the register file MIPS Multiply Instruction x18  Multiplies are usually done by fast, dedicated hardware and are much more complex (and slower) than adders

CSE431 Chapter 3.19Irwin, PSU, 2008 MIPS Multiplication  Two 32-bit registers for product l HI: most-significant 32 bits l LO: least-significant 32-bits  Instructions l mult rs, rt / multu rs, rt -64-bit product in HI/LO l mfhi rd / mflo rd -Move from HI/LO to rd -Can test HI value to see if product overflows 32 bits l mul rd, rs, rt -Least-significant 32 bits of product –> rd

CSE431 Chapter 3.20Irwin, PSU, 2008 Division  Division is just a bunch of quotient digit guesses and left shifts and subtracts dividend = quotient x divisor + remainder dividend divisor partial remainder array quotient n n remainder n

CSE431 Chapter 3.21Irwin, PSU, Division of Unsigned Binary Integers Quotient Dividend Remainder Partial Remainders Divisor

CSE431 Chapter 3.22Irwin, PSU, 2008 Left Shift and Subtract Division Hardware divisor 32-bit ALU quotient Control subtract shift left dividend remainder

CSE431 Chapter 3.23Irwin, PSU, 2008 Left Shift and Subtract Division Hardware divisor 32-bit ALU quotient Control subtract shift left dividend remainder = = sub rem neg, so ‘ient bit = restore remainder sub rem neg, so ‘ient bit = restore remainder sub rem pos, so ‘ient bit = sub rem pos, so ‘ient bit = 1 = 3 with 0 remainder

CSE431 Chapter 3.24Irwin, PSU, 2008 Division of Signed Binary Integers

CSE431 Chapter 3.25Irwin, PSU, 2008 Division of Signed Binary Integers

CSE431 Chapter 3.26Irwin, PSU, 2008 Division of Signed Binary Integers

CSE431 Chapter 3.27Irwin, PSU, 2008  Divide ( div and divu ) generates the reminder in hi and the quotient in lo div $s0, $s1 # lo = $s0 / $s1 # hi = $s0 mod $s1 Instructions mfhi rd and mflo rd are provided to move the quotient and reminder to (user accessible) registers in the register file MIPS Divide Instruction  As with multiply, divide ignores overflow so software must determine if the quotient is too large. Software must also check the divisor to avoid division by x1A

CSE431 Chapter 3.28Irwin, PSU, 2008 MIPS Division  Use HI/LO registers for result l HI: 32-bit remainder l LO: 32-bit quotient  Instructions l div rs, rt / divu rs, rt l No overflow or divide-by-0 checking -Software must perform checks if required Use mfhi, mflo to access result

CSE431 Chapter 3.29Irwin, PSU, 2008 Representation of Fractions “Binary Point” like decimal point signifies boundary between integer and fractional parts: xx. yyyy Example 6-bit representation: = 1x x x2 -3 = If we assume “fixed binary point”, range of 6-bit representations with this format: 0 to (almost 4)

CSE431 Chapter 3.30Irwin, PSU, 2008 Fractional Powers of / / / / / i 2 -i

CSE431 Chapter 3.31Irwin, PSU, and Example: (done in class)

CSE431 Chapter 3.32Irwin, PSU, 2008 Representation of Fractions So far, in our examples we used a “fixed” binary point. What we really want is to “float” the binary point. Why? Floating binary point most effective use of our limited bits (and thus more accuracy in our number representation): … … Any other solution would lose accuracy! example: put into binary. Represent as in 5-bits choosing where to put the binary point. Store these bits and keep track of the binary point 2 places to the left of the MSB With floating point rep., each numeral carries a exponent field recording the whereabouts of its binary point. The binary point can be outside the stored bits, so very large and small numbers can be represented.

CSE431 Chapter 3.33Irwin, PSU, x radix (base) decimal pointsignificand exponent  Normalized form: no leadings 0s (exactly one digit to left of decimal point)  Alternatives to representing 1/1,000,000,000 l Normalized: 1.0 x l Not normalized: 0.1 x 10 -8,10.0 x Scientific Notation (in Decimal)

CSE431 Chapter 3.34Irwin, PSU, two x 2 -1 radix (base) “binary point” exponent  Computer arithmetic that supports it called floating point, because it represents numbers where the binary point is not fixed, as it is for integers Declare such variable in C as float significand Scientific Notation (in Binary)

CSE431 Chapter 3.35Irwin, PSU, 2008  Normal format: +1.xxxxxxxxxx two *2 yyyy two  32-bit version (C “ float ”) 0 31 SExponent Significand 1 bit8 bits23 bits S represents Sign Exponent represents y’s Significand represents x’s Floating Point Representation

CSE431 Chapter 3.36Irwin, PSU, 2008  What if result too large? Overflow!  Exponent larger than represented in 8-bit Exponent field  What if result too small? Underflow!  Negative exponent larger than represented in 8-bit Exponent field  What would help reduce chances of overflow and/or underflow? 0 1 underflow overflow Floating Point Representation

CSE431 Chapter 3.37Irwin, PSU, 2008  64 bit version (C “ double ”) Double Precision (vs. Single Precision) –C variable declared as double –But primary advantage is greater accuracy due to larger significand 0 31 SExponent Significand 1 bit11 bits20 bits Significand (cont’d) 32 bits Double Precision Fl. Pt. Representation

CSE431 Chapter 3.38Irwin, PSU, 2008  Next Multiple of Word Size (128 bits) l Unbelievable range of numbers l Unbelievable precision (accuracy)  Currently being worked on (IEEE 754r) l Current version has 15 exponent bits and 112 significand bits (113 precision bits)  Oct-Precision? l Some have tried, no real traction so far  Half-Precision? l Yep, that’s for a short (16 bit) en.wikipedia.org/wiki/Quad_precision en.wikipedia.org/wiki/Half_precision QUAD Precision Fl. Pt. Representation

CSE431 Chapter 3.39Irwin, PSU, 2008 Single Precision (DP similar):  Sign bit:1 means negative, 0 means positive  Significand: l To pack more bits, leading 1 implicit for normalized numbers l bits single, bits double l always true: 0 < Significand < 1 (for normalized numbers)  Note: 0 has no leading 1, so reserve exponent value 0 just for number SExponent Significand 1 bit8 bits23 bits IEEE 754 Floating Point Standard

CSE431 Chapter 3.40Irwin, PSU, 2008  IEEE 754 uses “biased exponent” representation l Designers wanted FP numbers to be used even if no FP hardware; e.g., sort records with FP numbers using integer compares l Wanted bigger (integer) exponent field to represent bigger numbers l 2’s complement poses a problem (because negative numbers look bigger)  1.0x 2 -1 and 1.0x2 1 (done in class) IEEE 754 Floating Point Standard

CSE431 Chapter 3.41Irwin, PSU, 2008 Called Biased Notation, where bias is number subtracted to get real number –IEEE 754 uses bias of 127 for single precision –Subtract 127 from Exponent field to get actual value for exponent –1023 is bias for double precision  Summary (single precision): 0 31 SExponent Significand 1 bit8 bits23 bits (-1) S x (1 + Significand) x 2 (Exponent-127) –Double precision identical, except with exponent bias of 1023 (half, quad similar) IEEE 754 Floating Point Standard

CSE431 Chapter 3.42Irwin, PSU, 2008 Single-Precision Range  Exponents and reserved  Smallest value l Exponent:  actual exponent = 1 – 127 = –126 l Fraction: 000…00  significand = 1.0 l ±1.0 × 2 –126 ≈ ±1.2 × 10 –38  Largest value l exponent:  actual exponent = 254 – 127 = +127 l Fraction: 111…11  significand ≈ 2.0 l ±2.0 × ≈ ±3.4 ×

CSE431 Chapter 3.43Irwin, PSU, 2008 Double-Precision Range  Exponents 0000…00 and 1111…11 reserved  Smallest value l Exponent:  actual exponent = 1 – 1023 = –1022 l Fraction: 000…00  significand = 1.0 l ±1.0 × 2 –1022 ≈ ±2.2 × 10 –308  Largest value l Exponent:  actual exponent = 2046 – 1023 = l Fraction: 111…11  significand ≈ 2.0 l ±2.0 × ≈ ±1.8 ×

CSE431 Chapter 3.44Irwin, PSU, 2008 Floating-Point Precision  Relative precision l all fraction bits are significant l Single: approx 2 –23 -Equivalent to 23 × log 10 2 ≈ 23 × 0.3 ≈ 6 decimal digits of precision l Double: approx 2 –52 -Equivalent to 52 × log 10 2 ≈ 52 × 0.3 ≈ 16 decimal digits of precision

CSE431 Chapter 3.45Irwin, PSU, 2008 Floating-Point Example  Represent –0.75 l –0.75 = (–1) 1 × × 2 –1 l S = 1 l Fraction = 1000…00 2 l Exponent = –1 + Bias -Single: – = 126 = Double: – = 1022 =  Single: …00  Double: …00

CSE431 Chapter 3.46Irwin, PSU, 2008 Floating-Point Example  What number is represented by the single-precision float …00 l S = 1 l Fraction = 01000…00 2 l Fxponent = = 129  x = (–1) 1 × ( ) × 2 (129 – 127) = (–1) × 1.25 × 2 2 = –5.0

CSE431 Chapter 3.47Irwin, PSU, 2008 (done in class) Example: Converting Binary FP to Decimal

CSE431 Chapter 3.48Irwin, PSU, x 10 1 Example: Converting Decimal to FP (done in class)

CSE431 Chapter 3.49Irwin, PSU, 2008 Representation for 0  Represent 0? l exponent all zeroes l significand all zeroes l What about sign? Both cases valid +0: :

CSE431 Chapter 3.50Irwin, PSU, 2008 Special Numbers  What have we defined so far? (Single Precision) ExponentSignificandObject nonzero ??? anything+/- fl. pt. # /- ∞ 255 nonzero???

CSE431 Chapter 3.51Irwin, PSU, 2008 Representation for Not a Number  What do I get if I calculate sqrt(-4.0) or 0/0 ? l If ∞ not an error, these shouldn’t be either l Called Not a Number (NaN) l Exponent = 255, Significand nonzero  Why is this useful? l Hope NaNs help with debugging? l They contaminate: op(NaN, X) = NaN

CSE431 Chapter 3.52Irwin, PSU, 2008 Infinities and NaNs  Exponent = , Fraction = l ±Infinity l Can be used in subsequent calculations, avoiding need for overflow check  Exponent = , Fraction ≠ l Not-a-Number (NaN) l Indicates illegal or undefined result -e.g., 0.0 / 0.0 l Can be used in subsequent calculations

CSE431 Chapter 3.53Irwin, PSU, 2008 Representation for Denorms  Problem: There’s a gap among representable FP numbers around 0  Normalization and implicit 1 is to blame! (done in class) b a Gaps!

CSE431 Chapter 3.54Irwin, PSU, 2008 Representation for Denorms  Solution: l We still haven’t used Exponent = 0, Significand nonzero l Denormalized number: no (implied) leading 1, implicit exponent = -127 l Smallest representable pos num: a = l Second smallest representable pos num: b =

CSE431 Chapter 3.55Irwin, PSU, 2008 Special Numbers  What have we defined so far? (Single Precision) ExponentSignificandObject 000 0nonzeroDenorm 1-254anything+/- fl. pt. # 2550+/- ∞ 255nonzeroNaN

CSE431 Chapter 3.56Irwin, PSU, 2008 Floating-Point Addition  Consider a 4-digit decimal example l × × 10 –1  1. Align decimal points l Shift number with smaller exponent l × × 10 1  2. Add significands l × × 10 1 = × 10 1  3. Normalize result & check for over/underflow l × 10 2  4. Round and renormalize if necessary l × 10 2

CSE431 Chapter 3.57Irwin, PSU, 2008 Floating Point Addition  Addition (and subtraction) (  F1  2 E1 ) + (  F2  2 E2 ) =  F3  2 E3 l Step 0: Restore the hidden bit in F1 and in F2 l Step 1: Align fractions by right shifting F2 by E1 - E2 positions (assuming E1  E2) keeping track of (three of) the bits shifted out in G R and S l Step 2: Add the resulting F2 to F1 to form F3 l Step 3: Normalize F3 (so it is in the form 1.XXXXX …) -If F1 and F2 have the same sign  F3  [1,4)  1 bit right shift F3 and increment E3 (check for overflow) -If F1 and F2 have different signs  F3 may require many left shifts each time decrementing E3 (check for underflow) l Step 4: Round F3 and possibly normalize F3 again l Step 5: Rehide the most significant bit of F3 before storing the result

CSE431 Chapter 3.58Irwin, PSU, 2008 Floating Point Addition Example  Add (0.5 =  2 -1 ) + ( =  2 -2 ) l Step 0: l Step 1: l Step 2: l Step 3: l Step 4: l Step 5:

CSE431 Chapter 3.59Irwin, PSU, 2008 Floating Point Addition Example  Add (0.5 =  2 -1 ) + ( =  2 -2 ) l Step 0: l Step 1: l Step 2: l Step 3: l Step 4: l Step 5: Hidden bits restored in the representation above Shift significand with the smaller exponent (1.1100) right until its exponent matches the larger exponent (so once) Add significands (-0.111) = – = Normalize the sum, checking for exponent over/underflow x 2 -1 = x 2 -2 =.. = x 2 -4 The sum is already rounded, so we’re done Rehide the hidden bit before storing

CSE431 Chapter 3.60Irwin, PSU, 2008 Floating Point Multiplication  Multiplication (  F1  2 E1 ) x (  F2  2 E2 ) =  F3  2 E3 l Step 0: Restore the hidden bit in F1 and in F2 l Step 1: Add the two (biased) exponents and subtract the bias from the sum, so E1 + E2 – 127 = E3 also determine the sign of the product (which depends on the sign of the operands (most significant bits)) l Step 2: Multiply F1 by F2 to form a double precision F3 l Step 3: Normalize F3 (so it is in the form 1.XXXXX …) -Since F1 and F2 come in normalized  F3  [1,4)  1 bit right shift F3 and increment E3 -Check for overflow/underflow l Step 4: Round F3 and possibly normalize F3 again l Step 5: Rehide the most significant bit of F3 before storing the result

CSE431 Chapter 3.61Irwin, PSU, 2008 Floating Point Multiplication Example  Multiply (0.5 =  2 -1 ) x ( =  2 -2 ) l Step 0: l Step 1: l Step 2: l Step 3: l Step 4: l Step 5:

CSE431 Chapter 3.62Irwin, PSU, 2008 Floating Point Multiplication Example  Multiply (0.5 =  2 -1 ) x ( =  2 -2 ) l Step 0: l Step 1: l Step 2: l Step 3: l Step 4: l Step 5: Hidden bits restored in the representation above Add the exponents (not in bias would be -1 + (-2) = -3 and in bias would be (-1+127) + (-2+127) – 127 = (-1 -2) + ( ) = = 124 Multiply the significands x = Normalized the product, checking for exp over/underflow x 2 -3 is already normalized The product is already rounded, so we’re done Rehide the hidden bit before storing

CSE431 Chapter 3.63Irwin, PSU, 2008 Floating Point Examples  Add (0.75) + (-0.375)  Multiplication (0.75) * (-0.375)

CSE431 Chapter 3.64Irwin, PSU, 2008 Accurate Arithmetic  IEEE Std 754 specifies additional rounding control l Extra bits of precision (guard, round, sticky) l Choice of rounding modes l Allows programmer to fine-tune numerical behavior of a computation  Not all FP units implement all options l Most programming languages and FP libraries just use defaults  Trade-off between hardware complexity, performance, and market requirements

CSE431 Chapter 3.65Irwin, PSU, 2008 MIPS Floating Point Instructions  MIPS has a separate Floating Point Register File ( $f0, $f1, …, $f31 ) (whose registers are used in pairs for double precision values) with special instructions to load to and store from them lwcl $f1,54($s2) #$f1 = Memory[$s2+54] swcl $f1,58($s4) #Memory[$s4+58] = $f1  And supports IEEE 754 single add.s $f2,$f4,$f6 #$f2 = $f4 + $f6 and double precision operations add.d $f2,$f4,$f6 #$f2||$f3 = $f4||$f5 + $f6||$f7 similarly for sub.s, sub.d, mul.s, mul.d, div.s, div.d

CSE431 Chapter 3.66Irwin, PSU, 2008 MIPS Floating Point Instructions, Con’t  And floating point single precision comparison operations c.x.s $f2,$f4 #if($f2 < $f4) cond=1; else cond=0 where x may be eq, neq, lt, le, gt, ge and double precision comparison operations c.x.d $f2,$f4 #$f2||$f3 < $f4||$f5 cond=1; else cond=0  And floating point branch operations bclt 25 #if(cond==1) go to PC+4+25 bclf 25 #if(cond==0) go to PC+4+25

CSE431 Chapter 3.67Irwin, PSU, 2008 FP Example: °F to °C  C code: float f2c (float fahr) { return ((5.0/9.0)*(fahr )); } fahr in $f12, result in $f0, literals in global memory space  Compiled MIPS code: f2c: lwc1 $f16, const5($gp) lwcl $f18, const9($gp) div.s $f16, $f16, $f18 lwcl $f18, const32($gp) sub.s $f18, $f12, $f18 mul.s $f0, $f16, $f18 jr $ra

CSE431 Chapter 3.68Irwin, PSU, 2008 FP Example: Array Multiplication  X = X + Y × Z l All 32 × 32 matrices, 64-bit double-precision elements  C code: void mm (double x[][], double y[][], double z[][]) { int i, j, k; for (i = 0; i! = 32; i = i + 1) for (j = 0; j! = 32; j = j + 1) for (k = 0; k! = 32; k = k + 1) x[i][j] = x[i][j] + y[i][k] * z[k][j]; } Addresses of x, y, z in $a0, $a1, $a2, and i, j, k in $s0, $s1, $s2

CSE431 Chapter 3.69Irwin, PSU, 2008 FP Example: Array Multiplication MIPS code: li $t1, 32 # $t1 = 32 (row size/loop end) li $s0, 0 # i = 0; initialize 1st for loop L1: li $s1, 0 # j = 0; restart 2nd for loop L2: li $s2, 0 # k = 0; restart 3rd for loop sll $t2, $s0, 5 # $t2 = i * 32 (size of row of x) addu $t2, $t2, $s1 # $t2 = i * size(row) + j sll $t2, $t2, 3 # $t2 = byte offset of [i][j] addu $t2, $a0, $t2 # $t2 = byte address of x[i][j] l.d $f4, 0($t2) # $f4 = 8 bytes of x[i][j] L3: sll $t0, $s2, 5 # $t0 = k * 32 (size of row of z) addu $t0, $t0, $s1 # $t0 = k * size(row) + j sll $t0, $t0, 3 # $t0 = byte offset of [k][j] addu $t0, $a2, $t0 # $t0 = byte address of z[k][j] l.d $f16, 0($t0) # $f16 = 8 bytes of z[k][j] …

CSE431 Chapter 3.70Irwin, PSU, 2008 FP Example: Array Multiplication … sll $t0, $s0, 5 # $t0 = i*32 (size of row of y) addu $t0, $t0, $s2 # $t0 = i*size(row) + k sll $t0, $t0, 3 # $t0 = byte offset of [i][k] addu $t0, $a1, $t0 # $t0 = byte address of y[i][k] l.d $f18, 0($t0) # $f18 = 8 bytes of y[i][k] mul.d $f16, $f18, $f16 # $f16 = y[i][k] * z[k][j] add.d $f4, $f4, $f16 # f4=x[i][j] + y[i][k]*z[k][j] addiu $s2, $s2, 1 # $k k + 1 bne $s2, $t1, L3 # if (k != 32) go to L3 s.d $f4, 0($t2) # x[i][j] = $f4 addiu $s1, $s1, 1 # $j = j + 1 bne $s1, $t1, L2 # if (j != 32) go to L2 addiu $s0, $s0, 1 # $i = i + 1 bne $s0, $t1, L1 # if (i != 32) go to L1

CSE431 Chapter 3.71Irwin, PSU, 2008 Problem  Calculate the sum of A and B assuming the 16-bit NVIDIA format (1 bit sign, 5 bit exponent and 10 bit significands), as well as, 1 guard, 1 round bit and 1 sticky bit.  A = x 10 3  B = x 10 -1

CSE431 Chapter 3.72Irwin, PSU, 2008 Problem  Calculate the product of A and B assuming the 16-bit NVIDIA format (1 bit sign, 5 bit exponent and 10 bit significands), as well as, 1 guard, 1 round bit and 1 sticky bit.  A = x 10 0  B = x 10 0

CSE431 Chapter 3.73Irwin, PSU, 2008 Associativity  Parallel programs may interleave operations in unexpected orders l Assumptions of associativity may fail

CSE431 Chapter 3.74Irwin, PSU, 2008 Problem  Calculate (A+B)+C and A+(B+C) assuming the 16-bit NVIDIA format (1 bit sign, 5 bit exponent and 10 bit significands), as well as, 1 guard, 1 round bit and 1 sticky bit.  A = x 10 1  B = x  C = x 10 1