Presentation is loading. Please wait.

Presentation is loading. Please wait.

CEN 316 Computer Organization and Design Computer Arithmetic Floating Point Dr. Mansour AL Zuair.

Similar presentations


Presentation on theme: "CEN 316 Computer Organization and Design Computer Arithmetic Floating Point Dr. Mansour AL Zuair."— Presentation transcript:

1 CEN 316 Computer Organization and Design Computer Arithmetic Floating Point Dr. Mansour AL Zuair

2 Chapter 3 CEN 316 2 Chapter 3 2 Review of Numbers  Computers are made to deal with numbers  What can we represent in N bits? è Unsigned integers: 0to2 N - 1 è Signed Integers (Two’s Complement) -2 (N-1) to 2 (N-1) - 1

3 Chapter 3 CEN 316 3 Other Numbers  What about other numbers? è Very large numbers? (seconds/century) 3,155,760,000 10 (3.15576 10 x 10 9 ) è Very small numbers? (atomic diameter) 0.00000001 10 (1.0 10 x 10 -8 ) è Rationals (repeating pattern) 2/3 (0.666666666...) è Irrationals 2 1/2 (1.414213562373...) e (2.718...),  (3.141...)  All represented in scientific notation

4 Chapter 3 CEN 316 4 Scientific Notation (in Decimal) 6.02 10 x 10 23 radix (base) decimal pointmantissa exponent  Normalized form: no leadings 0s (exactly one digit to left of decimal point)  Alternatives to representing 1/1,000,000,000 è Normalized: 1.0 x 10 -9 è Not normalized: 0.1 x 10 -8,10.0 x 10 -10

5 Chapter 3 CEN 316 5 Scientific Notation (in Binary)  Computer arithmetic that supports it called floating point, because it represents numbers where binary point is not fixed, as it is for integers è Declare such variable in C as float 1.0 two x 2 -1 radix (base) “binary point” exponent mantissa

6 Chapter 3 CEN 316 6 Floating Point Representation (1/2)  Normal format: +1.xxxxxxxxxx two *2 yyyy two  Multiple of Word Size (32 bits) 0 31 SExponent 302322 Significand 1 bit8 bits23 bits   S represents Sign   Exponent represents y’s   Significand represents x’s   Represent numbers as small as 2.0 x 10 -38 to as large as 2.0 x 10 38

7 Chapter 3 CEN 316 7 Floating Point Representation (2/2)  What if result too large? (> 2.0x10 38 ) è Overflow! è Overflow  Exponent larger than represented in 8-bit Exponent field  What if result too small? (>0, < 2.0x10 -38 ) è Underflow! è Underflow  Negative exponent larger than represented in 8-bit Exponent field  How to reduce chances of overflow or underflow?

8 Chapter 3 CEN 316 8 Double Precision Fl. Pt. Representation  Next Multiple of Word Size (64 bits)   Double Precision (vs. Single Precision) è è C variable declared as double è è Represent numbers almost as small as 2.0 x 10 -308 to almost as large as 2.0 x 10 308 è è But primary advantage is greater accuracy due to larger significand 0 31 SExponent 302019 Significand 1 bit11 bits20 bits Significand (cont’d) 32 bits

9 Chapter 3 CEN 316 9 IEEE 754 Floating Point Standard (1/4)  Single Precision, DP similar  Sign bit:1 means negative 0 means positive  Significand: è To pack more bits, leading 1 implicit for normalized numbers è 1 + 23 bits single, 1 + 52 bits double è always true: Significand < 1 (for normalized numbers)  Note: 0 has no leading 1, so reserve exponent value 0 just for number 0

10 Chapter 3 CEN 316 10 IEEE 754 Floating Point Standard (2/4)  Kahan wanted FP numbers to be used even if no FP hardware; e.g., sort records with FP numbers using integer compares  Could break FP number into 3 parts: compare signs, then compare exponents, then compare significands  Wanted it to be faster, single compare if possible, especially if positive numbers  Then want order: è Highest order bit is sign ( negative < positive) è Exponent next, so big exponent => bigger # è Significand last: exponents same => bigger #

11 Chapter 3 CEN 316 11 IEEE 754 Floating Point Standard (3/4)  Negative Exponent? è 2’s comp? 1.0 x 2 -1 v. 1.0 x2 +1 (1/2 v. 2) 01111 000 0000 0000 0000 0000 0000 1/2 00000 0001000 0000 0000 0000 0000 0000 2 è è This notation using integer compare of 1/2 v. 2 makes 1/2 > 2!   Instead, pick notation 0000 0001 is most negative, and 1111 1111 is most positive è è 1.0 x 2 -1 v. 1.0 x2 +1 (1/2 v. 2) 1/2 00111 1110 000 0000 0000 0000 0000 0000 01000 0000000 0000 0000 0000 0000 0000 2

12 Chapter 3 CEN 316 12 IEEE 754 Floating Point Standard (4/4)   Called Biased Notation, where bias is number subtract to get real number è è IEEE 754 uses bias of 127 for single prec. è è Subtract 127 from Exponent field to get actual value for exponent è è 1023 is bias for double precision  Summary (single precision): 0 31 S Exponent 302322 Significand 1 bit8 bits 23 bits   (-1) S x (1 + Significand) x 2 (Exponent-127) è è Double precision identical, except with exponent bias of 1023

13 Chapter 3 CEN 316 13 Example: Converting Binary FP to Decimal  Sign: 0 => positive  Exponent: è 0110 1000 two = 104 ten è Bias adjustment: 104 - 127 = -23  Significand: è 1 + 1x2 -1 + 0x2 -2 + 1x2 -3 + 0x2 -4 + 1x2 -5 +... =1+2 -1 +2 -3 +2 -5 +2 -7 +2 -9 +2 -14 +2 -15 +2 -17 +2 -22 = 1.0 ten + 0.666115 ten 00110 1000101 0101 0100 0011 0100 0010   Represents: 1.666115 ten *2 -23 ~ 1.986*10 -7 (about 2/10,000,000)

14 Chapter 3 CEN 316 14 Converting Decimal to FP (1/3)  Simple Case: If denominator is an exponent of 2 (2, 4, 8, 16, etc.), then it’s easy.  Show MIPS representation of -0.75 è -0.75 = -3/4 è -11 two /100 two = -0.11 two è Normalized to -1.1 two x 2 -1 è (-1) S x (1 + Significand) x 2 (Exponent-127) è (-1) 1 x (1 +.100 0000... 0000) x 2 (126-127) 10111 1110100 0000 0000 0000 0000 0000

15 Chapter 3 CEN 316 15 Converting Decimal to FP (2/3)  Not So Simple Case: If denominator is not an exponent of 2. è Then we can’t represent number precisely, but that’s why we have so many bits in significand: for precision è Once we have significand, normalizing a number to get the exponent is easy. è So how do we get the significand of a never-ending number?

16 Chapter 3 CEN 316 16 Converting Decimal to FP (3/3)  To finish conversion: è Write out binary number with repeating pattern. è Cut it off after correct number of bits (different for single v. double precision). è Derive Sign, Exponent and Significand fields.

17 Chapter 3 CEN 316 17 1: -1.75 2: -3.5 3: -3.75 4: -7 5: -7.5 6: -15 What is the decimal equivalent of the floating pt # above? 1 1000 0001111 0000 0000 0000 0000 0000 Example

18 Chapter 3 CEN 316 18 “And in conclusion…”  Floating Point numbers approximate values that we want to use.  IEEE 754 Floating Point Standard is most widely accepted attempt to standardize interpretation of such numbers è Every desktop or server computer sold since ~1997 follows these conventions   Summary (single precision): 0 31 SExponent 302322 Significand 1 bit8 bits23 bits   (-1) S x (1 + Significand) x 2 (Exponent-127) è è Double precision identical, bias of 1023

19 Chapter 3 CEN 316 19 Representation for ± ∞  In FP, divide by 0 should produce ± ∞, not overflow.  Why? è OK to do further computations with ∞ E.g., X/0 > Y may be a valid comparison  IEEE 754 represents ± ∞ è Most positive exponent reserved for ∞ è Significands all zeroes è Sign: 0 for +∞ and 1 for -∞

20 Chapter 3 CEN 316 20 Representation for 0  Represent 0? è exponent all zeroes è significand all zeroes too è What about sign? è +0: 0 00000000 00000000000000000000000 è -0: 1 00000000 00000000000000000000000  Why two zeroes? è Helps in some limit comparisons

21 Chapter 3 CEN 316 21 Special Numbers  What have we defined so far? (Single Precision) ExponentSignificand Object 0 0 0 0 nonzero ??? 1-254 anything +/- fl. pt. # 255 0 +/- ∞ 255 nonzero ??? è Exp=0,255 & Sig!=0 …

22 Chapter 3 CEN 316 22 Representation for Not a Number  What is sqrt(-4.0) or 0/0 ? è If ∞ not an error, these shouldn’t be either. è Called Not a Number (NaN) è Exponent = 255, Significand nonzero

23 Chapter 3 CEN 316 23 Representation for Denorms (1/2)  Problem: There’s a gap among representable FP numbers around 0 è Smallest representable pos num: a = 1.0… 2 * 2 -126 = 2 -126 è Second smallest representable pos num: b = 1.000……1 2 * 2 -126 = 2 -126 + 2 -149 a - 0 = 2 -126 b - a = 2 -149 b a0 + - Gaps! Normalization and implicit 1 is to blame!

24 Chapter 3 CEN 316 24 Representation for Denorms (2/2)   Solution: è è We still haven’t used Exponent = 0, Significand nonzero è è Denormalized number: no leading 1, implicit exponent = -126. è è Smallest representable pos num: a = 2 -149 è è Second smallest representable pos num: b = 2 -148 0 + -

25 Chapter 3 CEN 316 25 Rounding  Math on real numbers  we worry about rounding to fit result in the significant field.  FP hardware carries 2 extra bits of precision, and rounds for proper value  Rounding occurs when converting… è double to single precision è floating point # to an integer

26 Chapter 3 CEN 316 26 IEEE Four Rounding Modes  Round towards + ∞ è ALWAYS round “up”: 2.1  3, -2.1  -2  Round towards - ∞ è ALWAYS round “down”: 1.9  1, -1.9  -2  Truncate è Just drop the last bits (round towards 0)  Round to (nearest) even (default) è Normal rounding, almost: 2.5  2, 3.5  4 è Insures fairness on calculation è Half the time we round up, other half down

27 Chapter 3 CEN 316 27 Summary

28 Chapter 3 CEN 316 28 FP Addition & Subtraction  Much more difficult than with integers (can’t just add significands)  How do we do it? è De-normalize to match larger exponent è Add significands to get resulting one è Normalize (& check for under/overflow) è Round if needed (may need to renormalize)  If signs ≠, do a subtract. (Subtract similar) è If signs ≠ for add (or = for sub), what’s ans sign?  Question: How do we integrate this into the integer arithmetic unit?

29 Chapter 3 CEN 316 29 Floating point Addition

30 09/12/1436Chapter 3Chapter 3 CEN 316 30

31 Chapter 3 CEN 316 31 Floating point Multiplication

32 Chapter 3 CEN 316 32 MIPS Floating Point Architecture

33 Chapter 3 CEN 316 33 MIPS Floating Point Architecture  Problems: è These are far more complicated than their integer counterparts è Can take much longer to execute è Inefficient to have different instructions take vastly differing amounts of time. è Some programs do no FP calculations è It takes lots of hardware relative to integers to do FP fast

34 Chapter 3 CEN 316 34 MIPS Floating Point Architecture  1990 Solution: Make a completely separate chip that handles only FP.  Coprocessor 1: FP chip è contains 32 32-bit registers: $f0, $f1, … è most of the registers specified in.s and.d instruction refer to this set è separate load and store: lwc1 and swc1 (“load word coprocessor 1”, “store …”) è Double Precision: by convention, even/odd pair contain one DP FP number: $f0 / $f1, $f2 / $f3, …, $f30 / $f31 ñ Even register is the name


Download ppt "CEN 316 Computer Organization and Design Computer Arithmetic Floating Point Dr. Mansour AL Zuair."

Similar presentations


Ads by Google