IEEE Arithmetic UC Berkeley Fall 2004, E77 Copyright 2005, Andy Packard. This work is licensed under the Creative.


1 IEEE Arithmetic UC Berkeley Fall 2004, E77 http://jagger.me.berkeley.edu/~pack/e77 Copyright 2005, Andy Packard. This work is licensed under the Creative Commons Attribution-ShareAlike License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/2.0/ or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.

2 Computer representation of numbers Integers and rational numbers in different bases Floating point IEEE standards Roundoff Errors on basic arithmetic

3 Integers in base 10
The base-10 number $(2075)_{10}$ means $2\times 10^3 + 0\times 10^2 + 7\times 10^1 + 5\times 10^0$.
What are the “rules”?
–Each digit is a number from 0 to 9
–The “values” of the slots are increasing powers of 10, starting at $10^0$
Throughout the lecture, the “explanatory” expression is always given in base 10.

4 Integers in base 2
The base-2 number $(1001101)_2$ means $1\times 2^6 + 0\times 2^5 + 0\times 2^4 + 1\times 2^3 + 1\times 2^2 + 0\times 2^1 + 1\times 2^0 = 77$.
What are the “rules”?
–Each digit is a number from 0 to 1
–The “values” of the slots are increasing powers of 2, starting at $2^0$
The numbers in this “explanatory” expression are in base 10.

5 Integers in base p
The base-p number $(d_4 d_3 d_2 d_1 d_0)_p$ means $d_4 p^4 + d_3 p^3 + d_2 p^2 + d_1 p^1 + d_0 p^0$.
What are the “rules”?
–Each digit is a number from 0 to p-1
–The “values” of the slots are increasing powers of p, starting at $p^0$
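The positional rule for integers can be checked with a few lines of Python. This is only an illustrative sketch: the function name `digits_to_int` and the choice to pass digits as a list (most significant first) are assumptions, not part of the lecture.

```python
def digits_to_int(digits, p):
    """Value of the base-p digit list [d_n, ..., d_1, d_0], most significant digit first."""
    value = 0
    for d in digits:
        if not 0 <= d < p:
            raise ValueError(f"digit {d} is not valid in base {p}")
        # Horner's rule: multiplying by p shifts every earlier digit up one slot,
        # which yields the same sum as adding d_k * p**k term by term.
        value = value * p + d
    return value

print(digits_to_int([2, 0, 7, 5], 10))          # 2075
print(digits_to_int([1, 0, 0, 1, 1, 0, 1], 2))  # 77
```

Python's built-in `int("1001101", 2)` performs the same conversion for bases up to 36.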

6 Rational numbers in base 10
The base-10 number $(75.396)_{10}$ means $7\times 10^1 + 5\times 10^0 + 3\times 10^{-1} + 9\times 10^{-2} + 6\times 10^{-3}$.
What are the “rules”?
–Each digit is a number from 0 to 9
–The “values” of the slots to the left of the decimal point are increasing powers of 10, starting at $10^0$
–The “values” of the slots to the right of the decimal point are decreasing powers of 10, starting at $10^{-1}$
–The leading digit is usually not zero, but it is ok if it is.
–We usually write 75.396 instead of 075.396, but they are the same.

7 Rational numbers in base 2
The base-2 number $(1001.101)_2$ means $1\times 2^3 + 0\times 2^2 + 0\times 2^1 + 1\times 2^0 + 1\times 2^{-1} + 0\times 2^{-2} + 1\times 2^{-3} = 9.625$.
What are the “rules”?
–Each digit is a number from 0 to 1
–The “values” of the slots to the left of the decimal point are increasing powers of 2, starting at $2^0$
–The “values” of the slots to the right of the decimal point are decreasing powers of 2, starting at $2^{-1}$
–The leading digit is usually not zero (see previous slide)

8 Rational numbers in base p
The base-p number $(d_2 d_1 d_0.d_{-1} d_{-2} d_{-3})_p$ means $d_2 p^2 + d_1 p^1 + d_0 p^0 + d_{-1} p^{-1} + d_{-2} p^{-2} + d_{-3} p^{-3}$.
What are the “rules”?
–Each digit is a number from 0 to p-1
–The “values” of the slots to the left of the decimal point are increasing powers of p, starting at $p^0$
–The “values” of the slots to the right of the decimal point are decreasing powers of p, starting at $p^{-1}$
–The leading digit is usually not 0 (see previous slide)
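As a rough check of the rule for digits on both sides of the point, the sketch below sums the positional terms exactly using Python's `fractions.Fraction`. The function name `positional_value` and its argument layout are hypothetical choices made for this illustration.

```python
from fractions import Fraction

def positional_value(int_digits, frac_digits, p):
    """Exact value of (int_digits . frac_digits) in base p.

    int_digits is most-significant-first; frac_digits starts with d_{-1}.
    """
    total = Fraction(0)
    n = len(int_digits)
    for i, d in enumerate(int_digits):
        total += d * Fraction(p) ** (n - 1 - i)   # p^(n-1), ..., p^1, p^0
    for j, d in enumerate(frac_digits, start=1):
        total += d * Fraction(p) ** (-j)          # p^-1, p^-2, ...
    return total

binary_example = positional_value([1, 0, 0, 1], [1, 0, 1], 2)
print(binary_example, float(binary_example))  # 77/8 9.625
```

Exact rational arithmetic is used here so that the check itself is not affected by the roundoff discussed later in the lecture.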

9 “Floating Point” numbers in base p
The base-p number $(d_1 d_0.d_{-1} d_{-2} d_{-3})_p \times p^e$ means $(d_1 p^1 + d_0 p^0 + d_{-1} p^{-1} + d_{-2} p^{-2} + d_{-3} p^{-3}) \times p^e$.
What are the “rules”?
–Each digit is a number from 0 to p-1
–The “values” of the slots to the left of the decimal point are increasing powers of p, starting at $p^0$
–The “values” of the slots to the right of the decimal point are decreasing powers of p, starting at $p^{-1}$
–The exponent, e, must be an integer. It just “moves” the decimal point e places to the right (to the left if e is negative).
The digit string $d_1 d_0.d_{-1} d_{-2} d_{-3}$ is the mantissa, and e is the exponent.
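The claim that the exponent only moves the point can be illustrated with exact arithmetic. In this small sketch, `float_value` is a hypothetical helper name, and the two examples are chosen for illustration rather than taken from the slides.

```python
from fractions import Fraction

def float_value(mantissa, p, e):
    """Exact value of mantissa * p**e, where e is an integer exponent."""
    return Fraction(mantissa) * Fraction(p) ** e

# Base 10: 12.345 x 10^2 moves the point two places right, giving 1234.5.
print(float_value(Fraction("12.345"), 10, 2) == Fraction("1234.5"))  # True

# Base 2: (10.101)_2 = 21/8; multiplying by 2^-1 moves the point one place
# left, giving (1.0101)_2 = 21/16.
print(float_value(Fraction(21, 8), 2, -1) == Fraction(21, 16))  # True
```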

10 Normalized floating point numbers in base 2
A normalized binary number is written $d_0.d_{-1} d_{-2} d_{-3} \times 2^e$. The rules for normalized numbers are:
–Only one digit to the left of the decimal point
–The value of this digit, $d_0$, is always 1
So normalized binary numbers appear as (for example) $1.d_{-1} d_{-2} d_{-3} d_{-4} d_{-5} d_{-6} \times 2^e$.
In an unnormalized number, we still allow only one digit to the left of the decimal point, but it may be 0.
Finally, we need a ± sign in front to denote positive or negative, giving the form $\pm d_0.d_{-1} d_{-2} d_{-3} d_{-4} d_{-5} d_{-6} \times 2^e$.
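To see normalization in practice, the following sketch rewrites a nonzero finite number as ±(1.something) × 2^e. It relies on Python's standard `math.frexp`, which returns a mantissa in [0.5, 1), so the result is rescaled by a factor of 2; the wrapper name `normalize` is an assumption made for this example.

```python
import math

def normalize(x):
    """Return (sign, mantissa, e) with x == sign * mantissa * 2**e and 1 <= mantissa < 2."""
    if x == 0 or math.isinf(x) or math.isnan(x):
        raise ValueError("only nonzero finite numbers can be normalized")
    m, e = math.frexp(abs(x))   # abs(x) == m * 2**e with 0.5 <= m < 1
    sign = -1 if x < 0 else 1
    return sign, 2 * m, e - 1   # shift one factor of 2 from the exponent into the mantissa

print(normalize(9.625))     # (1, 1.203125, 3)
print(normalize(-0.15625))  # (-1, 1.25, -3)
```

For example, 9.625 is $(1001.101)_2$, which normalizes to $(1.001101)_2 \times 2^3$, and $(1.001101)_2 = 1.203125$.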

11 Computer representation of numbers
Computers “store” everything
–Integers, characters
–Floating point numbers (e.g., 1.435678, cos(2.3), etc.)
–Memory location addresses
–Computer chip instructions (“multiply contents of register A by contents of register B, and store result in register C”)
as (long) sequences of digits made up of 0’s and 1’s.
–Each single digit is called a “bit” (8 bits is a byte)
–So a “bit” can store a single base-2 digit (0 or 1)
For scientific computing and engineering calculations, an important question is “How are floating point numbers represented?” We need to store three things:
–the mantissa,
–the exponent, and
–the sign

12 Storing the mantissa, exponent and sign
Normalized numbers are of the form $\pm 1.d_{-1} d_{-2} d_{-3} d_{-4} d_{-5} d_{-6} \times 2^e$.
–Store the sign with 1 bit: 0 means positive, 1 means negative.
–Store the mantissa (6 digits above) bit-by-bit.
–Store the exponent as a binary integer. What about the sign of the exponent? Later…
The number of bits used to store the mantissa controls the precision to which numbers can be stored.
The number of bits used to store the exponent controls the range of numbers (smallest to biggest) that can be stored.

13 IEEE Standard
Single precision: 32 bits
–1 sign bit (0 means positive, 1 means negative)
–8 bits for the exponent
–23 bits for the fraction
Double precision: 64 bits
–1 sign bit (0 means positive, 1 means negative)
–11 bits for the exponent
–52 bits for the fraction
Plus, some additional rules on how to interpret things. Let’s use a different precision to explain the rules.
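One way to get a feel for the two formats is to round a double to single precision and measure what is lost. The sketch below uses Python's standard `struct` module and assumes that Python floats are IEEE doubles, which holds on essentially all current platforms; the helper name `round_to_single` is illustrative.

```python
import struct

def round_to_single(x):
    """Round a Python float (a double) to the nearest single, returned as a double."""
    # Packing with "<f" stores x in 32 bits; unpacking widens it back without change.
    return struct.unpack("<f", struct.pack("<f", x))[0]

print(struct.calcsize("<f") * 8, struct.calcsize("<d") * 8)  # 32 64

x = 0.1
rel_err = abs(round_to_single(x) - x) / x
# With 23 fraction bits, rounding to single should cost at most 2^-24 in relative terms.
print(rel_err <= 2.0 ** -24)  # True
```

Numbers such as 0.5 that need only a few fraction bits survive the round trip unchanged, while 0.1 does not.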

14 Toy precision, using IEEE rules
7 bits (only $2^7=128$ different possibilities)
–1 sign bit, S (0 means positive, 1 means negative)
–3 bits for the exponent, E (000, 001, 010, 011, 100, 101, 110, 111)
–3 bits for the mantissa, M
Rules (1.M and 0.M are read as binary numbers)
–If 1≤E≤6, then the value is $(-1)^S \times 2^{E-3} \times 1.M$
  $2^{E-3}$ takes on the values 0.25, 0.5, 1, 2, 4, 8
  1.M takes on the values 1, 1.125, 1.25, 1.375, 1.5, 1.625, 1.75, 1.875
  The minimum positive value is 0.25, the maximum value is 15 (realmin, realmax)
  48 positive numbers, 48 negative numbers
–If E=0 and M≠0, then the value is $(-1)^S \times 2^{-2} \times 0.M$
  1/32, 2/32, 3/32, 4/32, 5/32, 6/32, 7/32 (and their negatives)
  These are the unnormalized numbers (14 of them)
–If E=0 and M=0, then the value is 0 or -0 (based on S)
–If E=7 and M=0, then the value is Inf or -Inf (based on S)
–If E=7 and M≠0, then the value is NaN (14 of these)
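Because the toy format has only 128 bit patterns, the rules above can be checked by brute force. The following sketch decodes every pattern and tallies each category; `toy_decode` and the category labels are names invented for this illustration.

```python
from collections import Counter
from fractions import Fraction

def toy_decode(S, E, M):
    """Return (kind, value) for sign bit S, exponent field E (0..7), mantissa field M (0..7)."""
    sign = -1 if S else 1
    if 1 <= E <= 6:
        return "normalized", sign * Fraction(2) ** (E - 3) * (1 + Fraction(M, 8))
    if E == 0 and M != 0:
        return "unnormalized", sign * Fraction(2) ** -2 * Fraction(M, 8)
    if E == 0:
        return "zero", sign * 0.0            # keeps the distinction between +0 and -0
    if M == 0:
        return "inf", sign * float("inf")
    return "nan", float("nan")

patterns = [toy_decode(S, E, M) for S in (0, 1) for E in range(8) for M in range(8)]
counts = Counter(kind for kind, _ in patterns)
normalized = [v for kind, v in patterns if kind == "normalized"]

print(dict(counts))
print(min(v for v in normalized if v > 0), max(normalized))  # 1/4 15
```

The tallies come out as 96 normalized numbers (48 of each sign), 14 unnormalized numbers, 2 zeros, 2 infinities and 14 NaN patterns, for 128 in total.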

15 Double precision, using IEEE rules
64 bits ($2^{64} \approx 1.8\times 10^{19}$ different possibilities)
–1 sign bit, S (0 means positive, 1 means negative)
–11 bits for the exponent, E (between 0 and 2047)
–52 bits for the mantissa, M (between 0 and $2^{52}-1 \approx 4.5\times 10^{15}$)
Rules (1.M and 0.M are read as binary numbers)
–If 1≤E≤2046, then the value is $(-1)^S \times 2^{E-1023} \times 1.M$
  $2^{E-1023}$ takes on 2046 different values, from $2^{-1022}$ to $2^{1023}$
  1.M takes on $4.5\times 10^{15}$ evenly spaced values from 1 to just less than 2
  The minimum positive value is $2.2\times 10^{-308}$, the maximum value is $1.8\times 10^{308}$
–If E=0 and M≠0, then the value is $(-1)^S \times 2^{-1022} \times 0.M$
  $4.5\times 10^{15}$ evenly spaced values from just above 0 to just less than $2^{-1022}$
  These are the unnormalized numbers ($9.0\times 10^{15}$ of them, counting both signs)
–If E=0 and M=0, then the value is 0 or -0 (based on S)
–If E=2047 and M=0, then the value is Inf or -Inf (based on S)
–If E=2047 and M≠0, then the value is NaN ($9.0\times 10^{15}$ of these, counting both signs)
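The double precision rules can be tried out on real numbers by pulling the S, E and M fields out of a Python float. This is a sketch under the assumption that Python floats are IEEE doubles (true on essentially all current platforms); `decode_double` and `value_from_fields` are hypothetical helper names.

```python
import struct
import sys
from fractions import Fraction

def decode_double(x):
    """Split a double into its sign bit S, exponent field E and mantissa field M."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    S = bits >> 63
    E = (bits >> 52) & 0x7FF
    M = bits & ((1 << 52) - 1)
    return S, E, M

def value_from_fields(S, E, M):
    """Apply the rules for finite numbers and return the exact value as a Fraction."""
    sign = -1 if S else 1
    frac = Fraction(M, 2 ** 52)          # the binary fraction .M
    if 1 <= E <= 2046:
        return sign * Fraction(2) ** (E - 1023) * (1 + frac)
    if E == 0:
        return sign * Fraction(2) ** -1022 * frac
    raise ValueError("E = 2047 encodes Inf or NaN")

print(decode_double(1.0))                 # (0, 1023, 0)
print(decode_double(sys.float_info.min))  # (0, 1, 0)
print(value_from_fields(*decode_double(0.1)) == Fraction(0.1))  # True
```

Here `sys.float_info.min` is the smallest positive normalized double, so it has E=1 and M=0, matching the minimum value quoted on the slide.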

16 Toy precision, using IEEE rules
The 3-bit mantissa, 3-bit exponent, and sign bit give
–110 distinct nonzero numbers between -15 and 15,
–+0 and -0,
–+Inf and -Inf,
–NaN
There are gaps. For $-15<x<15$, let fl(x) denote the closest floating point number to x. The relative representation error is $|\mathrm{fl}(x)-x| / |x|$.
For toy precision, the relative error is at most $2^{-4}$ for x in the normalized range ($0.25 \le |x| \le 15$). In IEEE double precision, the maximum relative error is $2^{-53}$, which is about $10^{-16}$.
The size of the gaps between representable numbers grows with the size of the number, so it is the relative representation error, not the absolute error, that stays bounded.
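The $2^{-4}$ bound for toy precision can be checked numerically. Between two neighboring toy numbers the relative error is largest at the midpoint, so the sketch below only tests midpoints; the names `toy_numbers` and `fl_toy` are illustrative assumptions, and only the positive normalized numbers are enumerated.

```python
import sys
from fractions import Fraction

# All 48 positive normalized toy numbers, 2^(E-3) * 1.M for E = 1..6, M = 0..7.
toy_numbers = sorted(
    Fraction(2) ** (E - 3) * (1 + Fraction(M, 8))
    for E in range(1, 7)
    for M in range(8)
)

def fl_toy(x):
    """Closest toy number to x (ties go to the smaller neighbor)."""
    return min(toy_numbers, key=lambda t: abs(t - x))

midpoints = [(a + b) / 2 for a, b in zip(toy_numbers, toy_numbers[1:])]
worst = max(abs(fl_toy(x) - x) / x for x in midpoints)
print(worst, worst < Fraction(1, 16))  # 1/17 True

# Double precision: half of the spacing just above 1.0 is the 2^-53 bound.
print(sys.float_info.epsilon / 2 == 2.0 ** -53)  # True
```

The worst case, 1/17, occurs just above a power of 2 (for instance at x = 1.0625, halfway between 1 and 1.125), which is slightly below the $2^{-4}$ bound.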

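17 IEEE Arithmetic
Arithmetic (add, subtract, multiply and divide) on pairs of these numbers can give results that are not representable. The IEEE standard is that the stored result of each of these operations should be the floating point number nearest to the exact result. Hence, in double precision and barring overflow or underflow,
fl(x+y) = (x+y)×(1+δ) where $|\delta| \le 2^{-53} \approx 1.1\times 10^{-16}$
fl(x-y) = (x-y)×(1+δ) where $|\delta| \le 2^{-53}$
fl(x×y) = (x×y)×(1+δ) where $|\delta| \le 2^{-53}$
fl(x/y) = (x/y)×(1+δ) where $|\delta| \le 2^{-53}$
The value of δ is in general different for each operation. IEEE arithmetic also has fl(√y) = √y×(1+δ), with the same bound on δ.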
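The relative error δ of a single operation can be measured by comparing the computed result with the exact rational result. The sketch below assumes Python floats are IEEE doubles with round-to-nearest (the usual default); the helper `delta` and the sample operands are choices made for this illustration.

```python
from fractions import Fraction

U = Fraction(1, 2 ** 53)   # the bound on |delta| in double precision

def delta(computed, exact):
    """Exact relative error of a computed double against the exact rational result."""
    return (Fraction(computed) - exact) / exact

x, y = 0.1, 0.3            # the stored doubles, not the decimal numbers 1/10 and 3/10
X, Y = Fraction(x), Fraction(y)

deltas = {
    "add": delta(x + y, X + Y),
    "sub": delta(x - y, X - Y),
    "mul": delta(x * y, X * Y),
    "div": delta(x / y, X / Y),
}
for name, d in deltas.items():
    print(name, float(d), abs(d) <= U)
```

Note that this measures only the error of the operation itself; the error made when 0.1 and 0.3 were first converted to doubles is a separate representation error.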

