# Fixed-point and floating-point numbers CS370 Fall 2003.



2 Representations of numbers

- Unsigned integers
- Signed integers: 1's and 2's complement representation
- To represent very large and very small numbers, and real numbers in general:
  - Fixed-point numbers
  - Floating-point numbers

3 Base-10 (decimal) arithmetic

- Uses the ten digits 0 through 9
- Each column represents a power of 10, for example $258 = 2 \times 10^{2} + 5 \times 10^{1} + 8 \times 10^{0}$


5 Standard binary representation

- Uses the two digits 0 and 1
- Every column represents a power of 2

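6 Fixed-point representation

- Uses the two digits 0 and 1
- Every column represents a power of 2; columns to the right of the binary point represent negative powers ($2^{-1}$, $2^{-2}$, and so on)
- The position of the binary point is fixed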
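As a rough illustration of fixed-point interpretation, the sketch below reads an unsigned bit string with a chosen number of fraction bits. The function name `fixed_point_value` and the 4.4 split (4 integer bits, 4 fraction bits) are assumptions made for this example; the slides do not specify a particular split.

```python
def fixed_point_value(bits: str, frac_bits: int) -> float:
    """Interpret an unsigned bit string that has `frac_bits` bits
    to the right of the (implied) binary point."""
    # Reading the bits as an integer and dividing by 2**frac_bits
    # is the same as weighting each column by its power of 2.
    return int(bits, 2) / 2 ** frac_bits


# Assumed 4.4 format: 0010.0100 = 2 + 1/4
print(fixed_point_value("00100100", 4))  # 2.25
# Largest value in the assumed 4.4 format: 1111.1111
print(fixed_point_value("11111111", 4))  # 15.9375
```

With the point fixed, the range and the resolution are both fixed too: in this assumed format the smallest step is 1/16 and the largest value is just under 16.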

8 Range of values in a byte

9 Scientific notation (1)

One billion = 1,000,000,000 = $1 \times 10^{9}$

- significand or mantissa: 1
- base or radix: 10
- exponent: 9

10 Scientific notation (2)

$1999 = 1.999 \times 10^{3}$

- significand or mantissa: 1.999
- base or radix: 10
- exponent: 3

The same value can also be written as $19.99 \times 10^{2}$ or $199.9 \times 10^{1}$.

11 Practice (base 10)

$258 = 2.58 \times 10^{2}$

- Mantissa = 2.58
- Radix = 10
- Exponent = 2

$24.25 = 2.425 \times 10^{1}$

- Mantissa = 2.425
- Radix = 10
- Exponent = 1
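The decimal practice values can be checked with a small sketch that leans on Python's built-in `e` format. The helper name `sci_parts` is made up for this example, and it assumes the convention of one nonzero digit before the decimal point.

```python
def sci_parts(x, digits):
    """Split x into (significand string, exponent) in base 10,
    keeping `digits` digits after the decimal point."""
    text = format(x, f".{digits}e")       # e.g. '1.999e+03'
    significand, exponent = text.split("e")
    return significand, int(exponent)


print(sci_parts(1999, 3))    # ('1.999', 3)
print(sci_parts(258, 2))     # ('2.58', 2)
print(sci_parts(24.25, 3))   # ('2.425', 1)
```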

12 Base-2 scientific notation

$2.25_{10} = 10.01_2 = 10.01_2 \times 2^{0} = 1.001_2 \times 2^{1}$ (normalized)

Numbers are usually normalized, which means that the leading bit is always a 1.
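Normalization in base 2 can be sketched with `math.frexp` from the Python standard library. The function name `normalize_base2` is an assumption for this example; it handles positive numbers only and truncates (does not round) any bits beyond `frac_bits`.

```python
import math


def normalize_base2(x: float, frac_bits: int = 3):
    """Return (significand string, exponent) such that
    x is approximately significand * 2**exponent with a leading 1."""
    if x <= 0:
        raise ValueError("this sketch handles positive numbers only")
    m, e = math.frexp(x)            # x == m * 2**e with 0.5 <= m < 1
    significand, exponent = m * 2, e - 1   # shift to 1 <= significand < 2
    frac = significand - 1.0        # part after the binary point
    bits = ""
    for _ in range(frac_bits):
        frac *= 2
        bit = int(frac)             # next binary digit (0 or 1)
        bits += str(bit)
        frac -= bit
    return "1." + bits, exponent


print(normalize_base2(2.25))  # ('1.001', 1)
```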

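13 8-bit floating point format (1)

- sign: 1 bit
- exponent: 3 bits
- significand: 4 bits

| bits | number, base 2 | number, base 10 |
|---|---|---|
| 00011001 | $1.001 \times 2^{1}$ | 2.25 |
| 00111100 | $1.1 \times 2^{3}$ | 12.0 |
| 01111110 | $1.11 \times 2^{7}$ | 224.0 |
| 10011110 | $1.11 \times 2^{-1}$ | 0.875 |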

14 Improvements

- Bias the exponent
  - Always subtract a fixed amount, e.g., 3, from the stored exponent
  - Allows representation of negative exponents
- Implicit one
  - The leading one in a phone number such as 1-619-556-0231 is redundant
  - Why use a bit for the leading one?

15 8-bit floating-point format (2)

- Exponent (3 bits) is biased by 3
- The leading one of the significand is implicit
- Zero is represented by all zeros
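A minimal decoding sketch for this second 8-bit format follows. It assumes the bit layout carries over from format (1): 1 sign bit, then 3 exponent bits, then 4 significand bits, which now hold only the fraction. The name `decode_fp8` is hypothetical, and the sketch assumes there are no special encodings other than all-zeros for zero.

```python
def decode_fp8(byte: int) -> float:
    """Decode an 8-bit value laid out as sign(1) | exponent(3) | fraction(4),
    with exponent bias 3 and an implicit leading one (assumed layout)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value that fits in 8 bits")
    if byte == 0:
        return 0.0                          # all zeros represents zero
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = ((byte >> 4) & 0x7) - 3      # remove the bias of 3
    fraction = byte & 0xF                   # 4 bits after the binary point
    return sign * (1 + fraction / 16) * 2.0 ** exponent


print(decode_fp8(0b0_100_0010))  # 1.0010 x 2^1  = 2.25
print(decode_fp8(0b0_010_1100))  # 1.1100 x 2^-1 = 0.875
print(decode_fp8(0b0_111_1111))  # largest value: 1.1111 x 2^4 = 31.0
```

Under these assumptions the implicit one buys an extra bit of precision, and the bias lets the same 3 exponent bits cover exponents from -3 to 4.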

16 IEEE standard floating-point

- Single precision: 32 bits
  - sign: 1 bit
  - exponent: 8 bits
  - significand: 23 bits
  - Bias: 127
- Double precision: 64 bits
  - sign: 1 bit
  - exponent: 11 bits
  - significand: 52 bits
  - Bias: 1023
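The single-precision fields can be inspected with the standard `struct` module, as in the sketch below. The helper name `float32_fields` is an assumption for this example, and only normalized numbers are considered.

```python
import struct


def float32_fields(x: float):
    """Return (sign, stored exponent, fraction) of x as an IEEE single."""
    # Pack as a big-endian 32-bit float, then reread the same 4 bytes
    # as an unsigned 32-bit integer to get at the raw bits.
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    sign = raw >> 31                   # 1 bit
    exponent = (raw >> 23) & 0xFF      # 8 bits, biased by 127
    fraction = raw & 0x7FFFFF          # 23 bits, leading one is implicit
    return sign, exponent, fraction


sign, exponent, fraction = float32_fields(2.25)
# 2.25 = 1.001 (binary) x 2^1, so the stored exponent is 1 + 127 = 128
print(sign, exponent, exponent - 127, format(fraction, "023b"))
```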

17 Practice (base 10)

- $13 = 1.3 \times 10^{1} = 1.101_2 \times 2^{3}$
- $1.25 = 1.25 \times 10^{0} = 1.010_2 \times 2^{0}$
