Fundamentals of Computer Science Lecture 7
Floating Point Representation
Numbers used in scientific calculations are described by a sign, the magnitude of the number, and the position of the radix point. The position of the radix point is needed to represent fractions, integers, or mixed integer-fraction numbers. There are two ways of specifying the position of the radix point: fixed-point representation (which we have studied up to now) and floating-point representation. In fixed-point representation, the radix point is placed in one of two positions: at the extreme left of the number, which makes the number a fraction, or at the extreme right of the number, which makes the number an integer.
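To make the implied radix point concrete, here is a small Python sketch. It assumes unsigned bit strings with no sign bit, and the function name `interpret_fixed_point` is hypothetical. The same stored bits read as an integer when the radix point is assumed to be at the extreme right, and as a fraction when it is assumed to be at the extreme left.

```python
def interpret_fixed_point(bits: str, radix_at_left: bool) -> float:
    """Interpret an unsigned bit string under an implied radix point.

    radix_at_left=True  -> point at the extreme left  (pure fraction)
    radix_at_left=False -> point at the extreme right (pure integer)
    """
    value = int(bits, 2)  # the stored bits, read as an unsigned integer
    if radix_at_left:
        # .b1 b2 ... bn equals the integer value divided by 2**n
        return value / (2 ** len(bits))
    return float(value)


pattern = "1010"
print(interpret_fixed_point(pattern, radix_at_left=False))  # 10.0
print(interpret_fixed_point(pattern, radix_at_left=True))   # 0.625
```

Nothing in the bit pattern itself changes between the two calls; only the convention about where the point sits does.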
Floating Point Representation
Note that in both cases the radix point is not actually stored in the digital system; its position is implied by whether the number is predefined as an integer or a fraction. Floating-point notation is the most common way of storing fractions in the main memory of a computer. In the decimal system, the floating-point notation of any real number, also called “scientific notation”, is
$(N)_{10} = F \times 10^{E}$
where $(N)_{10}$ is the real number in decimal format, $F$ is a fraction (the mantissa), and $E$ is the exponent.
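A minimal sketch of converting a decimal number into the $F \times 10^{E}$ form, assuming $F$ is normalized so that $0.1 \le F < 1$ (its first digit after the point is nonzero). It uses Python's standard `decimal` module to avoid binary rounding errors; the function name `to_scientific` is hypothetical.

```python
from decimal import Decimal


def to_scientific(number: str) -> tuple[str, Decimal, int]:
    """Split a nonzero decimal string into (sign, F, E) so that
    |N| = F * 10**E with 0.1 <= F < 1 (assumed normalization)."""
    d = Decimal(number.replace(" ", ""))
    if d == 0:
        raise ValueError("zero has no normalized form")
    sign = "-" if d < 0 else "+"
    d = abs(d)
    # adjusted() is the exponent of the most significant digit
    # (3 for 6132.789); one more than that moves the point to the far left.
    exponent = d.adjusted() + 1
    fraction = d.scaleb(-exponent)  # shift the decimal point `exponent` places left
    return sign, fraction, exponent


sign, f, e = to_scientific("+6132.789")
print(f"{sign}{f} x 10^{e}")  # +0.6132789 x 10^4
```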
Floating Point Representation
Example: the real number +6132.789 can be expressed in floating-point notation as
$(+6132.789)_{10} = +.6132789 \times 10^{+4}$
Similarly, floating-point notation can be applied to the binary system:
$(N)_{2} = F \times 2^{E}$
Example: the binary number +1001.11 can be represented with an 8-bit fraction and a 6-bit exponent as
$(+1001.11)_{2} = +.100111 \times 2^{+4}$
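The same normalization can be sketched for binary strings. The function `normalize_binary` below is hypothetical; it assumes the input is a signed binary string such as `+1001.11` and returns the sign, the fraction bits after the radix point, and the exponent, with the fraction normalized so that its first bit is 1. It does not assume any particular field layout in memory.

```python
def normalize_binary(number: str) -> tuple[str, str, int]:
    """Normalize a binary string like '+1001.11' into (sign, F, E)
    with |N| = .F * 2**E and the first bit of F equal to 1."""
    text = number.replace(" ", "")
    sign = "-" if text.startswith("-") else "+"
    digits = text.lstrip("+-")
    int_part, _, frac_part = digits.partition(".")
    all_bits = int_part + frac_part
    first_one = all_bits.find("1")
    if first_one == -1:
        raise ValueError("zero has no normalized form")
    # The radix point starts len(int_part) places from the left; moving it
    # to just before the first 1 gives the exponent.
    exponent = len(int_part) - first_one
    fraction = all_bits[first_one:].rstrip("0")  # trailing zeros add no value
    return sign, fraction, exponent


sign, frac, exp = normalize_binary("+1001.11")
print(f"{sign}.{frac} x 2^{exp:+d}")  # +.100111 x 2^+4
```

Moving the radix point four places to the left multiplies by $2^{-4}$, so the exponent must be $+4$ to keep the value unchanged.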
Common Formats for Representing Floating-Point Binary Numbers
There are four common formats for representing floating-point binary numbers: the signed-magnitude method, the 2’s complement method, the excess method, and the IEEE standard method.
Floating-Point Numbers using signed-magnitude notation
[Figure: sum of a 7-term geometric series]
Floating-Point Numbers using signed-2’s complement notation
In this method, only the exponent is expressed in 2’s complement notation; the mantissa has a single sign bit.
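A sketch of packing a normalized binary number in this format. The field widths (a 6-bit exponent and an 8-bit fraction, echoing the earlier example), the field order (sign, exponent, fraction), and storing the sign bit separately from the fraction bits are all assumptions for illustration; `twos_complement` and `pack_twos` are hypothetical names.

```python
EXP_BITS = 6   # assumed exponent width
FRAC_BITS = 8  # assumed fraction (mantissa magnitude) width


def twos_complement(value: int, width: int) -> str:
    """Return `value` as a `width`-bit 2's complement bit string."""
    low, high = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {width} bits")
    # Masking with 2**width - 1 yields the 2's complement pattern of negatives.
    return format(value & ((1 << width) - 1), f"0{width}b")


def pack_twos(sign: str, fraction: str, exponent: int) -> str:
    """Pack (sign, fraction bits, exponent) as 'S EEEEEE FFFFFFFF'."""
    if len(fraction) > FRAC_BITS:
        raise ValueError("fraction has more bits than the field holds")
    sign_bit = "1" if sign == "-" else "0"           # single sign bit for the mantissa
    exp_field = twos_complement(exponent, EXP_BITS)  # exponent in 2's complement
    frac_field = fraction.ljust(FRAC_BITS, "0")      # pad the magnitude on the right
    return f"{sign_bit} {exp_field} {frac_field}"


print(pack_twos("+", "100111", 4))  # 0 000100 10011100
print(pack_twos("-", "101", -1))    # 1 111111 10100000
```

Padding the fraction with zeros on the right does not change its value, since those zeros come after the significant bits.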
Floating-Point Numbers using the excess method
In this method, only the exponent is expressed in excess notation; the mantissa has a single sign bit.
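A sketch of the same packing with an excess-coded exponent. The bias of $2^{k-1}$ for a $k$-bit exponent (excess-32 for 6 bits), the field widths, and the field order are assumptions for illustration; other texts use different biases. The names `excess_encode`, `excess_decode`, and `pack_excess` are hypothetical.

```python
EXP_BITS = 6                 # assumed exponent width
FRAC_BITS = 8                # assumed fraction (mantissa magnitude) width
BIAS = 1 << (EXP_BITS - 1)   # assumed excess-32 for a 6-bit exponent


def excess_encode(exponent: int, width: int = EXP_BITS, bias: int = BIAS) -> str:
    """Store `exponent` as the unsigned value exponent + bias."""
    stored = exponent + bias
    if not 0 <= stored < (1 << width):
        raise ValueError(f"{exponent} does not fit in excess-{bias}")
    return format(stored, f"0{width}b")


def excess_decode(field: str, bias: int = BIAS) -> int:
    """Recover the true exponent by subtracting the bias."""
    return int(field, 2) - bias


def pack_excess(sign: str, fraction: str, exponent: int) -> str:
    """Pack (sign, fraction bits, exponent) as 'S EEEEEE FFFFFFFF'."""
    if len(fraction) > FRAC_BITS:
        raise ValueError("fraction has more bits than the field holds")
    sign_bit = "1" if sign == "-" else "0"  # single sign bit for the mantissa
    return f"{sign_bit} {excess_encode(exponent)} {fraction.ljust(FRAC_BITS, '0')}"


print(pack_excess("+", "100111", 4))  # 0 100100 10011100
print(pack_excess("-", "101", -1))    # 1 011111 10100000
```

Because the bias shifts every exponent into the non-negative range, a larger true exponent always produces a larger unsigned field, so exponent fields can be compared as plain unsigned integers.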