Floating Point (a brief look) We need a way to represent –numbers with fractions, e.g., 3.1416 –very small numbers, e.g.,.000000001 –very large numbers,

Floating Point (a brief look) We need a way to represent –numbers with fractions, e.g., 3.1416 –very small numbers, e.g.,.000000001 –very large numbers, e.g., 3.15576 X 10 9 –A notation that renders numbers with a single digit to the left of the decimal point is called Scientific notation. –A number in scientific notation that has no leading zeros is called a normalized number. E.g 0.1 X 10 -8 and 10.0 X 10 -10 are not in normalized scientific notation, while 1.0 X 10 -9 is. –If we introduce decimals, we can easily create other possible representations. Each of these alternative representations is created by shifting the decimal point from its original location. Since each phase shift represents a multiplication or division of the value by the base, we can increase or decrease the exponent to compensate for the shift.

Thus to define a number in exponential or scientific notation requires the specification of four separate components. a) the sign of the number b) the magnitude of the number known as the MANTISSA c) the sign of the exponent d) the magnitude of the exponent Two additional pieces of information are required to complete the picture: e) the base of the exponent (in this case 10, for computers it is 2) f) the location of the radix point (the radix point is set at a particular location in the number, most commonly the beginning or the end of the number. Since its location never changes, it is not necessary to actually store the point. Instead the location is implied. A designer of a floating point representation must find a compromise between the size of the fraction and the size of the exponent because a fixed word size means you must take a bit from one to add a bit to the other. This trade off is between precision and range: increasing the size of the fraction enhances the precision of the fraction, while increasing the size of the exponent increases the range of numbers that can be represented.

For floating point numbers, we might assign the digits as follows: SEEMMMMM The sign digit will be used to store the sign of the mantissa. Most commonly, the mantissa is stored using sign magnitude format. A few computers use complementary notation. No specific provision for the sign of the exponent. Thus the sign of the exponent must be included within the digits of the exponent itself. The method of storing the exponent is known as the Excess N notation, where N is the chosen mid value. Say the exponent can take values from 00 to 99 (two decimal digits), and we pick a value somewhere in the middle, ’50’, and declare that value to correspond to the exponent 0, then every value lower than that will be negative and those above will be positive. What we have done is offset, or bias the value of the exponent by our chosen amount. Thus to convert from exponential format to the format used in our example, we add the offset to the exponent, and store in that form.

MIPS floating point representation Representation: –In normalized scientific notation, numbers are represented as a single non zero digit to the left of the binary point. Eg. 1.xxxxxx X 2 yyyy –Out of the 32 bits, 1 bit is reserved for the sign, 8 bits for the exponent, and 23 bits for the fraction. –Thus numbers almost as large as 2.0 X 10 38 and almost as small as 2.0 X 10 -38 can be represented. –Overflow occurs when the exponent is too large, and Underflow occurs when the negative exponent is too large to fit in the exponent field. IEEE 754 floating point standard: –single precision: 8 bit exponent, 23 bit significand –double precision: 11 bit exponent, 52 bit significand –MIPS double precision allows numbers almost as small as 2.0 X 10 -308 and almost as large as 2.0 X 10 308

IEEE 754 floating-point standard Leading “1” bit of significand is implicit. Hence the number is 24 bits long in single precision (implied 1 and a 23 bit fraction ) and 53 bits long in double precision ( 1+52). We use the term “significand” to represent the 24 or 53 bit number that is 1 plus the fraction, and the “fraction” when we mean the 23 or 52 bit number. Exponent is “biased” to make sorting easier –all 0s is smallest exponent all 1s is largest –bias of 127 for single precision and 1023 for double precision –summary: (–1) sign ´ (1+Fraction) ´ 2 exponent – bias - Refer Fig. 3.15 on Page 194 of the Text book Example: –decimal: -.75 = - ( ½ + ¼ ) –binary: -.11 = -1.1 x 2 -1 –floating point: exponent = 126 = 01111110 –IEEE single precision: 10111111010000000000000000000000

Floating point addition 1.Start 2.Compare the exponents of the two numbers; shift the smaller number to the right until its exponent matches the larger exponent 3.Add the significands 4.Normalize the sum, either shifting right and incrementing the exponent, or shifting left and decrementing the exponent. 5.Overflow or underflow, if YES, exception, if NO, step 6 6.Round the significand to the appropriate number of bits. 7.Still normalized? If Yes Done, If No, go to step 4

Floating point addition

Floating Point (a brief look) We need a way to represent –numbers with fractions, e.g., 3.1416 –very small numbers, e.g.,.000000001 –very large numbers,

Similar presentations

Presentation on theme: "Floating Point (a brief look) We need a way to represent –numbers with fractions, e.g., 3.1416 –very small numbers, e.g.,.000000001 –very large numbers,"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Floating Point (a brief look) We need a way to represent –numbers with fractions, e.g., 3.1416 –very small numbers, e.g.,.000000001 –very large numbers,

Similar presentations

Presentation on theme: "Floating Point (a brief look) We need a way to represent –numbers with fractions, e.g., 3.1416 –very small numbers, e.g.,.000000001 –very large numbers,"— Presentation transcript:

Similar presentations

About project

Feedback