CS 232: Computer Architecture II Prof. Laxmikant (Sanjay) Kale Floating point arithmetic.

Floating Point (a brief look)
We need a way to represent
–numbers with fractions, e.g., 3.1416
–very small numbers, e.g., .000000001
–very large numbers, e.g., 3.15576 × 10^9
Representation:
–sign, exponent, significand: (–1)^sign × significand × 2^exponent
–more bits for significand gives more accuracy
–more bits for exponent increases range
IEEE 754 floating point standard:
–single precision: 8 bit exponent, 23 bit significand
–double precision: 11 bit exponent, 52 bit significand
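To make the accuracy difference between the two formats concrete, here is a minimal sketch that rounds a value to single precision and back. It assumes Python's built-in float is an IEEE 754 double (true on common platforms); `to_single` is a hypothetical helper name, not part of the lecture.

```python
import struct

def to_single(x: float) -> float:
    """Round a double-precision value to the nearest single-precision value."""
    # Packing with format "f" stores the value in 32 bits; unpacking widens it again.
    return struct.unpack(">f", struct.pack(">f", x))[0]

pi_double = 3.141592653589793
pi_single = to_single(pi_double)

print(repr(pi_double))              # 52-bit significand
print(repr(pi_single))              # 23-bit significand, fewer correct digits
print(abs(pi_double - pi_single))   # error introduced by the narrower significand
```

Values whose significand fits in 23 bits, such as 0.5, survive the round trip unchanged.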

Floating point representation:
The idea is to normalize all numbers, so the significand has exactly one nonzero digit to the left of the decimal point.
–12345 = 1.2345 * 10^4
–.0000012345 = 1.2345 * 10^-6
–Do this in binary: 1.01110 * 2^(1011)
IEEE FP representation
–(+/-) 1.0101010101010101010101 * 2^(10101010)
–This is single precision
–Double precision: 64 bits in all. Where does one need accuracy of that level?
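Binary normalization can be sketched with the standard library's `math.frexp`, which returns a significand in [0.5, 1); one extra shift gives the 1.xxx form used on the slide. `normalize` is a hypothetical helper name, and the sketch assumes a nonzero, finite input.

```python
import math

def normalize(x: float):
    """Return (significand, exponent) with 1 <= |significand| < 2
    and x == significand * 2**exponent. Assumes x is nonzero and finite."""
    m, e = math.frexp(x)       # x == m * 2**e with 0.5 <= |m| < 1
    return m * 2.0, e - 1      # shift one place so the leading digit is 1

sig, exp = normalize(12345.0)
print(sig, exp)                # significand in [1, 2) and its power of two
print(sig * 2**exp)            # reconstructs the original value
```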

Floating point numbers
Representation issues:
–sign bit, exponent, significand
–Question: how to represent each field?
–Question: in which order to lay them out in a word?
–Factor: should be easy to do comparisons (for sorting)
  For arithmetic, we will have special hardware anyway
–Choice: sign + magnitude representation
  Sign bit, followed by exponent, then significand (why?)
  exponent: represented with a “bias”: add 127 (1023 for double precision)
  significand: assume an implicit leading 1 (so 00001 means 1.00001)
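One way to see why the exponent is placed ahead of the significand and stored with a bias: for non-negative numbers, comparing the 32-bit patterns as unsigned integers gives the same order as comparing the values. The sketch below checks this on a few sample values; `bits` is a hypothetical helper, and the sample list is arbitrary.

```python
import struct

def bits(x: float) -> int:
    """Return the single-precision bit pattern of x as an unsigned 32-bit integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

values = [3.5, 0.75, 1.0e-3, 1024.0, 2.0]   # all non-negative

by_value = sorted(values)              # ordinary floating point comparison
by_bits = sorted(values, key=bits)     # integer comparison of the raw words

print(by_value == by_bits)
for v in by_value:
    print(format(bits(v), "032b"), v)
```

Negative numbers need separate handling: with sign + magnitude, a larger magnitude gives a larger bit pattern, so their integer order is reversed.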

Floating point representation
So:
–(+/-) (1 + significand) x 2^(exponent - bias) is the value of a floating point number
–Example: 0 00001000 01010000000000000000000
–Example: convert -.41 to single precision form
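The value formula can be applied mechanically to the first example. The following is a minimal sketch, assuming a normalized number (exponent field neither all 0s nor all 1s) written as three space-separated bit fields; `decode_single` is a hypothetical helper name.

```python
BIAS = 127  # single precision bias

def decode_single(bit_string: str) -> float:
    """Evaluate (+/-) (1 + significand) * 2**(exponent - bias)
    for a 'sign exponent significand' bit string (normalized numbers only)."""
    sign_bit, exp_bits, frac_bits = bit_string.split()
    sign = -1.0 if sign_bit == "1" else 1.0
    exponent = int(exp_bits, 2)                          # stored (biased) exponent
    fraction = int(frac_bits, 2) / 2 ** len(frac_bits)   # bits after the implicit 1
    return sign * (1.0 + fraction) * 2.0 ** (exponent - BIAS)

value = decode_single("0 00001000 01010000000000000000000")
print(value)   # (1 + 0.3125) * 2**(8 - 127)
```

Here the exponent field is 8 and the significand field is .0101 in binary, i.e. 0.3125, so the value is 1.3125 x 2^-119.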

IEEE 754 floating-point standard
Leading “1” bit of significand is implicit
Exponent is “biased” to make sorting easier
–all 0s is smallest exponent, all 1s is largest
–bias of 127 for single precision and 1023 for double precision
–summary: (–1)^sign × (1 + significand) × 2^(exponent – bias)
Example:
–decimal: -.75 = -3/4 = -3/2^2
–binary: -.11 = -1.1 x 2^-1
–floating point: biased exponent = -1 + 127 = 126 = 01111110
–IEEE single precision: 10111111010000000000000000000000
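The -.75 example can be checked by letting the machine do the encoding. This is a small sketch using the standard `struct` module; `encode_single` is a hypothetical helper name.

```python
import struct

def encode_single(x: float) -> str:
    """Return the single-precision encoding of x as 'sign exponent significand'."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    b = format(word, "032b")
    return f"{b[0]} {b[1:9]} {b[9:]}"

encoded = encode_single(-0.75)
print(encoded)   # sign 1, exponent 01111110, significand 1 followed by 22 zeros
```

The same helper can be used to check a hand conversion of -.41: since .41 = 1.64 x 2^-2, its exponent field should be -2 + 127 = 125 = 01111101.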

Floating point addition
The problem is: the exponents of numbers being added may be different
–2.0 * 10^1 + 3.0 * 10^(-1)
–2.0 * 10^1 + .03 * 10^1: Now we can add them
–2.03 * 10^1
–But we are not necessarily done!
–E.g. 9.74 * 10^0 + 3.3 * 10^(-1)
–10.07 * 10^0 is not in the correct form!
–Shift again to get the correct form: 1.007 * 10^1
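The align, add, normalize steps can be sketched in decimal, mirroring the slide's examples. This is an illustration only: `fp_add` is a hypothetical helper, numbers are passed as (significand, exponent) pairs, and the sketch keeps every digit, whereas real hardware also rounds the result to a fixed number of significand bits.

```python
from decimal import Decimal

def fp_add(a, b):
    """Add two decimal floating point numbers given as (significand, exponent)."""
    (sa, ea), (sb, eb) = a, b
    # Step 1: align. Make (sa, ea) the operand with the larger exponent,
    # then shift the other significand right until the exponents match.
    if ea < eb:
        (sa, ea), (sb, eb) = (sb, eb), (sa, ea)
    sb = sb * Decimal(10) ** (eb - ea)
    # Step 2: add the significands.
    s, e = sa + sb, ea
    # Step 3: normalize so that exactly one nonzero digit precedes the point.
    while abs(s) >= 10:
        s, e = s / 10, e + 1
    while s != 0 and abs(s) < 1:
        s, e = s * 10, e - 1
    return s, e

first = fp_add((Decimal("2.0"), 1), (Decimal("3.0"), -1))
second = fp_add((Decimal("9.74"), 0), (Decimal("3.3"), -1))
print(first)    # 2.03 * 10^1 (no renormalization needed)
print(second)   # 10.07 * 10^0 renormalized to 1.007 * 10^1
```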

You can get different results
A + B + C = A + (B+C) = (A+B) + C
–Right? Can you see a problem? When do you lose bits?
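A small experiment shows one way the two groupings can differ. The values below are chosen for illustration and assume Python floats are IEEE 754 doubles: when 1.0 is aligned with a number as large as 10^20, all of its bits are shifted out of the significand.

```python
a = 1.0e20
b = -1.0e20
c = 1.0

left = (a + b) + c    # a + b is exactly 0, so c survives
right = a + (b + c)   # c is lost when aligned with b, so b + c == b

print(left)           # 1.0
print(right)          # 0.0
print(left == right)  # False
```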

Floating point multiplication
Add exponents, but subtract bias (each stored exponent already includes the bias, so the sum contains it twice)
Then multiply significands
Then normalize
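The three steps can be sketched directly on single-precision bit fields. This is a simplified illustration, not how any particular hardware does it: `fields` and `fp_mul` are hypothetical helpers, inputs are assumed to be normalized and nonzero, overflow and underflow are ignored, and the result is truncated rather than rounded.

```python
import struct

BIAS = 127

def fields(x: float):
    """Split x (as single precision) into sign, biased exponent,
    and 24-bit significand with the implicit leading 1 made explicit."""
    (w,) = struct.unpack(">I", struct.pack(">f", x))
    return w >> 31, (w >> 23) & 0xFF, (w & 0x7FFFFF) | 0x800000

def fp_mul(x: float, y: float) -> float:
    sx, ex, mx = fields(x)
    sy, ey, my = fields(y)
    sign = sx ^ sy
    # Add exponents, subtracting the bias that would otherwise be counted twice.
    exponent = ex + ey - BIAS
    # Multiply significands: the 48-bit product represents a value in [1, 4).
    product = mx * my
    # Normalize: if the product is 2 or more, shift right and bump the exponent.
    if product >= (1 << 47):
        product >>= 1
        exponent += 1
    # Keep the top 23 fraction bits (truncation) and drop the implicit 1.
    fraction = (product >> 23) & 0x7FFFFF
    word = (sign << 31) | (exponent << 23) | fraction
    return struct.unpack(">f", struct.pack(">I", word))[0]

print(fp_mul(1.5, -2.5))   # -3.75
print(fp_mul(3.0, 3.0))    # 9.0 (needs the normalization step)
```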
