CS 232: Computer Architecture II

Presentation on theme: "CS 232: Computer Architecture II"— Presentation transcript:

1 CS 232: Computer Architecture II
Prof. Laxmikant (Sanjay) Kale
Floating point arithmetic

2 Floating Point (a brief look)
We need a way to represent:
numbers with fractions
very small numbers
very large numbers, e.g., on the order of 10^9
Representation: sign, exponent, significand: (-1)^sign * significand * 2^exponent
more bits for the significand give more accuracy
more bits for the exponent increase the range
IEEE 754 floating point standard:
single precision: 8-bit exponent, 23-bit significand
double precision: 11-bit exponent, 52-bit significand
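The field layout above can be inspected directly. A minimal sketch (the helper name `fields` is ours, not from the slides) that unpacks a Python float into the three single-precision fields:

```python
import struct

def fields(x):
    """Unpack a float into IEEE 754 single-precision (sign, exponent, significand)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]  # 32-bit pattern
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8-bit biased exponent
    significand = bits & 0x7FFFFF    # 23-bit fraction field
    return sign, exponent, significand

print(fields(1.0))   # (0, 127, 0): 1.0 = (-1)^0 * 1.0 * 2^(127-127)
```

Note that the exponent comes back biased (127 added), as the later slides explain.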

3 Floating point representation:
The idea is to normalize all numbers, so the significand has exactly one digit to the left of the decimal point.
12345 = 1.2345 * 10^4
0.0000012345 = 1.2345 * 10^-6
Do the same in binary: a number of the form 1.xxxx * 2^(1011), with the exponent also in binary.
IEEE FP representation: (+/-) 1.xxxx * 2^(exponent), packed into 32 bits. This is single precision.
Double precision: 64 bits in all. Where does one need accuracy of that level?
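Binary normalization can be sketched with the standard library. `math.frexp` returns a mantissa in [0.5, 1), so one shift converts it to the IEEE-style 1.xxxx form (the helper name `normalize` is ours):

```python
import math

def normalize(x):
    """Return (m, e) with x = m * 2**e and 1 <= m < 2 (IEEE-style normalized)."""
    m, e = math.frexp(x)   # frexp gives 0.5 <= m < 1, x = m * 2**e
    return m * 2, e - 1    # shift one bit left of the binary point

print(normalize(12.0))   # (1.5, 3): 12 = 1.5 * 2^3
```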

4 Floating point numbers
Representation issues: sign bit, exponent, significand
Question: how to represent each field?
Question: which order to lay them out in a word?
Factor: it should be easy to do comparisons (for sorting). For arithmetic, we will have special hardware anyway.
Choice: sign + magnitude representation
Sign bit, followed by exponent, then significand (why?)
exponent: represented with a "bias": add 127 (1023 for double precision)
significand: assume an implicit leading 1 (so the stored bits xxxx mean 1.xxxx)
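The "easy comparisons" factor can be demonstrated: with sign first, then biased exponent, then significand, the bit patterns of positive floats sort in the same order as unsigned integers. A small check (the helper name `bits` is ours):

```python
import struct

def bits(x):
    """32-bit pattern of a float, as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

xs = [0.5, 1.0, 1.5, 2.0, 1e10]          # increasing positive floats
patterns = [bits(x) for x in xs]
assert patterns == sorted(patterns)       # their bit patterns increase too
print([hex(p) for p in patterns])
```

This is exactly why the exponent is biased and placed before the significand: an ordinary integer comparator can sort floats.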

5 Floating point representation
So: (+/-) (1 + significand) * 2^(exponent - bias) is the value of a floating point number.
Example: convert -0.41 to single precision form.
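The value formula can be written out directly. A minimal sketch (the helper name `value` is ours) that evaluates (+/-) (1 + significand) * 2^(exponent - bias) from raw fields:

```python
def value(sign, exponent, significand_bits, bias=127, frac_bits=23):
    """Value of a normalized IEEE float: (-1)^sign * (1 + frac) * 2^(exponent - bias)."""
    frac = significand_bits / (1 << frac_bits)   # fraction in [0, 1)
    return (-1) ** sign * (1 + frac) * 2.0 ** (exponent - bias)

print(value(1, 126, 1 << 22))   # -1.5 * 2^-1 = -0.75
```

For the -0.41 exercise, note that -0.41 (like most decimal fractions) is not exactly representable in binary, so the stored significand is a rounded approximation.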

6 IEEE 754 floating-point standard
Leading "1" bit of significand is implicit
Exponent is "biased" to make sorting easier
all 0s is the smallest exponent, all 1s is the largest
bias of 127 for single precision and 1023 for double precision
summary: (-1)^sign * (1 + significand) * 2^(exponent - bias)
Example:
decimal: -0.75 = -3/4 = -3/2^2
binary: -0.11 = -1.1 * 2^-1
floating point: exponent = -1 + 127 = 126 = 01111110
IEEE single precision: 1 01111110 10000000000000000000000
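The -0.75 example can be checked against the machine's own encoding:

```python
import struct

# Bit pattern of -0.75 in single precision
bits = struct.unpack(">I", struct.pack(">f", -0.75))[0]
print(hex(bits))   # 0xbf400000

sign = bits >> 31                 # 1 (negative)
exponent = (bits >> 23) & 0xFF    # 126 = -1 + 127 (biased)
frac = bits & 0x7FFFFF            # 100...0: the .1 of 1.1, 1 bit then 22 zeros
assert (sign, exponent, frac) == (1, 126, 1 << 22)
```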

7 Floating point addition
The problem is: the exponents of the numbers being added may be different.
2.0 * 10^1 + 3.0 * 10^(-1)
Shift the smaller operand to match the larger exponent: 3.0 * 10^(-1) = 0.03 * 10^1. Now we can add them: 2.03 * 10^1
But we are not necessarily done! E.g., a sum may come out as 10.07 * 10^0, which is not correct (normalized) form.
Shift again to get the correct form: 1.007 * 10^1
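The align-add-renormalize steps above can be sketched in decimal (the helper name `fp_add` and the operand pair producing 10.07 are our illustrations, not from the slides):

```python
def fp_add(m1, e1, m2, e2):
    """Add m1*10**e1 + m2*10**e2: align exponents, add, renormalize."""
    if e1 < e2:                    # make the first operand the larger exponent
        m1, e1, m2, e2 = m2, e2, m1, e1
    m2 /= 10 ** (e1 - e2)          # shift the smaller operand right
    m = m1 + m2
    while abs(m) >= 10:            # renormalize so 1 <= |m| < 10
        m /= 10
        e1 += 1
    return m, e1

print(fp_add(2.0, 1, 3.0, -1))   # approximately (2.03, 1)
print(fp_add(9.7, 0, 3.7, -1))   # 10.07 renormalizes to approximately (1.007, 1)
```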

8 You can get different results
A + B + C: does A + (B + C) = (A + B) + C? Right? Can you see a problem? When do you lose bits?
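The problem hinted at above: bits are lost when a small operand is shifted past the end of the significand during alignment, so floating point addition is not associative. A concrete case:

```python
a, b, c = 1e20, -1e20, 1.0
print((a + b) + c)   # 1.0
print(a + (b + c))   # 0.0: the 1.0 vanishes when aligned against -1e20
```

In the second grouping, 1.0 is far below one unit in the last place of 1e20, so b + c rounds back to -1e20 before a is ever added.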

9 Floating point multiplication
Add the exponents, but subtract the bias (each biased exponent carries the bias once, so their sum carries it twice)
Then multiply the significands
Then normalize
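These three steps can be sketched on (significand, biased exponent) pairs (the helper name `fp_mul` is ours):

```python
BIAS = 127

def fp_mul(sig1, biased_e1, sig2, biased_e2):
    """Multiply two normalized values given as (1.x significand, biased exponent)."""
    e = biased_e1 + biased_e2 - BIAS   # each operand carries the bias once: subtract one
    sig = sig1 * sig2                  # product of significands is in [1, 4)
    if sig >= 2:                       # renormalize into [1, 2)
        sig /= 2
        e += 1
    return sig, e

# (1.5 * 2^1) * (1.5 * 2^0) = 2.25 * 2^1 = 1.125 * 2^2
print(fp_mul(1.5, BIAS + 1, 1.5, BIAS))   # (1.125, 129); 129 - 127 = exponent 2
```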

10 Guard and Round bits
Not all numbers are representable, so we want to minimize rounding error.
As the machine performs the steps of the addition or multiplication algorithm, intermediate results are stored with 3 additional bits.
Consider first just 2 of them, called guard and round.
Adding those 2 bits allows us to round correctly: 1 is enough for addition, but 2 are needed for multiplication.
What to do when the result is an exact tie, ending in ...2.50? Sticky bit. And rounding modes.
Example: 3.45 * 10^0 plus a small value on the order of 10^-2, assuming only 3 digits can be kept: the guard and round digits determine whether the answer stays 3.45 or correctly becomes 3.48.
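The tie case (...2.50) is resolved by the IEEE 754 default mode, round-half-to-even, which Python's built-in `round` happens to implement, so it can serve as a quick demonstration:

```python
# Round-half-to-even ("banker's rounding"), the IEEE 754 default tie-breaking rule:
print(round(2.5))   # 2: ties go to the even neighbor
print(round(3.5))   # 4: ties go to the even neighbor
print(round(2.6))   # 3: non-ties round normally
```

Rounding ties to even avoids the systematic upward drift that always-round-up would introduce over many operations.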

