Double-Precision Floating-Point Numbers Douglas Wilhelm Harder Department of Electrical and Computer Engineering University of Waterloo Copyright © 2007.


1 Double-Precision Floating-Point Numbers Douglas Wilhelm Harder Department of Electrical and Computer Engineering University of Waterloo Copyright © 2007 by Douglas Wilhelm Harder. All rights reserved. ECE 204 Numerical Methods for Computer Engineers

2 Double-Precision Floating-Point Numbers This topic introduces binary numbers:
–requirements
–a poor means of storage
–a good means of storage

3 Double-Precision Floating-Point Numbers We will now use this same floating-point format, but we will apply it to binary numbers

4 Double-Precision Floating-Point Numbers In our example, we used six decimal digits. The double-precision floating-point format uses 64 bits (or eight bytes). Like our format, the bits are broken up into
–a leading sign bit,
–an exponent (with a bias), and
–a mantissa

5 Double-Precision Floating-Point Numbers Like our six-digit version, the bits are stored in the order:
SEEEEEEEEEEEMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
The bias is 01111111111 in binary, or 1023. An 11-bit exponent with this bias covers exponents from -1023 to 1024, which allows us to represent magnitudes in the range 2^-1023 up to just under 2^1025. The floating-point standard IEEE 754, however, reserves the lowest (all zeros) and highest (all ones) exponents, which leaves exponents -1022 to 1023 for normalized numbers
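The layout above can be checked by pulling the three fields out of an actual double. The following is a minimal sketch in Python rather than MATLAB; the helper name `fields` and the use of the standard `struct` module are assumptions of this example, not part of the slides.

```python
import struct

BIAS = 1023  # 01111111111 in binary


def fields(x: float) -> tuple[int, int, int]:
    """Split a double into (sign, biased exponent, mantissa) bit fields."""
    # Reinterpret the eight bytes of the double as one 64-bit unsigned integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                   # 1 bit:   S
    exponent = (bits >> 52) & 0x7FF     # 11 bits: EEEEEEEEEEE
    mantissa = bits & ((1 << 52) - 1)   # 52 bits: MMMM...M
    return sign, exponent, mantissa


# -6.5 is -1.101 (binary) times 2^2
sign, exponent, mantissa = fields(-6.5)
print(sign, exponent - BIAS, format(mantissa, "052b"))
```

Subtracting `BIAS` from the stored exponent recovers the true power of two, here 2.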

6 Double-Precision Floating-Point Numbers Recall that the leading digit in a floating-point representation must be non-zero; in binary, that digit must therefore be 1. Since it is always 1, we do not store it, and the mantissa actually represents
1.MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
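To see the implicit leading 1 at work, the value of a normalized double can be rebuilt from its fields as ±1.M × 2^(E − 1023). This is a hedged Python sketch: the function name `decode` is hypothetical, exact rational arithmetic via `fractions.Fraction` is a choice made for this example, and the reserved all-zeros and all-ones exponents are deliberately rejected rather than handled.

```python
import struct
from fractions import Fraction


def decode(x: float) -> Fraction:
    """Rebuild the exact value of a normalized double from its bit fields."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    mantissa = bits & ((1 << 52) - 1)
    if exponent in (0, 0x7FF):
        raise ValueError("reserved exponent: not a normalized number")
    # Put the unstored leading 1 back in front of the 52 stored bits.
    significand = (1 << 52) | mantissa            # the 53-bit integer 1MMMM...M
    value = Fraction(significand, 1 << 52) * Fraction(2) ** (exponent - 1023)
    return -value if sign else value


print(decode(-6.5))   # exact rational value of the stored number
print(decode(0.1))    # the nearest double to 0.1, not 1/10 itself
```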

7 Double-Precision Floating-Point Numbers Rather than printing out a lot of 1s and 0s, we will use hexadecimal numbers:
hex  decimal  binary
0    0        0000
1    1        0001
2    2        0010
3    3        0011
4    4        0100
5    5        0101
6    6        0110
7    7        0111
8    8        1000
9    9        1001
a    10       1010
b    11       1011
c    12       1100
d    13       1101
e    14       1110
f    15       1111

8 Double-Precision Floating-Point Numbers To convert a binary number into hexadecimal, group the bits into groups of four (starting at the radix point, if there is one) and replace each group with the corresponding hexadecimal digit. To convert from hexadecimal to binary, replace each hexadecimal digit with its four-bit equivalent (including leading zeros)
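The grouping rule can be sketched in a few lines of Python. The function names `bin_to_hex` and `hex_to_bin` are hypothetical, and the sketch assumes whole numbers only (no radix point), so the groups of four are formed from the right by padding with zeros on the left.

```python
def bin_to_hex(bits: str) -> str:
    """Convert a string of 0s and 1s to hexadecimal, four bits per digit."""
    # Pad on the left so the length is a multiple of four.
    width = -(-len(bits) // 4) * 4
    bits = bits.zfill(width)
    groups = (bits[i:i + 4] for i in range(0, width, 4))
    return "".join(format(int(group, 2), "x") for group in groups)


def hex_to_bin(digits: str) -> str:
    """Replace each hexadecimal digit with its four-bit equivalent."""
    return "".join(format(int(digit, 16), "04b") for digit in digits)


print(bin_to_hex("001111111111"))
print(hex_to_bin("c00"))
```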

9 Double-Precision Floating-Point Numbers Some of the more common numbers are:
>> format hex
>> 1
ans = 3ff0000000000000
>> 2
ans = 4000000000000000
>> -1
ans = bff0000000000000
>> -2
ans = c000000000000000
Recall that 3ff_16 = 001111111111_2: a sign bit of 0 followed by the 11 exponent bits 01111111111, which is our bias
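The same hexadecimal patterns can be reproduced outside MATLAB. This is a small Python sketch; the helper name `to_hex` is an assumption of this example, and it relies on the standard `struct` module to pack the double in big-endian byte order so the sign bit comes first.

```python
import struct


def to_hex(x: float) -> str:
    """Return the 16 hexadecimal digits of a double, sign bit first."""
    return struct.pack(">d", x).hex()


for x in (1.0, 2.0, -1.0, -2.0):
    print(x, to_hex(x))
```

The first three hexadecimal digits hold the sign bit and the 11 exponent bits; the remaining thirteen hold the 52 mantissa bits.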

10 Double-Precision Floating-Point Numbers Some operations are quite straightforward:
–multiplication by 2 adds 1 to the exponent and leaves the mantissa unchanged
–division by 2 subtracts 1 from the exponent and leaves the mantissa unchanged
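A quick Python check of both claims, offered as a sketch: the helper `fields` is hypothetical, and the test values are assumed to be normalized numbers far from the largest and smallest exponents, where the rule would no longer hold because of the reserved exponents.

```python
import struct


def fields(x: float) -> tuple[int, int, int]:
    """Split a double into (sign, biased exponent, mantissa) bit fields."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)


for x in (1.0, 0.1, -6.5):
    sign, exponent, mantissa = fields(x)
    # Doubling: exponent + 1, same sign and mantissa.
    print(x, fields(2 * x) == (sign, exponent + 1, mantissa))
    # Halving: exponent - 1, same sign and mantissa.
    print(x, fields(x / 2) == (sign, exponent - 1, mantissa))
```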

11 Double-Precision Floating-Point Numbers Rounding rules are simplified. Given a binary number which has more than 53 bits of precision, to round it to a 53-bit number:
–if the 54th bit is 0, then truncate (round down),
–if the bits after the 53rd bit are exactly 1000···, then round up if the 53rd bit is 1, otherwise truncate, and
–otherwise, round up (add 1 to the 53rd bit)
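The three rules can be written out directly for a positive integer with more than 53 significant bits. This is a minimal Python sketch under that assumption; the function name `round_to_53_bits` is hypothetical. The result is compared with Python's own `float()` conversion, which rounds integers to the nearest double with ties going to the even mantissa.

```python
def round_to_53_bits(n: int) -> int:
    """Round a positive integer to 53 significant bits using the three rules."""
    extra = n.bit_length() - 53          # number of bits after the 53rd
    if extra <= 0:
        return n                         # already fits in 53 bits
    kept = n >> extra                    # the first 53 bits
    tail = n & ((1 << extra) - 1)        # everything after the 53rd bit
    half = 1 << (extra - 1)              # the pattern 1000...
    if tail < half:
        pass                             # 54th bit is 0: truncate
    elif tail == half:
        kept += kept & 1                 # tie: round up only if the 53rd bit is 1
    else:
        kept += 1                        # otherwise: round up
    return kept << extra


for n in (2**53 + 1, 2**53 + 3, 2**60 + 2**7 + 1):
    print(n, round_to_53_bits(n), round_to_53_bits(n) == int(float(n)))
```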

12 Double-Precision Floating-Point Numbers Remember, we deal with 53 bits because we store 52 bits together with the implicit leading 1

13 Usage Notes These slides are made publicly available on the web for anyone to use. If you choose to use them, or a part thereof, for a course at another institution, I ask only three things:
–that you inform me that you are using the slides,
–that you acknowledge my work, and
–that you alert me of any mistakes which I made or changes which you make, and allow me the option of incorporating such changes (with an acknowledgment) in my set of slides
Sincerely, Douglas Wilhelm Harder, MMath dwharder@alumni.uwaterloo.ca

