IT11004: Data Representation and Organization

Floating Point Representation

Normalized form
A real number is called normalized if it is written in the form d0.d1d2d3… × 10^n, where n is an integer, d0, d1, d2, d3, … are the digits of the number in base 10, and d0 is not zero.
As examples, the number … in normalized form is … × 10^2, and the number … in normalized form is … × 10^-3.
Clearly, any non-zero real number can be normalized.
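The definition above can be tried out with a minimal Python sketch. The function name normalize and the sample inputs 4567.8 and 0.025 are illustrative assumptions, not values from the slides; the sketch relies on the standard decimal module to avoid binary rounding.

```python
from decimal import Decimal

def normalize(text):
    """Return (mantissa, n) such that value = mantissa x 10**n and d0 != 0."""
    value = Decimal(text)
    if value == 0:
        raise ValueError("zero has no normalized form")
    n = value.adjusted()          # power of ten of the leading digit d0
    mantissa = value.scaleb(-n)   # move the decimal point right after d0
    return mantissa, n

# Illustrative inputs (assumed, not taken from the slides).
print(normalize("4567.8"))
print(normalize("0.025"))
```

Working on the decimal string rather than on a float keeps the digits d0, d1, d2, … exactly as written.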

Floating Point Precisions
Layout: s | exp | frac
Encoding
  MSB is the sign bit s
  exp field encodes E
  frac field encodes M
Sizes
  Single precision: 8 exp bits, 23 frac bits (32 bits total)
  Double precision: 11 exp bits, 52 frac bits (64 bits total)
  Extended precision: 15 exp bits, 63 frac bits; only found in Intel-compatible machines; stored in 80 bits, 1 bit wasted
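As a rough illustration of the single-precision layout, the sketch below splits a value into its s, exp and frac fields. The helper name binary32_fields and the sample value -6.5 are assumptions made for the example; the packing itself uses the standard struct module.

```python
import struct

def binary32_fields(x):
    """Return (s, exp, frac) for x stored as a 32-bit single-precision value."""
    # Pack as a big-endian float32, then reread the same 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = bits >> 31               # 1 sign bit (MSB)
    exp = (bits >> 23) & 0xFF    # 8 exponent bits
    frac = bits & 0x7FFFFF       # 23 fraction bits
    return s, exp, frac

s, exp, frac = binary32_fields(-6.5)   # -6.5 is an illustrative value
print(s, format(exp, "08b"), format(frac, "023b"))
```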

Single-precision floating-point format (binary32)
A computer number format that occupies 4 bytes (32 bits) in computer memory and represents a wide dynamic range of values by using a floating radix point.
One of the first programming languages to provide single- and double-precision floating-point data types was Fortran.
Single-precision binary floating point is used because it offers a wider range than fixed point of the same bit width.
Single precision is known as float in C, C++, C#, Java[1], and Haskell, and as single in Pascal, Visual Basic, and MATLAB.

IEEE single-precision binary floating-point format: binary32

Convert a base 10 real number into binary32 format
Consider a real number with an integer part and a fraction part, such as 12.375.
Convert the integer part into binary.
Convert the fraction part using the following technique.
Add the two results and adjust them to produce a proper final conversion.
Conversion of the fractional part (multiply by 2 repeatedly and record the integer part of each product):
0.375 × 2 = 0.750 = 0 + 0.750, bit 0
0.750 × 2 = 1.500 = 1 + 0.500, bit 1
0.500 × 2 = 1.000 = 1 + 0.000, bit 1; fraction = 0.000, terminate
(0.375)_10 can be exactly represented in binary as (0.011)_2.
Therefore (12.375)_10 = (12)_10 + (0.375)_10 = (1100)_2 + (0.011)_2 = (1100.011)_2.
In normalized form, (12.375)_10 = (1.100011)_2 × 2^3.
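The repeated multiplication by 2 can be sketched in Python as follows. The function name fraction_to_binary and the max_bits cutoff are assumptions added for the example (the cutoff guards against fractions such as 0.1 that never terminate in binary); exact arithmetic comes from the standard fractions module.

```python
from fractions import Fraction

def fraction_to_binary(frac_text, max_bits=32):
    """Convert a decimal fraction in [0, 1) to its binary digits as a string."""
    frac = Fraction(frac_text)      # exact value, no float rounding
    bits = []
    while frac != 0 and len(bits) < max_bits:
        frac *= 2
        bit = int(frac)             # integer part is the next binary digit
        bits.append(str(bit))
        frac -= bit                 # keep only the fractional part
    return "".join(bits)

int_part = format(12, "b")                  # integer part of 12.375
frac_part = fraction_to_binary("0.375")     # fraction part of 12.375
print(f"{int_part}.{frac_part}")            # prints 1100.011
```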

Convert a base 10 real number into binary32 format…
In normalized form, (12.375)_10 = (1.100011)_2 × 2^3.
From which we deduce:
The sign bit is 0 (the number is positive).
The exponent is 3 (and in the biased form it is therefore 127 + 3 = 130 = (1000 0010)_2).
The fraction is 100011 (looking to the right of the binary point), padded with zeros to 23 bits.
From these we can form the resulting 32-bit IEEE 754 binary32 format representation of 12.375 as:
0 10000010 10001100000000000000000 = 41460000H
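To check the hand conversion, the three fields can be assembled into a 32-bit pattern and compared with what the machine stores. This is only a sketch: assemble_binary32 is a hypothetical helper name, and it handles normalized numbers only.

```python
import struct

def assemble_binary32(sign, exponent, fraction_bits):
    """Build the 32-bit pattern from sign, unbiased exponent and fraction bit string."""
    biased = exponent + 127                        # binary32 exponent bias
    frac = int(fraction_bits.ljust(23, "0"), 2)    # pad the fraction to 23 bits
    return (sign << 31) | (biased << 23) | frac

bits = assemble_binary32(0, 3, "100011")           # fields deduced for 12.375
print(f"{bits:08x}")                               # prints 41460000

# Cross-check against the float32 encoding produced by struct.
reference = struct.unpack(">I", struct.pack(">f", 12.375))[0]
print(bits == reference)                           # prints True
```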

Ex 1
Consider the value 0.25.
0.25 × 2 = 0.5 (bit 0)
0.5 × 2 = 1.0 (bit 1)
so (0.25)_10 = (0.01)_2.
We can see that (0.25)_10 = (1.0)_2 × 2^-2.
From which we deduce:
The exponent is −2 (and in the biased form it is 127 + (−2) = 125 = (0111 1101)_2).
The fraction is 0 (everything to the right of the binary point in 1.0 is zero).
From these we can form the resulting 32-bit IEEE 754 binary32 format representation of the real number 0.25 as:
0 01111101 00000000000000000000000 = 3e800000H
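Going the other way is a useful sanity check: a hex pattern can be decoded back to its value with value = (−1)^s × (1 + frac / 2^23) × 2^(exp − 127). The sketch below assumes a normalized number; zero, subnormals, infinities and NaN are not covered in these slides and are rejected. The name decode_binary32 is hypothetical.

```python
def decode_binary32(hex_text):
    """Decode a normalized binary32 bit pattern given as 8 hex digits."""
    bits = int(hex_text, 16)
    s = bits >> 31
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp == 0 or exp == 0xFF:
        raise ValueError("only normalized values are handled in this sketch")
    significand = 1 + frac / 2**23      # restore the implicit leading 1
    return (-1) ** s * significand * 2.0 ** (exp - 127)

print(decode_binary32("3e800000"))      # prints 0.25
```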

Ex 2 Convert into binary32 floating-point format

Double-precision floating-point format (binary64)
A computer number format that occupies two adjacent storage locations in computer memory (64 bits in total for binary64).
A double-precision number, sometimes simply called a double, may be defined to be an integer, fixed point, or floating point.
binary64 has:
Sign bit: 1 bit
Exponent width: 11 bits
Significand precision: 53 bits (52 explicitly stored)
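The binary64 field widths can be explored the same way. In the sketch below, binary64_fields is a hypothetical helper; the exponent bias of 1023 is not stated in the slides but is the standard IEEE 754 value for binary64. The last lines illustrate the 53-bit significand limit: 2^53 + 1 needs 54 significant bits and cannot be stored exactly.

```python
import struct

def binary64_fields(x):
    """Return (sign, exp, frac) for x stored as a 64-bit double."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                   # 1 sign bit
    exp = (bits >> 52) & 0x7FF          # 11 exponent bits, bias 1023
    frac = bits & ((1 << 52) - 1)       # 52 explicitly stored fraction bits
    return sign, exp, frac

print(binary64_fields(0.25))            # prints (0, 1021, 0), since 1023 + (-2) = 1021

# 2**53 + 1 is rounded to 2**53 when converted to a double.
print(float(2**53 + 1) == float(2**53)) # prints True
```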
