1/8/2007 - L24 IEEE Floating Point Basics, Copyright 2006 - Joanne DeGroat, ECE, OSU

IEEE Floating Point
The IEEE Floating Point Standard and execution.

The floating point standard: Single Precision
- The value v of the bits stored in the representation (sign s, biased exponent e, fraction f) is:
  - If e = 255 and f ≠ 0, then v is NaN regardless of s
  - If e = 255 and f = 0, then \( v = (-1)^{s} \, \infty \)
  - If 0 < e < 255, then \( v = (-1)^{s} \, 2^{e-127} \, (1.f) \): a normalized number
  - If e = 0 and f ≠ 0, then \( v = (-1)^{s} \, 2^{-126} \, (0.f) \): denormalized numbers, which allow for graceful underflow
  - If e = 0 and f = 0, then \( v = (-1)^{s} \, 0 \) (zero)

The floating point standard: Double Precision
- The value v of the bits in the word representation (sign s, biased exponent e, fraction f) is:
  - If e = 2047 and f ≠ 0, then v is NaN regardless of s
  - If e = 2047 and f = 0, then \( v = (-1)^{s} \, \infty \)
  - If 0 < e < 2047, then \( v = (-1)^{s} \, 2^{e-1023} \, (1.f) \): a normalized number
  - If e = 0 and f ≠ 0, then \( v = (-1)^{s} \, 2^{-1022} \, (0.f) \): denormalized numbers, which allow for graceful underflow
  - If e = 0 and f = 0, then \( v = (-1)^{s} \, 0 \) (zero)
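The single and double precision rules differ only in field widths and bias, so both can be sketched as one Python decoder. This is an illustrative sketch, not part of the original slides: `decode` is a hypothetical helper name, and it assumes the pattern is passed as an unsigned integer with the sign bit in the top position.

```python
import math

# Sketch: `decode` is a hypothetical helper, not from the slides.
def decode(bits: int, exp_bits: int, frac_bits: int) -> float:
    """Value of an IEEE 754 bit pattern, following the case rules above.

    Use exp_bits=8, frac_bits=23 for single precision and
    exp_bits=11, frac_bits=52 for double precision.
    """
    bias = (1 << (exp_bits - 1)) - 1          # 127 or 1023
    e_max = (1 << exp_bits) - 1               # 255 or 2047
    s = (bits >> (exp_bits + frac_bits)) & 1
    e = (bits >> frac_bits) & e_max
    f = bits & ((1 << frac_bits) - 1)
    sign = -1.0 if s else 1.0
    if e == e_max:                            # all-ones exponent: NaN or infinity
        return math.nan if f else sign * math.inf
    if e == 0:                                # denormalized or zero: 2^(1-bias) * 0.f
        return sign * math.ldexp(f, 1 - bias - frac_bits)
    # normalized: 2^(e-bias) * 1.f, restoring the hidden leading 1
    return sign * math.ldexp((1 << frac_bits) | f, e - bias - frac_bits)
```

For single precision, `decode(0x42C80000, 8, 23)` gives 100.0; for double precision, `decode(0x3FF0000000000000, 11, 52)` gives 1.0.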

The floating point standard: Notes on single and double precision
- The leading 1 of the fractional part is not stored for normalized numbers
  - For the mantissa 1.001001001001001…, only the bits after the binary point are stored
- The representation allows for +0 and -0, indicating the direction from which 0 was reached (this can matter when a result was rounded to 0)
- Denormalized numbers allow graceful underflow towards 0
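These three notes can be observed directly with Python's standard `struct` module, which packs a value as an IEEE single (format `'>f'`) and lets the same four bytes be read back as an unsigned integer (`'>I'`). The helper names `bits32` and `from_bits32` are hypothetical, chosen for this sketch.

```python
import math
import struct

# Sketch: hypothetical helpers for viewing single-precision bit patterns.
def bits32(x: float) -> int:
    """Single-precision bit pattern of x as an unsigned 32-bit integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def from_bits32(b: int) -> float:
    """Inverse of bits32: interpret a 32-bit pattern as a single."""
    return struct.unpack(">f", struct.pack(">I", b))[0]

# Hidden bit: 1.5 = 1.1b, but only the bits after the binary point are stored.
print(hex(bits32(1.5) & 0x7FFFFF))                         # 0x400000

# +0 and -0 compare equal, yet differ in the stored sign bit.
print(hex(bits32(0.0)), hex(bits32(-0.0)), 0.0 == -0.0)   # 0x0 0x80000000 True
print(math.copysign(1.0, -0.0))                            # -1.0

# Graceful underflow: below the smallest normalized single (2^-126),
# denormalized values continue down to 2^-149 instead of jumping to 0.
smallest_normal = from_bits32(0x00800000)
smallest_denorm = from_bits32(0x00000001)
print(smallest_normal == 2.0 ** -126, smallest_denorm == 2.0 ** -149)  # True True
```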

Conversion Examples
- Converting from base 10 to the representation
- Single precision example
  - Convert \( 100_{10} \)
  - Step 1: convert to binary: 0110 0100
  - In the binary form 1.xxx we have \( 0110\,0100 = 1.100100 \times 2^{6} \)

Conversion Example Continued
- \( 1.1001 \times 2^{6} \) is binary for 100
  - Thus the exponent is 6
  - The biased exponent will be 6 + 127 = 133 = 1000 0101
  - The sign will be 0 for positive
  - The stored fractional part f will be 1001 (followed by zeros)
- Thus we have s | e | f = 0 | 1000 0101 | 1001 000…, which groups into nibbles as
    0100 0010 1100 1000 0000 ….
    4    2    C    8    0    0 0 0 in hexadecimal
- $42C8 0000 is the representation for 100
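The hand steps above (write the integer in binary, take the exponent from the position of the leading 1, add the bias of 127, and keep the bits after the leading 1 as f) can be sketched in Python as follows. `encode_int` is a hypothetical helper, and the sketch only covers nonzero integers small enough (magnitude below 2^24) that no rounding is needed.

```python
# Sketch: hypothetical helper mirroring the pencil-and-paper steps.
def encode_int(n: int) -> int:
    """Single-precision pattern for a nonzero integer with |n| < 2**24,
    following the hand steps: binary, normalize to 1.xxx, bias, pack."""
    assert 0 < abs(n) < (1 << 24), "sketch covers exact 24-bit magnitudes only"
    s = 1 if n < 0 else 0
    binary = bin(abs(n))[2:]              # step 1: 100 -> '1100100'
    exp = len(binary) - 1                 # 1.100100 x 2^6 -> exponent 6
    frac = binary[1:]                     # drop the hidden leading 1
    f = int(frac.ljust(23, "0"), 2)       # left-justify into the 23-bit field
    return (s << 31) | ((exp + 127) << 23) | f

print(f"{encode_int(100):08X}")          # 42C80000
```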

Another example
- Representation for -175 (sign-magnitude representation)
  - 175 = 128 + 32 + 8 + 4 + 2 + 1 = 1010 1111, or \( 1.0101111 \times 2^{7} \)
  - S = 1
  - Exponent is 7 + 127 = 134 = 1000 0110
  - Fractional part f = 0101111
  - Representation: 1100 0011 0010 1111 0000 ….
  - Or in hex $C32F 0000
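Because the format is sign-magnitude, -175 and 175 share the same exponent and fraction fields and differ only in the top bit. A short check with Python's standard `struct` module (the `bits32` helper name is hypothetical):

```python
import struct

# Sketch: hypothetical helper returning the single-precision bit pattern.
def bits32(x: float) -> int:
    """Single-precision bit pattern of x as an unsigned 32-bit integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

pos, neg = bits32(175.0), bits32(-175.0)
print(f"{pos:08X} {neg:08X}")      # 432F0000 C32F0000
print(f"{pos ^ neg:08X}")          # 80000000: only the sign bit differs
```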

A fractional example
- Decimal value 0.25
  - Convert to binary: 0.0100
  - In power form: \( 1.0000 \times 2^{-2} \)
  - Sign is +, so 0
  - Exponent is -2 + 127 = 125 = 0111 1101
  - Fractional part is 00000…
  - Representation is s | e | f = 0 | 0111 1101 | 000 0000 …, or in nibbles 0011 1110 1000 0000 …
  - And in hex $3E80 0000
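For fractions there is no leading 1 to the left of the binary point to count from, but Python's `math.frexp` performs the same normalize-and-read-exponent split directly. It returns a mantissa in [0.5, 1), so the sketch below doubles it to obtain the 1.xxx form used on the slides. `encode_normal` is a hypothetical name; the sketch assumes x is exactly representable as a normalized single and does no rounding.

```python
import math

# Sketch: hypothetical helper; no rounding, zeros, or denormals.
def encode_normal(x: float) -> int:
    """Single-precision pattern for a nonzero x that is exactly representable
    as a normalized single."""
    assert x != 0 and math.isfinite(x), "sketch covers nonzero finite values"
    s = 1 if x < 0 else 0
    m, e = math.frexp(abs(x))     # abs(x) = m * 2**e with 0.5 <= m < 1
    m, e = m * 2, e - 1           # rewrite as 1.xxx * 2**e
    f = (m - 1) * (1 << 23)       # the 23 stored fraction bits, as a number
    assert f == int(f) and 1 <= e + 127 <= 254, "not exactly representable"
    return (s << 31) | ((e + 127) << 23) | int(f)

print(f"{encode_normal(0.25):08X}")   # 3E800000
```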

Converting back
- Convert $C32F 0000 into decimal
- Extract the components from 1100 0011 0010 1111
  - S = 1
  - Exponent = 1000 0110 = 128 + 4 + 2 = 134; unbiased, 134 - 127 = 7
  - f = 0101111, so the mantissa is 1.0101111
- Adjust the mantissa by the exponent (move the binary point 7 places): 1010 1111
  - Or 128 + 32 + 15 = 175
- The sign is negative, so -175

Another example
- Convert $41C8 0000 to decimal: 0100 0001 1100 1000 0000 ….
  - S is 0, so the number is positive
  - Exponent 1000 0011 = 128 + 3 = 131, and 131 - 127 = 4
  - f = 1001, so the mantissa is 1.1001
- Moving the binary point 4 positions gives 11001 as the final number in binary, which is 25
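Both conversions back follow one recipe: restore the hidden 1, then move the binary point by the unbiased exponent. Using Python's `fractions.Fraction` keeps the arithmetic exact, so the result mirrors the pencil-and-paper method. `to_fraction` is a hypothetical helper that only handles normalized single-precision patterns.

```python
from fractions import Fraction

# Sketch: hypothetical helper; normalized single-precision patterns only.
def to_fraction(bits: int) -> Fraction:
    """Exact value of a normalized single-precision bit pattern."""
    s = bits >> 31
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    assert 0 < e < 255, "sketch covers normalized numbers only"
    mantissa = (1 << 23) | f          # 1.f scaled by 2^23 (hidden 1 restored)
    # Moving the binary point: scale by 2^(e - 127), undoing the 2^23 scaling.
    value = Fraction(mantissa) * Fraction(2) ** (e - 127 - 23)
    return -value if s else value

print(to_fraction(0xC32F0000), to_fraction(0x41C80000))   # -175 25
```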

Arithmetic with floating point numbers
- Add op1 $42C8 0000 and op2 $41C8 0000
- First divide them into their component parts
  - Op1 $42C8 0000 = 0100 0010 1100 1000 0000 ….
    - S = 0
    - E = 1000 0101 = 133; 133 - 127 = 6
    - M op1 = 1.10010000…
  - Op2 $41C8 0000 = 0100 0001 1100 1000 0000 ….
    - S = 0
    - E = 1000 0011 = 131; 131 - 127 = 4
    - M op2 = 1.10010000…

Now add the mantissas
- But first align the mantissas
  - Op1 1.1001000….
  - Op2 1.1001000….
- Which is the smaller number and needs to be aligned? Op2, which has the smaller exponent
- The exponent difference between op1 and op2 is 2
- So shift op2 right by 2 binary places: op2 becomes 0.0110010000…

Add
- Add the op1 mantissa and the aligned op2 mantissa:
      1.1001000000…
    + 0.0110010000…
    = 1.1111010000
- The result exponent is 6
- The value is 1111101, or 64 + 32 + 16 + 8 + 4 + 1 = 125
- The values added were 100 and 25

Constructing Result Value
- Sign 0
- Exponent 6: E = 1000 0101 = 133; 133 - 127 = 6
- Mantissa of result 1.1111010000
- Fractional part 1111010000….
- Constructed value s | e | f = 0 | 1000 0101 | 111 1010 0000 …, or in nibbles
    0100 0010 1111 1010 0000 0000 0000 0000
    4    2    F    A    0    0    0    0     = $42FA 0000 (125)

Floating point representation of 125
- Positive, so s is 0
- Exponent is 6 + 127 = 133 = 1000 0101
- Fractional part from the mantissa 1.111101, i.e. 111101
- Constructed value: 0 | 1000 0101 | 111101 00000000000000000 = $42FA 0000
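The add walk-through (split into fields, align the smaller operand by the exponent difference, add the mantissas, renormalize, and repack) can be sketched on raw 32-bit patterns with integer operations. `fp_add_positive` is a hypothetical name, and the sketch makes the simplifying assumptions stated in its docstring: both operands are positive and normalized, and bits shifted out during alignment are truncated rather than rounded. A complete unit must also handle signs, rounding, NaNs, infinities, and denormalized numbers.

```python
# Sketch: hypothetical helper; positive normalized operands, truncation only.
def fp_add_positive(a: int, b: int) -> int:
    """Add two positive, normalized single-precision bit patterns.

    Bits shifted out during alignment are dropped (truncation), so the result
    is exact only when nothing is lost, as in 100 + 25.
    """
    def split(x: int) -> tuple:
        e = (x >> 23) & 0xFF
        m = (1 << 23) | (x & 0x7FFFFF)   # 24-bit mantissa 1.f
        return e, m

    ea, ma = split(a)
    eb, mb = split(b)
    if ea < eb:                          # make operand a the one with the larger exponent
        ea, ma, eb, mb = eb, mb, ea, ma
    mb >>= ea - eb                       # align: shift the smaller mantissa right
    m, e = ma + mb, ea
    if m >> 24:                          # carry out (10.xxx): renormalize
        m >>= 1
        e += 1
    assert e < 255, "overflow to infinity is not handled in this sketch"
    return (e << 23) | (m & 0x7FFFFF)    # drop the hidden 1 again

print(f"{fp_add_positive(0x42C80000, 0x41C80000):08X}")   # 42FA0000
```

For the slide's operands the function returns `0x42FA0000`, the pattern for 125.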

Multiplication example
- Multiply op1 $42C8 0000 & op2 $41C8 0000
- First divide them into their component parts
  - Op1 $42C8 0000 = 0100 0010 1100 1000 0000 ….
    - S = 0
    - E = 1000 0101 = 133; 133 - 127 = 6
    - M op1 = 1.10010000…
  - Op2 $41C8 0000 = 0100 0001 1100 1000 0000 ….
    - S = 0
    - E = 1000 0011 = 131; 131 - 127 = 4
    - M op2 = 1.10010000…

Multiplication basics
- Base 10 example: \( 3 \times 10^{2} \times 1.1 \times 10^{2} = 3.3 \times 10^{4} \)
- Have 2 numbers \( A \times 2^{e_a} \) and \( B \times 2^{e_b} \)
- Multiply and get result \( = (A \cdot B) \times 2^{e_a + e_b} \)
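The rule (multiply the mantissas, add the exponents) can be checked with Python's `math.frexp`, which splits a float into a mantissa and a power-of-two exponent, and `math.ldexp`, which recombines them. `mul_by_parts` is a hypothetical helper; `frexp` normalizes mantissas to [0.5, 1) rather than [1, 2), which does not change the rule.

```python
import math

# Sketch: hypothetical helper illustrating A*2^ea * B*2^eb = (A*B)*2^(ea+eb).
def mul_by_parts(x: float, y: float) -> float:
    a, ea = math.frexp(x)        # x = a * 2**ea
    b, eb = math.frexp(y)        # y = b * 2**eb
    return math.ldexp(a * b, ea + eb)   # multiply mantissas, add exponents

print(mul_by_parts(100.0, 25.0))   # 2500.0
```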

So here
- The sign of both is +, so the result is +
- Exponent addition
  - Both exponents are biased as stored
  - If you add the stored binary exponents, you need to subtract the extra bias of 127
  - Or, using pencil and paper (or PowerPoint), you can just add the unbiased exponent of one operand to the biased exponent of the other
  - Here we have 133 + 4 = 137 = 1000 1001
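The bias bookkeeping is small enough to state as code. `product_exponent` and the `BIAS` constant are hypothetical names for this sketch, which assumes both operands are normalized singles.

```python
BIAS = 127   # single-precision exponent bias

# Sketch: hypothetical helper for the stored exponent of a product.
def product_exponent(stored_a: int, stored_b: int) -> int:
    """Each stored exponent is (true exponent + BIAS), so their sum carries
    the bias twice; subtract one BIAS to get a correctly biased result."""
    return stored_a + stored_b - BIAS

# 100 has stored exponent 133 (true 6); 25 has 131 (true 4).
print(product_exponent(133, 131))   # 137, same as 133 + 4 in the hand method
```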

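The mantissas
- Do a binary multiplication of 1.1001 × 1.1001, treating the mantissas as the integers 11001 × 11001
  - The multiplier 11001 has 1s in bit positions 4, 3, and 0, so add the correspondingly shifted copies of 11001:
             11001
          11001000
      +  110010000
      = 1001110001
- Adjusting for the binary point (4 + 4 = 8 fraction bits) gives 10.01110001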
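The mantissa product can be reproduced with ordinary integer multiplication by dropping the binary point and counting fraction bits afterwards. A minimal Python sketch (the variable names are illustrative):

```python
m1 = 0b11001          # 1.1001 with the binary point removed (4 fraction bits)
m2 = 0b11001
product = m1 * m2
print(bin(product))   # 0b1001110001

# 4 + 4 = 8 fraction bits, so split off the low 8 bits: 10.01110001
whole, frac = divmod(product, 1 << 8)
print(bin(whole), format(frac, "08b"))   # 0b10 01110001
```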

Final result
- The exponent is 137 (stored), or 10 unbiased
- The mantissa is 10.01110001
- Adjusted for the exponent (move the binary point 10 places): 1001 1100 0100
- The value is 2048 + 256 + 128 + 64 + 4
  - Or 2304 + 128 + 68 = 2432 + 68 = 2500
- And we were multiplying 100 * 25
- To store the result, renormalize: \( 10.01110001 \times 2^{10} = 1.001110001 \times 2^{11} \), so the stored exponent becomes 138
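Putting the multiplication steps together on raw 32-bit patterns gives the sketch below: add the stored exponents and subtract one bias, multiply the 24-bit mantissas as integers, and renormalize when the product has the form 1x.xxx. `fp_mul_positive` is a hypothetical name; it assumes positive, normalized operands and truncates the low product bits instead of rounding, so it is exact only when no bits are lost, as in 100 × 25.

```python
# Sketch: hypothetical helper; positive normalized operands, truncation only.
def fp_mul_positive(a: int, b: int) -> int:
    """Multiply two positive, normalized single-precision bit patterns.
    No signs, rounding, NaN, infinity, overflow, or denormal results."""
    ea, eb = (a >> 23) & 0xFF, (b >> 23) & 0xFF
    ma = (1 << 23) | (a & 0x7FFFFF)   # 1.f as a 24-bit integer
    mb = (1 << 23) | (b & 0x7FFFFF)
    e = ea + eb - 127                 # remove the extra bias
    m = ma * mb                       # 1.x * 1.x lies in [1, 4): 46 fraction bits
    if m >> 47:                       # product is 1x.xxx: renormalize
        m >>= 1
        e += 1
    m >>= 23                          # keep the leading 1 plus 23 fraction bits
    assert 0 < e < 255, "exponent out of normalized range in this sketch"
    return (e << 23) | (m & 0x7FFFFF)

print(f"{fp_mul_positive(0x42C80000, 0x41C80000):08X}")   # 451C4000
```

The result `0x451C4000` is the single-precision pattern for 2500, with stored exponent 138 after renormalizing.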

Specification of an FPA
- Floating Point Add/Subtract Unit
- Specification
  - Inputs in IEEE 754 Double Precision
  - Must perform both addition and subtraction
  - Must handle the full floating point standard
    - Normalized numbers
    - Not a Numbers (NaNs)
    - +/- Infinity
    - Denormalized numbers

Specifications continued
- The result will be an IEEE 754 Double Precision representation
- The unit will correctly handle the invalid operation of adding +∞ and -∞, producing NaN per the standard
- The unit latches its inputs into registers from parallel 64-bit data buses
- There is a separate signal line that indicates the operation, add or subtract

Specifications continued
- Outputs
  - The correctly represented result
  - Flags that are output are
    - Zero result
    - Overflow to infinity from normalized numbers as inputs
    - NaN result
    - Overshift (result is the larger of the two operands)
    - Denormalized result
    - Inexact (result was rounded)
    - Invalid operation for addition
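As a behavioral reference for several of these flags, the sketch below computes them for a double-precision add or subtract, using Python's own float arithmetic as the "unit" and `fractions.Fraction` to detect rounding. It is a hypothetical model (the name `fpa_flags` and the flag keys are illustrative), not the unit's hardware design: the overflow flag is approximated as "finite inputs produced an infinite result", and the overshift flag is not modeled.

```python
import math
import struct
from fractions import Fraction

def double_bits(x: float) -> int:
    """IEEE 754 double-precision bit pattern of x."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

# Sketch: hypothetical behavioral model, not a hardware design.
def fpa_flags(a: float, b: float, subtract: bool = False) -> dict:
    r = a - b if subtract else a + b
    bits = double_bits(r)
    e = (bits >> 52) & 0x7FF
    f = bits & ((1 << 52) - 1)
    finite_in = math.isfinite(a) and math.isfinite(b)
    inexact = False
    if finite_in and math.isfinite(r):
        # Compare the exact rational result with the rounded double result.
        exact = Fraction(a) - Fraction(b) if subtract else Fraction(a) + Fraction(b)
        inexact = exact != Fraction(r)
    return {
        "zero": e == 0 and f == 0,
        "overflow": finite_in and math.isinf(r),
        "nan": e == 0x7FF and f != 0,
        "denormalized": e == 0 and f != 0,
        "inexact": inexact,
        "invalid": math.isinf(a) and math.isinf(b) and math.isnan(r),
    }
```

For example, `fpa_flags(math.inf, -math.inf)` reports both `invalid` and `nan`, while `fpa_flags(100.0, 25.0)` sets none of the flags.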

High level block diagram
- Basic architecture interface
  - Data: 64-bit A, B, & C buses
  - Control signals: Latch, Add/Sub, Asel, Drive
  - Condition flags output: 7 flag signals
  - Clocks: Phi1 and Phi2 (a 2-phase clocked architecture)

Denormalized Example
- Denormalized example: $0010 0000 * $42C8 0000 (100)
  - $0010 0000 = 0 | 0000 0000 | 0010000……, so e = 0 and f = .001000000…
- Change the denormalized value to the \( 2^{e-127} \) form
  - S = 0
  - E = \( 2^{-127} \)
  - M = x.f = 0.0100 0000 ……
  - This is the same value as \( 2^{-126} \times 0.001 \), rewritten with a \( 2^{-127} \) scale

Do multiplication
- Had the FP representation of 100
  - S = 0
  - E = \( 2^{6} \) (133 - 127)
  - M = 1.f = 1.1001
- Multiply and get a result with
  - S = 0
  - E = \( 2^{6-127} = 2^{-121} \)
  - M = 1.1001 * 0.01 = 0.011001

Renormalize
- The values to store
  - S = 0
  - E = \( 2^{-121} \) for a mantissa of 0.011001 (stored e of 6)
  - Adjust to normalized form: E = \( 2^{-123} \) for a mantissa of 1.1001 (stored e of 4)
- Construct the value to store
  - S | E | F = 0 | 0000 0100 | 1001 0000 ……… = 0000 0010 0100 1000 0000 … = $0248 0000
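The denormalized example can be cross-checked with Python's `struct` module: unpack $0010 0000 as a single, multiply by 100, and repack. The product \( 100 \times 2^{-129} \) is exact in Python's double-precision arithmetic and is representable as a normalized single, so repacking introduces no rounding here. The helper names `f32` and `bits32` are hypothetical.

```python
import struct

# Sketch: hypothetical helpers converting between bit patterns and singles.
def f32(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE single."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def bits32(x: float) -> int:
    """Single-precision bit pattern of x as an unsigned 32-bit integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

denorm = f32(0x00100000)
print(denorm == 2.0 ** -129)             # True: 2^-126 * 0.001b = 2^-129
product = denorm * f32(0x42C80000)       # times 100, computed in double precision
print(format(bits32(product), "#010x"))  # 0x02480000
```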
