1/8/2007 - L24 IEEE Floating Point Basics. Copyright 2006 - Joanne DeGroat, ECE, OSU

Presentation transcript:

IEEE Floating Point
The IEEE Floating Point Standard and execution units for it

Lecture overview
- The standard
- Floating Point Basics
- A floating point adder design

The floating point standard
- Single Precision
- Value of bits stored in representation is:
  - If e = 255 and f /= 0, then v is NaN regardless of s
  - If e = 255 and f = 0, then v = (-1)^s x infinity
  - If 0 < e < 255, then v = (-1)^s x 2^(e-127) x (1.f) (normalized number)
  - If e = 0 and f /= 0, then v = (-1)^s x 2^(-126) x (0.f) (denormalized numbers, which allow for graceful underflow)
  - If e = 0 and f = 0, then v = (-1)^s x 0 (zero)
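The single precision cases above can be sketched in Python as follows. This is only an illustration: the function name `decode_single` is hypothetical, and the field layout (1 sign bit, 8 exponent bits, 23 fraction bits) is the standard single precision layout rather than something spelled out on the slide.

```python
def decode_single(bits: int) -> float:
    """Decode a 32-bit IEEE 754 single precision pattern case by case."""
    s = (bits >> 31) & 0x1       # sign bit
    e = (bits >> 23) & 0xFF      # 8-bit biased exponent
    f = bits & 0x7FFFFF          # 23-bit stored fraction
    sign = -1.0 if s else 1.0

    if e == 255:
        # All-ones exponent: NaN if f /= 0, otherwise signed infinity.
        return float("nan") if f != 0 else sign * float("inf")
    if e == 0:
        # Zero or denormalized: no hidden 1, fixed exponent of -126.
        return sign * (f / 2**23) * 2.0**-126
    # Normalized: hidden leading 1, exponent biased by 127.
    return sign * (1 + f / 2**23) * 2.0 ** (e - 127)
```

For instance, `decode_single(0x42C80000)` follows the normalized branch with e = 133 and f = 1001 000..., giving 1.5625 x 2^6.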

The floating point standard
- Double Precision
- Value of bits in word representation is:
  - If e = 2047 and f /= 0, then v is NaN regardless of s
  - If e = 2047 and f = 0, then v = (-1)^s x infinity
  - If 0 < e < 2047, then v = (-1)^s x 2^(e-1023) x (1.f) (normalized number)
  - If e = 0 and f /= 0, then v = (-1)^s x 2^(-1022) x (0.f) (denormalized numbers, which allow for graceful underflow)
  - If e = 0 and f = 0, then v = (-1)^s x 0 (zero)
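As a rough sketch of the double precision cases, the snippet below pulls the exponent and fraction fields out of a Python float (which is a double on common platforms) and names the case it falls into. The function name `classify_double` and the returned labels are hypothetical; the 11-bit exponent and 52-bit fraction widths are the standard double precision layout.

```python
import struct

def classify_double(x: float) -> str:
    """Name the IEEE 754 double precision case that x falls into."""
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    e = (bits >> 52) & 0x7FF          # 11-bit biased exponent
    f = bits & ((1 << 52) - 1)        # 52-bit stored fraction

    if e == 2047:
        return "NaN" if f != 0 else "infinity"
    if e == 0:
        return "denormalized" if f != 0 else "zero"
    return "normalized"
```

The sign bit plays no part in the classification, so `-0.0` and `0.0` both report `"zero"`.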

The floating point standard
- Notes on single and double precision
  - The leading 1 of the fractional part is not stored for normalized numbers (the mantissa is 1.f, and only f is stored)
  - Representation allows for +0 and -0, indicating the direction of 0 (allows a determination that might matter if rounding was used)
  - Denormalized numbers allow graceful underflow towards 0
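A small Python check can make the +0 and -0 note concrete. The helper name `single_bits` is hypothetical; it simply packs a value as a single precision word and reads the bits back.

```python
import math
import struct

def single_bits(x: float) -> int:
    """Return the 32-bit single precision pattern of x as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

pos_zero_bits = single_bits(0.0)      # all bits clear
neg_zero_bits = single_bits(-0.0)     # only the sign bit set

# The two zeros compare equal, yet the sign is still recoverable.
zeros_compare_equal = (0.0 == -0.0)
neg_zero_sign = math.copysign(1.0, -0.0)
```

The two patterns differ only in the sign bit, which is how the representation records the direction from which zero was reached.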

Conversion Examples
- Converting from base 10 to the representation
- Single precision example
- Convert 100 (base 10)
- Step 1: convert to binary, 0110 0100
- In a binary representation form of 1.xxx have 0110 0100 = 1.1001 x 2^6

Conversion Example Continued
- 1.1001 x 2^6 is binary for 100
- Thus the exponent is a 6
  - Biased exponent will be 6 + 127 = 133 = 1000 0101
  - Sign will be a 0 for positive
  - Stored fractional part f will be 1001
- Thus we have
  - s e f = 0 1000 0101 1001 0000 ...
  - Regrouped in fours: 0100 0010 1100 1000 0000 ... = 4 2 C 8 0 0 0 0 in hexadecimal
  - $42C8 0000 is the representation for 100

Another example
- Representation for -175 (sign magnitude rep)
  - 175 = 128 + 32 + 8 + 4 + 2 + 1 = 1010 1111
  - Or 1.0101111 x 2^7
  - S = 1
  - Exponent is 7 + 127 = 134 = 1000 0110
  - Fractional part f = 0101111
  - Representation 1 1000 0110 0101 1110 0000 ...
  - Or in Hex $C32F 0000

A fractional example
- Decimal value 0.25
- Convert to binary: 0.01
- In power form: 1.0 x 2^-2
- Sign is + so 0
- Exponent is -2 + 127 = 125 = 0111 1101
- Fractional part is 00000...
- Representation is 0 0111 1101 0000 0000 ...
- And in Hex $3E80 0000
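The three conversions above (100, -175 and 0.25) follow the same steps, which can be sketched in Python. The function name `encode_single` is hypothetical, and the sketch only covers nonzero values that land in the normalized range; it truncates rather than rounds, which is harmless for the exactly representable values used here.

```python
import math

def encode_single(x: float) -> int:
    """Build the single precision word for a nonzero normalized value."""
    if x == 0:
        raise ValueError("zero is a separate case (e = 0, f = 0)")
    s = 1 if x < 0 else 0

    # frexp gives abs(x) = m * 2**e with 0.5 <= m < 1; rescale to 1.xxx form.
    m, e = math.frexp(abs(x))
    mantissa = 2 * m            # 1.xxx
    exponent = e - 1            # unbiased exponent of the 1.xxx form

    biased = exponent + 127     # add the single precision bias
    if not 0 < biased < 255:
        raise ValueError("outside the normalized single precision range")

    # Drop the hidden 1 and keep 23 fraction bits (truncating).
    f = int((mantissa - 1) * 2**23)
    return (s << 31) | (biased << 23) | f
```

For example, `hex(encode_single(100.0))` yields `0x42c80000`, the word derived by hand for 100.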

Converting back
- Convert $C32F 0000 into decimal
- Extract components from 1100 0011 0010 1111 0000 ...
  - S = 1
  - Exponent = 1000 0110 = 128 + 4 + 2 = 134; unbias: 134 - 127 = 7
  - f = 0101 1110 ... so mantissa is 1.0101111
  - Adjust mantissa by exponent (move binary point 7 places): 1010 1111
  - Or 128 + 32 + 8 + 4 + 2 + 1 = 175
  - Sign is negative so -175

Another example
- Convert $41C8 0000 to decimal
  - 0100 0001 1100 1000 0000 ...
  - S is 0 so positive number
  - Exponent = 1000 0011 = 128 + 3 = 131; 131 - 127 = 4
  - f = 1001 so mantissa is 1.1001
  - With 4 binary positions have 11001 as final number in binary, which is 25
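The "converting back" steps can be mirrored in Python and cross-checked against the host's own interpretation of the bits. Both function names are hypothetical, and the step-by-step version handles normalized numbers only. It assumes the second word is the full pattern 0x41C80000 for 25.

```python
import struct

def decode_steps(word: int) -> float:
    """Follow the slide's steps: extract fields, unbias, shift the point."""
    s = word >> 31
    stored_e = (word >> 23) & 0xFF
    f = word & 0x7FFFFF

    exponent = stored_e - 127          # remove the bias
    mantissa = (1 << 23) | f           # 1.f as an integer scaled by 2**23
    # Moving the binary point is a multiplication by a power of two.
    magnitude = mantissa * 2.0 ** (exponent - 23)
    return -magnitude if s else magnitude

def decode_with_struct(word: int) -> float:
    """Let the struct module reinterpret the same 32 bits as a float."""
    return struct.unpack(">f", struct.pack(">I", word))[0]
```

Both routes should agree for any normalized word, for example `decode_steps(0xC32F0000)` and `decode_with_struct(0xC32F0000)`.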

Arithmetic with floating point numbers
- Add op1 $42C8 0000 and op2 $41C8 0000
- First divide into component parts
  - Op1 $42C8 0000 = 0100 0010 1100 1000 0000 ...
    - S = 0
    - E = 1000 0101 = 133; 133 - 127 = 6
    - M op1 = 1.1001 000...
  - Op2 $41C8 0000 = 0100 0001 1100 1000 0000 ...
    - S = 0
    - E = 1000 0011 = 131; 131 - 127 = 4
    - M op2 = 1.1001 000...

Now add the mantissas
- But first align the mantissas
  - Op1 1.1001 000... (exponent 6)
  - Op2 1.1001 000... (exponent 4)
  - Op2 is the smaller number and needs to be aligned
  - Exponent difference between op1 and op2 is 2
  - So shift op2 right by 2 binary places
  - Op2 becomes 0.011001 000...

Add
- Add op1 mantissa with the aligned op2 mantissa
  - 1.100100 000... + 0.011001 000... = 1.111101 000...
- Result exponent is 6
- Value is 1.111101 x 2^6 or 1111101 = 125
- Values added were 100 and 25

Constructing Result Value
- Sign 0
- Exponent 6: E = 1000 0101 = 133 (133 - 127 = 6)
- Mantissa of result 1.111101
- Fractional part 1111 0100 ...
- Constructed value 0100 0010 1111 1010 0000 ... = $42FA 0000 (125)

Floating point representation of 125
- Positive so s is 0
- Exponent is 6 + 127 = 133 = 1000 0101
- Fractional part from mantissa of 1.111101, or 111101
- Constructed value $42FA 0000
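The align, add and construct steps of the addition example can be sketched on integer fields in Python. This is a simplified illustration, not the adder design of the lecture: the names `split_single` and `add_positive_singles` are hypothetical, only positive normalized operands are handled, bits shifted out during alignment are truncated, and overflow is not checked. It assumes the operands are the full words 0x42C80000 (100) and 0x41C80000 (25).

```python
def split_single(word: int):
    """Return (sign, biased exponent, 24-bit mantissa with hidden 1)."""
    s = word >> 31
    e = (word >> 23) & 0xFF
    m = (1 << 23) | (word & 0x7FFFFF)
    return s, e, m

def add_positive_singles(a: int, b: int) -> int:
    """Add two positive normalized single precision words."""
    _, ea, ma = split_single(a)
    _, eb, mb = split_single(b)

    # Make (ea, ma) the operand with the larger exponent.
    if ea < eb:
        ea, ma, eb, mb = eb, mb, ea, ma

    mb >>= (ea - eb)        # align the smaller operand
    m = ma + mb             # add the aligned mantissas
    e = ea

    if m >> 24:             # sum reached 2.0 or more: renormalize
        m >>= 1
        e += 1

    # Drop the hidden 1 again and reassemble the word (sign is 0).
    return (e << 23) | (m & 0x7FFFFF)
```

With the slide's operands, the aligned sum is 1.111101 with no carry out, so the exponent stays at 133 and the result word is the one constructed by hand for 125.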

Multiplication example
- Multiply op1 $42C8 0000 and op2 $41C8 0000
- First divide into component parts
  - Op1 $42C8 0000 = 0100 0010 1100 1000 0000 ...
    - S = 0
    - E = 1000 0101 = 133; 133 - 127 = 6
    - M op1 = 1.1001 000...
  - Op2 $41C8 0000 = 0100 0001 1100 1000 0000 ...
    - S = 0
    - E = 1000 0011 = 131; 131 - 127 = 4
    - M op2 = 1.1001 000...

Multiplication basics
- Base 10 example: 3 x 10^2 * 1.1 x 10^2 = 3.3 x 10^4
- Have 2 numbers A x 2^ea and B x 2^eb
- Multiply and get
  - result = A*B x 2^(ea+eb)

So here
- The sign of both is + so the result is +
- Exponent addition
  - Both exponents are biased as stored
  - If you add the stored binary exponents you need to subtract the extra bias of 127
  - Or, using pencil and paper (or PowerPoint), can just add the unbiased exponent of one operand to the other biased exponent
  - Here have 133 + 4 = 137 = 1000 1001

The mantissas
- Do a binary multiplication and add the partial products: 1.1001 x 1.1001
- Adjusting for the binary point have 10.01110001

Final result
- Exponent is 137, or 10 unbiased
- Mantissa is 10.01110001
- Adjusted for exponent (binary point moved 10 places), value is 1001 1100 0100
- Or 2048 + 256 + 128 + 64 + 4 = 2500
- And we were multiplying 100 * 25
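A minimal Python sketch of the multiplication steps is given below. The name `multiply_positive_singles` is hypothetical; the sketch covers positive normalized operands only, truncates the low product bits, and ignores overflow and underflow. It also adds one step the slides stop short of: the product mantissa 10.01110001 x 2^10 is renormalized to 1.001110001 x 2^11 so that a word can be stored. The operands are assumed to be the full words 0x42C80000 and 0x41C80000.

```python
def multiply_positive_singles(a: int, b: int) -> int:
    """Multiply two positive normalized single precision words."""
    ea = (a >> 23) & 0xFF
    eb = (b >> 23) & 0xFF
    ma = (1 << 23) | (a & 0x7FFFFF)    # 1.f scaled by 2**23
    mb = (1 << 23) | (b & 0x7FFFFF)

    # Adding two stored exponents counts the bias twice; remove one copy.
    e = ea + eb - 127

    product = ma * mb                  # up to 48 bits, scaled by 2**46
    if product >> 47:                  # product is 2.0 or more: renormalize
        product >>= 1
        e += 1

    m = product >> 23                  # back to 24 bits (truncating)
    return (e << 23) | (m & 0x7FFFFF)
```

For the slide's operands the biased exponent becomes 137 + 1 = 138 after renormalizing, and the stored fraction is 001110001 followed by zeros.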

Specification of a FPA
- Floating Point Add/Subtract Unit
- Specification
  - Inputs in IEEE 754 Double Precision
  - Must perform both addition and subtraction
  - Must handle the full floating point standard
    - Normalized numbers
    - Not a Numbers (NaNs)
    - +/- Infinity
    - Denormalized numbers

Specifications continued
  - Result will be an IEEE 754 Double Precision representation
  - Unit will correctly handle the invalid operation of adding +infinity and -infinity, which gives NaN per the standard
  - Unit latches its inputs into registers from parallel 64-bit data busses
  - There is a separate signal line that indicates the operation, add or subtract

Specifications continued
- Outputs
  - The correctly represented result
  - Flags that are output are
    - Zero result
    - Overflow to infinity from normalized numbers as inputs
    - NaN result
    - Overshift (result is the larger of the two operands)
    - Denormalized result
    - Inexact (result was rounded)
    - Invalid operation for addition
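As a loose behavioral illustration of a few of these outputs, the Python sketch below adds two host doubles and derives four of the seven flags from the operands and the result. It is not the unit being specified: the function name `add_with_flags` and the flag keys are hypothetical, and the overshift, denormalized and inexact flags are left out because they depend on internal details of the adder.

```python
import math

def add_with_flags(a: float, b: float):
    """Add two doubles and report a subset of the specified flags."""
    result = a + b   # host arithmetic; +inf plus -inf quietly gives NaN
    flags = {
        "zero": result == 0.0,
        "nan": math.isnan(result),
        # Invalid operation: adding infinities of opposite sign.
        "invalid": math.isinf(a) and math.isinf(b) and a != b,
        # Overflow: finite inputs whose sum no longer fits.
        "overflow": math.isinf(result)
                    and not (math.isinf(a) or math.isinf(b)),
    }
    return result, flags
```

Calling `add_with_flags(float("inf"), float("-inf"))` returns a NaN result with both the `nan` and `invalid` flags set.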

High level block diagram
- Basic architecture interface
  - Data: 64-bit A, B, and C busses
  - Control signals: Latch, Add/Sub, Asel, Drive
  - Condition flags output: 7 flag signals
  - Clocks: Phi1 and Phi2 (a 2 phase clocked architecture)

Start the VHDL
- The entity interface
- In the next lecture

Denormalized Example
- Denormalized example: $0010 0000 (s = 0, e = 0000 0000, f = 0010 0000 ...) * $42C8 0000 (100)
- Change denormalized to 2^(e-127) form
  - S = 0
  - E = 2^-127
  - M = x.f = 0.0100 0000 ...

Do multiplication
- Had the FP representation of 100
  - S = 0
  - E = 2^6 (133 - 127)
  - M = 1.f = 1.1001
- Multiply and get a result with
  - S = 0
  - E = 2^(6 - 127) = 2^-121
  - M = 1.1001 * 0.01 = 0.011001

Renormalize
- The values to store
  - S = 0
  - E = 2^-121 for a mantissa of 0.011001 (stored e 6)
  - Adjust to normalized form: E = 2^-123 for a mantissa of 1.1001 (stored e 4)
- Construct value to store
  - S E F = 0 0000 0100 1001 0000 ... = $0248 0000
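The denormalized multiplication can be checked numerically in Python. This sketch assumes the denormalized operand is the word 0x00100000 (mantissa 0.01 in the 2^(e-127) form, that is 2^-129) and that 100 is the full word 0x42C80000; both helper names are hypothetical. The product is computed in host double precision, which holds it exactly, and is then packed back into a single precision word.

```python
import struct

def bits_to_single(word: int) -> float:
    """Interpret a 32-bit word as a single precision value."""
    return struct.unpack(">f", struct.pack(">I", word))[0]

def single_to_bits(x: float) -> int:
    """Return the single precision word that stores x."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

denormal = bits_to_single(0x00100000)   # e = 0, f = 0010 0000 ...
hundred = bits_to_single(0x42C80000)    # 1.1001 x 2**6

# 0.011001 x 2**-121 renormalizes to 1.1001 x 2**-123 (stored e = 4).
product_bits = single_to_bits(denormal * hundred)
```

Printing `hex(product_bits)` gives the normalized word with stored exponent 4 and fraction 1001 000....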