Floating Point (FLP) Representation


Floating Point (FLP) Representation

A floating point value: f = m * r**e, where:
  m – mantissa (fractional part)
  r – base or radix, usually r = 2
  e – exponent
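As a minimal sketch, the definition above can be evaluated directly (the function name is mine, not from the slides):

```python
def flp_value(m, r, e):
    """Value of a floating point number f = m * r**e."""
    return m * r ** e

# 0.1011 (binary) * 2**-2, i.e. the unnormalized 0.001011 from the next slide
print(flp_value(0.6875, 2, -2))  # 0.171875
```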

Normalization

Normalized value: 0.1011
Unnormalized value: 0.001011 = 0.1011 * 2**-2
A normalized mantissa satisfies 0.5 <= m < 1.
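Normalization can be sketched as repeated scaling of the mantissa with a compensating exponent adjustment (assuming base 2 and a positive mantissa; the function name is mine):

```python
def normalize(m, e):
    """Normalize a positive mantissa so that 0.5 <= m < 1,
    adjusting the exponent to keep m * 2**e unchanged."""
    while m < 0.5:
        m *= 2
        e -= 1
    while m >= 1.0:
        m /= 2
        e += 1
    return m, e

# 0.001011 (binary) = 0.171875; normalizing gives 0.1011 * 2**-2
print(normalize(0.171875, 0))  # (0.6875, -2)
```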

FLP Format

Fields: sign | exponent | mantissa
Sign bit: 0 for +, 1 for -.
Biased exponent: assume the exponent has q bits, so
  -2**(q-1) <= e <= 2**(q-1) - 1
Adding the bias 2**(q-1) to all sides gives
  0 <= eb <= 2**q - 1
where e is the true exponent and eb is the biased exponent.
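The biasing step above is a one-line mapping (function name is mine):

```python
def bias_exponent(e, q):
    """Map a true exponent e in [-2**(q-1), 2**(q-1) - 1]
    to a biased exponent eb in [0, 2**q - 1]."""
    bias = 2 ** (q - 1)
    assert -bias <= e <= bias - 1, "exponent out of range"
    return e + bias

print(bias_exponent(-2, 10))  # 510
```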

Example

f = -0.5078125 * 2**-2
Assume a 32-bit format: sign – 1 bit, exponent – 10 bits (q = 10), mantissa – 21 bits.
b = bias = 2**(q-1) = 2**9 = 512
e = -2, so eb = e + b = -2 + 512 = 510
Representation of f: 1 0111111110 100000100...0
(since 0.5 = 0.1 in binary and 0.0078125 = 2**-7)
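The encoding for this hypothetical 32-bit format (1 sign bit, 10-bit biased exponent, 21-bit mantissa) can be sketched as follows; the function name and its interface are my own:

```python
def encode(sign, m, e, q=10, p=21):
    """Encode f = (-1)**sign * m * 2**e in the slide's hypothetical
    32-bit format: 1 sign bit, q-bit biased exponent, p-bit mantissa.
    m must be normalized: 0.5 <= m < 1."""
    eb = e + 2 ** (q - 1)              # biased exponent
    frac = int(round(m * 2 ** p))      # mantissa as a p-bit integer
    return f"{sign:01b} {eb:0{q}b} {frac:0{p}b}"

print(encode(1, 0.5078125, -2))
# 1 0111111110 100000100000000000000
```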

Range of Representation

In fixed point, the largest number representable in 32 bits is 2**31 - 1, approximately 10**9.
In the 32-bit FLP format above, the largest representable number is
  (1 - 2**-21) * 2**511, approximately 10**153.
The smallest positive normalized number is 0.5 * 2**-512.
If a result falls above the largest value, we have an overflow; if below the smallest, an underflow.
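The decimal orders of magnitude quoted above can be checked by multiplying each binary exponent by log10(2):

```python
import math

# Largest value in the hypothetical FLP format, (1 - 2**-21) * 2**511:
# its decimal exponent is about 511 * log10(2).
print(math.floor(511 * math.log10(2)))  # 153 -> roughly 10**153

# Fixed-point comparison: 2**31 - 1 is about 10**9.
print(math.floor(31 * math.log10(2)))   # 9
```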

IEEE FLP Standard 754 (1985)

Single precision: 32 bits. Double precision: 64 bits.
Single precision value: f = +-1.M * 2**(E' - 127), where M is the fractional part and E' is the biased exponent; bias = 127.
Format: sign – 1 bit, exponent – 8 bits, fraction – 23 bits.
True exponent: E = E' - 127, with 0 < E' < 255 for normalized numbers.
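Python's `struct` module can pack a value into the IEEE 754 single-precision layout, so the three fields can be pulled out of the 32-bit pattern directly (the helper name is mine):

```python
import struct

def float_bits(x):
    """Return the IEEE 754 single-precision fields (sign, E', M) of x."""
    (n,) = struct.unpack(">I", struct.pack(">f", x))
    sign = n >> 31
    e_biased = (n >> 23) & 0xFF
    frac = n & 0x7FFFFF
    return sign, e_biased, frac

sign, eb, frac = float_bits(-1.5)       # -1.5 = -1.1 (binary) * 2**0
print(sign, eb, eb - 127, bin(frac))    # 1 127 0 0b10000000000000000000000
```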

Normalized Single Precision

Normalized values have the form 1.xxxxxx; the 1 before the binary point is not stored, but assumed to exist.
Example: convert 5.25 to single-precision representation.
5.25 = 101.01 (not normalized). Normalized: 1.0101 * 2**2.
True exponent E = 2, biased exponent E' = E + 127 = 129, thus:
0 10000001 01010...0
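The worked example can be confirmed against the machine representation of 5.25:

```python
import struct

# Full 32-bit pattern of 5.25 as an IEEE 754 single:
# sign 0, E' = 10000001 (129), fraction 0101 followed by zeros.
(bits,) = struct.unpack(">I", struct.pack(">f", 5.25))
print(f"{bits:032b}")
# 01000000101010000000000000000000
```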

Double Precision

Value represented: +-1.M * 2**(E' - 1023).
Format: sign – 1 bit, exponent – 11 bits, fraction – 52 bits. Bias = 1023.
The maximal number representable in single precision is approximately 10**38; in double precision, approximately 10**308.
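Both magnitudes can be checked in Python, whose `float` is a double: `sys.float_info.max` is the double-precision maximum, and the single-precision maximum follows from the format parameters above as (2 - 2**-23) * 2**127:

```python
import sys

print(sys.float_info.max)        # about 1.798e+308 (double precision)
print((2 - 2**-23) * 2.0**127)   # about 3.403e+38  (single precision)
```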

Precision

Widening the exponent field increases the range, but it shrinks the fraction field and so decreases the precision.
Suppose we want a precision of n decimal digits. How many bits x do we need in the fraction?
2**x = 10**n; taking the decimal log of both sides:
  x * log 2 = n, so x = n / log 2 = n / 0.301.
For n = 7 we need 7 / 0.301 = 23.3, i.e. 24 bits.
The single-precision standard achieves this, since M has 23 bits plus the leading 1, which is not stored but exists.
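The derivation above amounts to one formula (function name is mine):

```python
import math

def bits_for_decimal_digits(n):
    """Bits x needed so that 2**x >= 10**n: x = n / log10(2),
    rounded up to a whole bit."""
    return math.ceil(n / math.log10(2))

print(bits_for_decimal_digits(7))  # 24
```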

Extended Precision (80 bits)

Used primarily in Intel (x87) processors; IEEE 754-1985 specifies it only loosely, as an optional "double extended" format.
Exponent: 15 bits. Fraction: 64 bits.
This is why the FLP registers in Intel processors are 80 bits rather than 64.
Its precision is about 19 decimal digits.

FLP Computation

Given two FLP values X = Xm * 2**Xe and Y = Ym * 2**Ye, with Xe <= Ye:
  X +- Y = (Xm * 2**(Xe - Ye) +- Ym) * 2**Ye
  X * Y = (Xm * Ym) * 2**(Xe + Ye)
  X / Y = (Xm / Ym) * 2**(Xe - Ye)
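These rules can be sketched on (mantissa, exponent) pairs; this toy version skips the rounding and renormalization a real unit performs, and the function names are mine:

```python
def flp_add(xm, xe, ym, ye):
    """Add X = xm*2**xe and Y = ym*2**ye by shifting the operand
    with the smaller exponent to align with the larger one."""
    if xe > ye:
        xm, xe, ym, ye = ym, ye, xm, xe   # ensure xe <= ye
    return xm * 2.0 ** (xe - ye) + ym, ye

def flp_mul(xm, xe, ym, ye):
    """Multiply: mantissas multiply, exponents add."""
    return xm * ym, xe + ye

m, e = flp_add(0.5, -1, 0.5, 1)   # 0.25 + 1.0
print(m * 2.0 ** e)               # 1.25
```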