Floating Point Computation


Floating Point Computation Jyun-Ming Chen

Contents
- Sources of Computational Error
- Computer Representation of (Floating-Point) Numbers
- Efficiency Issues

Sources of Computational Error
Converting a mathematical problem into a numerical one introduces errors due to limited computational resources:
- Round-off error (limited precision of representation)
- Truncation error (limited time for computation)
Miscellaneous sources:
- Error in the original data
- Blunders (programming or data-input errors)
- Propagated error

Supplement: Error Classification (Hildebrand)
- Gross error: caused by human or mechanical mistakes.
- Roundoff error: the consequence of using a number specified by n correct digits to approximate a number which requires more than n digits (generally infinitely many) for its exact specification.
- Truncation error: any error which is neither a gross error nor a roundoff error. Frequently, a truncation error corresponds to the fact that, whereas an exact result would be afforded (in the limit) by an infinite sequence of steps, the process is truncated after a certain finite number of steps.

Common Measures of Error
- Total error = round-off error + truncation error
- Absolute error = |numerical - exact|
- Relative error = absolute error / |exact| (undefined if the exact value is zero)
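A minimal sketch of these definitions in C; approximating pi by 22/7 is an assumed example, not from the slides:

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        double exact     = 4.0 * atan(1.0);   /* pi */
        double numerical = 22.0 / 7.0;        /* a crude approximation of pi */
        double abs_err   = fabs(numerical - exact);
        printf("absolute error = %g\n", abs_err);
        printf("relative error = %g\n", abs_err / fabs(exact));  /* exact != 0 here */
        return 0;
    }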

Ex: Round-off Error
A representation consists of a finite number of digits.
Implication: the representable real numbers are discrete (more later).

Watch out for printf! By default, "%f" prints only 6 digits after the decimal point.
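A small demonstration (assumed example): the default %f hides digits that %.17g reveals.

    #include <stdio.h>

    int main(void) {
        double x = 1.0 / 3.0;
        printf("%f\n",    x);   /* 0.333333            : only 6 digits shown */
        printf("%.17g\n", x);   /* 0.33333333333333331 : the full double     */
        return 0;
    }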

Ex: Numerical Differentiation
Evaluate the first derivative of f(x) with the forward difference f'(x) ≈ (f(x+h) - f(x)) / h.
The terms dropped from the Taylor expansion are the truncation error.

Numerical Differentiation (cont)
Select a problem with a known answer, so that we can evaluate the error!

Numerical Differentiation (cont)
Error analysis: as h decreases, the (truncation) error decreases. What happened at h = 0.00001?!
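A minimal experiment, assuming f(x) = sin(x) at x = 1 as the test problem (the slide's actual function is not shown). In single precision the error first shrinks with h, then grows again near h = 1e-5 as round-off overtakes truncation:

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        float x = 1.0f, exact = cosf(1.0f);   /* d/dx sin(x) = cos(x) */
        for (float h = 0.1f; h >= 1e-7f; h /= 10.0f) {
            float approx = (sinf(x + h) - sinf(x)) / h;  /* forward difference */
            printf("h = %.0e   error = %.3e\n", h, fabsf(approx - exact));
        }
        return 0;
    }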

Ex: Polynomial Deflation
F(x) is a polynomial with 20 real roots. Use any method to numerically solve for a root, then deflate the polynomial to 19th degree. Solve another root, deflate again, and again, … The accuracy of the roots obtained gets worse each time due to error propagation.
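A minimal sketch of one deflation step by synthetic division (an assumed helper, not the slide's code); any error in the computed root r contaminates the deflated coefficients and hence every later root:

    #include <stdio.h>

    /* Divide p(x) = p[n]x^n + ... + p[0] by (x - r) via synthetic
       division; q[0..n-1] receives the quotient of degree n-1.
       If r is an exact root, the remainder p[0] + r*q[0] is zero. */
    void deflate(const double p[], int n, double r, double q[]) {
        q[n - 1] = p[n];
        for (int i = n - 1; i >= 1; i--)
            q[i - 1] = p[i] + r * q[i];
    }

    int main(void) {
        double p[] = { 2.0, -3.0, 1.0 };   /* x^2 - 3x + 2 = (x-1)(x-2) */
        double q[2];
        deflate(p, 2, 1.0, q);             /* divide out the root x = 1 */
        printf("quotient: %g x + %g\n", q[1], q[0]);  /* x - 2 */
        return 0;
    }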

Computer Representation of Floating-Point Numbers
- Floating point vs. fixed point
- Decimal-binary conversion
- Standard: IEEE 754 (1985)

Floating vs. Fixed Point
Decimal, 6 digits (positive numbers):
- Fixed point, with 5 digits after the decimal point: 0.00001, … , 9.99999
- Floating point, 2 digits as exponent (base 10) and 4 digits for mantissa (accuracy): 0.001 × 10^-99, … , 9.999 × 10^99
Comparison:
- Fixed point: fixed accuracy; simple math for computation (sometimes used in graphics programs)
- Floating point: trades accuracy for a larger range of representation

Decimal-Binary Conversion
- Ex: 134 (base 10) = 10000110 (base 2)
- Ex: 0.125 (base 10) = 0.001 (base 2)
- Ex: 0.1 (base 10) = 0.000110011001100… (base 2); the pattern 0011 repeats forever, so 0.1 has no exact binary representation
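The fractional conversions above use repeated doubling; here is a minimal sketch of that procedure in C (an assumed illustration, not the slide's own program):

    #include <stdio.h>

    /* Print the first 20 binary digits of 0.1 by repeated doubling:
       the integer part of 2x is the next bit. */
    int main(void) {
        double x = 0.1;
        printf("0.1 (base 10) = 0.");
        for (int i = 0; i < 20; i++) {
            x *= 2.0;
            int bit = (x >= 1.0);
            putchar('0' + bit);
            if (bit) x -= 1.0;
        }
        printf("... (base 2, repeating 0011)\n");
        return 0;
    }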

Floating Point Representation
A number is stored as a fraction (mantissa) f, a base b, and an exponent e: value = f × b^e.
- Fraction f: usually normalized so that 1/b ≤ f < 1
- Base b: 2 for personal computers, 16 for mainframes, …
- Exponent e

Understanding Your Platform
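The slide's program is not reproduced here; a minimal sketch of how one might probe the platform, assuming the point is to check the basic type sizes:

    #include <stdio.h>

    int main(void) {
        printf("sizeof(int)    = %zu\n", sizeof(int));
        printf("sizeof(long)   = %zu\n", sizeof(long));
        printf("sizeof(float)  = %zu\n", sizeof(float));
        printf("sizeof(double) = %zu\n", sizeof(double));
        return 0;
    }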

Padding
How about a struct that mixes members of different sizes?
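The slide's example is missing; assuming it concerns structure padding, a minimal sketch: compilers insert padding so that members sit on natural alignment boundaries.

    #include <stdio.h>

    struct Mixed {
        char   c;   /* 1 byte                               */
        double d;   /* typically 8 bytes                    */
    };              /* 7 bytes of padding usually follow c  */

    int main(void) {
        printf("sizeof(struct Mixed) = %zu\n", sizeof(struct Mixed)); /* often 16, not 9 */
        return 0;
    }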

IEEE 754-1985
- Purpose: make floating-point systems portable
- Defines: the number representation, how calculations are performed, exceptions, …
- Single precision (32-bit): 1 sign bit, 8 exponent bits, 23 fraction bits
- Double precision (64-bit): 1 sign bit, 11 exponent bits, 52 fraction bits

Number Representation
- S: sign of mantissa
- Range (roughly): single 10^-38 to 10^38; double 10^-307 to 10^307
- Precision (roughly): single, 7 significant decimal digits; double, 15 significant decimal digits
How these are obtained: for single precision the extreme exponents give 2^±126 ≈ 10^±38, and the 23-bit fraction resolves 2^-23 ≈ 1.2 × 10^-7, i.e., about 7 decimal digits; the double-precision figures follow the same way from 11 exponent bits and a 52-bit fraction.

Implication
When you write your program, make sure the results you print carry only the meaningful significant digits.

Implicit One
The mantissa is normalized so that its leading bit is always 1; since that bit is always 1, it is not stored, gaining one extra bit of precision.
Ex: -3.5 = -(1.11)₂ × 2^1; only the ".11" is stored.

Exponent Bias
Ex: in single precision, the exponent has 8 bits: 0000 0000 (0) to 1111 1111 (255).
An offset is added so that both positive and negative exponents can be represented:
effective exponent = biased exponent - bias.
Bias value: 127 (32-bit); 1023 (64-bit).
Ex (32-bit): 1000 0000 (128) means effective exponent = 128 - 127 = 1.

Ex: Convert -3.5 to a 32-bit FP Number
3.5 = (11.1)₂ = (1.11)₂ × 2^1.
Sign bit: 1 (negative). Biased exponent: 1 + 127 = 128 = 1000 0000. Fraction: 110 0000 0000 0000 0000 0000 (the implicit 1 is dropped).
Result: 1100 0000 0110 0000 0000 0000 0000 0000 = 0xC0600000.

Examine Bits of FP Numbers
Explain how this program works.
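The program itself is not reproduced in this transcript; a minimal sketch of such a bit examiner, assuming a union-based implementation and a 32-bit unsigned int:

    #include <stdio.h>

    /* Print the 32 bits of a float, grouped into bytes. */
    void examine(float f) {
        union { float f; unsigned int u; } x;   /* assumes 32-bit unsigned int */
        x.f = f;
        for (int i = 31; i >= 0; i--) {
            putchar(((x.u >> i) & 1u) ? '1' : '0');
            if (i % 8 == 0 && i > 0) putchar(' ');
        }
        printf("  = %g\n", f);
    }

    int main(void) {
        examine(1.0f);    /* 00111111 10000000 00000000 00000000 */
        examine(-3.5f);   /* 11000000 01100000 00000000 00000000 */
        return 0;
    }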

The "Examiner"
Use the previous program to:
- Observe how ME works
- Test subnormal behaviors on your computer/compiler
- Convince yourself why the subtraction of two nearly equal numbers produces lots of error
- Examine NaN: Not-a-Number!?

Design Philosophy of IEEE 754 [s|e|m]
- S first: whether the number is +/- can be tested easily.
- E before M: simplifies sorting.
- Negative exponents are represented by a bias, not 2's complement, for ease of sorting:
  [biased rep] -1, 0, 1 = 126, 127, 128
  [2's compl.] -1, 0, 1 = 0xFF, 0x00, 0x01 (which would require more complicated math for sorting and increment/decrement)

Exceptions
- Overflow: ±INF, when the number exceeds the range of representation
- Underflow: when numbers are too close to zero, they are treated as zero
- Dwarf: the smallest representable number in the FP system
- Machine Epsilon (ME): a number with computational significance (more later)

Extremities
- E = (1…1), M = (0…0): infinity
- E = (1…1), M not all zeros: NaN (Not a Number)
- E = (0…0), M = (0…0): clean zero
- E = (0…0), M not all zeros: dirty zero (see next page)

Not-a-Number
Numerical exceptions that often cause the program to stop running:
- Square root of a negative number
- Invalid domain of trigonometric functions
- …

Extremities (32-bit)
Max: (1.111…1)₂ × 2^(254-127) = (10 - 0.000…1)₂ × 2^127 ≈ 2^128
Min (without stepping into dirty zero): (1.000…0)₂ × 2^(1-127) = 2^-126

Dirty-Zero (a.k.a. denormals)
("a.k.a." = also known as)
- No "implicit one".
- Denormal support varies in practice (some hardware flushes them to zero), so portable behavior is not guaranteed.
- If you are not sure how to handle them, stay away from them: scale your problem properly. "Many problems can be solved by pretending as if they do not exist."

Dirty-Zero (cont)
Denormals fill the gap between zero and the smallest normalized number:
00000000 10000000 00000000 00000000 = 2^-126 (smallest normalized)
00000000 01000000 00000000 00000000 = 2^-127 (denormal)
00000000 00100000 00000000 00000000 = 2^-128
00000000 00010000 00000000 00000000 = 2^-129
…
00000000 00000000 00000000 00000001 = 2^-149 (dwarf: the smallest representable)

Dwarf (32-bit)
Value: 2^-149, the smallest denormal: all exponent bits 0, fraction = 000…001.

Machine Epsilon (ME)
Definition: the smallest non-zero number that makes a difference when added to 1.0 on your working platform.
This is not the same as the dwarf. Why 1.0?

Computing ME (32-bit)
Start from 1 + eps and keep halving eps while the sum still differs from 1.0.
ME = (00111111 10000000 00000000 00000001) - 1.0 = 2^-23 ≈ 1.19 × 10^-7
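A minimal sketch of the halving loop (an assumed implementation; volatile keeps the compiler from carrying extra precision in the comparison):

    #include <stdio.h>

    int main(void) {
        float eps = 1.0f;
        volatile float test = 1.0f + eps / 2.0f;
        while (test != 1.0f) {        /* eps/2 still makes a difference */
            eps /= 2.0f;
            test = 1.0f + eps / 2.0f;
        }
        printf("32-bit ME = %g  (2^-23 = %g)\n", eps, 1.0f / (1 << 23));
        return 0;
    }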

Effect of ME

Significance of ME
Never terminate an iteration by testing whether two FP numbers are equal. Instead, test whether |x - y| < ME.

Numerical Scaling
Number density: there are as many IEEE 754 numbers between [1.0, 2.0] as there are in [256, 512].
Revisit: "roundoff" error. ME is a measure of the density near 1.0.
Implication: scale your problem so that intermediate results lie between 1.0 and 2.0 (where numbers are dense and the roundoff error is smallest).
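A quick check of the density claim, assuming C99's nextafterf is available: the gap between adjacent floats is 2^-23 near 1.0 but 2^-15 near 256.

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        /* Spacing between adjacent floats grows with magnitude. */
        printf("gap near 1.0  : %g\n", nextafterf(1.0f,   2.0f) - 1.0f);
        printf("gap near 256.0: %g\n", nextafterf(256.0f, 512.0f) - 256.0f);
        return 0;
    }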

Scaling (cont)
Performing computation on denser portions of the real line minimizes the roundoff error, but don't overdo it: switching to double precision is an easier way to increase precision.
The densest part is near the subnormals, if density is defined as numbers per unit length.

How Subtraction Is Performed on Your PC
Steps:
1. Convert to base 2
2. Equalize the exponents by adjusting the mantissa values; truncate the bits that do not fit
3. Subtract the mantissas
4. Normalize the result

Subtraction of Nearly Equal Numbers
Base 10: 1.24446 - 1.24445 = 0.00001; six significant digits collapse to one.
The same happens in base 2: after subtracting two mantissas that agree in their leading bits, renormalization shifts the result left, and most of the remaining bits are unreliable. Significant loss of accuracy.

Theorem of Loss of Precision
Let x, y be normalized floating-point machine numbers with x > y > 0.
If 2^-p ≤ 1 - y/x ≤ 2^-q, then at most p and at least q significant binary bits are lost in the subtraction x - y.
Interpretation: "When two numbers are very close, their subtraction introduces a lot of numerical error."
Ex: for x = 1.24446 and y = 1.24445, 1 - y/x ≈ 8 × 10^-6, which lies between 2^-17 and 2^-16, so 16 to 17 bits are lost.

Implications
Every FP operation introduces error, but the subtraction of nearly equal numbers is the worst and should be avoided whenever possible. When you program, rewrite such expressions into algebraically equivalent forms that avoid the cancellation (see the sketch below).
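The slide's before/after expressions are not reproduced; a classic rewrite of the same kind (an assumed illustration, with hypothetical helpers bad and good):

    #include <stdio.h>
    #include <math.h>

    /* Loses accuracy: sqrt(x*x+1) and 1 are nearly equal for small x. */
    double bad(double x)  { return sqrt(x * x + 1.0) - 1.0; }

    /* Same value, no cancellation: multiply top and bottom
       by sqrt(x*x+1) + 1. */
    double good(double x) { return x * x / (sqrt(x * x + 1.0) + 1.0); }

    int main(void) {
        double x = 1.0e-8;
        printf("bad : %g\n", bad(x));    /* 0     : all bits cancelled */
        printf("good: %g\n", good(x));   /* 5e-17 : correct answer     */
        return 0;
    }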

Efficiency Issues
- Horner scheme
- Program examples

Horner Scheme
For polynomial evaluation. Compare the efficiency with naive term-by-term evaluation (see the sketch below).
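A minimal comparison sketch, assuming coefficients a[0..n] with a[i] multiplying x^i: the naive version calls pow once per term, while Horner needs only n multiplications and n additions.

    #include <stdio.h>
    #include <math.h>

    /* Naive: one pow() call per term. */
    double eval_naive(const double a[], int n, double x) {
        double s = 0.0;
        for (int i = 0; i <= n; i++)
            s += a[i] * pow(x, (double)i);
        return s;
    }

    /* Horner: a[n]x^n + ... + a[0] = (...(a[n]x + a[n-1])x + ...)x + a[0] */
    double eval_horner(const double a[], int n, double x) {
        double s = a[n];
        for (int i = n - 1; i >= 0; i--)
            s = s * x + a[i];
        return s;
    }

    int main(void) {
        double a[] = { 2.0, -3.0, 1.0 };   /* x^2 - 3x + 2 */
        printf("naive : %g\n", eval_naive(a, 2, 4.0));   /* 6 */
        printf("horner: %g\n", eval_horner(a, 2, 4.0));  /* 6 */
        return 0;
    }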

Accuracy vs. Efficiency

Good Coding Practice

On Arrays …

Issues of PI
3.14 is often not accurate enough; 4.0*atan(1.0) is a good substitute.
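A minimal sketch of the substitution (assumed example): since atan(1.0) = pi/4 exactly, the result is accurate to full double precision.

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        double pi = 4.0 * atan(1.0);
        printf("3.14         : %.17g\n", 3.14);
        printf("4.0*atan(1.0): %.17g\n", pi);
        return 0;
    }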


Exercise
Explain why the expressions shown converge when implemented numerically.

Exercise
- Why does Me() not work as advertised?
- Construct the 64-bit version of everything: the bit examiner and Dme().
- 32-bit int and float: can every int be represented by a float (if converted)?