
1 Spring 2013 Floating Point Computation Jyun-Ming Chen

2 Contents Sources of Computational Error Computer Representation of (Floating-Point) Numbers Efficiency Issues

3 Sources of Computational Error In converting a mathematical problem to a numerical one, errors are introduced by limited computational resources: –round-off error (limited precision of representation) –truncation error (limited time for computation) Misc. –error in the original data –blunder: a mistake made through stupidity, ignorance, or carelessness; a programming or data-input error –propagated error

4 Supplement: Error Classification (Hildebrand) Gross error: caused by human or mechanical mistakes. Roundoff error: the consequence of using a number specified by n correct digits to approximate a number which requires more than n digits (generally infinitely many digits) for its exact specification. Truncation error: any error which is neither a gross error nor a roundoff error. Frequently, a truncation error corresponds to the fact that, whereas an exact result would be afforded (in the limit) by an infinite sequence of steps, the process is truncated after a certain finite number of steps.

5 Common Measures of Error Definitions –total error = roundoff + truncation –absolute error = | numerical – exact | –relative error = absolute error / | exact | (if exact is zero, the relative error is not defined)

6 Ex: Roundoff Error A representation consists of a finite number of digits, so the machine-representable approximations of the real line R are discrete!

7 Watch Out for printf !! By default, “%f” prints only 6 digits after the decimal point.

8 Ex: Numerical Differentiation Evaluating the first derivative of f(x) with the forward difference f'(x) ≈ ( f(x+h) – f(x) ) / h carries a truncation error of O(h)

9 Numerical Differentiation (cont) Select a problem with a known answer –so that we can evaluate the error!

10 Numerical Differentiation (cont) Error analysis –as h decreases, the (truncation) error decreases What happened at h = 0.00001?!

11 Ex: Polynomial Deflation f(x) is a polynomial with 20 real roots Use any method to numerically solve for one root, then deflate the polynomial to degree 19 Solve for another root, deflate again, and again, … The accuracy of the roots obtained worsens each time due to error propagation

12 Computer Representation of Floating Point Numbers Decimal–binary conversion Floating point vs. fixed point Standard: IEEE 754 (1985)

13 Decimal–Binary Conversion Ex: 29 (base 10), by repeated division by 2: 29/2 = 14 remainder 1; 14/2 = 7 r 0; 7/2 = 3 r 1; 3/2 = 1 r 1; 1/2 = 0 r 1. Reading the remainders from last to first: 29 (base 10) = 11101 (base 2)

14 Fraction–Binary Conversion Ex: 0.625 (base 10): repeatedly multiply the fractional part by 2; each integer part produced is the next bit: a1 = 1, a2 = 0, a3 = 1, a4 = a5 = … = 0

15 Computing: How About 0.1 (base 10)? 0.625 × 2 = 1.250 → 1; 0.250 × 2 = 0.500 → 0; 0.500 × 2 = 1.000 → 1, so 0.625 (base 10) = 0.101 (base 2) For 0.1 the process never terminates: 0.1 (base 10) = 0.000110011… (base 2), with the pattern 0011 repeating forever

16 Floating vs. Fixed Point Decimal, 6 digits (positive numbers) –fixed point, with 5 digits after the decimal point: 0.00001, …, 9.99999 –floating point, 2 digits as exponent (base 10) and 4 digits for mantissa (accuracy): 0.001×10^00, …, 9.999×10^99 Comparison: –fixed point: fixed accuracy; simple math for computation (used in systems w/o an FPU) –floating point: trades accuracy for a larger range of representation

17 Floating Point Representation Fraction, f –usually normalized so that 1 ≤ f < β (leading digit nonzero) Base, β –2 for personal computers –16 for mainframes –… Exponent, e

18 IEEE 754-1985 Purpose: make floating-point systems portable Defines: the number representation, how calculations are performed, exceptions, … Single precision (32-bit) Double precision (64-bit)

19 Number Representation S: sign of mantissa Range (roughly) –single: 10^-38 to 10^38 –double: 10^-307 to 10^307 Precision (roughly) –single: 7–8 significant decimal digits –double: 15 significant decimal digits

20 Significant Digits In the binary sense, 24 bits are significant (with the implicit one – next page); the last stored bit is worth 2^-23 In the decimal sense, roughly 7–8 significant decimal digits When you write your program, make sure the printed results carry the meaningful significant digits.

21 Implicit One A normalized mantissa is always ≥ 1.0 (and < 2.0) –only the fractional part is stored, which gains one extra bit of precision Ex: 3.5 = (1.11)₂ × 2^1; only the bits 11 (followed by zeros) are stored

22 Exponent Bias Ex: in single precision, the exponent has 8 bits –0000 0000 (0) to 1111 1111 (255) Add an offset to represent ± numbers –effective exponent = biased exponent – bias –bias value: 32-bit (127); 64-bit (1023) –Ex: 32-bit 1000 0000 (128): effective exp. = 128 – 127 = 1

23 Ex: Convert –3.5 to a 32-bit FP Number –3.5 = –(1.11)₂ × 2^1: sign = 1; biased exponent = 1 + 127 = 128 = 1000 0000; mantissa = 1100…0 Result: 1 10000000 11000000000000000000000 (hex C0600000)

24 Examine Bits of FP Numbers Explain how this program works

25 The “Examiner” Use the previous program to –observe how ME works –test subnormal behaviors on your computer/compiler –convince yourself why the subtraction of two nearly equal numbers produces lots of error –NaN: Not-a-Number !?

26 Design Philosophy of IEEE 754 [s|e|m] S first: whether the number is +/– can be tested easily E before M: simplifies sorting Negative exponents represented by a bias (not 2’s complement) for ease of sorting –[biased rep] –1, 0, 1 = 126, 127, 128 –[2’s compl.] –1, 0, 1 = 0xFF, 0x00, 0x01, which requires more complicated math for sorting and increment/decrement

27 Exceptions Overflow –±INF: when a number exceeds the range of representation Underflow –when numbers are too close to zero, they are treated as zeros Dwarf –the smallest representable number in the FP system Machine Epsilon (ME) –a number with computational significance (more later)

28 Extremities E: (1…1) –M (0…0): infinity –M not all zeros: NaN (Not a Number) E: (0…0) –M (0…0): clean zero –M not all zeros: dirty zero (see next page) More later

29 Not-a-Number Numerical exceptions –sqrt of a negative number –invalid domain of trigonometric functions –… Often causes the program to stop running

30 Extremities (32-bit) Max: (1.111…1)₂ × 2^(254–127) = (10 – 0.000…1)₂ × 2^127 ≈ 2^128 Min (w/o stepping into dirty zero): (1.000…0)₂ × 2^(1–127) = 2^-126

31 Dirty Zero (a.k.a. denormals) No “implicit one” IEEE 754 did not specify compatibility for denormals If you are not sure how to handle them, stay away from them; scale your problem properly –“Many problems can be solved by pretending they do not exist” a.k.a.: also known as

32 Dirty Zero (cont) 00000000 10000000 00000000 00000000 = 2^-126 (exponent 1, mantissa 0: the smallest normal) 00000000 01000000 00000000 00000000 = 2^-127 00000000 00100000 00000000 00000000 = 2^-128 00000000 00010000 00000000 00000000 = 2^-129 [Figure: the real line near 0 — the denormals fill the gap between 0 and 2^-126; the dwarf is the smallest representable number]

33 Dwarf (32-bit) Bit pattern: 00000000 00000000 00000000 00000001 (exponent all zeros, mantissa = 1 ulp) Value: 2^-23 × 2^-126 = 2^-149

34 Machine Epsilon (ME) Definition –the smallest non-zero number that makes a difference when added to 1.0 on your working platform This is not the same as the dwarf

35 Computing ME (32-bit) Keep shrinking eps while 1 + eps still differs from 1.0 The smallest such 1 + eps has bits (00111111 10000000 00000000 00000001), so ME = (1 + 2^-23) – 1.0 = 2^-23 ≈ 1.19 × 10^-7

36 Effect of ME

37 Significance of ME Never terminate an iteration by testing whether two FP numbers are exactly equal Instead, test whether |x – y| < ME (suitably scaled)

38 Machine Epsilon (Wikipedia) Machine epsilon gives an upper bound on the relative error due to rounding in floating point arithmetic.

39 Numerical Scaling Number density: there are as many IEEE 754 numbers in [1.0, 2.0] as there are in [256, 512] Revisit: –“roundoff” error –ME: a measure of real-number density near 1.0 Implication: –scale your problem so that intermediate results lie between 1.0 and 2.0 (where numbers are dense, and where roundoff error is smallest)

40 Scaling (cont) Performing computation on denser portions of the real line minimizes the roundoff error –but don’t overdo it; switching to double precision is an easier way to increase precision –the densest part is near the subnormals, if density is defined as numbers per unit length

41 How Subtraction is Performed on Your PC Steps: –convert to base 2 –equalize the exponents by adjusting the mantissa values; truncate the values that do not fit –subtract mantissas –normalize

42 Subtraction of Nearly Equal Numbers Base 10: 1.24446 – 1.24445 = 0.00001 — only one significant digit survives In binary the same happens: after the cancellation the result is renormalized and padded with zeros, so most of its bits are unreliable — a significant loss of accuracy

43 Theorem of Loss of Precision Let x, y be normalized floating-point machine numbers with x > y > 0 If 2^-p ≤ 1 – y/x ≤ 2^-q, then at most p and at least q significant binary bits are lost in the subtraction x – y Interpretation: –“When two numbers are very close, their subtraction introduces a lot of numerical error.”

44 Implications When you program, rewrite expressions that subtract nearly equal quantities into algebraically equivalent forms Every FP operation introduces error, but the subtraction of nearly equal numbers is the worst and should be avoided whenever possible

45 Efficiency Issues Horner scheme Program examples

46 Horner Scheme For polynomial evaluation Compare efficiency

47 Accuracy vs. Efficiency

48 Good Coding Practice

49 Storing Multidimensional Arrays in Linear Memory C and others: row-major order Fortran, MATLAB: column-major order

50 On Accessing Arrays … Which loop order is more efficient?

51 Issues of PI 3.14 is often not accurate enough –4.0*atan(1.0) is a good substitute

52 Compare:

53 Exercise Explain why [formula not reproduced in transcript] Explain why [iteration not reproduced in transcript] converges when implemented numerically

54 Exercise Why does Me( ) not work as advertised? Construct the 64-bit version of everything –bit examiner –Dme( ) 32-bit: int and float — can every int be represented by a float (if converted)?

55 Understanding Your Platform [Table: sizeof of the basic types — 1, 4, 4, 8, 4, 8, 4, 2 bytes] Memory word: 4 bytes on 32-bit machines

56 Padding How about …

57 Data Alignment (data structure padding) Padding is only inserted when a structure member is followed by a member with a larger alignment requirement, or at the end of the structure Alignment requirement:

58 Ex: Padding struct MixedData { char Data1; short Data2; int Data3; char Data4; }; –1 byte of padding after Data1, for Data2 to align on a 2-byte boundary –no padding before Data3; already on a 4-byte boundary –3 bytes of final padding to align the structure on a 4-byte boundary sizeof (struct MixedData) = 12 bytes

59 Data Alignment (cont) By changing the ordering of members in a structure, it is possible to change the amount of padding required to maintain alignment #pragma pack(1): direct the compiler to ignore data alignment (align on a 1-byte boundary) #pragma pack(push): push the current alignment to the stack; #pragma pack(pop) restores it

60
#include <stdio.h>

struct pad1 { char data1; short data2; int data3; char data4; };

struct pad2 { int data3; short data2; char data1; char data4; };

#pragma pack(push)
#pragma pack(1)
struct pad3 { char data1; short data2; int data3; char data4; };
#pragma pack(pop)

int main(void) {
    printf("pad1 size: %zu\n", sizeof(struct pad1));
    printf("pad2 size: %zu\n", sizeof(struct pad2));
    printf("pad3 size: %zu\n", sizeof(struct pad3));
    return 0;
}

Output: 12, 8, 8

