3 Roundoff and Cancellation
Roundoff examples for the float s23e8 format:
additive roundoff: a = … =_fl 1
multiplicative roundoff: b = … * … =_fl 1
cancellation: c = a, b: (c - 1) * 10^8 = … =_fl 0
Cancellation promotes the small error to the absolute error 4 and a relative error 1.
Order of operations can be crucial:
… - 1 =_fl 0
1 - … =_fl …
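The roundoff and cancellation effects on this slide can be reproduced with a few lines of Python. This is a minimal sketch, not the slide's own example: the operands (`tiny = 2**-25` and the factors `1 ± 2**-13`) are assumptions chosen so that every rounding step can be checked by hand, and the helpers `f32`, `add32`, `sub32` and `mul32` are hypothetical names that emulate single precision by rounding through the standard `struct` module.

```python
import struct

def f32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Emulated single-precision operations: round operands, compute, round the result.
def add32(x, y):
    return f32(f32(x) + f32(y))

def sub32(x, y):
    return f32(f32(x) - f32(y))

def mul32(x, y):
    return f32(f32(x) * f32(y))

# Assumed operand: smaller than half the spacing (2**-23) of floats just above 1.
tiny = 2.0 ** -25

a = add32(1.0, tiny)                           # exact value 1 + 2**-25 rounds to 1
b = mul32(1.0 + 2.0 ** -13, 1.0 - 2.0 ** -13)  # exact value 1 - 2**-26 rounds to 1
print("a =", a, " b =", b)

# Cancellation: subtracting 1 exposes that the small term was lost completely,
# so the result 0 has a relative error of 1 with respect to tiny.
print("a - 1 =", sub32(a, 1.0), "instead of", tiny)

# Order of operations: the same three numbers, two different results.
sum_first = sub32(add32(1.0, tiny), 1.0)      # (1 + tiny) - 1
cancel_first = add32(sub32(1.0, 1.0), tiny)   # (1 - 1) + tiny
print("(1 + tiny) - 1 =", sum_first)
print("(1 - 1) + tiny =", cancel_first)
```

Powers of two are used instead of decimal operands so that the inputs themselves are exactly representable and only the operation under study rounds.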
4 More Precision
Evaluating (with powers as multiplications) [S.M. Rump, 1988]
…
for … gives
float s23e8: …
double s52e11: …
long double s63e15: …
This is all wrong, even the sign is wrong!
The correct result is …
Lesson learnt: Computational Precision ≠ Accuracy of Result
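The effect can be checked by comparing a floating point evaluation against exact rational arithmetic. The sketch below assumes that the slide refers to the expression commonly known as Rump's example, 333.75 b^6 + a^2 (11 a^2 b^2 - b^6 - 121 b^4 - 2) + 5.5 b^8 + a/(2b) with a = 77617 and b = 33096; neither the expression nor the arguments survive in this transcript, so both are assumptions. The function name `rump` is hypothetical, and the double result depends on the platform and evaluation order, so it is printed rather than quoted.

```python
from fractions import Fraction

def rump(a, b, c6, c8):
    """Evaluate the assumed expression with powers written as multiplications."""
    a2 = a * a
    b2 = b * b
    b4 = b2 * b2
    b6 = b4 * b2
    b8 = b4 * b4
    return c6 * b6 + a2 * (11 * a2 * b2 - b6 - 121 * b4 - 2) + c8 * b8 + a / (2 * b)

# Exact evaluation with rational numbers (333.75 = 1335/4, 5.5 = 11/2).
exact = rump(Fraction(77617), Fraction(33096), Fraction(1335, 4), Fraction(11, 2))

# The same expression in IEEE double precision.
in_double = rump(77617.0, 33096.0, 333.75, 5.5)

print("exact    :", float(exact))   # equals a/(2b) - 2, about -0.8274
print("in double:", in_double)
print("abs error:", abs(in_double - float(exact)))
```

The large terms cancel to exactly -2 in exact arithmetic, while in floating point they are rounded to multiples of a large power of two, so the small remainder cannot be represented.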
5 Precision and Accuracy
There is no monotonic relation between the computational precision and the accuracy of the final result.
Increasing precision can decrease accuracy!
Even when one can prove positive effects of increased precision, it is very difficult to quantify them.
We often simply rely on the experience that increased precision helps in common cases.
But for common cases we need high precision only in very few places to obtain the desired accuracy.
7 Resources for Signed Integer Operations
Operation                    Area      Latency
min(r,0) max(r,0)            b+1       2
add(r1,r2) sub(r1,r2)        2b        b
add(r1,r2,r3) add(r4,r5)     2b        1
mult(r1,r2) sqr(r)           b(b-2)    b ld(b)
sqrt(r)                      2c(c-5)   c(c+3)
b: bitlength of argument, c: bitlength of result
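To get a feeling for these formulas, the following sketch evaluates the area and latency entries for the two-operand adder and the multiplier at two bitlengths. The function name `resources` and the chosen bitlengths 16 and 32 are illustrative assumptions; `ld` is read as the base-2 logarithm, and the units are whatever the table uses.

```python
import math

def resources(b: int) -> dict:
    """Area and latency for b-bit operands, using the formulas listed above."""
    return {
        "add":  {"area": 2 * b,       "latency": b},
        "mult": {"area": b * (b - 2), "latency": b * math.log2(b)},  # ld = log2
    }

half, full = resources(16), resources(32)
for op in ("add", "mult"):
    ratio = full[op]["area"] / half[op]["area"]
    print(f"{op}: area {half[op]['area']} -> {full[op]['area']} (x{ratio:.2f})")
```

Doubling the bitlength doubles the adder area but roughly quadruples the multiplier area, which is why a reduced precision pays off most in multiplier-dominated designs.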
8 Arithmetic Area Consumption on a FPGA
9 Higher Precision Emulation
Given an m x m bit unsigned integer multiplier, we want to build an n x n multiplier with an n = k*m bit result.
The evaluation of the first sum requires k(k+1)/2 multiplications; the evaluation of the second depends on the rounding mode.
For floating point numbers, additional operations for the correct handling of the exponent are necessary.
A float-float emulation is less complex than an exact double emulation, but typically still requires 10 times more operations.
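The multiplication count can be illustrated with a small limb-based sketch. Since the formula itself is not preserved here, the code rests on an interpretation: the operands are split into k limbs of m bits, the "first sum" is taken to be the partial products a_i * b_j with i + j >= k - 1 (the ones that reach into the upper n bits), and the remaining products only contribute carries. The names `split_limbs` and `mul_high_truncated` and the choice m = 8, k = 4 are hypothetical.

```python
def split_limbs(x: int, m: int, k: int) -> list:
    """Split x into k limbs of m bits each, least significant limb first."""
    mask = (1 << m) - 1
    return [(x >> (m * i)) & mask for i in range(k)]

def mul_high_truncated(a: int, b: int, m: int, k: int):
    """Approximate the upper n = k*m bits of a*b from m x m bit products.

    Only partial products with i + j >= k - 1 are evaluated.
    Returns the truncated upper bits and the number of multiplications used.
    """
    al, bl = split_limbs(a, m, k), split_limbs(b, m, k)
    acc, count = 0, 0
    for i in range(k):
        for j in range(k):
            if i + j >= k - 1:
                acc += (al[i] * bl[j]) << (m * (i + j))   # one m x m multiplication
                count += 1
    return acc >> (m * k), count

m, k = 8, 4                       # 32 x 32 bit multiply from 8 x 8 bit multipliers
a, b = 0xDEADBEEF, 0x12345678
high, count = mul_high_truncated(a, b, m, k)
exact_high = (a * b) >> (m * k)

print("multiplications used:", count, "expected:", k * (k + 1) // 2)
print("upper bits, truncated:", hex(high), "exact:", hex(exact_high))
```

The truncated result can fall short of the exact upper bits by a few units in the last place; whether the skipped products must be evaluated to fix this is what depends on the rounding mode.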
12 Direct Scheme Example: LU Solver [J. Dongarra et al., 2006]
13 Iterative Refinement: First and Second Step
[Figure: high precision path through fine nodes, low precision path through coarse nodes]
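The two paths can be sketched in code: the defect is computed in high precision, the correction is obtained from a low precision direct solver, and the update is accumulated in high precision. This is a minimal sketch under stated assumptions, not the cited LU implementation: single precision is emulated by rounding through `struct`, the 3 x 3 test matrix is a made-up diagonally dominant example, the helper names are hypothetical, and the elimination is simply repeated in every step, whereas a practical solver would factor once and reuse the factors.

```python
import struct

def f32(x: float) -> float:
    """Round a double to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def solve_low(A, b):
    """Gaussian elimination without pivoting; every result is rounded to float."""
    n = len(b)
    M = [[f32(v) for v in row] + [f32(rhs)] for row, rhs in zip(A, b)]
    for p in range(n):                       # forward elimination
        for r in range(p + 1, n):
            factor = f32(M[r][p] / M[p][p])
            for c in range(p, n + 1):
                M[r][c] = f32(M[r][c] - f32(factor * M[p][c]))
    x = [0.0] * n
    for r in reversed(range(n)):             # back substitution
        s = M[r][n]
        for c in range(r + 1, n):
            s = f32(s - f32(M[r][c] * x[c]))
        x[r] = f32(s / M[r][r])
    return x

def defect(A, x, b):
    """Residual b - A x, computed in double precision."""
    return [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(A, b)]

def refine(A, b, steps):
    x = solve_low(A, b)                      # low precision first solve
    for _ in range(steps):
        d = defect(A, x, b)                  # high precision path
        c = solve_low(A, d)                  # low precision path
        x = [xi + ci for xi, ci in zip(x, c)]   # high precision update
    return x

A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
b = [1.0, 2.0, 3.0]

x_low = solve_low(A, b)
x_ref = refine(A, b, 5)
print("max defect, low precision only:", max(abs(d) for d in defect(A, x_low, b)))
print("max defect, after refinement  :", max(abs(d) for d in defect(A, x_ref, b)))
```

Only the defect computation and the vector update run in double precision; all operations of the solver itself run in the emulated float format.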
14 Iterative Scheme Example: Stationary Solver
We obtain a convergent series:
To clarify the interaction of these two iterative schemes, let us consider a general convergent iterative scheme [D. Göddeke et al., 2005]
15 Mixed Precision for Convergent Schemes
Explicit solution representation
Problem: Summation of addends with decreasing size.
Solution: Split the sum into a sum of partial sums (outer and inner loop).
Precision reduction: Reduce the number range for G, e.g. G affine in U:
Iterative refinement: this formulation is equivalent to the refinement step in the outer iteration scheme for
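The splitting into partial sums can be tried out on a scalar model problem. The sketch below assumes an update of the form U <- U + G(U) with the affine G(U) = OMEGA * (B - A*U); the values A = 3, B = 1, OMEGA = 0.25, the loop counts and all function names are hypothetical choices for illustration, and single precision is again emulated by rounding through `struct`.

```python
import struct

def f32(x: float) -> float:
    """Round a double to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

A, B, OMEGA = 3.0, 1.0, 0.25     # assumed model problem A*U = B, solution 1/3

def g32(v: float, d: float) -> float:
    """Low precision addend G(V) = OMEGA * (d - A*V), rounded after each operation."""
    return f32(OMEGA * f32(d - f32(A * v)))

def low_only(steps: int) -> float:
    """Sum all addends in low precision: small addends are eventually lost."""
    u = 0.0
    for _ in range(steps):
        u = f32(u + g32(u, B))
    return u

def mixed(outer: int, inner: int) -> float:
    """Outer loop in double, inner partial sums in low precision."""
    u = 0.0                              # high precision accumulator
    for _ in range(outer):
        d = B - A * u                    # defect, computed in double
        v = 0.0                          # partial sum restarts at zero
        for _ in range(inner):
            v = f32(v + g32(v, f32(d)))  # addends of similar size: float suffices
        u += v                           # add the partial sum in double
    return u

u_low = low_only(48)
u_mixed = mixed(6, 8)
print("error, low precision only:", abs(u_low - 1 / 3))
print("error, mixed precision   :", abs(u_mixed - 1 / 3))
```

Because G is affine, restarting the inner loop at zero with the current defect as right-hand side is the same operation as a refinement step, and the inner loop only ever sees numbers of comparable magnitude.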
16 Iterative Convergence: First Partial Sum
[Figure: convergent iterative scheme; high precision path through fine nodes, low precision path through coarse nodes]
17 Iterative Convergence: Second Partial Sum
[Figure: convergent iterative scheme; high precision path through fine nodes, low precision path through coarse nodes]
18 CPU Results: LU Solver chart courtesy of Jack Dongarra
19 GPU Results: Conjugate Gradient and Multigrid
20 Conclusions
The relation between computational precision and final accuracy is not monotonic.
Iterative refinement makes it possible to reduce the precision of many operations without a loss of final accuracy.
In multiplier-dominated designs the resulting savings grow quadratically (area or time).
Area and time improvements benefit various architectures: FPGA, CPU, GPU, Cell, etc.