Lecture 2 Addendum: Rounding Techniques. ECE 645 – Computer Arithmetic.


2 Required Reading
Parhami, Section 17.5, Rounding schemes
"Rounding Algorithms 101"

3 Rounding
Rounding occurs when we want to approximate a more precise number (i.e. one with more fractional bits, L) with a less precise number (i.e. one with fewer fractional bits, L').
Example 1: old format (K=6, L=8), new format (K'=6, L'=2)
Example 2: old format (K=6, L=8), new format (K'=6, L'=0)
The following pages show rounding from L > 0 fractional bits to L' = 0 bits, but the mathematics holds for any L' < L.
Usually the number of integer bits is kept the same: K' = K.

4 Rounding Equation
y = round(x)
x_{k-1} x_{k-2} ... x_1 x_0 . x_{-1} x_{-2} ... x_{-l}  →(Round)→  y_{k-1} y_{k-2} ... y_1 y_0
(whole part . fractional part)                                (whole part only)

5 Rounding Techniques
There are several rounding techniques:
1) Truncation: results in round toward zero in signed magnitude, and round toward -∞ in two's complement
2) Round to nearest number
3) Round to nearest even number (or odd number)
4) Round toward +∞
Other rounding techniques:
5) Jamming, or von Neumann rounding
6) ROM rounding
Each of these techniques differs in its error, which also depends on the representation of numbers, i.e. signed magnitude versus two's complement:
Error = round(x) - x

6 1) Truncation
The simplest possible rounding scheme is chopping, or truncation: the fractional digits beyond position L' are simply dropped.
Truncation in signed magnitude results in a number chop(x) that is always of smaller magnitude than x. This is called round toward zero, or inward rounding:
(3.5)₁₀ → 011 (3)₁₀, Error = -0.5
(-3.5)₁₀ → 111 (-3)₁₀, Error = +0.5
Truncation in two's complement results in a number chop(x) that is always smaller than x. This is called round toward -∞, or downward-directed rounding:
(3.5)₁₀ → 011 (3)₁₀, Error = -0.5
(-3.5)₁₀ → 100 (-4)₁₀, Error = -0.5
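The two truncation behaviors can be sketched in Python; the function names below are illustrative, not from the slides:

```python
import math

# Truncating a signed-magnitude number drops fractional bits of the
# magnitude, so the result moves toward zero. Truncating a
# two's-complement bit pattern always discards value, so the result
# moves toward -infinity.
def chop_signed_magnitude(x):
    return math.trunc(x)          # trunc(-3.5) -> -3  (error +0.5)

def chop_twos_complement(x):
    return math.floor(x)          # floor(-3.5) -> -4  (error -0.5)

print(chop_signed_magnitude(3.5), chop_signed_magnitude(-3.5))  # 3 -3
print(chop_twos_complement(3.5), chop_twos_complement(-3.5))    # 3 -4
```

For non-negative x the two schemes agree; they differ only in the direction of the error for negative x.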

7 Truncation Function Graph: chop(x)
Fig.: Truncation or chopping of a signed-magnitude number (same as round toward 0).
Fig.: Truncation or chopping of a 2's-complement number (same as round toward -∞).
(Graphs of chop(x) over x in [-4, 4] omitted from transcript.)

8 Bias in two's complement truncation
Table (values omitted in transcript): x (binary), x (decimal), chop(x) (binary), chop(x) (decimal), Error (decimal).
Assuming all combinations of positive and negative values of x are equally probable, the average error is negative: truncation introduces a systematic downward bias.
In general, average error = -(2^(-L') - 2^(-L)) / 2, where L' is the new number of fractional bits.
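The bias formula can be checked empirically; this sketch sweeps every value with L = 8 fractional bits over one full signed range and averages the truncation error (the choice of L and L' here is arbitrary):

```python
import math

# Numbers carry L = 8 fractional bits and are chopped to L' = 2.
L, L_new = 8, 2
step, ulp_new = 2.0 ** -L, 2.0 ** -L_new

# chop(x) keeps L' fractional bits by flooring onto the coarser grid,
# which models dropping the low bits of a two's-complement word.
errors = [math.floor(i * step / ulp_new) * ulp_new - i * step
          for i in range(-(1 << L), 1 << L)]
avg = sum(errors) / len(errors)

predicted = -(2.0 ** -L_new - 2.0 ** -L) / 2
print(avg, predicted)   # the two agree
```

All quantities here are exact dyadic fractions, so the empirical average matches the formula exactly rather than merely approximately.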

9 Implementing truncation in hardware
Easy: just ignore (i.e. truncate) the fractional digits in positions L'+1 through L:
x_{k-1} x_{k-2} .. x_1 x_0 . x_{-1} x_{-2} .. x_{-L}  →  y_{k-1} y_{k-2} .. y_1 y_0
(the dropped fractional digits are simply ignored; no logic is required)

10 2) Round to nearest number
Rounding to the nearest number is what we normally think of when we say "round":
(2.25)₁₀ → 010 (2)₁₀, Error = -0.25
(2.75)₁₀ → 011 (3)₁₀, Error = +0.25
(2.00)₁₀ → 010 (2)₁₀, Error = 0
The midway value can go either way:
(2.5)₁₀ → 011 (3)₁₀, Error = +0.5 [round-half-up (arithmetic rounding)]
(2.5)₁₀ → 010 (2)₁₀, Error = -0.5 [round-half-down]

11 Round-half-up: dealing with negative numbers
For negative numbers, round-half-up can treat the midway value in two ways:
(-2.25)₁₀ → 110 (-2)₁₀, Error = +0.25
(-2.75)₁₀ → 101 (-3)₁₀, Error = -0.25
(-2.00)₁₀ → 110 (-2)₁₀, Error = 0
(-2.5)₁₀ → 110 (-2)₁₀, Error = +0.5 [asymmetric implementation]
(-2.5)₁₀ → 101 (-3)₁₀, Error = -0.5 [symmetric implementation]
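The asymmetric implementation falls out naturally from "add half an ulp, then truncate", since two's-complement truncation is a floor operation; a minimal sketch (the function name is illustrative):

```python
import math

# Asymmetric round-half-up: add 1/2, then floor (which is what dropping
# two's-complement fraction bits does). Midpoints always move toward
# +infinity, for negative values as well.
def rtn_half_up(x):
    return math.floor(x + 0.5)

print(rtn_half_up(2.5), rtn_half_up(-2.5))     # 3 -2
print(rtn_half_up(-2.25), rtn_half_up(-2.75))  # -2 -3
```

Note the asymmetry: +2.5 rounds to 3 but -2.5 rounds to -2, which is the source of the positive error bias discussed on the next slides.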

12 Round to Nearest Function Graph: rtn(x)
Round-half-up version: asymmetric implementation vs. symmetric implementation. (Graphs omitted from transcript.)

13 Bias in two's complement round to nearest
Round-half-up, asymmetric implementation.
Table (values omitted in transcript): x (binary), x (decimal), rtn(x) (binary), rtn(x) (decimal), Error (decimal).
Assuming all combinations of positive and negative values of x are equally probable, the average error is smaller in magnitude than for truncation, but still not symmetric.
The problem is the midway value: rounding exactly at 2.5 or -2.5 always contributes a positive error, biasing the average upward.
We also have an overflow problem if we allocate only K' = K integer bits.
Example: rtn(011.10) = 100 → overflow in a 3-bit two's-complement integer.
This overflow occurs only for positive numbers near the maximum positive value, not for negative numbers.

14 Implementing round to nearest (rtn) in hardware
Round-half-up, asymmetric implementation. Two methods:
Method 1: Add a 1 in the position one digit to the right of the new LSB (i.e. digit position L'+1), then keep only L' fractional bits and truncate the rest:
  x_{k-1} x_{k-2} .. x_1 x_0 . x_{-1} x_{-2} .. x_{-L}
+                             1
= y_{k-1} y_{k-2} .. y_1 y_0 .            (ignore, i.e. truncate, the rest)
Method 2: Add the value of the digit one position to the right of the new LSB (digit L'+1) into the new LSB digit (digit L'), then keep only L' fractional bits and truncate the rest:
  x_{k-1} x_{k-2} .. x_1 x_0 . x_{-1} x_{-2} .. x_{-L}
+                      x_{-1}
= y_{k-1} y_{k-2} .. y_1 y_0 .            (ignore, i.e. truncate, the rest)
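Both methods can be modeled on plain integers, treating x_bits as the two's-complement value scaled by 2^L (the function names are illustrative):

```python
# Python's >> is an arithmetic shift, so negative values behave like
# two's-complement hardware.
def rtn_method1(x_bits, L, L_new):
    half_ulp = 1 << (L - L_new - 1)          # a 1 in digit position L'+1
    return (x_bits + half_ulp) >> (L - L_new)

def rtn_method2(x_bits, L, L_new):
    guard = (x_bits >> (L - L_new - 1)) & 1  # the first dropped digit
    return (x_bits >> (L - L_new)) + guard

x = 11   # 2.75 with L = 2 fractional bits
print(rtn_method1(x, 2, 0), rtn_method2(x, 2, 0))    # 3 3
print(rtn_method1(-10, 2, 0))                        # -2.5 rounds to -2
```

The two methods always agree: adding 2^(L-L'-1) before truncating generates a carry into the kept bits exactly when the first dropped digit is 1.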

15 Round to Nearest Even Function Graph: rtne(x)
To solve the problem with the midway value, we round to the nearest even number (or, alternatively, to the nearest odd number).
Fig.: Rounding to the nearest even number.
Fig.: R* rounding, or rounding to the nearest odd number.

16 Bias in two's complement round to nearest even (rtne)
The average error is now 0 (ignoring overflow). Cost: more hardware.
Table (values omitted in transcript): x (binary), x (decimal), rtne(x) (binary), rtne(x) (decimal), Error (decimal); one entry overflows.
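Round-to-nearest-even is easy to observe in software; for instance, Python's built-in round() implements it, so ties go to the even neighbor and the midpoint errors cancel on average:

```python
# Ties alternate direction: 1.5 and 2.5 both go to 2, 3.5 goes to 4.
print(round(1.5), round(2.5), round(3.5))  # 2 2 4
print(round(-1.5), round(-2.5))            # -2 -2
# Non-ties still round to the nearest value:
print(round(2.4), round(2.6))              # 2 3
```
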

17 4) Rounding toward infinity
We may need computation errors to be in a known direction.
Example: when computing an upper bound, a result larger than the correct value is acceptable, but a result smaller than the correct value could invalidate the bound.
Use upward-directed rounding (round toward +∞): up(x) is always greater than or equal to x.
Similarly, for lower bounds, use downward-directed rounding (round toward -∞): down(x) is always less than or equal to x.
We have already seen that round toward -∞ in two's complement can be implemented by truncation.
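A sketch of the two directed modes on a grid with L' fractional bits (function and parameter names are illustrative):

```python
import math

# up(x) >= x always: safe for upper bounds.
def up(x, L_new):
    ulp = 2.0 ** -L_new
    return math.ceil(x / ulp) * ulp

# down(x) <= x always: safe for lower bounds; identical to
# two's-complement truncation.
def down(x, L_new):
    ulp = 2.0 ** -L_new
    return math.floor(x / ulp) * ulp

print(up(2.3, 0), down(2.3, 0))    # 3.0 2.0
print(up(-2.3, 0), down(-2.3, 0))  # -2.0 -3.0
```

Note that up() and down() agree with each other (and with the exact value) whenever x is already representable on the coarser grid.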

18 Rounding Toward Infinity Function Graphs: up(x) and down(x)
(Graphs omitted from transcript.) down(x) can be implemented by chop(x) in two's complement.

19 Two's Complement Round to Zero
Two's-complement round to zero (inward rounding), inward(x), also exists. (Graph omitted from transcript.)

20 Other Methods
Note that in two's complement, round to nearest (rtn) involves an addition, which may have a carry propagating from the LSB all the way to the MSB. Rounding may therefore take as long as a full addition.
The carry chain can be broken using the following two techniques:
Jamming, or von Neumann rounding
ROM-based rounding

21 5) Jamming or von Neumann rounding
Chop, then force the LSB of the result to 1.
This has the simplicity of chopping, with the near-symmetry of ordinary rounding.
The maximum error is comparable to chopping (double that of rounding).
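Jamming reduces to a shift and an OR, with no adder at all; a minimal sketch on scaled integers (the function name is illustrative):

```python
# von Neumann (jamming) rounding: shift away the dropped digits, then OR
# a 1 into the new LSB. No addition, hence no carry chain.
def jam(x_bits, L, L_new):
    return (x_bits >> (L - L_new)) | 1

# With L = 3 fractional bits rounded to L_new = 0:
# 2.0, 2.5 and 3.5 all jam to 3 (the forced-odd result).
print(jam(16, 3, 0), jam(20, 3, 0), jam(28, 3, 0))  # 3 3 3
```

The result is always odd, which is why the errors of jamming are nearly symmetric even though each individual error can be as large as one ulp.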

22 6) ROM Rounding
Fig.: ROM rounding with an 8 × 2 table.
The ROM address is formed from the low-order integer digits together with the first fractional digit; the ROM data replaces those integer digits with their rounded value, while the high-order digits pass through unchanged:
x_{k-1} ... x_4 [x_3 x_2 x_1 x_0 . x_{-1}] x_{-2} ... x_{-l}  →  x_{k-1} ... x_4 [y_3 y_2 y_1 y_0]
(bracketed digits: ROM address → ROM data)
Example: rounding with a 32 × 4 table. The rounding result is the same as that of the round-to-nearest scheme in 31 of the 32 possible cases, but a larger error is introduced when x_3 = x_2 = x_1 = x_0 = x_{-1} = 1.
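The 32 × 4 table can be sketched as follows, assuming non-negative inputs for simplicity (the names rom and rom_round are illustrative):

```python
# Build the 32 x 4 ROM: the address is (x3 x2 x1 x0 . x-1), the data is
# the rounded 4-bit field y3..y0. Every entry is round-to-nearest except
# the all-ones address, which truncates so no carry leaves the 4-bit field.
rom = []
for addr in range(32):
    int_part, half_bit = addr >> 1, addr & 1
    rom.append(int_part if addr == 0b11111 else int_part + half_bit)

def rom_round(x_bits, L):
    """Round a non-negative value with L fractional bits to an integer."""
    addr = (x_bits >> (L - 1)) & 0b11111   # x3..x0 and x-1
    upper = x_bits >> (L + 4)              # x_{k-1}..x4 pass through unchanged
    return (upper << 4) | rom[addr]

print(rom_round(11, 1))   # 5.5  -> 6  (ordinary round-to-nearest entry)
print(rom_round(31, 1))   # 15.5 -> 15 (the one truncating entry)
```

Because the carry never crosses out of the 4-bit ROM output, the critical path is a single table lookup instead of a full-width carry chain.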

23 ANSI/IEEE Floating-Point Standard
The ANSI/IEEE standard includes four rounding modes:
Round to nearest even [default rounding mode]
Round toward zero (inward)
Round toward +∞ (upward)
Round toward -∞ (downward)
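These four modes map directly onto the rounding constants of Python's decimal module; a quick illustration (not part of the slides):

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_DOWN, ROUND_CEILING, ROUND_FLOOR

# The four IEEE modes: nearest-even, toward zero, toward +inf, toward -inf.
modes = [ROUND_HALF_EVEN, ROUND_DOWN, ROUND_CEILING, ROUND_FLOOR]
for v in ("2.5", "-2.5"):
    x = Decimal(v)
    print(v, [int(x.quantize(Decimal("1"), rounding=m)) for m in modes])
# 2.5  -> [2, 2, 3, 2]
# -2.5 -> [-2, -2, -2, -3]
```
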