1 A Combined Decimal and Binary Floating-point Multiplier Charles Tsen, Sonia González-Navarro, Michael Schulte, Brian Hickmann, Katherine Compton 2009.

Presentation transcript:

1 A Combined Decimal and Binary Floating-point Multiplier Charles Tsen, Sonia González-Navarro, Michael Schulte, Brian Hickmann, Katherine Compton th IEEE International Conference on Application-specific Systems, Architectures and Processors

2 Presented by: Mehrnoosh Janbakhsh Feb 2010

3 In this presentation, we describe the first hardware design of a combined binary and decimal floating-point multiplier, based on the specifications in the IEEE 754-2008 floating-point standard. The multiplier operates on either 64-bit binary-encoded decimal floating-point (DFP) numbers or 64-bit binary floating-point (BFP) numbers.

4 IEEE 754-2008 defines two encodings for DFP numbers. The decimal encoding, in which the significand is encoded in decimal, is named Densely Packed Decimal (DPD). The binary encoding is commonly referred to as Binary Integer Decimal (BID) because the significand is encoded as an unsigned binary integer.

5 The multiplier uses the BID encoding for DFP multiplication and shares hardware between BFP and BID multiplication.

6 Outline i. Describes the BFP and BID data types ii. Reviews the BFP and BID multiplication algorithms iii. Introduces the combined BFP and BID algorithm iv. The synthesis results v. Future research

7 DFP AND BFP DATATYPES - Representation The BFP and DFP number formats use three fields to define a number: a sign, an exponent, and a significand. The value of a normalized BFP number is $(-1)^S \times C \times 2^{E - bias}$, where S is the sign, C is the significand, E is the biased exponent, and bias is a positive constant.

8 In DFP, S is again the sign, and the exponent E is biased by a value bias to allow negative exponents, so the value is $(-1)^S \times C \times 10^{E - bias}$. Unlike BFP, the significand C is an unsigned integer with p decimal digits of precision, and it is not normalized: it can take any value in the range $[0, 10^p - 1]$.
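The two value formulas can be checked numerically. The following is a minimal sketch, assuming the 64-bit formats (bias 1023 for binary64 and bias 398 for decimal64) and treating the BFP significand as a 53-bit integer scaled by 2^-52. The function names are illustrative and do not come from the presentation.

```python
from fractions import Fraction

BFP_BIAS = 1023  # binary64 exponent bias
DFP_BIAS = 398   # decimal64 exponent bias


def bfp_value(sign, c, e):
    """Value of a normalized binary64 number.

    c is the 53-bit significand as an integer (hidden bit included),
    interpreted as c / 2**52 so that it lies in [1, 2).
    """
    return (-1) ** sign * Fraction(c, 2 ** 52) * Fraction(2) ** (e - BFP_BIAS)


def dfp_value(sign, c, e):
    """Value of a decimal64 number with integer significand c."""
    return (-1) ** sign * c * Fraction(10) ** (e - DFP_BIAS)


# Because the DFP significand is not normalized, one value has several
# representations: 825 x 10^-2 and 8250 x 10^-3 are both 8.25.
same_value = dfp_value(0, 825, 396) == dfp_value(0, 8250, 395)
```

Exact rational arithmetic is used here so that the comparison between the two DFP representations is not affected by binary rounding.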

9 Example To clarify the floating-point formats, consider an example of how to represent the value in both BFP and BID systems. In 64-bit BFP, it is represented as (-1)^0 × ( …0) × 2^( ), where there are 52 binary zeros after the radix point of the significand. With the 64-bit BID encoding, it is represented as (-1)^0 × × 10^( ). In this case, the significand is represented as a binary integer 0… , where there are 47 zeros before the leftmost 1.

10 - Rounding Modes The direction of rounding is determined by the rounding mode, the sign, whether the closest representable number is odd or even, and the location of the infinitely precise result on the number line. IEEE 754-2008 specifies five rounding modes for floating-point numbers: round ties to even (RTE), round ties to away (RTA), round toward zero (RTZ), round toward negative (RTN), and round toward positive (RTP). RTA is required only for DFP; the other four modes apply to both BFP and DFP.
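How the five modes turn into an increment decision can be sketched as follows. This is an illustrative model only, not the presented hardware: the function names and the `cmp_half` encoding (discarded part compared with half a unit in the last place) are assumptions made for this example.

```python
def should_increment(mode, negative, lsb_odd, cmp_half, inexact):
    """Decide whether a truncated magnitude must be incremented.

    mode     : 'RTE', 'RTA', 'RTZ', 'RTN' or 'RTP'
    negative : True if the result is negative
    lsb_odd  : True if the truncated significand is odd
    cmp_half : -1, 0 or +1 (discarded part below, at, or above half an ulp)
    inexact  : True if any discarded digit or bit is non-zero
    """
    if not inexact:
        return False                      # exact results are never changed
    if mode == "RTZ":
        return False                      # truncate the magnitude
    if mode == "RTP":
        return not negative               # toward +infinity
    if mode == "RTN":
        return negative                   # toward -infinity
    if mode == "RTA":
        return cmp_half >= 0              # ties go away from zero
    if mode == "RTE":
        return cmp_half > 0 or (cmp_half == 0 and lsb_odd)
    raise ValueError(f"unknown rounding mode: {mode}")


def round_magnitude(value, divisor, mode, negative=False):
    """Drop the low part of a magnitude (value // divisor) and round it."""
    q, rem = divmod(value, divisor)
    cmp_half = (2 * rem > divisor) - (2 * rem < divisor)
    return q + should_increment(mode, negative, q % 2 == 1, cmp_half, rem != 0)
```

For example, `round_magnitude(25, 10, "RTE")` keeps the even neighbor 2, while `round_magnitude(25, 10, "RTA")` gives 3.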

11 - Special Values and Exceptions The exceptions are invalid operation, divide by zero, underflow, overflow, and inexact. The special values are infinity (INF), signaling Not-a-Number (sNaN), and quiet Not-a-Number (qNaN). The difference between the two NaNs is that an sNaN raises the invalid exception flag whenever it is an operand to an operation.

12 FLOATING-POINT MULTIPLICATION ALGORITHMS
Step 1: Decode inputs A and B to obtain $(sign_A, E_A, C_A)$ and $(sign_B, E_B, C_B)$. Also detect special input operands, such as NaN, zero, and INF.
Step 2: Compute the intermediate product $C_{IP} = C_A \cdot C_B$ with a binary multiplier. In parallel, compute the intermediate exponent $E_{IP} = E_A + E_B - bias$ and the final sign $sign_Z = sign_A \oplus sign_B$.
Step 3: Examine $C_{IP}$ to determine if rounding is needed. Rounding is needed if $C_{IP}$ exceeds p bits or digits.
Step 4: Create $C_Z$ via a conditional increment of $C_{TP}$ ($C_{IP}$ truncated to p bits or digits) based on r* and s*. If rounding causes a carry out, set $C_Z$ to 1,000,000,000,000,000 ($10^{15}$, the smallest 16-digit significand, for BID) and adjust the final exponent, $E_Z$.
Step 5: Encode the output, based on $(sign_Z, E_Z, C_Z)$.
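The steps above can be sketched in software for the BID case. This is a simplified model under stated assumptions: operands are already decoded into (sign, biased exponent, significand) triples, only round-ties-to-even is implemented, and special values, overflow, and underflow are ignored. The function name and the digit counting via `str` are illustrative choices, not the hardware method.

```python
P = 16              # decimal64 precision in digits
BIAS = 398          # decimal64 exponent bias
MAX_C = 10 ** P - 1  # largest p-digit significand


def bid_multiply(a, b):
    """Multiply two decoded BID operands (sign, biased exponent, significand)."""
    sign_a, e_a, c_a = a
    sign_b, e_b, c_b = b

    # Step 2: intermediate product, exponent, and sign.
    c_ip = c_a * c_b
    e_z = e_a + e_b - BIAS
    sign_z = sign_a ^ sign_b

    # Step 3: rounding is needed only if the product exceeds p digits.
    c_z = c_ip
    if c_ip > MAX_C:
        d = len(str(c_ip)) - P                 # digits to discard
        c_tp, discarded = divmod(c_ip, 10 ** d)
        e_z += d

        # Step 4: conditional increment (round ties to even only).
        half = 10 ** d // 2
        round_up = discarded > half or (discarded == half and c_tp % 2 == 1)
        c_z = c_tp + 1 if round_up else c_tp
        if c_z > MAX_C:                        # carry out of an all-9s value
            c_z = 10 ** (P - 1)
            e_z += 1

    # Step 5 (encoding) is omitted; the decoded triple is returned instead.
    return (sign_z, e_z, c_z)
```

For instance, multiplying significands 18645507 and 5363222357 gives the 17-digit product 99,999,999,999,999,999, which truncates to sixteen 9s and rounds up, so the carry-out branch produces significand 10^15 with the exponent raised by two in total.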

13 COMBINED MULTIPLIER DESIGN - Operand Decoder and Encoder The exponent and significand widths differ by only one bit between BID and BFP. Thus, each input is decoded into 70 bits: 1 bit for the sign, 11 bits for the exponent, 54 bits for the significand, and 4 bits to indicate a special value using a one-hot encoding.
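A software sketch of such a decoder is shown below. The field layouts follow the binary64 and decimal64 (BID) interchange formats, but the assignment of the four one-hot bits (zero, INF, qNaN, sNaN) and the packing order of the 70-bit word are assumptions made for illustration; the presentation does not specify them.

```python
import struct

# Hypothetical one-hot special-value flags (4 bits).
ZERO, INF, QNAN, SNAN = 1, 2, 4, 8


def decode_bfp64(bits):
    """Decode a binary64 bit pattern into (sign, exponent, significand, special)."""
    sign = bits >> 63
    exp = (bits >> 52) & 0x7FF
    frac = bits & ((1 << 52) - 1)
    if exp == 0x7FF:                       # INF or NaN
        if frac == 0:
            return sign, exp, 0, INF
        return sign, exp, frac, QNAN if frac >> 51 else SNAN
    if exp == 0:                           # zero or subnormal: no hidden bit
        if frac == 0:
            return sign, 0, 0, ZERO
        return sign, 1, frac, 0            # subnormals use the minimum exponent
    return sign, exp, (1 << 52) | frac, 0  # normal: restore the hidden bit


def decode_bid64(bits):
    """Decode a decimal64 BID bit pattern into the same 4-tuple."""
    sign = bits >> 63
    top5 = (bits >> 58) & 0x1F
    if top5 == 0x1F:                       # NaN; bit 57 marks signaling
        return sign, 0, 0, SNAN if (bits >> 57) & 1 else QNAN
    if top5 == 0x1E:
        return sign, 0, 0, INF
    if (bits >> 61) & 0x3 == 0x3:          # long form: implied '100' prefix
        exp = (bits >> 51) & 0x3FF
        sig = (1 << 53) | (bits & ((1 << 51) - 1))
    else:
        exp = (bits >> 53) & 0x3FF
        sig = bits & ((1 << 53) - 1)
    if sig > 10 ** 16 - 1:                 # non-canonical significand is zero
        sig = 0
    return sign, exp, sig, ZERO if sig == 0 else 0


def pack70(sign, exp, sig, special):
    """Pack the decoded fields as 1 + 11 + 54 + 4 = 70 bits."""
    return (sign << 69) | (exp << 58) | (sig << 4) | special


def float_bits(x):
    """Raw binary64 bit pattern of a Python float."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]
```

Both decoders return fields of the same width, which is what allows a single downstream datapath to handle either format.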

14 Block diagram of combined multiplier

15 SHARED HARDWARE IN OPERAND DECODER BLOCK Shared hardware | Unshared hardware - Significand decoding - Sign decoding - Special case detection - Exponent decoding - BFP subnormal detection - BID non-canonical zero detection

16 COMBINED MULTIPLIER DESIGN - Multiply Datapath

17 DATAPATH BLOCK DESCRIPTION This block multiplies the significands, $C_A$ and $C_B$, to obtain an intermediate product, $C_{IP}$, which has up to 107 significant bits. It also computes $C_{IP} \cdot w_d$ to truncate d decimal digits as the first step in rounding BID numbers. This 107 × 108-bit multiplication uses four 54 × 54-bit multiplies. In the datapath figure, the fully shaded portions represent hardware that is completely shared between the BID and BFP datapaths. The unshaded areas are dedicated to only one of the datatypes, and the partially shaded areas contain some shared circuitry and some dedicated circuitry.
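One way a wide multiplication can be built from four 54 × 54-bit multiplies is the schoolbook decomposition sketched below. The presentation does not say how the four partial products are ordered or combined, so the split into 54-bit halves and the function name are assumptions for illustration.

```python
MASK54 = (1 << 54) - 1


def mul_via_54x54(x, y):
    """Multiply two operands of up to 108 bits using four 54x54-bit products."""
    x_lo, x_hi = x & MASK54, x >> 54
    y_lo, y_hi = y & MASK54, y >> 54
    if x_hi > MASK54 or y_hi > MASK54:
        raise ValueError("operands must fit in 108 bits")

    # Four partial products, each one 54x54-bit multiply.
    ll = x_lo * y_lo
    lh = x_lo * y_hi
    hl = x_hi * y_lo
    hh = x_hi * y_hi

    # Align each partial product by the weight of its operand halves.
    return ll + ((lh + hl) << 54) + (hh << 108)
```

In hardware the four products could be issued to the same 54 × 54-bit multiplier on successive passes, which would be consistent with the multiply feedback path listed for BID rounding, although the slides do not spell this out.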

18 To determine if a BID value must be rounded, it is compared to $10^{16}$. To avoid a long carry chain, the multiplier examines the lower and upper 54 bits of PS and PC separately, since rounding is needed if any bit is set in the upper 54 bits. If the sum of the lower bits of PS and PC is greater than or equal to $10^{16}$, or if the OR'd bit of the upper halves is set, then rounding is needed for BID. For normalized BFP multiplication, $C_{IP}$ is known to be in the range [1.0, 4.0), so normalization consists of a conditional right shift by one bit and an OR tree to determine s*. The design sets a bit called ultimate if $C_{TP}$ is all 1s, indicating that incrementing it will cause a carry out.
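Both checks can be modeled in a few lines. This sketch assumes that PS and PC are the sum and carry words of a carry-save product (so PS + PC equals the intermediate product without wraparound) and that BFP significands are 53-bit integers; the function names and return layout are illustrative.

```python
MASK54 = (1 << 54) - 1
TEN16 = 10 ** 16


def bid_rounding_needed(ps, pc):
    """Check whether PS + PC has more than 16 decimal digits.

    Any set bit in the upper halves implies a value of at least 2**54,
    which already exceeds 10**16, so only the lower halves are added.
    """
    upper_set = (ps >> 54) != 0 or (pc >> 54) != 0      # the OR'd bit
    lower_sum = (ps & MASK54) + (pc & MASK54)
    return upper_set or lower_sum >= TEN16


def bfp_normalize(c_ip):
    """Normalize the product of two normalized 53-bit significands.

    c_ip lies in [2**104, 2**106), i.e. [1.0, 4.0) with 104 fraction bits.
    Returns (c_tp, exponent_adjust, r, s, ultimate).
    """
    shift = 53 if (c_ip >> 105) & 1 else 52              # conditional extra shift
    c_tp = c_ip >> shift                                 # 53-bit truncated product
    r = (c_ip >> (shift - 1)) & 1                        # first discarded bit
    s = (c_ip & ((1 << (shift - 1)) - 1)) != 0           # OR of the remaining bits
    ultimate = c_tp == (1 << 53) - 1                     # all 1s: increment carries out
    return c_tp, shift - 52, r, s, ultimate
```

Splitting the comparison this way means the only adder needed for the BID check is 54 bits wide rather than 108.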

19 SHARED HARDWARE IN MULTIPLY DATAPATH BLOCK
Shared hardware: 54x54-bit multiplier; right shifter; sticky calculation; exponent calculation (bias difference requires extra logic)
Unshared hardware: detect if BID rounding needed; multiply feedback path for BID rounding; BID rounding lookup tables; BID digit counting; detect all-1 significand for BFP; detect all-9 significand for BID

20 COMBINED MULTIPLIER DESIGN - Rounding Logic The same floating-point rounding technique is used for both BID and BFP: based on s* and r*, the sign of the result, and the rounding mode, the final result is determined by conditionally incrementing the upper bits of $C_{IP}$. SHARED HARDWARE IN ROUNDING LOGIC Shared hardware | Unshared hardware - Incrementer - Increment decision logic - Overflow detection - Underflow detection

21 COMBINED MULTIPLIER DESIGN - Control If a BID multiply enters the unit while it is idle, the operation begins immediately. Subsequent multiplies wait until the current BID multiply finishes, which takes five or fifteen cycles, depending on whether rounding is needed. BFP multiplies that enter the unit while it is idle are fully pipelined. Since BFP multiplies always take five cycles in this design, the control can keep track of how many cycles remain before the pipeline is empty. BID multiplication was given a variable latency (five to fifteen cycles) to exploit the common case in which no rounding is needed.
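The issue policy described above can be approximated with a small timing model. It is a sketch under several assumptions that the slides do not state explicitly: all operations are available from cycle 0 and issue in order, at most one operation issues per cycle, a BID multiply waits for the BFP pipeline to drain, and nothing issues while a BID multiply is in flight. The function name and tuple format are hypothetical.

```python
BFP_LATENCY = 5
BID_FAST = 5      # BID multiply, no rounding needed
BID_SLOW = 15     # BID multiply, rounding needed


def schedule(ops):
    """Return (issue_cycle, done_cycle) for each op.

    ops is a list of (kind, needs_rounding) with kind 'BFP' or 'BID';
    needs_rounding is ignored for BFP operations.
    """
    busy_until = 0   # cycle at which all in-flight work has finished
    bid_until = 0    # cycle at which the current BID multiply finishes
    next_issue = 0   # earliest cycle the next op may issue (in order)
    timeline = []
    for kind, needs_rounding in ops:
        if kind == "BID":
            issue = max(next_issue, busy_until)      # wait for an empty unit
            done = issue + (BID_SLOW if needs_rounding else BID_FAST)
            bid_until = done
        elif kind == "BFP":
            issue = max(next_issue, bid_until)       # pipelined behind other BFP ops
            done = issue + BFP_LATENCY
        else:
            raise ValueError(f"unknown operation kind: {kind}")
        busy_until = max(busy_until, done)
        next_issue = issue + 1
        timeline.append((issue, done))
    return timeline
```

With this model, two back-to-back BFP multiplies overlap, while a BID multiply that follows them cannot start until both have left the pipeline.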

22 Future work The design may provide more sophisticated communication with a scheduler to enable more than one BID multiply operation in flight. It could also be enhanced to allow BID and BFP operations to be interleaved.

23 Results The combined BFP and BID multiplier is modeled in register-transfer-level (RTL) Verilog. For baseline comparisons, standalone BID and BFP multipliers are also modeled. All three designs were simulated with hundreds of directed test cases and millions of random test cases using Mentor Graphics ModelSim. Synthesis was performed with Synopsys Design Compiler and TSMC's tcbn65gplus 65nm CMOS standard cell library.

24 SYNTHESIS RESULTS Design | Area (µm²) | Delay (ns) | Delay (FO4) - Standalone BFP - Standalone BID - Total Area of Standalone Multipliers - Combined BID and BFP

25 Results (continued) The combined BID and BFP multiplier occupies 58% of the total area of separate BFP and BID units. Its delay is slightly longer than that of the standalone BID multiplier and 37.8% longer than that of the standalone BFP unit.

26 CONCLUSIONS AND FUTURE WORK The goal of this research was to investigate hardware sharing opportunities for IEEE floating-point multiplication. The work shows that the sharing potential between BFP and BID may be beneficial to chip designers wishing to conserve area. Future work to improve the algorithms and designs for hardware sharing may lend further insights into sharing possibilities.

27 Any Questions?