AICCSA’06 Sharja 1 A CAD Tool for Scalable Floating Point Adder Design and Generation Using C++/VHDL By Asim J. Al-Khalili.

1 AICCSA’06 Sharja 1 A CAD Tool for Scalable Floating Point Adder Design and Generation Using C++/VHDL By Asim J. Al-Khalili

2 AICCSA’06 Sharja 2 Overview
- Introduction to Floating Point Addition
- Architecture of Single Path FADD
- Activity Scaling
- Triple Data Path Floating Point Adder
- VHDL Modeling
- Results
- Implementation

3 AICCSA’06 Sharja 3 FP Representation: $1.XXXXX_2 \times 2^{YYYY}$ (IEEE 754 floating-point standard, single precision)

4 AICCSA’06 Sharja 4 Floating point Addition
Start
1. Compare the exponents of the two numbers.
2. Shift the smaller number to the right until its exponent matches the larger exponent.
3. Add the significands.
4. Normalize the sum, either shifting right and incrementing the exponent or shifting left and decrementing the exponent.
   Overflow/Underflow? Yes: Exceptions. No: continue with step 5.
5. Round the significand to the appropriate number of bits.
   Still normalized? No: return to step 4. Yes: Done.
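The five steps above can be traced in a small software model. This is only a sketch under strong assumptions that are not in the slides: a toy format with an 8-bit significand (`kPrecision`), same-sign operands only, rounding by truncation, and no overflow/underflow handling. The names `ToyFloat` and `toy_add` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

constexpr int kPrecision = 8;  // illustrative significand width (assumption)

// Toy positive floating-point value: significand * 2^(exponent - (kPrecision - 1)).
struct ToyFloat {
    int exponent;               // unbiased exponent
    std::uint32_t significand;  // kPrecision bits, hidden 1 stored in the top bit
};

// Adds two normalized, same-sign toy values following the five steps.
ToyFloat toy_add(ToyFloat a, ToyFloat b) {
    // Step 1: compare the exponents so that `a` holds the larger one.
    if (a.exponent < b.exponent) std::swap(a, b);

    // Step 2: shift the smaller operand right by the exponent difference.
    const int diff = a.exponent - b.exponent;
    const std::uint32_t aligned = (diff >= 32) ? 0u : (b.significand >> diff);

    // Step 3: add the significands.
    std::uint32_t sum = a.significand + aligned;
    int exponent = a.exponent;

    // Step 4: normalize. For same-sign addition only a carry-out can occur,
    // so one right shift with an exponent increment is enough.
    if (sum >> kPrecision) {
        sum >>= 1;
        ++exponent;
    }
    assert((sum >> (kPrecision - 1)) == 1u);  // result is normalized

    // Step 5: rounding is plain truncation here; the shifted-out bits are dropped.
    return {exponent, sum};
}
```

For example, `toy_add({0, 0x80}, {0, 0x80})` models 1.0 + 1.0: the sum produces a carry-out, so the significand is shifted right once and the exponent becomes 1.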

6 AICCSA’06 Sharja 6 Architecture Consideration: What’s the best architecture?

7 AICCSA’06 Sharja 7 FP adder functions include:
- Sign identification
- Exponent comparison
- Smaller significand right shift
- Significand comparison (if the exponents are equal)
- Significand inverter
- Addition and leading zero anticipation
- Normalization shifting left
- Rounding
- Shift after rounding
- Compensation shifting
- Exception handler

8 AICCSA’06 Sharja 8 Architecture of TDPFADD

9 AICCSA’06 Sharja 9 Transition activity scaling: state assertion conditions of TDPFADD

| State | Active data path | State assertion criterion | Activity scaled blocks |
|---|---|---|---|
| I | Bypass | Either exponent is zero or emax + 1, or edif > p | Entire TDPFADD except the Bypass data path and the Exponent, Control, and Result Int. Flag units |
| J | LZA | No Bypass and subtraction and edif ≤ 1 (LZs ≤ p) | Pre-alignment barrel shifter (large) |
| K | LZB | No Bypass and addition or edif > 1 (LZs ≤ 1) | LZA logic and normalization barrel shifter (large) |

10 AICCSA’06 Sharja 10 Probabilities of the Paths
With the IEEE single precision floating point data format, the probability that the FADD is in state A, B or C is given by $P(A) = 0.8177$, $P(B) = 0.1765$ and $P(C) = 0.0058$. Here, it is assumed that the exponents are independent, uniformly distributed random variables and that addition and subtraction are equally likely. With the IEEE double precision floating point format, $P(A) = 0.9484$, $P(B) = 0.0509$ and $P(C) = 7 \times 10^{-4}$.
The time averaged power consumption (expected value) of a transition activity scaled FADD whose operational states are represented by Fig. 2 is given by
$$\text{Power} = P(A)\,P_A + P(B)\,P_B + P(C)\,P_C$$
where $P_A$, $P_B$ and $P_C$ represent the time averaged power consumption of the FADD in states A, B and C respectively.
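The expected-power expression above is a probability-weighted sum, which a few lines of C++ can evaluate. The state probabilities come from the slide; the per-state power values passed in are placeholders that the caller must supply, since the slides do not list them. `StatePower` and `expected_power` are hypothetical names.

```cpp
#include <cassert>

// Probability of being in a state and the time averaged power drawn in it.
struct StatePower {
    double probability;
    double power_mw;  // per-state power in mW (caller-supplied, not from the slides)
};

// Expected power: P(A)*P_A + P(B)*P_B + P(C)*P_C.
double expected_power(const StatePower& a, const StatePower& b, const StatePower& c) {
    // The three states are exhaustive, so the probabilities should sum to about 1.
    const double total = a.probability + b.probability + c.probability;
    assert(total > 0.999 && total < 1.001);
    return a.probability * a.power_mw + b.probability * b.power_mw +
           c.probability * c.power_mw;
}
```

With the single precision probabilities, most of the weight (0.8177) falls on state A, so the power of the path active in that state dominates the average.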

11 AICCSA’06 Sharja 11 Pipelined TDPFADD

12 AICCSA’06 Sharja 12 Architecture Consideration
Straightforward IEEE floating-point addition algorithm:
1. Exponent subtraction.
2. Alignment.
3. Significand addition.
4. Conversion.
5. Leading-one detection.
6. Normalization.
7. Rounding.
Advantages:
1. Positive result, eliminate complement
2. Comparison // Alignment
3. Full normalization // Rounding

13 AICCSA’06 Sharja 13 Compound Adder How can a compound adder compute fastest?

14 AICCSA’06 Sharja 14 Compound Adder The Compound adder computes simultaneously the sum and the sum plus one, and then the correct rounded result is obtained by selecting according to the requirements of the rounding.
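The selection idea can be sketched in software as follows. In hardware the two results are produced in parallel by shared carry logic; this model simply computes both values and picks one, so it shows the interface rather than the speed-up. `CompoundSum`, `compound_add` and `select_rounded` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Both candidate results of a compound adder.
struct CompoundSum {
    std::uint64_t sum;
    std::uint64_t sum_plus_one;
};

// Produces sum and sum + 1 for two significands (64-bit result avoids overflow).
CompoundSum compound_add(std::uint32_t a, std::uint32_t b) {
    const std::uint64_t sum = static_cast<std::uint64_t>(a) + b;
    return {sum, sum + 1};
}

// The rounding logic only has to drive a select line; no second carry
// propagation is needed after the rounding decision is known.
std::uint64_t select_rounded(const CompoundSum& c, bool round_up) {
    assert(c.sum_plus_one == c.sum + 1);
    return round_up ? c.sum_plus_one : c.sum;
}
```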

15 AICCSA’06 Sharja 15 Architecture Consideration Cont. (compared to single path)
Reduce latency:
FAR data-path:
-- No conversion
-- No full normalization
-- No LOP
CLOSE data-path:
-- No full alignment
The latency of the floating-point addition can be improved if the rounding is combined with the addition/subtraction.
Reduce total path delay:
-- Eliminate comparator
Increase area:
-- Two 2’s complement adders

17 AICCSA’06 Sharja 17 Comparison of low latency architectures of TDPFADD and single data path FADD using 0.13 micron CMOS technology

| Parameters | TDPFADD | Single data path FADD |
|---|---|---|
| Maximum Delay, D (ns) | 13.62 | 19.54 |
| Average Power, P_a (mW) at 16.7 MHz | 2.95 | 15.72 |
| Worst case Power, P_w (mW) at 16.7 MHz | 4.2 | 15.13 |
| Power using real data, P_real (mW) at 16.7 MHz | 3.4 | 14.58 |
| Area, A (10^4 cell-area) | 3.62 | 2.24 |
| Power-Delay Product, PD (ns.mW) | 40.18 | 307.16 |
| Area-Power Product, AP (10^4 cell-area.mW) | 10.68 | 35.21 |
| Area-Delay Product, AT (10^4 cell-area.ns) | 49.30 | 43.76 |
| Area-Delay^2 Product, AT^2 (10^4 cell-area.ns^2) | 671.5 | 855.2 |

18 AICCSA’06 Sharja 18 Comparison of low latency architectures of TDPFADD and single data path FADD using FPGA technology

| Parameters | TDPFADD | Single data path FADD |
|---|---|---|
| Maximum Delay, D (ns) | 71.27 | 109.21 |
| Average Power, P_a (W) at 2.38 MHz | 0.113 | 0.204 |
| Worst case Power, P_w (W) at 2.38 MHz | 0.196 | 0.205 |
| Power using real data, P_real (W) at 2.38 MHz | 0.138 | 0.183 |
| Area, A, Total CLBs (#) | 115 | 73.7 |
| Power-Delay Product, PD (ns.10mW) | 8.85 | 22.27 |
| Area-Power Product, AP (10#.10mW) | 12.99 | 15.03 |
| Area-Delay Product, AT (10#.ns) | 8196 | 8048 |
| Area-Delay^2 Product, AT^2 (10#.ns^2) | 58.41 x 10^4 | 87.90 x 10^4 |

19 AICCSA’06 Sharja 19 Comparison of pipelined architectures of TDPFADD and single data path FADD using 0.13 micron CMOS technology

| Parameters | TDPFADD | Single data path FADD |
|---|---|---|
| Maximum Delay, D (ns) | 5.78 | 6.35 |
| Average Power, P_a (mW) at 50 MHz | 3.87 | 6.00 |
| Worst case Power, P_w (mW) at 50 MHz | 4.51 | 5.71 |
| Power using real data, P_real (mW) at 50 MHz | 3.94 | 5.50 |
| Area, A (10^4 cell-area) | 5.46 | 4.44 |
| Power-Delay Product, PD (ns.mW) | 22.36 | 38.1 |
| Area-Power Product, AP (10^4 cell-area.mW) | 21.13 | 26.64 |
| Area-Delay Product, AT (10^4 cell-area.ns) | 31.55 | 28.19 |
| Area-Delay^2 Product, AT^2 (10^4 cell-area.ns^2) | 182.40 | 179.03 |

20 AICCSA’06 Sharja 20 Comparison of pipelined structures of TDPFADD and single data path FADD using FPGA technology

| Parameters | TDPFADD | Single data path FADD |
|---|---|---|
| Maximum Delay, D (ns) | 33.70 | 45.08 |
| Average Power, P_a (W) at 5 MHz | 0.089 | 0.111 |
| Worst case Power, P_w (W) at 5 MHz | 0.1130 | 0.1197 |
| Power using real data, P_real (W) at 5 MHz | 0.096 | 0.1141 |
| Area, A, Total CLBs (#) | 147.11 | 104.66 |
| Power-Delay Product, PD (ns.10mW) | 2.999 | 5.01 |
| Area-Power Product, AP (10#.10mW) | 13.09 | 11.61 |
| Area-Delay Product, AT (10#.ns) | 4957.60 | 4718.07 |
| Area-Delay^2 Product, AT^2 (10#.ns^2) | 1.67 x 10^4 | 21.26 x 10^4 |

21 AICCSA’06 Sharja 21 VHDL Modeling
Design idea:
1. The length and depth parameters needed by some components are defined in the package pkg.vhd.
2. The parameters in pkg.vhd are created by a C/C++ program from the user-defined exponent and significand lengths.
3. The VHDL components and the created pkg.vhd together generate the FP adder.

22 AICCSA’06 Sharja 22 VHDL Generation
1. Get the parameter lengths from the user.
2. C++ program: calculate the needed parameters.
3. Package pkg.vhd + structural VHDL code of the floating point adder = VHDL code.
4. Synthesize the floating point adder hardware.

23 AICCSA’06 Sharja 23 Calculating the Parameters Using C/C++

24 AICCSA’06 Sharja 24 Implementation Example 1. Input: Exponent Length = 8, Significand Length = 23

25 AICCSA’06 Sharja 25 Generated package pkg.vhd:

library ieee;
use ieee.std_logic_1164.all;

package pkg is
  constant Exponent_Length    : positive := 8;
  constant Significand_Length : positive := 23;
  constant HideSig_Length     : positive := 27;
  constant HideSig_Depth      : positive := 5;
  constant LZA_Length         : positive := 28;
  constant LZA_Depth          : positive := 5;
  constant LZA_P2_Length      : positive := 32;
end pkg;

26 AICCSA’06 Sharja 26 The synthesized FP Adder

29 AICCSA’06 Sharja 29 Simulation and Test Result

30 AICCSA’06 Sharja 30 Implementation Example 2. Input: Exponent Length = 4, Significand Length = 11

31 AICCSA’06 Sharja 31 Generated package pkg.vhd:

library ieee;
use ieee.std_logic_1164.all;

package pkg is
  constant Exponent_Length    : positive := 4;
  constant Significand_Length : positive := 11;
  constant HideSig_Length     : positive := 15;
  constant HideSig_Depth      : positive := 4;
  constant LZA_Length         : positive := 16;
  constant LZA_Depth          : positive := 4;
  constant LZA_P2_Length      : positive := 16;
end pkg;
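The slides do not show the C/C++ generator itself, so the following is only a sketch of what such a program might look like. The formulas are inferred from the two generated packages (8/23 and 4/11) and are an assumption, not the author's stated method: HideSig_Length = Significand_Length + 4, LZA_Length = Significand_Length + 5, each depth is the ceiling of log2 of the matching length, and LZA_P2_Length is 2 raised to LZA_Depth. The names `PkgParams`, `derive_params`, `ceil_log2` and `emit_pkg` are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Smallest d such that 2^d >= n.
unsigned ceil_log2(unsigned n) {
    assert(n >= 1 && n <= (1u << 30));
    unsigned bits = 0;
    while ((1u << bits) < n) ++bits;
    return bits;
}

struct PkgParams {
    unsigned exponent_length;
    unsigned significand_length;
    unsigned hidesig_length;
    unsigned hidesig_depth;
    unsigned lza_length;
    unsigned lza_depth;
    unsigned lza_p2_length;
};

// Relations inferred from the two example packages (assumption).
PkgParams derive_params(unsigned exponent_length, unsigned significand_length) {
    assert(exponent_length >= 1 && significand_length >= 1);
    PkgParams p{};
    p.exponent_length = exponent_length;
    p.significand_length = significand_length;
    p.hidesig_length = significand_length + 4;
    p.hidesig_depth = ceil_log2(p.hidesig_length);
    p.lza_length = significand_length + 5;
    p.lza_depth = ceil_log2(p.lza_length);
    p.lza_p2_length = 1u << p.lza_depth;
    return p;
}

// Writes the VHDL package text in the same shape as the generated examples.
std::string emit_pkg(const PkgParams& p) {
    std::ostringstream out;
    out << "library ieee;\nuse ieee.std_logic_1164.all;\n\npackage pkg is\n";
    auto line = [&out](const char* name, unsigned value) {
        out << "  constant " << name << " : positive :=" << value << ";\n";
    };
    line("Exponent_Length", p.exponent_length);
    line("Significand_Length", p.significand_length);
    line("HideSig_Length", p.hidesig_length);
    line("HideSig_Depth", p.hidesig_depth);
    line("LZA_Length", p.lza_length);
    line("LZA_Depth", p.lza_depth);
    line("LZA_P2_Length", p.lza_p2_length);
    out << "end pkg;\n";
    return out.str();
}
```

Calling `emit_pkg(derive_params(8, 23))` reproduces the constants of Example 1, and `derive_params(4, 11)` those of Example 2; other lengths are extrapolations of the inferred relations.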

32 AICCSA’06 Sharja 32 The synthesized FP Adder

33 AICCSA’06 Sharja 33 The Synthesized FADD

35 AICCSA’06 Sharja 35 Conclusion
- A scalable-length FP adder is generated.
- The length of the adder is given by the user through C/C++.
- The objective function is also stated.
- A structural-mode FP adder is modeled in VHDL.
- The adder is synthesizable.
- Depending on the power-area-delay requirement, a Simple, TDPFADD, Pipelined, or pipelined TDPFADD adder is generated.
- The adder can also be pipelined.

37 AICCSA’06 Sharja 37 VHDL Modeling
1. Package for Length and Depth Parameters
2. Components of the FP Adder
3. Top Configuration of the FP Adder

38 AICCSA’06 Sharja 38 1. Package for Length and Depth Parameters
Input parameters:
- Significand length
- Exponent length
Output parameters:
- Significand length for calculation
- Significand length for shifting
- Significand depth for shifting
- Exponent length

39 AICCSA’06 Sharja 39 Exponent Difference Calculates the difference of the two exponents.

40 AICCSA’06 Sharja 40 Significand Comparison

41 AICCSA’06 Sharja 41 Equation for Comparison
$$A > B \text{ if } (a_n > b_n) \lor (a_n = b_n \land a_{n-1} > b_{n-1}) \lor (a_n = b_n \land a_{n-1} = b_{n-1} \land a_{n-2} > b_{n-2}) \lor \dots$$
$$A = B \text{ if } a_n = b_n \land a_{n-1} = b_{n-1} \land a_{n-2} = b_{n-2} \land \dots$$
$$A < B \text{ if } (a_n < b_n) \lor (a_n = b_n \land a_{n-1} < b_{n-1}) \lor (a_n = b_n \land a_{n-1} = b_{n-1} \land a_{n-2} < b_{n-2}) \lor \dots$$
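The comparison equations amount to scanning both operands from the most significant bit and deciding at the first position where they differ. A minimal bit-serial sketch follows; the hardware evaluates all terms in parallel, and the names `Cmp` and `compare_significands` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

enum class Cmp { Less, Equal, Greater };

// Compares two `width`-bit significands, most significant bit first.
Cmp compare_significands(std::uint32_t a, std::uint32_t b, int width) {
    assert(width >= 1 && width <= 32);
    for (int i = width - 1; i >= 0; --i) {
        const bool ai = (a >> i) & 1u;
        const bool bi = (b >> i) & 1u;
        // All higher bits were equal, so the first differing bit decides.
        if (ai != bi) return ai ? Cmp::Greater : Cmp::Less;
    }
    return Cmp::Equal;  // every bit pair matched
}
```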

42 AICCSA’06 Sharja 42 Right Shifter and GRS-bit Generation

43 AICCSA’06 Sharja 43 Right Shifter and GRS-bit Generation Right Shift with variable length
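A right shift that also produces guard, round and sticky (GRS) bits can be modeled as below. This is a sketch of the usual definition, assumed here because the slide only shows a figure: guard and round are the first two bits shifted out, and sticky is the OR of everything shifted out after them. `AlignedSignificand` and `shift_right_grs` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

struct AlignedSignificand {
    std::uint32_t significand;  // shifted significand
    bool guard;                 // first bit shifted out
    bool round;                 // second bit shifted out
    bool sticky;                // OR of all remaining shifted-out bits
};

AlignedSignificand shift_right_grs(std::uint32_t significand, unsigned shift) {
    // Append two zero bits that will receive the guard and round bits.
    const std::uint64_t extended = static_cast<std::uint64_t>(significand) << 2;
    if (shift >= 34) {
        // Every bit moves past the round position and collapses into sticky.
        return {0u, false, false, significand != 0u};
    }
    const std::uint64_t shifted = extended >> shift;
    const std::uint64_t lost = extended & ((std::uint64_t{1} << shift) - 1);

    AlignedSignificand out{};
    out.significand = static_cast<std::uint32_t>(shifted >> 2);
    out.guard = (shifted >> 1) & 1u;
    out.round = shifted & 1u;
    out.sticky = lost != 0;
    // A zero shift loses nothing, so all three bits must be clear.
    assert(shift != 0 || (!out.guard && !out.round && !out.sticky));
    return out;
}
```

Shifting the pattern 1011 right by three positions leaves 1 with the bits 011 shifted out, so guard is 0, round is 1 and sticky is 1.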

44 AICCSA’06 Sharja 44 Manchester Adder/Subtractor

46 AICCSA’06 Sharja 46 Leading Zero Anticipation Logic: the anticipated count may be off by one bit.

47 AICCSA’06 Sharja 47 Leading Zero Counter

48 AICCSA’06 Sharja 48 Normalization Shifter (left barrel shifter)
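The leading zero counter and the left normalization shifter work as a pair: the count selects the shift amount and is subtracted from the exponent. A behavioral sketch, with hypothetical names (`leading_zero_count`, `normalize_left`) and a simple loop standing in for the counter tree and barrel shifter:

```cpp
#include <cassert>
#include <cstdint>

// Number of zero bits above the leading 1 in a `width`-bit value.
int leading_zero_count(std::uint32_t value, int width) {
    assert(width >= 1 && width <= 32);
    assert(width == 32 || (value >> width) == 0u);  // value fits in `width` bits
    int count = 0;
    for (int i = width - 1; i >= 0 && !((value >> i) & 1u); --i) ++count;
    return count;
}

struct Normalized {
    std::uint32_t significand;
    int exponent;
};

// Shifts the leading 1 into the top bit and compensates the exponent.
Normalized normalize_left(std::uint32_t significand, int exponent, int width) {
    const int zeros = leading_zero_count(significand, width);
    if (zeros == width) return {0u, exponent};  // all-zero input: nothing to shift
    return {significand << zeros, exponent - zeros};
}
```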

49 AICCSA’06 Sharja 49 Rounding Logic: $G \cdot (M_0 + R + S)$

50 AICCSA’06 Sharja 50 A Half Full Adder

51 AICCSA’06 Sharja 51 3. Top Configuration of FP Adder: Significand, Exponent, Sign, Exception Handling

52 AICCSA’06 Sharja 52 Significand

53 AICCSA’06 Sharja 53 Exponent

54 AICCSA’06 Sharja 54 Sign Select Logic
1. Sign of the operand with the larger exponent
2. If the exponents are equal, sign of the operand with the larger significand

55 AICCSA’06 Sharja 55 Exception Handling

| Exponent | Significand | Object represented | Control Logic |
|---|---|---|---|
| 0 | 0 | 0 | 11 |
| 0 | Nonzero | Denormalized number | 01 |
| 1 to 254 | Anything | Floating-Point number | 00 |
| 255 | 0 | Infinity | 10 |
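The exception table can be expressed as a small classifier. To match the scalable adder, this sketch takes the exponent length as a parameter, so that 255 becomes "all exponent bits set"; with an exponent length of 8 it matches the table. The NaN case (maximum exponent with a nonzero significand) does not appear in the slide's table and is added here from the IEEE 754 encoding. `FpClass` and `classify` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

enum class FpClass { Zero, Denormalized, Normal, Infinity, NaN };

// Classifies an operand from its raw exponent and significand fields.
FpClass classify(std::uint32_t exponent, std::uint32_t significand, int exponent_length) {
    assert(exponent_length >= 1 && exponent_length < 32);
    const std::uint32_t all_ones = (1u << exponent_length) - 1u;  // 255 for 8 bits
    assert(exponent <= all_ones);

    if (exponent == 0u) {
        return significand == 0u ? FpClass::Zero : FpClass::Denormalized;
    }
    if (exponent == all_ones) {
        // NaN is not listed on the slide; it follows the IEEE 754 encoding.
        return significand == 0u ? FpClass::Infinity : FpClass::NaN;
    }
    return FpClass::Normal;  // exponent in 1 .. all_ones - 1
}
```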

56 AICCSA’06 Sharja 56 Comparison of synthesis results for IEEE 754 single precision FP addition using Xilinx Virtex-2 FPGA

| Parameters | SIMPLE | TDPFADD | PIPE/TDPFADD |
|---|---|---|---|
| Maximum delay, D (ns) | 327.6 | 213.8 | 101.11 |
| Average Power, P (mW) at 2.38 MHz | 1836 | 1024 | 382.4 |
| Area A, Total number of CLBs (#) | 664 | 1035 | 1324 |
| Power-Delay Product (ns.10mW) | 7.7 x 10^4 | 4.31 x 10^4 | 3.82 x 10^4 |
| Area-Delay Product (10#.ns) | 2.18 x 10^4 | 2.21 x 10^4 | 1.34 x 10^4 |
| Area-Delay^2 Product (10#.ns^2) | 7.13 x 10^6 | 4.73 x 10^6 | 1.35 x 10^6 |

57 AICCSA’06 Sharja 57 Main Blocks: What blocks are considered?
- Compound Adder with Flagged Prefix Adder (New)
- LOP with Concurrent Position Correction (New)
- Alignment Shifter
- Normalization Shifter

58 AICCSA’06 Sharja 58 Compound Adder Cont.
Round to nearest:
  if g=1
    if (LSB=1) OR (r+s=1)
      Add 1 to the result
    else
      Truncate at LSB
Round toward zero:
  Truncate
Round toward +Infinity:
  if sign=positive
    if any bits to the right of the result LSB=1
      Add 1 to the result
    else
      Truncate at LSB
  if sign=negative
    Truncate at LSB
Round toward -Infinity:
  if sign=negative
    if any bits to the right of the result LSB=1
      Add 1 to the result
    else
      Truncate at LSB
  if sign=positive
    Truncate at LSB
Rounding block (figure labels): Sum, Sum+1; Sum; Sum, Sum+1 and Sum+2
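The four rounding rules reduce to one decision per mode: whether 1 is added to the magnitude at the LSB. The sketch below encodes that decision, assuming `g`, `r` and `s` are the guard, round and sticky bits to the right of the result LSB and that the significand is held in sign-magnitude form. `RoundMode` and `needs_increment` are hypothetical names.

```cpp
#include <cassert>

enum class RoundMode { Nearest, TowardZero, TowardPlusInfinity, TowardMinusInfinity };

// Returns true when 1 must be added at the LSB, false when the result is truncated.
bool needs_increment(RoundMode mode, bool negative, bool lsb, bool g, bool r, bool s) {
    const bool any_discarded = g || r || s;  // any bit to the right of the LSB is 1
    switch (mode) {
        case RoundMode::Nearest:
            // Above the halfway point, or exactly halfway with an odd LSB.
            return g && (lsb || r || s);
        case RoundMode::TowardZero:
            return false;
        case RoundMode::TowardPlusInfinity:
            return !negative && any_discarded;
        case RoundMode::TowardMinusInfinity:
            return negative && any_discarded;
    }
    assert(false && "unknown rounding mode");
    return false;
}
```

The returned flag is the kind of signal that would choose between the precomputed Sum and Sum+1 outputs of the compound adder.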

