Energy and Delay Improvement via Decimal Floating Point

Energy and Delay Improvement via Decimal Floating Point Units
Hossam A. H. Fahmy, Electronics and Communications Department, Cairo University, Egypt, and Ramy Raafat, Amira M. Abdel-Majeed, Rodina Samy, Tarek ElDeeb, Yasmin Farouk, SilMinds
©2009, 19th IEEE International Symposium on Computer Arithmetic
Vishesh Kalra, EE800, 11089943

INTRODUCTION
The paper stresses the energy savings achieved by implementing Decimal Floating Point (DFP) in hardware.
Reasons:
> Some decimal fractions, such as a tax amount or a sales discount, yield an infinitely recurring number when converted to binary. Hence a binary number with a finite number of bits cannot represent such fractions exactly.
> One study shows that in a large billing application such an error may be up to $5 million per year.
> A benchmarking study estimates that many financial applications spend over 75% of their execution time in DFP functions.
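The representation problem described above can be seen in a few lines of Python. This is only an illustrative sketch: the helper name `add_repeatedly` is hypothetical, and Python's software `decimal` module stands in for the hardware DFP units the paper proposes.

```python
from decimal import Decimal

def add_repeatedly(step, times):
    """Accumulate `step` a number of times, the way a billing loop might."""
    total = step - step          # zero of the same type as `step`
    for _ in range(times):
        total += step
    return total

# Binary floating point: 0.1 has no finite binary expansion,
# so ten additions do not land exactly on 1.0.
binary_total = add_repeatedly(0.1, 10)

# Decimal floating point: 0.1 is stored exactly, so the total is exact.
decimal_total = add_repeatedly(Decimal("0.1"), 10)

print(binary_total == 1.0)                 # False
print(decimal_total == Decimal("1.0"))     # True
print(Decimal(0.1))                        # the binary value actually stored for 0.1
```

The software `decimal` type gives the exact answer but at a cost in execution time, which is the overhead the hardware units in the paper aim to remove.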

Decimal Adder

Decimal Multiplier
The multiplier has two main paths: the significand path and the exponent path. The significand path relies on a parallel multiplier that generates the partial products in parallel and reduces them to two vectors (sum and carry) using a carry-save addition tree. These two vectors are then added using the newly proposed decimal adder.
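The significand path can be modelled in software as follows. This is a minimal behavioural sketch, not the circuit from the paper: the function names are hypothetical, digits are stored as plain integers instead of BCD, and the reduction is done sequentially with a simple 3:2 decimal carry-save step where real hardware would use a parallel tree.

```python
def to_digits(n, width):
    """Little-endian list of `width` decimal digits of n."""
    return [(n // 10**i) % 10 for i in range(width)]

def from_digits(digits):
    """Inverse of to_digits."""
    return sum(d * 10**i for i, d in enumerate(digits))

def carry_save_add(a, b, c):
    """Decimal 3:2 carry-save step: three digit vectors in, (sum, carry) out.

    Each digit position is handled independently, so no carry ripples
    along the word; the carry digit is simply placed one position left.
    """
    width = len(a)
    s, cy = [0] * width, [0] * width
    for i in range(width):
        t = a[i] + b[i] + c[i]          # at most 27
        s[i] = t % 10
        if t >= 10:
            # Never triggers at the top digit: the rows always sum to
            # the product, which fits in `width` digits.
            cy[i + 1] = t // 10
    return s, cy

def multiply_significands(a, b, p):
    """Multiply two p-digit significands into a 2p-digit product."""
    width = 2 * p
    # One partial product per multiplier digit, already shifted into place.
    rows = [to_digits(a * d * 10**j, width)
            for j, d in enumerate(to_digits(b, p))]
    # Reduce until only the sum and carry vectors remain.
    while len(rows) > 2:
        s, cy = carry_save_add(rows.pop(), rows.pop(), rows.pop())
        rows += [s, cy]
    # Final carry-propagate addition (the role of the decimal adder).
    return sum(from_digits(r) for r in rows)

print(multiply_significands(1234567, 7654321, 7))
```

Only the last addition propagates carries across the full width, which is why the reduction tree is kept in sum/carry form for as long as possible.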

FMA (FUSED MULTIPLY ADD)
The FMA uses a significand path similar to that of the multiplier described above and introduces the operand c, after alignment, as an initial partial product in the reduction tree. The product a×b has 2p digits, and the operand c is shifted either to the right or to the left relative to it. If c is shifted to the left, a final adder 3p digits wide is used to obtain the result, which is then rounded.
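The key property of the fused operation, that a×b + c is formed exactly and rounded only once, can be sketched as below. The assumptions are significant: operands are non-negative (significand, exponent) pairs, rounding is round-half-even, and Python's unbounded integers replace the fixed 3p-digit adder, so the bounded-shift logic of the hardware is not modelled. The function names are hypothetical.

```python
def round_half_even(value, exp, p):
    """Round a non-negative integer significand to at most p digits."""
    digits = len(str(value))
    if digits <= p:
        return value, exp
    drop = digits - p
    q, r = divmod(value, 10**drop)
    half = 5 * 10**(drop - 1)
    if r > half or (r == half and q % 2 == 1):
        q += 1
        if q == 10**p:              # rounding overflowed to p + 1 digits
            q //= 10
            drop += 1
    return q, exp + drop

def fma(sa, ea, sb, eb, sc, ec, p):
    """Return (significand, exponent) of a*b + c, rounded once to p digits."""
    prod, ep = sa * sb, ea + eb     # exact product, up to 2p digits
    e = min(ep, ec)                 # align both terms to the smaller exponent
    total = prod * 10**(ep - e) + sc * 10**(ec - e)
    return round_half_even(total, e, p)

# 15 * 15 + 0.001 with p = 3 digits
print(fma(15, 0, 15, 0, 1, -3, 3))
```

Because the product is never rounded before c is added, the result can differ from a separate multiply followed by an add, which rounds twice.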

DECIMAL DIV/SQRT
Algorithm used: modified Newton-Raphson. The unit iterates to find the reciprocal of b (for division) and the reciprocal square root of b (for square root). It uses the parallel, optimized FMA unit, keeps all other results in redundant form, and modifies the regular multiplication circuit to speed up the operation. The authors propose a new rounding scheme: truncate the quotient to p digits and check the actual remainder to decide on the correct rounding.
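The iteration formulas did not survive in this transcript. The textbook Newton-Raphson forms are x_{i+1} = x_i (2 - b x_i) for the reciprocal and x_{i+1} = x_i (3 - b x_i^2) / 2 for the reciprocal square root; the paper's modified version may differ in detail. The sketch below applies the textbook reciprocal iteration and then the truncate-and-check-remainder rounding idea to division. It assumes normalized, positive p-digit integer significands and round-half-even, uses Python's `decimal` module as the working arithmetic, and all function names are hypothetical.

```python
from decimal import Decimal, Context, localcontext

def reciprocal(b, digits):
    """Approximate 1/b to about `digits` digits with Newton-Raphson."""
    with localcontext() as ctx:
        ctx.prec = digits + 5                       # a few guard digits
        b = Decimal(b)
        # Crude 2-digit seed, standing in for a small lookup table.
        x = Context(prec=2).divide(Decimal(1), b)
        good = 1                                    # digits trusted in the seed
        while good < digits:
            x = x * (2 - b * x)                     # x_{i+1} = x_i (2 - b x_i)
            good *= 2                               # quadratic convergence
        return x

def divide(a, b, p):
    """Return (q, k) such that a / b rounds to q * 10**-k with p digits."""
    k = p - 1 if a >= b else p                      # makes the quotient p digits long
    n = a * 10**k
    with localcontext() as ctx:
        ctx.prec = 2 * p + 5
        q = int(Decimal(n) * reciprocal(b, 2 * p))  # truncated quotient estimate
    r = n - q * b                                   # actual remainder, exact
    while r < 0:                                    # fix an estimate that is too high
        q, r = q - 1, r + b
    while r >= b:                                   # or too low
        q, r = q + 1, r - b
    # The remainder decides the rounding: compare 2r against b.
    if 2 * r > b or (2 * r == b and q % 2 == 1):
        q += 1
        if q == 10**p:                              # rounding overflowed
            q, k = q // 10, k - 1
    return q, k

print(divide(2000000000000000, 3000000000000000, 16))
```

Checking the exact remainder avoids carrying enough extra quotient digits to resolve every rounding case from the approximation alone.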

Example of Rounding

Energy Evaluation
To verify the designs in real hardware, synthesis is done on the Altera Cyclone II development kit. On the FPGA, the hardware connects to a Nios II processor as a memory-mapped slave component on the Avalon bus. Due to the limitations of this configuration, the operands are transmitted to the designs over several clock cycles. This connection, however, may be retrofit with a DFP acceleration card.

Result
The divider delay is less than one third of the 2300 FO4 of the decimal64 design proposed by Wang and Schulte. Using Altera's PowerPlay estimator tool, the estimated average power per instruction is 109 mW with the FPGA running at 50 MHz. Energy-delay product of over 500.
