A COMPARATIVE STUDY OF MULTIPLY ACCUMULATE IMPLEMENTATIONS ON FPGAS Using Distributed Arithmetic and Residue Number System.

Presentation transcript:

A COMPARATIVE STUDY OF MULTIPLY ACCUMULATE IMPLEMENTATIONS ON FPGAS Using Distributed Arithmetic and Residue Number System

Project Scope: To compare the implementation efficiency (area times delay) of Distributed Arithmetic (DA), RNS, and DA-RNS based parallel multiply accumulate architectures on FPGAs.

Background and Context: FPGAs are increasingly used for DSP computations. FPGAs have potential for parallelism. The LUT-based FPGA architecture can be exploited. Novel MAC architectures are especially suitable for FPGAs.

Some More Background: In DSP, MACs use constant coefficients (a fixed multiplicand), so a full multiplier implementation is not required. Not all multiplier architectures are efficient on FPGAs.

Motivation: Distributed Arithmetic and Residue Arithmetic are both LUT-based techniques. Explore the "synergy" between the FPGA architecture and these techniques.

Distributed Arithmetic Overview

Basic Serial Architecture
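
The figure for this slide did not survive in the transcript. As a rough illustration only, the following Python sketch models a bit-serial DA dot product under common textbook assumptions: the coefficients are constants, the inputs are two's-complement integers of `nbits` bits, and a LUT with 2^K entries holds every partial sum of the K coefficients. The names `build_da_lut` and `da_dot` are hypothetical and are not taken from the presentation.

```python
def build_da_lut(coeffs):
    """Precompute all 2**K sums of subsets of the K constant coefficients.

    Entry `addr` holds the sum of coeffs[j] for every bit j set in addr.
    """
    k = len(coeffs)
    return [
        sum(c for j, c in enumerate(coeffs) if (addr >> j) & 1)
        for addr in range(1 << k)
    ]


def da_dot(coeffs, xs, nbits):
    """Bit-serial dot product of constant coeffs with two's-complement inputs.

    Assumes every x in xs fits in nbits bits (two's complement).
    """
    lut = build_da_lut(coeffs)
    acc = 0
    for b in range(nbits):
        # Gather bit b of every input into one LUT address.
        addr = 0
        for j, x in enumerate(xs):
            addr |= ((x >> b) & 1) << j
        term = lut[addr] << b
        # The sign bit carries weight -2**(nbits-1), so its term is subtracted.
        acc += -term if b == nbits - 1 else term
    return acc


coeffs = [3, -5, 7, 2]
xs = [10, -4, 0, -128]
result = da_dot(coeffs, xs, nbits=8)
```

In hardware the weighting by powers of two is usually done by shifting the accumulator each cycle rather than shifting the LUT output; the sketch shifts the term only because that is simpler to read in software.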

Residue Arithmetic Overview: $(z_1, z_2, \ldots, z_n) = (x_1, x_2, \ldots, x_n) \circ (y_1, y_2, \ldots, y_n)$, where $z_i = (x_i \circ y_i) \bmod m_i$ and $\circ$ denotes any of the modulo operations of addition, subtraction, or multiplication.
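
To make the channel-wise rule concrete, here is a minimal Python sketch. The moduli (7, 11, 13) are an arbitrary pairwise coprime set chosen for illustration, and the helper names `to_rns` and `rns_op` are hypothetical; neither comes from the presentation.

```python
import operator

# Assumed example moduli (pairwise coprime); dynamic range M = 7 * 11 * 13 = 1001.
MODULI = (7, 11, 13)


def to_rns(x, moduli=MODULI):
    """Represent integer x by its residue in each modulus."""
    return tuple(x % m for m in moduli)


def rns_op(op, xs, ys, moduli=MODULI):
    """Apply op (add, sub, or mul) independently in each residue channel."""
    return tuple(op(x, y) % m for x, y, m in zip(xs, ys, moduli))


a, b = 25, 17
sum_rns = rns_op(operator.add, to_rns(a), to_rns(b))
diff_rns = rns_op(operator.sub, to_rns(a), to_rns(b))
prod_rns = rns_op(operator.mul, to_rns(a), to_rns(b))
```

Each channel works only on small residues and never exchanges carries with the others, which is why the operations can run in parallel. The results are valid as long as the true result stays inside the dynamic range.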

Modulo Adder
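
The adder diagram is missing from the transcript. One common structure, assumed here and not confirmed by the slides, computes both `a + b` and `a + b - m` and selects whichever lies in the range 0 to m - 1. A behavioural Python sketch with the hypothetical name `mod_add`:

```python
def mod_add(a, b, m):
    """Modulo-m addition for operands already reduced to 0 <= a, b < m.

    Mirrors a two-adder-plus-multiplexer structure: one adder forms a + b,
    a second forms a + b - m, and the sign of the second result selects
    which of the two is the output.
    """
    s = a + b   # first adder
    t = s - m   # second adder (subtract the modulus)
    return t if t >= 0 else s
```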

Modulo Constant Multiplier: Because the residues are small and the multiplicand is constant, a direct LUT-based implementation is very efficient. [Figure: 4-bit constant modulo multiplier with LUT address inputs A0-A3 driven by X[3:0]; 5-bit constant modulo multiplier with LUT address inputs A0-A4 driven by X[4:0].]
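
A direct LUT implementation amounts to tabulating `(c * x) mod m` for every possible input word. The Python sketch below builds such a table; the constant 9 and modulus 13 are made-up example values, and `build_const_mod_mult_lut` is a hypothetical name.

```python
def build_const_mod_mult_lut(c, m, width):
    """Tabulate (c * x) mod m for every width-bit input word x.

    The table has 2**width entries, one per LUT address.
    """
    return [(c * x) % m for x in range(1 << width)]


# Assumed example: 4-bit residue input, constant 9, modulus 13.
lut4 = build_const_mod_mult_lut(c=9, m=13, width=4)
product = lut4[5]  # looks up (9 * 5) mod 13
```

A 4-bit input addresses 16 entries and a 5-bit input addresses 32, so the table size depends only on the residue width and not on the value of the constant.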

RNS MAC Architecture

Conversion Issues in RNS: Binary-to-RNS and RNS-to-binary conversion are significant overheads. Binary-to-RNS conversion is relatively simple. RNS-to-binary conversion using a direct CRT implementation requires modulo-M adders.
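
The two directions can be sketched in Python as follows. This is the textbook Chinese Remainder Theorem (CRT) formulation, not necessarily the circuit used in the presentation; the moduli (7, 11, 13) and the names `forward` and `reverse_crt` are assumptions made for illustration. The three-argument `pow` with exponent -1 requires Python 3.8 or later.

```python
from math import prod


def forward(x, moduli):
    """Binary to RNS: one independent reduction per modulus."""
    return tuple(x % m for m in moduli)


def reverse_crt(residues, moduli):
    """RNS to binary via a direct CRT sum, reduced modulo M = prod(moduli)."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m              # product of all the other moduli
        inv = pow(Mi, -1, m)     # multiplicative inverse of Mi modulo m
        # Each accumulation step is a modulo-M addition, which is the
        # wide operation that makes reverse conversion costly.
        total = (total + r * Mi * inv) % M
    return total


moduli = (7, 11, 13)
x = 425
recovered = reverse_crt(forward(x, moduli), moduli)
```

The forward direction touches only small moduli, whereas the reverse direction works modulo the full dynamic range M.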

Forward Conversion

Reverse Conversion

DA-RNS Coupling

Scaling Accumulator Design

DA Implementation: 8 Bits, 8 Taps, 12-Bit Coefficients

Critical Path Results: Source: PSC8_0_PSC_0/I_Q7 (FF). Destination: SACC24_REG2/I_Q3 (FF). Data Path: PSC8_0_PSC_0/I_Q7 to SACC24_REG2/I_Q3.