Multiplier-less Multiplication by Constants

Slides:



Advertisements
Similar presentations
Programmable FIR Filter Design
Advertisements

ADSP Lecture2 - Unfolding VLSI Signal Processing Lecture 2 Unfolding Transformation.
Lecture 4 Introduction to Digital Signal Processors (DSPs) Dr. Konstantinos Tatas.
1 ECE734 VLSI Arrays for Digital Signal Processing Chapter 3 Parallel and Pipelined Processing.
Commercial FPGAs: Altera Stratix Family Dr. Philip Brisk Department of Computer Science and Engineering University of California, Riverside CS 223.
Integer & Fixed Point Addition and Multiplication CENG 329 Lab Notes By F. Serdar TAŞEL.
Chapter 4 Retiming.
Distributed Arithmetic
CENG536 Computer Engineering Department Çankaya University.
MM3FC Mathematical Modeling 3 LECTURE 3
Integer Multiplication and Division ICS 233 Computer Architecture and Assembly Language Dr. Aiman El-Maleh College of Computer Sciences and Engineering.
Digital Kommunikationselektronik TNE027 Lecture 4 1 Finite Impulse Response (FIR) Digital Filters Digital filters are rapidly replacing classic analog.
Digital Kommunikationselektronik TNE027 Lecture 3 1 Multiply-Accumulator (MAC) Compute Sum of Product (SOP) Linear convolution y[n] = f[n]*x[n] = Σ f[k]
Distributed Arithmetic: Implementations and Applications
A COMPARATIVE STUDY OF MULTIPLY ACCCUMULATE IMPLEMENTATIONS ON FPGAS Using Distributed Arithmetic and Residue Number System.
GPGPU platforms GP - General Purpose computation using GPU
Chapter 6-2 Multiplier Multiplier Next Lecture Divider
A Bit-Serial Method of Improving Computational Efficiency of Dot-Products 1.
07/19/2005 Arithmetic / Logic Unit – ALU Design Presentation F CSE : Introduction to Computer Architecture Slides by Gojko Babić.
Chapter 6 Digital Filter Structures
Copyright © 2001, S. K. Mitra Digital Filter Structures The convolution sum description of an LTI discrete-time system be used, can in principle, to implement.
Institute of Applied Microelectronics and Computer Engineering College of Computer Science and Electrical Engineering, University of Rostock Slide 1 Selected.
ECE 734 Project Implementation of Multiple Constant Multiplication Algorithms for FIR Filters Hamid Shojaei.
Department of Electrical and Computer Engineering Brian M. McCarthy Department of Electrical & Computer Engineering Villanova University ECE8231 Digital.
Lecture 11, Advance Digital Design
Institute of Applied Microelectronics and Computer Engineering College of Computer Science and Electrical Engineering, University of Rostock Slide 1 Spezielle.
ECE 448: Lab 7 Design and Testing of an FIR Filter.
EEE 503 Digital Signal Processing Lecture #2 : EEE 503 Digital Signal Processing Lecture #2 : Discrete-Time Signals & Systems Dr. Panuthat Boonpramuk Department.
CS 151: Digital Design Chapter 4: Arithmetic Functions and Circuits
DEPARTMENTT OF ECE TECHNICAL QUIZ-1 AY Sub Code/Name: EC6502/Principles of digital Signal Processing Topic: Unit 1 & Unit 3 Sem/year: V/III.
Recursive Architectures for 2DLNS Multiplication RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS - UNIVERSITY OF WINDSOR 11 Recursive Architectures for 2DLNS.
Integer Multiplication and Division COE 301 Computer Organization Dr. Muhamed Mudawar College of Computer Sciences and Engineering King Fahd University.
Chapter 8 Computer Arithmetic. 8.1 Unsigned Notation Non-negative notation  It treats every number as either zero or a positive value  Range: 0 to 2.
ELEC692 VLSI Signal Processing Architecture Lecture 12 Numerical Strength Reduction.
EEL 5722 FPGA Design Fall 2003 Digit-Serial DSP Functions Part I.
Reconfigurable Computing - Options in Circuit Design John Morris Chung-Ang University The University of Auckland ‘Iolanthe’ at 13 knots on Cockburn Sound,
DSP First, 2/e Lecture 11 FIR Filtering Intro. May 2016 © , JH McClellan & RW Schafer 2 License Info for DSPFirst Slides  This work released.
Chapter 4 Structures for Discrete-Time System Introduction The block diagram representation of the difference equation Basic structures for IIR system.
Application of digital filter in engineering
Structures for Discrete-Time Systems
CORDIC (Coordinate rotation digital computer)
Integer Multiplication and Division
DSP Design – Lecture 7 Unfolding cont. & Folding Fredrik Edman fredrik
Lecture 11 FIR Filtering Intro
CEN352 Dr. Nassim Ammour King Saud University
EEE4176 Applications of Digital Signal Processing
Lecture 12 Linearity & Time-Invariance Convolution
FFT-based filtering and the
By: Mohammadreza Meidnai Urmia university, Urmia, Iran Fall 2014
Digital Logic Last Time … This Time … Control Path, Arithmetic Ops a
Multipliers Multipliers play an important role in today’s digital signal processing and various other applications. The common multiplication method is.
DESIGN AND IMPLEMENTATION OF DIGITAL FILTER
{ Storage, Scaling, Summation }
Chap. 8 Datapath Units: Multiplier Design
لجنة الهندسة الكهربائية
Unsigned Multiplication
King Fahd University of Petroleum and Minerals
Unconventional Fixed-Radix Number Systems
UNIT V Linear Time Invariant Discrete-Time Systems
Z TRANSFORM AND DFT Z Transform
Overview Part 1 – Design Procedure Part 2 – Combinational Logic
UNIVERSITY OF MASSACHUSETTS Dept
Signal Processing First
UNIVERSITY OF MASSACHUSETTS Dept
Lecture 9 Digital VLSI System Design Laboratory
UNIVERSITY OF MASSACHUSETTS Dept
Zhongguo Liu Biomedical Engineering
Lecture 22 IIR Filters: Feedback and H(z)
Implementation of a De-blocking Filter and Optimization in PLX
Computer Architecture
Presentation transcript:

Multiplier-less Multiplication by Constants Dr. Shoab A. Khan

Multiplication by Constant In many algorithms a large percentage of multiplications are by constants Complexity of a general purpose multiplier is not required Generate Partial Products (PPs) only for 1s in the constant multiplier The number of PPs can be further reduced using canonic sign digit format

Example: FIR Filter In an FIR filter all coefficients are constant For a fully parallel implementation, general purpose multipliers are not required Coefficients are converted in canonic sign digit form

Canonic Sign Digit (CSD) No 2 consecutive bits are non-zero Contains minimum possible number of non-zero bits Representation is unique

CSD is obtained using string property Canonic Sign Digit (CSD) CSD is obtained using string property Examples a Q1.7 format number 01111111 = 20 - 2-7 = 10000001 01101111 → 01110001 → 10010001 k = 20 - 2 -3 - 2 –7 Kx = x20 - x2 -3 - x2 –7

FIR filter Convolution summation with constant coefficients h[k]

Conversion of FIR Coefficient in CSD Only one nonzero CSD digit for approximately each 20 dB of stopband attenuation Four non-zero digits per coefficient for 80 dB stopband attenuation

Example: CSD Representation Let a coefficient is 0 1 1 0 1 1 1 0 1 1 0 1 1 Converting to CSD and keeping 4 non-zero digits: 1 0 0 1 0 0 0 1 0 0 1 20 2-1 2-2 2-3 2-4 2-5 2-6 2-7 2-8 2-9 2-10

CSD multiplier 1 s s s 1 s s s s s s s 1 s s s s s s s s s s xn as it is 1 Shift xn by 3 Take 1’s complement Add a 1 to the LSB, sign extension s s s 1 Shift xn by 7 Take 1’s complement, Add a 1 to the LSB, sign extension s s s s s s s 1 Shift xn by 10 Take 1’s complement, Add a 1 to the LSB, sign extension s s s s s s s s s s

CSD Multiplier in 5-coeff FIR filter REG xn N Xn-1 Xn-2 Xn-3 Xn-4 h0 h1 h2 h3 h4

An Optimal Direct Form FIR Filter Architecture

Example: CSD Representation 0110111011011101 0110111011100101 0110111100100101 0111000100100101 1001000100100101

CANONICAL SIGNED DIGIT REPRESENTATION FOR FIR DIGITAL FILTERS CSD FIR paper CANONICAL SIGNED DIGIT REPRESENTATION FOR FIR DIGITAL FILTERS

Optimized DFG Transformation Use compression tree and remove the use of CPA in a feedback loop The result is kept in partial sum and partial carry form The first order difference equation changes to

Example 1: First Order IIR Filter DFG with one adder and one multiplier in the critical path. Transformed DFG with Wallace compression tree and CPA outside the feedback loop

Example 2: DFT 2nd Order IIR Filter

Example: Optimal Mapping: Design Option 1 Optimized implementation with CSD multipliers, compression trees and CPA outside the IIR filter

Design Option 2 Using unified reduction trees for the feedforward and feedback computations and CPA outside the filter

Design Option 3 CPA outside the feedback loop

FIR Filter: Direct Form

All multiplications are implemented as one compression tree and a single CPA

Example: Conversion to Fixed-Point h[n] = [0.0246 0.2344 0.4821 0.2344 0.0246] h[n] = round(h[n]*215) = [805 7680 15798 7680 805] 16’b0000_0110_0100_1010 16’b0011_1100_0000_0000 16’b0011_1101_1011_0110

Conversion in CSD

Keeping maximum of 4 non-zero CSD in each coefficient results in

Input to Compression Tree

CV Computation for first CSD multiplier

Pipelined DF FIR Filter Pipeline direct form FIR filter for FPGAs with DSP48 blocks

Transpose Direct Form FIR Filter xn Critical Path h0 h1 h2 h3 h4 X X X X X xnh2 xnh3 xnh4 xnh0 xnh1 + + + +

Critical Path

Filter Implementation

TD FIR with one stage of pipelining registers

Deeply pipelined TDF FIR filter with critical path equal to one full adder delay

Same Example

TDF Implementation

Example from the Book

Hybrid FIR Filter Structure

Hybrid Designs

Complexity Reduction Adv DSD contents

Complexity Reduction Constituent sub graphs that are shared in the original graph Example: three multipliers, 3, 53 and 585 with x

Optimized Implementation Selected sub-graphs from previous slide

Find common sub-expression Eliminate their re-use

Example: Common Sub-expression Elimination

Horizontal Common Sub-expressions for the example in the text

Vertical Sub-expressions Elimination

Common Sub Expression

Optimized implementation exploiting vertical common sub-expressions

Example of horizontal and vertical sub-expressions elimination

Distributed Arithmetic Based Design Yet another way of looking at dot product design

ROM for Distributed Arithmetic x2b x1b x0b Contents of ROM 1 A0 A1 A1+ A0 A2 A2+ A0 A2+ A1 A2+ A1+ A0

DA for computing the dot product of integer numbers for N=4 and K=3

Look-up table x2b x1b x0b Contents of ROM 1 A0 3 A1 -1 A1+ A0 2 A2 5 1 A0 3 A1 -1 A1+ A0 2 A2 5 A2+ A0 8 A2+ A1 4 A2+ A1+ A0 7

DA-based architecture for implementing an FIR filter of length L and N-bit data samples

Cycle by cycle working of DA Address LUT Accumulator 3’b100 5 000101_000 1 3’b111 7 001001_100 2 3’b000 -1 000011_110 3 3’b101 8 111001_111

DA-based parallel implementation of an 18-coefficient FIR filter setting L=3 and M=6

A LUT-less implementation of a DA-based FIR filter

A parallel implementation for M=K uses a 2:1 MUX, compression tree and a CPA

Reducing the output of the multiplexers using a CPA-based adder tree and one accumulator

DA-based IIR filter design

Two ROM-based design

One ROM-based design

DFT implementation using circular convolution

Optimized TDF implementation of the DF implementation in previous figure

Questions/Feedback !!