Reconfigurable Computing - Options in Circuit Design John Morris Chung-Ang University The University of Auckland ‘Iolanthe’ at 13 knots on Cockburn Sound,

Presentation transcript:

Reconfigurable Computing - Options in Circuit Design John Morris Chung-Ang University The University of Auckland ‘Iolanthe’ at 13 knots on Cockburn Sound, Western Australia

Design Options - so far
‘Structural Options’
1. Bit serial
   - Most space efficient
   - Slow
     - One bit of result produced per cycle
     - Sometimes this isn’t a problem
   - Example
     - Small efficient adder
     - Very small multiplier

Serial Circuits
- Bit serial adder

    LIBRARY ieee;
    USE ieee.std_logic_1164.ALL;

    ENTITY serial_add IS
      PORT( a, b, clk : IN std_logic;
            sum, cout : OUT std_logic );
    END ENTITY serial_add;

    ARCHITECTURE df OF serial_add IS
      SIGNAL cint : std_logic;  -- carry held from one cycle to the next
    BEGIN
      PROCESS( clk )
      BEGIN
        IF clk'EVENT AND clk = '1' THEN
          sum  <= a XOR b XOR cint;
          cint <= (a AND b) OR (b AND cint) OR (a AND cint);
        END IF;
      END PROCESS;
      cout <= cint;
    END ARCHITECTURE df;

[Figure: full adder (FA) with inputs a, b, c in and outputs sum, c out, feeding a 2-bit register driven by clock]
- Note: The synthesizer will insert the registers (flip-flops) on the signals assigned in the clocked process!
- Note: Reset or clear needed to frame operands!
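The cycle-by-cycle behaviour of the serial adder can be checked with a small software model. This is only a sketch in Python, not part of the original slides: the helper names `serial_add`, `to_bits` and `from_bits` are made up for illustration, and the carry is assumed to be cleared before the first operand bit arrives (the reset mentioned in the note).

```python
def serial_add(a_bits, b_bits):
    """Model the clocked full adder: one sum bit per cycle, LSB first."""
    carry = 0  # assumption: carry register cleared before the operands start
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        # Both outputs are computed from the carry stored in the previous cycle
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (b & carry) | (a & carry)
    return sum_bits, carry


def to_bits(value, width):
    """Return the LSB-first list of bits of value."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Rebuild an integer from an LSB-first list of bits."""
    return sum(bit << i for i, bit in enumerate(bits))


s, cout = serial_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(s), cout)  # 11 + 6 = 17, so the 4-bit sum is 1 with carry out 1
```

A 4-bit addition takes four clock cycles here, which is the "one bit of result per cycle" cost of the bit serial option.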

Design Options - so far
‘Structural Options’
1. Bit serial
   - Most space efficient
2. Sequential
   - Combinatorial / bit-parallel block + register
   - Example
     - Sequential multiplier: adder + shifter + register
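As a rough illustration of the adder + shifter + register structure, the sketch below models a shift-and-add multiplier that consumes one multiplier bit per cycle. It is a behavioural model in Python under assumed names (`sequential_multiply`, `high`, `low`), not the circuit from the lecture.

```python
def sequential_multiply(a, b, n):
    """Multiply two n-bit unsigned values with one adder, reused for n cycles."""
    mask = (1 << n) - 1
    high = 0          # accumulator half of the product register
    low = b & mask    # multiplier, shifted out one bit per cycle
    for _ in range(n):
        if low & 1:
            high += a  # the single adder; the sum may need n + 1 bits
        # Shift the combined (high, low) register right by one bit
        low = (low >> 1) | ((high & 1) << (n - 1))
        high >>= 1
    return (high << n) | low


print(sequential_multiply(3, 5, 4))  # 15
```

Only one n-bit adder is needed, at the price of n clock cycles per product.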

Design Options - so far
‘Structural Options’
1. Bit serial
2. Sequential
3. Pipelined
   - High throughput
   - High latency too though!
   - Need to achieve pipeline balance
     - Every stage should have similar propagation delay
     - More later!
   - Example
     - Pipelined multiplier
4. Examine communication patterns
   - Example
     - Eliminate horizontal carry chains in parallel array multiplier

Design Options - so far
‘Structural Options’
1. Bit serial
2. Sequential
3. Pipelined
4. Examine communication patterns
   - Example
     - Eliminate horizontal carry chains in parallel array multiplier

Multipliers
- We can add the partial products with FA blocks
[Figure: array of FA blocks adding the partial products; inputs a0 to a3 and b0 to b2, a 0 carry input, product bits p0, p1, …; labels: ‘product bits’, ‘Carry select adder’]
- Try to use a more efficient adder in each row?
- A simpler scheme uses a ‘carry save’ adder, which pushes the carry outs down to the next row!
- Note that an extra adder is needed below the last row to add the last partial products and the carries from the row above!
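The carry save idea can be sketched in a few lines. In the hypothetical Python model below (the names `carry_save_add` and `carry_save_multiply` are not from the slides), each row keeps a separate sum word and carry word, so no carry ripples along a row; the one carry-propagating addition happens at the very end.

```python
def carry_save_add(x, y, z):
    """3:2 compression: three operands in, a sum word and a carry word out."""
    sum_word = x ^ y ^ z
    # Each carry moves one column to the left, i.e. down to the next row
    carry_word = ((x & y) | (y & z) | (x & z)) << 1
    return sum_word, carry_word


def carry_save_multiply(a, b, n):
    """Add the n partial products of a * b row by row without rippling carries."""
    sum_word, carry_word = 0, 0
    for j in range(n):
        partial = (a << j) if (b >> j) & 1 else 0
        sum_word, carry_word = carry_save_add(sum_word, carry_word, partial)
    # The extra adder below the last row combines the two remaining words
    return sum_word + carry_word


print(carry_save_multiply(13, 11, 4))  # 143
```

The invariant is that `sum_word + carry_word` always equals the sum of the partial products added so far.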

Design Options - so far
‘Structural Options’
1. Bit serial
2. Sequential
3. Pipelined
4. Examine communication patterns
5. Tree structures
   - Example
     - Combine carries in level below
     - Wallace Tree multiplier

Signed digit arithmetic - Avoiding the carries!
- If we use more than one bit to represent each bit of an operand
- In binary, the partial products are trivial: if multiplier bit = 1, copy the multiplicand, else 0
- Use an ‘and’ gate!

Residue Arithmetic
- Residue Number Systems
  - A verse by the Chinese scholar, Sun Tsu, over 1500 years ago posed this problem
    - What number has remainders 2, 3 and 2 when divided by the numbers 7, 5 and 3, respectively?
  - This is probably the first documented use of number representations using multiple residues
- In a residue number system, a number, $x$, is represented by the list of its residues (remainders) with respect to $k$ pairwise relatively prime moduli, $m_{k-1}, m_{k-2}, \ldots, m_0$
  - Thus $x$ is represented by $(x_{k-1}, x_{k-2}, \ldots, x_0)$, where $x_i = x \bmod m_i$
- So the puzzle may be re-written: What is the decimal representation of (2,3,2) in RNS(7,5,3)?
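The puzzle is small enough to solve by simply trying every value in the dynamic range. The sketch below uses Python with made-up helper names (`to_rns`, `solve_by_search`); an exhaustive search is chosen only because it is easy to read, not because it is how a converter would be built.

```python
def to_rns(x, moduli):
    """Represent x by its residues with respect to each modulus."""
    return tuple(x % m for m in moduli)


def solve_by_search(residues, moduli):
    """Find the value in [0, M) whose residues match, by exhaustive search."""
    dynamic_range = 1
    for m in moduli:
        dynamic_range *= m
    for x in range(dynamic_range):
        if to_rns(x, moduli) == tuple(residues):
            return x
    raise ValueError("no solution: are the moduli relatively prime?")


print(solve_by_search((2, 3, 2), (7, 5, 3)))  # 23
```

So (2,3,2) in RNS(7,5,3) is 23, or any number that differs from 23 by a multiple of 105.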

Residue Number Systems
- The dynamic range of an RNS is $M = m_{k-1} \times m_{k-2} \times \cdots \times m_0$
  - For example, in the system RNS(8,7,5,3), $M = 8 \times 7 \times 5 \times 3 = 840$
  - Thus we have

    Decimal                    RNS(8,7,5,3)
    0 or 840 or -840 or …      (0,0,0,0)
    1 or 841 or -839 or …      (1,1,1,1)
    2 or 842 or …              (2,2,2,2)
    8 or 848 or …              (0,1,3,2)

- Any RNS can be viewed as a weighted representation
  - In RNS(8,7,5,3), the weights are: 105, 120, 336, 280
  - Thus (1,2,4,0) represents $\langle 105 \times 1 + 120 \times 2 + 336 \times 4 + 280 \times 0 \rangle_{840} = \langle 1689 \rangle_{840} = 9$
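The weights can be derived rather than looked up: the weight for modulus $m_i$ is the multiple of $M / m_i$ that leaves remainder 1 when divided by $m_i$. The Python sketch below assumes Python 3.8 or later (for `math.prod` and the three-argument `pow` with exponent -1); `rns_weights` and `from_rns` are illustrative names.

```python
from math import prod


def rns_weights(moduli):
    """Weight of each residue position, so that x = sum(w_i * x_i) mod M."""
    dynamic_range = prod(moduli)
    weights = []
    for m in moduli:
        others = dynamic_range // m      # product of all the other moduli
        inverse = pow(others, -1, m)     # multiplicative inverse modulo m
        weights.append(others * inverse)
    return weights


def from_rns(residues, moduli):
    """Convert a residue tuple back to the value in [0, M)."""
    weights = rns_weights(moduli)
    return sum(w * r for w, r in zip(weights, residues)) % prod(moduli)


print(rns_weights((8, 7, 5, 3)))             # [105, 120, 336, 280]
print(from_rns((1, 2, 4, 0), (8, 7, 5, 3)))  # 9
```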

Residue Number Systems - Operations
- Complement
  - To find $-x$, complement each of the digits with respect to the modulus for that digit
  - $21 = (5,0,1,0)$, so $-21 = (8-5, 0, 5-1, 0) = (3,0,4,0)$
- Addition or subtraction is performed on each digit
  - $(5,5,0,2)_{RNS} = 5_{10}$
  - $(7,6,4,2)_{RNS} = -1_{10}$
  - Sum: $(\langle 5+7 \rangle_8 = 4,\ \langle 5+6 \rangle_7 = 4,\ 4,\ \langle 2+2 \rangle_3 = 1)_{RNS} = (4,4,4,1)_{RNS} = 4_{10}$
- Multiplication is also achieved by operations on each digit
  - $(5,5,0,2)_{RNS} = 5_{10}$
  - $(7,6,4,2)_{RNS} = -1_{10}$
  - Product: $(\langle 5 \times 7 \rangle_8 = 3,\ \langle 5 \times 6 \rangle_7 = 2,\ 0,\ \langle 2 \times 2 \rangle_3 = 1)_{RNS} = (3,2,0,1)_{RNS} = -5_{10}$
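These digit-wise rules translate directly into code. The following Python sketch fixes the moduli to RNS(8,7,5,3) and reproduces the worked examples; the function names are chosen here for illustration only.

```python
MODULI = (8, 7, 5, 3)


def to_rns(x):
    """Residues of x; Python's % already maps negative x into 0..m-1."""
    return tuple(x % m for m in MODULI)


def rns_neg(x):
    """Complement each digit with respect to its own modulus."""
    return tuple((m - d) % m for d, m in zip(x, MODULI))


def rns_add(x, y):
    """Add digit by digit; no carry ever crosses between digits."""
    return tuple((a + b) % m for a, b, m in zip(x, y, MODULI))


def rns_mul(x, y):
    """Multiply digit by digit, each in its own small modular multiplier."""
    return tuple((a * b) % m for a, b, m in zip(x, y, MODULI))


print(rns_neg(to_rns(21)))                     # (3, 0, 4, 0)
print(rns_add((5, 5, 0, 2), (7, 6, 4, 2)))     # (4, 4, 4, 1)
print(rns_mul((5, 5, 0, 2), (7, 6, 4, 2)))     # (3, 2, 0, 1)
```

Each digit position is independent, so in hardware the four operations run in parallel.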

Residue Arithmetic - Advantages
- Parallel independent operations on small numbers of digits
  - Significant speed ups
    - Especially for multiplication!
    - 4 bit x 4 bit multiplier (moduli up to 15) much simpler than 16 bit x 16 bit one
- Carries are strictly confined to small numbers of bits
  - Each modulus is only a small number of bits
- Can be implemented in Look Up Tables (LUTs)
  - 6 bit residues (moduli up to 63)
  - 64 x 64 x 6 bits required (<4Kbytes)
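The LUT size quoted above is easy to verify. The Python sketch below builds a full multiplication table for one modulus and computes the storage for a table addressed by two residues; `build_mul_lut` and `lut_size_bits` are hypothetical helpers, and the table layout (one entry per pair of operand values) is an assumption about how the LUT would be organised.

```python
def build_mul_lut(modulus):
    """Table of (a * b) mod modulus for every pair of residues."""
    return [[(a * b) % modulus for b in range(modulus)] for a in range(modulus)]


def lut_size_bits(residue_bits):
    """Storage for a LUT addressed by two residues of residue_bits each."""
    entries = (1 << residue_bits) ** 2   # e.g. 64 x 64 for 6-bit residues
    return entries * residue_bits        # each entry is one residue wide


lut = build_mul_lut(63)
print(lut[5][20])              # 5 * 20 = 100, and 100 mod 63 = 37
print(lut_size_bits(6) // 8)   # 3072 bytes, i.e. under 4 Kbytes
```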

Residue Arithmetic - Choosing the moduli
- Largest modulus determines the overall speed
  - Try to make it as small as possible
- Simple strategy
  - Choose sequence of prime numbers until the dynamic range, $M$, becomes large enough
- eg Application requires a range of at least $10^5$, ie $M \ge 10^5$
  - For RNS(13,11,7,5,3,2), M = 30,030
  - Range is too low, so add one more modulus:
  - RNS(17,13,11,7,5,3,2), M = 510,510
  - Now each modulus requires a separate circuit and our range is now ~5 times as large as needed, so remove 5:
  - RNS(17,13,11,7,3,2), M = 102,102
  - Six residues, requiring 5 + 4 + 4 + 3 + 2 + 1 = 19 bits
  - The largest modulus (17 requiring 5 bits) determines the speed, so …
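The simple strategy can be written down as a short procedure. The Python sketch below is one possible reading of it: take primes in order until $M$ is large enough, then drop the largest modulus that can be removed without falling below the required range. The function names and the exact removal rule are assumptions made for this example.

```python
def primes():
    """Yield 2, 3, 5, 7, ... by trial division against earlier primes."""
    found = []
    candidate = 2
    while True:
        if all(candidate % p for p in found):
            found.append(candidate)
            yield candidate
        candidate += 1


def prime_moduli(min_range):
    """Take primes in order until their product reaches min_range."""
    moduli, dynamic_range = [], 1
    for p in primes():
        if dynamic_range >= min_range:
            break
        moduli.append(p)
        dynamic_range *= p
    return moduli, dynamic_range


def drop_redundant(moduli, dynamic_range, min_range):
    """Remove the largest modulus that still leaves enough range, if any."""
    for m in sorted(moduli, reverse=True):
        if dynamic_range // m >= min_range:
            return [x for x in moduli if x != m], dynamic_range // m
    return moduli, dynamic_range


moduli, M = prime_moduli(10**5)
print(moduli, M)                          # [2, 3, 5, 7, 11, 13, 17] 510510
moduli, M = drop_redundant(moduli, M, 10**5)
print(moduli, M)                          # [2, 3, 7, 11, 13, 17] 102102
print(sum((m - 1).bit_length() for m in moduli))  # 19 bits in total
```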

Residue Arithmetic - Choosing the moduli
- Application requires a range of at least $10^5$, ie $M \ge 10^5$
  - …
  - RNS(17,13,11,7,3,2), M = 102,102
  - Six residues, requiring 5 + 4 + 4 + 3 + 2 + 1 = 19 bits
  - The largest modulus (17 requiring 5 bits) determines the speed, so combine some of the smaller moduli (Remember the requirement is that they be relatively prime!)
  - Try to produce the largest modulus using only 5 bits: pair 2 and 13, 3 and 7
  - RNS(26,21,17,11), M = 102,102
  - Four residues, requiring 5 + 5 + 5 + 4 = 19 bits (no improvement in total bit count, but 2 fewer ALUs!)
  - Better …?

Residue Arithmetic - Choosing the moduli
- Application requires a range of at least $10^5$, ie $M \ge 10^5$
  - …
  - RNS(26,21,17,11), M = 102,102
  - Four residues, requiring 5 + 5 + 5 + 4 = 19 bits (no improvement in total bit count, but 2 fewer ALUs!)
- Include powers of smaller primes before primes, starting with
  - RNS(3,2), M = 6
  - Note that $2^2$ is smaller than the next prime, 5, so move to
  - RNS($2^2$,3), M = 12
  - (trying to minimize the size of the largest modulus)
  - After including 5 and 7, note that $2^3$ and $3^2$ are smaller than 11:
  - RNS($3^2$,$2^3$,7,5), M = 2,520
  - Add 11
  - RNS(11,$3^2$,$2^3$,7,5), M = 27,720
  - Add 13
  - RNS(13,11,$3^2$,$2^3$,7,5), M = 360,360

Residue Arithmetic - Choosing the moduli
- Application requires a range of at least $10^5$, ie $M \ge 10^5$
  - …
  - Add 13
  - RNS(13,11,$3^2$,$2^3$,7,5), M = 360,360
  - M is now 3× larger than needed, so replace 9 with 3, then combine 5 and 3
  - RNS(15,13,11,$2^3$,7), M = 120,120
  - 5 moduli, 4 + 4 + 4 + 3 + 3 = 18 bits, largest modulus has 4 bits
- You can actually do somewhat better than this!
- Reference: B. Parhami, Computer Arithmetic: Algorithms and Hardware Designs, Oxford University Press, 2000
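The candidate moduli sets from this sequence of slides can be compared side by side. The Python sketch below (Python 3.8+ for `math.prod`; `describe` is a made-up helper) checks that each set is pairwise relatively prime and reports its dynamic range, total bit count and widest residue.

```python
from itertools import combinations
from math import gcd, prod


def describe(moduli):
    """Summarise a moduli set: range, total bits, widest residue, ALU count."""
    if any(gcd(a, b) != 1 for a, b in combinations(moduli, 2)):
        raise ValueError("moduli must be pairwise relatively prime")
    widths = [(m - 1).bit_length() for m in moduli]  # bits to hold 0..m-1
    return {
        "M": prod(moduli),
        "bits": sum(widths),
        "widest": max(widths),
        "alus": len(moduli),
    }


for candidate in [(17, 13, 11, 7, 3, 2),
                  (26, 21, 17, 11),
                  (13, 11, 9, 8, 7, 5),
                  (15, 13, 11, 8, 7)]:
    print(candidate, describe(candidate))
```

Under these measures the last set needs 18 bits with no residue wider than 4 bits, compared with 19 bits and a 5-bit residue for the first two sets.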

Residue Numbers - Conversion
- Inputs and outputs will invariably be in standard binary or decimal representations, so conversion to and from them is required
- Conversion from binary | decimal to RNS
  - Problem: Given a number, $y$, find its residues wrt moduli, $m_i$
  - Divisions would be too time-consuming!
  - Use this equality:
    $\langle (y_{k-1} y_{k-2} \ldots y_1 y_0)_2 \rangle_{m_i} = \langle \langle 2^{k-1} y_{k-1} \rangle_{m_i} + \cdots + \langle 2 y_1 \rangle_{m_i} + \langle y_0 \rangle_{m_i} \rangle_{m_i}$
  - So we only need to precompute the residues $\langle 2^j \rangle_{m_i}$ for each of the moduli, $m_i$, used by the RNS
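The equality above suggests a converter that needs only a small table and a modular adder. The Python sketch below follows that idea; the function names and the example value 164 are chosen here for illustration and are not taken from the slides.

```python
def power_residues(modulus, width):
    """Precompute <2^j> mod modulus for j = 0 .. width-1 without division by y."""
    table = []
    residue = 1 % modulus
    for _ in range(width):
        table.append(residue)
        residue = (2 * residue) % modulus
    return table


def binary_to_rns(y, moduli, width):
    """Convert a width-bit number to RNS by adding table entries for its 1 bits."""
    residues = []
    for m in moduli:
        table = power_residues(m, width)
        acc = 0
        for j in range(width):
            if (y >> j) & 1:
                acc = (acc + table[j]) % m   # one modular addition per 1 bit
        residues.append(acc)
    return tuple(residues)


print(power_residues(7, 8))                    # [1, 2, 4, 1, 2, 4, 1, 2]
print(binary_to_rns(164, (8, 7, 5, 3), 8))     # (4, 3, 4, 2)
```

In the worst case every bit of $y$ is 1, so each residue costs one modular addition per bit.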

Residue Numbers - Conversion
[Table: columns $j$, $2^j$, $\langle 2^j \rangle_7$, $\langle 2^j \rangle_5$, $\langle 2^j \rangle_3$]
- For RNS(8,7,5,3):
  - 8 is trivially calculated (3 LSB bits)
  - For 7, 5 and 3, we need the powers of 2 modulo 7, 5 and 3

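Residue Numbers - Conversion
[Table: columns $j$, $2^j$, $\langle 2^j \rangle_7$, $\langle 2^j \rangle_5$, $\langle 2^j \rangle_3$]
- Find … in RNS(8,7,5,3):
  - Residue modulo 8 is … = $4_{10}$
  - Residue modulo 7: $\langle … \rangle_7 = \langle … \rangle_7 = \langle … \rangle_7 = 3$
- Note that the additions are done in a modular adder!
- Worst case: $k$ additions for each residue for a $k$-bit number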

Residue Numbers - Conversion

Residue Arithmetic - Disadvantages
- Range is limited
- Division is hard!
- Comparison, sign (<0?) are hard
- Still suitable for some DSP applications
  - Only use +, x
  - Range is limited
    - Result range is known
  - Examples: digital filters, Fourier transforms

Multipliers
- ‘Long’ multiplication
[Figure: long multiplication of $a_3 a_2 a_1 a_0$ by $b_3 b_2 b_1 b_0$, showing rows of partial product bits (x); the first row of partial products is formed from $a_3 \ldots a_0$ and $b_0$]
- In binary, the partial products are trivial: if multiplier bit = 1, copy the multiplicand, else 0
- Use an ‘and’ gate!
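The AND-gate view of partial products can be made concrete with a short model. The Python sketch below (function names are illustrative) builds the grid of partial product bits, one AND per bit, and then sums them with their column weights to confirm that they add up to the product.

```python
def partial_products(a, b, n):
    """Row j holds a_i AND b_j for i = 0 .. n-1 (LSB first)."""
    rows = []
    for j in range(n):
        b_j = (b >> j) & 1
        rows.append([((a >> i) & 1) & b_j for i in range(n)])  # one AND gate each
    return rows


def sum_rows(rows):
    """Add all partial product bits; the bit in row j, column i weighs 2^(i+j)."""
    total = 0
    for j, row in enumerate(rows):
        for i, bit in enumerate(row):
            total += bit << (i + j)
    return total


rows = partial_products(0b1011, 0b0110, 4)
for row in rows:
    print(row)
print(sum_rows(rows))  # 11 * 6 = 66
```

Row j is either all zeros or a copy of the multiplicand, which is exactly what the AND gates implement.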

Multipliers
- We can add the partial products with FA blocks
[Figure: array of FA blocks adding the partial products; inputs a0 to a3 and b0 to b2, a 0 carry input, product bits p0, p1, …; label: ‘product bits’]

Parallel Array Adder
- We can build this adder in VHDL with two GENERATE loops

    rows : FOR j IN 0 TO n-1 GENERATE      -- For each row
      cols : FOR k IN 0 TO n-1 GENERATE    -- Generate a row
        pjk : full_adder PORT MAP( … );
      END GENERATE cols;
    END GENERATE rows;

- This part is straight-forward!
- … but you need to fill in the PORT MAP using internal signals!

    -- Array types must be named before signals can use them
    TYPE row_t  IS ARRAY( 0 TO n-1 ) OF std_logic;
    TYPE grid_t IS ARRAY( 0 TO n-1 ) OF row_t;
    SIGNAL pa, pb, cout : grid_t;

Multipliers
- We can add the partial products with FA blocks
[Figure: array of FA blocks adding the partial products; inputs a0 to a3 and b0 to b2, a 0 carry input, product bits p0, p1, …; label: ‘product bits’]
- Optimization 1: Replace this row of FAs
- Time? What’s the worst case propagation delay?