Ajay K. Verma, Philip Brisk and Paolo Ienne Processor Architecture Laboratory (LAP) & Centre for Advanced Digital Systems (CSDA) Ecole Polytechnique Fédérale.

Presentation transcript:

Progressive Decomposition: A Heuristic to Structure Arithmetic Circuits
Ajay K. Verma, Philip Brisk and Paolo Ienne
Processor Architecture Laboratory (LAP) & Centre for Advanced Digital Systems (CSDA), Ecole Polytechnique Fédérale de Lausanne (EPFL)

2 Logic Synthesis: Unable to Impose Hierarchy and Structure
Logic synthesis tools: local optimization via Boolean minimization; lacking for arithmetic circuits.
Architectural transformation (ripple-carry adder to carry-lookahead adder): not with logic synthesis. Our contribution.
[Figure: a small expression simplified by Boolean minimization; the expression labels are not recoverable.]

3 Leading Zero Detector (LZD)
Finds the position of the most significant non-zero bit.
$x_i$ is TRUE if the $(i+1)$-th most significant bit is the leading non-zero bit:
$x_i = \bar{a}_{15}\,\bar{a}_{14} \cdots \bar{a}_{15-(i-2)}\,\bar{a}_{15-(i-1)}\,a_{15-i}$
Large fan-in dependencies.
Convert $x_i$ to a binary number.
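
The one-hot definition of $x_i$ can be modeled in a few lines of Python. This is an illustrative sketch, not the authors' circuit: the names `lzd_one_hot` and `one_hot_to_binary` and the `width` parameter are assumptions introduced here.

```python
def lzd_one_hot(a, width=16):
    """Return [x_0, ..., x_{width-1}] for the unsigned integer a.

    x_i is True iff the (i+1)-th most significant bit of a is its
    leading non-zero bit (hypothetical helper, not from the slides).
    """
    # bits[0] is the most significant bit (a_15 when width is 16)
    bits = [(a >> (width - 1 - k)) & 1 for k in range(width)]
    x = []
    for i in range(width):
        # x_i = NOT bits[0] AND ... AND NOT bits[i-1] AND bits[i];
        # this AND spans i+1 inputs, hence the large fan-in
        x.append(all(b == 0 for b in bits[:i]) and bits[i] == 1)
    return x


def one_hot_to_binary(x):
    """Return the index i with x_i True, or None when the input was zero."""
    for i, xi in enumerate(x):
        if xi:
            return i
    return None


print(one_hot_to_binary(lzd_one_hot(0b0000010000000000)))
```

Each $x_i$ reads every more significant bit directly, which is the flat structure that the next slide replaces with blocks.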

4 LZD: A Better Implementation [Oklobdzija94]
Divide the 16 bits into 4 blocks. For each block $i$, compute the position of the most significant non-zero bit and a control bit $V_i$, which is TRUE if at least one bit in the block is one.
Reduced fan-in dependencies.
Similar in principle to carry-lookahead addition.
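
A minimal sketch of the block structure, under stated assumptions: "position" is encoded as the number of zeros above the leading one, block 0 holds the four most significant bits, and the names `lzd4` and `lzd16` are hypothetical. The slide does not fix these details.

```python
def lzd4(nibble):
    """4-bit leading zero detector: return (V, P).

    V is True if at least one bit of the block is one. P is the number of
    zeros above the most significant non-zero bit (0..3), or 0 if V is False.
    """
    v = nibble != 0
    p = 4 - nibble.bit_length() if v else 0
    return v, p


def lzd16(a):
    """16-bit LZD from four 4-bit blocks plus one more 4-bit block
    that scans the control bits V_0..V_3."""
    blocks = [(a >> shift) & 0xF for shift in (12, 8, 4, 0)]
    results = [lzd4(b) for b in blocks]  # the four blocks are independent
    # pack V_0..V_3 into a nibble, with V_0 as the most significant bit
    v_bits = 0
    for v, _ in results:
        v_bits = (v_bits << 1) | int(v)
    any_set, k = lzd4(v_bits)  # k = index of the first non-zero block
    position = 4 * k + results[k][1] if any_set else 0
    return any_set, position


print(lzd16(0x0123))
```

No signal depends on more than four inputs of the previous level, which is the reduced fan-in the slide refers to.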

5 What is the Bottom Line?
0.30 ns (392.3 μm²) versus 0.36 ns (426.8 μm²): 16% faster, 8% smaller.

6 Outline
Related Work
Architectural optimization: how to impose hierarchy?
Properties of Algebra: ring structure of Boolean expressions
Progressive Decomposition Algorithm
Results
Conclusions

7 Related Work
Manual approaches for optimizing circuits of interest: the entire field of computer arithmetic. Great ideas by really smart people!
Algorithmic approaches for a particular class of circuits: variable group size CLA adder [Lee91]; irregular partial product compressors [Stelling98].
Heuristics to optimize general classes of circuits: kernel and co-kernel extraction [Brayton82]; architecture exploration via exhaustive search [Verma06].

8 Input Condensation
Leader expressions: sufficient to evaluate the whole of an expression. Once you evaluate them, you can discard the input bits.
Compute all leader expressions in parallel.
Recursively compute leader expressions again.
[Figure: IN -> One Big Circuit -> OUT becomes IN -> Leader Expressions -> L -> Smaller Circuit -> OUT, with $|L| < |IN|$. Example: an 8-input parallel counter (leader expressions), with outputs labeled s and c.]

9 Hierarchical Circuit Construction
Use leader expressions as building blocks to impose hierarchy.
Theorem: This approach always generalizes to circuits that have an “effective online algorithm”.

10 Reed-Muller Form
XOR-of-product form; better suits arithmetic circuits. Forms a ring under the operations XOR and AND.
Boolean properties exploited by our algorithm: identities, null spaces, linear dependence.
Before: $X = [a_1 b_1 + (a_1 + b_1) a_0 b_0] \oplus [(a_1 \oplus b_1 \oplus a_0 b_0) c_1 + c_0 (a_0 \oplus b_0)(c_1 + (a_1 \oplus b_1 \oplus a_0 b_0))]$
After: $X = a_1 b_1 \oplus a_0 a_1 b_0 \oplus a_0 b_0 b_1 \oplus a_1 c_1 \oplus b_1 c_1 \oplus a_0 b_0 c_1 \oplus a_0 c_0 c_1 \oplus a_0 a_1 c_0 \oplus a_0 b_1 c_0 \oplus b_0 c_0 c_1 \oplus a_1 b_0 c_0 \oplus b_0 b_1 c_0$
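
A conversion to XOR-of-products form can be sketched with the standard binary Möbius transform over the truth table. The slides do not say how the authors performed this step, so the technique here is a swapped-in textbook method, and `reed_muller`, `x_before` and `VARS` are hypothetical names.

```python
VARS = ["a0", "a1", "b0", "b1", "c0", "c1"]


def x_before(a0, a1, b0, b1, c0, c1):
    """The 'Before' expression, with + read as OR and XOR written as ^."""
    carry_ab = (a1 & b1) | ((a1 | b1) & a0 & b0)
    p1 = a1 ^ b1 ^ (a0 & b0)
    carry_c = (p1 & c1) | (c0 & (a0 ^ b0) & (c1 | p1))
    return carry_ab ^ carry_c


def reed_muller(f, names):
    """Return the XOR-of-products form of f as a set of monomials.

    Each monomial is a frozenset of variable names. Bit k of a truth-table
    index holds the value of names[k].
    """
    n = len(names)
    coeff = [f(*[(m >> k) & 1 for k in range(n)]) for m in range(1 << n)]
    # in-place Moebius transform: afterwards coeff[m] is the coefficient
    # of the product of the variables whose bits are set in m
    for k in range(n):
        for m in range(1 << n):
            if m & (1 << k):
                coeff[m] ^= coeff[m ^ (1 << k)]
    return {
        frozenset(names[k] for k in range(n) if (m >> k) & 1)
        for m in range(1 << n)
        if coeff[m]
    }


for term in sorted("".join(sorted(t)) for t in reed_muller(x_before, VARS)):
    print(term)
```

The transform is exponential in the number of inputs, so it only suits small examples like this one.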

11 Progressive Decomposition: Algorithm Overview
Choose a subset of input bits. How many bits? Many different combinations?
Find leader expressions.
Optimize via Boolean ring properties: find identities; discard dependent expressions (e.g. $z = f(x, y)$).
Rewrite the circuit in terms of leader expressions.
Recursively process the remaining circuit.

12 Finding Leader Expressions
Similar to kernel extraction in algebraic factorization.
$X = ad \oplus aef \oplus bcd \oplus abe \oplus ace \oplus bcef$, $L(X, \{a, b, c\}) = ?$
Separate product terms into (chosen, not chosen) input bits:
$\{(a, d), (a, ef), (bc, d), (ab, e), (ac, e), (bc, ef)\}$
Combine expressions, $\{(\alpha, \gamma), (\beta, \gamma)\} \rightarrow \{(\alpha \oplus \beta, \gamma)\}$, since $\alpha\gamma \oplus \beta\gamma = (\alpha \oplus \beta)\gamma$:
$\{(a \oplus bc, d), (a \oplus bc, ef), (ab \oplus ac, e)\}$
$X = (a \oplus bc)d \oplus (a \oplus bc)ef \oplus (ab \oplus ac)e$
Combine expressions, $\{(\alpha, \beta), (\alpha, \gamma)\} \rightarrow \{(\alpha, \beta \oplus \gamma)\}$, since $\alpha\beta \oplus \alpha\gamma = \alpha(\beta \oplus \gamma)$:
$\{(a \oplus bc, d \oplus ef), (ab \oplus ac, e)\}$
$X = (a \oplus bc)(d \oplus ef) \oplus (ab \oplus ac)e$
Leader expressions of $X$ using inputs $\{a, b, c\}$: $a \oplus bc$ and $ab \oplus ac$.
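
The two combine steps can be sketched as two grouping passes over the product terms. This is a simplified illustration that applies only those two rules (no null-space or linear-dependence reasoning). The data representation and the names `parse`, `leader_expressions` and `show` are assumptions made for this example.

```python
from collections import defaultdict


def parse(expr):
    """Parse e.g. 'ad+aef' ('+' stands for XOR, one letter per variable)
    into a set of monomials, each a frozenset of variable names."""
    terms = set()
    for t in expr.replace(" ", "").split("+"):
        terms ^= {frozenset(t)}  # a repeated monomial cancels under XOR
    return terms


def leader_expressions(terms, chosen):
    """Map each leader expression over `chosen` to its cofactor.

    Keys and values are XOR-of-products stored as sets of monomials.
    """
    chosen = frozenset(chosen)
    # Pass 1: split each monomial into (chosen part, remaining part) and
    # merge pairs that share the remaining part:
    # (alpha, g), (beta, g) -> (alpha ^ beta, g)
    by_rest = defaultdict(set)
    for mono in terms:
        by_rest[mono - chosen] ^= {mono & chosen}
    # Pass 2: merge pairs that share the chosen-side expression:
    # (alpha, b), (alpha, g) -> (alpha, b ^ g)
    by_leader = defaultdict(set)
    for rest, poly in by_rest.items():
        if poly:
            by_leader[frozenset(poly)] ^= {rest}
    return dict(by_leader)


def show(poly):
    """Render a set of monomials; '^' is XOR and '1' is the empty product."""
    return " ^ ".join(sorted("".join(sorted(m)) or "1" for m in poly))


X = parse("ad + aef + bcd + abe + ace + bcef")
for leader, cofactor in leader_expressions(X, "abc").items():
    print(f"({show(leader)}) * ({show(cofactor)})")
```

For the slide's $X$ and the subset $\{a, b, c\}$, the keys of the returned dictionary are the two leader expressions and the values are their cofactors $d \oplus ef$ and $e$. A term with no chosen variable would show up as the constant leader `1`.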

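13 Hierarchy and Circuit Structure
$X = ad \oplus aef \oplus bcd \oplus abe \oplus ace \oplus bcef$
$L(X, \{a, b, c\}) = \{a \oplus bc, ab \oplus ac\}$, with $s_1 = a \oplus bc$ and $s_2 = ab \oplus ac$
$X = s_1 (d \oplus ef) \oplus s_2 e$
[Figure: a flat circuit from inputs a, b, c, d, e, f to X versus a two-level circuit in which a, b, c first produce $s_1$ and $s_2$.]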

14 Example: Ternary Adder (3rd Output)
Ripple-carry adder: $X = [a_1 b_1 + (a_1 + b_1) a_0 b_0] \oplus [(a_1 \oplus b_1 \oplus a_0 b_0) c_1 + c_0 (a_0 \oplus b_0)(c_1 + (a_1 \oplus b_1 \oplus a_0 b_0))]$
$L(X, \{a_1, b_1, c_1\}) = \{a_1 \oplus b_1 \oplus c_1,\ a_1 b_1 \oplus b_1 c_1 \oplus a_1 c_1\}$: the sum and carry outputs of a carry-save adder.
[Figure: the ripple-carry adder structure versus a 3:2 compressor (CSA) followed by a ripple-carry adder, both computing X from $a_0, b_0, c_0, a_1, b_1, c_1$.]
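
The claim that the sum and carry expressions are sufficient can be checked by brute force: once both are known, $X$ must no longer depend on $a_1$, $b_1$, $c_1$ individually. The sketch below tests that property. The function names are hypothetical, and the check is an illustration, not the authors' procedure.

```python
from itertools import product


def x_third_output(a0, a1, b0, b1, c0, c1):
    """Third output of the 2-bit ternary adder, as written on the slide."""
    carry_ab = (a1 & b1) | ((a1 | b1) & a0 & b0)
    p1 = a1 ^ b1 ^ (a0 & b0)
    carry_c = (p1 & c1) | (c0 & (a0 ^ b0) & (c1 | p1))
    return carry_ab ^ carry_c


def factors_through_sum_and_carry():
    """True if X depends on (a1, b1, c1) only via their sum and carry bits."""
    for a0, b0, c0 in product((0, 1), repeat=3):
        seen = {}
        for a1, b1, c1 in product((0, 1), repeat=3):
            leaders = (a1 ^ b1 ^ c1, (a1 & b1) ^ (b1 & c1) ^ (a1 & c1))
            value = x_third_output(a0, a1, b0, b1, c0, c1)
            # same leader values must always give the same X
            if seen.setdefault(leaders, value) != value:
                return False
    return True


print(factors_through_sum_and_carry())
```

Eight combinations of $(a_1, b_1, c_1)$ collapse to four (sum, carry) pairs, which is the input condensation that makes the remaining circuit smaller.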

15 Exploiting the Null Space
Null space of $P$, $N(P)$: all expressions $X$ that satisfy $PX = 0$.
$X = ab(c \oplus d) \oplus (a \oplus b)(cd \oplus e) \oplus ce \oplus de$
$L(X, \{a, b\}) = \{ab, a \oplus b\}$, with $s_1 = ab$, $s_2 = a \oplus b$ and $s_1 \in N(s_2)$
$X = s_1 (c \oplus d) \oplus s_2 (cd \oplus e) \oplus ce \oplus de = (c \oplus d)(s_1 \oplus e) \oplus cd\,s_2 \oplus e\,s_2$
$L(X, \{c, d\}) = \{cd, c \oplus d\}$, with $t_1 = cd$, $t_2 = c \oplus d$ and $t_1 \in N(t_2)$
$X = t_2 (s_1 \oplus e) \oplus t_1 s_2 \oplus e\,s_2 = s_2 (t_1 \oplus e) \oplus t_2 (s_1 \oplus e)$
$L(X, \{s_2, t_2\}) = ?$
$\{(s_2, t_1 \oplus e), (t_2, s_1 \oplus e)\}$
Using $s_1 \in N(s_2)$: $\{(s_2, s_1 \oplus t_1 \oplus e), (t_2, s_1 \oplus e)\}$
Using $t_1 \in N(t_2)$: $\{(s_2, s_1 \oplus t_1 \oplus e), (t_2, s_1 \oplus t_1 \oplus e)\}$
Combine expressions, $\{(\alpha, \gamma), (\beta, \gamma)\} \rightarrow \{(\alpha \oplus \beta, \gamma)\}$: $\{(s_2 \oplus t_2, s_1 \oplus t_1 \oplus e)\}$
$X = (s_2 \oplus t_2)(s_1 \oplus t_1 \oplus e)$
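
The derivation can be verified exhaustively, since $X$ has only five inputs. The sketch below checks both the null-space facts and the final factorization. The function names are hypothetical.

```python
from itertools import product


def x_original(a, b, c, d, e):
    """X as given at the top of the slide."""
    return (a & b & (c ^ d)) ^ ((a ^ b) & ((c & d) ^ e)) ^ (c & e) ^ (d & e)


def x_decomposed(a, b, c, d, e):
    """X after both rounds of leader-expression extraction."""
    s1, s2 = a & b, a ^ b  # leader expressions over {a, b}
    t1, t2 = c & d, c ^ d  # leader expressions over {c, d}
    return (s2 ^ t2) & (s1 ^ t1 ^ e)


def in_null_space(f, g, n):
    """True if f * g = 0 for every assignment of the n Boolean inputs."""
    return all((f(*v) & g(*v)) == 0 for v in product((0, 1), repeat=n))


same = all(
    x_original(*v) == x_decomposed(*v) for v in product((0, 1), repeat=5)
)
print(same, in_null_space(lambda a, b: a & b, lambda a, b: a ^ b, 2))
```

Adding $s_1$ to the cofactor of $s_2$ is free precisely because $s_1 s_2 = 0$. That step makes the two cofactors equal so they can be merged.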

16 Linear Independence
Linear dependence between leader expressions, or between their corresponding coefficients: rewrite some elements in terms of others.
Example: $\{a \oplus b, b \oplus c, c \oplus a\}$ reduces to $\{a \oplus b, b \oplus c\}$, since $c \oplus a = (a \oplus b) \oplus (b \oplus c)$.
LZD: the initial basis $\{V_0, P_{00}, P_{01}, V_0 \oplus P_{00}, V_0 \oplus P_{01}\}$ reduces to $\{V_0, P_{00}, P_{01}\}$.
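
Dependent expressions can be found with Gaussian elimination over GF(2), treating each product term as one coordinate. The slides do not state which procedure the authors use, so this is a textbook stand-in, and `reduce_to_basis` is a hypothetical name.

```python
def reduce_to_basis(exprs):
    """Keep each expression that is not an XOR of expressions kept earlier.

    Every expression is a set of monomials (strings), read as their XOR.
    """
    pivots = {}  # pivot monomial -> reduced expression owning that pivot
    kept = []
    for expr in exprs:
        r = set(expr)
        while r:
            p = max(r)  # any fixed total order on monomials works
            if p not in pivots:
                pivots[p] = frozenset(r)
                kept.append(expr)
                break
            # cancel the pivot; every remaining monomial is smaller than p
            r ^= pivots[p]
        # if r became empty, expr was linearly dependent and is discarded
    return kept


print(reduce_to_basis([{"a", "b"}, {"b", "c"}, {"c", "a"}]))
```

Which elements survive depends on the input order: the first two expressions are kept here, matching the slide's choice.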

17 7-bit Majority Function
Returns 1 if at least 4 bits are 1; 0 otherwise.
$L(X, \{a_1, a_2, a_3, a_4\}) = \{s_1, s_2, s_3, s_4\}$
$s_1 = a_1 \oplus a_2 \oplus a_3 \oplus a_4$
$s_2 = a_1 a_2 \oplus a_1 a_3 \oplus a_1 a_4 \oplus a_2 a_3 \oplus a_2 a_4 \oplus a_3 a_4$
$s_3 = a_1 a_2 a_3 \oplus a_1 a_2 a_4 \oplus a_1 a_3 a_4 \oplus a_2 a_3 a_4$
$s_4 = a_1 a_2 a_3 a_4$
$s_3 = s_1 s_2$; $s_1, s_2 \in N(s_4)$; $s_4 \in N(s_1)$; $s_4 \in N(s_2)$
4:3 parallel counter: inputs $a_4, a_3, a_2, a_1$; outputs $s_4, s_2, s_1$.
[Table: values of $s_1, \ldots, s_4$ against the inputs $a_4 a_3 a_2 a_1$; entries not recoverable.]
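
The identities on this slide can be checked over all 16 input patterns, and the 4:3 counter can then be used to build the majority function. In the sketch below, the final comparison on the remaining three bits is written arithmetically for brevity. That shortcut is an assumption of this example, not the circuit structure the algorithm produces. All function names are hypothetical.

```python
from itertools import combinations, product


def elem_sym(bits, k):
    """XOR of all products of k distinct bits (s_k on the slide)."""
    acc = 0
    for combo in combinations(bits, k):
        acc ^= int(all(combo))
    return acc


def counter_4_3(a1, a2, a3, a4):
    """4:3 parallel counter: (s4, s2, s1) is the binary count of ones."""
    bits = (a1, a2, a3, a4)
    return elem_sym(bits, 4), elem_sym(bits, 2), elem_sym(bits, 1)


def majority7(bits):
    """1 if at least 4 of the 7 bits are 1, using the counter on bits 0..3."""
    s4, s2, s1 = counter_4_3(*bits[:4])
    count = 4 * s4 + 2 * s2 + s1 + sum(bits[4:])
    return int(count >= 4)


identities_hold = all(
    elem_sym(v, 3) == (elem_sym(v, 1) & elem_sym(v, 2))
    and (elem_sym(v, 4) & elem_sym(v, 1)) == 0
    and (elem_sym(v, 4) & elem_sym(v, 2)) == 0
    for v in product((0, 1), repeat=4)
)
print(identities_hold)
```

Because $s_3 = s_1 s_2$, only three of the four leader expressions need to be carried forward, which is why the block is a 4:3 counter.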

18 Propagation of Null Space Information
$X = ap \oplus bp \oplus cp \oplus ax \oplus ay \oplus by \oplus bz \oplus cx \oplus cz$, with $\{az = 0, bx = 0, cy = 0\}$
$L(X, \{a, b, c\}) = ?$
$\{(a, p \oplus x \oplus y), (b, p \oplus y \oplus z), (c, p \oplus x \oplus z)\}$
Using $z \in N(a)$: $\{(a, p \oplus x \oplus y \oplus z), (b, p \oplus y \oplus z), (c, p \oplus x \oplus z)\}$
Using $x \in N(b)$: $\{(a, p \oplus x \oplus y \oplus z), (b, p \oplus x \oplus y \oplus z), (c, p \oplus x \oplus z)\}$
Combine expressions, $\{(\alpha, \gamma), (\beta, \gamma)\} \rightarrow \{(\alpha \oplus \beta, \gamma)\}$: $\{(a \oplus b, p \oplus x \oplus y \oplus z), (c, p \oplus x \oplus z)\}$
Using $y \in N(c)$: $\{(a \oplus b, p \oplus x \oplus y \oplus z), (c, p \oplus x \oplus y \oplus z)\}$
Combine expressions: $\{(a \oplus b \oplus c, p \oplus x \oplus y \oplus z)\}$
$L(X, \{a, b, c\}) = \{a \oplus b \oplus c\}$
$X = (a \oplus b \oplus c)(p \oplus x \oplus y \oplus z)$
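
The factored form is only equal to the original $X$ on inputs that respect the three constraints. A short brute-force check makes that visible. The function names are hypothetical.

```python
from itertools import product


def x_flat(a, b, c, p, x, y, z):
    """X as the XOR of its nine product terms."""
    return ((a & p) ^ (b & p) ^ (c & p) ^ (a & x) ^ (a & y)
            ^ (b & y) ^ (b & z) ^ (c & x) ^ (c & z))


def x_factored(a, b, c, p, x, y, z):
    """X after null-space information has been propagated."""
    return (a ^ b ^ c) & (p ^ x ^ y ^ z)


def allowed(a, b, c, p, x, y, z):
    """The null-space constraints az = 0, bx = 0, cy = 0."""
    return (a & z) == 0 and (b & x) == 0 and (c & y) == 0


inputs = list(product((0, 1), repeat=7))
equal_when_allowed = all(
    x_flat(*v) == x_factored(*v) for v in inputs if allowed(*v)
)
equal_everywhere = all(x_flat(*v) == x_factored(*v) for v in inputs)
print(equal_when_allowed, equal_everywhere)
```

The two forms differ by exactly $az \oplus bx \oplus cy$, so they disagree on some inputs that violate the constraints. This is why the null-space facts must be known before the merge is legal.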

19 Experimental Setup
[Figure: flow diagram with three numbered paths (1, 2, 3) built from these elements: circuit written by hand in sum-of-product form; Progressive Decomposition; known arithmetic circuits.]
Synthesis: Synopsys Design Compiler (compile_ultra, minimize delay).
Library: Artisan standard cells, UMC 90 nm.

20 Results
[Charts: delay (ns) and area (μm²) for three implementations (unoptimized, progressive decomposition, manual implementation). Benchmarks: 16-bit LZD/LOD, 15-bit comparator, 16-bit adder, 16:5 parallel counter, 15-bit majority function, 32-bit LOD, 12-bit 3-input adder. Bar annotations: TGA, DesignWare, carry (A-B), CSA.]

21 Conclusion
Progressive Decomposition Algorithm: arithmetic circuits were previously hard to optimize; expert ideas can be generalized and automated.
Automatically infers successful circuit designs from the literature: carry-lookahead adder, structured LZD circuit, carry-save addition, parallel counters.
Long-term goal: replace manual circuit design with automated tools.