
Slide 1: Optimization of Data-Flow Computations Using Canonical TED Representation
Electrical and Computer Engineering
Muhammad Noman Ashraf
ECE 667 Synthesis and Verification of Digital Systems, Spring 2011
Paper: M. Ciesielski, D. Gomez-Prado, Q. Ren, J. Guillot and Emmanuel Boutillon, "Optimization of Data-Flow Computations Using Canonical TED Representation", in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
Slides adapted from D. Gomez-Prado, Q. Ren, M. Ciesielski, J. Guillot and Emmanuel Boutillon, "Optimizing Data Flow Graphs to Minimize Hardware Implementations", DATE (2009).

Slide 2: Overview
• Motivation
• TED Review
• Related Work
• TED Decomposition System
• TED Linearization
• Product-Term Extraction
• Sum-Term Extraction
• Reordering
• DFG Generation
• Replacing Constant Multipliers by Shifters
• Conclusion
• References

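Slide 3: Motivation
Three equivalent forms of the same function:
• $F = a \cdot (f \cdot (g + d \cdot c) + c \cdot e \cdot g)$: minimum number of operations, 5 MPY, 2 ADD; latency L = 3 MPY + 2 ADD (5 control steps).
• $F = a \cdot f \cdot g + a \cdot f \cdot d \cdot c + a \cdot c \cdot e \cdot g$: 8 MPY, 2 ADD.
• $F = (a \cdot f) \cdot (g + d \cdot c) + (a \cdot c) \cdot e \cdot g$: 6 MPY, 2 ADD; latency L = 3 MPY + 1 ADD (4 control steps).
Both factored DFGs are scheduled with resources of 2 MPY and 1 ADD. The form with the fewest operations is therefore not the one with the lowest latency.
Slide adapted from M. Ciesielski, D. Gomez-Prado, Q. Ren, J. Guillot and Emmanuel Boutillon, "Optimizing Data Flow Graphs to Minimize Hardware Implementations", DATE (2009).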
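The operation counts and latencies above can be checked mechanically. The following is a minimal Python sketch, not part of TDS: expressions are hypothetical nested tuples of binary operations, `op_count` totals the MPY/ADD operations, and `latency` follows the slowest input-to-output path. The relative delays (MPY = 2, ADD = 1) are an assumption used only to decide which path is slowest.

```python
import random
from typing import Dict, Tuple, Union

# An expression is a variable name or a binary node (op, left, right) with op in {"*", "+"}.
Expr = Union[str, Tuple[str, "Expr", "Expr"]]

MPY_DELAY, ADD_DELAY = 2, 1  # assumed relative delays, used only to pick the slowest path


def chain(op: str, *xs: Expr) -> Expr:
    """Left-to-right chain of binary operations, e.g. chain('*', a, b, c) = (a*b)*c."""
    e = xs[0]
    for x in xs[1:]:
        e = (op, e, x)
    return e


def mul(*xs: Expr) -> Expr:
    return chain("*", *xs)


def add(*xs: Expr) -> Expr:
    return chain("+", *xs)


def op_count(e: Expr) -> Tuple[int, int]:
    """Total number of (MPY, ADD) operations in the expression tree."""
    if isinstance(e, str):
        return (0, 0)
    op, left, right = e
    (ml, al), (mr, ar) = op_count(left), op_count(right)
    return (ml + mr + (op == "*"), al + ar + (op == "+"))


def latency(e: Expr) -> Tuple[int, int, int]:
    """(delay, MPY, ADD) along the slowest input-to-output path."""
    if isinstance(e, str):
        return (0, 0, 0)
    op, left, right = e
    d, m, a = max(latency(left), latency(right))
    if op == "*":
        return (d + MPY_DELAY, m + 1, a)
    return (d + ADD_DELAY, m, a + 1)


def evaluate(e: Expr, env: Dict[str, int]) -> int:
    if isinstance(e, str):
        return env[e]
    op, left, right = e
    x, y = evaluate(left, env), evaluate(right, env)
    return x * y if op == "*" else x + y


factored = mul("a", add(mul("f", add("g", mul("d", "c"))), mul("c", "e", "g")))
expanded = add(mul("a", "f", "g"), mul("a", "f", "d", "c"), mul("a", "c", "e", "g"))
mixed = add(mul(mul("a", "f"), add("g", mul("d", "c"))), mul(mul("a", "c"), "e", "g"))

# random integer inputs to confirm the three forms compute the same function
rng = random.Random(0)
envs = [{v: rng.randint(-9, 9) for v in "acdefg"} for _ in range(100)]

for name, e in [("factored", factored), ("expanded", expanded), ("mixed", mixed)]:
    m, a = op_count(e)
    _, lm, la = latency(e)
    print(f"{name}: {m} MPY, {a} ADD; critical path {lm} MPY + {la} ADD")
```

Building products as left-to-right chains mirrors how the slide's DFGs compute $c \cdot e \cdot g$ and $(a \cdot c) \cdot e \cdot g$; a balanced tree would give different latencies.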

Slide 4: TED Review [Construction]
Figure: construction of the TED from the subexpressions zu, qw, (zu + qw), x(zu + qw), pw^2 and yw. The TED is canonical for the given variable order x, z, u, q, p, y, w. Notation: edges labeled ^1 and ^2 denote w and w^2; the w^2 edge makes this TED NON-LINEAR.
(Slide adapted from Ciesielski et al., DATE 2009.)
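To make the construction concrete, here is a minimal Python sketch, assuming a polynomial is given as a dictionary from exponent vectors to integer coefficients; the class `TedBuilder` and its methods are hypothetical and not part of TDS. Each node stores the Taylor coefficients $F_k$ of $F = \sum_k v^k F_k$ for its variable $v$, variables that $F$ does not depend on are skipped, and a unique table makes equal subfunctions map to the same node. Unlike a real TED, it does not normalize edge weights. The function used is $F = x \cdot (z \cdot u + q \cdot w) + p \cdot w^2 + y \cdot w$, the expanded form of the factored expression derived later in the deck.

```python
from typing import Dict, List, Tuple

Poly = Dict[Tuple[int, ...], int]  # exponent vector (one entry per variable) -> coefficient


class TedBuilder:
    """Sketch of TED construction by repeated Taylor expansion (no edge-weight normalization)."""

    def __init__(self, order: List[str]) -> None:
        self.order = order
        self.unique: Dict[Tuple, int] = {}  # unique table: node key -> node id
        self.nodes: List[Tuple] = []        # ("const", value) or ("node", var, children)

    def _mk(self, key: Tuple) -> int:
        if key not in self.unique:          # reuse an identical node if it already exists
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def build(self, poly: Poly, level: int = 0) -> int:
        poly = {m: c for m, c in poly.items() if c != 0}
        if level == len(self.order):        # no variables left: a constant
            return self._mk(("const", sum(poly.values())))
        degree = max((m[level] for m in poly), default=0)
        if degree == 0:                     # F does not depend on this variable: skip it
            return self.build(poly, level + 1)
        children = []
        for k in range(degree + 1):
            # F_k = coefficient of var^k, i.e. (1/k!) d^kF/dvar^k evaluated at var = 0
            coeff: Poly = {}
            for m, c in poly.items():
                if m[level] == k:
                    stripped = m[:level] + (0,) + m[level + 1:]
                    coeff[stripped] = coeff.get(stripped, 0) + c
            children.append(self.build(coeff, level + 1))
        return self._mk(("node", self.order[level], tuple(children)))

    def evaluate(self, nid: int, env: Dict[str, int]) -> int:
        entry = self.nodes[nid]
        if entry[0] == "const":
            return entry[1]
        _, var, children = entry
        # Taylor form of the node: sum over k of var^k * F_k
        return sum(env[var] ** k * self.evaluate(c, env) for k, c in enumerate(children))


order = ["x", "z", "u", "q", "p", "y", "w"]


def mono(**exps: int) -> Tuple[int, ...]:
    return tuple(exps.get(v, 0) for v in order)


F = {mono(x=1, z=1, u=1): 1, mono(x=1, q=1, w=1): 1, mono(p=1, w=2): 1, mono(y=1, w=1): 1}
ted = TedBuilder(order)
root = ted.build(F)
env = dict(x=2, z=3, u=5, q=7, p=11, y=13, w=17)
print(len(ted.nodes), ted.evaluate(root, env))
```

In this sketch the node for $w$ is built once and shared by the $q \cdot w$ and $y \cdot w$ branches, while $w^2$ gets its own node with a ^2 edge, which is the non-linearity that the next slides remove.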

Slide 5: Related Work
• HDL compilers and high-level synthesis systems (Cyber, Spark, Catapult C): lack local optimality.
• Kernel-based decomposition [Hosangadi et al., "Optimizing Polynomial Expressions by Algebraic Factorization and Common Subexpression Elimination," IEEE Transactions, 2005]: lacks canonicity.
• Cut-based decomposition (TED-based) [Askar et al., "Data-flow transformations using Taylor expansion diagrams," in Proc. Des. Autom. Test Eur., 2007]: limited to TEDs with the disjoint decomposition property.

Slide 6: Cut-Based Decomposition (Related Work)
• Top-down approach.
• Apply a series of cuts (additive and multiplicative) to the edges so that the TED separates into two disjoint subgraphs.
• Different sequences of cuts result in different DFGs.
Figure: cut sequence A3, A1, M1, A2.

Slide 7: Cut-Based Decomposition (Related Work)
• Top-down approach.
• Apply a series of cuts (additive and multiplicative) to the edges so that the TED separates into two disjoint subgraphs.
• Different sequences of cuts result in different DFGs.
Figure: cut sequences A1, A3, M1, A2 and A3, A1, M1, A2.

Slide 8: TED Decomposition [TDS]
• The cut-based decomposition described above works only for TEDs with the disjoint decomposition property; many TEDs do not have this property.
• New approach: bottom-up.
  - Identify algebraic operations and extract them from the graph.
  - Also works for TEDs without the disjoint decomposition property.
• TED-based factorization, CSE and decomposition are jointly referred to as TED decomposition.
• It systematically involves: linearization, product-term extraction, sum-term extraction, reordering and DFG generation.

Slide 9: TDS System Overview
Figure: the TDS flow shown next to the HLS flow.
• Inputs: matrix transforms, polynomials; C or behavioral HDL, from which DFG extraction produces the original DFG (HLS flow).
• TED-based transformations (functional TED): TED linearization, variable ordering, TED factorization & decomposition, constant multiplication & shifter generation, common subexpression elimination (CSE).
• DFG-based transformations (structural DFG): static timing analysis, latency optimization, resource constraints, behavioral transformations.
• Other labels: design objectives, design constraints, structural elements.
• Output: optimized DFG (TDS netlist), passed to high-level synthesis (GAUT), which produces RTL VHDL.
(Slide adapted from Ciesielski et al., DATE 2009.)

Slide 10: TED Linearization
• A TED naturally represents a polynomial in its factored form.
• This efficiency is missing for non-linear expressions. Example: in $F = a^2 c + abc$, the variable a could be factored out.
• Splitting $a^2$ into $a_1 \cdot a_2$ (with $a_1 = a_2 = a$) exposes the factorization: $F = a_1 (a_2 + b) c$.
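A quick numerical check of the example, as a minimal sketch (the function names are hypothetical): with $a_1 = a_2 = a$ the linearized form computes the same function, and its operation count drops from 4 MPY + 1 ADD to 2 MPY + 1 ADD.

```python
from itertools import product


def f_original(a: int, b: int, c: int) -> int:
    # a^2*c + a*b*c as written: a*a*c and a*b*c cost 2 multiplications each, plus 1 addition
    return a * a * c + a * b * c


def f_linearized(a1: int, a2: int, b: int, c: int) -> int:
    # a1*(a2 + b)*c: 2 multiplications and 1 addition
    return a1 * (a2 + b) * c


# with a1 = a2 = a, both forms agree on a small grid of integer inputs
forms_agree = all(f_original(a, b, c) == f_linearized(a, a, b, c)
                  for a, b, c in product(range(-3, 4), repeat=3))
print(forms_agree)
```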

Slide 11: TED Decomposition: TED Linearization [back to previous example]
Figure: the $w^2$ edge of the example TED is split into $w_1$ and $w_2$.
(Slide adapted from Ciesielski et al., DATE 2009.)

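Slide 12: TED Linearization [Concept]
Figure: a node x with edges ^0, ^1, ..., ^n to $F_0, F_1, \dots, F_n$ is replaced by a chain of nodes $x_1, x_2, \dots, x_n$: the ^0 edge of $x_i$ leads to $F_{i-1}$, the ^1 edge of $x_i$ leads to $x_{i+1}$, and the ^1 edge of $x_n$ leads to $F_n$.
• Split $x^k = x_1 \cdot x_2 \cdots x_k$, where $x_i = x_j$ for all i, j.
• Iteratively perform the splitting on high-order nodes.
• The substitution results in the Horner form, which contains the minimum number of multiplications:
$$F = F_0 + x F_1 + \dots + x^n F_n = F_0 + x_1 \left(F_1 + x_2 \left(F_2 + \dots + x_n F_n\right)\right)$$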
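The effect of the chain $x_1, \dots, x_n$ can be seen by counting multiplications. This is a minimal sketch in which the coefficients $F_k$ are plain integers rather than sub-TEDs (an assumption made for illustration); both function names are hypothetical. The Horner form needs n multiplications, whereas forming every $x^k$ independently needs $n(n+1)/2$.

```python
from typing import List, Tuple


def horner(coeffs: List[int], x: int) -> Tuple[int, int]:
    """Evaluate F0 + F1*x + ... + Fn*x^n as F0 + x*(F1 + x*(... + x*Fn)); return (value, #MPY)."""
    acc, mults = coeffs[-1], 0
    for c in reversed(coeffs[:-1]):
        acc = acc * x + c  # one multiplication per chain node x_i
        mults += 1
    return acc, mults


def term_by_term(coeffs: List[int], x: int) -> Tuple[int, int]:
    """Evaluate each term Fk * x^k separately, forming x^k from scratch (k multiplications)."""
    total, mults = 0, 0
    for k, c in enumerate(coeffs):
        term = c
        for _ in range(k):
            term *= x
            mults += 1
        total += term
    return total, mults


print(horner([1, 2, 3, 4], 5))        # (586, 3)
print(term_by_term([1, 2, 3, 4], 5))  # (586, 6)
```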

Slide 13: Product-Term Extraction
• Extractable product term: a product of variables that each appear in the expression only once; it can be extracted from the TED without duplicating any of its variables.
• In the TED, it is a set of nodes connected by a series of multiplicative edges only:
  - only the starting and ending nodes can have incident additive edges;
  - the starting and ending nodes can have more than one incoming or outgoing multiplicative edge;
  - the ending node can be the terminal node 1.
• [TDS] recursively identifies such terms by traversing the graph bottom-up; for each node, a depth-first approach decides which nodes to include in the product term.

Slide 14: Product-Term Extraction [back to example]
Walk-through, starting at node u:
• u has only one * parent? YES
• u has only one child path? YES → CONTINUE
• z has only one * parent? YES
• z has only one * child path? NO → BACKTRACK
Extracted product terms (figure labels): P1 = zu and P2.
(Slide adapted from Ciesielski et al., DATE 2009.)
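The bottom-up checks in the walk-through can be sketched as follows. This is a simplified illustration, not the TDS implementation: the graph model (each node mapped to the targets of its ^1 and ^0 edges), the name `product_terms`, and the toy TED for $G = x \cdot (z \cdot u + q)$ are all hypothetical. From each node the sketch climbs upward while the current node has a single multiplicative parent and a single child path, then keeps only maximal chains.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical linear-TED model (an assumption, not the TDS data structure):
# node -> (mul_child, add_child): targets of its ^1 (multiplicative) and ^0 (additive) edges.
# "ONE" stands for the terminal node 1; None means the edge is absent (points to 0).
TedGraph = Dict[str, Tuple[Optional[str], Optional[str]]]


def parent_edges(ted: TedGraph) -> Dict[str, List[Tuple[str, str]]]:
    """For every non-terminal node, list its (parent, edge kind) pairs."""
    result: Dict[str, List[Tuple[str, str]]] = {n: [] for n in ted}
    for parent, (mul_child, add_child) in ted.items():
        if mul_child in result:
            result[mul_child].append((parent, "mul"))
        if add_child in result:
            result[add_child].append((parent, "add"))
    return result


def product_terms(ted: TedGraph) -> List[List[str]]:
    par = parent_edges(ted)
    chains = []
    for start in ted:  # try every node as the bottom of a chain
        chain, cur = [start], start
        while True:
            one_mul_parent = len(par[cur]) == 1 and par[cur][0][1] == "mul"
            one_child_path = ted[cur][1] is None  # no additive edge leaves cur
            if not (one_mul_parent and one_child_path):
                break  # BACKTRACK: cur is the top of the chain
            cur = par[cur][0][0]  # CONTINUE upward along the * edge
            chain.insert(0, cur)
        if len(chain) >= 2:
            chains.append(chain)
    # keep maximal chains only (drop a chain that is the upper part of a longer one)
    return [c for c in chains
            if not any(len(o) > len(c) and o[:len(c)] == c for o in chains)]


# Hypothetical toy TED for G = x*(z*u + q), variable order x, z, u, q
toy = {"x": ("z", None), "z": ("u", "q"), "u": ("ONE", None), "q": ("ONE", None)}
print(product_terms(toy))  # [['z', 'u']]
```

As on the slide, the search starting at u continues to z and stops there because z also has an additive edge (to q), so zu is extracted while x stays outside the product.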

Slide 15: Sum-Term Extraction
• Extractable sum term: a sum of variables that each appear in the expression only once; it can be extracted from the TED without duplicating any of its variables.
• "A set of nodes incident to multiplicative edges joined at a single common node, such that the nodes in question are connected by a chain of additive edges only."
• [TDS] recursively identifies such terms by traversing the graph bottom-up: for each node, it makes a list of incident nodes and extracts the nodes from the list that are connected by additive edges only.
• [TDS] uses the associativity property of addition.

Slide 16: Sum-Term Extraction [back to example]
Figure: starting from the bottom of the TED, sum term S1 is extracted; the support is kept (irreducible).
(Slide adapted from Ciesielski et al., DATE 2009.)


Slide 18: Example to Illustrate Associativity*
Extracted sum terms: S1 = b + d, S2 = a + c.
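A minimal sketch of sum-term extraction with associativity, under the same kind of hypothetical graph model (each node mapped to the targets of its ^1 and ^0 edges); `sum_terms` and the example function are illustrative assumptions, not TDS code. The sketch walks the chain of additive edges and groups its nodes by the common node their multiplicative edge points to. The example $F = a X + b Y + c X + d Y$ is chosen so that the groups match the slide's $S_2 = a + c$ and $S_1 = b + d$, even though a and c (and b and d) are not adjacent in the chain.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical linear-TED model (an assumption, not the TDS data structure):
# node -> (mul_child, add_child), i.e. the targets of its ^1 and ^0 edges (None = no edge).
TedGraph = Dict[str, Tuple[Optional[str], Optional[str]]]


def sum_terms(ted: TedGraph, head: str) -> Dict[str, List[str]]:
    """Follow the chain of additive (^0) edges from `head` and group the chain's nodes by the
    common node their multiplicative (^1) edge points to. Because addition is associative,
    the members of a group do not have to be adjacent in the chain."""
    groups: Dict[str, List[str]] = defaultdict(list)
    node: Optional[str] = head
    while node is not None:
        mul_child, add_child = ted[node]
        if mul_child is not None:
            groups[mul_child].append(node)
        node = add_child
    # a group with two or more members is an extractable sum term: (v1 + v2 + ...) * common
    return {common: members for common, members in groups.items() if len(members) >= 2}


# Hypothetical example F = a*X + b*Y + c*X + d*Y, where X and Y are shared subgraphs
ted = {"a": ("X", "b"), "b": ("Y", "c"), "c": ("X", "d"), "d": ("Y", None)}
print(sum_terms(ted, "a"))  # {'X': ['a', 'c'], 'Y': ['b', 'd']}
```

The result reads as $F = (a + c) X + (b + d) Y$. The sketch skips checks a full implementation would need, such as verifying that each chain node has a single parent.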

Slide 19: Sum-Term Extraction [cont., back to example]
• If sum-term extraction produces new product terms, go back to product-term extraction.
• Stop when the TED is irreducible; then generate the DFG (explained later).
(Slide adapted from Ciesielski et al., DATE 2009.)

Slide 20: Reordering [back to previous example: iteration 2 extraction]
Figure: extracted terms P3, P4, P5, S2 and S3; stop when the TED is irreducible.
(Slide adapted from Ciesielski et al., DATE 2009.)

Slide 21: Normal Factored Form*
$$F = S_3 = P_5 + P_4 = x \cdot S_2 + w_1 \cdot S_1 = x \cdot (P_1 + P_3) + w_1 \cdot (P_2 + y) = x \cdot (z \cdot u + q \cdot w_1) + w_1 \cdot (p \cdot w_2 + y) = x \cdot (z \cdot u + q \cdot w) + w \cdot (p \cdot w + y)$$
Total: 5 MPY, 3 ADD.
• The factored form associated with a TED is called the normal factored form (NFF) of that TED if the order of variables in the factored form is compatible with the order in the given TED.
• Theorem: the NFF derived from a linear TED is unique (canonical).
(Slide adapted from Ciesielski et al., DATE 2009.)

Slide 22: DFG Generation and Optimization
• Transform each irreducible TED into a simple DFG:
  - additive edge → addition operation;
  - multiplicative edge → multiplication operation;
  - break operations with multiple operands into chains of binary operations.
• [TDS] maintains a hash table of DFG nodes keyed by the corresponding function:
  - a node is reused if the same function/expression is found again;
  - this captures redundancy caused by a poor variable order during factorization.
• The DFG is not unique: it can be restructured and balanced to minimize cost.
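The node reuse described above can be sketched with a hash table keyed by (operation, operands). This is a minimal sketch under stated assumptions: the class `DFG` and its methods are hypothetical, and the key is structural (operation plus operand node ids, with operands sorted because + and * are commutative), which is weaker than the function-based key used by TDS. Multi-operand operations are broken into left-to-right chains of binary operations. The usage builds the slide-21 form $x \cdot (z \cdot u + q \cdot w) + w \cdot (p \cdot w + y)$.

```python
from typing import Dict, List, Tuple


class DFG:
    """Hash-consed DFG of binary operations: an existing node is returned for a repeated key."""

    def __init__(self) -> None:
        self.table: Dict[Tuple, int] = {}  # key -> node id (the hash table)
        self.nodes: List[Tuple] = []       # node id -> key

    def _get(self, key: Tuple) -> int:
        if key not in self.table:          # create a node only for a new key
            self.table[key] = len(self.nodes)
            self.nodes.append(key)
        return self.table[key]

    def var(self, name: str) -> int:
        return self._get(("var", name))

    def op(self, kind: str, *operands: int) -> int:
        """Break an n-ary '+' or '*' into a chain of binary operations."""
        acc = operands[0]
        for nxt in operands[1:]:
            a, b = sorted((acc, nxt))      # commutative: canonical operand order
            acc = self._get((kind, a, b))
        return acc

    def count(self, kind: str) -> int:
        return sum(1 for key in self.nodes if key[0] == kind)


g = DFG()
x, z, u, q, p, y, w = (g.var(n) for n in "xzuqpyw")
zu = g.op("*", z, u)
s2 = g.op("+", zu, g.op("*", q, w))
s1 = g.op("+", g.op("*", p, w), y)
f = g.op("+", g.op("*", x, s2), g.op("*", w, s1))
print(g.count("*"), g.count("+"))  # 5 3
print(g.op("*", u, z) == zu)       # True: u*z reuses the existing z*u node
```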

Slide 23: Data Flow Graph
Figure: DFG scheduled in control steps 1-4; total: 5 MPY, 3 ADD.
Reordering cost: latency L = 2 MPY + 2 ADD; required resources (Req) = 3 MPY, 2 ADD.
(Slide adapted from Ciesielski et al., DATE 2009.)
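The cost annotations can be reproduced with an ASAP (as-soon-as-possible) schedule. This is a minimal sketch under stated assumptions: every operation takes exactly one control step, the DFG is a hypothetical dictionary from operation names to (type, operands), and the DFG on this slide is taken to be the one for the slide-21 form $x \cdot (z \cdot u + q \cdot w) + w \cdot (p \cdot w + y)$ (its totals, 5 MPY and 3 ADD, match). `Req` is computed as the peak number of operations of each type in one step, and `L` as the operation types on a longest path.

```python
from collections import Counter
from typing import Dict, List, Tuple

# DFG as a dict: operation name -> (type, operand names); names that are not keys are inputs.
Dfg = Dict[str, Tuple[str, List[str]]]


def asap_steps(dfg: Dfg) -> Dict[str, int]:
    """ASAP schedule: each operation runs one control step after its latest operand."""
    step: Dict[str, int] = {}

    def visit(n: str) -> int:
        if n not in dfg:
            return 0  # primary input, available before step 1
        if n not in step:
            step[n] = 1 + max(visit(a) for a in dfg[n][1])
        return step[n]

    for n in dfg:
        visit(n)
    return step


def resource_peak(dfg: Dfg, step: Dict[str, int]) -> Dict[str, int]:
    """Largest number of operations of each type scheduled in the same step."""
    per_step = Counter((step[n], dfg[n][0]) for n in dfg)
    peak: Dict[str, int] = {}
    for (_, kind), k in per_step.items():
        peak[kind] = max(peak.get(kind, 0), k)
    return peak


def critical_path_ops(dfg: Dfg, out: str, step: Dict[str, int]) -> Counter:
    """Count operation types on one longest path ending at `out`."""
    ops: Counter = Counter()
    n = out
    while n in dfg:
        kind, args = dfg[n]
        ops[kind] += 1
        n = max(args, key=lambda a: step.get(a, 0))  # operand that finishes last
    return ops


dfg: Dfg = {
    "zu": ("MPY", ["z", "u"]), "qw": ("MPY", ["q", "w"]), "pw": ("MPY", ["p", "w"]),
    "S2": ("ADD", ["zu", "qw"]), "S1": ("ADD", ["pw", "y"]),
    "P5": ("MPY", ["x", "S2"]), "P4": ("MPY", ["w", "S1"]),
    "F": ("ADD", ["P5", "P4"]),
}
step = asap_steps(dfg)
print(max(step.values()))                 # 4 control steps
print(resource_peak(dfg, step))           # peak: 3 MPY, 2 ADD
print(critical_path_ops(dfg, "F", step))  # 2 MPY and 2 ADD on the critical path
```

ASAP is only one possible schedule; a resource-constrained scheduler would trade control steps for fewer units.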

Slide 24: Reordering [→ iteration 3 extraction]
Figure: terms S2, P3, P4 and S3; current cost L = 2 MPY + 2 ADD, Req = 3 MPY, 2 ADD.
The cost evaluation involves:
• reordering of variables;
• extraction;
• DFG generation;
• annotating latency and resource requirements.
(Slide adapted from Ciesielski et al., DATE 2009.)

Slide 25: Generating and Evaluating a New Data Flow Graph [Iteration 3]
$$F = S_3 = P_4 + P_3 = w \cdot S_2 + x \cdot P_1 = w \cdot (q + S_1) + x \cdot (z \cdot u) = w \cdot (q + P_2 + y) + x \cdot z \cdot u = w \cdot (q + p \cdot w + y) + x \cdot z \cdot u$$
Total: 4 MPY, 3 ADD.
Figure: candidate DFGs scheduled in control steps 1-4 and 1-5, annotated with L = 2 MPY + 2 ADD, L = 2 MPY + 3 ADD, and Req = 1 MPY, 1 ADD.
• Reordering cost: L = 2 MPY + 2 ADD, Req = 2 MPY, 1 ADD.
• Previous cost: L = 2 MPY + 2 ADD, Req = 3 MPY, 2 ADD.
(Slide adapted from Ciesielski et al., DATE 2009.)

Slide 26: Reordering [→ iteration 4 extraction, DFG generation]
Design space exploration: through reordering, all cases can be obtained.
Figure: DFG scheduled in control steps 1-4.
(Slide adapted from Ciesielski et al., DATE 2009.)

Slide 27: Replacing Constant Multipliers by Shifters*
• Transform constant multiplications into shifters, while considering factorizations that involve shifters.
• Steps:
  - Represent each constant in CSD (canonical signed digit) format, using a shift variable $L^i$ instead of $2^i$ to denote a shift by i bits.
  - Generate the TED with the shift variables, linearize it and perform the decomposition.
  - Replace the terms involving shift variables ($L^i$) by i-bit shifters.
• Example: $7a + 6b = (L^3 - 1)a + (L^3 - L)b = L^3 (a + b) - L \cdot b - a$, implemented as ((a+b)<<3) - (a+(b<<1)).
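The CSD step and the factored shifter form of the example can be checked with a short sketch. The helper names `csd`, `times_const` and `seven_a_plus_six_b` are hypothetical; `csd` computes the standard non-adjacent (CSD) digits, so $7 = 2^3 - 2^0$ and $6 = 2^3 - 2^1$, matching $L^3 - 1$ and $L^3 - L$ on the slide.

```python
from typing import List, Tuple


def csd(n: int) -> List[Tuple[int, int]]:
    """Canonical signed digit (non-adjacent) form of n as (i, d) pairs with n = sum(d * 2**i)."""
    digits = []
    i = 0
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)  # n mod 4 == 1 -> digit +1, n mod 4 == 3 -> digit -1
            n -= d
            digits.append((i, d))
        n >>= 1
        i += 1
    return digits


def times_const(c: int, x: int) -> int:
    """c * x using only shifts (L^i = shift by i bits) and additions/subtractions."""
    total = 0
    for i, d in csd(c):
        if d > 0:
            total += x << i
        else:
            total -= x << i
    return total


def seven_a_plus_six_b(a: int, b: int) -> int:
    """Factored shifter form from the slide: L^3 (a + b) - L*b - a."""
    return ((a + b) << 3) - (a + (b << 1))


print(csd(7), csd(6))  # [(0, -1), (3, 1)] [(1, -1), (3, 1)]
```

Applying `times_const` to each term separately needs shifts of a by 3 and of b by 3 and 1; the factored form shares the 3-bit shift between a and b, which is the kind of shifter factorization the TED decomposition looks for.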

Slide 28: TDS, TED Decomposition System [Recap]
• Read in the CDFG file (cdfg), a polynomial expression (poly), or a pre-coded DSP transform (tr).
• Translate it into a functional TED (dfg2ted) and structural elements (comparators, etc.).
• Linearize its data path (linearize).
• Iterate:
  - product-term extraction;
  - sum-term extraction;
  - reorder to minimize latency (reorder);
  until a set of irreducible TEDs is obtained.
• Produce the final DFG (ted2dfg) and annotate it back into the CDFG file (write).
• Target: data-flow and computation-intensive designs (DSP); design space exploration.
(Slide adapted from Ciesielski et al., DATE 2009.)

Slide 29: Conclusion
• Results in the paper show a 15% latency improvement and a 7% area reduction when using the DFG generated by TDS instead of the one obtained with KBD (kernel-based decomposition); the improvement over the original DFG is much larger.
• TDS serves as a front end to GAUT.
• Fundamental limitation: the decomposition depends on variable reordering, which is an expensive operation.

Slide 30: References
• M. Ciesielski, D. Gomez-Prado, Q. Ren, J. Guillot, and E. Boutillon, "Optimization of Data-Flow Computations Using Canonical TED Representation," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
• M. Ciesielski, S. Askar, D. Gomez-Prado, J. Guillot, and E. Boutillon, "Data-flow transformations using Taylor expansion diagrams," in Proc. Des. Autom. Test Eur., 2007, pp. 455-460.
• TDS: TED-Based Dataflow Decomposition System, Univ. Massachusetts, Amherst, MA. [Online]. Available: http://www.ecs.umass.edu/ece/labs/vlsicad/tds.html

Slide 31: Questions?

Slide 32: Experiment Setup*
Figure: the same TDS/HLS flow as in the system overview (slide 9), with the three DFG sources compared in the experiments labeled KBD, ORIGINAL and TED.
(Slide adapted from Ciesielski et al., DATE 2009.)

Slide 33: Results*
Figure not preserved in the transcript (label: KBD).
(Slide adapted from Ciesielski et al., DATE 2009.)

Slide 34: Results: Quintic Spline*
Figure not preserved in the transcript (label: KBD).
(Slide adapted from Ciesielski et al., DATE 2009.)

Slide 35: Results: Quartic Spline*
Figure not preserved in the transcript (label: KBD).
(Slide adapted from Ciesielski et al., DATE 2009.)

Slide 36: Improvement over KBD and Original*
Figure not preserved in the transcript (label: KBD).
(Slide adapted from Ciesielski et al., DATE 2009.)
