A High Performance Application Representation for Reconfigurable Systems. Wenrui Gong, Gang Wang, Ryan Kastner. Department of Electrical and Computer Engineering.


1 A High Performance Application Representation for Reconfigurable Systems
Wenrui Gong, Gang Wang, Ryan Kastner
Department of Electrical and Computer Engineering
University of California, Santa Barbara, CA 93106-9560
{gong, wanggang, kastner}@ece.ucsb.edu
http://express.ece.ucsb.edu
June 22, 2004

2 Outline
6/21/2004 GONG et al: A High Performance Application Representation for Reconfigurable Systems
- Reconfigurable computing systems
- Compilation process
- Synthesizing to hardware
- Experimental results
- Concluding remarks

3 Outline
- Reconfigurable computing systems
  - Challenges of application representations
- Compilation process
- Synthesizing to hardware
- Experimental results
- Concluding remarks

4 Reconfigurable Computing Systems
- Standard programmable platforms
- Post-manufacturing customization
- Designs shift from physical chips to configuration files
- A software design flow
- Feature hardware speed with software flexibility
- Enable higher productivity

5 Application Representations
- A common application representation is needed to tame the complexity of system synthesis
- Requirements
  - Able to generate software code for microprocessors
  - Able to be easily translated to hardware configuration files
  - Allows a variety of transformations and optimizations that improve performance

6 Parallelism Exploration
- Fine grain parallelism
  - Multiple functional units
  - Issuing an operation to a free functional unit
  - Operations executed independently
- Coarse grain parallelism
  - Executing multiple threads
  - With occasional synchronization
- Reconfigurable computing systems support both fine and coarse grain parallelism

7 PDG + SSA
- The PDG + SSA representation can be used for both hardware synthesis and software generation
- The PDG and SSA forms are common representations for software generation
- Here we concentrate on hardware synthesis

8 Outline
- Reconfigurable computing systems
- Compilation process
  - Overview
  - Constructing the PDG
  - Incorporating the SSA form
- Synthesizing to hardware
- Experimental results
- Concluding remarks

9 Overview

10 Program Dependence Graph
- PDG: Program Dependence Graph
- ENTRY node: the root node of a PDG
- PREDICATE nodes: produce predicate values from expressions
  - Diamond-shaped nodes 2, 3, and 4
- STATEMENTS nodes: an arbitrary set of operations
  - Circle nodes 1, 4, 6, 7, and 8
- REGION nodes: group all operations that share the same control conditions
  - House-shaped nodes R2, R3, R4, …
  - R3: the predicate value of 2 is True
- Edges represent dependencies
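
The node kinds listed above could be held in a data structure along the following lines. This is only a minimal sketch and not the authors' implementation: every identifier (`Pdg`, `PdgNodeKind`, `pdg_add_node`, and so on), the fixed array capacities, and the split of edges into control and data kinds are assumptions made for illustration.

```c
#include <assert.h>

#define PDG_MAX_NODES 64    /* hypothetical fixed capacities */
#define PDG_MAX_EDGES 256

/* The four node kinds named on the slide. */
typedef enum { PDG_ENTRY, PDG_PREDICATE, PDG_STATEMENT, PDG_REGION } PdgNodeKind;

/* Assumed edge kinds: control edges (optionally labelled with the
   predicate outcome) and data-dependence edges. */
typedef enum { PDG_CONTROL_TRUE, PDG_CONTROL_FALSE, PDG_CONTROL_ALWAYS, PDG_DATA } PdgEdgeKind;

typedef struct { PdgNodeKind kind; const char *label; } PdgNode;
typedef struct { int from; int to; PdgEdgeKind kind; } PdgEdge;

typedef struct {
    PdgNode nodes[PDG_MAX_NODES];
    int node_count;
    PdgEdge edges[PDG_MAX_EDGES];
    int edge_count;
} Pdg;

/* Append a node and return its index. */
int pdg_add_node(Pdg *g, PdgNodeKind kind, const char *label) {
    assert(g->node_count < PDG_MAX_NODES);
    g->nodes[g->node_count] = (PdgNode){ kind, label };
    return g->node_count++;
}

/* Append a dependence edge between two existing nodes. */
void pdg_add_edge(Pdg *g, int from, int to, PdgEdgeKind kind) {
    assert(g->edge_count < PDG_MAX_EDGES);
    assert(from >= 0 && from < g->node_count);
    assert(to >= 0 && to < g->node_count);
    g->edges[g->edge_count++] = (PdgEdge){ from, to, kind };
}

/* Count the nodes that are directly control dependent on `node`. */
int pdg_control_children(const Pdg *g, int node) {
    int count = 0;
    for (int i = 0; i < g->edge_count; ++i)
        if (g->edges[i].from == node && g->edges[i].kind != PDG_DATA)
            ++count;
    return count;
}
```

A REGION node such as R3 would then be the target of a `PDG_CONTROL_TRUE` edge from PREDICATE node 2, with the operations it groups hanging off it through `PDG_CONTROL_ALWAYS` edges.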

11 Constructing the PDG from the CDFG
- Implemented based on Ferrante’s algorithm
- Using the post-dominator tree

    val = pred;
    for (i = 0; i < len; ++i) {
        val += diff;
        if (val > 32767)
            val = 32767;
        else if (val < -32768)
            val = -32768;
    }
    return val;
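
To make the role of post-dominance concrete, the sketch below computes control dependences for a small control-flow graph. It is not the authors' code. It uses a set-based formulation (node y is control dependent on x if y post-dominates some successor of x but does not strictly post-dominate x) rather than an explicit walk of the post-dominator tree, and it omits the usual ENTRY augmentation. The `Cfg` type, the 32-node bitmask limit, and the function names are assumptions, and every node is assumed to reach the exit node.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_NODES 32   /* node sets are stored as 32-bit masks */

typedef struct {
    int n;                              /* number of CFG nodes */
    bool succ[MAX_NODES][MAX_NODES];    /* succ[a][b]: edge a -> b */
} Cfg;

/* pdom[v] receives the set of nodes that post-dominate v (including v),
   computed by iterating the dataflow equations to a fixed point. */
static void post_dominators(const Cfg *g, int exit_node, uint32_t pdom[]) {
    uint32_t all = (g->n == 32) ? 0xFFFFFFFFu : ((1u << g->n) - 1u);
    for (int v = 0; v < g->n; ++v) pdom[v] = all;
    pdom[exit_node] = 1u << exit_node;
    bool changed = true;
    while (changed) {
        changed = false;
        for (int v = 0; v < g->n; ++v) {
            if (v == exit_node) continue;
            uint32_t meet = all;        /* intersection over successors */
            for (int s = 0; s < g->n; ++s)
                if (g->succ[v][s]) meet &= pdom[s];
            uint32_t next = meet | (1u << v);
            if (next != pdom[v]) { pdom[v] = next; changed = true; }
        }
    }
}

/* cd[y] receives the set of nodes x on which y is control dependent. */
void control_dependence(const Cfg *g, int exit_node, uint32_t cd[]) {
    uint32_t pdom[MAX_NODES];
    post_dominators(g, exit_node, pdom);
    for (int y = 0; y < g->n; ++y) cd[y] = 0;
    for (int x = 0; x < g->n; ++x)
        for (int s = 0; s < g->n; ++s) {
            if (!g->succ[x][s]) continue;
            for (int y = 0; y < g->n; ++y) {
                bool y_pdom_s = (pdom[s] >> y) & 1u;
                bool y_strictly_pdom_x = (y != x) && ((pdom[x] >> y) & 1u);
                if (y_pdom_s && !y_strictly_pdom_x) cd[y] |= 1u << x;
            }
        }
}
```

For an if/else diamond (node 0 branches to 1 and 2, which both fall through to exit node 3), nodes 1 and 2 come out control dependent on node 0, and the join node 3 depends on nothing.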

12 Constructing the PDG (cont’d)

13 The Static Single Assignment Form
- Each variable has exactly one assignment
- A variable is always referenced using the same name
- At join points of control flow, special Φ-nodes are inserted

Original code:

    val += diff;
    if (val > 32767)
        val = 32767;
    else if (val < -32768)
        val = -32768;

SSA form:

    val_2 = val_1 + diff;
    if (val_2 > 32767)
        val_3 = 32767;
    else if (val_2 < -32768)
        val_4 = -32768;
    val_5 = phi(val_2, val_3, val_4);
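
The SSA fragment above is not valid C as written, because `phi` is a notational device rather than a function. One way to check that the renaming preserves behaviour is to model the Φ-node as a selection steered by the two branch predicates. The sketch below does that; the function names and the predicate variables `p_high` and `p_low` are assumptions introduced for illustration.

```c
#include <stdbool.h>

/* One iteration of the loop body, as on the slide. */
int clamp_step_original(int val, int diff) {
    val += diff;
    if (val > 32767)
        val = 32767;
    else if (val < -32768)
        val = -32768;
    return val;
}

/* The same step in SSA style: each name is assigned exactly once, and the
   phi is modelled as a selection driven by the branch predicates. */
int clamp_step_ssa(int val_1, int diff) {
    const int  val_2  = val_1 + diff;
    const bool p_high = val_2 > 32767;     /* first branch taken */
    const bool p_low  = val_2 < -32768;    /* second branch taken */
    const int  val_3  = 32767;
    const int  val_4  = -32768;
    /* val_5 = phi(val_2, val_3, val_4) */
    const int  val_5  = p_high ? val_3 : (p_low ? val_4 : val_2);
    return val_5;
}
```

Both functions return the same result for any inputs whose sum does not overflow `int`.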

14 Extending the PDG with Φ-Nodes

15 The Program Representation
- Loop-independent Φ-nodes
  - Take two or more input values and a predicate value
  - Commit one of the inputs depending on this predicate
- Loop-carried Φ-nodes
  - Inputs: the initial value, the loop-carried value, and a predicate value
  - Outputs: one to the iteration body, and the other to the loop exit
  - Direct proper values to proper outputs
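
The two kinds of Φ-node can be modelled behaviourally in software, as sketched below for the clamping loop shown earlier. This is an illustrative model under stated assumptions, not the representation used by the authors: the `first` flag that chooses between the initial and the loop-carried value is an assumption (the slide only lists the inputs and outputs), and all names are hypothetical.

```c
#include <stdbool.h>

/* Loop-independent phi: commit one of two inputs according to a predicate. */
static int phi_mux(bool pred, int if_true, int if_false) {
    return pred ? if_true : if_false;
}

/* Output of a loop-carried phi: the value plus the side it is routed to. */
typedef struct {
    bool to_body;   /* true: feeds the iteration body; false: loop exit */
    int  value;
} LoopPhiOut;

/* Loop-carried phi. `first` (an assumption) picks the initial value on loop
   entry; `loop_pred` is the loop condition and routes the value. */
static LoopPhiOut phi_loop(bool first, int initial, int carried, bool loop_pred) {
    LoopPhiOut out = { loop_pred, phi_mux(first, initial, carried) };
    return out;
}

/* Reference version: the loop as written on the slide. */
int clamp_loop_reference(int init, int diff, int len) {
    int val = init;
    for (int i = 0; i < len; ++i) {
        val += diff;
        if (val > 32767) val = 32767;
        else if (val < -32768) val = -32768;
    }
    return val;
}

/* The same loop expressed with explicit phi nodes. */
int clamp_loop_phi(int init, int diff, int len) {
    int carried = init;     /* ignored while `first` is true */
    bool first = true;
    for (int i = 0;; ++i) {
        LoopPhiOut val_1 = phi_loop(first, init, carried, i < len);
        if (!val_1.to_body)
            return val_1.value;            /* value directed to the loop exit */
        int val_2 = val_1.value + diff;
        carried = phi_mux(val_2 > 32767, 32767,
                          phi_mux(val_2 < -32768, -32768, val_2));
        first = false;
    }
}
```

In this model `phi_mux` plays the part of the selection on a predicate, and the `to_body` flag plays the part of directing a value either back into the iteration body or out of the loop.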

16 Outline
- Reconfigurable computing systems
- Compilation process
- Synthesizing to hardware
  - Data-path elements
  - Φ-nodes
- Experimental results
- Concluding remarks

17 Synthesizing the Data-Path
- A one-to-one mapping is used
  - Different resource allocation and binding algorithms can be used (on-going work)
- Each operation has an operator and several operands
- Operands are synthesized directly to wires in the circuit
  - Each variable in the SSA form has only one definition point
- PREDICATE nodes: synthesized to Boolean logic signals that control next-stage transitions and direct multiplexers to commit the correct value

18 Synthesizing Φ-nodes
- A loop-independent Φ-node is synthesized to a multiplexer. The multiplexer selects input values depending on the predicate values.
- For a loop-carried Φ-node, an additional switch is generated to direct the loop-exiting values

19 Synthesize to Hardware
- Simplifications and optimizations
  - Removing unnecessary control dependencies
  - Cascading/expanding multipliers to obtain better performance
- Flip-flops are inserted
  - Guarantee that correct values will be available no matter which execution path is taken

20 Outline
- Reconfigurable computing systems
- Compilation process
- Synthesizing to hardware
- Experimental results
  - Setup and benchmarks
  - Results
- Concluding remarks

21 Setup and Benchmarks
- Benchmark suites
  - Functions from the MediaBench suite
  - Profiled using sample data
  - Only report conservative results
- Estimated execution time
  - Aggressive predicated execution
  - Only report conservative results
- Area
  - One-to-one mapping without resource sharing
  - Reported in numbers of FPGA slices

22 Estimated Execution Time

23 Estimated Execution Time (cont’d)

24 Estimated FPGA Area

25 Outline
- Reconfigurable computing systems
- Compilation process
- Synthesizing to hardware
- Experimental results
- Concluding remarks
  - On-going/future work

26 Concluding Remarks
- The PDG+SSA form supports a variety of transformations and enables both coarse and fine grain parallelism
- A method to synthesize this form to hardware
- This form gives faster execution time using similar area when compared with CFG and PSSA forms

27 On-going/Future Work
- Investigate transformations to create coarse grained parallelism using the PDG+SSA form
- Augment the PDG+SSA form with architectural information to provide fast estimation
- Integrate resource sharing and other architectural synthesis techniques

28 Thank You
- Prof. Ryan Kastner and Gang Wang
- All audiences

29 Questions

