Center for Embedded Computer Systems, University of California, Irvine. SPARK: A High-Level Synthesis Framework for Applying Parallelizing Compiler Transformations


SPARK: A High-Level Synthesis Framework for Applying Parallelizing Compiler Transformations
Sumit Gupta, Nikil Dutt, Rajesh Gupta, Alex Nicolau
Center for Embedded Computer Systems, University of California, Irvine
Supported by Semiconductor Research Corporation

High-Level Synthesis
- Transform behavioral descriptions to RTL/gate level
- From C to CDFG to architecture
Example behavioral description:
    x = a + b;
    c = a < b;
    if (c)
        d = e - f;
    else
        g = h + i;
    j = d * g;
    l = e + x;
[Figure: the example's CDFG (an If node branching T/F on c) mapped onto an architecture with memory, ALU, control, and data path]
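To make the C-to-CDFG step concrete, here is a minimal Python sketch that records each operation of the example with the operations it must wait for, then assigns an as-soon-as-possible (ASAP) control step. It assumes unit-latency operations and unlimited functional units, and it treats the two branch operations as control-dependent on the comparison `c`; the `Op` class and `asap` function are hypothetical illustrations, not part of SPARK.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    dest: str
    expr: str
    deps: list = field(default_factory=list)  # ops that must finish first (data or control)


def asap(ops):
    """Return {dest: control step}, assuming unit latency and unlimited resources."""
    step = {}
    for op in ops:  # ops are listed in a valid topological order
        step[op.dest] = 1 + max((step[d] for d in op.deps), default=0)
    return step


# Operations from the example; the branch ops depend on the condition c.
example = [
    Op("x", "a + b"),
    Op("c", "a < b"),
    Op("d", "e - f", deps=["c"]),       # then-branch
    Op("g", "h + i", deps=["c"]),       # else-branch
    Op("j", "d * g", deps=["d", "g"]),
    Op("l", "e + x", deps=["x"]),
]

print(asap(example))  # {'x': 1, 'c': 1, 'd': 2, 'g': 2, 'j': 3, 'l': 2}
```

Dropping the dependence on `c`, which is what speculating `d` and `g` amounts to, lets both move into step 1 and shortens the path to `j` from three steps to two.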

Focus of this Paper
Spark High-Level Synthesis Framework
- Comprehensive synthesis framework
- Coordinated coarse- and fine-grain HLS and compiler optimizations
- C input => RTL VHDL output
- Results reported as logic synthesis results (area/timing)
- Large, real-life applications: MPEG, image processing
- Descriptions with complex and nested conditionals and loops
Objectives:
- Improve the quality of HLS results by extracting a high degree of parallelism
- Reduce the impact of control constructs on quality of results (QOR)
- Improve the controllability of the HLS solutions

Recent Related Work
- Code motions in the presence of conditionals
  - Condition Vector List Scheduling [Wakabayashi 89]
  - Path Based Scheduling [Camposano 91]
  - Symbolic Scheduling [Radivojevic 96]
  - WaveSched Scheduler [Lakshminarayana 98]
  - Basic Block Control Graph Scheduling [Santos 99]
- Early work was on data-intensive DSP algorithms
  - Pipelining, algorithmic transformations

SPARK High-Level Synthesis Framework
- C input => RTL VHDL output
- VHDL => logic synthesis results
- Customizable scheduler
- Modular toolbox of transformations
- Heuristics select transformations

The Intermediate Representation
- Spark uses Hierarchical Task Graphs (HTGs)
- An HTG consists of a hierarchy of basic blocks and HTG nodes
- 3 types of HTG nodes: single, compound, loop
- Enables application of coarse- and fine-grain optimizations
- Can regenerate C code
- Augmented by data dependency graphs
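A minimal Python sketch of how the three HTG node types might nest over basic blocks. The structure is an assumption for illustration: a single node wraps one basic block, a compound node groups child nodes (a sequence or an if-then-else), and a loop node holds a header and a body. The class names are hypothetical and are not SPARK's internal classes.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class BasicBlock:
    name: str
    ops: List[str] = field(default_factory=list)


@dataclass
class SingleNode:
    """Leaf of the hierarchy: wraps exactly one basic block."""
    block: BasicBlock


@dataclass
class CompoundNode:
    """Groups child HTG nodes, e.g. a sequence or an if-then-else."""
    children: List["HTGNode"]


@dataclass
class LoopNode:
    """A loop: a header that evaluates the loop condition plus a body subgraph."""
    header: "HTGNode"
    body: "HTGNode"


HTGNode = Union[SingleNode, CompoundNode, LoopNode]


def basic_blocks(node):
    """Yield every basic block under node, in source order."""
    if isinstance(node, SingleNode):
        yield node.block
    elif isinstance(node, CompoundNode):
        for child in node.children:
            yield from basic_blocks(child)
    else:  # LoopNode
        yield from basic_blocks(node.header)
        yield from basic_blocks(node.body)


# The high-level synthesis example as a hypothetical HTG.
bb0 = SingleNode(BasicBlock("BB0", ["x = a + b", "c = a < b"]))
if_node = CompoundNode([SingleNode(BasicBlock("BB1", ["d = e - f"])),
                        SingleNode(BasicBlock("BB2", ["g = h + i"]))])
bb3 = SingleNode(BasicBlock("BB3", ["j = d * g", "l = e + x"]))
program = CompoundNode([bb0, if_node, bb3])
print([b.name for b in basic_blocks(program)])  # ['BB0', 'BB1', 'BB2', 'BB3']
```

Because whole subgraphs are single objects, a transformation can treat the if-then-else as one unit (coarse grain) or descend into its basic blocks (fine grain).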

Loop HTG Node

Trailblazing: Code Motion Technique
- HTGs enable hierarchical operation moves
- Code motion does not need to visit each node in the graph
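The sketch below illustrates only the idea of a hierarchical move, not the Trailblazing algorithm itself. It assumes each HTG node caches the union of the variables read and written inside it, so deciding whether an operation can be hoisted across an entire if-then-else inspects one summary instead of every basic block within it. The `Node` class and `can_hoist_past` function are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    """An HTG node with summary read/write sets (hypothetical representation)."""
    name: str
    reads: Set[str] = field(default_factory=set)
    writes: Set[str] = field(default_factory=set)
    children: List["Node"] = field(default_factory=list)

    def summarize(self):
        """Fold the children's read/write sets into this node (bottom-up)."""
        for child in self.children:
            child.summarize()
            self.reads |= child.reads
            self.writes |= child.writes


def can_hoist_past(op_reads, op_writes, node):
    """True if an op can move upward across `node` in a single step.

    Only the node's summary is inspected, not the blocks inside it. The move is
    illegal if the op reads something the node writes (true dependence), writes
    something the node reads (anti-dependence), or writes something the node
    writes (output dependence).
    """
    return not (op_reads & node.writes
                or op_writes & node.reads
                or op_writes & node.writes)


then_bb = Node("BB1", reads={"e", "f"}, writes={"d"})
else_bb = Node("BB2", reads={"h", "i"}, writes={"g"})
if_node = Node("if(c)", reads={"c"}, children=[then_bb, else_bb])
if_node.summarize()
print(can_hoist_past({"e", "x"}, {"l"}, if_node))  # True: l = e + x is independent
print(can_hoist_past({"d", "g"}, {"j"}, if_node))  # False: j = d * g reads d and g
```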

Scheduling Heuristic
- Get the available operations (in the figure: a, b, c, d)
- Determine the code motions required
- Assign a cost to each operation
  - Cost is based on the data dependency chain
- Schedule the operation with the lowest cost
[Figure: basic-block graph BB 0 to BB 9; candidate operations a, b, c, d, some of which must be speculated across HTG nodes]
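A minimal Python sketch of the heuristic's loop on straight-line code, with the code-motion step left out. The slide only says that cost is based on the data dependency chain; here we assume cost is the negated length of the longest dependency chain starting at the operation, so operations heading long chains have the lowest cost and go first. The per-step resource limit `num_units` is also an assumption.

```python
from functools import lru_cache


def schedule(deps, num_units):
    """Greedy list scheduling driven by a dependency-chain cost.

    deps maps each op to the set of ops it depends on. At most num_units ops
    fit in one control step (assumed resource limit). Cost is assumed to be
    -(longest dependency chain starting at the op); lowest cost is scheduled first.
    """
    succs = {op: [s for s in deps if op in deps[s]] for op in deps}

    @lru_cache(maxsize=None)
    def chain_length(op):
        return 1 + max((chain_length(s) for s in succs[op]), default=0)

    done, steps = set(), []
    while len(done) < len(deps):
        # Available ops: unscheduled, with every predecessor in an earlier step.
        available = [op for op in deps if op not in done and deps[op] <= done]
        available.sort(key=lambda op: (-chain_length(op), op))  # lowest cost first
        step = available[:num_units]
        steps.append(step)
        done |= set(step)
    return steps


deps = {"x": set(), "c": set(), "d": {"c"}, "g": {"c"},
        "j": {"d", "g"}, "l": {"x"}}
print(schedule(deps, 2))  # [['c', 'x'], ['d', 'g'], ['j', 'l']]
```

The comparison `c` heads the longest chain (`c`, `d`, `j`), so it is picked before `x` even though both are available in the first step.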

Scheduling Heuristic (continued)
[Figure: the same basic-block graph (BB 0 to BB 9) after scheduling, with operations a to d moved by speculation across HTG nodes and by conditional speculation]
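The transcript keeps only the label "Conditional Speculation". One reading, and the one assumed here, is that an operation following an if-then-else is duplicated into both branches so that it can use resources left idle there. The Python sketch below shows that duplication step on plain lists of `(dest, sources)` operations; the function name and representation are hypothetical.

```python
from typing import List, Tuple

Op = Tuple[str, Tuple[str, ...]]  # (destination, source operands)


def conditionally_speculate(then_bb: List[Op], else_bb: List[Op],
                            join_bb: List[Op], index: int) -> bool:
    """Duplicate join_bb[index] into the end of both preceding branches.

    Legal only if no earlier op in the join block conflicts with it: none may
    write one of its sources, read its destination, or write its destination.
    Returns True if the move was made.
    """
    dest, srcs = join_bb[index]
    for d, s in join_bb[:index]:
        if d in srcs or dest in s or d == dest:
            return False
    op = join_bb.pop(index)
    then_bb.append(op)  # each branch gets its own copy, so the op
    else_bb.append(op)  # still executes on whichever path is taken
    return True


then_bb = [("d", ("e", "f"))]
else_bb = [("g", ("h", "i"))]
join_bb = [("j", ("d", "g")), ("l", ("e", "x"))]
print(conditionally_speculate(then_bb, else_bb, join_bb, 1))  # True
print(then_bb)  # [('d', ('e', 'f')), ('l', ('e', 'x'))]
print(join_bb)  # [('j', ('d', 'g'))]
```

Appending at the end of each branch keeps every value the operation reads identical to what it would have seen in the join block.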

Experimentation
- Experiments for several transformations
  - Speculative code motions
  - Transformations applied during scheduling: dynamic CSE
- We have used Spark to synthesize designs derived from several industrial designs
  - MPEG-1 prediction block
  - GIMP image processing software
- Scheduling results
  - Number of states in FSM
  - Cycles on longest path through design
- VHDL: logic synthesis
  - Critical path length (ns)
  - Unit area
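The slides list the scheduling metrics but not how they are computed. The Python sketch below gives one plausible reading, assuming each FSM state takes one clock cycle and the state graph is acyclic (loops would need trip counts to give a finite longest path). The `fsm_metrics` function and the state names are hypothetical.

```python
from functools import lru_cache


def fsm_metrics(transitions, start):
    """Return (number of states, cycles on the longest path) for an acyclic FSM.

    transitions maps each state to its successor states. Each state is assumed
    to take one clock cycle.
    """
    states = set(transitions) | {s for succ in transitions.values() for s in succ}

    @lru_cache(maxsize=None)
    def longest_from(state):
        return 1 + max((longest_from(s) for s in transitions.get(state, ())), default=0)

    return len(states), longest_from(start)


# An if-then-else whose else-path needs one extra state before the join S3.
fsm = {"S0": ["S1", "S2"], "S1": ["S3"], "S2": ["S4"], "S4": ["S3"]}
print(fsm_metrics(fsm, "S0"))  # (5, 4)
```

Under this reading, speculation can lower the longest-path cycle count by filling the shorter branch, while the state count tracks the size of the generated controller.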

Code Motions: Logic Synthesis Results
[Chart: code motions enabled cumulatively: within basic blocks and across hierarchical blocks; + speculation; + reverse speculation and early condition execution; + conditional speculation]
- Speculative code motions: 50% reduction in delay with 20% area increase

CSE/Dynamic CSE Results
[Chart: all code motions enabled; + only CSE; + only dynamic CSE; + CSE and dynamic CSE]
- Dynamic CSE: 30% reduction in delay, 25% reduction in area
- Speculative code motions + dynamic CSE: 75% reduction in delay with no area increase
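The slides do not spell out how dynamic CSE works. The Python sketch below shows only the underlying CSE step on a straight-line block, assuming each variable is assigned at most once so an earlier expression stays valid. Applying it "during scheduling", for example rerunning it on a block that has just received a speculated operation, is our illustration and not SPARK's implementation.

```python
def cse(block):
    """Common sub-expression elimination on a straight-line block.

    block is a list of (dest, operator, lhs, rhs) tuples. Each variable is
    assumed to be assigned at most once, so an expression computed earlier is
    still valid later. A repeated expression becomes a copy of the variable
    that already holds its value.
    """
    available = {}  # (operator, lhs, rhs) -> variable holding the result
    out = []
    for dest, operator, lhs, rhs in block:
        key = (operator, lhs, rhs)
        if operator in ("+", "*") and rhs < lhs:
            key = (operator, rhs, lhs)  # commutative: normalize operand order
        if key in available:
            out.append((dest, "copy", available[key], None))
        else:
            available[key] = dest
            out.append((dest, operator, lhs, rhs))
    return out


# A block after a speculative motion has moved t = b + a into it.
block = [("x", "+", "a", "b"), ("t", "+", "b", "a"), ("z", "*", "x", "t")]
print(cse(block))  # [('x', '+', 'a', 'b'), ('t', 'copy', 'x', None), ('z', '*', 'x', 't')]
```

The speculated `t = b + a` only became redundant once it landed next to `x = a + b`, which is the kind of opportunity a CSE pass run after code motion can catch.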

Conclusions
- Comprehensive high-level synthesis framework
- Behavioral C to RTL VHDL
- Hierarchical IR + parallelizing transformations
- Toolbox of transformations guided by heuristics
- Basic compiler transformations: CSE, copy propagation
- Platform for applying coarse- and fine-grain optimizations
- Experimentation with large industrial applications

Thank You

Additional Slides

Eliminating Dependencies by Renaming
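Only the title of this slide survives. As a general illustration of the technique it names, the Python sketch below renames every assignment to a fresh versioned name, which removes write-after-read and write-after-write conflicts and leaves only true (read-after-write) dependences. The `rename` function and the `x_1`, `x_2` naming scheme are hypothetical.

```python
def rename(block):
    """Give every assignment a fresh destination name (x -> x_1, x_2, ...).

    block is a list of (dest, sources) pairs in program order. Uses are
    rewritten to the latest version of each variable; inputs that are never
    assigned in the block keep their original names.
    """
    version = {}  # variable -> current version number
    current = {}  # variable -> name of its latest version
    out = []
    for dest, srcs in block:
        new_srcs = tuple(current.get(s, s) for s in srcs)  # read before redefining
        version[dest] = version.get(dest, 0) + 1
        current[dest] = f"{dest}_{version[dest]}"
        out.append((current[dest], new_srcs))
    return out


block = [("x", ("a", "b")), ("y", ("x",)), ("x", ("c", "d")), ("z", ("x",))]
print(rename(block))
# [('x_1', ('a', 'b')), ('y_1', ('x_1',)), ('x_2', ('c', 'd')), ('z_1', ('x_2',))]
```

After renaming, the second assignment to `x` no longer conflicts with the first assignment or with the read in `y`, so a scheduler is free to place it in the same step as the first.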