Fall 2006EE 5301 - VLSI Design Automation I VII-1 EE 5301 – VLSI Design Automation I Kia Bazargan University of Minnesota Part VII: High Level Synthesis.

Slides:



Advertisements
Similar presentations
CALTECH CS137 Fall DeHon 1 CS137: Electronic Design Automation Day 19: November 21, 2005 Scheduling Introduction.
Advertisements

ECE 667 Synthesis and Verification of Digital Circuits
ECE-777 System Level Design and Automation Hardware/Software Co-design
Architecture-dependent optimizations Functional units, delay slots and dependency analysis.
ECE Synthesis & Verification - Lecture 2 1 ECE 667 Spring 2011 ECE 667 Spring 2011 Synthesis and Verification of Digital Circuits High-Level (Architectural)
Chapter 4 Retiming.

CALTECH CS137 Winter DeHon CS137: Electronic Design Automation Day 14: March 3, 2004 Scheduling Heuristics and Approximation.
COE 561 Digital System Design & Synthesis Scheduling Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum & Minerals.
Introduction to Data Flow Graphs and their Scheduling Sources: Gang Quan.
Winter 2005ICS 252-Intro to Computer Design ICS 252 Introduction to Computer Design Lecture 5-Scheudling Algorithms Winter 2005 Eli Bozorgzadeh Computer.
Reconfigurable Computing S. Reda, Brown University Reconfigurable Computing (EN2911X, Fall07) Lecture 10: RC Principles: Software (3/4) Prof. Sherief Reda.
Modern VLSI Design 2e: Chapter 8 Copyright  1998 Prentice Hall PTR Topics n High-level synthesis. n Architectures for low power. n Testability and architecture.
Modern VLSI Design 4e: Chapter 8 Copyright  2008 Wayne Wolf Topics High-level synthesis. Architectures for low power. GALS design.
FPGA Latency Optimization Using System-level Transformations and DFG Restructuring Daniel Gomez-Prado, Maciej Ciesielski, and Russell Tessier Department.
- 1 -  P. Marwedel, Univ. Dortmund, Informatik 12, 05/06 Universität Dortmund Hardware/Software Codesign.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day17: November 20, 2000 Time Multiplexing.
ECE Synthesis & Verification - Lecture 2 1 ECE 697B (667) Spring 2006 ECE 697B (667) Spring 2006 Synthesis and Verification of Digital Circuits Scheduling.
COE 561 Digital System Design & Synthesis Architectural Synthesis Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum.
ECE Synthesis & Verification - Lecture 3 1 ECE 697B (667) Spring 2006 ECE 697B (667) Spring 2006 Synthesis and Verification of Digital Circuits Scheduling.
Courseware High-Level Synthesis an introduction Prof. Jan Madsen Informatics and Mathematical Modelling Technical University of Denmark Richard Petersens.
The Theory of NP-Completeness
Architectural-Level Synthesis Giovanni De Micheli Integrated Systems Centre EPF Lausanne This presentation can be used for non-commercial purposes as long.
1 EECS Components and Design Techniques for Digital Systems Lec 21 – RTL Design Optimization 11/16/2004 David Culler Electrical Engineering and Computer.
EDA (CS286.5b) Day 11 Scheduling (List, Force, Approximation) N.B. no class Thursday (FPGA) …
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 5: February 2, 2009 Architecture Synthesis (Provisioning, Allocation)
COE 561 Digital System Design & Synthesis Resource Sharing and Binding Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum.
SCHEDULING SOURCES- Mark Manwaring Kia Bazargan Giovanni De Micheli Gupta Youn-Long Lin M. Balakrishnan Camposano, J. Hofstede, Knapp, MacMillen Lin.
ECE Synthesis & Verification - Lecture 4 1 ECE 697B (667) Spring 2006 ECE 697B (667) Spring 2006 Synthesis and Verification of Digital Circuits Allocation:
CS294-6 Reconfigurable Computing Day 19 October 27, 1998 Multicontext.
ICS 252 Introduction to Computer Design
ECE Synthesis & Verification - LP Scheduling 1 ECE 667 ECE 667 Synthesis and Verification of Digital Circuits Scheduling Algorithms Analytical approach.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 5: February 2, 2009 Architecture Synthesis (Provisioning, Allocation)
Introduction to Data Flow Graphs and their Scheduling Sources: Gang Quan.
EDA (CS286.5b) Day 18 Retiming. Today Retiming –cycle time (clock period) –C-slow –initial states –register minimization.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 8: February 13, 2008 Retiming.
Universität Dortmund  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Hardware/software partitioning  Functionality to be implemented in software.
Register-Transfer (RT) Synthesis Greg Stitt ECE Department University of Florida.
COE 561 Digital System Design & Synthesis Architectural Synthesis Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum.
CALTECH CS137 Winter DeHon CS137: Electronic Design Automation Day 12: February 13, 2002 Scheduling Heuristics and Approximation.
1 Exploring Custom Instruction Synthesis for Application-Specific Instruction Set Processors with Multiple Design Objectives Lin, Hai Fei, Yunsi ACM/IEEE.
1 Towards Optimal Custom Instruction Processors Wayne Luk Kubilay Atasu, Rob Dimond and Oskar Mencer Department of Computing Imperial College London HOT.
Hardware/Software Co-design Design of Hardware/Software Systems A Class Presentation for VLSI Course by : Akbar Sharifi Based on the work presented in.
L11: Lower Power High Level Synthesis(2) 성균관대학교 조 준 동 교수
Resource Mapping and Scheduling for Heterogeneous Network Processor Systems Liang Yang, Tushar Gohad, Pavel Ghosh, Devesh Sinha, Arunabha Sen and Andrea.
CALTECH CS137 Winter DeHon CS137: Electronic Design Automation Day 7: February 3, 2002 Retiming.
6. A PPLICATION MAPPING 6.3 HW/SW partitioning 6.4 Mapping to heterogeneous multi-processors 1 6. Application mapping (part 2)
High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida.
High-Level Synthesis-II Virendra Singh Indian Institute of Science Bangalore IEP on Digital System IIT Kanpur.
COE 561 Digital System Design & Synthesis Architectural Synthesis Dr. Muhammad Elrabaa Computer Engineering Department King Fahd University of Petroleum.
ELEC692 VLSI Signal Processing Architecture Lecture 3
Pipelining and Retiming
L12 : Lower Power High Level Synthesis(3) 성균관대학교 조 준 동 교수
Carnegie Mellon Lecture 8 Software Pipelining I. Introduction II. Problem Formulation III. Algorithm Reading: Chapter 10.5 – 10.6 M. LamCS243: Software.
Retiming EECS 290A Sequential Logic Synthesis and Verification.
Approximation Algorithms based on linear programming.
High-Level Synthesis Creating Custom Circuits from High-Level Code Hao Zheng Comp Sci & Eng University of South Florida.
Scheduling Determines the precise start time of each task.
High-Level Synthesis Creating Custom Circuits from High-Level Code
High-Level Synthesis: Creating Custom Circuits from High-Level Code
Lesson 4 Synchronous Design Architectures: Data Path and High-level Synthesis (part two) Sept EE37E Adv. Digital Electronics.
High-Level Synthesis: Creating Custom Circuits from High-Level Code
A SoC Design Automation Seoul National University
Architectural-Level Synthesis
Architecture Synthesis
HIGH LEVEL SYNTHESIS.
Integrated Systems Centre © Giovanni De Micheli – All rights reserved
ICS 252 Introduction to Computer Design
ICS 252 Introduction to Computer Design
Reconfigurable Computing (EN2911X, Fall07)
Presentation transcript:

Fall 2006EE VLSI Design Automation I VII-1 EE 5301 – VLSI Design Automation I Kia Bazargan University of Minnesota Part VII: High Level Synthesis

Fall 2006EE VLSI Design Automation I VII-2 References and Copyright Textbooks referred (none required)  [Mic94] G. De Micheli “Synthesis and Optimization of Digital Circuits” McGraw-Hill, Slides used: (Modified by Kia when necessary)  [©Gupta] © Rajesh Gupta UC-Irvine

Fall 2006EE VLSI Design Automation I VII-3 High Level Synthesis (HLS) The process of converting a high-level description of a design to a netlist  Input: oHigh-level languages (e.g., C) oBehavioral hardware description languages (e.g., VHDL) oStructural HDLs (e.g., VHDL) oState diagrams / logic networks  Tools: oParser oLibrary of modules  Constraints: oArea constraints (e.g., # modules of a certain type) oDelay constraints (e.g., set of operations should finish in clock cycles)  Output: oOperation scheduling (time) and binding (resource) oControl generation and detailed interconnections

Fall 2006EE VLSI Design Automation I VII-4 High-Level Synthesis Compilation Flow Lex Parse Compilation front-end Behavioral Optimization Intermediate form Arch synth Logic synth Lib Binding HLS backend x = a + b  c + d + +  abcd + +  adbc

Fall 2006EE VLSI Design Automation I VII-5 Behavioral Optimization Techniques used in software compilation  Expression tree height reduction  Constant and variable propagation  Common sub-expression elimination  Dead-code elimination  Operator strength reduction (e.g., *4  << 2 ) Typical Hardware transformations  Conditional expansion oIf (c) then x=A else x=B  compute A and B in parallel, x=(C)?A:B  Loop expansion oInstead of three iterations of a loop, replicate the loop body three times A B x c

Fall 2006EE VLSI Design Automation I VII-6 Architectural Synthesis Deals with “computational” behavioral descriptions  Behavior as sequencing graph (aka dependency graph, or data flow graph DFG)  Hardware resources as library elements oPipelined or non-pipelined oResource performance in terms of execution delay  Constraints on operation timing  Constraints on hardware resource availability  Storage as registers, data transfer using wires Objective  Generate a synchronous, single-phase clock circuit  Might have multiple feasible solutions (explore tradeoff)  Satisfy constraints, minimize objective: oMaximize performance subject to area constraint oMinimize area subject to performance constraints [©Gupta]

Fall 2006EE VLSI Design Automation I VII-7 Synthesis in Temporal Domain Scheduling and binding can be done in different orders or together Schedule:  Mapping of operations to time slots (cycles)  A scheduled sequencing graph is a labeled graph [©Gupta] + NOP   +<      + <

Fall 2006EE VLSI Design Automation I VII-8 Operation Types For each operation, define its type. For each resource, define a resource type, and a delay (in terms of # cycles) T is a relation that maps an operation to a resource type that can implement it  T : V  {1, 2,..., n res }. More general case:  A resource type may implement more than one operation type (e.g., ALU) Resource binding:  Map each operation to a resource with the same type  Might have multiple options [©Gupta]

Fall 2006EE VLSI Design Automation I VII-9 Schedule in Spatial Domain Resource sharing  More than one operation bound to same resource  Operations have to be serialized  Can be represented using hyperedges (define vertex partition) [©Gupta] + NOP   +<

Fall 2006EE VLSI Design Automation I VII-10 Scheduling and Binding  Resource dominated  Control dominated Resource constraints:  Number of resource instances of each type {a k : k=1, 2,..., n res }. Scheduling:  Labeled vertices  (v 3 )=1. Binding:  Hyperedges (or vertex partitions)  (v 2 )=adder1. Cost:  Number of resources  area  Registers, steering logic (Muxes, busses), wiring, control unit Delay:  Start time of the “sink” node  Might be affected by steering logic and schedule (control logic) – resource-dominated vs. ctrl-dominated

Fall 2006EE VLSI Design Automation I VII-11 Architectural Optimization Optimization in view of design space flexibility A multi-criteria optimization problem:  Determine schedule  and binding .  Under area , latency and cycle time  objectives Find non-dominated points in solution space Solution space tradeoff curves:  Non-linear, discontinuous  Area / latency / cycle time (more?) Evaluate (estimate) cost functions Unconstrained optimization problems for resource dominated circuits:  Min area: solve for minimal binding  Min latency: solve for minimum scheduling [©Gupta]

Fall 2006EE VLSI Design Automation I VII-12 Scheduling and Binding Cost and  determined by both  and .  Also affected by floorplan and detailed routing  affected by  :  Resources cannot be shared among concurrent ops  affected by  :  Resources cannot be shared among concurrent ops  When register and steering logic delays added to execution delays, might violate cycle time. Order?  Apply either one (scheduling, binding) first [©Gupta]

Fall 2006EE VLSI Design Automation I VII-13 How Is the Datapath Implemented? In the following schedule and binding, every operation has two inputs. If an input is not shown explicitly, it comes from a unique register Wires between modules? Input selection? How does binding / scheduling affect congestion? How does binding / scheduling affect steering logic? +      + <

Fall 2006EE VLSI Design Automation I VII-14 Operation Scheduling Input:  Sequencing graph G(V, E), with n vertices  Cycle time .  Operation delays D = {d i : i=0..n}. Output:  Schedule  determines start time t i of operation v i.  Latency = t n – t 0. Goal: determine area / latency tradeoff Classes:  Non-hierarchical and unconstrained  Latency constrained  Resource constrained  Hierarchical [©Gupta]

Fall 2006EE VLSI Design Automation I VII-15 Min Latency Unconstrained Scheduling Simplest case: no constraints, find min latency Given set of vertices V, delays D and a partial order > on operations E, find an integer labeling of operations  : V  Z + Such that:  t i =  (v i ).  t i  t j + d j  (v j, v i )  E.  = t n – t 0 is minimum. Solvable in polynomial time Bounds on latency for resource constrained problems ASAP algorithm used: topological order

Fall 2006EE VLSI Design Automation I VII-16 ASAP Schedules  Schedule v 0 at t 0 =0.  While (v n not scheduled) oSelect v i with all scheduled predecessors oSchedule v i at t i = max {t j +d j }, v j being a predecessor of v i.  Return t n. + NOP   +<

Fall 2006EE VLSI Design Automation I VII-17 ALAP Schedules  Schedule v n at t n =.  While (v 0 not scheduled) oSelect v i with all scheduled successors oSchedule v i at t i = min {t j -d j }, v j being a succecessor of v i. + NOP      +<

Fall 2006EE VLSI Design Automation I VII-18 Resource Constraint Scheduling Constrained scheduling  General case NP-complete  Minimize latency given constraints on area or the resources (ML-RCS)  Minimize resources subject to bound on latency (MR- LCS) Exact solution methods  ILP: Integer Linear Programming  Hu’s heuristic algorithm for identical processors Heuristics  List scheduling  Force-directed scheduling

Fall 2006EE VLSI Design Automation I VII-19 Use binary decision variables  i = 0, 1,..., n  l = 1, 2,..., ’+1 ’ given upper-bound on latency  x il = 1 if operation i starts at step l, 0 otherwise. Set of linear inequalities (constraints), and an objective function (min latency) Observations   t i = start time of op i.  is op v i (still) executing at step l ? ILP Formulation of ML-RCS [Mic94] p.198 ?

Fall 2006EE VLSI Design Automation I VII-20 Start Time vs. Execution Time For each operation v i, only one start time If d i =1, then the following questions are the same:  Does operation v i start at step l ?  Is operation v i running at step l ? But if d i >1, then the two questions should be formulated as:  Does operation v i start at step l ? oDoes x il = 1 hold?  Is operation v i running at step l ? oDoes the following hold? ?

Fall 2006EE VLSI Design Automation I VII-21 Operation v i Still Running at Step l ? Is v 9 running at step 6?  Is x 9,6 + x 9,5 + x 9,4 = 1 ? Note:  Only one (if any) of the above three cases can happen  To meet resource constraints, we have to ask the same question for ALL steps, and ALL operations of that type v9v x 9,4 =1 v9v x 9,5 =1 v9v x 9,6 =1

Fall 2006EE VLSI Design Automation I VII-22 Operation v i Still Running at Step l ? Is v i running at step l ?  Is x i,l + x i,l x i,l-di+1 = 1 ? vivi l l-1 l-d i x i,l-di+1 =1 vivi l l-1 l-d i x i,l-1 =1 vivi l l-1 l-d i x i,l =1...

Fall 2006EE VLSI Design Automation I VII-23 Constraints:  Unique start times:  Sequencing (dependency) relations must be satisfied  Resource constraints Objective: min c T t.  t =start times vector, c =cost weight (e.g., [ ])  When c =[ ], c T t = ILP Formulation of ML-RCS (cont.)

Fall 2006EE VLSI Design Automation I VII-24 ILP Example Assume = 4 First, perform ASAP and ALAP  (we can write the ILP without ASAP and ALAP, but using ASAP and ALAP will simplify the inequalities) + NOP   +<      +< v2v1 v3 v4 v5 vn v6 v7 v8 v9 v10 v11 v2v1 v3 v4 v5 vn v6 v7 v8 v9 v10 v11

Fall 2006EE VLSI Design Automation I VII-25 ILP Example: Unique Start Times Constraint Without using ASAP and ALAP values: Using ASAP and ALAP:

Fall 2006EE VLSI Design Automation I VII-26 ILP Example: Dependency Constraints Using ASAP and ALAP, the non-trivial inequalities are: (assuming unit delay for + and *)

Fall 2006EE VLSI Design Automation I VII-27 ILP Example: Resource Constraints Resource constraints (assuming 2 adders and 2 multipliers) Objective:  Since =4 and sink has no mobility, any feasible solution is optimum, but we can use the following anyway:

Fall 2006EE VLSI Design Automation I VII-28 ILP Formulation of MR-LCS Dual problem to ML-RCS Objective:  Goal is to optimize total resource usage, a.  Objective function is c T a, where entries in c are respective area costs of resources Constraints:  Same as ML-RCS constraints, plus:  Latency constraint added:  Note: unknown a k appears in constraints. [©Gupta]

Fall 2006EE VLSI Design Automation I VII-29 Hu’s Algorithm Simple case of the scheduling problem  Operations of unit delay  Operations (and resources) of the same type Hu’s algorithm  Greedy  Polynomial AND optimal  Computes lower bound on number of resources for a given latency OR: computes lower bound on latency subject to resource constraints Basic idea:  Label operations based on their distances from the sink  Try to schedule nodes with higher labels first (i.e., most “critical” operations have priority) [©Gupta]

Fall 2006EE VLSI Design Automation I VII-30 Hu’s Algorithm HU (G(V,E), a) { Label the vertices // label = length of longest path passing through the vertex l = 1 repeat { U = unscheduled vertices in V whose predecessors have been scheduled (or have no predecessors) Select S  U such that |S|  a and labels in S are maximal Schedule the S operations at step l by setting t i =l, i: v i  S. l = l + 1 } until v n is scheduled. }

Fall 2006EE VLSI Design Automation I VII-31 Hu’s Algorithm: Example [©Gupta]

Fall 2006EE VLSI Design Automation I VII-32 List Scheduling Greedy algorithm for ML-RCS and MR-LCS  Does NOT guarantee optimum solution Similar to Hu’s algorithm  Operation selection decided by criticality  O(n) time complexity More general input  Resource constraints on different resource types

Fall 2006EE VLSI Design Automation I VII-33 List Scheduling Algorithm: ML-RCS LIST_L (G(V,E), a) { l = 1 repeat { for each resource type k { U l,k = available vertices in V. T l,k = operations in progress. Select S k  U l,k such that | S k | + | T l,k |  a k Schedule the S k operations at step l } l = l + 1 } until v n is scheduled. }

Fall 2006EE VLSI Design Automation I VII-34 List Scheduling Example [©Gupta]

Fall 2006EE VLSI Design Automation I VII-35 List Scheduling Algorithm: MR-LCS LIST_R (G(V,E), ’) { a = 1, l = 1 Compute the ALAP times t L. if t 0 L < 0 return (not feasible) repeat { for each resource type k { U l,k = available vertices in V. Compute the slacks { s i = t i L - l,  v i  U l,k }. Schedule operations with zero slack, update a Schedule additional S k  U l,k under a constraints } l = l + 1 } until v n is scheduled. }

Fall 2006EE VLSI Design Automation I VII-36 Force-Directed Scheduling Similar to list scheduling  Can handle ML-RCS and MR-LCS  For ML-RCS, schedules step-by-step  BUT, selection of the operations tries to find the globally best set of operations Idea:  Find the mobility  i = t i L – t i S of operations  Look at the operation type probability distributions  Try to flatten the operation type distributions Definition: operation probability density  p i ( l ) = Pr { v i starts at step l }.  Assume uniform distribution: [©Gupta]

Fall 2006EE VLSI Design Automation I VII-37 Force-Directed Scheduling: Definitions Operation-type distribution (NOT normalized to 1)  Operation probabilities over control steps:  Distribution graph of type k over all steps:   q k ( l ) can be thought of as expected operator cost for implementing operations of type k at step l.

Fall 2006EE VLSI Design Automation I VII-38 Example + NOP   +<

Fall 2006EE VLSI Design Automation I VII-39 Force-Directed Scheduling Algorithm: Idea Very similar to LIST_L(G(V,E), a)  Compute mobility of operations using ASAP and ALAP  Computer operation probabilities and type distributions  Select and schedule operations  Update operation probabilities and type distributions  Go to next control step Difference with list sched in selecting operations  Select operations with least force  Consider the effect on the type distribution  Consider the effect on successor nodes and their type distributions

Fall 2006EE VLSI Design Automation I VII-40 To Probe Further... Linear programming  programming.shtml Linear programming tools  faq.html Automatic complication of pipelined designs  T. Maruyama and T. Hoshino, “A C to HDL Compiler for Pipeline Processing on FPGAs”, IEEE Symposium on FPGAs for Custom Computing Machines (FCCM), pp , 2001.