ECE 667 Synthesis and Verification of Digital Circuits


ECE 667 Synthesis and Verification of Digital Circuits: Scheduling Algorithms

Architectural Optimization
- Optimization in view of design space flexibility
- A multi-criteria optimization problem: determine schedule f and binding b under area A, latency L, and cycle time t objectives
- Solution space tradeoff curves:
  - Non-linear, discontinuous
  - Area / latency / cycle time (more?)
- Evaluate (estimate) cost functions
- Unconstrained optimization problems for resource-dominated circuits:
  - Min area: solve for minimal binding
  - Min latency: solve for minimum-L scheduling
HLS - Scheduling

Scheduling, Allocation and Assignment
- Techniques are well understood and mature
Copyright 2003 Mani Srivastava

Scheduling and Assignment - Overview
F1 = (a + b + c) * d
F2 = (x + y) * z
F3 = (x + y) * w
Schedule 1: 4 control steps, 1 Add, 2 Mult
[Figure: Schedule 1, with additions +1, +2, +3 and multiplications *1, *2, *3 placed in control steps s1 to s4 to produce F1, F2, F3]

Scheduling and Assignment - Overview
F1 = (a + b + c) * d
F2 = (x + y) * z
F3 = (x + y) * w
Schedule 2: 4 control steps, 1 Add, 1 Mult
[Figure: Schedule 2, with additions +1, +2, +3 and multiplications *1, *2, *3 placed in control steps s1 to s4 to produce F1, F2, F3]

Algorithm Description → Data Flow Graph

Data Flow Graph (DFG)

Sequencing Graph
- Add source and sink nodes

Sequencing Graph
- Add source and sink nodes (NOP) to the DFG
[Figure: Data Flow Graph (DFG) and the resulting Sequencing Graph, with operations *, +, <]

ASAP Scheduling Algorithm
- As Soon As Possible scheduling
- Unconstrained minimum-latency scheduling
- Uses topological sorting of the sequencing graph (polynomial time)
- Gives the optimum solution to the unconstrained minimum-latency scheduling problem
- Schedule the source node n0 first, at step T1, and continue until the sink node nv is scheduled
- Ci = completion time (delay) of predecessor i of node j
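The ASAP rule can be sketched in a few lines of Python: each node starts as soon as its slowest predecessor has finished. This is a minimal illustration, not the course's reference code. The function name `asap`, the dictionary-based graph encoding, and the edge set are assumptions; the edges are inferred from the candidate sets listed in the list-scheduling examples of this deck, and the source and sink NOP nodes are left out.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def asap(preds, delay):
    """Return the ASAP start step of every node.

    preds: dict mapping node -> list of predecessor nodes
    delay: dict mapping node -> execution delay C_i in control steps
    """
    start = {}
    # static_order() yields a node only after all of its predecessors.
    for node in TopologicalSorter(preds).static_order():
        # Nodes without predecessors start in step 1.
        start[node] = max((start[p] + delay[p] for p in preds[node]), default=1)
    return start


# Assumed example graph (edges inferred from the deck's examples).
PREDS = {1: [], 2: [], 3: [1, 2], 4: [3], 5: [4, 7], 6: [], 7: [6],
         8: [], 9: [8], 10: [], 11: [10]}
DELAY = {v: 1 for v in PREDS}  # unit delay, Ci = 1

asap_start = asap(PREDS, DELAY)
latency = max(asap_start[v] + DELAY[v] for v in PREDS) - 1
print(asap_start, latency)
```

With unit delays the computed latency equals the length of the critical path.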

ASAP Scheduling Algorithm - Example
- Assume Ci = 1
[Figure: sequencing graph scheduled from start to finish in time steps TS = 1, 2, 3, 4]

ASAP Scheduling Example
- How many Add, Mult are needed?
[Figure: Sequence Graph and its ASAP Schedule]

ALAP Scheduling Algorithm
- As Late As Possible scheduling
- Latency-constrained scheduling (latency is fixed)
- Uses reversed topological sorting of the sequencing graph
- If over-constrained (latency too small), a solution may not exist
- Schedule the sink node nv first, at step T (the latency bound), and continue until the source node n0 is scheduled
- Ci = completion time (delay) of predecessor i of node j
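A matching sketch for ALAP walks the graph in reverse topological order and pushes every node as late as its successors allow. The function name `alap`, the graph encoding, and the edge set are assumptions (the edges are inferred from the examples later in this deck); the latency bound of 4 matches the ALAP example slides.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def alap(preds, delay, latency):
    """Return the ALAP start step of every node for a fixed latency bound."""
    # Build successor lists from the predecessor lists.
    succs = {n: [] for n in preds}
    for n, ps in preds.items():
        for p in ps:
            succs[p].append(n)

    start = {}
    order = list(TopologicalSorter(preds).static_order())
    for node in reversed(order):
        # A node must finish before its earliest-starting successor begins;
        # nodes without successors must finish by the end of the last step.
        limit = min((start[s] for s in succs[node]), default=latency + 1)
        start[node] = limit - delay[node]
        if start[node] < 1:
            raise ValueError("latency bound too small: no ALAP schedule exists")
    return start


# Assumed example graph (edges inferred from the deck's examples).
PREDS = {1: [], 2: [], 3: [1, 2], 4: [3], 5: [4, 7], 6: [], 7: [6],
         8: [], 9: [8], 10: [], 11: [10]}
DELAY = {v: 1 for v in PREDS}  # unit delay, Ci = 1

alap_start = alap(PREDS, DELAY, latency=4)
print(alap_start)
```

The `ValueError` branch corresponds to the over-constrained case, where the latency bound is shorter than the critical path.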

ALAP Scheduling Algorithm - Example
- Assume Ci = 1
[Figure: sequencing graph scheduled from start to finish in time steps TS = 1, 2, 3, 4, with TS = 4 = T]

ALAP Scheduling Example
- How many Add, Mult are needed?
[Figure: Sequence Graph and its ALAP Schedule (latency constraint = 4)]

ASAP & ALAP Scheduling
- No resource constraint
- Critical path = 4
[Figure: Sequence Graph, ASAP Schedule, and ALAP Schedule, with the slack marked]

ASAP & ALAP Scheduling
- No resource constraint
- Having determined the minimum latency, what can we do with the slack?
  - Adds flexibility to the schedule
  - Determine the minimum # resources
- What if you want to find:
  - Min. latency given fixed resources?
  - Min. resources given a latency?
[Figure: ASAP Schedule and ALAP Schedule, with the slack marked]

Computing Slack (mobility)
- Slack of operator i: $S_i = TS_i^{ALAP} - TS_i^{ASAP}$
- Defines the mobility of the operators
  S6 = 2 - 1 = 1
  S7 = 3 - 2 = 1
  S8 = 3 - 1 = 2
  S9 = 4 - 2 = 2
  S10 = 3 - 1 = 2
  S11 = 4 - 2 = 2
[Figure: ASAP and ALAP schedules of the sequencing graph, operations 1 to 11 (*, +, <) between the source and sink NOP nodes]
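The slack arithmetic is a per-node subtraction, shown here as a small Python sketch. The start steps for operations 6 to 11 follow the slide, except that operation 7 uses 3 - 2 because it cannot start before operation 6 has finished; the values for operations 1 to 5 are an assumption, taken to lie on the critical path of length 4.

```python
# Start steps per operation (unit delay, latency 4); values for
# operations 1..5 are assumed to be the critical path.
ASAP = {1: 1, 2: 1, 3: 2, 4: 3, 5: 4, 6: 1, 7: 2, 8: 1, 9: 2, 10: 1, 11: 2}
ALAP = {1: 1, 2: 1, 3: 2, 4: 3, 5: 4, 6: 2, 7: 3, 8: 3, 9: 4, 10: 3, 11: 4}

# Slack (mobility) of each operation: how far it can move without
# increasing the latency.
slack = {v: ALAP[v] - ASAP[v] for v in ASAP}

# Operations with zero slack cannot move at all.
critical = sorted(v for v, s in slack.items() if s == 0)

print(slack)
print(critical)
```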

Timing Constraints
- Time measured in cycles or control steps
- Imposing relative timing constraints between operators i and j: max & min timing constraints
[Figure: graph annotations showing the node number (name) and the required delay]

Constraint Graph Gc(V,E)
- MULT delay = 2
- ADD delay = 1

Existence of Schedule under Timing Constraints
- Upper bound (max timing constraint) is a problem
- Examine each max timing constraint (i, j):
  - Longest weighted path between nodes i and j must be ≤ the max timing constraint uij
  - Any cycle in Gc including edge (i, j) must have negative or zero weight
- Necessary and sufficient condition: the constraint graph Gc must not have positive cycles
- Example (assume delays ADD = 1, MULT = 2):
  - Path {1 → 2} has weight 2 ≤ u12 = 3, that is, cycle {1, 2, 1} has weight 2 - 3 = -1, OK
  - No positive cycles in the graph, so it has a consistent schedule
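One common way to test the no-positive-cycle condition is a longest-path variant of Bellman-Ford, sketched below. This is a swapped-in technique for illustration and is not taken from the slides. The function name, the edge encoding, and the three-node graph are assumptions: a max constraint uij is modeled as a backward edge (j, i) with weight -uij, and every node is assumed reachable from the source.

```python
def has_positive_cycle(num_nodes, edges, source=0):
    """Longest-path Bellman-Ford on a constraint graph.

    edges: list of (u, v, w) meaning t_v >= t_u + w.
    Returns (True, dist) if a positive cycle is reachable from the source,
    otherwise (False, dist) with dist holding the longest path weights.
    """
    neg_inf = float("-inf")
    dist = [neg_inf] * num_nodes
    dist[source] = 0
    for _ in range(num_nodes - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] != neg_inf and dist[u] + w > dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return False, dist  # converged early: no positive cycle
    # If any edge can still be relaxed, the weights grow without bound.
    for u, v, w in edges:
        if dist[u] != neg_inf and dist[u] + w > dist[v]:
            return True, dist
    return False, dist


# Source 0, MULT node 1 (delay 2) feeding node 2, max constraint u12 = 3.
ok_edges = [(0, 1, 0), (1, 2, 2), (2, 1, -3)]
infeasible, dist = has_positive_cycle(3, ok_edges)
start_steps = [d + 1 for d in dist]  # control steps are numbered from 1

# Tightening the max constraint to u12 = 1 makes the cycle weight +1.
bad_edges = [(0, 1, 0), (1, 2, 2), (2, 1, -1)]
infeasible_tight, _ = has_positive_cycle(3, bad_edges)
print(infeasible, start_steps, infeasible_tight)
```

When no positive cycle exists, the longest-path weights from the source give a consistent set of start times directly.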

Existence of Schedule under Timing Constraints
- Example: satisfying assignment
- Assume delays: ADD = 1, MULT = 2
- Feasible assignment:
  Vertex   Start time
  v0       step 1
  v1       step 1
  v2       step 3
  v3       step 1
  v4       step 5
  vn       step 6

Scheduling – a Combinatorial Optimization Problem
- NP-complete problem
- Optimal solutions for special cases and ILP
- Heuristics: iterative improvements
- Heuristics: constructive
- Various versions of the problem:
  - Unconstrained, minimum latency
  - Resource-constrained, minimum latency
  - Timing-constrained, minimum latency
  - Latency-constrained, minimum resource
- If all resources are identical, the problem reduces to multiprocessor scheduling (Hu’s algorithm)

Observation about ALAP & ASAP
- No consideration given to resource constraints
- No priority is given to nodes on the critical path
- As a result, less critical nodes may be scheduled ahead of critical nodes
  - No problem if unlimited hardware is available
  - However, if the resources are limited, the less critical nodes may block the critical nodes and thus produce inferior schedules
- List scheduling techniques overcome this problem by using a more global node selection criterion

Hu’s Algorithm
- Simple case of the scheduling problem:
  - Each operation has unit delay
  - Each operation can be implemented by the same operator (multiprocessor)
- Hu’s algorithm:
  - Greedy, polynomial time
  - Optimal for trees and single-type operations
  - Computes the minimum number of resources for a given latency (MR-LCS), or the minimum latency subject to resource constraints (ML-RCS)
- Basic idea:
  - Label operations based on their distances from the sink
  - Try to schedule nodes with higher labels first (i.e., the most “critical” operations have priority)

Hu’s Algorithm (for Multiprocessors)
- Labeling of nodes: label operations based on their distances from the sink
- Notation:
  - $\alpha_i$ = label of node i
  - $\alpha = \max_i \alpha_i$
  - $p(j)$ = # vertices with label j
- Theorem (Hu): a lower bound on the number of resources needed to complete a schedule with latency L is
  $a_{min} = \max_{\gamma} \left\lceil \frac{\sum_{j=1}^{\gamma} p(\alpha + 1 - j)}{\gamma + L - \alpha} \right\rceil$
  where $\gamma$ is a positive integer ($1 \le \gamma \le \alpha + 1$)
- In this case: $a_{min} = 3$ (number of operators needed)
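Hu's lower bound can be evaluated by trying every value of the integer parameter and keeping the largest ceiling. The sketch below assumes unit delays and labels that count the nodes on the longest path from a vertex to the sink. The function name and the six-node label set are hypothetical and do not reproduce the graph from the slide's figure.

```python
from collections import Counter


def hu_lower_bound(labels, latency):
    """Lower bound on the number of identical resources for a given latency.

    labels: dict mapping node -> label (distance from the sink, at least 1)
    """
    alpha = max(labels.values())
    if latency < alpha:
        raise ValueError("latency is shorter than the longest path")
    p = Counter(labels.values())  # p[j] = number of vertices with label j
    bound = 0
    for gamma in range(1, alpha + 2):
        # Vertices with the gamma highest labels must fit in the
        # first (gamma + latency - alpha) steps.
        covered = sum(p[alpha + 1 - j] for j in range(1, gamma + 1))
        steps = gamma + latency - alpha
        bound = max(bound, -(-covered // steps))  # integer ceiling
    return bound


# Hypothetical labels: three vertices at distance 3, two at 2, one at 1.
LABELS = {"a": 3, "b": 3, "c": 3, "d": 2, "e": 2, "f": 1}

bound_latency_3 = hu_lower_bound(LABELS, latency=3)
bound_latency_4 = hu_lower_bound(LABELS, latency=4)
print(bound_latency_3, bound_latency_4)
```

Relaxing the latency by one step lowers the bound here, which is the area/latency tradeoff the bound is meant to expose.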

Hu’s Algorithm (min Latency s.t. Resource constraint)
HU (G(V,E), a) {            // a = resource constraint (scalar)
    Label the vertices      // label = length of the longest path from the vertex to the sink
    l = 1
    repeat {
        U = unscheduled vertices in V whose predecessors have already been scheduled (or that have no predecessors)
        Select S ⊆ U such that |S| ≤ a and the labels in S are maximal
        Schedule the S operations at step l by setting ti = l for all vi ∈ S
        l = l + 1
    } until vn is scheduled
}
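The HU procedure above translates fairly directly into Python. The sketch assumes unit-delay operations of a single type and labels that count the nodes on the longest path to the sink. The function name, the graph encoding, and the small in-tree are hypothetical; ties between equal labels are broken by insertion order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def hu_schedule(preds, a):
    """Greedy unit-delay scheduling with `a` identical resources."""
    if a < 1:
        raise ValueError("at least one resource is required")
    succs = {n: [] for n in preds}
    for n, ps in preds.items():
        for p in ps:
            succs[p].append(n)

    # Label = number of nodes on the longest path from the node to the sink.
    label = {}
    for n in reversed(list(TopologicalSorter(preds).static_order())):
        label[n] = 1 + max((label[s] for s in succs[n]), default=0)

    start, step = {}, 1
    while len(start) < len(preds):
        # U: unscheduled vertices whose predecessors are all scheduled.
        ready = [n for n in preds
                 if n not in start and all(p in start for p in preds[n])]
        ready.sort(key=lambda n: -label[n])  # highest labels first
        for n in ready[:a]:                  # S: at most a operations
            start[n] = step
        step += 1
    return start, label


# Hypothetical in-tree: a, b -> d; c -> e; d, e -> f.
TREE = {"a": [], "b": [], "c": [], "d": ["a", "b"], "e": ["c"], "f": ["d", "e"]}

start_2, labels = hu_schedule(TREE, a=2)
start_3, _ = hu_schedule(TREE, a=3)
print(max(start_2.values()), max(start_3.values()))
```

On this tree, two resources give a four-step schedule and three resources give a three-step schedule.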

Hu’s Algorithm: Example (a = 3) [©Gupta]

List Scheduling (arbitrary operators)
- Extends the idea to several operators
- Greedy algorithm for ML-RCS and MR-LCS
  - Does NOT guarantee an optimum solution
- Similar to Hu’s algorithm
  - Operation selection decided by criticality
  - O(n) time complexity
- Considers a more general case
  - Resource constraints with different resource types
  - Multi-cycle operations
  - Pipelined operations

List Scheduling Algorithms
- Algorithm 1: Minimize latency under resource constraint (ML-RC)
  - Resource constraint represented by vector a (indexed by resource type)
  - Example: two types of resources, MULT (a1 = 1), ADD (a2 = 2)
- Algorithm 2: Minimize resources under latency constraint (MR-LC)
  - Latency constraint is given and the resource constraint vector a is to be minimized
- Define:
  - The candidate operations Ul,k: those operations of type k whose predecessors have already been scheduled early enough (completed at step l):
    Ul,k = { vi ∈ V : type(vi) = k and tj + dj ≤ l for all j : (vj, vi) ∈ E }
  - The unfinished operations Tl,k: those operations of type k that started at earlier cycles but whose execution has not finished at step l (multi-cycle operations):
    Tl,k = { vi ∈ V : type(vi) = k and ti + di > l }
- Priority list
  - List operators according to some heuristic urgency measure
  - Common priority list: labeled by position on the longest path, in decreasing order

List Scheduling Algorithm 1: ML-RC
Minimize latency under resource constraint
LIST_L (G(V,E), a) {        // resource constraints specified by vector a
    l = 1
    repeat {
        for each resource type k {
            Ul,k = candidate operations available in step l
            Tl,k = unfinished operations (in progress)
            Select Sk ⊆ Ul,k such that |Sk| + |Tl,k| ≤ ak
            Schedule the Sk operations at step l
        }
        l = l + 1
    } until vn is scheduled
}
Note: If di = 1 (unit delay) for all operators i, the set Tl,k is empty
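A compact Python sketch of LIST_L follows, under stated assumptions. The priority is the delay-weighted longest path to the sink, ties are broken by node number, and the edge set and operation types are inferred from the candidate sets in the deck's Example 1. The function name and data layout are illustrative only.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def list_schedule(preds, op_type, delay, a):
    """Minimize latency under resource constraints (greedy heuristic).

    a: dict mapping resource type -> number of available units
    """
    if any(a.get(op_type[n], 0) < 1 for n in preds):
        raise ValueError("every operation type needs at least one unit")
    succs = {n: [] for n in preds}
    for n, ps in preds.items():
        for p in ps:
            succs[p].append(n)
    # Priority: delay-weighted longest path from the node to the sink.
    prio = {}
    for n in reversed(list(TopologicalSorter(preds).static_order())):
        prio[n] = delay[n] + max((prio[s] for s in succs[n]), default=0)

    start, step = {}, 1
    while len(start) < len(preds):
        for k, units in a.items():
            # U_{l,k}: unscheduled type-k operations whose predecessors
            # have completed by this step.
            cand = [n for n in preds
                    if n not in start and op_type[n] == k
                    and all(p in start and start[p] + delay[p] <= step
                            for p in preds[n])]
            # |T_{l,k}|: type-k operations still executing in this step.
            busy = sum(1 for n in start
                       if op_type[n] == k and start[n] + delay[n] > step)
            cand.sort(key=lambda n: (-prio[n], n))
            for n in cand[:max(units - busy, 0)]:
                start[n] = step
        step += 1
    return start


# Assumed graph and types (inferred from the deck's Example 1).
PREDS = {1: [], 2: [], 3: [1, 2], 4: [3], 5: [4, 7], 6: [], 7: [6],
         8: [], 9: [8], 10: [], 11: [10]}
TYPE = {v: ("MULT" if v in (1, 2, 3, 6, 7, 8) else "ALU") for v in PREDS}

unit = {v: 1 for v in PREDS}
sched_1 = list_schedule(PREDS, TYPE, unit, {"MULT": 2, "ALU": 2})

multi = {v: (2 if TYPE[v] == "MULT" else 1) for v in PREDS}
sched_2a = list_schedule(PREDS, TYPE, multi, {"MULT": 3, "ALU": 1})
print(sched_1)
print(sched_2a)
```

Under these assumptions, `sched_1` uses four steps with two multipliers and two ALUs, and `sched_2a` covers the multi-cycle case with three multipliers and one ALU.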

List Scheduling ML-RC – Example 1 (a = [2,2])
Minimize latency under resource constraint (with d = 1)
- Assumptions:
  - All operations have unit delay (di = 1)
  - Resource constraints: MULT: a1 = 2, ALU: a2 = 2
- Step 1:
  - U1,1 = {v1, v2, v6, v8}, select {v1, v2}
  - U1,2 = {v10}, select + schedule
- Step 2:
  - U2,1 = {v3, v6, v8}, select {v3, v6}
  - U2,2 = {v11}, select + schedule
- Step 3:
  - U3,1 = {v7, v8}, select + schedule
  - U3,2 = {v4}, select + schedule
- Step 4:
  - U4,2 = {v5, v9}, select + schedule
[Figure: sequencing graph with operations 1 to 11 (*, +, <) between NOP nodes, scheduled in steps T1 to T4]

List Scheduling ML-RC – Example 2a (a = [3,1])
Minimize latency under resource constraint (with d1 = 2, d2 = 1)
- Assumptions:
  - Operations have different delays: delMULT = 2, delALU = 1
  - Resource constraints: MULT: a1 = 3, ALU: a2 = 1

  MULT                  ALU    Start time
  U = {v1, v2, v6}      v10    1
  T = {v1, v2, v6}      v11    2
  U = {v3, v7, v8}      --     3
  T = {v3, v7, v8}      --     4
  --                    v4     5
  --                    v5     6
  --                    v9     7

- Latency L = 8
[Figure: sequencing graph with operations 1 to 11 (*, +, <) between NOP nodes, scheduled in steps T1 to T7]

List Scheduling ML-RC – Example 2b (Pipelined)
Minimize latency under resource constraint (a1 = 3 MULTs, a2 = 3 ALUs)
- Assumptions:
  - Multipliers are pipelined
  - Sharing between first and second pipeline stage allowed for different multipliers

  MULT              ALU    Start time
  {v1, v2, v6}      v10    1
  v8                v11    2
  {v3, v7}          --     3
  --                v9     4
  --                v4     5
  --                v5     6

- L = 7; compare to the multi-cycle case
[Figure: sequencing graph with operations 1 to 11 (*, +, <) between NOP nodes, scheduled in steps T1 to T6]

List Scheduling Algorithm 2: MR-LC
LIST_R (G(V,E), l’) {       // l’ = latency bound
    a = 1, l = 1
    Compute the latest possible starting times tL using the ALAP algorithm
    repeat {
        for each resource type k {
            Ul,k = candidate operations
            Compute slacks { si = tiL - l, for all vi ∈ Ul,k }
            Schedule operations with zero slack, update a
            Schedule additional Sk ⊆ Ul,k requiring no additional resources
        }
        l = l + 1
    } until vn is scheduled
}
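LIST_R can be sketched in Python in the same style. Assumptions: the ALAP start times are supplied as input (here hard-coded for unit delays and a latency bound of 4), extra candidates are taken in order of increasing slack, and the graph and operation types are inferred from the deck's examples. The function name and data layout are illustrative only.

```python
def list_schedule_min_resources(preds, op_type, delay, alap_start):
    """Minimize resources under a latency constraint (greedy heuristic).

    alap_start: latest start step of each node for the latency bound.
    Returns (start, a), where a maps resource type -> units used.
    """
    a = {k: 1 for k in sorted(set(op_type.values()))}  # start with one unit each
    start, step = {}, 1
    while len(start) < len(preds):
        for k in a:
            # U_{l,k}: unscheduled type-k operations whose predecessors
            # have completed by this step.
            cand = [n for n in preds
                    if n not in start and op_type[n] == k
                    and all(p in start and start[p] + delay[p] <= step
                            for p in preds[n])]
            busy = sum(1 for n in start
                       if op_type[n] == k and start[n] + delay[n] > step)
            # Smallest slack first; slack = ALAP start - current step.
            cand.sort(key=lambda n: (alap_start[n] - step, n))
            urgent = [n for n in cand if alap_start[n] - step <= 0]
            # Zero-slack operations must start now: grow a[k] if needed.
            a[k] = max(a[k], busy + len(urgent))
            # Urgent operations come first in cand; any spare units are
            # filled with the remaining candidates.
            for n in cand[:a[k] - busy]:
                start[n] = step
        step += 1
    return start, a


# Assumed graph and types (inferred from the deck's examples).
PREDS = {1: [], 2: [], 3: [1, 2], 4: [3], 5: [4, 7], 6: [], 7: [6],
         8: [], 9: [8], 10: [], 11: [10]}
TYPE = {v: ("MULT" if v in (1, 2, 3, 6, 7, 8) else "ALU") for v in PREDS}
DELAY = {v: 1 for v in PREDS}
# ALAP start steps for latency bound 4 with unit delays.
ALAP = {1: 1, 2: 1, 3: 2, 4: 3, 5: 4, 6: 2, 7: 3, 8: 3, 9: 4, 10: 3, 11: 4}

start, resources = list_schedule_min_resources(PREDS, TYPE, DELAY, ALAP)
print(start, resources)
```

In this run operation 6 is scheduled alongside operation 3 in step 2 without raising the multiplier count above 2.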

List Scheduling MR-LC – Example 4 (di = 1)
Minimize resources under latency constraint
- Assumptions:
  - All operations have unit delay (di = 1)
  - Latency constraint: L = 4
- Use slack information to guide the scheduling:
  - Schedule operations with slack = 0 first
  - Add other operations only if the current resource count allows, e.g. add *6 with *3 without exceeding the current Mult = 2
  - The lower the slack, the more urgent it is to schedule the operation
[Figure: sequencing graph with operations 1 to 11 (*, +, <) between NOP nodes, scheduled in steps T1 to T4]

Scheduling – a Combinatorial Optimization Problem
- NP-complete problem
- Optimal solutions for special cases (trees) and ILP
- Heuristics:
  - Iterative improvements
  - Constructive
- Various versions of the problem:
  - Minimum latency, unconstrained (ASAP)
  - Latency-constrained scheduling (ALAP)
  - Minimum latency under resource constraints (ML-RC)
  - Minimum resource schedule under latency constraint (MR-LC)
- If all resources are identical, the problem reduces to multiprocessor scheduling (Hu’s algorithm)
  - The minimum-latency multiprocessor problem is intractable for general graphs
  - For trees, the greedy algorithm gives the optimum solution