Scheduling: determines the precise start time of each task.


Scheduling determines the precise start time of each task. The start times must satisfy the original dependencies of the sequencing graph. Scheduling also determines the concurrency of the resulting implementation. Area/latency trade-off points can be derived, and resources may be bounded to satisfy design requirements. A spectrum of solutions may be obtained by scheduling a sequencing graph under different resource constraints. Parallelism increases area, because more resources are used, but reduces latency (delay).

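Model for scheduling problems:
X1 = a + da;
U1 = u - (3*a*u*da) - (3*y*da);
Y1 = y + u*da;
C = X1 < a;
(Figure: data-flow graph of these statements, with inputs 3, a, u, da, y, operators *, -, +, <, and outputs X1, Y1, C, U1.)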

Sequencing graph (figure): source vertex V0, operation vertices 1 to 11 of types *, +, - and <, and sink vertex vn.

Scheduling rules
The latency of the schedule is the number of cycles needed to execute the entire schedule, i.e. the difference between the start times of the sink and source vertices: $\lambda = t_n - t_0$.
The start time of an operation is at least as large as the start time of each of its direct predecessors plus that predecessor's execution delay, i.e. $t_i \ge t_j + d_j \;\; \forall\, i, j : (v_j, v_i) \in E$.

Scheduling without resource constraints
Used when dedicated resources are available, for example when the operations differ in type.
Used when resource binding is done prior to scheduling, and resource conflicts are solved by serializing the operations that share the same resource.
Used to derive bounds on latency for constrained problems: the unconstrained schedule gives a lower bound on latency.

Problem formulation for scheduling
The scheduling problem is defined on a sequencing graph with:
1. $D = \{d_i;\ i = 0, 1, \dots, n\}$, the set of operation execution delays. The execution delays of the source and sink vertices are both zero, i.e. $d_0 = d_n = 0$. Delays are also assumed to be data independent.
2. $T = \{t_i;\ i = 0, 1, \dots, n\}$, the set of start times of the operations.
The latency of the schedule is the number of cycles needed to execute the entire schedule, i.e. the difference between the start times of the sink and source vertices: $\lambda = t_n - t_0$.

ASAP (As Soon As Possible) algorithm
ASAP ($G_s(V, E)$) {
    Schedule $v_0$ by setting $t^S_0 = 1$;
    repeat {
        Select a vertex $v_i$ whose predecessors are all scheduled;
        Schedule $v_i$ by setting $t^S_i = \max_{j : (v_j, v_i) \in E} (t^S_j + d_j)$;
    } until ($v_n$ is scheduled);
    return ($t^S$);
}
Assume that all operations have unit execution delays.

ASAP example
The ASAP algorithm first sets $t^S_0 = 1$. The vertices whose predecessors have all been scheduled are then $v_1, v_2, v_6, v_8, v_{10}$. Their start time is set to $t^S_0 + d_0 = 1 + 0 = 1$. The algorithm continues until the sink is reached, with start time $t^S_n = 5$. Therefore the latency is $\lambda = 5 - 1 = 4$.
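The ASAP procedure can be sketched in a few lines of Python. This is a minimal sketch, not the deck's own code: the edge list is an assumption reconstructed from the start times and constraints quoted in the transcript (the figure did not survive), and the vertex number 12 is a stand-in for the sink vn.

```python
from collections import defaultdict

# Assumed edge list (predecessor, successor), reconstructed from the text;
# 0 is the source v0 and 12 stands for the sink vn.
EDGES = [(0, 1), (0, 2), (0, 6), (0, 8), (0, 10),
         (1, 3), (2, 3), (3, 4), (4, 5), (6, 7), (7, 5),
         (8, 9), (10, 11), (5, 12), (9, 12), (11, 12)]
SOURCE, SINK = 0, 12
# Unit delays for the operations, zero delay for source and sink.
DELAY = {v: 1 for v in range(1, 12)}
DELAY[SOURCE] = DELAY[SINK] = 0


def asap(edges, delay, source, sink):
    """Return the ASAP start time of every vertex of an acyclic graph."""
    preds = defaultdict(list)
    for u, v in edges:
        preds[v].append(u)
    vertices = sorted({v for edge in edges for v in edge})
    start = {source: 1}
    while sink not in start:
        for v in vertices:
            # A vertex is schedulable once all its predecessors are scheduled.
            if v not in start and all(p in start for p in preds[v]):
                start[v] = max(start[p] + delay[p] for p in preds[v])
    return start


t_asap = asap(EDGES, DELAY, SOURCE, SINK)
latency = t_asap[SINK] - t_asap[SOURCE]
print(t_asap[SINK], latency)  # 5 4
```

Under these assumptions the sink starts at step 5 and the latency is 4, matching the values in the example.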


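ALAP (As Late As Possible) algorithm
ALAP ($G_s(V, E)$, $\lambda$) {
    Schedule $v_n$ by setting $t^L_n = \lambda + 1$;
    repeat {
        Select a vertex $v_i$ whose successors are all scheduled;
        Schedule $v_i$ by setting $t^L_i = \min_{j : (v_i, v_j) \in E} t^L_j - d_i$;
    } until ($v_0$ is scheduled);
    return ($t^L$);
}
Here $\lambda$ is the latency bound. The delay subtracted is that of $v_i$ itself, so that $v_i$ finishes before each successor starts.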
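The ALAP procedure mirrors ASAP, walking from the sink back to the source. The sketch below makes the same assumptions as before: a reconstructed edge list, 12 standing for the sink vn, unit delays, and a latency bound of 4 taken from the ASAP example.

```python
from collections import defaultdict

# Assumed edge list (predecessor, successor); 0 is v0 and 12 stands for vn.
EDGES = [(0, 1), (0, 2), (0, 6), (0, 8), (0, 10),
         (1, 3), (2, 3), (3, 4), (4, 5), (6, 7), (7, 5),
         (8, 9), (10, 11), (5, 12), (9, 12), (11, 12)]
SOURCE, SINK = 0, 12
DELAY = {v: 1 for v in range(1, 12)}
DELAY[SOURCE] = DELAY[SINK] = 0


def alap(edges, delay, source, sink, latency_bound):
    """Return the ALAP start time of every vertex for a given latency bound."""
    succs = defaultdict(list)
    for u, v in edges:
        succs[u].append(v)
    vertices = sorted({v for edge in edges for v in edge}, reverse=True)
    start = {sink: latency_bound + 1}
    while source not in start:
        for v in vertices:
            # A vertex is schedulable once all its successors are scheduled.
            if v not in start and all(s in start for s in succs[v]):
                start[v] = min(start[s] for s in succs[v]) - delay[v]
    return start


t_alap = alap(EDGES, DELAY, SOURCE, SINK, latency_bound=4)
print([t_alap[v] for v in (6, 7, 8, 9, 10, 11)])  # [2, 3, 3, 4, 3, 4]
```

The difference between an operation's ALAP and ASAP start times is its mobility.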

ALAP graph (figure): ALAP schedule of the sequencing graph, with source V0, operation vertices 1 to 11 of types *, +, - and <, and sink Vn.

Scheduling with resource constraints: the integer linear programming model
The start time of each operation is unique.
The sequencing relations must be satisfied.
The resource bounds must be met at every schedule time step.
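These three requirements are commonly written as linear constraints over binary variables. The block below is a sketch of that usual formulation, not a transcription from the slides. The symbols are assumptions: $x_{il} = 1$ if operation $v_i$ starts at step $l$ (0 otherwise), $\mathcal{T}(v_i)$ is the resource type of $v_i$, and $a_k$ is the number of available resources of type $k$.

```latex
\begin{align}
  \sum_{l} x_{il} &= 1,
    & i &= 0, 1, \dots, n \\
  \sum_{l} l \, x_{il} - \sum_{l} l \, x_{jl} - d_j &\ge 0,
    & \forall\, i, j &: (v_j, v_i) \in E \\
  \sum_{i : \mathcal{T}(v_i) = k} \; \sum_{m = l - d_i + 1}^{l} x_{im} &\le a_k,
    & k &= 1, \dots, n_{res}, \ \text{for every step } l
\end{align}
```

Since $\sum_l l \, x_{il}$ equals the start time $t_i$, the second line is the rule $t_i \ge t_j + d_j$ expressed in the $x_{il}$ variables.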

Example: the $x_{il}$ parameters
$x_{0,1} = 1$; $x_{1,1} = 1$; $x_{2,1} = 1$; $x_{3,2} = 1$; $x_{4,3} = 1$; $x_{5,4} = 1$.
Vertex $v_0$ starts at time step 1, therefore $x_{0,1} = 1$. Similarly, $v_1$ and $v_2$ start at time step 1, $v_3$ at time step 2, $v_4$ at time step 3 and $v_5$ at time step 4.
Vertices $v_1, v_2, v_3, v_4$ and $v_5$ have a mobility of 0, i.e. they have only one possible start time. Vertices $v_6$ and $v_7$ have a mobility of 1, hence their $x_{il}$ constraints are $x_{6,1} + x_{6,2} = 1$; $x_{7,2} + x_{7,3} = 1$.

Vertices $v_8, v_9, v_{10}$ and $v_{11}$ have a mobility of 2, hence their $x_{il}$ constraints are $x_{8,1} + x_{8,2} + x_{8,3} = 1$; $x_{9,2} + x_{9,3} + x_{9,4} = 1$; $x_{10,1} + x_{10,2} + x_{10,3} = 1$; $x_{11,2} + x_{11,3} + x_{11,4} = 1$. For the sink, $x_{n,5} = 1$.
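The uniqueness constraints follow mechanically from each operation's time frame. The sketch below generates them; the frames are read off the constraints listed above, and the `x[i,l]` string notation is only an illustrative convention.

```python
# Time frames [earliest, latest] read off the x_il constraints in the text.
FRAMES = {1: (1, 1), 2: (1, 1), 3: (2, 2), 4: (3, 3), 5: (4, 4),
          6: (1, 2), 7: (2, 3), 8: (1, 3), 9: (2, 4),
          10: (1, 3), 11: (2, 4)}


def mobility(frames):
    """Mobility is the width of the time frame minus one."""
    return {v: late - early for v, (early, late) in frames.items()}


def uniqueness_constraint(v, frame):
    """Build the 'exactly one start time' constraint of vertex v as text."""
    early, late = frame
    terms = " + ".join(f"x[{v},{l}]" for l in range(early, late + 1))
    return f"{terms} = 1"


mob = mobility(FRAMES)
print(mob[6], mob[8])                        # 1 2
print(uniqueness_constraint(8, FRAMES[8]))   # x[8,1] + x[8,2] + x[8,3] = 1
```

An operation with mobility m contributes m + 1 binary variables, so tight time frames keep the model small.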

Using the sequencing constraints we get the following set of sequencing conditions.
$2x_{7,2} + 3x_{7,3} - x_{6,1} - 2x_{6,2} - 1 \ge 0$
If $v_7$ starts at time step 2, then $v_6$ must start at time step 1; if $v_7$ starts at time step 3, then $v_6$ can start at time step 1 or 2.
$2x_{9,2} + 3x_{9,3} + 4x_{9,4} - x_{8,1} - 2x_{8,2} - 3x_{8,3} - 1 \ge 0$
$2x_{11,2} + 3x_{11,3} + 4x_{11,4} - x_{10,1} - 2x_{10,2} - 3x_{10,3} - 1 \ge 0$
$4x_{5,4} - 2x_{7,2} - 3x_{7,3} - 1 \ge 0$

Resource constraints: two multipliers available
1. At time step 1: $x_{1,1} + x_{2,1} + x_{6,1} + x_{8,1} \le 2$. Selected: $x_{1,1} + x_{2,1} = 2$.
2. At time step 2: $x_{3,2} + x_{6,2} + x_{7,2} + x_{8,2} \le 2$. Selected: $x_{3,2} + x_{6,2} = 2$.
3. At time step 3: $x_{7,3} + x_{8,3} \le 2$.

Resource constraints: two ALUs available
1. At time step 1: $x_{10,1} \le 2$.
2. At time step 2: $x_{9,2} + x_{10,2} + x_{11,2} \le 2$. $v_{11}$ is selected for scheduling.
3. At time step 3: $x_{4,3} + x_{9,3} + x_{10,3} + x_{11,3} \le 2$. $v_4$ is selected.
4. At time step 4: $x_{5,4} + x_{9,4} + x_{11,4} \le 2$. $v_5$ and $v_9$ are selected.
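The selected schedule can be checked against all three families of ILP constraints. The sketch below assumes the start times implied by the selections above (v7 and v8 at step 3 and v10 at step 1 are inferred rather than stated), unit delays, and a dependency list reconstructed from the sequencing conditions.

```python
# Start times implied by the selections in the text (partly inferred).
START = {1: 1, 2: 1, 3: 2, 4: 3, 5: 4, 6: 2, 7: 3, 8: 3, 9: 4, 10: 1, 11: 2}
# Assumed dependencies (predecessor, successor) between operations.
EDGES = [(1, 3), (2, 3), (3, 4), (4, 5), (6, 7), (7, 5), (8, 9), (10, 11)]
MULTIPLIERS = {1, 2, 3, 6, 7, 8}
ALUS = {4, 5, 9, 10, 11}
BOUND = 2            # two multipliers and two ALUs
STEPS = range(1, 5)


def x(i, l):
    """Binary variable: 1 if operation i starts at step l."""
    return 1 if START[i] == l else 0


def unique_start(i):
    return sum(x(i, l) for l in STEPS) == 1


def sequencing(j, i):
    """Constraint for edge (v_j, v_i) with unit delay d_j = 1."""
    t_i = sum(l * x(i, l) for l in STEPS)
    t_j = sum(l * x(j, l) for l in STEPS)
    return t_i - t_j - 1 >= 0


def usage(ops, l):
    return sum(x(i, l) for i in ops)


feasible = (all(unique_start(i) for i in START)
            and all(sequencing(j, i) for j, i in EDGES)
            and all(usage(MULTIPLIERS, l) <= BOUND for l in STEPS)
            and all(usage(ALUS, l) <= BOUND for l in STEPS))
print(feasible)  # True
```

A checker like this only validates one candidate schedule; finding an optimal one still requires an ILP solver or a heuristic.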

(Figure: sequencing graph with source V0, operation vertices 1 to 11 of types *, +, - and <, and sink Vn.)

Heuristic scheduling algorithms: list scheduling
LIST ($G_s(V, E)$, $a$) {
    $l = 1$;
    repeat {
        for each resource type $k = 1, 2, \dots, n_{res}$ {
            Determine the candidate operations $U_{l,k}$;
            Determine the unfinished operations $T_{l,k}$;
            Select $S_k \subseteq U_{l,k}$ such that $|S_k| + |T_{l,k}| \le a_k$;
            Schedule the $S_k$ operations at step $l$ by setting $t_i = l \;\; \forall\, i : v_i \in S_k$;
        }
        $l = l + 1$;
    } until ($v_n$ is scheduled);
    return ($t$);
}

The candidate operations $U_{l,k}$ are those operations of type $k$ whose predecessors have already been scheduled early enough that they are completed at step $l$.
The unfinished operations $T_{l,k}$ are the operations of type $k$ that started at earlier cycles and whose execution is not finished at step $l$.
A priority list of operations is used to choose among the candidates, based on some heuristic urgency measure. A common priority list labels each operation with the weight of its longest path to the sink and ranks the operations in decreasing order of label, so that the most urgent operations are scheduled first when scheduling under resource constraints.

Labeled graph


Assumptions: all operations have unit delay, and there are $a_1 = 2$ multipliers and $a_2 = 2$ ALUs.
Step 1: for $k = 1$, $U_{1,1} = \{v_1, v_2, v_6, v_8\}$; the selected operations are $\{v_1, v_2\}$, because their labels are maximal. For $k = 2$, $U_{1,2} = \{v_{10}\}$, which is selected and scheduled.
Step 2: for $k = 1$, $U_{2,1} = \{v_3, v_6, v_8\}$; the selected operations are $\{v_3, v_6\}$, because their labels are maximal. For $k = 2$, $U_{2,2} = \{v_{11}\}$, which is selected and scheduled.
Step 3: for $k = 1$, $U_{3,1} = \{v_7, v_8\}$, which are selected and scheduled. For $k = 2$, $U_{3,2} = \{v_4\}$, which is scheduled.
Step 4: $\{v_5, v_9\}$ are selected and scheduled.
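The four steps above can be reproduced with a short list scheduler. This is a sketch under stated assumptions: the predecessor lists are reconstructed from the text, the priority labels are the longest-path-to-sink weights for unit delays, ties are broken by vertex number, and the type names `mul` and `alu` are illustrative.

```python
# Assumed predecessor lists and resource types, reconstructed from the text.
PREDS = {1: [], 2: [], 3: [1, 2], 4: [3], 5: [4, 7], 6: [], 7: [6],
         8: [], 9: [8], 10: [], 11: [10]}
RTYPE = {1: "mul", 2: "mul", 3: "mul", 6: "mul", 7: "mul", 8: "mul",
         4: "alu", 5: "alu", 9: "alu", 10: "alu", 11: "alu"}
# Priority labels: longest path to the sink with unit delays.
LABEL = {1: 4, 2: 4, 3: 3, 6: 3, 4: 2, 7: 2, 8: 2, 10: 2, 5: 1, 9: 1, 11: 1}
DELAY = {v: 1 for v in PREDS}


def list_schedule(preds, rtype, delay, avail, priority):
    """Resource-constrained list scheduling; returns start step per operation."""
    start, step = {}, 1
    while len(start) < len(preds):
        for k, limit in avail.items():
            # Candidates U_{l,k}: unscheduled, all predecessors completed.
            cand = [v for v in preds
                    if v not in start and rtype[v] == k
                    and all(p in start and start[p] + delay[p] <= step
                            for p in preds[v])]
            # Unfinished operations T_{l,k}: still executing at this step.
            busy = sum(1 for v in start if rtype[v] == k
                       and start[v] <= step < start[v] + delay[v])
            cand.sort(key=lambda v: (-priority[v], v))
            for v in cand[:max(0, limit - busy)]:
                start[v] = step
        step += 1
    return start


start = list_schedule(PREDS, RTYPE, DELAY, {"mul": 2, "alu": 2}, LABEL)
by_step = {}
for v in sorted(start):
    by_step.setdefault(start[v], []).append(v)
print(by_step)  # {1: [1, 2, 10], 2: [3, 6, 11], 3: [4, 7, 8], 4: [5, 9]}
```

With other tie-breaking rules the selected operations could differ while still respecting the resource bounds.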

List scheduling to determine minimum resources
List scheduling can also be applied to minimize the resource usage under a latency constraint $\lambda$.
At the beginning, one resource per type is assumed, i.e. $a$ is a vector with all entries set to 1: $a = [1, 1]^T$.
The slack of an operation is used to rank the operations: the lower the slack, the higher the urgency in the list.
Operations with zero slack are always scheduled; otherwise the latency bound would be violated. Scheduling such operations may require additional resources, i.e. updating $a$.
The remaining operations are scheduled only if they do not require additional resources.

Let $a = [1, 1]^T$ in the beginning. At the first step, for $k = 1$, $U_{1,1} = \{v_1, v_2, v_6, v_8\}$. The two operations with zero slack, $\{v_1, v_2\}$, are scheduled; thus $a = [2, 1]^T$. For $k = 2$, $U_{1,2} = \{v_{10}\}$, which is selected and scheduled. At the second step, for $k = 1$, $U_{2,1} = \{v_3, v_6, v_8\}$. There are two operations with zero slack, $\{v_3, v_6\}$, which are scheduled. For $k = 2$, $U_{2,2} = \{v_{11}\}$, which is selected and scheduled.

At the third step, for $k = 1$, $U_{3,1} = \{v_7, v_8\}$, which are selected. For $k = 2$, $U_{3,2} = \{v_4\}$, which is selected and scheduled. At the fourth step, $U_{4,2} = \{v_5, v_9\}$. Both operations have zero slack, so they are selected and scheduled, and $a$ is updated to $a = [2, 2]^T$. Hence two resources of each type are required.
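The latency-constrained variant can be sketched by ranking candidates by slack and growing the resource vector only when zero-slack operations demand it. The sketch assumes unit delays, a latency bound of 4, ALAP start times taken from the upper ends of the time frames, and a reconstructed predecessor list; the type names are illustrative.

```python
# Assumed predecessor lists and resource types, reconstructed from the text.
PREDS = {1: [], 2: [], 3: [1, 2], 4: [3], 5: [4, 7], 6: [], 7: [6],
         8: [], 9: [8], 10: [], 11: [10]}
RTYPE = {1: "mul", 2: "mul", 3: "mul", 6: "mul", 7: "mul", 8: "mul",
         4: "alu", 5: "alu", 9: "alu", 10: "alu", 11: "alu"}
# ALAP start times for a latency bound of 4 (upper ends of the time frames).
ALAP = {1: 1, 2: 1, 3: 2, 4: 3, 5: 4, 6: 2, 7: 3, 8: 3, 9: 4, 10: 3, 11: 4}


def min_resource_schedule(preds, rtype, alap, types):
    """List scheduling under a latency bound, assuming unit delays."""
    avail = {k: 1 for k in types}          # start with one resource per type
    start, step = {}, 1
    while len(start) < len(preds):
        for k in types:
            cand = [v for v in preds
                    if v not in start and rtype[v] == k
                    and all(p in start and start[p] + 1 <= step
                            for p in preds[v])]
            # Slack = ALAP start time minus current step; lowest slack first.
            cand.sort(key=lambda v: (alap[v] - step, v))
            urgent = [v for v in cand if alap[v] - step == 0]
            # Zero-slack operations must run now, so grow a_k if needed.
            avail[k] = max(avail[k], len(urgent))
            for v in cand[:avail[k]]:
                start[v] = step
        step += 1
    return start, avail


start, a = min_resource_schedule(PREDS, RTYPE, ALAP, ["mul", "alu"])
print(a)  # {'mul': 2, 'alu': 2}
```

Under these assumptions the resource vector grows to two multipliers at step 1 and to two ALUs at step 4, as in the walkthrough.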

Example with $a_1 = 3$ multipliers and $a_2 = 1$ ALU. The execution delays of the multiplier and the ALU are 2 and 1, respectively.

Multiplier      ALU     Start time
v1, v2, v6      v10     1
-               v11     2
v3, v7, v8      -       3
-               -       4
-               v4      5
-               v5      6
-               v9      7
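The schedule listed above can be checked for precedence and resource violations when operations take more than one cycle. The sketch assumes a dependency list reconstructed from the earlier examples; a multiplier started at step l is counted as busy during steps l and l + 1.

```python
# Start times taken from the schedule above.
START = {1: 1, 2: 1, 6: 1, 10: 1, 11: 2, 3: 3, 7: 3, 8: 3, 4: 5, 5: 6, 9: 7}
# Assumed dependencies (predecessor, successor) between operations.
EDGES = [(1, 3), (2, 3), (3, 4), (4, 5), (6, 7), (7, 5), (8, 9), (10, 11)]
RTYPE = {1: "mul", 2: "mul", 3: "mul", 6: "mul", 7: "mul", 8: "mul",
         4: "alu", 5: "alu", 9: "alu", 10: "alu", 11: "alu"}
TYPE_DELAY = {"mul": 2, "alu": 1}
BOUND = {"mul": 3, "alu": 1}


def delay(v):
    return TYPE_DELAY[RTYPE[v]]


def precedence_ok():
    """Every operation starts after each predecessor has finished."""
    return all(START[v] >= START[u] + delay(u) for u, v in EDGES)


def usage(k, step):
    """Number of type-k resources busy at a given step."""
    return sum(1 for v in START
               if RTYPE[v] == k and START[v] <= step < START[v] + delay(v))


last_step = max(START[v] + delay(v) for v in START) - 1
resources_ok = all(usage(k, l) <= BOUND[k]
                   for k in BOUND for l in range(1, last_step + 1))
print(precedence_ok(), resources_ok, last_step)  # True True 7
```

The three multipliers are fully occupied during steps 1 to 4, which is why the single ALU ends up serializing v4, v5 and v9 afterwards.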

Multiprocessor scheduling and Hu's algorithm
Denote the labels by $\{\alpha_i;\ i = 1, 2, \dots, n\}$ and let $\alpha = \max_i \alpha_i$. Let $p(j)$ be the number of vertices with label equal to $j$. In the example, $p(0) = 1$, $p(1) = 3$, $p(2) = 4$, $p(3) = 2$ and $p(4) = 2$.

Let $a = 3$. The first iteration of Hu's algorithm selects $U = \{v_1, v_2, v_6, v_8, v_{10}\}$ and schedules operations $\{v_1, v_2, v_6\}$ at the first time step, because their labels ($\alpha_1 = 4$, $\alpha_2 = 4$, $\alpha_6 = 3$) are not smaller than any other label of unscheduled vertices in $U$. At the second iteration, $U = \{v_3, v_7, v_8, v_{10}\}$ and $\{v_3, v_7, v_8\}$ are scheduled at the second time step.

Operations $\{v_4, v_9, v_{10}\}$ are scheduled at the third step, $\{v_5, v_{11}\}$ at the fourth, and $v_n$ at the last.
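Hu's algorithm can be sketched as labelling every operation with its distance to the sink and then greedily filling each step with the highest-labelled ready operations. The sketch assumes a successor list reconstructed from the example, unit delays, a single resource type with a = 3 units, and ties broken by vertex number.

```python
from collections import Counter
from functools import lru_cache

# Assumed successor lists; an empty list means the operation feeds the sink.
SUCCS = {1: [3], 2: [3], 3: [4], 4: [5], 5: [], 6: [7], 7: [5],
         8: [9], 9: [], 10: [11], 11: []}


def labels(succs):
    """Label = length of the longest path to the sink (sink has label 0)."""
    @lru_cache(maxsize=None)
    def label(v):
        return 1 + max((label(s) for s in succs[v]), default=0)
    return {v: label(v) for v in succs}


def hu_schedule(succs, a):
    """Schedule unit-delay operations on `a` identical resources."""
    alpha = labels(succs)
    preds = {v: [u for u in succs if v in succs[u]] for v in succs}
    start, step = {}, 1
    while len(start) < len(succs):
        ready = [v for v in succs if v not in start
                 and all(p in start and start[p] < step for p in preds[v])]
        ready.sort(key=lambda v: (-alpha[v], v))   # highest label first
        for v in ready[:a]:
            start[v] = step
        step += 1
    return alpha, start


alpha, start = hu_schedule(SUCCS, a=3)
by_step = {}
for v in sorted(start):
    by_step.setdefault(start[v], []).append(v)
print(by_step)  # {1: [1, 2, 6], 2: [3, 7, 8], 3: [4, 9, 10], 4: [5, 11]}
```

The label counts computed this way agree with p(1) = 3, p(2) = 4, p(3) = 2 and p(4) = 2; p(0) = 1 corresponds to the sink, which the sketch leaves implicit.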

Heuristic scheduling algorithms: force-directed scheduling
The time frame of an operation is the time interval in which it can be scheduled. The time frames are $\{[t^S_i, t^L_i];\ i = 0, 1, \dots, n\}$.
The operation probability is a function that is zero outside the corresponding time frame and equal to the reciprocal of the frame width inside it. The operation probabilities at time $l$ are $\{p_i(l);\ i = 0, 1, \dots, n\}$.

Operations whose time frame is one unit wide are bound to start in one specific time step. For the remaining operations, the larger the width, the lower the probability that the operation is scheduled in any given step inside the corresponding time frame.
The type distribution is the sum of the probabilities of the operations implementable by a specific resource type in the set $\{1, 2, \dots, n_{res}\}$ at any time step of interest. The type distribution at time $l$ is $\{q_k(l);\ k = 1, 2, \dots, n_{res}\}$.
A distribution graph is a plot of an operation-type distribution over the schedule steps.

Operation $v_1$ has zero mobility. Hence $p_1(1) = 1$ and $p_1(2) = p_1(3) = p_1(4) = 0$. Similar considerations apply to operation $v_2$. Operation $v_6$ has mobility 1. Its time frame is $[1, 2]$, so $p_6(1) = p_6(2) = 0.5$ and $p_6(3) = p_6(4) = 0$. Operation $v_8$ has mobility 2. Its time frame is $[1, 3]$, hence $p_8(1) = p_8(2) = p_8(3) = 1/3 \approx 0.3$ and $p_8(4) = 0$. Thus the type distribution for the multiplier ($k = 1$) at step 1 is $q_1(1) = 1 + 1 + 0.5 + 0.3 = 2.8$.
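The probabilities and the multiplier type distribution can be computed directly from the time frames. In this sketch the frames of v1, v2, v6 and v8 are those quoted above, while the frames of v3 and v7 are assumptions taken from the ILP example; exact fractions are used, so the results differ slightly from the rounded values in the text.

```python
from fractions import Fraction

# Time frames [earliest, latest] of the multiplier operations (type k = 1).
MUL_FRAMES = {1: (1, 1), 2: (1, 1), 3: (2, 2),
              6: (1, 2), 7: (2, 3), 8: (1, 3)}
STEPS = range(1, 5)


def probability(frame, step):
    """Reciprocal of the frame width inside the frame, zero outside."""
    early, late = frame
    if early <= step <= late:
        return Fraction(1, late - early + 1)
    return Fraction(0)


def type_distribution(frames, steps):
    """Sum of the operation probabilities of one resource type per step."""
    return {l: sum(probability(f, l) for f in frames.values()) for l in steps}


q1 = type_distribution(MUL_FRAMES, STEPS)
print([round(float(q1[l]), 2) for l in STEPS])  # [2.83, 2.33, 0.83, 0.0]
```

Rounding 1/3 to 0.3, as the text does, gives 2.8, 2.3 and 0.8 for steps 1 to 3.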

Forces can be categorized into two classes:
1. Self-forces, which relate an operation to the different possible control steps where it can be scheduled.
2. Predecessor/successor forces, which are related to the operation dependencies.

Consider the operation $v_6$. Its type is multiply (i.e. $k = 1$). $v_6$ can be scheduled in the first two schedule steps, and its probability is $p_6 = 0.5$ in those steps and zero elsewhere. The type distribution is $q_1(1) = 2.8$ and $q_1(2) = 2.3$.
When the operation is assigned to step 1, its probability variations are $1 - 0.5$ for step 1 and $0 - 0.5$ for step 2, so the self-force is $2.8 \cdot (1 - 0.5) + 2.3 \cdot (0 - 0.5) = 0.25$. The force is positive because the concurrency of the multiplications at step 1 is higher than at step 2.
When the operation is assigned to step 2, the self-force is $2.8 \cdot (0 - 0.5) + 2.3 \cdot (1 - 0.5) = -0.25$.

The assignment of operation $v_6$ to step 2 implies the assignment of operation $v_7$ to step 3. Therefore the force of $v_7$ related to step 3, $q_1(2)(0 - p_7(2)) + q_1(3)(1 - p_7(3)) = 2.3 \cdot (0 - 0.5) + 0.8 \cdot (1 - 0.5) = -0.75$, is the successor force of $v_6$.
The total force on $v_6$ at step 2 is the sum of its self-force and successor force: $-0.25 - 0.75 = -1$. Assigning $v_6$ to step 1 leaves the time frame of $v_7$ unchanged, so the total force at step 1 is just the self-force. The total forces on $v_6$ at steps 1 and 2 are thus $0.25$ and $-1$, respectively. Scheduling $v_6$ at step 1 would therefore increase the concurrency compared with scheduling $v_6$ at step 2.
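The force arithmetic for v6 can be reproduced with one small function. The sketch uses the rounded type-distribution values quoted in the text (2.8, 2.3, 0.8); the function name `force` and the dictionary layout are illustrative choices.

```python
import math

# Multiplier type distribution q1(l), rounded as in the text.
Q1 = {1: 2.8, 2: 2.3, 3: 0.8}
# Operation probabilities inside the time frames of v6 and v7.
P6 = {1: 0.5, 2: 0.5}
P7 = {2: 0.5, 3: 0.5}


def force(q, p, assigned_step):
    """Sum over the time frame of q(l) times the change in probability."""
    return sum(q[l] * ((1.0 if l == assigned_step else 0.0) - p_l)
               for l, p_l in p.items())


self_force_step1 = force(Q1, P6, 1)      # v6 assigned to step 1
self_force_step2 = force(Q1, P6, 2)      # v6 assigned to step 2
successor_force = force(Q1, P7, 3)       # v7 pushed to step 3
total_force_step2 = self_force_step2 + successor_force

print(round(self_force_step1, 2), round(self_force_step2, 2))    # 0.25 -0.25
print(round(successor_force, 2), round(total_force_step2, 2))    # -0.75 -1.0
```

A negative total force indicates that the assignment moves the operation toward less crowded steps, which is what force-directed scheduling looks for.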