Chapter – 5.2 Static Process Scheduling


1 Chapter – 5.2 Static Process Scheduling
Anjum Reyaz-Ahmed

2 Outline
Part I: Static Process Scheduling
  Precedence process model
  Communication system model
Part II: Current Literature Review
  "Optimizing Static Job Scheduling in a Network of Heterogeneous Computers", ICPP 2000
  "Design Optimization of Time- and Cost-Constrained Fault-Tolerant Distributed Embedded Systems", DATE 2005
  "White Box Performance Analysis Considering Static Non-Preemptive Software Scheduling", DATE 2009
Part III: Future Research Initiatives

3 Static Process Scheduling
Given a set of partially ordered tasks, define a mapping of processes to processors before the processes execute. The cost model (CPU cost and communication cost) must be specified in advance.
Goal: minimize the overall finishing time (makespan) on a non-preemptive multiprocessor system of identical processors. Except for some very restricted cases, scheduling to optimize the makespan is NP-complete, so approximate or heuristic algorithms are usually proposed. Good heuristics attempt to balance and overlap computation and communication.
Static process scheduling is a deterministic scheduling policy: the mapping of processes to processors is determined before execution, and once a process starts, it stays on its processor until completion. It requires prior knowledge of process behavior (execution times, precedence relationships, communication patterns), and the scheduling decision is centralized and non-adaptive. [Chow and Johnson 1997]

4 Precedence Process Model
This model describes scheduling for a program that consists of several sub-tasks; the schedulable unit is the sub-task. The program is represented by a DAG in which the precedence constraints among tasks are explicitly specified. The critical path is the longest execution path in the DAG, and is often used to compare the performance of heuristic algorithms. [Chow and Johnson 1997]
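As an illustration (not from the slides), the critical path of such a task DAG can be computed with a longest-path pass over a topological order; the task names and execution times below are made-up examples.

```python
# Sketch: critical path of a task DAG with known per-node execution times.
from collections import defaultdict

def critical_path(exec_time, edges):
    """exec_time: {task: time}; edges: list of (u, v) precedence pairs.
    Returns (length, path) of the longest execution path in the DAG."""
    succ = defaultdict(list)
    indeg = {t: 0 for t in exec_time}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    # Topological order via Kahn's algorithm.
    order, queue = [], [t for t in exec_time if indeg[t] == 0]
    while queue:
        u = queue.pop()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    # Longest path ending at each node, tracking predecessors.
    dist = {t: exec_time[t] for t in exec_time}
    pred = {t: None for t in exec_time}
    for u in order:
        for v in succ[u]:
            if dist[u] + exec_time[v] > dist[v]:
                dist[v] = dist[u] + exec_time[v]
                pred[v] = u
    end = max(dist, key=dist.get)
    path, t = [], end
    while t is not None:
        path.append(t)
        t = pred[t]
    return dist[end], path[::-1]

times = {"A": 2, "B": 3, "C": 1, "D": 4}
print(critical_path(times, [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]))
# → (9, ['A', 'B', 'D'])
```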

5 Precedence Process and Communication System Models
The underlying architecture on which the tasks are mapped is characterized by a communication system model giving the unit communication delay between each pair of processors. The communication overhead for an edge is the number of messages times the unit delay between the assigned processors; e.g., for A (on P1) and E (on P3), 4 messages * 2 delay units = 8. (Figure callouts: execution time per node, number of messages per edge, communication overhead per message.) [Chow and Johnson 1997]

6 contd.. Scheduling goal: minimize the makespan.
List Scheduling (LS): communication overhead is not considered; a simple greedy heuristic is used: no processor remains idle if there is a task available that it could process.
Extended List Scheduling (ELS): the actual scheduling results of LS, with communication overhead then taken into account.
Earliest Task First (ETF) scheduling: the earliest schedulable task (with communication delay considered) is scheduled first.
In the DAG model, nodes are tasks with known execution times and edge weights give the number of message units to be transferred; the weights hide the details of the hardware architecture. The problem is NP-complete. The critical path (the longest execution path) is a lower bound on the makespan, so a good strategy tries to map all tasks on a critical path onto a single processor. [Chow and Johnson 1997]
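A minimal sketch of the LS greedy rule as an event-driven simulation, ignoring communication overhead as LS does; the example DAG and processor count are illustrative assumptions, not the slides' example.

```python
# Sketch: plain list scheduling (LS) on identical processors.
import heapq

def list_schedule(exec_time, edges, n_proc):
    """Greedy LS rule: no processor stays idle while some ready task
    exists. Communication overhead is ignored. Returns the makespan."""
    succ = {t: [] for t in exec_time}
    indeg = {t: 0 for t in exec_time}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    ready = sorted(t for t in exec_time if indeg[t] == 0)
    idle = list(range(n_proc))   # processors currently free
    running = []                 # min-heap of (finish_time, processor, task)
    now = makespan = 0
    while ready or running:
        while ready and idle:    # greedy dispatch onto idle processors
            task, proc = ready.pop(0), idle.pop(0)
            heapq.heappush(running, (now + exec_time[task], proc, task))
        finish, proc, task = heapq.heappop(running)
        now = makespan = finish
        idle.append(proc)
        for v in succ[task]:     # release successors whose deps are done
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return makespan

times = {"A": 2, "B": 3, "C": 1, "D": 4}
print(list_schedule(times, [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")], 2))
# → 9 (the critical path A-B-D dominates)
```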

7 Makespan Calculation for LS, ELS, and ETF
[Chow and Johnson 1997]

8 Communication Process Model
There are no precedence constraints among processes. The system is modeled by an undirected graph G = (V, E): nodes represent processes, and the weight on an edge is the amount of communication (number of messages) between the two connected processes. Process execution costs may also be specified to handle more general cases.
Scheduling goal: maximize resource utilization and minimize inter-process communication.
Cost equation: Cost(G, P) = Σ ej(pi) + Σ ci,j(pi, pj), where ej(pi) is the cost of running process pi on processor Pj and ci,j is the communication cost (0 if both processes are on the same processor). The problem is NP-complete. [Chow and Johnson 1997]

9 contd… The problem is to find an optimal assignment of m processes to P processors with respect to the target function, where: P is the set of processors; ej(pi) is the computation cost of executing process pi on processor Pj; ci,j(pi,pj) is the communication overhead between processes pi and pj. A uniform communication speed between processors is assumed. [Chow and Johnson 1997]
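The target function can be evaluated directly for any candidate assignment; a sketch in which the function name and the dictionary layout are assumptions:

```python
# Sketch: evaluating Cost(G, P) for a given process-to-processor assignment.
def assignment_cost(assign, exec_cost, comm):
    """assign: {process: processor}; exec_cost: {(processor, process): e};
    comm: {(pi, pj): cost} for each communicating process pair.
    Communication cost is zero when both processes share a processor."""
    total = sum(exec_cost[(assign[p], p)] for p in assign)
    for (pi, pj), c in comm.items():
        if assign[pi] != assign[pj]:
            total += c
    return total

# Three processes on processors A and B (made-up costs):
assign = {1: "A", 2: "A", 3: "B"}
exec_cost = {("A", 1): 5, ("A", 2): 2, ("B", 3): 4}
comm = {(1, 2): 3, (2, 3): 6}
print(assignment_cost(assign, exec_cost, comm))
# → 17 (execution 5+2+4 plus cross-processor edge (2,3) = 6)
```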

10 Optimal Polynomial-Time Algorithms
This is referred to as the Module Allocation problem. It is NP-complete except for a few cases: for P = 2, Stone suggested a polynomial-time solution using the Ford-Fulkerson maximum-flow algorithm; for some special graph topologies, such as trees, Bokhari's algorithm can be used. Known results (the mapping problem for an arbitrary number of processors is NP-complete):

Problem                          Optimal polynomial-time algorithm   Suboptimal
2 processors                     yes
2 processors with varying load   yes
tree-structured graph            yes
series-parallel graph            yes
3 or more processors                                                 yes

[Chow and Johnson 1997]

11 Stone’s two-processor model to achieve minimum total execution and communication cost
Example: partition the graph by drawing a line cutting through some edges, resulting in two disjoint subgraphs, one per processor. The set of removed edges is the cut set; its cost is the sum of the weights of its edges, i.e., the total inter-process communication cost between the two processors. Of course, the cost of the cut set is 0 if all processes are assigned to the same node. Computation constraints may be added (no more than k processes per node, distribute load evenly, …). The problem maps to maximum flow and minimum cut in a commodity-flow network: find the maximum flow from source to destination. [Chow and Johnson 1997]
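Stone computes the minimum-cost cut via max-flow; purely to make the objective concrete, here is a brute-force sketch over all two-processor assignments (Stone's max-flow formulation finds the same minimum in polynomial time). The example costs are invented.

```python
# Sketch: the quantity Stone's min-cut minimizes, found by brute force.
from itertools import product

def min_cut_assignment(procs, exec_cost, comm):
    """Try all 2^m assignments of processes to processors 'A' and 'B'.
    exec_cost[(proc, p)]: cost of running p on that processor;
    comm[(pi, pj)]: communication cost, paid only when pi and pj land
    on different processors (i.e., the edge is in the cut set)."""
    best_cost, best_assign = float("inf"), None
    for bits in product("AB", repeat=len(procs)):
        assign = dict(zip(procs, bits))
        cost = sum(exec_cost[(assign[p], p)] for p in procs)
        cost += sum(c for (pi, pj), c in comm.items()
                    if assign[pi] != assign[pj])
        if cost < best_cost:
            best_cost, best_assign = cost, assign
    return best_cost, best_assign

# Each process is cheap on a different processor; communication cost 3
# is worth paying to place each where it runs fastest:
exec_cost = {("A", 1): 1, ("B", 1): 10, ("A", 2): 10, ("B", 2): 1}
print(min_cut_assignment([1, 2], exec_cost, {(1, 2): 3}))
# → (5, {1: 'A', 2: 'B'})
```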

12 Maximum Flow Algorithm in Solving the Scheduling Problem
[Chow and Johnson 1997]

13 Minimum-Cost Cut: only the cuts that separate A and B are feasible.
[Chow and Johnson 1997]

14 Generalized solution for more than two processors
Stone uses a repetitive approach based on the two-processor algorithm to solve n-processor problems: treat (n-1) processors as one super-processor, then further break down the processors inside the super-processor based on the results of the previous step. [Chow and Johnson 1997]

15 Other Heuristics Another heuristic separates the optimization of computation and communication. Assuming communication delay is the more significant cost, merge processes with high inter-process interaction into clusters of processes; the clusters are then assigned to the processors that minimize the computation cost. With the reduced problem size, the optimum is relatively easier to find (e.g., by exhaustive search). A simple heuristic: merge two processes if their communication cost is higher than a threshold C. Constraints can also be put on the total computation of a cluster, to prevent over-clustering. [Chow and Johnson 1997]

16 Cluster of Processes For C = 9 we get three clusters: (2,4), (1,6), and (3,5). Clusters (2,4) and (1,6) must be mapped to processors A and B respectively. Cluster (3,5) can be assigned to either A or B, but is assigned to A because of the lower communication cost. Total cost = 41 (computation cost = 17 on A and 14 on B; communication cost = 10). [Chow and Johnson 1997]
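The threshold heuristic can be sketched with a union-find merge. The edge weights below are hypothetical, chosen only so that C = 9 reproduces the slide's three clusters; the real weights are in the slide's figure.

```python
# Sketch: merge any two processes whose communication cost exceeds C.
def cluster(processes, comm, C):
    """processes: iterable of process ids; comm: {(a, b): cost}.
    Returns the clusters as a sorted list of sorted lists."""
    parent = {p: p for p in processes}
    def find(p):                       # union-find with path halving
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p
    for (a, b), cost in comm.items():
        if cost > C:                   # heavy communicators are merged
            parent[find(a)] = find(b)
    groups = {}
    for p in processes:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical weights; only (2,4), (1,6), (3,5) exceed C = 9:
comm = {(2, 4): 12, (1, 6): 10, (3, 5): 11, (1, 2): 5, (4, 5): 3}
print(cluster(range(1, 7), comm, 9))
# → [[1, 6], [2, 4], [3, 5]]
```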

17 Part II Current Literature Review

18 Optimizing Static Job Scheduling in a Network of Heterogeneous Computers --- Xueyan Tang & Samuel T. Chanson, ICPP, IEEE 2000 Summary: static job scheduling schemes for a network of computers with different speeds. Optimization techniques are proposed for both workload allocation and job dispatching. The proposed job-dispatching algorithm is an extension of the traditional round-robin scheme. [Tang & Chanson 2000]

19 Optimization for Workload Allocation
A fraction αi of all the jobs is sent to computer ci, where the fractions αi sum to 1; the optimized values of αi are derived analytically in the paper. [Tang & Chanson 2000]

20 Simple Weighted Workload Allocation
The amount of workload given to each computer is proportional to its processing speed, so all computers are equally utilized. This does not provide the best performance. [Tang & Chanson 2000]
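Simple weighted allocation amounts to normalizing the speeds; a one-line sketch (the speeds are made-up numbers):

```python
# Sketch: workload fraction proportional to each computer's speed.
def weighted_fractions(speeds):
    """Return each computer's workload fraction, proportional to speed."""
    total = sum(speeds)
    return [s / total for s in speeds]

print(weighted_fractions([1, 1, 2, 4]))
# → [0.125, 0.125, 0.25, 0.5]
```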

21 Dynamic Least-Load Scheduling
It is beneficial to allocate a disproportionately higher fraction of the workload to the more powerful computers. Dynamic least-load scheduling assigns each new job to the machine with the least normalized load. It is known that when jobs move from a slow machine to a fast machine, the slow machine's utilization decreases a lot while the fast machine's utilization does not increase by much. [Tang & Chanson 2000]
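The least-normalized-load selection rule can be written directly; the function and parameter names here are assumptions for illustration:

```python
# Sketch: pick the machine with the least normalized load (load / speed).
def least_load_pick(loads, speeds):
    """Return the index of the machine whose pending load divided by
    processing speed is smallest."""
    return min(range(len(loads)), key=lambda i: loads[i] / speeds[i])

# Machine 0 is fast (speed 4) but loaded (4); machine 1 is slow (speed 1)
# and lightly loaded (2): normalized loads are 1.0 vs 2.0, so pick 0.
print(least_load_pick([4, 2], [4, 1]))  # → 0
```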

22 Optimizing Technique for Job Dispatching
Random-based job dispatching: a newly arrived job is scheduled to run on a randomly selected computer. Round-robin-based job dispatching: the objective is to smooth the inter-arrival intervals of consecutive jobs on each computer. For example, suppose there are 4 computers c1, c2, c3, and c4 with workload fractions 1/8, 1/8, 1/4, and 1/2 respectively. Dispatching sequence: c4, c3, c4, c2, c4, c3, c4, c1, repeating. [Tang & Chanson 2000]
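One common way to realize such smoothing is a greedy credit scheme: each machine accumulates its fraction per arriving job, and the machine with the largest credit gets the job. This is an illustration of the idea, not necessarily the paper's exact extension of round-robin, and it may produce a different (but equally balanced) ordering than the slide's sequence.

```python
# Sketch: smoothed dispatching by greedy credit accumulation.
def dispatch_sequence(fractions, n_jobs):
    """fractions: per-machine workload fractions summing to 1.
    Returns the machine index chosen for each of n_jobs jobs."""
    credit = [0.0] * len(fractions)
    seq = []
    for _ in range(n_jobs):
        for i, f in enumerate(fractions):
            credit[i] += f              # every machine earns its share
        i = max(range(len(fractions)), key=lambda k: (credit[k], k))
        credit[i] -= 1.0                # chosen machine pays one job
        seq.append(i)
    return seq

# Fractions 1/8, 1/8, 1/4, 1/2 for computers c1..c4 (indices 0..3);
# over 8 jobs, c4 gets 4, c3 gets 2, c1 and c2 one each:
print(dispatch_sequence([0.125, 0.125, 0.25, 0.5], 8))
```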

23 Summary The key idea of the optimized workload allocation scheme is to send a disproportionately high fraction of the workload to the most powerful computers. An analytical model is developed to derive the optimized allocation strategy mathematically. For job dispatching, an algorithm that extends round-robin to the general case is presented. [Tang & Chanson 2000]

24 Design Optimization of Time- and Cost-Constrained Fault-Tolerant Distributed Embedded Systems --- V. Izosimov, P. Pop, P. Eles & Z. Peng, DATE, IEEE 2005 Synopsis: re-execution and replication are used for tolerating transient faults. Processes are statically scheduled, and communications are performed using the time-triggered protocol. [Izosimov et al. 2005]

25 System Architecture Each node has a CPU and a communication controller running independently; communication uses a time-triggered protocol. [Izosimov et al. 2005]

26 Fault-Tolerance Mechanisms
Re-execution Active Replication [Izosimov et al. 2005]

27 Summary Addresses the optimization of distributed embedded systems for fault tolerance, using two fault-tolerance mechanisms: re-execution (time redundancy) and active replication (space redundancy). [Izosimov et al. 2005]

28 White Box Performance Analysis Considering Static Non-Preemptive Software Scheduling --- A. Viehl, M. Pressler & Oliver Bringmann, DATE, IEEE 2009 Synopsis: a novel approach for the integration of cooperative and static non-preemptive scheduling into formal white-box analysis is presented. [Viehl et al. 2009]

29 Future Research Initiatives
Use AI techniques for Static Scheduling Genetic Algorithm Simulated Annealing

30 References:
Randy Chow & Theodore Johnson. "Distributed Operating Systems & Algorithms". Addison-Wesley, 1997.
Xueyan Tang & Samuel T. Chanson. "Optimizing Static Job Scheduling in a Network of Heterogeneous Computers". ICPP, IEEE, 2000.
Viacheslav Izosimov, Paul Pop, Petru Eles & Zebo Peng. "Design Optimization of Time- and Cost-Constrained Fault-Tolerant Distributed Embedded Systems". Design Automation and Test in Europe (DATE), IEEE, 2005.
Alexander Viehl, Michael Pressler & Oliver Bringmann. "White Box Performance Analysis Considering Static Non-Preemptive Software Scheduling". Design Automation and Test in Europe (DATE), IEEE, 2009.

31 Thank you!!

