1 Parallel Simulation etc. 601.46 Roger Curry Presentation on Load Balancing

2 Load Balancing
Goal is to ensure that simulation time advances at approximately the same rate across all LPs:
– An LP lagging behind slows down the simulation
– An LP too far in the future cannot do any useful work
Partitioning:
– Static schemes
– Dynamic schemes

3 Partitioning
A one-to-one mapping of simulation objects to LPs could result in unnecessary overhead for parallel simulation.
Solution: partition simulation objects into groups and assign each group to an LP.
Partitioning usually attempts to:
– Minimize load imbalances
– Minimize inter-processor communication
– Maximize lookahead

4 Static partitioning techniques
For SMP systems, inter-processor communication is not such an issue.
Finding the optimal partition is NP-hard; most techniques are heuristics that attempt to find good partitions.
Simulated Annealing (SA):
– Difficult to devise appropriate cost functions
– Can take a long time to find a good solution
Graph algorithms (max-flow, min-cut)
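As a rough illustration of the SA approach, the sketch below assigns simulation objects to processors with simulated annealing. The cost function (load imbalance plus cross-processor communication), the object weights, and the cooling schedule are all assumptions made for illustration, not the cost functions used in the work referenced here.

import math, random

def sa_partition(weights, n_parts, comm, iters=10000, t0=1.0, alpha=0.9995):
    """Assign objects 0..len(weights)-1 to n_parts processors.

    weights: estimated load per object (assumed known)
    comm:    dict {(i, j): traffic volume} between object pairs (assumed known)
    Cost = load imbalance + total cross-processor communication.
    """
    part = [random.randrange(n_parts) for _ in weights]

    def cost(p):
        loads = [0.0] * n_parts
        for i, w in enumerate(weights):
            loads[p[i]] += w
        imbalance = max(loads) - min(loads)
        cut = sum(c for (i, j), c in comm.items() if p[i] != p[j])
        return imbalance + cut

    cur, temp = cost(part), t0
    for _ in range(iters):
        i = random.randrange(len(weights))
        old = part[i]
        part[i] = random.randrange(n_parts)
        new = cost(part)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if new <= cur or random.random() < math.exp((cur - new) / max(temp, 1e-9)):
            cur = new
        else:
            part[i] = old
        temp *= alpha
    return part

The slow convergence mentioned above shows up directly in the number of iterations needed before the temperature is low enough for the partition to stabilize.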

5 Static partitioning packages
METIS and SCOTCH are two graph partitioning packages available.
These packages attempt to minimize inter-processor communication (edge weights) and evenly distribute the work load (node weights).
Use the inverse of lookahead for edge weights.
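A minimal sketch of how the LP graph could be prepared for such a partitioner: node weights carry the estimated load and edge weights use the inverse of lookahead, scaled to integers since these packages expect integer weights. The CSR layout (xadj, adjncy, vwgt, adjwgt), the scaling factor, and the final call shown in the comment are assumptions for illustration, not part of the presentation.

def build_partitioner_input(lps, links):
    """Build CSR arrays (xadj, adjncy, vwgt, adjwgt) for a graph partitioner.

    lps:   list of (lp_id, estimated_load) in index order (assumed available)
    links: dict {(i, j): lookahead}, each undirected channel listed once (assumption)
    """
    n = len(lps)
    neighbours = [[] for _ in range(n)]
    for (i, j), lookahead in links.items():
        w = int(round(1000.0 / max(lookahead, 1e-6)))  # inverse lookahead, integerised
        neighbours[i].append((j, w))
        neighbours[j].append((i, w))                   # partitioners take an undirected graph

    xadj, adjncy, adjwgt = [0], [], []
    for adj in neighbours:
        for j, w in adj:
            adjncy.append(j)
            adjwgt.append(w)
        xadj.append(len(adjncy))
    vwgt = [int(load) for _, load in lps]              # node weight = estimated work load
    return xadj, adjncy, vwgt, adjwgt

# Hypothetical usage with a METIS binding (not verified against any particular version):
# cuts, parts = pymetis.part_graph(4, xadj=xadj, adjncy=adjncy, vweights=vwgt, eweights=adjwgt)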

6 Static partitioning (cont.)
Workload requirements are usually unknown prior to simulation. Solutions:
– Pre-simulation
– Load estimation
IP-TN traffic load estimation:
for each traffic T, where T = 1,…,n and n is the number of traffics
    let R be the route used by traffic T
    for every hop h_i in R
        h_i.wg = h_i.wg + n * ( rate(T) / total_rate )
    endfor
endfor
where h_i.wg is the weight of hop h_i, rate(T) is the rate of traffic T, and total_rate is the sum of all traffic rates.
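A runnable restatement of the load-estimation rule above; the dictionary-based representation of routes, rates, and hop weights is an assumption made to keep the sketch self-contained.

def estimate_hop_weights(routes, rates):
    """Spread each traffic's relative rate over the hops on its route.

    routes: dict {traffic_id: [hop_id, ...]}
    rates:  dict {traffic_id: rate}
    Implements: h_i.wg += n * (rate(T) / total_rate) for every hop on T's route.
    """
    n = len(rates)
    total_rate = sum(rates.values())
    weights = {}
    for t, route in routes.items():
        for hop in route:
            weights[hop] = weights.get(hop, 0.0) + n * (rates[t] / total_rate)
    return weights

# Example with made-up traffics: two flows sharing router 'r1'
print(estimate_hop_weights({1: ['h1', 'r1', 'h2'], 2: ['h3', 'r1', 'h4']},
                           {1: 10.0, 2: 30.0}))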

7 Static partitioning (cont.)
SCOTCH and METIS do a pretty good job (Figure 5); unfortunately, they do not eliminate low-lookahead cycles.
Most conservative synchronization protocols perform poorly if there are low-lookahead cycles and few events (this limits parallelism).
Solution: a merging algorithm to eliminate these cycles, applied before or after partitioning (Figure 3).
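One way such a merging step could be realised, sketched under the assumption that "eliminating a low-lookahead cycle" means collapsing every strongly connected component of the low-lookahead channel subgraph into a single LP group; the threshold and the use of networkx are illustrative choices, not the algorithm of Figure 3.

import networkx as nx  # assumed helper library for the SCC computation

def merge_low_lookahead_cycles(channels, threshold):
    """Group LPs so that no cycle of channels with lookahead < threshold remains.

    channels: dict {(src, dst): lookahead}.
    Every strongly connected component of the low-lookahead subgraph becomes one
    group; LPs not on such a cycle remain in singleton groups.
    """
    g = nx.DiGraph()
    for (src, dst), lookahead in channels.items():
        g.add_node(src)
        g.add_node(dst)
        if lookahead < threshold:
            g.add_edge(src, dst)
    group = {}
    for gid, component in enumerate(nx.strongly_connected_components(g)):
        for lp in component:
            group[lp] = gid   # all LPs on a low-lookahead cycle share one group
    return group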

8 Dynamic partitioning (structures)
[Diagram: simulation objects (SO) are grouped into LPs, and LPs are grouped into Tasks]
– Central queue of LPs: CMB-SMP, Taskit (and others)
– Distributed queues of Tasks: CCTKit

9 Dynamic schemes
Centralized queue (CMB-SMP, Taskit):
– LPs (or Tasks) can migrate between processors via the central scheduling queue.
– Requires that the number of LPs (or Tasks) be (significantly) greater than the number of processors.
– Trap: if an LP has few events it can execute, the cost of accessing a (lockable) global queue makes this strategy quite expensive.
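A minimal sketch of the central-queue structure, assuming an LP object with execute_some_events() and has_pending_events() methods and using Python threads; this is an illustration of the idea, not the CMB-SMP or Taskit code.

import queue, threading

def run_central_queue(lps, n_workers):
    """Workers repeatedly pull an LP from one shared (locked) queue, execute a
    batch of its events, and put it back if it still has work to do."""
    ready = queue.Queue()          # the single, lockable global queue from the slide
    for lp in lps:
        ready.put(lp)

    def worker():
        while True:
            try:
                lp = ready.get(timeout=0.1)   # contention on this queue is the trap above
            except queue.Empty:
                return                         # simplification: stop once the queue looks empty
            lp.execute_some_events()           # assumed LP method
            if lp.has_pending_events():        # assumed LP method
                ready.put(lp)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads: t.start()
    for t in threads: t.join()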

10 Dynamic schemes
Distributed queues (CCTKit):
– Most distributed queue implementations do not allow LPs to migrate between processors; they are simply assigned during the static partitioning phase.
– There is definitely a possibility of moving LPs between different scheduling queues (Rob, has this been done yet?).
– The advantage of distributed queues, in terms of parallel simulation, is that they do not require a lock since each queue is only accessed by a single processor.
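By contrast, a per-processor queue needs no lock because only its owning worker ever touches it. The sketch below assumes the same hypothetical LP interface as the previous example and a fixed static partition; migration of LPs between queues, as questioned above, is deliberately not shown.

from collections import deque
import threading

def run_distributed_queues(partition, n_workers):
    """partition: list of LP lists, one per worker, fixed by the static partitioning phase.
    Each worker owns a plain deque, so no locking is needed on the scheduling path."""
    def worker(my_lps):
        local = deque(lp for lp in my_lps if lp.has_pending_events())  # assumed LP method
        while local:
            lp = local.popleft()
            lp.execute_some_events()                                    # assumed LP method
            if lp.has_pending_events():
                local.append(lp)

    threads = [threading.Thread(target=worker, args=(partition[i],))
               for i in range(n_workers)]
    for t in threads: t.start()
    for t in threads: t.join()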

11 Dynamic schemes
The synchronization protocol (or at least the scheduling algorithm) is tightly coupled with dynamic load balancing in general.
In CCTKit, scheduling only LPs (Tasks) that have work to do ensures that when an LP executes it will at least be able to advance its simulation time (not necessarily by a lot).
Naive synchronization protocols can end up scheduling LPs with nothing to do!

12 Future research directions?
Task migration between processors.
LP migration between Tasks.
Hierarchical scheduling / load balancing:
– Apply different synchronization techniques to different parts of the model.
– Try to extract relevant structure from the graph to determine a good partitioning.

13 Conclusions
A load balancing strategy needs to take into account both static and dynamic solutions.
Determining an optimal number of Tasks or LPs may not be that important if we can obtain consistently good performance with a varying number of LPs.

