Load Balancing Definition: A load is balanced if no processes are idle


Load Balancing
Definition: A load is balanced if no processes are idle
How?
- Partition the computation into units of work (tasks or jobs)
- Assign tasks to different processors
Load Balancing Categories
- Static (load assigned before the application runs)
- Dynamic (load assigned as the application runs)
  - Centralized (tasks assigned by the master or root process)
  - De-centralized (tasks reassigned among slaves)
- Semi-dynamic (application periodically suspended and load balanced)
Load Balancing Algorithms are:
- Adaptive if they adapt to system load levels using thresholds
- Stable if load balancing traffic is independent of load levels
- Symmetric if both senders and receivers initiate action
- Effective if load balancing overhead is minimal
Note: Load balancing is an NP-complete problem

Improving the Load Balance
By realigning processing work, we improve speed-up.

Static Load Balancing
Done prior to executing the parallel application
- Round Robin: tasks are assigned sequentially to processors; if tasks > processors, the allocation wraps around
- Randomized: tasks are assigned randomly to processors
- Partitioning: tasks are represented by a graph, partitioned by heuristics such as
  - Recursive Bisection
  - Simulated Annealing
  - Genetic Algorithms
  - Multi-level Contraction and Refinement
Advantages
- Simple to implement
- Minimal run time overhead
Disadvantages
- Execution times often cannot be predicted
- The effect of communication dynamics is often ignored
- The number of iterations processors need to converge on a solution is often indeterminate
Note: The Random algorithm is a popular benchmark for comparison
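The two simplest static schemes amount to a few lines of arithmetic. A minimal sketch (the function names are our own, not from the slides):

```python
import random

def round_robin(num_tasks, num_procs):
    # Task i goes to processor i mod P; the allocation wraps
    # around when there are more tasks than processors.
    return [i % num_procs for i in range(num_tasks)]

def randomized(num_tasks, num_procs, seed=None):
    # Each task goes to a uniformly random processor.
    rng = random.Random(seed)
    return [rng.randrange(num_procs) for _ in range(num_tasks)]

print(round_robin(7, 3))  # [0, 1, 2, 0, 1, 2, 0]
```

Both schemes ignore per-task cost and communication, which is exactly the disadvantage listed above.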

A Load Balancing Partitioning Graph
- Nodes represent tasks
- Edges represent communication cost
- Node values represent processing cost
- A second node value could represent reassignment cost

Dynamic Load Balancing
Done as a parallel application executes
Centralized
- A single process hands out tasks
- Processes ask for more work when their processing completes
- Double buffering (asking for more work while still working) can be effective
Decentralized
- Processes detect that their work load is low
- Processes sense an overload condition
  - When new tasks are spawned during execution
  - When a sudden increase in task load occurs
Questions
- Which neighbors should participate in the rebalancing?
- How should the adaptive thresholds be set?
- What communications are needed to balance?
- How often should balancing occur?

Centralized Load Balancing
Work Pool, Processor Farm, or Replicated Worker Algorithm
Master Processor: maintains the work pool (queue, heap, etc.)
  WHILE ((task = Remove()) != null)
    Receive(pi, request_msg)
    Send(pi, task)
  WHILE (more worker processes)
    Receive(pi, request_msg)
    Send(pi, termination_msg)
Slave Processor: performs a task and then asks for another
  Send(pmaster, request_msg)
  task = Receive(pmaster, message)
  WHILE (task != terminate)
    Process task
    Send(pmaster, request_msg)
    task = Receive(pmaster, message)
In this case, the slaves do not spawn new tasks. How would the pseudo code change if they did?
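The work-pool idea can be sketched in shared memory, with Python threads and a queue standing in for the message passing above; squaring a number stands in for real task processing, and a drained pool plays the role of the termination message. An illustrative sketch, not the slides' message-passing code:

```python
import queue
import threading

def work_pool(tasks, num_workers):
    # "Master" state: a shared pool of tasks and a queue of results.
    pool = queue.Queue()
    for t in tasks:
        pool.put(t)
    results = queue.Queue()

    def worker():
        while True:
            try:
                task = pool.get_nowait()   # ask the pool for another task
            except queue.Empty:
                return                     # empty pool == termination message
            results.put(task * task)       # "process" the task

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return sorted(results.queue)

print(work_pool(range(5), num_workers=3))  # [0, 1, 4, 9, 16]
```

Because every worker pulls from one shared pool, faster workers automatically take more tasks, which is the load-balancing effect the work-pool scheme is after.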

Decentralized Load Balancing
(Worker processes interact among themselves)
- There is no master processor
- Each processor maintains a work queue
- Processors interact with neighbors to request and distribute tasks

Decentralized Mechanisms
Balancing is among a subset of the total running processes
Receiver Initiated
- A process requests tasks when it is about to go idle
- Effective when the load is heavy
- Unstable when the load is light (a request frequency threshold is necessary)
Sender Initiated
- A process with a heavy load distributes the excess
- Can cause thrashing when loads are heavy (synchronizing system load with neighbors is necessary)

Process Selection
Global or local?
- Global involves all of the processors of the network
  - May require expensive global synchronization
  - May be difficult if the load dynamic is rapidly changing
- Local involves only neighbor processes
  - The overall load may not be balanced
  - Easier to manage and less overhead than the global approach
Neighbor selection algorithms
- Random: randomly choose another process
  - Easy to implement, and studies show reasonable results
- Round Robin: select among neighbors using modular arithmetic
  - Easy to implement; results similar to random selection
- Adaptive Contracting: issue bids to neighbors; the best bid wins
  - A handshake between neighbors is needed
  - It is possible to synchronize loads
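The random and round-robin selectors are nearly one-liners. A sketch with hypothetical helper names:

```python
import random

def make_round_robin_selector(neighbors):
    # Cycle through the neighbor list using modular arithmetic.
    state = {"i": 0}
    def select():
        n = neighbors[state["i"] % len(neighbors)]
        state["i"] += 1
        return n
    return select

def random_neighbor(neighbors, rng=random):
    # Randomly choose another process.
    return rng.choice(neighbors)

nbrs = [1, 3, 5, 7]
select = make_round_robin_selector(nbrs)
print([select() for _ in range(6)])  # [1, 3, 5, 7, 1, 3]
```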

Choosing Thresholds
How do we estimate system load?
- Processes synchronize to average their task queue lengths
- Use the average number of tasks or the projected execution time
When is the load low?
- When a process is about to go idle
- Goal: prevent idleness, not achieve perfect balance
- A low threshold constant is sufficient
When is the load high?
- When some processes have many tasks and others are idle
- Goal: prevent thrashing
- Synchronization among processors is necessary
- An exponentially growing threshold works well
What is the job request frequency?
- Goal: minimize load balancing overhead

Gradient Algorithm
Maintains a global pressure grid
Node data structures (for each neighbor)
- Distance, in hops, to the nearest lightly-loaded process
- A load status flag indicating whether the current processor is lightly-loaded or normal
Routing
- Spawned jobs go to the nearest lightly-loaded process
Local synchronization
- Node status changes are multicast to the node's neighbors

Symmetric Broadcast Networks (SBN)
Characteristics
- A unique SBN starts at each node
- Each SBN is lg P deep
- Simple operations algebraically compute successors
- Easily adapts to the hypercube
Algorithm (with global synchronization)
- Starts with a lightly loaded process
- Phase 1: SBN broadcast
- Phase 2: Gather task queue lengths
- Load is balanced during the broadcast and gather phases
Successor computation (example with P = 8, stages s = 3 down to 0):
- successor1 = (p + 2^(s-1)) % P, for 1 ≤ s ≤ 3
- successor2 = p - 2^(s-1), for 1 ≤ s < 3
- Note: if successor2 < 0, then successor2 += P
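The successor formulas can be checked directly. A sketch following the formulas above for P = 8 (so lg P = 3), reading 2^(s-1) as a power of two:

```python
import math

def sbn_successors(p, s, P=8):
    # successor1 = (p + 2**(s-1)) mod P,           for 1 <= s <= lg P
    # successor2 = p - 2**(s-1)  (+ P if negative), for 1 <= s <  lg P
    succ = [(p + 2 ** (s - 1)) % P]
    if s < int(math.log2(P)):
        s2 = p - 2 ** (s - 1)
        succ.append(s2 + P if s2 < 0 else s2)
    return succ

print(sbn_successors(0, 3))  # [4]     -- one successor at the top stage
print(sbn_successors(0, 1))  # [1, 7]
```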

Line Balancing Algorithm
Uses a pipeline approach; master or slave processors adjust the pipeline
Slave processors
- Request and receive tasks if their queue is not full
- Pass tasks on if a task request is posted
- Non-blocking receives are necessary to implement this algorithm
Pipeline stage pi
- Requests a task if its queue is not full, and receives the requested task
- Dequeues and processes a task
- Delivers a task to pi+1 when pi+1 requests one
Note: This algorithm easily extends to a tree topology

Semi-dynamic
Pseudo code
  Partition the load
  Run the algorithm
  WHEN it is time to check the balance
    Suspend the application
    IF the load is balanced, resume the application
    ELSE
      Re-partition the load
      Distribute data structures among processors
      Resume execution
Partitioning
- Model application execution with a partitioning graph
- Partitioning is an NP-complete problem
- Goals: balance processing while minimizing communication and relocation costs
- Partitioning heuristics: Recursive Bisection, Simulated Annealing, Multi-level, MinEx

Partitioning Graph
(Graph figure omitted: tasks with processing costs, connected by edges with communication costs c1..c8, split between partitions P1 and P2.)
Example: P1 Load = (9+4+7+2) + (4+3+1+7) = 37, the processing costs of P1's tasks plus the communication costs of the edges the partition cuts
Question: When can we move a task to improve the load balance?
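That load computation can be expressed directly: the cost of a partition is the processing cost of its tasks plus the communication cost of its cut edges. A sketch on a hypothetical graph (these node and edge values are invented for illustration, not taken from the figure):

```python
# Hypothetical task graph: processing cost per task, communication cost per edge.
proc_cost = {"a": 9, "b": 4, "c": 7, "d": 2, "e": 5, "f": 6}
comm_cost = {("a", "e"): 4, ("b", "e"): 3, ("c", "f"): 1,
             ("d", "f"): 7, ("a", "b"): 2}

def partition_load(part, proc_cost, comm_cost):
    # Processing cost of the partition's tasks plus the cost of cut edges
    # (edges with exactly one endpoint inside the partition).
    processing = sum(proc_cost[n] for n in part)
    cut = sum(w for (u, v), w in comm_cost.items() if (u in part) != (v in part))
    return processing + cut

p1 = {"a", "b", "c", "d"}
print(partition_load(p1, proc_cost, comm_cost))  # (9+4+7+2) + (4+3+1+7) = 37
```

Moving a task helps when the reduction in cut-edge cost outweighs the processing imbalance (and any relocation cost) it introduces.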

Distributed Termination
Insufficient condition for distributed termination
- Empty task queues at every process
Sufficient conditions for distributed termination
- All local termination conditions satisfied
- No messages in transit that could restart an inactive process
Termination algorithms
- Acknowledgment
- Ring
- Tree
- Fixed energy distribution

Acknowledgement Termination
When a process receives a task
- It immediately acknowledges if the source is not its parent
- It acknowledges its parent as it goes idle
A process goes idle after it
- Completes processing its local tasks
- Sends all acknowledgments
- Receives all pending acknowledgments
Notes
- The process whose initial task activates another process becomes that process's parent
- A process always goes inactive before its parent
- When the master goes inactive, termination occurs

Single Pass Ring Termination
Pseudo code
  P0 sends a token to P1 when it goes idle
  WHEN Pi receives the token
    IF Pi is idle, it passes the token to Pi+1
    ELSE Pi sends the token to Pi+1 when it goes idle
  WHEN P0 receives the token
    Broadcast the final termination message
Assumptions
- Processes cannot reactivate after going idle
- Processes cannot pass new tasks to an idle process

Dual Pass Ring Termination
Handles tasks sent to a process that has already passed the token on
Key point: processors pass tokens (black or white) only when they are idle
Pseudo code (only idle processors send tokens)
  WHEN P0 goes idle and has the token, it sends a white token to P1
  IF Pi sends a task to Pj where j < i
    Pi becomes a black process
  WHEN Pi (i > 0) receives the token and goes idle
    IF Pi is a black process, it colors the token black and becomes white
    Pi sends the token to P(i+1)%P
  WHEN P0 receives the token and is idle
    IF the token is white, the application terminates
    ELSE P0 sends a white token to P1
Colors
- Process: white = ready for termination; black = has sent a task to a lower-numbered process
- Token: white = ready for termination; black = a reactivating communication is possible
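One circuit of the token can be simulated to show how process colors defer termination. This is an illustrative model only (list indices stand for P0..Pn-1, and all processes are assumed already idle), not the full message-passing protocol:

```python
def circulate_token(colors):
    # colors[i] is "black" if Pi sent a task to a lower-numbered process
    # since the token last passed; all processes are assumed idle.
    token = "white"
    for i in range(1, len(colors)):
        if colors[i] == "black":
            token = "black"       # the token records a possible reactivation
            colors[i] = "white"   # the process becomes white after blackening it
    return token  # "white" back at P0 means the application may terminate

colors = ["white", "white", "white", "black"]
print(circulate_token(colors))  # black -- P3 had sent work backwards
print(circulate_token(colors))  # white -- safe to terminate on this pass
```

The second circuit is why the algorithm is "dual pass": a black token forces at least one more white-token circuit before P0 can declare termination.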

Tree Termination
- When a leaf process terminates, it sends a token to its parent process
- Internal nodes send tokens to their parent when all of their child processes terminate
- When the root node receives the token, the application can terminate

Fixed Energy Termination
Energy is defined by an integer or long value
- P0 starts with full energy
- When Pi receives a task, it also receives an energy allocation
- When Pi spawns tasks, it assigns them energy allocations taken from within its own allocation
- When a process completes, it returns its energy allotment
- The application terminates when the master becomes idle
Implementation
- Problem: integer division eventually becomes zero
- Solution: use a two-level energy allocation <generation, energy>
  - The generation increases each time the energy value goes to zero
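The invariant behind the scheme is that outstanding energy always sums to the root's allotment. The slides' fix for integer division reaching zero is the two-level <generation, energy> counter; as an alternative illustration of the same invariant, the sketch below uses exact fractions so that no split ever loses energy:

```python
from fractions import Fraction

def spawn(energy, n):
    # Split a task's energy allotment among n spawned tasks, losslessly.
    return [Fraction(energy) / n] * n

root = Fraction(1)                 # P0 starts with full energy
children = spawn(root, 3)          # three tasks spawned by P0
leaves = spawn(children[0], 4) + children[1:]   # the first child spawns four more
# Invariant: outstanding energy always sums to the root allotment,
# so termination is detected exactly when all energy returns to P0.
print(sum(leaves) == root)  # True
```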

Fair Scheduling in Web Servers
CS 213 Lecture 17
L.N. Bhuyan

Objective
- Create an arbitrary number of service quality classes and assign a priority weight to each class
- Provide service differentiation for different user classes in terms of the allocation of CPU and disk I/O capacities