Work Stealing for Irregular Parallel Applications on Computational Grids Vladimir Janjic University of St Andrews 12th December 2011.



November 21, 2006

In this talk
- The Feudal Stealing algorithm for scheduling irregular parallel applications
- A combination of Grid-GUM and Cluster-aware Random Stealing
- Irregular parallel applications: task trees are highly unbalanced

What is work stealing?
- Work stealing is a passive, distributed, dynamic scheduling method
- Idle "thieves" steal work from busy "victims"
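The thief/victim idea can be sketched in a few lines. This is a minimal, sequentially simulated model (the class and function names are illustrative, not from any of the systems discussed): each worker owns a double-ended queue, works from one end, and when idle steals from the other end of a randomly chosen victim's queue.

```python
import random
from collections import deque

class Worker:
    """One processing element with its own double-ended task queue."""
    def __init__(self, wid):
        self.wid = wid
        self.tasks = deque()

    def pop_own(self):
        # the owner works from the tail of its deque
        return self.tasks.pop() if self.tasks else None

    def steal_from(self):
        # thieves take from the head, away from the owner's end
        return self.tasks.popleft() if self.tasks else None

def run(workers, rng=None):
    """Drive all workers round-robin until every queue is drained;
    idle workers become thieves and steal from random victims."""
    rng = rng or random.Random(0)
    log = []
    while any(w.tasks for w in workers):
        for w in workers:
            task = w.pop_own()
            if task is None:                  # idle: try to steal
                task = rng.choice(workers).steal_from()
            if task is not None:
                log.append((w.wid, task))     # "execute" the task
    return log
```

Note the passive flavour: busy workers never push work anywhere; distribution happens only when an idle worker asks for it.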

To steal or not to steal?
- Why stealing?
  - Dynamic, adaptive and "cheap"
  - Does not require prior knowledge of task dependencies => good for irregular applications
  - Inherently distributed => scalability
- Why not stealing?
  - Not optimal
  - Possibly slow work distribution

Work stealing on Computational Grids
- Existing systems: Grid-GUM (GpH), Satin (Java divide-and-conquer), Javelin (Java), Atlas (Java)
- The main problem: steal attempts can be expensive due to high latencies
  - Especially for irregular applications, where all the work may be concentrated on a few nodes
- The main questions: where to send steal attempts, and how to respond to them
  - One answer: use load information (Grid-GUM)

Cluster-aware Random Stealing (CRS)
- Local (within a cluster) and remote (outside the cluster) stealing are done in parallel
- Works well for regular applications on heterogeneous environments with plenty of parallelism
- Works less well for irregular applications
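The key CRS trick is overlapping the two kinds of steal attempt. A sketch of one stealing step, under the assumption of a simple message-passing model (all names here are illustrative, not the actual Satin or Grid-GUM API): at most one asynchronous wide-area steal is kept in flight, and while its reply is outstanding the thief keeps making cheap synchronous local attempts.

```python
import random
from collections import deque

class Node:
    """A compute node; the request list stands in for real network I/O."""
    def __init__(self, nid, tasks=()):
        self.nid = nid
        self.tasks = deque(tasks)
        self.remote_steal_pending = False
        self.sent_remote_requests = []

    def try_steal(self):
        return self.tasks.popleft() if self.tasks else None

    def send_async_steal(self, victim):
        # non-blocking wide-area request; the reply arrives later
        self.sent_remote_requests.append(victim.nid)

def crs_step(thief, local_peers, remote_peers, rng):
    """One stealing step of Cluster-aware Random Stealing (sketch):
    fire off an asynchronous remote steal if none is pending, then
    make a low-latency synchronous local attempt while waiting."""
    if not thief.remote_steal_pending and remote_peers:
        thief.send_async_steal(rng.choice(remote_peers))
        thief.remote_steal_pending = True
    if local_peers:
        return rng.choice(local_peers).try_steal()
    return None
```

Because the remote request costs only one message and the thief never blocks on it, high wide-area latency is hidden behind local stealing, which is why CRS does well when every cluster has enough local work.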

Centralised and distributed work stealing

Feudal Stealing
- Uses the CRS algorithm as a base
- Local stealing is done using Random Stealing
- Remote stealing is done via cluster head nodes
  - Only head nodes (and the eventual victim) are visited
  - Head nodes hold load information: a local table (load per PE) and a remote table (load per cluster, with timestamps)
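A head node's role can be sketched as follows (a minimal model whose names are illustrative, not taken from the Grid-GUM implementation): it keeps an approximate per-PE load table for its own cluster and routes an incoming remote-steal request to the PE it currently believes is busiest, so a remote thief visits only the head node and the chosen victim.

```python
class HeadNode:
    """Sketch of a Feudal Stealing cluster head node."""
    def __init__(self, cluster_id):
        self.cluster_id = cluster_id
        self.local_load = {}            # PE id -> last reported load

    def report_load(self, pe, load):
        # PEs inside the cluster report their load periodically
        self.local_load[pe] = load

    def pick_victim(self):
        """Choose a victim for a remote thief, or None if the whole
        cluster currently looks idle."""
        if not self.local_load:
            return None
        pe = max(self.local_load, key=self.local_load.get)
        return pe if self.local_load[pe] > 0 else None
```

The load table is only approximate (reports are periodic), so the head may occasionally route a thief to a PE that has just gone idle; the gain is that a remote steal attempt costs two hops instead of a blind wide-area random walk.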

Feudal Stealing

How is load information in head nodes obtained?
- Loads of nodes inside the cluster are sent periodically
- Loads of remote clusters are updated from remote-steal messages
  - Cluster load information is attached to remote-steal messages (similar to Grid-GUM)
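The piggybacking scheme can be sketched like this (an illustrative model in the spirit of Grid-GUM's load propagation; the class and method names are assumptions): each outgoing remote-steal message carries the sender's current view of cluster loads, and a receiver merges that summary into its own view, keeping only the freshest entry per cluster.

```python
class RemoteLoadView:
    """A head node's timestamped view of remote-cluster loads,
    refreshed from summaries piggybacked on remote-steal messages."""
    def __init__(self):
        self.loads = {}   # cluster id -> (timestamp, load)

    def merge(self, summary):
        """Fold a received cluster-load summary into our view."""
        for cluster, (ts, load) in summary.items():
            old = self.loads.get(cluster)
            if old is None or ts > old[0]:   # keep only fresher entries
                self.loads[cluster] = (ts, load)

    def summary_for_message(self, own_cluster, own_load, now):
        """Build the summary to attach to an outgoing remote-steal
        message: our remote view plus our own cluster's current load."""
        summary = dict(self.loads)
        summary[own_cluster] = (now, own_load)
        return summary
```

The appeal of piggybacking is that load information spreads at no extra message cost: it travels only on steal traffic that would be sent anyway.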

Evaluation of Feudal Work Stealing
- Evaluated using simulations, on a generic benchmark for load-balancing algorithms (UTS: Unbalanced Tree Search)
- For regular and less-irregular applications, it performs as well as CRS and better than Grid-GUM
- For highly-irregular applications, it performs better than both CRS and Grid-GUM
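To see why UTS-style workloads stress a scheduler, here is a tiny synthetic tree in the same spirit (not the official UTS benchmark; the function and its parameters are illustrative): each node's child count is drawn from a per-node seeded RNG, so sibling subtrees can differ wildly in size, which is exactly the irregularity that concentrates work on a few nodes.

```python
import random

def subtree_size(seed, depth=0, max_depth=8):
    """Count the nodes of a synthetic unbalanced tree.  The number of
    children at each node is a deterministic function of its seed, so
    the same tree can be re-explored by any number of workers."""
    if depth >= max_depth:
        return 1
    rng = random.Random(seed)
    children = rng.choice([0, 0, 1, 3])   # mean branching factor of 1
    total = 1
    for i in range(children):
        # derive a distinct, reproducible seed for each child
        total += subtree_size(seed * 31 + i + 1, depth + 1, max_depth)
    return total
```

A static partition of such a tree is hopeless, since subtree sizes are unknowable in advance; dynamic schemes like work stealing have to discover and redistribute the work at run time.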

Comparison of Feudal Stealing, CRS and Grid-GUM

Improvements of Feudal Stealing over CRS

Conclusions
- Feudal Stealing works well for irregular parallel applications on Computational Grids
- It sacrifices some desirable features of "pure" work stealing in order to make a better selection of remote steal targets
- Tested only using simulations; an implementation in Grid-GUM is under way
- Tested only on artificial applications (Unbalanced Tree Search)

More info
- Vladimir Janjic. Load Balancing of Irregular Parallel Applications on Heterogeneous Distributed Computing Environments. PhD Thesis, University of St Andrews, 2011.
- Vladimir Janjic, Kevin Hammond. Think Locally, Steal Globally: Using Dynamic Load Information in Work-Stealing on Computational Grids. Submitted to CCGrid 2012.
- Vladimir Janjic, Kevin Hammond. Feudal Work-Stealing. In preparation, planned for submission to EuroPar 2012.