A Dynamic World, what can Grids do for Multi-Core computing?
Daniel Goodman, Anne Trefethen and Douglas Creager

What we will cover
- Why we think cluster programming models are not always enough for multi-core computing
- Why we think Grid programming models are in many cases more appropriate
- A quick look at some programming models that have worked well in Grids and that we believe could be constructive in multi-core environments
- Where some of these ideas are reappearing in models for multi-core computing

Assumptions when programming clusters
- Nodes within an allocated set are all homogeneous, both in their configuration and in the loads placed on them
- Once nodes have been allocated to a process, they will not be used by any other user process until the first finishes

Assumptions when programming clusters (continued)
- Outside of very tightly coupled tasks on very large numbers of processors, the noise caused by other background tasks running on a node has a minimal effect on user processes
- Because all nodes run the same background tasks, large supercomputers are able to handle the problem of background tasks through centralised control of when such tasks execute

Models for Programming Clusters
- Message passing (MPI)
- Shared memory (OpenMP)
- Embarrassingly parallel batch jobs
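To make the shared-memory entry on this list concrete, here is a minimal OpenMP loop. This is our illustration, not from the slides; the array and variable names are ours. Note how the model reflects the cluster assumptions above: by default the iterations are typically divided evenly across a fixed team of threads, which works well only when the cores are homogeneous and dedicated.

```cpp
// Minimal OpenMP sketch of the shared-memory model listed above.
// Compile with e.g. g++ -fopenmp sum.cpp. All names are illustrative.
#include <cstdio>
#include <vector>

int main() {
    std::vector<double> a(1000000, 1.0);
    double sum = 0.0;

    // With the common default (static) schedule, iterations are split
    // evenly across the thread team, assuming equal, dedicated cores.
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)a.size(); ++i)
        sum += a[i];

    std::printf("sum = %f\n", sum);
}
```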

Properties of multi-core systems
- Cores will be shared dynamically with a wide range of other applications
- Load can no longer be considered homogeneous across the cores
- Cores will likely not be homogeneous, as accelerators become common in scientific hardware
- Source code will often be unavailable, preventing compilation against the specific hardware configuration

[Animation: multi-core processor with all nodes allocated to each task. Legend: Idle / Task A / Task B (single threaded).]

[Animation: multi-core processor where allocated nodes can change. Legend: Idle / Task A / Task B (single threaded).]

Map-Reduce
- Developed by Google to simplify the programming of analysis functions that execute in its heterogeneous distributed computing environment
- Constructed around ideas drawn from functional programming
- Has allowed the easy harnessing of huge amounts of computing power spanning many distributed resources
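A minimal single-process sketch of the pattern, using word count as the canonical example; all names here are ours. In Google's system the map, shuffle and reduce phases run across many machines, while here they are reduced to plain function applications over an in-memory map.

```cpp
// Single-process sketch of MapReduce (word count). Illustrative only.
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// map: document -> list of (word, 1) pairs
std::vector<std::pair<std::string, int>> map_fn(const std::string& doc) {
    std::vector<std::pair<std::string, int>> out;
    std::istringstream in(doc);
    std::string word;
    while (in >> word) out.emplace_back(word, 1);
    return out;
}

// reduce: all counts for one word -> total
int reduce_fn(const std::vector<int>& counts) {
    int total = 0;
    for (int c : counts) total += c;
    return total;
}

int main() {
    std::vector<std::string> docs = {"the cat sat", "the dog sat"};
    std::map<std::string, std::vector<int>> shuffled;     // shuffle phase
    for (const auto& d : docs)
        for (const auto& [w, c] : map_fn(d)) shuffled[w].push_back(c);
    for (const auto& [w, cs] : shuffled)
        std::cout << w << ": " << reduce_fn(cs) << "\n";  // e.g. "sat: 2"
}
```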

Boinc
- Developed by David Anderson at Berkeley
- An abstracted version of the framework behind the SETI@home project
- Designed to make the construction and management of trivially parallel tasks straightforward
- Used by a range of other projects, including climateprediction.net

Martlet
- Developed for the analysis of data produced by the climateprediction.net project
- Based on ideas from functional programming
- Able to dynamically adjust the workflow to adapt to changing numbers of resources and data distributions
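Martlet's own syntax is not shown on the slide; the C++ sketch below (all names ours) only illustrates the property in the last bullet: the shape of the computation is fixed at run time, once the number of data partitions is known, so the same abstract workflow adapts to however many pieces the data happens to be split into.

```cpp
// Illustrative only, not Martlet syntax: a fold whose width is
// determined at run time by the number of data partitions.
#include <iostream>
#include <numeric>
#include <vector>

double analyse(const std::vector<double>& part) {            // per-partition step
    return std::accumulate(part.begin(), part.end(), 0.0);
}

int main() {
    // Partition count discovered at run time (3 here; could be any N).
    std::vector<std::vector<double>> partitions = {{1, 2}, {3}, {4, 5, 6}};
    double total = 0.0;
    for (const auto& p : partitions) total += analyse(p);    // folds N results
    std::cout << total << "\n";                              // prints: 21
}
```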

Grid-GUM and GpH
- Grid-GUM is a platform supporting Glasgow parallel Haskell (GpH) in a Grid environment
- The programmer defines places where the program could potentially have multiple threads executing
- Each processor has its own thread and uses work stealing where possible to handle the dynamic and heterogeneous nature of tasks and resources
- Intelligent scheduling reduces communication between disparate resources, e.g. between machines or clusters
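GpH marks these potential-parallelism points with its `par` combinator. The C++ sketch below is ours, not Grid-GUM's API; it approximates the idea with std::async, whose default launch policy leaves the spawn decision to the runtime, plus a size cutoff of the kind GpH programs use to avoid sparking tiny tasks.

```cpp
// Loose C++ analogue of GpH's "potential parallelism" annotations.
#include <future>
#include <iostream>

long fib(int n) {
    if (n < 2) return n;
    if (n < 20) return fib(n - 1) + fib(n - 2);  // too small to spark
    // Potential parallelism point: with the default policy
    // (async | deferred) the runtime MAY run this on another thread.
    auto left = std::async(fib, n - 1);
    long right = fib(n - 2);
    return left.get() + right;
}

int main() { std::cout << fib(28) << "\n"; }     // prints: 317811
```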

Styx Grid Services
- Developed at Reading University to analyse huge amounts of environmental data
- Built on top of the Styx protocol, originally developed for the Plan 9 operating system
- Allows the effective construction of workflows by pipelining processes
- Reduces the amount of data active in the system at any one time, and improves the performance of many-stage analysis techniques
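This is not the Styx protocol itself; the sketch below (ours) only shows the pipelining idea in the last two bullets: each stage consumes a record as soon as the previous stage emits it, so the full dataset is never held in the system at once.

```cpp
// Illustrative pipeline: read one record, transform it, emit it.
#include <iostream>
#include <sstream>
#include <string>

int main() {
    std::istringstream source("3\n1\n4\n1\n5\n"); // stand-in for a remote stream
    std::string line;
    while (std::getline(source, line)) {          // stage 1: read one record
        int x = std::stoi(line) * 2;              // stage 2: transform it
        std::cout << x << "\n";                   // stage 3: emit downstream
    }   // at no point is the whole dataset held in memory
}
```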

Abstract Grid Workflow Language (AGWL)
- An XML-based workflow language developed to hide much of the system-dependent complexity
- AGWL never contains descriptions of data transfer, partitioning of data, or locations of hardware
- At runtime, the underlying system examines the available resources and compiles the workflow into the Concrete Grid Workflow Language, automatically adding that detail

Programming Multi-Core
Some ideas that appear in these projects are also appearing elsewhere. These include:
- Microsoft's LINQ constructs
- CodePlay's Sieve constructs
- Intel's Thread Building Blocks API
- Dynamic DAG generation

LINQ
- Based on the lambda calculus and now part of the .NET framework, LINQ is intended to provide a uniform way of accessing, and applying functions to, data stored in different data structures
- This allows both the easy construction of pipelines and the automatic construction of parallel pipelines
- This has much in common with Styx Grid Services
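LINQ itself is a C#/.NET feature; as a loose analogy in this deck's systems-language setting, a C++20 ranges pipeline shows the same idea of uniformly composing functions over data, leaving a runtime free to parallelise the stages. All names below are ours.

```cpp
// C++20 ranges analogy for LINQ-style composable query pipelines.
#include <iostream>
#include <ranges>
#include <vector>

int main() {
    std::vector<int> xs = {1, 2, 3, 4, 5, 6};
    auto pipeline = xs
        | std::views::filter([](int x) { return x % 2 == 0; })   // keep evens
        | std::views::transform([](int x) { return x * x; });    // square them
    for (int v : pipeline) std::cout << v << " ";                // prints: 4 16 36
    std::cout << "\n";
}
```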

Sieve
- Sieve is a set of language constructs and a supporting compiler that allow users to express a range of parallel programming patterns
- These patterns include marking points where the code can be split, with a pool of threads automatically managed to execute it, complete with work stealing
- This is the same pattern used by Grid-GUM and Glasgow parallel Haskell

Thread Building Blocks
- Intel's Thread Building Blocks is an API supporting a range of different parallel programming models
- These include divide-and-conquer methods and batch methods that produce tasks to be handled by a thread pool, allowing dynamic load balancing
- These are very similar to Boinc, Martlet and Map-Reduce
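A minimal TBB example of the task model described here (the array and its size are our own illustration): work is expressed as tasks over a range, and TBB's work-stealing scheduler balances the load dynamically across whatever cores are actually available.

```cpp
// Minimal Intel TBB sketch; link with -ltbb.
#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>
#include <cstdio>
#include <vector>

int main() {
    std::vector<double> a(1000000, 1.0);
    tbb::parallel_for(tbb::blocked_range<size_t>(0, a.size()),
        [&](const tbb::blocked_range<size_t>& r) {
            for (size_t i = r.begin(); i != r.end(); ++i)
                a[i] *= 2.0;   // chunks of this loop are stolen by idle threads
        });
    std::printf("a[0] = %f\n", a[0]);
}
```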

Dynamic Dependency Analysis
- Work carried out at a range of institutions, including the University of Tennessee and Oak Ridge National Laboratory
- Takes code written in a high-level language and dynamically converts it into a DAG of dependent tasks
- This can automatically generate thousands of tasks, which can be scheduled both to keep all cores busy all the time and to adapt to changing resources
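The systems referred to here derive the DAG automatically from the source code; the sketch below (ours) only shows the underlying structure, with a four-node graph written out by hand as futures: two independent tasks feed two dependent ones, and anything not yet blocked on a dependency is free to run in parallel.

```cpp
// Hand-built four-task DAG using futures. Illustrative only.
#include <future>
#include <iostream>

int main() {
    auto a = std::async([] { return 2; });          // independent task
    auto b = std::async([] { return 3; });          // independent task
    int av = a.get(), bv = b.get();                 // edges into c and d
    auto c = std::async([=] { return av + bv; });   // depends on a and b
    auto d = std::async([=] { return av * bv; });   // depends on a and b
    std::cout << c.get() << " " << d.get() << "\n"; // prints: 5 6
}
```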

Conclusions
- Multi-core machines will operate in a much more heterogeneous and dynamic environment than clusters do today
- Some parts of Grid computing have already been looking at the problems associated with such environments
- Some approaches to programming multi-core machines already include some of these ideas; functional programming appears a lot
- It is important that we remember why we must include such functionality in these models