SBAC 2005, Rio de Janeiro, October 2005
Portable Checkpointing for BSP Applications on Grid Environments
Raphael Y. de Camargo, Fabio Kon, Alfredo Goldman


Department of Computer Science, IME / USP

INTRODUCTION
● Computational Grids: ubiquitous access to and coordinated usage of distributed resources
● Opportunistic Grids: usage of the idle time of non-dedicated resources (desktop PCs)
– Resources are heterogeneous (Mac, Windows, Linux)
– Failure rate is higher than that of dedicated resources
– Failures occur on a daily basis

INTEGRADE
● Grid middleware: harnesses idle computing power from personal computers
● Federation of clusters
– Each cluster is a collection of resource-providing nodes
● Supports sequential, parameter-sweeping, and BSP applications

MOTIVATION
● Fault tolerance is essential, especially when running parallel applications
– The failure of a single node requires restarting the application from the beginning
– Checkpointing can be used as a fault-tolerance mechanism
● Mechanisms supporting heterogeneity improve resource utilization
– A portable checkpointing mechanism allows reinitialization on machines of a different architecture

OUR APPROACH
● Source code instrumentation
– Performs additional tasks: logging, profiling, persistence
● BSP applications on heterogeneous nodes
● Portable checkpointing of applications
● Pre-compiler based on OpenC++
– An open-source tool for compile-time reflection

BSP MODEL
● Bridging model
– Links architecture to software
● Execution is performed in supersteps
– Computation and synchronization phases
● Two communication mechanisms:
– Direct Remote Memory Access (DRMA)
– Bulk Synchronous Message Passing (BSMP)
● Existing implementations:
– Oxford BSPLib, PUB, BSP-G
– They work only on homogeneous clusters

HETEROGENEOUS NODES
● Extended BSPLib API
– Some methods receive an extra parameter describing data-type information, used to convert data
– Pointer data types are defined by their declaration
– Arbitrary data casts are not allowed
● A reasonable requirement for portability
● The pre-compiler automatically modifies the application source code to use the extended API
– No need for manual modifications

CHECKPOINTING APPROACHES
● System-level checkpointing
– Data is copied to checkpoints directly from the application address space
● Application-level checkpointing
– Instruments the application source code to save its state
– Semantic information about data types is available
● Allows the generation of portable checkpoints
● Drawbacks
– Need to modify the application source code
– Checkpoints can only be taken at certain points in the application

CHECKPOINTING LIBRARY
● Pre-compiler instruments the application source code
– No manual instrumentation of source code
– Access to the source code is necessary
● Checkpointing library
– Timer with a minimum checkpoint interval
– Saving is performed by a separate thread
– Checkpoints can be stored in a filesystem (NFS) or a remote checkpoint repository (TCP/IP)
● Execution Manager
– Coordinates checkpointing of BSP parallel applications

SAVING EXECUTION DATA
● Execution stack
– Information from active function calls: local variables, function parameters, return address, and control information
– Dependent on architecture and OS
● Heap area
– Memory chunks allocated by the application
(Figure: execution stack with stacked frames, each holding control information, local variables, and function parameters)
● Necessary to save:
– Execution stack + global variables
– Data in the heap area
– Other information

EXECUTION STACK
● Save only the data necessary for reconstruction
– The list of function calls
– The values of parameters and local variables
● Data is added to an auxiliary stack during execution
● Recovery
– Data is read from the checkpoint
– Functions are called
– Local variables and parameter values are assigned
– Data conversion is performed if necessary

POINTERS AND HEAP MEMORY
● Memory addresses
– Specific to one execution
– Architecture dependent
● Checkpoint generation
– Data from the heap area is copied to the checkpoint
– Memory addresses → offsets in the checkpoint
● Recovery
– Memory areas are allocated
– Data is copied to these memory areas

EXPERIMENTS
● Parallel BSP applications
– Similarity between large sequences of characters
– Matrix multiplication
● Testbed: machines in two labs
– 11 AthlonXP 1700+, 512MB
– 1 PowerPC G4, 512MB
– 2 Athlon, 512MB
– 100Mbps Ethernet on 2 connected LANs

CHECKPOINTING OVERHEAD
● Simulation parameters
– Matrix multiplication application using 9 nodes
– Matrix sizes: 450x450 and 1800x1800
– Checkpoint sizes: 2.3MB and 37.1MB
– Checkpointing intervals: 10, 30, and 60s

CHECKPOINTING OVERHEAD
● Storage on the local machine or a remote repository is faster than with NFS
● When using a remote repository, the overhead was consistently below 10%, even with a 10s interval

DYNAMIC GRID SIMULATION
● We simulated a dynamic environment where machines can fail unexpectedly
● Sequence similarity application using 10 nodes
– Machines fail according to an exponential distribution
● MTBF (1/λ) = 600s and 1800s
● Smaller checkpointing intervals → smaller execution times

ckp interval   t total (1/λ = 600s)   t total (1/λ = 1800s)
10s            517.4s                 490.3s
30s            571.4s                 519.3s
60s            699.4s                 534.5s

HETEROGENEOUS NODES
● Matrix multiplication on 4 heterogeneous nodes
– 3 AthlonXP (x86) + 1 PowerPC G4 (ppc)
– Elements of type long double
● Time spent on data conversion is small compared to the total execution time

Matrix size   t exec   t x86    t ppc
500x500       -        0.042s   0.217s
1000x1000     -        0.066s   0.348s
2000x2000     -        0.078s   0.430s

RESTART AN APPLICATION
● Time to recover from a checkpoint saved on different architectures
● Application that generates a graph of structures containing 20K nodes
● When recovering on an x86 machine:
– From x86: 0.179s
– From x86-64: 0.186s → 3.9% slower than x86
– From PPC: 0.192s → 7.2% slower than x86
● The overhead occurs when reading the checkpoint data

CONCLUSIONS
● The overhead of portability is small and can lead to better resource utilization
● It is possible to execute BSP applications on heterogeneous nodes
● Ongoing work
– Distributed checkpoint repository (scalability and fault tolerance)
– Simulations in large-scale and wide-area Grids
– Support for multithreaded C++ applications

QUESTIONS
For more information, please visit the project page: