Supporting Efficient Execution in Heterogeneous Distributed Computing Environments with Cactus and Globus. Gabrielle Allen, Thomas Dramlitsch, Ian Foster, Nick Karonis, Matei Ripeanu, Ed Seidel, Brian Toonen. Proceedings of Supercomputing 2001 (winning paper, Gordon Bell Prize, Special Category). Presenter: Imran Patel.

Outline Introduction Computational Grids Grid-enabled Cactus Toolkit Experimental Results Ghostzones and Compression Adaptive Strategies Conclusion

Introduction Widespread use of numerical simulation techniques has led to high demand for traditional high-performance computing resources (supercomputers). At the same time, low-end computers are becoming increasingly powerful and are connected by high-speed networks. "Computational Grids" aim to tie these scattered resources into an integrated infrastructure. Grid applications include large-scale simulations that need to aggregate many such resources for increased throughput.

Introduction: The Problem Heterogeneous and dynamically behaving resources make the development of grid-enabled applications extremely difficult. One approach is to develop computational frameworks that hide this complexity from the programmer. Cactus-G: a simulation framework that uses grid-aware components and a grid-enabled message-passing library (MPICH-G2).

Computational Grids Computational Grids differ from other parallel computing environments: a grid may have nodes with different processor speeds, memory sizes, etc.; grids may have widely varying network interconnects and topologies; resource availability varies over time; and nodes in a grid may have different software configurations.

Computational Grids – Programming Techniques To overcome these problems, some generic techniques have been devised: Irregular data distributions: use application/network/node information. Grid-aware communication schedules: overlapping/grouping, dedicated communication nodes (a sketch of the overlap technique follows below). Redundant computation: trade extra computation for reduced communication. Protocol tuning: TCP tweaks, compression.
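As an illustration of the overlap technique only (not code from the paper or from Cactus), the following C/MPI sketch posts non-blocking ghost-zone exchanges, updates the interior points that need no remote data, and only then waits for the messages before updating the boundary. The helper functions and buffer names are hypothetical.

/* Sketch: overlapping a ghost-zone exchange with interior computation,
 * assuming a 1-D decomposition with lower/upper neighbours.  The helper
 * functions and buffer layout are hypothetical, not Cactus code. */
#include <mpi.h>

void update_interior(double *field);                       /* hypothetical */
void update_boundary(double *field, const double *lo,
                     const double *hi);                     /* hypothetical */

void exchange_and_update(double *field,
                         double *send_lo, double *send_hi,
                         double *recv_lo, double *recv_hi,
                         int n_ghost_values,
                         int lo_rank, int hi_rank, MPI_Comm comm)
{
    MPI_Request req[4];

    /* Post all receives and sends first ... */
    MPI_Irecv(recv_lo, n_ghost_values, MPI_DOUBLE, lo_rank, 0, comm, &req[0]);
    MPI_Irecv(recv_hi, n_ghost_values, MPI_DOUBLE, hi_rank, 1, comm, &req[1]);
    MPI_Isend(send_lo, n_ghost_values, MPI_DOUBLE, lo_rank, 1, comm, &req[2]);
    MPI_Isend(send_hi, n_ghost_values, MPI_DOUBLE, hi_rank, 0, comm, &req[3]);

    /* ... then do the work that needs no remote data ... */
    update_interior(field);

    /* ... and only wait for the network when the boundary is next. */
    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
    update_boundary(field, recv_lo, recv_hi);
}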

Cactus-G Cactus is a modular, parallel simulation environment used by scientists and engineers in fields such as numerical relativity, astrophysics, and climate modeling. The Cactus design consists of a core (the "flesh") that connects to application modules ("thorns"). Thorns exist for services such as the Globus Toolkit, the PETSc library, visualization, etc. Cactus is highly portable and parallel because its abstraction APIs are themselves implemented as thorns. MPICH-G2 exploits Globus services to provide efficient communication and quality of service (QoS).

Cactus-G: Architecture Application thorns need not be grid-aware. An example of a grid-aware Cactus thorn is PUGH, which provides MPI-based parallelism. The DUROC library (part of Globus) handles process co-allocation and management.

Experimental Results: Setup The application is a Fortran code for solving numerical relativity problems: 57 3-D grid variables, 780 flops per gridpoint per iteration. N x N x 6 x ghostzone_size x 8 variables need to be synchronized at each processor. In total, 1500 CPUs organized in a 5 x 12 x 25 3-D processor mesh.
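Restating the slide's synchronization expression in symbols, as a hedged reading only (in particular, the trailing factor of 8 is assumed to correspond to 8-byte double-precision values, which the slide does not state):

% Hedged reading of the slide's per-processor synchronization volume:
% N x N face, 6 faces, ghostzone width g, 8 bytes per value (assumed).
V \;\approx\; N \cdot N \cdot 6 \cdot g_{\mathrm{ghost}} \cdot 8 \ \text{bytes per synchronized variable}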

Experimental Results: Setup 4 supercomputers at SDSC and NCSA: a 1024-CPU IBM Power SP (306 MFlop/s) and one 256-CPU plus two 128-CPU SGI Origin 2000 systems (168 MFlop/s). Bandwidths: intra-machine 200 MB/s; inter-machine 100 MB/s; SDSC to NCSA 3 MB/s achieved on a 622 Mb/s link.

Communication Optimizations Communication/computation overlap: processors communicating across the WAN were given fewer grid points so that their communication could be overlapped with computation. Compression: a Cactus thorn that exploits the regularity of the data, compressing messages with the zlib (libz) library. Ghostzones: larger ghostzones were used to reduce communication overhead at the expense of redundant computation.
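As a hedged illustration of the compression idea (not the actual Cactus thorn), the sketch below compresses a ghost-zone buffer of doubles with zlib's compress() before it is sent across the WAN; the function name and buffer handling are assumptions.

/* Sketch: compressing a ghost-zone buffer with zlib before a WAN send.
 * Illustrative only; the real Cactus compression thorn differs.  `dest`
 * must be at least compressBound(n_values * sizeof(double)) bytes. */
#include <zlib.h>
#include <stddef.h>

/* Returns the compressed size in bytes, or 0 on failure. */
unsigned long compress_ghosts(const double *src, size_t n_values,
                              unsigned char *dest, unsigned long dest_cap)
{
    uLongf dest_len = dest_cap;
    uLong  src_len  = (uLong)(n_values * sizeof(double));

    if (compress(dest, &dest_len, (const Bytef *)src, src_len) != Z_OK)
        return 0;

    return dest_len;   /* send these bytes instead of the raw src_len bytes */
}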

Performance Metrics Flop/s rate and efficiency are used as metrics. Total execution time t_tot: measured using MPI_Wtime(). Expected computation time t_comp: the ideal time, calculated from a single-node run. Flop count F: calculated using hardware counters; 780 flops per gridpoint per iteration. Flop/s rate = F * num_gridpts * num_iterations / t_tot. Efficiency E = t_comp / t_tot.
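A minimal sketch of how these two metrics could be computed, with illustrative variable names of my own (the paper and the slides do not show this code):

/* Sketch: the flop-rate and efficiency metrics as defined on this slide.
 * Variable names are illustrative; t_comp is the ideal, computation-only
 * time estimated from a single-node run. */
#include <mpi.h>
#include <stdio.h>

void report_metrics(double flops_per_gridpoint,   /* 780 for this application */
                    double num_gridpts, double num_iterations,
                    double t_start, double t_comp)
{
    double t_tot = MPI_Wtime() - t_start;                        /* total time   */
    double rate  = flops_per_gridpoint * num_gridpts
                   * num_iterations / t_tot;                     /* Flop/s rate  */
    double eff   = t_comp / t_tot;                               /* efficiency E */

    printf("%.1f GFlop/s, efficiency %.1f%%\n", rate / 1e9, eff * 100.0);
}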

Performance Figures 4 supercomputers (baseline): 42 GFlop/s, 14% efficiency. Compression + 10 ghostzones: 249 GFlop/s, 63.3% efficiency. A smaller run on 1140 processors: 292 GFlop/s, 88% efficiency.

Ghostzones Increasing the ghostzone size can reduce latency overhead: fewer messages are transferred while the total amount of data stays the same. Increasing the ghostzone size beyond a certain point gives no further benefit and wastes memory.
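One way to see the tradeoff is a back-of-the-envelope model (an assumption of this write-up, not the paper's analysis): with ghostzone width g, neighbours only need to exchange data every g iterations, so the per-iteration latency cost drops roughly by a factor of g, while the bandwidth term stays about constant and each extra ghost plane adds redundant computation.

% Rough per-iteration cost with ghostzone width g (assumed model):
% alpha = per-message latency, n = messages per exchange,
% B = bytes exchanged per iteration, beta = bandwidth,
% W = redundant work per extra ghost plane.
T(g) \;\approx\; \frac{\alpha\, n}{g} \;+\; \frac{B}{\beta} \;+\; (g-1)\,W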

Compression For increased throughput across WANs. Compression was found to be highly useful because the data are regular and smooth. Since the smoothness of the data changes over time, the benefit of compression changes as well, so adaptive compression is needed.

Adaptive Strategies - Compression Predicting optimal values of the ghostzone and compression parameters would be attractive, but we do not want to rely on detailed network characteristics. For example: change the compression state based on the efficiency, averaged over the last N iterations (a sketch follows below).
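A hedged sketch of one such rule (the window length, the threshold, and the names are all assumptions, not the paper's policy): flip the compression setting whenever the efficiency averaged over the last window of iterations drops noticeably.

/* Sketch: flip the compression setting whenever the efficiency averaged
 * over the last WINDOW iterations drops noticeably.  WINDOW, the 5%
 * threshold and all names are assumptions, not the paper's policy. */
#include <stdbool.h>

#define WINDOW         10     /* iterations per averaging window (assumed) */
#define DROP_THRESHOLD 0.05   /* efficiency drop that triggers a change    */

static bool   compression_on  = false;
static double prev_window_eff = 0.0;

bool adapt_compression(int iteration, double eff_avg)
{
    if (iteration % WINDOW != 0)
        return compression_on;            /* only act at window boundaries */

    if (eff_avg < prev_window_eff - DROP_THRESHOLD)
        compression_on = !compression_on; /* current setting stopped paying off */

    prev_window_eff = eff_avg;
    return compression_on;
}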

Adaptive Ghostzone Sizes Adapting the ghostzone size is more challenging: many ghostzone sizes are possible, memory re-allocations are needed, changes ripple through the code, and extra data must be fetched from neighbours. The strategy: start with a size of 1 and increase or decrease it in accordance with the measured efficiency (sketched below).
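A hedged sketch of such an adjustment loop (the policy, bounds, and names are illustrative assumptions): grow or shrink the ghostzone width by one plane depending on whether the averaged efficiency improved, and reverse direction when it does not.

/* Sketch: grow or shrink the ghostzone width by one plane depending on
 * whether the averaged efficiency improved.  Each change would require
 * re-allocating grid memory and fetching the extra planes from
 * neighbours.  Policy, bounds and names are illustrative assumptions. */
static int    ghost_width = 1;      /* start at 1, as on the slide          */
static int    direction   = +1;     /* currently growing (+1) or shrinking (-1) */
static double prev_eff    = 0.0;

int adapt_ghostzone(double eff_avg, int max_width)
{
    if (eff_avg < prev_eff)
        direction = -direction;              /* last change hurt: reverse */

    ghost_width += direction;
    if (ghost_width < 1)         ghost_width = 1;
    if (ghost_width > max_width) ghost_width = max_width;

    prev_eff = eff_avg;
    return ghost_width;   /* caller re-allocates buffers for this width */
}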

Further Information The Cactus Framework and Toolkit: Design and Applications. Tom Goodale et al. Vector and Parallel Processing (VECPAR 2002). Grid Aware Parallelizing Algorithms. Thomas Dramlitsch, Gabrielle Allen, Edward Seidel. Journal of Parallel and Distributed Computing.