Cactus in GrADS

Dave Angulo, Ian Foster, Matei Ripeanu, Michael Russell
Distributed Systems Laboratory, The University of Chicago

With: Gabrielle Allen, Thomas Dramlitsch, Ed Seidel, John Shalf, Thomas Radke

Presentation Outline
- Cactus Overview
  - Architecture
  - Applications
- Cactus and Grid computing
  - Metacomputing, Worms, …
- Proposed Cactus-GrADS project
  - The "Cactus-G worm"
  - Tequila thorn and architecture
  - Issues

What is Cactus?
Cactus is a freely available, modular, portable and manageable environment for collaboratively developing parallel, high-performance multidimensional simulations.
- Originally developed for astrophysics, but nothing about it is astrophysics-specific

Cactus Applications
[Figure: example output from numerical relativity simulations]

Cactus Architecture
- Codes are constructed by linking a small core (flesh) with selected modules (thorns)
  - Custom linking/configuration tools
- The core provides basic management services
- A wide variety of thorns are supported
  - Numerical methods
  - Grids and domain decompositions
  - Visualization and steering
  - Etc.
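To make the flesh/thorn split concrete, here is a minimal sketch of a thorn; the thorn name WaveDemo and its routine are hypothetical, though the schedule.ccl declaration and the CCTK macros follow standard Cactus conventions. The flesh reads the declaration and calls the routine at the right point in the evolution loop:

    # schedule.ccl (hypothetical thorn "WaveDemo"): tell the flesh
    # when to call our routine
    schedule WaveDemo_Evolve at EVOL
    {
      LANG: C
    } "Evolve the demo wave field"

    /* WaveDemo_Evolve.c: the CCTK macros hand the routine its grid
       variables and parameters, independent of the driver in use */
    #include "cctk.h"
    #include "cctk_Arguments.h"
    #include "cctk_Parameters.h"

    void WaveDemo_Evolve(CCTK_ARGUMENTS)
    {
      DECLARE_CCTK_ARGUMENTS;
      DECLARE_CCTK_PARAMETERS;
      /* ... update this processor's section of the grid ... */
    }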

Cactus Architecture
[Diagram: the Configure/CST/Make tools assemble the flesh, the thorns, and the Computational Toolkit into a Cactus executable for many operating systems: AIX, NT, Linux, Unicos, Solaris, HP-UX, SuperUX, Irix, OSF]

Cactus Applications
- A Cactus "application" is just another thorn, "linked" with other tool thorns
- Numerous astrophysics applications
  - E.g., calculating Schwarzschild event horizons for colliding black holes
- Potential candidates for GrADS work
  - Elliptic solver, BenchADM
  - Both use a 3-D grid abstract topology

Cactus Model (cont.)
[Diagram: building an executable. The Cactus source (the flesh plus thorns such as IOBasic, IOASCII, WaveToy, LDAP, Worm, …) is combined with a configuration holding compiler options, tool options, MPI options, and HDF5 options]
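As an illustration of that configuration step: a Cactus build is driven by a thorn list plus per-configuration options. The sketch below uses Cactus's arrangement/thorn naming, but the particular thorn set and option values are assumptions, not the project's actual configuration:

    # ThornList: which thorns to compile into this configuration
    CactusBase/IOBasic
    CactusBase/IOASCII
    CactusWave/WaveToyC
    CactusPUGH/PUGH

    # Configuration options (supplied when the configuration is
    # created): where to find MPI and HDF5
    MPI  = NATIVE
    HDF5 = yes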

Running Cactus
Parameter file:
- Specify which thorns to activate
- Specify global parameters
- Specify restricted parameters
- Specify private parameters
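A sketch of such a parameter file; the ActiveThorns line and the thorn::parameter syntax follow standard Cactus usage, but the specific thorns and values here are illustrative:

    # Activate thorns, then set parameters at the desired scope
    ActiveThorns = "PUGH WaveToyC IOBasic IOASCII"

    cactus::cctk_itlast  = 1000   # global: number of iterations
    driver::global_nsize = 400    # restricted: grid size (PUGH)
    wavetoyc::amplitude  = 1.0    # private to the WaveToy thorn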

Parallelism in Cactus
- Distributed memory model: each thorn is passed a section of the global grid
- The parallel driver (implemented in a thorn) can use whatever method it likes to decompose the grid across processors and exchange ghost-zone information; each thorn is presented with a standard interface, independent of the driver
- The standard driver distributed with Cactus (PUGH) is for a parallel unigrid and uses MPI for the communication layer
- PUGH can do custom processor decomposition and static load balancing
- An AMR driver is also provided
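To make the ghost-zone exchange concrete, here is a minimal generic MPI sketch of what a unigrid driver does each step, for a 1-D decomposition with one ghost cell per side. This illustrates the pattern only; it is not PUGH's actual code:

    #include <mpi.h>
    #include <stdlib.h>

    #define LOCAL_N 100   /* interior points owned by each process */

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* u[0] and u[LOCAL_N+1] are ghost zones filled from neighbors */
        double *u = calloc(LOCAL_N + 2, sizeof(double));
        int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        /* One exchange per evolution step: send edge values,
           receive the neighbors' edges into the ghost zones */
        MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                     &u[LOCAL_N + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[LOCAL_N], 1, MPI_DOUBLE, right, 1,
                     &u[0], 1, MPI_DOUBLE, left, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        free(u);
        MPI_Finalize();
        return 0;
    }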

Cactus and Grid Computing: General Observations
- Reasons to work with Cactus
  - Rich structure, computationally intensive, numerous opportunities for Grid computing
  - Talented and motivated developer/user community
- Issues
  - At its core, a relatively simple structure
  - The Cactus system is relatively complex
  - The user community is relatively small

Cactus-G: Possible Opportunities
- "Metacomputing": use heterogeneous systems as a source of low-cost cycles
  - Departmental pool or multi-site system
- Dynamic resource selection, e.g.:
  - "Cheapest" resources to achieve interactivity
  - "Fastest" resources for best turnaround
  - "Best" resolution to meet a turnaround goal
  - Spawning independent tasks, e.g. analysis
  - Migration to a "better" resource for all of the above

Cactus-G: Common Building Blocks
- Resource selection based on resource and application characterizations
- Implementation and management of distributed output
- (De)centralized logging and accounting for resource usage, parameter selection, etc.
- Fault discovery, recovery, and tolerance
- Code/executable management and creation
- A next-generation Cactus that increases flexibility with respect to parameter selection

Proposed Cactus-G Challenge Problem: the Cactus-G Worm
- Migrate to a "faster/cheaper/bigger" system
  - When such a system is identified by resource discovery
  - When resource requirements change
- Why?
  - Tests much of the machinery required for Cactus-G (source code management, discovery, …)
  - Places substantial demands on GrADS
  - Good potential to show real benefit
  - The migration approach simplifies infrastructure demands (MPI-2 support is not required)

Cactus-G Worm: Basic Architecture and Operation
[Diagram: the Cactus flesh hosts the "Tequila" thorn alongside the application and other thorns. Tequila interacts with the GrADS Resource Selector, the Grid Information Service (which stores models etc. and answers queries), the Application Manager, code repositories, and compute and storage resources. Steps: (0) possible user input; (1) adaptation request, or (1') resource notification; (2) resource request; (3) write checkpoint; (4) migration request; (5) Cactus startup; (6) load code; (7) read checkpoint]

Tequila Thorn Functions
- Initiates adaptation on application request or on notification of new resources
  - Can include user input (e.g., via the HTTP thorn)
- Requests resources from an external entity
  - GIS or ResourceSelector
- Checkpoints the application
- Contacts the Application Manager to request a restart on the new resources
  - The AppManager has security and robustness advantages over a direct restart
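A compact sketch of that sequence; every function below is a hypothetical stub standing in for the real GIS/ResourceSelector/AppManager interactions, not actual Tequila code:

    #include <stdio.h>

    typedef struct { char host[64]; int nprocs; } ResourceBag;

    /* Stubs for the external services (hypothetical) */
    static int better_resources_available(void) { return 1; }
    static ResourceBag request_resources(void)
    { ResourceBag b = { "newhost.example.org", 16 }; return b; }
    static void write_checkpoint(const char *f)
    { printf("checkpoint -> %s\n", f); }
    static void ask_app_manager_to_restart(ResourceBag b, const char *ckpt)
    { printf("restart on %s (%d procs) from %s\n", b.host, b.nprocs, ckpt); }

    /* The adaptation sequence Tequila drives */
    static void tequila_adapt_step(void)
    {
        /* (1) trigger: application request, user input, or
               notification of new resources */
        if (!better_resources_available()) return;

        /* (2) ask the GIS/ResourceSelector for a new bag of resources */
        ResourceBag bag = request_resources();

        /* (3) checkpoint the running application */
        write_checkpoint("cactus.ckpt");

        /* (4) hand off to the Application Manager, which restarts
               Cactus on the new resources from the checkpoint */
        ask_app_manager_to_restart(bag, "cactus.ckpt");
    }

    int main(void) { tequila_adapt_step(); return 0; }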

Cactus-G Worm: Approach
1) A uniprocessor Tequila thorn that speaks to the GIS and adapts periodically [done: Cactus group]
2) A Tequila thorn that speaks to the UCSD ResourceSelector [current focus]
3) Integrate accurate performance models
4) Support multiprocessor execution
5) Detailed evaluation
6) Add adaptation triggers: e.g., contract violation, new regime, user input

Tequila Thorn + ResourceSelector
- The ResourceSelector must be set up as a service
- The Tequila thorn sends a request for a new bag of resources
- The ResourceSelector responds with the new bag

Current Status
- A Tequila thorn prototype has been developed that speaks to the ResourceSelector
- A dummy ResourceSelector returns a static bag of resources
- Demonstrated Cactus+Tequila operating
- A performance model has been developed
- Expected by May: multiprocessor support, ResourceSelector interface, real performance model

Open Issues
- Should we move more management logic into the Application Manager?
- How does the Contract Monitor fit into the architecture?
- How does the PPS fit into the architecture?
- How do the COP and Application Launcher fit into the architecture (Cactus has its own launcher and compiles its own code)?
- How does Pablo fit into the architecture (which thorns are monitored? is the flesh monitored)?

The End

Request and Response
- The request to the ResourceSelector will be stored in the InformationService
- Only a pointer to the data in the IS will be passed to the ResourceSelector
- The response from the ResourceSelector will also be stored in the IS
- Only a pointer to the data in the IS will be passed back

Tequila Communication Overview
[Diagram: the Tequila thorn inside Cactus communicates with the Resource Selector and the Information Service]

Cactus Architecture in GrADS
[Diagram: the earlier Cactus architecture, extended with a GrADS toolkit and communication library alongside the Computational Toolkit]

Communication Details, Step 1
- An event is sent to the Tequila thorn requesting a restart

Communication Details, Step 2
- Tequila stores the AART in the IS

Communication Details, Step 3
- Tequila sends a request to the ResourceSelector, passing a pointer to the data in the IS

Communication Details, Step 4
- The ResourceSelector retrieves the AART from the IS

Communication Details, Step 5
- The ResourceSelector stores the bag of resources (in the AART) in the IS

Communication Details, Step 6
- The ResourceSelector responds to Tequila, passing a pointer to the data in the IS

Communication Details, Step 7
- Tequila retrieves the AART with the new bag of resources from the IS
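Taken together, steps 1-7 pass handles rather than payloads. The toy sketch below illustrates that pattern with an in-memory stand-in for the Information Service; all names and data here are hypothetical, not GrADS APIs:

    #include <stdio.h>
    #include <string.h>

    #define MAX_ENTRIES 8

    /* Toy IS: a table mapping integer handles to stored records */
    static char is_store[MAX_ENTRIES][256];

    static int is_put(const char *data)   /* store, return a handle */
    { static int next = 0; strncpy(is_store[next], data, 255); return next++; }

    static const char *is_get(int handle) { return is_store[handle]; }

    /* ResourceSelector side: read the AART via its handle, store the
       chosen bag of resources, and return the response handle */
    static int resource_selector(int request_handle)
    {
        printf("selector read: %s\n", is_get(request_handle));
        return is_put("AART + bag: newhost.example.org x 16");
    }

    int main(void)
    {
        /* Steps 2-3: Tequila stores the AART, sends only the handle */
        int req = is_put("AART: 400^3 grid, 1000 timesteps");

        /* Steps 4-6: the selector reads and responds through the IS */
        int resp = resource_selector(req);

        /* Step 7: Tequila retrieves the updated AART */
        printf("tequila read: %s\n", is_get(resp));
        return 0;
    }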

Requirements
- Using the IS for communication adds overhead, so why do this?
- GrADS requirement 1: do some things (e.g. compiling) at one time and have the results stored in a persistent storage area; pick these stored results up later and complete the other phases.

Sample Tequila Scenario
- The user asks to run an ADM simulation, 400x400x400, for 1000 timesteps in 10s.
- The resource selector is contacted to obtain virtual machines.
- The best virtual machine is selected based on the performance model (see the sketch below).
- The AM starts Cactus on that virtual machine (and monitors execution contracts?).
- The user (or application manager) decides that the computation is advancing too slowly and decides to search for a better virtual machine.
- The AM finds a better machine, commands the Cactus run to checkpoint, transfers files, and restarts Cactus.
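A toy version of the "select the best virtual machine" step. The performance model below (compute time from a total flop count, plus a per-step communication term) and the machine data are purely illustrative assumptions, not the project's actual model:

    #include <stdio.h>

    typedef struct {
        const char *name;
        double gflops;       /* aggregate compute rate (Gflop/s) */
        double step_comm_s;  /* per-timestep communication cost (s) */
    } VirtualMachine;

    /* Hypothetical model: compute time plus communication time */
    static double predicted_time(const VirtualMachine *vm,
                                 double total_gflop, int steps)
    {
        return total_gflop / vm->gflops + steps * vm->step_comm_s;
    }

    int main(void)
    {
        VirtualMachine vms[] = {
            { "dept-pool",   40.0, 0.5 },   /* slower, tightly coupled */
            { "multi-site", 120.0, 2.0 },   /* faster, higher latency  */
        };
        /* 400^3 points x 1000 steps, assuming ~500 flops per point
           per step (a made-up constant), converted to Gflop */
        double work = 400.0 * 400.0 * 400.0 * 1000.0 * 500.0 * 1e-9;
        int i, best = 0;
        for (i = 1; i < 2; i++)
            if (predicted_time(&vms[i], work, 1000) <
                predicted_time(&vms[best], work, 1000))
                best = i;
        printf("selected: %s\n", vms[best].name);
        return 0;
    }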