Presentation is loading. Please wait.

Presentation is loading. Please wait.

The Cactus Code: A Parallel, Collaborative, Framework for Large Scale Computing Gabrielle Allen Max Planck Institute for Gravitational Physics, (Albert.

Similar presentations


Presentation on theme: "The Cactus Code: A Parallel, Collaborative, Framework for Large Scale Computing Gabrielle Allen Max Planck Institute for Gravitational Physics, (Albert."— Presentation transcript:

1 The Cactus Code: A Parallel, Collaborative, Framework for Large Scale Computing Gabrielle Allen Max Planck Institute for Gravitational Physics, (Albert Einstein Institute)

2 Outline THE GRID: Dependable, consistent, pervasive access to high-end resources CACTUS is a freely available, modular, portable and manageable environment for collaboratively developing parallel, high- performance multi-dimensional simulations

3 History n Cactus originated in 1997 as a code for numerical relativity, following a long line of codes developed in Ed Seidel’s research groups, at the NCSA and recently the AEI. n Numerical Relativity: complicated 3D hyperbolic/elliptic PDEs, dozens of equations, thousands of terms, many people from very different disciplines working together, needing a fast, portable, flexible, easy-to-use, code which can incorporate new technologies without disrupting users. n Originally: Paul Walker, Joan Masso, John Shalf, Ed Seidel. n Cactus 4.0, August 1999: Total rewrite and redesign of code, learning from experiences with previous versions. t=0 t=100 Need Multi Tflop, Tbyte computing!!

4 Gravitational Waves Astronomy: New Field, Fundamental New Information about the Universe Multi-Teraflop Computation, AMR, Elliptic-Hyperbolic Numerical Relativity

5 Numerical Relativity With Cactus n Biggest computations ever: l 256 proc O2K at NCSA, 225,000 SU’s, 1Tbyte Output Data in a Few Weeks n Black Holes (prime source for GW) l Increasingly complex collisions: now doing full 3D grazing collisions n Gravitational Waves l Study linear waves as testbeds l Move on to fully nonlinear waves l Interesting Physics: BH formation in full 3D! n Neutron Stars l Developing capability to do full GR hydro l Now can follow full orbits!

6 What is Cactus n Flesh (ANSI C) provides code infrastructure (parameter, variable, scheduling databases, error handling, APIs, make, parameter parsing, ) n Thorns (F77/F90/C/C++/Java/Perl/Python) are plug-in and swappable modules or collections of subroutines providing both the computational instructructure and the physical application. Well-defined interface through 3 config files n Just about anything can be implemented as a thorn: Driver layer (MPI, PVM, SHMEM, …), Black Hole evolvers, elliptic solvers, reduction operators, interpolators, web servers, grid tools, IO, … n User driven: easy parallelism, no new paradigms, flexible n Collaborative: thorns borrow concepts from OOP, thorns can be shared, lots of collaborative tools n Computational Toolkit: existing thorns for (Parallel) IO, elliptic, MPI unigrid driver, n Integrate other common packages and tools: HDF5, Globus, PETSc, PAPI, Panda, FlexIO, GrACE, Autopilot, LCAVision, OpenDX, Amira,... n Trivially Grid enabled!

7 Current Version Cactus 4.0 n Cactus 4.0 beta 1 released September 1999 n Community code: Distributed under GNU GPL n Currently: Cactus 4.0 beta 8 n Supported Architectures: l SGI Origin l SGI 32/64 l Cray T3E l Dec Alpha l Intel Linux IA32/IA64 l Windows NT l HP Exemplar l IBM SP2 l Sun Solaris l Hitachi SR8000-F l NEC SX-5 l Mac Linux l...

8 Cactus Computational Toolkit: Parallel utilities (thorns) for computational scientist CactusBase l Boundary, IOUtil, IOBasic, CartGrid3D, IOASCII, Time CactusBench l BenchADM CactusConnect l HTTPD, HTTPDExtra CactusExample l WaveToy1DF77, WaveToy2DF77 CactusElliptic l EllBase, EllPETSc, EllSOR, EllTest CactusPUGH l Interp, PUGH, PUGHSlab, PUGHReduce CactusPUGHIO l IOFlexIO, IOHDF5, IsoSurfacer CactusTest l TestArrays, TestCoordinates, TestInclude1, TestInclude2, TestComplex, TestInterp, TestReduce CactusWave l IDScalarWave, IDScalarWaveC, IDScalarWaveCXX, WaveBinarySource, WaveToyC, WaveToyCXX, WaveToyF77, WaveToyF90, WaveToyFreeF90 external l IEEEIO, RemoteIO, TCPXX, jpeg6b BetaThorns (In Development) l IOStreamedHDF5, IOJpeg, IOHDF5Util,…, many more

9 How To Use Cactus n Application scientist usually concentrates on the application l Physics, Performance, Algorithms l Logically: Operations on a grid (structured or unstructured) l Program in any language n Then takes advantage of parallel API features enabled by Cactus l IO, Data streaming, remote visualization/steering, AMR, MPI/PVM, checkpointing, Grid Computing, interpolations, reductions, etc… l Abstraction allows one to switch between different MPI, PVM layers, different I/O layers, etc, with no or minimal changes to application! n (nearly) All architectures supported and autoconfigured l Common to develop on laptop (w/wo MPI); run on anything n Metacode Concept l Very, very lightweight, not a huge framework l User specifies desired code modules in configuration files l Desired code generated, automatic routine calling sequences, syntax checking, etc… l You can actually read the code it creates...

10 Cactus Community DLR Geophysics (Bosl) Numerical Relativity Community Cornell Crack prop. NASA NS GC Livermore SDSS (Szalay) Intel Microsoft Clemson “Egrid” NCSA, ANL, SDSC AEI Cactus Group (Allen) NSF KDI (Suen) EU Network (Seidel) Astrophysics (Zeus) US Grid Forum DFN Gigabit (Seidel) “GRADS” (Kennedy, Foster, Dongarra, et al) ChemEng (Bishop) San Diego, GMD, Cornell Berkeley

11 Grid Computing n AEI Numerical Relativity Group has access to high-end resources in over ten centers in Europe/USA n They want: l Bigger simulations, more simulations and faster throughput l Intuitive IO at local workstation l No new systems/techniques to master!! n How to make best use of these resources? l Provide easier access … noone can remember ten usernames, passwords, batch systems, file systems, … great start!!! l Combine resources for larger productions runs (more resolution badly needed!) l Dynamic scenarios … automatically use what is available n Many other reasons for Grid Computing for computer scientists, funding agencies, supercomputer centers...

12 Grid-Enabled Cactus n Cactus and its ancestor codes have been using Grid infrastructure since 1993 n Support for Grid computing was part of the design requirements for Cactus 4.0 (experiences with Cactus 3) n Cactus compiles “out-of-the-box” with Globus [using globus device of MPICH-G(2)] n Design of Cactus means that applications are unaware of the underlying machine/s that the simulation is running on … applications become trivially Grid-enabled n Infrastructure thorns (I/O, driver layers) can be enhanced to make most effective use of the underlying Grid architecture

13 Cactus + Globus Cactus Application Thorns Distribution information hidden from programmer Initial data, Evolution, Analysis, etc Grid Aware Application Thorns Drivers for parallelism, IO, communication, data mapping PUGH: parallelism via MPI (MPICH-G2, grid enabled message passing library) Grid Enabled Communication Library MPICH-G2 implementation of MPI, can run MPI programs across heterogenous computing resources Standard MPI Single Proc

14 Grid Experiments n SC93 l remote CM-5 simulation with live viz in CAVE n SC95 l Heroic I-Way experiments leads to development of Globus. Cornell SP-2, Power Challenge, with live viz in San Diego CAVE n SC97 l Garching 512 node T3E, launched, controlled, visualized in San Jose n SC98 l HPC Challenge. SDSC, ZIB, and Garching T3E compute collision of 2 Neutron Stars, controlled from Orlando n SC99 l Colliding Black Holes using Garching, ZIB T3E’s, with remote collaborative interaction and viz at ANL and NCSA booths n 2000 l Single simulation LANL, NCSA, NERSC, SDSC, ZIB, Garching, … l Dynamic distributed computing … spawning new simulations!!

15 Grand Picture Remote steering and monitoring from airport Origin: NCSA Remote Viz in St Louis T3E: Garching Simulations launched from Cactus Portal Grid enabled Cactus runs on distributed machines Remote Viz and steering from Berlin Viz of data from previous simulations in SF café DataGrid/DPSS Downsampling Globus http HDF5 IsoSurfaces

16 Demo: Remote Computing n Have most of this working now n Need to make it common place, and trivially available to users n Requires development of readers/networks for Viz clients too n Remote simulation: l Monitor and steer using thorn HTTPD l Display live isosurfacers with thorn isosurfacer/IsoView GUI l Display full live viz with HDF5 thorns and OpenDX

17 Remote Visualization IsoSurfaces and Geodesics Contour plots (download) Grid Functions Streaming HDF5 Amira LCA Vision OpenDX

18 Remote Visualization n Streaming data from Cactus simulation to viz client l Clients: OpenDX, Amira, LCA Vision,... n Protocols l Proprietary: –Isosurfaces, geodesics l HTTP: –Parameters, xgraph data, JPegs l Streaming HDF5: –HDF5 provides downsampling and hyperslabbing –all above data, and all possible HDF5 data (e.g. 2D/3D) –two different technologies Streaming Virtual File Driver (I/O rerouted over network stream) XML-wrapper (HDF5 calls wrapped and translated into XML)

19 Remote Visualization (2) n Clients l Proprietary: –Amira l HTTP: –Any browser (+ xgraph helper application) l HDF5: –Any HDF5 aware application h5dump Amira OpenDX LCA Vision (soon) l XML: –Any XML aware application Perl/Tk GUI Future browsers (need XSL-Stylesheets)

20 Remote Visualization - Issues n Parallel streaming l Cactus can do this, but readers not yet available on the client side n Handling of port numbers l clients currently have no method for finding the port number that Cactus is using for streaming l development of external meta-data server needed (ASC/TIKSL) n Generic protocols n Data server l Cactus should pass data to a separate server that will handle multiple clients without interfering with simulation l TIKSL provides middleware (streaming HDF5) to implement this n Output parameters for each client

21 Remote Steering Remote Viz data XML HTTP HDF5 Amira Any Viz Client

22 Remote Steering n Stream parameters from Cactus simulation to remote client, which changes parameters (GUI, command line, viz tool), and streams them back to Cactus where they change the state of the simulation. n Cactus has a special STEERABLE tag for parameters, indicating it makes sense to change them during a simulation, and there is support for them to be changed. n Example: IO parameters, frequency, fields, timestep, debugging flags n Current protocols: l XML (HDF5)to standalone GUI l HDF5to viz tools (Amira) l HTTPto Web browser (HTML forms)

23 Thorn HTTPD n Thorn which allows simulation any to act as its own web server n Connect to simulation from any browser anywhere n Monitor run: parameters, basic visualization,... n Change steerable parameters n See running example at www.CactusCode.org n Wireless remote viz, monitoring and steering

24 Remote Steering - Issues n Same kinds of problems as remote visualization l generic protocols l handling of port numbers l broadcasting of active Cactus simulations n Security l Logins l Who can change parameters? n Lots of issues still to resolve...

25 Remote Offline Visualization Viz Client (Amira) HDF5 VFD DataGrid (Globus) DPSS FTP HTTP Visualization Client DPSS Server FTP Server Web Server Remote Data Server Downsampling, hyperslabs Viz in Berlin 4TB at NCSA Only what is needed

26 Remote Offline Visualization n Accessing remote data for local visualization n Should allow downsampling, hyperslabbing, etc. n Access via DPSS is working (TIKSL) n Waiting for DataGrid support for HTTP and FTP to remove dependency on the DPSS file systems.

27 New Grid Applications n Dynamic Staging: move to faster/cheaper/bigger machine l “Cactus Worm” n Multiple Universe l create clone to investigate steered parameter (“Cactus Virus”) n Automatic Convergence Testing l from intitial data or initiated during simulation n Look Ahead l spawn off and run coarser resolution to predict likely future n Spawn Independent/Asynchronous Tasks l send to cheaper machine, main simulation carries on n Thorn Profiling l best machine/queue l choose resolution parameters based on queue l ….

28 New Grid Applications (2) n Dynamic Load Balancing l inhomogeneous loads l multiple grids n Portal l resource choosing l simulation launching l management n Intelligent Parameter Surveys l farm out to different machines n Make use of l Running with management tools such as Condor, Entropia, etc. l Scripting thorns (management, launching new jobs, etc) l Dynamic use of eg MDS for finding available resources

29 Go! Dynamic Grid Computing Clone job with steered parameter Queue time over, find new machine Add more resources Found a horizon, try out excision Look for horizon Calculate/Output Grav. Waves Calculate/Output Invariants Find best resources Free CPUs!! NCSA SDSC RZG SDSC LRZArchive data

30 Users View

31 Cactus Worm n Egrid Test Bed: 10 Sites n Simulation starts on one machine, seeks out new resources (faster/cheaper/bigger) and migrates there, etc, etc n Uses: Cactus, Globus n Protocols: gsissh, gsiftp, streams or copies data n Queries Egrid GIIS at each site n Publishes simulation information to Egrid GIIS n Demonstrated at Dallas SC2000 n Development proceeding with KDI ASC (USA), TIKSL/GriKSL (Germany), GrADS (USA), Application Group of Egrid (Europe) n Fundamental dynamic Grid application !!! n Leads directly to many more applications

32 Demo: Cactus Worm n Worm running around 10 sites of the Egrid testbed n Currently developing more features/fault tolerance/logging n Will run for around 1000 generations (1day) then dies!

33 Dynamic Grid Computing n Fundamental Issues (all needed for Cactus Worm) l Dynamic resource selection (query information server) l Authentification (how to move files, issue remote shell commands) l Executable staging (build on demand, or maintain database?) l Data migration (copy, stream, which protocol?) l Fault tolerance (essential!!!!) l Book-keeping (essential!!!! … where did the output go, what actually happened?) l Publishing of simulation information (information should be available to you and your collaborators)

34 User Portal n Find resources l automatically finds machines with a user allocation (group aware!) l continuously monitor resources, network etc. n Authentification l single login, don’t need to remember lots of usernames/passwords n Launch simulation l automatically create executable on chosen machine l write data to appropriate storage location l negotiate local queue structures n Monitor/steer simulations l access remote visualization and steering while simulation is running l collaborative … choose who else can look in and/or steer l performance … how efficient is the simulation? n Archiving l store thorn lists, parameter files, output locations, configurations, …

35 Cactus Portal n KDI ASC Project n Technology: Globus, GSI, Java Beans, DHTML, Java CoG, MyProxy, GPDK, TomCat, Stronghold n Allows submission of distributed runs n Accesses the ASC Grid Testbed (SDSC, NCSA, Argonne, ZIB, LRZ, AEI) n Undergoing testing by users now! n Main difficulty now is that it requires everything to work … robustness!! n But is going to revolutionise our use of computing resources

36 Grid Related Projects n ASC: Astrophysics Simulation Collaboratory l NSF Funded (WashU, Rutgers, Argonne, U. Chicago, NCSA) l Collaboratory tools, Cactus Portal l Starting to use Portal for production runs n E-Grid: European Grid Forum (GGF: Global Grid Forum) l Working Group for Testbeds and Applications (Chair: Ed Seidel) l Test application: Cactus+Globus l Demos at Dallas SC2000 n GrADs: Grid Application Development Software l NSF Funded (Rice, NCSA, U. Illinois, UCSD, U. Chicago, U. Indiana...) l Application driver for grid software

37 Grid Related Projects (2) n Distributed Runs l AEI, Argonne, U. Chicago l Working towards running on several computers, 1000’s of processors (different processors, memories, OSs, resource management, varied networks, bandwidths and latencies) n TIKSL/GriKSL l German DFN funded: AEI, ZIB, Garching l Remote online and offline visualization, remote steering/monitoring n Cactus Team l Dynamic distributed computing … l Testing of alternative communication protocols … MPI, PVM, SHMEM, pthreads, OpenMP, Corba, RDMA,... l Developing Grid Application Development Toolkit

38 Grid Application Development Toolkit n Application developer should be able to build simulations with tools that easily enable dynamic grid capabilities n Want to build programming API to easily allow: l Query information server (e.g. GIIS) –What’s available for me? What software? How many processors? l Network Monitoring l Decision Thorns –How to decide? Cost? Reliability? Size? l Spawning Thorns –Now start this up over here, and that up over there l Authentification Server –Issues commands, moves files on your behalf (can’t pass-on Globus proxy)

39 Grid Application Development Toolkit (2) l Information Server –What is running where? Where to connect for viz/steering? What and where are other people in the group running? –Spawn hierarchies –Distribute/loadbalance l Data Transfer –Use whatever method is desired –Gsi-ssh, Gsi-ftp, Streamed HDF5, scp, GASS, Etc… l LDAP routines for simulation codes –Write simulation information in LDAP format –Publish to LDAP server l Stage Executables –CVS checkout of new codes that become connected, etc… l Etc… n If we build this, we can get developers and users!

40 More Information... n Cactus: l Web Site: www.CactusCode.org (Documentation/Tutorials etc) l Cactus Worm: www.CactusCode.org/Development/Egrid.html n Global Grid Forum (Egrid) l www.egrid.org l www.gridforum.org n ASC Portal l www.ascportal.org n TIKSL Gigabit Computing l www.zib.de/Visual/projects/TIKSL/ n Black Holes and Neutron Star: Pictures and Movies l jean-luc.aei.mpg.de n Any questions: cactus@cactuscode.org


Download ppt "The Cactus Code: A Parallel, Collaborative, Framework for Large Scale Computing Gabrielle Allen Max Planck Institute for Gravitational Physics, (Albert."

Similar presentations


Ads by Google