Manchester Computing Supercomputing, Visualization & e-Science Stephen Pickles, Andrew Porter, Robin Pinning & Rob Haines Royal Society, Tuesday 15 June, 2004 RealityGrid Software Infrastructure: Achievements and Prospects
RealityGrid Annual Workshop, 15/6/2004
Outline
Review
  – How we got here
Status
  – Where we are today
Prospects
  – Where we’re going

Review: How we got here

The pieces
Fast track
  – Computational Steering Library and tools (MC)
  – On-line Visualization (MC)
  – Web portal (EPCC)
  – Human-Computer Interfaces (HCI)
Deep track
  – Performance Control (CNC)
  – Resource management, component frameworks (IC)
  – Instruments: LUSI, XMT (not this talk)
This talk will emphasise fast track work.

Design philosophies
Grid-enabled
Component-based and service-oriented
  – plug in and compose new components and services, from partners and third parties
Independence and modularity
  – to minimize dependencies on third-party software
Should be able to steer locally without any Grid middleware
  – to facilitate parallel development within project
Integration and/or interoperability
  – Things should work together
Respect autonomy of application owners
  – Prefer light-weight instrumentation of application codes to wholesale re-factoring
  – Same source (or binary) should work with or without steering
Dynamism and adaptability
  – Attach/detach steering client from running application
  – Adapt to prevailing conditions
Intuitive and appropriate user interfaces

Historical Context – Messages from above
In 2002, we were told “use Globus, SRB or Condor”.
Then we were told “Web services are OK too”.
Then the Open Grid Services Architecture (OGSA) effort was announced. OGSA would be based on the Open Grid Services Infrastructure (OGSI), and specifications began in earnest with (it seemed) overwhelming industrial support.
“You must be on an OGSA-convergence track. You must use e-Science certificates.”
GT3 appears. Some people build GT3 services. No-one builds production grids based on GT3.
Early in 2004, we hear “OGSI was a great success. OGSI is dead. Long live WS-RF. GT3 is obsolescent.”

Enter Grid Services
OGSI brought the hope of convergence between Web services (the technology of choice for business process integration) and Grid computing.
It offered state, 2-level naming (GSH, GSR), lifetime management, and infrastructure support for common patterns (factories, registries, notification)…
With Dave Snelling, we experimented with a UNICORE-based OGSI prototype (pre-dating the GT3 preview).

First “Fast Track” Demonstration
Jens Harting at UK e-Science All Hands Meeting, September 2002

“Fast Track” Steering Demo, UK e-Science AHM 2002
[Diagram] Components:
  – Bezier (SGI, Manchester): VTK + VizServer
  – Dirac (SGI, QMUL): LB3D with RealityGrid Steering API
  – Laptop (SHU Conference Centre): VizServer client, Steering GUI
  – UNICORE Gateway and NJS (Manchester)
  – UNICORE Gateway and NJS (QMUL)
  – Firewall
  – The Mind Electric GLUE web service hosting environment with OGSA extensions
  – Single sign-on using UK e-Science digital certificates
[Diagram] Flows: Simulation Data; SGI OpenGL VizServer; Steering (XML)

Steering architecture in 2002
Communication modes:
  – Shared file system
  – Files moved by UNICORE daemon
  – GLOBUS-IO
[Diagram labels: Simulation, Visualization, Client, Steering library, data transfer]

Dilemma
Wanted to separate steering from job management
Architecture was brittle and firewall unfriendly
  – Client needed to know too much about application deployment
  – Direct connection between client and simulation is problematic when client is mobile
OGSI’s lifetime management, registries, language neutrality and notification seemed ideal for steering
  – (ended up not using OGSI notification for firewall reasons)
But all “production” grids were based on Globus Toolkit version 2 (GT2)

Serendipity – OGSI::Lite
Mark McKeown’s OGSI::Lite started life as a spare-time exercise to understand Web services, then OGSI.
It soon became a near-complete OGSI implementation.
Minimal pre-requisites (Perl and SOAP::Lite) meant we could deploy it trivially in user space when the job is run. We only need permission to listen on a port. (This would be highly non-trivial using the deep stack of GT3.)
So we could have our OGSI cake and eat it on a GT2 grid.
Our steering architecture quickly got a middle tier implemented in OGSI::Lite.

The Architecture of Steering
[Diagram labels: Steering client, Client, Steering library, Simulation, Visualization, Display, Registry, Steering GS; arrows: connect, publish, find, bind, data transfer (Globus-IO)]
OGSI middle tier
Components start independently and attach/detach dynamically
Multiple clients: Qt/C++, .NET on PocketPC, GridSphere Portlet (Java)
Remote visualization through SGI VizServer, Chromium, and/or streamed to Access Grid

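The publish/find/bind pattern named on the architecture slide can be illustrated with a small in-memory sketch. Everything here is hypothetical: `Registry`, `SteeringService` and the service names are invented for the example and do not reflect the OGSI::Lite interfaces, which are SOAP services rather than Python objects.

```python
from dataclasses import dataclass, field


@dataclass
class SteeringService:
    """Hypothetical stand-in for a per-simulation Steering Grid Service."""
    name: str
    params: dict = field(default_factory=dict)
    clients: set = field(default_factory=set)

    def attach(self, client_id):
        # A client may attach at any time after the service is published.
        self.clients.add(client_id)

    def detach(self, client_id):
        # Detaching leaves the simulation (and other clients) untouched.
        self.clients.discard(client_id)


class Registry:
    """Toy registry: services publish themselves, clients find and bind."""

    def __init__(self):
        self._entries = {}

    def publish(self, service):
        self._entries[service.name] = service

    def find(self, prefix=""):
        return sorted(n for n in self._entries if n.startswith(prefix))

    def bind(self, name):
        return self._entries[name]


registry = Registry()
registry.publish(SteeringService("lb3d-run-1", {"temperature": 1.0}))
registry.publish(SteeringService("lb3d-run-2", {"temperature": 2.0}))

names = registry.find("lb3d")      # client discovers running simulations
sgs = registry.bind(names[0])      # client binds to one of them
sgs.attach("qt-client")
sgs.attach("pda-client")
sgs.detach("qt-client")            # one client leaves, the other stays
```

The point of the middle tier in this sketch is that the client only needs the registry's address; it never has to know where the simulation was deployed.
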
The TeraGyroid Project
Funding from EPSRC (UK) & NSF (USA)
Ran LB3D across UK e-Science Grid and US TeraGrid
Study of defect dynamics in liquid crystalline surfactant systems using lattice-Boltzmann methods
Featured world’s largest lattice-Boltzmann simulation
TRICEPS was the HPC-Challenge aspect of this work
  – Transcontinental RealityGrids for Interactive Collaborative Exploration of Parameter Space
  – “most innovative data-intensive application” at SC’03
Later picked up ISC 2004 award in the “Integrated Data and Information Management” category
More in Richard Blake’s talk

New for TeraGyroid
Access Grid integration
Use of Chromium to complement VizServer
Job migration based on malleable checkpoints
User-friendly “wizard” to drive job launching and migration
Support for parameter space exploration through checkpoint trees
  – also implemented in OGSI::Lite
  – services thrown together for TeraGyroid have been upgraded in flight
  – still running 8 months later
File transfer service
  – to get around issues with systems homed on two networks
Port forwarding (Stephen Booth, EPCC)
  – to work around lack of public IP address on compute nodes (e.g. HPCx)

Checkpoint trees and parameter space exploration
[Figure captions]
  – Initial condition: random water/surfactant mixture.
  – Self-assembly starts.
  – Rewind and restart from checkpoint.
  – Lamellar phase: surfactant bilayers between water layers.
  – Cubic micellar phase, low surfactant density gradient.
  – Cubic micellar phase, high surfactant density gradient.

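A checkpoint tree of the kind described above can be pictured as a tree whose nodes record the parameters in force when each checkpoint was taken, so that "rewind and restart" becomes "add a new child to an earlier node". The sketch below is an assumption-laden illustration: `CheckpointNode`, the parameter name `surfactant_gradient` and its values are invented here and are not taken from the RealityGrid checkpoint metadata service.

```python
class CheckpointNode:
    """One checkpoint plus the parameter values it was taken with."""

    def __init__(self, label, params, parent=None):
        self.label = label
        self.params = dict(params)   # copy, so branches do not share state
        self.parent = parent
        self.children = []

    def branch(self, label, **changes):
        """Restart from this checkpoint with some parameters changed."""
        child = CheckpointNode(label, {**self.params, **changes}, parent=self)
        self.children.append(child)
        return child

    def lineage(self):
        """Labels from the root down to this node (the run's provenance)."""
        node, path = self, []
        while node is not None:
            path.append(node.label)
            node = node.parent
        return path[::-1]


root = CheckpointNode("initial", {"surfactant_gradient": 0.0})
assembled = root.branch("self-assembly")
# Rewind to the same checkpoint twice and explore two parameter values.
low = assembled.branch("low-gradient", surfactant_gradient=0.1)
high = assembled.branch("high-gradient", surfactant_gradient=0.9)
```

Because each branch copies its parent's parameters, exploring one region of parameter space never alters the record of another.
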
TeraGyroid Testbed
[Network map] Sites and networks: Starlight (Chicago), Netherlight (Amsterdam), BT provision, PSC, ANL, NCSA, Phoenix, Caltech, SDSC, UCL, Daresbury, Manchester, SJ4, MB-NG
[Legend] Visualization; Computation; Network PoP; Access Grid node; Service Registry; production network; Dual-homed system; 10 Gbps; 2 x 1 Gbps

EPSRC e-Science Meeting 2004
Multiple steering clients driving same simulation
  – Qt client on laptop
  – .NET client on PDA: Simon Nee (Loughborough)
  – Web client: GridSphere Portlet, access through web browser; Matthew Egbert (EPCC)
  – not all at same time
  – significant achievement in terms of OGSI interoperability
Collaborative steering prototype
  – using ICENI and client proxy
  – Java bindings to client side of steering library (JNI)
  – Gary Kong (LeSC)

Public Release – April 2004
Steering Library released as version 1.1
  – version 1.0 was project internal
  – very liberal open source license (FreeBSD)
API specification version 1.1
Library (C and Fortran90 bindings)
Tools, including Qt steerer
User Manual
Examples
Available for download at:
Globus-IO replaced by vanilla sockets
  – major simplification to build process
  – only way to complete integration of NAMD and VMD into RealityGrid

Status: Where we are today

Steering library
We instrument (add "knobs" and "dials" to) simulation codes through a steering library, written in C
  – Bindings in Fortran90, C/C++ (complete) and Java (partial)
Library features:
  – Pause/resume
  – Checkpoint and restart
  – Set values of steerable parameters (parameter steer)
  – Report values of monitored (read-only) parameters (parameter watch)
  – Emit "samples" to remote systems for e.g. on-line visualization
  – Consume "samples" from remote systems for e.g. resetting boundary conditions
  – Automatic emit/consume with steerable frequency
  – No restrictions on parallelisation paradigm
You only implement what you need

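To make the feature list concrete, here is a minimal sketch of how an instrumented main loop might poll for steering commands once per time step. `ToySteering`, its methods and the `script` schedule are hypothetical stand-ins invented for this example; they are not the RealityGrid steering API, whose bindings are in C and Fortran90. Pausing is modelled by skipping the step, whereas a real library would more plausibly block.

```python
class ToySteering:
    """Hypothetical steering stub: queues client commands for the app."""

    def __init__(self):
        self.steerable = {}   # parameters a client may change
        self.monitored = {}   # read-only parameters reported by the app
        self.pending = []     # commands sent by a client, not yet applied
        self.paused = False

    def send(self, command, **args):
        self.pending.append((command, args))

    def report(self, name, value):
        self.monitored[name] = value

    def control(self):
        """Apply queued commands; return those the app must handle itself."""
        for_app = []
        while self.pending:
            command, args = self.pending.pop(0)
            if command == "set":
                self.steerable[args["name"]] = args["value"]
            elif command == "pause":
                self.paused = True
            elif command == "resume":
                self.paused = False
            else:
                for_app.append(command)   # e.g. "checkpoint"
        return for_app


def run(steer, steps, script):
    """Toy simulation: advance time t by the steerable step size dt."""
    steer.steerable["dt"] = 0.5
    t, checkpoints = 0.0, []
    for step in range(steps):
        for command, args in script.get(step, []):   # simulated client input
            steer.send(command, **args)
        for command in steer.control():              # one call per time step
            if command == "checkpoint":
                checkpoints.append((step, t))
        if steer.paused:
            continue
        t += steer.steerable["dt"]
        steer.report("time", t)
    return t, checkpoints


script = {
    2: [("set", {"name": "dt", "value": 0.25})],
    3: [("pause", {})],
    4: [("resume", {}), ("checkpoint", {})],
}
steer = ToySteering()
final_t, checkpoints = run(steer, 6, script)
```

With an empty `script` the loop runs unchanged, which mirrors the design goal that the same source should work with or without steering.
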
Qt Steering client
Built using C++ and Qt
Attaches to any steerable RealityGrid application
Discovers what commands are supported
Discovers steerable & monitored parameters
Constructs appropriate widgets on the fly

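The "discover, then construct widgets on the fly" idea can be sketched without any GUI toolkit: given a description of what an application supports, choose a widget kind for each item. The description format and the widget names below are assumptions made for illustration; the actual client is written in C++ with Qt and its discovery messages are not shown in this talk.

```python
def widgets_for(description):
    """Map a discovered application description to (widget kind, name) pairs."""
    widgets = [("button", command) for command in description["commands"]]
    for param in description["params"]:
        value = param["value"]
        if not param["steerable"]:
            kind = "label"        # monitored parameters are read-only
        elif isinstance(value, bool):
            kind = "checkbox"     # test bool first: bool is a subclass of int
        elif isinstance(value, (int, float)):
            kind = "spinbox"
        else:
            kind = "text field"
        widgets.append((kind, param["name"]))
    return widgets


# Hypothetical description, as a client might receive it after attaching.
discovered = {
    "commands": ["pause", "checkpoint"],
    "params": [
        {"name": "temperature", "value": 300.0, "steerable": True},
        {"name": "use_shear", "value": False, "steerable": True},
        {"name": "timestep", "value": 1200, "steerable": False},
    ],
}
layout = widgets_for(discovered)
```
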
On-line visualisation
Fast track uses open source VTK for on-line visualisation
  – Simple GUI built with Tk/Tcl, polls for new data to refresh image
  – Some in-built parallelism
  – extended to use the steering library
  – AVS-format data supported
  – XDR-format data for sample transfer between platforms
  – Volume render (parallel)
  – Isosurface
  – Hedgehog
  – Cut-plane
New work on atom-centric meshes for Steve Kenny

OGSI is dead. Long live WS-RF!
WS-ResourceFramework preserves most OGSI ideas in a way which is friendlier (less abusive) to Web services.
Open Middleware Infrastructure Institute (OMII) has a conservative roadmap based on Web services.
  – WS-I plus as little else as possible
UK National Grid Service is aligned with EGEE.
  – This means Globus Toolkit version 2 for at least 12 months.
WS-RF (and WS-Notification) are moving targets.
What does this mean for us?

Our response to WS-RF
We must be able to exploit the grids that exist
  – GT4 is unlikely to be stable and widely deployed in lifetime of RealityGrid
OGSI::Lite works fine for us, so continue to use it for now.
In time, WS-RF may be appropriate.
  – seems indicated for the Steering Grid Service, which is a very dynamic thing
  – optional for persistent services such as Checkpoint Metadata Tree and Registry. These could be implemented in plain Web services.
WSRF::Lite is already an option
  – prototype released within a few weeks of first publication of WS-RF drafts
  – featured in WS-RF interop fest in April, and interop demo at GGF 11 last week

Standards, generally
Very slow progress on Advance Reservation
  – RealityGrid requires co-allocation of compute, viz, AG resources at a time to suit the humans
  – LSF, PBS(Pro), SGE now support it, but not accessible through middleware
  – GRAAP-WG at GGF is bogged down in WS-Agreement and has yet to address protocols and apply them to the Advance Reservation problem
Practical WS-RF interoperability will require a coherent, global security strategy for Web services, and a delegation model
  – not clear that GT4 interoperability is the driver.
  – GT3 and GT4 security has never been on the standards table
  – what is GSI-SecureConversation anyway?
OGSA itself is a massive undertaking and will not settle in RealityGrid’s lifetime
RealityGrid is a provider of use case drivers for GRAAP, GridCPR, OGSA, SAGA (and other) groups in GGF

Prospects: Where we’re going

Steering Plans
Tabbed steerer (work in progress)
  – single client tabs between multiple steerable simulations
  – required for thermodynamic integration work using NAMD
Steering of multi-component simulations (coupled models)
  – requires metadata about component interactions and schedule
Quantitative study of the overhead of steering and on-line visualization
Support use of steering within project
Final release of steering library, toolkit and documentation
Significant Gap - Security!!!
  – contingent on additional funding for WSRF::Lite
  – and coherent global security strategy for Web services

Steering - Wishlist
Port of steering services to WS-RF
  – probably in a follow-on project
Provenance of steering and parameter space exploration
Collaborative steering
  – i.e. support simultaneous connection of multiple clients
Scripted steering
  – Breakpoints ( IF (temperature > TOO_HOT) THEN … )
  – Replay of previous steering actions
Integration of steering into selected MVEs
  – entirely feasible, but can’t do them all

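Scripted steering with breakpoints is a wishlist item, so nothing below describes existing RealityGrid functionality. As one possible reading of the `IF (temperature > TOO_HOT) THEN …` example, a breakpoint could be a rule that compares a monitored parameter against a threshold and names a steering action to issue. The `Breakpoint` type, the `"pause"` action, the value of `TOO_HOT` and the temperature series are all assumptions chosen for the sketch.

```python
import operator
from dataclasses import dataclass

COMPARE = {">": operator.gt, "<": operator.lt,
           ">=": operator.ge, "<=": operator.le}


@dataclass(frozen=True)
class Breakpoint:
    param: str        # monitored parameter to watch
    op: str           # one of the keys of COMPARE
    threshold: float
    action: str       # steering command to issue when the rule fires


def triggered(breakpoints, monitored):
    """Return the actions of every breakpoint whose condition holds."""
    return [bp.action for bp in breakpoints
            if bp.param in monitored
            and COMPARE[bp.op](monitored[bp.param], bp.threshold)]


TOO_HOT = 350.0   # illustrative threshold, not from the talk
rules = [Breakpoint("temperature", ">", TOO_HOT, "pause")]

log = []          # (step, action) pairs: a record that could be replayed
for step, temperature in enumerate([300.0, 340.0, 360.0]):
    for action in triggered(rules, {"temperature": temperature}):
        log.append((step, action))
```

Keeping the fired actions in a log is one simple way to support the second wishlist item, replay of previous steering actions.
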
Standardisation of Steering
Opportunities:
  – Standardise an API for computational steering
  – Standardise the WSDL of the Steering Grid Service
These could be input to the GGF research group “Simple APIs for Grid Applications” (SAGA-RG)
Is there critical mass?

Visualization
Plans:
  – Finish atom-centric meshes
  – High-performance visualization: re-evaluate AVS with Parallel Support Toolkit
  – “Thin visualization”: delivered to PDA or Web browser; thumbnails in checkpoint tree
Possibilities:
  – Use of *-ray from Utah
  – AVS module for streaming to Access Grid
  – VizServer integration: put GSI authentication into VizServer PAM when released; liaison with Platform and SGI regarding use of VizServer API for Advance Reservation of graphics pipes

Launching and packaging
Plans:
  – Continue to improve usability
  – Reduce deployment overhead: wizard can now work with Java CoG kit, which is easier to deploy than Globus client bundles
Possibilities:
  – Integrate RLS or SRB into checkpoint tree
  – Pick up Web service approaches to job submission

HCI
Plans:
  – Update of HCI Audit report in light of experiences
  – Journal paper on the HCI of TeraGyroid
  – .NET client: deployable demonstrator with renderings on PDA and Windows laptop
  – Identified activities, off critical path, for PhD student
  – VizServer QoS experiments with MB-NG or UK-Light
  – Thin visualization for PDAs and Web portals

Portal
Currently provides Web client for steering
  – GridSphere portlet communicates with Steering Grid Service via SOAP
Prototype portlet for checkpoint tree browsing
Little resource (2-3 PM) remains for second phase of portal work.
Plans:
  – Finish checkpoint tree browsing
  – Incorporate use of registry for simulation discovery
  – Hope to inherit JSR168 portlets for job launching and monitoring
  – Limited visualization capability: slice of scalar field; subject to resources

Resource Management – Deep Track
Advance Reservation
  – proof of concept using SGE 6.0
Implemented within Job Submission Web Service separated from ICENI
  – using Job Definition Markup Language (JDML), which is evolving into Job Submission Definition Language (JSDL) through the Global Grid Forum JSDL working group
  – designed to support plug-in of other job submission systems, e.g. Globus, gsi-ssh, UNICORE, LSF, ...

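The plug-in design mentioned above can be illustrated, under heavy assumptions, as a table of back ends keyed by name behind one `submit` entry point. The back-end names `"local"` and `"batch"`, the job dictionary and the returned strings are all invented for the sketch; they do not represent JDML, JSDL, or the interface of any real submission system.

```python
SUBMITTERS = {}


def submitter(name):
    """Decorator that registers a job submission back end under a name."""
    def register(func):
        SUBMITTERS[name] = func
        return func
    return register


@submitter("local")
def submit_local(job):
    # Hypothetical back end: run immediately on the local machine.
    return f"local:{job['executable']}"


@submitter("batch")
def submit_batch(job):
    # Hypothetical back end that honours an advance reservation if given.
    return f"batch:{job['executable']}@{job.get('reservation', 'asap')}"


def submit(job, system):
    """Dispatch a job description to whichever plug-in handles `system`."""
    if system not in SUBMITTERS:
        raise ValueError(f"no plug-in for {system!r}")
    return SUBMITTERS[system](job)


job = {"executable": "lb3d", "reservation": "2004-06-15T14:00"}
receipt = submit(job, "batch")
```

Adding support for another system then means registering one more function, without touching the callers of `submit`.
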
ICENI integration – Deep Track
[Diagram labels: Application, Steering library, Steering GS, Control, Status, Data in / Data out]
Technical report on feasibility of integrating fast-track steerable binary (with associated SGS) as an ICENI component
If practical, do it.

Performance Control – Deep Track
Performance Control of coupled models
  – working with HybridMD code and Bespoke Framework Generator (BFG)
  – outcomes: technology demonstrator & research papers
  – deployment in production is unlikely
Performance prediction of same
  – Steering of BFG-coupled models
Integration of PERCO and ICENI is not likely
Generalised malleable-checkpoint library is unlikely
  – major undertaking, re-inventing SRS from UTK
  – application-specific alternatives always possible for those that need it
Proven to be possible to support steering or PERCO through a common API
  – which simplifies instrumentation of application codes
  – but doing both at the same time leads to frighteningly complex interactions

Conclusions
We will not solve everything during the lifetime of RealityGrid
We must be ruthless about what we do and do not undertake

Partners
Academic
  – University College London
  – Queen Mary, University of London
  – Imperial College
  – University of Manchester
  – University of Edinburgh
  – University of Oxford
  – University of Loughborough
Industrial
  – Schlumberger
  – Edward Jenner Institute for Vaccine Research
  – Silicon Graphics Inc
  – Computation for Science Consortium
  – Advanced Visual Systems
  – Fujitsu
  – BT Exact

Manchester Computing: Bringing Science and Supercomputers Together